Dexi Basics

There are three major components to understand in Dexi:

  • Robots
  • Runs
  • Executions

Robots

A robot is the most fundamental part of dexi.io: it automates something, such as a website or a data flow. Robots come in four forms: Extractors, Crawlers, Pipes, and AutoBots.

Extractors
The Extractor is capable of extracting data from any site. It is fully HTML5-compliant, highly capable, and boasts the same feature set as a desktop web browser. Extractors can handle HTML, CSS, JavaScript, downloads, web sockets, Canvas, forms, logins, and much more. The drawback of this rich feature set is that it tends to lead to longer processing times and slower robots compared to the Crawler.

Crawlers
The Crawler is a much simpler robot than the Extractor. Given a starting URL, the Crawler automatically finds all outgoing links from that page within the same domain and traverses them, repeating the process on each discovered page. This is essentially the same technique that Google, Yahoo, and Bing use to index the web, but our Crawler is confined to a single domain per robot. The Crawler doesn't support CSS, JavaScript, or any other special elements, which makes it highly concurrent and very fast. Without advanced feature support, however, the Crawler is limited in which pages it can interact with.
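
You never write this traversal yourself in dexi.io, but the idea is easy to picture in code. The following is a minimal sketch of same-domain, breadth-first link discovery (an illustration of the technique, not Dexi's implementation); it assumes the third-party requests and beautifulsoup4 packages are installed.

    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def same_domain_pages(start_url, max_pages=100):
        """Breadth-first discovery of pages on start_url's domain (illustration only)."""
        domain = urlparse(start_url).netloc
        queue = deque([start_url])
        seen = {start_url}
        while queue and len(seen) < max_pages:
            page = queue.popleft()
            try:
                html = requests.get(page, timeout=10).text
            except requests.RequestException:
                continue  # skip pages that fail to load
            for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(page, anchor["href"]).split("#")[0]
                if urlparse(link).netloc == domain and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return seen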

Pipes
A Pipe is a super robot: it controls other robots and creates robot workflows, essentially taking one robot, initiating its run, then automatically moving to the next robot and triggering its run, and so on. Pipes can also pull in external information from APIs, databases, and similar sources. Pipe robots do not extract data from websites themselves; rather, they combine other robots, APIs, and datasets into a single flow for data extraction and processing.
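
Pipes are built visually in dexi.io rather than in code, but the flow they describe is essentially a chain of robot runs. The sketch below uses a hypothetical run_robot() helper (not part of any Dexi library) purely to illustrate feeding one robot's output into the next robot's input.

    # Conceptual sketch of a Pipe-style workflow. run_robot() is hypothetical;
    # in dexi.io this chaining is configured visually, not written in code.
    def run_robot(robot_name, input_rows):
        """Placeholder: execute the named robot with the given input rows."""
        raise NotImplementedError

    def product_pipeline(category_urls):
        # Step 1: a crawler-style robot discovers product URLs per category.
        product_pages = run_robot("discover-products", [{"url": u} for u in category_urls])
        # Step 2: an extractor robot scrapes the details of each discovered product.
        details = run_robot("extract-product-details", product_pages)
        # Step 3: results could be merged with an external API or dataset here.
        return details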

AutoBots
An AutoBot always accepts a URL as input and then maps that URL against a list of Extractors covering a range of sites. If you request something from an AutoBot that it doesn't know how to handle, it will add the URL to its internal list of sites. This allows you to use a single AutoBot to scrape products from hundreds of sites using the URLs of the individual products.
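
Conceptually, an AutoBot is a lookup from a URL's domain to the Extractor that knows that site. The snippet below illustrates that routing idea only; it is not how dexi.io implements AutoBots, and the domains and extractor names are invented.

    from urllib.parse import urlparse

    # Illustration of the routing idea behind an AutoBot; the names are invented.
    EXTRACTOR_BY_DOMAIN = {
        "shop-a.example.com": "shop-a-product-extractor",
        "shop-b.example.com": "shop-b-product-extractor",
    }

    def route(url):
        """Return the Extractor registered for the URL's domain, or None if unknown."""
        domain = urlparse(url).netloc
        extractor = EXTRACTOR_BY_DOMAIN.get(domain)
        if extractor is None:
            # An AutoBot would record the unknown URL so an Extractor can be added later.
            print(f"No Extractor registered for {domain}; queueing for review")
        return extractor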


Runs

For every robot, you must have at least one run in order to execute it. A run is a configuration of how you want to execute the robot, not an execution itself.


You can have an unlimited number of executions of a single run. You can also have an unlimited number of runs per robot, but for the vast majority of robots, you only need one or two.

A run configuration includes the following settings; the sketch after this list illustrates how they fit together:

  • Concurrency
  • Scheduling
  • Integrations
  • Inputs

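One way to picture the relationship between robots, runs, and executions is that a run is simply a reusable bundle of settings for one robot. The dataclass below is an illustration of that idea, not a Dexi data model; the field names mirror the configuration tabs listed above.

    from dataclasses import dataclass, field
    from typing import Optional

    # Illustration only: a run as a reusable bundle of settings for one robot.
    # The field names mirror the configuration tabs; this is not a Dexi data model.
    @dataclass
    class RunConfiguration:
        robot_name: str
        concurrency: int = 1                     # parallel sessions for the execution
        schedule: Optional[str] = None           # e.g. a cron-style expression
        integrations: list = field(default_factory=list)  # where results are delivered
        inputs: list = field(default_factory=list)         # one dict per input row

    # The same run can be executed any number of times; each execution gets its own results.
    daily_prices = RunConfiguration(
        robot_name="price-extractor",
        concurrency=5,
        schedule="0 6 * * *",
        inputs=[{"search_term": "laptop"}, {"search_term": "monitor"}],
    )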

Integrations

The Integrations tab allows you to select which of your configured integrations this particular run should use. For every selected integration, dexi.io will upload all available formats to that integration upon successful execution.


Inputs
Inputs are especially important to understand, as they are often used to pass search criteria, login credentials, or other information to the website. If your robot requires input, you must add inputs to the run or the robot will fail.
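
Conceptually, a run carries one or more rows of input values, where each row fills the input fields defined on the robot and produces its own results. A minimal sketch, assuming a robot with two hypothetical input fields, search_term and country:

    # Illustration: input rows for a run. The field names are hypothetical and must
    # match the input fields defined on the robot, or the execution will fail.
    input_rows = [
        {"search_term": "running shoes", "country": "DK"},
        {"search_term": "running shoes", "country": "SE"},
        {"search_term": "hiking boots", "country": "DK"},
    ]
    # Each row is processed separately and yields its own set of results.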

Adding an input looks like this in the Extractor editor, in the Inputs tab:

[Screenshot: adding an input on the Inputs tab of the Extractor editor]

Here is what the step looks like:

[Screenshot: the step using the input]

On the run configuration page, add inputs using the Inputs tab:

[Screenshot: the Inputs tab on the run configuration page]

You will then get a series of results for each individual input:

[Screenshot: a set of results for each individual input]

To import your input values:

  1. Download the CSV template that dexi.io automatically creates based on your robot's input fields.
  2. Copy your input values into the template.
  3. Save the CSV file.
  4. Upload it using the Import CSV button to import the values (a scripted alternative is sketched below).
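
If your values already live in code or a database, you can also fill the downloaded template programmatically before uploading it. A minimal sketch using Python's standard csv module; the column names are hypothetical and must match the header of the template dexi.io generated for your robot.

    import csv

    # Hypothetical input fields; use the header row from the template you downloaded.
    fieldnames = ["search_term", "country"]
    rows = [
        {"search_term": "running shoes", "country": "DK"},
        {"search_term": "hiking boots", "country": "SE"},
    ]

    # Write the values into the template layout, then upload the file with the
    # Import CSV button on the run's Inputs tab.
    with open("inputs.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)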

Watching runs and robots
To be notified via e-mail or push notification when an execution succeeds or fails, you can Watch a run.

To start watching:

  1. Select the Not watching button when editing a run.
  2. In the drop-down menu that appears, specify what you want to watch.


Note

To enable push notifications on your smartphone or tablet, you must connect with Pushover.net.

Monitoring your robots

If you want to make sure your robots stay in good condition, you can set up smaller runs that execute daily and then use watching to alert you if something goes wrong.

This will provide you with an early warning system to keep your robots running smoothly.
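
If you prefer to run the check yourself, the platform also has a REST API at https://api.dexi.io that a monitoring script can poll. The endpoint path, response shape, and authentication headers in the sketch below are assumptions based on the public API documentation, so verify them against the API docs for your account.

    import hashlib

    import requests

    # Sketch of a do-it-yourself health check. The base URL, endpoint path, and
    # header scheme below are assumptions based on the public dexi.io API docs;
    # verify them before relying on this.
    ACCOUNT_ID = "your-account-id"
    API_KEY = "your-api-key"
    BASE_URL = "https://api.dexi.io/"

    HEADERS = {
        # Assumed access scheme: MD5 of account ID + API key.
        "X-DexiIO-Access": hashlib.md5((ACCOUNT_ID + API_KEY).encode()).hexdigest(),
        "X-DexiIO-Account": ACCOUNT_ID,
        "Accept": "application/json",
    }

    def alert(message):
        # Hook up e-mail, chat, or a Pushover.net push here.
        print("ALERT:", message)

    def check_run(run_id):
        """Fetch a run's latest result (assumed endpoint) and alert on problems."""
        resp = requests.get(f"{BASE_URL}runs/{run_id}/latest/result",
                            headers=HEADERS, timeout=60)
        if resp.status_code != 200:
            alert(f"Run {run_id}: API returned HTTP {resp.status_code}")
            return
        if not resp.json().get("rows"):
            alert(f"Run {run_id}: latest execution produced no rows")

    check_run("your-run-id")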

Executions

Executions are the results of initiating a run: each execution is one concrete running of the robot with that run's configuration.

Executions contain two tabs:

  • Information
  • Results

You can access an execution by completing the following steps:

  1. Go to Projects.
  2. Open the relevant project folder.
  3. Open the robot.
  4. Double-click on the required Configuration.
  5. Click the Executions tab.
  6. Click View.

The following options are available on both tabs:

  • Connect – This displays when the run that was executed contains Integrations. Select Connect to retry the integrations associated with the run, if needed.
  • Retry failed / stopped – Continue running the execution.
  • Download – Select from the following options (a programmatic alternative is sketched after this list):
    • Excel XML (.xls)
    • Excel spreadsheet (.xlsx)
    • Excel 97-2004 workbook (.xls)
    • Comma separated values (.csv)
    • Semicolon separated values (.scsv)
    • XML (.xml)
    • JSON (.json)
    • Attachments/images (.zip)
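
The same result data can also be pulled over the REST API instead of the Download button, which is useful in automated pipelines. As with the monitoring sketch above, the endpoint, response shape, and authentication headers are assumptions to verify against the API documentation for your account.

    import csv
    import hashlib

    import requests

    ACCOUNT_ID = "your-account-id"
    API_KEY = "your-api-key"
    EXECUTION_ID = "your-execution-id"

    # Assumed authentication scheme: MD5 of account ID + API key (verify in the API docs).
    headers = {
        "X-DexiIO-Access": hashlib.md5((ACCOUNT_ID + API_KEY).encode()).hexdigest(),
        "X-DexiIO-Account": ACCOUNT_ID,
        "Accept": "application/json",
    }

    # Assumed endpoint: an execution's result as JSON with "headers" and "rows" keys.
    resp = requests.get(f"https://api.dexi.io/executions/{EXECUTION_ID}/result",
                        headers=headers, timeout=60)
    resp.raise_for_status()
    result = resp.json()

    # Mirror the UI's CSV download by writing the rows to a local file.
    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(result["headers"])
        writer.writerows(result["rows"])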

Information tab

The Information tab contains robot ID information, how long the extraction took, how much traffic was used, the number of errors, a list of events, and other relevant statistics.


Results tab

The Results tab displays all the results of the execution and whether they've succeeded or not.

For each result, you'll see at least one screenshot, which is the screenshot of the last page of the execution. If your robot has 0 or 1 inputs, all the screenshots will be the same since all the results were retrieved in the same session.


Note

Screenshots are only available for scraping executions.

Results tab icons

  • Filter – Filter results by the following statuses:
    • Pending
    • Stopped
    • Failed
    • Ok
    • Running
  • Refresh – Refresh results.
  • Play – Auto-refresh results.
  • Pause – Pause auto-refreshing results.
  • Retry count – Number of retries.
  • Retry count (yellow) – Retry count when no values are present.
  • View log – Open the result log.
  • Ok – Data is good.
  • Failed – Error.
  • Debug – Opens the robot so you can fix the error.
  • Images – Screenshot of the last page of the execution.
  • No images – No screenshot available.

