There are three major components to understand in Dexi:
- Robots
- Runs
- Executions
Robots
A robot is the most fundamental part of dexi.io. It automates something, such as interacting with a website or running a data flow. Robots come in four forms: Extractors, Crawlers, Pipes, and AutoBots.
Extractors
The Extractor is capable of extracting data from any site. It is fully HTML5-compliant, highly capable, and offers the same feature set as a desktop web browser: Extractors can handle HTML, CSS, JavaScript, downloads, web sockets, Canvas, forms, logins, and much more. The drawback of this rich feature set is that features you don't need still add processing time, making Extractors slower than Crawlers.
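To see why a full browser matters, here is a minimal sketch (not dexi.io code) of plain-HTTP extraction in Python: it parses only the HTML the server sends, so content rendered by JavaScript never appears. A browser-grade Extractor avoids this limitation at the cost of speed.

```python
# Plain-HTTP extraction: fetch the raw HTML and pull out the <title> text.
# Anything injected by JavaScript after page load would be invisible here.
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    """Collects the text inside the <title> tag."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

html = urlopen("https://example.com").read().decode("utf-8", errors="replace")
parser = TitleParser()
parser.feed(html)
print(parser.title)  # "Example Domain" -- static HTML only; no scripts ran
```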
Crawlers
The Crawler is a much simpler robot than the Extractor. Given a starting URL, the Crawler automatically finds all outgoing links from that page within the same domain and traverses those pages, repeating the process for each discovered page. This is essentially the same technique that Google, Yahoo, and Bing use to index the web, but our Crawler is confined to a single domain per robot. The Crawler doesn't support CSS, JavaScript, or any other special elements, which makes it highly concurrent and very fast. Without advanced feature support, however, the Crawler is limited in which pages it can interact with.
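As a rough illustration of the technique, here is a minimal single-domain breadth-first crawler in Python. It is a sketch of the traversal idea, not dexi.io's implementation, and it keeps error handling deliberately thin.

```python
# Breadth-first crawl that only follows links whose host matches the start URL.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    domain = urlparse(start_url).netloc
    seen, queue, fetched = {start_url}, deque([start_url]), 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable page: skip it and keep crawling
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay inside the starting domain, as the Crawler does.
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        yield url

for page in crawl("https://example.com"):
    print(page)
```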
Pipes
A Pipe is a super robot: it can control other robots and create robot workflows. Essentially, a Pipe takes one robot, initiates its run, then automatically moves to the next robot and triggers its run, and so on. Pipes can pull in external information from APIs, databases, and similar sources. Pipes do not extract data from websites themselves; rather, they combine other robots, APIs, and datasets into a single flow for data extraction and processing.
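The chaining idea can be sketched in a few lines. The `run_robot` helper and the toy robots below are hypothetical stand-ins, since a real Pipe triggers runs on the platform itself; the point is only that each robot's output rows become the next robot's inputs.

```python
# Toy "robots": the first expands a start URL into product URLs,
# the second produces one extracted record per product URL.
ROBOTS = {
    "find-products": lambda rows: [{"url": r["url"] + f"/p/{i}"}
                                   for r in rows for i in range(3)],
    "extract-product": lambda rows: [{"url": r["url"], "title": "example"}
                                     for r in rows],
}

def run_robot(name, rows):
    """Hypothetical stand-in for executing one robot and collecting its rows."""
    return ROBOTS[name](rows)

def run_pipe(robot_names, rows):
    for name in robot_names:
        rows = run_robot(name, rows)  # output of one robot feeds the next
    return rows

print(run_pipe(["find-products", "extract-product"],
               [{"url": "https://example.com"}]))
```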
AutoBots
An AutoBot always accepts a URL as input and maps that URL to one of a list of Extractors covering a range of sites. If you request something from an AutoBot that it doesn't know how to process, it will add the URL to its list of internal sites. This allows you to use a single AutoBot to scrape products from hundreds of sites, using URLs for the individual products.
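Conceptually, an AutoBot is a dispatcher from domain to Extractor. The sketch below uses hypothetical extractor functions to illustrate the mapping and the behaviour for unknown URLs described above; none of these names are dexi.io APIs.

```python
# Map each known domain to the Extractor that handles it; queue the rest.
from urllib.parse import urlparse

extractors = {
    "shop-a.example": lambda url: {"source": url, "price": "9.99"},
    "shop-b.example": lambda url: {"source": url, "price": "19.99"},
}
unknown_sites = []

def autobot(url):
    domain = urlparse(url).netloc
    extractor = extractors.get(domain)
    if extractor is None:
        unknown_sites.append(url)  # remember the site so an Extractor can be added
        return None
    return extractor(url)

print(autobot("https://shop-a.example/item/1"))  # dispatched to shop-a's Extractor
print(autobot("https://shop-c.example/item/9"))  # None; URL recorded in unknown_sites
```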
Runs
Every robot needs at least one run in order to execute. A run is a configuration of how you want to execute the robot, not an execution itself.
You can have an unlimited number of executions of a single run. You can also have an unlimited number of runs per robot, but for the vast majority of robots, you only need one or two.
A run configuration includes the following (a sketch of such a configuration follows the list):
- Concurrency
- Scheduling
- Integrations
- Inputs
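As a rough sketch, you can picture a run configuration as a record of those four settings. The field names below are illustrative, not dexi.io's actual schema.

```python
# Illustrative shape of a run configuration (assumed field names).
from dataclasses import dataclass, field

@dataclass
class RunConfig:
    robot: str                    # which robot this run executes
    concurrency: int = 1          # how many parallel sessions to use
    schedule: str | None = None   # e.g. a cron-style expression; None = manual
    integrations: list[str] = field(default_factory=list)  # result destinations
    inputs: list[dict] = field(default_factory=list)       # one dict per input row

daily = RunConfig(robot="extract-products", concurrency=4,
                  schedule="0 6 * * *", inputs=[{"query": "laptops"}])
```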
Integrations
The Integrations tab allows you to select which of your configured integrations this particular run should use. For every selected integration, dexi.io uploads the results in all available formats to that integration upon successful execution.
Inputs
Inputs are especially important to understand, as they are often used to pass search criteria, login credentials, or other information to the website. If your robot requires input, you must add inputs to the run or the robot will fail.
Inputs are defined in the Extractor editor, on the Inputs tab. On the run's configuration page, add input values using the Inputs tab; the execution then produces a series of results for each individual input.
To import your input values (a small CSV-writing sketch follows these steps):
- Download the CSV template dexi.io automatically creates based on your robot's input fields.
- Copy your values into the template.
- Save the CSV file.
- Upload it using the Import CSV button to import the values.
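For illustration, the same CSV can also be produced with Python's csv module. The column names below ("query" and "country") are made-up examples; your robot's template defines the actual columns.

```python
# Write an input CSV whose header row matches the robot's input fields.
import csv

rows = [
    {"query": "laptops", "country": "US"},
    {"query": "laptops", "country": "DE"},
]

with open("inputs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["query", "country"])
    writer.writeheader()    # header must match the robot's input fields
    writer.writerows(rows)  # each row becomes one set of input values
```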
Watching runs and robots
To be notified via e-mail or push notification when an execution succeeds or fails, you can Watch a run.
To start watching:
- Select the Not watching button when editing a run.
- A drop-down menu appears where you can specify what you want to watch.
To enable push notifications on your smartphone or tablet, you must connect with Pushover.net.
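For context, this is roughly what a push through Pushover's public REST API looks like. Once your account is connected, dexi.io sends these for you; the token and user key below are placeholders.

```python
# Send a single push notification via Pushover's documented endpoint.
from urllib.parse import urlencode
from urllib.request import urlopen

data = urlencode({
    "token": "APP_TOKEN",  # placeholder: your Pushover application token
    "user": "USER_KEY",    # placeholder: your Pushover user key
    "message": "Run 'daily-products' failed",
}).encode("utf-8")

urlopen("https://api.pushover.net/1/messages.json", data=data)
```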
Monitoring your robots
To check that your robots remain in good condition, you can set up smaller runs that execute daily and use watching to alert you if something goes wrong. This gives you an early-warning system that keeps your robots running smoothly.
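A sketch of that pattern is below; `check_execution` is a hypothetical stand-in for however you retrieve an execution's status, and the status names mirror those shown on the Results tab.

```python
# Run a small "canary" check per robot and alert on bad statuses.
def check_execution(run_name):
    """Hypothetical: return the latest execution status for a run."""
    return "OK"  # one of: PENDING, STOPPED, FAILED, OK, RUNNING

def canary(run_names, alert):
    for name in run_names:
        status = check_execution(name)
        if status in {"FAILED", "STOPPED"}:
            alert(f"Canary run {name!r} ended with status {status}")

canary(["canary-shop-a", "canary-shop-b"], alert=print)
```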
Executions
Executions are the results of initiating a run: each execution holds the data produced by one invocation of a run configuration.
Executions contain two tabs:
- Information
- Results
You can access an execution by completing the following steps:
- Go to Projects.
- Open the relevant project folder.
- Open the robot.
- Double-click on the required Configuration.
- Click the Executions tab.
- Click View.
The following options are available on both tabs:
- Connect – This is displayed when the executed run contains Integrations. Select Connect to retry the integrations associated with the run, if needed.
- Retry failed / stopped – Continue running the execution.
- Download – Select from the following options (a post-processing sketch follows this list):
- Excel XML (.xls)
- Excel spreadsheet (.xlsx)
- Excel 97-2004 workbook (.xls)
- Comma separated values (.csv)
- Semicolon separated values (.scsv)
- XML (.xml)
- JSON (.json)
- Attachments/images (.zip)
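If you download results as JSON, a short script can reshape them into whatever format you need. The sketch below assumes the export is a JSON array of flat result objects, which may not match your robot's exact schema.

```python
# Convert a JSON results download into a CSV file.
import csv
import json

with open("results.json", encoding="utf-8") as f:
    results = json.load(f)  # assumed shape: [{"title": ..., "price": ...}, ...]

with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=sorted(results[0]))
    writer.writeheader()
    writer.writerows(results)
```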
Information tab
The Information tab contains the robot's ID information, how much time the extraction took, how much traffic was used, the number of errors, a list of events, and other relevant statistics.
Results tab
The Results tab displays all the results of the execution and whether they've succeeded or not.
For each result, you'll see at least one screenshot, which is the screenshot of the last page of the execution. If your robot has zero or one input, all the screenshots will be the same, since all the results were retrieved in the same session.
Screenshots are only available for scraping executions.
Results tab icons
The Results tab uses the following icons:
- Filter results by status: Pending, Stopped, Failed, Ok, or Running.
- Refresh results.
- Auto-refresh results.
- Pause auto-refreshing results.
- Retry count.
- Retry count (no values present).
- Open result log.
- Data is good.
- Error.
- Debug: opens the robot so you can fix the error.
- Screenshot of the last page of the execution.
- No screenshot available.