How can I improve robot performance?
A single robot execution runs from start to finish in a single processing thread, meaning it selects one button at a time and visits one page at a time.
To increase efficiency, you can split your robot into two:
- one that gets all the URLs of the pages to visit and
- one that takes those URLs as input.
This allows the robot to visit pages concurrently (up to your account's concurrency limit), significantly speeding up execution time.
Increase concurrency with care and respect for the target site, so you don't interrupt its services or place excessive load on it.
For smaller sites, stay at or below 10 concurrent robots. For sites that handle larger amounts of traffic, you can probably go a bit higher.
Always read the terms and policies for the site you're scraping to ensure you're complying with their terms.
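The two-robot split is configured in the platform itself, but if you were sketching the same pattern in your own code, it might look roughly like the Python below. The `collect_urls` and `visit` functions, the example.com URLs, and the worker count are all hypothetical placeholders, not platform defaults.

```python
# Hypothetical sketch of the two-stage pattern: stage 1 collects page URLs,
# stage 2 visits them with bounded concurrency. URLs are placeholders.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def collect_urls() -> list[str]:
    # Stand-in for the first robot: walk listing pages and return
    # every detail-page URL found.
    return [f"https://example.com/page/{i}" for i in range(1, 51)]

def visit(url: str) -> int:
    # Stand-in for the second robot: visit one URL and extract data.
    with urlopen(url, timeout=30) as response:
        return len(response.read())

def run() -> None:
    urls = collect_urls()
    # max_workers caps concurrency, mirroring the account concurrency limit
    # and the suggested ceiling of ~10 concurrent robots for smaller sites.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for url, size in zip(urls, pool.map(visit, urls)):
            print(url, size)

if __name__ == "__main__":
    run()
```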
How do I disable images, stylesheets and JavaScript?
You cannot globally disable images, stylesheets, or JavaScript with a single click, but you can block specific network requests, which speeds up page load time.
How to block or ignore network requests
Blocking network requests for certain unnecessary elements can improve robot performance.
When using the robot editor,
- Select the Network tab to get an overview of the network traffic involved in a page request.
- Select the URL icon to mark scripts and elements for blocking or ignoring.
- Here, you can block/ignore specific URLs/file types or entire domains.
By default, we block Google Analytics and other tracking scripts, since we don't want to skew the analytics data of any website we scrape.
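The block/ignore feature above lives in the robot editor's UI. For comparison, if you wanted a similar effect in your own script with a browser automation library such as Playwright (not the robot editor's built-in feature), a minimal sketch might look like this; the placeholder URL and the blocked type/domain lists are assumptions, not the platform's defaults.

```python
# Hypothetical sketch: block heavy resource types and tracking domains
# with Playwright request interception. Lists below are examples only.
from playwright.sync_api import sync_playwright

BLOCKED_TYPES = {"image", "stylesheet", "font", "media"}
BLOCKED_DOMAINS = ("google-analytics.com", "googletagmanager.com")

def handle_route(route):
    request = route.request
    # Abort requests for blocked resource types or tracking domains;
    # let everything else continue normally.
    if request.resource_type in BLOCKED_TYPES or any(
        domain in request.url for domain in BLOCKED_DOMAINS
    ):
        route.abort()
    else:
        route.continue_()

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.route("**/*", handle_route)   # intercept every network request
    page.goto("https://example.com")   # placeholder URL
    print(page.title())
    browser.close()
```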