Crawl a website directory

  • 05 Nov 2024

From the Dashboard or Projects page,

  1. Select the Create New Robot button.
  2. Select Crawler.
  3. Enter the required information.


  4. Under the Settings tab, make any desired configuration changes.
  • If you wish to use input data to provide the crawler with URLs to visit, activate the Dynamic URL? checkbox.
  • To follow the rules of any robots.txt files on the site (recommended), activate the Respect robots.txt? checkbox.
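When Respect robots.txt? is enabled, the crawler skips any URL the site's robots.txt disallows for it. The check it performs is equivalent to the following sketch using Python's standard `urllib.robotparser` (the robots.txt content and the `"MyCrawler"` user-agent string are illustrative assumptions, not values from the product):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site might serve.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before every fetch whether the URL is allowed
# for its user agent ("MyCrawler" here is an assumed name).
print(rp.can_fetch("MyCrawler", "https://example.com/public/page"))    # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/data"))   # False
```

Pages disallowed for the crawler's user agent are simply never visited, so no output is produced for them.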


  5. Under the Output tab, create any output fields your project requires. These fields will store the output data generated by the crawler.
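The output fields you create here define the schema of the records the crawler emits: each field becomes a key in every output record. As a sketch, a crawler with hypothetical fields named title and price might produce records shaped like this (the field names and export format are assumptions for illustration, not the product's actual export format):

```python
import json

# One output record; keys correspond to the output fields
# defined under the Output tab (hypothetical names).
record = {
    "url": "https://example.com/item/1",  # page the data came from (assumed)
    "title": "Sample product",
    "price": "19.99",
}

print(json.dumps(record))
```

Defining the fields up front lets every page processor write into the same, consistent structure.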


  6. Under the Page Processors tab, configure any page processors required by your project. See Page processors for details.


  7. When all necessary page processors are configured, select the blue Save button in the top-right of the page to save the crawler.


On the Projects page,

  8. Select the crawler.
  9. Select the Create Run button near the top-right of the page.


  10. Enter a name for the run.


  11. Select the new run.
  12. Select Open in the slide-in panel.
  13. Under the Configuration tab, change settings as needed.


  14. Under the Integrations tab, configure any required integrations.


  15. Under the Executions tab, launch the execution when ready, or view information about existing executions.


