Site navigation
  • 17 May 2024
  • 2 Minutes to read

What should I know about site navigation?

dexi.io robots can navigate a website in virtually any way a human can. This article describes the website navigation elements you are most likely to encounter and how to handle each of them in the scraper editor.

Links and buttons

To instruct the scraper to follow a link:

  1. Select the link or button element.
  2. In the Element Panel, select Click Element.

Navigation menus

To instruct the scraper to follow a single navigation menu item, follow the directions found above, under Links and buttons.

To instruct the scraper to iterate through all menu items:

  1. Select the menu items.
  2. In the Element Panel, select Loop Through Elements.
  3. Select the Step Forward button twice to move the timeline position to just after the Loop Through Elements step.
  4. Select the first menu item.
  5. In the Element Panel, select Click Element.
  6. Follow this step with any further navigation or extraction steps to be performed after clicking each menu item.

The robot will perform all configured steps for each menu item before continuing with steps following the Loop Through Elements loop.
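
The order of execution described above can be pictured with a short Python sketch. This is only a conceptual model, not dexi.io code: the function names, the menu labels, and the extraction step are all hypothetical and stand in for steps you would configure in the editor.

```python
from typing import Callable, List


def loop_through_elements(elements: List[str],
                          steps: List[Callable[[str], str]]) -> List[str]:
    """Run every nested step for each element, in order, and return a log."""
    log = []
    for element in elements:      # one iteration per menu item
        for step in steps:        # all nested steps run before the next item
            log.append(step(element))
    return log


def click_element(item: str) -> str:
    # Hypothetical stand-in for a Click Element step.
    return f"click {item}"


def extract_title(item: str) -> str:
    # Hypothetical stand-in for an extraction step placed inside the loop.
    return f"extract title from {item} page"


menu_items = ["Home", "Products", "Contact"]  # hypothetical menu labels
log = loop_through_elements(menu_items, [click_element, extract_title])
log.append("continue with steps after the loop")  # runs once, after all items

for line in log:
    print(line)
```

In this model, both nested steps finish for "Home" before "Products" is clicked, and the step that follows the loop runs only once, after the last menu item.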

Authenticating with username and password

Make sure you have input fields and testing values configured for the username and password under the editor's Inputs tab. See What should I know about input and output?

To sign in to a site protected by a username/password security model:

  1. Select the username field.
  2. In the Element Panel, select Input.
  3. In the Step Panel, choose the input field from which to retrieve the username.
  4. Make any other configuration changes as needed.
  5. Select the password field.
  6. In the Element Panel, select Input.
  7. In the Step Panel, choose the input field from which to retrieve the password.
  8. Make any other configuration changes as needed.
  9. Select the button that submits the credentials.
  10. In the Element Panel, select Click Element.
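
As a rough illustration of how the sign-in steps relate to the robot's inputs, the following Python sketch models the sequence as data. It assumes a simplified picture in which each Input step reads one named input and each Click Element step submits the form. The `Step` class, the target labels, and the test values are hypothetical and are not part of dexi.io.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    action: str                        # "input" or "click"
    target: str                        # hypothetical label for the selected element
    input_field: Optional[str] = None  # name of the robot input to read from


def build_sign_in_steps() -> List[Step]:
    # Mirrors the list above: two Input steps, then one Click Element step.
    return [
        Step("input", "username field", input_field="username"),
        Step("input", "password field", input_field="password"),
        Step("click", "submit button"),
    ]


def run_steps(steps: List[Step], inputs: Dict[str, str]) -> dict:
    """Apply the steps to an imaginary form and report what was entered."""
    form: Dict[str, str] = {}
    submitted = False
    for step in steps:
        if step.action == "input":
            # The value comes from the configured input, not from the step itself.
            form[step.target] = inputs[step.input_field]
        elif step.action == "click":
            submitted = True
    return {"form": form, "submitted": submitted}


# Testing values, comparable to those configured under the Inputs tab.
test_inputs = {"username": "demo-user", "password": "demo-pass"}
result = run_steps(build_sign_in_steps(), test_inputs)
print(result)
```

Keeping the credentials in the inputs rather than in the steps is what lets the same robot run with different usernames and passwords.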

Captcha

To address a Captcha or similar anti-robot feature:

  1. Select the Captcha image element.
  2. In the Element Panel, select Resolve Captcha. If selecting a Captcha element doesn't reveal the Resolve Captcha option, temporarily choose another option, such as an Extract option. You can then edit the step, select Resolve Captcha from the Step Type menu, and enter the required configuration settings.
  3. Select the response input field.
  4. In the Element Panel, select Input and complete the configuration.
  5. Select the button that submits the response.
  6. In the Element Panel, select Click Element.

How the extractor navigates

Whenever an extractor robot navigates to a new URL, it opens the page in a new "tab". Once the iteration for that page is done, the robot automatically jumps back: it closes the newly opened tabs and returns to the existing tab.

Often this happens in a loop through a list of links. Each click on a link opens a new tab, in which the robot performs the actions you've described. When the robot moves on to the next link in the list, it simply closes the new tabs and returns to the existing tab containing the list, so no additional loading or waiting time is incurred.

This is also called page state. Read more about it in What should I know about the extractor editor?
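
The tab behavior described in this section can be modeled with a small Python sketch. It is a simplified illustration rather than the extractor's actual implementation: the `TabbedBrowser` class, its methods, and the example.com URLs are hypothetical. The sketch counts page loads to show why returning to the list tab costs nothing in this model.

```python
from typing import List


class TabbedBrowser:
    """Toy model: tabs form a stack, and the last tab is the active one."""

    def __init__(self, start_url: str) -> None:
        self.tabs: List[str] = [start_url]
        self.page_loads = 1  # the start page is loaded once

    @property
    def current(self) -> str:
        return self.tabs[-1]

    def open_in_new_tab(self, url: str) -> None:
        # Navigating to a new URL opens a new tab; the old tab stays loaded.
        self.tabs.append(url)
        self.page_loads += 1

    def close_new_tabs(self, keep: int) -> None:
        # Drop every tab opened during the iteration; nothing is reloaded.
        del self.tabs[keep:]


def scrape_links(browser: TabbedBrowser, links: List[str]) -> List[str]:
    visited = []
    for url in links:
        depth = len(browser.tabs)           # remember the list tab's position
        browser.open_in_new_tab(url)        # the click opens the detail page
        visited.append(browser.current)     # placeholder for extraction steps
        browser.close_new_tabs(keep=depth)  # shift back to the list tab
    return visited


browser = TabbedBrowser("https://example.com/list")
links = [f"https://example.com/item/{n}" for n in (1, 2, 3)]
visited = scrape_links(browser, links)
print(visited, browser.tabs, browser.page_loads)
```

In this model the list page is loaded once and each detail page once, giving four page loads in total. A design that navigated back within a single tab would need to load the list again after every detail page.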

