Extractors: Page Navigation
This article explains how Extractor robots handle page navigation in general.
Extractors: Lists & Loops shows an example of this by navigating back and forth between a product listing page and each product details page.
To learn the basics of how to build Extractor robots, please see Extractors.
Page state
When a step in an Extractor robot navigates to a new URL, e.g., via a Go to URL step or a Click element step that selects a link, it does so in a way that corresponds to a human opening the page in a new browser tab (or browser window).
In the example of looping over product details pages on a listing page, every time the robot selects a details link, the details page is opened in a new tab. After every iteration of the loop, that tab is then closed.
Keeping track of which page to navigate to next is called page state, and Extractors handle it automatically.
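The tab behavior described above can be pictured with a small conceptual model. This is only a sketch for illustration: the `TabStack` class, the `visitDetails` function, and the example URLs are hypothetical and are not part of any Extractor API.

```javascript
// Conceptual model only: these names are hypothetical, not an Extractor API.
class TabStack {
  constructor(startUrl) {
    // The last entry is the tab the robot is currently on.
    this.tabs = [startUrl];
  }

  get current() {
    return this.tabs[this.tabs.length - 1];
  }

  // A navigation opens the target page in a new tab.
  open(url) {
    this.tabs.push(url);
  }

  // Closing a tab returns the robot to the tab underneath it.
  close() {
    if (this.tabs.length > 1) {
      this.tabs.pop();
    }
  }
}

// Loop over details links found on a listing page.
function visitDetails(listingUrl, detailUrls) {
  const state = new TabStack(listingUrl);
  const visited = [];
  for (const url of detailUrls) {
    state.open(url);             // selecting a details link opens a new tab
    visited.push(state.current); // extraction would happen here
    state.close();               // the tab is closed after the iteration
  }
  return { visited, endsOn: state.current };
}
```

In this model, the listing page is never reloaded: it stays open underneath each details tab, so the loop ends on the same listing page it started from.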
Examples
Almost any interaction with a web page can cause a page navigation (as defined by the developer of the page).
Some typical causes of a page navigation are listed below, each followed by the step type to use.
- Visiting a new URL - Go to URL
- Selecting a link - Click element
- Submitting a form - Click element
- Navigating a paginated page - Page iteration
- Explicitly changing the URL via JavaScript - Execute JavaScript, e.g., location.href=url
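For the last item in the list, the following is a minimal sketch of the kind of script an Execute JavaScript step might run. The `goTo` helper, the `fakeLocation` stand-in, and the example URLs are assumptions made for illustration; in a real browser, the object passed in would be `window.location`.

```javascript
// Hypothetical helper: `loc` is any object with an `href` property,
// such as `window.location` in a browser.
function goTo(loc, url) {
  // Resolve the target against the current page so that relative URLs
  // work with the stand-in object the same way they do in a browser.
  loc.href = new URL(url, loc.href).href;
  return loc.href;
}

// Stand-in for window.location so the sketch also runs outside a browser.
const fakeLocation = { href: "https://example.com/products?page=1" };
goTo(fakeLocation, "/products/42");
```

In a browser, the shorter form `location.href = url` is sufficient, because the browser resolves relative URLs itself.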
Single page applications
Some websites are built as single page apps, typically using frameworks like React or AngularJS. From the browser's perspective, loading new content in such an app, e.g., loading a product details page from the listing page, is not considered a page navigation.
This is because the loading of the new content is handled via JavaScript code: the URL in the browser remains the same but the logic in the code swaps out the list page for the details page.
It is not possible to give a general answer on how to handle single page applications. In the product listing-details example, however, the final step of each iteration should reset the page state so that the robot is back on the listing page before the next iteration starts. This final step could, e.g., select a Back button on the page or be a History -> Back step, which corresponds to selecting Back in your browser.
The same resetting of state should be done when using branches in Extractors that interact with single page applications.
Read more about this in the Page state section of The extractor editor.
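Why the reset matters can be pictured with a small model of a single page app. This is a conceptual sketch only: the `SinglePageApp` class, the `extractAll` function, and the product data are hypothetical and do not represent how Extractors are implemented.

```javascript
// Conceptual model only: a single page app swaps its view in place
// instead of loading a new page.
class SinglePageApp {
  constructor(products) {
    this.products = products; // id -> product name
    this.view = "list";       // which view is on screen: "list" or "details"
    this.selected = null;
  }

  openDetails(id) {
    // Details links only exist while the list view is shown.
    if (this.view !== "list") {
      throw new Error("details links are not visible in the details view");
    }
    this.view = "details";
    this.selected = id;
  }

  // Models a Back button on the page or a History -> Back step.
  back() {
    this.view = "list";
    this.selected = null;
  }
}

function extractAll(app, resetAfterEach) {
  const names = [];
  for (const id of Object.keys(app.products)) {
    app.openDetails(id);
    names.push(app.products[app.selected]);
    if (resetAfterEach) {
      app.back(); // final step of the iteration: reset the page state
    }
  }
  return names;
}
```

With `resetAfterEach` set to `true`, every product is extracted. With `false`, the second iteration fails in this model because the app is still showing the details view, so there is no details link to select.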
A few websites that implement "normal" page navigation (links opened in new tabs) require the Extractor robot to treat the website as if it were a single page app (due to some quirk in the website code). This behavior can be achieved by enabling the Force single page? option on the Settings tab.