
Text Retrieval and Data Scraping from Web Pages with UiPath

In UiPath Studio development, we often need to extract text and table data from web pages.

However, the extraction does not always work the way you expect.

This article explains how to extract text data and tabular data (data scraping) from web pages.

 

 


 

The operator of this blog, F-penIT blog

 

This site was created by translating a blog written in Japanese into English with DeepL.

Please forgive me if some of the English text is a little strange.

 

Get Text

Use the Get Text activity to get text data from a web page.

Get Text is located under UI Automation > Element > Control.

Get Text setting items

Setting location | Setting item | Setting contents
Properties > Output | Value | Stores the text from the specified UI element in a variable. The text can also be changed with VB expressions.
Properties > Common | DisplayName | The display name of the activity.
Properties > Common | ContinueOnError | Specifies whether the automation should continue even when the activity throws an error.
Properties > Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level.
Properties > Input | Target.Selector | Text property used to find a particular UI element when the activity is executed.
Properties > Input | Target.TimeoutMS | The amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
Properties > Input | Target.WaitForReady | Before performing the actions, wait for the target to become ready.
Properties > Input | Target.Element | Use the UiElement variable returned by another activity.
Properties > Input | Target.ClippingRegion | Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.
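
To make the roles of TimeoutMS and ContinueOnError more concrete, here is a minimal Python sketch of the general "wait, then fail or continue" pattern that the two descriptions imply. It is not UiPath's implementation. The function name get_text_with_timeout, the find callback, the polling interval, and the use of LookupError as a stand-in for SelectorNotFoundException are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def get_text_with_timeout(
    find: Callable[[], Optional[str]],
    timeout_ms: int = 30000,  # assumed default, for illustration only
    continue_on_error: bool = False,
    poll_interval_s: float = 0.01,
) -> Optional[str]:
    """Poll `find` until it returns text or `timeout_ms` elapses.

    Hypothetical helper: `find` stands in for "locate the element by its
    selector and read its text"; it returns None while the element is missing.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        text = find()
        if text is not None:
            return text
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    if continue_on_error:
        # Swallow the failure so the caller can carry on without a value.
        return None
    # Stand-in for the SelectorNotFoundException mentioned in the table.
    raise LookupError(f"element not found within {timeout_ms} ms")


if __name__ == "__main__":
    # The element "appears" on the third attempt.
    attempts = iter([None, None, "Top story headline"])
    print(get_text_with_timeout(lambda: next(attempts), timeout_ms=1000))
```

With continue_on_error=True the sketch returns None instead of raising, so the caller has to check for a missing value itself.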

 

Sample Process
Get the top article title of Yahoo Sports as text.

・Target web page(Yahoo! Sports Home)

 

・Get Text  Properties

・Execution result

 

Get Full Text

Use the Get Full Text activity to extract strings and their information from a web page.

Get Full Text is located in UI Automation > Text > Screen Scraping.

 

Get Full Text setting items

Setting location | Setting item | Setting contents
Properties > Output | Text | The string extracted from the indicated UI element.
Properties > Options | IgnoreHidden | If this check box is selected, string information from hidden elements inside the indicated UI element is not extracted.
Properties > Common | DisplayName | The display name of the activity.
Properties > Common | ContinueOnError | Specifies whether the automation should continue even when the activity throws an error.
Properties > Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level.
Properties > Input | Target.Selector | Text property used to find a particular UI element when the activity is executed.
Properties > Input | Target.TimeoutMS | The amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
Properties > Input | Target.WaitForReady | Before performing the actions, wait for the target to become ready.
Properties > Input | Target.Element | Use the UiElement variable returned by another activity.
Properties > Input | Target.ClippingRegion | Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.

 

Sample Process
On the Yahoo Sports NFL Schedule page, apply both “Get Full Text” and “Get Text” to the week selector, and output the retrieved text to the log.

・Target web page (NFL Schedule)

・Get Full Text  Properties

・Get Text  Properties

・Execution result

F-pen: “Get Full Text” gets the text of every option that can be selected, whereas “Get Text” gets only the option that is currently displayed.
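
The difference in the result above can be illustrated outside UiPath with a small Python sketch. It does not reproduce how either activity works internally. It only shows, on made-up markup, why "all text inside the element" and "only the displayed text" differ for a drop-down list. The class name, the function names, and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class SelectTextParser(HTMLParser):
    """Collect the text of every <option>, remembering which one is selected."""

    def __init__(self):
        super().__init__()
        self.options = []  # list of (text, is_selected) pairs
        self._in_option = False
        self._selected = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._selected = any(name == "selected" for name, _ in attrs)
            self._buffer = []

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            self.options.append(("".join(self._buffer).strip(), self._selected))
            self._in_option = False


def full_text(html):
    """All option texts, whether they are displayed or not."""
    parser = SelectTextParser()
    parser.feed(html)
    return "\n".join(text for text, _ in parser.options)


def displayed_text(html):
    """Only the option that is currently shown in the drop-down list."""
    parser = SelectTextParser()
    parser.feed(html)
    return "\n".join(text for text, selected in parser.options if selected)


# Made-up markup standing in for a week selector; not taken from the real page.
SAMPLE = """
<select>
  <option>Week 1</option>
  <option selected>Week 2</option>
  <option>Week 3</option>
</select>
"""

if __name__ == "__main__":
    print(full_text(SAMPLE))       # every option
    print(displayed_text(SAMPLE))  # only the selected option
```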

 

Data scraping

Data scraping of a single page

Use data scraping to retrieve tabular data from a web page.

The [Data Scraping] button is on the ribbon.

 

How to use data scraping
・Click on [Data Scraping] in the ribbon section

 

・Click [Next]

 

・Hover over a data cell in the table and click it

 

・Click [Yes]

 

・Click [Finish].

F-pen: The maximum number of results defaults to 100. Set it to 0 if you do not want to limit the number of results.

 

・If you don’t want to retrieve data for multiple pages, click [No].

・Verify that the data scraping workflow has been created.

F-pen: An “Extract Structured Data” activity is created inside an “Attach Browser” activity.

 

Extract Structured Data setting items

Setting location | Setting item | Setting contents
Properties > Input | ExtractMetadata | An XML string that enables you to define what data to extract from the indicated web page.
Properties > Input | Target.Selector | Text property used to find a particular UI element when the activity is executed.
Properties > Input | Target.TimeoutMS | The amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
Properties > Input | Target.WaitForReady | Before performing the actions, wait for the target to become ready.
Properties > Input | Target.Element | Use the UiElement variable returned by another activity.
Properties > Input | Target.ClippingRegion | Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.
Properties > Options | DelayBetweenPagesMS | The amount of time, in milliseconds, to wait until the next page is loaded.
Properties > Options | MaxNumberOfResults | The maximum number of results to be extracted.
Properties > Options | NextLinkSelector | The selector that identifies the link/button used to navigate to the next page.
Properties > Options | SendWindowMessages | If selected, when the data to be extracted spans multiple pages, the click that changes the page is executed by sending a specific message to the target application.
Properties > Options | SimulateClick | If selected, when the data to be extracted spans multiple pages, the click that changes the page is simulated by using the technology of the target application.
Properties > Output | DataTable | The information extracted from the indicated web page.
Properties > Common | DisplayName | The display name of the activity.
Properties > Common | ContinueOnError | Specifies whether the automation should continue even when the activity throws an error.
Properties > Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level.

 

Sample Process
Export the Yahoo Sports MLB standings to a CSV file.

・Target Sites(MLB Standings)

 

・Extract Structured Data  Properties

・Write CSV  Properties

・Setting variables

・CSV file output as a result of execution
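
For readers who want to see the idea behind this sample (read a table, then write it out as CSV) without UiPath, the following Python sketch does the same on a small in-memory table. It uses only the standard library. The class and function names and the sample rows are made up for illustration and are not taken from the real page or from UiPath.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cells of an HTML table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up standings table; the real page layout will differ.
SAMPLE = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>10</td><td>5</td></tr>
  <tr><td>Team B</td><td>8</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE), end="")
```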

 

Data scraping of multiple pages (with repeating links)

When a table spans multiple pages and every page has the same next-page link, the data scraping wizard can follow that link and collect the data from all the pages.

Data scraping is in the ribbon.

 

How to use data scraping
・Click on [Data Scraping] in the ribbon section.

・Click [Next]

・Hover over a data cell in the table and click it

 

・Click [Yes]

 

・Set [Maximum number of results (0 for all)] to 0, then click [Finish]

F-pen: The maximum number of results defaults to 100. Set it to 0 if you do not want to limit the number of results.

 

・Click [Yes]

 

・Click the link that moves to the next page (“>” in this example); the same link must appear on every page

 

・Verify that the data scraping workflow has been created.
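
The steps above boil down to "extract the rows, follow the next link, repeat until there is no next link or the maximum is reached". The Python sketch below shows that loop on made-up in-memory pages. It is only an illustration of the pattern, not what the wizard generates. The PAGES data and the function name are hypothetical, and max_results=0 is treated as "no limit" to mirror the wizard's "0 for all" setting.

```python
from typing import Dict, List, Optional, Tuple

# Each fake page holds its table rows and the key of the page that its
# "next" link points to (None on the last page). All data is made up.
Page = Tuple[List[List[str]], Optional[str]]

PAGES: Dict[str, Page] = {
    "page1": ([["Oct 1", "Team A"], ["Oct 3", "Team B"]], "page2"),
    "page2": ([["Oct 5", "Team C"], ["Oct 7", "Team D"]], "page3"),
    "page3": ([["Oct 9", "Team E"]], None),
}


def scrape_all_pages(pages, start, max_results=100):
    """Follow the next link from `start`, collecting rows along the way.

    max_results=0 means "no limit".
    """
    rows = []
    current = start
    visited = set()  # guards against a next link that loops back
    while current is not None and current not in visited:
        visited.add(current)
        page_rows, next_key = pages[current]
        for row in page_rows:
            if max_results and len(rows) >= max_results:
                return rows
            rows.append(row)
        current = next_key
    return rows


if __name__ == "__main__":
    print(len(scrape_all_pages(PAGES, "page1", max_results=0)))
    print(len(scrape_all_pages(PAGES, "page1", max_results=3)))
```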

 

 

Sample Process
Get the Boston Celtics' NBA schedule and output it to a CSV file.

・Target Sites(Boston Celtics Schedule)

 

・Attach Browser  Properties

・Extract Structured Data  Properties

 

・Write CSV  Properties

 

・CSV file of execution results

Sea otter: The schedule data from all the pages is output to a CSV file.

 

Data scraping of multiple pages (no repeating links)

To retrieve multi-page tabular data when the pages have no repeating next-page link, combine data scraping with a Click (or Select Item) activity and a loop (repeat) activity.

 

Sample Process
Output the NFL results for Week 1 through Week 3 to a CSV file.

Sea otter: Use Select Item to choose a specific week, which switches the page, and then get the schedule.

 

・Target Sites(NFL Schedule)

 

 

・Attach Browser  Properties

 

 

・Extract Structured Data  Properties

F-pen: The extracted data is appended to the data table, not overwritten.

 

・Select Item  properties

F-pen: The week to select in Item is specified with a variable.

 

 

・Write CSV  Properties

 

・Variables

 

・CSV file output as a result of execution

Sea otter: I was able to get the game results for Week 1 through Week 3.
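
The structure of this sample (select a week, extract the table, append the rows, repeat) can be sketched in Python as follows. This only illustrates the loop-and-append pattern. FakeSchedulePage, scrape_weeks, and the sample rows are hypothetical stand-ins for the Select Item and Extract Structured Data activities and do not reflect the real page.

```python
from typing import Callable, Dict, List

Row = List[str]


class FakeSchedulePage:
    """In-memory stand-in for a page whose table changes with the selected week."""

    def __init__(self, tables: Dict[str, List[Row]]):
        self._tables = tables
        self._current = next(iter(tables))  # first week is shown initially

    def select_week(self, week: str) -> None:
        # Plays the role of Select Item (or Click) switching the page.
        self._current = week

    def extract_table(self) -> List[Row]:
        # Plays the role of extracting the table that is currently displayed.
        return list(self._tables[self._current])


def scrape_weeks(
    weeks: List[str],
    select_week: Callable[[str], None],
    extract_table: Callable[[], List[Row]],
) -> List[Row]:
    """Select each week in turn and append its rows to one combined table."""
    all_rows: List[Row] = []
    for week in weeks:
        select_week(week)
        all_rows.extend(extract_table())  # append, do not overwrite
    return all_rows


if __name__ == "__main__":
    page = FakeSchedulePage({
        "Week 1": [["Team A", "Team B"]],
        "Week 2": [["Team C", "Team D"]],
        "Week 3": [["Team E", "Team F"], ["Team G", "Team H"]],
    })
    rows = scrape_weeks(["Week 1", "Week 2", "Week 3"], page.select_week, page.extract_table)
    for row in rows:
        print(",".join(row))
```

If the combined list were reassigned on each pass instead of extended, only the last week's rows would survive, which is why the data table in the sample is set to append.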

 

Summary

  • To retrieve text data from a web page, use the “Get Text” activity.
  • To extract strings and their information from a web page, use the “Get Full Text” activity.
  • To retrieve tabular data from a web page, use data scraping.
  • To retrieve data that spans multiple pages, indicate the next-page link in the data scraping wizard, or combine data scraping with a Click (or Select Item) activity and a loop (repeat) activity.

 
