Text Retrieval and Data Scraping from Web Pages with UiPath

In UiPath Studio development, we often extract text and table data from web pages.

However, you may not be able to extract data from the web as you would like.

This article explains how to extract text data and tabular data from web pages (data scraping).

This site was created by translating a blog created in Japanese into English using the DeepL translation.

Please forgive me if some of the English text is a little strange.

Get Text

Getting text data from a web page is done using the Get Text activity.

Get Text is found in UI Automation > Element > Control.

Get Text Setting item

Setting location		Setting item	Setting Contents
Properties	Output	Value	Enables you to store the text from the specified UI element in a variable, as well as make changes to the text with VB expressions.
	Common	DisplayName	The display name of the activity.
	Common	ContinueOnError	Specifies if the automation should continue even when the activity throws an error.
	Misc	Private	If selected, the values of variables and arguments are no longer logged at Verbose level.
		Target.Selector	Text property used to find a particular UI element when the activity is executed.
		Target.TimeoutMS	Specifies the amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
		Target.WaitForReady	Before performing the actions, wait for the target to become ready.
		Target.Element	Use the UiElement variable returned by another activity.
		Target.ClippingRegion	Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.
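
To make the TimeoutMS and ContinueOnError settings above more concrete, here is a minimal Python sketch of the general "wait for the target, then fail or continue" idea. It is only an illustration, not how UiPath is implemented: `get_text`, `SelectorNotFound`, `poll_ms`, and the 30000 ms default are names and values I assumed for this example.

```python
import time


class SelectorNotFound(Exception):
    """Hypothetical stand-in for the error raised when the target is not found."""


def get_text(find, timeout_ms=30000, continue_on_error=False, poll_ms=10):
    """Poll find() until it returns text or timeout_ms has elapsed.

    find is any callable that returns the element text, or None while the
    element is not on the page yet. The 30000 ms default is an assumption.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        element_text = find()
        if element_text is not None:
            return element_text
        if time.monotonic() >= deadline:
            if continue_on_error:
                return None  # swallow the failure and let the caller go on
            raise SelectorNotFound(f"target not found within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)


# Usage: a target that only "appears" on the third lookup.
attempts = []


def appears_on_third_try():
    attempts.append(1)
    return "Top article title" if len(attempts) >= 3 else None


title = get_text(appears_on_third_try, timeout_ms=1000)
missing = get_text(lambda: None, timeout_ms=50, continue_on_error=True)
print(title, missing)
```

With `continue_on_error=True` the caller gets no text back instead of an exception, which is why a later step should check the variable before using it.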

Sample Process
Get the top article title of Yahoo Sports as text.

・Target web page(Yahoo! Sports Home)

・Get Text Properties

・Execution result

Get Full Text

Extracting strings and their information from a web page is done using the Get Full Text activity.

Get Full Text is located in UI Automation > Text > Screen Scraping.

Get Full Text Setting item

Setting location		Setting item	Setting Contents
Properties	Output	Text	The string extracted from the indicated UI element.
	Options	IgnoreHidden	If this check box is selected, text from hidden elements inside the indicated UI element is not extracted.
	Common	DisplayName	The display name of the activity.
	Common	ContinueOnError	Specifies if the automation should continue even when the activity throws an error.
	Misc	Private	If selected, the values of variables and arguments are no longer logged at Verbose level.
		Target.Selector	Text property used to find a particular UI element when the activity is executed.
		Target.TimeoutMS	Specifies the amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
		Target.WaitForReady	Before performing the actions, wait for the target to become ready.
		Target.Element	Use the UiElement variable returned by another activity.
		Target.ClippingRegion	Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.

Sample Process
On the Yahoo Sports NFL Schedule page, use “Get Full Text” and “Get Text” on the week selector to get its text data and output it to the log.

・Target web page (NFL Schedule)

・Get Full Text Properties

・Get Text Properties

・Execution result

F-pen

“Get Full Text” gets the text for all the weeks that can be selected, but “Get Text” gets only the week that is currently displayed.
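
The difference can be sketched outside UiPath as well. The Python example below is only an analogy, using the standard library's `html.parser`: it reads a small, made-up `<select>` element (the markup is hypothetical, not copied from the real page) and returns either the one option that is displayed or the text of every option.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a week selector; not taken from the real page.
SAMPLE_HTML = """
<select id="week">
  <option>Week 1</option>
  <option selected>Week 2</option>
  <option>Week 3</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collects the text of every <option> and remembers which one is selected."""

    def __init__(self):
        super().__init__()
        self.all_options = []   # text of every option, displayed or not
        self.selected = None    # text of the option currently displayed
        self._in_option = False
        self._is_selected = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._is_selected = "selected" in dict(attrs)

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_option and text:
            self.all_options.append(text)
            if self._is_selected:
                self.selected = text


collector = OptionCollector()
collector.feed(SAMPLE_HTML)

displayed_text = collector.selected            # roughly what Get Text returned
full_text = "\n".join(collector.all_options)   # roughly what Get Full Text returned

print(displayed_text)
print(full_text)
```

Here `displayed_text` holds only the selected option, while `full_text` holds all three options, one per line.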

Data scraping

Data scraping of a single page

Data scraping is used to retrieve tabular data from web pages.

Data scraping is in the ribbon.

How to use data scraping
・Click on [Data Scraping] in the ribbon section

・Click [Next]

・Hover the mouse over a data cell in the table and click it

・Click [Yes]

・Click [Finish].

F-pen

The maximum number of results defaults to 100, so set it to 0 if you don’t want to limit the number of rows retrieved.

・If you don’t want to retrieve data for multiple pages, click [No].

・Verify that the data scraping workflow has been created.

F-pen

Within the “Attach Browser” activity, an “Extract Structured Data” activity will be created.

Extract Structured Data Setting item

Setting location		Setting item	Setting Contents
Properties	Input	ExtractMetadata	An XML string that enables you to define what data to extract from the indicated web page.
		Target.Selector	Text property used to find a particular UI element when the activity is executed.
		Target.TimeoutMS	Specifies the amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown.
		Target.WaitForReady	Before performing the actions, wait for the target to become ready.
		Target.Element	Use the UiElement variable returned by another activity.
		Target.ClippingRegion	Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom.
	Options	DelayBetweenPagesMS	The amount of time, in milliseconds, to wait until the next page is loaded.
		MaxNumberOfResults	The maximum number of results to be extracted.
		NextLinkSelector	The selector that identifies the link/button used to navigate to the next page.
		SendWindowMessages	If selected, in the case where the data that is to be extracted spans multiple pages, the click that changes the page is executed by sending a specific message to the target application.
		SimulateClick	If selected, in the case where the data that is to be extracted spans multiple pages, it simulates the click that changes the page by using the technology of the target application.
	Output	DataTable	The information extracted from the indicated web page.
	Common	DisplayName	The display name of the activity.
	Common	ContinueOnError	Specifies if the automation should continue even when the activity throws an error.
	Misc	Private	If selected, the values of variables and arguments are no longer logged at Verbose level.

Sample Process
Export the Yahoo Sports MLB standings to a CSV file.

・Target site (MLB Standings)

・Extract Structured Data Properties

・Write CSV Properties

・Setting variables

・CSV file output as a result of execution
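
For readers who want to see the same "table in, CSV out" flow as code, here is a rough Python sketch using only the standard library. It is not what Extract Structured Data does internally. The HTML table, team names, and numbers are placeholders I made up, and `TableParser` and `table_to_csv` are hypothetical helper names.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical table markup; team names and numbers are placeholders, not real standings.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>10</td><td>5</td></tr>
  <tr><td>Team B</td><td>8</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Turns the <tr>/<th>/<td> cells of a table into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse the table rows and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The list of rows plays the role of the DataTable variable, and the CSV writer plays the role of the Write CSV activity.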

Data scraping of multiple pages (with repeating links)

When tabular data spans multiple pages and every page has the same link to the next page, data scraping can follow that link and retrieve the data from all of the pages.

Data scraping is in the ribbon.

How to use data scraping
・Click on [Data Scraping] in the ribbon section.

・Click [Next]

・Hover the mouse over a data cell in the table and click it

・Click [Yes]

・Set [Maximum number of results (0 for all)] to 0, then click [Finish]

F-pen

The maximum number of results defaults to 100, so set it to 0 if you don’t want to limit the number of rows retrieved.

・Click [Yes]

・Click the link that moves to the next page (“>” in the following case); the same link must appear on every page

・Verify that the data scraping workflow has been created.
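
The idea behind the steps above (keep following the same next-page link and stop at the maximum number of results, where 0 means no limit) can be sketched in Python as follows. This is a simplified illustration under my own assumptions: `PAGES` is a made-up in-memory stand-in for a website, and `scrape_all_pages` is a hypothetical function, not a UiPath API.

```python
# Hypothetical in-memory "site": each page has some rows and an optional next-page link.
PAGES = {
    "page1": {"rows": [["Game 1"], ["Game 2"]], "next": "page2"},
    "page2": {"rows": [["Game 3"], ["Game 4"]], "next": "page3"},
    "page3": {"rows": [["Game 5"]], "next": None},
}


def scrape_all_pages(start, max_results=100):
    """Collect rows page by page, following the same "next" link each time.

    max_results=0 means no limit, mirroring the wizard's "0 for all" setting.
    """
    collected = []
    current = start
    while current is not None:
        page = PAGES[current]
        collected.extend(page["rows"])
        if max_results and len(collected) >= max_results:
            return collected[:max_results]   # stop early once the limit is reached
        current = page["next"]               # a missing link ends the loop
    return collected


all_rows = scrape_all_pages("page1", max_results=0)
first_three = scrape_all_pages("page1", max_results=3)
print(len(all_rows), len(first_three))
```

With `max_results=0` all five rows are collected, while `max_results=3` stops partway through the second page.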

Sample Process
Get the Boston Celtics’ NBA schedule and output it to a CSV file.

・Target Sites(Boston Celtics Schedule)

・Attach Browser Properties

・Extract Structured Data Properties

・Write CSV Properties

・CSV file of execution results

sea otter

The multi-page schedule data will be output to a CSV file.

Data scraping of multiple pages (no repeating links)

To retrieve multi-page tabular data when the pages have no repeating link, combine data scraping with a Click (or Select Item) activity inside a loop activity.

Sample Process
Output the NFL results for weeks 1 to 3 to a CSV file.

sea otter

Use Select Item to choose a specific week, which moves to that week’s page, and then get the schedule.

・Target Sites(NFL Schedule)

・Attach Browser Properties

・Extract Structured Data Properties

F-pen

The data in the data table will be appended, not overwritten.

・Select Item properties

F-pen

The week passed to Item is specified by a variable.

・Write CSV Properties

・Variables

・CSV file output as a result of execution

sea otter

I was able to get the game results for weeks 1 to 3.
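
As a rough picture of this workflow in code, the Python sketch below loops over weeks 1 to 3, "selects" each week, extracts that week's table, and appends the rows to one table before writing a CSV. Everything here is assumed for illustration: `RESULTS_BY_WEEK` holds placeholder team names rather than real results, and `extract_table` is a hypothetical stand-in for Select Item followed by Extract Structured Data.

```python
import csv
import io

# Hypothetical results keyed by week; a stand-in for the page shown after Select Item.
RESULTS_BY_WEEK = {
    "Week 1": [["Team A", "Team B"]],
    "Week 2": [["Team C", "Team D"]],
    "Week 3": [["Team E", "Team F"]],
}


def extract_table(week):
    """Stand-in for selecting a week and extracting the table that is then shown."""
    return [[week] + row for row in RESULTS_BY_WEEK[week]]


table = []
for number in range(1, 4):              # weeks 1 to 3
    week = f"Week {number}"             # the item to select, built from a variable
    table.extend(extract_table(week))   # append to the same table on every pass

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(table)
print(buffer.getvalue())
```

Because each pass appends to the same table instead of replacing it, the CSV ends up with one row per game across all three weeks.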

Summary

To retrieve the displayed text of an element on a web page, use the “Get Text” activity.
To extract all of the text in an element, including items that are not currently displayed, use the “Get Full Text” activity.
Data scraping is used to retrieve tabular data from web pages.
To retrieve data from multiple pages, either specify the next-page link in data scraping, or combine data scraping with a Click (or Select Item) activity inside a loop activity.

The operator of this blog, F-penIT blog

F-Pen

Japanese IT engineer with a wide range of experience in system development, cloud building, and service planning. In this blog, I share my know-how on UiPath and certification. Twitter: @fpen17