
Read text data in PDF files using OCR in UiPath

This article describes how to use UiPath’s OCR to extract text data from PDF files.


A note from the operator of F-penIT blog:

This site was created by translating a blog written in Japanese into English using DeepL.

Please forgive any English that reads a little strangely.

Get OCR Text

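To read text data in a PDF file with OCR, use the Get OCR Text activity together with an OCR engine. Get OCR Text scrapes a UI element on the screen, so the PDF must be open in a viewer window while the workflow runs.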
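The relationship between Get OCR Text and its engine can be pictured as a container that delegates recognition to whichever engine is placed inside it. The Python sketch below is only an analogy and not UiPath code: `get_ocr_text`, `OcrResult`, and `fake_engine` are hypothetical names, and the engine is a stub that returns fixed words instead of running real OCR.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for one recognized word and its screen rectangle
# (left, top, width, height). This is not a UiPath type.
Word = Tuple[str, Tuple[int, int, int, int]]


@dataclass
class OcrResult:
    text: str          # plays the role of the Text output
    words: List[Word]  # plays the role of the WordsInfo output


# An "engine" is any callable that turns image bytes into words.
Engine = Callable[[bytes], List[Word]]


def get_ocr_text(image: bytes, engine: Engine) -> OcrResult:
    """Delegate recognition to the engine, then assemble both outputs."""
    words = engine(image)
    text = " ".join(word for word, _ in words)
    return OcrResult(text=text, words=words)


def fake_engine(image: bytes) -> List[Word]:
    """Stub engine: ignores the image and returns fixed words."""
    return [("Invoice", (10, 10, 60, 12)), ("No.", (75, 10, 25, 12))]


result = get_ocr_text(b"", fake_engine)
print(result.text)
```

Swapping `fake_engine` for another callable changes how the words are recognized without touching `get_ocr_text`, which mirrors how the three engines in this article are exchanged inside the same activity.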

Get OCR Text setting items

| Setup location | Setting item | Configuration details |
| Output | Text | The string extracted from the indicated UI element. |
| | WordsInfo | The screen coordinates of each word found in the indicated UI element. |
| Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level. |
| | Target.Selector | Text property used to find a particular UI element when the activity is executed. |
| | Target.TimeoutMS | Specifies the amount of time (in milliseconds) to wait for the activity to run before the SelectorNotFoundException error is thrown. |
| | Target.WaitForReady | Before performing the actions, wait for the target to become ready. |
| | Target.Element | Use the UiElement variable returned by another activity. |
| | Target.ClippingRegion | Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom. |
| Common | DisplayName | The display name of the activity. |
| | ContinueOnError | Specifies if the automation should continue even when the activity throws an error. |
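How Target.TimeoutMS and ContinueOnError interact is easier to see in code. The following is a minimal Python sketch of the general "poll until found or time out" pattern, not UiPath's implementation: `wait_for_target`, `run_activity`, and `SelectorNotFoundError` are hypothetical names chosen for illustration, and the polling interval is an assumption.

```python
import time
from typing import Callable, Optional


class SelectorNotFoundError(Exception):
    """Hypothetical Python analogue of SelectorNotFoundException."""


def wait_for_target(find: Callable[[], Optional[str]],
                    timeout_ms: int,
                    poll_ms: int = 10) -> str:
    """Poll `find` until it returns a target or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        target = find()
        if target is not None:
            return target
        if time.monotonic() >= deadline:
            raise SelectorNotFoundError(f"no target within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)


def run_activity(find: Callable[[], Optional[str]],
                 timeout_ms: int,
                 continue_on_error: bool) -> Optional[str]:
    """Return the target, or None when the error is suppressed."""
    try:
        return wait_for_target(find, timeout_ms)
    except SelectorNotFoundError:
        if continue_on_error:
            return None  # the workflow carries on without a result
        raise


# A target that only "appears" on the third lookup.
attempts = {"count": 0}


def appears_on_third_try() -> Optional[str]:
    attempts["count"] += 1
    return "pdf_viewer_window" if attempts["count"] >= 3 else None


found = run_activity(appears_on_third_try, timeout_ms=1000, continue_on_error=False)
missing = run_activity(lambda: None, timeout_ms=50, continue_on_error=True)
print(found, missing)
```

With `continue_on_error=False` the second call would raise instead of returning `None`, which corresponds to the workflow stopping at the activity.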

Use Tesseract OCR as the engine

This section reads a PDF file with Tesseract OCR set as the OCR engine.

Tesseract OCR configuration items

| Setup location | Setting item | Configuration details |
| Options | AllowedCharacters | The OCR engine extracts the given string according to the characters specified here. |
| | DeniedCharacters | The OCR engine extracts the given string without taking into account the characters specified here. |
| | Invert | If this check box is selected, the colors of the UI element are inverted before scraping. |
| | Language | The language used by the OCR engine to extract the string from the UI element. |
| | ExtractWords | If this check box is selected, the on-screen position of each detected word is extracted. |
| | Profile | Choose a preprocessing profile for the specified image or UI element to achieve a better OCR read. |
| | Scale | The scaling factor of the selected UI element or image. |
| Output | Text | The extracted string. |
| | Result | The extracted words along with their on-screen position. |
| Input | Image | The image that you want to process. |
| Common | DisplayName | The display name of the activity. |
| Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level. |
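The effect of AllowedCharacters and DeniedCharacters on the result can be illustrated with a small Python sketch. This is an assumption-laden analogy: it filters an already extracted string, whereas the real engine applies the restriction while recognizing. `filter_characters` is a hypothetical helper, and the sample string is made up.

```python
def filter_characters(text: str, allowed: str = "", denied: str = "") -> str:
    """Keep only `allowed` characters (when given) and drop `denied` ones."""
    kept = []
    for ch in text:
        if allowed and ch not in allowed:
            continue  # an allow-list is set and this character is not on it
        if ch in denied:
            continue  # the character is explicitly denied
        kept.append(ch)
    return "".join(kept)


raw = "Total: 1,280 JPY"  # made-up OCR output

digits_only = filter_characters(raw, allowed="0123456789")
no_punctuation = filter_characters(raw, denied=":,")

print(digits_only)
print(no_punctuation)
```

An allow-list is the stricter of the two: everything not listed disappears, which suits fields such as amounts or dates, while a deny-list only removes the characters that are known to cause misreads.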

Sample process

Read the text of a PDF and output it to the log.

– workflow

– Target PDF


– Get OCR Text ‘text It’ Properties


– Properties of “Tesseract OCR”

– variable

– Execution Result

Rakko-kun: I was able to output the PDF text to the log.

Use UiPath Screen OCR as the engine

This section reads a PDF file with UiPath Screen OCR set as the OCR engine.

UiPath Screen OCR setting items

| Setup location | Setting item | Configuration details |
| Common | DisplayName | The display name of the activity. |
| Input | Image | The image that you want to process. |
| Logon | ApiKey | The API key used to provide you access to the UiPath Screen OCR (not required for the Preview period). |
| | Endpoint | The endpoint for UiPath Screen OCR. The default project settings value is https://ocr.uipath.com/. |
| | Timeout (milliseconds) | Specifies the amount of time (in milliseconds) to wait for a response from the server before an error is thrown. |
| Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level. |
| | UseLocalServer | Determines if a local server should be used. |
| Output | Result | Provides the extracted words along with their on-screen position. |
| | Text | Provides the extracted text. |
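Because this engine depends on an endpoint and a timeout, it can help to sanity-check those values before a run. The sketch below is a hypothetical Python model of the settings above, not part of UiPath: `ScreenOcrSettings` and `problems` are invented names, and only the default endpoint is taken from the table. No request is sent to the service.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ScreenOcrSettings:
    timeout_ms: int
    endpoint: str = "https://ocr.uipath.com/"  # default quoted in the table
    api_key: Optional[str] = None              # optional in this sketch
    use_local_server: bool = False

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the values look usable."""
        issues = []
        parsed = urlparse(self.endpoint)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            issues.append("endpoint must be an http(s) URL")
        if self.timeout_ms <= 0:
            issues.append("timeout_ms must be positive")
        return issues


good = ScreenOcrSettings(timeout_ms=30000)
bad = ScreenOcrSettings(timeout_ms=0, endpoint="ocr.uipath.com")

print(good.problems())
print(bad.problems())
```

The 30000 ms value is only an example for the sketch and is not a documented default.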

Sample process

Read the text of a PDF and output it to the log.

– workflow

– Get OCR Text ‘document’ Properties

– UiPath Screen OCR Properties

– variable

– PDF to be read

– Execution Result Log

Rakko-kun: I was able to output the PDF text to the log.

Use OCR for Chinese, Japanese and Korean as the engine

This section reads a PDF file with OCR – Chinese, Japanese, Korean set as the OCR engine.

OCR for Chinese, Japanese and Korean settings

| Setup location | Setting item | Configuration details |
| Common | DisplayName | The display name of the activity. |
| Input | Image | The image that you want to process. |
| Logon | ApiKey | The API key used to provide you access to Document Understanding, when the OCR for Chinese, Japanese and Korean is used with Document Understanding activities. |
| | Endpoint | The endpoint associated with your Document Understanding API key. |
| Misc | Private | If selected, the values of variables and arguments are no longer logged at Verbose level. |
| Output | Result | Provides the extracted words and information about their position on the screen. |
| | Text | Provides the extracted text. This field supports only String variables. |

Sample process

Reads PDF text and outputs it to the log.

– workflow

– Get OCR Text ‘client’ Properties

– OCR – Japanese, Chinese, Korean Properties

F-pen: To obtain an API key, go to “Admin → Licenses → Robots & Service → Computer Vision” in Automation Cloud and click “Api Key”.


Rakko-kun: For endpoint URLs, see Public Endpoints.

– variable

– PDF to be read

– Execution Result Log

Rakko-kun: I was able to output the PDF text to the log.

Summary

  1. Use Get OCR Text together with one of the OCR engines to read PDF text.
  2. To read Japanese text, use OCR – Chinese, Japanese, Korean.
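The second point can be written down as a tiny lookup. This Python sketch is an illustration only: `choose_engine` is a hypothetical helper, the language codes are an assumption about how a workflow might label its documents, and the article itself only states the rule for Japanese (Chinese and Korean are inferred from the engine's name). Falling back to Tesseract OCR is also just a default picked for the example.

```python
# Language codes assumed to need the CJK engine (inferred from the engine name).
CJK_LANGUAGES = {"ja", "zh", "ko"}


def choose_engine(language_code: str, default: str = "Tesseract OCR") -> str:
    """Pick an engine name from a language code such as "ja" or "ja-JP"."""
    code = language_code.strip().lower().split("-")[0]
    if code in CJK_LANGUAGES:
        return "OCR - Chinese, Japanese, Korean"
    return default


print(choose_engine("ja-JP"))
print(choose_engine("en"))
print(choose_engine("en", default="UiPath Screen OCR"))
```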


ABOUT ME
F-Pen
Japanese IT engineer with a wide range of experience in system development, cloud building, and service planning. On this blog, I share my know-how on UiPath and certifications. Twitter: @fpen17