Web Scraping with XPath

by Internal Labs

Activity

Downloads

43.3k

Summary

An XPath-query-based web scraping method that works without a browser.

Overview

Problem:

Scrape HTML pages without opening a browser and without using UI Automation.

Solution:

XPath selectors (https://www.w3schools.com/xml/xpath_intro.asp) applied to HTML parsed by HTML Agility Pack (http://html-agility-pack.net/).

Functional Requirements:

  • Plain HTML pages (the content should be present in the page source, not generated dynamically with JavaScript).
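Why the JavaScript requirement matters: the activity only sees the HTML returned for the URL, not what a browser renders afterwards. The minimal sketch below uses Python and its standard-library `xml.etree.ElementTree` purely for illustration (the activity itself uses HtmlAgilityPack in .NET). The page source is hypothetical: its list items would only be added by a script, so an XPath query over the raw source finds none.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source as returned by a plain HTTP request. The <li>
# items would only be created by the script when a browser runs it.
page_source = """<html><body>
<ul id="results"></ul>
<script>document.getElementById("results").appendChild(document.createElement("li"));</script>
</body></html>"""

root = ET.fromstring(page_source)

# The list exists in the source, but it has no <li> children yet.
items = root.findall(".//ul[@id='results']/li")
print(len(items))  # 0
```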

To get a ready-made XPath selector from Google Chrome:
  • Open the page of interest.
  • Right-click the element you want and choose Inspect.
  • In the Elements panel, right-click the highlighted HTML node and choose Copy > Copy XPath.

You can then adjust the copied selector, for example to make it more generic.
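A copied selector is usually pinned to element positions, such as `div[4]/a[1]` in Test Case 2. The sketch below, in Python with the standard-library `xml.etree.ElementTree` on hypothetical, well-formed markup, shows how removing the positional indexes widens the match from one link to all of them. Note that ElementTree supports only a subset of XPath and needs paths to start with `.//` rather than `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not a real page).
sample = """<html><body><div id="main">
<div><a href="/a.html">A</a><a href="/b.html">B</a></div>
<div><a href="/c.html">C</a></div>
</div></body></html>"""

root = ET.fromstring(sample)

# Position-pinned selector, like a copied one: the first link in the
# first <div> directly under #main.
specific = root.findall(".//*[@id='main']/div[1]/a[1]")

# Without the positional indexes: every link in every such <div>.
generic = root.findall(".//*[@id='main']/div/a")

print([a.get("href") for a in specific])  # → ['/a.html']
print([a.get("href") for a in generic])   # → ['/a.html', '/b.html', '/c.html']
```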

The output is a List of strings.

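In the test cases below, Attribute is either "INNERHTML" (the inner HTML of each matched element) or an HTML attribute name such as "href". The values are VB.NET string expressions, so "" inside a value stands for a single double-quote character.

Test Case 1: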
Attribute: "INNERHTML"
URL: "https://www.w3schools.com/xml/xpath_intro.asp"
XPath Selector: "//*[@id=""main""]/h2"

Test Case 2:
Attribute: "href"
URL: "https://www.w3schools.com/xml/xpath_intro.asp"
XPath Selector: "//*[@id=""main""]/div[4]/a[1]"
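The two test cases can be approximated with the hypothetical `scrape` function below, a Python sketch that mirrors the activity's inputs (XPath selector, Attribute) and its output (a list of strings); it is not the activity's actual implementation. Assumptions: it takes an HTML string instead of a URL so it runs offline (the page could be downloaded with `urllib.request.urlopen`); it uses ElementTree's limited XPath, so the markup must be well-formed and selectors start with `.//`; `"INNERHTML"` returns each element's inner markup, any other value is read as an attribute name, and a missing attribute yields an empty string. The sample markup is made up and is not the w3schools page, and the VB.NET `""main""` is written here as `'main'`.

```python
import xml.etree.ElementTree as ET
from typing import List


def scrape(html: str, xpath: str, attribute: str) -> List[str]:
    """Hypothetical helper: one string per element matched by `xpath`."""
    root = ET.fromstring(html)
    results: List[str] = []
    for node in root.findall(xpath):
        if attribute == "INNERHTML":
            # Inner HTML = the element's leading text plus each child
            # serialized; tostring() also emits the child's tail text.
            inner = (node.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in node
            )
            results.append(inner)
        else:
            # Any other value is treated as an attribute name; a missing
            # attribute yields "" (an assumption of this sketch).
            results.append(node.get(attribute, ""))
    return results


# Made-up, well-formed stand-in markup (not the real w3schools page).
sample = """<html><body><div id="main">
<h2>First heading</h2>
<h2>Second <b>bold</b> heading</h2>
<div><a href="/first.html">First</a><a href="/second.html">Second</a></div>
</div></body></html>"""

# Like Test Case 1: inner HTML of every <h2> directly under #main.
print(scrape(sample, ".//*[@id='main']/h2", "INNERHTML"))
# → ['First heading', 'Second <b>bold</b> heading']

# Like Test Case 2: href of the first link in the first <div> under #main.
print(scrape(sample, ".//*[@id='main']/div[1]/a[1]", "href"))
# → ['/first.html']
```

HtmlAgilityPack, by contrast, tolerates the malformed markup common on real pages, which is the main reason to use it rather than a strict XML parser.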

Features

Simple, fast web scraping with standard XPath selectors and no browser: Internet Explorer, Chrome, or Firefox does not have to be opened to retrieve the HTML data, which removes the browser dependency.

Additional Information

Dependencies

HtmlAgilityPack (1.8.2)

Code Language

C#, Visual Basic

Runtime

Windows Legacy (.NET Framework 4.6.1)

Publisher

Internal Labs

License & Privacy

License Agreement

Privacy Terms

Technical

Version

1.0.1

Updated

July 10, 2023

Works with

Studio: 21.10 - 22.10

Certification

Silver Certified

Support

UiPath Community Support
