by Internal Labs
Activity
43.3k
Summary
XPath-based web scraping method that does not require a browser.
Overview
Problem:
Scrape HTML pages without opening a browser and without UI Automation.
Solution:
XPath selectors (https://www.w3schools.com/xml/xpath_intro.asp) evaluated with HTML Agility Pack (http://html-agility-pack.net/).
Functional Requirements:
To get a ready-made XPath selector, open the page of interest in Google Chrome, right-click an element, choose Inspect, right-click the highlighted HTML definition, choose Copy, and then "Copy XPath". You can then adjust that selector, for example to make it more generic.
The output is a List of strings.
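As a rough sketch of what making a copied selector "more generic" can look like, the C# snippet below compares a positional selector of the kind Chrome typically copies with a looser one. The HTML fragment, both selectors, and the `CountMatches` helper are made up for illustration. The code calls HtmlAgilityPack directly and is not the activity's own implementation.

```csharp
using System;
using HtmlAgilityPack; // NuGet package listed under Dependencies

public static class SelectorDemo
{
    // Hypothetical helper: counts how many nodes an XPath selector matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Illustrative fragment only; not taken from any real page.
        const string html =
            "<div id=\"main\">" +
            "<div><a href=\"/a\">A</a></div>" +
            "<div><a href=\"/b\">B</a><a href=\"/c\">C</a></div>" +
            "</div>";

        // Positional selector: only the first link in the second inner div.
        Console.WriteLine(CountMatches(html, "//*[@id=\"main\"]/div[2]/a[1]"));

        // More generic selector: every link anywhere under the "main" element.
        Console.WriteLine(CountMatches(html, "//*[@id=\"main\"]//a"));
    }
}
```

Positional selectors tend to break when the page layout shifts, so replacing index steps such as `div[2]` with `//` or with attribute predicates is usually the first adjustment to try.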
Test Case 1:
Attribute: "INNERHTML"
URL: "https://www.w3schools.com/xml/xpath_intro.asp"
XPath Selector: "//*[@id=""main""]/h2"
Test Case 2:
Attribute: "href"
URL: "https://www.w3schools.com/xml/xpath_intro.asp"
XPath Selector: "//*[@id=""main""]/div[4]/a[1]"
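The two test cases can be approximated outside of Studio with a few lines of C# against HtmlAgilityPack. This is a minimal sketch under stated assumptions, not the activity's source. The `XPathScraper.Extract` helper is hypothetical. Treating "INNERHTML" as a keyword for the node's inner markup, and any other value as an attribute name, is an assumption inferred from the test cases. The inline HTML is a stand-in so the example runs without network access. The doubled quotes in the selectors above appear to be VB-style string escaping; in C# the same selector is written with `\"`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package listed under Dependencies

public static class XPathScraper
{
    // Hypothetical helper mirroring the listing's inputs (HTML, XPath selector,
    // attribute) and its output (a list of strings).
    public static List<string> Extract(string html, string xpath, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // Assumption: "INNERHTML" selects the inner markup; anything else
            // is looked up as an attribute name on the matched node.
            if (string.Equals(attribute, "INNERHTML", StringComparison.OrdinalIgnoreCase))
            {
                results.Add(node.InnerHtml);
            }
            else
            {
                results.Add(node.GetAttributeValue(attribute, string.Empty));
            }
        }
        return results;
    }

    public static void Main()
    {
        // Stand-in fragment; not the content of the w3schools page.
        const string html =
            "<div id=\"main\"><h2>First</h2><h2>Second</h2>" +
            "<a href=\"/next.asp\">Next</a></div>";

        // Same shape as Test Case 1: inner HTML of every matching h2.
        foreach (string s in Extract(html, "//*[@id=\"main\"]/h2", "INNERHTML"))
        {
            Console.WriteLine(s);
        }

        // Same shape as Test Case 2: the href attribute of a matching link.
        foreach (string s in Extract(html, "//*[@id=\"main\"]/a[1]", "href"))
        {
            Console.WriteLine(s);
        }
    }
}
```

To run against a live page instead of an inline string, HtmlAgilityPack's `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way. The results then depend on the page's current markup.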
Features
Simple, fast web scraping with standard XPath selectors. No browser (IE, Chrome, or Firefox) has to be open to retrieve HTML data, which removes the browser dependency.
Additional Information
Dependencies
HtmlAgilityPack (1.8.2)
Code Language
C#, Visual Basic
Runtime
Windows Legacy (.NET Framework 4.6.1)
Technical
Version
1.0.1
Updated
July 10, 2023
Works with
Studio: 21.10 - 22.10
Certification
Silver Certified
Support
UiPath Community Support