Introduction
XPath (XML Path Language) is a query language for navigating elements and attributes in an XML or HTML document. XPath is a W3C Recommendation and a core part of the XSLT standard. It uses a "path like" syntax to identify and navigate individual nodes in an XML document. Before going through this article, you may want to read the previous article, What is HTML Agility Pack, for a quick refresher.
Free Video Tutorial to Learn XPath using HTML Agility Pack
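To make the "path like" syntax concrete, here is a minimal sketch of the most common path forms. To keep it runnable without any NuGet package, it uses .NET's built-in System.Xml XmlDocument on a small, hypothetical, well-formed page (the markup and the XPathPaths class are illustrative only). HTML Agility Pack's HtmlNode.SelectSingleNode accepts the same XPath 1.0 expressions, but parses real-world HTML that System.Xml would reject.

```csharp
using System;
using System.Xml;

// Sketch of XPath path syntax using System.Xml instead of HTML Agility Pack.
// SamplePage is hypothetical markup, not a real page.
public static class XPathPaths
{
    public const string SamplePage =
        "<html><body>" +
        "<div id='header'><h1>Technology Crowds</h1></div>" +
        "<ul><li>XPath</li><li>HTML Agility Pack</li></ul>" +
        "<a href='https://www.technologycrowds.com/'>Home</a>" +
        "</body></html>";

    // Returns the text of the first node (element or attribute) that matches,
    // or null when the expression matches nothing.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SamplePage);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(Select("/html/body/div/h1"));       // absolute path from the root
        Console.WriteLine(Select("//h1"));                    // any h1 anywhere in the document
        Console.WriteLine(Select("//ul/li[2]"));              // second li child (positions start at 1)
        Console.WriteLine(Select("//div[@id='header']/h1"));  // predicate on an attribute value
        Console.WriteLine(Select("//a/@href"));               // selects an attribute node
    }
}
```

The leading `/` anchors a path at the document root, while `//` searches all descendants, which is usually more tolerant of layout changes.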
Components in XPath
XPath provides functions for numeric values, string values, booleans, node handling, sequence handling, date and time comparison, and much more (sequence and date/time functions were introduced in XPath 2.0). Today, XPath expressions can also be used from JavaScript, XML Schema, Java, PHP, Python, C, C++, and many other languages.
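A few of these functions can be tried with .NET's built-in XPathNavigator.Evaluate, which returns a double, string, bool, or node iterator depending on the expression. This is a minimal sketch on hypothetical markup (the XPathFunctions class and the price attribute are illustrative). .NET's built-in engine implements XPath 1.0, so only 1.0 functions such as count(), sum(), normalize-space(), contains(), and starts-with() are shown; the XPath 2.0+ date/time functions are not available there.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Sketch of XPath 1.0 functions; Sample is hypothetical markup.
public static class XPathFunctions
{
    const string Sample =
        "<html><body><ul>" +
        "<li price='10'>  SHA-256   hashing </li>" +
        "<li price='25'>XPath basics</li>" +
        "<li price='5'>HTML Agility Pack</li>" +
        "</ul></body></html>";

    // Evaluates an XPath expression against the sample document.
    public static object Eval(string expression)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Sample)).CreateNavigator();
        return nav.Evaluate(expression);
    }

    public static void Main()
    {
        Console.WriteLine(Eval("count(//li)"));                          // node handling, returns a number
        Console.WriteLine(Eval("sum(//li/@price)"));                     // numeric function
        Console.WriteLine(Eval("normalize-space(//li[1])"));             // string: trims and collapses spaces
        Console.WriteLine(Eval("boolean(//li[contains(., 'XPath')])"));  // boolean result
        Console.WriteLine(Eval("string(//li[starts-with(., 'HTML')]/@price)"));
    }
}
```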
XPath 1.0, XPath 2.0, and XPath 3.0 are all W3C Recommendations, followed by XPath 3.1 in 2017.
Using XPath with the HtmlDocument class
Here we use it to scrape a website and extract the information we need.
XPath Demo to ‘Extract text using XPath’
Step #1: Create an HtmlWeb object
HtmlWeb web = new HtmlWeb();
Step #2: Create an HtmlDocument object
HtmlDocument doc = new HtmlDocument();
Step #3: Load the document that the XPath statement will run against
doc = web.Load("https://www.technologycrowds.com/2019/10/compute-sha-256-hash-using-csharp-for-effective-secruity.html");
Step #4: Extract text using an XPath statement
var _extractText = doc.DocumentNode.SelectSingleNode("/html/body/div[5]/div/div/div/div[1]/div/div/div[2]/div[1]/div[2]/article/div[2]/div/div[2]").InnerText;
Step #5: The complete method demonstrating XPath
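using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class Program
{
    // XPath Method
    static void xPathByHTMLAgility()
    {
        HtmlWeb web = new HtmlWeb();
        // Load() downloads the page and returns a parsed HtmlDocument
        HtmlDocument doc = web.Load("https://www.technologycrowds.com/2019/10/compute-sha-256-hash-using-csharp-for-effective-secruity.html");
        // SelectSingleNode() returns null when the XPath matches nothing
        HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[5]/div/div/div/div[1]/div/div/div[2]/div[1]/div[2]/article/div[2]/div/div[2]");
        var _extractText = node?.InnerText;
        Console.WriteLine(_extractText ?? "No node matched the XPath expression.");
    }

    static void Main() => xPathByHTMLAgility();
}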
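An absolute, position-based path such as the one in Step #4 breaks as soon as the page layout gains or loses a wrapper element, and SelectSingleNode then returns null. The sketch below contrasts an absolute path with a relative, attribute-based one on two versions of a hypothetical page. The markup, the RobustXPath class, and the class name "post-body" are all assumptions for illustration, and it uses System.Xml so it runs without HTML Agility Pack; with HTML Agility Pack you would pass the same expression to doc.DocumentNode.SelectSingleNode.

```csharp
using System;
using System.Xml;

// Sketch: absolute vs relative XPath, with a null check before reading text.
// Original/Redesigned and the "post-body" class are hypothetical.
public static class RobustXPath
{
    public const string Original =
        "<html><body><div><article><div class='post-body'>Article text</div></article></div></body></html>";

    // Same page after a hypothetical redesign that adds one wrapper div.
    public const string Redesigned =
        "<html><body><div><div><article><div class='post-body'>Article text</div></article></div></div></body></html>";

    // Returns the trimmed text of the first match, or null instead of throwing.
    public static string TryExtract(string markup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        const string absolute = "/html/body/div/article/div";
        const string relative = "//article//div[@class='post-body']";

        foreach (var (name, markup) in new[] { ("original", Original), ("redesigned", Redesigned) })
        {
            Console.WriteLine($"{name}: absolute -> {TryExtract(markup, absolute) ?? "(no match)"}");
            Console.WriteLine($"{name}: relative -> {TryExtract(markup, relative) ?? "(no match)"}");
        }
    }
}
```

The relative expression keys on a stable attribute rather than element positions, so it survives the extra wrapper while the absolute path stops matching.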