Introduction
So far, we have covered what HTML Agility Pack is in "What is HTML Agility Pack", and how to install it in "HAP: Learn Install HTML agility pack and Load an HTML Document". After that, you put the library to work by loading an HTML document and extracting every href value on a web page in "Learn HAP: Extract all Href value from HTML Document using HTML agility pack". In this tutorial, we go one step further and extract meta information from a website using HTML Agility Pack. Along the way, you will also learn the basics of scraping a website with it.
Get the Meta Info using HAP
Namespace
    using System; // needed for Console
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using HtmlAgilityPack;
Load HTML document using HTML Agility Pack
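You have already learnt how to load an HTML document using HTML Agility Pack. The snippet below loads the page and passes it to GetMetaInformation, which is defined in the next section.

    HtmlWeb web = new HtmlWeb();
    // web.Load downloads and parses the page, so no separate "new HtmlDocument()" is needed.
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://technologyCrowds.com");
    GetMetaInformation(doc, "description");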
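If you want to experiment without making an HTTP request, the same kind of document can be built from a string with LoadHtml. The following is a minimal sketch only: the class name OfflineLoadDemo, the Parse helper, and the sample markup are all assumptions made for illustration and are not part of the original tutorial.

```csharp
using System;
using HtmlAgilityPack;

static class OfflineLoadDemo
{
    // Hypothetical sample markup standing in for a downloaded page.
    const string SampleHtml =
        "<html><head><title>Sample</title>" +
        "<meta name=\"description\" content=\"A sample page\">" +
        "</head><body></body></html>";

    // Hypothetical helper: parses markup held in memory, no network access.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    static void Main()
    {
        HtmlDocument doc = Parse(SampleHtml);

        // SelectSingleNode returns null when nothing matches, so check before use.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null ? title.InnerText : "(no title)");
    }
}
```

A document parsed this way can be passed to the same extraction code as one loaded with HtmlWeb, which makes it convenient for trying out XPath expressions.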
Creating Methods to Extract Meta Information
    static void GetMetaInformation(HtmlAgilityPack.HtmlDocument htmldoc, string value)
    {
        // Find the first <meta> tag whose name attribute matches the requested value.
        HtmlNode tcNode = htmldoc.DocumentNode.SelectSingleNode("//meta[@name='" + value + "']");
        if (tcNode != null)
        {
            // The indexer returns null when the tag has no content attribute.
            HtmlAttribute desc = tcNode.Attributes["content"];
            if (desc != null)
            {
                Console.ForegroundColor = ConsoleColor.Red;
                Console.Write(desc.Value);
                Console.ResetColor();
                Console.ReadLine(); // keep the console window open
            }
        }
    }
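The method above reads one meta tag at a time. As a possible extension, the sketch below collects every named meta tag into a dictionary in one pass. The MetaReader class, its ReadAll method, and the sample markup are hypothetical names introduced here for illustration. It also assumes the default HTML Agility Pack behaviour in which SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HTML Agility Pack itself.
static class MetaReader
{
    // Returns name -> content for every <meta> tag that has a name attribute.
    public static Dictionary<string, string> ReadAll(HtmlDocument doc)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Assumed default: SelectNodes yields null when no node matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//meta[@name]");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the given default if the attribute is missing.
            string name = node.GetAttributeValue("name", string.Empty);
            string content = node.GetAttributeValue("content", string.Empty);
            if (name.Length > 0)
            {
                result[name] = content; // a later duplicate overwrites an earlier one
            }
        }
        return result;
    }
}

static class MetaReaderDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical sample markup; replace with a document loaded through HtmlWeb.
        doc.LoadHtml("<html><head>" +
                     "<meta name=\"description\" content=\"Sample description\">" +
                     "<meta name=\"keywords\" content=\"html, parsing\">" +
                     "</head><body></body></html>");

        foreach (KeyValuePair<string, string> pair in MetaReader.ReadAll(doc))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Using GetAttributeValue with a default avoids a null check on each attribute, and the case-insensitive dictionary lets callers look up "Description" and "description" interchangeably.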
The steps above show how to extract meta information from a website using HTML Agility Pack.