Overview
The previous article, HtmlAgility Search By Text, detailed one of the important utilities provided by the HTML Agility Pack. Another useful ability is extracting the favicon of a particular web page. If you are new to the HTML Agility Pack, consider reading What is HTML Agility Pack first, as there are quite a few topics to get acquainted with. For a quick recap of its traversal abilities, see HTML Traversing using Agility Pack.
What is a favicon?
In short, a favicon is a small icon that identifies a website wherever the browser lists it: shortcuts, tabs, and bookmarks. A web designer includes it as part of the page, and the browser displays it next to the address bar and, if the user has bookmarked the page, in the bookmarks list. If the browser supports tabs, the image is shown beside the page title on the tab.
How to extract favicon?
Typically, a favicon is declared in the page's HTML markup. To extract the favicon from a website, you therefore need some prior knowledge of HTML manipulation with the HTML Agility Pack, which you can get by referring here. Follow the steps below to extract the favicon.
Step #1
Store the website URL in a local variable.
Step #2
Next, declare a variable of type HtmlWeb and use it to load the document into another variable.
Step #3
Declare a favicon variable and initialize it to null cast to dynamic.
Step #4
Get the node that declares the favicon by using SelectSingleNode as shown below.
var el = htmlDoc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");
Step #5
If the node is not null, get the value of its href attribute.
Step #6
Display the resulting information on the console or use it for your own purposes.
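Steps 4 and 5 can also be tried without any network call by parsing an HTML string directly. The sketch below is only an illustration: the class name FaviconSketch, the helper FindIconHref, and the sample markup are made up for this example and are not part of the HTML Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

public static class FaviconSketch
{
    // Hypothetical helper: returns the href of the <link rel="icon"> in the head,
    // or null when the page declares none.
    public static string FindIconHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step #4: same XPath as above; it only matches nodes that have an href.
        var el = doc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");

        // Step #5: read the attribute only when a node was matched.
        return el?.Attributes["href"].Value;
    }

    public static void Main()
    {
        // Made-up sample page, used only for this illustration.
        var sample = "<html><head><title>Demo</title>"
                   + "<link rel=\"icon\" href=\"/favicon.ico\"></head><body></body></html>";

        Console.WriteLine(FindIconHref(sample) ?? "(no favicon link found)");
    }
}
```

Parsing a string this way makes it easy to check the XPath against different markup before pointing the code at a live site.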
The complete program below puts these steps together to extract the favicon from the web page.
using System;
using System.Net;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // website URL
        var url = @"https://www.TechnologyCrowds.com/";

        // allow the TLS versions needed for the HTTPS request
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;

        // declare HtmlWeb and load the HTML document
        HtmlWeb web = new HtmlWeb();
        var htmlDoc = web.Load(url);

        var favicon = (dynamic)null;

        // extract the <link rel="icon"> node from the document head
        var el = htmlDoc.DocumentNode.SelectSingleNode("/html/head/link[@rel='icon' and @href]");
        if (el != null)
        {
            favicon = el.Attributes["href"].Value;

            // show the output
            Console.WriteLine(Convert.ToString(favicon));
        }
    }
}
Output
https://www.technologycrowds.com/favicon.ico
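The XPath used above matches only rel='icon' exactly, and the program prints the href as it is written in the page. Some pages declare the icon as rel="shortcut icon" or use a relative href, so a more tolerant variant may be useful. The following is a minimal sketch under stated assumptions: FaviconResolver and Resolve are hypothetical names, the sample markup and example.com URL are invented, and the fallback to /favicon.ico relies on the common convention of serving the icon from the site root, which a given site may not follow.

```csharp
using System;
using HtmlAgilityPack;

public static class FaviconResolver
{
    // Matches <link> elements whose rel contains the token "icon" in any letter case,
    // e.g. rel="icon" or rel="shortcut icon", but not rel="apple-touch-icon".
    private const string IconXPath =
        "//link[@href and contains(concat(' ', normalize-space(translate(@rel, " +
        "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')), ' '), ' icon ')]";

    // Hypothetical helper: returns an absolute favicon URL for the given page.
    public static Uri Resolve(HtmlDocument doc, Uri pageUri)
    {
        var el = doc.DocumentNode.SelectSingleNode(IconXPath);
        var href = el?.GetAttributeValue("href", string.Empty);

        // Assumption: with no usable <link>, try the conventional /favicon.ico at the site root.
        if (string.IsNullOrWhiteSpace(href))
            return new Uri(pageUri, "/favicon.ico");

        // Uri(base, relative) keeps absolute hrefs as they are and resolves relative ones.
        return new Uri(pageUri, href);
    }

    public static void Main()
    {
        // Made-up markup and page address, used only for this illustration.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><link rel=\"shortcut icon\" href=\"img/fav.png\"></head><body></body></html>");

        Console.WriteLine(Resolve(doc, new Uri("https://example.com/blog/post.html")));
    }
}
```

The sketch does not account for a base element in the page, and a malformed href can make the Uri constructor throw, so production code would likely need extra handling for both.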