Introduction
I hope you have gone through my previous C# learning articles. In this article, we'll learn how to read an XML sitemap from a website using C#.
An XML sitemap is a specially structured XML file that lists the
important pages of a website for the major search
engine crawlers (e.g. Google, Bing) for the purpose of indexing.
The following XML shows the structure of a basic sitemap:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://seoagilitytools.com/</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>1.00</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/About-Us</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Meta-Tag-Extractor</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Count-Characters-And-Words</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Compute-Free-Sha-512-Hash-Online</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Compute-Free-Sha-384-Hash-Online</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Compute-Free-Sha-256-Hash-Online</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/Compute-Free-Sha-Md5-Hash-Online</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://seoagilitytools.com/HTML-Entities-Encoder-Decoder</loc>
    <lastmod>2020-04-16T15:46:46+00:00</lastmod>
    <priority>0.80</priority>
  </url>
</urlset>
Note that the <url> element is repeated inside
the <urlset> element as many times as required. Each <url> describes
one page on the website.
Nodes inside URL node
Four child nodes can appear inside a <url> node:
<loc>, <priority>, <lastmod>, and <changefreq>. Only <loc> is required; the other three are optional.
- The <loc></loc> node contains the URL of the web page.
- The <priority></priority> node contains the priority of this page relative to the other pages of the same site, from 0.0 to 1.0, as set by the webmaster.
- The <lastmod></lastmod> node contains the date on which the web page was last modified.
- The <changefreq></changefreq> node indicates how frequently the web page is likely to change, which gives the search engine a hint about how often to crawl it again.
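To make the four nodes concrete, here is a minimal sketch of how one <url> entry could be mapped to typed C# values. It uses System.Xml.Linq rather than the XmlDocument approach shown later in this article, and the SitemapEntry and SitemapParser names are hypothetical, introduced only for illustration. Optional nodes that are missing or unparseable are left as null.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container for one <url> entry; not part of any library.
public sealed class SitemapEntry
{
    public string Loc { get; set; }               // required by the sitemap format
    public DateTimeOffset? LastMod { get; set; }  // optional
    public string ChangeFreq { get; set; }        // optional, e.g. "daily"
    public double? Priority { get; set; }         // optional, 0.0 to 1.0
}

public static class SitemapParser
{
    // All sitemap elements live in this XML namespace.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static List<SitemapEntry> Parse(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Descendants(Ns + "url")
            .Select(u => new SitemapEntry
            {
                // Casting a missing (null) element to string yields null.
                Loc = ((string)u.Element(Ns + "loc"))?.Trim(),
                LastMod = ParseDate((string)u.Element(Ns + "lastmod")),
                ChangeFreq = ((string)u.Element(Ns + "changefreq"))?.Trim(),
                Priority = ParsePriority((string)u.Element(Ns + "priority")),
            })
            .ToList();
    }

    private static DateTimeOffset? ParseDate(string text)
    {
        DateTimeOffset value;
        // AssumeUniversal: values without an offset are treated as UTC.
        bool ok = DateTimeOffset.TryParse(text?.Trim(), CultureInfo.InvariantCulture,
                                          DateTimeStyles.AssumeUniversal, out value);
        return ok ? value : (DateTimeOffset?)null;
    }

    private static double? ParsePriority(string text)
    {
        double value;
        // InvariantCulture so that "0.80" parses regardless of the machine's locale.
        bool ok = double.TryParse(text?.Trim(), NumberStyles.Float,
                                  CultureInfo.InvariantCulture, out value);
        return ok ? value : (double?)null;
    }
}
```

Calling SitemapParser.Parse with the sitemap text returns one SitemapEntry per <url> node, so the caller can check entry.Priority.HasValue instead of testing strings for null.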
Sample C# console application on reading XML site map
The following sample console application shows how to read an XML sitemap file in C#. We create an instance of the WebClient class from the System.Net namespace to download the file, load it into an XmlDocument from the System.Xml namespace, and loop over the nodes to collect the data from the sitemap. In this example we simply print the data to the console, but in practice this information could be stored in a database, written to a file, or used elsewhere.
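As an illustration of the "written to a file" option, the sketch below turns the sitemap text into CSV, one row per <url> node. The SitemapCsv class, the column order, and the output file name mentioned afterwards are assumptions made for this example, not part of the original sample.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: converts sitemap XML into CSV text.
public static class SitemapCsv
{
    public static string ToCsv(string sitemapXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(sitemapXml);

        var sb = new StringBuilder();
        sb.Append("loc,priority,lastmod,changefreq\n");

        // Same node lookup as the console sample: every <url> element.
        foreach (XmlNode node in doc.GetElementsByTagName("url"))
        {
            sb.Append(Escape(node["loc"]?.InnerText)).Append(',')
              .Append(Escape(node["priority"]?.InnerText)).Append(',')
              .Append(Escape(node["lastmod"]?.InnerText)).Append(',')
              .Append(Escape(node["changefreq"]?.InnerText)).Append('\n');
        }
        return sb.ToString();
    }

    // Missing nodes become empty cells; values containing commas,
    // quotes, or line breaks are wrapped in double quotes.
    private static string Escape(string value)
    {
        if (string.IsNullOrEmpty(value)) return "";
        value = value.Trim();
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

The returned text can then be saved with System.IO.File.WriteAllText("sitemap.csv", csv), where the file name is only a placeholder.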
Let's start
Create your own Sitemap.xml file or use the sample file included with the
.NET SDK Quick Starts.
- First, copy the Sitemap.xml file to the \Inetpub\Wwwroot folder on your PC.
- Open Visual Studio.
- Create a new C# Console Application. You can either go straight to the complete code listing or continue through the following steps to build up the application.
- Import the System.Net and System.Xml namespaces so that you don't have to fully qualify the WebClient and XmlDocument class names later in your code.
using System;
using System.Net;
using System.Xml;

public class Program
{
    public static void Main()
    {
        string sitemapURL = "https://seoagilitytools.com/sitemap.xml";

        /* Create a new instance of the System.Net WebClient */
        WebClient wc = new WebClient();

        /* Set the encoding on the WebClient */
        wc.Encoding = System.Text.Encoding.UTF8;

        /* Download the document as a string */
        string sitemapString = wc.DownloadString(sitemapURL);

        /* Create a new XML document */
        XmlDocument urldoc = new XmlDocument();

        /* Load the downloaded string as XML */
        urldoc.LoadXml(sitemapString);

        /* Create a list of XML nodes from the url nodes in the sitemap */
        XmlNodeList xmlSitemapList = urldoc.GetElementsByTagName("url");

        /* Loop through the node list and print the values of each node */
        foreach (XmlNode node in xmlSitemapList)
        {
            if (node["loc"] != null)
            {
                Console.WriteLine("url " + node["loc"].InnerText);
            }
            if (node["priority"] != null)
            {
                Console.WriteLine("priority " + node["priority"].InnerText);
            }
            if (node["lastmod"] != null)
            {
                Console.WriteLine("last modified " + node["lastmod"].InnerText);
            }
            if (node["changefreq"] != null)
            {
                Console.WriteLine("change frequency " + node["changefreq"].InnerText);
            }
            Console.WriteLine(Environment.NewLine);
        }
    }
}
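WebClient is marked obsolete in .NET 6 and later, so on newer frameworks you may prefer HttpClient for the download step. The following is a sketch of that variation, not the method used in this article. The SitemapReader class name and the "sm" prefix are assumptions chosen for the example. It also selects the <loc> nodes with a namespace-aware XPath query instead of GetElementsByTagName.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper: HttpClient-based download plus XPath extraction.
public static class SitemapReader
{
    private const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // A single shared HttpClient instance is reused for all requests.
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the sitemap and returns its text.
    public static Task<string> DownloadAsync(string sitemapUrl)
    {
        return Http.GetStringAsync(sitemapUrl);
    }

    // Returns the text of every <loc> node found under <urlset>/<url>.
    public static List<string> ExtractLocs(string sitemapXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(sitemapXml);

        // XPath needs an explicit prefix for the sitemap's default namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("sm", SitemapNs);

        var locs = new List<string>();
        foreach (XmlNode loc in doc.SelectNodes("/sm:urlset/sm:url/sm:loc", ns))
        {
            locs.Add(loc.InnerText.Trim());
        }
        return locs;
    }
}
```

From an async Main method, the two steps could be combined as: string xml = await SitemapReader.DownloadAsync("https://seoagilitytools.com/sitemap.xml"); followed by SitemapReader.ExtractLocs(xml).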
Wrapping Up
In the above program, we created a new instance of the
'System.Net.WebClient' class to download the target sitemap as
a string. To load the downloaded sitemap as XML, we created a new
'System.Xml.XmlDocument'. The XML node list contains all the <url> nodes in the sitemap. The rest of the code loops through the child elements of the
<url> nodes and prints out the information in each node. Press Ctrl+F5 in Visual Studio to run the console
application; for every page it prints the URL, priority, last modified date, and change frequency, whenever those nodes are present.
By now, you should have a good idea of
how to read an XML sitemap from a C#-based website or application.