How To Scrape Links from a URL in ASP.NET
Introduction
The main objective of this post is to demonstrate how to scrape links from a web page (URL) using ASP.NET.
Generally, web pages are scraped in ASP.NET with the "HttpWebRequest" and "HttpWebResponse" classes of C#. However, extracting data from a page this way becomes very difficult, because these classes only return the raw response and leave all of the HTML parsing to you.
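To give a rough idea of what the manual approach involves, here is a minimal sketch. The class and method names (RawLinkScraper, DownloadHtml, ExtractHrefs) are my own and exist only for illustration. The regular expression is deliberately naive: it only picks up quoted href values and will be confused by comments, scripts, or attributes such as data-href, which is exactly the kind of problem an HTML parser avoids.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the sample application.
public static class RawLinkScraper
{
    // Downloads the page body as a string using HttpWebRequest/HttpWebResponse.
    // Assumption: the response is read with StreamReader's default encoding.
    public static string DownloadHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Naive extraction: finds href values that are wrapped in single or double quotes.
    // Unquoted href values are not matched at all.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        var pattern = new Regex(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);

        foreach (Match match in pattern.Matches(html))
        {
            hrefs.Add(match.Groups[1].Value);
        }
        return hrefs;
    }
}
```

You would call RawLinkScraper.ExtractHrefs(RawLinkScraper.DownloadHtml(url)) and then still have to deal with every case the pattern misses.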
Background
In this post, I have created a demonstration web site that scrapes the links from a web page (URL) using ASP.NET.
Here, I have used the third-party library "HtmlAgilityPack" for this example.
You can get more information about this tool by visiting this link.
Code Snippet of Application
Aspx page:-
<div width="100%">
    <asp:TextBox runat="server" ID="txtUrl" Width="300px"></asp:TextBox>
    <asp:Button runat="server" ID="btnscrap" Text="Scrap" OnClick="btnscrap_Click" />
    <br />
    <br />
    <asp:Label runat="server" ID="OutputLabel"></asp:Label>
</div>
Namespaces used:-
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using HtmlAgilityPack;
cs page:-
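protected void btnscrap_Click(object sender, EventArgs e)
{
    // Clear earlier output so repeated clicks do not append to the previous list.
    OutputLabel.Text = string.Empty;

    // Download and parse the page at the URL typed into the text box.
    var webGet = new HtmlWeb();
    var document = webGet.Load(txtUrl.Text.Trim());

    // Select every anchor element; SelectNodes returns null when nothing matches.
    var aTags = document.DocumentNode.SelectNodes("//a");
    int counter = 1;
    if (aTags != null)
    {
        foreach (var aTag in aTags)
        {
            // Skip anchors that have no href attribute.
            if (aTag.Attributes.Contains("href"))
            {
                OutputLabel.Text += counter + ". " + aTag.InnerHtml + " - " + aTag.Attributes["href"].Value + "<br />";
                counter++;
            }
        }
    }
}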
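Many of the href values you get back will be relative (for example "page2.aspx") or will not be web links at all (for example "mailto:" addresses). As a possible refinement, here is a small sketch that resolves an href against the page URL using the standard System.Uri class. The LinkResolver class and its ToAbsolute method are hypothetical names of mine and are not part of the sample download.

```csharp
using System;

// Hypothetical helper class, not part of the sample application.
public static class LinkResolver
{
    // Returns the absolute http/https form of href, resolved against pageUrl.
    // Returns null when either value cannot be parsed or the result is not a web link.
    public static string ToAbsolute(string pageUrl, string href)
    {
        Uri baseUri;
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri))
        {
            return null;
        }

        Uri resolved;
        if (!Uri.TryCreate(baseUri, href, out resolved))
        {
            return null;
        }

        // Keep only http and https links; this drops mailto:, javascript: and similar schemes.
        if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
        {
            return null;
        }

        return resolved.AbsoluteUri;
    }
}
```

Inside the foreach loop you could pass txtUrl.Text.Trim() and the href value to LinkResolver.ToAbsolute and skip the entry when it returns null.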
Everything is done now. You just have to run the application.
Snaps:-
Enter a URL and click Scrap.
You will get the list of links on that URL.
You can download the sample code from the link below:-
Scraping Example
Please give your feedback and comments on this post, or mail me at samarthsolomon89@gmail.com.
Note:- To use the "HtmlAgilityPack" namespace, you need to add a reference to the library in your project. You can download "HtmlAgilityPack" from:-
http://htmlagilitypack.codeplex.com/
It is also published on NuGet as the HtmlAgilityPack package.