how to spider web page with button and hyperlink

I have been writing C# programs to spider yellow pages sites and save a
list of restaurant names and addresses to a database. When I encounter a
button or hyperlink, I don’t know how to make the program click it. Does
anyone have sample code for this in either C# or VB.NET?
Thanks,
Charts

Jun 27 '08 #1
Hi Charts,

From your description, you're writing a custom web page spider and
wondering how to deal with the buttons and hyperlinks that appear on a
page, correct?

Based on my understanding, a web spider just retrieves the HTML content of
web pages and parses the elements in it. For button and hyperlink elements,
I think it comes down to the following:

1. A hyperlink is just a link pointing to another resource. However you are
fetching the main page (using WebRequest?), you can retrieve the "href"
attribute from the hyperlink, resolve it against the page's URL if it is
relative, and use WebRequest (sequentially or on a new thread) to visit the
linked page.

2. For a button, I think it's more complex, and it depends on what the
button does. If it just submits the page, you need to check the <form>
tag's "action" URL and use WebRequest to request the resource named in the
"action" attribute, sending the form's field values along. If it just
performs a postback (to the same page), as in ASP.NET, I don't think you
need a different approach: the form posts to the page's own URL, though the
hidden fields on the page (such as __VIEWSTATE) have to be sent back with
it. Also, some button clicks depend on other input fields on the page, so
it is not really possible to cover every page's action logic in spider
code.
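
To make the two cases above concrete, here is a minimal sketch that uses only WebRequest and Uri. The class and method names (SpiderSketch, ResolveLink, BuildFormBody, Get, Post) are made up for illustration, and it assumes the form uses the usual application/x-www-form-urlencoded encoding. WebRequest is used because it is what this thread discusses; on current .NET versions HttpClient is the recommended replacement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class SpiderSketch
{
    // Resolve an href (absolute or relative) against the URL of the page
    // it was found on.
    public static Uri ResolveLink(Uri pageUrl, string href)
    {
        return new Uri(pageUrl, href);
    }

    // Encode form fields as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return body.ToString();
    }

    // "Clicking" a hyperlink: a plain GET of the resolved URL.
    public static string Get(Uri url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // "Clicking" a submit button: POST the encoded fields to the form's action URL.
    public static string Post(Uri actionUrl, string body)
    {
        WebRequest request = WebRequest.Create(actionUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] bytes = Encoding.UTF8.GetBytes(body);
        request.ContentLength = bytes.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, ResolveLink(new Uri("http://example.com/dir/page.aspx"), "next.aspx?p=2") gives http://example.com/dir/next.aspx?p=2, which can then be passed to Get. A form whose method is GET would instead append the encoded body to the action URL as a query string.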

BTW, what component are you using to parse the HTML content? I've used the
Html Agility Pack, which is a pure .NET library, and it's quite useful:

#Html Agility Pack
http://www.codeplex.com/htmlagilitypack
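
As a rough sketch of how the Html Agility Pack could feed a spider, the code below pulls the link targets, the form's action URL and the input fields out of a page. PageParserSketch and its method names are hypothetical, and it assumes the page contains at most one form and that only <input> elements matter (it ignores <select> and <textarea>, and does not filter out unchecked checkboxes or extra submit buttons).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class PageParserSketch
{
    // Collect the href value of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return hrefs;

        foreach (HtmlNode anchor in anchors)
        {
            // Attribute values may contain entities such as &amp; so decode them.
            string href = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", ""));
            if (href.Length > 0) hrefs.Add(href);
        }
        return hrefs;
    }

    // Return the action attribute of the first <form>, or "" if there is none.
    public static string ExtractFormAction(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode form = doc.DocumentNode.SelectSingleNode("//form");
        return form == null ? "" : HtmlEntity.DeEntitize(form.GetAttributeValue("action", ""));
    }

    // Collect name/value pairs from every named <input> on the page,
    // including hidden ones such as __VIEWSTATE.
    public static Dictionary<string, string> ExtractInputs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();
        HtmlNodeCollection inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null) return fields;

        foreach (HtmlNode input in inputs)
        {
            string name = input.GetAttributeValue("name", "");
            string value = HtmlEntity.DeEntitize(input.GetAttributeValue("value", ""));
            fields[name] = value;
        }
        return fields;
    }
}
```

The inputs are selected from the whole document rather than from beneath the form node, which keeps the sketch independent of how the parser nests a form's children. An empty action means the form posts back to the page's own URL.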

Here are some other good tech articles about writing a custom Web Spider:

#MyDownloader: A Multi-thread C# Segmented Download Manager
http://www.codeproject.com/KB/IP/MyD...df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=51

#A Web Spider Library in C#
http://www.codeproject.com/KB/aspnet/ZetaWebSpider.aspx

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead
Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
ms****@microsoft.com.

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscripti...ult.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscripti...t/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
From: Charts <Ac*****@newsgroup.nospam>
Subject: how to spider web page with button and hyperlink
Date: Tue, 24 Jun 2008 14:57:00 -0700

Jun 27 '08 #2
Steven,
Your post is a great help. I'll follow up and let you know. Thanks so much.
Charts

Jun 27 '08 #3
