Mini web crawler

Hi,
I have a table in my DB with a list of URLs.
What I want to do is go to each URL, find the div tag with the ID
"ArticalTitle", and get the inner HTML of that div.

What would be the best way to do this? (I'm talking about the HTML parsing
part; I've got the DB stuff covered.)

Thanks!
Apr 23 '07 #1
Take a look at Simon Mourier's "HtmlAgilityPack". It provides XML DOM-style
XPath access to the object model of a web page.
Peter
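
For anyone who wants to see the parser-based idea in code, here is a minimal sketch. It is not HtmlAgilityPack and does not use XPath: it is a stand-in written in Python with only the standard library, so the names `InnerHtmlById`, `inner_html_by_id` and `fetch_html` are made up for illustration. It assumes the first element with the matching id is the one wanted, and it drops comments inside the div.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InnerHtmlById(HTMLParser):
    """Collects the inner HTML of the first element whose id matches."""

    def __init__(self, target_id):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0       # 0 = outside the target, >0 = nesting level inside
        self.found = False
        self.done = False
        self.parts = []

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside() or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True          # closing tag of the target itself
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f"&#{name};")


def inner_html_by_id(html, target_id):
    """Returns the inner HTML as a string, or None if the id is not present."""
    parser = InnerHtmlById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


def fetch_html(url, timeout=10):
    """Downloads a page and decodes it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


sample = ('<html><body><div id="ArticalTitle">Hello <b>world</b> &amp; more<br>'
          '</div><div id="x">no</div></body></html>')
print(inner_html_by_id(sample, "ArticalTitle"))
# prints: Hello <b>world</b> &amp; more<br>
```

In a crawler loop you would call `inner_html_by_id(fetch_html(url), "ArticalTitle")` for each URL read from the table, and treat a `None` result as "div not found on this page".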

--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net


Apr 23 '07 #2
Use a regular expression:

(?i)(?s)id="ArticalTitle"[^>]*>([^<]*)

Groups[1] will hold the div's content up to the first nested tag, which is
the whole inner HTML when the div contains only text.
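
A quick way to see what a pattern of this shape captures is to run it on two small snippets. The sketch below is in Python rather than .NET, so `match.group(1)` plays the role of `Groups[1]`, and the inline `(?i)(?s)` options are passed as flags instead. The helper names and sample strings are made up for illustration, and the id is a parameter because it has to match the page's spelling exactly.

```python
import re


def title_pattern(element_id):
    # Same shape as the pattern in the post: find id="...", skip to the end of
    # the opening tag, then capture everything up to the next '<'.
    return re.compile(
        r'id="' + re.escape(element_id) + r'"[^>]*>([^<]*)',
        re.IGNORECASE | re.DOTALL,
    )


def extract_leading_text(html, element_id):
    """Returns capture group 1, or None when the id is not found."""
    match = title_pattern(element_id).search(html)
    return match.group(1) if match else None


plain = '<div id="ArticalTitle" class="t">Mini web crawler</div>'
nested = '<div id="ArticalTitle">Mini <b>web</b> crawler</div>'

print(repr(extract_leading_text(plain, "ArticalTitle")))   # 'Mini web crawler'
print(repr(extract_leading_text(nested, "ArticalTitle")))  # 'Mini '
print(repr(extract_leading_text(plain, "ArticleTitle")))   # None
```

So the regex is fine when the div holds plain text only. If the div can contain nested tags, the capture stops at the first one, and a parser-based approach is the safer choice.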

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net


Apr 23 '07 #3
