Retrieve Description/Meta tags from website as well as remove HTML

Hi all, does anyone know of a nice utility/class which will allow me to
retrieve the details of a webpage?

Specifically, I would like to be able to retrieve the HTML and then call a
method which would retrieve the meta tags:
Keywords
Description
as well as another method which removes all the HTML from the string
starting at the body tag.

Does one exist? I know I can write one using regular expressions etc., but
I'd rather not reinvent the wheel :)

Thanks
Mark


Nov 19 '05 #1
JV
I assume you mean programmatically, since you can obviously hand-edit in VS
or even just NOTEPAD.

I had to do something like this to work around the VS bug where it
occasionally eats the closing tag on a <link> tag. I didn't do a whole lot
of research but here is what I can tell you.

1) The HTML parsers I found were expensive. I didn't find a free one, at
least not one that was useful.
2) Sometimes people use the IE browser control for DOM access, but I found
it to be pretty clunky for my purposes.
3) You can't really load it in an XML document because the HTML is rarely
well-formed XML (though maybe in VS2005 using XHTML it will be?)
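
A quick way to see point 3 in practice. This is only a sketch in Python's standard library (not the .NET tooling discussed in this thread), and the sample markup and the helper name parses_as_xml are made up for illustration. Ordinary HTML with unclosed <meta>, <link> and <br> tags is rejected by a strict XML parser, while the XHTML-style version loads fine:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: typical HTML with unclosed <meta>, <link>, <br>
# and a missing </p>. Valid HTML, but not well-formed XML.
loose_html = (
    '<html><head><meta name="description" content="A sample page">'
    '<link rel="stylesheet" href="site.css"></head>'
    '<body><p>Hello<br>world</body></html>'
)

# The same idea written XHTML-style, with every element closed.
strict_xhtml = (
    '<html><head><meta name="description" content="A sample page"/>'
    '</head><body><p>Hello<br/>world</p></body></html>'
)

def parses_as_xml(markup: str) -> bool:
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(loose_html))    # False
print(parses_as_xml(strict_xhtml))  # True
```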

I ended up doing some of my own string parsing since my need was relatively
simple.
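
For anyone who wants to see roughly what such parsing can look like, here is a minimal sketch. It is not the code described above, and it uses Python's standard html.parser rather than anything from .NET; the class name PageSummary and the sample page are assumptions made up for the example. It collects the meta tags by name and keeps only the visible text found inside the body:

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    """Collects <meta name=...> values and the visible text inside <body>."""

    SKIP = {"script", "style"}  # elements whose text is not visible

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}          # lower-cased meta name -> content
        self._text = []         # text chunks seen inside <body>
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self._text.append(data)

    @property
    def body_text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        return " ".join(" ".join(self._text).split())

# Hypothetical sample page, for illustration only.
sample = """<html><head>
<meta name="Keywords" content="html, parsing">
<meta name="description" content="A sample page">
<style>p { color: red }</style>
</head><body><h1>Title</h1><p>Hello <b>world</b> &amp; all<br>
<script>var x = "<p>not text</p>";</script></p></body></html>"""

page = PageSummary()
page.feed(sample)
page.close()
print(page.meta.get("keywords"))     # html, parsing
print(page.meta.get("description"))  # A sample page
print(page.body_text)                # Title Hello world & all
```

Joining the chunks with a space keeps adjacent block elements from running together, at the cost of splitting a word that has inline markup in the middle of it. A tolerant parser like this also copes with unclosed tags, which is where regular expressions tend to get fragile.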

"Mark" <ma**@Z-Zvolution.nZt> wrote
Hi all, does anyone know of a nice utility/ class which will allow me to
retrieve the details of a webpage?

Nov 19 '05 #2
On Fri, 24 Jun 2005 15:59:03 +1200, "Mark" <ma**@Z-Zvolution.nZt> wrote:
Hi all, does anyone know of a nice utility/ class which will allow me to
retrieve the details of a webpage?

Yeah, take a look at this:
http://www.codefluent.com/smourier/d...gilitypack.zip

Nov 19 '05 #3
Thanks for your help guys :)
Cheers
Mark
"Wilbur Slice" <pa@papapapa.com> wrote in message
news:7s********************************@4ax.com...
Yeah, take a look at this:
http://www.codefluent.com/smourier/d...gilitypack.zip

Nov 19 '05 #4
