Parsing HTML

guy
I have a large number of HTML files (10,000+) and need to modify them
programmatically on a regular basis.

How can I determine the textual data that is present, ignoring tags etc.?

So if I have a line such as:
<TD WIDTH="524"><B><FONT SIZE="2" FACE="Times New Roman"
COLOR="#000000">Direction2</FONT></B></TD>

I need to return "Direction2" and nothing else.

Any ideas?

guy

Nov 10 '06 #1
The only way I know of is to use MSHTML. It exposes the HTML object model,
so you can load a document into an HTML DOM and then read the inner text of
a node, and so on.
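
For readers who want to see the "load it into a parser, then read the inner text" idea in miniature, here is a rough sketch. It uses Python's standard-library html.parser instead of MSHTML or any .NET library, so it only illustrates the approach and is not the API discussed in this thread. The names TextExtractor and inner_text are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags and attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.chunks.append(text)


def inner_text(html):
    """Return the text content of an HTML fragment, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The fragment from the original question, including its line break.
sample = ('<TD WIDTH="524"><B><FONT SIZE="2" FACE="Times New Roman"\n'
          'COLOR="#000000">Direction2</FONT></B></TD>')
print(inner_text(sample))  # prints: Direction2
```

Note that this sketch also returns the contents of script and style elements as text, so real pages would probably need those filtered out.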

Nov 10 '06 #2
guy
Thanks Marina

I will go and whimper in a corner

guy

"Marina Levit [MVP]" wrote:
The only way I know how would be to use MSHTML. This has the HTML object
model and allows you to load up an HTML dom, and then you can get the inner
text of a node, and so on.

Nov 10 '06 #3
Try the Html Agility Pack, more info here:
http://chrisfulstow.blogspot.com/200...ml-in-net.html

Nov 10 '06 #4
guy wrote:
>I have a large number of HTML files (10,000+) and need to modify them
>programmatically on a regular basis.

The .NET Framework does not have an HTML parser. There are, however,
third-party libraries such as the HTML Agility Pack
<http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack>
or SgmlReader
<http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>


--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 10 '06 #5
