Hello,
Does anybody know if there is a .NET or COM-based library to
parse HTML, or to convert HTML to XML so that I can use XPath to
query it?
Thanks
Qin Zhou
Hi Q.Z,
Thank you for using Microsoft Newsgroup Service. Based on your description,
you are looking for COM or .NET components that can convert an HTML
document into an XML (XHTML) document. Is my understanding correct?
If so, I think Ken Cox has provided some good links on this topic; they
show two COM components. You may want to try them to see whether they
help.
Steven Cheng
Microsoft Online Support
Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)
Ken and Steven,
Thanks a lot! It looks like that will do the trick.
Qin Zhou
-----Original Message-----
It looks like you can use the COM wrapper around Tidy to get there:
http://perso.wanadoo.fr/ablavier/TidyCOM/
http://www.15seconds.com/Issue/010601.htm
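Once a Tidy-style converter has produced well-formed XHTML, the XPath step asked about in the original question is short. A minimal sketch, written in Python with only the standard library to keep it brief; in .NET the equivalent step would be loading the converted text into an XmlDocument and calling SelectNodes. The XHTML string below is a made-up stand-in for converter output, not actual output from TidyCOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML, standing in for what an HTML-to-XHTML converter might emit.
XHTML = """<html>
  <head><title>Quotes</title></head>
  <body>
    <p class="note">First</p>
    <div><p>Second</p><a href="http://example.com/">link</a></div>
  </body>
</html>"""

# Parsing only succeeds because the input is well-formed XML.
root = ET.fromstring(XHTML)

# ElementTree supports a subset of XPath, enough for simple queries.
paragraphs = [p.text for p in root.findall(".//p")]
links = [a.get("href") for a in root.findall(".//a[@href]")]
notes = [p.text for p in root.findall(".//p[@class='note']")]

print(paragraphs)
print(links)
print(notes)
```

Real XHTML usually declares the XHTML namespace on the root element, in which case the paths have to be namespace-qualified; the sample above omits the namespace to keep the queries readable.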
I have tried SgmlReader but am having difficulty with some sites, such as www.msn.com.
If I could find a way to parse HTML using C, C++, or C#, I would be happy. All I really
need is an array of <tag> and <data> items. Finer granularity is not necessary, just
the raw information. I do need the entire page, though, from the opening <html> to the closing </html>.
I would prefer an HTML-to-XML conversion, but as time is limited, any solution would be
appreciated.
Thanks,
Dave
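For a flat list of <tag> and <data> items like the one described above, a tolerant event-based tokenizer is one option, since it does not require the page to be well-formed. A minimal sketch in Python using the standard library's html.parser; the names FlatCollector and flatten are illustrative only and do not come from any library mentioned in this thread, and the same idea would have to be ported to C, C++, or C# by hand.

```python
from html.parser import HTMLParser


class FlatCollector(HTMLParser):
    """Collect a flat list of ("tag", name) and ("data", text) pairs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored: only the tag name is recorded.
        self.items.append(("tag", tag))

    def handle_endtag(self, tag):
        # Closing tags are recorded with a leading slash.
        self.items.append(("tag", "/" + tag))

    def handle_data(self, data):
        # Skip runs that are only whitespace between tags.
        text = data.strip()
        if text:
            self.items.append(("data", text))


def flatten(html_text):
    """Return the whole page as one flat list of tag/data pairs."""
    parser = FlatCollector()
    parser.feed(html_text)
    parser.close()
    return parser.items


if __name__ == "__main__":
    # The unclosed <p> is deliberate: the tokenizer does not need balanced tags.
    for kind, value in flatten("<html><p>Hi <b>there</b></html>"):
        print(kind, value)
```

Because nothing is validated or nested, unclosed tags simply never produce a closing entry, which suits the "raw information only" requirement but leaves any repair of the markup to the caller.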
On Fri, 09 Jan 2004 03:23:29 GMT, v-******@online.microsoft.com (Steven Cheng[MSFT]) wrote:
Hi Q.Z,
Thank you for using Microsoft Newsgroup Service. Based on your description, you are looking for COM or .NET components that can convert an HTML document into an XML (XHTML) document. Is my understanding correct?
If so, I think Ken Cox has provided some good links on this topic; they show two COM components. You may want to try them to see whether they help.
Steven Cheng
Microsoft Online Support
If you load your page into a WebBrowser control, you can parse it using the DOM.
This is slow, but it works.
Take a look at http://blogs.msdn.com/smourier/archi...6/04/8265.aspx
George.