Can I use Internet Explorer's DOM parser?

I'm writing an HTML parser and would like to use Internet Explorer's DOM
parser.

Can I use Internet Explorer's DOM parser through a web service?

Thanks for the help.
Jun 27 '08 #1
Hi fbrewster,

From your description, you're looking for a component or some other means
to parse an HTML string, correct?

What is the source of the HTML? Are you programmatically capturing the HTML
content from the web and parsing it, or are there existing HTML files on
the local disk?

Yes, in .NET you can still use the MSHTML component (the IE DOM parser) to
parse HTML. It is a COM component, so you need to call it via COM interop.
Here are some web articles demonstrating how to use it in .NET:

#Parsing html markup text using MSHTML
http://www.eggheadcafe.com/articles/parsinghtml.asp

#Parsing HTML without Using the Browser Control
http://www.codeguru.com/vb/vb_intern...cle.php/c4815/

The MSHTML component loads the HTML into its in-memory DOM, and you can
access the HTML elements in the DOM structure just as you would when using
JavaScript to access the DOM of a client-side page.

Also, as a .NET-specific component, I have used the "Html Agility Pack",
which is a good one for parsing HTML:

#.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML
http://blogs.msdn.com/smourier/archi...6/04/8265.aspx

#Html Agility Pack
http://www.codeplex.com/htmlagilitypack

It also provides a DOM-based model, and it supports XPath queries, which
are quite convenient and powerful.

========sample code using Html Agility Pack============
using System.IO;
using System.Net;
using html = HtmlAgilityPack;   // alias used throughout the sample

private void Parse_Questions()
{
    // Get the HTML content from the web (txtUrl is a text box on the form).
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(txtUrl.Text);

    // Construct the HTML document object and load the HTML stream.
    html.HtmlDocument hd = new html.HtmlDocument();
    using (HttpWebResponse rep = (HttpWebResponse)req.GetResponse())
    using (StreamReader sr = new StreamReader(rep.GetResponseStream()))
    {
        hd.Load(sr);
    }

    // Use XPath to query the expected nodes in the document.
    html.HtmlNode doc = hd.DocumentNode;
    html.HtmlNodeCollection divs =
        doc.SelectNodes("//div[@class='questionbody']");

    using (StreamWriter sw = new StreamWriter(@"e:\temp\htmloutput.htm"))
    {
        sw.WriteLine("<html><body>");

        // SelectNodes returns null when no node matches the query.
        if (divs != null)
        {
            foreach (html.HtmlNode node in divs)
            {
                //....processing code
            }
        }

        sw.WriteLine("</body></html>");
    }
}

=======================================

Hope this helps.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead
Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
ms****@microsoft.com.

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscripti...ult.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscripti...t/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
From: "fbrewster" <fb*******@newsgroup.nospam>
Subject: Can I use Internet explorers DOM parser?
Date: Mon, 16 Jun 2008 12:53:22 -0500

Jun 27 '08 #2
Thanks for your post, Steven.

I followed up on MSHTML, but I found this KB article that indicates that I
can't use MSHTML or IE from a service:
http://support.microsoft.com/kb/244085

My parser needs to run within a Windows Service. Do you know of a
technology that I can use?

Jun 27 '08 #3
Thanks for your reply, fbrewster.

Yes, I've checked that KB article. It recommends that we not use the MSHTML
component in server-side applications (such as ASP or ASP.NET web
applications). I think the reason is MSHTML's COM threading model, which
restricts performance in a multi-threaded server-side environment. If your
application won't frequently spawn many concurrent threads that call the
component, you can still use it.

As an alternative approach, I suggest you consider the HTML parser
component I mentioned in my last reply:

#.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML
http://blogs.msdn.com/smourier/archi...6/04/8265.aspx

#Html Agility Pack
http://www.codeplex.com/htmlagilitypack

The "Html Agility Pack" is a pure .NET component (it does not rely on COM),
so the COM-related concerns above do not apply to it.
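
Since a Windows Service has no form or text box to read a URL from, here is a minimal sketch of how the parsing step might look as a plain helper that takes the HTML as a string. The class name QuestionExtractor, the method name ExtractQuestionBodies, and the "questionbody" class in the XPath query are only illustrative assumptions carried over from the earlier sample; the Html Agility Pack assembly is assumed to be referenced by the project.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, assumed to be referenced by the project

public static class QuestionExtractor
{
    // Hypothetical helper: returns the trimmed text of every
    // <div class="questionbody"> element found in the given HTML string.
    public static List<string> ExtractQuestionBodies(string htmlText)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlText); // parses the string; no COM or browser involved

        List<string> results = new List<string>();
        HtmlNodeCollection divs =
            doc.DocumentNode.SelectNodes("//div[@class='questionbody']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (divs == null)
        {
            return results;
        }

        foreach (HtmlNode div in divs)
        {
            results.Add(div.InnerText.Trim());
        }

        return results;
    }
}
```

The service can then download the page with whatever mechanism it already uses and pass the text to ExtractQuestionBodies. Keeping the download separate from the parsing also makes the parsing step easy to exercise with fixed HTML strings.
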
Sincerely,

Steven Cheng

--------------------
From: "fbrewster" <fb*******@newsgroup.nospam>
References: <Oi**************@TK2MSFTNGP05.phx.gbl>
<a0**************@TK2MSFTNGHUB02.phx.gbl>
Subject: Re: Can I use Internet explorers DOM parser?
Date: Tue, 17 Jun 2008 10:19:06 -0500

Jun 27 '08 #4
Hi fbrewster,

Have you made any further progress on this, or have you tried the "Html
Agility Pack"? If you need help with anything else, feel free to post here.

Sincerely,

Steven Cheng
--------------------
References: <Oi**************@TK2MSFTNGP05.phx.gbl>
<a0**************@TK2MSFTNGHUB02.phx.gbl>
<eq**************@TK2MSFTNGP02.phx.gbl>
From: st*****@online.microsoft.com (Steven Cheng [MSFT])
Organization: Microsoft
Date: Wed, 18 Jun 2008 06:48:23 GMT
Subject: Re: Can I use Internet explorers DOM parser?
Jun 27 '08 #5
