using MSHTML for parsing HTML files in c#

philipl@vistatec.ie
#1: Nov 15 '05
hi,

Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.

thx

Pete
#2: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


Hi,

philipl@vistatec.ie wrote:
> Does anyone have any sample code for this? I can't find anything
> relevant at all. Please share some code if you have any.

A simple Google search yields quite a few good results. I suggest you learn
to improve your searching technique.

Or did you want someone to do it for you?

-- Pete


philipl@vistatec.ie
#3: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


"Pete" <pvidler@gawab.com> wrote in message news:<wmVbb.261$Yy4.149@newsfep3-gui.server.ntli.net>...
> Hi,
>
> philipl@vistatec.ie wrote:
> > Does anyone have any sample code for this? I can't find anything
> > relevant at all. Please share some code if you have any.
>
> A simple Google search yields quite a few good results. I suggest you learn
> to improve your searching technique.
>
> Or did you want someone to do it for you?
>
> -- Pete


Ehh... did you look at those results? If you found anything
useful, pass it on, as I certainly couldn't find anything more than 5
lines of code.
Otherwise quit spreading negative vibes.

Hasani
#4: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


I did a small project using this a few weeks ago. My code will probably look
obscure, but I can try to help you if you have more questions.
http://www.skidmore.edu/~h_blackw/mshtmlsample.cs is the stripped version of
my code that uses the mshtml library.

<philipl@vistatec.ie> wrote in message
news:a3f6c32f.0309230036.26d3ac1a@posting.google.com...
> hi,
>
> Does anyone have any sample code for this? I can't find anything
> relevant at all. Please share some code if you have any.
>
> thx


Sunny
#5: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


In article <a3f6c32f.0309230607.50034f0a@posting.google.com>,
philipl@vistatec.ie says...
> "Pete" <pvidler@gawab.com> wrote in message news:<wmVbb.261$Yy4.149@newsfep3-gui.server.ntli.net>...
> > Hi,
> >
> > philipl@vistatec.ie wrote:
> > > Does anyone have any sample code for this? I can't find anything
> > > relevant at all. Please share some code if you have any.
> >
> > A simple Google search yields quite a few good results. I suggest you learn
> > to improve your searching technique.
> >
> > Or did you want someone to do it for you?
> >
> > -- Pete
>
>
> Ehh... did you look at those results? If you found anything
> useful, pass it on, as I certainly couldn't find anything more than 5
> lines of code.
> Otherwise quit spreading negative vibes.

Hi, here is a little example. I used this code to read an HTML page,
replace some of the links in it, and then save the result. The example
is not complete, but it shows how to manipulate an HTML page.

Hope that helps
Sunny

<snip>
try
{
    mr = new StreamReader(source.OpenRead(sUrl));
    sWebPage = mr.ReadToEnd();
}
catch
{
    // could not read the URL
    return;
}
finally
{
    if (mr != null)
        mr.Close();
}

HTMLDocumentClass myDoc;

try
{
    // place the HTML string in an MSHTML document
    object[] oPageText = {sWebPage};
    myDoc = new HTMLDocumentClass();
    IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
    oMyDoc.write(oPageText);
}
catch
{
    // page is not well formatted, skip it
    return;
}

// if we are here, we have read the page and we are ready to parse it

// get the collection of links
IHTMLElementCollection cMyLinks = (IHTMLElementCollection)myDoc.links;

// modify the links
foreach (IHTMLAnchorElement oLink in cMyLinks)
    oLink.href = SubstituteTags(true, sUrl, oLink.href);

// get the collection of images
cMyLinks = (IHTMLElementCollection)myDoc.images;

// modify the images
foreach (IHTMLImgElement oImage in cMyLinks)
    oImage.src = SubstituteTags(false, sUrl, oImage.href);

// write the result
StreamWriter myFile = null;
sWebPage = myDoc.documentElement.outerHTML;
try
{
    myFile = new StreamWriter("modpage.html", false);
    myFile.Write(sWebPage);
}
catch { } // write errors are silently ignored
finally
{
    if (myFile != null)
        myFile.Close();
}

<snip>

Pete
#6: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


Hi,

philipl@vistatec.ie wrote:
> Ehh... did you look at those results? If you found anything
> useful, pass it on, as I certainly couldn't find anything more than 5
> lines of code.
> Otherwise quit spreading negative vibes.

*sigh*. Okay. Depending on whether or not you're looking to display the
html, these might be of some use:

http://www.itwriting.com/htmleditor/index.php

http://msdn.microsoft.com/library/de...asp?frame=true

http://msdn.microsoft.com/library/de...ng/hosting.asp

http://msdn.microsoft.com/library/de...mldomfromc.asp

http://www.devhood.com/tutorials/tut...utorial_id=312

http://www.thecodeproject.com/csharp/webbrowser.asp

http://www.codeproject.com/csharp/advhost.asp

http://blog.monstuff.com/archives/000052.html

If you just want to use MSHTML without a UI, this might help:

http://www.codeguru.com/ieprogram/HTMLParsing.html

I know that last one isn't C#, but it should show what you need (especially
combined with the others). None of this was hard to find, and I'm certain
there's a lot more out there that my brief search didn't pick up.

-- Pete


philipl@vistatec.ie
#7: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


"Hasani" <HJB417@hotmail.c0m> wrote in message news:<DP_bb.15824$nU6.3119624@twister.nyc.rr.com>...
> I did a small project using this a few weeks ago. My code will probably look
> obscure, but I can try to help you if you have more questions.
> http://www.skidmore.edu/~h_blackw/mshtmlsample.cs is the stripped version of
> my code that uses the mshtml library.
> <philipl@vistatec.ie> wrote in message
> news:a3f6c32f.0309230036.26d3ac1a@posting.google.com...
> > hi,
> >
> > Does anyone have any sample code for this? I can't find anything
> > relevant at all. Please share some code if you have any.
> >
> > thx



Thx for the code! I have tried out both implementations but I still
can't access my HTML page. The problem, I think, is that
HTMLDocumentClass does not seem to enumerate my HTML page properly. It
picks up what size it is etc., but <title> and <body> do not seem to
be enumerated. Can you spot what I may be doing wrong?

thx

This is the simple HTML page I am trying to read:
<HTML>
<HEAD>
<TITLE>I Love HTML</TITLE>
</HEAD>

<BODY>
Everything displayed on your page will be in here.
</BODY>

</HTML>

Here is the code:

static void Main()
{
    // It seems that HTMLDocumentClass in 'LoadHtml' does not enumerate the
    // file properly, so I think the problem starts here.
    HTMLDocument htmlDoc =
        LoadHtml(@"D:\work\htmlparse\ConsoleApplication\hello.html");

    IHTMLElementCollection title = htmlDoc.getElementsByTagName("title");

    // prints nothing
    Console.WriteLine(htmlDoc.title);

    // no elements
    foreach (IHTMLTitleElement myt in title)
    {
        Console.WriteLine(myt.ToString());
    }
}


private static HTMLDocument LoadHtml(string path)
{
    HTMLDocumentClass dom = new HTMLDocumentClass();
    System.Runtime.InteropServices.UCOMIPersistFile pf =
        (System.Runtime.InteropServices.UCOMIPersistFile)dom;
    pf.Load(path, 1);

    return dom;
}

Sunny
#9: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


I posted yesterday, but it seems my post did not appear. The
following example works just fine.

Hope that helps
Sunny

string myPage = "<HTML><HEAD><TITLE>I Love HTML</TITLE></HEAD>" +
    "<BODY>Everything displayed on your page will be in here.</BODY>" +
    "</HTML>";

HTMLDocumentClass myDoc;

// loading the document
object[] oPageText = {myPage};
myDoc = new HTMLDocumentClass();
IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
oMyDoc.write(oPageText);

IHTMLElementCollection title = myDoc.getElementsByTagName("title");
foreach (IHTMLTitleElement myt in title)
{
    Console.WriteLine(myt.text);
}

Console.WriteLine(myDoc.title);

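
For the file on disk from the earlier post, the same write() approach should carry over: read the file into a string first, then write it into the document instead of going through the persist-file interface. What follows is only a minimal sketch, assuming a project reference to the Microsoft.mshtml interop assembly and .NET 2.0 or later for File.ReadAllText. The class name and the default hello.html path are made up for illustration.

```csharp
using System;
using System.IO;
using mshtml;

// Hypothetical class name, for illustration only.
class ParseHtmlFile
{
    [STAThread]
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "hello.html";

        // Read the whole file into a string, as in the string-based example.
        string html = File.ReadAllText(path);

        // Write the string into an MSHTML document, then close the stream.
        HTMLDocumentClass doc = new HTMLDocumentClass();
        IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
        doc2.write(new object[] { html });
        doc2.close();

        // Read the title two ways, mirroring the calls used in this thread.
        Console.WriteLine(doc2.title);

        IHTMLElementCollection titles = doc.getElementsByTagName("title");
        foreach (IHTMLTitleElement t in titles)
        {
            Console.WriteLine(t.text);
        }
    }
}
```

This skips the file-loading call from the earlier post entirely, so it does not explain why that call returned an empty document. It only reuses the route that is reported to work in this thread.
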
philipl@vistatec.ie
#10: Nov 15 '05

re: using MSHTML for parsing HTML files in c#


Sunny <sunnyask@icebergwireless.com> wrote in message news:<MPG.19dcbd36dcfe1a6898969b@msnews.microsoft.com>...
> I posted yesterday, but it seems my post did not appear. The
> following example works just fine.
>
> Hope that helps
> Sunny
>
> string myPage = "<HTML><HEAD><TITLE>I Love HTML</TITLE></HEAD>" +
> "<BODY>Everything displayed on your page will be in here.</BODY>" +
> "</HTML>";
>
> HTMLDocumentClass myDoc;
>
> // loading the document
> object[] oPageText = {myPage};
> myDoc = new HTMLDocumentClass();
> IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
> oMyDoc.write(oPageText);
>
> IHTMLElementCollection title = myDoc.getElementsByTagName("title");
> foreach (IHTMLTitleElement myt in title)
> {
> Console.WriteLine(myt.text);
> }
>
> Console.WriteLine(myDoc.title);

Thanks for the links and code, guys.
Sunny, thanks for the code. I was able to get what I needed with this as
a start. Cheers.