Bytes IT Community

Using MSHTML for parsing HTML files in C#

P: n/a
hi,

Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.

thx
Nov 15 '05 #1
9 Replies


P: n/a
Hi,

ph*****@vistatec.ie wrote:
Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.


A simple Google search yields quite a few good results. I suggest you
improve your searching technique.

Or did you want someone to do it for you?

-- Pete
Nov 15 '05 #2

P: n/a
"Pete" <pv*****@gawab.com> wrote in message news:<wm***************@newsfep3-gui.server.ntli.net>...
Hi,

ph*****@vistatec.ie wrote:
Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.


A simple Google search yields quite a few good results. I suggest you
improve your searching technique.

Or did you want someone to do it for you?

-- Pete

Ehh... did you look at those results? If you have found anything
useful, pass it on, as I certainly couldn't find anything more than 5
lines of code.
Otherwise quit spreading negative vibes.
Nov 15 '05 #3

P: n/a
I did a small project using this a few weeks ago. My code will probably look
obscure, but I can try to help you if you have more questions.
http://www.skidmore.edu/~h_blackw/mshtmlsample.cs is the stripped-down version of
my code that uses the mshtml library.
<ph*****@vistatec.ie> wrote in message
news:a3**************************@posting.google.com...
hi,

Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.

thx

Nov 15 '05 #4

P: n/a
In article <a3**************************@posting.google.com>,
ph*****@vistatec.ie says...
"Pete" <pv*****@gawab.com> wrote in message news:<wm***************@newsfep3-gui.server.ntli.net>...
Hi,

ph*****@vistatec.ie wrote:
Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.


A simple Google search yields quite a few good results. I suggest you
improve your searching technique.

Or did you want someone to do it for you?

-- Pete

Ehh... did you look at those results? If you have found anything
useful, pass it on, as I certainly couldn't find anything more than 5
lines of code.
Otherwise quit spreading negative vibes.


Hi, here is a little example. I used this code to read an HTML page,
replace some of the links in it, and then save the result. The example
is not complete, but it shows how to manipulate an HTML page.

Hope that helps
Sunny

<snip>
try
{
    mr = new StreamReader(source.OpenRead(sUrl));
    sWebPage = mr.ReadToEnd();
}
catch
{   // could not read the URL
    return;
}
finally
{
    if (mr != null)
        mr.Close();
}

HTMLDocumentClass myDoc;

try
{   // place the HTML string in the MSHTML document
    object[] oPageText = {sWebPage};
    myDoc = new HTMLDocumentClass();
    IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
    oMyDoc.write(oPageText);
}
catch
{
    // page is not well formatted, skip it
    return;
}

// if we are here, we have read the page and we are ready to parse it

// get the collection of links
IHTMLElementCollection cMyLinks = (IHTMLElementCollection)myDoc.links;

// modify the links
foreach (IHTMLAnchorElement oLink in cMyLinks)
    oLink.href = SubstituteTags(true, sUrl, oLink.href);

// get the collection of images
cMyLinks = (IHTMLElementCollection)myDoc.images;

// modify the images
foreach (IHTMLImgElement oImage in cMyLinks)
    oImage.src = SubstituteTags(false, sUrl, oImage.href);

// write the result
StreamWriter myFile = null;
sWebPage = myDoc.documentElement.outerHTML;
try
{
    myFile = new StreamWriter("modpage.html", false);
    myFile.Write(sWebPage);
}
catch { }   // write errors are ignored
finally
{
    if (myFile != null)
        myFile.Close();
}

<snip>
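
SubstituteTags is not shown in the snippet above. As a rough sketch only, assuming its job is to turn the links found in the page into absolute URLs relative to the page's own address, a stand-in could look like the following. The names LinkHelper and ResolveLink are hypothetical and are not part of MSHTML or of the original code.

```csharp
using System;

static class LinkHelper
{
    // Hypothetical stand-in for SubstituteTags: resolves a link found in
    // the page against the URL the page was downloaded from.
    public static string ResolveLink(string pageUrl, string link)
    {
        if (string.IsNullOrEmpty(link))
            return link;

        Uri resolved;
        // Relative links are combined with the base URL; links that are
        // already absolute come back unchanged.
        if (Uri.TryCreate(new Uri(pageUrl), link, out resolved))
            return resolved.AbsoluteUri;

        return link;   // leave anything that cannot be parsed as it was
    }
}
```

With this sketch, ResolveLink("http://example.com/a/b.html", "img/x.png") yields "http://example.com/a/img/x.png".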
Nov 15 '05 #5

P: n/a
Hi,

ph*****@vistatec.ie wrote:
Ehh... did you look at those results? If you have found anything
useful, pass it on, as I certainly couldn't find anything more than 5
lines of code.
Otherwise quit spreading negative vibes.


*sigh*. Okay. Depending on whether or not you're looking to display the
HTML, these might be of some use:

http://www.itwriting.com/htmleditor/index.php

http://msdn.microsoft.com/library/de...asp?frame=true

http://msdn.microsoft.com/library/de...ng/hosting.asp

http://msdn.microsoft.com/library/de...mldomfromc.asp

http://www.devhood.com/tutorials/tut...utorial_id=312

http://www.thecodeproject.com/csharp/webbrowser.asp

http://www.codeproject.com/csharp/advhost.asp

http://blog.monstuff.com/archives/000052.html

If you just want to use mshtml without a UI, this might help:

http://www.codeguru.com/ieprogram/HTMLParsing.html

I know that last one isn't C#, but it should show what you need (especially
combined with the others). None of this was hard to find, and I'm certain
there's a lot more out there that my brief search didn't pick up.

-- Pete
Nov 15 '05 #6

P: n/a
"Hasani" <HJ****@hotmail.c0m> wrote in message news:<DP*********************@twister.nyc.rr.com>. ..
I did a small project using this a few weeks ago. My code will probably look
obscure, but I can try to help you if you have more questions.
http://www.skidmore.edu/~h_blackw/mshtmlsample.cs is the stripped-down version of
my code that uses the mshtml library.
<ph*****@vistatec.ie> wrote in message
news:a3**************************@posting.google.com...
hi,

Does anyone have any sample code for this? I can't find anything
relevant at all. Please share some code if you have any.

thx


Thanks for the code! I have tried out both implementations, but I still
can't access my HTML page. I think the problem is that
HTMLDocumentClass does not seem to enumerate my HTML page properly. It
picks up the file size etc., but <title> and <body> do not seem to
be enumerated. Can you spot what I may be doing wrong?

thx

This is the simple HTML page I am trying to read:
<HTML>
<HEAD>
<TITLE>I Love HTML</TITLE>
</HEAD>

<BODY>
Everything displayed on your page will be in here.
</BODY>

</HTML>

Here is the code:

static void Main()
{
    // It seems that HTMLDocumentClass in LoadHtml does not enumerate the
    // file properly, so I think the problem starts here.
    HTMLDocument htmlDoc =
        LoadHtml(@"D:\work\htmlparse\ConsoleApplication\hello.html");

    IHTMLElementCollection title = htmlDoc.getElementsByTagName("title");

    // prints nothing
    Console.WriteLine(htmlDoc.title);

    // no elements
    foreach (IHTMLTitleElement myt in title)
    {
        Console.WriteLine(myt.ToString());
    }
}

private static HTMLDocument LoadHtml(string path)
{
    HTMLDocumentClass dom = new HTMLDocumentClass();
    System.Runtime.InteropServices.UCOMIPersistFile pf =
        (System.Runtime.InteropServices.UCOMIPersistFile)dom;
    pf.Load(path, 1);

    return dom;
}
Nov 15 '05 #7

P: n/a
I posted yesterday, but it seems my post did not appear. The
following example works just fine.

Hope that helps
Sunny

string myPage = "<HTML><HEAD><TITLE>I Love HTML</TITLE></HEAD>" +
    "<BODY>Everything displayed on your page will be in here.</BODY>" +
    "</HTML>";

HTMLDocumentClass myDoc;

// load the document
object[] oPageText = {myPage};
myDoc = new HTMLDocumentClass();
IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
oMyDoc.write(oPageText);

IHTMLElementCollection title = myDoc.getElementsByTagName("title");
foreach (IHTMLTitleElement myt in title)
{
    Console.WriteLine(myt.text);
}

Console.WriteLine(myDoc.title);
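
To parse a page stored on disk instead of a string literal, the file can be read into a string and passed to the same write call. This is a minimal sketch, assuming the project references the Microsoft.mshtml interop assembly. HtmlFileLoader and LoadFromFile are illustrative names, not part of MSHTML, and the close() call is an addition that does not appear in the example above.

```csharp
using System;
using System.IO;
using mshtml;   // assumes a reference to the Microsoft.mshtml interop assembly

static class HtmlFileLoader
{
    // Reads the whole file and feeds it to MSHTML through
    // IHTMLDocument2.write, the same call used with the string literal.
    public static IHTMLDocument2 LoadFromFile(string path)
    {
        string html;
        using (StreamReader reader = new StreamReader(path))
        {
            html = reader.ReadToEnd();
        }

        HTMLDocumentClass doc = new HTMLDocumentClass();
        IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
        doc2.write(new object[] { html });
        doc2.close();   // tells MSHTML that no more markup will be written
        return doc2;
    }
}
```

A caller would then do something like Console.WriteLine(HtmlFileLoader.LoadFromFile(path).title), where path points at the HTML file to parse.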
Nov 15 '05 #9

P: n/a
Sunny <su******@icebergwireless.com> wrote in message news:<MP************************@msnews.microsoft.com>...
I posted yesterday, but it seems my post did not appear. The
following example works just fine.

Hope that helps
Sunny

string myPage = "<HTML><HEAD><TITLE>I Love HTML</TITLE></HEAD>" +
    "<BODY>Everything displayed on your page will be in here.</BODY>" +
    "</HTML>";

HTMLDocumentClass myDoc;

// load the document
object[] oPageText = {myPage};
myDoc = new HTMLDocumentClass();
IHTMLDocument2 oMyDoc = (IHTMLDocument2)myDoc;
oMyDoc.write(oPageText);

IHTMLElementCollection title = myDoc.getElementsByTagName("title");
foreach (IHTMLTitleElement myt in title)
{
    Console.WriteLine(myt.text);
}

Console.WriteLine(myDoc.title);


Thanks for the links and code, guys.
Sunny, thanks for the code. I was able to get what I needed with this as
a start. Cheers.
Nov 15 '05 #10
