Bytes | Software Development & Data Engineering Community

Read HTML text

Hi,
Is there any way to read the text of an HTML page? I mean the final text, not the HTML
code.

Thanks :)
Jan 6 '06 #1
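using System.Text.RegularExpressions;   // at the top of the file

// Removes everything from '<' up to the next '>' (the tags) and keeps
// the text in between. (.|\n) lets a single tag span several lines.
private string StripHtml(string htmlString)
{
    string pattern = @"<(.|\n)*?>";
    return Regex.Replace(htmlString, pattern, string.Empty);
}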

Usage (the result is "hi"):

string StrOnlyHtmlData =
StripHtml("<html><title></title><head></head><body>hi</body></html>");
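The pattern above removes the tags but keeps the contents of <script>, <style> and <title> elements, and it leaves character entities such as &amp; undecoded. A slightly extended sketch, assuming .NET Framework 4 or later (where System.Net.WebUtility.HtmlDecode is available); HtmlTextSketch and StripHtmlToText are hypothetical names, and this is still a regex approximation rather than a real HTML parser:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper: regex-based approximation of "rendered" text.
    public static string StripHtmlToText(string html)
    {
        // 1. Drop <script>, <style> and <title> elements with their contents,
        //    since none of that appears as text in the page body.
        string text = Regex.Replace(html, @"<(script|style|title)\b[^>]*>.*?</\1\s*>",
            string.Empty, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // 2. Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", string.Empty, RegexOptions.Singleline);

        // 3. Remove the remaining tags; [^>]* also matches newlines.
        text = Regex.Replace(text, @"<[^>]*>", string.Empty);

        // 4. Turn entities such as &amp; and &lt; back into characters.
        text = WebUtility.HtmlDecode(text);

        // 5. Collapse runs of whitespace to single spaces, roughly as a browser does.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string html = "<html><head><title>t</title><style>p{color:red}</style></head>" +
                      "<body><p>Fish &amp; chips</p><script>var x = 1 < 2;</script></body></html>";
        Console.WriteLine(StripHtmlToText(html)); // prints "Fish & chips"
    }
}
```

Like the original, this removes tags without inserting spaces, so adjacent block elements such as <p>a</p><p>b</p> come out as "ab". The MSHTML route suggested in the next reply handles layout more faithfully.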

-------
Regards,
C#, VB.NET, SQL Server, UML, Design Patterns interview question book
http://www.geocities.com/dotnetinterviews/
My Interview Blog
http://spaces.msn.com/members/dotnetinterviews/

Jan 6 '06 #2
If you want to read the text as it is rendered in IE, try adding a
reference to the MSHTML DLL and experimenting with it.

Jan 6 '06 #3
Thanks!
I've added a reference to MSHTML. Now I need to know how to get a .htm or
.html file on my hard drive into an mshtml.IHTMLDocument2, so that I can read
mshtml.IHTMLDocument2.body.innerText.

Thanks again :)

"Truong Hong Thi" <th*****@gmail.com> wrote:
If you want to read the text as it is rendered in IE, try adding
reference to MSHTML dll and play around with it.

Jan 6 '06 #4
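One common way to get an mshtml.IHTMLDocument2 for a file on disk is not to cast the file at all, but to read its markup and write it into a new HTMLDocument object. A minimal sketch, assuming Windows, a project reference to the Microsoft.mshtml interop assembly, and a file whose encoding File.ReadAllText detects correctly; MshtmlText, ExtractText and ExtractTextFromFile are hypothetical names. Loading a page this way may also run scripts it contains, so it is best kept to trusted files.

```csharp
using System;
using System.IO;

// Windows only: requires a reference to the Microsoft.mshtml interop assembly.
public static class MshtmlText
{
    // Hypothetical helper: parses markup with MSHTML and returns the body text.
    public static string ExtractText(string html)
    {
        // HTMLDocument is the COM class; IHTMLDocument2 is the interface we want.
        mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)new mshtml.HTMLDocument();

        // write() takes an object[] whose strings are appended to the document.
        doc.write(new object[] { html });
        doc.close();

        // An empty body can come back as null through COM interop.
        if (doc.body == null) return string.Empty;
        return doc.body.innerText ?? string.Empty;
    }

    // Hypothetical helper: reads a .htm/.html file, then parses it as above.
    public static string ExtractTextFromFile(string path)
    {
        return ExtractText(File.ReadAllText(path));
    }

    [STAThread] // MSHTML is an apartment-threaded COM component.
    public static void Main(string[] args)
    {
        Console.WriteLine(ExtractTextFromFile(args[0]));
    }
}
```

Usage: run the program with the path of a saved page, e.g. `MshtmlText.exe C:\pages\test.htm`, and it prints that page's body text.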
