In message <eK**************@TK2MSFTNGP12.phx.gbl>, Yosh
<Yo**@nospam.com> writes
I have a web page that I call and I need to get the body text out of
the HTML.
*
<html>
<body>
Hi.
How are you?
</body>
</html>
*
What is the best way to do this in C# and .NET?
#1 Treat it as a string and parse it using regular expressions.
#2 Use the Microsoft HTML Object Library (mshtml, add reference from COM
tab) to load and parse it, and access it through the document object
model:
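using System;
using mshtml;

namespace HTMParse
{
    /// <summary>
    /// Loads an HTML string into an mshtml document and prints its body.
    /// </summary>
    class Class1
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            // The literal must stay on one line; a regular C# string
            // cannot span a line break.
            string s = "<html><body>Hi.How are you?</body></html>";

            // Create an empty document and write the markup into it.
            IHTMLDocument2 doc = new HTMLDocumentClass();
            doc.write(new object[]{s});
            doc.close();

            // innerHTML returns the body's markup; use doc.body.innerText
            // instead to get the text with any nested tags removed.
            Console.Write(doc.body.innerHTML);
            Console.Read();
        }
    }
}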
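
For option #1, here is a minimal sketch of the regular-expression approach. The class name BodyTextSketch and the helper ExtractBodyText are made up for illustration, and the patterns assume a single, well-formed <body> element. It does not decode entities such as &amp;, and it will trip over scripts, comments, or malformed markup, so treat it as a starting point rather than a general HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public class BodyTextSketch
{
    // Hypothetical helper (not from the original post): returns the text
    // between <body> and </body> with any nested tags stripped.
    public static string ExtractBodyText(string html)
    {
        // Singleline lets '.' match newlines, so a body that spans
        // several lines is captured; IgnoreCase also accepts <BODY>.
        Match m = Regex.Match(
            html,
            @"<body\b[^>]*>(.*?)</body>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!m.Success)
        {
            return string.Empty; // no body element found
        }

        // Assumption: anything that looks like <...> is a tag to drop.
        string text = Regex.Replace(m.Groups[1].Value, @"<[^>]+>", string.Empty);
        return text.Trim();
    }

    public static void Main()
    {
        string s = "<html>\n<body>\nHi.\nHow are you?\n</body>\n</html>";
        Console.WriteLine(ExtractBodyText(s));
    }
}
```

This avoids the COM reference that mshtml needs, at the cost of robustness: for anything beyond simple, predictable pages the document object model route in #2 is the safer choice.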
--
Steve Walker