using mshtml to access HTML page elements

I was pleased to find that I could easily access all the links in a
page using this construct:

IHTMLDocument2 d = (IHTMLDocument2) ie.Document;
IHTMLElementCollection links = d.links;

but disappointed to find I couldn't do the same to get all my tables (using
something like d.tables). Instead I'm resorting to the naive approach of
iterating through d.all, casting each element to a table, and picking out
the objects that didn't come back null.

I realize it's horribly inefficient to cast every object to a table and
check for hits. Can you advise?

Here is the naive approach which is very slow:

SHDocVw.InternetExplorer ie = new SHDocVw.InternetExplorerClass();
object o = System.Reflection.Missing.Value;
object url = "file://" + Path.Combine(Directory.GetCurrentDirectory(),
    @"..\..\test\test1.html");

ie.Navigate2(ref url, ref o, ref o, ref o, ref o);
// Poll until IE reports that it is no longer busy loading the page.
while (ie.Busy) { Thread.Sleep(2); }
IHTMLDocument2 d = (IHTMLDocument2) ie.Document;
IHTMLElementCollection all = d.all;
foreach (object el in all)
{
    // Yields null for every element that is not a table.
    HTMLTableClass t = el as HTMLTableClass;
    if (t != null)
    {
        // Only print tables that contain exactly three cells.
        if (3 == t.cells.length)
        {
            foreach (HTMLTableRow c in t.rows)
            {
                Console.WriteLine(c.innerText);
            }
        }
    }
}
Nov 15 '05 #1
OK. I've made some progress in that I've found out why my naive approach
was so slow. Here is a well written piece from someone who seems to know
what he's talking about:

<snip>
David J. Marcus [@alhakol.com]

I have some fairly extensive experience traversing the DOM.

I can tell you unabashedly that the performance is absurdly bad.

To traverse a DOM of a medium sized web page on an 800MHz Pentium III using
C# takes up to 10 seconds !!!

I've posted the problem before and got no response from the folks at
Microsoft. Perhaps they are embarrassed by the results. The only response I
got was a vague reference to 'marshalling'.

In doing some more research, the problem turns out to be the marshalling of
data from the MSHTML control to the C# environment. In particular, be aware
that MSHTML creates a fully fleshed node for each HTML tag. This includes
ALL the possible attributes the node can ever have. It then marks each
attribute with a flag (which can be tested) which is 'true' if the attribute
was actually specified in the HTML. This approach is necessary because some
of the attributes have inherited values (meaning that unless the user
explicitly specifies them in the HTML, they contain an inherited value [or a
default value]).

The short of it: there are typically 100 attributes for most HTML tag
types. Multiply this by the number of tags in your HTML page and you get an
idea of the number of marshalling calls required (assuming it is good enough
to marshal an attribute in one call; if not, it is even worse).

By the way, traversing the same DOM in C++ is virtually instantaneous.

I hope this helps you.

-Regards David
</snip>

This was copied from http://www.dotnet247.com/247reference/msgs/8/41599.aspx
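
On the original question of getting only the tables: one way to avoid walking d.all is to let MSHTML do the filtering through IHTMLDocument3.getElementsByTagName, so that only table elements cross into managed code. Below is a minimal sketch, assuming the project references the mshtml interop assembly and that d is the IHTMLDocument2 obtained in the code from the first post. The TableScan class and PrintThreeCellTables method are names made up for illustration, and no timing was measured for this version.

```csharp
using System;
using mshtml;

static class TableScan
{
    // Hypothetical helper: prints the rows of every table with exactly three cells.
    public static void PrintThreeCellTables(IHTMLDocument2 d)
    {
        // The same document object also implements IHTMLDocument3.
        IHTMLDocument3 d3 = (IHTMLDocument3)d;

        // MSHTML returns only the <table> elements, so no per-element cast test is needed.
        IHTMLElementCollection tables = d3.getElementsByTagName("table");

        foreach (object el in tables)
        {
            IHTMLTable2 t2 = (IHTMLTable2)el;   // IHTMLTable2 exposes cells
            if (t2.cells.length != 3)
            {
                continue;
            }

            IHTMLTable t = (IHTMLTable)el;      // IHTMLTable exposes rows
            foreach (IHTMLElement row in t.rows)
            {
                Console.WriteLine(row.innerText);
            }
        }
    }
}
```

Call it as TableScan.PrintThreeCellTables(d) once the page has finished loading. d.all.tags("table") should give a similarly filtered collection; it returns object, so the result has to be cast to IHTMLElementCollection. Each remaining property access is still a marshalled COM call, so this reduces the number of calls but does not remove the per-call cost described above.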

Nov 15 '05 #2
