473,395 Members | 1,774 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Performance over COM

Recently I wrote a program to help my wife to extract data from about
500 excel files. For each file, around 1000 cells are visited and the
texts are retrieved.

Firstly I wrote a python script, used win32com module. The program ran
well, but took 78 mins to accomplished the task. Of course it was a
little bit slow, so I gave csharp a go. The csharp program took 72
mins.

Both Excel and .Net are MS product, run on a microsoft operating
system. However, I don't see any performance advantages. I don't want
to fire a script/compile laugange or opensource/ms war. My question is,
any tricks to speed COM related operation up in csharp, without using
multiple threading?

My code:
<pre>
Application excelApp=new Application();

// Go through 500+ excel files
foreach(string file in fileList){

excelApp.Workbooks.Open(file,Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing);

// Go through 1000+ cells in each file
foreach(DictionaryEntry de in cellHash){
string id=(string)de.Key;
string[] val=(string[])de.Value;
string
v=(string)excelApp.Sheet[val[1]].get_Range(val[0],Missing.Value).Text;
Console.WriteLine("{0}\t{1}\t{2}\t{3}",id,day,time ,v);
}

excelApp.Workbooks.Close();
}
</pre>

Sep 9 '06 #1
4 1383
If what you have shown us is, in fact, the code for your actual process then
some factors jump out and hit one between the eyes, including:

the retrieval of the value from the Excel cell is executed 500,000+
times

the Console.Writeline instructuion is also executed 500,000+ times

If you run the program with the Console.Writeline instruction commented out,
you will get a feel for just how long the interaction with Excel takes.

Each interaction with Excel is an expensive operation so the ideal situation
is to reduce the number of interactions to the absolute minimum.

You are reading from the same cells in each Workbook. Instead of reading
form each cell, try reading the the entire range into an array and then
reading the individual 'cells' from the array. This will reduce the number
of interactions with Excel some 1000 fold.

"Linan" <ta*******@gmail.comwrote in message
news:11**********************@e3g2000cwe.googlegro ups.com...
Recently I wrote a program to help my wife to extract data from about
500 excel files. For each file, around 1000 cells are visited and the
texts are retrieved.

Firstly I wrote a python script, used win32com module. The program ran
well, but took 78 mins to accomplished the task. Of course it was a
little bit slow, so I gave csharp a go. The csharp program took 72
mins.

Both Excel and .Net are MS product, run on a microsoft operating
system. However, I don't see any performance advantages. I don't want
to fire a script/compile laugange or opensource/ms war. My question is,
any tricks to speed COM related operation up in csharp, without using
multiple threading?

My code:
<pre>
Application excelApp=new Application();

// Go through 500+ excel files
foreach(string file in fileList){

excelApp.Workbooks.Open(file,Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing);

// Go through 1000+ cells in each file
foreach(DictionaryEntry de in cellHash){
string id=(string)de.Key;
string[] val=(string[])de.Value;
string
v=(string)excelApp.Sheet[val[1]].get_Range(val[0],Missing.Value).Text;
Console.WriteLine("{0}\t{1}\t{2}\t{3}",id,day,time ,v);
}

excelApp.Workbooks.Close();
}
</pre>

Sep 9 '06 #2
I will add to that that instead of opening Excell with Automation (something
that will always be slow whatever the scripting language you are using); you
can try reading the file using the JET OLEDB provider or the Excel ODBC
Provider:

http://support.microsoft.com/default...b;en-us;278973

http://msdn2.microsoft.com/en-us/library/ms175866.aspx

http://www.carlprothman.net/Default....rosoftJetExcel

http://www.carlprothman.net/Default....DriverForExcel

Finally, for this type of work, probably that you should use a database
(Access or SQL-Server) instead of Excel. Excel is something like a big
calculator, not a database.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF
"Stephany Young" <noone@localhostwrote in message
news:OS**************@TK2MSFTNGP05.phx.gbl...
If what you have shown us is, in fact, the code for your actual process
then some factors jump out and hit one between the eyes, including:

the retrieval of the value from the Excel cell is executed 500,000+
times

the Console.Writeline instructuion is also executed 500,000+ times

If you run the program with the Console.Writeline instruction commented
out, you will get a feel for just how long the interaction with Excel
takes.

Each interaction with Excel is an expensive operation so the ideal
situation is to reduce the number of interactions to the absolute minimum.

You are reading from the same cells in each Workbook. Instead of reading
form each cell, try reading the the entire range into an array and then
reading the individual 'cells' from the array. This will reduce the
number of interactions with Excel some 1000 fold.

"Linan" <ta*******@gmail.comwrote in message
news:11**********************@e3g2000cwe.googlegro ups.com...
>Recently I wrote a program to help my wife to extract data from about
500 excel files. For each file, around 1000 cells are visited and the
texts are retrieved.

Firstly I wrote a python script, used win32com module. The program ran
well, but took 78 mins to accomplished the task. Of course it was a
little bit slow, so I gave csharp a go. The csharp program took 72
mins.

Both Excel and .Net are MS product, run on a microsoft operating
system. However, I don't see any performance advantages. I don't want
to fire a script/compile laugange or opensource/ms war. My question is,
any tricks to speed COM related operation up in csharp, without using
multiple threading?

My code:
<pre>
Application excelApp=new Application();

// Go through 500+ excel files
foreach(string file in fileList){

excelApp.Workbooks.Open(file,Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing);

// Go through 1000+ cells in each file
foreach(DictionaryEntry de in cellHash){
string id=(string)de.Key;
string[] val=(string[])de.Value;
string
v=(string)excelApp.Sheet[val[1]].get_Range(val[0],Missing.Value).Text;
Console.WriteLine("{0}\t{1}\t{2}\t{3}",id,day,tim e,v);
}

excelApp.Workbooks.Close();
}
</pre>


Sep 9 '06 #3
On 9 Sep 2006 15:22:34 -0700, "Linan" <ta*******@gmail.comwrote:
>Recently I wrote a program to help my wife to extract data from about
500 excel files. For each file, around 1000 cells are visited and the
texts are retrieved.

Firstly I wrote a python script, used win32com module. The program ran
well, but took 78 mins to accomplished the task. Of course it was a
little bit slow, so I gave csharp a go. The csharp program took 72
mins.

Both Excel and .Net are MS product, run on a microsoft operating
system. However, I don't see any performance advantages. I don't want
to fire a script/compile laugange or opensource/ms war. My question is,
any tricks to speed COM related operation up in csharp, without using
multiple threading?

My code:
<pre>
Application excelApp=new Application();

// Go through 500+ excel files
foreach(string file in fileList){

excelApp.Workbooks.Open(file,Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing);

// Go through 1000+ cells in each file
foreach(DictionaryEntry de in cellHash){
string id=(string)de.Key;
string[] val=(string[])de.Value;
string
v=(string)excelApp.Sheet[val[1]].get_Range(val[0],Missing.Value).Text;
Console.WriteLine("{0}\t{1}\t{2}\t{3}",id,day,time ,v);
}

excelApp.Workbooks.Close();
}
</pre>
Ask your wife to save her files as CSV files and process them by reading the
text files produced by the Excel save to CSV. You will be able to read them
much faster and process them faster too.
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
Sep 10 '06 #4
The OleDb solution speeds up the program. Now it takes only 35 mins.

Thank you very much!

Sylvain Lafontaine (fill the blanks, no spam please) wrote:
I will add to that that instead of opening Excell with Automation (something
that will always be slow whatever the scripting language you are using); you
can try reading the file using the JET OLEDB provider or the Excel ODBC
Provider:

http://support.microsoft.com/default...b;en-us;278973

http://msdn2.microsoft.com/en-us/library/ms175866.aspx

http://www.carlprothman.net/Default....rosoftJetExcel

http://www.carlprothman.net/Default....DriverForExcel

Finally, for this type of work, probably that you should use a database
(Access or SQL-Server) instead of Excel. Excel is something like a big
calculator, not a database.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF
"Stephany Young" <noone@localhostwrote in message
news:OS**************@TK2MSFTNGP05.phx.gbl...
If what you have shown us is, in fact, the code for your actual process
then some factors jump out and hit one between the eyes, including:

the retrieval of the value from the Excel cell is executed 500,000+
times

the Console.Writeline instructuion is also executed 500,000+ times

If you run the program with the Console.Writeline instruction commented
out, you will get a feel for just how long the interaction with Excel
takes.

Each interaction with Excel is an expensive operation so the ideal
situation is to reduce the number of interactions to the absolute minimum.

You are reading from the same cells in each Workbook. Instead of reading
form each cell, try reading the the entire range into an array and then
reading the individual 'cells' from the array. This will reduce the
number of interactions with Excel some 1000 fold.

"Linan" <ta*******@gmail.comwrote in message
news:11**********************@e3g2000cwe.googlegro ups.com...
Recently I wrote a program to help my wife to extract data from about
500 excel files. For each file, around 1000 cells are visited and the
texts are retrieved.

Firstly I wrote a python script, used win32com module. The program ran
well, but took 78 mins to accomplished the task. Of course it was a
little bit slow, so I gave csharp a go. The csharp program took 72
mins.

Both Excel and .Net are MS product, run on a microsoft operating
system. However, I don't see any performance advantages. I don't want
to fire a script/compile laugange or opensource/ms war. My question is,
any tricks to speed COM related operation up in csharp, without using
multiple threading?

My code:
<pre>
Application excelApp=new Application();

// Go through 500+ excel files
foreach(string file in fileList){

excelApp.Workbooks.Open(file,Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing);

// Go through 1000+ cells in each file
foreach(DictionaryEntry de in cellHash){
string id=(string)de.Key;
string[] val=(string[])de.Value;
string
v=(string)excelApp.Sheet[val[1]].get_Range(val[0],Missing.Value).Text;
Console.WriteLine("{0}\t{1}\t{2}\t{3}",id,day,time ,v);
}

excelApp.Workbooks.Close();
}
</pre>
Sep 11 '06 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

25
by: Brian Patterson | last post by:
I have noticed in the book of words that hasattr works by calling getattr and raising an exception if no such attribute exists. If I need the value in any case, am I better off using getattr...
12
by: Fred | last post by:
Has anyone a link or any information comparing c and c++ as far as execution speed is concerned? Signal Processing algorithms would be welcome... Thanks Fred
12
by: serge | last post by:
I have an SP that is big, huge, 700-800 lines. I am not an expert but I need to figure out every possible way that I can improve the performance speed of this SP. In the next couple of weeks I...
6
by: teedilo | last post by:
We have an application with a SQL Server 2000 back end that is fairly database intensive -- lots of fairly frequent queries, inserts, updates -- the gamut. The application does not make use of...
5
by: Scott | last post by:
I have a customer that had developed an Access97 application to track their business information. The application grew significantly and they used the Upsizing Wizard to move the tables to SQL...
115
by: Mark Shelor | last post by:
I've encountered a troublesome inconsistency in the C-language Perl extension I've written for CPAN (Digest::SHA). The problem involves the use of a static array within a performance-critical...
13
by: bjarne | last post by:
Willy Denoyette wrote; > ... it > was not the intention of StrousTrup to the achieve the level of efficiency > of C when he invented C++, ... Ahmmm. It was my aim to match the performance...
13
by: Bern McCarty | last post by:
I have run an experiment to try to learn some things about floating point performance in managed C++. I am using Visual Studio 2003. I was hoping to get a feel for whether or not it would make...
7
by: Michael D. Ober | last post by:
When calling Enqueue, the internal array may need to be reallocated. My question is by how much? In the old MFC array classes, you could tell MFC how many additional elements to add to the array...
1
by: jvn | last post by:
I am experiencing a particular problem with performance counters. I have created a set of classes, that uses System.Diagnostics.PerformanceCounter to increment custom performance counters (using...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.