Hi,
Does anyone know a way to import excel data very fast?
I have spreadsheet data that has to be processed, the spreadsheet has
256 columns and 20,000 rows of data. That's 5,120,000 cells.
As the other poster said, this is a lot of data... so I would look to
see if there is something that can be done there to reduce it. But even
if this is already the less data you need, you can still load it very
fast.
The cells can contain any type of data, but the data has to be
coverted to Double arrays. Currently I read the data from the
spreadsheed using OleDB and put the data into datagrid which is not
too bad - takes maybe a minute, but when I move the data from the
datagrid to double array by looping through each row and column the
process takes about and hour on my laptop with 512 mb. The process
is slow because I have to test each cell to make sure it's double
type and put zero in the array if it's not.
2 things here.
First, I don't think this is related to converting to doubles, this
should be very fast.
There might be 2 reasons for this being slow:
a)OleDb is not really loading the data on that minute, but only some
records. When you actually read the values there is when it loads the
valus from the xls file.
b) A datagrid is slow because it has to visually update the data. Can
you load the data into a dataset instead? this should be quite faster.
Second thing, if you are serious about xls files, I would look for a
third party option. There are a lot of them, and for a reason...
We develop one of those solutions so I will only speak about it, but
you can google around for the others.
I wanted to test how the speed would be, so I created a file with 256
columns, with value 1 on column 1, 2 on 2... etc . Then I copied the
row 20,000 times and added some random data so it would not be so easy.
I tried it with our program, and it took a little less than 10 seconds
to read the whole file and fill the double[][] arrays. (on a 3ghz
pentium 512 mb ram laptop)
If you want to test it yourself, go to
www.tmssoftware.com/go/?flexcelnet
and download our product. Then create a new windows project, add a
reference to flexcel.dll, and add this using clauses:
using FlexCel.Core;
using FlexCel.XlsAdapter;
After that, drop a button on a form, double click it and paste this
code:
private void button1_Click(object sender, System.EventArgs e)
{
DateTime StartTime = DateTime.Now;
XlsFile xls = new XlsFile(true);
xls.Open("toomanycells.xls");
double[][] data = new double[xls.RowCount][];
for (int r = xls.RowCount; r >0; r --)
{
data[r-1] = new double[256];
for (int cindex = xls.ColCountInRow(r); cindex > 0;
cindex--)
{
int c = xls.ColFromIndex(r, cindex);
int XF = 0;
object o = xls.GetCellValueIndexed(r, cindex, ref
XF);
if (o is double)
{
data[r-1][c-1] = (double) o;
}
else
{
TFormula f = o as TFormula;
if (f!=null)
{
if (f.Result is double)
{
data[r][c] = (double)f.Result;
}
}
}
}
}
DateTime EndTime = DateTime.Now;
MessageBox.Show((EndTime - StartTime).ToString());
}
}
I am really interested in knowing your numbers with your file. Please
if you decide to try it let me know how it goes. (or just post an
answer here) Remember to compile on release mode for the test, even
when that will not afect the results very much.
Hope this helps
Best regards,
Adrian.