Hi there,
I need to parse a lot of HTML files from Wikipedia, and I need to do it
as fast as possible. So I started testing with XmlTextReader,
but the results confuse me. The reader ALWAYS seems to need
about 1 second for the first Read() call, regardless of file size.
Here's my test code:
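    // Needs: using System; using System.Xml; using System.Windows.Forms;
    // textBox1.Text holds the location of the file to parse.
    XmlTextReader _myReader = new XmlTextReader(textBox1.Text);
    DateTime _start = DateTime.Now;

    // Time the first Read() separately from the rest of the document.
    _myReader.Read();
    DateTime _firstRead = DateTime.Now;

    // Consume the remaining nodes without doing any work on them.
    while (_myReader.Read())
    {
    }
    DateTime _end = DateTime.Now;
    _myReader.Close();

    MessageBox.Show("FirstRead: " + Convert.ToString(_firstRead - _start) +
        ". Overall: " + Convert.ToString(_end - _start));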
And here are the results (in seconds) for a 134 KB file:
FirstRead: 0.9218750. Overall: 0.9375
I get a similar result for a 15 KB file (and, by the way, the same result
when using DOM).
Any ideas why the first read takes so long and what I can do about it?
I have downloaded all of Wikipedia and extracted it to the file system.
Could that be the reason?
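
One guess that could be tested: if the files start with a DOCTYPE declaration, the reader may be trying to load the external DTD it points to, which would cost roughly the same time regardless of file size. The sketch below is only an illustration under that assumption. ReadTiming and Measure are made-up names, and it reads from a TextReader instead of the location in textBox1. When resolveExternal is false it sets XmlResolver to null so that no external resources are fetched, and it uses Stopwatch because DateTime.Now has a coarse resolution on Windows.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original test code.
public static class ReadTiming
{
    // Reads the whole input and reports the time taken by the first Read(),
    // the total time, and the number of nodes visited.
    public static (TimeSpan First, TimeSpan Total, int Nodes) Measure(
        TextReader input, bool resolveExternal)
    {
        using (XmlTextReader reader = new XmlTextReader(input))
        {
            if (!resolveExternal)
            {
                // Assumption under test: with no resolver, the reader
                // does not load the external DTD named in a DOCTYPE.
                reader.XmlResolver = null;
            }

            Stopwatch watch = Stopwatch.StartNew();

            // Time the first Read() on its own.
            int nodes = reader.Read() ? 1 : 0;
            TimeSpan first = watch.Elapsed;

            // Consume the remaining nodes without processing them.
            while (reader.Read())
            {
                nodes++;
            }

            return (first, watch.Elapsed, nodes);
        }
    }
}
```

Calling Measure once with resolveExternal set to true and once with false on the same file should show whether the external lookup accounts for the delay. If it does, keep in mind that entities declared only in the DTD may no longer be resolved when the resolver is null.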
Ciao,
Frank