Hi,
I'm trying to learn a bit about performance, hope someone can help me out.
I have a text file with 8-bit characters in it. In order to improve performance, I'm using a BinaryReader instead of a StreamReader. I've made two versions of my method, one which uses typesafe code, and one which uses unsafe code with pointers. I've read in several places that direct pointer access will eliminate bounds-checking when accessing an array, and would like to see the effect myself. However, the typesafe code is as fast as or faster than the unsafe code. The methods are practically identical, except for the access bit:
Shared code:

BinaryReader reader = ... // Open data reader
int fileSize = ... // Get size of file
int bufSize = 32768;
byte[] buf = null;
int bytesRead = 0;
int totalBytesRead = 0;
byte msgStartCode = (byte) '$';
Typesafe version is like:

do {
    buf = reader.ReadBytes(bufSize);
    bytesRead = buf.Length;
    totalBytesRead += bytesRead;
    for (int bufIndex = 0; bufIndex < buf.Length; bufIndex++) {
        if (buf[bufIndex] == msgStartCode) {
            // Parse message
        }
    } // for
} while (totalBytesRead < fileSize);
whereas the unsafe version is like:

do {
    buf = reader.ReadBytes(bufSize);
    bytesRead = buf.Length;
    totalBytesRead += bytesRead;
    // Pin memory
    fixed (byte* bufPtrUnsigned = &buf[0]) {
        sbyte* bufPtr = (sbyte*) bufPtrUnsigned; // Use to build message string
        for (int bufIndex = 0; bufIndex < buf.Length; bufIndex++) {
            if (buf[bufIndex] == msgStartCode) {
                // Parse message, using e.g. new String(bufPtr, bufIndex, msgLength)
            }
        } // for
    } // fixed
} while (totalBytesRead < fileSize);
Shouldn't the last version be faster?
Thanks in advance for any help!
Darn typos...
"Einar Hřst" <an*******@discussions.microsoft.com> wrote in message
news:DB**********************************@microsof t.com... Hi,
I'm trying to learn a bit about performance, hope someone can help me out.
I have a text file with 8-bit characters in it. In order to improve
performance, I'm using a BinaryReader instead of a StreamReader. I've made
two versions of my method, one which uses typesafe code, and one which uses
unsafe code with pointers. I've read several places that direct pointer
access will eliminate bounds-checking when accessing an array, and would
like to see the effect myself. However, the typesafe code is as fast or
faster than the unsafe code. The methods are practically identical, except
for the access bit: Shared code: BinaryReader reader = ... // Open data reader. int fileSize = ... // Get size of file. int bufSize = 32768; byte[] buf = null; int bytesRead = 0; int totalBytesRead = 0; byte msgStartCode = (byte) '$';
Typesafe version is like: do { buf = reader.ReadBytes(bufSize); bytesRead = buf.Length; totalBytesRead += bytesRead;
for (int bufIndex = 0; bufIndex < buf.Length; bufIndex++) { if (buf[bufIndex] == msgStartCode) { // Parse message. } } // for
} while (totalBytesRead < fileSize);
wheras the unsafe version is like:
do { buf = reader.ReadBytes(bufSize); bytesRead = buf.Length; totalBytesRead += bytesRead;
// Pin memory. fixed (byte* bufPtrUnsigned = &buf[0]) { sbyte* bufPtr = (sbyte*) bufPtrUnsigned; // Use to build message
string for (int bufIndex = 0; bufIndex < buf.Length; bufIndex++) { if (buf[bufIndex] == msgStartCode)
Sorry, in the unsafe version the comparison

if (buf[bufIndex] == msgStartCode)

should of course be:

if (bufPtr[bufIndex] == msgStartCode)

...cut & paste is a bad idea.
Einar Høst <an*******@discussions.microsoft.com> wrote: I have a text file with 8-bit characters in it.
What *exactly* do you mean by "8-bit characters"? Which encoding is the
file using?
In order to improve performance, I'm using a BinaryReader instead of a StreamReader.
What makes you think BinaryReader will be faster than StreamReader?
In particular, are you sure you have a performance problem to start
with? How have you identified this to be the bottleneck?
I've made two versions of my method, one which uses typesafe code, and one which uses unsafe code with pointers. I've read several places that direct pointer access will eliminate bounds-checking when accessing an array, and would like to see the effect myself. However, the typesafe code is as fast or faster than the unsafe code. The methods are practically identical, except for the access bit:
Bounds checking is often removed by the JIT compiler anyway.
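For instance, when a loop's bound is the array's own Length property, as in your typesafe version, the JIT can usually prove the index is always in range and drop the per-access check. A minimal sketch of that pattern (the method name here is just made up for illustration):

// Because the loop condition compares against buf.Length directly,
// the JIT can typically eliminate the bounds check on buf[i].
static int CountStartCodes(byte[] buf, byte msgStartCode) {
    int count = 0;
    for (int i = 0; i < buf.Length; i++) {
        if (buf[i] == msgStartCode) {
            count++;
        }
    }
    return count;
}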
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Hi Jon,
Thanks for your reply.
You see, I'm a bit of a curious newbie. I don't really have any performance problems, I just want to try tweaking my routine in order to learn more about performance and .NET. It's a learning thing more than a necessity. And one of the things I learn from is good questions that make me think :-)
Regarding the encoding of the file, I'm not really sure what it is, but it seems that the number of bytes in the file corresponds to the number of characters. Is there any check I can do to determine the encoding precisely?
I don't know if BinaryReader is any faster than StreamReader. The move from StreamReader to BinaryReader was basically done because I wanted to do without the ReadLine method. You see, I'm reading this file containing messages starting with a dollar sign. I just want one of six messages, and so I thought I'd create less garbage by not creating strings for the unwanted messages. I was examining the app in CLR Profiler, and found I allocated 16MB to read a file of 3.5MB, of which 8.5MB was strings. Creating my own byte-buffer and searching for dollar signs has reduced the allocation amount to approximately 6.5MB, of which 3.5MB is bytes and 2.0MB is strings. The number of garbage collections went down from 44 to 4! I've doubled the speed of my routine. To try to squeeze out a little more, I decided to experiment with unsafe code, but so far, I haven't got much effect out of that. I guess the benefits are small compared to the other processing I'm doing in my routine...
If you have any further comments, I'd love to hear them!
Einar Høst <an*******@discussions.microsoft.com> wrote: You see, I'm a bit of a curious newbie. I don't really have any performance problems, I just want to try tweaking my routine in order to learn more about performance and .NET. It's a learning thing more than a necessity. And one of the things I learn from is good questions that make me think :-)
While in general I applaud such sentiments (and I like tweaking with
things myself) I would recommend avoiding unsafe code until you
*really* need it. I haven't even *looked* at it myself, on the grounds
that I can't see myself needing it and if I don't actually know it,
I'll be less tempted to start using it where I don't really need it.
Regarding the encoding of the file, I'm not really sure what it is, but it seems that the number of bytes in the file corresponds to the number of characters. Is there any check I can do to determine the encoding precisely?
Not really - it could be any number of encodings. What's producing the
file in the first place?
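About the only cheap test is to look at the first few bytes for a Unicode byte order mark - and a file with one byte per character won't have one, which is why the bytes alone can't tell you which 8-bit encoding it is. A rough sketch of that check (just an illustration; the method name is made up):

using System.IO;

// Peek at the start of the file for a BOM; no BOM means we can't tell.
static string GuessEncodingFromBom(string path) {
    byte[] bom = new byte[3];
    int n;
    using (FileStream fs = File.OpenRead(path)) {
        n = fs.Read(bom, 0, bom.Length);
    }
    if (n >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF) return "UTF-8";
    if (n >= 2 && bom[0] == 0xFF && bom[1] == 0xFE) return "UTF-16 (little-endian)";
    if (n >= 2 && bom[0] == 0xFE && bom[1] == 0xFF) return "UTF-16 (big-endian)";
    return "No BOM - could be any single-byte encoding";
}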
I don't know if BinaryReader is any faster than StreamReader. The move from StreamReader to BinaryReader was basically done because I wanted to do without the ReadLine method.
Well, you can use StreamReader without using ReadLine. You can read a
character at a time, or a block of characters.
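For example, something along these lines (just a sketch; the file name and buffer size are arbitrary):

using (StreamReader sr = new StreamReader("messages.log")) {
    // One character at a time (returns -1 at end of stream):
    int c = sr.Read();

    // Or a block of characters at a time:
    char[] block = new char[4096];
    int charsRead = sr.Read(block, 0, block.Length);
}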
You see, I'm reading this file containing messages starting with a dollar sign. I just want one of six messages, and so I thought I'd create less garbage by not creating strings for the unwanted messages. I was examining the app in CLR Profiler, and found I allocated 16MB to read a file of 3.5MB, of which 8.5MB was strings.
That sounds about right, yes, assuming the whole thing was being loaded
at a time.
Creating my own byte-buffer and searching for dollar signs has reduced the allocation amount to approximately 6.5MB, of which 3.5MB is bytes and 2.0MB is strings. The number of garbage collections went down from 44 to 4! I've doubled the speed of my routine. To try to squeeze out a little more, I decided to experiment with unsafe code, but so far, I haven't got much effect out of that. I guess the benefits are small compared to the other processing I'm doing in my routine...
To find the dollars, you could read chunks in at a time (e.g. 16K
chars) into a fixed buffer, and search within that buffer. When you've
found the appropriate dollar, read the rest of that buffer and then all
the buffers after that (or whatever).
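In rough outline, something like this (the buffer size and file name are placeholders):

// Read chunks of characters and scan each chunk for the '$' start code.
char[] buffer = new char[16384];
int charsRead;
using (StreamReader reader = new StreamReader("messages.log")) {
    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0) {
        for (int i = 0; i < charsRead; i++) {
            if (buffer[i] == '$') {
                // Message starts here - read the rest of this buffer
                // (and the following buffers) to pick up the message body.
            }
        }
    }
}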
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com... Einar Høst <an*******@discussions.microsoft.com> wrote: You see, I'm a bit of a curious newbie. I don't really have any performance problems, I just want to try tweaking my routine in order to learn more about performance and .NET. It's a learning thing more than a necessity. And one of the things I learn from is good questions that make me think :-)
While in general I applaud such sentiments (and I like tweaking with things myself) I would recommend avoiding unsafe code until you *really* need it. I haven't even *looked* at it myself, on the grounds that I can't see myself needing it and if I don't actually know it, I'll be less tempted to start using it where I don't really need it.
Indeed, I think it will be a while before I include it in any of my
professional work - if ever. As it turns out, the unsafe approach was
slightly faster (5-10%) than the typesafe one when I did no other
processing, just looked for dollars. However, once I started doing other
stuff - even with the exact same code - it all evened out. Perhaps some side
effect of having pinned memory for a prolonged time?

Regarding the encoding of the file, I'm not really sure what it is, but it seems that the number of bytes in the file corresponds to the number of characters. Is there any check I can do to determine the encoding precisely?
Not really - it could be any number of encodings. What's producing the file in the first place?
I guess I could find out - it's a data logging program written by a
co-worker. It's reading data from the serial port and persisting it to file.
It's written in C++... I'd guess the guy who wrote it used some default
value in the win32 API if possible.

I don't know if BinaryReader is any faster than StreamReader. The move from StreamReader to BinaryReader was basically done because I wanted to do without the ReadLine method.
Well, you can use StreamReader without using ReadLine. You can read a character at a time, or a block of characters.
Yeah, I guess you're right - still, C# characters are 16 bit, right? In
general, would there be any performance differences if the two classes are
used for the same task, I wonder?

Creating my own byte-buffer and searching for dollar signs has reduced the allocation amount to approximately 6.5MB, of which 3.5MB is bytes and 2.0MB is strings. The number of garbage collections went down from 44 to 4! I've doubled the speed of my routine. To try to squeeze out a little more, I decided to experiment with unsafe code, but so far, I haven't got much effect out of that. I guess the benefits are small compared to the other processing I'm doing in my routine...
To find the dollars, you could read chunks in at a time (e.g. 16K chars) into a fixed buffer, and search within that buffer. When you've found the appropriate dollar, read the rest of that buffer and then all the buffers after that (or whatever).
Indeed, this is approximately what I do - I read 32K bytes, scan for
dollars, check the message type, skip some bytes if it's one of the five I
don't want, parse it otherwise. If I need some extra bytes to figure out the
message type or message content, I read the amount I need from the stream.
Thanks again!
Einar Buffer <_e*******@hotmail.com> wrote: While in general I applaud such sentiments (and I like tweaking with things myself) I would recommend avoiding unsafe code until you *really* need it. I haven't even *looked* at it myself, on the grounds that I can't see myself needing it and if I don't actually know it, I'll be less tempted to start using it where I don't really need it.
Indeed, I think it will be a while before I include it in any of my professional work - if ever. As it turns out, the unsafe approach was slightly faster (5-10%) than the typesafe one when I did no other processing, just looked for dollars. However, once I started doing other stuff - even with the exact same code - it all evened out. Perhaps some side effect of having pinned memory for a prolonged time?
Probably more that the actual work was the bottleneck, not looking for
the dollars.

Regarding the encoding of the file, I'm not really sure what it is, but it seems that the number of bytes in the file corresponds to the number of characters. Is there any check I can do to determine the encoding precisely?
Not really - it could be any number of encodings. What's producing the file in the first place?
I guess I could find out - it's a data logging program written by a co-worker. It's reading data from the serial port and persisting it to file. It's written in C++... I'd guess the guy who wrote it used some default value in the win32 API if possible.
It may well be Encoding.Default - the default ANSI encoding for the platform. (That's not the default encoding for StreamReader though.)
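If it does turn out to be the ANSI code page, you'd need to say so explicitly when constructing the reader, since StreamReader assumes UTF-8 by default - something like this (sketch; the file name is made up):

using System.IO;
using System.Text;

// Read the logger's output using the system ANSI code page rather than
// StreamReader's default of UTF-8.
using (StreamReader sr = new StreamReader("serial.log", Encoding.Default)) {
    // ... read characters or blocks as usual ...
}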
I don't know if BinaryReader is any faster than StreamReader. The move from StreamReader to BinaryReader was basically done because I wanted to do without the ReadLine method.
Well, you can use StreamReader without using ReadLine. You can read a character at a time, or a block of characters.
Yeah, I guess you're right - still, C# characters are 16 bit, right? In general, would there be any performance differences if the two classes are used for the same task, I wonder?
Probably not, but I wouldn't like to say for sure.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too