Reading an Ascii string

Hi,

I'm a beginner in using C# and .NET.

I have big legacy files that store various values (ints, bytes, strings) and want to read them into
a C# programme so that I can store them in a database. The files are written by a late 1980's PC
Pascal programme, for which I don't have the source code. I've managed to reverse engineer the file
format.

The strings are stored as Ascii in the file, with the first byte indicating the string length, and
the rest are the Ascii (ie 8-bit) characters. The string length is always 0, 20 or 40 characters
(never any more) and strings are end-padded with space characters where necessary.

What is the best way to quickly read a string and get rid of the space padding at the end? To make
sure I can read them correctly, I'll put them in a text box. I assume the string used in a text box
uses 16-bit characters (Unicode?) but I may be wrong here. When I'm happy I can read them correctly,
I'll get rid of the text box and store them directly in the database. Is it best to store it in the
database as unicode? I'm tempted to use Ascii for efficiency.

I was thinking of using a binary reader (_br) to extract from the file. That should be fine for
everything, but I don't know how to cope with the Ascii strings.
Jul 8 '06 #1
Hi John,

First, ASCII is 7-bit, and any character above 127 will need the proper encoding to be read right.
I'm assuming the characters are stored as 8 bit.

You can either read the FileStream directly or as a single string from a textbox.

You will need a loop, which in this case would be something like

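// needs: using System.Collections.Generic; using System.Text;
// data is the whole file as a byte[], e.g. from File.ReadAllBytes(filepath)
int index = 0;
List<string> strings = new List<string>();

while (index < data.Length)
{
    int numBytes = data[index];
    index++;

    // copy the next numBytes bytes to a new string
    string newString = Encoding.ASCII.GetString(data, index, numBytes);
    index += numBytes;

    // remove space padding if needed
    strings.Add(newString.TrimEnd(' '));

    // index++ here if there is an extra byte between strings
}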

It may be easiest to treat the data as a char array or as a string, in which case a textbox should be easy enough. Using a FileStream you would need to read the file as ASCII.

File.ReadAllText(filepath, System.Text.Encoding.ASCII); (C# 2.0)

If the file isn't ASCII, but uses all 8 bits for data, then you need to figure out the correct encoding by trial and error.
--
Happy coding!
Morten Wennevik [C# MVP]
Jul 8 '06 #2
Check out the System.Text Namespace, specifically the Encoding, Encoder, and
Decoder classes.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

Big thicks are made up of lots of little thins.
Jul 8 '06 #3
John wrote:
> Hi,
>
> I'm a beginner in using C# and .NET.
>
> I have big legacy files that store various values (ints, bytes, strings) and want to read them into
> a C# programme so that I can store them in a database. The files are written by a late 1980's PC
> Pascal programme, for which I don't have the source code. I've managed to reverse engineer the file
> format.
>
> The strings are stored as Ascii in the file, with the first byte indicating the string length, and
> the rest are the Ascii (ie 8-bit) characters.

Yes, that's how strings are stored in Pascal.

> The string length is always 0, 20 or 40 characters
> (never any more) and strings are end-padded with space characters where necessary.

Does the length include the padding or not? If it does, you just have to
trim the string. If it doesn't, you have to calculate how much padding
there is from the length of the string, and skip that number of bytes.

> What is the best way to quickly read a string and get rid of the space padding at the end?

Read the length using ReadByte, then use the ReadChars method to get the
string. You get an array of Char; if you want a string, just create one
from the array.

> To make
> sure I can read them correctly, I'll put them in a text box. I assume the string used in a text box
> uses 16-bit characters (Unicode?) but I may be wrong here. When I'm happy I can read them correctly,
> I'll get rid of the text box and store them directly in the database. Is it best to store it in the
> database as unicode? I'm tempted to use Ascii for efficiency.
>
> I was thinking of using a binary reader (_br) to extract from the file. That should be fine for
> everything, but I don't know how to cope with the Ascii strings.

Yes, a BinaryReader is exactly what I would suggest using.

Specify the encoding when you create the BinaryReader; that way it can
handle reading chars, and you don't have to read bytes and decode them.

ASCII encoding won't work if the strings contain extended characters
(above 127). Use Encoding.GetEncoding(850) to get the encoding for
extended ASCII.
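
A minimal sketch of the ReadByte / ReadChars approach described above, assuming the length byte counts the space padding and that the data is plain ASCII. The helper name `ReadPaddedString` and the in-memory sample record are made up for illustration; for real files you would open a FileStream and pass whichever encoding turns out to be correct. It is written as top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.IO;
using System.Text;

// Sample record built in memory: length byte 0x14, then "ABC" padded to 20 characters.
byte[] record = new byte[21];
record[0] = 0x14;
Encoding.ASCII.GetBytes("ABC".PadRight(20)).CopyTo(record, 1);

// Passing the encoding to the BinaryReader lets ReadChars do the decoding.
using (var reader = new BinaryReader(new MemoryStream(record), Encoding.ASCII))
{
    string text = ReadPaddedString(reader);
    Console.WriteLine("[" + text + "]"); // prints [ABC]
}

// Hypothetical helper (not from the thread): reads one length-prefixed string.
static string ReadPaddedString(BinaryReader input)
{
    int length = input.ReadByte();          // stored length: 0, 20 or 40
    char[] chars = input.ReadChars(length); // decoded with the reader's encoding
    return new string(chars).TrimEnd(' ');  // drop the trailing space padding
}
```

With a single-byte encoding, the number of chars requested from ReadChars equals the number of bytes consumed, which is what keeps the reader aligned with the next field.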
Jul 8 '06 #4
Göran Andersson <gu***@guffa.com> wrote:

<snip>
> ASCII encoding won't work if the strings contain extended characters
> (above 127). Use Encoding.GetEncoding(850) to get the encoding for
> extended ASCII.

Well, use GetEncoding(850) to get one particular form of "extended
ASCII". Several different code pages have been called "extended ASCII"
over the course of time.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 8 '06 #5
Thanks for all your replies.

Just to clarify...

"ASCII is 7-bit, and any character above 127 will need the proper encoding to be read right. I'm
assuming the characters are stored as 8 bit."

Sorry for being imprecise. When I said "Ascii (ie 8-bit) characters", I meant that they are stored as
bytes rather than 16-bit quantities as unicode requires. The Ascii characters are all 7-bit from a
quick glance but really I ought to write a quick test programme to check this, which would find and
flag up non-ascii characters. Then I could try to deduce what the encoding is. The original
DOS-based [stock control] programme that created the file was by a UK company and this software was
used on a PC in the UK to generate the files. Is there a standard type of encoding for the UK? Is
this likely to be extended ASCII, for which Göran Andersson suggested using
Encoding.GetEncoding(850)? Thanks Jon Skeet for your comment about extended Ascii. I suppose that
there could potentially be things like a letter e with an acute accent, and I don't want to mangle
these. Once I've discovered the encoding, I'm pretty certain that it will be consistent across all
the files.
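
The quick test programme mentioned above could look roughly like this. It is only a sketch: `FindNonAsciiOffsets` is a made-up helper name, and the sample bytes stand in for the contents of one string field. Since the files also hold ints and bytes, scanning a whole file would flag those as well, so the check is only meaningful when run over the string fields. Written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Sample data standing in for one string field; 0x82 is the only byte above 127.
byte[] sample = { 0x43, 0x61, 0x66, 0x82, 0x20, 0x20 };

foreach (int offset in FindNonAsciiOffsets(sample))
{
    // prints: Non-ASCII byte 0x82 at offset 3
    Console.WriteLine("Non-ASCII byte 0x{0:X2} at offset {1}", sample[offset], offset);
}

// Hypothetical helper: returns the offsets of all bytes with the high bit set.
static List<int> FindNonAsciiOffsets(byte[] data)
{
    var offsets = new List<int>();
    for (int i = 0; i < data.Length; i++)
    {
        if (data[i] > 127)
        {
            offsets.Add(i);
        }
    }
    return offsets;
}
```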

Göran Andersson: "Does the length include the padding or not?"

It does include padding, so a string with three characters appears as byte 0x14 (ie 20) followed by
the three characters followed by 17 space characters. I will trim the string.
Jul 8 '06 #6
Thanks, that's very helpful.

"Read the length using ReadByte, then use the ReadChars method to get the string. You get an array
of Char, if you want a string just create one from the array."

I've just tried this. How do I create a string from an array of char?

The following didn't work - ToString() could not do the conversion:

    string str;
    char[] charArray;
    for(...){
        str = charArray.ToString();
    }

This worked, but it seems very inefficient to have to create a new string every time:

    char[] charArray;
    for(...){
        str = new string(charArray);
    }
Jul 8 '06 #7
String s = new String(chararray);
--
Happy coding!
Morten Wennevik [C# MVP]
Jul 8 '06 #8
Sorry, a bit fast on the send button there.

new String(charArray) is the way to go. It is not any less efficient than ToString would be since ToString would create a new string as well.

Array.ToString is not overridden and will merely return the object type.
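
To illustrate the difference, here is a small sketch (top-level statements, C# 9 or later; the variable names are arbitrary):

```csharp
using System;

char[] charArray = { 'A', 'B', 'C' };

string viaToString = charArray.ToString();     // the type name, not the contents
string viaConstructor = new string(charArray); // the characters themselves

Console.WriteLine(viaToString);    // prints System.Char[]
Console.WriteLine(viaConstructor); // prints ABC
```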

--
Happy coding!
Morten Wennevik [C# MVP]
Jul 8 '06 #9
Here you can see the most common DOS code pages:

http://en.wikipedia.org/wiki/Codepage

Codepage 850 is the most likely for a British computer.
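
One way to compare candidates is to decode the same suspicious bytes under each code page and see which result reads sensibly. The sketch below does that for 437 and 850; the sample bytes are made up. In both code pages 0x82 is é and 0x9C is £, while 0x9B is ¢ in 437 but ø in 850, so bytes like that last one are the ones that tell the two apart. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core and .NET 5 or later, whereas the .NET Framework supports these code pages without it. Written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

// Makes the DOS code pages available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Made-up sample of high bytes as they might be found in a string field.
byte[] sample = { 0x82, 0x9C, 0x9B };

foreach (int codePage in new[] { 437, 850 })
{
    Encoding encoding = Encoding.GetEncoding(codePage);
    // How the characters display depends on the console's own encoding.
    Console.WriteLine("{0}: {1}", codePage, encoding.GetString(sample));
}
```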

Jul 8 '06 #10
Thanks Morten,

Does this mean that if I have say 1 million strings to read in, a new string must be allocated for
each? This must add substantial overhead. If so, what is the earliest time the strings can be
garbage collected by the runtime - within the for loop or (sounds very inefficient) at the end of
the for loop?

I'm from a C background where I knew exactly what was happening, so what appears to me to be
happening with the C# code looks extremely inefficient, although my knowledge of what goes on behind
the scenes is poor, so I may be missing something.

I know that StringBuilder doesn't allocate new strings whenever a change is made to a string, so
would it be possible (and more efficient) to use this instead?
Jul 9 '06 #11
Basically, each time you modify or read a string, a new string is created. Strings are unique in this matter. Once they are created they cannot be changed. Read Jon Skeet's article:

[Strings in .NET and C#]
http://www.yoda.arachsys.com/csharp/strings.html

If your goal is to assemble a long string out of several smaller ones you might benefit from using other methods, for example storing all the characters in an array until the complete sentence is read, and only then turning it into a string.

The StringBuilder is much better at concatenating several strings than string = string + string, but you won't notice any difference unless you have many strings to concatenate.

If a string might be split into N blocks, I would probably use ReadBytes and store the result in a premade array using Array.Copy/CopyTo or Buffer.BlockCopy.
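
A rough sketch of that idea, assuming ASCII data and a maximum string length of 40 as in the original question. The two byte arrays are made-up stand-ins for what two ReadBytes calls might return. Written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

// Two blocks as they might come back from BinaryReader.ReadBytes (made-up sample data).
byte[] first = Encoding.ASCII.GetBytes("Stock ");
byte[] second = Encoding.ASCII.GetBytes("control");

// Premade array, large enough for the longest string in the file (40 characters).
byte[] buffer = new byte[40];
int used = 0;

Buffer.BlockCopy(first, 0, buffer, used, first.Length);
used += first.Length;
Buffer.BlockCopy(second, 0, buffer, used, second.Length);
used += second.Length;

// Decode once, after the whole string has been assembled.
string text = Encoding.ASCII.GetString(buffer, 0, used);
Console.WriteLine(text); // prints Stock control
```

The buffer can be reused for every string in the file, so the only per-string allocation left is the final string itself.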

--
Happy coding!
Morten Wennevik [C# MVP]
Jul 9 '06 #12
John wrote:
> Thanks Morten,
>
> Does this mean that if I have say 1 million strings to read in, a new string must be allocated for
> each?

Yes.

> This must add substantial overhead. If so, what is the earliest time the strings can be
> garbage collected by the runtime - within the for loop or (sounds very inefficient) at the end of
> the for loop?

They can be garbage collected as soon as they are not used any more. If
you are creating the strings in a loop, it means that all strings that
you created can be collected, except the last one that you are still using.

> I'm from a C background where I knew exactly what was happening, so what appears to me to be
> happening with the C# code looks extremely inefficient, although my knowledge of what goes on behind
> the scenes is poor, so I may be missing something.

It's normal for a .NET program to allocate and release quite some memory
during execution. The garbage collector handles the deallocation, and
also moves allocated objects so that the memory doesn't get fragmented.

To allocate and release objects is faster in a garbage collected
environment than in a traditional heap that uses reference counters.

> I know that StringBuilder doesn't allocate new strings whenever a change is made to a string, so
> would it be possible (and more efficient) to use this instead?

I don't think that using a StringBuilder just to avoid creating objects
is going to make any big difference in performance.

If you are using a StringBuilder anyway, there is an overload of the
Append method that takes a char array, so then you wouldn't need the
step of creating the string from the array.
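
For reference, a small sketch of the StringBuilder.Append overloads that accept a char array. The array here is a made-up stand-in for what ReadChars would return. Written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

// Stands in for the char[] returned by ReadChars: "ABC" padded to 20 characters.
char[] chars = "ABC".PadRight(20).ToCharArray();

var builder = new StringBuilder();
builder.Append(chars);       // Append(char[]): copies the whole array, no intermediate string
builder.Append(chars, 0, 3); // Append(char[], int, int): copies only the first three chars

Console.WriteLine(builder.Length); // prints 23
```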

Jul 9 '06 #13
Morten Wennevik <Mo************@hotmail.com> wrote:
> Basically, each time you modify or read a string, a new string is
> created. Strings are unique in this matter.

They're not particularly unique - it's easy to create your own
immutable type, and often that can be a really good idea. (It makes it
easy to be thread-safe etc.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 9 '06 #14

John wrote:
> Thanks for all your replies.
>
> Just to clarify...
>
> "ASCII is 7-bit, and any character above 127 will need the proper encoding to be read right. I'm
> assuming the characters are stored as 8 bit."
>
> Sorry for being imprecise. When I said "Ascii (ie 8-bit) characters", I meant that they are stored as
> bytes rather than 16-bit quantities as unicode requires. The Ascii characters are all 7-bit from a
> quick glance but really I ought to write a quick test programme to check this, which would find and
> flag up non-ascii characters. Then I could try to deduce what the encoding is.

You will have to forgive the group for jumping on your comment... we
see a LOT of people who are completely oblivious to the many
complexities of encoding, so when someone says "ascii (ie 8-bit)" our
tripwires are tripped... it's rare (but nice) to have someone such as
yourself who actually knows that there are potential problems in store
in situations like this.

--
Larry Lard
Replies to group please
When starting a new topic, please mention which version of VB/C# you
are using

Jul 10 '06 #15
Thanks Larry, and everyone else for your very helpful comments. I'm pleased that the group jumped on
my comments. I've learnt quite a bit.

...but I can't help thinking that the way that C# handles strings is inefficient.

Göran Andersson commented: "to allocate and release objects is faster in a garbage collected
environment than in a traditional heap that uses reference counters."

OK, but in C, I would only allocate the string once, and it would be a fixed-length string of 41
characters (max num characters is 40 plus one extra character for the null terminator). I wouldn't
keep allocating it and de-allocating it - I would use it repeatedly, and if I'm using it a million
times in a for loop, then surely this would be much much faster than what C# does, since it doesn't
need to be allocated and de-allocated each time. Or am I missing something here?

Thanks again everyone for your help,

John
Jul 10 '06 #16
John wrote:
> Thanks Larry, and everyone else for your very helpful comments. I'm
> pleased that the group jumped on my comments. I've learnt quite a
> bit.
>
> ...but I can't help thinking that the way that C# handles strings is
> inefficient.
>
> Göran Andersson commented: "to allocate and release objects is faster
> in a garbage collected environment than in a traditional heap that
> uses reference counters."
>
> OK, but in C, I would only allocate the string once, and it would be
> a fixed-length string of 41 characters (max num characters is 40 plus
> one extra character for the null terminator). I wouldn't keep
> allocating it and de-allocating it - I would use it repeatedly, and
> if I'm using it a million times in a for loop, then surely this would
> be much much faster than what C# does, since it doesn't need to be
> allocated and de-allocated each time. Or am I missing something here?

Yes - what's involved in allocating 40 characters. On the managed heap,
that involves increasing a pointer by 80 bytes, checking whether or not
it's exceeded the boundaries of the generation, and (assuming it
hasn't) zeroing out the memory. It's not a lot of work.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 10 '06 #17
Jon Skeet [C# MVP] wrote:
> Yes - what's involved in allocating 40 characters. On the managed heap,
> that involves increasing a pointer by 80 bytes, checking whether or not
> it's exceeded the boundaries of the generation, and (assuming it
> hasn't) zeroing out the memory. It's not a lot of work.

Indeed not a lot of work.

I ran a simple test: I created a 40-character array and then created
strings from that array in a loop. Creating a million strings took about
200 ms on my laptop (Pentium M 2.13 GHz).

Actually, as I was using the regular clock to time it, I had to make it
ten million strings to get a reasonable execution time. That means that
the 200 ms also includes garbage collections, not just allocating the
memory.
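
A test along those lines could look roughly like the sketch below. It is
not the original test code: the class and method names are invented, and
it uses Stopwatch (available from .NET 2.0) instead of the regular clock
mentioned above. No timings are claimed here, since results vary by
machine and runtime.

```csharp
using System;
using System.Diagnostics;

static class StringAllocationTiming
{
    // Creates 'count' strings from the same 40-character array and returns
    // the elapsed time in milliseconds. Any garbage collections triggered
    // inside the loop are included in the measurement.
    public static long TimeStringCreation(int count)
    {
        char[] chars = new string('x', 40).ToCharArray();
        long totalLength = 0;

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            string s = new string(chars);   // a fresh 40-character string each time
            totalLength += s.Length;        // use the result so the work isn't skipped
        }
        watch.Stop();

        if (totalLength != 40L * count)
            throw new InvalidOperationException("Unexpected total length.");
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("{0} ms for ten million strings",
                          TimeStringCreation(10000000));
    }
}
```

Run it as a release build; a debug build or an attached debugger will
distort the numbers.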
Jul 11 '06 #18
Thanks again everyone. That's reassured me.

Best wishes,

John

"Göran Andersson" <gu***@guffa.com> wrote in message news:%2****************@TK2MSFTNGP05.phx.gbl...
> Indeed not a lot of work.
>
> I made a simple test by creating a 40 character array, and create
> strings from that array in a loop. Creating a million strings took about
> 200 ms on my laptop (Pentium M 2.13 GHz).
>
> Actually, as I was using the regular clock to time it, I had to make it
> ten million strings to get a reasonable execution time. That means that
> the 200 ms includes garbage collections also, not just allocating the
> memory.
Jul 11 '06 #19