
BinaryWriter and string data type

Hi,

I'm looking for info that explains the format of a string data type
when written to a stream using a BinaryWriter. I've looked all over
MSDN and the Internet and I cannot seem to find it.

I did some simple testing and it seems that the string data is
prefixed with a variable number of bytes that indicate the length:

1 byte if length <= 127

2 bytes if length > 127 - 1st byte has high order bit set, remaining
bits indicate number of bytes + (2nd byte * 127).

I haven't explored any further.

Thanx

jra

Nov 15 '05 #1


Hi John,

UTF8Encoding is used by default.
There is an overloaded constructor that accepts a Stream and an Encoding if
you wish to change the encoding used.
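
For example, a minimal sketch of that overload (the stream and string here
are arbitrary):

    using System;
    using System.IO;
    using System.Text;

    class EncodingExample
    {
        static void Main()
        {
            using (var ms = new MemoryStream())
            using (var writer = new BinaryWriter(ms, Encoding.Unicode)) // override the default UTF-8
            {
                writer.Write("hello");        // length prefix + UTF-16 payload
                Console.WriteLine(ms.Length); // 11: 1 length byte + 10 payload bytes
            }
        }
    }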

--
Miha Markic - RightHand .NET consulting & development
miha at rthand com
www.rhand.com

"John Aldrin" <Jo********@msn.com> wrote in message
news:kh********************************@4ax.com...
[original question snipped]

Nov 15 '05 #2

Length is encoded on 1, 2, 3, 4 or 5 bytes as follows:

* The int (32 bits) is split into 7-bit chunks.
* The 8th bit of each byte indicates whether the reader should read further
(bit set) or stop (bit clear).

So, if len < 0x7F, it is encoded on one byte as b0 = len.
If len < 0x3FFF, it is encoded on 2 bytes as b0 = (len & 0x7F) | 0x80,
b1 = len >> 7.
If len < 0x1FFFFF, it is encoded on 3 bytes as b0 = (len & 0x7F) | 0x80,
b1 = ((len >> 7) & 0x7F) | 0x80, b2 = len >> 14.
etc.

len is the length of the UTF-8 encoding, and it is followed by the UTF-8
byte representation of the string.
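
To make that concrete, here is a small standalone sketch of the same scheme.
The helper name Write7BitEncodedLength is ours, not the framework's
(BinaryWriter does this internally, in its protected Write7BitEncodedInt
method):

    using System;
    using System.IO;

    static class SevenBitWriter
    {
        // Emit the 32-bit value 7 bits at a time, low-order chunk first.
        // The high bit of each byte is the "more bytes follow" flag.
        static void Write7BitEncodedLength(Stream s, int value)
        {
            uint v = (uint)value;              // unsigned, so >> shifts in zeros
            while (v >= 0x80)
            {
                s.WriteByte((byte)(v | 0x80)); // low 7 bits + continuation bit
                v >>= 7;
            }
            s.WriteByte((byte)v);              // last chunk, continuation bit clear
        }

        static void Main()
        {
            var ms = new MemoryStream();
            Write7BitEncodedLength(ms, 300);
            // 300 -> 0xAC 0x02: low 7 bits (0x2C) with the high bit set, then 0x02
            Console.WriteLine(BitConverter.ToString(ms.ToArray())); // AC-02
        }
    }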

If you want the source code, I suggest that you download the Shared Source
Common Language Infrastructure from the Microsoft site (just search for
"Shared Source Common Language Infrastructure" on www.microsoft.com). The
source files for the binary reader/writer are in the
sscli\clr\src\system\io directory.

Bruno.
"John Aldrin" <Jo********@msn.com> a écrit dans le message de
news:kh********************************@4ax.com...
[original question snipped]

Nov 15 '05 #3

On Thu, 1 Jan 2004 16:26:13 +0100, "Miha Markic" <miha at rthand com>
wrote:
[quoted text snipped]


Thanx. I took a quick look at UTF8 encoding and I got the impression
that UTF8 specifies how the character data within a string looks in memory.
I didn't think it addressed the length part of a string.

When looking at examples for the UTF8Encoding class, it appeared that
when strings are converted to byte arrays, the length part was not in
the byte array.

I'm looking for a definition of the length format when a string is
written to a stream using the BinaryWriter.

Thanx

jra
Nov 15 '05 #4

Oops: the tests should be done with <= rather than <. Here is a corrected
version.

If len <= 0x7F, len is encoded on one byte as b0 = len.
If len <= 0x3FFF, it is encoded on 2 bytes as b0 = (len & 0x7F) | 0x80,
b1 = len >> 7.
If len <= 0x1FFFFF, it is encoded on 3 bytes as b0 = (len & 0x7F) | 0x80,
b1 = ((len >> 7) & 0x7F) | 0x80, b2 = len >> 14.
etc.
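
And a matching decoder, again as a sketch (the helper name is ours;
BinaryReader's protected Read7BitEncodedInt does the equivalent):

    using System;
    using System.IO;

    static class SevenBitReader
    {
        // Accumulate 7 bits per byte, low-order chunk first, until a byte
        // with the continuation (high) bit clear is read.
        static int Read7BitEncodedLength(Stream s)
        {
            int result = 0, shift = 0;
            while (true)
            {
                int b = s.ReadByte();
                if (b < 0) throw new EndOfStreamException();
                result |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return result;
                shift += 7;
                if (shift > 28) // a 32-bit value needs at most 5 bytes
                    throw new FormatException("Too many bytes in 7-bit encoded int.");
            }
        }

        static void Main()
        {
            var ms = new MemoryStream(new byte[] { 0xAC, 0x02 });
            Console.WriteLine(Read7BitEncodedLength(ms)); // 300
        }
    }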

Bruno

"Bruno Jouhier [MVP]" <bj******@club-internet.fr> a écrit dans le message de
news:u9**************@TK2MSFTNGP10.phx.gbl...
[earlier reply snipped]


Nov 15 '05 #5

Hi John,

Yes, you are right.
The length is written separately, as a 7-bit encoded integer (each byte
carries only 7 bits of the value) in little-endian order (least significant
chunks first).
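
A quick way to see that byte order with the real BinaryWriter (the
1000-character string is arbitrary):

    using System;
    using System.IO;

    class PrefixDemo
    {
        static void Main()
        {
            var ms = new MemoryStream();
            new BinaryWriter(ms).Write(new string('x', 1000));
            byte[] bytes = ms.ToArray();
            // 1000 -> low 7 bits first: 0xE8 (0x68 | 0x80), then 0x07.
            // The 1000 UTF-8 payload bytes follow the prefix.
            Console.WriteLine("{0:X2} {1:X2}", bytes[0], bytes[1]); // E8 07
        }
    }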

--
Miha Markic - RightHand .NET consulting & development
miha at rthand com
www.rhand.com

"John Aldrin" <Jo********@msn.com> wrote in message
news:34********************************@4ax.com...
[quoted text snipped]

Nov 15 '05 #6


Many Thanx

On Thu, 1 Jan 2004 17:54:11 +0100, "Bruno Jouhier [MVP]"
<bj******@club-internet.fr> wrote:
[quoted text snipped]


Nov 15 '05 #7
