Fast deserialisation of strings from byte[]

I have an application that performs custom deserialisation of object state
from byte arrays. This happens very regularly, so it needs to be fast. In
addition, most of the strings repeat, meaning I'm deserialising the same
sequence of bytes repeatedly, giving the same output string. Let's ignore
the text encoding method, as it's not relevant to my question.

Right now, I'm using BinaryReader.ReadString(), which gives the correct
result; however, it creates a new instance of System.String for each byte
sequence. What I'd really like is to detect the repeated byte sequence, and
return a reference to an existing deserialised version.

A colleague put me onto string.Intern, but this won't help as by the time
I'm calling that method, I've already allocated the string.
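
To illustrate the point about string.Intern, here is a minimal sketch. The
class and method names (InternAfterRead, Encode, ReadInterned) are
hypothetical and not from the original post. Interning does yield a single
shared reference, but only after ReadString() has already allocated a fresh
string for every occurrence:

```csharp
using System;
using System.IO;
using System.Text;

static class InternAfterRead
{
    // Hypothetical helper: writes the same length-prefixed string `copies` times,
    // in the format that BinaryReader.ReadString() expects.
    public static byte[] Encode(string value, int copies)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            for (int i = 0; i < copies; i++)
            {
                writer.Write(value);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads one string and returns the interned instance.
    public static string ReadInterned(BinaryReader reader)
    {
        // The allocation happens here, before any de-duplication is possible.
        string fresh = reader.ReadString();

        // If an equal string is already interned, `fresh` is now garbage.
        return string.Intern(fresh);
    }
}
```

Reading the same value twice through ReadInterned returns the same reference
both times, yet each call still creates a short-lived string for the
collector to clean up. Interned strings also stay alive for the lifetime of
the process, which may not suit values determined at runtime.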

Note that these strings are very short-lived. After deserialisation, they
will be processed and (for the most part) garbage collected before they get
promoted to generation 1. This happens several thousand times a second under
normal conditions, giving the garbage collector (what I assume is) a lot of
work. I'm seeing the classic sawtooth pattern in a heap timeline but with
very high frequency.

I'd like to know whether this is a situation in which I can improve
performance. I can envisage some sort of structure (perhaps a Trie) that
homes in on the stored string as we progress through the byte sequence.
However this structure cannot be pre-populated (the strings will be
determined at runtime).
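
The trie idea could be sketched roughly as follows. This is an illustration
under stated assumptions, not a tested design: ByteTrieStringCache and
GetOrAdd are hypothetical names, the caller is assumed to know the offset
and length of each string within the buffer, and the text encoding is passed
in. A lookup that hits walks existing nodes and allocates nothing; only the
first sighting of a byte sequence creates nodes and a string.

```csharp
using System;
using System.Text;

// Hypothetical byte-keyed trie that hands back one shared string per byte sequence.
sealed class ByteTrieStringCache
{
    private sealed class Node
    {
        public Node[] Children; // allocated lazily; one slot per possible byte value
        public string Value;    // set once a sequence ending at this node has been seen
    }

    private readonly Node _root = new Node();
    private readonly Encoding _encoding;

    public ByteTrieStringCache(Encoding encoding)
    {
        _encoding = encoding;
    }

    // Returns the cached string for buffer[offset .. offset + count),
    // decoding and storing it on first sight.
    public string GetOrAdd(byte[] buffer, int offset, int count)
    {
        Node node = _root;
        for (int i = offset; i < offset + count; i++)
        {
            if (node.Children == null)
            {
                node.Children = new Node[256];
            }

            Node next = node.Children[buffer[i]];
            if (next == null)
            {
                next = new Node();
                node.Children[buffer[i]] = next;
            }
            node = next;
        }

        if (node.Value == null)
        {
            // First time this sequence appears: the only string allocation for it.
            node.Value = _encoding.GetString(buffer, offset, count);
        }
        return node.Value;
    }
}
```

The 256-slot child array per node trades memory for speed and the cache
never evicts anything, so whether this beats plain allocation would have to
be measured against the real data.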

The big question is: do the benefits of reducing string allocation justify
the overhead in finding a stored string? This, no doubt, depends upon the
implementation.

There may also be knock-on benefits from knowing that strings with the same
value are the same object (e.g. using object.ReferenceEquals rather than
object.Equals), but this is secondary.

This seems to me a great performance question. I hope others find it as
interesting as I do and will share their ideas and experience.

Regards,

Drew Noakes.
Sep 5 '05 #1
Drew,

After reading your question twice, I think the answer is in the first
section of your question.

A string in .NET is never mutable. A new one is always built, even for the
slightest change.

The only "string-like" alternative is a StringBuilder, which is a kind of
collection of characters; maybe that can help you.

http://msdn.microsoft.com/library/de...classtopic.asp

Be aware that the description there is wrong: there cannot be a mutable
string. The remarks section has it right.

I hope this helps,

Cor


Sep 5 '05 #2
Hi Cor,

Thanks for your prompt response. I'm aware of the behaviour of strings with
regard to mutability, but this issue is different. Perhaps I didn't explain
myself clearly enough. I simply do not want to instantiate two different
string objects that have the same value.

Therefore, when I'm stepping through the byte[], the first time I see a
given pattern I would create the string and store it. The next time I see
the same pattern, I'll return a reference to the string I have stored. This
avoids the overhead of having two strings on the heap that have identical
values.

Bear in mind that I'm talking about doing this many many times a second, to
a point where I believe there is a performance gain to be reaped from this
added complexity.

Regards,

Drew.
Sep 5 '05 #3
Drewnoakes,

This sounds like a job for the Hashtable (dictionary); however, I doubt that
it will give you any benefit.

The byte pattern would be the key and the string the value.

http://msdn.microsoft.com/library/de...ClassTopic.asp

I hope it helps anyhow.

Cor
Sep 5 '05 #4
Hi Cor,

If I use a Hashtable, I must create a new byte[] which in turn is another
object allocation. I wish to achieve this lookup without allocating any
object on the heap.

Drew.
Sep 5 '05 #5
Drewnoakes,

Are you sure of that? The Hashtable holds objects, not values.

http://msdn.microsoft.com/library/de...ssaddtopic.asp

I hope this helps,

Cor
Sep 5 '05 #6
Cor Ligthert [MVP] wrote:
> Drewnoakes,
>
> Are you sure of that, the hashtable holds objects, not values.
>
> http://msdn.microsoft.com/library/de...ssaddtopic.asp
>
> I hope this helps,
>
> Cor

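Well, a raw byte[] is a poor key: arrays are hashed and compared by
reference, so two arrays with the same contents would not match. But you
can calculate an integer hash value for the byte sequence and use that hash
as a key in a normal Hashtable, comparing the bytes to rule out collisions.
But as someone has previously said, this will not necessarily improve
the performance.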
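
A rough sketch of the hash-keyed approach follows, with some assumptions
spelled out. HashedStringCache and its members are hypothetical names. It
uses the generic Dictionary<int, ...> from .NET 2.0 instead of Hashtable so
that the integer key is not boxed, FNV-1a as one arbitrary choice of hash
function, and a stored copy of each byte sequence so that colliding hashes
can be told apart. A lookup that hits allocates nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical cache keyed on a hash computed directly over a range of the input buffer.
sealed class HashedStringCache
{
    private sealed class Entry
    {
        public byte[] Bytes;  // private copy of the sequence, used to verify a match
        public string Value;  // the shared decoded string
        public Entry Next;    // next entry whose sequence has the same hash
    }

    private readonly Dictionary<int, Entry> _buckets = new Dictionary<int, Entry>();
    private readonly Encoding _encoding;

    public HashedStringCache(Encoding encoding)
    {
        _encoding = encoding;
    }

    // 32-bit FNV-1a over buffer[offset .. offset + count); an assumed choice of hash.
    private static int Hash(byte[] buffer, int offset, int count)
    {
        unchecked
        {
            uint hash = 2166136261u;
            for (int i = offset; i < offset + count; i++)
            {
                hash ^= buffer[i];
                hash *= 16777619u;
            }
            return (int)hash;
        }
    }

    private static bool Matches(byte[] stored, byte[] buffer, int offset, int count)
    {
        if (stored.Length != count) return false;
        for (int i = 0; i < count; i++)
        {
            if (stored[i] != buffer[offset + i]) return false;
        }
        return true;
    }

    public string GetOrAdd(byte[] buffer, int offset, int count)
    {
        int hash = Hash(buffer, offset, count);

        Entry head;
        _buckets.TryGetValue(hash, out head); // head stays null on a miss

        for (Entry e = head; e != null; e = e.Next)
        {
            if (Matches(e.Bytes, buffer, offset, count)) return e.Value;
        }

        // First sighting: copy the key bytes and decode the string once.
        byte[] copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);

        Entry added = new Entry();
        added.Bytes = copy;
        added.Value = _encoding.GetString(buffer, offset, count);
        added.Next = head;
        _buckets[hash] = added;
        return added.Value;
    }
}
```

Each lookup still pays for one pass over the bytes to hash them and, on a
hit, a second pass to compare them, so the gain over simply allocating the
string depends on string length and hit rate and would need profiling.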

Best regards
RG
Sep 6 '05 #7
Hi Cor,

Keying a hash table on byte[] will not reduce my memory overhead. Besides,
I still have to allocate an object (byte[] is an object, not a value-type)
before I can look up the string in the hashtable.

Drew.
Sep 6 '05 #8
