
Calculate length of byte string with embedded nulls

Hello

I have a stream of bytes (unsigned char *), but the 'string' may contain
embedded nulls, so it is not like a traditional C string terminated with a null.

I need to calculate the length of these arrays, but I can't use strlen because
it just stops counting at the first null it finds. So how do I do it?

Angus
Jan 4 '07 #1


Angus wrote:
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

There is no way to do it since you have no algorithm to determine
its length.

Jan 4 '07 #2

Angus said:
> I need to calculate the length of these arrays but can't use strlen
> because it just stops counting at the first null it finds. so how to do it?

Well, now you know what null is for. :-)

Whenever you read data, you need to establish a protocol for stopping. If
you're reading a text file, typically you stop (or at least pause for
thought) when you hit a newline. If you're reading an email feed, you stop
when you get ".\r\n". If you're copying a string, you stop at the null
terminator. All of these are termination protocols.

Clearly, you need a terminating protocol, too. If no particular value ('\0',
'\n') or combination of values (".\r\n") suggests itself as a sentinel,
then you have little option but to insist that your data feed is
accompanied by relevant information regarding its length.
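A minimal sketch of a sentinel-based protocol, assuming the caller knows how many bytes (`avail`) have been received so far. `find_sentinel` is a hypothetical helper, not a library function; because it compares with memcmp rather than a str* function, embedded '\0' bytes are treated like any other byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes that precede the first
 * occurrence of a multi-byte sentinel (e.g. ".\r\n") within the first
 * avail bytes of buf, or (size_t)-1 if the sentinel is not found there.
 * Embedded '\0' bytes are compared like any other byte. */
size_t find_sentinel(const unsigned char *buf, size_t avail,
                     const unsigned char *sentinel, size_t slen)
{
    if (slen == 0 || slen > avail)
        return (size_t)-1;
    for (size_t i = 0; i + slen <= avail; i++) {
        if (memcmp(buf + i, sentinel, slen) == 0)
            return i;  /* length of the data before the sentinel */
    }
    return (size_t)-1;  /* sentinel not (yet) received */
}
```

For example, with the bytes 'a', '\0', 'b' followed by ".\r\n", the function reports 3 data bytes, even though strlen would have stopped after 1.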

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Jan 4 '07 #3

Angus wrote:
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

If this stream is of a specific format and has the length embedded in
it, you can extract it. How to do this depends on the format.
Otherwise, if the length is not kept elsewhere, you need to keep track
of it yourself.
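As an illustration, assume a hypothetical format in which each message starts with a 2-byte big-endian length header followed by that many payload bytes. Extracting the length might then look like the sketch below; `parse_frame` and the header layout are assumptions, not any particular protocol.

```c
#include <stddef.h>

/* Hypothetical format: 2-byte big-endian length, then that many payload
 * bytes (which may include '\0'). On success, points *payload at the data,
 * stores its length in *len and returns 0; returns -1 if fewer than
 * avail bytes are needed than are present (header or payload incomplete). */
int parse_frame(const unsigned char *buf, size_t avail,
                const unsigned char **payload, size_t *len)
{
    if (avail < 2)
        return -1;                          /* header incomplete */
    size_t n = ((size_t)buf[0] << 8) | buf[1];
    if (avail - 2 < n)
        return -1;                          /* payload incomplete */
    *payload = buf + 2;
    *len = n;
    return 0;
}
```

The length comes from the data itself, so nothing in the payload has to be reserved as a terminator.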

Jan 4 '07 #4

Angus wrote:
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

Without a termination condition, there's no way to determine the
end of the stream. As the programmer of the application, you should
know this condition. If the array is passed in from a third-party
library, its authors ought to have documented it. If neither is the
case, then your code is broken.

Jan 4 '07 #5

Angus wrote:
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

Just keep track of the number of characters you store in the buffer and
pass that value along with the buffer.
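One way to follow this advice is to bundle the pointer and the count in a struct and update the count every time bytes are stored. The `byte_buf` type below is a hypothetical sketch, not a library type; it ignores size_t overflow for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* A buffer that carries its own length, so embedded '\0' bytes are
 * harmless. Hypothetical type, not a library API. */
struct byte_buf {
    unsigned char *data;
    size_t len;   /* bytes actually stored */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes from src, growing the allocation as needed.
 * Returns 0 on success, -1 if realloc fails (b is left unchanged). */
int byte_buf_append(struct byte_buf *b, const unsigned char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        while (newcap < b->len + n)
            newcap *= 2;
        unsigned char *p = realloc(b->data, newcap);  /* realloc(NULL, ..) acts like malloc */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);  /* copies '\0' bytes like any other */
    b->len += n;
    return 0;
}
```

Start from `struct byte_buf b = {NULL, 0, 0};`, then hand `b.data` and `b.len` together to whatever processes the data.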
August
Jan 4 '07 #6

"Angus" <no****@gmail.com> wrote in message
news:en*******************@news.demon.co.uk...
> I need to calculate the length of these arrays but can't use strlen
> because it just stops counting at the first null it finds. so how to do it?

As other posters have indicated, the assumption of \0 termination is "baked
into" much of the 'C' programming language.

I believe this type of string (an array of characters where each character
may contain any value without restriction) is called a "binary string" in
other languages.

The standard 'C' library functions won't work on this type of string.

You could keep track of the length separately from the string.

A second approach is to use an encoding for the string to represent the data
without using \0. The most obvious way to do this is to encode the bytes as
hexadecimal characters, i.e. \0 would be represented as '0' followed by
another '0'. That keeps everything simple, as the length of this kind of
string is double the length of the data. And all the 'C' library functions
will work.
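A sketch of the hex encoding described above, assuming 8-bit bytes; `hex_encode` is a hypothetical name. The output has exactly two characters per input byte plus one terminating null, so strlen on it gives twice the data length.

```c
#include <stddef.h>
#include <string.h>

/* Encode n bytes as 2*n uppercase hex digits followed by '\0'.
 * out must have room for 2*n + 1 chars. The result contains no embedded
 * nulls, so strlen(out) == 2*n. Assumes CHAR_BIT == 8. */
void hex_encode(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[in[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0F];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

For example, the three bytes 'A', '\0', 0xFF encode as "4100FF".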
Jan 4 '07 #7

David T. Ashley said:

<snip>
> I believe this type of string (an array of characters where each character
> may contain any value without restriction) is called a "binary string" in
> other languages.
>
> The standard 'C' library functions won't work on this type of string.
memcpy, memset, memmove, memchr, memcmp, fread, fwrite, qsort, bsearch are
all counter-examples.
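To illustrate that the mem* functions take an explicit length instead of stopping at '\0', here is a small sketch (the `count_byte` helper is hypothetical) that uses memchr to count occurrences of a byte, including '\0' itself:

```c
#include <stddef.h>
#include <string.h>

/* Count how many times byte c occurs in the first n bytes of buf.
 * memchr searches exactly the number of bytes it is given, so unlike
 * strchr it does not stop at an embedded '\0'. */
size_t count_byte(const unsigned char *buf, size_t n, unsigned char c)
{
    size_t count = 0;
    const unsigned char *p = buf;
    const unsigned char *end = buf + n;
    while (p < end) {
        const unsigned char *hit = memchr(p, c, (size_t)(end - p));
        if (hit == NULL)
            break;
        count++;
        p = hit + 1;  /* continue searching after this match */
    }
    return count;
}
```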
> You could keep track of the length separately from the string.
That is necessary if no sentinel is given.
> A second approach is to use an encoding for the string to represent the
> data without using \0. The most obvious way to do this is to encode the
> bytes as hexadecimal characters, i.e. \0 would be represented as '0'
> followed by another '0'. That keeps everything simple, as the length of
> this kind of string is double the length of the data. And all the 'C'
> library functions will work.
Base-64 encoding would work, too, and wouldn't be quite so noisy. But it's
better by far to keep track of the size.

--
Richard Heathfield
Jan 4 '07 #8

>>>>> "DTA" == David T Ashley <dt*@e3ft.com> writes:

DTA> As other posters have indicated, the assumption of \0
DTA> termination is "baked into" much of the 'C' programming
DTA> language.

Much of the standard library, you mean.

DTA> The standard 'C' library functions won't work on this type of
DTA> string.

But it's a simple matter of programming to implement your own
functions to do this, or to use a library someone else has written.

DTA> You could keep track of the length separately from the
DTA> string.

This is pretty much exactly what you have to do, unless you use
another marker to indicate end-of-string.
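As an example of such a hand-written function, here is a sketch of a strcmp-like comparison for length-counted byte strings. `bytes_cmp` is a hypothetical name; it uses memcmp on the common prefix and then lets the shorter string order first.

```c
#include <stddef.h>
#include <string.h>

/* Compare two length-counted byte strings the way strcmp orders C strings:
 * byte by byte, with a proper prefix ordering before the longer string.
 * Returns <0, 0 or >0. Embedded '\0' bytes are compared like any other. */
int bytes_cmp(const unsigned char *a, size_t alen,
              const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int r = n ? memcmp(a, b, n) : 0;
    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);  /* equal prefix: shorter first */
}
```

With strcmp, the arrays {'a', '\0', 'b'} and {'a', '\0', 'c'} would compare equal, since both look like "a"; bytes_cmp orders them by their third byte.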

Charlton


--
Charlton Wilbur
cw*****@chromatico.net
Jan 4 '07 #9

Angus wrote:
> I have a stream of bytes - unsigned char*.
If it's a text stream,
then I suspect that you may want to calculate
the length of the "line" rather than the length of a string.
Lines of text are terminated by a newline character ('\n').
The way to find the length of the line is to do it
while the line is being read.
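A sketch of counting while reading, assuming the caller supplies a buffer of `size` bytes; `read_line` is a hypothetical helper. Because the count is taken from the characters getc returns, not from strlen afterwards, embedded nulls do not shorten the result.

```c
#include <stdio.h>
#include <stddef.h>

/* Read one line (up to and including '\n') from fp into buf, storing at
 * most size bytes, and return how many bytes were stored. The count is
 * kept during the read, so embedded '\0' bytes are counted too.
 * Returns 0 at end of file. Simplification: if the line is longer than
 * size, the rest is left in the stream for the next call. */
size_t read_line(FILE *fp, unsigned char *buf, size_t size)
{
    size_t len = 0;
    int c;
    while (len < size && (c = getc(fp)) != EOF) {
        buf[len++] = (unsigned char)c;
        if (c == '\n')
            break;
    }
    return len;
}
```

fgets would store the same bytes, but strlen on its buffer would stop at the first embedded null, which is why the length has to be tracked during the read.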
> But the 'string' may contain embedded nulls.
> So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays
> but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

--
pete
Jan 4 '07 #10


Angus wrote:
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

As other posters have said, you have to know what
bytes actually represent the end of the array, then
write your own code to search the array to locate them.

The only time that I encountered such an array,
its rule was that a single embedded null was part
of it, but two adjacent nulls were its terminator.
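A sketch of the length calculation for that convention, assuming the buffer is guaranteed to contain the two-null terminator; `double_null_len` is a hypothetical name.

```c
#include <stddef.h>

/* Length of data terminated by two adjacent '\0' bytes, where a single
 * '\0' is part of the data. Returns the number of bytes before the first
 * "\0\0" pair. The caller must guarantee that such a pair is present;
 * buf[i + 1] is only read when buf[i] is '\0'. */
size_t double_null_len(const unsigned char *buf)
{
    size_t i = 0;
    while (!(buf[i] == '\0' && buf[i + 1] == '\0'))
        i++;
    return i;
}
```

Note that under this scheme the data itself can never contain two adjacent nulls.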

Jan 4 '07 #11

Charlton Wilbur <cw*****@chromatico.net> writes:
> >>>>> "DTA" == David T Ashley <dt*@e3ft.com> writes:
>
> DTA> As other posters have indicated, the assumption of \0
> DTA> termination is "baked into" much of the 'C' programming
> DTA> language.
>
> Much of the standard library, you mean.
And the treatment of string literals.
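For instance, the compiler appends a '\0' to every string literal, and an array initialized from a literal that contains embedded nulls keeps all of its bytes. A small sketch; the two helper functions are hypothetical and only expose the values.

```c
#include <stddef.h>
#include <string.h>

/* "foo\0bar" is 7 characters; the compiler adds a terminating '\0',
 * so the array holds 8 bytes. */
static const char lit[] = "foo\0bar";

/* sizeof counts every byte of the array, including the implicit '\0'. */
size_t literal_sizeof(void) { return sizeof lit; }

/* strlen stops at the first '\0', after "foo". */
size_t literal_strlen(void) { return strlen(lit); }
```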

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jan 4 '07 #12

bert said:
> Angus wrote:
>> I need to calculate the length of these arrays but can't use strlen
>> because it just stops counting at the first null it finds. so how to do it?
>
> As other posters have said, you have to know what
> bytes actually represent the end of the array, then
> write your own code to search the array to locate them.
>
> The only time that I encountered such an array,
> its rule was that a single embedded null was part
> of it, but two adjacent nulls were its terminator.

The problem with such a scheme is that it renders impossible the in-band
representation of two consecutive null bytes. One way around this would be
to use the null character as an escape character, with a subsequent '0'
character representing a null byte, but a subsequent null character
representing the end of the data.

Of course, if you're going to do that, you might as well use some other
character to represent the escape character (e.g. '\\'), with '\\' '\\'
representing backslash, '\\' '0' representing the null byte, and a genuine
null byte representing the end of the data. This does, however, render it
necessary to translate the escape sequences.
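A sketch of that translation step for the backslash variant, assuming the encoded data is held in a null-terminated char array; `escape_decode` is a hypothetical name. Each escape sequence shrinks to a single byte, so the output never needs more room than the encoded input.

```c
#include <stddef.h>

/* Decode the backslash escape scheme: backslash followed by backslash
 * yields one backslash, backslash followed by '0' yields a zero byte, and
 * an unescaped '\0' marks the end of the data. Writes the decoded bytes to
 * out (room for strlen(in) bytes is always enough) and returns how many
 * were written, or (size_t)-1 on an unknown escape or a trailing
 * backslash. */
size_t escape_decode(const char *in, unsigned char *out)
{
    size_t n = 0;
    for (const char *p = in; *p != '\0'; p++) {
        if (*p == '\\') {
            p++;                      /* look at the escaped character */
            if (*p == '\\')
                out[n++] = '\\';
            else if (*p == '0')
                out[n++] = 0;         /* in-band null byte */
            else
                return (size_t)-1;    /* invalid escape or end of input */
        } else {
            out[n++] = (unsigned char)*p;
        }
    }
    return n;
}
```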

All in all, it is a better scheme by far simply to provide the length
information in advance of, or in parallel with, the data, thus rendering
translation unnecessary.

--
Richard Heathfield
Jan 4 '07 #13

Richard Heathfield wrote:
> Angus said:
>> I need to calculate the length of these arrays but can't use
>> strlen because it just stops counting at the first null it finds.
>> so how to do it?
>
> <snip>
>
> Clearly, you need a terminating protocol, too. If no particular
> value ('\0', '\n') or combination of values (".\r\n") suggests
> itself as a sentinel, then you have little option but to insist
> that your data feed is accompanied by relevant information
> regarding its length.

However, a special case, where sizeof supplies the full length, is exemplified by:

char foobar[] = "foo\0bar\0gup\0etc";
...
fwrite(foobar, 1, sizeof(foobar), f);
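A runnable version of this special case, assuming the destination is any open binary stream; `write_foobar` and `write_bytes` are hypothetical names. sizeof works here only because foobar is an array in scope. Once it is passed to a function as a pointer, the length has to travel with it.

```c
#include <stdio.h>

/* sizeof sees the whole array: 3+1+3+1+3+1+3 characters plus the
 * implicit final '\0', i.e. 16 bytes, embedded nulls included. */
static char foobar[] = "foo\0bar\0gup\0etc";

/* Write the entire array to f; returns the number of bytes written. */
size_t write_foobar(FILE *f)
{
    return fwrite(foobar, 1, sizeof(foobar), f);
}

/* Here p is only a pointer (sizeof p would be the pointer's size),
 * so the caller must pass the length explicitly. */
size_t write_bytes(FILE *f, const char *p, size_t len)
{
    return fwrite(p, 1, len, f);
}
```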

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Jan 5 '07 #14
