Calculate length of byte string with embedded nulls

Hello

I have a stream of bytes - unsigned char*. But the 'string' may contain
embedded nulls. So not like a traditional c string terminated with a null.

I need to calculate the length of these arrays but can't use strlen because
it just stops counting at the first null it finds. so how to do it?

Angus
Jan 4 '07 #1
Angus wrote:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?
>
> Angus

There is no way to do it since you have no algorithm to determine
its length.

Jan 4 '07 #2
Angus said:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

Well, now you know what null is for. :-)

Whenever you read data, you need to establish a protocol for stopping. If
you're reading a text file, typically you stop (or at least pause for
thought) when you hit a newline. If you're reading an email feed, you stop
when you get ".\r\n". If you're copying a string, you stop at the null
terminator. All of these are termination protocols.

Clearly, you need a terminating protocol, too. If no particular value ('\0',
'\n') or combination of values (".\r\n") suggests itself as a sentinel,
then you have little option but to insist that your data feed is
accompanied by relevant information regarding its length.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Jan 4 '07 #3
Angus wrote:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

If this stream is of a specific format and has the length embedded in
it, you can extract it. How to do this depends on the format.
Otherwise, if the length is not kept elsewhere, you need to keep track
of it yourself.
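
To illustrate the first case, here is a minimal sketch. It assumes a hypothetical format in which every record starts with a 2-byte big-endian payload length; real formats differ, and `read_record_length` is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout (an assumption for illustration only):
 *   bytes 0..1  payload length, big-endian
 *   bytes 2..   payload, which may hold any byte values, including 0
 * Returns 0 on success and stores the payload length in *out_len.
 * Returns -1 if fewer bytes are available than the header promises. */
int read_record_length(const unsigned char *rec, size_t avail, size_t *out_len)
{
    assert(rec != NULL && out_len != NULL);
    if (avail < 2)
        return -1;                      /* the header itself is incomplete */
    size_t len = ((size_t)rec[0] << 8) | rec[1];
    if (avail - 2 < len)
        return -1;                      /* the payload is truncated */
    *out_len = len;
    return 0;
}
```

The caller still has to know how many bytes it actually received (`avail`), which is the "keep track of it yourself" half of the advice.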

Jan 4 '07 #4
Angus wrote:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

Without a condition for termination, there's no way to determine the
end of the stream. As the programmer of the application, you should
know this condition. If the array is passed in from a third-party
library, its authors ought to have documented it. If neither holds,
then your code is broken.

Jan 4 '07 #5
Angus wrote:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

Just keep track of the number of characters you store in the buffer and
pass that value along with the buffer.
August
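
A minimal sketch of that advice, under the assumption that the bytes arrive from a `FILE *` stream. The `struct byte_buf` type and both function names are made up for this example; the only library facts relied on are that `fread` returns the number of items it stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A buffer that carries its own length, so embedded zero bytes are harmless. */
struct byte_buf {
    unsigned char data[256];
    size_t len;                 /* number of valid bytes in data */
};

/* Fill the buffer from a stream; fread reports how many bytes it stored. */
void byte_buf_fill(struct byte_buf *b, FILE *in)
{
    assert(b != NULL && in != NULL);
    b->len = fread(b->data, 1, sizeof b->data, in);
}

/* Consumers take the pointer and the length together. */
size_t count_zero_bytes(const unsigned char *p, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == 0)
            zeros++;
    return zeros;
}
```

The length is recorded at the one moment it is known for certain (when the bytes are stored) and then travels with the pointer, so nothing downstream has to search for a terminator.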
Jan 4 '07 #6
"Angus" <no****@gmail.com> wrote in message
news:en*******************@news.demon.co.uk...
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

As other posters have indicated, the assumption of \0 termination is "baked
into" much of the 'C' programming language.

I believe this type of string (an array of characters where each character
may contain any value without restriction) is called a "binary string" in
other languages.

The standard 'C' library functions won't work on this type of string.

You could keep track of the length separately from the string.

A second approach is to use an encoding for the string to represent the data
without using \0. The most obvious way to do this is to encode the bytes as
hexadecimal characters, i.e. \0 would be represented as '0' followed by
another '0'. That keeps everything simple, as the length of this kind of
string is double the length of the data. And all the 'C' library functions
will work.
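
The hexadecimal encoding described above could be sketched as follows. `hex_encode` is a hypothetical helper, not a standard library function, and the sketch assumes 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Encode len bytes from src as 2*len hexadecimal characters followed by a
 * terminating '\0'. dst must have room for at least 2*len + 1 chars.
 * Assumes 8-bit bytes. */
void hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = digits[src[i] >> 4];     /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F];   /* low nibble */
    }
    dst[2 * len] = '\0';
}
```

The result contains only the characters 0-9 and a-f, so `strlen` on it returns exactly twice the number of raw bytes. Note that the encoder itself still needs the raw length as an argument.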
Jan 4 '07 #7
David T. Ashley said:

<snip>

> I believe this type of string (an array of characters where each character
> may contain any value without restriction) is called a "binary string" in
> other languages.
>
> The standard 'C' library functions won't work on this type of string.

memcpy, memset, memmove, memchr, memcmp, fread, fwrite, qsort, bsearch are
all counter-examples.

> You could keep track of the length separately from the string.

That is necessary if no sentinel is given.

> A second approach is to use an encoding for the string to represent the
> data without using \0. The most obvious way to do this is to encode the
> bytes as hexadecimal characters, i.e. \0 would be represented as '0'
> followed by another '0'. That keeps everything simple, as the length of
> this kind of string is double the length of the data. And all the 'C'
> library functions will work.

Base-64 encoding would work, too, and wouldn't be quite so noisy. But it's
better by far to keep track of the size.
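
As a small illustration of the counter-examples listed above, the sketch below uses `memchr` and `memcmp` on buffers with embedded zero bytes. The wrapper names `find_byte` and `bytes_equal` are invented for this example; the point is only that the `mem*` functions take an explicit length instead of looking for a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of byte value c in a counted buffer.
 * Returns its index, or len if c does not occur. */
size_t find_byte(const unsigned char *buf, size_t len, unsigned char c)
{
    assert(buf != NULL);
    const unsigned char *hit = memchr(buf, c, len);
    return hit ? (size_t)(hit - buf) : len;
}

/* Compare two counted buffers for equality, zero bytes included. */
int bytes_equal(const unsigned char *a, size_t alen,
                const unsigned char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}
```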

--
Richard Heathfield
Jan 4 '07 #8
>>>>> "DTA" == David T Ashley <dt*@e3ft.com> writes:

DTA> As other posters have indicated, the assumption of \0
DTA> termination is "baked into" much of the 'C' programming
DTA> language.

Much of the standard library, you mean.

DTA> The standard 'C' library functions won't work on this type of
DTA> string.

But it's a simple matter of programming to implement your own
functions to do this, or to use a library someone else has written.

DTA> You could keep track of the length separately from the
DTA> string.

This is pretty much exactly what you have to do, unless you use
another marker to indicate end-of-string.

Charlton


--
Charlton Wilbur
cw*****@chromatico.net
Jan 4 '07 #9
Angus wrote:
>
> Hello
>
> I have a stream of bytes - unsigned char*.

If it's a text stream,
then I suspect that you may be wanting to calculate
the length of the "line" rather than the length of a string.
Lines of text are terminated by a newline character ('\n').
The way to find the length of the line is to do it
while the line is being read.

> But the 'string' may contain embedded nulls.
> So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays
> but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

--
pete
Jan 4 '07 #10

Angus wrote:
> Hello
>
> I have a stream of bytes - unsigned char*. But the 'string' may contain
> embedded nulls. So not like a traditional c string terminated with a null.
>
> I need to calculate the length of these arrays but can't use strlen because
> it just stops counting at the first null it finds. so how to do it?

As other posters have said, you have to know what
bytes actually represent the end of the array, then
write your own code to search the array to locate them.

The only time that I encountered such an array,
its rule was that a single embedded null was part
of it, but two adjacent nulls were its terminator.
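
A sketch of the search for that particular rule (two adjacent zero bytes end the data). The function name is hypothetical, and the `max` bound is an addition of this example so that a missing terminator cannot run the search off the end of the storage.

```c
#include <assert.h>
#include <stddef.h>

/* Length of data terminated by two adjacent zero bytes, not counting the
 * terminator. max is the size of the underlying storage; if no terminator
 * is found within max bytes, the function returns max. */
size_t double_null_len(const unsigned char *p, size_t max)
{
    assert(p != NULL);
    for (size_t i = 0; i + 1 < max; i++)
        if (p[i] == 0 && p[i + 1] == 0)
            return i;
    return max;
}
```

With this rule the search stops at the first pair of zeros it meets, so data that itself contains two consecutive zero bytes cannot be represented.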
--

Jan 4 '07 #11
Charlton Wilbur <cw*****@chromatico.net> writes:
>>>>>> "DTA" == David T Ashley <dt*@e3ft.com> writes:
>
> DTA> As other posters have indicated, the assumption of \0
> DTA> termination is "baked into" much of the 'C' programming
> DTA> language.
>
> Much of the standard library, you mean.

And the treatment of string literals.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jan 4 '07 #12
bert said:
>
> Angus wrote:
>> Hello
>>
>> I have a stream of bytes - unsigned char*. But the 'string' may contain
>> embedded nulls. So not like a traditional c string terminated with a
>> null.
>>
>> I need to calculate the length of these arrays but can't use strlen
>> because it just stops counting at the first null it finds. so how to
>> do it?
>
> As other posters have said, you have to know what
> bytes actually represent the end of the array, then
> write your own code to search the array to locate them.
>
> The only time that I encountered such an array,
> its rule was that a single embedded null was part
> of it, but two adjacent nulls were its terminator.

The problem with such a scheme is that it renders impossible the in-band
representation of two consecutive null bytes. One way around this would be
to use the null character as an escape character, with a subsequent '0'
character representing a null byte, but a subsequent null character
representing the end of the data.

Of course, if you're going to do that, you might as well use some other
character to represent the escape character (e.g. '\\'), with '\\' '\\'
representing backslash, '\\' '0' representing the null byte, and a genuine
null byte representing the end of the data. This does, however, render it
necessary to translate the escape sequences.
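
The backslash scheme just described might look like the sketch below. Both function names are invented for illustration, and the error convention (returning `(size_t)-1` for a malformed escape) is an assumption of this example, not part of the scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Encode len raw bytes into a null-terminated string:
 *   0x00 -> '\\' '0',  '\\' -> '\\' '\\',  anything else unchanged.
 * dst needs room for up to 2*len + 1 chars. Returns the encoded length,
 * excluding the terminator. */
size_t esc_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t o = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0) {
            dst[o++] = '\\';
            dst[o++] = '0';
        } else if (src[i] == '\\') {
            dst[o++] = '\\';
            dst[o++] = '\\';
        } else {
            dst[o++] = (char)src[i];
        }
    }
    dst[o] = '\0';              /* the genuine null byte ends the data */
    return o;
}

/* Decode back into raw bytes. dst needs room for at most strlen(src) bytes.
 * Returns the number of raw bytes, or (size_t)-1 on a malformed escape. */
size_t esc_decode(const char *src, unsigned char *dst)
{
    size_t o = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] != '\\') {
            dst[o++] = (unsigned char)src[i];
        } else if (src[i + 1] == '0') {
            dst[o++] = 0;
            i++;                /* skip the escaped character */
        } else if (src[i + 1] == '\\') {
            dst[o++] = '\\';
            i++;
        } else {
            return (size_t)-1;  /* lone or unknown escape */
        }
    }
    return o;
}
```

The encoded form can be handled with the ordinary string functions, at the cost of a translation pass in each direction.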

All in all, it is a better scheme by far simply to provide the length
information in advance of, or in parallel with, the data, thus rendering
translation unnecessary.

--
Richard Heathfield
Jan 4 '07 #13
Richard Heathfield wrote:
> Angus said:
>> I have a stream of bytes - unsigned char*. But the 'string' may
>> contain embedded nulls. So not like a traditional c string
>> terminated with a null.
>>
>> I need to calculate the length of these arrays but can't use
>> strlen because it just stops counting at the first null it finds.
>> so how to do it?
>
> Well, now you know what null is for. :-)
>
> Whenever you read data, you need to establish a protocol for
> stopping. If you're reading a text file, typically you stop (or at
> least pause for thought) when you hit a newline. If you're reading
> an email feed, you stop when you get ".\r\n". If you're copying a
> string, you stop at the null terminator. All of these are
> termination protocols.
>
> Clearly, you need a terminating protocol, too. If no particular
> value ('\0', '\n') or combination of values (".\r\n") suggests
> itself as a sentinel, then you have little option but to insist
> that your data feed is accompanied by relevant information
> regarding its length.

However a special case is exemplified by:

char foobar[] = "foo\0bar\0gup\0etc";
...
fwrite(foobar, 1, sizeof(foobar), f);
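
A slightly fuller sketch of the same special case, for readers who want something compilable. `write_foobar` is a hypothetical wrapper added here; the array and the `fwrite` call are as in the fragment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The array's size is fixed at compile time and includes every embedded
 * '\0' as well as the implicit terminating one: 16 bytes in total. */
static const char foobar[] = "foo\0bar\0gup\0etc";

/* Write the whole array, embedded nulls included.
 * Returns the number of bytes written. */
size_t write_foobar(FILE *f)
{
    assert(f != NULL);
    return fwrite(foobar, 1, sizeof foobar, f);
}
```

This only works where the array itself is in scope. Once `foobar` has decayed to a pointer (for example, after being passed to a function), `sizeof` yields the size of the pointer and the length must again be passed separately. `strlen(foobar)` would return 3 here.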

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Jan 5 '07 #14
