Bytes IT Community

endianness and sscanf/sprintf

Two different platforms communicate over protocols which consist of
functions and arguments in ASCII form. Either system might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf. Each parameter has a delimiter, the data type size is ported to
the platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?
regards,
Pramod
Nov 14 '05 #1
22 Replies


"pramod" <sp********@yahoo.com> wrote in message
news:c6**************************@posting.google.com
Two different platforms communicate over protocols which consist of
functions and arguments in ASCII form. Either system might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf. Each parameter has a delimiter, the data type size is ported to
the platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?
regards,
Pramod

Endianness only affects the way that integers are stored (and perhaps
floating-point numbers; I am not sure). It does not affect the storage of
characters, so it is not an issue if you are only sending text.
--
John Carson
1. To reply to email address, remove donald
2. Don't reply to email address (post here instead)

Nov 14 '05 #2

You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes, the numbers
will be interpreted correctly. Java bytecodes use the same approach.

The following article discusses the endianness in detail:

http://www.eventhelix.com/RealtimeMa...ndOrdering.htm

Sandeep
--
http://www.EventHelix.com/EventStudio
EventStudio 2.0 - Go Beyond UML Use Case and Sequence Diagrams
Nov 14 '05 #3

EventHelix.com wrote:
You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes, the numbers
will be interpreted correctly.


In C (and, as far as I am aware, C++ too), characters are always represented
in a single byte. Character /constants/ are represented (in C, but not C++)
by the int type, which might conceivably be eight bytes. Is that what you
meant?

(Followups set to comp.lang.c)

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #4

On Wed, 31 Dec 2003 01:20:44 -0800, pramod wrote:
Two different platforms communicate over protocols which consist of
functions and arguments in ASCII form. Either system might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf. Each parameter has a delimiter, the data type size is ported to
the platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?


Yes, and it is a very good way to do it, but only if you are really using
ASCII; otherwise you may end up mixing codesets. Consider using UTF-8 if
you use characters >= 128 (i.e. outside ASCII).

HTH,
M4

Nov 14 '05 #5

EventHelix.com wrote:
You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes,
bits?
the numbers
will be interpreted correctly. Java bytecodes use the same approach.

The following article discusses the endianness in detail:

http://www.eventhelix.com/RealtimeMa...ndOrdering.htm

Sandeep
--
http://www.EventHelix.com/EventStudio
EventStudio 2.0 - Go Beyond UML Use Case and Sequence Diagrams


Nov 14 '05 #6

"Jeff Schwab" <je******@comcast.net> wrote...
EventHelix.com wrote:
You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes,


bits?


Not that it matters. The second sentence almost invalidates the otherwise
perfectly correct first ;-)

Peter
Nov 14 '05 #7

Richard Heathfield <in*****@address.co.uk.invalid> wrote in message news:<3f******@news2.power.net.uk>...
EventHelix.com wrote:
You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes, the numbers
will be interpreted correctly.


In C (and, as far as I am aware, C++ too), characters are always represented
in a single byte. Character /constants/ are represented (in C, but not C++)
by the int type, which might conceivably be eight bytes. Is that what you
meant?

(Followups set to comp.lang.c)


Typo: it should have been "8 bits" (i.e. byte).

Sandeep
Nov 14 '05 #8

EventHelix.com wrote:
Richard Heathfield <in*****@address.co.uk.invalid> wrote in message
news:<3f******@news2.power.net.uk>...
EventHelix.com wrote:
> You will be fine as everything is being converted to characters.
> As long as characters are represented as 8 bytes, the numbers
> will be interpreted correctly.


In C (and, as far as I am aware, C++ too), characters are always
represented in a single byte. Character /constants/ are represented (in
C, but not C++) by the int type, which might conceivably be eight bytes.
Is that what you meant?

Typo: it should have been "8 bits" (i.e. byte).


But there is no requirement in either C or C++ for a byte to be exactly 8
bits; only that it must be /at least/ 8 bits.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 14 '05 #9

On Fri, 02 Jan 2004 05:53:31 +0000, Richard Heathfield wrote:
EventHelix.com wrote:
Richard Heathfield <in*****@address.co.uk.invalid> wrote in message
news:<3f******@news2.power.net.uk>...
EventHelix.com wrote:

> You will be fine as everything is being converted to characters.
> As long as characters are represented as 8 bytes, the numbers
> will be interpreted correctly.
Even assuming you meant 8 bits, this is not true. If one system uses ASCII
and the other uses EBCDIC, you're screwed. Even the subtle distinctions
between ISO Latin-1 and ISO Latin-15, two almost compatible and often used
character sets, might bite you. All of these use 8 bits (well, OK, ASCII
uses 7).

In C (and, as far as I am aware, C++ too), characters are always
represented in a single byte. Character /constants/ are represented (in
C, but not C++) by the int type, which might conceivably be eight bytes.
Is that what you meant?

Typo: it should have been "8 bits" (i.e. byte).


But there is no requirement in either C or C++ for a byte to be exactly 8
bits; only that it must be /at least/ 8 bits.


But note the unfortunate discrepancy between the meaning of the word byte
in C/C++ and its meaning when measuring storage. C/C++ is not alone here;
Internet standards talk about octets when they mean exactly 8 bits.

The same goes for the unit 'word'. That means different things to different
people. The way I learned it at uni, a very long time ago, was that a word
was the basic unit of storage, the same as the definition of byte in C/C++.
Along came Microsoft and institutionalised the word size of the 8086 as a
WORD, so to others a word is now 16 bits. I've seen even more different uses
of the word 'word'; anyone got an example?

Why am I saying this? Because in the context of C/C++ a byte has a defined
meaning. However, in the context of disks and memory, a byte has a
different meaning. When the context is not clear, it is very easy to get
confused. Ah, I hear you say, but this is a C/C++ group, so the meaning is
clear. That may be true, but:
- The problem described a certain context, one where many people
(incorrectly) use the word byte to mean 8 bits.
- It is very confusing to people anyhow. Youngsters are raised with the
notion that a byte is 8 bits.

In the end, we can only conclude that this difference in meaning is very
unfortunate. Technically, an octet is the correct term for 8 bits. But
we're never going to change the common use of byte anymore. In the
meantime we'll have to live with it.

I just wish the C/C++ standards had used a different term than byte.
Even word would have been better.

M4

Nov 14 '05 #10

Martijn Lievaart <m@remove.this.part.rtij.nl> writes:
[...]
But note the unfortunate discrepancy between the meaning of the word byte
in C/C++ and that of measuring storage. However, C/C++ is not alone here,
Internet standards talk about octets when they mean 8 bits.

Same with the unit words. That means different things to different people.
The way I learned it at uni, very long time ago, was that a word was the
basic unit of storage. Same as the definition of byte in C/C++. Along came
MicroSoft and institutionalised the word-size of the 8086 as a WORD, so to
others a word now is 16 bits. I've seen even different uses of the word
'word', anyone got an example?
[...]
I just wished the C/C++ standards had used a different term than byte.
Even word would have been better.


I agree that it would have avoided a lot of confusion if the C and C++
standards had used a term other than "byte" (perhaps "storage unit").
While I'm wishing for things that didn't happen, it would also have
been nice if the concept hadn't been tied to the size of a character.

I think (but I'm not sure, and it doesn't really matter) that the use
of the word "word" predates the 8086 (and it probably would have been
Intel, not Microsoft, that introduced the word "word" in descriptions
of CPU instruction operand sizes). Most or all CPUs I've seen use the
words "byte" and "word" to refer to operand sizes. The meaning of a
"word" varies across architectures far more than the meaning of
"byte".

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
(Note new e-mail address)
Nov 14 '05 #11

On Fri, 02 Jan 2004 20:45:45 +0000, Keith Thompson wrote:
I think (but I'm not sure, and it doesn't really matter) that the use
of the word "word" predates the 8086 (and it probably would have been
Intel, not Microsoft, that introduced the word "word" in descriptions
of CPU instruction operand sizes). Most or all CPUs I've seen use the
words "byte" and "word" to refer to operand sizes. The meaning of a
"word" varies across architectures far more than the meaning of
"byte".


Exactly what I was trying to say. For instance, the CDC used 60-bit words.
(No wonder that design is extinct. :-)

M4

Nov 14 '05 #12

Martijn Lievaart wrote:
[snip]
Same with the unit words. That means different things to different people.
The way I learned it at uni, very long time ago, was that a word was the
basic unit of storage. Same as the definition of byte in C/C++. Along came
MicroSoft and institutionalised the word-size of the 8086 as a WORD, so to
others a word now is 16 bits. I've seen even different uses of the word
'word', anyone got an example?


In the IBM mainframe world, a "word" (or "fullword") has been 32 bits for
the last 40+ years. A 16-bit quantity is a "halfword".

[snip]
--
Lew Pitcher

Master Codewright and JOAT-in-training
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.

Nov 14 '05 #13

Lew Pitcher wrote:

Martijn Lievaart wrote:
[snip]
Same with the unit words.
That means different things to different people.
The way I learned it at uni, very long time ago,
was that a word was the basic unit of storage.
Same as the definition of byte in C/C++. Along came
MicroSoft and institutionalised the word-size of
the 8086 as a WORD, so to others a word now is 16 bits.
I've seen even different uses of the word
'word', anyone got an example?


In the IBM mainframe world, a "word" (or "fullword")
has been 32bits for the
last 40+ years. A 16bit quantity is a "halfword".


I'm familiar with "word" having a similar meaning to
the traditional meaning of "int": the
"natural size suggested by the architecture
of the execution environment".

--
pete
Nov 14 '05 #14


"Lew Pitcher" <lp******@sympatico.ca> wrote in message news:fq***********@merlin.l6s4x6-4.ca...

In the IBM mainframe world, a "word" (or "fullword") has been 32bits for the
last 40+ years. A 16bit quantity is a "halfword".


Back when I was heavily into PDP-11s (16 bits), my mainframe friends referred
to my computers as halfword machines.

Just about every 32-bit processor (with the exception of the x86 stuff) calls a
WORD 32 bits. Even on the 386+ the word size really is 32 bits, but since
the thing is upward compatible with the old 16-bit 8086... they call words DWORDs.

On the 7094 and its follow-ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (for which there are
no fixed divisions, leading to amusing things such as the same hardware
supporting byte sizes from 5 to 9 bits).

I've worked on 64-bit word machines. The CRAY is word addressed... there really
is NO such hardware datatype other than 64-bit integrals and 64-bit reals. Chars
are an unholy kludge in software (they didn't even try anything else; sizeof any
non-composite type is either 8 or 64).

Never say die, the 64-bit word machines are coming back (AMD, IA64, etc.)!
Nov 14 '05 #15


"pete" <pf*****@mindspring.com> wrote in message news:3F***********@mindspring.com...
I'm familiar with "word" having a similar meaning as

the traditional meaning of "int", having the
"natural size suggested by the architecture
of the execution environment"


Of course even ints get perverted. For example, on many 64-bit
architectures where 64 bits is the natural size, they've just punted and
made ints 32 bits, because that's what the larger body of code assumes.
It took us over a decade to get people to stop expecting *0 to be 0.

Nov 14 '05 #16

Ron Natalie wrote:

"Lew Pitcher" <lp******@sympatico.ca> wrote in message news:fq***********@merlin.l6s4x6-4.ca...

In the IBM mainframe world, a "word" (or "fullword") has been 32bits for the
last 40+ years. A 16bit quantity is a "halfword".

[ snippage ]
On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
divisions leading to amusing things such as the same hardware supporting byte sizes
from 5 to 9 bits).

The IBM 7094 came out in January 1963 and was the last of its ilk from
IBM. Its follow-on was the S/360 in 1964. I never came across a "partial
word". For I/O the 36-bit word was divided into 6-bit chunks to be
written to (and read from) 7-channel magnetic tape. For character I/O
the 6 bits were encoded into something called BCD, which translated
directly to and from the 026 punch card. With the S/360 came the 32-bit
word and 8-bit character, 9-channel mag tape, and EBCDIC (Extended BCD
Interchange Code).
--
Joe Wright http://www.jw-wright.com
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 14 '05 #17

Ron Natalie wrote:
On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
divisions leading to amusing things such as the same hardware supporting byte sizes
from 5 to 9 bits).


The PDP-10 and PDP-20 were "follow-ons" to the PDP-6, not the 7094,
although both derived features from earlier machines. The PDP-6/10
family (and, to a lesser degree, the 7090/7094 family) had many
instructions that operated on 18-bit halfwords, for the good reason that
instructions were divided with an 18-bit address field (+ indirect bit).
This structure, from the 7094 side again, lies behind the "car" and "cdr"
functions in Lisp.
The PDP-6 and -10 used byte pointers which could address bytes of any size
from 1 to 36 bits. Some sizes, notably 19-35 bits, are obviously quite
wasteful. The most common sizes were the ones you name (5- to 9-bit bytes).
--
Martin Ambuhl

Nov 14 '05 #18


"Joe Wright" <jo********@earthlink.net> wrote in message news:3F***********@earthlink.net...
On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
divisions leading to amusing things such as the same hardware supporting byte sizes
from 5 to 9 bits).

The IBM 7094 came out in January 1963 and was the last of its ilk from
IBM. Its follow on was the S/360 in 1964. I never came across a "partial
word".

The follow-ons were not from IBM. The 7094 begat both the UNIVAC
1100 series and the DEC mainframes, both of which had the arbitrary
byte operations. The 7094 did have both 6- and 7-bit I/O bytes available.
The UNIVAC had an even larger array of byte size usage.

As another amusing aside, there was a UNIVAC communications
processor for the 1100 series (I'm spacing on its nomenclature? CSE?)
which actually ran the 360 instruction set.

Speaking of the 7-track tape drives, when the shop finally ditched the last
of the 7-track UNISERVO tape drives we lost the ability to run the program
that played Christmas carols using the sound the tape in the vacuum columns
made. Nobody ever retuned it for the 9-track drives.

Nov 14 '05 #19

"Ron Natalie" <ro*@sensor.com> writes:
[...]
I've worked on 64 bit word machines. The CRAY is word
addressed...there really is NO such hardware datatype other than 64
bit integrals and 64 bit reals. Char's are a unholy kludge in
software (they didn't even try anything else, sizeof any
non-comoosite type is either 8 or 64).


There have been a number of different Cray models, with different
architectures, but I think the vector systems (the oldest I've worked
on was the T90) have been fairly consistent in their data types.

I think you're quoting bit sizes rather than byte sizes. The C
compiler uses an 8-bit byte for compatibility with other systems, even
though there's no real hardware support for 8-bit operands.
sizeof(char) is 1, of course; sizeof(TYPE) is 8 (64 bits) for each of
short, int, and long. Byte pointers are word pointers with a byte
offset kludged into the high-order 3 bits. Carefully written C code
works just fine; code that makes too many assumptions can fail badly.

The T3E isn't quite so exotic; it uses Alpha CPUs.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 14 '05 #20


"Keith Thompson" <ks***@mib.org> wrote in message news:ln************@nuthaus.mib.org...
"Ron Natalie" <ro*@sensor.com> writes:
[...]
I've worked on 64 bit word machines. The CRAY is word
addressed...there really is NO such hardware datatype other than 64
bit integrals and 64 bit reals. Char's are a unholy kludge in
software (they didn't even try anything else, sizeof any
non-comoosite type is either 8 or 64).


There have been a number of different Cray models, with different
architectures, but I think the vector systems (the oldest I've worked
on was the T90) have been fairly consistent in their data types.

I think you're quoting bit sizes rather than byte sizes. The C
compiler uses an 8-bit byte for compatibility with other systems, even
though there's no real hardware support for 8-bit operands.
sizeof(char) is 1, of course; sizeof(TYPE) is 8 (64 bits) for each of
short, int, and long.


Yes, I was talking bits. My experience was with the X/MP and then
the Y-MP EL processors. I actually bought a CRAY 2 in one job, but
I was gone by the time it was delivered.

Nov 14 '05 #21

"Ron Natalie" <ro*@sensor.com> wrote in message
news:3f***********************@news.newshosting.com...

"Joe Wright" <jo********@earthlink.net> wrote in message news:3F***********@earthlink.net...

[snip]
The IBM 7094 came out in January 1963 and was the last of its ilk from
IBM. Its follow on was the S/360 in 1964. I never came across a "partial
word".

The 16-bit values were referred to as "halfword" and there were a variety
of operations that manipulated them. Loading a halfword into a register got
you sign extension, for example. AFAIK the term "partial word" was never
used.
--
Gary
Nov 14 '05 #22

pete wrote:
Lew Pitcher wrote:
Martijn Lievaart wrote:
[snip]
Same with the unit words.
That means different things to different people.
The way I learned it at uni, very long time ago,
was that a word was the basic unit of storage.
Same as the definition of byte in C/C++. Along came
MicroSoft and institutionalised the word-size of
the 8086 as a WORD, so to others a word now is 16 bits.
I've seen even different uses of the word
'word', anyone got an example?


In the IBM mainframe world, a "word" (or "fullword")
has been 32bits for the
last 40+ years. A 16bit quantity is a "halfword".

I'm familiar with "word" having a similar meaning as
the traditional meaning of "int", having the
"natural size suggested by the architecture
of the execution environment"


IBM System/370 Principles of Operation (GA22-7000-4, September 1, 1975)

System Organization / Information Formats

"The system transmits information between main storage and a CPU or
channel in units of eight bits, or a multiple of eight bits at a time.
Each eight-bit unit of information is called a /byte/, the basic building
block of all formats.
...
Bytes may be handled separately or grouped together in fields. A
/halfword/ is a group of two consecutive bytes and is the basic building
block of instructions. A /word/ is a group of four consecutive bytes; a
/doubleword/ is a group of eight bytes."
(It should be noted that the term "byte" in the above text refers to a
CPU-measured quantity, and /not/ to the terminology used by the C standard.)


--
Lew Pitcher, IT Consultant, Application Architecture
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed here are my own, not my employer's)

Nov 14 '05 #23
