Question about disparate CHAR_BIT systems and file access

charles_gero

Hi Everyone,

I have a quick question regarding access to a file from disparate
CHAR_BIT systems. Has anyone had experience writing a file on a system
where CHAR_BIT is one value (let's use the value of 10) and then
reading said file from a system where this value is different (let's
say the common value of 8)?

I'm just curious how this would play out with respect to the standards,
etc. So for example, if I have a system where CHAR_BIT is 10 and I
write a single character to a hard disk file (using fputc, POSIX write,
etc...), and then move this hard drive to a system with CHAR_BIT set to
8 and attempt to read, what would occur? Obviously I would need at
least two "char" reads, but what happens to the 2 bits in the second
read? Are they treated as most significant, least significant, etc.? What
would the file size even be reported as on such a system?

I ask not because I've seen this, as a matter of fact I don't believe
I've ever personally run into a system where CHAR_BIT is anything other
than 8 (although we know they do exist), but rather in an effort to
understand how to write the most portable code possible. NOTE: I'm
not limiting this to disk file discussion only, just using it as an
example. The file could be generated on machine A and network
transferred to B. I'm just curious how this would work.

All comments are extremely appreciated. Thank you so much.

-Charlie

Jan 2 '07 #1

Ben Pfaff

ch**********@merck.com writes:

>I have a quick question regarding access to a file from disparate
>CHAR_BIT systems. Has anyone had experience writing a file on a system
>where CHAR_BIT is one value (let's use the value of 10) and then
>reading said file from a system where this value is different (let's
>say the common value of 8)?

When this has been brought up in the past, if I recall correctly
the most common suggestion has been that, if you want to write
portable data files in C, you should only use the least
significant 8 bits of each byte (and zero the rest).
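
A minimal sketch of that suggestion, assuming a 32-bit value stored as four octets with the most significant octet first. The function names and the octet order are illustrative assumptions, not something specified in this thread. `unsigned long` is used rather than `uint32_t` because an exact-width 32-bit type cannot exist when CHAR_BIT is 10.

```c
#include <assert.h>
#include <stdio.h>

/* Write the low 32 bits of v as four octets, most significant first
   (the order is an assumption of this sketch). Every value passed to
   fputc() is in 0..255, so any bits above the low 8 of a wider byte
   are zero. Returns 0 on success, -1 on a write error. */
int put_u32_octets(FILE *fp, unsigned long v)
{
    int shift;

    assert(fp != NULL);
    for (shift = 24; shift >= 0; shift -= 8) {
        if (fputc((int)((v >> shift) & 0xFFUL), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Read back four octets written by put_u32_octets(). Bits above the
   low 8 of each byte read are masked off. Returns 0 on success,
   -1 on end of file or a read error. */
int get_u32_octets(FILE *fp, unsigned long *out)
{
    unsigned long v = 0;
    int i;

    assert(fp != NULL && out != NULL);
    for (i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        v = (v << 8) | ((unsigned long)c & 0xFFUL);
    }
    *out = v;
    return 0;
}
```

The stream should be opened in binary mode ("wb" / "rb"). How the bytes reach the other machine is still outside the C standard.
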
--
Go not to Usenet for counsel, for they will say both no and yes.

Jan 2 '07 #2

Keith Thompson

Ben Pfaff <bl*@cs.stanford.edu> writes:

>ch**********@merck.com writes:
>>I have a quick question regarding access to a file from disparate
>>CHAR_BIT systems. Has anyone had experience writing a file on a system
>>where CHAR_BIT is one value (let's use the value of 10) and then
>>reading said file from a system where this value is different (let's
>>say the common value of 8)?
>
>When this has been brought up in the past, if I recall correctly
>the most common suggestion has been that, if you want to write
>portable data files in C, you should only use the least
>significant 8 bits of each byte (and zero the rest).

And even then, transferring and possibly translating the data is
likely to be non-trivial; it's certainly not defined by the standard.

If both systems have mechanisms for sending and receiving data as
streams of bits, then those mechanisms can be used to achieve a sort
of commonality; bits are bits. Or, if the CHAR_BIT==10 system
supports some networking standard, it will probably be able to send
and receive streams of octets somehow, since that's how most modern
networking protocols are defined.
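
As a rough illustration of "bits are bits", here is a sketch that packs fixed-width values (for example 10-bit chars) into a stream of octets. The bit order (most significant bit first) and the zero padding of the final octet are assumptions of this sketch; a real system could use any convention, and `pack_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Pack n values of `width` bits each (1..16) into octets, most
   significant bit first. The last octet is padded with zero bits.
   Each element of out carries one octet in its low 8 bits.
   Returns the number of octets produced. */
size_t pack_bits(const unsigned *in, size_t n, unsigned width,
                 unsigned char *out, size_t out_cap)
{
    unsigned long acc = 0;  /* bit accumulator, never holds more than 23 bits */
    unsigned nbits = 0;     /* number of pending bits in acc */
    size_t i, used = 0;

    assert(in != NULL && out != NULL);
    assert(width >= 1 && width <= 16);

    for (i = 0; i < n; i++) {
        acc = (acc << width) | (in[i] & ((1UL << width) - 1UL));
        nbits += width;
        while (nbits >= 8) {
            assert(used < out_cap);
            out[used++] = (unsigned char)((acc >> (nbits - 8)) & 0xFFUL);
            nbits -= 8;
        }
        acc &= (1UL << nbits) - 1UL;  /* keep only the bits not yet emitted */
    }
    if (nbits > 0) {
        assert(used < out_cap);
        out[used++] = (unsigned char)((acc << (8 - nbits)) & 0xFFUL);
    }
    return used;
}
```

Under these assumptions the two 10-bit values 0x3FF and 0x001 pack into the three octets FF C0 10, and a single 10-bit char occupies two octets, the second of which carries the remaining 2 bits in its top positions.
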

It's not likely that a CHAR_BIT==8 system and a CHAR_BIT==10 system
would be able to share a common file system; CHAR_BIT!=8 systems tend
to be embedded, and might not even support a file system. But the
standard certainly doesn't preclude the possibility, and if this is
done, the details are going to be system-specific.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Jan 2 '07 #3

Chris Torek

In article <11**********************@s34g2000cwa.googlegroups.com>
<ch**********@merck.com> wrote:

>I have a quick question regarding access to a file from disparate
>CHAR_BIT systems. Has anyone had experience writing a file on a system
>where CHAR_BIT is one value (let's use the value of 10) and then
>reading said file from a system where this value is different (let's
>say the common value of 8)?

For actual historical implementations, just look at the standard
for FTP. (I assume you have access to a "raw-style" ftp command,
rather than the all-automatic, usually-passive implementations
built into various browsers under the "ftp://user@host/path"
syntax.) Note that there is usually a "binary" command, which
corresponds to the protocol-level operation, "TYPE L BYTESIZE 8".

>I'm just curious how this would play out with respect to the standards,
>etc. So for example, if I have a system where CHAR_BIT is 10 and I
>write a single character to a hard disk file (using fputc, POSIX write,
>etc...), and then move this hard drive to a system with CHAR_BIT set to
>8 and attempt to read, what would occur?

Your first problem turns out to be "and then move this hard drive".
The kinds of "hard drive"s that plug into 6, 7, 9, or 10-bit byte
hardware do not plug into 8-bit-byte hardware. (For one thing,
they have the wrong number of pins on the end of the connector,
since they have a different bus width.)

As it turns out, however, there usually are *some* pieces of
hardware you can use to transfer the data. When you do, one of
several things happens:

- "Extra" bits simply vanish. If they were not predictable,
you are in trouble.

- "Extra" bits are re-coded according to some scheme, e.g., a
36-bit word is reported as four octets (8-bit-bytes), plus a
fifth octet in which at most four bits are ones.

- "Missing" bits are reported as constant, usually 0 (i.e., 6-bit
FIELDATA character data comes out as octets in the range 0..63).

- "Missing" bits are filled in with junk, which you must mask
off.
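
Two of the cases above can be sketched in C. The layout used for the 36-bit word (high 32 bits in the first four octets, low 4 bits in the low nibble of the fifth) is only one possible scheme and is an assumption here, as are the function names. `unsigned long long` requires C99.

```c
#include <assert.h>
#include <stddef.h>

/* Re-code the low 36 bits of word as five octets. Assumed layout:
   octets 0..3 hold bits 35..4, octet 4 holds bits 3..0 in its low
   nibble, so at most four bits of the fifth octet are ones. */
void word36_to_octets(unsigned long long word, unsigned char out[5])
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((word >> (28 - 8 * i)) & 0xFFu);
    out[4] = (unsigned char)(word & 0x0Fu);
}

/* Inverse of word36_to_octets() under the same assumed layout. */
unsigned long long octets_to_word36(const unsigned char in[5])
{
    unsigned long long word = 0;
    int i;

    assert(in != NULL);
    for (i = 0; i < 4; i++)
        word = (word << 8) | (in[i] & 0xFFu);
    return (word << 4) | (in[4] & 0x0Fu);
}

/* "Missing" bits filled with junk: keep only the low 6 bits of an
   octet that carries one 6-bit character. */
unsigned sixbit_from_octet(unsigned octet)
{
    return octet & 0x3Fu;
}
```

If the transfer hardware instead drops the extra bits, no amount of code on the receiving side can recover them.
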

>Obviously I would need at least two "char" reads, but what happens
>to the 2 bits in the second read? Are they treated as most significant,
>least significant, etc.?

Yes, or sometimes no. :-)

>What would a file size even be reported on such a system?

On most of these systems, the concept of "file size" was pretty
nebulous in the first place. A file had a different number of
bytes (of whatever byte-size) stored in it depending on how you
accessed it. These systems had a plethora of "access methods",
which -- as Ken Thompson put it -- "filled a much-needed gap".
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Jan 2 '07 #4

Stephen Sprunk

<ch**********@merck.com> wrote in message
news:11**********************@s34g2000cwa.googlegroups.com...

>I have a quick question regarding access to a file from disparate
>CHAR_BIT systems. Has anyone had experience writing a file on a system
>where CHAR_BIT is one value (let's use the value of 10) and then
>reading said file from a system where this value is different (let's
>say the common value of 8)?
>
>I'm just curious how this would play out with respect to the standards,
>etc. So for example, if I have a system where CHAR_BIT is 10 and I
>write a single character to a hard disk file (using fputc, POSIX write,
>etc...), and then move this hard drive to a system with CHAR_BIT set to
>8 and attempt to read, what would occur? Obviously I would need at
>least two "char" reads, but what happens to the 2 bits in the second
>read? Are they treated as most significant, least significant, etc.? What
>would the file size even be reported as on such a system?

You shouldn't be able to physically connect the drive to both systems in
that specific case since a system that uses a non-power-of-two char size
will, by necessity, have a different interface than a power-of-two char
one (i.e. the number of data pins will differ, among other likely
problems). In the more common case where one system uses a CHAR_BIT
that is a multiple of 8, then you could likely connect it and get the
data with the logical multiplication of chars, e.g. one 24-bit char
written equals three 8-bit chars read.
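
A small sketch of that "logical multiplication", assuming a char width that is a multiple of 8 and no wider than 32 bits. Which octet comes first is system-specific, so the order is a parameter here; `split_wide_char` is a hypothetical helper, not part of any real system's tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Split one char of width_bits (a multiple of 8, at most 32) into
   width_bits / 8 octets. msb_first selects the octet order; which
   order a real system produces is system-specific.
   Returns the number of octets written to out. */
size_t split_wide_char(unsigned long c, unsigned width_bits,
                       int msb_first, unsigned char *out)
{
    size_t n, i;

    assert(out != NULL);
    assert(width_bits >= 8 && width_bits <= 32 && width_bits % 8 == 0);

    n = width_bits / 8;
    for (i = 0; i < n; i++) {
        /* pos is the octet's significance: 0 is the least significant */
        size_t pos = msb_first ? (n - 1 - i) : i;
        out[i] = (unsigned char)((c >> (8 * pos)) & 0xFFUL);
    }
    return n;
}
```

With a 24-bit char holding 0x123456, this yields 12 34 56 when msb_first is nonzero and 56 34 12 otherwise, so a file of N such chars would appear as 3N chars on the 8-bit side.
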

The good news is that people who use such systems are used to these
problems and will likely have tools to convert data (to the extent
conversion is possible). As long as your data doesn't stray outside of
the basic execution character set, you can safely ignore the problem in
practice. It's binary data that will bite you, and there's no portable
answer to that problem.

If there's a light at the end of the tunnel, it's that every mainstream
system (and even most embedded and HPC ones these days) has CHAR_BIT==8.
While it makes sense to ensure your code still works elsewhere, you
generally won't have to deal with moving data between worlds -- it'll
stay stuck in the world where it was created and your program can handle
it natively. Dealing with endianness issues is a far, far worse
problem.

>I ask not because I've seen this, as a matter of fact I don't believe
>I've ever personally run into a system where CHAR_BIT is anything other
>than 8 (although we know they do exist), but rather in an effort to
>understand how to write the most portable code possible. NOTE: I'm
>not limiting this to disk file discussion only, just using it as an
>example. The file could be generated on machine A and network
>transferred to B. I'm just curious how this would work.

Network protocols are defined to have a specific number of bits per
byte, usually 8. The IETF goes so far as to specify its protocols (like
TCP/IP) in terms of "octets" to avoid any possible confusion. If a
system uses some other numbers of bits, it's required to adapt the data
before transmission or after reception to comply with the protocol.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Jan 2 '07 #5

Walter Roberson

In article <45**********************@free.teranews.com>,
Stephen Sprunk <st*****@sprunk.org> wrote:

>You shouldn't be able to physically connect the drive to both systems in
>that specific case since a system that uses a non-power-of-two char size
>will, by necessity, have a different interface than a power-of-two char
>one (i.e. the number of data pins will differ, among other likely
>problems).

That depends on how deep you want to go in your definition of
"physically connect". SATA and similar technologies use serial
interfaces, and so at the connection point are not bound to any
particular byte width.

>Network protocols are defined to have a specific number of bits per
>byte, usually 8. The IETF goes so far as to specify its protocols (like
>TCP/IP) in terms of "octets" to avoid any possible confusion. If a
>system uses some other numbers of bits, it's required to adapt the data
>before transmission or after reception to comply with the protocol.

The wording you have used might be interpreted by some as indicating
that all networking must be done in 8-bit bytes. That is not the case:
protocols not defined by the IETF (or similar bodies) can use whatever
byte size they want internally, subject to the limitation that they have
to pad the end of the data to an octet boundary if they want to run over
Ethernet (and even now, not everything is Ethernet).
--
Programming is what happens while you're busy making other plans.

Jan 2 '07 #6

Gordon Burditt

>I have a quick question regarding access to a file from disparate
>CHAR_BIT systems. Has anyone had experience writing a file on a system
>where CHAR_BIT is one value (let's use the value of 10) and then
>reading said file from a system where this value is different (let's
>say the common value of 8)?
>
>I'm just curious how this would play out with respect to the standards,
>etc. So for example, if I have a system where CHAR_BIT is 10 and I
>write a single character to a hard disk file (using fputc, POSIX write,
>etc...), and then move this hard drive to a system with CHAR_BIT set to
>8 and attempt to read, what would occur? Obviously I would need at
>least two "char" reads, but what happens to the 2 bits in the second
>read? Are they treated as most significant, least significant, etc.? What
>would the file size even be reported as on such a system?

I seem to recall some early DEC hardware (tape, and possibly disks)
that read and wrote data in 16-bit chunks. You could maybe re-connect
this hardware to different systems (e.g. PDP-11 vs. IBM 360), or
move media between them. The interesting thing here:

- 16-bit words (aligned) DID NOT have byte-order problems.
- Strings COULD have byte-order problems (even a zero-length string!).
I believe the UNIX 'dd' command option conv=swab was invented to
deal with this. Related options were various ways of translating
between ASCII and EBCDIC.
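
A sketch of the pair-swapping operation, in the spirit of dd's conv=swab, to show why strings were affected. Leaving a trailing odd byte in place is an assumption of this sketch, and `swap_byte_pairs` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Swap each adjacent pair of bytes in buf. A trailing odd byte is
   left where it is (an assumption of this sketch). */
void swap_byte_pairs(unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 1 < len; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

A word-aligned empty string stored as the byte pair { '\0', 'J' }, where 'J' is whatever happens to follow the terminator, becomes { 'J', '\0' } after the swap: the terminator has moved and the string is now one character long. That is presumably what "even a zero-length string" refers to.
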

Jan 3 '07 #7
