processing bytes

I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?
Jul 22 '05 #1
jmoy wrote:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide.
Yes, it does. It just doesn't require a byte to be exactly 8 bits wide.
Stroustrup's TC++PL mentions an implementation where a 'char' is four
bytes.
I doubt that.
How can I write my program so that it will work even with such an
implementation?


You would need to read in the bytes and split them up using bit
manipulation operators.
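
As a rough illustration of that bit manipulation, here is a minimal sketch that splits each stored byte into 8-bit pieces. The function name split_into_octets is made up for this example, and the packing order (most significant octet first within each byte) is an assumption; a real platform with wide bytes may pack external data differently.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <vector>

// Split each stored byte into 8-bit pieces (octets), most significant first.
// Assumption: when CHAR_BIT > 8, octets are packed into a byte from the
// high end down. On the usual CHAR_BIT == 8 platform this is a plain copy.
std::vector<unsigned> split_into_octets(const unsigned char* data,
                                        std::size_t count)
{
    // Number of whole octets that fit in one byte (1 on common platforms).
    const unsigned octets_per_byte = CHAR_BIT / 8;

    std::vector<unsigned> octets;
    octets.reserve(count * octets_per_byte);
    for (std::size_t i = 0; i < count; ++i) {
        for (unsigned k = octets_per_byte; k-- > 0; ) {
            // Shift the wanted octet down to the low end and mask it off.
            octets.push_back((static_cast<unsigned>(data[i]) >> (8 * k)) & 0xFFu);
        }
    }
    return octets;
}
```

On an implementation with 32-bit bytes this would yield four octets per byte; any leftover bits when CHAR_BIT is not a multiple of 8 are ignored by this sketch.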

Jul 22 '05 #2
jmoy wrote:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?


In modern times, few devices actually require byte by byte handling.
Most of them are designed for speed, and that speed is most efficient
handling "streams" of bytes. For example, many files like to transfer
sectors at a time. So one would allocate an array the size of a sector,
read the sector from the file, then parse through the array.

Text files, especially ones whose records are delimited by newlines,
are best read line by line into a std::string. This is still not
processing the file byte by byte.

Also, you will have to separate in your mind the concepts of a byte,
an octet and a character. A byte is the smallest addressable unit of
storage; it can be 8 or more bits. The number of bits in a byte need
not be a multiple or power of 8. An octet is a unit of exactly 8 bits.
A character is a single textual unit, often a letter. A character may
be as small as 6 bits or much larger; the number of bits used depends
on the platform's character encoding scheme. The CDC Cyber computers
have a 6 / 12 bit character (popular letters take 6 bits, less popular
ones require 12 bits). Some Asian character sets require 16 or more
bits. Just remember that there is a difference between a byte, an
octet and a char.
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 22 '05 #3
jmoy wrote:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?


Didn't the CDC have a 36-bit word? Not sure if C implementations used a
6-bit, 8-bit, 9-bit, 18-bit or 36-bit char.
Jul 22 '05 #4
"jmoy" <na******@yahoo.co.in> wrote in message
news:8d**************************@posting.google.com...
I have some data (say in a file) that needs to be handled byte by
byte.
Then use type 'unsigned char'.
Source code I have looked at does this by treating the data as a
stream of 'char's.
That's the only way i/o can be done in (standard) C++.
However, the standard does not require a 'char' to
be a byte wide.
Really?
ISO/IEC 14882:1998(E)

1.7 The C++ memory model

1 The fundamental storage unit in the C++ memory model is the
byte. A byte is at least large enough to contain any member
of the basic execution character set and is composed of a
contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called
the low-order bit; the most significant bit is called the
high-order bit. The memory available to a C++ program consists
of one or more sequences of contiguous bytes. Every byte has
a unique address.
Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes.
I can't seem to locate this mention in my copy of Stroustrup.
What page?
How can I write my program so that it will work
even with such an implementation?


Such an implementation (where 'char' types have a size greater
than one byte) does not conform to the C++ standard.
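
The guarantee can be checked at compile time. The sketch below assumes a C++11 or later compiler (static_assert and constexpr postdate this thread); the helper size_in_bits is a made-up name.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// sizeof is measured in bytes, and every character type is exactly one byte.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(signed char) == 1, "signed char is one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");

// Hypothetical helper: size of a type in bits on this implementation.
template <typename T>
constexpr std::size_t size_in_bits()
{
    return sizeof(T) * CHAR_BIT;
}
```

So a "four-byte char" cannot exist in a conforming implementation; what can exist is a byte, and therefore a char, that is 32 bits wide.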

-Mike
Jul 22 '05 #5
red floyd wrote:
Didn't the CDC have a 36-bit word? Not sure if C implementations used a
6-bit, 8-bit, 9-bit, 18-bit or 36-bit char.


60 bits. Characters were 6 bits in Pascal.
Jul 22 '05 #6
jmoy posted:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?

I thought there was something in the Standard that said a byte had to be
at least 8 bits? Is there anything at all in the Standard limiting the
minimum size? I sure hope there is!

-JKop
Jul 22 '05 #7
JKop wrote:
jmoy posted:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as
a stream of 'char's. However, the standard does not require a 'char'
to be a byte wide. Stroustrup's TC++PL mentions an implementation
where a 'char' is four bytes. How can I write my program so that it
will work even with such an implementation?

I thought there was something in the Standard that said a byte had to
be at least 8 bits? Is there anything at all in the Standard limiting
the minimum size? I sure hope there is!


Yep. In C++, the minimum number of bits in a byte is 8. There is however
no maximum number, so a conforming implementation could be made that
has 5173 bits/byte.
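
The number of bits in a byte is available as CHAR_BIT from <climits>. A minimal sketch, assuming a C++11 compiler; byte_is_octet and low_octet are names invented for this example.

```cpp
#include <climits>   // CHAR_BIT
#include <limits>

// The lower bound is fixed; the upper bound is left to the implementation.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
// unsigned char has no padding bits, so all CHAR_BIT bits carry value.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses every bit of the byte");

// True when a byte on this implementation is exactly an octet.
constexpr bool byte_is_octet()
{
    return CHAR_BIT == 8;
}

// Keep only the low 8 bits of a byte, whatever its width.
constexpr unsigned low_octet(unsigned char b)
{
    return b & 0xFFu;
}
```

Code that must treat data as octets can mask with 0xFF as above, which is a no-op where CHAR_BIT is 8.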

Jul 22 '05 #8
Rolf Magnus posted:
JKop wrote:
jmoy posted:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as
a stream of 'char's. However, the standard does not require a 'char'
to be a byte wide. Stroustrup's TC++PL mentions an implementation
where a 'char' is four bytes. How can I write my program so that it
will work even with such an implementation?

I thought there was something in the Standard that said a byte had to
be at least 8 bits? Is there anything at all in the Standard limiting
the minimum size? I sure hope there is!


Yep. In C++, the minimum number of bits in a byte is 8. There is however
no maximum number, so a conforming implementation could be made that
has 5173 bits/byte.

I was thinking, if there _wasn't_ a lower minimum, then even things like so
would be undefined behaviour:
unsigned int k = 515;
-JKop
Jul 22 '05 #9
JKop wrote:
Rolf Magnus posted:
JKop wrote:
jmoy posted:

I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data
as a stream of 'char's. However, the standard does not require a
'char' to be a byte wide. Stroustrup's TC++PL mentions an
implementation where a 'char' is four bytes. How can I write my
program so that it will work even with such an implementation?
I thought there was something in the Standard that said a byte had
to be at least 8 bits? Is there anything at all in the Standard
limiting the minimum size? I sure hope there is!


Yep. In C++, the minimum number of bits in a byte is 8. There is
however no maximum number, so a conforming implementation could be
made that has 5173 bits/byte.

I was thinking, if there _wasn't_ a lower minimum, then even things
like so would be undefined behaviour:
unsigned int k = 515;


Well, for unsigned int, a minimum range from 0 to 65535 is guaranteed.
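
That minimum range can be expressed as a compile-time check. This is a sketch assuming a C++11 compiler; fits_unsigned_int is a hypothetical helper. Note also that converting an out-of-range value to an unsigned type wraps modulo 2^N rather than being undefined.

```cpp
#include <climits>   // UINT_MAX
#include <limits>

// Guaranteed minimums for unsigned int on every conforming implementation.
static_assert(UINT_MAX >= 65535u, "unsigned int covers at least 0..65535");
static_assert(std::numeric_limits<unsigned int>::digits >= 16,
              "unsigned int has at least 16 value bits");

// The value from the post is well inside the guaranteed range.
constexpr unsigned int k = 515;

// Hypothetical helper: does value fit into an unsigned int here?
constexpr bool fits_unsigned_int(unsigned long value)
{
    return value <= UINT_MAX;
}
```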

Jul 22 '05 #10
On Mon, 14 Jun 2004 14:23:52 GMT, Thomas Matthews
<Th****************************@sbcglobal.net> wrote in comp.lang.c++:
jmoy wrote:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?


In modern times, few devices actually require byte by byte handling.
Most of them are designed for speed, and that speed is most efficient
handling "streams" of bytes. For example, many files like to transfer
sectors at a time. So one would allocate an array the size of a sector,
read the sector from the file, then parse through the array.


Actually, I need to disagree with you on this. While it may hold true
for high-level applications in hosted environments, it breaks down
quickly for many types of communication interfaces even in those same
environments.

Many programs must deal with a "clump" of data obtained from a
communication interface (Ethernet, FireWire, USB, CAN, serial port,
etc.). The details of the
device are hardware specific and off-topic here, but how a program can
parse and extract various data types from such a "clump" is not.
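
One portable way to pull values out of such a clump is to assemble them from individual octets with shifts, rather than casting the buffer to a wider type. The sketch below is an illustration only: the Header layout (a 2-octet type followed by a 4-octet length, both big-endian) and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Read a 16-bit big-endian value from two consecutive octets.
unsigned long read_u16_be(const unsigned char* p)
{
    return ((p[0] & 0xFFul) << 8) | (p[1] & 0xFFul);
}

// Read a 32-bit big-endian value from four consecutive octets.
// unsigned long is guaranteed to hold at least 32 bits.
unsigned long read_u32_be(const unsigned char* p)
{
    return ((p[0] & 0xFFul) << 24) | ((p[1] & 0xFFul) << 16)
         | ((p[2] & 0xFFul) << 8)  |  (p[3] & 0xFFul);
}

// Hypothetical message header: 2-octet type, then 4-octet length.
struct Header {
    unsigned long type;
    unsigned long length;
};

Header parse_header(const unsigned char* clump, std::size_t size)
{
    assert(size >= 6 && "header needs six octets");
    Header h;
    h.type = read_u16_be(clump);
    h.length = read_u32_be(clump + 2);
    return h;
}
```

Because the values are built arithmetically, the result does not depend on the host's endianness or alignment rules. The sketch does assume one octet per array element.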

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Jul 22 '05 #11
Rolf Magnus posted:
JKop wrote:
Rolf Magnus posted:
JKop wrote:

jmoy posted:

I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data
as a stream of 'char's. However, the standard does not require a
'char' to be a byte wide. Stroustrup's TC++PL mentions an
implementation where a 'char' is four bytes. How can I write my
program so that it will work even with such an implementation?
I thought there was something in the Standard that said a byte had
to be at least 8 bits? Is there anything at all in the Standard
limiting the minimum size? I sure hope there is!

Yep. In C++, the minimum number of bits in a byte is 8. There is
however no maximum number, so a conforming implementation could be
made that has 5173 bits/byte.

I was thinking, if there _wasn't_ a lower minimum, then even things
like so would be undefined behaviour:
unsigned int k = 515;


Well, for unsigned int, a minimum range from 0 to 65535 is guaranteed.


If it ain't too much trouble, could you please post all the limits.

-JKop
Jul 22 '05 #12
JKop <NU**@NULL.NULL> wrote in message news:
Well, for unsigned int, a minimum range from 0 to 65535 is guaranteed.


If it ain't too much trouble, could you please post all the limits.


If it ain't too much trouble, could you please read the Standard?
Anyway, the limits are the same as for C, and you can view a draft
C standard here (ignore the 'long long' entries):
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n869/
Jul 22 '05 #13
