OK...I'm in the process of learning C++. In my old (non-portable)
programming days, I made use of binary files a lot...not worrying
about endian issues. I'm starting to understand why C++ makes it
difficult to read/write an integer directly as a bit-stream to a file.
However, I'm at a bit of a loss for how to do the following. So as
not to obfuscate the issue, I won't show what I've been attempting ;-)
What I want to do is the following, using the standard IO streams.
1) open an arbitrary file (file1).
2) starting with the first byte in (file1), read a chunk of data into
an array of integers.
3) manipulate the array, as integer data, and then output the contents
of the array to another file (file2).
4) read the next data-chunk from file1 into the array.
5) goto 3 until end of file.
If anyone knows of a tutorial that contains concrete examples of this,
I'd appreciate a pointer to the info. Thanks
On 24 Jul 2003 16:56:53 -0700, ma**********@yahoo.com (J. Campbell)
wrote: <snip>
What I want to do is the following, using the standard IO streams.
# include <fstream>
# include <iostream>
# include <vector>
# include <sstream>
# include <string>
# include <algorithm>
1) open an arbitrary file (file1).
std::ifstream file1("f.txt");
2) starting with the first byte in (file1), read a chunk of data into an array of integers.
const int CHUNK = 128;
char buffer[CHUNK];
file1.read(buffer, CHUNK);
std::vector<int> data;
std::copy(buffer, buffer + 128, std::back_inserter(data));
3) manipulate the array, as integer data,
void manipulate(std::vector<int> &v);
manipulate(data);
and then output the contents of the array to another file (file2).
std::ofstream file2("g.txt");
std::copy(data.begin(), data.end(),
std::ostream_iterator<int>(std::cout, "\n"));
4) read the next data-chunk from file1 into the array. 5) goto 3 until end of file.
goto 3; :)
If anyone knows of a tutorial that contains concrete examples of this, I'd appreciate a pointer to the info. Thanks
The C++ Standard Library by Josuttis.
Jonathan
On Thu, 24 Jul 2003 20:39:23 -0400, Jonathan Mcdougall
<DE******************@yahoo.ca> wrote: On 24 Jul 2003 16:56:53 -0700, ma**********@yahoo.com (J. Campbell) wrote:
<snip>
What I want to do is the following, using the standard IO streams. # include <fstream> # include <vector> # include <algorithm>
Forget these ones :
# include <sstream> # include <iostream> # include <string>
1) open an arbitrary file (file1). std::ifstream file1("f.txt");
2) starting with the first byte in (file1), read a chunk of data into an array of integers.
const int CHUNK = 128;
char buffer[CHUNK]; file1.read(buffer, CHUNK);
std::vector<int> data; std::copy(buffer, buffer + 128, std::back_inserter(data));
std::copy(buffer, buffer + CHUNK, std::back_inserter(data));
3) manipulate the array, as integer data,
void manipulate(std::vector<int> &v);
manipulate(data);
and then output the contents of the array to another file (file2).
std::ofstream file2("g.txt"); std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, "\n"));
std::copy(data.begin(), data.end(),
std::ostream_iterator<int>(file2, "\n"));
Sorry about that,
Jonathan
Thanks Jonathan.
Your response is most helpful. Now, I need to digest why it works,
and why it's necessary.
I want to clarify a few things. Assuming int is 32-bits, then,
after:
-----
const int CHUNK = 128;
char buffer[CHUNK];
file1.read(buffer, CHUNK);
------
at this point the char array, "buffer" contains 128 elements of 1-byte
each, right?
-----
std::vector<int> data;
std::copy(buffer, buffer + 128, std::back_inserter(data));
-----
now, the vector named "data" contains 32 elements, each of which is a
4-byte integer, right?
How do I know if the bytes that went into the vector integers went in
head-first or feet-first? in other words, if the first 4 bytes of the
file were (HEX):
FF 00 00 00
will the first int in the vector "data" be FF000000 (dec 4278190080)
or will it be 000000FF (dec 255)? Or is it machine dependent?
can I avoid all the "std::" by using "using namespace std;" or is it
necessary to scope-resolve all the keywords?
Another thing... Do you think it's better to read chunks of a file as
I've indicated, or is it better to load the whole file into memory?
Also, your method leaves 2-duplicates of the data in memory...one as
the char array, and once as the vector. is this a problem?
One more thing...I asked a question here recently: http://groups.google.com/groups?hl=e...gle.com&rnum=1
about accessing a char array as an array of int. How is the vector
method different/safer than the (unsafe & non-portable) method I
demonstrated in the earlier post.
thanks again for the help.
I don't seem to be able to quit typing;-) Sorry to inundate you with
so many questions...I realize that you may not choose to address them
all..
Thomas Matthews wrote: To nitpick, the constant should be "unsigned" since a quantity can't be negative. i.e. const unsigned int CHUNK_SIZE = 128;
I'd disagree. It should be signed, since you might have negative offsets
when accessing the array elements, and mixing signed and unsigned
arithmetic can be problematic, and some compilers warn if you do.
Besides, what would you really gain from making it unsigned?
std::vector<int> data; std::copy(buffer, buffer + 128, std::back_inserter(data)); ----- now, the vector named "data" contains 32 elements, each of which is a 4-byte integer, right?
A 4-byte _signed_ integer.
Yes, as int is by default signed.
How do I know if the bytes that went into the vector integers went in head-first or feet-first? in other words, if the first 4 bytes of the file were (HEX): FF 00 00 00 will the first int in the vector "data" be FF000000 (dec 4278190080) or will it be 000000FF (dec 255)? Or is it machine dependent?
It is machine dependent. The topic is called Endianism.
I've only seen it be called endianness.
Try this experiment:
const unsigned int endian_test = 0x01020304;
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
unsigned char * ptr = (unsigned char *) &endian_test;
byte0 = *ptr++;
byte1 = *ptr++;
byte2 = *ptr++;
byte3 = *ptr++;
cout << hex << (unsigned short) byte0 << endl;
cout << hex << (unsigned short) byte1 << endl;
cout << hex << (unsigned short) byte2 << endl;
cout << hex << (unsigned short) byte3 << endl;
can I avoid all the "std::" by using "using namespace std;" or is it necessary to scope-resolve all the keywords?
This is a personal, style, issue. Here are some popular styles:
1. Declare each function and class with a separate "using" statement:
using std::cout;
using std::vector;
2. Use the global "using" statement:
using namespace std;
3. Prefix each function and class with its namespace:
std::cout << "hello" << std::endl;
There are different opinions on which to use. Use a search engine and search this newsgroup for "namespace" and "using".
At least, most people seem to agree that it's a bad idea to put
something like this in a header.
Btw: you can also put using into functions.
Another thing... Do you think it's better to read chunks of a file as I've indicated, or is it better to load the whole file into memory?
If you have the space, read in the whole file; otherwise read it in as chunks. The fewer reads, the faster the execution.
Not necessarily. If you need maximum speed, you should test it for
different block sizes.
On 25 Jul 2003 08:38:23 -0700, ma**********@yahoo.com (J. Campbell)
wrote: Thanks Jonathan.
Your response is most helpful. Now, I need to digest why it works, and why it's necessary.
I want to clarify a few things. Assuming int is 32-bits, then, after:
You can't "assume" this, it depends on the platform. Anyways it does
not matter in this case.
----- const int CHUNK = 128;
char buffer[CHUNK]; file1.read(buffer, CHUNK); ------ at this point the char array, "buffer" contains 128 elements of 1-byte each, right?
Yes.
----- std::vector<int> data; std::copy(buffer, buffer + 128, std::back_inserter(data)); ----- now, the vector named "data" contains 32 elements, each of which is a 4-byte integer, right?
No, 'data' contains 128 elements of type int. Each element has a size
of sizeof(int), which *could* be 4 bytes.
data[0]
contains the value which was in
buffer[0]
For example, if the first byte in the file was 65, then buffer[0]
contains char(65) (which is 'A') and data[0] simply contains 65.
can I avoid all the "std::" by using "using namespace std;" or is it necessary to scope-resolve all the keywords?
Yes, but I personally do not recommend it. I prefer to qualify
everything, but it is a matter of style (and carefulness).
Another thing... Do you think it's better to read chunks of a file as I've indicated, or is it better to load the whole file into memory?
Depends on the file size and the memory available.
Also, your method leaves 2-duplicates of the data in memory...one as the char array, and once as the vector. is this a problem?
Well you explicitly wanted an array of integers and since there is no
function which takes an int[], I needed to do a conversion.
One more thing...I asked a question here recently:
http://groups.google.com/groups?hl=e...gle.com&rnum=1
about accessing a char array as an array of int. How is the vector method different/safer than the (unsafe & non-portable) method I demonstrated in the earlier post.
Variable-length arrays are, afaik, illegal in C++ anyways. Take a
look at that : http://www.btinternet.com/~chrisnewton/pp/contarray.xml
Jonathan
>> ----- std::vector<int> data; std::copy(buffer, buffer + 128, std::back_inserter(data)); ----- now, the vector named "data" contains 32 elements, each of which is a 4-byte integer, right? A 4-byte _signed_ integer.
I just want to remind you that 'data' contains *128* elements, not 32
and that the endianness discussion does not apply.
<snip>
Jonathan
Jonathan,
I just tried out your method, and it leaves me scratching my head.
After stumbling briefly for lack of the header to define
back_inserter() and ostream_iterator() (thanks Google and SGI), the
code compiles fine:
__________code__________
#include <fstream>
#include <vector>
#include <iterator>
using namespace std;
int main(){
const int DATACHUNK = 20;
char buffer[DATACHUNK];
ifstream filein("shifttest.cpp");
filein.read(buffer, DATACHUNK);
vector<int> filedata;
copy(buffer, buffer + DATACHUNK, back_inserter(filedata));
ofstream fileout("shifttest.joe");
copy(filedata.begin(), filedata.end(),
ostream_iterator<int>(fileout, "\n"));
}
_____end code__________
However, when I look at the file out, it contains:
35
105
110
99
108
117
100
101
32
60
105
111
115
116
114
101
97
109
62
10
which is the ASCII representation of the integer representation of the
ASCII sequence "#include <iostream>"
which, strangely enough, happens to be the first line of
"shifttest. cpp" ;-)
This is really not at all what I am wanting to do. Now my 20 bytes is
represented by 93 bytes of a rather odd data-type...neither characters
nor integers, but rather some strange beast that combines the worst of
both worlds.
I'm left wondering, in this strange new world of C++ do I need to get
used to dealing with ASCII representations of numbers for file I/O?
Or do I need to always break my 4-byte integers into individual bytes
prior to I/O if I don't want to waste storage space? I suppose this
would be pretty easy...something like:
//not tested
int bytetowrite;
char holdword[4];
for(int i = 0; i < 4; i++)
holdword[i] = (bytetowrite >> (i * 8)) & 255;
//holdword now contains, small-byte first, the data from bytetowrite
//(shifting first, then masking, avoids left-shifting 255 into the sign bit)
However, this seems a bit tedious, considering that this rigamarole
doesn't really do anything to the internal data. I feel like there's
something really basic that I don't *get* about streams... All I
really want to do is "get at" the data in a file and treat that data
as numbers typed to the native processor word size...then, manipulate
the data and write the data out to a second file. Consider, for
example, that the file consists of a binary bitmap and I want to
invert it, or rotate it or something.
Anyway...It's apparent that I have a lot to learn. This C++ is
tantalizing me...the code is about 10 to 20 x faster than my old
16-bit compiler...but jeez...what would seem to be a simple
manipulation can become so frustrating!!! It feels a little like
typing with my toes.
Thanks for the help people. It is beginning to make some sense.
Joe
Jonathan Mcdougall <DE******************@yahoo.ca> wrote in message news:<kb****************************@4ax.com>...
<snip>
Jonathan Mcdougall <DE******************@yahoo.ca> wrote in message news:<k7****************************@4ax.com>... ----- std::vector<int> data; std::copy(buffer, buffer + 128, std::back_inserter(data)); ----- now, the vector named "data" contains 32 elements, each of which is a 4-byte integer, right? A 4-byte _signed_ integer.
I just want to remind you that 'data' contains *128* elements, not 32 and that the endianness discussion does not apply.
<snip>
Jonathan
Jonathan...I now understand what's going on and the endianness
discussion. My news reader has serious lag, so I may not be current
with the discussion. However...I understand more after this post.
when I said I wanted the file bytes represented by integers, I meant
that I wanted the first sizeof(int) (eg 4) bytes of data to
be put into integerarray[0], the next into integerarray[1]...etc.
Anyway...thanks for clairifying this.
>I just tried out your method, and it leaves me scratching my head. After stumbling briefly for lack of the header to define back_inserter() and ostream_iterator() (thanks Google and SGI), the code compiles fine:
This depends on the implementation. The standard does not specify
which header must be included by which; <iterator> probably got
included by <algorithm>, sorry about that.
__________code__________
#include <fstream> #include <vector> #include <iterator>
using namespace std;
int main(){ const int DATACHUNK = 20; char buffer[DATACHUNK];
ifstream filein("shifttest.cpp"); filein.read(buffer, DATACHUNK);
vector<int> filedata; copy(buffer, buffer + DATACHUNK, back_inserter(filedata));
ofstream fileout("shifttest.joe"); copy(filedata.begin(), filedata.end(), ostream_iterator<int>(fileout, "\n")); }
_____end code__________
However, when I look at the file out, it contains:
35 105 110 99 108 117 100 101 32 60 105 111 115 116 114 101 97 109 62 10
which is the ASCII representation of the integer representation of the ASCII sequence "#include <iostream>" which, strangely enough, happens to be the first line of "shifttest.cpp" ;-)
You asked for binary, that is what I gave you. If you want the ASCII
representation, just make the ostream_iterator <char>, that's it.
This is really not at all what I am wanting to do. Now my 20 bytes is represented by 93 bytes
93 ?? Why do you say that?
of a rather odd data-type...neither characters nor integers, but rather some strange beast that combines the worst of both worlds.
These numbers you saw are the ASCII value of the characters in the
file. The thing is, characters and integers are actually the very
same thing, it's just the output which makes the difference : ints are
displayed as numbers and chars are displayed as characters, which
depend on your implementation (but you are probably using ASCII).
Remember your subject is "Binary file I/O", not "Text file I/O".
I'm left wondering, in this strange new world of C++ do I need to get used to dealing with ASCII representations of numbers for file I/O?
It depends on what you want. In the case of a simple text file
(remember, *text* is an ambiguous term in programming, everything boils
down to zeros and ones) , values would be ASCII numbers and text would
be the representation on the screen (65 would be 'A').
In the case of a binary file (such as an image), values would be
simple numbers formatted according to the image's type (jpg, bmp..)
and text would be... garbage, since these numbers would be printed
according to the ASCII table (remember when you first started and
tried to display binary files on screen? Loads of smileys and beeps
and ascii graphics..).
However, this seems a bit tedious, considering that this rigamarole doesn't really do anything to the internal data. I feel like there's something really basic that I don't *get* about streams... All I really want to do is "get at" the data in a file and treat that data as numbers typed to the native processor word size...then, manipulate the data and write the data out to a second file. Consider, for example, that the file consists of a binary bitmap and I want to invert it, or rotate it or something.
In that case, you would store every byte in a vector of whatever
(unsigned char would be the best, I think), you skip the header until
the data, you invert it and store the whole thing in a new file.
The actual type of the vector (or array, as you wish) does not matter
except for the memory wasted.
Anyway...It's apparent that I have a lot to learn. This C++ is tantalizing me...the code is about 10 to 20 x faster than my old 16-bit compiler...but jeez...what would seem to be a simple manipulation can become so frustrating!!! It feels a little like typing with my toes.
Hehe.. and you're still only playing with i/o.
Jonathan