fwrite() efficiency/alternative

Hello, all.

I became aware of issues with fwrite() a couple of years ago, when it threw monkey wrenches into my data management, and I quickly wrote a function to work around it and replace it. Recently, though, I have decided that my current method is a serious bottleneck, and I was wondering whether using fwrite() to write an array of single bytes would be fatal to portability. My current integer-writing function, seen below, has been revised and optimized several times, and I have squeezed as much performance out of it as I can. The datatypes are self-explanatory and defined elsewhere: (signedness)(type)(bitcount). So ui32 is an unsigned 32-bit integer, etc.

    ui32 Ectaraio::write( const void * ptr, ui32 size, ui32 n){
        ui32 numWritten=0;                  // counts bytes written
        const si8 * p = (const si8*)ptr;
        if(size == 1){
            // Single bytes have no byte order: write them as they come.
            for(ui32 i = n; i--;){
                putc(*(p++),fp);
                numWritten++;
            }
        }else{
            if(!endVar){
                // Little-endian host: emit each element's bytes in reverse order.
                for(ui32 varIndex=n;varIndex--;){
                    p+=size;                // step to one past the current element
                    for(ui32 byteIndex=size;byteIndex--;){
                        putc(*(--p),fp);
                        numWritten++;
                        if(!verbose)continue;
                        sf32 percent = (sf32)numWritten/(size*n)*100;
                        if(((si32)percent%5==0))printf("%.2f%%\n",percent);
                    }
                    p+=size;                // advance to the next element
                }
            }else{
                // Big-endian host: the bytes are already in file order.
                for(ui32 varIndex=n;varIndex--;){
                    for(ui32 byteIndex=size;byteIndex--;){
                        putc(*(p++),fp);
                        numWritten++;
                        if(!verbose)continue;
                        sf32 percent = (sf32)numWritten/(size*n)*100;
                        if(((si32)percent%5==0))printf("%.2f%%\n",percent);
                    }
                }
            }
        }
        return numWritten;
    }

Currently, I am having the issue that n*size putc() calls are a lot slower than one fwrite() per size bytes, at least until a buffer fills and is written out. What I am concerned about is breaking portability. I noticed years back that merely recompiling the program somehow made my files unusable, even though I never write structures as a whole or anything of the sort. By the way, endVar is a variable holding the endianness state: 0 for LSB and 1 for MSB. It is determined by a quick routine in the constructor of my IO class, part of my library. I digress. The main question: is using fwrite() to write blocks of single bytes unhealthy if the same file is to be used on many platforms? Or is my current function the best I am going to get for my purposes?

- Ectara
Nov 2 '09 #1

✓ answered by donbock

newb16
fwrite() should be fine if you pack and unpack your structures into a byte array properly (taking care of endianness).
Nov 2 '09 #2
donbock
The strategy taken in your code snippet is to write to the file a byte at a time; using the specified endianness to control the order you pluck bytes from the input array.

Another strategy would be to construct a copy of the input array, transforming the copy in accordance with the specified endianness. Then you can write the copy to the file in one chunk.
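
A minimal sketch of that second strategy, assuming the file format wants every element byte-reversed relative to host order (the little-endian case in the question). The name write_swapped and its signature are hypothetical and do not come from the original library; the copy lives in a std::vector so no variable-length array is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not from the thread): copies n elements of `size`
// bytes each, reversing the bytes within every element, then writes the
// whole copy with a single fwrite() call.
// Returns the number of complete elements written, as fwrite() does.
std::size_t write_swapped(const void* ptr, std::size_t size, std::size_t n,
                          std::FILE* fp) {
    assert(fp != nullptr);
    if (size == 0 || n == 0) return 0;

    const unsigned char* src = static_cast<const unsigned char*>(ptr);
    std::vector<unsigned char> copy(size * n);

    for (std::size_t elem = 0; elem < n; ++elem) {
        const unsigned char* in = src + elem * size;
        unsigned char* out = copy.data() + elem * size;
        for (std::size_t b = 0; b < size; ++b) {
            out[b] = in[size - 1 - b];  // last byte of the element goes first
        }
    }
    return std::fwrite(copy.data(), size, n, fp);
}
```

The trade-off is memory: the copy is as large as the input, so for very large arrays a fixed-size chunk buffer may be the better fit.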

Personally, I've found it less confusing to use text files to communicate information between systems of potentially different endianness. The disadvantages are obvious: time taken to translate between binary and text; increased size of the file. The advantage is also obvious: processor and compiler changes don't render the data file unusable.

There are more things than endianness to complicate your life when communicating between systems: two's-complement representation of integers is ubiquitous but not guaranteed to be universal; representation of floating point numbers is notoriously fickle; alignment rules can vary, changing the number of pad bytes between structure fields; order of bits in a bit field can vary; etc.
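
To illustrate the padding point, here is a hedged sketch that writes a structure field by field instead of handing the whole struct to fwrite(). Record, write_record and read_record are made-up names for illustration only. The on-disk layout is fixed at 5 bytes (1 for type, 4 for value, most significant byte first), whatever sizeof(Record) happens to be on a given compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical record type, used only for illustration.
struct Record {
    std::uint8_t  type;
    std::uint32_t value;
};

// On-disk size: 1 byte for type + 4 bytes for value, with no padding.
const std::size_t kRecordBytes = 5;

// Serializes the fields one by one, so compiler-inserted pad bytes
// never reach the file.
bool write_record(const Record& r, std::FILE* fp) {
    assert(fp != nullptr);
    unsigned char bytes[kRecordBytes];
    bytes[0] = r.type;
    bytes[1] = static_cast<unsigned char>(r.value >> 24);
    bytes[2] = static_cast<unsigned char>(r.value >> 16);
    bytes[3] = static_cast<unsigned char>(r.value >> 8);
    bytes[4] = static_cast<unsigned char>(r.value);
    return std::fwrite(bytes, 1, kRecordBytes, fp) == kRecordBytes;
}

// Rebuilds the fields from the fixed 5-byte layout.
bool read_record(Record& r, std::FILE* fp) {
    assert(fp != nullptr);
    unsigned char bytes[kRecordBytes];
    if (std::fread(bytes, 1, kRecordBytes, fp) != kRecordBytes) return false;
    r.type = bytes[0];
    r.value = (static_cast<std::uint32_t>(bytes[1]) << 24) |
              (static_cast<std::uint32_t>(bytes[2]) << 16) |
              (static_cast<std::uint32_t>(bytes[3]) << 8) |
               static_cast<std::uint32_t>(bytes[4]);
    return true;
}
```

This only addresses padding and byte order for unsigned integers; floating-point and bit-field representations would need their own treatment.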
Nov 2 '09 #3
Ectara
Also, I neglected to mention that the entire library is designed to operate in big endian. As the function above shows, on a little-endian machine it writes each set of size bytes in reverse, while on a big-endian machine it writes the bytes in order. I suppose I could use a small buffer of length size to swap the bytes around and issue one fwrite() call per set. I was just curious whether fwrite() really is as fickle as it was in my experience, where a merely cosmetic change would magically make the file unreadable after writing it. I guess I'll just keep testing and checking.
Nov 2 '09 #4
donbock
@Ectara
I'm not aware of any fickleness in the fwrite() function. What sort of cosmetic changes have caused these problems for you?
Nov 2 '09 #5
Tassos Souris
I totally agree with donbock. Go for the text side. Besides, many widely used standards (like XML) depend on text for communication between applications.

Besides, most probably, your implementation of writing a number to a file in its binary format is not as efficient as the services that the OS might provide.
For example, writing each byte to a file individually is not very efficient (remember that the file might be set to unbuffered mode by the client). Also, those too many ifs and fors and... uh... branch misses? Those really hurt performance.
If you desperately need to write the number to the file in its binary format, do this:
1) Convert the number into an array of bytes with parallelism using bitwise operators. Do not use conditions.
2) Write the array with fwrite(). When you fread() that chunk of bytes from the file you will have the original array of bytes you produced.
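
The two steps above could be sketched roughly as follows, assuming 32-bit unsigned values and a big-endian file format (the format the original poster describes). The names pack_be32, unpack_be32 and write_u32_array are hypothetical. Because shifts operate on the value rather than on its memory layout, the same code yields the same bytes on any host, with no endianness test at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Step 1: turn each value into 4 bytes, most significant byte first.
// Only shifts and truncating casts are used, no endianness branches.
std::vector<unsigned char> pack_be32(const std::uint32_t* values, std::size_t n) {
    std::vector<unsigned char> bytes(n * 4);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t v = values[i];
        bytes[4 * i + 0] = static_cast<unsigned char>(v >> 24);
        bytes[4 * i + 1] = static_cast<unsigned char>(v >> 16);
        bytes[4 * i + 2] = static_cast<unsigned char>(v >> 8);
        bytes[4 * i + 3] = static_cast<unsigned char>(v);
    }
    return bytes;
}

// Inverse of pack_be32: rebuilds n values from 4 * n bytes.
std::vector<std::uint32_t> unpack_be32(const unsigned char* bytes, std::size_t n) {
    std::vector<std::uint32_t> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = (static_cast<std::uint32_t>(bytes[4 * i + 0]) << 24) |
                    (static_cast<std::uint32_t>(bytes[4 * i + 1]) << 16) |
                    (static_cast<std::uint32_t>(bytes[4 * i + 2]) << 8) |
                     static_cast<std::uint32_t>(bytes[4 * i + 3]);
    }
    return values;
}

// Step 2: write the packed bytes with one fwrite() call.
bool write_u32_array(const std::uint32_t* values, std::size_t n, std::FILE* fp) {
    assert(fp != nullptr);
    if (n == 0) return true;  // nothing to write
    const std::vector<unsigned char> bytes = pack_be32(values, n);
    return std::fwrite(bytes.data(), 1, bytes.size(), fp) == bytes.size();
}
```

Reading is the mirror image: fread() the bytes into a buffer and pass them to unpack_be32.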
Nov 2 '09 #6
Ectara
@donbock
Something silly, like recompiling the executable to fix a spelling error in a string literal unrelated to the function or the file I/O, such as a welcome message that is printed directly without manipulation. All of a sudden, reading in what was once valid data yields partially correct data, but some stack-allocated variables come up empty or hold bizarre numbers. It boggles my mind how that is possible when I'm looking at a valid hexdump and the way it is read hasn't changed. fwrite() seems to be working well so far in these current trials, though.

@Tassos Souris
I had done quite a bit of research on file buffering, and figured that since I was making several calls anyway, whether it wrote as soon as possible or when the buffer filled would make little difference to the performance hit caused by the function call overhead. Also, what branch misses? Would the if/else structure not catch all possibilities for those two variables? My brother did suggest using preprocessor directives instead of an if/else, but I prefer the if/else for the sole purpose of spoofing my endianness at will, to write or read in a different endianness. It has its uses, despite the minor performance hit on each call to the function. Also, what would you suggest to replace the for loops? I have tried some new things (trusting fwrite() for now, and seeing how it holds up):

    ui32 Ectaraio::write( const void * ptr, ui32 size, ui32 n){
        ui32 numWritten=0;                  // counts elements written, as fwrite() does
        const si8 * p = (const si8*)ptr;
        if(size == 1)numWritten = fwrite(p,sizeof(si8),n,fp);
        else{
            si8 buffer[size];               // variable-length array: a compiler extension in C++
            if(!endVar){
                // Little-endian host: reverse each element into the buffer, then write it.
                for(ui32 varIndex=n;varIndex--;){
                    for(ui32 byteIndex=size;byteIndex--;)buffer[byteIndex] = *(p++);
                    numWritten+=fwrite(buffer,size,1,fp);
                }
            }else numWritten+=fwrite(p,size,n,fp);  // big-endian host: bytes already in file order
        }
        return numWritten;
    }

Also, thank you for your input, everyone. The above function shaved 5 seconds off writing out an 8.9 MB dynamically allocated array of single bytes.
Nov 2 '09 #7
donbock
@Ectara
I might be wrong, but I'm fairly sure these problems were not caused by fwrite() itself. It is much more likely that there was a change in some implementation-defined attribute of the C language, thereby causing an unexpected change in how the byte array was coded/decoded.
Nov 3 '09 #8
Ectara
Hm. Well, it is working for now, and many other people use it with few complaints, so I guess I can make replacements for my old byte-by-byte I/O. Is there a more efficient way to buffer and swap the order of the bytes than what I did?
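
One possible answer, offered as a sketch rather than a measured improvement: swap into a fixed-size buffer that holds many elements and call fwrite() once per full buffer instead of once per element. The name write_swapped_chunked and the 4096-byte buffer size are assumptions made for illustration; whether this is actually faster on a given platform would need measuring.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from the thread): byte-reverses each element
// into a fixed-size stack buffer and issues one fwrite() per filled buffer.
// Returns the number of complete elements written.
std::size_t write_swapped_chunked(const void* ptr, std::size_t size,
                                  std::size_t n, std::FILE* fp) {
    assert(fp != nullptr);
    unsigned char buffer[4096];
    // Sketch limitation: elements larger than the buffer are not handled.
    if (size == 0 || size > sizeof buffer) return 0;

    const unsigned char* src = static_cast<const unsigned char*>(ptr);
    const std::size_t perChunk = sizeof buffer / size;  // elements per buffer
    std::size_t written = 0;

    while (written < n) {
        const std::size_t count = std::min(perChunk, n - written);
        for (std::size_t e = 0; e < count; ++e) {
            const unsigned char* in = src + (written + e) * size;
            unsigned char* out = buffer + e * size;
            for (std::size_t b = 0; b < size; ++b) {
                out[b] = in[size - 1 - b];  // reverse bytes within the element
            }
        }
        const std::size_t done = std::fwrite(buffer, size, count, fp);
        written += done;
        if (done != count) break;  // short write: stop and report progress
    }
    return written;
}
```

Unlike a per-call variable-length array, the buffer size here is a compile-time constant, so the code stays within standard C++.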
Nov 3 '09 #9
Banfa
@Ectara
Sounds like undefined behaviour to me. I think I would be reaching for a copy of bounds checker right about now.
Nov 3 '09 #10
