[c] Efficiently writing large quantities of data to a file

Hi,

I need to write large quantities of data to a file in C. The data comes from statistics that are continuously gathered from a simulator, and to avoid slowing the whole thing down I want the writes to be as fast and efficient as possible.

Since I/O operations are rather slow, I was thinking that using a large buffer would be better than writing each data point every time. Each data point calls my function, at which point I can do something.

As I understand it, fwrite() already uses a buffer, but since my file currently grows in pieces of 4 KiB, I suppose that is its buffer size. I was thinking of something more on the order of MiBs.

I have currently allocated a buffer of 1 MiB, in which I write something and then do fwrite.
sprintf(data->buf, "magic instr %d \n", (int)n);
fwrite(data->buf, 1, strlen(data->buf), data->fd);
memset(data->buf, 0, sizeof(data->buf));

This does not do what I want, which is to fill the buffer as fully as possible and then flush it to disk. I understand that my code is wrong, but I don't really know how to fix it. Should I wrap the fwrite in an if-statement that checks whether the buffer is almost full, like

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);

?

I also wonder whether this approach is the best way to handle this. I have heard about mmap; would it be more efficient?

And then there is the possibility of putting the writing in another thread, so that the main thread puts the data it receives from the simulator in the buffer, and the second one does the actual writing. Would this work better?

Thank you very much,
Thomas
May 23 '07 #1
AdrianH
mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file as appropriate when you are done logging.
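
A minimal sketch of the memory-mapped approach, assuming a POSIX system (open, ftruncate, mmap). The names MappedLog, mlog_open, mlog_append and mlog_close are hypothetical and only illustrate the idea: grow the file to a fixed capacity, copy records into the mapping, and cut the file back to the bytes actually used when logging ends.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapped log file. */
typedef struct {
    int    fd;    /* descriptor of the underlying file */
    char  *base;  /* start of the mapping */
    size_t cap;   /* bytes mapped (and current file size) */
    size_t used;  /* bytes actually logged so far */
} MappedLog;

/* Create/truncate `path`, size it to `cap` bytes and map it. 0 on success. */
static int mlog_open(MappedLog *m, const char *path, size_t cap)
{
    assert(m != NULL && cap > 0);
    m->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (m->fd < 0)
        return -1;
    /* The file must be at least as large as the region we write to. */
    if (ftruncate(m->fd, (off_t)cap) != 0) {
        close(m->fd);
        return -1;
    }
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    m->base = p;
    m->cap = cap;
    m->used = 0;
    return 0;
}

/* Copy `len` bytes into the mapping; -1 if they do not fit. */
static int mlog_append(MappedLog *m, const char *data, size_t len)
{
    if (len > m->cap - m->used)
        return -1;
    memcpy(m->base + m->used, data, len);
    m->used += len;
    return 0;
}

/* Unmap, then truncate the file to the bytes actually logged. */
static int mlog_close(MappedLog *m)
{
    int rc = 0;
    if (munmap(m->base, m->cap) != 0)
        rc = -1;
    if (ftruncate(m->fd, (off_t)m->used) != 0)
        rc = -1;
    if (close(m->fd) != 0)
        rc = -1;
    return rc;
}
```

The fixed capacity is the cumbersome part: a log that outgrows it needs the file extended and remapped, which this sketch does not attempt.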

If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills rather than line by line (I don’t know whether the buffer size is defined or redefinable anywhere).

You could hand the writing off to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it.

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);
Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, stringToOutput);
BUT, you still must be aware of buffer overrun issues. That can be alleviated by using snprintf().
bytesWrittenToBuffer += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, stringToOutput);
Though this will not cause a buffer overrun, it will truncate the output, making you lose part of your log if you are not careful. Still preferable to your programme starting to do random things though ;). So you should still check whether the output would overrun the buffer.
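
A sketch of how that check could look, under some assumptions: the names LogBuffer, log_flush and log_printf are hypothetical, and the buffer is flushed with fwrite() whenever a record does not fit. It relies on the standard behaviour that vsnprintf() returns the length the complete output would need, so a return value that is not smaller than the remaining space means the record was truncated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical staging buffer in front of a FILE. */
typedef struct {
    FILE  *fp;    /* destination file */
    char  *buf;   /* staging buffer */
    size_t cap;   /* bytes allocated to buf */
    size_t used;  /* bytes currently staged */
} LogBuffer;

/* Write the staged bytes and mark the buffer empty.
   Returns 0 on success, -1 if fwrite() came up short. */
static int log_flush(LogBuffer *lb)
{
    size_t n = fwrite(lb->buf, 1, lb->used, lb->fp);
    int ok = (n == lb->used);
    lb->used = 0;
    return ok ? 0 : -1;
}

/* Append one formatted record, flushing first if it does not fit.
   Returns 0 on success, -1 on error or if the record can never fit. */
static int log_printf(LogBuffer *lb, const char *fmt, ...)
{
    assert(lb != NULL && lb->buf != NULL && lb->cap > 0);
    for (int attempt = 0; attempt < 2; attempt++) {
        size_t room = lb->cap - lb->used;
        va_list ap;
        va_start(ap, fmt);
        /* Returns the length the full output needs, excluding the '\0'. */
        int need = vsnprintf(lb->buf + lb->used, room, fmt, ap);
        va_end(ap);
        if (need < 0)
            return -1;                 /* formatting error */
        if ((size_t)need < room) {     /* fits, including the '\0' */
            lb->used += (size_t)need;
            return 0;
        }
        /* Truncated: the partial text lies beyond `used`, so flushing
           drops it. Give up if the buffer was already empty. */
        if (lb->used == 0 || log_flush(lb) != 0)
            return -1;
    }
    return -1;
}
```

Because only the first `used` bytes are ever written, the buffer never needs clearing with memset(), and a final log_flush() before fclose() writes whatever is still staged.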

If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.


Adrian
May 23 '07 #2
If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills rather than line by line (I don’t know whether the buffer size is defined or redefinable anywhere).
I'm on Linux, and I thought that opening a file as binary or as text was the same (i.e. the 'b' option is ignored). From the manpage:
The mode string can also include the letter ``b'' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the ``b'' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the ``b'' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.)
Though this will not cause a buffer overrun, it will truncate the output, making you lose part of your log if you are not careful. Still preferable to your programme starting to do random things though ;). So you should still check whether the output would overrun the buffer.
What do you mean by the last sentence? I thought snprintf would check for the buffer overrun, right?

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.

When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it OK to pass a value of 1 (byte) as the size argument to fwrite?

Thanks, Thomas
May 24 '07 #3
AdrianH
I'm on Linux, and I thought that opening a file as binary or as text was the same (i.e. the 'b' option is ignored). From the manpage:
If it is part of the C89 standard then it is true. In that case your buffering is going to be as big as the buffer allocated by the stdio library. I know that there are some exceptions: stdin and stdout are typically line buffered when attached to a terminal, but this does not apply to a regular file.
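
On the question of whether the stdio buffer size can be changed: standard C provides setvbuf() for this. The sketch below is one way it could be used; the helper name open_with_buffer is hypothetical. It passes a caller-supplied buffer because, when the buffer argument is NULL, the library is allowed to choose its own size. The buffer must stay valid until the stream is closed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open `path` for writing, fully buffered through
   the caller's `buf` of `bufsize` bytes. Returns NULL on failure. */
static FILE *open_with_buffer(const char *path, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;
    /* setvbuf() has to be called before any other operation on the
       stream. _IOFBF requests full buffering: data is written out
       when the buffer fills, on fflush(), or on fclose(). */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With a 1 MiB buffer installed this way, plain fprintf() or fwrite() calls are collected by the library itself, which may make a hand-written staging buffer unnecessary.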

What do you mean by the last sentence? I thought snprintf would check for the buffer overrun, right?
Yes. I meant that snprintf() prevents the overrun itself, but if you don’t check the remaining space as you would have to with sprintf(), your string will be truncated.



When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it OK to pass a value of 1 (byte) as the size argument to fwrite?
You have to do that whether you use memset() or not. As far as I know, you can. You can set it higher if you want to pass an array of structure objects and don’t want to calculate the total size yourself.
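
To illustrate the two ways of filling in fwrite()'s size and count arguments, here is a small sketch. The Sample type and the helper names are hypothetical. fwrite() returns the number of complete items written, so with a size of 1 the return value is a byte count, and with sizeof a structure it is a record count.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type, for illustration only. */
typedef struct {
    int    instr;
    double value;
} Sample;

/* Size 1, count len: the return value is the number of bytes written. */
static size_t write_bytes(FILE *fp, const char *buf, size_t len)
{
    assert(fp != NULL);
    return fwrite(buf, 1, len, fp);
}

/* Size of one record, count n: the return value counts whole records,
   and the library computes the total byte length itself. */
static size_t write_samples(FILE *fp, const Sample *s, size_t n)
{
    assert(fp != NULL);
    return fwrite(s, sizeof *s, n, fp);
}
```

In both cases a return value smaller than the requested count signals a write error.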

Thanks, Thomas
You're welcome, Adrian ;)
May 24 '07 #4
