[c] Efficiently writing large quantities of data to a file | Newbie | | Join Date: May 2007
Posts: 6
| | |
Hi,
I need to write large quantities of data to a file in C. The data comes from statistics that are continuously gathered from a simulator, and to avoid slowing the whole thing down I want the writes to be as fast and efficient as possible.
Since I/O operations are rather slow, I was thinking that using a large buffer would be better than writing each data point individually. My function is called for each data point, at which point I can do something.
As I understand it, fwrite() already uses a buffer, but since my file currently grows in pieces of 4 KiB, I suppose that is its buffer size. I was thinking of something more on the order of MiBs.
I have currently allocated a buffer of 1 MiB, into which I write something and then call fwrite:
sprintf(data->buf, "magic instr %d \n", (int)n);
fwrite(data->buf, 1, strlen(data->buf), data->fd);
memset(data->buf, 0, sizeof(data->buf));
This does not do what I want, which is to fill the buffer as much as possible and then flush it to disk. I understand that my code is wrong, but I don't really know how to fix it. Should I wrap the fwrite in an if-statement that checks whether the buffer is almost full, like
if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);
?
I also wonder whether this approach is the best way to handle this. I have heard about mmap; would it be more efficient?
And then there is the possibility of putting the writing in another thread, so that the main thread puts the data it receives from the simulator in the buffer, and the second one does the actual writing. Would this work better?
Thank you very much,
Thomas
|  | Expert | | Join Date: Feb 2007 Location: Halifax
Posts: 1,099
| | | re: [c] Efficiently writing large quantities of data to a file
mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file to the right length when you are done logging.
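To make the mmap remark concrete, here is a minimal POSIX sketch, not a definitive implementation. It assumes a fixed maximum log size chosen up front; the `MapLog` type and the `maplog_*` names are hypothetical and do not come from the thread. The file is grown to the full capacity before mapping, and cut back to the bytes actually used when logging ends, which is the truncation step mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for a memory-mapped log of fixed maximum size. */
typedef struct {
    int    fd;
    char  *map;
    size_t cap;   /* bytes mapped */
    size_t used;  /* bytes actually written */
} MapLog;

/* Create the file, grow it to cap bytes and map it. Returns 0 on success. */
static int maplog_open(MapLog *m, const char *path, size_t cap)
{
    assert(cap > 0);
    m->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (m->fd < 0)
        return -1;
    if (ftruncate(m->fd, (off_t)cap) != 0) {
        close(m->fd);
        return -1;
    }
    m->map = mmap(NULL, cap, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (m->map == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    m->cap = cap;
    m->used = 0;
    return 0;
}

/* Copy len bytes into the mapping; refuses to write past the end. */
static int maplog_append(MapLog *m, const char *data, size_t len)
{
    if (len > m->cap - m->used)
        return -1;
    memcpy(m->map + m->used, data, len);
    m->used += len;
    return 0;
}

/* Unmap, then truncate the file to the bytes actually used. */
static int maplog_close(MapLog *m)
{
    int rc = 0;
    if (munmap(m->map, m->cap) != 0)
        rc = -1;
    if (ftruncate(m->fd, (off_t)m->used) != 0)
        rc = -1;
    if (close(m->fd) != 0)
        rc = -1;
    return rc;
}
```

The cumbersome part is visible here: the capacity must be known or grown by remapping, and a crash before `maplog_close()` leaves the file at its full pre-allocated size.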
If you open your file as binary, you can still use fprintf() on it, and it will write only once the buffer is full instead of line by line (I don’t know if the buffer size is defined or redefinable anywhere).
You could hand the writing off to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it. Quote:
Originally Posted by patrickdepinguin if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);
Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, stringToOutput);
BUT you must still be aware of buffer overrun issues. These can be alleviated by using snprintf():
bytesWrittenToBuffer += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, stringToOutput);
Though this will not cause a buffer overrun, it will truncate the output, making you lose part of your log if you are not careful. That is still preferable to your programme starting to do random things, though ;). So you should still check for a buffer overrun.
If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.
I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.
Adrian
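The fill-then-flush approach discussed above can be sketched as follows. This is an illustration under stated assumptions, not the posters' actual code: the `LogBuf` type and the `logbuf_*` functions are hypothetical names. It relies on the fact that snprintf() and vsnprintf() return the length the complete string would have had, so a return value greater than or equal to the remaining space means the record was truncated. In that case the sketch flushes the buffer and formats the record again, so nothing is lost and no memset() is needed, because only the filled part of the buffer is ever passed to fwrite().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical logger state; names are illustrative only. */
typedef struct {
    FILE  *fp;
    char  *buf;
    size_t cap;   /* bytes allocated for buf */
    size_t used;  /* bytes currently filled */
} LogBuf;

static int logbuf_init(LogBuf *lb, FILE *fp, size_t cap)
{
    assert(cap > 0);
    lb->buf = malloc(cap);
    if (lb->buf == NULL)
        return -1;
    lb->fp = fp;
    lb->cap = cap;
    lb->used = 0;
    return 0;
}

/* Write only the filled part of the buffer. Returns 0 on success. */
static int logbuf_flush(LogBuf *lb)
{
    /* size 1, count = used: the return value is then a byte count. */
    size_t written = fwrite(lb->buf, 1, lb->used, lb->fp);
    int ok = (written == lb->used);
    lb->used = 0;
    return ok ? 0 : -1;
}

/* Append one formatted record; flush first if it does not fit. */
static int logbuf_printf(LogBuf *lb, const char *fmt, ...)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        size_t room = lb->cap - lb->used;
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(lb->buf + lb->used, room, fmt, ap);
        va_end(ap);
        if (n < 0)
            return -1;                 /* encoding error */
        if ((size_t)n < room) {        /* fits, including the '\0' */
            lb->used += (size_t)n;
            return 0;
        }
        if (lb->used == 0)
            return -1;                 /* record larger than the buffer */
        if (logbuf_flush(lb) != 0)     /* make room, then retry once */
            return -1;
    }
    return -1;
}
```

A caller would run `logbuf_init(&lb, fp, 1 << 20)` once, call `logbuf_printf(&lb, "magic instr %d \n", (int)n)` per data point, and finish with `logbuf_flush(&lb)` before closing the file.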
| | Newbie | | Join Date: May 2007
Posts: 6
| | | re: [c] Efficiently writing large quantities of data to a file Quote:
Originally Posted by AdrianH mmap is making a memory mapped file. It is very efficient, but can be cumbersome. You also will need to truncate the file as appropriate when you are done logging.
If you open your file as binary, you can still use fprintf() on it and it will write till it fills the buffer size instead of a line per line basis (I don’t know if that is defined or redefinable anywhere).
I'm on Linux, and I thought that writing as binary or text was the same (i.e. the 'b' option is ignored). From the manpage: Quote:
The mode string can also include the letter ``b'' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the ``b'' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the ``b'' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.)
Quote:
You could pass on to a separate thread, but be warned that if you pass a string, it had better be copied or it may no longer be there when the thread tried to read the string.
Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, stringToOutput);
BUT you must still be aware of buffer overrun issues. These can be alleviated by using snprintf():
bytesWrittenToBuffer += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, stringToOutput);
Though this will not cause a buffer overrun, it will truncate the output, making you lose part of your log if you are not careful. That is still preferable to your programme starting to do random things, though ;). So you should still check for a buffer overrun.
What do you mean by the last sentence? I thought snprintf would check for the buffer overrun, right? Quote:
If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.
I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.
Adrian
When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it OK to give the size argument of fwrite a value of 1 (byte)?
Thanks, Thomas
|  | Expert | | Join Date: Feb 2007 Location: Halifax
Posts: 1,099
| | | re: [c] Efficiently writing large quantities of data to a file Quote:
Originally Posted by patrickdepinguin I'm on Linux, I thought that writing as binary or text was the same (i.e. the 'b' option is ignored). From the manpage:
If it is part of the C89 standard, then it is true. In that case your buffering is going to be as big as the buffer allocated by the stdio library. I know that there are some exceptions: stdin and stdout are typically line buffered when connected to a terminal, but this is not the case for a regular file. Quote:
Originally Posted by patrickdepinguin What do you mean with the last sentence? I thought snprintf will check for the buffer overrun, right?
Yes. I meant that if you don’t check for a buffer overrun, as you would have to with sprintf(), then snprintf() will truncate your string. Quote:
Originally Posted by patrickdepinguin When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it ok to give the size argument to fwrite a value of 1 (byte) ?
You have to do that whether you use memset() or not. As far as I know, you can. You can set it higher if you want to pass an array of structure objects and don’t want to calculate the total size yourself. Quote:
Originally Posted by patrickdepinguin Thanks, Thomas
You’re welcome, Adrian ;)
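On the earlier question of whether the stdio buffer size can be redefined: standard C provides setvbuf() for this. The sketch below is a hedged illustration, with `open_log_buffered` being a hypothetical helper name. It hands stdio a caller-supplied buffer in fully buffered mode, so fprintf() or fwrite() calls reach the disk in chunks of roughly that size rather than the library default. A caller-supplied buffer is used because some C libraries ignore the size argument when the buffer pointer is NULL. The buffer must stay valid until the stream is closed, and setvbuf() must be called before any other operation on the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Open path for writing with a caller-supplied, fully buffered stdio
   buffer. buf must outlive the returned stream. Returns NULL on error. */
static FILE *open_log_buffered(const char *path, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;
    /* Must come before the first read or write on fp. */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With a `static char big[1 << 20];` passed as the buffer, the per-data-point code can stay a plain `fprintf(fp, "magic instr %d \n", (int)n);` and the manual buffer bookkeeping becomes unnecessary.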