[c] Efficiently writing large quantities of data to a file

Newbie
 
Join Date: May 2007
Posts: 6
#1: May 23 '07
Hi,

I need to write large quantities of data to a file in C. The data comes from statistics that are continuously gathered from a simulator, and in order not to slow the whole thing down I want the writes to be as fast and efficient as possible.

Since I/O operations are rather slow, I was thinking that using a large buffer would be better than writing each data point individually. My function is called for each data point, at which point I can do something.

If I understand correctly, fwrite() already uses a buffer, but since my file is currently growing in pieces of 4 KiB, I suppose that is its buffer size. I was thinking of something on the order of MiBs.

I have currently allocated a buffer of 1 MiB, into which I write something and then call fwrite:
sprintf(data->buf, "magic instr %d \n", (int)n);
fwrite(data->buf, 1, strlen(data->buf), data->fd);
memset(data->buf, 0, sizeof(data->buf));

This does not do what I want, which is filling the buffer as far as possible and then flushing it to disk. I understand that my code is wrong, but I don't really know how to fix it. Should I wrap the fwrite in an if-statement which checks whether the buffer is almost full, like

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);

?

I also wonder whether this approach is the best way to handle this. I have heard about mmap; would it be more efficient?

And then there is the possibility of putting the writing in another thread, so that the main thread puts the data it receives from the simulator in the buffer, and the second one does the actual writing. Would this work better?

Thank you very much,
Thomas

AdrianH
Expert
 
Join Date: Feb 2007
Location: Halifax
Posts: 1,099
#2: May 23 '07

mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file to the size you actually used when you are done logging.
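To make the mmap route a bit more concrete, here is a minimal sketch for a POSIX system. The helper name `mmap_log_write` and its parameters are made up for illustration, and it writes a single block rather than a running log. It reserves `capacity` bytes in the file, copies the data through the mapping, and then trims the file to the bytes actually used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `text` to `path` through a
 * memory mapping of `capacity` bytes, then trim the file to `len` bytes.
 * Returns 0 on success, -1 on failure. */
int mmap_log_write(const char *path, const char *text, size_t len, size_t capacity)
{
    assert(capacity > 0);
    if (len > capacity)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must be at least as large as the mapping before it is touched. */
    if (ftruncate(fd, (off_t)capacity) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, capacity, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, text, len);   /* "writing" is just a memory copy */

    int rc = 0;
    if (munmap(map, capacity) != 0)
        rc = -1;
    /* Cut off the unused tail so the file holds only real data. */
    if (ftruncate(fd, (off_t)len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A real logger would keep the mapping open, track a write offset, and remap when the reserved size runs out, which is where the cumbersome part comes in.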

If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills up instead of line by line (I don’t know whether the buffer size is defined or can be redefined anywhere).

You could pass the data on to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it.

Quote:

Originally Posted by patrickdepinguin

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
fwrite(...);
memset(..);
}
sprintf(data->buf, data_to_write_now);

Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, "%s", stringToOutput);
BUT, you still must be aware of buffer overrun issues. That can be alleviated by using snprintf().
bytesWrittenToBuffer
  += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, "%s", stringToOutput);
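Though this will not cause a buffer overrun, it will truncate, making you lose part of your log if you are not careful. snprintf() returns the length the full string would have had, so a return value greater than or equal to the space you passed in means the output was cut off. Still preferable to your programme starting to do random things, though ;). So you should still check for a buffer overrun.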

If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.
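Putting those pieces together, a rough sketch of the append-then-flush idea could look like the following. The `struct log_buf` type and the `log_printf`/`log_flush` names are invented for this example, and it assumes a single record always fits in an empty buffer. It uses vsnprintf() so truncation is detected, flushes with fwrite() when a record does not fit, and never needs memset() because `used` tracks how many bytes are valid.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log buffer; the names are illustrative only. */
struct log_buf {
    FILE  *fp;     /* destination file */
    char  *data;   /* caller-provided storage */
    size_t cap;    /* size of data in bytes */
    size_t used;   /* number of valid bytes in data */
};

/* Write the buffered bytes to the file and mark the buffer empty. */
int log_flush(struct log_buf *lb)
{
    size_t n = fwrite(lb->data, 1, lb->used, lb->fp);
    int ok = (n == lb->used);
    lb->used = 0;
    return ok ? 0 : -1;
}

/* Append a formatted record, flushing first if it does not fit.
 * Returns 0 on success, -1 on error or if the record can never fit. */
int log_printf(struct log_buf *lb, const char *fmt, ...)
{
    assert(lb->used <= lb->cap);
    for (int attempt = 0; attempt < 2; attempt++) {
        size_t room = lb->cap - lb->used;
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(lb->data + lb->used, room, fmt, ap);
        va_end(ap);
        if (n < 0)
            return -1;
        if ((size_t)n < room) {      /* fitted, including the '\0' */
            lb->used += (size_t)n;
            return 0;
        }
        /* Truncated: ignore the partial text, flush, and retry once. */
        if (lb->used == 0 || log_flush(lb) != 0)
            return -1;
    }
    return -1;
}
```

The simulator callback would then call something like `log_printf(&lb, "magic instr %d \n", (int)n)` for each data point, with one final `log_flush(&lb)` before closing the file.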


Adrian
Newbie
 
Join Date: May 2007
Posts: 6
#3: May 24 '07

Quote:

Originally Posted by AdrianH

mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file to the size you actually used when you are done logging.

If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills up instead of line by line (I don’t know whether the buffer size is defined or can be redefined anywhere).

I'm on Linux, and I thought that writing as binary or text was the same there (i.e. the 'b' option is ignored). From the manpage:
Quote:
The mode string can also include the letter ``b'' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the ``b'' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the ``b'' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.)
Quote:
You could pass the data on to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it.


Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, "%s", stringToOutput);
BUT, you still must be aware of buffer overrun issues. That can be alleviated by using snprintf().
bytesWrittenToBuffer
  += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, "%s", stringToOutput);
Though this will not cause a buffer overrun, it will truncate, making you lose part of your log if you are not careful. Still preferable to your programme starting to do random things, though ;). So you should still check for a buffer overrun.

What do you mean by the last sentence? I thought snprintf checks for the buffer overrun, right?

Quote:
If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.

Adrian
When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it OK to give the size argument of fwrite a value of 1 (byte)?

Thanks, Thomas
AdrianH
Expert
 
Join Date: Feb 2007
Location: Halifax
Posts: 1,099
#4: May 24 '07

Quote:

Originally Posted by patrickdepinguin

I'm on Linux, I thought that writing as binary or text was the same (i.e. the 'b' option is ignored). From the manpage:

If that is what the C89 standard says, then it is true. In that case your buffering will be as large as the buffer allocated by the stdio library. I know that there are some exceptions: stdin and stdout are line buffered when attached to a terminal, but that does not apply to a regular file.
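For what it's worth, standard C does let you choose the stdio buffer yourself with setvbuf(), as long as you call it before any other operation on the stream. Below is a small sketch. The helper name `open_with_big_buffer` and the `buf_out` parameter are made up for this example, and whether a 1 MiB buffer actually helps on your system is something you would have to measure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: open `path` for writing with a fully buffered
 * stdio buffer of `bufsize` bytes. On success *buf_out receives the
 * buffer, which the caller must free() only after fclose(). */
FILE *open_with_big_buffer(const char *path, size_t bufsize, char **buf_out)
{
    assert(bufsize > 0);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;

    char *buf = malloc(bufsize);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* _IOFBF = full buffering: data goes out when the buffer fills,
     * on fflush(), or on fclose(). */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    *buf_out = buf;
    return fp;
}
```

With this in place, plain fprintf() calls on the returned stream accumulate in the large buffer, so no hand-written buffering layer is needed.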

Quote:

Originally Posted by patrickdepinguin

What do you mean with the last sentence? I thought snprintf will check for the buffer overrun, right?

Yes. I meant that if you don’t check for an overrun yourself, just as you would have to with sprintf(), snprintf() will silently truncate your string.



Quote:

Originally Posted by patrickdepinguin

When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it ok to give the size argument to fwrite a value of 1 (byte) ?

You have to do that whether you use memset() or not. As far as I know, you can. You can set it higher if you want to pass an array of structure objects and don’t want to calculate the total size yourself.
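To illustrate the two ways of filling in the size and count arguments, here is a short sketch. The `struct sample` type and both function names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sample { int id; double value; };   /* hypothetical record type */

/* Text: size = 1, count = number of bytes.
 * The return value is the number of bytes written. */
size_t write_text(FILE *fp, const char *text)
{
    assert(text != NULL);
    return fwrite(text, 1, strlen(text), fp);
}

/* Array of structs: size = sizeof one element, count = element count.
 * The return value is the number of whole elements written. */
size_t write_samples(FILE *fp, const struct sample *s, size_t count)
{
    return fwrite(s, sizeof s[0], count, fp);
}
```

Either way fwrite() writes size * count bytes; the only difference is the unit in which the return value is reported.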

Quote:

Originally Posted by patrickdepinguin

Thanks, Thomas

You're welcome, Adrian ;)