[c] Efficiently writing large quantities of data to a file

6 New Member
Hi,

I need to write large quantities of data to a file in C. The data comes from statistics that are continuously gathered from a simulator, and in order not to slow the whole thing down I obviously want the writes to be as fast and efficient as possible.

Since I/O operations are rather slow, I was thinking that using a large buffer would be better than writing each data point every time. Each data point calls my function, at which point I can do something.

If I understand correctly, fwrite() already uses a buffer, but since my file currently grows in pieces of 4 KiB, I suppose that is its buffer size. I was thinking of something more on the order of MiBs.

I have currently allocated a buffer of 1 MiB, in which I write something and then do fwrite.
sprintf(data->buf, "magic instr %d \n", (int)n);
fwrite(data->buf, 1, strlen(data->buf), data->fd);
memset(data->buf, 0, sizeof(data->buf));

This does not do what I want, which is to fill the buffer as far as possible and then flush it to disk. I understand that my code is wrong, but I don't really know how to fix it. Should I wrap the fwrite in an if-statement which checks whether the buffer is almost full, like

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
    fwrite(...);
    memset(..);
}
sprintf(data->buf, data_to_write_now);

?

I also wonder whether this approach would be the best way to handle this. I have heard about mmap, would it be more efficient?

And then there is the possibility of putting the writing in another thread, so that the main thread puts the data it receives from the simulator in the buffer, and the second one does the actual writing. Would this work better?

Thank you very much,
Thomas
May 23 '07 #1
AdrianH
1,251 Recognized Expert Top Contributor
mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file as appropriate when you are done logging.
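To make the mmap route concrete, here is a minimal sketch for POSIX systems such as Linux. The helper name mmap_write_log, the path and the capacity are made up for illustration. It grows the file to a fixed capacity, maps it, copies the bytes in, and then truncates the file back to what was actually used, which is the truncation step mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `text` to `path` through a
 * mapping of `capacity` bytes, then trim the file to `len` bytes.
 * Returns 0 on success, -1 on error. */
int mmap_write_log(const char *path, const char *text,
                   size_t len, size_t capacity)
{
    assert(capacity > 0 && len <= capacity);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must be at least as large as the mapping before we
     * store anything through it. */
    if (ftruncate(fd, (off_t)capacity) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, text, len);          /* "writing" is a plain memory copy */

    int rc = 0;
    if (munmap(map, capacity) != 0)
        rc = -1;
    /* Cut the file back to the bytes that were actually used. */
    if (ftruncate(fd, (off_t)len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A real logger would keep the mapping open between records and remap when the capacity runs out; that bookkeeping is the cumbersome part.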

If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills up rather than line by line (I don’t know whether the buffer size is defined or redefinable anywhere).
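On the question of whether the buffer size can be redefined: standard C has setvbuf() for this. The sketch below is only an illustration; the helper name open_log_fully_buffered and the 1 MiB size are assumptions taken from the original question. It supplies its own static buffer, because some libraries ignore the size argument when the buffer pointer is NULL.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed size: 1 MiB, as in the question. The buffer is static so that
 * it outlives the stream that uses it. */
static char log_buffer[1 << 20];

/* Hypothetical helper: open `path` for writing as a fully buffered
 * stream backed by log_buffer. Only one such stream may be open at a
 * time. Returns NULL on failure. */
FILE *open_log_fully_buffered(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return NULL;

    /* setvbuf() has to be called before any other operation on the
     * stream. _IOFBF requests full buffering. */
    if (setvbuf(fp, log_buffer, _IOFBF, sizeof log_buffer) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With a stream opened this way, plain fprintf(fp, "magic instr %d \n", n) calls should reach the disk in chunks of roughly the buffer size, and fclose() flushes whatever is left.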

You could pass the data on to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it.

if (strlen(data->buf) + strlen(data_to_write_now) > bufsize) {
    fwrite(...);
    memset(..);
}
sprintf(data->buf, data_to_write_now);
Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
    bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, stringToOutput);
BUT, you still must be aware of buffer overrun issues. That can be alleviated by using snprintf().
    bytesWrittenToBuffer
        += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, stringToOutput);
Though this will not cause a buffer overrun, it will truncate, making you lose part of your log if you are not careful. Still preferable to your programme starting to do random things though ;). So you still should look for a buffer overrun.
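One detail worth spelling out: snprintf() returns the length the complete string would have had, not the number of bytes it actually stored, so the return value has to be compared against the remaining space before it is added to the running total. The sketch below shows one way to do that check and flush when a record does not fit. The names struct log_buf, log_flush and log_printf and the 1 MiB size are hypothetical, not from the thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define LOG_BUF_SIZE (1u << 20)   /* assumed 1 MiB, as in the question */

struct log_buf {
    FILE  *fp;                    /* destination stream */
    size_t used;                  /* bytes currently held in data */
    char   data[LOG_BUF_SIZE];
};

/* Write the buffered bytes to the file and mark the buffer empty.
 * Returns 0 on success, -1 on a short write. */
int log_flush(struct log_buf *lb)
{
    size_t n = fwrite(lb->data, 1, lb->used, lb->fp);
    int ok = (n == lb->used);
    lb->used = 0;
    return ok ? 0 : -1;
}

/* Append one formatted record. If it does not fit in the remaining
 * space, flush the buffer and try once more. Returns 0 on success,
 * -1 on error or if the record is larger than the whole buffer. */
int log_printf(struct log_buf *lb, const char *fmt, ...)
{
    assert(lb != NULL && lb->used < LOG_BUF_SIZE);

    for (int attempt = 0; attempt < 2; attempt++) {
        size_t room = LOG_BUF_SIZE - lb->used;
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(lb->data + lb->used, room, fmt, ap);
        va_end(ap);

        if (n < 0)
            return -1;                 /* formatting error */
        if ((size_t)n < room) {        /* fits, terminator included */
            lb->used += (size_t)n;
            return 0;
        }
        /* Truncated: the partial text lies beyond `used`, so it is ignored. */
        if (lb->used == 0)
            return -1;                 /* record exceeds the buffer */
        if (log_flush(lb) != 0)
            return -1;
    }
    return -1;
}
```

The struct is about 1 MiB, so it should be static or heap allocated rather than placed on the stack. Call log_flush() once more before closing the file. No memset() is needed, because only the first `used` bytes are ever written out.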

If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.


Adrian
May 23 '07 #2
patrickdepinguin
6 New Member
mmap creates a memory-mapped file. It is very efficient, but can be cumbersome. You will also need to truncate the file as appropriate when you are done logging.

If you open your file as binary, you can still use fprintf() on it, and it will write only when the buffer fills up rather than line by line (I don’t know whether the buffer size is defined or redefinable anywhere).
I'm on Linux, I thought that writing as binary or text was the same (i.e. the 'b' option is ignored). From the manpage:
The mode string can also include the letter ``b'' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the ``b'' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the ``b'' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-Unix environments.)
You could pass the data on to a separate thread, but be warned that if you pass a string, it had better be copied, or it may no longer be there when the thread tries to read it.


Yeah, sort of. You will definitely have to check whether you are going to overrun the buffer. One way of using sprintf() is as follows:
    bytesWrittenToBuffer += sprintf(buffer + bytesWrittenToBuffer, stringToOutput);
BUT, you still must be aware of buffer overrun issues. That can be alleviated by using snprintf().
    bytesWrittenToBuffer
        += snprintf(buffer + bytesWrittenToBuffer, bytesAllocatedToBuffer - bytesWrittenToBuffer, stringToOutput);
Though this will not cause a buffer overrun, it will truncate, making you lose part of your log if you are not careful. Still preferable to your programme starting to do random things though ;). So you still should look for a buffer overrun.
What do you mean with the last sentence? I thought snprintf will check for the buffer overrun, right?

If you are not going to be using parameters in your format string, a simple strcpy() or strncpy() would suffice, used in a similar way to what I described.

I would only use memset() for debugging. It is not necessary once you’ve gotten the bugs worked out.

Adrian
When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it ok to give the size argument to fwrite a value of 1 (byte) ?

Thanks, Thomas
May 24 '07 #3
AdrianH
1,251 Recognized Expert Top Contributor
I'm on Linux, I thought that writing as binary or text was the same (i.e. the 'b' option is ignored). From the manpage:
If it is part of the C89 standard then it is true. In that case your buffering is going to be as big as the buffer allocated by the stdio library. I know that there are some exceptions: stdin and stdout are line buffered, but this is not the case for a regular file.

What do you mean with the last sentence? I thought snprintf will check for the buffer overrun, right?
Yes. I meant that you still need to check the available space just as you would with sprintf(); if you don’t, snprintf() will silently truncate your string.



When I don't use memset, I have to make sure that I pass the correct length arguments to fwrite, right? Is it ok to give the size argument to fwrite a value of 1 (byte) ?
You have to do that whether or not you use memset(). As far as I know, you can. You can set it higher if you want to pass an array of structure objects and don’t want to calculate the total size yourself.
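As a small illustration of the two ways of filling in fwrite()'s size and count arguments, here is a sketch. struct sample and both helper names are hypothetical. With size set to 1 the return value is a byte count; with size set to the element size it is an element count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sample {                  /* hypothetical record type */
    int    instr;
    double value;
};

/* Text record: size = 1, count = length in bytes, so fwrite() returns
 * the number of bytes written. Returns 0 on success, -1 otherwise. */
int write_text(FILE *fp, const char *s)
{
    size_t len = strlen(s);
    return fwrite(s, 1, len, fp) == len ? 0 : -1;
}

/* Array of structs: size = one element, count = number of elements, so
 * fwrite() returns the number of complete elements written. */
int write_samples(FILE *fp, const struct sample *arr, size_t n)
{
    assert(arr != NULL || n == 0);
    return fwrite(arr, sizeof arr[0], n, fp) == n ? 0 : -1;
}
```

Either way the total is size times count bytes. Only the unit of the return value changes, which matters when checking for a short write.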

Thanks, Thomas
You're welcome, Adrian ;)
May 24 '07 #4
