Fastest way to fill a structure.

Simon wrote:

May not be. You may want to change the structure and make it contain
arrays of char instead of pointers to dynamically allocated arrays.

Then the construction will be a bit faster, you could simply drop the
'string' thing there. Also, if you're sure about the source of the
data, and their format, you could avoid constructing temporaries. Play
with making 'sFileData' look like

char s1[41]; // if it's a C string, reserve the room for the null char
char s2[41];
int one, two;
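
The fixed-size layout suggested above could be filled along these lines. This is only a sketch: `FixedFileData` and `copy_field` are made-up names, and it assumes each source field is exactly 40 bytes wide and padded on the right with blanks.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical struct using in-place arrays instead of heap pointers.
struct FixedFileData {
    char s1[41];  // 40 chars plus room for the null char
    char s2[41];
    int one, two;
};

// Copy one 40-byte, blank-padded field into dst, dropping trailing blanks.
// No heap allocation takes place.
void copy_field(char (&dst)[41], const char* src) {
    std::size_t len = 40;
    while (len > 0 && src[len - 1] == ' ') --len;
    std::memcpy(dst, src, len);
    dst[len] = '\0';
}
```

Because the arrays live inside the struct, a `std::vector<FixedFileData>` needs only one block of memory for all records, at the price of reserving 41 bytes per string regardless of its real length.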

I know I am going to be told I am too difficult, but the reason why I
dynamically create the string is because they are almost never longer than 5
chars.
So by declaring s1[41] I know that I am wasting around 36 chars, (The sizes
are different, there could be a string of 40 chars).

I know that we are only talking about 36 chars here, but I load 100000's of
lines and the waste really seems unnecessary to me, (and I don't like
wasting memory).
It seems to defeat the object of dynamic memory allocation.

Simon

Dynamic memory allocation of many small segments causes
extremely poor memory utilization.

malloc and new (new often uses malloc) get memory from
the operating system in pages (4k, 8k, etc). They use
part of the obtained memory to implement control structures
(for keeping track of allocated and freed/reusable chunks).
Each allocation also includes bookkeeping overhead (typically
8 bytes on a 32 bit OS), and normally no less than 16 bytes
is used per allocation - even if the user only asked for
one byte, malloc(1). So, in general, at least 16 bytes plus
a pointer in the malloc control structure (a linked list)
is allocated for each request.

Your program with the 100000 calls to allocate 5 bytes and
another 100000 calls to allocate 8 bytes will use AT LEAST
twice as much memory as you think - because of the hidden
extra memory used to keep track of everything.
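
To put rough numbers on this, here is a small sketch of the arithmetic. The allocator model is an assumption built from the figures quoted above (8-byte header, 16-byte minimum chunk, 8-byte alignment on a 32-bit OS); real allocators differ, and `chunk_bytes` and `heap_bytes` are illustrative names only.

```cpp
#include <cstddef>

// Assumed allocator parameters; not measured on any real system.
const std::size_t kHeader = 8;     // bookkeeping bytes per allocation
const std::size_t kMinChunk = 16;  // smallest chunk ever handed out
const std::size_t kAlign = 8;      // chunk sizes are rounded up to this

// Heap bytes consumed by one request under the assumed model.
std::size_t chunk_bytes(std::size_t request) {
    std::size_t total = request + kHeader;
    total = (total + kAlign - 1) / kAlign * kAlign;  // round up
    return total < kMinChunk ? kMinChunk : total;
}

// Heap bytes consumed by 'count' requests of the same size.
std::size_t heap_bytes(std::size_t count, std::size_t request) {
    return count * chunk_bytes(request);
}
```

Under this model, 100000 requests of 5 bytes plus 100000 requests of 8 bytes ask for 1,300,000 bytes but occupy 3,200,000 bytes of heap, roughly 2.5 times as much, before counting the allocator's own control structures.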

These two articles explain it in detail (your OS may vary,
but the generalities apply):

http://www.cs.utk.edu/~plank/plank/c...2/lecture.html
http://www.cs.utk.edu/~plank/plank/c...n/lecture.html

Regards,
Larry

Jul 23 '05 #13

simon wrote:

From my previous post...

If I have a structure,

struct sFileData
{
    char *sSomeString1;
    char *sSomeString2;
    int iSomeNum1;
    int iSomeNum2;
    sFileData(){...};
    ~sFileData(){...};
    sFileData(const sFileData&){...};
    const sFileData operator=( const sFileData &s ){...}
};

I read the file as follows

FILE *f = fopen( szPath, "rb" );

int nLineSize = 190;
BYTE b[nLineSize+1];

fread( b, sizeof(BYTE), nLineSize, f );
int numofrecords = atoi( b ); // first line is num of records only,

// read the data itself.
while( fread( b, sizeof(BYTE), nLineSize, f ) == nLineSize )
{
// fill data
// The locations of each items is known
// sString1 = 0->39, with blank spaces filler after data
// sString2 = 40->79, with blank spaces filler after data
// iNum1 = 80->99, with blank spaces filler after data
// iNum2 = 100->end, with blank spaces filler after data
}

what would be the best way to fill the data into an array, (vector)?

Many thanks.

Simon.

You state that each line in the file (including the first one) is
190 bytes long: 'int nLineSize = 190;' Yet your data items are all
ascii, and they occupy the first 100+ bytes. Is the ascii data
followed by additional (possibly binary) data that fills out the
record to a length of 190 bytes? Does each 190 byte record include
a trailing newline (Windows style "\r\n" or non-Windows "\n")?

Since you open the file in binary mode ("rb"), we might infer
at least 4 things:

1) the file contains mixed ascii/binary data records;
each of which is 190 bytes long with NO delimiting
newlines (aka Fixed Block in IBM parlance).

2) the file contains mixed ascii/binary data records;
each of which is 190 bytes long INCLUDING a delimiting
newline (Windows style "\r\n" or non-Windows "\n").

3) the file contains ascii-only data records with
a fixed length of 190 bytes with NO delimiting
newlines.

4) the file contains ascii-only data records with
a fixed length of 190 bytes INCLUDING a delimiting
newline (Windows style "\r\n" or non-Windows "\n").

It will be much easier for us to suggest efficient coding
approaches if you would please describe the EXACT layout
of the 190 byte records - including what follows 'iNum2',
and whether or not each of the 190 byte records includes
a trailing newline.

I have some ideas, but knowing the complete layout of the
190 byte records is key to picking the best approach.

Regards,
Larry

Jul 23 '05 #14

Larry I Smith wrote:

Two more questions:

5) do you wish to have leading/trailing whitespace
stripped from the first 2 string fields before
they are put into the structure?

6) might the first 2 string fields contain embedded
whitespace (e.g. sSomeString1 could be "hello there")?

Regards,
Larry

Jul 23 '05 #15

simon

> You state that each line in the file (including the first one) is
> 190 bytes long: 'int nLineSize = 190;' Yet your data items are all
> ascii, and they occupy the first 100+ bytes. Is the ascii data
> followed by additional (possibly binary) data that fills out the
> record to a length of 190 bytes? Does each 190 byte record include
> a trailing newline (Windows style "\r\n" or non-Windows "\n")?

That's part of the problem: the line is 190 chars long + '\n',
but the only data meaningful to me is 0->110.

> Since you open the file in binary mode ("rb"), we might infer
> at least 4 things:
>
> 1) the file contains mixed ascii/binary data records;
> each of which is 190 bytes long with NO delimiting
> newlines (aka Fixed Block in IBM parlance).

It is all ASCII + '\n'. I open it "rb" because that's what I usually do,
but it is a flat text file.

> It will be much easier for us to suggest efficient coding
> approaches if you would please describe the EXACT layout
> of the 190 byte records - including what follows 'iNum2',
> and whether or not each of the 190 byte records includes
> a trailing newline.

What follows is more text and number data (but all in ASCII).

> 5) do you wish to have leading/trailing whitespace
> stripped from the first 2 string fields before
> they are put into the structure?

Yes, but only the trailing spaces. The data is leftmost in its section.

> 6) might the first 2 string fields contain embedded
> whitespace (e.g. sSomeString1 could be "hello there")?

Yes, if that makes it faster to load; the data is 'protected' and I use
functions to return the values.
The problem is the numbers: it would not be very efficient to return
something like atoi("1234") all the time.

> Regards,
> Larry

many thanks for your help.
Simon

Jul 23 '05 #16

simon

>> You state that each line in the file (including the first one) is
>> 190 bytes long: 'int nLineSize = 190;' Yet your data items are all
>> ascii, and they occupy the first 100+ bytes. Is the ascii data
>> followed by additional (possibly binary) data that fills out the
>> record to a length of 190 bytes? Does each 190 byte record include
>> a trailing newline (Windows style "\r\n" or non-Windows "\n")?
>
> That's part of the problem, the line is 190 char long+'\n'
> but the only meaningful data to me is 0->110

Sorry, I made a mistake, the file is 'windows style' 190+ '\r\n'

Simon

Jul 23 '05 #17

simon wrote:

Sorry, I made a mistake, the file is 'windows style' 190+ '\r\n'

Simon

Ok, so the lines are each 192 bytes long (including the \r\n).

If you use fread() to read the data, then fread needs to read
192 bytes - NOT 190. "\r\n" is not special to fread() - it reads
raw bytes. So, if you read only 190 bytes when each 'line'
is actually 192 bytes long, then the fields for all records
except the first one will each be off by 2 bytes from the previous
record, e.g. by the time you get to record 40, your data fields
will be off by 80 bytes from where you think they are. This will
cause your sFileData structs to NOT have the contents you expect,
and may be contributing to the terrible performance that you
are seeing.
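
A minimal sketch of a loader that reads whole 192-byte records is shown below. It is not one of the programs mentioned in this thread. It assumes the field positions given earlier (strings at columns 0-39 and 40-79, first number at 80-99); the 10-column width for the second number and the 20-column width for the header count are guesses, and `FileData`, `trimmed_field`, `int_field` and `load_records` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed layout: 190 data bytes followed by "\r\n", so 192 bytes per record.
const std::size_t kRecordLen = 192;

struct FileData {
    std::string s1;  // columns 0-39
    std::string s2;  // columns 40-79
    int n1;          // columns 80-99
    int n2;          // columns 100-109 (width assumed)
};

// Copy a fixed-width field, dropping the trailing blank filler.
std::string trimmed_field(const char* rec, std::size_t pos, std::size_t len) {
    while (len > 0 && rec[pos + len - 1] == ' ') --len;
    return std::string(rec + pos, len);
}

// Convert a blank-padded numeric field once, at load time.
int int_field(const char* rec, std::size_t pos, std::size_t len) {
    return std::atoi(std::string(rec + pos, len).c_str());
}

// Read the header record, then every complete 192-byte data record.
std::vector<FileData> load_records(std::FILE* f) {
    std::vector<FileData> out;
    char rec[kRecordLen];

    if (std::fread(rec, 1, kRecordLen, f) != kRecordLen) return out;
    int expected = int_field(rec, 0, 20);  // header holds the record count
    if (expected > 0) out.reserve(static_cast<std::size_t>(expected));

    while (std::fread(rec, 1, kRecordLen, f) == kRecordLen) {
        FileData d;
        d.s1 = trimmed_field(rec, 0, 40);
        d.s2 = trimmed_field(rec, 40, 40);
        d.n1 = int_field(rec, 80, 20);
        d.n2 = int_field(rec, 100, 10);
        out.push_back(d);
    }
    return out;
}
```

Reading the full 192 bytes keeps every field at the same offset in every record, and converting the numbers while loading means later accessors can return plain ints instead of calling atoi() each time. The file should be opened in binary mode ("rb") so that "\r\n" is not translated.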

I have written 3 small programs that I will post in a few
minutes. I wrote them using a 190 byte line length (including
the trailing "\r\n"). As soon as I change them to use 192
byte lines, I'll post them. They are:

simondat.c: to create a test input data file named "simon.dat"
with 100000 records for use by the other 2 programs.

simon.cpp: uses 'char *' with new/delete for the string
fields in sFileData.

simon2.cpp: uses std::string for the string fields in
sFileData.

On my pc (an old Gateway PII 450MHZ with 384MB of RAM):

simon.cpp runs in 2.20 seconds and uses 5624KB of memory.

simon2.cpp runs in 2.22 seconds and uses 6272KB of memory.

Your mileage may vary. I'm running SuSE Linux v9.3 and
using the GCC "g++" compiler v3.3.5.

Regards,
Larry

Jul 23 '05 #18

simon

> On my pc (an old Gateway PII 450MHZ with 384MB of RAM):
>
> simon.cpp runs in 2.20 seconds and uses 5624KB of memory.

Thanks for that, I get 1.24 sec and 6mb.
I just need to check what the difference is with my code.

> simon2.cpp runs in 2.22 seconds and uses 6272KB of memory.
>
> Your mileage may vary. I'm running SuSE Linux v9.3 and
> using the GCC "g++" compiler v3.3.5.
>
> Regards,
> Larry

> Here are the 3 programs:
>
> <snip code>
>
> Regards,
> Larry

Thanks for that, this is great.
I wonder if my Trim(...) function was not part of the problem.

After profiling I noticed that delete [], (or even free(..) ) takes around
50% of the whole time.

Maybe I should get rid of the dynamic allocation altogether.

Simon

Jul 23 '05 #19

simon wrote:

Thanks for that, this is great.
I wonder if my Trim(...) function was not part of the problem.

After profiling I noticed that delete [], (or even free(..) ) takes around
50% of the whole time.

Maybe I should get rid of the dynamic allocation altogether.

Simon

What does your profiler say about simon2.cpp?

Actually 1.24 seconds is pretty good for 100000 records.

As far as the memory usage goes, did you read the 2
articles on malloc that I posted earlier? Whether you
use new/delete or std::string (which does its own new/delete
behind the scenes) doesn't make much difference in performance
or memory usage, but std::string allows you much more
flexibility when manipulating the strings after you've
filled your vector (i.e. later in the program).

Due to the many (200000) tiny memory allocations, your memory
usage would be about:

2.5 * (sizeof(sFileData) * 100000)

when both strings (sSomeString1 & sSomeString2) are small.

16 bytes minimum (plus the pointer kept in sFileData) will
be allocated for each of those strings. So, using pointers
in sFileData, the actual memory used for one sFileData
is at least 48 bytes.
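
The 48-byte figure can be set against the in-place char[41] layout discussed earlier in the thread with a little arithmetic. The sketch below hard-codes assumed 32-bit sizes (4-byte pointers and ints, 16-byte minimum heap chunk) instead of using sizeof, so the numbers are illustrative and not measurements; the function names are made up.

```cpp
#include <cstddef>

// Assumed 32-bit figures, taken from the post above rather than measured.
const std::size_t kPointerBytes = 4;
const std::size_t kIntBytes = 4;
const std::size_t kMinChunkBytes = 16;  // smallest heap chunk per allocation

// Two pointers and two ints in the struct, plus one heap chunk per short string.
std::size_t pointer_record_bytes() {
    return 2 * kPointerBytes + 2 * kIntBytes + 2 * kMinChunkBytes;
}

// Two in-place char[field_len + 1] arrays, padded to int alignment, plus two ints.
std::size_t inline_record_bytes(std::size_t field_len) {
    std::size_t chars = 2 * (field_len + 1);
    std::size_t padded = (chars + kIntBytes - 1) / kIntBytes * kIntBytes;
    return padded + 2 * kIntBytes;
}
```

With these assumptions pointer_record_bytes() gives 48 and inline_record_bytes(40) gives 92, so 100000 records come to about 4.8 MB with pointers and about 9.2 MB with in-place arrays. The pointer version stays smaller only while the strings are short, and it still pays for 200000 separate allocations and deallocations.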

Regards,
Larry

Jul 23 '05 #20
