Data structure & alignment access speed on 32-bit systems

pt
Hi,
I am wondering which of the following data structures is faster to read
from disk in C/C++, including alignment handling, when accessed on a
little-endian 32-bit system + OS, e.g. Windows, Linux, WinCE. I am not
quite sure about the alignment of the memory...

soln. 1: should be faster, I am not sure.
idx   size (bytes)
 1    4
 2    4
 3    1
 4    1
 5    1
 6    1
 7    1
 8    1
 9    1
10    1
sum = 16 bytes

soln. 2
idx   size (bytes)
 1    4
 2    4
 3    4   -- to get the contents back, need to use bit shifts 4+ times
 4    4   -- to get the contents back, need to use bit shifts 4 times

sum = 16 bytes
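
Roughly, in C the two layouts would look something like this (the field
names are just made up for illustration):

    /* soln. 1: two 4-byte fields followed by eight 1-byte fields */
    struct layout1 {
        unsigned int  a;       /* idx 1, 4 bytes */
        unsigned int  b;       /* idx 2, 4 bytes */
        unsigned char c[8];    /* idx 3..10, 1 byte each */
    };                         /* 16 bytes, no padding needed */

    /* soln. 2: four 4-byte fields; the one-byte values are packed
       into the last two words and have to be pulled out with
       shifts and masks */
    struct layout2 {
        unsigned int a;        /* idx 1, 4 bytes */
        unsigned int b;        /* idx 2, 4 bytes */
        unsigned int packed1;  /* idx 3, holds four 1-byte values */
        unsigned int packed2;  /* idx 4, holds four 1-byte values */
    };                         /* 16 bytes */

    /* e.g. extracting the second byte from packed1 (little endian): */
    /* unsigned char v = (unsigned char)((s.packed1 >> 8) & 0xFF);   */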

Thanks for any suggestions!

pt.

Aug 1 '06 #1
pt wrote:
> I am wondering which of the following data structures is faster to read
> from disk in C/C++, including alignment handling, when accessed on a
> little-endian 32-bit system + OS, e.g. Windows, Linux, WinCE.
I can't make heads or tails of this statement. Could you break it up into
shorter sentences?

Also, consider that performance depends greatly on the platform and the
compiler you're using, so there is no single answer to your question.
Performance (especially when such low-level operations are concerned)
is _measured_, not calculated. You write your functions and measure the
time it takes. Then you look at it from the overall program execution
standpoint and decide if the performance of any particular part is of
any importance.

Also, for OS-specific inquiries, consider posting to the newsgroup[s]
dedicated to the OS[es].
[..]
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 1 '06 #2
> Also, consider that performance depends greatly on the platform and the
> compiler you're using, so there is no single answer to your question.
> Performance (especially when such low-level operations are concerned)
> is _measured_, not calculated. You write your functions and measure the
> time it takes. Then you look at it from the overall program execution
> standpoint and decide if the performance of any particular part is of
> any importance.
Do we have a rough general idea for this? Linux (gcc), WinCE/Windows
(Visual Studio .NET)

Aug 1 '06 #3

pa*******@gmail.com wrote:
> > Also, consider that performance depends greatly on the platform and the
> > compiler you're using, so there is no single answer to your question.
> > Performance (especially when such low-level operations are concerned)
> > is _measured_, not calculated. You write your functions and measure the
> > time it takes. Then you look at it from the overall program execution
> > standpoint and decide if the performance of any particular part is of
> > any importance.
>
> Do we have a rough general idea for this? Linux (gcc), WinCE/Windows
> (Visual Studio .NET)
Not really. I assume you are using x86. (Which is probably wrong
since you mention CE, but it's reasonably common, and you didn't say
what you are using so I will assume it anyway.)

Performance may be affected by any number of things. Aligned reads are
generally faster than unaligned reads, just as a completely general
rule of thumb. However, read penalties are usually smaller if the data is
in cache. Does your chip have a cache? How much? What algorithms
control how the cache is filled and drained of data? Is the cache
shared between multiple cores? How about other processes? Any of this
can affect performance.
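
As an aside, if you do end up reading 4-byte values from arbitrary
(possibly unaligned) offsets in a byte buffer, the portable way to do it
in C is to copy into a properly typed variable, e.g. something like:

    #include <string.h>

    /* Read a 32-bit value from an arbitrary offset in a byte buffer
       without assuming the address is suitably aligned.  A decent
       compiler usually turns this into a plain load where the target
       allows it. */
    static unsigned int read_u32(const unsigned char *p)
    {
        unsigned int v;
        memcpy(&v, p, sizeof v);
        return v;
    }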

Does your CPU support MMX? SSE? 3DNow? Depending on what *exactly*
you are doing, these instructions may help immensely, or be of no
value. Does your compiler output those extra instructions? When does
it do so?

Are you using a 386? Opteron? P4? Some old 386s only had a 16-bit
external data bus, so the alignment issue may not be a big deal, considering
you are already so slowed down by the bus. Perfectly optimal code on
an Opteron will look a bit different from optimal code on a P4 in most
cases.

So, what about the compiler? In a minimally optimising mode, what you
write will probably map very directly to the machine code that is
generated. At a higher optimisation level, the machine code may seem only
dimly related to what you wrote. And two different-looking chunks of
C which do the exact same thing may actually wind up being compiled to
the exact same machine code. Which compiler are you using? Which
version? gcc added a lot of new optimising bells and whistles
recently. Maybe one method will be better optimised by the new 4.X
bells and whistles, and the other method will be better optimised by the
older 3.X bells and whistles.

Those are some of the questions off the top of my head that would all
need to be considered before you could really say how well
micro-optimisations will work out. It's basically impossible to be
sure, so you just have to measure actual performance. And, you have to
measure it under actual running conditions.
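
For instance, a very rough timing harness along those lines might look
like this (standard C only; the two process_* functions are just
stand-ins for whatever actually loads and uses your structures):

    #include <stdio.h>
    #include <time.h>

    /* stand-ins: replace these with code that reads and uses the
       data in layout 1 and layout 2 respectively */
    static void process_with_layout1(void) { /* ... */ }
    static void process_with_layout2(void) { /* ... */ }

    static double time_it(void (*fn)(void), long iterations)
    {
        long i;
        clock_t start = clock();
        for (i = 0; i < iterations; ++i)
            fn();
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        printf("layout 1: %f s\n", time_it(process_with_layout1, 1000000L));
        printf("layout 2: %f s\n", time_it(process_with_layout2, 1000000L));
        return 0;
    }

Run it with the same compiler flags and on the same hardware as the real
program, otherwise the numbers won't tell you much.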

When considering doing optimisations, people generally try to avoid
really low level stuff as much as possible. First, consider if you are
using an appropriate algorithm. This will almost always give you the
biggest possible speedup. If you can move from an O(n*n) algorithm to
an O(n log n) one, then you have probably sped things up tremendously.
From there, you need to profile and see what is running slow. If you
spend two months making a perfectly aligned data structure with
perfectly aligned accesses, that may be great. But if loading your
structure is only 0.001% of your run time, then there was probably no
point, even if that specific step is a million times faster. Once you
find what is running slow, start with the low-hanging fruit.

For example, imagine that you are processing data in a file. If you
have established that reading the data is slow, and the processing is
reasonably quick, then you need to try and figure out what the easiest
way to speed up the reading is. If you are making a bunch of small
reads, then you could try lumping them together into some big reads
that get a bunch of data at once. This may result in less disk
seeking, which can improve things dramatically. You can also look at
some crazy non-portable system calls which will take you a long time to
get working right. But if grouping your reads gives you 90% of the
crazy and super-complex solution, then there is probably no point in
going that route.
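
A sketch of the "group the reads" idea, just using standard C stdio
(the file name and record size are made up for the example):

    #include <stdio.h>
    #include <stdlib.h>

    #define RECORD_SIZE        16
    #define RECORDS_PER_CHUNK  4096

    /* Instead of one fread() per 16-byte record, pull a large chunk
       into memory and walk through it there. */
    int main(void)
    {
        FILE *f = fopen("data.bin", "rb");   /* made-up file name */
        unsigned char *buf;
        size_t got, i;

        if (f == NULL)
            return 1;
        buf = malloc((size_t)RECORD_SIZE * RECORDS_PER_CHUNK);
        if (buf == NULL) {
            fclose(f);
            return 1;
        }

        while ((got = fread(buf, RECORD_SIZE, RECORDS_PER_CHUNK, f)) > 0) {
            for (i = 0; i < got; i++) {
                unsigned char *record = buf + i * RECORD_SIZE;
                /* process one 16-byte record here */
                (void)record;
            }
        }

        free(buf);
        fclose(f);
        return 0;
    }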

So, remember that micro-optimisations suffer bit rot. What you
micro-optimise today will almost certainly be suboptimal on the new
chip they release next month.

I've rambled quite a lot longer than I intended to, and I apologise.
As you can see, getting into micro-optimisations really explodes the
number of issues that might come up. That's why this group is such a
bad source of advice on those sorts of issues. This group is mostly
just about the stuff that is definitely specified in the C standard.
You will be much better off trying to get micro-optimisation advice in
a group dedicated to your compiler, or to assembly coding on your CPU,
etc.

Aug 1 '06 #4
pattreeya writes:
> I am wondering which of the following data structures is faster to read
> from disk in C/C++, including alignment handling, when accessed on a
> little-endian 32-bit system + OS, e.g. Windows, Linux, WinCE. I am not
> quite sure about the alignment of the memory...
This thread lists a lot of specific issues to consider, but here are
some general ones (none of them overriding the more specific matters:-)

The compiler likely knows better than you do. Even when it doesn't, any
optimization you do for some specific hardware is likely to be obsolete
or even a slowdown soon enough, unless you take care to keep it updated.

So when people ask such a low-level question, the best solution is often
to rearrange their code so that the compiler will take care of the
matter. E.g. give your data the proper data type, and maybe put it in a
struct or union instead of aligning it "by hand" in some malloced area.

If you do need to know the alignment of some type T, anyway, you can
ask the compiler to align it for you and read out the alignment it used.
C++ has some alignof feature, I don't remember what. In C, use:
    struct align_T { char dummy; T align; };
    offsetof(struct align_T, align)
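
For instance, to see what alignment the compiler uses for double (just
substituting double for T in the trick above):

    #include <stddef.h>
    #include <stdio.h>

    struct align_double { char dummy; double align; };

    int main(void)
    {
        /* the offset of 'align' is the padding the compiler inserted,
           i.e. the alignment it wants for double */
        printf("alignment of double: %lu\n",
               (unsigned long)offsetof(struct align_double, align));
        return 0;
    }
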
One step more specific: 'int' is supposed to have the host's natural
word size, so the alignment the compiler gives 'int' _may_ be the best
minimum alignment (that is, the best alignment for data types smaller
than int). For example, I've heard of some host where memcpy() tried to
first deal with the first (address mod 4) bytes and then take the rest
in 4-byte chunks, or something like that. So if you want optimal access
to some char[] array, and it's not already inside a struct which
contains an int, a pointer, or some wider data type, you can use:
    union foo {
        char data[SIZE];
        int align;
    };
Then the compiler will give it the proper alignment, and if you use
'union foo' pointers it will also know that the data is "int-size"
aligned, so it won't need to generate code to handle byte-aligned
data. (Unlike if you pass around char* pointers.)
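
A minimal sketch of how that might be used (SIZE and the function are
invented for the example):

    #include <string.h>

    #define SIZE 64

    union foo {
        char data[SIZE];
        int align;
    };

    /* the callee sees a buffer that is at least int-aligned,
       instead of an arbitrary char* */
    static void fill(union foo *buf)
    {
        memset(buf->data, 0, sizeof buf->data);
    }

    int main(void)
    {
        union foo buf;   /* aligned at least as strictly as int */
        fill(&buf);
        return 0;
    }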

On the other hand, even that little hack can be a pessimization:
For one thing, unions and structs are more complex to handle for the
compiler than simple char* arrays, so it may not optimize the code as
well. Also, pointers to different data types may have different
representations, and if you cause it to convert pointers back and forth
a lot you are out of luck. And if you increase the data size, you'll
increase memory use, reduce the usefulness of the cache, etc.

--
Hallvard
Aug 1 '06 #5
