Properly aligned dynamic buffer

Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.

Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

Regards.
Dec 27 '06 #1

Spoon wrote:
Hello everyone,
Hello
As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.
(*very* unlikely, see below)
Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?
Well, that depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits,
which may not be the case on all platforms; sure, nobody says there
is a std::numeric_limits<uint8_t> either, but usually if your platform provides
uint8_t it also comes with a numeric_limits specialization for it).

To reply to your question: yes, memory allocated as a uint32_t array can
be accessed as uint32_t at offsets that are multiples of sizeof(uint32_t)
bytes (I say sizeof(uint32_t), not 4, because as I said it may not always
be the same).
What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)
While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address, so that such accesses must be
aligned accordingly, they all (AFAIK) have addresses which are aligned for
ANY type of access. This feature is generally used by C/C++ memory
allocators (e.g. malloc() needs to return memory aligned like that for
every allocation, while C++'s new is slightly more free to perform some
alignment optimizations).

So in order to get "dynamic memory" for any alignment so far you may:
1. use std::malloc(), please note that the alignment guarantee is ONLY
at the address returned and at multiples of sizeof() of the object
accessed from that base address (like an array) and not some random
location within the returned memory buffer; but then again why not use
"new" for the type if you are going to store a single type (or array
of) at that buffer
2. perform some trick to spot properly aligned memory for the type
needed; this is used in boost::alignment_of<>, which has an interesting
implementation

My only need, when I couldn't go with 1 and needed something like 2,
was when I had to write my own memory allocator over a given contiguous
memory area (obtained with POSIX shared memory calls). In order to be more
portable I used the boost::alignment_of<> stuff to determine properly
aligned addresses for whatever data structures the allocator users
required.
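
For reference, a minimal sketch of the "find a properly aligned address inside a given memory area" idea, assuming a C++11 compiler. It uses std::align from <memory> rather than boost::alignment_of<>, so it is not the code described above, and carve_aligned and is_aligned are made-up names for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: carve `size` bytes aligned to `alignment` out of a raw
// memory region. `cursor` and `space` describe the unused part of the region
// and are updated on success. Returns nullptr if the region is too small.
inline void* carve_aligned(void*& cursor, std::size_t& space,
                           std::size_t size, std::size_t alignment) {
    // std::align requires a power-of-two alignment.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void* p = std::align(alignment, size, cursor, space);
    if (p == nullptr) return nullptr;
    // std::align moved cursor up to the aligned address; step past the block.
    cursor = static_cast<unsigned char*>(p) + size;
    space -= size;
    return p;
}

// Illustrative check that an address is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For a type T one would typically pass sizeof(T) and alignof(T), e.g. carve_aligned(cursor, space, sizeof(T), alignof(T)), and then construct the object in the returned storage with placement new.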

Hope this helps.

Dec 27 '06 #2

diz...@gmail.com wrote:
Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms, sure also nobody says there
is a std::numeric_limits<uint8_t> but usually if your platform provides
uint8_t it also comes with a numeric_limits specialization for it).
Whoops, I meant "you assume that std::numeric_limits<char>::digits == 8"
(char because it's the only type of which the standard says that its
sizeof() is always 1, i.e. a byte).

Dec 27 '06 #3

"Spoon" <de*****@localhost.com> wrote in message
news:45***********************@news.free.fr...
Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.
I don't believe so. I had read that malloc (which new generally uses)
returns an address that can be used for any type (char, int, etc..) so it
should always be aligned properly. dizone said the same thing.
Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

Regards.

Dec 28 '06 #4
dizone wrote:
Spoon wrote:
>As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.

(*very* unlikely, see below)
>Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,
??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?
Dec 28 '06 #5

Spoon wrote:
dizone wrote:
Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.
Yes, that was a stupid thing for me to say; I corrected myself in
another message. I meant to say that you assume
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).
>
AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?
I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t)?

As I was trying to say in my earlier message, some platforms have a
different number of bits per byte, so it's natural to assume that on
those platforms the alignment for a word won't be a multiple of 8 bits
but a multiple of their byte (let's assume it has 12 bits). To support
fixed-width types (like uint8_t), such a platform will probably do some
padding: sizeof(uint8_t) will be 1 (i.e. one 12-bit byte can hold an
8-bit integer value) but sizeof(uint32_t) might be 3 (i.e. 3 bytes,
because 3 * 12 bits = 36 bits, which can hold a 32-bit integer value).

Dec 28 '06 #6
dizone wrote:
Spoon wrote:
>What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address and must be aligned
accordingly, they all (AFAIK) have addresses which are aligned for ANY
type of access. This feature is generally used by C/C++ memory
allocators (ex. malloc() needs to return such aligned for all memory
usually while C++'s new is slightly more free to perform some alignment
optimizations).
Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)

How do I convince new to give me a 128-bit aligned buffer?

#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?
So in order to get "dynamic memory" for any alignment so far you may:
1. use std::malloc(), please note that the alignment guarantee is ONLY
at the address returned and at multiples of sizeof() of the object
accessed from that base address (like an array) and not some random
location within the returned memory buffer; but then again why not use
"new" for the type if you are going to store a single type (or array
of) at that buffer
2. perform some trick to spot properly aligned memory for the type
needed; this is used in boost::alignment_of<>, which has an interesting
implementation

My only need when I couldn't go with 1 and needed something like in 2
was when I had to do my own memory allocator over a given contiguous
memory area (got with POSIX shared memory calls). In order to be more
portable I used the boost::alignment_of<> stuff to determine properly
aligned addresses for whatever data structures the allocator users
required.
Regards.
Dec 28 '06 #7
dizone wrote:
Spoon wrote:
>dizone wrote:
>>Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

Yes that was a stupid thing for me to say, I corrected myself in
another message, however, I meant to say you assume that
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).
>AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?

I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t) ?

As I was trying to say in my earlier message some platforms have a
different number of bits for a byte and thus it's natural to assume
that on those platforms alignment for a word won't happen in a multiple
of 8 bits but in a multiple of their byte (which let's assume it has 12
bits). In that case that platform in order to support fixed bitlen
types (like uint8_t) it will probably do some padding as such on that
platform sizeof(uint8_t) will be 1 (ie one 12bit byte can support a
8bit integer value) but sizeof(uint32_t) might be 3 (ie 3 bytes,
because 3 * 12bit = 36bit and can support a 32bit integer value).
Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.
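
The point about byte offsets can be checked at compile time. Below is a minimal sketch, assuming a C++11 compiler whose <cstdint> provides the fixed-width types; load_u32 is a made-up helper name. Since the exact-width types have no padding bits, the static_asserts are expected to hold wherever uint8_t and uint32_t exist at all. The helper reads the word through std::memcpy, which works whatever the alignment of buf happens to be:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

// If uint8_t exists, a byte has exactly 8 bits, so uint32_t spans 4 bytes
// and (buf + 4*i) addresses the same location as ((uint32_t *)buf) + i.
static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
static_assert(sizeof(std::uint8_t) == 1, "uint8_t occupies one byte");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t occupies four bytes");

// Hypothetical helper: read the i-th 32-bit word of a byte buffer without
// assuming that buf is 32-bit aligned.
inline std::uint32_t load_u32(const std::uint8_t* buf, std::size_t i) {
    assert(buf != nullptr);
    std::uint32_t word;
    std::memcpy(&word, buf + 4 * i, sizeof word);
    return word;
}
```

Going through memcpy rather than casting the pointer also sidesteps the question of whether the cast itself is legal; the address arithmetic is the same either way.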
Dec 28 '06 #8
Spoon wrote:
dizone wrote:
Spoon wrote:
What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)
While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address and must be aligned
accordingly, they all (AFAIK) have addresses which are aligned for ANY
type of access. This feature is generally used by C/C++ memory
allocators (ex. malloc() needs to return such aligned for all memory
usually while C++'s new is slightly more free to perform some alignment
optimizations).

Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];
slow and unportable (doesn't catch all the bits, as I explained with the
12-bit byte platforms).
The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];
Much faster but still unportable :)
Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)

How do I convince new to give me a 128-bit aligned buffer?
By asking it to allocate something of 128 bits, I guess, although I'm not
so sure about that (because of the problems with 8 bits not being a byte).

However, in your situation, what I would do is just use the "natural"
platform integer type, and that's what "int" is. This is actually why
C/C++ never specify fixed bit sizes for integers: they want to allow the
coder to just use "int", which turns into the natural platform integer
type (on 32-bit platforms it has 32 bits; on 64-bit platforms it has 32
or 64, it depends).

So in your situation I would just use new int[some size], cycle over it
and XOR it. This should be fast ("int" is the natural platform integer,
the one the machine normally works fastest with) and portable (on 12-bit
byte platforms "int" would probably have a size that is a multiple of
12 bits, thus no issues with the padding bits when XORing the buffer).

I would also test with long and benchmark against int, if it provides
better speed (long still is portable).
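
To try the int versus long comparison, one loop can be parameterised on the word type. The following is only a sketch under a few assumptions: xor_by_word is a made-up name, Word is meant to be an unsigned integer type such as unsigned int or unsigned long, and the words are moved with std::memcpy so the buffers need no particular alignment. No timing code is included; that part is left to whatever benchmark harness is at hand:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// XOR n bytes of a and b into out, sizeof(Word) bytes at a time.
// Word is assumed to be an unsigned integer type.
template <typename Word>
void xor_by_word(const unsigned char* a, const unsigned char* b,
                 unsigned char* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
    // Whole words first.
    for (; i + sizeof(Word) <= n; i += sizeof(Word)) {
        Word x, y;
        std::memcpy(&x, a + i, sizeof(Word));
        std::memcpy(&y, b + i, sizeof(Word));
        x ^= y;
        std::memcpy(out + i, &x, sizeof(Word));
    }
    // Then the remaining bytes, if n is not a multiple of sizeof(Word).
    for (; i < n; ++i)
        out[i] = static_cast<unsigned char>(a[i] ^ b[i]);
}
```

Calling xor_by_word<unsigned int>(a, b, c, n) and xor_by_word<unsigned long>(a, b, c, n) on the same buffers gives the same result, so the two instantiations can be timed against each other directly.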

However, if you want to use special CPU instructions (that work with
128-bit integers) you may get memory aligned for anything (including
for 128-bit access) with std::malloc(). Thus use std::malloc() to get
the memory and the special CPU instructions to cycle over it and XOR it.
>
#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?
That is another solution too. You can request more and fix the pointer
using boost::alignment_of<>, but in your particular case I don't think
it's needed: just test a plain XOR loop over int/long memory from normal
"new", and if that's not fast enough use std::malloc() to get memory
aligned for anything and run the special 128-bit CPU instructions over it.
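
A caveat and a sketch of the "request more and fix the pointer" route. std::malloc() and new only promise alignment suitable for the standard types, which need not be 16 bytes: the addresses printed above (0x804a008, 0x804a020, 0x804a038) are 8-byte aligned, not 16-byte aligned. The class below is a minimal illustration, assuming C++11; AlignedBytes is a made-up name. It over-allocates by `alignment` bytes and lets std::align pick the first suitably aligned address inside the block:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical owner of a zero-initialised byte buffer whose data() pointer
// is a multiple of `alignment` (a power of two, in bytes).
class AlignedBytes {
public:
    AlignedBytes(std::size_t size, std::size_t alignment)
        : raw_(new std::uint8_t[size + alignment]()), size_(size) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        void* p = raw_.get();
        std::size_t space = size + alignment;
        // Skip forward to the first aligned address; the extra `alignment`
        // bytes requested above guarantee that `size` bytes still fit.
        data_ = static_cast<std::uint8_t*>(
            std::align(alignment, size, p, space));
        assert(data_ != nullptr);
    }

    std::uint8_t* data() { return data_; }
    const std::uint8_t* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::unique_ptr<std::uint8_t[]> raw_;  // owns the original allocation
    std::size_t size_;
    std::uint8_t* data_;                   // aligned pointer into raw_
};
```

For the buffers in this thread that would be something like AlignedBytes a(1328, 16); for 128-bit alignment, or 32 for 256-bit. Keeping the original pointer in raw_ matters because the adjusted pointer must never be passed to delete[]. With a C++17 compiler, std::aligned_alloc or the std::align_val_t overloads of operator new are alternatives that avoid the manual adjustment.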

Dec 28 '06 #9
Spoon wrote:
Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.
I see, sorry about that. I limited myself to ISO C++, and there is no
uint8_t in it, so I assumed some things about it :)

Indeed, if that's the text then my example is of no use (however, I
really think I read about REAL platforms with 12-bit bytes, so not
only in theory).

Dec 28 '06 #10
