Properly aligned dynamic buffer

Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328];

By anywhere, I mean e.g. it could start at an odd address.

Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

Regards.
Dec 27 '06 #1

Spoon wrote:
Hello everyone,
Hello
As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.
(*very* unlikely, see below)
Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?
Well, that depends on what degree of portability you want to reach. First
of all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits,
which may not be the case on all platforms; sure, nobody says there is a
std::numeric_limits<uint8_t> either, but usually if your platform provides
uint8_t it also comes with a numeric_limits specialization for it).

To reply to your question: yes, an array allocated as uint32_t can be
accessed at offsets of sizeof(uint32_t) bytes (I said sizeof(uint32_t),
not 4, because as I said they may not always be the same).
What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)
While not all platforms allow "words" (a term defined per platform) to be
accessed at any memory address, so that such accesses must be aligned
accordingly, they all (AFAIK) have addresses which are aligned for ANY
type of access. C/C++ memory allocators generally rely on this (e.g.
malloc() must return memory aligned for any type, while C++'s new is
slightly freer to perform some alignment optimizations).

So in order to get "dynamic memory" for any alignment, so far you may:
1. use std::malloc(); please note that the alignment guarantee holds ONLY
at the address returned and at multiples of sizeof() of the object
accessed from that base address (like an array), not at some random
location within the returned memory buffer; but then again, why not use
"new" for the type if you are going to store a single type (or an array
of it) in that buffer?
2. perform some trick to spot properly aligned memory for the type
needed; this is used in boost::alignment_of<>, which has an interesting
implementation

The only time I couldn't go with 1 and needed something like 2 was when
I had to write my own memory allocator over a given contiguous memory
area (obtained with POSIX shared memory calls). In order to be more
portable I used boost::alignment_of<> to determine properly aligned
addresses for whatever data structures the allocator's users required.

Hope this helps.
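
For readers on a C++11 or later compiler, the "spot a properly aligned address inside a region you already own" idea from option 2 can be sketched with the standard std::align and alignof. That is not what the post used (it relied on boost::alignment_of<>), and the helper name carve_aligned and its cursor/remaining parameters are made up for illustration, so treat this as a rough sketch rather than the allocator described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: carve one T-sized, T-aligned slot out of a raw
// region. `cursor` is the first unused byte and `remaining` the number
// of unused bytes; both are updated on success.
// Returns nullptr when the region cannot hold an aligned T.
template <typename T>
void* carve_aligned(void*& cursor, std::size_t& remaining) {
    // std::align moves `cursor` up to the next address aligned for T and
    // subtracts the skipped padding from `remaining`.
    void* p = std::align(alignof(T), sizeof(T), cursor, remaining);
    if (p == nullptr) {
        return nullptr;
    }
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0);

    // Reserve the slot itself so the next call starts after it.
    cursor = static_cast<unsigned char*>(p) + sizeof(T);
    remaining -= sizeof(T);
    return p;
}
```

A caller would keep one cursor/remaining pair per region (for example the shared memory area) and call carve_aligned<SomeType>(cursor, remaining) for each object it wants to place there.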

Dec 27 '06 #2

diz...@gmail.co m wrote:
Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms, sure also nobody says there
is a std::numeric_limits<uint8_t> but usually if your platform provides
uint8_t it also comes with a numeric_limits specialization for it).
Whoops, I meant "you assume that std::numeric_limits<char>::digits == 8"
(char because it's the only type for which the standard says that its
sizeof() is always 1, i.e. a byte).

Dec 27 '06 #3

"Spoon" <de*****@localhost.com> wrote in message
news:45***********************@news.free.fr...
Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.
I don't believe so. I had read that malloc (which new generally uses)
returns an address that can be used for any type (char, int, etc.), so it
should always be aligned properly. dizone said the same thing.
Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

Regards.

Dec 28 '06 #4
dizone wrote:
Spoon wrote:
As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328]

By anywhere, I mean e.g. it could start at an odd address.

(*very* unlikely, see below)
Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,
??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?
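
A small sketch of that equivalence claim, under the assumption that the buffer comes from new uint32_t[...] as in the original question. The function name views_agree is invented here. The two static_asserts spell out what the 4*i arithmetic relies on; CHAR_BIT from <climits> is used for the byte width, since std::numeric_limits<char>::digits would report 7 on platforms where plain char is signed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// If uint8_t exists at all, a byte is exactly 8 bits wide, and a 32-bit
// type without padding then occupies exactly 4 bytes.
static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
static_assert(sizeof(std::uint32_t) == 4 * sizeof(std::uint8_t),
              "the 4*i offset assumes a 4-byte uint32_t");

// Hypothetical check: allocate `count` words, view them as bytes, and
// compare (uint32_t *)(buf + 4*i) against ((uint32_t *)buf) + i.
bool views_agree(std::size_t count) {
    assert(count > 0);  // nothing to compare otherwise
    std::uint32_t* words = new std::uint32_t[count]();
    std::uint8_t* buf = reinterpret_cast<std::uint8_t*>(words);

    bool ok = true;
    for (std::size_t i = 0; i < count; ++i) {
        // Only the addresses are compared; nothing is dereferenced.
        ok = ok && (reinterpret_cast<std::uint32_t*>(buf + 4 * i) == words + i);
    }
    delete[] words;
    return ok;
}
```

Calling views_agree(1328 / 4) mirrors the buffer size from the question. Note that this only covers a buffer that started life as a uint32_t array; it says nothing about a buffer obtained from new uint8_t[...].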
Dec 28 '06 #5

Spoon wrote:
dizone wrote:
Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.
Yes, that was a stupid thing for me to say; I corrected myself in
another message. However, I meant to say that you assume
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).
>
AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?
I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t)?

As I was trying to say in my earlier message, some platforms have a
different number of bits per byte, so it's natural to assume that on
those platforms alignment for a word won't fall on a multiple of 8 bits
but on a multiple of their byte (let's assume it has 12 bits). To
support fixed-width types (like uint8_t), such a platform will probably
do some padding, so sizeof(uint8_t) will be 1 (i.e. one 12-bit byte can
hold an 8-bit integer value) but sizeof(uint32_t) might be 3 (i.e. 3
bytes, because 3 * 12 bits = 36 bits, which can hold a 32-bit integer
value).

Dec 28 '06 #6
dizone wrote:
Spoon wrote:
What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address and must be aligned
accordingly, they all (AFAIK) have addresses which are aligned for ANY
type of access. This feature is generally used by C/C++ memory
allocators (ex. malloc() needs to return such aligned for all memory
usually while C++'s new is slightly more free to perform some alignment
optimizations).
Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)

How do I convince new to give me a 128-bit aligned buffer?

#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?
So in order to get "dynamic memory" for any alignment so far you may:
1. use std::malloc(), please note that the alignment guarantee is ONLY
at the address returned and at multiples of sizeof() of the object
accessed from that base address (like an array) and not some random
location within the returned memory buffer; but then again why not use
"new" for the type if you are going to store a single type (or array
of) at that buffer
2. perform some trick to spot properly aligned memory for the type
needed; this is used in boost::alignment_of<>, which has an interesting
implementation

My only need when I couldn't go with 1 and needed something like in 2
was when I had to do my own memory allocator over a given contiguous
memory area (got with POSIX shared memory calls). In order to be more
portable I used the boost::alignment_of<> stuff to determine properly
aligned addresses for whatever data structures the allocator users
required.
Regards.
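
On the "bogus structure" question, one possible sketch, assuming a C++17 or later compiler (where new and std::allocator are required to honor alignas on over-aligned types; earlier standards did not promise this), is a small struct tagged alignas(16) stored in a std::vector. Block128, xor_blocks and is_aligned16 are names invented here. The plain loop below does not use the 128-bit multimedia instructions themselves; it only shows how the three buffers could end up 128-bit aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 128-bit block: four 32-bit words, 16-byte aligned.
struct alignas(16) Block128 {
    std::uint32_t w[4];
};
static_assert(alignof(Block128) == 16, "block must be 16-byte aligned");

// C = A XOR B, one 128-bit block at a time.
std::vector<Block128> xor_blocks(const std::vector<Block128>& a,
                                 const std::vector<Block128>& b) {
    assert(a.size() == b.size());
    std::vector<Block128> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (int k = 0; k < 4; ++k) {
            c[i].w[k] = a[i].w[k] ^ b[i].w[k];
        }
    }
    return c;
}

// True when p sits on a 16-byte boundary.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

For N bits, each vector would hold N/128 blocks (rounded up). Changing alignas(16) to alignas(32) and the array to eight words would give the 256-bit variant asked about at the start of the thread.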
Dec 28 '06 #7
dizone wrote:
Spoon wrote:
dizone wrote:
Well depends on what degree of portability you want to reach. First of
all, above with 4*i you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (ie that a byte has 8 bits
which may not be the case on all platforms,

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

Yes that was a stupid thing for me to say, I corrected myself in
another message, however, I meant to say you assume that
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).
AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?

I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t)?

As I was trying to say in my earlier message some platforms have a
different number of bits for a byte and thus it's natural to assume
that on those platforms alignment for a word won't happen in a multiple
of 8 bits but in a multiple of their byte (which lets assume it has 12
bits). In that case that platform in order to support fixed bitlen
types (like uint8_t) it will probably do some padding as such on that
platform sizeof(uint8_t) will be 1 (ie one 12bit byte can support a
8bit integer value) but sizeof(uint32_t) might be 3 (ie 3 bytes,
because 3 * 12bit = 36bit and can support a 32bit integer value).
Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.
Dec 28 '06 #8
Spoon wrote:
dizone wrote:
Spoon wrote:
What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)
While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address and must be aligned
accordingly, they all (AFAIK) have addresses which are aligned for ANY
type of access. This feature is generally used by C/C++ memory
allocators (ex. malloc() needs to return such aligned for all memory
usually while C++'s new is slightly more free to perform some alignment
optimizations).

Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];
Slow and unportable (it doesn't catch all the bits, as I explained with
the 12-bit-byte platforms).
The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];
Much faster but still unportable :)
Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)

How do I convince new to give me a 128-bit aligned buffer?
By asking it to allocate something that is 128 bits wide, I guess,
although I'm not so sure about that (because of the problems with 8 bits
not being a byte).

However, in your situation, what I would do is just use the "natural"
platform integer type, and that's what "int" is. This is actually why
C/C++ never specify fixed bit sizes for integers: they want to allow the
coder to just use "int", which turns into the natural platform integer
type (on 32-bit platforms it has 32 bits; on 64-bit platforms it has 32
or 64, it depends).

So in your situation I would just use new int[some size], cycle over it
and XOR it. This should be fast ("int" is the natural platform integer,
normally the one the platform works fastest with) and portable (on
12-bit-byte platforms "int" would probably have a size that is a
multiple of 12 bits, so there are no issues with padding bits when
XORing the buffer).

I would also test with long and benchmark it against int to see if it
provides better speed (long is still portable).

However, if you want to use special CPU instructions (that work with
128-bit integers), you may get memory aligned for anything (including
128-bit access) with std::malloc(). Thus, use std::malloc() to get the
memory and the special CPU instructions to cycle over it and XOR it.
>
#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?
That is another solution too. You can request more and fix the pointer
using boost::alignment_of<>, but in your particular case I don't think
it's needed: just test a plain XOR loop over int/long memory from normal
"new", and if that's not fast enough, use std::malloc() to get memory
aligned for anything and run the special 128-bit CPU instructions over it.
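
A rough sketch of the "request more and fix the pointer" approach, written by hand on top of std::malloc. One caveat worth stating: malloc is only required to return memory aligned for the standard object types, which need not cover a 16- or 32-byte requirement (the 0x804a008 address printed earlier in the thread is 8-byte aligned, not 16). The names alloc_aligned and free_aligned are hypothetical, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: return `bytes` of storage whose address is a
// multiple of `alignment` (a power of two), or nullptr on failure.
void* alloc_aligned(std::size_t bytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (alignment < alignof(void*)) {
        alignment = alignof(void*);  // keeps the bookkeeping slot aligned
    }

    // Request more: worst-case padding plus one slot for the raw pointer.
    void* raw = std::malloc(bytes + alignment - 1 + sizeof(void*));
    if (raw == nullptr) {
        return nullptr;
    }

    // Fix the pointer: skip the slot, then round up to the alignment.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;

    // Remember the raw pointer just below the aligned block.
    void** slot = reinterpret_cast<void**>(aligned) - 1;
    *slot = raw;
    return reinterpret_cast<void*>(aligned);
}

// Release a block obtained from alloc_aligned.
void free_aligned(void* p) {
    if (p != nullptr) {
        std::free(static_cast<void**>(p)[-1]);
    }
}
```

For the 256-bit case from the first post this would be called as alloc_aligned(1328, 32). Blocks returned this way must go back through free_aligned, never directly through std::free or delete.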

Dec 28 '06 #9
Spoon wrote:
Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.
I see, sorry about that. I limited myself to ISO C++, and there is no
uint8_t in it, so I assumed some things about it :)

Indeed, if that's the text then my example is of no use (however, I
really think I read about REAL platforms with 12-bit bytes, so not only
in theory).

Dec 28 '06 #10
