Properly aligned dynamic buffer

Spoon

Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328];

By anywhere, I mean e.g. it could start at an odd address.

Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.
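One portable way to access a 32-bit word at an arbitrary byte offset, whatever the buffer's alignment, is to copy it with std::memcpy rather than dereference a cast pointer. This is a minimal sketch, assuming C++11; load_u32 and store_u32 are hypothetical helper names. Compilers commonly turn a fixed-size memcpy like this into a single load or store where the target allows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read the uint32_t whose bytes start at offset `off` in `buf`.
// std::memcpy has no alignment requirement, so this is valid even when
// buf + off is not suitably aligned for uint32_t.
std::uint32_t load_u32(const std::uint8_t *buf, std::size_t off)
{
    assert(buf != nullptr);
    std::uint32_t w;
    std::memcpy(&w, buf + off, sizeof w);
    return w;
}

// The matching store: copy the bytes of `w` to offset `off` in `buf`.
void store_u32(std::uint8_t *buf, std::size_t off, std::uint32_t w)
{
    assert(buf != nullptr);
    std::memcpy(buf + off, &w, sizeof w);
}
```

With the buffer above, load_u32(buf, 4*i) takes the place of *(uint32_t *)(buf+4*i); the result uses the platform's native byte order, exactly as the cast would.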

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

Regards.

Dec 27 '06 #1


dizone

Spoon wrote:

Hello everyone,

Hello

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328];

By anywhere, I mean e.g. it could start at an odd address.

(*very* unlikely, see below)

Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

Well, it depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits),
which may not be the case on all platforms. Granted, nobody says there
is a std::numeric_limits<uint8_t> specialization either, but usually if
your platform provides uint8_t it also comes with a numeric_limits
specialization for it.

To answer your question: yes, an array allocated as uint32_t can be
accessed at sizeof(uint32_t) byte offsets (I say sizeof(uint32_t), not 4,
because as I said it may not always be the same).
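The approach in the question can be sketched like this, assuming C++11 (for alignof and std::unique_ptr); alloc_words and is_aligned_for are hypothetical names. The storage is allocated as uint32_t, so its start is aligned for uint32_t; words are then accessed through the uint32_t pointer itself, while the byte view goes through uint8_t (which may alias anything).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Allocate room for at least `nbytes` bytes as an array of uint32_t, so the
// start address is suitably aligned for uint32_t. The byte count is
// rounded up to a whole number of words; the words are zero-initialised.
std::unique_ptr<std::uint32_t[]> alloc_words(std::size_t nbytes)
{
    assert(nbytes > 0);
    const std::size_t nwords =
        (nbytes + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);
    return std::unique_ptr<std::uint32_t[]>(new std::uint32_t[nwords]());
}

// True if `p` is aligned for T. Converting a pointer to uintptr_t is
// implementation-defined, but yields the plain address on common platforms.
template <typename T>
bool is_aligned_for(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```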

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address, and on some they must be
aligned accordingly, they all (AFAIK) have addresses that are aligned for
ANY type of access. C/C++ memory allocators generally rely on this (e.g.
malloc() must return memory aligned for any type, while C++'s new is
somewhat freer to perform some alignment optimizations).

So, to get dynamic memory suitably aligned for any type, you may:
1. use std::malloc(). Note that the alignment guarantee applies ONLY to
the returned address and to multiples of sizeof() of the object accessed
from that base address (as in an array), not to arbitrary locations
within the returned buffer. But then again, why not use "new" for the
type if you are going to store a single type (or an array of it) in that
buffer?
2. use some trick to find properly aligned memory for the type needed;
a trick of this kind is used in boost::alignment_of<>, which has an
interesting implementation.
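What std::malloc() promises can be made visible: it returns memory suitably aligned for any object with a fundamental alignment requirement, which C++11 exposes as alignof(std::max_align_t). This is a minimal sketch; malloc_alignment_ok is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Does std::malloc(n) return an address aligned to alignof(std::max_align_t)?
// The function only makes the allocator's guarantee visible.
// Returns false if the allocation fails.
bool malloc_alignment_ok(std::size_t n)
{
    assert(n > 0);
    void *p = std::malloc(n);
    if (p == nullptr) return false;
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return ok;
}
```

This bound is usually 8 or 16 bytes. A stricter boundary, such as 32 bytes for 256-bit data, has to be requested explicitly.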

The only time I couldn't go with 1 and needed something like 2 was when
I had to write my own memory allocator over a given contiguous memory
area (obtained with POSIX shared memory calls). To be more portable, I
used boost::alignment_of<> to determine properly aligned addresses for
whatever data structures the allocator's users required.
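For an allocator carving aligned blocks out of a given contiguous area, C++11 also provides std::align in <memory>, which performs the round-up arithmetic directly. This sketch does not reproduce the code described above. It assumes the area is any contiguous byte range (a local array stands in for the shared-memory mapping in the usage example), and Arena and bump_alloc are hypothetical names. Some older standard libraries lack std::align (libstdc++ before GCC 5, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical bump allocator over a caller-supplied contiguous area.
struct Arena {
    void *cur;         // next free byte
    std::size_t left;  // bytes remaining from cur onwards
};

// Return storage for `size` bytes aligned to `align` (a power of two),
// or nullptr if the arena cannot fit it. std::align moves p forward to
// the next suitable boundary and subtracts the skipped padding from space.
void *bump_alloc(Arena &a, std::size_t size, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    void *p = a.cur;
    std::size_t space = a.left;
    if (std::align(align, size, p, space) == nullptr) return nullptr;
    a.cur = static_cast<char *>(p) + size;
    a.left = space - size;
    return p;
}
```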

Hope this helps.

Dec 27 '06 #2

dizone

diz...@gmail.com wrote:

Well, it depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits),
which may not be the case on all platforms. Granted, nobody says there
is a std::numeric_limits<uint8_t> specialization either, but usually if
your platform provides uint8_t it also comes with a numeric_limits
specialization for it.

Whoops, I meant "you assume that std::numeric_limits<char>::digits == 8"
(char because the standard guarantees that sizeof(char) is always 1,
i.e. one byte).
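The assumption described here can be stated at compile time, so that code relying on 8-bit bytes fails to build instead of misbehaving elsewhere. This is a sketch assuming C++11 static_assert; CHAR_BIT (from <climits>) and std::numeric_limits<unsigned char>::digits both give the number of bits in a byte. bits_per_byte is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// sizeof(char) is 1 by definition; CHAR_BIT says how many bits that byte has.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses every bit of the byte");

// Make the 8-bit-byte assumption explicit.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

constexpr int bits_per_byte() { return CHAR_BIT; }
```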

Dec 27 '06 #3

Jim Langston

"Spoon" <de*****@localhost.com> wrote in message
news:45***********************@news.free.fr...

Hello everyone,

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328];

By anywhere, I mean e.g. it could start at an odd address.

I don't believe so. I read that malloc (which new generally uses)
returns an address that can be used for any type (char, int, etc.), so it
should always be properly aligned. dizone said the same thing.


Dec 28 '06 #4

Spoon

dizone wrote:

Spoon wrote:

As far as I understand, if I request a uint8_t buffer,
it could be allocated anywhere.

uint8_t *buf = new uint8_t[1328];

By anywhere, I mean e.g. it could start at an odd address.

(*very* unlikely, see below)

Therefore it might be incorrect to access 32 bits at a time:

*(uint32_t *)(buf+4*i) is probably illegal.

If I want a uint8_t buffer that is aligned on, say, 32 bits,
can I do the following:

uint8_t *buf = ( uint8_t * )( new uint32_t[1328/4] );

and have the guarantee that I can always access
*(uint32_t *)(buf+4*i) without any problem?

Well, it depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits),
which may not be the case on all platforms.

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?
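The equivalence can be checked mechanically. Whenever uint8_t exists, bytes are 8 bits wide and uint32_t (exactly 32 bits, no padding) occupies 4 of them, so both expressions give the same address. Writing sizeof(std::uint32_t) instead of 4 states the intent either way. This is a sketch assuming C++11; same_address is a hypothetical name, and buf is assumed to come from uint32_t storage, as with new uint32_t[1328/4].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With 8-bit bytes, uint32_t (exactly 32 bits, no padding) spans 4 bytes.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t spans four 8-bit bytes");

// Compare the two ways of addressing the i-th 32-bit word of `buf`.
// Only the addresses are compared; nothing is dereferenced.
bool same_address(std::uint8_t *buf, std::size_t i)
{
    std::uint32_t *a =
        reinterpret_cast<std::uint32_t *>(buf + sizeof(std::uint32_t) * i);
    std::uint32_t *b = reinterpret_cast<std::uint32_t *>(buf) + i;
    return a == b;
}
```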

Dec 28 '06 #5

dizone

Spoon wrote:

dizone wrote:

Well, it depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits),
which may not be the case on all platforms.

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

Yes, that was a stupid thing for me to say; I corrected myself in
another message. What I meant to say is that you assume
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).

AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?

I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t)?

As I was trying to say in my earlier message, some platforms have a
different number of bits per byte, so it's natural to assume that on
those platforms word alignment won't happen at multiples of 8 bits but at
multiples of their byte (let's assume it has 12 bits). To support
fixed-width types (like uint8_t), such a platform would probably use some
padding, so sizeof(uint8_t) would be 1 (i.e. one 12-bit byte can hold an
8-bit integer value) but sizeof(uint32_t) might be 3 (i.e. 3 bytes,
because 3 * 12 bits = 36 bits, enough to hold a 32-bit integer value).

Dec 28 '06 #6

Spoon

dizone wrote:

Spoon wrote:

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address, and on some they must be
aligned accordingly, they all (AFAIK) have addresses that are aligned for
ANY type of access. C/C++ memory allocators generally rely on this (e.g.
malloc() must return memory aligned for any type, while C++'s new is
somewhat freer to perform some alignment optimizations).

Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];
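The two loops above can be written as functions over whole buffers. This is a minimal sketch assuming C++11; xor_bytes and xor_words are hypothetical names. The word version assumes N is a multiple of 32 and that each buffer is uint32_t storage, so the word pointers are properly aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C = A XOR B, one byte at a time (q = N/8 bytes).
void xor_bytes(std::uint8_t *c, const std::uint8_t *a, const std::uint8_t *b,
               std::size_t q)
{
    assert(c != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < q; ++i)
        c[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);  // ^ promotes to int
}

// C = A XOR B, one 32-bit word at a time (q = N/32 words).
void xor_words(std::uint32_t *c, const std::uint32_t *a, const std::uint32_t *b,
               std::size_t q)
{
    assert(c != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < q; ++i) c[i] = a[i] ^ b[i];
}
```

Both produce identical bits for the same buffers; the word version simply does a quarter as many iterations.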

Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
T128 A[Q]; T128 B[Q]; T128 C[Q]; /* T128: placeholder for some 128-bit type */
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)
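One way to write the 128-bit version on GCC and Clang is a 16-byte vector type. This is a compiler extension, not standard C++. Its alignof is 16, and since C++17 a plain new[] of a type with extended alignment requests suitably aligned storage automatically. This is a sketch under those assumptions; v4u32, xor_vec and alloc_vec are hypothetical names. Whether the compiler actually emits the platform's 128-bit instructions depends on the target and optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// GCC/Clang extension: a 16-byte vector of four uint32_t lanes.
// The ^ operator works lane-wise on such vectors.
typedef std::uint32_t v4u32 __attribute__((vector_size(16)));

static_assert(alignof(v4u32) == 16, "vector type is 16-byte aligned");

// C = A XOR B, 128 bits per iteration (q = N/128 vectors).
void xor_vec(v4u32 *c, const v4u32 *a, const v4u32 *b, std::size_t q)
{
    assert(c != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < q; ++i) c[i] = a[i] ^ b[i];
}

// C++17: new[] honours alignof(v4u32) even when it exceeds the default
// new alignment, so the buffer starts on a 16-byte boundary (zero-filled).
std::unique_ptr<v4u32[]> alloc_vec(std::size_t q)
{
    return std::unique_ptr<v4u32[]>(new v4u32[q]());
}
```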

How do I convince new to give me a 128-bit aligned buffer?

#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?
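Requesting more and fixing the pointer can be sketched as follows, assuming C++11 and a power-of-two alignment; aligned_malloc and aligned_free are hypothetical names, not a standard API. The pointer that std::malloc returned is stored just below the aligned block so that it can be handed back to std::free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate `size` bytes aligned to `align` (a power of two) by
// over-allocating and rounding the pointer up.
void *aligned_malloc(std::size_t size, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < alignof(void *)) align = alignof(void *);  // keep the stored pointer aligned
    // Worst case: room for the stored pointer plus align - 1 bytes of padding.
    void *raw = std::malloc(size + sizeof(void *) + align - 1);
    if (raw == nullptr) return nullptr;
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    const std::uintptr_t aligned =
        (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    void **result = reinterpret_cast<void **>(aligned);
    result[-1] = raw;  // remember what malloc returned, just below the block
    return result;
}

// Release a block obtained from aligned_malloc.
void aligned_free(void *p)
{
    if (p != nullptr) std::free(static_cast<void **>(p)[-1]);
}
```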


Regards.

Dec 28 '06 #7

Spoon

dizone wrote:

Spoon wrote:

dizone wrote:

Well, it depends on what degree of portability you want to reach. First of
all, with 4*i above you automatically assume that
std::numeric_limits<uint8_t>::digits == 8 (i.e. that a byte has 8 bits),
which may not be the case on all platforms.

??

http://www.opengroup.org/onlinepubs/.../stdint.h.html

I only assume that my platform defines uint8_t, which denotes an
unsigned integer type with a width of exactly 8 bits.

Yes, that was a stupid thing for me to say; I corrected myself in
another message. What I meant to say is that you assume
std::numeric_limits<char>::digits == 8 (that a byte has 8 bits).

AFAIU, (uint32_t *)(buf+4*i) and ((uint32_t *)buf)+i are strictly
equivalent. I don't assume anything, do I?

I'm not so sure. How can you be sure that on every platform
sizeof(uint8_t) * 4 == sizeof(uint32_t)?

As I was trying to say in my earlier message, some platforms have a
different number of bits per byte, so it's natural to assume that on
those platforms word alignment won't happen at multiples of 8 bits but at
multiples of their byte (let's assume it has 12 bits). To support
fixed-width types (like uint8_t), such a platform would probably use some
padding, so sizeof(uint8_t) would be 1 (i.e. one 12-bit byte can hold an
8-bit integer value) but sizeof(uint32_t) might be 3 (i.e. 3 bytes,
because 3 * 12 bits = 36 bits, enough to hold a 32-bit integer value).

Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.
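This reading matches how the header is specified: the exact-width types are optional, and the limit macro for each one is defined only when the type itself is provided. A program can therefore test for uint8_t with the preprocessor. This is a sketch assuming C++11 <cstdint>, where no __STDC_LIMIT_MACROS is needed; has_uint8 is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// UINT8_MAX is defined exactly when the implementation provides uint8_t.
#ifdef UINT8_MAX
constexpr bool has_uint8 = true;
// uint8_t has exactly 8 bits and no padding, and no object is smaller than
// one byte, so a byte must be exactly 8 bits wide here.
static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
#else
constexpr bool has_uint8 = false;
#endif
```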

Dec 28 '06 #8

dizone

Spoon wrote:

dizone wrote:

Spoon wrote:

What if I want to align to 256 bits?
Do I have to create a bogus 256-bit structure?
(Since there is no native 256-bit integral type.)

While not all platforms allow one to access "words" (a term defined for
a specific platform) at any memory address, and on some they must be
aligned accordingly, they all (AFAIK) have addresses that are aligned for
ANY type of access. C/C++ memory allocators generally rely on this (e.g.
malloc() must return memory aligned for any type, while C++'s new is
somewhat freer to perform some alignment optimizations).

Let me say a bit more about what I want to do.

Consider two N-bit buffers A and B. (Typically N is ~10000)

I want to compute C = A XOR B.
That is, for each bit i, C[i] = A[i] XOR B[i]
As you can see, this problem is embarrassingly parallel.

The original naive solution was:

Q=N/8
uint8_t A[Q]; uint8_t B[Q]; uint8_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

Slow and unportable (it doesn't catch all the bits, as I explained with
the 12-bit byte platforms).

The next step was to work with words (32 bits on my platform).

Q=N/32
uint32_t A[Q]; uint32_t B[Q]; uint32_t C[Q];
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

Much faster but still unportable :)

Profiling indicated that I spend most of my time in this function, so I
figured I'd turn to platform-specific optimizations. My platform
provides 128-bit multimedia registers. But unaligned access incurs a
penalty. Thus, I want to guarantee that all 3 buffers are 128-bit
aligned, in order to write something like:

Q=N/128
T128 A[Q]; T128 B[Q]; T128 C[Q]; /* T128: placeholder for some 128-bit type */
for (int i=0; i < Q; ++i) C[i] = A[i] ^ B[i];

(~16 times faster than my original implementation.)

How do I convince new to give me a 128-bit aligned buffer?

By asking it to allocate something 128 bits wide, I guess, although I'm
not so sure about that (because of the problems with a byte not always
being 8 bits).

However, in your situation what I would do is just use the "natural"
platform integer type, and that's what "int" is. This is actually why
C/C++ never specify exact bit sizes for integers: they want to let the
coder just use "int", which becomes the natural platform integer type
(on 32-bit platforms it has 32 bits; on 64-bit platforms it has 32 or 64,
depending on the platform).

So in your situation I would just use new int[some size], loop over it
and XOR it. This should be fast ("int" is the natural platform integer,
normally the one the CPU works with fastest) and portable (on 12-bit-byte
platforms "int" would probably have a size that is a multiple of 12 bits,
so there would be no issues with padding bits when XORing the buffer).

I would also test with long and benchmark it against int, in case it
provides better speed (long is still portable).
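The int-versus-long comparison can be written once as a template over the word type and then timed per instantiation on the target machine. This is a minimal sketch assuming C++11; xor_buffers and words_for_bits are hypothetical names. It uses the unsigned counterparts (unsigned int, unsigned long), since XOR is a pure bit operation and unsigned types avoid any sign-related surprises.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// C = A XOR B over `n` words of type Word.
template <typename Word>
void xor_buffers(Word *c, const Word *a, const Word *b, std::size_t n)
{
    static_assert(std::is_unsigned<Word>::value, "use an unsigned word type");
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] ^ b[i];
}

// Number of Word elements needed to hold `nbits` bits, rounded up.
template <typename Word>
std::size_t words_for_bits(std::size_t nbits)
{
    const std::size_t bits_per_word = sizeof(Word) * CHAR_BIT;
    return (nbits + bits_per_word - 1) / bits_per_word;
}
```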

However, if you want to use special CPU instructions (that work with
128-bit integers), you can get memory aligned for anything (including
128-bit access) with std::malloc(). So use std::malloc() to get the
memory and the special CPU instructions to loop over it and XOR it.
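This is worth checking on the target: std::malloc() only guarantees alignof(std::max_align_t), which can be smaller than 16 bytes. The new foo output earlier in the thread, 0x804a008, is 8-byte aligned but not 16-byte aligned. Where C++17 is available, std::aligned_alloc requests a specific alignment. The size must be a multiple of the alignment, and the memory is released with std::free. Not every standard library provides it (MSVC's does not); POSIX systems also offer posix_memalign. This is a sketch; alloc_aligned_bytes and is_aligned are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate at least `size` bytes aligned to `align` with C++17
// std::aligned_alloc. The size is rounded up to a multiple of `align`,
// as aligned_alloc requires. Release the result with std::free.
void *alloc_aligned_bytes(std::size_t size, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two
    const std::size_t rounded = (size + align - 1) / align * align;
    return std::aligned_alloc(align, rounded);
}

// True if `p` lies on an `align`-byte boundary.
bool is_aligned(const void *p, std::size_t align)
{
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```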

#include <cstdio>
struct foo { long long x,y; };
int main()
{
for (int i=0; i < 3; ++i) printf("%p\n", ( void * )( new foo ));
}

0x804a008
0x804a020
0x804a038

Are you saying I need to request "more" and "fix" the pointer?

That is another solution too. You can request more and fix the pointer
using boost::alignment_of<>, but in your particular case I don't think
it's needed: just test plain "new" with int/long and an XOR loop, and if
that's not fast enough, use std::malloc() to get memory aligned for
anything and the special 128-bit CPU instructions over it.

Dec 28 '06 #9

dizone

Spoon wrote:

Perhaps I misread the standard, but it seems to me that uint8_t is only
defined on platforms where there exists a native unsigned integer type
with a width of *exactly* 8 bits.

Thus, on your hypothetical platform with 12-bit chars, uint8_t would not
be defined, as far as I understand.

I see, sorry about that. I limited myself to ISO C++, which has no
uint8_t, so I assumed some things about it :)

Indeed, if that's what the text says, my example is of no use (however,
I really think I read about REAL platforms with 12-bit bytes, so it's not
only theoretical).

Dec 28 '06 #10
