Bytes IT Community

Is this standard c++...

I am thinking about using this technique for all the "local" memory pools in
a particular multi-threaded allocator algorithm I invented. Some more info on
that can be found here:

http://groups.google.com/group/comp....c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>
template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};
class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;
f->~foo();
return 0;
}

Feb 27 '07 #1
22 Replies


Chris Thomasson wrote:
I am thinking about using this technique for all the "local" memory
pools in a particular multi-threaded allocator algorithm I invented.
Some more info on that can be found here:

http://groups.google.com/group/comp....c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>
template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};
class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }
The string seems to be incorrect.
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;
It's not "incorrect", but I believe you may have a problem
with alignment. Only a memory block sized 'sizeof(foo)'
obtained from free store is aligned correctly to have a 'foo'
constructed in it like that.
f->~foo();
return 0;
}
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Feb 27 '07 #2

Chris Thomasson wrote:
I am thinking about using this technique for all the "local" memory pools in
a paticular multi-threaded allocator algorithm I invented. Some more info on
that can be found here:

http://groups.google.com/group/comp....c40d42a04ee855

Anyway, here is the code snippet:

#include <cstdio>
#include <cstddef>
#include <new>
template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
You may run into alignment problems by using an array of unsigned char;
you should follow the same system-specific alignment rules as malloc does.
public:
void* loadptr() {
return m_buf;
}
};
class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
C++ purists avoid printf. Never mix it with iostreams.
int main(void) {
Just int main() is the norm in C++.

--
Ian Collins.
Feb 27 '07 #3

Ian Collins wrote:

C++ purists avoid printf. Never mix it with iostreams.
Bah. Pointlessly dogmatic. Sometimes it's the right thing.


Brian
Feb 28 '07 #4

Default User wrote:
Ian Collins wrote:
>>C++ purists avoid printf. Never mix it with iostreams.

Bah. Pointlessly dogmatic. Sometimes it's the right thing.
Care to cite an example?

--
Ian Collins.
Feb 28 '07 #5

On 27 Feb, 22:59, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
Chris Thomasson wrote:
I am thinking about using this technique for all the "local" memory
pools in a particular multi-threaded allocator algorithm I invented.
Some more info on that can be found here:
http://groups.google.com/group/comp....ead/24c40d42a0...
Anyway, here is the code snippet:
#include <cstdio>
#include <cstddef>
#include <new>
template<size_t T_sz>
class lmem {
unsigned char m_buf[T_sz];
public:
void* loadptr() {
return m_buf;
}
};
class foo {
public:
foo() { printf("(%p)foo::~foo()", (void*)this); }

The string seems to be incorrect.
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};
int main(void) {
foo *f;
lmem<sizeof(*f)> foomem;
f = new (foomem.loadptr()) foo;

It's not "incorrect", but I believe you may have a problem
with alignment. Only a memory block sized 'sizeof(foo)'
obtained from free store is aligned correctly to have a 'foo'
constructed in it like that.
IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)

regards
Andy Little

#include <cstddef>
#include <stdexcept>

// aka std::tr1::
template <typename T>
struct alignment_of{
#if defined _MSC_VER
static const unsigned int value = __alignof(T);
#elif defined __GNUC__
static const unsigned int value = __alignof__(T);
#else
#error need to define system dependent align_of
#endif
};
template<typename T, size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
void* loadptr() {
// check its aligned correctly
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
if(!offset)
return m_buf;
throw std::logic_error("lmem memory doesn't satisfy alignment");
}
};
int main()
{
typedef double type;
lmem<type,sizeof(type)> x;
}

Feb 28 '07 #6

On 28 Feb, 01:25, "kwikius" <a...@servocomm.freeserve.co.uk> wrote:
IMO, because the array is wrapped in a class there shouldnt be a
problem. IOW thc class alignment will take care of the issue.
Well, I was wrong about that.
(Back to the drawing board I guess.)

But at least the code seems to detect the problem:

#include <cstddef>
#include <stdexcept>

// aka std::tr1::
template <typename T>
struct alignment_of{
#if defined _MSC_VER
static const unsigned int value = __alignof (T);
#elif defined __GNUC__
static const unsigned int value = __alignof__(T);
#else
#error need to define system dependent align_of
#endif
};
template<typename T, size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
void* loadptr() {
// check its aligned correctly
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
if(!offset)
return m_buf;
throw std::logic_error("lmem memory doesnt staisfy alignment");
}
};

#include <iostream>
int main()
{
try{
typedef double type;
char c;
lmem<type,sizeof(type)> x;
void * p = x.loadptr() ;
}
catch(std::exception & e)
{
std::cout << e.what() <<'\n';
}
}
/* output:
lmem memory doesnt staisfy alignment

*/
regards
Andy Little

Feb 28 '07 #7

kwikius wrote:
>
IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)
No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.

--
Ian Collins.
Feb 28 '07 #8

On 28 Feb, 02:47, Ian Collins <ian-n...@hotmail.com> wrote:
kwikius wrote:
IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)

No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.
Yep. I figured it out eventually, I think.
It seems to be possible but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.

Anyway, the following seems to work:

#include <cassert>
#include <cstddef>

template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

struct my{
int x, y;
double z;
my(int xx, int yy, double zz): x(xx),y(yy),z(zz){}
};

#include <iostream>
int main()
{

// muck about with stack offsets
char c = '\n';
short n = 1;
lmem<double,sizeof(double)> x;
double * pd = new (x.loadptr()) double(1234.56789);
std::cout << *pd <<'\n';

lmem<my, sizeof(my)> y;
my * pc = new (y.loadptr()) my(1,2,3);

std::cout << pc->x << ' ' << pc->y << ' ' << pc->z <<'\n';

}

regards
Andy Little


Feb 28 '07 #9

kwikius wrote:
On 28 Feb, 02:47, Ian Collins <ian-n...@hotmail.com> wrote:
>>kwikius wrote:

>>>IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)

No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.


Yep. I figured it out eventually, I think.
It seems to be possible but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.

Anyway the following seems to work :
template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};
Or even:

template <typename T>
class lmem {
T t;
public:
void* loadptr() {
return &t;
}
};

:)
--
Ian Collins.
Feb 28 '07 #10

On 28 Feb, 03:28, Ian Collins <ian-n...@hotmail.com> wrote:
kwikius wrote:
On 28 Feb, 02:47, Ian Collins <ian-n...@hotmail.com> wrote:
>kwikius wrote:
>>IMO, because the array is wrapped in a class there shouldn't be a
problem. IOW the class alignment will take care of the issue. However
it should be possible to check. (Not absolutely sure if this is the
correct solution, but whatever...)
>No, the class may be aligned according to the alignment requirements of
its members, in this case unsigned char.
Yep. I figured it out eventually, I think.
It seems to be possible but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.
Anyway the following seems to work :
template<typename T, size_t T_sz>
class lmem {
// Don't think there is a way to avoid
// allocating extra stack space ...
unsigned char m_buf[T_sz + alignment_of<T>::value -1];
public:
void* loadptr() {
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

Or even:

template <typename T>
class lmem {
T t;
public:
void* loadptr() {
return &t;
}

};
But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...

#include <cassert>
#include <cstddef>

template<size_t T_sz>
class lmem {

unsigned char m_buf[T_sz];
public:
template <typename T>
void* loadptr() {
assert( T_sz >= sizeof(T) + alignment_of<T>::value -1);
// align memory to T
ptrdiff_t dummy = m_buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = dummy % alignment_of<T>::value;
ptrdiff_t result = offset == 0
? dummy
: dummy + alignment_of<T>::value - offset;
// check this works
assert( result % alignment_of<T>::value == 0);
return static_cast<unsigned char*>(0) + result;
}
};

struct my{
int x, y;
double z;
my(int xx, int yy, double zz): x(xx),y(yy),z(zz){}
};

#include <iostream>
int main()
{

// muck about with stack offsets
char c = '\n';
short n = 1;
lmem<1000> x;
double * pd = new (x.loadptr<double>()) double(1234.56789);
std::cout << *pd <<'\n';
// add some destroy function which discriminates pods etc
my * pc = new (x.loadptr<my>()) my(1,2,3);
std::cout << pc->x << ' ' << pc->y << ' ' << pc->z <<'\n';
// use some destroy() for my dtor
}

regards
Andy Little
Feb 28 '07 #11

Ian Collins wrote:
Default User wrote:
Ian Collins wrote:
C++ purists avoid printf. Never mix it with iostreams.
Bah. Pointlessly dogmatic. Sometimes it's the right thing.
Care to cite an example?
Any time you need a compact statement for formatting output from
multiple variables. All that screwing around with setw and setprecision
and whatnot is ridiculous.

Just because dumbasses sometimes fail to use printf() in a correct
manner doesn't mean that those of us who aren't dumbasses should not
use it.

Yeah, yeah, I know Boost has some sort of formatted output thing, but
some of us can't use it. I have no idea what its status is vis-a-vis
the standard.

Brian
Feb 28 '07 #12

Default User wrote:
Ian Collins wrote:

>>Default User wrote:
>>>Ian Collins wrote:
C++ purists avoid printf. Never mix it with iostreams.

Bah. Pointlessly dogmatic. Sometimes it's the right thing.

Care to cite an example?


Any time you need a compact statement for formatting output from
multiple variables. All that screwing around with setw and setprecision
and whatnot is ridiculous.
Then use sprintf.

I've seen real problems where printf and iostreams were used on the same
stream (stdout).
Just because dumbasses sometimes fail to use printf() in a correct
manner doesn't mean that those of us who aren't dumbasses should not
use it.
I wasn't saying don't use it, I was saying don't mix.

--
Ian Collins.
Feb 28 '07 #13

Ian Collins wrote:
>
I've seen real problems where printf and iostreams were used on the same
stream (stdout).
There shouldn't be any problems with a standard-conforming library,
unless the program does something stupid. cout and stdout are
synchronized, so output comes out just the way you'd expect.
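A minimal sketch of the guarantee Pete describes: with the default synchronization in effect (std::ios_base::sync_with_stdio is on), printf and cout output on stdout interleaves in program order:

```cpp
#include <cstdio>
#include <iostream>

int main() {
    // By default cout is synchronized with stdout, so on a conforming
    // implementation this mixed output appears in program order.
    std::printf("one ");
    std::cout << "two ";
    std::printf("three\n");   // prints: one two three
    return 0;
}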

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Feb 28 '07 #14

Pete Becker wrote:
Ian Collins wrote:
>>
I've seen real problems where printf and iostreams were used on the same
stream (stdout).

There shouldn't be any problems with a standard-conforming library,
unless the program does something stupid. cout and stdout are
synchronized, so output comes out just the way you'd expect.
The problems I had were with threaded code where the access guards for
the streams got in a mess (IIRC there were different guards for
iostreams and stdio streams). Anyway, this was in pre-standard days.

--
Ian Collins.
Feb 28 '07 #15

"kwikius" <an**@servocomm.freeserve.co.ukwrote in message
news:11**********************@m58g2000cwm.googlegr oups.com...
[...]
Yep. I figured it out eventually, I think.
It seems to be possible but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.
[...]
But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...
[...]

Indeed it is. I believe that I could make use of the following function
'ac_malloc_aligned':

http://appcore.home.comcast.net/appc...appcore_c.html
(2nd to last function in the file...)

to get the alignment correct. You're correct in that I have to endure a
penalty of an "over-allocation" in order to get the alignment correct. I was
hoping to align the "main buffer" to a multiple of the size of an
architecture-specific L2 cache-line and then align it on an L2 cache-line
boundary. I could then start to allocate the individual buffered chunks from
there... Well, I will post some more source code in a day or two... So far,
it should still be in the realm of standard C++... However, once I get this
first phase out of the way... well, the code is going to get HIGHLY
architecture specific, as the so-called "critical" parts of my memory
allocator algorithm wrt the interlocked RMW instructions and memory barriers
are packed away into ia32 and SPARC assembly language.
Mar 1 '07 #16

[...]Indeed it is. I believe that I could make use of the following
function
'ac_malloc_aligned':
Minus the explicit heap allocation of course!

;^)
Mar 1 '07 #17

On 28 Feb, 23:59, "Chris Thomasson" <cris...@comcast.net> wrote:

I have been lazily following your thread-related stuff. One day I may
be able to make use of it, though currently it all looks pretty opaque,
I'm afraid.

On this subject I had hoped that I could use boost::shared_ptr for a
GUI smart_ptr class but unfortunately it doesn't work in that
situation, so I have been forced to roll my own.

Here is some (slightly confused) discussion about it:

http://tinyurl.com/3debeh

I have been sticking to the single-threaded version and probably will for
the time being, but if I get onto anything more substantial it will
need to work with some form of concurrency, so at (if I get to) that
point I will certainly be interested...

regards
Andy Little
Mar 1 '07 #18

"Chris Thomasson" <cr*****@comcast.netwrote in message
news:Xf******************************@comcast.com. ..
"kwikius" <an**@servocomm.freeserve.co.ukwrote in message
news:11**********************@m58g2000cwm.googlegr oups.com...
[...]
>Yep. I figured it out eventually, I think.
It seems to be possible but at the expense of always allocating your
char array oversize by alignment_of<T> - 1. It's not possible to know
where on the stack the lmem object will go.
[...]
>But consider a variant, and you are smokin', treating stack like heap
with no alloc overhead...

This may be the purpose behind the device ...
[...]

Indeed it is. I believe that I could make use of the following function
'ac_malloc_aligned' :

http://appcore.home.comcast.net/appc...appcore_c.html
(2nd to last function in the file...)
Okay. I was thinking of something kind of like:
<pseudo-code/sketch>
---------

#include <cstdio>
#include <cstddef>
#include <cassert>
#include <new>
template<size_t T_basesz, size_t T_metasz, size_t T_basealign>
class lmem {
unsigned char m_basebuf[T_basesz + T_metasz + T_basealign - 1];
unsigned char *m_alignbuf;

private:
static unsigned char* alignptr(unsigned char *buf, size_t alignsz) {
ptrdiff_t base = buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = base % alignsz;
ptrdiff_t result = (! offset) ? base : base + alignsz - offset;
assert(!(result % alignsz));
return static_cast<unsigned char*>(0) + result;
}

public:
lmem() : m_alignbuf(alignptr(m_basebuf, T_basealign)) {}
template<typename T>
void* loadptr() const {
assert(T_basesz >= (sizeof(T) * 2) - 1);
return alignptr(m_alignbuf, sizeof(T));
}
void* loadmetaptr() const {
return m_alignbuf + T_basesz;
}
};
namespace detail {
namespace os {
namespace cfg {
enum config_e {
PAGE_SZ = 8192
};
}}

namespace arch {
namespace cfg {
enum config_e {
L2_CACHELINE_SZ = 64
};
}}

namespace lheap {
namespace cfg {
enum config_e {
BUF_SZ = os::cfg::PAGE_SZ * 2,
BUF_ALIGN_SZ = arch::cfg::L2_CACHELINE_SZ,
BUF_METADATA_SZ = sizeof(void*)
};
}}
}
template<typename T>
class autoptr_calldtor {
T *m_ptr;
public:
autoptr_calldtor(T *ptr) : m_ptr(ptr) {}
~autoptr_calldtor() {
if (m_ptr) { m_ptr->~T(); }
}
T* loadptr() const {
return m_ptr;
}
};

namespace lheap {
using namespace detail::lheap;
}

class foo {
public:
foo() { printf("(%p)foo::foo()", (void*)this); }
~foo() { printf("(%p)foo::~foo()", (void*)this); }
};

int main() {
{
lmem<lheap::cfg::BUF_SZ,
lheap::cfg::BUF_METADATA_SZ,
lheap::cfg::BUF_ALIGN_SZ> foomem;

autoptr_calldtor<foo> f(new (foomem.loadptr<foo>()) foo);
}

printf("\npress any key to exit...\n"); getchar();
return 0;
}

The lmem object is meant to be a barebones low-level buffer object in the
"system-code" part of the C++ memory allocator library I am currently
developing. Basically, I am going for a fairly thin wrapper over the
allocator pseudo-code I posted; you can follow the link to the invention to
look at it. Humm... As you can probably clearly see by now, I am a hard-core
C programmer and I must admit that my C++ coding skills can be improved
upon... So, any ideas for interface designs, or even system-level design,
are welcome...

I was thinking about using a single lmem object per-thread and then
subsequently using it to allocate all of the per-thread data-structures my
allocator algorithm relies upon. So, essentially, every single
data-structure that makes up my multi-threaded allocator design can be based
entirely in the stacks of a plurality of threads. Wow, this has the
potential to have simply excellent scalability and performance
characteristics; anyway... ;^)

So, since lmem is all I "really" need and I don't want to post any of the
implementation details wrt lock-free algorithms, etc... what else can I
discuss here that's on topic... I am going to need to finally decide on
exactly how I will be laying out the per-thread structures in the buffer
managed by lmem...

Then I need to think about how I am going to ensure that the threads'
stacks don't go away when any of their allocator structures are in use by
any other thread. The following technique currently works fine:

<pseudo-code>


class mylib_thread {
// ...
public:
~mylib_thread() {
/*
special atomic-decrement-and-wait function;
off-topic, not shown here...
we can discuss the lock-free aspects of my algorithm
over on comp.programming.threads...
*/
}
};

user_thread_entry(mylib_thread &_this) {
// user application code
}

libsys_thread_entry(...) {
// library system code

lmem<lheap::cfg::BUF_SZ,
lheap::cfg::BUF_METADATA_SZ,
lheap::cfg::BUF_ALIGN_SZ> _thismem;

autoptr_calldtor<mylib_thread> _this(new (_thismem.loadptr<mylib_thread>())
mylib_thread);

user_thread_entry(*_this.loadptr());
}


Any thoughts?
Mar 2 '07 #19

On 2 Mar, 11:48, "Chris Thomasson" <cris...@comcast.net> wrote:

<...>
Any thoughts?
First, I don't know enough about threads to make any constructive
comments. OTOH give me access to the system timer and program counter
and the ability to disable interrupts and allow me to write interrupt
service routines, in fact control of the system event mechanisms, and I
would be happy. OTOH I guess I should head over to the threads
newsgroup and try to understand them better. Anyway here are some
thoughts (in contradiction to the above), though I haven't studied your
code in any depth:

Firstly, I don't understand from that how you want to use the
device, but I would guess it would be restricted to specialised use;
however a few (confused) thoughts spring to mind. The first is that in
my environment the stack is actually quite a scarce resource, default
around 1 MB (in VC7.1), after which you get a (non-C++) stack overflow
in VC7.1. I presume you can modify this though.

The heap on the other hand can be looked on as an (almost) infinite
resource; even if you run out of physical memory the system will start
swapping memory to and from disk. Very system specific though, I guess.

In that sense I would guess that use of the stack is by nature not as
scalable as using the heap.

So from that point of view it is interesting to try to come up with
scenarios where you would use the device.
From the viewpoint of allocating on the stack, essentially you have
to know how much you are going to allocate beforehand, but if the
stack is a scarce resource, it's not viable to treat it in the same
carefree way as the heap proper and just allocate huge. Of course it
may be possible to use some assembler to write your own functions
grabbing runtime amounts of stack, which would maybe make the device
more versatile.

For use as an allocator, the alternative is of course to use malloc
for your one-time start-up allocation for your own allocator, and then
after the start-up cost, whether you allocated on the heap or stack I
would guess the cost of getting memory from the allocator is going to
be the same regardless of where it is.

Therefore the main potential use of such a device would be where you
know the size of a fixed-size allocation, but don't know what types you
want to put in it, and where you are doing your start-up allocation
frequently. This may be in some situation where you actually don't want
to use the function call mechanism for some reason, IOW using the
scheme as a sort of scratch space where you are modifying the types in
the scratch space dependent on what you are doing. This does somehow
bring to mind use in embedded systems where there is no heap as such,
so the scheme could be used as a heap for systems without a heap, as it
were. You would then presumably need to keep passing a reference to
the allocator in to child functions or put it in a global variable.

Overall then it's difficult to know where the advantages outweigh the
difficulties.

regards
Andy Little





Mar 3 '07 #20

"kwikius" <an**@servocomm.freeserve.co.ukwrote in message
news:11*********************@n33g2000cwc.googlegro ups.com...
On 2 Mar, 11:48, "Chris Thomasson" <cris...@comcast.net> wrote:

<...>
>Any thoughts?

First I don't know enough about threads to make any constructive
comments.
<an "on the soap box" like statement>
I know that you know more than you think you do... The steps that are
necessary in order to realize a constructive use of concurrency in general
eventually end up being a strict lesson in common sense... Well, of course
you, and "virtually" all of us, are already "blessed" with some form of
ingrained common sense. So, do you have what it takes to make interesting
and good uses of multithreading techniques? IMHO, of course you do!

Never sell yourself short wrt any aspect of what you can and cannot
learn!... :O
</an "on the soap box" like statement>

;^)

OTOH give me access to the system timer and program counter
and the ability to disable interrupts and allow me to write interrupt
service routines, in fact control of the system event mechanisms, and I
would be happy.
Yup. I remember the good old days when I was implementing the early stages
of operating system loading processes. Kind of gives you a GOD complex
though. I mean, you get to fill in all of the interrupt vectors with your
stuff, and you have access to those wonderful hardware control words. Good
times!

OTOH I guess I should head over to the threads
newsgroup and try to understand them better.
We would be happy to help you. That group needs some more traffic anyway!
Seems like it's sort of "dead" at times... So, if you end up hanging out some
more on comp.programming.threads, go ahead and tell some of your friends
about it! BTW, David Butenhof hangs around there sometimes... Indeed you can
have some of the world's best threading gurus give a little introspection on
any question you may have. Okay, enough with the "salesman" crap! :O Sorry.

Anyway here are some
thoughts (in contradiction to the above), though I haven't studied your
code in any depth:
Okay.

Firstly, I don't understand from that how you want to use the
device, but I would guess it would be restricted to specialised use;
however a few (confused) thoughts spring to mind.
First of all, let me provide a "quick introduction" on all of my subsequent
comments:

"I want the device to be able to provide the end user with the exact same
functionality as the following ANSI C functions do:

extern "C" {
extern void* malloc(size_t);
extern void free(void*);
}

with the following exception: malloc and free are going to be 100%
thread-safe. The low-level synchronization scheme is going to exist in
per-thread data-structures. Any allocations that cannot be safely and
efficiently addressed by the per-thread scheme (e.g., the "local heap") will
be subjected to an allocation from the "global heap". In other words, if
allocations are sufficiently small in size (e.g., sizeof(void*) to 256 bytes
are fine), then all aspects of the device will be completely utilized. The
device's "global heaps" will be invoked for everything else.

The device is able to accommodate highly efficient threading designs. For
instance, if your threads keep their allocations local, then the device will
not use any multiprocessor synchronization techniques at all. It will be
equivalent to a single-threaded allocation. The device WILL make use of an
interlocked RMW instruction and a memory barrier (e.g., CAS and SPARC's
membar #StoreStore) ONLY IF your threads can "pass allocations around
amongst themselves", AND, "the thread that allocated an object 'A' winds up
not being the thread that eventually frees object 'A'"
"

The first is that in
my environment the stack is actually quite a scarce resource, default
around 1 MB (in VC7.1), after which you get a (non-C++) stack overflow
in VC7.1. I presume you can modify this though.
Yes. Most threading abstractions allow one to set up a thread's individual
stack size. Actually, and IMHO of course, a "lot of programs" don't
really end up making "complete" use of that 1 MB stack... Also, let's try to
keep in mind that virtually any so-called "competent" system that makes use
of some form of recursion has to have a method for "ensuring that there is
enough space on the stack to accommodate the complete depth of the recursive
operation".

The heap on the other hand can be looked on as an (almost) infinite
resource; even if you run out of physical memory the system will start
swapping memory to and from disk. Very system specific though, I guess.
Indeed it is system specific. However, you raise a critical point. I address
this scenario by funneling all allocations that the per-thread part of my
allocator implementation cannot address through an abstraction of the OS
heap. The abstraction can be as simple as a call to the OS-provided standard
C library malloc function. My stuff essentially sits on top of the threads'
stacks and uses malloc to keep it from falling over when it runs out of
memory during the "high-stress" situations that a multi-threaded user
application can sometimes, and usually will, end up generating.

In that sense I would guess that use of the stack is by nature not as
scalable as using the heap.
Which is exactly why I am forced to use the heap as a slow-path in my
current implementation of the allocator algorithm.

So from that point of view it is interesting to try to come up with
scenarios where you would use the device.
Thanks! IMHO, the allocator works perfectly when you try to make any
deallocations of a memory block 'X' occur in the same thread that originally
allocated block 'X' in the first place. IMHO, I can sort of "get away" with
using the phrase "works perfectly" simply because the allocator endures no
more overhead than a single-threaded allocator would when an application
tries real hard to keep things "thread local". My invention will use an
interlocked RMW instruction and a #StoreStore memory barrier to "penalize
any thread that tries to deallocate a memory block that it did not allocate
itself!". Everything is lock-free, however, as we all should know,
interlocked RMW and/or memory barrier instructions are fairly expensive for
basically any modern processor to execute. They tend to have the unfortunate
side-effect of blowing any cache the CPU has accumulated and devastating
part of any pipelined operations that were pending. This type of behavior
can gravely wound a number of aspects that are involved with scalability and
throughput in general.

>From the viewpoint of allocating on the stack, essentially you have
to know how much you are going to allocate beforehand, but if the
stack is a scarce resource, it's not viable to treat it in the same
carefree way as the heap proper and just allocate huge.
may be possible to use some assembler to write your own functions
grabbing runtime amounts of stack, which would maybe make the device
more versatile.
Perhaps. I am leaning toward keeping things local, and using a thin API
layer around the OS heap as a sort of "last resort".

For use as an allocator, the alternative is of course to use malloc
for your one time start up allocation for your own allocator, and then
after the start up cost whether you allocated on the heap or stack I
would guess the cost of getting memory from the allocator is going to
be the same regardless where it is.
Great point. Perhaps I am being a bit impractical wrt my line of thought
that most programs don't make use of the fairly large default stack size the
OS ends up handing out to a process's threads. Anyway, I kind
of like the idea of not having to make use of malloc when you have perfectly
good, and most likely unused, stack space sitting around on your
applications threads. The synchronization scheme I created allows for a
thread 'A' to allocate a memory block 'X' and subsequently pass it around to
threads 'B' through 'Z'. So, thread 'B' can use block 'X' even though the
memory that makes up block 'X' resides on thread 'A's stack. Humm... Your
solution to the problem (e.g., use malloc to provide the allocators
per-thread data-structures) is well grounded in common sense. Like I said,
perhaps I am being a bit impractical here... Well, screw it! I think it's
neat that an allocator can use a plurality of threads' stacks for most of
its allocations.

;^)

[...]
This does somehow
bring to mind use in embedded systems where there is no heap as such
so the scheme could be used as a heap for systems without a heap as it
were.
Ahh yes. That sure seems like it could possibly come in handy in that type
of situation! Indeed. Thanks for your comments!

:^)

You would then presumably need to keep passing a reference to
the allocator in to child functions or put it in a global variable.
My allocator's system data-structures rely on a given platform to provide
each of its threads with a stack space large enough to hold at least 4 to 8
kilobytes, a function that is analogous to pthread_get/setspecific(...)
and/or pthread_self(), an interlocked RMW (e.g., CAS or even LL/SC), and of
course (e.g., assuming you looked at the pseudo-code implementation), a
#StoreStore memory barrier instruction. Luckily for me, CAS and LL/SC are
fairly common, and if they're not there, well, they can certainly be emulated
with a hashed locking pattern. If the platform has no interlocked RMW
instructions, or mutexes, then the multi-threaded aspect of my allocator
cannot be realized.
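
For what it's worth, that "hashed locking pattern" fallback usually looks something like the following sketch (names and sizes are mine, and I use std::mutex for brevity): guard each memory location with one of N mutexes chosen by hashing the location's address, then do the compare-and-swap under that lock.

```cpp
#include <mutex>
#include <cstdint>
#include <cstddef>

// Illustrative CAS emulation for platforms with locks but no native
// interlocked RMW instruction.
static const std::size_t LOCK_COUNT = 64;   // power of two
static std::mutex g_locks[LOCK_COUNT];

static std::mutex& lock_for(void* addr) {
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(addr);
    // shift off low alignment bits so nearby slots spread across locks
    return g_locks[(p >> 4) & (LOCK_COUNT - 1)];
}

// emulated compare-and-swap on a plain pointer slot
bool emulated_cas(void** slot, void* expected, void* desired) {
    std::lock_guard<std::mutex> guard(lock_for(slot));
    if (*slot != expected) { return false; }
    *slot = desired;
    return true;
}
```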

Overall then its difficult to know where the advantages outweigh the
difficulties.
Hopefully, I cleared some things up. What do ya think Andy?

:^)
Mar 3 '07 #21

P: n/a
Okay... Here is a snippet of some compilable example code:
<code>

#include <cstdio>
#include <cstddef>
#include <cassert>
#include <new>
template<size_t T_basesz, size_t T_metasz, size_t T_basealign>
class lmem {
unsigned char m_basebuf[T_basesz + T_metasz + T_basealign - 1];
unsigned char *m_alignbuf;

private:
static unsigned char* alignptr(unsigned char *buf, size_t alignsz) {
ptrdiff_t base = buf - static_cast<unsigned char*>(0);
ptrdiff_t offset = base % alignsz;
ptrdiff_t result = (! offset) ? base : base + alignsz - offset;
assert(! (result % alignsz));
return static_cast<unsigned char*>(0) + result;
}

public:
lmem() : m_alignbuf(alignptr(m_basebuf, T_basealign)) {
printf("(%p)lmem::lmem()\n - buffer size: %u\n - m_basebuf(%p)\n - m_alignbuf(%p)\n\n",
(void*)this,
(unsigned)(T_basesz + T_metasz + T_basealign - 1),
(void*)m_basebuf,
(void*)m_alignbuf);
}
template<typename T>
void* loadptr() const {
assert(T_basesz >= (sizeof(T) * 2) - 1);
printf("(%p)lmem::loadptr() - buffer size: %u\n",
(void*)this,
(unsigned)((sizeof(T) * 2) - 1));
return alignptr(m_alignbuf, sizeof(T));
}
void* loadmetaptr() const {
return m_alignbuf + T_basesz;
}
};
namespace detail {
namespace os {
namespace cfg {
enum config_e {
PAGE_SZ = 8192
};
}}

namespace arch {
namespace cfg {
enum config_e {
L2_CACHELINE_SZ = 128
};
}}

namespace lheap {
namespace cfg {
enum config_e {
BUF_SZ = os::cfg::PAGE_SZ * 2,
BUF_ALIGN_SZ = arch::cfg::L2_CACHELINE_SZ,
BUF_METADATA_SZ = sizeof(void*)
};
}}
}
template<typename T>
class autoptr_calldtor {
T *m_ptr;
public:
autoptr_calldtor(T *ptr) : m_ptr(ptr) {}
~autoptr_calldtor() {
if (m_ptr) { m_ptr->~T(); }
}
T* loadptr() const {
return m_ptr;
}
};
namespace lheap {
using namespace detail::lheap;
}
class foo1 {
// mess with the stack
int m_1;
char m_2[73];
short m_3;
char m_4[18];
public:
foo1() { printf("(%p)foo1::foo1()\n", (void*)this); }
~foo1() { printf("(%p)foo1::~foo1()\n\n", (void*)this); }
};
class foo2 {
// mess with the stack
int m_1;
char m_2[111];
short m_3;
char m_4[222];
public:
foo2() { printf("(%p)foo2::foo2()\n", (void*)this); }
~foo2() { printf("(%p)foo2::~foo2()\n\n", (void*)this); }
};
int main() {
// mess with the stack
int m_1;
char m_2[73];
short m_3;
char m_4[18];

{
// setup this threads allocator
lmem<lheap::cfg::BUF_SZ,
lheap::cfg::BUF_METADATA_SZ,
lheap::cfg::BUF_ALIGN_SZ> foomem;

{
// mess with the stack
int m_1;
char m_2[142];
short m_3;
char m_4[188];

autoptr_calldtor<foo1> f(new (foomem.loadptr<foo1>()) foo1);
}

{
// mess with the stack
int m_1;
char m_2[1];
short m_3;
char m_4[3];

autoptr_calldtor<foo2> f(new (foomem.loadptr<foo2>()) foo2);
}
}

printf("\n_________\npress any key to exit...\n");
getchar();
return 0;
}

</code>

Can anybody run this without tripping an assertion? It seems to be running
fine on my systems...
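
As an aside, the round-up that alignptr() does above can also be written with uintptr_t arithmetic, which avoids the pointer subtraction from a null pointer. This variant is just a sketch and assumes the alignment is a power of two (which holds for the cache-line case, though not necessarily for the sizeof(T) case in loadptr()):

```cpp
#include <cstdint>
#include <cstddef>

// Round p up to the next multiple of alignsz (alignsz must be a power of two).
unsigned char* align_up(unsigned char* p, std::size_t alignsz) {
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t aligned = (v + alignsz - 1) & ~(std::uintptr_t)(alignsz - 1);
    return p + (aligned - v); // advance by the computed padding
}
```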

;^)
P.S.

This should compile fine with the following options:

-Wall -pedantic -ansi

If you have any problems, let me know.
Mar 3 '07 #22

P: n/a
On 3 Mar, 09:07, "Chris Thomasson" <cris...@comcast.net> wrote:
Hopefully, I cleared some things up. What do ya think Andy?
I think I'd better pull back out of discussions re threads and leave
it to the experts :-)

regards
Andy Little
Mar 5 '07 #23
