Bytes | Software Development & Data Engineering Community

Effective streaming questions

Hi there!

I've got a few (more generic) questions for those C++ experts out
there.

1. I am trying to store any kind of data (custom PODs, numbers,
strings etc.) within some kind of stream, which consists of an unsigned
char* array that gets resized whenever needed. Writing and reading are
done with a simple memcpy(). Is this the most efficient method,
considering that a single stream might include up to 10 different data
values and that every read/write (which happens very often, especially
the reading) does such a memcpy? Or is there a better, more general
approach? I am talking specifically about a rendering-tree storage
structure for 2D graphics.

2. What's the fastest and most efficient way to compress / uncompress
the mentioned streams in memory?

3. I am using a custom stream implementation for reading which can
also do Unicode conversion on the fly. The problem is this: I am using
a virtual base class which defines a virtual ReadWChar() function, and
I call this function for each character of the whole stream, which can
result in thousands of calls at a time. Reading itself is not much of
a problem, since I am using a buffer underneath, but I am more
concerned about the overhead of the function calls. Any thoughts on that?

4. What's the general performance overhead of virtual classes /
functions? To be honest, I am already scared of declaring a virtual
destructor, as I fear that all of the class's function calls will then
perform worse, or is that simply not true?

5. Are there any up-to-date performance documents out there giving
some insight into what you should take care of from the beginning in
modern C++ programming?

thanks!
Alex
Sep 3 '08 #1
On Sep 3, 1:29 am, Alexander Adam <cont...@emiasys.com> wrote:
[...]
Remove the word "efficient" from your vocabulary. Remember Knuth's
Law: "Premature Optimization is the Root of All Evil". It's a hell of
a lot easier to make a correct program fast, than to make a fast,
incorrect program correct.

Sep 3 '08 #2
Alexander Adam <co*****@emiasys.com> writes:
[...]
If you need the last drop of performance and processor efficiency, C++
is not the language for you. Use assembler, and fine-tune every
instruction!

--
__Pascal Bourguignon__
Sep 3 '08 #3
I never noticed virtual calls becoming a problem until I was doing
millions of operations in a tight loop. Write it as fast as you can,
then see where it bottlenecks, and improve the bottleneck. Keep
track of time improvements and you'll get a feel for when you are
wasting time by optimizing. Main thing is: don't concat strings :)

Sep 3 '08 #4
Don't concat strings in a big loop, I should say.
Sep 3 '08 #5
On Sep 3, 5:13 pm, red floyd <redfl...@gmail.com> wrote:
On Sep 3, 1:29 am, Alexander Adam <cont...@emiasys.com> wrote:
[...]
Remove the word "efficient" from your vocabulary.
No. Just apply it where it makes the most sense, cost-wise.
Programmer efficiency is critical, for example. Program
efficiency is generally very relative, and depends on what the
program will be used for.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 4 '08 #6
On Sep 3, 10:29 am, Alexander Adam <cont...@emiasys.com> wrote:
I've got a few (more generic) questions for those C++ experts
out there.
1. I am trying to store any kind of data (custom PODs, numbers,
strings etc.) within some kind of stream, which consists of an unsigned
char* array that gets resized whenever needed. Writing and reading are
done with a simple memcpy(). Is this the most efficient method,
considering that a single stream might include up to 10 different data
values and that every read/write (which happens very often, especially
the reading) does such a memcpy? Or is there a better, more general
approach? I am talking specifically about a rendering-tree storage
structure for 2D graphics.
I'm not too sure what you mean with regards to streaming here.
What is the purpose of using an unsigned char*, except if you're
going to write and read it to some external source (disk or
network)? And in that case, memcpy doesn't work; there's
absolutely no guarantee that another program can read what
you've written.
2. What's the fastest and most efficient way to compress /
uncompress the mentioned streams in memory?
Using a fast and efficient compression algorithm (which may
depend on the actual data types).

Note that if you're doing any serious compression downstream,
the time it takes to serialize (your point 1, above) will be
negligible. You might even want to consider a text format.
(The one time I did this, I output a highly redundant text
format, piped to gzip.)
3. I am using a custom stream implementation for reading which can
also do Unicode conversion on the fly. The problem is this: I am using
a virtual base class which defines a virtual ReadWChar() function, and
I call this function for each character of the whole stream, which can
result in thousands of calls at a time. Reading itself is not much of
a problem, since I am using a buffer underneath, but I am more
concerned about the overhead of the function calls. Any thoughts on that?
It depends on the system and the compiler. I do this all the
time, but I suspect that it could make a measurable difference
on some machines, in some specific cases, but not very often.

Note too that some (regrettably very few) compilers will use
profiling data to eliminate critical virtual function calls,
replacing them with inline versions of the function if they find
that the same function is actually called most of the time.
4. What's the general performance overhead of virtual classes /
functions?
Compared to what? On the machines I usually use (Intel, AMD and
Sparc), an implementation using virtual functions is no more
expensive than one using switches or other mechanisms, but from
what I understand, this isn't true on all machines.
To be honest, I am already scared of declaring a virtual
destructor, as I fear that all of the class's function calls will
then perform worse, or is that simply not true?
That's simply not true. Declaring the destructor virtual may
have a very small impact on destructing the object (but
typically not measurable), but I know of no compiler where it
will have any impact on anything else.
5. Are there any up-to-date performance documents out there giving
some insight into what you should take care of from the beginning
in modern C++ programming?
Probably, but... performance depends enormously on the
individual machine and compiler. What's true for one
configuration won't be true for another. From experience, the
most important thing you can do to ensure sufficient performance
is to enforce rigorous encapsulation. That way, once you've
profiled, and found what has to be changed in your code, for
your configuration, you can change it easily, without having to
rewrite the entire program.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 4 '08 #7
In article <7c************@pbourguignon.anevia.com>,
pj*@informatimago.com says...

[ ... ]
If you need the last drop of performance and processor efficiency, C++
is not the language for you. Use assembler, and fine-tune every
instruction!
It's interesting how the people who give advice like this are _never_
the ones who hang out in any of the newsgroups where assembly
language is topical, and who show that they really know assembly
language well.

In reality, somebody who knows what s/he's doing can predict quite well
what a C or C++ (or Ada, Lisp, etc.) compiler is going to produce for
most input, and the majority of the time, the compiler produces about
the same code you'd write by hand. That being the case, writing all that
code by hand rarely accomplishes anything useful at all.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 7 '08 #8
On Sep 7, 4:19 pm, Jerry Coffin <jcof...@taeus.com> wrote:
In article <7ciqtdw8lm....@pbourguignon.anevia.com>,
p...@informatimago.com says...
[ ... ]
In reality, somebody who knows what s/he's doing can predict
quite well what a C or C++ (or Ada, Lisp, etc.) compiler is
going to produce for most input, and the majority of the time,
the compiler produces about the same code you'd write by hand.
With a really good compiler, that's not necessarily true. The
compiler will produce much better code than you could write by
hand. (Admittedly most C++ compilers aren't that good. Yet.)
That being the case, writing all that code by hand rarely
accomplishes anything useful at all.
Writing everything in assembler is always a waste of time.
Conceivably, if the profiler shows a problem in one tight loop,
and all other attempts fail to achieve enough performance,
rewriting that one loop in assembler might accomplish something,
assuming you know the machine very, very well.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 7 '08 #9
In article <7b2ad5ae-f0c8-40da-9ac8-f2aa1087e5f5
@f36g2000hsa.googlegroups.com>, ja*********@gmail.com says...

[ ... ]
With a really good compiler, that's not necessarily true. The
compiler will produce much better code than you could write by
hand. (Admittedly most C++ compilers aren't that good. Yet.)
Sadly, I've yet to see any compiler for any language for which that was
true. Most compilers do better than most people realize, but a decent
(not even great) assembly language programmer virtually never has any
difficulty doing better. OTOH, you're unlikely to get a _significant_
improvement except by knowing quite a bit and/or dealing with relatively
unusual situations.
That being the case, writing all that code by hand rarely
accomplishes anything useful at all.

Writing everything in assembler is always a waste of time.
I'd agree that it's true 99.9% (and maybe even add another '9' or two on
the end) of the time, but not quite always. An obvious example would be
a talking greeting card. Here the price of even a small amount of memory
is a significant percentage of the overall price, and there's little
enough product differentiation that one being even marginally less
expensive than another could lead to essentially shutting the second out
of the market entirely.

IOW, using a higher level language for even part of the final product
virtually guarantees failure. OTOH, I'm the first to admit that such
situations are _quite_ unusual, to put it mildly. For typical desktop
and/or server applications, the situation is entirely different.
Conceivably, if the profiler shows a problem in one tight loop,
and all other attempts fail to achieve enough performance,
rewriting that one loop in assembler might accomplish something,
assuming you know the machine very, very well.
I've written enough assembly language to be able to say with reasonable
certainty that the situation's a bit more positive than that -- if you
know the assembly language _reasonably_ well, and find a justification
for doing so, you can pretty much guarantee doing better than any
compiler around today.

You just need to be aware of what you're getting into: not only is it
extra work and produces fragile output, but it tends to be obsolete a
_lot_ sooner than something written in a higher level language. A minor
change in microarchitecture can render your careful optimizations
entirely obsolete without a second thought!

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 7 '08 #10
On Sep 8, 1:17 am, Jerry Coffin <jcof...@taeus.com> wrote:
In article <7b2ad5ae-f0c8-40da-9ac8-f2aa1087e5f5
@f36g2000hsa.googlegroups.com>, james.ka...@gmail.com says...
[ ... ]
With a really good compiler, that's not necessarily true. The
compiler will produce much better code than you could write by
hand. (Admittedly most C++ compilers aren't that good. Yet.)
Sadly, I've yet to see any compiler for any language for which
that was true. Most compilers do better than most people
realize, but a decent (not even great) assembly language
programmer virtually never has any difficulty doing better.
Globally, or locally. I've actually used a Fortran compiler
that did better than assembler programmers globally. Locally,
of course, for some specific constructs, it could be beaten, but
an assembler programmer's memory (e.g. with regard to what's in
each register) isn't as good as that of the compiler.

I suspect that it depends on the architecture as well. A
machine with complicated timing constraints in the pipeline and
a lot of registers favors the compiler; a machine with few
registers and relatively few issues concerning the pipeline
(Intel?) favors the programmer.
OTOH, you're unlikely to get a _significant_ improvement
except by knowing quite a bit and/or dealing with relatively
unusual situations.
It's the dealing with unusual situations which really pays off.
If the machine supports decimal arithmetic at the hardware
level, for example, a few well placed lines of assembler will
make all the difference in a BigDecimal class.
That being the case, writing all that code by hand rarely
accomplishes anything useful at all.
Writing everything in assembler is always a waste of time.
I'd agree that it's true 99.9% (and maybe even add another '9'
or two on the end) of the time, but not quite always. An
obvious example would be a talking greeting card. Here the
price of even a small amount of memory is a significant
percentage of the overall price, and there's little enough
product differentiation that one being even marginally less
expensive than another could lead to essentially shutting the
second out of the market entirely.
Agreed. I'd forgotten about things like that.
IOW, using a higher level language for even part of the final
product virtually guarantees failure. OTOH, I'm the first to
admit that such situations are _quite_ unusual, to put it
mildly.
Maybe not that unusual, but in such cases, you generally don't
have the option of C++ anyway. (You might have C, but even
that's not really certain.)
For typical desktop and/or server applications, the
situation is entirely different.
Conceivably, if the profiler shows a problem in one tight
loop, and all other attempts fail to achieve enough
performance, rewriting that one loop in assembler might
accomplish something, assuming you know the machine very,
very well.
I've written enough assembly language to be able to say with
reasonable certainty that the situation's a bit more positive
than that -- if you know the assembly language _reasonably_
well, and find a justification for doing so, you can pretty
much guarantee doing better than any compiler around today.
For a single loop, perhaps, and even then, I'm not really
convinced. I know that I couldn't improve most of the loops
generated by the Fortran compiler on the Perkin-Elmer (formerly
Interdata) 8/32, and I knew the assembler and the processor well
enough to teach it, and to write drivers and such.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Sep 8 '08 #11
James Kanze wrote:
On Sep 7, 4:19 pm, Jerry Coffin <jcof...@taeus.com> wrote:
>In article <7ciqtdw8lm....@pbourguignon.anevia.com>,
p...@informatimago.com says...
[ ... ]
In reality, somebody who knows what s/he's doing can predict
quite well what a C or C++ (or Ada, Lisp, etc.) compiler is
going to produce for most input, and the majority of the time,
the compiler produces about the same code you'd write by hand.

With a really good compiler, that's not necessarily true. The
compiler will produce much better code than you could write by
hand. (Admittedly most C++ compilers aren't that good. Yet.)
There are a few pretty good assembly jockeys, mostly leftovers, who
would do as well if not better. There's no inherent reason a person
would be inferior to automation. But your point is certainly true in
the overwhelming majority of cases, and probably in all really large
applications. I agree the exceptions to your point are vanishingly few.
>That being the case, writing all that code by hand rarely
accomplishes anything useful at all.

Writing everything in assembler is always a waste of time.
Unless you're taking an assembly-writing class :O
Conceivably, if the profiler shows a problem in one tight loop,
and all other attempts fail to achieve enough performance,
rewriting that one loop in assembler might accomplish something,
assuming you know the machine very, very well.
Experience and the literature suggest that it's nearly always better
to find a better algorithm or data structure; basically, work smarter,
not harder.
Sep 8 '08 #12
