Alternatives to the C++ Standard Library?

Steven T. Hatton

Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not. AFAIK, most of Qt is compatable with the
Standard Library. That is, QLT can interoperate with STL, and you can
convert back and forth between std::string and Qt::QString, etc.

Are there any other libraries based on Standard C++, but not on the C++
Standard Library? What might these be? Why do people create these? Are
any significantly different from the C++ Standard Library? Is it good that
people create their own basic libraries instead of using the Standard
Library?
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #1

Subscribe Post Reply

4903

Donovan Rebbechi

On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:

Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not.
For the most part, it's not a very good thing, and if one were building these
toolkits from scratch today, they would be done differently (one would hope...)

Many of these toolkits predate the C++ '98 standard. So they built
similar infrastructure (actually, usually worse. Most pre-STL container
toolkits look pretty amateurish compared to STL)

One of the benefits of using a mature language is that there are all of these
toolkits to work with. But part of the price you pay is that there is a lot
of baggage as well -- you get things that are written to old standards, and
then have to be bent and twisted to work with new standards. There are
examples of this in the C++ language itself.
AFAIK, most of Qt is compatable with the
Standard Library. That is, QLT can interoperate with STL, and you can
convert back and forth between std::string and Qt::QString, etc.

Are there any other libraries based on Standard C++,

Qt is not really based on standard C++, since it predates that (the Qt
project started around 1996 or so)

So you could go and implement your new and improved toolkit that's integrated
with STL, or settle for the mature one that is awkward to use with STL. Or
you could look for a toolkit written by someone else post-standard that is
more STL friendly (but probably less mature than the pre-standard kits).
Choose your poison.

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

Jul 23 '05 #2

John Carson

"Steven T. Hatton" <ch********@germania.sup> wrote in message
news:ev********************@speakeasy.net

Now that I have a better grasp of the scope and capabilities of the
C++ Standard Library, I understand that products such as Qt actually
provide much of the same functionality through their own libraries.
I'm not sure if that's a good thing or not. AFAIK, most of Qt is
compatable with the Standard Library. That is, QLT can interoperate
with STL, and you can convert back and forth between std::string and
Qt::QString, etc.

Are there any other libraries based on Standard C++, but not on the
C++ Standard Library? What might these be? Why do people create
these? Are any significantly different from the C++ Standard
Library? Is it good that people create their own basic libraries
instead of using the Standard Library?

I think you will find that most of the libraries pre-date the standard
library (i.e., they date from pre-1998). MFC, which, like QT, pre-dates
standard C++ by many years, has its own string, array, list and map classes,
as does ATL (Active Template Library).

These classes linger on because of backward compatibility, because
programmers are familiar with them and don't want to change, and (sometimes)
because they embody different design decisions that are genuinely preferable
for some purposes.

--
John Carson

Jul 23 '05 #3

Steven T. Hatton

Donovan Rebbechi wrote:

On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:
Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not.
For the most part, it's not a very good thing, and if one were building
these toolkits from scratch today, they would be done differently (one
would hope...)

I've been using Qt based products since 1997 when the KDE was still in
alpha. I don't believe Trolltech have significant misgivings about the
core design of their libraries, and since they reinvent the blinkin' thing
about once every two years, whatever they would have done differently, has
been done differently.
Many of these toolkits predate the C++ '98 standard. So they built
similar infrastructure (actually, usually worse. Most pre-STL container
toolkits look pretty amateurish compared to STL)
I don't belive the QTL will fit that category. The functionality offered in
their API is arguably superior to that of the Standard, and the Standard
doesn't specify implementation.
One of the benefits of using a mature language is that there are all of
these toolkits to work with. But part of the price you pay is that there
is a lot of baggage as well -- you get things that are written to old
standards, and then have to be bent and twisted to work with new
standards. There are examples of this in the C++ language itself.
I've seen this happen with Java and Qt. The first release was dramatically
altered in the next major version. This broke a lot of the original
software. Fortunately for the Qt and Java developers, that break with the
past was worth the improved functionality. As Java and Qt became more
established, there was more concern with backward compatability. I have to
give Trolltech credit for their audacity in rolling out a major redesign in
Qt4 while including a full backward compatability subset.

AFAIK, most of Qt is compatable with the
Standard Library. That is, QLT can interoperate with STL, and you can
convert back and forth between std::string and Qt::QString, etc.

Are there any other libraries based on Standard C++,

Qt is not really based on standard C++, since it predates that (the Qt
project started around 1996 or so)

I disagree. There really is nothing in Qt that is not implemented using
Standard C++. Basically what they do is provide some language extensions
which are processed by the MOC to produce pure, Standard C++. I don't like
the amount of #MACRO magic they use, but the only times I've really had
problems with it are when I have collisions with Boost #MACRO magic. Since
boost is pretty much the playground of the Standard Committee, it's hard to
be too critical of Trolltech in that situation.
So you could go and implement your new and improved toolkit that's
integrated with STL, or settle for the mature one that is awkward to use
with STL. Or you could look for a toolkit written by someone else
post-standard that is more STL friendly (but probably less mature than the
pre-standard kits). Choose your poison.

Actually, the QTL containers are, AFAIK, fully compatable with STL
algorithms, and there is no reason that I can think of for not being able
to construct STL from QTL containers and vis-verse. The QTL containers
offer additional features such as implicit sharing, and "Java-style"
iterators. Hopefully, C++0X will adopt the new for() semantics supporting
forEach functionality. That will make Qt's current foreach #MACRO
redundant. For now, Qt offers that additional, and superior functionality.

As for QString verses std::string, the advantage std::string has is that
it's standard. Beyond that, I don't believe it even comes close to
QString, and that doesn't even address QStringList.

The downside I see with using QTL is that it is bound to Qt. As you might
have gathered, I'm a big fan of Qt. Nonetheless, I want my code to stand
on its own in areas where it should be toolkit agnostic.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #4

Dietmar Kuehl

Steven T. Hatton wrote:

Is it good
that people create their own basic libraries instead of using the Standard
Library?

C++ is very much about the freedom to do thing the way you want to
do them. If you don't like a feature, you would not use it but you
would use something different. Essentially, C++ is intended to
provide the infrastructure to create your own library and it tries
to get out of your way as much as possible. There are a few aspects
which are technically in the standard library which you can't avoid
entirely like operators new and delete (although you can even replace
these if you really want to), the RTTI stuff, or the basic exception
types thrown by language constructs (e.g. bad_cast) but most of the
stuff you can choose from alternatives.

Personally, I think this is the right way to go for a language: it
provides facilities which are reasonable to use (and actually superior
in many cases to alternatives although not always) and allow you to do
real work. However, providing the possibility to use alternatives
opens the door for new inventions. Actually, the intention behind the
relatively minimalist standard library in C++ is to open a market for
people providing advanced features as is done by several vendors or
organizations (for example, Qt, Tools++, Boost; see the Available C++
Libraries FAQ for a more complete list).

That said, I want to point out that the standardization committee did
something quite remarkable when they put together the standard C++
library: in an environment buzzing about object-orientation, they
realized that object-orientation is the wrong approach to many of the
real task in programming! Instead, all algorithmic work is better done
in a rather different form using other mechanisms than classes and
virtual functions. The result was the introduction of STL into the
standard C++ library (of course, the major credit for the creation of
STL and the theory behind it goes to Alexander Stepanov and David
Musser but the C++ committee had the courage to bring their avantgarde
product forward as part of the standard). Everybody creating a new
library is well-advised to understand why STL works that well and to
built on this experience! Of course, STL is not ideal (in fact, I have
a major problem with the basis STL is built upon: the iterator concept
should have been split into two distinct concepts, i.e. cursors and
property maps) but the approach taken (using techniques which now go
under the name "generic programming") is right on target.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #5

Calum Grant

Steven T. Hatton wrote:

Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not. AFAIK, most of Qt is compatable with the
Standard Library. That is, QLT can interoperate with STL, and you can
convert back and forth between std::string and Qt::QString, etc.

Are there any other libraries based on Standard C++, but not on the C++
Standard Library? What might these be? Why do people create these? Are
any significantly different from the C++ Standard Library? Is it good that
people create their own basic libraries instead of using the Standard
Library?

The Boost libraries are designed to work as closely with the STL as
possible. This means that they essentially build on and enhance the
STL, rather than duplicating its functionality.

Wherever possible, the "standard" way should be used (i.e. the STL), since
- code is more portable
- the STL is generally more efficient
- the STL is generally better debugged than home-brew implementations
- it will integrate better with other libraries, and STL functions and
algorithms
- All C++ programmers will understand such code (and if they don't they
can't really be called C++ programmers).
- there is less conversion back and forth between different formats
(e.g. QString, char*, BSTR, std::string, CString) so it is more
efficient to use just one string type.
- your code has fewer dependencies; it can assume the STL will always be
available.

Generally, non-STL libraries were written before compilers had decent
template support. Also, some C++ programmers are still a little uneasy
about templates. Some libraries/wizards require the use of idioms like
BSTR and CString, this is a real nuisance.

Calum

Jul 23 '05 #6

Panjandrum

Steven T. Hatton wrote:

Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries.
The STL zealots will beat you ...
I'm not sure
if that's a good thing or not. AFAIK, most of Qt is compatable with the
Standard Library. That is, QLT can interoperate with STL, and you can
convert back and forth between std::string and Qt::QString, etc.

Are there any other libraries based on Standard C++, but not on the C++
Standard Library? What might these be? Why do people create these? Are
any significantly different from the C++ Standard Library? Is it good that
people create their own basic libraries instead of using the Standard
Library?

Unfortunately, there isn't much (STL based or not),e.g.:
http://www.ncbi.nlm.nih.gov/books/bv...it.TOC&depth=2
http://www.gnu.org/software/commoncpp/
http://seal.web.cern.ch/seal/
http://synesis.com.au/software/stlsoft/
http://www.cs.berkeley.edu/~smcpeak/...ources/smbase/

Jul 23 '05 #7

Steven T. Hatton

Dietmar Kuehl wrote:

Steven T. Hatton wrote:
Is it good
that people create their own basic libraries instead of using the
Standard Library?
C++ is very much about the freedom to do thing the way you want to
do them. If you don't like a feature, you would not use it but you
would use something different. Essentially, C++ is intended to
provide the infrastructure to create your own library and it tries
to get out of your way as much as possible.

I'm very much a believer in standardization where it is appropriate. IOW,
there should not be several different "flavors" of C++ available which, at
their core, result in non-portable code. When there is a reasonably good
open standard, I believe its best to adopt it, rather than use something
that will not work with programs that do adhere to the standard. This is
the strength behind the Internet, and a significant accomplishment of the
IETF.

[...]
Personally, I think this is the right way to go for a language: it
provides facilities which are reasonable to use (and actually superior
in many cases to alternatives although not always) and allow you to do
real work. However, providing the possibility to use alternatives
opens the door for new inventions. Actually, the intention behind the
relatively minimalist standard library in C++ is to open a market for
people providing advanced features as is done by several vendors or
organizations (for example, Qt, Tools++, Boost; see the Available C++
Libraries FAQ for a more complete list).
I agree that the Standard Library does a pretty good job of this. It's kind
of hard to know what else might be worth adding. People talk about basic
support for threading, and Stroustrup wants a GUI abstraction library.
Something I call an AUI (Abstract User Interface). The biggest problem I
see in adding such elements is that they could lead to restrictions on the
kinds of solutions people feel they should build. If the Standard supports
on threading model, and someone else comes up with something superior, the
superior solution might be rejected because it is seen as "non-standard".
OTOH, it's nice to have some accepted standard way of doing these things.
That said, I want to point out that the standardization committee did
something quite remarkable when they put together the standard C++
library: in an environment buzzing about object-orientation, they
realized that object-orientation is the wrong approach to many of the
real task in programming! Instead, all algorithmic work is better done
in a rather different form using other mechanisms than classes and
virtual functions.
Don't tell anybody, but the mem_fun stuff uses pointer to member which is,
as I understand it, one of the more expensive virtual operations. That may
all get optimized away. I really don't know.
The result was the introduction of STL into the
standard C++ library (of course, the major credit for the creation of
STL and the theory behind it goes to Alexander Stepanov and David
Musser but the C++ committee had the courage to bring their avantgarde
product forward as part of the standard). Everybody creating a new
library is well-advised to understand why STL works that well and to
built on this experience!
Actually, from what I recently read, some of the credit goes back to
Stroustrup for introducing the concept of parameterized types in the first
place. I've recently come to realize the Standard Library is not really as
complicated, and intractable as it once seemed to me. There are basically
two different kinds of primary entities, containers, and namespace local
functions that operate one them. A third category is probably worth
identifying in the function operators. That category is probably best
understood in terms of two subcategories. The actual function object
classes, and the funcitons that use (or create) them. It seems a shame
strings and I/O are treated, more or less, as distinct facilities, when in
principle, they could be treated in a way that would be more congruent with
the STL.
Of course, STL is not ideal (in fact, I have
a major problem with the basis STL is built upon: the iterator concept
should have been split into two distinct concepts, i.e. cursors and
property maps) but the approach taken (using techniques which now go
under the name "generic programming") is right on target.

Interesting. When I think of property maps, I think of key/value pairs. Is
this the kind of concept you are envisioning?

I'd be happy just to see std::vector<int> v; v::iterator it = v.begin(); Do
you know if that is in the works?
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #8

Panjandrum

Calum Grant wrote:

The Boost libraries are designed to work as closely with the STL as
possible. This means that they essentially build on and enhance the
STL, rather than duplicating its functionality.

Wherever possible, the "standard" way should be used (i.e. the STL), since

Whenever possible the KISS way should be used and not the STL/Boost
MICI (make it complicated and impenetrable) way.

Jul 23 '05 #9

Steven T. Hatton

Panjandrum wrote:

Calum Grant wrote:
The Boost libraries are designed to work as closely with the STL as
possible. This means that they essentially build on and enhance the
STL, rather than duplicating its functionality.

Wherever possible, the "standard" way should be used (i.e. the STL),
since

Whenever possible the KISS way should be used and not the STL/Boost
MICI (make it complicated and impenetrable) way.

Are you confusing "complex" with 'arcane'? The STL is really very simple.
It just doesn't present a very intuitive interface. For example, this is
not all that compicated:

template<class ForwardIterator1,
class ForwardIterator2,
class BinaryPredicate>
ForwardIterator1 search(ForwardIterator1 first1, ForwardIterator1 last1,
ForwardIterator2 first2, ForwardIterator2 last2,
BinaryPredicate pred);

But without a bit of additional knowledge, it doesn't make a lot of sense.

Likewise for this example from TC++SL (stripped of comments):

transform (coll1.begin(), coll1.end(),
back_inserter(coll2),
bind2nd(multiplies<int>(),10));

The notation simply does not speak clearly for itself. You have to
understand what is happening with each parameter, and what it means. The
function object stuff can be especially hard to sort out.

bind2nd(multiplies<int>(),10))

actually means (_ * 10), where _ is the element from the container being
iterated over.

Boost is a mixed bag. Some of the stuff is very clear, powerful, and
useful. The Boost.Signals library looks pretty clear and usable. The
basics of Boost.Lambda also look easy to use. But Boost.Lambda looks to me
like the introduction of a completely new (functional) language, and could
be very hard to learn. It might also be very powerful once you do learn
it.

Though it's a core language feature, this, to me, looks like the essence of
C++ thinking, and makes me wonder why C++ is the follower in introducing
the construct (assuming the good idea is approved):
http://www.open-std.org/jtc1/sc22/wg...005/n1796.html
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #10

Dietmar Kuehl

Steven T. Hatton wrote:

I'm very much a believer in standardization where it is appropriate. IOW,
there should not be several different "flavors" of C++ available which, at
their core, result in non-portable code.
This is not what I was saying, though. C++ tries to provide a portable
core on which to build libraries. Whether these take advantage of the
standard library facilities or not is up to these libraries.

Of course, currently, the standard C++ library does not address some
areas at all and libraries have to create them using non-portable
facilities. Important areas include GUIs, sockets, and multi-threading
(see below for multi-threading, however). This should be addressed in
some form although I'm not aware of any proposals for GUIs or sockets
and the timeline essentially dictates that no new proposals for C++0x
are accepted after the next committee meeting (in October). Actually,
some people favored the idea of not accepting any new proposals at all
at the last meeting (i.e. if somebody wants to propose something really
important, better hurry and create a proposal...).
When there is a reasonably good
open standard, I believe its best to adopt it, rather than use something
that will not work with programs that do adhere to the standard.
Interoperability and portability are two different things. It may well
be reasonable to create own libraries and use these within a community
rather than using what the rest of world uses. In fact, there are
clearly several rather different styles of programmin C++ anyway. The
"MFC style" and the "STL style" are rather different approaches. Also,
some people will disagree with my assessment that the generic approach
is the way to go and prefer an object oriented approach. It is a free
world and everybody is entitled to seek his luck where he thinks he'll
find it - they'd just better not complain to me about seeking it in
object orientation and not finding it there because I told them :-)
This is
the strength behind the Internet, and a significant accomplishment of the
IETF.
You'll find that something things work together and others don't even
in this area. It is generally quite reasonable to use different
approaches to solve different problems.

Personally, I think this is the right way to go for a language: it
provides facilities which are reasonable to use (and actually superior
in many cases to alternatives although not always) and allow you to do
real work. However, providing the possibility to use alternatives
opens the door for new inventions. Actually, the intention behind the
relatively minimalist standard library in C++ is to open a market for
people providing advanced features as is done by several vendors or
organizations (for example, Qt, Tools++, Boost; see the Available C++
Libraries FAQ for a more complete list).

I agree that the Standard Library does a pretty good job of this.

You mean in opening a market for other libraries?
It's kind of hard to know what else might be worth adding.
Contenders are clearly things which can't be done portably right now.
People talk about basic
support for threading, and Stroustrup wants a GUI abstraction library.
Something I call an AUI (Abstract User Interface). The biggest problem I
see in adding such elements is that they could lead to restrictions on the
kinds of solutions people feel they should build.
Correct. ... and there are different movements within the standardization
committee or groups which work on proposals. My personal view is to do
a minimalist job and create the bare minimum which is needed to avoid
platform specific code and let others build portable libraries on these
basis. Other have different ideas, though.
If the Standard
supports on threading model, and someone else comes up with something
superior, the superior solution might be rejected because it is seen as
"non-standard". OTOH, it's nice to have some accepted standard way of
doing these things.
Especially with respect to threading, it is generally accepted that it
cannot be done by a library alone. The compiler also has to make certain
guarantees about reordering of statements. I think that the direction
championed by Andrei Alexandrescu and Hans Boehm which builds upon ideas
from the "lock free multi-threading" community is the right way to go
but I have no idea whether this is too avantgarde - I hope it is not.
.... but then, the C++ committee got hurt at the [b]leading edge before
(although STL is a huge success, there were also many problems introduced
by standardizing features nobody had experience with; although this does
not that badly apply to STL itself but it clearly applies to features
which were necessary to make STL work).

That said, I want to point out that the standardization committee did
something quite remarkable when they put together the standard C++
library: in an environment buzzing about object-orientation, they
realized that object-orientation is the wrong approach to many of the
real task in programming! Instead, all algorithmic work is better done
in a rather different form using other mechanisms than classes and
virtual functions.

Don't tell anybody, but the mem_fun stuff uses pointer to member which is,
as I understand it, one of the more expensive virtual operations. That
may all get optimized away. I really don't know.

STL is mostly about interoperability. Thus it is only consistent to
provide facilities which allow interoperability with object oriented
features. In fact, generic programming is not about replacing object
oriented techniques but it is a complementary technique which is good
at things where object orientation is bad at (although some highly
religious object orientation believers will consider the indication
that object orientation is not the panacea as blasphemy - I don't mind
since I intend to go to object orientation hell, i.e. to generic
programming heaven :-)

The result was the introduction of STL into the
standard C++ library (of course, the major credit for the creation of
STL and the theory behind it goes to Alexander Stepanov and David
Musser but the C++ committee had the courage to bring their avantgarde
product forward as part of the standard). Everybody creating a new
library is well-advised to understand why STL works that well and to
built on this experience!

Actually, from what I recently read, some of the credit goes back to
Stroustrup for introducing the concept of parameterized types in the first
place.

Yes and no. The basics concepts of generic programming where developed
entirely independent of C++ and were tried in languages like Ada and
Scheme before. The credit of Stroustrup with respect to generic
programming was to create the basis of the template mechanism we now
have and to explain this (together with Andrew Koenig if I remember
correctly) to Alexander Stepanov who realized that C++ provides the
model he was looking for.
I've recently come to realize the Standard Library is not really
as complicated, and intractable as it once seemed to me.
It is not complicated at all. It is actually quite simple. There is
just one thing to realize: the core of STL are concepts - not more
but no less. The rest is quite irrelevant to the understanding of
STL although it is the stuff which make it useful.
There are basically
two different kinds of primary entities, containers, and namespace local
functions that operate one them.
Apparently, you haven't understood what STL is about ;-) There are just
a few kind of entities which are completely described by concepts with
the central concept (group) being the iterators. Actually, STL
essentially revolves around iterators: they are used by the algorithms
and provided by the containers which decouples the algorithms from the
containers and allows independent realization of algorithms and
containers. BTW, containers are entirely irrelevant to STL. STL
operates on sequences, not containers. There is a huge conceptual
difference between these two.
Interesting. When I think of property maps, I think of key/value pairs.
Is this the kind of concept you are envisioning?
Yes, this is essentially it: cursors provide a position and some form
of a "key" and a property map provides values associated with the "key".
However, the "key" can be just an index into a vector or a reference to
the value itself. How the property map accesses a value related to a
key is entirely up to the property map. In particular, you should rather
envision an index lookup than a lookup in map or a has, i.e. the
property map access should be considered to be a cheap O(1) access
although it may be more involved.
I'd be happy just to see std::vector<int> v; v::iterator it = v.begin();
Do you know if that is in the works?

I don't think it is but I'm not sure. I'd consider it more important
to work on some form of lambda expressions because this would remove
the need to name these types in the first place! The above looks
cunningly as if it were used as some part of a loop. There should be
no necessity to write a loop to process the elements in a sequence.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #11

Steven T. Hatton

Dietmar Kuehl wrote:

Steven T. Hatton wrote: [...]
When there is a reasonably good
open standard, I believe its best to adopt it, rather than use something
that will not work with programs that do adhere to the standard.

Interoperability and portability are two different things. It may well
be reasonable to create own libraries and use these within a community
rather than using what the rest of world uses. In fact, there are
clearly several rather different styles of programmin C++ anyway. The
"MFC style" and the "STL style" are rather different approaches. Also,
some people will disagree with my assessment that the generic approach
is the way to go and prefer an object oriented approach. It is a free
world and everybody is entitled to seek his luck where he thinks he'll
find it - they'd just better not complain to me about seeking it in
object orientation and not finding it there because I told them :-)

I don't see it as one way or another. It really depends when you need the
polymorphism, and which is better at expressing a particular concept. I
still haven't sorted all this out (and I wonder who has). Templates can be
very flexible at compile time. Their type checking can be both a boon and
a bane. It's often exactly the behavior you want when the compiler refuses
to compare or assign two distinct types. OTOH, if you really do want two
types of templated objects to be assignable and comparable, you have to do
a bit of extra work, and you have to understand how to work with templates
fairly well.

Personally, I think this is the right way to go for a language: it
provides facilities which are reasonable to use (and actually superior
in many cases to alternatives although not always) and allow you to do
real work. However, providing the possibility to use alternatives
opens the door for new inventions. Actually, the intention behind the
relatively minimalist standard library in C++ is to open a market for
people providing advanced features as is done by several vendors or
organizations (for example, Qt, Tools++, Boost; see the Available C++
Libraries FAQ for a more complete list).

I agree that the Standard Library does a pretty good job of this.

You mean in opening a market for other libraries?

I was really speaking in more general terms, meaning that it provides a good
lean foundation to build upon.

If the Standard
supports on threading model, and someone else comes up with something
superior, the superior solution might be rejected because it is seen as
"non-standard". OTOH, it's nice to have some accepted standard way of
doing these things.

Especially with respect to threading, it is generally accepted that it
cannot be done by a library alone. The compiler also has to make certain
guarantees about reordering of statements. I think that the direction
championed by Andrei Alexandrescu and Hans Boehm which builds upon ideas
from the "lock free multi-threading" community is the right way to go
but I have no idea whether this is too avantgarde - I hope it is not.

Perhaps I have the advantage of not having too many preconceived notions of
how it should be done. I will say the Java approach to threading was easy
for me to grasp.
... but then, the C++ committee got hurt at the [b]leading edge before
(although STL is a huge success, there were also many problems introduced
by standardizing features nobody had experience with; although this does
not that badly apply to STL itself but it clearly applies to features
which were necessary to make STL work).
Yes, but years later, C++ programmers are usually passionate advocate of the
STL.

Don't tell anybody, but the mem_fun stuff uses pointer to member which
is,
as I understand it, one of the more expensive virtual operations. That
may all get optimized away. I really don't know.

STL is mostly about interoperability.

Many people will claim performance is also one of the primary advantage of
compile time polymorphism.
Thus it is only consistent to
provide facilities which allow interoperability with object oriented
features.
I need to get around to running some tests on algorithms using mem_fun
verses using for loops and static invocation.
In fact, generic programming is not about replacing object
oriented techniques but it is a complementary technique which is good
at things where object orientation is bad at (although some highly
religious object orientation believers will consider the indication
that object orientation is not the panacea as blasphemy - I don't mind
since I intend to go to object orientation hell, i.e. to generic
programming heaven :-)
The hardest thing for me to deal with when it comes to generic programming
has been the fact that the concept is often not clear from the context.
With traditional UDT programming, you can clearly see that the parameter
you are passing is of the correct type. With generic programming, we
encounter the problem of the type of the type parameter. There is no
programmatic way, that I understand, of expressing what a "pure
abstraction" is. I've heard of programming entities called "concepts", but
I know nothing about them.
[...]

There are basically
two different kinds of primary entities, containers, and namespace local
functions that operate one them.

Apparently, you haven't understood what STL is about ;-) There are just
a few kind of entities which are completely described by concepts with
the central concept (group) being the iterators.

I was considering the iterators to be integral to the containers.
Actually, STL
essentially revolves around iterators: they are used by the algorithms
and provided by the containers which decouples the algorithms from the
containers and allows independent realization of algorithms and
containers. BTW, containers are entirely irrelevant to STL. STL
operates on sequences, not containers. There is a huge conceptual
difference between these two.

I almost used the word "collection" because, to me, the STL "containers"
really aren't containers. This is especially true of vector and deque. I
would not even use the term "sequence". To me, "sequence" suggests the
elements under discussion have some inherent ordering. The relevant
ordering comes about when the iterator is incremented. It is defined
externally from the data. I would rather call the things being visited
during an iteration "enumerable elements".

Interesting. When I think of property maps, I think of key/value pairs.
Is this the kind of concept you are envisioning?

Yes, this is essentially it: cursors provide a position and some form
of a "key" and a property map provides values associated with the "key".
However, the "key" can be just an index into a vector or a reference to
the value itself. How the property map accesses a value related to a
key is entirely up to the property map. In particular, you should rather
envision an index lookup than a lookup in map or a has, i.e. the
property map access should be considered to be a cheap O(1) access
although it may be more involved.

Actually, I didn't understand what you meant. I was thinking of the
"properties" as being things such as the size, type of element, iterators
available, etc.

I'd be happy just to see std::vector<int> v; v::iterator it = v.begin();
Do you know if that is in the works?

I don't think it is but I'm not sure. I'd consider it more important
to work on some form of lambda expressions because this would remove
the need to name these types in the first place! The above looks
cunningly as if it were used as some part of a loop. There should be
no necessity to write a loop to process the elements in a sequence.

I was actually thinking in more general terms of being able to access the
typedefs inside a class through an instance. But that might run contrary to
concepts of OO and polymorphism. The example was probably most applicable
to a loop. The suggestion that lambda expression can do away with loops is
fully convincing to me. Whether you do away with the body of the loop or
not, you have two choices when processing a collection one at a time.
Iteration or recursion. There may be some good arguments for using
recursion in some cases. In general, it is not as efficient. The only
thing you accomplish with a lambda in a typical STL algorithm is to hide
the loop in the algorithm implementation.

I'm pretty sure the way lambda expressions will work is similar to the way
Mathematica works. That is significantly different than the way
traditional C++ works. Functional programming is not an intuiteve art for
me. I've done some fun stuff with it, but it usually requires more careful
thought than writing out a proceedural block of code.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #12

Calum Grant

Panjandrum wrote:

Calum Grant wrote:
The Boost libraries are designed to work as closely with the STL as
possible. This means that they essentially build on and enhance the
STL, rather than duplicating its functionality.

Wherever possible, the "standard" way should be used (i.e. the STL), since

Whenever possible the KISS way should be used and not the STL/Boost
MICI (make it complicated and impenetrable) way.

These two principles (KISS + adhering to standards) aren't necessarily
incompatible.

Believe it or not, the STL is supposed to be easy to use. Or at least,
easier than not using the STL. For example, std::string is simpler than
char*, and std::vector is simpler than a C array.

I agree, its syntax could be a lot nicer, but I believe its benefits
outweigh its complexity.

Seeing how the STL is standard, you just have to learn to like it.

There are two choices in library design:
1) Pre-templates: contained values are void*
2) Post-templates: containers are templated.

The advantages of 2 are: type-safety, no casts needed in client code,
faster code.

Alternatively, don't use C++, use a typeless language like Python. C++
as a whole does not adhere to KISS, but I believe the complexity is
worth it.

Jul 23 '05 #13

Donovan Rebbechi

On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:

Donovan Rebbechi wrote:
On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:
Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not.
For the most part, it's not a very good thing, and if one were building
these toolkits from scratch today, they would be done differently (one
would hope...)

I've been using Qt based products since 1997 when the KDE was still in
alpha. I don't believe Trolltech have significant misgivings about the
core design of their libraries, and since they reinvent the blinkin' thing
about once every two years, whatever they would have done differently, has
been done differently.

Not necessarily. Why discard work that they've already done ? They may have
chosen not to do it in the first place, but now that it's there, it may be
advantageous for them to keep it. For example, the QTL has the advantage, for
them (Trolltech), of being controlled by them, and having some design features
that are appropriate for use in a GUI toolkit. For example, they are pointer-based,
as opposed to value-based STL containers. They use the derive-from-void* trick
to avoid template bloat (one of the advantages of Qt is that it avoids compile
time dependencies -- so this is consistent with that philosophy)

And in some cases, it may really be too much work to take another approach. For
example, implementing a template based signal/slot architecture would amount to
a rewrite of the toolkit.

Many of these toolkits predate the C++ '98 standard. So they built
similar infrastructure (actually, usually worse. Most pre-STL container
toolkits look pretty amateurish compared to STL)

I don't belive the QTL will fit that category. The functionality offered in
their API is arguably superior to that of the Standard, and the Standard
doesn't specify implementation.

They have improved their container classes (learned from STL, I suppose) but
earlier versions suffered from a number of problems:

(1) bloated interfaces (count the member functions, compare to STL classes) this
is even true of newer versions
(2) earlier versions enabled fairly badly broken element access. For example, I
consider a "random access interface" to a linked list to be a conceptual
error. If you're using indices to access elements of a linked list, you don't
understand what a linked list is, and you're using the wrong data structure.
Use a dynamic array, a hashtable, or a tree based map instead. Also, the
"current item" they had in the previous versions is a broken way to do element
access. It needlessly fattens the interface, and it also unnecessarily modifies
the state of the list (by storing state information that should be decoupled
from the list)

(3) why virtual functions ? Does it make sense to use these classes polymorphically ?
I disagree. There really is nothing in Qt that is not implemented using
Standard C++.
What I meant was, it can't be based on "standard C++", because there was no standard
C++ available at the time it was implemented. So in that sense, the existence of
a "standard C++" was not a premise for the design. Of course it builds against
"standard C++" compilers (to the extent that such things exist), but it wasn't
*designed* against a backdrop of todays standard.

[snip] As for QString verses std::string, the advantage std::string has is that
it's standard. Beyond that, I don't believe it even comes close to
QString, and that doesn't even address QStringList.

I don't believe in the "more features is better" principle. Both classes are
somewhat bloated.

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

Jul 23 '05 #14

Panjandrum

Calum Grant wrote:

Believe it or not, the STL is supposed to be easy to use. Or at least,
easier than not using the STL.
If STL were easy to use it would be used more widely.
For example, std::string is simpler than
char*, and std::vector is simpler than a C array.
Agreed (except that std::string is not part of STL). But you compare
apples to oranges (C to C++). Comparing one C++ approach (STL) to other
approaches would be more revealing.
I agree, its syntax could be a lot nicer, but I believe its benefits
outweigh its complexity.

Seeing how the STL is standard, you just have to learn to like it.
That's the wrong way. You don't start to like something just because
it's 'standard'.
There are two choices in library design:
1) Pre-templates: contained values are void*
2) Post-templates: containers are templated.
.... and others. But even if you are right it doesn't mean that
STL/Boost is the only way how templates are to be used.
The advantages of 2 are: type-safety, no casts needed in client code,
faster code.

Alternatively, don't use C++, use a typeless language like Python. C++
as a whole does not adhere to KISS, but I believe the complexity is
worth it.

Alternatively, you can use easier approaches within C++. At least, C++
is touted as "Multiparadigm" language, not as STL/Boost paradigm
language.

----
"Removing a pattern can simplify a system and a simple solution should
almost always win. Coming up with simple solutions is the real
challenge."
Erich Gamma

Jul 23 '05 #15

Donovan Rebbechi

On 2005-06-30, Panjandrum <pa********@spambob.com> wrote:

Calum Grant wrote:
Believe it or not, the STL is supposed to be easy to use. Or at least,
easier than not using the STL.
If STL were easy to use it would be used more widely.

It takes time for things to be "used widely". First, the early adopters use
it. Then people write books on it. Then it is more widely learned, since all
the books cover it. Then new projects (but not necessarily old projects) start
using it. Then the new projects gain visibility. I'm not convinced that STL
is not more widely used because it's "too hard" to use.

For example, std::string is simpler than
char*, and std::vector is simpler than a C array.

Agreed (except that std::string is not part of STL). But you compare
apples to oranges (C to C++). Comparing one C++ approach (STL) to other
approaches would be more revealing.

What are some of these approaches, and what advantages do they have over
STL ?
... and others. But even if you are right it doesn't mean that
STL/Boost is the only way how templates are to be used.

What other ways did you have in mind ?

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

Jul 23 '05 #16

Steven T. Hatton

Donovan Rebbechi wrote:

On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:
I've been using Qt based products since 1997 when the KDE was still in
alpha. I don't believe Trolltech have significant misgivings about the
core design of their libraries, and since they reinvent the blinkin'
thing about once every two years, whatever they would have done
differently, has been done differently.

Not necessarily. Why discard work that they've already done ?

Qt 4 has significantly redesigned container classes. They don't call it QTL
anymore. It's called Tulip.

[...] For
example, they are pointer-based, as opposed to value-based STL containers.
Some of the QTL containers were pointer based, but they did away with those.
They use the derive-from-void* trick to avoid template bloat (one of the
advantages of Qt is that it avoids compile time dependencies -- so this is
consistent with that philosophy)
I'm not sure what you mean by "derive from void", but I don't believe it
applies to the current release.
And in some cases, it may really be too much work to take another
approach. For example, implementing a template based signal/slot
architecture would amount to a rewrite of the toolkit.
They claim to not be interested in implementing signals and slots using
templates, saying the current approach is easier to understand.

Many of these toolkits predate the C++ '98 standard. So they built
similar infrastructure (actually, usually worse. Most pre-STL container
toolkits look pretty amateurish compared to STL)

I don't belive the QTL will fit that category. The functionality offered
in their API is arguably superior to that of the Standard, and the
Standard doesn't specify implementation.

They have improved their container classes (learned from STL, I suppose)
but earlier versions suffered from a number of problems:

(1) bloated interfaces (count the member functions, compare to STL
classes) this is even true of newer versions

Considering they provide both the STL names with under_scores, and the Qt
style synonyms with mixedCase, and arguably better meaning, I don't see a
problem. It's not as if that adds to the size of your code or the compiled
result. It simply enables you to produce a consistent API or program with
a uniform naming system.
(2) earlier versions enabled fairly badly broken element access. For
example, I
consider a "random access interface" to a linked list to be a
conceptual error. If you're using indices to access elements of a
linked list, you don't understand what a linked list is, and you're
using the wrong data structure. Use a dynamic array, a hashtable, or a
tree based map instead. Also, the "current item" they had in the
previous versions is a broken way to do element access. It needlessly
fattens the interface, and it also unnecessarily modifies the state of
the list (by storing state information that should be decoupled from
the list)
You seem to be making my original point for me. "whatever they would have
done differently, has been done differently." Bear in mind they have QList
and QLinkedList. QList is the counterpart of the STL deque.
(3) why virtual functions ? Does it make sense to use these classes
polymorphically ?
It may have seemed like a possibility they wanted to leave open at the time.
There's nothing like that in the current Qt containers I've looked at.

I disagree. There really is nothing in Qt that is not implemented using
Standard C++.

What I meant was, it can't be based on "standard C++", because there was
no standard C++ available at the time it was implemented. So in that
sense, the existence of a "standard C++" was not a premise for the design.
Of course it builds against "standard C++" compilers (to the extent that
such things exist), but it wasn't *designed* against a backdrop of todays
standard.

The current Qt is not the same as the original Qt. It has been
significantly redesigned more than once.
[snip]
As for QString verses std::string, the advantage std::string has is that
it's standard. Beyond that, I don't believe it even comes close to
QString, and that doesn't even address QStringList.

I don't believe in the "more features is better" principle. Both classes
are somewhat bloated.

Cheers,

To me it's not a question of applying some management metric such as how
many functions there are on the interface. It's a question of whether it
does what I need without having to bend over backwords to make it work.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #17

Donovan Rebbechi

On 2005-06-30, Steven T. Hatton <ch********@germania.sup> wrote:

They use the derive-from-void* trick to avoid template bloat (one of the
advantages of Qt is that it avoids compile time dependencies -- so this is
consistent with that philosophy)
I'm not sure what you mean by "derive from void", but I don't believe it
applies to the current release.

I mean that you can implement a container of pointers in terms of a container
of void pointers. You avoid template bloat by doing this, because the
parametrised version of the class is reduced to a thin type-safety wrapper.

And in some cases, it may really be too much work to take another
approach. For example, implementing a template based signal/slot
architecture would amount to a rewrite of the toolkit.

They claim to not be interested in implementing signals and slots using
templates, saying the current approach is easier to understand.

I've read what they claim, and I believe my point still stands. It doesn't
make a lot of sense for them to throw away all their metaobject
infrastructure -- now that it works, and is a mature system. But they do
also say that the state of compilers (and support for templates) at the
time was part of the reason that they did it that way in the first place.

(1) bloated interfaces (count the member functions, compare to STL
classes) this is even true of newer versions

Considering they provide both the STL names with under_scores, and the Qt
style synonyms with mixedCase, and arguably better meaning, I don't see a
problem.

Oh dear, I didn't even know they did that. That's even worse than I thought.
It's not as if that adds to the size of your code or the compiled
result. It simply enables you to produce a consistent API or program with
a uniform naming system.
It also facilitates inconsistency.

But that's not what I was referring to anyway. I was referring to classes
with an enormous number of member functions.
You seem to be making my original point for me. "whatever they would have
done differently, has been done differently."
No. That is a single example, and a fairly small part of the toolkit.

(3) why virtual functions ? Does it make sense to use these classes
polymorphically ?

It may have seemed like a possibility they wanted to leave open at the time.

Yes, exactly. Still, I hope I've convinced you that at least in earlier
iterations of their toolkit, they really did have some glaring flaws (though
they appear to have learned)
There's nothing like that in the current Qt containers I've looked at.
Yes, I think generally, people have learned from STL. It's not just a good
container library, it's also a good example of how to approach writing a
templated container library.

What I meant was, it can't be based on "standard C++", because there was
no standard C++ available at the time it was implemented. So in that
sense, the existence of a "standard C++" was not a premise for the design.
Of course it builds against "standard C++" compilers (to the extent that
such things exist), but it wasn't *designed* against a backdrop of todays
standard.

The current Qt is not the same as the original Qt. It has been
significantly redesigned more than once.

How has it been "significantly redesigned" ? I don't consider breaking binary
compatibility to be a redesign. It still uses MOC, string based signals and
slots, and much the same class heirarchy, right ?
To me it's not a question of applying some management metric such as how
many functions there are on the interface.
It's not just a "management metric". There are actually reasons that coding
guidelines like these exist. As far as bloated interfaces go, there are a
number of issues. This is discussed in Scott Meyers under "make interfaces
minimal and complete". I don't have the book, but here are some reasons I
don't like bloated interfaces:

(1) it makes the interface harder to learn, because there's more stuff there.
Of course you can just learn a subset of the interface, which works fine
until you have to read someone else's code (this is the reason why perl code
is often considered unscrutable -- "there's more than one way to do it" is
misguidedly elevated to a design principle).

(2) it increases the likelihood that there are several ways to do the same
thing, which leads to inconsistent code

(3) it increases the likelihood that the class interface will need to be
changed. For example, when you have member functions like currentItem(), and
operator[] in a linked list class you are locking in some really poor design
choices. In the case of Qt, the result was that they decided the class needed
to be thrown out. In the case of QString, it means that they have a bunch of
legacy crud in the interface, which they will probably drop. Of course,
Trolltech have the option of rewriting classes that they botch. The standards
committee don't.

(4) it violates the principal of minimal coupling/l;east privelige. Let's
explore this a little further. In some computer science text books, examples of
linked lists include member functions like print(). Often, these bad linked
list classes offer no way to do element access without resorting to adding
private member functions (not an option in the real world!) or using indices
(clunk!)

So when you have functions like replace(), find(), etc, either
(a) they don't need to be member functions (so they shouldn't be)
(b) they do need to be member functions, which indicates a problem with the
design of the class (since adding basic functionality is impossible without
accss to private members.

It's a question of whether it does what I need without having to bend over
backwords to make it work.

Well, despite its flaws, both std::string and QString are both quite useable.

Cheers,
--
Donovan Rebbechi
http://pegasus.rutgers.edu/~elflord/

Jul 23 '05 #18

Calum Grant

Panjandrum wrote:

Calum Grant wrote:
Believe it or not, the STL is supposed to be easy to use. Or at least,
easier than not using the STL.

If STL were easy to use it would be used more widely.

I thought it was fairly universally used. Except for legacy code of course.

For example, std::string is simpler than
char*, and std::vector is simpler than a C array.

Agreed (except that std::string is not part of STL). But you compare
apples to oranges (C to C++). Comparing one C++ approach (STL) to other
approaches would be more revealing.

What "other approaches" are there for representing an array? Something
based upon void* / CObject* would surely be worse?

I agree, its syntax could be a lot nicer, but I believe its benefits
outweigh its complexity.

Seeing how the STL is standard, you just have to learn to like it.

That's the wrong way. You don't start to like something just because
it's 'standard'.

Okay, you're right. But the obvious question is: how would you improve
it? I mean, if you for example dropped templates, or iterators, what
would you replace them with?

There are two choices in library design:
1) Pre-templates: contained values are void*
2) Post-templates: containers are templated.

... and others. But even if you are right it doesn't mean that
STL/Boost is the only way how templates are to be used.

The advantages of 2 are: type-safety, no casts needed in client code,
faster code.

Alternatively, don't use C++, use a typeless language like Python. C++
as a whole does not adhere to KISS, but I believe the complexity is
worth it.

Alternatively, you can use easier approaches within C++. At least, C++
is touted as "Multiparadigm" language, not as STL/Boost paradigm
language.

But I'm sure these "easier" approaches have some disadvantages. Anyway
which did you have in mind?
----
"Removing a pattern can simplify a system and a simple solution should
almost always win. Coming up with simple solutions is the real
challenge."
Erich Gamma

I love that quote. I'll be sure to spread it around work.

Calum

Jul 23 '05 #19

Panjandrum

Calum Grant wrote:

But I'm sure these "easier" approaches have some disadvantages. Anyway
which did you have in mind?

We probably agree that a template parameter should be used for the type
of object in the container. Use your imagination for the rest of the
'design space'.

----
"Removing a pattern can simplify a system and a simple solution should
almost always win. Coming up with simple solutions is the real
challenge."
Erich Gamma

I love that quote. I'll be sure to spread it around work.

Taken from here:
http://www.artima.com/lejava/article...practice2.html

Jul 23 '05 #20

Dietmar Kuehl

Panjandrum wrote:

Calum Grant wrote:
But I'm sure these "easier" approaches have some disadvantages. Anyway
which did you have in mind?

We probably agree that a template parameter should be used for the type
of object in the container. Use your imagination for the rest of the
'design space'.

I'm probably just too dense and don't get it at all. Of course, my
problem is already that I think template parameters can and should
be used for other things than object types in containers. Maybe this
limits my imagination already too much. Can you help out the stupid,
like me, and point the blind into the right direction with a hint?
Thank you very much!
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #21

Michael Olea

Dietmar Kuehl wrote:

/*
Of course, STL is not ideal (in fact, I have
a major problem with the basis STL is built upon: the iterator concept
should have been split into two distinct concepts, i.e. cursors and
property maps)...
*/

And later:

... cursors provide a position and some form
of a "key" and a property map provides values associated with the "key".
However, the "key" can be just an index into a vector or a reference to
the value itself. How the property map accesses a value related to a
key is entirely up to the property map. In particular, you should rather
envision an index lookup than a lookup in map or a has, i.e. the
property map access should be considered to be a cheap O(1) access
although it may be more involved.

Hi, Dietmar. All very interesting - it caught my attention partly becase I
am in the early stages of designing a template-based sequence analysis
library (sequence as in bioinformatics rather than functional analysis) and
I was thinking of making iterators one of two or three central organizing
ideas. So my question is just: is your master's thesis still available
on-line? I went here:

http://www.boost.org/libs/graph/doc/publications.html

And found a link to "Design Pattern for the Implementation of Graph
Algorithms", but, alas, the link is broken.

Just as an aside - the inputs, and often the outputs, of sequence analysis
are sequences - strings - but intermediate data structures like Suffix
Trees have nodes and edges with properties that vary by algorithm. So the
property map idea looks interesting.

Regards, Michael

Jul 23 '05 #22

Dietmar Kuehl

Michael Olea wrote:

Hi, Dietmar. All very interesting - it caught my attention partly becase I
am in the early stages of designing a template-based sequence analysis
library (sequence as in bioinformatics rather than functional analysis)
and I was thinking of making iterators one of two or three central
organizing ideas.
Which is quite reasonable although, as stated, you will indeed want to
split the iterator concept into two distinct concepts: one for moving
(cursors) and one for accessing data (property maps).
So my question is just: is your master's thesis still
available on-line? I went here:

http://www.boost.org/libs/graph/doc/publications.html

And found a link to "Design Pattern for the Implementation of Graph
Algorithms", but, alas, the link is broken.
Yes, I know, but I don't think I can fix the link. My thesis is still
available at
<http://www.dietmar-kuehl.de/generic-graph-algorithms.pdf>. However,
there are a few things to note:

- The Boost Graph Library (BGL) by Jeremy Siek et al. is based on
very similar ideas but is completely available as a library you
can download and there is a book describing it written by Jeremy.
I'd recommend that you rather look at this.
- Since I wrote the thesis, some things evolved and the important
concepts have undergone some changes, not to mentioned renaming
(due to discussions with Jeremy and others). In particular, what
I originally called "Data Accessor" is now called "Property Map".
Also, to distinguish concepts, I still used the term iterator for
the traversal concept rather than using "Cursor".
- The currently favored concept for property maps looks quite
different, partially due to quite resent discussions with Dave
Abrahams and Doug Gregor. In short, the concept looks like this
("pm" is a property map; "c" is cursor; "v" is a value as obtained
by combining the type of "*c" with the type of "pm"):

- reading a value: v = pm(*c); // requires readable PM
- writing a value: pm(*c, v) // requires writable PM
- lvalue access: type& v = pm(*c, v); // requires lvalue PM

For more information on the current state of property maps, have
a look at
<http://boost-consulting.com/projects/mtl4/libs/sequence/doc/html/cursors_and_property_maps.html>
Just as an aside - the inputs, and often the outputs, of sequence analysis
are sequences - strings - but intermediate data structures like Suffix
Trees have nodes and edges with properties that vary by algorithm. So the
property map idea looks interesting.

The original idea of property maps (then called data accessors)
arose in the context where there are several problems related to
properties associated with nodes and edges:

- The set of used properties vary widely between different algorithms
and depending on the used algorithm you want to provide them
somehow.
- Different algorithms think of the same property in different terms.
For example, what is a length to one algorithm (e.g. a path find
algorithm) could be a capacity to another algorithm (e.g. a flow
algorithm). Still, the algorithm want to communicate with each
other. This was actually the primary reason why I introduced
property maps for the graph abstraction.
- The same properties are often differently represented and
interrelated. For example, in flow networks each directed edge
often has a total capacity, a current flow, and a free capacity.
Obviously, "free capacity == capacity - current flow" hold, i.e.
it is sufficient to represent only two of the three attributes
and depending on the context you don't know which is the best
combination.

From your description it sounds as if there are similar issues for
the analysis of sequences and I guess that property maps can be a
viable to address these problems.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #23

Dietmar Kuehl

Steven T. Hatton wrote:

Dietmar Kuehl wrote:
... but then, the C++ committee got hurt at the [b]leading edge before
(although STL is a huge success, there were also many problems introduced
by standardizing features nobody had experience with; although this does
not that badly apply to STL itself but it clearly applies to features
which were necessary to make STL work).
Yes, but years later, C++ programmers are usually passionate advocate of
the STL.

Of course, they are. As far as I can tell, the majority of the C++
committee members (if not all of them) think that the introduction
of STL was the right thing to do. Still, it was and is a problematic
path in many areas. The STL is riddled with many problems, too, and
the specification is actually quite imprecise in many areas
- something which was only learned with the experience of using the
STL. Although many think that the STL was the right thing to happen
at that time, I have the impression that many also think that such
a process should not repeated with the next feature. There is an
advantage of operating at the leading edge but lack of experience
can also be quite harmful if the result is cast into stone in the
form of an international industry standard.

STL is mostly about interoperability.

Many people will claim performance is also one of the primary advantage of
compile time polymorphism.

A certain form of performance was indeed a design rule for how the
STL works (actually, the rules were quite strict in that they kind
of required that the abstraction penalty is non-existant or at
least rather low). However, this is kind of a restriction on the
design space. The primary goal was to find an abstraction which
makes all algorithms applicable to all reasonable data structures
while writing each algorithm a small number of times. Of course,
"algorithm" is actually a misnomer in this context: something like
"solver" or "operation" would be better since the different
implementations of an "algorithms" are actually different algorithms
(in the conventional meaning of the term) solving the same problem
while taking advantage of different properties in the input (i.e.
they are based on different requirements; e.g. 'distance(beg, end)'
actually counts the number of elements for all iterator categories
unless the iterators are random access iterators in which case the
function just uses a substraction: these are two different
algorithms solving the same problem).

Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call. Of course, where reasonable, run-time
polymorphism can still be used. On the other hand, the concepts
used by STL are rather small operations and the ration between
computations done by a single variation point and a possible
function call overhead is typically rather bad, i.e. when using
STL concepts you are better off not also using run-time
polymorphism. When using run-time polymorphism you want your
variations points to do more computations, i.e. you would fold
multiple operations into one call. This can be witnessed by the
radically different strategies used in STL and sequence accesses
in "purely object-oriented languages" (like Java or C#). To access
a single element in a sequence STL [normally] uses three operations:

- a check whether the current position indicates that processing
has completed ("current == end");
- access to the current element ("*current")
- moving to the next element ("++current")

In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.

Now, which is better? If you don't pay for calling multiple
functions, the former is more desirable because it offers more
flexibility and allows for smaller points of variability (although
it is still too course-grained...). This approach also plays well
with having a hierarchy of concepts which may take advantage of
special features although only a minimalist requirement is made
(which in turn caters for some very powerful optimizations because
this can take advantage of special structures in the underlying
data structure).

On the other hand, if each operation incurs a function call or at
least chops up your optimization blocks, the extra benefit is
clearly dwarfed by additional costs. Thus, in "purely
object-oriented languages" (i.e. in languages which incur arbitrary
restrictions without any gain but with, IMO, huge losses)
an infe^H^H^H^H^H^H different approach is used which reduces the
costs (the pros) and the possibilities (the cons).

Thus it is only consistent to
provide facilities which allow interoperability with object oriented
features.

I need to get around to running some tests on algorithms using mem_fun
verses using for loops and static invocation.

You might want to also have a look at TR1's "mem_fn" which is
also available from Boost.
The hardest thing for me to deal with when it comes to generic programming
has been the fact that the concept is often not clear from the context.
Yes, this is indeed a recognized problem. There are proposals to
add some form of explicit concepts to the C++ core language. The
major discussion is currently between two competing views on how
the details should look. The basic idea is to somehow describe a
"concept" which is essentially a set of operations used by
algorithms and provided by the entities passed as arguments to
these algorithms. The biggest argument is whether the arguments
have to specify more or less explicitly that they implement a
concept or if this should be implicitly determined by the compiler
based on the presence of operations for the arguments. You might
want to search the papers of the last few meetings if you are
interested in details.

Apparently, you haven't understood what STL is about ;-) There are just
a few kind of entities which are completely described by concepts with
the central concept (group) being the iterators.

I was considering the iterators to be integral to the containers.

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Actually, STL
essentially revolves around iterators: they are used by the algorithms
and provided by the containers which decouples the algorithms from the
containers and allows independent realization of algorithms and
containers. BTW, containers are entirely irrelevant to STL. STL
operates on sequences, not containers. There is a huge conceptual
difference between these two.

I almost used the word "collection" because, to me, the STL "containers"
really aren't containers. This is especially true of vector and deque. I
would not even use the term "sequence". To me, "sequence" suggests the
elements under discussion have some inherent ordering.

They have, indeed! In fact, the iteration over a sequence just
shows how the elements are ordered [currently]. However, the
element order may depend on other aspects than the element values.
For example, it may depend on the insertion order instead. I think,
this pretty accurately reflects typical definitions of "sequence".
See, for example, <http://en.wikipedia.org/wiki/Sequence>.
Actually, I didn't understand what you meant.
Well, I wasn't really precise at all. See the subthread about
cursors and property maps if you are interested in more details
on property maps.

I don't think it is but I'm not sure. I'd consider it more important
to work on some form of lambda expressions because this would remove
the need to name these types in the first place! The above looks
cunningly as if it were used as some part of a loop. There should be
no necessity to write a loop to process the elements in a sequence.

I was actually thinking in more general terms of being able to access the
typedefs inside a class through an instance.

I realized this. However, at least in the context of STL this need
should not arise at all...
I'm pretty sure the way lambda expressions will work is similar to the way
Mathematica works. That is significantly different than the way
traditional C++ works. Functional programming is not an intuiteve art for
me. I've done some fun stuff with it, but it usually requires more
careful thought than writing out a proceedural block of code.

Well, I consider the ability to inject portions of code into a
templatized function taking operations as arguments as an
application for lambda functions. It is a relatively primitive
use but it would drastically ease the use STL algorithms. It is
related to functional programming in this use, although real
lambda expressions would probably allow at least some forms of
functional programming.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #24

Panjandrum

Dietmar Kuehl wrote:

As far as I can tell, the majority of the C++
committee members (if not all of them) think that the introduction
of STL was the right thing to do. Still, it was and is a problematic
path in many areas. The STL is riddled with many problems, too, and
the specification is actually quite imprecise in many areas
- something which was only learned with the experience of using the
STL. Although many think that the STL was the right thing to happen
at that time, I have the impression that many also think that such
a process should not repeated with the next feature. There is an
advantage of operating at the leading edge but lack of experience
can also be quite harmful if the result is cast into stone in the
form of an international industry standard.
Read between the lines: 'We know that making STL the C++ Standard
library was a huge mistake. But we'll never admit it.'

STL is mostly about interoperability.

Many people will claim performance is also one of the primary advantage of
compile time polymorphism.

Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call.

Haha, the myth of the 'overhead' of a virtual function call has been
repeated for 20 years. It's usually spread by the same people who
recommend std::vector<std::string> > or boost::shared_ptr (which even
unnecessarily allocate lot of memory on the heap).
This can be witnessed by the
radically different strategies used in STL and sequence accesses
in "purely object-oriented languages" (like Java or C#). To access
a single element in a sequence STL [normally] uses three operations:

- a check whether the current position indicates that processing
has completed ("current == end");
- access to the current element ("*current")
- moving to the next element ("++current")

In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.

Now, which is better?
Wrong question. I'm not fond of Java collections and iterators. But the
right question is: 'Which is more usable?'. You find many STL-related
questions in C++ newsgroups but only few questions related to
collections and iterators in Java groups.
It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.
Means, iterator and container are overlapping concepts in STL. You
could drop containers.
Well, I consider the ability to inject portions of code into a
templatized function taking operations as arguments as an
application for lambda functions.
.... and a way to further comliplicate the C++ language to satisfy the
request of a few academics. Sorting and searching are the only
'algorithms' one needs. The rest is better done with iterators and an
'old-fashioned' for-loop.
It is a relatively primitive
use but it would drastically ease the use STL algorithms. It is
related to functional programming in this use, although real
lambda expressions would probably allow at least some forms of
functional programming.

Take up time! Don't finish the next C++ Standard before 2034 after
having extended and 'enhanced' the language beyond recognizability and,
of course, after having included the whole Boost (5 times the size of
today) into the next Standard library. In the meantime let us work with
the 'manageable mess' called C++98 and reach a secure place before the
next 'big thing' in C++ happens.

Jul 23 '05 #25

Dietmar Kuehl

Panjandrum wrote:

Dietmar Kuehl wrote:
As far as I can tell, the majority of the C++
committee members (if not all of them) think that the introduction
of STL was the right thing to do. Still, it was and is a problematic
path in many areas. The STL is riddled with many problems, too, and
the specification is actually quite imprecise in many areas
- something which was only learned with the experience of using the
STL. Although many think that the STL was the right thing to happen
at that time, I have the impression that many also think that such
a process should not repeated with the next feature. There is an
advantage of operating at the leading edge but lack of experience
can also be quite harmful if the result is cast into stone in the
form of an international industry standard.

Read between the lines: 'We know that making STL the C++ Standard
library was a huge mistake. But we'll never admit it.'

I don't need to read between the lines because I know what I want
to express: Introducing the STL was the right thing to do. The
process by which it was done was suboptimal, though. Both things are
recognized and stated. I can't speak for other committee members,
of course, but I my impression is that the majority of the committee
members agrees with view. However, I'm 100% positive that the majority
of the committee does *NOT* consider the introduction of STL to be a
mistake.

Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call.

Haha, the myth of the 'overhead' of a virtual function call has been
repeated for 20 years. It's usually spread by the same people who
recommend std::vector<std::string> > or boost::shared_ptr (which even
unnecessarily allocate lot of memory on the heap).

Note that I didn't even mention any "'overhead' of a virtual function
call"! This is indeed neglectable, especially with modern processors.
From my experience the performance loss derives from reducing the
the basic blocks for the optimizer which is what I said above. Of
course, virtual functions are just one reason why the basic blocks
are reduced.

Now, which is better?

Wrong question. I'm not fond of Java collections and iterators. But the
right question is: 'Which is more usable?'. You find many STL-related
questions in C++ newsgroups but only few questions related to
collections and iterators in Java groups.

More usable? This is quite obvious: since neither Java nor .Net ships
with a reasonable collection of algorithms at all, I wouldn't even
consider their approach to be some form of competition. Of course,
nobody asks how to use Java enumerators with algorithms. However, this
is not because it is easier to use with their algorithms but it is the
lack of corresponding algorithms in the first place! Also, enumerators
indeed have a simpler interface but also less capabilities. Nobody
would ask how to move three items back with an enumerator because it
is impossible anyway.

That said, it is recognized, too, that it would be desirable to have
a simpler interface to algorithms which is traded for less capabilities.
For example, it is desirable to have an algorithm "sort()" which takes
a container, i.e. getting rid of the need to obtain the iterators from
the container. Of course, the sort function taking a container would
itself just delegate to the sort function taking iterators. On reason
why such a mechanism is not part of the STL is that no one had any
experience with using STL at the time it was introduced and thus some
glaringly obvious things are missing. This is exactly the procedural
problem I mentioned before.

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Means, iterator and container are overlapping concepts in STL. You
could drop containers.

No, they are not. Actually, they have nothing in common. The only
relation between the iterator and container concepts in STL is the
fact that containers provide some functions using iterators. In fact,
iterators and containers are related the same way enumerators and
containers are related in Java or .Net. Like iterators, enumerators
are also not necessarily related to a container but a container is
required to provide enumerators.

Well, I consider the ability to inject portions of code into a
templatized function taking operations as arguments as an
application for lambda functions.

... and a way to further comliplicate the C++ language to satisfy the
request of a few academics. Sorting and searching are the only
'algorithms' one needs. The rest is better done with iterators and an
'old-fashioned' for-loop.

Really? For resource allocations it is not that unusual to optimize
a flow network (max flow, min cost flow, etc.) and you just throw
the corresponding algorithms together on the fly by using loops?
That's pretty impressive! ... or, maybe, you have not been exposed
to non-trivial problems which I would consider more likely. Also,
even if you just use trivial algorithm like copying data, it is
possible to improve the performance by using an algorithm which
takes advantage of the underlying data structures. Of course, this
is only important when working on performance critical software.
From your answer I would guess that you also were never exposed to
this area. Writing trivial, non-performance critical software is
quite easy to do, BTW. If you are creating non-trivial and
performance critical stuff, you will welcome the flexibility and
performance STL provides - assuming you bother to understand what
it is about which you obviously haven't yet done.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #26

Steven T. Hatton

Panjandrum wrote:

Dietmar Kuehl wrote:
As far as I can tell, the majority of the C++
committee members (if not all of them) think that the introduction
of STL was the right thing to do. Still, it was and is a problematic
path in many areas. The STL is riddled with many problems, too, and
the specification is actually quite imprecise in many areas
- something which was only learned with the experience of using the
STL. Although many think that the STL was the right thing to happen
at that time, I have the impression that many also think that such
a process should not repeated with the next feature. There is an
advantage of operating at the leading edge but lack of experience
can also be quite harmful if the result is cast into stone in the
form of an international industry standard.
Read between the lines: 'We know that making STL the C++ Standard
library was a huge mistake. But we'll never admit it.'

I don't believe that is what Dietmar meant, and I don't believe there are
many who would agree with your assessment. People such as Stroustrup, and
Josuttis are very candid in admitting there are shortcomings in C++. There
is room for improvement in both the core language, and the library. There
are also some things that should have been done differently, but are not
likely to change because they are now part of the foundation.

One big advantage to the STL is that it is highspeed and low overhead. It's
really not /that/ hard to understand, and it is not a requirement. Using it
requires a different kind of thinking than using comperable Java
constructs, but when you get the hang of it, you can write some very
concise and powerful code.

C++ and the Standard Library do not strive to be a complete SDK. They are
intended to be the foundation upon which an comprehensive SDK can be built.
There are many things I don't like about C++, and, in particular the lack
of standardization in practices such as file naming, and directory
organization. For the most part, I'd like to see the CPP get deep-sixed.

I would like to hear the opinion of anybody reading this who has extensive
experience working with Java as to whether Java provides a more coherent
means of locating program entities. In Java that basically means classes
and packages.

But I really don't have any major complaints about the Standard Library.
There are a lot of minor problems. For example, I would really like to see
the I/O flag mechanism wrapped up in classes, or even little namespaces. I
would like to see the entire library segmented into namespaces as well.

>> STL is mostly about interoperability.
>
> Many people will claim performance is also one of the primary advantage
> of compile time polymorphism.

Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call.

Haha, the myth of the 'overhead' of a virtual function call has been
repeated for 20 years. It's usually spread by the same people who
recommend std::vector<std::string> > or boost::shared_ptr (which even
unnecessarily allocate lot of memory on the heap).

I've read the test numbers for some cases. I don't know if they've changed,
or how much they've changed, but I've seen numbers ranging from twice to 8
times the time cost for a virtual function call compared to a static call.
Often this is negligible, and not worth taking into consideration. For
example, if it is a function that executes once when a window is created,
or a program is loaded, it probably doesn't make an observable difference.
OTOH, if you are trying to compute a large spreadsheet, or process complex
3D graphics, that time can start to really matter. And these are the areas
where the STL is most suited to.
Wrong question. I'm not fond of Java collections and iterators. But the
right question is: 'Which is more usable?'. You find many STL-related
questions in C++ newsgroups but only few questions related to
collections and iterators in Java groups.
I agree that Java-style iterators have some modest syntactictic advantages
over C++ style iterators. I really hope the new for() semantics are
approved for C++0X. I know a lot of people have the hots for Boost.Lambda,
and see the new for() semantics as a challenge to its attractiveness. All
I can say is that the foreach construct (the new for() semantics) is
intuitively obvious to me, and will not take more than a few seconds to
learn.

There may be real expressive power here, but C++ already takes about a solid
year of very hard work to gain a basic understanding of the core language
and library. I would have to spend a fair amount of time playing with this
before I could use it well:
http://www.boost.org/doc/html/lambda/le_in_details.html

And even then, I wonder if the code will be as understandable as more
traditional code using a foreach construct.

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Means, iterator and container are overlapping concepts in STL. You
could drop containers.

No, containers provide more than what is associated with their corresponding
iterators. They provide allocation management, some safe access through
at(), information about their state such as size(), a means of
communicating standard structure and behavior, automatic sorting (in some
cases), and probably other advantages I can't think of right now.
... and a way to further comliplicate the C++ language to satisfy the
request of a few academics. Sorting and searching are the only
'algorithms' one needs. The rest is better done with iterators and an
'old-fashioned' for-loop.

Again, I disagree. Though most of the 53 (IIRC) STL algorithms are for
searching and sorting, set_union, set_intersection, set_difference,
generate, unique, fill, replace, count, transform, copy, merge, the heap
algorithms etc. are all potentially useful.

It is a relatively primitive
use but it would drastically ease the use STL algorithms. It is
related to functional programming in this use, although real
lambda expressions would probably allow at least some forms of
functional programming.

Take up time! Don't finish the next C++ Standard before 2034 after
having extended and 'enhanced' the language beyond recognizability and,
of course, after having included the whole Boost (5 times the size of
today) into the next Standard library. In the meantime let us work with
the 'manageable mess' called C++98 and reach a secure place before the
next 'big thing' in C++ happens.

The changes I've seen which look like they're on the shortlist don't look
too bad to me. I'm pushing hard for your favorite item, reference counted
object support. One item that I really like, but which doesn't seem to
have a lot of support is this:

http://www.open-std.org/jtc1/sc22/wg...2004/n1671.pdf

It may seem completely esoteric to a lot of people, but to me, it addresses
a specific problem I've encountered. That is working with reference
counted function objects.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #27

Dietmar Kuehl

Steven T. Hatton wrote:

The changes I've seen which look like they're on the shortlist don't look
too bad to me. I'm pushing hard for your favorite item, reference counted
object support. One item that I really like, but which doesn't seem to
have a lot of support is this:

http://www.open-std.org/jtc1/sc22/wg...2004/n1671.pdf
[for those who don't look: this paper is about overloading operators
"." and ".*"]

My understanding is that it is generally seen as an unfortunate
omission much like template typedefs: although it was deliberately
not included the last time, there seems to be general consensus that
it should have been supported now that we have more experience with
various aspects of the language. The tricky part is probably getting
the semantics right...
It may seem completely esoteric to a lot of people, but to me, it
addresses a specific problem I've encountered. That is working with
reference counted function objects.

More generally, it provides the possibility to have transparent
proxy classes which is quite important in many cases.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #28

Steven T. Hatton

Dietmar Kuehl wrote:

Steven T. Hatton wrote:
Yes, but years later, C++ programmers are usually passionate advocate of
the STL.
Of course, they are. As far as I can tell, the majority of the C++
committee members (if not all of them) think that the introduction
of STL was the right thing to do. Still, it was and is a problematic
path in many areas. The STL is riddled with many problems, too, and
the specification is actually quite imprecise in many areas
- something which was only learned with the experience of using the
STL.

IMO, the wording of the Standard is too lawyerly. I really believe it could
be translated into English if someone who understands the intended meaning
were motivated to do so.
Although many think that the STL was the right thing to happen
at that time, I have the impression that many also think that such
a process should not repeated with the next feature. There is an
advantage of operating at the leading edge but lack of experience
can also be quite harmful if the result is cast into stone in the
form of an international industry standard.
Understood. That's why I favor a very limited, orthogonal collection of
features that really need to be standardized at the foundational level.
Some of the essentials of thread support fall into that category, and the
new move semantics and rvalue references may also be valuable additions.
Though I have not studied these closely. The unique_ptr<> looked pretty
handy when I saw it.

STL is mostly about interoperability.

Many people will claim performance is also one of the primary advantage
of compile time polymorphism.

A certain form of performance was indeed a design rule for how the
STL works (actually, the rules were quite strict in that they kind
of required that the abstraction penalty is non-existant or at
least rather low). However, this is kind of a restriction on the
design space.

I believe it may be the strongest justification for having the 'T' in STL.
The primary goal was to find an abstraction which
makes all algorithms applicable to all reasonable data structures
while writing each algorithm a small number of times.
There are a some algorithms that are not applicable to associative
containers because they rearage the elements, and would therefore break the
sorting order.
Of course,
"algorithm" is actually a misnomer in this context: something like
"solver" or "operation" would be better since the different
implementations of an "algorithms" are actually different algorithms
(in the conventional meaning of the term) solving the same problem
while taking advantage of different properties in the input (i.e.
they are based on different requirements; e.g. 'distance(beg, end)'
actually counts the number of elements for all iterator categories
unless the iterators are random access iterators in which case the
function just uses a substraction: these are two different
algorithms solving the same problem).
I'd go with "operations", "solver" sounds like something out of the
marketing department. :)
Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call. Of course, where reasonable, run-time
polymorphism can still be used. On the other hand, the concepts
used by STL are rather small operations and the ration between
computations done by a single variation point
I'm not sure what you mean by variation point.
and a possible
function call overhead is typically rather bad, i.e. when using
STL concepts you are better off not also using run-time
polymorphism.
This is why I have serious questions about mem_fun. It is typically
implemented using a pointer to member function, which according to what I
was able to glean from thumbing through _Inside_The_C++_Object_Model_ has
fairly bad performance.
When using run-time polymorphism you want your
variations points to do more computations, i.e. you would fold
multiple operations into one call. This can be witnessed by the
radically different strategies used in STL and sequence accesses
in "purely object-oriented languages" (like Java or C#). To access
a single element in a sequence STL [normally] uses three operations:

- a check whether the current position indicates that processing
has completed ("current == end");
- access to the current element ("*current")
- moving to the next element ("++current")

In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.
But this is quite easy to implement in C++, I've even done it. The naive
approach uses postfix operators, so may be slightly less efficient than the
STL form, but I suspect the difference is negligible since I maintain an
internal temp object to return as a reference.

Now *_this_* is elegant:

http://www.open-std.org/JTC1/SC22/WG...005/n1796.html

for( int i : vec ) std::cout << i;
Now, which is better? If you don't pay for calling multiple
functions, the former is more desirable because it offers more
flexibility and allows for smaller points of variability (although
it is still too course-grained...).
With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.
This approach also plays well
with having a hierarchy of concepts which may take advantage of
special features although only a minimalist requirement is made
(which in turn caters for some very powerful optimizations because
this can take advantage of special structures in the underlying
data structure).
I don't follow. What difference does it make whether I type the following,
or I leave it to the compiler to 'type' it for me:

using std::begin; // enable ADL
using std::end; // ditto
for( auto __begin = begin(vec),
__end = end(vec);
__begin != __end; ++__begin )
{
int i = *__begin;
std::cout << i;
}?

On the other hand, if each operation incurs a function call or at
least chops up your optimization blocks, the extra benefit is
clearly dwarfed by additional costs. Thus, in "purely
object-oriented languages" (i.e. in languages which incur arbitrary
restrictions without any gain but with, IMO, huge losses)
an infe^H^H^H^H^H^H different approach is used which reduces the
costs (the pros) and the possibilities (the cons).

Again, it you are intending Java, this will not necessarily apply. Java
does have some low-level optimization for its array object which is not
implemented in the same way is its other collections.

Thus it is only consistent to
provide facilities which allow interoperability with object oriented
features.

I need to get around to running some tests on algorithms using mem_fun
verses using for loops and static invocation.

You might want to also have a look at TR1's "mem_fn" which is
also available from Boost.

I'd have to study it a bit more. I don't believe it addresses the
performance issue I suspect exists with std::mem_fun; If I understand
correctly, it is something of a combination of std::mem_fun,
std::mem_fun_ref, and an extension (not really a generalization) of
std::binary_function. It seems to me that having overloaded operator.()
and operator.*() would be more useful for the kinds of problems I've
recently encountered, and would seem to be more general:

http://www.open-std.org/jtc1/sc22/wg...2004/n1671.pdf

The hardest thing for me to deal with when it comes to generic
programming has been the fact that the concept is often not clear from
the context.

Yes, this is indeed a recognized problem. There are proposals to
add some form of explicit concepts to the C++ core language. The
major discussion is currently between two competing views on how
the details should look. The basic idea is to somehow describe a
"concept" which is essentially a set of operations used by
algorithms and provided by the entities passed as arguments to
these algorithms. The biggest argument is whether the arguments
have to specify more or less explicitly that they implement a
concept or if this should be implicitly determined by the compiler
based on the presence of operations for the arguments. You might
want to search the papers of the last few meetings if you are
interested in details.

So now we have meta-types to deal with the type checking of our "pure
abstractions". At some point people are going to realize the meta-types
can be grouped into useful categories, and will want to check if that fit
into these categories. In that event we will have meta-meta-types. There
are tow ways of dealing with Russell's paradox, accept that languages with
selfreference allow for the formulation of selfcontradictory statements, or
insist that sets cannot contain other sets, only meta-sets can contain
sets, and only meta-meta-sets can contain sets....

Apparently, you haven't understood what STL is about ;-) There are just
a few kind of entities which are completely described by concepts with
the central concept (group) being the iterators.

I was considering the iterators to be integral to the containers.

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Agreed, but that isn't very useful in understanding what's in the STL, and
how it works in general. Your example seems like the example of a function
object which is an example of a useful class type that has no data members
or state. I argue that the fundamental concept of OO is having user
defined types containing data and having functions bound to these types to
operate on the data. When I presented that argument, it was challenged by
using the example of a function object. My response is that a function
object is a degenerate case of the general construct I was describing.

I almost used the word "collection" because, to me, the STL "containers"
really aren't containers. This is especially true of vector and deque. I
would not even use the term "sequence". To me, "sequence" suggests the
elements under discussion have some inherent ordering.

They have, indeed! In fact, the iteration over a sequence just
shows how the elements are ordered [currently]. However, the
element order may depend on other aspects than the element values.
For example, it may depend on the insertion order instead. I think,
this pretty accurately reflects typical definitions of "sequence".
See, for example, <http://en.wikipedia.org/wiki/Sequence>.

But out can leave the collection unchanged, an iterate over in a completely
different order. For example, you could create an iterator that traverses
a binary tree using depth-first traversal, and another using breadth-first
traversal. In this case we do have something of a container, but the order
of traversal is external to it.

I was actually thinking in more general terms of being able to access the
typedefs inside a class through an instance.

I realized this. However, at least in the context of STL this need
should not arise at all...

Are you suggesting I should neve have to explicitly instantiate iterators?

I'm pretty sure the way lambda expressions will work is similar to the
way
Mathematica works. That is significantly different than the way
traditional C++ works. Functional programming is not an intuiteve art
for
me. I've done some fun stuff with it, but it usually requires more
careful thought than writing out a proceedural block of code.

Well, I consider the ability to inject portions of code into a
templatized function taking operations as arguments as an
application for lambda functions. It is a relatively primitive
use but it would drastically ease the use STL algorithms. It is
related to functional programming in this use, although real
lambda expressions would probably allow at least some forms of
functional programming.

My problem with Boost.Lambda is that it introduces additional syntax into
the language, and doesn't appear to provide much in the way of useful
additional functionality. It seems the entire effort is to find a way of
shoving more complex functions into the third parameter of std::for_each.
Perhaps I'm wrong about this, but if I had to chose between for( int i :
vec ) std::cout << i; and Boost.Lambda for inclusion in C++0X, I'd go with
the foreach construct.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #29

Dietmar Kuehl

Steven T. Hatton wrote:

Dietmar Kuehl wrote:
IMO, the wording of the Standard is too lawyerly. I really believe it
could be translated into English if someone who understands the intended
meaning were motivated to do so.
The standard is part of a contract between the user of a C++ tool (e.g.
a compiler) and the implementer of said tool. Contracts tend to be
lawyerly. The problem is that the two parties agreeing on the contract
have radically different views of its content and the specification has
to cater for both. At the same time it should be minimalist, consistent,
and complete. I doubt that translating it into English would be a useful
enterprise, not to mention a futile one (the real standard will stay in
more or less the form it currently is.

A certain form of performance was indeed a design rule for how the
STL works (actually, the rules were quite strict in that they kind
of required that the abstraction penalty is non-existant or at
least rather low). However, this is kind of a restriction on the
design space.

I believe it may be the strongest justification for having the 'T' in STL.

Well, the strongest justification for using compile-time polymorphism
is that this allows the search for the best algorithm. Also, template
provide a mechanism which allows dispatch based on multiple participant,
something only few systems support with run-time polymorphism. In any
case, the performance benefit is beyond merely avoiding a few virtual
functions: the fact that e.g. the result type of an operation is
statically known but can still vary with different instantiations is
rather powerful.

The primary goal was to find an abstraction which
makes all algorithms applicable to all reasonable data structures
while writing each algorithm a small number of times.

There are a some algorithms that are not applicable to associative
containers because they rearage the elements, and would therefore break
the sorting order.

Of course, but you would not expect them to be applicable anyway! This
is what the term "reasonable" refers to in the above statement: you
cannot apply all sequence algorithms to all sequences. E.g. you cannot
sort a sequence exposed by input iterators. In general, the required
concepts should be such that the compiler issues an error message. For
example, trying to sort an associative container will fail to compile
because the elements in the sequence are not assignable (their first
member is const-qualified).

Of course,
"algorithm" is actually a misnomer in this context: something like
"solver" or "operation" would be better since the different
implementations of an "algorithms" are actually different algorithms
(in the conventional meaning of the term) solving the same problem
while taking advantage of different properties in the input (i.e.
they are based on different requirements; e.g. 'distance(beg, end)'
actually counts the number of elements for all iterator categories
unless the iterators are random access iterators in which case the
function just uses a substraction: these are two different
algorithms solving the same problem).

I'd go with "operations", "solver" sounds like something out of the
marketing department. :)

I doubt that the term for algorithms is changed in this context, however.
"Operation" would be a better fit from my perspective but I'm not a
native English speaker.

Minimizing the abstraction penalty actually excludes the use of
run-time polymorphism for the general concept because run-time
polymorphism invariably has some abstraction penalty, e.g. because
the resulting blocks fed to the optimizer are chopped up at each
lately bound function call. Of course, where reasonable, run-time
polymorphism can still be used. On the other hand, the concepts
used by STL are rather small operations and the ration between
computations done by a single variation point

I'm not sure what you mean by variation point.

Essentially, something which can be changed when using a polymorph
implementation. In statically typed object-oriented languages the
virtual functions are the variation points. With static polymorphism
the operations implementing a concept are the variation points, e.g.
operator++(), operator*(), and operator==() for iterators (this is
an incomplete list, however).

and a possible
function call overhead is typically rather bad, i.e. when using
STL concepts you are better off not also using run-time
polymorphism.

This is why I have serious questions about mem_fun. It is typically
implemented using a pointer to member function, which according to what I
was able to glean from thumbing through _Inside_The_C++_Object_Model_ has
fairly bad performance.

Does it? When I measured using pointer to member functions I had
differing results depending on how the pointer was actually used.
However, it is indeed likely that mem_fun has no real choice to do
a good job. If the member [function] pointer is turned into a
template parameter, the function can even be inlined. On the other
hand, if it is turned into a variable, this chance is reduced
substantially. I haven't measured this effect lately, though. It is
well possible that compiler are smart enough to detect that the
pointer does not change at all.

In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.

But this is quite easy to implement in C++, I've even done it.

Many people have done so before STL was introduced! ... and they all
failed to deliver the performance and flexibility STL delivers.
Actually, the generally failed to deliver general algorithms in the
first place!

One of the key feature of the STL approach is that algorithms are
accepting iterators with minimal requirements (typically iterators
are only required to be input or output iterators) but use an
algorithm best suited for the iterators actually used. For example,
'distance()' merely requires that the user passes a pair of input
iterators and it just moves through the sequence counting the number
of steps. However, if the user actually passes a pair of random
access iterators, the operation becomes a mere substraction which
tends to be substantially faster.

You cannot do a similar thing with enumerators using run-time
polymorphism: the interface is only know to be an enumerators not
some extended interface. Well, you could (at run-time) test for
more powerful derived interfaces but the whole point is to improve
the performance not to kill it.

There are other issues, too. For example, if you combine the three
operations (advance, access, and test) into one operation you need
to perform all three even if you only need a subset. For example,
'distance()' never accesses the value. An algorithm using loop
unrolling only tests relatively few steps, not ever step and does
not necessarily advance the iterator in each step but might step
forward multiple elements at once. From a practical point of view,
separating access and test also has the advantage that it is not
necessary to have a dedicated null value for each type. This in
turn allows for value semantics which is more general than reference
semantic (where you get an obvious null value): if you want
references you just use a proxy which behaves as if it is a value
(here overloading operator.() would be handy...). This is not
possible the other way around, i.e. you cannot simulate value
semantics if your basis is implemented to use reference semantic.
The naive
approach uses postfix operators, so may be slightly less efficient than
the STL form, but I suspect the difference is negligible since I maintain
an internal temp object to return as a reference.

Now *_this_* is elegant:

http://www.open-std.org/JTC1/SC22/WG...005/n1796.html

for( int i : vec ) std::cout << i;
I disagree. I would prefer something like this

for_each(vec, std::cout << _1);

Assuming a suitable implementation of 'for_each()' and a using
statement for an appropriate namespace this is already achievable
with current C++. However, if the operation becomes more complex
than an expression, it kind of stops working. In this case language
level lambda support becomes necessary. This yields a more flexible
and more general approach than the rather limited proposal of the
above paper. Of course, the above notation would introduce a minor
semantic difference to your statement: the element type of 'vec' is
used to in the print expression. This is easily fixed assuming
language level support of lambdas:

for_each(vec, std::cout << int(_1))

if something akin to Boost::Lambda is built-in or

for_each(vec, { std::cout << int(_1); });

if braces are necessary to signal use of a lambda expression.

Now, which is better? If you don't pay for calling multiple
functions, the former is more desirable because it offers more
flexibility and allows for smaller points of variability (although
it is still too course-grained...).

With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.

I doubt that 'toArray()' has O(1) performance on doubly linked lists.
Thus, it is dead in the water: effectively, it either has to copy the
list contents into an array, possibly using some form of proxies to
avoid at least copying the whole content, or it has to use iteration
to move freely within an array. Both cases kill the performance,
especially when using only a small fraction of the sequence as is
quite likely. If you don't care about performance, feel free to use
an approach akin to 'toArray()'. Or you may use vectors all the time.
In all project I was involved performance *was* an issue. Not the
primary one but I always had to tune the software to make it run fast
enough. Deliberately killing efficiency has never been an option.

This approach also plays well
with having a hierarchy of concepts which may take advantage of
special features although only a minimalist requirement is made
(which in turn caters for some very powerful optimizations because
this can take advantage of special structures in the underlying
data structure).

I don't follow. What difference does it make whether I type the following,
or I leave it to the compiler to 'type' it for me:

using std::begin; // enable ADL
using std::end; // ditto
for( auto __begin = begin(vec),
__end = end(vec);
__begin != __end; ++__begin )
{
int i = *__begin;
std::cout << i;
}?

Since formatting the output for 'std::cout' and buffering the
result is the bottleneck in this example, it does not make much of
a difference. However, the compiler might write something rather
different instead:

using std::begin; // this is a neat trick; I should remember it
using std::end;
auto _B = begin(vec);
auto _E = end(vec);
for (; 2 <= _E - _B; _B += 2)
std::cout << int(_B[0]) << int(_B[1]);
for (; _B != _E; ++_B)
std::cout << int(*_B);

Of course, instead of "2" the compiler would rather insert "16"
or something depending on the size of the type and provide a
corresponding amount of operations. There are other optimizations
possible, too, even for such a trivial operation! Of course,
things like 'std::copy()' provide even more alternatives.

On the other hand, if each operation incurs a function call or at
least chops up your optimization blocks, the extra benefit is
clearly dwarfed by additional costs. Thus, in "purely
object-oriented languages" (i.e. in languages which incur arbitrary
restrictions without any gain but with, IMO, huge losses)
an infe^H^H^H^H^H^H different approach is used which reduces the
costs (the pros) and the possibilities (the cons).

Again, it you are intending Java, this will not necessarily apply. Java
does have some low-level optimization for its array object which is not
implemented in the same way is its other collections.

I consider it rather unlikely that you are always using array
objects and if you don't you incur a fair performance hit by
copying the content of your container into an array already even
if only references to the actual objects are copied. STL always
operates on the actual sequence and does so quickly if the iterators
are reasonably implemented. It is not by accident that Java does not
ship with an extensive algorithms library.
So now we have meta-types to deal with the type checking of our "pure
abstractions". At some point people are going to realize the meta-types
can be grouped into useful categories, and will want to check if that fit
into these categories. In that event we will have meta-meta-types. There
are tow ways of dealing with Russell's paradox, accept that languages with
selfreference allow for the formulation of selfcontradictory statements,
or insist that sets cannot contain other sets, only meta-sets can contain
sets, and only meta-meta-sets can contain sets....
I don't think you want concepts on meta types. On the other hand, people
doing meta programming might disagree... I still think that concepts can
provide some benefit when done correctly. They can also incur loads of
overhead with no benefit if done wrong. From my perspective it is still
open whether the correct or the wrong approach is chosen...

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Agreed, but that isn't very useful in understanding what's in the STL, and
how it works in general.

I think the understanding the concepts is the only viable route to
understand what STL is about in the first place. What's in there actually
become secondary once you have understood the basic principle. Applying
STL concepts to rather strange entities helps in understanding what STL
is about, IMO: who would bother to implement algorithms on more or less
arbitrary sequences? Well, STL readily provides them and the user can
plug in more algorithms applicable to all sequences and, of course,
arbitrary new sequences. The only restriction on the definition of the
sequence is that depending on its capabilities you might get only certain
subsets of algorithms: again this "reasonable"-thing mentioned above.

They have, indeed! In fact, the iteration over a sequence just
shows how the elements are ordered [currently]. However, the
element order may depend on other aspects than the element values.
For example, it may depend on the insertion order instead. I think,
this pretty accurately reflects typical definitions of "sequence".
See, for example, <http://en.wikipedia.org/wiki/Sequence>.

But out can leave the collection unchanged, an iterate over in a
completely
different order.

You will find that you have to iterate over a sequence in always the
same order unless you can iterator over it only once, i.e. if is
accessed by an input or an output iterators. In all cases, the sequence
stays the same whenever you iterate over it (unless you change it
explicitly by means outside the iteration).
For example, you could create an iterator that traverses
a binary tree using depth-first traversal, and another using breadth-first
traversal. In this case we do have something of a container, but the
order of traversal is external to it.
Note that you got two different sequences here for the same container:
one for the DFS order and one for the BFS order. That is, a container
can have more than one sequence associated with it. Well, as stated
before, the container is actually quite irrelevant to STL anyway.

I was actually thinking in more general terms of being able to access
the typedefs inside a class through an instance.

I realized this. However, at least in the context of STL this need
should not arise at all...

Are you suggesting I should neve have to explicitly instantiate iterators?

You should never have to explicitly access the iterator type. It is
often convenient to use subsequences and thus you might access the
iterators of a container (actually, with the current definition of
the STL you have to) but you should never need their type - unless
you are implementing a generic algorithm. But then, in this case you
either get the iterator types readily delivered or you should extract
the iterators from the object and pass them directly to another
generic algorithm operating on iterators. In generic programming you
rarely need the typedefs and if you do you normally want to infer
associated types, e.g. the value type of an iterator to define the
return type of an algorithm.
My problem with Boost.Lambda is that it introduces additional syntax into
the language,
Boost::Lambda is entirely implemented in terms of the current core
language with no changes made to support it. Where does it introduce
new syntax and, even more interesting, how?
and doesn't appear to provide much in the way of useful
additional functionality. It seems the entire effort is to find a way of
shoving more complex functions into the third parameter of std::for_each.
I thought you were in favor of n1796...!? *This* proposal is nothing
more than a glorified for_each! Lambdas can be used everywhere you
are using a functor. In fact, they would neatly solve your performance
problem with mem_fun(), too: Instead of 'mem_fun(foo::bar)' you would
write '_1.bar()', '_1.bar(_2)', ... You'd need to be consistent with
your desires and look from a more general perspective than just solving
a minor quibble with a special case. Solving many minor quibbles with
just one general solution is much more desirable than having many
special cases. Lambda support *would* solve many minor problems and
avoid the need for special solutions. In fact, Boost::Lambda already
does a very good job in this direction but it cannot solve all
problems. For example, the member function issue is not readily solved
but I think it can be addressed using special classes. Of course, if
you need to implement a class for each member function the benefit is
dwarfed by the needed effort.
Perhaps I'm wrong about this, but if I had to chose between for( int i :
vec ) std::cout << i; and Boost.Lambda for inclusion in C++0X, I'd go with
the foreach construct.

You got a very limited view, IMO. I'd rather go with a general solution!
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #30

Stephen Howe

> ... and a way to further comliplicate the C++ language to satisfy the

request of a few academics. Sorting and searching are the only
'algorithms' one needs. The rest is better done with iterators and an
'old-fashioned' for-loop.

Really? Your knowledge of algorithms is extremely limited then.

nth_element(), partitition() and stable_partition() are all extremely useful
but they don't quite fit into "sorting or searching" (possibly
nth_element()).

Stephen Howe

Jul 23 '05 #31

Michael Olea

First of all, thanks for the detailed and thoughtful reply - it is much
appreciated.

Dietmar Kuehl wrote:

... as stated, you will indeed want to
split the iterator concept into two distinct concepts: one for moving
(cursors) and one for accessing data (property maps).

So for example a function that returns the "edit distance" between two
sequences - that is the minimum number of substitutions, insertions, and
deletions needed to turn one sequence into the other - might have an
interface something like:

template<typename RandomAccessCursor1, typename RandomAccessCursor2,
typename ReadablePropertyMap1, typename ReadablePropertyMap2>
unsigned int edit_distance(
RandomAccessCursor1 b1, RandomAccessCursor1, e1,
RandomAccessCursor2 b2, RandomAccessCursor2 e2,
ReadablePropertyMap1 rpm1 = identity_property_map<typename
RandomAccessCursor1::value_type>(),
ReadablePropertyMap2 rpm2 = identity_property_map<typename
RandomAccessCursor2::value_type>() );

A few first impressions:

The algorithm, a dynamic programming algorithm, does not really need random
access to the sequences, it just needs to know up front the lengths of both
sequences, which should be available in constant time, and it needs to make
multiple passes over one of the sequences - by convention say sequence two.
So after traversing sequence two it needs to go back to the start and
traverse the elements in order again. The other sequence is traversed in
order once. So "random access" is more than is really required.

There is a requirement that the types returned by rpm1 and rpm2 can be
compared using "==".

I gather that an identity_property_map has lvalue access - T& pm(*c), but
when c iterates over small elements, like char, I would want T pm(*c) - the
access occurs in the inner loop of the algorithm. For large objects of
course I would want const T& pm(*c) - so I guess the defaults above should
be something like readable_identity_property_map - an object that would
have the desired behavior.

In reality the hypothetical function "edit_distance" would be at most a
conveniance function - for a variety of reasons the real worker would be a
class, most likely a functor.

The first version of "edit_distance" I wrote long ago had the interface:

distance(const char *p, const char *q);

Then it became necessary to compare substrings, so:

distance(const char *pS, const char *pE, const char *qS, const char *qE);

Then it became necessary to compare the results of a character recognition
device, a list of structs containing first, second, and third choice
characters, each with an associated confidence level, with strings,
substrings, and each other. Then came sequences of vertices and edges in
planar maps, and now sequences of DNA, RNA, amino acids, genes...

My thesis is still
available at
<http://www.dietmar-kuehl.de/generic-graph-algorithms.pdf>. However,
there are a few things to note:
Thanks. I downloaded it despite the caveats below.

- The Boost Graph Library (BGL) by Jeremy Siek et al. is based on
very similar ideas but is completely available as a library you
can download and there is a book describing it written by Jeremy.
I'd recommend that you rather look at this.
In fact I had ordered the book before asking about your thesis - should have
it in "2 to 4 weeks". In the meantime I thought it would be useful to see
some of the history of the ideas. Also there is much documentation to read
on-line - so no shortage of study materials during those 2 to 4 weeks. I do
also have a 2 year old copy of boost, including BGL, on one of my machines.
I took a quick look - probably should get the latest...
- Since I wrote the thesis, some things evolved and the important
concepts have undergone some changes, not to mentioned renaming
(due to discussions with Jeremy and others). In particular, what
I originally called "Data Accessor" is now called "Property Map".
Also, to distinguish concepts, I still used the term iterator for
the traversal concept rather than using "Cursor".
- The currently favored concept for property maps looks quite
different, partially due to quite resent discussions with Dave
Abrahams and Doug Gregor. In short, the concept looks like this
("pm" is a property map; "c" is cursor; "v" is a value as obtained
by combining the type of "*c" with the type of "pm"):

- reading a value: v = pm(*c); // requires readable PM
- writing a value: pm(*c, v) // requires writable PM
- lvalue access: type& v = pm(*c, v); // requires lvalue PM
This looks pretty clean.

For more information on the current state of property maps, have
a look at
<http://boost-consulting.com/projects/mtl4/libs/sequence/doc/html/cursors_and_property_maps.html>

I took a quick look. Really I will have to write some code to get a better
feel for how to use them in the design of my library, but my first
impression is that it makes sense to do so.

Just as an aside - the inputs, and often the outputs, of sequence
analysis are sequences - strings - but intermediate data structures like
Suffix Trees have nodes and edges with properties that vary by algorithm.
So the property map idea looks interesting.

The original idea of property maps (then called data accessors)
arose in the context where there are several problems related to
properties associated with nodes and edges:

- The set of used properties vary widely between different algorithms
and depending on the used algorithm you want to provide them
somehow.
- Different algorithms think of the same property in different terms.
For example, what is a length to one algorithm (e.g. a path find
algorithm) could be a capacity to another algorithm (e.g. a flow
algorithm). Still, the algorithm want to communicate with each
other. This was actually the primary reason why I introduced
property maps for the graph abstraction.
- The same properties are often differently represented and
interrelated. For example, in flow networks each directed edge
often has a total capacity, a current flow, and a free capacity.
Obviously, "free capacity == capacity - current flow" hold, i.e.
it is sufficient to represent only two of the three attributes
and depending on the context you don't know which is the best
combination.

From your description it sounds as if there are similar issues for
the analysis of sequences and I guess that property maps can be a
viable to address these problems.

Yes. Several of the central data structures are various kinds of trees and
DAGS - special cases of graphs. I will definitely have to study BGL - both
the book and the code.

- Michael

Jul 23 '05 #32

Steven T. Hatton

Dietmar Kuehl wrote:

The standard is part of a contract between the user of a C++ tool (e.g.
a compiler) and the implementer of said tool. Contracts tend to be
lawyerly. The problem is that the two parties agreeing on the contract
have radically different views of its content and the specification has
to cater for both. At the same time it should be minimalist, consistent,
and complete. I doubt that translating it into English would be a useful
enterprise, not to mention a futile one (the real standard will stay in
more or less the form it currently is.
It's the ulterior motive that bothers me. To wit: built-in job security for
language lawyers. One aspect of the language specification that surprised
me is the amount of forwar reference it makes. I expected it to be more
linear. Not introducing concepts until their prerequisites were formally
presented.

I believe it may be the strongest justification for having the 'T' in
STL.

Well, the strongest justification for using compile-time polymorphism
is that this allows the search for the best algorithm. Also, template
provide a mechanism which allows dispatch based on multiple participant,
something only few systems support with run-time polymorphism. In any
case, the performance benefit is beyond merely avoiding a few virtual
functions: the fact that e.g. the result type of an operation is
statically known but can still vary with different instantiations is
rather powerful.

I believe that's what I said, even though I don't know what you just
said. :) I have much to learn yet.

I'd go with "operations", "solver" sounds like something out of the
marketing department. :)

I doubt that the term for algorithms is changed in this context, however.
"Operation" would be a better fit from my perspective but I'm not a
native English speaker.

Google for `Schroedinger ansatz'.

I'm not sure what you mean by variation point.

Essentially, something which can be changed when using a polymorph
implementation. In statically typed object-oriented languages the
virtual functions are the variation points. With static polymorphism
the operations implementing a concept are the variation points, e.g.
operator++(), operator*(), and operator==() for iterators (this is
an incomplete list, however).

I believe what you are saying is a variation point is a "branch point" the
compiler cannot analyse. My wording is probably sloppy, but I suspect the
concept is there.

This is why I have serious questions about mem_fun. It is typically
implemented using a pointer to member function, which according to what I
was able to glean from thumbing through _Inside_The_C++_Object_Model_ has
fairly bad performance.

Does it? When I measured using pointer to member functions I had
differing results depending on how the pointer was actually used.
However, it is indeed likely that mem_fun has no real choice to do
a good job. If the member [function] pointer is turned into a
template parameter, the function can even be inlined. On the other
hand, if it is turned into a variable, this chance is reduced
substantially. I haven't measured this effect lately, though. It is
well possible that compiler are smart enough to detect that the
pointer does not change at all.

It's really a question of what the compiler can deterministally know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection, etc)
can be eliminated. If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".

In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.

But this is quite easy to implement in C++, I've even done it.

Many people have done so before STL was introduced! ... and they all
failed to deliver the performance and flexibility STL delivers.
Actually, the generally failed to deliver general algorithms in the
first place!

What I meant is that I've done it starting with STL iterators. It's really
quite simple. There is one qualification, and I would have to check the
Java documentation to see exactly how Java hadels this: rather than
checking next()==null, I used bool hasNext(); It's a straightforward
wrapper around operator++(int);
One of the key feature of the STL approach is that algorithms are
accepting iterators with minimal requirements (typically iterators
are only required to be input or output iterators) but use an
algorithm best suited for the iterators actually used. For example,
'distance()' merely requires that the user passes a pair of input
iterators and it just moves through the sequence counting the number
of steps. However, if the user actually passes a pair of random
access iterators, the operation becomes a mere substraction which
tends to be substantially faster.
From my readding of TC++PL, getting the best performance is often obtained
by using the member function for a particular container template. But it
does seem reasonable that the implementation could detect the container
type, and chose the best algorithm for it.
You cannot do a similar thing with enumerators using run-time
polymorphism: the interface is only know to be an enumerators not
some extended interface. Well, you could (at run-time) test for
more powerful derived interfaces but the whole point is to improve
the performance not to kill it.

There are other issues, too. For example, if you combine the three
operations (advance, access, and test) into one operation you need
to perform all three even if you only need a subset. For example,
'distance()' never accesses the value. An algorithm using loop
unrolling only tests relatively few steps, not ever step and does
not necessarily advance the iterator in each step but might step
forward multiple elements at once. From a practical point of view,
separating access and test also has the advantage that it is not
necessary to have a dedicated null value for each type. This in
turn allows for value semantics which is more general than reference
semantic (where you get an obvious null value): if you want
references you just use a proxy which behaves as if it is a value
(here overloading operator.() would be handy...). This is not
possible the other way around, i.e. you cannot simulate value
semantics if your basis is implemented to use reference semantic.
It seems your last sentence is the real distinction you are trying to make.
If I have value semantics at the fundamental level, I can probably find a
way for the compiler to evaluate my code having all three operations
combined in one, and to omit unused operations from the object code.

The naive
approach uses postfix operators, so may be slightly less efficient than
the STL form, but I suspect the difference is negligible since I maintain
an internal temp object to return as a reference.

Now *_this_* is elegant:

http://www.open-std.org/JTC1/SC22/WG...005/n1796.html

for( int i : vec ) std::cout << i;

I disagree. I would prefer something like this

for_each(vec, std::cout << _1);

Assuming a suitable implementation of 'for_each()' and a using
statement for an appropriate namespace this is already achievable
with current C++. However, if the operation becomes more complex
than an expression, it kind of stops working. In this case language
level lambda support becomes necessary. This yields a more flexible
and more general approach than the rather limited proposal of the
above paper. Of course, the above notation would introduce a minor
semantic difference to your statement: the element type of 'vec' is
used to in the print expression. This is easily fixed assuming
language level support of lambdas:

for_each(vec, std::cout << int(_1))

if something akin to Boost::Lambda is built-in or

for_each(vec, { std::cout << int(_1); });

if braces are necessary to signal use of a lambda expression.
Please note what the first paragraph says about anonymous inner classes in
Java:

http://mindprod.com/jgloss/anonymousclasses.html

I know these are not identical to lambda expressions, but they fall into the
same category. I'm not really sure what it means to have lambda support in
the core language, but I believe this area of programming is not something
that would help any but advanced programmers working is certain specialized
areas. The foreach construct is something that would help a firstyear
computer science major get his or her homeword done more quickely and with
fewer errors.

Now, which is better? If you don't pay for calling multiple
functions, the former is more desirable because it offers more
flexibility and allows for smaller points of variability (although
it is still too course-grained...).

With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.

I doubt that 'toArray()' has O(1) performance on doubly linked lists.

I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure". I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.
Thus, it is dead in the water: effectively, it either has to copy the
list contents into an array, possibly using some form of proxies to
avoid at least copying the whole content, or it has to use iteration
to move freely within an array. Both cases kill the performance,
especially when using only a small fraction of the sequence as is
quite likely. If you don't care about performance, feel free to use
an approach akin to 'toArray()'. Or you may use vectors all the time.
In all project I was involved performance *was* an issue. Not the
primary one but I always had to tune the software to make it run fast
enough. Deliberately killing efficiency has never been an option.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless that's
what you chose to put in them).

The closest thing C++ has is Josuttis's boost:array<> which is a bit more
cumbersome to work with, and is not dynamic.

I don't follow. What difference does it make whether I type the
following, or I leave it to the compiler to 'type' it for me:

using std::begin; // enable ADL
using std::end; // ditto
for( auto __begin = begin(vec),
__end = end(vec);
__begin != __end; ++__begin )
{
int i = *__begin;
std::cout << i;
}?

Since formatting the output for 'std::cout' and buffering the
result is the bottleneck in this example, it does not make much of
a difference. However, the compiler might write something rather
different instead:

using std::begin; // this is a neat trick; I should remember it
using std::end;
auto _B = begin(vec);
auto _E = end(vec);
for (; 2 <= _E - _B; _B += 2)
std::cout << int(_B[0]) << int(_B[1]);
for (; _B != _E; ++_B)
std::cout << int(*_B);

Of course, instead of "2" the compiler would rather insert "16"
or something depending on the size of the type and provide a
corresponding amount of operations. There are other optimizations
possible, too, even for such a trivial operation! Of course,
things like 'std::copy()' provide even more alternatives.

I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?) over the foreach construct. This may be the
best counterargument to introducing the foreach construct. It might
encourage people write suboptimal code. OTOH, it might also encourage
people to write C++ code in the first place.

Again, it you are intending Java, this will not necessarily apply. Java
does have some low-level optimization for its array object which is not
implemented in the same way is its other collections.

I consider it rather unlikely that you are always using array
objects and if you don't you incur a fair performance hit by
copying the content of your container into an array already even
if only references to the actual objects are copied. STL always
operates on the actual sequence and does so quickly if the iterators
are reasonably implemented. It is not by accident that Java does not
ship with an extensive algorithms library.

Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:

http://java.sun.com/j2se/1.5.0/docs/api/index.html

So now we have meta-types to deal with the type checking of our "pure
abstractions". At some point people are going to realize the meta-types
can be grouped into useful categories, and will want to check if that fit
into these categories. In that event we will have meta-meta-types.
There are tow ways of dealing with Russell's paradox, accept that
languages with selfreference allow for the formulation of
selfcontradictory statements, or insist that sets cannot contain other
sets, only meta-sets can contain sets, and only meta-meta-sets can
contain sets....

I don't think you want concepts on meta types.

If you are understanding my comments as a criticism that would be exceeding
my intent. I was merely mocking the ultimate futility of any effort at
creating a perfectly consistent and complete language.
On the other hand, people
doing meta programming might disagree... I still think that concepts can
provide some benefit when done correctly. They can also incur loads of
overhead with no benefit if done wrong. From my perspective it is still
open whether the correct or the wrong approach is chosen...
As understand what you were saying, there are two approaches, trust the
programmer to tell you his or her template is implements a certain concept,
or verify it explicitly. On suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.

This is a gotcha one way or the other. If you don't check for complete
implementation of a concept, some of your code could have errors in it, and
yet the program it appears in will still compile. For instance a function
defined for a class template but never used in an instantiation.

It is necessary that a container is accessible view iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.

Agreed, but that isn't very useful in understanding what's in the STL,
and how it works in general.

I think the understanding the concepts is the only viable route to
understand what STL is about in the first place. What's in there actually
become secondary once you have understood the basic principle. Applying
STL concepts to rather strange entities helps in understanding what STL
is about, IMO: who would bother to implement algorithms on more or less
arbitrary sequences? Well, STL readily provides them and the user can
plug in more algorithms applicable to all sequences and, of course,
arbitrary new sequences. The only restriction on the definition of the
sequence is that depending on its capabilities you might get only certain
subsets of algorithms: again this "reasonable"-thing mentioned above.

I was talking about a first level working understanding, not a more complete
and comprehensive one. I will observe that this discussion helped solidify
my understanding that iterators are not "part of" the container, but are
instead instances of types bound to the containers through typedefs.
You will find that you have to iterate over a sequence in always the
same order unless you can iterator over it only once, i.e. if is
accessed by an input or an output iterators. In all cases, the sequence
stays the same whenever you iterate over it (unless you change it
explicitly by means outside the iteration).

Ah! But what do you really mean by sequence in this case? They are ordered
mappings over an unordered collection.

For example, you could create an iterator that traverses
a binary tree using depth-first traversal, and another using
breadth-first
traversal. In this case we do have something of a container, but the
order of traversal is external to it.

Note that you got two different sequences here for the same container:
one for the DFS order and one for the BFS order. That is, a container
can have more than one sequence associated with it. Well, as stated
before, the container is actually quite irrelevant to STL anyway.

That's my point. The actual collection is not a sequence. The sequence is
imposed on it as some kind of mapping.

My problem with Boost.Lambda is that it introduces additional syntax into
the language,

Boost::Lambda is entirely implemented in terms of the current core
language with no changes made to support it. Where does it introduce
new syntax and, even more interesting, how?
and doesn't appear to provide much in the way of useful
additional functionality. It seems the entire effort is to find a way of
shoving more complex functions into the third parameter of std::for_each.

I thought you were in favor of n1796...!? *This* proposal is nothing
more than a glorified for_each! Lambdas can be used everywhere you
are using a functor. In fact, they would neatly solve your performance
problem with mem_fun(), too: Instead of 'mem_fun(foo::bar)' you would
write '_1.bar()', '_1.bar(_2)', ... You'd need to be consistent with
your desires and look from a more general perspective than just solving
a minor quibble with a special case. Solving many minor quibbles with
just one general solution is much more desirable than having many
special cases. Lambda support *would* solve many minor problems and
avoid the need for special solutions.

But it needs to be comprehensible to be usable. For the beginner in C++
that will not be the case. The foreach construct is applicable to every 3D
program I have ever written. I might be able to fold all that into
superior lambda expressions, but only after I learn to use them, and I have
too much other stuff to learn before I try.

Perhaps I'm wrong about this, but if I had to chose between for( int i :
vec ) std::cout << i; and Boost.Lambda for inclusion in C++0X, I'd go
with the foreach construct.

You got a very limited view, IMO. I'd rather go with a general solution!

I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language required
to make it fully functional is the only argument I can see for wanting it
in C++0X. If it were just a library, I'd say let it be provided by and for
the people who understand it and want it.

--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #33

Steven T. Hatton

Steven T. Hatton wrote:

Dietmar Kuehl wrote:

I consider it rather unlikely that you are always using array
objects and if you don't you incur a fair performance hit by
copying the content of your container into an array already even
if only references to the actual objects are copied. STL always
operates on the actual sequence and does so quickly if the iterators
are reasonably implemented. It is not by accident that Java does not
ship with an extensive algorithms library.

Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:

http://java.sun.com/j2se/1.5.0/docs/api/index.html

Whoops! Damn frames!
http://java.sun.com/j2se/1.5.0/docs/...reference.html
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #34

P.J. Plauger

"Steven T. Hatton" <ch********@germania.sup> wrote in message
news:UM********************@speakeasy.net...

Dietmar Kuehl wrote:
The standard is part of a contract between the user of a C++ tool (e.g.
a compiler) and the implementer of said tool. Contracts tend to be
lawyerly. The problem is that the two parties agreeing on the contract
have radically different views of its content and the specification has
to cater for both. At the same time it should be minimalist, consistent,
and complete. I doubt that translating it into English would be a useful
enterprise, not to mention a futile one (the real standard will stay in
more or less the form it currently is.
It's the ulterior motive that bothers me. To wit: built-in job security
for
language lawyers.

Excuse me? You've been campaigning for the past week or so to get
the C++ committee to do work that, so far, only *you* want to
have done. You've been told consistently that:

a) the committee is overworked as it is

b) if you want to get something done you have to do the work and
make a formal proposal

Please notice that not once has any committee member said, "Whoopee,
another thing to do to justify my continued work on the C++
Standard! Where do I begin?"

And now you accuse the committee, or at least parts of it, of
obfuscating the C++ Standard to ensure job security? To get this
past the moderators, I'll simply say ___________________________!
You can fill in the blanks yourself.
One aspect of the language specification that surprised
me is the amount of forwar reference it makes. I expected it to be more
linear. Not introducing concepts until their prerequisites were formally
presented.

One cure for your surprise is to sit down and do the exercise.
Try restructuring the C++ Standard to remove all those forward
references you didn't expect. You might learn something.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Jul 23 '05 #35

Dietmar Kuehl

P.J. Plauger wrote:

And now you accuse the committee, or at least parts of it, of
obfuscating the C++ Standard to ensure job security? To get this
past the moderators, I'll simply say ___________________________!
You can fill in the blanks yourself.

Note that comp.lang.c++ is not moderated, i.e. there is no moderator
to blame for approving this article... (although I probably would
have approved this article if it had been posted to clc++m).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #36

Dietmar Kuehl

Steven T. Hatton wrote:

Dietmar Kuehl wrote:
It's the ulterior motive that bothers me. To wit: built-in job security
for language lawyers.
I'm not aware of anybody in the committee who has such "ulterior motive."
I certainly have not. As far as I can tell, everybody is doing their best
to maintain an efficient, flexible, and highly usable language. The fact
that the standard is not easy to follow derives from trying to avoid
complications by stating things only once and thereby avoiding
contradictions (which are still a problem). There is also the problem
that it is hard enough to push a change to the standard which enhances
the language definition. I'm almost 100% certain that it is even much
harder to move a change forward which supposedly only makes the standard
easier to read (for a certain fraction of the readers): Guaranteeing that
a change really does what is intended is pretty hard and we are all the
time busy fixing up oversights. I doubt that anybody is really willing to
accept the danger of breaking something just to make it easier to
understand! In fact, we even know about certain areas where there are
problems with the current definition (not only with respect to
understanding them but also what they are saying) and nobody dares
touching them!

.... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees has a more
than average grasp of software development and thus does not really run
the risk of staying unemployed for extensive periods of time. And this
is not at all due to C++ only: I know of several committee members who
earn their money programming primarily in languages different from C++
(e.g. C, Java, or C#).
One aspect of the language specification that surprised
me is the amount of forwar reference it makes. I expected it to be more
linear. Not introducing concepts until their prerequisites were formally
presented.
Although this approach is desirable, it is not always possible in the
first place. In addition, this educational goal also has severe
drawbacks: currently, the standard is organized according to different
topics rather than any order imposed by trying to avoid forward
references. This makes it easy to locate specific sections. Since the
standard is not even intended to be read front to back, this organization
is much more reasonable. Also note that the standard is not intended as a
tutorial. It is intended as part of a contract and anybody determined
enough can learn to use it. It doesn't even take that long to get used to
using the standard. I think it took me less than three month to work
efficiently with the standard (of course, I didn't read the standard full
time in these three month: I just used it to look-up the details when
writing answers to questions asked in comp.lang.c++). If you think that
a deep knowledge of the C++ standard provides job security, I don't think
this is too much of an investment. Of course, I think there is more
knowledge involved in securing the jobs of the committee members...

I think you should stop whining about the standard and just accept that
it is the way it is. If you think you can do an excellent job in
"translating it to English", I'm quite confident that there is a market
for a book with the translation referencing the corresponding sections
of the standard. In any case, wasting your and other people's time with
your complaints is not going to change a jota in the standard!

I doubt that the term for algorithms is changed in this context, however.
"Operation" would be a better fit from my perspective but I'm not a
native English speaker.

Google for `Schroedinger ansatz'.

The German term "Ansatz" is about the details of a solution to a problem,
very much like "algorithm" is a description of how to solve a problem.
I.e. "Ansatz" is as inappropriate as "algorithm" for the things currently
called "algorithms". I think "operation", "solver", etc., i.e. something
which states the problem, not how to solve it, is the right approach.
I believe what you are saying is a variation point is a "branch point" the
compiler cannot analyse. My wording is probably sloppy, but I suspect the
concept is there.
No, I don't think so. Actually, yesterday I read the term "customization
point" for exactly the concept I called "variation point". I just recalled
the term wrongly: "customization point" is an appropriate name.

[static resolution of a member using 'mem_fun()' and relatives] It's really a question of what the compiler can deterministally know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection, etc)
can be eliminated.
Well, this is part of it. I had member function adapters which encoded
the member function in the type of the functor, e.g.:

template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
T operator()(S& s) { return (s.*mem)(); }
}

When I played with stuff like this I verified that they are inlined
by contemporary compilers. Although 'mem_fun()' cannot possibly create
a corresponding object, const propagation in the compiler's optimizer
could in principle detect that the referenced member function never
really changes and possibly inline it. Of course, if the member
function is actually a virtual function, things become even more
complex but if the static type of the objects does not change, it
could indeed also detect this.
If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".
Loop unrolling can be done without knowing the bounds statically: in
general, you don't want to unroll the whole loop anyway, even if the
bounds are statically known. You want to unroll a bunch of iterations
and put them into a loop which steps with bigger strides.
What I meant is that I've done it starting with STL iterators. It's
really
quite simple. There is one qualification, and I would have to check the
Java documentation to see exactly how Java hadels this: rather than
checking next()==null, I used bool hasNext(); It's a straightforward
wrapper around operator++(int);
Like I said before: The goal of STL is to be fast, not to deliberately
kill the performance! Actually, one of STL's goals was to provide
generic algorithms applicable to all reasonable data structures which
are as fast as algorithms hand crafted specifically for the corresponding
data structures. Although this goal is not met in all cases the deviation
is normally in a very small margin. There were also secondary goals,
like the applicability to subsequences (if the sequence supports at least
forwarding iterators) which make operations like "hasNext()" as the basis
unsuitable.
From my readding of TC++PL, getting the best performance is often obtained
by using the member function for a particular container template.
You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithm for binary search (aka 'lower_bound()',
'upper_bound()', and 'equal_range()') are only applicable to random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.
But it
does seem reasonable that the implementation could detect the container
type, and chose the best algorithm for it.
The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, the list
iterator will only point to the current element in the list which does
not have a pointer to the list it is contained in.
It seems your last sentence is the real distinction you are trying to
make. If I have value semantics at the fundamental level, I can probably
find a way for the compiler to evaluate my code having all three
operations combined in one, and to omit unused operations from the object
code.
No. My point is that combining all three operations into one does not
leave the opportunity to omit unnecessary operations. Also, STL iterators
have the flexibility to use other operations if available, e.g. stepping
with 'it += n' (with n > 1).
Please note what the first paragraph says about anonymous inner classes in
Java:

http://mindprod.com/jgloss/anonymousclasses.html
I know how anonymous inner classes work in Java. I think that they are
indeed quite similar to at least my wish for lambda expression in C++:
essentially, I just want a more convenient approach to create functors
than having to write a separate function. Whether it should include
support for closures or not and how the syntax should look are actually
the only bigger issues.
I know these are not identical to lambda expressions, but they fall into
the same category.
.... and I think that they help also programmers which are not that
advanced.
I'm not really sure what it means to have lambda support
in the core language, but I believe this area of programming is not
something that would help any but advanced programmers working is certain
specialized areas.
I put a quote from a little bit above here:

for( int i : vec ) std::cout << i;
for_each(vec, std::cout << _1); The foreach construct is something that would help a firstyear
computer science major get his or her homeword done more quickely and with
fewer errors.
Can you please identify which part of the two above code excerpts makes
one superior over the other? If there is nothing which makes the first
excerpt *much* easier for non-advanced programmers, we should go with
the more general approach. Personally, I only see one minor difference
between the two codes above: the variable has a name in the first one
while it is referenced by position in the second one. Of course, if you
want to name, you would be free to do so:

for_each(vec, { int i = _1; std::cout << i; });

For other types than integers you would use 'T const& i = _1;' or
something similar. In teaching a feature like this, it is not even
necessary to state that there is a lambda expression creating a functor.
On the other hand, even if it is done it is not harmful if the reader
was exposed to the concept of a functor before.
I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure".
You just have to understand that it does not matter whether the
containers are interfaces or not. What matters for algorithms is what
kind of access can efficiently provided. In particular, you cannot do
both efficiently (meaning in O(1)):

- insert elements at arbitrary positions
- provide random access

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.
I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.
You stated that in Java you would use an array in your algorithms
(quoting you from an earlier article in this thread):
With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.
This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow

You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless
that's what you chose to put in them).
Java arrays have their own problems, too. Actually, the designers of
Java got the conversion rules wrong for them: arrays should allow no
conversion because otherwise you could easily get violations of the
Liskov Substition Principle. Immutable arrays could be covariant but
Java has no concept of immutability on the type level.
The closest thing C++ has is Josuttis's boost:array<> which is a bit more
cumbersome to work with, and is not dynamic.
'std::vector' is the right match for Java arrays. The Java arrays are
not statically sized like C++ built-in arrays or 'std::tr1::array's.
There could possibly be class between 'std::vector' and 'std::tr1::array',
i.e. a class which is sized at run-time but once the size is set does not
allow size changes. This would be a nearly exact match for Java arrays.
The only additional "capability" would be the conversion to arrays of
bases which is, however, broken in the first place.
I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?)
"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).
over the foreach construct. This may be
the best counterargument to introducing the foreach construct.
The best counterargument for introduction of a specialized solution is
that a more general solution exists.
It might
encourage people write suboptimal code. OTOH, it might also encourage
people to write C++ code in the first place.
If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier also for new users.
Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:
I wasn't aware of a class providing algorithms on abstract containers.
Of course, it is the wrong approach to require a container in the first
place. However, the really interesting analysis is whether it provides
the algorithms efficiently or only with relatively minor losses compared
to hand-crafted versions for specific containers.
As understand what you were saying, there are two approaches, trust the
programmer to tell you his or her template is implements a certain
concept, or verify it explicitly.
No, it is more the other way around: the presence of required features
is or least can be checked with both approaches. The difference is whether
the compiler decides whether certain requirements are met on the basis
that the corresponding declarations are present or whether it has to be
explicitly allowed to make use of the corresponding features. The issue
is essentially this: assume some generic implementation uses a member
function "foo()" with no arguments. Now, a user tries to use a class
with this generic implementation which indeed has a function "foo()"
with no argument. The question is essentially: can the compiler assume
that this function "foo()" meets the requirement without the user
explicitly stating so? After all, it could be by mere accident that
there is a function "foo()" but this function can do something rather
different than is expected by the generic implementation. For more
details, read the corresponding papers in the mailings.
On suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.
Concepts are basically that. However, the problems are rather different
than what you assumed.
I was talking about a first level working understanding, not a more
complete and comprehensive one.
Right. You need to understand concepts to understand STL - at the first
level. If you don't understand concepts you have no chance whatsoever to
ever understand STL.
I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the container,
but are instead instances of types bound to the containers through
typedefs.
.... or, to put it differently, you still haven't understood what iterators
in STL are about. Maybe you should start picking up a book introducing
STL rather than trying to make sense of what is written in the header
files. You won't learn how STL works from the header files.
You will find that you have to iterate over a sequence in always the
same order unless you can iterator over it only once, i.e. if is
accessed by an input or an output iterators. In all cases, the sequence
stays the same whenever you iterate over it (unless you change it
explicitly by means outside the iteration).

Ah! But what do you really mean by sequence in this case? They are
ordered mappings over an unordered collection.

I never stated differently. You did, however! Iteration in STL defines
a sequence. The elements are order according to the position in which
they appear in the iteration, i.e. the iteration uses a sequence. Hm,
sounds like a tautology. You should, BTW, forget about collections when
you think about STL! Sequences are not directly related to collections,
i.e. you [still] don't need a collection or a container to have a
sequence. If you have a collection or a container it is likely that you
have multiple associated sequences. For example, the STL containers are
all associated with two sequences, one moving from front to back the
other in the reverse direction.
That's my point. The actual collection is not a sequence.
The issue was whether a STL algorithms operate on sequences. Nothing
more and nothing less. I never claimed that the containers or collections
are sequence. I actually claimed that you don't need containers in the
first place and that STL operates on sequences. Actually, I think you
should do yourself a favor and stop confusing yourself!
But it needs to be comprehensible to be usable.
Right. Please point out why

for (int i: vec) std::cout << i;

is more comprehensible than

for_each(vec, std::cout << _1);
For the beginner in C++ that will not be the case.
If your only argument is that the beginner in C++ cannot understand it
while everybody else can, you are just pulling up a strawman.
The foreach construct is applicable to every 3D
program I have ever written.
So is the for_each() construct I have given. In fact, the only difference
is a syntactical one: You can view a block as nothing else than a functor.
In your proposal, you put the functor after the closing parenthesis, in
my proposal it is within. Semantically, your functor has no arguments but
access it context entirely using the closure while my functor has one
argument and does not necessarily need a closure at all. Big deal, isn't
it? The real difference is, however, that you have a small, tiny feature
which cannot be used anywhere else while I got a huge feature which can
be used in many different places! For example, my feature also covers
the functor passed to 'find_if()' while yours does not.
I might be able to fold all that into
superior lambda expressions, but only after I learn to use them, and I
have too much other stuff to learn before I try.
Sorry for you. Why walk if you also have the option to crawl? You got so
much other stuff to learn, too...
I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language required
to make it fully functional is the only argument I can see for wanting it
in C++0X. If it were just a library, I'd say let it be provided by and
for the people who understand it and want it.

If it could be implemented just as a library, I would not bother to ask
for it! Why would I? I can implement libraries myself which is a much
safer approach than waiting for compiler implementers to implement some
feature. However, since it requires core features to do it right, I
don't want too much competing features (like special case solutions) in
the language. Especially, if the special case can easily be addressed
using the general solution!
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #37

Dietmar Kuehl

Michael Olea wrote:

First of all, thanks for the detailed and thoughtful reply - it is much
appreciated.
You are welcome.
So for example a function that returns the "edit distance" between two
sequences - that is the minimum number of substitutions, insertions, and
deletions needed to turn one sequence into the other - might have an
interface something like:

template<typename RandomAccessCursor1, typename RandomAccessCursor2,
typename ReadablePropertyMap1, typename ReadablePropertyMap2>
unsigned int edit_distance(
RandomAccessCursor1 b1, RandomAccessCursor1, e1,
RandomAccessCursor2 b2, RandomAccessCursor2 e2,
ReadablePropertyMap1 rpm1 = identity_property_map<typename
RandomAccessCursor1::value_type>(),
ReadablePropertyMap2 rpm2 = identity_property_map<typename
RandomAccessCursor2::value_type>() );

A few first impressions:

The algorithm, a dynamic programming algorithm, does not really need
random access to the sequences, it just needs to know up front the lengths
of both sequences, which should be available in constant time, and it
needs to make multiple passes over one of the sequences - by convention
say sequence two. So after traversing sequence two it needs to go back to
the start and traverse the elements in order again. The other sequence is
traversed in order once. So "random access" is more than is really
required.
Well, then don't use random access iterators:

template <typename InputCursor, typename ForwardCursor,
typename ReadPM1, typename ReadPM2>
unsigned int edit_distance(InputCursor b1, InputCursor e1,
ForwardCursor b2, ForwardCursor e2,
ReadPM1 rpm1, ReadPM2 rpm2);

(I omitted the default arguments because they don't matter much for the
interface).
There is a requirement that the types returned by rpm1 and rpm2 can be
compared using "==".
This is just a usual requirement on some type which is not captured by
the interface at all.
I gather that an identity_property_map has lvalue access - T& pm(*c), but
when c iterates over small elements, like char, I would want T pm(*c) -
the access occurs in the inner loop of the algorithm.

I don't think that this matters if all relevant functions are inline. If
it does, you can easily create your own replacement for an identity
property map or use a specialized property map, e.g.:

template <typename T>
struct my_map
{
T const& operator()(T const& k) { return k; }
};
template <>
struct my_map<char>
{
char operator()(char k) { return k; }
};

Property maps tend to be as trivial as this but they still provide
interesting additional power.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #38

Michael Olea

Dietmar Kuehl wrote:

... So "random access" is more than is
really required.
Well, then don't use random access iterators:

template <typename InputCursor, typename ForwardCursor,
typename ReadPM1, typename ReadPM2>
unsigned int edit_distance(InputCursor b1, InputCursor e1,
ForwardCursor b2, ForwardCursor e2,
ReadPM1 rpm1, ReadPM2 rpm2);

The question then becomes how to get the lengths of the sequences, which are
needed to initialize a table for dynamic programming. I suppose one answer
is to make them parameters also. Of course I would like to deduce the
lengths of the sequences from the cursor arguments when possible - use
cursor arithmetic when it's available, otherwise make a count-pass given a
cursor supporting multiple passes, otherwise require a length parameter.
I'm not sure yet how to do that.
... you can easily create your own replacement for an identity
property map or use a specialized property map, e.g.:

template <typename T>
struct my_map
{
T const& operator()(T const& k) { return k; }
};
template <>
struct my_map<char>
{
char operator()(char k) { return k; }
};

Property maps tend to be as trivial as this but they still provide
interesting additional power.

That gives me another idea - a property map would be a good place to put
character translation. A problem that comes up in DNA analysis is the
search for "complemented palindromes". A palindrome, of course, is a string
that reads the same forward as backward: "abccba". A "complemented
palindrome" is one that reads the same forwards as its complemented version
reads backwards, where each character of the alphabet has a unique
complement, and is the complement of its complement. Say {A <=> a, B <=> b,
C <=> c} - so then "abcCBA" would be a complemented palindrome. I am using
CharMaps already for that problem, but not reverse iterators - making
reversed complemented copies of the string to be searched. By separating
the concepts of movement and access I think I see an improvement both in
speed and space, and quite likely a reduction in code. Cool!

Jul 23 '05 #39

Steven T. Hatton

Dietmar Kuehl wrote:

Steven T. Hatton wrote:
Dietmar Kuehl wrote:
It's the ulterior motive that bothers me. To wit: built-in job security
for language lawyers.
I'm not aware of anybody in the committee who has such "ulterior motive."
I certainly have not. As far as I can tell, everybody is doing their best
to maintain an efficient, flexible, and highly usable language. The fact
that the standard is not easy to follow derives from trying to avoid
complications by stating things only once and thereby avoiding
contradictions (which are still a problem). There is also the problem
that it is hard enough to push a change to the standard which enhances
the language definition. I'm almost 100% certain that it is even much
harder to move a change forward which supposedly only makes the standard
easier to read (for a certain fraction of the readers): Guaranteeing that
a change really does what is intended is pretty hard and we are all the
time busy fixing up oversights. I doubt that anybody is really willing to
accept the danger of breaking something just to make it easier to
understand! In fact, we even know about certain areas where there are
problems with the current definition (not only with respect to
understanding them but also what they are saying) and nobody dares
touching them!

There are places where the wording is overly "artful". A very good example
is with the use of the term "header". And some statements which people
take to be definitions simply aren't.
... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees has a more
than average grasp of software development and thus does not really run
the risk of staying unemployed for extensive periods of time. And this
is not at all due to C++ only: I know of several committee members who
earn their money programming primarily in languages different from C++
(e.g. C, Java, or C#).
You aren't seeing what I'm seeing. And I mean in terms of my sent mail, and
my inbox.

One aspect of the language specification that surprised
me is the amount of forwar reference it makes. I expected it to be more
linear. Not introducing concepts until their prerequisites were formally
presented.

Although this approach is desirable, it is not always possible in the
first place. In addition, this educational goal also has severe
drawbacks: currently, the standard is organized according to different
topics rather than any order imposed by trying to avoid forward
references.

I really wasn't thinking in terms of education. I was thinking in terms of
formal development of ideas. Mathematical rigor.
This makes it easy to locate specific sections. Since the
standard is not even intended to be read front to back, this organization
is much more reasonable. Also note that the standard is not intended as a
tutorial. It is intended as part of a contract and anybody determined
enough can learn to use it. It doesn't even take that long to get used to
using the standard. I think it took me less than three month to work
efficiently with the standard (of course, I didn't read the standard full
time in these three month: I just used it to look-up the details when
writing answers to questions asked in comp.lang.c++). If you think that
a deep knowledge of the C++ standard provides job security, I don't think
this is too much of an investment. Of course, I think there is more
knowledge involved in securing the jobs of the committee members...
I was, to some extent speaking metaphorically. More in the sense of ivory
tower than economics.
I think you should stop whining about the standard and just accept that
it is the way it is. If you think you can do an excellent job in
"translating it to English", I'm quite confident that there is a market
for a book with the translation referencing the corresponding sections
of the standard. In any case, wasting your and other people's time with
your complaints is not going to change a jota in the standard!
If, perchance you are referring to my conversations elsewhere, please note
that I already stated that I did not believe I would succeed in affecting a
change in the standard. I had dropped the subject completely until it was
insinuated that I was merely trying to get someone else to do my work for
me. What I have been attempting by continuing to discuss the issue is to
get people to think in terms of an API. I must say, I am stunned by the
number of people who don't believe that the declarations in an interface
are the formal statement of an API and should be consulted as such.

I doubt that the term for algorithms is changed in this context,
however. "Operation" would be a better fit from my perspective but I'm
not a native English speaker.

Google for `Schroedinger ansatz'.

The German term "Ansatz" is about the details of a solution to a problem,
very much like "algorithm" is a description of how to solve a problem.
I.e. "Ansatz" is as inappropriate as "algorithm" for the things currently
called "algorithms". I think "operation", "solver", etc., i.e. something
which states the problem, not how to solve it, is the right approach.

In terms of Schrödinger's equations, they are the operators which provide
the solution. Actually very similar to the concept of a function object.

I believe what you are saying is a variation point is a "branch point"
the compiler cannot analyse. My wording is probably sloppy, but I suspect
the concept is there.

No, I don't think so. Actually, yesterday I read the term "customization
point" for exactly the concept I called "variation point". I just recalled
the term wrongly: "customization point" is an appropriate name.

[static resolution of a member using 'mem_fun()' and relatives]
It's really a question of what the compiler can deterministally know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection,
etc) can be eliminated.

Well, this is part of it. I had member function adapters which encoded
the member function in the type of the functor, e.g.:

template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
T operator()(S& s) { return (s.*mem)(); }
}

When I played with stuff like this I verified that they are inlined
by contemporary compilers. Although 'mem_fun()' cannot possibly create
a corresponding object, const propagation in the compiler's optimizer
could in principle detect that the referenced member function never
really changes and possibly inline it. Of course, if the member
function is actually a virtual function, things become even more
complex but if the static type of the objects does not change, it
could indeed also detect this.

I actually hadn't even considered that the member function could be static.
That does sound reasonable. I believe the numbers I looked at were
referring to virtual funcitons, and yes, if the type of the object who's
member is being invoked can be determined by the context at compile time,
the compiler should in principle be able to inline it.

If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".

Loop unrolling can be done without knowing the bounds statically: in
general, you don't want to unroll the whole loop anyway, even if the
bounds are statically known. You want to unroll a bunch of iterations
and put them into a loop which steps with bigger strides.

That makes sense. I hadn't though about that.
Like I said before: The goal of STL is to be fast, not to deliberately
kill the performance! Actually, one of STL's goals was to provide
generic algorithms applicable to all reasonable data structures which
are as fast as algorithms hand crafted specifically for the corresponding
data structures. Although this goal is not met in all cases the deviation
is normally in a very small margin. There were also secondary goals,
like the applicability to subsequences (if the sequence supports at least
forwarding iterators) which make operations like "hasNext()" as the basis
unsuitable.
Not sure what you mean here. The reason I mentioned the Java-style iterator
is because I believe in reality it is just using the three operations you
had enumerated, and I was suggesting that it could therefore be optimized
in much the same way. Acknowledging that it does use postfix
incrementation.

From my readding of TC++PL, getting the best performance is often
obtained by using the member function for a particular container
template.

You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithm for binary search (aka 'lower_bound()',
'upper_bound()', and 'equal_range()') are only applicable to random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.

I thought that's what I said. I did, however, intent to type TC++SL. As in
Josuttis's book.

But it
does seem reasonable that the implementation could detect the container
type, and chose the best algorithm for it.

The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, the list
iterator will only point to the current element in the list which does
not have a pointer to the list it is contained in.

I was thinking of a strategy implemented at a higher level. That is, the
compiler actually swapping out the algorithm to get the best performance.
It seems your last sentence is the real distinction you are trying to
make. If I have value semantics at the fundamental level, I can probably
find a way for the compiler to evaluate my code having all three
operations combined in one, and to omit unused operations from the object
code.

No. My point is that combining all three operations into one does not
leave the opportunity to omit unnecessary operations. Also, STL iterators
have the flexibility to use other operations if available, e.g. stepping
with 'it += n' (with n > 1).

I don't see why the compiler is obligated to perform all these operation if
the same behavior can be obtained by eliminating on of them.

Please note what the first paragraph says about anonymous inner classes
in Java:

http://mindprod.com/jgloss/anonymousclasses.html

I know how anonymous inner classes work in Java. I think that they are
indeed quite similar to at least my wish for lambda expression in C++:
essentially, I just want a more convenient approach to create functors
than having to write a separate function. Whether it should include
support for closures or not and how the syntax should look are actually
the only bigger issues.
I know these are not identical to lambda expressions, but they fall into
the same category.

... and I think that they help also programmers which are not that
advanced.

I've done some interesting things with them, but unless I'm using them every
day, they are difficult to understand. They take me more time to understand
and modify than do the slightly more verbose approach they replace.
There's also the problem of not seeing everything relevant to the construct
where it is located. I would have to think about the topic a bit to
provide a concrete example, but I am confident in what I asserted.

I'm not really sure what it means to have lambda support
in the core language, but I believe this area of programming is not
something that would help any but advanced programmers working is certain
specialized areas.

I put a quote from a little bit above here:
for( int i : vec ) std::cout << i; for_each(vec, std::cout << _1);
The foreach construct is something that would help a firstyear
computer science major get his or her homeword done more quickely and
with fewer errors.

Can you please identify which part of the two above code excerpts makes
one superior over the other? If there is nothing which makes the first
excerpt *much* easier for non-advanced programmers, we should go with
the more general approach. Personally, I only see one minor difference
between the two codes above: the variable has a name in the first one
while it is referenced by position in the second one. Of course, if you
want to name, you would be free to do so:

The proposed foreach construct has the ability to take bounds as arguments
so I could (probably) make this work. This is just something I have handy
that shows the kind of complexity I am thinking about. I wouldn't even
know where to begin doing it with a lambda expression:

for(size_t i = 0; i < _radii.size()-1; i++) {
osg::Vec2 unv(_radii[i+1] - _radii[i]);
unv = osg::Vec2f(-unv[1], unv[0]);
unv.normalize();

for(size_t j = 0; j < _ring.size(); j++) {
osg::Vec3 v0(_radii[i][0], _radii[i][1] * _ring[j][0], _radii[i][1]
* _ring[j][1]);
osg::Vec3 v1(_radii[i+1][0], _radii[i+1][1] * _ring[j][0],
_radii[i+1][1] * _ring[j][1]);
osg::Vec3 nv(unv[0], unv[1] * _ring[j][0], unv[1] * _ring[j][1]);

v0 *= scale;
v1 *= scale;

if(_axis == X){
vertices->push_back(v0);
vertices->push_back(v1);
normals->push_back(nv);
} else {
vertices->push_back(rotv3(v0, _axis));
vertices->push_back(rotv3(v1, _axis));
normals->push_back(rotv3(nv, _axis));
}

normals->push_back(normals->back());

size_t f2 = 2 * _facets;
size_t n = f2 * i;
size_t j2 = 2 * j;

osg::DrawElementsUInt* quad = new
osg::DrawElementsUInt(osg::PrimitiveSet::QUADS, 0);

quad->push_back(n + j2);
quad->push_back(n + j2 + 1);
quad->push_back(n + (j2 + 3)%f2);
quad->push_back(n + (j2 + 2)%f2);

psl->push_back(quad);

}
}

for_each(vec, { int i = _1; std::cout << i; });

For other types than integers you would use 'T const& i = _1;' or
something similar. In teaching a feature like this, it is not even
necessary to state that there is a lambda expression creating a functor.
On the other hand, even if it is done it is not harmful if the reader
was exposed to the concept of a functor before.
I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure".
You just have to understand that it does not matter whether the
containers are interfaces or not. What matters for algorithms is what
kind of access can efficiently provided. In particular, you cannot do
both efficiently (meaning in O(1)):

- insert elements at arbitrary positions
- provide random access

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.

But I don't see how this distinction is any different for Java than it is
for C++. My point in saying that you can call toArry on a Java container
was that you can impose a random access mappin over the collection, and use
random access algorithms to operate on it. Mind you with Java you will
mostlikely be sorting pointers. That has advantages and disadvantages.
One advantage is that it's possible to impose multiple mappings at the same
time on the same collection.
I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.

You stated that in Java you would use an array in your algorithms
(quoting you from an earlier article in this thread):

I was saying that you could convert the linked list to an Array in order to
get random access. But that was assuming we were talking about C++ random
access algorithms.

With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.

This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow

You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.

I would have to read up on exactly what determines the backing data
structure of the various Java collections. They may actually hold
everything with an Array of pointers, and merely manipulate pointers to
provide the different interfaces. If you add an element to a linked list
you could put it at the end of the array and put it's pointer somewhere
else in the linked list. Yes you will pay an indirection cost. And Java
may also leave holes on removal of elements, and pack it when it feels like
it. I don't recall, if I ever knew.

Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless
that's what you chose to put in them).

Java arrays have their own problems, too. Actually, the designers of
Java got the conversion rules wrong for them: arrays should allow no
conversion because otherwise you could easily get violations of the
Liskov Substition Principle. Immutable arrays could be covariant but
Java has no concept of immutability on the type level.

I don't know how their so-called generics work, but supposedly they address
some of that. I'm sure the designers of Java understood the tradeoff, they
just opted for sloppy type checking to get flexible behavior.

The closest thing C++ has is Josuttis's boost:array<> which is a bit more
cumbersome to work with, and is not dynamic.

'std::vector' is the right match for Java arrays. The Java arrays are
not statically sized like C++ built-in arrays or 'std::tr1::array's.
There could possibly be class between 'std::vector' and 'std::tr1::array',
i.e. a class which is sized at run-time but once the size is set does not
allow size changes.

That is something I would like to have in C++. You can do it with built-in
arrays, but you have to jump through a lot of hoops. The one feature of
Java Arrays that makes them superior to C++ built-ins is the .size() (or is
that .length()?) method. You can do it with sizeof, but that gets really
weird when you start passing them as parameters.
This would be a nearly exact match for Java arrays.
The only additional "capability" would be the conversion to arrays of
bases which is, however, broken in the first place.
Agreed.

I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?)

"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).

Other than the fact that the guys as SuSE change my keyboard mapping once a
week, I can do a reasonable job of typing in Old Norse, so the umlauts come
through, no problem.
If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier also for new users.

See my code example above.

Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:

I wasn't aware of a class providing algorithms on abstract containers.
Of course, it is the wrong approach to require a container in the first
place. However, the really interesting analysis is whether it provides
the algorithms efficiently or only with relatively minor losses compared
to hand-crafted versions for specific containers.

I really don't know. I'm trying to learn C++ right now. I haven't worked
with Java in well over a year.

On suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.

Concepts are basically that. However, the problems are rather different
than what you assumed.
I was talking about a first level working understanding, not a more
complete and comprehensive one.

Right. You need to understand concepts to understand STL - at the first
level. If you don't understand concepts you have no chance whatsoever to
ever understand STL.

When I was able to divide the STL conceptually into two groups of things;
collections, and algorithms, I was able to think about it in finite terms.
It no longer seemed open-ended, That's what I mean. That's what I
consider the first step in really understanding something.

I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the container,
but are instead instances of types bound to the containers through
typedefs.

... or, to put it differently, you still haven't understood what iterators
in STL are about. Maybe you should start picking up a book introducing
STL rather than trying to make sense of what is written in the header
files. You won't learn how STL works from the header files.

What I said is completely accurate regarding the iterators that are
associated with containers. I have no idea what you are talking about.
Every STL container other than bitset has iterator typedefs. They have an
iterator, a const_iterator, a reverse_iterator, and a
const_reverse_iterator. The sequence types have a difference type for
storing the difference between iterator positions.

The reverse iterators are actually instances of iterator adapter templates
taking iterators as arguments. They use a trick that separates the
physical and logical positions of the iterators so that container.end() is
the same as the physical position of container.rbegin(), but
container.rbegin()'s logical position is container.end()-1. That way you
can have half open ranges in both directions using the same physical
positions. The main reason for doing this is because the language doesn't
specify that arrays don't start at address zero. If you tried to use
address -1, you'd probably be in trouble.
There are different categories of iterator:
input iterators which read forward

output iterators which write forward

forward iterators which read and write forward

bidirectional iterators which read and write forward and backward

random access iterators which add integral indexing to the functionality of
bidirectional iterators (only for sequences, not associative containers)

There are other iterator adapters:

Insert iterators:
back_insert_iterator
front_insert_iterator
insert_iterator

each insert iterator has an associated creation function:
back_inserter
front_inserter
inserter

There are also stream iterators.

Then there are iterator traits and iterator tags for creating iterators.

But it's important to bear in mind that iterators are a pure abstraction, so
anything that acts like an iterator is an iterator.

Perhaps I'm wasting my time reading Josuttis's The C++ Standard Library? I
only have 8 more pages left in Chapter 13. But perhaps I'm wasting my
time. Can you suggest a *_good_* book. Guess I'll skip chapter 14. I'm
sure the quality of presentation is not up to your high standards.

For the beginner in C++ that will not be the case.

If your only argument is that the beginner in C++ cannot understand it
while everybody else can, you are just pulling up a strawman.

No. And that's not really what I'm saying. I don't believe I would be able
to write intuitive code using the feature.

I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language
required to make it fully functional is the only argument I can see for
wanting it
in C++0X. If it were just a library, I'd say let it be provided by and
for the people who understand it and want it.

If it could be implemented just as a library, I would not bother to ask
for it! Why would I? I can implement libraries myself which is a much
safer approach than waiting for compiler implementers to implement some
feature. However, since it requires core features to do it right, I
don't want too much competing features (like special case solutions) in
the language. Especially, if the special case can easily be addressed
using the general solution!

There is clearly a lot of tricky stuff to learn here:
http://www.boost.org/doc/html/lambda/le_in_details.html

--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #40

Dietmar Kuehl

Michael Olea wrote:

Dietmar Kuehl wrote:
template <typename InputCursor, typename ForwardCursor,
typename ReadPM1, typename ReadPM2>
unsigned int edit_distance(InputCursor b1, InputCursor e1,
ForwardCursor b2, ForwardCursor e2,
ReadPM1 rpm1, ReadPM2 rpm2);
The question then becomes how to get the lengths of the sequences, which
are needed to initialize a table for dynamic programming.

Do you need the length of both sequences up front? If so, the best
bet would be asking for two multi-pass sequences, i.e. forward cursors
for both sequence. To determine the actual length, you would always
use

distance(b, e);

where 'b' and 'e' are the beginning and the end of the corresponding
sequence. This would count all elements in within the range if the
user of the algorithm does not pass random access cursors. For random
access cursors a simple substraction would be automatically used. That
is, although you only ask for multi-pass iterators, the algorithm
would take advantage of optimizations available for the actual
parameters.
I suppose one answer is to make them parameters also.

I would not go down this route! The restriction to have a multi-pass
iterator is not that bad unless the sequences are really tremendous
(i.e. they won't fit into memory together with the other necessary
data). Of course, you might provide an interface to the algorithm
which takes the length as argument but which only requires input
iterators. This algorithm would then be used to implement one which
determines the length on its own.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #41

Dietmar Kuehl

Steven T. Hatton wrote:

Dietmar Kuehl wrote:
There are places where the wording is overly "artful". A very good
example is with the use of the term "header".
I don't think "header" is can be replaced with anything which better
describes it to the target audience of the C++ standard (people who
know C++ and just want to find out details of the actual specification).
The term "header" for sets of declarations is sufficiently old and
established that everybody fluent in C++ (or C) knows what it is about.
The standard does not live in isolation nor is it intended to do so.
Actually, there are known problems with the standard which are more
harmful. For example, some things are not really specified because it
is assumed that the reader of the standard magically knows what the
actual intend was.

... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees has a more
than average grasp of software development and thus does not really run
the risk of staying unemployed for extensive periods of time. And this
is not at all due to C++ only: I know of several committee members who
earn their money programming primarily in languages different from C++
(e.g. C, Java, or C#).

You aren't seeing what I'm seeing. And I mean in terms of my sent mail,
and my inbox.

You mean, you get requests from committee members pleading for job? I'm
convinced that none of the committee members has intended to secure his
jobs by deliberately making the standard the way it is.
I really wasn't thinking in terms of education. I was thinking in terms
of formal development of ideas. Mathematical rigor.
Ah, you mean the standard committee should have rather gone into some
suitable ivory tower instead of practically working on a document. Of
course, it would have been desirable to be absolutely precise, have
no contradictions, have definitions for everything, etc. The committee
tried and actually still tries to do so. Practical considerations are,
however, still important. For example, the time we can work on the
standard is actually rather limited since everybody actually does
something for a living. I'm not aware of anybody who has the liberty
to work full-time on the standard although some organizations devote
substantial amounts of time to the process already. ... and the
community wants results faster than they can be delivered, too.

I was, to some extent speaking metaphorically. More in the sense of ivory
tower than economics.
Well, the C++ standard was not and never will be produced in an ivory
tower. It is driven by practical and economical considerations and
this is not going to change.
Not sure what you mean here. The reason I mentioned the Java-style
iterator is because I believe in reality it is just using the three
operations you had enumerated, and I was suggesting that it could
therefore be optimized in much the same way.
I don't think that Java enumerators can be substantially optimized.
For one, it is hard enough to determine that the object's type cannot
change in the given context and even if this can be determined, the
three operations (advance, access, and check) are too interwinded to
be separable at all. In addition, Java enumerators lack the interface
to take advantage of additional operations.

From my readding of TC++PL, getting the best performance is often
obtained by using the member function for a particular container
template.

You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithm for binary search (aka 'lower_bound()',
'upper_bound()', and 'equal_range()') are only applicable to random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.

I thought that's what I said.

Maybe you should understand my wording, then review your wording, and
then realize that they actually state rather different things with
mine reflecting what Nico almost certainly wanted to state. Essentially,
I'm saying that the generic (non-member) algorithms have the best
performance for all sequences to which they are applicable (*). The
container classes may provide specialized (member) algorithms in cases
where the generic algorithm is not applicable but an algorithm with
good (or for the given data structure best) performance can be
implemented due to the knowledge of the data structure. Essentially,
your statement implies that there is a choice between algorithms used
(generic or specific ones). This is not the case: containers only have
specialized algorithms if the generic ones cannot be applied.

(*) Actually, this statement is too strong in some cases: it is sometimes
possible to create a faster algorithm based on special properties of
the used data structures. This is, however, essentially only due to
incomplete sets of concepts and corresponding incomplete implementation
of algorithms. Conceptually, the generic algorithms should be as good
as any special algorithm - and in general even better because having to
maintain only one (or a small few) generic implementation of an
algorithm is more effective than implementing the same algorithms for
each data structure and should thus provide the freedome to aggressively
optimize the generic implementation.

But it
does seem reasonable that the implementation could detect the container
type, and chose the best algorithm for it.

The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, the list
iterator will only point to the current element in the list which does
not have a pointer to the list it is contained in.

I was thinking of a strategy implemented at a higher level. That is, the
compiler actually swapping out the algorithm to get the best performance.

.... and I explained that generic algorithms actually indeed determine
the best algorithm (at compile time) but do so based on the sequence
concepts, entirely ignoring any possibly involved container. Of course,
I'm trying to bring home the idea that containers are entirely irrelevant
in STL since at least half a dozens articles in this thread (and many
more before this thread...).
I don't see why the compiler is obligated to perform all these operation
if the same behavior can be obtained by eliminating on of them.
Actually, the compiler does not really do it at all. It is the
implementer of the generic algorithm who does. The compiler is then
forced to choose the right algorithm by various rules of the C++
language, e.g. overload resolution or SFINAE.
I've done some interesting things with them, but unless I'm using them
every day, they are difficult to understand. They take me more time to
understand and modify than do the slightly more verbose approach they
replace.
["they" and "them" referring to Java's anonymous classes]

I agree with this. However, the envisioned lambda expressions work
only similar semantically, not syntactically, anyway. In some sense,
you can view the envisioned lambda as nothing different than a block.
The only news is that blocks can also be passed to functions in which
case they become nothing else than a functor. To allow arguments to
these functors, you can use the "special" variable "_1", "_2", ... I
don't consider this overly complicated. Of course, it assumes that
the user has at least a vague idea about what a "functor" is. However,
this is a pretty central concept and without knowing it a user would
not be very effective in using STL algorithms anyway.
There's also the problem of not seeing everything relevant to the
construct
where it is located. I would have to think about the topic a bit to
provide a concrete example, but I am confident in what I asserted.
I'm also confident in what I have asserted.
The proposed foreach construct has the ability to take bounds as arguments
so I could (probably) make this work. This is just something I have handy
that shows the kind of complexity I am thinking about. I wouldn't even
know where to begin doing it with a lambda expression:
The funny thing is that it wouldn't look that much different at all
- despite the fact that I wasn't able to locate any use of a "for_each"
construct... To make use of lambda expressions in this ordinary C++
code, you would essentially replace the "for" statements by function
templates (which can be provided by a library) which essentially call
each block with a corresponding index. The index would not be called
"i", "j", ... but "_1", "_2", ..., e.g.

for_each_index(_radii, {
osg::Vec2 unv(_radii[_1 + 1] - _radii[_1]);
unv.normalize();

for_each_index(_radii, _1, _ring, {
osg::Vec3 v0(_radii[_1][0], _radii[_1][1] * _ring[_2][0],
_radii[_1][1] * _ring[_2][1]);
// ... (I think it is clear how to modify the remainder to
// achieve the same effect of the loop)
});
});
Of course, this uses two different overloads of the 'for_each_index()'
function:
- one taking a unary functor as argument.
- one taking a binary functor as argument (the first argument is
fixed by the function templates second argument, the second is
the varying index)

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.

But I don't see how this distinction is any different for Java than it is
for C++. My point in saying that you can call toArry on a Java container
was that you can impose a random access mappin over the collection, and
use
random access algorithms to operate on it. Mind you with Java you will
mostlikely be sorting pointers.

Imposing any random access structure (one where all elements can be
accessed in O(1)) to a container where you can insert in O(1) at
arbitrary positions involves at least an O(n) operation. Although it
reduces the constant when only using pointers it does not change
the asymptotic complexity. Thus, doing such a conversion necessarily
imposes a substantial overhead, especially if the actual operation
can be done with a complexity smaller than O(n), like searching in
a sorted sequence which can be done in O(ln n). I assume you know
enough about complexity theory to make sense of these funny O(f(n))
notations (if not you should read up on them; this is a pretty
important topic if you want to do serious software development).
That has advantages and disadvantages.
One advantage is that it's possible to impose multiple mappings at the
same time on the same collection.
Although it is possible with the current STL concepts (if you are
trying hard enough) this is something which is trivially done using
cursors and property maps as the basic concepts - one of the reasons
why I'm trying to move the committee to refine the current STL
concepts to use cursors and property maps as the basic sequence
abstraction instead of iterators.
I was saying that you could convert the linked list to an Array in order
to
get random access. But that was assuming we were talking about C++ random
access algorithms.
At least I was talking about generic algorithms which do some funny
things to provide the best performance achievable for a given sequence.
Converting between different kinds of sequence abstraction like
converting between an item based (e.g. forward or bidirectional access
is provided by linked lists) and a random access based (i.e. you can
move between arbitrary elements in O(1) time) is none of these funny
things, however. The key of generic algorithms is to provide efficient
implementations based on what you pass to them. If you pass them a
list iterator (i.e. you get forward or bidirectional access) they use
just this and to the best possible on the basis of this restriction.
If you pass them a vector iterator (i.e. you get random access) they
may be able to use a more efficient algorithm due to the enhanced
capabilities.

With Java you can call toArray() on a collection, and get a random
access interface to iterate over with a for() loop using the same
sequence of checks you've stated above.

This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow

You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.

I would have to read up on exactly what determines the backing data
structure of the various Java collections. They may actually hold
everything with an Array of pointers, and merely manipulate pointers to
provide the different interfaces. If you add an element to a linked list
you could put it at the end of the array and put it's pointer somewhere
else in the linked list. Yes you will pay an indirection cost. And Java
may also leave holes on removal of elements, and pack it when it feels
like it. I don't recall, if I ever knew.

You cannot get both: insertion at an arbitrary position and random
access. Well, you can get insertion at an arbitrary position and
random access if you don't need the order for the random access to
retain the original order. Of course, loosing the original order
effectively kills your chances to use a more efficient algorithms
taking advantage of random access. That is, simply adding things to
the end of an array does not help you when you want to pass the
sequence to an algorithm.

I think rather than reading up on Java stuff you should at least
consider trying to understand what I'm saying. I'm convinced that
this provides more interesting insights into implementation of
algorithms than finding out about details of Java's internal data
structures...
I don't know how their so-called generics work, but supposedly they
address
some of that. I'm sure the designers of Java understood the tradeoff,
they just opted for sloppy type checking to get flexible behavior.
I don't think that the Java designers understood anything about generic
programming at all, otherwise they would not have done what they did.
I understand the restrictions they felt themselves caught in but I have
seen a much better approach to generic on a JVM than what Java now has!
They opted for something which was done in C++ something like 15 years
ago and which became what are templates now. Essentially, the Java
designers refused to learn the lessons learned in the C++ context (of
course, they also refused from other experiences before, too).
The one feature of
Java Arrays that makes them superior to C++ built-ins is the .size() (or
is that .length()?) method.
It is trivial to create a class which looks like a built-in array but
adds a 'size()' member. The standard library does not do it because it
is deemed sufficient to provide 'std::vector': the two extra pointers
per object are not considered to be too expensive and I doubt that
there are indeed many applications where they are harmful. In the few
cases where they are, it is sufficiently easy to create a suitable
class [template].
You can do it with sizeof, but that gets really
weird when you start passing them as parameters.
Of course, I have rare occasion to pass arrays in the first place. I
generally pass iterator pairs to suitable function templates. Maybe you
should try to understand what STL is about and you will realize that
there are few situations where you want to restrict your functions to
accept arrays anyway.

"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).

Other than the fact that the guys as SuSE change my keyboard mapping once
a week, I can do a reasonable job of typing in Old Norse, so the umlauts
come through, no problem.

Of course, I assume that others benefit from this thread, too (?). Thus,
it was a comment addressing everybody who just sees "Ans<some garbage>tze"
instead of what I had written.

If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier also for new users.

See my code example above.

Ditto.
When I was able to divide the STL conceptually into two groups of things;
collections, and algorithms, I was able to think about it in finite terms.
It no longer seemed open-ended, That's what I mean. That's what I
consider the first step in really understanding something.
Except that neither the set of collections nor the set of algorithms
is finite to the STL! Part of understanding STL is that although you
have an infinite set of collections, you only need to implement each
algorithm a finite (actually small) number of times. That is, if you
want an algorithm which is applicable to all containers, you only
need to provide one implementation thereof (possibly, you need a few
more implementations if you want to take advantage of specialstructures
for providing better performance but this is only about optimization).
On the other hand, if want to create a container which can be used with
all algorithms, you only need to provide a few entities abstracting
your container (e.g. iterators). That is, both the set of containers
and the set of algorithms are open. The only thing which is indeed
conceptually finite in STL is the set of concepts (and even this is
only finite per domain addressed by the algorithms and data structures).

BTW, STL conceptually consists of at least four different things:
- concepts
- algorithms
- iterators and functors
- containers
(listed in decreasing order of importance).

I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the
container, but are instead instances of types bound to the containers
through typedefs.

... or, to put it differently, you still haven't understood what
iterators in STL are about. Maybe you should start picking up a book
introducing STL rather than trying to make sense of what is written in
the header files. You won't learn how STL works from the header files.

What I said is completely accurate regarding the iterators that are
associated with containers. I have no idea what you are talking about.

It wouldn't have been necessary to state the last sentence since this is
indeed obvious. In several messages I tried to explain that iterators are
a concept which is entirely independent of containers. You don't need any
container but you can still have a sequence which is accessed by
iterators. You are right that there are iterators associated with
containers but this only means that containers have associated iterators
and that they may indeed sometimes depend on them. On the other hand, you
can create iterators for containers which are ignorant of iterators, i.e.
you can apply STL algorithms to containers which were created in complete
unawareness of STL in the first place! All you need to do is to create
suitable iterators.

.... and maybe you will eventually understand what I'm talking off and
realize that your statement was not as accurate as you apparently think
it was.
Every STL container other than bitset has iterator typedefs. They have an
iterator, a const_iterator, a reverse_iterator, and a
const_reverse_iterator. The sequence types have a difference type for
storing the difference between iterator positions.
You aren't telling me about STL containers, are you? If you do, you might
want to read the second paragraph in the acknowledgments of Nico's book...
(although since then I have not only implemented the locales and IOStreams
library but most of the standard C++ library).

[relatively irrelevant although correct stuff about particular iterators
remove]
But it's important to bear in mind that iterators are a pure abstraction,
so anything that acts like an iterator is an iterator.
The above sentence is part of the key to STL. I just have the strong
impression you have not yet digested its implications. Also, it is
a major oversimplification when it comes to concept hierarchies.
Perhaps I'm wasting my time reading Josuttis's The C++ Standard Library?
Perhaps. It would not be due to Nico's presentation, though.
I only have 8 more pages left in Chapter 13. But perhaps I'm wasting my
time. Can you suggest a *_good_* book. Guess I'll skip chapter 14. I'm
sure the quality of presentation is not up to your high standards.
This is an interesting statement, especially taking the last sentence
of the first paragraph of Chapter 14 into account... I'm undecided on
whether you are ridiculing me or you didn't know what is says. In any
case, you are right with your assumption that I would present the
contents differently today (but I still think it is appropriate).
No. And that's not really what I'm saying. I don't believe I would be
able to write intuitive code using the feature.
Can you point out where

for_each(vec, { std::cout << _1; });

is more complex than

for_each(int i: vec) { std::cout << i; }

? These two statements are indeed different but I seriously don't see
where they are substantially different to justify that the first
code is not intuitive...

There is clearly a lot of tricky stuff to learn here:
http://www.boost.org/doc/html/lambda/le_in_details.html

I'm well aware of Boost::Lambda and how it is implemented, including
the more complicated scenarios. Still, this library does not cover
everything, nor can it.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 23 '05 #42

Michael Olea

Dietmar Kuehl wrote:

Michael Olea wrote:

The question then becomes how to get the lengths of the sequences...

... you would always use

distance(b, e);

[which does a simple subtraction when it can, otherwise counts]

Perfect. Done.

(I was wondering how "distance" could specialize on iterator type, but
looking at a g++ version I see it simply dispatches to an overloaded
function taking a 3rd argument of type xxx_iterator_tag.)

Now I have a generic edit_distance algorithm that cleanly separates the
notions of movement and access (a real issue in sequence analysis), and is
as efficient as hand crafted versions would be. Gotta love generic code!

The interface, though, is a little cumbersome - 6 parameters to specify two
sequences. A higher level abstraction - "Sequence" - might be called for:

http://boost-consulting.com/projects...ence/doc/html/

(I know you are aware of this link, I am assuming there is some non-zero
chance others may be reading this thread.) Then maybe the interface would
look like:

template<class ReadableForwardSeq1, class ReadableForwardSeq2>
unsigned int edit_distance(ReadableForwardSeq1 s1, ReadableForwardSeq2 s2);

I don't know if that would simplify client code much, but there is another
motive: there will be a functor version of edit_distance, one of a number
of metric functors to be passed to a handful of generic cluster analysis
algorithms. The idea is that these algorithms do not need to know anything
at all about the objects being clustered (vectors, tuples, strings,
graphs...) other than some notion of proximity, which is provided by a
metric (or alternatively a similarity) functor. And in general the call:
my_metric(x, y) seems more natural than my_metric(b1, e1, b2, e2, rmp1,
rpm2).

Just thinking out loud here. Anyway, I am going out of town tomorrow morning
for a meeting - and it is *very nice* to be walking into that meeting with
a working version of edit_distance based on the cursor/property_map idea,
where edit_distance is serving as a model for many other functions like it.

- Michael

Jul 23 '05 #43

Steven T. Hatton

Dietmar Kuehl wrote:

Steven T. Hatton wrote:
Dietmar Kuehl wrote:
There are places where the wording is overly "artful". A very good
example is with the use of the term "header".
I don't think "header" is can be replaced with anything which better
describes it to the target audience of the C++ standard (people who
know C++ and just want to find out details of the actual specification).
The term "header" for sets of declarations is sufficiently old and
established that everybody fluent in C++ (or C) knows what it is about.

In C++ "header" does not mean the same thing that it means on page 82 of
K&R. I try do distinguish between 'header' meaning any header file, and
Standard Header, meaning a member of a particular set of entities specified
in the C++ Standard. I treat "Standard Header" as a proper noun. I'm sure
the English professors can throw their Latin words at me for it, but I
don't care.
The standard does not live in isolation nor is it intended to do so.
Actually, there are known problems with the standard which are more
harmful. For example, some things are not really specified because it
is assumed that the reader of the standard magically knows what the
actual intend was.
One thing I found lacking, though it seems to clearly have been intended, is
a specification of what it means for one program element to appear after
another. There is, AFAIK, no stipulation that a file be parsed in
assending address order. That is, source text should be read top to
bottom, left to right. That may seem to be a foregone conclusion. Perhaps
it's best left as such, rather than trying to specify what top to bottom,
left to right means in terms of a translation unit. Doing otherwise could
result in volumes of philosophical essays.
I'm
convinced that none of the committee members has intended to secure his
jobs by deliberately making the standard the way it is.
It's not the big guys with the trillion dollar market capitalization.

I really wasn't thinking in terms of education. I was thinking in terms
of formal development of ideas. Mathematical rigor.

Ah, you mean the standard committee should have rather gone into some
suitable ivory tower instead of practically working on a document. Of
course, it would have been desirable to be absolutely precise, have
no contradictions, have definitions for everything, etc. The committee
tried and actually still tries to do so.

There is a big difference between saying 'the Committee should have done
such and such', and saying 'I was surprised to find such and such'. I did
not say it was wrong that the Standard is constructed as it is. I was just
surprised to find that it is so constructed. Yes I made the comment in the
context of discussin the understandability of the Standard, but I only made
it as an observation about the structure offered for consideration. It was
not a criticism.

Computers offer us a powerful collection of tools, and the ability to build
new tools for analysing the structure of documents such as the C++
Standard. Personally, I find the prospect attractive as an intellectual
exercise, and believe it could have practical benefit.
Practical considerations are,
however, still important. For example, the time we can work on the
standard is actually rather limited since everybody actually does
something for a living. I'm not aware of anybody who has the liberty
to work full-time on the standard although some organizations devote
substantial amounts of time to the process already. ... and the
community wants results faster than they can be delivered, too.
Understood. A careful analysis of the structure of the Standard is probably
something better done in the offseason. I've suggested the text of the
document be converted to XML in oder to facilitate such a study. Working
with it in PDF is beyond painful for any purpose other than simply reading
it or printing it. I suspect copying from the document when it's open in
Acrobat is easier under Windos than under X, but that would force me to
work in an environment lacking all the other tools I rely on.

I was, to some extent speaking metaphorically. More in the sense of
ivory tower than economics.

Well, the C++ standard was not and never will be produced in an ivory
tower. It is driven by practical and economical considerations and
this is not going to change.

Actually, C and C++ are the products of a different culture than the one
that exists today. Bell Labs used to have the all the reverence bestowed
on a major university. Likewise for XEROX PARC. But I don't care to play
language lawyer in defending my choice of words. Let them fall as they
will.
I don't think that Java enumerators can be substantially optimized.
For one, it is hard enough to determine that the object's type cannot
change in the given context and even if this can be determined, the
three operations (advance, access, and check) are too interwinded to
be separable at all. In addition, Java enumerators lack the interface
to take advantage of additional operations.
What can be done in Java as implemented on a JVM, what can be done with the
syntax of the Java-style iterators. I'm not sure of the distinction
between iterators and enumerators, and don't have time to look it up, but
these are two distinct interfaces in Java. My point was regarding the
relationship between syntax of the programming language and the ability of
the compiler to optimize it. It's easy to lose sight of the fact that the
source language does not necessarily dictate what the compiler can
optimize. As regards flexibility of syntax at the source level, I agree
that Java-style iterators are not as flexible. I often found myself
casting things to Array (a rather arcane exercise in Java) to get a more
comfortable interface than that provided by Java containers.

I thought that's what I said.

Maybe you should understand my wording, then review your wording, and
then realize that they actually state rather different things with
mine reflecting what Nico almost certainly wanted to state. Essentially,
I'm saying that the generic (non-member) algorithms have the best
performance for all sequences to which they are applicable (*). The
container classes may provide specialized (member) algorithms in cases
where the generic algorithm is not applicable but an algorithm with
good (or for the given data structure best) performance can be
implemented due to the knowledge of the data structure. Essentially,
your statement implies that there is a choice between algorithms used
(generic or specific ones). This is not the case: containers only have
specialized algorithms if the generic ones cannot be applied.

There are some cases where he states that std::list provides member
functions which are superior to the namespace local (he says "global", but
I try to avoid using foul language in fromal discussions) algorithms. See
for example §9.8.1. You are thinking of the cases where the namespace
local algorithms are not applicable to associative containers. Bear in mind
that the highlighter ink is still wet on my copy of TC++SL.

(*)
Agreed.

I don't see why the compiler is obligated to perform all these operation
if the same behavior can be obtained by eliminating on of them.

Actually, the compiler does not really do it at all. It is the
implementer of the generic algorithm who does. The compiler is then
forced to choose the right algorithm by various rules of the C++
language, e.g. overload resolution or SFINAE.

I'd have to really get down into the weeds to reply intelligently, and I
don't believe that would be practical for a person with my limited
understanding of these matters. As for SFINAE, I thought that was the
silliest "look at the neat tricks we can do with templates" example
possible, until I saw the paper on overloading operator.().

I've done some interesting things with them, but unless I'm using them
every day, they are difficult to understand. They take me more time to
understand and modify than do the slightly more verbose approach they
replace.

["they" and "them" referring to Java's anonymous classes]

I agree with this. However, the envisioned lambda expressions work
only similar semantically, not syntactically, anyway. In some sense,
you can view the envisioned lambda as nothing different than a block.
The only news is that blocks can also be passed to functions in which
case they become nothing else than a functor. To allow arguments to
these functors, you can use the "special" variable "_1", "_2", ... I
don't consider this overly complicated. Of course, it assumes that
the user has at least a vague idea about what a "functor" is. However,
this is a pretty central concept and without knowing it a user would
not be very effective in using STL algorithms anyway.

Perhaps I'm too closed minded on the issue, but when I started reading about
delayed evaluation, etc., I had the impression that lambda can get pretty
tricky.

There's also the problem of not seeing everything relevant to the
construct
where it is located. I would have to think about the topic a bit to
provide a concrete example, but I am confident in what I asserted.

I'm also confident in what I have asserted.
The proposed foreach construct has the ability to take bounds as
arguments
so I could (probably) make this work. This is just something I have
handy
that shows the kind of complexity I am thinking about. I wouldn't even
know where to begin doing it with a lambda expression:

The funny thing is that it wouldn't look that much different at all
- despite the fact that I wasn't able to locate any use of a "for_each"
construct... To make use of lambda expressions in this ordinary C++
code, you would essentially replace the "for" statements by function
templates (which can be provided by a library) which essentially call
each block with a corresponding index. The index would not be called
"i", "j", ... but "_1", "_2", ..., e.g.

for_each_index(_radii, {
osg::Vec2 unv(_radii[_1 + 1] - _radii[_1]);
unv.normalize();

for_each_index(_radii, _1, _ring, {
osg::Vec3 v0(_radii[_1][0], _radii[_1][1] * _ring[_2][0],
_radii[_1][1] * _ring[_2][1]);
// ... (I think it is clear how to modify the remainder to
// achieve the same effect of the loop)
});
});
Of course, this uses two different overloads of the 'for_each_index()'
function:
- one taking a unary functor as argument.
- one taking a binary functor as argument (the first argument is
fixed by the function templates second argument, the second is
the varying index)

Perhaps I should study it more carefully. Thanks.

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions
(actually, there are more operations in each group but the above are
sufficiently characteristic). Thus, you would need a slow conversion
from a list to an array if all algorithms operate using random access.
Or you could decide that you cannot insert into the middle of a
container efficiently.

But I don't see how this distinction is any different for Java than it is
for C++. My point in saying that you can call toArry on a Java container
was that you can impose a random access mappin over the collection, and
use
random access algorithms to operate on it. Mind you with Java you will
mostlikely be sorting pointers.

Imposing any random access structure (one where all elements can be
accessed in O(1)) to a container where you can insert in O(1) at
arbitrary positions involves at least an O(n) operation. Although it
reduces the constant when only using pointers it does not change
the asymptotic complexity. Thus, doing such a conversion necessarily
imposes a substantial overhead, especially if the actual operation
can be done with a complexity smaller than O(n), like searching in
a sorted sequence which can be done in O(ln n). I assume you know
enough about complexity theory to make sense of these funny O(f(n))
notations (if not you should read up on them; this is a pretty
important topic if you want to do serious software development).

I'm not fully convinced of this. If the elements actually are stored in an
array with the possibility of holes, providing random access *may* not be
as expensive as you are suggesting.

I would have to read up on exactly what determines the backing data
structure of the various Java collections. They may actually hold
everything with an Array of pointers, and merely manipulate pointers to
provide the different interfaces. If you add an element to a linked list
you could put it at the end of the array and put it's pointer somewhere
else in the linked list. Yes you will pay an indirection cost. And Java
may also leave holes on removal of elements, and pack it when it feels
like it. I don't recall, if I ever knew.

You cannot get both: insertion at an arbitrary position and random
access. Well, you can get insertion at an arbitrary position and
random access if you don't need the order for the random access to
retain the original order. Of course, loosing the original order
effectively kills your chances to use a more efficient algorithms
taking advantage of random access. That is, simply adding things to
the end of an array does not help you when you want to pass the
sequence to an algorithm.

I'm hesitant to accept absolute statements about whether it can "help".
There may be a way to leverage the order of the linked list mapping in
order to optimize the production of a packed, ordered array.

I don't know how their so-called generics work, but supposedly they
address
some of that. I'm sure the designers of Java understood the tradeoff,
they just opted for sloppy type checking to get flexible behavior.

I don't think that the Java designers understood anything about generic
programming at all, otherwise they would not have done what they did.

I was talking about the original tradeoff. Basically Gosling took the Emacs
Lisp idea that a list is a list is a list, and applied it to an OOPL that
looks like C++ and acts like Smalltalk.
I understand the restrictions they felt themselves caught in but I have
seen a much better approach to generic on a JVM than what Java now has!
They opted for something which was done in C++ something like 15 years
ago and which became what are templates now. Essentially, the Java
designers refused to learn the lessons learned in the C++ context (of
course, they also refused from other experiences before, too).
I can't comment on what they now have. About two years ago I suggested they
leave things alone and simply focus on effectively using what they had.

The one feature of
Java Arrays that makes them superior to C++ built-ins is the .size() (or
is that .length()?) method.

It is trivial to create a class which looks like a built-in array but
adds a 'size()' member. The standard library does not do it because it
is deemed sufficient to provide 'std::vector': the two extra pointers
per object are not considered to be too expensive and I doubt that
there are indeed many applications where they are harmful. In the few
cases where they are, it is sufficiently easy to create a suitable
class [template].

The syntax in C++ is not as elegant for creating multidimessional jaggad
arrays of std::vector as it is in Java for creating their smooth
counterpart.

You can do it with sizeof, but that gets really
weird when you start passing them as parameters.

Of course, I have rare occasion to pass arrays in the first place. I
generally pass iterator pairs to suitable function templates. Maybe you
should try to understand what STL is about and you will realize that
there are few situations where you want to restrict your functions to
accept arrays anyway.

It's more the case that the data I'm working with lends itself naturally to
being expressed as multidimensional arrays. The same can be done using
std::valarray and std::gslice, but it is not as intuitive for me.

Suggesting that there may be an STL approach to the same problem may be
useful. Suggesting that I am not trying to learn to use the STL is wrong.
It may well be that STL iterators will do very much the same thing that
using boost::array<> will do. When I pass a boost::array<>* I'm passing a
reference to the data, and information about its bounds. I typically want
to iterate over all elements of the array, and often want to do that for
each member array. This is easy to do if I can simply pass the whole data
structure by reference, and have the called function determine the bounds.
STL iterators may give me more flexibility in such a situation, but I'm not
sure I gain much from that. It would seem more productive to find examples
where the array approach is used, and show the advantage of using the STL
approach instead, than it would be to suggest that a person is not trying
to learn the STL approach.

When I was able to divide the STL conceptually into two groups of things;
collections, and algorithms, I was able to think about it in finite
terms.
It no longer seemed open-ended, That's what I mean. That's what I
consider the first step in really understanding something.

Except that neither the set of collections nor the set of algorithms
is finite to the STL! Part of understanding STL is that although you
have an infinite set of collections, you only need to implement each
algorithm a finite (actually small) number of times. That is, if you
want an algorithm which is applicable to all containers, you only
need to provide one implementation thereof (possibly, you need a few
more implementations if you want to take advantage of specialstructures
for providing better performance but this is only about optimization).
On the other hand, if want to create a container which can be used with
all algorithms, you only need to provide a few entities abstracting
your container (e.g. iterators). That is, both the set of containers
and the set of algorithms are open. The only thing which is indeed
conceptually finite in STL is the set of concepts (and even this is
only finite per domain addressed by the algorithms and data structures).

BTW, STL conceptually consists of at least four different things:
- concepts

Does the Standard currently provide a formal definition (I'll tell you when
I've stopped laughing) of "concept" in the STL sense? I do believe the
term has a specific meaning in the context of the STL, and I believe it is
close to what I call a meta-type.
- algorithms
- iterators and functors
- containers
(listed in decreasing order of importance).

When I was talking about the STL being finite, I meant the actual number of
specifications in the C++ Standard. I agree your list is more complete
than mine. I suggest that iterators should be placed on a separate line
above functors. Helper functions (for lack of a better term) may also be
worth mentioning. That is mem_fun vs mem_fun_t. You may prefer to leave
the notion of traits implied by the 'T' in STL.

I only have 8 more pages left in Chapter 13. But perhaps I'm wasting my
time. Can you suggest a *_good_* book. Guess I'll skip chapter 14. I'm
sure the quality of presentation is not up to your high standards.

This is an interesting statement, especially taking the last sentence
of the first paragraph of Chapter 14 into account... I'm undecided on
whether you are ridiculing me or you didn't know what is says. In any
case, you are right with your assumption that I would present the
contents differently today (but I still think it is appropriate).

Well, since I've been getting a lot of negative feedback on the idea of
viewing the Standard Headers sans comments as an instructive exercise (and
you seem to be aware of that discussion), and you suggested I stop looking
at the the headers and, _instead_, read a book on the subject, your comment
didn't come across very well. I'm a slow reader. I have what some people
call learning disabilities. It's very hard work for me to get through a
book like that.

I will share this observation about my experience in reading the book. The
reason I was eight pages from the end of Chapter 13 is because it wasn't
"clicking". There's something that's difficult to explain which happens to
me when I fail to understand some key concept in such a development. It
happened when I was reading a book on the physics of elasticity. I had not
fully understood the difference between force and stress. There are some
fairly elementary trigonometric derivations following the discussion of
stress which I simply stared at without being able to follow the
derivations. Once I figured out the conceptual part of what distinguishes
stress from force, the derivations were very easy to follow. This may not
seem rational to most people. It doesn't seem rational to me.
Nonetheless, it's how my mind works.

I cannot claim to fully understand every step in detail regarding the
discussion 676-677, but after examining the various I/O Header files I've
extracted from the Standard, the discussion began to make sense.

Obviously the most important of these is the one containing:

namespace std {
template <class charT, class traits = char_traits<charT> >
class basic_streambuf {...};
}

No. And that's not really what I'm saying. I don't believe I would be
able to write intuitive code using the feature.

Can you point out where

for_each(vec, { std::cout << _1; });

is more complex than

for_each(int i: vec) { std::cout << i; }

? These two statements are indeed different but I seriously don't see
where they are substantially different to justify that the first
code is not intuitive...

In the latter I can see where 'i' is coming from.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 23 '05 #44

Alternatives to the C++ Standard Library?

Similar topics