Bytes | Software Development & Data Engineering Community

Alternatives to the C++ Standard Library?

Now that I have a better grasp of the scope and capabilities of the C++
Standard Library, I understand that products such as Qt actually provide
much of the same functionality through their own libraries. I'm not sure
if that's a good thing or not. AFAIK, most of Qt is compatible with the
Standard Library. That is, Qt's containers can interoperate with the STL,
and you can convert back and forth between std::string and QString, etc.

Are there any other libraries based on Standard C++, but not on the C++
Standard Library? What might these be? Why do people create these? Are
any significantly different from the C++ Standard Library? Is it good that
people create their own basic libraries instead of using the Standard
Library?
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell
Jul 23 '05
43 5019
> ... and a way to further complicate the C++ language to satisfy the
request of a few academics. Sorting and searching are the only
'algorithms' one needs. The rest is better done with iterators and an
'old-fashioned' for-loop.


Really? Your knowledge of algorithms is extremely limited then.

nth_element(), partition() and stable_partition() are all extremely useful
but they don't quite fit into "sorting or searching" (with the possible
exception of nth_element()).

Stephen Howe
Jul 23 '05 #31

First of all, thanks for the detailed and thoughtful reply - it is much
appreciated.

Dietmar Kuehl wrote:

... as stated, you will indeed want to
split the iterator concept into two distinct concepts: one for moving
(cursors) and one for accessing data (property maps).

So for example a function that returns the "edit distance" between two
sequences - that is the minimum number of substitutions, insertions, and
deletions needed to turn one sequence into the other - might have an
interface something like:

template <typename RandomAccessCursor1, typename RandomAccessCursor2,
          typename ReadablePropertyMap1, typename ReadablePropertyMap2>
unsigned int edit_distance(
    RandomAccessCursor1 b1, RandomAccessCursor1 e1,
    RandomAccessCursor2 b2, RandomAccessCursor2 e2,
    ReadablePropertyMap1 rpm1 = identity_property_map<typename
        RandomAccessCursor1::value_type>(),
    ReadablePropertyMap2 rpm2 = identity_property_map<typename
        RandomAccessCursor2::value_type>() );

A few first impressions:

The algorithm, a dynamic programming algorithm, does not really need random
access to the sequences, it just needs to know up front the lengths of both
sequences, which should be available in constant time, and it needs to make
multiple passes over one of the sequences - by convention say sequence two.
So after traversing sequence two it needs to go back to the start and
traverse the elements in order again. The other sequence is traversed in
order once. So "random access" is more than is really required.

There is a requirement that the types returned by rpm1 and rpm2 can be
compared using "==".

I gather that an identity_property_map has lvalue access - T& pm(*c), but
when c iterates over small elements, like char, I would want T pm(*c) - the
access occurs in the inner loop of the algorithm. For large objects of
course I would want const T& pm(*c) - so I guess the defaults above should
be something like readable_identity_property_map - an object that would
have the desired behavior.

In reality the hypothetical function "edit_distance" would be at most a
convenience function - for a variety of reasons the real worker would be a
class, most likely a functor.

The first version of "edit_distance" I wrote long ago had the interface:

distance(const char *p, const char *q);

Then it became necessary to compare substrings, so:

distance(const char *pS, const char *pE, const char *qS, const char *qE);

Then it became necessary to compare the results of a character recognition
device, a list of structs containing first, second, and third choice
characters, each with an associated confidence level, with strings,
substrings, and each other. Then came sequences of vertices and edges in
planar maps, and now sequences of DNA, RNA, amino acids, genes...

My thesis is still
available at
<http://www.dietmar-kuehl.de/generic-graph-algorithms.pdf> . However,
there are a few things to note:
Thanks. I downloaded it despite the caveats below.

- The Boost Graph Library (BGL) by Jeremy Siek et al. is based on
very similar ideas but is completely available as a library you
can download and there is a book describing it written by Jeremy.
I'd recommend that you rather look at this.
In fact I had ordered the book before asking about your thesis - should have
it in "2 to 4 weeks". In the meantime I thought it would be useful to see
some of the history of the ideas. Also there is much documentation to read
on-line - so no shortage of study materials during those 2 to 4 weeks. I do
also have a 2 year old copy of boost, including BGL, on one of my machines.
I took a quick look - probably should get the latest...
- Since I wrote the thesis, some things evolved and the important
concepts have undergone some changes, not to mention renaming
(due to discussions with Jeremy and others). In particular, what
I originally called "Data Accessor" is now called "Property Map".
Also, to distinguish concepts, I still used the term iterator for
the traversal concept rather than using "Cursor".
- The currently favored concept for property maps looks quite
different, partially due to quite recent discussions with Dave
Abrahams and Doug Gregor. In short, the concept looks like this
("pm" is a property map; "c" is cursor; "v" is a value as obtained
by combining the type of "*c" with the type of "pm"):

- reading a value: v = pm(*c); // requires readable PM
- writing a value: pm(*c, v) // requires writable PM
- lvalue access: type& v = pm(*c); // requires lvalue PM
This looks pretty clean.

For more information on the current state of property maps, have
a look at
<http://boost-consulting.com/projects/mtl4/libs/sequence/doc/html/cursors_and_property_maps.html>


I took a quick look. Really I will have to write some code to get a better
feel for how to use them in the design of my library, but my first
impression is that it makes sense to do so.
Just as an aside - the inputs, and often the outputs, of sequence
analysis are sequences - strings - but intermediate data structures like
Suffix Trees have nodes and edges with properties that vary by algorithm.
So the property map idea looks interesting.


The original idea of property maps (then called data accessors)
arose in the context where there are several problems related to
properties associated with nodes and edges:

- The set of used properties vary widely between different algorithms
and depending on the used algorithm you want to provide them
somehow.
- Different algorithms think of the same property in different terms.
For example, what is a length to one algorithm (e.g. a path finding
algorithm) could be a capacity to another algorithm (e.g. a flow
algorithm). Still, the algorithms want to communicate with each
other. This was actually the primary reason why I introduced
property maps for the graph abstraction.
- The same properties are often differently represented and
interrelated. For example, in flow networks each directed edge
often has a total capacity, a current flow, and a free capacity.
Obviously, "free capacity == capacity - current flow" holds, i.e.
it is sufficient to represent only two of the three attributes
and depending on the context you don't know which is the best
combination.

From your description it sounds as if there are similar issues for
the analysis of sequences and I guess that property maps can be a
viable way to address these problems.


Yes. Several of the central data structures are various kinds of trees and
DAGS - special cases of graphs. I will definitely have to study BGL - both
the book and the code.

- Michael

Jul 23 '05 #32
Dietmar Kuehl wrote:
The standard is part of a contract between the user of a C++ tool (e.g.
a compiler) and the implementer of said tool. Contracts tend to be
lawyerly. The problem is that the two parties agreeing on the contract
have radically different views of its content and the specification has
to cater for both. At the same time it should be minimalist, consistent,
and complete. I doubt that translating it into English would be a useful
enterprise, not to mention a futile one (the real standard will stay in
more or less the form it currently is).
It's the ulterior motive that bothers me. To wit: built-in job security for
language lawyers. One aspect of the language specification that surprised
me is the amount of forward references it makes. I expected it to be more
linear, not introducing concepts until their prerequisites were formally
presented.
I believe it may be the strongest justification for having the 'T' in
STL.


Well, the strongest justification for using compile-time polymorphism
is that this allows the search for the best algorithm. Also, templates
provide a mechanism which allows dispatch based on multiple participants,
something only few systems support with run-time polymorphism. In any
case, the performance benefit is beyond merely avoiding a few virtual
functions: the fact that e.g. the result type of an operation is
statically known but can still vary with different instantiations is
rather powerful.


I believe that's what I said, even though I don't know what you just
said. :) I have much to learn yet.
I'd go with "operations", "solver" sounds like something out of the
marketing department. :)


I doubt that the term for algorithms is changed in this context, however.
"Operation" would be a better fit from my perspective but I'm not a
native English speaker.


Google for `Schroedinger ansatz'.
I'm not sure what you mean by variation point.


Essentially, something which can be changed when using a polymorphic
implementation. In statically typed object-oriented languages the
virtual functions are the variation points. With static polymorphism
the operations implementing a concept are the variation points, e.g.
operator++(), operator*(), and operator==() for iterators (this is
an incomplete list, however).


I believe what you are saying is a variation point is a "branch point" the
compiler cannot analyse. My wording is probably sloppy, but I suspect the
concept is there.
This is why I have serious questions about mem_fun. It is typically
implemented using a pointer to member function, which according to what I
was able to glean from thumbing through _Inside_The_C++_Object_Model_ has
fairly bad performance.


Does it? When I measured using pointer to member functions I had
differing results depending on how the pointer was actually used.
However, it is indeed likely that mem_fun has no real chance to do
a good job. If the member [function] pointer is turned into a
template parameter, the function can even be inlined. On the other
hand, if it is turned into a variable, this chance is reduced
substantially. I haven't measured this effect lately, though. It is
well possible that compilers are smart enough to detect that the
pointer does not change at all.


It's really a question of what the compiler can deterministically know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection, etc)
can be eliminated. If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".
In Java and C# these three operations are folded into one operation
("Next()") which returns a null value if there is no next position
and otherwise returns the current value and moves the iterator to
the next position.


But this is quite easy to implement in C++, I've even done it.


Many people have done so before STL was introduced! ... and they all
failed to deliver the performance and flexibility STL delivers.
Actually, they generally failed to deliver general algorithms in the
first place!

What I meant is that I've done it starting with STL iterators. It's really
quite simple. There is one qualification, and I would have to check the
Java documentation to see exactly how Java handles this: rather than
checking next()==null, I used bool hasNext(); It's a straightforward
wrapper around operator++(int);
One of the key features of the STL approach is that algorithms
accept iterators with minimal requirements (typically iterators
are only required to be input or output iterators) but use an
algorithm best suited for the iterators actually used. For example,
'distance()' merely requires that the user passes a pair of input
iterators and it just moves through the sequence counting the number
of steps. However, if the user actually passes a pair of random
access iterators, the operation becomes a mere subtraction which
tends to be substantially faster.
From my reading of TC++PL, the best performance is often obtained
by using the member function for a particular container template. But it
does seem reasonable that the implementation could detect the container
type, and choose the best algorithm for it.
You cannot do a similar thing with enumerators using run-time
polymorphism: the interface is only known to be an enumerator, not
some extended interface. Well, you could (at run-time) test for
more powerful derived interfaces but the whole point is to improve
the performance not to kill it.

There are other issues, too. For example, if you combine the three
operations (advance, access, and test) into one operation you need
to perform all three even if you only need a subset. For example,
'distance()' never accesses the value. An algorithm using loop
unrolling only tests relatively few steps, not every step, and does
not necessarily advance the iterator in each step but might step
forward multiple elements at once. From a practical point of view,
separating access and test also has the advantage that it is not
necessary to have a dedicated null value for each type. This in
turn allows for value semantics, which is more general than reference
semantics (where you get an obvious null value): if you want
references you just use a proxy which behaves as if it is a value
(here overloading operator.() would be handy...). This is not
possible the other way around, i.e. you cannot simulate value
semantics if your basis is implemented to use reference semantics.
It seems your last sentence is the real distinction you are trying to make.
If I have value semantics at the fundamental level, I can probably find a
way for the compiler to evaluate my code having all three operations
combined in one, and to omit unused operations from the object code.
The naive
approach uses postfix operators, so may be slightly less efficient than
the STL form, but I suspect the difference is negligible since I maintain
an internal temp object to return as a reference.

Now *_this_* is elegant:

http://www.open-std.org/JTC1/SC22/WG...005/n1796.html

for( int i : vec ) std::cout << i;


I disagree. I would prefer something like this

for_each(vec, std::cout << _1);

Assuming a suitable implementation of 'for_each()' and a using
statement for an appropriate namespace this is already achievable
with current C++. However, if the operation becomes more complex
than an expression, it kind of stops working. In this case language
level lambda support becomes necessary. This yields a more flexible
and more general approach than the rather limited proposal of the
above paper. Of course, the above notation would introduce a minor
semantic difference to your statement: the element type of 'vec' is
used in the print expression. This is easily fixed assuming
language level support of lambdas:

for_each(vec, std::cout << int(_1))

if something akin to Boost::Lambda is built-in or

for_each(vec, { std::cout << int(_1); });

if braces are necessary to signal use of a lambda expression.
Please note what the first paragraph says about anonymous inner classes in
Java:

http://mindprod.com/jgloss/anonymousclasses.html

I know these are not identical to lambda expressions, but they fall into the
same category. I'm not really sure what it means to have lambda support in
the core language, but I believe this area of programming is not something
that would help any but advanced programmers working in certain specialized
areas. The foreach construct is something that would help a first-year
computer science major get his or her homework done more quickly and with
fewer errors.
Now, which is better? If you don't pay for calling multiple
functions, the former is more desirable because it offers more
flexibility and allows for smaller points of variability (although
it is still too coarse-grained...).


With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.


I doubt that 'toArray()' has O(1) performance on doubly linked lists.


I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure". I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.
Thus, it is dead in the water: effectively, it either has to copy the
list contents into an array, possibly using some form of proxies to
avoid at least copying the whole content, or it has to use iteration
to move freely within an array. Both cases kill the performance,
especially when using only a small fraction of the sequence as is
quite likely. If you don't care about performance, feel free to use
an approach akin to 'toArray()'. Or you may use vectors all the time.
In all projects I was involved in, performance *was* an issue. Not the
primary one but I always had to tune the software to make it run fast
enough. Deliberately killing efficiency has never been an option.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless that's
what you chose to put in them).

The closest thing C++ has is Josuttis's boost::array<> which is a bit more
cumbersome to work with, and is not dynamic.
I don't follow. What difference does it make whether I type the
following, or I leave it to the compiler to 'type' it for me:

using std::begin; // enable ADL
using std::end; // ditto
for( auto __begin = begin(vec), __end = end(vec);
     __begin != __end; ++__begin )
{
    int i = *__begin;
    std::cout << i;
}?


Since formatting the output for 'std::cout' and buffering the
result is the bottleneck in this example, it does not make much of
a difference. However, the compiler might write something rather
different instead:

using std::begin; // this is a neat trick; I should remember it
using std::end;
auto _B = begin(vec);
auto _E = end(vec);
for (; 2 <= _E - _B; _B += 2)
    std::cout << int(_B[0]) << int(_B[1]);
for (; _B != _E; ++_B)
    std::cout << int(*_B);

Of course, instead of "2" the compiler would rather insert "16"
or something depending on the size of the type and provide a
corresponding amount of operations. There are other optimizations
possible, too, even for such a trivial operation! Of course,
things like 'std::copy()' provide even more alternatives.


I'm confident there are advantages to using the existing STL algorithms
(what's the plural of ansatz?) over the foreach construct. This may be the
best counterargument to introducing the foreach construct: it might
encourage people to write suboptimal code. OTOH, it might also encourage
people to write C++ code in the first place.
Again, if you are intending Java, this will not necessarily apply. Java
does have some low-level optimization for its array object which is not
implemented in the same way as its other collections.


I consider it rather unlikely that you are always using array
objects and if you don't you incur a fair performance hit by
copying the content of your container into an array already even
if only references to the actual objects are copied. STL always
operates on the actual sequence and does so quickly if the iterators
are reasonably implemented. It is not by accident that Java does not
ship with an extensive algorithms library.


Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:

http://java.sun.com/j2se/1.5.0/docs/api/index.html

So now we have meta-types to deal with the type checking of our "pure
abstractions". At some point people are going to realize the meta-types
can be grouped into useful categories, and will want to check whether they
fit into these categories. In that event we will have meta-meta-types.
There are two ways of dealing with Russell's paradox: accept that
languages with self-reference allow for the formulation of
self-contradictory statements, or insist that sets cannot contain other
sets, only meta-sets can contain sets, and only meta-meta-sets can
contain meta-sets....


I don't think you want concepts on meta types.


If you are understanding my comments as a criticism that would be exceeding
my intent. I was merely mocking the ultimate futility of any effort at
creating a perfectly consistent and complete language.
On the other hand, people
doing meta programming might disagree... I still think that concepts can
provide some benefit when done correctly. They can also incur loads of
overhead with no benefit if done wrong. From my perspective it is still
open whether the correct or the wrong approach is chosen...
As I understand what you were saying, there are two approaches: trust the
programmer to tell you his or her template implements a certain concept,
or verify it explicitly. One suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.

This is a gotcha one way or the other. If you don't check for complete
implementation of a concept, some of your code could have errors in it, and
yet the program it appears in will still compile. For instance a function
defined for a class template but never used in an instantiation.
It is necessary that a container is accessible via iterators to
be used with STL algorithms. However, iterators need not at all be
related with any container. For example, an iterator over the even
integers could compute its current state and operations without the
need of any container.


Agreed, but that isn't very useful in understanding what's in the STL,
and how it works in general.


I think that understanding the concepts is the only viable route to
understanding what STL is about in the first place. What's in there actually
becomes secondary once you have understood the basic principle. Applying
STL concepts to rather strange entities helps in understanding what STL
is about, IMO: who would bother to implement algorithms on more or less
arbitrary sequences? Well, STL readily provides them and the user can
plug in more algorithms applicable to all sequences and, of course,
arbitrary new sequences. The only restriction on the definition of the
sequence is that depending on its capabilities you might get only certain
subsets of algorithms: again this "reasonable"-thing mentioned above.


I was talking about a first level working understanding, not a more complete
and comprehensive one. I will observe that this discussion helped solidify
my understanding that iterators are not "part of" the container, but are
instead instances of types bound to the containers through typedefs.
You will find that you have to iterate over a sequence in always the
same order unless you can iterate over it only once, i.e. if it is
accessed by an input or an output iterator. In all cases, the sequence
stays the same whenever you iterate over it (unless you change it
explicitly by means outside the iteration).


Ah! But what do you really mean by sequence in this case? They are ordered
mappings over an unordered collection.
For example, you could create an iterator that traverses
a binary tree using depth-first traversal, and another using
breadth-first
traversal. In this case we do have something of a container, but the
order of traversal is external to it.


Note that you got two different sequences here for the same container:
one for the DFS order and one for the BFS order. That is, a container
can have more than one sequence associated with it. Well, as stated
before, the container is actually quite irrelevant to STL anyway.


That's my point. The actual collection is not a sequence. The sequence is
imposed on it as some kind of mapping.
My problem with Boost.Lambda is that it introduces additional syntax into
the language,


Boost::Lambda is entirely implemented in terms of the current core
language with no changes made to support it. Where does it introduce
new syntax and, even more interesting, how?
and doesn't appear to provide much in the way of useful
additional functionality. It seems the entire effort is to find a way of
shoving more complex functions into the third parameter of std::for_each.


I thought you were in favor of n1796...!? *This* proposal is nothing
more than a glorified for_each! Lambdas can be used everywhere you
are using a functor. In fact, they would neatly solve your performance
problem with mem_fun(), too: Instead of 'mem_fun(&foo::bar)' you would
write '_1.bar()', '_1.bar(_2)', ... You'd need to be consistent with
your desires and look from a more general perspective than just solving
a minor quibble with a special case. Solving many minor quibbles with
just one general solution is much more desirable than having many
special cases. Lambda support *would* solve many minor problems and
avoid the need for special solutions.


But it needs to be comprehensible to be usable. For the beginner in C++
that will not be the case. The foreach construct is applicable to every 3D
program I have ever written. I might be able to fold all that into
superior lambda expressions, but only after I learn to use them, and I have
too much other stuff to learn before I try.
Perhaps I'm wrong about this, but if I had to chose between for( int i :
vec ) std::cout << i; and Boost.Lambda for inclusion in C++0X, I'd go
with the foreach construct.


You got a very limited view, IMO. I'd rather go with a general solution!


I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language required
to make it fully functional is the only argument I can see for wanting it
in C++0X. If it were just a library, I'd say let it be provided by and for
the people who understand it and want it.

--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell
Jul 23 '05 #33
Steven T. Hatton wrote:
Dietmar Kuehl wrote:

I consider it rather unlikely that you are always using array
objects and if you don't you incur a fair performance hit by
copying the content of your container into an array already even
if only references to the actual objects are copied. STL always
operates on the actual sequence and does so quickly if the iterators
are reasonably implemented. It is not by accident that Java does not
ship with an extensive algorithms library.


Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:

http://java.sun.com/j2se/1.5.0/docs/api/index.html

Whoops! Damn frames!
http://java.sun.com/j2se/1.5.0/docs/...reference.html
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell
Jul 23 '05 #34
"Steven T. Hatton" <ch********@germania.sup> wrote in message
news:UM********************@speakeasy.net...
Dietmar Kuehl wrote:
The standard is part of a contract between the user of a C++ tool (e.g.
a compiler) and the implementer of said tool. Contracts tend to be
lawyerly. The problem is that the two parties agreeing on the contract
have radically different views of its content and the specification has
to cater for both. At the same time it should be minimalist, consistent,
and complete. I doubt that translating it into English would be a useful
enterprise, not to mention a futile one (the real standard will stay in
more or less the form it currently is.
It's the ulterior motive that bothers me. To wit: built-in job security
for
language lawyers.


Excuse me? You've been campaigning for the past week or so to get
the C++ committee to do work that, so far, only *you* want to
have done. You've been told consistently that:

a) the committee is overworked as it is

b) if you want to get something done you have to do the work and
make a formal proposal

Please notice that not once has any committee member said, "Whoopee,
another thing to do to justify my continued work on the C++
Standard! Where do I begin?"

And now you accuse the committee, or at least parts of it, of
obfuscating the C++ Standard to ensure job security? To get this
past the moderators, I'll simply say ___________________________!
You can fill in the blanks yourself.
One aspect of the language specification that surprised
me is the amount of forward references it makes. I expected it to be more
linear, not introducing concepts until their prerequisites were formally
presented.


One cure for your surprise is to sit down and do the exercise.
Try restructuring the C++ Standard to remove all those forward
references you didn't expect. You might learn something.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 23 '05 #35
P.J. Plauger wrote:
And now you accuse the committee, or at least parts of it, of
obfuscating the C++ Standard to ensure job security? To get this
past the moderators, I'll simply say _______________ ____________!
You can fill in the blanks yourself.


Note that comp.lang.c++ is not moderated, i.e. there is no moderator
to blame for approving this article... (although I probably would
have approved this article if it had been posted to clc++m).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Jul 23 '05 #36
Steven T. Hatton wrote:
Dietmar Kuehl wrote:
It's the ulterior motive that bothers me. To wit: built-in job security
for language lawyers.
I'm not aware of anybody in the committee who has such "ulterior motive."
I certainly have not. As far as I can tell, everybody is doing their best
to maintain an efficient, flexible, and highly usable language. The fact
that the standard is not easy to follow derives from trying to avoid
complications by stating things only once and thereby avoiding
contradictions (which are still a problem). There is also the problem
that it is hard enough to push a change to the standard which enhances
the language definition. I'm almost 100% certain that it is even much
harder to move a change forward which supposedly only makes the standard
easier to read (for a certain fraction of the readers): Guaranteeing that
a change really does what is intended is pretty hard and we are all the
time busy fixing up oversights. I doubt that anybody is really willing to
accept the danger of breaking something just to make it easier to
understand! In fact, we even know about certain areas where there are
problems with the current definition (not only with respect to
understanding them but also what they are saying) and nobody dares
touching them!

... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees has a more
than average grasp of software development and thus does not really run
the risk of staying unemployed for extensive periods of time. And this
is not at all due to C++ only: I know of several committee members who
earn their money programming primarily in languages different from C++
(e.g. C, Java, or C#).
One aspect of the language specification that surprised
me is the amount of forward reference it makes. I expected it to be more
linear, not introducing concepts until their prerequisites were formally
presented.
Although this approach is desirable, it is not always possible in the
first place. In addition, this educational goal also has severe
drawbacks: currently, the standard is organized according to different
topics rather than any order imposed by trying to avoid forward
references. This makes it easy to locate specific sections. Since the
standard is not even intended to be read front to back, this organization
is much more reasonable. Also note that the standard is not intended as a
tutorial. It is intended as part of a contract and anybody determined
enough can learn to use it. It doesn't even take that long to get used to
using the standard. I think it took me less than three months to work
efficiently with the standard (of course, I didn't read the standard full
time in these three months: I just used it to look up the details when
writing answers to questions asked in comp.lang.c++). If you think that
a deep knowledge of the C++ standard provides job security, I don't think
this is too much of an investment. Of course, I think there is more
knowledge involved in securing the jobs of the committee members...

I think you should stop whining about the standard and just accept that
it is the way it is. If you think you can do an excellent job in
"translating it to English", I'm quite confident that there is a market
for a book with the translation referencing the corresponding sections
of the standard. In any case, wasting your and other people's time with
your complaints is not going to change an iota in the standard!
I doubt that the term for algorithms is changed in this context, however.
"Operation" would be a better fit from my perspective but I'm not a
native English speaker.


Google for `Schroedinger ansatz'.


The German term "Ansatz" is about the details of a solution to a problem,
very much like "algorithm" is a description of how to solve a problem.
I.e. "Ansatz" is as inappropriate as "algorithm" for the things currently
called "algorithms". I think "operation", "solver", etc., i.e. something
which states the problem, not how to solve it, is the right approach.
I believe what you are saying is a variation point is a "branch point" the
compiler cannot analyse. My wording is probably sloppy, but I suspect the
concept is there.
No, I don't think so. Actually, yesterday I read the term "customization
point" for exactly the concept I called "variation point". I just recalled
the term wrongly: "customization point" is an appropriate name.

[static resolution of a member using 'mem_fun()' and relatives]
It's really a question of what the compiler can deterministically know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection, etc)
can be eliminated.
Well, this is part of it. I had member function adapters which encoded
the member function in the type of the functor, e.g.:

template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
    T operator()(S& s) { return (s.*mem)(); }
};

When I played with stuff like this I verified that they are inlined
by contemporary compilers. Although 'mem_fun()' cannot possibly create
a corresponding object, const propagation in the compiler's optimizer
could in principle detect that the referenced member function never
really changes and possibly inline it. Of course, if the member
function is actually a virtual function, things become even more
complex but if the static type of the objects does not change, it
could indeed also detect this.
If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".
Loop unrolling can be done without knowing the bounds statically: in
general, you don't want to unroll the whole loop anyway, even if the
bounds are statically known. You want to unroll a bunch of iterations
and put them into a loop which steps with bigger strides.
What I meant is that I've done it starting with STL iterators. It's
really quite simple. There is one qualification, and I would have to
check the Java documentation to see exactly how Java handles this:
rather than checking next() == null, I used bool hasNext(); it's a
straightforward wrapper around operator++(int);
Like I said before: The goal of STL is to be fast, not to deliberately
kill the performance! Actually, one of STL's goals was to provide
generic algorithms applicable to all reasonable data structures which
are as fast as algorithms hand crafted specifically for the corresponding
data structures. Although this goal is not met in all cases the deviation
is normally in a very small margin. There were also secondary goals,
like the applicability to subsequences (if the sequence supports at least
forward iterators) which make operations like "hasNext()" as the basis
unsuitable.
From my reading of TC++PL, getting the best performance is often obtained
by using the member function for a particular container template.
You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithms for binary search (aka 'lower_bound()',
'upper_bound()', and 'equal_range()') are only applicable to random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.
But it
does seem reasonable that the implementation could detect the container
type, and choose the best algorithm for it.
The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, the list
iterator will only point to the current element in the list which does
not have a pointer to the list it is contained in.
It seems your last sentence is the real distinction you are trying to
make. If I have value semantics at the fundamental level, I can probably
find a way for the compiler to evaluate my code having all three
operations combined in one, and to omit unused operations from the object
code.
No. My point is that combining all three operations into one does not
leave the opportunity to omit unnecessary operations. Also, STL iterators
have the flexibility to use other operations if available, e.g. stepping
with 'it += n' (with n > 1).
Please note what the first paragraph says about anonymous inner classes in
Java:

http://mindprod.com/jgloss/anonymousclasses.html
I know how anonymous inner classes work in Java. I think that they are
indeed quite similar to at least my wish for lambda expression in C++:
essentially, I just want a more convenient approach to create functors
than having to write a separate function. Whether it should include
support for closures or not and how the syntax should look are actually
the only bigger issues.
I know these are not identical to lambda expressions, but they fall into
the same category.
... and I think that they also help programmers who are not that
advanced.
I'm not really sure what it means to have lambda support
in the core language, but I believe this area of programming is not
something that would help any but advanced programmers working in certain
specialized areas.
I put a quote from a little bit above here:
for( int i : vec ) std::cout << i;
for_each(vec, std::cout << _1);
The foreach construct is something that would help a first-year
computer science major get his or her homework done more quickly and with
fewer errors.
Can you please identify which part of the two above code excerpts makes
one superior over the other? If there is nothing which makes the first
excerpt *much* easier for non-advanced programmers, we should go with
the more general approach. Personally, I only see one minor difference
between the two codes above: the variable has a name in the first one
while it is referenced by position in the second one. Of course, if you
want to name, you would be free to do so:

for_each(vec, { int i = _1; std::cout << i; });

For other types than integers you would use 'T const& i = _1;' or
something similar. In teaching a feature like this, it is not even
necessary to state that there is a lambda expression creating a functor.
On the other hand, even if it is done it is not harmful if the reader
was exposed to the concept of a functor before.
I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure".
You just have to understand that it does not matter whether the
containers are interfaces or not. What matters for algorithms is what
kind of access can efficiently provided. In particular, you cannot do
both efficiently (meaning in O(1)):

- insert elements at arbitrary positions
- provide random access

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.
I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.
You stated that in Java you would use an array in your algorithms
(quoting you from an earlier article in this thread):
With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.
This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow

You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless
that's what you chose to put in them).
Java arrays have their own problems, too. Actually, the designers of
Java got the conversion rules wrong for them: arrays should allow no
conversion because otherwise you could easily get violations of the
Liskov Substitution Principle. Immutable arrays could be covariant but
Java has no concept of immutability on the type level.
The closest thing C++ has is Josuttis's boost:array<> which is a bit more
cumbersome to work with, and is not dynamic.
'std::vector' is the right match for Java arrays. The Java arrays are
not statically sized like C++ built-in arrays or 'std::tr1::array's.
There could possibly be a class between 'std::vector' and 'std::tr1::array',
i.e. a class which is sized at run-time but once the size is set does not
allow size changes. This would be a nearly exact match for Java arrays.
The only additional "capability" would be the conversion to arrays of
bases which is, however, broken in the first place.
I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?)
"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).
over the foreach construct. This may be
the best counterargument to introducing the foreach construct.
The best counterargument for introduction of a specialized solution is
that a more general solution exists.
It might
encourage people to write suboptimal code. OTOH, it might also encourage
people to write C++ code in the first place.
If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier also for new users.
Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:
I wasn't aware of a class providing algorithms on abstract containers.
Of course, it is the wrong approach to require a container in the first
place. However, the really interesting analysis is whether it provides
the algorithms efficiently or only with relatively minor losses compared
to hand-crafted versions for specific containers.
As I understand what you were saying, there are two approaches: trust the
programmer to tell you his or her template implements a certain
concept, or verify it explicitly.
No, it is more the other way around: the presence of required features
is or at least can be checked with both approaches. The difference is whether
the compiler decides whether certain requirements are met on the basis
that the corresponding declarations are present or whether it has to be
explicitly allowed to make use of the corresponding features. The issue
is essentially this: assume some generic implementation uses a member
function "foo()" with no arguments. Now, a user tries to use a class
with this generic implementation which indeed has a function "foo()"
with no argument. The question is essentially: can the compiler assume
that this function "foo()" meets the requirement without the user
explicitly stating so? After all, it could be by mere accident that
there is a function "foo()" but this function can do something rather
different than is expected by the generic implementation. For more
details, read the corresponding papers in the mailings.
One suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.
Concepts are basically that. However, the problems are rather different
than what you assumed.
I was talking about a first level working understanding, not a more
complete and comprehensive one.
Right. You need to understand concepts to understand STL - at the first
level. If you don't understand concepts you have no chance whatsoever to
ever understand STL.
I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the container,
but are instead instances of types bound to the containers through
typedefs.
... or, to put it differently, you still haven't understood what iterators
in STL are about. Maybe you should start picking up a book introducing
STL rather than trying to make sense of what is written in the header
files. You won't learn how STL works from the header files.
You will find that you have to iterate over a sequence in always the
same order unless you can iterate over it only once, i.e. if it is
accessed by an input or an output iterator. In all cases, the sequence
stays the same whenever you iterate over it (unless you change it
explicitly by means outside the iteration).


Ah! But what do you really mean by sequence in this case? They are
ordered mappings over an unordered collection.


I never stated differently. You did, however! Iteration in STL defines
a sequence. The elements are ordered according to the position in which
they appear in the iteration, i.e. the iteration uses a sequence. Hm,
sounds like a tautology. You should, BTW, forget about collections when
you think about STL! Sequences are not directly related to collections,
i.e. you [still] don't need a collection or a container to have a
sequence. If you have a collection or a container it is likely that you
have multiple associated sequences. For example, the STL containers are
all associated with two sequences, one moving from front to back the
other in the reverse direction.
That's my point. The actual collection is not a sequence.
The issue was whether a STL algorithms operate on sequences. Nothing
more and nothing less. I never claimed that the containers or collections
are sequence. I actually claimed that you don't need containers in the
first place and that STL operates on sequences. Actually, I think you
should do yourself a favor and stop confusing yourself!
But it needs to be comprehensible to be usable.
Right. Please point out why

for (int i: vec) std::cout << i;

is more comprehensible than

for_each(vec, std::cout << _1);
For the beginner in C++ that will not be the case.
If your only argument is that the beginner in C++ cannot understand it
while everybody else can, you are just pulling up a strawman.
The foreach construct is applicable to every 3D
program I have ever written.
So is the for_each() construct I have given. In fact, the only difference
is a syntactical one: You can view a block as nothing else than a functor.
In your proposal, you put the functor after the closing parenthesis, in
my proposal it is within. Semantically, your functor has no arguments but
accesses its context entirely using the closure while my functor has one
argument and does not necessarily need a closure at all. Big deal, isn't
it? The real difference is, however, that you have a small, tiny feature
which cannot be used anywhere else while I got a huge feature which can
be used in many different places! For example, my feature also covers
the functor passed to 'find_if()' while yours does not.
I might be able to fold all that into
superior lambda expressions, but only after I learn to use them, and I
have too much other stuff to learn before I try.
Sorry for you. Why walk if you also have the option to crawl? You got so
much other stuff to learn, too...
I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language required
to make it fully functional is the only argument I can see for wanting it
in C++0X. If it were just a library, I'd say let it be provided by and
for the people who understand it and want it.


If it could be implemented just as a library, I would not bother to ask
for it! Why would I? I can implement libraries myself which is a much
safer approach than waiting for compiler implementers to implement some
feature. However, since it requires core features to do it right, I
don't want too many competing features (like special case solutions) in
the language. Especially, if the special case can easily be addressed
using the general solution!
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Jul 23 '05 #37
Michael Olea wrote:
First of all, thanks for the detailed and thoughtful reply - it is much
appreciated.
You are welcome.
So for example a function that returns the "edit distance" between two
sequences - that is the minimum number of substitutions, insertions, and
deletions needed to turn one sequence into the other - might have an
interface something like:

template <typename RandomAccessCursor1, typename RandomAccessCursor2,
          typename ReadablePropertyMap1, typename ReadablePropertyMap2>
unsigned int edit_distance(
    RandomAccessCursor1 b1, RandomAccessCursor1 e1,
    RandomAccessCursor2 b2, RandomAccessCursor2 e2,
    ReadablePropertyMap1 rpm1 = identity_property_map<typename
        RandomAccessCursor1::value_type>(),
    ReadablePropertyMap2 rpm2 = identity_property_map<typename
        RandomAccessCursor2::value_type>() );

A few first impressions:

The algorithm, a dynamic programming algorithm, does not really need
random access to the sequences, it just needs to know up front the lengths
of both sequences, which should be available in constant time, and it
needs to make multiple passes over one of the sequences - by convention
say sequence two. So after traversing sequence two it needs to go back to
the start and traverse the elements in order again. The other sequence is
traversed in order once. So "random access" is more than is really
required.
Well, then don't use random access iterators:

template <typename InputCursor, typename ForwardCursor,
          typename ReadPM1, typename ReadPM2>
unsigned int edit_distance(InputCursor b1, InputCursor e1,
                           ForwardCursor b2, ForwardCursor e2,
                           ReadPM1 rpm1, ReadPM2 rpm2);

(I omitted the default arguments because they don't matter much for the
interface).
There is a requirement that the types returned by rpm1 and rpm2 can be
compared using "==".
This is just a usual requirement on some type which is not captured by
the interface at all.
I gather that an identity_property_map has lvalue access - T& pm(*c), but
when c iterates over small elements, like char, I would want T pm(*c) -
the access occurs in the inner loop of the algorithm.


I don't think that this matters if all relevant functions are inline. If
it does, you can easily create your own replacement for an identity
property map or use a specialized property map, e.g.:

template <typename T>
struct my_map
{
    T const& operator()(T const& k) { return k; }
};
template <>
struct my_map<char>
{
    char operator()(char k) { return k; }
};

Property maps tend to be as trivial as this but they still provide
interesting additional power.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Jul 23 '05 #38
Dietmar Kuehl wrote:
... So "random access" is more than is
really required.
Well, then don't use random access iterators:

template <typename InputCursor, typename ForwardCursor,
          typename ReadPM1, typename ReadPM2>
unsigned int edit_distance(InputCursor b1, InputCursor e1,
                           ForwardCursor b2, ForwardCursor e2,
                           ReadPM1 rpm1, ReadPM2 rpm2);


The question then becomes how to get the lengths of the sequences, which are
needed to initialize a table for dynamic programming. I suppose one answer
is to make them parameters also. Of course I would like to deduce the
lengths of the sequences from the cursor arguments when possible - use
cursor arithmetic when it's available, otherwise make a count-pass given a
cursor supporting multiple passes, otherwise require a length parameter.
I'm not sure yet how to do that.
... you can easily create your own replacement for an identity
property map or use a specialized property map, e.g.:

template <typename T>
struct my_map
{
    T const& operator()(T const& k) { return k; }
};
template <>
struct my_map<char>
{
    char operator()(char k) { return k; }
};

Property maps tend to be as trivial as this but they still provide
interesting additional power.


That gives me another idea - a property map would be a good place to put
character translation. A problem that comes up in DNA analysis is the
search for "complemented palindromes". A palindrome, of course, is a string
that reads the same forward as backward: "abccba". A "complemented
palindrome" is one that reads the same forwards as its complemented version
reads backwards, where each character of the alphabet has a unique
complement, and is the complement of its complement. Say {A <=> a, B <=> b,
C <=> c} - so then "abcCBA" would be a complemented palindrome. I am using
CharMaps already for that problem, but not reverse iterators - making
reversed complemented copies of the string to be searched. By separating
the concepts of movement and access I think I see an improvement both in
speed and space, and quite likely a reduction in code. Cool!
Jul 23 '05 #39
Dietmar Kuehl wrote:
Steven T. Hatton wrote:
Dietmar Kuehl wrote:
It's the ulterior motive that bothers me. To wit: built-in job security
for language lawyers.
I'm not aware of anybody in the committee who has such "ulterior motive."
I certainly have not. As far as I can tell, everybody is doing their best
to maintain an efficient, flexible, and highly usable language. The fact
that the standard is not easy to follow derives from trying to avoid
complications by stating things only once and thereby avoiding
contradictions (which are still a problem). There is also the problem
that it is hard enough to push a change to the standard which enhances
the language definition. I'm almost 100% certain that it is even much
harder to move a change forward which supposedly only makes the standard
easier to read (for a certain fraction of the readers): Guaranteeing that
a change really does what is intended is pretty hard and we are all the
time busy fixing up oversights. I doubt that anybody is really willing to
accept the danger of breaking something just to make it easier to
understand! In fact, we even know about certain areas where there are
problems with the current definition (not only with respect to
understanding them but also what they are saying) and nobody dares
touching them!


There are places where the wording is overly "artful". A very good example
is with the use of the term "header". And some statements which people
take to be definitions simply aren't.
... and job security does not really matter for the majority of the
committee anyway: essentially, the majority of the attendees has a more
than average grasp of software development and thus does not really run
the risk of staying unemployed for extensive periods of time. And this
is not at all due to C++ only: I know of several committee members who
earn their money programming primarily in languages different from C++
(e.g. C, Java, or C#).
You aren't seeing what I'm seeing. And I mean in terms of my sent mail, and
my inbox.
One aspect of the language specification that surprised
me is the amount of forward reference it makes. I expected it to be more
linear, not introducing concepts until their prerequisites were formally
presented.


Although this approach is desirable, it is not always possible in the
first place. In addition, this educational goal also has severe
drawbacks: currently, the standard is organized according to different
topics rather than any order imposed by trying to avoid forward
references.


I really wasn't thinking in terms of education. I was thinking in terms of
formal development of ideas. Mathematical rigor.
This makes it easy to locate specific sections. Since the
standard is not even intended to be read front to back, this organization
is much more reasonable. Also note that the standard is not intended as a
tutorial. It is intended as part of a contract and anybody determined
enough can learn to use it. It doesn't even take that long to get used to
using the standard. I think it took me less than three month to work
efficiently with the standard (of course, I didn't read the standard full
time in these three month: I just used it to look-up the details when
writing answers to questions asked in comp.lang.c++). If you think that
a deep knowledge of the C++ standard provides job security, I don't think
this is too much of an investment. Of course, I think there is more
knowledge involved in securing the jobs of the committee members...
I was, to some extent speaking metaphorically. More in the sense of ivory
tower than economics.
I think you should stop whining about the standard and just accept that
it is the way it is. If you think you can do an excellent job in
"translating it to English", I'm quite confident that there is a market
for a book with the translation referencing the corresponding sections
of the standard. In any case, wasting your and other people's time with
your complaints is not going to change an iota in the standard!
If, perchance you are referring to my conversations elsewhere, please note
that I already stated that I did not believe I would succeed in effecting a
change in the standard. I had dropped the subject completely until it was
insinuated that I was merely trying to get someone else to do my work for
me. What I have been attempting by continuing to discuss the issue is to
get people to think in terms of an API. I must say, I am stunned by the
number of people who don't believe that the declarations in an interface
are the formal statement of an API and should be consulted as such.
I doubt that the term for algorithms is changed in this context,
however. "Operation" would be a better fit from my perspective but I'm
not a native English speaker.


Google for `Schroedinger ansatz'.


The German term "Ansatz" is about the details of a solution to a problem,
very much like "algorithm" is a description of how to solve a problem.
I.e. "Ansatz" is as inappropriate as "algorithm" for the things currently
called "algorithms". I think "operation", "solver", etc., i.e. something
which states the problem, not how to solve it, is the right approach.


In terms of Schrödinger's equations, they are the operators which provide
the solution. Actually very similar to the concept of a function object.
I believe what you are saying is a variation point is a "branch point"
the compiler cannot analyse. My wording is probably sloppy, but I suspect
the concept is there.


No, I don't think so. Actually, yesterday I read the term "customization
point" for exactly the concept I called "variation point". I just recalled
the term wrongly: "customization point" is an appropriate name.

[static resolution of a member using 'mem_fun()' and relatives]
It's really a question of what the compiler can deterministically know at
compile time. If the parameter passed to mem_fun can be statically
determined by the context, the polymorphic aspects (vtbl indirection,
etc) can be eliminated.


Well, this is part of it. I had member function adapters which encoded
the member function in the type of the functor, e.g.:

template <typename T, typename S, T (S::*mem)()>
struct unary_member
{
    T operator()(S& s) { return (s.*mem)(); }
};

When I played with stuff like this I verified that they are inlined
by contemporary compilers. Although 'mem_fun()' cannot possibly create
a corresponding object, const propagation in the compiler's optimizer
could in principle detect that the referenced member function never
really changes and possibly inline it. Of course, if the member
function is actually a virtual function, things become even more
complex but if the static type of the objects does not change, it
could indeed also detect this.


I actually hadn't even considered that the member function could be static.
That does sound reasonable. I believe the numbers I looked at were
referring to virtual functions, and yes, if the type of the object whose
member is being invoked can be determined by the context at compile time,
the compiler should in principle be able to inline it.
If the bounds of the iterators can be determined at
compile time, it may be possible to "unroll the loop".


Loop unrolling can be done without knowing the bounds statically: in
general, you don't want to unroll the whole loop anyway, even if the
bounds are statically known. You want to unroll a bunch of iterations
and put them into a loop which steps with bigger strides.


That makes sense. I hadn't thought about that.
Like I said before: The goal of STL is to be fast, not to deliberately
kill the performance! Actually, one of STL's goals was to provide
generic algorithms applicable to all reasonable data structures which
are as fast as algorithms hand crafted specifically for the corresponding
data structures. Although this goal is not met in all cases the deviation
is normally in a very small margin. There were also secondary goals,
like the applicability to subsequences (if the sequence supports at least
forward iterators) which make operations like "hasNext()" as the basis
unsuitable.
Not sure what you mean here. The reason I mentioned the Java-style iterator
is because I believe in reality it is just using the three operations you
had enumerated, and I was suggesting that it could therefore be optimized
in much the same way. Acknowledging that it does use postfix
incrementation.
From my reading of TC++PL, getting the best performance is often
obtained by using the member function for a particular container
template.


You got it wrong. The generic algorithms are not applicable in all cases
but in some of these cases the internal structure of a certain container
still allows an implementation with good performance. In cases like this,
the container provides a corresponding operation as a member. For
example, the generic algorithms for binary search (aka 'lower_bound()',
'upper_bound()', and 'equal_range()') are only applicable to random
access iterators. However, 'set's and 'map's can provide efficient
searches [for the key type only, though] although they only support
bidirectional iteration. Thus, there are specialized versions as member
functions.


I thought that's what I said. I did, however, intend to type TC++SL. As in
Josuttis's book.
But it
does seem reasonable that the implementation could detect the container
type, and chose the best algorithm for it.


The implementation of generic algorithms does not detect the container
types but the operations provided by the iterator. Thus, it can choose
to take advantage of better suited operations if the iterator is more
capable. However, in general the iterators do not have access to the
container but only to the container's elements. For example, the list
iterator will only point to the current element in the list which does
not have a pointer to the list it is contained in.


I was thinking of a strategy implemented at a higher level. That is, the
compiler actually swapping out the algorithm to get the best performance.
It seems your last sentence is the real distinction you are trying to
make. If I have value semantics at the fundamental level, I can probably
find a way for the compiler to evaluate my code having all three
operations combined in one, and to omit unused operations from the object
code.


No. My point is that combining all three operations into one does not
leave the opportunity to omit unnecessary operations. Also, STL iterators
have the flexibility to use other operations if available, e.g. stepping
with 'it += n' (with n > 1).


I don't see why the compiler is obligated to perform all these operations
if the same behavior can be obtained by eliminating one of them.
Please note what the first paragraph says about anonymous inner classes
in Java:

http://mindprod.com/jgloss/anonymousclasses.html


I know how anonymous inner classes work in Java. I think that they are
indeed quite similar to at least my wish for lambda expression in C++:
essentially, I just want a more convenient approach to create functors
than having to write a separate function. Whether it should include
support for closures or not and how the syntax should look are actually
the only bigger issues.
I know these are not identical to lambda expressions, but they fall into
the same category.


... and I think that they help also programmers which are not that
advanced.


I've done some interesting things with them, but unless I'm using them every
day, they are difficult to understand. They take me more time to understand
and modify than does the slightly more verbose approach they replace.
There's also the problem of not seeing everything relevant to the construct
where it is located. I would have to think about the topic a bit to
provide a concrete example, but I am confident in what I asserted.
I'm not really sure what it means to have lambda support
in the core language, but I believe this area of programming is not
something that would help any but advanced programmers working in certain
specialized areas.


I put a quote from a little bit above here:
for (int i : vec) std::cout << i;        for_each(vec, std::cout << _1);
The foreach construct is something that would help a first-year
computer science major get his or her homework done more quickly and
with fewer errors.


Can you please identify which part of the two above code excerpts makes
one superior over the other? If there is nothing which makes the first
excerpt *much* easier for non-advanced programmers, we should go with
the more general approach. Personally, I only see one minor difference
between the two codes above: the variable has a name in the first one
while it is referenced by position in the second one. Of course, if you
want a name, you would be free to do so:


The proposed foreach construct has the ability to take bounds as arguments
so I could (probably) make this work. This is just something I have handy
that shows the kind of complexity I am thinking about. I wouldn't even
know where to begin doing it with a lambda expression:

for(size_t i = 0; i < _radii.size() - 1; i++) {
    osg::Vec2 unv(_radii[i+1] - _radii[i]);
    unv = osg::Vec2f(-unv[1], unv[0]);
    unv.normalize();

    for(size_t j = 0; j < _ring.size(); j++) {
        osg::Vec3 v0(_radii[i][0], _radii[i][1] * _ring[j][0],
                     _radii[i][1] * _ring[j][1]);
        osg::Vec3 v1(_radii[i+1][0], _radii[i+1][1] * _ring[j][0],
                     _radii[i+1][1] * _ring[j][1]);
        osg::Vec3 nv(unv[0], unv[1] * _ring[j][0], unv[1] * _ring[j][1]);

        v0 *= scale;
        v1 *= scale;

        if(_axis == X) {
            vertices->push_back(v0);
            vertices->push_back(v1);
            normals->push_back(nv);
        } else {
            vertices->push_back(rotv3(v0, _axis));
            vertices->push_back(rotv3(v1, _axis));
            normals->push_back(rotv3(nv, _axis));
        }

        normals->push_back(normals->back());

        size_t f2 = 2 * _facets;
        size_t n = f2 * i;
        size_t j2 = 2 * j;

        osg::DrawElementsUInt* quad =
            new osg::DrawElementsUInt(osg::PrimitiveSet::QUADS, 0);

        quad->push_back(n + j2);
        quad->push_back(n + j2 + 1);
        quad->push_back(n + (j2 + 3) % f2);
        quad->push_back(n + (j2 + 2) % f2);

        psl->push_back(quad);
    }
}

for_each(vec, { int i = _1; std::cout << i; });

For other types than integers you would use 'T const& i = _1;' or
something similar. In teaching a feature like this, it is not even
necessary to state that there is a lambda expression creating a functor.
On the other hand, even if it is done it is not harmful if the reader
was exposed to the concept of a functor before.
I'd have to read the API docs, but many of Java's containers are really
interfaces which make no requirement of the form of the "backing data
structure".
You just have to understand that it does not matter whether the
containers are interfaces or not. What matters for algorithms is what
kind of access can efficiently provided. In particular, you cannot do
both efficiently (meaning in O(1)):

- insert elements at arbitrary positions
- provide random access

For example, a vector has chosen to provide random access while a list
has chosen to provide element insertion at arbitrary positions (actually,
there are more operations in each group but the above are sufficiently
characteristic). Thus, you would need a slow conversion from a list to
an array if all algorithms operate using random access. Or you could
decide that you cannot insert into the middle of a container efficiently.


But I don't see how this distinction is any different for Java than it is
for C++. My point in saying that you can call toArray() on a Java container
was that you can impose a random access mapping over the collection, and use
random access algorithms to operate on it. Mind you, with Java you will
most likely be sorting pointers. That has advantages and disadvantages.
One advantage is that it's possible to impose multiple mappings at the same
time on the same collection.
I'll grant that you are at the mercy of the Java
implementation unless you can JNI your way around it. I'm not sure how
Java linked lists pose any significant disadvantage over their C++
counterpart.


You stated that in Java you would use an array in your algorithms
(quoting you from an earlier article in this thread):


I was saying that you could convert the linked list to an Array in order to
get random access. But that was assuming we were talking about C++ random
access algorithms.
With Java you can call toArray() on a collection, and get a random access
interface to iterate over with a for() loop using the same sequence of
checks you've stated above.


This would mean one of two things:
- Insertion into Java lists is slow
- Using algorithms on Java lists is slow

You cannot get both if you go through an array interface. I consider
this a huge disadvantage. There are, however, other alternatives like
going through virtual functions but these approaches have their own
problems, e.g. a loss in optimization opportunities.


I would have to read up on exactly what determines the backing data
structure of the various Java collections. They may actually hold
everything with an Array of pointers, and merely manipulate pointers to
provide the different interfaces. If you add an element to a linked list
you could put it at the end of the array and put its pointer somewhere
else in the linked list. Yes, you will pay an indirection cost. And Java
may also leave holes on removal of elements, and pack it when it feels like
it. I don't recall, if I ever knew.
Java Arrays are very friendly little animals:
http://java.sun.com/docs/books/jls/t...ml/arrays.html
Note that these are arrays of _values_ rather than references (unless
that's what you chose to put in them).


Java arrays have their own problems, too. Actually, the designers of
Java got the conversion rules wrong for them: arrays should allow no
conversion because otherwise you could easily get violations of the
Liskov Substitution Principle. Immutable arrays could be covariant but
Java has no concept of immutability on the type level.


I don't know how their so-called generics work, but supposedly they address
some of that. I'm sure the designers of Java understood the tradeoff, they
just opted for sloppy type checking to get flexible behavior.
The closest thing C++ has is Josuttis's boost::array<> which is a bit more
cumbersome to work with, and is not dynamic.


'std::vector' is the right match for Java arrays. The Java arrays are
not statically sized like C++ built-in arrays or 'std::tr1::array's.
There could possibly be a class between 'std::vector' and 'std::tr1::array',
i.e. a class which is sized at run-time but once the size is set does not
allow size changes.


That is something I would like to have in C++. You can do it with built-in
arrays, but you have to jump through a lot of hoops. The one feature of
Java arrays that makes them superior to C++ built-ins is the .length
member. You can do it with sizeof, but that gets really weird when you
start passing them as parameters.
This would be a nearly exact match for Java arrays.
The only additional "capability" would be the conversion to arrays of
bases which is, however, broken in the first place.
Agreed.
I'm confident there are advantages to using the existing STL algorithms
(what's the plural for ansatz?)


"Ansätze" (I don't know whether the third letter is shown on your
computer the way it should be because it is outside the conventional
Usenet encoding of ASCII; it should be an "a" with an "umlaut", i.e.
a horizontal pair of dots).


Other than the fact that the guys at SuSE change my keyboard mapping once a
week, I can do a reasonable job of typing in Old Norse, so the umlauts come
through, no problem.
If you explain why my version of the "for_each()" construct is vastly
inferior to yours, I can see reason to accept this argument. However,
I don't see it to be inferior. Actually, I see it as a vastly superior
construct because it removes syntax oddities in other places, too,
making the code easier also for new users.


See my code example above.
Bear in mind that Java is organized differently than C++'s STL, so this
comparison is only of limited worth:


I wasn't aware of a class providing algorithms on abstract containers.
Of course, it is the wrong approach to require a container in the first
place. However, the really interesting analysis is whether it provides
the algorithms efficiently or only with relatively minor losses compared
to hand-crafted versions for specific containers.


I really don't know. I'm trying to learn C++ right now. I haven't worked
with Java in well over a year.
One suggestion I made a while back is what I call
a massless interface. It's just a formal declaration of what is to be
provided, and does not become part of the executable code. The problem
with that is there may be circumstances when only part of the
meta-interface is needed by the class instantiated from the template.


Concepts are basically that. However, the problems are rather different
than what you assumed.
I was talking about a first level working understanding, not a more
complete and comprehensive one.


Right. You need to understand concepts to understand STL - at the first
level. If you don't understand concepts you have no chance whatsoever to
ever understand STL.


When I was able to divide the STL conceptually into two groups of things;
collections, and algorithms, I was able to think about it in finite terms.
It no longer seemed open-ended. That's what I mean. That's what I
consider the first step in really understanding something.
I will observe that this discussion helped
solidify my understanding that iterators are not "part of" the container,
but are instead instances of types bound to the containers through
typedefs.


... or, to put it differently, you still haven't understood what iterators
in STL are about. Maybe you should start picking up a book introducing
STL rather than trying to make sense of what is written in the header
files. You won't learn how STL works from the header files.


What I said is completely accurate regarding the iterators that are
associated with containers. I have no idea what you are talking about.
Every STL container other than bitset has iterator typedefs. They have an
iterator, a const_iterator, a reverse_iterator, and a
const_reverse_iterator. The sequence types have a difference type for
storing the difference between iterator positions.

The reverse iterators are actually instances of iterator adapter templates
taking iterators as arguments. They use a trick that separates the
physical and logical positions of the iterators so that container.end() is
the same as the physical position of container.rbegin(), but
container.rbegin()'s logical position is container.end()-1. That way you
can have half-open ranges in both directions using the same physical
positions. The main reason for doing this is that the language doesn't
guarantee you can form a pointer before the start of an array. If you
tried to use address -1, you'd probably be in trouble.
There are different categories of iterator:

input iterators which read forward

output iterators which write forward

forward iterators which read and write forward

bidirectional iterators which read and write forward and backward

random access iterators which add integral indexing to the functionality of
bidirectional iterators (only for sequences, not associative containers)

There are other iterator adapters:

Insert iterators:
back_insert_iterator
front_insert_iterator
insert_iterator

each insert iterator has an associated creation function:
back_inserter
front_inserter
inserter
There are also stream iterators.

Then there are iterator traits and iterator tags for creating iterators.

But it's important to bear in mind that iterators are a pure abstraction, so
anything that acts like an iterator is an iterator.

Perhaps I'm wasting my time reading Josuttis's The C++ Standard Library? I
only have 8 more pages left in Chapter 13. Can you suggest a *_good_* book?
Guess I'll skip chapter 14. I'm sure the quality of presentation is not up
to your high standards.
For the beginner in C++ that will not be the case.


If your only argument is that the beginner in C++ cannot understand it
while everybody else can, you are just pulling up a strawman.


No. And that's not really what I'm saying. I don't believe I would be able
to write intuitive code using the feature.
I'm not sure what core language changes are involved in adding lambda
support, but the fact that there are changes to the core language
required to make it fully functional is the only argument I can see for
wanting it
in C++0X. If it were just a library, I'd say let it be provided by and
for the people who understand it and want it.


If it could be implemented just as a library, I would not bother to ask
for it! Why would I? I can implement libraries myself which is a much
safer approach than waiting for compiler implementers to implement some
feature. However, since it requires core features to do it right, I
don't want too much competing features (like special case solutions) in
the language. Especially, if the special case can easily be addressed
using the general solution!


There is clearly a lot of tricky stuff to learn here:
http://www.boost.org/doc/html/lambda/le_in_details.html

--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell
Jul 23 '05 #40
