473,322 Members | 1,345 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,322 software developers and data experts.

Interface of the set classes

Ok, I first want to stress that I looked through the newsgroups archives
(on Google) for an answer to my question, but I didn't find it.

It's about the interface of the set classes as defined in the PEP 218.
I don't understand why the sets has an interface of mapping object
without associated value and not an interface of sequence ?

For me, sets are first sequences. I use set where lists are inefficient
or inadapted, for example when I need to remove redundancy and that I
don't care about the positions of my elements. So I happen to switch
from list to sets depending of the data involved (for example if the
data are non-redundant, lists can be more efficient, but if they are,
they become too expensive). And that's just impossible to do with Python
sets ...

Then, why can't you at all access sets element by position ? As Python
iterator are so poor (mainly, you cannot copy them ...) you have things
you cannot do without positionnal iteration (typically when you need to
store the current position and go on the iteration to restart from the
saved position later on).

Please, if this discussion already exists give me the pointer ... and
don't tell me "because of implementation if over dictionnary" ... I
don't think that can be a good answer :o)

Thanks,

Pierre Barbier de Reuille
Jul 18 '05 #1
9 1889
Pierre Barbier de Reuille <pi************@cirad.fr> wrote:
Ok, I first want to stress that I looked through the newsgroups archives
(on Google) for an answer to my question, but I didn't find it.

It's about the interface of the set classes as defined in the PEP 218.
You should check out the built-in set type in 2.4 (the beta is quite
usable, go ahead and get it!) -- dunno if PEP 218 has been updated to
reflect exactly what's ended up in the language.
I don't understand why the sets has an interface of mapping object
without associated value and not an interface of sequence ?
Sequences have order, sets don't.
Then, why can't you at all access sets element by position ? As Python
iterator are so poor (mainly, you cannot copy them ...) you have things
That's what itertools.tee is for (in Python 2.4).
you cannot do without positionnal iteration (typically when you need to
store the current position and go on the iteration to restart from the
saved position later on).
Yep, the quintessential usecase of itertools.tee.
Please, if this discussion already exists give me the pointer ... and
don't tell me "because of implementation if over dictionnary" ... I
don't think that can be a good answer :o)


It would be false (in 2.4) -- set is a builtin type, side by side with
list and dict, not relying on either of them.
It may well be that, for whatever needs, you need a different container
type. That's what the collections module in the standard library is for
(again, in 2.4) -- in 2.4 there's only deque in it, but that's the place
where, in 2.5, other collection types may go. In my case, the main
reason I find myself using something other than set, for example, is
when I have non-hashable items.

The best way to get new collection types into Python 2.5 is to write
them now (pure Python, pyrex, C, whatever), publish them (Activestate
cookbook, sourceforge, whatever), get a user-base so that experience can
enhance them, _and_ also get involved in python-dev to make a PEP and
champion it. No hurry, it's probably 18+ months to go to 2.5, but the
herein-suggested, soundest process to make it likely that one's changes,
in the best possible form, will get into 2.5's standard library, does
take its time, too;-)
Alex
Jul 18 '05 #2
Alex Martelli a écrit :
Pierre Barbier de Reuille <pi************@cirad.fr> wrote:

Ok, I first want to stress that I looked through the newsgroups archives
(on Google) for an answer to my question, but I didn't find it.

It's about the interface of the set classes as defined in the PEP 218.

You should check out the built-in set type in 2.4 (the beta is quite
usable, go ahead and get it!) -- dunno if PEP 218 has been updated to
reflect exactly what's ended up in the language.


I've read somehow the documentation, and that's exactly what's in the PEP :)

I don't understand why the sets has an interface of mapping object
without associated value and not an interface of sequence ?

Sequences have order, sets don't.


Yes, and ? I don't understand why that's a reason ? If you take the C++
STL, the sets implements the sequence concept but without the impossible
things to do ... so if you don't need the order in your list you can
switch freely from vector to list to sets depending on the properties or
efficiency you want. I don't know why this is different in Python !
Using "add" instead of "append" and "update" instead of "extend" does
not makes sense to me. It's even worse because Python rely more on
interface than in inheritence ... so making classes similar helps a lot
reusability.

Then, why can't you at all access sets element by position ? As Python
iterator are so poor (mainly, you cannot copy them ...) you have things

That's what itertools.tee is for (in Python 2.4).


Oky, thanks ^_^ So I should not need the __getitem__ anymore in sets.

you cannot do without positionnal iteration (typically when you need to
store the current position and go on the iteration to restart from the
saved position later on).

Yep, the quintessential usecase of itertools.tee.

Please, if this discussion already exists give me the pointer ... and
don't tell me "because of implementation if over dictionnary" ... I
don't think that can be a good answer :o)

It would be false (in 2.4) -- set is a builtin type, side by side with
list and dict, not relying on either of them.


Yes, I know it will become a built-ins type ! But that does not explain
this interface incompatible with lists. I would expect maximum
compatibility to be able to exchange the types. Even compatibility
between dictionnary and list is usefull ... because at some point you
can consider a dictionnary as a list of tuples. And then, too bad you
cannot because of simple design decisions.


It may well be that, for whatever needs, you need a different container
type. That's what the collections module in the standard library is for
(again, in 2.4) -- in 2.4 there's only deque in it, but that's the place
where, in 2.5, other collection types may go. In my case, the main
reason I find myself using something other than set, for example, is
when I have non-hashable items.

The best way to get new collection types into Python 2.5 is to write
them now (pure Python, pyrex, C, whatever), publish them (Activestate
cookbook, sourceforge, whatever), get a user-base so that experience can
enhance them, _and_ also get involved in python-dev to make a PEP and
champion it. No hurry, it's probably 18+ months to go to 2.5, but the
herein-suggested, soundest process to make it likely that one's changes,
in the best possible form, will get into 2.5's standard library, does
take its time, too;-)
Alex


No, what I want is not a new container type. I just want to be able to
change the container I use without changing all my codes ! It's too
expensive. At the end, I finish using sets and hashsets from the STL
exported to Python as sequences. The efficiency is the same and it just
works great :) I just feel bad not using the "standard" sets class ...

Pierre
Jul 18 '05 #3
Pierre Barbier de Reuille <pi************@cirad.fr> wrote:
...
I don't understand why the sets has an interface of mapping object
without associated value and not an interface of sequence ?
Sequences have order, sets don't.


Yes, and ? I don't understand why that's a reason ? If you take the C++
STL, the sets implements the sequence concept but without the impossible
things to do


C++'s std::set<...> is ordered (and so is STL's, both at the time in
which the C++ standard took inspiration from it, and in its long-since
diverged state).
... so if you don't need the order in your list you can
switch freely from vector to list to sets depending on the properties or
efficiency you want.
Nope -- to use std::set<T>, you need to be able to define a < comparator
that obeys the required semantic rules (either as an operator< on T, or
separately in various possible ways); to use std::vector<T>, you don't.
So, you cannot "switch freely" -- there are constraints.

That's because a std::set<...> is based on ordering (a Python's set is
based on hashing, so it imposes different constraints -- namely, that
the items be hashable, rather than having them be comparable).

Moreover, the set of valid operations on a std::vector<T>, call it V, is
extremely different from the set of valid operations on an std::set<T>,
call it S. You can write V[3], but you can't write S[3], for example.
Here, too, your assertion that "you can switch freely" is utterly
baffling.
I don't know why this is different in Python !
It's not: neither in Python nor in C++ can you switch freely.
Using "add" instead of "append" and "update" instead of "extend" does
not makes sense to me.
Try, in C++'s example above mentioned, calling S.push_back(...), then...
does the inability to use *THAT* make sense to you? It sure does to
_me_: V has an order based on insertion, so "pushing a new item at the
back" makes sense (though Python's 'append' terminology is clearer). S
doesn't (no order in Python, order based on item values in C++), so it
makes no sense to even think of "pushing onto the back" (or equivalently
"appending"). Basically, the commonality between S and V is: you can
loop on either (net of order &c issues). And that holds in C++ as well
as in Python.
It's even worse because Python rely more on
interface than in inheritence ... so making classes similar helps a lot
reusability.
Not when the semantics are too different, as in the case of S and V.
Then, why can't you at all access sets element by position ? As Python
iterator are so poor (mainly, you cannot copy them ...) you have things


That's what itertools.tee is for (in Python 2.4).


Oky, thanks ^_^ So I should not need the __getitem__ anymore in sets.


Well, you surely didn't get operator[] on C++'s std::set<...>s, that's
for sure...
It would be false (in 2.4) -- set is a builtin type, side by side with
list and dict, not relying on either of them.


Yes, I know it will become a built-ins type ! But that does not explain


It is one now -- just download 2.4.
this interface incompatible with lists. I would expect maximum
compatibility to be able to exchange the types. Even compatibility
between dictionnary and list is usefull ... because at some point you
can consider a dictionnary as a list of tuples. And then, too bad you
cannot because of simple design decisions.
You make claims, above, about the container templates in C++'s standard
library, that leave me in doubt. The very fact that you make strong
assertions without even hedging them in any way appears to indicate you
are experienced with those templates and familiar with them -- yet what
you say is false, appearing to indicate you aren't. So, I don't know if
I can take for granted all the rich conceptual framework of generic
programming, and just point to the fact that the interface differences
between Python's sequences, mappings and sets reflect exactly their
conceptual semantic differences -- but then, that should be prima facie
obvious!; or, do I need to try to forget the fact that you ever
mentioned C++, and try to teach and explain both the conceptual
underpinnings and the practical implications of generic programming (or
more generally signature-based polymorphism) in purely Pythonic terms.

You can probably clarify your background, and therefore which approach
would help you most, more effectively, than I can just "stab in the
dark" at it. Do you know generic programming, and the offerings in
C++'s standard library, solidly and in-depth? If so, do you feel that
C++'s standard library offers you the right amount of polymorphism among
vectors, sets and maps, while you perceive a lesser amount among
Python's lists, sets and dicts, and in what aspects precisely? Whether
this is the case, or not, can you present a simplified problem where you
feel there isn't enough polymorphism for your purposes? I may perhaps
be able to offer help in any case, both practical and theoretical, but
there is such a huge gulf among the various possibilities here that I
think it's probably better for you to explain exactly where you come
from and what you're after, first -- a theoretical background to help
you understand the rationale behind the choices, a practical approach to
use the best solutions for specific problems, etc, etc...!

No, what I want is not a new container type. I just want to be able to
change the container I use without changing all my codes ! It's too
It depends on what operations you're doing on that container, of course.
expensive. At the end, I finish using sets and hashsets from the STL
exported to Python as sequences. The efficiency is the same and it just
works great :) I just feel bad not using the "standard" sets class ...


But -- there IS no operator[] in the C++ standard library's sets! And
hashset (spelled with an underscore, I believe) is indeed a SGI
extension (so, in STL stricto sensu), but it's not in the C++ standard,
and AFAIK doesn't support indexing either...
Alex
Jul 18 '05 #4
Ok, I'll try to answer and better explain what I want to understand.

Alex Martelli a écrit :
Pierre Barbier de Reuille <pi************@cirad.fr> wrote:
...

Nope -- to use std::set<T>, you need to be able to define a < comparator
that obeys the required semantic rules (either as an operator< on T, or
separately in various possible ways); to use std::vector<T>, you don't.
So, you cannot "switch freely" -- there are constraints.
Right ... but take hash_set and you have exactly the same kind of "set".
In my case, I already had the '<' operator ... so it wasn't a problem.
Then, you'll argue that you need to be hashable. True, but as I said (I
don't mean here I was clear about that :o) ), the choice of the
container usually depend on the data set.

That's because a std::set<...> is based on ordering (a Python's set is
based on hashing, so it imposes different constraints -- namely, that
the items be hashable, rather than having them be comparable).

Moreover, the set of valid operations on a std::vector<T>, call it V, is
extremely different from the set of valid operations on an std::set<T>,
call it S. You can write V[3], but you can't write S[3], for example.
Here, too, your assertion that "you can switch freely" is utterly
baffling.
Yes, that's why I needed __getitem__ only because some things cannot be
done (in Python 2.3) without.

I don't know why this is different in Python !

It's not: neither in Python nor in C++ can you switch freely.


True, but in part ^_^ In C++, you can easyly design algorithm working
with whatever sequence you have. But for that you'll have to use
templates, iterators, probably tags, and on. But it's possible ... I
don't see a way to do it in Python ... and that's my whole point.

Using "add" instead of "append" and "update" instead of "extend" does
not makes sense to me.

Try, in C++'s example above mentioned, calling S.push_back(...), then...
does the inability to use *THAT* make sense to you? It sure does to
_me_: V has an order based on insertion, so "pushing a new item at the
back" makes sense (though Python's 'append' terminology is clearer). S
doesn't (no order in Python, order based on item values in C++), so it
makes no sense to even think of "pushing onto the back" (or equivalently
"appending"). Basically, the commonality between S and V is: you can
loop on either (net of order &c issues). And that holds in C++ as well
as in Python.


Yes, but use insert_iterator and you can insert with both vector and set
with the same syntax ... even if, obviously, the position you gave for
set will be ignored. So :

inserter(result, result.end()) = 3;

will insert 3 into result, at the end if result is "ordered" (as the
list in python), or where it has to be for "sorted" or "unordered"
containers. Genericity in C++ comes from template functions most of the
time.

It's even worse because Python rely more on
interface than in inheritence ... so making classes similar helps a lot
reusability.

Not when the semantics are too different, as in the case of S and V.


Ok, that's something I don't agree ... for me there is a common semantic
between S and V which is "storing values in a linear container". After
that, you can put the properties you need in that container, but if I
need to write an algorithm whose only assumption is "a linear container"
I want to be able to use whatever linear container I have ... and I
don't want to reimplement it for each.

Then, why can't you at all access sets element by position ? As Python
iterator are so poor (mainly, you cannot copy them ...) you have things

That's what itertools.tee is for (in Python 2.4).
Oky, thanks ^_^ So I should not need the __getitem__ anymore in sets.

Well, you surely didn't get operator[] on C++'s std::set<...>s, that's
for sure...


Well, no ... but it's not needed ^_^ And if I really need it you have
the "advance" template function that do the __getitem__ stuff in the
most efficient way, whatever your _linear_ container is.

It would be false (in 2.4) -- set is a builtin type, side by side with
list and dict, not relying on either of them.
Yes, I know it will become a built-ins type ! But that does not explain

It is one now -- just download 2.4.

this interface incompatible with lists. I would expect maximum
compatibility to be able to exchange the types. Even compatibility
between dictionnary and list is usefull ... because at some point you
can consider a dictionnary as a list of tuples. And then, too bad you
cannot because of simple design decisions.

You make claims, above, about the container templates in C++'s standard
library, that leave me in doubt. The very fact that you make strong
assertions without even hedging them in any way appears to indicate you
are experienced with those templates and familiar with them -- yet what
you say is false, appearing to indicate you aren't. So, I don't know if
I can take for granted all the rich conceptual framework of generic
programming, and just point to the fact that the interface differences
between Python's sequences, mappings and sets reflect exactly their
conceptual semantic differences -- but then, that should be prima facie
obvious!; or, do I need to try to forget the fact that you ever
mentioned C++, and try to teach and explain both the conceptual
underpinnings and the practical implications of generic programming (or
more generally signature-based polymorphism) in purely Pythonic terms.


So I hope I made my point clearer this time ^_^ I used Python for a bit
less than 2 years now (so that's not much) and C++ for 5 years with big
projects in C++ (so that's more). So I'm no reusability specialist, but
I used a lot the Boost library, where I learned a lot about reusability
in C++. What I read about Python reusability was based on
signature-based polymorphism as you call it. But then, it fails with
lists, sets and dictionnary. And I don't really see any other way to do
that (but to add methods to the sets ... and I don't want to do so).

You can probably clarify your background, and therefore which approach
would help you most, more effectively, than I can just "stab in the
dark" at it. Do you know generic programming, and the offerings in
C++'s standard library, solidly and in-depth? If so, do you feel that
C++'s standard library offers you the right amount of polymorphism among
vectors, sets and maps, while you perceive a lesser amount among
Python's lists, sets and dicts, and in what aspects precisely? Whether
this is the case, or not, can you present a simplified problem where you
feel there isn't enough polymorphism for your purposes? I may perhaps
be able to offer help in any case, both practical and theoretical, but
there is such a huge gulf among the various possibilities here that I
think it's probably better for you to explain exactly where you come
from and what you're after, first -- a theoretical background to help
you understand the rationale behind the choices, a practical approach to
use the best solutions for specific problems, etc, etc...!
Ok, so how do you write an algorithm on a container without specifying
which, knowing that you cannot use the iterators, neither the signature
of the class, nor the derivation ?

That was all the methods I imagined for Python's genericity. But none of
them work when you see the base containers.


No, what I want is not a new container type. I just want to be able to
change the container I use without changing all my codes ! It's too

It depends on what operations you're doing on that container, of course.

expensive. At the end, I finish using sets and hashsets from the STL
exported to Python as sequences. The efficiency is the same and it just
works great :) I just feel bad not using the "standard" sets class ...

But -- there IS no operator[] in the C++ standard library's sets! And
hashset (spelled with an underscore, I believe) is indeed a SGI
extension (so, in STL stricto sensu), but it's not in the C++ standard,
and AFAIK doesn't support indexing either...


As I said higher in this mail, that's true, set and hashsets does not
support operator[], but you can use the "advance" function to emulate
it. It inefficient with those, but when I have no other choices using
Python technics ... At least those parts are efficients with lists and
as inefficient as enything else with sets.


Alex


Pierre, who hopes to be clearer that time :)
Jul 18 '05 #5
Pierre Barbier de Reuille <pierre.barbier <at> cirad.fr> writes:

For me, sets are first sequences. I use set where lists are inefficient
or inadapted, for example when I need to remove redundancy and that I
don't care about the positions of my elements.


If you still need a fixed ordering after removing duplicates, why don't you just
revert to a list?
l = [11, 1, 2, 2, 3, 5, 11, 3, 7, 5]
list(set(l)) [1, 2, 3, 5, 7, 11] d = list(set(l))
d[3] 5

Of course, if you want to maintain the order that items are inserted, you can
easily do this manually:
class orderedset(set): .... def __init__(self, seq):
.... super(set, self).__init__()
.... self._items = []
.... for item in seq:
.... self.add(item)
.... def add(self, item):
.... if item not in self:
.... self._items.append(item)
.... set.add(self, item)
.... def __iter__(self):
.... return iter(self._items)
.... def __getitem__(self, i):
.... return self._items[i]
.... list(orderedset([11, 1, 2, 2, 3, 5, 11, 3, 7, 5])) [11, 1, 2, 3, 5, 7] orderedset([11, 1, 2, 2, 3, 5, 11, 3, 7, 5])[2:4]

[2, 3]

And, while you're writing your own class to do this, you can change your method
names to append, etc. if you like. =)

The point here, I guess, is that the uses you intend for sets probably do not
align so well with the uses everyone else has for sets. Personally, I don't
expect any ordering in a set, and if I need such ordering I convert back to a
list. I also almost never use the lst[x] syntax, preferring in general to just
iterate with a for-loop (or list comprehension, or generator expression...) If
you find you do need the lst[x] syntax all the time, first, I'd double-check to
see if I couldn't write things better with a for-loop, and then, if I was still
pretty sure I couldn't, I'd write my orderedset class as above...

Steve

Jul 18 '05 #6
Steven Bethard a écrit :
Pierre Barbier de Reuille <pierre.barbier <at> cirad.fr> writes:
For me, sets are first sequences. I use set where lists are inefficient
or inadapted, for example when I need to remove redundancy and that I
don't care about the positions of my elements.

If you still need a fixed ordering after removing duplicates, why don't you just
revert to a list?


I don't _need_ __getitem__ for itself. I need it because I don't know
how to iterate over a list and store the current position for future
use. If I really need to access a given element by position, obviously,
set would not be the right container. But the sets as they are
implemented now only allows simple traversal, and that's not acceptable
for lots of algorithms (or at the cost of complexity and complexity in
Python becomes quickly critical because iterating is a slow operation).

l = [11, 1, 2, 2, 3, 5, 11, 3, 7, 5]
list(set(l))
[1, 2, 3, 5, 7, 11]
d = list(set(l))
d[3]
5

Of course, if you want to maintain the order that items are inserted, you can
easily do this manually:

class orderedset(set):
... def __init__(self, seq):
... super(set, self).__init__()
... self._items = []
... for item in seq:
... self.add(item)
... def add(self, item):
... if item not in self:
... self._items.append(item)
... set.add(self, item)
... def __iter__(self):
... return iter(self._items)
... def __getitem__(self, i):
... return self._items[i]
...
list(orderedset([11, 1, 2, 2, 3, 5, 11, 3, 7, 5]))
[11, 1, 2, 3, 5, 7]
orderedset([11, 1, 2, 2, 3, 5, 11, 3, 7, 5])[2:4]


[2, 3]

And, while you're writing your own class to do this, you can change your method
names to append, etc. if you like. =)

The point here, I guess, is that the uses you intend for sets probably do not
align so well with the uses everyone else has for sets. Personally, I don't
expect any ordering in a set, and if I need such ordering I convert back to a
list. I also almost never use the lst[x] syntax, preferring in general to just
iterate with a for-loop (or list comprehension, or generator expression...) If
you find you do need the lst[x] syntax all the time, first, I'd double-check to
see if I couldn't write things better with a for-loop, and then, if I was still
pretty sure I couldn't, I'd write my orderedset class as above...

Steve


I agree, I can always create my own structure (and that's what I did in
exporting the STL set and hash_set) but I feel it's a shame to do so
when the builtin structure has all the needed characteristic, but my
algorithm cannot use it without a complete rewritting (and debugging ...)

Pierre
Jul 18 '05 #7
Pierre Barbier de Reuille <pierre.barbier <at> cirad.fr> writes:

I don't _need_ __getitem__ for itself. I need it because I don't know
how to iterate over a list and store the current position for future
use.


I'm not sure I understand what you're trying to do here. Could you give an
example? Iterators are resumable, so at least one sense of "storing the current
position" is just retaining the iterator:
l = [11, 1, 2, 2, 3, 5, 11, 3, 7, 5]
items = iter(set(l))
for i, item in enumerate(items): .... print item
.... if i == 3:
.... break
....
1
2
3
5 for item in items:

.... print item
....
7
11

If you could post an example of what you're doing right now with lst[x], maybe
we could suggest some other ways of approaching it that take advantage of, say
__iter__, instead of __getitem__?

Steve

Jul 18 '05 #8

"Pierre Barbier de Reuille" <pi************@cirad.fr> wrote in message
news:41**********************@news.free.fr...
For me, sets are first sequences.


In mathematics, a set is determined by its members. Period. The concept
'ordered set' -- a set with an order relation on its members, where a
relation is a set of ordered pairs -- is a specialization (and
complexification) of the concept 'set'. If some other language misuses
'set' to mean 'ordered set', you have been done a disservice by that
language.

Terry J. Reedy

Jul 18 '05 #9
Pierre Barbier de Reuille <pi************@cirad.fr> wrote:
...
So, you cannot "switch freely" -- there are constraints.
Right ... but take hash_set and you have exactly the same kind of "set".
In my case, I already had the '<' operator ... so it wasn't a problem.
Then, you'll argue that you need to be hashable. True, but as I said (I
don't mean here I was clear about that :o) ), the choice of the
container usually depend on the data set.


Sure.
call it S. You can write V[3], but you can't write S[3], for example.
Here, too, your assertion that "you can switch freely" is utterly
baffling.


Yes, that's why I needed __getitem__ only because some things cannot be
done (in Python 2.3) without.


Such as...? 'tee' is only needed if you need to "move forward a bit" on
one branch and then _backtrack_ to a previous snapshot, or the like. If
all you're doing is suspend an iteration and then resume it, say:

iter1 = iter(container1)
iter2 = iter(container2)

print 'here is the part of ct 1 up to the first FOO if any'
for item1 in iter1:
print item1
if item1 == 'FOO': break
else:
print 'Ah well, no FOO!-('

print 'and here is the part of ct 2 up to the first BAR if any'
for item2 in iter2:
print item2
if item2 == 'BAR': break
else:
print 'Sigh, no BAR!-('

if item1=='FOO':
print 'And here is the rest of ct1'
for item1 in iter1: print item1

if item2=='BAR':
print 'And here is the rest of ct2'
for item2 in iter2: print item2

IOW, as long as you only need forward iteration (w/o
snapshot/backtrack), you're in clover.
If you do need to "snapshot and backtrack", in the _general_ case you're
best off snapshotting into a list anyway - tee, in general, can't do
better than that (until and unless the PEP on copyable iterators lets
suitable iterator objects cooperate more deeply with tee...). So
"cannot be done" is a definite overbid -- you may just be spending a
little more time or memory than you'd like, worst case.

It's not: neither in Python nor in C++ can you switch freely.


True, but in part ^_^ In C++, you can easyly design algorithm working
with whatever sequence you have. But for that you'll have to use
templates, iterators, probably tags, and on. But it's possible ... I
don't see a way to do it in Python ... and that's my whole point.


Python has iterators, and doesn't need templates, since signature based
polymorphism is intrinsic. Python only has forward iterators (actually
only _input_ iterators, since you can never alter the underlying
container directly through an iterator) so what would it do w/tags?

If your complaint is that Python needs more categories of iterators,
please express your complaint in these terms -- rather than in unrelated
terms criticizing the design of container types, a cri de coeur that's
actually quite unrelated to the complaint.

I used to wish for 'snapshottable'/'restartable' iterators, but 'tee'
basically gives me the equivalent (though I'd still dearly love to know
that all suitable iterators _cooperate optimally_ with 'tee', that, at
least for builtin ones, is just a matter of implementation -- exposing
it as _design_ is ``just'' an issue of letting *user-coded* iterators
cooperate with 'tee' just as fully, if and when feasible).

I still do wish (somewhat) for some architecture for output iterators,
and (more) for one for forward iterators. (I would imagine that, if we
had forward iterators, an output iterator would just be a forward
iterator that doesn't return anything useful upon .next() and doesn't
raise StopIteration -- just offers something like a .set(newval) method,
or whatever a forward iterator would offer for that purpose).

Funny enough, not once since C++ stopped being my main programming
language have I found myself wishing for any other category of iterator
in any other language. Perhaps just because the kind of container that
naturally can support a forward-backward iterator isn't common anyway,
and a random-access iterator basically means indexing -- and then I'd
rather use indexing (and slicing...).
Yes, but use insert_iterator and you can insert with both vector and set
with the same syntax ... even if, obviously, the position you gave for
set will be ignored. So :

inserter(result, result.end()) = 3;

will insert 3 into result, at the end if result is "ordered" (as the
list in python), or where it has to be for "sorted" or "unordered"
containers. Genericity in C++ comes from template functions most of the
time.
Again, this basically seems to me to be a wish for more categories of
iterator. The key reason this cannot be done directly in Python is that
there is no output iterator concept.

Of course, it's not rocket science to 'roll your own' in Python (in
Python style, of course). The equivalent of "specializing a template"
requires some kind of typetesting or featuretesting; and we don't have a
"just past the end" ``bookmark'' to use, so, for example, the insert
method of a list can never be used to mean 'append' (sigh). So, we
probably need name distinctions between insert_at_end and
insert_before_position -- I don't think anybody will ever convince GvR
that the latter should accept and ignore a position argument, btw. But
at least insert_at_end should be uncontroversial:

def insert_at_end(container):
try: return container.append
except AttributeError: pass
try: return container.add
except AttributeError: pass
try: update = container.update
except AttributeError: pass
else: return lambda xy: update((xy,))

raise TypeError, "can't insert-at-end in %s' % type(container)

and then of course the equivalent of your snippet above is

insert_at_end(result)(3)

[[ You'll never manage to get 'assign to a function call' into Python,
but the syntax sugar of a call is pretty reasonable here anyhoo. ]]

Of course, if result is a dict this will fail (you'd need a 2-item
tuple, or something that can pass for it in a dim light, as the
argument), just like your C++ snippet will fail if result is a std::map
(you'd need an std::pair as the RHS) -- though C++ will fail at compile
time, and Python at runtime, as usual.

Not when the semantics are too different, as in the case of S and V.


Ok, that's something I don't agree ... for me there is a common semantic
between S and V which is "storing values in a linear container". After
that, you can put the properties you need in that container, but if I
need to write an algorithm whose only assumption is "a linear container"
I want to be able to use whatever linear container I have ... and I
don't want to reimplement it for each.


So write that insert_at_end function -- it IS pretty simple, after all.
Put it in your myneatutils.py module and import it everywhere. Or put
it into a module that you call std.py instead, so the parallel with C++
will be even clearer -- std::inserter vs std.insert_at_end, etc.

Faking that you can insert at any point for a container in which you
CAN'T is extremely dubious, btw. Even the _at_end part of the name of
the function I suggest may be going overboard. After all, any algorithm
that depends on the common semantics between S and V had better NOT
believe anything it inserts does go 'at end', so maybe a name such as
'insert_wherever_it_is_most_convenient_for_you' might be better
(although a little bit verbose).

Well, you surely didn't get operator[] on C++'s std::set<...>s, that's
for sure...


Well, no ... but it's not needed ^_^ And if I really need it you have
the "advance" template function that do the __getitem__ stuff in the
most efficient way, whatever your _linear_ container is.


Again, that's obviously trivial to write for Python, since Python
doesn't have random access iterators, so the most efficient way for any
container c to get an iterator starting at its 5th item will be to start
with iter(c) and step forward 5 times (with itertools.islice)...;-)
(Well, with a list you could slice -- if you never needed to write into
the original list, which, as I mentioned, Python has no iterator kind
for doing, anyway; if you want to prefer this solution, try/except is
just friend -- feature-testing, basically).

Again, a reasonable complaint (if you can convince Raymond Hettinger,
basically) is that itertools.islice should be able to cooperate
specially with some privileged iterators (builtin ones only, or more
ambitiously, user-coded ones too), like itertools.tee should, again by
feature-testing. Essentially an implementation issue as long as you
stick to builtins, becomes a design one if you want usercoded ones too.

So I hope I made my point clearer this time ^_^ I used Python for a bit
Sure! If you had whined about the povery of kinds of iterators that
Python supplies, rather than (as per subject) about the interface of the
set classes, it would have been quite clear from the start.

lists, sets and dictionnary. And I don't really see any other way to do
that (but to add methods to the sets ... and I don't want to do so).
No need: the Python equivalent of C++'s specializing of templates (just
like that of C++'s overloading) is feature-testing (or type-testing, but
generally that's _not_ warranted -- feature-testing allows user-coded
types to cooperate just as well!). So, the above-exemplified function
is the Python equivalent of a C++ template function with
specializations. Of course Python does more at runtime that C++ does at
compiletime -- but that applies in several other ways, anyhow;-).

Ok, so how do you write an algorithm on a container without specifying
which, knowing that you cannot use the iterators, neither the signature
of the class, nor the derivation ?
You encapsulate the feature-testing (looking for suitable methods to
either return directly, or wrap in inner functions -- not necessarily
lambdas, an inner 'def' will often be preferable -- to return) into
functions, once -- equivalent to a C++ template functions
w/specializations. Then you write your algorithms in terms of those
functions, just like, in C++, you'd write them in terms of template
functions (either coded by yourself, or taken from some library coded by
others, such as the excellent Boost people).
That was all the methods I imagined for Python's genericity. But none of
them work when you see the base containers.


"The signature of the class" (or, rather, some specific methods thereof)
is really all you need. But it's best to push that detail into a
function, as above. Should you ever need to optimize away the process
for some given types, it's easy, too:

at_end_inserters_by_type = {
list: list.append,
collections.deque: collections.deque.append,
set: set.add,
}
def insert_at_end(container):
m = at_end_inserters_by_type.get(type(container))
if m is not None: return m.__get__(container)
...same as before...

Note that I have kept the helper dictionary exposed so it's easy to add
optimized customizations, just as in C++ it's easy to add template
specializations. Not just for user-coded types, but also for existing
ones which this particular utility module isn't optimizing for, e.g.:

import std
import array
std.at_end_inserters_by_type[array.array] = array.array.append

and the like (assuming std.py is the module where we have insert_at_end
and its supporting optimizations-helper-dictionary).

If you have several such functions needing helper dictionaries, you may
make a dictionary of dictionaries, keyed first by the function, and
maybe expose a helper function to register optimization-specializations
for a given function, rather than let user-level code deal with directly
indexing dictionaries. Then the above specialization might become:

std.specialize(insert_at_end, array.array, array.array.append)

As long as, after checking the specializations, the function continues
with feature-testing as before, the specializations may only be needed
for performance, or for existing types (maybe from some library) that
supply the needed functionality but need interface adaptation, basically
like in C++.
Alex
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Anon Email | last post by:
Hi people, I'm learning about header files in C++. The following is code from Bartosz Milewski: // Code const int maxStack = 16; class IStack
26
by: Marius Horak | last post by:
As in subject. Thanks MH
20
by: Ole Hanson | last post by:
I am accessing my database through an interface, to allow future substitution of the physical datastore - hence I would like to declare in my Interface that my DAL-objects implementing the...
9
by: phl | last post by:
hi, I am kind of confused aobut interfaces and abstract classes. In short as I understand it, an interface is like a contract between the class and the interface, so that certain funtions must...
10
by: Brett | last post by:
I'm still trying to figure out concrete reasons to use one over the other. I understand the abstract class can have implementation in its methods and derived classes can only inherit one abstract...
7
by: Brett | last post by:
I have created an interface with five properties and five methods. Five classes are accessed through this interface and contain the implementation. This setup works fine now. All the classes are...
15
by: mr.peteryu | last post by:
Hi, Can someone explain the idea behind casting to an interface? For example: -> I have an IInterface that contains a Read() method. -> I have an object "obj" that implements IInterface. ...
20
by: Luc Kumps | last post by:
(Sorry about the previous post, it got transmitted before it was complete) We try to separate implementation and interface defintions, but we run into a problem. I hope the guru's can solve this,...
15
by: Xah Lee | last post by:
On Java's Interface Xah Lee, 20050223 In Java the language, there's this a keyword “interfaceâ€. In a functional language, a function can be specified by its name and parameter specs....
9
by: msbs1984 | last post by:
Difference Between Interface and Abstract Class?
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.