469,362 Members | 2,352 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,362 developers. It's quick & easy.

Characters contain themselves?


While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.See session:

system prompt% python
Python 2.3.5 (#2, Feb 9 2005, 00:38:15)
[GCC 3.3.5 (Debian 1:3.3.5-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
....
'a' is 'a' True 'a' in 'a' True 'a' in ['a'] True ....


Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.

Apr 7 '06 #1
14 1304
WENDUM Denis 47.76.11 (agent):
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.


No, strings contain characters. And 'a' is a string consisting of one
character.

"The items of a string are characters. There is no separate character
type; a character is represented by a string of one item."
http://docs.python.org/ref/types.html

(One item of what type, one might ask)

--
René Pijlman
Apr 7 '06 #2
gry
In fact, not just characters, but strings contain themselves:
'abc' in 'abc' True

This is a very nice(i.e. clear and concise) shortcut for:
'the rain in spain stays mainly'.find('rain') != -1

True

Which I always found contorted and awkward.

Could you be a bit more concrete about your complaint?

-- George
[Thanks, I did enjoy looking up the Axiom of Foundation!]

Apr 7 '06 #3
Rene Pijlman <re********************@my.address.is.invalid> writes:
WENDUM Denis 47.76.11 (agent):
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.


No, strings contain characters. And 'a' is a string consisting of one
character.

"The items of a string are characters. There is no separate character
type; a character is represented by a string of one item."
http://docs.python.org/ref/types.html

(One item of what type, one might ask)


Good point. ". . .represented by a string of length one" would be
better.
--
Mark Jackson - http://www.alumni.caltech.edu/~mjackson
An information system based on theory isolated from reality
is bound to fail. - Mitch Kabay
Apr 7 '06 #4
WENDUM Denis 47.76.11 (agent) a écrit :

While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.


There is *no* character type in Python. 'a' is a string, and of course
the string 'a' is a substring of string 'a'.
Apr 7 '06 #5
On Fri, 07 Apr 2006 15:50:53 +0200, WENDUM Denis 47.76.11 (agent) wrote:

While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.
A "character" is just a string of length 1.
[snip] Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)
Let me see if I understand you... you would like behaviour like this?
'a' in 'bbabb' True 'a' in 'bbab' True 'a' in 'bab' True 'a' in 'ab' True 'a' in 'a' False # not the real result

Ain't going to happen.

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.


I think that if you want to do work with sets, use sets, not strings.

In Python 2.3, you need:
import sets


In Python 2.4, Sets (note the initial capital) are a built-in.

--
Steven.

Apr 8 '06 #6
On Fri, 07 Apr 2006 15:50:53 +0200, WENDUM Denis 47.76.11 (agent) wrote:
But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.


Here is another suggestion.

If you absolutely must use strings rather than sets or lists, create a
custom string class:
class Mystr(str): .... def __contains__(self, other):
.... if self == other: return False
.... return super(Mystr, self).__contains__(other)
.... s = Mystr("hello world")
"x" in s False "w" in s True "hello" in s True "hello world" in s

False
Hope this helps.
--
Steven.

Apr 8 '06 #7

congratulations for (ostensibly) discovering the Barber's paradox (if
the village barber shaves all and only those who don't shave
tehmselves, who shaves the barber?
http://en.wikipedia.org/wiki/Barber_paradox) in python ! :-D

as far as i see it, you complaint is not just that any string X
contains itself but that string X can contain another string Y (i.e.
object of class string to contain another of class string) - where you
understand "contain" as per the operator "in" to be set-theory
operator, when in fact the meaning put for strings is instead "has a
substring".

therefore your grudge is not just with
'a' in 'a'
but also with
'a' in 'abcd'

here is excerpt from the reference manual:
----------------------------------------
The operators in and not in test for set membership. x in s evaluates
to true if x is a member of the set s, and false otherwise. x not in s
returns the negation of x in s. The set membership test has
traditionally been bound to sequences; an object is a member of a set
if the set is a sequence and contains an element equal to that object.
However, it is possible for an object to support membership tests
without being a sequence. In particular, dictionaries support
membership testing as a nicer way of spelling key in dict; other
mapping types may follow suit.

For the list and tuple types, x in y is true if and only if there
exists an index i such that x == y[i] is true.

For the Unicode and string types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Note, x and y
need not be the same type; consequently, u'ab' in 'abc' will return
True. Empty strings are always considered to be a substring of any
other string, so "" in "abc" will return True.
----------------------------------------

it is apparent "in" was overriden for strings for convenience's sake,
not to get freaky on the therory of sets.

what can you do about it? well, you can check for string type
specifically but there are no guarantees in life: someone else can
define new type with "in" that behaves like that: say "interval(x,y)",
where "interval(x,y) in interval(a,b)" checks if [x,y] is a
sub-interval of [a,b] - very intuitive - but there you have the problem
again!

or you can specifically check if the objects are from a "semanthically
supported group" of classes - but that will hamper authomatic extension
by introducing new types.

- Nas

WENDUM Denis 47.76.11 (agent) wrote:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.See session:
>>> 'a' is 'a' True >>> 'a' in 'a' True >>> 'a' in ['a'] True >>> ....


Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.


Apr 8 '06 #8
WENDUM Denis 47.76.11 (agent) wrote:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.See session:


Strings are sequences and this is a problem for recursive functions that
process nested lists. You do need to test for stringiness somehow. IIRC
the usual solutions are

- explicit test for string: isinstance(xx, basestring)

- explicit test for list: isinstance(xx, list) - this fails for
user-defined sequences that don't subclass list

- test for __iter__ attribute. IIUC this relies on an implementation
detail that strings don't define __iter__:

In [6]: hasattr('aa', '__iter__')
Out[6]: False

In [7]: hasattr([], '__iter__')
Out[7]: True

Kent
Apr 8 '06 #9
WENDUM Denis 47.76.11 (agent) wrote:

While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.


Note that the empty string is contained in all strings, including itself.
bool('') False '' in ''

True
Apr 10 '06 #10


gr*@ll.mit.edu wrote:
In fact, not just characters, but strings contain themselves:

'abc' in 'abc'
True

This is a very nice(i.e. clear and concise) shortcut for:

'the rain in spain stays mainly'.find('rain') != -1
True

Which I always found contorted and awkward.

Could you be a bit more concrete about your complaint?

-- George
[Thanks, I did enjoy looking up the Axiom of Foundation!] Here is the minimal context triggering an infinite descent in recursion.
From the answers I've got it seems I'll have to check if I'm iterating
on a string or on another kind of "list". python

Python 2.3.5 (#2, Feb 9 2005, 00:38:15)
[GCC 3.3.5 (Debian 1:3.3.5-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
import sys
sys.setrecursionlimit(10)
def f(List): .... try: # assuming List supports iteration
.... new=[f(item) for item in List]
.... except:# if no iteration is possible
.... new=1
.... return new
.... f([1, [2,3]]) # f substitutes each non iterable item by 1 (silly butshows the problem) [1, [1, 1]] f([1, [2.3, (5,6,7)]]) [1, [1, [1, 1, 1]]] f('bac') [[[[[[[[1]]]]]]], [[[[[[[1]]]]]]], [[[[[[[1]]]]]]]] sys.setrecursionlimit(5)
f('bac') [[[1]], [[1]], [[1]]] # each item in 'bac' is a character,

ie. a string of length one on wich one can iterate
and so triggers an infinite descent of recursion.
...

Apr 10 '06 #11
WENDUM Denis 47.76.11 (agent) wrote:
... From the answers I've got it seems I'll have to check if I'm
iterating on a string or on another kind of "list"....
import sys
sys.setrecursionlimit(10)
def f(List): ... try: # assuming List supports iteration
... new=[f(item) for item in List]
... except:# if no iteration is possible
... new=1
... return new f([1, [2,3]]) # f substitutes each non iterable item by 1 (silly
butshows the problem) [1, [1, 1]] f([1, [2.3, (5,6,7)]]) [1, [1, [1, 1, 1]]] f('bac') [[[[[[[[1]]]]]]], [[[[[[[1]]]]]]], [[[[[[[1]]]]]]]] sys.setrecursionlimit(5)
f('bac') [[[1]], [[1]], [[1]]] # each item in 'bac' is a character,

ie. a string of length one on wich one can iterate
and so triggers an infinite descent of recursion.


I'd rewrite your 'f' as follows:
def copier(List):
try: # assuming List supports iteration
new = [f(item) for item in List]
except TypeError: # Non-iterables get replaced
return '?'
except RuntimeError: # Recursion failures get shown
return 'Recursion'
return new

Note, even if you got your wish and somehow you couldn't have an issue
with characters in strings, you would have to deal with this:

simple_list = [3.1415]
container = [simple_list]
simple_list.append(container)

Your function (and mine) assumes the argument is a DAG (Directed Acyclic
Graph), but there is no such guarantee about data structures in python.

--Scott David Daniels
sc***********@acm.org
Apr 10 '06 #12
WENDUM Denis 47.76.11 (agent) wrote:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves. [snip] Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)


You could always use an "is-proper-subset-of" function, which is closer
to the intent of your algorithm. Using Jamitzky's very clever infix
recipe [1], you can even write it as an infix operator:

#Jamitzky's infix-operator class, abbreviated
class Infix:
def __init__(self, function):
self.function = function
def __ror__(self, other):
return Infix(lambda x, self=self, other=other:
self.function(other, x))
def __or__(self, other):
return self.function(other)
def __call__(self, value1, value2):
return self.function(value1, value2)

# define our is-proper-subset operator...
ips = Infix(lambda a, b: (a is not b) and (a in b))
print 'a' |ips| 'a' False print 'a' |ips| 'abc' True print '' |ips| ''

False

Of course, |ips| looks like Lips-L, which is odd; but you could pick a
better name. ;-)

Graham

[1] http://aspn.activestate.com/ASPN/Coo.../Recipe/384122

Apr 10 '06 #13
Graham Fawcett <gr************@gmail.com> wrote:
You could always use an "is-proper-subset-of" function, which is closer
to the intent of your algorithm. Using Jamitzky's very clever infix
recipe [1], you can even write it as an infix operator:

#Jamitzky's infix-operator class, abbreviated
class Infix:
[ ... ]

# define our is-proper-subset operator...
ips = Infix(lambda a, b: (a is not b) and (a in b))
print 'a' |ips| 'a'False print 'a' |ips| 'abc'True
Unfortunately:
print 'abc' |ips| 'abc' False print 'a'+'bc' |ips| 'abc' True

Which might not be what you want. On the other hand, it's a simple
fix:
ips = Infix(lambda a, b: (a != b) and (a in b))
print 'a' |ips| 'abc' True
print 'a'+'bc' |ips| 'abc'

False
[1] http://aspn.activestate.com/ASPN/Coo.../Recipe/384122


--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomež se bera eadward ofdun hlęddre heafdes bęce bump bump bump
Apr 11 '06 #14
Sion Arrowsmith wrote:
Unfortunately:
print 'a'+'bc' |ips| 'abc' True

Which might not be what you want. On the other hand, it's a simple
fix: ips = Infix(lambda a, b: (a != b) and (a in b))
print 'a'+'bc' |ips| 'abc'


Ah, good point.

Graham

Apr 11 '06 #15

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

35 posts views Thread by David Mathog | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by suresh191 | last post: by
1 post views Thread by Marylou17 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.