Bytes IT Community

regex into str

P: n/a

I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match

A /= 'b.','X',1 # replace first occurrence of regex 'b.'
                # in A with 'X'
A /= 'b.','X'   # replace all occurrences of regex 'b.'
                # in A with 'X'

This works fine if I create a class derived from 'str' and put
in the right functions. I have a demonstration below.

But what I really want is to insert these functions into class
'str' itself, so I can use them on ordinary strings:

def __div__(self, regex):
    p = re.compile(regex)
    self.__sre__ = p.search(self)
    return str(self.__sre__.group())

setattr(str, '__div__', __div__)

But when I try this I get:

TypeError: can't set attributes of built-in/extension type 'str'

Is there a way to get this done?
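
For reference, the restriction can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax, which is an assumption: the thread uses Python 2.3, where the division hook is `__div__` rather than `__truediv__`. The helper name `regex_search` is made up for illustration.

```python
import re

def regex_search(self, regex):
    """Return the text of the first match of regex in self."""
    return re.search(regex, self).group()

# Built-in types implemented in C reject attribute assignment on the class.
try:
    setattr(str, '__truediv__', regex_search)
    patched = True
except TypeError as exc:
    patched = False
    message = str(exc)  # exact wording differs between Python versions

# A subclass is an ordinary Python class, so the same assignment succeeds.
class Mystr(str):
    pass

Mystr.__truediv__ = regex_search
result = Mystr('abcdebfgh') / 'b.'
```

The subclass accepts the new operator, which is why the derived-class version works while the patch on `str` itself does not.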


Working example:
#!/usr/bin/env python

import re

class Mystr(str):
    def __div__(self, regex):
        p = re.compile(regex)
        self.sre = p.search(self)
        return Mystr(self.sre.group())

    def __idiv__(self, tpl):
        try:
            regex, repl, count = tpl
        except ValueError:
            regex, repl = tpl
            count = 0
        p = re.compile(regex)
        return Mystr(p.sub(repl, self, count))

    def __call__(self, g):
        return self.sre.group(g)

if __name__ == '__main__':
    a = Mystr('abcdebfghbij')
    print "a :", a

    print "Match a / 'b(..)(..)' :",
    print a / 'b(..)(..)'          # find match

    print "a[0], a[1], a[2] :",
    print a[0], a[1], a[2]         # print letters from string

    print "a(0), a(1), a(2) :",
    print a(0), a(1), a(2)         # print matches

    print "a :", a

    a /= 'b.', 'X', 1              # find and replace once
    print "a :", a

    a /= 'b.', 'X'                 # find and replace all
    print "a :", a

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #1
12 Replies


P: n/a
I wrote:
I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match


I meant:

A(0)
A(1)

While A[0] and A[1] should work like normal string indexing.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

The Halloween Documents: http://www.opensource.org/halloween/

Jul 18 '05 #2

P: n/a
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.
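
The "nowhere to store the last match" point can be illustrated with a short sketch. It is written in Python 3 syntax (an assumption, since the thread predates it), and the attribute name `last_match` and the class `Mystr` are only illustrative.

```python
import re

plain = 'abcdebfgh'
try:
    plain.last_match = None      # plain str instances have no __dict__
    stored_on_str = True
except AttributeError:
    stored_on_str = False

class Mystr(str):
    """Illustrative subclass; its instances get a __dict__ for extra state."""

a = Mystr('abcdebfgh')
a.last_match = re.search('b(..)', a)   # the state lives on this one object
group1 = a.last_match.group(1)

b = Mystr(a)                           # an equal string, but a separate object
has_match = hasattr(b, 'last_match')   # the stored match does not travel with the value
```

Even with a subclass, the match belongs to one particular object rather than to the string value, so two equal strings can disagree about what "the last match" was.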

Jeff


Jul 18 '05 #3

P: n/a
Jeff Epler wrote:
This is intended to be impossible.

That is very annoying.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


This works:

a += 'x'

So this should too:

a /= 'x'


Is there a way to tell Python that '' should be something else
than str?

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #4

P: n/a
Peter Kleiweg wrote:
I want to use regular expressions with less typing. Like this:


Interesting example.

Send it to the Python Cookbook folks.

http://aspn.activestate.com/ASPN/Cookbook/Python

Istvan.
Jul 18 '05 #5

P: n/a
On Sun, 29 Aug 2004 04:08:57 +0200, Peter Kleiweg wrote:
This works:

a += 'x'
In the sense you mean, no it doesn't.

Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'
>>> a
'a'
>>> id(a)
1074037376
>>> a += 'b'
>>> a
'ab'
>>> id(a)
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.
Is there a way to tell Python that '' should be something else
than str?


No.
Jul 18 '05 #6

P: n/a
Jeremy Bowers wrote:
On Sun, 29 Aug 2004 04:08:57 +0200, Peter Kleiweg wrote:
This works:

a += 'x'
In the sense you mean, no it doesn't.


I mean in the sense it does, and in that sense it does.

Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'
>>> a
'a'
>>> id(a)
1074037376
>>> a += 'b'
>>> a
'ab'
>>> id(a)
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.


That is not relevant. What matters is that a += 'b' does what it
is supposed to do.

Is there a way to tell Python that '' should be something else
than str?


No.


Bummer.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #7

P: n/a
Jeff Epler wrote:
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....
--
Regards,

Diez B. Roggisch
Jul 18 '05 #8

P: n/a
Diez B. Roggisch wrote:
Jeff Epler wrote:
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....


Programming causes bugs. That's not a reason to disallow programming.

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #9

P: n/a
Hi,
Programming causes bugs. That's not a reason to disallow programming.


Well, with that attitude I suggest you start coding in assembler - all
freedom you can imagine, no rules. Every bit is subject to your personal
interpretation. Or C, which is basically assembler with more names and
curly braces.

But for some reason people started developing and using higher-level
languages that forbid certain techniques - and every time, somebody yelled
"I want to be free to do what I want". Python has its very own case of
that with its whitespace-dependent block structure, which frequently causes
people confronted with it to reject Python as a language.

People started using higher-level languages because they actually _did_
decrease the number of problems programming caused - so the projects could
get more elaborate.

Don't get me wrong - there are a lot of decisions to be made in language
design, and lots of them are debatable - Python is no exception to that
rule. But as I said before - allowing builtins to be manipulated asks for
more trouble than it's worth. Imagine a len() that always returns 1, no
matter what you feed it. Or, _if_ you were allowed to change the
constructors of builtin types, who is to decide which of the 5 different
string implementations in the various imported modules is the one to use?

The only thing you really need is a simple constructor for your undoubtedly
interesting and useful string-derived class. Overloading "" as the string
constructor isn't possible, for the simple reason that only a statically
typed language could distinguish the usage of the "classic" constructor
from your enhanced version.

So what you could do is modify the builtins __dict__ (that is possible)
so that it contains a new constructor s; then creating your strings is just

s('foo')

Which is only three chars more than usual string creation.
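
A rough sketch of that suggestion follows, in Python 3 syntax (an assumption: there the module is named `builtins`, while in the Python 2 of this thread it was `__builtin__`). The class body is a cut-down stand-in for the poster's `Mystr`, with a `None` result added for the no-match case.

```python
import builtins
import re

class Mystr(str):
    def __truediv__(self, regex):
        # Return the matched text, or None when the regex does not match.
        match = re.search(regex, self)
        return Mystr(match.group()) if match else None

# Publish the constructor under the short name `s` in the builtins namespace,
# which makes it visible from every module without an import.
builtins.s = Mystr

word = s('foo')
found = s('abcdebfgh') / 'b.'
```

Adding a name to the builtins namespace affects every module in the process, so a plain `from mymodule import Mystr as s` is usually the more contained choice.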

Another approach would be some macro mechanism - but Python doesn't have
such a facility built in - and I'm not aware of a widely adopted
3rd-party module/extension out in the wild.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #10

P: n/a
On Sun, 29 Aug 2004 12:58:57 +0200, Peter Kleiweg wrote:
That is not relevant. What matters is that a += 'b' does what it
is supposed to do.
If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.

Let me refresh your memory: You are arguing that you should be able to
apply a division operator to a string to apply a regex to it. When people
told you it was impossible, because strings were immutable, you said that
a += "b" did what you wanted. In context, this was clearly a claim that
strings are mutable, although that is a translation of your claim from
what you said here:

This works:

a += 'x'

So this should too:

a /= 'x'

which was in reply to
Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Therefore, I say again: Your example does not do what you are implicitly
claiming it does. Therefore it is not a counter example to Jeff Epler's
(correct) claim.

So when you say above "This works:", I'm saying, no, it doesn't, not in
the sense you were replying to Jeff. It is not mutating the string
originally referenced by a, you still can't do that, and your
attempted counterpoint has no force, no meaning. (Any other putative
meaning you would claim it had after the fact would simply be a
non sequitur, in context.)
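
The difference between rebinding a name and mutating an object can be seen by keeping a second reference around. This is a small illustrative sketch; the variable names are arbitrary.

```python
a = 'a'
alias = a            # second name for the same string object
a += 'b'             # builds a new string and rebinds the name `a`

same_object = a is alias   # False: `alias` still refers to the old string

# Contrast with a mutable type, where += changes the object in place.
items = ['a']
items_alias = items
items += ['b']       # list.__iadd__ extends the existing list
```

After this runs, `alias` is still `'a'`, while `items_alias` sees the appended element because both names refer to the same list.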

Any successful attempt to mutate a string (in pure Python) would
constitute a serious bug in Python. (Any successful mutation by a C
extension would constitute a major, Python-breaking bug in that extension.)

Jul 18 '05 #11

P: n/a
Jeremy Bowers wrote:
On Sun, 29 Aug 2004 12:58:57 +0200, Peter Kleiweg wrote:
That is not relevant. What matters is that a += 'b' does what it
is supposed to do.


If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.


Yes, exactly as it is supposed to do.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #12

P: n/a
Peter Kleiweg <in*************@nl.invalid> wrote:
...
Is there a way to tell Python that '' should be something else
than str?


No.


Bummer.


I think you might be happier with Ruby -- beyond a number of trivia, the
big difference between the two, from my POV, is that in Ruby you can
alter built-ins, in Python you can't. Which is why I personally stick
with Python, but to anyone who mostly likes Python but believes he would
get better programs by modifying built-ins, I suggest Ruby.
Alex
Jul 18 '05 #13
