regex into str


I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match

A /= 'b.','X',1 # replace first occurrence of regex 'b.'
# in A with 'X'
A /= 'b.','X' # replace all occurrences of regex 'b.'
# in A with 'X'

This works fine if I create a class derived from 'str' and put
in the right functions. I have a demonstration below.

But what I really want is to insert these functions into class
'str' itself, so I can use them on ordinary strings:

import re

def __div__(self, regex):
    p = re.compile(regex)
    self.__sre__ = p.search(self)
    return str(self.__sre__.group())

setattr(str, '__div__', __div__)

But when I try this I get:

TypeError: can't set attributes of built-in/extension type 'str'

Is there a way to get this done?


Working example:
#!/usr/bin/env python

import re

class Mystr(str):
    def __div__(self, regex):
        p = re.compile(regex)
        self.sre = p.search(self)
        return Mystr(self.sre.group())

    def __idiv__(self, tpl):
        try:
            regex, repl, count = tpl
        except ValueError:
            regex, repl = tpl
            count = 0
        p = re.compile(regex)
        return Mystr(p.sub(repl, self, count))

    def __call__(self, g):
        return self.sre.group(g)

if __name__ == '__main__':
    a = Mystr('abcdebfghbij')
    print "a :", a

    print "Match a / 'b(..)(..)' :",
    print a / 'b(..)(..)' # find match

    print "a[0], a[1], a[2] :",
    print a[0], a[1], a[2] # print letters from string

    print "a(0), a(1), a(2) :",
    print a(0), a(1), a(2) # print matches

    print "a :", a

    a /= 'b.', 'X', 1 # find and replace once
    print "a :", a

    a /= 'b.', 'X' # find and replace all
    print "a :", a
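
Editor's note: the example above is Python 2 (print statements, __div__, __idiv__). For readers on Python 3, here is a rough port, offered only as a sketch. It assumes Python 3, where the / operator dispatches to __truediv__ and /= to __itruediv__. The ValueError raised on a failed search is an addition for illustration and is not in the original code.

```python
import re


class Mystr(str):
    """str subclass where `/` searches and `/=` substitutes (Python 3 sketch)."""

    def __truediv__(self, regex):
        # Python 3 maps `/` to __truediv__ rather than __div__.
        self.sre = re.search(regex, self)
        if self.sre is None:
            # Assumption for this sketch: fail loudly instead of
            # hitting an AttributeError on None.group().
            raise ValueError("no match for %r" % (regex,))
        return Mystr(self.sre.group())

    def __itruediv__(self, args):
        # Accepts (regex, repl) or (regex, repl, count); count=0 means "all".
        regex, repl, *rest = args
        count = rest[0] if rest else 0
        # Returns a NEW Mystr; `a /= ...` then rebinds the name `a` to it.
        return Mystr(re.sub(regex, repl, self, count=count))

    def __call__(self, group):
        # Group lookup on the most recent successful search.
        return self.sre.group(group)


a = Mystr('abcdebfghbij')
match = a / 'b(..)(..)'
print(match, a(0), a(1), a(2))   # bcdeb bcdeb cd eb
a /= 'b.', 'X', 1                # replace once
print(a)                         # aXdebfghbij
a /= 'b.', 'X'                   # replace all
print(a)                         # aXdeXghXj
```

Note that the match object lives on the particular Mystr instance that was searched. After a /= ..., the name a refers to a fresh object with no stored match.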

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #1
I wrote:
I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match


I meant:

A(0)
A(1)

While A[0] and A[1] should work like normal string indexing.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

The Halloween Documents: http://www.opensource.org/halloween/

Jul 18 '05 #2
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.

Jeff
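
Editor's note: a small sketch of the two obstacles described above, assuming CPython 3 (so the method name is __truediv__ rather than __div__). The helper name _search and the attribute name last_match are made up for illustration. The exact error messages vary between Python versions, so only the exception types are checked.

```python
import re


def _search(self, regex):
    # Hypothetical method we would like to install on str.
    return re.search(regex, self)


# Obstacle 1: the built-in type refuses new attributes.
patched = False
try:
    str.__truediv__ = _search
    patched = True
except TypeError as exc:
    print("cannot patch str:", type(exc).__name__)

# Obstacle 2: a plain str instance has no per-object storage,
# so there is nowhere to keep "the last match".
stored = False
text = "abcdebfghbij"
try:
    text.last_match = _search(text, "b.")
    stored = True
except AttributeError as exc:
    print("cannot store on str instance:", type(exc).__name__)
```

A subclass such as Mystr sidesteps both problems because its instances carry their own attribute dictionary, but that only helps for objects created through the subclass.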


Jul 18 '05 #3
Jeff Epler wrote:
This is intended to be impossible.

That is very annoying.
Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


This works:

a += 'x'

So this should too:

a /= 'x'


Is there a way to tell Python that '' should be something else
than str?

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #4
Peter Kleiweg wrote:
I want to use regular expressions with less typing. Like this:


Interesting example.

Send it to the Python Cookbook folks.

http://aspn.activestate.com/ASPN/Cookbook/Python

Istvan.
Jul 18 '05 #5
On Sun, 29 Aug 2004 04:08:57 +0200, Peter Kleiweg wrote:
This works:

a += 'x'
In the sense you mean, no it doesn't.

Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'
>>> a
'a'
>>> id(a)
1074037376
>>> a += 'b'
>>> a
'ab'
>>> id(a)
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.
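
Editor's note: id values differ from run to run, so here is a sketch of the same point using a second name for the object instead. The variable names are illustrative only. The list case is included purely for contrast with a mutable type.

```python
# Strings: += builds a new object and rebinds the name.
a = 'a'
alias = a            # second name for the same string object
a += 'b'
print(a, alias)      # ab a
print(a is alias)    # False

# Lists: += mutates the existing object in place.
xs = [1]
ys = xs              # second name for the same list object
xs += [2]
print(xs, ys)        # [1, 2] [1, 2]
print(xs is ys)      # True
```

The string that alias refers to never changes, whereas the list seen through ys does.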
Is there a way to tell Python that '' should be something else
than str?


No.
Jul 18 '05 #6
Jeremy Bowers wrote:
On Sun, 29 Aug 2004 04:08:57 +0200, Peter Kleiweg wrote:
This works:

a += 'x'
In the sense you mean, no it doesn't.


I mean in the sense it does, and in that sense it does.

Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'
>>> a
'a'
>>> id(a)
1074037376
>>> a += 'b'
>>> a
'ab'
>>> id(a)
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.


That is not relevant. What matters is that a += 'b' does what it
is supposed to do.

Is there a way to tell Python that '' should be something else
than str?


No.


Bummer.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #7
Jeff Epler wrote:
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....
--
Regards,

Diez B. Roggisch
Jul 18 '05 #8
Diez B. Roggisch wrote:
Jeff Epler wrote:
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....


Programming causes bugs. That's not a reason to disallow programming.

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #9
Hi,
Programming causes bugs. That's not a reason to disallow programming.


Well, with that attitude I suggest you start coding in assembler - all
freedom you can imagine, no rules. Every bit is subject to your personal
interpretation. Or C, which is basically assembler with more names and
curly braces.

But for some reason people started developing and using higher-level
languages that forbid certain techniques - and every time, somebody yelled
"I want to be free to do what I want" - Python has its very own case of
that with its whitespace-dependent block structure, which frequently causes
people confronted with it to reject Python as a language.

People started using higher-level languages because they actually _did_
decrease the number of problems programming caused - so the projects could
get more elaborate.

Don't get me wrong - there are a lot of decisions to be made in language
design, and lots of them are debatable - Python is no exception to that
rule. But as I said before - allowing builtins to be manipulated asks for
more trouble than it's worth. Imagine a len() that always returns 1 - no
matter what you feed it. Or _if_ you're allowed to change the constructors
of builtin types - then who is to decide which of the 5 different string
implementations in the various imported modules is the one to use?

The only thing you really need is a simple constructor for your undoubtedly
interesting and useful string-derived class. Overloading "" as the string
constructor isn't possible - for the simple reason that only a statically
typed language could distinguish the usage of the "classic" constructor from
your enhanced version.

So what you could do is modify the builtins __dict__ - that is possible -
to contain a new constructor s - then creating your strings is just

s('foo')

Which is only three chars more than usual string creation.
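
Editor's note: a minimal sketch of that suggestion, assuming Python 3, where the module is named builtins (in Python 2 it was __builtin__). The reduced Mystr class here is a stand-in for the original poster's class, and returning None on a failed search is a choice made for this sketch only.

```python
import builtins
import re


class Mystr(str):
    """Cut-down stand-in for the string subclass discussed in the thread."""

    def __truediv__(self, regex):
        match = re.search(regex, self)
        # Assumption for this sketch: None signals "no match".
        return Mystr(match.group()) if match else None


# Publish the constructor under the short name `s`. Name lookup falls
# back to the builtins namespace, so `s` resolves in every module
# without an import.
builtins.s = Mystr

print(s('abcdebfghbij') / 'b(..)')   # bcd
```

This adds a new name and leaves str itself untouched, so string literals and other libraries are unaffected. Any module that defines its own global s will shadow it.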

Another approach would be some macro mechanism - but Python doesn't have
such a facility built in - and I'm not aware of a widely adopted
third-party module/extension out in the wild.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #10
On Sun, 29 Aug 2004 12:58:57 +0200, Peter Kleiweg wrote:
That is not relevant. What matters is that a += 'b' does what it
is supposed to do.
If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.

Let me refresh your memory: You are arguing that you should be able to
apply a division operator to a string to apply a regex to it. When people
told you it was impossible, because strings were immutable, you said that
a += "b" did what you wanted. In context, this was clearly a claim that
strings are mutable, although that is a translation of your claim from
what you said here:

This works:

a += 'x'

So this should too:

a /= 'x'

which was in reply to
Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.


Therefore, I say again: Your example does not do what you are implicitly
claiming it does. Therefore it is not a counter example to Jeff Epler's
(correct) claim.

So when you say above "This works:", I'm saying, no, it doesn't, not in
the sense in which you were replying to Jeff. It is not mutating the string
originally referenced by a, you still can't do that, and your
attempted counterpoint has no force, no meaning. (Any other putative
meaning you would claim it had after the fact would simply be a
non sequitur, in context.)

Any successful attempt to mutate a string (in pure Python) would
constitute a serious bug in Python. (Any successful mutation by a C
extension would constitute a major, Python-breaking bug in that extension.)

Jul 18 '05 #11
Jeremy Bowers wrote:
On Sun, 29 Aug 2004 12:58:57 +0200, Peter Kleiweg wrote:
That is not relevant. What matters is that a += 'b' does what it
is supposed to do.


If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.


Yes, exactly as it is supposed to do.
--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #12
Peter Kleiweg <in*************@nl.invalid> wrote:
...
Is there a way to tell Python that '' should be something else
than str?


No.


Bummer.


I think you might be happier with Ruby -- beyond a number of trivia, the
big difference between the two, from my POV, is that in Ruby you can
alter built-ins, in Python you can't. Which is why I personally stick
with Python, but to anyone who mostly likes Python but believes he would
get better programs by modifying built-ins, I suggest Ruby.
Alex
Jul 18 '05 #13
