In trying to debug why a certain regex wasn't working like I expected
it to, I came across this strange (to me) behavior. The file I am
trying to match definitely contains many instances of the letter 'a',
so I would expect the regex
rgxPrev = re.compile('.*?a.*?')
to match the string contents of the file. But it doesn't. Here is
a complete example:
import re, urllib
rgxPrev = re.compile('.*?a.*?')
url = 'http://nitace.bsd.uchicago.edu:8080/files/share/showdown_example2.html'
s = urllib.urlopen(url).read()
m = rgxPrev.match(s)
print m
print s.find('a')
m is None (no match) and s.find('a') reports an 'a' at index 48.
I read the regex to mean non-greedy match of anything up to an a,
followed by non-greedy match of anything following an a, which this
file should match.
Or am I insane?
John Hunter
hunter:~/python/projects/poker/data/pokerroom> uname -a
Linux hunter.paradise.lost 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686
i686 i386 GNU/Linux
hunter:~/python/projects/poker/data/pokerroom> python
Python 2.3.2 (#1, Oct 13 2003, 11:33:15)
[GCC 3.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to rlcompleter2 0.95
for nice experiences hit <tab> multiple times
Maybe you meant:
import re, urllib
rgxPrev = re.compile('.*?a.*?')
url = 'http://nitace.bsd.uchicago.edu:8080/files/share/showdown_example2.html'
s = urllib.urlopen(url).read()
***m = match(rgxPrev,s)***
print m
print s.find('a')
match takes two arguments
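For what it's worth, the two spellings should be interchangeable: the match method of a compiled pattern takes the string alone, while the module-level re.match takes the pattern and then the string. A minimal sketch, assuming a made-up two-line string in place of the downloaded file (the real file contents are not shown in this thread):

```python
import re

# Hypothetical stand-in for the downloaded file: no 'a' on the first line.
s = "<html>\n<head>\n"

rgxPrev = re.compile('.*?a.*?')

# Method on the compiled pattern: takes the string only.
m_method = rgxPrev.match(s)

# Module-level function: takes the pattern (compiled or not), then the string.
m_function = re.match(rgxPrev, s)

print(m_method)
print(m_function)
print(s.find('a'))
```

On this sample both calls give the same result, so the number of arguments is not what decides whether the pattern matches.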
John Hunter wrote:
> rgxPrev = re.compile('.*?a.*?')
This is a bogus regex - a '*' means "zero or more occurrences" for the
expression to the left. '?' means "zero or one occurrence" for the exp to
the left. I'm not exactly sure why this is not working, but it's definitely
redundant. Eliminating the redundancy gives you this:
rgxPrev = re.compile('.*a.*')
Works perfect.
Regards,
Diez
On Tue, 09 Dec 2003 09:43:24 -0600,
John Hunter <jd******@ace.bsd.uchicago.edu> wrote:
> rgxPrev = re.compile('.*?a.*?')
'.' doesn't match newlines unless you specify the re.DOTALL / (?s) flag, so it
won't match unless 'a' is on the very first line. Add (?s) to your
expression, and it should work (though it'll be much slower than the .find()
method).
--amk
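A small sketch of the DOTALL point, assuming a made-up string whose first 'a' sits on the second line (the actual file is not reproduced here, so the sample text is hypothetical). It compares the original pattern, the greedy variant, and the two ways of switching on DOTALL:

```python
import re

# Hypothetical stand-in for the file: the first 'a' is on the second line.
s = "<html>\n<head><title>showdown</title></head>\n"

lazy = re.compile('.*?a.*?')                       # original pattern
greedy = re.compile('.*a.*')                       # greedy variant
lazy_inline = re.compile('(?s).*?a.*?')            # DOTALL via inline flag
lazy_flag = re.compile('.*?a.*?', re.DOTALL)       # DOTALL via compile flag

# Without DOTALL, '.' stops at the first newline, so match() fails for both.
print(lazy.match(s))
print(greedy.match(s))

# With DOTALL, '.' also matches '\n', so the match can reach the 'a'.
m = lazy_inline.match(s)
print(repr(m.group()))
print(lazy_flag.match(s).end())
```

On this sample the greedy pattern fails as well, which suggests the newline, not the trailing '?', is what gets in the way.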
"Diez B. Roggisch" wrote: John Hunter wrote:
>> rgxPrev = re.compile('.*?a.*?')
> This is a bogus regex - a '*' means "zero or more occurrences" for the expression to the left. '?' means "zero or one occurrence" for the exp to the left.
Not true. See http://www.python.org/doc/current/lib/re-syntax.html :
*?, +?, ??
The "*", "+", and "?" qualifiers are all greedy; they match as much text
as possible. .... Adding "?" after the qualifier makes it perform the match
in non-greedy or minimal fashion; as few characters as possible will be
matched. ....
-Peter
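To make the quoted passage concrete, here is a short sketch of greedy versus non-greedy matching. The sample text and the pattern are made up for illustration and do not come from the original post:

```python
import re

# Hypothetical sample text with two tag pairs.
text = "<b>one</b> and <b>two</b>"

# Greedy: '.*' grabs as much as possible, so the match runs to the last '>'.
greedy = re.match(r'<.*>', text)

# Non-greedy: '.*?' grabs as little as possible, so the match stops at the first '>'.
lazy = re.match(r'<.*?>', text)

print(greedy.group())
print(lazy.group())
```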
John Hunter wrote:
> rgxPrev = re.compile('.*?a.*?')
> to match the string contents of the file. But it doesn't. Here is
> [...]
> I read the regex to mean non-greedy match of anything up to an a, followed by non-greedy match of anything following an a, which this file should match.
There is a nice example where non-greedy regexes are really useful in A. M.
Kuchling's Regex Howto (http://www.amk.ca/python/howto/regex/regex.html)
> Or am I insane?
This may be off-topic, but the easiest if not fastest way to find multiple
occurrences of a string in a text is:
>>> import re
>>> r = re.compile("a")
>>> for m in r.finditer("abca\na"):
...     print(m.start())
...
0
3
5
Peter
>> This is a bogus regex - a '*' means "zero or more occurrences" for the expression to the left. '?' means "zero or one occurrence" for the exp to the left.
> Not true. See http://www.python.org/doc/current/lib/re-syntax.html :
> *?, +?, ??
> The "*", "+", and "?" qualifiers are all greedy; they match as much text as possible. .... Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. ....
Hmm. But if that's true, what does ".??" mean then? The first ? is not
greedy, so nothing is matched at all. The same is true for ".*?", and
".+?" is then equal to ".". So what makes this useful? The regex in question
definitely didn't work with it.
Diez
> So what makes this useful?
Ok - I just found out - it makes sense when taking into account what follows
in the regex, as that will be matched earlier. Neat - didn't know that such
things existed.
Diez
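A quick sketch of that observation, using a made-up sample string (not from the original post): on its own a non-greedy quantifier matches the minimum, but it is extended as far as needed once something after it has to match too.

```python
import re

text = "xxaxx"  # hypothetical sample

# Standing alone, the non-greedy forms match as little as they are allowed to.
alone_star = re.match(r'.*?', text).group()       # zero characters
alone_plus = re.match(r'.+?', text).group()       # exactly one character
alone_opt = re.match(r'.??', text).group()        # zero characters

# Followed by 'a', the same quantifiers expand until the 'a' can match.
star_then_a = re.match(r'.*?a', text).group()
plus_then_a = re.match(r'.+?a', text).group()

print(repr(alone_star), repr(alone_plus), repr(alone_opt))
print(repr(star_then_a), repr(plus_then_a))
```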
>>>>> "Peter" == Peter Otten <__*******@web.de> writes:
Peter> This may be off-topic, but the easiest if not fastest way
Peter> to find multiple occurrences of a string in a text is:
Right, I actually am using regex matching and not literal char
matching, but in trying to debug why my regex wasn't working, I
simplified it to the simplest case I could, which was a string
literal.
Thanks for the DOTALL pointer above.
JDH