Limits on search length

I am trying to locate all lines in a suite of files with quoted strings of
particular lengths. A search pattern like r'".{15}"' finds 15-character
strings very nicely. But I have some very long ones, and a pattern like
r'".{272}"' fails miserably, even though I know I have at least one
272-character string.

In the short term, I can resort to locating the character positions of the
quotes, but this seemed like such an elegant solution I hate to see it not
work. The program is given below (sans imports), in case someone can spot
something I'm overlooking:

# Example usage: search.py *.txt \".{15}\"

filePattern = sys.argv[1]
searchPattern = sys.argv[2]
cpat = re.compile(searchPattern)

for fn in glob.glob(filePattern):
    f = open(fn, "r")
    if f:
        lineNumber = 0
        for line in f:
            lineNumber += 1
            m = cpat.search(line)
            if m is not None:
                print fn, "(", lineNumber, ")", line
        f.close()
--
Daryl Lee
Open the Present -- it's a Gift!

Oct 1 '07 #1
Daryl Lee <dl**@altaregos.com> writes:

> I am trying to locate all lines in a suite of files with quoted
> strings of particular lengths. A search pattern like r'".{15}"'
> finds 15-character strings very nicely. But I have some very long
> ones, and a pattern like r'".{272}"' fails miserably, even though I
> know I have at least one 272-character string.

It seems to work for me. Which version of Python are you using?

Here is how I tested it. First, I modified your program so that it
actually runs (sys and re imports were missing) and removed
unnecessary globbing and file opening:

import sys, re

searchPattern = sys.argv[1]
cpat = re.compile(searchPattern)

lineNumber = 0
for line in sys.stdin:
    lineNumber += 1
    m = cpat.search(line)
    if m is not None:
        print "(", lineNumber, ")", line

Now, create a file with three lines, each with a string of different
length:

$ printf '"%*s"\n' 271 > fl
$ printf '"%*s"\n' 272 >> fl
$ printf '"%*s"\n' 273 >> fl

And run the script:

$ python scriptfile '".{272}"' < fl
( 2 ) "[... 272 blanks]"

That looks correct to me.
> In the short term, I can resort to locating the character positions
> of the quotes,

You can also catch all strings and only filter those of the length you
care about.
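
A minimal sketch of that "catch everything, then filter" idea, written for Python 3. The names QUOTED and quoted_strings_of_length are made up for illustration, and the sketch assumes the quoted strings never contain an escaped quote character:

```python
import re

# Matches any double-quoted string with no embedded quote characters;
# group 1 holds the text between the quotes.
QUOTED = re.compile(r'"([^"]*)"')

def quoted_strings_of_length(line, n):
    """Return the body of every quoted string in `line` that is exactly n characters long."""
    return [s for s in QUOTED.findall(line) if len(s) == n]

# One line holding a 3-character, a 272-character and a 5-character string.
sample = 'x = "abc" + "' + "y" * 272 + '" + "defgh"'
hits = quoted_strings_of_length(sample, 272)
print(len(hits), len(hits[0]))
```

Because the length check happens in Python rather than in the pattern, the same compiled expression can be reused for any length.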
Oct 1 '07 #2
Since you are getting the regular expression pattern via an argument, I
would first check that searchPattern is what you expect. Shells can do
funny things with arguments containing special characters. Also, is
it possible that the quoted strings in the files contain escapes? For
example, if a file contains the text "hello\n", would you consider that
6 characters or 7?
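
To make the escape question concrete, here is a small Python 3 sketch. The variable names are illustrative only; file_text stands in for a line read from one of the files, where the backslash and the n are two separate characters:

```python
import re

# Text as it would appear in a file: quote, h-e-l-l-o, backslash, n, quote.
file_text = r'"hello\n"'

# Strip the surrounding quotes and count what is left.
inner = file_text[1:-1]
print(repr(inner), len(inner))

# A pattern that counts the escape as one character does not match;
# one that counts it as two characters does.
print(bool(re.search(r'".{6}"', file_text)))
print(bool(re.search(r'".{7}"', file_text)))
```

So if the expected length was counted after interpreting escapes, the number in the pattern would be too small for the raw text in the file.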

Oct 1 '07 #3
On Oct 1, 6:16 pm, Daryl Lee <d...@altaregos.com> wrote:

> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]
> cpat = re.compile(searchPattern)

Most shells will expand *.txt to the list of files that match, so
you'll end up with the first .txt file as your 'filePattern', and the
second as the regexp. Could that be it?
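
One possible way around this, sketched in Python 3 under an assumed convention that is not in the original script: put the regex first and treat every remaining argument as a file name or wildcard. The helper name split_args and the file names a.txt and b.txt are hypothetical:

```python
import glob

def split_args(argv):
    """Split argv into (search_pattern, file_names).

    Assumed convention: the regex comes first, and everything after it
    is a file name or a wildcard the shell left unexpanded.
    """
    if len(argv) < 3:
        raise SystemExit("usage: search.py PATTERN FILE [FILE ...]")
    pattern = argv[1]
    files = []
    for arg in argv[2:]:
        # glob.glob returns [] when nothing matches; keep the literal name then.
        files.extend(glob.glob(arg) or [arg])
    return pattern, files

# What the original script sees once the shell has expanded *.txt:
expanded = ["search.py", "a.txt", "b.txt", '".{272}"']
print("old filePattern:", expanded[1], "| old searchPattern:", expanded[2])

# With the pattern first, the same arguments are split as intended:
print(split_args(["search.py", '".{272}"', "a.txt", "b.txt"]))
```

This works whether or not the shell expands the wildcard, since a name that is already expanded simply globs to itself.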

--
Paul Hankin

Oct 1 '07 #4
On Oct 2, 3:16 am, Daryl Lee <d...@altaregos.com> wrote:

> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
>
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]

1. Learn an elementary debugging technique called "print the input".

print "pattern is", repr(searchPattern)

2. Fix your regular expression:

>>> import re
>>> patt = r'".{15}"'
>>> patt
'".{15}"'
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
<_sre.SRE_Match object at 0x00B96950>
>>> o.group()
'"1234567" "12345"' ########## whoops ##########
>>>
>>> patt = r'"[^"]{15}"' # or use the non-greedy ? tag
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
>>> o.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

3. Try building scripts from small TESTED parts. In this case, for
example, write a function to find all quoted strings of length n inside
a given string. If you do that, you will KNOW there is no limit that
stops you finding a string of length 272, and you can then look for your
error elsewhere.
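
A minimal sketch of such a small tested part, in Python 3. The function name find_quoted and the test strings are illustrative, not taken from the original script:

```python
import re

def find_quoted(text, n):
    """Return every double-quoted string in `text` whose body is exactly n characters."""
    # [^"] stops a match from running across a closing quote.
    pattern = re.compile(r'"[^"]{%d}"' % n)
    return pattern.findall(text)

# The two cases from the interactive session above.
assert find_quoted('"123456789012345"', 15) == ['"123456789012345"']
assert find_quoted('"1234567" "12345"', 15) == []

# The length the original poster cares about.
long_line = 'prefix "' + "x" * 272 + '" suffix'
assert len(find_quoted(long_line, 272)) == 1
assert find_quoted(long_line, 271) == []
print("all checks passed")
```

Two caveats. This pattern can still match the text between the closing quote of one string and the opening quote of the next, so pairing the quotes first is safer on lines that hold several strings. And a non-greedy ? makes no difference after a fixed count such as {15}, so the [^"] character class is what actually fixes the overlong match.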

HTH,
John

Oct 2 '07 #5
