Limits on search length

I am trying to locate all lines in a suite of files with quoted strings of
particular lengths. A search pattern like r'".{15}"' finds 15-character
strings very nicely. But I have some very long ones, and a pattern like
r'".{272}"' fails miserably, even though I know I have at least one
272-character string.

In the short term, I can resort to locating the character positions of the
quotes, but this seemed like such an elegant solution I hate to see it not
work. The program is given below (sans imports), in case someone can spot
something I'm overlooking:

# Example usage: search.py *.txt \".{15}\"

filePattern = sys.argv[1]
searchPattern = sys.argv[2]
cpat = re.compile(searchPattern)

for fn in glob.glob(filePattern):
    f = open(fn, "r")
    if f:
        lineNumber = 0
        for line in f:
            lineNumber += 1
            m = cpat.search(line)
            if m is not None:
                print fn, "(", lineNumber, ")", line
        f.close()
--
Daryl Lee
Open the Present -- it's a Gift!

Oct 1 '07 #1
Daryl Lee <dl**@altaregos.com> writes:
> I am trying to locate all lines in a suite of files with quoted
> strings of particular lengths. A search pattern like r'".{15}"'
> finds 15-character strings very nicely. But I have some very long
> ones, and a pattern like r'".{272}"' fails miserably, even though I
> know I have at least one 272-character string.

It seems to work for me. Which version of Python are you using?

Here is how I tested it. First, I modified your program so that it
actually runs (sys and re imports were missing) and removed
unnecessary globbing and file opening:

import sys, re

searchPattern = sys.argv[1]
cpat = re.compile(searchPattern)

lineNumber = 0
for line in sys.stdin:
    lineNumber += 1
    m = cpat.search(line)
    if m is not None:
        print "(", lineNumber, ")", line

Now, create a file with three lines, each with a string of different
length:

$ printf '"%*s"\n' 271 >fl
$ printf '"%*s"\n' 272 >>fl
$ printf '"%*s"\n' 273 >>fl

And run the script:

$ python scriptfile '".{272}"' < fl
( 2 ) "[... 272 blanks]"

That looks correct to me.
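
The same check can be reproduced without a shell. This is a minimal sketch, assuming Python 3 and building the three test lines in memory instead of in a file; the names `lines` and `hits` are illustrative and not part of the original script:

```python
import re

# Three quoted strings of 271, 272 and 273 blanks, one per line,
# mirroring the printf-generated file above.
lines = ['"%s"\n' % (" " * n) for n in (271, 272, 273)]

cpat = re.compile(r'".{272}"')

# Collect the 1-based numbers of the lines the pattern matches.
hits = [i for i, line in enumerate(lines, 1) if cpat.search(line)]
print(hits)  # only the second line should be reported
```

If only line 2 shows up here, the regular expression engine itself is not the limiting factor.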
> In the short term, I can resort to locating the character positions
> of the quotes,

You can also catch all strings and only filter those of the length you
care about.
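
One way that idea might look, as a rough sketch assuming Python 3 and assuming the quoted strings never contain an embedded double quote. The names `QUOTED` and `strings_with_length` are made up for this illustration:

```python
import re

# Matches any double-quoted run that contains no embedded quote.
QUOTED = re.compile(r'"([^"]*)"')

def strings_with_length(line, wanted):
    """Return the quoted strings in `line` whose contents are `wanted` characters long."""
    return [s for s in QUOTED.findall(line) if len(s) == wanted]

sample = 'x = "abc" + "%s" + "de"' % ("z" * 272)
print(len(strings_with_length(sample, 272)))  # number of 272-character strings found
```

The length test is then ordinary Python, so it can be changed to a range or a threshold without touching the pattern.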
Oct 1 '07 #2
Since you are getting the regular expression pattern via an argument I
would first check that searchPattern is what you expect. Shells can do
funny things with arguments containing special characters. Also, is
it possible that the quoted strings in the files contain escapes? For
example if a file contains the text "hello\n" would you consider that
6 characters or 7?
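
To make the escape question concrete, here is a small sketch (assuming Python 3) of what the regular expression sees when a file literally contains a backslash followed by the letter n. The variable names are illustrative only:

```python
import re

# The text as it sits in the file: a quote, h-e-l-l-o, a backslash,
# the letter n, and a closing quote. No real newline is involved.
raw = r'"hello\n"'

inner = raw[1:-1]
print(len(inner))  # counts the backslash and the n separately

# The pattern counts raw characters, not "logical" ones.
print(bool(re.search(r'".{6}"', raw)))
print(bool(re.search(r'".{7}"', raw)))
```

So a string that looks like 272 characters once its escapes are interpreted may be longer in the file, and a fixed-count pattern would then miss it.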

Oct 1 '07 #3
On Oct 1, 6:16 pm, Daryl Lee <d...@altaregos.com> wrote:
> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]
> cpat = re.compile(searchPattern)

Most shells will expand *.txt to the list of files that match, so
you'll end up with the first .txt file as your 'filePattern', and the
second as the regexp. Could that be it?
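
One way to sidestep the expansion problem is to put the pattern first and treat every remaining argument as a file. The sketch below assumes Python 3; `split_args` and the "pattern first" calling convention are hypothetical, not something the original script does:

```python
import glob

def split_args(argv):
    """Split argv into (search pattern, list of file names).

    Hypothetical convention: pattern first, file arguments after it.
    """
    if len(argv) < 3:
        raise SystemExit("usage: search.py PATTERN FILE [FILE ...]")
    pattern = argv[1]
    files = []
    for arg in argv[2:]:
        # glob.glob returns [] when nothing matches; keep the literal name then.
        files.extend(glob.glob(arg) or [arg])
    return pattern, files

print(split_args(["search.py", '".{15}"', "a.txt", "b.txt"]))
```

This works whether the shell expands *.txt itself or passes it through unexpanded, because the pattern's position no longer depends on how many files matched.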

--
Paul Hankin

Oct 1 '07 #4
On Oct 2, 3:16 am, Daryl Lee <d...@altaregos.com> wrote:
> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
>
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]

1. Learn an elementary debugging technique called "print the input".

print "pattern is", repr(searchPattern)

2. Fix your regular expression:
>>> import re
>>> patt = r'".{15}"'
>>> patt
'".{15}"'
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
<_sre.SRE_Match object at 0x00B96950>
>>> o.group()
'"1234567" "12345"' ########## whoops ##########
>>>
>>> patt = r'"[^"]{15}"' # exclude quotes; a non-greedy ? cannot help with a fixed count
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
>>> o.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

3. Try building scripts from small TESTED parts e.g. in this case
write a function to find all quoted strings of length n inside a given
string. If you do that, you will KNOW there is no limit that stops you
finding a string of length 272, and you can then look for your error
elsewhere.
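
A small, separately testable function of the kind described in point 3 might look like the sketch below. It assumes Python 3, and `quoted_strings_of_length` is a name invented for this example:

```python
import re

def quoted_strings_of_length(text, n):
    """Return every double-quoted string in `text` whose contents are exactly n characters."""
    # [^"] keeps the match from spanning two adjacent quoted strings.
    pattern = re.compile(r'"([^"]{%d})"' % n)
    return pattern.findall(text)

long_string = "x" * 272
print(len(quoted_strings_of_length('a "%s" b' % long_string, 272)))
```

Note that a regular expression like this cannot tell an opening quote from a closing one, so text sitting between two quoted strings can still be reported if it happens to have the requested length.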

HTH,
John

Oct 2 '07 #5
