Limits on search length

I am trying to locate all lines in a suite of files with quoted strings of
particular lengths. A search pattern like r'".{15}"' finds 15-character
strings very nicely. But I have some very long ones, and a pattern like
r'".{272}"' fails miserably, even though I know I have at least one
272-character string.

In the short term, I can resort to locating the character positions of the
quotes, but this seemed like such an elegant solution I hate to see it not
work. The program is given below (sans imports), in case someone can spot
something I'm overlooking:

# Example usage: search.py *.txt \".{15}\"

filePattern = sys.argv[1]
searchPattern = sys.argv[2]
cpat = re.compile(searchPattern)

for fn in glob.glob(filePattern):
    f = open(fn, "r")
    if f:
        lineNumber = 0
        for line in f:
            lineNumber += 1
            m = cpat.search(line)
            if m is not None:
                print fn, "(", lineNumber, ")", line
        f.close()
--
Daryl Lee
Open the Present -- it's a Gift!

Oct 1 '07 #1
Daryl Lee <dl**@altaregos.com> writes:
> I am trying to locate all lines in a suite of files with quoted
> strings of particular lengths. A search pattern like r'".{15}"'
> finds 15-character strings very nicely. But I have some very long
> ones, and a pattern like r'".{272}"' fails miserably, even though I
> know I have at least one 272-character string.

It seems to work for me. Which version of Python are you using?

Here is how I tested it. First, I modified your program so that it
actually runs (sys and re imports were missing) and removed
unnecessary globbing and file opening:

import sys, re

searchPattern = sys.argv[1]
cpat = re.compile(searchPattern)

lineNumber = 0
for line in sys.stdin:
    lineNumber += 1
    m = cpat.search(line)
    if m is not None:
        print "(", lineNumber, ")", line

Now, create a file with three lines, each with a string of different
length:

$ printf '"%*s"\n' 271 >fl
$ printf '"%*s"\n' 272 >>fl
$ printf '"%*s"\n' 273 >>fl

And run the script:

$ python scriptfile '".{272}"' < fl
( 2 ) "[... 272 blanks]"

That looks correct to me.

> In the short term, I can resort to locating the character positions
> of the quotes,

You can also catch all strings and only filter those of the length you
care about.
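
A minimal sketch of that idea, written for Python 3 rather than the Python 2 used elsewhere in this thread. The names QUOTED and quoted_strings_of_length are made up for illustration, and it assumes the quoted strings contain no embedded or escaped quote characters:

import re

# Matches any double-quoted run that contains no embedded quote character.
QUOTED = re.compile(r'"([^"]*)"')

def quoted_strings_of_length(line, n):
    """Return the contents of every quoted string in `line` that is exactly n characters long."""
    return [s for s in QUOTED.findall(line) if len(s) == n]

# A line with three quoted strings of lengths 3, 272 and 2.
line = 'x = "abc" + "' + "y" * 272 + '" + "de"'
print([len(s) for s in quoted_strings_of_length(line, 272)])

Because the regex only has to find the quotes and the length is checked in plain Python, the same function handles any length without changing the pattern.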
Oct 1 '07 #2
Since you are getting the regular expression pattern via an argument I
would first check that searchPattern is what you expect. Shells can do
funny things with arguments containing special characters. Also, is
it possible that the quoted strings in the files contain escapes? For
example if a file contains the text "hello\n" would you consider that
6 characters or 7?
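
Both checks can be tried in a few lines. This is only a sketch in Python 3; show_pattern is a hypothetical helper, and the argv list below is an assumed example of what a shell might hand to the script:

import re

def show_pattern(argv):
    """Format the pattern argument with repr() so stray backslashes or quotes are visible."""
    return "pattern is %r" % (argv[2],)

# Hypothetical argv if the shell passed the backslashes through untouched.
print(show_pattern(["search.py", "*.txt", r'\".{15}\"']))

# A file containing the seven characters h e l l o \ n between quotes:
text_in_file = r'"hello\n"'
print(re.search(r'".{6}"', text_in_file))              # None: the backslash counts
print(re.search(r'".{7}"', text_in_file) is not None)  # True

So if the long strings contain escapes, the count that matters to the regex is the number of characters actually stored in the file.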

Oct 1 '07 #3
On Oct 1, 6:16 pm, Daryl Lee <d...@altaregos.com> wrote:
> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]
> cpat = re.compile(searchPattern)

Most shells will expand *.txt to the list of files that match, so
you'll end up with the first .txt file as your 'filePattern', and the
second as the regexp. Could that be it?
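
One way to sidestep this, sketched in Python 3, is to take the regexp first and treat every remaining argument as a file name, so it no longer matters how many names the shell produces. split_args and the file names below are hypothetical:

def split_args(argv):
    """Return (pattern, filenames): the regex comes first, every later argument is a file."""
    if len(argv) < 3:
        raise ValueError("usage: search.py PATTERN FILE [FILE ...]")
    return argv[1], argv[2:]

# Assumed argv after a Unix shell has expanded *.txt in
#   search.py '".{15}"' *.txt
argv = ["search.py", '".{15}"', "a.txt", "b.txt", "c.txt"]
pattern, filenames = split_args(argv)
print(repr(pattern))
print(filenames)

On a shell that does not expand wildcards, the script would still need glob.glob on each file argument.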

--
Paul Hankin

Oct 1 '07 #4
On Oct 2, 3:16 am, Daryl Lee <d...@altaregos.com> wrote:
> I am trying to locate all lines in a suite of files with quoted strings of
> particular lengths. A search pattern like r'".{15}"' finds 15-character
> strings very nicely. But I have some very long ones, and a pattern like
> r'".{272}"' fails miserably, even though I know I have at least one
> 272-character string.
>
> In the short term, I can resort to locating the character positions of the
> quotes, but this seemed like such an elegant solution I hate to see it not
> work. The program is given below (sans imports), in case someone can spot
> something I'm overlooking:
>
> # Example usage: search.py *.txt \".{15}\"
>
> filePattern = sys.argv[1]
> searchPattern = sys.argv[2]

1. Learn an elementary debugging technique called "print the input".

print "pattern is", repr(searchPattern)

2. Fix your regular expression:
>>> import re
>>> patt = r'".{15}"'
>>> patt
'".{15}"'
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
<_sre.SRE_Match object at 0x00B96950>
>>> o.group()
'"1234567" "12345"' ########## whoops ##########
>>>
>>> patt = r'"[^"]{15}"' # or use the non-greedy ? tag
>>> rx = re.compile(patt)
>>> o = rx.search('"123456789012345"'); o
<_sre.SRE_Match object at 0x00B96918>
>>> o.group()
'"123456789012345"'
>>> o = rx.search('"1234567" "12345"'); o
>>> o.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'NoneType' object has no attribute 'group'

3. Try building scripts from small TESTED parts e.g. in this case
write a function to find all quoted strings of length n inside a given
string. If you do that, you will KNOW there is no limit that stops you
finding a string of length 272, and you can then look for your error
elsewhere.
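
A small function along the lines of point 3 might look like the sketch below (Python 3; find_quoted is a made-up name). It wraps the corrected pattern from point 2 and uses a negated character class rather than a lazy quantifier, since with a fixed repeat count such as {15} laziness does not change what is matched. One known limitation: the pattern cannot tell an opening quote from a closing one, so it can also pick up text that sits between two adjacent strings.

import re

def find_quoted(text, n):
    """Return the contents of every run of exactly n non-quote characters enclosed in double quotes."""
    # [^"] keeps a match from running across a closing quote into the next string.
    pattern = re.compile(r'"([^"]{%d})"' % n)
    return pattern.findall(text)

# Small checks mirroring the interactive session above, plus the 272 case.
assert find_quoted('"123456789012345"', 15) == ['123456789012345']
assert find_quoted('"1234567" "12345"', 15) == []
assert find_quoted('"' + 'x' * 272 + '"', 272) == ['x' * 272]
print("all checks passed")

With these checks passing, a miss on a real file points at the input (the pattern the script actually received, or the file contents) rather than at the regex engine.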

HTH,
John

Oct 2 '07 #5
