streaming a file object through re.finditer

Hello,

I've been looking for a while for an answer, but so far I haven't been
able to turn anything up yet. Basically, what I'd like to do is to use
re.finditer to search a large file (or a file stream), but I haven't
figured out how to get finditer to work without loading the entire file
into memory, or just reading one line at a time (or more complicated
buffering).

For example, say I do this:

cat a b c > blah

Then run this Python script:

>>> import re
>>> for m in re.finditer('\w+', buffer(file('blah'))):
...     print m.group()
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: buffer object expected

Of course, this works fine, but it loads the file completely into
memory (right?):

>>> for m in re.finditer('\w+', buffer(file('blah').read())):
...     print m.group()
...
a
b
c

So, is there any way to do this?

Thanks,

-e

Jul 18 '05 #1
Ack, typo. What I meant was this:

cat a b c > blah

>>> import re
>>> for m in re.finditer('\w+', file('blah')):
...     print m.group()
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: buffer object expected

Of course, this works fine, but it loads the file completely into
memory (right?):

>>> for m in re.finditer('\w+', file('blah').read()):
...     print m.group()
...
a
b
c

Jul 18 '05 #2
The following example loads the file into memory only one line at a
time, so it should suit your purposes:

>>> data = file("important.dat", "w")
>>> data.write("this\nis\nimportant\ndata")
>>> data.close()

Now read it:

>>> import re
>>> data = file("important.dat", "r")
>>> line = data.readline()
>>> while line:
...     for x in re.finditer("\w+", line):
...         print x.group()
...     line = data.readline()
...
this
is
important
data

--
Daniel Bickett
dbickett at gmail.com
http://heureusement.org/
Jul 18 '05 #3
True, but it doesn't work with multiline regular expressions :(

-e

Jul 18 '05 #4
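
To illustrate the limitation, here is a minimal sketch in modern Python 3 (the thread itself uses Python 2). The sample text and the pattern are made up for illustration. A pattern that has to match across a newline finds its matches in the whole string, but never when the text is fed to re.finditer one line at a time:

```python
import re

# Hypothetical sample data and pattern, chosen only to show the effect.
text = "BEGIN\nEND\nBEGIN\nEND\n"
pattern = re.compile(r"BEGIN\nEND")

# Searching the whole string finds both two-line records.
whole = [m.group() for m in pattern.finditer(text)]

# Searching line by line finds nothing, because no single line
# contains both "BEGIN" and "END".
per_line = [
    m.group()
    for line in text.splitlines(keepends=True)
    for m in pattern.finditer(line)
]

print(len(whole), len(per_line))
```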

Is it not possible to wrap your loop below within a loop doing
file.read([size]) (or readline() or readlines([size])), reading the
file a chunk at a time and then running your re on a per-chunk basis?

-ej
"Erick" <id******@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
Ack, typo. What I meant was this:

cat a b c > blah

>>> import re
>>> for m in re.finditer('\w+', file('blah')):
...     print m.group()
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: buffer object expected

Of course, this works fine, but it loads the file completely into
memory (right?):

>>> for m in re.finditer('\w+', file('blah').read()):
...     print m.group()
...
a
b
c

Jul 18 '05 #5
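
One way the per-chunk idea might be made to work for matches that straddle a chunk boundary is to hold back the tail of each buffer until more data arrives. The following is only a sketch in modern Python 3, not something from the thread. The function name finditer_chunked and its parameters chunk_size and max_span are hypothetical. It assumes that a match attempt never needs to look at more than max_span characters (the longest match plus one character to see where it stops), that the pattern never matches the empty string, and that the pattern does not rely on lookbehind or anchors, which can behave differently at a buffer cut.

```python
import io
import re


def finditer_chunked(pattern, stream, chunk_size=64 * 1024, max_span=1024):
    """Yield (absolute_offset, matched_text) for matches found in `stream`.

    Hypothetical helper: assumes no match attempt looks at more than
    `max_span` characters and that the pattern never matches "".
    """
    regex = re.compile(pattern)
    buf = ""   # part of the stream that has not been fully searched yet
    base = 0   # absolute offset of buf[0] within the stream
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        # Matches starting before `safe` cannot be changed by later data.
        safe = len(buf) if at_eof else max(len(buf) - max_span, 0)
        consumed = 0
        for m in regex.finditer(buf):
            if m.start() >= safe:
                break  # too close to the end; retry once more data is read
            yield base + m.start(), m.group()
            consumed = m.end()
        if at_eof:
            return
        # Drop what has been fully searched and keep the uncertain tail.
        keep_from = max(consumed, safe)
        base += keep_from
        buf = buf[keep_from:]


# Usage with an in-memory stream standing in for a large file.
sample = "this\nis\nimportant\ndata\n"
found = list(
    finditer_chunked(r"important\ndata", io.StringIO(sample),
                     chunk_size=5, max_span=20)
)
print(found)
```

The trade-off is that memory use is bounded by roughly chunk_size plus max_span, but a match longer than max_span may be missed or truncated, so the bound has to be chosen with the pattern in mind.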
Erick wrote:
True, but it doesn't work with multiline regular expressions :(


If your intent is for the expression to traverse multiple lines (and
possibly match *across* multiple lines), then, as far as I know, you
have no choice but to load the whole file into memory.

--
Daniel Bickett
dbickett at gmail.com
http://heureusement.org/
Jul 18 '05 #6
Erick wrote:
Hello,

I've been looking for a while for an answer, but so far I haven't been
able to turn anything up yet. Basically, what I'd like to do is to use
re.finditer to search a large file (or a file stream), but I haven't
figured out how to get finditer to work without loading the entire file
into memory, or just reading one line at a time (or more complicated
buffering).


Can you use mmap?

http://docs.python.org/lib/module-mmap.html

"You can use mmap objects in most places where strings are expected; for
example, you can use the re module to search through a memory-mapped file."

Seems applicable, and it should keep your memory use down, but I'm not
very experienced with it...

Steve
Jul 18 '05 #7
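
A minimal sketch of the mmap suggestion, written for modern Python 3 rather than the Python 2 used in the thread. The temporary file and its contents are made up for the example. In Python 3 an mmap object exposes bytes, so the pattern has to be a bytes pattern. The operating system pages the file in on demand, so the whole file need not be read into the process up front, although the mapping still needs address space for the full file.

```python
import mmap
import os
import re
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"this\nis\nimportant\ndata")
    path = tmp.name

matches = []
with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # A multiline bytes pattern searched directly over the mapping.
        for m in re.finditer(rb"important\ndata", mm):
            matches.append((m.start(), m.group()))

os.remove(path)
print(matches)
```

Because the regular expression sees the mapping as one contiguous buffer, patterns that span several lines work without any manual chunking.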
