Hello,
I've been looking for an answer for a while, but so far I haven't been
able to turn anything up. Basically, I'd like to use re.finditer to
search a large file (or a file stream), but I haven't figured out how
to get finditer to work without loading the entire file into memory,
reading just one line at a time, or resorting to more complicated
buffering.
For example, say I do this:
cat a b c > blah
Then run this Python script:

    >>> import re
    >>> for m in re.finditer('\w+', buffer(file('blah'))):
    ...     print m.group()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: buffer object expected
Of course, this works fine, but it loads the file completely into
memory (right?):

    >>> for m in re.finditer('\w+', buffer(file('blah').read())):
    ...     print m.group()
    ...
    a
    b
    c
So, is there any way to do this?
Thanks,
-e
Ack, typo. What I meant was this:
    cat a b c > blah

    >>> import re
    >>> for m in re.finditer('\w+', file('blah')):
    ...     print m.group()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: buffer object expected
Of course, this works fine, but it loads the file completely into
memory (right?):

    >>> for m in re.finditer('\w+', file('blah').read()):
    ...     print m.group()
    ...
    a
    b
    c
The following example loads the file into memory only one line at a
time, so it should suit your purposes:

    >>> data = file("important.dat", "w")
    >>> data.write("this\nis\nimportant\ndata")
    >>> data.close()

now read it....

    >>> import re
    >>> data = file("important.dat", "r")
    >>> line = data.readline()
    >>> while line:
    ...     for x in re.finditer("\w+", line):
    ...         print x.group()
    ...     line = data.readline()
    ...
    this
    is
    important
    data
--
Daniel Bickett
dbickett at gmail.com http://heureusement.org/
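For readers on Python 3, where file() and the print statement no longer exist, the same line-at-a-time idea might look roughly like the sketch below. The helper name iter_line_matches and the in-memory sample are illustrative assumptions, not part of the original post; any iterable of lines (such as an open text file) should work in place of the StringIO object.

```python
import io
import re

def iter_line_matches(lines, pattern):
    """Yield every match of `pattern`, searching one line at a time.

    `lines` is any iterable of strings, e.g. an open text file.
    Only the current line is held in memory.
    """
    regex = re.compile(pattern)
    for line in lines:
        for m in regex.finditer(line):
            yield m.group()

# Hypothetical sample standing in for important.dat
sample = io.StringIO("this\nis\nimportant\ndata")
for word in iter_line_matches(sample, r"\w+"):
    print(word)
```

Because each line is searched on its own, a pattern that needs to match across a newline can never succeed with this approach.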
True, but it doesn't work with multiline regular expressions :(
-e
Is it not possible to wrap your loop within an outer loop that calls
file.read([size]) (or readline(), or readlines([size])), reading the
file a chunk at a time and then running your re on a per-chunk basis?
-ej
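A rough Python 3 sketch of the chunked idea follows. It is one possible reading of the suggestion, not code from the thread: the function name finditer_chunked and the parameters chunk_size and max_match are made up for illustration. It assumes every match is non-empty and that the regex never needs to see more than max_match characters past the start of a match; patterns that rely on anchors or lookbehind near a chunk boundary may behave differently than they would on the whole file.

```python
import io
import re

def finditer_chunked(stream, pattern, chunk_size=64 * 1024, max_match=4096):
    """Yield matched text from a text stream without reading it all at once.

    Assumptions (not from the thread): matches are non-empty, and the
    regex never looks more than `max_match` characters past the start
    of a match to decide where that match ends.
    """
    regex = re.compile(pattern)
    buf = ""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk
        # Matches starting at or after `safe` could still change when more
        # data arrives, so they wait for the next pass (or for end of file).
        safe = len(buf) if eof else max(len(buf) - max_match, 0)
        keep_from = safe
        for m in regex.finditer(buf):
            if m.start() >= safe:
                break
            yield m.group()
            keep_from = max(keep_from, m.end())
        # Drop everything already decided; keep only the undecided tail.
        buf = buf[keep_from:]

# Tiny chunks to show that a match spanning a chunk boundary still works.
text = io.StringIO("alpha beta\ngamma delta")
print(list(finditer_chunked(text, r"beta\ngamma", chunk_size=4, max_match=16)))
```

The buffer should stay near chunk_size + max_match characters as long as matches keep being found, so memory use no longer depends on the file size. The price is that max_match has to be chosen with the pattern in mind.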
Erick wrote:
> True, but it doesn't work with multiline regular expressions :(
If your intent is for the expression to traverse multiple lines (and
possibly match *across* multiple lines), then, as far as I know, you
have no choice but to load the whole file into memory.
--
Daniel Bickett
dbickett at gmail.com http://heureusement.org/
Erick wrote:
> Hello,
> I've been looking for a while for an answer, but so far I haven't been
> able to turn anything up yet. Basically, what I'd like to do is to use
> re.finditer to search a large file (or a file stream), but I haven't
> figured out how to get finditer to work without loading the entire file
> into memory, or just reading one line at a time (or more complicated
> buffering).
Can you use mmap? http://docs.python.org/lib/module-mmap.html
"You can use mmap objects in most places where strings are expected; for
example, you can use the re module to search through a memory-mapped file."
Seems applicable, and it should keep your memory use down, but I'm not
very experienced with it...
Steve
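As a minimal sketch of the mmap suggestion in Python 3 (the thread itself predates it): the helper name iter_matches_mmap and the temporary sample file are assumptions for illustration. In Python 3 a memory-mapped file behaves like bytes, so the pattern has to be a bytes pattern, and an empty file cannot be mapped.

```python
import mmap
import os
import re
import tempfile

def iter_matches_mmap(path, pattern):
    """Yield each match of a bytes `pattern` in the file at `path`.

    The file is memory-mapped read-only, so the OS pages data in as the
    regex engine touches it instead of Python reading it all up front.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError if it is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in re.finditer(pattern, mm):
                yield m.group()

# Hypothetical sample file standing in for 'blah'
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a b\nc\n")

print(list(iter_matches_mmap(tmp.name, rb"\w+")))
print(list(iter_matches_mmap(tmp.name, rb"b\nc")))  # a pattern spanning a newline
os.remove(tmp.name)
```

Unlike the line-by-line loop, the regex sees the file as one contiguous buffer, so multiline patterns work without any manual buffering.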