Bytes | Software Development & Data Engineering Community

Looping through a file a block of text at a time not by line

Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

import re

file = open('hotels.xml')
block = file.read(6000)
newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
print newblock

I cannot use readlines because the file is 138MB all on one line.

Suggestions?

-Rosario

Jun 14 '06 #1

Rosario Morgan wrote:
> I need to loop through a file 6000 bytes at a time. [...]

There's probably a more terse way to do this, but this seems to work:

import os

offset = 0
grab_size = 6000
file_size = os.stat('hotels.xml')[6]  # index 6 of the stat result is st_size (bytes)
f = open('hotels.xml', 'r')

while offset < file_size:
    f.seek(offset)
    data_block = f.read(grab_size)
    offset += grab_size
    print data_block
f.close()
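A side note on the loop above: the `f.seek(offset)` call is redundant, because each `read()` already advances the file position by the number of bytes it returned, and a read near the end simply comes back short. A quick Python 3 check, with `io.BytesIO` standing in for the real file and made-up sample data:

```python
import io

# In-memory stand-in for a file opened in binary mode (made-up data)
f = io.BytesIO(b"0123456789abcde")

first = f.read(6)
pos_after_first = f.tell()  # 6: read() moved the position, no seek() needed
second = f.read(6)          # continues where the first read stopped
third = f.read(6)           # only 3 bytes left, so this read comes back short
fourth = f.read(6)          # at end of file read() returns b""
print(first, second, third, fourth, pos_after_first)
```

Since the empty result at end of file is a reliable stop signal, tracking the file size separately is not needed either.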

Jun 14 '06 #2
Rune Strand wrote:
> Probably a more terse way to do this, but this seems to work
> import os
>
> offset = 0
> grab_size = 6000
> file_size = os.stat('hotels.xml')[6]

Ouch. Why not just loop until f.read returns an empty string?

> f = open('hotels.xml', 'r')
>
> while offset < file_size:
>     f.seek(offset)
>     data_block = f.read(grab_size)
>     offset += grab_size
>     print data_block
> f.close()


Here's a shorter and more reliable version:

f = open(filename)
for block in iter(lambda: f.read(6000), ""):
    pass  # ... process block

Here's the terse version:

for block in iter(lambda f=open(filename): f.read(6000), ""): ...
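The same two-argument `iter(callable, sentinel)` idiom works in Python 3, with one catch: if the file is opened in binary mode (`'rb'`), `read()` returns `bytes`, so the sentinel must be `b""`. With `""` the sentinel never compares equal, and the loop would keep yielding empty `b""` blocks forever. A minimal sketch using `functools.partial` in place of the lambda; `iter_blocks` is a made-up helper name:

```python
import functools
import io


def iter_blocks(f, size=6000):
    """Return an iterator over successive blocks of at most `size` bytes.

    iter(callable, sentinel) keeps calling f.read(size) until it returns
    the sentinel; for a binary file that is b"" (end of file).
    """
    return iter(functools.partial(f.read, size), b"")


# Usage with an in-memory file; for the real one use open('hotels.xml', 'rb')
data = io.BytesIO(b"x" * 15000)
sizes = [len(block) for block in iter_blocks(data)]
print(sizes)  # [6000, 6000, 3000]
```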

:::

What happens if a <Rate> element straddles the border between two 6000-byte
blocks, btw?
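One way to deal with that, sketched in Python 3 under the assumption that every `Rate` element has the empty `<Rate ...></Rate>` form the regex targets: after substituting in each block, hold back everything from the first leftover `<Rate` (which can then only be an element cut off by the block boundary) and prepend it to the next block. If no `<Rate` is left, the last four bytes are still held back in case they are the start of one, such as `<Ra`. `strip_rates`, `RATE_RE` and `OPEN_TAG` are made-up names, and the file is read in binary mode so the pattern works on raw bytes with no decoding involved:

```python
import io
import re

# The pattern from the thread, compiled once. DOTALL lets ".*?" cross
# newlines (makes no difference for a file that is all on one line).
RATE_RE = re.compile(rb"<Rate.*?></Rate>", re.DOTALL)
OPEN_TAG = b"<Rate"


def strip_rates(src, dst, size=6000):
    """Copy binary stream src to dst, deleting <Rate ...></Rate> elements.

    Assumes every Rate element has the empty form the pattern targets.
    Anything that may be an element cut off at a block boundary is
    carried over and re-examined together with the next block.
    """
    carry = b""
    for block in iter(lambda: src.read(size), b""):
        buf = RATE_RE.sub(b"", carry + block)
        # Any "<Rate" still present had no "></Rate>" after it in buf,
        # so it is an element that continues in the next block
        start = buf.find(OPEN_TAG)
        if start == -1:
            # Hold back the last 4 bytes in case they begin "<Rate"
            start = max(len(buf) - (len(OPEN_TAG) - 1), 0)
        dst.write(buf[:start])
        carry = buf[start:]
    # At end of file nothing more can complete the carried-over text
    dst.write(carry)


# Usage with in-memory streams; the sample is made up
sample = b'<Hotel><Rate a="1"></Rate><Name>X</Name><Rate b="2"></Rate></Hotel>'
out = io.BytesIO()
strip_rates(io.BytesIO(sample), out, size=7)
print(out.getvalue())  # b'<Hotel><Name>X</Name></Hotel>'
```

For the real file, open `hotels.xml` with `'rb'` and an output file with `'wb'` and pass both in. If a `Rate` element did not have the assumed empty form, it would never match, and the carried-over text could grow until the end of the file.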

</F>

Jun 14 '06 #3
Rosario Morgan wrote:
> I need to loop through a file 6000 bytes at a time. I was going to
> use the following but do not know how to advance through the file 6000
> bytes at a time.

file = open('hotels.xml')
while True:
    block = file.read(6000)
    if not block:
        break
    do_something_with_block(block)

or:

block = file.read(6000)
while block:
    do_something_with_block(block)
    block = file.read(6000)
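Either of these loops can be packaged as a generator, so the read-until-empty logic is written once and callers just iterate over blocks. A minimal Python 3 sketch; `read_blocks` is a made-up name. Because it tests `if not block` rather than comparing against a specific sentinel, it stops correctly whether `read()` returns `str` (text mode) or `bytes` (binary mode):

```python
import io


def read_blocks(f, size=6000):
    """Yield successive blocks of at most `size` characters or bytes from f."""
    while True:
        block = f.read(size)
        if not block:  # empty str or empty bytes both mean end of file
            break
        yield block


# Usage with an in-memory text file standing in for open('hotels.xml')
blocks = list(read_blocks(io.StringIO("abcdefghij"), 4))
print(blocks)  # ['abcd', 'efgh', 'ij']
```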

> newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
Either you compile the regexp once and use the compiled regexp object:

exp = re.compile(r'<Rate.*?></Rate>')
# (...)
newblock = exp.sub('', block)

or you use a non-compiled regexp:

newblock = re.sub(r'<Rate.*?></Rate>','',block)
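Both forms produce the same result. The `?` in `.*?` is doing important work here: the non-greedy match stops at the first `></Rate>`, whereas a greedy `.*` would run on to the last one in the block and delete everything in between. A small Python 3 illustration on a made-up one-line sample:

```python
import re

# Made-up one-line sample with two empty Rate elements
sample = '<Hotel><Rate a="1"></Rate><Name>X</Name><Rate b="2"></Rate></Hotel>'

exp = re.compile(r'<Rate.*?></Rate>')  # compile once, reuse for every block
non_greedy = exp.sub('', sample)
uncompiled = re.sub(r'<Rate.*?></Rate>', '', sample)
greedy = re.sub(r'<Rate.*></Rate>', '', sample)

print(non_greedy)  # <Hotel><Name>X</Name></Hotel>
print(uncompiled)  # <Hotel><Name>X</Name></Hotel>
print(greedy)      # <Hotel></Hotel>
```

The `re` module also caches recently used patterns, so the uncompiled form is not recompiled on every call in practice; compiling explicitly mainly keeps the pattern in one named place.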

Here, the first solution may be better, since the same pattern is applied
to every block. Using a SAX parser may be an option too... (maybe overkill,
or maybe the RightThingToDo(tm), depending on the context...)
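For the SAX route, the standard library's `xml.sax.saxutils.XMLFilterBase` and `XMLGenerator` can be combined into a streaming filter that drops `Rate` elements, including anything nested inside them, without holding the whole 138MB document in memory and without any block-boundary problem. A minimal Python 3 sketch; `DropRates` and `strip_rates_sax` are made-up names, and it assumes the document parses as well-formed XML:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropRates(XMLFilterBase):
    """SAX filter that suppresses <Rate> elements and everything inside them."""

    def __init__(self, parent):
        super().__init__(parent)
        self._depth = 0  # > 0 while inside a Rate element

    def startElement(self, name, attrs):
        if name == "Rate" or self._depth:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_rates_sax(src, dst):
    """Parse XML from src (filename or file object), write it to text stream dst."""
    reader = DropRates(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(dst, encoding="utf-8"))
    reader.parse(src)


# Usage with in-memory streams; for the real file pass 'hotels.xml' as src
out = io.StringIO()
strip_rates_sax(io.BytesIO(b'<Hotel><Rate a="1"></Rate><Name>X</Name></Hotel>'), out)
print(out.getvalue())
```

Whether this is overkill depends on the data: it is slower than a regex pass, but it works on elements rather than on text patterns.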

> I cannot use readlines because the file is 138MB all on one line.


So much for the "XML is human readable and editable"....
--
bruno desthuilliers
Jun 14 '06 #4


Similar topics

8
by: kaptain kernel | last post by:
i've got a while loop thats iterating through a text file and pumping the contents into a database. the file is quite large (over 150mb). the looping causes my CPU load to race up to 100 per...
2
by: adpsimpson | last post by:
Hi, I have a file which I wish to read from C++. The file, created by another programme, contains both text and numbers, all as ascii (it's a .txt file). A sample of the file is shown below: <<...
7
by: martian | last post by:
Hi, I've a couple of questions regarding the processing of a big text file (16MB). 1) how does python handle: > for line in big_file: is big_file all read into memory or one line is read...
8
by: siliconwafer | last post by:
Hi All, If I open a binary file in text mode and use text functions to read it then will I be reading numbers as characters or actual values? What if I open a text file and read it using binary...
0
by: Lokkju | last post by:
I am pretty much lost here - I am trying to create a managed c++ wrapper for this dll, so that I can use it from c#/vb.net, however, it does not conform to any standard style of coding I have seen....
6
by: tomtown.net | last post by:
Hello I'm trying to get a single line removed from a text file using a search pattern (pretty simple: if line contains "NODE1") -> remove line). To achieve this I's like to operate with only the...
1
by: laredotornado | last post by:
Hi, I'm using PHP 4.4.4 on Apache 2 on Fedora Core 5. PHP was installed using Apache's apxs and the php library was installed to /usr/local/php. However, when I set my "error_reporting"...
22
by: Dököll | last post by:
Hiya, Partners! I have been into it for 12 hours straight this week-end, my son is very unhappy. Looks like I am getting pretty close but need your help, Again. I will post my first...
4
by: XpatienceX | last post by:
Hi, Can someone help me with this assignment, I am confused of what is needed to be done here. I am suposed to design a program that models a worm's behavior in the following scenario: A...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
