Bytes IT Community

is for reliable?

Hi to all, I have a question about the for statement of Python. I have the
following piece of code, where cachefilesSet is a set that contains the
names of 1398 HTML files cached on my hard disk:
for fn in cachefilesSet:

    fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
    u = fObj.read()

    v = u.lower()
    rows = v.split('\x0a')

    contentType = ''

    for r in rows:
        if r.find('content-type') != -1:
            y = r.find(':')
            if y != -1:
                z = r.find(';', y)
                if z != -1:
                    contentType = r[y+1:z].strip()
                    cE = r[z+1:].strip()
                    characterEncoding = cE.strip('charset = ')
                else:
                    contenType = r[y+1:].strip()
                    characterEncoding = ''
            break

    if contentType == 'text/html':
        processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )

    fileCnt += 1
    if fileCnt % 100 == 0: print fileCnt

This code stops at the 473rd file instead of reaching 1398.

However, I changed the for loop and substituted it with a while loop in this way:

while cachefilesSet:
    fn = cachefilesSet.pop()
    .......
    .......

the while loop reaches the 1398th file and is some 3-4 times faster than
the for loop

How is this possible?
May 7 '07 #1
5 Replies


In <Bm***************@twister2.libero.it>, pa******@giochinternet.com
wrote:
> Hi to all I have a question about the for statement of python. I have the
> following piece of code where cachefilesSet is a set that contains the
> names of 1398 html files cached on my hard disk

[snipped code]

> this code stops at the 473th file instead of reaching 1398
>
> however I changed the for and substituted it with a while in this way
>
> while cachefilesSet:
>     fn = cachefilesSet.pop()
>     .......
>     .......
>
> the while loop reaches the 1398th file and is some 3-4 times faster than
> the for loop
>
> How is this possible?

Good question. ``for`` loops are of course reliable. Can you give a
short self contained example that shows the behavior?

Ciao,
Marc 'BlackJack' Rintsch
May 7 '07 #2

On 5/7/07, pa******@giochinternet.com <pa******@giochinternet.com> wrote:
> for fn in cachefilesSet:
> ....
> this code stops at the 473th file instead of reaching 1398

This is often caused by mutating the object you are iterating over
inside the for loop. I didn't see anything in the code you posted
that would do that, but you also didn't show us all of your code. Are
you doing anything that would change cachefilesSet inside your for
loop? If so, try looping over a copy of your set instead:

from copy import copy
for fn in copy(cachefilesSet):
    ...

--
Jerry
May 7 '07 #3
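
To illustrate the point made in the reply above, here is a minimal sketch (written for Python 3, not taken from the original poster's code) of what happens when a set is resized while a for loop is iterating over it, and how looping over a copy avoids the problem. The function names and file names are hypothetical. Whether this is what happened in the original program is unknown, since the full code was never posted.

```python
def remove_while_iterating(names):
    """Try to shrink a set inside a for loop over that same set."""
    try:
        for name in names:
            names.discard(name)  # changes the set's size mid-iteration
    except RuntimeError as exc:
        return str(exc)  # CPython reports that the set changed size
    return None


def remove_via_copy(names):
    """Loop over a snapshot, so the original set can be changed safely."""
    visited = 0
    for name in set(names):  # set(names) makes a shallow copy
        names.discard(name)
        visited += 1
    return visited


cache = {"a", "b", "c", "d", "e"}  # hypothetical file names
error = remove_while_iterating(set(cache))
visited = remove_via_copy(cache)
print(error is not None, visited, len(cache))
```

In this sketch the first function ends with a RuntimeError, while the second visits every element of the snapshot and leaves the original set empty.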

On May 7, 8:46 pm, "pablo...@giochinternet.com"
<pablo...@giochinternet.com> wrote:
> Hi to all I have a question about the for statement of python. I have the
> following piece of code where cachefilesSet is a set that contains the
> names of 1398 html files cached on my hard disk
>
> for fn in cachefilesSet:
>
>     fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
>     u = fObj.read()
>
>     v = u.lower()
>     rows = v.split('\x0a')
>
>     contentType = ''
>
>     for r in rows:
>         if r.find('content-type') != -1:
>             y = r.find(':')
>             if y != -1:
>                 z = r.find(';', y)
>                 if z != -1:
>                     contentType = r[y+1:z].strip()
>                     cE = r[z+1:].strip()
>                     characterEncoding = cE.strip('charset = ')
>                 else:
>                     contenType = r[y+1:].strip()
>                     characterEncoding = ''
>             break
>
>     if contentType == 'text/html':
>         processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )
>
>     fileCnt += 1
>     if fileCnt % 100 == 0: print fileCnt

[snip]

I'd like to point out what look like 2 errors in the code:

1. You have "contenType" instead of "contentType" in "contenType = r[y
+1:].strip()".

2. The string method "strip(...)" treats its string argument as a
_set_ of characters to strip, so "cE.strip('charset = ')" will strip
any leading and trailing "c", "h", "a", etc., which isn't what I think
you intended.

May 7 '07 #4
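
The second point in the reply above can be checked directly. The sketch below (Python 3) first shows `strip` mangling a charset value, then offers one possible way to split the header line with `str.partition` instead. The helper name `parse_content_type` and the sample header are hypothetical and are not part of the original poster's program.

```python
# strip() removes any leading/trailing characters found in its argument,
# so letters of the charset name itself can be eaten.
mangled = 'charset=shift_jis'.strip('charset = ')


def parse_content_type(header_line):
    """Split a 'Content-Type: type; charset=enc' line into (type, charset)."""
    _, _, value = header_line.partition(':')
    media_type, _, params = value.partition(';')
    charset = ''
    for param in params.split(';'):
        name, _, val = param.partition('=')
        if name.strip().lower() == 'charset':
            charset = val.strip().strip('"\'')  # drop optional quotes
    return media_type.strip().lower(), charset.lower()


parsed = parse_content_type('Content-Type: text/html; charset=Shift_JIS')
no_charset = parse_content_type('content-type: text/html')
print(repr(mangled), parsed, no_charset)
```

Here `mangled` comes out as `'ift_ji'` rather than `'shift_jis'`, while the partition-based helper returns the media type and charset intact, or an empty charset when none is given.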

On May 8, 5:46 am, "pablo...@giochinternet.com"
<pablo...@giochinternet.com> wrote:
> Hi to all I have a question about the for statement of python. I have the
> following piece of code where cachefilesSet is a set that contains the
> names of 1398 html files cached on my hard disk
>
> for fn in cachefilesSet:
>
>     fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
>     u = fObj.read()
>
>     v = u.lower()
>     rows = v.split('\x0a')
>
>     contentType = ''
>
>     for r in rows:
>         if r.find('content-type') != -1:
>             y = r.find(':')
>             if y != -1:
>                 z = r.find(';', y)
>                 if z != -1:

u, v, r, y, z .... are you serious?

>                     contentType = r[y+1:z].strip()
>                     cE = r[z+1:].strip()
>                     characterEncoding = cE.strip('charset = ')

Read the manual ... strip('charset = ') is NOT doing what you think it
is.

>                 else:
>                     contenType = r[y+1:].strip()

Do you mean contentType ?
Consider using pychecker and/or pylint.

>                     characterEncoding = ''
>             break
>
>     if contentType == 'text/html':
>         processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )

We don't have crystal balls -- what does processHTMLfile() do? Where
is "cardinalita" bound to a value?

>     fileCnt += 1
>     if fileCnt % 100 == 0: print fileCnt
>
> this code stops at the 473th file instead of reaching 1398

Sets are not ordered. There is no such thing as the 473rd element. I
presume you mean that you believe that your code processes only 473
elements

> however I changed the for and substituted it with a while in this way
>
> while cachefilesSet:
>     fn = cachefilesSet.pop()
>     .......
>     .......
>
> the while loop reaches the 1398th file and is some 3-4 times faster than
> the for loop
>
> How is this possible?

Given that you open each file and read it, the notion that processing
3 times as many files is 3-4 times faster is very hard to swallow
(even if you mean 3-4 times faster *per file*). Show us *all* of your
code (with appropriate counting and timing) and its output.

Something daft is happening in the part of the code that you haven't
shown us. E.g. you are deleting elements from cachefilesSet, or
perhaps even adding new entries. As a general rule, don't fiddle with
a container over which you are iterating. Using the
    while container:
        item = container.pop()
trick is not "iterating over".

Interestingly, 1398 is rather close to 3 times 473. Coincidence?

Have you tried your code on subsets of your 1398 element set? [This
concept is called "testing"] A subset of size 10 plus copious relevant
print statements in your code might show you what is happening.

HTH,
John

May 7 '07 #5
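
As a small illustration of the remark above that the `while container: item = container.pop()` idiom is not iteration, the following Python 3 sketch drains a set while a handler also changes it. Because the loop only asks "is the set non-empty?" on each pass, it tolerates the mutation silently and simply runs a different number of times. The names `drain` and `greedy_handler` and the ten-element test set are hypothetical, in the spirit of the suggestion to test on a small subset with counting.

```python
def drain(names, handler):
    """Pop items until the set is empty, counting how many were handled."""
    handled = 0
    while names:
        name = names.pop()
        handler(name, names)
        handled += 1
    return handled


def greedy_handler(name, names):
    """Hypothetical side effect: the handler removes one more entry itself."""
    if names:
        names.pop()


files = {f"page{i}" for i in range(10)}  # small subset for testing
original_size = len(files)
handled = drain(files, greedy_handler)
print(original_size, handled)
```

With ten entries and a handler that removes one extra entry per call, only five entries are actually handled, and no error is raised. Comparing the handled count against the original size is a cheap way to notice that something else is touching the container.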

Yes.

<pa******@giochinternet.com> wrote in message
news:Bm***************@twister2.libero.it...
| Hi to all I have a question about the for statement of python. I have the
| following piece of code where cachefilesSet is a set that contains the
| names of 1398 html files cached on my hard disk
[snip]|
| this code stops at the 473th file instead of reaching 1398

In what way does it stop? Exception and traceback?

| however I changed the for and substituted it with a while in this way
|
| while cachefilesSet:
|     fn = cachefilesSet.pop()
|     .......
|     .......
|
| the while loop reaches the 1398th file and is some 3-4 times faster than
| the for loop

Perhaps you have a broken installation. What version, system, compiler?

tjr

May 8 '07 #6
