is for reliable?

pabloski

Hi all, I have a question about the for statement in Python. I have the
following piece of code, where cachefilesSet is a set that contains the
names of 1398 HTML files cached on my hard disk:

for fn in cachefilesSet:

    fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
    u = fObj.read()

    v = u.lower()
    rows = v.split('\x0a')

    contentType = ''

    for r in rows:
        if r.find('content-type') != -1:
            y = r.find(':')
            if y != -1:
                z = r.find(';', y)
                if z != -1:
                    contentType = r[y+1:z].strip()
                    cE = r[z+1:].strip()
                    characterEncoding = cE.strip('charset = ')
                else:
                    contenType = r[y+1:].strip()
                    characterEncoding = ''
            break

    if contentType == 'text/html':
        processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )

    fileCnt += 1
    if fileCnt % 100 == 0: print fileCnt

This code stops at the 473rd file instead of reaching the 1398th.

However, when I replaced the for loop with a while loop, like this:

while cachefilesSet:
    fn = cachefilesSet.pop()
    .......
    .......

the while loop reaches the 1398th file and is some 3-4 times faster than
the for loop.

How is this possible?

May 7 '07 #1

Marc 'BlackJack' Rintsch

In <Bm***************@twister2.libero.it>, pa******@giochinternet.com
wrote:

> Hi all, I have a question about the for statement in Python. I have the
> following piece of code, where cachefilesSet is a set that contains the
> names of 1398 HTML files cached on my hard disk:
>
> [snipped code]
>
> This code stops at the 473rd file instead of reaching the 1398th.
>
> However, when I replaced the for loop with a while loop, like this:
>
> while cachefilesSet:
>     fn = cachefilesSet.pop()
>     .......
>     .......
>
> the while loop reaches the 1398th file and is some 3-4 times faster than
> the for loop.
>
> How is this possible?

Good question. ``for`` loops are of course reliable. Can you give a
short self contained example that shows the behavior?

Ciao,
Marc 'BlackJack' Rintsch

May 7 '07 #2

Jerry Hill

On 5/7/07, pa******@giochinternet.com <pa******@giochinternet.com> wrote:

> for fn in cachefilesSet:
> ....
> This code stops at the 473rd file instead of reaching the 1398th.

This is often caused by mutating the object you are iterating over
inside the for loop. I didn't see anything in the code you posted
that would do that, but you also didn't show us all of your code. Are
you doing anything that would change cachefilesSet inside your for
loop? If so, try looping over a copy of your set instead:

from copy import copy
for fn in copy(cachefilesSet):
    ...

--
Jerry

May 7 '07 #3
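
The failure mode Jerry describes can be reproduced in a few lines. The following is a minimal sketch, not the original poster's code: the function names and the sample file names are made up for illustration, and names.discard(name) stands in for whatever hidden side effect might be changing cachefilesSet. In current CPython, changing the size of a set while a for loop is iterating over it raises RuntimeError, whereas iterating over a snapshot leaves the original free to change.

```python
def mutate_while_iterating(names):
    """Return True if changing `names` inside the loop raises RuntimeError."""
    try:
        for name in names:
            names.discard(name)  # hypothetical side effect hidden in the loop body
    except RuntimeError:
        return True
    return False


def iterate_over_copy(names):
    """Loop over a snapshot so the original set can be changed safely."""
    visited = []
    for name in set(names):  # set(names) makes a shallow copy, like copy(names)
        visited.append(name)
        names.discard(name)
    return visited


sample = set(['a.html', 'b.html', 'c.html', 'd.html'])  # made-up file names
raised = mutate_while_iterating(set(sample))
visited = iterate_over_copy(set(sample))
print('RuntimeError raised: %s, visited via copy: %d' % (raised, len(visited)))
```

Whether the original loop stopped silently or with a traceback is not stated in the thread, so this only shows one way a mutated set can cut a for loop short.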

MRAB

On May 7, 8:46 pm, "pablo...@giochinternet.com"
<pablo...@giochinternet.com> wrote:

> Hi all, I have a question about the for statement in Python. I have the
> following piece of code, where cachefilesSet is a set that contains the
> names of 1398 HTML files cached on my hard disk:
>
> for fn in cachefilesSet:
>
>     fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
>     u = fObj.read()
>
>     v = u.lower()
>     rows = v.split('\x0a')
>
>     contentType = ''
>
>     for r in rows:
>         if r.find('content-type') != -1:
>             y = r.find(':')
>             if y != -1:
>                 z = r.find(';', y)
>                 if z != -1:
>                     contentType = r[y+1:z].strip()
>                     cE = r[z+1:].strip()
>                     characterEncoding = cE.strip('charset = ')
>                 else:
>                     contenType = r[y+1:].strip()
>                     characterEncoding = ''
>             break
>
>     if contentType == 'text/html':
>         processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )
>
>     fileCnt += 1
>     if fileCnt % 100 == 0: print fileCnt

[snip]
I'd like to point out what look like 2 errors in the code:

1. You have "contenType" instead of "contentType" in
"contenType = r[y+1:].strip()".

2. The string method "strip(...)" treats its string argument as a
_set_ of characters to strip, so "cE.strip('charset = ')" will strip
any leading and trailing "c", "h", "a", etc., which isn't what I think
you intended.

May 7 '07 #4
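
MRAB's second point is easy to check. The sketch below contrasts strip with one possible alternative based on str.partition. Note that parse_charset is a hypothetical helper name, not something from the original code, and the sample parameter strings are invented for the demonstration.

```python
def parse_charset(param):
    """Return the value of a 'charset=...' parameter, or '' if it is absent."""
    name, sep, value = param.partition('=')
    if sep and name.strip() == 'charset':
        return value.strip()
    return ''


# strip() removes any of the listed characters from both ends, so it can
# eat into the value itself when the value starts or ends with one of them.
stripped = 'charset=shift_jis'.strip('charset = ')
parsed = parse_charset('charset=shift_jis')
print('strip: %r, partition: %r' % (stripped, parsed))
```

Here strip leaves 'ift_ji', because the leading "sh" and the trailing "s" of the value are also in the set of characters to remove, while the partition-based helper returns 'shift_jis'.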

John Machin

On May 8, 5:46 am, "pablo...@giochinternet.com"
<pablo...@giochinternet.com> wrote:

> Hi all, I have a question about the for statement in Python. I have the
> following piece of code, where cachefilesSet is a set that contains the
> names of 1398 HTML files cached on my hard disk:
>
> for fn in cachefilesSet:
>
>     fObj = codecs.open( baseDir + fn + '-header.html', 'r', 'iso-8859-1' )
>     u = fObj.read()
>
>     v = u.lower()
>     rows = v.split('\x0a')
>
>     contentType = ''
>
>     for r in rows:
>         if r.find('content-type') != -1:
>             y = r.find(':')
>             if y != -1:
>                 z = r.find(';', y)
>                 if z != -1:

u, v, r, y, z .... are you serious?

>                     contentType = r[y+1:z].strip()
>                     cE = r[z+1:].strip()
>                     characterEncoding = cE.strip('charset = ')

Read the manual ... strip('charset = ') is NOT doing what you think it
is.

>                 else:
>                     contenType = r[y+1:].strip()

Do you mean contentType?
Consider using pychecker and/or pylint.

>                     characterEncoding = ''
>             break
>
>     if contentType == 'text/html':
>         processHTMLfile( baseDir + fn + '-body.html', characterEncoding, cardinalita )

We don't have crystal balls -- what does processHTMLfile() do? Where
is "cardinalita" bound to a value?

>     fileCnt += 1
>     if fileCnt % 100 == 0: print fileCnt
>
> This code stops at the 473rd file instead of reaching the 1398th.

Sets are not ordered. There is no such thing as the 473rd element. I
presume you mean that you believe that your code processes only 473
elements.

> However, when I replaced the for loop with a while loop, like this:
>
> while cachefilesSet:
>     fn = cachefilesSet.pop()
>     .......
>     .......
>
> the while loop reaches the 1398th file and is some 3-4 times faster than
> the for loop.
>
> How is this possible?

Given that you open each file and read it, the notion that processing
3 times as many files is 3-4 times faster is very hard to swallow
(even if you mean 3-4 times faster *per file*). Show us *all* of your
code (with appropriate counting and timing) and its output.

Something daft is happening in the part of the code that you haven't
shown us. E.g. you are deleting elements from cachefilesSet, or
perhaps even adding new entries. As a general rule, don't fiddle with
a container over which you are iterating. Using the
    while container:
        item = container.pop()
trick is not "iterating over".

Interestingly, 1398 is rather close to 3 times 473. Coincidence?

Have you tried your code on subsets of your 1398 element set? [This
concept is called "testing"] A subset of size 10 plus copious relevant
print statements in your code might show you what is happening.

HTH,
John

May 7 '07 #5
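
John's suggestion to test on a small subset with explicit counting could look roughly like the sketch below. Everything in it is an assumption made for illustration: the generated file names, the run_for and run_pop helper names, and the use of seen.append as a stand-in for the real per-file work, which the thread never shows.

```python
def run_for(names, handle):
    """Visit a snapshot of `names` with a for loop; return how many were handled."""
    handled = 0
    for name in list(names):
        handle(name)
        handled += 1
    return handled


def run_pop(names, handle):
    """Drain `names` with pop(); return how many were handled."""
    handled = 0
    while names:
        handle(names.pop())
        handled += 1
    return handled


# Hypothetical stand-in for the full set of 1398 cached file names.
all_names = set('file%04d' % i for i in range(1398))
subset = set(sorted(all_names)[:10])  # a small, reproducible subset

seen = []
for_count = run_for(set(subset), seen.append)
pop_count = run_pop(set(subset), seen.append)
print('for: %d, pop: %d, expected: %d' % (for_count, pop_count, len(subset)))
```

If the two counts differ once the real processing function is plugged in, that function is probably changing the set, which is the situation John suspects.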

Terry Reedy

Yes.

<pa******@giochinternet.com> wrote in message
news:Bm***************@twister2.libero.it...
| Hi all, I have a question about the for statement in Python. I have the
| following piece of code, where cachefilesSet is a set that contains the
| names of 1398 HTML files cached on my hard disk:
| [snip]
| This code stops at the 473rd file instead of reaching the 1398th.

In what way does it stop? Exception and traceback?

| However, when I replaced the for loop with a while loop, like this:
|
| while cachefilesSet:
|     fn = cachefilesSet.pop()
|     .......
|     .......
|
| the while loop reaches the 1398th file and is some 3-4 times faster than
| the for loop.

Perhaps you have a broken installation. What version, system, and compiler
are you using?

tjr

May 8 '07 #6
