urllib2 - iteration over non-sequence

I'm trying to get urllib2 to work on my server, which runs Python
2.2.1. When I run the following code:

import urllib2
for line in urllib2.urlopen('www.google.com'):
    print line

I always get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
Anyone have any answers?

Jun 9 '07 #1
rp*****@gmail.com wrote:
I'm trying to get urllib2 to work on my server, which runs Python
2.2.1. When I run the following code:

import urllib2
for line in urllib2.urlopen('www.google.com'):
    print line

I always get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
Anyone have any answers?
I ran your code:
>>> import urllib2
>>> urllib2.urlopen('www.google.com')
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 366, in open
protocol = req.get_type()
File "C:\Python25\lib\urllib2.py", line 241, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.google.com

Note the traceback.

You need to call it with the scheme (the "type" in the error message) in
front of the URL:

>>> import urllib2
>>> urllib2.urlopen('http://www.google.com')
<addinfourl at 27659320 whose fp = <socket._fileobject object at 0x01A51F48>>

Python's interactive mode is very useful for tracking down this type
of problem.

-Larry
Jun 9 '07 #2
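A minimal sketch of the fix Larry describes, assuming Python 2.2 as in the
original post. The open_url helper is hypothetical (not part of urllib2),
and str.find() is used because substring tests with `in` only arrived in
Python 2.3:

import urllib2

def open_url(url):
    # urlopen() requires an explicit scheme, so prepend one when missing.
    # (open_url is a hypothetical helper, not part of urllib2.)
    if url.find('://') == -1:
        url = 'http://' + url
    return urllib2.urlopen(url)

response = open_url('www.google.com')
print response.read()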
Thanks for the reply, Larry, but I am still having trouble. If I
understand you correctly, you are just suggesting that I add http://
in front of the address? However, when I run this:
>>> import urllib2
>>> site = urllib2.urlopen('http://www.google.com')
>>> for line in site:
...     print line
I am still getting the message:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence

Jun 9 '07 #3
rp*****@gmail.com wrote:
Thanks for the reply, Larry, but I am still having trouble. If I
understand you correctly, you are just suggesting that I add http://
in front of the address? However, when I run this:

>>> import urllib2
>>> site = urllib2.urlopen('http://www.google.com')
>>> for line in site:
...     print line

I am still getting the message:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
Newer versions of Python provide file-like objects whose iterator *reads*
the contents of the object and supplies the lines to you one by one in a
loop. However, you explicitly said which version of Python you are using,
and it predates that iteration support on this kind of file-like object.

So... You must explicitly read the contents of the file-like object
yourself, and loop through the lines yourself. However, fear not --
it's easy. The socket._fileobject object provides a method "readlines"
that reads the *entire* contents of the object and returns a list of
lines, and you can iterate through that list of lines. Like this:

import urllib2

url = urllib2.urlopen('http://www.google.com')
for line in url.readlines():
    print line
url.close()
Gary Herron


Jun 9 '07 #4
Gary Herron wrote:
So... You must explicitly read the contents of the file-like object
yourself, and loop through the lines yourself. However, fear not --
it's easy. The socket._fileobject object provides a method "readlines"
that reads the *entire* contents of the object and returns a list of
lines, and you can iterate through that list of lines. Like this:

import urllib2

url = urllib2.urlopen('http://www.google.com')
for line in url.readlines():
    print line
url.close()
This is really wasteful, as there's no point in reading in the whole
file before iterating over it. To get the same effect as file iteration
in later versions, use the .xreadlines method:

for line in aFile.xreadlines():
    ...

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
If you flee from terror, then terror continues to chase you.
-- Benjamin Netanyahu
Jun 10 '07 #5
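For reference, a short sketch of the lazy behaviour .xreadlines() gives on
an ordinary file object. The filename page.html is hypothetical, and (as
noted later in the thread) the object returned by urlopen() does not
actually provide this method:

# xreadlines() yields lines lazily instead of building one big list.
f = open('page.html')    # hypothetical local file
for line in f.xreadlines():
    print line,          # trailing comma: line already ends in '\n'
f.close()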
Erik Max Francis <ma*@alcyone.com> writes:
This is really wasteful, as there's no point in reading in the whole
file before iterating over it. To get the same effect as file
iteration in later versions, use the .xreadlines method:

for line in aFile.xreadlines():
    ...
Ehhh, a heck of a lot of web pages don't have any newlines, so you end
up getting the whole file anyway, with that method. Something like

for line in iter(lambda: aFile.read(4096), ''): ...

may be best.
Jun 10 '07 #6
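A self-contained sketch of the chunked-read idiom Paul suggests, assuming
Python 2.2, where the two-argument form of iter() is available; printing
the chunk length is a placeholder for real per-chunk processing:

import urllib2

response = urllib2.urlopen('http://www.google.com')
# iter(callable, sentinel) calls response.read(4096) repeatedly and
# stops as soon as it returns '' (end of stream).
for chunk in iter(lambda: response.read(4096), ''):
    print len(chunk)    # placeholder for real per-chunk processing
response.close()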
Paul Rubin wrote:
Erik Max Francis <ma*@alcyone.com> writes:

This is really wasteful, as there's no point in reading in the whole
file before iterating over it. To get the same effect as file
iteration in later versions, use the .xreadlines method:

for line in aFile.xreadlines():
    ...

Ehhh, a heck of a lot of web pages don't have any newlines, so you end
up getting the whole file anyway, with that method. Something like

for line in iter(lambda: aFile.read(4096), ''): ...

may be best.
Certainly there are cases where xreadlines or read(bytecount) are
reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.

For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.

Gary Herron


Jun 10 '07 #7
Gary Herron <gh*****@islandtraining.com> writes:
For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
If you know in advance that the page you're retrieving will be
reasonable in size, then using readlines is fine. If you don't know
in advance what you're retrieving (e.g. you're working on a crawler)
you have to assume that you'll hit some very large pages with
difficult construction.
Jun 10 '07 #8
Gary Herron wrote:
Certainly there are cases where xreadlines or read(bytecount) are
reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.

For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
It isn't nitpicking to point out that you're making something that will
consume vastly more memory than it could possibly need. And insisting
that pages aren't _always_ huge is just a silly cop-out; of course pages
get very large.

There is absolutely no reason to read the entire file into memory (which
is what you're doing) before processing it. This is a good example of
the principle that there should be one obvious right way to do it -- and
it isn't to read the whole thing in first for no reason whatsoever other
than to avoid an `x`.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
The more violent the love, the more violent the anger.
-- _Burmese Proverbs_ (tr. Hla Pe)
Jun 10 '07 #9
Paul Rubin wrote:
If you know in advance that the page you're retrieving will be
reasonable in size, then using readlines is fine. If you don't know
in advance what you're retrieving (e.g. you're working on a crawler)
you have to assume that you'll hit some very large pages with
difficult construction.
And that's before you even mention the point that, depending on the
application, you could easily open yourself up to a DoS attack.

There's premature optimization, and then there's premature, completely
obvious, and pointless waste. This falls into the latter category.

Besides, someone was asking for/needing an older equivalent to iterating
over a file. That's obviously .xreadlines, not .readlines.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
The more violent the love, the more violent the anger.
-- _Burmese Proverbs_ (tr. Hla Pe)
Jun 10 '07 #10
On Sun, 10 Jun 2007 02:54:47 -0300, Erik Max Francis <ma*@alcyone.com>
wrote:
Gary Herron wrote:
Certainly there are cases where xreadlines or read(bytecount) are
reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.
There is absolutely no reason to read the entire file into memory (which
is what you're doing) before processing it. This is a good example of
the principle that there should be one obvious right way to do it -- and
it isn't to read the whole thing in first for no reason whatsoever other
than to avoid an `x`.
The problem is, and you appear not to have noticed this, that the object
returned by urlopen does NOT have an xreadlines() method; and even if it
did, a lot of pages don't contain any '\n', so using xreadlines would
read the whole page into memory anyway.

Python 2.2 (the version the OP is using) did include an xreadlines
module (now defunct), but in this case it is painfully slow; perhaps it
tries to read the source one character at a time.

So the best way would be to use (as Paul Rubin already said):

for line in iter(lambda: f.read(4096), ''):
    print line

--
Gabriel Genellina

Jun 10 '07 #11
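To close the loop on the thread's advice, a minimal sketch that combines
chunked reads with line splitting, assuming Python 2.2 (hence the
__future__ import for generators). The iterlines helper is hypothetical,
not a standard function, and a page with no newlines at all will still
accumulate in memory, which is the caveat raised above:

from __future__ import generators   # generators need this on Python 2.2
import urllib2

def iterlines(response, blocksize=4096):
    # Hypothetical helper: turns chunked reads into a line iterator
    # without loading the whole page up front.
    pending = ''
    for chunk in iter(lambda: response.read(blocksize), ''):
        pending = pending + chunk
        lines = pending.split('\n')
        pending = lines.pop()       # last piece may be an incomplete line
        for line in lines:
            yield line
    if pending:
        yield pending               # whatever remains after EOF

response = urllib2.urlopen('http://www.google.com')
for line in iterlines(response):
    print line
response.close()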
