I'm trying to get urllib2 to work on my server, which runs Python 2.2.1. When I run the following code:
import urllib2
for line in urllib2.urlopen('www.google.com'):
    print line
I will always get the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
Anyone have any answers?
rp*****@gmail.com wrote:
I'm trying to get urllib2 to work on my server, which runs Python 2.2.1. When I run the following code:
import urllib2
for line in urllib2.urlopen('www.google.com'):
    print line
I will always get the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
Anyone have any answers?
I ran your code:
>>> import urllib2
>>> urllib2.urlopen('www.google.com')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 366, in open
    protocol = req.get_type()
  File "C:\Python25\lib\urllib2.py", line 241, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.google.com
Note the traceback.
You need to call it with the type (scheme) in front of the URL:
>>> import urllib2
>>> urllib2.urlopen('http://www.google.com')
<addinfourl at 27659320 whose fp = <socket._fileobject object at 0x01A51F48>>
Python's interactive mode is very useful for tracking down this type
of problem.
-Larry
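Larry's point about the scheme can be checked programmatically before calling urlopen. A minimal sketch, using the modern urllib.parse module (the helper name and default scheme are illustrative, not part of the thread's code):

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prepend a scheme if the URL lacks one, so urlopen
    won't raise "unknown url type"."""
    if not urlparse(url).scheme:
        return "%s://%s" % (default, url)
    return url

print(ensure_scheme("www.google.com"))       # http://www.google.com
print(ensure_scheme("https://example.com"))  # https://example.com (unchanged)
```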
Thanks for the reply Larry, but I am still having trouble. If I understand you correctly, you are just suggesting that I add http:// in front of the address? However, when I run this:
>>> import urllib2
>>> site = urllib2.urlopen('http://www.google.com')
>>> for line in site:
...     print line
I am still getting the message:
TypeError: iteration over non-sequence
File "<stdin>", line 1
TypeError: iteration over non-sequence
rp*****@gmail.com wrote:
Thanks for the reply Larry, but I am still having trouble. If I understand you correctly, you are just suggesting that I add http:// in front of the address? However, when I run this:
>>> import urllib2
>>> site = urllib2.urlopen('http://www.google.com')
>>> for line in site:
...     print line
I am still getting the message:
TypeError: iteration over non-sequence
File "<stdin>", line 1
TypeError: iteration over non-sequence
Newer versions of Python implement an iterator that *reads* the contents of a file object and supplies the lines to you one-by-one in a loop. However, you explicitly said the version of Python you are using, and that predates generators/iterators.
So... You must explicitly read the contents of the file-like object yourself, and loop through the lines yourself. However, fear not -- it's easy. The socket._fileobject object provides a method "readlines" that reads the *entire* contents of the object, and returns a list of lines. And you can iterate through that list of lines. Like this:
import urllib2
url = urllib2.urlopen('http://www.google.com')
for line in url.readlines():
    print line
url.close()
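This readlines-based loop can be tried offline; a minimal sketch with io.StringIO standing in for the file-like object urlopen returns (modern print syntax, since this is just for illustration):

```python
import io

# Stand-in for the file-like object that urlopen returns.
url = io.StringIO("line one\nline two\nline three\n")

# readlines() pulls the *entire* body into memory as a list of lines.
lines = url.readlines()
for line in lines:
    print(line, end='')
url.close()

print(len(lines))  # 3 lines, newlines included
```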
Gary Herron
Gary Herron wrote:
So... You must explicitly read the contents of the file-like object yourself, and loop through the lines yourself. However, fear not -- it's easy. The socket._fileobject object provides a method "readlines" that reads the *entire* contents of the object, and returns a list of lines. And you can iterate through that list of lines. Like this:
import urllib2
url = urllib2.urlopen('http://www.google.com')
for line in url.readlines():
    print line
url.close()
This is really wasteful, as there's no point in reading in the whole
file before iterating over it. To get the same effect as file iteration
in later versions, use the .xreadlines method::
for line in aFile.xreadlines():
...
--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
If you flee from terror, then terror continues to chase you.
-- Benjamin Netanyahu
Erik Max Francis <ma*@alcyone.com> writes:
This is really wasteful, as there's no point in reading in the whole
file before iterating over it. To get the same effect as file
iteration in later versions, use the .xreadlines method::
for line in aFile.xreadlines():
...
Ehhh, a heck of a lot of web pages don't have any newlines, so you end
up getting the whole file anyway, with that method. Something like
for line in iter(lambda: aFile.read(4096), ''): ...
may be best.
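Paul's iter(callable, sentinel) pattern can be exercised without hitting the network; a sketch using io.StringIO to stand in for the file-like object returned by urlopen (the variable names are illustrative):

```python
import io

# Simulate a response body with no newlines, as many web pages have.
aFile = io.StringIO("x" * 10000)

# iter(callable, sentinel) calls aFile.read(4096) repeatedly and
# stops when it returns the sentinel '' (end of stream).
chunks = []
for chunk in iter(lambda: aFile.read(4096), ''):
    chunks.append(chunk)

print([len(c) for c in chunks])  # [4096, 4096, 1808]
```

Each pass through the loop holds at most one 4096-character chunk, which is the whole point: memory use stays bounded no matter how large the page is.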
Paul Rubin wrote:
Erik Max Francis <ma*@alcyone.com> writes:
>This is really wasteful, as there's no point in reading in the whole file before iterating over it. To get the same effect as file iteration in later versions, use the .xreadlines method::
for line in aFile.xreadlines(): ...
Ehhh, a heck of a lot of web pages don't have any newlines, so you end
up getting the whole file anyway, with that method. Something like
for line in iter(lambda: aFile.read(4096), ''): ...
may be best.
Certainly there are cases where xreadlines or read(bytecount) are reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.
For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
Gary Herron
Gary Herron <gh*****@islandtraining.com> writes:
For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
If you know in advance that the page you're retrieving will be
reasonable in size, then using readlines is fine. If you don't know
in advance what you're retrieving (e.g. you're working on a crawler)
you have to assume that you'll hit some very large pages with
difficult construction.
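For the crawler case Paul describes, one defensive pattern is to stop reading once a page exceeds a size limit. A sketch (the helper name and the 1 MiB cap are assumptions, chosen for illustration):

```python
import io

MAX_BYTES = 1 << 20  # 1 MiB cap; an arbitrary, tunable limit

def read_capped(fileobj, limit=MAX_BYTES, chunk_size=4096):
    """Read at most roughly `limit` characters from a file-like
    object, so one huge page can't exhaust memory."""
    parts = []
    total = 0
    for chunk in iter(lambda: fileobj.read(chunk_size), ''):
        parts.append(chunk)
        total += len(chunk)
        if total >= limit:
            break  # page too large; stop here
    return ''.join(parts)

# Example with an in-memory stand-in for a network response:
body = read_capped(io.StringIO('a' * 5000), limit=2048)
print(len(body))  # 4096: stopped once the first chunk crossed the cap
```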
Gary Herron wrote:
Certainly there are cases where xreadlines or read(bytecount) are reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.
For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
It isn't nitpicking to point out that you're making something that will consume vastly more memory than it could possibly need. And
insisting that pages aren't _always_ huge is just a silly cop-out; of
course pages get very large.
There is absolutely no reason to read the entire file into memory (which
is what you're doing) before processing it. This is a good example of
the principle of there is one obvious right way to do it -- and it isn't
to read the whole thing in first for no reason whatsoever other than to
avoid an `x`.
--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
The more violent the love, the more violent the anger.
-- _Burmese Proverbs_ (tr. Hla Pe)
Paul Rubin wrote:
If you know in advance that the page you're retrieving will be
reasonable in size, then using readlines is fine. If you don't know
in advance what you're retrieving (e.g. you're working on a crawler)
you have to assume that you'll hit some very large pages with
difficult construction.
And that's before you even mention that, depending on the application, you could easily open yourself up to a DoS attack.
There's premature optimization, and then there's premature completely
obvious and pointless waste. This falls in the latter category.
Besides, someone was asking for/needing an older equivalent to iterating
over a file. That's obviously .xreadlines, not .readlines.
--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
The more violent the love, the more violent the anger.
-- _Burmese Proverbs_ (tr. Hla Pe)