why not in python 2.4.3

hi
I made the upgrade to python 2.4.3 from 2.4.2.
I want to fetch some Atom feeds from Google News with a function like
this:

import urllib2

def takefeed(url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    return data

url = 'http://news.google.it/?output=rss'
d = takefeed(url)

This woks well with python 2.3.5 but does not work with 2.4.3.
Why?
Thanks

May 28 '06 #1
10 Replies



Rocco wrote:
> hi
> I made the upgrade to python 2.4.3 from 2.4.2.
> I want to fetch some Atom feeds from Google News with a function like
> this:
>
> import urllib2
>
> def takefeed(url):
>     request = urllib2.Request(url)
>     request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
>     opener = urllib2.build_opener()
>     data = opener.open(request).read()
>     return data
>
> url = 'http://news.google.it/?output=rss'
> d = takefeed(url)
>
> This woks well with python 2.3.5 but does not work with 2.4.3.
> Why?


Define "woks [sic] well". It works fine for me on 2.4.3 (and by "works
fine" I mean it ran without an exception and it returned what appeared
to be RSS data). If you would give us an exception trace it would help
a lot.

Maybe Google's server (or your ISP's) was down. That happens
sometimes.

Carl

May 28 '06 #2

Rocco:
> but does not work with 2.4.3.


Define "does not work".

--
René Pijlman
May 28 '06 #3

This is the problem when I run the function.
This is the result from 2.3.5:
>>> print rss
<?xml version="1.0" encoding="UTF-8"?><feed version="0.3" xml:lang="it"
xmlns="http://purl.org/atom/ns#"><generator>NFE/1.0</generator><title>Google
News Italia</title><link rel="alternate" type="text/html"
href="http://news.google.it/"/><tagline>Google News
Italia</tagline><author><name>Google
Inc.</name><email>ne***********@google.com</email></author><copyright>&amp;copy;2006
Google</copyright><modified>2006-05-28T19:09:13+00:00</modified>
<!-- A couple notes:
* add an "output=atom" param to get Atom
* section pages have a "topic=?" param;
use "topic=h" for a Top Stories section.
--><entry><title>Benedetto XVI: Wojtyla santo subito - Libertà
</title><link rel="alternate" type="text/html"
href="http://www.liberta.it/default.asp?IDG=605282024"/><id>tag:news.google.com,2005:cluster=41b535fb</id><summary>Prima
pagina</summary><issued>2006-05-28T11:05:00+00:00</issued><modified>2006-05-28T11:05:00+00:00</modified><content
type="text/html" mode="escaped">&lt;br&gt;&lt;table border=0 align=
cellpadding=5 cellspacing=0&gt;&lt;tr&gt;&lt;td width=80 align=center
valign=top&gt;&lt;a .....
>>> import sys
>>> sys.getdefaultencoding()
'ascii'

This is the result with 2.4.3:
>>> print rss
ヒ
>>> rss
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\xe5}Ks\xe3F\xb6\xe6\xfeF\xdc\xff\x90\xd77\xba\xc3\x9e ... [rest of the binary output omitted]
>>> import sys
>>> sys.getdefaultencoding()
'latin_1'

No exception trace
Thanks again

May 28 '06 #4

Rocco wrote:
> import sys
> sys.getdefaultencoding()
>
> 'latin_1'

Don't change the default encoding. It should always be ascii.

May 29 '06 #5

Also with ascii the function does not work.

May 29 '06 #6

Rocco wrote:
> Also with ascii the function does not work.

Well, at least you fixed the misconfiguration ;)

Googling for 1F8B (the first two bytes of your strange python 2.4
result) gives a hint: it's the beginning of a gzip stream. Maybe urllib2 in
python 2.4 reports to the server that it supports compressed data but
doesn't decompress it when it receives the reply?
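
The 1F8B check can also be done in code instead of by eye. What follows is only a minimal sketch, and it assumes Python 3 rather than the Python 2.4 used in this thread. The helper name looks_like_gzip is hypothetical and not part of any library; it simply compares the first two bytes against the gzip magic number.

```python
import gzip

# Every gzip stream starts with these two bytes (the gzip "magic number").
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number.

    Hypothetical helper for illustration; it only inspects the first
    two bytes and does not validate the rest of the stream.
    """
    return data[:2] == GZIP_MAGIC


if __name__ == "__main__":
    compressed = gzip.compress(b"<?xml version='1.0'?><rss/>")
    print(looks_like_gzip(compressed))
    print(looks_like_gzip(b"<?xml version='1.0'?><rss/>"))
```

A check like this only says that the data probably is gzip; a body that happens to start with the same two bytes would fool it.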

May 29 '06 #7

Thanks Serge.
It's a gzip string.
So the code is:

import urllib2

def takefeed(url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5;Windows NT')
    opener = urllib2.build_opener()
    data = opener.open(request).read()
    return data

url = 'http://news.google.it/?output=rss'
d = takefeed(url)

from StringIO import StringIO
zipdata = StringIO(d)
import gzip
gz = gzip.GzipFile(fileobj=zipdata)
rss = gz.read()

>>> len(rss)
102529
>>> print rss[0:100]
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><generator>NFE/1.0</generator><tit
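
Since other posters get uncompressed data from the same URL, always gunzipping would fail for them. Below is a hedged sketch of a version that decompresses only when needed, assuming Python 3 (io.BytesIO in place of StringIO). The function name maybe_gunzip and its content_encoding parameter are made up for illustration.

```python
import gzip
import io


def maybe_gunzip(body, content_encoding=None):
    """Return body decompressed if it is gzip data, otherwise unchanged.

    Hypothetical helper: content_encoding is the value of the response's
    Content-Encoding header, if the caller has it. The magic-number check
    covers responses that are compressed without saying so.
    """
    declared = (content_encoding or "").strip().lower() in ("gzip", "x-gzip")
    sniffed = body[:2] == b"\x1f\x8b"
    if declared or sniffed:
        # Raises an error if the header claims gzip but the body is not.
        with gzip.GzipFile(fileobj=io.BytesIO(body)) as gz:
            return gz.read()
    return body


if __name__ == "__main__":
    feed = b"<?xml version='1.0'?><rss version='2.0'/>"
    print(maybe_gunzip(gzip.compress(feed), "gzip") == feed)
    print(maybe_gunzip(feed) == feed)
```

With a real response object from urllib.request, the header value would typically come from response.headers.get("Content-Encoding").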


May 29 '06 #8

On 29/05/2006 10:47 PM, Serge Orlov wrote:
> Rocco wrote:
>> Also with ascii the function does not work.
> Well, at least you fixed the misconfiguration ;)
>
> Googling for 1F8B (the first two bytes of your strange python 2.4
> result) gives a hint: it's the beginning of a gzip stream.

Well done!

> Maybe urllib2 in
> python 2.4 reports to the server that it supports compressed data but
> doesn't decompress it when it receives the reply?


Something funny is happening here. Others reported it working with 2.4.3
and Rocco's original code as posted in this thread -- which works for me
on 2.4.2, Windows XP.

There was one suss thing about Rocco's problem description:
First message ended with d=takefeed(url)
But next message said print rss
Is rss == d?

Cheers,
John
May 30 '06 #9

On 30/05/2006 12:44 AM, Rocco wrote:
> Thanks Serge.
> It's a gzip string.

Look, Ma, no gzip!!!

C:\junk>rocco_rss.py
'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><generator>NFE/1.0</generator><tit'

C:\junk>type rocco_rss.py
import urllib2
def takefeed(url):
    request=urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data=opener.open(request).read()
    return data
url='http://news.google.it/?output=rss'
d=takefeed(url)
print repr(d[:100])
May 30 '06 #10

John Machin wrote:
> On 29/05/2006 10:47 PM, Serge Orlov wrote:
>> Maybe urllib2 in
>> python 2.4 reports to the server that it supports compressed data but
>> doesn't decompress it when it receives the reply?
>
> Something funny is happening here. Others reported it working with 2.4.3
> and Rocco's original code as posted in this thread -- which works for me
> on 2.4.2, Windows XP.

It "works" for me too, returning raw uncompressed data.

> There was one suss thing about Rocco's problem description:
> First message ended with d=takefeed(url)
> But next message said print rss
> Is rss == d?

Nope. If you look at the tags, the 2.3 code returns <feed> <generator> ...
whereas the 2.4 code returns <rss> <channel> <generator> ... That may
explain why the 2.3 result is not compressed and the 2.4 result is compressed,
but it doesn't explain why the 2.4 result *is* compressed. I looked at python
2.4 httplib, and I'm sure it's not the problem. A quote from httplib:

    # we only want a Content-Encoding of "identity" since we don't
    # support encodings such as x-gzip or x-deflate.

I think there is a web accelerator sitting somewhere between Rocco and
the Google server that is confused because Rocco is "misinforming" the web
server, saying he's using Internet Explorer but at the same time claiming
that he cannot handle compressed data. That's why they teach little kids:
don't lie :)
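
To see which headers such an intermediary would receive, the request can be built and inspected without sending it. This is a rough sketch that assumes Python 3's urllib.request, not the thread's urllib2. The function name build_feed_request is hypothetical, and the User-Agent value is modelled on the one in this thread, with a closing parenthesis added.

```python
import urllib.request


def build_feed_request(url):
    """Build (but do not send) a request that asks for an uncompressed reply.

    Hypothetical helper for illustration. "Accept-Encoding: identity" tells
    the server that the client does not want gzip or deflate.
    """
    request = urllib.request.Request(url)
    request.add_header("User-Agent",
                       "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)")
    request.add_header("Accept-Encoding", "identity")
    return request


if __name__ == "__main__":
    req = build_feed_request("http://news.google.it/?output=rss")
    # Request stores header names capitalized, e.g. "Accept-encoding".
    for name, value in req.header_items():
        print(name, "=", value)
```

A proxy that ignores Accept-Encoding could still return gzip data, so this does not remove the need to check what actually comes back.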

May 30 '06 #11
