
why not in python 2.4.3

Rocco
#1: May 28 '06
hi
I made the upgrade to python 2.4.3 from 2.4.2.
I want to take from google news some atom feeds with a function like
this
import urllib2
def takefeed(url):
    request=urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data=opener.open(request).read()
    return data
url='http://news.google.it/?output=rss'
d=takefeed(url)
This woks well with python 2.3.5 but does not work with 2.4.3.
Why?
Thanks


Carl Banks
#2: May 28 '06
Rocco wrote:
> hi
> I made the upgrade to python 2.4.3 from 2.4.2.
> I want to take from google news some atom feeds with a function like
> this
> import urllib2
> def takefeed(url):
>     request=urllib2.Request(url)
>     request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
>     opener = urllib2.build_opener()
>     data=opener.open(request).read()
>     return data
> url='http://news.google.it/?output=rss'
> d=takefeed(url)
> This woks well with python 2.3.5 but does not work with 2.4.3.
> Why?

Define "woks [sic] well". It works fine for me on 2.4.3 (and by "works
fine" I mean it ran without an exception and it returned what appeared
to be RSS data). If you would give us an exception trace it would help
a lot.

Maybe Google's server (or your ISP's) was down. That happens
sometimes.

Carl

Rene Pijlman
#3: May 28 '06
Rocco:
> but does not work with 2.4.3.

Define "does not work".

--
René Pijlman

Rocco
#4: May 28 '06
This is the problem when I run the function.
This is the result from 2.3.5:
>>> print rss
<?xml version="1.0" encoding="UTF-8"?><feed version="0.3" xml:lang="it"
xmlns="http://purl.org/atom/ns#"><generator>NFE/1.0</generator><title>Google
News Italia</title><link rel="alternate" type="text/html"
href="http://news.google.it/"/><tagline>Google News
Italia</tagline><author><name>Google
Inc.</name><email>news-feedback@google.com</email></author><copyright>&amp;copy;2006
Google</copyright><modified>2006-05-28T19:09:13+00:00</modified>
<!-- A couple notes:
* add an "output=atom" param to get Atom
* section pages have a "topic=?" param;
use "topic=h" for a Top Stories section.
--><entry><title>Benedetto XVI: Wojtyla santo subito - LibertÃ
</title><link rel="alternate" type="text/html"
href="http://www.liberta.it/default.asp?IDG=605282024"/><id>tag:news.google.com,2005:cluster=41b535fb</id><summary>Prima
pagina</summary><issued>2006-05-28T11:05:00+00:00</issued><modified>2006-05-28T11:05:00+00:00</modified><content
type="text/html" mode="escaped">&lt;br&gt;&lt;table border=0 align=
cellpadding=5 cellspacing=0&gt;&lt;tr&gt;&lt;td width=80 align=center
valign=top&gt;&lt;a .....
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>>
This is the result with 2.4.3:
>>> print rss
ヒ
>>> rss
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\xe5}Ks\x e3F\xb6\xe6\xfeF\xdc\xff\x90\xd77\xba\xc3\x9e\x10D \xbc\x01\xcaU\xee\xa1\x9eM[\xa2\xd4$\xabl\xf7\x86\x93\x04\x93Tv\x81H\x1a\x0fV \xa9V\xfe\x0f3\x9b\x8e\x98\x89\xb8\xcb\x1b\xd1\xb3 \x9a\xddD\xef\xec\x7f\xe2_2\xe7$\x00\x8a/\x11|\x93\xd6\xb4\xa3U"\x04\x02\x99\xe7d\x9e<\xdfy \xbe\xf9\xd3\xa7\xbeO\x86,\x8c\xb8\x08\xde~\xa1\x9 d\xaa_\x10\x16x\xa2\xc3\x83\xde\xdb/\xde5\xaf\x15\xf7\x8b?}\xf3&\x8c\xa2\xe7\x9bt\xb8\ xe9\x9b7\xde#\r\x02\xe6\x7f\xf3\xa6\xc7\x02\x16\xd 2X\x84\xdf\xd4\xae\xafJ\xf0\x847\xa5\xe7Kob\x1e\xf b\xec\x9b\x1b!z>#5\xf61"\xd5\x98\xfa\x9c\xbe)\xa5\ x7fy\xe3\xf3\xe0\xc37\x8fq<8+\x95\x02\xf8\xfbiO\xd e{\xca\xe3\xd2\x9b\x92\xfc\xe3\x9b\x0e\x8b\xbc\x90 \x0fbx\xfb\xdc\'\x8d\xff\xfd\x8dO\x83^B{\xec\x1b\x 1e\xc3\xf7\xf3\x0fo>\xb2\xf6\x1d\x8db\x16~\x83/Q\xba\x8cu\xda\xd4\xfb\xf0_\xb3\xb7y\xa2\xff\xa6\x f4|\xcf\x1bO\x0c\x9eB\xde{\x8c\xbf\xf9#\xed\x0f\xb e\xc6\x8f_\xeb\xaaj\x93\xf4\xfdoJ\xcf7\xbc\x19$\xe dK\x1a\xb3o\x1aIpBt\x97\xdc\xd1\'"\xef\xd5\xb53\xc d<3\x1crs\xd7|S\xcao\x83\x11F\xf1y\xc2\xfd\xce2\xd f\x9a\xbc\xf9_\xff\xe5\xcd\xbf)\n\xa9\x10O$\x03
C b\x16\x9d\xfd\xeb\xbf\x10\xfc\xdf\x7f!\xb4\xd3!4
_\x88$\x1e$\xf1[\xe0\xda\x17d@C\xda\'\xb1
=\x16\x93z\xa31\xba9b\x1eR\x0cn\xe8\xb1\x88<\xd2!# \x94|\x11\x8b\x01\xf7\xde\xfe)\xfb\xde\xd7\xd9\xdd \x84$\x11\xcb\xff\xf8\xf8\x05\xe9\x8a\x10nn\x8a\x0 1i\x00\x979|?{\xda\xe9\xbf\xfe\x8b\xa2|\xf3\x86\xf 7%\xd1\x0b\x99\x9f\x84\xfe<\xde\x037J<\x88\xfd\x12 \x8f[\xb0\x0e\xe4\xd3"yG+dp\x17\xef\xbe)\xe1W\x97Y<\xa5 l,<f\xfd|D\x15\xf2=\xed\x88\x8f\xdcc\'\xc4\xe3q\xf c\xeb\x7f\x90\x80\xc2\xc8\x18\xe9p\xf2\x1d\r\x85\x 7fB\xb8O\x1eD\x10\xb3.\xdc\x05\xd4!\xb4\xdbea\x1f\ x1659==%\n\xa9\xfa\xa4\xc9\xfa\x03\xb1\xccB\xc6\x8 f8\xe0?E\xf4m3]P\xf1[\xb8\xae*\xf0\x9f\xfc\xdc\xed\xbc\xad\xcb_\xe0\xae \xb7\xd9C>~\xfcx\xca\xfd\x18_\x82\x0f\xa1\x83A(\xb a"\xe8\xf0>\x0bb\x0e\x04\xea\xb0O\xa74\x
>>> import sys
>>> sys.getdefaultencoding()
'latin_1'
>>>
No exception trace
Thanks again

Serge Orlov
#5: May 29 '06
Rocco wrote:
> >>> import sys
> >>> sys.getdefaultencoding()
> 'latin_1'

Don't change the default encoding. It should always be ascii.

Rocco
#6: May 29 '06
Also with ascii the function does not work.

Serge Orlov
#7: May 29 '06
Rocco wrote:
> Also with ascii the function does not work.

Well, at least you fixed the misconfiguration ;)

Googling for 1F8B (those are the first two bytes of your strange python 2.4
result) gives a hint: it's the beginning of a gzip stream. Maybe urllib2 in
python 2.4 reports to the server that it supports compressed data but
doesn't decompress it when it receives the reply?
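
The two-byte check described here can be sketched in a few lines. This is only an illustration, written in Python 3 syntax rather than the Python 2 used in the thread; the helper name looks_like_gzip is hypothetical, and the sample data is generated locally instead of being fetched from Google.

```python
import gzip

# Every gzip stream begins with the two magic bytes 0x1f 0x8b (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Locally generated sample data, so no network access is needed.
plain = b'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">'
packed = gzip.compress(plain)

print(looks_like_gzip(plain))
print(looks_like_gzip(packed))
```

A check like this only says that the data is probably gzip; it says nothing about who compressed it or why.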

Rocco
#8: May 29 '06
Thanks Serge.
It's a gzip string.
So the code is
>>> import urllib2
>>> def takefeed(url):
...     request=urllib2.Request(url)
...     request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5;Windows NT')
...     opener = urllib2.build_opener()
...     data=opener.open(request).read()
...     return data
...
>>> url='http://news.google.it/?output=rss'
>>> d=takefeed(url)
>>> from StringIO import StringIO
>>> zipdata=StringIO(d)
>>> import gzip
>>> gz=gzip.GzipFile(fileobj=zipdata)
>>> rss=gz.read()
>>> len(rss)
102529
>>> print rss[0:100]
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><generator>NFE/1.0</generator><tit
>>>
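
For readers on Python 3, where StringIO and urllib2 no longer exist under those names, the same decompression step might look roughly like the sketch below. The helper name gunzip_if_needed is an assumption made for illustration, and the sample bytes are generated locally rather than downloaded.

```python
import gzip
import io


def gunzip_if_needed(data):
    """Decompress data if it starts with the gzip magic bytes.

    Anything that does not look like gzip is returned unchanged.
    """
    if data[:2] != b"\x1f\x8b":
        return data
    # Wrap the bytes in a file-like object, as the session above does
    # with StringIO, and let GzipFile read the decompressed content.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read()


# Locally generated stand-in for the downloaded feed.
sample = b'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>'
compressed = gzip.compress(sample)

print(gunzip_if_needed(compressed) == sample)
print(gunzip_if_needed(sample) == sample)
```

Passing already-plain data through unchanged means the same call is safe whether or not something on the way compressed the reply.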

John Machin
#9: May 30 '06
On 29/05/2006 10:47 PM, Serge Orlov wrote:
> Rocco wrote:
>> Also with ascii the function does not work.
>
> Well, at least you fixed the misconfiguration ;)
>
> Googling for 1F8B (those are the first two bytes of your strange python 2.4
> result) gives a hint: it's the beginning of a gzip stream.

Well done!

> Maybe urllib2 in
> python 2.4 reports to the server that it supports compressed data but
> doesn't decompress it when it receives the reply?

Something funny is happening here. Others reported it working with 2.4.3
and Rocco's original code as posted in this thread -- which works for me
on 2.4.2, Windows XP.

There was one suss thing about Rocco's problem description:
First message ended with d=takefeed(url)
But next message said print rss
Is rss == d?

Cheers,
John

John Machin
#10: May 30 '06
On 30/05/2006 12:44 AM, Rocco wrote:
> Thanks Serge.
> It's a gzip string.

Look, Ma, no gzip!!!

C:\junk>rocco_rss.py
'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><generator>NFE/1.0</generator><tit'

C:\junk>type rocco_rss.py
import urllib2
def takefeed(url):
    request=urllib2.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT')
    opener = urllib2.build_opener()
    data=opener.open(request).read()
    return data
url='http://news.google.it/?output=rss'
d=takefeed(url)
print repr(d[:100])

Serge Orlov
#11: May 30 '06
John Machin wrote:
> On 29/05/2006 10:47 PM, Serge Orlov wrote:
> > Maybe urllib2 in
> > python 2.4 reports to the server that it supports compressed data but
> > doesn't decompress it when it receives the reply?
>
> Something funny is happening here. Others reported it working with 2.4.3
> and Rocco's original code as posted in this thread -- which works for me
> on 2.4.2, Windows XP.

It "works" for me too, returning raw uncompressed data.

> There was one suss thing about Rocco's problem description:
> First message ended with d=takefeed(url)
> But next message said print rss
> Is rss == d?

Nope. If you look at the XML tags, the 2.3 code returns <feed> <generator> ...
whereas the 2.4 code returns <rss> <channel> <generator> ... That may
explain why the 2.3 result is not compressed and the 2.4 result is compressed,
but it doesn't explain why the 2.4 result *is* compressed. I looked at python
2.4 httplib, and I'm sure it's not the problem. Quote from httplib:

        # we only want a Content-Encoding of "identity" since we don't
        # support encodings such as x-gzip or x-deflate.

I think there is a web accelerator sitting somewhere between Rocco and
the Google server that is confused because Rocco is "misinforming" the web
server, saying he's using Internet Explorer, but at the same time claiming
that he cannot handle compressed data. That's why they teach little kids:
don't lie :)
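
One way to make a client robust against an intermediary like the one suspected here is to state explicitly which encodings are acceptable, and then check the Content-Encoding header of the reply instead of assuming. A hedged sketch in Python 3 follows; build_request and decode_body are hypothetical helper names, and only the URL and the User-Agent string come from the thread.

```python
import gzip
import urllib.request


def build_request(url):
    """Build a request that asks for an uncompressed ("identity") reply."""
    return urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT",
        "Accept-Encoding": "identity",
    })


def decode_body(body, content_encoding):
    """Undo gzip if the reply declares it; otherwise return body as-is."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


request = build_request("http://news.google.it/?output=rss")
# Request stores header names capitalized, hence "Accept-encoding" here.
print(request.get_header("Accept-encoding"))
```

In real use the body and the header would come from the response object, for example response.read() and response.headers.get("Content-Encoding") after urllib.request.urlopen(request). Whether a proxy in between honours the request header is outside the client's control, which is why the reply is checked as well.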
