
Re: urllib accept-language doesn't have any effect

Hey Philip,

thanks for the snippet, but I have tried that code already. It does
indeed give me a Swedish version.. of www.google.de :) That's the beauty
of Google: they have all languages available for all domains.

However if I try it with www.gizmodo.com (a tech blog in several
languages) I still get the German version.

Both sites obviously redirect the client to the country-based version
according to its IP first, and Google presents that page in the desired
language AFTER that.. most other multi-host sites won't have a Swedish
version of the .de site, so this doesn't quite help :(
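
One way to check the redirect-by-IP theory from Python is to ask for the
first response only and look at its status and Location header, instead of
letting urllib follow the redirect silently. The following is a minimal
sketch, not code from this thread: it assumes Python 3's urllib.request
(the thread itself uses Python 2's urllib2), and NoRedirect and
first_response are hypothetical names.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the
        # original 3xx response is then raised as an HTTPError.
        return None


def first_response(url, language, timeout=10):
    """Return (status, Location header or None) for the first response only."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"Accept-Language": language})
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx responses end up here because the redirect was not followed.
        return err.code, err.headers.get("Location")
```

If the theory holds, calling first_response("http://www.gizmodo.com/", "sv")
from a German IP would return a 3xx status together with the localized URL,
regardless of the language sent. What a given site does today may differ.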

Thanks anyway,

Martin
>
On Oct 16, 2008, at 6:50 AM, Martin Bachwerk wrote:
>Hmm, thanks for the ideas,

I've checked the requests in Firefox one more time after deleting all
the cookies and both google.com and gizmodo.com do indeed forward me
to the German site without caring about the browser settings.

wget shows me that the server does a 302 redirect straight away.. soo..

I'm not sure what you mean by this. In my experiment with wget, Google
respects the Accept-Language header. In other words, this returns a
Swedish page even though I'm executing it from a U.S. IP address:

wget "--header=Accept-Language: sv" http://www.google.com/
I see the same behavior from urllib2, although my code is slightly
different from yours. Here's my code. If I use "sv" in the header I
get Swedish, "pl" gives me Polish, etc. I get the same result when I
add your Mozilla user-agent string.

----------------------------------------
import urllib2

headers = { "Accept-Language" : "sv" }

req = urllib2.Request("http://www.google.com/", None, headers)
f = urllib2.urlopen(req)
content = f.read()
f.close()

print content
----------------------------------------
Do you get different results with this same code in Germany?
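
For anyone repeating this experiment on Python 3, where urllib2 no longer
exists, a rough equivalent of the snippet above might look like the sketch
below. It is an assumed port, not something posted in this thread;
build_request and fetch are hypothetical helper names. It also returns the
final URL, which shows whether the server redirected to a country domain.

```python
import urllib.request


def build_request(url, language):
    """Build a GET request that carries only an Accept-Language header."""
    return urllib.request.Request(url, headers={"Accept-Language": language})


def fetch(url, language, timeout=10):
    """Fetch url and return (final URL after redirects, decoded body)."""
    req = build_request(url, language)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.geturl(), resp.read().decode(charset, errors="replace")
```

Calling fetch("http://www.google.com/", "sv") and printing the first element
of the result would show which host actually served the page.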

Cheers
Philip
>>
>>>
On Oct 15, 2008, at 9:50 AM, Martin Bachwerk wrote:

Hello,

I'm trying to load a couple of pages using the urllib2 module. The
problem is that I live in Germany and some sites seem to look at
the IP of the client and forward it to a localized page.. Here's
an example of the code: I want to access the main English page of
google.com, but get the German one instead. (Those of you who live
in the US will probably get correct results.. try emulating with
'fr' in the accepted languages or something)

import urllib2

url = 'http://www.google.com/'  # assumed; the original post left url undefined
opener = urllib2.build_opener()
opener.addheaders = [
    ('Host', 'www.google.com'),
    ('Accept-Language', 'en-gb,en;q=0.5'),
    ('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; '
                   'rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1'),
]
webfile = opener.open(url)

Martin,
It looks to me like what you're sending is correct. Debugging
suggestions --

- Set up a Web server on 127.0.0.1 and see what that server receives
when your Python code connects to it. Maybe you're not sending quite
what you think.
- Try emulating your Python code with wget or a similar command line
tool that lets you set headers.
- Sniff the conversation you're having with Google using Wireshark.
Maybe you're getting redirected by the remote server.
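
The first suggestion can be tried without installing a separate Web
server. The sketch below is only an illustration under some assumptions:
it uses Python 3's http.server and urllib.request rather than the urllib2
of this thread, and EchoHeaders and headers_seen_by_server are
hypothetical names. It starts a throwaway server on 127.0.0.1, sends the
headers through opener.addheaders as in the code above, and returns what
the server actually received.

```python
import http.server
import json
import threading
import urllib.request


class EchoHeaders(http.server.BaseHTTPRequestHandler):
    """Reply to every GET with the received request headers as JSON."""

    def do_GET(self):
        # Lower-case the names so lookups do not depend on header casing.
        seen = {name.lower(): value for name, value in self.headers.items()}
        body = json.dumps(seen).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def headers_seen_by_server(headers):
    """Send headers to a local server and return the headers it received."""
    # Port 0 lets the operating system pick a free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHeaders)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # An empty ProxyHandler keeps the request from going through a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        opener.addheaders = list(headers.items())
        url = "http://127.0.0.1:%d/" % server.server_port
        with opener.open(url, timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()
```

If headers_seen_by_server({"Accept-Language": "en-gb,en;q=0.5"}) returns
that value under the "accept-language" key, the client side is sending what
it should and the localization is being decided by the remote server.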

Good luck
Philip

Oct 16 '08 #1
1 Reply


In message <ma**************************************@python.org>, Martin
Bachwerk wrote:
It does indeed give me a swedish version.. of www.google.de :) That's the
beauty about Google that they have all languages for all domains
available.

However if I try it with www.gizmodo.com (a tech blog in several
languages) I still get the German version.
Sounds like a bug in the gizmodo.com site.
Oct 17 '08 #2
