urllib.urlencode wrongly encoding ± character

sleytr

Hi, I'm trying to make a GUI for a web service. The site uses the ±
character in the values of some fields, but I can't encode this
character properly.

>>> import urllib
>>> data = {'key': '±'}
>>> urllib.urlencode(data)
'key=%C2%B1'

But it should be only %B1, not %C2%B1. Where is this %C2 coming from?

Apr 6 '06 #1

Serge Orlov

sl****@gmail.com wrote:

> Hi, I'm trying to make a gui for a web service. Site using ±
> character in value of some fields. But I can't encode this character
> properly.
>
> >>> data = {'key':'±'}
> >>> urllib.urlencode(data)
> 'key=%C2%B1'
>
> but it should be only %B1 not %C2%B1.

It should be %C2%B1, because de facto URLs are encoded as UTF-8. I've
just tried entering ± into four input fields: the Firefox 1.5 search
toolbar, www.google.com search in Firefox 1.5, the Google toolbar in
IE 6, and www.google.com search in IE 6. Everywhere, ± is encoded as
%C2%B1. In older browsers, YMMV.

> where is this %C2 coming from?

Your console must be UTF-8, so the ± you typed is the two-byte UTF-8
sequence:

>>> u'±'.encode('utf-8')
'\xc2\xb1'
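
The two byte sequences are easy to compare side by side. A minimal sketch, written for Python 3 rather than the Python 2 used in this thread (so str is Unicode and the quoting helpers live in urllib.parse), that encodes ± both ways and percent-escapes the resulting bytes:

```python
from urllib.parse import quote_from_bytes

# The same character, U+00B1, written as an escape so the source stays ASCII.
plus_minus = "\u00b1"

utf8_bytes = plus_minus.encode("utf-8")      # two bytes: C2 B1
latin1_bytes = plus_minus.encode("latin-1")  # one byte: B1

# Percent-escaping works on bytes, so the encoding decides the result.
print(utf8_bytes, quote_from_bytes(utf8_bytes))      # b'\xc2\xb1' %C2%B1
print(latin1_bytes, quote_from_bytes(latin1_bytes))  # b'\xb1' %B1
```

The extra %C2 is simply the first of the two UTF-8 bytes.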

Apr 6 '06 #2

sleytr

You are right, but when I capture traffic in Firefox with the
LiveHTTPHeaders extension, it shows me that ± is encoded as %B1.
In addition, I found lots of pages about URL encoding with conversion
tables or scripts, and all of them define ± as %B1.

Really confused! I can copy and use the urlencoded values from Firefox,
but I really want to do things the right way.

Apr 6 '06 #3

Serge Orlov

sl****@gmail.com wrote:

> you are right. but when I capture traffic in firefox via
> livehttpheaders extension, it shows me that ± is encoded to %B1.

It depends on whether the user entered the URL into the address bar or
clicked a submit button on a page. In the first case, for a long time
there was no standard for how to deal with non-ASCII characters; only
RFC 3986, in 2005, said: use UTF-8. In the second case, browsers submit
forms in the encoding of the page where the form is defined. That is
most likely what you see when you capture traffic.
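
The form-submission case can be sketched the same way. Assuming Python 3, whose urllib.parse.urlencode accepts an encoding argument, the hypothetical helper encode_form below approximates what a browser sends for a form on a page served in a given charset (it ignores details such as how browsers treat characters the page charset cannot represent):

```python
from urllib.parse import urlencode


def encode_form(fields, page_charset):
    """Hypothetical helper: percent-encode form fields using the charset
    of the page that holds the form (a simplification of browser behaviour)."""
    return urlencode(fields, encoding=page_charset)


fields = {"key": "\u00b1"}  # a single ± character

print(encode_form(fields, "utf-8"))       # key=%C2%B1
print(encode_form(fields, "iso-8859-1"))  # key=%B1
```

The same field gives %B1 or %C2%B1 depending only on the page charset, which matches the two results seen in this thread.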

> Addition to that, I found lots of page about urlencoding they have a
> conversation tables or scripts. All of them defines ± as %B1 .

I guess it is because web pages usually serve fairly closed language
communities. Some people just encode URLs as Latin-1, and it works for
99.9999% of their users. They don't care that they can't handle Chinese
characters, since they have no Chinese users.
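
The trade-off becomes visible as soon as a character falls outside Latin-1. A short Python 3 sketch (the choice of 中, U+4E2D, as the test character is only an illustration):

```python
from urllib.parse import urlencode

fields = {"key": "\u4e2d"}  # 中, a character outside Latin-1

# UTF-8 can represent every Unicode character.
utf8_query = urlencode(fields, encoding="utf-8")
print(utf8_query)  # key=%E4%B8%AD

# Latin-1 only covers U+0000..U+00FF, so strict encoding fails.
try:
    urlencode(fields, encoding="latin-1")
    latin1_failed = False
except UnicodeEncodeError as exc:
    latin1_failed = True
    print("Latin-1 cannot encode it:", exc)
```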

> realy confused? I can copy and use urlencoded values from firefox, but
> I'm realy want to do things with right way.

It is not clear what you are doing. Are you interacting with an
independent third-party web service, or do you control both the server
and the client?

Apr 6 '06 #4

sleytr

I have no control over server side.

I'm using Ubuntu Breezy at home and Ubuntu Dapper at work. I'm at work
now, and the same code works properly here (it returns %B1)! I'm not
sure and haven't checked yet, but the locale settings and/or the
installed Python version may differ between the two computers.

I think there should be a way to encode ± as %B1 on any platform/locale
combination. While searching for a real solution, I'm going to add a
search-and-destroy filter for %C2 on the urlencoded output as a
workaround, because my queries are constant and %C2 is the only problem
for now.

Apr 6 '06 #5

Fredrik Lundh

sl****@gmail.com wrote:

> I think there should be way to encode ± to %B1 on any platform/locale
> combination. While searching for a real solution, I'm going to add a
> search&destroy filter for %C2 on urlencoded dictionary as a workaround.
> Because my queries are constant and %C2 is the only problem for now.

I'm obviously missing some context here, but "encoding ± to %B1 on any
platform" is exactly what urlencode does:

>>> import urllib
>>> urllib.urlencode([("key", chr(0xb1))])
'key=%B1'

(however, if you pass in Unicode values with non-ASCII characters,
urlencode will give you an error).

are you sure the conversion to UTF-8 isn't happening *before* you pass
your data to urlencode? what does

print "1", repr(data)
print "2", repr(urllib.urlencode(data))

print for the kind of data you're encoding?

</F>

Apr 6 '06 #6

Richard Brodie

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message
news:ma***************************************@python.org...

> I'm obviously missing some context here, but "encoding ± to %B1 on any
> platform" is exactly what urlencode does:
>
> >>> import urllib
> >>> urllib.urlencode([("key", chr(0xb1))])
> 'key=%B1'

Yeah, but you're cheating by using the platform-independent chr(0xb1)
instead of a literal '±' in an unspecified encoding.

Apr 6 '06 #7

Evren Esat Ozkan

When I remove the "# -*- coding: utf-8 -*-" line from the start of the
script, it works properly. I also moved the variable declaration to
another file and imported it, and then it worked too.

Now it's working, but I don't understand what I was doing wrong. I'm new
to Python and Unicode encoding. I tried encode/decode with ascii, utf-8,
latin-1 and iso-8859-9 on this string. None of them worked, and they gave
the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5.

I think I must read more docs about Python and Unicode strings :)
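
A likely source of that error: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec, so a string holding the UTF-8 bytes C2 B1 fails before the target codec is ever reached. A Python 3 sketch of the same situation and of the working conversion (decode with the codec the bytes were saved in, then encode with the one the server expects), using the start of the value from this thread:

```python
# The value as a UTF-8 editor would store it on disk.
utf8_bytes = "00090\u00b1NO:".encode("utf-8")

# Decoding with the ASCII codec reproduces the error from the thread.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    bad_position = exc.start
    print(exc)  # 'ascii' codec can't decode byte 0xc2 in position 5: ...

# The working conversion: decode with the codec the bytes were saved in,
# then encode with the codec the server expects.
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")
print(latin1_bytes)  # b'00090\xb1NO:'
```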

Apr 6 '06 #8

Evren Esat Ozkan

I just discovered that I don't have to remove that line; changing utf-8
to iso-8859-9 makes it work again. But I want to use utf-8. Please
advise...

Apr 6 '06 #9

Fredrik Lundh

Evren Esat Ozkan wrote:

> when I remove "# -*- coding: utf-8 -*-" line from start of the script
> it worked properly. So I moved variable decleration to another file and
> imported than it worked too.

the coding directive controls how *unicode* literals in the *source code*
are parsed into unicode string objects. it has absolutely nothing to do with
how urlencode works.

it would help if you posted a short self-contained code snippet, so we
don't have to keep guessing.

</F>

Apr 6 '06 #10

Evren Esat Ozkan

Ok, I think this code snippet is enough to show what I said:

===================================

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Change utf-8 to latin-1,
# or move the variable declaration to another file, then import it

val='00090±NO:±H±H±H±H±'

from urllib import urlencode

data={'key':val}

print urlencode(data)

===================================

Apr 7 '06 #11

Fredrik Lundh

Evren Esat Ozkan wrote:

> Ok, I think this code snippet enough to show what i said;
>
> ===================================
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> #Change utf-8 to latin-1
> #Or move variable decleration to another file than import it
>
> val='00090±NO:±H±H±H±H±'
>
> from urllib import urlencode
>
> data={'key':val}
>
> print urlencode(data)
>
> ===================================

did you cut and paste this into your mail program? because the file
I got was ISO-8859-1 encoded:

Content-Type: text/plain; charset="iso-8859-1"

and uses a single byte to store each "±", and produces

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

when I run it, which is the expected result.

I think you're still not getting what's going on here, so let's try again:

- the urlencode function doesn't care about encodings; it translates
the bytes it gets one by one. if you pass in chr(0xB1), you get %B1
in the output.

- it's your editor that decides how the "±" you typed in the original
script is stored on disk; it may use one ISO-8859-1 byte, two
UTF-8 bytes, or something else.

- the coding directive doesn't affect non-Unicode string literals in
Python. in an 8-bit string, Python only sees a number of bytes.

- the urlencode function only cares about the bytes.
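
The byte-by-byte behaviour described in the list above can be checked directly. In Python 3 (an assumption; the thread uses Python 2), urllib.parse.urlencode percent-escapes bytes values one byte at a time without decoding them, so the two ways an editor might store ± give two different results:

```python
from urllib.parse import urlencode

# The same "±", as two different editors might have saved it.
latin1_value = {"key": b"\xb1"}     # one ISO-8859-1 byte
utf8_value = {"key": b"\xc2\xb1"}   # two UTF-8 bytes

# Each byte is escaped on its own; nothing is decoded or re-encoded.
print(urlencode(latin1_value))  # key=%B1
print(urlencode(utf8_value))    # key=%C2%B1
```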

since you know that you want to use ISO-8859-1 encoding for your
URL, and you seem to insist on typing the "±" characters in your code,
the most portable (and editor-independent) way to write your code is
to use Unicode literals when building the string, and explicitly convert
to ISO-8859-1 on the way out.

# build the URL as a Unicode string
val = u'00090±NO:±H±H±H±H±'

# encode as 8859-1 (latin-1)
val = val.encode("iso-8859-1")

from urllib import urlencode
data={'key':val}
print urlencode(data)

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

this will work the same way no matter what character set you use to
store the Python source file, as long as the coding directive matches
what your editor is actually doing.
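
For readers on Python 3, where every str literal is already Unicode, a sketch of the same recipe (build the value as text, encode to ISO-8859-1 on the way out) might look like this. The \u00b1 escapes keep the source pure ASCII; typing ± directly also works if the file is saved as UTF-8, Python 3's default source encoding:

```python
from urllib.parse import urlencode

# Build the value as text (str is Unicode in Python 3).
val = "00090\u00b1NO:\u00b1H\u00b1H\u00b1H\u00b1H\u00b1"

# Encode explicitly to ISO-8859-1 on the way out, then URL-encode the bytes.
data = {"key": val.encode("iso-8859-1")}
print(urlencode(data))  # key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1
```

Passing the str value directly with urlencode({"key": val}, encoding="iso-8859-1") gives the same output.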

if you want to make your code 100% robust, forget the idea of putting
non-ascii characters in string literals, and use \xB1 instead:

val = '00090\xb1NO:\xb1H\xb1H\xb1H\xb1H\xb1'

# no need to encode, since the byte string is already iso-8859-1

from urllib import urlencode
data={'key':val}
print urlencode(data)

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

hope this helps!

</F>

Apr 7 '06 #12

Evren Esat Ozkan

I copied and pasted my code into a new file and saved it with UTF-8
encoding. It produced 00090%C2%B1NO%3A%C2%B1H%C2%B1H%C2%B1H%C2%B1H%C2%B1.
Then I added "u" to the declaration and encoded it with iso-8859-1 as you
wrote, and finally it produced the proper result.

Your reply helped a lot and clarified some things about Unicode string
usage in Python.
Thank you very much!

Apr 7 '06 #13
