Bytes | Software Development & Data Engineering Community

urllib.urlencode wrongly encoding ± character

Hi, I'm trying to make a GUI for a web service. The site uses the ±
character in the values of some fields, but I can't encode this character
properly.

>>> import urllib
>>> data = {'key': '±'}
>>> urllib.urlencode(data)
'key=%C2%B1'

But it should be only %B1, not %C2%B1. Where is this %C2 coming from?

Apr 6 '06 #1
sl****@gmail.com wrote:
> Hi, I'm trying to make a GUI for a web service. The site uses the ±
> character in the values of some fields, but I can't encode this character
> properly.
>
> >>> data = {'key': '±'}
> >>> urllib.urlencode(data)
> 'key=%C2%B1'
>
> But it should be only %B1, not %C2%B1.
It should be %C2%B1, because de facto URLs are encoded as UTF-8. I've
just tried entering ± into four input fields: the Firefox 1.5 search
toolbar, www.google.com search in Firefox 1.5, the Google toolbar in IE 6,
and www.google.com search in IE 6. Everywhere it is encoded as %C2%B1. In
older browsers, YMMV.
> Where is this %C2 coming from?


Your console must be UTF-8:

>>> u'±'.encode('utf-8')
'\xc2\xb1'
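If you want to check this on a current Python, here is a minimal sketch assuming Python 3 (the thread uses Python 2, where the function was `urllib.urlencode`; in Python 3 these helpers live in `urllib.parse`). It shows where the extra %C2 comes from: ± (U+00B1) is two bytes in UTF-8 but one byte in Latin-1, and percent-encoding escapes whichever bytes it receives.

```python
from urllib.parse import quote

PLUS_MINUS = "\u00b1"  # the plus-minus sign, written as an escape so the file encoding doesn't matter

# The same character has different byte representations.
utf8_bytes = PLUS_MINUS.encode("utf-8")      # b'\xc2\xb1' (two bytes)
latin1_bytes = PLUS_MINUS.encode("latin-1")  # b'\xb1' (one byte)

# quote() percent-encodes the bytes of the chosen encoding (UTF-8 by default).
print(quote(PLUS_MINUS))                      # %C2%B1
print(quote(PLUS_MINUS, encoding="latin-1"))  # %B1
```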

Apr 6 '06 #2
You are right, but when I capture traffic in Firefox via the
LiveHTTPHeaders extension, it shows me that ± is encoded as %B1.
In addition, I found lots of pages about URL encoding that have
conversion tables or scripts. All of them define ± as %B1.

I'm really confused. I can copy and use the urlencoded values from Firefox, but
I really want to do things the right way.

Apr 6 '06 #3
sl****@gmail.com wrote:
> You are right, but when I capture traffic in Firefox via the
> LiveHTTPHeaders extension, it shows me that ± is encoded as %B1.
It depends on whether the user entered the URL into the address bar or
clicked a submit button on a page. In the first case, for a long time there
was no standard for dealing with non-ASCII characters; only RFC 3986, in
2005, said: use UTF-8. In the second case, browsers submit forms in the
encoding of the page where the form is defined. Most likely that is
what you see when you capture traffic.
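The receiving side shows why the page encoding matters. A minimal Python 3 sketch (an assumption; the thread predates Python 3) runs `urllib.parse.parse_qs` on the two query strings seen in this thread: each decodes to ± only when the server assumes the same charset the sender used.

```python
from urllib.parse import parse_qs

latin1_query = "key=%B1"   # plus-minus as sent from a Latin-1 page
utf8_query = "key=%C2%B1"  # plus-minus as sent from a UTF-8 page

# Decoding with the charset the sender actually used recovers U+00B1.
ok_latin1 = parse_qs(latin1_query, encoding="latin-1")["key"][0]
ok_utf8 = parse_qs(utf8_query, encoding="utf-8")["key"][0]

# Mismatches garble the value. With the default errors="replace",
# the lone 0xB1 byte is invalid UTF-8 and becomes U+FFFD.
bad_as_utf8 = parse_qs(latin1_query)["key"][0]
# Reading the two UTF-8 bytes as Latin-1 yields two characters instead of one.
bad_as_latin1 = parse_qs(utf8_query, encoding="latin-1")["key"][0]

for label, value in [("ok_latin1", ok_latin1), ("ok_utf8", ok_utf8),
                     ("bad_as_utf8", bad_as_utf8), ("bad_as_latin1", bad_as_latin1)]:
    print(label, ascii(value))  # ascii() keeps the console output 7-bit clean
```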

> In addition, I found lots of pages about URL encoding that have
> conversion tables or scripts. All of them define ± as %B1.
I guess it is because web pages usually serve fairly closed language
communities. Some people just encode URLs as Latin-1, and it works for
99.9999% of their users. They just don't care that they can't handle
Chinese characters, since they have no Chinese users.

> I'm really confused. I can copy and use the urlencoded values from Firefox, but
> I really want to do things the right way.


It is not clear what you are doing. Are you interacting with an independent
third-party web service, or do you control both the server and the client?

Apr 6 '06 #4
I have no control over the server side.

I'm using Ubuntu Breezy at home and Ubuntu Dapper at work. Now I'm at
work, and the same code works properly here (returning %B1)! I'm not sure
and haven't checked yet, but the locale settings and/or installed Python
version may differ between the two computers.

I think there should be a way to encode to %B1 on any platform/locale
combination. While searching for a real solution, I'm going to add a
search-and-destroy filter for %C2 on the urlencoded dictionary as a
workaround, because my queries are constant and %C2 is the only problem for now.

Apr 6 '06 #5
sl****@gmail.com wrote:
> I think there should be a way to encode to %B1 on any platform/locale
> combination. While searching for a real solution, I'm going to add a
> search-and-destroy filter for %C2 on the urlencoded dictionary as a
> workaround, because my queries are constant and %C2 is the only problem for now.


I'm obviously missing some context here, but "encoding to %B1 on any
platform" is exactly what urlencode does:

>>> import urllib
>>> urllib.urlencode([("key", chr(0xb1))])
'key=%B1'

(however, if you pass in unicode values with non-ascii characters, urlencode
will give you an error).
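A caveat for anyone trying this on Python 3 rather than the Python 2 used here: `chr(0xb1)` is then the text character ±, not a byte; the function is `urllib.parse.urlencode`; and non-ASCII text no longer raises an error but is encoded as UTF-8 by default. A minimal sketch of how to get the byte-for-byte %B1 there:

```python
from urllib.parse import urlencode

# A one-byte bytes value is percent-encoded as-is, like an 8-bit str in Python 2.
as_bytes = urlencode([("key", bytes([0xB1]))])

# A str value is encoded to bytes first, using UTF-8 by default...
as_text_default = urlencode([("key", chr(0xB1))])

# ...or using whatever encoding you pass explicitly.
as_text_latin1 = urlencode([("key", chr(0xB1))], encoding="latin-1")

print(as_bytes)         # key=%B1
print(as_text_default)  # key=%C2%B1
print(as_text_latin1)   # key=%B1
```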

are you sure the conversion to UTF-8 isn't happening *before* you pass
your data to urlencode? what does

print "1", repr(data)
print "2", repr(urllib.urlencode(data))

print for the kind of data you're encoding?
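The same diagnostic, sketched for Python 3 (an assumption; the thread's code is Python 2), with a hypothetical value that was already UTF-8-encoded before reaching urlencode. The repr exposes the two-byte sequence, and because bytes values are quoted as-is, asking urlencode for Latin-1 afterwards no longer helps.

```python
from urllib.parse import urlencode

# Hypothetical value that was UTF-8-encoded somewhere upstream.
data = {"key": "\u00b1".encode("utf-8")}

print("1", repr(data))             # 1 {'key': b'\xc2\xb1'}
print("2", repr(urlencode(data)))  # 2 'key=%C2%B1'

# encoding= only applies to str values; bytes are quoted byte-for-byte,
# so the UTF-8 pair survives even when Latin-1 is requested.
print("3", repr(urlencode(data, encoding="latin-1")))  # 3 'key=%C2%B1'
```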

</F>

Apr 6 '06 #6

"Fredrik Lundh" <fr*****@pythonware.com> wrote in message
news:ma***************************************@python.org...
> I'm obviously missing some context here, but "encoding to %B1 on any
> platform" is exactly what urlencode does:
>
> >>> import urllib
> >>> urllib.urlencode([("key", chr(0xb1))])
> 'key=%B1'


Yeah, but you're cheating by using the platform-independent chr(0xb1)
instead of a literal '±' in an unspecified encoding.

Apr 6 '06 #7
When I removed the "# -*- coding: utf-8 -*-" line from the start of the script,
it worked properly. I also moved the variable declaration to another file and
imported it, and that worked too.

It's working now, but I don't understand what I'm doing wrong. I'm new to
Python and Unicode encoding. I tried encode/decode (ascii, utf-8, latin-1,
iso-8859-9) on this string. None of them worked, and they gave the
following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5.

I think I must read more docs about Python and Unicode strings :)
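A likely reading of this error, assuming a UTF-8 source file: in Python 2, calling `.encode()` on an 8-bit string first decodes it with the default ASCII codec, which fails on the first byte of the two-byte UTF-8 sequence for ±. A rough Python 3 reproduction, where that hidden decode step has to be written out, using a value that starts with the five ASCII characters "00090" as in the snippet posted later in this thread:

```python
# The value as stored in a UTF-8 source file: each plus-minus sign is the two bytes C2 B1.
raw = "00090\u00b1NO".encode("utf-8")

try:
    # Python 2 performed this ASCII decode implicitly before re-encoding a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # reports byte 0xc2 in position 5
    failing_position = exc.start           # index of the first undecodable byte
    failing_byte = exc.object[exc.start]   # indexing bytes gives an int
```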

Apr 6 '06 #8
I just discovered that I don't have to remove that line; I only have to change
utf-8 to iso-8859-9, and it works again. But I want to use utf-8.
Please advise...

Apr 6 '06 #9
Evren Esat Ozkan wrote:
> When I removed the "# -*- coding: utf-8 -*-" line from the start of the script,
> it worked properly. I also moved the variable declaration to another file and
> imported it, and that worked too.


the coding directive controls how *unicode* literals in the *source code*
are parsed into unicode string objects. it has absolutely nothing to do with
how urlencode works.
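In Python 3 every str literal is Unicode, so the directive affects all of them, not just `u'...'` literals as in this thread. A minimal Python 3 sketch (illustrative only; `load_val` is a hypothetical helper) compiles the same source bytes under two declarations to show what "parsed into unicode string objects" means:

```python
# A str literal containing the plus-minus sign, as the two UTF-8 bytes an editor would save.
SOURCE_BODY = b"val = '\xc2\xb1'\n"


def load_val(coding: str) -> str:
    """Compile the same bytes under a given coding declaration and return val."""
    source = b"# -*- coding: " + coding.encode("ascii") + b" -*-\n" + SOURCE_BODY
    namespace: dict = {}
    # compile() honours the coding declaration when it is given bytes.
    exec(compile(source, "demo_source", "exec"), namespace)
    return namespace["val"]


matching = load_val("utf-8")      # directive matches the bytes: one character, U+00B1
mismatched = load_val("latin-1")  # directive disagrees: two characters, U+00C2 U+00B1
print(ascii(matching), len(matching))
print(ascii(mismatched), len(mismatched))
```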

it would help if you posted a short self-contained code snippet, so we
don't have to keep guessing.

</F>

Apr 6 '06 #10
OK, I think this code snippet is enough to show what I said:

===================================

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Change utf-8 to latin-1,
# or move the variable declaration to another file and import it

val = '00090±NO:±H±H±H±H±'

from urllib import urlencode

data = {'key': val}

print urlencode(data)

===================================

Apr 7 '06 #11
Evren Esat Ozkan wrote:
> OK, I think this code snippet is enough to show what I said:
>
> ===================================
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> # Change utf-8 to latin-1,
> # or move the variable declaration to another file and import it
>
> val = '00090±NO:±H±H±H±H±'
>
> from urllib import urlencode
>
> data = {'key': val}
>
> print urlencode(data)
>
> ===================================


did you cut and paste this into your mail program? because the file
I got was ISO-8859-1 encoded:

Content-Type: text/plain; charset="iso-8859-1"

and uses a single byte to store each "±", and produces

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

when I run it, which is the expected result.

I think you're still not getting what's going on here, so let's try again:

- the urlencode function doesn't care about encodings; it translates
the bytes it gets one by one. if you pass in chr(0xB1), you get %B1
in the output.

- it's your editor that decides how the "±" you typed in the original
script is stored on disk; it may use one ISO-8859-1 byte, two
UTF-8 bytes, or something else.

- the coding directive doesn't affect non-Unicode string literals in
Python. in an 8-bit string, Python only sees a number of bytes.

- the urlencode function only cares about the bytes.
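The byte-by-byte rule in the first and last points can be sketched as a tiny encoder. `percent_encode` here is a hypothetical helper written for illustration in Python 3, not urllib's own code; it mimics the rules `quote_plus` applies to form values in recent Python 3 versions (letters, digits and `_.-~` pass through, a space becomes `+`, every other byte becomes `%XX`), and the last line compares it with `quote_plus`.

```python
from urllib.parse import quote_plus

# Bytes that are never escaped (the unreserved set used by recent Python 3 versions).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-~"
)


def percent_encode(data: bytes) -> str:
    """Hypothetical helper: escape each byte on its own, with no charset involved."""
    out = []
    for byte in data:  # iterating over bytes yields ints 0..255
        if byte in UNRESERVED:
            out.append(chr(byte))
        elif byte == 0x20:  # space becomes '+', as in form encoding
            out.append("+")
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


value = b"00090\xb1NO:\xb1H"
print(percent_encode(value))  # 00090%B1NO%3A%B1H
print(percent_encode(value) == quote_plus(value, safe=""))  # True
```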

since you know that you want to use ISO-8859-1 encoding for your
URL, and you seem to insist on typing the "±" characters in your code,
the most portable (and editor-independent) way to write your code is
to use Unicode literals when building the string, and explicitly convert
to ISO-8859-1 on the way out.

# build the URL as a Unicode string
val = u'00090±NO:±H±H±H±H±'

# encode as 8859-1 (latin-1)
val = val.encode("iso-8859-1")

from urllib import urlencode
data={'key':val}
print urlencode(data)

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

this will work the same way no matter what character set you use to
store the Python source file, as long as the coding directive matches
what your editor is actually doing.
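The same approach sketched for Python 3 (an assumption; there the `u` prefix is unnecessary, `urlencode` lives in `urllib.parse`, and source files default to UTF-8 when they have no coding line):

```python
from urllib.parse import urlencode

# Build the value as text; every str literal is Unicode in Python 3.
val = "00090±NO:±H±H±H±H±"

# Encode as ISO-8859-1 (Latin-1) on the way out, as in the Python 2 version.
val = val.encode("iso-8859-1")

data = {"key": val}
print(urlencode(data))  # key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1
```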

if you want to make your code 100% robust, forget the idea of putting
non-ascii characters in string literals, and use \xB1 instead:

val = '00090\xb1NO:\xb1H\xb1H\xb1H\xb1H\xb1'

# no need to encode, since the byte string is already iso-8859-1

from urllib import urlencode
data={'key':val}
print urlencode(data)

key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1
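One Python 3 pitfall with this last version (not an issue in the Python 2 used here): inside a str literal, `\xb1` is the character U+00B1, not a byte, so the "already ISO-8859-1" shortcut needs a bytes literal. A minimal sketch:

```python
from urllib.parse import urlencode

# In Python 3, b'...' is needed for the escapes to mean raw ISO-8859-1 bytes.
val = b"00090\xb1NO:\xb1H\xb1H\xb1H\xb1H\xb1"

data = {"key": val}
print(urlencode(data))  # key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

# The same escapes in a str literal are text, which urlencode encodes as UTF-8 by default.
pitfall = urlencode({"key": "00090\xb1NO"})
print(pitfall)  # key=00090%C2%B1NO
```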

hope this helps!

</F>

Apr 7 '06 #12
I copied and pasted my code into a new file and saved it with UTF-8 encoding.
It produced 00090%C2%B1NO%3A%C2%B1H%C2%B1H%C2%B1H%C2%B1H%C2%B1.
Then I added "u" to the declaration and encoded it with iso-8859-1 as you
wrote, and finally it produced the proper result.

Your reply helped a lot and clarified some things about Unicode string
usage in Python.
Thank you very much!

Apr 7 '06 #13
