URL Character Decoding

If you have a link such as, e.g.:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data. Not knowing if there was a library function that
would convert these back to their actual characters, I've written the
following:

import re

def sub_func(m):
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    p = re.compile(r'%[0-9][0-9]')
    return re.sub(p, sub_func, title)

(I know I could probably use a lambda function instead of sub_func, but
I come to Python via C++ and am still not entirely used to them. This is
clearer to me, at least.)

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald
Jan 30 '06 #1
Actually, I just noticed this doesn't really work at all. The URL
character codes are in hex, so not only does the regex not match what it
should, but sub_func fails miserably. See why I wanted a library function?

-Kirk McDonald
Jan 30 '06 #2
Not to keep talking to myself, but looks like sub_func works fine, and
the regex just needs to be r'%[0-9a-fA-F][0-9a-fA-F]'. But even so.

-Kirk McDonald
Jan 30 '06 #3
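A minimal sketch of the corrected regex approach described in the post above, assuming Python 3 and escapes in the ASCII range only. The names HEX_ESCAPE and decode_match are illustrative and do not come from the thread. It keeps the substitution function and swaps in a hex-aware pattern:

```python
import re

# '%' followed by exactly two hex digits, e.g. '%20' or '%3F'.
HEX_ESCAPE = re.compile(r'%[0-9a-fA-F]{2}')

def decode_match(m):
    # Drop the leading '%' and read the remaining two digits as base 16.
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    # Replace every percent-escape with the character it encodes.
    return HEX_ESCAPE.sub(decode_match, title)

print(parse_title('Main%20Menu'))                     # Main Menu
print(parse_title('index.py%3Ftitle%3DMain%20Menu'))  # index.py?title=Main Menu
```

The original pattern r'%[0-9][0-9]' matches '%20' but skips '%3F' and '%3D', because 'F' and 'D' are not decimal digits. Note that this sketch decodes one byte at a time, so a multi-byte UTF-8 sequence such as '%C3%A9' would come out as two separate characters; that is one more reason to prefer a library function.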
"Kirk McDonald" <mo******@suad.org> wrote in message
news:43******@nntp0.pdx.net...
If you have a link such as, e.g.:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data.

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald


>>> import urllib
>>> urllib.quote("index.py?title=Main Menu")
'index.py%3Ftitle%3DMain%20Menu'
>>> urllib.unquote("index.py%3Ftitle%3DMain%20Menu")
'index.py?title=Main Menu'
Jan 30 '06 #4
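For readers on Python 3: the top-level urllib.quote and urllib.unquote used in this reply are Python 2 names. In Python 3 the same functions live in urllib.parse. A short sketch of the equivalent calls, using the same example string as the reply:

```python
from urllib.parse import quote, unquote

# Percent-encode the string; '/' is left alone by default, while
# '?', '=' and the space are escaped.
encoded = quote("index.py?title=Main Menu")
print(encoded)           # index.py%3Ftitle%3DMain%20Menu

# Decode it back to the original text.
print(unquote(encoded))  # index.py?title=Main Menu
```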
Paul McGuire wrote:
>>> import urllib
>>> urllib.quote("index.py?title=Main Menu")
'index.py%3Ftitle%3DMain%20Menu'
>>> urllib.unquote("index.py%3Ftitle%3DMain%20Menu")
'index.py?title=Main Menu'


Perfect! Thanks.

-Kirk McDonald
Jan 30 '06 #5
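Since the original question was about GET data, here is a hedged sketch of decoding a whole query string rather than a single value. It assumes the Python 3 standard library only and says nothing about mod_python's own request handling. The query string and the "page" parameter are hypothetical:

```python
from urllib.parse import parse_qs, unquote_plus

# Hypothetical query string, as it might appear after the '?' in a URL.
query = "title=Main%20Menu&page=2"

# parse_qs splits on '&' and '=', and percent-decodes names and values.
# Each value is a list, because a parameter may be repeated.
params = parse_qs(query)
print(params["title"][0])  # Main Menu
print(params["page"][0])   # 2

# Form submissions often encode a space as '+' instead of '%20';
# unquote_plus handles both forms.
print(unquote_plus("Main+Menu"))  # Main Menu
```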
