Character encoding

mp
I have html document titles with characters like &gt;, &nbsp;, and
&#135. How do I decode a string with these values in Python?

Thanks

Nov 7 '06 #1
I would suggest using string.replace. Simply replace '&nbsp' with ' '
for each time it occurs. It doesn't take too much code.

On Nov 7, 1:34 pm, "mp" <mailpitc...@email.com> wrote:
I have html document titles with characters like &gt;, &nbsp;, and
&#135. How do I decode a string with these values in Python?

Thanks
Nov 7 '06 #2
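
The replace-based suggestion above might look like the following minimal sketch. The table `REPLACEMENTS` and the function name `replace_entities` are illustrative assumptions, not code from the thread, and the table covers only a handful of named entities; numeric references such as &#135 are not handled at all.

```python
# Hand-written table: an assumption for illustration, far from complete.
# "&amp;" comes last so that "&amp;gt;" decodes to "&gt;" and not to ">".
REPLACEMENTS = {
    "&nbsp;": " ",
    "&gt;": ">",
    "&lt;": "<",
    "&amp;": "&",
}


def replace_entities(text):
    """Replace each listed entity with its character, in table order."""
    for entity, char in REPLACEMENTS.items():
        text = text.replace(entity, char)
    return text


print(replace_entities("a&nbsp;&gt;&nbsp;b"))
```

Every new entity needs a new table row, which is why a general-purpose decoder is usually the better choice.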
mp
I'd prefer a more generalized solution which takes care of all possible
ampersand characters. I assume that there is code already written which
does this.

Thanks

i80and wrote:
I would suggest using string.replace. Simply replace '&nbsp' with ' '
for each time it occurs. It doesn't take too much code.

On Nov 7, 1:34 pm, "mp" <mailpitc...@email.com> wrote:
I have html document titles with characters like &gt;, &nbsp;, and
&#135. How do I decode a string with these values in Python?

Thanks
Nov 7 '06 #3
At Tuesday 7/11/2006 17:10, mp wrote:
>I'd prefer a more generalized solution which takes care of all possible
>ampersand characters. I assume that there is code already written which
>does this.

Try the htmlentitydefs module.
--
Gabriel Genellina
Softlab SRL

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ˇgratis!
ˇAbrí tu cuenta ya! - http://correo.yahoo.com.ar
Nov 7 '06 #4
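
For readers on Python 3, the htmlentitydefs suggestion above can be sketched as follows. This is an assumption about how one would do it today, not code from the thread: htmlentitydefs was renamed html.entities in Python 3, and html.unescape (Python 3.4 and later) applies that table to both named and numeric references. The wrapper name `decode_entities` and the sample title are hypothetical.

```python
import html
from html.entities import name2codepoint


def decode_entities(text):
    """Replace named and numeric character references with their characters."""
    return html.unescape(text)


# Hypothetical sample title containing the three kinds of reference
# mentioned in the question.
title = "x &gt; y,&nbsp;see note &#135;"
print(repr(decode_entities(title)))

# name2codepoint is the lookup table that htmlentitydefs also exposed.
print(name2codepoint["gt"], name2codepoint["nbsp"])
```

Note that &nbsp; decodes to the non-breaking space U+00A0, not to an ordinary space, so a follow-up replace may be wanted if plain spaces are needed.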

Dennis Lee Bieber wrote:
On 7 Nov 2006 11:34:32 -0800, "mp" <ma*********@email.com> declaimed the
following in comp.lang.python:
I have html document titles with characters like &gt;, &nbsp;, and
&#135. How do I decode a string with these values in Python?

Wouldn't HTMLParser be suited for such activity?
--
Wulfraed Dennis Lee Bieber KD6MOG
wl*****@ix.netcom.com wu******@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: we******@bestiaria.com)
HTTP://www.bestiaria.com/
Use htmlentitydefs and SGMLParser to re-generate it.

Nov 8 '06 #5
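
The parser-based suggestions above can be sketched in Python 3 as follows. This is a hedged illustration and not the posters' code: SGMLParser no longer exists in Python 3, so the sketch uses html.parser.HTMLParser, whose `convert_charrefs=True` option decodes character references before passing text to `handle_data`. The names `TextCollector` and `decode_with_parser` are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text content of a document, with references decoded."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &gt;, &#135; etc.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called with already-decoded text between tags.
        self.parts.append(data)


def decode_with_parser(markup):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(repr(decode_with_parser("<title>Tom &amp; Jerry &gt; 1&nbsp;2</title>")))
```

Unlike a plain unescape call, this also drops the surrounding tags, which is convenient when the titles are still embedded in markup.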
mp wrote:
I have html document titles with characters like &gt;, &nbsp;, and
&#135. How do I decode a string with these values in Python?

Thanks

This is definitely the most frequently asked question. It comes up about once a week.

The stream-editing way is like this:
>>> import SE
>>> HTM_Decoder = SE.SE ('htm2iso.se') # Include path
>>> test_string = '''I have html document titles with characters like &gt;, &nbsp;, and
... &#135. How do I decode a string with these values in Python?'''
>>> print HTM_Decoder (test_string)
I have html document titles with characters like >, , and
‡. How do I decode a string with these values in Python?

An SE object does files too.
>>> HTM_Decoder ('with_codes.txt', 'translated_codes.txt') # Include path
You could download SE from http://cheeseshop.python.org/pypi/SE/2.3. The translation definitions file "htm2iso.se" is included. If you open it in your editor, you can see how to write your own definition files for other translation tasks you may have some other time.

Regards

Frederic

Nov 8 '06 #6