473,408 Members | 1,973 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,408 software developers and data experts.

html escape sequences

Hi,

I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?

Thanks,

Will McGugan
Jul 18 '05 #1
2 2195
Will McGugan wrote:
I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?


How about this?

import re
from htmlentitydefs import name2codepoint

_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')

def _repl_func(match):
if match.group(1): # Numeric character reference
return unichr(int(match.group(2)))
else:
return unichr(name2codepoint[match.group(3)])

def handle_html_entities(string):
return _entity_re.sub(_repl_func, string)
Jul 18 '05 #2
Leif K-Brooks wrote:
Will McGugan wrote:
I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use
to replace these sequences?

How about this?

import re
from htmlentitydefs import name2codepoint

_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')

def _repl_func(match):
if match.group(1): # Numeric character reference
return unichr(int(match.group(2)))
else:
return unichr(name2codepoint[match.group(3)])

def handle_html_entities(string):
return _entity_re.sub(_repl_func, string)


muchas gracias!

Will McGugan
Jul 18 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: harrelson | last post by:
I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like: 비 행 기 로 보 낼 거
6
by: Walter L. Preuninger II | last post by:
I need to convert escape sequences entered into my program to the actual code. For example, \r becomes 0x0d I have looked over the FAQ, and searched the web, with no results. Is there a...
7
by: teachtiro | last post by:
Hi, 'C' says \ is the escape character to be used when characters are to be interpreted in an uncommon sense, e.g. \t usage in printf(), but for printing % through printf(), i have read that %%...
18
by: Steve Litvack | last post by:
Hello, I have built an XMLDocument object instance and I get the following string when I examine the InnerXml property: <?xml version=\"1.0\"?><ROOT><UserData UserID=\"2282\"><Tag1...
3
by: Ken | last post by:
HI: I'm reading a string that will be displayed in a MessageBox from a resource file. The string in the resource file contains escape sequences so they will be broken up into multiple lines. ...
3
by: Don | last post by:
I am building a string from a combination of hardcoded string literals and user input (via textbox). I know about using @"c:\temp\filename.txt" to ignore escape sequences. Now let's say I have a...
5
by: nummertolv | last post by:
Hi, My application is receiving strings, representing windows paths, from an external source. When using these paths, by for instance printing them using str() (print path), the backslashes are...
4
by: JJ | last post by:
Is there a way of checking that a line with escape sequences in it, has no strings in it (apart from the escape sequences)? i.e. a line with \n\t\t\t\t\t\t\t\r\n would have no string in it a...
5
by: John Ztwin | last post by:
Hello, I have a file that contains ordinary text and some special charaters in Unicode escape sequences (\uxxxx). When I read the file using e.g. StreamReader Unicode escape sequences are not...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.