unescape HTML entities

Hi,

How can I unescape HTML entities like "&nbsp;"?

I know about xml.sax.saxutils.unescape(), but it only deals with "&amp;",
"&lt;", and "&gt;".

Also, I know about htmlentitydefs.entitydefs, but not only is this
dictionary the opposite of what I need, it also does not have "&nbsp;".

It has to be in Python 2.4.

Thanks a lot,
Ray

Oct 28 '06 #1
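
A side note on the question above: xml.sax.saxutils.unescape() takes an optional second argument, a dictionary of extra replacements, so it can be stretched to cover a few more entities. The sketch below assumes you only need a small, known set; the name `extra` and its single entry are illustrative, not something from the original post.

```python
from xml.sax.saxutils import unescape

# Hypothetical extra mapping: keys are the escaped forms, values the
# replacement text. Only "&nbsp;" is handled here, as an illustration.
extra = {'&nbsp;': u'\xa0'}

# "&lt;", "&gt;" and "&amp;" are handled by unescape() itself;
# "&nbsp;" is handled only because it appears in the extra mapping.
result = unescape('a&nbsp;b &lt; c &amp; d', extra)
```

This is probably only practical for a handful of entities; for the full set of named entities, a lookup based on htmlentitydefs.name2codepoint is likely a better fit.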
Jim

Rares Vernica wrote:
> How can I unescape HTML entities like "&nbsp;"?

Can I ask what you mean by "unescaping"? Do you mean converting into
numeric references? Into Unicode?

Jim

Oct 28 '06 #2
Rares Vernica wrote:
> How can I unescape HTML entities like "&nbsp;"?
>
> I know about xml.sax.saxutils.unescape() but it only deals with
> "&amp;", "&lt;", and "&gt;".
>
> Also, I know about htmlentitydefs.entitydefs, but not only this
> dictionary is the opposite of what I need, it does not have
> "&nbsp;".

How about something like:

#v+
#!/usr/bin/env python
'''dehtml.py'''

import re
import htmlentitydefs

# Match any named entity known to htmlentitydefs, e.g. "&nbsp;".
myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')

def dehtml(s):
    # Replace each named entity with the corresponding Unicode character.
    return re.sub(
        myrx,
        lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
        s
    )
# end def dehtml

if __name__ == '__main__':
    import sys
    print dehtml(sys.stdin.read()).encode('utf-8')
# end if

#v-

E.g.:

#v+

$ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
frække frølår
$

#v-

--
Klaus Alexander Seistrup
Copenhagen, Denmark, EU
http://klaus.seistrup.dk/
Oct 28 '06 #3
Hi,

How does your code deal with entities like "&#39;"?

Thanks,
Ray

Nov 1 '06 #4
Rares Vernica wrote:
> How does your code deal with entities like "&#39;"?

It doesn't; it deals with named entities only. But take a look
at Fredrik's example.

Cheers,

--
Klaus Alexander Seistrup
København, Danmark, EU
http://klaus.seistrup.dk/
Nov 1 '06 #5
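
For anyone landing here later: the example referred to above is not part of this thread, so here is a minimal sketch of one way to cover numeric references (decimal and hexadecimal) alongside named entities. It is not the code from that example. The names `unescape_entities`, `_entity_rx` and `_replace` are made up for illustration, and the import fallbacks are only there so the same file runs on both Python 2 and Python 3.

```python
import re

try:
    from htmlentitydefs import name2codepoint   # Python 2
except ImportError:
    from html.entities import name2codepoint    # Python 3

try:
    unichr                                       # Python 2 builtin
except NameError:
    unichr = chr                                 # Python 3 equivalent

# Matches "&#39;", "&#x27;" and named forms such as "&nbsp;".
_entity_rx = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')

def _replace(m):
    ref = m.group(1)
    try:
        if ref[:2] in ('#x', '#X'):
            return unichr(int(ref[2:], 16))      # hexadecimal reference
        if ref[:1] == '#':
            return unichr(int(ref[1:]))          # decimal reference
        return unichr(name2codepoint[ref])       # named entity
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text as it was.
        return m.group(0)

def unescape_entities(text):
    '''Return text with HTML entities replaced by the characters they name.'''
    return _entity_rx.sub(_replace, text)
```

Unknown names such as "&bogus;" are deliberately left untouched rather than raising an error; whether that is the right behaviour depends on your input.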
