unescape HTML entities

Hi,

How can I unescape HTML entities like "&nbsp;"?

I know about xml.sax.saxutils.unescape() but it only deals with "&amp;",
"&lt;", and "&gt;".

Also, I know about htmlentitydefs.entitydefs, but not only is this
dictionary the opposite of what I need, it also does not have "&nbsp;".

It has to work in Python 2.4.
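The two tools mentioned above can be combined. xml.sax.saxutils.unescape() accepts an optional second argument, a dictionary of extra strings to replace, so the name-to-code-point table htmlentitydefs.name2codepoint (html.entities in Python 3) can be turned into such a dictionary. A minimal sketch: it sticks to syntax that 2.4 should accept, but 2.4 compatibility is an assumption worth checking on a real 2.4 interpreter; html_entities and html_unescape are made-up names, and numeric references such as &#160; are not handled.

```python
from xml.sax.saxutils import unescape

try:
    from htmlentitydefs import name2codepoint  # Python 2
except ImportError:
    from html.entities import name2codepoint  # Python 3

try:
    unichr
except NameError:
    unichr = chr  # Python 3 has no separate unichr()

# Map '&name;' -> character for every HTML 4 entity. '&amp;', '&lt;' and
# '&gt;' are left out because unescape() already handles them itself,
# and it decodes '&amp;' last so that '&amp;nbsp;' is not unescaped twice.
html_entities = dict(
    ('&%s;' % name, unichr(code))
    for name, code in name2codepoint.items()
    if name not in ('amp', 'lt', 'gt')
)


def html_unescape(s):
    # Hypothetical helper: named entities only, numeric ones pass through.
    return unescape(s, html_entities)
```

For example, html_unescape('caf&eacute; &amp;nbsp;') turns the string into 'café &nbsp;': the &amp; becomes a plain & only after the named entities have been replaced.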

Thanks a lot,
Ray

Oct 28 '06 #1
Jim

Rares Vernica wrote:
> How can I unescape HTML entities like "&nbsp;"?

Can I ask what you mean by "unescaping"? Do you mean converting into
numeric references? Into Unicode?
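The two readings Jim mentions differ only in what is done with the code point looked up for an entity name. A minimal sketch, assuming the name has already been pulled out of the text; numeric_reference and unicode_char are hypothetical helper names:

```python
try:
    from htmlentitydefs import name2codepoint  # Python 2
except ImportError:
    from html.entities import name2codepoint  # Python 3

try:
    unichr
except NameError:
    unichr = chr  # Python 3 has no separate unichr()


def numeric_reference(name):
    # 'nbsp' -> '&#160;': still an escape, but one that XML understands
    # without any DTD, unlike the named form '&nbsp;'
    return '&#%d;' % name2codepoint[name]


def unicode_char(name):
    # 'nbsp' -> the no-break space character itself (U+00A0)
    return unichr(name2codepoint[name])
```

Both helpers raise KeyError for a name the HTML 4 table does not know.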

Jim

Oct 28 '06 #2
Rares Vernica wrote:
> How can I unescape HTML entities like "&nbsp;"?
>
> I know about xml.sax.saxutils.unescape() but it only deals with
> "&amp;", "&lt;", and "&gt;".
>
> Also, I know about htmlentitydefs.entitydefs, but not only is this
> dictionary the opposite of what I need, it also does not have
> "&nbsp;".

How about something like:

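#v+
#!/usr/bin/env python
'''dehtml.py'''

import re
import htmlentitydefs

# One big alternation of every entity name htmlentitydefs knows about,
# e.g. '&(nbsp|aelig|oslash|...);'
myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')

def dehtml(s):
    # Replace each matched named entity with its Unicode character.
    return re.sub(
        myrx,
        lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
        s
    )
# end def dehtml

if __name__ == '__main__':
    import sys
    # Decode the (assumed UTF-8) input first: mixing a non-ASCII byte
    # string with the unicode replacements would raise UnicodeDecodeError.
    print dehtml(sys.stdin.read().decode('utf-8')).encode('utf-8')
# end if

#v-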

E.g.:

#v+

$ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
frække frølår
$

#v-
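The pattern above joins every entity name into one large alternation. A variant matches any well-formed name and does the lookup in the callback instead, so the regular expression does not grow with the table; unknown names are left as they are, just as in the version above. A minimal sketch; dehtml_generic and NAME_RX are made-up names, and the try/except imports only exist so the same code also runs on Python 3:

```python
import re

try:
    from htmlentitydefs import name2codepoint  # Python 2
except ImportError:
    from html.entities import name2codepoint  # Python 3

try:
    unichr
except NameError:
    unichr = chr  # Python 3 has no separate unichr()

# Any '&name;' with a plausible entity name; the dictionary decides the rest.
NAME_RX = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def dehtml_generic(s):
    def replace(m):
        code = name2codepoint.get(m.group(1))
        if code is None:
            return m.group(0)  # not an HTML 4 entity: keep the text as it was
        return unichr(code)
    return NAME_RX.sub(replace, s)
```

On the input from the example, dehtml_generic('fr&aelig;kke fr&oslash;l&aring;r') gives 'frække frølår'.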

--
Klaus Alexander Seistrup
Copenhagen, Denmark, EU
http://klaus.seistrup.dk/
Oct 28 '06 #3
Hi,

How does your code deal with entities like &#39;?

Thanks,
Ray

Nov 1 '06 #4
Rares Vernica wrote:
> How does your code deal with entities like &#39;?

It doesn't; it deals with named entities only. But take a look
at Fredrik's example.
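Extending the regular-expression approach to numeric references takes one more alternative in the pattern. A minimal sketch (not Fredrik's example, which does not appear in this thread): one pattern recognises decimal (&#39;), hexadecimal (&#x27;) and named (&nbsp;) references, and the callback leaves anything it cannot decode unchanged. unescape_all and ENTITY are hypothetical names.

```python
import re

try:
    from htmlentitydefs import name2codepoint  # Python 2
except ImportError:
    from html.entities import name2codepoint  # Python 3

try:
    unichr
except NameError:
    unichr = chr  # Python 3 has no separate unichr()

# Decimal '&#39;', hexadecimal '&#x27;' or named '&nbsp;'.
ENTITY = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def _replace(match):
    ref = match.group(1)
    try:
        if ref[:2] in ('#x', '#X'):
            return unichr(int(ref[2:], 16))
        if ref[:1] == '#':
            return unichr(int(ref[1:]))
        return unichr(name2codepoint[ref])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: keep the original text.
        return match.group(0)


def unescape_all(s):
    return ENTITY.sub(_replace, s)
```

On Python 3.4 and later, the standard library's html.unescape() already handles named, decimal and hexadecimal references (including the larger HTML5 entity list), so a hand-written version like this mainly matters on old interpreters such as 2.4.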

Cheers,

--
Klaus Alexander Seistrup
København, Danmark, EU
http://klaus.seistrup.dk/
Nov 1 '06 #5


Similar topics

micha: my php script gets delivered text that contains special chars (like german umlauts), and these chars may, may partially or may not be coverted into html entities already. i don't know beforhand. ...
Robert Oschler: Is there a module/function to remove all the HTML entities from an HTML document (e.g. - &nbsp, &amp, &apos, etc.)? If not I'll just write one myself but I figured I'd save myself some time. ...
Geoff Wilkins: I must confess I only come here when I have a problem - so my apologies if this has been raised before: Using my IE v.6 browser, document.write doesn't convert HTML entities (e.g. ', &) to...
Beat Richli: Hello i have following problem with ASP (using Interdev, Win2003 Server): if a special character is entered in a textbox, ASP or the Client Browser (IE 6) seems to convert this character in HTML...
David W. Fenton: Well, today I needed to process some data for upload to a web page and it needed higher ASCII characters encoded as HTML entities. So, I wrote a function to do the job, which works with a table...
Joergen Bech: Is there a function in the .Net 1.1 framework that will take, say, a string containing Scandinavian characters and output the corret HTML entities, such as æ ø å etc.
Sebastian Mark: I have a simple question, is there an easy way in .NET to unescape HTML characters? for example I have a string "Siebel Analytics Sr Consultant-"Partner w/ Deloitte & have the oppty to...
Steven D'Aprano: I have a string containing Latin-1 characters: s = u"© and many more..." I want to convert it to HTML entities: result => "&copy; and many more..." Decimal/hex escapes would be...
clintonG: Can anybody make sense of this crazy and inconsistent results? // IE7 Feed Reading View disabled displays this raw XML <?xml version="1.0" encoding="utf-8" ?> <!-- AT&T HTML entities & XML...