Hi,
How can I unescape HTML entities like "&nbsp;"?
I know about xml.sax.saxutils.unescape() but it only deals with "&amp;",
"&lt;", and "&gt;".
Also, I know about htmlentitydefs.entitydefs, but not only is this
dictionary the opposite of what I need, it also does not have "&nbsp;".
It has to be in Python 2.4.
Thanks a lot,
Ray
Rares Vernica wrote:
How can I unescape HTML entities like "&nbsp;"?
Can I ask what you mean by "unescaping"? Do you mean converting into
numeric references? Into Unicode?
Jim
Rares Vernica wrote:
How can I unescape HTML entities like "&nbsp;"?
I know about xml.sax.saxutils.unescape() but it only deals with
"&amp;", "&lt;", and "&gt;".
Also, I know about htmlentitydefs.entitydefs, but not only this
dictionary is the opposite of what I need, it does not have
"&nbsp;".
How about something like:
#v+
#!/usr/bin/env python
'''dehtml.py'''
import re
import htmlentitydefs

# Match any named entity known to htmlentitydefs, e.g. &nbsp; or &aelig;
myrx = re.compile('&(' + '|'.join(htmlentitydefs.name2codepoint.keys()) + ');')

def dehtml(s):
    # Replace each matched entity with the corresponding Unicode character
    return re.sub(
        myrx,
        lambda m: unichr(htmlentitydefs.name2codepoint[m.group(1)]),
        s
    )
# end def dehtml

if __name__ == '__main__':
    import sys
    print dehtml(sys.stdin.read()).encode('utf-8')
# end if
#v-
E.g.:
#v+
$ echo 'fr&aelig;kke fr&oslash;l&aring;r' | ./dehtml.py
frække frølår
$
#v-
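For comparison, the xml.sax.saxutils.unescape() mentioned in the question takes an optional second argument, a dictionary of extra replacements. The following is only a sketch of that route: the names `extra` and `unescape_named` are made up for illustration, and the try/except fallbacks are there so the same file also runs on Python 3. It still covers named entities only.

```python
from xml.sax.saxutils import unescape

try:                                   # Python 2 (the thread targets 2.4)
    from htmlentitydefs import name2codepoint
except ImportError:                    # Python 3 fallback
    from html.entities import name2codepoint

try:
    unichr                             # Python 2 builtin
except NameError:
    unichr = chr                       # Python 3: chr() returns text already

# Hypothetical helper table: '&name;' -> character, for every named entity
# except the three that unescape() already handles on its own.
extra = dict([('&%s;' % name, unichr(cp))
              for name, cp in name2codepoint.items()
              if name not in ('amp', 'lt', 'gt')])

def unescape_named(s):
    # unescape() applies each dictionary entry with a plain string replace,
    # then converts '&amp;' last so already-escaped text is not unescaped twice.
    return unescape(s, extra)
```

Because every dictionary entry becomes a separate string replace, this is likely slower than a single regular expression on large inputs.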
--
Klaus Alexander Seistrup
Copenhagen, Denmark, EU http://klaus.seistrup.dk/
Hi,
How does your code deal with entities like "&#39;"?
Thanks,
Ray
Rares Vernica wrote:
How does your code deal with entities like "&#39;"?
It doesn't, it deals with named entities only. But take a look
at Fredrik's example.
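
A minimal sketch of handling numeric references such as "&#39;" or "&#x27;" alongside the named ones (this is not Fredrik's code, which is not quoted in this thread). The names `_entity_rx`, `_replace` and `unescape_all` are made up for illustration, and the try/except fallbacks exist only so the same file also runs on Python 3:

```python
import re

try:                                   # Python 2 (the thread targets 2.4)
    from htmlentitydefs import name2codepoint
except ImportError:                    # Python 3 fallback
    from html.entities import name2codepoint

try:
    unichr                             # Python 2 builtin
except NameError:
    unichr = chr                       # Python 3: chr() returns text already

# Matches hex (&#x27;), decimal (&#39;) and named (&nbsp;) references.
_entity_rx = re.compile(r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);')

def _replace(m):
    ref = m.group(1)
    try:
        if ref[:2] in ('#x', '#X'):
            return unichr(int(ref[2:], 16))   # hexadecimal reference
        if ref[0] == '#':
            return unichr(int(ref[1:]))       # decimal reference
        return unichr(name2codepoint[ref])    # named entity
    except (KeyError, ValueError, OverflowError):
        return m.group(0)              # unknown or out of range: leave as-is

def unescape_all(s):
    return _entity_rx.sub(_replace, s)
```

On Python 2 it is safest to pass a unicode string, since the replacements are unicode characters; for example, unescape_all(u'&#39;&nbsp;&amp;') gives an apostrophe, a no-break space and an ampersand.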
Cheers,
--
Klaus Alexander Seistrup
København, Danmark, EU http://klaus.seistrup.dk/