Bytes | Software Development & Data Engineering Community

Convert xml symbol notation

Hi,

I'm working on a script to download and parse a web page, and it
includes xml symbol notation, such as &#39; for the ' character. Does
anyone know of a pre-existing python script/lib to convert the xml
notation back to the actual symbol it represents?

Apr 6 '07 #1
dumbkiwi wrote:
> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character. Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

Try the htmlentitydefs module.

--
Gabriel Genellina

Apr 7 '07 #2
On Apr 7, 5:23 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> dumbkiwi wrote:
>> I'm working on a script to download and parse a web page, and it
>> includes xml symbol notation, such as &#39; for the ' character. Does
>> anyone know of a pre-existing python script/lib to convert the xml
>> notation back to the actual symbol it represents?
>
> Try the htmlentitydefs module.

Is that a standard module? I can't see it anywhere - googled it.
Apr 7 '07 #3
dumbkiwi wrote:
> On Apr 7, 5:23 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>> Try the htmlentitydefs module.
>
> Is that a standard module? I can't see it anywhere - googled it.

Sure! For quite a while, at least since Python 1.5 (I can't go earlier
in time...)
http://svn.python.org/view/python/tr...lentitydefs.py
Added Wed Sep 27 16:22:08 1995 UTC (11 years, 6 months ago) by guido

--
Gabriel Genellina
Apr 7 '07 #4
>> I'm working on a script to download and parse a web page, and it
>> includes xml symbol notation, such as &#39; for the ' character. Does
>> anyone know of a pre-existing python script/lib to convert the xml
>> notation back to the actual symbol it represents?
>
> Try the htmlentitydefs module.

That won't help: this is a character reference, not an entity reference.
htmlentitydefs only contains the definitions of entities.
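
To make the distinction concrete, here is a minimal sketch, assuming Python 3, where the htmlentitydefs module discussed in this thread is available as html.entities. The helper names resolve_entity and resolve_charref are made up for illustration. A named entity reference such as &amp; is looked up in a table, while a numeric character reference such as &#39; or &#x27; carries its code point directly and needs no table at all.

```python
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs


def resolve_entity(name):
    """Look up a named entity such as 'amp'; return None if it is not defined."""
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None


def resolve_charref(body):
    """Decode the body of a numeric character reference, e.g. '39' or 'x27'."""
    if body[:1] in ("x", "X"):
        return chr(int(body[1:], 16))  # hexadecimal form: &#x27;
    return chr(int(body, 10))          # decimal form: &#39;
```

The entity table has no entry for "#39", which is why a lookup in the entity definitions alone cannot decode a reference like the one in the original question.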

Regards,
Martin
Apr 7 '07 #5
> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character. Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

If you have this given in an XML file (rather than an HTML file which
is not well-formed XML), you could use an XML parser for the entire
file. This would automatically unescape character references. Likewise,
you can parse it with HTMLParser, which will invoke the handle_charref
method for these.
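
Both routes can be sketched as follows, assuming Python 3 (where HTMLParser lives in html.parser) and a made-up sample string. The class name CharrefCollector is hypothetical. One caveat: recent Python 3 versions convert character references automatically by default, so handle_charref is only called when the parser is created with convert_charrefs=False.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SAMPLE = "<p>It&#39;s here</p>"  # made-up input for illustration

# Route 1: a real XML parser unescapes character references on its own.
xml_text = ET.fromstring(SAMPLE).text


# Route 2: HTMLParser reports each numeric reference to handle_charref.
class CharrefCollector(HTMLParser):
    def __init__(self):
        # convert_charrefs=False keeps the parser from decoding references
        # itself, so handle_charref below is actually invoked.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is the reference body without "&#" and ";", e.g. "39" or "x27"
        if name[:1] in ("x", "X"):
            self.parts.append(chr(int(name[1:], 16)))
        else:
            self.parts.append(chr(int(name)))


collector = CharrefCollector()
collector.feed(SAMPLE)
collector.close()
html_text = "".join(collector.parts)
```

The XML route only works if the whole document is well-formed XML; the HTMLParser route is more forgiving of real-world web pages.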

If you just want to unescape references, you can use the code in

http://effbot.org/zone/re-sub.htm
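
In the same spirit as that page, but not a copy of its code, a regex-based unescape could look roughly like this. It is a sketch assuming Python 3; the function name unescape and the pattern are my own choices. References it cannot decode are left unchanged.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#39;), hexadecimal (&#x27;) and named (&amp;) references.
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|\w+);")


def unescape(text):
    """Replace character and entity references in text with their characters."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))  # hexadecimal character reference
            if body.startswith("#"):
                return chr(int(body[1:]))      # decimal character reference
            return chr(name2codepoint[body])   # named entity reference
        except (KeyError, ValueError, OverflowError):
            return match.group(0)              # unknown or invalid: leave as is

    return _REFERENCE.sub(replace, text)
```

On Python 3.4 and later, the standard library's html.unescape() does this job directly, so a hand-rolled version is mainly useful for seeing how the substitution works.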

HTH,
Martin
Apr 7 '07 #6
Martin v. Löwis wrote:
>>> I'm working on a script to download and parse a web page, and it
>>> includes xml symbol notation, such as &#39; for the ' character. Does
>> Try the htmlentitydefs module.
>
> That won't help: this is a character reference, not an entity reference.
> htmlentitydefs only contains the definitions of entities.

Ouch! Sorry.

--
Gabriel Genellina

Apr 7 '07 #7
