Convert xml symbol notation

dumbkiwi

Hi,

I'm working on a script to download and parse a web page, and it
includes xml symbol notation, such as &#39; for the ' character. Does
anyone know of a pre-existing python script/lib to convert the xml
notation back to the actual symbol it represents?

Apr 6 '07 #1


Gabriel Genellina

dumbkiwi wrote:

> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character. Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

Try the htmlentitydefs module.

--
Gabriel Genellina

Apr 7 '07 #2

dumbkiwi

On Apr 7, 5:23 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:

> dumbkiwi wrote:
>> I'm working on a script to download and parse a web page, and it
>> includes xml symbol notation, such as &#39; for the ' character. Does
>> anyone know of a pre-existing python script/lib to convert the xml
>> notation back to the actual symbol it represents?
>
> Try the htmlentitydefs module.

Is that a standard module? I can't see it anywhere - googled it.

Apr 7 '07 #3

Gabriel Genellina

dumbkiwi wrote:

> On Apr 7, 5:23 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>
>> Try the htmlentitydefs module.
>
> Is that a standard module? I can't see it anywhere - googled it.

Sure! It has been for quite a while, at least since Python 1.5 (I can't
go back any further in time...)
http://svn.python.org/view/python/tr...lentitydefs.py
Added Wed Sep 27 16:22:08 1995 UTC (11 years, 6 months ago) by guido

--
Gabriel Genellina

Apr 7 '07 #4

Martin v. Löwis

>> I'm working on a script to download and parse a web page, and it
>> includes xml symbol notation, such as &#39; for the ' character. Does
>> anyone know of a pre-existing python script/lib to convert the xml
>> notation back to the actual symbol it represents?
>
> Try the htmlentitydefs module.

That won't help: this is a character reference, not an entity reference.
htmlentitydefs only contains the definitions of entities.
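
A minimal sketch of that distinction, assuming Python 3 (where the htmlentitydefs module is available as html.entities) and assuming the reference in question is a numeric one such as &#39;:

```python
from html.entities import name2codepoint

# A named entity reference such as &amp; has a definition in the table.
amp_codepoint = name2codepoint["amp"]

# A numeric character reference such as &#39; has no name to look up:
# the number is already the code point of the character.
apostrophe = chr(39)

# The entity table has no entry for the numeric form.
has_numeric_entry = "#39" in name2codepoint
```

So the table covers the named forms only; numeric references have to be decoded from the number itself.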

Regards,
Martin

Apr 7 '07 #5

Martin v. Löwis

> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character. Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

If you have this given in an XML file (rather than an HTML file which
is not well-formed XML), you could use an XML parser for the entire
file. This would automatically unescape character references. Likewise,
you can parse it with HTMLParser, which will invoke the handle_charref
method for these.
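
Both routes might look roughly like this. It is only a sketch, assuming Python 3 (xml.etree.ElementTree and html.parser); the sample markup and the CharrefCollector class are made up for illustration:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Route 1: an XML parser resolves character references while parsing.
root = ET.fromstring("<p>It&#39;s here</p>")
xml_text = root.text


# Route 2: HTMLParser reports each character reference to handle_charref.
class CharrefCollector(HTMLParser):
    """Hypothetical helper that rebuilds the text with references decoded."""

    def __init__(self):
        # In Python 3 the parser converts references itself by default;
        # convert_charrefs=False makes it call handle_charref instead.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is "39" for &#39; and "x27" for &#x27;
        if name.lower().startswith("x"):
            self.parts.append(chr(int(name[1:], 16)))
        else:
            self.parts.append(chr(int(name)))


collector = CharrefCollector()
collector.feed("<p>It&#39;s here</p>")
collector.close()
html_text = "".join(collector.parts)
```

Both xml_text and html_text end up holding the text with the apostrophe restored.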

If you just want to unescape references, you can use the code in

http://effbot.org/zone/re-sub.htm
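
A re.sub-based unescape along those lines could be sketched as follows. This is not a copy of the code on the linked page; it assumes Python 3, and the name unescape_references is hypothetical:

```python
import re
from html.entities import name2codepoint

# Matches &#39; (decimal), &#x27; (hex) and &name; forms.
_REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|\w+);")


def unescape_references(text):
    """Hypothetical helper: replace references with the characters they denote."""

    def fixup(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))    # hexadecimal character reference
            if body.startswith("#"):
                return chr(int(body[1:]))        # decimal character reference
            return chr(name2codepoint[body])     # named entity reference
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                # leave unknown references as-is

    return _REFERENCE.sub(fixup, text)
```

On Python 3.4 and later, the standard library's html.unescape() does the same job without any custom code.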

HTH,
Martin

Apr 7 '07 #6

Gabriel Genellina

Martin v. Löwis wrote:

>>> I'm working on a script to download and parse a web page, and it
>>> includes xml symbol notation, such as &#39; for the ' character. Does
>> Try the htmlentitydefs module.
>
> That won't help: this is a character reference, not an entity reference.
> htmlentitydefs only contains the definitions of entities.

Ouch! Sorry.

--
Gabriel Genellina

Apr 7 '07 #7
