473,406 Members | 2,387 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

html entity to unicode

Hi,

I'm parsing html. I have a page with a lot of html enitties for hebrew
characters. When i print what i get are blanks, dots and commas. How
can i decode this entities to unicode charachters?

TIA

Zunbeltz

Feb 10 '06 #1
1 2082
zu******@gmail.com schrieb:
Hi,

I'm parsing html. I have a page with a lot of html enitties for hebrew
characters. When i print what i get are blanks, dots and commas. How
can i decode this entities to unicode charachters?


Python doc

13.4 htmlentitydefs -- Definitions of HTML general entities

[...]

name2codepoint
A dictionary that maps HTML entity names to the Unicode codepoints. New in version 2.3.

codepoint2name
A dictionary that maps Unicode codepoints to HTML entity names. New in version 2.3.

Peter Maas Aachen
Feb 10 '06 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Chris Curvey | last post by:
I'm writing an XMLRPC server, which is receiving a request (from a non-Python client) that looks like this (formatted for legibility): <?xml version="1.0"?> <methodCall>...
22
by: Martin Trautmann | last post by:
Hi all, is there any kind of 'hiconv' or other (unix-like) conversion tool that would convert UTF-8 to HTML (ISO-Latin-1 and Unicode)? The database output is UTF-8 or UTF-16 only - Thus almost...
11
by: Patrick Van Esch | last post by:
Hello, I have the following problem of principle: in writing HTML pages containing ancient greek, there are two possibilities: one is to write the unicode characters directly (encoded as two...
3
by: Dale Strickland-Clark | last post by:
A colleague has asked me this and I don't know the answer. Can anyone here help with this? Thanks in advance. Here is his email: I am trying to parse an HTML document using the xml.dom.minidom...
8
by: lorenzo.viscanti | last post by:
X-No-Archive: yes Hi, I've found lots of material on the net about unicode html conversions, but still i'm having many problems converting unicode characters to html entities. Is there any...
10
by: pak.andrei | last post by:
Here is my script: from mechanize import * from BeautifulSoup import * import StringIO b = Browser() f = b.open("http://www.translate.ru/text.asp?lang=ru") b.select_form(nr=0) b = "hello...
8
by: Steven D'Aprano | last post by:
I have a string containing Latin-1 characters: s = u"© and many more..." I want to convert it to HTML entities: result => "&copy; and many more..." Decimal/hex escapes would be...
0
by: saul.baizman | last post by:
Here's a brief description of the problem. My organization has a client who cuts and pastes information from Microsoft Word documents into web-based forms, whose contents is then displayed on a...
5
by: Timothy Madden | last post by:
Hello Is there a function that will allow me to output text written in utf-8 (from db for example) if my document has Content-Type: text/html; charset=ISO-8859-1 I mean htmlspecialchars()...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.