unicode html

X-No-Archive: yes
Hi, I've found lots of material on the net about unicode html
conversions, but I'm still having many problems converting unicode
characters to html entities. Is there any available function to solve
this issue?
As an example I would like to do this kind of conversion:
\uc3B4 = &ocirc;
for all available html entities.

thanks,
lorenzo

Jul 17 '06 #1

lo************* *@gmail.com wrote:
> X-No-Archive: yes
> Hi, I've found lots of material on the net about unicode html
> conversions, but still i'm having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \uc3B4 = &ocirc;
> for all available html entities.
>
> thanks,
> lorenzo

no expertise with unicode issues, but I'm using 'pytextile' at the minute,
which converts non-ascii to (numeric) html entities. It does something
like:
>>> s = unicode('\xe7', encoding='latin-1')
>>> s
u'\xe7'
>>> print s
ç
>>> print s.encode('ascii', 'xmlcharrefreplace')
&#231;
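
The session above is Python 2. For readers on Python 3, a minimal sketch of the same idea follows; the helper name to_ascii_html is made up for illustration and is not part of pytextile or the standard library.

```python
def to_ascii_html(text):
    """Return text with every non-ASCII character replaced by &#NNN;.

    Hypothetical helper: it relies only on the built-in
    'xmlcharrefreplace' error handler of str.encode().
    """
    # In Python 3 encode() yields bytes, so decode back to str
    # before the result is pasted into an HTML template.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("fa\xe7ade"))  # fa&#231;ade
```

Note that this only touches non-ASCII characters; markup characters such as & and < are left alone, so escape those separately if the text is not already HTML.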
http://wiki.python.org/moin/PyTextile
hth

Gerard

Jul 17 '06 #2
Jim

Sybren Stuvel wrote:
> lo************* *@gmail.com enlightened us with:
>> As an example I would like to do this kind of conversion:
>> \uc3B4 = &ocirc;
>> for all available html entities.
>
> Why would you want that? Just make sure you declare your document as
> UTF-8, encode it as such, and you're done. Much easier.

For example, I am programming a script that makes html pages, but I do
not have the ability to change the "Content-Type .. charset=.." line
that is sent preceding those pages.

Jim

Jul 17 '06 #3
Jim
Sybren Stuvel wrote:
> Jim enlightened us with:
>> For example, I am programming a script that makes html pages, but I
>> do not have the ability to change the "Content-Type .. charset=.."
>> line that is sent preceding those pages.
>
> "line"? Are you talking about the HTTP header? If it is wrong, it
> should be corrected. If you are in control of the content, you should
> also be in control of the Content-Type header. Otherwise, use a <meta>
> tag that describes the content.

Ah, but I cannot change it. It is not my machine and the folks who own
the machine perceive that the charset line that they use is the right
one for them. (Many people ship pages off this machine.)

Unfortunately, the <meta> tag idea also does not fly: see
http://www.w3.org/TR/html4/charset.html
in section 5.2.2 where it states that in a contest the charset
parameter wins.

My only point is that things are complicated and that there are times
when HTML entities are the answer (or anyway, an answer).

Jim

Jul 17 '06 #4
Jim
Sybren Stuvel wrote:
> Jim enlightened us with:
>> Ah, but I cannot change it. It is not my machine and the folks who
>> own the machine perceive that the charset line that they use is the
>> right one for them.
>
> Well, _you_ are the one providing the content, aren't you?

? This site has many people operating off of it (it is
sourceforge-like) and the operators (who are volunteers) are kind
enough to let us use it in the first place. I presume that they think
the charset line that they use is the one that most people want.
Probably if they changed it then someone else would complain.

> Sounds like they either don't know what they are talking about, or use
> incompetent software. With Apache, it's very easy to give every
> directory its own default character encoding header.

I am operating under constraints. Asking the operators of the site has
led to the understanding that I must work with the charset parameter
that I have. That is, I have an environment in which I must work, and
whether you or I think the people providing the service should do it
differently doesn't matter. I replied originally because I thought I
could give an example of HTML entities providing a way that I can solve
the problem that is entirely under my control.

>> Unfortunately, the <meta> tag idea also does not fly: see
>> http://www.w3.org/TR/html4/charset.html in section 5.2.2 where it
>> states that in a contest the charset parameter wins.
>
> I assume that with "the charset parameter" you mean "the HTTP header",
> as the <meta> tag also has a "charset parameter".

AIUI "charset parameter" is the language of the HTML standard that I
referred to. For the meta tag, I at least would use "charset
attribute".

>> My only point is that things are complicated
>
> Call me thick, but from my point of view they aren't.

;-)

Jim

Jul 17 '06 #5
> Hi, I've found lots of material on the net about unicode html
> conversions, but still i'm having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \uc3B4 = &ocirc;
> for all available html entities.

'&#%d;' % ord(u'\u0430')

or

'&#x%x;' % ord(u'\u0430')
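
The same formatting trick can be applied to a whole string, and named entities such as &ocirc; can be looked up in the standard library's html.entities.codepoint2name table (HTML 4 names only). What follows is a rough Python 3 sketch, not a definitive implementation; the function name to_entities is invented here. As an aside, ô is U+00F4, and C3 B4 is its UTF-8 byte sequence, which may be where the \uc3B4 in the question comes from.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace non-ASCII characters with HTML character references.

    Hypothetical helper: uses a named entity when HTML 4 defines one
    for the code point, and a decimal numeric reference otherwise.
    ASCII characters (including & and <) are passed through untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


print(to_entities("c\u00f4te \u0430"))  # c&ocirc;te &#1072;
```

Cyrillic \u0430 has no HTML 4 name, so it falls back to the numeric form, the same value the '&#%d;' expression above produces.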

--
damjan
Jul 17 '06 #6
lo************* *@gmail.com wrote:
> Hi, I've found lots of material on the net about unicode html
> conversions, but still i'm having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \uc3B4 = &ocirc;
> for all available html entities.

I don't know how you generate your HTML, but ElementTree and lxml both have
good HTML parsers, so that you can let them write out the result with an
"US-ASCII" encoding and they will generate numeric entities for everything
that's not ASCII.
>>> from lxml import etree
>>> root = etree.HTML(my_html_data)
>>> html_7_bit = etree.tostring(root, encoding="us-ascii")
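
If lxml is not installed, the standard library's ElementTree behaves the same way when serialising to "us-ascii". The sketch below runs under Python 3 and assumes the input is well-formed markup, since ET.fromstring is an XML parser rather than a forgiving HTML parser; the sample string is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; must be well-formed for ET.fromstring().
markup = "<p>c\u00f4te fa\u00e7ade</p>"

root = ET.fromstring(markup)

# Serialising to US-ASCII makes ElementTree write every non-ASCII
# character as a numeric character reference.
html_7_bit = ET.tostring(root, encoding="us-ascii")

print(html_7_bit.decode("ascii"))
```

For real-world, possibly malformed HTML, the lxml HTML parser shown above is the more tolerant choice.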
Stefan
Jul 18 '06 #7
wrote:
> As an example I would like to do this kind of conversion:
> \uc3B4 = &ocirc;
> for all available html entities.

>>> u"\u3cB4".encode('ascii', 'xmlcharrefreplace')
'&#15540;'

Don't bother using named entities. If you encode your unicode as ascii
replacing all non-ascii characters with the xml entity reference then your
pages will display fine whatever encoding is specified in the HTTP headers.
Jul 18 '06 #8
Sybren Stuvel wrote:
> Duncan Booth enlightened us with:
>> Don't bother using named entities. If you encode your unicode as
>> ascii replacing all non-ascii characters with the xml entity
>> reference then your pages will display fine whatever encoding is
>> specified in the HTTP headers.
>
> Which means OP can't use Unicode/UTF-8 entity references, since that's
> not specified in the HTTP header.

That doesn't matter, character references are not affected by the network
encoding.

From http://www.w3.org/TR/html4/charset.html#h-5.3.1
5.3.1 Numeric character references

Numeric character references specify the code position of a character
in the document character set.
The character references use the *document character set*, which is
independent of the character encoding used for network transmission. This
is defined for HTML as ISO10646, and (section 5.1) "The character set
defined in [ISO10646] is character-by-character equivalent to Unicode
([UNICODE])".
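
A small Python 3 sketch of that point, using only the standard library: once the text is pure ASCII with numeric references, decoding the bytes under several ASCII-compatible charsets and resolving the references (here with html.unescape, standing in for what a browser does) gives back the same characters. The list of charsets is just an illustrative assumption.

```python
import html

original = "\u3cb4 \u00f4"

# What would go over the wire: pure ASCII with numeric references.
wire_bytes = original.encode("ascii", "xmlcharrefreplace")
print(wire_bytes)  # b'&#15540; &#244;'

# Whatever ASCII-compatible charset the HTTP header declares, the
# bytes decode to the same text, and the references resolve to the
# same Unicode code points.
for declared in ("ascii", "latin-1", "utf-8", "cp1252"):
    decoded = wire_bytes.decode(declared)
    assert html.unescape(decoded) == original
```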
Jul 18 '06 #9
