X-No-Archive: yes
Hi, I've found lots of material on the net about unicode html
conversions, but still i'm having many problems converting unicode
characters to html entities. Is there any available function to solve
this issue?
As an example I would like to do this kind of conversion:
\uc3B4 =ô
for all available html entities.
thanks,
lorenzo
lo************* *@gmail.com wrote:
Hi, I've found lots of material on the net about unicode html
conversions, but still i'm having many problems converting unicode
characters to html entities. Is there any available function to solve
this issue?
As an example I would like to do this kind of conversion:
\uc3B4 =ô
for all available html entities.
thanks,
lorenzo
no expertise with unicode issues but using 'pytextile' at the minute
which converts non-ascii to (numeric) html entities. It does something
like:
>>> s = unicode('\xe7', encoding='latin-1')
>>> s
u'\xe7'
>>> print s
ç
>>> print s.encode('ascii', 'xmlcharrefreplace')
&#231;
http://wiki.python.org/moin/PyTextile
hth
Gerard
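For readers on Python 3, where `unicode` is no longer a separate type, the same idea might look like the sketch below. The helper name `to_ascii_html` is made up for illustration, and the sketch assumes the text has already been decoded to `str`.

```python
def to_ascii_html(text):
    """Return text with every non-ASCII character replaced by a numeric
    character reference such as &#231;."""
    # 'xmlcharrefreplace' is a standard codec error handler: characters the
    # ASCII codec cannot encode are written as &#NNN; instead of raising.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("fran\u00e7ais"))  # fran&#231;ais
```

Note that this only deals with non-ASCII characters. It does not escape `&`, `<` or `>`, so markup already in the string passes through untouched.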
Sybren Stuvel wrote:
lo************* *@gmail.com enlightened us with:
As an example I would like to do this kind of conversion:
\uc3B4 =ô
for all available html entities.
Why would you want that? Just make sure you declare your document as
UTF-8, encode it as such, and you're done. Much easier.
For example, I am programming a script that makes html pages, but I do
not have the ability to change the "Content-Type .. charset=.." line
that is sent preceding those pages.
Jim
Sybren Stuvel wrote:
Jim enlightened us with:
For example, I am programming a script that makes html pages, but I
do not have the ability to change the "Content-Type .. charset=.."
line that is sent preceding those pages.
"line"? Are you talking about the HTTP header? If it is wrong, it
should be corrected. If you are in control of the content, you should
also be in control of the Content-Type header. Otherwise, use a <meta>
tag that describes the content.
Ah, but I cannot change it. It is not my machine and the folks who own
the machine perceive that the charset line that they use is the right
one for them. (Many people ship pages off this machine.)
Unfortunately, the <meta> tag idea also does not fly: see http://www.w3.org/TR/html4/charset.html
in section 5.2.2 where it states that in a contest the charset
parameter wins.
My only point is that things are complicated and that there are times
when HTML entities are the answer (or anyway, an answer).
Jim
Sybren Stuvel wrote:
Jim enlightened us with:
Ah, but I cannot change it. It is not my machine and the folks who
own the machine perceive that the charset line that they use is the
right one for them.
Well, _you_ are the one providing the content, aren't you?
? This site has many people operating off of it (it is
sourceforge-like) and the operators (who are volunteers) are kind
enough to let us use it in the first place. I presume that they think
the charset line that they use is the one that most people want.
Probably if they changed it then someone else would complain.
Sounds like they either don't know what they are talking about, or use
incompetent software. With Apache, it's very easy to give every
directory its own default character encoding header.
I am operating under constraints. Asking the operators of the site has
led to the understanding that I must work with the charset parameter
that I have. That is, I have an environment in which I must work, and
whether you or I think the people providing the service should do it
differently doesn't matter. I replied originally because I thought I
could give an example of HTML entities providing a way that I can solve
the problem that is entirely under my control.
Unfortunately, the <meta> tag idea also does not fly: see http://www.w3.org/TR/html4/charset.html in section 5.2.2 where it
states that in a contest the charset parameter wins.
I assume that with "the charset parameter" you mean "the HTTP header",
as the <meta> tag also has a "charset parameter".
AIUI "charset parameter" is the language of the HTML standard that I
referred to. For the meta tag, I at least would use "charset
attribute".
My only point is that things are complicated
Call me thick, but from my point of view they aren't.
;-)
Jim
Hi, I've found lots of material on the net about unicode html
conversions, but still i'm having many problems converting unicode
characters to html entities. Is there any available function to solve
this issue?
As an example I would like to do this kind of conversion:
\uc3B4 =ô
'&#%d;' % ord(u'\u0430')
or
'&#x%x;' % ord(u'\u0430')
for all available html entities.
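Since the original question asks for named entities, here is a hedged Python 3 sketch that combines the numeric formatting above with the standard library's `html.entities.codepoint2name` table. The function name `to_entities` is hypothetical. The sketch uses a named entity where HTML 4 defines one and falls back to a decimal reference otherwise.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace non-ASCII characters with named entities where HTML defines
    one, falling back to decimal numeric references otherwise."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                           # plain ASCII passes through
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])  # e.g. U+00F4 -> &ocirc;
        else:
            out.append("&#%d;" % cp)                 # e.g. U+0430 -> &#1072;
    return "".join(out)


print(to_entities("h\u00f4tel \u0430"))  # h&ocirc;tel &#1072;
```

ASCII characters, including `&` and `<`, are deliberately left alone here, so the function can be run over text that already contains markup.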
--
damjan
lo************* *@gmail.com wrote:
Hi, I've found lots of material on the net about unicode html
conversions, but still i'm having many problems converting unicode
characters to html entities. Is there any available function to solve
this issue?
As an example I would like to do this kind of conversion:
\uc3B4 =ô
for all available html entities.
I don't know how you generate your HTML, but ElementTree and lxml both have
good HTML parsers, so that you can let them write out the result with an
"US-ASCII" encoding and they will generate numeric entities for everything
that's not ASCII.
>>> from lxml import etree
>>> root = etree.HTML(my_html_data)
>>> html_7_bit = etree.tostring(root, encoding="us-ascii")
Stefan
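If lxml is not installed, the same effect can be approximated with the standard library's `xml.etree.ElementTree`. This is only a sketch under one important assumption: the input must be well-formed, because the standard-library parser does not repair sloppy HTML the way lxml's `etree.HTML` does. The sample fragment is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment is well-formed XML. The standard-library parser,
# unlike lxml's etree.HTML, does not repair sloppy real-world HTML.
root = ET.fromstring("<p>caf\u00e9 au lait</p>")

# Serialising to US-ASCII makes the writer emit a numeric character
# reference for every character the encoding cannot represent.
html_7_bit = ET.tostring(root, encoding="us-ascii", method="html")

print(html_7_bit)  # b'<p>caf&#233; au lait</p>'
```

The result is a `bytes` object containing only ASCII, so it can be written out unchanged whatever charset the server announces.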
wrote:
As an example I would like to do this kind of conversion:
\uc3B4 =ô
for all available html entities.
>>> u"\u3cB4".encode('ascii', 'xmlcharrefreplace')
'&#15540;'
Don't bother using named entities. If you encode your unicode as ascii
replacing all non-ascii characters with the xml entity reference then your
pages will display fine whatever encoding is specified in the HTTP headers.
Sybren Stuvel wrote:
Duncan Booth enlightened us with:
>Don't bother using named entities. If you encode your unicode as ascii replacing all non-ascii characters with the xml entity reference then your pages will display fine whatever encoding is specified in the HTTP headers.
Which means OP can't use Unicode/UTF-8 entity references, since that's
not specified in the HTTP header.
That doesn't matter, character references are not affected by the network
encoding.
From http://www.w3.org/TR/html4/charset.html#h-5.3.1
5.3.1 Numeric character references
Numeric character references specify the code position of a character
in the document character set.
The character references use the *document character set*, which is
independent of the character encoding used for network transmission. This
is defined for HTML as ISO10646, and (section 5.1) "The character set
defined in [ISO10646] is character-by-character equivalent to Unicode
([UNICODE])".
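The point about the document character set can be checked with a small Python 3 experiment. This is a sketch, not a browser: it assumes the server's charset is ASCII-compatible (the three tried here are) and uses `html.unescape` as a stand-in for how a user agent resolves character references.

```python
import html

page = "h\u00f4tel".encode("ascii", "xmlcharrefreplace")  # b'h&#244;tel'

# Pretend the server labels the same bytes with three different charsets.
for charset in ("utf-8", "iso-8859-1", "cp1252"):
    decoded = page.decode(charset)   # identical: the bytes are all ASCII
    # html.unescape resolves &#244; against Unicode, as a browser would.
    assert html.unescape(decoded) == "h\u00f4tel"
```

Because the page contains only ASCII bytes, every one of these charsets decodes it to the same text, and the reference `&#244;` always names U+00F4.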