I have a list of about 2500 html escape sequences (decimal) that I need
to convert to utf-8. Stuff like:
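&#48708;
&#54665;
&#44592;
&#47196;
&#48372;
&#45244;
&#44144;
&#50640;
&#50836;
&#45236;
&#47732;
&#44552;
&#51060;
&#50620;
&#47560;
&#51648;
&#51104;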
Anyone know what the decimal is representing? It doesn't seem to
equate to a unicode codepoint...
harrelson wrote:
  I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like:
  &#48708; &#54665; &#44592; &#47196; &#48372; &#45244; &#44144; &#50640; &#50836; &#45236; &#47732; &#44552; &#51060; &#50620; &#47560; &#51648; &#51104;
  Anyone know what the decimal is representing? It doesn't seem to equate to a unicode codepoint...
In well-formed HTML (!) these should be the decimal values of Unicode characters. See http://www.w3.org/TR/html4/charset.html#h-5.3.1
These characters appear to be Hangul Syllables: http://www.unicode.org/charts/PDF/UAC00.pdf
import unicodedata
nums = [
48708,
54665,
44592,
47196,
48372,
45244,
44144,
50640,
50836,
45236,
47732,
44552,
51060,
50620,
47560,
51648,
51104,
]
for num in nums:
    print num, unicodedata.name(unichr(num), 'Unknown')
=>
48708 HANGUL SYLLABLE BI
54665 HANGUL SYLLABLE HAENG
44592 HANGUL SYLLABLE GI
47196 HANGUL SYLLABLE RO
48372 HANGUL SYLLABLE BO
45244 HANGUL SYLLABLE NAEL
44144 HANGUL SYLLABLE GEO
50640 HANGUL SYLLABLE E
50836 HANGUL SYLLABLE YO
45236 HANGUL SYLLABLE NAE
47732 HANGUL SYLLABLE MYEON
44552 HANGUL SYLLABLE GEUM
51060 HANGUL SYLLABLE I
50620 HANGUL SYLLABLE EOL
47560 HANGUL SYLLABLE MA
51648 HANGUL SYLLABLE JI
51104 HANGUL SYLLABLE JAM
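For readers on Python 3, where `unichr` and the `print` statement no longer exist, the same lookup might be sketched as follows. The helper name `describe` is made up for illustration; `chr`, `unicodedata.name`, and `str.encode` are standard. It also hints at why the numbers may not look like code points at first glance: Unicode charts list code points in hexadecimal, so decimal 48708 appears there as U+BE44.

```python
import unicodedata

def describe(num):
    """Return (character, U+XXXX label, Unicode name, UTF-8 bytes) for a decimal code point.

    `describe` is a hypothetical helper, not something from the thread.
    """
    ch = chr(num)  # Python 3 spelling of Python 2's unichr()
    label = "U+%04X" % num  # the hexadecimal form used in the Unicode charts
    return ch, label, unicodedata.name(ch, "Unknown"), ch.encode("utf-8")

for num in (48708, 54665, 44592):
    ch, label, name, utf8 = describe(num)
    print(num, label, name, utf8)
```

The last element of each tuple is the UTF-8 encoding the original question asked for; each Hangul syllable takes three bytes.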
Kent
On Fri, 2004-12-10 at 08:36, harrelson wrote:
  I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like:
I'm pretty sure this somewhat horrifying code does it, but is probably
an example of what not to do:
>>> escapeseq = '&#48708;'
>>> uescape = ("\\u%x" % int(escapeseq[2:-1])).decode("unicode_escape")
>>> uescape
u'\ube44'
>>> print uescape
비
(I don't seem to have the font for it, but I think that's right - my
terminal font seems to show it correctly).
I just get the decimal value of the escape, format it as a Python
unicode hex escape sequence, and tell Python to interpret it as an
escaped unicode string.
>>> entities = ['&#48708;', '&#54665;', '&#44592;', '&#47196;',
...     '&#48372;', '&#45244;', '&#44144;', '&#50640;', '&#50836;', '&#45236;',
...     '&#47732;', '&#44552;', '&#51060;', '&#50620;', '&#47560;', '&#51648;',
...     '&#51104;']
>>> def unescape(escapeseq):
...     return ("\\u%x" % int(escapeseq[2:-1])).decode("unicode_escape")
...
>>> print ' '.join([ unescape(x) for x in entities ])
비 행 기 로 보 낼 거 에 요 내 면 금 이 얼 마 지 잠
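One caveat on the escape-string trick: `\u` must be followed by exactly four hex digits, so an unpadded `%x` only works for code points from U+1000 to U+FFFF (every Hangul syllable falls in that range). A hedged Python 3 sketch that pads the escape and switches to the eight-digit `\U` form for larger values; `unescape_via_escape` is a hypothetical name, and the input is assumed to be a single decimal reference of the form `&#NNNN;`.

```python
def unescape_via_escape(escapeseq):
    """Decode one decimal reference such as '&#48708;' by building a Python escape.

    Hypothetical helper; assumes the input looks exactly like '&#NNNN;'.
    """
    num = int(escapeseq[2:-1])  # strip the leading '&#' and the trailing ';'
    # \uXXXX needs exactly 4 hex digits, \UXXXXXXXX exactly 8.
    template = "\\u%04x" if num <= 0xFFFF else "\\U%08x"
    # In Python 3, unicode_escape decodes bytes, so encode the ASCII escape first.
    return (template % num).encode("ascii").decode("unicode_escape")

# Each comparison checks the result against the direct chr() route.
print(unescape_via_escape("&#48708;") == chr(48708))    # a Hangul syllable
print(unescape_via_escape("&#233;") == chr(233))        # below U+1000: needs the padding
print(unescape_via_escape("&#128512;") == chr(128512))  # above U+FFFF: needs \U
```

This is still the long way round; calling `chr` on the integer gives the same character directly.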
--
Craig Ringer
On Fri, 2004-12-10 at 16:09, Craig Ringer wrote:
  On Fri, 2004-12-10 at 08:36, harrelson wrote:
    I have a list of about 2500 html escape sequences (decimal) that I need to convert to utf-8. Stuff like:
  I'm pretty sure this somewhat horrifying code does it, but is probably an example of what not to do:
It is. Sorry. I initially misread Kent Johnson's post. He just used
'unichr()'. Colour me an idiot. If you ever need to know the hard way to
build a unicode character...
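Building on the `unichr()` approach, converting a whole batch of references and encoding the result as UTF-8 could look like this in Python 3. This is a sketch, not something posted in the thread: `decode_decimal_refs` is a hypothetical helper that handles only decimal `&#NNNN;` references, while the standard library's `html.unescape` also covers hexadecimal and named references.

```python
import html
import re

# Matches decimal numeric character references such as '&#48708;'.
DECIMAL_REF = re.compile(r"&#(\d+);")

def decode_decimal_refs(text):
    """Replace every decimal reference in `text` with the character it names.

    Hypothetical helper; chr() raises ValueError for numbers above 0x10FFFF.
    """
    return DECIMAL_REF.sub(lambda m: chr(int(m.group(1))), text)

sample = "&#48708;&#54665;&#44592;"
decoded = decode_decimal_refs(sample)
assert decoded == html.unescape(sample)  # the stdlib function agrees on this input
utf8_bytes = decoded.encode("utf-8")     # bytes ready to be written to a file
print(len(decoded), len(utf8_bytes))     # 3 characters, 9 bytes
```

For real HTML input, `html.unescape` is probably the safer choice, since it also tolerates malformed and out-of-range references.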
--
Craig Ringer