[perl-python] unicode study with unicodedata module

Xah Lee

Python has this nice unicodedata module for looking up Unicode
character names and properties.

#-*- coding: utf-8 -*-
# python

from unicodedata import *

# each unicode char has a unique name.
# one can use the “lookup” func to find it

mychar=lookup('greek cApital letter sIgma')
# note letter case doesn't matter
print mychar.encode('utf-8')

m=lookup('CJK UNIFIED IDEOGRAPH-5929')
# for some reason, case must be right here.
print m.encode('utf-8')

# to find a char's name, use the “name” function
print name(u'天')

Basically, in Unicode, each char has a number of attributes (called
properties) besides its name. These attributes provide the info needed
to form letters and words, or to do processing such as sorting and
capitalization, for the various human scripts. For example, the Latin
alphabet has two forms, upper case and lower case. Korean letters are
stacked together into syllable blocks. Many symbols correspond to
numbers, and there are also combining forms, used for example to put a
bar over any letter or character. Also, some writing systems are
directional. In order to form these symbols for display, or to process
them for computing, this info is needed for each char.

The rest of the functions in unicodedata return these attributes.
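
A minimal sketch of a few of those property functions. It assumes Python 3, where every str is already Unicode, so the u'' prefix and the print statement used in the code above are not needed. The helper name describe and the sample characters are made up for illustration only.

```python
import unicodedata

def describe(ch):
    """Return a dict with a few Unicode properties of one character."""
    return {
        'name': unicodedata.name(ch, None),          # None if the char has no name
        'category': unicodedata.category(ch),        # e.g. 'Lu' = uppercase letter
        'bidirectional': unicodedata.bidirectional(ch),  # 'L', 'R', ...
        'combining': unicodedata.combining(ch),      # 0 for non-combining chars
        'numeric': unicodedata.numeric(ch, None),    # None if not a number
    }

samples = [
    'A',        # LATIN CAPITAL LETTER A
    '\u00bd',   # VULGAR FRACTION ONE HALF, a symbol that stands for a number
    '\u0304',   # COMBINING MACRON, the "bar over a letter"
    '\u05d0',   # HEBREW LETTER ALEF, from a right-to-left script
]
for ch in samples:
    print('%04x' % ord(ch), describe(ch))

# A Korean syllable block decomposes into the letters stacked inside it.
jamo = unicodedata.normalize('NFD', '\ud55c')   # HANGUL SYLLABLE HAN
print(len(jamo), [unicodedata.name(j) for j in jamo])
```

The dict is just a convenient way to see several properties side by side; each entry is one plain call into unicodedata.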

see unicodedata doc:
http://python.org/doc/2.4/lib/module-unicodedata.html

Official word on unicode character properties:
http://www.unicode.org/uni2book/ch04.pdf

--
i don't know what's the state of Perl's unicode. Is there something
similar?

--
this post is archived at
http://xahlee.org/perl-python/unicodedata_module.html

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #1


Xah Lee

how do i get a unicode's number?

e.g. 03ba for greek lowercase kappa? (or in decimal form)

Xah

Jul 18 '05 #2

Christos TZOTZIOY Georgiou

On 15 Mar 2005 04:55:17 -0800, rumours say that "Xah Lee" <xa*@xahlee.org> might
have written:

how do i get a unicode's number?

e.g. 03ba for greek lowercase kappa? (or in decimal form)

you get the character with:

>>> uc = u"\N{GREEK SMALL LETTER KAPPA}"

or with

>>> import unicodedata
>>> uc = unicodedata.lookup("GREEK SMALL LETTER KAPPA")

and you get the ordinal with:

>>> ord(uc)
954

ord works on one-character str and unicode objects alike.
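
To get both forms asked for (hex like 03ba, and decimal), the steps above can be put together roughly as follows. This is a sketch assuming Python 3, where chr is the inverse of ord for the full Unicode range; the function name code_point is invented for the example.

```python
import unicodedata

def code_point(char_name):
    """Look up a character by its Unicode name.

    Returns a pair: (code point as int, code point as 4-digit hex string).
    """
    ch = unicodedata.lookup(char_name)
    n = ord(ch)
    return n, '%04x' % n

number, hex_form = code_point('GREEK SMALL LETTER KAPPA')
print(number, hex_form)

# chr goes the other way, from the number back to the character.
print(chr(number) == '\N{GREEK SMALL LETTER KAPPA}')
```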
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...

Jul 18 '05 #3

Brian McCauley

Xah Lee wrote:

i don't know what's the state of Perl's unicode.

perldoc perlunicode

Jul 18 '05 #4

Xah Lee

here's a snippet of code that prints a range of unicode chars, along
with their ordinal in hex, and name.

chars without a name are skipped. (some of these are undefined code
points; control characters have no name either.)

On Microsoft Windows the encoding might need to be changed to utf-16.

Change the range to see different unicode chars.

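# -*- coding: utf-8 -*-
# python 2

from unicodedata import name

for i in range(0x0000, 0x0fff):
    # unichr(i) builds the char directly; no need for eval on a u"\uXXXX" string
    x = unichr(i)
    # name() returns the default '-' for chars that have no name; skip those
    if name(x, '-') != '-':
        print x.encode('utf-8'), '|', "%04x" % i, '|', name(x, '-')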
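
For anyone on Python 3, the same listing might look like the sketch below. It assumes chr in place of unichr and print as a function; the generator name named_chars and the small Greek range are chosen just for the example.

```python
import unicodedata

def named_chars(start, stop):
    """Yield (char, hex code point, name) for each named char in range(start, stop)."""
    for i in range(start, stop):
        ch = chr(i)
        nm = unicodedata.name(ch, None)   # None for chars without a name
        if nm is not None:
            yield ch, '%04x' % i, nm

# Change the range to see different unicode chars.
for ch, hex_cp, nm in named_chars(0x03b1, 0x03b6):
    print(ch, '|', hex_cp, '|', nm)
```

Returning tuples from a generator keeps the lookup separate from the printing, so the same function can feed a file or a test instead of the terminal.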
--
http://xahlee.org/perl-python/unicodedata_module.html

anyone wants to supply a Perl version?

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html


Jul 18 '05 #5

Xah Lee

Fuck google incorporated for editing my subject name without
permission.

and fuck google incorporated for editing my message content without
permission.

http://xahlee.org/UnixResource_dir/w...e_license.html

Xah
xa*@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

Jul 18 '05 #6
