Normalize a Polish L

In UTF8, \u0141 is a capital L with a little dash through it as can be
seen in this image:
http://static.peterbe.com/lukasz.png

I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''

I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it becoming a normal ascii L before
in other programs such as Thunderbird.

I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.

What am I doing wrong?

Oct 15 '07 #1
* Peter Bengtsson (Mon, 15 Oct 2007 16:33:26 -0000)
In UTF8, \u0141 is a capital L with a little dash through it as can be
seen in this image:
http://static.peterbe.com/lukasz.png
I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''

I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it becoming a normal ascii L before
in other programs such as Thunderbird.
The 'L' is actually pronounced like the English "w"...
I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}')
'0043 0327'
>>> unicodedata.normalize('NFKD', u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}').encode('ascii','ignore')
'C'
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER L WITH STROKE}')
''
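
[Editor's sketch] The session above is Python 2. A rough Python 3 equivalent might look like the following. The helper name ascii_fold is made up for illustration and is not part of unicodedata. It shows the same contrast: C with cedilla has a decomposition and survives as 'C', while L with stroke has none and is dropped entirely.

```python
import unicodedata


def ascii_fold(text):
    """Decompose with NFKD, then drop every code point that is not ASCII.

    ascii_fold is a hypothetical helper name used only in this sketch.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


c_cedilla = "\N{LATIN CAPITAL LETTER C WITH CEDILLA}"
l_stroke = "\N{LATIN CAPITAL LETTER L WITH STROKE}"

# C with cedilla decomposes into 'C' (U+0043) plus a combining cedilla (U+0327).
print(repr(unicodedata.decomposition(c_cedilla)))  # '0043 0327'
# L with stroke has no decomposition mapping at all.
print(repr(unicodedata.decomposition(l_stroke)))   # ''

print(repr(ascii_fold(c_cedilla)))  # 'C'
print(repr(ascii_fold(l_stroke)))   # ''
```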
Oct 15 '07 #2
Thorsten Kampe wrote:
The 'L' is actually pronounced like the English "w"...
'?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
is AFAIK transcribed so.

Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is ?ukas (short Wh-).

Regards,
Björn

--
BOFH excuse #126:

it has Intel Inside

Oct 15 '07 #3
Peter Bengtsson <pe*****@gmail.com> writes:
In UTF8, \u0141 is a capital L with a little dash through it as can be
seen in this image:
http://static.peterbe.com/lukasz.png

I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''

I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it becoming a normal ascii L before
in other programs such as Thunderbird.

I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.

What am I doing wrong?
I had the same problem, and a little research showed that it is
caused by the Unicode standard itself. I don't know why,
but characters with a stroke don't have a canonical decomposition.
I looked into this file:
http://unicode.org/Public/UNIDATA/UnicodeData.txt

and compared two positions:

1.
<UnicodeData.txt>
0142;LATIN SMALL LETTER L WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER L SLASH \
;;0141;;0141
0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER L SLASH \
;;;0142;
</UnicodeData.txt>

2.
<UnicodeData.txt>
0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;LATIN SMALL LETTER A OGONEK \
;;0104;;0104
</UnicodeData.txt>

In the second position, the sixth field holds the canonical decomposition,
but in the first position that field is empty. I don't know what the
justification is, but there probably is one. ;)
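
[Editor's sketch] The field comparison above can be reproduced in a few lines of Python 3. The records are copied from this post with the line continuations joined. The function name decomposition_field is invented for this sketch, and the cross-check against unicodedata assumes the interpreter's bundled Unicode database agrees with the file for these two characters.

```python
import unicodedata

# Two records from UnicodeData.txt as quoted in the post, continuations joined.
RECORDS = [
    "0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;"
    "LATIN CAPITAL LETTER L SLASH;;;0142;",
    "0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;"
    "LATIN SMALL LETTER A OGONEK;;0104;;0104",
]


def decomposition_field(record):
    """Return (code point, sixth field) of a semicolon-separated record.

    decomposition_field is a hypothetical helper, not a library function.
    """
    fields = record.split(";")
    return fields[0], fields[5]  # index 5 is the sixth field


for record in RECORDS:
    code, decomp = decomposition_field(record)
    char = chr(int(code, 16))
    # Print the file's field next to what unicodedata reports for the character.
    print(code, repr(decomp), repr(unicodedata.decomposition(char)))
```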
Regards,
Rob
Oct 15 '07 #4
* Bjoern Schliessmann (Mon, 15 Oct 2007 21:51:54 +0200)
Thorsten Kampe wrote:
The 'L' is actually pronounced like the English "w"...

'?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
is AFAIK transcribed so.
There are lots of possible transcriptions for "LATIN CAPITAL LETTER L
WITH STROKE". Transcription is language dependent so the English and
German transcriptions of Polish names are different.
Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is ?ukas (short Wh-).
Why do you try to use characters in a character set that does not
contain these characters? That doesn't make any sense.
Thorsten
Oct 15 '07 #5
On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
In UTF8, \u0141 is a capital L with a little dash through it as can be
seen in this image: http://static.peterbe.com/lukasz.png

I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''

I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it becoming a normal ascii L before
in other programs such as Thunderbird.

I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.

What am I doing wrong?
The character in question is NOT composed (in the way that Unicode
means) of an 'L' and a little slash; hence the concepts of
"normalization" and "decomposition" don't apply.

To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
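
[Editor's sketch] One possible shape for such a look-up table in Python 3, under stated assumptions: the names EXTRA_FALLBACKS, TABLE and asciify are invented here, and the fallback choices (for example 'ss' for sharp s) are one convention among several, not an official mapping. The hand-made table covers letters with no decomposition, and NFKD handles the rest.

```python
import unicodedata

# Hypothetical hand-picked fallbacks for letters that have no decomposition.
# Extend or change this table to suit your own purpose and language.
EXTRA_FALLBACKS = {
    "\N{LATIN CAPITAL LETTER L WITH STROKE}": "L",
    "\N{LATIN SMALL LETTER L WITH STROKE}": "l",
    "\N{LATIN CAPITAL LETTER O WITH STROKE}": "O",
    "\N{LATIN SMALL LETTER O WITH STROKE}": "o",
    "\N{LATIN SMALL LETTER SHARP S}": "ss",
}
TABLE = str.maketrans(EXTRA_FALLBACKS)


def asciify(text):
    """Apply the look-up table first, then strip accents via NFKD."""
    text = text.translate(TABLE)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(asciify("\u0141ukasz"))            # Lukasz
print(asciify("\u0141\u00f3d\u017a"))    # Lodz
```

Applying the table before normalizing means the table only needs entries for characters that NFKD cannot split into a base letter plus combining marks.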
Oct 15 '07 #6
Thorsten Kampe wrote:
Why do you try to use characters in a character set that does not
contain these characters? That doesn't make any sense.
I thought KNode was smart enough to switch to UTF-8; obviously, it
isn't.

Regards,
Björn

--
BOFH excuse #121:

halon system went off and killed the operators.

Oct 15 '07 #7
Thorsten Kampe wrote:
The 'L' is actually pronounced like the English "w"...
'Ł' originally comes from "L" (<http://en.wikipedia.org/wiki/Ł>) and
is AFAIK transcribed so.

Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is Łukas (short Wh-).

Regards,
Björn

--
BOFH excuse #126:

it has Intel Inside

Oct 15 '07 #8
On Oct 15, 10:57 pm, John Machin <sjmac...@lexicon.net> wrote:
On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
In UTF8, \u0141 is a capital L with a little dash through it as can be
seen in this image: http://static.peterbe.com/lukasz.png
I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''
I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it becoming a normal ascii L before
in other programs such as Thunderbird.
I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.
What am I doing wrong?

The character in question is NOT composed (in the way that Unicode
means) of an 'L' and a little slash; hence the concepts of
"normalization" and "decomposition" don't apply.

To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
Thank you! That explains it.

Oct 16 '07 #9
On Oct 22, 7:50 pm, Mike Orr <sluggos...@gmail.com> wrote:
Well, that gets into official vs unofficial conversions. Does the
Spanish Academy really say 'ü' should be converted to 'u'?
No, but it's the only conversion that makes sense. The only Spanish
letter that doesn't have a standard common conversion by convention
is 'ñ', which is usually ASCIIfied as n, nn, gn, nh, ni, ny, ~n, n~,
or N, with all of them being frequently seen on the Internet.
But whether that should be hardcoded
into a blog URL library is different matter, and if it is there should
probably be plugin tables for different preferred standards.
Actually there is a hardcoded conversion, that is dropping all
accented letters altogether, which is IMHO the worst possible
convention. I have a gallery of pictures of Valparaíso and Viña del
Mar whose URL is .../ValparaSoViADelMar. And if I wrote a blog entry
about pingüinos and ñandúes, it would appear probably as
.../ping-inos-and-and-es. Ugly and off-topic :)
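
[Editor's sketch] The difference between dropping accented letters outright and folding them to their base letters can be seen in a short Python 3 comparison. Both function names are made up for this sketch, and neither is claimed to be what any particular blog or gallery software does.

```python
import unicodedata


def drop_non_ascii(text):
    """Discard every non-ASCII character, accented letters included."""
    return text.encode("ascii", "ignore").decode("ascii")


def fold_then_drop(text):
    """Split accented letters with NFKD first, so only the marks are lost."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


title = "ping\u00fcinos y \u00f1and\u00faes"  # "pingüinos y ñandúes"
print(drop_non_ascii(title))  # pinginos y andes
print(fold_then_drop(title))  # pinguinos y nandues
```

Folding 'ñ' to 'n' is still only one of the conventions listed in the post, so a real URL library would probably want that choice to be configurable.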

--
Roberto Bonvallet

Oct 23 '07 #10
