In Unicode, \u0141 is a capital L with a little dash through it, as can be
seen in this image: http://static.peterbe.com/lukasz.png
I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''
I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it become a normal ASCII L before
in other programs such as Thunderbird.
I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.
What am I doing wrong?
* Peter Bengtsson (Mon, 15 Oct 2007 16:33:26 -0000)
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
The 'L' is actually pronounced like the English "w"...
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
>>> import unicodedata
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}')
'0043 0327'
>>> unicodedata.normalize('NFKD', u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}').encode('ascii','ignore')
'C'
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER L WITH STROKE}')
''
Thorsten Kampe wrote:
> The 'L' is actually pronounced like the English "w"...
'?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
is AFAIK transcribed so.
Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is ?ukas (short Wh-).
Regards,
Björn
--
BOFH excuse #126:
it has Intel Inside
Peter Bengtsson <pe*****@gmail.com> writes:
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
> What am I doing wrong?
I had the same problem, and a little research revealed that it
is caused by the Unicode standard itself. I don't know why,
but letters with a stroke don't have a canonical decomposition.
I looked into this file: http://unicode.org/Public/UNIDATA/UnicodeData.txt
and compared two entries:
1.
<UnicodeData.txt>
0142;LATIN SMALL LETTER L WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER L SLASH \
;;0141;;0141
0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER L SLASH \
;;;0142;
</UnicodeData.txt>
2.
<UnicodeData.txt>
0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;LATIN SMALL LETTER A OGONEK \
;;0104;;0104
</UnicodeData.txt>
In the second entry the 6th field holds the canonical decomposition
(0061 0328), but in the first entry that field is empty. I don't know what
justification is behind that, but there probably is one. ;)
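The same comparison can be made from Python without reading UnicodeData.txt by hand. This is only a minimal sketch (Python 3 syntax; the helper name decomposition_report is made up for illustration). It relies on unicodedata.decomposition(), which returns the decomposition field for a character, or an empty string when there is none:

```python
import unicodedata


def decomposition_report(chars):
    """Map each character to its decomposition field ('' if it has none)."""
    return {ch: unicodedata.decomposition(ch) for ch in chars}


# a with ogonek, capital L with stroke, small l with stroke
report = decomposition_report("\u0105\u0141\u0142")
for ch, decomposition in report.items():
    print("U+%04X %s: %r" % (ord(ch), unicodedata.name(ch), decomposition))
```

The a with ogonek reports '0061 0328', while both stroked letters report an empty string, which is why no normalization form can turn them into a plain L.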
Regards,
Rob
* Bjoern Schliessmann (Mon, 15 Oct 2007 21:51:54 +0200)
> Thorsten Kampe wrote:
> > The 'L' is actually pronounced like the English "w"...
> '?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
> is AFAIK transcribed so.
There are lots of possible transcriptions for "LATIN CAPITAL LETTER L
WITH STROKE". Transcription is language dependent, so the English and
German transcriptions of Polish names are different.
> Also, a friend of mine writes himself "Lukas" (pronounced L-) even
> though in Polish his name is ?ukas (short Wh-).
Why do you try to use characters in a character set that does not
contain these characters? That doesn't make any sense.
Thorsten
On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
> What am I doing wrong?
The character in question is NOT composed (in the way that Unicode
means) of an 'L' and a little slash; hence the concepts of
"normalization" and "decomposition" don't apply.
To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
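A rough sketch of such a look-up table approach, in Python 3 syntax. The names EXTRA_MAP and asciify are made up for illustration, and the table entries are only one possible choice of transcription, not anything the standard prescribes. The table is consulted first; anything not in it falls back to NFKD decomposition with the non-ASCII remainder dropped:

```python
import unicodedata

# Hypothetical hand-made table for letters that have no Unicode decomposition.
# Extend or change it to suit your own purpose.
EXTRA_MAP = {
    "\u0141": "L",  # LATIN CAPITAL LETTER L WITH STROKE
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
    "\u0110": "D",  # LATIN CAPITAL LETTER D WITH STROKE
    "\u0111": "d",  # LATIN SMALL LETTER D WITH STROKE
}


def asciify(text):
    """Return an ASCII-only approximation of text."""
    pieces = []
    for ch in text:
        if ch in EXTRA_MAP:
            # The table wins, so it can also override a decomposition.
            pieces.append(EXTRA_MAP[ch])
        else:
            # Split into base letter + combining marks, then drop non-ASCII.
            decomposed = unicodedata.normalize("NFKD", ch)
            pieces.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(pieces)


print(asciify("\u0141ukasz"))  # Lukasz
```

Characters that are neither in the table nor decomposable are still silently dropped, so the table has to cover every such letter you expect to meet.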
Thorsten Kampe wrote:
> Why do you try to use characters in a character set that does not
> contain these characters? That doesn't make any sense.
I thought KNode was smart enough to switch to UTF-8; obviously, it
isn't.
Regards,
Björn
--
BOFH excuse #121:
halon system went off and killed the operators.
Thorsten Kampe wrote:
> The 'L' is actually pronounced like the English "w"...
'Ł' originally comes from "L" (<http://en.wikipedia.org/wiki/Ł>) and
is AFAIK transcribed so.
Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is Łukas (short Wh-).
Regards,
Björn
--
BOFH excuse #126:
it has Intel Inside
On Oct 15, 10:57 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
> > In Unicode, \u0141 is a capital L with a little dash through it, as can be
> > seen in this image: http://static.peterbe.com/lukasz.png
> > I tried this:
> > >>> import unicodedata
> > >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> > ''
> > I was hoping it would convert it to 'L' because that's what it
> > visually looks like. And I've seen it become a normal ASCII L before
> > in other programs such as Thunderbird.
> > I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> > none of them helped.
> > What am I doing wrong?
> The character in question is NOT composed (in the way that Unicode
> means) of an 'L' and a little slash; hence the concepts of
> "normalization" and "decomposition" don't apply.
> To "asciify" such text, you need to build a look-up table that suits
> your purpose. unicodedata.decomposition() is (accidentally) useful in
> providing *some* of the entries for such a table.
Thank you! That explains it.
On Oct 22, 7:50 pm, Mike Orr <sluggos...@gmail.com> wrote:
> Well, that gets into official vs unofficial conversions. Does the
> Spanish Academy really say 'ü' should be converted to 'u'?
No, but it's the only conversion that makes sense. The only Spanish
letter that doesn't have a standard common conversion by convention
is 'ñ', which is usually ASCIIfied as n, nn, gn, nh, ni, ny, ~n, n~,
or N, with all of them being frequently seen on the Internet.
> But whether that should be hardcoded
> into a blog URL library is a different matter, and if it is there should
> probably be plugin tables for different preferred standards.
Actually there is a hardcoded conversion, that is dropping all
accented letters altogether, which is IMHO the worst possible
convention. I have a gallery of pictures of Valparaíso and Viña del
Mar whose URL is .../ValparaSoViADelMar. And if I wrote a blog entry
about pingüinos and ñandúes, it would probably appear as
.../ping-inos-and-and-es. Ugly and off-topic :)
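For what it's worth, here is a rough sketch of what a slug function with a pluggable table could look like, in Python 3 syntax. It is not taken from any existing blog library; the names slugify and NY_TABLE are invented for the example. Letters found in the table use the caller's preferred transcription, everything else is decomposed with NFKD so that accented letters keep their base letter instead of vanishing:

```python
import re
import unicodedata

# Hypothetical plug-in table: one of the many conventions for the Spanish n with tilde.
NY_TABLE = {"\u00f1": "ny", "\u00d1": "Ny"}


def slugify(title, table=None):
    """Build an ASCII slug; table overrides the default per-letter conversion."""
    table = table or {}
    mapped = "".join(table.get(ch, ch) for ch in title)
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", mapped)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-").lower()


print(slugify("Valpara\u00edso y Vi\u00f1a del Mar"))             # valparaiso-y-vina-del-mar
print(slugify("ping\u00fcinos y \u00f1and\u00faes", NY_TABLE))    # pinguinos-y-nyandues
```

Without a table the n with tilde simply becomes n, and a site that prefers nn, gn or ny only has to pass a different table.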
--
Roberto Bonvallet