In Unicode, \u0141 is a capital L with a little dash through it, as can be
seen in this image: http://static.peterbe.com/lukasz.png
I tried this:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
''
I was hoping it would convert it to 'L' because that's what it
visually looks like. And I've seen it become a normal ASCII L before
in other programs such as Thunderbird.
I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
none of them helped.
What am I doing wrong?
* Peter Bengtsson (Mon, 15 Oct 2007 16:33:26 -0000)
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
The 'L' is actually pronounced like the English "w"...
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}')
'0043 0327'
>>> unicodedata.normalize('NFKD', u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}').encode('ascii','ignore')
'C'
>>> unicodedata.decomposition(u'\N{LATIN CAPITAL LETTER L WITH STROKE}')
''
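The session above can be reproduced side by side with a short script. This is only a sketch in Python 3 syntax (the thread itself uses Python 2 `u''` literals); the helper name `ascii_via_nfkd` is made up for illustration.

```python
import unicodedata


def ascii_via_nfkd(text):
    """Decompose with NFKD, then drop every code point that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# U+00C7 has a decomposition (C + combining cedilla); U+0141 has none,
# so nothing ASCII is left once the non-ASCII code point is ignored.
for ch in ("\u00c7", "\u0141"):
    print(unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)),
          repr(ascii_via_nfkd(ch)))
```

The second line of output shows an empty decomposition and an empty result, which is exactly what the original poster observed.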
Thorsten Kampe wrote:
> The 'L' is actually pronounced like the English "w"...
'?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
is AFAIK transcribed so.
Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is ?ukas (short Wh-).
Regards,
Bjrn
--
BOFH excuse #126:
it has Intel Inside
Peter Bengtsson <pe*****@gmail.com> writes:
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
> What am I doing wrong?
I had the same problem, and my little research revealed that the problem
is caused by the Unicode standard itself. I don't know why,
but characters with a stroke don't have a canonical decomposition.
I looked into this file: http://unicode.org/Public/UNIDATA/UnicodeData.txt
and compared two entries:
1.
<UnicodeData.txt>
0142;LATIN SMALL LETTER L WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER L SLASH \
;;0141;;0141
0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER L SLASH \
;;;0142;
</UnicodeData.txt>
2.
<UnicodeData.txt>
0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;LATIN SMALL LETTER A OGONEK \
;;0104;;0104
</UnicodeData.txt>
In the second entry, the 6th field holds the canonical decomposition,
but in the first entry that field is empty. I don't know what justification
is behind that, but probably there is something. ;)
Regards,
Rob
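The comparison of the two records can also be done programmatically. The sketch below (Python 3) splits two of the records quoted above on `;` and reads the 6th field; the line continuations are joined by hand, and the names `RECORDS` and `decomposition_field` are hypothetical.

```python
import unicodedata

# Two records copied from the post above, with the "\" continuations joined.
RECORDS = [
    "0141;LATIN CAPITAL LETTER L WITH STROKE;Lu;0;L;;;;;N;"
    "LATIN CAPITAL LETTER L SLASH;;;0142;",
    "0105;LATIN SMALL LETTER A WITH OGONEK;Ll;0;L;0061 0328;;;;N;"
    "LATIN SMALL LETTER A OGONEK;;0104;;0104",
]


def decomposition_field(record):
    """Return (code point, 6th field) of a UnicodeData.txt record."""
    fields = record.split(";")
    return fields[0], fields[5]  # index 5 is the 6th field


for record in RECORDS:
    code, decomposition = decomposition_field(record)
    # The same information is exposed by unicodedata.decomposition().
    from_module = unicodedata.decomposition(chr(int(code, 16)))
    print(code, repr(decomposition), repr(from_module))
```

For these two characters the field read from the record and the value returned by `unicodedata.decomposition()` agree: empty for U+0141, `0061 0328` for U+0105.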
* Bjoern Schliessmann (Mon, 15 Oct 2007 21:51:54 +0200)
> Thorsten Kampe wrote:
> > The 'L' is actually pronounced like the English "w"...
> '?' originally comes from "L" (<http://en.wikipedia.org/wiki/?>) and
> is AFAIK transcribed so.
There are lots of possible transcriptions for "LATIN CAPITAL LETTER L
WITH STROKE". Transcription is language dependent so the English and
German transcriptions of Polish names are different.
> Also, a friend of mine writes himself "Lukas" (pronounced L-) even
> though in Polish his name is ?ukas (short Wh-).
Why do you try to use characters in a character set that does not
contain these characters? That doesn't make any sense.
Thorsten
On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
> In Unicode, \u0141 is a capital L with a little dash through it, as can be
> seen in this image: http://static.peterbe.com/lukasz.png
> I tried this:
> >>> import unicodedata
> >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> ''
> I was hoping it would convert it to 'L' because that's what it
> visually looks like. And I've seen it become a normal ASCII L before
> in other programs such as Thunderbird.
> I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> none of them helped.
> What am I doing wrong?
The character in question is NOT composed (in the way that Unicode
means) of an 'L' and a little slash; hence the concepts of
"normalization" and "decomposition" don't apply.
To "asciify" such text, you need to build a look-up table that suits
your purpose. unicodedata.decomposition() is (accidentally) useful in
providing *some* of the entries for such a table.
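A minimal sketch of such a look-up table, written for Python 3, might look like the following. The table contents and the names `EXTRA_MAP` and `asciify` are illustrative assumptions, not a complete or authoritative mapping; characters missing from the table fall back to NFKD decomposition, which covers the accented letters that Unicode does decompose.

```python
import unicodedata

# Hypothetical hand-made entries for letters that have no decomposition.
# Extend or change them to suit the text and language you expect.
EXTRA_MAP = {
    "\u0141": "L",  # LATIN CAPITAL LETTER L WITH STROKE
    "\u0142": "l",  # LATIN SMALL LETTER L WITH STROKE
    "\u00d8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",  # LATIN SMALL LETTER O WITH STROKE
    "\u0110": "D",  # LATIN CAPITAL LETTER D WITH STROKE
    "\u0111": "d",  # LATIN SMALL LETTER D WITH STROKE
}


def asciify(text, table=EXTRA_MAP):
    """Map each character via the table, else strip its diacritics via NFKD."""
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        # Characters with no ASCII part at all are silently dropped here.
        out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)


print(asciify("\u0141ukasz"))  # Lukasz
```

Checking the table first means a caller can also override a decomposable character when the default base letter is not the preferred transcription.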
Thorsten Kampe wrote:
> Why do you try to use characters in a character set that does not
> contain these characters? That doesn't make any sense.
I thought KNode was smart enough to switch to UTF-8; obviously, it
isn't.
Regards,
Björn
--
BOFH excuse #121:
halon system went off and killed the operators.
Thorsten Kampe wrote:
> The 'L' is actually pronounced like the English "w"...
'Ł' originally comes from "L" (<http://en.wikipedia.org/wiki/Ł>) and
is AFAIK transcribed so.
Also, a friend of mine writes himself "Lukas" (pronounced L-) even
though in Polish his name is Łukas (short Wh-).
Regards,
Björn
--
BOFH excuse #126:
it has Intel Inside
On Oct 15, 10:57 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Oct 16, 2:33 am, Peter Bengtsson <pete...@gmail.com> wrote:
> > In Unicode, \u0141 is a capital L with a little dash through it, as can be
> > seen in this image: http://static.peterbe.com/lukasz.png
> > I tried this:
> > >>> import unicodedata
> > >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> > ''
> > I was hoping it would convert it to 'L' because that's what it
> > visually looks like. And I've seen it become a normal ASCII L before
> > in other programs such as Thunderbird.
> > I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> > none of them helped.
> > What am I doing wrong?
> The character in question is NOT composed (in the way that Unicode
> means) of an 'L' and a little slash; hence the concepts of
> "normalization" and "decomposition" don't apply.
> To "asciify" such text, you need to build a look-up table that suits
> your purpose. unicodedata.decomposition() is (accidentally) useful in
> providing *some* of the entries for such a table.
Thank you! That explains it.
On Oct 22, 7:50 pm, Mike Orr <sluggos...@gmail.com> wrote:
> Well, that gets into official vs unofficial conversions. Does the
> Spanish Academy really say 'ü' should be converted to 'u'?
No, but it's the only conversion that makes sense. The only Spanish
letter that doesn't have a standard common conversion by convention
is 'ñ', which is usually ASCIIfied as n, nn, gn, nh, ni, ny, ~n, n~,
or N, with all of them being frequently seen on the Internet.
> But whether that should be hardcoded
> into a blog URL library is a different matter, and if it is there should
> probably be plugin tables for different preferred standards.
Actually there is a hardcoded conversion, that is dropping all
accented letters altogether, which is IMHO the worst possible
convention. I have a gallery of pictures of Valparaíso and Viña del
Mar whose URL is .../ValparaSoViADelMar. And if I wrote a blog entry
about pingüinos and ñandúes, it would probably appear as
.../ping-inos-and-and-es. Ugly and off-topic :)
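The idea of pluggable tables for different preferred conventions could be sketched as below (Python 3). Everything here is hypothetical: the function name `slugify`, the `table` parameter, and the example mapping are assumptions for illustration and do not describe any particular blog library. Without a table, letters are reduced to their base letter via NFKD instead of being dropped.

```python
import re
import unicodedata


def slugify(title, table=None):
    """Build an ASCII URL slug, applying an optional transliteration table."""
    table = table or {}
    # Compose first so that table keys such as "ñ" match decomposed input too.
    text = unicodedata.normalize("NFC", title)
    # Apply the caller's preferred transliterations.
    text = "".join(table.get(ch, ch) for ch in text)
    # Reduce remaining accented letters to their base letter.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of anything other than letters and digits into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()


print(slugify("Valparaíso and Viña del Mar"))
# valparaiso-and-vina-del-mar
print(slugify("Valparaíso and Viña del Mar", table={"ñ": "ny"}))
# valparaiso-and-vinya-del-mar
```

Passing the table as an argument keeps the choice between n, ny, nn and the other conventions out of the library itself.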
--
Roberto Bonvallet