No latin9 in Python?

I noticed that Python does not understand the codec alias names
latin7 = iso8859-13, latin9 = iso8859-15
(see http://docs.python.org/lib/standard-encodings.html).

Particularly latin9 is pretty popular here in Western Europe since it
contains the Euro symbol (contrary to latin1).
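
To make the Euro point concrete, here is a minimal sketch using the canonical codec names `iso8859-15` and `iso8859-1`. The variable names are only illustrative.

```python
# The Euro sign exists in ISO-8859-15 (latin9) but not in ISO-8859-1 (latin1).
euro = "\u20ac"

# In ISO-8859-15 the Euro sign occupies byte 0xA4.
latin9_bytes = euro.encode("iso8859-15")

# ISO-8859-1 has no Euro sign, so encoding it there fails.
try:
    euro.encode("iso8859-1")
    latin1_can_encode = False or True  # not reached for the Euro sign
except UnicodeEncodeError:
    latin1_can_encode = False

# The same byte decoded as ISO-8859-1 is the generic currency sign U+00A4.
as_latin1 = latin9_bytes.decode("iso8859-1")

print(latin9_bytes, latin1_can_encode, repr(as_latin1))
```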

According to the Wikipedia (http://en.wikipedia.org/wiki/ISO-8859), the
latin7 and latin9 aliases seem to be official, at least they are widely
used and accepted. In PostgreSQL, LATIN9 is even the name of the charset,
and iso8859-15 is the alias name:
http://www.postgresql.org/docs/8.2/s...#CHARSET-TABLE

Is there anything speaking against adding these as aliases? If not, I
would submit a patch. (Also, Python does not support the
latin10=iso8859-16 charset. I could try to add that as well.)

-- Chris
Dec 6 '06 #1
Christoph Zwerschke wrote:
> I noticed that Python does not understand the codec alias names
> latin7 = iso8859-13, latin9 = iso8859-15
> (see http://docs.python.org/lib/standard-encodings.html).
>
> Particularly latin9 is pretty popular here in Western Europe since it
> contains the Euro symbol (contrary to latin1).

One learns new things every day, I suppose: I've always referred to it
as ISO-8859-15, using whatever combination or absence of "_" or "-"
symbols that is deemed appropriate.

> According to the Wikipedia (http://en.wikipedia.org/wiki/ISO-8859), the
> latin7 and latin9 aliases seem to be official, at least they are widely
> used and accepted. In PostgreSQL, LATIN9 is even the name of the charset,
> and iso8859-15 is the alias name:
> http://www.postgresql.org/docs/8.2/s...#CHARSET-TABLE

My impression of relational databases is that they often have lots of
legacy names for things like character encodings. A different
perspective may be had by looking at the XML standards where encodings
and their naming have received a great deal of attention:

http://www.w3.org/TR/REC-xml/#NT-EncodingDecl

It may be acceptable even in XML to use latin9 as an encoding name, but
since the XML specifications appear to recommend the ISO names, and
since XML has quite possibly raised awareness and usage of encoding
declarations to previously unknown levels, there may be some benefit in
conservatively shadowing XML and the expectations of its users (and of
the wider Web page authoring community, amongst others). One would have
to look into the rationale of the standards makers to understand why
they've made those particular recommendations, however.

> Is there anything speaking against adding these as aliases? If not, I
> would submit a patch. (Also, Python does not support the
> latin10=iso8859-16 charset. I could try to add that as well.)

I don't see any disadvantages in having aliases, provided that they're
unambiguous, but then I'm far from being the person who will be
making that particular call.

Paul

Dec 6 '06 #2
Christoph Zwerschke wrote:
> Is there anything speaking against adding these as aliases? If not, I
> would submit a patch. (Also, Python does not support the
> latin10=iso8859-16 charset. I could try to add that as well.)

Python tries to follow the IANA charset registry.

http://www.iana.org/assignments/character-sets

If you submit a patch, it would be good if you checked all names and
determined which aliases are missing.

While you are at it, you'll notice that the current version of the
character-sets database lists

Name: ISO-8859-15
MIBenum: 111
Source: ISO
Please see:
<http://www.iana.org/assignments/charset-reg/ISO-8859-15>
Alias: ISO_8859-15
Alias: Latin-9

so the "official" alias is "Latin-9", not "latin9". You may
want to ask the submitter of that entry why this inconsistency
was introduced.

Regards,
Martin
Dec 6 '06 #3
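
The check Martin suggests (compare the registry names against what Python accepts) could be sketched roughly as follows. The list holds only the three names from the ISO-8859-15 entry quoted above. The helper name `unresolved_names` is hypothetical, and the result depends on the Python version in use, so no output is shown.

```python
import codecs

# Names copied from the ISO-8859-15 registry entry quoted above.
IANA_ISO_8859_15 = ["ISO-8859-15", "ISO_8859-15", "Latin-9"]


def unresolved_names(names):
    """Return the names that codecs.lookup() does not recognise."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            # Python has neither a codec nor an alias under this name.
            missing.append(name)
    return missing


print(unresolved_names(IANA_ISO_8859_15))
```

Feeding the function every name and alias from the full registry file would give the list of candidates for a patch.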
Martin v. Löwis wrote:
> While you are at it, you'll notice that the current version of the
> character-sets database lists
>
> Name: ISO-8859-15
> MIBenum: 111
> Source: ISO
> Please see:
> <http://www.iana.org/assignments/charset-reg/ISO-8859-15>
> Alias: ISO_8859-15
> Alias: Latin-9
>
> so the "official" alias is "Latin-9", not "latin9". You may
> want to ask the submitter of that entry why this inconsistency
> was introduced.

Unfortunately, I got no reply and I really cannot see any reason for
this inconsistency; probably it was a mistake or carelessness.

According to http://recode.progiciels-bpi.ca/manual/Tabular.html,
"l9 and latin9 are aliases for this charset. Source: ISO 2375 registry."

So I think it cannot hurt to add latin9 as an alias name. "Latin-9" will
then be recognized automatically since I think capitalization and
hyphens do not matter anyway (I'll check that).

Shall I proceed with writing such a patch? Shall I also add latin0 and l0,
which are other unofficial aliases?

-- Christoph

Dec 15 '06 #4
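
The "I'll check that" part can be tried with a short sketch. As far as the documented behaviour of `encodings.normalize_encoding` goes, runs of punctuation are collapsed into a single underscore rather than dropped, and the lookup itself is case-insensitive. A hyphenated spelling such as "Latin-9" would therefore be searched as `latin_9`, which is not the same key as `latin9`; whether it resolves depends on the alias table of the Python version in use, so the sketch only reports the result. The helper `resolves_to` is a hypothetical name.

```python
import codecs
import encodings


def resolves_to(name):
    """Return the canonical codec name for `name`, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Different capitalisation and punctuation, same codec.
spellings = ["ISO-8859-15", "iso_8859-15", "ISO8859-15", "Iso8859_15"]
canonical = [resolves_to(s) for s in spellings]
for spelling, name in zip(spellings, canonical):
    print(spelling, "->", name)

# Hyphens become underscores during normalisation; they are not removed.
print(encodings.normalize_encoding("latin-9"))

# Reported without any assumption about the outcome.
print("Latin-9", "->", resolves_to("Latin-9"))
```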
Christoph Zwerschke wrote:
> Shall I proceed writing such a patch? Shall I also add latin0 and l0
> which are other unofficial aliases?

Sure, go ahead. I see no need for the latin0/l0 aliases, though: they
predate the formal adoption of iso-8859-15, and should be phased out
by now (I'm sure that somebody will provide an example of software
that still uses them, but I likely won't find that single example
convincing).

Regards,
Martin
Dec 16 '06 #5
Martin v. Löwis wrote:
> Christoph Zwerschke wrote:
>> Shall I proceed writing such a patch? Shall I also add latin0 and l0
>> which are other unofficial aliases?
>
> Sure, go ahead. I see no need for the latin0/l0 aliases, though: they
> predate the formal adoption of iso-8859-15, and should be phased out
> by now (I'm sure that somebody will provide an example of software
> that still uses them, but I likely won't find that single example
> convincing).

Ok, I'll add the alias for latin9, the completely missing latin10, and
will also check whether anything else is missing. But
I'll probably only get round to doing so after the Christmas holidays.

-- Christoph
Dec 17 '06 #6
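
For readers who need an extra alias before any such patch lands, a runtime workaround can be sketched as below. It assumes that the `encodings.aliases.aliases` dictionary is consulted on the first lookup of a name, which is how the standard `encodings` package behaves as far as I know. The alias `latin9_sketch` is a made-up name for this demonstration only. The patch discussed in the thread would instead edit the alias table in the standard library itself.

```python
import codecs
import encodings.aliases

# "latin9_sketch" is a hypothetical alias; the value is the name of the
# codec module inside the encodings package that it should map to.
encodings.aliases.aliases["latin9_sketch"] = "iso8859_15"

# The new name now resolves to the ISO-8859-15 codec.
info = codecs.lookup("latin9_sketch")
encoded_euro = "\u20ac".encode("latin9_sketch")

print(info.name, encoded_euro)
```

The entry has to be added before the name is looked up for the first time, because failed lookups may be cached.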
