Codecs

Hi All--
As far as I can tell, after looking only at the documentation (and not
searching peps etc.), you cannot query the codecs to give you a list of
registered codecs, or a list of possible codecs it could retrieve for
you if you knew enough to ask for them by name.

Why not? It seems to me that if I want to try to read an unknown file
using an exhaustive list of possible encodings, the best place to keep
the most current list is the codec registry itself, not in the
documentation for the codec module.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Jul 21 '05 #1
Ivan Van Laningham wrote:
> Hi All--
> As far as I can tell, after looking only at the documentation (and not
> searching peps etc.), you cannot query the codecs to give you a list of
> registered codecs, or a list of possible codecs it could retrieve for
> you if you knew enough to ask for them by name.
>
> Why not?

There are several answers to that question. Which of them is true,
I don't know. In order of likelihood:
1. When the API was designed, that functionality was forgotten.
It was not possible to add it later on (because of 2).
2. Registration builds on the notion of lookup functions. The
lookup function gets a codec name, and either succeeds in
finding the codec, or fails (it returns None, and codecs.lookup()
then raises LookupError).
Now, a lookup function, in principle, might not "know" in
advance what codecs it supports, or the number of encodings
it supports might not be finite. So asking such a lookup
function for the complete list of codecs might not be
implementable.

As an example of a lookup function that doesn't know what
encodings it supports, look at my iconv module. This module
provides all codecs that iconv_open(3) supports, yet there
is no standard way to query the iconv library in advance
for a list of all supported codecs.

As an example of a lookup function that supports an infinite
number of codecs, consider the (theoretical) encrypt/password
encoding, which encrypts a string with a password, and the
password is part of the codec name. Each password defines
a new encoding, and there is an infinite number of them.
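To make the infinite case concrete, here is a minimal Python 3 sketch of such a password-style lookup function. Everything in it is hypothetical: the codec family name "xorpw", the helper names, and the XOR "cipher" (a toy, not real encryption). The point is that codecs.register() only ever hands the function one name at a time, and the function accepts any name of the form xorpw-<key>, so there is no finite list it could report. The registry normalizes names before calling search functions (lowercasing them and, since Python 3.9, converting hyphens and spaces to underscores), so the sketch accepts either separator and the key is effectively lowercase.

```python
import codecs

# Hypothetical codec family: "xorpw-<key>" (seen as "xorpw_<key>" after normalization).
PREFIXES = ("xorpw_", "xorpw-")


def _xor(data: bytes, key: bytes) -> bytes:
    """Toy keyed transform (NOT encryption): XOR each byte with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def _make_codec(key: bytes) -> codecs.CodecInfo:
    def encode(text, errors="strict"):
        # Codec functions return (output, number of input items consumed).
        return _xor(text.encode("utf-8", errors), key), len(text)

    def decode(data, errors="strict"):
        data = bytes(data)
        return _xor(data, key).decode("utf-8", errors), len(data)

    return codecs.CodecInfo(encode, decode, name="xorpw")


def xorpw_search(name):
    """Search function: receives one normalized name, returns CodecInfo or None."""
    for prefix in PREFIXES:
        if name.startswith(prefix) and len(name) > len(prefix):
            # Every distinct key is a distinct codec, so the family is unbounded.
            return _make_codec(name[len(prefix):].encode("utf-8"))
    return None  # not ours; the registry goes on to other search functions


codecs.register(xorpw_search)

secret = "hello".encode("xorpw-secret")
print(secret.decode("xorpw-secret"))  # hello
```

Nothing in xorpw_search could answer "which names do you support?" short of enumerating every possible key.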

Now, had 1) been considered, it might have been possible to design
the API so that it could list codecs, at the cost of not supporting
all the cases that the current API supports. Alas, somebody must
have misplaced the time machine.
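Even so, the codecs that ship in the standard library's encodings package can be approximated in Python 3. The sketch below gathers candidate names from encodings.aliases.aliases and from the public modules in the package, then keeps those that codecs.lookup() accepts. It is a heuristic that assumes the standard layout of the encodings package; codecs provided by other lookup functions (such as the iconv module mentioned above, or a password encoding) never show up, which is exactly the limitation described in point 2. The result also mixes text encodings with non-text transforms.

```python
import codecs
import encodings
import encodings.aliases
import pkgutil


def stdlib_codec_names():
    """Approximate the codecs provided by the standard 'encodings' package."""
    # Candidate module names: alias targets plus every public module in the package.
    candidates = set(encodings.aliases.aliases.values())
    for _, modname, _ in pkgutil.iter_modules(encodings.__path__):
        if not modname.startswith("_"):
            candidates.add(modname)

    found = set()
    for name in candidates:
        try:
            info = codecs.lookup(name)
        except LookupError:
            # e.g. helper modules without getregentry(), or platform-only codecs
            continue
        # Prefer the codec's canonical name (e.g. "utf-8"); fall back to the module name.
        found.add(info.name or name)
    return sorted(found)


names = stdlib_codec_names()
print(len(names), names[:5])
```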

Regards,
Martin
Jul 21 '05 #2
Ivan Van Laningham wrote:
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...

Supposing such a list existed:

What do you mean by "unknown file"? That the encoding is unknown?

Possibility 1:
You are going to try to decode the file from "legacy" to Unicode --
until the first 'success' (defined how?)? But the file could be decoded
by *several* codecs into Unicode without an exception being raised. Just
a simple example: each of the encodings ['iso-8859-' + x for x in '12459']
defines a character for *all* 256 possible byte values, so any byte
string decodes under every one of them without error.
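A small Python 3 demonstration of this point: a naive "first success wins" loop over those five encodings always stops at the first candidate, even when the data is really in another one. The helper names are made up for this illustration, and the Cyrillic sample is assumed to be stored in iso-8859-5.

```python
LATIN_FAMILY = ["iso-8859-" + x for x in "12459"]


def first_success(data, candidates):
    """Naive strategy: the first encoding that decodes without an exception wins."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


def all_successes(data, candidates):
    """Every candidate that decodes data without error, mapped to its result."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            pass
    return results


sample = "привет".encode("iso-8859-5")  # Cyrillic text in a legacy encoding
print(first_success(sample, LATIN_FAMILY)[0])  # iso-8859-1, the wrong codec
print(sorted(all_successes(sample, LATIN_FAMILY)))  # all five "succeed"
```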

There are various language-guessing algorithms based on e.g. frequency
of n-grams ... try Google.
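For illustration only, a toy Python 3 version of the frequency idea: decode with each candidate, then score the resulting text against character-bigram counts taken from a reference sample of the expected language, using add-one smoothing. The reference sample, the function names, and the scoring formula are all assumptions for this sketch; a practical guesser would need a far larger reference corpus.

```python
import math
from collections import Counter


def bigrams(text):
    """Overlapping character pairs, e.g. 'abc' -> ['ab', 'bc']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(sample_text):
    """Bigram counts from a (hypothetical) reference text in the expected language."""
    counts = Counter(bigrams(sample_text))
    return counts, sum(counts.values())


def score(text, model):
    """Average add-one-smoothed log-probability per bigram; higher is more plausible."""
    counts, total = model
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen bigrams
    grams = bigrams(text)
    if not grams:
        return float("-inf")
    return sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams) / len(grams)


def guess_encoding(data, candidates, model):
    """Best-scoring candidate, or None if nothing decodes (or data is too short)."""
    best_enc, best_score = None, float("-inf")
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # a codec that fails is ruled out
        s = score(text, model)
        if s > best_score:
            best_enc, best_score = enc, s
    return best_enc


model = train("привет мир, это проверка")
data = "привет мир".encode("iso-8859-5")
print(guess_encoding(data, ["iso-8859-1", "iso-8859-5"], model))  # iso-8859-5
```

Unlike the "first success" loop, both candidates decode here; the statistics, not the absence of an exception, decide.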

Possibility 2:
You "know" the file is in a Unicode encoding, e.g. utf-8, have
successfully decoded it to Unicode, and are going to try to encode the
text in a "legacy" encoding, but you don't know which one is appropriate?
Sorry, same "But": several legacy encodings may encode it without error.
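The same ambiguity on the encoding side, as a short Python 3 sketch (the helper name is hypothetical): several legacy encodings can represent a given text, and in this case they even produce identical bytes, so "encodes without error" does not single one out.

```python
def encodable_in(text, candidates):
    """Return the candidate encodings that can represent text without error."""
    ok = []
    for enc in candidates:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue
        ok.append(enc)
    return ok


candidates = ["ascii", "iso-8859-1", "iso-8859-5", "iso-8859-15", "cp1252"]
print(encodable_in("café", candidates))  # ['iso-8859-1', 'iso-8859-15', 'cp1252']
```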

Jul 21 '05 #4
