pythonXX.dll size: please split CJK codecs out

Hello,

python24.dll is much bigger than python23.dll. This was discussed already on
the newsgroup, see the thread starting here:
http://mail.python.org/pipermail/pyt...ly/229096.html

I don't think I fully understand the reason why additional .pyd modules were
built into the .dll. OTOH, this does not help anyone, since:

- Normal users don't care about the size of the pythonXX.dll, or the number of
dependencies, nor whether a given module is shipped as .py or .pyd. They just import
modules of the standard library, ignoring where each module resides. So,
putting more modules (or fewer modules) within pythonXX.dll makes absolutely no
difference for them.
- Users who freeze applications instead are *worse* served by this, because
they end up with larger programs. For them, it is better to have the highest
granularity wrt external modules, so that the resulting frozen application is
as small as possible.
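To check where a given module actually lives, here is a minimal sketch for a modern Python 3 interpreter (the 2.4-era tools differed; `where_is` and the sample names are only illustrative). It reports whether a module is compiled into the interpreter itself, which on Windows means pythonXY.dll, or is loaded from a separate extension file such as a .pyd:

```python
import importlib.machinery
import importlib.util
import sys


def where_is(module_name):
    """Report how the running interpreter provides `module_name`."""
    if module_name in sys.builtin_module_names:
        return "built-in"  # compiled into the interpreter (pythonXY.dll on Windows)
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin == "frozen":
        return "frozen"  # bytecode embedded in the interpreter binary
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension file"  # a separate .pyd (Windows) or .so (Unix)
    return "python source/bytecode"


if __name__ == "__main__":
    for name in ("sys", "zlib", "_multibytecodec", "json"):
        print(f"{name:16} {where_is(name)}")
```

Which category zlib or the CJK base module `_multibytecodec` falls into depends on how the particular build was configured, so the output differs between platforms and versions.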

A post in the previous thread (specifically
http://mail.python.org/pipermail/pyt...ly/229157.html) suggests
that py2exe users might get a small benefit from the fact that in some cases
they would be able to ship the program with only 3 files (app.exe,
python24.dll, and library.zip). But:

1) I reckon this is a *very* rare case. You need to write an application that
does not use Tk, socket, zlib, or expat, nor any external library like numarray or
PIL.
2) Even if you fit the above case, you still end up with 3 files, which means
you still have to package your app somehow, etc. Also, the resulting package
will be *bigger* for no reason, as python24.dll might include modules which the
user doesn't need.

I don't think that merging things into python24.dll is a good way to serve
users of freezing tools, not even py2exe users. Personally, I use McMillan's
PyInstaller[1] which always builds a single executable, no matter what. So I do
not like the idea that things are getting worse because of py2exe: py2exe
should be fixed instead, if its users request to have fewer files to ship (in
my case, for instance, this missing feature is a showstopper for adopting
py2exe).

Can we at least undo this unfortunate move in time for 2.5? I would be grateful
if *at least* the CJK codecs (which are about 1Mb in size) are split out of
python25.dll. IMHO, I would prefer having *more* granularity, rather than
*less*.

+1 on splitting out the CJK codecs.

Thanks,
Giovanni Bajo
[1] See also my page on PyInstaller: http://www.develer.com/oss/PyInstaller
Aug 20 '05 #1
Giovanni Bajo wrote:
> I don't think I fully understand the reason why additional .pyd modules were
> built into the .dll. OTOH, this does not help anyone, since:
The reason is simple: a single DLL is easier to maintain. You only need
to add the new files to the VC project, edit config.c, and be done. No
new project to create for N different configurations, no messing with
the MSI builder.

In addition, having everything in a single DLL speeds up Python startup
a little, since less file searching is necessary.
> Can we at least undo this unfortunate move in time for 2.5? I would be grateful
> if *at least* the CJK codecs (which are about 1Mb in size) are split out of
> python25.dll. IMHO, I would prefer having *more* granularity, rather than
> *less*.


If somebody would formulate a policy (i.e. conditions under which
modules go into python2x.dll, vs. going into separate files), I'm
willing to implement it. This policy should best be formulated in
a PEP.

The policy should be flexible wrt. future changes. I.e. it should
*not* say "do everything as in Python 2.3", because this means I
would have to rip out the modules added after 2.3 entirely (i.e.
not ship them at all). Instead, the policy should give clear guidance
even for modules that are not yet developed.

It should be a PEP, so that people can comment. For example,
I think I would be -1 on a policy "make python2x.dll as minimal
as possible, containing only modules that are absolutely
needed for startup".

Regards,
Martin
Aug 21 '05 #2
Martin v. Löwis wrote:
>> I don't think I fully understand the reason why additional .pyd
>> modules were built into the .dll. OTOH, this does not help anyone,
>> since:
> The reason is simple: a single DLL is easier to maintain. You only
> need to add the new files to the VC project, edit config.c, and be
> done. No new project to create for N different configurations, no
> messing with the MSI builder.


FWIW, this just highlights how inefficient your build system is. Everything you
currently do by hand could be automated, including MSI generation. Also, you
describe the Windows procedure, which I suppose does not take into account
what needs to be done for other OSes. But I'm sure that revamping the Python
build system is not a piece of cake.

I'll take the point though: it's easier to maintain for developers, and most
Python users don't care.
> In addition, having everything in a single DLL speeds up Python
> startup a little, since less file searching is necessary.


I highly doubt this can be noticed in an actual benchmark, but I could be
wrong. I can produce numbers though, if this can help people decide.
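For rough numbers, one option (a sketch, not a rigorous benchmark) is to time repeated launches of an interpreter that does nothing. Comparing a monolithic DLL against a split build would mean pointing `python_exe` at each build in turn; the function name and run count are arbitrary choices:

```python
import statistics
import subprocess
import sys
import time


def startup_times(python_exe=sys.executable, runs=5):
    """Return wall-clock times in seconds for `runs` launches of `python_exe -c pass`."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([python_exe, "-c", "pass"], check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    samples = startup_times()
    print(f"median startup: {statistics.median(samples) * 1000:.1f} ms over {len(samples)} runs")
```

On Python 3.7 and later, `python -X importtime -c pass` additionally prints a per-module breakdown of import time, which is closer to the file-searching cost under discussion.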
>> Can we at least undo this unfortunate move in time for 2.5? I would
>> be grateful if *at least* the CJK codecs (which are about 1Mb in size)
>> are split out of python25.dll. IMHO, I would prefer having *more*
>> granularity, rather than *less*.


> If somebody would formulate a policy (i.e. conditions under which
> modules go into python2x.dll, vs. going into separate files), I'm
> willing to implement it. This policy should best be formulated in
> a PEP.
>
> The policy should be flexible wrt. future changes. I.e. it should
> *not* say "do everything as in Python 2.3", because this means I
> would have to rip out the modules added after 2.3 entirely (i.e.
> not ship them at all). Instead, the policy should give clear guidance
> even for modules that are not yet developed.
>
> It should be a PEP, so that people can comment. For example,
> I think I would be -1 on a policy "make python2x.dll as minimal
> as possible, containing only modules that are absolutely
> needed for startup".


I'm willing to write up such a PEP, but it's hard to devise a universal
policy. Basically, the only element we can play with is the size of the
resulting binary for the module. Would you like a policy like "split out every
module whose binary on Windows is > X kbytes"?
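The size-threshold policy can be stated mechanically. In this hypothetical sketch the module sizes are invented placeholders, not measurements of any real python24.dll, and `threshold_kb` plays the role of X:

```python
def split_by_size(module_sizes_kb, threshold_kb):
    """Partition modules: those at or below the threshold stay in the DLL,
    larger ones are built as separate extension files."""
    in_dll = sorted(m for m, kb in module_sizes_kb.items() if kb <= threshold_kb)
    separate = sorted(m for m, kb in module_sizes_kb.items() if kb > threshold_kb)
    return in_dll, separate


# Hypothetical sizes in KB, for illustration only.
EXAMPLE_SIZES = {"_sre": 60, "_csv": 30, "_codecs_jp": 300, "_codecs_cn": 200}

if __name__ == "__main__":
    keep, split_out = split_by_size(EXAMPLE_SIZES, threshold_kb=100)
    print("in DLL:  ", keep)
    print("separate:", split_out)
```

Modules exactly at the threshold stay in the DLL here; whether the boundary is inclusive is one of the details a PEP would have to pin down.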

My personal preference would go to something like "make python2x.dll include only
the modules which are really core, like sys and os". This would also provide
guidance to future modules, as they would simply go in external modules (I
don't think really core stuff is being added right now).

At this point, my main goal is getting CJK out of the DLL, so everything that
lets me achieve this goal is good for me.

Thanks,
--
Giovanni Bajo
Aug 21 '05 #3
Giovanni Bajo wrote:

> FWIW, this just highlights how inefficient your build system is. Everything you
> currently do by hand could be automated, including MSI generation.
I'm sure Martin would be happy to consider a patch to make the build
system more efficient. :)
> I'm willing to write up such a PEP, but it's hard to devise a universal
> policy.


This is the reason that a PEP is needed before there are changes.
--
Michael Hoffman
Aug 21 '05 #4
Michael Hoffman wrote:
>> FWIW, this just highlights how inefficient your build system is.
>> Everything you currently do by hand could be automated, including
>> MSI generation.


> I'm sure Martin would be happy to consider a patch to make the build
> system more efficient. :)

Out of curiosity, was this ever discussed among Python developers? Would
something like scons qualify for this? OTOH, scons opens nasty
self-bootstrapping issues (since it is itself written in Python).

Before considering a patch (or even a PEP) for this, the basic requirements
should be made clear. I know portability among several UNIX flavours is one,
for instance. What are the others?
--
Giovanni Bajo
Aug 21 '05 #5
Giovanni Bajo wrote:
> FWIW, this just highlights how inefficient your build system is. Everything you
> currently do by hand could be automated, including MSI generation. Also, you
> describe the Windows procedure, which I suppose does not take into account
> what needs to be done for other OSes. But I'm sure that revamping the Python
> build system is not a piece of cake.
You are wrong. It is not true that everything I do by hand could be
automated. At least after automation, I still would have to do things
by hand, namely invoke the automation.

You probably haven't looked at the MSI generation at all: it *is*
automatic. However, every time something changes in the structure,
the code generating the MSI must be adjusted to the new structure.
> I'll take the point though: it's easier to maintain for developers, and most
> Python users don't care.
See, this I find surprising. If there really is such a big need for
python24.dll being split into many more modules, why doesn't anybody
just do this and offer it as a separate installation for use
with py2exe?

The fact that this hasn't happened indicates that users don't need
it badly enough. I personally rarely need to create a standalone
Python application, but when I did, I just used freeze, and static
linking. That way, I got a single binary, with no magic packaging,
and a minimal one, too.
>> In addition, having everything in a single DLL speeds up Python
>> startup a little, since less file searching is necessary.

> I highly doubt this can be noticed in an actual benchmark, but I could be
> wrong. I can produce numbers though, if this can help people decide.


No, this is a minor issue. If you do write a PEP, and you find it
relatively easy to compare the maximum modularization to the minimal
one, it would be useful to underline your point, of course.
> I'm willing to write up such a PEP, but it's hard to devise a universal
> policy.
Indeed. For Python 2.4, I made up a policy for myself: everything that
does not depend on a separate (non-system) library goes into
pythonxy.dll. That way, everybody will be able to compile Python
from sources without downloading anything else, yet it causes minimum
maintenance overhead. That's how the current python24.dll came about.
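The rule can be written as a one-line decision. The dependency table below is hypothetical and only illustrative (it is not read from the real build files); the module names are CPython extension modules:

```python
# Hypothetical table: module -> non-system libraries it needs at build time.
EXTERNAL_LIBS = {
    "_sre": [],
    "_multibytecodec": [],
    "_codecs_jp": [],
    "zlib": ["zlib"],
    "_bsddb": ["Berkeley DB"],
    "_ssl": ["OpenSSL"],
    "_tkinter": ["Tcl/Tk"],
}


def place_modules(external_libs):
    """Apply the 2.4 rule: no external library -> pythonxy.dll, otherwise its own .pyd."""
    return {
        module: "pythonxy.dll" if not libs else module + ".pyd"
        for module, libs in external_libs.items()
    }


if __name__ == "__main__":
    for module, target in place_modules(EXTERNAL_LIBS).items():
        print(f"{module:16} -> {target}")
```

Under this rule the CJK codec modules, which need no external library, end up inside pythonxy.dll, which is exactly the outcome Giovanni objects to.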
> Basically, the only element we can play with is the size of the
> resulting binary for the module. Would you like a policy like "split out every
> module whose binary on Windows is > X kbytes"?
It's less important what I like - I think I would ask for a poll on
the proposed PEP, and I would be -1 on anything that means more work
for contributors. But that would be only one voice, and, if a majority
of the Windows Python users preferred your policy, it would be
implemented (of course, somebody contributing the resulting project
files or some automation for them would also help).
> My personal preference would go to something like "make python2x.dll include only
> the modules which are really core, like sys and os". This would also provide
> guidance to future modules, as they would simply go in external modules (I
> don't think really core stuff is being added right now).


Ok, then write that into the PEP. You would have to provide a definition
for "core", e.g. "everything that is needed for startup".

As a guideline, the Unix build process currently includes only the
following modules by default:

- marshal, imp, __main__, __builtin__, sys, exceptions: Modules
living in Python/*.c
- gc, signal: invoked directly from the interpreter
- thread: not sure
- posix, errno, _sre, _codecs, so that setup.py can run
- zipimport, to avoid bootstrapping problems for importing python24.zip
- _symtable, because setup.py cannot get the dependencies right
- xxsubtype, for an undocumented reason I forgot

Regards,
Martin
Aug 21 '05 #6
Giovanni Bajo wrote:
>> I'm sure Martin would be happy to consider a patch to make the build
>> system more efficient. :)
> Out of curiosity, was this ever discussed among Python developers? Would
> something like scons qualify for this? OTOH, scons opens nasty
> self-bootstrapping issues (since it is itself written in Python).


No. The Windows build system must be integrated with Visual Studio.
(Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")

When developing on Windows, you really want all the support you can
get from VS, e.g. when debugging, performing name completion, etc.
To me, this makes it likely that only VS project files will work.
> Before considering a patch (or even a PEP) for this, the basic requirements
> should be made clear. I know portability among several UNIX flavours is one,
> for instance. What are the others?


Clearly, the starting requirement would be that you look at the build
process *at all*. The Windows build process and the Unix build process
are completely different. Portability is desirable only for the Unix
build process; however, you might find that it already meets your needs
quite well.

Regards,
Martin
Aug 21 '05 #7
Martin v. Löwis wrote:
>> Out of curiosity, was this ever discussed among Python developers?
>> Would something like scons qualify for this? OTOH, scons opens nasty
>> self-bootstrapping issues (since it is itself written in Python).
> No. The Windows build system must be integrated with Visual Studio.
> (Perhaps this is rather, "dunno: is it integrated with VS.NET 2003?")
> When developing on Windows, you really want all the support you can
> get from VS, e.g. when debugging, performing name completion, etc.
> To me, this makes it likely that only VS project files will work.


You seem to ignore the fact that scons can easily generate VS.NET projects. And
it does that by parsing the same file it could use to build the project
directly (by invoking your Visual Studio); and that very same file would be the
same under both Windows and UNIX.

And even if we disabled this feature and built the project directly from
the command line, you could still edit your files in the Visual Studio
environment and debug them there (since you are still compiling them with
Visual C; it's just scons invoking the compiler). You could even set up the
environment so that when you press CTRL+SHIFT+B (or F7, if you have the old
keybinding), it invokes scons and builds the project.

So, if the requirement is "integration with Visual Studio", that is not an
obstacle to switching to a different build process.
>> Before considering a patch (or even a PEP) for this, the basic
>> requirements should be made clear. I know portability among several
>> UNIX flavours is one, for instance. What are the others?

> Clearly, the starting requirement would be that you look at the build
> process *at all*.


I compiled Python several times under Windows (both 2.2.x and 2.3.x) using
Visual Studio 6, and once under Linux. But I never investigated the build
process in detail.
> The Windows build process and the Unix build process
> are completely different.
But there is no technical reason why it has to be so. I work on several
portable projects, and they use the same build process under both Windows and
Unix, while retaining full Visual Studio integration (I myself am a Visual
Studio user).
> Portability is desirable only for the Unix
> build process; however, you might find that it already meets your
> needs quite well.


Well, you came up with a maintenance problem: you told me that building more
external modules needs more effort. In a well-configured and fully automated
build system, when you add a file you have to write its name only once in a
project description file; if you want to build a dynamic library, you have to
add a single line. This would take care of both Windows and UNIX, covering
compilation, packaging, and installation.
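A rough illustration of the single-description idea, under stated assumptions: the `MODULES` list format and both helper functions are hypothetical, and only the generated C follows the Python 2 config.c conventions (`init<name>` functions and a `_PyImport_Inittab` table ending in a `{0, 0}` sentinel). Moving a module out of the DLL becomes a one-flag change:

```python
# Hypothetical single-source description: each module is listed exactly once,
# with a flag saying whether it is linked into pythonXY.dll or built separately.
MODULES = [
    {"name": "_sre", "sources": ["Modules/_sre.c"], "builtin": True},
    {"name": "_codecs_jp", "sources": ["Modules/cjkcodecs/_codecs_jp.c"], "builtin": False},
    {"name": "zlib", "sources": ["Modules/zlibmodule.c"], "builtin": False},
]


def config_c_fragment(modules):
    """Render Python 2-style config.c declarations and _PyImport_Inittab
    entries for the modules flagged as built in."""
    names = [m["name"] for m in modules if m["builtin"]]
    lines = [f"extern void init{name}(void);" for name in names]
    lines += ["", "struct _inittab _PyImport_Inittab[] = {"]
    lines += [f'    {{"{name}", init{name}}},' for name in names]
    lines += ["    {0, 0}", "};"]
    return "\n".join(lines)


def extension_targets(modules, suffix=".pyd"):
    """Map each separately built module to its output file and source list."""
    return {m["name"] + suffix: m["sources"] for m in modules if not m["builtin"]}


if __name__ == "__main__":
    print(config_c_fragment(MODULES))
    print(extension_targets(MODULES))
```

This covers only the list of built-in modules and output files; it does not attempt per-platform library settings.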
--
Giovanni Bajo
Aug 21 '05 #8
Martin v. Löwis wrote:
>> Can we at least undo this unfortunate move in time for 2.5? I would be grateful
>> if *at least* the CJK codecs (which are about 1Mb in size) are split out of
>> python25.dll. IMHO, I would prefer having *more* granularity, rather than
>> *less*.
> If somebody would formulate a policy (i.e. conditions under which
> modules go into python2x.dll, vs. going into separate files), I'm
> willing to implement it. This policy should best be formulated in
> a PEP.


+1 Yes, I think this needs to be addressed.
> The policy should be flexible wrt. future changes. I.e. it should
> *not* say "do everything as in Python 2.3", because this means I
> would have to rip out the modules added after 2.3 entirely (i.e.
> not ship them at all). Instead, the policy should give clear guidance
> even for modules that are not yet developed.
Agree.
> It should be a PEP, so that people can comment. For example,
> I think I would be -1 on a policy "make python2x.dll as minimal
> as possible, containing only modules that are absolutely
> needed for startup".
Also agree. Both the minimal and the maximal possible DLL sizes are
ideals that are not the best choices in practice.

I would put the starting minimum boundary as:

1. "The minimum required to start the python interpreter with no
additional required files."

Currently python 2.4 (on windows) does not yet meet that guideline, so
it seems some modules still need to be added, while other modules (I
haven't checked which) are probably not needed to meet that guideline.

This could be extended to:

2. "The minimum required to run an agreed upon set of simple Python
programs."

I expect there may be a lot of differing opinions on just what those
minimum Python programs should be. But that is where the PEP process
comes in.
Regards,
Ron

Aug 21 '05 #9
Giovanni Bajo wrote:
> You seem to ignore the fact that scons can easily generate VS.NET projects.
I'm not ignoring it - I'm not aware of it. And also, I don't quite
believe it until I see it.
> But there is no technical reason why it has to be so. I work on several
> portable projects, and they use the same build process under both Windows and
> Unix, while retaining full Visual Studio integration (I myself am a Visual
> Studio user).
Well, as long as "F6" works...
> Well, you came up with a maintenance problem: you told me that building more
> external modules needs more effort. In a well-configured and fully automated
> build system, when you add a file you have to write its name only once in a
> project description file; if you want to build a dynamic library, you have to
> add a single line. This would take care of both Windows and UNIX, covering
> compilation, packaging, and installation.

I very much doubt this is possible. For some modules, you also need to
create autoconf fragments on Unix, for example, and you might need
to specify different libraries on different systems.

Regards,
Martin
Aug 21 '05 #10
Ron Adam wrote:
> I would put the starting minimum boundary as:
>
> 1. "The minimum required to start the python interpreter with no
> additional required files."
>
> Currently python 2.4 (on windows) does not yet meet that guideline, so
> it seems some modules still need to be added, while other modules (I
> haven't checked which) are probably not needed to meet that guideline.
I'm not sure, either, but I *think* python24 won't load any .pyd file
on interactive startup.
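One way to test that guess on a modern interpreter is to start a fresh process and list which already-loaded modules came from extension files. This is a sketch: it uses `-c` rather than a truly interactive session (which may also import modules such as readline), and it assumes the child interpreter uses the same extension suffixes as the one running the script:

```python
import ast
import importlib.machinery
import subprocess
import sys


def extension_modules_at_startup(python_exe=sys.executable):
    """Start a fresh interpreter and return, sorted, the names of the modules it
    has already loaded from extension files (.pyd/.so) when `-c` code starts."""
    # The child imports only sys, so the snapshot reflects interpreter startup.
    child_code = (
        "import sys; "
        "print(repr({n: getattr(m, '__file__', None) "
        "for n, m in list(sys.modules.items())}))"
    )
    result = subprocess.run(
        [python_exe, "-c", child_code], capture_output=True, text=True, check=True
    )
    files = ast.literal_eval(result.stdout)  # {module name: file path or None}
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(name for name, path in files.items() if path and path.endswith(suffixes))


if __name__ == "__main__":
    print(extension_modules_at_startup() or "no extension modules loaded")
```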
> This could be extended to:
>
> 2. "The minimum required to run an agreed upon set of simple Python
> programs."
>
> I expect there may be a lot of differing opinions on just what those
> minimum Python programs should be. But that is where the PEP process
> comes in.


As I mentioned earlier, there also should be a negative list: modules
that depend on external libraries should not be incorporated into
python24.dll. Most notably, this rules out zlib.pyd, _bsddb.pyd,
and _ssl.pyd, all of which people may consider to be useful in these
simple programs.

Regards,
Martin
Aug 21 '05 #11
Martin v. Löwis wrote:
> Ron Adam wrote:
>> I would put the starting minimum boundary as:
>>
>> 1. "The minimum required to start the python interpreter with no
>> additional required files."
>>
>> Currently python 2.4 (on windows) does not yet meet that guideline, so
>> it seems some modules still need to be added, while other modules (I
>> haven't checked which) are probably not needed to meet that guideline.

> I'm not sure, either, but I *think* python24 won't load any .pyd file
> on interactive startup.

>> This could be extended to:
>>
>> 2. "The minimum required to run an agreed upon set of simple Python
>> programs."
>>
>> I expect there may be a lot of differing opinions on just what those
>> minimum Python programs should be. But that is where the PEP process
>> comes in.

> As I mentioned earlier, there also should be a negative list: modules
> that depend on external libraries should not be incorporated into
> python24.dll.


This fits under rule #1 above, of not needing additional files.
> Most notably, this rules out zlib.pyd, _bsddb.pyd, and _ssl.pyd, all of which
> people may consider to be useful in these simple programs.
I would not consider those as being part of "simple" programs. But
that's only an opinion and we need something more objective than opinion.

Now that I think of it... Rule 2 above should be:

2. "The minimum (modules) required to run an agreed-upon set of
'common simple' programs."

Frequency of use is also an important consideration.

Maybe there's a way to classify a program's complexity based on a set of
attributes.

So... program simplicity could consider:

1. Complete program is a single .py file.
2. Not larger than 'n' lines. (some reasonable limit)
3. Limited number of import statements.
(fewer than 'n' modules imported)
4. Uses only stdio and/or basic file operations for input
and output. (runs in interactive console or command line.)

Then ranking the frequency of imported modules from this set of programs
could give a good hint as to what might be included and those less
frequently used that may be excluded.
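Ron's ranking idea can be sketched with the standard `ast` module. Assumptions: each source string is a whole program (criterion 1), `max_lines` and `max_imports` are placeholder values for his 'n' (criteria 2 and 3), and criterion 4 is not checked at all; only absolute imports are counted, by their top-level package:

```python
import ast
from collections import Counter


def top_level_imports(source):
    """Return the set of top-level module names that `source` imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def is_simple(source, max_lines=200, max_imports=5):
    """Criteria 2 and 3 from the post; the limits stand in for Ron's 'n'."""
    return (len(source.splitlines()) <= max_lines
            and len(top_level_imports(source)) <= max_imports)


def rank_imports(sources, max_lines=200, max_imports=5):
    """Count how many of the simple programs import each module, most common first."""
    counts = Counter()
    for source in sources:
        if is_simple(source, max_lines, max_imports):
            counts.update(top_level_imports(source))
    return counts.most_common()


if __name__ == "__main__":
    programs = [
        "import sys\nimport os\nprint(sys.argv)\n",
        "import os, re\nprint(os.getcwd())\n",
        "from collections import Counter\nimport sys\n",
    ]
    for module, count in rank_imports(programs):
        print(module, count)
```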

Setting a pythonxx.dll minimum file size goal could further help. For
example, if the result after excluding modules is smaller than the size goal,
a few extra, more frequently used modules could be included as a bonus.
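The size-goal refinement could then be a greedy pass over that ranking. This is one possible reading of the proposal, with hypothetical module names and sizes; a greedy pass is simple but does not guarantee the best use of the budget:

```python
def fill_to_budget(ranked_modules, sizes_kb, budget_kb, base_kb=0):
    """Starting from the core DLL size (`base_kb`), walk modules from most to
    least frequently used and include each one that keeps the total within
    `budget_kb`. Returns (included modules, resulting total in KB)."""
    included, total = [], base_kb
    for module in ranked_modules:
        size = sizes_kb[module]
        if total + size <= budget_kb:
            included.append(module)
            total += size
    return included, total


if __name__ == "__main__":
    # Hypothetical module names and sizes, for illustration only.
    ranking = ["mod_a", "mod_b", "mod_c", "mod_d"]
    sizes = {"mod_a": 40, "mod_b": 50, "mod_c": 30, "mod_d": 5}
    print(fill_to_budget(ranking, sizes, budget_kb=100))
```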

This is obviously a "practical beats purity" exercise. ;-)

Cheers,
Ron

Aug 21 '05 #12
"Martin v. Löwis" <ma****@v.loewis.de> writes:
> Ron Adam wrote:
>> I would put the starting minimum boundary as:
>>
>> 1. "The minimum required to start the python interpreter with no
>> additional required files."
>>
>> Currently python 2.4 (on windows) does not yet meet that guideline, so
>> it seems some modules still need to be added, while other modules (I
>> haven't checked which) are probably not needed to meet that guideline.

> I'm not sure, either, but I *think* python24 won't load any .pyd file
> on interactive startup.


That seems to be true. But it will need zlib.pyd as soon as you try to
import from compressed zips. So, zlib can be thought of as part of the
modules required for bootstrap.
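A small sketch of the distinction Thomas draws, using Python 3's zipfile and zipimport. Entries written with `ZIP_STORED` are uncompressed, so importing them should not require zlib; with `ZIP_DEFLATED`, both writing the archive and decompressing entries on import depend on zlib being available. The module name and archive name are arbitrary:

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_stored_zip(module_name="zipdemo_mod"):
    """Build a one-module archive with ZIP_STORED (no compression, so reading it
    back does not need zlib), import the module from it, and return the module."""
    with tempfile.TemporaryDirectory() as tmpdir:
        archive = os.path.join(tmpdir, "bundle.zip")
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
            zf.writestr(f"{module_name}.py", "ANSWER = 42\n")
        sys.path.insert(0, archive)  # zipimport handles zip files placed on sys.path
        try:
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(archive)


if __name__ == "__main__":
    mod = import_from_stored_zip()
    print(mod.__file__, mod.ANSWER)
```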

Thomas
Aug 22 '05 #13
Thomas Heller wrote:
> That seems to be true. But it will need zlib.pyd as soon as you try to
> import from compressed zips. So, zlib can be thought of as part of the
> modules required for bootstrap.


Right. OTOH, linking zlib to pythonXY means that you cannot build Python
at all anymore unless you also have zlib available.

Regards,
Martin
Aug 22 '05 #14
Giovanni Bajo wrote:
> Hello,
>
> python24.dll is much bigger than python23.dll. This was discussed already on
> the newsgroup, see the thread starting here:
> http://mail.python.org/pipermail/pyt...ly/229096.html
>
> I don't think I fully understand the reason why additional .pyd modules were
> built into the .dll. OTOH, this does not help anyone, since:
>
> - Normal users don't care about the size of the pythonXX.dll, or the number of
> dependencies, nor whether a given module is shipped as .py or .pyd. They just import
> modules of the standard library, ignoring where each module resides. So,
> putting more modules (or fewer modules) within pythonXX.dll makes absolutely no
> difference for them.
> - Users who freeze applications instead are *worse* served by this, because
> they end up with larger programs. For them, it is better to have the highest
> granularity wrt external modules, so that the resulting frozen application is
> as small as possible.

<snip>
1.8Mb - life's too short. What would you gain from removing 1Mb
from that? So it can fit on a floppy? ;-) That would be more effort
than is needed, IMHO; even my handy/mobile phone/cell phone can easily
cope with 1.8Mb!

Neil

--

Neil Benn
Senior Automation Engineer
Cenix BioScience
BioInnovations Zentrum
Tatzberg 47
D-01307
Dresden
Germany

Tel : +49 (0)351 4173 154
e-mail : be**@cenix-bioscience.com
Cenix Website : http://www.cenix-bioscience.com

Aug 23 '05 #15
