Shrinky-dink Python (also, non-Unicode Python build is broken)

I'm an indie shareware Windows game developer. In indie shareware
game development, download size is terribly important; conventional
wisdom holds that--even today--your download should be 5MB or less.

I'd like to use Python in my games. However, python24.dll is 1.86MB,
and zips down to 877k. I can't afford to devote 1/6 of my download
to just the scripting interpreter; I've got music, and textures, and
my own crappy code to ship.

Following a friend's suggestion, as an experiment I downloaded the
Python 2.4.2 source, then set about stripping out everything I could.
I removed:
* Unicode support, including the CJK codecs
* All doc strings
* *Every* module written in C
Now when I build, python24.dll is 570k, and zips down to about 260k.
But I learned some things on the way.
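To see which C modules an interpreter has compiled in (and, on Windows, therefore carries inside pythonXY.dll), `sys.builtin_module_names` lists them. A minimal sketch written for a modern Python 3 interpreter; the helper name `builtin_modules` is our own, not a standard API. Whether the CJK codecs (`_codecs_cn`, `_codecs_jp`, ...) show up depends on the build: Windows builds have typically compiled them in, while many Unix builds ship them as separate shared libraries.

```python
import sys


def builtin_modules(prefix=""):
    """Return the sorted names of modules compiled into this interpreter,
    optionally keeping only names that start with `prefix`."""
    return sorted(name for name in sys.builtin_module_names
                  if name.startswith(prefix))


if __name__ == "__main__":
    print("compiled-in modules:", ", ".join(builtin_modules()))
    # CJK codec modules are named _codecs_cn, _codecs_jp, and so on;
    # the trailing underscore excludes the core _codecs module.
    print("compiled-in CJK codecs:", builtin_modules("_codecs_") or "none")
```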
First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows. The following breakages were all fixed with
judicious applications of #ifdef Py_USING_UNICODE:
* The implementation of "multi-byte codecs" (CJK codecs) implicitly
assumes that they can use all the Unicode facilities. So all the
files in "Modules/cjkcodecs" fail to build.
* Obviously, the Unicode string object depends on Unicode support,
so Objects/unicode* doesn't build.
* There are several spots in the code that need to handle Unicode
strings in some slightly special way, and assume Unicode is turned
on. E.g.:
  * Modules/posixmodule.c, posix__getfullpathname(), line 1745
  * same file, posix_open(), starting on line 5201
  * Objects/fileobject.c, open_the_file(), starting on line 158
  * _winreg.c, Py2Reg(), starting on lines 724 and 777

In addition, there was one slightly more complicated problem: _winreg.c
assumes it should call PyUnicode_DecodeMBCS() to turn strings pulled
from the registry into Unicode strings. I'm not sure what the correct
thing to do here is; I went with changing the calls from
PyUnicode_DecodeMBCS() to PyString_FromStringAndSize() for non-Unicode
builds.

Of course, it's not the most important thing in the world--after all,
I'm the first person to even *notice*, right? But it seems a shame that
one can break the build so easily. If it pleases the stewards of
Python, I would be happy to submit patches that fix the non-"using
Unicode" build.
Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*. If I want to excise the code for a module, it is not
sufficient to comment-out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function. As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot). What a nosebleed, huh?

So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce. It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time. I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source. However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, makes me think it's not a
terrible hardship if I had to re-hack up each new Python release
by hand.
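One shape such a compile-time facility could take is a single table from which both config.c and the list of files to compile are generated. The sketch below is hypothetical: `ALL_MODULES`, `generate_inittab` and `sources_to_compile` are invented names, it emits Python 2-style `init<name>` prototypes to match python24.dll, and a real PC/config.c also carries fixed entries (for example `__main__` and `sys`) that it leaves out. The point of generating the source list too is that dropping a module's file from the build, rather than only its `_inittab` line, keeps its static functions and data away from the linker.

```python
# Hypothetical single place to choose modules: name -> C source file.
# Paths follow the Python 2.4 source layout; the selection is illustrative.
ALL_MODULES = {
    "array": "Modules/arraymodule.c",
    "binascii": "Modules/binascii.c",
    "_codecs_jp": "Modules/cjkcodecs/_codecs_jp.c",
    "_winreg": "PC/_winreg.c",
}


def _check(enabled):
    """Reject names missing from the table; return the names sorted."""
    unknown = set(enabled) - set(ALL_MODULES)
    if unknown:
        raise ValueError("unknown modules: " + ", ".join(sorted(unknown)))
    return sorted(enabled)


def generate_inittab(enabled):
    """Emit a config.c fragment registering only the enabled modules,
    using Python 2-style init<name> entry points."""
    names = _check(enabled)
    lines = ['#include "Python.h"', ""]
    lines += ["extern void init%s(void);" % n for n in names]
    lines += ["", "struct _inittab _PyImport_Inittab[] = {"]
    lines += ['    {"%s", init%s},' % (n, n) for n in names]
    # A real config.c also lists fixed entries (e.g. "__main__", "sys").
    lines += ["    {0, 0}", "};"]
    return "\n".join(lines) + "\n"


def sources_to_compile(enabled):
    """C files the build should compile for the enabled modules."""
    return [ALL_MODULES[n] for n in _check(enabled)]


if __name__ == "__main__":
    chosen = {"array", "_winreg"}
    print(generate_inittab(chosen))
    print(sources_to_compile(chosen))
```

Turning a module off then means deleting one word from the `chosen` set and regenerating, with no hand edits to config.c or to the module's own source.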
Whatcha think, froods?
/larry/

Jan 16 '06 #1
Larry Hastings wrote:
> Of course, it's not the most important thing in the world--after all,
> I'm the first person to even *notice*, right? But it seems a shame that
> one can break the build so easily. If it pleases the stewards of
> Python, I would be happy to submit patches that fix the non-"using
> Unicode" build.
There was a recent python-dev thread_ suggesting that we drop support
for --disable-unicode, mainly I think because no one was willing to
maintain it. If you're willing to offer patches and some maintenance,
it probably has a decent chance of acceptance.

.. _thread: http://mail.python.org/pipermail/pyt...er/056897.html
> So in order to build my *really* minimal python24.dll, I have to hack
> up the source something fierce. It would be pleasant if the Python
> source code provided an easy facility for turning off modules at
> compile-time. I would be happy to propose something / write a PEP
> / submit patches to do such a thing, if there is a chance that such
> a thing could make it into the official Python source. However, I
> realize that this has terribly limited appeal; that, and the fact
> that Python releases are infrequent, makes me think it's not a
> terrible hardship if I had to re-hack up each new Python release
> by hand.


My impression is that, for most things like this, python-dev is happy to
accept the patches *if* someone is willing to commit to maintaining
them, and they don't make the codebase too much more complex.

STeVe
Jan 16 '06 #2
Larry Hastings wrote:
> First and foremost: turning off Py_USING_UNICODE *breaks the build*
> on Windows.
Probably nobody does that nowadays. My own feeling (but I don't have numbers
to back it up) is that the biggest chunk of the .DLL is taken up by
things like the CJK codecs (which are about 800k). I don't think you're
gaining that much by trying to remove Unicode support at all, especially
since (as you noticed) it's going to be a maintenance headache.
> Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
> the one used by VC++ under MSVS .Net 2003) *links in unused static
> symbols*. If I want to excise the code for a module, it is not
> sufficient to comment-out the relevant _inittab line in config.c.
> Nor does it help if I comment out the "extern" prototype for the
> init function. As far as I can tell, the only way to *really* get
> rid of a module, including all its static functions and static data,
> is to actually *remove all the code* (with comments, or #if, or
> whatnot). What a nosebleed, huh?
This is off-topic here, but the MSVC linker *can* strip unused symbols, of
course. Look into /OPT:REF.
> So in order to build my *really* minimal python24.dll, I have to hack
> up the source something fierce. It would be pleasant if the Python
> source code provided an easy facility for turning off modules at
> compile-time. I would be happy to propose something / write a PEP
> / submit patches to do such a thing, if there is a chance that such
> a thing could make it into the official Python source. However, I
> realize that this has terribly limited appeal; that, and the fact
> that Python releases are infrequent, makes me think it's not a
> terrible hardship if I had to re-hack up each new Python release
> by hand.


You're not the only one complaining about the size of Python .DLL: also
people developing self-contained programs with tools like PyInstaller or
py2exe (that is, programs which are supposed to run without Python
installed) are affected by the lack of a clear policy.

I myself complained before, especially after Python 2.4 got those ginormous
CJK codecs within its standard DLL; you can look for the thread on Google.
The bottom line of that discussion was:

- The policy about what must be linked within python .dll and what must be
kept outside should be proposed as a PEP, and it should provide guidelines
to be applied also for future modules.
- There will be some opposition to the obvious policy of "keeping the bare
minimum inside the DLL" because of inefficiencies in the Python build
system. Specifically, I was told that maintaining modules outside the DLL
instead of inside the DLL is more burdensome for some reason (which I have
not investigated), but surely, with a good build system, switching either
configuration setting should be the matter of changing a single word in a
single place, with no code changes required.

Personally, I could find some time to write up a PEP, but surely not to pick
up a lengthy discussion nor to improve the build system myself. Hence, I
mostly decided to give up for now and stick with recompiling Python myself.
The policy I'd propose is that the DLL should contain the minimum set of
modules needed to run the following Python program:

-------------------
print "hello world"
-------------------

There's probably some specific exception I'm not aware of, but you get the
big picture.
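This criterion can be checked empirically by asking a fresh interpreter which modules it has imported by the time the hello-world program finishes. A sketch for a modern Python 3 interpreter (so the print statement becomes a function call); `startup_modules` is a hypothetical helper, and the `-S` and `-E` flags skip the `site` module and the `PYTHON*` environment variables so that mostly the interpreter's own startup set shows up. On Python 3 the list typically includes the `encodings` package, because Python 3 loads codecs during startup.

```python
import subprocess
import sys


def startup_modules(flags=("-S", "-E")):
    """Run the hello-world program in a child interpreter and return its
    output line plus the sorted names of every module it had imported."""
    code = ("import sys; print('hello world'); "
            "print(' '.join(sorted(sys.modules)))")
    result = subprocess.run([sys.executable, *flags, "-c", code],
                            capture_output=True, text=True, check=True)
    greeting, modules = result.stdout.splitlines()[:2]
    return greeting, modules.split()


if __name__ == "__main__":
    greeting, modules = startup_modules()
    print(greeting)
    print(len(modules), "modules:", ", ".join(modules))
```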
--
Giovanni Bajo
Jan 16 '06 #3
I myself wonder why python.dll can't just load a companion i18n.dll
when and if it's called for in the script, for example by having weak
references to those functions and loading the DLL as needed, and
probably throwing an exception if it can't be loaded. Most of the CJK
stuff could then be carried in that DLL and in some cases, such as
py2exe, not even be included because it's not used.
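The same idea (defer a costly component until a script first uses it, and raise an exception if it is missing) can be sketched at the Python level. This is an analogy only: it defers a Python import, not the loading of a separate DLL, and `LazyModule` is a hypothetical name; the standard library's `importlib.util.LazyLoader` offers a more complete mechanism.

```python
import importlib


class LazyModule:
    """Stand-in for a module that is imported on first attribute access.

    If the module cannot be found, the ImportError surfaces at that first
    use rather than at program start-up."""

    def __init__(self, name):
        self._name = name
        self._module = None  # not imported yet

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


if __name__ == "__main__":
    colorsys = LazyModule("colorsys")          # nothing imported yet
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # import happens here
```

A program that never touches the proxy never pays for the import, which is what would let a tool like py2exe leave the component out.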

Just my 2 cents.

LL

Jan 17 '06 #4
Larry Hastings:
> First and foremost: turning off Py_USING_UNICODE *breaks the build*
> on Windows. The following breakages were all fixed with
> judicious applications of #ifdef Py_USING_UNICODE:
> * The implementation of "multi-byte codecs" (CJK codecs) implicitly
> assumes that they can use all the Unicode facilities. So all the
> files in "Modules/cjkcodecs" fail to build.
> * Obviously, the Unicode string object depends on Unicode support,
> so Objects/unicode* doesn't build.
> * There are several spots in the code that need to handle Unicode
> strings in some slightly special way, and assume Unicode is turned
> on. E.g.:
>   * Modules/posixmodule.c, posix__getfullpathname(), line 1745
>   * same file, posix_open(), starting on line 5201
>   * Objects/fileobject.c, open_the_file(), starting on line 158
>   * _winreg.c, Py2Reg(), starting on lines 724 and 777


I'm probably responsible for some of the breakage when adding
Unicode file name support to Python. Windows is a Unicode based
operating system and I expect Unicode calls will eventually infest the
code base to a greater extent than currently. Requiring each
modification that adds a Unicode feature to be safe with
Py_USING_UNICODE turned off will add to the implementation effort for
that feature. I'd prefer to drop support for turning off
Py_USING_UNICODE in Windows specific code. Well, since it is currently
broken, document that it isn't supported. Other platforms may need to
continue allowing non Py_USING_UNICODE builds.

Neil
Jan 17 '06 #5
Giovanni Bajo:
> - There will be some opposition to the obvious policy of "keeping the bare
> minimum inside the DLL" because of inefficiencies in the Python build
> system.


It is also non-optimal for those who do want the full set of
modules: separate files add overhead through block sizing (both on
disk and in memory, executables pad out each section to some block
size), by requiring more load-time inter-module fixups, and by not
allowing the linker to perform some optimizations. It'd be worthwhile
seeing if the DLL would speed up or shrink if whole program optimization
was turned on.
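The block-sizing cost is plain arithmetic: each section is rounded up to an alignment boundary, so spreading the same code over more files leaves more partially filled blocks. A minimal sketch with made-up sizes; the 4096-byte default assumes the usual in-memory section alignment of Windows executables (the on-disk file alignment is often smaller, e.g. 512 bytes), and both are linker settings rather than fixed constants.

```python
def padded(size, alignment):
    """Round `size` up to the next multiple of `alignment` (in bytes)."""
    return -(-size // alignment) * alignment


def padding_overhead(section_sizes, alignment=4096):
    """Total bytes lost when every section is rounded up to `alignment`."""
    return sum(padded(s, alignment) - s for s in section_sizes)


if __name__ == "__main__":
    # Hypothetical: 300,000 bytes of code in one DLL versus the same code
    # split into three DLLs of 100,000 bytes each.
    print("one file:   ", padding_overhead([300_000]))      # 3104 bytes
    print("three files:", padding_overhead([100_000] * 3))  # 7200 bytes
```

The waste per extra file is bounded by one alignment unit per section.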

Neil
Jan 17 '06 #6
"Larry Hastings" <la***@hastings.org> wrote in message
news:11*********************@g44g2000cwa.googlegroups.com...
> Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
> the one used by VC++ under MSVS .Net 2003) *links in unused static
> symbols*. If I want to excise the code for a module, it is not
> sufficient to comment-out the relevant _inittab line in config.c.
> Nor does it help if I comment out the "extern" prototype for the
> init function. As far as I can tell, the only way to *really* get
> rid of a module, including all its static functions and static data,
> is to actually *remove all the code* (with comments, or #if, or
> whatnot). What a nosebleed, huh?


This may not be a linker issue. There is a compiler switch, /Gy, that "enables
function-level linking". That is, without this option enabled, if any
function in a module needs to be linked, the linker goes ahead and links the
whole module (object file). I guess this is supposed to be some kind of linker
optimization. The problem is that the rest of the module may introduce
additional link dependencies, thus aggravating the problem. Perhaps
changing the C compiler to use function-level linking could address this
problem.
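Why unused code can also drag in *further* dependencies becomes clear if linking is modelled as reachability over a call graph. A toy sketch: the object files, functions and `OBJECTS` table are invented, static data is ignored, and `function_level=True` stands in, roughly, for compiling with /Gy and letting the linker discard unreferenced functions (its /OPT:REF option). In the object-level run, the never-called `unused_util` pulls all of `cjk.obj` into the image.

```python
from collections import deque

# Toy model, names invented: object file -> {function: functions it calls}.
OBJECTS = {
    "main.obj": {"main": ["init_core", "used_util"]},
    "core.obj": {"init_core": [], "core_helper": []},
    "util.obj": {"used_util": [], "unused_util": ["init_cjk"]},
    "cjk.obj": {"init_cjk": ["cjk_table"], "cjk_table": []},
}


def linked_functions(objects, entry, function_level):
    """Return the set of functions that end up in the image.

    function_level=False: touching any function links its whole object
    file, and every function in it contributes its own references.
    function_level=True: only functions reachable from `entry` are kept."""
    owner = {f: obj for obj, funcs in objects.items() for f in funcs}
    calls = {f: c for funcs in objects.values() for f, c in funcs.items()}
    linked = set()
    pending = deque([entry])
    while pending:
        func = pending.popleft()
        if func in linked:
            continue
        batch = [func] if function_level else list(objects[owner[func]])
        for f in batch:
            if f not in linked:
                linked.add(f)
                pending.extend(calls[f])
    return linked


if __name__ == "__main__":
    for level in (False, True):
        print("function-level" if level else "object-level",
              sorted(linked_functions(OBJECTS, "main", level)))
```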

-- Paul
Jan 17 '06 #7
There are exactly four non-Unicode build breakages in the Python source
tree that are Win32-specific. Two are simply a matter of #if, two also
require new alternative code (calls to PyString_FromStringAndSize()).
All told, my changes to Win32-specific code to fix Py_USING_UNICODE
consist of exactly twelve new lines of code.

As for future development of Windows-specific Python features...
doesn't that generally happen in modules, rather than the Python
interpreter, these days? Either in Mark Hammond's pywin32 (what used
to be called "win32all"), or perhaps done in Python using ctypes.
There haven't been any changes to the three Windows-specific modules
(msvcrt, winreg, and winsound) mentioned in any "What's New in Python
2.x" document, and 2.0 came out more than five years ago.
/larry/

Jan 17 '06 #8
Neil Hodgson wrote:
>> - There will be some opposition to the obvious policy of "keeping
>> the bare minimum inside the DLL" because of inefficiencies in the
>> Python build system.
> It is also non-optimal for those that do want the full set of
> modules as separate files can add overhead for block sizing (both on
> disk and in memory, executables pad out each section to some block
> size), by requiring more load-time inter-module fixups


I would be surprised if this showed up in any profile. Importing modules can
already be slow regardless of the extra stat() calls for external files (see
programs like "mercurial" that, to win benchmarks against C-compiled
counterparts, do lazy imports). As for the overhead at the border of blocks,
you should be more worried about 800K of CJK
codecs being loaded in your virtual memory (and not fully swapped out because
of block sizing) which are totally useless for most applications.

Anyway, we're picking nits here, but you have a point in being worried. If I
ever write a PEP, I will produce numbers to show beyond any doubt that there is
no performance difference.
> ..., and by not allowing the linker to perform some optimizations.
> It'd be worthwhile seeing if the DLL would speed up or shrink if
> whole program optimization was turned on.


There is no way whole program optimization can produce any advantage as the
modules are totally separated and they don't have direct calls that the
compiler can exploit.
--
Giovanni Bajo
Jan 17 '06 #9
Larry Hastings:
> As for future development of Windows-specific Python features...
> doesn't that generally happen in modules, rather than the Python
> interpreter, these days? Either in Mark Hammond's pywin32 (what used
> to be called "win32all"), or perhaps done in Python using ctypes.
> There haven't been any changes to the three Windows-specific modules
> (msvcrt, winreg, and winsound) mentioned in any "What's New in Python
> 2.x" document, and 2.0 came out more than five years ago.


It is in the built-in modules providing OS features that there
should be more use of Unicode. Unicode system calls are more accurate
and have fewer limitations than ANSI system calls. Examples are allowing
Unicode in sys.argv and os.environ or for file paths where the ANSI
versions are limited to fewer than 260 characters.

Are you willing to monitor and fix new Py_USING_UNICODE issues or
are you proposing just to produce a patch now and then expect
contributors to maintain this feature?

Neil
Jan 17 '06 #10
> Are you willing to monitor and fix new Py_USING_UNICODE issues or
> are you proposing just to produce a patch now and then expect
> contributors to maintain this feature?


Neither, I suppose, or perhaps both. I am proposing to produce a patch
now which fixes the non-Unicode build under Windows. However, I don't
expect anything out of other contributors, and I don't set Python
contribution policy. (Obviously the stewards of the Python tree don't
care whether contributions break the non-Unicode build. But that's a
fine policy; after all, they've already got enough to do, and in any
case I'm the first person to even notice.) If this patch is accepted,
and some future contribution breaks the non-Unicode build again, and I
discover the breakage, I might very well create a second patch to
re-fix it.

Since I'm seemingly the only person who cares about non-Unicode builds
on Windows, I suggest this approach would work just fine.
/larry/

Jan 17 '06 #11
