  #1  
Old January 16th, 2006, 06:35 PM
Larry Hastings
Guest
 
Posts: n/a
Default Shrinky-dink Python (also, non-Unicode Python build is broken)

I'm an indie shareware Windows game developer. In indie shareware
game development, download size is terribly important; conventional
wisdom holds that--even today--your download should be 5MB or less.

I'd like to use Python in my games. However, python24.dll is 1.86MB,
and zips down to 877k. I can't afford to devote 1/6 of my download
to just the scripting interpreter; I've got music, and textures, and
my own crappy code to ship.

Following a friend's suggestion, as an experiment I downloaded the
Python 2.4.2 source, then set about stripping out everything I could.
I removed:
* Unicode support, including the CJK codecs
* All doc strings
* *Every* module written in C
Now when I build, python24.dll is 570k, and zips down to about 260k.
But I learned some things on the way.
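
For a rough sense of the numbers above, here is a minimal Python 3 sketch of the download-budget arithmetic. It assumes 1 MB = 1024 KB and takes the 5MB ceiling and the zipped sizes straight from the post; the constant and function names are made up for illustration.

```python
# Sizes quoted in the post, in KB. Treating 1 MB as 1024 KB is an assumption.
BUDGET_KB = 5 * 1024
STOCK_ZIPPED_KB = 877      # stock python24.dll, zipped
STRIPPED_ZIPPED_KB = 260   # stripped-down build, zipped ("about 260k")


def budget_share(zipped_kb, budget_kb=BUDGET_KB):
    """Return the fraction of the download budget used by one component."""
    return zipped_kb / budget_kb


stock_share = budget_share(STOCK_ZIPPED_KB)
stripped_share = budget_share(STRIPPED_ZIPPED_KB)

print("stock:    {:.1%} of the budget".format(stock_share))
print("stripped: {:.1%} of the budget".format(stripped_share))
print("saved:    {} KB".format(STOCK_ZIPPED_KB - STRIPPED_ZIPPED_KB))
```

Under those assumptions the stock DLL takes roughly a sixth of the budget, and the stripped build takes roughly a twentieth.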


First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows. The following breakages were all fixed with judicious
applications of #ifdef Py_USING_UNICODE:
* The implementation of "multi-byte codecs" (CJK codecs) implicitly
  assumes that they can use all the Unicode facilities. So all the
  files in "Modules/cjkcodecs" fail to build.
* Obviously, the Unicode string object depends on Unicode support,
  so Objects/unicode* doesn't build.
* There are several spots in the code that need to handle Unicode
  strings in some slightly special way, and assume Unicode is turned
  on. E.g.:
  * Modules/posixmodule.c, posix__getfullpathname(), line 1745
  * same file, posix_open(), starting on line 5201
  * Objects/fileobject.c, open_the_file(), starting on line 158
  * _winreg.c, Py2Reg(), starting on lines 724 and 777

In addition, there was one slightly more complicated problem: _winreg.c
assumes it should call PyUnicode_DecodeMBCS() to turn strings pulled
from the registry into Unicode strings. I'm not sure what the correct
thing to do here is; I went with changing the calls from
PyUnicode_DecodeMBCS() to PyString_FromStringAndSize() for non-Unicode
builds.

Of course, it's not the most important thing in the world--after all,
I'm the first person to even *notice*, right? But it seems a shame
that one can break the build so easily. If it pleases the stewards of
Python, I would be happy to submit patches that fix the non-"using
Unicode" build.


Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*. If I want to excise the code for a module, it is not
sufficient to comment-out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function. As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot). What a nosebleed, huh?

So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce. It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time. I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source. However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, makes me think it's not a
terrible hardship if I had to re-hack up each new Python release
by hand.


Whatcha think, froods?


/larry/

  #2  
Old January 16th, 2006, 07:35 PM
Steven Bethard
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Larry Hastings wrote:
> Of course, it's not the most important thing in the world--after all,
> I'm the first person to even *notice*, right? But it seems a shame
> that one can break the build so easily. If it pleases the stewards of
> Python, I would be happy to submit patches that fix the non-"using
> Unicode" build.

There was a recent python-dev thread_ suggesting that we drop support
for --disable-unicode, mainly I think because no one was willing to
maintain it. If you're willing to offer patches and some maintenance,
it probably has a decent chance of acceptance.

.. _thread:
http://mail.python.org/pipermail/pyt...er/056897.html

> So in order to build my *really* minimal python24.dll, I have to hack
> up the source something fierce. It would be pleasant if the Python
> source code provided an easy facility for turning off modules at
> compile-time. I would be happy to propose something / write a PEP
> / submit patches to do such a thing, if there is a chance that such
> a thing could make it into the official Python source. However, I
> realize that this has terribly limited appeal; that, and the fact
> that Python releases are infrequent, makes me think it's not a
> terrible hardship if I had to re-hack up each new Python release
> by hand.

My impression is that, for most things like this, python-dev is happy to
accept the patches *if* someone is willing to commit to maintaining
them, and they don't make the codebase too much more complex.

STeVe
  #3  
Old January 16th, 2006, 07:45 PM
Giovanni Bajo
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Larry Hastings wrote:

> First and foremost: turning off Py_USING_UNICODE *breaks the build*
> on Windows.

Probably nobody does that nowadays. My own feeling (though I don't have numbers
to back it up) is that most of the size of the .DLL comes from things like the
CJK codecs (which are about 800k). I don't think you're gaining much by
removing Unicode support at all, especially since (as you noticed) it's going
to be a maintenance headache.

> Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
> the one used by VC++ under MSVS .Net 2003) *links in unused static
> symbols*. If I want to excise the code for a module, it is not
> sufficient to comment-out the relevant _inittab line in config.c.
> Nor does it help if I comment out the "extern" prototype for the
> init function. As far as I can tell, the only way to *really* get
> rid of a module, including all its static functions and static data,
> is to actually *remove all the code* (with comments, or #if, or
> whatnot). What a nosebleed, huh?

This is off-topic here, but the MSVC linker *can* strip unused symbols, of
course. Look into /OPT:REF.

> So in order to build my *really* minimal python24.dll, I have to hack
> up the source something fierce. It would be pleasant if the Python
> source code provided an easy facility for turning off modules at
> compile-time. I would be happy to propose something / write a PEP
> / submit patches to do such a thing, if there is a chance that such
> a thing could make it into the official Python source. However, I
> realize that this has terribly limited appeal; that, and the fact
> that Python releases are infrequent, makes me think it's not a
> terrible hardship if I had to re-hack up each new Python release
> by hand.

You're not the only one complaining about the size of Python .DLL: also
people developing self-contained programs with tools like PyInstaller or
py2exe (that is, programs which are supposed to run without Python
installed) are affected by the lack of a clear policy.

I myself complained before, especially after Python 2.4 got those ginormous
CJK codecs within its standard DLL; you can look for the thread on Google.
The bottom line of that discussion was:

- The policy about what must be linked within python .dll and what must be
kept outside should be proposed as a PEP, and it should provide guidelines
to be applied also for future modules.
- There will be some opposition to the obvious policy of "keeping the bare
minimum inside the DLL" because of inefficiencies in the Python build
system. Specifically, I was told that maintaining modules outside the DLL
instead of inside the DLL is more burdensome for some reason (which I have
not investigated), but surely, with a good build system, switching between
the two configurations should be a matter of changing a single word in a
single place, with no code changes required.

Personally, I could find some time to write up a PEP, but surely not to pick
up a lengthy discussion nor to improve the build system myself. Hence, I
mostly decided to give up for now and stick with recompiling Python myself.
The policy I'd propose is that the DLL should contain the minimum set of
modules needed to run the following Python program:

-------------------
print "hello world"
-------------------

There's probably some specific exception I'm not aware of, but you get the
big picture.
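
One rough way to see how far a given build is from that policy is to ask the interpreter which modules are compiled into it and which of those are actually loaded after startup. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the 2.4 build discussed in this thread; `builtin_module_report` is a made-up helper name, and "loaded at startup" is only an approximation of "needed to print hello world".

```python
import sys


def builtin_module_report():
    """Split the compiled-in modules into loaded and not-yet-loaded lists."""
    compiled_in = set(sys.builtin_module_names)
    # Modules that are both compiled in and already imported at this point.
    loaded = compiled_in & set(sys.modules)
    return sorted(loaded), sorted(compiled_in - loaded)


loaded, idle = builtin_module_report()
print("compiled in and already loaded:", ", ".join(loaded))
print("compiled in but not loaded yet:", ", ".join(idle))
```

The second list is a starting point for candidates that could live outside the core DLL, not a definitive answer.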
--
Giovanni Bajo


  #4  
Old January 17th, 2006, 01:05 AM
LordLaraby
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

I myself wonder why python.dll can't just load a companion i18n.dll
when and if it's called for in the script, for example by keeping weak
references to those functions, loading the DLL as needed, and throwing
an exception if it can't be loaded. Most of the CJK stuff could then be
carried in that DLL and in some cases, such as py2exe, not even be
included because it's not used.
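
The proposal above is about DLL loading, which is not shown here. At the Python level, though, a missing codec already surfaces as an exception (`LookupError` from `codecs.lookup`), so a script can probe for it and degrade gracefully. A minimal Python 3 sketch of that idea, with made-up helper names and an arbitrary choice of UTF-8 as the fallback:

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can find a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encode_or_fallback(text, preferred, fallback="utf-8"):
    """Encode with `preferred` when the build has it, else with `fallback`."""
    encoding = preferred if codec_available(preferred) else fallback
    return encoding, text.encode(encoding)


# Whether shift_jis is present depends on how the interpreter was built.
print(codec_available("shift_jis"))
print(encode_or_fallback("hello", "no-such-codec"))
```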

Just my 2 cents.

LL

  #5  
Old January 17th, 2006, 02:15 AM
Neil Hodgson
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Larry Hastings:

> First and foremost: turning off Py_USING_UNICODE *breaks the build*
> on Windows. The following list of breakages were all fixed with
> judicious applications of #ifdef Py_USING_UNICODE:
> * The implementation of "multi-byte codecs" (CJK codecs) implicitly
> assumes that they can use all the Unicode facilities. So all the
> files in "Modules/cjkcodecs" fail to build.
> * Obviously, the Unicode string object depends on Unicode support,
> so Objects/unicode* doesn't build.
> * There are several spots in the code that need to handle Unicode
> strings in some slightly special way, and assume Unicode is turned
> on. E.g.:
> * Modules/posixmodule.c, posix__getfullpathname(), line 1745
> * same file, posix_open(), starting on line 5201
> * Objects/fileobject.c, open_the_file(), starting on line 158
> * _winreg.c, Py2Reg(), starting on lines 724 and 777

I'm probably responsible for some of the breakage when adding
Unicode file name support to Python. Windows is a Unicode-based
operating system and I expect Unicode calls will eventually infest the
code base to a greater extent than currently. Requiring each
modification that adds a Unicode feature to be safe with
Py_USING_UNICODE turned off will add to the implementation effort for
that feature. I'd prefer to drop support for turning off
Py_USING_UNICODE in Windows-specific code. Well, since it is currently
broken, document that it isn't supported. Other platforms may need to
continue allowing non-Py_USING_UNICODE builds.

Neil
  #6  
Old January 17th, 2006, 02:15 AM
Neil Hodgson
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Giovanni Bajo:

> - There will be some opposition to the obvious policy of "keeping the bare
> minimum inside the DLL" because of inefficiencies in the Python build
> system.

It is also non-optimal for those who do want the full set of
modules, as separate files can add overhead through block sizing (both
on disk and in memory, executables pad out each section to some block
size), by requiring more load-time inter-module fixups, and by not
allowing the linker to perform some optimizations. It'd be worthwhile
seeing if the DLL would speed up or shrink if whole program optimization
were turned on.

Neil
  #7  
Old January 17th, 2006, 04:45 AM
Paul McGuire
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

"Larry Hastings" <larry@hastings.org> wrote in message
news:1137435559.804230.11820@g44g2000cwa.googlegroups.com...
> Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
> the one used by VC++ under MSVS .Net 2003) *links in unused static
> symbols*. If I want to excise the code for a module, it is not
> sufficient to comment-out the relevant _inittab line in config.c.
> Nor does it help if I comment out the "extern" prototype for the
> init function. As far as I can tell, the only way to *really* get
> rid of a module, including all its static functions and static data,
> is to actually *remove all the code* (with comments, or #if, or
> whatnot). What a nosebleed, huh?

This may not be a linker issue. There is a compiler switch, /Gy, that
"enables function-level linking". That is, without this option enabled, if
any function in an object file needs to be linked, the linker goes ahead and
links the whole object file. I guess this is supposed to be some kind of
linker optimization. The problem is that the rest of the object file may
introduce additional link dependencies, thus aggravating the problem. Perhaps
changing the C compiler settings to use function-level linking could address
this problem.

-- Paul


  #8  
Old January 17th, 2006, 06:55 AM
Larry Hastings
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

There are exactly four non-Unicode build breakages in the Python source
tree that are Win32-specific. Two are simply a matter of #if, two also
require new alternative code (calls to PyString_FromStringAndSize()).
All told, my changes to Win32-specific code to fix Py_USING_UNICODE
consist of exactly twelve new lines of code.

As for future development of Windows-specific Python features...
doesn't that generally happen in modules, rather than the Python
interpreter, these days? Either in Mark Hammond's pywin32 (what used
to be called "win32all"), or perhaps done in Python using ctypes.
There haven't been any changes to the three Windows-specific modules
(msvcrt, winreg, and winsound) mentioned in any "What's New in Python
2.x" document, and 2.0 came out more than five years ago.


/larry/

  #9  
Old January 17th, 2006, 11:45 AM
Giovanni Bajo
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Neil Hodgson wrote:

>> - There will be some opposition to the obvious policy of "keeping
>> the bare minimum inside the DLL" because of inefficiencies in the
>> Python build system.
>
> It is also non-optimal for those that do want the full set of
> modules as separate files can add overhead for block sizing (both on
> disk and in memory, executables pad out each section to some block
> size), by requiring more load-time inter-module fixups

I would be surprised if this showed up in any profile. Importing modules can
already be slow no matter external stats (see programs like "mercurial" that,
to win benchmarks against C-compiled counterparts, do lazy imports). As for the
overhead at block boundaries, you should be more worried about 800K of CJK
codecs being loaded into your virtual memory (and not fully swapped out because
of block sizing), which are totally useless for most applications.
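
For readers unfamiliar with the lazy-import trick mentioned above, here is a minimal Python 3 sketch of the general idea: the real import is deferred until the first attribute access. This is not Mercurial's actual implementation; the `LazyModule` class is a made-up illustration, and `json` is used only because it is a convenient standard-library module.

```python
import importlib


class LazyModule:
    """Stand-in for a module that is imported on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import the real module only once, the first time it is needed.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


json = LazyModule("json")          # nothing is imported yet
print(json._module is None)
print(json.dumps([1, 2, 3]))       # first use triggers the real import
print(json._module is not None)
```

A program that never touches the proxy never pays for the import, which is the effect the benchmarks above depend on.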

Anyway, we're picking nits here, but you have a point in being worried. If I
ever write a PEP, I will produce numbers to show beyond any doubt that there is
no performance difference.

> , and by not
> allowing the linker to perform some optimizations. It'd be worthwhile
> seeing if the DLL would speed up or shrink if whole program
> optimization was turned on.

There is no way whole program optimization can produce any advantage as the
modules are totally separated and they don't have direct calls that the
compiler can exploit.
--
Giovanni Bajo


  #10  
Old January 17th, 2006, 09:55 PM
Neil Hodgson
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

Larry Hastings:

> As for future development of Windows-specific Python features...
> doesn't that generally happen in modules, rather than the Python
> interpreter, these days? Either in Mark Hammond's pywin32 (what used
> to be called "win32all"), or perhaps done in Python using ctypes.
> There haven't been any changes to the three Windows-specific modules
> (msvcrt, winreg, and winsound) mentioned in any "What's New in Python
> 2.x" document, and 2.0 came out more than five years ago.

It is in the built-in modules providing OS features that there
should be more use of Unicode. Unicode system calls are more accurate
and have fewer limitations than ANSI system calls. Examples are allowing
Unicode in sys.argv and os.environ or for file paths where the ANSI
versions are limited to less than 260 characters.
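
To make the path-length point concrete, here is a minimal Python 3 sketch. It assumes the Win32 `MAX_PATH` value of 260 (which counts the terminating NUL) and the `\\?\` extended-length prefix accepted by the Unicode file APIs; the helper names are made up, and counting characters is only an approximation for multi-byte ANSI code pages.

```python
MAX_PATH = 260  # Win32 limit for ANSI path APIs, counting the trailing NUL


def exceeds_ansi_limit(path):
    """Return True if `path` is too long for the ANSI ("A") file APIs."""
    return len(path) >= MAX_PATH


def to_extended_length(path):
    """Prefix an absolute drive-letter path for the Unicode ("W") file APIs.

    Assumes `path` is already absolute and uses backslashes; UNC paths
    need a different prefix and are not handled here.
    """
    return "\\\\?\\" + path


long_path = "C:\\" + "\\".join(["directory"] * 30) + "\\file.txt"
print(len(long_path), exceeds_ansi_limit(long_path))
print(to_extended_length("C:\\short.txt"))
```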

Are you willing to monitor and fix new Py_USING_UNICODE issues or
are you proposing just to produce a patch now and then expect
contributors to maintain this feature?

Neil
  #11  
Old January 17th, 2006, 11:55 PM
Larry Hastings
Guest
 
Posts: n/a
Default Re: Shrinky-dink Python (also, non-Unicode Python build is broken)

> Are you willing to monitor and fix new Py_USING_UNICODE issues or
> are you proposing just to produce a patch now and then expect
> contributors to maintain this feature?

Neither, I suppose, or perhaps both. I am proposing to produce a patch
now which fixes the non-Unicode build under Windows. However, I don't
expect anything out of other contributors, and I don't set Python
contribution policy. (Obviously the stewards of the Python tree don't
care whether contributions break the non-Unicode build. But that's a
fine policy; after all, they've already got enough to do, and in any
case I'm the first person to even notice.) If this patch is accepted,
and some future contribution breaks the non-Unicode build again, and I
discover the breakage, I might very well create a second patch to
re-fix it.

Since I'm seemingly the only person who cares about non-Unicode builds
on Windows, I suggest this approach would work just fine.


/larry/

 
