Shrinky-dink Python (also, non-Unicode Python build is broken)

I'm an indie shareware Windows game developer. In indie shareware
game development, download size is terribly important; conventional
wisdom holds that--even today--your download should be 5MB or less.

I'd like to use Python in my games. However, python24.dll is 1.86MB,
and zips down to 877k. I can't afford to devote 1/6 of my download
to just the scripting interpreter; I've got music, and textures, and
my own crappy code to ship.

Following a friend's suggestion, as an experiment I downloaded the
Python 2.4.2 source, then set about stripping out everything I could.
I removed:
* Unicode support, including the CJK codecs
* All doc strings
* *Every* module written in C
Now when I build, python24.dll is 570k, and zips down to about 260k.
But I learned some things on the way.
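
As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes "k" means 1024-byte kilobytes and that the 5MB budget is 5 * 1024 k; the post does not say either way, and the function name is made up for illustration.

```python
# Assumption: 1MB = 1024k, so a 5MB download budget is 5120k.
BUDGET_KB = 5 * 1024


def share_of_budget(zipped_kb, budget_kb=BUDGET_KB):
    """Return the fraction of the download budget used by a zipped file."""
    return zipped_kb / budget_kb


# Zipped sizes quoted in the post: stock DLL vs. the stripped-down build.
stock = share_of_budget(877)
stripped = share_of_budget(260)

print(f"stock python24.dll:    {stock:.1%} of the budget")
print(f"stripped python24.dll: {stripped:.1%} of the budget")
```

Under those assumptions the stock DLL takes roughly a sixth of the budget, which matches the "1/6" above, and the stripped build takes roughly a twentieth.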
First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows. The following list of breakages were all fixed with
judicious applications of #ifdef Py_USING_UNICODE:
* The implementation of "multi-byte codecs" (CJK codecs) implicitly
assumes that they can use all the Unicode facilities. So all the
files in "Modules/cjkcodecs" fail to build.
* Obviously, the Unicode string object depends on Unicode support,
so Objects/unicode* doesn't build.
* There are several spots in the code that need to handle Unicode
strings in some slightly special way, and assume Unicode is turned
on. E.g.:
* Modules/posixmodule.c, posix__getfullpathname(), line 1745
* same file, posix_open(), starting on line 5201
* Objects/fileobject.c, open_the_file(), starting on line 158
* _winreg.c, Py2Reg(), starting on lines 724 and 777

In addition, there was one slightly more complicated problem: _winreg.c
assumes it should call PyUnicode_DecodeMBCS() to turn strings pulled
from the registry into Unicode strings. I'm not sure what the correct
thing to do here is; I went with changing the calls from
PyUnicode_DecodeMBCS() to PyString_FromStringAndSize() for non-Unicode
builds.

Of course, it's not the most important thing in the world--after all,
I'm the first person to even *notice*, right? But it seems a shame that
one can break the build so easily. If it pleases the stewards of
Python, I would be happy to submit patches that fix the non-"using
Unicode" build.
Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*. If I want to excise the code for a module, it is not
sufficient to comment-out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function. As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot). What a nosebleed, huh?

So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce. It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time. I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source. However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, makes me think it's not a
terrible hardship if I had to re-hack up each new Python release
by hand.
Whatcha think, froods?
/larry/

Jan 16 '06 #1
Larry Hastings wrote:
Of course, it's not the most important thing in the world--after all,
I'm the first person to even *notice*, right? But it seems a shame that
one can break the build so easily. If it pleases the stewards of
Python, I would be happy to submit patches that fix the non-"using
Unicode" build.
There was a recent python-dev thread_ suggesting that we drop support
for --disable-unicode, mainly I think because no one was willing to
maintain it. If you're willing to offer patches and some maintenance,
it probably has a decent chance of acceptance.

.. _thread:
http://mail.python.org/pipermail/pyt...er/056897.html
So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce. It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time. I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source. However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, makes me think it's not a
terrible hardship if I had to re-hack up each new Python release
by hand.


My impression is that, for most things like this, python-dev is happy to
accept the patches *if* someone is willing to commit to maintaining
them, and they don't make the codebase too much more complex.

STeVe
Jan 16 '06 #2
Larry Hastings wrote:
First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows.
Probably nobody does that nowadays. My own feeling (but I don't have numbers
to back it up) is that the biggest share of the .DLL's size comes from
things like the CJK codecs (which are about 800k). I don't think you're
gaining that much by trying to remove Unicode support at all, especially
since (as you noticed) it's going to be a maintenance headache.
Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*. If I want to excise the code for a module, it is not
sufficient to comment-out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function. As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot). What a nosebleed, huh?
This is off-topic here, but the MSVC linker *can* strip unused symbols, of
course. Look into /OPT:REF (/OPT:NOREF is the setting that keeps them).
So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce. It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time. I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source. However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, makes me think it's not a
terrible hardship if I had to re-hack up each new Python release
by hand.


You're not the only one complaining about the size of Python .DLL: also
people developing self-contained programs with tools like PyInstaller or
py2exe (that is, programs which are supposed to run without Python
installed) are affected by the lack of a clear policy.

I myself complained before, especially after Python 2.4 got those ginormous
CJK codecs within its standard DLL; you can look for the thread in Google.
The bottom line of that discussion was:

- The policy about what must be linked within python .dll and what must be
kept outside should be proposed as a PEP, and it should provide guidelines
to be applied also for future modules.
- There will be some opposition to the obvious policy of "keeping the bare
minimum inside the DLL" because of inefficiencies in the Python build
system. Specifically, I was told that maintaining modules outside the DLL
instead of inside the DLL is more burdensome for some reason (which I have
not investigated), but surely, with a good build system, switching either
configuration setting should be the matter of changing a single word in a
single place, with no code changes required.

Personally, I could find some time to write up a PEP, but surely not to pick
up a lengthy discussion nor to improve the build system myself. Hence, I
mostly decided to give up for now and stick with recompiling Python myself.
The policy I'd propose is that the DLL should contain the minimum set of
modules needed to run the following Python program:

-------------------
print "hello world"
-------------------

There's probably some specific exception I'm not aware of, but you get the
big picture.
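
One rough way to see what a given build currently links into the interpreter itself is to ask the interpreter. The sketch below uses modern Python 3 syntax rather than the 2.4-era syntax above. `sys.builtin_module_names` and `sys.modules` are standard, but the helper names are hypothetical, and what counts as "the minimum set" remains a policy question the code does not answer.

```python
import sys


def builtin_modules():
    """Names of modules compiled directly into the interpreter binary."""
    return sorted(sys.builtin_module_names)


def loaded_extension_files():
    """Names of already-loaded modules that live in separate .pyd/.so files."""
    names = []
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        # Built-in modules have no __file__; extension modules end in .pyd/.so.
        if path and path.endswith((".pyd", ".so")):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    print("compiled into the interpreter:", len(builtin_modules()))
    print("loaded from separate extension files:", loaded_extension_files())
```

Running this right after startup gives a baseline for a "hello world" program; anything in the first list that such a program never touches is a candidate for moving outside the DLL.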
--
Giovanni Bajo
Jan 16 '06 #3
I myself wonder why python.dll can't just load a companion i18n.dll
when and if it's called for in the script, such as by having weak
references to those functions and loading the DLL as needed, and
probably throwing an exception if it can't be loaded. Most of the CJK
stuff could then be carried in that DLL and in some cases, such as
py2exe, not even be included because it's not used.
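
The same load-on-demand idea can be sketched at the Python level. This is only an illustration of the pattern, not how python24.dll or any companion DLL actually works: `LazyModule` is a hypothetical helper that defers a regular import until the first attribute access and lets ImportError surface at that point if the module is missing.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports the named module on first use."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import only once; raises ImportError if the module is unavailable.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


# Nothing is imported here; the cost is paid on the first attribute access.
json_lazy = LazyModule("json")
text = json_lazy.dumps([1])
```

A program that never touches the proxy never pays for the module, which is the property being asked for with the CJK codecs.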

Just my 2 cents.

LL

Jan 17 '06 #4
Larry Hastings:
First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows. The following list of breakages were all fixed with
judicious applications of #ifdef Py_USING_UNICODE:
* The implementation of "multi-byte codecs" (CJK codecs) implicitly
assumes that they can use all the Unicode facilities. So all the
files in "Modules/cjkcodecs" fail to build.
* Obviously, the Unicode string object depends on Unicode support,
so Objects/unicode* doesn't build.
* There are several spots in the code that need to handle Unicode
strings in some slightly special way, and assume Unicode is turned
on. E.g.:
* Modules/posixmodule.c, posix__getfullpathname(), line 1745
* same file, posix_open(), starting on line 5201
* Objects/fileobject.c, open_the_file(), starting on line 158
* _winreg.c, Py2Reg(), starting on lines 724 and 777


I'm probably responsible for some of the breakage when adding
Unicode file name support to Python. Windows is a Unicode-based
operating system and I expect Unicode calls will eventually infest the
code base to a greater extent than currently. Requiring each
modification that adds a Unicode feature to be safe with
Py_USING_UNICODE turned off will add to the implementation effort for
that feature. I'd prefer to drop support for turning off
Py_USING_UNICODE in Windows-specific code. Well, since it is currently
broken, document that it isn't supported. Other platforms may need to
continue allowing non-Py_USING_UNICODE builds.

Neil
Jan 17 '06 #5
Giovanni Bajo:
- There will be some opposition to the obvious policy of "keeping the bare
minimum inside the DLL" because of inefficiencies in the Python build
system.


It is also non-optimal for those that do want the full set of
modules, as separate files can add overhead for block sizing (both on
disk and in memory, executables pad out each section to some block
size), by requiring more load-time inter-module fixups, and by not
allowing the linker to perform some optimizations. It'd be worthwhile
seeing if the DLL would speed up or shrink if whole program optimization
was turned on.

Neil
Jan 17 '06 #6
"Larry Hastings" <la***@hastings .org> wrote in message
news:11*********************@g44g2000cwa.googlegroups.com...
Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*. If I want to excise the code for a module, it is not
sufficient to comment-out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function. As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot). What a nosebleed, huh?


This may not be a linker issue. There is a C++ switch /Gy that "enables
function-level linking". That is, without this option enabled, if any
function in a module needs to be linked, the linker goes ahead and links the
whole module. I guess this is supposed to be some kind of linker
optimization. The problem is that the rest of the module may introduce
additional link dependencies, thus aggravating the problem. Perhaps
changing the C compiler to use function-level linking could address this
problem.

-- Paul
Jan 17 '06 #7
There are exactly four non-Unicode build breakages in the Python source
tree that are Win32-specific. Two are simply a matter of #if, two also
require new alternative code (calls to PyString_FromStringAndSize()).
All told, my changes to Win32-specific code to fix Py_USING_UNICODE
consist of exactly twelve new lines of code.

As for future development of Windows-specific Python features...
doesn't that generally happen in modules, rather than the Python
interpreter, these days? Either in Mark Hammond's pywin32 (what used
to be called "win32all"), or perhaps done in Python using ctypes.
There haven't been any changes to the three Windows-specific modules
(msvcrt, winreg, and winsound) mentioned in any "What's New in Python
2.x" document, and 2.0 came out more than five years ago.
/larry/

Jan 17 '06 #8
Neil Hodgson wrote:
- There will be some opposition to the obvious policy of "keeping
the bare minimum inside the DLL" because of inefficiencies in the
Python build system.
It is also non-optimal for those that do want the full set of
modules, as separate files can add overhead for block sizing (both on
disk and in memory, executables pad out each section to some block
size), by requiring more load-time inter-module fixups


I would be surprised if this showed up in any profile. Importing modules can
already be slow no matter external stats (see programs like "mercurial" that,
to win benchmarks with C-compiled counterparts, do lazy imports). As for the
overhead at the border of blocks, you should be more worried with 800K of CJK
codecs being loaded in your virtual memory (and not fully swapped out because
of block sizing) which are totally useless for most applications.

Anyway, we're picking nits here, but you have a point in being worried. If I
ever write a PEP, I will produce numbers to show beyond any doubt that there is
no performance difference.
, and by not
allowing the linker to perform some optimizations. It'd be worthwhile
seeing if the DLL would speed up or shrink if whole program
optimization was turned on.


There is no way whole program optimization can produce any advantage as the
modules are totally separated and they don't have direct calls that the
compiler can exploit.
--
Giovanni Bajo
Jan 17 '06 #9
Larry Hastings:
As for future development of Windows-specific Python features...
doesn't that generally happen in modules, rather than the Python
interpreter, these days? Either in Mark Hammond's pywin32 (what used
to be called "win32all"), or perhaps done in Python using ctypes.
There haven't been any changes to the three Windows-specific modules
(msvcrt, winreg, and winsound) mentioned in any "What's New in Python
2.x" document, and 2.0 came out more than five years ago.


It is in the built-in modules providing OS features that there
should be more use of Unicode. Unicode system calls are more accurate
and have fewer limitations than ANSI system calls. Examples are allowing
Unicode in sys.argv and os.environ or for file paths where the ANSI
versions are limited to less than 260 characters.

Are you willing to monitor and fix new Py_USING_UNICODE issues or
are you proposing just to produce a patch now and then expect
contributors to maintain this feature?

Neil
Jan 17 '06 #10
