Building Python 2.4 with icc and processor-specific optimizations

Just out of curiosity, I was wondering if anyone has
compiled Python 2.4 with the Intel C Compiler and its
processor-specific optimizations. I can build it fine
with OPT="-O3" or OPT="-xN" but when I try to combine
them I get this as soon as ./python is run:

"""
case $MAKEFLAGS in \
*-s*) CC='icc -pthread' LDSHARED='icc -pthread -shared' OPT='-DNDEBUG -O3 -xN' ./python -E ./setup.py -q build;; \
*) CC='icc -pthread' LDSHARED='icc -pthread -shared' OPT='-DNDEBUG -O3 -xN' ./python -E ./setup.py build;; \
esac
'import site' failed; use -v for traceback
Traceback (most recent call last):
File "./setup.py", line 6, in ?
import sys, os, getopt, imp, re
File "/usr/local/src/Python-2.4/Lib/os.py", line 130, in ?
raise ImportError, 'no os specific module found'
ImportError: no os specific module found
make: *** [sharedmods] Error 1
"""

Also, if I run ./python, I have this interesting result:

"""
$ ./python
'import site' failed; use -v for traceback
Python 2.4 (#34, Mar 12 2005, 18:46:28)
[GCC Intel(R) C++ gcc 3.0 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.builtin_module_names

('__main__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', '__builtin__', 'exceptions', 'gc', 'gc')
"""

Whoa--what's going on? Any ideas?

--
Michael Hoffman
Jul 18 '05 #1



Further investigation reveals that the function that sets
sys.builtin_module_names sorts the list before turning it into a
tuple. And binarysort() in Objects/listobject.c doesn't work when
optimized in that fashion. Adding #pragma optimize("", off)
beforehand solves the problem. Why that is, I have no idea. Is
anyone else curious?
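
For anyone who wants to check a suspect build quickly, here is a minimal sketch of a sanity check. It assumes only that a healthy interpreter reports sys.builtin_module_names as a sorted tuple with no repeats, which is what the sort described above should produce. The helper name check_builtin_names is made up for this example, and the code is written for a current Python rather than 2.4.

```python
import sys


def check_builtin_names(names):
    """Return a list of problems found in a builtin-module-name tuple."""
    problems = []
    # A correctly sorted tuple equals its own sorted copy.
    if list(names) != sorted(names):
        problems.append("not sorted")
    # Repeated entries, like the run of '__builtin__' in the first post,
    # suggest the sort overwrote elements instead of moving them.
    duplicates = sorted(set(n for n in names if names.count(n) > 1))
    if duplicates:
        problems.append("duplicates: " + ", ".join(duplicates))
    return problems


# The tuple reported by the miscompiled ./python in the first post.
broken = ('__main__',) + ('__builtin__',) * 12 + ('exceptions', 'gc', 'gc')

if __name__ == "__main__":
    print(check_builtin_names(broken))
    print(check_builtin_names(sys.builtin_module_names))
```

Running it under the freshly built ./python right after each rebuild would show whether a given flag combination still corrupts the list, without waiting for setup.py to fail.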

Also, if anyone is looking for a way to squeeze a little extra time
out of the startup, perhaps sorting the list at build time,
rather than when Python starts, would be good. Although it's probably
not worth the trouble. ;-)
--
Michael Hoffman
Jul 18 '05 #2

Michael Hoffman wrote:
> Further investigation reveals that the function that sets
> sys.builtin_module_names sorts the list before turning it into a
> tuple. And binarysort() in Objects/listobject.c doesn't work when
> optimized in that fashion. Adding #pragma optimize("", off)
> beforehand solves the problem. Why that is, I have no idea. Is
> anyone else curious?

I would really like to know, indeed. OTOH, I probably don't have the
time to analyse it myself.

Looks like a compiler bug to me: perhaps, some condition is compile-time
asserted to be always true even though it could happen that it is false.

OTOH, it could also be Python's failure to follow C's aliasing rules
correctly; Python casts between C pointers which, in strict C, causes
undefined behaviour. So if your compiler has something similar to GCC's
-fno-strict-aliasing, you could see whether this helps.

If not, just try comparing the assembler output of either code, on
a function-by-function basis. Alternatively, try to annotate the
calls that go out of the sorting (e.g. to RichCompareBool) so that
you get tracing, and then see where the traces differ.
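
The trace comparison can be automated with a few lines. This is only a sketch: it assumes each build writes one line per comparison call, and the function name first_divergence is invented for the example. The sample data is the first three lines of the two traces posted further down the thread.

```python
from itertools import zip_longest


def first_divergence(trace_a, trace_b):
    """Return (index, entry_a, entry_b) for the first differing entry, or None."""
    # zip_longest pads the shorter trace with None, so a trace that simply
    # stops early is reported as a difference too.
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return index, a, b
    return None


good = [
    "PyObject_RichCompareBool('signal', 'thread', 0)",
    "PyObject_RichCompareBool('posix', 'signal', 0)",
    "PyObject_RichCompareBool('errno', 'posix', 0)",
]
bad = [
    "PyObject_RichCompareBool('signal', 'thread', 0)",
    "PyObject_RichCompareBool('posix', 'errno', 0)",
    "PyObject_RichCompareBool('posix', 'thread', 0)",
]

if __name__ == "__main__":
    print(first_divergence(good, bad))
```

Knowing the index of the first mismatch narrows the search to the code that runs between the last matching call and the first differing one.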

> Also, if anyone is looking for a way to squeeze a little extra time
> out of the startup, perhaps sorting the list at build-time,
> rather than when Python starts would be good. Although probably
> not worth the trouble. ;-)


Probably not. config.c is hand-written in some (embedded Python)
environments, and expecting it to be sorted would break these
environments.

Regards,
Martin
Jul 18 '05 #3

Martin v. Löwis wrote:
> OTOH, it could also be Python's failure to follow C's aliasing rules
> correctly; Python casts between C pointers which, in strict C, causes
> undefined behaviour. So if your compiler has something similar to GCC's
> -fno-strict-aliasing, you could see whether this helps.

There's nothing like that specifically. There is an -falias option,
which the manual describes only as "assume aliasing."

> If not, just try comparing the assembler output of either code, on
> a function-by-function basis.

Oh boy, it's a 10,000 line diff. The joys of interprocedural
optimization. I think I'll quit while I'm ahead...

> Alternatively, try to annotate the
> calls that go out of the sorting (e.g. to RichCompareBool) so that
> you get tracing, and then see where the traces differ.


Well, they go wrong almost right away:

non-optimized:

PyObject_RichCompareBool('signal', 'thread', 0)
PyObject_RichCompareBool('posix', 'signal', 0)
PyObject_RichCompareBool('errno', 'posix', 0)
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_codecs', '_sre', 0)
PyObject_RichCompareBool('zipimport', '_codecs', 0)
PyObject_RichCompareBool('zipimport', 'posix', 0)
PyObject_RichCompareBool('zipimport', 'thread', 0)
PyObject_RichCompareBool('_symtable', 'posix', 0)

optimized:

PyObject_RichCompareBool('signal', 'thread', 0)
PyObject_RichCompareBool('posix', 'errno', 0) # hmmm, comparing in the wrong direction
PyObject_RichCompareBool('posix', 'thread', 0)
PyObject_RichCompareBool('posix', 'signal', 0)
PyObject_RichCompareBool('errno', 'errno', 0) # totally bogus!
PyObject_RichCompareBool('errno', 'errno', 0) # and repeating it twice for good measure!
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_sre', 'posix', 0)
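
For reference, the non-optimized trace can be reproduced with a small model of the sort. This is a rough Python sketch and not the C code: it assumes that for a short list the sort first measures the initial run (reversing it if it is descending) and then finishes with a binary insertion sort, comparing with "less than" (the trailing 0 in the trace). The starting order of the names is inferred from the trace and cut off after '_symtable', and all function names here are invented for the example.

```python
def traced_lt(a, b, trace):
    """Record the comparison, like the tracing above, then perform it."""
    trace.append((a, b))
    return a < b


def count_run(items, trace):
    """Return the length of the initial run and whether it is descending."""
    if len(items) < 2:
        return len(items), False
    descending = traced_lt(items[1], items[0], trace)
    run = 2
    for i in range(2, len(items)):
        # The run ends when the next pair breaks the established direction.
        if traced_lt(items[i], items[i - 1], trace) != descending:
            break
        run += 1
    return run, descending


def binary_insertion_sort(items, start, trace):
    """Insert items[start:] one by one into the sorted prefix items[:start]."""
    for pos in range(start, len(items)):
        pivot = items[pos]
        left, right = 0, pos
        while left < right:
            mid = left + (right - left) // 2
            if traced_lt(pivot, items[mid], trace):
                right = mid
            else:
                left = mid + 1
        # Shift the tail right by one slot and drop the pivot into place.
        items[left + 1:pos + 1] = items[left:pos]
        items[left] = pivot


def sort_small_list(items, trace):
    """Sort items in place using run detection plus binary insertion."""
    run, descending = count_run(items, trace)
    if descending:
        items[:run] = items[:run][::-1]
    binary_insertion_sort(items, run, trace)
    return items


# Assumed starting order, inferred from the non-optimized trace.
names = ['thread', 'signal', 'posix', 'errno', '_sre', '_codecs',
         'zipimport', '_symtable']

if __name__ == "__main__":
    trace = []
    print(sort_small_list(list(names), trace))
    for a, b in trace:
        print("compare(%r, %r)" % (a, b))
```

If the model is right, the first six comparisons in the non-optimized trace come from the run-detection pass and only the last three from the binary insertion, which may help when deciding which function to inspect first.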

Well, I have probably spent too much time on this already. To top things off, Python
compiled with -O3 and without -xN actually runs faster, so I shouldn't even be going
down this road.
--
Michael Hoffman
Jul 18 '05 #4
