Bytes | Software Development & Data Engineering Community
writing python extensions in assembly

Can anyone give me pointers/instructions/a template for writing a Python
extension in assembly (or better, HLA)?
Jun 27 '08 #1
inhahe wrote:
> Can anyone give me pointers/instructions/a template for writing a Python
> extension in assembly (or better, HLA)?

You could write a C extension and embed assembly. See the docs for how
to write one. If you know how to implement a shared library with a C calling
convention in assembly (being an assembler guru, you surely know how
that works), you could mimic a C extension.

Diez
Jun 27 '08 #2
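Diez's "mimic a C extension" idea rests on one fact: anything a shared library exports with the platform's C calling convention can be called from Python. As a minimal sketch of that (using the standard-library `ctypes` module instead of the Python C API route Diez describes), the C runtime's `labs` and `strlen` below stand in for a routine you might assemble into your own DLL or `.so`. The library-loading logic is an assumption about common Linux, macOS and Windows setups.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Open the platform C runtime as a cdecl shared library.

    Assumes a typical Linux, macOS or Windows setup; adjust for other systems.
    """
    if sys.platform == "win32":
        return ctypes.cdll.msvcrt
    # find_library() may return None; CDLL(None) then opens the running
    # program itself, whose symbols include the already-loaded libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Declare the C prototypes so ctypes converts arguments and results correctly.
libc.labs.argtypes = [ctypes.c_long]
libc.labs.restype = ctypes.c_long
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

if __name__ == "__main__":
    print(libc.labs(-42))          # 42
    print(libc.strlen(b"python"))  # 6
```

Because `ctypes` converts arguments on every call, it adds per-call overhead compared with a real extension module, so it suits routines that do a lot of work per call, such as a SIMD loop over a whole buffer.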
Well, the problem is that I'm actually not an assembler guru, so I don't know
how to implement a DLL in asm or use a C calling convention, although I'm
sure those instructions are available on the web. I was just afraid of
trying to learn that AND make Python-specific extensions at the same time.
I thought of making a C extension with embedded asm, but that just seemed
less than ideal. But if somebody thinks that's the Right Way to do it,
that's good enough.

"Diez B. Roggisch" <de***@nospam.web.de> wrote in message
news:69*************@mid.uni-berlin.de...
> inhahe wrote:
>> Can anyone give me pointers/instructions/a template for writing a Python
>> extension in assembly (or better, HLA)?
>
> You could write a C extension and embed assembly. See the docs for how to
> write one. If you know how to implement a shared library with a C calling
> convention in assembly (being an assembler guru, you surely know how that
> works), you could mimic a C extension.
>
> Diez

Jun 27 '08 #3
inhahe wrote:
> Well, the problem is that I'm actually not an assembler guru, so I don't know
> how to implement a DLL in asm or use a C calling convention, although I'm
> sure those instructions are available on the web. I was just afraid of
> trying to learn that AND make Python-specific extensions at the same time.
> I thought of making a C extension with embedded asm, but that just seemed
> less than ideal. But if somebody thinks that's the Right Way to do it,
> that's good enough.

I think the right thing to do, if you are not that fluent in assembly, is
not to do anything in it at all. What do you need it for?

Diez
Jun 27 '08 #4
On Fri, 16 May 2008 11:21:39 -0400
"inhahe" <in****@gmail.com> wrote:
> You could be right, but here are my reasons.
>
> I need to make something that's very CPU-intensive and as fast as possible.
> The faster, the better, and if it's not fast enough it won't even work.
>
> They say that the C++ optimizer can usually optimize better than a person
> coding in assembler by hand can, but I just can't believe that, at least for
> me, because when I code in assembler, I feel like I can see the best way to
> do it and I just can't imagine AI would even be smart enough to do it that
> way...

Perhaps. Conventional wisdom says that you shouldn't optimize until
you need to, though. That's one of the benefits of the way Python
works. Here's how I would do it.

1. Write the code (call it a prototype) in pure Python. Make sure that
everything is modularized based on functionality. Try to get it split
into nice, bite-size chunks. Make sure that you have unit tests for
everything that you write.

2. Once the code is functioning, benchmark it and find the
bottlenecks. Replace the problem methods with a C extension. Refactor
(and check your unit tests again) if needed to break out the problem
areas into as small a piece as possible.

3. If it is still slow, embed some assembler where it is slowing down.

One advantage of this is that you always know if your optimizations are
useful. You may be surprised to find that you hardly ever need to go
beyond step 1 leaving you with the most portable and easily maintained
code that you can have.
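Step 1 above might look like the following minimal sketch. The `dot` kernel is a hypothetical stand-in for one of the CPU-heavy routines (a dot product is a typical SIMD candidate); the point is that the unit tests pin down its behaviour, so a later C or assembly replacement can be checked against the very same tests.

```python
import unittest


def dot(xs, ys):
    """Pure-Python prototype of a hypothetical hot kernel: the dot product of two sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


class DotTests(unittest.TestCase):
    # These tests describe the behaviour, not the implementation, so they
    # should keep passing when dot() is later replaced by a C or asm version.

    def test_known_value(self):
        self.assertEqual(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), 32.0)

    def test_empty_input(self):
        self.assertEqual(dot([], []), 0.0)

    def test_length_mismatch(self):
        with self.assertRaises(ValueError):
            dot([1.0], [])
```

Run it with `python -m unittest` followed by your module name. If the fast version later lives in an extension module, importing it under the name `dot` lets the same `DotTests` run against it unchanged.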

> For portability, I'd simply write different asm routines for different
> systems. How wide a variety of systems I'd support I don't know. As a bare
> minimum, 32-bit x86, 64-bit x86, and one or more of their available forms of
> SIMD.

Even on the same processor you may have different assemblers depending
on the OS.

--
D'Arcy J.M. Cain <da***@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
Jun 27 '08 #5
I like to learn what I need, but I have done assembly before: I wrote a
terminal program in assembly, for example, with ANSI and AVATAR support. I'm
just not fluent in much beyond the language itself, per se.

Perhaps C would be as fast as my asm would be, but C would not allow me to use
SIMD, which seems like it would improve my speed a lot; I think my goals are
pretty much what SIMD was made for.

> I think the right thing to do, if you are not that fluent in assembly, is
> not to do anything in it at all. What do you need it for?
>
> Diez

Jun 27 '08 #6
inhahe wrote:
> I like to learn what I need, but I have done assembly before: I wrote a
> terminal program in assembly, for example, with ANSI and AVATAR support. I'm
> just not fluent in much beyond the language itself, per se.
>
> Perhaps C would be as fast as my asm would be, but C would not allow me to use
> SIMD, which seems like it would improve my speed a lot; I think my goals are
> pretty much what SIMD was made for.

That is not true. I've used the AltiVec extensions myself on OS X, from
inside C.

Besides, the parts of your program that are really *worth* optimizing
are astonishingly few. Don't bother using assembler until you need to.

Diez
Jun 27 '08 #7
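Finding the few parts that are worth optimizing is what a profiler is for. A minimal sketch using the standard-library `cProfile` and `pstats` modules: `parse`, `smooth` and `pipeline` are hypothetical stand-ins for a real program, with `smooth` deliberately doing most of the work. The helper reads the raw `stats` table of a `pstats.Stats` object, a long-standing but undocumented attribute, so treat that detail as an assumption.

```python
import cProfile
import pstats


def parse(tokens):
    # Hypothetical cheap stage: convert text tokens to floats once.
    values = []
    for token in tokens:
        values.append(float(token))
    return values


def smooth(values, passes=100):
    # Hypothetical hot stage: repeated 3-point moving average.
    for _ in range(passes):
        smoothed = values[:]
        for i in range(1, len(values) - 1):
            smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
        values = smoothed
    return values


def pipeline():
    tokens = [str(i % 17) for i in range(2000)]
    return smooth(parse(tokens))


def hottest(func):
    """Profile func() and return (name, seconds) for the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, calls, tottime, cumtime, callers).
    (_, _, name), row = max(stats.stats.items(), key=lambda item: item[1][2])
    return name, row[2]


if __name__ == "__main__":
    print(hottest(pipeline))
```

For the full picture, `pstats.Stats(profiler).sort_stats("tottime").print_stats(5)` prints the five costliest entries. In a program shaped like this one, `smooth` dwarfs everything else, and it would be the only candidate worth moving to C or assembly.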

"D'Arcy J.M. Cain" <da***@druid.net> wrote in message
news:ma***************************************@python.org...
> 2. Once the code is functioning, benchmark it and find the
> bottlenecks. Replace the problem methods with a C extension. Refactor
> (and check your unit tests again) if needed to break out the problem
> areas into as small a piece as possible.

There are probably only 2 or 3 basic algorithms that will need all
that speed.

> 3. If it is still slow, embed some assembler where it is slowing down.

I won't know if the assembler is faster until I embed it, and if I'm going
to do that I might as well use it.
Although it's true I'd only have to embed it for one system to see (more or
less).

>> For portability, I'd simply write different asm routines for different
>> systems. How wide a variety of systems I'd support I don't know. As a bare
>> minimum, 32-bit x86, 64-bit x86, and one or more of their available forms of
>> SIMD.
>
> Even on the same processor you may have different assemblers depending
> on the OS.

Yeah, I don't know much about that. I was figuring perhaps I could limit the
assembler parts / methodology to something I could write generically
enough, and if all else fails write for the other OSes or only support
Windows. Also, I think I should be using SIMD of some sort, and I'm not
sure, but I highly doubt C++ compilers support SIMD.
Jun 27 '08 #8
> Yeah, I don't know much about that. I was figuring perhaps I could limit the
> assembler parts / methodology to something I could write generically
> enough, and if all else fails write for the other OSes or only support
> Windows. Also, I think I should be using SIMD of some sort, and I'm not
> sure, but I highly doubt C++ compilers support SIMD.

You're wrong.

Maybe we could help you better if you told us what task you are
trying to achieve (or which algorithms you think need optimization).
Jun 27 '08 #9
On May 16, 11:24 am, "inhahe" <inh...@gmail.com> wrote:
> Yeah, I don't know much about that. I was figuring perhaps I could limit the
> assembler parts / methodology to something I could write generically
> enough, and if all else fails write for the other OSes or only support
> Windows. Also, I think I should be using SIMD of some sort, and I'm not
> sure, but I highly doubt C++ compilers support SIMD.

The Society for Inherited Metabolic Disorders?

Why wouldn't the compilers support it? It's part of the x86
architecture, isn't it?
Jun 27 '08 #10
>> 3. If it is still slow, embed some assembler where it is slowing down.
>
> I won't know if the assembler is faster until I embed it, and if I'm going
> to do that I might as well use it.
> Although it's true I'd only have to embed it for one system to see (more or
> less).

Regardless of whether it's faster, I thought you indicated that what
really matters is that it's fast enough.

That said, it's not true that you won't know if it's faster until you
embed it; that's what unit testing would be for. Write your loop(s)
in Python, C, ASM, <insert language here> and run them on actual
inputs (or synthetic ones, if necessary, I suppose). That's how you'll be
able to tell whether it's even worth the effort to make the assembly
callable from Python.
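The comparison suggested above can be sketched as a small timing harness. Each candidate is any callable taking the same input, so a pure-Python loop, a C extension function, or a `ctypes` wrapper around an assembled routine would all fit. The two candidates here are hypothetical stand-ins. The harness first checks that every candidate agrees with the first (reference) one, then reports the best-of-N time per call, since the minimum is the run least disturbed by other activity on the machine.

```python
import functools
import timeit


def sum_squares_loop(values):
    # Reference implementation: a plain Python loop.
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    # Hypothetical candidate replacement: leans on the C-implemented sum().
    return sum(v * v for v in values)


def compare(candidates, data, repeat=5, number=20):
    """Check every candidate against the first one, then return best seconds per call."""
    expected = candidates[0](data)
    timings = {}
    for fn in candidates:
        if fn(data) != expected:
            raise ValueError(f"{fn.__name__} disagrees with {candidates[0].__name__}")
        runs = timeit.repeat(functools.partial(fn, data), repeat=repeat, number=number)
        timings[fn.__name__] = min(runs) / number
    return timings


if __name__ == "__main__":
    data = list(range(10_000))
    for name, seconds in compare([sum_squares_loop, sum_squares_builtin], data).items():
        print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
```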

On Fri, May 16, 2008 at 1:27 PM, Mensanator <me********@aol.com> wrote:
> Why wouldn't the compilers support it? It's part of the x86
> architecture, isn't it?

Yeah, but I don't know if the compiler uses it by default, and my guess is
that it depends on whether the compiler back end, when optimizing the code,
recognizes data-access/computation patterns amenable to SIMD.
Jun 27 '08 #11
On May 16, 12:24 pm, "inhahe" <inh...@gmail.com> wrote:
> "D'Arcy J.M. Cain" <da...@druid.net> wrote in message news:ma***************************************@python.org...
>> 2. Once the code is functioning, benchmark it and find the
>> bottlenecks. Replace the problem methods with a C extension. Refactor
>> (and check your unit tests again) if needed to break out the problem
>> areas into as small a piece as possible.
>
> There are probably only 2 or 3 basic algorithms that will need all
> that speed.
>
>> 3. If it is still slow, embed some assembler where it is slowing down.
>
> I won't know if the assembler is faster until I embed it, and if I'm going
> to do that I might as well use it.

You won't know if the C is faster than the assembly until you write
it, and if you're going to do that you might as well use it...

If the C is fast enough, there's no point in wasting time writing the
assembly.

(Also, FWIW, C and C++ are different languages; you seem to conflate the
two a few times upthread.)
Jun 27 '08 #12

"Dan Upton" <up***@virginia.edu> wrote in message
news:ma***************************************@python.org...
> On Fri, May 16, 2008 at 1:27 PM, Mensanator <me********@aol.com> wrote:
>> Why wouldn't the compilers support it? It's part of the x86
>> architecture, isn't it?
>
> Yeah, but I don't know if the compiler uses it by default, and my guess is
> that it depends on whether the compiler back end, when optimizing the code,
> recognizes data-access/computation patterns amenable to SIMD.

Perhaps you use them explicitly, with some extended syntax or something?
Jun 27 '08 #13
On Fri, May 16, 2008 at 2:08 PM, inhahe <in****@gmail.com> wrote:
> Perhaps you use them explicitly, with some extended syntax or something?

Hey, I learned something today.

http://www.tuleriit.ee/progs/rexample.php

Also, from the gcc manpage: apparently x87 (387) floating-point code is the
default when compiling for 32-bit architectures, and SSE instructions are
the default on x86-64, but you can use -march=(some architecture with SIMD
instructions), -msse, -msse2, -msse3, or -mfpmath=(one of 387, sse, or
sse,387) to get the compiler to use them.

As long as we're talking about compilers and such... does anybody want to
chip in on how this works in Python bytecode, or what the bytecode
interpreter does? Okay, wait, before anybody says that's
implementation-dependent: does anybody want to chip in on what the
CPython implementation does? (Or any other implementation they're
familiar with, I guess.)
Jun 27 '08 #14
On Fri, 16 May 2008 10:13:04 -0400, inhahe wrote:
> Can anyone give me pointers/instructions/a template for writing a Python
> extension in assembly (or better, HLA)?

Look at the pygame sources. They have some hot inline MMX code.
I experimented with this recently, and I must admit that it's extremely
hard to beat the C compiler. My first asm code was actually slower than C;
only after reading the Intel docs and figuring out what makes 'movq' and
'movntq' different was I able to write something that was several times
faster than C.

The D language's inline asm and its tools for making Python extensions look
very promising, although I haven't tried them yet.

-- Ivan
Jun 27 '08 #15
> Also, from the gcc manpage: apparently x87 (387) floating-point code is the
> default when compiling for 32-bit architectures, and SSE instructions are
> the default on x86-64, but you can use -march=(some architecture with SIMD
> instructions), -msse, -msse2, -msse3, or -mfpmath=(one of 387, sse, or
> sse,387) to get the compiler to use them.
>
> As long as we're talking about compilers and such... does anybody want to
> chip in on how this works in Python bytecode, or what the bytecode
> interpreter does? Okay, wait, before anybody says that's
> implementation-dependent: does anybody want to chip in on what the
> CPython implementation does? (Or any other implementation they're
> familiar with, I guess.)

Nothing in (C)Python is aware of these architecture extensions, unless
third-party libraries use them. The bytecode interpreter is machine- and
OS-independent, so it works above that level anyway. And AFAIK all
mathematical functionality is whatever the OS math libraries expose.

Having said that, there are of course libraries like NumPy that do take
advantage of these extensions, e.g. through ATLAS.

Diez
Jun 27 '08 #16
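The point about the bytecode interpreter working above the machine level can be checked with the standard-library `dis` module. In this sketch (the function `scale_and_add` is a made-up example), CPython compiles the arithmetic into generic stack-machine instructions such as loads of local variables and binary operations; nothing in the listing refers to x87, SSE or any other CPU feature. Opcode names differ between CPython releases (for instance, 3.11 replaced `BINARY_MULTIPLY`/`BINARY_ADD` with a single `BINARY_OP`), so no exact output is shown.

```python
import dis


def scale_and_add(a, x, y):
    # A made-up numeric kernel: CPython compiles it to portable bytecode,
    # not to x87/SSE machine instructions.
    return a * x + y


def opcode_names(func):
    """Return the names of the bytecode instructions CPython generated for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    dis.dis(scale_and_add)  # human-readable listing; exact opcodes vary by CPython version
    print(opcode_names(scale_and_add))
```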
On Fri, 16 May 2008 11:21:39 -0400, "inhahe"
<in****@gmail.com> wrote:
> They say that the C++ optimizer can usually optimize
> better than a person coding in assembler by hand can,
> but I just can't believe that, at least for me,
> because when I code in assembler,

If one hand-compiles C++ into assembler, the result will
doubtless be crap compared to what the compiler will
generate. If, however, one expresses one's algorithm in
assembler, rather than in C++, the result may well be
dramatically more efficient than expressing one's
algorithm in C++ and having the compiler translate it into
assembler. A factor of ten is quite common.

--
----------------------
We have the right to defend ourselves and our property, because
of the kind of animals that we are. True law derives from this
right, not from the arbitrary power of the omnipotent state.

http://www.jim.com/ James A. Donald
Jun 27 '08 #17
