numarray speed question

grv

So it is supposed to be very fast to have an array of say 5 million
integers stored in a binary file and do

a = numarray.fromfile('filename', (2, 2, 2))
numarray.add(a, 9, a)

but how is that faster than reading the entire file into memory and then
having a for loop in C:
(loop over range) {
*p++ += 9 }

or is that essentially what's going on?
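For comparison, here is a minimal sketch of the "read the whole file into memory, then loop" approach, written with only Python's standard `array` module. It assumes the file holds raw native-endian C ints (typecode `"i"`, usually 4 bytes), and the helper name `add_scalar_from_file` is hypothetical. The explicit loop runs in the interpreter, so it is far slower than the C loop or `numarray.add`. It only shows the data flow: one bulk read into a contiguous buffer, then an in-place pass over it.

```python
import array
import os
import tempfile


def add_scalar_from_file(path, scalar, typecode="i"):
    """Read a raw binary file of C ints into memory, then add `scalar` in place."""
    a = array.array(typecode)
    count = os.path.getsize(path) // a.itemsize  # number of whole elements in the file
    with open(path, "rb") as f:
        a.fromfile(f, count)  # one bulk read into a contiguous C buffer
    for i in range(len(a)):  # the equivalent of: for (...) *p++ += 9;
        a[i] += scalar
    return a


if __name__ == "__main__":
    # Write a small test file (5 ints rather than 5 million) and process it.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        array.array("i", range(5)).tofile(f)
    result = add_scalar_from_file(path, 9)
    os.remove(path)
    print(result.tolist())  # [9, 10, 11, 12, 13]
```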

Jul 18 '05 #1

Christopher T King

On Wed, 4 Aug 2004, grv wrote:

> So it is supposed to be very fast to have an array of say 5 million
> integers stored in a binary file and do
>
> a = numarray.fromfile('filename', (2, 2, 2))
> numarray.add(a, 9, a)
>
> but how is that faster than reading the entire file into memory and then
> having a for loop in C:
> (loop over range) {
> *p++ += 9 }
>
> or is that essentially what's going on?

That's essentially what's going on ;) The point of numarray isn't to be
hyper-fast, but to be as fast as the equivalent C (or Fortran, or
what-have-you) implementation. In many cases, it's faster, because
numarray is designed with several speed hacks in mind, but it's nothing
you can't do (without a little work) in C.

Jul 18 '05 #2

grv

sq******@WPI.EDU (Christopher T King) wrote in
<Pi**************************************@ccc6.wpi.edu>:

> That's essentially what's going on ;) The point of numarray isn't to be
> hyper-fast, but to be as fast as the equivalent C (or Fortran, or
> what-have-you) implementation. In many cases, it's faster, because
> numarray is designed with several speed hacks in mind, but it's nothing
> you can't do (without a little work) in C.

Yes but see I'm interested in what speed hacks can actually be done to
improve the above code. I just don't see anything that can iterate and add
over that memory region faster.

Jul 18 '05 #3

David M. Cooke

At some point, gr****@hotmail.com (grv) wrote:

> Yes but see I'm interested in what speed hacks can actually be done to
> improve the above code. I just don't see anything that can iterate and add
> over that memory region faster.

Well, numarray probably isn't faster for this case (adding a scalar to
a vector). In fact, the relevant numarray code looks like this:

static int add_Float64_vector_scalar(long niter, long ninargs, long noutargs,
                                     void **buffers, long *bsizes) {
    long i;
    Float64 *tin1 = (Float64 *) buffers[0];
    Float64 tscalar = *(Float64 *) buffers[1];
    Float64 *tout = (Float64 *) buffers[2];

    for (i=0; i<niter; i++, tin1++, tout++) {
        *tout = *tin1 + tscalar;
    }
    return 0;
}
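To make the buffer convention concrete, here is a rough Python transliteration of the kernel above. It is not numarray's code: the name `add_vector_scalar`, the use of `array.array` buffers, and dropping the unused `ninargs`, `noutargs` and `bsizes` parameters are all illustrative assumptions. `buffers[0]` is the input vector, `buffers[1]` holds the one-element scalar, and `buffers[2]` is the output. Because each element is read before it is written, passing the same buffer as input and output (presumably what an in-place call like `numarray.add(a, 9, a)` amounts to for a well-behaved array) is safe.

```python
from array import array


def add_vector_scalar(niter, buffers):
    """Python analogue of add_Float64_vector_scalar: tout[i] = tin1[i] + tscalar."""
    tin1 = buffers[0]        # input vector
    tscalar = buffers[1][0]  # scalar, passed as a one-element buffer
    tout = buffers[2]        # output vector (may be the same object as tin1)
    for i in range(niter):
        tout[i] = tin1[i] + tscalar  # read element i, then write element i
    return 0


if __name__ == "__main__":
    a = array("d", [1.0, 2.0, 3.0])
    # Input and output alias the same buffer, as in an in-place add.
    add_vector_scalar(len(a), [a, array("d", [9.0]), a])
    print(a.tolist())  # [10.0, 11.0, 12.0]
```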

What you *do* get with numarray is:

1) transparent handling of byteswapped, misaligned, discontiguous, or
   type-mismatched data (say, from a memory-mapped file generated on a
   system with a different byte order, stored as single precision instead
   of double precision).

2) ease of use. Those two lines of Python code above are _it_ (apart
   from an 'import numarray' statement). Your C code isn't anywhere near
   complete enough to use. You would need to add routines to read the
   file, etc.

3) interactive use. You can do all this in the Python command line. If
you want to multiply instead of add, an up-arrow and some editing
will do that. With C, you'd have to recompile.

If you need the best possible speed (after doing it in numarray and
finding it isn't fast enough), you can write an extension module to
do that bit in C, or look into scipy.weave for inlining C code, or into
f2py for linking Fortran code to Python.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca

Jul 18 '05 #4

grv575

co**********@physics.mcmaster.ca (David M. Cooke) wrote in message news:<qn*************@arbutus.physics.mcmaster.ca>...

> Well, numarray probably isn't faster for this case (adding a scalar to
> a vector). In fact, the relevant numarray code looks like this:
>
> static int add_Float64_vector_scalar(long niter, long ninargs, long noutargs,
>                                      void **buffers, long *bsizes) {
>     long i;
>     Float64 *tin1 = (Float64 *) buffers[0];
>     Float64 tscalar = *(Float64 *) buffers[1];
>     Float64 *tout = (Float64 *) buffers[2];
>
>     for (i=0; i<niter; i++, tin1++, tout++) {
>         *tout = *tin1 + tscalar;
>     }
>     return 0;
> }

OK good. So doing it in C isn't really that much of a headache when
it comes to optimization.

> What you *do* get with numarray is:
>
> 1) transparent handling of byteswapped, misaligned, discontiguous, or
>    type-mismatched data (say, from a memory-mapped file generated on a
>    system with a different byte order, stored as single precision instead
>    of double precision).

Heh. Try timing the example I gave (a += 5) using byteswapped() vs.
byteswap(). Doing the byteswap itself is fairly fast. If you go the
interpretation route (byteswapped()), then all subsequent array operations
are at least an order of magnitude slower (in a test with 5 million
elements).

> 2) ease of use. Those two lines of Python code above are _it_ (apart
>    from an 'import numarray' statement). Your C code isn't anywhere near
>    complete enough to use. You would need to add routines to read the
>    file, etc.

Can't argue here.

> 3) interactive use. You can do all this in the Python command line. If
>    you want to multiply instead of add, an up-arrow and some editing
>    will do that. With C, you'd have to recompile.

As much as I hate the .edu push for interpreted languages like Lisp
and ML, having a Python interpreter to test code out really quickly
before it goes into the source script is really nice.

> If you need the best possible speed (after doing it in numarray and
> finding it isn't fast enough), you can write an extension module to
> do that bit in C, or look into scipy.weave for inlining C code, or into
> f2py for linking Fortran code to Python.

Well, regarding speed, what really bothers me is how slowly numarray
is improving in this area. If I have to take 1000 FFTs over 32-element
arrays, then it's useless. I'll have to install both numarray
and Numeric :/

Jul 18 '05 #5

David M. Cooke

At some point, gr****@hotmail.com (grv575) wrote:

> Heh. Try timing the example I gave (a += 5) using byteswapped() vs.
> byteswap(). Doing the byteswap itself is fairly fast. If you go the
> interpretation route (byteswapped()), then all subsequent array operations
> are at least an order of magnitude slower (in a test with 5 million
> elements).

You mean something like
    a = arange(0, 5000000, type=Float64).byteswapped()
    a += 5

vs.
    a = arange(0, 5000000, type=Float64)
    a.byteswap()
    a += 5

? I get the same time for the a+=5 in each case -- and it's only twice
as slow as operating on a non-byteswapped version. Note that numarray
calls the ufunc add routine with non-byteswapped numbers; it takes a
block, orders it correctly, then adds 5 to that, does the byteswap on
the result, and stores that back. (You're not making a full copy of
the array; just a large enough section at a time to do useful work.)
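The block-at-a-time strategy described above can be sketched with Python's standard `array` module, whose `byteswap()` method reverses the byte order of every element in place. This only illustrates the idea and is not numarray's implementation: it assumes the stored data are doubles in the non-native byte order, and the function name `add_scalar_byteswapped` and the default block size are hypothetical. At most one block is copied at a time, never the whole array.

```python
from array import array


def add_scalar_byteswapped(a, scalar, blocksize=4096):
    """Add `scalar` to a double array `a` whose elements are stored byteswapped.

    Works one block at a time: copy a block, swap it to native order,
    add, swap it back, and store it, so no full-size temporary is made.
    """
    for start in range(0, len(a), blocksize):
        stop = start + blocksize
        block = a[start:stop]   # slicing an array copies just this block
        block.byteswap()        # stored order -> native order
        for i in range(len(block)):
            block[i] += scalar
        block.byteswap()        # native order -> stored order
        a[start:stop] = block   # write the block back (same length)


if __name__ == "__main__":
    data = array("d", [float(x) for x in range(10)])
    data.byteswap()             # simulate data stored in the other byte order
    add_scalar_byteswapped(data, 5.0, blocksize=4)
    data.byteswap()             # convert back only to inspect the result
    print(data.tolist())  # [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
```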

> Well, regarding speed, what really bothers me is how slowly numarray
> is improving in this area. If I have to take 1000 FFTs over 32-element
> arrays, then it's useless. I'll have to install both numarray
> and Numeric :/

Maybe what you need is a package designed for *small* arrays (< 1000
elements). Simple C wrappers: just C doubles and ints, no byteswapped or
misaligned data. Maybe a fixed number of dimensions. Probably easy to
throw something together using Pyrex. Or wrap blitz++ with boost::python.

Jul 18 '05 #6

grv

co**********@physics.mcmaster.ca (David M. Cooke) wrote in
<qn*************@arbutus.physics.mcmaster.ca>:

> You mean something like
>     a = arange(0, 5000000, type=Float64).byteswapped()
>     a += 5
>
> vs.
>     a = arange(0, 5000000, type=Float64)
>     a.byteswap()
>     a += 5
>
> ? I get the same time for the a+=5 in each case -- and it's only twice
> as slow as operating on a non-byteswapped version. Note that numarray
> calls the ufunc add routine with non-byteswapped numbers; it takes a
> block, orders it correctly, then adds 5 to that, does the byteswap on
> the result, and stores that back. (You're not making a full copy of
> the array; just a large enough section at a time to do useful work.)

It must be using some sort of cache for the multiplication. It seems that
the first run takes 6 seconds and subsequent runs take 0.05 seconds, for
either version.

> Maybe what you need is a package designed for *small* arrays (< 1000
> elements). Simple C wrappers: just C doubles and ints, no byteswapped or
> misaligned data. Maybe a fixed number of dimensions. Probably easy to
> throw something together using Pyrex. Or wrap blitz++ with boost::python.

I'll check out Numeric first. I'd rather have a drop-in solution (which
hopefully will get more optimized in future releases) than hack together
my own wrappers. Is it some purist mentality that's keeping numarray from
dropping to C code for the time-critical routines? Or can a lot of the
speed issues be attributed to the overhead of using objects for the library
(numarray does seem more general)?

Jul 18 '05 #7

David M. Cooke

At some point, gr****@hotmail.com (grv) wrote:

> It must be using some sort of cache for the multiplication. It seems that
> the first run takes 6 seconds and subsequent runs take 0.05 seconds, for
> either version.

There is. The ufunc for the addition gets cached, so the first time
takes longer (but not that much???)
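One common way to get this first-call-only cost is to memoize the lookup that selects a compiled kernel for a given operation and type signature, so the setup work runs only on a cache miss. The sketch below uses `functools.lru_cache` and illustrates the caching idea only; `get_kernel` and the `_KERNELS` table are hypothetical stand-ins, not numarray internals.

```python
import functools

# Hypothetical table of "compiled" kernels, keyed by (operation, input type, output type).
_KERNELS = {
    ("add", "Float64", "Float64"): lambda x, y: x + y,
    ("multiply", "Float64", "Float64"): lambda x, y: x * y,
}


@functools.lru_cache(maxsize=None)
def get_kernel(op, intype, outtype):
    """Look up (and, in a real library, possibly build) the kernel for this signature.

    The body only runs on a cache miss; repeated calls with the same
    arguments return the cached kernel immediately.
    """
    return _KERNELS[(op, intype, outtype)]


if __name__ == "__main__":
    add = get_kernel("add", "Float64", "Float64")   # first call: miss, does the lookup
    add2 = get_kernel("add", "Float64", "Float64")  # second call: served from the cache
    print(add is add2, get_kernel.cache_info().hits)  # True 1
```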

> I'll check out Numeric first. I'd rather have a drop-in solution (which
> hopefully will get more optimized in future releases) than hack together
> my own wrappers. Is it some purist mentality that's keeping numarray from
> dropping to C code for the time-critical routines? Or can a lot of the
> speed issues be attributed to the overhead of using objects for the library
> (numarray does seem more general)?

It's the object overhead in numarray. The developers moved stuff up to
Python, where it's more flexible to handle. Numeric is faster for
small arrays (say < 3000), but numarray is much better at large
arrays. I have some speed comparisons at
http://arbutus.mcmaster.ca/dmc/numpy/
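A rough way to see why per-call object overhead matters mostly for small arrays is a simple linear cost model. This is an assumption for illustration, not a measured numarray profile: each call pays a fixed overhead $t_0$ (argument checking, object creation, dispatch) plus a per-element cost $c$. For an FFT the per-call work grows roughly as $n \log_2 n$, with some constant $c_{\text{fft}}$. Applied to the 1000 FFTs over 32-element arrays mentioned earlier in the thread, the fixed term is paid 1000 times:

```latex
T_{\text{call}}(n) = t_0 + c\,n,
\qquad
\frac{t_0}{T_{\text{call}}(n)} = \frac{t_0}{t_0 + c\,n} \approx
\begin{cases}
1, & c\,n \ll t_0 \\
0, & c\,n \gg t_0
\end{cases}

T_{\text{1000 FFTs}} = 1000\,\bigl(t_0 + c_{\text{fft}}\, n \log_2 n\bigr)\Big|_{n=32}
  = 1000\,t_0 + 160\,000\,c_{\text{fft}}
```

Under this model, a package with a smaller $t_0$ (Numeric, or a thin wrapper) wins for small $n$ even if its per-element cost is no better.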

I did a simple wrapper using Pyrex the other night for a vector of
doubles (it just does addition, so it's not much good :-) It's twice
as fast as Numeric, so I might give it a further try.
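As a rough picture of how small such a wrapper's interface can be, here is a pure-Python sketch of a vector of doubles that supports only addition. Pyrex compiles Python-like source to C, so a Pyrex version could keep a similar shape while typing the inner loop. This version is uncompiled and is not the wrapper referred to above; the class name `DoubleVector` is hypothetical.

```python
from array import array


class DoubleVector:
    """A fixed-length vector of C doubles supporting only element-wise addition."""

    __slots__ = ("data",)  # no per-instance __dict__: keeps the object small

    def __init__(self, values):
        self.data = array("d", values)  # contiguous, native-order doubles

    def __len__(self):
        return len(self.data)

    def __add__(self, other):
        if isinstance(other, DoubleVector):
            if len(other) != len(self):
                raise ValueError("length mismatch")
            return DoubleVector(x + y for x, y in zip(self.data, other.data))
        # Otherwise treat `other` as a scalar.
        return DoubleVector(x + other for x in self.data)


if __name__ == "__main__":
    v = DoubleVector([1.0, 2.0, 3.0])
    print((v + v).data.tolist(), (v + 0.5).data.tolist())  # [2.0, 4.0, 6.0] [1.5, 2.5, 3.5]
```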

Jul 18 '05 #8

grv575

Is it true that Python uses doubles for all its internal floating
point arithmetic, even if you're using something like numarray's
Complex32? Is it possible to do single-precision FFTs in numarray or
not?

Jul 18 '05 #9

Tim Hochberg

grv575 wrote:

> Is it true that Python uses doubles for all its internal floating
> point arithmetic
Yes.

> even if you're using something like numarray's Complex32?
No. Numarray is an extension module and can use whatever numeric types
it feels like. Float32 for instance is an array of C floats (assuming
floats are 32 bits on your box, which they almost certainly are).
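This distinction can be seen with nothing more than the standard library: `sys.float_info` describes the C double behind every Python `float`, while an `array.array` with typecode `"f"` stores C floats (normally 32 bits). Reading an element back converts it to a Python float, i.e. a double, so the value shows the rounding from single-precision storage. This only illustrates storage vs. scalar precision; it says nothing about how numarray's FFT routines compute internally.

```python
import array
import sys

# Every Python float is a C double: 53 significand bits on IEEE 754 platforms.
print(sys.float_info.mant_dig)  # 53

single = array.array("f", [0.1])  # stored as a C float
double = array.array("d", [0.1])  # stored as a C double
print(single.itemsize, double.itemsize)  # 4 8 on typical platforms

# Indexing returns a Python float (a double), but the single-precision value
# was rounded to 24 significand bits on storage, so it no longer equals 0.1.
print(single[0] == 0.1, double[0] == 0.1)  # False True
```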
> Is it possible to do single-precision FFTs in numarray or not?
I believe so, but I'm not sure off the top of my head. I recommend that
you ask on numpy-discussion <nu**************@lists.sourceforge.net> or
peek at the implementation. It's possible that all FFTs are done in double
precision, but I don't think so.

-tim

Jul 18 '05 #10
