So it is supposed to be very fast to have an array of say 5 million
integers stored in a binary file and do
a = numarray.fromfile('filename', (2, 2, 2))
numarray.add(a, 9, a)
but how is that faster than reading the entire file into memory and then
having a for loop in C:

    for (i = 0; i < n; i++)
        *p++ += 9;

or is that essentially what's going on?
On Wed, 4 Aug 2004, grv wrote:
> So it is supposed to be very fast to have an array of say 5 million
> integers stored in a binary file and do
>     a = numarray.fromfile('filename', (2, 2, 2))
>     numarray.add(a, 9, a)
> but how is that faster than reading the entire file into memory and then
> having a for loop in C, or is that essentially what's going on?
That's essentially what's going on ;) The point of numarray isn't to be
hyper-fast, but to be as fast as the equivalent C (or Fortran, or
what-have-you) implementation. In many cases, it's faster, because
numarray is designed with several speed hacks in mind, but it's nothing
you can't do (without a little work) in C.

sq******@WPI.EDU (Christopher T King) wrote in
<Pi**************************************@ccc6.wpi.edu>:
> That's essentially what's going on ;) The point of numarray isn't to be
> hyper-fast, but to be as fast as the equivalent C (or Fortran, or
> what-have-you) implementation. In many cases, it's faster, because
> numarray is designed with several speed hacks in mind, but it's nothing
> you can't do (without a little work) in C.
Yes, but I'm interested in what speed hacks can actually be done to
improve the above code. I just don't see anything that can iterate and
add over that memory region faster.
At some point, gr****@hotmail.com (grv) wrote:
> Yes, but I'm interested in what speed hacks can actually be done to
> improve the above code. I just don't see anything that can iterate and
> add over that memory region faster.
Well, numarray probably isn't faster for this case (adding a scalar to
a vector). In fact, the relevant numarray code looks like this:
static int
add_Float64_vector_scalar(long niter, long ninargs, long noutargs,
                          void **buffers, long *bsizes)
{
    long i;
    Float64 *tin1 = (Float64 *) buffers[0];
    Float64 tscalar = *(Float64 *) buffers[1];
    Float64 *tout = (Float64 *) buffers[2];

    for (i = 0; i < niter; i++, tin1++, tout++) {
        *tout = *tin1 + tscalar;
    }
    return 0;
}
What you *do* get with numarray is:
1) transparent handling of byteswapped, misaligned, discontiguous,
type-mismatched data (say, from a memory-mapped file generated on a
system with a different byte order as single-precision instead of
double-precision).
2) ease-of-use. Those two lines of Python code above are _it_ (except
   for an 'import numarray' statement). Your C code isn't anywhere
   near complete enough to use. You would need to add routines to
   read the file, etc.
3) interactive use. You can do all this in the Python command line. If
you want to multiply instead of add, an up-arrow and some editing
will do that. With C, you'd have to recompile.
If you need the best possible speed (after doing it in numarray and
finding it isn't fast enough), you can write an extension module to
do that bit in C, or look into scipy.weave for inlining C code, or into
f2py for linking Fortran code to Python.
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
co**********@physics.mcmaster.ca (David M. Cooke) wrote in message
news:<qn*************@arbutus.physics.mcmaster.ca>...
> Well, numarray probably isn't faster for this case (adding a scalar to
> a vector). In fact, the relevant numarray code looks like this:
> [C source snipped]
OK, good. So doing it in C isn't really that much of a headache when
it comes to optimization.
> What you *do* get with numarray is:
> 1) transparent handling of byteswapped, misaligned, discontiguous,
>    type-mismatched data (say, from a memory-mapped file generated on a
>    system with a different byte order as single-precision instead of
>    double-precision).
Heh. Try timing the example I gave (a += 5) using byteswapped() vs.
byteswap(). It's fairly fast to do the byteswap. If you go the
interpretation route (byteswapped()) then all subsequent array operations
are at least an order of magnitude slower (5-million-element test
example).
> 2) ease-of-use. That two lines of python code above is _it_ (except
>    for an 'import numarray' statement). Your C code isn't anywhere
>    nearly complete enough to use. You would need to add routines to
>    read the file, etc.
Can't argue here.
> 3) interactive use. You can do all this in the Python command line. If
>    you want to multiply instead of add, an up-arrow and some editing
>    will do that. With C, you'd have to recompile.
As much as I hate the .edu push for interpreted languages like lisp
and ml, having a python interpreter to test code out real quickly
before it goes into the source script is real nice.
> If you need the best possible speed (after doing it in numarray and
> finding it isn't fast enough), you can write an extension module to
> do that bit in C, or look into scipy.weave for inlining C code, or into
> f2py for linking Fortran code to Python.
Well, re speed, what really bothers me is the slowness with which
numarray is improving in this area. If I have to take 1000 FFTs over
32-element arrays, then it's useless. I'll have to install both numarray
and Numeric :/
At some point, gr****@hotmail.com (grv575) wrote:
> Heh. Try timing the example I gave (a += 5) using byteswapped vs.
> byteswap(). It's fairly fast to do the byteswap. If you go the
> interpretation way (byteswapped) then all subsequent array operations
> are at least an order of magnitude slower (5 million elements test
> example).
You mean something like

    a = arange(0, 5000000, type=Float64).byteswapped()
    a += 5

vs.

    a = arange(0, 5000000, type=Float64)
    a.byteswap()
    a += 5
? I get the same time for the a+=5 in each case -- and it's only twice
as slow as operating on a non-byteswapped version. Note that numarray
calls the ufunc add routine with non-byteswapped numbers; it takes a
block, orders it correctly, then adds 5 to that, does the byteswap on
the result, and stores that back. (You're not making a full copy of
the array; just a large enough section at a time to do useful work.)

> Well, re speed, what really bothers me is the slowness with which
> numarray is improving in this area. If I have to take 1000 FFTs over
> 32-element arrays, then it's useless. I'll have to install both
> numarray and Numeric :/
Maybe what you need is a package designed for *small* arrays ( < 1000).
Simple C wrappers; just C doubles and ints, no byteswap, non-aligned.
Maybe a fixed number of dimensions. Probably easy to throw something
together using Pyrex. Or, wrap blitz++ with boost::python.
co**********@physics.mcmaster.ca (David M. Cooke) wrote in
<qn*************@arbutus.physics.mcmaster.ca>:
> I get the same time for the a+=5 in each case -- and it's only twice
> as slow as operating on a non-byteswapped version. Note that numarray
> calls the ufunc add routine with non-byteswapped numbers; it takes a
> block, orders it correctly, then adds 5 to that, does the byteswap on
> the result, and stores that back. (You're not making a full copy of
> the array; just a large enough section at a time to do useful work.)
It must be using some sort of cache for the multiplication. Seems like on
the first run it takes 6 seconds and subsequently .05 seconds for either
version.
> Maybe what you need is a package designed for *small* arrays ( < 1000).
> Simple C wrappers; just C doubles and ints, no byteswap, non-aligned.
> Maybe a fixed number of dimensions. Probably easy to throw something
> together using Pyrex. Or, wrap blitz++ with boost::python.
I'll check out Numeric first. Would rather have a drop-in solution (which
hopefully will get more optimized in future releases) rather than hacking
my own wrappers. Is it some purist mentality that's keeping numarray from
dropping to C code for the time-critical routines? Or can a lot of the
speed issues be attributed to the overhead of using objects for the library
(numarray does seem more general)?
At some point, gr****@hotmail.com (grv) wrote:
> It must be using some sort of cache for the multiplication. Seems like
> on the first run it takes 6 seconds and subsequently .05 seconds for
> either version.
There is. The ufunc for the addition gets cached, so the first time
takes longer (but not that much???)

> I'll check out Numeric first. Would rather have a drop-in solution
> (which hopefully will get more optimized in future releases) rather
> than hacking my own wrappers. Is it some purist mentality that's
> keeping numarray from dropping to C code for the time-critical
> routines? Or can a lot of the speed issues be attributed to the
> overhead of using objects for the library (numarray does seem more
> general)?
It's the object overhead in numarray. The developers moved stuff up to
Python, where it's more flexible to handle. Numeric is faster for
small arrays (say < 3000), but numarray is much better at large
arrays. I have some speed comparisons at http://arbutus.mcmaster.ca/dmc/numpy/
I did a simple wrapper using Pyrex the other night for a vector of
doubles (it just does addition, so it's not much good :-) It's twice
as fast as Numeric, so I might give it a further try.
Is it true that Python uses doubles for all its internal floating
point arithmetic even if you're using something like numarray's
Complex32? Is it possible to do single precision FFTs in numarray or
no?
grv575 wrote:
> Is it true that python uses doubles for all it's internal floating
> point arithmetic
Yes.
> even if you're using something like numarray's Complex32?
No. Numarray is an extension module and can use whatever numeric types
it feels like. Float32 for instance is an array of C floats (assuming
floats are 32 bits on your box, which they almost certainly are).
> Is it possible to do single precision ffts in numarray or no?
I believe so, but I'm not sure off the top of my head. I recommend that
you ask on numpy-discussion <nu**************@lists.sourceforge.net> or
peek at the implementation. It's possible that all FFTs are done double
precision, but I don't think so.
-tim
At some point, Tim Hochberg <ti**********@ieee.org> wrote:
> I believe so, but I'm not sure off the top of my head. I recommend
> that you ask on numpy-discussion
> <nu**************@lists.sourceforge.net> or peek at the
> implementation. It's possible that all FFTs are done double precision,
> but I don't think so.
Looks like the numarray.fft package uses doubles.
If you really need floats, SciPy wraps the single- and
double-precision versions of FFTW. (Although SciPy uses Numeric, not
numarray).
Or, you can make your own version of numarray.fft using floats
(it actually looks relatively simple to do).
Great, that cleared up the discrepancy between the source and the code
I'm translating. I think I've tried all the libraries, though:
numarray.fft, numarray.fftpack, and scipy's fft module. I've even got
the install script for scipy to tell me it found the fftw modules, so
I'm pretty sure it was using them, but I'm getting the same speeds for
each module... and the code _should_ be about 10x faster (compared to
C/Fortran/etc.). They really should better document exactly how to get
fast FFTs up and running under these number packages.