Dear all,
Can C++/STL/Boost do vectorized calculations like those in Matlab?
For example, in the following code, what I really want to do is to
send in a vector of u's.
All other parameters such as t, l1, l2, l3, etc. are scalars...
But u is a vector.
Thus, t6 becomes a vector.
t9 is an elementwise multiplication...
The following code was actually converted from Matlab.
If vectorized computation is not facilitated, then I have to call this
function millions of times.
But if vectorized computation is okay, then I can send in just a u
vector with a batch of elements at a time.
I have a lot of such Matlab code that needs to be converted into C++
with vectorization.
Any thoughts?
Thank you!
double t5, t6, t7, t9, t11, t13, t16, t20, t23, t27, t32, t34, t36,
       t37, t38, t42, t44, t47, t48, t51, t52, t54, t59, t60, t61,
       t66, t67, t69, t74, t75, t76, t81, t82, t84, t87, t105, t106,
       t110, t112;
t5 = exp(t * l1 - t * l2 - t * l3);
t6 = t * u;
t7 = mu1 * mu1;
t9 = u * u;
t11 = kappa * kappa;
t13 = 0.1e1 / (t9 * t7 + t11);
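For what it's worth, here is one way that fragment could be batched in plain C++. This is a sketch only: the function name is mine, std::vector<double> is assumed for u, and the minus signs inside exp() are a guess at the garbled operators. Everything that depends only on scalars is hoisted out of the loop; only the u-dependent temporaries are computed per element.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Batched version of the fragment above (sketch). Parameter names
// (t, l1, l2, l3, mu1, kappa) come from the post; the signs in exp()
// are an assumption. Scalar temporaries are computed once; u-dependent
// ones are computed element-wise in a single loop.
std::vector<double> eval_t13(const std::vector<double>& u,
                             double t, double l1, double l2, double l3,
                             double mu1, double kappa)
{
    const double t5  = std::exp(t * l1 - t * l2 - t * l3); // scalar
    const double t7  = mu1 * mu1;                          // scalar
    const double t11 = kappa * kappa;                      // scalar
    (void)t5; // not used in t13; shown only to mirror the fragment

    std::vector<double> t13(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) {
        const double t9 = u[i] * u[i];     // element-wise square (Matlab u.*u)
        t13[i] = 0.1e1 / (t9 * t7 + t11);  // element-wise result
    }
    return t13;
}
```

Calling this once with a batch of u values replaces millions of scalar calls, which is usually the bulk of the win.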
<snip>
Hi.
I think Matlab provides a C++ API. Have you checked it out? There's
also the Matrix Template Library for general algebra computations. You
might find it useful.

Leandro T. C. Melo
I don't think Matlab's C++ API can do that. I think it is just a C
interface. It does not have STL, Boost etc.
Also, we are not talking about things as complicated as high-speed
matrix computation; it's just vectorized computation...
On Jul 22, 11:00 am, Leandro Melo <ltcm...@gmail.com> wrote:
<snip>
On Tue, 22 Jul 2008 07:08:38 -0700, Luna Moon wrote:
Dear all,
Can C++/STL/Boost do vectorized calculations like those in Matlab?
What exactly do you mean by "vectorized calculations like those in Matlab"?
Do you just mean that Matlab has a native vector type and does
calculations with it, or were you suggesting that Matlab processes
vectors in some special way that C++ cannot?
Matlab, AFAIK, does a lot of its matrix/vector arithmetic, such as dot
products and matrix-matrix or matrix-vector multiplication, using a BLAS
library, i.e. highly optimised linear algebra code (generally written
in Fortran), which is accessible via C++, since there is a well-defined
interface between C++ (C, really) and Fortran. There is a good chance you
will already have a BLAS library on your system; if not, there are open
source versions (e.g. the ATLAS project) as well as vendor-supplied ones
(e.g. Intel, AMD, etc. supply BLAS libraries).
It is possible that Matlab will also make use of very machine-specific
optimisations such as SSE/MMX for floating point computation. You can use
these too from C++ if you can persuade your compiler to play ball.
The bottom line is that there's nothing Matlab can do that you can't do
in C++, equally (if not more) efficiently. It's more a question of
convenience: Matlab is designed specifically for vector/matrix
manipulation, while C++ is a general-purpose programming language.
For example, in the following code, what I really want to do is to send
in a vector of u's.
All other parameters such as t, l1, l2, l3, etc. are scalars...
But u is a vector.
Thus, t6 becomes a vector.
t9 is an elementwise multiplication...
The following code was actually converted from Matlab.
If vectorized computation is not facilitated, then I have to call this
function millions of times.
But if vectorized computation is okay, then I can send in just a u
vector with a batch of elements at a time.
I'm really not quite sure what you mean here.
The closest thing in C++ to a Matlab vector is probably the
std::valarray<double> class, although it seems a bit of a bodge and hence
rather unpopular. The std::vector<double> class will probably do you
quite well; it doesn't implement functionality such as element-wise
multiplication, so you will have to do that yourself, but that's pretty
simple.
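To illustrate that last point (my sketch, not part of the original post): element-wise multiplication over std::vector is a one-liner with std::transform.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Element-wise product c = a .* b (Matlab notation) over std::vector.
// Assumes a and b have the same length.
std::vector<double> elementwise_mul(const std::vector<double>& a,
                                    const std::vector<double>& b)
{
    std::vector<double> c(a.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                   std::multiplies<double>()); // c[i] = a[i] * b[i]
    return c;
}
```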
There are also various matrix/vector C++ libraries knocking around (e.g.
Blitz++) that you might want to look at.
In terms of efficiency, if you are doing a lot of large matrix
multiplications or more sophisticated linear algebra a la Matlab, then
you might want to investigate the BLAS and possibly LAPACK (Linear
Algebra Package), but I suspect that might be overkill in your case. And
it is ugly.
FWIW, I recently ported a lot of Matlab code to C++ and have to say that
C++ generally kicks Matlab's a*se in terms of efficiency, but not in
ease of coding (Matlab appears to suffer performance-wise from a lot of
internal copying which you can eliminate in hand-coded C++).
I have a lot of such Matlab code that needs to be converted into C++
with vectorization.
Any thoughts?
Thank you!
<snip code>

Lionel B
Luna Moon wrote:
Dear all,
Can C++/STL/Boost do vectorized calculations like those in Matlab?
I don't know what Boost has in the field of matrix & vector
computations, but standard C++ does not have anything even remotely
resembling the capabilities of Matlab.
The closest you can get with standard C++ is to use std::valarray<>,
which was intended to facilitate computations that can potentially be
executed in parallel.
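For the record, std::valarray does give you Matlab-style element-wise expressions out of the box. A minimal sketch (the function name and values are mine), using the same shape as the original poster's t9/t13 computation:

```cpp
#include <valarray>

// std::valarray overloads arithmetic operators element-wise, so
// t9 = u .* u and t13 = 1 ./ (t9*t7 + t11) translate almost verbatim.
std::valarray<double> recip_quad(const std::valarray<double>& u,
                                 double t7, double t11)
{
    std::valarray<double> t9    = u * u;         // element-wise square
    std::valarray<double> denom = t9 * t7 + t11; // element-wise, scalar ops broadcast
    return 1.0 / denom;                          // element-wise reciprocal
}
```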
Bart v Ingen Schenau

a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://c-faq.com/
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/
On 22 Jul, 18:37, Lionel B <m...@privacy.net> wrote:
On Tue, 22 Jul 2008 07:08:38 -0700, Luna Moon wrote:
Dear all,
Can C++/STL/Boost do vectorized calculations like those in Matlab?
What exactly do you mean by "vectorized calculations like those in Matlab"?
Do you just mean that Matlab has a native vector type and does
calculations with it, or were you suggesting that Matlab processes
vectors in some special way that C++ cannot?
It is a common misconception amongst matlab users that there is
something special about vectors. Matlab has historically been
very slow when executing explicit for-loops and while-loops etc.
The 'standard' matlab way to deal with this is to bypass the
interpreter and call compiled code, often from BLAS or LAPACK,
by 'vectorizing' the matlab code. I commented on that just a
few days ago on comp.soft-sys.matlab: http://groups.google.no/group/comp.s...o&dmode=source
The problem is that users who only know matlab and no other
programming languages are conditioned to believe that the problem
lies with for-loops as such, and not with matlab.
Rune
On 23 Jul, 11:26, Lionel B <m...@privacy.net> wrote:
On Tue, 22 Jul 2008 10:16:08 -0700, Rune Allnor wrote:
On 22 Jul, 18:37, Lionel B <m...@privacy.net> wrote:
On Tue, 22 Jul 2008 07:08:38 -0700, Luna Moon wrote:
Dear all,
Can C++/STL/Boost do vectorized calculations like those in Matlab?
What exactly do you mean by "vectorized calculations like those in
Matlab"? Do you just mean that Matlab has a native vector type and does
calculations with it, or were you suggesting that Matlab processes
vectors in some special way that C++ cannot?
It is a common misconception amongst matlab users that there is
something special about vectors. Matlab has historically been very slow
when executing explicit for-loops and while-loops etc. The 'standard'
matlab way to deal with this is to bypass the interpreter and call
compiled code, often from BLAS or LAPACK, by 'vectorizing' the matlab
code. I commented on that just a few days ago on comp.soft-sys.matlab:
Indeed. And if you look inside any BLAS or LAPACK you'll see... loops.
Exactly. I attended a conference on underwater acoustics many years
ago, where one of the presentations dealt with 'efficient computation.'
In effect, the matlab code was rewritten from readable code (i.e.
for-loops) to 'vectorized' matlab code. That presentation was, in fact,
the inspiration for making the test I pointed to yesterday.
Rune
On 22 Jul., 16:08, Luna Moon <lunamoonm...@gmail.com> wrote:
<snip>
Why do you want that? Is it because the code is easier to read, or do
you hope to get better performance?
If it is for performance: writing native loops in C/C++ with some
optimization flags will give you the best performance in most cases.
Sometimes an optimized BLAS like ATLAS will improve performance further.
Vectorized code in Matlab is faster than looping code, because in the
latter the loops are interpreted, which slows things down. Internally
Matlab works as described above.
Greetings, Uwe
On Jul 22, 10:08 pm, Luna Moon <lunamoonm...@gmail.com> wrote:
<snip>
GSL, GNU Octave, Boost
Luna Moon <lu**********@gmail.com> wrote:
<snip>
If vectorized computation is not facilitated, then I have to call this
function millions of times.
But if vectorized computation is okay, then I can send in just a u
vector with a batch of elements at a time.
I have a lot of such Matlab code that needs to be converted into C++
with vectorization.
Any thoughts?
Thank you!
<snip code>
This is out of my element, but I have written some vector code in GCC, just
for my own edification. I'm not sure how you would translate the above
code... I don't exactly see the vectors, unless I'm misinterpreting. Anyhow:
http://gcc.gnu.org/onlinedocs/gcc4....torExtensions
It's exceedingly simple. ICC and MSC support similar, though incompatible,
syntax. With GCC, make sure to manually specify -march, otherwise the
generator won't have access to SSE instructions (or whatever your platform
has).
The biggest initial syntax gotcha I encountered was initializing the vector;
the vector can be treated like an array, IIRC, but I encountered some
hangups. Second, GCC has to load all the SSE registers, and other sorcery I
wasn't acquainted with. It just seems like there'd be lots of headaches
keeping the pipeline chugging along, depending on your data set and where
it comes from.
Also, I have no idea if the syntax carries over to C++. And perhaps GCC can
already accomplish similar optimizations with valarrays. Either way, you'll
definitely want to use the latest GCC 4.3 version, which AFAIK is at the
moment king of the hill regarding auto-vectorization.
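A minimal sketch of the vector extension being described (my example; the typedef name is mine). It does work in C++ with GCC and Clang, which answer the "carries over" question for those two compilers at least:

```cpp
// GCC/Clang "vector extension": a fixed-width SIMD type whose
// arithmetic operators act element-wise. Compile with e.g.
// g++ -O2 -march=native for the compiler to pick real SIMD registers.
typedef double v4df __attribute__((vector_size(32))); // 4 doubles

v4df mul4(v4df a, v4df b)
{
    return a * b; // element-wise multiply, mapped to SIMD where available
}
```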
In article <6d**********************************@x41g2000hsb.googlegroups.com>,
Rune Allnor <al****@tele.ntnu.no> wrote:
>On 6 Aug, 14:54, Giovanni Gherdovich <gherdov...@students.math.unifi.it> wrote:
>>I'm aware of Matlab "vectorization" techniques; I use them to avoid for-loops.
>That's a *matlab* problem. 'Vectorization' is a concept exclusive to matlab,
>which historically was caused by what I consider to be bugs in the matlab
>interpreter.
To describe this property of matlab as "buggy" is inordinately harsh.
It's an inherent property of interpreters: If you're interpreting a
loop, you have to look at the loop condition code, and the loop
bookkeeping code, and the code inside the loop, every time through.
Unless you go out of your way to make this fast, you end up having to
do a lookup-decode-process for each of those steps.
Compiling to native code lets you do the lookup-decode at compile time,
and for typical loops only generates a few machine-code instructions
for the loop bookkeeping and condition checking, which substantially
reduces the total amount of work the processor is doing. But making an
interpreter clever enough to do interpreted loops that fast is a Much
Harder Problem.
(So, the answer to the OP's question is (as already noted): Don't worry
about vectorizing, write loops and ask the compiler to optimize it, and
you'll probably come close enough to Matlab's performance that you
won't be able to tell the difference.)
Since Matlab is targeting numerical work with large arrays anyways,
there's not much benefit to speeding up this part of the interpreter;
if the program is spending most of its time inside the large-matrix
code (which is compiled to native code, aggressively optimized by the
compiler, and probably hand-tuned for speed), then speeding up the
interpreter's handling of the loop won't gain you any noticeable
speedup anyways. If you're writing loopy code to do things Matlab has
primitives for, you're probably better off vectorizing it anyways,
since that will make it both clearer and faster.
So (unlike with generalpurpose interpreted languages that don't have
primitives that replace common loop idioms) there's no real benefit to
speeding up the Matlab interpreter's loop handling, and there are
obvious costs (development time, increased complexity, more potential
for bugs), so there are good reasons not to bother.
If you do have code that doesn't fit Matlab's vectorization model, you
can always write it in C or Fortran and wrap it up in a Matlab FFI
wrapper; Matlab's FFI is not hard to use on the compiled-to-native-code
side, and looks exactly like a Matlab function on the Matlab code side,
so it's almost always the Right Tool For The Job in that case.
(At my day job, I've been asked to do this for the Matlab programmers a
few times, and for hard-to-vectorize loopy code getting a speedup of
two or three orders of magnitude just by doing a reasonably direct
translation into C and compiling to native code with an optimizing
compiler is pretty much expected.)
dave

Dave Vandervies dj3vande at eskimo dot com
Erm... wouldn't clock(), used with Bill Godfrey's followup, ignoring my
followup to him (as suggested in your followup to me), do the trick
quite nicely? Joona I Palaste in comp.lang.c
Hello,
thank you for your answers.
Rune Allnor:
Depending on exactly what you do, matlab *can* get very
close to best-possible performance, since it uses highly
tuned low-level libraries. If your operation is covered by
such a function, you might find it difficult to beat matlab.
If not, don't be surprised if C++ code beats matlab by
a factor of 5-10 or more.
dave:
(So, the answer to the OP's question is (as already noted): Don't worry
about vectorizing, write loops and ask the compiler to optimize it, and
you'll probably come close enough to Matlab's performance that you
won't be able to tell the difference.)
Uwe:
I mean: taking the inverse of a matrix is linear algebra,
but multiplying two vectors componentwise is just... multiplying.
Yes, but you can do some unrolling or other
access patterns for optimizing cache access.
This is a broad field; look at: http://en.wikipedia.org/wiki/Loop_transformation
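One classic transformation from that page, as an illustration of mine: loop interchange, keeping the contiguous index in the inner loop when walking a row-major matrix.

```cpp
#include <cstddef>
#include <vector>

// Row-major N x N matrix stored flat. With j (the contiguous index)
// innermost, memory is visited sequentially; the interchanged order
// strides by n on every access and thrashes the cache for large n.
// Both return the same sum -- only the access pattern differs.
double sum_fast(const std::vector<double>& m, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j]; // contiguous inner loop
    return s;
}

double sum_slow(const std::vector<double>& m, std::size_t n)
{
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j]; // stride-n inner loop
    return s;
}
```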
Rune Allnor:
You would be surprised: there are for-loops at the core of all
those libraries, even the BLAS libraries matlab is based on.
There are smart compilation techniques involved, but to *optimize*
the for-loops, not to *eliminate* them.
I was among the users who are "conditioned to believe that the
problem lies with for-loops as such, and not with matlab",
to use Rune's words.
Thank you all for pointing it out.
About the performance of numerical computation done using
std::valarray<>'s features:
Uwe:
My question:
When writing C++ code, do you think I can have faster code
if I use std::valarray<> in "the Matlab way", instead
of using, say, std::vector<> and for-loops?
I do not know how optimized valarray<> is. You
should compare it using different matrix/vector sizes
and different optimization flags
of your compiler, and post your results.
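A rough harness for that comparison (my sketch; the function name, sizes and fill values are made up, and real timings depend heavily on the optimization flags Uwe mentions):

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <valarray>
#include <vector>

// Times t13 = 1/(u*u*t7 + t11) computed via std::valarray expressions
// versus a hand-written loop over std::vector. Prints both timings and
// returns true if the two give the same results.
bool bench(std::size_t n, double t7, double t11)
{
    std::valarray<double> uv(1.5, n);          // n copies of 1.5
    std::vector<double>   us(n, 1.5), out(n);

    auto c0 = std::chrono::steady_clock::now();
    std::valarray<double> denom = uv * uv * t7 + t11; // element-wise
    std::valarray<double> r1 = 1.0 / denom;           // element-wise
    auto c1 = std::chrono::steady_clock::now();

    for (std::size_t i = 0; i < n; ++i)               // hand-written loop
        out[i] = 1.0 / (us[i] * us[i] * t7 + t11);
    auto c2 = std::chrono::steady_clock::now();

    std::printf("valarray: %lld us, loop: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(c1 - c0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(c2 - c1).count());

    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(r1[i] - out[i]) > 1e-15) return false;
    return true;
}
```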
Rune Allnor:
I don't know. I haven't used std::valarray<>. I know I have
seen some comment somewhere that std::valarray<> was an early
attempt at a standardized way to handle number-crunching in C++,
which was, well, not quite as successful as one might have
wished for.
It seems that nobody knows if it's worth using std::valarray<>
and related "vectorized" operators (provided by the standard
library) to do numerical computing in C++.
Googling this topic, I've found an interesting thread in
a forum of a site called "www.velocityreviews.com": http://www.velocityreviews.com/forum...svectors.html
One of the posters, who (like me) took the chapter "Vector Arithmetic"
in Stroustrup's book as The Truth, says that with valarray<>
you can do math at the speed of light, blah blah optimization,
blah blah vectorization, and so on.
Another user answers with what I find a more reasonable argument:
std::valarray<> was designed to meet the characteristics of vector
machines, like the Cray. If you don't have a Cray, there is
no point in doing math with valarray<> and related operators.
Anyway, as soon as I have some spare time I will check it on
my own, comparing the results with ATLAS as Uwe suggests.
Regards,
Giovanni Gherdovich
In article <a47fac46bb334608bd97f7**********@d77g2000hsb.googlegroups.com>, gh********@students.math.unifi.it says...
[ ... ]
Another user answers with what I find a more reasonable argument:
std::valarray<> was designed to meet the characteristics of vector
machines, like the Cray. If you don't have a Cray, there is
no point in doing math with valarray<> and related operators.
In theory that's right: the basic idea was to provide something that
could be implemented quite efficiently on vector machines. In fact, I've
never heard of anybody optimizing the code for a vector machine, so it
may be open to question whether it provides any real advantage on them.
OTOH, valarray _can_ make some code quite readable, so it's not always a
complete loss anyway.

Later,
Jerry.
The universe is a figment of its own imagination.
Jerry Coffin wrote:
In article <a47fac46bb334608bd97f7**********@d77g2000hsb.googlegroups.com>, gh********@students.math.unifi.it says...
[ ... ]
>Another user answers with what I find a more reasonable argument: std::valarray<> was designed to meet the characteristics of vector machines, like the Cray. If you don't have a Cray, there is no point in doing math with valarray<> and related operators.
In theory that's right: the basic idea was to provide something that
could be implemented quite efficiently on vector machines. In fact, I've
never heard of anybody optimizing the code for a vector machine, so it
may be open to question whether it provides any real advantage on them.
During the mid-90s both C and C++ were involved in adding features that
would support numerically intense programming. Unfortunately, a couple of
years later the companies whose numerical experts were doing the grunt
work withdrew support. In hindsight it might have been better to have
shelved the work, but both WG14 and WG21 opted to continue, hoping that
they would still produce something useful.
It is not obvious to those outside the numerical fields that actually
developing things like complex-number support, support for array
arithmetic, etc. is fraught with subtle traps.
OTOH, valarray _can_ make some code quite readable, so it's not always a
complete loss anyway.

Note that robinton.demon.co.uk addresses are no longer valid.
Hello,
During the mid-90s both C and C++ were involved in adding features that
would support numerically intense programming. Unfortunately, a couple of
years later the companies whose numerical experts were doing the grunt
work withdrew support.
Just for the sake of historical investigation, I found a thread on
this newsgroup from the distant 1991, where Walter Bright (who might
be the same Walter Bright who designed the D programming language,
http://www.walterbright.com/ , http://en.wikipedia.org/wiki/Walter_Bright ,
but I'm not sure) lists some shortcomings for the C++ numerical
programmer, and item #6 is
"Optimization of array operations is inhibited by the 'aliasing'
problems." http://en.wikipedia.org/wiki/Aliasing_(computing)
(retrieved from http://groups.google.com/group/comp....b0ec8ea7b24189 )
Then he mentions some solutions to this (two libraries, which
might be completely out of date nowadays).
Just to say that the Original Poster isn't the first to
address this issue...
In hindsight it might have been better to have shelved
the work, but both WG14 and WG21 opted to continue, hoping that they
would still produce something useful.
Mmmh... I skimmed the pages of Working Groups 14 and 21,
http://www.open-std.org/jtc1/sc22/wg14 and http://www.open-std.org/jtc1/sc22/wg21 ,
and they don't seem to have vector arithmetic among their priorities.
Anyway, from what I've learned from this thread, the whole topic
may well be moot, given the characteristics of modern CPUs.
Regards,
Giovanni Gh.