473,382 Members | 1,376 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

Counter Intuitive Results: Optimising an FFT routine for Speed

Thanks for the replies Tristan, Eric, Steven & Kurt. They have given
me some good leads.

I present justification for a lot of the comments that drew
(constructive) criticism below. Firstly, let me summarise the feedback
that has pointed me in the right direction :-
Steven 20.13: What's the best way of making my program efficient?
A: By picking good algorithms and implementing them carefully.


<hand up> Count me in for that, there was no intention or suggestion
of micro-optimisation or use of inline assembler in my post. What I am
trying to find out is what algorithm is effective and what C syntax is
more efficient, albeit only on the small Turbo-C development system
that I have.
</hand up>
http://www.eskimo.com/~scs/C-faq/top.html Thanks, I am busy downloading the whole list by sequentially stepping
through all the URLs at this site.
The biggest speed gains in an FFT (and most algorithms) come from much higher-
level transformations. For example, our FFTW (www.fftw.org) ...
I see that I previously downloaded
FFTW 3.0.1 is the latest official version of FFTW

a while back, I am overwhelmed with volume :
458 *.c files (2.9MB)
42 *.h files

Could take me a while to assimilate all of this.

you should use a different routine than clock() I thought I did quite well on the Athlon /Win98, synchronising on the
20ms ticks and counting in between I could improve resolution to
600ns. Still dependent on the system time sharing on the CPU though.
The best resolution comes from reading the CPU cycle counter directly (see
www.fftw.org/download.html for code to do this on many CPUs). Yes, I knew there was something called the "software stopwatch",
hopefully I will come right with CYCLE.H.
I can't go and peep at this URL at the mo' as my C-FAQ download is
still stepping through each answer page. CYCLE.H has quite a bit of
code to it, I suppose I just chuck out most of the CPU specific code
and keep

_( Pentium cycle counter)
typedef unsigned long long ticks;
static __inline__ ticks getticks(void) {
ticks ret;
__asm__ __volatile__("rdtsc": "=A" (ret));
return ret; }
Tristan I followed my usual method of
QUERY GOOGLE -> READ -> FILTER -> POST

and using this I selected the four newsgroups that had hits on
C Speed Optimisation FFT
comp.lang.oberon Subject: Re: Optimizing array bounds checking
Newsgroups: comp.lang.oberon
comp.lang.functional Subject: Re: Academia isn't interested...
Newsgroups: comp.lang.functional
.....
But what you *can* do is generate low-level code from high-level
specifications. This has already been well-demonstrated in e.g. the
FFTW
(written in OCaml), which generates extremely efficient C-algorithms
for the FFT from specifications and specialises them for the given
architecture. This, however, is high-level programming again, not
low-level programming. There is no escape from imperativeness if you
work with von-Neumann architectures on a low level.

Eric You'd have to examine the actual generated code to have a hope
of figuring out what's really happening in any particular case. Don't want to do that, anyway my solution has to be fairly portable,
from me (design engineer) it goes to software engineer, rebuilt under
Visual C++, and then targetted to TMS320 (or equiv) DSP.
Steven You realize, of course, that your first mistake was starting with the
radix-2 Numerical Recipes in C code and spending your time attempting
micro-optimizations. The biggest speed gains in an FFT (and most
algorithms) come from much higher-level transformations. Actually I started with someone else's Fortran->C implementation of an
algorithm that looks like it came from the Cooley-Tukey paper. Then I
stepped to the Numerical Recipes (C) algorithm (Danielson - Lanczos?)
(which by the way still carries the legacy of Fortran array indexing)
This took -72% off the time.

I have refined this in about five major steps, so that at the point of
posting I was
-23% off the time of the NumRep routine as published (only high
level tweaks).

I will perform my last trial of converting the original code to double
without changing the indexing, just to see what I would have got,
before ...

Now, however, I will trawl through all the source in FFTW and see what
I can find.

If you just want a fast FFT and are not doing this as a learning experience, you'll
be much better off downloading an existing optimized code. Even though I will probably do better with an FFTW routine, the
lessons learned will be valuable for other pieces of time critical
code.
A decent compiler should transform array references a[j] in a loop
into separate incremented pointers for you, if it's advantageous to do
so. By being clever you can easily just confuse the compiler's
optimizer unless you know what you're doing. The two sets of indices do not change sequentially, which is why I
hoped that me forcing the use of pointers would be faster.

Kurt

I've done the above many a time. The only way I've been able to cope is
by learning the instruction set and reading the (equivalent) assembly code.
It's pretty easy to tell when you've confused the optimizer.

This would be interesting given unlimited time, I do know that the DSP
code was definitely hand tweaked in the past as portability and
maintainability were far lower priority than performance.

Regards,
Paul
Nov 13 '05 #1
4 3485
Paul Brown wrote:
Don't want to do that, anyway my solution has to be fairly portable,
from me (design engineer) it goes to software engineer, rebuilt under
Visual C++, and then targetted to TMS320 (or equiv) DSP.


Whoa, whoa, hold on, you want this to be fast on a DSP chip? But you're
planning to benchmark and optimize it on a general-purpose CPU?

Sigh...

Nov 13 '05 #2
On Tue, 07 Oct 2003 13:43:11 -0400
"Steven G. Johnson" <st*****@alum.mit.edu> wrote:
Paul Brown wrote:
Don't want to do that, anyway my solution has to be fairly portable,
from me (design engineer) it goes to software engineer, rebuilt
under Visual C++, and then targetted to TMS320 (or equiv) DSP.


Whoa, whoa, hold on, you want this to be fast on a DSP chip? But
you're planning to benchmark and optimize it on a general-purpose CPU?

Sigh...


Optimising for the TMS320 processors (they vary a *lot* BTW) is *way* OT
for this group since on the ones I've used there are all sorts of trick
you can play in assembler that you can't (and the compiler I used did
not) use.

The OP should find a group or mailing list specific to the processor he
wants to use and probably obtain a library written in assembler to make
full use of the facilities the processor has. Also read the processor
manual, ISTR the TMS320C1x/2x/5x books had example FFT code...
--
Mark Gordon
Paid to be a Geek & a Senior Software Developer
Although my email address says spamtrap, it is real and I read it.
Nov 13 '05 #3
po********@pwi.mailshell.com (Paul Brown) writes:
[...]
http://www.eskimo.com/~scs/C-faq/top.html

Thanks, I am busy downloading the whole list by sequentially stepping
through all the URLs at this site.


The "other versions" link, <http://www.eskimo.com/~scs/C-faq/versions.html>,
points you to a compressed plain-text copy of the whole thing at
<ftp://ftp.eskimo.com/u/s/scs/C-faq/faq.Z>. You can uncompress it
with "uncompress" or "gunzip". Apparently it's the most up-to-date
version available.

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 13 '05 #4
In article <34**************************@posting.google.com >,
po********@pwi.mailshell.com says...
The best resolution comes from reading the CPU cycle counter directly (see
www.fftw.org/download.html for code to do this on many CPUs).

Yes, I knew there was something called the "software stopwatch",
hopefully I will come right with CYCLE.H.
I can't go and peep at this URL at the mo' as my C-FAQ download is
still stepping through each answer page. CYCLE.H has quite a bit of
code to it, I suppose I just chuck out most of the CPU specific code
and keep

_( Pentium cycle counter)
typedef unsigned long long ticks;
static __inline__ ticks getticks(void) {
ticks ret;
__asm__ __volatile__("rdtsc": "=A" (ret));
return ret; }


This, and most of this thread is OT here, but since you've already
posted this, I don't want others to be bit by this in the future.

Danger: The above is TOTALLY UNRELIABLE on SMP systems. Getting
dependent upon this will bite you, as even notebook computers
are now available with dual CPUs. There is no synchronization
between the TSCs on the individual CPUs. So, you are as likely
as not to get bogus time values if your process gets bounced
between processors.

--
Randy Howard _o
2reply remove FOOBAR \<,
______________________()/ ()______________________________________________
SCO Spam-magnet: po********@sco.com
Nov 13 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Ben Barnes | last post by:
Hi, Having an issue with a pretty standard javascript text counter. The counter works fine when a user physically enters text into a textbox but shoudl the textbox have a value prefilled then...
6
by: Who.Really.Really.Cares | last post by:
Hi! I guess this must be a FAQ but I'll give it a try. I've searched the web and usenet archive and found only negative answers. But most of them were dated like 3-4 years back. Hasn't anything...
0
by: Earl Anderson | last post by:
KB Article Q140908 provided the following function to create an Auto Incrementing Counter: Function Next_Custom_Counter () On Error GoTo Next_Custom_Counter_Err Dim MyDB As Database Dim...
1
by: Suffrinmick | last post by:
Hello Everyone I've built a database using Access 2000 which includes a query which is built using a form containing filters. No problem. When I export the results of the query to excel, (File >...
7
by: Paul Brown | last post by:
I am confused - I have gone through a two day exercise tuning up an FFT routine, and at the last stage I find that where the biggest gains were expected, all my conventional understanding of...
45
by: OcelotIguana | last post by:
Hello, Does anyone have any suggestions for where to find a good sincos routine (i.e. a routine that calculates both the sin and cos of a given argument) written in c? Thanks, ...
8
by: masood.iqbal | last post by:
All this time I was under the illusion that I understand the concept of multi-dimensional arrays well ---- however the following code snippet belies my understanding. I had assumed all along...
74
by: aruna.mysore | last post by:
Hi all, I have a simple definitioin in a C file something like this. main() { char a; ....... int k; }
1
by: zenfor | last post by:
Hi, I have a simple script I found that increments a number in a text file. I was wondering if anyone has a routine that will commify the number. Thank you! =============== <% counter_file...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.