Bytes | Software Development & Data Engineering Community
FFT: Problem with large dataset in memory

Hi,

I am working on a project that deals with a large amount of data. I have
a problem computing the FFT of a very long time trace: a signal with
over 300 million sampling points. After testing on my computer, I
realise that I can hold only 2**27 points in memory, which needs
2 GB of RAM. With an array of 2**28 Double_t points the program
crashes ("segmentation violation"). I tried the CERN ROOT framework, GSL,
and the FFTW3 library; they all need to load the data into memory. So the
question is: is there some mechanism or algorithm to manage the array in
a TTree, or somewhere else on the hard disk, and then load the data step
by step into a cache? Something like a FileArray. Or do you have a
better idea?
This is really urgent. I would be very grateful to hear something from
you!
Thanks!

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Sep 24 '08 #1
Hello!

Simon Lu <lu****@googlemail.com> writes:
[...]
And then load the data step by step into a cache? Something like
a FileArray. Or do you have a better idea? This is really urgent. I
would be very grateful to hear something from you! Thanks!
man mmap

hth,

Christoph


Sep 24 '08 #2
This is really urgent. I am very grateful if I can hear something from
you!
THX!
This is off-topic and unrelated to C++.

You can do the FFT in parts using divide and conquer: transform the
even-indexed elements and the odd-indexed elements separately. If each
half is still too large, divide those FFTs into even and odd halves
again, then recombine.

FFT is a classic divide and conquer algorithm. I think the book
"Numerical Recipes" may be useful here.
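A minimal sketch of that even/odd split, as a textbook radix-2
Cooley-Tukey recursion (in-memory only, and assuming the length is a
power of two; this is an illustration of the structure, not an
out-of-core implementation):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Recursive radix-2 FFT: split into even/odd halves, transform each,
// then combine with the twiddle factors. Length must be a power of two.
std::vector<std::complex<double>> fft(const std::vector<std::complex<double>>& x)
{
    const std::size_t n = x.size();
    if (n == 1) return x;

    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }
    even = fft(even);
    odd  = fft(odd);

    std::vector<std::complex<double>> out(n);
    const double pi = 3.14159265358979323846;
    for (std::size_t k = 0; k < n / 2; ++k) {
        // twiddle factor e^(-2*pi*i*k/n) applied to the odd half
        std::complex<double> t = std::polar(1.0, -2.0 * pi * k / n) * odd[k];
        out[k]         = even[k] + t;
        out[k + n / 2] = even[k] - t;
    }
    return out;
}
```

The point for your problem is that the two half-size transforms are
independent, so each half could live in its own file and be transformed
in a separate pass; only the combine loop needs to stream both halves
together.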

Joe


Sep 24 '08 #3
On 2008-09-24 21:28, Simon Lu wrote:
[...]
There are many ways to do what you want, and the best way depends on
your needs. If you need to operate on the whole dataset at once you have
somewhat of a problem; if you are able to work with just a few data
points at a time you should be able to "stream" the data through your
algorithm and can probably get away with a fairly small memory footprint.
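As a sketch of that streaming style (assuming the samples are raw
doubles in a binary file; the function name and chunk size are just
placeholders):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Stream a big binary file of raw doubles through a per-chunk callback,
// so only one chunk is ever resident in memory at a time.
template <typename Fn>
bool for_each_chunk(const char* path, std::size_t chunk_len, Fn fn)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::vector<double> buf(chunk_len);
    std::size_t got;
    while ((got = std::fread(buf.data(), sizeof(double), chunk_len, f)) > 0)
        fn(buf.data(), got);   // process 'got' samples, then forget them
    std::fclose(f);
    return true;
}
```

This works directly for anything that is a per-sample or windowed
operation; a full-length FFT still needs the recombination passes
discussed elsewhere in this thread.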

None of this is, however, directly topical in this group; it is at too
high a level and has no C++-specific parts. Try asking in a general
programming group, such as comp.programming, instead.

If you are running on Windows you might also be able to increase the
virtual address space available to your application to 3 GB; for more info:
http://www.microsoft.com/whdc/system...AE/PAEmem.mspx

--
Erik Wikström

Sep 24 '08 #4
On Sep 24, 8:28 pm, Simon Lu <luq...@googlemail.com> wrote:
[...]
Write the whole array into a file (binary format), then map the file
into the process with mmap(); you can then use ordinary indexing to
locate the data in the file.
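A sketch of that approach (POSIX only; on Windows the analogue is
CreateFileMapping/MapViewOfFile; assumes the file contains raw doubles):

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a binary file of raw doubles into the address space. The kernel
// pages data in from disk on demand, so the "array" can be far larger
// than physical RAM (given a large enough address space, i.e. 64-bit).
const double* map_doubles(const char* path, std::size_t* count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after closing the descriptor
    if (p == MAP_FAILED) return nullptr;
    *count = static_cast<std::size_t>(st.st_size) / sizeof(double);
    return static_cast<const double*>(p);
}
```

Note that on a 32-bit process the address space itself is the limit, so
for 300 million doubles this really only helps on a 64-bit build.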

Sep 24 '08 #5
On Sep 24, 12:28 pm, Simon Lu <luq...@googlemail.com> wrote:
[...] I realise that I can store only 2**27 points into memory, which
will need 2 GB RAM. With an array of 2**28 Double_t points the program
crashes ("segmentation violation").
Clearly, the most natural solution would be to port your current C++
program to a 64-bit execution environment. A 64-bit memory model would
be able to accommodate vastly larger amounts of data in RAM than is
possible under the program's current, 32-bit memory model.

Now, porting a C++ program to a 64-bit environment might not turn out
to be as straightforward as one might expect. Issues with C++'s integer
conversions and sign extension, in particular, can lead to surprising
results. Apple's 64-bit transition programmer's guide
(http://preview.tinyurl.com/3ns3b5) has some examples.

Greg

Sep 25 '08 #6
Simon Lu wrote:
[...]
Consider rearranging the input file. The FFT algorithm computes the FFT
of the even-indexed data and the odd-indexed data separately and then
combines the results. Now, what if you copied the *even* data points to
an "even" data file and the *odd* points to an "odd" data file and
computed the FFT for each separately? If you review the FFT algorithm
you'll notice that you can then sweep over these partial results to
compute the total result.

You can apply this recursively until you reach a data set size which
fits into memory.
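A sketch of that even/odd file split (assuming raw doubles in a binary
file; the function name and chunk size are just placeholders), which
can be applied recursively until each piece fits in RAM:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Split a binary file of raw doubles into an "even" file (samples
// 0, 2, 4, ...) and an "odd" file (samples 1, 3, 5, ...), buffering
// only one chunk of the input at a time.
bool split_even_odd(const char* in, const char* even_out, const char* odd_out)
{
    std::FILE* fi = std::fopen(in, "rb");
    std::FILE* fe = std::fopen(even_out, "wb");
    std::FILE* fo = std::fopen(odd_out, "wb");
    if (!fi || !fe || !fo) {
        if (fi) std::fclose(fi);
        if (fe) std::fclose(fe);
        if (fo) std::fclose(fo);
        return false;
    }

    std::vector<double> buf(1 << 16);   // 64Ki samples per pass
    std::size_t base = 0, got;
    while ((got = std::fread(buf.data(), sizeof(double), buf.size(), fi)) > 0) {
        for (std::size_t i = 0; i < got; ++i) {
            std::FILE* dst = ((base + i) % 2 == 0) ? fe : fo;
            std::fwrite(&buf[i], sizeof(double), 1, dst);
        }
        base += got;
    }
    std::fclose(fi); std::fclose(fe); std::fclose(fo);
    return true;
}
```

Each output file is half the size of the input, so log2(N/M) levels of
splitting get you down to pieces of size M that fit in memory; the
combine passes then run in the reverse order.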


Sep 25 '08 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

