Bytes | Software Development & Data Engineering Community
FFT: Problem with large dataset in memory

Hi,

I am working on a project that deals with a large amount of data. I have
a problem computing the FFT of a very long time trace: a signal with
over 300 million sampling points. After testing on my computer, I
realise that I can hold only 2**27 points in memory, which needs
2 GB of RAM. With an array of 2**28 Double_t points the program
crashes ("segmentation violation"). I tried the CERN ROOT framework, GSL,
and the FFTW3 library; they all need to load the data into memory. So the
question is: is there some mechanism or algorithm to manage the array in
a TTree, or somewhere else on the hard disk, and then load the data step
by step into a cache? Something like a FileArray. Or do you have a
better idea?
This is really urgent. I would be very grateful to hear something from
you!
Thanks!

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Sep 24 '08 #1
Hello!

Simon Lu <lu****@googlemail.com> writes:
[...]
And then load the data step by step into a cache? Something like
a FileArray. Or do you have a better idea? This is really urgent. I
would be very grateful to hear something from you! Thanks!
man mmap

hth,

Christoph


Sep 24 '08 #2
This is really urgent. I am very grateful if I can hear something from
you!
THX!
This is off-topic and unrelated to C++.

You can do the FFT in parts using divide and conquer: transform the
even-indexed elements and the odd-indexed elements separately. If each
half is still too large, divide those FFTs into even and odd halves
again, then recombine.

FFT is a classic divide and conquer algorithm. I think the book
"Numerical Recipes" may be useful here.
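A minimal sketch of that even/odd split, as a textbook radix-2
Cooley-Tukey recursion (in-memory only, and assuming the length is a
power of two; this is an illustration of the structure, not an
out-of-core implementation):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Recursive radix-2 FFT: split into even/odd halves, transform each,
// then combine with the twiddle factors. Length must be a power of two.
std::vector<std::complex<double>> fft(const std::vector<std::complex<double>>& x)
{
    const std::size_t n = x.size();
    if (n == 1) return x;

    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }
    even = fft(even);
    odd  = fft(odd);

    std::vector<std::complex<double>> out(n);
    const double pi = 3.14159265358979323846;
    for (std::size_t k = 0; k < n / 2; ++k) {
        // twiddle factor e^(-2*pi*i*k/n) applied to the odd half
        std::complex<double> t = std::polar(1.0, -2.0 * pi * k / n) * odd[k];
        out[k]         = even[k] + t;
        out[k + n / 2] = even[k] - t;
    }
    return out;
}
```

The point for your problem is that the two half-size transforms are
independent, so each half could live in its own file and be transformed
in a separate pass; only the combine loop needs to stream both halves
together.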

Joe


Sep 24 '08 #3
On 2008-09-24 21:28, Simon Lu wrote:
[...]
There are many ways to do what you want, and the best way depends on
your needs. If you need to operate on the whole dataset at once you have
somewhat of a problem; if you are able to work with just a few data
points at a time you should be able to "stream" the data through your
algorithm and can probably get away with a fairly small memory footprint.
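As a sketch of that streaming style (assuming the samples are raw
doubles in a binary file; the function name and chunk size are just
placeholders):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Stream a big binary file of raw doubles through a per-chunk callback,
// so only one chunk is ever resident in memory at a time.
template <typename Fn>
bool for_each_chunk(const char* path, std::size_t chunk_len, Fn fn)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::vector<double> buf(chunk_len);
    std::size_t got;
    while ((got = std::fread(buf.data(), sizeof(double), chunk_len, f)) > 0)
        fn(buf.data(), got);   // process 'got' samples, then forget them
    std::fclose(f);
    return true;
}
```

This works directly for anything that is a per-sample or windowed
operation; a full-length FFT still needs the recombination passes
discussed elsewhere in this thread.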

None of this is, however, directly topical in this group; it is at too
high a level and has no C++-specific parts. Try asking in a general
programming group, such as comp.programming, instead.

If you are running on Windows you might also be able to increase the
virtual address space available to your application to 3 GB; for more info:
http://www.microsoft.com/whdc/system...AE/PAEmem.mspx

--
Erik Wikström

Sep 24 '08 #4
On Sep 24, 8:28 pm, Simon Lu <luq...@googlemail.com> wrote:
[...]
Write the whole array into a file (binary format), then map the file
into the process with mmap(); you can then use ordinary indexing to
locate the data in the file.
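A sketch of that approach (POSIX only; on Windows the analogue is
CreateFileMapping/MapViewOfFile; assumes the file contains raw doubles):

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a binary file of raw doubles into the address space. The kernel
// pages data in from disk on demand, so the "array" can be far larger
// than physical RAM (given a large enough address space, i.e. 64-bit).
const double* map_doubles(const char* path, std::size_t* count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after closing the descriptor
    if (p == MAP_FAILED) return nullptr;
    *count = static_cast<std::size_t>(st.st_size) / sizeof(double);
    return static_cast<const double*>(p);
}
```

Note that on a 32-bit process the address space itself is the limit, so
for 300 million doubles this really only helps on a 64-bit build.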

Sep 24 '08 #5
On Sep 24, 12:28 pm, Simon Lu <luq...@googlemail.com> wrote:
[...] I realise that I can store only 2**27 points into memory, which
will need 2 GB RAM. With an array of 2**28 Double_t points the program
crashes ("segmentation violation").
Clearly, the most natural solution would be to port your current C++
program to a 64-bit execution environment. A 64-bit memory model would
be able to accommodate vastly larger amounts of data in RAM than is
possible under the program's current, 32-bit memory model.

Now, porting a C++ program to a 64-bit environment might not turn out
to be as straightforward as one might expect. Issues with C++'s integer
conversions and sign extension, in particular, can lead to surprising
results. Apple's 64-bit transition programmer's guide
(http://preview.tinyurl.com/3ns3b5) has some examples.

Greg

Sep 25 '08 #6
Simon Lu wrote:
[...]
Consider rearranging the input file. The FFT algorithm computes the FFT
of the even-indexed data and the odd-indexed data separately and then
combines the results. Now, what if you copied the *even* data points to
an "even" data file and the *odd* points to an "odd" data file and
computed the FFT for each separately? If you review the FFT algorithm
you'll notice that you can then sweep over these partial results to
compute the total result.

You can apply this recursively until you reach a data set size which
fits into memory.
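A sketch of that even/odd file split (assuming raw doubles in a binary
file; the function name and chunk size are just placeholders), which
can be applied recursively until each piece fits in RAM:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Split a binary file of raw doubles into an "even" file (samples
// 0, 2, 4, ...) and an "odd" file (samples 1, 3, 5, ...), buffering
// only one chunk of the input at a time.
bool split_even_odd(const char* in, const char* even_out, const char* odd_out)
{
    std::FILE* fi = std::fopen(in, "rb");
    std::FILE* fe = std::fopen(even_out, "wb");
    std::FILE* fo = std::fopen(odd_out, "wb");
    if (!fi || !fe || !fo) {
        if (fi) std::fclose(fi);
        if (fe) std::fclose(fe);
        if (fo) std::fclose(fo);
        return false;
    }

    std::vector<double> buf(1 << 16);   // 64Ki samples per pass
    std::size_t base = 0, got;
    while ((got = std::fread(buf.data(), sizeof(double), buf.size(), fi)) > 0) {
        for (std::size_t i = 0; i < got; ++i) {
            std::FILE* dst = ((base + i) % 2 == 0) ? fe : fo;
            std::fwrite(&buf[i], sizeof(double), 1, dst);
        }
        base += got;
    }
    std::fclose(fi); std::fclose(fe); std::fclose(fo);
    return true;
}
```

Each output file is half the size of the input, so log2(N/M) levels of
splitting get you down to pieces of size M that fit in memory; the
combine passes then run in the reverse order.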


Sep 25 '08 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

