Memory management for a large dataset in Python

Hi Everyone,

I am working on a Latin dataset and need to read the data from ~30,000 files. My program opens and reads each file, writes its contents to one separate file (say Master.nc), and then closes the individual file. Master.nc, however, is not closed until the last file has been read.

At the end I need Master.nc to contain the contents of all ~30,000 files.

My program runs fine on a small dataset (up to 900 files). Once the number of files grows beyond 900, the program gets stuck and does not perform the processing I want.

A file as large as Master.nc is difficult to handle in this situation. Please advise me on how to handle it. I need a single Master.nc file at the end because I have to use it for training.

Please help me with this.
Thanks a lot

Mar 26 '12 #1


dwblas


You should open the output file once, then open, read, write, and close each input file before moving on to the next one. I don't understand what is meant by

But Master.nc will not close unless to read the last file.

since you close it only after all the files have been processed.


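# Open the combined output once and keep it open for the whole run.
with open(combined_file, "w") as output:
    for fname in list_of_30000:
        # Each input file is closed before the next one is opened.
        with open(fname, "r") as fp:
            for rec in fp:
                output.write(rec)
# The output file is closed here, after all inputs are processed.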
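
A variant of the loop above, as a minimal sketch rather than a definitive version: it copies each input in fixed-size chunks with shutil.copyfileobj, so memory use should stay flat however large the files are, and it prints a progress line so that a stall is visible. It assumes Python 3. The names merge_files, report_every, and chunk_size are hypothetical, and like the loop above it only concatenates file contents.

```python
import shutil


def merge_files(input_paths, output_path, report_every=100, chunk_size=1024 * 1024):
    """Append every input file to output_path; return the number of files merged.

    Hypothetical helper for illustration; it concatenates raw bytes only.
    """
    count = 0
    # Binary mode copies the bytes unchanged, whatever the files contain.
    with open(output_path, "wb") as output:
        for path in input_paths:
            # Only one input file is open at any time.
            with open(path, "rb") as fp:
                shutil.copyfileobj(fp, output, chunk_size)
            count += 1
            if count % report_every == 0:
                # Flushed progress output shows how far the run got.
                print("merged %d files" % count, flush=True)
    return count
```

If the progress output stops at the same count on every run, the file at that position in the list is a reasonable place to start looking.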

Mar 26 '12 #2

Saad Bin Ahmed

Actually, I have to read each file and save its contents in one separate netCDF file named Master.nc. That means the contents of all the files will be written to a single file, Master.nc. At the end I will have one file containing the contents of all the files (in my case ~30,000 files).

Mar 26 '12 #3

Saad Bin Ahmed

I currently read, write, and close every input file, but Master.nc remains open until all 30,000 files have been read, written, and closed.

Mar 26 '12 #4

dwblas


That is correct. Also, are you sure that you are not running out of disk space? The copy may require twice the disk space that the 30,000 files occupy (the originals plus the combined copy). You will have to post your code for any more detailed assistance.
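
The disk-space question can be checked before the copy starts. A minimal sketch, assuming Python 3.3 or later (for shutil.disk_usage); the function name enough_space_for_copy and its parameters are hypothetical, and it only compares the total input size with the free space on the output drive:

```python
import os
import shutil


def enough_space_for_copy(input_paths, output_dir):
    """Return True if output_dir has room for a full copy of the inputs.

    Hypothetical helper for illustration only.
    """
    # Total size of all input files, in bytes.
    needed = sum(os.path.getsize(path) for path in input_paths)
    # Free space on the file system that holds output_dir.
    free = shutil.disk_usage(output_dir).free
    print("need %d bytes, %d bytes free" % (needed, free))
    return free >= needed
```

If the combined file is stored in a different format from the inputs, its size may differ from the sum of the input sizes, so treat the result as a rough estimate.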

Mar 27 '12 #5

Saad Bin Ahmed

Yes, it seems that I am running out of disk space by doing all of this. Whenever the number of files reaches 900 or more, processing hangs. No error message is shown, but nothing further is processed. I have already tried calling the garbage collector with gc.collect(), but that did not solve the problem either.

Mar 27 '12 #6
