Memory management for Large dataset in python

Hi Everyone,

I am working on a Latin dataset and need to read the data from ~30,000 files. What I did was open and read each file, write its contents into one separate file (say Master.nc), and then close the individual file. But Master.nc cannot be closed until the last file has been read.

At the end I need a single Master.nc file containing the contents of all ~30,000 files.

My program runs fine for a small dataset (up to about 900 files). Once the number of files grows beyond 900, the program gets stuck and does not perform the desired processing.

Managing a file as large as Master.nc is difficult in this situation. Please guide me on how to handle it; I need one Master.nc file at the end because I have to use it for training.

Please help me in this scenario.
Thanks a lot
Mar 26 '12 #1
dwblas
You should open the output file once, then open, read, write, and close each of the input files before moving on to the next one. I don't understand what is meant by
"Master.nc cannot be closed until the last file has been read"
since you only close it after all the files have been processed.
# Keep the single output file open; open and close each input file in turn.
output = open(combined_file, "w")

for fname in list_of_30000:
    fp = open(fname, "r")
    for rec in fp:
        output.write(rec)
    fp.close()

output.close()
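If the individual files are large, a variant that copies in fixed-size binary chunks avoids any chance of a single huge line being read into memory at once. This is only a sketch of the same loop using shutil.copyfileobj; whether a raw byte copy is appropriate depends on your file format.

# Same loop, but copying in fixed-size binary chunks instead of line by line.
import shutil

output = open(combined_file, "wb")

for fname in list_of_30000:
    fp = open(fname, "rb")
    shutil.copyfileobj(fp, output, 64 * 1024)   # copy in 64 KB chunks
    fp.close()

output.close()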
Mar 26 '12 #2
Actually, I have to read the contents of each file and save them into a separate netCDF file named Master.nc. That means the contents of all files will be written into one file, i.e. Master.nc. At the end I will have a single file containing the contents of all files (in my case ~30,000 files).
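Since Master.nc is a netCDF file, I cannot simply concatenate text; the records have to be appended through a netCDF library. Roughly, this is the pattern I am following. Below is a simplified sketch using the netCDF4 package (this is not my actual code; the variable, dimension, and path names are only placeholders):

# Sketch: append the "data" variable from many netCDF files into Master.nc.
# Only one input file is open at a time; Master.nc grows along an unlimited dimension.
import glob
from netCDF4 import Dataset

input_files = sorted(glob.glob("inputs/*.nc"))      # placeholder location of the ~30,000 files

master = Dataset("Master.nc", "w")
master.createDimension("record", None)              # unlimited, so it can keep growing
out_var = master.createVariable("data", "f4", ("record",))

offset = 0
for fname in input_files:
    src = Dataset(fname, "r")
    values = src.variables["data"][:]               # this file's records
    out_var[offset:offset + len(values)] = values   # append at the end of Master.nc
    offset += len(values)
    src.close()                                     # close each input before the next one

master.close()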
Mar 26 '12 #3
I currently read, write, and close every input file, but Master.nc has to remain open until all 30,000 files have been read, written, and closed.
Mar 26 '12 #4
dwblas
That is correct. Also, are you sure you are not running out of disk space? The copy may require twice the disk space of the 30,000 original files. You will have to post your code for any more detailed assistance.
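You can quickly rule that out by comparing the free space on the output filesystem with the total size of the inputs before starting. A small sketch (POSIX only; the paths are placeholders):

# Sketch: check that the output filesystem has room for the combined copy.
import glob
import os

def free_bytes(path):
    st = os.statvfs(path)                     # POSIX-only free-space query
    return st.f_bavail * st.f_frsize          # bytes available to this user

input_files = glob.glob("inputs/*.nc")        # placeholder location of the inputs
needed = sum(os.path.getsize(f) for f in input_files)
available = free_bytes(".")                   # directory where Master.nc will be written

if available < needed:
    print("Not enough free space: need %d bytes, have %d" % (needed, available))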
Mar 27 '12 #5
Yes, it seems that I am running out of disk space. Whenever the number of files increases to 900 or more, further processing simply hangs. It does not show any error message, but it also makes no progress. I have already tried the garbage collector function gc.collect(), but that did not solve the problem.
Mar 27 '12 #6
