Memory error due to the huge input file size

Hello Everyone,

I need to read a .csv file that is 2.26 GB in size. I wrote a Python
script that needs to read this file, but my computer has only 2 GB of
RAM. Please see the code below:

"""
This program has been developed to retrieve all the promoter sequences
for the specified
list of genes in the given cluster

So, this program will act as a substitute to the whole EZRetrieve
system

Input arguments:

1) Cluster.txt or DowRatClust161718bwithDummy.txt
2) TransProCrossReferenceAndSequences.csv -This is the file that has
all the promoter sequences
3) -2000
4) 500
"""

import time
import csv
import sys
import linecache
import re
from sets import Set
import gc

print time.localtime()

fileInputHandler = open(sys.argv[1],"r")
line = fileInputHandler.readline()

refSeqIDsinTransPro = []
promoterSequencesinTransPro = []
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)

for data2 in reader2_list:
    refSeqIDsinTransPro.append(data2[3])
for data2 in reader2_list:
    promoterSequencesinTransPro.append(data2[4])

while line:
    l = line.rstrip('\n')
    for j in range(1,len(refSeqIDsinTransPro)):
        found = re.search(l,refSeqIDsinTransPro[j])
        if found:
            """promoterSequencesinTransPro[j] """
            print l

    line = fileInputHandler.readline()

fileInputHandler.close()

The error that I got is given as follows:
Traceback (most recent call last):
  File "RefSeqsToPromoterSequences.py", line 31, in <module>
    reader2_list.extend(reader2)
MemoryError

I understand that this is a memory error and that it is caused by the
line reader2_list.extend(reader2). Is there an alternative way to read
the .csv file line by line?

sincerely,
Suprabhath
Nov 10 '08 #1
On Tue, Nov 11, 2008 at 7:47 AM, <te******@gmail.com> wrote:
refSeqIDsinTransPro = []
promoterSequencesinTransPro = []
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)
Without testing, this looks like you're reading the _ENTIRE_
input stream into memory! Try this:

def readCSV(file):

    if type(file) == str:
        fd = open(file, "rU")
    else:
        fd = file

    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(fd.readline())
    fd.seek(0)

    reader = csv.reader(fd, dialect)
    for line in reader:
        yield line

for line in readCSV(open("foo.csv", "r")):
    ...

--JamesMills

-- "Problems are solved by method"
Nov 10 '08 #2
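
For readers on Python 3, the streaming idea in the reply above can be sketched roughly as follows. This is a minimal sketch, not the code from the reply: the function name `iter_csv_rows` is hypothetical, the dialect sniffing is left out, and the file is assumed to be an ordinary comma-separated text file. The point it illustrates is that `csv.reader` is already an iterator, so looping over it keeps only one row in memory at a time.

```python
import csv


def iter_csv_rows(path):
    """Yield rows of a CSV file one at a time (hypothetical helper)."""
    # newline="" is the mode the csv module documentation recommends,
    # so that quoted fields containing line breaks are parsed correctly.
    with open(path, newline="") as fd:
        for row in csv.reader(fd):
            # Only the current row is held in memory here.
            yield row
```

Because the function is a generator, nothing is read until the caller starts iterating, for example with `for row in iter_csv_rows("foo.csv"):`.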
On Nov 11, 8:47 am, tejsu...@gmail.com wrote:
import linecache
Why???
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)

for data2 in reader2_list:
    refSeqIDsinTransPro.append(data2[3])
for data2 in reader2_list:
    promoterSequencesinTransPro.append(data2[4])

All you need to do is replace the above by:

reader2 = csv.reader(open(sys.argv[2],"rb"))

for data2 in reader2:
    refSeqIDsinTransPro.append(data2[3])
    promoterSequencesinTransPro.append(data2[4])
Nov 10 '08 #3
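
A further option, sketched here under stated assumptions rather than taken from the thread: the replacement above no longer builds the full list of rows, but it still keeps two whole columns of the 2.26 GB file in lists. If the cluster file is the small one, the loops can be inverted so that the gene IDs are loaded into a set and the large CSV is streamed once. The function names `load_gene_ids` and `iter_matching_promoters` are hypothetical. The sketch assumes Python 3, assumes that an ID should match a CSV entry exactly (the original `re.search` call also matches substrings and treats the ID as a regular expression), and assumes the first CSV row is a header, mirroring the original loop that starts at index 1. Columns 3 and 4 are the ones the original script reads.

```python
import csv


def load_gene_ids(cluster_path):
    """Read the (small) cluster file into a set of IDs, skipping blank lines."""
    with open(cluster_path) as fh:
        return {line.strip() for line in fh if line.strip()}


def iter_matching_promoters(csv_path, wanted_ids, id_col=3, seq_col=4):
    """Stream the large CSV once and yield (id, sequence) for wanted IDs."""
    with open(csv_path, newline="") as fd:
        reader = csv.reader(fd)
        # Assumption: the first row is a header, as the original loop
        # started at index 1 and so never looked at row 0.
        next(reader, None)
        for row in reader:
            # Ignore rows that are too short to contain both columns.
            if len(row) <= max(id_col, seq_col):
                continue
            # Exact-match lookup in a set; this is an assumption and is
            # stricter than the substring match done by re.search.
            if row[id_col] in wanted_ids:
                yield row[id_col], row[seq_col]
```

With this layout, memory use depends on the size of the cluster file rather than on the size of the CSV, and each CSV row is checked with one set lookup instead of one regular-expression search per gene.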
Thanks a lot, James Mills. It worked.

Nov 20 '08 #4
