reading large file

guillaume

I have to read and process a large ASCII file containing a mesh: a
list of points and triangles.
The file is 100 MB.

I first tried to do it in memory, but I think I am running out of
memory, so I decided to use the shelve
module to store my points and elements on disk.
But it is slow... Any hints? I think I still have the same
memory problem, but I don't understand why,
since my aPoint should be removed by the gc.

Do you have any idea?

Thanks

Guillaume

PS :
here is the code for your info

import string
import os
import sys
import time
import resource
import shelve
import psyco

psyco.full()

class point:
    def __init__(self,x,y,z):
        self.x = x
        self.y = y
        self.z = z

def SFMImport(filename):
    print 'UNV Import ("%s")' % filename

    db = shelve.open('points.db')

    file = open(filename, "r")

    linenumber = 1
    nbpoints = 0
    nbfaces = 0

    pointList = []
    faceList = []

    # header line: number of triangles, then number of points
    line = file.readline()
    words = string.split(line)
    nbpoints = string.atoi(words[1])
    nbtrias = string.atoi(words[0])

    print "found %s points and %s triangles" % (nbpoints, nbtrias)

    t1 = time.time()
    for i in range(nbpoints):
        line = file.readline()
        words = string.split(line)

        # coordinates use Fortran-style exponents (1.0D+00)
        x = string.atof(words[1].replace("D","E"))
        y = string.atof(words[2].replace("D","E"))
        z = string.atof(words[3].replace("D","E"))

        aPoint = point(x, y, z)

        key = "point%s" % i

        if (i%250000 == 0):
            print "%7d points <%s>" % (i, time.time() - t1)
            t1 = time.time()

        db[key] = aPoint

    print "%s points read in %s seconds" % (nbpoints, time.time() - t1)
    db.close()

    t1 = time.time()
    t2 = time.time()
    for i in range(nbtrias):
        line = file.readline()
        words = string.split(line)

        i1 = string.atoi(words[0])
        i2 = string.atoi(words[1])
        i3 = string.atoi(words[2])

        faceList.append((i1,i2,i3))

        if (i%100000 == 0):
            print "%s faces <%s>" % (i, time.time() - t1)
            t1 = time.time()

    print "%s faces read in %s seconds" % (nbtrias, time.time() - t2)

    file.close()

def callback(fs):
    filename = fs.filename
    SFMImport(filename)

if __name__ == "__main__":
    # try:
    #     import GUI
    # except:
    #     print "This script is only working with the new GUI module ...."
    # else:
    #     fs = GUI.FileSelector()
    #     fs.activate(callback, fs)
    print sys.argv[0]
    SFMImport(sys.argv[1])

Jul 18 '05 #1


Michael Peuser

"guillaume" <g_******@yahoo.fr> wrote in message
news:9b**************************@posting.google.com...

> I have to read and process a large ASCII file containing a mesh: a
> list of points and triangles.
> The file is 100 MB.
>
> I first tried to do it in memory, but I think I am running out of
> memory, so I decided to use the shelve
> module to store my points and elements on disk.
> But it is slow... Any hints? I think I still have the same
> memory problem, but I don't understand why,
> since my aPoint should be removed by the gc.

What do you expect from shelve? I would recommend converting your data in
a first pass into a binary format (doing all the atoi() calls in this pre-pass).
Then use memory-mapped file access when reading it in your work pass.

But maybe you need a lot of memory for your internal structures as well. If
you have little RAM (<512 MB), the system could do a lot of swapping. You
will notice that when the processor load goes down! The cheapest solution
generally is doubling your RAM.
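
A minimal sketch of the two-pass idea above, written for Python 3 (the thread's own code is Python 2). The record layout (three little-endian doubles per point), the helper names convert_points, read_point and demo, and the sample data are all assumptions made for illustration, not something taken from the thread:

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: x, y, z stored as three little-endian doubles per point.
POINT = struct.Struct("<3d")


def convert_points(ascii_path, binary_path):
    """Pre-pass: parse 'index x y z' lines once and write packed doubles."""
    count = 0
    with open(ascii_path) as src, open(binary_path, "wb") as dst:
        for line in src:
            words = line.split()
            if len(words) < 4:
                continue  # this sketch skips anything that is not a point line
            # float() does not accept Fortran-style exponents such as 1.5D+00.
            x, y, z = (float(w.replace("D", "E")) for w in words[1:4])
            dst.write(POINT.pack(x, y, z))
            count += 1
    return count


def read_point(mapped, index):
    """Work pass: unpack one point directly from the memory-mapped file."""
    return POINT.unpack_from(mapped, index * POINT.size)


def demo():
    """Convert a tiny made-up file, then read point 1 back through mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        ascii_path = os.path.join(tmp, "points.txt")
        binary_path = os.path.join(tmp, "points.bin")
        with open(ascii_path, "w") as f:
            f.write("1 0.0D+00 1.5D+00 -2.0D+00\n")
            f.write("2 1.0D+01 2.5D-01 3.0D+00\n")
        count = convert_points(ascii_path, binary_path)
        with open(binary_path, "rb") as f:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return count, read_point(mapped, 1)
            finally:
                mapped.close()


if __name__ == "__main__":
    print(demo())
```

With fixed-size records, point i sits at byte offset i * 24, so the work pass can fetch any point without parsing text and without loading the whole file into Python objects.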

Kindly
Michael P

Jul 18 '05 #2

Paul Rubin

g_******@yahoo.fr (guillaume) writes:

> print "found %s points and %s triangles" % (nbpoints, nbtrias)
>
> t1 = time.time()
> for i in range(nbpoints):

For another thing, use xrange instead of range here.
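
For context on this remark: in Python 2, range(nbpoints) builds a complete list of integers before the loop starts, while xrange produces them one at a time. In Python 3, xrange is gone and range itself is lazy. The small sketch below is Python 3 and rebuilds the eager list explicitly to show the difference; the value of n is an arbitrary assumption:

```python
import sys

n = 1000000  # arbitrary size chosen for this illustration

eager = list(range(n))  # what Python 2's range() returned: a full list
lazy = range(n)         # Python 3's range, like Python 2's xrange

# The list's footprint grows with n; the lazy object's footprint does not.
print(sys.getsizeof(eager), sys.getsizeof(lazy))

total = 0
for i in lazy:  # iterating never holds all n integers at once
    total += i
```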

Jul 18 '05 #3

Bengt Richter

On 3 Sep 2003 05:00:39 -0700, g_******@yahoo.fr (guillaume) wrote:

> I have to read and process a large ASCII file containing a mesh: a
> list of points and triangles.
> The file is 100 MB.
>
> I first tried to do it in memory, but I think I am running out of
> memory, so I decided to use the shelve
> module to store my points and elements on disk.
> But it is slow... Any hints? I think I still have the same
> memory problem, but I don't understand why,
> since my aPoint should be removed by the gc.
>
> Do you have any idea?

Since your data is very homogeneous, why don't you store it in a couple of
homogeneous arrays? You could easily create a class to give you convenient
access via indices or iterators etc. Also you could write load and store
methods that could write both arrays in binary to a file. You could
consider doing this as a separate conversion from your source file, and
then run your app using the binary files and wrapper class.

Arrays are described in the array module docs ;-)
I imagine you'd want to use the 'd' type for points and 'l' for faces.
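
One possible shape for such a wrapper, as a Python 3 sketch. The class name Mesh, its methods, and the file layout (two counts followed by the raw arrays) are hypothetical choices made for this illustration; only the array module calls (extend, tofile, fromfile) are real API:

```python
import os
import tempfile
from array import array


class Mesh:
    """Hypothetical wrapper: two flat arrays instead of one object per point."""

    def __init__(self):
        self.coords = array("d")  # x0, y0, z0, x1, y1, z1, ...
        self.faces = array("l")   # i0, j0, k0, i1, j1, k1, ...

    def add_point(self, x, y, z):
        self.coords.extend((x, y, z))

    def add_face(self, i1, i2, i3):
        self.faces.extend((i1, i2, i3))

    def point(self, index):
        return tuple(self.coords[3 * index:3 * index + 3])

    def face(self, index):
        return tuple(self.faces[3 * index:3 * index + 3])

    def save(self, path):
        # Native byte order and item sizes: read it back on the same platform.
        with open(path, "wb") as f:
            # Two counts first so load() knows how many items to read.
            array("l", [len(self.coords), len(self.faces)]).tofile(f)
            self.coords.tofile(f)
            self.faces.tofile(f)

    @classmethod
    def load(cls, path):
        mesh = cls()
        with open(path, "rb") as f:
            counts = array("l")
            counts.fromfile(f, 2)
            mesh.coords.fromfile(f, counts[0])
            mesh.faces.fromfile(f, counts[1])
        return mesh


def roundtrip_demo():
    """Build a one-triangle mesh, save it, and load it back."""
    mesh = Mesh()
    mesh.add_point(0.0, 0.0, 0.0)
    mesh.add_point(1.0, 0.0, 0.0)
    mesh.add_point(0.0, 1.0, 0.5)
    mesh.add_face(0, 1, 2)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mesh.bin")
        mesh.save(path)
        return Mesh.load(path)
```

Each coordinate costs 8 bytes in an array('d'), with no per-point Python object, which is where the memory saving over a list of point instances comes from.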

Regards,
Bengt Richter

Jul 18 '05 #4

Sophie Alléon

Thanks to your comments, it is now possible to read my large file in a
couple of minutes
on my machine.

Guillaume

"Bengt Richter" <bo**@oz.net> wrote in message news:
bj**********@216.39.172.122...

> On 3 Sep 2003 05:00:39 -0700, g_******@yahoo.fr (guillaume) wrote:
>> I have to read and process a large ASCII file containing a mesh: a
>> list of points and triangles.
>> The file is 100 MB.
>>
>> I first tried to do it in memory, but I think I am running out of
>> memory, so I decided to use the shelve
>> module to store my points and elements on disk.
>> But it is slow... Any hints? I think I still have the same
>> memory problem, but I don't understand why,
>> since my aPoint should be removed by the gc.
>>
>> Do you have any idea?
>
> Since your data is very homogeneous, why don't you store it in a couple of
> homogeneous arrays? You could easily create a class to give you convenient
> access via indices or iterators etc. Also you could write load and store
> methods that could write both arrays in binary to a file. You could
> consider doing this as a separate conversion from your source file, and
> then run your app using the binary files and wrapper class.
>
> Arrays are described in the array module docs ;-)
> I imagine you'd want to use the 'd' type for points and 'l' for faces.
>
> Regards,
> Bengt Richter

Jul 18 '05 #5

Bengt Richter

On Fri, 5 Sep 2003 08:26:12 +0200, "Sophie Alléon" <al****@club-internet.fr> wrote:

<toppost moved to preferred location below ;-) />

> "Bengt Richter" <bo**@oz.net> wrote in message news:
> bj**********@216.39.172.122...
>> On 3 Sep 2003 05:00:39 -0700, g_******@yahoo.fr (guillaume) wrote:
>>> I have to read and process a large ASCII file containing a mesh: a
>>> list of points and triangles.
>>> The file is 100 MB.
>>>
>>> I first tried to do it in memory, but I think I am running out of
>>> memory, so I decided to use the shelve
>>> module to store my points and elements on disk.
>>> But it is slow... Any hints? I think I still have the same
>>> memory problem, but I don't understand why,
>>> since my aPoint should be removed by the gc.
>>>
>>> Do you have any idea?
>>
>> Since your data is very homogeneous, why don't you store it in a couple of
>> homogeneous arrays? You could easily create a class to give you convenient
>> access via indices or iterators etc. Also you could write load and store
>> methods that could write both arrays in binary to a file. You could
>> consider doing this as a separate conversion from your source file, and
>> then run your app using the binary files and wrapper class.
>>
>> Arrays are described in the array module docs ;-)
>> I imagine you'd want to use the 'd' type for points and 'l' for faces.
>>
>> Regards,
>> Bengt Richter

<topPostText>Thanks to your comments, it is now possible to read my large file in a
couple of minutes
on my machine.

Guillaume

</topPostText>

Well, so long as you're happy, glad to have played a role ;-)

But I would think that time could still be cut a fair amount. E.g., I imagine just copying
your file at the command line might take 20-25 sec, depending on your system,
and if you have a fast processor, you should be I/O bound much of the time, so a lot of
the conversions etc. should be able to happen mostly while waiting for the disk.

There doesn't seem to be any way to tell the array module an estimated full (over or exact) capacity
for an array yet to be populated, but I would think such a feature in the array module would be good
for your kind of application. (Of course, hopefully the fromfile method increases the size with a single
memory allocation, but you can't use that if your data requires conversion or filtering. scanf/printf-style
per-line conversion from/to ASCII files might be another useful feature?)
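
Since fromfile cannot be used while the data still needs per-line conversion, one workaround is to convert the text in batches and grow the array with one extend() call per batch. The Python 3 sketch below does not reserve capacity (the array module offers no call for that) and has not been timed against the alternatives; the function names, the batch size, and the 'index x y z' line format are assumptions:

```python
import io
from array import array
from itertools import islice


def parse_point(line):
    """Convert one 'index x y z' line; Fortran 'D' exponents become 'E'."""
    words = line.split()
    return [float(w.replace("D", "E")) for w in words[1:4]]


def read_points(lines, nbpoints, batch=100000):
    """Fill an array('d') from text lines, converting one batch at a time."""
    lines = iter(lines)  # ensure islice resumes where the last batch ended
    coords = array("d")
    remaining = nbpoints
    while remaining > 0:
        chunk = list(islice(lines, min(batch, remaining)))
        if not chunk:
            break  # input ended before nbpoints lines were seen
        # One extend() per batch instead of one append() per number.
        coords.extend(v for line in chunk for v in parse_point(line))
        remaining -= len(chunk)
    return coords


sample = io.StringIO(
    "1 0.0D+00 1.0D+00 2.0D+00\n"
    "2 3.0D+00 4.0D+00 5.0D+00\n"
    "3 6.0D+00 7.0D+00 8.0D+00\n"
)
coords = read_points(sample, 3, batch=2)
```

Only one batch of text lines is held in memory at a time, so peak memory stays close to the size of the final array.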

Anyway, even as is, I'd bet we could get the time down to under a minute, if it was important.
Of course, a couple of minutes is not bad if you're not going to do it over and over.

Regards,
Bengt Richter

Jul 18 '05 #6

Adam Przybyla

guillaume <g_******@yahoo.fr> wrote:

> I have to read and process a large ASCII file containing a mesh: a
> list of points and triangles.
> The file is 100 MB.
>
> I first tried to do it in memory, but I think I am running out of
> memory, so I decided to use the shelve
> module to store my points and elements on disk.
> But it is slow... Any hints? I think I still have the same
> memory problem, but I don't understand why,
> since my aPoint should be removed by the gc. Do you have any idea?

... try PyTables;-) Regards
Adam Przybyla

Jul 18 '05 #7
