string parsing screwing up on large files?

Daniel Kramer

Hello, I'm fairly new to Python, but I've written a script that takes
in a special text file (a RenderMan .rib, to be specific) and filters
some of the commands. The .rib file is a simple text file, but in
some cases it's very large: 20 MB or more at times.

The script steps through each line looking for keywords and changes the
line if necessary, but most lines just pass in and out of the script
unmodified. The problem is that sometimes the lines aren't written out
correctly, and it's intermittent: if I re-run the script
on the same input, it usually works fine. After filtering about
100 files I might get 4 or 5 that come out bad; simply re-running
those fixes them.

Anyone know what I might look for? It's possible that the machine is
under a lot of I/O load and/or CPU load when it happens, but I'm not sure
about that. I normally send this processing to a render farm, so it's
hard to predict exactly what sort of load is going on at that time. It
feels like a buffer isn't getting flushed before the text is written
out, or something like that.
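
For reference, here is a minimal sketch of the kind of line filter described above, written so that the output is flushed and closed before anything else can read it. It is not the actual script: the names filter_rib and example_rewrite and the ShadingRate rule are hypothetical, and it assumes a current Python 3 rather than the version in use here. The output goes to a temporary file in the destination directory and is only renamed into place once everything has been written.

```python
import os
import tempfile


def filter_rib(src_path, dst_path, rewrite_line):
    """Copy src_path to dst_path, passing every line through rewrite_line."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Write to a temporary file in the same directory, so that a reader
    # never sees a half-written result under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(src_path) as src:
            for line in src:
                dst.write(rewrite_line(line))
            dst.flush()
            os.fsync(dst.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, dst_path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def example_rewrite(line):
    # Hypothetical rule, for illustration only: force one shading rate
    # and pass every other line through unchanged.
    if line.startswith("ShadingRate"):
        return "ShadingRate 4\n"
    return line
```

Calling filter_rib("in.rib", "out.rib", example_rewrite) would then either produce a complete out.rib or leave any existing one untouched. If the bad files still show up with a structure like this, an unflushed buffer in the script itself becomes a less likely explanation.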

Any suggestions where I might look?

thanks

daniel

Jul 18 '05 #1

Rene Pijlman

Daniel Kramer:

Any suggestions where I might look?

In the source code, probably. I've looked long and hard at your posting,
but I didn't find any bug there.

--
René Pijlman

Jul 18 '05 #2

Bengt Richter

On 19 Dec 2003 18:55:29 -0800, da*********@yahoo.com (Daniel Kramer) wrote:

Any suggestions where I might look?

What is telling you that some lines aren't correct? RenderMan syntax errors?
Maybe if you saved the bad file(s) and re-ran the script until you got a good
one, and then ran diff -u goodfile badfile to see how things were actually
changing, it would become clear. Or if not, you could post some diffs and
the code that should be accomplishing the changes, and we could go from there.
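
If diff isn't available on the machine, the same comparison can be sketched in Python with the standard difflib module. The function names below are just illustrative. first_difference is a cheap first check that stops at the first mismatching line; diff_files builds a full unified diff, which can be slow on very large inputs, so it may be worth running it only on a pair of files already known to differ.

```python
import difflib
from itertools import zip_longest


def first_difference(good_path, bad_path):
    """Return (line_number, good_line, bad_line) for the first mismatch, or None."""
    with open(good_path) as good, open(bad_path) as bad:
        # zip_longest pads the shorter file with None, so a truncated
        # file is reported as a mismatch too.
        for number, (g, b) in enumerate(zip_longest(good, bad), start=1):
            if g != b:
                return number, g, b
    return None


def diff_files(good_path, bad_path):
    """Return a unified diff between two text files as a list of lines."""
    with open(good_path) as good, open(bad_path) as bad:
        good_lines = good.readlines()
        bad_lines = bad.readlines()
    return list(difflib.unified_diff(
        good_lines, bad_lines, fromfile=good_path, tofile=bad_path))
```

Knowing whether the first difference is a truncated line, a missing block, or interleaved text would already narrow things down a lot.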

Is the code threaded? Are you perhaps clobbering something across threads
occasionally? Accidental name collisions? Unsynchronized accesses?
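
If it does turn out to be threaded, one way to rule out interleaved writes is to funnel every write through a single lock. This is only a sketch under that assumption; LockedWriter is a made-up name, not something from the standard library or from your script.

```python
import threading


class LockedWriter:
    """Wrap a file object so that each write() call is serialized."""

    def __init__(self, fileobj):
        self._file = fileobj
        self._lock = threading.Lock()

    def write(self, text):
        # Only one thread at a time may touch the underlying file.
        with self._lock:
            self._file.write(text)
```

Each thread would then call write() with a whole line at a time, so that no line can be split by output from another thread.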

You might also want to mention what platform, Python version, etc. you are running.
Maybe there is a file system bug that an upgrade would fix? It doesn't happen often,
but it might be worth googling for your platform.
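
Something like the following, run on one of the render farm machines, would collect the basics to post. It uses only the standard sys and platform modules; environment_report is just an illustrative name.

```python
import platform
import sys


def environment_report():
    """Return a one-line summary of the Python version and the platform."""
    return "Python %s on %s" % (sys.version.split()[0], platform.platform())


print(environment_report())
```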

Regards,
Bengt Richter

Jul 18 '05 #3
