File processing - is Python suitable?

I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

ferrad

Jun 19 '07 #1
ferrad wrote:
> I have not used Python before, but believe it may be what I need.
>
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically. I have a set of generic
> rules that my fingers follow to process these files, which all follow
> a similar template.
>
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?

I think that's one of the great strengths of Python.

Just some pointers:

http://gnosis.cx/TPiP/
http://www.egenix.com/products/pytho...e/mxTextTools/
--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
Jun 19 '07 #2
ferrad wrote:
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically.
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files?

Someone can. ;-)
However if the file is structured,
awk may be faster, since this sounds
like the kind of report generation it
was designed for.

Alan Isaac
Jun 19 '07 #3
ferrad wrote:
> I have not used Python before, but believe it may be what I need.
>
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically. I have a set of generic
> rules that my fingers follow to process these files, which all follow
> a similar template.
>
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?

Yes, and if you are a non-programmer, the entry barrier for Python is as low
as it can get. However, what a programming language treats as a rule is
much stricter than what a human being might expect. For example, appending
an 's' to the first word in a sentence is "easy" in Python; changing the
subject's grammatical number to plural is "hard". Both are doable, but the
less technical your rules are, the harder they become to translate.
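
To make the "easy" half concrete, here is a minimal sketch of the mechanical rule. The function name `pluralize_first_word` is made up for illustration. Note that the result is grammatically wrong, which is exactly the gap between a technical rule and a linguistic one.

```python
def pluralize_first_word(sentence: str) -> str:
    """Append 's' to the first space-separated word of a sentence."""
    # partition() splits at the first space only; if there is no space,
    # the separator and the remainder are empty strings.
    first, sep, rest = sentence.partition(" ")
    return first + "s" + sep + rest


print(pluralize_first_word("Cat chases the mouse."))
# Cats chases the mouse.
```

The verb still says "chases": the program did what the rule said, not what a human editor would have meant.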

You often have to compromise either by proofreading the results of any
automated processing, or by having your program ask a human operator in the
cases it can't decide upon.
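
One way to sketch the "ask a human operator" compromise is shown below. The two automatic rules (keep lines starting with `DATA`, drop blank lines and lines starting with `#`) are hypothetical placeholders, since the real rules are not given in this thread. Anything else is passed to the operator.

```python
def decide(line: str, ask=input) -> bool:
    """Return True to keep a line, False to drop it."""
    text = line.strip()
    if text.startswith("DATA"):            # hypothetical rule: clearly wanted
        return True
    if not text or text.startswith("#"):   # hypothetical rule: clearly junk
        return False
    # Undecidable by rule: let a human answer y/n.
    answer = ask(f"Keep this line? [y/n] {text!r} ")
    return answer.strip().lower().startswith("y")


# For demonstration the operator is replaced by a canned "yes";
# leave out the ask= argument to be prompted interactively.
lines = ["DATA 1 2 3", "# comment", "total: 42?"]
kept = [line for line in lines if decide(line, ask=lambda prompt: "y")]
print(kept)
```

Passing the prompt function as a parameter keeps the rule logic testable without a person at the keyboard.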

I recommend that you play around a bit in the interactive interpreter to get
a feel for the kind of operations that are easily available on strings.
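
As a starting point for that kind of experimenting, here are a few of the built-in string operations, written as a small script; the same statements can be typed one at a time at the interpreter prompt. The sample line is invented for illustration.

```python
line = "  temp = 23.5 C  ; junk###  "

clean = line.strip()                          # remove surrounding whitespace
before_semicolon = clean.split(";")[0].strip()  # keep the part before ';'
print(before_semicolon)

print(clean.replace("#", ""))                 # delete every '#' character
print("temp" in clean)                        # substring test
print(clean.startswith("temp"))               # prefix test

# Turn a comma-separated field into numbers; int() tolerates the spaces.
numbers = [int(field) for field in "10, 20, 30".split(",")]
print(numbers)
```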

Then write the processing rules into a script, and always start your
conversion from the original data (of which you have a backup in some
locker), not some intermediate output. That way you can try processing
without losing information in the data or about the process -- until you
find the results acceptable. Make backups of your script, too, before you
try something new.
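
A minimal sketch of such a script follows, under stated assumptions: the marker strings `BEGIN JUNK` and `END JUNK` and the file names are hypothetical, standing in for whatever the real template looks like. The point is the shape: read the untouched original, write to a separate file, and re-run from the original after every change to the rules.

```python
from pathlib import Path


def process(lines):
    """Yield the lines to keep, skipping everything between two markers."""
    skipping = False
    for line in lines:
        if line.startswith("BEGIN JUNK"):    # hypothetical start marker
            skipping = True
        elif line.startswith("END JUNK"):    # hypothetical end marker
            skipping = False
        elif not skipping:
            yield line


def convert(original: Path, output: Path) -> None:
    """Read the original file and write the processed result elsewhere."""
    if original.resolve() == output.resolve():
        raise ValueError("refusing to overwrite the original data")
    with original.open() as src, output.open("w") as dst:
        dst.writelines(process(src))


# Example call (file names are placeholders):
# convert(Path("report_original.txt"), Path("report_cleaned.txt"))
```

Because `process` works on any iterable of lines, the rules can be tried on a short list in the interpreter before they are run on a large file.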

Peter
Jun 19 '07 #4
On Tue, 19 Jun 2007 05:15:17 -0700, ferrad <ac***@hotmail.com> wrote:
> I have not used Python before, but believe it may be what I need.
>
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically. I have a set of generic
> rules that my fingers follow to process these files, which all follow
> a similar template.

Doesn't your text editor have recordable macros?

> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?

Impossible to tell, since we do not know these rules. If they need
your good judgement, intelligence, knowledge or taste, a good text
editor, with careful application of recorded macros, is the way to go.
Maybe in combination with a few Perl or Python scripts and special
features of the text editor. I often find myself doing that kind of
work, when the text I start with is too irregular to be easily machine
parsable.

On the other hand, if the work is purely mechanical, tedious stuff,
there is a fair chance that it can be completely automated using
Python. (IMHO, Perl is often a better tool for this kind of
work, but few other languages beat Python in this area.)

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
Jun 19 '07 #5
