File processing - is Python suitable?

I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

ferrad

Jun 19 '07 #1


ferrad wrote:
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?
I think that's one of the great strengths of Python.

Just some pointers:

http://gnosis.cx/TPiP/
http://www.egenix.com/products/pytho...e/mxTextTools/
--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
Jun 19 '07 #2

ferrad wrote:
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically.
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files?
Someone can. ;-)
However, if the file is structured,
awk may be faster, since this sounds
like the kind of report generation it
was designed for.
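For comparison, here is a minimal Python sketch of a typical awk report task: summing the third whitespace-separated field of every line, roughly what `awk '{ s += $3 } END { print s }'` does. The function name `sum_column` and the choice of column are illustrative assumptions, not something from the thread.

```python
from typing import Iterable


def sum_column(lines: Iterable[str], column: int) -> float:
    """Sum one whitespace-separated field, roughly like awk's `s += $N`.

    `column` is 1-based to mirror awk's $1, $2, ... Lines with fewer
    fields are skipped (awk would add the missing field as 0, so the
    sum comes out the same). Unlike awk, a non-numeric field raises
    ValueError instead of being converted to a number.
    """
    total = 0.0
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like awk's default FS
        if len(fields) >= column:
            total += float(fields[column - 1])
    return total
```

Usage, with a hypothetical file name: `with open("report.txt") as f: print(sum_column(f, 3))`.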

Alan Isaac
Jun 19 '07 #3

ferrad wrote:
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?
Yes, and if you are a non-programmer, the entry barrier for Python is as low
as it can get. However, what a programming language treats as a rule is
much stricter than what a human being might expect. For example, appending
an 's' to the first word in a sentence is "easy" in Python, but making the
sentence's subject plural is "hard". Both are doable, but the less
technical your rules are, the harder they become to translate.
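A minimal sketch of the "easy" rule mentioned above. The function name is hypothetical. It applies the rule purely mechanically, which is exactly why it is easy, and also why it gets irregular nouns and verb agreement wrong.

```python
def append_s_to_first_word(sentence: str) -> str:
    """Append an 's' to the first word of a sentence.

    Purely mechanical: the function knows nothing about grammar, so
    irregular nouns come out wrong ("child" -> "childs") and the verb
    is left unchanged.
    """
    first, sep, rest = sentence.partition(" ")  # split off the first word
    return first + "s" + sep + rest


print(append_s_to_first_word("cat sleeps"))   # cats sleeps
print(append_s_to_first_word("child plays"))  # childs plays
```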

You often have to compromise either by proofreading the results of any
automated processing, or by having your program ask a human operator in the
cases it can't decide upon.
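One way to sketch the "ask a human operator" compromise in Python: an automatic rule decides where it can and falls back to a prompt otherwise. The rule below (`classify_line`) is a made-up example, not one of the original poster's actual rules; `ask` defaults to the built-in `input` but can be swapped out, e.g. for testing.

```python
from typing import Callable, Iterable, List, Optional


def classify_line(line: str) -> Optional[str]:
    """Hypothetical rule: return "keep" or "drop", or None when unsure."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return "drop"   # blank or comment-like lines are treated as junk
    if stripped[0].isdigit():
        return "keep"   # lines starting with a number are treated as data
    return None         # anything else: the rule cannot decide


def process(lines: Iterable[str], ask: Callable[[str], str] = input) -> List[str]:
    """Apply the rule to each line, asking a human whenever it returns None."""
    kept = []
    for line in lines:
        decision = classify_line(line)
        if decision is None:
            answer = ask(f"Keep {line.rstrip()!r}? [y/n] ")
            decision = "keep" if answer.strip().lower().startswith("y") else "drop"
        if decision == "keep":
            kept.append(line)
    return kept
```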

I recommend that you play around a bit in the interactive interpreter to get
a feel for the kind of operations that are easily available on strings.
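As an example of what such a session might turn up, here are a few built-in `str` methods applied to an invented line of report text. Typed into the interactive interpreter one statement at a time, each expression produces the value shown in its comment.

```python
line = "  Total: 1,234 items  \n"

cleaned = line.strip()                             # 'Total: 1,234 items'
label, _, value = cleaned.partition(":")           # 'Total', ' 1,234 items'
number = int(value.split()[0].replace(",", ""))    # 1234
print(label.lower(), number)                       # total 1234
print(cleaned.startswith("Total"))                 # True
print(" | ".join(cleaned.split()))                 # Total: | 1,234 | items
```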

Then write the processing rules into a script, and always start your
conversion from the original data (of which you have a backup in some
locker), not from some intermediate output. That way you can keep
experimenting without losing information in the data or about the process,
until you find the results acceptable. Make backups of your script, too,
before you try something new.
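A minimal sketch of that workflow, assuming UTF-8 text files small enough to read into memory at once (for very large files, a line-by-line loop is the usual alternative). The script always reads the untouched original, writes a separate output file, and refuses to overwrite its input. `transform` is a hypothetical placeholder for your own rules, and `convert.py` is just an example script name.

```python
import sys
from pathlib import Path


def transform(text: str) -> str:
    """Placeholder for the real processing rules (hypothetical example)."""
    return text.replace("\t", " ")


def run(original: Path, output: Path) -> None:
    """Read the original data and write the converted text elsewhere."""
    if original.resolve() == output.resolve():
        raise ValueError("refusing to overwrite the original data")
    text = original.read_text(encoding="utf-8")
    output.write_text(transform(text), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python convert.py original.txt converted.txt
    run(Path(sys.argv[1]), Path(sys.argv[2]))
```

Because the original is never modified, you can edit `transform` and rerun the script as often as you like.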

Peter
Jun 19 '07 #4

On Tue, 19 Jun 2007 05:15:17 -0700, ferrad <ac***@hotmail.com> wrote:
> I have not used Python before, but believe it may be what I need.
>
> I have large text files containing text, numbers, and junk. I want to
> delete large chunks, process other bits, etc., much like I'd do in an
> editor, but want to do it automatically. I have a set of generic
> rules that my fingers follow to process these files, which all follow
> a similar template.
Doesn't your text editor have recordable macros?
> Question: can I translate these types of rules into programmatic
> constructs that Python can use to process these files? Can Python do
> the trick?
Impossible to tell, since we do not know these rules. If they need
your good judgement, intelligence, knowledge or taste, a good text
editor, with careful application of recorded macros, is the way to go.
Maybe in combination with a few Perl or Python scripts and special
features of the text editor. I often find myself doing that kind of
work, when the text I start with is too irregular to be easily machine
parsable.

On the other hand, if the work is purely mechanical, tedious stuff,
there is a fair chance that it can be completely automated using
Python. (IMHO, Perl is often a better tool for this kind of
work, but few other languages beat Python in this area.)
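To illustrate the purely mechanical kind of job, here is a minimal sketch that deletes every block between two marker lines, drops blank lines, and collapses runs of spaces and tabs in what remains. It processes the input line by line, so even large files need little memory. The marker strings `BEGIN JUNK` / `END JUNK` are hypothetical; substitute whatever your files actually contain.

```python
import re
from typing import Iterable, Iterator

START = "BEGIN JUNK"  # hypothetical marker opening a chunk to delete
END = "END JUNK"      # hypothetical marker closing that chunk
SPACES = re.compile(r"[ \t]+")


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield cleaned lines, skipping everything from START through END."""
    skipping = False
    for line in lines:
        stripped = line.strip()
        if stripped == START:
            skipping = True   # start of a chunk to delete
            continue
        if stripped == END:
            skipping = False  # end of the chunk; resume output after it
            continue
        if skipping or not stripped:
            continue          # inside a junk chunk, or a blank line
        yield SPACES.sub(" ", stripped) + "\n"
```

Usage: open the input and output as text files (`src` and `dst`) and call `dst.writelines(clean_lines(src))`.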

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
Jun 19 '07 #5
