File handling: The easy and the hard way

Hans-Joachim Widmaier

Hi all.

Handling files is an extremely frequent task in programming, so most
programming languages have an abstraction of the basic files offered by
the underlying operating system. This is indeed also true for our language
of choice, Python. Its file type allows some extraordinarily convenient
access like:

for line in open("blah"):
    handle_line(line)

While this is very handy for interactive usage or throw-away scripts, I'd
consider it a serious bug in "production quality" software. Tracebacks
are fine for programmers, but end users really never should see any.
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file". Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.

Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D

Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more. Python, being a very high level language, needs a
higher level file type, IMHO. This is, of course, much easier said than
done. And renowned dimwits like me aren't expected to come up with solutions.
I've thought about subclassing file, but to me it looks like it wouldn't
help much. With all this try/except framing you need to insert a call
level anyway (wondering if this new decorator stuff might help?). The best
I've come up with so far is a vague idea for an error callback (if there isn't
one, the well known exceptions might be raised) that gets called for
whatever error occurred, like:

class File:
    ...
    def write(self, data):
        while True:
            try:
                self._write(data)
            except IOError, e:
                if self.errorcallback:
                    ret, dat = self.errorcallback(self, F_WRITE, e, data)
                    if ret == F_RETURN:
                        return dat
                else:
                    raise

The callback could then write a nice error message, abort the program,
maybe retry the operation (that's what the 'while True'-loop is for) or
return whatever value to the original caller. Although the callback
function will usually be more than a few lines, it can be reused. It can
even be packed into your own file-error-handling module, something that is
not possible with the original usage pattern.
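A minimal sketch of this callback idea in Python 3, where `IOError` is an alias of `OSError`. Instead of subclassing `file`, it wraps any object that has a `write()` method. The action names `RETRY`, `RETURN` and `RAISE`, the callback signature and the `Flaky` stand-in object are all assumptions made for illustration. Returning from inside the `try` on success is what ends the `while True` loop.

```python
# Hypothetical action codes an error callback may return.
RETRY, RETURN, RAISE = "retry", "return", "raise"


class File:
    """Wrap a file-like object and route OSError from write() to a callback.

    With no callback, errors propagate exactly as from a plain file object.
    """

    def __init__(self, fileobj, errorcallback=None):
        self._f = fileobj
        self.errorcallback = errorcallback

    def write(self, data):
        while True:
            try:
                return self._f.write(data)  # success ends the retry loop
            except OSError as e:
                if self.errorcallback is None:
                    raise
                action, value = self.errorcallback(self, "write", e, data)
                if action == RETURN:
                    return value  # hand the callback's value to the caller
                if action != RETRY:
                    raise  # RAISE (or anything unknown) re-raises the error
                # RETRY: go round the loop and attempt the write again


class Flaky:
    """Hypothetical stand-in for a file whose first write fails."""

    def __init__(self):
        self.calls = 0

    def write(self, data):
        self.calls += 1
        if self.calls == 1:
            raise OSError(28, "No space left on device")
        return len(data)


def report_and_retry(f, op, err, data):
    print(f"could not {op}: {err.strerror}; retrying")
    return RETRY, None


print(File(Flaky(), report_and_retry).write("hello"))
```

A real callback would presumably cap the number of retries; this one retries forever if the error persists.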

If you still bear with me, you might as well sacrifice a few more seconds
and tell me what you think about my rant. Is everything just fine as it is
now? Or do I have a point? I always felt it most important to handle all
errors a program may encounter gracefully, and the easier this is to do,
the less likely it is a programmer will just sneak around the issue and
let the interpreter/run time system/operating system handle it. (And yes,
I'm guilty of not obeying it myself, as it can double or triple the time
needed to write the whole program; just because it's so cumbersome.)

Waiting-for-you-to-jump-on-me'ly yours,
Hans-Joachim

Jul 18 '05 #1

Steve Holden

Hans-Joachim Widmaier wrote:

Hi all.

Handling files is an extremely frequent task in programming, so most
programming languages have an abstraction of the basic files offered by
the underlying operating system. This is indeed also true for our language
of choice, Python. Its file type allows some extraordinarily convenient
access like:

for line in open("blah"):
    handle_line(line)

While this is very handy for interactive usage or throw-away scripts, I'd
consider it a serious bug in "production quality" software. Tracebacks
are fine for programmers, but end users really never should see any.
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file". Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.
I agree we really shouldn't expect users to have to see tracebacks, but
that doesn't mean that exception handling has to be sophisticated.

Here's something I'd consider acceptable, which doesn't add hugely to
the programming overhead for multiple files but doesn't burden the user
with horrible tracebacks. I've used it to print itself, so you see how
it works and what it contains all in the same output:

sholden@DELLBOY ~
$ ./ft.py one.py ft.py
Problem handling file one.py : [Errno 2] No such file or directory: 'one.py'
#!/usr/bin/python
#
# ft.py: simple multi-file processor with error handling
#
import sys

files = sys.argv[1:]

for f in files:
    try:
        for l in file(f):
            sys.stdout.write(l)
    except Exception, reason:
        print >> sys.stderr, "Problem handling file", f, ":", reason

I'm quite happy to let the process-termination housekeeping code, or
perhaps (in some implementations) the Python housekeeping at garbage
collection, close the file, which you might think is unduly sloppy. What
can I say, the user pays if they don't want sloppy :-). But I'd consider
this sufficiently close to "production quality" to be delivered to
end-users. Clearly you can add file assignment to a variable and a
try/finally to ensure it's closed. You gets what you pays for.
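For comparison, a sketch of the same loop in Python 3, where a `with` block supplies the try/finally close without extra bookkeeping. Unlike the script above, it catches only `OSError`, so a genuine bug still produces a traceback, and it returns a failure count so the caller can choose an exit status. Both are editorial choices, and `cat_files` is a hypothetical name.

```python
import sys


def cat_files(names, out=None, err=None):
    """Copy each named file to `out`, reporting failures on `err`.

    Returns the number of files that could not be processed.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    failures = 0
    for name in names:
        try:
            # The with-block closes the file even if reading fails part way.
            with open(name) as f:
                for line in f:
                    out.write(line)
        except OSError as reason:
            print("Problem handling file", name, ":", reason, file=err)
            failures += 1
    return failures
```

A script would call it as `sys.exit(1 if cat_files(sys.argv[1:]) else 0)`. A file that is not valid text in the current encoding raises `UnicodeDecodeError`, which this version deliberately does not catch.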

Naturally, if recovery is required rather than just error-reporting then
the situation can be expected to be a little more complicated.
Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D

Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more.
Well, the more complex your processing gets the more complex your
error-handling gets too, but I'd say you should look at some serious
refactoring here - you appear to have what's sometimes called a "code
smell" in extreme programming circles. See

http://c2.com/cgi/wiki/?CodeSmell

You also appear to have a good nose, one of the distinctive properties
of the conscientious programmer.
Python, being a very high level language, needs a
higher level file type, IMHO. This is, of course, much easier said than
done. And renowned dimwits like me aren't expected to come up with solutions.
Don't talk yourself down! You have already shown sound instinct.
I've thought about subclassing file, but to me it looks like it wouldn't
help much. With all this try/except framing you need to insert a call
level anyway (wondering if this new decorator stuff might help?). The best
I've come up with so far is a vague idea for an error callback (if there isn't
one, the well known exceptions might be raised) that gets called for
whatever error occurred, like:

class File:
    ...
    def write(self, data):
        while True:
            try:
                self._write(data)
            except IOError, e:
                if self.errorcallback:
                    ret, dat = self.errorcallback(self, F_WRITE, e, data)
                    if ret == F_RETURN:
                        return dat
                else:
                    raise

The callback could then write a nice error message, abort the program,
maybe retry the operation (that's what the 'while True'-loop is for) or
return whatever value to the original caller. Although the callback
function will usually be more than a few lines, it can be reused. It can
even be packed into your own file-error-handling module, something that is
not possible with the original usage pattern.
The problem that any such approach is likely to have can be summed up as
"If processing is complicated then error-handling may also become
complicated, and error-recovery even more so". You shouldn't expect it
to be too simple, but if it's too complex then you might find that a
restructuring of your program will yield better results.
If you still bear with me, you might as well sacrifice a few more seconds
and tell me what you think about my rant. Is everything just fine as it is
now? Or do I have a point? I always felt it most important to handle all
errors a program may encounter gracefully, and the easier this is to do,
the less likely it is a programmer will just sneak around the issue and
let the interpreter/run time system/operating system handle it. (And yes,
I'm guilty of not obeying it myself, as it can double or triple the time
needed to write the whole program; just because it's so cumbersome.)

There's an old rule-of-thumb, which may come from "The Mythical
Man-Month", still quite a worthwhile read though probably at least 30
years old now. Or it may not. It states that you can expect to spend
three times as much effort producing a program product (something to be
delivered to end-users) as producing a program (something you plan to
use yourself, and write accordingly); and that a further factor of three
is required to produce a programmed system product (a collection of
programs which work together as a system and will be delivered to
end-users) over just producing the program products individually.

This combined factor of nine is often referred to as "engineering
effort", and includes

a) The creative exercise of imagination sufficient to anticipate the
usual and unusual failure cases;
b) The creative exercise of programming skill sufficient to ensure that
the failure cases still result in acceptable system behavior; and
c) The creative exercise of political skill sufficient to persuade a
reluctant management that steps a) and b) are worth paying for.

The combination of all three components is to be found in beasts
sometimes known as "software engineers", frequently held by some to be
mythical.

If I were feeling cynical, I might sum this up by saying "Python is a
programming language, not a f***ing magic wand". But that won't stop
people from looking for the silver bullet that solves all their problems
in a single line of code.

Hope this helps, and doesn't come across as critical. Your questions are
reasonable, and show a sincere appreciation of the difficulties of
producing high-quality software. And we don't ever want anything else,
do we?

regards
Steve

Jul 18 '05 #2

Jeremy Jones

Hans-Joachim Widmaier wrote:

Hi all.

Handling files is an extremely frequent task in programming, so most
programming languages have an abstraction of the basic files offered by
the underlying operating system. This is indeed also true for our language
of choice, Python. Its file type allows some extraordinarily convenient
access like:

for line in open("blah"):
    handle_line(line)

While this is very handy for interactive usage or throw-away scripts, I'd
consider it a serious bug in "production quality" software.
I disagree. If you get an exception, you get it for a reason. I'll try
to elaborate more below.
Tracebacks
are fine for programmers, but end users really never should see any.

Bushwa. End users may or may not be able to make sense of them, but it
doesn't mean they _should_ never see them.
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file".
So, you're saying that dumping a raw traceback like:
IOError: [Errno 2] No such file or directory: '/foo/bar/bam'
to a logfile is a no-no? Instead, it should say:

I'm sorry sir, but an error occurred when trying to write to file
/foo/bar/bam because it wasn't there.

I think the traceback is perfectly understandable. I think that even an
end-user would be able to comprehend that type of message. Or, if you
get an IOError, is this not sufficient:
IOError: [Errno 28] No space left on device
?

Chances are, end users aren't going to be particularly concerned with
exceptions you log - unless they've got a problem that they can't figure
out. And if they've got a problem they can't figure out, you'd be
better off giving them as much information as you can give them, or
they'll come to you for help. And when they do come to you for help,
you'd better make sure you've given yourself the most information you
can to solve the problem. So logging a traceback is a great idea IMHO.
Now, in areas where you're dead sure that an exception is nothing to be
concerned with, don't bother. So, a good approach may be: handle the
specific exceptions that you know may occur, let other unexpected (or
expected in worst case scenarios) exceptions filter up to a higher
level, log them there, and if need be, terminate program execution.
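A sketch of that split under assumed requirements: a hypothetical `read_settings()` treats a missing file as the normal "use the defaults" case and handles only `FileNotFoundError`. Every other failure is left to propagate to a caller that can log it or stop the program.

```python
import logging

log = logging.getLogger(__name__)


def read_settings(path):
    """Return key=value pairs from `path`, or {} if the file does not exist.

    FileNotFoundError is an expected, harmless case and is handled here.
    Everything else (PermissionError, IsADirectoryError, UnicodeDecodeError,
    plain bugs) is deliberately left to a higher level.
    """
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        log.info("no settings file at %s; using defaults", path)
        return {}
    settings = {}
    for line in lines:
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings
```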
Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.

Maybe so. But if you hit an "Oh, crap, what do I do now?" exception,
you may want to throw up a dialog box with a traceback or something and
when the user clicks OK on it, terminate program execution. That gives
them a (slim) chance to figure out what they can do to remedy the
situation, and otherwise to call for help.
Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D

Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more. Python, being a very high level language, needs a
higher level file type, IMHO. This is, of course, much easier said than
done. And renowned dimwits like me aren't expected to come up with solutions.
I've thought about subclassing file, but to me it looks like it wouldn't
help much. With all this try/except framing you need to insert a call
level anyway (wondering if this new decorator stuff might help?). The best
I've come up with so far is a vague idea for an error callback (if there isn't
one, the well known exceptions might be raised) that gets called for
whatever error occurred, like:

class File:
    ...
    def write(self, data):
        while True:
            try:
                self._write(data)
            except IOError, e:
                if self.errorcallback:
                    ret, dat = self.errorcallback(self, F_WRITE, e, data)
                    if ret == F_RETURN:
                        return dat
                else:
                    raise

The callback could then write a nice error message, abort the program,
maybe retry the operation (that's what the 'while True'-loop is for) or
return whatever value to the original caller. Although the callback
function will usually be more than a few lines, it can be reused. It can
even be packed into your own file-error-handling module, something that is
not possible with the original usage pattern.

Hmmm....interesting. Shouldn't you put a break after your
self._write(data)? This is probably not a bad way of going about
things, but what types of files are we talking about here? Log files?
I think you're probably better off using the builtin logging and just
dump raw tracebacks in there. Data files? Then you've probably got that
wrapped in code to write formatted data to the data file anyway, in which
case this type of specialized class is probably not a bad thing. If
you're trying to write data to a data file, you don't want to litter it
with error messages. You want to log it and, maybe even unlink the data
file and do something special.
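For data files, one common way to get the "unlink it and do something special" behaviour is to write to a temporary file and rename it into place only after every write has succeeded, so a failure never leaves a half-written file behind. This write-and-rename technique is a well-known pattern rather than something proposed in the thread, and `write_data_file` is a hypothetical name.

```python
import os
import tempfile


def write_data_file(path, lines):
    """Write `lines` to `path` without ever leaving a half-written file.

    The data goes to a temporary file in the same directory first and is
    renamed over `path` only when everything has been written. On any error
    the temporary file is removed and the exception propagates.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(line)
        os.replace(tmp, path)  # atomic on POSIX within one file system
    except BaseException:
        os.unlink(tmp)
        raise
```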
If you still bear with me, you might as well sacrifice a few more seconds
and tell me what you think about my rant. Is everything just fine as it is
now? Or do I have a point? I always felt it most important to handle all
errors a program may encounter gracefully, and the easier this is to do,
the less likely it is a programmer will just sneak around the issue and
let the interpreter/run time system/operating system handle it. (And yes,
I'm guilty of not obeying it myself, as it can double or triple the time
needed to write the whole program; just because it's so cumbersome.)
I dunno - something just doesn't feel right here. I kinda feel like
you're wanting to create an over-generalized solution. Your File class
is interesting and may be a good start for a lot of general solutions
and having a callback mechanism helps specialize it, but....something
just doesn't sit totally right here with me. This may work totally
perfectly and may be an excellent piece of code to handle all of your
file writing activities. I dunno....

You're not going to be able to catch every exception - not meaningfully,
anyway. You could do something like:

if __name__ == "__main__":
    try:
        main()
    except Exception, e:
        log(e)

But that isn't handling all errors....
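A Python 3 sketch of such a top-level wrapper that serves both audiences: `logging.exception()` writes the full traceback to a log file, and the user sees one line. `run_main` and the default log file name `app.log` are assumptions for illustration.

```python
import logging
import sys


def run_main(main, logfile="app.log"):
    """Call main(); on an unexpected exception, log the full traceback to
    `logfile` and show the user a one-line message instead.

    Returns a process exit status: 0 on success, 1 on failure.
    """
    logger = logging.getLogger("run_main")
    handler = logging.FileHandler(logfile)
    logger.addHandler(handler)
    try:
        main()
        return 0
    except Exception as e:
        # logging.exception() records the message plus the current traceback.
        logger.exception("unhandled exception in main()")
        print(f"Error: {e} (details written to {logfile})", file=sys.stderr)
        return 1
    finally:
        logger.removeHandler(handler)
        handler.close()
```

In a script, the last lines would be `if __name__ == "__main__": sys.exit(run_main(main))`.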

Production quality code doesn't necessarily mean never terminating
because of an exception. You want to reduce the frequency of program
termination due to exceptions. I can appreciate your desire to make
sure you've got good solid software, and not encumber the end user with
every little exception you hit, but sometimes it's OK to log/show
exceptions. Like I said earlier, when you hit an exception, you hit it
for a reason. Do your best to try to figure out what that reason is,
deal with it, figure out the most reasonable thing to do with _that_
exception, and move on. Sometimes that'll mean throwing a traceback to
a log file, sometimes it will mean handling it gracefully and "prettying
up" the message for logging or display to the end user, sometimes it
will mean totally ignoring it, other times you may need to just stop the
program. All of these resolutions can be part of a production quality
piece of software. The discerning programmer has to decide which
solution is appropriate for which situation. Like Steve Holden
mentioned, it's really good that you're concerned with such things, but
make sure you apply common sense to each scenario.
Waiting-for-you-to-jump-on-me'ly yours,
Hans-Joachim

Hope I didn't jump too hard.

Jeremy Jones

Jul 18 '05 #3

Hans-Joachim Widmaier

On Thu, 30 Sep 2004 10:37:49 -0400, Steve Holden wrote:

I agree we really shouldn't expect users to have to see tracebacks, but
that doesn't mean that exception handling has to be sophisticated.

Here's something I'd consider acceptable, which doesn't add hugely to
the programming overhead for multiple files but doesn't burden the user
with horrible tracebacks. I've used it to print itself, so you see how
it works and what it contains all in the same output:

sholden@DELLBOY ~
$ ./ft.py one.py ft.py
Problem handling file one.py : [Errno 2] No such file or directory: 'one.py'
Yes, this should be acceptable for the simple scripts I mentioned.
#!/usr/bin/python
#
# ft.py: simple multi-file processor with error handling
#
import sys

files = sys.argv[1:]

for f in files:
    try:
        for l in file(f):
            sys.stdout.write(l)
    except Exception, reason:
        print >> sys.stderr, "Problem handling file", f, ":", reason
Looks very familiar, but doesn't handle '-' (stdin) (but that's another
story, albeit somewhat related). ;-)
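One way to handle the '-' convention, sketched in Python 3: a hypothetical `open_input()` returns standard input wrapped in `contextlib.nullcontext` (Python 3.7 and later), so the same `with` statement works for both cases without closing `sys.stdin` at the end of the block. The `cat` helper is likewise illustrative.

```python
import sys
from contextlib import nullcontext


def open_input(name):
    """Open `name` for reading; "-" means standard input.

    sys.stdin is wrapped in nullcontext so that leaving the with-block
    does not close it.
    """
    if name == "-":
        return nullcontext(sys.stdin)
    return open(name)


def cat(names, out=None):
    """Copy the named inputs to `out` (default: stdout), reporting errors."""
    if out is None:
        out = sys.stdout
    for name in names or ["-"]:  # no arguments: read stdin, like cat(1)
        try:
            with open_input(name) as f:
                for line in f:
                    out.write(line)
        except OSError as reason:
            print("Problem handling file", name, ":", reason, file=sys.stderr)
```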
I'm quite happy to let the process-termination housekeeping code, or
perhaps (in some implementations) the Python housekeeping at garbage
collection, close the file, which you might think is unduly sloppy. What
can I say, the user pays if they don't want sloppy :-). But I'd consider
this sufficiently close to "production quality" to be delivered to
end-users. Clearly you can add file assignment to a variable and a
try/finally to ensure it's closed. You gets what you pays for.
I often do that, too. But every time I do it, I feel guilty. (I did my
first somewhat more serious programming on an Amiga, where the OS did
_not_ clean up after you - which formed a habit of religiously keeping
track of and freeing each and every resource in each and every case.)
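In Python 3, `contextlib.ExitStack` automates that kind of bookkeeping when several files are open at once: every file opened so far is closed, in reverse order, whether the block finishes normally or an `open()` or `write()` fails halfway through. A sketch, with `merge_files` as a hypothetical example:

```python
from contextlib import ExitStack


def merge_files(input_names, output_name):
    """Concatenate several text files into one output file."""
    with ExitStack() as stack:
        # If, say, the third open() fails, the first two files are still closed.
        inputs = [stack.enter_context(open(name)) for name in input_names]
        out = stack.enter_context(open(output_name, "w"))
        for f in inputs:
            out.write(f.read())
```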
Well, the more complex your processing gets the more complex your
error-handling gets too,
This doesn't come as a surprise. :-) Yet I can see a class of basically
simple programs that are nonetheless meant to be of "production quality"
and where _most_ of the complexity stems from handling these errors.
but I'd say you should look at some serious refactoring here - you
appear to have what's sometimes called a "code smell" in extreme
programming circles. See

http://c2.com/cgi/wiki/?CodeSmell
Never heard of that before. ;-) Anyway, maybe my understanding of
'refactoring' is wrong, but isn't my desire to 'factor out' the error
handling so I can make it rigid/good and reusable (a good incentive for
me!) what might be called refactoring? Cluttering every script with the
same error handling pattern over and over does have a smell to me. These
things are just too common not to be solved once and hidden away in, e.g.,
a module.

The problem that any such approach is likely to have can be summed up as
"If processing is complicated then error-handling may also become
complicated, and error-recovery even more so". You shouldn't expect it
to be too simple, but if it's too complex then you might find that a
restructuring of your program will yield better results.
I do not think it'd be too simple. I think I can (mostly) cope with the
complexity. I'm disappointed because I haven't yet found a nice solution
where I can _hide_ this complexity and do not have to slap it in every
file that starts with '#!/usr/bin/env python'. Still dreaming of a File
object that does all this, maybe with a helper module "fileerrhandler".
Granted, the name is ugly. ;-)
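A rough sketch of what such a helper module might contain, borrowing the module name above purely for illustration. It provides a context manager that turns any `OSError` raised inside it into a `FileError` with a plain-language message, and it keeps the original exception chained for logging. The error-number table, the wording and all names are assumptions.

```python
# fileerrhandler.py: hypothetical helper module (sketch only)
import errno
from contextlib import contextmanager

# Plain-language reasons for a few common error numbers (not exhaustive).
_REASONS = {
    errno.ENOENT: "there is no such file",
    errno.EACCES: "you do not have permission to use it",
    errno.EISDIR: "it is a directory, not a file",
    errno.ENOSPC: "the disk is full",
}


class FileError(Exception):
    """An OSError restated in terms an end user can understand."""


@contextmanager
def user_friendly_errors(filename):
    """Re-raise OSError from the with-block as FileError.

    The original exception is kept as __cause__ so it can still be logged.
    """
    try:
        yield
    except OSError as e:
        reason = _REASONS.get(e.errno, e.strerror or str(e))
        raise FileError(f"There was a problem with '{filename}': {reason}.") from e
```

Each file access would then be wrapped in `with user_friendly_errors(name): ...`, and the program needs only a single `except FileError as e: print(e)` at the top level.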

["The Mythical Man-Month"]
The combination of all three components is to be found in beasts
sometimes known as "software engineers", frequently held by some to be
mythical.
Now this is a title that can never be applied to me. I'm basically a
hardware designer, mostly writing embedded programs and doing system
administration by now. And I must admit that I never really *learned*
programming. I'm not in the same league as most others here, so I'm always
uncertain whether my gut feeling (nose tingling?) is correct or totally
off track.
If I were feeling cynical, I might sum this up by saying "Python is a
programming language, not a f***ing magic wand". But that won't stop
people from looking for the silver bullet that solves all their problems
in a single line of code.
I daresay that if 'sprintf()', 'gets()' and the like never existed, there
would be a lot fewer CERT advisories. Or, if a file type enforced sane
error handling, there would be fewer tracebacks. :-)
Hope this helps, and doesn't come across as critical. Your questions are
reasonable, and show a sincere appreciation of the difficulties of
producing high-quality software. And we don't ever want anything else,
do we?

I'm very grateful for your detailed reply. clp (comp.lang.python) always strikes me as being
a place where you don't get smart-ass replies but elaborate answers that
sometimes look like articles in a classy magazine.

Thank you very much for your time and suggestions,
Hans-Joachim

Jul 18 '05 #4

Steve Holden

Jeremy Jones wrote:

Hans-Joachim Widmaier wrote:
[...]
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file".
So, you're saying that dumping a raw traceback like:
IOError: [Errno 2] No such file or directory: '/foo/bar/bam'
to a logfile is a no-no? Instead, it should say:

I'm sorry sir, but an error occurred when trying to write to file
/foo/bar/bam because it wasn't there.

A traceback is not an error message. When a non-programming user sees a
fifteen-line stack trace with python statements and line numbers in it
that's quite enough to stop most of them from even reading it to see if
there's anything they understand at all.
I think the traceback is perfectly understandable. I think that even an
end-user would be able to comprehend that type of message. Or, if you
get an IOError, is this not sufficient:
IOError: [Errno 28] No space left on device
?
Again, that's not a traceback. It's an error message.
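The distinction can be seen with the standard `traceback` module: `format_exception_only()` produces just the final one-line message, while `format_exc()` returns the whole stack trace with file names and line numbers. The sketch reuses the path from the example above and assumes it does not exist. Under Python 3 the message names `FileNotFoundError`, a subclass of `OSError`, rather than `IOError`. `read_config` is a hypothetical function.

```python
import traceback


def read_config():
    # Hypothetical function; the path is assumed not to exist.
    with open("/foo/bar/bam") as f:
        return f.read()


try:
    read_config()
except OSError as exc:
    # Only the last line: the kind of error message an end user can read.
    message = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    # The full traceback, useful in a log file or a bug report.
    details = traceback.format_exc()

print(message)
print(details)
```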
Chances are, end users aren't going to be particularly concerned with
exceptions you log - unless they've got a problem that they can't figure
out. And if they've got a problem they can't figure out, you'd be
better off giving them as much information as you can give them, or
they'll come to you for help. And when they do come to you for help,
you'd better make sure you've given yourself the most information you
can to solve the problem. So logging a traceback is a great idea IMHO.
Now, in areas where you're dead sure that an exception is nothing to be
concerned with, don't bother. So, a good approach may be: handle the
specific exceptions that you know may occur, let other unexpected (or
expected in worst case scenarios) exceptions filter up to a higher
level, log them there, and if need be, terminate program execution.
This isn't about not terminating the program, it's about reporting the
reasons in a manner acceptable to average users.

Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.

Maybe so. But if you hit an "Oh, crap, what do I do now?" exception,
you may want to throw up a dialog box with a traceback or something and
when the user clicks OK on it, terminate program execution. That gives
them a (slim) chance to figure out what they can do to remedy the
situation, and otherwise to call for help.

I'm all for LOGGING tracebacks. Indeed WingIDE is a beauty in this
respect, since it's also prepared to send feedback to Wing if you ask it
to, as are Mozilla and (nowadays) Internet Explorer.

Given this, there's little excuse for showing the traceback in the
regular case, though I don't object to allowing users to look for it if
they want.
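One way to get this "log it, but don't show it" behaviour without wrapping every entry point is to replace `sys.excepthook`, which Python calls for any exception that reaches the top level uncaught. In the sketch below, the full traceback still lands in a file the user can look at if they want. The factory name `make_excepthook` and the message wording are assumptions.

```python
import sys
import traceback


def make_excepthook(logfile):
    """Return a sys.excepthook replacement that appends the full traceback
    to `logfile` and shows the user a short message instead."""

    def hook(exc_type, exc_value, exc_tb):
        if issubclass(exc_type, KeyboardInterrupt):
            # Leave Ctrl-C with the interpreter's default behaviour.
            sys.__excepthook__(exc_type, exc_value, exc_tb)
            return
        with open(logfile, "a") as log:
            traceback.print_exception(exc_type, exc_value, exc_tb, file=log)
        print(f"Sorry, the program stopped because of an error: {exc_value}\n"
              f"Technical details were saved to {logfile}.", file=sys.stderr)

    return hook
```

It would be installed once at start-up, e.g. `sys.excepthook = make_excepthook("crash.log")`.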

Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D
Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more. Python, being a very high level language, needs a
higher level file type, IMHO. This is, of course, much easier said than
done. And renowned dimwits like me aren't expected to come up with
solutions.
I've thought about subclassing file, but to me it looks like it wouldn't
help much. With all this try/except framing you need to insert a call
level anyway (wondering if this new decorator stuff might help?). The best
I've come up with so far is a vague idea for an error callback (if there isn't
one, the well known exceptions might be raised) that gets called for
whatever error occurred, like:

class File:
    ...
    def write(self, data):
        while True:
            try:
                self._write(data)
            except IOError, e:
                if self.errorcallback:
                    ret, dat = self.errorcallback(self, F_WRITE, e, data)
                    if ret == F_RETURN:
                        return dat
                else:
                    raise

The callback could then write a nice error message, abort the program,
maybe retry the operation (that's what the 'while True'-loop is for) or
return whatever value to the original caller. Although the callback
function will usually be more than a few lines, it can be reused. It can
even be packed into your own file-error-handling module, something that is
not possible with the original usage pattern.

Hmmm....interesting. Shouldn't you put a break after your
self._write(data)? This is probably not a bad way of going about
things, but what types of files are we talking about here? Log files?
I think you're probably better off using the builtin logging and just
dump raw tracebacks in there. Data files? Then you've probably got that
wrapped in code to write formatted data to the data file anyway, in which
case this type of specialized class is probably not a bad thing. If
you're trying to write data to a data file, you don't want to litter it
with error messages. You want to log it and, maybe even unlink the data
file and do something special.
If you still bear with me, you might as well sacrifice a few more seconds
and tell me what you think about my rant. Is everything just fine as it is
now? Or do I have a point? I always felt it most important to handle all
errors a program may encounter gracefully, and the easier this is to do,
the less likely it is a programmer will just sneak around the issue and
let the interpreter/run time system/operating system handle it. (And yes,
I'm guilty of not obeying it myself, as it can double or triple the time
needed to write the whole program; just because it's so cumbersome.)

I dunno - something just doesn't feel right here. I kinda feel like
you're wanting to create an over-generalized solution. Your File class
is interesting and may be a good start for a lot of general solutions
and having a callback mechanism helps specialize it, but....something
just doesn't sit totally right here with me. This may work totally
perfectly and may be an excellent piece of code to handle all of your
file writing activities. I dunno....

You're not going to be able to catch every exception - not meaningfully,
anyway. You could do something like:

if __name__ == "__main__":
    try:
        main()
    except Exception, e:
        log(e)

But that isn't handling all errors....

It's certainly catching all subclasses of Exception, though, which in
modern Python should be everything not handled inside (string exceptions
are a throwback, retained for compatibility reasons). As to whether they
are being "handled", I guess that's a matter of opinion.
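Python has changed here since this thread. From Python 2.5 (PEP 352) on, all exceptions derive from `BaseException`, and `KeyboardInterrupt` and `SystemExit` derive from it directly rather than from `Exception`. Python 3 has no string exceptions at all. A top-level `except Exception` therefore still lets Ctrl-C and `sys.exit()` end the program. A small check, where `caught_by_except_exception` is a hypothetical helper:

```python
def caught_by_except_exception(exc):
    """Report whether a bare `except Exception` clause would catch `exc`."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        return False


for exc in (OSError("disk error"), ValueError("bad value"),
            KeyboardInterrupt(), SystemExit(1)):
    print(f"{type(exc).__name__:17} {caught_by_except_exception(exc)}")
```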
Production quality code doesn't necessarily mean never terminating
because of an exception. You want to reduce the frequency of program
termination due to exceptions. I can appreciate your desire to make
sure you've got good solid software, and not encumber the end user with
every little exception you hit, but sometimes it's OK to log/show
exceptions. Like I said earlier, when you hit an exception, you hit it
for a reason. Do your best to try to figure out what that reason is,
deal with it, figure out the most reasonable thing to do with _that_
exception, and move on. Sometimes that'll mean throwing a traceback to
a log file, sometimes it will mean handling it gracefully and "prettying
up" the message for logging or display to the end user, sometimes it
will mean totally ignoring it, other times you may need to just stop the
program. All of these resolutions can be part of a production quality
piece of software. The discerning programmer has to decide which
solution is appropriate for which situation. Like Steve Holden
mentioned, it's really good that you're concerned with such things, but
make sure you apply common sense to each scenario.
Waiting-for-you-to-jump-on-me'ly yours,
Hans-Joachim

Hope I didn't jump too hard.

Possibly.
Jeremy Jones

regards
Steve

Jul 18 '05 #5

Steve Holden

Jeremy Jones wrote:

Hopefully, I'm not belaboring this.
[...belabors issue at great length :-) ...]

I hope that I haven't come across as antagonistic as I fear I may have.
I greatly respect you, Steve. You're a great writer and a valuable
contributor to this group. I don't think we're too far off from what
the other is saying. I don't even think I'm too far off from the spirit
of what the OP was trying to say. I just had a problem with what I
perceived the OP to be saying (WRT generalization of a solution) and
didn't care for how it came across. I feel better after getting that
out. Hope we can still be friends :-)
Well I certainly don't normally take offense when someone promotes a
defensible point of view in a reasonable manner, and I don't feel
inclined to make an exception (no pun intended) in your case.

I don't think our points of view are that far apart either, since we
both appear (like the majority of the Python community) to prefer
pragmatism to blindly following rigid rules.
Mr. Holden, I remain sincerely and respectfully yours,

Jeremy Jones

I think that probably the OP wasn't criticizing Python in the way you
believed, but in the final analysis Python is a programming language,
and it can't take offense. So I try not to take offense on its behalf.
And thank you for your compliments. I'm glad you feel better!

regards
Steve
--
XXX Please note recent change of email address

Jul 18 '05 #6

Jeremy Jones

Hopefully, I'm not belaboring this. Just a few questions that need to
be addressed in the development of any application:
1. Who is the user?
2. What is the application? (duh)
3. What problems/errors/exceptions are going to occur and what do you
do with them when they do? (this is highly contingent upon the answers
to the first two questions)

We'll get back to this.
Steve Holden wrote:

Jeremy Jones wrote:
Hans-Joachim Widmaier wrote:
[...]
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file".

So, you're saying that dumping a raw traceback like:
IOError: [Errno 2] No such file or directory: '/foo/bar/bam'
to a logfile is a no-no? Instead, it should say:

I'm sorry sir, but an error occurred when trying to write to file
/foo/bar/bam because it wasn't there.

A traceback is not an error message. When a non-programming user sees
a fifteen-line stack trace with Python statements and line numbers in
it, that's quite enough to stop most of them from even reading it to
see if there's anything they understand at all.

You're partially right. Not all tracebacks are error messages. Not all
error messages are tracebacks. A traceback is what you get when an
exception goes uncaught; an error can happen without any traceback
resulting. In the OP's example of a mistyped filename, you could catch
that error before you try to open the file. That may not be the best way.
I kinda like relying on exceptions to help guide the flow of a program.
Still, the two are different, though they can overlap. Since the OP was
concerned with exceptions that may happen during opening and writing to
files, I focused on that.
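The two options Jeremy contrasts (checking for the file before opening it, versus just opening it and handling the exception) can be sketched in Python 3 as follows. The helper names `read_lines_lbyl` and `read_lines_eafp` are hypothetical. The check-first version still needs a `try` in real use, because the file can vanish or turn out to be unreadable between the check and the `open()`; the exception-based version handles all of those cases in one place.

```python
import os


def read_lines_lbyl(path):
    """Look before you leap: check first, then open (hypothetical helper)."""
    if not os.path.isfile(path):
        return None, "I could not open %r because there is no such file" % path
    # The file may still disappear or be unreadable between the check and
    # here, so this unguarded open() can still raise OSError.
    with open(path) as f:
        return f.readlines(), None


def read_lines_eafp(path):
    """Easier to ask forgiveness: just try, and translate the failure."""
    try:
        with open(path) as f:
            return f.readlines(), None
    except FileNotFoundError:
        return None, "I could not open %r because there is no such file" % path
    except OSError as e:
        # Permission denied, is a directory, etc.: reuse the OS wording.
        return None, "I could not open %r: %s" % (path, e.strerror)
```

Both return a `(lines, error_message)` pair so the caller decides how to show the message.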

Here we begin to answer the questions I posed above. Who is the user?
BTW - I didn't say it was a good idea in every case to always throw a
traceback in the user's face, nor do I think it's always advisable to
necessarily throw an error message in the user's face when an error
occurs. What I did say was to use common sense in every case. If you
are developing an application for a more technical segment of the
population, you may very well be justified in displaying an error
message at times *when common sense dictates*. Even for the technical
crowd, don't splat them with a stack trace just because you can and they
may be able to decipher it. If you're developing for the more general
public, use common sense there, too, and give them what may help them
solve the problem themselves. That's a good goal, eh? Let the user
solve their own problems to the extent that they can and ask for help
from the developer only when they absolutely need it. I think this is
the laziness virtue of a good programmer.

Another question to look at here is, what error occurred? Again, if the
user mistyped a filename, you should be able to gently tell the user
that the file they specified was not found and not bother them
(technical or not) with a traceback. If you lost the filesystem that
your data files live in.....well....you're pretty outa luck there.
A stack trace may not hurt anything there, depending on your audience.

I think the traceback is perfectly understandable. I think that even
an end-user would be able to comprehend that type of message. Or, if
you get an IOError, is this not sufficient:
IOError: [Errno 28] No space left on device
?
Again, that's not a traceback. It's an error message.
Chances are, end users aren't going to be particularly concerned with
exceptions you log - unless they've got a problem that they can't
figure out. And if they've got a problem they can't figure out,
you'd be better off giving them as much information as you can give
them, or they'll come to you for help. And when they do come to you
for help, you'd better make sure you've given yourself the most
information you can to solve the problem. So logging a traceback is
a great idea IMHO. Now, in areas where you're dead sure that an
exception is nothing to be concerned with, don't bother. So, a good
approach may be: handle the specific exceptions that you know may
occur, let other unexpected (or expected in worst case scenarios)
exceptions filter up to a higher level, log them there, and if need
be, terminate program execution.
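The layering suggested here (handle the expected errors where they occur, let the rest bubble up, log them at the top, and stop if need be) might look like this as a minimal Python 3 sketch. The names `load_config`, `main`, `run` and the logger name are hypothetical; `logging.exception` is the standard `logging` call that records the message together with the current traceback.

```python
import logging
import sys

log = logging.getLogger("myapp")  # hypothetical application logger name


def load_config(path):
    """An expected failure, handled where it happens."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        # A missing or unreadable config file is the user's business:
        # say so plainly and fall back to defaults.
        print("%s: %s (using defaults)" % (path, e.strerror), file=sys.stderr)
        return ""


def main(argv):
    path = argv[1] if len(argv) > 1 else "app.cfg"
    config = load_config(path)
    # ... real work with `config` would go here; anything unexpected is
    # deliberately not caught and propagates up to run() ...
    return 0


def run(entry, argv):
    """Top level: log unexpected errors with their traceback, then stop."""
    try:
        return entry(argv)
    except Exception:  # KeyboardInterrupt and SystemExit are not caught
        log.exception("unhandled error")  # full traceback goes to the log
        print("Internal error; details were written to the log.", file=sys.stderr)
        return 1


if __name__ == "__main__":
    logging.basicConfig(filename="myapp.log", level=logging.INFO)
    sys.exit(run(main, sys.argv))
```

Passing the entry point into `run` keeps the top-level handler reusable across programs.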

This isn't about not terminating the program, it's about reporting the
reasons in a manner acceptable to average users.

?? Did you take from what I said that I wasn't concerned with reporting
the reasons back to the user in an acceptable fashion? That's part of
what I said above with "handle the specific exceptions that you know may
occur." A good developer should try to figure out what can break and
address it before it does (within the bounds of common sense, of
course). Again, WRT common sense, quoting myself from below:
"""
Do your best to try to figure out what that reason is, deal with it,
figure out the most reasonable thing to do with _that_ exception, and
move on. Sometimes that'll mean throwing a traceback to a log file,
sometimes it will mean handling it gracefully and "prettying up" the
message for logging or display to the end user, sometimes it will mean
totally ignoring it, other times you may need to just stop the program.
"""

Honestly, Steve, I know I may not be the sharpest knife in the drawer,
but I'm getting the impression you're trying to make me out to be dumber
than I am....

I'm not saying that we need to just spew tracebacks everywhere. I'm
saying you need to handle them the best you can. _Sometimes_ that means
you know that an exception may possibly occur and the best thing you can
do is just let it hit the fan.

Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.

Maybe so. But if you hit an "Oh, crap, what do I do now?" exception,
you may want to throw up a dialog box with a traceback or something
and when the user clicks OK on it, terminate program execution. That
gives them a chance to (unlikely) figure out what they can do to
remedy the situation, otherwise call for help.

I'm all for LOGGING tracebacks. Indeed WingIDE is a beauty in this
respect, since it's also prepared to send feedback to Wing if you ask
it to, as are Mozilla and (nowadays) Internet Explorer.

Given this, there's little excuse for showing the traceback in the
regular case, though I don't object to allowing users to look for it
if they want.
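Logging the traceback while showing users only a short line (and telling them where the details are) can be done without wrapping every entry point, by replacing `sys.excepthook`. A minimal sketch: `make_excepthook` and the log file name are hypothetical, while `sys.excepthook` and `traceback.format_exception` are standard library.

```python
import sys
import traceback


def make_excepthook(log_path, stream=None):
    """Return a hook that logs the full traceback and shows one short line."""

    def hook(exc_type, exc_value, exc_tb):
        # Append the complete traceback to the log file for the developer.
        with open(log_path, "a") as log:
            log.writelines(traceback.format_exception(exc_type, exc_value, exc_tb))
        # Give the user just the error text, plus where to find the details.
        out = stream if stream is not None else sys.stderr
        print("Error: %s (details in %s)" % (exc_value, log_path), file=out)

    return hook


if __name__ == "__main__":
    sys.excepthook = make_excepthook("myapp-errors.log")
    open("blaah")  # uncaught: the hook runs instead of the default traceback
```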

Tell that to the anal-retentive security guy who's got the application
server totally cordoned off from the internet so that's an
impossibility. I know, you said, *regular case.* All I'm saying is, be
discerning. Know the user and what kind of error you may be getting.
Provide the information that the user and developer may need to solve
the problem that occurred. If that means sending it back via a feedback
agent, great. If that's dumping a stack trace to a log file, great.
Sometimes, if that means showing the exception to the user, great.

Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D

Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more. Python, being a very high level language, needs a
higher level file type, IMHO. This is, of course, much easier said than
done. And renowned dimwits like me aren't expected to come up with
solutions. I've thought about subclassing file, but to me it looks like it
wouldn't help much. With all this try/except framing you need to insert a
call level anyway (wondering if this new decorator stuff might help?). The
best I've come up with so far is a vague idea for an error callback (if
there isn't one, the well known exceptions might be raised) that gets
called for whatever error occurred, like:

class File:
    ...
    def write(self, data):
        while True:
            try:
                self._write(data)
            except IOError as e:
                if self.errorcallback:
                    ret, dat = self.errorcallback(self, F_WRITE, e,
                                                  data)
                    if ret == F_RETURN:
                        return dat
                else:
                    raise

The callback could then write a nice error message, abort the program,
maybe retry the operation (that's what the 'while True'-loop is for) or
return whatever value to the original caller. Although the callback
function will usually be more than a few lines, it can be reused. It can
even be packed into your own file-error-handling module, something you
can't do with the original usage pattern.
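The OP wonders in passing whether the decorator syntax (new in Python 2.4) might help. One possible shape, as a hedged Python 3 sketch rather than any established recipe: a hypothetical `friendly_file_errors` decorator that turns an `OSError` raised anywhere inside the decorated function into one readable message and a fallback return value, so the try/except framing is written once and reused.

```python
import functools
import sys


def friendly_file_errors(default=None, stream=None):
    """Hypothetical decorator: report an OSError in plain words, return default.

    The try/except framing lives here once instead of around every call.
    """

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except OSError as e:
                out = stream if stream is not None else sys.stderr
                name = e.filename if e.filename is not None else func.__name__
                reason = e.strerror or str(e)
                print("I could not access %r: %s" % (name, reason), file=out)
                return default

        return wrapper

    return decorate


@friendly_file_errors(default=None)
def read_lines(path):
    """Read a file's lines; on failure, a message is printed and None returned."""
    with open(path) as f:
        return f.readlines()
```

One caveat: catching `OSError` this broadly also swallows non-file errors (sockets, subprocesses) raised inside the function, so decorated functions are best kept small.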

Hmmm....interesting. Shouldn't you put a break after your
self._write(data)? This is probably not a bad way of going about
things, but what types of files are we talking about here? Log
files? I think you're probably better off using the builtin logging
module and just dumping raw tracebacks in there. Data files? Then you've
probably got that wrapped in code to write formatted data to the data
file anyway, in which case this type of specialized class is probably
not a bad thing. If you're trying to write data to a data file, you
don't want to litter it with error messages. You want to log it and
maybe even unlink the data file and do something special.
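For comparison, here is one way the OP's callback idea could be made runnable in Python 3, including the `break` Jeremy asks about (without it, a successful write would be repeated forever). Everything here is hypothetical: the `F_RETRY`/`F_RETURN`/`F_RAISE` constants, the callback signature, and the choice to wrap a file object instead of subclassing. It is a sketch of the design under discussion, not an existing library.

```python
# Hypothetical constants: which operation failed, and what the callback wants.
F_WRITE = "write"
F_RETRY = "retry"    # try the same operation again
F_RETURN = "return"  # give the callback's value back to the caller
F_RAISE = "raise"    # give up and re-raise the original error


class File:
    """Wraps an open file object; I/O errors go through an optional callback."""

    def __init__(self, fileobj, errorcallback=None):
        self._f = fileobj
        self.errorcallback = errorcallback

    def _write(self, data):
        return self._f.write(data)

    def write(self, data):
        while True:
            try:
                result = self._write(data)
                break  # success: without this the loop would write forever
            except OSError as e:
                if self.errorcallback is None:
                    raise  # no callback: behave like an ordinary file
                action, value = self.errorcallback(self, F_WRITE, e, data)
                if action == F_RETURN:
                    return value
                if action != F_RETRY:
                    raise  # F_RAISE (or anything unexpected)
                # F_RETRY: go round the loop and attempt the write again
        return result
```

A callback written once (say, one that prints a message and returns `(F_RETURN, None)`) can then be shared between programs, which is the reuse the OP is after.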
If you're still bearing with me, you might as well sacrifice a few more
seconds and tell me what you think about my rant. Is everything just fine
as it is now? Or do I have a point? I always felt it most important to
handle all errors a program may encounter gracefully, and the easier this
is to do, the less likely it is a programmer will just sneak around the
issue and let the interpreter/run time system/operating system handle it.
(And yes, I'm guilty of not obeying it myself, as it can double or triple
the time needed to write the whole program, just because it's so
cumbersome.)

I dunno - something just doesn't feel right here. I kinda feel like
you're wanting to create an over-generalized solution. Your File
class is interesting and may be a good start for a lot of general
solutions and having a callback mechanism helps specialize it,
but....something just doesn't sit totally right here with me. This
may work totally perfectly and may be an excellent piece of code to
handle all of your file writing activities. I dunno....

You're not going to be able to catch every exception - not
meaningfully, anyway. You could do something like:

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        log(e)

But that isn't handling all errors....

It's certainly catching all subclasses of Exception, though, which in
modern Python should be everything not handled inside (string
exceptions are a throwback, retained for compatibility reasons). As to
whether they are being "handled", I guess that's a matter of opinion.
Production quality code doesn't necessarily mean never terminating
because of an exception. You want to reduce the frequency of program
termination due to exceptions. I can appreciate your desire to make
sure you've got good solid software, and not encumber the end user
with every little exception you hit, but sometimes it's OK to log/show
exceptions. Like I said earlier, when you hit an exception, you hit
it for a reason. Do your best to try to figure out what that reason
is, deal with it, figure out the most reasonable thing to do with
_that_ exception, and move on. Sometimes that'll mean throwing a
traceback to a log file, sometimes it will mean handling it
gracefully and "prettying up" the message for logging or display to
the end user, sometimes it will mean totally ignoring it, other times
you may need to just stop the program. All of these resolutions can
be part of a production quality piece of software. The discerning
programmer has to decide which solution is appropriate for which
situation. Like Steve Holden mentioned, it's really good that you're
concerned with such things, but make sure you apply common sense to
each scenario.
Waiting-for-you-to-jump-on-me'ly yours,
Hans-Joachim

Hope I didn't jump too hard.

Possibly.

Yes, I may have jumped on him a bit hard. Here were my beefs:

1. The OP comes in in rant mode and almost states that Python is flawed
and inadequate to be released as production quality code - or that it is
impossible to write production quality code in Python because it is
flawed. I know that's not what he said, but that's the impression I got
from his overall tone and his comments:

While this is very handy for interactive usage or throw-away scripts, I'd
consider it a serious bug in a "production quality" software.

Python, being a very high level language, needs a
higher level file type, IMHO.

Honestly, they just kinda set me off a bit. I apologize to the OP if I
did jump a bit too hard, but when you waltz into c.l.p. and state what
it felt like he was saying, it's kind of like someone waltzing into my
living room and kicking my dog. It's just not going to go over too
well. I don't think Python needs a higher level file class. I still
think he's got a lot of good ideas in there. I think the thing that
didn't sit well with me earlier was that it came across sounding like,
"Python needs a new file class. You can't claim that your code is
production ready unless you do something similar to this." Which is a
nice segue into

2. It seemed to me that the OP was positing a solution to be used in a
broader context than necessary. Whereas I have stated over and over and
over that common sense must be applied to every situation and every
situation must be evaluated differently, I haven't gotten the same vibe
from the OP. Sometimes his solution may work just fine and dandy. May
even make a great open source project. I'm sure a lot of people would
use it. I might even use it some. I just got the impression that he
was speaking far too categorically for my tastes.

Jeremy Jones

regards
Steve

I hope that I haven't come across as antagonistic as I fear I may have.
I greatly respect you, Steve. You're a great writer and a valuable
contributor to this group. I don't think we're too far off from what
the other is saying. I don't even think I'm too far off from the spirit
of what the OP was trying to say. I just had a problem with what I
perceived the OP to be saying (WRT generalization of a solution) and
didn't care for how it came across. I feel better after getting that
out. Hope we can still be friends :-)

Mr. Holden, I remain sincerely and respectfully yours,

Jeremy Jones

Jul 18 '05 #7

Hans-Joachim Widmaier

On Thu, 30 Sep 2004 11:15:32 -0400, Jeremy Jones wrote:

Tracebacks
are fine for programmers, but end users really never should see any.

Bushwa. End users may or may not be able to make sense of them, but it
doesn't mean they _should_ never see them.

I see tracebacks as programmer errors. Now, a missing file isn't a
programmer error, but not handling it gracefully is.
I think the traceback is perfectly understandable. I think that even an
end-user would be able to comprehend that type of message. Or, if you
get an IOError, is this not sufficient: IOError: [Errno 28] No space
left on device ?
To programmers, yes. Or rather, no, as I tried to explain: even my
co-workers switch to 'your program crashed' mode when they're confronted
with a traceback. They never read that final line that tells them that
the sole problem was a typo on their part.
Chances are, end users aren't going to be particularly concerned with
exceptions you log - unless they've got a problem that they can't figure
out. And if they've got a problem they can't figure out, you'd be
better off giving them as much information as you can give them, or
they'll come to you for help. And when they do come to you for help,
you'd better make sure you've given yourself the most information you
can to solve the problem. So logging a traceback is a great idea IMHO.
Now, in areas where you're dead sure that an exception is nothing to be
concerned with, don't bother. So, a good approach may be: handle the
specific exceptions that you know may occur, let other unexpected (or
expected in worst case scenarios) exceptions filter up to a higher
level, log them there, and if need be, terminate program execution.
I don't want to _log_ exceptions, I want to handle them in a meaningful
way. Of course, if an end user gets a traceback because of a program
error, I'm more than happy with the traceback. This was unforeseen
(otherwise I'd have handled it).

I dunno - something just doesn't feel right here. I kinda feel like
you're wanting to create an over-generalized solution.
Yes, that's bugging me, too. I'm not all that happy with it myself,
otherwise I'd have started writing such a class in the meantime.
Hope I didn't jump too hard.

I can take it.

Thanks for the good advice,
Hans-Joachim

Jul 18 '05 #8

Jorgen Grahn

On Thu, 30 Sep 2004 11:15:32 -0400, Jeremy Jones <za******@bellsouth.net> wrote:

Hans-Joachim Widmaier wrote: ....
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file".

So, you're saying that dumping a raw traceback like:
IOError: [Errno 2] No such file or directory: '/foo/bar/bam'
to a logfile is a no-no? Instead, it should say:

I'm sorry sir, but an error occurred when trying to write to file
/foo/bar/bam because it wasn't there.

That approach also means different programs will explain the same problem
(failing to use a file provided by the user) in different terms. Bad!
I think the traceback is perfectly understandable. I think that even an
end-user would be able to comprehend that type of message.

I would prefer the normal Unix layout of the message:

/foo/bar/bam: No such file or directory

Seeing language internal things like "IOError" puts some people off.
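This layout is easy to produce because `OSError` (of which Python 3's `IOError` is an alias) carries `filename` and `strerror` attributes. A small shared helper, here the hypothetical `describe_os_error`, used by every program would also address the consistency point above. Many Unix tools additionally prefix their own name, as `cat` does with `cat: /foo/bar/bam: No such file or directory`; the `main` below follows that convention.

```python
import sys


def describe_os_error(e):
    """Format an OSError as 'path: reason' (hypothetical helper)."""
    reason = e.strerror or str(e)
    if e.filename is not None:
        return "%s: %s" % (e.filename, reason)
    return reason


def main(argv):
    if len(argv) != 2:
        print("usage: %s FILE" % argv[0], file=sys.stderr)
        return 2
    try:
        with open(argv[1]) as f:
            sys.stdout.write(f.read())
    except OSError as e:
        # Prefix the program name, as tools like cat do.
        print("%s: %s" % (argv[0], describe_os_error(e)), file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```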

/Jorgen

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!

Jul 18 '05 #9

Thorsten Kampe

* Hans-Joachim Widmaier (2004-09-30 15:56 +0200)

Handling files is an extremely frequent task in programming, so most
programming languages have an abstraction of the basic files offered by
the underlying operating system. This is indeed also true for our language
of choice, Python. Its file type allows some extraordinarily convenient
access like:

for line in open("blah"):
    handle_line(line)

While this is very handy for interactive usage or throw-away scripts, I'd
consider it a serious bug in a "production quality" software. Tracebacks
are fine for programmers, but end users really never should see any.
Correct. Set sys.tracebacklimit to zero (and leave it at the default
1000 for "--debug").
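Thorsten's suggestion as a minimal sketch; the `--debug` switch and the `configure_tracebacks` name are hypothetical. With `sys.tracebacklimit` set to 0, the interpreter's default handler should omit the stack frames and print only the final `ExceptionType: message` line; when the attribute is not set at all, Python behaves as if it were 1000.

```python
import sys


def configure_tracebacks(argv):
    """Show full tracebacks only when --debug is given (hypothetical flag)."""
    debug = "--debug" in argv
    # 0 hides the stack frames; 1000 matches what Python uses when the
    # attribute is not set at all.
    sys.tracebacklimit = 1000 if debug else 0
    return debug


if __name__ == "__main__":
    configure_tracebacks(sys.argv[1:])
    open("blaah")  # without --debug, only the final error line is printed
```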
Especially not when the error is not in the program itself, but rather
just a mistyped filename. (Most of my helper scripts that we use to
develop software handle files this way. And even my co-workers don't
recognize 'file or directory not found' for what it is.) End users are
entitled to error messages they can easily understand like "I could not
open 'blaah' because there is no such file". Graceful error handling is
even more important when a program isn't just run on a command line but
with a GUI.

Which means? Which means that all this convenient file handling that
Python offers really should not be used in programs you give away. When I
asked for a canonical file access pattern some months ago, this was the
result:
http://groups.google.com/groups?hl=d...uche%26meta%3D

Now I have some programs that read and write several files at once. And
the reading and writing is done all over the place. If I really wanted to
do it "right", my once clear and readily understandable code turns into a
nightmare. This doesn't look like the language I love for its clarity and
expressiveness any more.

I think the whole task is rather complicated. Python just passes the
error message of the operating system through without interpreting it.
Of course "no such file" is b***sh*t for an error message (because
it's even lacking a verb) but can Python by itself be more
sophisticated about these things than the OS? I don't think so.
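What a program can add is its own wording for the errors it expects, keyed on the standard `errno` codes that `OSError` carries, while falling back to the operating system's text for everything else. The message table below is a hypothetical example, not any standard mapping.

```python
import errno

# Hypothetical application wording for the errno values we expect to see.
MESSAGES = {
    errno.ENOENT: "There is no file named {name!r}.",
    errno.EACCES: "You are not allowed to open {name!r}.",
    errno.ENOSPC: "The disk is full; {name!r} could not be written.",
}


def explain(e):
    """Return a sentence for an OSError, preferring our own wording."""
    template = MESSAGES.get(e.errno)
    if template is not None:
        return template.format(name=e.filename)
    # Anything unexpected: fall back to the operating system's own text.
    return "Could not use {!r}: {}".format(e.filename, e.strerror or e)
```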

Think about "/foo/bar/baz/text.txt doesn't exist". There are multiple
possible reasons:

1. There is a directory /foo/bar/baz/ but no text.txt in it.
2. The whole path hierarchy is missing: there isn't even a /foo.
3. It was there, I know it because I wrote to the file, but now I cannot
access the directory anymore. Is the file deleted? The directory? Did
I lose access to the network where the share is actually located?
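A program cannot tell all of these apart either, but it can separate the first two after the fact by finding the deepest part of the path that does exist. A hypothetical sketch using `os.path`; case 3 (a vanished network share) simply looks like case 2 from inside the program.

```python
import os


def diagnose_missing(path):
    """Say which part of a path is missing (hypothetical helper)."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return "%s exists." % path
    parent = os.path.dirname(path)
    if os.path.isdir(parent):
        # Case 1: the directory is there, only the file is missing.
        return "Directory %s exists, but it has no entry %r." % (
            parent, os.path.basename(path))
    if os.path.exists(parent):
        return "%s exists, but it is not a directory." % parent
    # Case 2: walk up until the next level up exists (or we reach the root).
    missing = parent
    while True:
        up = os.path.dirname(missing)
        if up == missing or os.path.exists(up):
            break
        missing = up
    return "%s does not exist (so neither does %s)." % (missing, path)
```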

Now this is where the intelligence of the programmer comes in and they
try to guess. They say: "Sorry, I'm not sitting in front of the
machine, but this error message could mean this." And often they are
ridiculously wrong.

If anyone ever made some kind of "error matrix" - meaning "if errors foo
and bar but not baz, then it's likely whatever" - the whole community
could profit.

Thorsten

Jul 18 '05 #10
