On Sun, 12 Oct 2008 14:06:16 -0700, hharry <pa*********@nyc.com> wrote:
Hello All,
Does anyone know of a method to automatically detect if a file is
corrupted ?
[...]
I've tried using a streamreader to read the contents of the file,
hoping that an exception would be thrown if the file could not
properly be opened...this didn't work.
No, you're right, it wouldn't.
The basic issue is that even though the data in the file may be corrupted,
as far as Windows is concerned, the file itself is fine. The data inside
doesn't match what the application that uses the file would expect, but
Windows has no knowledge of that. A file can be any arbitrary stream of
bytes, and Windows doesn't care _what_ those bytes are.
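To illustrate, here is a minimal Python sketch (not from the original post; the file name and its random contents are made up for the example). A file full of garbage bytes behind a .pdf extension opens and reads without any error, because the OS only ever sees bytes:

```python
import os
import tempfile

def read_all_bytes(path):
    """Read a file the way the OS sees it: as an opaque stream of bytes."""
    with open(path, "rb") as f:
        return f.read()

# Hypothetical "corrupted" document: random bytes behind a .pdf extension.
garbage = os.urandom(1024)
tmp_dir = tempfile.mkdtemp()
fake_pdf = os.path.join(tmp_dir, "report.pdf")
with open(fake_pdf, "wb") as f:
    f.write(garbage)

# No exception is raised here; the read succeeds and returns every byte.
data = read_all_bytes(fake_pdf)
print(len(data), "bytes read without error")
```

The same holds for a text-oriented reader, except that it may complain about the encoding, which still says nothing about whether the document is valid for its application.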
I've tried opening the file using the Process and ProcessStartInfo
classes, hoping that an exception would be thrown if the file did not
open correctly...this didn't work.
Right. Because the best Process can do is use the shell to instruct an
application to open a file. But there's no well-defined mechanism for
that application to return an error (other than the exit code, which isn't
generated until the application actually quits and for GUI applications is
usually not going to report an error in any case).
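As a rough sketch of the exit-code limitation, here it is in Python rather than with the .NET Process class. The child process is a stand-in that sleeps and then exits with code 3; it is not a real document viewer. The launcher has no exit code to look at until the child has actually quit:

```python
import subprocess
import sys

# Stand-in child (an assumption for this example): sleeps briefly, then
# exits with code 3. A real GUI application would typically stay open
# until the user closes it, and would usually exit with 0 regardless.
child = subprocess.Popen(
    [sys.executable, "-c", "import time, sys; time.sleep(1); sys.exit(3)"]
)

# While the child is still running there is no exit code to inspect.
code_while_running = child.poll()   # None while the process is alive

# The exit code only becomes available once the process has terminated.
code_after_exit = child.wait()

print("while running:", code_while_running)
print("after exit:", code_after_exit)
```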
The best I've come up with is a console app, which opens each file in
a specified directory, one-at-a-time, pauses for 10 seconds, and then
prompts the user to enter Y for corrupted or N for okay. Files which
elicit a Y response are logged as corrupted.
This approach requires user-input and ideally, I'd like to completely
automate the process, i.e., specify a directory and then press go and
read the log when the program completes.
Any ideas appreciated
I don't think you're going to find a 100% reliable general-purpose
automated way, unless you are willing to write code that understands _all_
of the file formats you care about.
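To give an idea of what "code that understands the format" might look like, here is a minimal Python sketch with two cheap structural checks. The function names are made up for illustration. It assumes PDF files (which start with a "%PDF-" header and end with an "%%EOF" marker) and the ZIP-based Office 2007 formats such as .docx and .xlsx; the older binary .doc and .xls formats are not ZIP archives and would need a different check. Passing these checks does not prove a file is intact; failing them is a strong sign that it isn't.

```python
import io
import zipfile
import zlib

def looks_like_pdf(data: bytes) -> bool:
    """Cheap check: PDF header at the start and an %%EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]

def zip_container_ok(data: bytes) -> bool:
    """Check a ZIP-based document by verifying the CRC of every member."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        return False

# Build a tiny stand-in for a .docx container (stored, not compressed).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/document.xml", "<doc>hello</doc>")
good = buf.getvalue()

# Simulate corruption by changing one byte of the stored payload.
bad = good.replace(b"hello", b"jello", 1)

print(zip_container_ok(good))   # intact container
print(zip_container_ok(bad))    # CRC mismatch on the member
print(looks_like_pdf(b"%PDF-1.4\n...\n%%EOF\n"))
print(looks_like_pdf(b"\x00\x01 not a pdf"))
```

A directory scan would then pick the check by file extension and log every file that fails, with no user input needed.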
For the Office files, you can use the Office interop classes to automate
the relevant application (Excel, Word, etc.) and try to open the document
from within the application. That should allow a failure to open the
document to be detectable. I think the latest version of Word might be
able to open PDF files as well, so it's possible you could use the same
approach for that format.
But for other formats, you would need a way of automating the process
that's supposed to be able to open those formats, and that's going to vary
widely from application to application. Some applications simply can't be
automated at all without a lot of work (and possibly some ugly hacks).
Now, all that said, it's possible that you're going about this the wrong
way anyway. That is, there's nothing wrong with trying to get a quick
inventory of which files wound up corrupted. But you can't rely on this
inventory except to positively identify files that _don't_ work. Any file
that passes the inventory could still in fact be corrupted, depending on
what data got modified.
It's true that for any complex file format, the odds of a corruption that
leaves the file still valid are pretty low. But they aren't zero. If
you know the backup was corrupted, then to some extent you are just going
to have to assume that _all_ of the files may in fact be corrupted and
treat them as such until proved otherwise by some more reliable inspection
than just seeing if they can be opened in their respective application.
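One possible form of "more reliable inspection", if (and this is an assumption, nothing in your post says you have it) checksums of the files were recorded while they were known to be good, is to compare the current contents against those checksums. A minimal Python sketch, with made-up file names and contents:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def suspect_files(files, manifest):
    """Names whose current hash differs from, or is missing in, the manifest."""
    return sorted(
        name for name, data in files.items()
        if manifest.get(name) != sha256_of(data)
    )

# Hypothetical known-good contents and the manifest recorded from them.
known_good = {"a.doc": b"alpha contents", "b.xls": b"beta contents"}
manifest = {name: sha256_of(data) for name, data in known_good.items()}

# Hypothetical restored copies: b.xls differs by a single byte.
restored = {"a.doc": b"alpha contents", "b.xls": b"beta c0ntents"}

print(suspect_files(restored, manifest))
```

Unlike the "does it open?" test, this catches a change that happens to leave the file still valid for its application.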
Pete