Detecting files that won't open

Hello All,

Does anyone know of a method to automatically detect if a file is
corrupted?
Due to a failed backup process a number of files were corrupted. The
files are mostly .xls, .doc, .pdf. When you try to open the file, a
dialog box opens stating that the file cannot be opened.

I need to log which files won't open properly, and I'm hoping there is
a way to automate the process.

I've tried using a streamreader to read the contents of the file,
hoping that an exception would be thrown if the file could not
properly be opened... this didn't work.

I've tried opening the file using the Process and ProcessStartInfo
classes, hoping that an exception would be thrown if the file did not
open correctly... this didn't work.

The best I've come up with is a console app, which opens each file in
a specified directory, one-at-a-time, pauses for 10 seconds, and then
prompts the user to enter Y for corrupted or N for okay. Files which
elicit a Y response are logged as corrupted.

This approach requires user input; ideally, I'd like to automate the
process completely, i.e., specify a directory, press go, and read the
log when the program completes.

Any ideas appreciated.
Oct 12 '08 #1
On Sun, 12 Oct 2008 14:06:16 -0700, hharry <pa*********@nyc.com> wrote:
> Hello All,
>
> Does anyone know of a method to automatically detect if a file is
> corrupted?
> [...]
>
> I've tried using a streamreader to read the contents of the file,
> hoping that an exception would be thrown if the file could not
> properly be opened... this didn't work.

No, you're right, it wouldn't.

The basic issue is that even though the data in the file may be corrupted,
as far as Windows is concerned, the file itself is fine. The data inside
doesn't match what the application that uses the file would expect, but
Windows has no knowledge of that. A file can be any arbitrary stream of
bytes, and Windows doesn't care _what_ those bytes are.
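A quick way to see this for yourself, sketched in Python rather than .NET (the file name is made up for illustration): fill a file named like a spreadsheet with random bytes and read it back. The read succeeds, because nothing below the application layer checks the bytes against the .xls format.

```python
import os
import tempfile

def read_all_bytes(path: str) -> int:
    """Read a file's raw contents and return how many bytes came back."""
    with open(path, "rb") as f:
        return len(f.read())

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name: the extension says "Excel", the contents are noise.
    path = os.path.join(tmp, "budget.xls")
    with open(path, "wb") as f:
        f.write(os.urandom(4096))
    # No exception is raised: the OS returns the bytes without caring what they are.
    print(read_all_bytes(path))  # prints 4096
```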

> I've tried opening the file using the Process and ProcessStartInfo
> classes, hoping that an exception would be thrown if the file did not
> open correctly... this didn't work.

Right. Because the best Process can do is use the shell to instruct an
application to open a file. But there's no well-defined mechanism for
that application to return an error (other than the exit code, which isn't
generated until the application actually quits and for GUI applications is
usually not going to report an error in any case).
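The exit code does become a usable signal when the program is a command-line checker rather than a GUI application. A minimal Python sketch, assuming you have such a checker for a given format (the thread doesn't name one, so the command is whatever you supply): run it with a timeout and count only exit code 0 as a pass.

```python
import subprocess
import sys
from typing import List

def exit_code_check(cmd: List[str], timeout: float = 30.0) -> bool:
    """Run a command-line checker and report whether it exited with code 0.

    A non-zero exit code or a timeout both count as a failure. This is only
    meaningful for tools that report problems through their exit code.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child; treat a hang as a failure.
        return False
    return result.returncode == 0

if __name__ == "__main__":
    # Stand-in "checker": a Python one-liner that always exits with code 2.
    print(exit_code_check([sys.executable, "-c", "raise SystemExit(2)"]))  # prints False
```

Treating a timeout as a failure keeps one hung process from stalling a whole directory scan. If the checker isn't installed, `subprocess.run` raises `FileNotFoundError`, which this sketch deliberately lets propagate.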

> The best I've come up with is a console app, which opens each file in
> a specified directory, one-at-a-time, pauses for 10 seconds, and then
> prompts the user to enter Y for corrupted or N for okay. Files which
> elicit a Y response are logged as corrupted.
>
> This approach requires user input; ideally, I'd like to automate the
> process completely, i.e., specify a directory, press go, and read the
> log when the program completes.
>
> Any ideas appreciated.

I don't think you're going to find a 100% reliable general-purpose
automated way, unless you are willing to write code that understands _all_
of the file formats you care about.
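Short of fully parsing each format, a cheap first pass is to check the bytes each format must start with, which needs no application at all. A Python sketch under these assumptions: .doc and .xls are the legacy binary formats, stored as OLE2 compound files whose first eight bytes are D0 CF 11 E0 A1 B1 1A E1; PDFs start with `%PDF-`; and an intact PDF has `%%EOF` somewhere in its last 1024 bytes (that window is an assumption about typical files, not a guarantee).

```python
import os

OLE2_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # legacy .doc/.xls compound files
PDF_MAGIC = b"%PDF-"

def signature_ok(ext: str, head: bytes, tail: bytes) -> bool:
    """Check leading bytes (and, for PDF, the trailing %%EOF marker) by extension."""
    ext = ext.lower()
    if ext in (".doc", ".xls"):
        return head.startswith(OLE2_MAGIC)
    if ext == ".pdf":
        # Assumption: an intact PDF has "%%EOF" in its last 1024 bytes,
        # so a missing marker suggests the file was truncated.
        return head.startswith(PDF_MAGIC) and b"%%EOF" in tail
    raise ValueError(f"no signature rule for {ext!r}")

def looks_valid(path: str) -> bool:
    """Read the first 8 and last 1024 bytes of a file and check its signature."""
    ext = os.path.splitext(path)[1]
    with open(path, "rb") as f:
        head = f.read(len(OLE2_MAGIC))
        size = f.seek(0, os.SEEK_END)
        f.seek(max(0, size - 1024))
        tail = f.read()
    return signature_ok(ext, head, tail)
```

This only catches gross damage such as zeroed or truncated files. A file that passes may still be corrupt, and legitimate files can fail it, for example an HTML or RTF file saved with a .doc extension.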

For the Office files, you can use the Office interop classes to automate
the relevant application (Excel, Word, etc.) and try to open the document
from within the application. That should allow a failure to open the
document to be detectable. I think the latest version of Word might be
able to open PDF files as well, so it's possible you could use the same
approach for that format.
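Here is the same idea sketched in Python through COM with the pywin32 package instead of the .NET interop assemblies (an assumption: it needs Windows, pywin32, and Word/Excel installed). `find_unopenable` and `open_office_file` are illustrative names; the Office calls (`Documents.Open`, `Workbooks.Open`, `DisplayAlerts`, `Quit`) are from the Office object model. Each file is opened read-only with alerts suppressed, so a file Office refuses to load should surface as a COM exception rather than a dialog box.

```python
import os
from typing import Callable, Iterable, List, Tuple

def find_unopenable(paths: Iterable[str], opener: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Try `opener` on each path; collect (path, error) for every failure."""
    failures = []
    for path in paths:
        try:
            opener(path)
        except Exception as exc:  # COM failures surface as pywintypes.com_error
            failures.append((path, repr(exc)))
    return failures

def open_with_word(path: str) -> None:
    """Open a document read-only in a new, hidden Word instance, then close it."""
    import win32com.client  # pywin32; imported here so the rest runs without it
    word = win32com.client.DispatchEx("Word.Application")
    word.Visible = False
    word.DisplayAlerts = 0  # wdAlertsNone
    try:
        # Documents.Open(FileName, ConfirmConversions, ReadOnly)
        doc = word.Documents.Open(path, False, True)
        doc.Close(0)  # wdDoNotSaveChanges
    finally:
        word.Quit()

def open_with_excel(path: str) -> None:
    """Open a workbook read-only in a new, hidden Excel instance, then close it."""
    import win32com.client
    excel = win32com.client.DispatchEx("Excel.Application")
    excel.Visible = False
    excel.DisplayAlerts = False
    try:
        # Workbooks.Open(FileName, UpdateLinks, ReadOnly)
        wb = excel.Workbooks.Open(path, 0, True)
        wb.Close(False)  # SaveChanges=False
    finally:
        excel.Quit()

OPENERS = {".doc": open_with_word, ".xls": open_with_excel}

def open_office_file(path: str) -> None:
    """Pick an opener by extension and pass it an absolute path to be safe."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in OPENERS:
        raise ValueError(f"no Office opener for {ext!r}")
    OPENERS[ext](os.path.abspath(path))
```

`DispatchEx` starts a new Office process instead of attaching to one the user may already have open, so `Quit()` won't close their work; starting a process per file is slow but keeps failures isolated. Some damaged files may still raise a modal repair prompt that `DisplayAlerts` doesn't suppress, so an unattended run may need a watchdog.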

But for other formats, you would need a way of automating the process
that's supposed to be able to open those formats, and that's going to vary
widely from application to application; some applications simply can't be
automated at all without a lot of work (and possibly some ugly hacks).

Now, all that said, it's possible that you're going about this the wrong
way anyway. That is, there's nothing wrong with trying to get a quick
inventory of which files wound up corrupted. But you can't rely on this
inventory except to positively identify files that _don't_ work. Any file
that passes the inventory could still in fact be corrupted, depending on
what data got modified.

It's true that for any complex file format, the odds of a corruption that
leaves the file still valid are pretty low. But they aren't zero. If
you know the backup was corrupted, then to some extent you are just going
to have to assume that _all_ of the files may in fact be corrupted and
treat them as such until proved otherwise by some more reliable inspection
than just seeing if they can be opened in their respective application.
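One way to keep that distinction in a log is to give each file one of three states instead of pass/fail. A minimal Python sketch, assuming (hypothetically; the thread doesn't say one exists) that you have SHA-256 hashes recorded from a trusted copy of the same version of each file, plus some open test `opens(path) -> bool` such as the Office automation described above:

```python
import hashlib
from typing import Callable, Dict

def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def classify(path: str, opens: Callable[[str], bool], known_good: Dict[str, str]) -> str:
    """Return "corrupt", "verified", or "suspect" for one file.

    corrupt:  the open test failed, or the hash differs from the trusted copy's
    verified: the hash matches the trusted copy's recorded hash
    suspect:  it opens, but there is no trusted hash to compare against
    """
    if not opens(path):
        return "corrupt"
    expected = known_good.get(path)
    if expected is None:
        return "suspect"
    return "verified" if sha256_of(path) == expected else "corrupt"
```

A file that opens but has no trusted hash stays `suspect`, never `ok`, since the open test can only ever prove a file bad.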

Pete
Oct 12 '08 #2
MC
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
> Now, all that said, it's possible that you're going about this the wrong
> way anyway. That is, there's nothing wrong with trying to get a quick
> inventory of which files wound up corrupted. But you can't rely on this
> inventory except to positively identify files that _don't_ work. Any file
> that passes the inventory could still in fact be corrupted, depending on
> what data got modified.
>
> It's true that for any complex file format, the odds of a corruption that
> leaves the file still valid are pretty low. But they aren't zero. If
> you know the backup was corrupted, then to some extent you are just going
> to have to assume that _all_ of the files may in fact be corrupted and
> treat them as such until proved otherwise by some more reliable inspection
> than just seeing if they can be opened in their respective application.

Well said! That's an *extremely* important point. Corruption of a file
could leave it openable but change the data in it.
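A small Python demonstration of exactly that, using the ZIP container behind the newer .docx/.xlsx formats (an illustration only; the files in this thread are mostly .doc/.xls/.pdf). One byte of an uncompressed member is altered: the archive still opens and lists its contents, and only the per-member CRC-32 that `ZipFile.testzip()` verifies reveals the change. As far as I know, the legacy .doc/.xls compound-file format and PDF keep no comparable checksum over their content, so for those files comparing against a trusted copy is the more dependable check.

```python
import io
import zipfile

def flip_first_byte_of(data: bytes, marker: bytes) -> bytes:
    """Return a copy of `data` with the low bit of the first byte of `marker` flipped."""
    pos = data.index(marker)
    damaged = bytearray(data)
    damaged[pos] ^= 0x01
    return bytes(damaged)

# Build a small ZIP with one uncompressed (stored) member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("data.txt", "total=1000")
original = buf.getvalue()

# Simulate silent corruption inside the member's data, leaving headers intact.
damaged = flip_first_byte_of(original, b"total=1000")

with zipfile.ZipFile(io.BytesIO(damaged)) as zf:
    names = zf.namelist()      # the archive still opens and lists its member...
    bad_member = zf.testzip()  # ...but the CRC-32 check names the damaged one
print(names, bad_member)  # prints ['data.txt'] data.txt
```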
Oct 13 '08 #3
