Detecting files that won't open

Hello All,

Does anyone know of a method to automatically detect if a file is
corrupted?
Due to a failed backup process a number of files were corrupted. The
files are mostly .xls, .doc, .pdf. When you try to open the file, a
dialog box opens stating that the file cannot be opened.

I need to log which files won't open properly, and I'm hoping there is
a way to automate the process.

I've tried using a streamreader to read the contents of the file,
hoping that an exception would be thrown if the file could not
properly be opened...this didn't work.

I've tried opening the file using the Process and ProcessStartInfo
classes, hoping that an exception would be thrown if the file did not
open correctly..this didn't work

The best I've come up with is a console app, which opens each file in
a specified directory, one-at-a-time, pauses for 10 seconds, and then
prompts the user to enter Y for corrupted or N for okay. Files which
elicit a Y response are logged as corrupted.

This approach requires user input and, ideally, I'd like to completely
automate the process, i.e., specify a directory, press go, and
read the log when the program completes.

Any ideas appreciated
Oct 12 '08 #1
On Sun, 12 Oct 2008 14:06:16 -0700, hharry <pa*********@nyc.com> wrote:
> Hello All,
>
> Does anyone know of a method to automatically detect if a file is
> corrupted?
> [...]
>
> I've tried using a streamreader to read the contents of the file,
> hoping that an exception would be thrown if the file could not
> properly be opened...this didn't work.

No, you're right, it wouldn't.

The basic issue is that even though the data in the file may be corrupted,
as far as Windows is concerned, the file itself is fine. The data inside
doesn't match what the application that uses the file would expect, but
Windows has no knowledge of that. A file can be any arbitrary stream of
bytes, and Windows doesn't care _what_ those bytes are.
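
A minimal Python sketch of this point (the file name and the random contents are made up for illustration): a file full of garbage with a .pdf extension reads back without any error, because the operating system only sees bytes.

```python
import os
import tempfile

# Random bytes standing in for a "corrupted" document.
garbage = os.urandom(4096)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.pdf")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(garbage)

    # Reading raises no exception: the file system has no idea
    # that these bytes are not a valid PDF.
    with open(path, "rb") as f:
        data = f.read()

read_ok = data == garbage
print(read_ok)
```

The same holds for a .NET StreamReader: a successful read says the storage is readable, not that the contents make sense to Excel, Word, or a PDF viewer.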

> I've tried opening the file using the Process and ProcessStartInfo
> classes, hoping that an exception would be thrown if the file did not
> open correctly..this didn't work

Right. Because the best Process can do is use the shell to instruct an
application to open a file. But there's no well-defined mechanism for
that application to return an error (other than the exit code, which isn't
generated until the application actually quits and for GUI applications is
usually not going to report an error in any case).
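
To illustrate the exit-code limitation, here is a small Python sketch. The child process is a hypothetical stand-in for "an application asked to open a file"; it is just a Python one-liner that exits with code 3.

```python
import subprocess
import sys

# Hypothetical stand-in for a launched application: a child process
# that terminates with exit code 3.
child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# The exit code only exists once the process has ended, so the caller
# has to wait for termination before it learns anything.
exit_code = child.wait()
print(exit_code)
```

A GUI application that shows a "cannot open file" dialog typically keeps running until the user closes it, and then usually exits with 0, so nothing useful comes back this way. In .NET the equivalent calls are Process.WaitForExit and Process.ExitCode.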

> The best I've come up with is a console app, which opens each file in
> a specified directory, one-at-a-time, pauses for 10 seconds, and then
> prompts the user to enter Y for corrupted or N for okay. Files which
> elicit a Y response are logged as corrupted.
>
> This approach requires user input and, ideally, I'd like to completely
> automate the process, i.e., specify a directory, press go, and
> read the log when the program completes.
>
> Any ideas appreciated

I don't think you're going to find a 100% reliable general-purpose
automated way, unless you are willing to write code that understands _all_
of the file formats you care about.
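
As a rough idea of what the cheapest form of "understanding the format" looks like, here is a Python sketch that checks only well-known file signatures. The function name looks_plausible is made up for this example. It relies on two facts: legacy .xls and .doc files are OLE2 compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and PDF files normally begin with %PDF- and carry an %%EOF marker near the end.

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls/.doc container header
PDF_MAGIC = b"%PDF-"


def looks_plausible(path):
    """Heuristic signature check (hypothetical helper).

    Returns True or False for known extensions, None for unknown ones.
    True only means the signature is intact, not that the file is valid.
    """
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in (".xls", ".doc", ".pdf"):
        return None  # no rule for this format
    data = p.read_bytes()
    if ext == ".pdf":
        # Header at the start, end-of-file marker within the last 1 KB.
        return data.startswith(PDF_MAGIC) and b"%%EOF" in data[-1024:]
    return data.startswith(OLE2_MAGIC)
```

This only catches files whose first or last bytes were damaged. Corruption in the middle of the file passes unnoticed, which is why a real check needs a parser for each format.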

For the Office files, you can use the Office interop classes to automate
the relevant application (Excel, Word, etc.) and try to open the document
from within the application. That should allow a failure to open the
document to be detectable. I think the latest version of Word might be
able to open PDF files as well, so it's possible you could use the same
approach for that format.

But for other formats, you would need a way of automating the process
that's supposed to be able to open those formats, and that's going to vary
widely from application to application, with some applications simply not
being able to be automated at all without a lot of work (and possibly some
ugly hacks).

Now, all that said, it's possible that you're going about this the wrong
way anyway. That is, there's nothing wrong with trying to get a quick
inventory of which files wound up corrupted. But you can't rely on this
inventory except to positively identify files that _don't_ work. Any file
that passes the inventory could still in fact be corrupted, depending on
what data got modified.

It's true that for any complex file format, the odds of a corruption that
leaves the file still valid are pretty low. But they aren't zero. If
you know the backup was corrupted, then to some extent you are just going
to have to assume that _all_ of the files may in fact be corrupted and
treat them as such until proved otherwise by some more reliable inspection
than just seeing if they can be opened in their respective application.
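
One example of a more reliable inspection, assuming a trusted copy of each file (or a checksum recorded before the backup) is available, which the thread does not say, is a byte-level comparison by hash. A minimal Python sketch, with helper names invented for illustration:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(restored_path, reference_path):
    """True only if both files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(restored_path) == sha256_of(reference_path)
```

Unlike an "it opens" test, this catches a single changed byte anywhere in the file, but it is only possible when a known-good reference exists.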

Pete
Oct 12 '08 #2
MC
"Peter Duniho" <Np*********@nnowslpianmk.com> wrote in message
news:op***************@petes-computer.local...
> Now, all that said, it's possible that you're going about this the wrong
> way anyway. That is, there's nothing wrong with trying to get a quick
> inventory of which files wound up corrupted. But you can't rely on this
> inventory except to positively identify files that _don't_ work. Any file
> that passes the inventory could still in fact be corrupted, depending on
> what data got modified.
>
> It's true that for any complex file format, the odds of a corruption that
> leaves the file still valid are pretty low. But they aren't zero. If
> you know the backup was corrupted, then to some extent you are just going
> to have to assume that _all_ of the files may in fact be corrupted and
> treat them as such until proved otherwise by some more reliable inspection
> than just seeing if they can be opened in their respective application.

Well said! That's an *extremely* important point. Corruption of a file
could leave it openable but change the data in it.
Oct 13 '08 #3
