
emptying files

I have written this short program to empty out files. It works great
except that it truncates. My guess was that the problem was in the fopen mode
somewhere, but I have played with that and get the same result: an empty file
of zero bytes. If I have a 512-byte file of data, I want 512 bytes of '\0'.
Pardon the exit(1)'s. It's shorthand on my implementation for
exit(EXIT_FAILURE); the macro is defined on my implementation as 1.

Bill

/* se, secure erase */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        puts("se usage error");
        exit(1);
    }
    int i, j;
    FILE *fo, *fw;
    if ((fo = fopen(argv[1], "ab")) == NULL) {
        printf("%i\n", ferror(fo));
        clearerr(fo);
        fclose(fo);
        exit(1);
    }
    if ((fw = fopen(argv[1], "ab")) == NULL) {
        printf("%i\n", ferror(fw));
        clearerr(fw);
        fclose(fw);
        exit(1);
    }
    while ((i = getc(fo)) != EOF)
        putc(j = 0, fw);
    fclose(fo);
    fclose(fw);
    return 0;
}

Nov 18 '08 #1
26 Replies


On Nov 19, 12:26 am, "Bill Cunningham" <nos...@nspam.invalid> wrote:
Pardon the exit(1)'s. It's shorthand on my implementation for
exit(EXIT_FAILURE); The macro is defined on my implementation as 1.
Bill Cunningham on portability. Priceless
Nov 18 '08 #2

<vi******@gmail.com> wrote in message
news:03**********************************@w24g2000prd.googlegroups.com...
Bill Cunningham on portability. Priceless
I only plan to use this on my system. Or else believe me it would be
exit(EXIT_FAILURE);

Nov 18 '08 #3

Bill Cunningham wrote:
I have written this short program to empty out files. It works great
except that it truncates. My guess was that it was in the fopen mode
somewhere
If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.

By the way this isn't a very secure erase. You'd need to write at least
seven different bitpatterns over the entire file sequentially.
Nov 18 '08 #4

"Mark McIntyre" <ma**********@TROUSERSspamcop.net> wrote in message
news:gD*******************@en-nntp-09.am2.easynews.com...
By the way this isn't a very secure erase. You'd need to write at least
seven different bitpatterns over the entire file sequentially.
I kind of figured it wouldn't be. Thanks. I'll figure out something with
for() for that.
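A multi-pass version built around such a for() loop might look something like this sketch (the function name and the pattern bytes are arbitrary illustrative choices of mine, not any standardized secure-erase sequence; note too that fseek() to SEEK_END on a binary stream is not strictly guaranteed by the standard):

```c
#include <stdio.h>

/* Illustrative multi-pass wipe: overwrite the whole file with
   each pattern byte in turn.  Returns 0 on success, -1 on failure. */
int multi_pass_wipe(const char *path)
{
    static const unsigned char pats[] =
        { 0x00, 0xFF, 0xAA, 0x55, 0x92, 0x49, 0x24 };
    FILE *fp = fopen(path, "r+b");
    long len, i;
    size_t p;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0 || (len = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    for (p = 0; p < sizeof pats; p++) {
        rewind(fp);
        for (i = 0; i < len; i++)
            if (putc(pats[p], fp) == EOF) {
                fclose(fp);
                return -1;
            }
        fflush(fp);   /* push each pass to the host before starting the next */
    }
    fclose(fp);
    return 0;
}
```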

Bill
Nov 18 '08 #5

Hi

On Tue, 18 Nov 2008 18:06:39 -0500, Bill Cunningham wrote:
"Mark McIntyre" <ma**********@TROUSERSspamcop.net> wrote in message
news:gD*******************@en-nntp-09.am2.easynews.com...
>By the way this isn't a very secure erase. You'd need to write at least
seven different bitpatterns over the entire file sequentially.

I kind of figured it wouldn't be. Thanks. I'll figure out something
with for() for that.
This only works if the file system overwrites in-place. If not you are
just wasting effort.

See the documentation and source of your platform's shred(1) for pointers.

viza
Nov 18 '08 #6

Hi

On Tue, 18 Nov 2008 17:26:41 -0500, Bill Cunningham wrote:
I have written this short program to empty out files. It works great
except that it truncates. My guess was that it was in the fopen mode
somewhere but I have played with that and the same results. Empty file
of zero bytes. If I have a 512 byte file of data, I want 512 bytes of
'\0'. Pardon the exit(1)'s. It's shorthand on my implementation for
exit(EXIT_FAILURE); The macro is defined on my implementation as 1.
if ((fo = fopen(argv[1], "ab")) == NULL) { ...
if ((fw = fopen(argv[1], "ab")) == NULL) { ...

while ((i = getc(fo)) != EOF)
putc(j = 0, fw);
Opening the file twice is a bad idea.

"a" is append. You want "r+b", and then fseek() to the end and then
ftell() to get the length and then rewind() and fwrite() in large-ish
blocks for some semblance of efficiency. Also, see my other reply.
Nov 18 '08 #7

Mark McIntyre wrote:
If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.
I think that reading a byte at a time
is the portable way to determine how long it is.

--
pete
Nov 18 '08 #8

"viza" <to******@gm-il.com.obviouschange.invalid> wrote in message
news:n6****************@newsfe25.ams2...
This only works if the file system overwrites in-place. If not you are
just wasting effort.
[snip]

What do you mean? If the filesystem is mounted in rw mode? I guess if it's
mounted ro it would do no good.

Bill
Nov 19 '08 #9

On Tue, 18 Nov 2008 20:35:09 -0500, Bill Cunningham wrote:
"viza" <to******@gm-il.com.obviouschange.invalid> wrote in message
news:n6****************@newsfe25.ams2...
>This only works if the file system overwrites in-place. If not you are
just wasting effort.
[snip]

What do you mean? If the filesystem is mounted in rw mode? I guess if
it's mounted ro it would do no good.
If you open a file and write over its contents, sometimes the operating
system writes the new data to the same blocks where the old data was and
does not change which blocks make up the file. In other systems it first
writes the data to a new block, then changes the file to refer to the new
block. The old block is then released back to be reused, without ever
being overwritten whether you try to overwrite it or not.

Someone with low-level access to the disk can look for recently released
blocks and get access to the data that a naive shredding program like
this has left behind, in just the same way as if you had deleted the file
and recreated it.

HTH
viza
Nov 19 '08 #10

"viza" <to******@gm-il.com.obviouschange.invalid> wrote in message
news:B9****************@newsfe25.ams2...
If you open a file and write over its contents, sometimes the operating
system writes the new data to the same blocks where the old data was and
does not change which blocks make up the file. In other systems it first
writes the data to a new block, then changes the file to refer to the new
block. The old block is then released back to be reused, without ever
being overwritten whether you try to overwrite it or not.

Someone with low-level access to the disk can look for recently released
blocks and get access to the data that a naive shredding program like
this has left behind, in just the same way as if you had deleted the file
and recreated it.
Sounds like WFP. I hate not being allowed to erase my system-critical
files. At least they say they are.
Nov 19 '08 #11

"Bill Cunningham" <no****@nspam.invalid> writes:
<vi******@gmail.com> wrote in message
news:03**********************************@w24g2000prd.googlegroups.com...
>Bill Cunningham on portability. Priceless

I only plan to use this on my system. Or else believe me it would be
exit(EXIT_FAILURE);
Why on Earth don't you just write "exit(EXIT_FAILURE);" in the first
place? You've wasted far more time trying to justify your
non-portable code than it would take to make it more portable.

If there were some system-specific advantage to using the less
portable code, then that would be fine -- but by your own admission,
exit(1) does exactly the same thing as exit(EXIT_FAILURE) on your
platform.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 19 '08 #12

"viza" <to******@gm-il.com.obviouschange.invalid> wrote in message
news:n6****************@newsfe25.ams2...
See the documentation and source of your platform's shred(1) for pointers.
This shred is pretty cool. But it doesn't create an empty file, only a
secure erase, which is what I want.

Bill
Nov 19 '08 #13

On Tue, 18 Nov 2008 22:48:42 +0000, Mark McIntyre
<ma**********@TROUSERSspamcop.net> wrote in comp.lang.c:
Bill Cunningham wrote:
I have written this short program to empty out files. It works great
except that it truncates. My guess was that it was in the fopen mode
somewhere

If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.

By the way this isn't a very secure erase. You'd need to write at least
seven different bitpatterns over the entire file sequentially.
Off-topic, I know, but how well do any erase algorithms do on flash
file devices?

A flash file system, like ffs, deliberately incorporates
wear-leveling, and so most likely will actually write those seven
files full of different patterns to seven distinct locations in the
flash, none of them actually overwriting the original.

And even if the file system has hooks to bypass this for erasing, many
of the flash devices, like compact flash, SD, and others, have the
wear leveling built into the controller micro in the device, and it
can't be bypassed.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
Nov 19 '08 #14

pete said:
Mark McIntyre wrote:
>If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.

I think that reading a byte at a time,
is the portable way to determine how long it is.
It depends what you mean by "how long".

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 19 '08 #15

On Tue, 18 Nov 2008 20:51:18 -0500, Bill Cunningham wrote:
"viza" <to******@gm-il.com.obviouschange.invalid> wrote in message
news:B9****************@newsfe25.ams2...
>If you open a file and write over its contents, sometimes the operating
system writes the new data to the same blocks where the old data was
and does not change which blocks make up the file. In other systems it
first writes the data to a new block, then changes the file to refer to
the new block. The old block is then released back to be reused,
without ever being overwritten whether you try to overwrite it or not.

Someone with low-level access to the disk can look for recently
released blocks and get access to the data that a naive shredding
program like this has left behind, in just the same way as if you had
deleted the file and recreated it.

Sounds like WFP. I hate not being allowed to erase my system-critical
files. At least they say they are.
No, it is a good idea. Suppose the write operation was interrupted by a
power failure. If you write new blocks and then switch over the pointer
then you can ensure that you always have one good copy of the file,
either before or after the write. If you try to modify the blocks in
place you are much more likely to get junk.

It does mean however that you need a filesystem aware shred program and
appropriate administrator permissions to run it.
Nov 19 '08 #16

Richard Heathfield wrote:
pete said:
>Mark McIntyre wrote:
>>If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.
I think that reading a byte at a time,
is the portable way to determine how long it is.

It depends what you mean by "how long".
If you want to create a file with the same name and size as an original,
which reads as though it is full of null bytes unlike the original,
getc reading a byte at a time
would be a way to count how many bytes you need to write.
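As a sketch (the function name is mine), that byte-counting approach is just:

```c
#include <stdio.h>

/* Count a file's readable length portably by reading with getc
   until EOF.  Returns the byte count, or -1 if the file can't
   be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```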

--
pete
Nov 19 '08 #17

pete <pf*****@mindspring.com> writes:
Richard Heathfield wrote:
>pete said:
>>Mark McIntyre wrote:

If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time, set
it to zero, write it back, etc etc.
I think that reading a byte at a time,
is the portable way to determine how long it is.
It depends what you mean by "how long".

If you want to create a file with the same name and size as an original,
which reads as though it is full of null bytes unlike the original,
getc reading a byte at a time
would be a way to count how many bytes you need to write.
Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 19 '08 #18

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
>If you want to create a file with the same name and size as an original,
which reads as though it is full of null bytes unlike the original,
getc reading a byte at a time
would be a way to count how many bytes you need to write.
>Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.
Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Nov 19 '08 #19

ri*****@cogsci.ed.ac.uk (Richard Tobin) writes:
In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
>>If you want to create a file with the same name and size as an original,
which reads as though it is full of null bytes unlike the original,
getc reading a byte at a time
would be a way to count how many bytes you need to write.
>>Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.

Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.
It's for binary files, not text files. If you write, say, 10 bytes to
a file in binary mode, then read it back, you can get back the 10
bytes you wrote plus, say, another 502 null bytes. Since ^Z (ASCII
character 26, assuming an ASCII-based implementation) is a valid
character in a binary file, it can't be used as an end-of-file marker.
Reading from a text file in such an implementation will give you EOF
when it hits the logical end-of-file marker (^Z or whatever), and you
won't see any appended null bytes unless you read the file in binary
mode.

The standard's wording is (C99 7.19.2p3):

A binary stream is an ordered sequence of characters that can
transparently record internal data. Data read in from a binary
stream shall compare equal to the data that were earlier written
out to that stream, under the same implementation. Such a stream
may, however, have an implementation-defined number of null
characters appended to the end of the stream.

But yes, you make a good point. If I write 10 bytes to a file and end
up with a file that's indistinguishable from a file to which I wrote
512 bytes, the last 502 of which are null bytes, then the actual size
of the file is 512 bytes. (If I care, I can always encode the logical
file size in the file's data.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 19 '08 #20

In article <ln************@nuthaus.mib.org>,
Keith Thompson <ks***@mib.org> wrote:
>Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.
>It's for binary files, not text files.
I must have expressed myself badly. I meant to say that these systems
have a way of giving a precise length to text files - using a marker
character - but for binary they only express length to block
granularity.

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Nov 19 '08 #21

pete wrote:
Mark McIntyre wrote:
>If you want to set a file to all-zeros, determine how long it is, then
fwrite that many zeros into it. Don't try to read a byte at a time,
set it to zero, write it back, etc etc.

I think that reading a byte at a time,
is the portable way to determine how long it is.
Um yes, but not reading one byte then truncating it....

--
Mark McIntyre

CLC FAQ <http://c-faq.com/>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 19 '08 #22

On Nov 19, 9:56 am, Keith Thompson <ks...@mib.org> wrote:
[...]
But yes, you make a good point. If I write 10 bytes to a file and end
up with a file that's indistinguishable from a file to which I wrote
512 bytes, the last 502 of which are null bytes, then the actual size
of the file is 512 bytes. (If I care, I can always encode the logical
file size in the file's data.)
How do you encode the logical file size in the file's data?
Chad

Nov 19 '08 #23

Chad <cd*****@gmail.com> writes:
On Nov 19, 9:56 am, Keith Thompson <ks...@mib.org> wrote:
[...]
>But yes, you make a good point. If I write 10 bytes to a file and end
up with a file that's indistinguishable from a file to which I wrote
512 bytes, the last 502 of which are null bytes, then the actual size
of the file is 512 bytes. (If I care, I can always encode the logical
file size in the file's data.)

How do you encode the logical file size in the file's data?
Any way you like, depending on the file format.

For example, you might write the file's logical size in the first,
say, 8 bytes of the file -- as long as all writers and readers agree
on the format (including how the size is encoded in those 8 bytes).

I *think* that most image file formats, for example, include this kind
of information, though not typically at the very beginning of the file
(which is usually a marker indicating what kind of file it is).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 19 '08 #24

Mark McIntyre wrote:
pete wrote:
>Mark McIntyre wrote:
>>If you want to set a file to all-zeros, determine how long it is,
then fwrite that many zeros into it. Don't try to read a byte at a
time, set it to zero, write it back, etc etc.

I think that reading a byte at a time,
is the portable way to determine how long it is.

Um yes, but not reading one byte then truncating it....
I hadn't looked at the code, until now.
It looks strange and has some obvious problems.

The printf statements and the clearerr statements are undefined,
because the arguments are null pointers.
So are two of the fclose statements.

The j variable has no purpose.

And the main idea seems to be to open a binary
file twice in append mode,
and it seems to me that it appends a null byte to the end of the file,
for every byte that it reads from a file
which hasn't been opened for reading.

--
pete
Nov 20 '08 #25

Chad wrote:
Keith Thompson <ks...@mib.org> wrote:
.... snip ...
>
>But yes, you make a good point. If I write 10 bytes to a file
and end up with a file that's indistinguishable from a file to
which I wrote 512 bytes, the last 502 of which are null bytes,
then the actual size of the file is 512 bytes. (If I care, I
can always encode the logical file size in the file's data.)

How do you encode the logical file size in the file's data?
For example, you can write "size=10;" in the first 8 bytes of the
file, followed by the 10 binary bytes.
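A sketch of that scheme (the fixed-width 8-digit ASCII header and the function names are arbitrary illustrative choices; any format works as long as writers and readers agree on it):

```c
#include <stdio.h>
#include <stdlib.h>

/* Store the logical payload size as 8 ASCII decimal digits at the
   start of the file, followed by the payload itself.
   Returns 0 on success, -1 on failure. */
int write_with_size(const char *path, const unsigned char *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fprintf(fp, "%08lu", (unsigned long)n) < 0 ||
        fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}

/* Read back the logical size from the 8-digit header,
   ignoring any trailing padding.  Returns -1 on failure. */
long read_logical_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char hdr[9] = {0};
    long size;

    if (fp == NULL || fread(hdr, 1, 8, fp) != 8) {
        if (fp)
            fclose(fp);
        return -1;
    }
    size = strtol(hdr, NULL, 10);
    fclose(fp);
    return size;
}
```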

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nov 20 '08 #26

"Jack Klein" <ja*******@spamcop.net> wrote in message
news:2v********************************@4ax.com...
Off-topic, I know, but how well do any erase algorithms do on flash
file devices?

A flash file system, like ffs, deliberately incorporates
wear-leveling, and so most likely will actually write those seven
files full of different patterns to seven distinct locations in the
flash, none of them actually overwriting the original.

And even if the file system has hooks to bypass this for erasing, many
of the flash devices, like compact flash, SD, and others, have the
wear leveling built into the controller micro in the device, and it
can't be bypassed.
Very good question. Even if OT I would wonder the same thing.

Bill
Nov 20 '08 #27
