
reading files with error

Maurice LING
#1: Sep 18 '05
Hi,

I'm trying to read some files (video files) that seem to have some
errors in them. Basically, I cannot copy them off the discs, as that gives me
an error message, but I can still play them using a media player like
VLC or QuickTime. I understand that copying a file also invokes
checking routines, and I am guessing that the files may have some
parity-bit error or something.

Am I able to use Python to force a read of the entire file (full length)?
That is, do not check the read for errors. I know that this is insidious
in many uses, but in media files it may only result in a skipped frame
or something. What I've done is something like this:
>>> f = open('/Volumes/NEW/gameday/morning.dat', 'rb')
>>> data = f.read()
>>> o = open('/Users/mauriceling/Desktop/o.dat', 'wb')
>>> f.close()
>>> o.write(data)
>>> o.close()

What I've noticed is this:
1. sometimes it (Python) only reads to roughly the point of the initial copy
error (I try to take note of how far drag-and-drop copying proceeds
before it fails)
2. sometimes it is able to read past the drag-and-drop copying
fail-point, but may or may not reach full length.
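
One way to see how far a read actually gets, rather than estimating it from the drag-and-drop progress, is to read in fixed-size chunks and record the offset at which the first error is raised. A minimal Python 3 sketch, assuming the failure surfaces as an OSError; the name readable_prefix and the 64 KiB chunk size are illustrative choices, not part of the original session:

```python
import os

def readable_prefix(path, chunk_size=64 * 1024):
    """Return (bytes_read_ok, file_size, error) for a sequential read of path.

    error is None if the whole file was read without an OSError.
    """
    size = os.path.getsize(path)
    done = 0
    # buffering=0 gives raw, unbuffered reads, so the reported offset is
    # not skewed by Python-level read-ahead.
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                return done, size, exc
            if not chunk:
                return done, size, None
            done += len(chunk)
```

Running it a few times on the same file and comparing the first value against the file size shows whether the failure point is stable or moves between attempts.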

What determines whether Python is able to make it past the drag-and-drop
copying fail-point?

Is there any way to do what I want, i.e. force a full-length read?

Thanks and cheers
Maurice
jepler@unpythonic.net
#2: Sep 18 '05

I have written a program to do something similar. My strategy is:
* use os.read() to read 512 bytes at a time
* when os.read() fails, seek to the next multiple of 512 bytes
  and write '\0' * 512 to the output
I notice this code doesn't deal properly with short reads, but in my experience
they never happen (because the disk error makes an entire block unreadable, and
a block is never less than 512 bytes).

I use this code on a unix-style system.

import errno
import os
import sys
import time

def dd(src, target, bs=512):
    src = os.open(src, os.O_RDONLY)
    if os.path.exists(target):
        # Resume: append to an existing partial copy.
        target = os.open(target, os.O_WRONLY | os.O_APPEND, 0600)
        existing = os.lseek(target, 0, SEEK_END)
    else:
        existing = 0
        target = os.open(target, os.O_WRONLY | os.O_CREAT, 0600)

    total = os.lseek(src, 0, SEEK_END) / bs
    os.lseek(src, existing, SEEK_SET)
    os.lseek(target, existing, SEEK_SET)

    if existing: print "starting at", existing
    i = existing / bs    # blocks processed so far
    f = 0                # blocks that could not be read
    lastrem = -1

    last = start = time.time()
    while 1:
        try:
            block = os.read(src, bs)
        except os.error, detail:
            if detail.errno == errno.EIO:
                # Unreadable block: substitute zeros and skip past it.
                block = "\0" * bs
                os.lseek(src, (i+1) * bs, SEEK_SET)
                f = f + 1
            else:
                raise
        if block == "": break

        i = i + 1
        os.write(target, block)

        # Progress report: only every 1000th block, at most once a second.
        now = time.time()
        if i % 1000 or now - last < 1: continue
        last = now

        frac = i * 1. / total
        rem = int((now-start) * (1-frac) / frac)
        if rem < 60 or abs(rem - lastrem) > .5:
            rm, rs = divmod(rem, 60)
            lastrem = rem
            # Use bs rather than a hard-coded 512 so the speed is right
            # for other block sizes.
            spd = i * float(bs) / (now - start) / 1024 / 1024
            sys.stderr.write("%8d %8d %8d %3.1f%% %6d:%02d %6.1fMB/s\r"
                % (i, f, i-f, i * 100. / total, rm, rs, spd))
    sys.stderr.write("\n")
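
For readers on Python 3, the same strategy can be sketched as follows. This is not the program above: it drops the resume and progress-report logic, and it tolerates short reads by advancing by however many bytes actually came back. The name salvage_copy and the injectable read parameter are assumptions made for illustration; the parameter exists only so that an I/O error can be simulated without a damaged disc.

```python
import errno
import os

def salvage_copy(src_path, dst_path, bs=512, read=os.read):
    """Copy src_path to dst_path block by block, zero-filling unreadable blocks.

    Returns the number of blocks that could not be read.
    """
    bad = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            pos = 0
            while pos < size:
                # Never read across a block boundary, so a failure always
                # maps to (the rest of) exactly one block.
                want = min(bs - pos % bs, size - pos)
                os.lseek(src, pos, os.SEEK_SET)
                try:
                    block = read(src, want)
                except OSError as exc:
                    if exc.errno != errno.EIO:
                        raise
                    block = b"\0" * want  # placeholder for the lost data
                    bad += 1
                if not block:
                    break  # file ended earlier than its reported size
                dst.write(block)
                pos += len(block)  # a short read simply advances by less
    finally:
        os.close(src)
    return bad
```

Called as salvage_copy('/Volumes/NEW/gameday/morning.dat', '/Users/mauriceling/Desktop/o.dat'), it should produce an output of the same length as the source, with zeros wherever a block raised EIO.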


Maurice Ling
#3: Sep 18 '05

jepler@unpythonic.net wrote:
>def dd(src, target, bs=512):
> [...]
> existing = os.lseek(target, 0, SEEK_END)
> [...]
> os.lseek(src, existing, SEEK_SET)
> [...]

Sorry but what are SEEK_END and SEEK_SET?

Maurice


Christian Stapfer
#4: Sep 18 '05

Maurice Ling <mauriceling@acm.org> wrote in message
news:mailman.547.1127016906.509.python-list@python.org...
> Sorry but what are SEEK_END and SEEK_SET?

The Python 2.3 documentation seems to specify the *numeric*
values of these constants only. But since Python's file
objects are "implemented using C's stdio package", you
can read

http://www.opengroup.org/onlinepubs/...ons/lseek.html

Regards,
Christian Stapfer


jepler@unpythonic.net
#5: Sep 18 '05

On Sun, Sep 18, 2005 at 02:15:00PM +1000, Maurice Ling wrote:
> Sorry but what are SEEK_END and SEEK_SET?

Oops, that's what I get for snipping a part of a larger program.

SEEK_SET = 0
SEEK_END = 2

Jeff
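
Current Python versions expose these values as named constants in the os module, so they need not be defined by hand. A small sketch showing the constants together with the seek-to-end idiom that dd() uses to find a file's length; file_size is an illustrative helper name, not something from the program above:

```python
import os

def file_size(path):
    """Return the length of path by seeking to its end, as dd() does."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek returns the new offset, measured from the start of the file.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

# The named constants carry the same numeric values given above.
assert os.SEEK_SET == 0
assert os.SEEK_END == 2
```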

