reading files with error

Hi,

I'm trying to read some files (video files) that seem to have some
errors in them. Basically, I cannot copy them off the discs, as that
gives me an error message, but I can still play the files using a media
player like VLC or QuickTime. I understand that copying a file also
invokes error-checking routines, and I am guessing that the files may
have some parity-bit error or something.

Am I able to use Python to force-read the entire file (full length)?
That is, do not check the read for errors. I know that this is insidious
in many uses, but in media files it may only result in a skipped frame
or something. What I've done is something like this:
f = open('/Volumes/NEW/gameday/morning.dat', 'rb')
data = f.read()
o = open('/Users/mauriceling/Desktop/o.dat', 'wb')
f.close()
o.write(data)
o.close()


What I've noticed is this:
1. sometimes it (Python) only reads to roughly the point of the initial
copy error (I try to take note of how far drag-and-drop copying proceeds
before it fails)
2. sometimes it is able to read past the drag-and-drop copying
fail-point, but the result may or may not be full length.

What determines whether Python is able to make it past the drag-and-drop
copying fail-point?

Is there any way to do what I want, i.e. force a full-length read?

Thanks and cheers
Maurice
Sep 18 '05 #1
I have written a program to do something similar. My strategy is:
* use os.read() to read 512 bytes at a time
* when os.read fails, seek to the next multiple of 512 bytes
and write '\0' * 512 to the output
I notice this code doesn't deal properly with short reads, but in my experience
they never happen (because the disk error makes an entire block unreadable, and
a block is never less than 512 bytes).

I use this code on a unix-style system.

import errno, os, sys, time

def dd(src, target, bs=512):
    src = os.open(src, os.O_RDONLY)
    # Resume where a previous run stopped if the target already exists.
    if os.path.exists(target):
        target = os.open(target, os.O_WRONLY | os.O_APPEND, 0600)
        existing = os.lseek(target, 0, SEEK_END)
    else:
        existing = 0
        target = os.open(target, os.O_WRONLY | os.O_CREAT, 0600)

    total = os.lseek(src, 0, SEEK_END) / bs
    os.lseek(src, existing, SEEK_SET)
    os.lseek(target, existing, SEEK_SET)

    if existing: print "starting at", existing
    i = existing / bs    # blocks processed so far
    f = 0                # blocks that failed to read
    lastrem = -1

    last = start = time.time()
    while 1:
        try:
            block = os.read(src, bs)
        except os.error, detail:
            if detail.errno == errno.EIO:
                # Unreadable block: substitute zeros and skip to the next one.
                block = "\0" * bs
                os.lseek(src, (i+1) * bs, SEEK_SET)
                f = f + 1
            else:
                raise
        if block == "": break

        i = i + 1
        os.write(target, block)

        # Progress report: at most once per second, every 1000 blocks.
        now = time.time()
        if i % 1000 or now - last < 1: continue
        last = now

        frac = i * 1. / total
        rem = int((now-start) * (1-frac) / frac)
        if rem < 60 or abs(rem - lastrem) > .5:
            rm, rs = divmod(rem, 60)
            lastrem = rem
            spd = i * float(bs) / (now - start) / 1024 / 1024
            sys.stderr.write("%8d %8d %8d %3.1f%% %6d:%02d %6.1fMB/s\r"
                % (i, f, i-f, i * 100. / total, rm, rs, spd))
    sys.stderr.write("\n")
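
For anyone reading this on a current Python, here is a minimal Python 3 sketch of the strategy described above (read fixed-size blocks with os.read(), and on EIO write zeros and move on to the next block). It is not Jeff's program: the name salvage_copy and the read_block parameter are made up for illustration, the latter only so the error path can be tried without a damaged disc. It seeks explicitly before each read and advances by the number of bytes actually returned, so a short read should not knock the output out of alignment. A unix-style system is assumed, as in the original.

```python
import errno
import os


def salvage_copy(src_path, dst_path, block_size=512, read_block=os.read):
    """Copy src_path to dst_path, zero-filling blocks that fail with EIO.

    read_block is a hypothetical hook (it defaults to os.read) that lets a
    caller simulate read errors. Returns (blocks_written, blocks_failed).
    """
    binary = getattr(os, "O_BINARY", 0)  # only exists (and matters) on Windows
    src = os.open(src_path, os.O_RDONLY | binary)
    dst = os.open(dst_path,
                  os.O_WRONLY | os.O_CREAT | os.O_TRUNC | binary, 0o600)
    written = failed = 0
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            os.lseek(src, offset, os.SEEK_SET)
            try:
                block = read_block(src, want)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                # Unreadable block: substitute zeros of the same length.
                block = b"\0" * want
                failed += 1
            if not block:
                break  # file ended earlier than its reported size
            offset += len(block)  # a short read simply advances less
            written += 1
            while block:  # os.write may write fewer bytes than asked
                block = block[os.write(dst, block):]
    finally:
        os.close(src)
        os.close(dst)
    return written, failed
```

Unlike the original, this sketch always starts from the beginning and truncates the target; resuming a partial copy and the progress display are left out to keep it short.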

Sep 18 '05 #2
Sorry but what are SEEK_END and SEEK_SET?

Maurice

Sep 18 '05 #3
Maurice Ling <ma*********@acm.org> wrote:
Sorry but what are SEEK_END and SEEK_SET?


The Python 2.3 documentation seems to specify the *numeric*
values of these constants only. But since Python's file
objects are "implemented using C's stdio package", you
can read

http://www.opengroup.org/onlinepubs/...ons/lseek.html

Regards,
Christian Stapfer
Sep 18 '05 #4
On Sun, Sep 18, 2005 at 02:15:00PM +1000, Maurice Ling wrote:
Sorry but what are SEEK_END and SEEK_SET?


Oops, that's what I get for snipping a part of a larger program.

SEEK_SET = 0
SEEK_END = 2
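
In later Python versions (2.5 onward, as far as I can tell) the same constants are available as os.SEEK_SET, os.SEEK_CUR and os.SEEK_END, with values 0, 1 and 2, so they need not be defined by hand. Here is a small Python 3 sketch of what each whence value means for os.lseek(); the function name describe_offsets is invented for this example:

```python
import os


def describe_offsets(path):
    """Return the offsets reported by os.lseek() for each whence value."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = os.lseek(fd, 10, os.SEEK_SET)  # absolute: 10 bytes from the start
        moved = os.lseek(fd, 5, os.SEEK_CUR)   # relative: 5 bytes past the current position
        size = os.lseek(fd, 0, os.SEEK_END)    # 0 bytes from the end, i.e. the file size
    finally:
        os.close(fd)
    return start, moved, size
```

For a 100-byte file this should return (10, 15, 100). The last call is the same trick dd() uses to find the total size of the source.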

Jeff

Sep 18 '05 #5
