How to determine the MIME type without the file name extension?

Hi all,
I had a filesystem crash, and when I recovered the data, the files had
random names without extensions. I decided to write a script that
determines each file's type and creates a copy of the file with the
proper extension.
---
Method 1:
# File extension utility.

import os
import mimetypes
import shutil

def main():
    for root, dirs, files in os.walk(r'C:\Senthil\test'):
        for each in files:
            fname = os.path.join(root, each)
            print(fname)
            mtype, entype = mimetypes.guess_type(fname)
            # guess_type() returns (None, None) when the name has no known
            # extension, and guess_extension(None) raises, so skip those.
            if mtype is None:
                continue
            fext = mimetypes.guess_extension(mtype)
            if fext is not None:
                newname = fname + fext
                try:
                    print(newname)
                    shutil.copyfile(fname, newname)
                except (IOError, os.error) as why:
                    print("Can't copy %s to %s: %s" % (fname, newname, why))

if __name__ == "__main__":
    main()

----
The problem I faced with this script: if the file name has no
extension, mimetypes.guess_type(filename) fails and returns (None, None),
because it guesses from the name alone and never looks at the file's
contents. How do I get around this problem?
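A quick way to see why Method 1 cannot work on the recovered files: `mimetypes` maps a name to a type using its built-in table (plus the system's mime.types files where present) and never opens the file. A minimal Python 3 sketch; the helper name `guess_from_name` is hypothetical, and the file names are just strings that need not exist:

```python
import mimetypes

def guess_from_name(name):
    """Return the MIME type mimetypes infers from the name alone, or None."""
    mtype, _encoding = mimetypes.guess_type(name)
    return mtype

# The same (hypothetical) file, with and without an extension.
print(guess_from_name("report.pdf"))  # application/pdf
print(guess_from_name("report"))      # None: no suffix, nothing to go on
```

Since the guess depends only on the name, some content-based check (the `file` command, or reading the file's first bytes yourself) is needed once the extensions are gone.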

Since it was a Linux box, I tried using the file command to get the work done.
----
Method 2:

import os
import shutil
import re
import subprocess

def detext(filename):
    # Pass an argument list so names with spaces are safe; -b (brief)
    # omits the file name, so a name containing "PDF" cannot match.
    pipe = subprocess.Popen(['file', '-b', filename],
                            stdout=subprocess.PIPE, universal_newlines=True)
    fileoutput = pipe.communicate()[0]
    rtf = re.compile('Rich Text Format data')
    doc = re.compile('Microsoft Office Document')
    pdf = re.compile('PDF')

    if rtf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.rtf')
    if doc.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.doc')
    if pdf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.pdf')

def main():
    for root, dirs, files in os.walk(os.getcwd()):
        for each in files:
            fname = os.path.join(root, each)
            detext(fname)

if __name__ == '__main__':
    main()
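For comparison, `file` works largely by matching a file's first bytes against known signatures ("magic numbers"). A minimal pure-Python sketch of the same idea for the three formats Method 2 handles, assuming these well-known signatures: RTF files begin with `{\rtf`, PDF files with `%PDF-`, and OLE2 compound files (the container used by older .doc, .xls and .ppt files) with the bytes `D0 CF 11 E0 A1 B1 1A E1`. The names `SIGNATURES` and `sniff_type` are hypothetical. Note that the OLE2 signature is shared by Word and Excel files, which mirrors the .doc/.xls confusion that `file` shows.

```python
# Leading-byte signatures ("magic numbers") for a few formats.
SIGNATURES = (
    (b"{\\rtf", "rtf"),
    (b"%PDF-", "pdf"),
    # OLE2 compound file: shared by .doc, .xls, .ppt, ...
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
)

def sniff_type(path):
    """Return a short type label based on the file's first bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest signature above is 8 bytes
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

Usage would be something like `sniff_type('/recovered/file0042')`, where the path is purely illustrative.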

----
But the problem with using file was that it recognized both .xls (MS Excel)
and .doc (MS Word) files as Microsoft Word documents. I need to separate
the .xls and .doc files, and I don't know if file will be helpful here.
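Both formats are OLE2 compound files, so their leading bytes are identical; the difference lies in the named streams inside the container. A Word file contains a stream called `WordDocument`, an Excel 97 or later file one called `Workbook`, and a PowerPoint file one called `PowerPoint Document`. OLE2 stores these directory names as UTF-16LE, so a crude heuristic is to search the raw bytes for them. This is a sketch under that assumption, not a real OLE2 parser: it reads the whole file into memory, older Excel versions are not covered, and embedded objects (say, a spreadsheet inside a Word file) can fool it. The names `STREAM_HINTS` and `ole2_flavor` are hypothetical; a library that actually parses the OLE2 directory would be more reliable.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

# Stream names that identify the application, as stored in the OLE2
# directory (UTF-16LE). Checked in order: the first match wins.
STREAM_HINTS = (
    ("WordDocument", "doc"),
    ("Workbook", "xls"),              # Excel 97 and later
    ("PowerPoint Document", "ppt"),
)

def ole2_flavor(path):
    """Heuristically guess 'doc', 'xls' or 'ppt' for an OLE2 file, else None."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(OLE2_MAGIC):
        return None  # not an OLE2 compound file at all
    for stream_name, ext in STREAM_HINTS:
        if stream_name.encode("utf-16-le") in data:
            return ext
    return None
```

Checking `WordDocument` first means a Word file that embeds a spreadsheet is still reported as `doc`.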

--
If the first approach, using mimetypes, could be made to work, that would be great!
Has anyone faced this problem? How did you solve it?

thanks,
Senthil

http://phoe6.livejournal.com

Jul 18 '06 #1
Phoe6 wrote:
> Hi all,
> I had a filesystem crash and when I retrieved the data back
> the files had random names without extension. I decided to write a
> script to determine the file extension and create a newfile with
> extension.
> [...]
> but the problem with using file was it recognized both .xls (MS Excel)
> and .doc ( MS Doc) as Microsoft Word Document only. I need to separate
> the .xls and .doc files, I dont know if file will be helpful here.
You may want to try the gnome.vfs module:

import gnome.vfs

info = gnome.vfs.get_file_info(filename,
                               gnome.vfs.FILE_INFO_GET_MIME_TYPE)
info.mime_type  # the MIME type string

If all of your documents are .xls or .doc files, you could also use one
of the command-line tools that convert .doc to text, such as catdoc.
These tools will fail on an .xls document, so run one and check its
output: .doc files would produce a lot of text, while .xls files would
produce an error or nothing at all.
The gnome.vfs module is probably your best bet, though :-)
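The catdoc trick can be wrapped in a small function. This sketch assumes `catdoc` is installed and is invoked with the input file as its only argument, printing the extracted text to standard output; the helper name `looks_like_word` and its `converter` parameter are hypothetical, and the rule "clean exit plus non-empty output means Word" is the heuristic described above, not something catdoc guarantees.

```python
import subprocess

def looks_like_word(path, converter="catdoc"):
    """Return True if `converter path` exits cleanly and prints some text."""
    try:
        result = subprocess.run([converter, path], stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # The converter is not installed or could not be started.
        return False
    return result.returncode == 0 and result.stdout.strip() != b""
```

Following the description above, a .doc file should make this return True, while an .xls file should produce an error or no output and so return False.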

Additionally, I would reorganize your program a bit, something like:

import os
import subprocess

# (extension, substring of file's output) pairs, checked in order.
types = (
    ('rtf', 'Rich Text Format data'),
    ('doc', 'Microsoft Office Document'),
    ('pdf', 'PDF'),
    ('txt', 'ASCII English text'),
)

def get_magic(filename):
    # -b (brief) leaves the file name out of the output.
    pipe = subprocess.Popen(['file', '-b', filename],
                            stdout=subprocess.PIPE, universal_newlines=True)
    output = pipe.communicate()[0]
    return output

def detext(filename):
    fileoutput = get_magic(filename)
    for ext, pattern in types:
        if pattern in fileoutput:
            return ext
    return None

def allfiles(path):
    for root, dirs, files in os.walk(path):
        for each in files:
            fname = os.path.join(root, each)
            yield fname

def fixnames(path):
    for fname in allfiles(path):
        extension = detext(fname)
        print("%s %s" % (fname, extension))  # ....

def main():
    path = os.getcwd()
    fixnames(path)

if __name__ == '__main__':
    main()

Short functions that just do one thing are always best.
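The actual renaming is left as `#....` above. One way to fill it in, sketched as a hypothetical helper `add_extension` that follows Method 1's approach of copying rather than renaming (so the recovered originals stay untouched), and that skips files whose type is unknown or whose target name already exists:

```python
import os
import shutil

def add_extension(fname, extension):
    """Copy fname to fname + '.' + extension; return the new name or None."""
    if extension is None:
        return None  # type not recognised: leave the file alone
    newname = "%s.%s" % (fname, extension)
    if os.path.exists(newname):
        return None  # never overwrite an existing file
    shutil.copyfile(fname, newname)
    return newname
```

Inside `fixnames`, the print line would then become `add_extension(fname, extension)`, keeping each function to a single job.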

To change that to use gnome.vfs, just turn the types list into a
dictionary that maps MIME types to extensions, like
types = {
    'application/msword': 'doc',
    'application/vnd.ms-powerpoint': 'ppt',
}

and then

import gnome.vfs

def get_mime(filename):
    info = gnome.vfs.get_file_info(filename,
                                   gnome.vfs.FILE_INFO_GET_MIME_TYPE)
    return info.mime_type

def detext(filename):
    mime_type = get_mime(filename)
    return types.get(mime_type)
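gnome.vfs belongs to an old GNOME stack, but the same dictionary idea also works with the `--brief --mime-type` options of newer versions of `file`, which print only the MIME type. Whether your `file` reports `application/vnd.ms-excel` for .xls files (rather than a generic type) depends on its version and magic database, so treat that as an assumption to check. In this sketch, `EXTENSIONS`, `file_mime_type` and `extension_for` are hypothetical names, the `xls` entry is an addition to the table above, and `mimetypes.guess_extension` is used as a fallback for types the table does not list:

```python
import mimetypes
import subprocess

# MIME type -> extension, consulted before falling back to mimetypes.
EXTENSIONS = {
    'application/msword': 'doc',
    'application/vnd.ms-excel': 'xls',
    'application/vnd.ms-powerpoint': 'ppt',
}

def file_mime_type(filename):
    """Ask file(1) for the MIME type only, e.g. 'application/pdf'."""
    out = subprocess.run(['file', '--brief', '--mime-type', filename],
                         stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return out.strip()

def extension_for(mime_type):
    """Map a MIME type to an extension without the dot, or None."""
    if mime_type in EXTENSIONS:
        return EXTENSIONS[mime_type]
    ext = mimetypes.guess_extension(mime_type)
    return ext[1:] if ext else None
```

`detext` then reduces to `extension_for(file_mime_type(filename))`.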

--
- Justin

Jul 18 '06 #2
Justin Azoff wrote:
> Additionally, I would re-organize your program a bit. something like:

Thanks, Justin, that was helpful. It is helping me learn Python
programming.

Thanks,
Senthil

Jul 18 '06 #3