Bytes | Software Development & Data Engineering Community

How to determine the MIME type without the file name extension?

Hi all,
I had a filesystem crash, and when I recovered the data, the files had
random names without extensions. I decided to write a script to
determine each file's type and create a new copy with the right
extension.
---
Method 1:
# File extension utility.

import os
import mimetypes
import shutil

def main():
    for root, dirs, files in os.walk(r'C:\Senthil\test'):
        for each in files:
            fname = os.path.join(root, each)
            print fname
            mtype, entype = mimetypes.guess_type(fname)
            # guess_extension(None) raises an error, so skip unknown types.
            if mtype is None:
                continue
            fext = mimetypes.guess_extension(mtype)
            if fext is not None:
                try:
                    newname = fname + fext
                    print newname
                    shutil.copyfile(fname, newname)
                except (IOError, os.error), why:
                    print "Can't copy %s to %s: %s" % (fname, newname, str(why))

if __name__ == "__main__":
    main()

----
The problem I faced with this script is that if the file name has no
extension, mimetypes.guess_type(filename) returns (None, None), so no
extension can be found. How do I get around this problem?
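
That is expected: mimetypes guesses only from the name (its suffix) and never opens the file, so a bare name gives (None, None), and passing that None on to guess_extension() would raise an error. A minimal Python 3 sketch of the behaviour and of a guard for it (extension_for_name is a hypothetical helper, not part of the mimetypes module):

```python
import mimetypes


def extension_for_name(name):
    """Return an extension guessed from *name* alone, or None.

    mimetypes only inspects the name (its suffix), not the file contents,
    so a name without an extension yields (None, None).
    """
    mime_type, _encoding = mimetypes.guess_type(name)
    if mime_type is None:
        # Nothing to go on: guess_extension(None) would fail, so stop here.
        return None
    return mimetypes.guess_extension(mime_type)


print(mimetypes.guess_type("report"))      # (None, None)
print(mimetypes.guess_type("report.pdf"))  # ('application/pdf', None)
print(extension_for_name("report"))        # None
```

So for extensionless files some content-based check is needed instead.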

As it was a Linux box, I tried using the file command to get the work done.
----
Method 2:

import os
import shutil
import re

def detext(filename):
    cin, cout, cerr = os.popen3('file ' + filename)
    fileoutput = cout.read()
    rtf = re.compile('Rich Text Format data')
    doc = re.compile('Microsoft Office Document')
    pdf = re.compile('PDF')

    if rtf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.rtf')
    if doc.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.doc')
    if pdf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.pdf')

def main():
    for root, dirs, files in os.walk(os.getcwd()):
        for each in files:
            fname = os.path.join(root, each)
            detext(fname)

if __name__ == '__main__':
    main()

----
But the problem with file was that it recognized both .xls (MS Excel)
and .doc (MS Word) files as Microsoft Word documents. I need to separate
the .xls and .doc files, and I don't know whether file can help here.
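
The likely reason file lumps them together is that the binary .doc, .xls and .ppt formats are all OLE2 compound documents that start with the same 8-byte signature (D0 CF 11 E0 A1 B1 1A E1). A rough heuristic, sketched below in Python 3, is to search the raw bytes for the stream names Word, PowerPoint and Excel store in the compound file's directory, which are encoded as UTF-16LE ("WordDocument", "PowerPoint Document", "Workbook"). This is an assumption-laden sketch, not a real OLE2 parser: embedded objects (for example a Word object inside a spreadsheet) can fool it, and older Excel 5/95 files name their stream "Book", which it does not check. guess_ole2_extension and STREAM_HINTS are hypothetical names.

```python
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

# Heuristic: stream names as stored (UTF-16LE) in OLE2 directory entries.
# Checked in order, so a Word file with an embedded Excel chart stays "doc".
STREAM_HINTS = (
    ("doc", "WordDocument".encode("utf-16-le")),
    ("ppt", "PowerPoint Document".encode("utf-16-le")),
    ("xls", "Workbook".encode("utf-16-le")),  # Excel 97 and later only
)


def guess_ole2_extension(path):
    """Heuristically tell .doc, .ppt and .xls apart; None if not recognized."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(OLE2_SIGNATURE):
        return None  # not an OLE2 compound document at all
    for ext, hint in STREAM_HINTS:
        if hint in data:
            return ext
    return None
```

For a proper answer, an OLE2 library that lists the stream names would be more reliable than a byte search.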

--
If the first approach with mimetypes could be made to work, that would be great!
Has anyone faced this problem? How did you solve it?

thanks,
Senthil

http://phoe6.livejournal.com

Jul 18 '06 #1
Phoe6 wrote:
> Hi all,
> I had a filesystem crash, and when I recovered the data, the files had
> random names without extensions. I decided to write a script to
> determine each file's type and create a new copy with the right
> extension.
> [...]
> But the problem with file was that it recognized both .xls (MS Excel)
> and .doc (MS Word) files as Microsoft Word documents. I need to separate
> the .xls and .doc files, and I don't know whether file can help here.
You may want to try the gnome.vfs module:

import gnome.vfs

info = gnome.vfs.get_file_info(filename,
                               gnome.vfs.FILE_INFO_GET_MIME_TYPE)
info.mime_type  # MIME type
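
If gnome.vfs is not available, a dependency-free alternative is to read the first few bytes yourself and compare them against well-known magic numbers. The table below is a small, hypothetical selection, written for Python 3; "ole2" is a placeholder meaning "some binary Office file", which still needs a closer look to tell .doc from .xls, and a "zip" result may equally be an OOXML file such as .docx or .xlsx.

```python
# (magic prefix, extension) pairs; the first match wins.
MAGIC_NUMBERS = (
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
    # OLE2 compound document: binary .doc, .xls and .ppt all start with this.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
)


def sniff_extension(path, size=16):
    """Guess an extension from the first *size* bytes of *path*; None if unknown."""
    with open(path, "rb") as f:
        head = f.read(size)
    for magic, ext in MAGIC_NUMBERS:
        if head.startswith(magic):
            return ext
    return None
```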

If all of your documents are .xls and .doc files, you could also use one
of the CLI tools that convert .doc to text, such as catdoc. These tools
will fail on an .xls document, so run one and check its output: .doc
files will produce a lot of output, while .xls files will produce an
error or nothing. The gnome.vfs module is probably your best bet, though :-)
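
The catdoc probe can be wrapped in a small Python 3 helper. tool_produces_text and its min_chars threshold are hypothetical names; the assumption, taken from the advice above, is that the converter prints plenty for a Word file and fails or prints nothing for a spreadsheet. The command is passed as a list, so no shell quoting is needed, and a missing tool simply counts as "no".

```python
import subprocess


def tool_produces_text(path, cmd=("catdoc",), min_chars=1):
    """Run *cmd* on *path*; True if it exits cleanly and prints some text."""
    try:
        result = subprocess.run(
            list(cmd) + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # the tool is not installed (or not executable)
    text = result.stdout.decode("utf-8", errors="replace").strip()
    return result.returncode == 0 and len(text) >= min_chars
```

Raising min_chars makes the check stricter; the right value depends on how much text your Word files contain.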

Additionally, I would reorganize your program a bit, something like:

import os
import subprocess

types = (
    ('rtf', 'Rich Text Format data'),
    ('doc', 'Microsoft Office Document'),
    ('pdf', 'PDF'),
    ('txt', 'ASCII English text'),
)

def get_magic(filename):
    pipe = subprocess.Popen(['file', filename], stdout=subprocess.PIPE)
    output = pipe.stdout.read()
    pipe.wait()
    return output

def detext(filename):
    fileoutput = get_magic(filename)
    for ext, pattern in types:
        if pattern in fileoutput:
            return ext

def allfiles(path):
    for root, dirs, files in os.walk(path):
        for each in files:
            fname = os.path.join(root, each)
            yield fname

def fixnames(path):
    for fname in allfiles(path):
        extension = detext(fname)
        print fname, extension #....

def main():
    path = os.getcwd()
    fixnames(path)

if __name__ == '__main__':
    main()

Short functions that just do one thing are always best.
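
Under Python 3 the same table-driven design needs one change: the pipe yields bytes, and testing a str pattern against bytes raises TypeError, so the output has to be decoded first. The sketch below also passes --brief so that file omits the "filename: " prefix; otherwise a file whose name happens to contain "PDF" would match. ext_from_magic is a hypothetical split of detext so the matching can be tested without running file.

```python
import subprocess

# (extension, substring of file's description) pairs, checked in order.
TYPES = (
    ("rtf", "Rich Text Format data"),
    ("doc", "Microsoft Office Document"),
    ("pdf", "PDF"),
    ("txt", "ASCII English text"),
)


def get_magic(filename):
    """Return the description `file --brief` prints for *filename*."""
    result = subprocess.run(["file", "--brief", filename],
                            stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def ext_from_magic(description):
    """Map a file(1) description to an extension, or None if nothing matches."""
    for ext, pattern in TYPES:
        if pattern in description:
            return ext
    return None


def detext(filename):
    return ext_from_magic(get_magic(filename))
```

The descriptions file prints vary between versions of its magic database, so the patterns in TYPES may need adjusting.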

To change that to use gnome.vfs, just turn the types list into a
dictionary that maps MIME types to extensions, like
types = {
    'application/msword': 'doc',
    'application/vnd.ms-powerpoint': 'ppt',
}

and then

def get_mime(filename):
    info = gnome.vfs.get_file_info(filename,
                                   gnome.vfs.FILE_INFO_GET_MIME_TYPE)
    return info.mime_type

def detext(filename):
    mime_type = get_mime(filename)
    return types.get(mime_type)
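
On a system without gnome.vfs, file can report a MIME type directly with --brief --mime-type, and the dictionary lookup can fall back to the standard mimetypes module for types not listed. A Python 3 sketch; the .xls entry and the fallback are additions for illustration, and the exact MIME strings file prints depend on its magic database version.

```python
import mimetypes
import subprocess

# Explicit mapping first; it takes precedence over mimetypes' own guess.
TYPES = {
    "application/msword": "doc",
    "application/vnd.ms-powerpoint": "ppt",
    "application/vnd.ms-excel": "xls",
}


def get_mime(filename):
    """MIME type from `file --brief --mime-type` (a stand-in for gnome.vfs)."""
    result = subprocess.run(["file", "--brief", "--mime-type", filename],
                            stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("ascii", errors="replace").strip()


def ext_from_mime(mime_type):
    """Extension (without the dot) for *mime_type*, or None if unknown."""
    if mime_type in TYPES:
        return TYPES[mime_type]
    guessed = mimetypes.guess_extension(mime_type)  # e.g. ".pdf", or None
    return guessed.lstrip(".") if guessed else None


def detext(filename):
    return ext_from_mime(get_mime(filename))
```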

--
- Justin

Jul 18 '06 #2
Justin Azoff wrote:
> Additionally, I would reorganize your program a bit, something like:
Thanks Justin, that was helpful. It is helping me learn Python
programming.

Thanks,
Senthil

Jul 18 '06 #3
