parsing directory for certain filetypes

Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist

folder='F:/mysys/code/tstfolder'
parsefolder(folder)

thanks,
RG
Mar 10 '08 #1
sam
royG wrote:
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?

This could probably be rewritten to be much more compact. Maybe you could
capture the output of:

find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

and split it on "\n"?
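
A minimal Python 3 sketch of that suggestion, assuming a Unix-like system with find on the PATH. The name find_text_files is made up for illustration, and subprocess.run with capture_output needs Python 3.7 or later, so this is not something the thread's Python 2 code could have run as-is. Note that find descends into subdirectories, unlike the listdir() version in the original post, and that splitting on newlines breaks if a file name itself contains one.

```python
import subprocess


def find_text_files(dirname):
    """Return sorted paths of .txt/.doc files under dirname, as reported by find."""
    cmd = [
        "find", dirname, "-type", "f",
        # No shell is involved, so the parentheses and globs need no quoting.
        "(", "-name", "*.txt", "-o", "-name", "*.doc", ")",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # One path per output line.
    return sorted(result.stdout.splitlines())
```

Calling find_text_files('/path/to/wherever') would raise subprocess.CalledProcessError if find exits with an error, for example when the directory does not exist.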
--
UFO Occupation
www.totalizm.org
Mar 10 '08 #2
On Mar 10, 8:57 am, royG <roygeor...@gmail.com> wrote:
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?

Try the 'glob' module.

....
Jay
Mar 10 '08 #3
sam
Robert Bossy wrote:
> I leave you the exercise to add .doc files. But I must say (whoever's
> listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
> work.

"{" and "}" brace expansion is a shell extension (bash, among others) and
not part of the POSIX standard, unfortunately.

--
UFO Occupation
www.totalizm.org
Mar 10 '08 #4
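
On the point in post #4 that glob has no brace expansion: one workaround is to expand the braces by hand before globbing. A rough Python 3 sketch follows; glob_braces is a hypothetical helper, not part of the glob module, and it assumes a single, non-nested {a,b,...} group in the pattern.

```python
import glob
import re


def glob_braces(pattern):
    """Expand one {a,b,...} group by hand, then glob each resulting pattern."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group: behave like a sorted glob.glob().
        return sorted(glob.glob(pattern))
    head, tail = pattern[:match.start()], pattern[match.end():]
    results = set()
    for alternative in match.group(1).split(","):
        results.update(glob.glob(head + alternative + tail))
    return sorted(results)
```

With this, glob_braces('*.{txt,doc}') globs '*.txt' and '*.doc' separately and merges the results.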
On Mar 10, 9:28 am, Robert Bossy <Robert.Bo...@jouy.inra.fr> wrote:
> Personally, I'd use glob.glob:
>
> import os.path
> import glob
>
> def parsefolder(folder):
>     path = os.path.normpath(os.path.join(folder, '*.py'))
>     lst = [ fn for fn in glob.glob(path) ]
>     lst.sort()
>     return lst

Why the 'no-op' list comprehension? Typo?

....
Jay
Mar 10 '08 #5
> I wrote a function to parse a given directory and make a sorted list
> of files with .txt and .doc extensions. It works, but I want to know if it
> is too bloated. Can this be rewritten in a more efficient manner?

It seems to me that this is awfully baroque with many unneeded
superfluous variables. Is this not the same functionality (minus
prints, unused result-counting, NOPs, and belt-and-suspenders
extension-checking) as

def parsefolder(dirname):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
        or fname.lower().endswith('.doc')
        ])

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

def parsefolder(dirname, types=['.doc', '.txt']):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(
            fname.lower().endswith(s)
            for s in types)
        ])

which would allow you to do both

parsefolder('/path/to/wherever/')

and

parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

In both cases, you don't define the case where isdir(dirname)
fails. Caveat Implementor.
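
One way to pin down the isdir() failure case is to return an empty list, so callers can always iterate over the result. This is only a sketch of that choice (raising an exception would be just as defensible). It is written for Python 3 and relies on str.endswith accepting a tuple of suffixes, which replaces the any() loop.

```python
import os


def parsefolder(dirname, types=(".doc", ".txt")):
    """Return sorted, normalized paths of entries in dirname ending in any of types."""
    if not os.path.isdir(dirname):
        # Assumption: an empty list is more convenient for callers than None.
        return []
    suffixes = tuple(t.lower() for t in types)
    return sorted(
        os.path.normpath(os.path.join(dirname, fname))
        for fname in os.listdir(dirname)
        if fname.lower().endswith(suffixes)
    )
```

A caller can then write "for path in parsefolder(somedir):" without first checking for None.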

-tkc
[1] http://docs.python.org/lib/built-in-funcs.html


Mar 10 '08 #6
jay graves wrote:
> Why the 'no-op' list comprehension? Typo?

My mistake, it is:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = glob.glob(path)
    lst.sort()
    return lst
Mar 10 '08 #7
On Mar 10, 8:03 pm, Tim Chase wrote:
> In Python2.5 (or 2.4 if you implement the any() function, ripped
> from the docs[1]), this could be rewritten to be a little more
> flexible...something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():

> path = os.path.normpath(os.path.join(folder, '*.txt'))
> lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables, like

path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately.
Or is there another way?

RG
Mar 11 '08 #8
On Mar 11, 6:21 am, royG <roygeor...@gmail.com> wrote:
> is it possible to check for more than one file extension? here i will
> have to create two path variables like
> path1 = os.path.normpath(os.path.join(folder, '*.txt'))
> path2 = os.path.normpath(os.path.join(folder, '*.doc'))
>
> and then use glob separately..
> or is there another way?

I don't think you can match multiple patterns directly with glob, but
`fnmatch` - the module used by glob to check for matches - has a
`translate` function which will convert a glob pattern to a regular
expression (string). So you can do something along the lines of the
following:

---------------------------------------------

import os
from fnmatch import translate
import re

d = '/tmp'
patt1 = '*.log'
patt2 = '*.ini'
patterns = [patt1, patt2]

rx = '|'.join(translate(p) for p in patterns)
patt = re.compile(rx)

for f in os.listdir(d):
    if patt.match(f):
        print f

---------------------------------------------
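
The same idea can also be sketched without building a combined regular expression, by testing each name against each pattern with fnmatch.fnmatch. This is a Python 3 variant for comparison, not Gerard's code; the function name list_matching and its default patterns are chosen here purely for illustration.

```python
import fnmatch
import os


def list_matching(dirname, patterns=("*.txt", "*.doc")):
    """Return sorted names in dirname that match at least one glob pattern."""
    return sorted(
        name
        for name in os.listdir(dirname)
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
    )
```

For example, list_matching('/tmp', ['*.log', '*.ini']) covers the two patterns used above. fnmatch.fnmatch follows the platform's case rules, so matching is case-sensitive on most Unix systems.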

hth

Gerard
Mar 11 '08 #9
On Mar 11, 12:21 am, royG <roygeor...@gmail.com> wrote:
> is it possible to check for more than one file extension? here i will
> have to create two path variables like
> path1 = os.path.normpath(os.path.join(folder, '*.txt'))
> path2 = os.path.normpath(os.path.join(folder, '*.doc'))
>
> and then use glob separately..
> or is there another way?

Use a loop (untested):

import glob
import os.path

def parsefolder(folder):
    lst = []
    for pattern in ('*.txt','*.doc'):
        path = os.path.normpath(os.path.join(folder, pattern))
        lst.extend(glob.glob(path))
    lst.sort()
    return lst
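
One thing to watch with the loop: if two patterns can match the same file (say '*.txt' and 'a*'), that file is listed twice. A small Python 3 sketch that collects the matches in a set avoids this; the patterns parameter is an addition made here for illustration and was not in the post above.

```python
import glob
import os.path


def parsefolder(folder, patterns=("*.txt", "*.doc")):
    """Glob each pattern in folder and return the sorted union of the matches."""
    found = set()
    for pattern in patterns:
        path = os.path.join(folder, pattern)
        # A set keeps each path once even when several patterns match it.
        found.update(os.path.normpath(p) for p in glob.glob(path))
    return sorted(found)
```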

Mar 11 '08 #10
royG wrote:
> is it possible to check for more than one file extension? here i will
> have to create two path variables like
> path1 = os.path.normpath(os.path.join(folder, '*.txt'))
> path2 = os.path.normpath(os.path.join(folder, '*.doc'))
>
> and then use glob separately..

Though it doesn't use glob, the 2nd solution I gave (the one that
uses the any() function you quoted) should be able to handle an
arbitrary number of extensions...

-tkc


Mar 11 '08 #11
