Parsing a directory for certain filetypes

royG

Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can it be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist

folder='F:/mysys/code/tstfolder'
parsefolder(folder)
thanks,
RG

Mar 10 '08 #1


sam

royG wrote:

I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can it be rewritten in a more efficient manner?

Probably this should be rewritten to be much more compact. Maybe you
should just grab the output string of:

find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

and split by "\n"?
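
A minimal Python 3 sketch of this approach, assuming a POSIX `find` is on the PATH: it runs the command through `subprocess.run` and splits the output on newlines. `find_docs` is a hypothetical name. Unlike the original function, `find` descends into subdirectories and matches case-sensitively, and a filename containing a newline would be split incorrectly.

```python
import subprocess

def find_docs(dirname):
    """Return a sorted list of .txt/.doc files under dirname using POSIX find."""
    # Passing arguments as a list avoids shell quoting, so the grouping
    # parentheses need no backslashes; they make -type f apply to both -name tests.
    cmd = ["find", dirname, "-type", "f",
           "(", "-name", "*.txt", "-o", "-name", "*.doc", ")"]
    # check=True raises CalledProcessError if find exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # splitlines() handles the trailing newline; skip any empty strings.
    return sorted(line for line in result.stdout.splitlines() if line)
```
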
--
UFO Occupation
www.totalizm.org

Mar 10 '08 #2

jay graves

On Mar 10, 8:57 am, royG <roygeor...@gmail.com> wrote:

I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can it be rewritten in a more efficient manner?

Try the 'glob' module.

....
Jay

Mar 10 '08 #3

sam

Robert Bossy wrote:

I leave you the exercise of adding .doc files. But I must say (to whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.

Brace expansion with "{" and "}" is a shell extension (bash, csh) and, unfortunately, not part of the POSIX standard.

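Since `glob` does not expand braces, one workaround is to expand the brace groups by hand and glob each resulting pattern. This is a rough Python 3 sketch: `expand_braces` and `glob_braces` are hypothetical helpers (not standard library functions), and they are only meant for simple, non-nested groups like the one above.

```python
import glob
import os
import re

def expand_braces(pattern):
    """Expand simple {a,b,...} groups, e.g. '*.{txt,doc}' -> ['*.txt', '*.doc']."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alt in m.group(1).split(","):
        # Recurse so that any later brace groups are expanded too.
        expanded.extend(expand_braces(head + alt + tail))
    return expanded

def glob_braces(folder, pattern):
    """Glob every expanded pattern in folder; return a sorted, de-duplicated list."""
    matches = set()
    for p in expand_braces(pattern):
        matches.update(glob.glob(os.path.join(folder, p)))
    return sorted(matches)
```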

Mar 10 '08 #4

jay graves

On Mar 10, 9:28 am, Robert Bossy <Robert.Bo...@jouy.inra.fr> wrote:

Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?

....
Jay

Mar 10 '08 #5

Tim Chase

I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can it be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist

folder='F:/mysys/code/tstfolder'
parsefolder(folder)

It seems to me that this is awfully baroque, with many superfluous
variables. Is this not the same functionality (minus the
prints, unused result-counting, no-ops, and belt-and-suspenders
extension-checking) as

def parsefolder(dirname):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
        or fname.lower().endswith('.doc')
        ])
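
Since Python 2.5, `str.endswith` also accepts a tuple of suffixes (the original post already relies on this), so the two `endswith` calls can collapse into one. A Python 3 sketch that keeps the behaviour of the version above, including returning `None` for a non-directory:

```python
from os import listdir
from os.path import isdir, join, normpath

def parsefolder(dirname):
    """Return sorted .txt/.doc paths in dirname, or None if it is not a directory."""
    if not isdir(dirname):
        return None
    return sorted(
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        # endswith() with a tuple is true if any of the suffixes matches.
        if fname.lower().endswith((".txt", ".doc"))
    )
```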

In Python 2.5 (or 2.4 if you implement the any() function yourself, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

def parsefolder(dirname, types=['.doc', '.txt']):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(
            fname.lower().endswith(s)
            for s in types)
        ])
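
For Python 2.4, the built-in functions documentation[1] describes `any()` with a short pure-Python equivalent. A sketch of that equivalent, under the hypothetical name `any_compat` so that it does not shadow the built-in (it runs unchanged on Python 2.4+ and 3):

```python
def any_compat(iterable):
    """Return True if any element of iterable is true (equivalent to any())."""
    for element in iterable:
        if element:
            # Stop at the first true element, just like the built-in.
            return True
    return False
```

Under 2.4 you would then write `any_compat(fname.lower().endswith(s) for s in types)` in the condition above.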

which would allow you to do both

parsefolder('/path/to/wherever/')

and

parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

In both cases, the function just falls through to returning None when
isdir(dirname) fails; that case is left undefined. Caveat implementor.
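
One way to define that case (a design choice, not something proposed in the thread) is to raise an exception instead of silently returning `None`. A Python 3 sketch; the choice of `NotADirectoryError` (Python 3.3+) and the `exts` parameter name are assumptions. A tuple default also sidesteps the mutable-default-argument pitfall of `types=[...]`.

```python
import os

def parsefolder(dirname, exts=(".doc", ".txt")):
    """Return sorted paths in dirname whose names end in one of exts (case-insensitive)."""
    if not os.path.isdir(dirname):
        # Fail loudly so callers cannot mistake "not a directory" for "no matches".
        raise NotADirectoryError(dirname)
    exts = tuple(e.lower() for e in exts)
    return sorted(
        os.path.normpath(os.path.join(dirname, fname))
        for fname in os.listdir(dirname)
        if fname.lower().endswith(exts)
    )
```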

-tkc
[1] http://docs.python.org/lib/built-in-funcs.html

Mar 10 '08 #6

Robert Bossy

jay graves wrote:

On Mar 10, 9:28 am, Robert Bossy <Robert.Bo...@jouy.inra.fr> wrote:

Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?

My mistake; it should be:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = glob.glob(path)
    lst.sort()
    return lst
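
On Python 3.4 and later, `pathlib` does the same job with less path plumbing. This is a swapped-in alternative, not what was posted. `Path.glob` yields `Path` objects, so the sketch converts them back to strings to match the original return type; the `pattern` parameter is an assumption.

```python
from pathlib import Path

def parsefolder(folder, pattern="*.py"):
    """Return a sorted list of path strings in folder matching pattern (non-recursive)."""
    return sorted(str(p) for p in Path(folder).glob(pattern))
```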

Mar 10 '08 #7

royG

On Mar 10, 8:03 pm, Tim Chase wrote:

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob on each separately.
Or is there another way?

RG

Mar 11 '08 #8

Gerard Flanagan

On Mar 11, 6:21 am, royG <roygeor...@gmail.com> wrote:

On Mar 10, 8:03 pm, Tim Chase wrote:

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob on each separately.
Or is there another way?

I don't think you can match multiple patterns directly with glob, but
`fnmatch` (the module glob uses to check for matches) has a
`translate` function which converts a glob pattern to a regular
expression (string). So you can do something along the lines of the
following:

---------------------------------------------

import os
from fnmatch import translate
import re

d = '/tmp'
patt1 = '*.log'
patt2 = '*.ini'
patterns = [patt1, patt2]

rx = '|'.join(translate(p) for p in patterns)
patt = re.compile(rx)

for f in os.listdir(d):
    if patt.match(f):
        print f

---------------------------------------------
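
A Python 3 rendering of the same idea, wrapped in a function that returns sorted full paths instead of printing; `match_patterns` is a hypothetical name. `fnmatch.translate` anchors the end of each regex and `re.match` anchors the start, so joining with `|` means "matches any of the patterns". Note that, unlike `fnmatch.fnmatch`, this compiled regex is case-sensitive on every platform.

```python
import os
import re
from fnmatch import translate

def match_patterns(folder, patterns):
    """Return sorted full paths of entries in folder matching any glob pattern."""
    # Each translated regex is end-anchored; re.match() anchors the start.
    rx = re.compile("|".join(translate(p) for p in patterns))
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if rx.match(name)
    )
```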

hth

Gerard

Mar 11 '08 #9

jay graves

On Mar 11, 12:21 am, royG <roygeor...@gmail.com> wrote:

On Mar 10, 8:03 pm, Tim Chase wrote:
In the version using glob():

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob on each separately.
Or is there another way?

Use a loop (untested):

import glob
import os

def parsefolder(folder):
    lst = []
    for pattern in ('*.txt','*.doc'):
        path = os.path.normpath(os.path.join(folder, pattern))
        lst.extend(glob.glob(path))
    lst.sort()
    return lst
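
The loop generalises to any number of patterns. One caveat: if two patterns can match the same file (say `'*.txt'` and `'notes*'`), `extend` adds that file twice. Collecting into a set first avoids this. A Python 3 sketch; the `patterns` parameter is an assumption.

```python
import glob
import os

def parsefolder(folder, patterns=("*.txt", "*.doc")):
    """Return sorted, de-duplicated paths in folder matching any of patterns."""
    found = set()
    for pattern in patterns:
        path = os.path.normpath(os.path.join(folder, pattern))
        # The set silently drops files already matched by an earlier pattern.
        found.update(glob.glob(path))
    return sorted(found)
```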

Mar 11 '08 #10

Tim Chase

royG wrote:

On Mar 10, 8:03 pm, Tim Chase wrote:

In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():
path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob on each separately.

Though it doesn't use glob, the second solution I gave (the one that
uses the any() function you quoted) should be able to handle an
arbitrary number of extensions.
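
Another way to handle an arbitrary number of extensions, swapped in here rather than taken from the thread, is to compare `os.path.splitext()` against a set. Unlike `split(filenm, '.')[1]` in the original post, `splitext` looks only at the last dot, so `notes.v2.txt` is still found, and a dot-less name such as `footxt` (which passes the original `endswith` test and then makes `nmparts[1]` raise IndexError) is simply skipped. A Python 3 sketch; `exts` is a hypothetical parameter.

```python
import os

def parsefolder(dirname, exts=(".txt", ".doc")):
    """Return sorted paths in dirname whose extension (case-insensitive) is in exts."""
    wanted = {e.lower() for e in exts}
    return sorted(
        os.path.normpath(os.path.join(dirname, fname))
        for fname in os.listdir(dirname)
        # splitext('notes.v2.txt') -> ('notes.v2', '.txt'): only the last dot counts.
        if os.path.splitext(fname)[1].lower() in wanted
    )
```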

-tkc

Mar 11 '08 #11
