Sorting directory contents

Hi folks,

I've got, hmm, not really a problem, more a question of elegance:

In a current project I have to read in some files in a given
directory in chronological order, so that I can concatenate the
contents of those files into a new one (it's XML and I have to
concatenate some subelements, about 4 levels below the root
element). It all works, but somehow I have the feeling that my
solution is not as elegant as it could be:

src_file_paths = dict()
for fname in os.listdir(sourcedir):
    fpath = sourcedir+os.sep+fname
    if not match_fname_pattern(fname): continue
    src_file_paths[os.stat(fpath).st_mtime] = fpath
for ftime in src_file_paths.keys().sort():
    read_and_concatenate(src_file_paths[ftime])

Of course, listdir and sorting could be done in a separate
function, but I wonder whether there is a more elegant approach.

Wolfgang Draxinger
--
E-Mail address works, Jabber: he******@jabber.org, ICQ: 134682867

Feb 20 '07 #1
Wolfgang Draxinger wrote:
> Hi folks,
>
> I've got, hmm, not really a problem, more a question of elegance:
>
> In a current project I have to read in some files in a given
> directory in chronological order, so that I can concatenate the
> contents of those files into a new one (it's XML and I have to
> concatenate some subelements, about 4 levels below the root
> element). It all works, but somehow I have the feeling that my
> solution is not as elegant as it could be:
>
> src_file_paths = dict()
> for fname in os.listdir(sourcedir):
>     fpath = sourcedir+os.sep+fname
>     if not match_fname_pattern(fname): continue
>     src_file_paths[os.stat(fpath).st_mtime] = fpath
> for ftime in src_file_paths.keys().sort():
>     read_and_concatenate(src_file_paths[ftime])
>
> Of course, listdir and sorting could be done in a separate
> function, but I wonder whether there is a more elegant approach.
>
> Wolfgang Draxinger

I'm not claiming the following to be more elegant, but I would do it
like this (not tested!):

src_file_paths = dict()
prefix = sourcedir + os.sep
for fname in os.listdir(sourcedir):
    if match_fname_pattern(fname):
        fpath = prefix + fname
        src_file_paths[os.stat(fpath).st_mtime] = fpath
for ftime in src_file_paths.keys().sort():
    read_and_concatenate(src_file_paths[ftime])

Cheers,
Jussi
Feb 20 '07 #2
Jussi Salmela wrote:
> I'm not claiming the following to be more elegant, but I would
> do it like this (not tested!):
>
> src_file_paths = dict()
> prefix = sourcedir + os.sep
> for fname in os.listdir(sourcedir):
>     if match_fname_pattern(fname):
>         fpath = prefix + fname
>         src_file_paths[os.stat(fpath).st_mtime] = fpath
> for ftime in src_file_paths.keys().sort():
>     read_and_concatenate(src_file_paths[ftime])

Well, neither version, mine or yours, works as written: both
neglect the fact that different files can have the same st_mtime,
and that <listtype>.sort() doesn't return a sorted list.

However this code works (tested) and behaves just like listdir,
only that it sorts files chronologically, then alphabetically.

def listdir_chrono(dirpath):
    import os
    files_dict = dict()
    for fname in os.listdir(dirpath):
        mtime = os.stat(dirpath+os.sep+fname).st_mtime
        if not mtime in files_dict:
            files_dict[mtime] = list()
        files_dict[mtime].append(fname)

    mtimes = files_dict.keys()
    mtimes.sort()
    filenames = list()
    for mtime in mtimes:
        fnames = files_dict[mtime]
        fnames.sort()
        for fname in fnames:
            filenames.append(fname)
    return filenames

Wolfgang Draxinger
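
The two problems named in this post can be reproduced without touching the file system. The following is only a minimal sketch, assuming Python 3; the file names and timestamps are made up for illustration. It shows that a dict keyed on st_mtime keeps just one file per timestamp, and that list.sort() sorts in place and returns None, while sorted() returns a new list.

```python
# Hypothetical (mtime, name) pairs; two files share the timestamp 100.0.
samples = [(100.0, "b.xml"), (100.0, "a.xml"), (50.0, "c.xml")]

# Problem 1: a dict keyed on mtime keeps only one file per timestamp.
by_mtime = {}
for mtime, name in samples:
    by_mtime[mtime] = name
lost = len(samples) - len(by_mtime)  # number of files silently overwritten

# Problem 2: list.sort() sorts in place and returns None, so its result
# cannot be iterated over. (In Python 3, dict.keys() is a view without a
# .sort() method, hence the explicit list.)
keys = list(by_mtime)
result_of_sort = keys.sort()  # None; keys itself is now sorted

# sorted() returns a new sorted list instead.
ordered_keys = sorted(by_mtime)
```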

Feb 20 '07 #3
Wolfgang Draxinger wrote:
> Well, neither version, mine or yours, works as written: both
> neglect the fact that different files can have the same st_mtime,
> and that <listtype>.sort() doesn't return a sorted list.
>
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.
>
> def listdir_chrono(dirpath):
>     import os
>     files_dict = dict()
>     for fname in os.listdir(dirpath):
>         mtime = os.stat(dirpath+os.sep+fname).st_mtime
>         if not mtime in files_dict:
>             files_dict[mtime] = list()
>         files_dict[mtime].append(fname)
>
>     mtimes = files_dict.keys()
>     mtimes.sort()
>     filenames = list()
>     for mtime in mtimes:
>         fnames = files_dict[mtime]
>         fnames.sort()
>         for fname in fnames:
>             filenames.append(fname)
>     return filenames
>
> Wolfgang Draxinger

Four suggestions:

1) You might want to use os.path.join(dirpath, fname) instead of
dirpath+os.sep+fname.

2) You may be able to use glob.glob(<pattern>) to filter the files
more easily.

3) You didn't handle the possibility that there is a subdirectory
   in the current directory. You need to check to make sure it is
   a file you are processing, as os.listdir() returns files AND
   directories.

4) If you just put a tuple containing (mtime, filename) in a list
   each time through the loop, you can just sort that list at the
   end; it will be sorted by mtime and then alphabetically.

Example (not tested):

def listdir_chrono(dirpath):
    import os
    #
    # Get a list of full pathnames for all the files in dirpath
    # and exclude all the subdirectories. Note: This might be
    # able to be replaced by glob.glob() to simplify. I would then
    # add a second optional parameter: mask="" that would allow me
    # to pass in a mask.
    #
    # List comprehensions are our friend when we are processing
    # lists of things.
    #
    files=[os.path.join(dirpath, x) for x in os.listdir(dirpath)
           if not os.path.isdir(os.path.join(dirpath, x))]

    #
    # Get a list of tuples that contain (mtime, filename) that
    # I can sort.
    #
    flist=[(os.stat(x).st_mtime, x) for x in files]

    #
    # Sort them. Sort will sort on mtime, then on filename
    #
    flist.sort()
    #
    # Extract a list of the filenames only and return it
    #
    return [x[1] for x in flist]
    #
    # or if you only want the basenames of the files
    #
    #return [os.path.basename(x[1]) for x in flist]

-Larry Bates
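
The glob.glob() variant with an optional mask, which the comment in the example only hints at, might look roughly like this. It is a sketch under assumptions, not the poster's code: it targets Python 3, the default mask "*" is a choice made here, and unlike os.listdir() a "*" mask does not match names starting with a dot.

```python
import glob
import os


def listdir_chrono(dirpath, mask="*"):
    """Return paths of regular files in dirpath that match mask,
    oldest first; equal mtimes are ordered by path."""
    candidates = glob.glob(os.path.join(dirpath, mask))
    # (mtime, path) tuples compare element by element, so a plain sort
    # orders by mtime first and by path second.
    decorated = [(os.stat(path).st_mtime, path)
                 for path in candidates if os.path.isfile(path)]
    decorated.sort()
    return [path for _, path in decorated]
```

Calling listdir_chrono(sourcedir, "*.xml") would then take over the job of the custom filename filter, provided a shell-style pattern is expressive enough for it.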

Feb 20 '07 #4
Wolfgang Draxinger <wd********@darkstargames.de> writes:
> src_file_paths = dict()
> for fname in os.listdir(sourcedir):
>     fpath = sourcedir+os.sep+fname
>     if not match_fname_pattern(fname): continue
>     src_file_paths[os.stat(fpath).st_mtime] = fpath
> for ftime in src_file_paths.keys().sort():
>     read_and_concatenate(src_file_paths[ftime])

Note that you have to use sorted() and not .sort() to get back a
value that you can iterate through.

Untested:

import os
from itertools import ifilter

goodfiles = ifilter(match_fname_pattern,
                    (sourcedir+os.sep+fname
                     for fname in os.listdir(sourcedir)))

# the generator expression needs its own parentheses because
# another argument (key=...) follows it
for f,t in sorted(((fname,os.stat(fname).st_mtime) for fname in goodfiles),
                  key=lambda (fname,ftime): ftime):
    read_and_concatenate(f)

If you're a lambda-phobe you can use operator.itemgetter(1) instead of
the lambda. Obviously you don't need the separate goodfiles variable
but things get a bit deeply nested without it.
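
itertools.ifilter and tuple-unpacking lambda parameters exist only in Python 2. For Python 3, the same idea with operator.itemgetter(1) in place of the lambda could be sketched as follows. The function name files_by_mtime and the body of match_fname_pattern are hypothetical stand-ins, since the original filter was never shown in the thread.

```python
import os
from operator import itemgetter


def match_fname_pattern(path):
    # Hypothetical stand-in for the poster's filter: keep *.xml files.
    return path.endswith(".xml")


def files_by_mtime(sourcedir):
    """Return the matching paths in sourcedir, oldest first."""
    # Python 3's built-in filter() is lazy, as itertools.ifilter was.
    goodfiles = filter(match_fname_pattern,
                       (os.path.join(sourcedir, fname)
                        for fname in os.listdir(sourcedir)))
    pairs = ((path, os.stat(path).st_mtime) for path in goodfiles)
    # itemgetter(1) picks the mtime out of each (path, mtime) pair.
    return [path for path, _ in sorted(pairs, key=itemgetter(1))]
```

Because the key looks at the mtime only, files with identical timestamps keep whatever order os.listdir() happened to return them in.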
Feb 20 '07 #5
Wolfgang Draxinger wrote:
> Well, neither version, mine or yours, works as written: both
> neglect the fact that different files can have the same st_mtime,
> and that <listtype>.sort() doesn't return a sorted list.
>
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.
>
> def listdir_chrono(dirpath):
>     import os
>     files_dict = dict()
>     for fname in os.listdir(dirpath):
>         mtime = os.stat(dirpath+os.sep+fname).st_mtime
>         if not mtime in files_dict:
>             files_dict[mtime] = list()
>         files_dict[mtime].append(fname)
>
>     mtimes = files_dict.keys()
>     mtimes.sort()
>     filenames = list()
>     for mtime in mtimes:
>         fnames = files_dict[mtime]
>         fnames.sort()
>         for fname in fnames:
>             filenames.append(fname)
>     return filenames
>
> Wolfgang Draxinger

More elegant or not ... I did it MY WAYYYY!!! (and tested this time
really carefully;)):

#-------------------------------
def listdir_chrono_2(dirpath):
    import os
    files_dict = {}
    prefix = dirpath + os.sep
    for fname in os.listdir(dirpath):
        mtime = os.stat(prefix + fname).st_mtime
        files_dict.setdefault(mtime, []).append(fname)

    mtimes = sorted(files_dict.keys())
    filenames = []
    for mtime in mtimes:
        filenames += sorted(files_dict[mtime])
    return filenames

# compare against Wolfgang's listdir_chrono from the quoted post
firstLst = listdir_chrono('.')
secondLst = listdir_chrono_2('.')
if firstLst == secondLst:
    print('OK')
else:
    print('ERROR!!!')
#-------------------------------

I keep taking the "dirpath + os.sep" part out of the loop because it is
a loop invariant and doesn't have to be inside the loop.

Cheers,
Jussi
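
The grouping step in listdir_chrono_2 can also be written with collections.defaultdict, which creates the empty list on first access so that setdefault is not needed. This is only a sketch assuming Python 3; the name listdir_chrono_grouped and the use of os.path.join are choices made for the example, not part of the post above.

```python
import os
from collections import defaultdict


def listdir_chrono_grouped(dirpath):
    """List the names in dirpath ordered by mtime, then by name."""
    names_by_mtime = defaultdict(list)
    for fname in os.listdir(dirpath):
        mtime = os.stat(os.path.join(dirpath, fname)).st_mtime
        names_by_mtime[mtime].append(fname)

    filenames = []
    for mtime in sorted(names_by_mtime):
        # names sharing a timestamp are emitted alphabetically
        filenames.extend(sorted(names_by_mtime[mtime]))
    return filenames
```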
Feb 20 '07 #6

Wolfgang Draxinger wrote:
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.
>
> def listdir_chrono(dirpath):
>     import os
>     files_dict = dict()
>     for fname in os.listdir(dirpath):
>         mtime = os.stat(dirpath+os.sep+fname).st_mtime
>         if not mtime in files_dict:
>             files_dict[mtime] = list()
>         files_dict[mtime].append(fname)
>
>     mtimes = files_dict.keys()
>     mtimes.sort()
>     filenames = list()
>     for mtime in mtimes:
>         fnames = files_dict[mtime]
>         fnames.sort()
>         for fname in fnames:
>             filenames.append(fname)
>     return filenames

Using the built-in functions `sorted` and `filter` and the dictionary
method `setdefault` could shorten your code a little:

def listdir_chrono(dirpath):
    import os
    files_dict = {}
    # os.listdir() returns bare names, so join them with dirpath
    # before testing whether they are regular files
    is_file = lambda fname: os.path.isfile(os.path.join(dirpath, fname))
    for fname in filter(is_file, os.listdir(dirpath)):
        mtime = os.stat(os.path.join(dirpath, fname)).st_mtime
        files_dict.setdefault(mtime, []).append(fname)

    filenames = []
    for mtime in sorted(files_dict):
        for fname in sorted(files_dict[mtime]):
            filenames.append(fname)
    return filenames

--
HTH,
Rob

Feb 20 '07 #7
Larry Bates wrote:
> 3) You didn't handle the possibility that there is a subdirectory
>    in the current directory. You need to check to make sure it is
>    a file you are processing, as os.listdir() returns files AND
>    directories.

Well, the directory the files are in is not supposed to have any
subdirectories.

> 4) If you just put a tuple containing (mtime, filename) in a list
>    each time through the loop, you can just sort that list at the
>    end; it will be sorted by mtime and then alphabetically.

Ah, of course. Hmm, seems I was short of caffeine when I hacked
my code :-P

> def listdir_chrono(dirpath):
>     import os
>     #
>     # Get a list of full pathnames for all the files in dirpath
>     # and exclude all the subdirectories. Note: This might be
>     # able to be replaced by glob.glob() to simplify. I would
>     # then add a second optional parameter: mask="" that would
>     # allow me to pass in a mask.
>     #
>     # List comprehensions are our friend when we are processing
>     # lists of things.
>     #
>     files=[os.path.join(dirpath, x) for x in os.listdir(dirpath)
>            if not os.path.isdir(os.path.join(dirpath, x))]
>
>     #
>     # Get a list of tuples that contain (mtime, filename) that
>     # I can sort.
>     #
>     flist=[(os.stat(x).st_mtime, x) for x in files]
>
>     #
>     # Sort them. Sort will sort on mtime, then on filename
>     #
>     flist.sort()
>     #
>     # Extract a list of the filenames only and return it
>     #
>     return [x[1] for x in flist]
>     #
>     # or if you only want the basenames of the files
>     #
>     #return [os.path.basename(x[1]) for x in flist]

Now, THAT is elegant.

Wolfgang Draxinger

Feb 20 '07 #8
Wolfgang Draxinger wrote:
> I've got, hmm, not really a problem, more a question of elegance:
>
> In a current project I have to read in some files in a given
> directory in chronological order, so that I can concatenate the
> contents of those files into a new one (it's XML and I have to
> concatenate some subelements, about 4 levels below the root
> element). It all works, but somehow I have the feeling that my
> solution is not as elegant as it could be:
>
> src_file_paths = dict()
> for fname in os.listdir(sourcedir):
>     fpath = sourcedir+os.sep+fname
>     if not match_fname_pattern(fname): continue
>     src_file_paths[os.stat(fpath).st_mtime] = fpath
> for ftime in src_file_paths.keys().sort():
>     read_and_concatenate(src_file_paths[ftime])
>
> Of course, listdir and sorting could be done in a separate
> function, but I wonder whether there is a more elegant approach.

If glob.glob() is good enough to replace your custom match_fname_pattern()
you can save a few steps:

import glob
import os

pattern = os.path.join(sourcedir, "*.xml")
files = glob.glob(pattern)
# sort the paths in place by modification time, oldest first
files.sort(key=os.path.getmtime)
for fn in files:
    read_and_concatenate(fn)

Peter
Feb 21 '07 #9
Larry Bates wrote:
> def listdir_chrono(dirpath):
>     import os
>     #
>     # Get a list of full pathnames for all the files in dirpath
>     # and exclude all the subdirectories. Note: This might be
>     # able to be replaced by glob.glob() to simplify. I would then
>     # add a second optional parameter: mask="" that would allow me
>     # to pass in a mask.
>     #
>     # List comprehensions are our friend when we are processing
>     # lists of things.
>     #
>     files=[os.path.join(dirpath, x) for x in os.listdir(dirpath)
>            if not os.path.isdir(os.path.join(dirpath, x))]
>
>     #
>     # Get a list of tuples that contain (mtime, filename) that
>     # I can sort.
>     #
>     flist=[(os.stat(x).st_mtime, x) for x in files]
>
>     #
>     # Sort them. Sort will sort on mtime, then on filename
>     #
>     flist.sort()
>     #
>     # Extract a list of the filenames only and return it
>     #
>     return [x[1] for x in flist]
>     #
>     # or if you only want the basenames of the files
>     #
>     #return [os.path.basename(x[1]) for x in flist]
>
> -Larry Bates

And, as in Peter Otten's glob.glob variation, this shortens considerably
by using sort with a key instead of a separate list flist:

files.sort(key=lambda x:(os.stat(x).st_mtime, x))

Cheers,
Jussi
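
The difference between the two sort keys only shows up when timestamps collide. A small self-contained sketch, assuming Python 3 and using two throwaway files whose mtimes are forced to the same value with os.utime (file names and the timestamp are made up for the demonstration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("b.xml", "a.xml"):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write("<root/>")
        os.utime(path, (1000, 1000))  # force identical atime and mtime
        paths.append(path)

    # Key on mtime only: the sort is stable, so ties keep input order.
    by_mtime = sorted(paths, key=os.path.getmtime)
    # Key on (mtime, path): ties are broken alphabetically by path.
    by_mtime_then_name = sorted(
        paths, key=lambda p: (os.stat(p).st_mtime, p))

    names_mtime_only = [os.path.basename(p) for p in by_mtime]
    names_tie_broken = [os.path.basename(p) for p in by_mtime_then_name]
```

With key=os.path.getmtime the result depends on the order in which the paths were supplied, whereas the tuple key gives the same order on every run.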
Feb 21 '07 #10
