Sorting directory contents

Hi folks,

I've got, hmm, not really a problem, more a question of elegance:

In a current project I have to read in some files in a given
directory in chronological order, so that I can concatenate the
contents of those files into a new one (it's XML and I have to
concatenate some subelements, about 4 levels below the root
element). It all works, but somehow I have the feeling that my
solution is not as elegant as it could be:

src_file_paths = dict()
for fname in os.listdir(sourcedir):
    fpath = sourcedir+os.sep+fname
    if not match_fname_pattern(fname): continue
    src_file_paths[os.stat(fpath).st_mtime] = fpath
for ftime in src_file_paths.keys().sort():
    read_and_concatenate(src_file_paths[ftime])

Of course, listdir and sorting could be done in a separate
function, but I wonder if there is a more elegant approach.

Wolfgang Draxinger
--
E-Mail address works, Jabber: he******@jabber.org, ICQ: 134682867

Feb 20 '07 #1
Wolfgang Draxinger wrote:
> of course listdir and sorting could be done in a separate
> function, but I wonder if there was a more elegant approach.

I'm not claiming the following to be more elegant, but I would do it
like this (not tested!):

src_file_paths = dict()
prefix = sourcedir + os.sep
for fname in os.listdir(sourcedir):
    if match_fname_pattern(fname):
        fpath = prefix + fname
        src_file_paths[os.stat(fpath).st_mtime] = fpath
for ftime in src_file_paths.keys().sort():
    read_and_concatenate(src_file_paths[ftime])

Cheers,
Jussi
Feb 20 '07 #2
Jussi Salmela wrote:
> I'm not claiming the following to be more elegant, but I would
> do it like this (not tested!):

Well, both versions, mine and yours, won't work as written,
as they neglect the fact that different files can have
the same st_mtime and that <listtype>.sort() doesn't return a
sorted list.
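
Both pitfalls are easy to reproduce in isolation. The following is only a minimal sketch; the file names and timestamps are made up for illustration and do not come from the project:

```python
# Hypothetical (name, mtime) pairs; two files share the same mtime.
entries = [("b.xml", 100.0), ("a.xml", 100.0), ("c.xml", 50.0)]

# Pitfall 1: keying a dict on mtime silently drops files with equal mtimes.
by_mtime = {}
for name, mtime in entries:
    by_mtime[mtime] = name
surviving = sorted(by_mtime.values())  # "b.xml" was overwritten by "a.xml"

# Pitfall 2: list.sort() sorts in place and returns None,
# so its result cannot be iterated over in a for statement.
mtimes = list(by_mtime.keys())
result_of_sort = mtimes.sort()

# sorted() returns a new, sorted list instead.
ordered = sorted(by_mtime.keys())
```

Here `surviving` holds only two of the three names, and `result_of_sort` is `None`, which is why the `for ... in src_file_paths.keys().sort()` loop cannot work.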

However, this code works (tested) and behaves just like listdir,
except that it sorts the files chronologically, then alphabetically.

def listdir_chrono(dirpath):
    import os
    # Group the file names by modification time.
    files_dict = dict()
    for fname in os.listdir(dirpath):
        mtime = os.stat(dirpath+os.sep+fname).st_mtime
        if not mtime in files_dict:
            files_dict[mtime] = list()
        files_dict[mtime].append(fname)

    # list() keeps this working where keys() returns a view, not a list.
    mtimes = list(files_dict.keys())
    mtimes.sort()
    filenames = list()
    for mtime in mtimes:
        # Names sharing an mtime are ordered alphabetically.
        fnames = files_dict[mtime]
        fnames.sort()
        for fname in fnames:
            filenames.append(fname)
    return filenames

Wolfgang Draxinger

Feb 20 '07 #3
Wolfgang Draxinger wrote:
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.

Four suggestions:

1) You might want to use os.path.join(dirpath, fname) instead of
   dirpath+os.sep+fname.

2) You may be able to use glob.glob(<pattern>) to filter the files
   more easily.

3) You didn't handle the possibility that there is a subdirectory
   in the current directory. You need to check to make sure it is
   a file you are processing, as os.listdir() returns files AND
   directories.

4) If you just put a tuple containing (mtime, filename) in a list
   each time through the loop, you can just sort that list at the
   end; it will be sorted by mtime and then alphabetically.

Example (not tested):

def listdir_chrono(dirpath):
    import os
    #
    # Get a list of full pathnames for all the files in dirpath
    # and exclude all the subdirectories. Note: This might be
    # able to be replaced by glob.glob() to simplify. I would then
    # add a second optional parameter: mask="" that would allow me
    # to pass in a mask.
    #
    # List comprehensions are our friend when we are processing
    # lists of things.
    #
    files=[os.path.join(dirpath, x) for x in os.listdir(dirpath)
           if not os.path.isdir(os.path.join(dirpath, x))]

    #
    # Get a list of tuples that contain (mtime, filename) that
    # I can sort.
    #
    flist=[(os.stat(x).st_mtime, x) for x in files]

    #
    # Sort them. Sort will sort on mtime, then on filename
    #
    flist.sort()
    #
    # Extract a list of the filenames only and return it
    #
    return [x[1] for x in flist]
    #
    # or if you only want the basenames of the files
    #
    #return [os.path.basename(x[1]) for x in flist]
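
The comments in the example mention replacing the listing with glob.glob() and adding an optional mask parameter. One way that might look is sketched below. The function name `listdir_chrono_glob` and the default mask of `"*"` are assumptions made for this sketch, not part of the original suggestion:

```python
import glob
import os


def listdir_chrono_glob(dirpath, mask="*"):
    """Return full paths of the files in dirpath matching mask, oldest first.

    Hypothetical variant of listdir_chrono; the name and the default
    mask of "*" are assumptions made for this sketch.
    """
    # glob.glob() does the listing and the pattern matching in one step.
    candidates = glob.glob(os.path.join(dirpath, mask))
    # Skip subdirectories, as the list comprehension version does.
    files = [p for p in candidates if not os.path.isdir(p)]
    # Sort on (mtime, path): chronologically first, then alphabetically.
    decorated = sorted((os.stat(p).st_mtime, p) for p in files)
    return [p for _, p in decorated]
```

A call such as `listdir_chrono_glob(sourcedir, "*.xml")` would then return only the XML files. Note that glob patterns like `"*"` do not match names starting with a dot, so hidden files are skipped, unlike with os.listdir().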

-Larry Bates

Feb 20 '07 #4
Wolfgang Draxinger <wd********@darkstargames.de> writes:
> for ftime in src_file_paths.keys().sort():
>     read_and_concatenate(src_file_paths[ftime])

Note you have to use sorted() and not .sort() to get back a value
that you can iterate through.

Untested:

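import os

# The built-in filter() is lazy in Python 3; in Python 2 use
# itertools.ifilter to get the same behaviour.
goodfiles = filter(match_fname_pattern,
                   (sourcedir+os.sep+fname
                    for fname in os.listdir(sourcedir)))

# Pair every path with its mtime and sort on the mtime only.
for f,t in sorted(((fpath,os.stat(fpath).st_mtime) for fpath in goodfiles),
                  key=lambda pair: pair[1]):
    read_and_concatenate(f)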

If you're a lambda-phobe you can use operator.itemgetter(1) instead of
the lambda. Obviously you don't need the separate goodfiles variable
but things get a bit deeply nested without it.
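
For reference, the operator.itemgetter alternative could look roughly like this. It is a small sketch on made-up (path, mtime) pairs that stand in for real os.stat() results:

```python
from operator import itemgetter

# Hypothetical (path, mtime) pairs standing in for real os.stat() results.
pairs = [("dir/b.xml", 30.0), ("dir/a.xml", 10.0), ("dir/c.xml", 20.0)]

# itemgetter(1) builds a callable that returns element 1 of its argument,
# so it is equivalent to: lambda pair: pair[1]
by_time = sorted(pairs, key=itemgetter(1))
paths_in_order = [path for path, _ in by_time]
```

Because the key only looks at the mtime, files with equal mtimes keep their incoming order (sorted() is stable) rather than being ordered by name.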
Feb 20 '07 #5
Wolfgang Draxinger wrote:
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.

More elegant or not ... I did it MY WAYYYY!!! (and tested this time
really carefully;)):

#-------------------------------
def listdir_chrono_2(dirpath):
    import os
    files_dict = {}
    prefix = dirpath + os.sep
    for fname in os.listdir(dirpath):
        mtime = os.stat(prefix + fname).st_mtime
        # setdefault() creates the list the first time an mtime is seen.
        files_dict.setdefault(mtime, []).append(fname)

    mtimes = sorted(files_dict.keys())
    filenames = []
    for mtime in mtimes:
        filenames += sorted(files_dict[mtime])
    return filenames

# Compare against Wolfgang's listdir_chrono().
firstLst = listdir_chrono('.')
secondLst = listdir_chrono_2('.')
if firstLst == secondLst: print('OK')
else: print('ERROR!!!')
#-------------------------------

I keep taking the "dirpath + os.sep" part out of the loop because it is
a loop invariant and doesn't have to be inside the loop.

Cheers,
Jussi
Feb 20 '07 #6

Wolfgang Draxinger wrote:
> However this code works (tested) and behaves just like listdir,
> only that it sorts files chronologically, then alphabetically.

Using the builtin functions `sorted` and `filter` and the `setdefault`
method of dictionaries could shorten your code a little:

def listdir_chrono(dirpath):
    import os
    files_dict = {}
    # os.listdir() returns bare names, so build the full path before
    # testing it with os.path.isfile().
    paths = [os.path.join(dirpath, fname) for fname in os.listdir(dirpath)]
    for fpath in filter(os.path.isfile, paths):
        mtime = os.stat(fpath).st_mtime
        files_dict.setdefault(mtime, []).append(os.path.basename(fpath))

    filenames = []
    for mtime in sorted(files_dict):
        for fname in sorted(files_dict[mtime]):
            filenames.append(fname)
    return filenames

--
HTH,
Rob

Feb 20 '07 #7
Larry Bates wrote:
> 3) You didn't handle the possibility that there is a subdirectory
>    in the current directory. You need to check to make sure it
>    is a file you are processing as os.listdir() returns files
>    AND directories.

Well, the directory the files are in is not supposed to have any
subdirectories.

> 4) If you just put a tuple containing (mtime, filename) in a list
>    each time through the loop you can just sort that list at
>    the end it will be sorted by mtime and then alphabetically.

Ah, of course. Hmm, seems I was short of caffeine when I hacked
my code :-P

> flist=[(os.stat(x).st_mtime, x) for x in files]
> flist.sort()
> return [x[1] for x in flist]

Now, THAT is elegant.

Wolfgang Draxinger

Feb 20 '07 #8
Wolfgang Draxinger wrote:
> of course listdir and sorting could be done in a separate
> function, but I wonder if there was a more elegant approach.

If glob.glob() is good enough to replace your custom match_fname_pattern()
you can save a few steps:

import glob
import os

pattern = os.path.join(sourcedir, "*.xml")
files = glob.glob(pattern)
files.sort(key=os.path.getmtime)
for fn in files:
    read_and_concatenate(fn)

Peter
Feb 21 '07 #9
Larry Bates wrote:
> 4) If you just put a tuple containing (mtime, filename) in a list
>    each time through the loop you can just sort that list at the
>    end it will be sorted by mtime and then alphabetically.

And as in Peter Otten's glob.glob variation, this shortens considerably
by using sort with key instead of a separate list flist:

files.sort(key=lambda x: (os.stat(x).st_mtime, x))
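
Wrapped in a function so it can be tried on its own, the idea might be sketched like this. `sort_chrono` is a hypothetical helper name used only here:

```python
import os


def sort_chrono(paths):
    """Sort paths in place by (mtime, path) and return the same list.

    sort_chrono is a hypothetical helper name used only for this sketch.
    """
    # The key is computed once per element; ties on mtime fall back to
    # comparing the path strings, which gives the alphabetical order.
    paths.sort(key=lambda x: (os.stat(x).st_mtime, x))
    return paths
```

Unlike a key of os.path.getmtime alone, the tuple key gives a deterministic order for files that share an mtime.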

Cheers,
Jussi
Feb 21 '07 #10
