  #1  
July 19th, 2005, 12:17 AM
andrea.gavana@agip.it
How To Do It Faster?!?

Hello NG,

in my application, I use os.walk() to walk a BIG directory tree. I need
to retrieve the files, in each sub-directory, that are owned by a
particular user. Since I am on Windows (2000 or XP), this is what I
do:

import os

for root, dirs, files in os.walk(MyBIGDirectory):

    a = os.popen("dir /q /-c /a-d " + root).read().split()

    # Retrieve all file owners
    user = a[18::20]

    # Retrieve all the last modification dates & hours
    date = a[15::20]
    hours = a[16::20]

    # Retrieve all the filenames
    name = a[19::20]

    # Retrieve all the file sizes
    size = a[17::20]

    # Loop through all file owners to see if they belong
    # to that particular owner (a string)
    for util in user:
        if util.find(owner) >= 0:
            pass  # DO SOME PROCESSING

Does anyone know if there is a faster way to do this job?

Thanks to you all.

Andrea.


  #2  
July 19th, 2005, 12:17 AM
Aquila Deus
Re: How To Do It Faster?!?

andrea.gavana@agip.it wrote:
> Hello NG,
>
> in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:
>
> for root, dirs, files in os.walk(MyBIGDirectory):
>
>     a = os.popen("dir /q /-c /a-d " + root).read().split()
>
>     # Retrieve all files owners
>     user = a[18::20]
>
>     # Retrieve all the last modification dates & hours
>     date = a[15::20]
>     hours = a[16::20]
>
>     # Retrieve all the filenames
>     name = a[19::20]
>
>     # Retrieve all the files sizes
>     size = a[17::20]
>
>     # Loop through all files owners to see if they belong
>     # to that particular owner (a string)
>     for util in user:
>         if util.find(owner) >= 0:
>             DO SOME PROCESSING
>
> Does anyone know if there is a faster way to do this job?

You may use "dir /s", which lists everything recursively.

  #3  
July 19th, 2005, 12:17 AM
Max Erickson
Re: How To Do It Faster?!?

I don't quite understand what your program is doing. The user=a[18::20]
looks really fragile/specific to a directory to me. Try something like
this:
>>> a=os.popen("dir /s /q /-c /a-d " + root).read().splitlines()

This should give you the dir output split into lines, for every file below
root (notice that I added '/s' to the dir command). There will be some
extra lines in a that aren't about specific files...

>>> a[0]
' Volume in drive C has no label.'

but the files should be there.

>>> len(a)
232

To get a list containing files owned by a specific user, do something
like:

>>> files=[line.split()[-1] for line in a if owner in line]
>>> len(files)
118

This throws away the directory information. If you need it, using
os.walk() instead of the /s switch to dir should work...
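
As a rough sketch of another way to keep the directory information while still calling dir only once: the /s output starts each directory's block with a "Directory of ..." header, so the parser can remember the most recent header. The function name owned_files is made up for illustration, the header text assumes English-locale output, and, like the one-liner above, it takes the last whitespace-separated token as the file name, so names containing spaces will be cut short.

```python
import re

# Assumption: English-locale dir output, where every directory block
# begins with a line such as " Directory of C:\some\path".
DIR_HEADER = re.compile(r"^\s*Directory of (.+)$")


def owned_files(dir_output, owner):
    """Yield (directory, filename) pairs for lines that mention `owner`.

    `dir_output` is the text produced by "dir /s /q /-c /a-d <root>".
    Caveat: the file name is taken as the last token on the line, so
    names with spaces are truncated; a file name that happens to
    contain the owner string will also match.
    """
    current_dir = None
    for line in dir_output.splitlines():
        header = DIR_HEADER.match(line)
        if header:
            # Remember which directory the following file lines belong to.
            current_dir = header.group(1).strip()
            continue
        if current_dir is not None and owner in line:
            yield current_dir, line.split()[-1]
```

It could be fed with something like owned_files(os.popen("dir /s /q /-c /a-d " + root).read(), owner), after importing os.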

max

  #4  
July 19th, 2005, 12:20 AM
Jeremy Bowers
Re: How To Do It Faster?!?

On Thu, 31 Mar 2005 13:38:34 +0200, andrea.gavana wrote:
> Hello NG,
>
> in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:

You should *try* directly retrieving the relevant information from the OS,
instead of spawning a "dir" process. I have no idea how to do that and it
will probably require the win32 extensions for Python.
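
For what it's worth, here is a minimal sketch of what that might look like, assuming the pywin32 package is installed. It reads each file's security descriptor with win32security.GetFileSecurity and maps the owner SID to a name with LookupAccountSid. The helper names get_owner_win32 and find_owned_files are made up for illustration, and whether this actually beats spawning dir would have to be measured.

```python
import os


def get_owner_win32(path):
    """Return "DOMAIN\\user" for `path`, or None if it cannot be read.

    Windows only; assumes the pywin32 extensions are installed.
    """
    import pywintypes      # imported lazily so the module loads anywhere
    import win32security

    try:
        sd = win32security.GetFileSecurity(
            path, win32security.OWNER_SECURITY_INFORMATION)
        sid = sd.GetSecurityDescriptorOwner()
        name, domain, _account_type = win32security.LookupAccountSid(None, sid)
    except pywintypes.error:
        # Access denied, or a SID that no longer maps to an account.
        return None
    return domain + "\\" + name


def find_owned_files(top, owner, get_owner=get_owner_win32):
    """Yield full paths below `top` whose owner string contains `owner`.

    `get_owner` is injectable so the walk can be tried out with a
    stand-in lookup on machines without pywin32.
    """
    for root, dirs, files in os.walk(top):
        for fname in files:
            path = os.path.join(root, fname)
            file_owner = get_owner(path)
            if file_owner and owner in file_owner:
                yield path
```

Because find_owned_files is a generator, processing can start on the first match instead of waiting for the whole tree to be scanned.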

After that, you're done. Odds are you'll be disk-bound. In fact, you may
get no gain if Windows is optimized enough that the process you describe
above is *still* disk-bound.

Your only hope then is one of two things:

* Poke around in the Windows API for a function that does what you want,
and hope it can do it faster due to being in the kernel.

* Somehow work this out to be lazy so it tries to grab what the user is
looking at, instead of absolutely everything. Whether or not this will
work depends on your application. If you post more information about how
you are using this data, I can try to help you. (I've had some experience
in this domain, but what is good heavily depends on what you are doing.
For instance, if you're batch processing a whole bunch of records after
the user gave a bulk command, there's not much you can do. But if they're
looking at something in a Windows Explorer-like tree view, there's a lot
you can do to improve responsiveness, even if you can't speed up the
process overall.)
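
To make the "lazy" idea a little more concrete, here is one possible sketch for the tree-view case: owners are looked up for a single directory only when it is first asked for, and the result is cached for later requests. The class name LazyOwnerTree and the get_owner callable (path in, owner string or None out) are hypothetical and not part of any library, and os.scandir needs a reasonably recent Python.

```python
import os


class LazyOwnerTree:
    """Look up file owners one directory at a time, only on demand."""

    def __init__(self, get_owner):
        # get_owner: hypothetical callable mapping a path to an owner
        # string (or None when the owner cannot be determined).
        self._get_owner = get_owner
        self._cache = {}  # directory -> list of (filename, owner)

    def owned_in(self, directory, owner):
        """Return names of files directly in `directory` owned by `owner`."""
        if directory not in self._cache:
            entries = []
            with os.scandir(directory) as it:
                for entry in it:
                    if entry.is_file():
                        entries.append(
                            (entry.name, self._get_owner(entry.path)))
            # Pay the lookup cost once per directory, not once per query.
            self._cache[directory] = entries
        return [name for name, who in self._cache[directory]
                if who and owner in who]
```

A tree view would call owned_in() when the user expands a node, so only the directories actually looked at are ever scanned. The cache is never invalidated in this sketch, which a real application would need to handle.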
 
