File Search

Framework version : 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). I have tried a recursive method to search
for the files. It works fine for small directories, but it takes 2 or 3
minutes to search a directory of 8 GB (say “C:”). Is there a quicker way to
find out whether files of the given type (txt) exist in the given directory,
one that avoids the sequential search of the recursive method?
Feb 14 '06 #1
I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.

Under 2.0 the GetFiles method can accept
System.IO.SearchOption.AllDirectories, which recurses on your behalf. To be
honest, if all you want to do is count the files I'm not sure that this is
the best option, as it will end up returning a relatively big array, and I
don't know (without trying) how optimised it is. It *might* still be your
best option to iterate through the directories, calling GetFiles on each.

The following takes about 8 seconds to search my c: drive:

static void Main(string[] args)
{
    Console.WriteLine(CheckDir(new DirectoryInfo("c:\\")));
}

static int counter = 0; // number of directories visited so far

static int CheckDir(DirectoryInfo di)
{
    int count = 0;
    try
    {   // watch out for permission denied ;-p
        count += di.GetFiles("*.txt").Length;
        foreach (DirectoryInfo subDi in di.GetDirectories())
        {
            count += CheckDir(subDi);
        }
    }
    catch {} // lazy
    if (++counter % 100 == 0) Console.WriteLine(counter); // just to see it is working
    return count;
}
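
For comparison, here is a minimal sketch of the SearchOption.AllDirectories overload mentioned above (2.0 or later only). The class and method names are made up for illustration, and returning -1 on failure is just one possible convention. It mainly shows the trade-off: a single call does the whole walk, but one unreadable directory fails the entire call.

```csharp
using System;
using System.IO;

static class AllDirectoriesSketch
{
    // Returns the number of files matching pattern under root, or -1 if
    // any directory in the tree could not be read.
    public static int CountOrFail(string root, string pattern)
    {
        try
        {
            // One call walks the whole tree and builds the full array of paths.
            return Directory.GetFiles(root, pattern, SearchOption.AllDirectories).Length;
        }
        catch (UnauthorizedAccessException)
        {
            // A single permission denial aborts the call; no partial result is returned.
            return -1;
        }
    }
}
```

Calling AllDirectoriesSketch.CountOrFail("c:\\", "*.txt") is likely to hit the failure branch on a system drive unless the process can read every folder.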
Feb 14 '06 #2

"Dhans" <Dh***@discussions.microsoft.com> wrote in message
news:AB**********************************@microsoft.com...
Framework version : 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). For this I have tried recursive method to
search files, it works fine for the directory which has smaller size, but it
takes 2 or 3 minutes to search the directory of size 8GB (say "C:"). Is there
any other quicker method to identify whether the given file type (txt) is
available in the given directory, to avoid unnecessary sequential search in
recursive method.


It MIGHT be worth eliminating the recursion.

Start with a list of directories (probably just 1) and an (empty) list for
the txt files.
While the directory list is not empty
{
    take the first directory off the list and examine its contents
    append its subdirectories to the directory list
    append its txt files to the file list
}
Feb 14 '06 #3
The same occurred to me; the natural choice here would be a
Queue<DirectoryInfo>, which doesn't exist in 1.1 (as per OP) because generics
only arrived in 2.0. However, timings indicate no appreciable difference in
performance between the recursive function and queueing (some variance both up
and down on repeated tests, but within the same range, indicating that the HDD
is the cause). Clearly the file system is the slow dog. Recursion isn't
necessarily a sensible option for horrendous trees, so it might still be worth
refactoring as per Nick's suggestion.
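
For the 1.1 case, the non-generic System.Collections.Queue can stand in for Queue<DirectoryInfo>. The sketch below is only an illustration (ContainsFile and the class name are hypothetical). It also stops at the first match, on the assumption that the question is simply whether any matching file exists rather than how many.

```csharp
using System;
using System.Collections;
using System.IO;

class FirstMatchSketch
{
    // Breadth-first search that returns as soon as one matching file is found.
    public static bool ContainsFile(string root, string pattern)
    {
        Queue pending = new Queue(); // non-generic queue, so no generics are needed
        pending.Enqueue(new DirectoryInfo(root));
        while (pending.Count > 0)
        {
            DirectoryInfo current = (DirectoryInfo) pending.Dequeue();
            try
            {
                if (current.GetFiles(pattern).Length > 0)
                    return true; // stop early; the rest of the tree is never visited
                foreach (DirectoryInfo sub in current.GetDirectories())
                    pending.Enqueue(sub);
            }
            catch (UnauthorizedAccessException)
            {
                // skip directories we are not allowed to read
            }
            catch (IOException)
            {
                // skip directories that vanished or whose paths are too long
            }
        }
        return false;
    }
}
```

The early exit only helps when a match turns up near the root; if there are no matching files at all, the whole tree is still scanned.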

I ran the tests outside of the debugger, which doubles the performance, to
roughly 4.1s to scan my disk (whichever implementation is used). My
comparison also highlighted that SearchOption.AllDirectories is not really a
very good option, as it breaks too easily: a single permission denial
anywhere in the tree throws and aborts the whole call (unless you are
sa, but of course we don't ever run as admin ;-p).

Code for 2.0 follows:

// fragment: assumes di (root of search) and pattern are in scope,
// inside a method that returns int
Queue<DirectoryInfo> queue = new Queue<DirectoryInfo>();
queue.Enqueue(di); // root of search
int files = 0;
while (queue.Count > 0) {
    DirectoryInfo current = queue.Dequeue();
    try { // watch out for permission denied ;-p
        files += current.GetFiles(pattern).Length; // or put into a List<FileInfo> or something
        foreach (DirectoryInfo subDir in current.GetDirectories()) {
            queue.Enqueue(subDir);
        }
    } catch { } // lazy
}
return files;

Marc
Feb 14 '06 #4
"Marc Gravell" wrote:
I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.
My search takes more or less the same time as the Windows search does.
Under 2.0 the GetFiles method can accept
System.IO.SearchOption.AllDirectories which recurses on your behalf, but to
be honest if all you want to do is count them I'm not even sure that this is
the best option, as this will end up returning a relatively big array;


No, I want the names (full paths) of the files that match the search criteria.
Feb 14 '06 #5
Ahh; you misled me by saying "number of"... but never mind:

Try this; for me under 1.1 this takes 4 seconds to return the 6000+ dll
files on my c: drive (not including UI time to display them), which is about
1/4 of the Windows search time (use the command-line params to select the
root and pattern). What timings do you get with this? How many txt files /
folders are we talking? If the numbers are *very* high, then resizing the
array might be sucking some cycles, in which case eventing or custom
iterators might help...
using System;
using System.IO;
using System.Collections;

namespace ConsoleApplication3
{
    /// <summary>
    /// Lists every file under a root directory that matches a pattern.
    /// </summary>
    class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// args[0] is the root directory, args[1] the search pattern.
        /// </summary>
        [STAThread]
        static int Main(string[] args)
        {
            try
            {
                DateTime start = DateTime.Now;
                FileInfo[] files = GetFiles(args[0], args[1]);
                DateTime stop = DateTime.Now; // stop now as have results
                foreach (FileInfo file in files)
                    Console.WriteLine(file.FullName);
                Console.WriteLine(files.Length);
                Console.WriteLine(stop.Subtract(start).TotalMilliseconds);
                return 0;
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                return -1;
            }
        }

        static FileInfo[] GetFiles(string path, string pattern)
        {
            ArrayList queue = new ArrayList(), files = new ArrayList();
            queue.Add(new DirectoryInfo(path));
            while (queue.Count > 0)
            {
                // take the first directory off the list
                DirectoryInfo dir = (DirectoryInfo) queue[0];
                queue.RemoveAt(0);

                try // watch out for permission denied ;-p
                {
                    files.AddRange(dir.GetFiles(pattern));
                    queue.AddRange(dir.GetDirectories());
                }
                catch {} // lazy
            }
            return (FileInfo[]) files.ToArray(typeof(FileInfo));
        }
    }
}
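
As a rough idea of what the "custom iterators" remark could look like under 2.0 or later, here is a sketch that hands back each matching path as soon as its directory has been read, instead of building one big array first. FindFiles and the class name are invented for this example, and it has not been timed against the version above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LazySearchSketch
{
    // Lazily yields the path of each matching file, breadth-first.
    // Paths are full paths when root is an absolute path.
    public static IEnumerable<string> FindFiles(string root, string pattern)
    {
        Queue<string> pending = new Queue<string>();
        pending.Enqueue(root);
        while (pending.Count > 0)
        {
            string dir = pending.Dequeue();
            string[] files = new string[0];
            try
            {
                files = Directory.GetFiles(dir, pattern);
                foreach (string sub in Directory.GetDirectories(dir))
                    pending.Enqueue(sub);
            }
            catch (UnauthorizedAccessException) { } // unreadable directory: skip it
            catch (IOException) { }                 // vanished or over-long path: skip it

            // yield return is not allowed inside a try block that has a catch
            // clause, so the results are emitted after the try block.
            foreach (string file in files)
                yield return file;
        }
    }
}
```

A caller would loop with foreach (string path in LazySearchSketch.FindFiles(args[0], args[1])) and can display or stop on results while the scan is still running.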
Feb 14 '06 #6
