473,569 Members | 2,598 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

FileSystem search algorithm needed!

Hi

What is the most efficient way to gather a list of files
under a given folder recursively with the ability to
specify exclusion file masks?

For example, say I wanted to get a list of all files
under \Program Files and it's subdirectories that meet
the following criteria:

Include *.exe recursively
Include *.dll recursively
Exclude Common Files\*.* recursively

Ie I want all exe and dlls under Program Files and
subdirectorys, with the exception of any under \Program
files\Common Files.

At the moment, I'm gathering all included files into an
ArrayList and all excluded into another, and then
NAND'ing the two lists. This is turning out to be very
slow.

Any guidance greatfully received.

Ben Fidge
Nov 15 '05 #1
4 7097

"Ben Fidge" <be*******@btop enworld.com> wrote in message
news:09******** *************** *****@phx.gbl.. .
Hi

What is the most efficient way to gather a list of files
under a given folder recursively with the ability to
specify exclusion file masks?

For example, say I wanted to get a list of all files
under \Program Files and it's subdirectories that meet
the following criteria:

Include *.exe recursively
Include *.dll recursively
Exclude Common Files\*.* recursively

Ie I want all exe and dlls under Program Files and
subdirectorys, with the exception of any under \Program
files\Common Files.

At the moment, I'm gathering all included files into an
ArrayList and all excluded into another, and then
NAND'ing the two lists. This is turning out to be very
slow.

Any guidance greatfully received.
This is a fairly quick technique(timin g was 04.6250000 seconds on my machine
without display, thats pretty much on par with cmd.exe which was ~6.5 secs
including listing):

string[] LoadFiles(strin g searchDirectory )
{
ArrayList fileList = new ArrayList();
ArrayList ignoreList = new ArrayList();
ignoreList.Add( searchDirectory + "\\Common Files");

string[] directorys =
System.IO.Direc tory.GetDirecto ries(searchDire ctory);
string[] files;
foreach (string directory in directorys)
{
if (!ignoreList.Co ntains(director y))
{
files = LoadFiles(direc tory);
fileList.AddRan ge(files);
}
}
files = System.IO.Direc tory.GetFiles(s earchDirectory, "*.exe");
fileList.AddRan ge(files);
files = System.IO.Direc tory.GetFiles(s earchDirectory, "*.dll");
fileList.AddRan ge(files);

return (string[])fileList.ToArr ay(typeof(strin g));
}

For a more reusable method, you may want to push ignoreList out as a
parameter(or a class member, depending on the situation) as well as perhaps
providing an extension list instead of hardcoding them.
Ben Fidge

Nov 15 '05 #2
Daniele,

Thanks for the help. I should have specified my criteria
a little more clearly.

Basically, I've written a small backup utility that
allows the backing up of a web-sites content via a Web
Service running on that site.

The site is large (100,000+ aspx, xml, xsl etc files) and
the web service is taking too long to compile the list of
files to download.

The backup utility needs to specify filemasks for
including and excluding in the final result set. So,
hypothetically speaking, a backup set could be as follwos:

include inetpub\wwwroot \*.aspx recursively
include inetpub\wwwroot \*.xml recursively
include inetpub\wwwroot \*.xsl recursively
exclude inetpub\wwwroot \data\*.xml
exclude inetpub\wwwroot \data\savepage. aspx

Your method is certainly very fast, but is it possible to
tailor it for unlimited arbitrary include and exclude
filemasks?

The method I currently use is taking about 90 seconds to
execute on our web server and the Web Service call is
timing out, hence the need for the most efficient file
gathering algorithm.

Thanks for your time, you'll be on my Christmas Card list
if you can solve this one!!

Ben Fidge

-----Original Message-----

"Ben Fidge" <be*******@btop enworld.com> wrote in message
news:09******* *************** ******@phx.gbl. ..
Hi

What is the most efficient way to gather a list of files under a given folder recursively with the ability to
specify exclusion file masks?

For example, say I wanted to get a list of all files
under \Program Files and it's subdirectories that meet
the following criteria:

Include *.exe recursively
Include *.dll recursively
Exclude Common Files\*.* recursively

Ie I want all exe and dlls under Program Files and
subdirectorys, with the exception of any under \Program
files\Common Files.

At the moment, I'm gathering all included files into an
ArrayList and all excluded into another, and then
NAND'ing the two lists. This is turning out to be very
slow.

Any guidance greatfully received.
This is a fairly quick technique(timin g was 04.6250000

seconds on my machinewithout display, thats pretty much on par with cmd.exe which was ~6.5 secsincluding listing):

string[] LoadFiles(strin g searchDirectory )
{
ArrayList fileList = new ArrayList();
ArrayList ignoreList = new ArrayList();
ignoreList.Add( searchDirectory + "\\Common Files");

string[] directorys =
System.IO.Dire ctory.GetDirect ories(searchDir ectory);
string[] files;
foreach (string directory in directorys)
{
if (!ignoreList.Co ntains(director y))
{
files = LoadFiles(direc tory);
fileList.AddRan ge(files);
}
}
files = System.IO.Direc tory.GetFiles (searchDirector y,"*.exe"); fileList.AddRan ge(files);
files = System.IO.Direc tory.GetFiles (searchDirector y,"*.dll"); fileList.AddRan ge(files);

return (string[])fileList.ToArr ay(typeof(strin g));
}

For a more reusable method, you may want to push ignoreList out as aparameter(or a class member, depending on the situation) as well as perhapsproviding an extension list instead of hardcoding them.

Ben Fidge

.

Nov 15 '05 #3

"Ben Fidge" <be*******@btop enworld.com> wrote in message
news:08******** *************** *****@phx.gbl.. .
Daniele,

Thanks for the help. I should have specified my criteria
a little more clearly.

Basically, I've written a small backup utility that
allows the backing up of a web-sites content via a Web
Service running on that site.

The site is large (100,000+ aspx, xml, xsl etc files) and
the web service is taking too long to compile the list of
files to download.

The backup utility needs to specify filemasks for
including and excluding in the final result set. So,
hypothetically speaking, a backup set could be as follwos:

include inetpub\wwwroot \*.aspx recursively
include inetpub\wwwroot \*.xml recursively
include inetpub\wwwroot \*.xsl recursively
exclude inetpub\wwwroot \data\*.xml
exclude inetpub\wwwroot \data\savepage. aspx

Your method is certainly very fast, but is it possible to
tailor it for unlimited arbitrary include and exclude
filemasks?

The method I currently use is taking about 90 seconds to
execute on our web server and the Web Service call is
timing out, hence the need for the most efficient file
gathering algorithm.

Thanks for your time, you'll be on my Christmas Card list
if you can solve this one!!

Ben Fidge


I'm not to fond of this, but it seems to work to an extent. This code runs
across my C: drive(80k files) in 12 seconds. Perhaps filtering out to scan
each folder one time and using regex to match both includes and excludes
will be faster, but I don't have any more time to spend on this today, I
hope it helps:

string[] GetDirectories( string startingPoint, params string[]
searchPatterns)
{
ArrayList directories = new ArrayList();
if (startingPoint == "c:\\System Volume Information")
return (string[])directories.To Array(typeof(st ring));
foreach (string directory in Directory.GetDi rectories(start ingPoint))
{
string[] results = GetDirectories( directory,searc hPatterns);
if (directory == "c:\\System Volume Information")
continue;
foreach (string searchPattern in searchPatterns)
directories.Add (directory + "\\" + searchPattern);
directories.Add Range(results);
}
return (string[])directories.To Array(typeof(st ring));
}
/// <summary>
/// Returns a fileset based on a list of include and exclude patterns
/// </summary>
/// <param name="include"> An IList of wildcard patterns to include</param>
/// <param name="exclude"> An IList of wildcard patterns to exclude</param>
/// <returns>An array of strings containing the resultant set</returns>
string[] LoadFiles(IList include, IList exclude)
{
ArrayList files = new ArrayList();
PrepareFilter(e xclude);
foreach (string mask in include)
{
string directory = Path.GetDirecto ryName(mask);
string fileMask = Path.GetFileNam e(mask);
string[] filesFound = Directory.GetFi les(directory,f ileMask);
filesFound = Filter(filesFou nd,exclude);
files.AddRange( filesFound);
}

return (string[])files.ToArray( typeof(string)) ;
}
/// <summary>
/// Filters out files based on a list of exclusions
/// </summary>
/// <param name="files">Th e files to filter</param>
/// <param name="exclude"> An IList of patterns to use for exclusion
matching</param>
/// <returns>An array of strings that passed the filter</returns>
string[] Filter(string[] files, IList exclude)
{
ArrayList results = new ArrayList();
foreach (string result in files)
{
bool include = true;
foreach (string filter in exclude)
{

Regex regex;
regex = regexCache[filter] as Regex;

if (regex == null)
{
regex = PrepareFilter(f ilter,WildcardT oRegex(filter)) ;
}
if (regex.IsMatch( result))
{
include = false;
break;
}
}
if (include)
results.Add(res ult);
}
return (string[])results.ToArra y(typeof(string ));
}
/// <summary>
/// Prepares filtration by compiling and caching Regex statements
/// </summary>
/// <param name="exclude"> A list of Wildcard patterns to prepare</param>
void PrepareFilter(I List patterns)
{
foreach (string path in patterns)
{
string result = WildcardToRegex (path);
PrepareFilter(p ath,result);
}
}
/// <summary>
/// Prepares and adds a single regex pattern to the regex cache
/// </summary>
/// <param name="name">the pattern name</param>
/// <param name="pattern"> The pattern</param>
/// <returns>A Regex object to perform filtration for that
pattern</returns>
Regex PrepareFilter(s tring name, string pattern)
{
if (regexCache.Con tains(name))
return regexCache[name] as Regex;
Regex regex = new Regex(pattern,R egexOptions.Com piled |
RegexOptions.Ig noreCase );
PrepareFilter(n ame,regex);
return regex;
}
/// <summary>
/// Adds a Regex to the regexcache.
/// </summary>
/// <param name="name">The regex name</param>
/// <param name="regex">Th e Regex object to cache</param>
/// <returns>the Regex passed in as regex, this is done simply to provide
return type compatability with PrepareFilter(s tring,string)</returns>
Regex PrepareFilter(s tring name, Regex regex)
{
if (regexCache.Con tains(name))
return regexCache[name] as Regex;
regexCache.Add( name,regex);
return regex;
}
/// <summary>
/// Converts dos wildcard(*) string into regex in a generic manner
/// </summary>
/// <param name="pattern"> The pattern to convert</param>
/// <returns>The converted regex string.</returns>
/// <remarks>
/// This method does not handle ? in patterns, if this support is needed
it should be added here.
/// </remarks>
string WildcardToRegex (string pattern)
{
pattern = pattern.Replace (@"\",@"\\");
pattern = pattern.Replace ("^",@"\^");
pattern = pattern.Replace (".",@"\.");
pattern = pattern.Replace ("*",".*");
pattern = pattern + "$";
return pattern;
}
IDictionary regexCache = new Hashtable();
Nov 15 '05 #4
Daniel,

Thanks for your time, I will give this a go.

Merry Christmas,

Ben
-----Original Message-----

"Ben Fidge" <be*******@btop enworld.com> wrote in message
news:08******* *************** ******@phx.gbl. ..
Daniele,

Thanks for the help. I should have specified my criteria a little more clearly.

Basically, I've written a small backup utility that
allows the backing up of a web-sites content via a Web
Service running on that site.

The site is large (100,000+ aspx, xml, xsl etc files) and the web service is taking too long to compile the list of files to download.

The backup utility needs to specify filemasks for
including and excluding in the final result set. So,
hypothetically speaking, a backup set could be as follwos:
include inetpub\wwwroot \*.aspx recursively
include inetpub\wwwroot \*.xml recursively
include inetpub\wwwroot \*.xsl recursively
exclude inetpub\wwwroot \data\*.xml
exclude inetpub\wwwroot \data\savepage. aspx

Your method is certainly very fast, but is it possible to tailor it for unlimited arbitrary include and exclude
filemasks?

The method I currently use is taking about 90 seconds to execute on our web server and the Web Service call is
timing out, hence the need for the most efficient file
gathering algorithm.

Thanks for your time, you'll be on my Christmas Card list if you can solve this one!!

Ben Fidge

I'm not to fond of this, but it seems to work to an

extent. This code runsacross my C: drive(80k files) in 12 seconds. Perhaps filtering out to scaneach folder one time and using regex to match both includes and excludeswill be faster, but I don't have any more time to spend on this today, Ihope it helps:

string[] GetDirectories( string startingPoint, params string[]searchPatterns )
{
ArrayList directories = new ArrayList();
if (startingPoint == "c:\\System Volume Information")
return (string[])directories.To Array(typeof(st ring));
foreach (string directory in Directory.GetDi rectories (startingPoint) ) {
string[] results = GetDirectories (directory,sear chPatterns); if (directory == "c:\\System Volume Information")
continue;
foreach (string searchPattern in searchPatterns)
directories.Add (directory + "\\" + searchPattern);
directories.Add Range(results);
}
return (string[])directories.To Array(typeof(st ring));
}
/// <summary>
/// Returns a fileset based on a list of include and exclude patterns /// </summary>
/// <param name="include"> An IList of wildcard patterns to include</param> /// <param name="exclude"> An IList of wildcard patterns to exclude</param> /// <returns>An array of strings containing the resultant set</returns> string[] LoadFiles(IList include, IList exclude)
{
ArrayList files = new ArrayList();
PrepareFilter(e xclude);
foreach (string mask in include)
{
string directory = Path.GetDirecto ryName(mask);
string fileMask = Path.GetFileNam e(mask);
string[] filesFound = Directory.GetFi les (directory,file Mask); filesFound = Filter(filesFou nd,exclude);
files.AddRange( filesFound);
}

return (string[])files.ToArray( typeof(string)) ;
}
/// <summary>
/// Filters out files based on a list of exclusions
/// </summary>
/// <param name="files">Th e files to filter</param>
/// <param name="exclude"> An IList of patterns to use for exclusionmatching</param>
/// <returns>An array of strings that passed the filter</returns> string[] Filter(string[] files, IList exclude)
{
ArrayList results = new ArrayList();
foreach (string result in files)
{
bool include = true;
foreach (string filter in exclude)
{

Regex regex;
regex = regexCache[filter] as Regex;

if (regex == null)
{
regex = PrepareFilter(f ilter,WildcardT oRegex (filter)); }
if (regex.IsMatch( result))
{
include = false;
break;
}
}
if (include)
results.Add(res ult);
}
return (string[])results.ToArra y(typeof(string ));
}
/// <summary>
/// Prepares filtration by compiling and caching Regex statements /// </summary>
/// <param name="exclude"> A list of Wildcard patterns to prepare</param> void PrepareFilter(I List patterns)
{
foreach (string path in patterns)
{
string result = WildcardToRegex (path);
PrepareFilter(p ath,result);
}
}
/// <summary>
/// Prepares and adds a single regex pattern to the regex cache /// </summary>
/// <param name="name">the pattern name</param>
/// <param name="pattern"> The pattern</param>
/// <returns>A Regex object to perform filtration for thatpattern</returns>
Regex PrepareFilter(s tring name, string pattern)
{
if (regexCache.Con tains(name))
return regexCache[name] as Regex;
Regex regex = new Regex(pattern,R egexOptions.Com piled |RegexOptions.I gnoreCase );
PrepareFilter(n ame,regex);
return regex;
}
/// <summary>
/// Adds a Regex to the regexcache.
/// </summary>
/// <param name="name">The regex name</param>
/// <param name="regex">Th e Regex object to cache</param> /// <returns>the Regex passed in as regex, this is done simply to providereturn type compatability with PrepareFilter (string,string) </returns> Regex PrepareFilter(s tring name, Regex regex)
{
if (regexCache.Con tains(name))
return regexCache[name] as Regex;
regexCache.Add( name,regex);
return regex;
}
/// <summary>
/// Converts dos wildcard(*) string into regex in a generic manner /// </summary>
/// <param name="pattern"> The pattern to convert</param> /// <returns>The converted regex string.</returns>
/// <remarks>
/// This method does not handle ? in patterns, if this support is neededit should be added here.
/// </remarks>
string WildcardToRegex (string pattern)
{
pattern = pattern.Replace (@"\",@"\\");
pattern = pattern.Replace ("^",@"\^");
pattern = pattern.Replace (".",@"\.");
pattern = pattern.Replace ("*",".*");
pattern = pattern + "$";
return pattern;
}
IDictionary regexCache = new Hashtable();
.

Nov 15 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
3114
by: pembed2003 | last post by:
Hi all, I asked this question in the C group but no one seems to be interested in answering it. :-( Basically, I wrote a search and replace function so I can do: char source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace);
4
9005
by: Tarique Jawed | last post by:
Alright I needed some help regarding a removal of a binary search tree. Yes its for a class, and yes I have tried working on it on my own, so no patronizing please. I have most of the code working, even the removal, I just don't know how to keep track of the parent, so that I can set its child to the child of the node to be removed. IE - if...
28
3137
by: joshc | last post by:
If I have an array of data that I know to be sorted in increasing order, and the array is less than 50 elements, and I want to find the first element greater than a certain value, is a simple linear search the best here(for loop from beginning to end of array)? It seems like some other search algorithm like binary or whatever would be of no...
60
49016
by: Julie | last post by:
What is the *fastest* way in .NET to search large on-disk text files (100+ MB) for a given string. The files are unindexed and unsorted, and for the purposes of my immediate requirements, can't be indexed/sorted. I don't want to load the entire file into physical memory, memory-mapped files are ok (and preferred). Speed/performance is a...
12
3425
by: Divick | last post by:
Hi all, does any one know of any data structure in which I can search the key and value equally fast? Though I could use hashes, but I can only search in constant time on key but not on value. If I want to search the value then I will have to do a linear search. Another related questions I have are 1. Does any one know a hash function that...
4
3355
by: Dameon | last post by:
Hi All, I have a process where I'd like to search the contents of a file(in a dir) for all occurences (or the count of) of a given string. My goal is to focus more on performance, as some of the files could be upwards of 25mb in size and time is important. I don't want to take the route of loading the text of the file into a giant string...
9
3132
by: Rick | last post by:
I have a large list of objects where each object has a unique (non-overlapping) date range. The list is sorted chronologically. What is the most efficient way to search this list for a single object that spans a specified date?
14
2958
by: S | last post by:
Any idea on how I would be able to do a search within C# that does ranges or words For example I want to search for Chicken in the string string s1 = "This is Great Chicken";
6
12139
Kelicula
by: Kelicula | last post by:
Why?: One commonly used algorithm is the binary search. If you don't already know it, you should read on. Very helpful. Saves much CPU. Reduces computations exponentially. When searching though a lot of information to find a match, the first idea that comes to mind is the linear search. Loop through all the values looking for a match. If...
0
7700
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7614
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
8125
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
0
7974
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6284
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
3653
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3642
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
2114
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1221
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.