File Search

Framework version: 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). For this I have tried a recursive search;
it works fine for small directories, but it takes 2 or 3 minutes to search
a directory of 8GB (say “C:”). Is there any quicker way to identify whether
files of the given type (txt) exist in the given directory, avoiding the
unnecessary sequential search of the recursive method?
Feb 14 '06 #1
I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.

Under 2.0, the GetFiles method can accept
System.IO.SearchOption.AllDirectories, which recurses on your behalf, but to
be honest, if all you want to do is count them, I'm not even sure that this is
the best option, as it will end up returning a relatively big array. I
don't know (without trying) how optimised this is; it *might* still be your
best option to iterate through the directories calling GetFiles.
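For comparison, the 2.0 overload mentioned above reduces the search to a single call. This is a minimal sketch assuming .NET 2.0 or later; `AllDirectoriesSearch` and `CountMatches` are hypothetical names. `Directory.GetFiles` builds the complete array of full paths before returning, and a single directory the caller cannot read makes the whole call throw `UnauthorizedAccessException`, so no partial result comes back:

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the 2.0 SearchOption.AllDirectories overload.
static class AllDirectoriesSearch
{
    // Returns the number of files under 'root' (including all subdirectories)
    // that match 'pattern', or -1 if any directory in the tree was not readable.
    public static int CountMatches(string root, string pattern)
    {
        try
        {
            // The framework does the recursion and returns every full path in one array.
            string[] paths = Directory.GetFiles(root, pattern, SearchOption.AllDirectories);
            return paths.Length;
        }
        catch (UnauthorizedAccessException)
        {
            // One denied directory aborts the entire search; nothing is returned.
            return -1;
        }
    }
}
```

For example, `Console.WriteLine(AllDirectoriesSearch.CountMatches(@"c:\", "*.txt"));` will typically print -1 when run against a whole system drive without admin rights.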

The following takes about 8 seconds to search my c: drive:

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(CheckDir(new DirectoryInfo("c:\\")));
    }

    static int counter = 0; // directories visited so far (progress output only)

    // Recursively counts *.txt files in 'di' and all of its subdirectories.
    static int CheckDir(DirectoryInfo di)
    {
        int count = 0;
        try
        { // watch out for permission denied ;-p
            count += di.GetFiles("*.txt").Length;
            foreach (DirectoryInfo subDi in di.GetDirectories())
            {
                count += CheckDir(subDi);
            }
        }
        catch {} // lazy
        if (++counter % 100 == 0) Console.WriteLine(counter); // just to see it is working
        return count;
    }
}
Feb 14 '06 #2

"Dhans" <Dh***@discussions.microsoft.com> wrote in message
news:AB**********************************@microsof t.com...
Framework version : 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). For this I have tried recursive method to
search files, it works fine for the directory which has smaller size, but it
takes 2 or 3 minutes to search the directory of size 8GB (say "C:"). Is there
any other quicker method to identify whether the given file type (txt) is
available in the given directory, to avoid unnecessary sequential search in
recursive method.


It MIGHT be worth eliminating the recursion.

Start with a list of directories (probably just 1) and an (empty) list for
the txt files.
While the directory list is not empty
{
    take the first directory off the list and examine its content
    append its subdirectories to the directory list
    append its txt files to the file list
}
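Nick's outline translates fairly directly into 1.1 code: the non-generic System.Collections.Queue already exists in 1.1, so generics are not needed for the directory list. This is a minimal sketch; `IterativeSearch` and `FindFiles` are hypothetical names, and directories that cannot be read are simply skipped:

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical 1.1-compatible helper: breadth-first search without recursion.
class IterativeSearch
{
    // Returns the full paths of all files under 'root' that match 'pattern'.
    public static string[] FindFiles(string root, string pattern)
    {
        Queue dirs = new Queue();          // directories still to examine
        ArrayList found = new ArrayList(); // full paths of matching files
        dirs.Enqueue(new DirectoryInfo(root));

        while (dirs.Count > 0)
        {
            // Take the first directory off the list and examine its content.
            DirectoryInfo dir = (DirectoryInfo) dirs.Dequeue();
            try
            {
                // Append matching files to the file list.
                foreach (FileInfo file in dir.GetFiles(pattern))
                    found.Add(file.FullName);
                // Append subdirectories to the directory list.
                foreach (DirectoryInfo sub in dir.GetDirectories())
                    dirs.Enqueue(sub);
            }
            catch (UnauthorizedAccessException)
            {
                // Skip directories we are not allowed to read.
            }
        }
        return (string[]) found.ToArray(typeof(string));
    }
}
```

Because the pending directories live in a heap-allocated queue rather than on the call stack, the depth of the tree no longer limits the search; the disk I/O cost is the same as with recursion.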
Feb 14 '06 #3
The same occurred to me; the natural choice here would be a
Queue<DirectoryInfo>, which obviously doesn't exist in 1.1 (as per OP), since
generics arrived in 2.0. However, timings indicate no appreciable difference
in performance between recursive functions and queueing (some variance both up
and down on repeated tests, but within the same range, indicating the HDD is
the cause). Clearly the file system is the slow dog. Recursion isn't
necessarily a sensible option for horrendously deep trees, though, so it might
be worth refactoring as per Nick's suggestion.

I ran the tests outside of the debugger, which doubles the performance to
roughly 4.1s to scan my disk (with any implementation). My comparison also
highlighted that SearchOption.AllDirectories is not really a very good
option, as it breaks too easily: a single permission-denied directory makes
the whole call throw (unless you are sa, but of course we don't ever run as
admin ;-p).

Code for 2.0 follows:

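using System.Collections.Generic;
using System.IO;

static class FileSearch
{
    // Counts files matching 'pattern' under 'di' using a queue instead of recursion.
    static int CountFiles(DirectoryInfo di, string pattern)
    {
        Queue<DirectoryInfo> queue = new Queue<DirectoryInfo>();
        queue.Enqueue(di); // root of search
        int files = 0;
        while (queue.Count > 0)
        {
            DirectoryInfo current = queue.Dequeue();
            try
            { // watch out for permission denied ;-p
                files += current.GetFiles(pattern).Length; // or put into a List<FileInfo> or something
                foreach (DirectoryInfo subDir in current.GetDirectories())
                {
                    queue.Enqueue(subDir);
                }
            }
            catch { } // lazy
        }
        return files;
    }
}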

Marc
Feb 14 '06 #4
"Marc Gravell" wrote:
I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.
My search takes more or less the same time as the Windows search.
Under 2.0 the GetFiles method can accept
System.IO.SearchOption.AllDirectories which recurses on your behalf, but to
be honest if all you want to do is count them I'm not even sure that this is
the best option, as this will end up returning a relatively big array;


No, I want the file names (full paths) that match the search criteria.
Feb 14 '06 #5
Ahh; you misled me by saying "number of"... but never mind:

Try this; under 1.1 it takes me 4 seconds to return the 6000+ dll files on
my c: drive (not including the UI time to display them), about 1/4 of the
Windows search time (pass the root and the pattern as the two command-line
arguments). What timings do you get with this? How many txt files / folders
are we talking about? If the numbers are *very* high, then resizing the
array might be eating some cycles, in which case eventing or custom
iterators might help...
using System;
using System.IO;
using System.Collections;

namespace ConsoleApplication3
{
    /// <summary>
    /// Lists all files under a root directory that match a pattern.
    /// Expects two arguments: the root directory and the pattern (e.g. c:\ *.dll).
    /// </summary>
    class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static int Main(string[] args)
        {
            try
            {
                DateTime start = DateTime.Now;
                FileInfo[] files = GetFiles(args[0], args[1]);
                DateTime stop = DateTime.Now; // stop now as have results
                foreach (FileInfo file in files)
                    Console.WriteLine(file.FullName);
                Console.WriteLine(files.Length);
                Console.WriteLine(stop.Subtract(start).TotalMilliseconds);
                return 0;
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                return -1;
            }
        }

        // Breadth-first search without recursion: directories waiting to be
        // examined sit in 'queue', matching files accumulate in 'files'.
        static FileInfo[] GetFiles(string path, string pattern)
        {
            ArrayList queue = new ArrayList(), files = new ArrayList();
            queue.Add(new DirectoryInfo(path));
            while (queue.Count > 0)
            {
                DirectoryInfo dir = (DirectoryInfo) queue[0];
                queue.RemoveAt(0);

                try // watch out for permission denied ;-p
                {
                    files.AddRange(dir.GetFiles(pattern));
                    queue.AddRange(dir.GetDirectories());
                }
                catch {} // lazy
            }
            return (FileInfo[]) files.ToArray(typeof(FileInfo));
        }
    }
}
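The "custom iterators" Marc mentions are a C# 2.0 feature (`yield return`), so this does not apply to the OP's 1.1 setup as-is. This is a minimal sketch under that assumption; `LazySearch` and `EnumerateMatches` are hypothetical names. Matches are handed to the caller directory by directory instead of being collected into one large array first, and the caller can stop as soon as it has what it needs:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical C# 2.0+ helper: lazily streams matching files, breadth-first.
static class LazySearch
{
    public static IEnumerable<FileInfo> EnumerateMatches(string root, string pattern)
    {
        Queue<DirectoryInfo> dirs = new Queue<DirectoryInfo>();
        dirs.Enqueue(new DirectoryInfo(root));
        while (dirs.Count > 0)
        {
            DirectoryInfo dir = dirs.Dequeue();
            FileInfo[] files;
            DirectoryInfo[] subDirs;
            try
            {
                files = dir.GetFiles(pattern);
                subDirs = dir.GetDirectories();
            }
            catch (UnauthorizedAccessException)
            {
                continue; // skip directories we cannot read
            }
            // Queue the subdirectories for later examination.
            foreach (DirectoryInfo sub in subDirs)
                dirs.Enqueue(sub);
            // 'yield return' is not allowed inside a try block that has a catch
            // clause, so the results are yielded after the try/catch.
            foreach (FileInfo file in files)
                yield return file;
        }
    }
}
```

A caller that only needs to know whether any match exists can break out of `foreach (FileInfo f in LazySearch.EnumerateMatches(@"c:\", "*.txt"))` after the first item, and the remaining directories are never scanned.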
Feb 14 '06 #6