
How to add a string to a big file in C#

P: n/a
I want to add a string to a file that is sorted alphabetically. For
example, say the following is a big file:
//////////////////////
abort
black
cabbage
dog
egg
fly
..
..
////////////////////
Now I want to add "dad" to it, just after "cabbage" and before
"dog". Because the file holds so many words, I need to use a binary search to
find the location.

using System;
using System.IO;
using System.Text;

/// <summary>
/// Looks for the given word in the sorted file (one word per line).
/// </summary>
/// <param name="word">The word to look for.</param>
private bool Find(string word)
{
    if (word == null)
    {
        throw new ArgumentNullException("word");
    }

    lock (this)
    {
        // file is a FileInfo object
        using (FileStream fs = File.OpenRead(file.FullName))
        {
            // binary search over byte offsets; lower and upper always sit on line starts
            long lower = 0;
            long upper = fs.Length;
            while (lower < upper)
            {
                long index = (lower + upper) / 2;

                // index may land mid-line, so walk back to the start of that line
                long start = index;
                while (start > lower)
                {
                    fs.Seek(start - 1, SeekOrigin.Begin);
                    if (fs.ReadByte() == '\n')
                    {
                        break;
                    }
                    start--;
                }

                // FileStream.Read only fills a byte buffer and knows nothing about lines,
                // so read byte by byte up to the newline (assumes single-byte text)
                fs.Seek(start, SeekOrigin.Begin);
                StringBuilder sb = new StringBuilder();
                int b;
                while ((b = fs.ReadByte()) != -1 && b != '\n')
                {
                    sb.Append((char)b);
                }
                string str = sb.ToString().Trim();

                // string.Compare returns an int, not a bool
                int t = string.Compare(word, str);
                if (t == 0)
                {
                    return true; // found it
                }
                if (t > 0)
                {
                    lower = fs.Position; // start of the next line
                }
                else
                {
                    upper = start;
                }
            }
        }
    }
    return false;
}

That is the method, and my questions are:
1: Is FileStream suitable for this?
2: Is string.Compare suitable for this?
3: Is there a better way to do it?

Thanks to all!
Nov 16 '05 #1
9 Replies


P: n/a
"zjut" <zj**@discussions.microsoft.com> wrote in message
news:A2**********************************@microsof t.com...
I want to add a string to the file and the file is sort by letter! for
examply:
the follow file is a big file
//////////////////////
abort
black
cabbage
dog
egg
fly
.
.
////////////////////
and now i want to add "dad" into it !


Given:
* Your file ("old_file") is in alphabetical order
* old_file is immensely big

Required:
* Adding word ("new_word") to old_file at the right place.

Solution:
* Read old_file sequentially, one word at a time ("word_read"), and write each word_read to new_file
* When new_word sorts after the current word_read and before the next one (word_read+1), write new_word to new_file between them
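
The steps above could be sketched in C# roughly as follows. This is only a minimal sketch, assuming one word per line and ordinal (character code) ordering. The names SortedFileInsert, InsertSequential and newPath are made up for illustration and do not come from the post.

```csharp
using System;
using System.IO;

static class SortedFileInsert
{
    // Copies oldPath to newPath line by line and writes newWord just before
    // the first line that does not sort before it.
    // Assumption: the file is sorted by ordinal comparison.
    public static void InsertSequential(string oldPath, string newPath, string newWord)
    {
        using (var reader = new StreamReader(oldPath))
        using (var writer = new StreamWriter(newPath))
        {
            bool inserted = false;
            string wordRead;
            while ((wordRead = reader.ReadLine()) != null)
            {
                if (!inserted && string.CompareOrdinal(newWord, wordRead) <= 0)
                {
                    writer.WriteLine(newWord);
                    inserted = true;
                }
                writer.WriteLine(wordRead);
            }
            if (!inserted)
            {
                // newWord sorts after every word in the file
                writer.WriteLine(newWord);
            }
        }
    }
}
```

The whole file is rewritten once, so the cost grows linearly with the file size, but nothing beyond the current line has to be held in memory.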

Nov 16 '05 #2

P: n/a
"zjut" <zj**@discussions.microsoft.com> wrote in message
news:A2**********************************@microsof t.com...
I want to add a string to the file and the file is sort by letter! for


If you are having problems with the algorithm let me know and I will post an
example. The example sorts a short alphabetically ordered file into a very
big alphabetically ordered file. By the way, WordPerfect can deal with
immensely big files (100,000+ words). Microsoft Word can't.
Nov 16 '05 #3

P: n/a
I see another NG member has already given you a possible
solution, but I don't feel it would be an optimal solution... You
really have a couple of different options that all revolve around the
same set of principles... First, you know the size of the word
you are inserting, so you'll want to open the file and then grow
it by that amount. This gives you the room you need to shift the
rest of the file once you have found the insertion point. For a
sanity check, go ahead and check the first and last element to make
sure this isn't a trivial case.

Okay, the binary search is going to involve cutting the file in half
(you can do this based on length) and then seeking to that location.
Once you've done that, you are going to walk backwards and
forwards until you encounter newlines on either side. This'll be
your *word*, and you'll compare it and continue the process of
cutting the file in half (aka a binary search)... Once you've found
your insertion location, you are going to do large buffer copies (4K
is probably best) of bytes moving all of the end elements into that
space you allocated in the beginning. With that done, write your
word into place. You've just managed an in place insertion.
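
A rough sketch of that in-place insertion is below. It rests on several assumptions that are not in the post: lines end with "\n", the file ends with a newline, the text is single-byte (ASCII), and the words are in ordinal order. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class InPlaceInsert
{
    // Walks backwards from pos to the start of the line that contains it.
    static long LineStart(FileStream fs, long pos)
    {
        while (pos > 0)
        {
            fs.Seek(pos - 1, SeekOrigin.Begin);
            if (fs.ReadByte() == '\n') break;
            pos--;
        }
        return pos;
    }

    // Reads the line beginning at start; fs.Position ends up just past its newline.
    static string ReadLineAt(FileStream fs, long start)
    {
        fs.Seek(start, SeekOrigin.Begin);
        var sb = new StringBuilder();
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n')
        {
            sb.Append((char)b);
        }
        return sb.ToString();
    }

    public static void Insert(string path, string word)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Binary search over byte offsets for the insertion point.
            long lower = 0, upper = fs.Length;
            while (lower < upper)
            {
                long start = LineStart(fs, (lower + upper) / 2);
                string line = ReadLineAt(fs, start);
                if (string.CompareOrdinal(word, line) > 0) lower = fs.Position;
                else upper = start;
            }

            // Grow the file by the size of the new entry.
            byte[] entry = Encoding.ASCII.GetBytes(word + "\n");
            long oldLength = fs.Length;
            fs.SetLength(oldLength + entry.Length);

            // Shift the tail towards the end in 4K chunks, last chunk first,
            // so no unread bytes are overwritten.
            byte[] buffer = new byte[4096];
            long remaining = oldLength - lower;
            while (remaining > 0)
            {
                int chunk = (int)Math.Min(buffer.Length, remaining);
                long src = lower + remaining - chunk;
                fs.Seek(src, SeekOrigin.Begin);
                int read = 0;
                while (read < chunk) read += fs.Read(buffer, read, chunk - read);
                fs.Seek(src + entry.Length, SeekOrigin.Begin);
                fs.Write(buffer, 0, chunk);
                remaining -= chunk;
            }

            // Write the new word into the gap.
            fs.Seek(lower, SeekOrigin.Begin);
            fs.Write(entry, 0, entry.Length);
        }
    }
}
```

The search itself touches only a handful of lines, but everything after the insertion point still has to be moved, so an insert near the start of the file costs about as much as rewriting it.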

If you have multiple words to merge, then merge sorting and other
heuristics come into play. Get your basic algorithm and then think
about refactoring.

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #4

P: n/a
Zach <no*@this.address> wrote:
I want to add a string to the file and the file is sort by letter! for


If you are having problems with the algorithm let me know and I will post an
example. The example sorts a short alphabetically ordered file into a very
big alphabetically ordered file. By the way, WordPerfect can deal with
immensely big files (100,000+ words). Microsoft Word can't.


While I wouldn't be surprised if Word had some limits somewhere, Word
can certainly cope with 100,000+ words easily. I just created a
document with over 300,000 words, and Word didn't have any problems
with it.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #5

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
Zach <no*@this.address> wrote:
I want to add a string to the file and the file is sort by letter! for


If you are having problems with the algorithm let me know and I will post an example. The example sorts a short alphabetically ordered file into a very big alphabetically ordered file. By the way, WordPerfect can deal with
immensely big files (100,000+ words). Microsoft Word can't.


While I wouldn't be surprised if Word had some limits somewhere, Word
can certainly cope with 100,000+ words easily. I just created a
document with over 300,000 words, and Word didn't have any problems
with it.


I had to process > 100,000 words - sort them and so on to create a
spellcheck vocabulary, and Word wouldn't do it. Message saying it couldn't
handle the volume. WordPerfect had no problems sorting >100,000 words etc.
(So I wrote some software for the job.)
Nov 16 '05 #6

P: n/a
"Justin Rogers" <Ju****@games4dotnet.com> wrote in message
news:e$**************@TK2MSFTNGP14.phx.gbl...

NB the words of the OP are in a
file to start with and have to be read
and re-written at least once!

Sequentially reading through the parent file,
slipping in the words from a sorted array,
at their respective right places in the parent
file, whilst checking for duplicates, is fast, simple
to write and has no capacity constraints.
IMO doing binary searches in this situation is silly,
even more so if the new words are in random order.
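
For what it's worth, that single-pass merge might look something like the sketch below. It assumes one word per line and ordinal ordering, sorts a copy of the new words first, and drops duplicates. SortedMerge, Merge and the parameter names are invented for the example.

```csharp
using System;
using System.IO;

static class SortedMerge
{
    // Merges newWords into the sorted file at oldPath, writing the result to
    // newPath in one sequential pass and skipping duplicate words.
    public static void Merge(string oldPath, string newPath, string[] newWords)
    {
        // Sort a copy so the caller's array is left untouched.
        string[] words = (string[])newWords.Clone();
        Array.Sort(words, StringComparer.Ordinal);

        using (var reader = new StreamReader(oldPath))
        using (var writer = new StreamWriter(newPath))
        {
            int i = 0;
            string last = null;
            string line = reader.ReadLine();
            while (line != null || i < words.Length)
            {
                string next;
                if (i >= words.Length ||
                    (line != null && string.CompareOrdinal(line, words[i]) <= 0))
                {
                    // The file's current word comes first (or the array is used up).
                    next = line;
                    line = reader.ReadLine();
                }
                else
                {
                    next = words[i++];
                }

                if (next != last)
                {
                    writer.WriteLine(next);
                    last = next;
                }
            }
        }
    }
}
```

However many words are added, the parent file is read once and written once.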


Nov 16 '05 #7

P: n/a
Zach <no*@this.address> wrote:
While I wouldn't be surprised if Word had some limits somewhere, Word
can certainly cope with 100,000+ words easily. I just created a
document with over 300,000 words, and Word didn't have any problems
with it.


I had to process > 100,000 words - sort them and so on to create a
spellcheck vocabulary, and Word wouldn't do it. Message saying it couldn't
handle the volume. WordPerfect had no problems sorting >100,000 words etc.
(So I wrote some software for the job.)


I would argue that a word processor isn't the right tool for sorting a
vocabulary file anyway. While Word may not be able to sort a document
with over 100,000 words, it's fine when it comes to normal word
processing tasks with the same size of document.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #8

P: n/a
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
Zach <no*@this.address> wrote:
I would argue that a word processor isn't the right tool for sorting a
vocabulary file anyway. While Word may not be able to sort a document
with over 100,000 words, it's fine when it comes to normal word
processing tasks with the same size of document.


Yes, and I wanted to throw out the words that WP didn't know,
because they wouldn't be everyday vocabulary.
Nov 16 '05 #9

P: n/a
I have a few suggestions:

1) Can you just split the names in the files into 26 separate files
such that 1st file has all A's, 2nd file has all B's and so on. I
think that will reduce the amount of text you need to process.

2) You can also try using a B+ tree. Very good for frequent finds and
few updates.

3) Alternatively, why don't you use a hash table to store the hash of
each of the words? That way, when you want to find where a word goes,
compute its hash and you should be able to see which hash should come
before the word you want to add. That way, when you're searching for the
word, it will be much, much faster because you can search for a
specific word, rather than comparing each and every word (i.e. you can
ignore chunks of data using hashes).

Let me know what you decide to do.
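
As an illustration of suggestion 1 only, here is a small sketch that splits the big file into one file per first letter. The "<letter>.txt" naming scheme, the "other" bucket for words that do not start with a-z, and the names LetterBuckets, BucketPath and Split are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LetterBuckets
{
    // Hypothetical naming scheme: "<directory>/<first letter>.txt".
    public static string BucketPath(string directory, string word)
    {
        char first = char.ToLowerInvariant(word[0]);
        string name = (first >= 'a' && first <= 'z') ? first.ToString() : "other";
        return Path.Combine(directory, name + ".txt");
    }

    // Splits the source file into per-letter files. Lines keep their
    // relative order, so each bucket stays sorted if the source was.
    public static void Split(string sourcePath, string directory)
    {
        Directory.CreateDirectory(directory);
        var writers = new Dictionary<string, StreamWriter>();
        try
        {
            foreach (string line in File.ReadLines(sourcePath))
            {
                if (line.Length == 0) continue;
                string path = BucketPath(directory, line);
                StreamWriter w;
                if (!writers.TryGetValue(path, out w))
                {
                    w = new StreamWriter(path);
                    writers[path] = w;
                }
                w.WriteLine(line);
            }
        }
        finally
        {
            foreach (var open in writers.Values) open.Dispose();
        }
    }
}
```

To add "dad" afterwards, only the file returned by BucketPath(directory, "dad") needs to be searched and rewritten.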
Nov 16 '05 #10
