
Longest repeated phrase program

P: n/a
I need help with the following question. THANKS :)

Write a program in c# that takes in a directory as a command line
parameter, and returns the longest repeated phrase in ALL text files
in that directory. Assume that all contents of all the files may be
held in memory at the same time. Do not use unsafe code blocks and/
or pointers.
Aug 20 '08 #1
9 Replies


P: n/a
C#_Help_needed <kh*****@gmail.com> wrote:
> I need help with the following question. THANKS :)
>
> Write a program in c# that takes in a directory as a command line
> parameter, and returns the longest repeated phrase in ALL text files
> in that directory. Assume that all contents of all the files may be
> held in memory at the same time. Do not use unsafe code blocks and/
> or pointers.
That sounds awfully like homework. You're unlikely to get a direct
solution, but if you walk us through what you've got so far and exactly
where you're stuck, we may well be able to nudge you in the right
direction.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Aug 20 '08 #2

P: n/a
On Aug 20, 4:49 pm, "C#_Help_needed" <khya...@gmail.com> wrote:
> I need help with the following question. THANKS :)
>
> Write a program in c# that takes in a directory as a command line
> parameter, and returns the longest repeated phrase in ALL text files
> in that directory. Assume that all contents of all the files may be
> held in memory at the same time. Do not use unsafe code blocks and/
> or pointers.
Is that homework?

What is a phrase? (In your scenario, of course :) )
You could use a dictionary of the form <string, int>, where the string
is the phrase and the int is how many times it appears.
Aug 20 '08 #3

P: n/a
On Wed, 20 Aug 2008 13:49:00 -0700 (PDT), "C#_Help_needed"
<kh*****@gmail.com> wrote:
> I need help with the following question. THANKS :)
>
> Write a program in c# that takes in a directory as a command line
> parameter, and returns the longest repeated phrase in ALL text files
> in that directory. Assume that all contents of all the files may be
> held in memory at the same time. Do not use unsafe code blocks and/
> or pointers.
On the assumption that this is homework, here are some pointers.

1 Write a program to read a directory name from the command line and
echo the name straight back to the console. (Yes, I know that this is
very simple, but do it anyway.) Test this program and correct all
errors until it works perfectly.

2 Modify the program to read the names of all the files in the
directory and echo them to the console. Test and correct until it is
working properly.

3 Modify the program to list only the names of the "text files" and
not the names of the non-text files. Test and correct until it is
working properly. As you will have noticed by now you are working in
small steps from an easy program solving an easy problem towards a not
so easy program solving a more difficult problem. This is a good way
to develop a program.

4 Modify the program to read each text file individually into memory
and look for the longest repeated phrase within that single file. Do
this for each file separately. Test and correct. The testing at each
stage is very important because you will be building on your program
in the next stage and you do not want to build on a faulty program.

5 Modify the program to cross-check the phrases across all the files
(remember that they will all fit into memory). Test and correct
errors. You have now done your homework.
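
Steps 1 to 3 above can be sketched roughly as follows. This is only a minimal illustration, not a full solution: it assumes that "text file" means a file with a .txt extension, and the names TextFileLister and ListTextFiles are hypothetical.

```csharp
using System;
using System.IO;

public static class TextFileLister
{
    // Returns the full paths of the *.txt files directly inside 'directory'.
    // Assumption: a "text file" is any file with a .txt extension.
    public static string[] ListTextFiles(string directory)
    {
        if (!Directory.Exists(directory))
        {
            throw new DirectoryNotFoundException("Folder not found: " + directory);
        }
        return Directory.GetFiles(directory, "*.txt");
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: program <directory>");
            return;
        }

        // Step 1: echo the directory name straight back.
        Console.WriteLine("Directory: " + args[0]);

        // Steps 2 and 3: list the files, keeping only the text files.
        foreach (string path in ListTextFiles(args[0]))
        {
            Console.WriteLine(path);
        }
    }
}
```

Keeping the listing in its own method makes it easy to test that step before building steps 4 and 5 on top of it.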

If you get stuck then post what you have got so far and say exactly
what you are having a problem with. This will give us more
information about exactly where you need help.

We will not write your program for you.

rossum

Aug 20 '08 #4

P: n/a
Hi,

I tried writing a solution to this problem, but got stuck. I am able
to get the folder and enumerate all the .txt files present in it. Now
I want to search for the longest repeated phrase in each text file.
How do I do that? I have used a foreach loop to get the .txt files and
want to search for the longest repeated phrase inside that loop. Any
help?
Sep 16 '08 #5

P: n/a
As Ignacio pointed out, you haven't shared the definition of a phrase.
But to give you some clues for possible solutions, you need to come up
with all possible phrases, and then loop and "remember" the longest
one that satisfies the condition for a repeated phrase.
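
One way to picture that loop, as a rough sketch only: it assumes a "phrase" is a sentence delimited by '.', which the original question never confirms, and the names LongestRepeated and Find are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LongestRepeated
{
    // Assumption: a "phrase" is a sentence delimited by '.'.
    // Returns the longest phrase that occurs more than once, or null if none does.
    public static string Find(string text)
    {
        HashSet<string> seen = new HashSet<string>();
        string longest = null;

        foreach (string raw in text.Split('.'))
        {
            string phrase = raw.Trim();
            if (phrase.Length == 0)
            {
                continue; // skip empty fragments, e.g. after the final '.'
            }

            // HashSet.Add returns false when the phrase was already present,
            // which means this occurrence is a repeat.
            bool repeated = !seen.Add(phrase);
            if (repeated && (longest == null || phrase.Length > longest.Length))
            {
                longest = phrase; // remember the longest repeat seen so far
            }
        }
        return longest;
    }
}
```

If a phrase is something other than a sentence, only the splitting step would need to change; the "remember the longest" part stays the same.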
Sep 16 '08 #6

P: n/a
There is no predefined phrase to be searched. I am assuming that a
complete sentence can be a phrase. So I can open a txt file, read all
the sentences into an array, and loop to find the longest repeated
phrase, but I cannot use a Dictionary or Hashtable to store my info
because the key value has to be distinct. I am setting the phrase as
the key value. Now, before inserting into a dictionary/hashtable I can
check whether the key exists or not; if it already exists I only want
to increment its value. But I am not able to do so.
Sep 17 '08 #7

P: n/a
On Wed, 17 Sep 2008 13:28:31 -0700, <mo*****@gmail.com> wrote:
> There is no predefined phrase to be searched. I am assuming that a
> complete sentence can be a phrase.
Okay, so in the context of this problem, does that mean that "a phrase" is
simply "a complete sentence"? Or can "a phrase" be some subset of "a
complete sentence"?

As long as we can assume that you already have code to parse out "a
phrase" and that the phrase is always represented by an instance of the
String class, I think your other questions can be answered without the
above information. But having that information would sure help us to
better understand what you're doing.
> So I can open a txt file read all
> the sentences in an array and loop and find out the longest repetitive
> phrase, but I cannot use Dictionary or Hashtable to store my info
> because the key value has to be distinct.
I don't understand that statement. Based on your description of the
problem so far, any two "phrases" that are not distinct would be
considered the same, and thus would simply increment the counter in your
Dictionary.

It's true that a Dictionary can only have one element per given key, but
it seems like in this case that would be fine. What about that rule is
causing you trouble?
> I am setting the phrase as
> key value. Now before inserting into a dictionary/hashtable I can
> check whether the key exists or not, if it already exists I only want
> to increment its value. But am not able to do so.
Why not? What problem are you having?

For the record: you have asked your questions in a very vague,
nearly-useless way. You say you've had problems, but you don't describe
the problems in any specific way. You are obviously doing this in the
context of some actual code, but you haven't posted any actual code.
People can only help you insofar as you provide enough detail for them to
answer your questions.

That said, applying a bit of my "psychic debugging" skills, I'll point out
that when dealing with a Dictionary that contains value types or immutable
reference types as the TValue parameter, the only way to update the value
associated with a specific key is to replace it.

So, if you've got a Dictionary<string, int>, then when you find that the
key already exists in the Dictionary, to increment your counter you need
to get the current value, add one to it, and store the new result back
into the Dictionary. The short form does exactly that:

Dictionary<string, int> dict = ...;
string strPhrase = ...;

// ...

dict[strPhrase]++;

It is equivalent to the longer form:

dict[strPhrase] = dict[strPhrase] + 1;

In both versions the expression "dict[strPhrase]" doesn't refer to the
actual storage inside the Dictionary, but rather to the indexer. The
compiler calls the indexer's getter to read the current value, adds one
to it, and calls the indexer's setter to store the result back into the
Dictionary.

Because the getter runs first, both versions throw a
KeyNotFoundException if the key is not already in the Dictionary. What
does not work is copying the value into a local variable and
incrementing only the local, since nothing then stores the new value
back.
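
The "get the current value, add one, store it back" sequence could be wrapped in a small helper along these lines. This is a sketch under assumptions, not code from the thread: the names PhraseCounter and Increment are hypothetical, and starting a missing key at zero is a design choice made here for convenience.

```csharp
using System.Collections.Generic;

public static class PhraseCounter
{
    // Adds one to the count stored for 'phrase'.
    // A phrase that is not yet in the dictionary is treated as having count 0.
    public static void Increment(Dictionary<string, int> counts, string phrase)
    {
        int current;
        // TryGetValue leaves 'current' at 0 (the default for int) when the key is missing.
        counts.TryGetValue(phrase, out current);
        // The indexer setter adds the entry if it is new, or replaces the old value.
        counts[phrase] = current + 1;
    }
}
```

Using TryGetValue avoids a separate ContainsKey check, so the same key is used for the lookup and for the update.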

If that doesn't help, you'll need to be less vague about your question.
We can only get so far by guessing.

Pete
Sep 17 '08 #8

P: n/a
Assumptions:
- Phrase: a complete sentence.
- Any two sentences of the same length are not the same phrase.

Problem using the dictionary object:
- See the code below. When I have a phrase that already exists in the
  dictionary, I just want to increase its count, which is stored in the
  value field.

Goal: I want to read each text file present in the folder, find the
longest repeated phrase in it, and then do a final comparison between
all the files. I am not sure if the logic I have used here is
correct.

Do let me know if you require any other information.

DirectoryInfo dirInfo = new DirectoryInfo(folderBrowserDialog1.SelectedPath);
if (!dirInfo.Exists)
{
    throw new DirectoryNotFoundException("Folder not found: " + folderBrowserDialog1.SelectedPath);
}

// Enumerate all the text files in the given folder
foreach (FileInfo nextFile in dirInfo.GetFiles("*.txt"))
{
    // Read the content of the text file
    strFileContent = File.ReadAllText(nextFile.FullName);
    // Split the content of the file on '.', which gives us the individual phrases
    strArrayLine = strFileContent.Split(sep);

    // Iterate through the array of phrases to find the longest repeated phrase
    for (int i = 0; i < strArrayLine.Length; i++)
    {
        // Trim the leading and trailing spaces, if any
        string strNoSpace = strArrayLine[i].Trim();
        //if (!store.ContainsKey(strArrayLine(i)))
        // Check whether the dictionary already contains the phrase
        if (!store.ContainsKey(strNoSpace))
        {
            intCount = 1; // counter to denote the number of occurrences of the phrase
            store.Add(strArrayLine[i], intCount);
        }
        else
        {
            intCount = intCount + 1;
            // increments the counter as suggested
            store[strArrayLine[i]] = store[strArrayLine[i]] + 1; // <-- error: The given key was not present in the dictionary
        }
    }
}

Thanks M
Sep 18 '08 #9

P: n/a
On Thu, 18 Sep 2008 08:42:41 -0700, <mo*****@gmail.com> wrote:
> [...]
> intCount = intCount + 1;
> // increments the counter as suggested
> store[strArrayLine[i]] = store[strArrayLine[i]] + 1; // <-- error: The given key was not present in the dictionary
Since you use "strNoSpace" in your call to ContainsKey(), you should be
using the "strNoSpace" variable here too, not "strArrayLine[i]".
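
A minimal sketch of that fix, reusing the variable names from the posted code and using the trimmed string as the key everywhere. The wrapper class TrimmedCounter and its Count method are hypothetical, and skipping empty fragments is an added assumption.

```csharp
using System.Collections.Generic;

public static class TrimmedCounter
{
    // Counts how often each trimmed phrase occurs in 'strArrayLine'.
    public static Dictionary<string, int> Count(string[] strArrayLine)
    {
        Dictionary<string, int> store = new Dictionary<string, int>();

        for (int i = 0; i < strArrayLine.Length; i++)
        {
            string strNoSpace = strArrayLine[i].Trim();
            if (strNoSpace.Length == 0)
            {
                continue; // assumption: empty fragments are not phrases
            }

            // The same key (strNoSpace) is used to check, to add and to update.
            if (!store.ContainsKey(strNoSpace))
            {
                store.Add(strNoSpace, 1);
            }
            else
            {
                store[strNoSpace] = store[strNoSpace] + 1;
            }
        }
        return store;
    }
}
```

With the untrimmed strArrayLine[i] as the key, " a b" and "a b" would be stored as different entries, which is why the lookup in the else branch failed.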

The "intCount" variable (in both clauses of the if/else) is either
superfluous, or an indication that the Dictionary is completely
unnecessary. I can't tell which from your problem description. If by
"repeated phrase" you mean to include only those phrases that are repeated
adjacent to each other, then the Dictionary is unnecessary (but you'll
need to rework the algorithm to track the most recent phrase and update
your counter based on that). If a phrase is considered "repeated" as long
as it appears more than once _anywhere_ in the file, then you need the
Dictionary, but not the "intCount" variable.
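
The two readings of "repeated" lead to two different loops, sketched below for comparison. Both are illustrations under assumptions: the class and method names are hypothetical, phrases are assumed to be already trimmed strings, and ties between equally long phrases are not resolved in any particular order.

```csharp
using System.Collections.Generic;

public static class RepeatPicker
{
    // "Repeated anywhere": the longest key whose count is greater than one.
    // 'counts' maps each phrase to its number of occurrences.
    public static string LongestRepeatedAnywhere(Dictionary<string, int> counts)
    {
        string longest = null;
        foreach (KeyValuePair<string, int> entry in counts)
        {
            if (entry.Value > 1 && (longest == null || entry.Key.Length > longest.Length))
            {
                longest = entry.Key;
            }
        }
        return longest; // null when no phrase occurs more than once
    }

    // "Repeated adjacently": the longest phrase equal to the one just before it.
    // No Dictionary is needed, only the previous phrase.
    public static string LongestRepeatedAdjacent(IList<string> phrases)
    {
        string longest = null;
        for (int i = 1; i < phrases.Count; i++)
        {
            if (phrases[i] == phrases[i - 1]
                && (longest == null || phrases[i].Length > longest.Length))
            {
                longest = phrases[i];
            }
        }
        return longest; // null when no two neighbours are equal
    }
}
```

Neither method needs a separate counter variable such as intCount: the first reads the counts from the Dictionary, and the second only compares neighbours.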

Pete
Sep 18 '08 #10
