Longest repeated phrase program

I need help with the following question. THANKS :)

Write a program in C# that takes a directory as a command-line
parameter and returns the longest repeated phrase in ALL text files
in that directory. Assume that all contents of all the files may be
held in memory at the same time. Do not use unsafe code blocks and/or
pointers.
Aug 20 '08 #1
That sounds awfully like homework. You're unlikely to get a direct
solution, but if you walk us through what you've got so far and exactly
where you're stuck, we may well be able to nudge you in the right
direction.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Aug 20 '08 #2
Is that homework?

What is a phrase? (In your scenario, of course :) )
You could use a dictionary of the form <string, int>, where the string
is the phrase and the int is how many times it appears.
Aug 20 '08 #3
On the assumption that this is homework, here are some pointers.

1 Write a program to read a directory name in the command line and
echo the name straight back to the console. (Yes, I know that this is
very simple, but do it anyway.) Test this program and correct all
errors until it works perfectly.

2 Modify the program to read the names of all the files in the
directory and echo them to the console. Test and correct until it is
working properly.

3 Modify the program to list only the names of the "text files" and
not the names of the non-text files. Test and correct until it is
working properly. As you will have noticed by now you are working in
small steps from an easy program solving an easy problem towards a not
so easy program solving a more difficult problem. This is a good way
to develop a program.

4 Modify the program to read each text file individually into memory
and look for the longest repeated phrase within that single file. Do
this for each file separately. Test and correct. The testing at each
stage is very important because you will be building on your program
in the next stage and you do not want to build on a faulty program.

5 Modify the program to cross-check the phrases across all the files
(remember that they will all fit into memory). Test and correct
errors. You have now done your homework.
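
As a rough illustration of steps 1 to 3 only, a minimal sketch might look like the following. It assumes that "text file" simply means a file with a .txt extension; the class name ListTextFiles and the helper GetTextFiles are made up for this example.

```csharp
using System;
using System.IO;

static class ListTextFiles
{
    // Hypothetical helper. Assumption: a "text file" is any file matching *.txt.
    public static string[] GetTextFiles(string directory)
    {
        string[] files = Directory.GetFiles(directory, "*.txt");
        Array.Sort(files, StringComparer.Ordinal); // stable, predictable output order
        return files;
    }

    static int Main(string[] args)
    {
        // Step 1: read the directory name from the command line and echo it.
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: ListTextFiles <directory>");
            return 1;
        }
        string directory = args[0];
        Console.WriteLine("Directory: " + directory);

        if (!Directory.Exists(directory))
        {
            Console.Error.WriteLine("Directory not found: " + directory);
            return 1;
        }

        // Steps 2 and 3: list the files, keeping only the text files.
        foreach (string path in GetTextFiles(directory))
        {
            Console.WriteLine(Path.GetFileName(path));
        }
        return 0;
    }
}
```

Steps 4 and 5 are deliberately left out, since they depend on what counts as a phrase.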

If you get stuck then post what you have got so far and say exactly
what you are having a problem with. This will give us more
information about exactly where you need help.

We will not write your program for you.

rossum

Aug 20 '08 #4
Hi,

I tried writing a solution for this problem, but got stuck. I
am able to get the folder and enumerate all the txt files present in
it. Now what I want to do is search for the longest repeated phrase in
each text file. How do I do that? I have used a foreach loop to get
the txt files and want to search for the longest repeated phrase inside
the foreach loop. Any help?
Sep 16 '08 #5
As Ignacio pointed out, you haven't shared the definition of a phrase.
But to give you some clues for possible solutions, you need to come up
with all possible phrases, and then loop and "remember" the longest
one that satisfies the condition for a repeated phrase.
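
One way to sketch that idea, assuming (and it is only an assumption, since no definition has been given) that a phrase is any run of consecutive whitespace-separated words, and that overlapping occurrences count as repeats. The names PhraseSearch and LongestRepeatedPhrase are invented for the example, and the brute-force approach is only sensible for small inputs.

```csharp
using System;
using System.Collections.Generic;

static class PhraseSearch
{
    // Returns the longest run of consecutive words that occurs at least twice,
    // or null if no word is repeated at all.
    public static string LongestRepeatedPhrase(string text)
    {
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // Try the longest candidate length first, so the first hit is the answer.
        for (int length = words.Length - 1; length >= 1; length--)
        {
            var seen = new HashSet<string>();
            for (int start = 0; start + length <= words.Length; start++)
            {
                string candidate = string.Join(" ", words, start, length);
                if (!seen.Add(candidate))
                {
                    return candidate; // Add returned false: seen before, so repeated
                }
            }
        }
        return null;
    }
}
```

For the input "the cat sat on the mat and the cat ran" this returns "the cat". Ties between phrases with the same number of words go to whichever repeat is detected first.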
Sep 16 '08 #6
There is no predefined phrase to be searched. I am assuming that a
complete sentence can be a phrase. So I can open a txt file, read all
the sentences into an array, and loop to find the longest repeated
phrase, but I cannot use a Dictionary or Hashtable to store my info
because the key value has to be distinct. I am setting the phrase as
the key value. Now before inserting into a dictionary/hashtable I can
check whether the key exists or not; if it already exists I only want
to increment its value. But I am not able to do so.
Sep 17 '08 #7
On Wed, 17 Sep 2008 13:28:31 -0700, <mo*****@gmail.com> wrote:
> There is no predefined phrase to be searched. I am assuming that a
> complete sentence can be a phrase.

Okay, so in the context of this problem, does that mean that "a phrase" is
simply "a complete sentence"? Or can "a phrase" be some subset of "a
complete sentence"?

As long as we can assume that you already have code to parse out "a
phrase" and that the phrase is always represented by an instance of the
String class, I think your other questions can be answered without the
above information. But having that information would sure help us to
better understand what you're doing.

> So I can open a txt file, read all
> the sentences into an array, and loop to find the longest repeated
> phrase, but I cannot use a Dictionary or Hashtable to store my info
> because the key value has to be distinct.

I don't understand that statement. Based on your description of the
problem so far, any two "phrases" that are not distinct would be
considered the same, and thus would simply increment the counter in your
Dictionary.

It's true that a Dictionary can only have one element per given key, but
it seems like in this case that would be fine. What about that rule is
causing you trouble?

> I am setting the phrase as
> the key value. Now before inserting into a dictionary/hashtable I can
> check whether the key exists or not; if it already exists I only want
> to increment its value. But I am not able to do so.

Why not? What problem are you having?

For the record: you have asked your questions in a very vague,
nearly-useless way. You say you've had problems, but you don't describe
the problems in any specific way. You are obviously doing this in the
context of some actual code, but you haven't posted any actual code.
People can only help you insofar as you provide enough detail for them to
answer your questions.

That said, applying a bit of my "psychic debugging" skills, I'll point out
that when dealing with a Dictionary that contains value types or immutable
reference types as the TValue parameter, the only way to update the value
associated with a specific key is to replace it.

So, if you've got a Dictionary<string, int>, then when you find that the
key already exists in the Dictionary, to increment your counter you need
to get the current value, add one to it, and store the new result back
into the Dictionary. Written out in full:

    Dictionary<string, int> dict = ...;
    string strPhrase = ...;

    // ...

    dict[strPhrase] = dict[strPhrase] + 1;

The shorthand

    dict[strPhrase]++;

does the same thing. The expression "dict[strPhrase]" doesn't refer to a
storage location inside the Dictionary, so the compiler calls the
indexer's getter, adds one to the value it returns, and passes the result
to the indexer's setter.

Either form throws a KeyNotFoundException if the key isn't already in the
Dictionary, because the getter runs first. So the key you index with has
to be exactly the same string as the one you added (or tested with
ContainsKey()).
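
For what it's worth, the get, add one, store-back update can be wrapped in a small helper. This is only a sketch: PhraseCounter and Increment are names made up for the example, and it uses TryGetValue() so that a missing key is treated as a count of zero rather than causing an exception.

```csharp
using System;
using System.Collections.Generic;

static class PhraseCounter
{
    // Adds one to the count stored for 'phrase', starting from zero if absent.
    public static void Increment(Dictionary<string, int> counts, string phrase)
    {
        int current;
        counts.TryGetValue(phrase, out current); // current is 0 if the key is missing
        counts[phrase] = current + 1;            // the indexer setter adds or overwrites
    }
}
```

Calling Increment(counts, "hello") twice on an empty Dictionary leaves counts["hello"] equal to 2, with no separate ContainsKey() check needed.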

If that doesn't help, you'll need to be less vague about your question.
We can only get so far by guessing.

Pete
Sep 17 '08 #8
Assumptions:
- Phrase: a complete sentence.
- Any two sentences of the same length are not the same phrase.

Problem using the Dictionary object:
- See the code below. When I have a phrase that already exists in the
  dictionary, I just want to increase its count, which is stored in the
  value field.

Goal: I want to read each text file present in the folder, find the
longest repeated phrase in it, and then do a final comparison between
all the files. I am not sure if the logic I have used here is
correct.

Do let me know if you require any other information.

    DirectoryInfo dirInfo = new DirectoryInfo(folderBrowserDialog1.SelectedPath);
    if (!dirInfo.Exists)
    {
        throw new DirectoryNotFoundException("Folder not found: " + folderBrowserDialog1.SelectedPath);
    }

    // list all the files in that folder
    foreach (FileInfo nextFile in dirInfo.GetFiles("*.txt")) // enumerates all the text files in the given folder
    {
        // read the content of the current text file
        strFileContent = File.ReadAllText(nextFile.FullName);

        // splitting the content of the file based on '.' gives us the individual phrases
        strArrayLine = strFileContent.Split(sep);

        // iterating through the array of phrases to find the longest repeated phrase
        for (int i = 0; i < strArrayLine.Length; i++)
        {
            // trimming off the leading and trailing spaces, if any
            string strNoSpace = strArrayLine[i].Trim();

            //if (!store.ContainsKey(strArrayLine(i)))
            if (!store.ContainsKey(strNoSpace)) // checking whether the dictionary already contains the phrase
            {
                intCount = 1; // counter to denote the number of occurrences of the phrase
                store.Add(strArrayLine[i], intCount);
            }
            else
            {
                intCount = intCount + 1;
                // increments the counter as suggested
                store[strArrayLine[i]] = store[strArrayLine[i]] + 1; // error: The given key was not present in the dictionary
            }
        }
    }

Thanks M
Sep 18 '08 #9
On Thu, 18 Sep 2008 08:42:41 -0700, <mo*****@gmail.com> wrote:
> [...]
> intCount = intCount + 1;
> // increments the counter as suggested
> store[strArrayLine[i]] = store[strArrayLine[i]] + 1; // error: The given key was not present in the dictionary

Since you use "strNoSpace" in your call to ContainsKey(), you should be
using the "strNoSpace" variable here too, not "strArrayLine[i]".

The "intCount" variable (in both clauses of the if/else) is either
superfluous, or an indication that the Dictionary is completely
unnecessary. I can't tell which from your problem description. If by
"repeated phrase" you mean to include only those phrases that are repeated
adjacent to each other, then the Dictionary is unnecessary (but you'll
need to rework the algorithm to track the most recent phrase and update
your counter based on that). If a phrase is considered "repeated" as long
as it appears more than once _anywhere_ in the file, then you need the
Dictionary, but not the "intCount" variable.
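
To illustrate the second case (repeated anywhere), here is a hedged sketch that uses the trimmed string as the key everywhere and drops the separate counter. It assumes a phrase is a sentence delimited by '.', as in the posted code, and that one Dictionary shared across all files is acceptable; SentenceSearch and LongestRepeatedSentence are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class SentenceSearch
{
    // Returns the longest sentence that appears more than once anywhere in the
    // given file contents, or null if no sentence repeats.
    public static string LongestRepeatedSentence(IEnumerable<string> fileContents)
    {
        var store = new Dictionary<string, int>();
        string longest = null;

        foreach (string content in fileContents)
        {
            foreach (string raw in content.Split('.'))
            {
                string sentence = raw.Trim();
                if (sentence.Length == 0)
                {
                    continue; // skip the empty piece after a trailing '.'
                }

                // The same trimmed key is used for the check, the add, and the update.
                if (!store.ContainsKey(sentence))
                {
                    store.Add(sentence, 1);
                }
                else
                {
                    store[sentence] = store[sentence] + 1;
                    if (longest == null || sentence.Length > longest.Length)
                    {
                        longest = sentence; // longest repeated sentence so far
                    }
                }
            }
        }
        return longest;
    }
}
```

The contents passed in could come from File.ReadAllText() on each file found in the folder. Length here is measured in characters, which is another assumption.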

Pete
Sep 18 '08 #10
