i need help with this project please some one help meeee

Introduction

This assignment requires you to develop solutions to the given problem
using several different approaches (which actually involves using three
different STL containers). You will implement all three techniques as
programs. In these programs, as well as solving the problem, you will
also measure how long the program takes to run. The programs are worth
80% of the total mark. The final 20% of the marks are awarded for a
brief report explaining why the different algorithms performed in the
way they did.

The Problem - text analysis/word counting

Computer analysis of texts, which involves making statistical
measurements on the text, is used in a variety of situations, including
authorship attribution (checking a piece of text by an unknown author
against texts by others to try and work out who wrote it) and
plagiarism detection (shudder at the thought). One of the simplest
measures is to take a piece of text and count how often each word used
in the text occurs. Obviously, in a large piece of text many words are
going to be used. We are going to look at several ways of doing this
counting. We are not going to worry about analysing the data - that is
the problem for a different program (or different part of the program)
- we are just looking at the word counting phase. We are going to run
various programs (six in all) which all use STL containers (some of
the programs use the same containers but in different ways, so there
are only three different containers used). We are going to time how
long each algorithm takes (don't worry, the first appendix shows you
how to do this). To enable meaningful measurements to be made, a fairly
large piece of text is required. A file called Frank.txt has been put
on BrightSpark for you to use. It contains the first three chapters of
'Frankenstein' by Mary Shelley. The text has already been prepared
for analysis - all punctuation and capitalisation has been removed.

Each program must read the file, and count how often each of the words
occurs. The program then prints out a list of the words (the order of
the words depends on which part you are doing) and how many times they
are present in the text. There are a lot of ways that this could be
done. Several approaches/algorithms are given below. You should measure
the time taken to complete each algorithm. Outputting the text to the
screen will be a slower process than doing the analysis, so you should
take a time measurement at three points - before starting the
analysis, after the analysis but before the output and after the
output. You can then say how long the analysis took, how long the
printing took and how long the overall analysis took. Remember, do not
take a single value, but run the program several times (say, ten) and
take the average.
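
The three time points and the averaging could be sketched as below. This is only a sketch: the first appendix is not part of this post, so std::chrono::steady_clock is an assumption about how to take the measurements, and the names Timings and average_timings are hypothetical. The analysis and the output are passed in as callables so the same helper could wrap any of the programs.

```cpp
#include <chrono>
#include <functional>

// Hypothetical result type: the three averaged intervals, in milliseconds.
struct Timings {
    double analysis_ms;  // start of analysis -> end of analysis
    double output_ms;    // end of analysis -> end of output
    double total_ms;     // start of analysis -> end of output
};

// Runs analyse() followed by print() `runs` times and averages the intervals.
// steady_clock is used because it never jumps backwards (an assumption; the
// assignment's appendix may use a different timer).
Timings average_timings(const std::function<void()>& analyse,
                        const std::function<void()>& print,
                        int runs = 10) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    Timings sum{0.0, 0.0, 0.0};
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();  // before starting the analysis
        analyse();
        const auto t1 = clock::now();  // after the analysis, before the output
        print();
        const auto t2 = clock::now();  // after the output
        sum.analysis_ms += ms(t1 - t0).count();
        sum.output_ms += ms(t2 - t1).count();
        sum.total_ms += ms(t2 - t0).count();
    }
    return {sum.analysis_ms / runs, sum.output_ms / runs, sum.total_ms / runs};
}
```

A caller would pass two lambdas, one that fills the container from the file and one that prints it, and then report the three averaged values.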


Note

Most of the programs (all except 3) will need you to use a class to
hold a word/word count pair. The following header file defines a class
which (with the addition of the matching body) is sufficient for parts
1 and 2:


class WordCount
{
private:
    string word;
    int count;
public:
    WordCount();
    WordCount(const string& word);
    WordCount(const WordCount& wc);

    WordCount operator++(int);
    bool operator==(const WordCount& rhs);

    friend
    ostream& operator<<(ostream& os, const WordCount& rhs);
};

As you can see, there are two attributes, a word and the number of
times that word occurs in the text.

Here are some details of the constructors:
WordCount();
This is the default constructor. You will not create objects directly
with it, but it is required for creating some of the initial STL
containers. It should set the count to zero and the word to an empty
string.
WordCount(const string& word);
This is the constructor that you will use to create objects in your
program. You supply the word as a parameter and set the count to 1
(since it is used when you find a new word for the first time).
WordCount(const WordCount& wc);
This is a standard copy constructor which will be needed at some
stages. You do not explicitly call it.

Here are some details of the operators:
operator++
This is the increment operator (actually a post-increment) which
increments the count field (only). Post increment operators are
slightly strange to write, the stages are:
a) take a copy of the object
b) increment the count attribute
c) return the copy of the object
operator==
This only compares the word fields, it returns true if they are the
same and false if they are different. In parts 2 and 3 you will also
need a less than comparison. It is very similar to this operator.
operator<<
The overloaded output operator simplifies the output stage of the
program and should give the word followed by its count - put the
count in brackets to make it easier to follow.

Depending on how you write the code you may need some additional
operators or methods. Hopefully you won't, but if you do then you may
add them.


Part 1

The programs

In this assignment, the words are output in any order, i.e. no sorting
is done. There are three programs to write (although the second is
only a fairly minor modification of the first):

The first program (1)
Use the vector container from STL. As you read in each word, check to
see if it is already stored in the vector. If it is, increment that
word's count. If it isn't, add the word to the end of the vector.
It is easiest if you create a temporary word count object as soon as
you read each word from the file and use that for the comparisons etc.

The second program (2)
This should be a very similar program to the first program, but should
use a list container instead of a vector.

The third program (3)
This should use a hash map. The approach of the third program is
similar to the first two, but you will not use the WordCount class
because the hash map expects pairs of values. You will not have met
hash maps before, so there are notes on using the hash map in appendix
2. The hash maps work on the concept of pairs of values, one being a
key which is used to locate the data and the other being the data
value. If we let the word be the key and the count of that word be the
data then the hash map will work for solving this problem. Because hash
maps automatically work with pairs you will not use the WordCount class
in this section.

When you have read a word, you should use the find method to see if it
is already in the hash map. The find method returns an iterator. If the
iterator is set to the end of the map (i.e. is equal to wordlist.end())
then the word is not in the map. You can use the insert method to
insert this as a new word. Appendix 2 shows how to use the template
pair to do this.

If the iterator is not set to the end then the word has been found. The
iterator allows you to access two fields, first (the word) and second
(the count). You can increment the count directly.

To output the result you can declare an iterator and step through the
hash map. You can use the first and second fields to mimic the output
of the other programs.

Hash maps are very efficient at insertion and retrieval, and both
operations work in almost constant time.

Oct 19 '06 #1
na*****@walla.co.il wrote:
Introduction

This assignment requires you to develop solutions to the given problem
using several different approaches (which actually involves using three
different STL containers). You will implement all three techniques as
<SNIP>

Please see:
http://www.parashift.com/c++-faq-lit...t.html#faq-5.2

Regards,
Sumit.
Oct 19 '06 #2
I can only say - Good luck :)

--
SirMike - http://www.sirmike.org

C makes it easy to shoot yourself in the foot; C++ makes it harder, but
when you do, it blows away your whole leg. - Bjarne Stroustrup
Oct 19 '06 #3
na*****@walla.co.il wrote:
class WordCount
{
private:
    string word;
    int count;
public:
    WordCount();
    WordCount(const string& word);
    WordCount(const WordCount& wc);

    WordCount operator++(int);
    bool operator==(const WordCount& rhs);

    friend
    ostream& operator<<(ostream& os, const WordCount& rhs);
};

Let's start with your WordCount class (strictly speaking, you don't need
it for the assignment, but the teacher wants you to use it so you use
it.)

----- program code -----
#include "WordCount.h"
#include <sstream>
#include <iostream>
#include <cassert>

using namespace std;

int main() {
    WordCount wc;
    stringstream ss;
    ss << wc;
    assert( ss.str() == "[0]" );
    cout << "Working So Far.";
}
----- end -----

Use the above and put your teacher's WordCount class info in a file
called "WordCount.h". You will also need a WordCount.cpp file:

----- program code -----
#include "WordCount.h"

// insert code as necessary

----- end -----

Put code where it says "insert code as necessary" until you can make the
program run and print out "Working So Far".

Once you get that done, post what you have here and I'll help you with
the next step (you can email if you like or IM me [AIM nic: ObjHead] if
you want to go faster.)

If you have any problems, post what you have so far and ask away!

--
There are two things that simply cannot be doubted, logic and perception.
Doubt those, and you no longer have anyone to discuss your doubts with,
nor any ability to discuss them.
Oct 19 '06 #4
na*****@walla.co.il wrote:
[assignment snipped]

Sure naknak. Just send us your instructor's name and email, and
we'll send him the answer right away. That will save you the bother of
having to copy it from the news group.
Socks

Oct 19 '06 #5
