Bytes | Software Development & Data Engineering Community

Remove certain words from a C++ string

Hi guys,

I want to remove certain words from a C++ string. The list of words is
in a file, with each word on a new line. I tried using
std::transform, but it didn't work.

Does anybody have a clue how I should go about this?

thanks a lot,
Hp

Oct 22 '05 #1

pr****************@gmail.com wrote:
Hi guys,

I want to remove certain words from a C++ string. The list of words is
in a file, with each word on a new line. I tried using
std::transform, but it didn't work.

Does anybody have a clue how I should go about this?

thanks a lot,
Hp

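Try using string::find and string::erase (erase modifies the string in
place, so no swap is needed):

example:

// assumes #include <string> and using std::string;
string str = "Hello world is the only assignment I can do";
string remword = "world";
string::size_type pos;

// while rather than if, so that every occurrence is removed
while ((pos = str.find(remword)) != string::npos)
{
    str.erase(pos, remword.length());
}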

ps. I rule!

Oct 23 '05 #2
On 22 Oct 2005 16:33:04 -0700, pr****************@gmail.com wrote:
Hi guys,

I want to remove certain words from a C++ string. The list of words is
in a file, with each word on a new line. I tried using
std::transform, but it didn't work.

Does anybody have a clue how I should go about this?


transform() doesn't remove entries from a container; it only modifies them.

Use string::find() to find the substring, then use string::erase() to remove
the substring from the string.
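One catch with plain find/erase is that it also matches inside longer words, so removing "the" would mangle "there". Below is a minimal sketch that erases only whole-word matches. It assumes a word is a run of letters and digits. The helper names `isWordChar` and `removeWord` are made up for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if the character can be part of a word
// (assumption: words consist of letters and digits only).
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Erases every whole-word occurrence of `word` from `text`.
// A match counts only if it is not glued to other letters or digits,
// so removing "the" leaves "there" intact.
void removeWord(std::string& text, const std::string& word)
{
    if (word.empty())
        return;
    std::string::size_type pos = 0;
    while ((pos = text.find(word, pos)) != std::string::npos)
    {
        const std::string::size_type end = pos + word.size();
        const bool startOk = pos == 0 || !isWordChar(text[pos - 1]);
        const bool endOk = end == text.size() || !isWordChar(text[end]);
        if (startOk && endOk)
            text.erase(pos, word.size());  // stay at pos: the rest shifted left
        else
            pos = pos + 1;                 // not a whole word, keep searching
    }
}
```

Leftover spaces are not collapsed, so "with the hat" becomes "with  hat". That does no harm if the string is later split on whitespace.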

-dr
Oct 23 '05 #3
Hp
Hi, thanks a lot for your replies. But I figured out that before the
word removal, I need to convert the C++ string from
upper to lower case. In fact, I used transform() to perform this
operation, and that's what didn't work.
Also, after the upper-to-lower case conversion, I need to read the
file containing all the stopwords, one per line, to be removed from
the transformed string.

Thanks a lot in advance,
Hp

Oct 23 '05 #4
Hp wrote:
Hi, thanks a lot for your replies. But I figured out that before the
word removal, I need to convert the C++ string from
upper to lower case. In fact, I used transform() to perform this
operation, and that's what didn't work.
How did you do it? How didn't it work?
Also, after the upper-to-lower case conversion, I need to read the
file containing all the stopwords, one per line, to be removed from
the transformed string.


Show us some code! Reading a line from a file is a basic operation
described in any textbook (good or bad).

1) get the string
2) convert it to lower case
3) read the lines from the file
4) search the string for the words you just read and remove each one
Jonathan
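Here is a sketch of how those four steps might look. It assumes C++11 and reads the stop words from any input stream, one per line. It also swaps in a slightly different approach from step 4: instead of searching the string for each stop word in turn, it splits the lowercased text on whitespace and keeps the words that are not in the stop-word set. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Step 2: return a lowercased copy (unsigned char avoids undefined
// behaviour when char is negative).
std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Step 3: read one stop word per line, dropping a Windows '\r' if present.
std::set<std::string> readStopWords(std::istream& in)
{
    std::set<std::string> stop;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (!line.empty())
            stop.insert(toLower(line));
    }
    return stop;
}

// Step 4: split the lowercased text on whitespace and keep non-stop words.
std::vector<std::string> removeStopWords(const std::string& text,
                                         const std::set<std::string>& stop)
{
    std::istringstream words(toLower(text));
    std::vector<std::string> kept;
    std::string w;
    while (words >> w)
        if (stop.count(w) == 0)
            kept.push_back(w);
    return kept;
}
```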

Oct 23 '05 #5

Hp wrote:
Hi, thanks a lot for your replies. But I figured out that before the
word removal, I need to convert the C++ string from
upper to lower case. In fact, I used transform() to perform this
operation, and that's what didn't work.
Also, after the upper-to-lower case conversion, I need to read the
file containing all the stopwords, one per line, to be removed from
the transformed string.

Thanks a lot in advance,
Hp


Stopwords? (like the, an, a) That sounds like a data mining problem.
What course are you taking?

I think I've dealt with it (long ago in my academic career!).

Oct 24 '05 #6
Hp
Hey Puzzlecracker, it's exactly a problem from data mining. Yes, the
stopwords are in a file, with each stop word on its own line.

Hi Jonathan, thanks for your replies. I used the following code to
convert the string from upper to lower case:
std::transform(file.begin(), file.end(), file.begin(), (int(*)(int))std::tolower);

Here, file is the string from which the stopwords need to be removed.
Thanks a lot,
Hp

Oct 24 '05 #7

Hp wrote:
Hey Puzzlecracker, it's exactly a problem from data mining. Yes, the
stopwords are in a file, with each stop word on its own line.

Hi Jonathan, thanks for your replies. I used the following code to
convert the string from upper to lower case:
std::transform(file.begin(), file.end(), file.begin(), (int(*)(int))std::tolower);

Here, file is the string from which the stopwords need to be removed.
Thanks a lot,
Hp


Can you explain the "(int(*)(int))std::tolower" in your call to
transform? I'm not quite sure what that cast is all about.

Thanks

PS: I assume you didn't just blindly copy the code.
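About the cast: `<cctype>` declares `int tolower(int)`, and `<locale>` also declares a two-argument template `std::tolower(charT, const locale&)`. Naming `std::tolower` on its own can therefore be ambiguous, and the `(int(*)(int))` cast selects the one-argument version. An alternative, sketched here and assuming C++11 lambdas are available, wraps the call instead of casting it. It also converts each character to `unsigned char` first, because passing a negative `char` to `tolower` is undefined behaviour. The name `lowerInPlace` is made up.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercases `s` in place. The lambda takes unsigned char because
// std::tolower is undefined for negative values other than EOF,
// and using a lambda sidesteps the overload ambiguity without a cast.
void lowerInPlace(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```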

Oct 24 '05 #8
Hp
Hi, I figured out how to do the case conversion; it was a casting
error, which I took care of. Thanks for the hint, Puzzlecracker.
I tried using andreu's piece of code to remove the stop words, but could
not get it to work. Any hint on stopword removal would be greatly
appreciated, as I'm a novice to C++.
Thanks, Hp

Oct 24 '05 #9

Hp wrote:
Hi, I figured out how to do the case conversion; it was a casting
error, which I took care of. Thanks for the hint, Puzzlecracker.
I tried using andreu's piece of code to remove the stop words, but could
not get it to work. Any hint on stopword removal would be greatly
appreciated, as I'm a novice to C++.
Thanks, Hp

easy:

1. populate all stop words into a set
2. read all words from the file and, as you read, check
whether each word is a stop word (use std::lexicographical_compare with a
case-insensitive comparison to avoid case issues). If it is, discard it;
otherwise put it into a vector.
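For the case issue in step 2, one option is to give the set a comparator built on `std::lexicographical_compare` that compares characters after lowercasing them, so that "The" and "the" count as the same key. This is a sketch and assumes C++11. `CaseInsensitiveLess` and `StopWordSet` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Ordering for std::set that ignores case: the strings are compared
// character by character after lowercasing each character.
struct CaseInsensitiveLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// A set of stop words in which "The" and "the" are the same element.
using StopWordSet = std::set<std::string, CaseInsensitiveLess>;
```

With this set, `find` and `count` ignore case, so the input words do not need to be lowercased before the lookup.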

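I will start:
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using namespace std;

// fills the set with the stop words (left for you to write)
void initialize(set<string>& stopWords);

int main()
{
    set<string> stopWset;    // the stop words
    vector<string> wordvec;  // words from the input that are not stop words
    ifstream in("input.txt");

    if (!in)
    {
        cerr << "cannot open input.txt\n";  // report error
        return 1;
    }

    initialize(stopWset);

    string word;
    while (in >> word)
        if (stopWset.find(word) == stopWset.end())  // not a stop word: keep it
            wordvec.push_back(word);
    return 0;
}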

You get the idea. Or would you suggest reading the entire file at once?

Oct 24 '05 #10
Hp
Hi puzzlecracker, I got the idea: we are putting all the
non-stopwords into a vector of strings.
Here, if I am not wrong, input.txt is the file that has the list of
stopwords. Which one is the string that has the contents with both
stopwords and non-stopwords?
And what does initialize do?
Thanks

Oct 24 '05 #11
Hp wrote:

Hi puzzlecracker, I got the idea, wherein we are putting all the
non-stopwords into a vector of strings.
No.
In puzzlecrackers code

stopWset stands for the 'set of stop words'
wordvec is the vector of words you read from your input and which are
(after the loop has finished) not stop words
Here, if I am not wrong, input.txt is the file that has the list of
stopwords.
That's why it is called 'input' :-)
input is the file you want to check against the stop words
Which one is the string that has the contents with both
stopwords and non-stopwords?
And what does initialize do?


What do you think?
There are 2 file operations going on in the whole program
* one deals with your input
* the second one deals with the file of stop words

so if the loop handles your input file, what do you think
will be the job of initialize( stopWset). Especially when one
takes into account that it gets passed 'stopWset'.
--
Karl Heinz Buchegger
kb******@gascad.at
Oct 24 '05 #12
Hp
Hi All,
Thanks a lot for all your replies.

My requirement is as follows:
I need to read a text file, eliminate certain special characters
(like ! , - = +), then convert it to lower case, and then remove certain
stopwords (like and, a, an, by, the, etc.) that are listed in another
text file.
Then I need to run it through a stemmer (a program which converts words
like running to run, i.e., converts them to root words).
Then I need to create a term-by-document matrix, which would be a
matrix where M(i,j) gives the number of times term j occurs
in document i.
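For the last step, here is a minimal sketch. It assumes each document has already been reduced to a list of cleaned, stop-word-free, stemmed terms. It builds a dense matrix with one row per document and one column per distinct term, with the columns in alphabetical order. The names `Document` and `termByDocument` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One document = its list of already cleaned, stop-word-free, stemmed terms.
using Document = std::vector<std::string>;

// Builds M where M[i][j] is the number of times term j occurs in document i.
// `terms` receives the vocabulary in sorted order, so column j is terms[j].
std::vector<std::vector<int>> termByDocument(const std::vector<Document>& docs,
                                             std::vector<std::string>& terms)
{
    // First pass: collect the vocabulary (std::map keeps it sorted).
    std::map<std::string, std::size_t> column;
    for (const Document& doc : docs)
        for (const std::string& t : doc)
            column.emplace(t, 0);

    // Assign each term its column index.
    terms.clear();
    for (auto& entry : column)
    {
        entry.second = terms.size();
        terms.push_back(entry.first);
    }

    // Second pass: count occurrences per document.
    std::vector<std::vector<int>> m(docs.size(), std::vector<int>(terms.size(), 0));
    for (std::size_t i = 0; i < docs.size(); ++i)
        for (const std::string& t : docs[i])
            ++m[i][column.at(t)];
    return m;
}
```

A dense matrix works for a handful of documents. With a large vocabulary, though, most entries are zero, and a sparse layout such as one `std::map<std::string, int>` per document would use far less memory.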

My situation as of now is as below:
I have read the file contents into a string variable, removed/replaced
the special characters with a space using the replace function, and
then converted the string completely to lower case, using the transform
function.
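The preprocessing described here (read the file, replace the special characters with spaces, lowercase the rest) can also be done in a single pass over the string. This is a sketch that assumes C++11. `readFile` and `cleanText` are made-up names, and the special characters are passed in as a string.

```cpp
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file into a string; returns an empty string if it
// cannot be opened.
std::string readFile(const std::string& path)
{
    std::ifstream in(path);
    if (!in)
        return std::string();
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// In one pass: every character listed in `special` becomes a space,
// and every other character is lowercased.
void cleanText(std::string& text, const std::string& special)
{
    for (char& ch : text)
    {
        if (special.find(ch) != std::string::npos)
            ch = ' ';
        else
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
}
```

For the characters in this post, a call would look like `cleanText(text, "!,-=+");`.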

I would really appreciate any help. Thanks in advance.

Thanks,
Hp

Oct 25 '05 #13
Hp wrote:
Hi All,
Thanks a lot for all your replies.

My requirement is as follows:
I need to read a text file, eliminate certain special characters
(like ! , - = +), then convert it to lower case, and then remove certain
stopwords (like and, a, an, by, the, etc.) that are listed in another
text file.
Then I need to run it through a stemmer (a program which converts words
like running to run, i.e., converts them to root words).
Then I need to create a term-by-document matrix, which would be a
matrix where M(i,j) gives the number of times term j occurs
in document i.

My situation as of now is as below:
I have read the file contents into a string variable, removed/replaced
the special characters with a space using the replace function, and
then converted the string completely to lower case, using the transform
function.

I would really appreciate any help. Thanks in advance.

Thanks,
Hp


I know this may sound sacrilegious in a C++ newsgroup and all, but
does the text processing program have to be written in C++?

There are several dedicated text processing tools such as awk or sed,
or scripting languages (like Perl) that are specifically designed for
text stream editing. While certainly none of these alternatives is
particularly accessible, none has a steep learning curve either.

The power of regular expressions for manipulating text is difficult to
match in a C++ program without such support, at least in my experience.
And since I am not (too much of) a language snob, I recommend choosing
the best language for the job, even if it's not the best language. For
example, lowercasing a file's content with sed is a simple command

sed -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' inputfile

Writing a C++ program to do the same would be more involved. The good
news is that TR1's regex brings regular expression support to C++. So if
a C++ solution is required, I would look at regex to see whether it can
help solve your problem.
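TR1's regex was later standardized as `std::regex` in C++11. Here is a sketch of how it could handle two of the steps in this thread, assuming a C++11 compiler. The function names are hypothetical. `removeWords` assumes the words contain only letters, so they need no escaping inside the pattern.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Replaces each run of characters that are not letters, digits or
// whitespace with a single space.
std::string stripSpecial(const std::string& text)
{
    static const std::regex special("[^[:alnum:][:space:]]+");
    return std::regex_replace(text, special, " ");
}

// Removes whole-word occurrences of the given words using \b word
// boundaries, so "the" is removed but "there" is left alone.
// Assumption: the words are plain letters, so no escaping is needed.
std::string removeWords(const std::string& text, const std::vector<std::string>& words)
{
    if (words.empty())
        return text;
    std::string pattern = "\\b(";
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        if (i != 0)
            pattern += '|';
        pattern += words[i];
    }
    pattern += ")\\b";
    return std::regex_replace(text, std::regex(pattern), "");
}
```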

And if you do write the program in a language other than C++, some here
will be able to forgive you. But just don't tell your friends what you
have done.

Greg

Oct 25 '05 #14
Hp
Yeah Greg, I do need to have it coded in C++.
Thanks for your reply, though. I still haven't found a solution to that.


Oct 25 '05 #15
On 24 Oct 2005 20:32:45 -0700, "Greg" <gr****@pacbell.net> wrote:
I know this may sound sacrilegious in a C++ newsgroup and all, but
does the text processing program have to be written in C++?

There are several dedicated text processing tools such as awk or sed,
or scripting languages (like Perl) that are specifically designed for
text stream editing. While certainly none of these alternatives is
particularly accessible, none has a steep learning curve either.


My thoughts exactly. I use Python for my scripting needs. But this is a C++
forum and I think answers using C++ tools are appropriate.

Maybe the OP would like to take a look at boost's regular expressions library?

-dr
Oct 25 '05 #16
Greg wrote:
I know this may sound sacrilegious in a C++ newsgroup and all, but
does the text processing program have to be written in C++?

There are several dedicated text processing tools such as awk or sed,
or scripting languages (like Perl) that are specifically designed for
text stream editing. While certainly none of these alternatives is
particularly accessible, none has a steep learning curve either.


I disagree to some extent with that common point of view. Certainly, using
the language best suited to the job sounds reasonable. But in many cases
a person or organization has good knowledge of only one language and a
superficial, possibly outdated, knowledge of others. And using the
"main" language even for relatively small things has the advantage that
the code, or parts of it, can be reused in other projects.

Another factor is the coherence of the project. Several projects have a
Perl or Python part that generates C or C++ code. That means that the
people able to contribute to the project as a whole must know both
languages you chose.

And finally, the C++ standard library is powerful enough to do many
things without much effort. For example, std::string makes practical many
things that were unreasonable to write with C-style strings. "Accelerated
C++" can be seen as a sample of how to use C++ for "scripting-style" tasks.

Certainly there are, for example, a lot of Perl modules for many tasks that
are not easily available or not so versatile in other languages.

--
Salu2
Oct 25 '05 #17
Hp
Hi Guys,
I need to use C++ and no other scripting tool.
If anybody could give a solution to the problem, it would be highly
appreciated.
Thanks,
Hp

Oct 25 '05 #18
"Hp" <pr****************@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
Hi Guys,
I need to use C++ and no other scripting tool.
If anybody could give a solution to the problem, it would be highly
appreciated.


It's almost a certainty that nobody here is going to
simply provide a solution (that's not what this group
is for). You've received many ideas and hints, why
not give it a try, then when you get stuck, you can
post your (relevant) code and ask specific questions,
whereupon you'll receive more specific assistance.

If you really do want a completed solution, you need
to find a 'help wanted' group to post your solicitation.

-Mike
Oct 25 '05 #19
