On May 22, 6:45 pm, theronnights... @gmail.com wrote:
I am having a problem with a small bit of code. I can't figure out why
it won't work. Here:
There are a couple of problems.
//code:
#include <iostream>
#include <fstream>
#include <ctype.h>
#include <string>
using namespace std;
int main()
{
ifstream file_in;
ofstream file_out;
string word_in;
int old_total = 0;
int new_total = 0;
int x = 0;
file_in.open("w ords.list");
file_out.open(" words.out");
while ( !file_in.eof() )
This line is definitely wrong. The *only* time it is useful to
check eof() is after a read has failed, to know if it failed
because you were at eof, or because of some other failure.
The standard idiom here would be:
while ( std::getline( file_in, word_in ) ) // ...
In production code, it's generally considered a good idea to
check bad() and eof() *after* the loop: bad() means you had a
hard read error, and !eof() means that there was a format error
in the file---it shouldn't happen with getline().
{
getline(file_in , word_in);
old_total++;
int good = 0;
for ( int i = 0; i < word_in.size(); i++ )
{
if ( isalpha( word_in[i] ) != 0)
This is undefined behavior. You can't legally call the single
argument version of isalpha with a char.
If I were writing this code, I'd use std::find_if, and a
predicate using the std::ctype facet. But I do this sort of
thing often enough that I've got the necessary predicate in my
tool box. For a simple loop, I'd write something like:
for ( std::string::co nst_iterator it = word_in.begin() ;
it != word_in.end()
&& isalpha( static_cast< unsigned char >( *it ) );
++ it ) {
}
if ( it == word_in.end() ) {
file_out << word_in << std::endl ;
++ new_total ;
}
Note that in general, the use of continue or break in a loop is
a sign of poor programming practice. Especially in this case:
you are using a standard algorithm, called linear search, and it
has a more or less standard implementation, easily recognized
and verified by anyone familiar with the idioms of the language.
{
continue;
}
else
{
x = 1;
break;
}
}
if ( x == 0 )
{
file_out << word_in << endl;
new_total++;
}
else
{
continue;
}
}
cout << old_total << " lines in the original file." << endl;
cout << new_total << " lines in new file." << endl;
cout << old_total - new_total << " lines of difference." << endl;
return 0;}
//code
All this is supposed to do is run through a list of words from one
file and put the one that only have the standard 26 letters of the
English alphabet into another file. It works fine until it hits the
first word with a non-english character, at which time it finishes and
exits. The output is this:
72411 lines in the original file.
589 lines in new file.
71822 lines of difference.
Obviously the new file should have a whole lot more lines than that.
At line 590 is the word Asunción. It is such a minor thing but it is
driving me up a wall none the less. Any comments would be much
appreciated.
Well, there are two obvious problems. The first is that once
you've set x, you never reset it, so it is non-zero for the rest
of the program. Of course, to begin with, it shouldn't be an
int, but a bool, with true and false. And since it is only used
within the loop, it should be declared (and initialized) within
the loop, and not outside. But in the end, if you're writing
idiomatic C++, there's no need for it what so ever.
The second problem is that you are calling isalpha with a char.
If char is signed on your implementation (it usually is), then
it is almost certain that the encoding for ó results in a
negative value, which in turn results in undefined behavior when
calling the single parameter version of isalpha. If you use
this version, you absolutely *must* cast the char to unsigned
char. Otherwise, it's undefined behavior. (With most
implementations , you'll get random results---some accented
characters may return true.)
Note too that whether isalpha( 'ó' ) returns true or false
depends on the locale. By default, here, you are in "C" locale,
where it is guaranteed to return false, but it's something you
generally want to be careful with; at least where I live, real
programs very rarely leave the default locale "C". This is
another advantage of using the std::ctype facet---you can
specify any locale you want at the call site, thus avoiding any
risk of the global locale not being the one you want.
--
James Kanze (GABI Software) email:ja******* **@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientier ter Datenverarbeitu ng
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34