By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,480 Members | 771 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,480 IT Pros & Developers. It's quick & easy.

string letter distribution algorithm

P: n/a
Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

Thank you in advance.

Cheers.,
Ulterior

Jun 20 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
Ulterior wrote:
I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'.
Just curious, out of the SIX letters, which one is non-unique?
word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.
This is not really a C++ language question, is it?

To determine if a word has a certain letter all you need is a bit mask
with each bit representing the letter. An unsigned long would do.
Then use the bitwise AND operator to see if the bit designated to 't'
is set.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 20 '07 #2

P: n/a

Ulterior <le***************@gmail.comwrote in message...
Hi, everyone,
I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?
All 'strings' are made of 'integers', (a 'char' is an integer type).
// cast is to 'print' number, not character. (short, long, etc. ok)
std::cout<<"\'A\'="<< static_cast<int>('A') <<std::endl;
// out: 'A'=65
( you may get a different number on your system.)
>
I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.
That's clear as mud.

int anT( 't' );
std::cout<<"anT="<<anT<<std::endl;
// out: anT=116

int anS( 'c'+'a'+'t'+'p'+'o'+'w' );
std::cout<<"anS="<<anS<<std::endl;
// out: anS=654

Now you want to find 116 in 654?

int anS2( 'c'+'a'+'p'+'o'+'w' ); // no 't'
if( ( anS - anS2 ) == anT ){
std::cout<<"\'"<< char( anT & 0xFF ) <<"\' found.\n"; // or 0x7F
}
// out: 't' found.
{
std::string wrd("catpow"); // #include <string>
for( std::size_t in(0); in < wrd.size(); ++in){
if( int( wrd.at( in ) ) == int( 't' ) ){
std::cout<<"letter \'t\' found"<<std::endl;
}
else{ std::cout<<"letter \'t\' NOT found"<<std::endl;}
}
}
/* - output -
letter 't' NOT found
letter 't' NOT found
letter 't' found
letter 't' NOT found
letter 't' NOT found
letter 't' NOT found
*/
>
I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.
Good luck with that.

{
std::vector<std::stringwords; // #include <vector>
words.push_back("hello");
words.push_back("world");
words.push_back("catpow");
words.push_back("clarity");
std::string FindThis( "t" );
for( std::size_t in(0); in < words.size(); ++in ){
if( words.at( in ).find( FindThis ) != std::string::npos ){
std::cout<<words.at( in )<<std::endl;
} // if(find)
} // for(in)
}
// out: catpow
// out: clarity

Now, tell us what you really really want.
--
Bob R
POVrookie
Jun 20 '07 #3

P: n/a
On Jun 20, 8:10 am, Ulterior <leonardas.surv...@gmail.comwrote:
Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.
You might consider std::bitset<26>, and its to_ulong member for your
value.

Try comp.programming.

- Anand

Jun 20 '07 #4

P: n/a
"Ulterior" <le***************@gmail.comwrote in message
news:11**********************@w5g2000hsg.googlegro ups.com...
Hi, everyone,

I have a simple problem, which bothers me for some while. I will try
to explain it -

There is some string, whith different letters in it. Is it possible to
analyse this string and the
reffer to it knowing that it has some certain letters in it using only
some integer value?

I.e if a string is 'catpow', there is 5 unique letters 'c', 'a', 't',
'p', 'o' and 'w'. word evaluates to some integer value, lets say
1234.

I would like to filter out all words except those having letter 't' in
it using single integer value and simple arithmetical expression.

Thank you in advance.
Here is something I threw together. Not sure if it's what you want, and
it's not optimized and is pretty much just C code.

#include <iostream>
#include <map>
#include <cmath>

int WordLetters( const char* Word )
{
int Result = 0;
for ( int i = 0; Word[i] != 0; ++i )
{
unsigned char LetterValue = static_cast<unsigned char>( toupper(
Word[i] ) );
// Ignore special characters, only treat characters
if ( LetterValue >= 'A' && LetterValue <= 'Z' )
Result |= static_cast<int>( std::pow( 2, LetterValue - 'A' ) );
}
return Result;
}

bool LetterInWord( char Letter, int WordValue )
{
unsigned char LetterValue = static_cast<unsigned char>( toupper(
Letter ) );
// Ignore special characters, only treat characters
if ( LetterValue >= 'A' && LetterValue <= 'Z' )
return (WordValue & static_cast<int>( std::pow(2, LetterValue -
'A' ))) != 0;
else
return false;
}

int main()
{
if ( sizeof(int) < 4 )
{
std::cout << "int is not big enough. try long int or something" <<
"\n";
return 1;
}

int Catpow = WordLetters( "Catpow" );
std::cout << "X in Catpow: " << LetterInWord( 'X', Catpow ) << "\n";
std::cout << "T in Catpow: " << LetterInWord( 'T', Catpow ) << "\n";
std::cout << "C in Catpow: " << LetterInWord( 'C', Catpow ) << "\n";
std::cout << "W in Catpow: " << LetterInWord( 'w', Catpow ) << "\n";

return 0;
}

Basically all I'm doing is converting each letter 'A' through 'Z' to a
number 1 thorugh 26. Then I take it as a binary place. Which means that I
need an ordinal value with at least 26 bits, for my system int fits that
since it has 32 bits. Then I simply set the bit on for that place.
Jun 20 '07 #5

P: n/a
Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.

Rgrds,
Ulterior

Jun 21 '07 #6

P: n/a
"Ulterior" <le***************@gmail.comwrote in message
news:11**********************@o61g2000hsh.googlegr oups.com...
Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.
"single value" "simple sql compare" Well, that's a bit tricky. If SQL does
bit checking then you can just retrieve the bit value of 't' using some
function, and run some SQL like this (if thsi was legal SQl)

select * from MyTable where WordLetters | LetterValue

If SQL doesn't do bitwise checking, you'll probalby need to come up with
some other way.

This now sounds more like a SQL problem then a C++ problem, perhaps a SQL
newsgroup would be more help.
Jun 21 '07 #7

P: n/a
On Jun 21, 12:47 am, Ulterior <leonardas.surv...@gmail.comwrote:
Hi, guys,

thanks for your kind answers, but, if you are eager to help me
further,
I will try to clarify my problem.

The point is to evaluate a string to some single value, store it in
database field
and the, using simple sql compare, retrieve only those records with
fields, which have 't'
letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.
SELECT * FROM table_name WHERE field_name LIKE '%t%'

- Anand

Jun 21 '07 #8

P: n/a
In article <11**********************@o61g2000hsh.googlegroups .com>,
le***************@gmail.com says...

[ ... ]
The point is to evaluate a string to some single value, store it in
database field and the, using simple sql compare, retrieve only those
records with fields, which have 't' letter in it.

There must be no functions, as there is 6mil records and a fetch must
be quick.
Using (or not using) functions won't make any real difference here. For
what you're doing, CPU time is virtually guaranteed to mean nothing
compared to I/O time -- if you use a dozen functions to avoid even a
little bit of I/O, it'll be a win.

Since SQL (as such) is off-topic, I'll discuss this in C++ terms, and
leave it to you to translate over to SQL (though the translation is
_mostly_ fairly trivial).

I'm going to assume that you're willing to consume some time up-front to
build an index that gives you really fast retrieval later. In that case,
let's represent your database as a vector of records:

class record {
std::string key;
void *data; // this just represents whatever other
// data goes with your key.
};

std::vector<recorddatabase;

Now, our index is going to be a vector indexed by a character, and what
it returns as a result will be a set of records that have that character
in the key:

typedef std::set<std::size_trecord_set;

std::vector<record_setindex;

Now, we need to build our index from our data. We do that by looking at
each item in the database, and adding its record number to the
appropriate places in the index:

// for each item in the database
for (int i=0; i<database.size(); i++) {

std::string const &s = database[i].key;

// for each character in the key:
for (int j=0; j<key.size(); j++) {
char ch = std::toupper(key[j]);
if (std::isupper(ch))
index[ch].insert(std::toupper(i));
}
}

Now, "index['t']" is the set of all records that have 't' somewhere in
their string. For example, to print out the record numbers of all the
records that have 't' in their key, we could do something like this:

record_set records = index['t'];

std::copy(records.begin(), records.end(),
std::ostream_iterator<std::size_t>(std::cout, "\n"));

Now, as to the conversion to SQL: the main thing is that you don't have
record numbers, so you typically include a field that acts as a unique
key. You'll want to tell your database to index on that field. IIRC, SQL
doesn't have a convenient way of representing a set like I've used
either. You can do that pretty easily with a two-column table, with the
letter in one column and a single record number in the right column.
With the left column indexed, retrieving the set of record numbers with
a particular letter is roughly equivalent to what I've written above.

I should also point out that if a given query is likely to retrieve most
of the records in the database, the index may not do much good. Its
whole point is to read only records you care about. The more records it
avoids reading, the more good it does. If you retrieve all or most of
the records anyway, the index just adds overhead.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 23 '07 #9

P: n/a
In article <MP************************@news.sunsite.dk>,
jc*****@taeus.com says...

[ ... ]
for (int i=0; i<database.size(); i++) {

std::string const &s = database[i].key;

// for each character in the key:
for (int j=0; j<key.size(); j++) {
char ch = std::toupper(key[j]);
if (std::isupper(ch))
index[ch].insert(std::toupper(i));
Oops -- that should, of course, be:
index[ch].insert(i);

My apologies for being an idiot. I should also add that I left out a few
things like resizing the index to be as large as needed (since the C++
version is really intended as something like pseudo-code for a SQL
version, not for use as-is).

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 23 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.