473,383 Members | 1,785 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,383 software developers and data experts.

Question About Method

I'm not quite sure if this the right forum for this question, so if
I'm wrong in my assumption, if you could point me in the right
direction, I'd appreciate it.

I'm in the process of learning C++, coming from a brief spell with C.
I'm completely self-taught, and the best way for me to learn a
language is by coming up with an idea for a project, and working on it
until I understand the methods used to finish it. Not the most ideal
way to learn, I'm sure, but it's the only option I have at the moment.

Currently, I'm working on a program that unscrambles words using a
600k dictionary file. I've made one in C before, but unfortunately, it
was horribly slow.

To speed up the routine I've decided to make an integer array
consisting of the line number where each letter starts in the
alphabetized dictionary file. So in case the scrambled letters were
abc (to make things simple), it would only scan through the a, b, and
c sections of the file. Also, I've been pondering the idea of putting
all of the data in the dictionary file into a linked list so as to make
the data processing quite a bit faster. Unfortunately, this would use
quite a bit of memory and I'm pretty sure it's not the most efficient
way. The reasoning behind my madness is this: scanning through the same
area of a file over and over again is redundant. Accessing the word
directly from a variable is a lot faster than having to grab it from a
file, and *then* allocating it to a variable. But like I said, memory is
an issue. Or is it?

My idea consists of scanning through the whole file when the program is
originally opened, storing the values of the words in a linked list, and
using an array to point to where the various letter groups starts. Is there
an easier/more practical way of doing this? And if so, what? If my method
*is* feasible, what sort of problems would I be likely to encounter?

Thanks much,
Mike
Jul 19 '05 #1
6 2248
On Wed, 22 Oct 2003 02:04:46 GMT, Michael Lawson <js******@adelphia.net> wrote:

To speed up the routine I've decided to make an integer array
consisting of the line number where each letter starts in the
alphabetized dictionary file. So in case the scrambled letters were
abc (to make things simple), it would only scan through the a, b, and
c sections of the file. Also, I've been pondering the idea of putting
all of the data in the dictionary file into a linked list so as to make
the data processing quite a bit faster. Unfortunately, this would use
quite a bit of memory and I'm pretty sure it's not the most efficient
way. The reasoning behind my madness is this: scanning through the same
area of a file over and over again is redundant. Accessing the word
directly from a variable is a lot faster than having to grab it from a
file, and *then* allocating it to a variable. But like I said, memory is
an issue. Or is it?


A better algorithm will speed things up.

There's no need to search through the words, you can preprocess them
once and then do a fast lookup.

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
or map<string, vector<string> > or whatever.

Finding the unscrambled word is then as simple as converting the
scrambled word to a "sorted word" and then looking up the words it
could be in the map.

Memory will be gobbled up, but memory isn't usually a problem on
PCs these days. Memory can be optimised by using a hashing function
instead of "sorted words" as the index to an array of words. And file
offsets for the words.

But that's algorithms and not C++.

--
Sam Holden
Jul 19 '05 #2
Sam Holden <sh*****@flexal.cs.usyd.edu.au> wrote in message
news:slrnbpc7rk.fa2.sh*****@flexal.cs.usyd.edu.au. ..
On Wed, 22 Oct 2003 02:04:46 GMT, Michael Lawson <js******@adelphia.net>

wrote:

To speed up the routine I've decided to make an integer array
consisting of the line number where each letter starts in the
alphabetized dictionary file. So in case the scrambled letters were
abc (to make things simple), it would only scan through the a, b, and
c sections of the file. Also, I've been pondering the idea of putting
all of the data in the dictionary file into a linked list so as to make
the data processing quite a bit faster. Unfortunately, this would use
quite a bit of memory and I'm pretty sure it's not the most efficient
way. The reasoning behind my madness is this: scanning through the same
area of a file over and over again is redundant. Accessing the word
directly from a variable is a lot faster than having to grab it from a
file, and *then* allocating it to a variable. But like I said, memory is
an issue. Or is it?


A better algorithm will speed things up.

There's no need to search through the words, you can preprocess them
once and then do a fast lookup.

Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
or map<string, vector<string> > or whatever.


Yes, I know of a Scrabble algorithm that found the highest-scoring possible
word from a dictionary of 90,000 words in a fraction of a second on a 33MHz
486. It stored words as sorted anagrams also, and then in a tree. Each
letter of the sorted anagram was a node in the tree.

DW

Jul 19 '05 #3
Sam Holden wrote:

[snip - unscramble words]
Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
What do you do about anagrams?
or map<string, vector<string> > or whatever.


Using a map makes the question moot, assuming you store (a hash of)
sorted words. But consider "edra" which could unscramble to "read",
"dear", or "dare". I'm not sure you can solve this problem without
knowledge of the context. Perhaps you can use some sort of suffix
chaining algorithm? Clearly, there is no 1-1 mapping.

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Jul 19 '05 #4
On Wed, 22 Oct 2003 14:09:22 -0400,
David Rubin <bo***********@nomail.com> wrote:
Sam Holden wrote:

[snip - unscramble words]
Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
What do you do about anagrams?


It's a multimap so they all get put in with the same key.
or map<string, vector<string> > or whatever.


Using a map makes the question moot, assuming you store (a hash of)
sorted words. But consider "edra" which could unscramble to "read",
"dear", or "dare". I'm not sure you can solve this problem without
knowledge of the context. Perhaps you can use some sort of suffix
chaining algorithm? Clearly, there is no 1-1 mapping.


If a "scrambled" word just has a random arrangement of letters then clearly
you can't. You get a number of possible answers. But that's in the OPs
problem statement as well.

--
Sam Holden
Jul 19 '05 #5
Sam Holden wrote:

On Wed, 22 Oct 2003 14:09:22 -0400,
David Rubin <bo***********@nomail.com> wrote:
Sam Holden wrote:

[snip - unscramble words]
Convert each dictionary word into a word with the same letters but in
sorted order. For example, "cat" -> "act" and "aardvark"=>"aaadkrrv".
Store the mappings of "sorted words" to words in a multimap<string,string>
What do you do about anagrams?


It's a multimap so they all get put in with the same key.


That's my point. There could be more than one unscrambled word per
scrambled word.
[...anagrams...] Clearly, there is no 1-1 mapping.

If a "scrambled" word just has a random arrangement of letters then clearly
you can't. You get a number of possible answers. But that's in the OPs
problem statement as well.


This is all I saw:

Currently, I'm working on a program that unscrambles words using a
600k dictionary file.

OP doesn't mention anagrams at all. But it seems like an obvious
concern.

/david

--
Andre, a simple peasant, had only one thing on his mind as he crept
along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
-- unknown
Jul 19 '05 #6
In article <3F***************@nomail.com>, bo***********@nomail.com
says...

[ ... ]
It's a multimap so they all get put in with the same key.


That's my point. There could be more than one unscrambled word per
scrambled word.


I would use equal_range to find all the words corresponding to a set of
letters. Except under rather unusual circumstances, the computer
probably can't pick between them, at least without a LOT of extra work.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 19 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Robert Ferrell | last post by:
I have a style question. I have a class with a method, m1, which needs a helper function, hf. I can put hf inside m1, or I can make it another method of the class. The only place hf should ever...
1
by: Mike Malter | last post by:
I am just starting to work with reflection and I want to create a log that saves relevant information if a method call fails so I can call that method again later using reflection. I am...
15
by: designconcepts | last post by:
bo'jour, bo'jour, So I have question to present to the forum about OOD. This is a Csharp forum, but C# is the lang of choice and the question is an exercise based on some comments by the chief...
6
by: z_learning_tester | last post by:
Quick question- What happens if you have a private class with a public static method? Can you still say the following? Lets say you are making this call from another class, say class2... int...
1
by: Natalia DeBow | last post by:
Hi, I am working on a Windows-based client-server application. I am involved in the development of the remote client modules. I am using asynchronous delegates to obtain information from...
2
by: SouthSpawn | last post by:
I have the following situation. I am creating my own user control. This asp.net user control will have a method called "Read". When this method is being called from the page. It will read line...
105
by: Christoph Zwerschke | last post by:
Sometimes I find myself stumbling over Python issues which have to do with what I perceive as a lack of orthogonality. For instance, I just wanted to use the index() method on a tuple which does...
21
by: Roland | last post by:
The following issue is puzzling me: There are 2 ways of writing the code below: .... Dim fnt as Font = New Font(...) DrawString(myText, fnt,...) fnt.dispose(). or DrawString(myText, New...
29
by: Brad Pears | last post by:
Here is a simple OO design question... I have a Contract class. The user can either save an existing contract or they start off fresh with a blank contract, fill in the data and then save a...
2
by: K Viltersten | last post by:
Suppose there is the following class structure. class S {...} class A : S {...} class B : S {...} class A1 : A {void m(){...}} class B1 : B {void m(){...}} class B2 : B {...} Just to clarify...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.