
sort an array of strings

Hi,

Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.

Thanks

Nov 25 '05 #1
25 Replies


"Rainmaker" <ca*************@gmail.com> writes:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


qsort() should be reasonably efficient. You might get better
performance by rolling your own sorting function (perhaps based on a
qsort() implementation -- which isn't necessarily Quicksort), which
could avoid the overhead caused by the indirect calls to the
comparison function.

Obviously, sorting pointers rather than shuffling the strings around
is going to save you a lot of time. (I'm not even sure how you'd go
about shuffling the strings themselves if they're of variable
lengths.)
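
A minimal sketch of the pointer-sorting approach with qsort(); the sample
data and array size here are purely illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparison function pointers to the elements,
   which here are themselves pointers to char, hence the double
   indirection. */
static int cmpstr(const void *a, const void *b)
{
    const char *sa = *(const char * const *)a;
    const char *sb = *(const char * const *)b;
    return strcmp(sa, sb);
}

int main(void)
{
    const char *words[] = { "pear", "apple", "orange", "banana" };
    size_t n = sizeof words / sizeof words[0];
    size_t i;

    /* Sort the pointers; the strings themselves never move. */
    qsort(words, n, sizeof words[0], cmpstr);

    for (i = 0; i < n; i++)
        puts(words[i]);
    return 0;
}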

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 25 '05 #2

Rainmaker said:
Hi,

Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


Quicksort or mergesort should do fine. Which you pick depends on just how
huge the array is.

Don't forget you probably don't need to sort the array of strings at all.
It's quite likely you can get away with merely sorting an array of
pointers, where each pointer points to one of the strings.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Nov 25 '05 #3

Keith Thompson wrote:
"Rainmaker" <ca*************@gmail.com> writes:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.

qsort() should be reasonably efficient. You might get better
performance by rolling your own sorting function (perhaps based on a
qsort() implementation -- which isn't necessarily Quicksort), which
could avoid the overhead caused by the indirect calls to the
comparison function.

Obviously, sorting pointers rather than shuffling the strings around
is going to save you a lot of time. (I'm not even sure how you'd go
about shuffling the strings themselves if they're of variable
lengths.)

You can shuffle variable length strings themselves if they're adjacent,
suggesting an insertion sort. It is very slow. Sorting pointers is a much
better idea if you can.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 25 '05 #4

Joe Wright <jw*****@comcast.net> writes:
Keith Thompson wrote:
"Rainmaker" <ca*************@gmail.com> writes:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.

qsort() should be reasonably efficient. You might get better
performance by rolling your own sorting function (perhaps based on a
qsort() implementation -- which isn't necessarily Quicksort), which
could avoid the overhead caused by the indirect calls to the
comparison function.
Obviously, sorting pointers rather than shuffling the strings around
is going to save you a lot of time. (I'm not even sure how you'd go
about shuffling the strings themselves if they're of variable
lengths.)

You can shuffle variable length strings themselves if they're
adjacent, suggesting an insertion sort. It is very slow. Sorting
pointers is a much better idea if you can.


If you must shuffle the strings themselves, you can first sort
pointers to the strings, then use the sorted array of pointers to
traverse the strings and copy them to a new location. This requires
enough space to hold two copies of all your strings.
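
A rough sketch of that copy step, assuming the pointers have already been
sorted and the destination buffer is known to be large enough (the names
are illustrative):

#include <string.h>

/* ptrs[0..n-1] point at NUL-terminated strings, already sorted.
   dst must be large enough to hold all of them, including the NULs. */
static void pack_sorted(char *dst, char *const ptrs[], size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        size_t len = strlen(ptrs[i]) + 1;   /* include the NUL */
        memcpy(dst, ptrs[i], len);
        dst += len;
    }
}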

Shuffling the strings in place probably requires shifting many of the
strings multiple times; if you really need to do that, the best
solution is probably to use something like insertion sort that only
swaps adjacent elements.

But it's unlikely that you really need to shuffle the strings.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 25 '05 #5

Keith Thompson wrote:
Joe Wright <jw*****@comcast.net> writes:
Keith Thompson wrote:
"Rainmaker" <ca*************@gmail.com> writes:

Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.

Shuffling the strings in place probably requires shifting many of the
strings multiple times; if you really need to do that, the best
solution is probably to use something like insertion sort that only
swaps adjacent elements.


Personally, I wouldn't recommend insertion sort for
a HUGE number of anything. O(HUGE^2) is simply too awful
to contemplate ...

The real difficulty with Rainmaker's question is that
he gives no clue about what HUGE means. The choice of a
strategy depends a lot on whether HUGE fits in physical
memory, is too big for physical memory but fits in virtual
memory, or is so big it won't even fit in virtual memory.
Given that lack of information, there's no way to offer a
reliable suggestion. Even my objection to O(HUGE^2) may
be faulty, if HUGE means only "More than I can count on
the fingers of Captain Hook's other hand."

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 25 '05 #6

Hi,

Sorry for the delay....
By HUGE I mean gigabytes of data... it may be more.

Sorting the pointers seems to me a good solution. Can you please give
me a small example (in C) as to how to achieve this?

Rainmaker

Nov 27 '05 #7

Rainmaker wrote:
Hi,

Sorry for the delay....
By HUGE i mean giga bytes of data....it may be more....

Sorting the pointers seems to me a good solution. Can you please give
me a small example(in C) as to how to achieve this?

Rainmaker


If you are working with gigabytes of data, then you likely do not want
to store it all in main memory. But see the CLC FAQ, question 13.11, at
http://www.eskimo.com/~scs/C-faq/q13.11.html

-Peter

--
Pull out a splinter to reply.
Nov 28 '05 #8

Rainmaker said:
Hi,

Sorry for the delay....
By HUGE i mean giga bytes of data....it may be more....
Then presumably this data is on file. I suggest you keep it there! :-)
Sorting the pointers seems to me a good solution. Can you please give
me a small example(in C) as to how to achieve this?


Well, not a small example, no! :-) But if you're talking Gigabytes, I think
mergesort is going to be your best bet.

If you don't have a copy of Knuth ("The Art of Computer Programming", Volume
3, "Sorting and Searching"), then http://en.wikipedia.org/wiki/Mergesort
would be a good starting place.
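
Richard's suggestion boils down to an external mergesort: break the data
into runs that fit in memory, sort each run, then merge the runs. A rough
sketch of the merge step for two already-sorted run files; MAXLINE and the
two-way merge are illustrative assumptions (a job this size would more
likely use a many-way merge):

#include <stdio.h>
#include <string.h>

#define MAXLINE 128   /* assumed: lines of at most ~100 bytes plus slack */

/* Merge two files whose lines are already sorted into one sorted file.
   Repeating this pairwise over many sorted runs is the core of an
   external mergesort. */
static void merge_runs(FILE *a, FILE *b, FILE *out)
{
    char la[MAXLINE], lb[MAXLINE];
    int have_a = (fgets(la, sizeof la, a) != NULL);
    int have_b = (fgets(lb, sizeof lb, b) != NULL);

    while (have_a && have_b) {
        if (strcmp(la, lb) <= 0) {
            fputs(la, out);
            have_a = (fgets(la, sizeof la, a) != NULL);
        } else {
            fputs(lb, out);
            have_b = (fgets(lb, sizeof lb, b) != NULL);
        }
    }
    while (have_a) {
        fputs(la, out);
        have_a = (fgets(la, sizeof la, a) != NULL);
    }
    while (have_b) {
        fputs(lb, out);
        have_b = (fgets(lb, sizeof lb, b) != NULL);
    }
}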

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Nov 28 '05 #9

Rainmaker wrote:
Hi,

Sorry for the delay....
By HUGE i mean giga bytes of data....it may be more....
Still no quantitative information. How many gigabytes?
How much more? Terabytes? Zettabytes?
Sorting the pointers seems to me a good solution. Can you please give
me a small example(in C) as to how to achieve this?


If the data won't fit in memory, sorting pointers to it
doesn't help much: The eventual rearrangement of the pointed-
to data (whether explicit or implicit) will involve an awful
lot of bouncing around on the disk. Let's see: one gigabyte
of 100-byte strings is ten million strings. If you must do
a 10-ms seek before reading each of them that's 100,000
seconds or about twenty-eight hours -- and that's *after*
sorting the pointers, which involved a few disk accesses
in its own right ...

Rainmaker, if you want usable advice you're going to have
to describe your problem in considerably more detail. How
many strings are there, how long are they, what do they signify,
is there any pre-existing order you can exploit, why do you
want them sorted anyhow, ...?

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 28 '05 #10

> If the data won't fit in memory, sorting pointers to it
doesn't help much: The eventual rearrangement of the pointed-
to data (whether explicit or implicit) will involve an awful
lot of bouncing around on the disk.


Sorting an array in memory of structures consisting of (a) the
sort key and (b) the file offset of the record is often a
useful technique, especially if the array of structures fits
in memory even though the records don't. I remember using that
to advantage on a system with 32k of memory in the early 70's.
Copying the file to put it in sorted order did involve a lot
of seeking, and sometimes would overheat the floppy drive.
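
A small sketch of the technique Gordon describes; the key length, the
record format (newline-terminated lines), and the buffer sizes are
assumptions for illustration only:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rec {
    char key[32];    /* assumed key length: enough of the string to sort on */
    long offset;     /* where the full record starts in the data file */
};

static int cmprec(const void *a, const void *b)
{
    return strcmp(((const struct rec *)a)->key,
                  ((const struct rec *)b)->key);
}

/* After qsort(recs, n, sizeof recs[0], cmprec), write the file out in
   sorted order by seeking to each stored offset in turn. */
static void copy_sorted(FILE *in, FILE *out, const struct rec *recs, size_t n)
{
    char line[128];
    size_t i;
    for (i = 0; i < n; i++) {
        fseek(in, recs[i].offset, SEEK_SET);
        if (fgets(line, sizeof line, in) != NULL)
            fputs(line, out);
    }
}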

Gordon L. Burditt
Nov 28 '05 #11

Gordon Burditt wrote:
If the data won't fit in memory, sorting pointers to it
doesn't help much: The eventual rearrangement of the pointed-
to data (whether explicit or implicit) will involve an awful
lot of bouncing around on the disk.

Sorting an array in memory of structures consisting of (a) the
sort key and (b) the file offset of the record is often a
useful technique, especially if the array of structures fits
in memory even though the records don't. I remember using that
to advantage on a system with 32k of memory in the early 70's.
Copying the file to put it in sorted order did involve a lot
of seeking, and sometimes would overheat the floppy drive.


Yah. Knuth cites a theorem by Floyd showing that the
task of rearranging the unsorted input (after generating a
sorted sequence of key/location pairs) has a lower bound
that makes it at least as hard as sorting the original
records in full, give or take a few constant factors.

Of course, those easily ignored constant factors can
have an awful lot to do with whether a method is or is not
practical. That's why Rainmaker needs to disclose quite a
lot more about his problem before advice can be advanced
with any confidence.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 29 '05 #12

To be precise,
take 2 terabytes of data.

Each string is not longer than 100 bytes.
I don't understand why you need to know what these strings signify.
These are normal strings, and you can store anything in them that is < 100 bytes.

There is no pre-existing order.

Thanks,
Rainmaker

Eric Sosman wrote:
Rainmaker wrote:
Hi,

Sorry for the delay....
By HUGE i mean giga bytes of data....it may be more....


Still no quantitative information. How many gigabytes?
How much more? Terabytes? Zettabytes?
Sorting the pointers seems to me a good solution. Can you please give
me a small example(in C) as to how to achieve this?


If the data won't fit in memory, sorting pointers to it
doesn't help much: The eventual rearrangement of the pointed-
to data (whether explicit or implicit) will involve an awful
lot of bouncing around on the disk. Let's see: one gigabyte
of 100-byte strings is ten million strings. If you must do
a 10-ms seek before reading each of them that's 100,000
seconds or about twenty-eight hours -- and that's *after*
sorting the pointers, which involved a few disk accesses
in its own right ...

Rainmaker, if you want usable advice you're going to have
to describe your problem in considerably more detail. How
many strings are there, how long are they, what do they signify,
is there any pre-existing order you can exploit, why do you
want them sorted anyhow, ...?

--
Eric Sosman
es*****@acm-dot-org.invalid


Nov 29 '05 #13

On 29 Nov 2005 14:42:41 -0800, in comp.lang.c , "Rainmaker"
<ca*************@gmail.com> wrote:
I dont understand why you need to know what these strings signify?


The more we understand about your (rather peculiar) requirement, the
more likely it is that someone can come up with an answer. The
solution for 10,000,000 short, very different strings could be radically
different from the solution for 10,000 long, very similar strings.

For instance, my immediate thought is - 2 TB of totally unsorted data
simply should not have been allowed to come into existence. You have a
major problem with whatever is creating these files, and you need to
take a step back and redesign that end of it.

Another thought - if the strings can be grouped into similar sets, e.g.
everything that starts "foobarpotato" in one group, everything that
starts "wibblewobble" in another, you could perhaps rapidly sort into
groups, then sort each group, perhaps writing them out to separate files
and using filesystem tools such as the Unix sort utility.

Hence it's useful to know more.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

Nov 29 '05 #14

Rainmaker wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
I dont understand why you need to know what these strings signify?
These are normal strings and you can store anything in it < 100 bytes.

There is no pre existing order.

Thanks,
Rainmaker

Eric Sosman wrote:


My apologies to Eric; I have snipped completely everything.
Now Rainmaker. Let's see... 2 terabytes is 2.0e12 bytes, right? 100 bytes
is 1.0e2, I think. Dividing file size by line length gives 2.0e10, I think.
That's 20 giga-lines, right?

We're trying to get a grip on what you have and what you are trying to
achieve. Your data set seems overly large. There aren't 20 giga-lines in
all the books in the Library of Congress.

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Nov 30 '05 #15

[Please don't top-post.]

Rainmaker wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
I dont understand why you need to know what these strings signify?
Perhaps you don't, but then you don't understand how to sort such
strings either, so I don't understand why you're withholding information
that could help get you an efficient solution to your problem. ;-)
These are normal strings and you can store anything in it < 100 bytes.
What is a 'normal' string?
There is no pre existing order.


If they can store _anything_, why exactly are you sorting them? How
will that benefit your future processing?

It sounds like you don't actually have a C question per se, rather you
have a programming issue that could probably be better answered in
another, more general, forum, e.g. comp.programming.

But you're unlikely to get different answers there either, if you're
not going to offer clues that might help to make the sort more streamlined.

--
Peter

Nov 30 '05 #16

In article <11**********************@z14g2000cwz.googlegroups.com>,
Rainmaker <ca*************@gmail.com> wrote:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


Taking into account the other information you posted, about the
amount of data and about the size of each string, and the lack of
information about what the strings signify (information that might
potentially allow for a better algorithm based upon the internal
structure of the strings, or based upon string probabilities):

I wonder whether you really need to -sort- the strings, or if you
only need to be able to -locate- a string efficiently?

Often, the most efficient way for locating particular strings is -not-
through sorting them. Instead, the most efficient way might be
through some kind of hash table. The efficiency of the hash table
would depend upon matters such as whether you need to be able to
update the table without rebuilding it; upon the distribution
of the strings; and upon restrictions you may need to impose about
paging data in from disk.
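
For illustration, a bare-bones chained hash table along those lines; the
bucket count and hash function are arbitrary choices, and a table covering
terabytes of data would of course have to live on disk rather than in a
static array like this:

#include <stdlib.h>
#include <string.h>

#define NBUCKET 65536   /* assumed; tune to the data */

struct node {
    struct node *next;
    char *str;
};

static struct node *bucket[NBUCKET];

static unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKET;
}

static void insert(char *s)
{
    unsigned long h = hash(s);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return;            /* real code would handle this properly */
    n->str = s;
    n->next = bucket[h];
    bucket[h] = n;
}

static char *lookup(const char *s)
{
    struct node *n;
    for (n = bucket[hash(s)]; n != NULL; n = n->next)
        if (strcmp(n->str, s) == 0)
            return n->str;
    return NULL;
}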
--
Is there any thing whereof it may be said, See, this is new? It hath
been already of old time, which was before us. -- Ecclesiastes
Nov 30 '05 #17

Rainmaker wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
Okay: at least 20,000,000,000 strings.
I dont understand why you need to know what these strings signify?


For example, suppose they are all United States postal
("ZIP") codes consisting of five digits. You could "sort"
them by creating a table of 100,000 counters, reading the
data and counting how many times each code appears, and
then traversing the table in order, outputting the proper
number of copies of the input string. Knowing something
about the data doesn't always open up other approaches, but
it sometimes does.
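
A sketch of that counting approach, assuming the input is one five-digit
code per line on stdin (the I/O handling is deliberately simplified):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    static unsigned long count[100000];   /* one counter per possible code */
    char line[16];
    long code;

    /* Counting pass: tally each code as it is read. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        code = strtol(line, NULL, 10);
        if (code >= 0 && code < 100000)
            count[code]++;
    }

    /* Output pass: emit each code the number of times it was seen. */
    for (code = 0; code < 100000; code++) {
        unsigned long i;
        for (i = 0; i < count[code]; i++)
            printf("%05ld\n", code);
    }
    return 0;
}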

More generally, it's often helpful to describe the
overall problem you're trying to solve, not just seek advice
on one of the steps in a solution you've come up with. It
may happen that there's a better way to attack the larger
problem, one that avoids the step that's giving trouble.

Despite repeated pleas, though, you persist in dribbling
out tiny bits of information -- always, always less than was
asked for. All right, I wash my hands of you; I've got better
things to do than coddle the uncooperative for free. Based
on the little you've told us about your problem:

- You've got more data than is likely to fit in high-
speed memory, so you need what is known as an "external
sort." Plenty of such programs have been written already
and I suggest you don't set out to write yet another.
Unix systems come with a utility imaginatively named
"sort," and there are commercial products that are faster,
fancier, friendlier, and so on. Choose one and use it.

- The amount of time required to sort your data is hard
to estimate. The I/O speeds of your disks will play
a part, as will the size of your computer memory. If
we suppose you can use ten gigabytes of memory, the
technique of replacement selection applied to your two
terabytes of input will give about one hundred initial
sub-files to sort, and you might be able to finish up
with a hundred-way merge. If you've got only two gig
of buffer space, though, you'll get something like five
hundred sub-files and will probably want a different
merge pattern. In any event, the choice of a good merge
pattern will be heavily dependent on the characteristics
of your computer and disks.

- There's at least a fifty percent chance that you really
don't need to sort the data in the classical sense at
all. But lacking information about what you're trying
to do, I'm unable to suggest any likely shortcuts. Go
ahead and sort, if that's your desire.

... and for further advice, seek another newsgroup. You
do not have a C problem -- not unless you decide to write a
C program, and I suspect you're not yet knowledgable enough
about sorting to write an effective one -- so you should seek
a forum frequented by sorting experts, not experts in C (or
Fortran, or Java, or ...). Good bye, and good luck.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 30 '05 #18

"Rainmaker" wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
I dont understand why you need to know what these strings signify?
These are normal strings and you can store anything in it < 100 bytes.

There is no pre existing order.


This link looks kind of reasonable to me. I post this for background and as
a starting point; I didn't follow the links to the code. But I don't think
you want code anyway. I think the number of files your OS will allow you to
have open at a single time will become interesting as you pursue this.
Depending on the real numbers in your problem, you may want to open hundreds
of files at a time, and your OS may be the constraint on what you can do.

I found this with <"external sort" huge OR giga>

http://cis.stvincent.edu/swd/extsort/extsort.html
Nov 30 '05 #19

Rainmaker wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
I dont understand why you need to know what these strings signify?
These are normal strings and you can store anything in it < 100 bytes.


So reference URLs from a weblog of a very popular website or something
like that?

Ok, like Peter posted, your top-level should be merge-sort. This will
allow you to retain locality of reference, which is very important once
you get to sizes that large. Ok, but once your recursion makes it down
to in-memory sizes, you should use IntroSort which is implemented as
quick-sort on top of heapsort (not on top of InsertSort, which some
IntroSort implementations do.) As an additional condition, once you
reach "in-cache" sizes (L2 size is ok) for the partition, switch to
heap-sort. And of course, generate pointers first, and sort the
pointers first.

Here's why:

1. Ordinary quicksort on *average* performs fewer compares and data
moves than either heap-sort or merge-sort. (But note that the
guaranteed O(n*log(n)) versions of Quicksort, based on the Select
algorithm, do *NOT* have this property -- i.e., its only worth it if
you use the naive style algorithm.) So when you are dominated by
comparison and access times, quick-sort is the best bet.

2. On modern processors, Heap-sort is faster once you reach the cache.
This is not very well known, however, since I had to discover this for
myself a couple of years ago: http://www.pobox.com/~qed/sort.html . This
does require that you use "branch removal" techniques, or use a modern
compiler which implements these techniques (such as the Intel
compiler). The reason is that once everything is in the cache, branch
misses start to dominate the actual bottom line performance. By
properly coding up heap-sort, it almost completely removes all branch
mispredictions (even though it performs three times as many comparisons
and twice as many data moves as quicksort).

3. Random disk access is just tremendously slow, and can only be
mitigated by streaming the data in the order that it is stored in in
the first place. I.e., the first random access is always far more
expensive than reads that follow it, if they proceed sequentially from the
first read. Mergesort is the only sorting algorithm that performs
streaming.

There are some new parallel sorting algorithms which are very close to
O(n*log(n)/#cpus); however, I don't know very much about them. It seems
to me that once you recurse to "in-memory" you can dispatch each
segment to a processor, which will basically be close enough. But I am
really just talking off the top of my head on that.
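
A sketch of the "mergesort on top, something faster underneath" shape of
this plan, but using the library qsort() for the in-memory stage rather
than the IntroSort/heapsort combination described above; CUTOFF is an
arbitrary illustrative threshold:

#include <stdlib.h>
#include <string.h>

#define CUTOFF 4096   /* assumed size at which a partition "fits in memory" */

static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Merge-sort an array of string pointers, handing small partitions to
   qsort(); tmp must have room for n pointers. */
static void msort(char **a, char **tmp, size_t n)
{
    size_t half, i, j, k;

    if (n <= CUTOFF) {
        qsort(a, n, sizeof *a, cmpstr);
        return;
    }
    half = n / 2;
    msort(a, tmp, half);
    msort(a + half, tmp, n - half);

    /* Merge the two sorted halves through tmp, then copy back. */
    for (i = 0, j = half, k = 0; i < half && j < n; )
        tmp[k++] = (strcmp(a[i], a[j]) <= 0) ? a[i++] : a[j++];
    while (i < half)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}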

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 30 '05 #20


In article <yN********************@comcast.com>, Joe Wright <jw*****@comcast.net> writes:
Rainmaker wrote:
To be precise,
Take 2 terabytes of data

each string not longer than 100bytes.
I dont understand why you need to know what these strings signify?
These are normal strings and you can store anything in it < 100 bytes.

There is no pre existing order.


My apologies to Eric; I have snipped completely everything.
Now Rainmaker. Let's see.. 2 terabytes is 2.0e12 bytes, right? 100 bytes
is 1.0e2 I think. Dividing file size by line length give 2.0e10 I think.
That's 20 giga-lines, right?

We're trying to get a grip on what you have and what you are trying to
achieve. Your data set seems over large. There aren't 20 giga-lines in
all the books in the Library of Congress.


This problem description looks a great deal like a "rainbow table" -
an offline dictionary of the hash values for various strings, used
for cracking passwords. The OP's use of "Rainmaker" as a nickname
also suggests that.

There's a bunch of literature on constructing and using rainbow
tables, and I'd suggest that someone at the "what sort should I use?"
stage is not going to beat the published approaches. In other words,
some research seems to be the appropriate next step, and I don't mean
asking OT questions on comp.lang.c.

--
Michael Wojcik mi************@microfocus.com

Please enjoy the stereo action fully that will surprise you. -- Pizzicato Five
Dec 1 '05 #21

Keith Thompson wrote:

"Rainmaker" <ca*************@gmail.com> writes:
Can anyone tell me an efficient algorithm to sort
an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


qsort() should be reasonably efficient. You might get better
performance by rolling your own sorting function (perhaps based on a
qsort() implementation -- which isn't necessarily Quicksort), which
could avoid the overhead caused by the indirect calls to the
comparison function.

Obviously, sorting pointers rather than shuffling the strings around
is going to save you a lot of time. (I'm not even sure how you'd go
about shuffling the strings themselves if they're of variable
lengths.)


Strings of variable lengths
can be contained in objects of the same sufficient size.

char string[][sizeof "three"] = {"one","two","three"};
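
With that layout the rows themselves can be handed to qsort(), since every
element has the same size. A small sketch building on pete's example (the
surrounding program is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmprow(const void *a, const void *b)
{
    /* each element starts with its NUL-terminated string */
    return strcmp((const char *)a, (const char *)b);
}

int main(void)
{
    char string[][sizeof "three"] = {"one", "two", "three"};
    size_t n = sizeof string / sizeof string[0];
    size_t i;

    qsort(string, n, sizeof string[0], cmprow);
    for (i = 0; i < n; i++)
        puts(string[i]);
    return 0;
}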

--
pete
Dec 1 '05 #22

we******@gmail.com wrote:
1. Ordinary quicksort on *average* performs fewer compares and data
moves than either heap-sort or merge-sort.


Fewer data moves, yes.
But the sift-up, sift-down version of heapsort
and a simple mergesort
will each generally use fewer comparisons than a quicksort,
even on small arrays.

Followup To: comp.programming

--
pete
Dec 2 '05 #23

pete wrote:

pete wrote:

we******@gmail.com wrote:
1. Ordinary quicksort on
*average* performs fewer compares and data
moves than either heap-sort or merge-sort.
Fewer data moves, yes.
But the sift up, sift down version of heapsort
and a simple mergesort,
will each generally uses less comparisons than a quicksort.
Even on small arrays

Followup To: comp.programming


Again.
I have to take that back.
I do in fact have a Lomuto style quicksort
which is lighter on comparisons than my best heapsort
for small arrays.

I'll get back to you on my mergesort measurements.


My mergesort claim didn't pan out either.

Above about 500 elements
is where my siftdown siftup heapsort
does fewer comparisons than the Lomuto on disordered arrays.

--
pete
Dec 3 '05 #24

"Rainmaker" <ca*************@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


If I tell you, can I get the "A"?

--
Mabden
Dec 5 '05 #25

"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote in message
news:dm**********@canopus.cc.umanitoba.ca...
In article <11**********************@z14g2000cwz.googlegroups.com>,
Rainmaker <ca*************@gmail.com> wrote:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should be
efficient enough to deal with it.


Taking into account the other information you posted, about the
amount of data and about the size of each string, and the lack of
information about what the strings signify (information that might
potentially allow for a better algorithm based upon the internal
structure of the strings, or based upon string probabilities ):

I wonder whether you really need to -sort- the strings, or if you
only need to be able to -locate- a string efficiently?

Often, the most efficient way for locating particular strings is -not-
through sorting them. Instead, the most efficient way might be
through some kind of hash table. The efficiency of the hash table
would depend upon matters such as whether you need to be able to
update the table without rebuilding it; upon the distribution
of the strings; and upon restrictions you may need to impose about
paging data in from disk.


Jesus, Dude, it's just homework. Calm down.

--
Mabden
Dec 5 '05 #26
