Connecting Tech Pros Worldwide Forums | Help | Site Map

best method to find the freequent numbers

Imran
Guest
 
Posts: n/a
#1: Mar 14 '06
I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]

and I want to find out the number which occurs most frequently.what is the
quick method. My array size is huge.

what I am doing is

1. find out the maximum value N
2. loop through 1...N
3. count # times each occurred
4. output the most frequent one

Are there are more efficient implementations avaiable?
Thanks



Peter Karlsson
Guest
 
Posts: n/a
#2: Mar 14 '06

re: best method to find the freequent numbers


Den 2006-03-14 skrev Imran <abdulrahman.imran@de.bosch.com>:[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.[/color]
[snip][color=blue]
> Are there are more efficient implementations avaiable?
> Thanks[/color]

Check out:
http://www.jjj.de/fxt/demo/sort/
For some good implementations.

//Peter
--
My REAL email address is:
vim -c ":%s/^/CrgreUXneyffba@tznvy.pbz/|:normal ggVGg?"
"Rainbows are pretty. I don't know why I shoot at them."
Richard G. Riley
Guest
 
Posts: n/a
#3: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Imran <abdulrahman.imran@de.bosch.com> wrote:[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.
>
> what I am doing is
>
> 1. find out the maximum value N
> 2. loop through 1...N
> 3. count # times each occurred
> 4. output the most frequent one
>
> Are there are more efficient implementations avaiable?
> Thanks
>
>[/color]

If the range of the integers is limited to something "reasonable" like
65535 then you could consider creating an array indexed by the
integer in question and incrementing the count. Without writing pure C
code completely, it might approximate to something like:

while(not finished) begin
nextInt = inputIntegers[readIndex++];
countArray[nextInt]++;
end

Very fast. You could keep track of the most common number in the loop
or do a quick scan at the end.

The creation of the count array and the loop details are in your hands...
Vladimir S. Oka
Guest
 
Posts: n/a
#4: Mar 14 '06

re: best method to find the freequent numbers


On Tuesday 14 March 2006 08:50, Imran opined (in
<dv6058$p31$1@ns2.fe.internet.bosch.com>):
[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is
> the quick method. My array size is huge.
>
> what I am doing is
>
> 1. find out the maximum value N[/color]

I guess you mean "find the size of the array"? I'd expect that to be
known upfront.
[color=blue]
> 2. loop through 1...N
> 3. count # times each occurred
> 4. output the most frequent one
>
> Are there are more efficient implementations avaiable?
> Thanks[/color]

Let's make this slightly C-specific:

You can use `qsort()` to sort the array. Once sorted, you need just one
pass to determine the most frequent number (and it's frequency),
without the need to keep track of more than one count at the time
(well, two if you include current maximum). What you've proposed above,
would have required a separate count for every unique number you
encounter.

--
BR, Vladimir

He who laughs, lasts.

Ben C
Guest
 
Posts: n/a
#5: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Imran <abdulrahman.imran@de.bosch.com> wrote:[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.
>
> what I am doing is
>
> 1. find out the maximum value N
> 2. loop through 1...N
> 3. count # times each occurred
> 4. output the most frequent one
>
> Are there are more efficient implementations avaiable?
> Thanks[/color]

Reading between the lines a bit, you're making another vector of counts
1 .. N long. It might not need to be that long if not every number is
represented; so you could use a sort of hash table where the hashing
function is just n % n_buckets. This could save memory, but wouldn't
be any faster.

But if every time you update one of the counts you keep track of the
maximum and minimum count so far, i.e.:

...

counts[number]++;
if (counts[number] > *max) max = counts + number;
if (counts[number] < *min) min = counts + number;

...

Then as soon as the number of elements left in the original vector is
less than *max - *min, you can stop counting and break out of the loop,
because none of them can get bigger than the max you've found.

Would this be worth it? Well, statistically, I would have thought quite
possibly, but you'd have to just try it and do some tests.
Richard G. Riley
Guest
 
Posts: n/a
#6: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=blue]
> On 2006-03-14, Imran <abdulrahman.imran@de.bosch.com> wrote:[color=green]
>> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>>
>> and I want to find out the number which occurs most frequently.what is the
>> quick method. My array size is huge.
>>
>> what I am doing is
>>
>> 1. find out the maximum value N
>> 2. loop through 1...N
>> 3. count # times each occurred
>> 4. output the most frequent one
>>
>> Are there are more efficient implementations avaiable?
>> Thanks[/color]
>
> Reading between the lines a bit, you're making another vector of counts
> 1 .. N long. It might not need to be that long if not every number is
> represented; so you could use a sort of hash table where the hashing
> function is just n % n_buckets. This could save memory, but wouldn't
> be any faster.
>
> But if every time you update one of the counts you keep track of the
> maximum and minimum count so far, i.e.:
>
> ...
>
> counts[number]++;
> if (counts[number] > *max) max = counts + number;
> if (counts[number] < *min) min = counts + number;
>
> ...
>
> Then as soon as the number of elements left in the original vector is
> less than *max - *min, you can stop counting and break out of the loop,
> because none of them can get bigger than the max you've found.[/color]

The break condition is a nice optimization alright.

A little simpler might be (no need to keep pointers to maximum and
minum integer locations and have overhead of pointer addition).

count = ++counts[number];
if (count>=max){
secondHighest=max;
max=count;
}
//numLeft is number left to process in vector.
if(numLeft < max-secondHighest)
break;
[color=blue]
>
> Would this be worth it? Well, statistically, I would have thought quite
> possibly, but you'd have to just try it and do some tests.[/color]

Ben C
Guest
 
Posts: n/a
#7: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:
[color=blue]
> A little simpler might be (no need to keep pointers to maximum and
> minum integer locations and have overhead of pointer addition).
>
> count = ++counts[number];
> if (count>=max){
> secondHighest=max;
> max=count;
> }
> //numLeft is number left to process in vector.
> if(numLeft < max-secondHighest)
> break;[/color]

Also what I posted was wrong: max - secondHighest is correct. max - min
is not.

The point at which you break out is not necessarily the point at which
you discover the "winning" max count though; you could have discovered
max and secondHighest on previous iterations, and break out because
numLeft has got small enough.

It means then you may need to get back from max to the corresponding
number, since you need to return the number not the count. That was why
I had max as a pointer. There are other ways of course, like keep the
corresponding number in a variable as you go along etc...
Richard G. Riley
Guest
 
Posts: n/a
#8: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=blue]
> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:
>[color=green]
>> A little simpler might be (no need to keep pointers to maximum and
>> minum integer locations and have overhead of pointer addition).
>>
>> count = ++counts[number];
>> if (count>=max){
>> secondHighest=max;
>> max=count;
>> }
>> //numLeft is number left to process in vector.
>> if(numLeft < max-secondHighest) /****
>> break;[/color]
>
> Also what I posted was wrong: max - secondHighest is correct. max - min
> is not.
>
> The point at which you break out is not necessarily the point at which
> you discover the "winning" max count though; you could have discovered
> max and secondHighest on previous iterations, and break out because
> numLeft has got small enough.[/color]

I thought I did that? /**** above
Ben C
Guest
 
Posts: n/a
#9: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:[color=blue]
> On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=green]
>> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:
>>[color=darkred]
>>> A little simpler might be (no need to keep pointers to maximum and
>>> minum integer locations and have overhead of pointer addition).
>>>
>>> count = ++counts[number];
>>> if (count>=max){
>>> secondHighest=max;
>>> max=count;
>>> }
>>> //numLeft is number left to process in vector.
>>> if(numLeft < max-secondHighest) /****
>>> break;[/color]
>>
>> Also what I posted was wrong: max - secondHighest is correct. max - min
>> is not.
>>
>> The point at which you break out is not necessarily the point at which
>> you discover the "winning" max count though; you could have discovered
>> max and secondHighest on previous iterations, and break out because
>> numLeft has got small enough.[/color]
>
> I thought I did that? /**** above[/color]

You do break out in the right place, the question is, having broken out
what do you have? The frequency of the highest-frequency number (but not
the highest-frequency number itself-- because that's not necessarily
"number", which was all I was saying). How to get from the max count
back to the corresponding number is the question. That was why I was
using pointers. *max is the highest count, max - counts would then be
the highest-frequency number, assuming the counts array is just
one entry for each possible value of number.

But never mind, this is just details, and not really as complicated as
I'm making it sound, and there are plenty of other ways of doing it that
are just as good or better. It would be pretty obvious what to do when
one actually implemented it I think.
Richard G. Riley
Guest
 
Posts: n/a
#10: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=blue]
> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:[color=green]
>> On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=darkred]
>>> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:
>>>
>>>> A little simpler might be (no need to keep pointers to maximum and
>>>> minum integer locations and have overhead of pointer addition).
>>>>
>>>> count = ++counts[number];
>>>> if (count>=max){
>>>> secondHighest=max;
>>>> max=count;
>>>> }
>>>> //numLeft is number left to process in vector.
>>>> if(numLeft < max-secondHighest) /****
>>>> break;
>>>
>>> Also what I posted was wrong: max - secondHighest is correct. max - min
>>> is not.
>>>
>>> The point at which you break out is not necessarily the point at which
>>> you discover the "winning" max count though; you could have discovered
>>> max and secondHighest on previous iterations, and break out because
>>> numLeft has got small enough.[/color]
>>
>> I thought I did that? /**** above[/color]
>
> You do break out in the right place, the question is, having broken out
> what do you have? The frequency of the highest-frequency number (but not
> the highest-frequency number itself-- because that's not necessarily[/color]
[color=blue]
> "number", which was all I was saying). How to get from the max count
> back to the corresponding number is the question. That was why I was
> using pointers. *max is the highest count, max - counts would then be
> the highest-frequency number, assuming the counts array is just
> one entry for each possible value of number.
>
> But never mind, this is just details, and not really as complicated as
> I'm making it sound, and there are plenty of other ways of doing it that
> are just as good or better. It would be pretty obvious what to do when
> one actually implemented it I think.[/color]

You're absolutely right!

a one liner after the if statement makes it all good

mostCommonNumber=number;

Fred Kleinschmidt
Guest
 
Posts: n/a
#11: Mar 14 '06

re: best method to find the freequent numbers



"Richard G. Riley" <rgrdev@gmail.com> wrote in message
news:cpuie3-ffm.ln1@quark.hadron...[color=blue]
> On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:[color=green]
>> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:[color=darkred]
>>> On 2006-03-14, Ben C <spamspam@spam.eggs> wrote:
>>>> On 2006-03-14, Richard G. Riley <rgrdev@gmail.com> wrote:
>>>>
>>>>> A little simpler might be (no need to keep pointers to maximum and
>>>>> minum integer locations and have overhead of pointer addition).
>>>>>
>>>>> count = ++counts[number];
>>>>> if (count>=max){
>>>>> secondHighest=max;
>>>>> max=count;
>>>>> }
>>>>> //numLeft is number left to process in vector.
>>>>> if(numLeft < max-secondHighest) /****
>>>>> break;
>>>>
>>>> Also what I posted was wrong: max - secondHighest is correct. max - min
>>>> is not.
>>>>
>>>> The point at which you break out is not necessarily the point at which
>>>> you discover the "winning" max count though; you could have discovered
>>>> max and secondHighest on previous iterations, and break out because
>>>> numLeft has got small enough.
>>>
>>> I thought I did that? /**** above[/color]
>>
>> You do break out in the right place, the question is, having broken out
>> what do you have? The frequency of the highest-frequency number (but not
>> the highest-frequency number itself-- because that's not necessarily[/color]
>[color=green]
>> "number", which was all I was saying). How to get from the max count
>> back to the corresponding number is the question. That was why I was
>> using pointers. *max is the highest count, max - counts would then be
>> the highest-frequency number, assuming the counts array is just
>> one entry for each possible value of number.
>>
>> But never mind, this is just details, and not really as complicated as
>> I'm making it sound, and there are plenty of other ways of doing it that
>> are just as good or better. It would be pretty obvious what to do when
>> one actually implemented it I think.[/color]
>
> You're absolutely right!
>
> a one liner after the if statement makes it all good
>
> mostCommonNumber=number;
>[/color]

The OP also has to consider that the distribution of values might be
multi-modal:
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
The above set has 6 equally populated modes.
--
Fred L. Kleinschmidt
Boeing Associate Technical Fellow
Technical Architect, Software Reuse Project



Rod Pemberton
Guest
 
Posts: n/a
#12: Mar 14 '06

re: best method to find the freequent numbers



"Imran" <abdulrahman.imran@de.bosch.com> wrote in message
news:dv6058$p31$1@ns2.fe.internet.bosch.com...[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.[/color]

Forget about sorting the data, the time it takes to sort will be much more
than the time it takes to count it.

Try to only loop through the data once.

Factors which heavily affect how this could be implemented are:
1) the range of the integers
2) the data source of the vector
3) the size of the vector

If the range of the integers is small, say zero to 127 (ASCII) for a text
analysis of William Shakespaere, the implementation is simple:
1) create a counter array of 127 unsigned long or unsigned long long's
2) increment from the first to last element of the vector
3) increment the appropriate counter

If the range is very large, you may want to create multiple arrays to help
you keep track of the data to reduce memory usage:
1) one array of bits where each bit is a seen or not-seen indicator of
values in the vector
2) another array or binary tree which keeps track of the count for each of
the seen integers
3) you may need to normalize the data first, by say mapping one set of
numbers to another.
This could allow you to reduce the range of integers used:
1,6,30,32,60,200 to 1,2,3,4,5,6

If the data source of the vector is say from some function, forget about
generating the vector. Just feed the data into your tabulation method.

If the "huge" vector is truly huge, it becomes an issue of finding a method
to balance the memory available for tabulation and the speed of execution.


Rod Pemberton


Jan-Hinnerk Dumjahn
Guest
 
Posts: n/a
#13: Mar 14 '06

re: best method to find the freequent numbers


Imran wrote:
[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.[/color]

There two common "types" of complexity (of an algorithm):
- expected complexity
- worst-case complexity

For calculating expected complexity you need to know something about the
probabilities of different vectors.
Moreover, good expected complexity is not good enough for some applications
(e.g. real-time)

This leaves you with worst-case complexity.
[color=blue]
> what I am doing is
>
> 1. find out the maximum value N
> 2. loop through 1...N
> 3. count # times each occurred
> 4. output the most frequent one[/color]

Viewing the worst-case complexity none of the approaches are better than
yours.

I guess the best approach is some kind of sorting and then scanning through
once, like Vladimir suggested. However qsort() is not the best choice since
Quicksort has complexity O(n^2).

Faster sorting algorithms usually work better with linked lists then with
arrays, so you would need some extra memory ;-)

My suggestion would be to use a (height-balanced) binary-search-tree.
At the nodes you keep pairs of (value,frequency). For each array element
search the value in the tree. If it exists increment frequency; else add a
new node (value,1). After you are done with the array look for the node
with highest frequency.

This algorithm is especially fast, if you don't have many different values.

There should be BST implementations available in C.

--------

If your integer type is small compared to the size of the array, sorting the
array with bucket sort could be even faster than using a BST.

Have fun
/Jan-Hinnerk

Keith Thompson
Guest
 
Posts: n/a
#14: Mar 14 '06

re: best method to find the freequent numbers


Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:
[...][color=blue]
> I guess the best approach is some kind of sorting and then scanning through
> once, like Vladimir suggested. However qsort() is not the best choice since
> Quicksort has complexity O(n^2).[/color]

Quicksort has complexity O(n log n). I think pure Quicksort has
worst-case complexity O(n^2), but there's no requirement for qsort()
to be implemented as pure Quicksort.
[color=blue]
> Faster sorting algorithms usually work better with linked lists then with
> arrays, so you would need some extra memory ;-)[/color]

I don't think that's true. The fastest way to sort a linked list is
usually to copy it to an array and sort the array.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jan-Hinnerk Dumjahn
Guest
 
Posts: n/a
#15: Mar 14 '06

re: best method to find the freequent numbers


Keith Thompson wrote:
[color=blue]
> Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:
> [...][color=green]
>> I guess the best approach is some kind of sorting and then scanning
>> through once, like Vladimir suggested. However qsort() is not the best
>> choice since Quicksort has complexity O(n^2).[/color]
>
> Quicksort has complexity O(n log n). I think pure Quicksort has
> worst-case complexity O(n^2), but there's no requirement for qsort()
> to be implemented as pure Quicksort.[/color]

I have taken a quick look on google but wasn't been able to find something
on Quicksort with O(N log n). However, I came across heapsort which seems
to fit quite nicely. (Wish we had done more fun stuff like that at
university...)

qsort() need not have a poor implementation, but it could ;-)
[color=blue][color=green]
>> Faster sorting algorithms usually work better with linked lists then with
>> arrays, so you would need some extra memory ;-)[/color]
>
> I don't think that's true. The fastest way to sort a linked list is
> usually to copy it to an array and sort the array.[/color]

Complexity should be same for both (if you don't need random access). From a
theoretical approach I still believe that e.g. bucketsort would be slower
using arrays. However, if I consider real world effects like processor
cache using arrays could be a lot faster ;-(

Keith Thompson
Guest
 
Posts: n/a
#16: Mar 14 '06

re: best method to find the freequent numbers


Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:[color=blue]
> Keith Thompson wrote:[color=green]
>> Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:[/color][/color]
[...][color=blue][color=green][color=darkred]
>>> Faster sorting algorithms usually work better with linked lists then with
>>> arrays, so you would need some extra memory ;-)[/color]
>>
>> I don't think that's true. The fastest way to sort a linked list is
>> usually to copy it to an array and sort the array.[/color]
>
> Complexity should be same for both (if you don't need random access). From a
> theoretical approach I still believe that e.g. bucketsort would be slower
> using arrays. However, if I consider real world effects like processor
> cache using arrays could be a lot faster ;-([/color]

Most decent (O(n log n)) sorting algorithms do require random access.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jordan Abel
Guest
 
Posts: n/a
#17: Mar 14 '06

re: best method to find the freequent numbers


On 2006-03-14, Keith Thompson <kst-u@mib.org> wrote:[color=blue]
> Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:
> [...][color=green]
>> I guess the best approach is some kind of sorting and then scanning through
>> once, like Vladimir suggested. However qsort() is not the best choice since
>> Quicksort has complexity O(n^2).[/color]
>
> Quicksort has complexity O(n log n). I think pure Quicksort has
> worst-case complexity O(n^2), but there's no requirement for qsort()
> to be implemented as pure Quicksort.
>[color=green]
>> Faster sorting algorithms usually work better with linked lists then with
>> arrays, so you would need some extra memory ;-)[/color]
>
> I don't think that's true. The fastest way to sort a linked list is
> usually to copy it to an array and sort the array.[/color]

I'd argue that the fastest way to sort a linked list is to keep it
sorted in the first place.
pete
Guest
 
Posts: n/a
#18: Mar 14 '06

re: best method to find the freequent numbers


Jordan Abel wrote:[color=blue]
>
> On 2006-03-14, Keith Thompson <kst-u@mib.org> wrote:[color=green]
> > Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:
> > [...][color=darkred]
> >> I guess the best approach is
> >> some kind of sorting and then scanning through
> >> once, like Vladimir suggested.
> >> However qsort() is not the best choice since
> >> Quicksort has complexity O(n^2).[/color]
> >
> > Quicksort has complexity O(n log n). I think pure Quicksort has
> > worst-case complexity O(n^2), but there's no requirement for qsort()
> > to be implemented as pure Quicksort.
> >[color=darkred]
> >> Faster sorting algorithms usually work better
> >> with linked lists then with
> >> arrays, so you would need some extra memory ;-)[/color]
> >
> > I don't think that's true. The fastest way to sort a linked list is
> > usually to copy it to an array and sort the array.[/color]
>
> I'd argue that the fastest way to sort a linked list is to keep it
> sorted in the first place.[/color]

I find linked lists to be especially well suited to mergesort.


struct list_node {
struct list_node *next;
void *data;
};

typedef struct list_node list_type;

list_type *list_sort(list_type *head,
int (*compar)(const list_type *, const list_type *));
list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *));

static long unsigned node_count(list_type *head);
static list_type *list_split(list_type *head, long unsigned count);
static list_type *node_sort (list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *));



list_type *list_sort(list_type *head,
int (*compar)(const list_type *, const list_type *))
{
return node_sort(head, node_count(head), compar);
}

static long unsigned node_count(list_type *head)
{
long unsigned count;

for (count = 0; head != NULL; head = head -> next) {
++count;
}
return count;
}

static list_type *node_sort(list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *))
{
long unsigned half;
list_type *tail;

if (count > 1) {
half = count / 2;
tail = list_split(head, half);
tail = node_sort(tail, count - half, compar);
head = node_sort(head, half, compar);
head = list_merge(head, tail, compar);
}
return head;
}

static list_type *list_split(list_type *head, long unsigned count)
{
list_type *tail;

while (--count != 0) {
head = head -> next;
}
tail = head -> next;
head -> next = NULL;
return tail;
}

list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *))
{
list_type *list, *sorted, **node;

node = compar(head, tail) > 0 ? &tail : &head;
sorted = list = *node;
*node = sorted -> next;
while (*node != NULL) {
node = compar(head, tail) > 0 ? &tail : &head;
sorted -> next = *node;
sorted = *node;
*node = sorted -> next;
}
sorted -> next = head != NULL ? head : tail;
return list;
}

--
pete
Grumble
Guest
 
Posts: n/a
#19: Mar 15 '06

re: best method to find the freequent numbers


Jan-Hinnerk Dumjahn wrote:[color=blue]
> I have taken a quick look on google but wasn't able to find something
> on Quicksort with O(N log n). However, I came across heapsort which
> seems to fit quite nicely.[/color]

Quicksort has O(n log(n)) _average_ complexity.

http://en.wikipedia.org/wiki/Sorting_algorithm
http://en.wikipedia.org/wiki/Quicksort
Grumble
Guest
 
Posts: n/a
#20: Mar 15 '06

re: best method to find the freequent numbers


Jan-Hinnerk Dumjahn wrote:[color=blue]
> I guess the best approach is some kind of sorting and then scanning
> through once, like Vladimir suggested. However qsort() is not the
> best choice since Quicksort has complexity O(n^2).[/color]

Who said qsort() _must_ implement Quicksort?
Richard Bos
Guest
 
Posts: n/a
#21: Mar 15 '06

re: best method to find the freequent numbers


Keith Thompson <kst-u@mib.org> wrote:
[color=blue]
> Jan-Hinnerk Dumjahn <hinni@despammed.com> writes:
> [...][color=green]
> > I guess the best approach is some kind of sorting and then scanning through
> > once, like Vladimir suggested. However qsort() is not the best choice since
> > Quicksort has complexity O(n^2).[/color]
>
> Quicksort has complexity O(n log n). I think pure Quicksort has
> worst-case complexity O(n^2), but there's no requirement for qsort()
> to be implemented as pure Quicksort.[/color]

In fact, there's no requirement for it to be implemented as any kind of
quicksort at all, though for obvious reasons it often is.

Richard
Mark F. Haigh
Guest
 
Posts: n/a
#22: Mar 15 '06

re: best method to find the freequent numbers


Imran wrote:[color=blue]
> I have a vector of integers, such as [1 3 6 7 6 3 3 4 9 10 ]
>
> and I want to find out the number which occurs most frequently.what is the
> quick method. My array size is huge.
>
> what I am doing is
>
> 1. find out the maximum value N
> 2. loop through 1...N
> 3. count # times each occurred
> 4. output the most frequent one
>
> Are there are more efficient implementations avaiable?
> Thanks[/color]

Try this. Note that if there are multiple integers with the same
number of occurrances, this code returns the highest valued one.

#include <stdio.h>
#include <stdlib.h>

int cmp(const void *p, const void *q)
{
return *((int *) p) - *((int *) q);
}

int sort_and_get_most_popular(int *arr, size_t nelem)
{
int ret = *arr;
size_t i, count = 1, best = 1;

qsort(arr, nelem, sizeof *arr, cmp);
for(i = 1; i < nelem; i++) {
if(arr[i] == arr[i - 1])
count++;
else {
if (count > best) {
ret = arr[i - 1];
best = count;
}
count = 1;
}
}
return ret;
}

int main(void)
{
int arr[] = { 1, 3, 6, 7, 6, 3, 3, 4, 9, 10 };
int result = sort_and_get_most_popular(arr,
sizeof arr / sizeof *arr);

printf("result: %d\n", result);
return 0;
}

[mark@icepick]$ gcc -Wall -ansi -pedantic -O2 foo.c -o foo
[mark@icepick]$ ./foo
result: 3


Give it a good review, though-- It's late, I'm tired, and I whipped it
up very quickly.

Mark F. Haigh
mfhaigh@sbcglobal.net

Closed Thread