Hello everyone,
I am a biologist that is a relatively new to programming. This isn't for homework, however I would like to efficiently write a piece of a program that finds the number of unique elements in a sorted array. My array has over 200,000+ elements in it, so it would be silly to brute force the whole thing, especially because the array is sorted.
Example:
@array = (0,0,0,1,1,1,1,2,3,3);
Would return:
0: 3
1: 4
2: 1
3: 2
Any suggestions? I don't need the entire code if it is against the rules, but I would like an idea of how to program this.
I'm not sure what you mean by brute force but if it's sorted, you could just start counting and when it changes, insert into a new array with the value and count. In peudocode, it would be - var arr1, arr2, group, count, i
-
-
load arr1
-
group = arr1(0)
-
count = 0
-
-
for i = 0 to upperbound(arr1)
-
if arr1(i) != group
-
insert into arr2 (group, count)
-
count = 0
-
group = arr1(i)
-
end if
-
-
count = count + 1
-
end for
6 4940
I'm not sure what you mean by brute force but if it's sorted, you could just start counting and when it changes, insert into a new array with the value and count. In peudocode, it would be - var arr1, arr2, group, count, i
-
-
load arr1
-
group = arr1(0)
-
count = 0
-
-
for i = 0 to upperbound(arr1)
-
if arr1(i) != group
-
insert into arr2 (group, count)
-
count = 0
-
group = arr1(i)
-
end if
-
-
count = count + 1
-
end for
Thanks, by brute force I mean nested loops that go through the whole array in order to count.
Thanks for your post, appreciate it!
-Lisa
Not a problem. Since it's sorted, you don't need nested loops but you will need the one loop to iterate through the whole array.
If you have a rough estimate on how often each number is in the array in average, you don't need to iterate through the whole array by single steps, because it's sorted. This speeds up your program very much.
Here is the idea:
Let's say you know that each number is in the array 10 times on average.
First look at the first element.
Then look at the tenth element.
Case 1: If it's the same, go on forward by single steps until you reach a different element. (For example if the thirteenth element is different, you have used only 5 lookups for 13 elements, which is more than 2 and a half times faster than going by single step).
Case 2: If it's not the same, then go backward by single steps until you find one that's the same. (For example if the seventh element is the same, you have used only 5 lookups for 10 elements, which is more than 2 times faster than going by single step).
Then use the greatest position (11 or 14 as new starting point and repeat the whole procedure.
If you can't estimate the average, you can use a statistic from the first elements to guess the average of further elements ( the jump size) while going through the array. This works well if you know that there is an equal distribution among the different numbers in the array.
If there is no such distribution, that means among 200000 elements there can be 10000 times one element and only 2 times another, you can use binary jump steps: Jump by 2, 4, 8, 16, 32, 64 etc. That means look at the first, third, seventh, fifteenthe etc. element. With this strategy you only need 10 lookups to count more than 1000 elements of a row.
If you know in advance how many different elements are there in your array, or if you know all names of all elements that occur in your array, you can even optimize more.
Just one more idea:
Use an algorithm like the binary search: always cut the jump size in half when going forward or backward, don't go by single steps.
For example compare element 1 with 1024. If they are not the same, compare one with 512. If they are not the same, compare one with 256, if they are the same, compare one with 256+128=384etc.
200k will not take up much memory, so just loop through the array storing the values in a hash. This works whether the values are already sorted or not. - use warnings;
-
use strict;
-
use Data::Dumper;
-
-
my @array = (0,0,0,1,1,1,1,2,3,3);
-
my %hash;
-
for (@array) {
-
$hash{$_}++;
-
}
-
print Dumper(\%hash);
-
-
__END__
-
-
$VAR1 = {
-
'1' => 4,
-
'3' => 2,
-
'0' => 3,
-
'2' => 1
-
};
-
Sign in to post your reply or Sign up for a free account.
Similar topics
by: superprad |
last post by:
Is there an easy way to grab the Unique elements from a list?
For Example:
data =
what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something...
|
by: jjl |
last post by:
I have an array, and try to find out the unique elements (case insentitive)
of array. Could someone help me out. Thanks a lot.
|
by: gouqizi.lvcha |
last post by:
Hi all
I have a large vector with float point numbers in it, for example
(1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
determine
how many uique elements in the array?
Thanks...
|
by: Paulers |
last post by:
Hello all,
I have a string array with duplicate elements. I need to create a new
string array containing only the unique elements. Is there an easy way
to do this? I have tried looping through...
|
by: Brian Tkatch |
last post by:
I'm looking for a simple way to unique an array of strings. I came up
with this. Does it make sense? Am i missing anything? (Testing seems to
show it to work.)
Public Function Unique(ByVal...
|
by: Tekkaman |
last post by:
I have a list of lists and I want to define an iterator (let's call
that uniter) over all unique elements, in any order. For example,
calling:
sorted(uniter(, , ]))
must return . I tried the...
|
by: newnewbie |
last post by:
Desperately need help in creating a query to count unique values in a table. I am a Business analyst with limited knowledge of Access….My boss got me ODBC connection to the underlying tables for our...
|
by: aszia787 |
last post by:
Hi guys,
I need to count the unique elements in an array. For example, if i have an array,
array. The number of unique elements in the array is 3. I am sorry if this message has been posted...
|
by: dez5000 |
last post by:
I'm trying to get a report by location that would list the number of visits to the location for the month but also count the number of unique visitors to that location.
I have a table of data with...
|
by: newnewbie |
last post by:
Hi,
Could somebody please help me with VBA code to count unique values in a Report?
Report is based on a query that has grouped values. Basically, I would like to use formula...
|
by: DolphinDB |
last post by:
Tired of spending countless mintues downsampling your data? Look no further!
In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: Vimpel783 |
last post by:
Hello!
Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
|
by: ArrayDB |
last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
|
by: CloudSolutions |
last post by:
Introduction:
For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
|
by: Defcon1945 |
last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
|
by: Shællîpôpï 09 |
last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
|
by: af34tf |
last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome former...
| |