Array problem | | |
Hi all
I have a large vector with float point numbers in it, for example
(1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
determine
how many uique elements in the array?
Thanks
Rick | | | | re: Array problem gouqizi.lvcha@gmail.com wrote:[color=blue]
> Hi all
>
> I have a large vector with float point numbers in it, for example
> (1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
> determine
> how many uique elements in the array?[/color]
Sort the array, e.g. with qsort(), run through the sorted array
and increase a counter at encountering the first element and afterwards
whenever consecutive elements are not "equal".
Note that equality with floating point numbers should not
hinge on "==" but rather on
fabs(a-b) <= MY_THRESHOLD*(fabs(a)+fabs(b))
or some similar construct where MY_THRESHOLD is a small number based,
for example, on your requirements or on DBL_EPSILON (from <float.h>).
If you know for sure that there is only a reasonably small number
of "distinct" values (distinct with the above notion of equality) and
you know these values, you can make use of the bucket sort algorithm
(you need not do the final concatenation -- just look at the number
of "used" buckets).
However, these are rather algorithmical questions, hence not topical
round here. comp.programming may a good source for the best algorithm.
You are of course welcome to come back here with your code if you
have problems.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address. | | | | re: Array problem gouqizi.lvcha@gmail.com wrote:[color=blue]
> Hi all
>
> I have a large vector with float point numbers in it, for example
> (1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
> determine
> how many uique elements in the array?[/color]
Yes and no.
Yes: Sort the array with qsort(), and then scan over it
in one pass looking for identical adjacent values.
No: Floating-point numbers usually come from sources that
are not exact -- you measure a temperature, or round an
irrational square root to a nearby representable value, and
so on. How many "unique" values do you think this array holds:
double array[3];
array[0] = 3.0;
array[1] = pow(sqrt(3.0), 2.0);
array[2] = exp(log(9.0) / 2.0);
? Deciding on what makes a number "unique" may be harder
than the rest of the job combined.
-- Eric.Sosman@sun.com | | | | re: Array problem
"gouqizi.lvcha@gmail.com" wrote:[color=blue]
>
> I have a large vector with float point numbers in it, for example
> (1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
> determine how many uique elements in the array?[/color]
Define uique. Or even define unique. Be precise.
--
Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> (C-library}
<http://gcc.gnu.org/onlinedocs/> (GNU docs) | | | | re: Array problem
Thanks for the challenge. There's a simple way to cheat - sort the
list using the machine's representation of the floating point number
cast to an integer. "Equal" numbers will have identical
representations. By "equal," I mean two numbers that differ by less
than the machine's precision, or ability to discriminate digits of
small significance. Here is some code:
#include <stdio.h>
#include <stdlib.h>
int fcmp(const void* a, const void* b)
{
/* These are floats, true, but we are only sorting to find equal
items.
* We can use a trick, treat them as raw bytes, and sort them based
on
* their IEEE 754 floating point representation. "Equal" floats -
* specifically, floating point numbers that are different by less
* than the machine epsilon (or significance) - will have equal
binary
* representations. */
long* ia=(long*)a;
long* ib=(long*)b;
return (*ia<*ib)?-1:((*ia>*ib)?1:0);
}
int main(void)
{
float A[5]; int i=0; int dups=0;
printf("Please input 5 floating point numbers: \n");
for (i=0; i<5; i++)
scanf(" %f", &(A[i]));
/* Perform in-place sort of 5 elements */
qsort(A, 5, sizeof(float), fcmp);
/* Perform linear scan for duplicates; also, print elements */
printf("Here's list sorted by IEEE representation: \n");
for (i=0; i<5-1; i++) {
printf(" %f\n", A[i]);
if (A[i]==A[i+1]) /* ah, so two floats CAN be =='d... within
precision */
dups++;
}
printf(" %f\nThere are %d repeats.\n", A[i], dups);
return 0;
}
cc 1.c -o 1 ; ./1
Please input 5 floating point numbers:
1.333333 2. 1.333333 4. 5.
Here's list sorted by IEEE representation:
1.333333
1.333333
2.000000
4.000000
5.000000
There are 1 repeats.
[rfink1@linux1 home]$ cc 1.c -o 1 ; ./1
# NOTE THE PROBLEM HERE...
Please input 5 floating point numbers:
1.3333333333333333 1.3333333333334 2. 3. 4.
Here's list sorted by IEEE representation:
1.333333
1.333333
2.000000
3.000000
4.000000
There are 1 repeats.
Best of luck,
Russ | | | | re: Array problem
On Wed, 08 Jun 2005 22:26:40 +0200, Michael Mair wrote:
[color=blue]
> gouqizi.lvcha@gmail.com wrote:[color=green]
>> Hi all
>>
>> I have a large vector with float point numbers in it, for example
>> (1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
>> determine
>> how many uique elements in the array?[/color]
>
> Sort the array, e.g. with qsort(), run through the sorted array
> and increase a counter at encountering the first element and afterwards
> whenever consecutive elements are not "equal".[/color]
Even this may not be correct.
[color=blue]
> Note that equality with floating point numbers should not
> hinge on "==" but rather on
> fabs(a-b) <= MY_THRESHOLD*(fabs(a)+fabs(b))
> or some similar construct where MY_THRESHOLD is a small number based,
> for example, on your requirements or on DBL_EPSILON (from <float.h>).[/color]
Which creates a problem. Say 2 numbers are considered "equal" if the
difference between them is less than 0.1. I'm mot concerned with the
accuracy of the representation here, say a decimal format is used. How
many unique numbers are there in:
1.00 1.01 1.02 1.09 1.11 1.13 1.19
?
The difference between each consecutive pair of numbers is less than 0.1
so every consecutive pair would be considered equal. So the algorithm
above would conclude that there a 1 unique element. But the difference
between the first and last numbers is more than 0.1 so these should
reasonably be considered unequal, which suggests that there should be at
least 2 unique elements found.
So you need to define what equals means VERY carefully.
Lawrence | | | | re: Array problem russfink@yahoo.com wrote:[color=blue]
> Thanks for the challenge. There's a simple way to cheat - sort the
> list using the machine's representation of the floating point number
> cast to an integer. "Equal" numbers will have identical
> representations. By "equal," I mean two numbers that differ by less
> than the machine's precision, or ability to discriminate digits of
> small significance. Here is some code:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int fcmp(const void* a, const void* b)
> {
> /* These are floats, true, but we are only sorting to find equal
> items.
> * We can use a trick, treat them as raw bytes, and sort them based
> on
> * their IEEE 754 floating point representation. "Equal" floats -
> * specifically, floating point numbers that are different by less
> * than the machine epsilon (or significance) - will have equal
> binary
> * representations. */
>
> long* ia=(long*)a;
> long* ib=(long*)b;
> return (*ia<*ib)?-1:((*ia>*ib)?1:0);
> }
>
> int main(void)
> {
> float A[5]; int i=0; int dups=0;
>
> printf("Please input 5 floating point numbers: \n");
> for (i=0; i<5; i++)
> scanf(" %f", &(A[i]));
>
> /* Perform in-place sort of 5 elements */
> qsort(A, 5, sizeof(float), fcmp);
>
> /* Perform linear scan for duplicates; also, print elements */
> printf("Here's list sorted by IEEE representation: \n");
> for (i=0; i<5-1; i++) {
> printf(" %f\n", A[i]);
> if (A[i]==A[i+1]) /* ah, so two floats CAN be =='d... within
> precision */
> dups++;
> }
> printf(" %f\nThere are %d repeats.\n", A[i], dups);
> return 0;
> }
>
> cc 1.c -o 1 ; ./1
> Please input 5 floating point numbers:
> 1.333333 2. 1.333333 4. 5.
> Here's list sorted by IEEE representation:
> 1.333333
> 1.333333
> 2.000000
> 4.000000
> 5.000000
> There are 1 repeats.
> [rfink1@linux1 home]$ cc 1.c -o 1 ; ./1
> # NOTE THE PROBLEM HERE...
> Please input 5 floating point numbers:
> 1.3333333333333333 1.3333333333334 2. 3. 4.
> Here's list sorted by IEEE representation:
> 1.333333
> 1.333333
> 2.000000
> 3.000000
> 4.000000
> There are 1 repeats.
>
> Best of luck,
> Russ
>[/color]
or when checking == try:
#define EPSILON 1.0e-10 /* or whatever threshold you require for
equality */
if ( A[i] - A[i+1] < EPSILON )
dups++;
save the need top cast to longs and such | | | | re: Array problem russfink@yahoo.com wrote:
[color=blue]
> Thanks for the challenge. There's a simple way to cheat - sort the
> list using the machine's representation of the floating point number
> cast to an integer. "Equal" numbers will have identical
> representations. By "equal," I mean two numbers that differ by less
> than the machine's precision, or ability to discriminate digits of
> small significance. Here is some code:[/color]
[color=blue]
> #include <stdio.h>
> #include <stdlib.h>
>
> int fcmp(const void* a, const void* b)
> {[/color]
(snip)
[color=blue]
> long* ia=(long*)a;
> long* ib=(long*)b;
> return (*ia<*ib)?-1:((*ia>*ib)?1:0);
> }[/color]
I would think memcmp() would be more portable, or
maybe a #if sizeof(long)==sizeof(float)
would do.
-- glen | | | | re: Array problem
Lawrence Kirby wrote:[color=blue]
> On Wed, 08 Jun 2005 22:26:40 +0200, Michael Mair wrote:
>
>[color=green]
>>gouqizi.lvcha@gmail.com wrote:
>>[color=darkred]
>>>Hi all
>>>
>>>I have a large vector with float point numbers in it, for example
>>>(1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
>>>determine
>>>how many uique elements in the array?[/color]
>>
>>Sort the array, e.g. with qsort(), run through the sorted array
>>and increase a counter at encountering the first element and afterwards
>>whenever consecutive elements are not "equal".[/color]
>
>
> Even this may not be correct.
>
>[color=green]
>>Note that equality with floating point numbers should not
>>hinge on "==" but rather on
>> fabs(a-b) <= MY_THRESHOLD*(fabs(a)+fabs(b))
>>or some similar construct where MY_THRESHOLD is a small number based,
>>for example, on your requirements or on DBL_EPSILON (from <float.h>).[/color]
>
>
> Which creates a problem. Say 2 numbers are considered "equal" if the
> difference between them is less than 0.1. I'm mot concerned with the
> accuracy of the representation here, say a decimal format is used. How
> many unique numbers are there in:
>
> 1.00 1.01 1.02 1.09 1.11 1.13 1.19
>
> ?
>
> The difference between each consecutive pair of numbers is less than 0.1
> so every consecutive pair would be considered equal. So the algorithm
> above would conclude that there a 1 unique element. But the difference
> between the first and last numbers is more than 0.1 so these should
> reasonably be considered unequal, which suggests that there should be at
> least 2 unique elements found.
>
> So you need to define what equals means VERY carefully.[/color]
Thanks for pointing out this particular problem -- I just thought
of the typical problems where one expects a set of "truly" distinct
values with distances between the exact results which are very large
compared to small clusters of more or less equal values.
BTW: I know how to deal with problems like that using more or less
statistical methods (together with an appropriate numerical analysis)
but I am not aware of a straightforward solution...
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address. | | | | re: Array problem
Eric Sosman wrote:[color=blue]
>
> gouqizi.lvcha@gmail.com wrote:
>[color=green]
>>Hi all
>>
>>I have a large vector with float point numbers in it, for example
>>(1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
>>determine
>>how many uique elements in the array?[/color]
>
>
> Yes and no.
>
> Yes: Sort the array with qsort(), and then scan over it
> in one pass looking for identical adjacent values.
>
> No: Floating-point numbers usually come from sources that
> are not exact -- you measure a temperature, or round an
> irrational square root to a nearby representable value, and
> so on. How many "unique" values do you think this array holds:
>
> double array[3];
> array[0] = 3.0;
> array[1] = pow(sqrt(3.0), 2.0);
> array[2] = exp(log(9.0) / 2.0);
>
> ? Deciding on what makes a number "unique" may be harder
> than the rest of the job combined.
>[/color]
Apoligies in advance for any line wrapping. I hope you can read it
somehow. The first line is a representation of the actual binary double
object (IEEE 754 I presume) on my Wintel box with GCC. The next three
lines are my evaluation of the exponent and mantissa as values. The last
line is the result of a 'printf("%.16e", d);' statement.
I would illustrate Eric's point.
double d = 3.0;
This is exact. The representation of 3 is precise and accurate.
01000000 00001000 00000000 00000000 00000000 00000000 00000000 00000000
Exp = 1024 (2)
000 00000010
Man = .11000 00000000 00000000 00000000 00000000 00000000 00000000
3.0000000000000000e+00
double d = pow(sqrt(3.0), 2.0);
Not exact. The two functions introduce precision errors. Close.
01000000 00000111 11111111 11111111 11111111 11111111 11111111 11111111
Exp = 1024 (2)
000 00000010
Man = .10111 11111111 11111111 11111111 11111111 11111111 11111111
2.9999999999999996e+00
double d = exp(log(9.0) / 2.0);
Not exact. Again precision error. But close.
01000000 00001000 00000000 00000000 00000000 00000000 00000000 00000001
Exp = 1024 (2)
000 00000010
Man = .11000 00000000 00000000 00000000 00000000 00000000 00000001
3.0000000000000004e+00
Close only counts in horse shoes, slow dancing and hand grenades. :-)
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein --- | | | | re: Array problem gouqizi.lvcha@gmail.com wrote:[color=blue]
> Hi all
>
> I have a large vector with float point numbers in it, for example
> (1.1, 2.1 , 3.2 , 3.3 , 4, 6, 3.2, 8). Is there an easy way to
> determine
> how many uique elements in the array?
>
> Thanks
>
> Rick[/color]
Binary search trees may also be useful in finding the duplicate
entries. | | | | re: Array problem
On Thu, 09 Jun 2005 12:40:04 -0700, glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:
[color=blue]
> russfink@yahoo.com wrote:[/color]
<snip>[color=blue][color=green]
> > int fcmp(const void* a, const void* b)
> > {[/color]
>
> (snip)
>[color=green]
> > long* ia=(long*)a;
> > long* ib=(long*)b;
> > return (*ia<*ib)?-1:((*ia>*ib)?1:0);
> > }[/color]
>
> I would think memcmp() would be more portable, or
> maybe a #if sizeof(long)==sizeof(float)
> would do.
>[/color]
IAYM memcmp for sizeof(float); that would be safe.
The (standard) preprocessor does not know about types and sizeof.
You need assert( sizeof(long)==sizeof(float) ) or a compile-time hack
like typedef char checker [ sizeof(long)==sizeof(float) ? 1 : -1 ] ;
- David.Thompson1 at worldnet.att.net |  | | | | /bytes/about
We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights.
Get the best answers to your questions from over 226,471 network members.
|