By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
458,182 Members | 1,548 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 458,182 IT Pros & Developers. It's quick & easy.

Median Statistical function

P: 9
hi everyone!

i was wondering if anyone had any ideas on how to go about coding a method that would determine the median of a list of numbers

just to clarify- a median is the 'middle value' when all of the numbers are placed in increasing order;
for example- the median of
3 6 12 40 41 57 58 90
is 40.5

i was thinking about using a method that sorts the list in increasing order, then determining if there is an even or odd number of items, (if the number is even, there will be 2 middle values that have to be averaged) then 'cutting off' the first and last numbers until we reach the median value


i was wondering if anyone had any suggestions that may help with this endeavor, or even any guidance that may help me program the method!
im just not certain that this may be the easiest way, and wanted to get some feedback on what other people think..

thanks to everyone!
Apr 22 '08 #1
Share this Question
Share on Google+
7 Replies


Nepomuk
Expert 2.5K+
P: 3,112
hi everyone!

i was wondering if anyone had any ideas on how to go about coding a method that would determine the median of a list of numbers

just to clarify- a median is the 'middle value' when all of the numbers are placed in increasing order;
for example- the median of
3 6 12 40 41 57 58 90
is 40.5

i was thinking about using a method that sorts the list in increasing order, then determining if there is an even or odd number of items, (if the number is even, there will be 2 middle values that have to be averaged) then 'cutting off' the first and last numbers until we reach the median value


i was wondering if anyone had any suggestions that may help with this endeavor, or even any guidance that may help me program the method!
im just not certain that this may be the easiest way, and wanted to get some feedback on what other people think..

thanks to everyone!
Hi tomshorts07!
Just to clarify, if the list were
3 6 12 40 42 57 58 90
the median would be 41, correct?
The method you suggested is possible, however you don't have to sort the values. As sorting a collection can take quite some time, I would probably solve your problem something like this:

Say, you have an unsorted array of numbers like this:
Expand|Select|Wrap|Line Numbers
  1. int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};
then you have a look at, say, the first element: 58.
Now, go through the array and compare each number to that 58. (Constant time!) You'll find: there is 1 number bigger than 58 in the array.
Expand|Select|Wrap|Line Numbers
  1. int bigger = 1;
As the array has the length 8, you know: there are (8 - 1 - bigger) = 6 numbers smaller than 58 in the array.
Expand|Select|Wrap|Line Numbers
  1. int smaller = numbers.length() - 1 - bigger // 58 itself is neither bigger nor smaller than 58
Now,
Expand|Select|Wrap|Line Numbers
  1. (bigger - smaller) / 2 = (1 - 6) / 2 = -5 / 2 = -2.5
(divide by two, as you want to find the number in the middle) tells you, that there are two numbers relevant for the median: the second and the third smaller number than 58. Find those numbers, add them up and divide by two - there's your median.

Depending on the size of your array, that may save a lot of time, if you have a good way of finding the second and third (or whatever your values are) smaller number.

Hint: here a sorted collection like a TreeMap or a TreeSet or maybe the Collection.sort() method might be a good idea. (Think about how many elements you need...)

Greetings,
Nepomuk
Apr 22 '08 #2

Nepomuk
Expert 2.5K+
P: 3,112
http://en.wikipedia.org/wiki/Selection_algorithm
I guess, you mean the "Median of Medians algorithm"? It looks like it runs a bit faster than mine, but does it actually do the right thing? The description is a bit strange...
If I understand it correctly, it does this:
  • We have a Collection {13, 7, 29, 2, 97, 61, 5, 17, 31, 101}
  • This Collection is split into two Collections {13, 7, 29, 2, 97} and {61, 5, 17, 31, 101}
  • Then it calculates the medians of each group: 13 and 17
  • The median of these numbers is calculated: 15
  • The median of medians splits the original Collection somewhere between 30% / 70% and 70% / 30%: 2, 5, 7, 13 < 15 < 17, 29, 31, 61, 97, 101 => 40% / 60% (but so far, it doesn't know that, does it?)
  • And now?
From there on, I'm a little confused. My algorithm find's out, how many numbers are greater and how many are smaller faster, but of course it's not necessarily so near to the actual median.
Please explain, what happens now, to actually find the median?

Greetings,
Nepomuk
Apr 23 '08 #4

BigDaddyLH
Expert 100+
P: 1,216
I don't understand your algorithm since your explanation ends with quite the hand wave.

One simple median algorithm is to write essentially a modified quick sort:

1. Partition the data.
2. Instead of recursing on both partitions, select the one which would contain the median and recurse on only that one.

This algorithm has an expected time of O(N).
Apr 23 '08 #5

Nepomuk
Expert 2.5K+
P: 3,112
I don't understand your algorithm since your explanation ends with quite the hand wave.
What I was thinking is this:
You create a Collection (let's say an array, just for the sake of easiness) with as many elements as the furthest median value is away from the first number. (As calculated in the previous part of the algorithm.) Let's say, that first number is 58. So:
Expand|Select|Wrap|Line Numbers
  1. int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};
  2. double difference = -2.5; // calculated above
  3. int[] helpArray = new int[(int)(1 + abs(difference))]; // (int)(1 + 2.5) = (int) 3.5 = 3
  4.  
So, say the relevant numbers for the median are those 2 and 3 numbers smaller than 58, then the array has the size of 3.
Expand|Select|Wrap|Line Numbers
  1. // This is NOT supposed to be valid Java-code!
  2. helpArray.lenth() == 3
Now, go through the list of numbers and add the first 3 values, that are smaller than 58. Sort them, smallest first. (This is probably the part of the algorithm, that takes longest.)
Expand|Select|Wrap|Line Numbers
  1. // Neither is this or most of the other following code
  2. helpArray == {41, 6, 40};
  3. sort(helpArray);
  4. helpArray == {6, 40, 41};
  5.  
Now, go through the rest of the list and when you find a value that's smaller than 58 and larger than the smallest in the list, replace that smallest number and sort the list again (simple bubble sort like algorithm).
Expand|Select|Wrap|Line Numbers
  1. => helpArray == {12, 40, 41}
  2. => helpArray == {57, 40, 41}
  3. => helpArray == {40, 41, 57}
  4.  
Continue until the end of the list. The smallest number in your list is now the 3rd smaller number from 58, the second smallest is the 2nd smaller than 58.
Expand|Select|Wrap|Line Numbers
  1. int[] relevant = {40, 41};
  2. int median = (relevant[0] + relevant[1])/2;
  3. System.out.println("The median is " + median);
  4.  
Determine the median. Done.

Let me think, it should have a worst case runtime of something like
(n-1)*(n/2) + (n/2) + (n/2 - 1) + ... + 1
Not ideal I guess, but better than sorting the whole thing. ^^

Greetings,
Nepomuk
Apr 23 '08 #6

P: 9
What I was thinking is this:
You create a Collection (let's say an array, just for the sake of easiness) with as many elements as the furthest median value is away from the first number. (As calculated in the previous part of the algorithm.) Let's say, that first number is 58. So:
Expand|Select|Wrap|Line Numbers
  1. int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};
  2. double difference = -2.5; // calculated above
  3. int[] helpArray = new int[(int)(1 + abs(difference))]; // (int)(1 + 2.5) = (int) 3.5 = 3
  4.  
So, say the relevant numbers for the median are those 2 and 3 numbers smaller than 58, then the array has the size of 3.
Expand|Select|Wrap|Line Numbers
  1. // This is NOT supposed to be valid Java-code!
  2. helpArray.lenth() == 3
Now, go through the list of numbers and add the first 3 values, that are smaller than 58. Sort them, smallest first. (This is probably the part of the algorithm, that takes longest.)
Expand|Select|Wrap|Line Numbers
  1. // Neither is this or most of the other following code
  2. helpArray == {41, 6, 40};
  3. sort(helpArray);
  4. helpArray == {6, 40, 41};
  5.  
Now, go through the rest of the list and when you find a value that's smaller than 58 and larger than the smallest in the list, replace that smallest number and sort the list again (simple bubble sort like algorithm).
Expand|Select|Wrap|Line Numbers
  1. => helpArray == {12, 40, 41}
  2. => helpArray == {57, 40, 41}
  3. => helpArray == {40, 41, 57}
  4.  
Continue until the end of the list. The smallest number in your list is now the 3rd smaller number from 58, the second smallest is the 2nd smaller than 58.
Expand|Select|Wrap|Line Numbers
  1. int[] relevant = {40, 41};
  2. int median = (relevant[0] + relevant[1])/2;
  3. System.out.println("The median is " + median);
  4.  
Determine the median. Done.

Let me think, it should have a worst case runtime of something like
(n-1)*(n/2) + (n/2) + (n/2 - 1) + ... + 1
Not ideal I guess, but better than sorting the whole thing. ^^

Greetings,
Nepomuk

thanks nepomuk,
your explanation is really helpful, and i really understand how to implement the method a lot better now.
theres just one thing about your explanation that i dont understand-
your 'helparray' section.
i was just hoping you could explain this part a little bit more in detail.
for example, why is this array only 3 items deep? wouldnt it be easier to just use 2 values, allowing you to calculate the final median without transferring it to the 'relevant' array?

another thing about the array, it seems like using the bubble sort like method would waste a lot of processing time? is there some other way that may be quicker?
May 1 '08 #7

Nepomuk
Expert 2.5K+
P: 3,112
thanks nepomuk,
your explanation is really helpful, and i really understand how to implement the method a lot better now.
theres just one thing about your explanation that i dont understand-
your 'helparray' section.
i was just hoping you could explain this part a little bit more in detail.
for example, why is this array only 3 items deep? wouldnt it be easier to just use 2 values, allowing you to calculate the final median without transferring it to the 'relevant' array?

another thing about the array, it seems like using the bubble sort like method would waste a lot of processing time? is there some other way that may be quicker?
OK, let's see...
The helparray is 3 items deep in this case, because we only have to save values up to a distance of 3 from the first value (in my example: 58). BUT we have to have the exact distance, so if we had only 2 values, how would you decide, when to save new values to those variables?
Example:
Expand|Select|Wrap|Line Numbers
  1. A = {58, 51, 53, 56, 57, ..., 41, 40, ...}
and the values you want are 40 and 41. You start by saving 51, 53 and 56. Then you find 57. OK, that's nearer to the 58.
If you had only stored two ints, you'd have 51 and 53, then possibly 53 and 56, then maybe 55 and 57... how would you tell, that the difference to 58 is exactly 3? (Or whatever your distance should be?)
Concerning the bubble sort, for the first sorting it might not be the best way, but after that you always enter exactly one new value, so bubble sort should be quite efficient - but you can have it more efficient, no question about it. Right now the best algorithm I can think of would take O(log n) while bubble sort takes O(n), but you may be able to find a even better way.

The O(log n) method is this:
  1. You have an array A and a new value V. Check:
    Expand|Select|Wrap|Line Numbers
    1. V <= A[A.length / 2]
  2. If true, check the subarray
    Expand|Select|Wrap|Line Numbers
    1. int[] A2 = {A[0] ... A[A.length / 2 + 1]
    else the subarray
    Expand|Select|Wrap|Line Numbers
    1. int[] A2 = {A[A.length / 2] ... A[A.length]
  3. When
    Expand|Select|Wrap|Line Numbers
    1. Ax.length == 0
    is fulfilled, you have the place to enter your new value. Enter it and move everything smaller one place down. Forget the old value A[0].
(I'm sure it has a name, but I can't think of it right now.)

By the way, there may be slight errors in that, I'm a bit tired at the moment, but all in all it should be about right.

Greetings,
Nepomuk
May 2 '08 #8

Post your reply

Sign in to post your reply or Sign up for a free account.