P: 9

hi everyone!
i was wondering if anyone had any ideas on how to go about coding a method that would determine the median of a list of numbers
just to clarify a median is the 'middle value' when all of the numbers are placed in increasing order;
for example the median of
3 6 12 40 41 57 58 90
is 40.5
i was thinking about using a method that sorts the list in increasing order, then determining if there is an even or odd number of items, (if the number is even, there will be 2 middle values that have to be averaged) then 'cutting off' the first and last numbers until we reach the median value
i was wondering if anyone had any suggestions that may help with this endeavor, or even any guidance that may help me program the method!
im just not certain that this may be the easiest way, and wanted to get some feedback on what other people think..
thanks to everyone!
 
Share this Question
Expert 2.5K+
P: 3,112

hi everyone!
i was wondering if anyone had any ideas on how to go about coding a method that would determine the median of a list of numbers
just to clarify a median is the 'middle value' when all of the numbers are placed in increasing order;
for example the median of
3 6 12 40 41 57 58 90
is 40.5
i was thinking about using a method that sorts the list in increasing order, then determining if there is an even or odd number of items, (if the number is even, there will be 2 middle values that have to be averaged) then 'cutting off' the first and last numbers until we reach the median value
i was wondering if anyone had any suggestions that may help with this endeavor, or even any guidance that may help me program the method!
im just not certain that this may be the easiest way, and wanted to get some feedback on what other people think..
thanks to everyone!
Hi tomshorts07!
Just to clarify, if the list were 3 6 12 40 42 57 58 90 the median would be 41, correct?
The method you suggested is possible, however you don't have to sort the values. As sorting a collection can take quite some time, I would probably solve your problem something like this:
Say, you have an unsorted array of numbers like this:  int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};
then you have a look at, say, the first element: 58.
Now, go through the array and compare each number to that 58. (Constant time!) You'll find: there is 1 number bigger than 58 in the array.
As the array has the length 8, you know: there are (8  1  bigger) = 6 numbers smaller than 58 in the array.  int smaller = numbers.length()  1  bigger // 58 itself is neither bigger nor smaller than 58
Now,  (bigger  smaller) / 2 = (1  6) / 2 = 5 / 2 = 2.5
(divide by two, as you want to find the number in the middle) tells you, that there are two numbers relevant for the median: the second and the third smaller number than 58. Find those numbers, add them up and divide by two  there's your median.
Depending on the size of your array, that may save a lot of time, if you have a good way of finding the second and third (or whatever your values are) smaller number.
Hint: here a sorted collection like a TreeMap or a TreeSet or maybe the Collection.sort() method might be a good idea. (Think about how many elements you need...)
Greetings,
Nepomuk
  Expert 2.5K+
P: 3,112
 http://en.wikipedia.org/wiki/Selection_algorithm
I guess, you mean the "Median of Medians algorithm"? It looks like it runs a bit faster than mine, but does it actually do the right thing? The description is a bit strange...
If I understand it correctly, it does this:  We have a Collection {13, 7, 29, 2, 97, 61, 5, 17, 31, 101}
 This Collection is split into two Collections {13, 7, 29, 2, 97} and {61, 5, 17, 31, 101}
 Then it calculates the medians of each group: 13 and 17
 The median of these numbers is calculated: 15
 The median of medians splits the original Collection somewhere between 30% / 70% and 70% / 30%: 2, 5, 7, 13 < 15 < 17, 29, 31, 61, 97, 101 => 40% / 60% (but so far, it doesn't know that, does it?)
 And now?
From there on, I'm a little confused. My algorithm find's out, how many numbers are greater and how many are smaller faster, but of course it's not necessarily so near to the actual median.
Please explain, what happens now, to actually find the median?
Greetings,
Nepomuk
  Expert 100+
P: 1,216

I don't understand your algorithm since your explanation ends with quite the hand wave.
One simple median algorithm is to write essentially a modified quick sort:
1. Partition the data.
2. Instead of recursing on both partitions, select the one which would contain the median and recurse on only that one.
This algorithm has an expected time of O(N).
  Expert 2.5K+
P: 3,112

I don't understand your algorithm since your explanation ends with quite the hand wave.
What I was thinking is this:
You create a Collection (let's say an array, just for the sake of easiness) with as many elements as the furthest median value is away from the first number. (As calculated in the previous part of the algorithm.) Let's say, that first number is 58. So: 
int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};

double difference = 2.5; // calculated above

int[] helpArray = new int[(int)(1 + abs(difference))]; // (int)(1 + 2.5) = (int) 3.5 = 3

So, say the relevant numbers for the median are those 2 and 3 numbers smaller than 58, then the array has the size of 3.  // This is NOT supposed to be valid Javacode!

helpArray.lenth() == 3
Now, go through the list of numbers and add the first 3 values, that are smaller than 58. Sort them, smallest first. (This is probably the part of the algorithm, that takes longest.)  // Neither is this or most of the other following code

helpArray == {41, 6, 40};

sort(helpArray);

helpArray == {6, 40, 41};

Now, go through the rest of the list and when you find a value that's smaller than 58 and larger than the smallest in the list, replace that smallest number and sort the list again (simple bubble sort like algorithm). 
=> helpArray == {12, 40, 41}

=> helpArray == {57, 40, 41}

=> helpArray == {40, 41, 57}

Continue until the end of the list. The smallest number in your list is now the 3rd smaller number from 58, the second smallest is the 2nd smaller than 58. 
int[] relevant = {40, 41};

int median = (relevant[0] + relevant[1])/2;

System.out.println("The median is " + median);

Determine the median. Done.
Let me think, it should have a worst case runtime of something like (n1)*(n/2) + (n/2) + (n/2  1) + ... + 1 Not ideal I guess, but better than sorting the whole thing. ^^
Greetings,
Nepomuk
 
P: 9

What I was thinking is this:
You create a Collection (let's say an array, just for the sake of easiness) with as many elements as the furthest median value is away from the first number. (As calculated in the previous part of the algorithm.) Let's say, that first number is 58. So: 
int[] numbers = {58, 41, 6, 40, 12, 90, 57, 3};

double difference = 2.5; // calculated above

int[] helpArray = new int[(int)(1 + abs(difference))]; // (int)(1 + 2.5) = (int) 3.5 = 3

So, say the relevant numbers for the median are those 2 and 3 numbers smaller than 58, then the array has the size of 3.  // This is NOT supposed to be valid Javacode!

helpArray.lenth() == 3
Now, go through the list of numbers and add the first 3 values, that are smaller than 58. Sort them, smallest first. (This is probably the part of the algorithm, that takes longest.)  // Neither is this or most of the other following code

helpArray == {41, 6, 40};

sort(helpArray);

helpArray == {6, 40, 41};

Now, go through the rest of the list and when you find a value that's smaller than 58 and larger than the smallest in the list, replace that smallest number and sort the list again (simple bubble sort like algorithm). 
=> helpArray == {12, 40, 41}

=> helpArray == {57, 40, 41}

=> helpArray == {40, 41, 57}

Continue until the end of the list. The smallest number in your list is now the 3rd smaller number from 58, the second smallest is the 2nd smaller than 58. 
int[] relevant = {40, 41};

int median = (relevant[0] + relevant[1])/2;

System.out.println("The median is " + median);

Determine the median. Done.
Let me think, it should have a worst case runtime of something like(n1)*(n/2) + (n/2) + (n/2  1) + ... + 1 Not ideal I guess, but better than sorting the whole thing. ^^
Greetings,
Nepomuk
thanks nepomuk,
your explanation is really helpful, and i really understand how to implement the method a lot better now.
theres just one thing about your explanation that i dont understand
your 'helparray' section.
i was just hoping you could explain this part a little bit more in detail.
for example, why is this array only 3 items deep? wouldnt it be easier to just use 2 values, allowing you to calculate the final median without transferring it to the 'relevant' array?
another thing about the array, it seems like using the bubble sort like method would waste a lot of processing time? is there some other way that may be quicker?
  Expert 2.5K+
P: 3,112

thanks nepomuk,
your explanation is really helpful, and i really understand how to implement the method a lot better now.
theres just one thing about your explanation that i dont understand
your 'helparray' section.
i was just hoping you could explain this part a little bit more in detail.
for example, why is this array only 3 items deep? wouldnt it be easier to just use 2 values, allowing you to calculate the final median without transferring it to the 'relevant' array?
another thing about the array, it seems like using the bubble sort like method would waste a lot of processing time? is there some other way that may be quicker?
OK, let's see...
The helparray is 3 items deep in this case, because we only have to save values up to a distance of 3 from the first value (in my example: 58). BUT we have to have the exact distance, so if we had only 2 values, how would you decide, when to save new values to those variables? Example:  A = {58, 51, 53, 56, 57, ..., 41, 40, ...}
and the values you want are 40 and 41. You start by saving 51, 53 and 56. Then you find 57. OK, that's nearer to the 58.
If you had only stored two ints, you'd have 51 and 53, then possibly 53 and 56, then maybe 55 and 57... how would you tell, that the difference to 58 is exactly 3? (Or whatever your distance should be?) Concerning the bubble sort, for the first sorting it might not be the best way, but after that you always enter exactly one new value, so bubble sort should be quite efficient  but you can have it more efficient, no question about it. Right now the best algorithm I can think of would take O(log n) while bubble sort takes O(n), but you may be able to find a even better way.
The O(log n) method is this:  You have an array A and a new value V. Check:
 If true, check the subarray
 int[] A2 = {A[0] ... A[A.length / 2 + 1]
else the subarray  int[] A2 = {A[A.length / 2] ... A[A.length]
 When
is fulfilled, you have the place to enter your new value. Enter it and move everything smaller one place down. Forget the old value A[0].
(I'm sure it has a name, but I can't think of it right now.)
By the way, there may be slight errors in that, I'm a bit tired at the moment, but all in all it should be about right.
Greetings,
Nepomuk
    Question stats  viewed: 3067
 replies: 7
 date asked: Apr 22 '08
