2012-11-14

Find median or any statistic parameter in loop by array key


Suppose you have multiple lists with the same length and key:

key----value1----value2
1          35           33
2          40           20
3          23           23
..
12        60           70


and you would like to calculate medians by the key -
result should look like:

key----median
1          34
2          30
3          23
..
12        65
Why?
If you have raw tabular data, of course, this can be done fast in any spreadsheet. but if your results are intermediate results generated in large script, then it would be good if they have a structure and in such case they can be put together and passed to function.

What was before?
Previously in my script I invented superduper basic simple data structure for my data - long term monthly mean pairs. The month number is a key and it has its corresponding calculated value. It is kinda pseudo 2D array, where each element is a small array containing respective key (month number) and value, altogether 12 elements.

Why again?
So, what happens if you run such long term monthly mean function in loop (each time changing something in previous calculations to get different long term monthly mean values, of course, otherwise there would have no meaning of it..)? You get multiple arrays with similar structure.
..
What can be done with such arrays? Well.. different things, of course, any statistics.. in my case, 50th percentile, called median (or vice versa). The next thing is practical - how to pass all these arrays to the specific function? Wrap them together.

How?
What is wrapping here? Superduper basic simple thing, actually - I created an "outer" array, I call it wrapper array, which I am appending with "long term monthly mean" pseudo 2D array and such an "outer array"
 is what I passed to the function.

What function does? Function takes apart such an "outer array" and reconfigures it by keys allowing to make a median calculation over the created reconfigured lists. Reconfigured array consists of number of keys of lists - list count is equal to key count - and each list consists of all values corresponding to the key. Then statistics can be done over each list and the result can passed to the function return (I prefer to return array with the similar structure like "long term monthly mean" structure - with pairs consisting of key and corresponding statistical value).

The full sample script can be seen here.

Nav komentāru:

Ierakstīt komentāru