# Statistics: Calculating Mean, Median, Mode, Standard Deviation using Ruby, R, or Javascript

I'm currently learning statistics for my EMBA at USC and the best way for me to learn is to both write down my notes and include some code.

# Mean

The mean is the sum of all data points divided by the total number of observations.

## Ruby

```
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
# Provide the average
def mean(array)
  # Add up every value, then divide by the count (as a float to avoid integer division)
  array.inject(0) { |sum, x| sum + x } / array.size.to_f
end
puts %Q{ Mean Weight: #{mean(weight)}, Mean Height: #{mean(height)} }
```

## R

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight_mean <- mean(weight)
height_mean <- mean(height)
sprintf("Mean Weight: %1.4f, Mean Height: %1.4f", weight_mean, height_mean)
```

## Javascript

```
const util = require("util");
const math = require("mathjs");
let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
var weightMean = math.mean(weight).toFixed(2);
var heightMean = math.mean(height).toFixed(2);
console.log( util.format("Mean Weight %s, Mean Height: %s", weightMean, heightMean) );
```

`Ruby` is such an elegant language that showing your work is fun, but I love how `R` just has a native function for `mean()`. In the Javascript example, I'm splitting the difference with a little help from the MathJS package.

# Median

The median is the midpoint of the data once it is sorted. Suppose you have 25 observations. The midpoint would be the middle observation, or row 13. With an even number of observations, the median is the mean of the two middle values.

When you are given a mean or median, do not look at those numbers as the end of the story; look at them as the beginning of an adventure.

## Ruby

```
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
# mean is the same helper as in the Mean section; median needs it for the even case.
def mean(array)
  array.inject(0) { |sum, x| sum + x } / array.size.to_f
end
# If the array size is odd, simply pick the value in the middle.
# If the array size is even, calculate the mean of the two middle values.
def median(array, already_sorted=false)
  return nil if array.empty?
  array = array.sort unless already_sorted
  m_pos = array.size / 2
  array.size % 2 == 1 ? array[m_pos] : mean(array[m_pos-1..m_pos])
end
puts %Q{ Median Weight: #{median(weight)}, Median Height: #{median(height)} }
```
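
Both arrays above hold 15 values, so only the odd-sized branch ever runs. Here is a minimal sketch of the even-sized case, using a made-up six-value sample (the `sample` array is my own illustration, not part of the data set):

```ruby
# Hypothetical even-sized sample, used only to exercise the even branch.
sample = [62, 58, 65, 60, 59, 63]

def median(values)
  return nil if values.empty?
  sorted = values.sort
  mid = sorted.size / 2
  # Odd count: the middle value. Even count: the average of the two middle values.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts "Median: #{median(sample)}"
```

Sorted, the sample is 58, 59, 60, 62, 63, 65, so the two middle values are 60 and 62 and the median lands halfway between them.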

## R

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight_median <- median(weight)
height_median <- median(height)
sprintf("Median Weight: %s, Median Height: %s", weight_median, height_median)
```

## Javascript

```
const util = require("util");
const math = require("mathjs");
let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
var weightMedian = math.median(weight).toFixed(2);
var heightMedian = math.median(height).toFixed(2);
console.log( util.format("Median Weight %s, Median Height: %s", weightMedian, heightMedian) );
```

# The Mode

The mode is the data point that is most prevalent in the data set. It represents the most likely outcome in a dataset.

## Ruby

```
weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [59, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
# The mode is the most popular item in the array.
# With find_all=true, every item tied for the top count is returned.
# Returns nil when no item appears more than once.
def modes(array, find_all=true)
  histogram = array.inject(Hash.new(0)) { |h, n| h[n] += 1; h }
  modes = nil
  histogram.each_pair do |item, times|
    modes << item if modes && times == modes[0] and find_all
    modes = [times, item] if (!modes && times>1) or (modes && times>modes[0])
  end
  return modes ? modes[1...modes.size] : modes
end
puts %Q{ Mode Weight: #{modes(weight)}, Mode Height: #{modes(height)} }
```
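
For comparison, here is a shorter sketch of the same idea using `Enumerable#tally`, which needs Ruby 2.7 or newer. The name `modes_with_tally` is my own, and returning an empty array when nothing repeats (rather than `nil`) is an assumption made for this example:

```ruby
weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

# Returns every value tied for the highest count, or [] when nothing repeats.
def modes_with_tally(values)
  counts = values.tally          # hash of value => number of occurrences
  top = counts.values.max        # nil when the input is empty
  return [] if top.nil? || top < 2
  counts.select { |_, count| count == top }.keys
end

puts "Mode Weight: #{modes_with_tally(weight)}"
```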

## R

```
weight <- c(115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
get_mode <- function(v) {
  # Count how often each distinct value occurs and return the most frequent one
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
height_mode <- get_mode(height)
weight_mode <- get_mode(weight)
sprintf("Mode Weight: %s, Mode Height: %s", weight_mode, height_mode)
```

## Javascript

```
const util = require("util");
const math = require("mathjs");
let weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
var weightMode = math.mode(weight);
var heightMode = math.mode(height);
console.log( util.format("Mode Weight %s, Mode Height: %s", weightMode, heightMode) );
```

# Standard Deviation

Standard deviation is the square root of the average squared distance from the mean. Said differently, it's a number that measures how close your data set, as a whole, is to the mean.

This number will help you get a better feel for the distribution of your data points.

## Ruby

```
weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [59, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
def mean(array)
  array.inject(0) { |sum, x| sum + x } / array.size.to_f
end
def standard_deviation(array)
  m = mean(array)
  # Sum of squared distances from the mean
  sum_of_squares = array.inject(0) { |sum, x| sum + (x - m) ** 2 }
  # Sample standard deviation: divide by n - 1
  sd = Math.sqrt(sum_of_squares / (array.size - 1))
  # Round floating point to 4 decimals
  format("%0.4f", sd)
end
puts %Q{ Weight SD: #{standard_deviation(weight)}, Height SD: #{standard_deviation(height)} }
```
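
The Ruby code above divides by `array.size - 1`, which gives the sample standard deviation. Here is a minimal sketch that makes the choice explicit. The `sample:` keyword argument is my own addition for illustration, and the arrays are the 15-value data set from the earlier sections:

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

def mean(values)
  values.sum / values.size.to_f
end

# sample: true divides by n - 1; sample: false divides by n (population).
def standard_deviation(values, sample: true)
  m = mean(values)
  sum_of_squares = values.sum { |x| (x - m)**2 }
  divisor = sample ? values.size - 1 : values.size
  Math.sqrt(sum_of_squares / divisor)
end

puts format("Weight sample SD: %.4f, population SD: %.4f",
            standard_deviation(weight), standard_deviation(weight, sample: false))
puts format("Height sample SD: %.4f, population SD: %.4f",
            standard_deviation(height), standard_deviation(height, sample: false))
```

The population figure is always a little smaller, because the same sum of squares is divided by a larger number.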

## R

*R's `sd()` function computes the sample standard deviation (dividing by n - 1), not the population standard deviation*.

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
weight_sd <- sd(weight)
height_sd <- sd(height)
sprintf("Weight SD: %1.4f, Height SD: %1.4f", weight_sd, height_sd)
```

## Javascript

```
const util = require("util");
const math = require("mathjs");
let weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
var weightSD = math.std(weight).toFixed(4);
var heightSD = math.std(height).toFixed(4);
console.log( util.format("Weight SD %s, Height SD: %s", weightSD, heightSD) );
```

# Z Scores

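Z-scores are simple arithmetic transformations of the actual measurements: subtract the mean, then divide by the standard deviation. The result is the number of standard deviations a measurement lies from the mean.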
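
## Ruby

Ruby has no built-in z-score helper that I know of, so here is a minimal longhand sketch. The helper names `sample_sd` and `z_score` are my own, and I'm assuming the sample standard deviation (dividing by n - 1):

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

def mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (divides by n - 1).
def sample_sd(values)
  m = mean(values)
  Math.sqrt(values.sum { |x| (x - m)**2 } / (values.size - 1))
end

# Number of sample standard deviations that x lies from the mean of values.
def z_score(x, values)
  (x - mean(values)) / sample_sd(values)
end

x = 50
puts format("Weight Z: %.2f. Height Z: %.2f", z_score(x, weight), z_score(x, height))
```

Both scores come out negative, since 50 is below the mean of both arrays.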

## R

In `R`, you can calculate the z-score using the `scale()` function.

### Longhand

This is using the z-score algebraic expression.

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
x <- 50
zWeight <- (x - mean(weight) ) / sd(weight)
zHeight <- (x - mean(height) ) / sd(height)
sprintf("Weight Z: %1.2f. Height Z: %1.2f", zWeight, zHeight)
```

This is using R's `scale()` function.

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
x <- 50
zWeight <- scale(x, center = mean(weight), scale = sd(weight))
zHeight <- scale(x, center = mean(height), scale = sd(height))
sprintf("Weight Z: %1.2f. Height Z: %1.2f", zWeight, zHeight)
```

## Javascript

```
const util = require("util");
const math = require("mathjs");
let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
//How many standard deviations our datapoints lie from the mean
//This will help you determine if a specific datapoint is an outlier
function zScore(datapoint, mean, std, n=1){
  // With the default n = 1 this is the plain z-score;
  // n > 1 divides by the standard error (std / sqrt(n)) instead.
  let score = (datapoint - mean) / (std / Math.sqrt(n) );
  // Number of standard deviations from the mean.
  return Number(score).toFixed(4);
}
var x = 50
var zWeight = zScore(x, math.mean(weight), math.std(weight));
var zHeight = zScore(x, math.mean(height), math.std(height));
console.log( util.format("Weight Z %s, Height Z: %s", zWeight, zHeight) );
```

# Correlation

This little function in R is very handy. Sometimes you might want to ask yourself, "Are these two variables correlated?" Using R, it's very easy to understand `p`, the correlation coefficient.

## R

```
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
# Correlation coefficient, between -1 and 1 (not a percentage)
weight_height_cor <- cor(weight, height)
sprintf("Correlation: %f", weight_height_cor)
```
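
## Ruby

Ruby's standard library has no equivalent of `cor()` as far as I know, so here is a longhand sketch of the Pearson correlation coefficient. The function name `pearson_correlation` is my own, and I'm assuming Pearson because it is the default method for R's `cor()`:

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Pearson correlation coefficient between two equally sized arrays.
def pearson_correlation(xs, ys)
  raise ArgumentError, "arrays must be the same size" unless xs.size == ys.size
  mean_x = xs.sum / xs.size.to_f
  mean_y = ys.sum / ys.size.to_f
  # Sum of the products of paired distances from each mean
  cross_products = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  # Square roots of each array's sum of squared distances from its mean
  spread_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  spread_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })
  cross_products / (spread_x * spread_y)
end

puts format("Correlation: %f", pearson_correlation(weight, height))
```

A value close to 1 means the two arrays rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.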