I'm currently studying statistics for my EMBA at USC, and the best way for me to learn is to both write down my notes and include some code.

# Mean

Mean - The sum of all data points divided by the total number of observations.

## Ruby

``````ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Provide the average
def mean(array)
  array.sum / array.size.to_f
end

puts %Q{ Mean Weight: #{mean(weight)}, Mean Height: #{mean(height)} }
``````

## R

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)

weight_mean <- mean(weight)
height_mean <- mean(height)

sprintf("Mean Weight: %1.4f, Mean Height: %1.4f", weight_mean, height_mean)
``````

## Javascript

``````javascript
const util = require("util");
const math = require("mathjs");

let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

const weightMean = math.mean(weight).toFixed(2);
const heightMean = math.mean(height).toFixed(2);

console.log(util.format("Mean Weight: %s, Mean Height: %s", weightMean, heightMean));
``````

`Ruby` is such an elegant language that showing your work is fun, but I love how `R` has a native `mean()` function. In the Javascript example, I'm splitting the difference with a little help from the MathJS package.

# Median

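Median is the midpoint of the data once it is sorted. Suppose you have 25 observations. The midpoint would be the middle observation, or row 13. With an even number of observations, the median is the mean of the two middle rows.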
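The row arithmetic can be sketched in a few lines of Ruby. The helper name `median_positions` is made up for these notes; it returns the 1-based row (or rows) that hold the median once the data is sorted. For an even count there is no single middle row, so the sketch returns the two rows on either side of the midpoint.

```ruby
# Hypothetical helper: which 1-based row(s) hold the median of n sorted observations?
def median_positions(n)
  return [] if n <= 0

  # Odd n: one middle row. Even n: the two rows around the midpoint.
  n.odd? ? [(n + 1) / 2] : [n / 2, n / 2 + 1]
end

puts "25 observations -> row(s) #{median_positions(25).join(' and ')}"
puts "24 observations -> row(s) #{median_positions(24).join(' and ')}"
```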

When you are given a mean or median number, consider it the beginning of an adventure.

## Ruby

``````ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Provide the average (same helper as in the Mean section)
def mean(array)
  array.sum / array.size.to_f
end

# If the array has an odd number of items, simply pick the one in the middle.
# If the array size is even, calculate the mean of the two middle items.
def median(array)
  return nil if array.empty?

  sorted = array.sort
  m_pos = sorted.size / 2
  sorted.size.odd? ? sorted[m_pos] : mean(sorted[m_pos - 1..m_pos])
end

puts %Q{ Median Weight: #{median(weight)}, Median Height: #{median(height)} }
``````

## R

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)

weight_median <- median(weight)
height_median <- median(height)

sprintf("Median Weight: %s, Median Height: %s", weight_median, height_median)
``````

## Javascript

``````javascript
const util = require("util");
const math = require("mathjs");

let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

const weightMedian = math.median(weight).toFixed(2);
const heightMedian = math.median(height).toFixed(2);

console.log(util.format("Median Weight: %s, Median Height: %s", weightMedian, heightMedian));
``````

# The Mode

The mode is the data point that is most prevalent in the data set. It represents the most likely outcome in a dataset.

## Ruby

``````ruby
weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# The mode is the most popular item in the array.
# Returns every item tied for most popular, or only the first one when
# find_all is false. Returns nil when no item appears more than once.
def modes(array, find_all = true)
  histogram = array.tally # item => number of occurrences (Ruby 2.7+)
  max_count = histogram.values.max
  return nil if max_count.nil? || max_count < 2

  most_common = histogram.select { |_item, times| times == max_count }.keys
  find_all ? most_common : most_common.first(1)
end

puts %Q{ Mode Weight: #{modes(weight)}, Mode Height: #{modes(height)} }
``````

## R

``````r
weight <- c(115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)

get_mode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}

height_mode <- get_mode(height)
weight_mode <- get_mode(weight)

sprintf("Mode Weight: %s, Mode Height: %s", weight_mode, height_mode)
``````

## Javascript

``````javascript
const util = require("util");
const math = require("mathjs");

let weight = [115, 115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

// math.mode returns an array, because a data set can have more than one mode
const weightMode = math.mode(weight);
const heightMode = math.mode(height);

console.log(util.format("Mode Weight: %s, Mode Height: %s", weightMode, heightMode));
``````

# Standard Deviation

Standard deviation is the square root of the average squared distance from the mean. Said differently, it's a number that measures how close your data set, as a whole, is to the mean.

This number will help you get a better feel for the distribution of your data points.

## Ruby

``````ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Provide the average
def mean(array)
  array.sum / array.size.to_f
end

# Sample standard deviation: divides by n - 1, like R's sd()
def standard_deviation(array)
  m = mean(array)
  sum_of_squares = array.sum { |x| (x - m)**2 }
  sd = Math.sqrt(sum_of_squares / (array.size - 1))

  # Round floating point to 4 decimals
  format("%0.4f", sd)
end

puts %Q{ Weight SD: #{standard_deviation(weight)}, Height SD: #{standard_deviation(height)} }
``````

## R

R's `sd()` function computes the sample standard deviation (dividing by `n - 1`), not the population standard deviation (dividing by `n`).

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)

weight_sd <- sd(weight)
height_sd <- sd(height)

sprintf("Weight SD: %1.4f, Height SD: %1.4f", weight_sd, height_sd)
``````

## Javascript

``````javascript
const util = require("util");
const math = require("mathjs");

let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72];

// math.std defaults to the sample ("unbiased") standard deviation, like R's sd()
const weightSD = math.std(weight).toFixed(4);
const heightSD = math.std(height).toFixed(4);

console.log(util.format("Weight SD: %s, Height SD: %s", weightSD, heightSD));
``````
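The note in the R section about sample versus population standard deviation is easier to see side by side. Here is a minimal Ruby sketch; the method names `sample_sd` and `population_sd` are my own labels for these notes, not library calls. The only difference between the two is the divisor.

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

# Sum of squared distances from the mean, shared by both versions.
def sum_of_squares(array)
  m = array.sum / array.size.to_f
  array.sum { |x| (x - m)**2 }
end

# Sample standard deviation: divide by n - 1 (what R's sd() does).
def sample_sd(array)
  Math.sqrt(sum_of_squares(array) / (array.size - 1))
end

# Population standard deviation: divide by n.
def population_sd(array)
  Math.sqrt(sum_of_squares(array) / array.size)
end

puts format("Sample SD: %.4f, Population SD: %.4f", sample_sd(weight), population_sd(weight))
```

The sample version always comes out a little larger, and the gap shrinks as the number of observations grows.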

# Z Scores

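Z-scores are simple arithmetic transformations of the actual measurements: subtract the mean, then divide by the standard deviation. The result tells you how many standard deviations a measurement lies from the mean.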
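Written out as a formula, with notation chosen for these notes ($x$ is the measurement, $\bar{x}$ the sample mean, and $s$ the sample standard deviation):

```latex
z = \frac{x - \bar{x}}{s}
```

A z-score of 0 means the value sits exactly at the mean, and a z-score of -2 means it is two standard deviations below the mean.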

## R

In `R`, you can calculate the z-score longhand or with the `scale()` function.

### Longhand

This uses the z-score formula directly.

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
x <- 50

zWeight <- (x - mean(weight) ) / sd(weight)
zHeight <- (x - mean(height) ) / sd(height)
sprintf("Weight Z: %1.2f. Height Z: %1.2f", zWeight, zHeight)
``````

### Using `scale()`

This uses R's `scale()` function.

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
x <- 50

zWeight <- scale(x, center = mean(weight), scale = sd(weight))
zHeight <- scale(x, center = mean(height), scale = sd(height))
sprintf("Weight Z: %1.2f. Height Z: %1.2f", zWeight, zHeight)
``````

## Javascript

``````javascript
const util = require("util");
const math = require("mathjs");

let weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164];
let height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72];

// How many standard deviations a data point lies from the mean.
// This helps you judge whether a specific data point is an outlier.
// With the default n = 1 this is a plain z-score; for n > 1 the standard
// deviation is divided by sqrt(n), which scores a sample mean instead.
function zScore(datapoint, mean, std, n = 1) {
  const score = (datapoint - mean) / (std / Math.sqrt(n));
  return score.toFixed(4);
}

const x = 50;
const zWeight = zScore(x, math.mean(weight), math.std(weight));
const zHeight = zScore(x, math.mean(height), math.std(height));

console.log(util.format("Weight Z: %s, Height Z: %s", zWeight, zHeight));
``````
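For completeness, here is the same longhand calculation as a Ruby sketch. The `z_score` helper is a name I made up for these notes, and it assumes the sample standard deviation (dividing by `n - 1`) so that it lines up with R's `sd()`.

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Provide the average
def mean(array)
  array.sum / array.size.to_f
end

# Sample standard deviation (n - 1), matching R's sd().
def standard_deviation(array)
  m = mean(array)
  Math.sqrt(array.sum { |v| (v - m)**2 } / (array.size - 1))
end

# Hypothetical helper: number of standard deviations that x lies from the mean.
def z_score(x, array)
  (x - mean(array)) / standard_deviation(array)
end

x = 50
puts format("Weight Z: %.2f, Height Z: %.2f", z_score(x, weight), z_score(x, height))
```

Both scores come out negative, because 50 is below the mean of either data set.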

# Correlation

This little function in R is convenient. Sometimes you might want to ask yourself, "Are these two variables correlated?" In R, `cor()` returns the correlation coefficient (Pearson's by default, often written `r` or `ρ`), a number between -1 and 1.

## R

``````r
weight <- c(115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164)
height <- c(58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)

# Pearson correlation coefficient, between -1 and 1
r <- cor(weight, height)

sprintf("Correlation coefficient: %f", r)
``````
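To show the work behind `cor()`, here is a longhand Ruby sketch of the Pearson correlation coefficient. The method name `pearson_correlation` is my own; the calculation divides the sum of cross products by the product of the two spreads, which is the textbook definition rather than anything taken from R's source.

```ruby
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]
height = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]

# Hypothetical helper: Pearson correlation coefficient of two equal-length arrays.
def pearson_correlation(xs, ys)
  raise ArgumentError, "arrays must be the same size" unless xs.size == ys.size

  mean_x = xs.sum / xs.size.to_f
  mean_y = ys.sum / ys.size.to_f

  # How the two variables move together around their means.
  cross_products = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }

  # How much each variable moves on its own.
  spread_x = Math.sqrt(xs.sum { |x| (x - mean_x)**2 })
  spread_y = Math.sqrt(ys.sum { |y| (y - mean_y)**2 })

  cross_products / (spread_x * spread_y)
end

puts format("Correlation coefficient: %f", pearson_correlation(weight, height))
```

A result near 1 means the two variables rise together, a result near -1 means one falls as the other rises, and a result near 0 means there is no linear relationship.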