R: Samples and Populations
This article shows you how to either create a random population or import a dataset, then take a random sample using R.
What is a Sample?
A population has certain characteristics, called parameters. These could include the mean (average) of the population, the standard deviation of the population, or something else.
In certain situations, you might not know what those population parameters are, so we estimate one by taking a sample and studying it.
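To make the idea concrete, here is a minimal sketch that builds a simulated population, treats its mean as the "unknown" parameter, and estimates it from a sample. The population (10,000 normally distributed values with mean 50 and standard deviation 10) and the sample of 100 items are assumptions chosen only for illustration.

```r
# Hypothetical population: 10,000 values from a normal distribution.
set.seed(42)
population <- rnorm(10000, mean = 50, sd = 10)

# The parameter we pretend not to know.
population_mean <- mean(population)

# Estimate the parameter from a random sample of 100 values.
random_sample <- sample(population, 100)
sample_mean <- mean(random_sample)

print(population_mean)
print(sample_mean)
```

The two printed values should be close but are unlikely to be identical, which is why a sample statistic is called an estimate.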
Taking a Sample
When you take a sample, you need to know how many items to take. This is called the sample size (or sample count), and we will refer to it as n. From the sample, we can calculate a statistic that will be used to estimate a parameter.
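As a rough illustration of how n relates to the statistic, the sketch below computes the sample mean for a few sample sizes and places it next to the population mean. The population and the sizes 10, 100, and 1000 are assumptions made for the example. Larger samples tend to give closer estimates, though any single draw can still miss.

```r
# Hypothetical population of 10,000 uniform random values.
set.seed(1)
population <- runif(10000)

# Sample sizes (n) to compare.
sample_sizes <- c(10, 100, 1000)

# For each n, draw a sample and compute its mean (the statistic).
sample_means <- sapply(sample_sizes, function(n) mean(sample(population, n)))

# Put the estimates next to the parameter they estimate.
results <- data.frame(
  n = sample_sizes,
  sample_mean = sample_means,
  population_mean = mean(population)
)
print(results)
```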
Generate a Population, Take a Sample
Create a Random Population in R
We use the matrix() function to arrange random values from runif() into a matrix.
    # Set the seed of R's random number generator, which is useful for
    # creating simulations or random objects that can be reproduced.
    # Uncomment the next line to get the same values on every run.
    # set.seed(5)

    # Create a matrix object
    nCols <- 5
    nRows <- 3
    population <- matrix(runif(nCols * nRows), nrow = nRows, ncol = nCols)

    # Print the population
    print(population)
Here's how to show only the first row of a population.
    # Print the first row of the population
    first_row <- population[1, ]
    print(first_row)
Here's how to show the first column of a population.
    # Print the first column of the population
    first_col <- population[, 1]
    print(first_col)
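The same indexing can be combined with sample() to pull out a whole row at random. This is a small sketch under the same setup as above; the names row_index and random_row are hypothetical and used only for illustration.

```r
# Recreate a 3 x 5 population of uniform random values.
set.seed(5)
nCols <- 5
nRows <- 3
population <- matrix(runif(nCols * nRows), ncol = nCols)

# Pick one row index at random, then extract that row.
row_index <- sample(nrow(population), 1)
random_row <- population[row_index, ]

print(row_index)
print(random_row)
```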
Take a Sample from the Population
    # Create a random population
    nCols <- 5
    nRows <- 3
    population <- matrix(runif(nCols * nRows), nrow = nRows, ncol = nCols)

    # Print the population
    print(population)

    # Generate a random sample of n items from the population
    n <- 5
    random_sample <- sample(population, n)

    # Print each item of the random sample
    print(sprintf("Random sample item %d of %d is %1.7f", seq_along(random_sample), n, random_sample))
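By default, sample() draws without replacement, so each item in the population can be picked at most once. The sketch below contrasts that with replace = TRUE, which lets the same item be drawn repeatedly and allows a sample larger than the population. The sizes 5 and 20 are arbitrary choices for illustration.

```r
# A population of 15 uniform random values in a 3 x 5 matrix.
set.seed(5)
population <- matrix(runif(15), ncol = 5)

# Without replacement (the default): each item is drawn at most once.
without_replacement <- sample(population, 5)

# With replacement: items can repeat, so 20 draws from 15 items is allowed.
with_replacement <- sample(population, 20, replace = TRUE)

print(without_replacement)
print(with_replacement)

# TRUE when at least one item was drawn more than once.
print(anyDuplicated(with_replacement) > 0)
```

Asking for 20 items without replacement from this population would raise an error, because only 15 items exist.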
Import a Population, Take a Sample
In this example, we import CSV data from GitHub and take a random sample.
    # Set the seed so you can reproduce the same random results I do.
    set.seed(10)

    # Load CSV data
    df <- read.csv(
      'https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/women.csv',
      header = TRUE
    )

    # Get a random sample of rows from the data
    num_of_rows <- 10
    my_sample <- df[sample(nrow(df), num_of_rows), ]
    print(my_sample)
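Once you have a sample of rows, you can compute a statistic from it and compare it with the full dataset. The sketch below uses R's built-in women data frame (15 rows, columns height and weight) as an offline stand-in. Whether it matches the CSV above column for column is an assumption.

```r
# Built-in dataset used as a stand-in for the imported CSV (assumption).
set.seed(10)
df <- women

# Take a random sample of 10 rows.
num_of_rows <- 10
my_sample <- df[sample(nrow(df), num_of_rows), ]

# Column means of the sample (statistics) and of the full data (parameters).
sample_means <- colMeans(my_sample)
population_means <- colMeans(df)

print(rbind(sample = sample_means, population = population_means))
```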
- Sampling Distributions on Khan Academy.
- Generating Random Samples from Other Distributions