Welcome to the Numplexis standard deviation guide. The following contents will present how to conduct a basic statistic assessment of a data set by calculating it's standard deviation. For background the guide will detail the meaning of the calculation and how it can be used to predict future values of the data set under study. For simplicity the guide will assume you have foundational knowledge on how to use excel including opening a blank workbook, clicking and editing cells, and accessing the formula bar. If you are unfamiliar with these actions it is recommended to learn the basics at the Goodwill Community Foundation's Excel 2016 Basics videos.
The text box below contains a comma delimited population sample set this guild will use to demonstrate how to calculate it's standard deviation and corresponding normal distribution. Please edit the text box as you wish to observe how it effects the output results below.Figure 1. Data set histogram with normal distribution overlay Table 1. Statistics Summary Table
|2||Adrstart||=ADDRESS( ROW(G2), COLUMN(G2))||text||Cell address of the first value in the input data set.|
|3||Adrend||=ADDRESS( ROW(G7),COLUMN(G7))||text||Cell address of the last value in the input data set.|
|4||µ||=AVERAGE( INDIRECT( B2&":"&B3))||number||Arithmetic mean of the input data set.|
|5||σ||=STDEV.S( INDIRECT( B2&":"&B3))||number||Total population standard deviation.|
Adjust the input boxes below to observe how changing the mean and standard deviation effects the position and shape of two normal distributions.
Let's cover some terms we use to create the statistics calculator.
|Mean||Also known as the arithmetic mean or average, the mean is the sum of a collection of values divided by the number of the values in the collection.|
|Standard Deviation||The standard deviation is a measure that is used to quantify the extent of variation in a collection or population of data values. A low standard deviation indicates the data points tend to be close in value to the population mean.|
|Probability Distribution||A mathematical function that links each outcome of a statistical experiment with it's probability of occurrence. You can think of a probability distribution as a tool for predicting future events given past observations. Probability distributions come in many different shapes and can be more or less suited to predicting a specific type of phenomenon partly based on how well it can conform to a given set of past observations.|
|Normal Distribution||Also known as the Gaussian distribution, the normal distribution is a widely used continuous probability distribution that resembles that of a bell curve centered on the population mean it aims to statistically represent. It consists of a family of curves whereby an individual curve is defined by two parameters. The first parameter is the population's mean value and the second is the population's standard deviation.|
Listed below are the formulas this guide will use to calculate the mean, standard deviation, and normal distribution for a generalized data set or population.
|x||An individual value or outcome of the data set under study.|
|n||Number of samples of the data set under study .|
|µ||The arithmetic mean of the data set under study.|
|σ||The standard deviation of the data set under study.|
|e||Called Euler's number, it is a mathematical constant approximately equal to 2.71828.|
|fnorm||The normal distribution probability density function.|
All you will need to get started is an installation of Microsoft Excel. For reference Microsoft Excel 2016 was used for this guide, but any recent version of Excel will do.
If you are creating a spreadsheet to solve a problem, you may wish to share it with coworker or need it years later. To address these possibilities it is best to add descriptions and figures to the sheet so anyone using it will understand what the spreadsheet is accomplishing and how. The below steps follow documentation best practices that you may find beneficial with any spreadsheet application.
Follow the below instructions to create the portion of the excel sheet that solves for the data set standard deviation and best fit normal distribution.Table 4. Data Set value table, copy below and paste to cell F1
|Data Value Index||Data Set Values|
|Adrstart||=ADDRESS( ROW(G2), COLUMN(G2))||text||Cell address of the first value in the input data set.|
|Adrend||=ADDRESS( ROW(G7), COLUMN(G7))||text||Cell address of the last value in the input data set.|
|µ||=AVERAGE( INDIRECT( B2&":"&B3))||number||Arithmetic mean of the input data set.|
|σ||=STDEV.S( INDIRECT( B2&":"&B3))||number||Sample standard deviation.|
To create the above figure within excel download this sample spreadsheet and paste your Tables 4 and 5 in cells F1 and A1 respectively. Be sure to adjust the axes of the bar and scatter plot to ensure the resulting histogram and normal distribution are correctly positioned relative to one another.