# Simple Statistics, a programming library for doing statistics.

Simple Statistics is a JavaScript library for descriptive statistics, regression, and classification. It can tell you basic things like the minimum and maximum, but it can also compute trickier things like standard deviation and sample correlation. More importantly, it's see-through: you can read the source, and it's written in a friendly, literate manner.


## Use Simple Statistics

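With node.js, install the package from npm:

```bash
npm install simple-statistics
```

Then require it:

```javascript
var ss = require('simple-statistics');
```

It can also be used in the browser. To get the source with git:

```bash
git clone git://github.com/tmcw/simple-statistics.git
```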

## API

Basic contracts of functions:

* Functions do not modify their arguments (for example, they do not change the order of an input array).
* Invalid input, such as an empty list passed to a function that needs at least one item, causes the function to return `null`.

## Basic Array Operations

### .mixin()

Optionally mixes the following functions into the `Array` prototype, so they can be called as methods on arrays. Otherwise, call them on the simple-statistics object itself.
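To illustrate what mixing into a prototype means, here is a minimal sketch of the pattern. `mixinInto` is a hypothetical helper written for this example, not the library's own `.mixin()` code, and it assumes each function takes the array as its first argument.

```javascript
// Hypothetical helper (not part of simple-statistics): attach each function in
// `fns` to `proto` as a method that passes `this` as the first argument.
function mixinInto(proto, fns) {
  for (const name of Object.keys(fns)) {
    const fn = fns[name];
    Object.defineProperty(proto, name, {
      value: function (...args) {
        return fn(this, ...args);
      },
      writable: true,
      configurable: true,
      // Non-enumerable, so `for...in` loops over arrays do not pick the method up.
      enumerable: false,
    });
  }
}

// Usage: after mixing in, the (hypothetical) `total` function is an array method.
mixinInto(Array.prototype, {
  total: (xs) => xs.reduce((acc, v) => acc + v, 0),
});
console.log([1, 2, 3].total()); // 6
```

Extending a built-in prototype affects every array in the program, which is one reason to keep it opt-in.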

### .mean(x)

Mean of a single-dimensional Array of numbers. Also available as `.average(x)`

### .sum(x)

Sum of a single-dimensional Array of numbers.
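A minimal sketch of the sum and the mean in plain JavaScript, not the library's source. Returning `0` for the sum of an empty array and `null` for its mean is this sketch's choice, following the contract for functions that need at least one item.

```javascript
// Sum of an array of numbers; 0 for an empty array.
function sum(x) {
  let total = 0;
  for (let i = 0; i < x.length; i++) total += x[i];
  return total;
}

// Arithmetic mean; null for an empty array, since it needs at least one item.
function mean(x) {
  if (x.length === 0) return null;
  return sum(x) / x.length;
}

console.log(mean([1, 2, 3, 4])); // 2.5
```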

### .variance(x)

Variance of a single-dimensional Array of numbers.

### .standard_deviation(x)

Standard Deviation of a single-dimensional Array of numbers.
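Because a separate `.sample_variance` is listed below, this sketch assumes `.variance` is the population variance: the mean squared deviation from the mean, dividing by n. The standard deviation is then its square root. The camelCase names are this sketch's own; it is an illustration, not the library's source.

```javascript
// Population variance: the average squared deviation from the mean.
function variance(x) {
  if (x.length === 0) return null;
  const m = x.reduce((acc, v) => acc + v, 0) / x.length;
  let sumSq = 0;
  for (const v of x) sumSq += (v - m) * (v - m);
  return sumSq / x.length;
}

// Standard deviation: square root of the variance, in the data's own units.
function standardDeviation(x) {
  const v = variance(x);
  return v === null ? null : Math.sqrt(v);
}

console.log(variance([2, 4, 4, 4, 5, 5, 7, 9]));          // 4
console.log(standardDeviation([2, 4, 4, 4, 5, 5, 7, 9])); // 2
```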

### .median_absolute_deviation(x)

The Median Absolute Deviation (MAD) is a robust measure of statistical dispersion. It is more resilient to outliers than the standard deviation. Accepts a single-dimensional array of numbers and returns a dispersion value.

Also aliased to `.mad(x)` for brevity.

### .median(x)

Median of a single-dimensional array of numbers.
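The MAD is defined in terms of the median, so both are sketched together here in plain JavaScript (not the library's source). The median sorts a copy, so the input array is left untouched, in line with the contract that functions do not modify their arguments. The MAD here is unscaled; some tools multiply it by about 1.4826 to estimate a normal standard deviation, which this sketch does not do.

```javascript
// Median: middle value of a sorted copy, or the mean of the two middle values.
function median(x) {
  if (x.length === 0) return null;
  const sorted = x.slice().sort((a, b) => a - b); // copy, so the input keeps its order
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median absolute deviation: median of each value's distance from the median.
function medianAbsoluteDeviation(x) {
  const m = median(x);
  if (m === null) return null;
  return median(x.map((v) => Math.abs(v - m)));
}

console.log(median([1, 1, 2, 2, 4, 6, 9]));                  // 2
console.log(medianAbsoluteDeviation([1, 1, 2, 2, 4, 6, 9])); // 1
```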

### .geometric_mean(x)

Geometric mean of a single-dimensional array of positive numbers.
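The geometric mean is the n-th root of the product of n positive numbers. This sketch (not the library's source) averages logarithms instead of multiplying, which avoids overflow on long arrays; rejecting non-positive values with `null` is this sketch's choice, since the documented input is positive numbers.

```javascript
// Geometric mean via logarithms: exp(mean(log(x))) equals the n-th root of the product.
function geometricMean(x) {
  if (x.length === 0) return null;
  let logSum = 0;
  for (const v of x) {
    if (v <= 0) return null; // outside the documented input of positive numbers
    logSum += Math.log(v);
  }
  return Math.exp(logSum / x.length);
}

console.log(geometricMean([2, 8])); // ≈ 4, up to floating-point rounding
```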

### .min(x)

Finds the minimum of a single-dimensional array of numbers. This runs in linear `O(n)` time.

### .max(x)

Finds the maximum of a single-dimensional array of numbers. This runs in linear `O(n)` time.
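Both functions need only a single scan of the array. This sketch, under the hypothetical name `minMax`, finds both extremes in one O(n) pass. It avoids `Math.min(...x)`, which spreads every element into function arguments and can throw on very large arrays.

```javascript
// One linear pass that tracks both the smallest and the largest value seen so far.
function minMax(x) {
  if (x.length === 0) return null;
  let min = x[0];
  let max = x[0];
  for (let i = 1; i < x.length; i++) {
    if (x[i] < min) min = x[i];
    if (x[i] > max) max = x[i];
  }
  return { min, max };
}

console.log(minMax([3, -1, 7, 2])); // { min: -1, max: 7 }
```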

### .t_test(sample, x)

Performs a one-sample Student's t-test on a dataset `sample`, given as a single-dimensional array of numbers. `x` is the known value to compare the sample mean against, and the result is the t statistic, a measure of statistical significance.
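The one-sample t statistic is `t = (mean - x) / (s / sqrt(n))`. This sketch uses the sample standard deviation `s` (dividing by n - 1), as in the textbook formula; the library's choice of deviation may differ, so treat that as an assumption. `tTest` is a hypothetical name.

```javascript
// One-sample t statistic: how many standard errors the sample mean lies from x.
function tTest(sample, x) {
  const n = sample.length;
  if (n < 2) return null; // the sample standard deviation needs at least two values
  const mean = sample.reduce((acc, v) => acc + v, 0) / n;
  let sumSq = 0;
  for (const v of sample) sumSq += (v - mean) * (v - mean);
  const sd = Math.sqrt(sumSq / (n - 1)); // sample standard deviation
  return (mean - x) / (sd / Math.sqrt(n));
}

console.log(tTest([2, 4, 6], 2)); // ≈ 1.732, i.e. √3
```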

### .t_test_two_sample(sample_x, sample_y, difference)

The two-sample t-test compares samples from two populations or groups, testing the null hypothesis that the populations are the same. `difference` is the hypothesized difference between the two population means; when it is omitted, it is taken to be zero. The function returns a t-value, which you can look up in a t distribution table to judge how confidently the null hypothesis can be rejected.

This implementation expects the samples `sample_x` and `sample_y` to be given as one-dimensional arrays of more than one number each.
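A sketch of the classic pooled-variance (Student's) two-sample t statistic, assuming `difference` is the hypothesized difference between the population means and defaults to zero. Pooling assumes the two populations have equal variances; Welch's t-test is the usual alternative when they do not. `tTestTwoSample` is a hypothetical name, and this is not the library's source.

```javascript
// Two-sample t statistic with a pooled variance.
function tTestTwoSample(sampleX, sampleY, difference = 0) {
  const n = sampleX.length;
  const m = sampleY.length;
  if (n < 2 || m < 2) return null; // each sample needs more than one number
  const mean = (a) => a.reduce((acc, v) => acc + v, 0) / a.length;
  const sampleVariance = (a, mu) =>
    a.reduce((acc, v) => acc + (v - mu) * (v - mu), 0) / (a.length - 1);
  const meanX = mean(sampleX);
  const meanY = mean(sampleY);
  // Pool the two sample variances, weighting each by its degrees of freedom.
  const pooled =
    ((n - 1) * sampleVariance(sampleX, meanX) + (m - 1) * sampleVariance(sampleY, meanY)) /
    (n + m - 2);
  return (meanX - meanY - difference) / Math.sqrt(pooled * (1 / n + 1 / m));
}

console.log(tTestTwoSample([1, 2, 3], [4, 5, 6])); // ≈ -3.674
```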

### .sample_variance(x)

Produces sample variance of a single-dimensional array of numbers.

### .sample_covariance(x, y)

Produces sample covariance of two single-dimensional arrays of numbers.

### .sample_correlation(x, y)

Produces sample correlation of two single-dimensional arrays of numbers.
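The three sample statistics above share one building block, sketched here in plain JavaScript (not the library's source). Each divides by n - 1 (Bessel's correction), and the sample variance falls out as the covariance of an array with itself. The camelCase names and returning `null` for mismatched lengths are this sketch's choices.

```javascript
// Sample covariance: sum of paired deviations from the means, divided by n - 1.
function sampleCovariance(x, y) {
  const n = x.length;
  if (n < 2 || y.length !== n) return null;
  const meanX = x.reduce((acc, v) => acc + v, 0) / n;
  const meanY = y.reduce((acc, v) => acc + v, 0) / n;
  let sum = 0;
  for (let i = 0; i < n; i++) sum += (x[i] - meanX) * (y[i] - meanY);
  return sum / (n - 1);
}

// Sample variance is the covariance of an array with itself.
function sampleVariance(x) {
  return sampleCovariance(x, x);
}

// Sample (Pearson) correlation: covariance divided by both sample standard
// deviations, giving a value between -1 and 1.
function sampleCorrelation(x, y) {
  const cov = sampleCovariance(x, y);
  if (cov === null) return null;
  return cov / Math.sqrt(sampleVariance(x) * sampleVariance(y));
}

console.log(sampleCovariance([1, 2, 3], [2, 4, 6]));  // 2
console.log(sampleCorrelation([1, 2, 3], [2, 4, 6])); // 1
```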

### .quantile(sample, p)

Finds the quantile of a dataset `sample` at `p`. For those familiar with the `k/q` syntax, `p == k/q`. `sample` must be a single-dimensional array of numbers. `p` must be a number between zero and one inclusive, or an array of such numbers; if an array is given, an array of results is returned instead of a single number.

### .quantile_sorted(sample, p)

Finds the quantile of a dataset `sample` at `p`. `sample` must be a single-dimensional array of numbers that is already sorted in ascending order, and `p` must be a single number from zero to one.
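Several conventions exist for quantiles that fall between two data points. This sketch uses linear interpolation between the closest ranks (R's type 7, also NumPy's default), which may not be the rule the library uses, so results can differ slightly for small samples. `quantile` and `quantileSorted` here are this sketch's own functions.

```javascript
// Quantile of an already-sorted array at p in [0, 1], interpolating linearly
// between the two closest ranks.
function quantileSorted(sorted, p) {
  if (sorted.length === 0 || p < 0 || p > 1) return null;
  const h = (sorted.length - 1) * p; // fractional position in the array
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Quantile of an unsorted array: sort a copy, then accept one p or an array of them.
function quantile(sample, p) {
  const sorted = sample.slice().sort((a, b) => a - b);
  return Array.isArray(p)
    ? p.map((q) => quantileSorted(sorted, q))
    : quantileSorted(sorted, p);
}

console.log(quantile([5, 1, 4, 2, 3], 0.5));          // 3
console.log(quantile([5, 1, 4, 2, 3], [0.25, 0.75])); // [ 2, 4 ]
```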

### .iqr(sample)

Calculates the interquartile range of a sample: the difference between the upper and lower quartiles. It is useful as a measure of dispersion that is robust to outliers.

Also available as `.interquartile_range(x)`.
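A self-contained sketch of the interquartile range, computing both quartiles by linear interpolation as R does by default. That quartile rule is an assumption, and the library's may differ; `interquartileRange` is a hypothetical name.

```javascript
// Interquartile range: the 75th percentile minus the 25th percentile,
// each found by linear interpolation on a sorted copy of the sample.
function interquartileRange(sample) {
  if (sample.length === 0) return null;
  const sorted = sample.slice().sort((a, b) => a - b);
  const at = (p) => {
    const h = (sorted.length - 1) * p;
    const lo = Math.floor(h);
    return sorted[lo] + (h - lo) * (sorted[Math.ceil(h)] - sorted[lo]);
  };
  return at(0.75) - at(0.25);
}

console.log(interquartileRange([1, 2, 3, 4, 5, 100])); // 2.5
```

The outlier `100` has no effect on the result here, which is what makes the IQR a robust measure of dispersion.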

### .sample_skewness(sample)

Calculates the skewness of a sample, a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean. The skewness value can be positive or negative, or even undefined.

This implementation uses the Fisher-Pearson standardized moment coefficient, so its results match those of Excel, Minitab, SAS, and SPSS.

Skewness is only defined for samples of at least three values.
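A sketch of the adjusted Fisher-Pearson coefficient, `G1 = n / ((n - 1)(n - 2)) * sum(((x_i - mean) / s)^3)` with `s` the sample standard deviation. This is the variant Excel's `SKEW` reports; treat it as an assumption about what the library computes. `sampleSkewness` is a hypothetical name.

```javascript
// Adjusted Fisher-Pearson sample skewness (G1).
function sampleSkewness(x) {
  const n = x.length;
  if (n < 3) return null; // the (n - 2) factor needs at least three values
  const mean = x.reduce((acc, v) => acc + v, 0) / n;
  let sumSq = 0;
  let sumCubed = 0;
  for (const v of x) {
    const d = v - mean;
    sumSq += d * d;
    sumCubed += d * d * d;
  }
  const s = Math.sqrt(sumSq / (n - 1)); // sample standard deviation
  return (n * sumCubed) / ((n - 1) * (n - 2) * s ** 3);
}

console.log(sampleSkewness([1, 2, 3]));  // 0 (symmetric)
console.log(sampleSkewness([1, 2, 10])); // positive: the long tail is on the right
```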

### .jenks(data, number_of_classes)

Finds the Jenks natural breaks for a single-dimensional array of numbers and a desired `number_of_classes`. The result is a single-dimensional array of class breaks, including the minimum and maximum of the input array.
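Jenks natural breaks picks class boundaries that minimise the total within-class sum of squared deviations. The sketch below finds that optimum with a straightforward dynamic program in O(k·n²) time. It is not the library's implementation: `jenksBreaks` is a hypothetical name, and reporting each inner break as the largest value of the class below it is an assumed convention.

```javascript
// Natural breaks by dynamic programming over a sorted copy of the data.
function jenksBreaks(data, k) {
  const x = data.slice().sort((a, b) => a - b);
  const n = x.length;
  if (k < 1 || k > n) return null;

  // Prefix sums give any run's sum of squared deviations in O(1).
  const s1 = [0];
  const s2 = [0];
  for (let i = 0; i < n; i++) {
    s1.push(s1[i] + x[i]);
    s2.push(s2[i] + x[i] * x[i]);
  }
  // Sum of squared deviations of x[i..j], inclusive.
  const ssd = (i, j) => {
    const sum = s1[j + 1] - s1[i];
    return s2[j + 1] - s2[i] - (sum * sum) / (j - i + 1);
  };

  // cost[c][j]: best total for x[0..j] split into c + 1 classes.
  // start[c][j]: index where the last of those classes begins.
  const cost = [];
  const start = [];
  for (let c = 0; c < k; c++) {
    cost.push(new Array(n).fill(Infinity));
    start.push(new Array(n).fill(0));
  }
  for (let j = 0; j < n; j++) cost[0][j] = ssd(0, j);
  for (let c = 1; c < k; c++) {
    for (let j = c; j < n; j++) {
      for (let i = c; i <= j; i++) {
        const total = cost[c - 1][i - 1] + ssd(i, j);
        if (total < cost[c][j]) {
          cost[c][j] = total;
          start[c][j] = i;
        }
      }
    }
  }

  // Walk back through `start`; each inner break is the largest value of the lower class.
  const breaks = new Array(k + 1);
  breaks[0] = x[0];
  breaks[k] = x[n - 1];
  let j = n - 1;
  for (let c = k - 1; c >= 1; c--) {
    const i = start[c][j];
    breaks[c] = x[i - 1];
    j = i - 1;
  }
  return breaks;
}

console.log(jenksBreaks([1, 2, 3, 10, 11, 12], 2)); // [ 1, 3, 12 ]
```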

### .r_squared(data, function)

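Finds the r-squared value (coefficient of determination) of a dataset, given as a two-dimensional `Array` of `[x, y]` pairs, against a `Function` that maps `x` to a predicted `y`.

```javascript
var ss = require('simple-statistics');

var r_squared = ss.r_squared([[1, 1]], function(x) { return x * 2; });
```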
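The coefficient of determination compares the function's errors with the spread of the data: `r² = 1 - SS_res / SS_tot`. This sketch (not the library's source) returns `null` when every `y` is the same, including the single-point case, since `SS_tot` is then zero; that choice, and the name `rSquared`, are assumptions.

```javascript
// r² = 1 - (residual sum of squares / total sum of squares).
function rSquared(data, f) {
  if (data.length === 0) return null;
  const meanY = data.reduce((acc, [, y]) => acc + y, 0) / data.length;
  let ssRes = 0; // squared errors of the function's predictions
  let ssTot = 0; // squared deviations of y from its mean
  for (const [x, y] of data) {
    ssRes += (y - f(x)) ** 2;
    ssTot += (y - meanY) ** 2;
  }
  return ssTot === 0 ? null : 1 - ssRes / ssTot;
}

console.log(rSquared([[0, 0], [1, 2], [2, 4]], (x) => x * 2)); // 1
```

A perfect fit gives 1, and a function that always predicts the mean of `y` gives 0.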

### .cumulative_std_normal_probability(z)

Looks up the given `z` value in a standard normal table to find the cumulative probability Φ(z): the probability that a standard normal random variable takes a value less than or equal to `z`.
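The library looks values up in a table. As a contrast, this sketch computes Φ(z) numerically from the error function, using the Abramowitz and Stegun formula 7.1.26 (a swapped-in technique, not the library's method). JavaScript has no built-in `erf`, hence the polynomial; `cumulativeStdNormal` is a hypothetical name.

```javascript
// Standard normal CDF: Φ(z) = (1 + erf(z / √2)) / 2, with erf approximated by
// Abramowitz & Stegun 7.1.26 (absolute error below 1.5e-7).
function cumulativeStdNormal(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  // erf is odd, so Φ(-z) = 1 - Φ(z).
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

console.log(cumulativeStdNormal(1.96).toFixed(3)); // 0.975
```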

### .z_score(x, mean, standard_deviation)

Returns the standard score of `x`: the number of standard deviations by which the observation `x` lies above (positive) or below (negative) the `mean`.
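The standard score is `z = (x - mean) / standard_deviation`. A minimal sketch of that formula, under the hypothetical name `zScore`:

```javascript
// Standard score: distance from the mean in units of standard deviation.
function zScore(x, mean, standardDeviation) {
  return (x - mean) / standardDeviation;
}

console.log(zScore(7, 5, 2)); // 1
console.log(zScore(2, 5, 2)); // -1.5
```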

### .standard_normal_table

A standard normal table from which to pull values of Φ (phi).

## Regression

### .linear_regression()

Create a new linear regression solver.

#### .data([[1, 1], [2, 2]])

Set the data of a linear regression. The input is a two-dimensional array of numbers, which are treated as coordinates, like `[[x, y], [x1, y1]]`.

#### .line()

Get the linear regression line: this returns a function that you can give `x` values and it will return `y` values. Internally, this uses the `m()` and `b()` values and the classic `y = mx + b` equation.

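```javascript
var ss = require('simple-statistics');

// Fit a line to three points, then evaluate it at x = 5.
var linear_regression_line = ss.linear_regression()
  .data([[0, 1], [2, 2], [3, 3]])
  .line();
linear_regression_line(5);
```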

#### .m()

Just get the slope of the fitted regression line, the `m` component of the full line equation. Returns a number.

#### .b()

Just get the y-intercept of the fitted regression line, the `b` component of the line equation. Returns a number.
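Assuming the usual ordinary least squares fit, the slope and intercept have closed forms: `m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)` and `b = mean(y) - m * mean(x)`. This sketch (not the library's source, with the hypothetical name `linearRegression`) computes both from running sums and returns a `line` function; treating a vertical data set as slope 0 is this sketch's choice.

```javascript
// Ordinary least squares fit of y = m * x + b to [x, y] pairs.
function linearRegression(data) {
  const n = data.length;
  if (n === 0) return null;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (const [x, y] of data) {
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
  }
  const denominator = n * sumXX - sumX * sumX;
  // All x values equal: the slope is undefined, so this sketch uses a flat line.
  const m = denominator === 0 ? 0 : (n * sumXY - sumX * sumY) / denominator;
  const b = sumY / n - (m * sumX) / n;
  return { m, b, line: (x) => m * x + b };
}

const fit = linearRegression([[0, 1], [2, 2], [3, 3]]);
console.log(fit.m, fit.b); // ≈ 0.643 and ≈ 0.929 (9/14 and 13/14)
```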

## Classification

### .bayesian()

Create a naïve Bayesian classifier.

#### .train(item, category)

Train the classifier to place an item, given as an object with keys, in a category, given as a string.

#### .score(item)

Score an item, given as an object with keys, against every trained category. Returns an object of `category -> score` mappings.

```javascript
var ss = require('simple-statistics');

var bayes = ss.bayesian();
bayes.train({ species: 'Cat' }, 'animal');
bayes.score({ species: 'Cat' });
// { animal: 1 }
```
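For comparison, here is a sketch of textbook naïve Bayes scoring, where an item's score for a category is `P(category)` multiplied by `P(key = value | category)` for each of the item's keys. The library's exact scoring rule may differ; `createNaiveBayes` is a hypothetical name, and this sketch applies no smoothing, so a value never seen in a category scores zero there.

```javascript
// Minimal naïve Bayes classifier over objects of categorical features.
function createNaiveBayes() {
  const categoryCounts = {}; // category -> number of training items
  const featureCounts = {}; // category -> key -> value -> count
  let total = 0;
  return {
    train(item, category) {
      total++;
      categoryCounts[category] = (categoryCounts[category] || 0) + 1;
      const byKey = featureCounts[category] || (featureCounts[category] = {});
      for (const key of Object.keys(item)) {
        const byValue = byKey[key] || (byKey[key] = {});
        byValue[item[key]] = (byValue[item[key]] || 0) + 1;
      }
    },
    score(item) {
      const scores = {};
      for (const category of Object.keys(categoryCounts)) {
        const n = categoryCounts[category];
        let s = n / total; // prior P(category)
        for (const key of Object.keys(item)) {
          const byValue = featureCounts[category][key] || {};
          s *= (byValue[item[key]] || 0) / n; // likelihood P(key = value | category)
        }
        scores[category] = s;
      }
      return scores;
    },
  };
}

const classifier = createNaiveBayes();
classifier.train({ species: 'Cat' }, 'animal');
console.log(classifier.score({ species: 'Cat' })); // { animal: 1 }
```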