Chebyshev’s theorem is a fundamental result in statistics that puts a lower bound on the proportion of data values falling within a certain range around the mean. Specifically, it gives the minimum fraction of values in any dataset that must lie within k standard deviations of the mean. It is important for data scientists, statisticians, and analysts to understand this theorem, as it can be used to assess the spread of data points around a mean value without assuming anything about the shape of the distribution.

## What is Chebyshev’s Theorem?

Chebyshev’s theorem is used to determine the minimum percentage of values that lie within a given number of standard deviations of the mean. It applies to all distributions regardless of their shape, so it can be used whenever the shape of the data distribution is unknown or known to be non-normal. If the data are known to be normally distributed, one can instead apply the empirical rule (68-95-99.7), which states that approximately 68% of the data fall within 1 standard deviation of the mean, 95% fall within 2 standard deviations, and 99.7% fall within 3 standard deviations.

Chebyshev’s theorem also holds for normally distributed data. However, the empirical rule gives much tighter figures in that case, so it is the one usually used for normal data.

As per Chebyshev’s theorem, a proportion of at least [latex]1 - \frac{1}{k^2}[/latex] of the values will fall within ±k standard deviations of the mean, regardless of the shape of the distribution, for values of k > 1. For example, at least 75% of values will fall within 2 standard deviations of the mean, and at least 88.89% of values will fall within 3 standard deviations of the mean.

The 75% figure is calculated with k = 2: [latex]1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4} = 0.75[/latex]. By comparison, if the data are normally distributed, the empirical rule says that about 95% of all values are within μ ± 2σ (2 standard deviations). Similarly, the percentage of values within 3 standard deviations of the mean is at least 89% by Chebyshev’s theorem, in contrast to 99.7% for the empirical rule.
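The comparison above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `chebyshev_lower_bound` and `normal_coverage` are made up for illustration, and the normal figures are computed from the error function rather than taken from the rounded 68-95-99.7 rule.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the bound is zero or negative, so it says nothing useful.
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Compare the distribution-free bound with the normal-distribution figure.
for k in (1.5, 2, 3, 4):
    print(
        f"k={k}: Chebyshev >= {chebyshev_lower_bound(k):.4f}, "
        f"normal = {normal_coverage(k):.4f}"
    )
```

The Chebyshev figure is always the smaller of the two, which is the price of making no assumption about the shape of the distribution.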

## Chebyshev’s Formula

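For a random variable X with mean μ and standard deviation σ, and any k > 1, Chebyshev’s theorem can be written as:

[latex]P(|X - \mu| < k\sigma) \geq 1 - \frac{1}{k^2}[/latex]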

## Chebyshev’s Example

Let’s look at an example to better understand how Chebyshev’s theorem works in practice. With k = 2, the theorem states that if the mean (μ) and standard deviation (σ) of a data set are known, then at least 75% of the data points lie within two standard deviations of the mean (μ ± 2σ). In other words, the interval from μ − 2σ to μ + 2σ contains at least 75% of the points in the data set. For example, if μ = 10 and σ = 2, then the interval is 10 ± 2(2), so at least 75% of our data points lie between 6 and 14.
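To make the example concrete, the sketch below computes the interval for μ = 10 and σ = 2, then checks the bound on a deliberately skewed sample. The helper names `chebyshev_interval` and `fraction_within` are hypothetical, and the exponential sample is simply an assumed stand-in for "non-normal data"; it is not taken from any real dataset.

```python
import random
import statistics


def chebyshev_interval(mean, std, k):
    """Return the interval (mean - k*std, mean + k*std)."""
    return mean - k * std, mean + k * std


def fraction_within(data, k):
    """Fraction of data strictly inside k population standard deviations of its mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation of the sample
    low, high = chebyshev_interval(mu, sigma, k)
    return sum(low < x < high for x in data) / len(data)


# The worked example from the text: mu = 10, sigma = 2, k = 2.
print(chebyshev_interval(10, 2, 2))  # (6, 14)

# A skewed, clearly non-normal sample (assumed here purely for illustration).
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(10_000)]
observed = fraction_within(sample, 2)
print(f"observed {observed:.3f} vs. guaranteed minimum {1 - 1 / 2**2:.2f}")
```

The observed fraction will typically be well above 75%, because Chebyshev’s theorem is a worst-case guarantee rather than an estimate.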

## Conclusion

In summary, Chebyshev’s theorem provides us with an easy way to calculate the minimum share of data points that must fall within a given number of standard deviations of the mean, even if the data distribution is unknown or non-normal. This understanding can help us judge where outliers may exist in our datasets and use this information for further analysis. Whether you’re working with small or large datasets, understanding Chebyshev’s theorem lets you make statements about the spread of your data that hold regardless of how the data are distributed.