In statistics, a distribution describes how the values of a variable are spread across the data: which values occur and how often. The values are often arranged in ascending order from minimum to maximum and then shown visually. More formally, a distribution function is a mathematical function that gives the probability of each possible outcome of an experiment.
Suppose we toss a fair coin. Heads and tails are the two possible outcomes. If we let X denote the outcome of a toss, the probability distribution of X assigns 0.5 to X = heads and 0.5 to X = tails. Similarly, the Poisson distribution is commonly used to model the number of calls an organization receives per hour. Different types of distributions have many such real-life applications. This article briefly discusses distributions and their types.
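The two examples above can be sketched in a few lines of Python using only the standard library. The rate of 4 calls per hour and the names `coin_pmf` and `poisson_pmf` are assumptions made for illustration; they do not come from any particular dataset or library.

```python
import math

# Fair coin: each outcome of X has probability 0.5.
coin_pmf = {"heads": 0.5, "tails": 0.5}


def poisson_pmf(k: int, rate: float) -> float:
    """P(X = k) for a Poisson variable whose mean is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Assumed rate for illustration: an average of 4 calls per hour.
calls_per_hour = 4.0

for k in range(6):
    print(f"P({k} calls in an hour) = {poisson_pmf(k, calls_per_hour):.4f}")
```

In both cases the probabilities over all possible outcomes add up to 1, which is what makes each of them a probability distribution.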
Before proceeding to distributions, it is essential to understand the term “data,” which is crucial for understanding statistics.
Data and Its Types
Data is a collection of facts, figures, and statistics (numbers, words, measurements, observations) gathered for analysis. The data can be divided into discrete and continuous types.
- Discrete data is countable: it takes separate, distinct values (often whole numbers), and fractional values in between make no sense. A random variable that takes discrete values is called a discrete random variable. The number of employees in an organization is an example of discrete data.
- Continuous data is measured over a finite or infinite range and can take any value within that range, including decimal values. A random variable that takes continuous values is called a continuous random variable. The height of a person is an example of continuous data.
There are two sorts of distribution functions based on the data types. Discrete distributions are used for discrete data, while continuous distributions are used for continuous data.
- The Bernoulli, binomial, Poisson, and discrete uniform distributions are examples of discrete distributions.
- The normal, standard normal, chi-squared, and t-distributions are examples of continuous distributions.
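The practical difference between the two families is that a discrete distribution assigns a probability to each individual value, while a continuous distribution assigns probabilities to ranges of values. The sketch below illustrates this with a binomial and a normal distribution. The parameters (10 coin tosses, and heights with a mean of 170 cm and a standard deviation of 10 cm) are assumed purely for illustration.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete: probability of exactly 5 heads in 10 fair coin tosses.
n, p = 10, 0.5
print(binomial_pmf(5, n, p))  # 0.24609375

# The probabilities of all possible counts (0 to n) sum to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# Continuous: assumed heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability that a height falls between 160 cm and 180 cm,
# computed as a difference of cumulative probabilities.
prob_160_to_180 = heights.cdf(180) - heights.cdf(160)
print(round(prob_160_to_180, 4))
```

For the continuous case, the probability of any single exact height is zero, so the question only makes sense for an interval.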
The “sampling distribution” is another fundamental kind of distribution in statistics. A sampling distribution is the probability distribution of a statistic (such as the mean) computed from repeated random samples of a fixed size drawn from a given population. It describes the range of values the statistic can take from sample to sample and how likely each value is.
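A small simulation can make the idea concrete. The sketch below approximates the sampling distribution of the mean by drawing many samples from a population and recording each sample's mean. The population (10,000 simulated die rolls), the sample size of 30, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population for illustration: 10,000 rolls of a fair six-sided die.
population = [random.randint(1, 6) for _ in range(10_000)]


def sample_means(data, sample_size, repeats):
    """Return the mean of each of `repeats` random samples of `sample_size` values."""
    return [
        statistics.mean(random.sample(data, sample_size))
        for _ in range(repeats)
    ]


# The collection of sample means approximates the sampling distribution of the mean.
means = sample_means(population, sample_size=30, repeats=2_000)

print("population mean:          ", round(statistics.mean(population), 3))
print("mean of sample means:     ", round(statistics.mean(means), 3))
print("std. dev. of sample means:", round(statistics.stdev(means), 3))
```

The sample means cluster around the population mean, and their spread is much narrower than the spread of the individual values in the population.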
In addition, various statistical tests are needed when working with distributions. For example, if you are working with non-binary discrete data, you will often use a Chi-square goodness-of-fit test to check whether your data fits a given discrete probability distribution. Like any other statistical hypothesis test, a Chi-square goodness-of-fit test has a null hypothesis and an alternative hypothesis. The test compares the observed frequencies in your data with the theoretical (expected) frequencies under the assumed distribution. If the difference is statistically significant, you can infer that your data does not follow that discrete distribution.
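As a minimal sketch of this comparison, the code below computes the Chi-square statistic by hand for a die that is assumed to be fair. The observed counts are made up for illustration, and the decision uses the standard critical value of about 11.07 for 5 degrees of freedom at the 5% significance level. In practice a statistics library would normally be used to obtain a p-value instead.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1 to 6 from 120 die rolls (illustrative data).
observed = [18, 22, 16, 25, 19, 20]

# Null hypothesis: the die is fair, so every face is expected equally often.
expected = [sum(observed) / len(observed)] * len(observed)

stat = chi_square_statistic(observed, expected)

# Critical value for 6 - 1 = 5 degrees of freedom at the 5% significance level.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.07

reject_null = stat > CRITICAL_VALUE_DF5_ALPHA_05
print("chi-square statistic:", round(stat, 3))  # 2.5
print("reject the null hypothesis:", reject_null)  # False
```

Here the statistic is well below the critical value, so these counts give no evidence against the assumption that the die follows a discrete uniform distribution.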
Distributions are therefore vital in statistics: we collect data and estimate the parameters of a distribution from it, and the fitted distribution is what allows us to draw conclusions about the data as a whole.