Probability theory is a branch of mathematics that is usually studied at university level. Do you like calculations and formulas? Are you not put off by the prospect of meeting the normal distribution, ensemble entropy, mathematical expectation and the variance of a discrete random variable? Then this subject will interest you. Let's take a look at a few of the most important basic concepts of this branch of science.

Let's remember the basics

Even if you remember the simplest concepts of probability theory, do not neglect the first paragraphs of the article. The point is that without a clear understanding of the basics, you will not be able to work with the formulas discussed below.

Suppose we run some experiment whose result is a random event. Our actions can lead to several outcomes: some occur more often, others less often. The probability of an event is the ratio of the number of outcomes of the given type to the total number of possible outcomes. Only once you know this classical definition can you begin to study the mathematical expectation and variance of continuous random variables.

Average

Back in school, during math lessons, you started working with the arithmetic mean. This concept is widely used in probability theory and therefore cannot be ignored. The main thing for us at this point is that we will encounter it in the formulas for the mathematical expectation and variance of a random variable.

Suppose we have a sequence of numbers and want to find its arithmetic mean. All we need to do is add up all the elements and divide by the number of elements in the sequence. Take the numbers from 1 to 9. The sum of the elements is 45, and we divide this value by 9. Answer: 5.

Dispersion

In scientific terms, variance (also called dispersion) is the mean of the squared deviations of the observed values of a characteristic from their arithmetic mean. It is denoted by the capital Latin letter D. What is needed to calculate it? For each element of the sequence, we calculate the difference between that number and the arithmetic mean and square it. There will be exactly as many values as there are outcomes for the event we are considering. Next, we add up all the results and divide by the number of elements in the sequence. If we have five possible outcomes, we divide by five.

Variance also has properties that are worth remembering for solving problems. For example, when a random variable is multiplied by X, its variance is multiplied by X squared (i.e. X*X). Variance is never less than zero, and it does not change when all values are shifted up or down by the same amount. Additionally, for independent trials, the variance of the sum is equal to the sum of the variances.

Now we definitely need to consider examples of the variance of a discrete random variable and the mathematical expectation.

Let's say we ran 21 experiments and got 7 different outcomes. We observed each of them 1, 2, 2, 3, 4, 4 and 5 times, respectively. What will the variance be equal to?

First, let's calculate the arithmetic mean: the sum of the elements, of course, is 21. Divide it by 7, getting 3. Now subtract 3 from each number in the original sequence, square each value, and add the results together. The result is 12. Now all we have to do is divide the number by the number of elements, and, it would seem, that’s all. But there's a catch! Let's discuss it.

Dependence on the number of experiments

It turns out that when calculating variance, the denominator can contain one of two numbers: either N or N-1. Here N is the number of experiments performed or the number of elements in the sequence (which is essentially the same thing). What does this depend on?

As a rule of thumb, if the number of trials is measured in hundreds, we put N in the denominator; if it is in single digits, N-1. The boundary is drawn rather arbitrarily: by convention it passes through the number 30. If we conducted fewer than 30 experiments, we divide the sum by N-1; if more, by N.

Task

Let's return to our example of finding the variance and mathematical expectation. We got the intermediate number 12, which has to be divided by N or N-1. Since we conducted 21 experiments, which is fewer than 30, we choose the second option. The sequence has N = 7 elements, so N-1 = 6, and the answer is: the variance is 12 / 6 = 2.
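The arithmetic of this example can be retraced with a short Python sketch. Python is used here purely for illustration, the variable names are hypothetical, and N is assumed to be the number of elements in the sequence (7), so that N - 1 = 6.

```python
from statistics import mean, variance

# Seven outcomes, observed 1, 2, 2, 3, 4, 4 and 5 times (21 experiments in total).
counts = [1, 2, 2, 3, 4, 4, 5]

avg = mean(counts)                             # arithmetic mean of the sequence
sum_sq = sum((x - avg) ** 2 for x in counts)   # sum of squared deviations
var_n_minus_1 = sum_sq / (len(counts) - 1)     # divide by N - 1

print(avg, sum_sq, var_n_minus_1)

# statistics.variance also divides by N - 1, so it can serve as a cross-check.
print(variance(counts))
```

Dividing by N instead would correspond to `statistics.pvariance`.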

Expected value

Let's move on to the second concept, which we must consider in this article. The mathematical expectation is the result of adding all possible outcomes multiplied by the corresponding probabilities. It is important to understand that the obtained value, as well as the result of calculating the variance, is obtained only once for the entire problem, no matter how many outcomes are considered in it.

The formula for the mathematical expectation is quite simple: we take an outcome, multiply it by its probability, do the same for the second outcome, the third, and so on, and add the results. Everything related to this concept is easy to calculate. For example, the sum of the expected values is equal to the expected value of the sum. The same is true for the product, provided the random variables are independent. Not every quantity in probability theory allows such simple operations. Let's take a problem and calculate the values of both concepts we have studied at once. Besides, we have been distracted by theory long enough; it's time to practice.

One more example

We ran 50 trials and got 10 types of outcomes, the numbers from 0 to 9, appearing in different percentages. These are, respectively: 2%, 10%, 4%, 14%, 2%, 18%, 6%, 16%, 10%, 18%. Recall that to obtain probabilities, you need to divide the percentage values by 100. Thus, we get 0.02, 0.1, and so on. Let us work through the problem of finding the variance of a random variable and the mathematical expectation.

We calculate the arithmetic mean using the formula that we remember from junior school: 50/10 = 5.

Now let’s convert the probabilities into the number of outcomes “in pieces” to make it easier to count. We get 1, 5, 2, 7, 1, 9, 3, 8, 5 and 9. From each value obtained, we subtract the arithmetic mean, after which we square each of the results obtained. See how to do this using the first element as an example: 1 - 5 = (-4). Next: (-4) * (-4) = 16. For other values, do these operations yourself. If you did everything correctly, then after adding them all up you will get 90.

Let's continue calculating the variance and expected value by dividing 90 by N. Why do we choose N rather than N-1? Correct, because the number of experiments performed exceeds 30. So: 90/10 = 9. We got the variance. If you get a different number, don't despair. Most likely, you made a simple mistake in the calculations. Double-check what you wrote, and everything will probably fall into place.

Finally, remember the formula for mathematical expectation. We will not give all the calculations, we will only write an answer that you can check with after completing all the required procedures. The expected value will be 5.48. Let us only recall how to carry out operations, using the first elements as an example: 0*0.02 + 1*0.1... and so on. As you can see, we simply multiply the outcome value by its probability.
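The numbers in this example can be reproduced with a minimal Python sketch. It mirrors the steps of the text as written: the variance is taken over the ten counts "in pieces", and the expectation over the outcomes 0 to 9 weighted by their probabilities. The variable names are hypothetical.

```python
# Outcomes 0..9 and their observed percentages from the example.
outcomes = list(range(10))
percents = [2, 10, 4, 14, 2, 18, 6, 16, 10, 18]
trials = 50

# Counts "in pieces": each percentage of 50 trials (integer arithmetic).
counts = [pc * trials // 100 for pc in percents]

# Arithmetic mean of the counts and the variance with N in the denominator.
mean_count = sum(counts) / len(counts)
sum_sq = sum((c - mean_count) ** 2 for c in counts)
var_counts = sum_sq / len(counts)

# Mathematical expectation: each outcome times its probability, summed.
expected = sum(x * pc for x, pc in zip(outcomes, percents)) / 100

print(counts, mean_count, sum_sq, var_counts, expected)
```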

Deviation

Another concept closely related to variance and mathematical expectation is the standard deviation. It is denoted either by the Latin letters sd or by the Greek lowercase "sigma". This concept shows how much, on average, the values deviate from the central value. To find it, take the square root of the variance.

If you plot a normal distribution curve, you can see the standard deviation directly on it. To the left and right of the mode (the central value) the curve has inflection points, where it stops bending downward and starts bending upward. Drop a perpendicular from an inflection point to the horizontal axis: the length of the segment between the middle of the distribution and the foot of the perpendicular is the standard deviation.

Software

As can be seen from the descriptions of the formulas and the examples presented, calculating variance and mathematical expectation is not the simplest procedure from an arithmetic point of view. To save time, it makes sense to use a program that is widely used in higher education: it is called "R". It has functions that calculate values for many concepts from statistics and probability theory.

For example, you specify a vector of values. This is done as follows: vector<-c(1,5,2…). Now, whenever you need to calculate something for this vector, you call a function and pass the vector as its argument. To find the variance, use the function var. An example of its use: var(vector). Then you simply press Enter and get the result. Note that var divides by n - 1, so it returns the sample variance.

Finally

Variance and mathematical expectation are concepts without which it is difficult to calculate anything later on. In the main course of lectures at universities, they are covered in the first months of studying the subject. It is precisely because of a poor understanding of these simple concepts and the inability to calculate them that many students quickly fall behind the program and later receive bad grades at the end of the session, which deprives them of scholarships.

Practice for at least one week, half an hour a day, solving tasks similar to those presented in this article. Then, on any test in probability theory, you will be able to cope with the examples without extraneous tips and cheat sheets.

The variance of a random variable is a measure of the spread of the values of this variable. Low variance means that the values are clustered close together. High variance indicates a strong spread of values. The concept of variance of a random variable is used in statistics. For example, if you compare the variance of two sets of values (such as male and female patients), you can test the significance of a variable. Variance is also used when building statistical models, since low variance can be a sign that you are overfitting the values.

Steps

Calculating sample variance

  1. Record the sample values. In most cases, statisticians only have access to samples of specific populations. For example, statisticians do not, as a rule, analyze the cost of maintaining every car in Russia; they analyze a random sample of several thousand cars. Such a sample gives a good estimate of the average cost of a car, but the resulting value will most likely not match the true one exactly.

    • For example, let's analyze the number of buns sold in a cafe over 6 days, taken in random order. The sample looks like this: 17, 15, 23, 7, 9, 13. This is a sample, not a population, because we do not have data on buns sold for each day the cafe is open.
    • If you are given a population rather than a sample of values, continue to the next section.
  2. Write down a formula to calculate sample variance. Dispersion is a measure of the spread of values ​​of a certain quantity. The closer the variance value is to zero, the closer the values ​​are grouped together. When working with a sample of values, use the following formula to calculate variance:

    • $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
    • $s^2$ is the variance. Variance is measured in squared units.
    • $x_i$ is each value in the sample.
    • $\sum$ is the summation sign. That is, from each value $x_i$ you subtract $\bar{x}$, square the result, and then add up the squares.
    • $\bar{x}$ is the sample mean.
    • $n$ is the number of values in the sample.
  3. Calculate the sample mean. It is denoted as x̅. The sample mean is calculated as a simple arithmetic mean: add up all the values ​​in the sample, and then divide the result by the number of values ​​in the sample.

    • In our example, add the values ​​in the sample: 15 + 17 + 23 + 7 + 9 + 13 = 84
      Now divide the result by the number of values ​​in the sample (in our example there are 6): 84 ÷ 6 = 14.
      Sample mean x̅ = 14.
    • The sample mean is the central value around which the values ​​in the sample are distributed. If the values ​​in the sample cluster around the sample mean, then the variance is small; otherwise the variance is large.
  4. Subtract the sample mean from each value in the sample. Now calculate the difference $x_i - \bar{x}$, where $x_i$ is each value in the sample. Each result shows the deviation of a particular value from the sample mean, that is, how far this value lies from the sample mean.

    • In our example:
      $x_1 - \bar{x} = 17 - 14 = 3$
      $x_2 - \bar{x} = 15 - 14 = 1$
      $x_3 - \bar{x} = 23 - 14 = 9$
      $x_4 - \bar{x} = 7 - 14 = -7$
      $x_5 - \bar{x} = 9 - 14 = -5$
      $x_6 - \bar{x} = 13 - 14 = -1$
    • The results are easy to check, since their sum should be equal to zero. This follows from the definition of the mean: the negative differences (distances from the mean to smaller values) are exactly offset by the positive ones (distances from the mean to larger values).
  5. Square each result. As noted above, the sum of the differences $x_i - \bar{x}$ must be equal to zero. This means that the average deviation is always zero, which tells us nothing about the spread of the values. To solve this problem, square each difference $x_i - \bar{x}$. The squares are never negative, so they cannot cancel each other out.

    • In our example:
      $(x_1 - \bar{x})^2 = 3^2 = 9$
      $(x_2 - \bar{x})^2 = 1^2 = 1$
      $(x_3 - \bar{x})^2 = 9^2 = 81$
      $(x_4 - \bar{x})^2 = (-7)^2 = 49$
      $(x_5 - \bar{x})^2 = (-5)^2 = 25$
      $(x_6 - \bar{x})^2 = (-1)^2 = 1$
    • You have found the squared difference $(x_i - \bar{x})^2$ for each value in the sample.
  6. Calculate the sum of the squares of the differences. That is, find the part of the formula that is written like this: $\sum (x_i - \bar{x})^2$. Here the sign $\sum$ means the sum of the squared differences over every value $x_i$ in the sample. You have already found the squared differences $(x_i - \bar{x})^2$ for each value $x_i$ in the sample; now just add these squares.

    • In our example: 9 + 1 + 81 + 49 + 25 + 1 = 166 .
  7. Divide the result by n - 1, where n is the number of values in the sample. Some time ago, to calculate the sample variance, statisticians simply divided the result by n; in that case you get the mean of the squared deviations, which is ideal for describing the spread of the given sample. But remember that any sample is only a small part of the population of values. If you take another sample and perform the same calculations, you will get a different result. As it turns out, dividing by n - 1 (rather than just n) gives a more accurate estimate of the population variance, which is what you are interested in. Division by n - 1 has become standard, so it is included in the formula for the sample variance.

    • In our example, the sample includes 6 values, that is, n = 6.
      Sample variance: $s^2 = \frac{166}{6 - 1} = 33.2$
  8. The difference between variance and standard deviation. Note that the formula contains an exponent, so the variance is measured in squared units of the quantity being analyzed. Sometimes such a quantity is awkward to work with; in such cases, use the standard deviation, which is equal to the square root of the variance. That is why the sample variance is denoted as $s^2$ and the sample standard deviation as $s$.

    • In our example, the standard deviation of the sample is: s = √33.2 = 5.76.
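The eight steps above can be sketched in a few lines of Python. This is only an illustration with hypothetical variable names; the data are the six bun counts from the example, and `statistics.variance` (which divides by n - 1) is used as a cross-check.

```python
from math import sqrt
from statistics import variance

# Buns sold on six randomly chosen days (the sample from the example).
sample = [17, 15, 23, 7, 9, 13]
n = len(sample)

x_bar = sum(sample) / n                        # step 3: sample mean
deviations = [x - x_bar for x in sample]       # step 4: differences from the mean
squares = [d ** 2 for d in deviations]         # step 5: squared differences
sum_of_squares = sum(squares)                  # step 6: sum of the squares
s2 = sum_of_squares / (n - 1)                  # step 7: divide by n - 1
s = sqrt(s2)                                   # step 8: standard deviation

print(x_bar, sum(deviations), sum_of_squares, s2, round(s, 2))
print(variance(sample))                        # library cross-check
```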

    Calculating Population Variance

    1. Analyze some set of values. The set includes all values ​​of the quantity under consideration. For example, if you are studying the age of residents of the Leningrad region, then the totality includes the age of all residents of this region. When working with a population, it is recommended to create a table and enter the population values ​​into it. Consider the following example:

      • In a certain room there are 6 aquariums. Each aquarium contains the following number of fish:
        $x_1 = 5$
        $x_2 = 5$
        $x_3 = 8$
        $x_4 = 12$
        $x_5 = 15$
        $x_6 = 18$
    2. Write down a formula to calculate the population variance. Since the population includes all values of a certain quantity, the formula below gives the exact value of the population variance. To distinguish the population variance from the sample variance (which is only an estimate), statisticians use different symbols:

      • $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$
      • $\sigma^2$ is the population variance (read as "sigma squared"). Variance is measured in squared units.
      • $x_i$ is each value in the population.
      • $\sum$ is the summation sign. That is, from each value $x_i$ you subtract $\mu$, square the result, and then add up the squares.
      • μ – population mean.
      • n – number of values ​​in the population.
    3. Calculate the population mean. When working with a population, its mean is denoted as μ (mu). The population mean is calculated as a simple arithmetic mean: add up all the values ​​in the population, and then divide the result by the number of values ​​in the population.

      • Keep in mind that averages are not always calculated as the arithmetic mean.
      • In our example, the population mean is $\mu = \frac{5 + 5 + 8 + 12 + 15 + 18}{6} = 10.5$
    4. Subtract the population mean from each value in the population. The closer the difference value is to zero, the closer the specific value is to the population mean. Find the difference between each value in the population and its mean, and you will get a first idea of ​​the distribution of values.

      • In our example:
        $x_1 - \mu = 5 - 10.5 = -5.5$
        $x_2 - \mu = 5 - 10.5 = -5.5$
        $x_3 - \mu = 8 - 10.5 = -2.5$
        $x_4 - \mu = 12 - 10.5 = 1.5$
        $x_5 - \mu = 15 - 10.5 = 4.5$
        $x_6 - \mu = 18 - 10.5 = 7.5$
    5. Square each result obtained. The difference values ​​will be both positive and negative; If these values ​​are plotted on a number line, they will lie to the right and left of the population mean. This is not good for calculating variance because positive and negative numbers cancel each other out. So square each difference to get exclusively positive numbers.

      • In our example:
        $(x_i - \mu)^2$ for each population value (from i = 1 to i = 6):
        $(-5.5)^2 = 30.25$
        $(-5.5)^2 = 30.25$
        $(-2.5)^2 = 6.25$
        $(1.5)^2 = 2.25$
        $(4.5)^2 = 20.25$
        $(7.5)^2 = 56.25$
      • To calculate the average of the results obtained, find their sum and divide it by n: $\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$, where $x_n$ is the last value in the population.
      • Now let's write the above explanation using the summation sign, $\frac{\sum (x_i - \mu)^2}{n}$, and we get the formula for calculating the population variance.
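The population calculation for the aquarium example can be sketched the same way. The variable names are hypothetical, and `statistics.pvariance` (which divides by n) serves as a cross-check of the manual steps.

```python
from statistics import pvariance

# Number of fish in each of the six aquariums (the whole population).
population = [5, 5, 8, 12, 15, 18]
n = len(population)

mu = sum(population) / n                       # population mean
squares = [(x - mu) ** 2 for x in population]  # squared differences from the mean
sigma2 = sum(squares) / n                      # population variance: divide by n

print(mu, squares, sum(squares), sigma2)
print(pvariance(population))                   # library cross-check
```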

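Conversely, if $f$ is a function that is non-negative almost everywhere and satisfies $\int_{\mathbb{R}^n} f(x)\,dx = 1$, then there exists an absolutely continuous probability measure $\mathbb{P}$ on $\mathbb{R}^n$ such that $f$ is its density.

    Change of measure in the Lebesgue integral:

$\int_{\mathbb{R}^n} \varphi(x)\,\mathbb{P}(dx) = \int_{\mathbb{R}^n} \varphi(x) f(x)\,dx,$

where $\varphi$ is any Borel function that is integrable with respect to the probability measure $\mathbb{P}$.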

Variance, its types and properties

The concept of variance

Variance in statistics is the mean of the squared deviations of the individual values of a characteristic from the arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

1. Simple variance (for ungrouped data) is calculated using the formula:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N}$

2. Weighted variance (for variation series):

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}$

where $N$ is the number of observations and $n_i$ is the frequency (the number of times the value $x_i$ occurs).

An example of finding variance

Example 4. The following data is available for a group of 20 correspondence students. It is necessary to construct an interval series of the distribution of the characteristic, calculate the average value of the characteristic and study its dispersion

Let's build an interval grouping. We determine the width of the interval using the formula:

$h = \frac{X_{max} - X_{min}}{n}$

where $X_{max}$ is the maximum value of the grouping characteristic; $X_{min}$ is the minimum value of the grouping characteristic; $n$ is the number of intervals.

We take n = 5. The step is: h = (192 - 159) / 5 = 6.6

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

$X'_i$ is the midpoint of the interval (for example, the midpoint of the interval 159 – 165.6 is 162.3).

We determine the average height of students using the weighted arithmetic average formula:

Let's determine the variance using the formula:

The formula can be transformed like this:

$\sigma^2 = \overline{x^2} - (\bar{x})^2$

From this formula it follows that the variance is equal to the difference between the mean of the squares of the values and the square of their mean.
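The identity "mean of the squares minus the square of the mean" is easy to check numerically. The sketch below uses a small set of hypothetical values (they are not the student data from the example) and compares the shortcut with the direct definition.

```python
# Hypothetical values, chosen only to illustrate the identity.
values = [2, 4, 4, 6, 9]
n = len(values)

mean = sum(values) / n                              # mean of the values
mean_of_squares = sum(x ** 2 for x in values) / n   # mean of the squared values

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = mean_of_squares - mean ** 2

# Direct definition: mean of the squared deviations from the mean.
var_direct = sum((x - mean) ** 2 for x in values) / n

print(var_shortcut, var_direct)
```

Both expressions should agree up to floating-point rounding.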

Variance in variation series with equal intervals can be calculated by the method of moments, using the second property of variance (dividing all values by the interval width). Calculating the variance by the method of moments with the following formula is less laborious:

$\sigma^2 = i^2 (m_2 - m_1^2)$

where $i$ is the interval width; $A$ is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency; $m_1^2$ is the square of the first-order moment; $m_2$ is the second-order moment. The moments are computed for the transformed values $(x - A)/i$.

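The variance of an alternative characteristic (if a characteristic in a statistical population varies so that there are only two mutually exclusive options, such variability is called alternative) can be calculated using the formula:

$\sigma^2 = p q$

where $p$ is the proportion of units that have the characteristic and $q$ is the proportion of units that do not. Substituting q = 1 - p into this formula, we get:

$\sigma^2 = p (1 - p)$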

Types of variance

Total variance measures the variation of a characteristic across the entire population as a whole under the influence of all factors that cause this variation. It is equal to the mean square of the deviations of individual values ​​of a characteristic x from the overall mean value of x and can be defined as simple variance or weighted variance.

Within-group variance characterizes random variation, i.e. part of the variation that is due to the influence of unaccounted factors and does not depend on the factor-attribute that forms the basis of the group. Such dispersion is equal to the mean square of the deviations of individual values ​​of the attribute within group X from the arithmetic mean of the group and can be calculated as simple dispersion or as weighted dispersion.

Thus, within-group variance measures the variation of a characteristic within a group and is determined by the formula:

$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$

where $\bar{x}_i$ is the group average; $n_i$ is the number of units in the group.

For example, the within-group variances that have to be determined when studying the influence of workers' qualifications on labor productivity in a workshop show the variation in output within each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except for differences in qualification category (within a group all workers have the same qualifications).

The average of the within-group variances reflects random variation, that is, the part of the variation that occurred under the influence of all factors other than the grouping factor. It is calculated using the formula:

$\overline{\sigma^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$

Intergroup variance characterizes the systematic variation of the resulting characteristic, which is due to the influence of the factor that forms the basis of the grouping. It is equal to the mean square of the deviations of the group means from the overall mean. Intergroup variance is calculated using the formula:

$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$
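The three kinds of variance described here are linked by the standard variance addition rule: total variance equals the average of the within-group variances plus the intergroup variance. The rule is a well-known fact rather than something stated in the text above, and the two groups below are hypothetical data invented for illustration. Exact fractions are used so the equality can be checked without rounding.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))

def pvar(values):
    """Mean of the squared deviations from the mean (divide by n)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical groups, e.g. output of workers in two qualification categories.
groups = {"A": [4, 6, 8], "B": [10, 12, 14, 16]}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)
overall = mean(all_values)

total = pvar(all_values)
# Average of the within-group variances, weighted by group size.
within_avg = sum(pvar(g) * len(g) for g in groups.values()) / n_total
# Intergroup variance: squared deviations of group means from the overall mean.
between = sum((mean(g) - overall) ** 2 * len(g) for g in groups.values()) / n_total

print(total, within_avg, between)
```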

Expectation and variance are the most commonly used numerical characteristics of a random variable. They characterize the most important features of the distribution: its position and degree of scattering. In many practical problems, a complete, exhaustive characteristic of a random variable - the distribution law - either cannot be obtained at all, or is not needed at all. In these cases, one is limited to an approximate description of a random variable using numerical characteristics.

The expected value is often called simply the average value of a random variable. The variance of a random variable is a measure of scattering: the spread of the random variable around its mathematical expectation.

Expectation of a discrete random variable

Let us approach the concept of mathematical expectation first through the mechanical interpretation of the distribution of a discrete random variable. Let a unit mass be distributed among the points $x_1, x_2, \dots, x_n$ of the x-axis, with each point carrying the corresponding mass $p_1, p_2, \dots, p_n$. We need to choose one point on the x-axis that characterizes the position of the whole system of material points, taking their masses into account. It is natural to take the center of mass of the system as such a point. This is the weighted average of the random variable X, in which the abscissa of each point $x_i$ enters with a "weight" equal to the corresponding probability. The average value of the random variable X obtained in this way is called its mathematical expectation.

The mathematical expectation of a discrete random variable is the sum of the products of all its possible values and the probabilities of these values:

$E(X) = x_1 p_1 + x_2 p_2 + \dots + x_n p_n = \sum_{i=1}^{n} x_i p_i$

Example 1. A lottery in which every ticket wins has been organized. There are 1000 prizes: 400 of 10 rubles, 300 of 20 rubles, 200 of 100 rubles and 100 of 200 rubles. What is the average prize for someone who buys one ticket?

Solution. We find the average prize by dividing the total amount of the prizes, which is 10*400 + 20*300 + 100*200 + 200*100 = 50,000 rubles, by 1000 (the total number of prizes). We get 50000/1000 = 50 rubles. But the expression for the average prize can also be written in the following form:

$\frac{10 \cdot 400 + 20 \cdot 300 + 100 \cdot 200 + 200 \cdot 100}{1000} = 10 \cdot 0.4 + 20 \cdot 0.3 + 100 \cdot 0.2 + 200 \cdot 0.1 = 50$

On the other hand, under these conditions the size of the prize is a random variable that can take the values 10, 20, 100 and 200 rubles with probabilities 0.4, 0.3, 0.2 and 0.1, respectively. Therefore, the expected average prize is equal to the sum of the products of the prize sizes and the probabilities of receiving them.
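Both ways of computing the average prize in Example 1 can be compared in a short Python sketch (hypothetical variable names): dividing the total prize fund by the number of tickets, and summing each prize times its probability.

```python
# Prize size in rubles -> number of tickets with that prize.
prizes = {10: 400, 20: 300, 100: 200, 200: 100}
total_tickets = sum(prizes.values())

# Way 1: total prize fund divided by the number of tickets.
average = sum(value * count for value, count in prizes.items()) / total_tickets

# Way 2: expectation, each prize multiplied by its probability.
probabilities = {value: count / total_tickets for value, count in prizes.items()}
expected = sum(value * p for value, p in probabilities.items())

print(average, expected)
```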

Example 2. The publisher decided to publish a new book. He plans to sell the book for 280 rubles, of which he himself will receive 200, 50 - the bookstore and 30 - the author. The table provides information about the costs of publishing a book and the probability of selling a certain number of copies of the book.

Find the publisher's expected profit.

Solution. The random variable “profit” is equal to the difference between the income from sales and the cost of expenses. For example, if 500 copies of a book are sold, then the income from the sale is 200 * 500 = 100,000, and the cost of publication is 225,000 rubles. Thus, the publisher faces a loss of 125,000 rubles. The following table summarizes the expected values ​​of the random variable - profit:

Copies sold   Profit x_i   Probability p_i   x_i p_i
500           -125000      0.20              -25000
1000          -50000       0.40              -20000
2000          100000       0.25              25000
3000          250000       0.10              25000
4000          400000       0.05              20000
Total:                     1.00              25000

Thus, we obtain the mathematical expectation of the publisher's profit:

$E(X) = -25000 - 20000 + 25000 + 25000 + 20000 = 25000$ rubles.
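The expected profit in Example 2 is the sum of the last column of the table. A minimal Python sketch, with the profit and probability values copied from the table and hypothetical variable names:

```python
# Profit for each sales scenario and its probability (from the table).
profits = [-125000, -50000, 100000, 250000, 400000]
probs = [0.20, 0.40, 0.25, 0.10, 0.05]

# Contribution of each scenario: profit times probability.
contributions = [x * p for x, p in zip(profits, probs)]
expected_profit = sum(contributions)

print(contributions, expected_profit)
```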

Example 3. The probability of a hit with one shot is p = 0.2. Determine the number of shells needed so that the mathematical expectation of the number of hits equals 5.

Solution. From the same mathematical expectation formula that we have used so far, we express x, the number of shells:

$E(X) = x p$, hence $x = \frac{E(X)}{p} = \frac{5}{0.2} = 25$.

Example 4. Determine the mathematical expectation of the random variable X, the number of hits in three shots, if the probability of a hit with each shot is p = 0.4.

Hint: find the probabilities of the values of the random variable using Bernoulli's formula.
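One way to follow the hint in Example 4 is sketched below in Python. It assumes the shots are independent, builds the probabilities of 0, 1, 2 and 3 hits with Bernoulli's formula, and then applies the expectation formula. The variable names are hypothetical.

```python
from math import comb

n, p = 3, 0.4   # three shots, hit probability 0.4 for each shot

# Bernoulli's formula: P(k hits) = C(n, k) * p^k * (1 - p)^(n - k).
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

# Mathematical expectation: each number of hits times its probability.
expected_hits = sum(k * prob for k, prob in pmf.items())

print(pmf, expected_hits)
```

The result should agree with the product n * p, which is the known expectation of a binomial random variable.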

Properties of mathematical expectation

Let's consider the properties of mathematical expectation.

Property 1. The mathematical expectation of a constant is equal to this constant:

$E(C) = C$

Property 2. A constant factor can be taken outside the mathematical expectation sign:

$E(CX) = C \cdot E(X)$

Property 3. The mathematical expectation of the sum (difference) of random variables is equal to the sum (difference) of their mathematical expectations:

$E(X \pm Y) = E(X) \pm E(Y)$

Property 4. The mathematical expectation of a product of independent random variables is equal to the product of their mathematical expectations:

$E(XY) = E(X) \cdot E(Y)$

Property 5. If all values of a random variable X are decreased (increased) by the same number C, then its mathematical expectation decreases (increases) by the same number:

$E(X \pm C) = E(X) \pm C$

When you can’t limit yourself only to mathematical expectation

In most cases, only the mathematical expectation cannot sufficiently characterize a random variable.

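Let the random variables X and Y be given by the following distribution laws:

Value of X   Probability
-0.1         0.1
-0.01        0.2
0            0.4
0.01         0.2
0.1          0.1

Value of Y   Probability
-20          0.3
-10          0.1
0            0.2
10           0.1
20           0.3

The mathematical expectations of these quantities are the same, both equal to zero:

$E(X) = -0.1 \cdot 0.1 - 0.01 \cdot 0.2 + 0 \cdot 0.4 + 0.01 \cdot 0.2 + 0.1 \cdot 0.1 = 0$
$E(Y) = -20 \cdot 0.3 - 10 \cdot 0.1 + 0 \cdot 0.2 + 10 \cdot 0.1 + 20 \cdot 0.3 = 0$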

However, their distributions are different. The random variable X can only take values that differ little from the mathematical expectation, while the random variable Y can take values that deviate from it significantly. A similar example: the average wage does not make it possible to judge the share of high- and low-paid workers. In other words, the mathematical expectation alone does not tell us what deviations from it, at least on average, are possible. To find that out, you need the variance of the random variable.

Variance of a discrete random variable

The variance of a discrete random variable X is the mathematical expectation of the square of its deviation from the mathematical expectation:

$D(X) = E[(X - E(X))^2]$

The standard deviation of a random variable X is the arithmetic (non-negative) value of the square root of its variance:

$\sigma(X) = \sqrt{D(X)}$

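Example 5. Calculate the variances and standard deviations of the random variables X and Y whose distribution laws are given in the tables above.

Solution. The mathematical expectations of the random variables X and Y, as found above, are equal to zero. According to the variance formula with E(X) = E(Y) = 0 we get:

$D(X) = (-0.1)^2 \cdot 0.1 + (-0.01)^2 \cdot 0.2 + 0^2 \cdot 0.4 + 0.01^2 \cdot 0.2 + 0.1^2 \cdot 0.1 = 0.00204$
$D(Y) = (-20)^2 \cdot 0.3 + (-10)^2 \cdot 0.1 + 0^2 \cdot 0.2 + 10^2 \cdot 0.1 + 20^2 \cdot 0.3 = 260$

Then the standard deviations of the random variables X and Y are

$\sigma(X) = \sqrt{0.00204} \approx 0.0452$, $\sigma(Y) = \sqrt{260} \approx 16.12$.

Thus, with the same mathematical expectations, the variance of the random variable X is very small, while that of the random variable Y is significant. This is a consequence of the difference in their distributions.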
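The calculation in Example 5 can be sketched in Python as follows. The helper names `expectation` and `variance` are hypothetical; each distribution is a list of (value, probability) pairs taken from the two tables. Because the inputs are decimal fractions, the results are compared up to floating-point rounding.

```python
from math import sqrt

def expectation(dist):
    """Sum of value * probability over a discrete distribution."""
    return sum(x * p for x, p in dist)

def variance(dist):
    """Expectation of the squared deviation from the expectation."""
    m = expectation(dist)
    return sum((x - m) ** 2 * p for x, p in dist)

X = [(-0.1, 0.1), (-0.01, 0.2), (0, 0.4), (0.01, 0.2), (0.1, 0.1)]
Y = [(-20, 0.3), (-10, 0.1), (0, 0.2), (10, 0.1), (20, 0.3)]

for name, dist in (("X", X), ("Y", Y)):
    d = variance(dist)
    print(name, expectation(dist), d, sqrt(d))
```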

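Example 6. An investor has 4 alternative investment projects. The table summarizes the expected profit of each project and the corresponding probabilities.

Project 1      Project 2       Project 3        Project 4
500, P=1       1000, P=0.5     500, P=0.5       500, P=0.5
               0, P=0.5        1000, P=0.25     10500, P=0.25
                               0, P=0.25        -9500, P=0.25

Find the mathematical expectation, variance and standard deviation for each alternative.

Solution. Let us show how these values are calculated for the 3rd alternative:

$$E(X_3) = 500 \cdot 0.5 + 1000 \cdot 0.25 + 0 \cdot 0.25 = 500,$$
$$D(X_3) = (500 - 500)^2 \cdot 0.5 + (1000 - 500)^2 \cdot 0.25 + (0 - 500)^2 \cdot 0.25 = 125000,$$
$$\sigma(X_3) = \sqrt{125000} \approx 353.55.$$

The table summarizes the found values for all alternatives.

Alternative    E(X)    D(X)        σ(X)
Project 1      500     0           0
Project 2      500     250000      500
Project 3      500     125000      353.55
Project 4      500     50000000    7071.07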

All alternatives have the same mathematical expectation. This means that in the long run every project yields the same average income. The standard deviation can be interpreted as a measure of risk: the higher it is, the greater the risk of the investment. An investor who wants to avoid risk will choose project 1, since it has the smallest standard deviation (0). An investor who prefers risk and the chance of high returns in a short period will choose the project with the largest standard deviation, project 4.
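The figures for all four projects in Example 6 can be reproduced with a short Python sketch. It assumes that the third outcome of project 4 is a loss of 9500 (entered as -9500), since that is the value for which all four expectations coincide, as the text states. The `projects` dictionary and the `moments` helper are illustrative names.

```python
from math import sqrt

# (profit, probability) pairs per project; the -9500 outcome is an assumption
projects = {
    "Project 1": [(500, 1.0)],
    "Project 2": [(1000, 0.5), (0, 0.5)],
    "Project 3": [(500, 0.5), (1000, 0.25), (0, 0.25)],
    "Project 4": [(500, 0.5), (10500, 0.25), (-9500, 0.25)],
}

def moments(dist):
    """Return (expectation, variance, standard deviation) of a discrete law."""
    mean = sum(x * p for x, p in dist)
    var = sum((x - mean) ** 2 * p for x, p in dist)
    return mean, var, sqrt(var)

results = {name: moments(dist) for name, dist in projects.items()}
for name, (mean, var, sd) in results.items():
    print(f"{name}: E = {mean:.0f}, D = {var:.0f}, sigma = {sd:.2f}")
```

Sorting `results` by the third element of each tuple ranks the projects by risk in the sense used above.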

Dispersion properties

Let us present the properties of dispersion.

Property 1. The variance of a constant is zero:

$$D(C) = 0.$$

Property 2. A constant factor can be taken out of the variance sign by squaring it:

$$D(CX) = C^2 D(X).$$

Property 3. The variance of a random variable is equal to the mathematical expectation of the square of this variable minus the square of the mathematical expectation of the variable itself:

$$D(X) = E(X^2) - [E(X)]^2,$$

where $E(X^2) = \sum_i x_i^2 p_i$.

Property 4. The variance of the sum (difference) of independent random variables is equal to the sum of their variances:

$$D(X \pm Y) = D(X) + D(Y).$$
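A quick numerical check of the properties is sketched below. It uses exact fractions, so the equalities hold exactly rather than approximately. The distribution laws `X` and `Y`, the constant `C` and all helper names are hypothetical; for the sum, X and Y are assumed to be independent, so the joint probabilities are products.

```python
from fractions import Fraction
from itertools import product

def expectation(law):
    """E(X) for a law given as {value: probability}."""
    return sum(x * p for x, p in law.items())

def variance(law):
    """D(X) computed from the definition."""
    m = expectation(law)
    return sum((x - m) ** 2 * p for x, p in law.items())

def transform(law, func):
    """Distribution law of func(X)."""
    out = {}
    for x, p in law.items():
        out[func(x)] = out.get(func(x), 0) + p
    return out

def sum_of_independent(law_x, law_y):
    """Distribution law of X + Y for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(law_x.items(), law_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

# Hypothetical distribution laws and constant
X = {1: Fraction(1, 2), 2: Fraction(1, 4), 4: Fraction(1, 4)}
Y = {0: Fraction(1, 3), 3: Fraction(2, 3)}
C = 5

prop1 = variance({C: Fraction(1)}) == 0
prop2 = variance(transform(X, lambda x: C * x)) == C**2 * variance(X)
prop3 = variance(X) == expectation(transform(X, lambda x: x * x)) - expectation(X) ** 2
prop4 = variance(sum_of_independent(X, Y)) == variance(X) + variance(Y)
print(prop1, prop2, prop3, prop4)
```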

Example 7. It is known that a discrete random variable X takes only two values: −3 and 7. In addition, the mathematical expectation is known: E(X) = 4. Find the variance of this discrete random variable.

Solution. Let us denote by p the probability with which the random variable takes the value x1 = −3. Then the probability of the value x2 = 7 is 1 − p. Let us write the equation for the mathematical expectation:

$$E(X) = x_1 p + x_2 (1 - p) = -3p + 7(1 - p) = 4,$$

from which we get the probabilities: p = 0.3 and 1 − p = 0.7.

Law of distribution of the random variable:

X    −3     7
p    0.3    0.7

We calculate the variance of this random variable using the formula from property 3 of the variance:

$$D(X) = (-3)^2 \cdot 0.3 + 7^2 \cdot 0.7 - 4^2 = 2.7 + 34.3 - 16 = 21.$$
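The same steps can be written as a small Python sketch with exact fractions. The variable names are illustrative; the only inputs are the two values and the expectation given in Example 7.

```python
from fractions import Fraction

x1, x2 = -3, 7
mean = 4

# E(X) = x1*p + x2*(1 - p)  =>  p = (x2 - E(X)) / (x2 - x1)
p = Fraction(x2 - mean, x2 - x1)
q = 1 - p

# Property 3: D(X) = E(X^2) - E(X)^2
second_moment = x1**2 * p + x2**2 * q
variance = second_moment - mean**2

print(p, q, variance)  # 3/10 7/10 21
```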

Find the mathematical expectation of the random variable yourself

Example 8. A discrete random variable X takes only two values. It takes the larger of the two values, 3, with probability 0.4. In addition, the variance of the random variable is known: D(X) = 6. Find the mathematical expectation of the random variable.

Example 9. There are 6 white and 4 black balls in an urn. 3 balls are drawn from the urn. The number of white balls among the drawn balls is a discrete random variable X. Find the mathematical expectation and variance of this random variable.

Solution. The random variable X can take the values 0, 1, 2, 3. The corresponding probabilities can be calculated using the probability multiplication rule. Law of distribution of the random variable:

X    0       1       2      3
p    1/30    3/10    1/2    1/6

Hence the mathematical expectation of this random variable:

$$E(X) = 0 \cdot \frac{1}{30} + 1 \cdot \frac{3}{10} + 2 \cdot \frac{1}{2} + 3 \cdot \frac{1}{6} = \frac{3}{10} + 1 + \frac{1}{2} = 1.8.$$

The variance of this random variable is:

$$D(X) = 1^2 \cdot \frac{3}{10} + 2^2 \cdot \frac{1}{2} + 3^2 \cdot \frac{1}{6} - 1.8^2 = 0.3 + 2 + 1.5 - 3.24 = 0.56.$$
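The distribution law in Example 9 can be rebuilt in Python. Note that this sketch does not use the multiplication rule mentioned in the solution; it counts combinations instead (the hypergeometric formula), which gives the same probabilities. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

white, black, drawn = 6, 4, 3
total = comb(white + black, drawn)  # number of ways to draw 3 balls out of 10

# P(X = k): choose k white balls and (drawn - k) black balls
law = {
    k: Fraction(comb(white, k) * comb(black, drawn - k), total)
    for k in range(drawn + 1)
}

mean = sum(k * p for k, p in law.items())
variance = sum(k**2 * p for k, p in law.items()) - mean**2

print(law)
print(mean, variance)  # 9/5 14/25
```

The fractions 9/5 and 14/25 are the exact forms of 1.8 and 0.56.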

Expectation and variance of a continuous random variable

For a continuous random variable, the mechanical interpretation of the mathematical expectation keeps the same meaning: it is the center of mass of a unit mass distributed continuously along the x-axis with density f(x). For a discrete random variable the argument xi changes in jumps, whereas for a continuous random variable the argument changes continuously. The mathematical expectation of a continuous random variable is likewise related to its average value.

To find the mathematical expectation and variance of a continuous random variable, you need to evaluate definite integrals. If the density function of the continuous random variable is given, it enters the integrand directly. If the probability distribution function is given instead, you first need to find the density function by differentiating it.

The arithmetic average of all possible values of a continuous random variable is called its mathematical expectation, denoted by E(X) or M(X):

$$E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx, \qquad D(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f(x)\,dx.$$
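As an illustration, the integrals for the expectation and variance can be approximated numerically. The sketch below assumes a hypothetical density f(x) = 2x on the interval [0, 1] (zero elsewhere), which is not taken from the text, and uses a simple midpoint rule. For this density the exact answers are E(X) = 2/3 and D(X) = 1/18.

```python
def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the definite integral of func on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def density(x):
    # Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere
    return 2 * x

a, b = 0.0, 1.0
total_mass = integrate(density, a, b)                      # should be close to 1
mean = integrate(lambda x: x * density(x), a, b)           # E(X)
variance = integrate(lambda x: (x - mean) ** 2 * density(x), a, b)  # D(X)

print(total_mass, mean, variance)
```

Checking that `total_mass` is close to 1 is a cheap way to confirm that the function really is a density before trusting the other two integrals.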

Example 4. The following data are available for a group of 20 correspondence students. It is necessary to construct an interval distribution series for the characteristic, calculate its average value and study its variance.

Let's build an interval grouping. We determine the width (step) of the interval using the formula:

$$h = \frac{X_{\max} - X_{\min}}{n},$$

where X max is the maximum value of the grouping characteristic;
X min is the minimum value of the grouping characteristic;
n is the number of intervals.

We take n = 5. The step is: h = (192 − 159) / 5 = 6.6
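The interval boundaries and midpoints that follow from this step can be generated with a short Python sketch. Only the minimum (159), the maximum (192) and n = 5 come from the example; the variable names are illustrative, and values are rounded to one decimal place to hide floating-point noise.

```python
x_min, x_max, n = 159, 192, 5

h = (x_max - x_min) / n  # interval width (step)

# Interval boundaries: x_min, x_min + h, ..., x_max
edges = [round(x_min + k * h, 1) for k in range(n + 1)]
intervals = list(zip(edges[:-1], edges[1:]))

# Midpoint of each interval
midpoints = [round((low + high) / 2, 1) for low, high in intervals]

for (low, high), mid in zip(intervals, midpoints):
    print(f"{low} - {high}: midpoint {mid}")
```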

Let's create an interval grouping

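For further calculations, we will build an auxiliary table:

x'i is the midpoint of the interval (for example, the midpoint of the interval from 159 to 165.6 is 162.3).

We determine the average height of the students using the weighted arithmetic average formula:

$$\bar{x} = \frac{\sum x'_i f_i}{\sum f_i},$$

where $f_i$ is the frequency (the number of students) in the i-th interval.

Let's determine the variance using the formula:

$$\sigma^2 = \frac{\sum (x'_i - \bar{x})^2 f_i}{\sum f_i}.$$

The formula can be transformed like this:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2 = \frac{\sum (x'_i)^2 f_i}{\sum f_i} - \left(\frac{\sum x'_i f_i}{\sum f_i}\right)^2.$$

From this formula it follows that the variance is equal to the difference between the mean of the squares of the values and the square of the mean.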
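A sketch of the weighted mean and variance for grouped data is given below. The interval midpoints follow from the step 6.6 found above, but the frequencies are hypothetical: they were chosen only so that they sum to 20 and do not come from the original data. The sketch computes the variance both from the definition and as the mean of squares minus the square of the mean.

```python
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
frequencies = [2, 5, 7, 4, 2]  # hypothetical frequencies, 20 students in total

n = sum(frequencies)

# Weighted arithmetic mean
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Variance from the definition: weighted mean of squared deviations
var_direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies)) / n

# Variance from the transformed formula: mean of squares minus square of the mean
mean_of_squares = sum(x**2 * f for x, f in zip(midpoints, frequencies)) / n
var_shortcut = mean_of_squares - mean**2

print(round(mean, 2), round(var_direct, 4), round(var_shortcut, 4))
```

Both variance values agree up to floating-point rounding.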

In variation series with equal intervals, the variance can be calculated by the method of moments, which relies on the second property of variance (all values are divided by the width of the interval). Calculating the variance this way is less laborious:

$$\sigma^2 = i^2 \left(m_2 - m_1^2\right),$$

where i is the width of the interval;
A is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency;
$m_1^2$ is the square of the first-order moment, $m_1 = \frac{\sum \frac{x - A}{i} f}{\sum f}$;
$m_2$ is the second-order moment, $m_2 = \frac{\sum \left(\frac{x - A}{i}\right)^2 f}{\sum f}$.
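The method of moments can be sketched in Python as follows. The frequencies are hypothetical (chosen to sum to 20, not taken from the original data); the midpoints and the interval width 6.6 follow from the grouping above. The conventional zero A is taken as the midpoint of the interval with the highest frequency, as suggested in the text.

```python
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
frequencies = [2, 5, 7, 4, 2]  # hypothetical frequencies, 20 students in total
i = 6.6                        # interval width

n = sum(frequencies)
A = midpoints[frequencies.index(max(frequencies))]  # conventional zero

# Coded values u = (x - A) / i are small integers up to rounding
u = [(x - A) / i for x in midpoints]

m1 = sum(ui * f for ui, f in zip(u, frequencies)) / n       # first-order moment
m2 = sum(ui**2 * f for ui, f in zip(u, frequencies)) / n    # second-order moment

var_moments = i**2 * (m2 - m1**2)

# Direct calculation for comparison
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
var_direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies)) / n

print(round(var_moments, 4), round(var_direct, 4))
```

The saving in manual work comes from the coded values: here they are approximately -2, -1, 0, 1, 2, so the sums involve only small whole numbers.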

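The variance of an alternative characteristic (when a characteristic in a statistical population varies so that there are only two mutually exclusive options, the variability is called alternative) can be calculated using the formula:

$$\sigma^2 = p q,$$

where p is the share of units that have the characteristic and q is the share of units that do not.

Substituting q = 1 − p into this variance formula, we get:

$$\sigma^2 = p (1 - p).$$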
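The product formula for an alternative characteristic can be checked against the general definition of variance. In this sketch the characteristic is coded as 1 (present) or 0 (absent), and the share p = 3/10 is a hypothetical value chosen for illustration.

```python
from fractions import Fraction

p = Fraction(3, 10)  # hypothetical share of units that have the characteristic
q = 1 - p            # share of units that do not have it

# Code the characteristic as 1 (present) and 0 (absent)
law = {1: p, 0: q}

mean = sum(x * w for x, w in law.items())
variance = sum((x - mean) ** 2 * w for x, w in law.items())

print(mean, variance, p * q, p * (1 - p))  # 3/10 21/100 21/100 21/100
```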

Types of variance

Total variance measures the variation of a characteristic across the entire population under the influence of all the factors that cause this variation. It is equal to the mean square of the deviations of the individual values of the characteristic x from the overall mean of x, and it can be calculated as a simple variance or as a weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to unaccounted factors and does not depend on the factor characteristic underlying the grouping. It is equal to the mean square of the deviations of the individual values x of the characteristic within a group from the arithmetic mean of that group, and it can be calculated as a simple variance or as a weighted variance.

Thus, within-group variance measures the variation of a characteristic within a group and is determined by the formula:

$$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i},$$

where $\bar{x}_i$ is the group average;
$n_i$ is the number of units in the group.

For example, in a study of how workers' qualifications affect labor productivity in a workshop, the within-group variances show the variation in output within each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except differences in qualification category, since all workers within a group have the same qualification.
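To make the distinction concrete, the sketch below computes the within-group variances and the total variance for a small made-up data set. The output figures and the category labels are entirely hypothetical; only the formulas (simple, unweighted variance within each group and over the whole population) follow the definitions above.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Simple (unweighted) variance: mean square deviation from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical output per worker, grouped by qualification category
groups = {
    "category 3": [10, 12, 14],
    "category 4": [15, 17, 19, 21],
    "category 5": [22, 24, 26],
}

# Within-group variance: variation inside each qualification group
within = {name: variance(values) for name, values in groups.items()}

# Total variance: variation over all workers regardless of group
all_values = [x for values in groups.values() for x in values]
total = variance(all_values)

for name, value in within.items():
    print(f"{name}: within-group variance = {value:.3f}")
print(f"total variance = {total:.3f}")
```

In this made-up data the total variance is much larger than any within-group variance, because most of the spread comes from differences between the qualification groups rather than from variation inside them.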