In addition to studying the variation of a characteristic across the population as a whole, it is often necessary to trace quantitative changes in the characteristic within the groups into which the population is divided, as well as between those groups. This is done by calculating and analyzing various types of variance.
There are total, intergroup and intragroup variances.
Total variance σ² measures the variation of a trait throughout the entire population under the influence of all the factors that caused this variation.

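Intergroup (between-group) variance δ² characterizes systematic variation, i.e. differences in the value of the studied trait that arise under the influence of the factor trait on which the grouping is based. It is calculated using the formula:
$$\delta^2 = \frac{\sum_i (\bar{x}_i - \bar{x})^2 n_i}{\sum_i n_i},$$
where $\bar{x}_i$ is the mean of group $i$, $\bar{x}$ is the overall mean, and $n_i$ is the number of units in group $i$.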

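Within-group variance σ_i² reflects random variation, i.e. the part of the variation that occurs under the influence of unaccounted factors and does not depend on the factor attribute on which the grouping is based. It is calculated by the formula:
$$\sigma_i^2 = \frac{\sum_j (x_{ij} - \bar{x}_i)^2}{n_i},$$
where $x_{ij}$ are the individual values in group $i$, $\bar{x}_i$ is the group mean, and $n_i$ is the number of units in the group.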

Average of within-group variances: $\overline{\sigma_i^2} = \dfrac{\sum_i \sigma_i^2 n_i}{\sum_i n_i}$.

There is a law connecting the three types of variance. The total variance is equal to the sum of the average of the within-group variances and the between-group variance: $\sigma^2 = \overline{\sigma_i^2} + \delta^2$.
This relation is called the rule for adding variances.

A widely used indicator in analysis is the proportion of between-group variance in the total variance. It is called the empirical coefficient of determination (η²): $\eta^2 = \dfrac{\delta^2}{\sigma^2}$.
The square root of the empirical coefficient of determination is called the empirical correlation ratio (η):
$$\eta = \sqrt{\frac{\delta^2}{\sigma^2}}.$$
It characterizes the influence of the characteristic on which the grouping is based on the variation of the resulting characteristic. The empirical correlation ratio ranges from 0 to 1.
Let's show its practical use with the following example (Table 1).

Example No. 1. Table 1 - Labor productivity of two groups of workers in one of the workshops of NPO "Cyclone"

Let's calculate the overall and group means and variances:




The initial data for calculating the average of the within-group variances and the between-group variance are presented in Table 2.
Table 2
Calculation of $\overline{\sigma_i^2}$ and δ² for two groups of workers.

Worker groups                                     Number of workers, people   Average, parts/shift   Variance
Completed technical training                      5                           95                     42.0
Those who have not completed technical training   5                           81                     231.2
All workers                                       10                          88                     185.6
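Let's calculate the indicators. Average of within-group variances:
$$\overline{\sigma_i^2} = \frac{42.0 \cdot 5 + 231.2 \cdot 5}{10} = 136.6.$$
Intergroup variance:
$$\delta^2 = \frac{(95 - 88)^2 \cdot 5 + (81 - 88)^2 \cdot 5}{10} = 49.$$
Total variance: $\sigma^2 = 136.6 + 49 = 185.6$.
Thus, the empirical correlation ratio: $\eta = \sqrt{49 / 185.6} \approx 0.51$.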
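
The calculation for Table 2 can be checked with a short script. This is only a sketch: the variable names are illustrative, and the group sizes, means and variances are the ones listed in Table 2.

```python
# Group data from Table 2: (number of workers, group mean, group variance)
groups = [
    (5, 95.0, 42.0),   # completed technical training
    (5, 81.0, 231.2),  # have not completed technical training
]

n_total = sum(n for n, _, _ in groups)
overall_mean = sum(n * mean for n, mean, _ in groups) / n_total

# Average of the within-group variances, weighted by group size
avg_within = sum(n * var for n, _, var in groups) / n_total

# Between-group variance: weighted mean squared deviation of the
# group means from the overall mean
between = sum(n * (mean - overall_mean) ** 2 for n, mean, _ in groups) / n_total

# Rule for adding variances
total = avg_within + between

eta_squared = between / total  # empirical coefficient of determination
eta = eta_squared ** 0.5       # empirical correlation ratio

print(f"average within-group variance: {avg_within:.1f}")
print(f"between-group variance:        {between:.1f}")
print(f"total variance:                {total:.1f}")
print(f"eta^2 = {eta_squared:.3f}, eta = {eta:.3f}")
```

The total obtained from the addition rule matches the variance given for all workers in the last row of Table 2, which is a convenient consistency check.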

Along with variation in quantitative characteristics, variation in qualitative characteristics can also be observed. This study of variation is achieved by calculating the following types of variances:

The within-group dispersion of the share is determined by the formula

where n_i is the number of units in the individual groups.
The share of the studied characteristic in the entire population is determined by the formula:
The three types of variance are related to each other as follows:
.

This relation is called the rule for adding the variances of a share.

According to the sample survey, depositors were grouped according to the size of their deposit in the city’s Sberbank:

Determine:

1) range of variation;

2) average deposit size;

3) average linear deviation;

4) variance;

5) standard deviation;

6) coefficient of variation of deposits.

Solution:

This distribution series contains open intervals. In such series, the width of the first group's interval is conventionally taken to be equal to the width of the next interval, and the width of the last group's interval equal to the width of the previous one.

The width of the second group's interval is 200; therefore, the width of the first is also 200. The width of the penultimate group's interval is 200, which means that the last interval will also have a width of 200.

1) Let us define the range of variation as the difference between the largest and smallest values of the characteristic:

$$R = x_{\max} - x_{\min} = 1200 - 200 = 1000.$$

The range of variation in the deposit size is 1000 rubles.

2) The average deposit size will be determined using the weighted arithmetic mean formula.

Let us first determine the discrete value of the attribute in each interval. To do this, using the simple arithmetic mean formula, we find the midpoints of the intervals.

The midpoint of the first interval will be:

$$x_1 = \frac{200 + 400}{2} = 300,$$

the second, 500, etc.

Let's enter the calculation results in the table:

Deposit amount, rub.   Number of depositors, f   Midpoint of the interval, x   xf
200-400                32                        300                           9600
400-600                56                        500                           28000
600-800                120                       700                           84000
800-1000               104                       900                           93600
1000-1200              88                        1100                          96800
Total                  400                       -                             312000

The average deposit in the city's Sberbank will be 780 rubles:

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{312000}{400} = 780.$$

3) The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of a characteristic from the overall mean:

$$\bar{d} = \frac{\sum |x - \bar{x}| f}{\sum f}.$$

The procedure for calculating the average linear deviation in an interval distribution series is as follows:

1. The weighted arithmetic mean $\bar{x}$ is calculated, as shown in paragraph 2).

2. The absolute deviations from the mean are determined: $|x - \bar{x}|$.

3. The resulting deviations are multiplied by the frequencies: $|x - \bar{x}| f$.

4. The sum of the weighted deviations is found, disregarding the sign: $\sum |x - \bar{x}| f$.

5. The sum of the weighted deviations is divided by the sum of the frequencies: $\sum |x - \bar{x}| f / \sum f$.

It is convenient to use a calculation table:

Deposit amount, rub.   Number of depositors, f   Midpoint of the interval, x   x - x̄   |x - x̄|   |x - x̄| f
200-400                32                        300                           -480     480       15360
400-600                56                        500                           -280     280       15680
600-800                120                       700                           -80      80        9600
800-1000               104                       900                           120      120       12480
1000-1200              88                        1100                          320      320       28160
Total                  400                       -                             -        -         81280

The average linear deviation of the deposit size of Sberbank clients is 81280 / 400 = 203.2 rubles.

4) Variance is the arithmetic mean of the squared deviations of each value of the attribute from the arithmetic mean.

The variance in an interval distribution series is calculated using the formula:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}.$$

The procedure for calculating variance in this case is as follows:

1. Determine the weighted arithmetic mean $\bar{x}$, as shown in paragraph 2).

2. Find the deviations from the mean: $x - \bar{x}$.

3. Square the deviation of each variant from the mean: $(x - \bar{x})^2$.

4. Multiply the squared deviations by the weights (frequencies): $(x - \bar{x})^2 f$.

5. Sum the resulting products: $\sum (x - \bar{x})^2 f$.

6. Divide the resulting sum by the sum of the weights (frequencies): $\sum (x - \bar{x})^2 f / \sum f$.

Let's put the calculations in a table:

Deposit amount, rub.   Number of depositors, f   Midpoint of the interval, x   x - x̄   (x - x̄)²   (x - x̄)² f
200-400                32                        300                           -480     230400     7372800
400-600                56                        500                           -280     78400      4390400
600-800                120                       700                           -80      6400       768000
800-1000               104                       900                           120      14400      1497600
1000-1200              88                        1100                          320      102400     9011200
Total                  400                       -                             -        -          23040000

The variance of the deposit size is therefore $\sigma^2 = 23040000 / 400 = 57600$.
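
The indicators requested in the problem can be computed directly from the midpoints and frequencies in the tables above. The following is a minimal sketch, not the author's own solution; the variable names are illustrative, and the standard deviation and coefficient of variation are computed with the usual definitions (square root of the variance, and standard deviation divided by the mean, expressed as a percentage).

```python
import math

# Interval midpoints and frequencies from the deposit tables above
midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

n = sum(frequencies)

# 2) weighted arithmetic mean
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# 3) average linear deviation: weighted mean of absolute deviations
mean_abs_dev = sum(abs(x - mean) * f for x, f in zip(midpoints, frequencies)) / n

# 4) variance: weighted mean of squared deviations
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies)) / n

# 5) standard deviation
std_dev = math.sqrt(variance)

# 6) coefficient of variation, in percent
coef_var = std_dev / mean * 100

print(f"mean = {mean:.1f}")
print(f"average linear deviation = {mean_abs_dev:.1f}")
print(f"variance = {variance:.1f}")
print(f"standard deviation = {std_dev:.1f}")
print(f"coefficient of variation = {coef_var:.1f}%")
```

The mean and the average linear deviation reproduce the values 780 and 203.2 obtained in the text, and the variance equals the table total 23040000 divided by 400.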

However, this characteristic alone is not enough to study a random variable. Let's imagine two shooters shooting at a target. One shoots accurately and hits close to the center, while the other... is just having fun and doesn't even aim. But the funny thing is that his average result will be exactly the same as the first shooter's! This situation is conventionally illustrated by the following random variables:

The "sniper's" mathematical expectation is equal to zero; however, for the "interesting person" it is also zero!

Thus, there is a need to quantify how far the bullets (the values of the random variable) are scattered relative to the center of the target (the mathematical expectation). And scattering, translated from Latin, is none other than dispersion.

Let's see how this numerical characteristic is determined using one of the examples from the 1st part of the lesson:

There we found a disappointing mathematical expectation for this game, and now we have to calculate its variance, which is denoted by .

Let's find out how far the wins/losses are "scattered" relative to the average value. Obviously, for this we need to calculate the differences between the values of the random variable and its mathematical expectation:

-5 - (-0.5) = -4.5
2.5 - (-0.5) = 3
10 - (-0.5) = 10.5

It might seem that we now need to sum the results, but this approach does not work, because deviations to the left cancel out deviations to the right. For example, for the "amateur" shooter (example above) the differences will be , and when added they give zero, so we get no estimate of the scatter of his shooting.

To get around this problem, one can consider the absolute values of the differences, but for technical reasons the approach that has taken root is to square them. It is more convenient to lay out the solution in a table:

Here it is natural to calculate the weighted average of the squared deviations. And what is that? It is their expected value, which serves as a measure of scattering:

– this is the definition of variance. From the definition it is immediately clear that variance cannot be negative; take note for practice!

Let's recall how to find the expected value. Multiply the squared differences by the corresponding probabilities (continuation of the table):
– figuratively speaking, this is the "traction force",
and sum the results:

Doesn't the result seem too large compared to the winnings? That's right: we squared the deviations, and to return to the dimension of our game we need to take the square root. This quantity is called the mean square deviation and is denoted by the Greek letter "sigma":

This value is also called the standard deviation.

What is its meaning? If we deviate from the mathematical expectation to the left and right by the standard deviation:

– then the most probable values of the random variable will be "concentrated" on this interval. That is what we actually observe:

However, it so happens that when analyzing scattering one almost always operates with the concept of variance. Let's figure out what it means in relation to games. If in the case of the shooters we are talking about the "accuracy" of hits relative to the center of the target, then here variance characterizes two things:

Firstly, it is obvious that as the bets increase, the variance also increases. For example, if we increase the bets 10 times, then the mathematical expectation will increase 10 times, and the variance will increase 100 times (since it is a quadratic quantity). But note that the rules of the game themselves have not changed! Only the stakes have changed: roughly speaking, before we bet 10 rubles, and now it's 100.

The second, more interesting point is that variance characterizes the style of play. Mentally fix the bets at some level, and let's see what's what:

A low-variance game is a cautious game. The player tends to choose the most reliable schemes, where he does not lose or win too much at one time. For example, the red/black system in roulette (see Example 4 of the article Random variables).

A high-variance game. It is often called a dispersive game. This is an adventurous or aggressive style of play, where the player chooses "adrenaline" schemes. Recall at least the "Martingale", in which the amounts at stake are orders of magnitude greater than in the "quiet" game of the previous point.

The situation in poker is indicative: there are so-called tight players who tend to be cautious and "tremble" over their gaming funds (bankroll). Not surprisingly, their bankroll does not fluctuate significantly (low variance). Conversely, a player with high variance is an aggressor. He often takes risks, makes large bets, and can either win a huge pot or lose everything.

The same thing happens in Forex, and so on; there are plenty of examples.

Moreover, in all cases it does not matter whether the game is played for pennies or for thousands of dollars. Every level has its low- and high-variance players. And, as we remember, the expected value is "responsible" for the average winnings.

You probably noticed that finding variance is a long and painstaking process. But mathematics is generous:

Formula for finding variance

This formula is derived directly from the definition of variance, and we immediately put it to use. I'll copy the table with our game from above:

and the mathematical expectation we found.

Let's calculate the variance in the second way. First, let's find the mathematical expectation of the square of the random variable. By the definition of mathematical expectation:

In this case:

Thus, according to the formula:

As they say, feel the difference. And in practice, of course, it is better to use the formula (unless the condition requires otherwise).
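
Both ways of computing the variance can be compared in a few lines. This is a sketch under a stated assumption: the game's values -5, 2.5 and 10 and the expectation -0.5 come from the text, but the probabilities 0.5, 0.4 and 0.1 are an assumption, chosen only so that the expectation equals -0.5; the original table is not reproduced here. The shortcut used is the standard identity "expectation of the square minus the square of the expectation".

```python
# Values of the game from the text; the probabilities are ASSUMED
# (picked so that the expectation is -0.5, as stated in the text).
values = [-5.0, 2.5, 10.0]
probs = [0.5, 0.4, 0.1]

# Mathematical expectation
expectation = sum(x * p for x, p in zip(values, probs))

# Way 1: by definition, the expected squared deviation from the expectation
var_by_definition = sum((x - expectation) ** 2 * p for x, p in zip(values, probs))

# Way 2: expectation of the square minus the square of the expectation
expectation_of_square = sum(x * x * p for x, p in zip(values, probs))
var_by_formula = expectation_of_square - expectation ** 2

# Standard deviation: square root of the variance
std_dev = var_by_formula ** 0.5

print(f"expectation = {expectation:.2f}")
print(f"variance by definition = {var_by_definition:.2f}")
print(f"variance by formula    = {var_by_formula:.2f}")
print(f"standard deviation     = {std_dev:.2f}")
```

Both routes give the same number; the second needs only one extra row of squares instead of a row of deviations and a row of squared deviations.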

Let's master the technique of solving such problems and presenting the solution:

Example 6

Find its mathematical expectation, variance and standard deviation.

This task is found everywhere and, as a rule, comes without any meaningful context.
You can imagine several light bulbs with numbers that light up in a madhouse with certain probabilities :)

Solution: It is convenient to summarize the basic calculations in a table. First, we write the initial data in the top two lines. Then we calculate the products, then and finally the sums in the right column:

Actually, almost everything is ready. The third line shows a ready-made mathematical expectation: .

We calculate the variance using the formula:

And finally, the standard deviation:
– Personally, I usually round to 2 decimal places.

All calculations can be carried out on a calculator, or even better - in Excel:

It's hard to go wrong here :)

Answer:


A couple of tasks to solve on your own:

Example 7

Calculate the variance of the random variable in the previous example by definition.

And a similar example:

Example 8

A discrete random variable is specified by its distribution law:

Yes, the values of a random variable can be quite large (an example from real work), and here, if possible, use Excel. As, by the way, in Example 7: it's faster, more reliable and more enjoyable.

Solutions and answers at the bottom of the page.

At the end of the 2nd part of the lesson, we will look at one more typical task, one might even say a small puzzle:

Example 9

A discrete random variable can take only two values: and , and . The probability, mathematical expectation and variance are known.

Solution: Let's start with an unknown probability. Since a random variable can take only two values, the sum of the probabilities of the corresponding events is:

and since , then .

All that remains is to find..., it's easy to say :) But oh well, here we go. By definition of mathematical expectation:
– substitute known quantities:

– and nothing more can be squeezed out of this equation, except that you can rewrite it in the usual direction:

or:

I think you can guess the next steps. Let's compose and solve the system:

Decimals are, of course, an eyesore; multiply both equations by 10:

and divide by 2:

That's better. From the 1st equation we express:
(this is the easier way)– substitute into the 2nd equation:


We square and simplify:

Multiply by:

The result is a quadratic equation; we find its discriminant:
- Great!

and we get two solutions:

1) if , then ;

2) if , then .

The condition is satisfied by the first pair of values. With a high probability everything is correct, but, nevertheless, let’s write down the distribution law:

and perform a check, namely, find the expectation:

The main generalizing indicators of variation in statistics are variance and standard deviation.

Variance is the arithmetic mean of the squared deviations of each value of the characteristic from the overall mean. The variance is usually called the mean square of deviations and is denoted by σ². Depending on the source data, the variance can be calculated using the simple or weighted arithmetic mean:

$\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$ – unweighted (simple) variance;

$\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$ – weighted variance.

Standard deviation is a generalizing characteristic of the absolute size of the variation of a characteristic in the population. It is expressed in the same units of measurement as the attribute (in meters, tons, percent, hectares, etc.).

The standard deviation is the square root of the variance and is denoted by σ:

$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$ – unweighted standard deviation;

$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$ – weighted standard deviation.

The standard deviation is a measure of the reliability of the mean. The smaller the standard deviation, the better the arithmetic mean reflects the entire represented population.

The calculation of the standard deviation is preceded by the calculation of the variance.

The procedure for calculating the weighted variance is as follows:

1) determine the weighted arithmetic mean: $\bar{x} = \sum x_i f_i / \sum f_i$;

2) calculate the deviations of the variants from the mean: $x_i - \bar{x}$;

3) square the deviation of each variant from the mean: $(x_i - \bar{x})^2$;

4) multiply the squared deviations by the weights (frequencies): $(x_i - \bar{x})^2 f_i$;

5) sum the resulting products: $\sum (x_i - \bar{x})^2 f_i$;

6) divide the resulting sum by the sum of the weights: $\sigma^2 = \sum (x_i - \bar{x})^2 f_i / \sum f_i$.

Example 2.1

Let's calculate the weighted arithmetic mean:

The values of the deviations from the mean and their squares are presented in the table. Let's determine the variance:

The standard deviation will be equal to:

If the source data is presented in the form of an interval distribution series, then you first need to determine the discrete value of the attribute, and then apply the method described.

Example 2.2

Let us show the calculation of variance for an interval series using data on the distribution of the sown area of a collective farm according to wheat yield.

The arithmetic mean is:

Let's calculate the variance:

6.3. Calculation of variance using a formula based on individual data

The technique for calculating variance is laborious, and with large values of the variants and frequencies it can become unwieldy. The calculations can be simplified using the properties of variance.

Variance has the following properties.

1. Reducing or increasing the weights (frequencies) of a varying characteristic by a certain factor does not change the variance.

2. Decreasing or increasing each value of a characteristic by the same constant amount A does not change the variance.

3. Decreasing or increasing each value of a characteristic by a factor of k respectively reduces or increases the variance by a factor of k², and the standard deviation σ by a factor of k.

4. The variance of a characteristic relative to an arbitrary value A is always greater than the variance relative to the arithmetic mean, by the square of the difference between the mean and the arbitrary value:

$$\sigma_A^2 = \sigma^2 + (\bar{x} - A)^2.$$

If A = 0, then we arrive at the following equality:

$$\sigma^2 = \overline{x^2} - \bar{x}^2,$$

that is, the variance of the characteristic is equal to the difference between the mean square of the characteristic values and the square of the mean.

Each property can be used independently or in combination with others when calculating variance.
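
The four properties can be checked numerically. The sketch below is illustrative only: `weighted_variance` is a hypothetical helper written for this check, and the data are the deposit midpoints and frequencies from the Sberbank example earlier in the text.

```python
def weighted_variance(values, weights, about=None):
    """Weighted mean squared deviation about `about` (default: the weighted mean)."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(values, weights)) / total
    center = mean if about is None else about
    return sum((x - center) ** 2 * w for x, w in zip(values, weights)) / total

xs = [300, 500, 700, 900, 1100]
fs = [32, 56, 120, 104, 88]
mean = sum(x * f for x, f in zip(xs, fs)) / sum(fs)
base = weighted_variance(xs, fs)

# Property 1: dividing all frequencies by 4 leaves the variance unchanged
var_scaled_weights = weighted_variance(xs, [f / 4 for f in fs])

# Property 2: subtracting a constant A from every value leaves it unchanged
A = 700
var_shifted = weighted_variance([x - A for x in xs], fs)

# Property 3: dividing every value by k divides the variance by k squared
k = 200
var_divided = weighted_variance([x / k for x in xs], fs)

# Property 4: the variance about A exceeds the variance about the mean
# by the square of (mean - A)
var_about_a = weighted_variance(xs, fs, about=A)

print(base, var_scaled_weights, var_shifted)
print(var_divided * k ** 2)
print(var_about_a, base + (mean - A) ** 2)
```

Each printed pair or triple should agree up to floating-point rounding, which is exactly what the properties assert.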

The procedure for calculating variance is simple:

1) determine the arithmetic mean: $\bar{x} = \sum x_i / n$;

2) square the arithmetic mean: $\bar{x}^2$;

3) square each variant of the series: $x_i^2$;

4) find the sum of the squares of the variants: $\sum x_i^2$;

5) divide the sum of the squares of the variants by their number, i.e. determine the mean square: $\overline{x^2} = \sum x_i^2 / n$;

6) determine the difference between the mean square of the characteristic and the square of the mean: $\sigma^2 = \overline{x^2} - \bar{x}^2$.

Example 3.1 The following data is available on worker productivity:

Let's make the following calculations:

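Variance in statistics is defined as the mean of the squared deviations of the individual values of the characteristic from the arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

1. Simple variance (for ungrouped data) is calculated using the formula: $\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$, where n here is the number of observations.

2. Weighted variance (for variation series): $\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}$,

where n_i is the frequency (the number of times the value x_i occurs).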

An example of finding variance


Example 1. The following data is available for a group of 20 correspondence students. We need to build an interval distribution series of the characteristic, calculate the average value of the characteristic, and study its variance.

Let's build an interval grouping. Let's determine the width of the interval using the formula:

$$h = \frac{X_{\max} - X_{\min}}{n},$$

where X max is the maximum value of the grouping characteristic;
X min is the minimum value of the grouping characteristic;
n is the number of intervals.

We take n = 5. The step is: h = (192 - 159) / 5 = 6.6

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

X'i is the midpoint of the interval (for example, the midpoint of the interval 159 – 165.6 is 162.3).
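
The interval boundaries and midpoints follow mechanically from the minimum, the maximum and the number of intervals. The sketch below uses only the values stated in the example (159, 192, n = 5); the students' individual heights are not reproduced in the text, so no frequencies are computed, and the variable names are illustrative.

```python
x_min, x_max = 159, 192
n_intervals = 5

# Width (step) of each interval
h = (x_max - x_min) / n_intervals

# Boundaries of the intervals, rounded to one decimal place
edges = [round(x_min + k * h, 1) for k in range(n_intervals + 1)]

# Pair consecutive boundaries into intervals and take their midpoints
intervals = list(zip(edges[:-1], edges[1:]))
midpoints = [round((lo + hi) / 2, 1) for lo, hi in intervals]

for (lo, hi), mid in zip(intervals, midpoints):
    print(f"{lo} - {hi}: midpoint {mid}")
```

The first midpoint reproduces the value 162.3 quoted in the text for the interval 159 – 165.6.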

We determine the average height of students using the weighted arithmetic average formula:

Let's determine the variance using the formula:

The variance formula can be transformed as follows:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2.$$

From this formula it follows that the variance is equal to the difference between the mean of the squares of the variants and the square of their mean.

The variance in a variation series with equal intervals can be calculated by the method of moments, using the second property of variance (dividing all variants by the width of the interval). Calculating the variance by the method of moments with the following formula is less laborious:

$$\sigma^2 = i^2 \left(m_2 - m_1^2\right),$$

where i is the width of the interval;
A is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency;
$m_1^2$ is the square of the first-order moment, $m_1 = \sum \frac{x - A}{i} f \big/ \sum f$;
$m_2$ is the second-order moment, $m_2 = \sum \left(\frac{x - A}{i}\right)^2 f \big/ \sum f$.
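
As an illustration of the method of moments, assuming the usual form σ² = i²(m2 − m1²) with the variants coded as (x − A)/i, the sketch below applies it to the deposit data from the Sberbank example earlier in the text. The choice A = 700 follows the advice above (midpoint of the interval with the highest frequency, 120); the variable names are illustrative.

```python
# Deposit midpoints and frequencies from the Sberbank example
midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

i = 200  # width of the interval
A = 700  # conventional zero: midpoint of the interval with the highest frequency

n = sum(frequencies)

# Coded variants (x - A) / i become small integers: -2, -1, 0, 1, 2
coded = [(x - A) / i for x in midpoints]

# First- and second-order moments of the coded variants
m1 = sum(c * f for c, f in zip(coded, frequencies)) / n
m2 = sum(c * c * f for c, f in zip(coded, frequencies)) / n

# Variance and mean recovered from the moments
variance = i ** 2 * (m2 - m1 ** 2)
mean = A + i * m1

print(f"m1 = {m1}, m2 = {m2}")
print(f"mean = {mean:.1f}, variance = {variance:.1f}")
```

The arithmetic is done on the small coded numbers rather than on the original values, which is the whole point of the method when calculating by hand; the result agrees with the direct calculation for the same data.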

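The variance of an alternative characteristic (if in a statistical population a characteristic varies in such a way that there are only two mutually exclusive options, then such variability is called alternative) can be calculated using the formula:

$$\sigma^2 = p q,$$

where p is the share of units that possess the characteristic and q is the share of units that do not.

Substituting q = 1 - p into this variance formula, we obtain: $\sigma^2 = p (1 - p)$.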
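
A short derivation shows where the product of the two shares comes from. It assumes, for illustration, that the alternative characteristic is coded as 1 when present (share p) and 0 when absent (share q = 1 - p); this coding is an assumption and is not spelled out in the text.

$$\bar{x} = 1 \cdot p + 0 \cdot q = p,$$

$$\sigma^2 = (1 - p)^2 p + (0 - p)^2 q = q^2 p + p^2 q = p q (q + p) = p q = p (1 - p).$$

The variance is largest when p = 0.5, where it equals 0.25, and it falls to zero when all units belong to one option.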

Types of variance

Total variance measures the variation of a characteristic across the population as a whole under the influence of all the factors that cause this variation. It is equal to the mean square of the deviations of the individual values of a characteristic x from the overall mean $\bar{x}$ and can be defined as simple variance or weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to the influence of unaccounted factors and does not depend on the factor attribute on which the grouping is based. It is equal to the mean square of the deviations of the individual values of the attribute x within a group from the arithmetic mean of that group and can be calculated as simple variance or as weighted variance.

Thus, within-group variance measures the variation of a trait within a group and is determined by the formula:

$$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i},$$

where $\bar{x}_i$ is the group mean;
n_i is the number of units in the group.

For example, the within-group variances that need to be determined when studying the influence of workers' qualifications on the level of labor productivity in a workshop show the variation in output in each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.), except for differences in qualification category (within a group all workers have the same qualifications).

The average of the within-group variances reflects random variation, i.e. the part of the variation that occurred under the influence of all factors other than the grouping factor. It is calculated using the formula:

$$\overline{\sigma_i^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}.$$

Between-group variance characterizes the systematic variation of the resulting characteristic, which is due to the influence of the factor attribute on which the grouping is based. It is equal to the mean square of the deviations of the group means from the overall mean. Between-group variance is calculated using the formula:

$$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}.$$

The rule for adding variance in statistics

According to the rule for adding variances, the total variance is equal to the sum of the average of the within-group variances and the between-group variance:

$$\sigma^2 = \overline{\sigma_i^2} + \delta^2.$$

The meaning of this rule is that the total variance, which arises under the influence of all factors, is equal to the sum of the variance that arises under the influence of all factors other than the grouping factor and the variance that arises due to the grouping factor.

Using the rule for adding variances, you can determine the third, unknown variance from the two known ones, and also judge the strength of the influence of the grouping characteristic.

Dispersion properties

1. If all values of a characteristic are reduced (increased) by the same constant amount, then the variance will not change.
2. If all values of a characteristic are reduced (increased) by a factor of n, then the variance will correspondingly decrease (increase) by a factor of n^2.