Values obtained from experiment inevitably contain errors arising from a wide variety of causes. Among them one should distinguish systematic and random errors. Systematic errors are caused by factors that act in a definite way; they can always be eliminated or taken into account quite accurately. Random errors are caused by a very large number of individual factors that cannot be accounted for exactly and that act differently in each individual measurement. These errors cannot be excluded completely; they can only be taken into account on average, for which one must know the laws that govern them.

We will denote the measured quantity by A and the random error of a measurement by x. Since the error x can take any value, it is a continuous random variable, which is fully characterized by its distribution law.

The simplest law, and the one that most accurately reflects reality in the vast majority of cases, is the so-called normal law of error distribution:

φ(x) = (1/(σ√(2π))) · e^(−x²/(2σ²))

This distribution law can be derived from various theoretical premises, in particular from the requirement that the most probable value of an unknown quantity, for which a series of equally accurate values has been obtained by direct measurement, is the arithmetic mean of those values. The quantity σ² is called the dispersion (variance) of this normal law.

Average

Determination of the dispersion from experimental data. If for some quantity A, n values aᵢ are obtained by direct measurement with the same degree of accuracy, and if the errors of A obey the normal distribution law, then the most probable value of A is the arithmetic mean:

ā = (a₁ + a₂ + … + aₙ)/n = (1/n) Σ aᵢ

where

ā — arithmetic mean,

aᵢ — value measured at the i-th step.

The deviation of an observed value aᵢ of the quantity A from the arithmetic mean is aᵢ − ā.

To determine the dispersion of the normal error-distribution law from the data, the formula

σ² = Σ(aᵢ − ā)² / (n − 1)

is used, where

σ² — dispersion,
ā — arithmetic mean,
n — number of measurements of the parameter.
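As an illustration, the dispersion can be computed from a short series of measurements; the readings and the helper name below are illustrative assumptions, not taken from the text:

```python
# Dispersion (variance) of a series of equally accurate direct measurements,
# using the n - 1 denominator as in the formula above.
def dispersion(values):
    n = len(values)
    a_mean = sum(values) / n                      # arithmetic mean
    return sum((a - a_mean) ** 2 for a in values) / (n - 1)

measurements = [10.1, 9.9, 10.2, 10.0, 9.8]       # hypothetical readings
print(round(dispersion(measurements), 4))         # → 0.025
```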

Standard deviation

The standard deviation shows the absolute deviation of the measured values from the arithmetic mean. In accordance with the formula for the mean square error of a linear combination, the mean square error of the arithmetic mean is determined by the formula:

σ_ā = σ/√n = √( Σ(aᵢ − ā)² / (n(n − 1)) )

where

σ_ā — mean square error of the arithmetic mean,
ā — arithmetic mean,
n — number of measurements of the parameter,
aᵢ — value measured at the i-th step.
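A minimal sketch of the mean square error of the arithmetic mean, using the same hypothetical readings as above (the function name is an assumption):

```python
import math

# Mean square error of the arithmetic mean: sqrt(dispersion / n),
# i.e. sigma / sqrt(n).
def mean_square_error_of_mean(values):
    n = len(values)
    a_mean = sum(values) / n
    dispersion = sum((a - a_mean) ** 2 for a in values) / (n - 1)
    return math.sqrt(dispersion / n)

print(round(mean_square_error_of_mean([10.1, 9.9, 10.2, 10.0, 9.8]), 4))  # → 0.0707
```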

The coefficient of variation

The coefficient of variation characterizes the relative measure of the deviation of the measured values from the arithmetic mean:

V = (σ/ā) · 100 %

where

V — coefficient of variation,
σ — standard deviation,
ā — arithmetic mean.

The higher the coefficient of variation, the greater the relative scatter and the lower the uniformity of the values studied. If the coefficient of variation is less than 10 %, the variability of the series is considered insignificant; from 10 % to 20 %, medium; more than 20 % but less than 33 %, significant; and if the coefficient of variation exceeds 33 %, this indicates that the data are not homogeneous and that the largest and smallest values should be excluded.
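A sketch of this calculation on the same hypothetical readings (function name assumed):

```python
import math

# Coefficient of variation V = sigma / mean * 100 %.
def coefficient_of_variation(values):
    n = len(values)
    a_mean = sum(values) / n
    sigma = math.sqrt(sum((a - a_mean) ** 2 for a in values) / (n - 1))
    return sigma / a_mean * 100

v = coefficient_of_variation([10.1, 9.9, 10.2, 10.0, 9.8])
print(f"V = {v:.1f}%")  # → V = 1.6%  (below 10 %: variability is insignificant)
```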

Average linear deviation

One of the indicators of the range and intensity of variation is the average linear deviation (the mean absolute deviation) from the arithmetic mean. The average linear deviation is calculated by the formula:

d̄ = (1/n) Σ |aᵢ − ā|

where

d̄ — average linear deviation,
ā — arithmetic mean,
n — number of measurements of the parameter,
aᵢ — value measured at the i-th step.
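The same hypothetical readings give the following sketch (function name assumed):

```python
# Average linear deviation: the mean of the absolute deviations
# from the arithmetic mean.
def average_linear_deviation(values):
    n = len(values)
    a_mean = sum(values) / n
    return sum(abs(a - a_mean) for a in values) / n

print(round(average_linear_deviation([10.1, 9.9, 10.2, 10.0, 9.8]), 4))  # → 0.12
```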

To check whether the values studied conform to the normal distribution law, the ratio of the asymmetry (skewness) indicator to its error and the ratio of the kurtosis indicator to its error are used.

Asymmetry indicator

The asymmetry indicator (A) and its error (m_A) are calculated using the following formulas:

A = Σ(aᵢ − ā)³ / (nσ³),  m_A ≈ √(6/n)

where

A — asymmetry indicator,
σ — standard deviation,
ā — arithmetic mean,
n — number of measurements of the parameter,
aᵢ — value measured at the i-th step.

Kurtosis indicator

The kurtosis indicator (E) and its error (m_E) are calculated using the following formulas:

E = Σ(aᵢ − ā)⁴ / (nσ⁴) − 3,  m_E ≈ √(24/n)

where the notation is the same as above.

Geometric simple

To calculate the simple geometric mean, the formula is used:

ā_geom = ⁿ√(a₁ · a₂ · … · aₙ)

Geometric weighted

To determine the weighted geometric mean, the formula is used:

ā_geom = (a₁^f₁ · a₂^f₂ · … · aₙ^fₙ)^(1/(f₁ + … + fₙ))

where fᵢ are the weights (frequencies) of the individual values.

The average diameters of wheels and pipes and the average sides of squares are determined using the root mean square.

Root-mean-square values are also used to calculate certain indicators, for example the coefficient of variation characterizing the rhythm of production. Here the standard deviation from the planned output for a certain period is determined by the formula:

σ = √( Σ(xᵢ − x_plan)² / n )

where xᵢ is the actual output and x_plan the planned output for the i-th period.

These values accurately characterize the change of economic indicators relative to their base value, taken at its average.

Quadratic simple

The simple root mean square is calculated using the formula:

x_rms = √( Σ xᵢ² / n )

Quadratic weighted

The weighted root mean square is equal to:

x_rms = √( Σ xᵢ² fᵢ / Σ fᵢ )
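A sketch of the simple geometric mean and simple root mean square (function names and data are assumptions):

```python
import math

# Simple geometric mean and simple root mean square of a data set.
def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

def root_mean_square(values):
    return math.sqrt(sum(x ** 2 for x in values) / len(values))

data = [2, 8]                       # illustrative values
print(geometric_mean(data))         # → 4.0
print(root_mean_square(data))       # sqrt(34) ≈ 5.83
```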

Absolute indicators of variation include:

  • range of variation

  • average linear deviation

  • dispersion

  • standard deviation

Range of variation (R)

The range of variation is the difference between the maximum and minimum values of the attribute:

R = x_max − x_min

It shows the limits within which the value of a characteristic changes in the population being studied.

Example. The previous work experience of five applicants is 2, 3, 4, 7, and 9 years. Solution: range of variation R = 9 − 2 = 7 years.

For a generalized description of the differences in attribute values, average variation indicators are calculated on the basis of deviations from the arithmetic mean. The deviation from the average is taken as the difference x − x̄.

In this case, to prevent the sum of the deviations of the attribute values from the average turning to zero (the zero property of the average), one must either ignore the signs of the deviations, i.e. take the sum of their absolute values, or square the deviations.

Average linear and square deviation

The average linear deviation is the arithmetic average of the absolute deviations of the individual attribute values from the average.

The simple average linear deviation:

d̄ = Σ |xᵢ − x̄| / n

The previous work experience of five applicants is 2, 3, 4, 7, and 9 years.

In our example: x̄ = (2 + 3 + 4 + 7 + 9)/5 = 5 years; d̄ = (3 + 2 + 1 + 2 + 4)/5 = 12/5 = 2.4 years.

Answer: 2.4 years.
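The applicants example can be checked in a few lines of Python:

```python
experience = [2, 3, 4, 7, 9]           # years of previous work experience

r = max(experience) - min(experience)  # range of variation
mean = sum(experience) / len(experience)
d = sum(abs(x - mean) for x in experience) / len(experience)  # average linear deviation

print(r)     # → 7
print(mean)  # → 5.0
print(d)     # → 2.4
```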

The weighted average linear deviation is applied to grouped data:

d̄ = Σ |xᵢ − x̄| fᵢ / Σ fᵢ

Because of its conventional character, the average linear deviation is used in practice relatively rarely (in particular, to characterize the fulfilment of delivery obligations with respect to uniformity of supply, and in analyzing product quality with allowance for the technological features of production).

Standard deviation

The most complete characteristic of variation is the mean square deviation, also called the standard deviation. The standard deviation (σ) is equal to the square root of the mean of the squared deviations of the individual attribute values from the arithmetic mean:

The simple standard deviation:

σ = √( Σ(xᵢ − x̄)² / n )

The weighted standard deviation is applied to grouped data:

σ = √( Σ(xᵢ − x̄)² fᵢ / Σ fᵢ )

Under normal distribution conditions the following ratio holds between the standard deviation and the average linear deviation: σ ≈ 1.25 d̄.
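The ratio of about 1.25 (more precisely √(π/2) ≈ 1.2533) can be checked by simulation on normally distributed data; the sample size and seed below are arbitrary choices:

```python
import math
import random

# For normally distributed data, sigma / d (standard deviation to
# average linear deviation) should be close to sqrt(pi/2) ≈ 1.25.
random.seed(7)
sample = [random.gauss(0, 1) for _ in range(100_000)]

mean = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))
d = sum(abs(x - mean) for x in sample) / len(sample)

print(round(sigma / d, 3))
```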

The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

Expectation and variance

Let us measure a random variable N times; for example, we measure the wind speed ten times and want to find the average value. How is the average value related to the distribution function?

Let us roll a die a large number of times. The number of points that comes up on each throw is a random variable that can take any integer value from 1 to 6. The arithmetic mean of the points, computed over all throws, is also a random variable, but for large N it tends to a quite definite number, the mathematical expectation M_x. In this case M_x = 3.5.

How is this value obtained? Suppose that in N trials one point comes up N₁ times, two points N₂ times, and so on. Then the arithmetic mean is (1·N₁ + 2·N₂ + … + 6·N₆)/N. As N → ∞ the fraction of outcomes in which one point comes up tends to its probability, N₁/N → p₁ = 1/6, and similarly Nᵢ/N → pᵢ = 1/6. Hence M_x = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
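The convergence of the mean to 3.5 can be seen by simulation (the sample size and seed are arbitrary):

```python
import random

# Simulate many die throws: the mean of the points approaches M_x = 3.5.
random.seed(1)                      # fixed seed for reproducibility
N = 100_000
mean = sum(random.randint(1, 6) for _ in range(N)) / N
print(round(mean, 2))
```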


Let us now assume that we know the distribution law of the random variable x, i.e. that the random variable x can take values x₁, x₂, …, x_k with probabilities p₁, p₂, …, p_k.

The mathematical expectation M_x of the random variable x equals:

M_x = x₁p₁ + x₂p₂ + … + x_k·p_k

Answer: 2.8.

The mathematical expectation is not always a reasonable estimate of a random variable. For example, to estimate the average salary it is more reasonable to use the median, i.e. a value such that the number of people earning less than the median coincides with the number earning more.

The median of a random variable x is a number x₁/₂ such that p(x < x₁/₂) = 1/2.

In other words, the probability p₁ that the random variable x is less than x₁/₂ and the probability p₂ that it is greater than x₁/₂ are equal, each being 1/2. The median is not uniquely determined for every distribution.
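The salary example can be illustrated with the standard library; the figures below are invented for illustration:

```python
import statistics

# Hypothetical salaries: one large value drags the mean up,
# while the median stays representative of a typical earner.
salaries = [300, 320, 350, 380, 400, 420, 5000]

print(statistics.mean(salaries))    # far above a typical salary
print(statistics.median(salaries))  # → 380
```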

Let us return to the random variable x, which can take values x₁, x₂, …, x_k with probabilities p₁, p₂, …, p_k.

The variance of the random variable x is the mean value of the squared deviation of the random variable from its mathematical expectation:

D_x = Σ (xᵢ − M_x)² pᵢ

Example 2

Under the conditions of the previous example, calculate the variance and standard deviation of the random variable x.

Answer: 0.16 and 0.4.


Example 3

Find the probability distribution of the number of points that appear on a die on the first throw, as well as the median, mathematical expectation, variance, and standard deviation.

Each face is equally likely, so the distribution is pᵢ = 1/6 for i = 1, …, 6. Then M_x = 3.5; the median is any number between 3 and 4 (often taken as 3.5); the variance D_x = Σ(i − 3.5)²/6 = 35/12 ≈ 2.92; and the standard deviation σ = √D_x ≈ 1.71. It can be seen that the deviation of the values from the mean is quite large.

Properties of mathematical expectation:

  • The mathematical expectation of the sum of random variables is equal to the sum of their mathematical expectations:

M(x + y) = M(x) + M(y)

  • The mathematical expectation of the product of independent random variables is equal to the product of their mathematical expectations:

M(xy) = M(x) · M(y)

Example 4

Find the mathematical expectation of the sum and product of points rolled on two dice.

In Example 3 we found that for one die M(x) = 3.5. So for two dice M(x + y) = 3.5 + 3.5 = 7, and M(xy) = 3.5 · 3.5 = 12.25.

Properties of the variance:

  • The variance of the sum of independent random variables is equal to the sum of the variances:

D(x + y) = D(x) + D(y).

Let y be the number of points rolled in N throws of a die. Then y = x₁ + x₂ + … + x_N, where the xᵢ are independent and identically distributed, so D(y) = N·D(x). The variance of the arithmetic mean y/N is therefore D(x)/N, and its standard deviation is σ/√N.

This result is true not only for die throws. In many cases it determines the accuracy of measuring the mathematical expectation empirically: as the number of measurements N grows, the spread of values around the average, i.e. the standard deviation of the mean, decreases in proportion to 1/√N.

The variance of a random variable is related to the mathematical expectation of the square of the random variable by the following relation:

D_x = M(x²) − (M_x)²

Indeed, by definition D_x = M((x − M_x)²). Expanding the square gives (x − M_x)² = x² − 2x·M_x + (M_x)². Taking the mathematical expectation of the right side and using the properties of mathematical expectation, we get M(x²) − 2M_x·M(x) + (M_x)² = M(x²) − (M_x)².
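The relation can be checked exactly for the fair die using rational arithmetic:

```python
from fractions import Fraction

# For a fair die: verify D = M(x^2) - M(x)^2 exactly with fractions.
values = range(1, 7)
p = Fraction(1, 6)

m = sum(x * p for x in values)        # M(x) = 7/2
m2 = sum(x * x * p for x in values)   # M(x^2) = 91/6
d = m2 - m ** 2

print(m, d)   # → 7/2 35/12
```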

Standard deviation

The standard deviation is equal to the square root of the variance:

σ = √D_x

When determining the standard deviation for a sufficiently large volume of the population being studied (n > 30), the formula

σ = √( Σ(xᵢ − x̄)² / n )

is used; for small samples the denominator n − 1 is used instead.



where xᵢ are the random (current) values; the average value of the random variable over the sample is calculated by the formula:

x̄ = (1/n) Σ xᵢ

So, the variance is the average square of the deviations. That is, first the average value is calculated, then the difference between each original value and the average is taken and squared, the squares are summed, and the sum is divided by the number of values in the population.

The difference between an individual value and the average reflects the measure of deviation. It is squared so that all deviations become exclusively positive numbers and so that positive and negative deviations do not cancel each other when summed. Then, given the squared deviations, we simply calculate their arithmetic mean.

The answer to the magic word “dispersion” lies in just these three words: average - square - deviations.

Standard deviation (MSD)

Taking the square root of the variance, we obtain the so-called standard deviation. It is also called the root mean square deviation or "sigma" (from the Greek letter σ). The formula for the standard deviation is:

σ = √( Σ(xᵢ − x̄)² / n )

So, the dispersion is sigma squared, i.e. the standard deviation squared.

The standard deviation obviously also characterizes the spread of the data, but now (unlike the dispersion) it can be compared with the original data, since they have the same units of measurement (this is clear from the calculation formula). The standard deviation, as a measure of uncertainty, is also involved in many statistical calculations. With its help, the degree of accuracy of various estimates and forecasts is determined. If the variation is very large, the standard deviation will also be large, and the forecast will therefore be inaccurate, which is expressed, for example, in very wide confidence intervals.

Therefore, in methods of statistical data processing in real estate appraisal, depending on the required accuracy of the task, the two-sigma or three-sigma rule is used.

To compare the two-sigma rule and the three-sigma rule, we use Laplace's formula:

P(α < X < β) = Φ((β − a)/s) − Φ((α − a)/s)

where

Φ(x) — the Laplace function;
α — minimum value;
β — maximum value;
s — sigma value (standard deviation);
a — average value.

In this case a particular form of Laplace's formula is used, in which the boundaries α and β of the values of the random variable X are equidistant from the center of the distribution a = M(X) by some amount d: α = a − d, β = a + d. Then

P(|X − a| < d) = 2Φ(d/s).   (1)

Formula (1) determines the probability of a given deviation d of a random variable X with the normal distribution law from its mathematical expectation M(X) = a. Substituting d = 2s and then d = 3s into formula (1), we obtain:

P(|X − a| < 2s) = 2Φ(2) ≈ 0.954,   (2)

P(|X − a| < 3s) = 2Φ(3) ≈ 0.9973.   (3)

Two sigma rule

It can be asserted almost with certainty (with a confidence probability of 0.954) that all values of a random variable X with a normal distribution law deviate from its mathematical expectation M(X) = a by no more than 2s (two standard deviations). The confidence probability (Pd) is the probability of events that are conventionally accepted as certain (their probability is close to 1).

Let us illustrate the two-sigma rule geometrically. Figure 6 shows a Gaussian curve with distribution center a. The area bounded by the whole curve and the Ox axis equals 1 (100 %), while the area of the curvilinear trapezoid between the abscissas a − 2s and a + 2s equals, according to the two-sigma rule, 0.954 (95.4 % of the total area). The area of the shaded regions equals 1 − 0.954 = 0.046 (≈ 5 % of the total area). These regions are called the critical region of values of the random variable. Values of the random variable that fall into the critical region are unlikely and in practice are conventionally treated as impossible.

The probability of conditionally impossible values is called the significance level of the random variable. The significance level is related to the confidence probability by the formula:

q = (1 − Pd) · 100 %

where q is the significance level expressed as a percentage.

Three sigma rule

When solving problems that require greater reliability, the confidence probability (Pd) is taken equal to 0.997 (more precisely, 0.9973), and instead of the two-sigma rule, formula (3) gives the three-sigma rule:

P(|X − a| < 3s) ≈ 0.9973.
According to the three-sigma rule, at a confidence probability of 0.9973 the critical region is the region of attribute values outside the interval (a − 3s, a + 3s). The significance level is 0.27 %.

In other words, the probability that the absolute value of the deviation exceeds three standard deviations is very small, namely 0.0027 = 1 − 0.9973; this happens in only 0.27 % of cases. Such events, by the principle of the impossibility of unlikely events, can be considered practically impossible, i.e. the sample is highly accurate.

This is the essence of the three sigma rule:

If a random variable is distributed normally, then the absolute value of its deviation from the mathematical expectation does not exceed three times the standard deviation.

In practice, the three-sigma rule is applied as follows: if the distribution of the random variable being studied is unknown, but the condition specified in the above rule is met, then there is reason to assume that the variable being studied is normally distributed; otherwise it is not normally distributed.

The level of significance is taken depending on the permitted degree of risk and the task at hand. For real estate valuation, a less precise sample is usually adopted, following the two-sigma rule.

Standard deviation (synonyms: root mean square deviation, square deviation; related term: standard spread) — in probability theory and statistics, the most common measure of the dispersion of the values of a random variable relative to its mathematical expectation. For limited arrays of sample values, the arithmetic mean of the sample is used instead of the mathematical expectation.


The standard deviation is measured in the units of the random variable itself and is used in calculating the standard error of the arithmetic mean, in constructing confidence intervals, in statistical hypothesis testing, and in measuring the linear relationship between random variables. It is defined as the square root of the variance of the random variable.

Standard deviation s (an estimate of the standard deviation of the random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance):

s = √( (n/(n − 1)) σ² ) = √( (1/(n − 1)) Σ (xᵢ − x̄)² ), the sum running over i = 1, …, n.
• Note: the names RMSD (root mean square deviation) and STD (standard deviation) are often confused with each other's formulas. For example, in the NumPy module of the Python programming language, the std() function is described as "standard deviation", while its formula reflects the root mean square deviation (division by n under the root). In Excel, the STDEV() function is different (division by n − 1 under the root).

Root mean square deviation σ:

σ = √( (1/n) Σ (xᵢ − x̄)² ), the sum running over i = 1, …, n

where σ² — the dispersion; xᵢ — the i-th element of the sample; n — the sample size; x̄ — the arithmetic mean of the sample:

x̄ = (1/n) Σ xᵢ = (1/n)(x₁ + … + xₙ).

It should be noted that both estimates are biased. In the general case it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.

In accordance with GOST R 8.736-2011, the standard deviation is calculated using the corrected formula with n − 1 in the denominator (the formula for s above).
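The discrepancy described in the note can be seen with Python's standard statistics module: pstdev divides by n (as NumPy's std does by default), while stdev divides by n − 1 (as Excel's STDEV does); the data set is illustrative:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population formula: division by n under the root.
print(statistics.pstdev(data))  # → 2.0

# Corrected formula: division by n - 1 under the root.
print(statistics.stdev(data))   # sqrt(32/7) ≈ 2.14
```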

Three sigma rule

Three sigma rule (3σ): almost all values of a normally distributed random variable lie in the interval (x̄ − 3σ; x̄ + 3σ). More strictly, with probability approximately 0.9973 the value of a normally distributed random variable lies in this interval (provided that the value x̄ is the true one and was not obtained as a result of sample processing).

If the true value x̄ is unknown, then one should use s rather than σ, and the three sigma rule thus becomes the three s rule.

Interpretation of the standard deviation value

A larger value of the standard deviation shows a greater spread of the values in the set around its average value; a smaller value, accordingly, shows that the values in the set are grouped around the average.

For example, consider three number sets: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean of 7 and standard deviations of 7, 5 and 1, respectively. The last set has a small standard deviation, since its values are grouped around the mean; the first set has the largest standard deviation, since its values diverge greatly from the mean.
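The three sets can be checked directly with the standard library:

```python
import statistics

# Three sets with the same mean 7 but different spread.
sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

for s in sets:
    print(statistics.mean(s), statistics.pstdev(s))  # means 7; pstdev 7, 5, 1
```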

In a general sense, the standard deviation can be considered a measure of uncertainty. For example, in physics the standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study in comparison with the value predicted by theory: if the average value of the measurements differs greatly from the theoretical prediction (large standard deviation), then the values obtained or the method of obtaining them should be rechecked. In finance, the standard deviation of portfolio returns is identified with portfolio risk.

Climate

Suppose there are two cities with the same average daily maximum temperature, but one is located on the coast and the other inland. Coastal cities are known to have a smaller spread of daily maximum temperatures than inland cities. Therefore, the standard deviation of the daily maximum temperature for the coastal city will be smaller than for the second city, even though their average values are the same; in practice this means that the probability that the maximum air temperature on any given day of the year will differ strongly from the average is higher for the city located inside the continent.

Sport

Suppose there are several football teams that are rated by some set of parameters, for example the number of goals scored and conceded, scoring chances, etc. Most likely, the best team of the group will have the best values in more of these parameters. The smaller the team's standard deviation for each of the parameters, the more predictable the team's result; such teams are balanced. Conversely, for a team with a large standard deviation the result is difficult to predict, which in turn is explained by imbalance, for example a strong defense but a weak attack.

Using the standard deviations of team parameters makes it possible, to one degree or another, to predict the result of a match between two teams by assessing the teams' strengths and weaknesses, and hence the methods of play to choose.