There are two types of estimates in statistics: point and interval. A point estimate is a single sample statistic used to estimate a population parameter. For example, the sample mean $\bar{X}$ is a point estimate of the population mean (mathematical expectation) $\mu$, and the sample variance $S^2$ is a point estimate of the population variance $\sigma^2$. It has been shown that the sample mean is an unbiased estimate of the population mean: the average of all possible sample means (for the same sample size n) is equal to the population mean.

For the sample variance $S^2$ to be an unbiased estimate of the population variance $\sigma^2$, the denominator of the sample variance must be n - 1 rather than n. In other words, the population variance is then the average of all possible sample variances.
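The effect of the n - 1 denominator can be checked on a toy case. The sketch below is only an illustration: the four-element population and the sample size of 2 are hypothetical, and samples are drawn with replacement so that every draw is independent.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical toy population, chosen only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible ordered sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Average of the sample variances computed with denominator n - 1.
avg_unbiased = mean(variance(s) for s in samples)
# Average of the sample variances computed with denominator n.
avg_biased = mean(pvariance(s) for s in samples)

population_variance = pvariance(population)
print(population_variance, avg_unbiased, avg_biased)
```

With the n - 1 denominator the average of the sample variances reproduces the population variance, while the n denominator underestimates it.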

When estimating population parameters, keep in mind that sample statistics such as $\bar{X}$ depend on the specific sample drawn. To take this into account, an interval estimate of the population mean is obtained by analyzing the distribution of sample means. The constructed interval is characterized by a confidence level, which represents the probability that an interval built in this way contains the true population parameter. Similar confidence intervals can be used to estimate the proportion p of a characteristic and the main distributed mass of the population.


Constructing a confidence interval for the mathematical expectation of the population with a known standard deviation

Constructing a confidence interval for the share of a characteristic in the population

This section extends the concept of the confidence interval to categorical data. This allows us to estimate the proportion p of a characteristic in the population using the sample proportion $p_S = X/n$. If the quantities np and n(1 - p) both exceed 5, the binomial distribution can be approximated by the normal distribution. Therefore, to estimate the proportion p of a characteristic in the population, we can construct an interval whose confidence level is (1 - α)×100%:

$$p_S - Z\sqrt{\frac{p_S(1-p_S)}{n}} \le p \le p_S + Z\sqrt{\frac{p_S(1-p_S)}{n}}$$

where $p_S$ is the sample proportion of the characteristic, equal to X/n, i.e. the number of successes divided by the sample size; p is the proportion of the characteristic in the general population; Z is the critical value of the standardized normal distribution; n is the sample size.

Example 3. Suppose that a sample of 100 invoices completed during the last month is extracted from the information system, and that 10 of these invoices contain errors. Thus, $p_S$ = 10/100 = 0.1. The 95% confidence level corresponds to the critical value Z = 1.96.

$$0.1 \pm 1.96\sqrt{\frac{0.1 \cdot 0.9}{100}} = 0.1 \pm 0.0588$$

Thus, with 95% confidence, between 4.12% and 15.88% of invoices contain errors.
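The calculation in Example 3 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `proportion_ci` is an assumption made for illustration and does not come from the source.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_s = successes / n
    # Two-sided critical value Z of the standardized normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_s * (1 - p_s) / n)
    return p_s - half_width, p_s + half_width


# 10 erroneous invoices in a sample of 100, as in Example 3.
low, high = proportion_ci(10, 100)
print(f"{low:.4f} .. {high:.4f}")
```

The approximation is only appropriate when np and n(1 - p) both exceed 5, as stated above.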

For a given sample size, the confidence interval for the proportion of a characteristic in the population appears wider than the interval for a continuous random variable. This is because measurements of a continuous random variable contain more information than measurements of categorical data. In other words, categorical data that take only two values carry less information for estimating the parameters of their distribution.

Calculating estimates from samples drawn from a finite population

Estimation of the mean. The finite population correction factor (fpc) reduces the standard error by a factor of $\sqrt{(N-n)/(N-1)}$, where N is the population size and n is the sample size. When calculating confidence intervals for population parameter estimates, the correction factor is applied when samples are drawn without replacement. Thus, a confidence interval for the mean with a confidence level of (1 - α)×100% is calculated by the formula:

$$\bar{X} \pm t_{n-1}\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \qquad (6)$$

Example 4. To illustrate the use of the finite population correction factor, let us return to the problem of calculating the confidence interval for the average invoice amount discussed above. Suppose that a company issues 5,000 invoices per month, and $\bar{X}$ = 110.27 dollars, S = 28.95 dollars, N = 5000, n = 100, α = 0.05, $t_{99}$ = 1.9842. Using formula (6) we obtain:

$$110.27 \pm 1.9842 \cdot \frac{28.95}{\sqrt{100}}\sqrt{\frac{5000-100}{5000-1}} = 110.27 \pm 5.69$$

that is, an interval from 104.58 to 115.96 dollars.
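As a check on Example 4, the sketch below applies the finite population correction to the figures given in the text. The function name `mean_ci_fpc` is hypothetical, and the critical value t is passed in as given in the example rather than computed.

```python
from math import sqrt


def mean_ci_fpc(sample_mean, s, n, population_size, t):
    """Confidence interval for the mean with the finite population correction."""
    standard_error = s / sqrt(n)
    # The correction factor shrinks the standard error when sampling
    # without replacement from a population of known, finite size.
    fpc = sqrt((population_size - n) / (population_size - 1))
    half_width = t * standard_error * fpc
    return sample_mean - half_width, sample_mean + half_width


# Values from Example 4; t = 1.9842 is the tabulated value for 99 degrees of freedom.
low, high = mean_ci_fpc(110.27, 28.95, 100, 5000, 1.9842)
print(f"{low:.2f} .. {high:.2f}")
```

Because only 2% of the population is sampled here, the correction narrows the interval only slightly.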

Estimation of the proportion of a characteristic. When sampling without replacement, the confidence interval for the proportion of the characteristic with a confidence level of (1 - α)×100% is calculated by the formula:

$$p_S \pm Z\sqrt{\frac{p_S(1-p_S)}{n}}\sqrt{\frac{N-n}{N-1}}$$

Confidence intervals and ethical issues

When sampling a population and drawing statistical conclusions, ethical issues often arise. The main one is how confidence intervals and point estimates of sample statistics are reported together. Publishing point estimates without the associated confidence intervals (usually at the 95% confidence level) and the sample size from which they are derived can create confusion. It may give the reader the impression that the point estimate is all that is needed to predict the properties of the entire population. It is therefore necessary to understand that in any study the focus should be on interval estimates rather than point estimates. In addition, special attention should be paid to the correct choice of sample size.

Most often, the objects of statistical manipulation are the results of public opinion polls on political issues. The poll results are published on the front pages of newspapers, while the sampling error and the methodology of the statistical analysis are printed somewhere in the middle. To demonstrate the validity of the point estimates obtained, it is necessary to state the sample size on which they are based, the limits of the confidence interval, and its confidence level.


Materials from the book Levin et al. Statistics for Managers are used. – M.: Williams, 2004. – p. 448–462

The central limit theorem states that, for a sufficiently large sample size, the sampling distribution of the mean can be approximated by a normal distribution. This property does not depend on the type of distribution of the population.

Intelligence consists not only in knowledge, but also in the ability to apply knowledge in practice. (Aristotle)

Confidence intervals

General overview

By taking a sample from the population, we obtain a point estimate of the parameter of interest and calculate the standard error to indicate the precision of the estimate.

However, in most cases the standard error on its own is not sufficient. It is much more useful to combine this measure of precision with an interval estimate for the population parameter.

This can be done by using knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI) for the parameter.

In general, a confidence interval extends the estimate in both directions by a certain multiple of the standard error (of the given parameter); the two values (confidence limits) that define the interval are usually separated by a comma and enclosed in parentheses.

Confidence interval for the mean

Using the normal distribution

The sample mean is approximately normally distributed if the sample size is large, so knowledge of the normal distribution can be applied to the sample mean.

Specifically, 95% of the distribution of sample means lies within 1.96 standard deviations (SD) of the population mean.

When we have only one sample, this standard deviation is called the standard error of the mean (SEM), and the 95% confidence interval for the mean is calculated as follows:

$$\bar{x} \pm 1.96 \times \text{SEM}, \qquad \text{SEM} = \frac{s}{\sqrt{n}}$$

If we repeat this experiment several times, the interval will contain the true population mean 95% of the time.
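The repeated-sampling statement can be illustrated by simulation. The sketch below is not part of the source: the population mean and standard deviation, the sample size, and the number of repetitions are all hypothetical values chosen for illustration, and the random seed is fixed only to make the run repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)     # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0     # hypothetical population mean and SD
N, REPEATS = 100, 2000     # hypothetical sample size and number of experiments

hits = 0
for _ in range(REPEATS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    centre = mean(sample)
    sem = stdev(sample) / sqrt(N)  # standard error of the mean
    # Does this sample's 95% interval contain the true mean?
    if centre - 1.96 * sem <= MU <= centre + 1.96 * sem:
        hits += 1

coverage = hits / REPEATS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should be close to 0.95; it is the procedure, not any single interval, that has the 95% property.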

This interval is usually interpreted as the range of values within which the true population mean (general mean) lies with 95% confidence.

Although this interpretation is not entirely rigorous (the population mean is a fixed value and therefore cannot have a probability attached to it), it is conceptually easier to understand.

Using the t-distribution

Strictly, the normal distribution can be used only if the population variance is known. In addition, when the sample size is small, the sample mean follows a normal distribution only if the underlying population data are normally distributed.

If the underlying population data are not normally distributed and/or the population variance is unknown, the sample mean is treated as following Student's t-distribution.

We calculate the 95% confidence interval for the population mean as follows:

$$\bar{x} \pm t_{0.05} \times \frac{s}{\sqrt{n}}$$

where $t_{0.05}$ is the percentage point (percentile) of Student's t-distribution with (n-1) degrees of freedom that gives a two-sided probability of 0.05.

In general, this gives a wider interval than the normal distribution, because it takes into account the additional uncertainty introduced by estimating the population standard deviation and/or by the small sample size.

When the sample size is large (on the order of 100 or more), the difference between the two distributions (Student's t and normal) is negligible. Nevertheless, the t-distribution is always used when calculating confidence intervals, even if the sample size is large.

Typically the 95% CI is reported. Other confidence intervals can be calculated, such as the 99% CI for the mean.

Instead of multiplying the standard error by the tabulated value of the t-distribution that corresponds to a two-sided probability of 0.05, multiply it by the value that corresponds to a two-sided probability of 0.01. The result is a wider interval than the 95% confidence interval, because it reflects increased confidence that the interval actually includes the population mean.

Confidence interval for proportion

The sampling distribution of a proportion is binomial. However, if the sample size n is reasonably large, the sampling distribution of the proportion is approximately normal with mean $\pi$, the population proportion.

We estimate $\pi$ by the sample proportion p = r/n (where r is the number of individuals in the sample with the characteristic of interest), and the standard error is estimated as:

$$SE(p) = \sqrt{\frac{p(1-p)}{n}}$$

The 95% confidence interval for the proportion is estimated as:

$$p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}$$

If the sample size is small (usually when np or n(1-p) is less than 5), the binomial distribution must be used to calculate exact confidence intervals.
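One common way to obtain such an exact interval is the Clopper-Pearson construction. The source does not name a specific method, so the sketch below is an assumption: it finds the two limits by bisection on the binomial distribution, using only the standard library, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, lo=0.0, hi=1.0, iterations=100):
    """Bisection for the root of a function that decreases on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return (lo + hi) / 2


def exact_proportion_ci(r, n, confidence=0.95):
    """Clopper-Pearson interval for r successes in n trials."""
    tail = (1 - confidence) / 2
    # Lower limit: the p at which P(X >= r) equals the tail probability.
    lower = 0.0 if r == 0 else solve_decreasing(
        lambda p: tail - (1 - binom_cdf(r - 1, n, p)))
    # Upper limit: the p at which P(X <= r) equals the tail probability.
    upper = 1.0 if r == n else solve_decreasing(
        lambda p: binom_cdf(r, n, p) - tail)
    return lower, upper


# Hypothetical small sample: 2 individuals with the characteristic out of 10.
low, high = exact_proportion_ci(2, 10)
print(f"{low:.4f} .. {high:.4f}")
```

For 2 out of 10 the exact interval is strongly asymmetric around p = 0.2, which the normal approximation cannot reproduce.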

Note that if p is expressed as a percentage, then (1-p) is replaced by (100-p).

Interpretation of confidence intervals

When interpreting a confidence interval, we are interested in the following questions:

How wide is the confidence interval?

A wide confidence interval indicates that the estimate is imprecise; narrow indicates an accurate estimate.

The width of the confidence interval depends on the size of the standard error, which in turn depends on the sample size and, for a numerical variable, on the variability of the data. Studies of small samples with highly variable data therefore produce wider confidence intervals than studies of large samples with little variability.

Does the CI include any values of particular interest?

You can check whether a hypothesized value for a population parameter falls within the confidence interval. If so, the results are consistent with this value. If not, it is unlikely (for a 95% confidence interval the chance is at most 5%) that the parameter has that value.

Suppose we have a large number of objects whose characteristics are normally distributed (for example, a full warehouse of vegetables of the same type that vary in size and weight). You want to know the average characteristics of the entire batch, but you have neither the time nor the desire to measure and weigh each vegetable, and you understand that this is not necessary. But how many items should be taken for a spot check?

Before giving several formulas useful for this situation, let us recall some notation.

First, if we did measure the entire warehouse of vegetables (this set of elements is called the general population), we would know the average weight of the entire batch with all the accuracy available to us. Let us call this average $\bar{X}_{gen}$, the general (population) mean. We already know that a normal distribution is completely determined if its mean and standard deviation $\sigma$ are known. However, for now we know neither $\bar{X}_{gen}$ nor $\sigma$ of the general population. We can only take a sample, measure the values we need, and calculate for this sample both the mean $\bar{X}$ and the standard deviation S.

It is known that if the sample contains a large number of elements (usually n greater than 30) and they are chosen truly at random, then $\sigma$ of the general population will hardly differ from S of the sample.

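In addition, for the case of a normal distribution we can use the following formulas:

With a probability of 95%

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X}_{gen} < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$

With a probability of 99%

$$\bar{X} - 2.58\frac{\sigma}{\sqrt{n}} < \bar{X}_{gen} < \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}$$

In general form, with probability P(t)

$$\bar{X} - t\frac{\sigma}{\sqrt{n}} < \bar{X}_{gen} < \bar{X} + t\frac{\sigma}{\sqrt{n}}$$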
The relationship between the t value and the probability value P (t), with which we want to know the confidence interval, can be taken from the following table:


Thus, we have determined in which range the average value for the population lies (with a given probability).
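For the normal case, the multiplier t that corresponds to a chosen probability P(t) can also be computed rather than looked up. A minimal sketch using Python's standard library follows; the function name `t_for_probability` is an assumption made for illustration.

```python
from statistics import NormalDist


def t_for_probability(p):
    """Multiplier t such that the central share p of a normal distribution
    lies within t standard deviations of its mean."""
    return NormalDist().inv_cdf((1 + p) / 2)


for p in (0.90, 0.95, 0.99, 0.999):
    print(p, round(t_for_probability(p), 2))
```

The values for 0.95 and 0.99 agree with the multipliers 1.96 and 2.58 used for the normal case.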

If the sample is not large enough, we cannot assume that $\sigma$ of the population equals S of the sample. Moreover, in this case the closeness of the sample to a normal distribution is questionable. We still use S in place of $\sigma$ in the formula:

$$\bar{X} - t\frac{S}{\sqrt{n}} < \bar{X}_{gen} < \bar{X} + t\frac{S}{\sqrt{n}}$$

but the value of t for a fixed probability P(t) now depends on the number of elements n in the sample. The larger n, the closer the resulting confidence interval is to the one given by formula (1). The t values in this case are taken from another table (Student's t-distribution), which we present below:

Student's t values for probability 0.95 and 0.99


Example 3. 30 people were randomly selected from the company's employees. According to the sample, it turned out that the average salary (per month) is 30 thousand rubles with a standard deviation of 5 thousand rubles. Determine the average salary in the company with a probability of 0.99.

Solution: By the conditions we have n = 30, $\bar{X}$ = 30000, S = 5000, P = 0.99. To find the confidence interval, we use the formula based on Student's t. From the table for n = 30 and P = 0.99 we find t = 2.756, therefore

$$30000 - 2.756\frac{5000}{\sqrt{30}} < \bar{X}_{gen} < 30000 + 2.756\frac{5000}{\sqrt{30}}$$

i.e. the required confidence interval is 27484 < $\bar{X}_{gen}$ < 32516.

So, with a probability of 0.99 we can say that the interval (27484; 32516) contains the average salary in the company.
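The arithmetic of the salary example can be verified with a short script. This is a sketch only: the function name `mean_ci` is hypothetical, and the value t = 2.756 is taken from the example rather than computed.

```python
from math import sqrt


def mean_ci(sample_mean, s, n, t):
    """Confidence interval for the mean: sample_mean +/- t * s / sqrt(n)."""
    half_width = t * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# n = 30 employees, mean 30000, S = 5000, t = 2.756 for P = 0.99.
low, high = mean_ci(30000, 5000, 30, 2.756)
print(round(low), round(high))
```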

We hope that you will use this method; you do not need to have a table with you every time. The calculations can be carried out automatically in Excel. In the Excel file, click the fx button in the top menu. Then select the "Statistical" category among the functions and choose TINV (СТЬЮДРАСПОБР in the Russian version of Excel) from the list. At the prompt, place the cursor in the "Probability" field and enter the complementary probability (in our case, instead of the probability 0.95, type 0.05). Apparently, the spreadsheet is designed so that the result answers the question of how likely we are to be wrong. Similarly, in the "Deg_freedom" field, enter the value (n-1) for your sample.

Objective: to teach students algorithms for calculating confidence intervals of statistical parameters.

When statistically processing data, the calculated arithmetic mean, coefficient of variation, correlation coefficient, difference criteria and other point statistics should receive quantitative confidence limits, which indicate possible fluctuations of the indicator in smaller and larger directions within the confidence interval.

Example 3.1. The distribution of calcium in the blood serum of monkeys, as previously established, is characterized by the following sample indicators: $\bar{x}$ = 11.94 mg%; $s_{\bar{x}}$ = 0.127 mg%; n = 100. It is required to determine the confidence interval for the general mean ($\mu$) with confidence probability P = 0.95.

The general mean lies with a given probability in the interval:

$$\bar{x} - t s_{\bar{x}} \le \mu \le \bar{x} + t s_{\bar{x}}$$

where $\bar{x}$ is the sample arithmetic mean; t is Student's t value; $s_{\bar{x}}$ is the error of the arithmetic mean.

Using the table "Student's t-test values" we find the value of t for a confidence probability of 0.95 and k = 100 - 1 = 99 degrees of freedom. It is equal to 1.984. Together with the values of the arithmetic mean and its statistical error, we substitute it into the formula:

$$11.94 - 1.984 \times 0.127 \le \mu \le 11.94 + 1.984 \times 0.127$$

or $11.69 \le \mu \le 12.19$

Thus, with a probability of 95%, it can be stated that the general mean of this normal distribution is between 11.69 and 12.19 mg%.

Example 3.2. Determine the limits of the 95% confidence interval for the general variance ($\sigma^2$) of the distribution of calcium in the blood of monkeys, if it is known that $s^2$ = 1.60, with n = 100.

To solve the problem you can use the following formula:

$$s^2 - t s_{s^2} \le \sigma^2 \le s^2 + t s_{s^2}$$

where $s_{s^2}$ is the statistical error of the variance.

We first find the sampling error of the variance; it is equal to 0.11. The value of t for a confidence probability of 0.95 and k = 100 - 1 = 99 degrees of freedom is known from the previous example.

Using the formula, we get:

$$1.60 - t \times 0.11 \le \sigma^2 \le 1.60 + t \times 0.11$$

or $1.38 \le \sigma^2 \le 1.82$

More accurately, the confidence interval for the general variance can be constructed using Pearson's $\chi^2$ (chi-square) criterion. The critical points for this criterion are given in a special table. When $\chi^2$ is used to construct a confidence interval, a two-sided significance level is applied, with one significance level $\alpha_1$ for the lower limit and another, $\alpha_2$, for the upper limit. For example, for the confidence level P = 0.99, $\alpha_1$ = 0.010 and $\alpha_2$ = 0.990. Accordingly, from the table of critical values of $\chi^2$, for the calculated significance levels and k = 100 - 1 = 99 degrees of freedom, we find the values $\chi^2_{\alpha_1}$ and $\chi^2_{\alpha_2}$. We get $\chi^2_{\alpha_1}$ = 135.80 and $\chi^2_{\alpha_2}$ = 70.06.

To find the confidence limits for the general variance using $\chi^2$, we use the formulas: for the lower limit $\sigma^2_{min} = \frac{s^2 (n-1)}{\chi^2_{\alpha_1}}$, for the upper limit $\sigma^2_{max} = \frac{s^2 (n-1)}{\chi^2_{\alpha_2}}$. Substituting the values found for the problem data into the formulas: $\sigma^2_{min} = \frac{1.60 \times 99}{135.80}$ = 1.17; $\sigma^2_{max} = \frac{1.60 \times 99}{70.06}$ = 2.26. Thus, with a confidence probability P = 0.99, or 99%, the general variance lies in the range from 1.17 to 2.26 mg% inclusive.
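The chi-square limits above follow from a one-line formula once the two critical values are known. The sketch below takes the critical values as inputs exactly as quoted in the example instead of computing them; the function and parameter names are assumptions made for illustration.

```python
def variance_ci(sample_variance, n, chi2_for_lower, chi2_for_upper):
    """Confidence limits for the variance from two chi-square critical values.

    chi2_for_lower is the larger critical value (it gives the lower limit),
    chi2_for_upper is the smaller one (it gives the upper limit).
    """
    k = n - 1  # degrees of freedom
    return k * sample_variance / chi2_for_lower, k * sample_variance / chi2_for_upper


# Values quoted in the example: s^2 = 1.60, n = 100, critical values 135.80 and 70.06.
low, high = variance_ci(1.60, 100, 135.80, 70.06)
print(f"{low:.2f} .. {high:.2f}")
```

Unlike the interval based on t and the error of the variance, this interval is not symmetric around the sample variance.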

Example 3.3 . Among 1000 wheat seeds from the batch received at the elevator, 120 seeds were found infected with ergot. It is necessary to determine the probable boundaries of the general proportion of infected seeds in a given batch of wheat.

It is advisable to determine the confidence limits for the general proportion, for all its possible values, using a formula in which n is the number of observations, m is the absolute size of one of the groups, and t is the normalized deviation.

The sample proportion of infected seeds is p = 120/1000 = 0.12, or 12%. With confidence probability P = 95%, the normalized deviation (Student's t at k = ∞) is t = 1.960.

Substituting the available data into the formula, we find that the limits of the confidence interval are $p_{min}$ = 0.122 - 0.041 = 0.081, or 8.1%, and $p_{max}$ = 0.122 + 0.041 = 0.163, or 16.3%.

Thus, with a confidence probability of 95% it can be stated that the general proportion of infected seeds is between 8.1 and 16.3%.

Example 3.4. The coefficient of variation characterizing the variation of calcium (mg%) in the blood serum of monkeys was 10.6%. The sample size is n = 100. It is necessary to determine the limits of the 95% confidence interval for the general parameter Cv.

The limits of the confidence interval for the general coefficient of variation Cv are determined by formulas for the lower and upper limits that use an intermediate value K.

Knowing that with confidence probability P = 95% the normalized deviation (Student's t at k = ∞) is t = 1.960, we first calculate the value of K and then obtain the lower limit, 9.3%, and the upper limit, 12.3%.

Thus, the general coefficient of variation with a 95% confidence level lies in the range from 9.3 to 12.3%. With repeated samples, the coefficient of variation will not exceed 12.3% and will not be below 9.3% in 95 cases out of 100.

Questions for self-control:

Problems for independent solution.

1. The average percentage of fat in milk during lactation of Kholmogory crossbred cows was as follows: 3.4; 3.6; 3.2; 3.1; 2.9; 3.7; 3.2; 3.6; 4.0; 3.4; 4.1; 3.8; 3.4; 4.0; 3.3; 3.7; 3.5; 3.6; 3.4; 3.8. Establish confidence intervals for the general mean at 95% confidence level (20 points).

2. On 400 hybrid rye plants, the first flowers appeared on average 70.5 days after sowing. The standard deviation was 6.9 days. Determine the error of the mean and confidence intervals for the general mean and variance at the significance level W= 0.05 and W= 0.01 (25 points).

3. When studying the length of leaves of 502 specimens of garden strawberries, the following data were obtained: $\bar{x}$ = 7.86 cm; σ = 1.32 cm, $s_{\bar{x}}$ = ± 0.06 cm. Determine confidence intervals for the arithmetic population mean with significance levels of 0.01; 0.02; 0.05. (25 points).

4. In a study of 150 adult men, the average height was 167 cm, and σ = 6 cm. What are the limits of the general mean and general dispersion with a confidence probability of 0.99 and 0.95? (25 points).

5. The distribution of calcium in the blood serum of monkeys is characterized by the following sample indicators: $\bar{x}$ = 11.94 mg%, σ = 1.27, n = 100. Construct a 95% confidence interval for the general mean of this distribution. Calculate the coefficient of variation (25 points).

6. The total nitrogen content in the blood plasma of albino rats at the age of 37 and 180 days was studied. The results are expressed in grams per 100 cm³ of plasma. At the age of 37 days, 9 rats had: 0.98; 0.83; 0.99; 0.86; 0.90; 0.81; 0.94; 0.92; 0.87. At the age of 180 days, 8 rats had: 1.20; 1.18; 1.33; 1.21; 1.20; 1.07; 1.13; 1.12. Set confidence intervals for the difference at a confidence level of 0.95 (50 points).

7. Determine the limits of the 95% confidence interval for the general variance of the distribution of calcium (mg%) in the blood serum of monkeys, if for this distribution the sample size is n = 100 and the statistical error of the sample variance is $s_{\sigma^2}$ = 1.60 (40 points).

8. Determine the limits of the 95% confidence interval for the general variance of the distribution of 40 wheat spikelets by length ($\sigma^2$ = 40.87 mm²). (25 points).

9. Smoking is considered the main factor predisposing to obstructive pulmonary diseases. Passive smoking is not considered such a factor. Scientists doubted the harmlessness of passive smoking and examined the airway patency of non-smokers, passive and active smokers. To characterize the state of the respiratory tract, we took one of the indicators of external respiration function - the maximum volumetric flow rate of mid-expiration. A decrease in this indicator is a sign of airway obstruction. The survey data are shown in the table.

Number of people examined

Maximum mid-expiratory flow rate, l/s

Standard deviation

Non-smokers

working in a non-smoking room

working in a smoky room

Smokers

smoking a small number of cigarettes

smoking an average number of cigarettes

smoking a large number of cigarettes

Using the table data, find 95% confidence intervals for the overall mean and overall variance for each group. What are the differences between the groups? Present the results graphically (25 points).

10. Determine the limits of the 95% and 99% confidence intervals for the general variance in the number of piglets in 64 farrows, if the statistical error of the sample variance is $s_{\sigma^2}$ = 8.25 (30 points).

11. It is known that the average weight of rabbits is 2.1 kg. Determine the boundaries of the 95% and 99% confidence intervals for the general mean and variance at n= 30, σ = 0.56 kg (25 points).

12. For 100 ears, the grain content of the ear (X), the ear length (Y) and the mass of grain in the ear (Z) were measured. Find confidence intervals for the general mean and variance at $P_1$ = 0.95, $P_2$ = 0.99, $P_3$ = 0.999 if $\bar{x}$ = 19, $\bar{y}$ = 6.766 cm, $\bar{z}$ = 0.554 g; $\sigma_x^2$ = 29.153, $\sigma_y^2$ = 2.111, $\sigma_z^2$ = 0.064. (25 points).

13. In 100 randomly selected ears of winter wheat, the number of spikelets was counted. The sample was characterized by the following indicators: $\bar{x}$ = 15 spikelets and σ = 2.28 spikelets. Determine with what accuracy the average result was obtained ($s_{\bar{x}}$) and construct confidence intervals for the general mean and variance at the 95% and 99% confidence levels (30 points).

14. Number of ribs on fossil mollusk shells Orthambonites calligramma:

It is known that n = 19, σ = 4.25. Determine the boundaries of the confidence interval for the general mean and general variance at the significance level W = 0.01 (25 points).

15. To determine milk yield on a commercial dairy farm, the productivity of 15 cows was recorded daily. According to data for the year, each cow gave on average the following amount of milk per day (l): 22; 19; 25; 20; 27; 17; 30; 21; 18; 24; 26; 23; 25; 20; 24. Construct confidence intervals for the general variance and the arithmetic mean. Can we expect the average annual milk yield per cow to be 10,000 liters? (50 points).

16. In order to determine the average wheat yield for the agricultural enterprise, mowing was carried out on trial plots of 1, 3, 2, 5, 2, 6, 1, 3, 2, 11 and 2 hectares. Productivity (c/ha) from the plots was 39.4; 38; 35.8; 40; 35; 42.7; 39.3; 41.6; 33; 42; 29 respectively. Construct confidence intervals for the general variance and arithmetic mean. Can we expect that the average agricultural yield will be 42 c/ha? (50 points).

One of the methods for solving statistical problems is calculating the confidence interval. It is used as a preferable alternative to point estimation when the sample size is small. It should be noted that the process of calculating the confidence interval itself is quite complex. But the Excel program tools allow you to simplify it somewhat. Let's find out how this is done in practice.

This method is used for interval estimation of various statistical quantities. The main task of this calculation is to get rid of the uncertainties of the point estimate.

In Excel, there are two main options for performing calculations using this method: when the variance is known and when it is unknown. In the first case the function CONFIDENCE.NORM is used, and in the second, CONFIDENCE.T.

Method 1: CONFIDENCE.NORM function

The CONFIDENCE.NORM operator, which belongs to the statistical group of functions, first appeared in Excel 2010. Earlier versions of the program use its analogue, CONFIDENCE. The purpose of this operator is to calculate a confidence interval for the population mean based on the normal distribution.

Its syntax is as follows:

=CONFIDENCE.NORM(alpha, standard_dev, size)

"Alpha" is an argument indicating the significance level used to calculate the confidence level. The confidence level is equal to the following expression:

(1-"Alpha")*100

"Standard_dev" is an argument whose meaning is clear from its name: it is the standard deviation of the sample in question.

"Size" is an argument defining the sample size.

All arguments of this operator are mandatory.

The CONFIDENCE function has exactly the same arguments and capabilities as the previous one. Its syntax is:

=CONFIDENCE(alpha, standard_dev, size)

As you can see, the only difference is in the name of the operator. For compatibility reasons, this function is kept in Excel 2010 and newer versions in a special category, "Compatibility". In Excel 2007 and earlier, it is present in the main group of statistical operators.

The confidence interval limit is determined using the following formula:

X ± CONFIDENCE.NORM

where X is the sample mean, which lies in the middle of the resulting interval.

Now let's look at how to calculate a confidence interval using a specific example. 12 tests were carried out, producing the different results listed in the table. This is our data set. The standard deviation is 8. We need to calculate the confidence interval at the 97% confidence level.

  1. Select the cell where the result of data processing will be displayed. Click on the button "Insert Function".
  2. The Function Wizard appears. Go to the "Statistical" category and select the name "CONFIDENCE.NORM". Then click the "OK" button.
  3. The arguments window opens. Its fields correspond to the names of the arguments.
    Place the cursor in the first field, "Alpha". Here we should specify the significance level. As we remember, our confidence level is 97%. The significance level is calculated from it as follows:

    (100-confidence level)/100

    That is, substituting the value, we get:

    (100-97)/100

    By simple calculation we find that the "Alpha" argument equals 0.03. Enter this value in the field.

    By the conditions, the standard deviation is equal to 8. Therefore, simply enter this number in the "Standard_dev" field.

    In the "Size" field you need to enter the number of tests performed. As we remember, there are 12 of them. But in order to automate the formula and not edit it every time a new test is carried out, let's set this value not with an ordinary number but using the COUNT operator. So, place the cursor in the "Size" field and then click the triangle located to the left of the formula bar.

    A list of recently used functions appears. If you have used the COUNT operator recently, it should be on this list; in that case, just click its name. Otherwise, go to the item "More Functions...".

  4. The already familiar Function Wizard appears. Go to the "Statistical" group again, select the name "COUNT", and click the "OK" button.
  5. The arguments window for this operator appears. The function is designed to count the cells in a specified range that contain numeric values. Its syntax is as follows:

    COUNT(value1,value2,…)

    Argument group "Values" is a reference to the range in which you want to calculate the number of cells filled with numeric data. There can be up to 255 such arguments in total, but in our case we only need one.

    Place the cursor in the "Value1" field and, holding down the left mouse button, select on the sheet the range that contains our data set. Its address will then be displayed in the field. Click the "OK" button.

  6. After this, the application performs the calculation and displays the result in the cell where the formula is located. In our particular case, the formula looks like this:

    =CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

    The overall result of the calculation was 5.011609.

  7. But that is not all. As we remember, the limits of the confidence interval are calculated by adding the result of CONFIDENCE.NORM to the sample mean and subtracting it from the sample mean. This gives the right and left limits of the confidence interval, respectively. The sample mean itself can be calculated using the AVERAGE operator.

    This operator is designed to calculate the arithmetic mean of a selected range of numbers. It has the following fairly simple syntax:

    AVERAGE(number1,number2,…)

    The "Number" argument can be either a separate numerical value or a reference to cells or even entire ranges that contain them.

    So, select the cell in which the calculation of the average value will be displayed, and click on the button "Insert Function".

  8. Opens Function Wizard. Going back to the category "Statistical" and select a name from the list "AVERAGE". As always, click on the button "OK".
  9. The arguments window opens. Place the cursor in the field "Number1" and holding down the left mouse button, select the entire range of values. After the coordinates are displayed in the field, click on the button "OK".
  10. After that AVERAGE displays the calculation result in a sheet element.
  11. We calculate the right boundary of the confidence interval. To do this, select a separate cell and put the sign «=» and add up the contents of the sheet elements in which the results of function calculations are located AVERAGE And TRUST.NORM. To perform the calculation, press the button Enter. In our case, we got the following formula:

    Calculation result: 6.953276

  12. The left boundary of the confidence interval is calculated in the same way, only this time the result of CONFIDENCE.NORM is subtracted from the result of AVERAGE. The resulting formula for our example is as follows:

    Calculation result: -3.06994

  13. We have described each step and each formula separately to show how the confidence interval is calculated. But all of these actions can be combined into one formula. The right boundary of the confidence interval can be written as follows:

    AVERAGE(B2:B13)+CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

  14. A similar calculation for the left boundary looks like this:

    AVERAGE(B2:B13)-CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))
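
The same calculation can be reproduced outside Excel as a cross-check. The following is a minimal Python sketch, assuming that CONFIDENCE.NORM returns the usual half-width z * sigma / sqrt(n) of a two-sided interval. The function names `confidence_norm` and `interval_known_sigma` are hypothetical, and the sample values are made up, since the contents of B2:B13 are not listed in this note.

```python
import math
from statistics import NormalDist, mean

def confidence_norm(alpha, sigma, n):
    """Half-width of the interval, analogous to CONFIDENCE.NORM(alpha, sigma, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sigma / math.sqrt(n)

def interval_known_sigma(values, alpha, sigma):
    """Return (left, right): sample mean minus and plus the half-width."""
    m = mean(values)
    margin = confidence_norm(alpha, sigma, len(values))
    return m - margin, m + margin

# Same parameters as in the walkthrough: alpha = 0.03, sigma = 8, 12 observations.
margin = confidence_norm(0.03, 8, 12)
print(round(margin, 4))  # about 5.0116

# Hypothetical data, NOT the values from B2:B13.
sample = [4, -7, 12, 3, -2, 9, 0, 5, -6, 8, 1, -4]
left, right = interval_known_sigma(sample, 0.03, 8)
print(left, right)
```

With the parameters from the example, the half-width agrees with the value that Excel reported in step 6. The boundaries printed for the made-up sample will differ from the ones in the walkthrough.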

Method 2: CONFIDENCE.T function

In addition, Excel has another function for calculating the confidence interval: CONFIDENCE.T. It first appeared in Excel 2010. This function calculates the confidence interval for the population mean using Student's t-distribution. It is convenient when the variance and, accordingly, the standard deviation are unknown. Its syntax is:

CONFIDENCE.T(alpha,standard_dev,size)

As you can see, the arguments are the same as for the function used in the previous method.

Let's see how to calculate the boundaries of a confidence interval with an unknown standard deviation, using the same sample that we considered in the previous method. As before, we take a confidence level of 97%.

  1. Select the cell in which the calculation will be performed. Click on the button "Insert Function".
  2. In the Function Wizard that opens, go to the "Statistical" category. Select the name "CONFIDENCE.T". Click the "OK" button.
  3. The arguments window for the specified operator is launched.

    In the "Alpha" field, given that the confidence level is 97%, enter 0.03. We will not go over how this parameter is calculated a second time.

    After this, place the cursor in the "Standard deviation" field. This time the value is unknown and has to be calculated. This is done with a special function, STDEV.S. To open this function's window, click the triangle to the left of the formula bar. If the name is not in the list that opens, go to the item "Other functions...".

  4. The Function Wizard starts. Go to the "Statistical" category and select the name "STDEV.S". Then click the "OK" button.
  5. The arguments window opens. The STDEV.S function calculates the standard deviation of a sample. Its syntax looks like this:

    STDEV.S(number1,number2,…)

    As you might guess, the "Number" argument is the address of a sample element. If the sample is stored in a single range, one argument that references this range is enough.

    Place the cursor in the "Number1" field and, as always, holding down the left mouse button, select the sample. Once the coordinates appear in the field, do not rush to click "OK", since the result would be incorrect. First we need to return to the CONFIDENCE.T arguments window to add the final argument. To do this, click the corresponding name in the formula bar.

  6. The arguments window for the already familiar function opens again. Place the cursor in the "Size" field. Again, click the triangle we already know to go to the function list. We need the name "COUNT". Since we used this function in the previous method, it is present in this list, so just click it. If you do not find it, follow the steps described in the first method.
  7. Once in the COUNT arguments window, place the cursor in the "Number1" field and, holding down the mouse button, select the sample. Then click the "OK" button.
  8. After this, the program performs a calculation and displays the confidence interval value.
  9. To determine the boundaries, we again need the sample mean. Since the calculation with AVERAGE is the same as in the previous method, and even the result has not changed, we will not go over it in detail a second time.
  10. Adding the results of AVERAGE and CONFIDENCE.T gives the right boundary of the confidence interval.
  11. Subtracting the result of CONFIDENCE.T from the result of AVERAGE gives the left boundary of the confidence interval.
  12. If the calculation is written in one formula, then the calculation of the right boundary in our case will look like this:

    AVERAGE(B2:B13)+CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

  13. Accordingly, the formula for calculating the left boundary will look like this:

    AVERAGE(B2:B13)-CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))
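
For comparison, the Student-based calculation in this method can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a description of how Excel computes the value internally: the half-width is taken to be t * s / sqrt(n) with n - 1 degrees of freedom, and the t quantile is found numerically (Simpson's rule plus bisection) because the standard library has no ready-made Student distribution. All function names are hypothetical, and the sample is made up, since the contents of B2:B13 are not listed in this note.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus the 0.5 mass below zero."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bracketing and then bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        lo, hi = hi, hi * 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_t(alpha, stdev, n):
    """Half-width of the two-sided interval with n - 1 degrees of freedom."""
    return t_quantile(1 - alpha / 2, n - 1) * stdev / math.sqrt(n)

# Hypothetical data, NOT the values from B2:B13.
sample = [4, -7, 12, 3, -2, 9, 0, 5, -6, 8, 1, -4]
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
margin = confidence_t(0.03, s, len(sample))
m = statistics.mean(sample)
print(m - margin, m + margin)  # left and right boundaries
```

For the same standard deviation and sample size, the Student-based half-width is wider than the normal-based one, which reflects the extra uncertainty of estimating the standard deviation from the sample.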

As you can see, Excel's tools significantly simplify the calculation of the confidence interval and its boundaries. Separate functions are used for samples whose variance is known and for samples whose variance is unknown.