Any sample gives only an approximate picture of the general population, and all sample statistical characteristics (mean, mode, variance, ...) are approximations, or estimates, of the corresponding population parameters, which in most cases cannot be calculated directly because the entire population is inaccessible (Figure 20).

Figure 20. Sampling error

However, one can specify an interval in which, with a given probability, the true (population) value of the statistical characteristic lies. This interval is called the confidence interval (CI).

So, with a probability of 95%, the population mean lies within

from x̄ − t·m to x̄ + t·m,  (20)

where t is the tabulated value of Student's t for α = 0.05 and f = n − 1 degrees of freedom, and m is the standard error of the mean (m = s/√n).

A 99% CI can also be found; in this case t is taken for α = 0.01.
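As a quick illustration of formula (20), here is a minimal Python sketch that computes 95% and 99% CIs for a mean using Student's t; the sample values are invented for demonstration only.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (e.g., heights in cm) -- illustrative values only
sample = np.array([128.0, 131.5, 130.2, 127.8, 133.1, 129.4, 132.0, 130.8, 126.9, 131.1])

n = sample.size
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)          # standard error of the mean, m = s / sqrt(n)

for alpha in (0.05, 0.01):                    # 95% and 99% confidence levels
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # tabulated Student's t for f = n - 1
    print(f"{100 * (1 - alpha):.0f}% CI: {mean - t * se:.2f} .. {mean + t * se:.2f}")
```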

What is the practical significance of a confidence interval?

    A wide confidence interval indicates that the sample mean does not accurately reflect the population mean. This is usually due to an insufficient sample size or to heterogeneity of the sample, i.e. a large dispersion. Both increase the error of the mean and, accordingly, widen the CI, and this is a reason to return to the research planning stage.

    The upper and lower limits of the CI provide an estimate of whether the results will be clinically significant.

Let us dwell in some detail on the question of the statistical and clinical significance of the results of a study of group properties. Recall that the task of statistics is to detect any differences between general populations on the basis of sample data, whereas the task of the clinician is to detect those differences (not just any) that will aid diagnosis or treatment. Statistical conclusions are therefore not always grounds for clinical conclusions. Thus, a statistically significant decrease in hemoglobin by 3 g/l is not a cause for concern. Conversely, if some problem in the human body is not widespread at the level of the entire population, that is not a reason to ignore it.

Let's look at an example of this situation.

Researchers wondered whether boys who have suffered from a certain infectious disease lag behind their peers in growth. For this purpose, a sample survey was carried out in which 10 boys who had suffered from this disease took part. The results are presented in Table 23.

Table 23. Results of statistical processing (columns: average, lower and upper limits of the 95% CI, and the age standards, cm)

From these calculations it follows that the sample average height of 10-year-old boys who have had the infection is close to normal (132.5 cm). However, the lower limit of the confidence interval (126.6 cm) shows that, within the 95% CI, the true average height of these children may correspond to the concept of "short stature", i.e. that these children are stunted.

In this example, the results of the confidence interval calculations are clinically significant.

A confidence interval (CI) obtained in a study on a sample gives a measure of the accuracy (or uncertainty) of the study results, allowing conclusions to be drawn about the population of all such patients. The correct definition of a 95% CI can be formulated as follows: 95% of such intervals will contain the true value in the population. A somewhat less precise interpretation is this: the CI is the range of values within which you can be 95% sure that it contains the true value. When a CI is used, the emphasis is on estimating a quantitative effect, as opposed to the P value that results from testing statistical significance. The P value does not estimate any quantity; rather, it serves as a measure of the strength of evidence against the null hypothesis of "no effect." The P value by itself tells us nothing about the magnitude of a difference, or even about its direction. Therefore, isolated P values are uninformative in articles or abstracts. In contrast, the CI indicates both the size of the effect of immediate interest, such as the benefit of a treatment, and the strength of the evidence. Therefore, the CI is directly related to the practice of evidence-based medicine (EBM).

The estimation approach to statistical analysis, illustrated by the CI, aims to measure the size of an effect of interest (the sensitivity of a diagnostic test, the rate of predicted cases, the relative risk reduction with treatment, etc.) and also to measure the uncertainty in that effect. Most often, the CI is the range of values on either side of the estimate within which the true value is likely to lie, and of which you can be 95% sure. The convention of using 95% probability is arbitrary, just like the P < 0.05 threshold for statistical significance, and authors sometimes use 90% or 99% CIs. Note that the word "interval" denotes a range of values and is therefore singular. The two values that bound the interval are called the "confidence limits".

The CI is based on the idea that the same study performed on different samples of patients would not produce identical results, but that the results would be distributed around a true but unknown value. In other words, the CI describes this sample-to-sample variability. The CI does not reflect additional uncertainty due to other causes; in particular, it does not include the effects of selective loss to follow-up, poor compliance, inaccurate outcome measurement, lack of blinding, and so on. The CI therefore always underestimates the total amount of uncertainty.

Confidence Interval Calculation

Table A1.1. Standard errors and confidence intervals for selected clinical measurements

Typically, a CI is calculated from an observed estimate of a quantity, such as the difference (d) between two proportions, and the standard error (SE) of the estimate of that difference. The approximate 95% CI obtained in this way is d ± 1.96 SE. The formula changes according to the nature of the outcome measure and the coverage of the CI. For example, in a randomized, placebo-controlled trial of an acellular pertussis vaccine, pertussis developed in 72 of 1670 (4.3%) infants who received the vaccine and in 240 of 1665 (14.4%) infants in the control group. The difference in percentages, known as the absolute risk reduction, is 10.1%. The SE of this difference is 0.99%. Accordingly, the 95% CI is 10.1% ± 1.96 × 0.99%, i.e. from 8.2% to 12.0%.
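The sketch below reproduces this calculation in Python, using the usual normal-approximation formula for the SE of a difference between two proportions (an assumption not spelled out in the text); small rounding differences from the quoted figures are to be expected.

```python
import math

# Pertussis vaccine trial: events / group sizes from the example above
p_vaccine = 72 / 1670    # 4.3%
p_control = 240 / 1665   # 14.4%

d = p_control - p_vaccine                      # absolute risk reduction
se = math.sqrt(p_vaccine * (1 - p_vaccine) / 1670 +
               p_control * (1 - p_control) / 1665)

low, high = d - 1.96 * se, d + 1.96 * se       # approximate 95% CI: d +/- 1.96 SE
print(f"ARR = {100*d:.1f}%, SE = {100*se:.2f}%, 95% CI {100*low:.1f}% to {100*high:.1f}%")
```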

Despite their different philosophical approaches, CIs and statistical significance tests are closely related mathematically.

Thus, a "significant" P value, i.e. P < 0.05, corresponds to a 95% CI that excludes the value of the effect indicating no difference. For example, for a difference between two means or proportions this value is zero, while for a relative risk or odds ratio it is one. Under some circumstances the two approaches may not be exactly equivalent. The prevailing view is that estimation with CIs is the preferred approach to summarizing the results of a study, but CIs and P values are complementary, and many articles use both ways of presenting results.

The uncertainty (imprecision) of an estimate, expressed by the CI, is largely related to the square root of the sample size. Small samples provide less information than large ones, and the CI is correspondingly wider in a smaller sample. For example, an article comparing the performance of three tests used to diagnose Helicobacter pylori infection reported a sensitivity of the urea breath test of 95.8% (95% CI 75–100). While the figure of 95.8% is impressive, the small sample of 24 adult patients with H. pylori means that there is considerable uncertainty in this estimate, as the wide CI shows. Indeed, the lower limit of 75% is far below the 95.8% estimate. If the same sensitivity had been observed in a sample of 240 people, the 95% CI would have been 92.5–98.0, giving more assurance that the test is highly sensitive.
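A small sketch of this sample-size effect: the counts 23/24 and 230/240 are assumed here to reproduce the quoted 95.8% sensitivity, and the Wilson score interval is one common choice (the article's exact method is not stated), so the limits will differ somewhat from those quoted above.

```python
from statsmodels.stats.proportion import proportion_confint

# Sensitivity of ~95.8% observed in samples of different sizes
for positives, n in [(23, 24), (230, 240)]:
    low, high = proportion_confint(positives, n, alpha=0.05, method="wilson")
    print(f"n = {n:3d}: sensitivity = {positives/n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```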

In randomized controlled trials (RCTs), nonsignificant results (i.e., those with P > 0.05) are particularly prone to misinterpretation. The CI is especially useful here because it shows how far the results are compatible with a clinically useful true effect. For example, in an RCT comparing sutured and stapled colonic anastomosis, wound infection developed in 10.9% and 13.5% of patients, respectively (P = 0.30). The difference is 2.6%, with a 95% CI of −2% to +8%. Even in this study of 652 patients, it remains possible that there is a modest difference in the incidence of infection between the two procedures. The smaller the study, the greater the uncertainty. Sung et al. performed an RCT comparing octreotide infusion with emergency sclerotherapy for acute variceal bleeding in 100 patients. In the octreotide group the rate of bleeding control was 84%, in the sclerotherapy group 90%, giving P = 0.56. Note that these rates are similar to the wound-infection rates in the study just mentioned. In this case, however, the 95% CI for the difference between the interventions is 6% (−7% to +19%). This range is wide compared with the 5% difference that would be of clinical interest, so the study clearly does not rule out an important difference in effectiveness. The authors' conclusion that "octreotide infusion and sclerotherapy are equally effective in the treatment of bleeding from varicose veins" is therefore definitely not valid. In cases like this, where the 95% CI for the absolute risk reduction (ARR) includes zero, the CI for the NNT (number needed to treat) is quite difficult to interpret. The NNT and its CI are obtained as the reciprocals of the ARR and its limits (multiplying by 100 if these values are given as percentages). Here we get NNT = 100 / 6 = 16.6, with 95% confidence limits of −14.3 and 5.3. As footnote "d" to Table A1.1 explains, this CI includes NNT (benefit) values from 5.3 to infinity and NNT (harm) values from 14.3 to infinity.
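A short sketch of the reciprocal arithmetic described above; the ARR of 6% with 95% CI −7% to +19% comes from the example, while labelling the two limits as benefit and harm follows the usual treatment of NNTs and is an interpretive assumption here.

```python
# ARR and its 95% confidence limits, in percent, from the octreotide example
arr, arr_low, arr_high = 6.0, -7.0, 19.0

nnt = 100 / arr                    # point estimate of NNT (the text rounds 16.7 to 16.6)
limit_benefit = 100 / arr_high     # about 5.3
limit_harm = 100 / arr_low         # about -14.3, reported as a harm limit of 14.3

print(f"NNT = {nnt:.1f}; limits from reciprocals: {limit_benefit:.1f} and {limit_harm:.1f}")
```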

CIs can be constructed for most commonly used statistical estimates and comparisons. For RCTs these include the difference between means or proportions, relative risks, odds ratios, and NNTs. Similarly, CIs can be obtained for all the main estimates made in studies of diagnostic test accuracy (sensitivity, specificity, positive predictive value, which are all simple proportions, and likelihood ratios), and for estimates obtained in meta-analyses and case-control studies. A personal computer program that covers many of these uses of CIs is supplied with the second edition of Statistics with Confidence. Macros for calculating CIs for proportions are available free of charge for Excel and for the statistical programs SPSS and Minitab at http://www.uwcm.ac.uk/study/medicine/epidemiology_statistics/research/statistics/proportions.htm.

Multiple estimates of treatment effect

While CIs are desirable for the primary outcomes of a study, they are not needed for all outcomes. The CI should concern the clinically important comparison. For example, when two groups are compared, the correct CI is the one constructed for the difference between the groups, as in the examples above, not the CIs that can be constructed for the estimate within each group. Providing separate CIs for the estimate in each group is not only unhelpful, it can be misleading. Similarly, the correct approach when comparing the effectiveness of a treatment in different subgroups is to compare the two (or more) subgroups directly. It is incorrect to conclude that the treatment is effective in only one subgroup because its CI excludes the value corresponding to no effect while the CIs of the other subgroups do not. CIs are also useful when comparing results across several subgroups. Fig. A1.1 shows the relative risk of eclampsia in women with preeclampsia in subgroups of women from a placebo-controlled RCT of magnesium sulfate.

Fig. A1.2. The forest plot shows the results of 11 randomized clinical trials of bovine rotavirus vaccine for the prevention of diarrhea compared with placebo. A 95% confidence interval was used to estimate the relative risk of diarrhea. The size of each black square is proportional to the amount of information. In addition, the summary estimate of treatment effectiveness and its 95% confidence interval (indicated by a diamond) are shown. The meta-analysis used a random effects model.

We have already discussed the fallacy of interpreting a lack of statistical significance as showing that two treatments are equally effective. It is equally important not to equate statistical significance with clinical importance. Clinical importance can be assumed when the result is statistically significant and the magnitude of the estimated treatment effect is larger than some prespecified value; for example, this could be the effect size used in calculating the sample size. A more stringent criterion requires that the entire range of the CI show a benefit greater than a prespecified minimum.

Studies can produce results that are statistically significant, of which some are clinically important and some are not. Fig. A1.2 shows the results of four trials for which the entire CI lies below 1, i.e. their results are statistically significant at P < 0.05. If we assume that a clinically important difference would be a 20% reduction in the risk of diarrhea (RR = 0.8), all of these trials gave clinically important estimates of the risk reduction, but only in the Treanor study was the entire 95% CI below this value. Two other RCTs showed clinically important results that were not statistically significant. Note that in three trials the point estimates of treatment effectiveness were almost identical, but the widths of the CIs differed (reflecting sample size). Taken individually, therefore, the strength of evidence provided by these RCTs differs.

There are two types of estimates in statistics: point estimates and interval estimates. A point estimate is a single sample statistic that is used to estimate a population parameter. For example, the sample mean is a point estimate of the mathematical expectation of the population, and the sample variance S² is a point estimate of the population variance σ². It can be shown that the sample mean is an unbiased estimate of the mathematical expectation of the population. The sample mean is called unbiased because the average of all sample means (for the same sample size n) is equal to the mathematical expectation of the general population.

For the sample variance S² to be an unbiased estimate of the population variance σ², the denominator of the sample variance must be set to n − 1 rather than n. With this denominator, the average of all possible sample variances is equal to the population variance.
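A small simulation sketch in Python illustrating this point; the population parameters and sample size are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 10.0 ** 2                     # true population variance (sigma = 10, arbitrary)

n, trials = 10, 50_000
samples = rng.normal(loc=50, scale=10, size=(trials, n))

var_n = samples.var(axis=1, ddof=0)    # denominator n
var_n1 = samples.var(axis=1, ddof=1)   # denominator n - 1

print(f"true variance:                     {sigma2:.1f}")
print(f"average variance, denominator n:   {var_n.mean():.1f}")    # about 90, biased low
print(f"average variance, denominator n-1: {var_n1.mean():.1f}")   # about 100, unbiased
```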

When estimating population parameters, keep in mind that sample statistics such as the sample mean depend on the particular sample. To take this into account, an interval estimate of the mathematical expectation of the population is obtained by analyzing the distribution of sample means. The constructed interval is characterized by a certain confidence level, which is the probability that the true population parameter is estimated correctly. Similar confidence intervals can be used to estimate the proportion p of a characteristic and other population parameters.


Constructing a confidence interval for the mathematical expectation of the population with a known standard deviation

Constructing a confidence interval for the share of a characteristic in the population

This section extends the concept of the confidence interval to categorical data. This allows us to estimate the proportion p of a characteristic in the population from the sample proportion pS = X/n. As noted earlier, if the quantities np and n(1 − p) both exceed 5, the binomial distribution can be approximated by the normal distribution. Therefore, to estimate the proportion p of the characteristic in the population, an interval can be constructed whose confidence level is (1 − α)·100%:


pS − Z·√(pS(1 − pS)/n) ≤ p ≤ pS + Z·√(pS(1 − pS)/n),

where pS is the sample proportion of the characteristic, equal to X/n (the number of successes divided by the sample size), p is the proportion of the characteristic in the general population, Z is the critical value of the standardized normal distribution, and n is the sample size.

Example 3. Suppose a sample of 100 invoices filled out during the last month is drawn from the information system, and that 10 of these invoices contain errors. Then pS = 10/100 = 0.1. The 95% confidence level corresponds to the critical value Z = 1.96. Substituting into the formula gives 0.1 ± 1.96·√(0.1·0.9/100) = 0.1 ± 0.0588.

Thus, the probability that between 4.12% and 15.88% of invoices contain errors is 95%.
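A one-line Python check of Example 3, using the same normal-approximation formula as above:

```python
import math

p_s, n, z = 10 / 100, 100, 1.96
half_width = z * math.sqrt(p_s * (1 - p_s) / n)
print(f"95% CI: {p_s - half_width:.2%} to {p_s + half_width:.2%}")   # 4.12% to 15.88%
```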

For a given sample size, the confidence interval for the proportion of a characteristic in the population is wider than that for a continuous random variable. This is because measurements of a continuous random variable carry more information than measurements of categorical data. In other words, categorical data that take only two values contain too little information to estimate the parameters of their distribution precisely.

Calculating estimates drawn from a finite population

Estimation of the mathematical expectation. The finite population correction factor (fpc) is used to reduce the standard error by a factor of √((N − n)/(N − 1)). When calculating confidence intervals for estimates of population parameters, this correction factor is applied in situations where samples are drawn without replacement. Thus, a confidence interval for the mathematical expectation with a confidence level of (1 − α)·100% is calculated by the formula:

x̄ ± t(n−1) · (S/√n) · √((N − n)/(N − 1))  (6)

Example 4. To illustrate the use of the finite population correction factor, let us return to the problem of calculating a confidence interval for the average invoice amount discussed above. Suppose that the company issues 5,000 invoices per month, and that x̄ = 110.27 dollars, S = $28.95, N = 5000, n = 100, α = 0.05, t₉₉ = 1.9842. Using formula (6) we obtain 110.27 ± 1.9842 · (28.95/√100) · √((5000 − 100)/4999) = 110.27 ± 5.69, i.e. from $104.58 to $115.96.
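A quick Python check of this calculation with the values given above:

```python
import math

x_bar, s, big_n, n, t_99 = 110.27, 28.95, 5000, 100, 1.9842

fpc = math.sqrt((big_n - n) / (big_n - 1))        # finite population correction
half_width = t_99 * s / math.sqrt(n) * fpc
print(f"95% CI: {x_bar - half_width:.2f} to {x_bar + half_width:.2f} dollars")
```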

Estimation of the proportion of a characteristic. When sampling without replacement, the confidence interval for the proportion of a characteristic, with a confidence level of (1 − α)·100%, is calculated by the formula:

pS ± Z · √(pS(1 − pS)/n) · √((N − n)/(N − 1))

Confidence Intervals and Ethical Issues

When sampling from a population and drawing statistical conclusions, ethical issues often arise. The main one is how confidence intervals and point estimates of sample statistics are reported together. Publishing point estimates without the associated confidence intervals (usually at the 95% confidence level) and the sample size from which they were derived can create confusion: it may give the reader the impression that the point estimate is exactly what is needed to predict the properties of the entire population. Thus, in any research the focus should be not on point estimates but on interval estimates. In addition, special attention should be paid to the correct choice of sample size.

Most often, the targets of statistical manipulation are the results of sociological surveys of the population on various political issues. Survey results are published on the front pages of newspapers, while the sampling error and the statistical methodology are reported somewhere in the middle. To demonstrate the validity of published point estimates, it is necessary to state the sample size on which they are based, the boundaries of the confidence interval, and its confidence level.


Materials from the book Levin et al., Statistics for Managers, are used (Moscow: Williams, 2004, pp. 448–462).

The central limit theorem states that, for a sufficiently large sample size, the sampling distribution of the mean can be approximated by a normal distribution. This property does not depend on the type of distribution of the population.

The sample mean, the sample variance, and other sample statistics are all estimates of their theoretical analogues, which could be computed if the entire general population, rather than a sample, were available. But, alas, the general population is very expensive to observe and often simply inaccessible.

The concept of interval estimation

Any sample estimate has some spread, because it is a random variable that depends on the values in the particular sample. Therefore, for more reliable statistical conclusions one should know not only the point estimate but also an interval that, with a high probability γ (gamma), covers the estimated quantity θ (theta).

Formally, these are two statistics T1(X) and T2(X), with T1 < T2, for which, at a given probability level γ, the following condition holds:

P(T1(X) < θ < T2(X)) ≥ γ.

In short, with probability γ or more, the true value lies between the points T1(X) and T2(X), which are called the lower and upper bounds of the confidence interval.

One of the requirements when constructing a confidence interval is maximum narrowness: it should be as short as possible. This desire is quite natural, since the researcher tries to localize the target parameter as precisely as possible.

It follows that the confidence interval should cover the region of maximum probability of the distribution, with the estimate itself at its center.

That is, the probability of deviation (of the true indicator from the estimate) upward is equal to the probability of deviation downward. It should also be noted that for asymmetric distributions, the interval on the right is not equal to the interval on the left.

The figure above clearly shows that the greater the confidence probability, the wider the interval - a direct relationship.

This was a short introduction to the theory of interval estimation of unknown parameters. Let's move on to finding confidence limits for the mathematical expectation.

Confidence interval for mathematical expectation

If the original data follow the normal distribution, then the mean will also be a normal random variable. This follows from the rule that a linear combination of normal values also has a normal distribution. Therefore, to calculate probabilities we could use the mathematical apparatus of the normal distribution law.

However, this requires knowing two parameters, the expectation and the variance, which are usually unknown. You can, of course, use estimates instead of the parameters (the arithmetic mean and the sample variance), but then the distribution of the mean will not be quite normal: it will be slightly flattened, with heavier tails. This fact was cleverly noted by William Gosset, working in Ireland, who published his discovery in the March 1908 issue of the journal Biometrika. For reasons of secrecy, Gosset signed himself "Student". This is how the Student t-distribution appeared.

However, the normal distribution of data, used by K. Gauss in analyzing errors in astronomical observations, is extremely rare in earthly life and is quite difficult to establish (about 2 thousand observations are needed for high accuracy). Therefore, it is best to discard the assumption of normality and use methods that do not depend on the distribution of the original data.

The question arises: what is the distribution of the arithmetic mean if it is calculated from data with an unknown distribution? The answer is given by the central limit theorem (CLT), well known in probability theory. In mathematics there are several variants of it (the formulations have been refined over the years), but, roughly speaking, they all boil down to the statement that the sum of a large number of independent random variables obeys the normal distribution law.

The arithmetic mean is calculated from a sum of random variables. It follows that the arithmetic mean has a normal distribution in which the expectation is the expectation of the original data and the variance is σ²/n.

Smart people know how to prove CLT, but we will verify this with the help of an experiment conducted in Excel. Let's simulate a sample of 50 uniformly distributed random variables (using the Excel function RANDBETWEEN). Then we will make 1000 such samples and calculate the arithmetic mean for each. Let's look at their distribution.

It can be seen that the distribution of the average is close to the normal law. If the sample size and number are made even larger, the similarity will be even better.
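For readers without Excel at hand, here is a sketch of the same experiment in Python; the uniform range 0 to 100 is an arbitrary choice standing in for RANDBETWEEN.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1000 samples of 50 uniform values each, mirroring the RANDBETWEEN experiment
samples = rng.integers(low=0, high=101, size=(1000, 50))
means = samples.mean(axis=1)

# A crude text histogram of the sample means: it looks roughly bell-shaped
counts, edges = np.histogram(means, bins=15)
for count, left in zip(counts, edges[:-1]):
    print(f"{left:6.1f} | {'#' * (count // 5)}")
```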

Now that we have seen the validity of the CLT with our own eyes, we can use the normal distribution to calculate confidence intervals for the arithmetic mean that cover the true mean, or mathematical expectation, with a given probability.

To establish the upper and lower limits, you need to know the parameters of the normal distribution. As a rule, they are unknown, so estimates are used: the arithmetic mean and the sample variance. I repeat, this method gives a good approximation only with large samples. When samples are small, it is often recommended to use the Student distribution. Don't believe it! The Student distribution for the mean arises only when the original data are normally distributed, that is, almost never. Therefore, it is better to set a minimum bar for the amount of required data right away and use asymptotically correct methods. They say 30 observations are enough. Take 50 and you won't go wrong.

T1,2 = x̄ ± cγ · s0/√n, where

T1,2 – lower and upper limits of the confidence interval

x̄ – sample arithmetic mean

s0 – standard deviation of the sample (unbiased)

n – sample size

γ – confidence probability (usually equal to 0.9, 0.95 or 0.99)

cγ = Φ⁻¹((1+γ)/2) – the inverse of the standard normal distribution function. Simply put, this is the number of standard errors between the arithmetic mean and the lower or upper bound (for these three probabilities the values are 1.64, 1.96 and 2.58).

The essence of the formula is that you take the arithmetic mean and then set aside from it a certain number (cγ) of standard errors (s0/√n). Everything is known; just plug in and compute.
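A minimal Python sketch of this recipe; scipy's norm.ppf plays the role of Φ⁻¹, and the demo data are arbitrary uniform values.

```python
import numpy as np
from scipy.stats import norm

def normal_ci(sample, gamma=0.95):
    """Asymptotic confidence interval for the mean: T1,2 = mean +/- c_gamma * s0/sqrt(n)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    s0 = sample.std(ddof=1)                    # unbiased standard deviation
    c_gamma = norm.ppf((1 + gamma) / 2)        # 1.64, 1.96 or 2.58 for gamma = 0.9, 0.95, 0.99
    half = c_gamma * s0 / np.sqrt(n)
    return mean - half, mean + half

data = np.random.default_rng(1).uniform(0, 100, size=50)   # arbitrary demo data
print(normal_ci(data, gamma=0.95))
```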

Before personal computers became widespread, tables were used to obtain the values of the normal distribution function and its inverse. They are still used today, but it is more effective to use ready-made Excel formulas. All the elements of the formula above (the mean, the standard deviation and cγ) can easily be calculated in Excel. But there is also a ready-made function for calculating the confidence interval: CONFIDENCE.NORM. Its syntax is as follows.

CONFIDENCE.NORM(alpha; standard_dev; size)

alpha – significance level, which in the notation adopted above equals 1 − γ, i.e. the probability that the mathematical expectation falls outside the confidence interval. At a confidence level of 0.95, alpha is 0.05, and so on.

standard_dev – standard deviation of the sample data. There is no need to calculate the standard error; Excel itself divides by the square root of n.

size – sample size (n).

The result of the CONFIDENCE.NORM function is the second term of the confidence interval formula, i.e. the half-width of the interval. Accordingly, the lower and upper points are the mean ± the obtained value.

Thus, it is possible to construct a universal algorithm for calculating confidence intervals for the arithmetic mean, which does not depend on the distribution of the original data. The price for universality is its asymptotic nature, i.e. the need to use relatively large samples. However, in the age of modern technology, collecting the required amount of data is usually not difficult.

Testing statistical hypotheses using confidence intervals


One of the main problems solved in statistics is hypothesis testing. Its essence is briefly as follows. An assumption is made, for example, that the expectation of the general population is equal to some value. Then the distribution of sample means that would be observed under this expectation is constructed. Next, one looks at where the observed mean falls in this conditional distribution. If it goes beyond acceptable limits, then obtaining such a mean is very unlikely, and with a single repetition of the experiment practically impossible; this contradicts the stated hypothesis, which is therefore rejected. If the mean does not go beyond the critical level, then the hypothesis is not rejected (but also not proven!).

So, with the help of confidence intervals, in our case for the expectation, you can also test some hypotheses. It is very easy to do. Suppose the arithmetic mean of a certain sample is 100, and the hypothesis tested is that the expectation is, say, 90. Put primitively, the question sounds like this: can it be that, with a true mean equal to 90, the observed mean turned out to be 100?

To answer this question, you will additionally need information about standard deviation and sample size. Let's assume the standard deviation is 30 and the number of observations is 64 (to easily extract the root). Then the standard error of the mean is 30/8 or 3.75. To calculate a 95% confidence interval, you will need to add two standard errors to each side of the mean (more precisely, 1.96). The confidence interval will be approximately 100±7.5 or from 92.5 to 107.5.

Further reasoning is as follows. If the value being tested falls within the confidence interval, then it does not contradict the hypothesis, because it falls within the limits of random fluctuations (with a probability of 95%). If the value being tested falls outside the confidence interval, then the probability of such an event is very small, in any case below the acceptable level. This means that the hypothesis is rejected as contradicting the observed data. In our case, the hypothesized value lies outside the confidence interval (the tested value of 90 is not within 100 ± 7.5), so the hypothesis should be rejected. Answering the primitive question above: no, it cannot, or at least it would happen extremely rarely. In practice, one usually reports the specific probability of erroneously rejecting the hypothesis (the p-level), rather than the preset level at which the confidence interval was constructed, but more on that another time.
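The arithmetic from this example as a Python sketch; note that 1.96 × 3.75 ≈ 7.35, which the note rounds to 7.5.

```python
import math

mean, sd, n = 100, 30, 64
h0 = 90                                     # hypothesized expectation

se = sd / math.sqrt(n)                      # 3.75
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: {low:.1f} to {high:.1f}")   # roughly 100 +/- 7.35
print("H0 rejected" if not (low <= h0 <= high) else "H0 not rejected")
```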

As you can see, constructing a confidence interval for the average (or mathematical expectation) is not difficult. The main thing is to grasp the essence, and then things will move on. In practice, most cases use a 95% confidence interval, which is approximately two standard errors wide on either side of the mean.

That's all for now. All the best!

The confidence interval comes to us from the field of statistics. This is a certain range that serves to estimate an unknown parameter with a high degree of reliability. The easiest way to explain this is with an example.

Suppose you need to study some random variable, for example the server's response time to a client request. Each time a user types in the address of a particular site, the server responds with a different speed, so the response time under study is random. The confidence interval allows us to determine the boundaries for this quantity, and then we can say that with 95% probability the server's response time will lie within the range we calculated.

Or you need to find out how many people know about the company’s trademark. When the confidence interval is calculated, it will be possible to say, for example, that with a 95% probability the share of consumers aware of this is in the range from 27% to 34%.

Closely related to this term is the confidence probability. It is the probability that the desired parameter is covered by the confidence interval, and it determines how wide the interval will be. The larger its value, the wider the confidence interval becomes, and vice versa. Typically it is set to 90%, 95% or 99%; the value 95% is the most popular.

This indicator is also influenced by the dispersion of the observations, and its calculation is based on the assumption that the characteristic under study obeys the normal distribution law. This statement is also known as Gauss's law. According to it, the normal distribution is a distribution of the probabilities of a continuous random variable that can be described by a probability density. If the assumption of normality is incorrect, then the estimate may be wrong.

First, let's figure out how to calculate a confidence interval for the mathematical expectation. Two cases are possible here: the dispersion (the degree of spread of the random variable) may be known or unknown. If it is known, then the confidence interval is calculated using the following formula:

xsr - t*σ / (sqrt(n)) <= α <= xsr + t*σ / (sqrt(n)), where

α - the characteristic being estimated,

t - parameter from the table of the Laplace (standard normal) function,

σ is the square root of the variance.

If the variance is unknown, then it can be calculated if we know all the values ​​of the desired feature. The following formula is used for this:

σ2 = x2sr - (xsr)2, where

x2sr - the average of the squares of the studied characteristic,

(xsr)2 - the square of the average value of this characteristic.

The formula by which the confidence interval is calculated in this case changes slightly:

xsr - t*s / (sqrt(n)) <= α <= xsr + t*s / (sqrt(n)), where

xsr - sample average,

α - the characteristic being estimated,

t is a parameter that is found using the Student distribution table t = t(ɣ;n-1),

sqrt(n) - square root of the total sample size,

s is the square root of the variance.

Consider this example. Suppose that, based on the results of 7 measurements, the average value of the studied characteristic was determined to be 30 and the sample variance to be 36. It is necessary to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

First, let's determine what t is equal to: t = t (0.99; 7-1) = 3.71. Using the above formula, we get:

xsr - t*s / (sqrt(n)) <= α <= xsr + t*s / (sqrt(n))

Since the variance is 36, the standard deviation is s = √36 = 6, so

30 - 3.71*6 / (sqrt(7)) <= α <= 30 + 3.71*6 / (sqrt(7))

21.587 <= α <= 38.413
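A check of this example in Python; scipy's t quantile gives about 3.707 rather than the rounded 3.71, so the limits differ from those above only in the second decimal.

```python
import math
from scipy import stats

mean, variance, n, gamma = 30, 36, 7, 0.99
s = math.sqrt(variance)                       # 6, the sample standard deviation
t = stats.t.ppf((1 + gamma) / 2, df=n - 1)    # about 3.707

half_width = t * s / math.sqrt(n)
print(f"99% CI: {mean - half_width:.2f} to {mean + half_width:.2f}")
```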

The confidence interval for the variance is calculated both in the case of a known mean and when there is no data on the mathematical expectation, and only the value of the point unbiased estimate of the variance is known. We will not give formulas for calculating it here, since they are quite complex and, if desired, can always be found on the Internet.

Let us only note that it is convenient to compute a confidence interval using Excel or an online service that is called exactly that: a confidence interval calculator.