Least squares method in Excel. Regression analysis

The least squares method (LSM) has many uses, because it allows a given function to be represented approximately by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others that contain random errors. In this article you will learn how to implement least squares calculations in Excel.

Statement of the problem using a specific example

Suppose there are two indicators X and Y, and Y depends on X. Since least squares interests us here from the point of view of regression analysis (in Excel its methods are implemented using built-in functions), we move straight on to a specific problem.

So, let X be the retail space of a grocery store, measured in square meters, and let Y be its annual turnover, measured in millions of rubles.

We need to forecast what turnover (Y) a store will have for a given retail space. Obviously, the function Y = f(X) is increasing, since a hypermarket sells more goods than a stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built using data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. In addition, “anomalous” results must not be used. In particular, a small elite boutique can have a turnover many times greater than that of large mass-market retail outlets.

The essence of the method

The table data can be depicted on a Cartesian plane as points $M_1(x_1, y_1), \ldots, M_n(x_n, y_n)$. The problem then reduces to selecting an approximating function $y = f(x)$ whose graph passes as close as possible to the points $M_1, M_2, \ldots, M_n$.

Of course, you can use a high-degree polynomial, but this option is not only difficult to implement but also simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line $y = ax + b$ that best approximates the experimental data or, more precisely, for its coefficients $a$ and $b$.

Accuracy assessment

With any approximation, assessing its accuracy is of particular importance. Let us denote by $e_i$ the difference (deviation) between the experimental and functional values at the point $x_i$, i.e. $e_i = y_i - f(x_i)$.

Obviously, to assess the accuracy of the approximation you could use the sum of the deviations: when choosing a straight line to represent the dependence of Y on X approximately, you would prefer the one with the smallest value of the sum of $e_i$ over all points under consideration. However, it is not that simple, since positive deviations are accompanied by negative ones, and they cancel each other out.

The issue can be solved by using the absolute values of the deviations or their squares. The latter method is the most widely used. It is applied in many areas, including regression analysis (implemented in Excel using two built-in functions), and has long proven its effectiveness.

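Least squares method

Excel, as you know, has a built-in AutoSum function that calculates the sum of all values located in the selected range. Thus, nothing prevents us from calculating the value of the expression $e_1^2 + e_2^2 + e_3^2 + \ldots + e_n^2$.

In mathematical notation this looks like:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2$$

Since the decision was initially made to approximate using a straight line, we have:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$

Thus, the task of finding the straight line that best describes the specific dependence of the quantities X and Y comes down to calculating the minimum of a function of two variables:

$$F(a, b) = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 \to \min$$

To do this, you need to equate the partial derivatives with respect to the new variables $a$ and $b$ to zero and solve a simple system of two equations with two unknowns of the form:

$$\begin{cases} \dfrac{\partial F}{\partial a} = -2\sum\limits_{i=1}^{n} \left(y_i - (a x_i + b)\right) x_i = 0 \\ \dfrac{\partial F}{\partial b} = -2\sum\limits_{i=1}^{n} \left(y_i - (a x_i + b)\right) = 0 \end{cases}$$

After some simple transformations, including division by 2 and manipulation of the sums, we get:

$$\begin{cases} a\sum\limits_{i=1}^{n} x_i^2 + b\sum\limits_{i=1}^{n} x_i = \sum\limits_{i=1}^{n} x_i y_i \\ a\sum\limits_{i=1}^{n} x_i + n b = \sum\limits_{i=1}^{n} y_i \end{cases}$$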

Solving it, for example, by Cramer's method, we obtain a stationary point with certain coefficients $a^*$ and $b^*$. This point is the minimum, i.e. to predict what turnover a store will have for a given area, we can use the straight line $y = a^* x + b^*$, which is the regression model for the example in question. Of course, it will not give you an exact result, but it will help you get an idea of whether buying a particular retail space on credit will pay off.
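The solution of the 2×2 system by Cramer's method can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the store data below are hypothetical and are not taken from the article.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b.

    Solves the system
        a*sum(x^2) + b*sum(x) = sum(x*y)
        a*sum(x)   + b*n      = sum(y)
    by Cramer's rule.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = sxx * n - sx * sx          # main determinant of the system
    if det == 0:
        raise ValueError("all x values coincide; the line is not unique")
    a = (sxy * n - sx * sy) / det    # first column replaced by right-hand side
    b = (sxx * sy - sx * sxy) / det  # second column replaced by right-hand side
    return a, b


# Hypothetical data: retail space (sq. m) and annual turnover (million rubles).
areas = [100, 200, 300, 400, 500]
turnover = [12.0, 21.0, 33.0, 39.0, 52.0]

a_star, b_star = fit_line(areas, turnover)
print(f"y = {a_star:.4f} * x + {b_star:.4f}")
print("forecast for 350 sq. m:", a_star * 350 + b_star)
```

The determinant check guards against the degenerate case in which all stores have the same area, where no unique line exists.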

How to Implement Least Squares in Excel

Excel has a function for calculating values using least squares. It has the following form: “TREND” (known Y values; known X values; new X values; constant). Let's apply the formula for calculating OLS in Excel to our table.

To do this, enter the “=” sign in the cell in which the result of the calculation using the least squares method in Excel should be displayed and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

range of known values for Y (in this case, data for trade turnover);
range x 1 , …x n , i.e. the size of retail space;
both known and unknown values of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

In addition, the formula contains the logical argument “Const”. If you enter 1 (TRUE) in the corresponding field or leave it empty, the coefficient b is calculated in the usual way; if you enter 0 (FALSE), the calculation is carried out assuming that b = 0.

If you need to find out the forecast for more than one x value, then after entering the formula you should not press “Enter”, but you need to type the combination “Shift” + “Control” + “Enter” on the keyboard.
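For readers who want to see what such a calculation does for a single independent variable, here is a rough Python analogue. It is a sketch, not Excel's implementation: the function name `trend` and its keyword arguments are chosen for illustration, and the handling of `const` follows Excel's documented behaviour (FALSE forces b = 0).

```python
def trend(known_y, known_x=None, new_x=None, const=True):
    """Rough single-variable analogue of a TREND-style forecast."""
    n = len(known_y)
    if known_x is None:
        # Omitted known x values default to 1, 2, 3, ...
        known_x = list(range(1, n + 1))
    if new_x is None:
        # Omitted new x values default to the known ones.
        new_x = known_x
    if len(known_x) != n:
        raise ValueError("known_x and known_y must have the same length")

    sx = sum(known_x)
    sy = sum(known_y)
    sxx = sum(x * x for x in known_x)
    sxy = sum(x * y for x, y in zip(known_x, known_y))

    if const:
        # Ordinary least squares line y = a*x + b.
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        b = (sy - a * sx) / n
    else:
        # Line through the origin: minimise sum((y - a*x)^2).
        a = sxy / sxx
        b = 0.0
    return [a * x + b for x in new_x]


# Hypothetical usage: forecast for two new x values at once.
print(trend([1.0, 3.0, 5.0], [0, 1, 2], [3, 4]))
```

Returning a list for several new x values mirrors entering the worksheet formula as an array formula.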

Some features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables—TREND—can be used even by those who have never heard of least squares. It is enough just to know some of the features of its work. In particular:

If the range of known y values is arranged in one column (row), then each column (row) of known x values is treated by the program as a separate variable.
If the TREND window does not specify a range of known x values, Excel treats it as the array of integers 1; 2; 3; …, whose length matches the range of given y values.
To output an array of “predicted” values, the expression for calculating the trend must be entered as an array formula.
If new x values are not specified, the TREND function takes them to be equal to the known ones. If the known x values are not specified either, the array 1; 2; 3; 4; …, of the same size as the range of given y values, is used as the argument.
The range containing the new x values must have a column (row) for each independent variable, just like the range of known x values. In other words, if the known y values are in one column, the known and new x ranges must have the same number of columns; if they are in one row, the same number of rows.
An array of known x values can contain multiple variables. However, if there is only one variable, the ranges of given x and y values only need to have the same dimensions. In the case of several variables, the range of given y values must fit in one column or one row.

FORECAST function

Regression analysis in Excel is implemented using several functions. One of them is called FORECAST. It is similar to TREND, i.e. it gives the result of a least squares calculation, but only for a single X for which the value of Y is unknown.

Now you know Excel formulas, accessible even to beginners, that let you predict the future value of a particular indicator according to a linear trend.

The task is to find the coefficients of the linear dependence for which the function of two variables $a$ and $b$, $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$, takes the smallest value. That is, for these $a$ and $b$ the sum of squared deviations of the experimental data from the fitted straight line is the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving the formulas for the coefficients. A system of two equations with two unknowns is set up and solved: we find the partial derivatives of the function with respect to $a$ and $b$ and equate them to zero.

We solve the resulting system of equations by any method (for example, substitution or Cramer's method) and obtain the formulas for the coefficients of the least squares method (LSM):

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n}$$

For these $a$ and $b$ the function takes its smallest value.

That is the whole method of least squares. The formula for the parameter $a$ contains the sums $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$ and the parameter $n$, the number of experimental data points. We recommend calculating these sums separately. The coefficient $b$ is found after $a$ has been calculated.

The main area of application of least squares approximating polynomials is the processing of experimental data (construction of empirical formulas). The point is that an interpolation polynomial constructed from function values obtained by experiment is strongly influenced by “experimental noise”; moreover, interpolation nodes cannot be repeated, i.e. the results of repeated experiments under the same conditions cannot be used. A least squares polynomial smooths out the noise and lets you use the results of multiple experiments.

Numerical integration and differentiation. Example.

Numerical integration is the (usually approximate) calculation of the value of a definite integral. The term also denotes the set of numerical methods for finding the value of a definite integral.

Numerical differentiation is a set of methods for calculating the value of the derivative of a discretely specified function.

Integration

Formulation of the problem. Mathematical problem statement: find the value of the definite integral

$$I = \int_a^b f(x)\,dx,$$

where $a$, $b$ are finite and $f(x)$ is continuous on $[a, b]$.

When solving practical problems, it often happens that the integral is inconvenient or impossible to take analytically: it may not be expressible in elementary functions, the integrand may be specified as a table, etc. In such cases, numerical integration methods are used. These methods replace the area of a curvilinear trapezoid with a finite sum of areas of simpler geometric shapes that can be calculated exactly. In this sense, one speaks of using quadrature formulas.

Most methods represent the integral as a finite sum (quadrature formula):

$$\int_a^b f(x)\,dx \approx \sum_{i} A_i f(x_i),$$

where $x_i$ are the nodes and $A_i$ are the weights of the formula.

Quadrature formulas are based on the idea of replacing the graph of the integrand on the integration segment with functions of a simpler type that can easily be integrated analytically and therefore easily calculated. The task of constructing quadrature formulas is most simply implemented for polynomial mathematical models.

Three groups of methods can be distinguished:

1. Method with dividing the integration segment into equal intervals. Partitioning into intervals is done in advance; usually the intervals are chosen equal (to make it easier to calculate the function at the ends of the intervals). Calculate areas and sum them up (rectangle, trapezoid, Simpson methods).

2. Methods with partitioning the integration segment using special points (Gauss method).

3. Calculation of integrals using random numbers (Monte Carlo method).

Rectangle method. Suppose the function $f(x)$ has to be integrated numerically on the segment $[a, b]$. Divide the segment into $N$ equal intervals with nodes $x_i = a + ih$, $i = 0, 1, \ldots, N$. The area of each of the $N$ curvilinear trapezoids can be replaced by the area of a rectangle.

The width of all rectangles is the same and equals:

$$h = \frac{b - a}{N}$$

To select the height of the rectangles, you can take the value of the function on the left border. In this case, the height of the first rectangle is $f(a)$, of the second $f(x_1)$, ..., of the $N$-th $f(x_{N-1})$:

$$I \approx h\sum_{i=0}^{N-1} f(x_i)$$

If we take the value of the function on the right border as the height of the rectangle, then the height of the first rectangle is $f(x_1)$, of the second $f(x_2)$, ..., of the $N$-th $f(x_N)$:

$$I \approx h\sum_{i=1}^{N} f(x_i)$$

As you can see, for a monotonic integrand one of these formulas approximates the integral with an excess and the other with a deficiency. There is another way: to use the value of the function in the middle of each interval for the approximation:

$$I \approx h\sum_{i=0}^{N-1} f\!\left(x_i + \frac{h}{2}\right)$$

Estimate of the absolute error of the midpoint rectangle method:

$$|R| \le \frac{(b - a)\,h^2}{24}\max_{[a, b]} |f''(x)|$$

Estimate of the absolute error of the left and right rectangle methods:

$$|R| \le \frac{(b - a)\,h}{2}\max_{[a, b]} |f'(x)|$$

Example. Calculate $\int_0^1 \frac{dx}{1 + x^2}$ using the entire interval as a single section and then dividing the interval into four sections.

Solution. Analytical calculation of this integral gives $I = \arctan(1) - \arctan(0) = 0.7853981634$. In our case:

1) $h = 1$; $x_0 = 0$; $x_1 = 1$;

2) $h = 0.25$ (1/4); $x_0 = 0$; $x_1 = 0.25$; $x_2 = 0.5$; $x_3 = 0.75$; $x_4 = 1$;

Let's calculate using the left rectangle method:

Let's calculate using the right rectangle method:

Let's calculate using the midpoint rectangle method:
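The three calculations can be reproduced with a short Python sketch. The integrand $1/(1 + x^2)$ on $[0, 1]$ is inferred from the analytical answer $\arctan(1) - \arctan(0)$ given in the solution; the function name `rectangles` and its `rule` argument are illustrative.

```python
def f(x):
    # Integrand inferred from the exact answer arctan(1) - arctan(0).
    return 1.0 / (1.0 + x * x)


def rectangles(func, a, b, n, rule="mid"):
    """Composite rectangle rule with n equal intervals.

    rule = "left"  -> heights f(x_0), ..., f(x_{n-1})
    rule = "right" -> heights f(x_1), ..., f(x_n)
    rule = "mid"   -> heights at the midpoints of the intervals
    """
    h = (b - a) / n
    offset = {"left": 0.0, "mid": 0.5, "right": 1.0}[rule]
    return h * sum(func(a + (i + offset) * h) for i in range(n))


exact = 0.7853981634
for n in (1, 4):
    for rule in ("left", "right", "mid"):
        value = rectangles(f, 0.0, 1.0, n, rule)
        print(f"N={n} {rule:>5}: {value:.6f}  error={abs(value - exact):.6f}")
```

Because the integrand decreases on $[0, 1]$, the left rule overestimates the integral and the right rule underestimates it, while the midpoint rule lands much closer to the exact value.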

Trapezoid method. Using a first-degree polynomial (a straight line drawn through two points) for interpolation leads to the trapezoidal formula. The ends of the integration segment are taken as interpolation nodes. Thus, the curvilinear trapezoid is replaced by an ordinary trapezoid, whose area is the product of half the sum of the bases and the height:

$$I \approx \frac{f(a) + f(b)}{2}\,(b - a)$$

In the case of $N$ integration segments, for all nodes except the extreme points of the segment the value of the function enters the total sum twice (since adjacent trapezoids have one common side):

$$I \approx h\left(\frac{f(x_0) + f(x_N)}{2} + \sum_{i=1}^{N-1} f(x_i)\right)$$

The trapezoid formula can also be obtained by taking half the sum of the rectangle formulas for the left and right edges of the segments:

$$I \approx \frac{1}{2}\left(h\sum_{i=0}^{N-1} f(x_i) + h\sum_{i=1}^{N} f(x_i)\right)$$

Checking the stability of the solution. As a rule, the shorter each interval is, i.e. the larger the number of intervals, the smaller the difference between the approximate and exact values of the integral. This is true for most functions. In the trapezoid method, the error $\sigma$ in calculating the integral is approximately proportional to the square of the integration step ($\sigma \sim h^2$). Thus, to calculate the integral of a function over $[a, b]$, divide the segment into $N_0$ intervals and find the sum of the areas of the trapezoids. Then increase the number of intervals to $N_1$, calculate the sum of the trapezoid areas again, and compare the resulting value with the previous result. Repeat this for $N_2, N_3, \ldots, N_i$ until the specified accuracy of the result is achieved (convergence criterion).

For the rectangle and trapezoid methods, the number of intervals is usually doubled at each iteration step ($N_{i+1} = 2N_i$).

Convergence criterion:

$$\left|I_{N_{i+1}} - I_{N_i}\right| < \varepsilon,$$

where $I_N$ is the approximation obtained with $N$ intervals and $\varepsilon$ is the specified accuracy.

The main advantage of the trapezoidal rule is its simplicity. However, if high accuracy is required when calculating the integral, this method may require too many iterations.
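The doubling procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the names `trapezoid` and `integrate_by_doubling`, the starting value `n0`, and the cap `max_doublings` are not from the article, and the stopping test is the usual comparison of two successive approximations.

```python
def trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n equal intervals."""
    h = (b - a) / n
    inner = sum(func(a + i * h) for i in range(1, n))  # interior nodes
    return h * ((func(a) + func(b)) / 2 + inner)


def integrate_by_doubling(func, a, b, eps=1e-6, n0=2, max_doublings=20):
    """Double the number of intervals until two successive
    approximations differ by less than eps."""
    n = n0
    previous = trapezoid(func, a, b, n)
    for _ in range(max_doublings):
        n *= 2
        current = trapezoid(func, a, b, n)
        if abs(current - previous) < eps:
            return current, n
        previous = current
    raise RuntimeError("requested accuracy not reached")


value, intervals = integrate_by_doubling(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0)
print(value, intervals)
```

A sketch like this recomputes every node at each step; a production version would reuse the function values from the previous grid, since the old nodes are a subset of the new ones.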

The absolute error of the trapezoidal method is estimated as

$$|R| \le \frac{(b - a)\,h^2}{12}\max_{[a, b]} |f''(x)|.$$

Example. Approximately calculate a definite integral using the trapezoidal formula:

a) dividing the integration segment into 3 parts;
b) dividing the integration segment into 5 parts.

Solution:
a) By the condition, the integration segment must be divided into 3 parts, that is, n = 3.
Let's calculate the length of each partition segment, h = (b - a)/3.

Thus, the general trapezoidal formula reduces to a compact form:

Finally:

Recall that the resulting value is an approximate value of the area.

b) Let's divide the integration segment into 5 equal parts, that is, n = 5. By increasing the number of segments, we increase the accuracy of the calculation.

If n = 5, the trapezoidal formula takes the following form:

Let's find the partition step:
h = (b - a)/5 = 0.6, that is, the length of each intermediate segment is 0.6.

When writing up the solution, it is convenient to organize all calculations in a calculation table:

In the first row we write the “counter” i.

As a result:

The result is indeed refined, and significantly so.
Compare the value obtained for 3 partition segments with the value for 5 segments. If you take an even larger number of segments, the result will be even more accurate.

Simpson's formula. The trapezoid formula gives a result that depends strongly on the step size h, which affects the accuracy of the computed definite integral, especially when the function is non-monotonic. It can be assumed that the accuracy will increase if, instead of straight segments replacing curvilinear fragments of the graph of $f(x)$, we use, for example, fragments of parabolas drawn through three adjacent points of the graph. This geometric interpretation underlies Simpson's method for calculating the definite integral. The entire integration interval $[a, b]$ is divided into $N$ segments of length $h = (b - a)/N$.

For a pair of adjacent segments, Simpson's formula looks like:

$$\int_{x_{i-1}}^{x_{i+1}} f(x)\,dx \approx \frac{h}{3}\left(f(x_{i-1}) + 4f(x_i) + f(x_{i+1})\right)$$

with the remainder term

$$R = -\frac{h^5}{90}\, f^{(4)}(\xi), \qquad \xi \in [x_{i-1}, x_{i+1}].$$

As the length of the segments increases, the accuracy of the formula decreases, so to increase the accuracy, the composite Simpson formula is used. The entire integration interval is divided into an even number $N$ of identical segments of length $h = (b - a)/N$. The composite Simpson formula is:

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\Big[f(x_0) + 4\big(f(x_1) + f(x_3) + \ldots + f(x_{N-1})\big) + 2\big(f(x_2) + f(x_4) + \ldots + f(x_{N-2})\big) + f(x_N)\Big]$$

In the formula, the expressions in round brackets are the sums of the values of the integrand at the odd and even interior nodes, respectively.

The remainder of the composite Simpson formula is proportional to the fourth power of the step:

$$|R| \le \frac{(b - a)\,h^4}{180}\max_{[a, b]} \left|f^{(4)}(x)\right|$$
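A minimal Python sketch of the composite rule follows. The function name `simpson` is illustrative, and the test integrand $1/(1 + x^2)$ on $[0, 1]$ is reused from the rectangle example only as an assumption for demonstration.

```python
def simpson(func, a, b, n):
    """Composite Simpson rule; n must be a positive even number."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    h = (b - a) / n
    odd = sum(func(a + i * h) for i in range(1, n, 2))   # x_1, x_3, ...
    even = sum(func(a + i * h) for i in range(2, n, 2))  # x_2, x_4, ...
    return h / 3 * (func(a) + 4 * odd + 2 * even + func(b))


approx = simpson(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0, 4)
print(approx, abs(approx - 0.7853981634))
```

With the same four intervals, the error is several orders of magnitude smaller than for the rectangle rules, which illustrates the fourth-power dependence on the step.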

Example: Using Simpson's rule, calculate the integral. (Exact solution - 0.2)

Gauss method

Gaussian quadrature formula. The basic principle of quadrature formulas of the second type is visible from Figure 1.12: the points $x_0$ and $x_1$ must be placed inside the segment $[a; b]$ in such a way that the total area of the “triangles” is equal to the area of the “segment”. When using the Gauss formula, the original segment $[a; b]$ is reduced to the segment $[-1; 1]$ by replacing the variable $x$ with

$$x = 0.5\,(b - a)\,t + 0.5\,(b + a).$$

Then $\int_a^b f(x)\,dx = \frac{b - a}{2}\int_{-1}^{1} \varphi(t)\,dt$, where $\varphi(t) = f\big(0.5\,(b - a)\,t + 0.5\,(b + a)\big)$.

Such a replacement is possible if $a$ and $b$ are finite and the function $f(x)$ is continuous on $[a; b]$. The Gauss formula with $n$ points $x_i$, $i = 0, 1, \ldots, n-1$, inside the segment $[a; b]$ is:

$$\int_a^b f(x)\,dx \approx \frac{b - a}{2}\sum_{i=0}^{n-1} A_i\,\varphi(t_i), \qquad (1.27)$$

where $t_i$ and $A_i$ for various $n$ are given in reference books. For example, for $n = 2$: $t_0 = -t_1 \approx -0.577$, $A_0 = A_1 = 1$; for $n = 3$: $t_0 = -t_2 \approx -0.775$, $t_1 = 0$, $A_0 = A_2 \approx 0.556$, $A_1 \approx 0.889$.

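The Gaussian quadrature formula

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i)$$

is obtained with a weight function equal to unity, $p(x) = 1$, and nodes $x_i$ that are the roots of the Legendre polynomial $P_{n+1}(x)$.

The coefficients $A_i$ are easy to calculate using the formula

$$A_i = \frac{2}{\left(1 - x_i^2\right)\left[P'_{n+1}(x_i)\right]^2}, \qquad i = 0, 1, 2, \ldots, n.$$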

The values of the nodes and coefficients for n = 2, 3, 4, 5 are given in the table:

Order	Nodes	Coefficients
n=2	$x_1 = 0$; $x_2 = -x_0 = 0.7745966692$	$A_1 = 8/9$; $A_0 = A_2 = 5/9$
n=3	$x_2 = -x_1 = 0.3399810436$; $x_3 = -x_0 = 0.8611363116$	$A_1 = A_2 = 0.6521451549$; $A_0 = A_3 = 0.3478548451$
n=4	$x_2 = 0$; $x_3 = -x_1 = 0.5384693101$; $x_4 = -x_0 = 0.9061798459$	$A_2 = 0.5688888889$; $A_3 = A_1 = 0.4786286705$; $A_0 = A_4 = 0.2369268851$
n=5	$x_5 = -x_0 = 0.9324695142$; $x_4 = -x_1 = 0.6612093865$; $x_3 = -x_2 = 0.2386191861$	$A_5 = A_0 = 0.1713244924$; $A_4 = A_1 = 0.3607615730$; $A_3 = A_2 = 0.4679139346$

Example. Calculate the value using the Gauss formula for n=2:

Exact value: .

The algorithm for calculating the integral using the Gauss formula does not double the number of subintervals; instead it increases the number of ordinates by 1 and compares the successive values of the integral. The advantage of the Gauss formula is its high accuracy with a relatively small number of ordinates. Disadvantages: it is inconvenient for manual calculations, and the values $t_i$, $A_i$ for various $n$ have to be stored in the computer memory.
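The change of variable and the weighted sum can be sketched in Python as follows. Here `n` is the number of nodes, as in formula (1.27); the node and weight values for two and three nodes are the standard Gauss-Legendre ones, while the names `GAUSS_RULES` and `gauss` and the test integrand are illustrative assumptions.

```python
# Standard Gauss-Legendre nodes t_i and weights A_i on [-1, 1],
# keyed by the number of nodes.
GAUSS_RULES = {
    2: ([-0.5773502692, 0.5773502692], [1.0, 1.0]),
    3: ([-0.7745966692, 0.0, 0.7745966692], [5 / 9, 8 / 9, 5 / 9]),
}


def gauss(func, a, b, n=3):
    """Gauss quadrature on [a, b] with n nodes (n = 2 or 3 here)."""
    nodes, weights = GAUSS_RULES[n]
    half = 0.5 * (b - a)   # scale factor of the substitution
    mid = 0.5 * (b + a)    # shift of the substitution
    # x = half * t + mid maps [-1, 1] onto [a, b].
    return half * sum(w * func(half * t + mid) for t, w in zip(nodes, weights))


for n in (2, 3):
    print(n, gauss(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0, n))
```

A rule with n nodes integrates polynomials up to degree 2n - 1 exactly, which is why so few ordinates are needed; the stored table of nodes and weights is the price paid for that.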

The remainder term of the Gaussian quadrature formula on the segment contains a coefficient $\alpha_N$ that decreases quickly as $N$ grows.

Gaussian formulas provide high accuracy even with a small number of nodes (from 4 to 10). In this case, in practical calculations the number of nodes ranges from several hundred to several thousand. Note also that the weights of Gaussian quadratures are always positive, which ensures the stability of the algorithm for calculating the sums

Differentiation. When solving problems, it is often necessary to find the derivative of a certain order of a function $f(x)$ given as a table. In addition, sometimes the analytical expression of $f(x)$ is so complex that differentiating it directly is too difficult; the same need arises when differential equations are solved numerically. In these cases, numerical differentiation is used.
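As a simple illustration of differentiating a tabulated function, the sketch below uses finite differences. The source does not specify a method, so this is an assumption: central differences at interior nodes, one-sided differences at the ends, and a uniform step. The function name `derivative_table` is hypothetical.

```python
def derivative_table(xs, ys):
    """Approximate first derivative of a tabulated function.

    Assumes equally spaced xs. Interior points use the central
    difference (y[i+1] - y[i-1]) / (2h); the two end points use
    one-sided differences.
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two points and equal-length lists")
    h = xs[1] - xs[0]
    result = [(ys[1] - ys[0]) / h]                     # forward difference
    for i in range(1, n - 1):
        result.append((ys[i + 1] - ys[i - 1]) / (2 * h))
    result.append((ys[-1] - ys[-2]) / h)               # backward difference
    return result


# Table of y = x^2; the exact derivative is 2x.
print(derivative_table([0, 1, 2, 3], [0, 1, 4, 9]))
```

For a quadratic function the central difference is exact at interior nodes, while the one-sided values at the ends are visibly less accurate.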

After fitting, we obtain a function of the following form: $g(x) = \sqrt[3]{x + 1} + 1$.

We can also approximate these data with the linear relationship $y = ax + b$ by calculating the corresponding parameters. To do this, we need to apply the so-called least squares method. We will also need to make a drawing to check which line fits the experimental data better.

What exactly is OLS (least squares method)

The main thing we need to do is to find the coefficients of the linear dependence for which the value of the function of two variables $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ is the smallest. In other words, for certain values of $a$ and $b$, the sum of the squared deviations of the given data from the resulting straight line has its minimum value. This is the meaning of the least squares method. All we need to do to solve the example is to find the extremum of a function of two variables.

How to derive formulas for calculating coefficients

To derive the formulas for the coefficients, we need to set up and solve a system of two equations with two unknowns. To do this, we calculate the partial derivatives of the expression $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ with respect to $a$ and $b$ and equate them to 0.

$$\begin{cases} \dfrac{\partial F(a, b)}{\partial a} = 0 \\ \dfrac{\partial F(a, b)}{\partial b} = 0 \end{cases} \Leftrightarrow \begin{cases} -2\sum\limits_{i=1}^{n} (y_i - (a x_i + b))\,x_i = 0 \\ -2\sum\limits_{i=1}^{n} (y_i - (a x_i + b)) = 0 \end{cases} \Leftrightarrow \begin{cases} a\sum\limits_{i=1}^{n} x_i^2 + b\sum\limits_{i=1}^{n} x_i = \sum\limits_{i=1}^{n} x_i y_i \\ a\sum\limits_{i=1}^{n} x_i + \sum\limits_{i=1}^{n} b = \sum\limits_{i=1}^{n} y_i \end{cases} \Leftrightarrow \begin{cases} a\sum\limits_{i=1}^{n} x_i^2 + b\sum\limits_{i=1}^{n} x_i = \sum\limits_{i=1}^{n} x_i y_i \\ a\sum\limits_{i=1}^{n} x_i + n b = \sum\limits_{i=1}^{n} y_i \end{cases}$$

To solve the system of equations, you can use any method, for example, substitution or Cramer's method. As a result, we obtain the formulas for calculating the coefficients by the least squares method.

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n}$$

We have calculated the values of the variables for which the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ takes its minimum value. In the third section we will prove why this is so.

This is the application of the least squares method in practice. The formula for the parameter $a$ includes $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} x_i^2$, as well as the parameter $n$, which denotes the number of experimental data points. We advise you to calculate each sum separately. The value of the coefficient $b$ is calculated immediately after $a$.

Let's go back to the original example.

Example 1

Here $n = 5$. To make it more convenient to calculate the sums that enter the coefficient formulas, let's fill out the table.

	i = 1	i = 2	i = 3	i = 4	i = 5	$\sum_{i=1}^{5}$
$x_i$	0	1	2	4	5	12
$y_i$	2.1	2.4	2.6	2.8	3	12.9
$x_i y_i$	0	2.4	5.2	11.2	15	33.8
$x_i^2$	0	1	4	16	25	46

Solution

The fourth row includes the data obtained by multiplying the values from the second row by the values of the third for each individual i. The fifth line contains the data from the second, squared. The last column shows the sums of the values of individual rows.

Let's use the least squares method to calculate the coefficients $a$ and $b$ we need. To do this, we substitute the required values from the last column into the formulas:

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} \approx 0.165, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n} = \frac{12.9 - a \cdot 12}{5} \approx 2.184$$

It turns out that the required approximating straight line is $y = 0.165x + 2.184$. Now we need to determine which line approximates the data better: $g(x) = \sqrt[3]{x + 1} + 1$ or $y = 0.165x + 2.184$. Let's make the estimate using the least squares criterion.

To calculate the error, we need to find the sums of squared deviations of the data from the two lines, $\sigma_1 = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ and $\sigma_2 = \sum_{i=1}^{n} (y_i - g(x_i))^2$; the smaller value corresponds to the more suitable line.

$$\sigma_1 = \sum_{i=1}^{n} (y_i - (a x_i + b))^2 = \sum_{i=1}^{5} (y_i - (0.165 x_i + 2.184))^2 \approx 0.019$$

$$\sigma_2 = \sum_{i=1}^{n} (y_i - g(x_i))^2 = \sum_{i=1}^{5} \left(y_i - \left(\sqrt[3]{x_i + 1} + 1\right)\right)^2 \approx 0.096$$
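The arithmetic of this example can be checked with a short Python sketch. It uses the data from the table and assumes that $g(x)$ is the cube-root function $\sqrt[3]{x + 1} + 1$, which reproduces the stated value of $\sigma_2$; the variable names are illustrative.

```python
xs = [0, 1, 2, 4, 5]
ys = [2.1, 2.4, 2.6, 2.8, 3.0]
n = len(xs)

# Sums from the last column of the table.
sx = sum(xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

# Least squares coefficients of y = a*x + b.
a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b = (sy - a * sx) / n


def g(x):
    # Assumed form of the competing curve: cube root of (x + 1), plus 1.
    return (x + 1) ** (1 / 3) + 1


# Sums of squared deviations for the line and for g(x).
sigma1 = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
sigma2 = sum((y - g(x)) ** 2 for x, y in zip(xs, ys))

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sigma1 = {sigma1:.3f}, sigma2 = {sigma2:.3f}")
```

Computing the four sums first and only then the coefficients mirrors the advice to calculate each sum separately.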

Answer: since $\sigma_1 < \sigma_2$, the straight line that best approximates the original data is
$y = 0.165x + 2.184$.

The least squares method is clearly shown in the graphical illustration. The red line is the curve $g(x) = \sqrt[3]{x + 1} + 1$, and the blue line is $y = 0.165x + 2.184$. The original data are indicated by pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.

Proof of the OLS method

For the function to take its minimum value at the calculated $a$ and $b$, it is sufficient that at this point the matrix of the quadratic form of the second-order differential of the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ be positive definite. Let's show what it looks like.

Example 2

We have a second-order differential of the following form:

$$d^2 F(a; b) = \frac{\partial^2 F(a; b)}{\partial a^2}\,da^2 + 2\,\frac{\partial^2 F(a; b)}{\partial a\,\partial b}\,da\,db + \frac{\partial^2 F(a; b)}{\partial b^2}\,db^2$$

Solution

$$\frac{\partial^2 F(a; b)}{\partial a^2} = \frac{\partial}{\partial a}\left(\frac{\partial F(a; b)}{\partial a}\right) = \frac{\partial}{\partial a}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\,x_i\right) = 2\sum_{i=1}^{n} x_i^2$$

$$\frac{\partial^2 F(a; b)}{\partial a\,\partial b} = \frac{\partial}{\partial b}\left(\frac{\partial F(a; b)}{\partial a}\right) = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\,x_i\right) = 2\sum_{i=1}^{n} x_i$$

$$\frac{\partial^2 F(a; b)}{\partial b^2} = \frac{\partial}{\partial b}\left(\frac{\partial F(a; b)}{\partial b}\right) = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\right) = 2\sum_{i=1}^{n} 1 = 2n$$

In other words, we can write it like this: $d^2 F(a; b) = 2\sum_{i=1}^{n} x_i^2\,da^2 + 2 \cdot 2\sum_{i=1}^{n} x_i\,da\,db + 2n\,db^2$.

We have obtained the matrix of the quadratic form $M = \begin{pmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{pmatrix}$.

In this case, the values of the individual elements do not depend on $a$ and $b$. Is this matrix positive definite? To answer this question, let's check whether its leading principal (corner) minors are positive.

We calculate the first-order corner minor: $2\sum_{i=1}^{n} x_i^2 > 0$. Since the points $x_i$ do not coincide, the inequality is strict. We will keep this in mind in further calculations.

We calculate the second-order corner minor:

$$\det(M) = \begin{vmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{vmatrix} = 4\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)$$

After this, we proceed to prove the inequality $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$ by mathematical induction.

First, let's check the base case. We take $n = 2$ and calculate:

$$2\sum_{i=1}^{2} x_i^2 - \left(\sum_{i=1}^{2} x_i\right)^2 = 2\left(x_1^2 + x_2^2\right) - (x_1 + x_2)^2 = x_1^2 - 2x_1 x_2 + x_2^2 = (x_1 - x_2)^2 > 0$$

We have obtained a correct inequality (if the values $x_1$ and $x_2$ do not coincide).

Let us assume that this inequality is true for $n$, i.e. $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$ holds.
Now we will prove its validity for $n + 1$, i.e. that $(n + 1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 > 0$ if $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$.

We calculate:

$$\begin{aligned} (n + 1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 &= (n + 1)\left(\sum_{i=1}^{n} x_i^2 + x_{n+1}^2\right) - \left(\sum_{i=1}^{n} x_i + x_{n+1}\right)^2 \\ &= n\sum_{i=1}^{n} x_i^2 + n\,x_{n+1}^2 + \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 - \left(\left(\sum_{i=1}^{n} x_i\right)^2 + 2x_{n+1}\sum_{i=1}^{n} x_i + x_{n+1}^2\right) \\ &= \left\{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right\} + n\,x_{n+1}^2 - 2x_{n+1}\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 \\ &= \left\{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right\} + \left(x_{n+1}^2 - 2x_{n+1}x_1 + x_1^2\right) + \ldots + \left(x_{n+1}^2 - 2x_{n+1}x_n + x_n^2\right) \\ &= \left\{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right\} + (x_{n+1} - x_1)^2 + (x_{n+1} - x_2)^2 + \ldots + (x_{n+1} - x_n)^2 > 0 \end{aligned}$$

The expression enclosed in curly braces is greater than 0 (by the induction assumption), and the remaining terms are non-negative, since they are all squares of numbers. We have proven the inequality.

Answer: the found $a$ and $b$ correspond to the smallest value of the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$, which means they are the desired parameters of the least squares method (LSM).
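The positivity of the two corner minors can also be checked numerically for a concrete data set. The sketch below uses the x values 0, 1, 2, 4, 5 from Example 1; the function name `hessian_minors` is hypothetical.

```python
def hessian_minors(xs):
    """Return the first- and second-order corner minors of
    M = [[2*sum(x^2), 2*sum(x)], [2*sum(x), 2*n]]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    first = 2 * sxx
    second = (2 * sxx) * (2 * n) - (2 * sx) ** 2  # equals 4*(n*sxx - sx^2)
    return first, second


first, second = hessian_minors([0, 1, 2, 4, 5])
print(first, second)

# If all x values coincide, the second minor vanishes and the
# minimum is no longer unique.
print(hessian_minors([3, 3, 3]))
```

The second call shows why the proof needs the points $x_i$ to be distinct: with identical x values the determinant is zero and the matrix is only positive semidefinite.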

The least squares method finds the widest application in various fields of science and practice: physics, chemistry, biology, economics, sociology, psychology, and so on. By the will of fate, I often have to deal with economics, and so today I will arrange for you a trip to an amazing country called Econometrics =) ...How can you not want to go?! It's very nice there, you just need to make up your mind! ...But what you almost certainly do want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only accurately, but also VERY QUICKLY ;-) But first, the general statement of the problem plus an accompanying example:

Suppose that in some subject area we study indicators $x$ and $y$ that have a quantitative expression. At the same time, there is every reason to believe that the indicator $y$ depends on the indicator $x$. This assumption can be either a scientific hypothesis or based on elementary common sense. Let's leave science aside, however, and explore more appetizing areas, namely grocery stores. Let's denote by:

$x$ – retail area of a grocery store, sq. m,
$y$ – annual turnover of a grocery store, million rubles.

It is absolutely clear that the larger the store area, the greater in most cases its turnover will be.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have numerical data at our disposal:

With grocery stores, I think everything is clear: $x_1$ is the area of the 1st store, $y_1$ is its annual turnover, $x_2$ is the area of the 2nd store, $y_2$ is its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of trade turnover can be obtained by means of mathematical statistics. However, let's not get distracted; the commercial espionage course is a paid one =)

Tabular data can also be written in the form of points $M_1(x_1; y_1), M_2(x_2; y_2), \ldots, M_n(x_n; y_n)$ and plotted in the familiar Cartesian coordinate system.

Let's answer an important question: how many points are needed for a sound study?

The more, the better. The minimum acceptable set consists of 5-6 points. In addition, when the amount of data is small, “anomalous” results must not be included in the sample. So, for example, a small elite store can earn orders of magnitude more than “its colleagues,” thereby distorting the general pattern, which is exactly what needs to be found!

To put it very simply, we need to select a function whose graph passes as close as possible to the points $M_1, M_2, \ldots, M_n$. This function is called the approximating (approximation means bringing closer) or theoretical function. Generally speaking, an obvious “contender” immediately appears here: a high-degree polynomial whose graph passes through ALL the points. But this option is complicated and often simply incorrect (since the graph will “loop” all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time adequately reflect the dependence. As you might guess, one of the methods for finding such functions is called the least squares method. First, let's look at its essence in general terms. Let some function $y = f(x)$ approximate the experimental data:

How can we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) $e_i = y_i - f(x_i)$ between the experimental and functional values. The first thought that comes to mind is to estimate how large the sum $e_1 + e_2 + \dots + e_n$ is, but the problem is that the differences can be negative, so the deviations will cancel each other out in such a sum. Therefore, as an estimate of the accuracy of the approximation, it is natural to take the sum of the absolute values of the deviations:

$$|e_1| + |e_2| + \dots + |e_n|$$

or, in collapsed form, $\sum_{i=1}^{n} |e_i|$ (in case anyone doesn't know: $\sum$ is the summation sign, and $i$ is an auxiliary “counter” variable that takes values from 1 to $n$).

Approximating the experimental points with different functions, we will obtain different values of this sum of absolute deviations, and obviously the function for which the sum is smaller is the more accurate one.

Such a method exists, and it is called the least absolute deviations method (least modulus method). In practice, however, the least squares method has become far more widespread. In it, possible negative values are eliminated not by the absolute value but by squaring the deviations $e_i = y_i - f(x_i)$:

$$e_1^2 + e_2^2 + \dots + e_n^2 = \sum_{i=1}^{n} e_i^2,$$

after which efforts are aimed at selecting a function for which the sum of squared deviations is as small as possible. This is where the name of the method comes from.
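The difference between the three criteria (plain sum, sum of absolute values, sum of squares) can be seen in a small sketch. The data points and the two candidate functions below are invented purely for illustration and do not come from the article; the helper names `deviations`, `sum_abs` and `sum_sq` are likewise hypothetical.

```python
# Hypothetical measurements, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def deviations(f, xs, ys):
    """Differences e_i = y_i - f(x_i) between experimental and function values."""
    return [y - f(x) for x, y in zip(xs, ys)]


def sum_abs(f, xs, ys):
    """Least absolute deviations criterion: sum of |e_i|."""
    return sum(abs(e) for e in deviations(f, xs, ys))


def sum_sq(f, xs, ys):
    """Least squares criterion: sum of e_i squared."""
    return sum(e * e for e in deviations(f, xs, ys))


# Two candidate functions to compare (also hypothetical).
candidates = {
    "y = 2x": lambda x: 2 * x,
    "y = x + 3": lambda x: x + 3,
}

for name, f in candidates.items():
    plain = sum(deviations(f, xs, ys))
    print(name,
          "| plain sum:", round(plain, 2),
          "| sum of |e|:", round(sum_abs(f, xs, ys), 2),
          "| sum of e^2:", round(sum_sq(f, xs, ys), 2))
```

For this made-up data the plain sums of deviations coincide for both candidates because positive and negative deviations cancel, while both the sum of absolute values and the sum of squares clearly favour the first candidate.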

And now we return to another important point: as noted above, the selected function should be fairly simple, but there are many such functions too: linear, hyperbolic, exponential, logarithmic, quadratic, etc. And, of course, here one would immediately like to “narrow the field of activity.” Which class of functions should be chosen for study? A primitive but effective technique:

– The easiest way is to plot the points on a drawing and analyze their arrangement. If they tend to lie along a straight line, then you should look for the equation of a line $y = ax + b$ with optimal values of $a$ and $b$. In other words, the task is to find SUCH coefficients that the sum of squared deviations is the smallest.

If the points lie, for example, along a hyperbola, then it is clear in advance that a linear function will give a poor approximation. In this case, we look for the most “favorable” coefficients for the hyperbola equation $y = \frac{a}{x} + b$: those that give the minimum sum of squares.

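Now note that in both cases we are dealing with functions of two variables whose arguments are the parameters of the sought dependence:

$$F(a, b) = \sum_{i=1}^{n} (a x_i + b - y_i)^2 \ \text{(line)}, \qquad F(a, b) = \sum_{i=1}^{n} \left( \frac{a}{x_i} + b - y_i \right)^2 \ \text{(hyperbola)}.$$

So, essentially, we need to solve a standard problem: find the minimum of a function of two variables.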

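Let's return to our example: suppose the “store” points tend to lie along a straight line and there is every reason to believe that turnover depends linearly on retail space. Let's find SUCH coefficients “a” and “b” that the sum of squared deviations $F(a, b) = \sum_{i=1}^{n} (a x_i + b - y_i)^2$ is the smallest. Everything is as usual: first, the 1st-order partial derivatives. By the linearity rule, we can differentiate right under the summation sign:

$$F'_a = \sum_{i=1}^{n} 2 (a x_i + b - y_i) \cdot x_i = 2 \sum_{i=1}^{n} (a x_i + b - y_i) x_i, \qquad F'_b = \sum_{i=1}^{n} 2 (a x_i + b - y_i) \cdot 1 = 2 \sum_{i=1}^{n} (a x_i + b - y_i).$$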

If you want to use this information for an essay or term paper, I will be very grateful for a link in the list of sources; you will find calculations this detailed in few places.

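Let's set up the standard system by equating both partial derivatives to zero:

$$\begin{cases} 2 \sum_{i=1}^{n} (a x_i + b - y_i) x_i = 0 \\ 2 \sum_{i=1}^{n} (a x_i + b - y_i) = 0 \end{cases}$$

We cancel the factor of 2 in each equation and, in addition, “break up” the sums:

$$\begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i = 0 \\ a \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} b - \sum_{i=1}^{n} y_i = 0 \end{cases}$$

Note: work out for yourself why “a” and “b” can be taken outside the summation sign. By the way, formally this can also be done with the sum $\sum_{i=1}^{n} b = b \sum_{i=1}^{n} 1 = bn$.

Let's rewrite the system in “applied” form:

$$\begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\ a \sum_{i=1}^{n} x_i + bn = \sum_{i=1}^{n} y_i \end{cases}$$

after which the algorithm for solving our problem begins to emerge: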

Do we know the coordinates of the points? We do. Can we find the sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$ and $\sum x_i y_i$? Easily. We set up the simplest system of two linear equations in two unknowns (“a” and “b”). We solve the system, for example, by Cramer's method, and obtain a stationary point. By checking the sufficient condition for an extremum, we can verify that the function reaches exactly a minimum at this point. The check involves additional calculations, so we will leave it behind the scenes. We draw the final conclusion:

The function $y = ax + b$ approximates the experimental points in the best way (at least compared with any other linear function). Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.
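The algorithm just described (compute four sums, then solve a 2x2 linear system by Cramer's method) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `fit_line` and the sample points are hypothetical, and the points were chosen to lie exactly on a line so that the answer is easy to verify by eye.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b via the normal equations and Cramer's rule."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   a*sxx + b*sx = sxy
    #   a*sx  + b*n  = sy
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the line is not determined")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Hypothetical points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

With noisy data the same function returns the coefficients of the line with the smallest sum of squared deviations; the exact-fit data above is used only because the expected result is obvious.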

The problem under consideration is of great practical importance. In our example, the equation $y = ax + b$ lets you predict what turnover (“y”) a store will have for one or another value of the sales area (one or another value of “x”). Yes, the resulting forecast will only be a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with “real” numbers, since there is nothing difficult in it: all calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases you will be asked to find a linear function, but at the very end of the article I will show that finding the equations of the optimal hyperbola, exponential and some other functions is no harder.
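To illustrate why a hyperbola is no harder than a line, here is a minimal sketch assuming the hyperbola has the form y = a/x + b. With the substitution t = 1/x the model becomes y = a*t + b, so the same normal equations apply. The function names and the data (chosen to lie exactly on y = 6/x + 1) are hypothetical.

```python
def fit_line(ts, ys):
    """Least-squares line y = a*t + b via the normal equations."""
    n = len(ts)
    st = sum(ts)
    sy = sum(ys)
    stt = sum(t * t for t in ts)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = stt * n - st * st
    if det == 0:
        raise ValueError("all t values coincide; the fit is not determined")
    return (sty * n - st * sy) / det, (stt * sy - st * sty) / det


def fit_hyperbola(xs, ys):
    """Fit y = a/x + b by substituting t = 1/x (all x must be nonzero)."""
    ts = [1 / x for x in xs]
    return fit_line(ts, ys)


# Hypothetical points lying exactly on y = 6/x + 1.
xs = [1, 2, 3, 6]
ys = [7, 4, 3, 2]
a, b = fit_hyperbola(xs, ys)
print(round(a, 6), round(b, 6))
```

Because the model is linear in the parameters a and b, this substitution minimizes exactly the same sum of squared deviations as a direct fit of the hyperbola would.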

In fact, all that remains is to hand out the promised goodies, so that you learn to solve such examples not only accurately but also quickly. Let's study a typical example carefully:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing showing the experimental points and the graph of the approximating function in a Cartesian rectangular coordinate system. Find the sum of squared deviations between the empirical and theoretical values. Find out whether the proposed exponential function would approximate the experimental points better (from the point of view of the least squares method).

Note that the “x” values are natural numbers, and this has a characteristic meaning, which I will discuss a little later; but they can, of course, also be fractional. In addition, depending on the content of a particular problem, both “x” and “y” values can be completely or partially negative. Well, we have been given a “faceless” problem, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system:

$$\begin{cases} a \sum x_i^2 + b \sum x_i = \sum x_i y_i \\ a \sum x_i + bn = \sum y_i \end{cases}$$

For a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation runs from 1 to $n$.

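It is more convenient to calculate the required sums $\sum x_i$, $\sum y_i$, $\sum x_i^2$ and $\sum x_i y_i$ in tabular form, with one column for each of $x_i$, $y_i$, $x_i^2$ and $x_i y_i$ and a totals row at the bottom.

The calculations can be done on a calculator, but it is much better to use Excel: it is both faster and free of errors.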
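For readers who prefer a script to a spreadsheet, the tabular calculation can be sketched as follows. The data set is invented for illustration and is not the one from the problem statement; the variable names are hypothetical.

```python
# Hypothetical data, invented for illustration (not the article's numbers).
xs = [1, 2, 3, 4, 5]
ys = [4.0, 3.5, 2.5, 2.0, 1.0]

# One row per point: x, y, x^2, x*y.
rows = [(x, y, x * x, x * y) for x, y in zip(xs, ys)]

print(f"{'x':>6}{'y':>8}{'x^2':>8}{'x*y':>8}")
for x, y, xx, xy in rows:
    print(f"{x:>6}{y:>8}{xx:>8}{xy:>8}")

# Totals row: column-wise sums.
sum_x, sum_y, sum_xx, sum_xy = (sum(col) for col in zip(*rows))
print(f"{sum_x:>6}{sum_y:>8}{sum_xx:>8}{sum_xy:>8}")

# The totals are exactly the coefficients of the system for a and b.
n = len(xs)
print(f"{sum_xx}a + {sum_x}b = {sum_xy}")
print(f"{sum_x}a + {n}b = {sum_y}")
```

The last two printed lines are the system of two linear equations whose solution gives the coefficients of the optimal line for this sample data.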

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd equation from the 1st term by term. But that is luck: in practice, systems are often no gift, and in such cases Cramer's method comes to the rescue. The main determinant of the system is nonzero, which means the system has a unique solution.

Let's check. I understand that you don't want to, but why let errors through where they can so reliably be caught? Let us substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means that the system is solved correctly.
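The solve-then-check routine can be sketched as below. The system shown is hypothetical (it is not the system from the problem), and the variable names are chosen only for this illustration.

```python
import math

# Hypothetical system (not the article's numbers):
#   55a + 15b = 31.5
#   15a +  5b = 13
a11, a12, c1 = 55, 15, 31.5
a21, a22, c2 = 15, 5, 13

# Cramer's method: main determinant and the two auxiliary ones.
det = a11 * a22 - a12 * a21
if det == 0:
    raise ValueError("the system has no unique solution")
det_a = c1 * a22 - a12 * c2
det_b = a11 * c2 - c1 * a21
a, b = det_a / det, det_b / det

# Check: substitute the solution into the left-hand sides.
left1 = a11 * a + a12 * b
left2 = a21 * a + a22 * b
print("a =", a, "b =", b)
print("check passed:", math.isclose(left1, c1) and math.isclose(left2, c2))
```

The comparison uses `math.isclose` rather than `==` because floating-point arithmetic may introduce tiny rounding errors in the substituted values.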

Thus, the desired approximating function is found: of all linear functions, it is the one that best approximates the experimental data.

Unlike the direct dependence of a store's turnover on its area, the dependence found here is inverse (the principle “the more, the less”), and this fact is immediately revealed by the negative slope. The function tells us that when a certain indicator increases by 1 unit, the value of the dependent indicator decreases by 0.65 units on average. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find two of its values (two points are enough to draw a straight line)

and make the drawing:

The constructed straight line is called a trend line (more precisely, a linear trend line; in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression “to be in trend,” and I think this term needs no additional comment.

Let's calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the “crimson” segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:

Again, the calculations can be done by hand, but it is much more efficient to do them in Excel, as before.

Let us repeat once more: what is the meaning of the result obtained? Of all linear functions, the function found has the smallest sum of squared deviations, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is no accident: what if the proposed exponential function approximates the experimental points better?

Let's find the corresponding sum of squared deviations; to tell them apart, I will denote these deviations by the letter “epsilon” ($\varepsilon_i$). The technique is exactly the same, and in Excel we use the standard function EXP (its syntax can be found in Excel Help).

Conclusion: the sum of squared deviations for the exponential function is larger than for the straight line, which means that the exponential function approximates the experimental points worse than the straight line.
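The comparison of two candidate functions by their sums of squared deviations can be sketched as follows. Everything here is an assumption made for illustration: the data set is invented, the line is the least-squares line for that invented data, and the exponential candidate `5 * exp(-0.25 * x)` is hypothetical rather than the function from the problem.

```python
import math

# Hypothetical data, invented for illustration (not the article's numbers).
xs = [1, 2, 3, 4, 5]
ys = [4.0, 3.5, 2.5, 2.0, 1.0]


def sum_sq(f):
    """Sum of squared deviations between the data and the function f."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


def line(x):
    """Least-squares line for the hypothetical data above."""
    return -0.75 * x + 4.85


def expo(x):
    """A hypothetical exponential candidate to compare against the line."""
    return 5 * math.exp(-0.25 * x)


sse_line = sum_sq(line)
sse_exp = sum_sq(expo)
print("line:       ", round(sse_line, 4))
print("exponential:", round(sse_exp, 4))
print("better fit: ", "line" if sse_line < sse_exp else "exponential")
```

For this sample the line wins, but that is a property of the invented data and of this particular exponential; another data set or another exponential could reverse the outcome.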

But it should be noted here that “worse” does not yet mean “bad.” I have just plotted the graph of this exponential function, and it also passes close to the points, so close that without an analytical study it is hard to say which function is more accurate.

This concludes the solution, and I return to the question of the natural values of the argument. In various studies, usually economic or sociological, natural “X’s” are used to number months, years or other equal time intervals. Consider, for example, the following problem.

3.5. Least squares method

The first work that laid the foundations of the least squares method was published by Legendre in 1805. In the article “New methods for determining the orbits of comets,” he wrote: “After all the conditions of the problem have been fully used, it is necessary to determine the coefficients so that the magnitudes of their errors are the smallest possible. The simplest way to achieve this is the method that consists of finding the minimum of the sum of squared errors.” Today the method is used very widely to approximate unknown functional dependencies given by a set of experimental samples, in order to obtain an analytical expression that fits the full-scale experiment as closely as possible.

Suppose that, on the basis of an experiment, we need to establish the functional dependence of the quantity $y$ on $x$: $y = f(x)$. Let us assume that the experiment yielded $n$ values of $y$ for the corresponding values of the argument $x$. If the experimental points are located on the coordinate plane as in the figure, then, knowing that errors occur during the experiment, we can assume that the dependence is linear, i.e. $y = ax + b$. Note that the method imposes no restrictions on the type of function, i.e. it can be applied to any functional dependence.

From the experimenter's point of view, it is often more natural to consider the sequence of sample points $x_i$ to be fixed in advance, i.e. to be the independent variable, and the readings $y_i$ to be the dependent variable. This is especially clear if the $x_i$ are understood as moments in time, which is the most common case in technical applications. But this is only a very common special case. For example, suppose some samples need to be classified by size. Then the independent variable is the sample number, and the dependent variable is its individual size.

The least squares method is described in detail in many educational and scientific publications, especially in terms of approximation of functions in electrical and radio engineering, as well as in books on probability theory and mathematical statistics.

Let's return to the drawing. The dotted lines show that errors can arise not only from imperfect measurement procedures but also from inaccuracy in specifying the independent variable. Once the type of function has been chosen, all that remains is to select the parameters $a$ and $b$ that it contains. Clearly, the number of parameters can be greater than two; exactly two parameters are typical only of linear functions. In general, we will assume

$$y = f(x; a, b, c, \dots). \qquad (1)$$

We need to choose the coefficients $a, b, c, \dots$ so that the condition

$$\sum_{i=1}^{n} \left[ y_i - f(x_i; a, b, c, \dots) \right]^2 \to \min \qquad (2)$$

is fulfilled.

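Let's find the values $a, b, c, \dots$ that minimize the left side of (2). To do this, we determine the stationary points (points at which the first derivatives vanish) by differentiating the left side of (2) with respect to $a, b, c, \dots$:

$$\sum_{i=1}^{n} \left[ y_i - f(x_i; a, b, c, \dots) \right] \frac{\partial f}{\partial a} = 0, \quad \sum_{i=1}^{n} \left[ y_i - f(x_i; a, b, c, \dots) \right] \frac{\partial f}{\partial b} = 0, \quad \sum_{i=1}^{n} \left[ y_i - f(x_i; a, b, c, \dots) \right] \frac{\partial f}{\partial c} = 0, \qquad (3)$$

etc. The resulting system contains as many equations as there are unknowns $a, b, c, \dots$. Such a system cannot be solved in general form, so it is necessary to specify, at least approximately, a concrete type of function. Next, we consider two cases: linear and quadratic functions.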

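Linear function $y = ax + b$.

Consider the sum of squared differences between the experimental values $y_i$ and the function values at the corresponding points:

$$S(a, b) = \sum_{i=1}^{n} \left[ y_i - (a x_i + b) \right]^2. \qquad (4)$$

Let's select the parameters $a$ and $b$ so that this sum has the smallest value. The task thus comes down to finding the values of $a$ and $b$ at which the function $S(a, b)$ has a minimum, i.e. to examining a function of two independent variables $a$ and $b$ for a minimum. To do this, we differentiate with respect to $a$ and $b$:

$$\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left[ y_i - (a x_i + b) \right] x_i = 0;$$

$$\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \left[ y_i - (a x_i + b) \right] = 0. \qquad (5)$$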

Substituting the experimental data $x_i$ and $y_i$, we obtain a system of two linear equations in the two unknowns $a$ and $b$. Having solved this system, we can write down the function $y = ax + b$.

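Let us make sure that for the found values of $a$ and $b$ the sum of squared differences $S(a, b) = \sum_{i=1}^{n} [y_i - (a x_i + b)]^2$ has a minimum. To do this, we find the second partial derivatives $\frac{\partial^2 S}{\partial a^2}$, $\frac{\partial^2 S}{\partial a \, \partial b}$ and $\frac{\partial^2 S}{\partial b^2}$:

$$\frac{\partial^2 S}{\partial a^2} = 2 \sum_{i=1}^{n} x_i^2, \qquad \frac{\partial^2 S}{\partial a \, \partial b} = 2 \sum_{i=1}^{n} x_i, \qquad \frac{\partial^2 S}{\partial b^2} = 2n.$$

Hence,

$$\frac{\partial^2 S}{\partial a^2} \cdot \frac{\partial^2 S}{\partial b^2} - \left( \frac{\partial^2 S}{\partial a \, \partial b} \right)^2 = 4 \left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] > 0,$$

$$\frac{\partial^2 S}{\partial a^2} = 2 \sum_{i=1}^{n} x_i^2 > 0,$$

i.e. the sufficient condition for a minimum of a function of two variables is satisfied (provided the $x_i$ are not all equal).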
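The sufficient condition can also be checked numerically. The sketch below assumes the sum of squared differences for the line y = ax + b, whose second derivatives depend only on the x values; the function name `second_order_check` and the sample x values are hypothetical.

```python
def second_order_check(xs):
    """Return (d2S/da2, Hessian determinant) for the line-fit sum of squares."""
    n = len(xs)
    s_aa = 2 * sum(x * x for x in xs)   # second derivative with respect to a
    s_ab = 2 * sum(xs)                  # mixed second derivative
    s_bb = 2 * n                        # second derivative with respect to b
    det = s_aa * s_bb - s_ab ** 2
    return s_aa, det


# Distinct x values: both numbers are positive, so the point is a minimum.
print(second_order_check([1, 2, 3, 4, 5]))  # (110, 200)

# All x values equal: the determinant is zero and the test is inconclusive,
# which matches the fact that a line cannot be determined from such data.
print(second_order_check([2, 2, 2]))        # (24, 0)
```

Note that the result does not depend on the y values or on the particular a and b, so the stationary point found from the linear system is always a minimum as long as at least two x values differ.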

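Quadratic function $y = ax^2 + bx + c$.

Suppose the experiment yields the values $y_1, y_2, \dots, y_n$ of the function at the points $x_1, x_2, \dots, x_n$. Suppose also that, on the basis of a priori information, the function is assumed to be quadratic:

$$y = a x^2 + b x + c.$$

We need to find the coefficients $a$, $b$ and $c$. We have

$$S(a, b, c) = \sum_{i=1}^{n} \left[ y_i - (a x_i^2 + b x_i + c) \right]^2,$$

a function of three variables $a, b, c$.

In this case, system (3) takes the form:

$$\begin{cases} \sum_{i=1}^{n} \left[ y_i - (a x_i^2 + b x_i + c) \right] x_i^2 = 0 \\ \sum_{i=1}^{n} \left[ y_i - (a x_i^2 + b x_i + c) \right] x_i = 0 \\ \sum_{i=1}^{n} \left[ y_i - (a x_i^2 + b x_i + c) \right] = 0 \end{cases}$$

Or:

$$\begin{cases} a \sum x_i^4 + b \sum x_i^3 + c \sum x_i^2 = \sum x_i^2 y_i \\ a \sum x_i^3 + b \sum x_i^2 + c \sum x_i = \sum x_i y_i \\ a \sum x_i^2 + b \sum x_i + c n = \sum y_i \end{cases}$$

Having solved this system of linear equations, we determine the unknowns $a, b, c$.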
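A minimal sketch of the quadratic case is given below. It builds the three normal equations from the sums of powers and solves them with Cramer's rule for a 3x3 system; this choice of solver, the helper names `det3` and `fit_parabola`, and the sample data (chosen to lie exactly on y = x^2 - 2x + 3) are assumptions made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_parabola(xs, ys):
    """Least-squares parabola y = a*x^2 + b*x + c via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[s4, s3, s2],
              [s3, s2, s1],
              [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    if d == 0:
        raise ValueError("the parabola is not determined by these points")
    coeffs = []
    for k in range(3):
        # Replace column k with the right-hand side (Cramer's rule).
        mk = [row[:] for row in matrix]
        for r in range(3):
            mk[r][k] = rhs[r]
        coeffs.append(det3(mk) / d)
    return tuple(coeffs)


# Hypothetical points lying exactly on y = x^2 - 2x + 3.
xs = [0, 1, 2, 3]
ys = [3, 2, 3, 6]
print(fit_parabola(xs, ys))  # (1.0, -2.0, 3.0)
```

For larger or badly scaled data sets a dedicated linear solver would usually be preferable to Cramer's rule, which is used here only because it mirrors the hand calculation.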

Example. Suppose that, based on the experiment, four values of the desired function $y = f(x)$ are obtained for four values of the argument, which are given in the table:
