
    Introduction

    I am a mathematician and a programmer. The biggest leap I took in my career was when I learned to say: "I don't understand anything!" Now I am not ashamed to tell a luminary of science who is giving me a lecture that I do not understand what he, the luminary, is telling me. And it is very hard. Yes, admitting your ignorance is difficult and embarrassing. Who likes to admit that he doesn't know the basics of something? Because of my profession I have to attend a great many presentations and lectures where, I confess, in the vast majority of cases I want to sleep because I don't understand anything. And I don't understand because the huge problem of the current situation in science lies in mathematics: it assumes that all listeners are familiar with absolutely all areas of mathematics (which is absurd). Admitting that you don't know what a derivative is (we'll talk about what it is a little later) is considered shameful.

    But I have learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra of a Lie algebra is. Yes, I don't know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, there is no reputation, no authority. Yes, it is prestigious to speak in as abstract a language as possible, which is complete nonsense.

    Do you know what a derivative is? Most likely you will tell me about the limit of the difference quotient. In the first year of mathematics and mechanics at St. Petersburg State University, Viktor Petrovich Khavin defined the derivative for me as the coefficient of the first term of the Taylor series of a function at a point (there was separate gymnastics to define the Taylor series without derivatives). I laughed at this definition for a long time until I finally understood what it was about. The derivative is nothing more than a simple measure of how similar the function we are differentiating is to the functions y=x, y=x^2, y=x^3.

    I now have the honor of lecturing to students who are afraid of mathematics. If you are afraid of mathematics, we are on the same path. As soon as you try to read some text and it seems to you that it is overly complicated, know that it is poorly written. I assert that there is not a single area of mathematics that cannot be discussed "on the fingers" without losing accuracy.

    An assignment for the near future: I asked my students to understand what a linear-quadratic regulator is. Don't be shy, spend three minutes of your life and follow the link. If you don't understand anything, then we are on the same path. I (a professional mathematician-programmer) didn't understand anything either. And I assure you, this can be figured out "on the fingers." At the moment I don't know what it is, but I assure you that we can figure it out.

    So, the first lecture that I am going to give my students after they come running to me in horror, saying that a linear-quadratic regulator is a terrible thing you will never master in your life, is the method of least squares. Can you solve linear equations? If you are reading this text, then most likely not.

    So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of the line passing through these two points:

    illustration

    This line should have an equation of the following form: y = alpha*x + beta.

    Here alpha and beta are unknown to us, but two points of this line are known:

    We can write this equation in matrix form:
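    Spelling this out (my reconstruction, keeping the symbols used above): substituting the two known points into y = alpha*x + beta gives the system

    \[ \begin{cases} \alpha x_0 + \beta = y_0 \\ \alpha x_1 + \beta = y_1 \end{cases} \qquad\text{or, in matrix form,}\qquad \begin{pmatrix} x_0 & 1 \\ x_1 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix}. \]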

    Here a lyrical digression is in order: what is a matrix? A matrix is nothing more than a two-dimensional array. It is a way of storing data; no further meaning should be attached to it. It is up to us how exactly to interpret a particular matrix. Periodically I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

    Let's replace concrete matrices with their symbolic representation:

    Then (alpha, beta) can be easily found:
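    With A denoting the 2x2 matrix above, x = (alpha, beta) and b = (y_0, y_1) (my labels, anticipating the notation used later), the system reads A x = b, and its solution is

    \[ x = A^{-1} b . \]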

    More specifically for our previous data:
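    For the points (1,1) and (3,2) this computation, written out (a reconstruction, but easy to verify), is

    \[ \begin{pmatrix} 1 & 1 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{1\cdot 1 - 1\cdot 3}\begin{pmatrix} 1 & -1 \\ -3 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}. \]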

    Which leads to the following equation of the line passing through the points (1,1) and (3,2): y = x/2 + 1/2.

    Okay, everything is clear here. Let's find the equation of the line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

    Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? He will first rewrite the previous system of equations in the following form:
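    Namely (my transcription of the missing formula), as a single vector equation in the two unknowns alpha and beta:

    \[ \alpha \begin{pmatrix} x_0 \\ x_1 \\ x_2 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ y_2 \end{pmatrix}, \qquad\text{i.e.}\qquad \alpha\, i + \beta\, j = b . \]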

    In our case the vectors i, j, b are three-dimensional, therefore (in the general case) this system has no solution. Any vector (alpha*i + beta*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality cannot be achieved in the equation). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) exactly how far we have fallen short of equality:

    And we will try to minimize this error:
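    In symbols (again my reconstruction, using the notation just introduced):

    \[ e(\alpha,\beta) = \alpha\, i + \beta\, j - b, \qquad \|e(\alpha,\beta)\|^2 \to \min_{\alpha,\beta}. \]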

    Why square?

    We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself is the same, but the square gives a smooth function (a quadratic function of the arguments (alpha, beta)), while the length by itself gives a cone-shaped function that is non-differentiable at the minimum point. Brr. A square is more convenient.

    Obviously, the error is minimized when the vector e is orthogonal to the plane spanned by the vectors i and j.

    Illustration

    In other words: we are looking for a straight line such that the sum of the squared lengths of the distances from all points to this straight line is minimal:

    UPDATE: here I have a slip; the distance to the straight line should be measured vertically, not by orthogonal projection. The commentator is right.

    Illustration

    In completely different words (carefully, this is poorly formalized, but it should be clear on the fingers): we take all possible lines between all pairs of points and look for the average line among them all:

    Illustration

    Another straightforward explanation: we attach a spring between every data point (here we have three) and the straight line we are looking for; the equilibrium position of the line is exactly what we are looking for.

    Minimum of a quadratic form

    So, given the vector b and the plane spanned by the column vectors of the matrix A (in this case (x0,x1,x2) and (1,1,1)), we are looking for the vector e with minimum squared length. Obviously, the minimum is achievable only for the vector e orthogonal to the plane spanned by the column vectors of the matrix A:

    In other words, we are looking for a vector x=(alpha, beta) such that:

    Let me remind you that this vector x=(alpha, beta) is the minimum point of the quadratic function ||e(alpha, beta)||^2:
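    Written out (my reconstruction of the missing formulas), the orthogonality condition and the resulting "normal equations" are

    \[ A^T e = A^T (A x - b) = 0 \quad\Longrightarrow\quad A^T A\, x = A^T b \quad\Longrightarrow\quad x = (A^T A)^{-1} A^T b, \]

    and the quadratic function being minimized is

    \[ \|e(\alpha,\beta)\|^2 = \|A x - b\|^2 = x^T (A^T A)\, x - 2\, b^T A\, x + b^T b . \]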

    Here it is useful to remember that a matrix can also be interpreted as a quadratic form; for example, the identity matrix ((1,0),(0,1)) can be interpreted as the function x^2 + y^2:

    quadratic form

    All this gymnastics is known under the name of linear regression.

    Laplace's equation with Dirichlet boundary condition

    Now for the simplest real task: there is a certain triangulated surface that needs to be smoothed. For example, let's load a model of my face:

    The original commit is available. To minimize external dependencies, I took the code of my software renderer, which is already on Habr. To solve the linear system I use OpenNL; it is an excellent solver, which, however, is very difficult to install: you need to copy two files (.h + .c) into your project folder. All the smoothing is done with the following code:

    for (int d=0; d<3; d++) {
        nlNewContext();
        nlSolverParameteri(NL_NB_VARIABLES, verts.size());
        nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
        nlBegin(NL_SYSTEM);
        nlBegin(NL_MATRIX);
        for (int i=0; i<(int)verts.size(); i++) { // one row per vertex: keep it near its original position
            nlBegin(NL_ROW);
            nlCoefficient(i, 1);
            nlRightHandSide(verts[i][d]);
            nlEnd(NL_ROW);
        }
        for (unsigned int i=0; i<faces.size(); i++) {
            std::vector<int> &face = faces[i]; // the loop header and the face type were lost in this listing; this is an assumption
            for (int j=0; j<3; j++) { // one row per triangle edge: its two endpoints should coincide
                nlBegin(NL_ROW);
                nlCoefficient(face[ j ], 1);
                nlCoefficient(face[(j+1)%3], -1);
                nlEnd(NL_ROW);
            }
        }
        nlEnd(NL_MATRIX);
        nlEnd(NL_SYSTEM);
        nlSolve();
        for (int i=0; i<(int)verts.size(); i++) {
            verts[i][d] = nlGetVariable(i);
        }
    }

    The X, Y and Z coordinates are separable, so I smooth them separately. That is, I solve three systems of linear equations, each with the number of variables equal to the number of vertices in my model. The first n rows of the matrix A have just one 1 per row, and the first n entries of the vector b are the original model coordinates. That is, I tie a spring between the new position of a vertex and its old position: the new ones should not move too far from the old ones.

    All subsequent rows of the matrix A (faces.size()*3 = the number of edges of all triangles in the mesh) have one occurrence of 1 and one occurrence of -1, with the corresponding entries of the vector b equal to zero. This means I put a spring on every edge of our triangular mesh: all edges try to make their start and end points coincide.

    Once again: all vertices are variables, and they cannot move far from their original position, but at the same time they try to become similar to each other.
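    In matrix terms (a schematic sketch of the system the code assembles, not a formula from the original text): for one coordinate, each vertex v_i contributes a row v_i ≈ verts[i][d] and each edge (i,k) contributes a row v_i - v_k ≈ 0, so

    \[ A = \begin{pmatrix} I_{n\times n} \\ D \end{pmatrix}, \qquad b = \begin{pmatrix} v^{\mathrm{old}} \\ 0 \end{pmatrix}, \qquad \|A v - b\|^2 \to \min, \]

    where each row of D has a single +1 and a single -1.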

    Here's the result:

    Everything would be fine, the model is really smoothed, but it has moved away from its original boundary. Let's change the code a little:

    for (int i=0; i<(int)verts.size(); i++) {
        float scale = border[i] ? 1000 : 1; // boundary vertices get a much stiffer spring
        nlBegin(NL_ROW);
        nlCoefficient(i, scale);
        nlRightHandSide(scale*verts[i][d]);
        nlEnd(NL_ROW);
    }

    In our matrix A, for the vertices that are on the boundary, I add not a row of the kind v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does that change? It changes our quadratic form of the error. Now a unit deviation of a boundary vertex will cost not one unit, as before, but 1000*1000 units. That is, we hung a stronger spring on the boundary vertices; the solution will prefer to stretch the others more strongly. Here's the result:

    Let's double the spring strength between the vertices:
    nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

    It is logical that the surface has become smoother:

    And now even a hundred times stronger:

    What is this? Imagine that we have dipped a wire ring into soapy water. The resulting soap film will try to have as little curvature as possible while touching the border, our wire ring. This is exactly what we got by fixing the boundary and asking for a smooth surface inside. Congratulations, we have just solved Laplace's equation with Dirichlet boundary conditions. Sounds cool? But in reality, you just need to solve one system of linear equations.

    Poisson's equation

    Let's remember another cool name.

    Let's say I have an image like this:

    Looks good to everyone, but I don’t like the chair.

    I'll cut the picture in half:



    And I will select a chair with my hands:

    Then I will pull everything that is white in the mask to the left half of the picture, and at the same time, throughout the whole picture, I will require that the difference between two neighboring pixels equal the difference between the two corresponding neighboring pixels of the right picture:

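    A minimal sketch of how such a system could be assembled with OpenNL, in the same spirit as the smoothing code above (the names left, right, mask, result, w, h and the pixel accessors are my assumptions, not the article's code):

    for (int d=0; d<3; d++) {                                // one least-squares system per colour channel
        nlNewContext();
        nlSolverParameteri(NL_NB_VARIABLES, w*h);            // one variable per pixel of the result
        nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
        nlBegin(NL_SYSTEM);
        nlBegin(NL_MATRIX);
        for (int i=0; i<w; i++) {
            for (int j=0; j<h; j++) {
                if (mask(i,j)) {                             // white in the mask: pin the pixel to the left image
                    nlBegin(NL_ROW);
                    nlCoefficient(i+j*w, 100);               // a strong spring, as with the boundary vertices above
                    nlRightHandSide(100*left(i,j,d));
                    nlEnd(NL_ROW);
                }
                if (i+1<w) {                                 // horizontal neighbours: reproduce the right image's differences
                    nlBegin(NL_ROW);
                    nlCoefficient( i   +j*w,  1);
                    nlCoefficient((i+1)+j*w, -1);
                    nlRightHandSide(right(i,j,d) - right(i+1,j,d));
                    nlEnd(NL_ROW);
                }
                if (j+1<h) {                                 // vertical neighbours
                    nlBegin(NL_ROW);
                    nlCoefficient(i+ j   *w,  1);
                    nlCoefficient(i+(j+1)*w, -1);
                    nlRightHandSide(right(i,j,d) - right(i,j+1,d));
                    nlEnd(NL_ROW);
                }
            }
        }
        nlEnd(NL_MATRIX);
        nlEnd(NL_SYSTEM);
        nlSolve();
        for (int i=0; i<w; i++)                              // write the solved channel back (accessor is an assumption)
            for (int j=0; j<h; j++)
                result(i,j,d) = nlGetVariable(i+j*w);
    }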

    Here's the result:

    Code and pictures available

    Example.

    Experimental data on the values of the variables x and y are given in the table.

    As a result of their alignment, the function is obtained

    Using the least squares method, approximate these data by a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

    The essence of the least squares method (LSM).

    The task is to find the coefficients of the linear dependence for which the function of two variables a and b takes the smallest value. That is, for the found a and b, the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

    Thus, solving the example comes down to finding the extremum of a function of two variables.
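    Written out in the usual notation, the function in question is

    \[ F(a,b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2 \to \min_{a,b} . \]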

    Deriving formulas for finding coefficients.

    A system of two equations with two unknowns is compiled and solved. We find the partial derivatives of the function with respect to the variables a and b and equate these derivatives to zero.

    We solve the resulting system of equations by any method (for example, by substitution or by Cramer's rule) and obtain formulas for finding the coefficients by the least squares method (LSM).
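    In the usual notation these formulas read (n is the number of data points; this is the standard result, stated here for reference):

    \[ a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\,n\sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2\,}, \qquad b = \frac{1}{n}\Bigl(\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i\Bigr). \]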

    For the found a and b, the function takes its smallest value. The proof of this fact is given below.

    That's the whole method of least squares. The formula for finding the parameter a contains the sums Σx_i, Σy_i, Σx_i·y_i, Σx_i^2 and the parameter n, the number of experimental data points. We recommend calculating the values of these sums separately. The coefficient b is found after calculating a.

    It's time to remember the original example.

    Solution.

    In our example n=5. We fill out the table for the convenience of calculating the sums that appear in the formulas for the required coefficients.

    The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

    The values ​​in the fifth row of the table are obtained by squaring the values ​​in the 2nd row for each number i.

    The values ​​in the last column of the table are the sums of the values ​​across the rows.

    We use the least squares formulas to find the coefficients a and b. We substitute into them the corresponding values from the last column of the table:

    Hence, y = 0.165x + 2.184 is the desired approximating straight line.

    It remains to find out which of the lines, y = 0.165x + 2.184 or the function obtained earlier, better approximates the original data, that is, to make the estimate using the least squares method.

    Error estimation of the least squares method.

    To do this, you need to calculate the sum of squared deviations of the original data from each of these lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

    Since the sum of squared deviations is smaller for it, the straight line y = 0.165x + 2.184 approximates the original data better.

    Graphic illustration of the least squares (LS) method.

    Everything is clearly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the other function, and the pink dots are the original data.

    Why is this needed, why all these approximations?

    I personally use it to solve problems of data smoothing, interpolation and extrapolation (in the original example they might ask to find the value of the observed quantity y at x=3 or at x=6 using the least squares method). But we'll talk more about this later in another section of the site.

    Proof.

    For the function to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.

    The least squares method has many applications, as it allows an approximate representation of a given function by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others containing random errors. In this article, you will learn how to implement least squares calculations in Excel.

    Statement of the problem using a specific example

    Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS interests us from the point of view of regression analysis (in Excel its methods are implemented using built-in functions), we should immediately move on to considering a specific problem.

    So, let X be the retail space of a grocery store, measured in square meters, and Y be the annual turnover, measured in millions of rubles.

    It is required to make a forecast of what turnover (Y) the store will have if it has this or that retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

    A few words about the correctness of the initial data used for prediction

    Let's say we have a table built using data for n stores.

    According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. In addition, "anomalous" results cannot be used. In particular, an elite small boutique can have a turnover several times greater than the turnover of large retail outlets of the "mass market" class.

    The essence of the method

    The table data can be depicted on the Cartesian plane as points M_1(x_1, y_1), …, M_n(x_n, y_n). Now the solution of the problem reduces to selecting an approximating function y = f(x) whose graph passes as close as possible to the points M_1, M_2, …, M_n.

    Of course, you can use a high-degree polynomial, but this option is not only difficult to implement, but also simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line y = ax + b, which best approximates the experimental data, or more precisely, the coefficients a and b.

    Accuracy assessment

    With any approximation, assessing its accuracy is of particular importance. Let us denote by e_i the difference (deviation) between the functional and experimental values at the point x_i, i.e. e_i = y_i − f(x_i).

    Obviously, to assess the accuracy of the approximation one could use the sum of the deviations, i.e., when choosing a straight line to approximately represent the dependence of Y on X, preference would go to the one with the smallest sum of the e_i over all points under consideration. However, not everything is so simple, since along with positive deviations there will also be negative ones.

    The issue can be resolved using the absolute values of the deviations or their squares. The latter approach is the most widely used. It is used in many areas, including regression analysis (implemented in Excel using two built-in functions), and has long proven its effectiveness.

    Least square method

    Excel, as you know, has a built-in AutoSum function that allows you to calculate the sum of all the values located in a selected range. Thus, nothing prevents us from calculating the value of the expression (e_1^2 + e_2^2 + e_3^2 + ... + e_n^2).

    In mathematical notation this looks like:

    Since the decision was initially made to approximate using a straight line, we have:

    Thus, the task of finding the straight line that best describes the specific dependence of the quantities X and Y comes down to calculating the minimum of a function of two variables:

    To do this, you need to equate to zero the partial derivatives with respect to the variables a and b, and solve a primitive system consisting of two equations with two unknowns of the form:

    After some simple transformations, including division by 2 and manipulation of sums, we get:
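    In the usual notation this system (the standard "normal equations", stated here for reference) is

    \[ \begin{cases} a\sum_{i} x_i^2 + b\sum_{i} x_i = \sum_{i} x_i y_i, \\[2pt] a\sum_{i} x_i + b\,n = \sum_{i} y_i . \end{cases} \]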

    Solving it, for example by Cramer's rule, we obtain a stationary point with certain coefficients a* and b*. This is the minimum, i.e. to predict what turnover a store will have for a certain area, the straight line y = a*x + b* is suitable, which is a regression model for the example in question. Of course, it will not allow you to find the exact result, but it will help you get an idea of whether purchasing a specific area on store credit will pay off.

    How to Implement Least Squares in Excel

    Excel has a function for calculating values using the least squares method. It has the following form: TREND(known Y values; known X values; new X values; constant). Let's apply the formula for calculating OLS in Excel to our table.

    To do this, enter the “=” sign in the cell in which the result of the calculation using the least squares method in Excel should be displayed and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

    • range of known values ​​for Y (in this case, data for trade turnover);
    • the range x_1, …, x_n, i.e. the size of the retail space;
    • both known and unknown values ​​of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

    In addition, the formula contains the logical argument "Const". If you enter 0 (FALSE) in the corresponding field, the calculations will be carried out assuming that b = 0; if you enter 1 (TRUE) or leave it empty, the constant b is calculated normally.

    If you need to find the forecast for more than one x value, then after entering the formula you should not press Enter, but type the combination Ctrl + Shift + Enter on the keyboard.
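    For example, a call of the following shape could be used (the cell ranges here are purely illustrative, not taken from the article's worksheet):

    =TREND(B2:B6; A2:A6; A8:A10; 1)

    entered as an array formula over the result cells with Ctrl + Shift + Enter.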

    Some features

    Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables—TREND—can be used even by those who have never heard of least squares. It is enough just to know some of the features of its work. In particular:

    • If you arrange the range of known values ​​of the variable y in one row or column, then each row (column) with known values ​​of x will be perceived by the program as a separate variable.
    • If a range with known x is not specified in the TREND window, then when using the function in Excel, the program will treat it as an array consisting of integers, the number of which corresponds to the range with the given values ​​of the variable y.
    • To output an array of “predicted” values, the expression for calculating the trend must be entered as an array formula.
    • If new values of x are not specified, then the TREND function considers them equal to the known ones. If the known x values are not specified either, then the array 1; 2; 3; 4; …, commensurate with the range of the given values of the variable y, is taken as the argument.
    • The range containing the new x values ​​must have the same or more rows or columns as the range containing the given y values. In other words, it must be proportional to the independent variables.
    • An array with known x values ​​can contain multiple variables. However, if we are talking about only one, then it is required that the ranges with the given values ​​of x and y be proportional. In the case of several variables, it is necessary that the range with the given y values ​​fit in one column or one row.

    FORECAST function

    Forecasting in Excel can be implemented using several functions. One of them is called FORECAST. It is similar to TREND, i.e. it gives the result of a calculation by the least squares method, but only for a single X, for which the value of Y is unknown.

    Now you know formulas in Excel for dummies that allow you to predict the future value of a particular indicator according to a linear trend.

    The method of least squares is a mathematical procedure for constructing a linear equation that best fits a set of ordered pairs by finding the values of a and b, the coefficients in the equation of the line. The goal of least squares is to minimize the total squared error between the values of y and ŷ. If for each point we determine the error between y and ŷ, the least squares method minimizes:
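    That is (in the standard notation for this setup):

    \[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \to \min, \]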

    where n is the number of ordered pairs of data that the line should fit as closely as possible.

    This concept is illustrated in the figure

    Based on the figure, the line that best fits the data, the regression line, minimizes the total squared error of the four points on the graph. I'll show you how to determine this using least squares with the following example.

    Imagine a young couple who have recently moved in together and share a vanity table in the bathroom. The young man began to notice that half of his table was inexorably shrinking, losing ground to hair mousses and soy complexes. Over the past few months, the guy had been closely monitoring the rate at which the number of objects on her side of the table was increasing. The table below shows the number of items the girl has accumulated on her bathroom vanity over the past few months.

    Since our goal is to find out whether the number of items increases over time, “Month” will be the independent variable, and “Number of items” will be the dependent variable.

    Using the least squares method, we determine the equation that best fits the data by calculating the values ​​of a, the y-intercept, and b, the slope of the line:

    a = y_avg − b·x_avg

    where x_avg is the average value of x, the independent variable, and y_avg is the average value of y, the dependent variable.
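    The slope b is computed first; in the usual form (the standard formulas, added for completeness):

    \[ b = \frac{\sum_{i} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i} x_i^2 - n\,\bar{x}^2}, \qquad a = \bar{y} - b\,\bar{x} . \]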

    The table below summarizes the calculations required for these equations.

    The trend line for our bathroom example is given by the following equation: ŷ = 5.13 + 0.976x.

    Since our equation has a positive slope of 0.976, the guy has evidence that the number of items on the table increases over time at an average rate of about 1 item per month. The graph shows the trend line together with the ordered pairs.

    The expectation for the number of items over the next six months (month 16) will be calculated as follows:

    ŷ = 5.13 + 0.976x = 5.13 + 0.976(16) ~ 20.7 = 21 items

    So, it's time for our hero to take some action.

    TREND function in Excel

    As you probably already guessed, Excel has a function for calculating values by the least squares method. This function is called TREND. Its syntax is as follows:

    TREND (known Y values; known X values; new X values; constant)

    known Y values ​​– an array of dependent variables, in our case, the number of objects on the table

    known values ​​X – an array of independent variables, in our case this is the month

    new X values – new X values (months) for which the TREND function returns the expected value of the dependent variable (number of items)

    const - optional. A Boolean value that specifies whether the constant b is required to be 0.

    For example, the figure shows the TREND function used to determine the expected number of items on a bathroom vanity for the 16th month.

    Least square method

    The least squares method (LSM, OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of the regression residuals.

    It should be noted that the least squares method itself can be called a method for solving a problem in any area if the solution lies in, or satisfies, some criterion of minimizing the sum of squares of some functions of the required variables. Therefore, the least squares method can also be used for an approximate representation (approximation) of a given function by other (simpler) functions, when finding a set of quantities satisfying equations or constraints whose number exceeds the number of these quantities, and so on.

    The essence of OLS

    Let a (parametric) model of a probabilistic (regression) relationship between the explained (dependent) variable y and a set of factors (explanatory variables) x be given:

    where b is the vector of unknown model parameters and ε is the random model error.

    Let there also be sample observations of the values of these variables. Let t be the observation number (t = 1, …, n). Then y_t and x_t are the values of the variables in the t-th observation. Then, for given values of the parameters b, one can calculate the theoretical (model) values of the explained variable: ŷ_t = f(x_t, b).

    The sizes of the residuals e_t = y_t − ŷ_t depend on the values of the parameters b.

    The essence of the least squares method (ordinary, classical) is to find parameters b for which the sum of the squares of the residuals (eng. Residual Sum of Squares) will be minimal:

    In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one must find the stationary points of the function by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:
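    In the notation introduced above (a standard transcription of the missing formulas):

    \[ RSS(b) = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} \bigl(y_t - f(x_t, b)\bigr)^2 \to \min_b, \qquad \frac{\partial RSS(b)}{\partial b_j} = -2\sum_{t=1}^{n} \bigl(y_t - f(x_t,b)\bigr)\,\frac{\partial f(x_t,b)}{\partial b_j} = 0 . \]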

    If the model's random errors are normally distributed, have the same variance, and are uncorrelated, the OLS parameter estimates coincide with the maximum likelihood (ML) estimates.

    OLS in the case of a linear model

    Let the regression dependence be linear:

    Let y be the column vector of observations of the explained variable, and let X be the matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor across all observations). The matrix representation of the linear model is:
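    Namely (standard notation, X being the matrix just described):

    \[ y = X b + \varepsilon . \]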

    Then the vector of estimates of the explained variable and the vector of regression residuals will be equal

    Accordingly, the sum of squares of the regression residuals will be equal to
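    Written out in the standard matrix notation:

    \[ \hat{y} = X b, \qquad e = y - \hat{y} = y - X b, \qquad RSS(b) = e^T e = (y - Xb)^T (y - Xb) . \]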

    Differentiating this function with respect to the vector of parameters and equating the derivatives to zero, we obtain a system of equations (in matrix form):

    X^T X b = X^T y.

    The solution of this system of equations gives the general formula for least squares estimates for a linear model:
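    That is (the second form is the one referred to just below):

    \[ \hat{b} = (X^T X)^{-1} X^T y \;=\; \Bigl(\tfrac{1}{n} X^T X\Bigr)^{-1} \tfrac{1}{n}\, X^T y . \]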

    For analytical purposes, the latter representation of this formula is useful. If the data in a regression model are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second vector is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

    An important property of OLS estimates for models with a constant: the constructed regression line passes through the center of gravity of the sample data, that is, the following equality holds:

    In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

    Example: simplest (pairwise) regression

    In the case of pairwise (simple) linear regression, the calculation formulas are simplified (one can do without matrix algebra):
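    Writing the model as y = a + b·x (my choice of letters), the familiar formulas are

    \[ \hat{b} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} . \]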

    Properties of OLS estimators

    First of all, we note that for linear models OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if

    1. the mathematical expectation of random errors is zero, and
    2. factors and random errors are independent random variables.

    The second condition - the condition of exogeneity of the factors - is fundamental. If this property is not met, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow us to obtain high-quality estimates in this case). In the classical case, a stronger assumption is made about the determinism of the factors, as opposed to a random error, which automatically means that the exogeneity condition is met. In the general case, for the consistency of the estimates, it is sufficient that the exogeneity condition holds together with the convergence of the matrix (1/n)·X^T X to some non-singular matrix as the sample size increases to infinity.

    In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

    These assumptions can be formulated in terms of the covariance matrix of the random error vector: V(ε) = σ²·I_n (equal variances, no correlation between errors).

    A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates will be equal to:
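    In the standard notation (σ² being the error variance):

    \[ V(\hat{b}) = \sigma^2 (X^T X)^{-1} . \]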

    Generalized OLS

    The least squares method admits a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the vector of residuals, e^T W e, where W is some symmetric positive definite weight matrix. Ordinary least squares is a special case of this approach, in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), such matrices admit a decomposition W = P^T P. Consequently, the functional above can be represented as (P e)^T (P e), that is, as the sum of the squares of some transformed "residuals". Thus, we can distinguish a whole class of least squares methods - LS methods (Least Squares).

    It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are the so-called generalized least squares (GLS, Generalized Least Squares) estimates - the LS method with a weight matrix equal to the inverse of the covariance matrix of the random errors: W = V(ε)^{-1}.

    It can be shown that the formula for GLS estimates of the parameters of a linear model has the form
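    In the usual notation (V = V(ε) being the error covariance matrix):

    \[ \hat{b}_{GLS} = \bigl(X^T V^{-1} X\bigr)^{-1} X^T V^{-1} y . \]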

    The covariance matrix of these estimates will accordingly be equal to
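    Namely (again the standard formula):

    \[ V(\hat{b}_{GLS}) = \bigl(X^T V^{-1} X\bigr)^{-1} . \]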

    In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

    Weighted OLS

    In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have the so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation. In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
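    In symbols (a standard transcription):

    \[ \sum_{t=1}^{n} \frac{e_t^2}{\sigma_t^2} \to \min_b . \]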

    Some special cases of using OLS in practice

    Approximation of linear dependence

    Let us consider the case when, as a result of studying the dependence of a certain scalar quantity on another scalar quantity (this could be, for example, the dependence of the voltage U on the current I: U = I·R, where R is a constant, the resistance of the conductor), measurements of these quantities were carried out, yielding the values x_i and the corresponding values y_i. The measurement data are recorded in a table.

    Table. Measurement results.

    Measurement no. | 1 | 2 | 3 | 4 | 5 | 6
    x_i             | … | … | … | … | … | …
    y_i             | … | … | … | … | … | …

    The question is: what value of the coefficient k can be chosen to best describe the dependence y = k·x? According to the least squares method, this value should be the one for which the sum of the squared deviations of the values y_i from the values k·x_i, S(k) = Σ_i (y_i − k·x_i)^2, is minimal.

    The sum of squared deviations has a single extremum, a minimum, which allows us to use this condition. Let us find from it the value of the coefficient. To do this, we transform its left-hand side as follows:

    The last formula allows us to find the value of the coefficient, which is what was required in the problem.
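    Carried out in the notation above (my transcription; setting the derivative dS/dk to zero):

    \[ \frac{dS}{dk} = -2\sum_i x_i (y_i - k x_i) = 0 \quad\Longrightarrow\quad k = \frac{\sum_i x_i y_i}{\sum_i x_i^2} . \]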

    History

    Until the beginning of the 19th century scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until then, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to use the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace related the method to probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by further research of Encke, Bessel, Hansen and others.

    Alternative uses of OLS

    The idea of ​​the least squares method can also be used in other cases not directly related to regression analysis. The fact is that the sum of squares is one of the most common proximity measures for vectors (Euclidean metric in finite-dimensional spaces).

    One application is the “solution” of systems of linear equations in which the number of equations is greater than the number of variables

    where the matrix A is not square, but rectangular, of size m×n with m > n.

    Such a system of equations generally has no solution (if the rank is actually greater than the number of variables). Therefore, this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left and right sides of the equations of the system, that is, ||Ax − b||^2 → min. It is easy to show that solving this minimization problem leads to solving the following system of equations:
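    Namely (the same normal equations as in the first part of the article):

    \[ A^T A\, x = A^T b, \qquad\text{i.e.}\qquad x = (A^T A)^{-1} A^T b . \]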