Solving equations by least squares: a linear regression example

17.10.2019

The least squares method (LSM) has many applications, as it allows an approximate representation of a given function by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others that contain random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS is of interest to us from the point of view of regression analysis (in Excel, its methods are implemented using built-in functions), we should immediately proceed to consider a specific problem.

So, let X be the selling area of a grocery store, measured in square meters, and Y the annual turnover, defined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has one or another retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built with data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. "Anomalous" results must also be excluded. In particular, a small elite boutique can have a turnover many times greater than that of large mass-market outlets.

The essence of the method

The table data can be plotted on the Cartesian plane as points M1(x1, y1), …, Mn(xn, yn). The solution of the problem is then reduced to selecting an approximating function y = f(x) whose graph passes as close as possible to the points M1, M2, …, Mn.

Of course, you can use a high-degree polynomial, but this option is not only difficult to implement but simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line y = ax + b that best approximates the experimental data, or more precisely, for its coefficients a and b.

Accuracy score

For any approximation, the assessment of its accuracy is of particular importance. Denote by ei the difference (deviation) between the experimental and functional values at the point xi, i.e. ei = yi - f(xi).

Obviously, to assess the accuracy of the approximation one could use the sum of the deviations: when choosing a straight line to approximately represent the dependence of Y on X, preference would be given to the one with the smallest sum of ei over all points under consideration. However, it is not that simple, since along with positive deviations there will inevitably be negative ones.

You can solve the problem using either the absolute values of the deviations or their squares. The latter approach is the most widely used: it appears in many areas, including regression analysis (in Excel it is implemented by two built-in functions), and has long proven its effectiveness.

Least squares method

In Excel, as you know, there is a built-in AutoSum function that calculates the sum of all values located in the selected range. Thus, nothing prevents us from calculating the value of the expression (e1² + e2² + e3² + … + en²).

In mathematical notation, this looks like: E = Σ ei², the sum taken over i = 1, …, n.

Since the decision was initially made to approximate using a straight line, we have: ei = yi - (a·xi + b).

Thus, the task of finding the straight line that best describes the specific relationship between X and Y amounts to calculating the minimum of a function of two variables: F(a, b) = Σ (yi - a·xi - b)².

This requires equating to zero the partial derivatives with respect to the unknowns a and b, and solving a simple system of two equations in two unknowns of the form: ∂F/∂a = -2·Σ xi·(yi - a·xi - b) = 0, ∂F/∂b = -2·Σ (yi - a·xi - b) = 0.

After simple transformations, including division by 2 and manipulating the sums, we get: a·Σxi² + b·Σxi = Σxi·yi and a·Σxi + b·n = Σyi.

Solving it, for example by Cramer's method, we obtain a stationary point with certain coefficients a* and b*. This is the minimum: to predict what turnover the store will have for a certain area, the straight line y = a*·x + b* is suitable, and it is the regression model for the example in question. Of course, it will not give an exact result, but it will help you get an idea of whether buying a store on credit for a particular area will pay off.

How to implement the least squares method in Excel

Excel has a function for calculating values by the least squares method. It has the following form: TREND(known_y's; known_x's; new_x's; const). Let's apply this formula to our table.

To do this, in the cell in which the result of the calculation using the least squares method in Excel should be displayed, enter the “=” sign and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • the range of known Y values (in this case, the turnover data);
  • the range x1, …, xn, i.e. the sizes of the retail space;
  • the new values of x for which you need to find out the size of the turnover (for their location on the worksheet, see below).

In addition, the formula contains a logical argument "Const". If the value 1 (TRUE) is entered in the corresponding field, or the field is left blank, the coefficient b is calculated normally; entering 0 (FALSE) means that the calculations are carried out assuming b = 0.

If you need a forecast for more than one x value, then after entering the formula you should not press "Enter", but type the combination "Ctrl" + "Shift" + "Enter" on the keyboard.

Some Features

Regression analysis can be accessible even to dummies. The Excel formula for predicting an array of unknown values, TREND, can be used even by those who have never heard of the least squares method. It is enough to know some features of how it works. In particular:

  • If the range of known values of the variable y is placed in a single row or column, then each row (column) with known values of x will be treated by the program as a separate variable.
  • If the range with known x values is not specified in the TREND window, the function treats it as an array of integers 1, 2, 3, …, with as many entries as the range with the given values of the variable y.
  • To output an array of "predicted" values, the TREND expression must be entered as an array formula.
  • If no new x values are specified, the TREND function considers them equal to the known ones; if those are not specified either, the array 1; 2; 3; 4; … is taken, commensurate with the range of the already given y values.
  • The range containing the new x values must have the same number (or more) of rows or columns as the range with the given y values; in other words, it must be commensurate with the independent variables.
  • An array with known x values can contain several variables. If there is only one, then the ranges with the given x and y values must be commensurate. In the case of several variables, the range with the given y values must fit in a single column or a single row.

FORECAST function

It is implemented using several functions. One of them is called FORECAST. It is similar to TREND, i.e. it gives the result of a calculation by the least squares method, but only for a single X, for which the value of Y is unknown.

Now you know Excel formulas for dummies that allow you to predict the future value of an indicator according to a linear trend.

The method of least squares (LSM) allows you to estimate various quantities using the results of many measurements containing random errors.

Characteristics of LSM

The main idea of this method is that the sum of squared errors is taken as the criterion of accuracy of the solution, and one seeks to minimize it. When using this method, both numerical and analytical approaches can be applied.

In particular, as a numerical implementation, the least squares method implies making as many measurements of the unknown random variable as possible: the more measurements, the more accurate the solution. On this set of measurements (initial data), a set of candidate solutions is obtained, from which the best one is then selected. If the set of solutions is parametrized, the least squares method reduces to finding the optimal values of the parameters.

As an analytical approach to the implementation of LSM, a functional is defined on the set of initial data (measurements) and the proposed set of solutions; it can be expressed by a formula obtained as a certain hypothesis that needs to be confirmed. In this case, the least squares method reduces to finding the minimum of this functional over the squared errors of the initial data.

Note that it is not the errors themselves that are summed, but the squares of the errors. Why? The fact is that deviations of measurements from the exact value are often both positive and negative. When determining the average, simple summation can lead to an incorrect conclusion about the quality of the estimate, since the mutual cancellation of positive and negative values reduces the effective power of the set of measurements, and consequently the accuracy of the estimate.

To prevent this from happening, the squared deviations are summed. Moreover, in order to equalize the dimension of the measured quantity and the final estimate, the square root is extracted from the sum of squared errors, giving the root-mean-square error.
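This root-mean-square estimate can be sketched in a few lines (the function name is ours, not a standard one):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square error: sum the squared deviations (so positive and
// negative errors cannot cancel each other), average, then take the
// square root to return to the dimension of the measured quantity.
double rms_error(const std::vector<double>& deviations) {
    double sum_sq = 0;
    for (double e : deviations) sum_sq += e * e;
    return std::sqrt(sum_sq / static_cast<double>(deviations.size()));
}
```

Deviations {+3, -4} average to -0.5 by plain summation, but their RMS is √12.5 ≈ 3.54, a far more honest measure of spread.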

Some applications of LSM

LSM is widely used in various fields. For example, in probability theory and mathematical statistics the method is used to determine such a characteristic of a random variable as the standard deviation, which describes the width of the range of values of the random variable.


    Introduction

    I am a computer programmer. I made the biggest leap in my career when I learned to say: "I do not understand anything!" Now I am not ashamed to tell a luminary of science who is lecturing me that I do not understand what he, the luminary, is talking about. And it is very difficult. Yes, it is hard and embarrassing to admit you don't know. Who likes to admit that he does not know the basics of something? By virtue of my profession, I have to attend a large number of presentations and lectures where, I confess, in the vast majority of cases I feel sleepy, because I understand nothing. And I don't understand because a huge problem of the current situation in science lies in mathematics. It assumes that all listeners are familiar with absolutely all areas of mathematics (which is absurd). To admit that you do not know what a derivative is (we will get to what it is a little later) is shameful.

    But I've learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I do not know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, no reputation, no authority. Yes, it is prestigious to speak in the most abstract language possible, which is complete nonsense in itself.

    Do you know what a derivative is? Most likely you will tell me about the limit of the difference quotient. In the first year of mathematics at St. Petersburg State University, Viktor Petrovich Khavin defined the derivative for me as the coefficient of the first term of the Taylor series of the function at a point (determining the Taylor series without derivatives was a separate gymnastics). I laughed at this definition for a long time, until I finally understood what it was about. The derivative is nothing more than a measure of how similar the function we are differentiating is to the functions y=x, y=x^2, y=x^3.

    I now have the honor of lecturing to students who are afraid of mathematics. If you are afraid of mathematics, we are fellow travelers. As soon as you try to read some text and it seems to you that it is overly complicated, know that it is badly written. I maintain that there is not a single area of mathematics that cannot be explained "on the fingers" without losing accuracy.

    The challenge for the near future: I instructed my students to understand what a linear-quadratic controller is. Don't be shy, spend three minutes of your life, follow the link. If you do not understand anything, then we are fellow travelers. I (a professional mathematician-programmer) did not understand anything either. And I assure you, this can be sorted out "on the fingers." At the moment I do not know what it is, but I assure you that we will be able to figure it out.

    So, the first lecture that I am going to give my students, after they come running to me in horror saying that the linear-quadratic controller is a terrible beast you will never master in your life, is the method of least squares. Can you solve linear equations? If you are reading this text, then most likely not.

    So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of a straight line passing through these two points:

    illustration

    This straight line should have an equation of the form: y = alpha * x + beta.

    Here alpha and beta are unknown to us, but two points of this line are known: y0 = alpha * x0 + beta and y1 = alpha * x1 + beta.

    You can write this system in matrix form:

    | x0  1 |   | alpha |   | y0 |
    | x1  1 | * | beta  | = | y1 |

    Here we should make a lyrical digression: what is a matrix? A matrix is nothing but a two-dimensional array. It is a way of storing data, and no further meaning should be attached to it. It is up to us how exactly to interpret a certain matrix. Periodically I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

    Let's replace the specific matrices with their symbolic representation: A * x = b, where x = (alpha, beta)^T.

    Then (alpha, beta) can be easily found: x = A^-1 * b.

    More specifically, for our previous data: alpha = (y1 - y0) / (x1 - x0) = 1/2 and beta = y0 - alpha * x0 = 1/2.

    Which leads to the following equation of the straight line passing through the points (1,1) and (3,2): y = x/2 + 1/2.

    Okay, everything is clear here. And let's find the equation of a straight line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

    Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? He will first rewrite the previous system of equations in the following form: alpha * i + beta * j = b, where i = (x0, x1, x2)^T, j = (1, 1, 1)^T, b = (y0, y1, y2)^T.

    In our case the vectors i, j, b are three-dimensional, therefore (in the general case) there is no solution to this system. Any vector (alpha\*i + beta\*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality in the equation cannot be achieved). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) how exactly we failed to achieve equality: e(alpha, beta) = alpha * i + beta * j - b.

    And we will try to minimize this error: min over (alpha, beta) of ||e(alpha, beta)||^2.

    Why a square?

    We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself coincides, and the square gives a smooth function (a quadratic function of the arguments (alpha,beta)), while just the length gives a function in the form of a cone, non-differentiable at the minimum point. Brr. Square is more convenient.

    Obviously, the error is minimized when the vector e is orthogonal to the plane spanned by the vectors i and j.

    Illustration

    In other words: we are looking for a line such that the sum of the squared lengths of the distances from all points to this line is minimal:

    UPDATE: here I made a mistake: the distance to the line should be measured vertically, not as an orthogonal projection. The commenter is right.

    Illustration

    In completely different words (carefully, poorly formalized, but it should be clear on the fingers): we take all possible lines between all pairs of points and look for the average line between all:

    Illustration

    Another explanation on the fingers: we attach a spring between each data point (here we have three) and the line we are looking for, and the line of the equilibrium state is exactly what we seek.

    Quadratic form minimum

    So, given the vector b and the plane spanned by the column-vectors of the matrix A (in this case (x0,x1,x2) and (1,1,1)), we are looking for a vector e with minimum squared length. Obviously, the minimum is achievable only for a vector e orthogonal to the plane spanned by the column-vectors of the matrix A:

    In other words, we are looking for a vector x=(alpha, beta) such that: A^T * (A * x - b) = 0, i.e. A^T * A * x = A^T * b.

    I remind you that this vector x=(alpha, beta) is the minimum of the quadratic function ||e(alpha, beta)||^2 = ||A * x - b||^2.

    Here it is useful to remember that a matrix can also be interpreted as a quadratic form; for example, the identity matrix ((1,0),(0,1)) can be interpreted as the function x^2 + y^2:

    quadratic form

    All this gymnastics is known as linear regression.

    Laplace equation with Dirichlet boundary condition

    Now the simplest real problem: there is a certain triangulated surface, it is necessary to smooth it. For example, let's load my face model:

    The original commit is available. To minimize external dependencies, I took the code of my software renderer, already described on Habr. To solve the linear system I use OpenNL: an excellent solver, which, however, is very difficult to install: you need to copy two files (.h + .c) into your project folder. All the smoothing is done by the following code:

    for (int d=0; d<3; d++) {
        nlNewContext();
        nlSolverParameteri(NL_NB_VARIABLES, verts.size());
        nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
        nlBegin(NL_SYSTEM);
        nlBegin(NL_MATRIX);
        for (int i=0; i<(int)verts.size(); i++) { // fidelity: tie each vertex to its old position
            nlBegin(NL_ROW);
            nlCoefficient(i, 1);
            nlRightHandSide(verts[i][d]);
            nlEnd(NL_ROW);
        }
        for (unsigned int i=0; i<faces.size(); i++) { // smoothing: a spring on every edge
            auto &face = faces[i];
            for (int j=0; j<3; j++) {
                nlBegin(NL_ROW);
                nlCoefficient(face[ j ],  1);
                nlCoefficient(face[(j+1)%3], -1);
                nlEnd(NL_ROW);
            }
        }
        nlEnd(NL_MATRIX);
        nlEnd(NL_SYSTEM);
        nlSolve();
        for (int i=0; i<(int)verts.size(); i++) {
            verts[i][d] = nlGetVariable(i);
        }
    }

    X, Y and Z coordinates are separable, so I smooth them separately. That is, I solve three systems of linear equations, each with as many variables as there are vertices in my model. The first n rows of the matrix A have only one 1 per row, and the first n components of the vector b are the original model coordinates. That is, I put a spring between the new vertex position and the old one: the new vertices should not move too far from the old ones.

    All subsequent rows of the matrix A (faces.size()*3 = the number of edges of all triangles in the mesh) have one occurrence of 1 and one occurrence of -1, while the corresponding components of the vector b are zero. This means I put a spring on each edge of our triangular mesh: each edge tries to make its two endpoints take the same value.

    Once again: all vertices are variables, and they cannot deviate far from their original position, but at the same time they try to become similar to each other.

    Here is the result:

    Everything would be fine, the model is really smoothed, but it has pulled away from its original boundary. Let's change the code a little:

    for (int i=0; i<(int)verts.size(); i++) {
        float scale = border[i] ? 1000 : 1;
        nlBegin(NL_ROW);
        nlCoefficient(i, scale);
        nlRightHandSide(scale*verts[i][d]);
        nlEnd(NL_ROW);
    }

    In our matrix A, for the vertices that are on the boundary, I add not a row of the form v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does this change? It changes our quadratic form of the error. Now a unit deviation at a boundary vertex will cost not one unit, as before, but 1000 * 1000 units. That is, we hung a stronger spring on the boundary vertices, and the solution prefers to stretch the others instead. Here is the result:

    Let's double the strength of the springs between the vertices:
    nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

    It is logical that the surface has become smoother:

    And now even a hundred times stronger:

    What is this? Imagine that we have dipped a wire ring in soapy water. The resulting soap film will try to have as little curvature as possible while touching the border, our wire ring. This is exactly what we got by fixing the border and asking for a smooth surface inside. Congratulations, we have just solved the Laplace equation with Dirichlet boundary conditions. Sounds cool? But in fact, it is just one system of linear equations to solve.

    Poisson equation

    Let's have another cool name.

    Let's say I have an image like this:

    Everything is fine, but I don't like the chair.

    I'll cut the picture in half:



    And I will select a chair with my hands:

    Then I will drag everything that is white in the mask to the left half of the picture, and at the same time, throughout the whole picture, I will require that the difference between two neighboring pixels be equal to the difference between two neighboring pixels of the right image:

    for (int i=0; i< /* the rest of this listing did not survive */

    Here is the result:

    Code and pictures are available

    Example.

    Experimental data on the values of the variables x and y are given in the table.

    As a result of their alignment, the following function was obtained:

    Using the least squares method, approximate these data with a linear dependence y=ax+b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

    The essence of the method of least squares (LSM).

    The problem is to find the coefficients of the linear dependence for which the function of the two variables a and b, F(a, b) = Σ (yᵢ - (a·xᵢ + b))², takes the smallest value. That is, with the found a and b, the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

    Thus, the solution of the example is reduced to finding the extremum of a function of two variables.

    Derivation of formulas for finding coefficients.

    A system of two equations with two unknowns is compiled and solved. Finding the partial derivatives of the function with respect to the variables a and b, we equate these derivatives to zero.

    We solve the resulting system of equations by any method (for example, by substitution or by Cramer's method) and obtain formulas for finding the coefficients by the least squares method (LSM).
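Solving the system yields the standard closed-form coefficients (a reconstruction of the formulas that the original illustrations carried):

```latex
a=\frac{n\sum_{i=1}^{n}x_i y_i-\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n\sum_{i=1}^{n}x_i^{2}-\Bigl(\sum_{i=1}^{n}x_i\Bigr)^{2}},\qquad
b=\frac{1}{n}\Bigl(\sum_{i=1}^{n}y_i-a\sum_{i=1}^{n}x_i\Bigr)
```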

    With the found a and b, the function takes the smallest value. The proof of this fact is given below.

    That's the whole method of least squares. The formula for finding the parameter a contains the sums Σxᵢ, Σyᵢ, Σxᵢyᵢ, Σxᵢ², and the parameter n, the amount of experimental data. The values of these sums are recommended to be calculated separately. The coefficient b is found after calculating a.

    It's time to remember the original example.

    Solution.

    In our example n=5. We fill in the table for the convenience of calculating the sums that enter the formulas for the required coefficients.

    The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row, for each number i.

    The values in the fifth row of the table are obtained by squaring the values of the 2nd row, for each number i.

    The values in the last column of the table are the sums of the values across the rows.

    We use the formulas of the least squares method to find the coefficients a and b. We substitute into them the corresponding values from the last column of the table:

    Hence, y=0.165x+2.184 is the desired approximating straight line.

    It remains to find out which of the lines, y=0.165x+2.184 or the second candidate, better approximates the original data, i.e. to make an estimate using the least squares method.

    Estimation of the error of the method of least squares.

    To do this, you need to calculate the sums of the squared deviations of the original data from each of the two lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

    Since the corresponding sum turns out to be smaller, the line y=0.165x+2.184 approximates the original data better.

    Graphic illustration of the least squares method (LSM).

    Everything looks great on the charts. The red line is the found line y=0.165x+2.184, the blue line is the second candidate, and the pink dots are the original data.

    What is it for, what are all these approximations for?

    I personally use it to solve problems of data smoothing and problems of interpolation and extrapolation (in the original example, you might be asked to find the value of the observed quantity y at x=3 or at x=6 by the LSM method). But we will talk more about this later in another section of the site.


    We approximate the function by a polynomial of the 2nd degree. To do this, we calculate the sums needed for the coefficients and compose the normal system of least squares, which has the form:

    Σyᵢ = a₀·n + a₁·Σxᵢ + a₂·Σxᵢ²
    Σxᵢyᵢ = a₀·Σxᵢ + a₁·Σxᵢ² + a₂·Σxᵢ³
    Σxᵢ²yᵢ = a₀·Σxᵢ² + a₁·Σxᵢ³ + a₂·Σxᵢ⁴

    The solution of the system is easy to find.

    Thus, the polynomial of the 2nd degree is found.


    Example 2. Finding the optimal degree of a polynomial.


    Example 3. Derivation of a normal system of equations for finding the parameters of an empirical dependence.

    Let us derive a system of equations for determining the coefficients of the function that performs the root-mean-square approximation of the given function with respect to points. Compose the function and write out the necessary extremum condition for it:

    Then the normal system will take the form:

    We have obtained a linear system of equations for the unknown parameters, which is easily solved.



    Proof.

    In order for the function to take the smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let's show this.

    The second-order differential has the form:

    d²F = F_aa·da² + 2·F_ab·da·db + F_bb·db²

    That is,

    F_aa = 2·Σxᵢ², F_ab = 2·Σxᵢ, F_bb = 2·n,

    therefore the matrix of the quadratic form has the form

    M = ( 2·Σxᵢ²  2·Σxᵢ ; 2·Σxᵢ  2·n )

    and the values of the elements do not depend on a and b.

    Let us show that the matrix is positive definite. This requires that the angular minors be positive.

    Angular minor of the first order: 2·Σxᵢ² > 0. The inequality is strict, since the points xᵢ do not all coincide. This will be implied in what follows.

    Angular minor of the second order: 4·(n·Σxᵢ² − (Σxᵢ)²) > 0.

    Let us prove that by the method of mathematical induction.

    Conclusion: the found values a and b correspond to the smallest value of the function and are therefore the desired parameters of the least squares method.


    Development of a forecast using the least squares method. Problem solution example

    Extrapolation is a method of scientific research based on extending past and present trends, patterns and relationships to the future development of the object of forecasting. Extrapolation methods include the moving average method, the exponential smoothing method, and the least squares method.

    The essence of the least squares method consists in minimizing the sum of squared deviations between the observed and the calculated values. The calculated values are found from the selected equation, the regression equation. The smaller the distance between the actual values and the calculated ones, the more accurate the forecast based on the regression equation.

    The theoretical analysis of the essence of the phenomenon under study, whose change is displayed by a time series, serves as the basis for choosing a curve. Considerations about the nature of the growth of the levels of the series are sometimes taken into account. So, if output is expected to grow in an arithmetic progression, then smoothing is performed with a straight line. If the growth turns out to be exponential, then smoothing should be done with an exponential function.

    The working formula of the least squares method: Y_(t+1) = a*X + b, where t+1 is the forecast period; Y_(t+1) is the predicted indicator; a and b are coefficients; X is the time index.

    The coefficients a and b are calculated by the following formulas (the standard solutions of the normal equations for a straight-line trend):

    `a = (n sum(X Y_f) - sum X sum Y_f) / (n sum X^2 - (sum X)^2)`

    `b = (sum Y_f - a sum X) / n`

    where Y_f are the actual values of the time series; n is the number of levels in the time series.
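As a sketch of this calculation (assuming X is the time index 1…n; the series values below are illustrative, not the article's data), the coefficients and a three-period forecast can be computed as:

```python
# Least-squares trend forecast Y_(t+1) = a*X + b, where X is the time index.
# The level values below are illustrative, not the article's data.
y = [2.99, 3.01, 2.94, 2.85, 2.81, 2.78, 2.70, 2.62, 2.55, 2.48]  # actual levels Y_f
n = len(y)
x = list(range(1, n + 1))  # time index X = 1..n

sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
b = (sy - a * sx) / n                          # intercept

# Forecast the next three periods (t+1, t+2, t+3).
forecast = [a * (n + k) + b for k in (1, 2, 3)]
print(round(a, 4), round(b, 4))
print([round(f, 2) for f in forecast])
```

With a downward-sloping series like this one, the forecast continues the declining trend.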

    The smoothing of time series by the least squares method serves to reflect the patterns of development of the phenomenon under study. In the analytical expression of a trend, time is considered as an independent variable, and the levels of the series act as a function of this independent variable.

    The development of a phenomenon does not depend on how many years have passed since the starting point, but on what factors influenced its development, in what direction and with what intensity. From this it is clear that the development of a phenomenon in time appears as a result of the action of these factors.

    Correctly determining the type of curve - the type of analytical dependence on time - is one of the most difficult tasks of pre-forecast analysis.

    The type of function describing the trend, whose parameters are determined by the least squares method, is in most cases selected empirically: several functions are constructed and compared with each other by the value of the root-mean-square error, calculated by the formula:

    `sigma = sqrt(sum(Y_f - Y_r)^2 / (n - p))`

    where Y_f are the actual values of the time series; Y_r are the calculated (smoothed) values of the time series; n is the number of levels in the time series; p is the number of parameters in the formulas describing the trend (the development tendency).
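A minimal sketch of this error calculation (the Y_f and Y_r values are illustrative; p = 2 for a straight-line trend with two parameters):

```python
import math

# Root-mean-square error of a fitted trend: sigma = sqrt(sum((Y_f - Y_r)^2) / (n - p)).
yf = [3.1, 2.9, 2.8, 2.6, 2.5]        # actual series levels Y_f (illustrative)
yr = [3.05, 2.93, 2.81, 2.69, 2.57]   # smoothed (calculated) levels Y_r
p = 2                                  # parameters of a straight-line trend
n = len(yf)

sigma = math.sqrt(sum((f - r) ** 2 for f, r in zip(yf, yr)) / (n - p))
print(round(sigma, 4))
```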

    Disadvantages of the least squares method :

    • when the economic phenomenon under study is described by a mathematical equation, the forecast is accurate only for a short period of time, and the regression equation should be recalculated as new information becomes available;
    • the complexity of selecting the regression equation, although this is solvable using standard computer programs.

    An example of using the least squares method to develop a forecast

    Task. There are data characterizing the unemployment rate in the region, %.

    • Build a forecast of the unemployment rate in the region for the months of November, December, January, using the methods: moving average, exponential smoothing, least squares.
    • Calculate the errors in the resulting forecasts using each method.
    • Compare the results obtained, draw conclusions.

    Least squares solution

    For the solution, we will compile a table in which we will make the necessary calculations:

    ε = 28.63/10 = 2.86%, so the forecast accuracy is high.

    Conclusion: Comparing the results obtained by the moving average method, exponential smoothing, and the least squares method, we can say that the average relative error of the exponential smoothing method falls within 20-50%. This means that the forecast accuracy in that case is only satisfactory.

    In the first and third cases, the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method gave the most reliable results (forecast for November - 1.52%, for December - 1.53%, for January - 1.49%), since the average relative error with this method is the smallest - 1.13%.

    Least squares method


    MNC Program

    Enter data

    Data and Approximation y = a + b x

    i - number of the experimental point;
    x_i - value of the fixed parameter at point i;
    y_i - value of the measured parameter at point i;
    ω_i - weight of the measurement at point i;
    y_i, calc. - value of y calculated from the regression at point i;
    Δy_i - difference between the measured value and the value calculated from the regression at point i;
    S_(x_i)(x_i) - error estimate of x_i when measuring y at point i.

    Data and Approximation y = k x

    i x i y i ω i y i, calc. Δy i S x i (x i)


    User manual for the MNC online program.

    In the data field, enter the values of `x` and `y` at one experimental point on each separate line. Values must be separated by whitespace (a space or a tab).

    The third value can be the weight `w` of the point. If the weight is not specified, it is set equal to one. In the overwhelming majority of cases, the weights of the experimental points are unknown or not calculated, and all experimental data are treated as equivalent. Sometimes the weights in the studied range of values are definitely not equivalent and can even be calculated theoretically. For example, in spectrophotometry weights can be calculated using simple formulas, although in practice this is usually neglected to reduce labor costs.

    Data can be pasted through the clipboard from an office suite spreadsheet, such as Excel from Microsoft Office or Calc from Open Office. To do this, in the spreadsheet, select the range of data to copy, copy to the clipboard, and paste the data into the data field on this page.

    For a least-squares calculation, at least two points are required to determine the two coefficients: `b`, the tangent of the line's angle of inclination, and `a`, the value the line cuts off on the `y` axis.

    To estimate the error of the calculated regression coefficients, it is necessary to set the number of experimental points to more than two.

    Least squares method (LSM).

    The greater the number of experimental points, the more accurate the statistical estimate of the coefficients (due to the decrease in Student's coefficient), and the closer the estimate is to that of the general population.

    Obtaining values at each experimental point often involves significant labor costs, so a compromise number of experiments is usually carried out - one that gives an acceptable estimate without excessive labor costs. As a rule, for a linear least-squares dependence with two coefficients, the number of experimental points is chosen in the region of 5-7.

    A Brief Theory of Least Squares for Linear Dependence

    Suppose we have a set of experimental data in the form of pairs of values ​​[`y_i`, `x_i`], where `i` is the number of one experimental measurement from 1 to `n`; `y_i` - the value of the measured value at the point `i`; `x_i` - the value of the parameter we set at the point `i`.

    An example is Ohm's law. By changing the voltage (potential difference) across a section of an electrical circuit, we measure the current passing through that section. Physics gives us the experimentally established dependence:

    `I=U/R`,
    where `I` - current strength; `R` - resistance; `U` - voltage.

    In this case, `y_i` is the measured current value, and `x_i` is the voltage value.

    As another example, consider the absorption of light by a solution of a substance. Chemistry gives us the formula:

    `A = εl C`,
    where `A` is the optical density of the solution; `ε` is the molar absorption coefficient of the solute; `l` is the path length of the light through the cuvette with the solution; `C` is the concentration of the solute.

    In this case, `y_i` is the measured optical density `A`, and `x_i` is the concentration of the substance that we set.

    We will consider the case when the relative error in setting `x_i` is much less than the relative error in measuring `y_i`. We will also assume that all measured values ​​of `y_i` are random and normally distributed, i.e. obey the normal distribution law.

    In the case of a linear dependence of `y` on `x`, we can write a theoretical dependence:
    `y = a + bx`.

    From a geometric point of view, the coefficient `b` denotes the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` - the value of `y` at the point of intersection of the line with the `y` axis (with `x = 0`).

    Finding the parameters of the regression line.

    In an experiment, the measured values ​​of `y_i` cannot lie exactly on the theoretical line due to measurement errors, which are always inherent in real life. Therefore, a linear equation must be represented by a system of equations:
    `y_i = a + b x_i + ε_i` (1),
    where `ε_i` is the unknown measurement error of `y` in the `i`th experiment.

    Dependence (1) is also called a regression, i.e. a dependence between two quantities that holds in a statistical sense.

    The task of restoring the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

    To find the coefficients `a` and `b`, the least squares method (LSM) is usually used. It is a special case of the maximum likelihood principle.

    Let's rewrite (1) as `ε_i = y_i - a - b x_i`.

    Then the sum of squared errors will be
    `Φ = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - b x_i)^2`. (2)

    The principle of the least squares method is to minimize the sum (2) with respect to the parameters `a` and `b`.

    The minimum is reached when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
    `frac(partial Φ)(partial a) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial a) = 0`
    `frac(partial Φ)(partial b) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial b) = 0`

    Expanding the derivatives, we obtain a system of two equations with two unknowns:
    `sum_(i=1)^(n) (2a + 2bx_i - 2y_i) = sum_(i=1)^(n) (a + bx_i - y_i) = 0`
    `sum_(i=1)^(n) (2bx_i^2 + 2ax_i - 2x_iy_i) = sum_(i=1)^(n) (bx_i^2 + ax_i - x_iy_i) = 0`

    Opening the brackets and moving the sums that do not contain the desired coefficients to the other side, we obtain a system of linear equations:
    `sum_(i=1)^(n) y_i = a n + b sum_(i=1)^(n) x_i`
    `sum_(i=1)^(n) x_iy_i = a sum_(i=1)^(n) x_i + b sum_(i=1)^(n) x_i^2`

    Solving the resulting system, we find formulas for the coefficients `a` and `b`:

    `a = frac(sum_(i=1)^(n) y_i sum_(i=1)^(n) x_i^2 - sum_(i=1)^(n) x_i sum_(i=1)^(n) x_iy_i) (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.1)

    `b = frac(n sum_(i=1)^(n) x_iy_i - sum_(i=1)^(n) x_i sum_(i=1)^(n) y_i) (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.2)

    These formulas have solutions when `n > 1` (a line can be drawn through at least 2 points) and when the determinant `D = n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2 != 0`, i.e. when the `x_i` points in the experiment differ (i.e. when the line is not vertical).
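Formulas (3.1) and (3.2) translate directly into code; a minimal sketch (the helper name `fit_line` and the sample points are illustrative):

```python
# Direct implementation of formulas (3.1) and (3.2) for y = a + b*x.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx  # determinant D; requires n > 1 and differing x_i
    a = (sy * sxx - sx * sxy) / d  # intercept, formula (3.1)
    b = (n * sxy - sx * sy) / d    # slope, formula (3.2)
    return a, b

# Points lying exactly on y = 1 + 2x must be recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```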

    Estimation of errors in the coefficients of the regression line

    For a more accurate estimate of the error in calculating the coefficients `a` and `b`, a large number of experimental points is desirable. When `n = 2`, it is impossible to estimate the error of the coefficients, because the approximating line will uniquely pass through two points.

    The error of a random variable `V` is determined by the law of error accumulation:
    `S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
    where `p` is the number of `z_i` parameters with `S_(z_i)` error that affect the `S_V` error;
    `f` is a dependency function of `V` on `z_i`.

    Let's write the law of accumulation of errors for the error of the coefficients `a` and `b`
    `S_a^2 = sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial a)(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial a)(partial y_i))^2`,
    `S_b^2 = sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial b)(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial b)(partial y_i))^2`,
    because `S_(x_i)^2 = 0` (we stipulated earlier that the error of `x` is negligible).

    `S_y^2 = S_(y_i)^2` - the error (variance, the square of the standard deviation) of the `y` measurement, assuming the error is uniform for all values of `y`.

    Substituting formulas for calculating `a` and `b` into the resulting expressions, we get

    `S_a^2 = S_y^2 frac(sum_(i=1)^(n) (sum_(i=1)^(n) x_i^2 - x_i sum_(i=1)^(n) x_i)^2) (D^2) = S_y^2 frac((n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2) sum_(i=1)^(n) x_i^2) (D^2) = S_y^2 frac(sum_(i=1)^(n) x_i^2) (D)` (4.1)

    `S_b^2 = S_y^2 frac(sum_(i=1)^(n) (n x_i - sum_(i=1)^(n) x_i)^2) (D^2) = S_y^2 frac(n (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)) (D^2) = S_y^2 frac(n) (D)` (4.2)

    In most real experiments, the value of `S_y` is not measured directly: measuring it would require several parallel measurements (experiments) at one or several points of the plan, which increases the time (and possibly the cost) of the experiment. Therefore, it is usually assumed that the deviation of `y` from the regression line is random, and the variance estimate for `y` is calculated by the formula:

    `S_y^2 = S_(y, rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2) (n-2)`.

    The divisor `n-2` appears because we have reduced the number of degrees of freedom due to the calculation of two coefficients for the same sample of experimental data.

    This estimate is also called the residual variance relative to the regression line `S_(y, rest)^2`.
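Putting the residual-variance estimate together with formulas (4.1) and (4.2), a sketch under illustrative data:

```python
import math

# Residual variance S_y^2 = sum((y_i - a - b*x_i)^2) / (n - 2) and the
# coefficient errors from formulas (4.1) and (4.2). Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
d = n * sxx - sx * sx          # determinant D

a = (sy * sxx - sx * sxy) / d  # intercept (3.1)
b = (n * sxy - sx * sy) / d    # slope (3.2)

s_y2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # residual variance
s_a = math.sqrt(s_y2 * sxx / d)  # error of the intercept, (4.1)
s_b = math.sqrt(s_y2 * n / d)    # error of the slope, (4.2)
print(round(b, 3), round(s_b, 3))
```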

    The assessment of the significance of the coefficients is carried out according to the Student's criterion

    `t_a = frac(|a|) (S_a)`, `t_b = frac(|b|) (S_b)`

    If the calculated criteria `t_a`, `t_b` are less than the table criteria `t(P, n-2)`, then it is considered that the corresponding coefficient is not significantly different from zero with a given probability `P`.
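A sketch of this significance check (the data are illustrative; 3.182 is the two-sided table value t(P = 0.95, f = n − 2 = 3) for n = 5 points):

```python
import math

# Significance of the coefficients by Student's criterion: t = |coef| / S_coef,
# compared with the tabular two-sided value t(P, n-2). Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)
sx, sy, sxx = sum(xs), sum(ys), sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
d = n * sxx - sx * sx
a = (sy * sxx - sx * sxy) / d
b = (n * sxy - sx * sy) / d
s_y2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)

t_a = abs(a) / math.sqrt(s_y2 * sxx / d)
t_b = abs(b) / math.sqrt(s_y2 * n / d)

T_TABLE = 3.182  # t(P=0.95, f=3) from a Student's-t table
print(t_b > T_TABLE, t_a > T_TABLE)
```

For this data set the slope is significant while the intercept is not distinguishable from zero.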

    To assess the quality of the description of the linear relationship, one can compare `S_(y, rest)^2` with the variance about the mean `S_(bar y)^2` using the Fisher criterion.

    `S_(bar y)^2 = frac(sum_(i=1)^n (y_i - bar y)^2) (n-1) = frac(sum_(i=1)^n (y_i - (sum_(i=1)^n y_i)/n)^2) (n-1)` - the sample estimate of the variance of `y` about the mean.

    To evaluate the effectiveness of the regression equation for describing the dependence, the Fisher coefficient is calculated
    `F = S_(bar y)^2 / S_(y, rest)^2`,
    which is compared with the tabular Fisher coefficient `F(P, n-1, n-2)`.

    If `F > F(P, n-1, n-2)`, the difference between describing the dependence `y = f(x)` with the regression equation and describing it with the mean is considered statistically significant with probability `P`. That is, the regression describes the dependence better than the spread of `y` around the mean.
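A sketch of the Fisher-criterion comparison (data are illustrative; 9.12 is the table value F(P = 0.95, f1 = n − 1 = 4, f2 = n − 2 = 3)):

```python
# Fisher criterion: compare the variance of y about the mean with the
# residual variance about the regression line. Data are illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)
sx, sy, sxx = sum(xs), sum(ys), sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
d = n * sxx - sx * sx
a = (sy * sxx - sx * sxy) / d
b = (n * sxy - sx * sy) / d

s_rest = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # about the line
mean_y = sy / n
s_mean = sum((y - mean_y) ** 2 for y in ys) / (n - 1)                 # about the mean

F = s_mean / s_rest
F_TABLE = 9.12  # F(0.95, 4, 3) from a Fisher table
print(F > F_TABLE)  # the regression describes the data better than the mean
```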


    Least squares method

    The least squares method means determining the unknown parameters a, b, c, … of the accepted functional dependence

    y = f(x,a,b,c,…),

    which would provide a minimum of the mean square (variance) of the error

    `Φ = 1/n sum_(i=1)^(n) (y_i - f(x_i, a, b, c, ...))^2`, (24)

    where x_i, y_i is the set of pairs of numbers obtained from the experiment.

    Since a necessary condition for the extremum of a function of several variables is that its partial derivatives equal zero, the parameters a, b, c, … are determined from the system of equations:

    `(partial Φ)/(partial a) = 0`; `(partial Φ)/(partial b) = 0`; `(partial Φ)/(partial c) = 0`; … (25)

    It must be remembered that the least squares method is used to select parameters after the form of the function y = f(x) has been defined.

    If from theoretical considerations it is impossible to draw any conclusions about what the empirical formula should be, then one has to be guided by visual representations, primarily a graphical representation of the observed data.

    In practice, the following types of functions are most often used:

    1) linear: `y = a + bx`;

    2) quadratic: `y = a + bx + cx^2`.
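Following the empirical selection described above, both forms can be fitted and compared by their RMS errors; a sketch under illustrative data (the helper names `gauss_solve`, `fit_poly`, and `rms` are hypothetical):

```python
import math

def gauss_solve(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_poly(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations (25)."""
    k = deg + 1
    m = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return gauss_solve(m, v)

def rms(xs, ys, coefs):
    """Root-mean-square error with n - p degrees of freedom."""
    resid2 = sum((y - sum(c * x ** i for i, c in enumerate(coefs))) ** 2
                 for x, y in zip(xs, ys))
    return math.sqrt(resid2 / (len(xs) - len(coefs)))

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 4.8, 9.9, 17.1, 26.2]  # roughly y = 1 + x^2 (illustrative)
lin = fit_poly(xs, ys, 1)
quad = fit_poly(xs, ys, 2)
print(rms(xs, ys, lin), rms(xs, ys, quad))
```

Here the quadratic form yields the smaller RMS error, so it would be the preferred empirical formula for this data.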


