Python-Old-Version/C3/Least-Square-Fit/English
In this tutorial we shall look at obtaining the least squares fit of a given data-set. For this purpose, we shall use the same pendulum data that we used in the tutorial on plotting from files.
To be able to follow this tutorial comfortably, you should have a working knowledge of arrays, plotting and file reading.
A least squares fit curve is the curve for which the sum of the squares of it's distance from the given set of points is minimum.
Previously, when we plotted the data from pendulum.txt we got a scatter plot of points as shown.
In our example, we know that the length of the pendulum is proportional to the square of the time-period. But when we plot the data using lines we get a distorted line as shown. What we expect ideally, is something like the redline in this graph. From the problem we know that L is directly proportional to T^2. But experimental data invariably contains errors and hence does not produce an ideal plot. The best fit curve for this data has to be a linear curve and this can be obtained by performing least square fit on the data set. We shall use the lstsq function to obtain the least squares fit curve.
The equation of the line is of the form T^2 = mL+c. We have a set of values for L and the corresponding T^2 values. Using this, we wish to obtain the equation of the straight line.
In matrix form the equation is represented as shown, Tsq = A.p where Tsq is an NX1 matrix, and A is an NX2 matrix as shown. And p is a 2X1 matrix of the slope and Y-intercept. In order to obtain the least square fit curve we need to find the matrix p.
Let's get started. As you can see, the file pendulum.txt is on our Desktop and hence we navigate to the Desktop by typing
$ cd Desktop
Let's now fire up IPython:
$ ipython -pylab
We have already seen (in a previous tutorial), how to read a file and obtain the data set using loadtxt(). Let's quickly get the required data from our file.
l, t = loadtxt('pendulum.txt', unpack=True)
loadtxt() directly stores the values in the pendulum.txt into arrays l and t. Let's now calculate the values of square of the time-period.
tsq = t*t
Now we shall obtain A, in the desired form using some simple array manipulation
A = array([l, ones_like(l)])
As we have seen in a previous tutorial, ones_like() gives an array similar in shape to the given array, in this case l, with all the elements as 1. Please note, this is how we create an array from an existing array.
Let's now look at the shape of A.
A.shape
This is an 2X90 matrix. But we need a 90X2 matrix, so we shall transpose it.
A = A.T
Type A, to confirm that we have obtained the desired array.
A
Also note the shape of A.
A.shape
We shall now use the lstsq function, to obtain the coefficients m and c. lstsq returns a lot of things along with these coefficients. We may look at the documentation of lstsq, for more information by typing lstsq?
result = lstsq(A,tsq)
We extract the required coefficients, which is the first element in the list of things that lstsq returns, and store them into the variable coef.
coef = result[0]
To obtain the plot of the line, we simply use the equation of the line, we have noted before. T^2 = mL + c.
Tline = coef[0]*l + coef[1] plot(l, Tline, 'r')
Also, it would be nice to have a plot of the points. So,
plot(l, tsq, 'o')
This brings us to the end of this tutorial. In this tutorial, you've learnt how to obtain a least squares fit curve for a given set of points using lstsq. There are other curve fitting functions available in Pylab such as polyfit.
Hope you enjoyed it. Thanks.