
Fitting curves to your data using least squares

Introduction

If you're an engineer (like I used to be in a previous life), you have probably done your bit of experimenting. Usually, you then need a way to fit your measurement results with a curve. If you're a proper engineer, you also have some idea what type of equation should theoretically fit your data.

Perhaps you did some measurements with results like this:

Fitting data with an equation

A well-known way to fit data to an equation is the least squares method (LS). I won't repeat the theory behind the method here; the Wikipedia article on least squares covers it.

Fitting simple linear equations

Excel provides us with a couple of tools to perform least squares calculations, but they are all centered around the simpler functions: simple linear functions of the form
y=a.x+b, y=a.exp(b.x), y=a.x^b, etcetera. With some tricks you can also perform LS on polynomials using Excel.
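
For readers who want to see what such a simple fit computes, here is a minimal Python sketch of the closed-form least squares solution for y=a.x+b, and of the usual trick of fitting y=a.exp(b.x) by fitting a line to ln(y). The function names are hypothetical and not part of Excel or the example file. Note that the log trick minimizes the squared errors in ln(y), not in y itself.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) that minimise sum((y - (a*x + b))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) via a straight line through (x, ln y); needs all y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope
```

Calling fit_line(x_values, y_values) returns the slope and intercept, which is the same pair of numbers Excel's SLOPE and INTERCEPT worksheet functions return for the same data.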

Regression tools in the Analysis Toolpak Add-in

Activate the Analysis Toolpak in your list of Add-ins (File button or Office button, Excel Options, Add-ins tab, click Go):

The add-ins list of Excel with the Analysis toolpak activated

This adds the "Data Analysis" button to your ribbon, on the Data tab, Analysis group (this is also the location where you can find the Solver button mentioned later on):

Ribbon with Data Analysis button

Click that button to explore which regression tools are available.

Worksheet functions

There are a number of worksheet functions which you can also use to do regression analysis. To quickly access them, select an empty cell and press Shift+F3 to open the function wizard. In the search box, enter "Regression" (without the quotes of course). Excel will list the relevant functions:

Function wizard showing Regression functions

Pick one and click on the "Help on this function" link at the bottom of the function wizard to find out more about its use.

Fitting more complex functions

What if you want to fit a more complex function, like y=exp(a.x).sin(x) + b ? How can that be done using Excel?

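I devised a way to do this using the Solver add-in: calculate the fitted value for each data point, sum the squared differences between the measured and the fitted values, and let Solver minimize that sum by changing the constants of the equation.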

Explanation of the Example file

I created an example file you can put to use directly. Below you will find a link to the file and an explanation on how the file is put together.

Download

Download this file:

Non linear least squares example

How the file works

Data

The calculations and the data are concentrated on Sheet1 of the file. The most important area is the table starting in cell A1:

Data table in LS file

Column A holds your x-values and column B holds the y-values. The third column holds the formula that calculates the result of the fitted equation using the constants and the x-values. The sample file has this formula in column C:

=EXP(Const_a*xValues)*SIN(xValues)+Const_b

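The fourth column of the table holds the squared difference between the measured and the fitted y-value for each row; these values are summed later to get the sum of squares. Formula: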

=(B2-C2)^2
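
As an illustration only, the two formulas above can be mirrored in a few lines of Python. This is a sketch, not part of the example file; the function names are hypothetical and the constants are passed in as arguments instead of being read from named cells.

```python
import math

def model(x, const_a, const_b):
    # Same expression as the column C formula:
    # =EXP(Const_a*xValues)*SIN(xValues)+Const_b
    return math.exp(const_a * x) * math.sin(x) + const_b

def squared_differences(x_values, y_values, const_a, const_b):
    # Column C: the fitted value for every x
    y_hat = [model(x, const_a, const_b) for x in x_values]
    # Column D: =(B2-C2)^2, one value per row
    return [(y - fitted) ** 2 for y, fitted in zip(y_values, y_hat)]
```

Summing the list returned by squared_differences gives the residual sum of squares for that particular choice of constants.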

As you probably noted already, I used a couple of range names. I explain those below.

Range names

To ease working with the file I created some range names. Instead of using the table references that Excel 2007, 2010 and 2013 offer, I included some dynamic range names that point to the data. This means the workbook also works in Excel 2003 and before.

Range name   Refers To                                                 Description
Const_a      =Sheet1!$G$2                                              Model constant
Const_b      =Sheet1!$G$3                                              Model constant
Const_c      =Sheet1!$G$4                                              Model constant
Const_d      =Sheet1!$G$5                                              Model constant
Const_e      =Sheet1!$G$6                                              Model constant
Const_f      =Sheet1!$G$7                                              Model constant
Const_g      =Sheet1!$G$8                                              Model constant
Const_h      =Sheet1!$G$9                                              Model constant
Constants    =Sheet1!$G$2:$G$9                                         Constants of equation
xValues      =OFFSET(Sheet1!$A$2,0,0,COUNT(Sheet1!$A$1:$A$65551),1)    Column with x values
yDelta       =OFFSET(xValues,0,3)                                      Column with squared differences
yhat         =OFFSET(xValues,0,2)                                      Column with model fit results
yValues      =OFFSET(xValues,0,1)                                      Column with y values
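
To make the dynamic range names less mysterious, here is a rough Python analogy of what they resolve to. It assumes the sheet is represented as a list of rows with the header in the first row; the function names are hypothetical. COUNT only counts numeric cells, so the text header in A1 is skipped and the range is as tall as the number of x values.

```python
def count_numeric(cells):
    """Like Excel's COUNT on a range: count cells holding numbers only."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

def named_columns(rows):
    """rows[0] is the header row; later rows are [x, y, yhat, squared difference]."""
    n = count_numeric(row[0] for row in rows)
    data = rows[1:1 + n]  # OFFSET(A2,0,0,n,1): start at row 2, n rows tall
    return {
        "xValues": [r[0] for r in data],
        "yValues": [r[1] for r in data],  # OFFSET(xValues,0,1)
        "yhat": [r[2] for r in data],     # OFFSET(xValues,0,2)
        "yDelta": [r[3] for r in data],   # OFFSET(xValues,0,3)
    }
```

As in Excel, an empty cell in the middle of column A makes the count too low, so the ranges would stop short of the last rows.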

Constants of the equation

The const range names point to a second table in the file:

Constants table

This table is where you enter your initial guesses for the constants and where the Solver add-in returns the results. As you can see, below that table the residual Sum of Squares is shown. Formula:

=SUM(yDelta)

It is this cell G11 that we try to minimize using the Solver add-in.

Using Solver

First of all, you need to install the Solver add-in. Use the Add-ins dialog I showed at the top of this article and check the box next to "Solver Add-in". This adds the Solver button in the same location on the ribbon as the "Data Analysis" button I showed before.

After you have ensured the model formula is correctly entered in column C and the calculations work, click the Solver button. The dialog below is shown:

The Solver dialog

Make sure the "Set Objective" box points to the cell that contains the sum of squares. Select "Min" next to "To".

The "By Changing Variable cells" box must ONLY point to the cells that are used by your model, otherwise the degrees of freedom calculation (on the ANOVA sheet) will be wrong. Also ensure that any unused constant cells are empty by selecting them and pressing the Delete key.

Note that depending on your model type you may have to change the solver settings. A bit of experimenting may be needed for best results. You can save and load Solver settings using the appropriate button.

So be prudent and critical about whether you have actually reached a best fit: the Solver may come up with non-optimal results, depending on your model equation and solver settings.
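
One way to check how sensitive the result is to the initial guesses is to evaluate the sum of squares for a coarse grid of candidate constants and start from the best one. The Python sketch below illustrates that idea for the example model; it is not something the example file or Solver does, and all names in it are hypothetical.

```python
import math

def model(x, a, b):
    # Example model from this article: y = exp(a.x).sin(x) + b
    return math.exp(a * x) * math.sin(x) + b

def sum_of_squares(xs, ys, a, b):
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def best_starting_guess(xs, ys, a_candidates, b_candidates):
    """Return (sse, a, b) for the candidate pair with the smallest sum of squares."""
    return min(
        (sum_of_squares(xs, ys, a, b), a, b)
        for a in a_candidates
        for b in b_candidates
    )
```

If starting from different candidates leads to clearly different final constants, the sum of squares probably has more than one local minimum and the result deserves a closer look.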

If you're happy with the current Solver settings, click Solve. After some time the "Solver Results" dialog opens, giving you some options on how to continue. Note that it also enables you to ask for a couple of reports.
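
To give an idea of what a solver does with the sum of squares, here is a small Python sketch that minimizes it for the example model with a damped Gauss-Newton iteration. This is an assumption-laden stand-in, not the algorithm Excel's Solver uses, and the function names are hypothetical. It only handles the two constants a and b of y=exp(a.x).sin(x)+b.

```python
import math

def model(x, a, b):
    return math.exp(a * x) * math.sin(x) + b

def sum_of_squares(xs, ys, a, b):
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def fit(xs, ys, a=0.0, b=0.0, iterations=50):
    """Damped Gauss-Newton for the two constants; returns (a, b, sse)."""
    n = len(xs)
    sse = sum_of_squares(xs, ys, a, b)
    for _ in range(iterations):
        # Partial derivative of the model with respect to a (with respect to b it is 1)
        g = [x * math.exp(a * x) * math.sin(x) for x in xs]
        r = [y - model(x, a, b) for x, y in zip(xs, ys)]  # residuals
        s_gg = sum(gi * gi for gi in g)
        s_g = sum(g)
        s_gr = sum(gi * ri for gi, ri in zip(g, r))
        s_r = sum(r)
        det = s_gg * n - s_g * s_g
        if det == 0:
            break
        # Solve the 2x2 normal equations for the proposed change in a and b
        da = (s_gr * n - s_g * s_r) / det
        db = (s_gg * s_r - s_g * s_gr) / det
        # Halve the step until the sum of squares actually decreases
        step = 1.0
        while step > 1e-10:
            trial = sum_of_squares(xs, ys, a + step * da, b + step * db)
            if trial < sse:
                a, b, sse = a + step * da, b + step * db, trial
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
    return a, b, sse
```

Like Solver, this only finds a minimum near the initial guesses passed in as a and b, so different guesses can end in different results.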

The example file shows the end result:

The end result

Analysis of Variance

On the ANOVA tab, you can find the ANalysis Of VAriance table, which looks like this:

The ANOVA table

The most important cell here is cell F2. If the value in that cell is less than 0.05, the fit is statistically significant at the 95% confidence level: it is unlikely that a fit this good arose by chance. So less is more for this cell; you want it to stay below 0.05. The cell will turn red for values over 0.05.

Please check whether the value in cell B2 is exactly one less than the number of constants you used for the model. If not, go back to Sheet1 and empty the cells not used by your model. So if you used const_a and const_b, then the value of B2 (model degrees of freedom) should be 1.
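
For reference, the quantities in a table like this can be sketched in Python as follows. This assumes the conventional split of degrees of freedom (number of constants minus one for the model, number of points minus number of constants for the residuals); the formulas in the example file may differ, and the function name is hypothetical. For a non-linear model, taking the model sum of squares as total minus residual is an approximation.

```python
def anova(y_values, y_hat, n_constants):
    """Return sums of squares, degrees of freedom and the F statistic."""
    n = len(y_values)
    mean_y = sum(y_values) / n
    ss_total = sum((y - mean_y) ** 2 for y in y_values)
    ss_resid = sum((y - fitted) ** 2 for y, fitted in zip(y_values, y_hat))
    ss_model = ss_total - ss_resid
    df_model = n_constants - 1   # e.g. 1 when only const_a and const_b are used
    df_resid = n - n_constants
    f_stat = (ss_model / df_model) / (ss_resid / df_resid)
    return {
        "ss_model": ss_model,
        "ss_resid": ss_resid,
        "ss_total": ss_total,
        "df_model": df_model,
        "df_resid": df_resid,
        "F": f_stat,
    }
```

The significance that belongs to the F statistic is not computed here; in Excel it can be obtained with F.DIST.RT (FDIST in older versions) using the F value and the two degrees of freedom.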

Conclusion

As you've seen, fitting complex functions to your data isn't very hard to do. A combination of some relatively simple formulas and the Solver Add-in comes to the rescue here.

Some advice from one engineer to another: be critical, please. Don't believe everything Excel tells you! Carefully analyse the results it returns, as Solver may get things wrong and not give you the best possible result!


Comments

Showing last 8 comments of 77 in total:

 


Comment by: Jan Karel Pieterse (1/23/2017 2:46:53 PM)

Hi Sacha,

For problems like that dedicated stats packages are a lot more capable.

 


Comment by: sumaira ibrahim (1/31/2017 9:46:50 PM)

very helpful. i am a chemistry researcher and i need this for multi component analysis.

 


Comment by: Anthony Lucio (2/2/2017 9:00:43 PM)

Hello,

I am attempting to use your spreadsheet to model my own data and curves. One question I have is how do you fit more than two parameters? The one discussed above uses two fit parameters but I would like to fit either three, six or nine parameters. Basically, can we extend the worksheet to optimize for Const_c through Const_i? I should mention I am using complex variable equations. I have gotten to the point where I can fit a curve manually by guess/check with MS Excel for three input parameters but I would like to extend it to six or nine, which is cumbersome to attempt manually and would be easier done with the Solver tool IMO. Thanks in advance for any help! Feel free to send me an email as well.

Cheers,
Anthony

 


Comment by: Jan Karel Pieterse (2/3/2017 11:49:21 AM)

Hi Anthony,

You should be able to use the default 8 parameters the file already allows you to use (const_a to h). If you need more, you can extend the table containing the constants. You do need to make sure each constant has a range name pointing to its cell. You can do so by selecting the table (not its header) with the constants and choosing Formulas, Create From Selection and only checking the box "Left Column".

 


Comment by: Gosia (2/8/2017 7:13:25 PM)

Hello,

I try to find 2 parameters within the function containing exponential functions and changing with time two variables:m and p.
I've tried to insert columns to the right of the column with x values (time in my case). Unfortunately, solver doesn't work.
Is there any reason for that?

I've called my columns x, m, p containing values at a specified times. My modelled solution I put in the "yhat" column and real data of solution in y column.

I think that I'm not aware of some function of solver (I don't know, maybe there's a different way to mark those 3 columns as variables changing with time), I'll appreciate any hint.

Gosia

 


Comment by: Roman Pienzer (2/9/2017 12:43:22 PM)

Hi Jan,

a question regarding the ANOVA.

In the sheet that you provided, the degrees of freedom are calculated with reference to the amount of yvalues.

Conducting an ANOVA with Excel's built in Data Analysis Tool on the other hand, the degrees of freedom are calculated as K*(n-1), which means both the number of yvalues and yhats are being counted.

Obviously this leads to differing results, in my case with df(JKP)=36 vs df(Excel)=72.

How do I choose the appropriate method? What's the rationale behind only counting the yvalues?

Many thanks for the great sheet and your support in advance!

Roman

 


Comment by: Jan Karel Pieterse (2/9/2017 4:01:04 PM)

Hi Roman,

As far as I know, the number of degrees of freedom equals:

Number of y-values - Number of constants in model - 1

Which is what the formula calculates. But I might be wrong of course :-)

 


Comment by: Jan Karel Pieterse (2/9/2017 4:02:37 PM)

Hi Gosia,

I'm not sure what the problem is I'm afraid :-)

 

