Author Topic: Multiple backgrounds error calculation  (Read 4157 times)

Julien

  • Professor
  • ****
  • Posts: 84
  • The truth is out there...
    • Geoloweb - J. Allaz personal website
Multiple backgrounds error calculation
« on: March 17, 2015, 02:58:03 PM »
Hi,

This might appear to be a trivial question, but my statistics skills are pretty bad... Can someone explain to me how to calculate the error on the background measurement *below the peak* when we measure MULTIPLE backgrounds on each side and apply an exponential interpolation? My thoughts are that we should be able to calculate an error envelope around the exponential regression, and that the error on the background under the peak should be much lower than the individual errors on each background. Unfortunately I have no idea how such an error envelope is calculated (some program can do it, e.g. with isoplot, but when I tried, it bugs as the probability of fitting ends up to be close to zero)...

Since we are dealing with an exponential regression, I would guess a first step would be to change the exponential relation to a more easy to handle linear relation: if y = f(x) = a * exp (b * x) with y = counts and x = spectrometer position, then we can also write LN (y) = LN (a) + b * x - which is a "simple" linear equation.

Does this sounds correct?

But then... assuming the error on x is negligible (accurate spectrometer positioning) and having an error on the points that define the y = f(x) equation, how to calculate the error at a specific x position?

Maybe, to start the discussion, someone should remind me on the "simple" process how the background error is calculated on TWO points with an exponential curvature?

Best, and thanks in advance for your help!

Julien

Julien

  • Professor
  • ****
  • Posts: 84
  • The truth is out there...
    • Geoloweb - J. Allaz personal website
Re: Multiple backgrounds error calculation
« Reply #1 on: March 18, 2015, 07:44:21 PM »
**UPDATE**

The background interpolation appears to fit better an exponential equation (or a polynomial - but this would further complicate the statistics I believe…). The difference is small, but can become significant for trace element analysis. This has been discussed in Jercinovic et 2008 (Chemical Geology 254, 197–215) and reference therein. See attachment with an exponential and a power interpolation. Errors - just barely visible behind the points - on each background obtained by doubling the standard deviation on 5 measurements. The WDS region correspond to Pb Ma-Mb in monazite (obtained on very large PET [VLPET]).

---Suggestion from Les LG Moore (SX-50 email list response...)---

The 95% confidence interval around a regression can be found here:
 
http://www.real-statistics.com/regression/confidence-and-prediction-intervals/
 
Of course this is for linear regression.
Simply convert the data space into a linear space by the appropriate transform and work it there.
 
You could also use the more clever regression packages such as SAS in which it is simply any option in the job submission code and can work on all of it’s regression forms.
 
Les
 

AndrewLocock

  • Professor
  • ****
  • Posts: 104
    • University of Alberta Electron Microprobe Laboratory
Re: Multiple backgrounds error calculation
« Reply #2 on: March 25, 2015, 10:31:14 AM »
It is possible to approximate the uncertainty in an exponential curve or power law curve in Excel by Monte Carlo (brute force) methods as described below.

In a chart of a set of XY points, Excel can generate an exponential or power-law trendline, whose equation and R^2 value can be reported on that chart.

For example, one uses Excel to plot these 11 points:
X1   0.533   Y1   1.9
X2   0.545   Y2   1.7
X3   0.554   Y3   1.6
X4   0.558   Y4   1.6
X5   0.568   Y5   1.5
X6   0.572   Y6   1.455
X7   0.62   Y7   1.03
X8   0.624   Y8   1
X9   0.628   Y9   0.97
X10   0.632   Y10   0.93
X11   0.634   Y11   0.89

In a chart of these points, Excel generates an exponential trendline, y = a e^bx:
y = 85.281e^(-7.143x) with R^2 0.9963, or a power-law trendline, y = a x^b:
y = 0.137x^(-4.19) with R^2 0.9949.

Rather than plotting a chart, the coefficients of these functions can be calculated by using combinations of the LINEST, LN, EXP and INDEX functions as follows.

For an exponential relationship:
a: =EXP(INDEX(LINEST(LN(Y-range of cells),(X-range-of-cells),TRUE),2))
b: =INDEX(LINEST(LN(Y-range-of-cells),(X-range-of-cells),true),1)
For the 11 data points above, a =85.28146, b = -7.14325

For a power-law relationship:
a: =EXP(INDEX(LINEST(LN(Y-range-of-cells),LN(X-range-of-cells),TRUE),2))
b: =INDEX(LINEST(LN(Y-range-of-cells),LN(X-range-of-cells),TRUE),1)
For the 11 data points above, a = 0.137015, b = -4.18959

A value of Y for any point X on the curve can then be calculated from either of these relationships.

The question that arises is:
What is the uncertainty of a newly calculated Y-value, based on the uncertainties in the points that were used to establish the curve?

Assumptions: for each X and each Y value used to establish the curve there is some error of a known or estimable magnitude, and that error is normally distributed.

For the purposes of this example, assume 1% error in each of the X and Y values listed above.
Thus each listed value of X (X1 to X11) or Y (Y1 to Y11) is a mean value, and the standard deviation of that mean is 0.01X and 0.01Y respectively.

Using the Excel functions NORMINV and RAND, it is possible to generate a large (preferably >5000; rule of thumb) fictive set of values for each X and Y point. So, for example, the set of 10,000 X values corresponding to X1 will have mean 0.533 and standard deviation 0.0053.

The coefficients (a and b) of the exponential relationship, and of the power-law relationship, can then be calculated for the large set of normally-distributed fictive values.

Finally, the resulting large set of exponential and power-law relationships (with varying coefficients a and b) can be evaluated for a given value of X, and the average value of Y determined for each relationship, and the standard deviations of this average Y-value calculated.

For the XY points listed above, a set of 30,000 normally distributed fictive values gives
-    for the exponential relationship at X = 0.600, Y = 1.175, with one-sigma standard deviation  0.017;
-   for the same data, the power-law relationship gives at X = 0.600, Y = 1.167, with one-sigma standard deviation 0.016.

Thus, for this example, the uncertainty in Y at X = 0.600 is slightly lower for the power-law relationship than for the exponential relationship.

This brute-force method is calculation intensive – expansion from 10,000 to a set of 100,000 fictive data for each original XY point involves a 10-fold increase in the size (megabytes) of the spreadsheet.

A spreadsheet with 10,000 fictive data is uploaded along with this post.

The method is essentially equivalent to partial error propagation, and does not take into account any covariance in the uncertainties.  See: Giaramita, M. J., and Day, H. W. (1990) Error propagation in calculations of structural formulas. American Mineralogist 75(1-2), 170-182.

The following website was useful in determining the grammar of the Excel expressions:
https://newtonexcelbach.wordpress.com/2011/01/19/using-linest-for-non-linear-curve-fitting/

AndrewLocock

  • Professor
  • ****
  • Posts: 104
    • University of Alberta Electron Microprobe Laboratory
Re: Multiple backgrounds error calculation (error propagation)
« Reply #3 on: April 16, 2015, 01:02:17 PM »
I have attached a better-annotated Excel spreadsheet, and a Word document that together explain and demonstrate a brute-force (Monte Carlo) method of propagating uncertainties in the cases of exponential or power-law curves. If the starting points (from which such curves are derived) have associated uncertainties, then the coefficients a and b are not constants.

The partial error propagation is examined with specific reference to spectrometer position (sine theta) vs. intensity (cps/nA) relationships.
The resultant propagated uncertainties appear to be non-linearly distributed, and more sensitive to uncertainties in spectrometer position than intensity.
An implication is that the uncertainty of a calculated background position derived from the measurement of multiple points that have similar uncertainties appears to have lower uncertainty than the individual measured points. This is not entirely unexpected: the error propagated into the mean of two values that have equally small independent uncertainties, is smaller than either of the individual uncertainties.