An example of misusing statistics.

A couple of days ago, a fellow user on a different forum challenged me with the following information while disputing whether or not the Earth is still warming:

"For RSS the warming is not significant for over 23 years.
For RSS: +0.127 +/-0.136 C/decade at the two sigma level from 1990
For UAH, the warming is not significant for over 19 years.
For UAH: 0.143 +/- 0.173 C/decade at the two sigma level from 1994
For Hacrut3, the warming is not significant for over 19 years.
For Hadcrut3: 0.098 +/- 0.113 C/decade at the two sigma level from 1994
For Hacrut4, the warming is not significant for over 18 years.
For Hadcrut4: 0.095 +/- 0.111 C/decade at the two sigma level from 1995
For GISS, the warming is not significant for over 17 years.
For GISS: 0.116 +/- 0.122 C/decade at the two sigma level from 1996"

Looks like no global warming, right? After all, a trend can't be statistically significant if the error term is larger than the trend itself, right? Wrong. This is a prime example of lying by misusing statistics and hoping that the other person doesn't know enough to catch the lie. The lie in this case? Those error terms. They're far too large. I'll explain why, using the claim that "GISS shows no significant warming since 1996" as an example.

It's well known that global temperature data show autocorrelation, wherein the temperature one month is correlated with the temperatures of previous months. Most of the statistics we use in science assume that there is no correlation between the noise in successive data points, a condition we call "white" noise. Autocorrelation means that the noise isn't white but red, and that the standard errors and p-values calculated with standard statistics will be far too small, making trends look more certain than they really are. We compensate for autocorrelation by using an ARIMA model of the red noise to correct the standard errors and p-values. For an example of how to use ARIMA in regression models in R, see this post.
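To see why uncorrected standard errors are too small, here is a small Monte Carlo sketch in Python (an illustration only; the analysis in this post was done in R). It generates pure AR(2) red noise with no trend at all, using coefficients close to the GISS fit reported below (Phi1 ≈ 0.44, Phi2 ≈ 0.23), fits an ordinary least squares trend to each series, and compares the textbook white-noise standard error with the actual scatter of the fitted slopes. The series length, noise level, number of runs, and seed are illustrative assumptions, not values taken from any temperature record.

```python
import numpy as np

def simulate_ar2(n, phi1, phi2, sigma, rng, burn=200):
    """AR(2) red noise e_t = phi1*e_{t-1} + phi2*e_{t-2} + a_t, with white-noise shocks a_t."""
    a = rng.normal(0.0, sigma, n + burn)
    e = np.zeros(n + burn)
    for i in range(2, n + burn):
        e[i] = phi1 * e[i - 1] + phi2 * e[i - 2] + a[i]
    return e[burn:]  # discard the burn-in so the start-up zeros don't matter

def ols_slope_and_se(t, y):
    """OLS trend and its textbook standard error, which assumes white noise."""
    tc = t - t.mean()
    X = np.column_stack([np.ones_like(tc), tc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    return beta[1], np.sqrt(s2 / np.sum(tc ** 2))

rng = np.random.default_rng(42)
n = 210                              # about 17.5 years of monthly values
t = 1996 + np.arange(n) / 12.0       # time in years
slopes, naive_ses = [], []
for _ in range(500):
    y = simulate_ar2(n, 0.44, 0.23, 0.1, rng)   # no trend at all, only red noise
    b, se = ols_slope_and_se(t, y)
    slopes.append(b)
    naive_ses.append(se)

print(f"average white-noise standard error: {np.mean(naive_ses):.5f} C/yr")
print(f"actual spread of fitted slopes:     {np.std(slopes):.5f} C/yr")
```

The second number comes out noticeably larger than the first: the white-noise formula understates how much a trend fitted to red noise actually wanders, which is exactly what the ARIMA correction is meant to fix.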

From January 1996 to May 2013, GISS shows second-order red noise, with ARIMA(2,0,0) as the best-fit model. Using generalized least squares (GLS) regression to incorporate that ARIMA model and compensate for the autocorrelation gives us this:

Generalized least squares fit by REML
Model: GISS.96 ~ time(GISS.96)
Data: NULL
      AIC       BIC   logLik
-329.9748 -313.2871 169.9874

Correlation Structure: ARMA(2,0)
Formula: ~1
Parameter estimate(s):
     Phi1      Phi2
0.4412208 0.2299560

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)   -21.047794  8.357168 -2.518532  0.0125
time(GISS.96)   0.010767  0.004169  2.582892  0.0105

Correlation:
              (Intr)
time(GISS.96) -1

Standardized residuals:
        Min          Q1         Med          Q3         Max
-2.62471410 -0.72152714  0.01100717  0.61391719  3.00939636

Residual standard error: 0.1307708
Degrees of freedom: 210 total; 208 residual
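The output above appears to come from R's nlme::gls with an ARMA(2,0) correlation structure, fitted by REML. For readers without R, here is a rough Python analogue under different assumptions: instead of REML it uses iterated feasible GLS (a Cochrane-Orcutt-style procedure) that fits an AR(2) model to the regression residuals, prewhitens both the temperatures and the time column with that filter, and refits by least squares. Its estimates should be close to, but not identical with, what gls reports. The helper names are hypothetical, and the usage example runs on synthetic data, not on GISS.

```python
import numpy as np

def fit_ar(resid, p):
    """Least-squares AR(p) coefficients phi_1..phi_p for a residual series."""
    Y = resid[p:]
    Z = np.column_stack([resid[p - k:len(resid) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return phi

def prewhiten(v, phi):
    """Apply the AR filter v_t - sum_k phi_k * v_{t-k} along axis 0; the first p rows are dropped."""
    p = len(phi)
    out = v[p:].astype(float)
    for k in range(1, p + 1):
        out = out - phi[k - 1] * v[p - k:len(v) - k]
    return out

def gls_ar_trend(t, y, p=2, n_iter=10):
    """Feasible GLS linear trend with AR(p) errors via iterated prewhitening.

    Returns (slope per unit of t, its standard error, fitted AR coefficients).
    """
    tc = t - t.mean()                                   # centre time for numerical stability
    X = np.column_stack([np.ones_like(tc), tc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # start from ordinary least squares
    for _ in range(n_iter):
        phi = fit_ar(y - X @ beta, p)                   # noise model from current residuals
        Xw, yw = prewhiten(X, phi), prewhiten(y, phi)   # transform towards white noise
        beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid_w = yw - Xw @ beta
    s2 = resid_w @ resid_w / (len(yw) - X.shape[1])
    cov = s2 * np.linalg.inv(Xw.T @ Xw)
    return beta[1], np.sqrt(cov[1, 1]), phi

# Usage on synthetic monthly data: a 0.01 C/yr trend plus AR(2) noise (not real GISS data).
rng = np.random.default_rng(0)
n, burn = 210, 200
t = 1996 + np.arange(n) / 12.0
a = rng.normal(0.0, 0.1, n + burn)
e = np.zeros(n + burn)
for i in range(2, n + burn):
    e[i] = 0.44 * e[i - 1] + 0.23 * e[i - 2] + a[i]
y = 0.01 * (t - t[0]) + e[burn:]
slope, se, phi = gls_ar_trend(t, y)
print(f"trend = {10 * slope:.3f} +/- {10 * 1.96 * se:.3f} C/decade, AR coefficients = {np.round(phi, 2)}")
```

The design choice here is simplicity: prewhitening turns the autocorrelated problem into an ordinary regression, so the standard error comes straight from least squares on the transformed data.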

Converting the slope to a per-decade rate and the standard error to a 95% confidence interval (±1.96 standard errors), we get the following trend for GISS since 1996:

GISS: 0.10767 ± 0.08171 °C per decade
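The conversion above is simple arithmetic, sketched here in Python. It assumes, as the per-decade figures imply, that the time variable in the GLS model is measured in years, so multiplying by 10 gives a per-decade rate. The p-value uses a normal approximation, which comes out slightly smaller than the t-distribution p-value (208 degrees of freedom) printed in the gls table. The function name is hypothetical.

```python
import math

def trend_summary(slope_per_year, se_per_year, z=1.96):
    """Per-decade trend, 95% half-width, and a normal-approximation two-sided p-value."""
    trend = 10 * slope_per_year          # years -> decades
    half_width = 10 * z * se_per_year    # 1.96 standard errors ~ 95% interval
    t_stat = slope_per_year / se_per_year
    p_normal = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return trend, half_width, p_normal

trend, half_width, p = trend_summary(0.010767, 0.004169)   # GISS since 1996, from the GLS table
print(f"GISS: {trend:.5f} +/- {half_width:.5f} C per decade")  # GISS: 0.10767 +/- 0.08171 C per decade
print(f"interval excludes zero: {trend - half_width > 0}, normal-approx p = {p:.4f}")
print(f"claimed +/-0.122 is {0.122 / half_width:.2f}x the recomputed half-width")
```

A 95% interval that excludes zero and a two-sided p-value below 0.05 are two views of the same test, which is why the error term, not the size of the trend alone, decides significance.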

Note that the trend is statistically significant, with a p-value of 0.0105. Now, I don't know where my challenger got his "2 sigma" error term of ±0.122, but it's not from any sort of regression analysis. Neither ordinary regression nor autocorrelation-corrected regression gives an error term anywhere near that large. My data are a bit more recent than his, which accounts for the differences in the calculated warming rates, but there's no way that would have inflated the error term by roughly 50% (±0.122 versus ±0.082). So just where did he get it?

It's not just his GISS error term that is problematic, either. UAH also shows ARIMA(2,0,0) autocorrelation since his claimed start point of 1994. Plugging that model into GLS gives this:

Generalized least squares fit by REML
Model: UAH.94 ~ time(UAH.94)
Data: NULL
      AIC       BIC   logLik
-349.2611 -332.0274 179.6306

Correlation Structure: ARMA(2,0)
Formula: ~1
Parameter estimate(s):
     Phi1      Phi2
0.6099461 0.2143155

Coefficients:
                  Value Std.Error   t-value p-value
(Intercept)  -29.943202 13.467363 -2.223390  0.0272
time(UAH.94)   0.015004  0.006721  2.232317  0.0265

Correlation:
             (Intr)
time(UAH.94) -1

Standardized residuals:
        Min          Q1         Med          Q3         Max
-2.33041950 -0.62247393 -0.03907016  0.48623254  3.49613665

Residual standard error: 0.1778559
Degrees of freedom: 234 total; 232 residual

The trend ± 95% confidence interval for UAH since 1994 is 0.15004 ± 0.13173 °C per decade, again well below his claimed error of ±0.173. Oh yes, and it's statistically significant as well (p = 0.0265), despite his claim.

The only one he came close to getting right? HadCRUT4, which shows ARIMA(1,0,1) autocorrelation since 1995. GLS analysis incorporating that ARIMA model shows the following:

Generalized least squares fit by REML
Model: HadCRUT4.95 ~ time(HadCRUT4.95)
Data: NULL
      AIC       BIC   logLik
-395.5914 -378.6461 202.7957

Correlation Structure: ARMA(1,1)
Formula: ~1
Parameter estimate(s):
     Phi1     Theta1
0.8800097 -0.4207543

Coefficients:
                       Value Std.Error   t-value p-value
(Intercept)       -15.566915  10.54182 -1.476682  0.1412
time(HadCRUT4.95)   0.007982   0.00526  1.517508  0.1306

Correlation:
                  (Intr)
time(HadCRUT4.95) -1

Standardized residuals:
         Min           Q1          Med           Q3          Max
-2.241658378 -0.671252575  0.008621077  0.593698960  2.877875246

Residual standard error: 0.1306724
Degrees of freedom: 221 total; 219 residual

The trend ± 95% confidence interval this time is 0.07982 ± 0.10310 °C per decade. This is, quite frankly, the only one of his error terms that my analysis agrees with. HadCRUT4 just doesn't show a statistically significant trend since 1995 (p = 0.1306). However, start from 1994 and the noise becomes ARIMA(2,0,0), as it is for GISS and UAH over their time frames. That trend IS statistically significant (0.12089 ± 0.08283 °C per decade, p = 0.0046).
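The Phi and Theta values printed in the three gls outputs can also be turned into a rough sense of how strongly each noise model widens the error bars. The sketch below computes the autocorrelations implied by each fitted model and the factor 1 + 2Σρ_k, which in large samples approximates the ratio of the true variance of a trend estimate to the naive white-noise variance. It assumes the sign convention x_t = φx_{t-1} + a_t + θa_{t-1} for the ARMA(1,1) case, and for series this short it is only an approximation; the GLS fits above remain the actual analysis.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(2) process (Yule-Walker recursion)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho

def arma11_acf(phi, theta, max_lag):
    """Autocorrelations of x_t = phi*x_{t-1} + a_t + theta*a_{t-1} (sign convention assumed)."""
    rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    rho = [1.0, rho1]
    for _ in range(2, max_lag + 1):
        rho.append(phi * rho[-1])
    return rho

def variance_inflation(rho):
    """Large-sample factor 1 + 2*sum(rho_k) by which red noise inflates a trend's variance."""
    return 1.0 + 2.0 * sum(rho[1:])

# Noise parameters copied from the three gls outputs in this post.
models = {
    "GISS since 1996, AR(2)": ar2_acf(0.4412208, 0.2299560, 500),
    "UAH since 1994, AR(2)": ar2_acf(0.6099461, 0.2143155, 500),
    "HadCRUT4 since 1995, ARMA(1,1)": arma11_acf(0.8800097, -0.4207543, 500),
}
for name, rho in models.items():
    v = variance_inflation(rho)
    print(f"{name}: lag-1 r = {rho[1]:.2f}, variance x{v:.1f}, standard error x{v ** 0.5:.2f}")
```

With these parameters the GISS model gives a noticeably smaller factor than the UAH and HadCRUT4 models, whose noise decays more slowly from month to month.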

I'm leaving HadCRUT3 and RSS out for now, and there are good reasons for doing so. HadCRUT3 has been replaced by HadCRUT4, making an analysis of both redundant. RSS shows false cooling since 2000 due to orbital decay, as Roy Spencer pointed out two years ago, making it unreliable. Still, what I have done should give the general idea: his error terms are generally larger than they should be.

I don't know exactly where he's getting his error terms (he has so far refused to divulge that information), but I suspect that he's (mis)using the trend calculator at Skeptical Science. Its Advanced Options include a setting to correct for autocorrelation, with the autocorrelation period defaulting to a start date of 1980 and an end date of 2010. I suspect that he's calculating trends for different time periods while leaving the autocorrelation period at that default. That's invalid because, just as the trend changes with the time period, so does the best-fit autocorrelation model. You cannot calculate the autocorrelation in GISS for 1980–2010 and mindlessly apply it to the trend for 1996–2013. If I'm correct about the source of his statistics, then what this really shows is the danger of using online tools when you don't understand how the underlying statistics work. And my challenger doesn't understand: he didn't even know what an ARIMA model was.
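The point about matching the noise model to the time period can be sketched as well. The function below is a deliberate simplification and not a reconstruction of the Skeptical Science calculator: it fits an ordinary least squares trend to one window and inflates the standard error with an AR(1) correction, sqrt((1 + r)/(1 - r)), where r is the lag-1 autocorrelation of that same window's residuals. Passing in an r estimated from some other period, such as 1980 to 2010, is exactly the mismatch described above. The function name and the series in the usage example are hypothetical.

```python
import numpy as np

def trend_with_ar1_correction(t, y, rho=None):
    """OLS trend for one window, with its standard error inflated for AR(1) noise.

    If rho is None, the lag-1 autocorrelation is estimated from this window's own
    residuals, so the noise model and the trend come from the same period.
    """
    tc = t - t.mean()
    X = np.column_stack([np.ones_like(tc), tc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    if rho is None:
        rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    s2 = resid @ resid / (len(y) - 2)
    se_white = np.sqrt(s2 / np.sum(tc ** 2))            # textbook OLS slope error
    se_red = se_white * np.sqrt((1 + rho) / (1 - rho))  # AR(1) large-sample inflation
    return beta[1], se_red, rho

# Hypothetical monthly series from 1980: a trend plus AR(1) noise.
rng = np.random.default_rng(1)
t = 1980 + np.arange(402) / 12.0
noise = np.zeros(t.size)
for i in range(1, t.size):
    noise[i] = 0.6 * noise[i - 1] + rng.normal(0.0, 0.1)
y = 0.015 * (t - 1980) + noise

win = t >= 1996                          # trend window
slope, se, rho = trend_with_ar1_correction(t[win], y[win])
print(f"since 1996: {10 * slope:.3f} +/- {10 * 1.96 * se:.3f} C/decade, r = {rho:.2f} (same window)")
```

Calling the same function with `rho` taken from a different window would still run, which is the trap: nothing in the arithmetic warns you that the noise model no longer describes the period being tested.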

Seeing the environmental forest