### An example of misusing statistics.

A couple days ago, a fellow user on a different forum challenged me with the following information while disputing whether or not the Earth is still warming:

"For RSS the warming is not significant for over 23 years.

For RSS: +0.127 +/- 0.136 C/decade at the two sigma level from 1990

For UAH, the warming is not significant for over 19 years.

For UAH: 0.143 +/- 0.173 C/decade at the two sigma level from 1994

For Hadcrut3, the warming is not significant for over 19 years.

For Hadcrut3: 0.098 +/- 0.113 C/decade at the two sigma level from 1994

For Hadcrut4, the warming is not significant for over 18 years.

For Hadcrut4: 0.095 +/- 0.111 C/decade at the two sigma level from 1995

For GISS, the warming is not significant for over 17 years.

For GISS: 0.116 +/- 0.122 C/decade at the two sigma level from 1996"

Looks like no global warming, right? I mean, it can't be statistically significant if the error terms are larger than the average rate of rise, right? Wrong. This is a prime example of lying by misusing statistics and hoping that the other person doesn't know enough to catch the lie. The lie in this case? Those error terms. They're far too large. I'll explain why, using "GISS shows no significant warming since 1996" as an example.
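The logic behind each "not significant" verdict is worth making explicit: at the two-sigma level, a trend is statistically indistinguishable from zero exactly when the quoted half-width exceeds the trend's magnitude. A quick Python sketch applying that rule to the numbers exactly as he quoted them (the half-widths are his claims, which the rest of this post disputes):

```python
# The challenger's quoted numbers: (trend, 2-sigma half-width) in C/decade.
claims = {
    "RSS":      (0.127, 0.136),
    "UAH":      (0.143, 0.173),
    "Hadcrut3": (0.098, 0.113),
    "Hadcrut4": (0.095, 0.111),
    "GISS":     (0.116, 0.122),
}

for name, (trend, half_width) in claims.items():
    # Zero lies inside the confidence interval exactly when the
    # half-width exceeds the trend's magnitude: "not significant".
    verdict = "significant" if abs(trend) > half_width else "not significant"
    print(f"{name}: {verdict}")
```

Taken at face value, every dataset fails the test, which is precisely the impression those quoted error terms create. The question is whether the error terms themselves are right.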

It's well known that global temperature data show autocorrelation, wherein the temperature one month is correlated with the temperatures of previous months. Most of the statistics we use in science assume that there is no correlation between data points, a condition we call "white" noise. Autocorrelation means that the noise isn't white but red noise and that the standard errors and p-values calculated with standard statistics will be far too small. We compensate for autocorrelation using ARIMA to model the red noise and correct the standard errors and p-values. For an example of how to use ARIMA in regression models in R, see this post.
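To see why the white-noise assumption matters so much, here is a minimal Monte Carlo sketch (in Python, rather than the R used for the fits below; the AR(1) coefficient 0.6 is purely illustrative, not fitted to any temperature dataset). We generate many red-noise series with no trend at all, fit an ordinary least-squares slope to each, and compare the actual scatter of those fitted slopes with the standard error that white-noise OLS reports:

```python
import random
import statistics

random.seed(42)
n = 200     # "months" of data per simulated series
phi = 0.6   # illustrative AR(1) coefficient for the red noise
x = list(range(n))
xbar = (n - 1) / 2
sxx = sum((xi - xbar) ** 2 for xi in x)

def ar1_series():
    """Zero-trend AR(1) ('red') noise, started from its stationary distribution."""
    e = random.gauss(0, 1 / (1 - phi ** 2) ** 0.5)
    out = []
    for _ in range(n):
        out.append(e)
        e = phi * e + random.gauss(0, 1)
    return out

slopes, reported_ses = [], []
for _ in range(2000):
    y = ar1_series()
    ybar = statistics.fmean(y)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    # The classical OLS standard error, which assumes white noise:
    rss = sum((yi - ybar - b * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    slopes.append(b)
    reported_ses.append((rss / (n - 2) / sxx) ** 0.5)

actual = statistics.stdev(slopes)          # real sampling spread of the slope
reported = statistics.fmean(reported_ses)  # what white-noise OLS claims
print(f"OLS understates the slope's uncertainty by a factor of {actual / reported:.1f}")
```

With this much autocorrelation the white-noise standard error is roughly half the true one, which is exactly why the red noise has to be modeled (here, via ARIMA) before quoting standard errors or p-values.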

From January 1996 to May 2013, GISS shows second-order red noise, with the best-fit ARIMA(2,0,0). Using generalized least square (GLS) regression to incorporate the ARIMA into the regression and compensate for the autocorrelation gives us this:

Generalized least squares fit by REML
  Model: GISS.96 ~ time(GISS.96)
  Data: NULL
        AIC       BIC   logLik
  -329.9748 -313.2871 169.9874

Correlation Structure: ARMA(2,0)
 Formula: ~1
 Parameter estimate(s):
      Phi1      Phi2
 0.4412208 0.2299560

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)   -21.047794  8.357168 -2.518532  0.0125
time(GISS.96)   0.010767  0.004169  2.582892  0.0105

 Correlation:
              (Intr)
time(GISS.96) -1

Standardized residuals:
        Min          Q1         Med          Q3         Max
-2.62471410 -0.72152714  0.01100717  0.61391719  3.00939636

Residual standard error: 0.1307708
Degrees of freedom: 210 total; 208 residual

Converting the standard error to a 95% confidence interval, we get the following trend ± 95% confidence interval for GISS since 1996:

GISS: 0.10767 ± 0.08171ºC per decade

Note that the trend is statistically significant, with a p-value of 0.0105. Now, I don't know where my challenger got his "2 sigma" error term of ±0.122, but it's not from any sort of regression analysis: neither ordinary nor autocorrelation-corrected regression shows an error term anywhere near that large. My data is a bit more recent than his, which accounts for the differences in the calculated warming rates, but there's no way that would have nearly doubled the error term. So just where did he get it?
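The conversion from GLS output to that confidence interval is simple arithmetic: scale the slope's standard error by the normal 95% quantile (≈1.96), then convert from per-year to per-decade units. A quick sketch (Python here, though the fit itself was done in R):

```python
def trend_ci_per_decade(slope_per_year, se_per_year, z=1.96):
    """Turn a GLS slope and its standard error (C/year) into
    trend and 95% half-width in C/decade (normal approximation)."""
    return 10 * slope_per_year, 10 * z * se_per_year

# Slope and standard error from the GLS output for GISS, Jan 1996 - May 2013:
trend, half_width = trend_ci_per_decade(0.010767, 0.004169)
print(f"GISS: {trend:.5f} ± {half_width:.5f} C per decade")  # 0.10767 ± 0.08171
```

The same arithmetic reproduces the other intervals in this post; for example, `trend_ci_per_decade(0.015004, 0.006721)` gives the UAH figure of 0.15004 ± 0.13173.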

It's not just his GISS error term that is problematic, either. UAH also shows autocorrelation of ARIMA(2,0,0) since his claimed start point of 1994. Plugging that into GLS shows this:

Generalized least squares fit by REML
  Model: UAH.94 ~ time(UAH.94)
  Data: NULL
        AIC       BIC   logLik
  -349.2611 -332.0274 179.6306

Correlation Structure: ARMA(2,0)
 Formula: ~1
 Parameter estimate(s):
      Phi1      Phi2
 0.6099461 0.2143155

Coefficients:
                  Value Std.Error   t-value p-value
(Intercept)  -29.943202 13.467363 -2.223390  0.0272
time(UAH.94)   0.015004  0.006721  2.232317  0.0265

 Correlation:
             (Intr)
time(UAH.94) -1

Standardized residuals:
        Min          Q1         Med          Q3         Max
-2.33041950 -0.62247393 -0.03907016  0.48623254  3.49613665

Residual standard error: 0.1778559
Degrees of freedom: 234 total; 232 residual

The trend ± 95% confidence interval for UAH since 1994 is 0.15004 ± 0.13173ºC per decade, again far below his claimed error of ±0.173. Oh yes, and it's statistically significant as well, despite his claim.

The only one he came close to getting right? HadCRUT4 shows autocorrelation of ARIMA(1,0,1) since 1995. GLS analysis incorporating that ARIMA model shows the following:

Generalized least squares fit by REML
  Model: HadCRUT4.95 ~ time(HadCRUT4.95)
  Data: NULL
        AIC       BIC   logLik
  -395.5914 -378.6461 202.7957

Correlation Structure: ARMA(1,1)
 Formula: ~1
 Parameter estimate(s):
      Phi1     Theta1
 0.8800097 -0.4207543

Coefficients:
                      Value Std.Error   t-value p-value
(Intercept)      -15.566915  10.54182 -1.476682  0.1412
time(HadCRUT4.95)  0.007982   0.00526  1.517508  0.1306

 Correlation:
                  (Intr)
time(HadCRUT4.95) -1

Standardized residuals:
         Min           Q1          Med           Q3          Max
-2.241658378 -0.671252575  0.008621077  0.593698960  2.877875246

Residual standard error: 0.1306724
Degrees of freedom: 221 total; 219 residual

The trend ± 95% confidence interval this time is 0.07982 ± 0.10310ºC per decade. This is, quite frankly, the only error term he states where my analysis agrees with his. HadCRUT4 just doesn't show a statistically significant trend since 1995. However, start from 1994 and the nature of the noise becomes ARIMA(2,0,0), as GISS and UAH show over their time frames. That trend IS statistically significant (0.12089 ± 0.08283ºC per decade, p = 0.0046).

I'm leaving HadCRUT3 and RSS out, for good reasons. HadCRUT3 has been superseded by HadCRUT4, so analyzing both would be redundant. RSS shows spurious cooling since 2000 due to orbital decay, as Roy Spencer pointed out two years ago, making it unreliable. Still, what I have done should give the general idea: his error terms are generally larger than they should be.

I don't know exactly where he's getting his error terms (he's so far refused to divulge that information), but I suspect he's (mis)using the trend calculator at Skeptical Science. Under its Advanced Options there's an option to correct for autocorrelation, with a default start date of 1980 and a default end date of 2010. I suspect he's calculating the trend for different time periods while leaving the autocorrelation correction at its default period. That's invalid because, just as the trend changes with the time period, the best-fit autocorrelation model changes with the time period. You cannot calculate the autocorrelation in GISS for 1980–2010 and mindlessly apply it to the trend for 1996–2013. If I'm right about the source of his statistics, then what this really shows is the danger of using online tools when you don't understand how the underlying statistics work. And my challenger doesn't understand; he didn't even know what an ARIMA model was.
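The period-dependence is easy to demonstrate even in the simplest case, because the noise model's parameters are themselves estimated from the residuals of whatever window you chose. A minimal Python sketch (simulated AR(1) noise with an illustrative coefficient; `lag1_autocorr` is the Yule-Walker estimate of the AR(1) parameter):

```python
import random

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation: the Yule-Walker estimate
    of an AR(1) coefficient fitted to this window of data."""
    m = sum(series) / len(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den

random.seed(7)
phi = 0.5   # illustrative "true" AR(1) coefficient
y, e = [], 0.0
for _ in range(400):
    e = phi * e + random.gauss(0, 1)
    y.append(e)

# Same series, two different estimation windows, two different noise models:
print(round(lag1_autocorr(y), 3))         # full 400-point window
print(round(lag1_autocorr(y[-150:]), 3))  # most recent 150 points only
```

Both numbers estimate the same underlying coefficient, yet they will generally disagree, and with real temperature data, where the true noise structure can itself differ between periods, the mismatch is worse. Hence the rule: refit the noise model over the exact period whose trend you are testing.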

Article edited on Aug. 23, 2013 to reflect new information.
