The large structural uncertainties in observations hamper our ability to determine how well models simulate the tropospheric temperature changes that actually occurred over the satellite era.
There were two strands to our critique: (i) that the statistical test they used was not appropriate, and (ii) that they did not acknowledge the true structural uncertainty in the observations.
This is of course the same Douglass et al. paper that used completely incoherent statistics and deliberately failed to note the structural uncertainty in the observations.
Not exact matches
There are basically two key points (explored in more depth here): comparisons should be "like with like", and the different sources of uncertainty should be made clear, whether they relate to "weather" (internal variability) and/or to structural uncertainty in either the observations or the models.
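As a toy illustration of keeping those sources separate, here is a minimal Python sketch (all numbers invented) that decomposes the spread of a hypothetical multi-model ensemble into a within-model "weather" component and a between-model structural component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: 10 models x 5 realizations each. Each realization's
# trend is the model's forced trend plus "weather" (internal variability).
n_models, n_runs = 10, 5
forced = rng.normal(0.20, 0.04, size=n_models)             # structural spread
weather = rng.normal(0.0, 0.03, size=(n_models, n_runs))   # internal variability
trends = forced[:, None] + weather

# Separate the two sources of spread named in the text:
within = trends.var(axis=1, ddof=1).mean()    # "weather", within each model
between = trends.mean(axis=1).var(ddof=1)     # structural, across models
print(f"within-model ('weather') variance:    {within:.5f}")
print(f"between-model (structural) variance:  {between:.5f}")
# (the between-model estimate still contains a small within/n_runs term;
#  a full ANOVA would correct for that)
```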
Because the differences between the various observational estimates are largely systematic and structural (Chapter 2; Mears et al., 2011), the uncertainty in the observed trends cannot be reduced by averaging the observations as if the differences between the datasets were purely random.
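To see why averaging does not help here, consider this minimal numpy sketch (all trend values and biases are hypothetical): averaging shrinks purely random errors by roughly 1/sqrt(N), but systematic offsets built into each dataset survive the average.

```python
import numpy as np

rng = np.random.default_rng(0)

true_trend = 0.15      # hypothetical "true" trend, K/decade
n_datasets = 4

# Purely random errors: averaging helps (spread shrinks ~ 1/sqrt(N)).
random_errs = rng.normal(0.0, 0.05, size=(10000, n_datasets))
print("random-error spread, single dataset:", random_errs[:, 0].std())
print("random-error spread, 4-dataset mean:", random_errs.mean(axis=1).std())

# Structural errors: each dataset's bias is fixed by its construction
# choices, so averaging does NOT remove the systematic part.
structural_biases = np.array([0.04, 0.03, -0.01, 0.05])   # hypothetical
trends = true_trend + structural_biases
print("dataset trends:", trends)
print("mean of datasets:", trends.mean(),
      "-> still biased by", trends.mean() - true_trend)
```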
Let's take this to an extreme. Suppose internal variability is zero, so the "within group" standard deviation is zero, and suppose the models agree well with each other and the observations fall within the tight band of model projections. By Steve's method you would then average the models, call that average "a model" with a standard deviation of zero, show that it falls outside the observational standard deviation, proclaim that the model fails, claim that this is a test of modelling, and hence extrapolate that all models fail, even though the observations fall slap bang in the model range. The result is nonsensical; per TCO, it isn't how models are used. And where is the structural uncertainty?
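Here is a rough numerical sketch of that reductio (all numbers made up): twenty tightly clustered model trends and an observation inside their range. A Douglass-style test against the standard error of the multi-model mean rejects the observation anyway, because that standard error shrinks toward zero as the models agree more closely or as models are added.

```python
import numpy as np

# Hypothetical numbers: 20 models with tightly clustered trends (K/decade)
# and an observed trend that sits comfortably inside the model range.
model_trends = np.linspace(0.18, 0.22, 20)
obs_trend = 0.215

mean = model_trends.mean()
se = model_trends.std(ddof=1) / np.sqrt(len(model_trends))

# Douglass-style test: is the observation within ~2 standard errors of the
# multi-model mean? A tight ensemble "rejects" almost any observation.
print("within 2 s.e. of model mean:",
      abs(obs_trend - mean) <= 2 * se)                       # False

# The sensible question is whether the observation falls inside the
# spread of the individual models (it does).
print("inside model range:",
      model_trends.min() <= obs_trend <= model_trends.max()) # True
```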
The two obvious contributors to the uncertainty are the structural biases in the proxies and the sampling error from estimating GAST (global average surface temperature) from only 5-61 SST observations.
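To get a feel for the sampling-error component alone, here is a rough Monte Carlo sketch (the anomaly field, its amplitudes, and the grid are all invented, and area weighting is ignored): subsample a synthetic field at n points and watch the spread of the resulting "global mean" estimates shrink as n grows from 5 toward 61.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in for an SST anomaly map: a smooth large-scale pattern
# plus local noise on a 36x72 grid.
lat = np.linspace(-80, 80, 36)
lon = np.linspace(0, 355, 72)
LON, LAT = np.meshgrid(lon, lat)
field = (0.5 * np.sin(np.radians(LAT)) * np.cos(np.radians(LON))
         + rng.normal(0.0, 0.3, size=LAT.shape))
print("true field mean:", round(field.mean(), 4))

# Monte Carlo: estimate the global mean from n randomly placed observations
# and measure the spread of those estimates over 5000 trials.
flat = field.ravel()
for n in (5, 20, 61):
    idx = rng.choice(flat.size, size=(5000, n))
    estimates = flat[idx].mean(axis=1)
    print(f"n={n:3d}: sampling s.d. = {estimates.std():.3f}")
# Note: the proxies' structural biases would sit on top of this sampling
# error and would not shrink with n.
```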