## Shooting into the dark

Part of what makes uncertainty such a slippery subject is that it conflates several concepts that are better kept apart: **precision**, **accuracy**, and **repeatability**. People often mention the first two, less often the third.

It's clear that precision and accuracy are different things. If someone's shooting at you, for instance, it's better that they are inaccurate but precise, so that every bullet whizzes exactly 1 metre over your head. But, though the idea of one-off repeatability is built into the concept of multiple 'readings', scientists often repeat whole experiments, and this wholesale repeatability also needs to be captured. Hence the third drawing.
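To make the three ideas concrete, here is a minimal sketch in Python that simulates readings along a single axis. Everything in it is an assumption made for illustration: the function names `run_experiment` and `summarize`, the Gaussian noise model, and all of the numbers. It treats accuracy as the offset of the overall mean from the true value, precision as the scatter of readings within an experiment, and repeatability as the scatter of the means from one experiment to the next.

```python
import random
import statistics


def run_experiment(true_value, bias, noise_sd, n_readings, rng):
    """Simulate one experiment: repeated readings along a single axis."""
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n_readings)]


def summarize(experiments, true_value):
    """Summarize a list of experiments (each one a list of readings)."""
    means = [statistics.mean(readings) for readings in experiments]
    spreads = [statistics.stdev(readings) for readings in experiments]
    return {
        # Accuracy: offset of the overall mean from the true value.
        "accuracy_error": statistics.mean(means) - true_value,
        # Precision: average scatter of readings within one experiment.
        "precision_sd": statistics.mean(spreads),
        # Repeatability: scatter of the experiment means.
        "repeatability_sd": statistics.stdev(means),
    }


rng = random.Random(42)   # fixed seed so the run is reproducible
TRUE_VALUE = 10.0         # hypothetical; known only because this is a simulation

experiments = []
for _ in range(5):
    # Hypothetical experiment-to-experiment drift on top of a constant bias.
    drift = rng.gauss(0.0, 0.5)
    experiments.append(run_experiment(TRUE_VALUE, 1.0 + drift, 0.1, 20, rng))

for name, value in summarize(experiments, TRUE_VALUE).items():
    print(f"{name}: {value:.3f}")
```

The simulation can report `accuracy_error` only because it knows `TRUE_VALUE`, a luxury that real measurements do not offer.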

One of the things I really like in Peter Copeland's book *Communicating Rocks* is the accuracy-precision-repeatability figure (here's my review). He captures this concept very nicely, and gives a good description too. There are two weaknesses though, I think, in these classic target figures. First, they portray two dimensions (spatial, in this case), when really each measurement we make is on a single axis. So I tried re-drawing the figure, but on one axis:

The second thing that bothers me is that there is an implied 'correct answer'—the middle of the target. This seems reasonable: we are trying to measure some external reality, after all. The problem is that when we make our measurements, we do not know where the middle of the target is. We are blind.

If we don't know where the bullseye is, we cannot tell the difference between precise and imprecise. But if we don't know the size of the bullseye, we also do not know how accurate we are, or how repeatable our experiments are. Both of these things are entirely relative to the nature of the target.

What can we do? Sound statistical methods can help us, but most of us don't know what we're doing with statistics (be honest). Do we just need more data? No. More expensive analysis equipment? No.

**No, none of this will help. You cannot beat uncertainty. You just have to deal with it.**

*This is based on an article of mine in the February issue of the CSEG Recorder. Rather woolly, even for me, it's the beginning of a thought experiment about doing a better job dealing with uncertainty. See Hall, M (2012). Do you know what you think you know? CSEG Recorder, February 2012. Online in May. Figures are here.*

## Reader Comments (5)

Excellent article again!

As you highlight, the key with the subsurface is that we can never know the true answer. Our data is always a sample of a larger population (often much, much larger).

Thinking back over all the classes and courses that have covered statistics for geoscientists, they normally just touch on the basics: mean, mode, etc.

Perhaps, like the social sciences, we geoscientists need to improve our background in statistics, with a focus on the limitations of datasets and what conclusions we can and cannot draw. The key difference is between statistics for a known population and statistics for a sample of a population of unknown size.

@Adam: great point. You and others might like this article (perfect for anyone confused about p-values!).

Great suggestion. Left me even more reassuringly confused about statistics in general.

However, I can weakly try to draw some points from it. Firstly, I have never seen any tests of statistical significance applied to subsurface data sets (I would love to see some, if you have any ideas).

A hypothesis would be that a lot of the data we are confident in would fail tests of significance... I'm not confident I could defend that properly, though!

However, as stated in the article, the problem is that there are uncertainties in the p-value. Even after you have finished an analysis, you would still have an unknown sample space, as you stated above.

Again, a common theme is that a lot of these techniques rely on an estimate of the population size and standard deviation. "Frequency probability" is the type in common use at present: http://en.wikipedia.org/wiki/Frequentism. Pete Rose's work and methods are based on a field size distribution; this would suggest there is a known background distribution for all basins, which should always remain an unknown. Nearly all companies apply these methods in their risking methodologies. So I think I can infer that the Pete Rose method is frequentist.

So the bit that makes me scratch my head is why we are applying Bayesian statistics to direct fluid analysis, and frequency probability to chance of success. Is the better solution, as outlined in the article you suggested, that we should be moving towards a predictive method (http://en.wikipedia.org/wiki/Predictive_analytics#Statistical_techniques)?

You were getting to that in the "RELIABLE PREDICTIONS OF UNLIKELY GEOLOGY" article. The maths and options look very off-putting indeed...

So I'm still confused, but it seems statisticians can't make up their minds either. I'll just stick to being happily not very confident about anything.

Came across these resources, which may be helpful for those who want to skip the maths and look at demonstrations and graphs.

Wolfram MathWorld — Bayes' Theorem. They have easy-to-run examples that need just one download to run in a web browser. For example: BayessTheoremAndInverseProbability and ComparingAmbiguousInferencesWhenProbabilitiesAreImprecise.

Additionally, Fourier Transforms has many examples for signal processing and many other areas.

The above system is quicker than using Sage or R, but if anyone has the urge.

Statistical Significance, R based input. Found using Dan Goldstein's search for R.

@Adam: Thanks for all the thoughtful input. I hope you don't mind but I turned some of your URLs into links to make them easier to explore. I love the Wolfram widgets.

I feel like there are rigorous ideas and methods, and then there are practical ones we can use. I'm not confident about any of it either. But I do feel drawn towards Bayesian methods, as opposed to frequentist ones, because I think we can often ascribe prior likelihoods in what we do. In fact, I think we kind of do that anyway, even when we use frequentist methods; it's just that we don't articulate them — or even admit to them. We just use them as background constraints on our risk 'analysis' (usually not very analytic, in my experience).
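For anyone who wants to see what ascribing a prior actually involves, here is a minimal sketch of a single Bayesian update in Python. The function name `bayes_update` and all of the probabilities are hypothetical, chosen only to show the arithmetic; they are not taken from any real prospect or from any published risking method.

```python
def bayes_update(prior, p_evidence_given_true, p_evidence_given_false):
    """Return the posterior probability of a hypothesis, given some evidence.

    prior: probability of the hypothesis before seeing the evidence.
    p_evidence_given_true: chance of seeing the evidence if the hypothesis holds.
    p_evidence_given_false: chance of seeing the evidence if it does not.
    """
    numerator = p_evidence_given_true * prior
    evidence = numerator + p_evidence_given_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: a 20% prior chance of success, and an indicator that
# shows up 80% of the time when the hypothesis is true but also 30% of the
# time when it is false.
posterior = bayes_update(prior=0.2, p_evidence_given_true=0.8,
                         p_evidence_given_false=0.3)
print(f"posterior: {posterior:.2f}")
```

The point of writing it down is that the prior becomes an explicit, arguable number rather than an unspoken background constraint.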