## Are you a poet or a mathematician?

Many geologists can sometimes be rather prone to a little woolliness in their language. Perhaps because you cannot prove anything in geology (prove me wrong), or because everything we do is doused in interpretation, opinion and even bias, we like to beat about the bush. A lot.

Sometimes this doesn't matter much. We're just sparing our future self from a guilty binge of word-eating, and everyone understands what we mean—no harm done. But there are occasions when a measure of unambiguous precision is called for. When we might want to be careful about the technical meanings of words like *approximately*, *significant*, and *certain*.

Sherman Kent was a CIA analyst in the Cold War, and he tasked himself with bringing quantitative rigour to the language of intelligence reports. He struggled (and eventually failed), meeting what he called *aesthetic* opposition:

"What slowed me up in the first instance was the firm and reasoned resistance of some of my colleagues. Quite figuratively I am going to call them the *poets* – as opposed to the *mathematicians* – in my circle of associates, and if the term conveys a modicum of disapprobation on my part, that is what I want it to do. Their attitude toward the problem of communication seems to be fundamentally defeatist. They appear to believe the most a writer can achieve when working in a speculative area of human affairs is communication in only the broadest general sense. If he gets the wrong message across or no message at all – well, that is life."

-Sherman Kent, Words of Estimative Probability, CIA Studies in Intelligence, Fall 1964

Kent proposed using some specific words to convey specific levels of certainty (right). We have used these words in our mobile app Risk*. The only modification I made was setting *P* = 0.99 for *Certain*, and *P* = 0.01 for *Impossible* (see my remark about proving things in geology).
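As a minimal sketch of how such a scheme can be encoded, the lookup below uses the central values usually quoted from Kent's 1964 paper, with the two end members moved to 0.99 and 0.01 as described above. The names `KENT_WEP` and `nearest_wep` are illustrative assumptions and are not taken from the Risk* app.

```python
# Central probabilities commonly quoted from Kent (1964). The end members
# are set to 0.99 and 0.01 as in this post, rather than 1.0 and 0.0.
KENT_WEP = {
    "Certain": 0.99,
    "Almost certain": 0.93,
    "Probable": 0.75,
    "Chances about even": 0.50,
    "Probably not": 0.30,
    "Almost certainly not": 0.07,
    "Impossible": 0.01,
}


def nearest_wep(p):
    """Return the phrase whose central probability is closest to p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return min(KENT_WEP, key=lambda phrase: abs(KENT_WEP[phrase] - p))


print(nearest_wep(0.8))
```

Going from a number to a phrase like this loses information, so a lookup of this kind is probably most useful in the other direction: agreeing on what number a phrase stands for.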

There are other schemes. Most petroleum geologists know Peter Rose's work. A common language, with some quantitative meaning, can dull the pain of prospect risking sessions. Almost certainly. Probably.

**Do you use systematic descriptions of uncertainty? Do you think they help? How can we balance our poetic side of geology with the mathematical?**

As I was adding to the SubSurfWiki page for WEPs, I came across a wonderful article by Bernie O'Brien (1989), a physician. He interviewed 56 doctors about twenty-three words and phrases, asking them to place them on a probability scale. He then used the interquartile range of the responses as an indication of ambiguity. Here's how his data plot:

This is very interesting: words used to express equivocality and fence-sitting are themselves ambiguous and uncertain. That makes intuitive sense, but it's a fascinating insight into the language of uncertainty.

O'Brien also compared the interquartile range to a 3-point rating of ambiguity, as given by the respondents. You can read more about it on the wiki page.
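To make the interquartile-range idea concrete, here is a small sketch using Python's standard library. The responses are invented for illustration and are not O'Brien's data; the function name `interquartile_range` is also just an assumption for this example.

```python
import statistics


def interquartile_range(values):
    """Spread between the 25th and 75th percentiles of the responses."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical responses (percent), invented for illustration only.
responses = {
    "always": [95, 98, 99, 100, 100, 100],
    "possible": [10, 20, 30, 40, 50, 60],
}

for phrase, values in responses.items():
    print(phrase, interquartile_range(values))
```

A phrase that respondents place all over the scale gets a wide interquartile range, which is the sense in which the range serves as a measure of ambiguity.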

**Reference**

O'Brien, B (1989), Words or numbers? The evaluation of probability expressions in general practice. Journal of the Royal College of General Practitioners 39, p 98–100, March 1989.

## Reader Comments (13)

"Unwary readers should take warning that ordinary language undergoes modification to a high-pressure form when applied to the interior of the Earth. A few examples of equivalents follow:"

| High Pressure Form | Ordinary Meaning |
|---|---|
| Certain | Dubious |
| Undoubtedly | Perhaps |
| Positive proof | Vague suggestion |
| Unanswerable argument | Trivial objection |
| Pure iron | Uncertain mixture of all the elements |

-Francis Birch

@Toastar: Ha! That is brilliant, thank you!

A similar exercise was done by the IPCC to translate their probabilities related to climate predictions into 'everyday' phrases of certainty/uncertainty.

Two examples I can think of that do BOTH 'poetry' and 'mathematics' very well are Philip Allen and Chris Paola. They both write wonderfully, with evocative descriptions and explanations of quantitative concepts for which they also lay out the equations.

I love that second figure. The problem, I think, when communicating certainty and uncertainty with a wider audience is that most people's thinking on probabilities seems to be rather binary in nature, and language tends to be forced into bimodal bins: 'probable' is interpreted as 'very likely' or 'almost certain' and 'probably not' translates as 'not at all likely'. So the fact that something described as probable might not happen 1 time in 3, or that something described as not probable will happen 1 time in 3, is not appreciated at the time, and is used as evidence of something fishy going on later.

Then, of course, there's the particle physicists, who only accept existence past the 99.9% percentile or something.

@Brian: Great tip— thank you. I added a section on the IPCC's WEPs to the wiki page. Thanks for that. The document I reference is full of interesting stuff about reporting on uncertain models.

And I love the idea that we can be poets *and* mathematicians. My new goal in life!

@Chris: I think that's a really important point: the flip side of uncertainty is the certainty of the occasional anti-outcome. A 40% chance of showers means it will probably be dry. A 90% pass rate means one in ten fail. Perhaps this is intuitive for some people, maybe most scientists, but if the anti-outcome is especially bad (or good), then it's worth knowing about. I guess this is the basis for buying a lottery ticket!

There is one slippery class of event though: the one-off. Things like drilling a single oil well, or predicting the outcome of a single football game, are conceptually tricky for me. There will not be a succession of trials, converging on some predicted frequency like dice rolls. The thing will happen once, and the probability will collapse into a Schrödinger-esque finality. The prediction is good or bad, the probability irrelevant in retrospect.

I'll go along with your dichotomy of people into poets and mathematicians,

notwithstanding people who are competent at both, for they typically know when to use which.

I see the problem that you start with a fuzzy estimate and fuzz it even worse by trying to express it in "everyday language".

DON'T DO THAT.

If people cannot be bothered to understand a numerical probability with error term (P = 0.83 +- 0.1)

they don't deserve any information.

Poetry is not the tool for transmitting quantitative information.

I "might" understand what I am "trying" to say, but the "chances" that your neurons will reconstruct a "similar" or even an "overlapping" mental map of my concept are "practically nil" or at least "unpredictably unreliable".

The chart of words expressing uncertainty should adequately convince us that words alone will not succeed in communicating the information. Giving someone a warm-fuzzy feeling is not the same as expressing a fuzzy measurement.

Use math to express math.

SI units are great for measurement. Non-standard units can work in context, too.

I can deal with a 2x4 not having a 1:2 aspect ratio.

I can deal with needing a 2x4 at least 10 feet long.

I stop short of "a long board"... it takes poetic license to imagine scenarios where that is adequate info...

"Timmy fell through the ice. Lassy, run and get a long board. "

(and call 911 while you are at it)

@Rik: Thanks for reading, and for the image of Lassie with a longboard in her mouth.

I can see how I gave the impression I was proposing casting probabilities as WEPs, especially with the tables arranged as they are. But no, I am with you: I would rather cast WEPs as probabilities. I think there is a role for Kent et al's work in helping us do this more consistently, within an organization, say. Often, it matters more that we are *consistent* in our treatments than that we are *accurate*. Indeed, we can't have accuracy without consistency.

The notion of fuzzy probabilities is one I have never played with. At least in petroleum geoscience, we tend to draw the uncertainty into a key parameter (the expected volume of gas in a trap, say, or the return on the investment) and use a single probability. The probability therefore represents the chance of getting onto the distribution. There is then a separate probability distribution function, usually a log-normal one, to describe the parameter. I suppose it amounts to the same thing, but I find this more intuitive than the idea of error bars on probabilities. Maybe I'm just used to it.
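The two-part description above, a single probability of getting onto the distribution plus a log-normal distribution for the parameter, can be sketched as a simple simulation. All of the numbers and the function name `simulate_prospect` are hypothetical, chosen only to show the structure.

```python
import random


def simulate_prospect(p_success, mu, sigma, n_trials=10_000, seed=42):
    """Draw outcomes: zero volume on failure, log-normal volume on success."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        if rng.random() < p_success:
            outcomes.append(rng.lognormvariate(mu, sigma))
        else:
            outcomes.append(0.0)
    return outcomes


# Hypothetical inputs: 30% chance of success; mu and sigma are the mean and
# standard deviation of the natural log of the success-case volume.
outcomes = simulate_prospect(p_success=0.3, mu=3.0, sigma=0.8)
successes = [v for v in outcomes if v > 0]

print(len(successes) / len(outcomes))  # fraction of trials on the distribution
print(sum(outcomes) / len(outcomes))   # risked mean volume
```

The probability and the distribution stay separate here: the first decides whether there is any volume at all, and the second describes how much there is if so.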

Cheers!

McLane et al (2008), AAPG Bulletin, show a similar study to your O'Brien one, but done on geoscientists. The results (p1437) are absolutely shocking, but I suspect that there is a significant chance that it is possible that they perhaps could be invalidated by one study participant giving a 10% ("P10") confidence answer when everyone else was giving a 90% confidence answer. Which is a shame, because it would be a useful dataset otherwise.

...this contrary behaviour happened in McLane et al too: "In particular, the forms of four [out of 56] respondents were not included for analysis because they gave implausible answers, rating 'certain' below 10% and 'never' higher than 90%."

@Richie: Wow, that's a great reference, thank you. The entire paper is online at the US Securities and Exchange Commission. Highly recommended, and essential if you deal with portfolio management or reserve reporting.

The figure you referred to is on page 7 of that version of the paper, page 1437 in the original *Bulletin*. I have also added it to the wiki page. As you say, it's clear from the next figure, their Figure 4, that one respondent may have misinterpreted the exercise, though it seems odd that he or she would then give *Proved* a 25% 'confidence' level, and (assuming it's the same person) *Reasonable certainty* a lower 10% level. Shocking indeed.

@Matt, Excellent article and discussions in the comments. I just wanted to discuss it in a fairly heavy industry context.

What you have discussed is a common and very important problem I have come across during my time in an exploration asset. As geoscientists, we often have little training in statistics, and the probability theories that statistics is based on are designed for finite games of chance. For me, it has so far been best to look to other subjects for publications by people with similar viewpoints. http://en.wikipedia.org/wiki/Mathematical_economics (Scroll down to Criticisms and defenses)

Your reading list and views suggest this would not be new for you, and it is refreshing to come across these views in a fellow geoscientist.

The problem raised here is amplified in current oil and gas exploration practice. Your app, risk, is an example of a series of factors multiplied together. The chance of success (PG, POS or whichever abbreviation is used) is an aggregation of all these parameters. It is commonly encouraged to not consider a chance of success, but to consider each of the risk factors (independently) and then take the outputted PG. I, for one, cannot calculate the conditional probability on the fly of 5 factors during an intense discussion of a prospect.

So to word it, two risk factors with a WED of “probably not” = “Almost certainly not”? (0.3*0.3 = 0.09).

Three risk factors with a WED of “probably not” = (almost) Impossible (0.3 * 0.3 * 0.3 = 0.027).

This rarely makes sense and is often fudged around and played with until a chance of success is found that pleases the group (or much worse the manager!).
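The arithmetic in this comment can be sketched in a few lines. The sketch assumes the risk factors are independent, as the comment describes, and uses 0.3 for "probably not"; the function name `chance_of_success` is made up for the example.

```python
import math


def chance_of_success(factors):
    """Multiply independent risk-factor probabilities into one chance of success."""
    if any(not 0.0 <= p <= 1.0 for p in factors):
        raise ValueError("each factor must be between 0 and 1")
    return math.prod(factors)


# How quickly "probably not" (0.3) collapses as more factors are multiplied in.
for n in (1, 2, 3, 5):
    print(n, round(chance_of_success([0.3] * n), 4))
```

Every extra factor can only lower the product, which is one reason a prospect with many individually modest factors ends up with a very small chance of success.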

There are different “risking factor” systems applied by different companies. The problem becomes more acute the more factors, methods and segments are included. This also applies to other methods like Bayesian modification: how can anyone calibrate the inputs to a DFI upgrade/downgrade?

The key thing missing in my general rant so far is where and why we are making a chance of success in the first place. The chance of success is typically applied to allow for an estimate of the project/opportunity value, for the COS and volumes to cross company benchmarks (despite Rose pointing out the nonsense in this), and to help managers high-grade their portfolio so they can make a decision. This is the key thing: everything we do boils down to making an investment decision, and a series of good-looking numbers gives managers a good basis to take a decision, and something to fall back on if the well is dry!

So if we recognize that the input to the risking numbers is flawed, the decision basis is flawed, then why do we use it? Every time I ask that, I receive a dumbfounded look and a response that we have to have numbers to take a decision on!

As you mentioned in @Brian October the 14th 2011, oil wells are not common. Even the biggest companies’ portfolios are very small data sets. You mentioned the importance of being consistent in risking; however, even if consistency could be achieved, this is still statistical inference, and inference with a severe self-reference problem due to the small number of exploration wells drilled in a year. I feel that a company whose 30% success rate matches a 30% prognosis on its exploration portfolio has done very little apart from getting two numbers to match coincidentally over a specific time frame.
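The small-sample point can be illustrated with a binomial sketch. It assumes a hypothetical portfolio of ten independent wells, each risked at 30%; real wells are unlikely to be independent or identically risked, so this is only a rough picture.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent wells."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(lo, hi, n, p):
    """Probability that the number of successes falls in [lo, hi]."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


# Hypothetical portfolio: 10 wells, each risked at 30%.
n_wells, p = 10, 0.3
print(binomial_pmf(3, n_wells, p))     # chance of exactly 3 discoveries
print(prob_between(1, 5, n_wells, p))  # chance of anywhere from 1 to 5
```

Under these assumptions, an exact match between the predicted and observed success rate happens only about a quarter of the time even when the risking is perfect, and a wide range of outcomes is entirely consistent with the same 30% forecast.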

I will stop now; there are many additional things that could be added. For example there is an important connection to the risk and the data quality, can we ever say “almost certainly not” given an immature data set, and for that what defines an immature data set!

Do you have any ideas for a different system for valuing and risking exploration prospects?

I have only very recently found your website, this is a long comment, I just had to get it out of my system!

@Adam: Thanks for the awesome comment. I'm very grateful that you found the time to bash it out; rants on uncertainty and risk are my favourite kind of rant!

I agree — there are more problems with risk analysis than there are non-problems. The best you can hope for is consistency across a portfolio, and even that is probably impossible to achieve. And besides, decisions are often made on non-technical grounds (because of commitments, politics, egos, etc). Fundamentally, I feel like risking is hard because it's more like Schrodinger's cat than a roulette wheel, because the events we are risking have already happened. In fact, it's not even like the quantum cat, because the events were not determined by chance, but by a natural system. I don't know if it's possible to model that physical system with enough precision to make chance (i.e. stochastic models) a weaker tool, but I think it should be our goal.

On this point, I do have some ideas about another way to do it, but I am having trouble manifesting them. As a sort of prelude, I just wrote an article about the subject... it's due out this month in the CSEG Recorder. It will be freely available online, but not till about June. Get in touch and I'll gladly send it to you.

The most beneficial aspect will always be that we think through every aspect (or risk factors) and discuss them, rather than the output of a risk element for a model. I don't think we can model that system, at least not with current technology. Your scales of geoscience (I've continued to browse your excellent site) reflect the limitations of that, and then there are additional layers of uncertainty on every one of our measurements. So it may as well be Schrodinger's Cat.

For me, perhaps some form of qualitative guide would suffice, underpinned by high-quality technical work, but then it will not fit into calculating a dollar value of the project. There will always be abuse or mistakes using very advanced quantitative systems unless everyone who uses them has a thorough understanding of the models' background. Perhaps making better WEDs could bridge the gap.

As you mentioned the decisions are not always on technical grounds, but often the current models are being manipulated for egos, political reasons. So they form the defence and the reason still.

I sent you a mail with my address, I would be interested in viewing the article.