On answering questions

On Tuesday I wrote about asking better questions. One of the easiest ways to ask better questions is to hang back a little. In a lecture, the answer to your question may be imminent. Even if it isn't, some thinking or research will help. It's the same with answering questions. Better to think about the question, and maybe ask clarifying questions, than to jump right in with "Let me explain".

Here's a slightly edited example from Earth Science Stack Exchange

I suppose natural gas underground caverns on Earth have substantial volume and gas is in gaseous form there. I wonder how it would look like inside such cavern (with artificial light of course). Will one see a rocky sky at big distance?

The first answer was rather terse:

What is a good answer?

This answer, addressing the apparent misunderstanding the OP (original poster) has about gas being predominantly found in caverns, was the first thing that occurred to me too. But it's incomplete, and has other problems:

  • It's not very patient, and comes across as rather dismissive. Not very welcoming for this new user.
  • The reference is far from being an appropriate one, and seems to have been chosen randomly.
  • It only addresses sandstone reservoirs, and even then only 'typical' ones.

In my own answer to the question, I tried to give a more complete answer. I tried to write down my principles, which are somewhat aligned with the advice given on the Stack Exchange site:

  1. Assume the OP is smart and interested. They were smart and curious enough to track down a forum and ask a question that you're interested enough in to answer, so give them some credit. 
  2. No bluffing! If you find yourself typing something like, "I don't know a lot about this, but..." then stop writing immediately. Instead, send the question to someone you know that can give a better answer then you.
  3. If possible, answer directly and clearly in the first sentence. I usually write it in bold. This should be the closest you can get to a one-word answer, especially if it was a direct question. 
  4. Illustrate the answer with an example. A picture or a numerical example — if possible with working code in an accessible, open source language — go a long way to helping someone get further. 
  5. Be brief but thorough. Round out your answer with some different angles on the question, especially if there's nuance in your answer. There's no need for an essay, so instead give links and references if the OP wants to know more.
  6. Make connections. If there are people in your community or organization who should be connected, connect them.

It's remarkable how much effort people are willing to put into a great answer. A question about detecting dog paw-prints on a pressure pad, posted to the programming community Stack Overflow, elicited some great answers.

The thread didn't end there. Check out these two answers by Joe Kington, a programmer–geoscientist in Houston:

  • One epic answer with code and animated GIFs, showing how to make a time-series of pawprints.
  • A second answer, with more code, introducing the concept of eigenpaws to improve paw recognition.

A final tip: writing informative answers might be best done on Wikipedia or your corporate wiki. Instead of writing a long response to the post, think about writing it somewhere more accessible, and instead posting a link to your answer. 

What do you think makes a good answer to a question? Have you ever received an answer that went beyond helpful? 

On asking questions

If I had only one hour to solve a problem, I would spend up to two-thirds of that hour in attempting to define what the problem is. — Anonymous Yale professor (often wrongly attributed to Einstein)

Asking questions is a core skill for professionals. Asking questions to know, to understand, to probe, to test. Anyone can feel exposed asking questions, because they feel like they should know or understand already. If novices and 'experts' alike have trouble asking questions, if your community or organization does not foster a culture of asking, then there's a problem.

What is a good question?

There are naive questions, tedious questions, ill-phrased questions, questions put after inadequate self-criticism. But every question is a cry to understand the world. There is no such thing as a dumb question. — Carl Sagan

Asking good questions is the best way to avoid the problem of feeling silly or — worse — being thought silly. Here are some tips from my experience in Q&A forums at work and on the Internet:

  1. Do some research. Go beyond a quick Google search — try Google Scholar, ask one or two colleagues for help, look in the index of a couple of books. If you have time, stew on it for a day or two. Do enough to make sure the answer isn't widely known or trivial to find. Once you've decided to ask a network...
  2. Ask your question in the right forum. You will save yourself a lot of time by going taking the trouble to find the right place — the place where the people most likely to be able to help you are. Avoid the shotgun approach: it's not considered good form to cross-post in multiple related forums.
  3. Make the subject or headline a direct question, with some relevant detail. This is how most people will see your question and decide whether to even read the rest of it. So "Help please" or "Interpretation question" are hopeless. Much better is something like "How do I choose seismic attribute parameters?" or "What does 'replacement velocity' mean?".
  4. Provide some detail, and ideally an image. A bit of background helps. If you have a software or programming problem, just enough information needed to reproduce the problem is critical. Tell people what you've read and where your assumptions are coming from. Tell people what you think is going on.
  5. Manage the question. Make sure early comments or answers seem to get your drift. Edit your question or respond to comments to help people help you. Follow up with new questions if you need clarification, but make a whole new thread if you're moving into new territory. When you have your answer, thank those who helped you and make it clear if and how your problem was solved. If you solved your own problem, post your own answer. Let the community know what happened in the end.

If you really want to cultivate your skills of inquiry, here is some more writing on the subject...

Supply and demand

Knowledge sharing networks like Stack Exchange, or whatever you use at work, often focus too much on answers. Capturing lessons learned, for example. But you can't just push knowledge at people — the supply and demand equation has two sides — there has to be a pull too. The pull comes from questions, and an organization or community that pulls, learns.

Do you ask questions on knowledge networks? Do you have any advice for the curious? 

Don't miss the next post, On answering questions.

Seismic inception

A month ago, some engineers at Google blogged about how they had turned a deep learning network in on itself and produced some fascinating and/or disturbing images:

One of the images produced by the team at Google. Click to see a larger version. Read more. CC-BY.

The basic recipe, which Google later open sourced, involves training a deep learning network (basically a multi-layer neural network) on some labeled images, animals maybe, then searching for matching patterns in a target image, like these clouds. If it finds something, it emphasizes it — given the data, it tries to construct an animal. Then do it again.

Or, here's how a Google programmer puts it (one of my favourite sentences ever)...

Making the "dream" images is very simple. Essentially it is just a gradient ascent process that tries to maximize the L2 norm of activations of a particular DNN layer. 

That's all! Anyway, the point is that you get utter weirdness:

OK, cool... what happens if you feed it seismic?

That was my first thought, I'm sure it was yours too. The second thing I thought, and the third, and the fourth, was: wow, this software is hard to compile. I spent an unreasonable amount of time getting caffe, the Berkeley Vision & Learning Centre's deep learning software, working. But on Friday I cracked it, so today I got to satisfy my curiosity.

The short answer is: reptiles. These weirdos were 8 levels down, which takes about 20 minutes to reach on my iMac.

Seismic data from the Virtual Seismic Atlas, courtesy of Fugro. 


Er, right... what's the point in all this?

That's a good question. It's just a bit of fun really. But it makes you wonder:

  • What if we train the network on seismic facies? I think this could be very interesting.
  • Better yet, what if we train it on geology? Probably spurious: seismic is not geology.
  • Does this mean learning networks are just dumb machines, or can they see more than us? Tough one — human vision is highly fallible. There are endless illusions to prove this. But computers only do what we tell them, at least for now. I think if we're careful what we ask for, we can use these highly non-linear data-crunching algorithms for good.
  • Are we out of a job? Definitely not. How do you think machines will know what to learn? The challenge here is to make this work, and then figure out how it can help change, or at least accelerate, our understanding of the subsurface.

This deep learning stuff — of which the University of Toronto was a major pioneer during its emergence in about 2010 — is part of the machine learning revolution that you are, like it or not, experiencing. It will take time, and it will make awful mistakes, but the indications are that machine learning will eat every analytical method for breakfast. Customer behaviour prediction, computer vision, natural language processing, all this stuff is reeling from the relatively sudden and widespread availability of inexpensive computer intelligence. 

So what are we going to do with that?

           Okay, one more. from Paige Bailey's Twitter feed.

           Okay, one more. from Paige Bailey's Twitter feed.

Ask your employer about being more awesome


Open source software needs money to survive. If you work at a corporation with a positive bottom line, and you use open source software to help you maintain it, I'd urge you to consider asking your organization to help out. You can't imagine the difference it makes — these projects take serious resources to run: server hardware, infrastructure maintenance, professional developers, research and development, legal and marketing functions, educational outreach, work in developing countries,... just like commercial, closed-source, black-or-at-least-dark-grey-box software. 

(Come to think of it, the only thing they don't have is sales personnel driving to golf courses in a BMW 5 series. How many of those have you paid for with those license fees?)

Which projects need your company's help?

There are some fundamental projects, but they tend to be quite well funded already, both financially and in-kind. For example, software engineers at companies like IBM and Google make substantial contributions to the Linux kernel. Still, your company definitely depends on technology from the following projects:

  1. The Linux Foundation — responsible for the kernel of the Linux operating system.
  2. Free Software Foundation — the umbrella for a ridiculous number of software tools.
  3. The Apache Foundation — maintainers of the eponymous web server, and forerunners of the ongoing big data and machine learning revolutions and the tools that power them. 

These higher-level projects are closer to my heart, and do great working supporting the work of scientists:

  1. The Mozilla Foundation — check out the Mozilla Science Lab and Software Carpentry
  2. The WikiMedia Foundation — for Wikipedia, and the MediaWiki software that powers it (as well as AAPG's and SEG's wikis)
  3. NumFOCUS Foundation — all the better to help you wield scientific Python!

If money really isn't an option, consider working somewhere where it is an option. If that's not an option either, then there are plenty of other ways to make a difference:

  1. Use and champion open source software at your place of work.
  2. Submit tickets for the software you use, and engage with the community.
  3. If you can code, submit patches, documentation, or whatever you can.

Now, if we only had an Open Geoscience Foundation to help fund projects in geoscience...

Software, stats, and tidal energy

Today was the last day of the conference part of SciPy 2015 in Austin. Almost all the talks at this conference have been inspiring and/or enlightening. This makes it all the more wonderful that the organizers get the talks online within a couple of hours (!), so you can see everything (compared to about 5% maximum coverage at SEG).

Jake Vanderplas, a young astronomer and data scientist at UW's eScience Institute, gave the keynote this morning. He eloquently reviewed the history and state-of-the-art of the so-called SciPy stack, the collection of tools that Pythonistic scientists use to get their research done. If you're just getting started in this world, it's about the best intro you could ask for:

Chris Fonnesbeck treated the room to what might as well have been a second keynote, so well did he express his convictions. Beautiful slides, and a big message: statistics matters.

Kristen Thyng, an energetic contributor to the conference, gave a fantastic talk about tidal energy, her main field, as well as one about perceptual colourmaps, which is more of a hobby. The work includes some very nice visualizations of tidal currents in my home province...

Finally, I highly recommend watching the lightning talks. Apart from being filled with some mind-blowing ideas, many of them eliciting spontaneous applause (imagine that!), I doubt you will ever witness a more effective exercise in building a community of passionate professionals. It's remarkable. (If you don't have an hour these three are awesome.)

Next we'll be enjoying the 'sprints', a weekend of coding on open source projects. We'll be back to geophysics blogging next week :)

Geophysics at SciPy 2015

Yesterday was the geoscience day at SciPy 2015 in Austin.

At lunchtime, Paige Bailey (Chevron) organized a Birds of a Feather on GIS. This was a much-needed meetup for anyone interested in spatial data. It was useful to hear about the tools the fifty-or-so participants  use every day, and a great chance to air some frustrations like Why is it so hard to install a geospatial stack? And questions like How do people make attractive maps with the toolset?

One way to make attractive maps is go beyond the screen and 3D print them. Almost any subsurface dataset could seem more tangible and believable as a 3D object, and Joe Kington (Chevron) showed us how to make data into objects. Just watch:

Matteus Ueckermann followed up with some virtual elevation models, showing how Python can process not just a few tiles of data, but can handle hydrology modeling for the entire world:

Nicola Creati (OGS, Trieste) showed us the PyGmod package, a new and fully parallel geodynamic simulation tool for HPC nuts. So now you can make more plate tectonic models before most people are out of bed!

We also heard from Lindsey Heagy and Gudnir Rosenkjaer from UBC, talking about various applications of Rowan Cockett's awesome SimPEG package to their work. As at the hackathon in Denver, it's very clear that this group's investment in and passion for a well-architected, integrated package is well worth the work, giving everyone who works with it superpowers. And, as we all know, superpowers are awesome. Especially geophysical ones.

Last up, I talked about striplog, a small package for handling interval and point data in logs, core, and other 1D datasets. It's still very immature, but almost ready for real-world users, so if you think you have a use case, I'd love to hear from you.

Today is the last day of the conference part, before we head into the coding sprints tomorrow. Stay tuned for more, or follow the #scipy2015 hashtag to keep up. See all the videos, which go up almost right after talks, on YouTube.

You'd better read this

The clean white front cover of this month's Bloomberg Businessweek carries a few lines of Python code, and two lines of English as a footnote... If you can't read that, then you'd better read this. The entire issue is a single essay written by Paul Ford. It was an impeccable coincidence: I picked up a copy before boarding the plane to Austin for SciPy 2015. This issue is a grand achievement; it could be the best thing I've ever read. Go out an buy as many copies as you can, and give them to your friends. Or read it online right now.

Not your grandfather's notebook

Jess Hamrick is a cognitive scientist at UC Berkeley who makes computational models of human behaviour. In her talk, she described how she built a multi-user server for Jupyter notebooks to administer course content, assign homework, even do auto-grading for a class with 220 undergrads. During her talk, she invited the audience to list their GitHub usernames on an Etherpad. Minutes after she stood down from her podium, she granted access, so we could all come inside and see how it was done.

Dangerous defaults

I wrote a while ago about the dangers of defaults, and as Matteo Niccoli highlighted in his 52 Things essay, How to choose a colourmap, default colourmaps can be especially harmful. Matplotlib has long been criticized for its nasty default colourmap, but today redeemed itself with a new default. Hear all about it from Stefan van der Walt:

Sound advice

Allen Downey of Olin College gave a wonderful talk this afternoon about teaching digital signal processing to students using fun and intuitive audio signals as the hook. Watch it yourself, it's well worth the 20 minutes or so:

If you're really into musical and audio applications, there was another talk on the subject, by Brian McFee (Librosa project). 

More tomorrow as we head into Day 2 of the conference. 

Attribute analysis and statistics

Last week I wrote a basic introduction to attribute analysis. The post focused on the different ways of thinking about sampling and intervals, and on how instantaneous attributes have to be interpolated from the discrete data. This week, I want to look more closely at those interval attributes. We'd often like to summarize the attributes of an interval info a single number, perhaps to make a map.

Before thinking about amplitudes and seismic traces, it's worth reminding ourselves about different kinds of average. This table from SubSurfWiki might help... 

A peculiar feature of seismic data. from a statistical point of view, is the lack of the very low frequencies needed to give it a trend. Because of this, it oscillates around zero, so the average amplitude over a window tends to zero — seismic data has a mean value of zero. So not only do we have to think about interpolation issues when we extract attributes, we also have to think about statistics.

Fortunately, once we understand the issue it's easy to come up with ways around it. Look at the trace (black line) below:

The mean is, as expected, close to zero. So I've applied some other statistics to represent the amplitude values, shown as black dots, in the window (the length of the plot):

  • Average absolute amplitude (light green) — treat all values as positive and take the mean.
  • Root-mean-square amplitude (dark green) — tends to emphasize large values, so it's a bit higher.
  • Average energy (magenta) — the mean of the magnitude of the complex trace, or the envelope, shown in grey.
  • Maximum amplitude (blue) — the absolute maximum value encountered, which is higher than the actual sample values (which are all integers in this fake dataset) because of interpolation.
  • Maximum energy (purple) — the maximum value of the envelope, which is higher still because it is phase independent.

There are other statistics besides these, of course. We could compute the median average, or some other mean. We could take the strongest trough, or the maximum derivative (steepest slope). The options are really only limited by your imagination, and the physical relationship with geology that you expect.

We'll return to this series over the summer, asking questions like How do you know what to expect? and Does a physically realistic relationship even matter? 

To view and run the code that I used in creating the figures for this post, grab the iPython/Jupyter Notebook.

How do I become a quantitative interpreter?

TLDR: start doing quantitative interpretation.

I just saw this question on reddit/r/geophysics

I always feel a bit sad when I read this sort of question, which is even more common on LinkedIn, because it reminds me that we (in the energy industry at least) have built recruiting patterns and HR practices that make it look as if professionals have career tracks or have to build CVs to impress people or get permission to train in a new area. This is all wrong.

Or, to be more precise, we can treat this as all wrong and have a lot more fun in the process.

If you are a 'geologist' or 'geophysicist', then you are in control of your own career and what you apply yourself to. No-one is telling you what to do, they are only telling you what they need. How you do it, the methods you apply, the products you build — all this is completely up to you. This is almost the whole point of being a professional.

The replies to Timbledon's question include this one:

I disagree with Schwa88. Poor Timbledon doesn't need another degree. Rock physics is not a market, and not new. There are no linear tracks. And there is no clear or useful distinction between rock physics and quantitative interpretation (or petrophysics, or seismic geophysics) — I bet there are no two self-identifying quantitative interpreters with identical, or even similar, job or educational histories.

As for 'now is not the time'... I can't even... 'Now' is the only time you can do anything about, so work with it.

OK, enough ranting, what should Timbledon do?

It's easy! The best way to pursue quantitative interpretation, or pretty much anything except pediatric cardiology, is to just start doing it. It really is that simple. My advice is to use quantitative methods in every project you touch, and in doing so you will immediately outperform most interpreters. Talk to anyone and everyone about your interest and share your insights. Volunteer for projects. Go to talks. Give talks. To help you find your passion, take the time to learn about some big things:

  • Rock physics, e.g. the difference between static and dynamic elasticity.
  • Seismic processing, e.g. what surface consistent deconvolution and trim statics are.
  • Seismic interpretation, e.g. seismic geomorphology and seismic stratigraphy.
  • Seismic analysis, e.g. the difference between Zoeppritz, Fatti, and Shuey.
  • Statistics, e.g. when you need multilinear regression, or K-means clustering.

Those are just examples. If you're more into X-ray diffraction in clays, or the physics of crystalline rocks, or fluid properties, or wellbore seismic, or time-lapse effects, or whatever — learn about those things instead.

Whatever you do, Timbledon, don't listen to anybody ;)

An attribute analysis primer

A question on Stack Exchange the other day reminded me of the black magic feeling I used to have about attribute analysis. It was all very meta: statistics of combinations of attributes, with shifted windows and crazy colourbars. I realized I haven't written much about the subject, despite the fact that many of us spend a lot of time trying to make sense of attributes.

Time slices, horizon slices, and windows

One of the first questions a new attribute-analyser has is, "Where should the window be?" Like most things in geoscience: it depends. There are lots of ways of doing it, so think about what you're after...

  • Timeslice. Often the most basic top-down view is a timeslice, because they are so easy to make. This is often where attribute analysis begins, but since timeslices cut across stratigraphy, not usually where it ends.
  • Horizon. If you're interested in the properties of a strong reflector, such as a hard, karsted unconformity, maybe you just want the instantaneous attribute from the horizon itself.
  • Zone. If the horizon was hard to interpret, or is known to be a gradual facies transition, you may want to gather statistics from a zone around it. Or perhaps you couldn't interpret the thing you really wanted, but only that nice strong reflection right above it... maybe you can bootstrap yourself from there. 
  • Interval. If you're interested in a stratigraphic interval, you can bookend it with existing horizons, perhaps with a constant shift on one or both of them.
  • Proportional. If seismic geomorphology is your game, then you might get the most reasonable inter-horizon slices from proportionally slicing in between stratigraphic surface. Most volume interpretation software supports this. 

There are some caveats to simply choosing the stratigraphic interval you are after. Beware of choosing an interval that strong reflectors come into and out of. They may have an unduly large effect on most statistics, and could look 'geological'. And if you're after spectral attributes, do remember that the Fourier transform needs time! The only way to get good frequency resolution is to provide long windows: a 100 ms window gives you frequency information every 10 Hz.

Extraction depends on sample interpolation

When you extract an attribute, say amplitude, from a trace, it's easy to forget that the software has to do some approximation to give you an answer. This is because seismic traces are not continuous curves, but discrete series, with samples typically every 1, 2, or 4 milliseconds. Asking for the amplitude at some arbitrary time, like the point at which a horizon crosses a trace, means the software has to interpolate between samples somehow. Different software do this in different ways (linear, spline, polynomial, etc), and the methods give quite different results in some parts of the trace. Here are some samples interpolated with a spline (black curve) and linearly (blue). The nearest sample gives the 'no interpolation' result.

As well as deciding how to handle non-sampled parts of the trace, we have to decide how to represent attributes operating over many samples. In a future post, we'll give some guidance for using statistics to extract information about the entire window. What options are available and how do we choose? Do we take the average? The maximum? Something else?

There's a lot more to come!

As I wrote this post, I realized that this is a massive subject. Here are some aspects I have not covered today:

  • Calibration is a gaping void in many published workflows. How can we move past "that red blob looks like a point bar so I drew a line around it in PowerPoint" to "there's a 70% chance of finding reservoir quality sand at that location"?
  • This article was about  single-trace attributes at single instants or over static windows. Multi-trace and volume attributes, like semblance, curvature, and spectral decomposition, need a post of their own.
  • There are a million attributes (though only a few that count, just ask Art Barnes) so choosing which ones to use can be a challenge. Criteria range from what software licenses you have to what is physically reasonable.
  • Because there are a million attributes, the art of combining attributes with statistical methods like principal component analysis or multi-linear regression needs a look. This gets into seismic inversion.

We'll return to these ideas over the next few weeks. If you have specific questions or workflows to share, please leave a comment below, or get in touch by email or Twitter.

To view and run the code that I used in creating the figures for this post, grab the iPython/Jupyter Notebook.