The big data eye-roll

First, let's agree on one thing: 'big data' is a half-empty buzzword. It's shorthand for 'more data than you can look at', but really it's more than that: it branches off into other hazy territory like 'data science', 'analytics', 'deep learning', and 'machine intelligence'. In other words, it's not just 'large data'. 

Anyway, the buzzword doesn't bother me too much. What bothers me is when I talk to people at geoscience conferences about 'big data', about half of them roll their eyes and proclaim something like this: "Big data? We've been doing big data since before these punks were born. Don't talk to me about big data."

This is pretty demonstrably a load of rubbish.

What the 'big data' movement is trying to do is not acquire loads of data then throw 99% of it away. They are not processing it in a purely serial pipeline, making arbitrary decisions about parameters on the way. They are not losing most of it in farcical enterprise data management train-wrecks. They are not locking most of their data up in such closed systems that even they don't know they have it.

They are doing the opposite of all of these things.

If you think 'big data', 'data' science' and 'machine learning' are old hat in geophysics, then you have some catching up to do. Sure, we've been toying with simple neural networks for years, eg probabilistic neural nets with 1 hidden layer — though this approach is very, very far from being mainstream in subsurface — but today this is child's play. Over and over, and increasingly so in the last 3 years, people are showing how new technology — built specifically to handle the special challenge that terabytes bring — can transform any quantitative endeavour: social media and online shopping, sure, but also astronomy, robotics, weather prediction, and transportation. These technologies will show up in petroleum geoscience and engineering. They will eat geostatistics for breakfast. They will change interpretation.

So when you read that Google has open sourced its TensorFlow deep learning library (9 November), or that Microsoft has too (yesterday), or that Facebook has too (months ago), or that Airbnb has too (in August), or that there are a bazillion other super easy-to-use packages out there for sophisticated statistical learning, you should pay a whole heap of attention! Because machine learning is coming to subsurface.