Entries in data (16)

Monday
Feb 18, 2013

Machines can read too

The energy industry has a lot of catching up to do. Humanity is faced with difficult, pressing problems in energy production and usage, yet our industry remains as secretive and proprietary as ever. One rich source of innovation we are seriously under-utilizing is the Internet. You have probably heard of it.

Machine experience design

Web sites are just the front-end of the web. Humans have particular needs when they read web pages — attractive design, clear navigation, etc. These needs are researched and described by the rapidly growing field of user experience design, often called UX. (Yes, the ways in which your intranet pages need fixing are well understood, just not by your IT department!)

But the web has a back-end too. Rather than being for human readers, the back-end is for machines. Just like human readers, machines—other computers—also have particular needs: structured data, and a way to make queries. Why do machines need to read the web? Because the web is full of data, and data makes the world go round. 

So website administrators need to think about machine experience design too. As well as providing beautiful web pages for humans to read, they should provide the same data in a widely accepted machine-readable format such as JSON or XML, and a way to make queries.
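To make that concrete, here is a minimal sketch of what "structured data plus a way to make queries" might look like from the machine's side. The catalog, its field names, and the `query` helper are all hypothetical, invented for illustration; a real provider would publish its own schema.

```python
import json

# Hypothetical course catalog, as a provider might publish it for machines.
# Every record and field name here is made up for illustration.
CATALOG_JSON = """
[
  {"id": "c-001", "title": "Seismic interpretation basics", "city": "Houston", "days": 5},
  {"id": "c-002", "title": "Well log analysis", "city": "Calgary", "days": 3},
  {"id": "c-003", "title": "Seismic attributes", "city": "Calgary", "days": 2}
]
"""

def query(records, **criteria):
    """Return the records whose fields equal every keyword criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

courses = json.loads(CATALOG_JSON)
calgary = query(courses, city="Calgary")
print([c["title"] for c in calgary])
```

The point is less the code than the contract: once the data is structured, a few lines are enough for someone else's program to ask its own questions.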

What can we do with the machine-readable web?

The beauty of the machine-readable web, sometimes called the semantic web, or Web 3.0, is that developers can build meta-services on it. For example, a website like hipmunk.com that finds the best flights, wherever they are. Or a service that provides charts, given some data or a function. Or a mobile app that knows where to get the oil price. 

In the machine-readable web, you could do things like:

  • Write a program to analyse bibliographic data from SEG, SPE and AAPG.
  • Build a mobile app to grab log mnemonics info from SLB's, HAL's, and BHI's catalogs.
  • Grab course info from AAPG, PetroSkills, and Nautilus to help people find training they need.

Most wikis have a public application programming interface, giving direct, machine-friendly access to the wiki's database. The same wiki page can be seen in two views: the rendered page for human readers, and the raw data served through the API for machines.
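As a rough sketch of the machine's view, the snippet below builds a query URL for a MediaWiki-style API and pulls the page text out of a JSON response. The endpoint `wiki.example.org` is a placeholder, and the sample response is hand-written to mimic the classic response layout as I understand it, so treat the details as assumptions rather than a reference.

```python
import json
from urllib.parse import urlencode

# Hypothetical wiki endpoint; MediaWiki sites conventionally expose api.php.
API_URL = "https://wiki.example.org/w/api.php"

def page_query_url(title):
    """Build a MediaWiki API URL asking for a page's latest wikitext as JSON."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)

def extract_wikitext(response_text):
    """Map page title -> wikitext, assuming the classic response layout."""
    pages = json.loads(response_text)["query"]["pages"]
    return {p["title"]: p["revisions"][0]["*"] for p in pages.values()}

# Illustrative response, hand-written to mimic the shape of a real one.
SAMPLE = (
    '{"query": {"pages": {"42": {"pageid": 42, "title": "Gamma ray log", '
    '"revisions": [{"*": "A gamma ray log measures natural radioactivity."}]}}}}'
)

print(page_query_url("Gamma ray log"))
print(extract_wikitext(SAMPLE))
```

No network call is made here; fetching the URL with any HTTP client would be the obvious next step.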

At SEG last year, I suggested to a course provider that they might consider offering machine access to their course catalog—so that developers can build services that use their course information and thus send them more students. They said, "Don't worry, we're building better search tools for our users." Sigh.

In this industry, everyone wants to own their own portal, and tends to be selfish about their data and their users. The problem is that you don't know who your users are, or rather who they could be. You don't know what they will want to do with your data. If you let them, they might create unimagined value for you—as hipmunk.com does for airlines with reasonable prices, good schedules, and in-flight Wi-Fi. 

I can't wait for the Internet revolution to hit this industry. I just hope I'm still alive.

Tuesday
Dec 11, 2012

The digital well scorecard

In my last post, I ranted about the soup of acronyms that refer to well log curves; a too-frequent book-keeping debacle. This pain, along with others before it, has motivated me to design a solution. At this point all I have is this sketch, a wireframe of should-be software that allows you to visualize every bit of borehole data you can think of:

The goal is, show me where the data is in the domain of the wellbore. I don't want to see the data explicitly (yet), just its whereabouts in relation to all other data. Data from many disaggregated files, reports, and so on. It is part inventory, part book-keeping, part content management system. Clear the fog before the real work can begin. Because not even experienced folks can see clearly in a fog.

The scorecard doesn't yield a number or a grade point like a multiple choice test. Instead, you build up a quantitative display of your data extents. With the example shown above, I don't even have to look at the well log to tell you that you are in for a challenging well tie, with the absence of sonic measurements in the top half of the well. 
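The bookkeeping behind such a display could be quite simple. Here is a minimal sketch, assuming each curve is described only by the depth intervals over which it was recorded; the curve names, depths, and the `gaps` and `coverage` helpers are hypothetical, not part of any existing tool.

```python
# Hypothetical inventory: curve name -> list of (top, base) logged intervals, in metres.
WELL_TD = 3000.0
extents = {
    "GR": [(50.0, 3000.0)],
    "DT": [(1500.0, 2950.0)],  # sonic recorded in the bottom half only
    "RHOB": [(1200.0, 1800.0), (2000.0, 2950.0)],
}

def gaps(intervals, top=0.0, base=WELL_TD):
    """Return the depth ranges between top and base not covered by any interval."""
    missing, cursor = [], top
    for start, stop in sorted(intervals):
        if start > cursor and cursor < base:
            missing.append((cursor, min(start, base)))
        cursor = max(cursor, stop)
    if cursor < base:
        missing.append((cursor, base))
    return missing

def coverage(intervals, top=0.0, base=WELL_TD):
    """Fraction of the wellbore between top and base that has data."""
    missing = sum(b - a for a, b in gaps(intervals, top, base))
    return 1.0 - missing / (base - top)

for name, ivs in extents.items():
    print(f"{name:5s} {coverage(ivs):5.0%}  gaps: {gaps(ivs)}")
```

With extents like these, the missing sonic in the top half shows up as a gap from surface to 1500 m without anyone opening a log file.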

The people I showed this to immediately understood what was being expressed. They got it right away, so that bodes well for my preliminary sketch. Can you imagine using a tool like this, and if so, what features would you need? 

Thursday
Oct 4, 2012

My StrataConf highlights

Lots went on at the geologically named, but not geologically inclined, Strata Conference in London. Here are my highlights:

George Dyson was one of the keynote speakers on the first morning. The son of the British–American mathematician Freeman Dyson, George is an author and historian of science and computing. He talked about the history of storage, starting with tally sticks, through the 53kB of global digital storage in 1953, to today. His talk was fascinating. 

Simon Rogers was one of several speakers from the Guardian newspaper, one of the most progressive and online-friendly news outlets in the world. The paper has a host of strategies for putting data first:

  • Their data and viz geeks sit in the middle of the newsroom
  • They built their own software library for data viz, Miso
  • They share the data behind every story on their Datablog

Duncan Irving from Teradata gave the audience a glimpse of the big data geoscientists wield, as I alluded to yesterday. Teradata does data warehousing, but with high technology extras like distributed storage and level of detail layers. I was intrigued by one of the technologies he talked about — SQL on Hadoop. This sounds like gobbledygook, but here's the (possibly horribly misunderstood) gist: store statistical attributes of a massive seismic volume in a database, then you can query them. "Show me all the traces with such-and-such seismic facies."   
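To illustrate the gist (and only the gist), here is a toy version using SQLite from Python's standard library as a stand-in. It is not Hadoop and not Teradata's product; the table, the attribute names, and the thresholds standing in for "such-and-such seismic facies" are all assumptions made up for the example.

```python
import sqlite3

# In-memory database standing in for a much larger distributed store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trace_stats (inline INTEGER, xline INTEGER, rms_amp REAL, dom_freq REAL)"
)

# Hypothetical per-trace statistics: (inline, crossline, RMS amplitude, dominant frequency in Hz).
rows = [
    (100, 200, 0.82, 32.0),
    (100, 201, 0.15, 41.0),
    (101, 200, 0.77, 28.5),
    (101, 201, 0.09, 45.0),
]
conn.executemany("INSERT INTO trace_stats VALUES (?, ?, ?, ?)", rows)

# "Show me all the traces with such-and-such character": here, strong and low-frequency.
bright = conn.execute(
    "SELECT inline, xline FROM trace_stats "
    "WHERE rms_amp > ? AND dom_freq < ? ORDER BY inline, xline",
    (0.5, 35.0),
).fetchall()
print(bright)
```

The traces themselves never enter the database, only their summary statistics, which is what makes the query cheap.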

Hjalmar Gislason from Datamarket, whose recent products include Energy Portal, gave us his best practices for publishing data:

  • Use simple formats, like CSV
  • Aim for at least 3 stars in Tim Berners-Lee's system
  • Be consistent across the datasets you publish
  • Put unique IDs everywhere, especially on tables and columns
  • Provide FAQs and clear feedback channels for users
  • Be clear about the license terms of the data
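A few of these practices are easy to show in code. The sketch below writes a small dataset as plain CSV and keeps the unique IDs and license terms in a metadata record published alongside it. The dataset, the IDs, the numbers, and the choice of license are all hypothetical.

```python
import csv
import io
import json

# Hypothetical dataset: every ID, name, and value here is made up for illustration.
METADATA = {
    "dataset_id": "ds-0001",
    "license": "CC-BY-4.0",
    "columns": [
        {"id": "col-001", "name": "year"},
        {"id": "col-002", "name": "production_bbl"},
    ],
}
ROWS = [(2010, 1200), (2011, 1350)]

def to_csv(metadata, rows):
    """Write rows as plain CSV, using the column names from the metadata as the header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col["name"] for col in metadata["columns"]])
    writer.writerows(rows)
    return buf.getvalue()

csv_text = to_csv(METADATA, ROWS)
sidecar = json.dumps(METADATA, indent=2)  # published next to the CSV file
print(csv_text)
print(sidecar)
```

Keeping the IDs and license in a separate record leaves the CSV itself as simple as possible while still making both easy for a program to find.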

Ben Goldacre, author and bad science crimefighter, gave a keynote on the second day. Almost vibrating with energy, he described how the most basic bias-fighting tool in medicine — randomized controlled trials — might be applied to improving government services (Haynes et al., 2012, Test, learn, adapt). 

At the end of the two days, I had the usual feeling of fullness, fatigue, and anticlimax... but also the inspired, impatient, creative energy that I hope for from events. The consistency of the themes was encouraging — data wants to be free, visualization is necessary but insufficient, reproducibility is core, stories drive us — these are ideas we embrace. They're at the heart of the quiet revolution going on in the world, but perhaps not yet at the heart of our subsurface professional communities. 

Photo by flickr user bjelkeman.

Tuesday
Oct 2, 2012

Big data in geoscience

Big data is what we got when the decision cost of deleting data became greater than the cost of storing it.
George Dyson, at Strata London

I was looking for something to do in London this week. Tempted by the Deep-water continental margins meeting in Piccadilly, I instead took the opportunity to attend a different kind of conference. The media group O'Reilly, led by the inspired Tim O'Reilly, organizes conferences. They're known for being energetic, quirky, and small-company-friendly. I wanted to see one, so I came to Strata.

Strata is the conference for big data, one of the woolliest buzzwords in computer science today. Some people are skeptical that it's anything other than a new way to provoke fear and uncertainty in IT executives, the only known way to make them spend money. Indeed, Google "big data" and the top 5 hits are: Wikipedia (obvsly), IBM, McKinsey, Oracle, and EMC. It might be hype, but all this attention might lead somewhere good. 

We're all big data scientists

Geoscientists, especially geophysicists, are unfazed by the concept of big data. The acquisition data from a 3D survey can easily require 10TB (10,240GB) or even 100TB of storage. The data must be written, read, processed, and re-written dozens of times during processing, then delivered, loaded, and interpreted. In geoscience, big data is normal data. 

So it's great that big data problems are being hacked on by thousands of developers, researchers, and companies that, until about a year ago, were only interested in games and the web. About 99% of them are not working on problems in geophysics or petroleum, but there will be insight and technology that will benefit our industry.

It's not just about data management. Some of the most creative data scientists in the world are at this conference. People are showing dense, and sometimes beautiful, visualizations of giant datasets, like the transport displays by James Cheshire's research group at UCL (right). I can't wait to show some of these people a SEG-Y or LAS file and, unencumbered by our curmudgeonly tradition of analog display metaphors, see how they would display it.

Would the wiggle display pass muster?

Friday
Apr 27, 2012

Opening data in Nova Scotia

When it comes to data, open doesn't mean part of the public relations campaign. Open must be put to work. And making open data work can take a lot of work, by a number of contributors across organizations.

Also, open data should be accessible by more than the privileged few in the right location at the right time, or with the right connections. The better way to connect is by digital data stewardship.

I will be speaking about the state of the onshore Nova Scotia petroleum database at the Nova Scotia Energy R&D Forum in Halifax on 16 & 17 May, and the direction this might head for the collective benefit of regulators, researchers, explorationists, and the general public. Here's the abstract for the talk:
