Today was the first day of the Petroleum Technology Transfer Council's workshop, Open software for reproducible computational geophysics, being held at the Bureau of Economic Geology's Houston Research Center and organized skillfully by Karl Schleicher of the University of Texas at Austin. It was a full day of presentations (boo!), but they all had live installation demos and even live coding (yay!). It was fantastic.
Serial entrepreneur Alex Mihai Popovici, the CEO of Z-Terra, gave a great, very practical overview of the relative merits of three major seismic processing packages: Seismic Unix (SU), Madagascar, and SEPlib. He has a very real need: delivering leading-edge seismic processing services to clients all over the world. He more or less dismissed SEPlib on the grounds of its slow development rate and difficulty of installation. SU is popular (about 3300 installs) and has the best documentation, but perhaps lacks some modern imaging algorithms. Madagascar, Alex's choice, has about 1100 installs and relatively terse self-documentation (it's all on the wiki), but is the most actively developed.
The legendary Dave Hale (I think that's fair) of the Colorado School of Mines gave an overview of his Mines Java Toolkit (JTK). He's one of those rare people who can explain almost anything to almost anybody, so I learned a lot about how to manage parallelization in 2D and 3D arrays of data, and how to break it. Dave is excited about the programming language Scala, a sort of Java lookalike (to me) that handles parallelization beautifully. He also digs Jython, because it has the simplicity and fun of Python, but can incorporate Java classes. You can get his library from his web pages. Installing it on my Mac was a piece of cake, needing only a few terminal commands:
- svn co http://boole.mines.edu/jtk
- cd jtk/trunk
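To give a flavour of the parallelization idea Dave talked about — independent pieces of an array can be processed concurrently, and shared mutable state is how you break it — here's a loose sketch in Python rather than Java. The `smooth_row` computation and the worker count are purely illustrative, not anything from the JTK.

```python
# A toy analogy (in Python, not Java) of parallelizing over a 2D array:
# each row is independent, so rows can be mapped to worker threads safely.
from concurrent.futures import ThreadPoolExecutor

def smooth_row(row):
    # A toy per-row computation: a three-point running average.
    n = len(row)
    out = []
    for i in range(n):
        window = row[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def smooth_2d(array2d, workers=8):
    # Rows share no state, so this is safe; writing to a shared
    # accumulator from inside smooth_row would be how you "break" it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(smooth_row, array2d))
```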
Chuck Mosher of ConocoPhillips then gave us a look at JavaSeis, an open source project that makes handling prestack seismic data easy and very, very fast. It has parallelization built into it, and is perfect for large, modern 3D datasets and multi-dimensional processing algorithms. His take on open source in commerce: corporations are struggling with the concept, but "it's in their best interests to actively participate".
Eric Jones is CEO of Enthought, the innovators behind (among other things) NumPy/SciPy and the Enthought Python Distribution (or EPD). His take on the role of Python as an integrator and facilitator, handling data traffic and improving usability for the legacy software we all deal with, was practical and refreshing. He is not at all dogmatic about doing everything in Python. He also showed a live demo of building a widget with Traits and Chaco. Awesome.
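The "Python as integrator" pattern Eric described boils down to something like this sketch: drive a legacy command-line tool from Python and pull its output back in for further work. The stand-in "legacy tool" here is just echo; it's purely illustrative, not anything Eric showed.

```python
# Illustrative only: wrapping a legacy command-line tool in Python,
# so its output can be post-processed with the scientific Python stack.
import subprocess

def run_legacy_tool(args):
    # Run an external program and return its stdout as text.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Pretend "echo" is a legacy numerical code printing results to stdout.
values = run_legacy_tool(["echo", "1.5 2.5 3.5"]).split()
total = sum(float(v) for v in values)  # continue the analysis in Python
```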
After lunch, BP's Richard Clarke told us about the history and future of FreeUSP and FreeDDS, a powerful processing system. FreeDDS is being actively developed and released gradually by BP; indeed, a new release is due in the next few days. It will eventually replace FreeUSP. Richard and others also mentioned that Randy Selzler is actively developing PSeis, the next generation of this processing system (and he's looking for sponsors!).
German Garabito of the Federal University of Pará, Brazil, generated a lot of interest in BotoSeis, the GUI he has developed to help him teach SU. It allows one to build and manage processing flows visually, in a Java-built interface inspired by Focus, ProMax, and other proprietary tools. The software is named after the Amazon river dolphin, or boto. Dave Hale described German's work as the perfect example of the triumph of 'scratching your own itch'.
Continuing the usability theme, Karl Schleicher followed up with a nice look at how he is building scripts to pull field data from the USGS online repository, and perform SU and Madagascar processing flows on them. He hopes he can build a library of such scripts as part of Sergey Fomel's reproducible geophysics efforts.
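The kind of script Karl described might look something like this sketch. The file names and parameter values are made up, and this only assembles the command string — it doesn't need SU installed — but segyread, sugain, and sufilter are real SU programs that chain together exactly this way.

```python
# A hedged sketch of a scripted SU flow over fetched field data:
# SEG-Y read -> gain correction -> band-pass filter. File names and
# parameters here are hypothetical placeholders.
import shlex

def su_flow(segy_file, out_file):
    steps = [
        f"segyread tape={shlex.quote(segy_file)}",
        "sugain tpow=2.0",          # spherical-divergence-style gain
        "sufilter f=5,10,60,80",    # trapezoidal band-pass (Hz)
    ]
    return " | ".join(steps) + f" > {shlex.quote(out_file)}"

cmd = su_flow("line01.sgy", "line01_filtered.su")
```

A library of such scripts, with the data-fetching step included, is what makes a flow reproducible: anyone can rerun it end to end from the public data.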
Finally, Bill Menger of Global Geophysical told the group a bit about two projects he open sourced when he was at ConocoPhillips: GeoCraft and CPSeis. His insight on what was required to get them into the open was worth sharing:
- Get permission, using a standard open source license (and don't let lawyers change it!)
- Communicate the return on investment carefully: testing, bug reporting, goodwill, leverage, etc.
- Know what you want to get out of it, and have a plan for how to get there
- Pick a platform: compiler, dependencies, queueing, etc. (unless you have a lot of time for support!)
- Know the issues: helping users, dealing with legacy code, dependency changes, etc.
I am looking forward to another awesome-packed day tomorrow. My own talk is the wafer-thin mint at the end!
You can read all about Day 2 of this workshop in this blog post.