News
Tuesday
Sep092014

The road to Modelr: my EuroSciPy poster

At EuroSciPy recently, I gave a poster-ized version of the talk I did at SciPy. Unlike most of the other presentations at EuroSciPy, my poster didn't cover a lot of the science (which is well understood), or the code (which is esoteric).

Instead it focused on the advantages of spreading software via web applications, rather than only via source code, and on the challenges that we overcame — well, that we're still overcoming — to get our Modelr tool out there. I wanted other programmer-scientists to think about running some of their code as a web app for others to enjoy, but to be aware of the effort involved in doing this.

I've written before about my dislike of posters, though I'm told they are an important component at, say, the AGU Fall Meeting. I admit I do quite like the process of making them, and — on advice from Colin Purrington's useful page — I left a space on the poster for people to write comments or leave sticky notes. As a result, I heard about Docker, a lead I'll certainly follow up,

What's new in modelr

This wasn't part of the poster, but I might as well take the chance to let you know what we've updated recently:

  • You can now add noise to models by specifying the signal:noise.
  • Instead of automatic scaling, you can choose your own gain.
  • The app now returns the elastic moduli of the rocks in the model.
  • You can choose a spatial cross-section view or a space–offset–frequency view.

All of these features are now available to subscribers for only $9/month. Amazing value :)

Figshare

I've stored my poster on Figshare, a data storage site and part of Macmillan's Digital Science effort. What I love about Figshare, apart from the convenience of cloud-based storage and easy access for others, is that every item gets a digital object identifier or DOI. You've probably seen these on journal articles. They're a bit like other persistent and unique IDs for publications, such as ISBNs for books, but the idea is to provide more interactivity by making it easily linkable: you can get to any object with a DOI by prepending it with "http://dx.doi.org/".

Reference

Hall, M (2014). The road to modelr: building a commercial web app on an open source foundation. EuroSciPy, Cambridge, UK, August 29–30, 2014. Poster presentation. DOI:10.6084/m9.figshare.1151653

Thursday
Sep042014

Julia in a nutshell

Julia is the most talked-about language in the scientific Python community. Well, OK, maybe second to Python... but only just. I noticed this at SciPy in July, and again at EuroSciPy last weekend.

As promised, here's my attempt to explain why scientists are so excited about it.

Why is everyone so interested in Julia?

At some high level, Julia seems to solve what Steven Johnson (MIT) described at EuroSciPy on Friday as 'the two-language problem'. It's also known as Outerhout's dichotomy. Basically, there are system languages (hard to use, fast), and scripting languages (easy to use, slow). Attempts to get the best of boths worlds have tended to result in a bit of a mess. Until Julia.

Really though, why?

Cheap speed. Computer scientists adore C because it's rigorous and fast. Scientists and web developers worship Python because it's forgiving and usually fast enough. But the trade-off has led to various cunning ploys to get the best of both worlds, e.g. PyPy and Cython. Julia is perhaps the cunningest ploy of all, achieving speeds that compare with C, but with readable code, dynamic typing, garbage collection, multiple dispatch, and some really cool tricks like Unicode variable names that you enter in pure LaTeX. And check out this function definition shorthand:

Why is Julia so fast?

Machines don't understand programming languages — the code written by humans has to be translated into machine language in a process called 'compiling'. There are three approaches:

Compiling makes languages fast, because the executed code is tuned to the task (e.g. in terms of the types of variables it handles), and to the hardware it's running on. Indeed, it's only by building special code for, say, integers, that compiled languages achieve the speeds they do.

Julia is compiled, like C or Fortran, so it's fast. However, unlike C and Fortran, which are compiled before execution, Julia is compiled at runtime ('just in time' for execution). So it looks a little like an interpreted language: you can write a script, hit 'run' and it just works, just like you can with Python.

You can even see what the generated machine code looks like:

Don't worry, I can't read it either.

But how is it still dynamically typed?

Because the compiler can only build machine code for specific types — integers, floats, and so on — most compiled languages have static typing. The upshot of this is that the programmer has to declare the type of each variable, making the code rather brittle. Compared to dynamically typed languages like Python, in which any variable can be any type at any time, this makes coding a bit... tricky. (A computer scientist might say it's supposed to be tricky — you should know the type of everything — but we're just trying to get some science done.)

So how does Julia cope with dynamic typing and still compile everything before execution? This is the clever bit: Julia scans the instructions and compiles for the types it finds — a process called type inference — then makes the bytecode, and caches it. If you then call the same instructions but with a different type, Julia recompiles for that type, and caches the new bytecode in another location. Subsequent runs use the appropriate bytecode, with recompilation.

Metaprogramming

It gets better. By employing metaprogramming — on-the-fly code generation for special tasks — it's possible for Julia to be even faster than highly optimized Fortran code (right), in which metaprogramming is unpleasantly difficult. So, for example, in Fortran one might tolerate a relatively slow loop that can only be avoided with code generation tricks; in Julia the faster route is much easier. Here's Steven's example.

Interoperability and parallelism

It gets even better. Julia has been built with interoperability in mind, so calling C — or Python — from inside Julia is easy. Projects like Jupyter will only push this further, and I expect Julia to soon be the friendiest way to speed up that stubborn inner NumPy loop. And I'm told a lot of thoughtful design has gone into Julia's parallel processing libraries... I have never found an easy way into that world, so I hope it's true.


I'm not even close to being able to describe all the other cool things Julia, which is still a young language, can do. Much of it will only be of interest to 'real' programmers. In many ways, Julia seems to be 'Python for C programmers'.

If you're really interested, read Steven's slides and especially his notebooks. Better yet, just install Julia and IJulia, and play around. Here's another tutorial and a cheatsheet to get you started.

Tuesday
Sep022014

Highlights from EuroSciPy

In July, Agile reported from SciPy in Austin, Texas, one of several annual conferences for people writing scientific software in the Python programming language. I liked it so much I was hungry for more, so at the end of my recent trip to Europe I traveled to the city of Cambridge, UK, to participate in EuroSciPy.

The conference was quite a bit smaller than its US parent, but still offered 2 days of tutorials, 2 days of tech talks, and a day of sprints. It all took place in the impressive William Gates Building, just west of the beautiful late Medieval city centre, and just east of Schlumberger's cool-looking research centre. What follows are my highlights...

Okay you win, Julia

Steven Johnson, an applied mathematician at MIT, gave the keynote on the first morning. His focus was Julia, the current darling of the scientific computing community, and part of a new ecosystem of languages that seek to cooperate, not compete. I'd been sort of ignoring Julia, in the hope that it might go away and let me focus on Python, the world's most useful language, and JavaScript, the world's most useful pidgin... but I don't think scientists can ignore Julia much longer.

I started writing about what makes Julia so interesting, but it turned into another post — up next. Spoiler: it's speed. [Edit: Here is that post! Julia in a nutshell.]

Learning from astrophysics

The Astropy project is a truly inspiring community — in just 2 years it has synthesized a dozen or so disparate astronomy libraries into an increasingly coherent and robust toolbox for astronomers and atrophysicists. What does this mean?

  • The software is well-tested and reliable.
  • Datatypes and coordinate systems are rich and consistent.
  • Documentation is useful and evenly distributed.
  • There is a tangible project to rally developers and coordinate funding.

Geophysicists might even be interested in some of the components of Astropy and the related SunPy project, for example:

  • astropy.units, just part of the ever-growing astropy library, as a unit conversion and quantity handler to compare with pint.
  • sunpy datatypes map and spectra for types of data that need special methods.
  • asv is a code-agnostic benchmarking library, a bit like freebench.

Speed dating for scientists

Much of my work is about connecting geoscientists in meaningful collaboration. There are several ways to achieve this, other than through project work: unsessions, wikis, hackathons, and so on. Now there's another way: speed dating.

Okay, it doesn't quite get to the collaboration stage, but Vaggi and his co-authors shared an ingenious way to connect people and give their professional relationship the best chance of success (an amazing insight, a new algorithm, or some software). They asked everyone at a small 40-person conference to complete a questionnaire that asked, among other things, what they knew about, who they knew, and, crucially, what they wanted to know about. Then they applied graph theory to find the most desired new connections (the matrix shows the degree of similarity of interests, red is high), and gave the scientists five 10-minute 'dates' with scientists whose interests overlapped with theirs, and five more with scientists who knew about fields that were new to them. Brilliant! We have to try this at SEG.

Vaggi, F, T Schiavinotto, A Csikasz-Nagy, and R Carazo-Salas (2014). Mixing scientists at conferences using speed dating. Poster presentation at EuroSciPy, Cambridge, UK, August 2014. Code on GitHub.

Vaggi, F, T Schiavinotto, J Lawson, A Chessel, J Dodgson, M Geymonat, M Sato, R Carazo Salas, A Csikasz Nagy (2014). A network approach to mixing delegates at meetings. eLife, 3. DOI: 10.7554/eLife.02273

Other highlights

  • sumatra to generate and keep track of simulations.
  • vispy, an OpenGL-based visualization library, now has higher-level, more Pythonic components.
  • Ian Osvald's IPython add-on for RAM usage.
  • imageio for lightweight I/O of image files.
  • nbagg backend for matplotlib version 1.4, bringin native (non-JS) interactivity.
  • An on-the-fly kernel chooser in upcoming IPython 3 (currently in dev).

All in all, the technical program was a great couple of days, filled with the usual note-taking and hand-shaking. I had some good conversations around my poster on modelr. I got a quick tour of the University of Cambridge geophysics department (thanks to @lizzieday), which made me a little nostalgic for British academic institutions. A fun week!

Thursday
Aug282014

Burrowing by burning

Most kind of mining are low-yield games. For example, the world's annual gold production would fit in a 55 m2 room. But few mining operations I'm aware of are as low yield as the one that ran in Melle, France, from about 500 till 950 CE, producing silver for the Carolingian empire and Charlemagne's coins. I visited the site on Saturday.

The tour made it clear just how hard humans had to work to bring about commerce and industry in the Middle Ages. For a start, of course they had no machines, just picks and shovels. But the Middle Jurassic limestone is silicic and very hard, so to weaken the rock they set fires against the face and thermally shocked the rock to bits. The technique, called fire-setting, was common in the Middle Ages, and was described in detail by Georgius Agricola in his book De Re Metallica (right; aside: the best translation of this book is by Herbert Hoover!). Apart from being stupefyingly dangerous, the method is slow: each fire got the miners about 4 cm further into the earth. Incredibly, they excavated about 20 km of galleries this way, all within a few metres of the surface.

The fires were set against the walls and fuelled with wood, mostly beech. Recent experiments have found that one tonne of wood yielded about one tonne of rock. Since a tonne of rock yields 5 kg of galena, and this in turn yields 10 g of silver, we see that producing 1.1 tonnes of silver per year — enough for 640,000 deniers — was quite a feat!

There are several limits to such a resource intensive operation: wood, distance from face to works, maintenance, and willing human labour, not to mention the usual geological constraints. It is thought that, in the end, operations ended due to a shortage of wood.

Several archaeologists visit the site regularly (here's one geospatial paper I found mentioning the site: Arles et al. 2013), and the evidence of their attempts to reproduce the primitive metallurgical methods were on display. Here's my attempt to label everything, based on what I could glean from the tour guide's rapid French:

The image of the denier coin is licensed CC-BY-SA by Wikipedia user Lequenne Gwendoline

Monday
Aug252014

The hack is back: An invitation to get creative

We're organizing another hackathon! It's free, and it's for everyone — not just programmers. So mark your calendar for the weekend of 25 and 26 October, sign up with a friend, and come to Denver for the most creative 48 hours you'll spend this year. Then stay for the annual geophysics fest that is the SEG Annual Meeting!

First things first: what is a hackathon? Don't worry, it's not illegal, and it has nothing to do with security. It has to do with ideas and collaborative tool creation. Here's a definition from Wikipedia:

A hackathon (also known as a hack day, hackfest, or codefest) is an event in which computer programmers and others involved in software development, including graphic designers, interface designers and project managers, collaborate intensively on software projects.

I would add that we just need a lot of scientists — you can bring your knowledge of workflows, attributes, wave theory, or rock physics. We need all of that.

Creativity in geophysics

The best thing we can do with our skills — and to acquire new ones — is create things. And if we create things with and alongside others, we learn from them and they learn from us, and we make lasting connections with people. We saw all this last year, when we built several new geophysics apps:

The event is at the THRIVE coworking space in downtown Denver, less than 20 minutes' walk from the convention centre — a Manhattan distance of under 1 mile. They are opening up especially for us — so we'll have the place to ourselves. Just us, our laptops, high-speed WiFi, and lots of tacos. 

Sign up here. It's going to be awesome.

The best in the biz


This business is blessed with some forward-looking companies that know all about innovation in subsurface geoscience. We're thrilled to have some of them as sponsors of our event, and I hope they will also be providing coders and judges for the event itself. So far we have generous support from dGB — creators of the OpendTect seismic interpretation platform — and ffA — creators the GeoTeric seismic attribute analysis toolbox. A massive Thank You to them both.

If you think your organization might be up for supporting the event, please get in touch! And remember, a fantastic way to support the event — for free! — is just to come along and take part. Sign your team up here!

Student grants

We know there's a lot going on at SEG on this same weekend, and we know it's easier to get money for traditional things like courses. So... We promise that this hackathon will bring you at least as much lasting joy, insight, and skill development as any course. And, if you'll write and tell us what you'd build, we'll consider you for one of four special grants of $250 to help cover your extra costs. No strings. Send your ideas to matt@agilegeoscience.com.