Thursday, July 5, 2012

Fabric clusters

There are many reasons we might want to use cluster analysis in our work. Geologists might want to sort hundreds of rock samples into a handful of rock types, a petrophysicist might want to group data points from well logs (as shown here), or a curious kitchen dweller may want to digitally classify patterns found in his (or her) linen collection.

Two algorithms worth knowing about when faced with any clustering problem are k-means and fuzzy c-means clustering. They aren't the right solution for every clustering problem, but they are a good place to start.

k-means clustering: each data point is assigned to whichever of k centroids (or centres) it is closest to. In the image shown here, k is 2. The pink dots are closest to the left centroid, and the black dots are closest to the right centroid. Each centroid is then moved to the mean of the points assigned to it, and the assign-and-update steps repeat until the assignments stop changing. To see how the classification is done, watch this short step-by-step video. The main disadvantage of this method is that every point gets a single hard label, so if the clusters are poorly defined, the boundary between them, and hence the result, can seem rather arbitrary.
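If you want to try this yourself, here is a minimal NumPy sketch of the k-means loop. It assigns each point to its nearest centroid, moves each centroid to the mean of its points, and repeats until the centroids stop moving. This is not the code behind the figures in this post. The function name `kmeans`, the random choice of starting centroids, and the synthetic two-blob data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) for an (n_points, n_features) array.
    Illustrative sketch only; names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption;
    # smarter seeding such as k-means++ is common in practice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids have stopped moving
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Usage: two synthetic, well-separated blobs of 2-D points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
               rng.normal([3.0, 3.0], 0.3, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
```

With blobs this far apart, each blob ends up with its own label. Points scattered between poorly separated blobs would still receive a hard label, and that is exactly where the result starts to look arbitrary.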

Fuzzy c-means clustering: each data point belongs to every cluster, but only to a certain degree. Each point is given a membership value between 0 and 1 for each cluster. A point's memberships sum to 1, so they behave like probabilities, and the point can easily be assigned to the class for which its membership is highest. A data point lying between two clusters is still assigned to the closer one, but with a lower membership. As the bottom image shows, data points on the periphery of cluster groups, such as those shown in grey, may be almost equally likely to belong to both clusters. Fuzzy c-means clustering therefore provides a way of quantifying uncertainty, and even of visualizing it.
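Here is a comparable NumPy sketch of fuzzy c-means. It uses the common textbook update rules with a fuzzifier of m = 2, which is an assumption because the post does not specify one. The function name and the synthetic data are also illustrative. Instead of a single label per point, it returns a membership matrix `U` whose rows sum to 1. Taking the largest entry in each row gives a hard label, and rows with values near 0.5 flag the ambiguous, in-between points.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (textbook updates; m=2 is an assumed default).
    Returns memberships U (n_points, c) and centres (c, n_features)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centres are membership-weighted means of the data.
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Point-to-centre distances, floored to avoid division by zero.
        d = np.fmax(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), 1e-12)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centres

# Usage: two synthetic blobs of 2-D points plus one point midway between their centres.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
               rng.normal([3.0, 3.0], 0.3, size=(50, 2)),
               [[1.5, 1.5]]])
U, centres = fuzzy_cmeans(X, c=2)
hard_labels = U.argmax(axis=1)  # most likely class for each point
midway = U[-1]                  # memberships of the ambiguous midway point
```

The blob points get memberships close to 1 for their own cluster. The midway point splits its membership roughly evenly between the two. That near-even split is the quantitative uncertainty the grey dots in the figure represent.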

Some observations fall naturally into clusters. It is just a matter of the observer choosing an adequate combination of attributes to characterize them. In the fabric and seismic examples shown in the previous post, only two of the four Haralick textures are needed to show a diagnostic arrangement of the data for clustering. Does the distribution of these thumbnail sections in the attribute space align with your powers of visual inspection? 
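To make "attributes" concrete, here is a NumPy sketch of two Haralick-style texture measures computed from a grey-level co-occurrence matrix (GLCM). The post does not name the four textures used previously. The choice of contrast and homogeneity, the single one-pixel-to-the-right offset, the 8-level quantisation, and the function names are therefore assumptions for illustration. Libraries such as scikit-image provide fuller GLCM implementations.

```python
import numpy as np

def glcm(img, levels=8):
    """Symmetric, normalised GLCM for the 'one pixel to the right' offset
    (offset and number of levels are illustrative assumptions)."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    # Quantise to integer grey levels 0 .. levels-1 (a flat image maps to 0).
    if hi == lo:
        q = np.zeros(img.shape, dtype=int)
    else:
        q = np.rint((img - lo) / (hi - lo) * (levels - 1)).astype(int)
    # Count each (pixel, right-hand neighbour) grey-level pair.
    P = np.zeros((levels, levels))
    np.add.at(P, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    P = P + P.T  # count each pair in both directions
    return P / P.sum()

def contrast_homogeneity(img, levels=8):
    """Two Haralick-style texture attributes computed from the GLCM."""
    P = glcm(img, levels)
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)             # large when neighbours differ
    homogeneity = np.sum(P / (1.0 + (i - j) ** 2))  # large when neighbours are similar
    return contrast, homogeneity

# Usage: a fine vertical stripe pattern versus a smooth left-to-right ramp.
stripes = np.tile([0, 1], (16, 8))      # 16 x 16, columns alternate 0, 1
ramp = np.tile(np.arange(16), (16, 1))  # 16 x 16, values 0..15 along each row
c_s, h_s = contrast_homogeneity(stripes)
c_r, h_r = contrast_homogeneity(ramp)
```

Each image reduces to one (contrast, homogeneity) pair, which is a single point in a two-dimensional attribute space that either clustering method can work on. Here the stripes have a contrast of 49 against about 0.47 for the ramp, so the two patterns land far apart in that space.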


Reader Comments (3)

Thanks for posting the video link for the k-means clustering explanation. It is perhaps one of the best explanations I've seen of what happens in k-means clustering. One of the tools I work with does electrofacies prediction/classification and uses a type of k-means clustering. I often have to explain the method to clients, and it can be challenging to keep it simple.

July 26, 2012 | Unregistered Commenter Kim McLean

@Kim,
I am glad you liked the video. The man in the video is Sebastian, who, among other things, is famous for his educational courses at Udacity dot com, so those who know him will not be surprised. Why does this video work? Five reasons, I think: 1) use a specific and simple example, 2) don't skip any steps, 3) don't hide anything, 4) use plain language, and 5) show pictures. It occurs to me that this video is effective even with the sound turned off. That has got to be some kind of measure of awesomeness. The visual representation is unambiguous.
Have fun clustering.

July 26, 2012 | Registered Commenter Evan Bianco

You and Matt have a pretty awesome thing going with Agile, and I see you both as tremendous assets on the subject matter you discuss.

You hit the nail on the head as to why Sebastian's video works.

I deliver presentations on a fairly regular basis, and always hope I'm delivering a clear and concise message. Most of the time, I'm giving product demonstrations, but from time to time, I deliver workflow presentations which require a more in-depth conversation around the steps we're taking, and why they are important. It can be tricky not to complicate the message when you have to get into the details.

At any rate, I was happy to see the two posts on clustering, as I work with a tool called Facimage (part of Geolog), which does electrofacies classification and log property prediction through k-NN clustering along with MRGC (multi-resolution graph-based clustering, which is a mouthful). It is a tool that I think could offer users a tremendous amount of information about their wells, but understanding what the tool is doing, so that it is not just a 'black box', can be daunting for some clients.

Keep up the awesome posts. I look forward to more!

July 26, 2012 | Unregistered Commenter Kim McLean
