What is cluster analysis?

Cluster analysis is an exploratory data analysis tool for solving classification problems.  Its object is to sort cases (people, things, events, etc) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. The actual math is a loose collection of methods designed to cluster or group together objects based on similarities or dissimilarities (or variances between the objects described by their parameters).

The great thing about cluster analysis is that this can be done without any preconceived notions of what those groups are or how many there might be. This is a good thing if you want to remove any investigator bias in determining how heterogeneous your population of objects may be.

This means that cluster analysis is most useful in testing the null hypothesis that the entire group of objects that you have is homogeneous. If it turns out that you have more than one group, then you can perform additional analysis on your data that can reveal relationships between variables that weren't discernible with all the objects clumped together in one group.

12 cartoon faces to classify, summarised in the next table.

This is an example picture I got from http://obelia.jde.aca.mmu.ac.uk/multivar/ca.htm that gives a good general description of cluster analysis.

It demonstrates that we have an innate ability to group objects together based on variables describing each object. In this case variables can include: sex, presence of glasses, moustache, smile etc. Unsupervised learning in neural networks works by mimicking this ability to recognize features and patterns in data sets.

Back to main page