Making a Dendrogram

Hierarchical cluster analysis (as we've been doing here) can be portrayed graphically by a dendrogram, which represents the clustering process as a tree-like graph.

One axis (usually) represents an agglomeration coefficient. Its definition depends on the clustering algorithm used, but it is usually the distance between the clusters joined at each stage. The individual cases are plotted along the other axis, which gives a visual sense of the relative size of each cluster.

Here's the dendrogram created when clustering the data using Ward's method (squared Euclidean distance, variables normalized using z-scores), together with the agglomeration schedule behind it:

Stage   Distance Between Cluster Centers   Total SSE at Each Stage
1       0.5576                             0.278802
2       0.91788                            0.737743
3       1.09913                            1.287309
4       1.14743                            1.861023
5       1.15104                            2.436542
6       1.23339                            3.053236
7       1.56906                            3.837767
8       1.73841                            4.70697
9       2.06592                            5.739929
10      2.12478                            6.802316
11      2.52013                            8.06238
12      2.96935                            9.547052
13      3.08863                            11.09137
14      3.56697                            12.87485
15      4.14933                            14.94952
16      5.39885                            17.64894
17      5.4883                             20.39309
18      5.5328                             23.15949
19      6.30109                            26.31003
20      6.38562                            29.50284
21      7.20355                            33.10462
22      7.31179                            36.76051
23      9.05223                            41.28663
24      18.0712                            50.32223
25      18.90347                           59.77396
26      23.43379                           71.49086
27      38.75994                           90.87082
28      101.16                             141.4508
29      123.0984                           203
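To see where the two columns in a schedule like this can come from, here's a minimal sketch of Ward's method in plain Python. It is not the software that produced the table above: the data are a made-up toy set, the function names are hypothetical, and it assumes z-scores use the sample standard deviation. At each stage it joins the pair of clusters whose merger adds the least to the total within-cluster SSE, then reports twice that increase alongside the running total.

```python
from itertools import combinations
from statistics import mean, stdev

def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def centroid(points):
    """Mean of each variable across the cases in one cluster."""
    return [mean(dim) for dim in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_schedule(rows):
    """Return [(stage, 2 * increase in SSE, total SSE)] for Ward's method."""
    clusters = [[p] for p in zscore_columns(rows)]  # every case starts alone
    total_sse, stage, schedule = 0.0, 0, []
    while len(clusters) > 1:
        # Find the pair whose merger increases within-cluster SSE the least.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            delta = (len(a) * len(b) / (len(a) + len(b))
                     * sq_dist(centroid(a), centroid(b)))
            if best is None or delta < best[0]:
                best = (delta, i, j)
        delta, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        total_sse += delta
        stage += 1
        schedule.append((stage, 2 * delta, total_sse))
    return schedule

# Hypothetical toy data: 6 cases measured on 2 variables.
toy = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [9.0, 1.0], [8.5, 1.5]]
schedule = ward_schedule(toy)
for stage, coeff, sse in schedule:
    print(f"{stage:>5} {coeff:>10.5f} {sse:>10.5f}")
```

When the first stage joins two single cases, twice the increase in SSE works out to the squared Euclidean distance between them, which is one way to read the doubled coefficient.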

The horizontal axis of this graph shows the distance between cluster centers (centroids). I'm not quite sure why this distance is used for Ward's method rather than total SSE, but it equals twice the increase in total SSE at each stage of clustering. For instance, at the last stage (stage 29) the coefficient (or total SSE) is 203.000. At the previous stage (stage 28) it is 141.451.

So the increase in total SSE is:

SSE_29 - SSE_28 = ΔSSE_(28→29)

203.000 - 141.451 = 61.549

This has to be multiplied by 2, for reasons that are explained here.

61.549 * 2 = 123.098

This, as you'll see, is where stage 29 is plotted on the horizontal axis.
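The same arithmetic can be checked for every stage, not just the last one. The sketch below copies both columns from the table above, recomputes the distance as twice the stage-to-stage increase in total SSE, and reports the largest gap from the tabulated value. It assumes the total SSE is 0 before the first merge (every case is its own cluster); the variable names are only illustrative.

```python
# Total SSE column from the table above, stages 1 to 29.
total_sse = [
    0.278802, 0.737743, 1.287309, 1.861023, 2.436542, 3.053236,
    3.837767, 4.70697, 5.739929, 6.802316, 8.06238, 9.547052,
    11.09137, 12.87485, 14.94952, 17.64894, 20.39309, 23.15949,
    26.31003, 29.50284, 33.10462, 36.76051, 41.28663, 50.32223,
    59.77396, 71.49086, 90.87082, 141.4508, 203,
]

# Distance-between-cluster-centers column from the same table.
reported = [
    0.5576, 0.91788, 1.09913, 1.14743, 1.15104, 1.23339,
    1.56906, 1.73841, 2.06592, 2.12478, 2.52013, 2.96935,
    3.08863, 3.56697, 4.14933, 5.39885, 5.4883, 5.5328,
    6.30109, 6.38562, 7.20355, 7.31179, 9.05223, 18.0712,
    18.90347, 23.43379, 38.75994, 101.16, 123.0984,
]

# Assumption: SSE is 0 before any clusters have been joined.
previous = [0.0] + total_sse[:-1]

# Distance at each stage = 2 * (increase in total SSE at that stage).
recomputed = [2 * (now - before) for now, before in zip(total_sse, previous)]

for stage, (calc, rep) in enumerate(zip(recomputed, reported), start=1):
    print(f"stage {stage:>2}: 2 x dSSE = {calc:10.5f}   table = {rep:10.5f}")

# Largest disagreement; anything tiny here is rounding in the printed table.
max_gap = max(abs(calc - rep) for calc, rep in zip(recomputed, reported))
print(f"largest gap: {max_gap:.6f}")
```

Any small gap that remains comes from the rounding of the values printed in the table.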

The next step is to determine the number of clusters you want to work with.

How many clusters are there?