Visualization Strategies: Hierarchical Data

22 Dec 2008

One of the most challenging types of data to convert into a chart or visualization is also one of the most common: Multi-Level or 'Hierarchical' data.

Perhaps every category of data is composed of sub-categories, or a change in one data point has a major effect on surrounding data. Either way, the standard library of charts and graphs doesn't offer much in the way of making hierarchical data clear, so here are a few alternatives:

Nested Categories:

Treemaps:

Recently "Treemap" style charts have come out of their origins in academia into common use. A personal favorite of mine, they allow aggregate categories to show through without losing the smaller constituent data.

Each category is sized according to what percent of the total it takes up, and child categories can be placed inside parents in a similar manner. Interactive versions often allow for 'drilling down' deeper into the data by clicking on a category to see its members full screen.
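For the curious, here is a minimal sketch of that sizing rule in Python. It uses the simple "slice-and-dice" layout (alternating horizontal and vertical splits at each level), which is only one of several treemap layouts and not necessarily what any of the tools mentioned here use. The Node class, the layout function, and the sample data are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One category: leaves carry a value, parents carry children."""
    name: str
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A parent's size is the sum of the leaf values beneath it.
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns {name: (x, y, w, h)} for every node."""
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent already handed out
    for child in node.children:
        share = child.total() / total  # child's percent of its parent
        if depth % 2 == 0:
            # Even depth: split the parent's width among the children.
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: split the parent's height instead.
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Made-up sample data: two top-level categories, one with children.
tree = Node("All", children=[
    Node("Fruit", children=[Node("Apples", 30), Node("Pears", 10)]),
    Node("Vegetables", 60),
])
rects = layout(tree, 0, 0, 100, 100)
```

With this sample data, Fruit takes the left 40% of the canvas and Vegetables the remaining 60%, while Apples and Pears split the Fruit rectangle three to one.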

Treemaps seem to work best when the total number of categories at each level is fairly small (otherwise the hundreds or thousands of categories become tiny, undifferentiated squares), and each item fits neatly into a single sub-category.

As usual, Many Eyes leads the pack with excellent treemaps, and even offers an unusual treemap over time. Other solutions include a free (non-commercial) desktop program, Java library, jQuery plugin, Flex component, and Excel add-in from Microsoft.

Multi-Level Pie:

If the categories are a little more concise than those in a treemap, or deeper data is required at a glance, multi-level pie charts may be the solution. As the concentric rings go 'out', each item is sized with respect to its contribution to the inner parent category, allowing deep hierarchies to be understood quickly.
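As a rough sketch of how that sizing works, the Python below hands each child a slice of its parent's arc, one ring further out. The nested-dictionary input format and the function names are my own assumptions for illustration, not the approach of any particular charting package.

```python
def total(item):
    """Sum of the leaf values under a number or a nested dict."""
    if isinstance(item, dict):
        return sum(total(child) for child in item.values())
    return item


def ring_segments(children, start=0.0, sweep=360.0, ring=1, out=None):
    """Return (ring, name, start_angle, end_angle) in degrees for every category."""
    if out is None:
        out = []
    whole = total(children)
    angle = start
    for name, item in children.items():
        # Each item gets a share of the parent's arc equal to its share of the parent's total.
        span = sweep * total(item) / whole if whole else 0.0
        out.append((ring, name, angle, angle + span))
        if isinstance(item, dict):
            # Sub-categories divide exactly the parent's arc, one ring further out.
            ring_segments(item, angle, span, ring + 1, out)
        angle += span
    return out


# Made-up sample data: numbers are leaf values, dicts are parent categories.
data = {"Fruit": {"Apples": 30, "Pears": 10}, "Vegetables": 60}
segments = ring_segments(data)
```

Here Fruit covers 40% of the inner ring, and Apples and Pears sit directly outside it, dividing that same arc between them.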

Neoformix has a great illustrated example of building a multi-level pie chart, and FusionCharts offers the best Flash version that I've used in their PowerCharts package. HDGraph applies the concept to hard-drive space use, historically a big driver of multi-level visualization.

Bubble Diagrams:

Bubble diagrams, which are generally used to show multi-dimensional data (x, y, size, color) instead of hierarchical data, can be adapted easily, either by coloring all members of a similar group the same (for example, Gapminder assigning members of each continent the same color), or by nesting the bubbles within one another to create an effect somewhat similar to a treemap.

Many Eyes offers a fairly simple version, mainly for highlighting relative sizes, as does Microsoft Excel, while Gapminder offers more complex abilities. I've yet to see anyone take on the nested-bubble approach, but feel free to contact me if you know of a solution.

Organizational Hierarchies:

Vertical Hierarchy:

Most commonly seen in the 'organization chart', vertical hierarchies are common for showing chains of membership or authority, but rarely have an added dimension to show actual numeric values.

The best examples use pictures effectively, and tend to represent data with each level's members being of similar importance. The JIT toolkit offers a JavaScript-based approach to this, as do many others.

Horizontal Hierarchy:

In contrast to vertical hierarchy, these charts normally serve to hide large amounts of data that may be useless to a user, and allow fast drilling very deep into a hierarchy. Horizontal charts work best when the data is very deep, but each category only has a few subcategories, so the user can drill down very quickly.

My favorite horizontal hierarchy example comes from the SpaceTree at the JIT.

Relational Data:

Sometimes data in hierarchies takes a more abstract form, where each element influences one or more elements, or just overlaps multiple categories. These types of data sets tend to provide an additional challenge to visualize, since laying them out optimally involves a bit of heavy-lifting mathematically.

Node-Link Diagrams:

Coming from graph theory, a branch of mathematics, these diagrams offer different takes on data that is highly interrelated: one element may link to hundreds (or thousands) of additional data points, or none at all.

Node-Link charts (also called Network Diagrams) are best employed when data is related, but not necessarily in a clear hierarchy. Different dimensions of data can be shown by the size of each node, color, or even position.

[Figure: A network diagram projected in 3D to highlight hierarchies]

Some toolkits even allow the links to be different lengths, thicknesses, or colors to show an additional dimension of how two points are related.
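To make the idea concrete, here is a small Python sketch of the data behind a node-link diagram: nodes with their own attributes, links with a weight, and a helper that turns a link weight into a line thickness. The field names, the people, and the pixel range are all invented for illustration; real toolkits each have their own input formats.

```python
from collections import Counter

# Hypothetical data: each node has attributes that could drive size or color.
nodes = {
    "Ann": {"group": "Sales"},
    "Bob": {"group": "Sales"},
    "Cy": {"group": "Support"},
    "Dee": {"group": "Support"},  # linked to nobody, which is perfectly legal
}

# Each link is (source, target, weight); weight is the extra dimension.
links = [("Ann", "Bob", 10), ("Ann", "Cy", 2)]


def degrees(nodes, links):
    """Count how many links touch each node (a common choice for node size)."""
    counts = Counter()
    for source, target, _weight in links:
        counts[source] += 1
        counts[target] += 1
    return {name: counts[name] for name in nodes}


def stroke_width(weight, max_weight, min_px=1.0, max_px=6.0):
    """Linearly scale a link weight in (0, max_weight] to a line thickness in pixels."""
    return min_px + (max_px - min_px) * weight / max_weight


node_sizes = degrees(nodes, links)
heaviest = max(weight for _s, _t, weight in links)
widths = [stroke_width(weight, heaviest) for _s, _t, weight in links]
```

The same pattern works for link length or color: pick one numeric attribute of the relationship and map it onto one visual property.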

A huge number of solutions exist for different variations on network diagrams. A few selections include efforts from Constellation, FusionCharts' PowerCharts, Many Eyes, JIT, and JSViz.

JIT also offers an innovative hyperbolic network graph, allowing connections that are further down the chain to be hidden from the user.

Traditional Charts:

Stacked Bar:

Probably the simplest and most familiar chart type on the list. At a glance this chart only allows two 'levels' of depth: the parent category (the bar) and each component's percentage of that parent (the inner bar). Interactive versions, however, allow the user to click on a bar or bar component and 'zoom' to see the categories that make up that bar.

This chart is excellent for comparing simple, broad categories with few constituents, and benefits from general familiarity, but fails to impart more than two levels of depth at a glance, and it's often non-obvious that user interaction is possible.

Most common charting packages offer stacked bar charts, including Excel, and Flash packages such as Open Flash Chart, FusionCharts, and AnyChart.

Stacked Shapes/Area:

Similar to stacked bar charts, many charting packages offer stacked Area or Shape (Pyramid, Cylinder, etc.) charts.

Although these can be useful (particularly the pyramid chart, when one or two options in each 'bar' take up an overwhelming percent of the total, but the lower percents must still be visible), they suffer from the pitfalls of stacked bar, plus the added difficulty of resolving the relative sizes of 3D objects. Oftentimes relative proportions that are intuitive in rectangular dimensions ("This box is twice the area of that box") are lost in other shapes or in 3D ("How much more volume is in the bottom half of the pyramid than the top half?").
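The pyramid question does have an answer, and it shows how unintuitive the proportions get. Assuming a shape that tapers evenly to a point and is cut at half its height, the top piece is a half-scale copy of the whole, so it holds (1/2)^2 of the area of a flat triangle or (1/2)^3 of the volume of a true 3D pyramid or cone. The short Python sketch below works that out; the function name is just an illustrative one.

```python
def bottom_to_top_ratio(dimensions):
    """How many times larger the bottom half is than the top half.

    Assumes a shape that tapers linearly to a point, cut at half height:
    dimensions=2 for a flat triangle, dimensions=3 for a pyramid or cone.
    """
    top = 0.5 ** dimensions   # the top piece is a half-scale copy of the whole
    bottom = 1.0 - top        # everything else sits below the cut
    return bottom / top


triangle_ratio = bottom_to_top_ratio(2)  # flat "pyramid" chart
pyramid_ratio = bottom_to_top_ratio(3)   # true 3D pyramid
```

So the bottom half of a flat triangle holds three times the area of the top half, and the bottom half of a 3D pyramid holds seven times the volume, while the two halves of a plain bar are equal.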

Again, these are supported by most common spreadsheet and graph libraries, but beware of obscuring data just to add eye-candy.

Common Challenges & Future Developments

One of the largest problems with presenting hierarchical data is size: once the members of a tree grow beyond a few dozen, putting them all on screen at once can be overwhelming, but knowing which elements to hide can be a challenge as well. This makes effective use of color, as well as choosing the right visualization for the job, extremely important.
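One simple way to decide what to hide, sketched below in Python, is to keep the largest few members of each level and roll the rest into a single bucket. This is only one possible strategy, and the function name, the "Other" label, and the cutoff are all assumptions made for the example.

```python
def collapse_small(values, max_items=5, other_label="Other"):
    """Keep the largest max_items - 1 categories and sum the rest into one bucket.

    values maps category name to a number. max_items should be at least 1, and
    the sketch assumes no category is already named other_label.
    """
    if len(values) <= max_items:
        return dict(values)  # already small enough, nothing to hide
    ranked = sorted(values.items(), key=lambda pair: pair[1], reverse=True)
    kept = dict(ranked[:max_items - 1])
    kept[other_label] = sum(value for _name, value in ranked[max_items - 1:])
    return kept


# Made-up sample data: five categories squeezed into three slots.
sales = {"A": 50, "B": 30, "C": 10, "D": 6, "E": 4}
summary = collapse_small(sales, max_items=3)
```

With the sample data, A and B survive and C, D, and E are merged into a single "Other" entry worth 20. An interactive chart could then let the user click that bucket to see what was folded into it.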

Hopefully in the future we'll see more development of toolkits and frameworks for quick deployment of interactive charts, and better support for deep hierarchical data. Thanks to strides taken by a few of the JavaScript toolkits, Flex/Flare, and IBM's Many Eyes, most of the solutions listed above are becoming accessible to the average developer, although custom deployments still abound.

With the uptick in machine-readable statistics being made available at corporate, as well as national and international levels, we're sure to see a rise in large, deep data sets in the future, and the corresponding need to understand that data intuitively.

TimShowers.com