Visualization Strategies: Hierarchical Data
22 Dec 2008
Perhaps every category in your data is made up of sub-categories, or a change in one data point has a major effect on the data around it. Either way, the standard library of charts and graphs doesn't do much to make hierarchical data clear, so here are a few alternatives:
Recently, "treemap"-style charts have moved out of their origins in academia and into common use. A personal favorite of mine, they let aggregate categories show through without losing the smaller data that make them up.
Each category is sized according to its share of the total, and child categories are nested inside their parents in the same way. Interactive versions often allow 'drilling down' into the data by clicking a category to see its members full screen.
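To make the sizing rule concrete, here is a minimal sketch of the simplest treemap layout, usually called "slice-and-dice": each node's rectangle is split among its children in proportion to their totals, alternating between columns and rows at each level. The `Node` class and function name are hypothetical, and many real tools use "squarified" layouts instead, which avoid long, thin rectangles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a value, parents carry children."""
    name: str
    value: float = 0.0
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        # A parent's size is the sum of everything beneath it.
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: list | None = None) -> list:
    """Return (name, depth, x, y, width, height) for the node and all descendants."""
    if out is None:
        out = []
    out.append((node.name, depth, x, y, w, h))
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent's rectangle already handed out
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            # Even depths divide the parent into side-by-side columns.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depths divide it into stacked rows.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


if __name__ == "__main__":
    root = Node("root", children=[
        Node("A", children=[Node("a1", 30), Node("a2", 10)]),
        Node("B", 60),
    ])
    for rect in slice_and_dice(root, 0, 0, 100, 100):
        print(rect)
```

In this example "A" gets the left 40% of the square and "B" the right 60%; inside "A", "a1" takes the top three quarters because it holds 30 of A's 40 units.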
Treemaps seem to work best when the total number of categories at each level is fairly small (otherwise the hundreds or thousands of categories become tiny, undifferentiated squares), and each item fits neatly into a single sub-category.
As usual, Many Eyes leads the pack with excellent treemaps, and it even offers an unusual treemap over time. Other options include a free (non-commercial) desktop program, a Java library, a jQuery plugin, a Flex component, and an Excel add-in from Microsoft.
If the categories are a little more concise than those in a treemap, or deeper data must be readable at a glance, multi-level pie charts may be the solution. Moving outward through the concentric rings, each item is sized according to its contribution to the parent category in the ring inside it, so deep hierarchies can be understood at a glance.
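The ring sizing rule can be sketched in a few lines, assuming a hypothetical tree of `(name, number)` leaves and `(name, [children])` parents. Each child's arc is the parent's arc multiplied by the child's share of the parent, so the arcs on any ring always add up to the arc of their parent in the ring inside it.

```python
import math


def subtotal(node):
    """node is (name, number) for a leaf or (name, [children]) for a parent."""
    _, payload = node
    if isinstance(payload, (int, float)):
        return payload
    return sum(subtotal(child) for child in payload)


def sunburst_arcs(node, start=0.0, span=2 * math.pi, ring=0, out=None):
    """Return (name, ring, start_angle, end_angle) in radians for every node.

    Ring 0 is the centre disc; each ring outward splits its parent's arc
    in proportion to the children's subtotals."""
    if out is None:
        out = []
    name, payload = node
    out.append((name, ring, start, start + span))
    if isinstance(payload, (int, float)):
        return out
    parent_total = subtotal(node)
    angle = start
    for child in payload:
        # A child's arc is its share of the parent's arc, not of the whole circle.
        child_span = span * subtotal(child) / parent_total if parent_total else 0.0
        sunburst_arcs(child, angle, child_span, ring + 1, out)
        angle += child_span
    return out


if __name__ == "__main__":
    tree = ("root", [("A", [("a1", 30), ("a2", 10)]), ("B", 60)])
    for arc in sunburst_arcs(tree):
        print(arc)
```

Here "a1" ends up with 30% of the full circle: 75% of its parent "A", which itself covers 40%.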
Neoformix has a great illustrated example of building a multi-level pie chart, and Fusion Charts offers the best Flash version I've used, in its PowerCharts package. HDGraph applies the concept to hard-drive space usage, historically a big driver of multi-level visualization.
Bubble diagrams, generally used to show multi-dimensional data (x, y, size, color) rather than hierarchical data, can be adapted easily: either color all members of a group the same (Gapminder, for example, gives every country on a continent the same color), or nest bubbles within one another for an effect somewhat like a treemap.
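The coloring approach mostly comes down to how each bubble is drawn. The sketch below (a hypothetical function and an arbitrary palette, not Gapminder's code) gives every member of a group the same color and scales the radius by the square root of the value, so that bubble area, rather than width, tracks the number.

```python
import math

# Arbitrary example palette; colors are reused if there are more than five groups.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def bubble_specs(items, max_radius=50.0):
    """items: list of (label, group, value) with positive values.

    Returns {label: (radius, color)}: one shared color per group, and a
    radius chosen so that bubble area is proportional to the value."""
    biggest = max(value for _, _, value in items)
    group_colors = {}
    specs = {}
    for label, group, value in items:
        if group not in group_colors:
            # The first time a group appears, give it the next palette color.
            group_colors[group] = PALETTE[len(group_colors) % len(PALETTE)]
        # Square root: a value 4x larger gets a radius 2x larger, i.e. 4x the area.
        radius = max_radius * math.sqrt(value / biggest)
        specs[label] = (radius, group_colors[group])
    return specs


if __name__ == "__main__":
    print(bubble_specs([("a", "g1", 100), ("b", "g1", 25), ("c", "g2", 4)]))
```

Scaling the radius linearly instead would make a bubble with twice the value look four times as big, which is exactly the kind of distortion to avoid.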
Many Eyes offers a fairly simple version, mainly for highlighting relative sizes, as does Microsoft Excel, while Gapminder offers more complex abilities. I've yet to see anyone take on the nested-bubble approach, but feel free to contact me if you know of a solution.
Most familiar from the 'organization chart', vertical hierarchies are typically used to show chains of membership or authority, but they rarely add a dimension for actual numeric values.
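A basic vertical layout is easy to sketch: leaves take successive slots from left to right, each parent is centered above its children, and depth becomes the vertical position. This is a naive illustration that assumes node names are unique; it does not compact subtrees the way tidy-tree algorithms such as Reingold-Tilford do, and the function name and sample names are hypothetical.

```python
def layout_org_chart(root):
    """root is (name, [children]). Returns {name: (x, y)} with y = depth."""
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        name, children = node
        if not children:
            # Leaves take the next free slot from left to right.
            x = next_leaf_x
            next_leaf_x += 1
        else:
            child_xs = [place(child, depth + 1) for child in children]
            # A parent sits centered above its first and last child.
            x = (child_xs[0] + child_xs[-1]) / 2
        positions[name] = (x, depth)
        return x

    place(root, 0)
    return positions


if __name__ == "__main__":
    chart = ("CEO", [("CTO", [("Dev1", []), ("Dev2", [])]), ("CFO", [])])
    print(layout_org_chart(chart))
```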
Horizontal hierarchies, by contrast, normally serve to hide large amounts of data that may be useless to the user while allowing fast drilling deep into the hierarchy. They work best when the data is very deep but each category has only a few subcategories, so the user can drill down quickly.
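The "hide everything off the selected path" behavior can be sketched as a column browser: given the names a user has clicked so far, show one column per level and keep every other branch collapsed. The nested-dict tree and the `visible_columns` name are assumptions for illustration.

```python
def visible_columns(tree, path):
    """tree: nested dicts {name: subtree}; path: names selected so far, left to right.

    Returns one column of names per level along the selected path. Every
    branch off the path stays collapsed, which keeps deep trees manageable."""
    columns = [list(tree)]
    node = tree
    for name in path:
        node = node[name]
        if not node:  # a leaf: nothing further to expand
            break
        columns.append(list(node))
    return columns


if __name__ == "__main__":
    tree = {"Animals": {"Mammals": {"Cats": {}, "Dogs": {}}, "Birds": {}}, "Plants": {}}
    for column in visible_columns(tree, ["Animals", "Mammals"]):
        print(column)
```

Only the siblings along one path are ever on screen, so the visible item count grows with depth times branching factor rather than with the size of the whole tree.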
Sometimes hierarchical data takes a more abstract form, where each element influences one or more other elements, or simply overlaps multiple categories. These data sets are more challenging to visualize, since laying them out well takes some mathematical heavy lifting.
Drawn from graph theory, these diagrams offer different takes on highly interrelated data: one element may link to hundreds (or thousands) of other data points, or to none at all.
Node-link charts (also called network diagrams) are best used when data is related but not necessarily in a clear hierarchy. Different dimensions of the data can be shown by each node's size, color, or even position.
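The heavy lifting behind many node-link layouts is a physical simulation. Here is a minimal sketch of a simplified Fruchterman-Reingold spring layout, assuming a plain list of node ids and `(a, b)` edge pairs: every pair of nodes repels, linked nodes attract, and a cooling "temperature" caps how far a node may move each round. It compares every pair of nodes on each iteration, so it only suits small graphs; the function name and parameters are hypothetical.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Simplified Fruchterman-Reingold spring layout (sketch).

    nodes: list of hashable ids; edges: list of (a, b) pairs.
    Returns {node: (x, y)}. A fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between linked nodes
    temperature = size / 10           # largest step allowed this round

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes, strength k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f

        # Attraction along each edge, strength d^2 / k.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f

        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step

        temperature *= 0.97  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


if __name__ == "__main__":
    print(force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("b", "d")]))
```

Linked nodes settle near the ideal distance `k` while unlinked ones drift apart, which is why clusters in the data tend to show up as clusters on screen.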
(Figure: A network diagram projected in 3D to highlight hierarchies)
Some toolkits even allow the links to be different lengths, thicknesses, or colors to show an additional dimension of how two points are related.
The JavaScript InfoVis Toolkit (JIT) offers an innovative hyperbolic network graph, which lets connections further down the chain recede toward the edge of the display, out of the user's way.
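The effect of a hyperbolic view can be illustrated with the Poincaré disk model, in which a point at hyperbolic distance d from the focus is drawn at Euclidean radius tanh(d/2). Equal steps outward produce ever-smaller steps on screen, so deeper levels crowd toward the rim. This is a sketch of that mapping with a hypothetical level-by-level layout, not the toolkit's own algorithm.

```python
import math


def poincare_radius(distance):
    """Euclidean radius, inside the unit disk, of a point at the given
    hyperbolic distance from the centre (Poincare disk model)."""
    return math.tanh(distance / 2)


def radial_hyperbolic_layout(levels, step=1.0):
    """levels: list of lists of node names; levels[0] holds the focus node.

    Level d sits at hyperbolic distance d * step from the focus, and its
    nodes are spread evenly around that ring. Returns {name: (x, y)}."""
    positions = {}
    for depth, names in enumerate(levels):
        r = poincare_radius(depth * step)
        for i, name in enumerate(names):
            theta = 2 * math.pi * i / len(names)
            positions[name] = (r * math.cos(theta), r * math.sin(theta))
    return positions


if __name__ == "__main__":
    # Ring radii shrink toward the rim: each level gets less screen space.
    print([round(poincare_radius(d), 3) for d in range(6)])
```

Re-centering on a different node (moving the focus) is what makes the far-away levels expand again as the user navigates toward them.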
Probably the simplest and most familiar chart type on the list, the stacked bar chart shows only two 'levels' of depth at a glance (the parent category as the whole bar, and each constituent's share as a segment within it), but interactive versions let the user click a bar or segment and 'zoom' to see the categories that make it up.
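Drawing a stacked bar is just a running sum: each segment starts where the previous one ended. The sketch below, with a hypothetical `{bar: {segment: value}}` input, also covers the 100%-stacked variant, where each bar is rescaled so its segments read as percentages of the parent.

```python
def stacked_segments(data, normalize=False):
    """data: {bar: {segment: value}} in drawing order.

    Returns {bar: [(segment, bottom, top), ...]}. With normalize=True every
    bar is rescaled to a height of 100, so segments read as percentages."""
    result = {}
    for bar, parts in data.items():
        total = sum(parts.values())
        scale = 100 / total if normalize and total else 1
        bottom = 0.0
        segments = []
        for segment, value in parts.items():
            # Each segment sits directly on top of the previous one.
            top = bottom + value * scale
            segments.append((segment, bottom, top))
            bottom = top
        result[bar] = segments
    return result


if __name__ == "__main__":
    data = {"Q1": {"x": 30, "y": 10}, "Q2": {"x": 20, "y": 20}}
    print(stacked_segments(data))
    print(stacked_segments(data, normalize=True))
```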
This chart is excellent for comparing simple, broad categories with few constituents, and it benefits from general familiarity, but it cannot convey more than two levels of depth at a glance, and it's often not obvious that user interaction is possible.
Along the same lines as stacked bar charts, many charting packages offer stacked area or shape (pyramid, cylinder, etc.) charts.
Although these can be useful (particularly the pyramid chart, when one or two options in each 'bar' take up an overwhelming share of the total but the smaller ones must still be visible), they suffer from all the pitfalls of stacked bars plus the added difficulty of judging the relative sizes of 3D objects. Relative proportions that are intuitive in rectangles ("This box is twice the area of that box") are often lost in other shapes or in 3D ("How much more volume is in the _bottom half_ of the pyramid than the top half?").
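The pyramid question has a precise answer. Cutting a pyramid (or cone) at half its height leaves a top piece that is a similar pyramid scaled by 1/2, holding (1/2)^3 = 1/8 of the volume, so the bottom half holds 7/8: seven times as much as the top, even though it looks like a 50/50 split. The sketch below (hypothetical function names, assuming the chart is meant to be read by volume) uses the same relation to compute where bands should be cut so that each band's volume matches its share.

```python
def volume_below(t):
    """Fraction of a pyramid's volume below height fraction t (0 = base, 1 = apex)."""
    # The piece above the cut is a similar pyramid scaled by (1 - t),
    # so it holds (1 - t) ** 3 of the total volume.
    return 1 - (1 - t) ** 3


def cut_heights(shares):
    """Height fractions (from the base) at which to end each band, bottom band
    first, so that every band's volume is proportional to its share."""
    total = sum(shares)
    heights = []
    cumulative = 0.0
    for share in shares:
        cumulative += share / total
        # Invert volume_below; max() guards against tiny negative rounding errors.
        heights.append(1 - max(0.0, 1 - cumulative) ** (1 / 3))
    return heights


if __name__ == "__main__":
    print(volume_below(0.5))    # share of volume in the bottom half
    print(cut_heights([1, 1]))  # where an even split should really be drawn
```

For an even two-way split, `cut_heights([1, 1])` puts the boundary only about 21% of the way up (1 minus the cube root of 1/2), which shows how far intuition and geometry can diverge in these charts.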
Again, these are supported by most common spreadsheets and charting libraries, but beware of obscuring data just to add eye candy.
Common Challenges & Future Developments
One of the largest problems with presenting hierarchical data is size: once a tree grows beyond a few dozen members, putting them all on screen at once can be overwhelming, but deciding which elements to hide is a challenge as well. This makes effective use of color, and choosing the right visualization for the job, extremely important.
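One common way to decide what to hide, sketched below as one option rather than a standard recipe, is to keep only the largest few children at every level and fold the rest into a single "Other" node, so the number of elements on screen stays bounded however big the tree grows. The `(name, value, children)` tuple layout and the `prune` name are hypothetical.

```python
def prune(node, keep=5, other_label="Other"):
    """node: (name, value, children), where value is the node's total.

    Keeps the `keep` largest children at every level and folds the rest
    into one aggregate leaf, preserving the parent's total."""
    name, value, children = node
    ranked = sorted(children, key=lambda child: child[1], reverse=True)
    kept = [prune(child, keep, other_label) for child in ranked[:keep]]
    rest = ranked[keep:]
    if rest:
        # The hidden children still count toward the parent's size.
        kept.append((other_label, sum(child[1] for child in rest), []))
    return (name, value, kept)


if __name__ == "__main__":
    root = ("root", 100, [("A", 50, []), ("B", 30, []), ("C", 15, []), ("D", 5, [])])
    print(prune(root, keep=2))
```

Because "Other" carries the combined value of what was hidden, sizes in a treemap or pie drawn from the pruned tree still add up correctly.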
With more and more machine-readable statistics being published by corporations as well as national and international bodies, we're sure to see more large, deep data sets in the future, along with a corresponding need to understand that data intuitively.