Visualization Strategies: Hierarchical Data

[Figure: Example of a multi-level pie chart]

One of the most challenging types of data to convert into a chart or visualization is also one of the most common: multi-level, or 'hierarchical', data.

Perhaps every category of data is composed of sub-categories, or a change in one data point has a major effect on surrounding data. Either way, the standard library of charts and graphs doesn't offer much in the way of making hierarchical data clear, so here are a few alternatives:

Nested Categories:

Treemaps:

[Figure: Sample treemap from Smart Money]

Recently, treemap-style charts have moved from their origins in academia into common use.  A personal favorite of mine, they allow aggregate categories to show through without losing the smaller constituent data.

Each category is sized according to what percent of the total it takes up, and child categories can be placed inside parents in a similar manner. Interactive versions often allow for 'drilling down' deeper into the data by clicking on a category to see its members full screen.
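The sizing rule above can be sketched in a few lines. This is a minimal "slice-and-dice" layout, one of several possible treemap layouts and not necessarily the one any of the tools mentioned here use. The `Node` class, the `layout` function, and the sample figures are all hypothetical names and data made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One category in the hierarchy (hypothetical structure for this sketch)."""
    name: str
    value: float = 0.0                      # only meaningful for leaves
    children: list = field(default_factory=list)

    def total(self):
        # A parent's size is the sum of its children; a leaf's is its own value.
        if self.children:
            return sum(child.total() for child in self.children)
        return self.value


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns {name: (x, y, width, height)} for every node."""
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    if not node.children or total <= 0:
        return out
    offset = 0.0  # fraction of the parent already handed out to earlier siblings
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            # Even depths: lay children out side by side along the width.
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depths: stack children along the height.
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Made-up sample data: a portfolio with one nested category.
tree = Node("portfolio", children=[
    Node("stocks", children=[Node("tech", 30), Node("energy", 10)]),
    Node("bonds", 40),
    Node("cash", 20),
])
rects = layout(tree, 0, 0, 100, 100)
```

Each rectangle's area ends up proportional to its share of the total, and child rectangles sit inside their parent. Slice-and-dice tends to produce long, thin rectangles when a level has many members, which is one reason treemaps work best with few categories per level.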

Treemaps seem to work best when the total number of categories at each level is fairly small (otherwise the hundreds or thousands of categories become tiny, undifferentiated squares), and each item fits neatly into a single sub-category.

As usual, Many Eyes leads the pack with excellent treemaps, and even offers a rather unusual treemap over time.  Other solutions include a free (non-commercial) desktop program, a Java library, a jQuery plugin, a Flex component, and an Excel add-in from Microsoft.

Multi-Level Pie:

If the categories are a little more concise than those in a treemap, or deeper data is required at a glance, multi-level pie charts may be the solution.  As the concentric rings go 'out', each item is sized with respect to its contribution to the inner parent category, so even deep hierarchies can be read quickly.
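To make the ring sizing concrete, here is a rough sketch of how the angular span of each slice could be computed. The nested-dictionary format, the function names `size` and `arcs`, and the sample numbers are assumptions for illustration, not taken from any particular charting package.

```python
def size(node):
    """Total of a node: its own value for a leaf, the sum of its children otherwise."""
    if "children" in node:
        return sum(size(child) for child in node["children"])
    return node["value"]


def arcs(node, start=0.0, end=360.0, ring=0, out=None):
    """Return a list of (name, ring, start_angle, end_angle) tuples, in degrees."""
    if out is None:
        out = []
    out.append((node["name"], ring, start, end))
    total = size(node)
    if total <= 0:
        return out
    angle = start
    for child in node.get("children", []):
        # Each child gets a slice of its parent's span, proportional to its size.
        span = (end - start) * size(child) / total
        arcs(child, angle, angle + span, ring + 1, out)
        angle += span
    return out


# Made-up sample data.
disk = {"name": "disk", "children": [
    {"name": "photos", "children": [
        {"name": "2007", "value": 10},
        {"name": "2008", "value": 30},
    ]},
    {"name": "music", "value": 40},
]}
slices = {name: (ring, a, b) for name, ring, a, b in arcs(disk)}
```

Ring 0 is the center, and every child's slice stays within the angles of its parent, which is what lets the eye trace a slice in an outer ring back to the category it belongs to.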

Neoformix has a great illustrated example of building a multi-level pie chart, and FusionCharts offers the best Flash version that I've used in their PowerCharts package. HDGraph applies the concept to hard-drive space use, historically a big driver of multi-level visualization.

Bubble Diagrams:

Bubble diagrams are generally used to show multi-dimensional data (x, y, size, color) rather than hierarchical data, but they can be adapted easily: either color all members of a group the same (for example, Gapminder assigns every country in a continent the same color), or nest the bubbles within one another to create an effect somewhat similar to a treemap.

Many Eyes offers a fairly simple version, mainly for highlighting relative sizes, as does Microsoft Excel, while Gapminder offers more complex abilities. I've yet to see anyone take on the nested-bubble approach, but feel free to contact me if you know of a solution.

Organizational Hierarchies:

Vertical Hierarchy:

Most commonly seen in the 'organization chart', vertical hierarchies are common for showing chains of membership or authority, but rarely have an added dimension to show actual numeric values.

The best examples use pictures effectively, and tend to represent data with each level's members being of similar importance.  The JIT toolkit offers a JavaScript-based approach to this, as do many others.

Horizontal Hierarchy:

In contrast to vertical hierarchy, these charts normally serve to hide large amounts of data that may be useless to a user, and allow fast drilling very deep into a hierarchy. Horizontal charts work best when the data is very deep, but each category only has a few subcategories, so the user can drill down very quickly.

My favorite horizontal hierarchy example comes from the SpaceTree at the JIT.

Relational Data:

Sometimes data in hierarchies takes a more abstract form, where each element influences one or more elements, or just overlaps multiple categories.  These types of data sets tend to provide an additional challenge to visualize, since laying them out optimally involves a bit of heavy-lifting mathematically.

Node-Link Diagrams:

Coming from graph theory, a branch of mathematics, these diagrams offer different takes on data that is highly interrelated: one element may link to hundreds (or thousands) of additional data points, or none at all.

Node-Link charts (also called Network Diagrams) are best employed when data is related, but not necessarily in a clear hierarchy.  Different dimensions of data can be shown by the size of each node, color, or even position.

[Figure: A network diagram projected in 3D to highlight hierarchies]

Some toolkits even allow the links to be different lengths, thicknesses, or colors to show an additional dimension of how two points are related.
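As a small illustration of mapping data onto nodes and links, the sketch below derives a radius for each node from its number of connections and a thickness for each link from its weight. The link list, the function name `visual_attributes`, and the scaling constants are all invented for this example; real toolkits expose their own options for this.

```python
from collections import defaultdict

# Hypothetical weighted links: (source, target, weight).
links = [
    ("alice", "bob", 3),
    ("alice", "carol", 1),
    ("bob", "carol", 2),
    ("dave", "alice", 5),
]


def visual_attributes(links, min_radius=4.0, radius_step=3.0):
    """Map node degree to a radius and link weight to a line thickness."""
    if not links:
        return {}, {}
    degree = defaultdict(int)
    for source, target, _ in links:
        degree[source] += 1
        degree[target] += 1
    max_weight = max(weight for _, _, weight in links)
    # More connections -> bigger node.
    radii = {name: min_radius + radius_step * count for name, count in degree.items()}
    # Heavier link -> thicker line, scaled into the range 1.0 to 5.0.
    thickness = {(s, t): 1.0 + 4.0 * w / max_weight for s, t, w in links}
    return radii, thickness


radii, thickness = visual_attributes(links)
```

Position is the hard part that this sketch leaves out: placing the nodes so that linked items sit near each other is where the heavier mathematics comes in.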

A huge number of solutions exist for different variations on network diagrams. A few selections include efforts from Constellation, FusionCharts' PowerCharts, Many Eyes, JIT, and JSViz.

JIT also offers an innovative hyperbolic network graph, allowing connections that are further down the chain to be hidden from the user.

Traditional Charts:

Stacked Bar:

Probably the simplest and most familiar chart type on the list. At a glance this chart only allows two 'levels' of depth (the parent category as the bar, and each component's share of it as an inner bar), but interactive versions allow the user to click on a bar or bar component and 'zoom' to see the categories that make up that bar.

This chart is excellent for comparing simple, broad categories with few constituents, and benefits from general familiarity, but fails to impart more than two levels of depth at a glance, and it's often non-obvious that user interaction is possible.

Most common charting packages offer stacked bar charts, including Excel, and Flash packages such as Open Flash Chart, FusionCharts, and AnyChart.

Stacked Shapes/Area:

Similar to stacked bar charts, many packages offer stacked area or shape (pyramid, cylinder, etc.) charts.

Although these can be useful (particularly the pyramid chart, when one or two options in each 'bar' take up an overwhelming percent of the total but the lower percents must still be visible), they suffer from the pitfalls of stacked bar, plus the added difficulty of resolving the relative sizes of 3D objects.  Oftentimes relative proportions that are intuitive in rectangular dimensions ("This box is twice the area of that box") are lost in other shapes or in 3D ("How much more volume is in the bottom half of the pyramid than the top half?").
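For what it's worth, the pyramid question has a tidy answer that shows how unintuitive it is. Assuming a true 3D pyramid (or cone) cut at half its height, the top piece is a half-scale copy of the whole, and volume scales with the cube of the linear dimension:

```latex
\frac{V_{\text{top}}}{V} = \left(\frac{1}{2}\right)^{3} = \frac{1}{8},
\qquad
\frac{V_{\text{bottom}}}{V} = 1 - \frac{1}{8} = \frac{7}{8},
\qquad
\frac{V_{\text{bottom}}}{V_{\text{top}}} = 7
```

So the bottom half holds seven times the volume of the top half. For a flat triangle the same reasoning uses the square instead, giving 1/4 and 3/4 of the area, a ratio of three. Neither figure is something most readers would guess from the picture.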

Again, these are supported by most common spreadsheet and graph libraries, but beware of obscuring data just to add eye-candy.

Common Challenges & Future Developments

One of the largest problems with presenting hierarchical data is size: once the members of a tree grow beyond a few dozen, putting them all on screen at once can be overwhelming, but knowing which elements to hide can be a challenge as well.  This makes effective use of color, as well as choosing the right visualization for the job, extremely important.

Hopefully in the future we'll see more development of toolkits and frameworks for quick deployment of interactive charts, and better support for deep hierarchical data.  Thanks to strides taken by a few of the JavaScript toolkits, Flex/Flare, and IBM's Many Eyes, most of the solutions listed above are becoming accessible to the average developer, although custom deployments still abound.

With the uptick in machine-readable statistics being made available at corporate, as well as national and international levels, we're sure to see a rise in large, deep data sets in the future, and the corresponding need to understand that data intuitively.

Creating Effective Cartograms

[Figure: A standard 2008 election map (upper), and a cartogram skewed by population (lower)]

Cartograms, or visualizations of an area skewed by some variable, are a powerful tool to control for disparities over a large area, especially with respect to politics.

A relatively large but sparsely populated area will dominate a standard projection, whereas a cartogram allows populated areas to be warped to show their true influence (See sidebar).

Applications of Cartograms:

Socio-economic data is the most obvious use case for cartograms, particularly data from the United Nations and national elections.

The cartogram strategy can be applied to just about any area visualization, and works particularly well where there are major disparities between area size and overall effect.

Algorithms and Tools:

A general outline of the cartogram creation process, as well as some excellent alternatives to the standard 'skewed border' approach, can be found at "Scaling Counties in a Checkerboard State" over at style.org.

Actual implementation of cartograms seems to often follow the algorithm first published here. Desktop versions in Java (with code) are available here and here.  There's a rundown of more methods at indiemaps.

[Figure: A rescaled election cartogram. Can you identify the state?]

Another post at style.org looks at alternate skewing methods, namely squarifying the areas in question, and then expanding them by a fixed ratio with respect to the data.
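The arithmetic behind area-proportional squares is simple enough to sketch. This is my own reading of the idea, not the method from the style.org post: each region becomes a square whose area is proportional to its data value, so the side length grows with the square root of the value. The function name `square_sides`, the `total_area` parameter, and the population figures are invented for illustration.

```python
import math


def square_sides(values, total_area=10_000.0):
    """Side length of a square per region, with area proportional to its value."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    # Each region's square takes its value's share of total_area; side = sqrt(area).
    return {name: math.sqrt(total_area * v / total) for name, v in values.items()}


# Hypothetical populations for two regions.
population = {"A": 8_000_000, "B": 2_000_000}
sides = square_sides(population)
```

Note that a region with four times the population gets a square only twice as wide. Scaling the side length directly by the value would exaggerate the difference, which is a common mistake with any area-based encoding.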

Non-Standard Cartograms:

Some of the most beautiful work in cartograms comes from the SASI group and Dr. Daniel Dorling.

The indiemaps blog has a number of posts on cartograms, including the promise of Python source code, and some great visuals.

I think cartograms are an untapped resource for commercial data, especially for geographically dispersed figures like sales or conversion rates. Hopefully in the future we'll see more tools centered around this resource, as well as a decent online generator.

O'Reilly on the Future of Massive Data Analysis

There's a post by Joseph Hellerstein worth a read over on O'Reilly Radar: The Commoditization of Massive Data Analysis.  It's more focused on the enterprise than on small-to-normal businesses, but that's just a consequence of the target audience.

His primary point is becoming especially pertinent to web companies and smaller developers: The convergence of dropping hardware prices and machine-readable APIs is making the storage and processing of vast amounts of information practical.

"We are at the beginning of what I call The Industrial Revolution of Data. We're not quite there yet, since most of the digital information available today is still individually 'handmade': prose on web pages, data entered into forms, videos and music edited and uploaded to servers. But we are starting to see the rise of automatic data generation 'factories' such as software logs, UPC scanners, RFID, GPS transceivers, video and audio feeds."

It's already reasonable for a site on a commodity web host to store every user and search interaction, or a database of tens of millions of data points, and in the future it will only get easier. The question is, what tools will we use to make sense of all of this?

His analysis reduces the field to SQL (via Oracle) and MapReduce (via Hadoop), but once we look beyond the enterprise, tools like Erlang (or functional programming in general) and the emerging CouchDB show promise, not to mention some of the cloud computing entries from Amazon and others.

On the visualization side of things, tools like Processing and the Prefuse Toolkit are seeing quick uptake, as well as more focused commercial tools like FusionCharts.

Whatever the toolchain turns out to be, those of us with an interest in understanding information have the opportunity to be at the forefront of the change, and if we don't gain expertise in the available options early, we risk being left behind.

A Roundup of U.S. Election Visualizations

Well Formed Data is offering this post full of offbeat election and presidential data visualizations.

It's definitely worth a look; I'm particularly enamored with the New York Times' Presidential Physique Graph, although the data density is a bit low for being so large.

A Collection of Real UI Patterns

Peter Morville has put together this great Flickr set of screenshots of the user interfaces from actual sites, broken down by purpose & category.

Because they're all from major sites, they do a great job illustrating both the prevailing UI paradigms and the different spins on each approach.