Visualisation tools overview

An important aspect of creating a visualization is the time it takes to build a plot and to switch between plots. Interactive controls such as sliders, drop-down menus and buttons improve both. We can group visualization tools by their complexity:

  • Visualisation packages – these are integrated with a programming language and require programming knowledge. Libraries that provide interactive tools are especially interesting, although the controls have to be coded from scratch. Examples: Shiny (R), D3 (JavaScript), Plotly (R, Python).
  • Visualisation software – you can quickly create an interactive visualization by dragging and dropping data and selecting features from a panel. Basic programming knowledge helps, but you can manage without it. Buttons and sliders are easier to add. Examples: Tableau, Qlik Sense, Power BI and others.

Visualisation software is definitely time-saving. While I was a student, I could get the full version of Tableau for free (and I recommend it with all my heart!). Since graduating, I’ve been looking for a budget visualization tool. That’s how I came across a free and fully interactive pair: Plotly + Python. I will use Plotly 2.5 and Python 3.5; if you use different versions, please let me know whether they work together.

Tutorial

If you don’t have a clue about Plotly, I recommend starting with these articles. If you can already create even simple Plotly charts, keep reading.

In the examples below I’ll use the World Happiness Report dataset from Kaggle (it seems the dataset has changed over the last months, so you can download the version used here from my Github). The dataset contains the results of a survey on happiness around the world in 2015-2017. We will create a scatter plot illustrating Life Expectancy with reference to GDP and Family.

Load data and libraries

Before we start, we load the required libraries

import plotly.offline as offline
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import Scatter, Figure, Layout

from IPython.display import HTML, Image

# Load plotly.js into the notebook once, so that offline.iplot() can render charts
init_notebook_mode(connected=True)

import pandas as pd
import numpy as np

and the data

data_happy = pd.read_csv('data.csv', index_col = 0)
data_happy.columns = ['Country', 'Region', 'Happiness_Rank', 'Happiness_Score', 'GDP_per_capita','Family', 'Life_Expectancy', 'Freedom',  'Government_Corruption', 'Generosity', 'Dystopia_Residual', 'Year']

Simple plot

As mentioned before, we will check how Life Expectancy relates to GDP and Family. We will also show the Region, Country and Year. Let’s create a simple chart for 2015. It will be the starting point for further modifications.

Below you can find the code with the comments:

continents = list(data_happy.Region.unique())
year = 2015
data = []

for n, cont in enumerate(continents):     # create one trace per region
    mask = (data_happy.Year.values == year) & (data_happy.Region.values == cont)     # boolean mask: rows of this region in the given year
    trace = dict(
        text = data_happy.Country[mask],  # each bubble is a separate country, so label it with the country name
        name = cont,                      # legend entry: the region
        x = data_happy.loc[mask, 'GDP_per_capita'],
        y = data_happy.loc[mask, 'Family'],
        mode = 'markers',
        marker = dict(size = data_happy.loc[mask, 'Life_Expectancy']*25) # scale bubble sizes by 25 to make them more visible
                )
    data.append(trace)     # collect the traces so they are all shown on a single plot


## merge it together
layout = dict(title = "Life Expectancy in %d" % year, xaxis=dict(title = 'GDP'), yaxis=dict(title = 'Family factor'))

fig = dict(data=data, layout=layout)

offline.iplot(fig, filename='Happiness_score')

We have a meaningful chart, but we need to plot the same information for each year. We could plot the years one by one, but that requires scrolling and makes comparison between years harder. A better solution is to use sliders and drop-down menus to switch easily between years.

Sliders

To show the same kind of data for several years, we can add a slider representing the years below the chart. Our plot looks like this:

To display it, we have to modify the previous code. First we create a slider step for each year. The crucial element here is steps. A step is a dictionary that tells Plotly which traces will be displayed.
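As a minimal sketch (not necessarily the exact code behind the chart above), a single step could be built as below. The helper name make_step is hypothetical and the years are hard-coded for illustration. The method 'restyle' is spelled out here, although to my knowledge it is also what Plotly assumes when no method is given.

```python
# Hypothetical helper: build one slider step for the year at position `index`.
years = [2015, 2016, 2017]  # hard-coded here; the tutorial derives them from the data

def make_step(index, years):
    mask = [False] * len(years)   # one flag per year
    mask[index] = True            # only the chosen year is switched on
    return dict(
        method='restyle',         # change attributes of the traces
        args=['visible', mask],   # attribute name and its new value(s)
        label=str(years[index]),  # text shown on the slider
    )

step_2016 = make_step(1, years)
print(step_2016)
```

The step for 2016 carries the mask [False, True, False], i.e. only the second year is switched on.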

By trace, I mean a single visualization. The basic chart was built out of several traces. Keep in mind that all subplots are passed in one list; this holds both for sliders and drop-down menus. In other words, the subplots for 2015, 2016 and 2017 (each composed of many traces) are delivered in a single list, and the boolean mask in steps is a pattern telling which of them will be displayed. Let’s visualize it to make it simple:

data = [2015_Continent1, 2016_Continent1, 2017_Continent1, 2015_Continent2, 2016_Continent2, ...]

Boolean_mask_2015 = [True, False, False] [True, False, False] …

Data is a list of traces in which continents and years are mixed. The boolean mask tells us what will be displayed on a single plot. According to steps, the boolean mask for 2015 is [True, False, False]; the pattern is repeated along the list, so it displays every third trace, starting from the first one: [2015_Continent1, 2015_Continent2, ...]
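The repetition of the mask can be imitated in plain Python. This is only a sketch of the behaviour described above, under the assumption that a mask shorter than the trace list is reused cyclically. The trace labels are made up and no Plotly call is involved.

```python
# Made-up labels standing in for the traces, in the same order as in `data`
years = [2015, 2016, 2017]
continents = ['Continent1', 'Continent2']
traces = ['%d_%s' % (year, cont) for cont in continents for year in years]

def shown(traces, mask):
    # Reuse the short mask cyclically: trace i gets mask[i % len(mask)]
    return [t for i, t in enumerate(traces) if mask[i % len(mask)]]

mask_2015 = [True, False, False]
print(shown(traces, mask_2015))
```

With the 2015 mask, only the 2015 trace of each continent is left in the result.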

So let’s extend data with 2016 and 2017. We will add an inner loop, for year in years, and an additional condition in the mask:

for n, cont in enumerate(continents):
    for year in years:
        mask = (data_happy.Year.values == year) & (data_happy.Region.values == cont)
        trace = (dict(visible = False,
            text = data_happy.Country[mask],
            name = cont,
            x = data_happy.loc[mask, 'GDP_per_capita'],
            y = data_happy.loc[mask, 'Family'],
            mode = 'markers',
            marker = dict(size = data_happy.loc[mask, 'Life_Expectancy']*25))
            )
        data.append(trace)
    data[int(n*len(years))]['visible'] = True

Note that we’ve set the visible argument to False. Visible determines what is displayed before the user interacts with the chart. Initially, let’s show only the first plot (2015). To accomplish this, we set visible to True for every third trace, starting with the first one. Once you drag the slider, only the boolean mask from steps is used.
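An alternative to fixing the flags by index afterwards is to decide the visibility while each trace is created. The sketch below uses placeholder dictionaries instead of real traces (the region names are example values only) and checks that both approaches give the same flags.

```python
continents = ['Western Europe', 'North America']  # example values only
years = [2015, 2016, 2017]

# Approach from the text: everything hidden, then switch on every len(years)-th trace
by_index = []
for n, cont in enumerate(continents):
    for year in years:
        by_index.append(dict(name=cont, visible=False))
    by_index[n * len(years)]['visible'] = True

# Alternative: a trace starts visible only if it belongs to the first year
direct = [dict(name=cont, visible=(year == years[0]))
          for cont in continents for year in years]

print([t['visible'] for t in direct])
print(by_index == direct)
```

Which variant to use is a matter of taste; the second one avoids the index arithmetic.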

Comparability

By default, the x and y axes adjust to the highest values in a given plot. This results in a different axis range for each plot, which makes the plots hard to compare. To fix this, we should set the same range for every plot, e.g. the min and max values among all subplots plus or minus a margin:

layout = dict(title = "Life Expectancy", sliders=sliders, 
xaxis=dict(title = 'GDP', range=[-0.05, max(data_happy.loc[:, 'GDP_per_capita']) + 0.1]),
yaxis=dict(title = 'Family factor', range=[-0.1, max(data_happy.loc[:, 'Family'])+0.1]))
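If the lower bound should follow the data as well, the range can be computed by a small helper. This is a sketch only: axis_range is a hypothetical name and the numbers are toy values, not taken from the dataset.

```python
def axis_range(values, margin=0.1):
    """Return [min - margin, max + margin] for a sequence of numbers."""
    values = list(values)
    return [min(values) - margin, max(values) + margin]

gdp_all_years = [0.0, 0.75, 1.5]   # toy values standing in for a whole column
x_range = axis_range(gdp_all_years, margin=0.25)
print(x_range)
```

With the real data the helper would presumably be called on a whole column, e.g. axis_range(data_happy['GDP_per_capita']), and the result passed as range in xaxis.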

As you can see, we added a sliders object to the layout. This is the steps object wrapped with additional display parameters:

sliders = [dict(
    active = 0,
    currentvalue = {"prefix": "Year: "},
    pad = {"b": 40, "t": 40},
    steps = steps
)]

The code together:

continents = list(data_happy.Region.unique())
years = sorted(list(data_happy.Year.unique()))
data = []


## create sliders
steps = []
for i, year in enumerate(years):
    step = dict(
        args = ['visible', [False] * len(years) ],
        label = str(year))   # slider labels are text
    step['args'][1][i] = True 
    steps.append(step)
    
sliders = [dict(
    active = 0,
    currentvalue = {"prefix": "Year: "},
    pad = {"b": 40, "t": 40},
    steps = steps
)]

## data
for n, cont in enumerate(continents):
    for year in years:
        mask = (data_happy.Year.values == year) & (data_happy.Region.values == cont)
        trace = (dict(visible = False,
            text = data_happy.Country[mask],      
            name = cont,
            x = data_happy.loc[mask, 'GDP_per_capita'],
            y = data_happy.loc[mask, 'Family'],
            mode = 'markers',
            marker = dict(size = data_happy.loc[mask, 'Life_Expectancy']*25))
                   )
        data.append(trace)
    data[int(n*len(years))]['visible'] = True



## merge it together
layout = dict(title = "Life Expectancy", sliders=sliders, 
              xaxis=dict(title = 'GDP', range=[-0.05, max(data_happy.loc[:, 'GDP_per_capita']) + 0.1]),
              yaxis=dict(title = 'Family factor', range=[-0.1, max(data_happy.loc[:, 'Family'])+0.1]))

fig = dict(data=data, layout=layout)

offline.iplot(fig, filename='Happiness_score')

Dropdown menu

A dropdown menu (also called an updatemenu) allows us to switch between charts by choosing an option from a list. Its code structure is similar to that of sliders. Our chart with a dropdown menu should look like this:

First, create a list of parameters for each year:

list_updatemenus = []
for n, year in enumerate(years):
    visible = [False] * len(years)
    visible[n] = True
    temp_dict = dict(label = str(year),
                 method = 'update',
                 args = [{'visible': visible},
                         {'title': 'Year %d' % year}])
    list_updatemenus.append(temp_dict)

The code that builds the data list is identical to the corresponding code for sliders. Finally, we place the dropdown menu in the layout. For sliders it was sliders = sliders; here we add an updatemenus item, which should look like this:

layout = dict(
   updatemenus=list([dict(buttons= list_updatemenus)]),
   xaxis=...,
   yaxis=...,
   title='Life Expectancy' )
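The dictionary inside updatemenus can also carry placement options. The sketch below uses attributes that Plotly documents for updatemenus (direction, showactive, x, y, xanchor, yanchor); the concrete values are arbitrary and only meant as a starting point.

```python
years = [2015, 2016, 2017]  # hard-coded for the sketch

buttons = []
for n, year in enumerate(years):
    visible = [False] * len(years)
    visible[n] = True
    buttons.append(dict(label=str(year),
                        method='update',
                        args=[{'visible': visible},
                              {'title': 'Year %d' % year}]))

menu = dict(buttons=buttons,
            direction='down',        # open the list downwards
            showactive=True,         # highlight the selected entry
            x=0.0, xanchor='left',   # align with the left edge of the plotting area
            y=1.15, yanchor='top')   # place it slightly above the plotting area

layout = dict(updatemenus=[menu], title='Life Expectancy')
print(layout['updatemenus'][0]['buttons'][1]['label'])
```

The axis settings are left out here to keep the sketch short; they would be added to layout exactly as before.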

And the code together:

continents = list(data_happy.Region.unique())
years = sorted(list(data_happy.Year.unique()))
data = []


list_updatemenus = []
for n, year in enumerate(years):
    visible = [False] * len(years)
    visible[n] = True
    temp_dict = dict(label = str(year),
                 method = 'update',
                 args = [{'visible': visible},
                         {'title': 'Year %d' % year}])
    list_updatemenus.append(temp_dict)

    
for n, cont in enumerate(continents):
    for year in years:
        mask = (data_happy.Year.values == year) & (data_happy.Region.values == cont)
        trace = (dict(visible = False,
            text = data_happy.Country[mask],      
            name = cont,
            x = data_happy.loc[mask, 'GDP_per_capita'],
            y = data_happy.loc[mask, 'Family'],
            mode = 'markers',
            marker = dict(size = data_happy.loc[mask, 'Life_Expectancy']*25))
                   )
        data.append(trace)
    data[int(n*len(years))]['visible'] = True
    

layout = dict(updatemenus=list([dict(buttons= list_updatemenus)]),
              xaxis=dict(title = 'GDP', range=[-0.05, max(data_happy.loc[:, 'GDP_per_capita']) + 0.1]),
              yaxis=dict(title = 'Family factor', range=[-0.1, max(data_happy.loc[:, 'Family'])+0.1]),
              title='Life Expectancy' )

fig = dict(data=data, layout=layout)
offline.iplot(fig, filename='update_dropdown')

 

Sliders + Drop-down menus

The dream plot: a slider to choose the year and a drop-down menu to choose the dependent variable. It would be an easy way to visualize all possible relationships in one chart. Sadly, I’m pretty sure it’s not possible. Why? The key is the visible argument. In both sliders and dropdown menus, visible is a boolean mask determining which traces are shown. The problem is that the slider’s mask is independent of the dropdown menu’s mask. If you move the slider, its mask is applied and the one from the dropdown menu is ignored, and vice versa. However, if you find a way to make sliders and dropdown menus work together, I will be over the moon to see the solution 🙂
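The conflict can be imitated without Plotly. In the sketch below each control simply overwrites the whole visible list, which is an assumption based on the behaviour described above; the function names and the list of dependent variables are made up.

```python
years = [2015, 2016, 2017]
variables = ['Family', 'Freedom']   # made-up choice of dependent variables
traces = [(var, year) for var in variables for year in years]

def slider_mask(year):
    # Mask a slider step would send: only traces of the chosen year
    return [y == year for (_, y) in traces]

def dropdown_mask(var):
    # Mask a dropdown button would send: only traces of the chosen variable
    return [v == var for (v, _) in traces]

# Each interaction replaces the previous mask instead of combining with it
visible = slider_mask(2016)          # the user picks 2016 on the slider
visible = dropdown_mask('Freedom')   # then picks Freedom in the dropdown

shown = [t for t, flag in zip(traces, visible) if flag]
print(shown)
```

The wanted result would be the single trace ('Freedom', 2016), but all three Freedom traces come back: the year chosen on the slider is lost.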

Tips

Things that are helpful to know while working with Plotly:

  • You can share plots stored online via a URL; however, the free version lets you store only up to 25 charts (that’s why the basic chart is a PNG: I’m saving my chart allowance 🙂 ).
  • Charts created online with the free version are visible to the public. Use offline mode for business data that you want to keep private.
    • To go offline, call offline.init_notebook_mode() at the top of your script/notebook and display your plots with offline.iplot().

I hope this explanation is simple enough to be called Plotly sliders and dropdown menus for dummies 🙂 If you have any questions or doubts, don’t hesitate to ask. And if the tutorial was helpful, please leave me a comment.