Atif Vickers Rafael Whitley Nataniel Ayers Annabell Mohammed Khaleesi Plummer Clyde Blankenship Kole Roth Sufyaan Mcfadden Ismaeel Richmond Jeanne Lowry Lemar Everett Anne-Marie Glover Shannan Bowers Sammy Herman Emelia Mathis Jamelia Rodriquez Renesmae Pritchard Adnan Bridges Manveer Allman Vivian Hodges Ranveer Cline Muhammed Williamson Celeste Mac Bodhi Milner Johan Gaines Katya Esparza Honey Pruitt Ella-Mae Whelan Jean Marquez Ronaldo Kelly Rizwan Chase Nathaniel Gay Corbin Pittman Shakeel Wilder Kevin Nairn Faisal Goldsmith Derren Terry Kurtis Combs Connar Mansell Maud Stacey Eduard Mayo Kenzo Espinoza Adelle Delaney Aras Newman Tomasz Patel Jaskaran Higgins Keeleigh Colon Amba Rollins Lilly-May Kline Tasneem Hudson Leopold England Sylvia Bernal Conan Britton Daniaal Sharp Josh Cuevas Zidane Campbell Jayne Clarkson Riaan Hassan Liliana Molina Bernice Young Franklin Cochran Paloma Swan Bessie Ahmad Cleveland Partridge Kornelia Hunter Jessica Farrington Humairaa Stark Ralphie Carty Hamaad Howard Clayton Kramer Sameeha Seymour Deen Chadwick Jeremy Randall Jax Alvarez Sahara Terrell Chase Leech Megan Mcmillan Azeem Hawkins Bryony Lyon Farrell Gillespie Ronan Trejo Kira Peralta Karter Corona Demi-Leigh North Fred Callaghan Dominique Shea Iolo Stone Benito Acosta Misty Flores Tala Villarreal Teejay Ho Keegan Harrell Lydia Carson Tahir Kendall Margaret Worthington Harvey Stanton Georgina Lowe Mohammod Lozano Frances Andrews Suleman Delacruz Shanai Gallegos Ewan Roche Momina Shields Ellesse Conley Aled Smyth Elize Sanford Usman Grant Umar Guy Jamila Wolfe Rosina Larsen Micheal Walter Courtnie Whitehouse Alyce Harrington Saara Bouvet Meadow Silva Arnas Manning Zhane Boyer Marek Knights Divine Noel Ismail Larson Humphrey Black Olivia-Rose Byrne Malachy Laing Cinar Lancaster Anabia Avery Fenella Fields Trystan Robertson Tobey Haigh Sana Witt Caitlan Walmsley Magdalena Greenwood Jason Stout Colin Brennan Jeremy Reeves Eoghan Vance Mahek Franks Kailum Farley Pascal Noel Tate Moreno Amal Ferry Pharrell Broughton Jadon Holland Damien Byrne Jamie Mansell Amanpreet David Devon Parks Peyton Livingston Daniyal Weir Gwen Oakley Sommer Arias Farrah Howarth Pooja Hussain Tomi Brookes Shae Bob Donovan Lu Marcel Lane Christina Slater Sumaiya Salgado Tanvir Connor Elli Gamble Zahid Deacon Lisa-Marie Allison Sunil Lucero Zakariya Tucker Cell count Zero Low Medium High Unbound (missing) 0 yr 1 yr 2 yr 3 yr 4 yr 5 yr 6 yr 7 yr 8 yr
Figure 1: Shell plot visualizing the binned count of cancer cells in the blood of 164 anonymized subjects during treatment. Total observation period ranged from a few days to more than 8 years. Hover of the episode sectors to reveal additional information of the episode and linked observation and subject. Click on an episode to copy the data row to the clipboard. Scroll to zoom in and out.

Quick start

I have not created an R package for this plot yet. I would love to incorporate some feedback first so as to improve its usability and make it more robust and applicable in a wider range of real-life scenarios. For now you can follow the steps below:

Introduction

Around the year 2015, I devised a plot type that I baptised the shell plot (Figure 1). It allows for pattern discovery in longitudinal data with highly variable and positively skewed observation periods. One example that we are going to demonstrate here is the observational study where many patients (the subject) are being followed during treatments (events) ranging from a couple of days to several years. Another example could be the analysis of volcano eruption episodes (subject) during which all kinds of specific events occur such as explosions, fumarole activity, ash expulsion, … that deserve temporal analysis. The observational axis would typically represent the time dimension but could also represent distance or another quantity altogether. We will discuss some example at the end of this article.

In comparison with the older versions of this plot, I decided to make this one vector-based, to write it in pure HTML and to add interactivity such as zooming and showing details upon hovering. I went to the trouble of having most parameterization automated so that only a minimal user input is required, while still allowing endless flexibility in the layout and styling part.

Building the plot from scratch

Terminology

Let us take the time to think this through, because although it looks as simple as converting Cartesian coordinates to polar ones, there is a little bit more to it. First, to agree on the terminology used, I drew up Figure 2:

Figure 2: Terminology used for the shell plot.

Layout parameterization

Next, let us think about how to parameterize the layout. Details are shown in Figure 3. We begin by centering our plot around the \({x, y}\)-coordinate \({0, 0}\) and by confining the radius range of the annuli sectors containing the data to \([a, 1]\), where \(a\) is the size of the hole in the middle of the plot. Next we need to calculate the lengths of the labels \(l_i\) based on the font used, the font size and the font style. These label-lengths allow us to determine the maximum allowed sector angle \(\alpha_{max}\) so as to avoid that data sectors overlap with the subject labels. Next, we draw the outer observation sectors that cover the complete observation-period per subject and then we draw the episode sectors. The distribution of the data determines the length of the radial grid lines and finally, the positions of all the elements on the plot determine the framing of the view port. Now that the parameterization is in place, we can draw the process diagram as shown in Figure 4.

Figure 3: Parameterization of the layout for the shell plot. The circled numbers indicate the order in which to calculate the (derived) parameters. See text for more details.

Figure 4: Process diagram for the shell plot. Only one input is truly compulsory and that is the observational data with the features Subject, Start, End, and Group. Although Width and Height are strictly layout parameters, they are treated as true input values because of their importance. id represents an unique HTML identifier to avoid conflict when working with more than one shell plot per HTML page. The input gridlines corresponds to the at argument of the axis() function. To allow extra information to be shown upon hover, an infobox template must be provided. Layout and styling information can be provided as a YAML configuration file. Additional and dynamic styling (e.g. highlighting of sector upon hover) is provided as CSS style sheets.

Methodology

Data

At some point we will need data. However, let us first load our libraries. There are two libraries that I use all the time:

example.Rmd

library(magrittr)
library(data.table)

The simplified and pseudonymized dataset of cancer treatments looks like this (only first few records shown):

example.Rmd

epi <- fread("dat/cancer_tx.tsv", encoding = "UTF-8",
  stringsAsFactors = TRUE)
epi %>% head(10) %>% kable
Subject Start End Line Cycle Type Treatment Regime Measure Group Start_Day End_Day Age
Adelle Delaney 0.0000000 0.0547581 1 1 Cycle TX_1 Drug E 4 Low 1 21 48
Adelle Delaney 0.0547581 0.1122542 1 2 Cycle TX_1 Drug E 0 Zero 21 42 48
Adelle Delaney 0.1122542 0.1697502 1 3 Cycle TX_1 Drug E 1 Low 42 63 48
Adelle Delaney 0.1697502 0.2272463 1 4 Cycle TX_1 Drug E 0 Zero 63 84 48
Adelle Delaney 0.2272463 0.2847423 1 5 Cycle TX_1 Drug E 1 Low 84 105 48
Adelle Delaney 0.2847423 0.3422384 1 6 Cycle TX_1 Drug E 2 Low 105 126 48
Adelle Delaney 0.3997344 0.4572305 1 7 Cycle TX_1 Drug E 0 Zero 147 168 48
Adelle Delaney 0.6488840 0.7063800 1 0 Progression TX_1 Drug E 9 Medium 238 241 48
Adnan Bridges 0.0000000 0.0574960 1 1 Cycle TX_1 Drug G / Drug C 753 Unbound 1 22 59
Adnan Bridges 0.0876130 0.1451091 1 2 Cycle TX_1 Drug G / Drug C 436 Unbound 33 54 59

Start and End are the start and end times of the subject’s treatment episode expressed in years after initial admission, whereas Start_Day and End_Day were calculated by dividing the former by 365.2425. The features Line to Regime correspond to the treatment descriptors that you would typically find in oncology. Measure represents some quantifiable outcome measure. Finally, the Age is a fake feature to demonstrate the use of subject-specific data. It was drawn as epi[, Age := rnorm(1, 45, 15) %>% round, Subject].

Personalizing subject identifiers

Subject names are fake and are derived from the Quick Name Generator so as to remind ourselves that we are dealing with real people even if the data has been anonymized beyond recovery. If you are in a similar situation in which you want to protect the people’s identity while still reminding anyone that you are dealing with people instead of records or numbers, then you can collect a sufficiently large set of unique names (first and/or last names dependent on the number needed) in an headed tab-separated file names.tsv1 and do something along these lines:

example.Rmd

names <- fread("names.tsv", sep = "/t") # One feature called 'Name'
dat[, Subject := names$Name[Subject_ID %>% as.factor %>% as.numeric]

Note that the sep = "\t" argument is only needed here because the file consist of a single feature and fread might be confused about what separator to choose, especially if the names you collected from some online generator contains comma values.

Without the shell plot…

The summary below and Figure 5 show that the accumulated observation time per Subject typically varies greatly (from 0.066 years = 24 days to more than 8.5 years). Because we have such a wide range of observation periods among subjects, it becomes hard to keep the overview. Log-transforming the time axis is not ideal because it hinders interpretation and because there might be statistical objections. That is where the idea arose to bend the observational axis into a circle while keeping the subjects sorted from shorter to longer observation periods.

                Subject    TX_Time
  1:       Atif Vickers 0.06570977
  2:     Rafael Whitley 0.11225419
  3:     Nataniel Ayers 0.11499209
  4:  Annabell Mohammed 0.13415744
  5:   Khaleesi Plummer 0.13689535
 ---                              
160:        Elli Gamble 5.56068913
161:       Zahid Deacon 5.68389495
162: Lisa-Marie Allison 5.79614913
163:       Sunil Lucero 6.08636728
164:    Zakariya Tucker 8.55595940
_Figure 5_: Distribution of the total treatment time (expressed in days since admission) among all subjects. The layout in this plot is hindered by the longer observation period of _Zakariya Tucker_ which makes other observation scale along. In many realistic cases, the distribution of observation periods is even more skewed and leptocurtic then in this dataset.

Figure 5: Distribution of the total treatment time (expressed in days since admission) among all subjects. The layout in this plot is hindered by the longer observation period of Zakariya Tucker which makes other observation scale along. In many realistic cases, the distribution of observation periods is even more skewed and leptocurtic then in this dataset.

Shell function signature

Based on the process diagram shown above, we can write out the signature and usage of shell function:

shell.R

shell_plot <- function(episodes, width, height, layout, style, id,
  gridlines = NULL, shell_template = "shell/svg/shell_template.svg",
  infobox_template = "shell/html/infobox-template.html",
  javascript = "shell/js/shell.js", css = "shell/css/shell.css") {
  
  # Code goes here
}

epi %>% shell

The SVG template

In the next paragraph, we will start with plotting the subject labels. But for that, we first need a plotting device. Instead of using the base plotting system of R, I decided here to generate SVG code directly instead. In order to keep the plot function flexible, adjustable and configurable, I will utilize templates and work with the whisker package which implements mustache. Below is the first version of the SVG template for our plot:

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="{{width}}" height="{{height}}">
  <g fill="none" stroke="#00B0F0" stroke-width=".004">
    <line x1="0" y1="-1.1" x2="0" y2="2.2" />
    <line x1="-.7" y1="0" x2="1.4" y2="0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r="{{layout.a}}" />
  </g>
  <g font-size="{{layout.m}}">
    {{#subjects}}
    <text x="{{Lab_X}}"  y="{{Lab_Y}}" text-anchor="end">{{Subject}}</text>
    {{/subjects}}
  </g>
</svg>

The width and height of the viewport are variable and must be completed, hence the double curly braces. In this tutorial, we will be using the chunk options to set these variables with width <- opts_current$get("out.width") and opts_current$get("out.height"), respectively (see below). Mind that the viewbox is currently being hard coded. Later, we will calculate these values from the content. The numbers .7 -1.1 1.4 2.2 are the left, top, width and height of the viewing window and are in the user units of the SVG element. Next we see a group tag <g> in which I temporally added some guidelines to help me positioning graph element during development. I will remove these later. Finally, we see a whisker loop structure that will iterate over subjects ({{#subjects}}<loop body>{{/subjects}}). For each subject, we will be gathering the subject’s identifier (Subject) for the label text as well as the Lab_X and Lab_Y features for the proper placement of the labels.

Loading width and height from chunk options

We will soon load and complete the SVG template, but for that we will need some data to complete. As mentioned above, we load the out.width and out.height chunk options into the variables width and height, respectively. First, we define the RMArkdown chunk as such:

```{r shell_dev_02, out.height="800px", out.width="500px"}

and then we acces these options:

Example.Rmd

width <- opts_current$get("out.width")
height <- opts_current$get("out.height")

Loading layout options from the YAML configuration file

Simply save a file named config.yaml in the root folder:

config.yaml

default:
  layout:
    a: .10  # Size of the central hole in the plot in user units (UU)
    b: .02  # Space between subject labels and subject axis (UU)

Remember that a represents the radius of the central hole in the middle of the graph and b is the relative distance between the labels and the subject axis (see Figure 3 for details). The config packages makes it really easy to read from and write to YAML configuration files.

shell.R

library(config)

config <- config::get(file = "shell/config.yaml")

layout <- config$layout

Using whisker for templating

In order to complete the above template, we reference the whisker package. With the function base::readchar() we can allocate the template to the string shell_template:

shell.R

library(whisker)

shell_template <- "svg/shell_template.svg" %>%
  readChar(., file.info(.)$size)

We are now ready to adjust the shell() function so as to print the subject labels:

shell.R

shell_plot_01 <- function(episodes, layout) {
  layout$n <- episodes$Subject %>% uniqueN
  layout$m <- (1 - layout$a) / layout$n
  
  subjects <- episodes[, .(
    Episode_Count = .N,
    Lab_X = -layout$b,
    Lab_Y = .GRP * layout$m - 1), Subject] %>%
    rowSplit %>% unname
  
  out_data <- list(
    width = width,
    height = height,
    subjects = subjects,
    layout = layout)
  
  out <- whisker.render(
    template = shell_template,
    data = out_data)

  return(out)
}

Dynamically setting font size

An added advantage about plotting directly to SVG, is that one can easily scale texts. Here, the font size of the labels is set to the value \(m\) (see Figure 3 for more details):

\[m=\frac{(1-a)}{n}\]

Within R, this task is trickier as text height in R using the base::strheight() function is based on the height of the capital letter ‘M’ (see ?strheight for details on this).

Completing a template from a data table

Notice the use of data.table in the above code snippet:

shell.R

[...]

subjects <- episode[, .(
  Episode_Count = .N,
  Lab_X = -layout$b,
  Lab_Y = .GRP * layout$m - 1), Subject] %>%
  rowSplit %>% unname

[...]

By now, data tables are well established I assume, but just in case: the above translates to:

Take the episodes data table, perform no filtering on it (nothing before the first comma), group by subject (third parameter inside the []) and then aggregate the rows in that group (containing the episode for that subject) into a single row with the features Episode_Count containing the number of episodes (using data.table’s special variable .N) and the label positions Lab_X and Lab_Y.

The whisker::rowSplit() function is a convenience function to split a data.frame, data.table or matrix object into a list of lists allowing for the whisker loop structure inside the template to work properly. In the variable out_data, we gather all the data to be processed by whisker to complete the template. Notice how the dot in {{layout.a}} (see template above) allows the reference of children (and grandchildren, etc…) within a complex object.

Plotting the subject labels

All that we are left to do is to test whether the subject label are plotted in the correct order, in the correct position and with the dynamically adjusted font size:

example.Rmd

canc %>% head(200) %>% shell_plot_01(layout) %>% cat
Adelle Delaney Adnan Bridges Aled Smyth Alyce Harrington Amal Ferry Amanpreet David Amba Rollins Anabia Avery Annabell Mohammed Anne-Marie Glover Aras Newman Arnas Manning
canc %>% head(500) %>% shell_plot_01(layout) %>% cat
Adelle Delaney Adnan Bridges Aled Smyth Alyce Harrington Amal Ferry Amanpreet David Amba Rollins Anabia Avery Annabell Mohammed Anne-Marie Glover Aras Newman Arnas Manning Atif Vickers Azeem Hawkins Benito Acosta Bernice Young Bessie Ahmad Bodhi Milner Bryony Lyon Caitlan Walmsley Celeste Mac Chase Leech Christina Slater Cinar Lancaster Clayton Kramer Cleveland Partridge Clyde Blankenship Colin Brennan Conan Britton Connar Mansell Corbin Pittman Courtnie Whitehouse Damien Byrne
Figure 6: Plotting the subject identifiers of the top 200 or 500 episodes on a canvas containing a number of guidelines.

Again, if you haven’t been reading the above and your are freaking out because you fear for the privacy of the people, the names are fake, see above for details on this. I am showing the labels immediately in their final font family (here, Google’s Roboto). Instead of the on-element attribute, I used CSS for that:

css/shell.css

@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@100&display=swap');

.shell-subject-labels{
  font-family: 'Roboto', sans-serif;
}

If you are a pixel fucker, as I am, you might have noticed a slight shift in the subject identifiers respective to the guidelines. I will be adding a parameter to account for a relative shift for this and other <text> elements as shown in Figure 3.

Searchable text

Because the plot is being generated as SVG, the text is fully searchable, indexable, CSS stylable, Javascriptable, etc… (see Figure 7).

_Figure 7_: Because the plot is constructed as a SVG element, it consist purely of XML code allowing for all kinds of interactivity through CSS, Javascript and derivatives thereof. If you are viewing this in your browser, you can try and select some text in _Figure 6_.

Figure 7: Because the plot is constructed as a SVG element, it consist purely of XML code allowing for all kinds of interactivity through CSS, Javascript and derivatives thereof. If you are viewing this in your browser, you can try and select some text in Figure 6.

Plotting an annulus sector

Let us start plotting the visual elements representing the data. In a shell plot, the subject’s observation periods as well as the episodes it contains are being represented by annulus sectors as shown in Figure 2. In general, an annulus sector is being parameterized by:

  • center (\(c_x\) and \(c_y\))
  • inner radius (\(r_{inner}\))
  • outer radius (\(r_{outer}\))
  • start angle (\(\alpha_{min,j}\))
  • end angle (\(\alpha_{max,j}\))

However, we will have to manually construct this shape using the SVG <path> element. Suppose we would want to draw the upper arc of the observation highlighted in Figure 2 (with identifier abcdef) and we assume that the end angle for this sector is 50°, then we would create something like this (look at second group, the first is for the guidelines):

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
  <g fill="none" stroke="#00B0F0" stroke-width=".002">
    <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
    <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r=".3" />
  </g>
  <g fill="none" stroke="#00B0F0" stroke-width=".02">
    <path d="M 0.000 -0.860 A 0.860 0.860 50 0 1 0.659 -0.553" />
  </g>
</svg>

The above reads as Move to \(\{0.000, -0.860\}\) (all numbers in user units as per Figure 3), where \(-0.860\) comes from: \[-0.860=-r_{outer, 2}=-1 \cdot \left[a+(1-a)\frac{(n-i+1)}{n}\right]=-1 \cdot \left[0.3+(1-0.3)\frac{(5-2+1)}{5}\right]\]

, then draw an elliptical Arc with radius \(r_{outer, 2}=0.860\) (twice because \(r_x=r_y\) are axes of an ellipse and we need a circle), along a \(\Delta\alpha\) of 50°, taking the shorter of both arcs (because \(\Delta\alpha<\pi\) as indicated with 0), that moves clockwise (1) to the point \(\{0.659, -0.553\}\) where \(0.659=0.860\cdot cos(90-50)\) and \(-0.553=-0.860\cdot sin(90-50)\) (because of the vertically flipped coordinate system of SVG). Here is the result:

Figure 8: Drawing the upper arc of the highlighted observation in Figure 2 using an SVG <path> element.

From drawing an arc, it is a small step to drawing the annulus sector. Here is the adjusted SVG template:

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
  <g fill="none" stroke="#00B0F0" stroke-width=".002">
    <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
    <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r=".3" />
  </g>
  <g fill="none" stroke="#00B0F0" stroke-width=".02">
    <path d="
      M {{X_Start_Outer}} {{Y_Start_Outer}}
      A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
      L {{X_End_Inner}} {{Y_End_Inner}}
      A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
      L {{X_Start_Outer}} {{Y_Start_Outer}}" />
  </g>
</svg>

And here is some test code to complete the template based on the start angle (100°), the end angle (260°), the subject index (2), the total number of subjects (5):

shell.R

template <- "svg/shell_template_2.svg" %>%
  readChar(., file.info(.)$size)
out_data <- list()

# Knowns
angle_start <- 100
angle_end <- 260
i <- 2
n <- 5

# Calculated
out_data$Delta_Angle <- angle_end - angle_start
out_data$Large_Arc <- 1 * (out_data$Delta_Angle > 180)
out_data$R_Outer <-
  layout_default$a +
  (1 - layout_default$a) *
  (n - i + 1) / n
out_data$R_Inner <-
  layout_default$a +
  (1 - layout_default$a) *
  (n - i) / n
out_data$X_Start_Outer <- (
  out_data$R_Outer *
  cos((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$Y_Start_Outer <- (
  -1 * out_data$R_Outer *
  sin((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$X_End_Outer <- (
  out_data$R_Outer *
  cos((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$Y_End_Outer <- (
  -1 * out_data$R_Outer *
  sin((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$X_Start_Inner <- (
  out_data$R_Inner *
  cos((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$Y_Start_Inner <- (
  -1 * out_data$R_Inner *
  sin((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$X_End_Inner <- (
  out_data$R_Inner *
  cos((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$Y_End_Inner <- (
  -1 * out_data$R_Inner *
  sin((90 - angle_end) / 180 * pi)) %>%
  round(3)

template %>%
  whisker.render(data = out_data) %>% 
  cat
Figure 9: Drawing our first annulus sector.

With this beautiful smile we have overcome the biggest technical hurdles. We will now move to the interactivity aspect.

Hover info

As promised, I am going to make the plot interactive. I will test this interactivity on a single shape (the one we prepared in previous paragraph). Suppose I would like, whenever I hover over the shape, the name of the subject to appear in an information box below. As it is good practise to separate the HTML elements from their styling and from their actions. We start by defining a piece of HTML template that is to be completed with the data:

infobox-template.html

<div id="infobox-template-03" style="display: none;">
  <aside>
    <table class="shell-subject">
      <caption>Subject details</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Name</th>
        <td>{{Subject}}</td>
      </tr>
    </table>
  </aside>
</div>

The only piece of information to complete this template, in this case, is the subject’s identifier Subject. We know how to complete a template using whisker. However, this HTML snippet is to be complete not during R’s compile time but on the fly and so needs to reside on the client. So, where does the data come from? There are many options to store the data. Probably the slickest way to store the data client-side is by using the storage options of your browser. However, we are going to store the data inside attributes of the object we are hovering over. Having the data attached to its source makes a lot of sense, but may not be the most memory-efficient way. We are going to store the headers separately from the data values to avoid repetition, but even then we would be wasteful whenever we need multiple HTML elements types to respond to a mouse hovering.

We now wrap the svg inside a division add the infobox-03 element to hold the completed data:

shell_template.svg

<div class="shell-plot">
  <svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200" data-infobox-target="infobox-03"
    data-infobox-template="infobox-template-03">
    <g fill="none" stroke="#00B0F0" stroke-width=".002">
      <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
      <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
      <rect x="-1" y="-1" width="2" height="2" />
      <circle cx="0" cy="0" r=".3" />
      <text x="0" y="-0.5" text-anchor="middle" font-size=".2" fill="#00B0F0">Hover over me!</text>
    </g>
    <g fill="white" stroke="#00B0F0" stroke-width=".02">
      <path data-subject-name="abcdef" class="episode" d="
      M {{X_Start_Outer}} {{Y_Start_Outer}}
      A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
      L {{X_End_Inner}} {{Y_End_Inner}}
      A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
      L {{X_Start_Outer}} {{Y_Start_Outer}}" />
    </g>
  </svg>
  <div id="infobox-03" />
</div>

Notice the addition of the data-subject-name and class attributes on the path element as well as the division at the bottom which will be containing the completed template upon hover. Obviously, here one needs to work with the id attribute instead of the class attribute as we want to be able to have multiple shell plots on a HTML page each connected to their specific information box. So we have the data template, we have the data and we have the information box recipient. Now, we only need to write some Javascript code to handle the template completion upon hover:

shell.js

// Function carrying out the completion of the template
const complete = (template, data) => {
  for (var item in data) {
    template = template.replace(RegExp('{{' + item + '}}', 'g'), data[item]);
  }

  template = template.replace(/{\w+}/g, '');

  return template;
}

// Function to be executed upon mouse entering one of the source elements
const onEpisodeMouseEnter = (e) => {
  const targetName = e.target.closest("svg").dataset.infoboxTarget;
  const templateName = e.target.closest("svg").dataset.infoboxTemplate;

  target = document.getElementById(targetName);
  template = document.getElementById(templateName).innerHTML;

  data = { Subject: e.target.dataset.subjectName };
  target.innerHTML = complete(template, data);
}

// Function to be executed upon mouse leaving one of the source elements
const onEpisodeMouseLeave = (e) => {
  const targetName = e.target.closest("svg").dataset.infoboxTarget;

  target = document.getElementById(targetName);

  target.innerHTML = "";
}

// Function to be executed upon clicking one of the source elements
const onEpisodeMouseClick = (e) => {
  data = { Subject: e.target.dataset.subjectName };
  
  var type = "text/plain";
  var blob = new Blob([JSON.stringify(data)], { type });
  var clipData = [new ClipboardItem({ [type]: blob })];

  navigator.clipboard.write(clipData)
}

const onDocumentReady = () => {
  document.querySelectorAll(".episode").
    forEach(el => {
      el.addEventListener("mouseenter", onEpisodeMouseEnter);
      el.addEventListener("mouseleave", onEpisodeMouseLeave);
      el.addEventListener("click", onEpisodeMouseClick)
    });
}

// Binding the source element to the action function when document is ready
document.addEventListener('DOMContentLoaded', onDocumentReady);

Notice that I have mouseenter and mouseleave events declared as well as the click event to store the current data to the user’s clipboard.

Hover over me!
Figure 10: Hover over the sector to reveal the data attached. Mind that at this point no styling has been performed.

If you click the sector, the following text should be placed on your clipboard (try in browser as it may not work in your RStudio viewer):

{"Subject":"abcdef"}

Styling and legend

Within the R landscape I came across many ways to provide colors for a color scale. I have been thinking about the best way to implemented this for the shell plot and I came to the following conclusions:

  • Use the YAML configuration file for generic styling
  • Use CSS for dynamic or specialist styling
  • Provide grouping information as a factored feature in x

For example, suppose we want to define the color of the episode sectors based on the binned version of the Measure feature, we can prepare the data like so:

example.Rmd

measure_breaks <- c(-Inf, 2, 150, Inf)
labels <- c("Low", "Medium", "High")

epi[, Group := Measure %>%
    cut(breaks = measure_breaks, labels = labels) %>%
    addNA(ifany = TRUE)]

epi$Group %>% table
FALSE .
FALSE    Low Medium   High   <NA> 
FALSE   1676    852    155    283

With a properly defined factored feature, it becomes really easy to draw a legend. Here is some test code:

example.Rmd

library(colorspace)

cols_bord <- epi$Group %>%
  uniqueN %>%
  add(-1) %>%
  rainbow %>% 
  c("grey") %>% 
  darken(.3)
cols_body <- epi$Group %>%
  uniqueN %>%
  add(-1) %>%
  rainbow %>% 
  c("lightgrey")

out_data <- epi[order(Group), .(
  Lab_Y = -.9 + (.GRP) * .1,
  Symbol_Y = -.95 + (.GRP) * .1,
  Symbol_Stroke_Color = cols_bord[.GRP],
  Symbol_Color = cols_body[.GRP]), .(Label = Group)] %>% 
  rowSplit %>% unname

out_data = list(
  legend = out_data,
  Lab_Color = "#646464",
  Lab_Dim = .08,
  Symbol_Dim = .05,
  Symbol_X = -.5,
  Lab_X = -.4,
  Symbol_Stroke_Width = .01
  )

"svg/shell_template_4.svg" %>%
  readChar(., file.info(.)$size) %>%
  whisker.render(data = out_data) %>% 
  cat

Low Medium High NA

Figure 11: Example legend.

Label lengths

The last piece of the puzzle are the radial grid lines, including the subject axis. First, we need to calculate the maximum possible angle \(\alpha_{max}\) so that the observation sectors do not overlap with the subject labels. Then we can calibrate the observation axis so that \(t_{max}\) corresponds to \(\alpha_{max}\). Finally, we can use the pretty functionality of R so that the grid lines get logically-separated along the range of the observations axis.

To calculate \(\alpha_{max}\), we first need the computed lengths of the subject labels (\(\ell_i\)). I tried to use strwidth() and strheight() functions but later switched to writing my own function to measure the length of a label. The font family determines the width of the printed font, so we need to take that into consideration. As shown above in the CSS file, I have used the Roboto font but obviously, this will become an option. First, we look at the function to calculate label widths:

shell.R

library(systemfonts)

get_label_widths <- function(labels, font_height, style) {
  labels <- labels %>% as.character
  glyphs <- labels %>% strsplit("") %>% unlist %>% unique
  glyph_details <- glyph_info(glyphs, style$font$family) %>% as.data.table
  glyph_ar <- glyph_details[,
    .(Glyph = glyph, Width = x_advance, Height = height)]
  
  setkey(glyph_ar, "Glyph")
  
  label_width <- function(label) {
    label_ar <- glyph_ar[label %>% strsplit("") %>% 
        unlist, sum(Width) / max(Height)]
    
    return(font_height * label_ar)
  }

  label_widths <- labels %>%
    lapply(label_width) %>%
    unlist
  
  return(label_widths)
}

After converting the labels, which might be stored as a factor, to a character vector, we extract all used characters with %>% strplit("") %>% unlist. Then we query Roboto’s glyph table using systemfonts::glyph_info() function for these particular characters and store the resulting aspect ratios in the variable glyph_ar. Given a word and the font height (i.e. font size in user units), we can then easily calculate the width of a label as it would be printed on a screen in a particular font family and in a particular font size. Mind that, unlike for strwidth() and strheight() we do not actually need to plot the string, which saves a lot of resources (I may actually talk about green computing on another occasion).

shell.R

subject_names <- epi$Subject %>% levels %>% extract(1:13)
font_size <- (1 - layout_default$a) / length(subject_names)
style_01 <- list(font=list(family = "Roboto"))

(predicted <- subject_names %>%
  get_label_widths(font_size, style_01))
 [1] 0.6076923 0.5469231 0.4769231 0.6300000 0.4538462 0.6507692 0.4984615
 [8] 0.4984615 0.9000000 0.6784615 0.5923077 0.5746154 0.4223077

In case you want to reproduce this code and not want to install Roboto on your windows system, you can download the font, say in the fnt subfolder, import it and check its availability:

example.Rmd

library(extrafont)

font_import(paths = "fnt", prompt = FALSE)
fonts() %>% grep("Roboto", ., value = TRUE) %>% head

Radial grid lines

Now that we have the lengths of the printed labels, we can continue to calculate the max angle of the observation axis:

\[\alpha_{max}=min\left[atan2\left(-(\ell_i+b),\frac{(i-n)(1-a)}{n}\right)-a\right]-s\]

where \(\ell_i\) is the cumulative maximum of \(l_i\). Notice how I vectorize atan2 to ease programming:

shell.R

n <- length(predicted)
l_i <- cummax(predicted %>% rev) %>% rev
i <- 1:n
s <- pi / 12

x <- -(l_i + layout_default$b)
y <- (i - n) * (1  - layout_default$a) / n - layout_default$a

atan3 <- Vectorize(atan2) # Sorry, I couldn't resist

(alpha_max <- 5 * pi / 2 - min(atan3(-y, x)) - s)
FALSE [1] 5.241806

, which, in case you do not like radians, roughly corresponds to \(\frac{13\pi}{8}\) or 300°.

With \(\alpha_{max}\) known, we can finally draw the axial grid lines. I collect the max observation years of the top 200 rows of data (which corresponds to the 13 label names plotted before) and use pretty to find a nice distribution of angles:

shell.R

a <- .3
c <- .05
d <- .02

max_obs_years <- epi %>% head(200) %$% End %>% max
years_to_rad <- function(x) {
  (5 * pi / 2) - (x / max_obs_years * alpha_max)
}

obs_axis <- data.table(
  Label = c(0, max_obs_years) %>%
    pretty %>% 
    extract(. < (max_obs_years)))

obs_axis[, Axis_Angle := Label %>% years_to_rad]
obs_axis[, Gridline_Start_X := a * cos(Axis_Angle)]
obs_axis[, Gridline_Start_Y := -a * sin(Axis_Angle)]
obs_axis[, Gridline_End_X := (1 + d) * cos(Axis_Angle)]
obs_axis[, Gridline_End_Y := -(1 + d) * sin(Axis_Angle)]
obs_axis[, Axis_Label_X := (1 + d + c) * cos(Axis_Angle)]
obs_axis[, Axis_Label_Y := -(1 + d + c) * sin(Axis_Angle) + 0.025]
obs_axis[, Anchor := ifelse(abs(Axis_Label_X) < a, "middle",
  ifelse(Axis_Label_X > 0, "start", "end"))]

And here is the result:

out_data = list(
  grid_lines = obs_axis %>% rowSplit %>% unname,
  layout = list(m = .053),
  Gridline_Color = "pink",
  Gridline_Width = .006,
  Unit = "yr"
  )

"svg/shell_template_5.svg" %>%
  readChar(., file.info(.)$size) %>%
  whisker.render(data = out_data) %>% 
  cat

0 yr 1 yr 2 yr 3 yr 4 yr

Figure 12: Test of drawing the gridlines (pink).

Not shown here is the shortening of the grid lines based on the data as indicated in Figure 3.

Putting it all together

You can find the final solution on my Github page.

Examples

Conflicts

The Uppsala Conflict Data Program (UCDP) provides global data about the wars and other forms of armed conflicts that have raged during the past 30 years.

conflicts.html

<div class="infobox-template" style="display: none;">
  <aside>
    <p class="shell-message">Click episode to copy the data to the clipboard.</p>
    <table class="cycle">
      <caption>{{Subject}}</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Period</th>
        <td>{{Start_Date}} - {{End_Date}}</td>
      </tr>
      <tr>
        <th scope="row">Type</th>
        <td>{{Type}}</td>
      </tr>
      <tr>
        <th scope="row">Location</th>
        <td>{{Location}}</td>
      </tr>
      <tr>
        <th scope="row">Country</th>
        <td>{{Country}}</td>
      </tr>
    </table>
    <table class="observations">
      <caption>Casualties</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Side A | Side B</th>
        <td>{{Casualties_Sides}}</td>
      </tr>
      <tr>
        <th scope="row">Civilian</th>
        <td>{{Casualties_Civilians}}</td>
      </tr>
      <tr>
        <th scope="row">Estimate (low - high)</th>
        <td>{{Casualties_Estimate_String}}</td>
      </tr>
    </table>
    <table class="subject">
      <caption>Conflict details</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Conflict name</th>
        <td>{{Subject}}</td>
      </tr>
      <tr>
        <th scope="row">Sides</th>
        <td>{{Sides}}</td>
      </tr>
      <tr>
        <th scope="row">Duration</th>
        <td>{{Conflict_Duration}} yrs</td>
      </tr>
      <tr>
        <th scope="row">Activity</th>
        <td>{{Percent_Active}} %</td>
      </tr>
    </table>
  </aside>
</div>
conflicts <- fread("dat/conflicts.tsv", sep = "\t")

breaks <- c(-Inf, 1, 2, 4, 8, Inf)
labels <- c("Very low", "Low", "Medium", "High", "Very high")

conflicts[, Group :=
    Casualties_Estimate %>% log %>%
    cut(breaks = breaks, labels = labels)]

layout <- copy(layout_default)
layout$legend <- data.table(
  Id = labels %>% c(NA),
  Label = labels %>% c("(missing)"),
  Color = colorRampPalette(c("steelblue", "orange"))(5) %>%
    c("lightgrey"))
layout$fx <- .4
layout$fy <- -1
layout$unit = "yr"
layout$legend_title <- "Casualties"

conflicts[
  Episode_Count > 10 & Conflict_Duration < 10 &
    Duration_Active > .8] %>%
  shell_plot(
    width = "500px",
    height = "500px",
    layout = layout,
    infobox_template = "html/conflicts.html"
  ) %>% cat
Afghanistan - United Kingdom, United States of America Suqour al-Sham, Syrian Liberation Front - HTS Republic of Bosnia-Herzegovina - Serbian Republic of Bosnia-Herzegovina Baz al-Islamiya, Dawn of Freedom Brigades, Islamic Front, Jabhat al-Akrad, Jabhat Fateh al-Sham, Liwa al-Sultan Murad, Majd al-Sham - IS 23rd Division, Jaysh al Nasr, Jaysh al-Nukhba, Sham Legion, SNA, Syrian Liberation Front - SDF Republic of Croatia - Serbian Republic of Krajina Ukraine: Lugansk Authenticity and Development Front, Islamic Front, Jabhat Fateh al-Sham - IS IS - Jaysh al-Sanadid, MFS, PYD Yemen (North Yemen): South Yemen Ukraine: Donetsk IS - Jaysh al-Sanadid, Khabour Guards, MFS, PYD IS - Islamic Front, Jabhat Fateh al-Sham Iraq - Kuwait Caborca Cartel, Sinaloa Cartel - Los Mayos faction - Sinaloa Cartel - Chapitos faction Sinaloa Cartel - El Pepillo faction - Sinaloa Cartel - Los 28 faction Los Aztecas - La Linea JVP - Civilians M23 - Civilians Serbia (Yugoslavia): Croatia Nicaragua: Government AFDL - Civilians SNA - SDF National Front for Liberation, SNA - SDF LPC - NPFL AFRC - Civilians Russia (Soviet Union): Nagorno-Karabakh IS - PYD Bosnia-Herzegovina: Croat ULIMO - Civilians LURD - Civilians Kamajors - Civilians Jalisco Cartel New Generation - La Familia Jalisco Cartel New Generation - Nueva Plaza Cartel Jalisco Cartel New Generation - Santa Rosa de Lima Cartel ULIMO - K - Civilians LPC - Civilians Cameroon: Ambazonia Serbia (Yugoslavia): Kosovo Sinaloa Cartel - Forces of Damaso Comando Vermelho - GDE ANC - Civilians Jalisco Cartel New Generation - La Nueva Familia Philippines: Islamic State RCD - Civilians IFP - Civilians IS - SDF Nigeria: Islamic State Afghanistan: Islamic State Jalisco Cartel New Generation - Sinaloa Cartel Ukraine: Novorossiya Guerreros Unidos - Los Rojos NPFL - Civilians Guatemala: Government Syria: Islamic State Government of South Sudan - Civilians Jalisco Cartel New Generation - Los Zetas Syria: Government Janjaweed - Civilians Casualties Very low Low Medium High Very high (missing) 0 yr 2 yr 4 yr 6 yr 8 yr

Waves

The NOAA/WDS tsunami database consists of the set of global historical tsunami events for which sufficient evidence is being provided.

waves <- fread("dat/waves.tsv", sep = "\t")

breaks <- c(-Inf, 1, 2, 5, 10, 25, Inf)
labels <- c("< 1m", "1-2m", "2-5m", "5-10m", "10-25m", "> 25m")

waves[, Group := Wave_Height %>%
    cut(breaks = breaks, labels = labels)]

layout <- copy(layout_default)

layout$legend <- data.table(
  Id = labels %>% c(NA),
  Label = labels %>% c("(missing)"),
  Color = colorRampPalette(c("steelblue", "orange"))(6) %>%
    c("lightgrey"))
layout$legend_title <- "Wave height"
layout$fx <- -.4
layout$fy <- -.9
layout$s <- .3
layout$unit <- function (x) {
  return(paste0(10 ^ x, "km"))
}

waves[, Start :=  Wave_Distance %>% add(.1) %>% log10]
waves[, End := Start + .1]
waves[, Subject :=  Quake %>% as.character %>% as.factor]

waves <- waves[!is.na(Start) & !is.na(Subject)]
waves[, Episode_Count := .N, Subject]

waves[Episode_Count > 10 & Episode_Count < 100] %>%
  shell_plot (
    width = "500px",
    height = "800px",
    layout = layout, infobox_template = "html/waves.html") %>% cat
2259 2326 2234 2368 315 5570 1929 682 5412 2096 1260 1747 2036 503 1719 322 2430 1898 2135 2110 2190 1899 2124 377 1038 2136 226 1717 1244 2431 2466 2126 1208 1734 367 2105 1718 967 1716 2235 965 2156 2025 2145 4622 2437 2436 1957 1241 2214 2046 1914 568 2403 2061 1741 2265 2092 2056 2254 904 966 2253 2001 1373 5628 1128 2227 1859 2280 2101 2202 1996 1766 799 1458 38 1761 2103 2086 5598 2208 1622 5571 424 1349 971 2398 2055 1880 2079 2006 5449 1840 468 2044 2277 2281 2076 2075 2194 5463 1856 1841 3691 2221 2125 456 2273 3228 2034 1457 2060 2080 1998 1508 1806 1649 2003 2016 2162 1446 2091 1234 2487 2261 1948 835 2257 2749 689 2339 864 837 1645 2083 2035 5430 4312 1947 5462 3647 1964 2252 2213 4442 1884 2164 2263 1984 2015 1907 1632 2429 1175 1331 2122 3019 2021 1982 2068 708 1339 1752 2249 2158 1505 1142 Wave height < 1m 1-2m 2-5m 5-10m 10-25m > 25m (missing) 1km 10km 100km 1000km 10000km

  1. I tend to avoid comma-separators where possibly for, I hope, the obvious reasons. Whoever has started the idea that a comma would be a good separator is a fool anyway↩︎