Figure 1: Shell plot visualizing the binned count of cancer cells in the blood of 164 anonymized subjects during treatment. Total observation period ranged from a few days to more than 8 years. Hover of the episode sectors to reveal additional information of the episode and linked observation and subject. Click on an episode to copy the data row to the clipboard. Scroll to zoom in and out.

Quick start

I have not created an R package for this plot yet. I would love to incorporate some feedback first so as to improve its usability and make it more robust and applicable in a wider range of real-life scenarios. For now you can follow the steps below:

Download and install R and RStudio
Prepare a longitudinal data.table so that each row contains an episode with a Subject and with numeric features called Start and End.
Decide on which feature to determine the coloring of the episode sectors and place the calculated color for each episode in the feature named Group.
Download or clone the Shell plot repository and copy the shell subfolder as an immediate subfolder of your project folder (otherwise adjust respective paths in the main functions default arguments).
Generate the HTML containing the SVG, JavaScript and CSS of the shell plot like so:
```
my_dt %>% shell %>% cat
```
Either render the HTML directly in RMarkdown or copy-paste it into your own HTML page.

Introduction

Around the year 2015, I devised a plot type that I baptised the shell plot (Figure 1). It allows for pattern discovery in longitudinal data with highly variable and positively skewed observation periods. One example that we are going to demonstrate here is the observational study where many patients (the subject) are being followed during treatments (events) ranging from a couple of days to several years. Another example could be the analysis of volcano eruption episodes (subject) during which all kinds of specific events occur such as explosions, fumarole activity, ash expulsion, … that deserve temporal analysis. The observational axis would typically represent the time dimension but could also represent distance or another quantity altogether. We will discuss some example at the end of this article.

In comparison with the older versions of this plot, I decided to make this one vector-based, to write it in pure HTML and to add interactivity such as zooming and showing details upon hovering. I went to the trouble of having most parameterization automated so that only a minimal user input is required, while still allowing endless flexibility in the layout and styling part.

Building the plot from scratch

Terminology

Let us take the time to think this through, because although it looks as simple as converting Cartesian coordinates to polar ones, there is a little bit more to it. First, to agree on the terminology used, I drew up Figure 2:

Figure 2: Terminology used for the shell plot.

Layout parameterization

Next, let us think about how to parameterize the layout. Details are shown in Figure 3. We begin by centering our plot around the ${x, y}$-coordinate ${0, 0}$ and by confining the radius range of the annuli sectors containing the data to $[a, 1]$, where $a$ is the size of the hole in the middle of the plot. Next we need to calculate the lengths of the labels $l_i$ based on the font used, the font size and the font style. These label-lengths allow us to determine the maximum allowed sector angle $\alpha_{max}$ so as to avoid that data sectors overlap with the subject labels. Next, we draw the outer observation sectors that cover the complete observation-period per subject and then we draw the episode sectors. The distribution of the data determines the length of the radial grid lines and finally, the positions of all the elements on the plot determine the framing of the view port. Now that the parameterization is in place, we can draw the process diagram as shown in Figure 4.

Figure 3: Parameterization of the layout for the shell plot. The circled numbers indicate the order in which to calculate the (derived) parameters. See text for more details.

Figure 4: Process diagram for the shell plot. Only one input is truly compulsory and that is the observational data with the features Subject, Start, End, and Group. Although Width and Height are strictly layout parameters, they are treated as true input values because of their importance. id represents an unique HTML identifier to avoid conflict when working with more than one shell plot per HTML page. The input gridlines corresponds to the at argument of the axis() function. To allow extra information to be shown upon hover, an infobox template must be provided. Layout and styling information can be provided as a YAML configuration file. Additional and dynamic styling (e.g. highlighting of sector upon hover) is provided as CSS style sheets.

Methodology

Data

At some point we will need data. However, let us first load our libraries. There are two libraries that I use all the time:

example.Rmd

library(magrittr)
library(data.table)

The simplified and pseudonymized dataset of cancer treatments looks like this (only first few records shown):

example.Rmd

epi <- fread("dat/cancer_tx.tsv", encoding = "UTF-8",
  stringsAsFactors = TRUE)
epi %>% head(10) %>% kable

Subject	Start	End	Line	Cycle	Type	Treatment	Regime	Measure	Group	Start_Day	End_Day	Age
Adelle Delaney	0.0000000	0.0547581	1	1	Cycle	TX_1	Drug E	4	Low	1	21	48
Adelle Delaney	0.0547581	0.1122542	1	2	Cycle	TX_1	Drug E	0	Zero	21	42	48
Adelle Delaney	0.1122542	0.1697502	1	3	Cycle	TX_1	Drug E	1	Low	42	63	48
Adelle Delaney	0.1697502	0.2272463	1	4	Cycle	TX_1	Drug E	0	Zero	63	84	48
Adelle Delaney	0.2272463	0.2847423	1	5	Cycle	TX_1	Drug E	1	Low	84	105	48
Adelle Delaney	0.2847423	0.3422384	1	6	Cycle	TX_1	Drug E	2	Low	105	126	48
Adelle Delaney	0.3997344	0.4572305	1	7	Cycle	TX_1	Drug E	0	Zero	147	168	48
Adelle Delaney	0.6488840	0.7063800	1	0	Progression	TX_1	Drug E	9	Medium	238	241	48
Adnan Bridges	0.0000000	0.0574960	1	1	Cycle	TX_1	Drug G / Drug C	753	Unbound	1	22	59
Adnan Bridges	0.0876130	0.1451091	1	2	Cycle	TX_1	Drug G / Drug C	436	Unbound	33	54	59

Start and End are the start and end times of the subject’s treatment episode expressed in years after initial admission, whereas Start_Day and End_Day were calculated by dividing the former by 365.2425. The features Line to Regime correspond to the treatment descriptors that you would typically find in oncology. Measure represents some quantifiable outcome measure. Finally, the Age is a fake feature to demonstrate the use of subject-specific data. It was drawn as epi[, Age := rnorm(1, 45, 15) %>% round, Subject].

Personalizing subject identifiers

Subject names are fake and are derived from the Quick Name Generator so as to remind ourselves that we are dealing with real people even if the data has been anonymized beyond recovery. If you are in a similar situation in which you want to protect the people’s identity while still reminding anyone that you are dealing with people instead of records or numbers, then you can collect a sufficiently large set of unique names (first and/or last names dependent on the number needed) in an headed tab-separated file names.tsv¹ and do something along these lines:

example.Rmd

names <- fread("names.tsv", sep = "/t") # One feature called 'Name'
dat[, Subject := names$Name[Subject_ID %>% as.factor %>% as.numeric]

Note that the sep = "\t" argument is only needed here because the file consist of a single feature and fread might be confused about what separator to choose, especially if the names you collected from some online generator contains comma values.

Without the shell plot…

The summary below and Figure 5 show that the accumulated observation time per Subject typically varies greatly (from 0.066 years = 24 days to more than 8.5 years). Because we have such a wide range of observation periods among subjects, it becomes hard to keep the overview. Log-transforming the time axis is not ideal because it hinders interpretation and because there might be statistical objections. That is where the idea arose to bend the observational axis into a circle while keeping the subjects sorted from shorter to longer observation periods.

                Subject    TX_Time
  1:       Atif Vickers 0.06570977
  2:     Rafael Whitley 0.11225419
  3:     Nataniel Ayers 0.11499209
  4:  Annabell Mohammed 0.13415744
  5:   Khaleesi Plummer 0.13689535
 ---                              
160:        Elli Gamble 5.56068913
161:       Zahid Deacon 5.68389495
162: Lisa-Marie Allison 5.79614913
163:       Sunil Lucero 6.08636728
164:    Zakariya Tucker 8.55595940

Figure 5: Distribution of the total treatment time (expressed in days since admission) among all subjects. The layout in this plot is hindered by the longer observation period of Zakariya Tucker which makes other observation scale along. In many realistic cases, the distribution of observation periods is even more skewed and leptocurtic then in this dataset.

Shell function signature

Based on the process diagram shown above, we can write out the signature and usage of shell function:

shell.R

shell_plot <- function(episodes, width, height, layout, style, id,
  gridlines = NULL, shell_template = "shell/svg/shell_template.svg",
  infobox_template = "shell/html/infobox-template.html",
  javascript = "shell/js/shell.js", css = "shell/css/shell.css") {
  
  # Code goes here
}

epi %>% shell

The SVG template

In the next paragraph, we will start with plotting the subject labels. But for that, we first need a plotting device. Instead of using the base plotting system of R, I decided here to generate SVG code directly instead. In order to keep the plot function flexible, adjustable and configurable, I will utilize templates and work with the whisker package which implements mustache. Below is the first version of the SVG template for our plot:

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="{{width}}" height="{{height}}">
  <g fill="none" stroke="#00B0F0" stroke-width=".004">
    <line x1="0" y1="-1.1" x2="0" y2="2.2" />
    <line x1="-.7" y1="0" x2="1.4" y2="0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r="{{layout.a}}" />
  </g>
  <g font-size="{{layout.m}}">
    {{#subjects}}
    <text x="{{Lab_X}}"  y="{{Lab_Y}}" text-anchor="end">{{Subject}}</text>
    {{/subjects}}
  </g>
</svg>

The width and height of the viewport are variable and must be completed, hence the double curly braces. In this tutorial, we will be using the chunk options to set these variables with width <- opts_current$get("out.width") and opts_current$get("out.height"), respectively (see below). Mind that the viewbox is currently being hard coded. Later, we will calculate these values from the content. The numbers .7 -1.1 1.4 2.2 are the left, top, width and height of the viewing window and are in the user units of the SVG element. Next we see a group tag <g> in which I temporally added some guidelines to help me positioning graph element during development. I will remove these later. Finally, we see a whisker loop structure that will iterate over subjects ({{#subjects}}<loop body>{{/subjects}}). For each subject, we will be gathering the subject’s identifier (Subject) for the label text as well as the Lab_X and Lab_Y features for the proper placement of the labels.

Loading width and height from chunk options

We will soon load and complete the SVG template, but for that we will need some data to complete. As mentioned above, we load the out.width and out.height chunk options into the variables width and height, respectively. First, we define the RMArkdown chunk as such:

```{r shell_dev_02, out.height="800px", out.width="500px"}

and then we acces these options:

Example.Rmd

width <- opts_current$get("out.width")
height <- opts_current$get("out.height")

Loading layout options from the YAML configuration file

Simply save a file named config.yaml in the root folder:

config.yaml

default:
  layout:
    a: .10  # Size of the central hole in the plot in user units (UU)
    b: .02  # Space between subject labels and subject axis (UU)

Remember that a represents the radius of the central hole in the middle of the graph and b is the relative distance between the labels and the subject axis (see Figure 3 for details). The config packages makes it really easy to read from and write to YAML configuration files.

shell.R

library(config)

config <- config::get(file = "shell/config.yaml")

layout <- config$layout

Using `whisker` for templating

In order to complete the above template, we reference the whisker package. With the function base::readchar() we can allocate the template to the string shell_template:

shell.R

library(whisker)

shell_template <- "svg/shell_template.svg" %>%
  readChar(., file.info(.)$size)

We are now ready to adjust the shell() function so as to print the subject labels:

shell.R

shell_plot_01 <- function(episodes, layout) {
  layout$n <- episodes$Subject %>% uniqueN
  layout$m <- (1 - layout$a) / layout$n
  
  subjects <- episodes[, .(
    Episode_Count = .N,
    Lab_X = -layout$b,
    Lab_Y = .GRP * layout$m - 1), Subject] %>%
    rowSplit %>% unname
  
  out_data <- list(
    width = width,
    height = height,
    subjects = subjects,
    layout = layout)
  
  out <- whisker.render(
    template = shell_template,
    data = out_data)

  return(out)
}

Dynamically setting font size

An added advantage about plotting directly to SVG, is that one can easily scale texts. Here, the font size of the labels is set to the value $m$ (see Figure 3 for more details):

\[m=\frac{(1-a)}{n}\]

Within R, this task is trickier as text height in R using the base::strheight() function is based on the height of the capital letter ‘M’ (see ?strheight for details on this).

Completing a template from a data table

Notice the use of data.table in the above code snippet:

shell.R

[...]

subjects <- episode[, .(
  Episode_Count = .N,
  Lab_X = -layout$b,
  Lab_Y = .GRP * layout$m - 1), Subject] %>%
  rowSplit %>% unname

[...]

By now, data tables are well established I assume, but just in case: the above translates to:

Take the episodes data table, perform no filtering on it (nothing before the first comma), group by subject (third parameter inside the []) and then aggregate the rows in that group (containing the episode for that subject) into a single row with the features Episode_Count containing the number of episodes (using data.table’s special variable .N) and the label positions Lab_X and Lab_Y.

The whisker::rowSplit() function is a convenience function to split a data.frame, data.table or matrix object into a list of lists allowing for the whisker loop structure inside the template to work properly. In the variable out_data, we gather all the data to be processed by whisker to complete the template. Notice how the dot in {{layout.a}} (see template above) allows the reference of children (and grandchildren, etc…) within a complex object.

Plotting the subject labels

All that we are left to do is to test whether the subject label are plotted in the correct order, in the correct position and with the dynamically adjusted font size:

example.Rmd

canc %>% head(200) %>% shell_plot_01(layout) %>% cat

canc %>% head(500) %>% shell_plot_01(layout) %>% cat

Figure 6: Plotting the subject identifiers of the top 200 or 500 episodes on a canvas containing a number of guidelines.

Again, if you haven’t been reading the above and your are freaking out because you fear for the privacy of the people, the names are fake, see above for details on this. I am showing the labels immediately in their final font family (here, Google’s Roboto). Instead of the on-element attribute, I used CSS for that:

css/shell.css

@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@100&display=swap');

.shell-subject-labels{
  font-family: 'Roboto', sans-serif;
}

If you are a pixel fucker, as I am, you might have noticed a slight shift in the subject identifiers respective to the guidelines. I will be adding a parameter to account for a relative shift for this and other <text> elements as shown in Figure 3.

Searchable text

Because the plot is being generated as SVG, the text is fully searchable, indexable, CSS stylable, Javascriptable, etc… (see Figure 7).

Figure 7: Because the plot is constructed as a SVG element, it consist purely of XML code allowing for all kinds of interactivity through CSS, Javascript and derivatives thereof. If you are viewing this in your browser, you can try and select some text in Figure 6.

Plotting an annulus sector

Let us start plotting the visual elements representing the data. In a shell plot, the subject’s observation periods as well as the episodes it contains are being represented by annulus sectors as shown in Figure 2. In general, an annulus sector is being parameterized by:

center ($c_x$ and $c_y$)
inner radius ($r_{inner}$)
outer radius ($r_{outer}$)
start angle ($\alpha_{min,j}$)
end angle ($\alpha_{max,j}$)

However, we will have to manually construct this shape using the SVG <path> element. Suppose we would want to draw the upper arc of the observation highlighted in Figure 2 (with identifier abcdef) and we assume that the end angle for this sector is 50°, then we would create something like this (look at second group, the first is for the guidelines):

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
  <g fill="none" stroke="#00B0F0" stroke-width=".002">
    <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
    <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r=".3" />
  </g>
  <g fill="none" stroke="#00B0F0" stroke-width=".02">
    <path d="M 0.000 -0.860 A 0.860 0.860 50 0 1 0.659 -0.553" />
  </g>
</svg>

The above reads as Move to $\{0.000, -0.860\}$ (all numbers in user units as per Figure 3), where $-0.860$ comes from: \[-0.860=-r_{outer, 2}=-1 \cdot \left[a+(1-a)\frac{(n-i+1)}{n}\right]=-1 \cdot \left[0.3+(1-0.3)\frac{(5-2+1)}{5}\right]\]

, then draw an elliptical Arc with radius $r_{outer, 2}=0.860$ (twice because $r_x=r_y$ are axes of an ellipse and we need a circle), along a $\Delta\alpha$ of 50°, taking the shorter of both arcs (because $\Delta\alpha<\pi$ as indicated with 0), that moves clockwise (1) to the point $\{0.659, -0.553\}$ where $0.659=0.860\cdot cos(90-50)$ and $-0.553=-0.860\cdot sin(90-50)$ (because of the vertically flipped coordinate system of SVG). Here is the result:

Figure 8: Drawing the upper arc of the highlighted observation in Figure 2 using an SVG <path> element.

From drawing an arc, it is a small step to drawing the annulus sector. Here is the adjusted SVG template:

shell_template.svg

<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
  <g fill="none" stroke="#00B0F0" stroke-width=".002">
    <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
    <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
    <rect x="-1" y="-1" width="2" height="2" />
    <circle cx="0" cy="0" r=".3" />
  </g>
  <g fill="none" stroke="#00B0F0" stroke-width=".02">
    <path d="
      M {{X_Start_Outer}} {{Y_Start_Outer}}
      A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
      L {{X_End_Inner}} {{Y_End_Inner}}
      A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
      L {{X_Start_Outer}} {{Y_Start_Outer}}" />
  </g>
</svg>

And here is some test code to complete the template based on the start angle (100°), the end angle (260°), the subject index (2), the total number of subjects (5):

shell.R

template <- "svg/shell_template_2.svg" %>%
  readChar(., file.info(.)$size)
out_data <- list()

# Knowns
angle_start <- 100
angle_end <- 260
i <- 2
n <- 5

# Calculated
out_data$Delta_Angle <- angle_end - angle_start
out_data$Large_Arc <- 1 * (out_data$Delta_Angle > 180)
out_data$R_Outer <-
  layout_default$a +
  (1 - layout_default$a) *
  (n - i + 1) / n
out_data$R_Inner <-
  layout_default$a +
  (1 - layout_default$a) *
  (n - i) / n
out_data$X_Start_Outer <- (
  out_data$R_Outer *
  cos((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$Y_Start_Outer <- (
  -1 * out_data$R_Outer *
  sin((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$X_End_Outer <- (
  out_data$R_Outer *
  cos((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$Y_End_Outer <- (
  -1 * out_data$R_Outer *
  sin((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$X_Start_Inner <- (
  out_data$R_Inner *
  cos((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$Y_Start_Inner <- (
  -1 * out_data$R_Inner *
  sin((90 - angle_start) / 180 * pi)) %>%
  round(3)
out_data$X_End_Inner <- (
  out_data$R_Inner *
  cos((90 - angle_end) / 180 * pi)) %>%
  round(3)
out_data$Y_End_Inner <- (
  -1 * out_data$R_Inner *
  sin((90 - angle_end) / 180 * pi)) %>%
  round(3)

template %>%
  whisker.render(data = out_data) %>% 
  cat

Figure 9: Drawing our first annulus sector.

With this beautiful smile we have overcome the biggest technical hurdles. We will now move to the interactivity aspect.

Hover info

As promised, I am going to make the plot interactive. I will test this interactivity on a single shape (the one we prepared in previous paragraph). Suppose I would like, whenever I hover over the shape, the name of the subject to appear in an information box below. As it is good practise to separate the HTML elements from their styling and from their actions. We start by defining a piece of HTML template that is to be completed with the data:

infobox-template.html

<div id="infobox-template-03" style="display: none;">
  <aside>
    <table class="shell-subject">
      <caption>Subject details</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Name</th>
        <td>{{Subject}}</td>
      </tr>
    </table>
  </aside>
</div>

The only piece of information to complete this template, in this case, is the subject’s identifier Subject. We know how to complete a template using whisker. However, this HTML snippet is to be complete not during R’s compile time but on the fly and so needs to reside on the client. So, where does the data come from? There are many options to store the data. Probably the slickest way to store the data client-side is by using the storage options of your browser. However, we are going to store the data inside attributes of the object we are hovering over. Having the data attached to its source makes a lot of sense, but may not be the most memory-efficient way. We are going to store the headers separately from the data values to avoid repetition, but even then we would be wasteful whenever we need multiple HTML elements types to respond to a mouse hovering.

We now wrap the svg inside a division add the infobox-03 element to hold the completed data:

shell_template.svg

<div class="shell-plot">
  <svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200" data-infobox-target="infobox-03"
    data-infobox-template="infobox-template-03">
    <g fill="none" stroke="#00B0F0" stroke-width=".002">
      <line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
      <line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
      <rect x="-1" y="-1" width="2" height="2" />
      <circle cx="0" cy="0" r=".3" />
      <text x="0" y="-0.5" text-anchor="middle" font-size=".2" fill="#00B0F0">Hover over me!</text>
    </g>
    <g fill="white" stroke="#00B0F0" stroke-width=".02">
      <path data-subject-name="abcdef" class="episode" d="
      M {{X_Start_Outer}} {{Y_Start_Outer}}
      A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
      L {{X_End_Inner}} {{Y_End_Inner}}
      A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
      L {{X_Start_Outer}} {{Y_Start_Outer}}" />
    </g>
  </svg>
  <div id="infobox-03" />
</div>

Notice the addition of the data-subject-name and class attributes on the path element as well as the division at the bottom which will be containing the completed template upon hover. Obviously, here one needs to work with the id attribute instead of the class attribute as we want to be able to have multiple shell plots on a HTML page each connected to their specific information box. So we have the data template, we have the data and we have the information box recipient. Now, we only need to write some Javascript code to handle the template completion upon hover:

shell.js

// Function carrying out the completion of the template
const complete = (template, data) => {
  for (var item in data) {
    template = template.replace(RegExp('{{' + item + '}}', 'g'), data[item]);
  }

  template = template.replace(/{\w+}/g, '');

  return template;
}

// Function to be executed upon mouse entering one of the source elements
const onEpisodeMouseEnter = (e) => {
  const targetName = e.target.closest("svg").dataset.infoboxTarget;
  const templateName = e.target.closest("svg").dataset.infoboxTemplate;

  target = document.getElementById(targetName);
  template = document.getElementById(templateName).innerHTML;

  data = { Subject: e.target.dataset.subjectName };
  target.innerHTML = complete(template, data);
}

// Function to be executed upon mouse leaving one of the source elements
const onEpisodeMouseLeave = (e) => {
  const targetName = e.target.closest("svg").dataset.infoboxTarget;

  target = document.getElementById(targetName);

  target.innerHTML = "";
}

// Function to be executed upon clicking one of the source elements
const onEpisodeMouseClick = (e) => {
  data = { Subject: e.target.dataset.subjectName };
  
  var type = "text/plain";
  var blob = new Blob([JSON.stringify(data)], { type });
  var clipData = [new ClipboardItem({ [type]: blob })];

  navigator.clipboard.write(clipData)
}

const onDocumentReady = () => {
  document.querySelectorAll(".episode").
    forEach(el => {
      el.addEventListener("mouseenter", onEpisodeMouseEnter);
      el.addEventListener("mouseleave", onEpisodeMouseLeave);
      el.addEventListener("click", onEpisodeMouseClick)
    });
}

// Binding the source element to the action function when document is ready
document.addEventListener('DOMContentLoaded', onDocumentReady);

Notice that I have mouseenter and mouseleave events declared as well as the click event to store the current data to the user’s clipboard.

Figure 10: Hover over the sector to reveal the data attached. Mind that at this point no styling has been performed.

If you click the sector, the following text should be placed on your clipboard (try in browser as it may not work in your RStudio viewer):

{"Subject":"abcdef"}

Styling and legend

Within the R landscape I came across many ways to provide colors for a color scale. I have been thinking about the best way to implemented this for the shell plot and I came to the following conclusions:

Use the YAML configuration file for generic styling
Use CSS for dynamic or specialist styling
Provide grouping information as a factored feature in x

For example, suppose we want to define the color of the episode sectors based on the binned version of the Measure feature, we can prepare the data like so:

example.Rmd

measure_breaks <- c(-Inf, 2, 150, Inf)
labels <- c("Low", "Medium", "High")

epi[, Group := Measure %>%
    cut(breaks = measure_breaks, labels = labels) %>%
    addNA(ifany = TRUE)]

epi$Group %>% table

FALSE .
FALSE    Low Medium   High   <NA> 
FALSE   1676    852    155    283

With a properly defined factored feature, it becomes really easy to draw a legend. Here is some test code:

example.Rmd

library(colorspace)

cols_bord <- epi$Group %>%
  uniqueN %>%
  add(-1) %>%
  rainbow %>% 
  c("grey") %>% 
  darken(.3)
cols_body <- epi$Group %>%
  uniqueN %>%
  add(-1) %>%
  rainbow %>% 
  c("lightgrey")

out_data <- epi[order(Group), .(
  Lab_Y = -.9 + (.GRP) * .1,
  Symbol_Y = -.95 + (.GRP) * .1,
  Symbol_Stroke_Color = cols_bord[.GRP],
  Symbol_Color = cols_body[.GRP]), .(Label = Group)] %>% 
  rowSplit %>% unname

out_data = list(
  legend = out_data,
  Lab_Color = "#646464",
  Lab_Dim = .08,
  Symbol_Dim = .05,
  Symbol_X = -.5,
  Lab_X = -.4,
  Symbol_Stroke_Width = .01
  )

"svg/shell_template_4.svg" %>%
  readChar(., file.info(.)$size) %>%
  whisker.render(data = out_data) %>% 
  cat

Figure 11: Example legend.

Label lengths

The last piece of the puzzle are the radial grid lines, including the subject axis. First, we need to calculate the maximum possible angle $\alpha_{max}$ so that the observation sectors do not overlap with the subject labels. Then we can calibrate the observation axis so that $t_{max}$ corresponds to $\alpha_{max}$. Finally, we can use the pretty functionality of R so that the grid lines get logically-separated along the range of the observations axis.

To calculate $\alpha_{max}$, we first need the computed lengths of the subject labels ($\ell_i$). I tried to use strwidth() and strheight() functions but later switched to writing my own function to measure the length of a label. The font family determines the width of the printed font, so we need to take that into consideration. As shown above in the CSS file, I have used the Roboto font but obviously, this will become an option. First, we look at the function to calculate label widths:

shell.R

library(systemfonts)

get_label_widths <- function(labels, font_height, style) {
  labels <- labels %>% as.character
  glyphs <- labels %>% strsplit("") %>% unlist %>% unique
  glyph_details <- glyph_info(glyphs, style$font$family) %>% as.data.table
  glyph_ar <- glyph_details[,
    .(Glyph = glyph, Width = x_advance, Height = height)]
  
  setkey(glyph_ar, "Glyph")
  
  label_width <- function(label) {
    label_ar <- glyph_ar[label %>% strsplit("") %>% 
        unlist, sum(Width) / max(Height)]
    
    return(font_height * label_ar)
  }

  label_widths <- labels %>%
    lapply(label_width) %>%
    unlist
  
  return(label_widths)
}

After converting the labels, which might be stored as a factor, to a character vector, we extract all used characters with %>% strplit("") %>% unlist. Then we query Roboto’s glyph table using systemfonts::glyph_info() function for these particular characters and store the resulting aspect ratios in the variable glyph_ar. Given a word and the font height (i.e. font size in user units), we can then easily calculate the width of a label as it would be printed on a screen in a particular font family and in a particular font size. Mind that, unlike for strwidth() and strheight() we do not actually need to plot the string, which saves a lot of resources (I may actually talk about green computing on another occasion).

shell.R

subject_names <- epi$Subject %>% levels %>% extract(1:13)
font_size <- (1 - layout_default$a) / length(subject_names)
style_01 <- list(font=list(family = "Roboto"))

(predicted <- subject_names %>%
  get_label_widths(font_size, style_01))

 [1] 0.6076923 0.5469231 0.4769231 0.6300000 0.4538462 0.6507692 0.4984615
 [8] 0.4984615 0.9000000 0.6784615 0.5923077 0.5746154 0.4223077

In case you want to reproduce this code and not want to install Roboto on your windows system, you can download the font, say in the fnt subfolder, import it and check its availability:

example.Rmd

library(extrafont)

font_import(paths = "fnt", prompt = FALSE)
fonts() %>% grep("Roboto", ., value = TRUE) %>% head

Radial grid lines

Now that we have the lengths of the printed labels, we can continue to calculate the max angle of the observation axis:

\[\alpha_{max}=min\left[atan2\left(-(\ell_i+b),\frac{(i-n)(1-a)}{n}\right)-a\right]-s\]

where $\ell_i$ is the cumulative maximum of $l_i$. Notice how I vectorize atan2 to ease programming:

shell.R

n <- length(predicted)
l_i <- cummax(predicted %>% rev) %>% rev
i <- 1:n
s <- pi / 12

x <- -(l_i + layout_default$b)
y <- (i - n) * (1  - layout_default$a) / n - layout_default$a

atan3 <- Vectorize(atan2) # Sorry, I couldn't resist

(alpha_max <- 5 * pi / 2 - min(atan3(-y, x)) - s)

FALSE [1] 5.241806

, which, in case you do not like radians, roughly corresponds to $\frac{13\pi}{8}$ or 300°.

With $\alpha_{max}$ known, we can finally draw the axial grid lines. I collect the max observation years of the top 200 rows of data (which corresponds to the 13 label names plotted before) and use pretty to find a nice distribution of angles:

shell.R

a <- .3
c <- .05
d <- .02

max_obs_years <- epi %>% head(200) %$% End %>% max
years_to_rad <- function(x) {
  (5 * pi / 2) - (x / max_obs_years * alpha_max)
}

obs_axis <- data.table(
  Label = c(0, max_obs_years) %>%
    pretty %>% 
    extract(. < (max_obs_years)))

obs_axis[, Axis_Angle := Label %>% years_to_rad]
obs_axis[, Gridline_Start_X := a * cos(Axis_Angle)]
obs_axis[, Gridline_Start_Y := -a * sin(Axis_Angle)]
obs_axis[, Gridline_End_X := (1 + d) * cos(Axis_Angle)]
obs_axis[, Gridline_End_Y := -(1 + d) * sin(Axis_Angle)]
obs_axis[, Axis_Label_X := (1 + d + c) * cos(Axis_Angle)]
obs_axis[, Axis_Label_Y := -(1 + d + c) * sin(Axis_Angle) + 0.025]
obs_axis[, Anchor := ifelse(abs(Axis_Label_X) < a, "middle",
  ifelse(Axis_Label_X > 0, "start", "end"))]

And here is the result:

out_data = list(
  grid_lines = obs_axis %>% rowSplit %>% unname,
  layout = list(m = .053),
  Gridline_Color = "pink",
  Gridline_Width = .006,
  Unit = "yr"
  )

"svg/shell_template_5.svg" %>%
  readChar(., file.info(.)$size) %>%
  whisker.render(data = out_data) %>% 
  cat

Figure 12: Test of drawing the gridlines (pink).

Not shown here is the shortening of the grid lines based on the data as indicated in Figure 3.

Putting it all together

You can find the final solution on my Github page.

Examples

Conflicts

The Uppsala Conflict Data Program (UCDP) provides global data about the wars and other forms of armed conflicts that have raged during the past 30 years.

conflicts.html

<div class="infobox-template" style="display: none;">
  <aside>
    <p class="shell-message">Click episode to copy the data to the clipboard.</p>
    <table class="cycle">
      <caption>{{Subject}}</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Period</th>
        <td>{{Start_Date}} - {{End_Date}}</td>
      </tr>
      <tr>
        <th scope="row">Type</th>
        <td>{{Type}}</td>
      </tr>
      <tr>
        <th scope="row">Location</th>
        <td>{{Location}}</td>
      </tr>
      <tr>
        <th scope="row">Country</th>
        <td>{{Country}}</td>
      </tr>
    </table>
    <table class="observations">
      <caption>Casualties</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Side A | Side B</th>
        <td>{{Casualties_Sides}}</td>
      </tr>
      <tr>
        <th scope="row">Civilian</th>
        <td>{{Casualties_Civilians}}</td>
      </tr>
      <tr>
        <th scope="row">Estimate (low - high)</th>
        <td>{{Casualties_Estimate_String}}</td>
      </tr>
    </table>
    <table class="subject">
      <caption>Conflict details</caption>
      <colgroup>
        <col class="prop-label">
        <col class="prop-value">
      </colgroup>
      <tr>
        <th scope="row">Conflict name</th>
        <td>{{Subject}}</td>
      </tr>
      <tr>
        <th scope="row">Sides</th>
        <td>{{Sides}}</td>
      </tr>
      <tr>
        <th scope="row">Duration</th>
        <td>{{Conflict_Duration}} yrs</td>
      </tr>
      <tr>
        <th scope="row">Activity</th>
        <td>{{Percent_Active}} %</td>
      </tr>
    </table>
  </aside>
</div>

conflicts <- fread("dat/conflicts.tsv", sep = "\t")

breaks <- c(-Inf, 1, 2, 4, 8, Inf)
labels <- c("Very low", "Low", "Medium", "High", "Very high")

conflicts[, Group :=
    Casualties_Estimate %>% log %>%
    cut(breaks = breaks, labels = labels)]

layout <- copy(layout_default)
layout$legend <- data.table(
  Id = labels %>% c(NA),
  Label = labels %>% c("(missing)"),
  Color = colorRampPalette(c("steelblue", "orange"))(5) %>%
    c("lightgrey"))
layout$fx <- .4
layout$fy <- -1
layout$unit = "yr"
layout$legend_title <- "Casualties"

conflicts[
  Episode_Count > 10 & Conflict_Duration < 10 &
    Duration_Active > .8] %>%
  shell_plot(
    width = "500px",
    height = "500px",
    layout = layout,
    infobox_template = "html/conflicts.html"
  ) %>% cat

Waves

The NOAA/WDS tsunami database consists of the set of global historical tsunami events for which sufficient evidence is being provided.

waves <- fread("dat/waves.tsv", sep = "\t")

breaks <- c(-Inf, 1, 2, 5, 10, 25, Inf)
labels <- c("< 1m", "1-2m", "2-5m", "5-10m", "10-25m", "> 25m")

waves[, Group := Wave_Height %>%
    cut(breaks = breaks, labels = labels)]

layout <- copy(layout_default)

layout$legend <- data.table(
  Id = labels %>% c(NA),
  Label = labels %>% c("(missing)"),
  Color = colorRampPalette(c("steelblue", "orange"))(6) %>%
    c("lightgrey"))
layout$legend_title <- "Wave height"
layout$fx <- -.4
layout$fy <- -.9
layout$s <- .3
layout$unit <- function (x) {
  return(paste0(10 ^ x, "km"))
}

waves[, Start :=  Wave_Distance %>% add(.1) %>% log10]
waves[, End := Start + .1]
waves[, Subject :=  Quake %>% as.character %>% as.factor]

waves <- waves[!is.na(Start) & !is.na(Subject)]
waves[, Episode_Count := .N, Subject]

waves[Episode_Count > 10 & Episode_Count < 100] %>%
  shell_plot (
    width = "500px",
    height = "800px",
    layout = layout, infobox_template = "html/waves.html") %>% cat

I tend to avoid comma-separators where possibly for, I hope, the obvious reasons. Whoever has started the idea that a comma would be a good separator is a fool anyway↩︎

Cycle count	{{Episode_Count}}
Duration	{{Obs_End}} years

Location	{{Wave_Province}} ({{Wave_Country}}) lat: {{Wave_Latitude}}, lon: {{Wave_Longitude}}
Time	{{Wave_Date}}
Height	{{Wave_Height}} m
Distance from source	{{Wave_Distance}} km

Shell plot for positively skewed longitudinal data

David D’Haese

Published on Tue Jul 13 00:42:30 2021

Quick start

Introduction

Building the plot from scratch

Terminology

Layout parameterization

Methodology

Data

Personalizing subject identifiers

Without the shell plot…

Shell function signature

The SVG template

Loading width and height from chunk options

Loading layout options from the YAML configuration file

Using `whisker` for templating

Dynamically setting font size

Completing a template from a data table

Plotting the subject labels

Searchable text

Plotting an annulus sector

Hover info

Styling and legend

Label lengths

Radial grid lines

Putting it all together

Examples

Conflicts

Waves

Period	Day {{Start_Day}} - {{End_Day}}
Type	{{Type}}
Treatment	{{Treatment}}
Regime	{{Regime}}
Measure	{{Measure}} cells

Period	{{Start_Date}} - {{End_Date}}
Type	{{Type}}
Location	{{Location}}
Country	{{Country}}

Side A \| Side B	{{Casualties_Sides}}
Civilian	{{Casualties_Civilians}}
Estimate (low - high)	{{Casualties_Estimate_String}}

Conflict name	{{Subject}}
Sides	{{Sides}}
Duration	{{Conflict_Duration}} yrs
Activity	{{Percent_Active}} %

Shell plot for positively skewed longitudinal data

David D’Haese

Published on Tue Jul 13 00:42:30 2021

Quick start

Introduction

Building the plot from scratch

Terminology

Layout parameterization

Methodology

Data

Personalizing subject identifiers

Without the shell plot…

Shell function signature

The SVG template

Loading width and height from chunk options

Loading layout options from the YAML configuration file

Using whisker for templating

Dynamically setting font size

Completing a template from a data table

Plotting the subject labels

Searchable text

Plotting an annulus sector

Hover info

Styling and legend

Label lengths

Radial grid lines

Putting it all together

Examples

Conflicts

Waves

Using `whisker` for templating