I have not created an R package for this plot yet. I would love to incorporate some feedback first so as to improve its usability and make it more robust and applicable in a wider range of real-life scenarios. For now you can follow the steps below:
Prepare a longitudinal data.table so that each row contains an episode with a Subject and with numeric features called Start and End.
Decide on which feature to determine the coloring of the episode sectors and place the calculated color for each episode in the feature named Group.
Download or clone the Shell plot repository and copy the shell subfolder as an immediate subfolder of your project folder (otherwise adjust respective paths in the main functions default arguments).
Generate the HTML containing the SVG, JavaScript and CSS of the shell plot like so:
my_dt %>% shell %>% catEither render the HTML directly in RMarkdown or copy-paste it into your own HTML page.
Around the year 2015, I devised a plot type that I baptised the shell plot (Figure 1). It allows for pattern discovery in longitudinal data with highly variable and positively skewed observation periods. One example that we are going to demonstrate here is the observational study where many patients (the subject) are being followed during treatments (events) ranging from a couple of days to several years. Another example could be the analysis of volcano eruption episodes (subject) during which all kinds of specific events occur such as explosions, fumarole activity, ash expulsion, … that deserve temporal analysis. The observational axis would typically represent the time dimension but could also represent distance or another quantity altogether. We will discuss some example at the end of this article.
In comparison with the older versions of this plot, I decided to make this one vector-based, to write it in pure HTML and to add interactivity such as zooming and showing details upon hovering. I went to the trouble of having most parameterization automated so that only a minimal user input is required, while still allowing endless flexibility in the layout and styling part.
Let us take the time to think this through, because although it looks as simple as converting Cartesian coordinates to polar ones, there is a little bit more to it. First, to agree on the terminology used, I drew up Figure 2:
Figure 2: Terminology used for the shell plot.
Next, let us think about how to parameterize the layout. Details are shown in Figure 3. We begin by centering our plot around the \({x, y}\)-coordinate \({0, 0}\) and by confining the radius range of the annuli sectors containing the data to \([a, 1]\), where \(a\) is the size of the hole in the middle of the plot. Next we need to calculate the lengths of the labels \(l_i\) based on the font used, the font size and the font style. These label-lengths allow us to determine the maximum allowed sector angle \(\alpha_{max}\) so as to avoid that data sectors overlap with the subject labels. Next, we draw the outer observation sectors that cover the complete observation-period per subject and then we draw the episode sectors. The distribution of the data determines the length of the radial grid lines and finally, the positions of all the elements on the plot determine the framing of the view port. Now that the parameterization is in place, we can draw the process diagram as shown in Figure 4.
Figure 3: Parameterization of the layout for the shell plot. The circled numbers indicate the order in which to calculate the (derived) parameters. See text for more details.
Figure 4: Process diagram for the shell plot. Only one input is truly compulsory and that is the observational data with the features Subject, Start, End, and Group. Although Width and Height are strictly layout parameters, they are treated as true input values because of their importance. id represents an unique HTML identifier to avoid conflict when working with more than one shell plot per HTML page. The input gridlines corresponds to the at argument of the axis() function. To allow extra information to be shown upon hover, an infobox template must be provided. Layout and styling information can be provided as a YAML configuration file. Additional and dynamic styling (e.g. highlighting of sector upon hover) is provided as CSS style sheets.
At some point we will need data. However, let us first load our libraries. There are two libraries that I use all the time:
example.Rmd
library(magrittr)
library(data.table)The simplified and pseudonymized dataset of cancer treatments looks like this (only first few records shown):
example.Rmd
epi <- fread("dat/cancer_tx.tsv", encoding = "UTF-8",
stringsAsFactors = TRUE)
epi %>% head(10) %>% kable| Subject | Start | End | Line | Cycle | Type | Treatment | Regime | Measure | Group | Start_Day | End_Day | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adelle Delaney | 0.0000000 | 0.0547581 | 1 | 1 | Cycle | TX_1 | Drug E | 4 | Low | 1 | 21 | 48 |
| Adelle Delaney | 0.0547581 | 0.1122542 | 1 | 2 | Cycle | TX_1 | Drug E | 0 | Zero | 21 | 42 | 48 |
| Adelle Delaney | 0.1122542 | 0.1697502 | 1 | 3 | Cycle | TX_1 | Drug E | 1 | Low | 42 | 63 | 48 |
| Adelle Delaney | 0.1697502 | 0.2272463 | 1 | 4 | Cycle | TX_1 | Drug E | 0 | Zero | 63 | 84 | 48 |
| Adelle Delaney | 0.2272463 | 0.2847423 | 1 | 5 | Cycle | TX_1 | Drug E | 1 | Low | 84 | 105 | 48 |
| Adelle Delaney | 0.2847423 | 0.3422384 | 1 | 6 | Cycle | TX_1 | Drug E | 2 | Low | 105 | 126 | 48 |
| Adelle Delaney | 0.3997344 | 0.4572305 | 1 | 7 | Cycle | TX_1 | Drug E | 0 | Zero | 147 | 168 | 48 |
| Adelle Delaney | 0.6488840 | 0.7063800 | 1 | 0 | Progression | TX_1 | Drug E | 9 | Medium | 238 | 241 | 48 |
| Adnan Bridges | 0.0000000 | 0.0574960 | 1 | 1 | Cycle | TX_1 | Drug G / Drug C | 753 | Unbound | 1 | 22 | 59 |
| Adnan Bridges | 0.0876130 | 0.1451091 | 1 | 2 | Cycle | TX_1 | Drug G / Drug C | 436 | Unbound | 33 | 54 | 59 |
Start and End are the start and end times of the subject’s treatment episode expressed in years after initial admission, whereas Start_Day and End_Day were calculated by dividing the former by 365.2425. The features Line to Regime correspond to the treatment descriptors that you would typically find in oncology. Measure represents some quantifiable outcome measure. Finally, the Age is a fake feature to demonstrate the use of subject-specific data. It was drawn as epi[, Age := rnorm(1, 45, 15) %>% round, Subject].
Subject names are fake and are derived from the Quick Name Generator so as to remind ourselves that we are dealing with real people even if the data has been anonymized beyond recovery. If you are in a similar situation in which you want to protect the people’s identity while still reminding anyone that you are dealing with people instead of records or numbers, then you can collect a sufficiently large set of unique names (first and/or last names dependent on the number needed) in an headed tab-separated file names.tsv1 and do something along these lines:
example.Rmd
names <- fread("names.tsv", sep = "/t") # One feature called 'Name'
dat[, Subject := names$Name[Subject_ID %>% as.factor %>% as.numeric]Note that the sep = "\t" argument is only needed here because the file consist of a single feature and fread might be confused about what separator to choose, especially if the names you collected from some online generator contains comma values.
The summary below and Figure 5 show that the accumulated observation time per Subject typically varies greatly (from 0.066 years = 24 days to more than 8.5 years). Because we have such a wide range of observation periods among subjects, it becomes hard to keep the overview. Log-transforming the time axis is not ideal because it hinders interpretation and because there might be statistical objections. That is where the idea arose to bend the observational axis into a circle while keeping the subjects sorted from shorter to longer observation periods.
Subject TX_Time
1: Atif Vickers 0.06570977
2: Rafael Whitley 0.11225419
3: Nataniel Ayers 0.11499209
4: Annabell Mohammed 0.13415744
5: Khaleesi Plummer 0.13689535
---
160: Elli Gamble 5.56068913
161: Zahid Deacon 5.68389495
162: Lisa-Marie Allison 5.79614913
163: Sunil Lucero 6.08636728
164: Zakariya Tucker 8.55595940
Figure 5: Distribution of the total treatment time (expressed in days since admission) among all subjects. The layout in this plot is hindered by the longer observation period of Zakariya Tucker which makes other observation scale along. In many realistic cases, the distribution of observation periods is even more skewed and leptocurtic then in this dataset.
Based on the process diagram shown above, we can write out the signature and usage of shell function:
shell.R
shell_plot <- function(episodes, width, height, layout, style, id,
gridlines = NULL, shell_template = "shell/svg/shell_template.svg",
infobox_template = "shell/html/infobox-template.html",
javascript = "shell/js/shell.js", css = "shell/css/shell.css") {
# Code goes here
}
epi %>% shellIn the next paragraph, we will start with plotting the subject labels. But for that, we first need a plotting device. Instead of using the base plotting system of R, I decided here to generate SVG code directly instead. In order to keep the plot function flexible, adjustable and configurable, I will utilize templates and work with the whisker package which implements mustache. Below is the first version of the SVG template for our plot:
shell_template.svg
<svg viewBox="-.7 -1.1 1.4 2.2" width="{{width}}" height="{{height}}">
<g fill="none" stroke="#00B0F0" stroke-width=".004">
<line x1="0" y1="-1.1" x2="0" y2="2.2" />
<line x1="-.7" y1="0" x2="1.4" y2="0" />
<rect x="-1" y="-1" width="2" height="2" />
<circle cx="0" cy="0" r="{{layout.a}}" />
</g>
<g font-size="{{layout.m}}">
{{#subjects}}
<text x="{{Lab_X}}" y="{{Lab_Y}}" text-anchor="end">{{Subject}}</text>
{{/subjects}}
</g>
</svg>The width and height of the viewport are variable and must be completed, hence the double curly braces. In this tutorial, we will be using the chunk options to set these variables with width <- opts_current$get("out.width") and opts_current$get("out.height"), respectively (see below). Mind that the viewbox is currently being hard coded. Later, we will calculate these values from the content. The numbers .7 -1.1 1.4 2.2 are the left, top, width and height of the viewing window and are in the user units of the SVG element. Next we see a group tag <g> in which I temporally added some guidelines to help me positioning graph element during development. I will remove these later. Finally, we see a whisker loop structure that will iterate over subjects ({{#subjects}}<loop body>{{/subjects}}). For each subject, we will be gathering the subject’s identifier (Subject) for the label text as well as the Lab_X and Lab_Y features for the proper placement of the labels.
We will soon load and complete the SVG template, but for that we will need some data to complete. As mentioned above, we load the out.width and out.height chunk options into the variables width and height, respectively. First, we define the RMArkdown chunk as such:
```{r shell_dev_02, out.height="800px", out.width="500px"}
and then we acces these options:
Example.Rmd
width <- opts_current$get("out.width")
height <- opts_current$get("out.height")Simply save a file named config.yaml in the root folder:
config.yaml
default:
layout:
a: .10 # Size of the central hole in the plot in user units (UU)
b: .02 # Space between subject labels and subject axis (UU)Remember that a represents the radius of the central hole in the middle of the graph and b is the relative distance between the labels and the subject axis (see Figure 3 for details). The config packages makes it really easy to read from and write to YAML configuration files.
shell.R
library(config)
config <- config::get(file = "shell/config.yaml")
layout <- config$layoutwhisker for templatingIn order to complete the above template, we reference the whisker package. With the function base::readchar() we can allocate the template to the string shell_template:
shell.R
library(whisker)
shell_template <- "svg/shell_template.svg" %>%
readChar(., file.info(.)$size)We are now ready to adjust the shell() function so as to print the subject labels:
shell.R
shell_plot_01 <- function(episodes, layout) {
layout$n <- episodes$Subject %>% uniqueN
layout$m <- (1 - layout$a) / layout$n
subjects <- episodes[, .(
Episode_Count = .N,
Lab_X = -layout$b,
Lab_Y = .GRP * layout$m - 1), Subject] %>%
rowSplit %>% unname
out_data <- list(
width = width,
height = height,
subjects = subjects,
layout = layout)
out <- whisker.render(
template = shell_template,
data = out_data)
return(out)
}An added advantage about plotting directly to SVG, is that one can easily scale texts. Here, the font size of the labels is set to the value \(m\) (see Figure 3 for more details):
\[m=\frac{(1-a)}{n}\]
Within R, this task is trickier as text height in R using the base::strheight() function is based on the height of the capital letter ‘M’ (see ?strheight for details on this).
Notice the use of data.table in the above code snippet:
shell.R
[...]
subjects <- episode[, .(
Episode_Count = .N,
Lab_X = -layout$b,
Lab_Y = .GRP * layout$m - 1), Subject] %>%
rowSplit %>% unname
[...]By now, data tables are well established I assume, but just in case: the above translates to:
Take the episodes data table, perform no filtering on it (nothing before the first comma), group by subject (third parameter inside the []) and then aggregate the rows in that group (containing the episode for that subject) into a single row with the features Episode_Count containing the number of episodes (using data.table’s special variable .N) and the label positions Lab_X and Lab_Y.
The whisker::rowSplit() function is a convenience function to split a data.frame, data.table or matrix object into a list of lists allowing for the whisker loop structure inside the template to work properly. In the variable out_data, we gather all the data to be processed by whisker to complete the template. Notice how the dot in {{layout.a}} (see template above) allows the reference of children (and grandchildren, etc…) within a complex object.
All that we are left to do is to test whether the subject label are plotted in the correct order, in the correct position and with the dynamically adjusted font size:
example.Rmd
canc %>% head(200) %>% shell_plot_01(layout) %>% catcanc %>% head(500) %>% shell_plot_01(layout) %>% catAgain, if you haven’t been reading the above and your are freaking out because you fear for the privacy of the people, the names are fake, see above for details on this. I am showing the labels immediately in their final font family (here, Google’s Roboto). Instead of the on-element attribute, I used CSS for that:
css/shell.css
@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@100&display=swap');
.shell-subject-labels{
font-family: 'Roboto', sans-serif;
}If you are a pixel fucker, as I am, you might have noticed a slight shift in the subject identifiers respective to the guidelines. I will be adding a parameter to account for a relative shift for this and other <text> elements as shown in Figure 3.
Because the plot is being generated as SVG, the text is fully searchable, indexable, CSS stylable, Javascriptable, etc… (see Figure 7).
Figure 7: Because the plot is constructed as a SVG element, it consist purely of XML code allowing for all kinds of interactivity through CSS, Javascript and derivatives thereof. If you are viewing this in your browser, you can try and select some text in Figure 6.
Let us start plotting the visual elements representing the data. In a shell plot, the subject’s observation periods as well as the episodes it contains are being represented by annulus sectors as shown in Figure 2. In general, an annulus sector is being parameterized by:
However, we will have to manually construct this shape using the SVG <path> element. Suppose we would want to draw the upper arc of the observation highlighted in Figure 2 (with identifier abcdef) and we assume that the end angle for this sector is 50°, then we would create something like this (look at second group, the first is for the guidelines):
shell_template.svg
<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
<g fill="none" stroke="#00B0F0" stroke-width=".002">
<line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
<line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
<rect x="-1" y="-1" width="2" height="2" />
<circle cx="0" cy="0" r=".3" />
</g>
<g fill="none" stroke="#00B0F0" stroke-width=".02">
<path d="M 0.000 -0.860 A 0.860 0.860 50 0 1 0.659 -0.553" />
</g>
</svg>The above reads as Move to \(\{0.000, -0.860\}\) (all numbers in user units as per Figure 3), where \(-0.860\) comes from: \[-0.860=-r_{outer, 2}=-1 \cdot \left[a+(1-a)\frac{(n-i+1)}{n}\right]=-1 \cdot \left[0.3+(1-0.3)\frac{(5-2+1)}{5}\right]\]
, then draw an elliptical Arc with radius \(r_{outer, 2}=0.860\) (twice because \(r_x=r_y\) are axes of an ellipse and we need a circle), along a \(\Delta\alpha\) of 50°, taking the shorter of both arcs (because \(\Delta\alpha<\pi\) as indicated with 0), that moves clockwise (1) to the point \(\{0.659, -0.553\}\) where \(0.659=0.860\cdot cos(90-50)\) and \(-0.553=-0.860\cdot sin(90-50)\) (because of the vertically flipped coordinate system of SVG). Here is the result:
<path> element.
From drawing an arc, it is a small step to drawing the annulus sector. Here is the adjusted SVG template:
shell_template.svg
<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200">
<g fill="none" stroke="#00B0F0" stroke-width=".002">
<line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
<line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
<rect x="-1" y="-1" width="2" height="2" />
<circle cx="0" cy="0" r=".3" />
</g>
<g fill="none" stroke="#00B0F0" stroke-width=".02">
<path d="
M {{X_Start_Outer}} {{Y_Start_Outer}}
A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
L {{X_End_Inner}} {{Y_End_Inner}}
A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
L {{X_Start_Outer}} {{Y_Start_Outer}}" />
</g>
</svg>And here is some test code to complete the template based on the start angle (100°), the end angle (260°), the subject index (2), the total number of subjects (5):
shell.R
template <- "svg/shell_template_2.svg" %>%
readChar(., file.info(.)$size)
out_data <- list()
# Knowns
angle_start <- 100
angle_end <- 260
i <- 2
n <- 5
# Calculated
out_data$Delta_Angle <- angle_end - angle_start
out_data$Large_Arc <- 1 * (out_data$Delta_Angle > 180)
out_data$R_Outer <-
layout_default$a +
(1 - layout_default$a) *
(n - i + 1) / n
out_data$R_Inner <-
layout_default$a +
(1 - layout_default$a) *
(n - i) / n
out_data$X_Start_Outer <- (
out_data$R_Outer *
cos((90 - angle_start) / 180 * pi)) %>%
round(3)
out_data$Y_Start_Outer <- (
-1 * out_data$R_Outer *
sin((90 - angle_start) / 180 * pi)) %>%
round(3)
out_data$X_End_Outer <- (
out_data$R_Outer *
cos((90 - angle_end) / 180 * pi)) %>%
round(3)
out_data$Y_End_Outer <- (
-1 * out_data$R_Outer *
sin((90 - angle_end) / 180 * pi)) %>%
round(3)
out_data$X_Start_Inner <- (
out_data$R_Inner *
cos((90 - angle_start) / 180 * pi)) %>%
round(3)
out_data$Y_Start_Inner <- (
-1 * out_data$R_Inner *
sin((90 - angle_start) / 180 * pi)) %>%
round(3)
out_data$X_End_Inner <- (
out_data$R_Inner *
cos((90 - angle_end) / 180 * pi)) %>%
round(3)
out_data$Y_End_Inner <- (
-1 * out_data$R_Inner *
sin((90 - angle_end) / 180 * pi)) %>%
round(3)
template %>%
whisker.render(data = out_data) %>%
catWith this beautiful smile we have overcome the biggest technical hurdles. We will now move to the interactivity aspect.
As promised, I am going to make the plot interactive. I will test this interactivity on a single shape (the one we prepared in previous paragraph). Suppose I would like, whenever I hover over the shape, the name of the subject to appear in an information box below. As it is good practise to separate the HTML elements from their styling and from their actions. We start by defining a piece of HTML template that is to be completed with the data:
infobox-template.html
<div id="infobox-template-03" style="display: none;">
<aside>
<table class="shell-subject">
<caption>Subject details</caption>
<colgroup>
<col class="prop-label">
<col class="prop-value">
</colgroup>
<tr>
<th scope="row">Name</th>
<td>{{Subject}}</td>
</tr>
</table>
</aside>
</div>The only piece of information to complete this template, in this case, is the subject’s identifier Subject. We know how to complete a template using whisker. However, this HTML snippet is to be complete not during R’s compile time but on the fly and so needs to reside on the client. So, where does the data come from? There are many options to store the data. Probably the slickest way to store the data client-side is by using the storage options of your browser. However, we are going to store the data inside attributes of the object we are hovering over. Having the data attached to its source makes a lot of sense, but may not be the most memory-efficient way. We are going to store the headers separately from the data values to avoid repetition, but even then we would be wasteful whenever we need multiple HTML elements types to respond to a mouse hovering.
We now wrap the svg inside a division add the infobox-03 element to hold the completed data:
shell_template.svg
<div class="shell-plot">
<svg viewBox="-.7 -1.1 1.4 2.2" width="200" height="200" data-infobox-target="infobox-03"
data-infobox-template="infobox-template-03">
<g fill="none" stroke="#00B0F0" stroke-width=".002">
<line x1="+0.0" y1="-1.1" x2="0.0" y2="2.2" />
<line x1="-1.1" y1="+0.0" x2="2.2" y2="0.0" />
<rect x="-1" y="-1" width="2" height="2" />
<circle cx="0" cy="0" r=".3" />
<text x="0" y="-0.5" text-anchor="middle" font-size=".2" fill="#00B0F0">Hover over me!</text>
</g>
<g fill="white" stroke="#00B0F0" stroke-width=".02">
<path data-subject-name="abcdef" class="episode" d="
M {{X_Start_Outer}} {{Y_Start_Outer}}
A {{R_Outer}} {{R_Outer}} {{Delta_Angle}} {{Large_Arc}} 1 {{X_End_Outer}} {{Y_End_Outer}}
L {{X_End_Inner}} {{Y_End_Inner}}
A {{R_Inner}} {{R_Inner}} {{Delta_Angle}} {{Large_Arc}} 0 {{X_Start_Inner}} {{Y_Start_Inner}}
L {{X_Start_Outer}} {{Y_Start_Outer}}" />
</g>
</svg>
<div id="infobox-03" />
</div>Notice the addition of the data-subject-name and class attributes on the path element as well as the division at the bottom which will be containing the completed template upon hover. Obviously, here one needs to work with the id attribute instead of the class attribute as we want to be able to have multiple shell plots on a HTML page each connected to their specific information box. So we have the data template, we have the data and we have the information box recipient. Now, we only need to write some Javascript code to handle the template completion upon hover:
shell.js
// Function carrying out the completion of the template
const complete = (template, data) => {
for (var item in data) {
template = template.replace(RegExp('{{' + item + '}}', 'g'), data[item]);
}
template = template.replace(/{\w+}/g, '');
return template;
}
// Function to be executed upon mouse entering one of the source elements
const onEpisodeMouseEnter = (e) => {
const targetName = e.target.closest("svg").dataset.infoboxTarget;
const templateName = e.target.closest("svg").dataset.infoboxTemplate;
target = document.getElementById(targetName);
template = document.getElementById(templateName).innerHTML;
data = { Subject: e.target.dataset.subjectName };
target.innerHTML = complete(template, data);
}
// Function to be executed upon mouse leaving one of the source elements
const onEpisodeMouseLeave = (e) => {
const targetName = e.target.closest("svg").dataset.infoboxTarget;
target = document.getElementById(targetName);
target.innerHTML = "";
}
// Function to be executed upon clicking one of the source elements
const onEpisodeMouseClick = (e) => {
data = { Subject: e.target.dataset.subjectName };
var type = "text/plain";
var blob = new Blob([JSON.stringify(data)], { type });
var clipData = [new ClipboardItem({ [type]: blob })];
navigator.clipboard.write(clipData)
}
const onDocumentReady = () => {
document.querySelectorAll(".episode").
forEach(el => {
el.addEventListener("mouseenter", onEpisodeMouseEnter);
el.addEventListener("mouseleave", onEpisodeMouseLeave);
el.addEventListener("click", onEpisodeMouseClick)
});
}
// Binding the source element to the action function when document is ready
document.addEventListener('DOMContentLoaded', onDocumentReady);Notice that I have mouseenter and mouseleave events declared as well as the click event to store the current data to the user’s clipboard.
If you click the sector, the following text should be placed on your clipboard (try in browser as it may not work in your RStudio viewer):
{"Subject":"abcdef"}Within the R landscape I came across many ways to provide colors for a color scale. I have been thinking about the best way to implemented this for the shell plot and I came to the following conclusions:
xFor example, suppose we want to define the color of the episode sectors based on the binned version of the Measure feature, we can prepare the data like so:
example.Rmd
measure_breaks <- c(-Inf, 2, 150, Inf)
labels <- c("Low", "Medium", "High")
epi[, Group := Measure %>%
cut(breaks = measure_breaks, labels = labels) %>%
addNA(ifany = TRUE)]
epi$Group %>% tableFALSE .
FALSE Low Medium High <NA>
FALSE 1676 852 155 283
With a properly defined factored feature, it becomes really easy to draw a legend. Here is some test code:
example.Rmd
library(colorspace)
cols_bord <- epi$Group %>%
uniqueN %>%
add(-1) %>%
rainbow %>%
c("grey") %>%
darken(.3)
cols_body <- epi$Group %>%
uniqueN %>%
add(-1) %>%
rainbow %>%
c("lightgrey")
out_data <- epi[order(Group), .(
Lab_Y = -.9 + (.GRP) * .1,
Symbol_Y = -.95 + (.GRP) * .1,
Symbol_Stroke_Color = cols_bord[.GRP],
Symbol_Color = cols_body[.GRP]), .(Label = Group)] %>%
rowSplit %>% unname
out_data = list(
legend = out_data,
Lab_Color = "#646464",
Lab_Dim = .08,
Symbol_Dim = .05,
Symbol_X = -.5,
Lab_X = -.4,
Symbol_Stroke_Width = .01
)
"svg/shell_template_4.svg" %>%
readChar(., file.info(.)$size) %>%
whisker.render(data = out_data) %>%
catThe last piece of the puzzle are the radial grid lines, including the subject axis. First, we need to calculate the maximum possible angle \(\alpha_{max}\) so that the observation sectors do not overlap with the subject labels. Then we can calibrate the observation axis so that \(t_{max}\) corresponds to \(\alpha_{max}\). Finally, we can use the pretty functionality of R so that the grid lines get logically-separated along the range of the observations axis.
To calculate \(\alpha_{max}\), we first need the computed lengths of the subject labels (\(\ell_i\)). I tried to use strwidth() and strheight() functions but later switched to writing my own function to measure the length of a label. The font family determines the width of the printed font, so we need to take that into consideration. As shown above in the CSS file, I have used the Roboto font but obviously, this will become an option. First, we look at the function to calculate label widths:
shell.R
library(systemfonts)
get_label_widths <- function(labels, font_height, style) {
labels <- labels %>% as.character
glyphs <- labels %>% strsplit("") %>% unlist %>% unique
glyph_details <- glyph_info(glyphs, style$font$family) %>% as.data.table
glyph_ar <- glyph_details[,
.(Glyph = glyph, Width = x_advance, Height = height)]
setkey(glyph_ar, "Glyph")
label_width <- function(label) {
label_ar <- glyph_ar[label %>% strsplit("") %>%
unlist, sum(Width) / max(Height)]
return(font_height * label_ar)
}
label_widths <- labels %>%
lapply(label_width) %>%
unlist
return(label_widths)
}After converting the labels, which might be stored as a factor, to a character vector, we extract all used characters with %>% strplit("") %>% unlist. Then we query Roboto’s glyph table using systemfonts::glyph_info() function for these particular characters and store the resulting aspect ratios in the variable glyph_ar. Given a word and the font height (i.e. font size in user units), we can then easily calculate the width of a label as it would be printed on a screen in a particular font family and in a particular font size. Mind that, unlike for strwidth() and strheight() we do not actually need to plot the string, which saves a lot of resources (I may actually talk about green computing on another occasion).
shell.R
subject_names <- epi$Subject %>% levels %>% extract(1:13)
font_size <- (1 - layout_default$a) / length(subject_names)
style_01 <- list(font=list(family = "Roboto"))
(predicted <- subject_names %>%
get_label_widths(font_size, style_01)) [1] 0.6076923 0.5469231 0.4769231 0.6300000 0.4538462 0.6507692 0.4984615
[8] 0.4984615 0.9000000 0.6784615 0.5923077 0.5746154 0.4223077
In case you want to reproduce this code and not want to install Roboto on your windows system, you can download the font, say in the fnt subfolder, import it and check its availability:
example.Rmd
library(extrafont)
font_import(paths = "fnt", prompt = FALSE)
fonts() %>% grep("Roboto", ., value = TRUE) %>% headNow that we have the lengths of the printed labels, we can continue to calculate the max angle of the observation axis:
\[\alpha_{max}=min\left[atan2\left(-(\ell_i+b),\frac{(i-n)(1-a)}{n}\right)-a\right]-s\]
where \(\ell_i\) is the cumulative maximum of \(l_i\). Notice how I vectorize atan2 to ease programming:
shell.R
n <- length(predicted)
l_i <- cummax(predicted %>% rev) %>% rev
i <- 1:n
s <- pi / 12
x <- -(l_i + layout_default$b)
y <- (i - n) * (1 - layout_default$a) / n - layout_default$a
atan3 <- Vectorize(atan2) # Sorry, I couldn't resist
(alpha_max <- 5 * pi / 2 - min(atan3(-y, x)) - s)FALSE [1] 5.241806
, which, in case you do not like radians, roughly corresponds to \(\frac{13\pi}{8}\) or 300°.
With \(\alpha_{max}\) known, we can finally draw the axial grid lines. I collect the max observation years of the top 200 rows of data (which corresponds to the 13 label names plotted before) and use pretty to find a nice distribution of angles:
shell.R
a <- .3
c <- .05
d <- .02
max_obs_years <- epi %>% head(200) %$% End %>% max
years_to_rad <- function(x) {
(5 * pi / 2) - (x / max_obs_years * alpha_max)
}
obs_axis <- data.table(
Label = c(0, max_obs_years) %>%
pretty %>%
extract(. < (max_obs_years)))
obs_axis[, Axis_Angle := Label %>% years_to_rad]
obs_axis[, Gridline_Start_X := a * cos(Axis_Angle)]
obs_axis[, Gridline_Start_Y := -a * sin(Axis_Angle)]
obs_axis[, Gridline_End_X := (1 + d) * cos(Axis_Angle)]
obs_axis[, Gridline_End_Y := -(1 + d) * sin(Axis_Angle)]
obs_axis[, Axis_Label_X := (1 + d + c) * cos(Axis_Angle)]
obs_axis[, Axis_Label_Y := -(1 + d + c) * sin(Axis_Angle) + 0.025]
obs_axis[, Anchor := ifelse(abs(Axis_Label_X) < a, "middle",
ifelse(Axis_Label_X > 0, "start", "end"))]And here is the result:
out_data = list(
grid_lines = obs_axis %>% rowSplit %>% unname,
layout = list(m = .053),
Gridline_Color = "pink",
Gridline_Width = .006,
Unit = "yr"
)
"svg/shell_template_5.svg" %>%
readChar(., file.info(.)$size) %>%
whisker.render(data = out_data) %>%
catNot shown here is the shortening of the grid lines based on the data as indicated in Figure 3.
You can find the final solution on my Github page.
The Uppsala Conflict Data Program (UCDP) provides global data about the wars and other forms of armed conflicts that have raged during the past 30 years.
conflicts.html
<div class="infobox-template" style="display: none;">
<aside>
<p class="shell-message">Click episode to copy the data to the clipboard.</p>
<table class="cycle">
<caption>{{Subject}}</caption>
<colgroup>
<col class="prop-label">
<col class="prop-value">
</colgroup>
<tr>
<th scope="row">Period</th>
<td>{{Start_Date}} - {{End_Date}}</td>
</tr>
<tr>
<th scope="row">Type</th>
<td>{{Type}}</td>
</tr>
<tr>
<th scope="row">Location</th>
<td>{{Location}}</td>
</tr>
<tr>
<th scope="row">Country</th>
<td>{{Country}}</td>
</tr>
</table>
<table class="observations">
<caption>Casualties</caption>
<colgroup>
<col class="prop-label">
<col class="prop-value">
</colgroup>
<tr>
<th scope="row">Side A | Side B</th>
<td>{{Casualties_Sides}}</td>
</tr>
<tr>
<th scope="row">Civilian</th>
<td>{{Casualties_Civilians}}</td>
</tr>
<tr>
<th scope="row">Estimate (low - high)</th>
<td>{{Casualties_Estimate_String}}</td>
</tr>
</table>
<table class="subject">
<caption>Conflict details</caption>
<colgroup>
<col class="prop-label">
<col class="prop-value">
</colgroup>
<tr>
<th scope="row">Conflict name</th>
<td>{{Subject}}</td>
</tr>
<tr>
<th scope="row">Sides</th>
<td>{{Sides}}</td>
</tr>
<tr>
<th scope="row">Duration</th>
<td>{{Conflict_Duration}} yrs</td>
</tr>
<tr>
<th scope="row">Activity</th>
<td>{{Percent_Active}} %</td>
</tr>
</table>
</aside>
</div>conflicts <- fread("dat/conflicts.tsv", sep = "\t")
breaks <- c(-Inf, 1, 2, 4, 8, Inf)
labels <- c("Very low", "Low", "Medium", "High", "Very high")
conflicts[, Group :=
Casualties_Estimate %>% log %>%
cut(breaks = breaks, labels = labels)]
layout <- copy(layout_default)
layout$legend <- data.table(
Id = labels %>% c(NA),
Label = labels %>% c("(missing)"),
Color = colorRampPalette(c("steelblue", "orange"))(5) %>%
c("lightgrey"))
layout$fx <- .4
layout$fy <- -1
layout$unit = "yr"
layout$legend_title <- "Casualties"
conflicts[
Episode_Count > 10 & Conflict_Duration < 10 &
Duration_Active > .8] %>%
shell_plot(
width = "500px",
height = "500px",
layout = layout,
infobox_template = "html/conflicts.html"
) %>% catThe NOAA/WDS tsunami database consists of the set of global historical tsunami events for which sufficient evidence is being provided.
waves <- fread("dat/waves.tsv", sep = "\t")
breaks <- c(-Inf, 1, 2, 5, 10, 25, Inf)
labels <- c("< 1m", "1-2m", "2-5m", "5-10m", "10-25m", "> 25m")
waves[, Group := Wave_Height %>%
cut(breaks = breaks, labels = labels)]
layout <- copy(layout_default)
layout$legend <- data.table(
Id = labels %>% c(NA),
Label = labels %>% c("(missing)"),
Color = colorRampPalette(c("steelblue", "orange"))(6) %>%
c("lightgrey"))
layout$legend_title <- "Wave height"
layout$fx <- -.4
layout$fy <- -.9
layout$s <- .3
layout$unit <- function (x) {
return(paste0(10 ^ x, "km"))
}
waves[, Start := Wave_Distance %>% add(.1) %>% log10]
waves[, End := Start + .1]
waves[, Subject := Quake %>% as.character %>% as.factor]
waves <- waves[!is.na(Start) & !is.na(Subject)]
waves[, Episode_Count := .N, Subject]
waves[Episode_Count > 10 & Episode_Count < 100] %>%
shell_plot (
width = "500px",
height = "800px",
layout = layout, infobox_template = "html/waves.html") %>% catI tend to avoid comma-separators where possibly for, I hope, the obvious reasons. Whoever has started the idea that a comma would be a good separator is a fool anyway↩︎