Yongzhe Wang

Heatmap in R with ggplot2

Heatmap

This is a short tutorial for making heatmaps in R with ggplot2. In general, a heatmap is intended to show a (numerical) correlation between a pair of features/covariates/variables and mostly a correlation matrix will be the input of a heatmap. However, it is possible that we just want to show the longitudinal change/trend of subjects in a heatmap and under this case, the scale in a heatmap will only have few levels (e.g. Yes/No, High/Median/Low, etc.).

Part 1: Categorical/Nominal variable

1. Format of dataset for heatmap

For example, a cohort study recorded a list of symptoms that each participant experienced and reported in follow-up surveys. In here, each symptom will only contain two levels: “Yes” for experiencing the symptom at X-month follow-up and “No” for no experiencing the symptom at X-month follow-up. We used a fake dataset to show how we can visualize the longitudinal change/trend of participants’ symptoms. The dataset has 5 columns:

In the ggplot, no matter what kind of plots we make, they are basically composed of two parts: X-axis and Y-axis. Since we want to create a heatmap for the longitudinal reported symptoms, we set X-axis as Month, Y-axis as PTID, and filled values for tiles in the heatmap with Experienced.Symptom. Since we have 10 symptoms and each of them were recorded at 3 different timepoints, we need to work on each symptom first and then arrange them horizontally.

2. Heatmap for one symptom

We take Fatigue as an example and we will extract rows for Fatigue.

# Extract rows for Fatigue #################################################################################
Fatigue <- Dt[which(Dt$Symptom == 'Fatigue'), ]
# Heatmap ##################################################################################################
p_Fatigue <- 
  ggplot(Fatigue, 
         aes(x = Month, y = reorder(PTID, n.symptoms))) + 
  geom_tile(aes(fill = Experienced.Symptom)) + 
  scale_fill_manual(name = 'Experienced Symptom', 
                    values = c('No' = 'aquamarine2', 'Yes' = 'firebrick2', 'NA' = 'gray45')) + 
  ggtitle('Fatigue') + 
  theme(panel.grid.major = element_blank(),      # remove background and grid
        panel.background = element_blank(), 
        axis.line = element_blank(),             # remove lines in X- and Y-axes
        axis.ticks.x = element_blank(),          # remove ticks in X- and Y-axes
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),          # remove titles in X- and Y-axes
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15, color = 'black', face = "bold"),
        legend.text = element_text(size = 15, color = 'black', face = "bold"),
        legend.title = element_text(size = 15, face = "bold", color = 'black'),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'))
p_Fatigue

The above is the heatmap for one symptom (fatigue) and we are going to layout details for the code below:

3. Heatmap for multiple symptoms

Now we are going to pack heatmaps for multiple symptoms together. For the first heatmap (in here it is Fatigue), we need to remove the legend and for the last one, we add the legend to it but remove texts in Y-axis. For others heatmaps in the middle of a row, we will remove texts in Y-axis and legend for better display. Thus, we create the first and the last one separately and then create others in a loop.

# First: Fatigue #########################################################################################
Fatigue <- Dt[which(Dt$Symptom == 'Fatigue'), ]
p_Fatigue <- 
  ggplot(Fatigue, 
         aes(x = Month, y = reorder(PTID, n.symptoms))) + 
  geom_tile(aes(fill = Experienced.Symptom)) + 
  scale_fill_manual(name = 'Experienced Symptom', 
                    values = c('No' = 'aquamarine2', 'Yes' = 'firebrick2', 'NA' = 'gray45')) + 
  ggtitle('Fatigue') + 
  theme(panel.grid.major = element_blank(),      # remove background and grid
        panel.background = element_blank(), 
        axis.line = element_blank(),             # remove lines in X- and Y-axes
        axis.ticks.x = element_blank(),          # remove ticks in X- and Y-axes
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),          # remove titles in X- and Y-axes
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15, color = 'black', face = "bold"),
        legend.position = 'none',                # remove legend
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'))

# Last: Hair Loss #########################################################################################
Hair_Loss <- Dt[which(Dt$Symptom == 'Hair Loss'), ]
p_Hair_Loss <- 
  ggplot(Hair_Loss, 
         aes(x = Month, y = reorder(PTID, n.symptoms))) + 
  geom_tile(aes(fill = Experienced.Symptom)) + 
  scale_fill_manual(name = 'Experienced Symptom', 
                    values = c('No' = 'aquamarine2', 'Yes' = 'firebrick2', 'NA' = 'gray45')) + 
  ggtitle('Hair Loss') + 
  theme(panel.grid.major = element_blank(),      # remove background and grid
        panel.background = element_blank(), 
        axis.line = element_blank(),             # remove lines in X- and Y-axes
        axis.ticks.x = element_blank(),          # remove ticks in X- and Y-axes
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),          # remove titles in X- and Y-axes
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_blank(),
        legend.text = element_text(size = 15, color = 'black', face = "bold"),
        legend.title = element_text(size = 15, face = "bold", color = 'black'),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'))

# Middle: other symptoms  #################################################################################
Var.Middle <- c('Headache', 'Muscle Aches', 'Breathing Difficulties', 'Loss Taste/Smell', 
                'Joint Pain', 'Vertigo', 'Lowering Vision', 'Brain Fog')
Heatmap.Middle <- list()
for(i in 1:length(Var.Middle)){
  Data <- Dt[which(Dt$Symptom == Var.Middle[i]), ]
  p <- 
  ggplot(Data, 
         aes(x = Month, y = reorder(PTID, n.symptoms))) + 
  geom_tile(aes(fill = Experienced.Symptom)) + 
  scale_fill_manual(name = 'Experienced Symptom', 
                    values = c('No' = 'aquamarine2', 'Yes' = 'firebrick2', 'NA' = 'gray45')) + 
  ggtitle(Var.Middle[i]) + 
  theme(panel.grid.major = element_blank(),      # remove background and grid
        panel.background = element_blank(), 
        axis.line = element_blank(),             # remove lines in X- and Y-axes
        axis.ticks.x = element_blank(),          # remove ticks in X- and Y-axes
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),          # remove titles in X- and Y-axes
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_blank(),
        legend.position = 'none',                # remove legend
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'))
  Heatmap.Middle[[i]] <- p
}

To remove the the legend in plots, we pass the command legend.position = 'none' into the function theme(). Since we have multiple plots now, we are going to arrange their positions and eventually pack them together!

This time since the width of plots are different since the first and last one contains more non-data components than the heatmaps in the middle, we need to use a layout matrix to arrange the position of heatmaps. The idea for the matrix is very simple and we just need to repeat the rank number for each heatmap multiple times (the number of repeat is used to modify the width of each heatmap). In here, we set up the width for the first nine heatmaps as 8 units and the width for the last heatmap as 16 units (since it has a legend). Since all heatmaps are displayed in a row, the layout matrix is just a row vector which contains different lengths of repeated rank numbers and it looks like

Layout.Mat <- matrix(c(rep(1:9, each = 8), rep(10, 16)), nrow = 1)
Layout.Mat

To arrange the position of heatmaps, we use the function grid.arrange() and pass the command layout_matrix = Layout.Mat into it.

grid.arrange(p_Fatigue,
             Heatmap.Middle[[1]],
             Heatmap.Middle[[2]],
             Heatmap.Middle[[3]],
             Heatmap.Middle[[4]],
             Heatmap.Middle[[5]],
             Heatmap.Middle[[6]],
             Heatmap.Middle[[7]],
             Heatmap.Middle[[8]], 
             p_Hair_Loss, 
             layout_matrix = Layout.Mat)

It eventually looks like the above one! Its actual size (width * height) and resolution should be adjusted when you are going to output it from R.

Part 2: Numerical variable

1. Format of dataset for heatmap

Similar to the dataset on the first part, the only difference is that this time we are going to fill out numerical values for tiles rather than categorical/nominal values.

This dataset includes 4 variables:

2. Heatmap for one type of Comparison

In here, we first create a heatmap for a pair of symptoms in acute and 6-month timepoint.

# Extract the specific type #################################################################################
Dt1 <- Dt[which(Dt$Comparison == "Acute vs 6-month"), ]
# Plot ###################################################################################################### 
p_acute_6m <- 
  ggplot(Dt1, aes(x = Symptom.PriorTimepoint, y = Symptom.LaterTimepoint)) + 
  geom_tile(aes(fill = Corr)) + 
  ggtitle('Acute vs 6-month') + 
  scale_fill_gradientn(breaks = seq(-1, 1, length.out = 6), 
                       colors = viridis(6), n.breaks = 6, limits = c(-1, 1), na.value = 'gray27') + 
  theme(panel.grid.major = element_blank(), 
        panel.background = element_blank(), 
        axis.line = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15, color = 'black', face = "bold"),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'),
        legend.text = element_text(size = 15, hjust = 0.5, color = 'black'),
        legend.title = element_text(size = 15, hjust = 0.5, face = "bold", color = 'black'))
p_acute_6m

All non-data component settings for heatmaps in part 2 are the same as the part 1 except the function scale_fill_gradientn().

3. Heatmap for multiple types of Comparison

Similar to the part 1, we will remove the legend for the first heatmap, remove the texts in Y-axis and legend for the heatmap in the middle, and remove the texts in Y-axis for the last heatmap.

# First: Acute vs 6-month ############################################################################
Dt1 <- Dt[which(Dt$Comparison == "Acute vs 6-month"), ]
p_acute_6m <- 
  ggplot(Dt1, aes(x = Symptom.PriorTimepoint, y = Symptom.LaterTimepoint)) + 
  geom_tile(aes(fill = Corr)) + 
  ggtitle('Acute vs 6-month') + 
  scale_fill_gradientn(breaks = seq(-1, 1, length.out = 6), 
                       colors = viridis(6), n.breaks = 6, limits = c(-1, 1), na.value = 'gray27') + 
  theme(panel.grid.major = element_blank(), 
        panel.background = element_blank(), 
        axis.line = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_text(size = 15, color = 'black', face = "bold"),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'),
        legend.position = 'none')

# Middle: 6-month vs 12-month #######################################################################
Dt2 <- Dt[which(Dt$Comparison == "6-month vs 12-month"), ]
p_acute_12m <- ggplot(Dt2, aes(x = Symptom.PriorTimepoint, y = Symptom.LaterTimepoint)) + 
  geom_tile(aes(fill = Corr)) + 
  ggtitle('6-month vs 12-month') + 
  scale_fill_gradientn(breaks = seq(-1, 1, length.out = 6), 
                       colors = viridis(6), n.breaks = 6, limits = c(-1, 1), na.value = 'gray27') + 
  theme(panel.grid.major = element_blank(), 
        panel.background = element_blank(), 
        axis.line = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'),
        legend.position = 'none')

# Last: 12-month vs 24-month ##########################################################################
Dt3 <- Dt[which(Dt$Comparison == "12-month vs 24-month"), ]
p_acute_24m <- ggplot(Dt3, aes(x = Symptom.PriorTimepoint, y = Symptom.LaterTimepoint)) + 
  geom_tile(aes(fill = Corr)) + 
  ggtitle('12-month vs 24-month') + 
  scale_fill_gradientn(breaks = seq(-1, 1, length.out = 6), 
                       colors = viridis(6), n.breaks = 6, limits = c(-1, 1), na.value = 'gray27') + 
  theme(panel.grid.major = element_blank(), 
        panel.background = element_blank(), 
        axis.line = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_text(size = 15, color = 'black', face = "bold", angle = 45, hjust = 1),
        axis.text.y = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5, face = "bold", color = 'black'),
        legend.text = element_text(size = 15, hjust = 0.5, color = 'black'),
        legend.title = element_text(size = 15, hjust = 0.5, face = "bold", color = 'black'))
# Arrange heatmaps ######################################################################################
Layout.Mat <- matrix(c(rep(1, 14), rep(2, 10), rep(3, 12)), nrow = 1)
grid.arrange(p_acute_6m, 
             p_acute_12m, 
             p_acute_24m, 
             layout_matrix = Layout.Mat)

Eventually, we get the final one!