Yongzhe Wang

Kaplan Meier Curve in R with ggplot2

Kaplan Meier

In this tutorial, we will go through survival curves (e.g. Kaplan-Meier curve, Cox model curve) with an extension package survminer from ggplots.

1. Format of dataset

The dataset used for the tutorial includes follow-up time for patients with a specific type of cancer and their corresponding endpoint for the last follow-up visits. It includes clinical variables for patients as well. The detail is listed below:

2. Kaplan-Meier Curve

(1). Version 1: Basic

We will draw a basic version of Kaplan-Meier curve with the aim of survminer. We first need to fit a Kaplan-Meier estimator for the cancer Stage variable and this will require survival package. In here, we will implement the following functions:

# Fit a Kaplan-Meier curve for cancer stage
KM <- survfit(Surv(Time, Status) ~ Stage, data = Dt.Plot)

After we have a Kaplan-Meier estimator, we can plug into the main function for our plot ggsurvplot(). There are several arguments for the function and we are going to introduce them:

# Basic: survical curve with ggplot
ggsurvplot(fit = KM, data = Dt.Plot,   # Kaplan-Meier estimator and dataset               
           pval = T,                   # show p-value
           pval.size = 4,              # font size of p-value
           pval.coord = c(175, 1.0),   # coordination of p-value (x, y)
           legend = 'right',           # location of legend
           surv.median.line = 'hv')    # show median survival line 

Since the sample size of the dataset is large, the Keplan-Meier curves look smooth and log-rank test shows significance for difference in trends among stages.

(2). Version 2: Facet plot

The second version is to draw a multipanel survival plot. We want to stratify the Kaplan-Meier curve into different combinations of treatments and stages so we create another Kaplan-Meier estimator stratified by both Treatment and Stage. We also introduce two other arguments for this version:

KM2 <- survfit(Surv(Time, Status) ~ Stage + Treatment, data = Dt.Plot)
ggsurvplot(KM2, data = Dt.Plot, 
           facet.by = 'Treatment',                                          # create multipanel used variable "Treatment"
           pval = TRUE, 
           legend = 'right', 
           legend.labs = c('Stage I', 'Stage II', 'Stage III', 'Stage IV')) # create a self-defined legend label

We can a 2-by-2 multi-panel survival plot, grouped by Treatment and in each panel, it is stratified by cancer Stage. Tests on trends are performed for each panel as well. In the next version, we are going to explore how to the modification of non-graphical argument for survival plots (i.e., font size/family, etc.).

(3). Version 3: Modification of non-graphical components

In here, we first introduce a self create function for modifying non-graphical components and this self defined function uses ggplot2 arguments.

# Customize theme
Customized.Theme <- function(){
  theme_survminer() %+replace%                                        # a ggplot2 function 
    theme(panel.border = element_blank(),
          panel.background = element_blank(),                    
          panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          axis.line.x = element_line(),                               # these two are for the axis line
          axis.line.y = element_line(),
          axis.text.x = element_text(family = 'Times', colour = "black", size = 11),    # there two are for texts in axes
          axis.text.y = element_text(family = 'Times', colour = "black", size = 11),
          axis.ticks.x = element_line(),                              # these two are for ticks in axes
          axis.ticks.y = element_line(),
          axis.title.x = element_text(family = 'Times', colour = "black", size = 11, face = 'bold', vjust = -1),                              
          axis.title.y = element_text(family = 'Times', colour = "black", size = 11, face = 'bold', angle = 90, vjust = 3),
          legend.title = element_text(family = 'Times', colour = "black", size = 11, face = 'bold'),
          legend.text = element_text(family = 'Times', colour = "black", size = 11),
          strip.text.x = element_text(family = 'Times', colour = "black", size = 11))
}

In the Customized.Theme() function, we set the font family for text as “Times” and size as 11 for axes, legends, and labels. Meanwhile, we remove the background and grid and use self defined color/palette for different cancer Stage as usual.

# Color/palette for cancer stage
Legend.Labs <- c('Stage I', 'Stage II', 'Stage III', 'Stage IV')            # create labels for cancer stages
n.color <- length(Legend.Labs)                                              # the lenght of vector for labels
Palette.New <- rainbow(n.color, alpha = 0.5)                                # obtain a vector of colors for labels

# Kaplan-Meier estimator
KM2 <- survfit(Surv(Time, Status) ~ Stage + Treatment, data = Dt.Plot)    

# Survival plot
ggsurvplot(KM2, data = Dt.Plot, 
           facet.by = 'Treatment',                                          # create multipanel used variable "Treatment"
           pval = TRUE, 
           legend = 'right',                                                # legend position
           legend.labs = Legend.Labs,                                       # self-defined legend label
           palette = Palette.New,                                           # self-defined color
           xlab = 'Time (Month)', ylab = 'Survival Probability',            # modify X-axis and Y-axis labels
           ggtheme = Customized.Theme())                                    # use modified non-graphical components

Here we go! For other non-graphical components, we can change them in the Customized.Theme() and follow arguments in ggplot2.