Yongzhe Wang

Pie Chart in R with ggplot2

In this tutorial, I will demonstrate how to create a pie chart using the ggplot2 and ggrepel packages in R. A pie chart displays the numerical proportions of a variable in polar coordinates; in ggplot2 it is built the same way as a stacked bar chart. Unlike a bar chart, however, a pie chart emphasizes percentages of a whole rather than raw counts. To create the chart, I will use the publicly available National Cancer Database (NCDB) to display, for each race and ethnicity group, the percentage of patients by type of cancer facility.

1. Dataset

After data cleaning, the dataset I used to create pie charts includes four variables:

I will use this simple dataset to first create a single pie chart for Mexican patients, and then multiple pie charts, one for each Origin group. First, I will create a variable that holds the percentage labels shown in the pie chart: the paste0() function concatenates the numerical value of Percentage with a percent sign (%) and stores the result as a character variable. Then I will convert Origin and Type to factors.

# Percentage label, e.g. 12.3 becomes "12.3%"
Dt.Plot$Percentage.Label <- paste0(Dt.Plot$Percentage, "%")

# Convert Origin and Type to factors with an explicit level order
Dt.Plot$Origin <- factor(Dt.Plot$Origin, levels = c("Mexican", "White, NH", "Black, NH"))
Dt.Plot$Type <- factor(Dt.Plot$Type, levels = c("Comprehensive Community Cancer Program",
                                                "Academic/Research Program", 
                                                "Community Cancer Program", 
                                                "Integrated Network Cancer Program", 
                                                "Missing"))

In ggplot2, a pie chart is a stacked bar chart drawn in polar coordinates. To label percentages on the pie, you must supply a position for each Percentage.Label along the stacked axis, which becomes the angle once the bar is mapped into the polar system. I will use the arrange(), desc(), group_by(), and mutate() functions from the dplyr package and the cumsum() function from base R.

Here, I arrange the dataset by Origin and then by Type in decreasing order. The decreasing order matters because ggplot2 stacks the first factor level on top, so the last level sits at the bottom of the bar, where the cumulative sum starts.

A bar plot from geom_bar() or geom_col() stacks the rectangles for each level of Type on top of one another, and I want to place each Percentage.Label in the middle of its rectangle.

To do this, I group the data frame by Origin and, within each group, subtract half of each Percentage from the cumulative sum of Percentage.

library(dplyr)

# Label positions for the pie chart
Dt.PieChart <- 
  Dt.Plot %>% 
  arrange(Origin, desc(Type)) %>%  # sort by Origin, then by decreasing order of Type
  group_by(Origin) %>%             # compute positions separately for each Origin
  mutate(text_y = cumsum(Percentage) - Percentage/2) # the middle of each rectangle

head(Dt.PieChart)
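The midpoint arithmetic is easier to follow on a small worked example. The sketch below uses three made-up percentages for a single group (not NCDB values) and base R only:

```r
# Hypothetical percentages for one group, listed from the bottom of the stack up
Percentage <- c(50, 30, 20)

upper  <- cumsum(Percentage)       # upper edge of each rectangle: 50, 80, 100
text_y <- upper - Percentage / 2   # midpoint of each rectangle:   25, 65, 90

data.frame(Percentage, upper, text_y)
```

Each label ends up halfway between the lower and upper edge of its rectangle, which is the position that text_y holds in Dt.PieChart.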

Dt.PieChart will be used to create pie charts!
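If you do not have the cleaned NCDB table at hand, a stand-in data frame is enough to follow along. The sketch below is an assumption on my part: the name Dt.Toy and all percentages are invented for illustration, and only the columns that the code in this section touches are included.

```r
# Hypothetical stand-in for the cleaned table; the numbers are invented
# for illustration and are NOT NCDB results.
types <- c("Comprehensive Community Cancer Program",
           "Academic/Research Program",
           "Community Cancer Program",
           "Integrated Network Cancer Program",
           "Missing")
origins <- c("Mexican", "White, NH", "Black, NH")

Dt.Toy <- data.frame(
  Origin     = rep(origins, each = length(types)),
  Type       = rep(types, times = length(origins)),
  Percentage = c(40, 30, 10, 15, 5,
                 35, 35, 10, 15, 5,
                 30, 40, 10, 15, 5),
  stringsAsFactors = FALSE
)

# Same preparation steps as above: label column, then factors
Dt.Toy$Percentage.Label <- paste0(Dt.Toy$Percentage, "%")
Dt.Toy$Origin <- factor(Dt.Toy$Origin, levels = origins)
Dt.Toy$Type   <- factor(Dt.Toy$Type, levels = types)

# Each group's percentages should add up to 100
tapply(Dt.Toy$Percentage, Dt.Toy$Origin, sum)
```

Assigning Dt.Toy to Dt.Plot should let the dplyr step in this section run on it unchanged.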

2. Single pie

To create a single pie chart, three functions are needed:

I will extract a subset of Dt.PieChart for 'Mexican' and create a pie chart to show the percentage of facility type for Mexican patients.

library(ggplot2)
library(ggrepel)

Dt.Mexican <- Dt.PieChart[which(Dt.PieChart$Origin == 'Mexican'), ]

ggplot(Dt.Mexican, 
       aes(x = '',               # We don't want to show x axis
           y = Percentage,       # y axis is numerical percentage
           fill = Type)) +       # color for each facility type
  geom_col(width = 1) +          # create a stacked bar with width = 1
  coord_polar(theta = "y") +     # map the y axis to the angle, turning the bar into a pie
  scale_fill_brewer() +          # ColorBrewer fill palette
  geom_text_repel(aes(y = text_y, 
                      label = Percentage.Label),  # label text at the middle of each slice
                  size = 4,                       # size of text
                  show.legend = FALSE,            # keep the text layer out of the legend
                  nudge_x = 1.5) +                # push the labels outward from the pie
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        plot.background = element_blank(),
        legend.background = element_blank(),
        legend.title = element_text(size = 13, face = 'bold'), 
        legend.text = element_text(size = 13),
        strip.text = element_text(size = 13, face = 'bold'),      
        strip.background = element_blank())
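To see that the pie is nothing more than a stacked bar in polar coordinates, the same layers can be drawn with and without coord_polar(). This is a minimal sketch on a made-up, single-group data frame (the name toy and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical one-group data; the values are made up for illustration
toy <- data.frame(Type = c("A", "B", "C"), Percentage = c(50, 30, 20))

# A single stacked bar: one dummy x category, rectangles stacked along y
bar <- ggplot(toy, aes(x = "", y = Percentage, fill = Type)) +
  geom_col(width = 1)

# The same layers with y mapped to the angle: the bar becomes a pie
pie <- bar + coord_polar(theta = "y")

print(bar)
print(pie)
```

Comparing the two plots shows that the height of each rectangle in the bar corresponds to the angle of each slice in the pie.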

3. Multiple pies

For multiple pie charts, we use the facet_grid() function, which forms a matrix of panels defined by row and column faceting variables. Here the panels are laid out in columns, one per Origin.

ggplot(Dt.PieChart, 
       aes(x = '',               # We don't want to show x axis
           y = Percentage,       # y axis is numerical percentage
           fill = Type)) +       # color for each facility type
  geom_col(width = 1) +          # create a stacked bar with width = 1
  coord_polar(theta = "y") +     # map the y axis to the angle, turning the bar into a pie
  scale_fill_brewer() +          # ColorBrewer fill palette
  geom_text_repel(aes(y = text_y, 
                      label = Percentage.Label),  # label text at the middle of each slice
                  size = 4,                       # size of text
                  show.legend = FALSE,            # keep the text layer out of the legend
                  nudge_x = 1.5) +                # push the labels outward from the pie
  facet_grid(cols = vars(Origin),   # faceted by Origin variable
             scales = 'fixed') + 
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        plot.background = element_blank(),
        legend.background = element_blank(),
        legend.title = element_text(size = 13, face = 'bold'), 
        legend.text = element_text(size = 13),
        strip.text = element_text(size = 13, face = 'bold'),      
        strip.background = element_blank())

Here we go!
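Because the panels share one fixed scale, every group should sum to about 100 and its label positions should be computed within the group; otherwise a pie can show a gap or misplaced labels. The sketch below is one way to check this, using an invented two-group data frame (toy, G1, and G2 are hypothetical names, not part of the NCDB data):

```r
library(dplyr)

# Hypothetical two-group data; the values are made up for illustration
toy <- data.frame(
  Origin     = rep(c("G1", "G2"), each = 3),
  Type       = rep(c("A", "B", "C"), times = 2),
  Percentage = c(50, 30, 20, 60, 25, 15)
)

checks <- toy %>%
  arrange(Origin, desc(Type)) %>%
  group_by(Origin) %>%                                   # positions restart in each group
  mutate(text_y = cumsum(Percentage) - Percentage / 2) %>%
  summarise(total      = sum(Percentage),                # should be about 100
            max_text_y = max(text_y),                    # should stay below the total
            .groups = "drop")

checks
```

If group_by() were left out, the cumulative sum would keep running across groups and the label positions of the second group would land beyond 100.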