Yongzhe Wang

Gene Map Plot in R with ggplot2

Cover

This tutorial introduces the visualization of gene maps with arrows in R using the ggplot2 and gggenes packages. It provides a linear representation of gene strain segments in genomes, commonly employed in scientific publications for gene description. We’ll generate a synthetic dataset to simulate a gene map resembling Figure 1E in a recent paper detailing a mouse model for triple-negative breast cancer. This synthetic dataset comprises three distinct genomes, each containing different gene segments. With a straightforward structure, the dataset consists of four columns:

We will initially demonstrate how to visualize a single genome and then extend the discussion to multiple genomes. Two core functions are used to generate a gene map:

In addition to these functions, it’s essential to define the xmin (start location), xmax (end location), y (genome), and fill (gene segments) arguments within the aes() function. By specifying the fill argument, each gene within a genome will be categorized with a distinct color and displayed in the legend. To label each gene in the figure and remove the legend, users should include an additional argument, label, in the aes() function within the ggplot() function.

SingleGenome <- Dt[which(Dt$Molecule == 'Rosa LSL-Myc'), ]
# Version 1: default
p1 <- 
ggplot(SingleGenome, 
       aes(xmin = Start, xmax = End,      # Specify start/end location for a gene
           y = Molecule,                  # Specify genome
           fill = Gene)) +                # Specify color for genes
  geom_gene_arrow() +                     # Draw gene map as arrows
  scale_fill_brewer(palette = "Set3") +   # Specify color palette
  theme_genes() +                         # Specify theme for gene map
  theme(axis.text.x = element_blank(),    # Remove x axis 
        axis.ticks.x = element_blank(),   
        axis.title.x = element_blank(),
        axis.line.x = element_blank(),
        axis.title.y = element_blank(),   # Remove title in y axis
        axis.text.y = element_text(face = 'bold', color = 'black'),
        legend.title = element_text(face = 'bold', color = 'black'))
p1 

# Version 2: modify the shape of arrowhead
p2 <- 
ggplot(SingleGenome, 
       aes(xmin = Start, xmax = End,      # Specify start/end location for a gene
           y = Molecule,                  # Specify genome
           fill = Gene)) +                # Specify color for genes
  geom_gene_arrow(arrowhead_height = unit(3, "mm"),    # Modify height and width
                  arrowhead_width = unit(1, "mm")) +                     
  scale_fill_brewer(palette = "Set3") +   # Specify color palette
  theme_genes() +                         # Specify theme for gene map
  theme(axis.text.x = element_blank(),    # Remove x axis 
        axis.ticks.x = element_blank(),   
        axis.title.x = element_blank(),
        axis.line.x = element_blank(),
        axis.title.y = element_blank(),   # Remove title in y axis
        axis.text.y = element_text(face = 'bold', color = 'black'),
        legend.title = element_text(face = 'bold', color = 'black'))
p2 

# Version 3: add label to genes and remove legend
p3 <- 
ggplot(SingleGenome, 
       aes(xmin = Start, xmax = End,      # Specify start/end location for a gene
           y = Molecule,                  # Specify genome
           fill = Gene,                   # Specify color for genes
           label = Gene)) +                
  geom_gene_arrow(arrowhead_height = unit(3, "mm"),    # Modify height and width
                  arrowhead_width = unit(1, "mm")) +        
  geom_gene_label() +                     # Specify labels
  scale_fill_brewer(palette = "Set3") +   # Specify color palette
  theme_genes() +                         # Specify theme for gene map
  theme(axis.text.x = element_blank(),    # Remove x axis 
        axis.ticks.x = element_blank(),   
        axis.title.x = element_blank(),
        axis.line.x = element_blank(),
        axis.title.y = element_blank(),   # Remove title in y axis
        axis.text.y = element_text(face = 'bold', color = 'black'),
        legend.position = 'none')                        
p3

# Show figure
grid.arrange(p1, p2, p3, ncol = 1)

The initial plot represents the default version, the second plot reflects a slight modification to the arrowhead, and in the last one, we remove the legend while adding text labels to the genes. After visualizing a single genome, the next step involves creating a collective representation of multiple genomes in a single figure. Here, the facet_wrap() function from the ggplot2 package becomes essential. Given that genes in genomes occupy distinct locations, it’s crucial to include the argument scales = "free" in the function. This enables stacking various gene maps together using ncol = 1.

p4 <- 
ggplot(Dt, 
       aes(xmin = Start, xmax = End,      # Specify start/end location for a gene
           y = Molecule,                  # Specify genome
           fill = Gene,                   # Specify color for genes
           label = Gene)) +                
  geom_gene_arrow(arrowhead_height = unit(3, "mm"),    # Modify height and width
                  arrowhead_width = unit(1, "mm")) +        
  geom_gene_label() +                     # Specify labels
  facet_wrap(~ Molecule, scales = "free", ncol = 1) + 
  scale_fill_brewer(palette = "Set3") +   # Specify color palette
  theme_genes() +                         # Specify theme for gene map
  theme(axis.text.x = element_blank(),    # Remove x axis 
        axis.ticks.x = element_blank(),   
        axis.title.x = element_blank(),
        axis.line.x = element_blank(),
        axis.title.y = element_blank(),   # Remove title in y axis
        axis.text.y = element_text(face = 'bold', color = 'black'),
        legend.position = 'none')                        
p4

In this figure, each gene map for every genome is vertically stacked, with each gene designated a unique color. In this synthetic dataset for genomes, the genes are not shared across genomes. However, at times, the same genes are shared across different genomes, but located in different positions. Hence, we’ll introduce the make_alignment_dummies() function from the gggenes package to assist in aligning genes across genomes. We will use the sample dataset provided by the gggenes package to show how we can align genes across genomes. The make_alignment_dummies() function takes several arguments:

After passing the gene dataset into the make_alignment_dummies() function, it’s essential to store the result and subsequently input it into the geom_blank() function from the ggplot2 package when creating the figure. Let’s start by examining the sample dataset (example_genes) provided by the gggenes package, which comprises 6 columns:

As the sample dataset includes gene orientations, indicating the arrowhead orientation for each gene, this information can be integrated into the figure using an additional argument, aes(forward = orientation), in the geom_gene_arrow() function. Please note that the forward argument can only accept a vector with numerical values of 0 or 1. We can incorporate all this information to generate a comprehensive gene map plot:

dummies <- make_alignment_dummies(example_genes, 
                                  aes(xmin = start, xmax = end,y = molecule,id = gene), 
                                  on = "genE")

p5 <- 
ggplot(example_genes, 
       aes(xmin = start, xmax = end,      # Specify start/end location for a gene
           y = molecule,                  # Specify genome
           fill = gene))+                 # Specify color for genes
  geom_gene_arrow(aes(forward = orientation),
                  arrowhead_height = unit(3, "mm"),    # Modify height and width
                  arrowhead_width = unit(1, "mm")) +        
  geom_blank(data = dummies) + 
  facet_wrap(~ molecule, scales = "free", ncol = 1) + 
  scale_fill_brewer(palette = "Set3") +   # Specify color palette
  theme_genes() +                         # Specify theme for gene map
  theme(axis.text.x = element_blank(),    # Remove x axis 
        axis.ticks.x = element_blank(),   
        axis.title.x = element_blank(),
        axis.line.x = element_blank(),
        axis.title.y = element_blank(),   # Remove title in y axis
        axis.text.y = element_text(face = 'bold', color = 'black'))                        
p5

In this gene map, we can find that the figure was aligned in the geneE.