Formatting Math Symbols and Expressions in ggplot Labels

Yesterday, I was trying to put some finishing touches on a figure I made in ggplot2 that visualizes some simulation results. The plot features several panels using facet_grid(), and uses colors to distinguish between different regression models that were fit to the simulated data. I wanted to label certain axes and panel names using the Greek letters I had used as parameter notation, and I also wanted the labels in the color legend to correspond to the different regression models I had fit.

The problem was, I had no clue how to do this! So, I consulted #rstats Twitter, got some really great tips, and figured that I’d share them all in a quick demo blog post (mostly so that I can easily find this info the next time I need it! 😂).


First, let’s load the necessary packages:

library(dplyr)    # mutate(), recode_factor()
library(ggplot2)  # plotting
library(scales)   # parse_format()

Next, let’s generate some random data for plotting (I’m including two binary variables for grouping purposes):

data = data.frame(x = rnorm(50),
                  y = rnorm(50),
                  c = factor(rep(c("a", "b"), each = 25)),
                  d = factor(rep(0:1, length.out = 50)))

Here’s what the data look like:
x y c d
0.3742627 1.5543698 a 0
-0.9425578 -0.0823036 a 1
-1.6657054 -1.0520083 a 0
0.5832060 -0.3072647 a 1
0.1599096 0.4622656 a 0
1.0003727 0.6914586 a 1
1.2789265 -2.1005350 a 0
1.1214342 0.7142599 a 1
-0.5663470 -0.5445171 a 0
-0.4726843 0.7298770 a 1
-0.3706856 -0.1009510 a 0
0.2621671 -0.3572437 a 1
0.8466112 -0.6987421 a 0
1.0164569 0.3779464 a 1
-0.5712356 -1.4183010 a 0
-0.0972607 -1.8983545 a 1
0.6966767 -0.2568583 a 0
0.5856234 0.4697543 a 1
-0.2805859 2.1859859 a 0
-0.9959265 0.3985238 a 1
0.4911207 -0.5948105 a 0
1.7149612 0.3328452 a 1
0.4051799 -0.2179183 a 0
0.6710795 0.0428971 a 1
0.8042860 -0.1560667 a 0
-0.4853985 -1.0871922 b 1
-0.4248538 0.4763488 b 0
0.7339685 1.7302655 b 1
-0.0382100 -0.5277092 b 0
1.0148851 -0.8100530 b 1
0.4882840 -1.4875938 b 0
-1.3390943 0.3942370 b 1
0.0874273 0.6504262 b 0
-1.0976633 0.1321445 b 1
0.8200730 -0.0614851 b 0
-1.1440706 -1.4008513 b 1
1.8051419 0.4173913 b 0
0.5466853 -0.5349018 b 1
2.1209070 0.0183752 b 0
-0.0819052 1.2833521 b 1
1.6778101 0.1788126 b 0
-0.1120956 0.1040616 b 1
0.3878472 0.5414568 b 0
-1.2145690 0.1132756 b 1
-0.7031186 0.7151388 b 0
1.3898619 0.1627585 b 1
0.0777624 -0.9014512 b 0
-0.5475966 -0.6950393 b 1
0.3948451 0.8542271 b 0
0.2029085 0.3385795 b 1
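
Since the values come from rnorm(), they will be different every time the code is run. Here is a minimal sketch of how the same data frame could be made reproducible by fixing the random seed first. The seed value (2020) and the name data_seeded are my own picks for illustration, so this will not reproduce the exact numbers shown above.

```r
# Fix the random number generator so rnorm() returns the same draws each run.
# The seed value is arbitrary (an assumption, not from the original post).
set.seed(2020)

data_seeded <- data.frame(
  x = rnorm(50),
  y = rnorm(50),
  c = factor(rep(c("a", "b"), each = 25)),   # 25 rows of "a", then 25 of "b"
  d = factor(rep(0:1, length.out = 50))      # alternating 0, 1, 0, 1, ...
)

head(data_seeded)
```

Rerunning the whole block (including the set.seed() call) should give back the same rows each time.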

Next, let’s make a simple panel of scatter plots using ggplot(), coloring the points by the variable ‘c’ and creating two panels so that the points are grouped by the variable ‘d’:

ggplot(data) +
  geom_point(aes(x = x, y = y, col = c)) +
  facet_grid(~ d)

This is how the plot would look if we didn’t make any alterations to any of the labels. Using the code above as something to build upon, let’s go through some examples of how to change different types of labels on the plot to incorporate Greek symbols and math expressions.


Plot Titles, Axes and Legend Titles

One way to modify plot titles, axis labels and legend titles is through the labs() function in ggplot2. To add math notation to those labels, we can wrap the label text in expression(), which is rendered using R’s plotmath syntax. For example, if we wanted to modify the plot above such that the title was “\(Y \sim X\)” (written as Y %~% X in plotmath), the x axis was labeled “\(\beta_0\),” and the legend title read “Values of \(\mu\),” we could run the following:

ggplot(data) +
  geom_point(aes(x = x, y = y, col = c)) +
  facet_grid(~ d) +
  labs(title = expression(Y %~% X),
       x = expression(beta[0]),
       col = expression(paste('Values of ', mu)))

🌟 BTW: This website will come in handy when figuring out the math expression syntax! (The same syntax is also documented within R itself; run ?plotmath to see it.)
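
As a side note, plotmath also lets you glue text and symbols together without paste(): a tilde (~) joins two pieces with a space, and an asterisk (*) joins them with no space. Below is a small sketch of the legend title written that way. The toy data frame and the object names are made up purely so the snippet runs on its own.

```r
library(ggplot2)

# Hypothetical toy data, only here to make the sketch self-contained
toy <- data.frame(x = rnorm(10),
                  y = rnorm(10),
                  c = factor(rep(c("a", "b"), each = 5)))

# "Values of" ~ mu : quoted text, a space, then the Greek letter mu
legend_title <- expression("Values of" ~ mu)

p <- ggplot(toy) +
  geom_point(aes(x = x, y = y, col = c)) +
  labs(title = expression(Y %~% X),
       x = expression(beta[0]),
       col = legend_title)

p
```

Whether you prefer paste() or the ~ operator is mostly a matter of taste; both end up as plotmath expressions.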

Legend Values

Next, let’s play around with the text of the values shown in the legend. Suppose we want to show that the names of the groups used to color the points are not actually ‘a’ and ‘b,’ but ‘\(\alpha\)’ and ‘\(\beta\)’, respectively. To reformat the color legend values, we’ll use the parse_format() function from the scales package, which turns each label string into a plotmath expression. (Recent versions of scales also offer this helper under the name label_parse().)

🚨 Before modifying the plot, we will first recode the variable ‘c’ such that the values are character strings containing the expressions we want to show:

data = data %>% 
  mutate(c = recode_factor(c, `a` = "alpha", `b` = "beta"))

Now, let’s modify the color labels:

ggplot(data) +
  geom_point(aes(x = x, y = y, col = c)) +
  facet_grid(~ d) +
  labs(title = expression(Y %~% X),
       x = expression(beta[0]),
       col = expression(paste('Values of ', mu))) +
  scale_colour_discrete(labels = parse_format())
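
If you would rather not recode the variable, another option is to hand the scale a vector of expressions directly. This is only a sketch under a couple of assumptions: the toy data frame is invented so the code runs on its own, and the expressions are listed in the same order as the factor levels ("a" first, then "b").

```r
library(ggplot2)

# Hypothetical toy data with the original, un-recoded levels "a" and "b"
toy <- data.frame(x = rnorm(10),
                  y = rnorm(10),
                  c = factor(rep(c("a", "b"), each = 5)))

# One expression per factor level, in level order: a -> alpha, b -> beta
greek_labels <- expression(alpha, beta)

p <- ggplot(toy) +
  geom_point(aes(x = x, y = y, col = c)) +
  scale_colour_discrete(labels = greek_labels)

p
```

The trade-off is that the mapping from level to label now depends on the order of the levels, whereas recoding the variable keeps the label attached to the data.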

Facet Labels

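Lastly, let’s change the labels of the different plot panels to read ‘\(\gamma = 1\)’ and ‘\(\gamma = 2\)’. To do so, we will set the labeller argument in the facet_grid() plotting step to label_parsed, which parses each facet value as a plotmath expression. (The shorter spelling label = "label_parsed" relies on R’s partial matching of argument names, so the full name is the safer choice.)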

🚨 Again, before we do this, we’ll need to recode the variable that is used to create the facet grid:

data = data %>% 
  mutate(d = recode_factor(d, `0` = "gamma == 1", `1` = "gamma == 2"))
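
Note the double equals sign in the recoded values: in plotmath, an equals sign is written as ==, just like R's comparison operator. The sketch below uses base R's parse() to show what those strings turn into once they are parsed. This is only meant to illustrate the idea and is not necessarily how ggplot2 handles it internally; the object names are made up.

```r
# The same strings used as the new facet values
facet_strings <- c("gamma == 1", "gamma == 2")

# parse() turns each string into an unevaluated R expression
facet_exprs <- parse(text = facet_strings)

length(facet_exprs)   # one expression per string
facet_exprs[[1]]      # a call to `==` with arguments gamma and 1
```

Because the strings have to be valid R syntax, a label that does not parse (for example, one with unbalanced parentheses) will cause an error rather than being shown as plain text.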

Now let’s modify the panel names!

ggplot(data) +
  geom_point(aes(x = x, y = y, col = c)) +
  # labeller is the full argument name; label = "label_parsed" relies on partial matching
  facet_grid(~ d, labeller = label_parsed) +
  labs(title = expression(Y %~% X),
       x = expression(beta[0]),
       col = expression(paste('Values of ', mu))) +
  scale_colour_discrete(labels = parse_format())


There you have it! Hopefully these examples will come in handy the next time you need to include math expressions in a plot. Thank you to Ben Williams and Jeremy Yoder for coming to my rescue on Twitter! 🙌 🎉
