Order Your Data with Intention – ggplot edition

Before/After slide transformation. Before slide (left) shows several categories of data displayed in a lollipop chart where the values are not ordered meaningfully. The After slide shows the same data sorted in descending order.

If you struggle to get your data to display in the order you want in ggplot, don't worry: you're not alone. It can be a challenge. In this post, I will show you two ways to sort your chart data using ggplot.

The Data

For this example, we will use (fictitious) data from a large survey study on the fruit consumption patterns of adult residents in a small town. One of the questions on the survey asked participants:

Which of the following fruits have you eaten in the last week?

Respondents could select all that apply from the following list:

  • apples
  • cherries
  • elderberries
  • dates
  • blueberries

Survey researchers ran some numbers and produced the following table, which shows the percent of adult residents who have eaten each fruit in the last week:

fruit percent
apples 27%
cherries 4%
elderberries 21%
dates 36%
blueberries 20%

Let's visualize the data using a lollipop chart in R:

### Set up
## load libraries
library(ggplot2)
library(dplyr)

### Plot data
fruit_data <- 
  data.frame(fruit = c("apples","cherries","elderberries",
                       "dates","blueberries"), 
             percent = c(0.27,0.04,0.21,0.36,0.2))

### custom ggplot theme
new_min_theme <- function(){
  theme_minimal() %+replace%							
    theme(axis.ticks = element_blank(), 
          axis.ticks.length=unit(0, "cm"),
          panel.grid.major.y=element_blank(),
          panel.grid.major.x=element_blank(),
          panel.grid.minor.x=element_blank(),
          axis.text.y = element_text(family = "Montserrat", 
                                     face = "plain", 
                                     colour = "#3e3e3e", 
                                     hjust = 1),
          axis.text.x = element_text(family = "Montserrat", 
                                     face = "plain", 
                                     colour = "#626262"))
}

### main plot
ggplot(data = fruit_data, aes(x = percent, y = fruit)) +
  geom_vline(xintercept = seq(0, 0.5, by = 0.1), 
             colour= "#BFBFBF", linewidth = 0.20) +
  geom_segment(aes(y = fruit, yend = fruit, x = 0, 
                   xend = percent), color = "#5b5b5b", 
               linewidth = 0.75) +					
  geom_point(color = "#7030A0", size = 5, alpha = 1) +
  scale_x_continuous(limits=c(0, 0.5), 
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n", 
                                     seq(0, 0.5, by = 0.1)*100, 
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()

Default Chart:

Lollipop chart showing the percentage of adult residents in a small town who have eaten the following fruits: apples: 27%; cherries: 4%; elderberries: 21%; dates: 36%; blueberries: 20%.

The default chart looks good. But look at the y-axis labels. Notice how the fruit names appear in descending (reverse alphabetical) order from top to bottom? What if you want to sort your y-axis labels in ascending order?

You could try ordering the rows of the data frame by the fruit variable using order() or arrange(), but that will not change your ggplot output. Why? Because the y-axis variable fruit is a character type. When you map a character variable to a discrete axis, ggplot ignores the row order and sorts the values alphabetically, placing the first value at the bottom of the y-axis. Read from top to bottom, the fruit names therefore appear in descending (reverse alphabetical) order by default.
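
Here is a small base R sketch of why reordering rows alone has no effect. It uses factor() as a stand-in for the way a discrete axis sorts character values; ggplot does its own sorting internally, so treat this as an analogy rather than the package's actual code path. The object names (fruit, fruit_desc, and so on) are illustrative only.

```r
# The five fruit names from the survey, in their original order
fruit <- c("apples", "cherries", "elderberries", "dates", "blueberries")

# Sorting the character vector changes the order of the values only
fruit_desc <- sort(fruit, decreasing = TRUE)

# Without explicit levels, factor() sorts the unique values alphabetically,
# so both versions end up with the same level order
levels_original <- levels(factor(fruit))
levels_sorted   <- levels(factor(fruit_desc))

print(levels_original)
print(identical(levels_original, levels_sorted))
```

Because the level order is the same either way, the fix is to set the levels explicitly, which is what the next section does.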

Ordering by Y-Axis Labels

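To display fruit names in ascending (alphabetical) order from top to bottom:

  1. First, arrange the rows of the fruit_data data frame by the fruit column in descending order using the arrange() function from the dplyr package.
  2. Next, convert the fruit variable into a factor, using the sorted row order as the order of the factor levels.
  3. Finally, plot the data. The first factor level is drawn at the bottom of the y-axis, so the last level ends up at the top.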

Note: you could also arrange the rows of the fruit_data data frame by fruit names in the fruit column in descending order using the order() function.

Here is one way you could do that:

fruit_data |>
  arrange(desc(fruit)) |>
  mutate(fruit = factor(fruit, levels = fruit)) |>
  ggplot(aes(x = percent, y = fruit)) +
  geom_vline(xintercept = seq(0, 0.5, by = 0.1), 
             colour= "#BFBFBF", 
             linewidth = 0.20) +
  geom_segment(aes(y = fruit, yend = fruit, x = 0, 
                   xend = percent), color = "#5b5b5b", 
               linewidth = 0.75) +					
  geom_point(color = "#7030A0", size = 5, alpha = 1) +
  scale_x_continuous(limits=c(0, 0.5), 
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n", 
                                     seq(0, 0.5, by = 0.1)*100, 
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()

Lollipop chart showing the percentage of adult residents in a small town who have eaten five fruits (shown in alphabetical order): apples: 27%; blueberries: 20%; cherries: 4%; dates: 36%; elderberries: 21%.
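
The note above mentions order() as an alternative to arrange(). A minimal base R sketch of that variant follows, assuming the same fruit_data columns as in this post; the object name fruit_by_name is made up for illustration.

```r
# Same survey data as in the post
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.27, 0.04, 0.21, 0.36, 0.20)
)

# Sort rows by fruit name in descending (reverse alphabetical) order
fruit_by_name <- fruit_data[order(fruit_data$fruit, decreasing = TRUE), ]

# Lock in the current row order as the order of the factor levels
fruit_by_name$fruit <- factor(fruit_by_name$fruit,
                              levels = fruit_by_name$fruit)

print(levels(fruit_by_name$fruit))
```

Passing fruit_by_name to the same ggplot() call should draw the same chart as the arrange() version, since only the level order matters to the axis.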

Pretty simple, right?

Well, what if you wanted to order chart data based on the percent of adults who reported eating each fruit?

Ordering by Data Values

Another common scenario involves sorting chart data based on values in a data set column. In this case, we might consider sorting our chart data by the percent of adults who reported eating each fruit (i.e., sorting by the percent column).

To sort chart data by the percent of adults who reported eating each fruit in ascending order (i.e., smallest to largest percent, reading from top to bottom):

  1. First, arrange the rows of the fruit_data data frame by the percent column in descending order using the arrange() function from the dplyr package.
  2. Next, convert the fruit variable into a factor, using the sorted row order as the order of the factor levels.
  3. Finally, plot the data.

Note: you could also arrange the rows of the fruit_data data frame by the percent of adults who reported eating each fruit using the order() function and then convert the fruit column into a factor, preserving the order of the values and labels.

Here's how:

fruit_data |>
  arrange(desc(percent)) |>
  mutate(fruit = factor(fruit, levels = fruit)) |>
  ggplot(aes(x = percent, y = fruit)) +
  geom_vline(xintercept = seq(0, 0.5, by = 0.1), 
             colour= "#BFBFBF", 
             linewidth = 0.20) +
  geom_segment(aes(y = fruit, yend = fruit, x = 0, 
                   xend = percent), color = "#5b5b5b", 
               linewidth = 0.75) +					
  geom_point(color = "#7030A0", size = 5, alpha = 1) +
  scale_x_continuous(limits=c(0, 0.5), 
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n", 
                                     seq(0, 0.5, by = 0.1)*100, 
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()

Lollipop chart showing the percentage of adult residents in a small town who have eaten five fruits (chart data are ordered from smallest to largest percent): cherries: 4%; blueberries: 20%; elderberries: 21%; apples: 27%; dates: 36%.
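
The same sort can be sketched in base R with order(), as the note above suggests. This assumes the fruit_data columns used in this post; fruit_by_percent is a hypothetical name.

```r
# Same survey data as in the post
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.27, 0.04, 0.21, 0.36, 0.20)
)

# Sort rows from largest to smallest percent
fruit_by_percent <- fruit_data[order(fruit_data$percent, decreasing = TRUE), ]

# Use the sorted row order as the factor level order; the first level
# (largest percent) is drawn at the bottom of the y-axis
fruit_by_percent$fruit <- factor(fruit_by_percent$fruit,
                                 levels = fruit_by_percent$fruit)

print(levels(fruit_by_percent$fruit))
```

Switching to decreasing = FALSE reverses the level order, which puts the largest percent at the top of the chart instead.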

Now, if you wanted to sort chart data by the percent of adults who reported eating each fruit in descending order (i.e., largest to smallest percent, reading from top to bottom):

  1. First, arrange the rows of the fruit_data data frame by the percent column in ascending order using the arrange() function from the dplyr package.
  2. Next, convert the fruit variable into a factor, using the sorted row order as the order of the factor levels.
  3. Finally, plot the data.

Again, you could also arrange the rows of the fruit_data data frame by the percent of adults who reported eating each fruit using the order() function. Once the data are ordered, convert the fruit column into a factor, preserving the order of the values and labels.

Here is one way to accomplish this task:

fruit_data |>
  arrange(percent) |>
  mutate(fruit = factor(fruit, levels = fruit)) |>
  ggplot(aes(x = percent, y = fruit)) +
  geom_vline(xintercept = seq(0, 0.5, by = 0.1), 
             colour= "#BFBFBF", 
             linewidth = 0.20) +
  geom_segment(aes(y = fruit, yend = fruit, x = 0, 
                   xend = percent), color = "#5b5b5b", 
               linewidth = 0.75) +					
  geom_point(color = "#7030A0", size = 5, alpha = 1) +
  scale_x_continuous(limits=c(0, 0.5), 
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n", 
                                     seq(0, 0.5, by = 0.1)*100, 
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()

Lollipop chart showing the percentage of adult residents in a small town who have eaten five fruits (chart data are ordered from largest to smallest percent): dates: 36%; apples: 27%; elderberries: 21%; blueberries: 20%; cherries: 4%.
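
As an alternative to sorting rows first, base R's reorder() can set the level order directly from another column. This is a different technique from the arrange() plus factor() approach used in this post, shown here only as a minimal sketch; the object names are illustrative.

```r
# Same survey data as in the post
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.27, 0.04, 0.21, 0.36, 0.20)
)

# reorder() returns a factor whose levels are sorted by the second
# argument in increasing order: smallest percent first
largest_on_top <- reorder(fruit_data$fruit, fruit_data$percent)

# Negating the values flips the level order: largest percent first
smallest_on_top <- reorder(fruit_data$fruit, -fruit_data$percent)

print(levels(largest_on_top))
print(levels(smallest_on_top))
```

In a plot, this could be written inside the aesthetic mapping, for example aes(x = percent, y = reorder(fruit, percent)), which skips the separate arrange() and mutate() steps. Since the first level is drawn at the bottom of the y-axis, reorder(fruit, percent) places the largest percent at the top.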

How do you order your data using ggplot?

Let me know in the comments.

Want to learn more about why you should order your data with intention? Read my three-part series of posts called Order Your Data with Intention.
