Analytics Made Accessible

Order Your Data with Intention – ggplot edition

If you struggle to get your data to display in the order you want in ggplot, don't worry; you're not alone. It can be a challenge. In this post, I will show you two ways to sort your chart data using ggplot: by the axis labels and by the data values.

The Data

For this example, we will use (fictitious) data from a large survey study on the fruit consumption patterns of adult residents in a small town. One of the questions on the survey asked participants:

Which of the following fruits have you eaten in the last week?

Respondents could select all that apply from the following list:

  • apples

  • cherries

  • elderberries

  • dates

  • blueberries

Survey researchers ran some numbers and produced the following table, which shows the percent of adult residents who have eaten each fruit in the last week:

[Table omitted in this copy: percent of adult residents who ate each fruit in the last week. See the original post.]

Let's visualize the data using a lollipop chart in R:

[Code listing omitted in this copy: the ggplot code for the default lollipop chart. See the original post.]
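As a rough stand-in for that listing, here is a minimal sketch of a default lollipop chart. The percent values are placeholders invented for illustration (they are not the figures from the survey table), theme_minimal() is used in place of any custom theme, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the survey table. The percent values are
# placeholders for illustration only, not the numbers from the post.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.45, 0.30, 0.05, 0.10, 0.25)
)

# Lollipop chart: a thin segment from 0 to each value, capped with a point.
# fruit is still a character column here, so ggplot picks the label order.
default_chart <- ggplot(fruit_data, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit),
               colour = "#5b5b5b", linewidth = 0.75) +
  geom_point(colour = "#7030A0", size = 5) +
  scale_x_continuous(limits = c(0, 0.5),
                     labels = function(x) paste0(x * 100, "%")) +
  labs(x = NULL, y = NULL) +
  theme_minimal()

default_chart
```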

Default Chart:

[Figure omitted in this copy: default lollipop chart with fruit on the y-axis and percent on the x-axis. See the original post.]

The default chart looks good. But look at the y-axis labels. Notice how, read from top to bottom, the fruit names appear in descending (reverse alphabetical) order? What if you want them in ascending order instead?

You could try ordering the rows of the data frame by the fruit variable using order() or arrange(), but it will not change your ggplot output. Why? Because the y-axis variable fruit is a character type. When you supply character data to ggplot, it ignores the row order: it sorts the values alphabetically and places the first value at the bottom of the y-axis. Read from top to bottom, the displayed labels (in this case, fruit names) therefore appear in descending (reverse alphabetical) order by default.
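To see why row order alone has no effect, it can help to look at how factor levels are assigned. The sketch below uses base R's factor() as a proxy for what ggplot does with character data; the variable names are made up for illustration.

```r
# A character vector whose rows are deliberately sorted Z to A.
fruit <- c("elderberries", "dates", "cherries", "blueberries", "apples")

# Without `levels`, factor() sorts the unique values alphabetically,
# so the incoming row order is discarded.
default_levels <- levels(factor(fruit))

# Passing `levels` explicitly keeps the order we supply.
kept_levels <- levels(factor(fruit, levels = fruit))

print(default_levels)
print(kept_levels)
```

The first result is alphabetical no matter how the rows were sorted; only the second preserves the row order, which is why the approaches in this post convert fruit to a factor after sorting.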

Ordering by Y-Axis Labels

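To display fruit names in ascending order (A to Z, from top to bottom), sort the rows in the opposite direction, because ggplot draws the first factor level at the bottom of the y-axis:

  1. First, arrange the rows of the fruit_data data frame by the fruit column in descending order using the arrange() function from the dplyr package.
  2. Next, convert the fruit variable into a factor whose levels follow the new row order, for example with factor(fruit, levels = fruit). Calling factor(fruit) on its own would sort the levels alphabetically again.
  3. Finally, plot the data.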

Note: you could also sort the rows of the fruit_data data frame by the fruit column in descending order using base R's order() function.

Here is one way you could do that:

[Content omitted in this copy: code for ordering the chart by fruit name. See the original post.]
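The three steps above might look something like the following. This is a sketch rather than the original listing: the percent values are invented placeholders, the name fruit_by_name is hypothetical, and the chart styling is stripped down to the essentials.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: percent values are placeholders for illustration.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.45, 0.30, 0.05, 0.10, 0.25)
)

fruit_by_name <- fruit_data |>
  arrange(desc(fruit)) |>                         # step 1: rows Z to A
  mutate(fruit = factor(fruit, levels = fruit))   # step 2: levels follow row order

# Step 3: the first level (elderberries) is drawn at the bottom,
# so the labels read A to Z from top to bottom.
ggplot(fruit_by_name, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit)) +
  geom_point(size = 5)
```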

Pretty simple, right?

Well, what if you wanted to order chart data based on the percent of adults who reported eating each fruit?

Ordering by Data Values

Another common scenario involves sorting chart data based on values in a data set column. In this case, we might consider sorting our chart data by the percent of adults who reported eating each fruit (i.e., sorting by the percent column).

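To sort chart data by the percent of adults who reported eating each fruit in ascending order (i.e., smallest to largest percent, from top to bottom):

  1. First, arrange the rows of the fruit_data data frame by the percent column in descending order using the arrange() function from the dplyr package. The largest value becomes the first factor level, which ggplot draws at the bottom of the y-axis.
  2. Next, convert the fruit variable into a factor whose levels follow the new row order, for example with factor(fruit, levels = fruit).
  3. Finally, plot the data.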

Note: you could also sort the rows of the fruit_data data frame by the percent column using base R's order() function and then convert the fruit column into a factor, so that the factor levels preserve the row order.

Here's how:

[Content omitted in this copy: code for ordering the chart by percent, smallest to largest. See the original post.]
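A sketch of the smallest-to-largest case, under the same assumptions as before (invented placeholder percents, a hypothetical name fruit_by_percent, and minimal styling), could be:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: percent values are placeholders for illustration.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.45, 0.30, 0.05, 0.10, 0.25)
)

fruit_by_percent <- fruit_data |>
  arrange(desc(percent)) |>                       # largest percent in the first row
  mutate(fruit = factor(fruit, levels = fruit))   # levels follow row order

# The largest value is the first level, so it sits at the bottom of the
# y-axis and the smallest value ends up at the top.
ggplot(fruit_by_percent, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit)) +
  geom_point(size = 5)
```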

Now, if you wanted to sort chart data by the percent of adults who reported eating each fruit in descending order (i.e., largest to smallest percent, from top to bottom):

  1. First, arrange the rows of the fruit_data data frame by the percent column in ascending order using the arrange() function from the dplyr package.
  2. Next, convert the fruit variable into a factor whose levels follow the new row order, for example with factor(fruit, levels = fruit).
  3. Finally, plot the data.

Again, you could also sort the rows of the fruit_data data frame by the percent column using base R's order() function. Once the rows are ordered, convert the fruit column into a factor so that the factor levels preserve the row order.

Here is one way to accomplish this task:

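library(dplyr)
library(ggplot2)

# Assumes fruit_data (columns: fruit, percent) and the custom theme
# new_min_theme() are already defined elsewhere.
fruit_data |>
  arrange(percent) |>                               # smallest percent first
  mutate(fruit = factor(fruit, levels = fruit)) |>  # lock row order into factor levels
  ggplot(aes(x = percent, y = fruit)) +
  # light vertical gridlines every 10 percentage points
  geom_vline(xintercept = seq(0, 0.5, by = 0.1),
             colour = "#BFBFBF",
             linewidth = 0.20) +
  # lollipop stems running from 0 to each value
  geom_segment(aes(y = fruit, yend = fruit, x = 0,
                   xend = percent), colour = "#5b5b5b",
               linewidth = 0.75) +
  geom_point(colour = "#7030A0", size = 5, alpha = 1) +
  scale_x_continuous(limits = c(0, 0.5),
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n",
                                     seq(0, 0.5, by = 0.1) * 100,
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()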
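For readers who prefer to avoid dplyr, the order() alternative mentioned above can be sketched in base R. The percent values are again invented placeholders and ordered_data is a hypothetical name; the resulting data frame can be passed to the same ggplot code.

```r
# Hypothetical data: percent values are placeholders for illustration.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.45, 0.30, 0.05, 0.10, 0.25)
)

# order() returns the row positions that sort percent from smallest to
# largest; use order(fruit_data$percent, decreasing = TRUE) for the reverse.
ordered_data <- fruit_data[order(fruit_data$percent), ]

# Lock the sorted row order into the factor levels.
ordered_data$fruit <- factor(ordered_data$fruit, levels = ordered_data$fruit)

print(levels(ordered_data$fruit))
```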

How do you order your data using ggplot?

Let me know in the comments.

Want to learn more about why you should order your data with intention? Read my three-part series of posts called Order Your Data with Intention:

[Links to the three-part series are available in the original post.]