Order Your Data with Intention – ggplot edition
If you struggle to get your data to display properly in ggplot, don't worry – you're not alone. It can be a challenge. In this post, I will show you two ways to sort your chart data using ggplot.
The Data
For this example, we will use (fictitious) data from a large survey study on the fruit consumption patterns of adult residents in a small town. One of the questions on the survey asked participants:
Which of the following fruits have you eaten in the last week?
Respondents could select all that apply from the following list:
apples
cherries
elderberries
dates
blueberries
Survey researchers ran some numbers and produced the following table, which shows the percent of adult residents who have eaten each fruit in the last week:
Let's visualize the data using a lollipop chart in R:
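A minimal sketch of such a chart, assuming ggplot2 (3.4.0 or later) is installed. The percentages below are hypothetical placeholders stored as proportions, not the survey results, and theme_minimal() stands in for any custom theme.

```r
library(ggplot2)

# Hypothetical placeholder values (proportions, so 0.42 means 42%);
# these are NOT the survey results.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.25, 0.42, 0.05, 0.11, 0.33),
  stringsAsFactors = FALSE
)

# Lollipop chart: a stem from 0 to the value, capped with a point.
default_chart <- ggplot(fruit_data, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit),
               color = "#5b5b5b", linewidth = 0.75) +
  geom_point(color = "#7030A0", size = 5) +
  scale_x_continuous(limits = c(0, 0.5),
                     labels = function(x) paste0(x * 100, "%")) +
  labs(x = NULL, y = NULL) +
  theme_minimal()  # stand-in theme, an assumption for this sketch

default_chart
```

Because fruit is left as a character column here, ggplot2 chooses the label order on its own.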
Default Chart:
The default chart looks good. But look at the y-axis labels. Notice how the fruit names run in descending alphabetical order from top to bottom? What if you want them in ascending order instead?
You could try ordering the rows of the data frame by the fruit variable using order() or arrange(), but it will not impact your ggplot output. Why? Because the y-axis variable fruit is a character type. When you supply character data to ggplot, it ignores row order and sorts the unique values alphabetically, placing the first value at the bottom of the y-axis. As a result, the displayed labels (in this case, fruit names) appear, by default, in descending alphabetical order from top to bottom.
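The point about row order can be checked without drawing anything. The sketch below assumes dplyr is installed and uses hypothetical placeholder values for fruit_data. It relies on factor() with no explicit levels, which sorts the unique values the same way ggplot2 does for a character column.

```r
library(dplyr)

# Hypothetical placeholder values, not the survey results.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.25, 0.42, 0.05, 0.11, 0.33),
  stringsAsFactors = FALSE
)

# Reorder the rows by fruit name, Z to A.
sorted_rows <- fruit_data |> arrange(desc(fruit))

# The row order has changed...
row_order <- sorted_rows$fruit

# ...but levels derived from a character column are still alphabetical,
# so the axis order would be unchanged.
implicit_levels <- levels(factor(sorted_rows$fruit))

print(row_order)
print(implicit_levels)
```

The row order only starts to matter once it is stored explicitly as factor levels.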
Ordering by Y-Axis Labels
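To display fruit names in ascending order (A to Z, from top to bottom):
- First, arrange the rows of the fruit_data data frame by the fruit column in descending order using the arrange() function from the dplyr package.
- Next, convert the fruit variable into a factor whose levels follow the current row order, for example with factor(fruit, levels = fruit). Without the levels argument, factor() sorts the levels alphabetically again.
- Finally, plot the data.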
Note: you could also arrange the rows of the fruit_data data frame by the fruit column in descending order using the order() function.
Here is one way you could do that:
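A minimal sketch of the three steps, assuming dplyr and ggplot2 are installed. The values in fruit_data are hypothetical placeholders, theme_minimal() stands in for any custom theme, and the base R order() variant is included only for comparison.

```r
library(dplyr)
library(ggplot2)

# Hypothetical placeholder values, not the survey results.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.25, 0.42, 0.05, 0.11, 0.33),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: sort rows Z to A, then store that order as factor levels.
by_label <- fruit_data |>
  arrange(desc(fruit)) |>
  mutate(fruit = factor(fruit, levels = fruit))

# Base R equivalent using order().
by_label_base <- fruit_data[order(fruit_data$fruit, decreasing = TRUE), ]
by_label_base$fruit <- factor(by_label_base$fruit,
                              levels = by_label_base$fruit)

# Step 3: plot. The first level is drawn at the bottom of the y-axis,
# so the labels read A to Z from top to bottom.
label_chart <- ggplot(by_label, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit),
               color = "#5b5b5b", linewidth = 0.75) +
  geom_point(color = "#7030A0", size = 5) +
  labs(x = NULL, y = NULL) +
  theme_minimal()  # stand-in theme, an assumption for this sketch

label_chart
```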
Pretty simple, right?
Well, what if you wanted to order chart data based on the percent of adults who reported eating each fruit?
Ordering by Data Values
Another common scenario involves sorting chart data based on values in a data set column. In this case, we might consider sorting our chart data by the percent of adults who reported eating each fruit (i.e., sorting by the percent column).
To sort chart data by the percent of adults who reported eating each fruit in ascending order (i.e., smallest to largest percent, from top to bottom):
- First, arrange the rows of the fruit_data data frame by the percent column in descending order using the arrange() function from the dplyr package.
- Next, convert the fruit variable into a factor whose levels follow the current row order, for example with factor(fruit, levels = fruit).
- Finally, plot the data.
Note: you could also arrange the rows of the fruit_data data frame by the percent of adults who reported eating each fruit using the order() function and then convert the fruit column into a factor, preserving the order of the values and labels.
Here's how:
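A minimal sketch of these steps, assuming dplyr and ggplot2 are installed. The values in fruit_data are hypothetical placeholders rather than the survey results, and theme_minimal() stands in for any custom theme.

```r
library(dplyr)
library(ggplot2)

# Hypothetical placeholder values, not the survey results.
fruit_data <- data.frame(
  fruit   = c("apples", "cherries", "elderberries", "dates", "blueberries"),
  percent = c(0.25, 0.42, 0.05, 0.11, 0.33),
  stringsAsFactors = FALSE
)

# Sort rows from largest to smallest percent, then store that order
# as the factor levels so ggplot2 keeps it.
by_percent <- fruit_data |>
  arrange(desc(percent)) |>
  mutate(fruit = factor(fruit, levels = fruit))

# The largest value is the first level, so it sits at the bottom of the
# y-axis; reading top to bottom, the values run smallest to largest.
percent_chart <- ggplot(by_percent, aes(x = percent, y = fruit)) +
  geom_segment(aes(xend = 0, yend = fruit),
               color = "#5b5b5b", linewidth = 0.75) +
  geom_point(color = "#7030A0", size = 5) +
  scale_x_continuous(limits = c(0, 0.5),
                     labels = function(x) paste0(x * 100, "%")) +
  labs(x = NULL, y = NULL) +
  theme_minimal()  # stand-in theme, an assumption for this sketch

percent_chart
```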
Now, if you wanted to sort chart data by the percent of adults who reported eating each fruit in descending order (i.e., largest to smallest percent, from top to bottom):
- First, arrange the rows of the fruit_data data frame by the percent column in ascending order using the arrange() function from the dplyr package.
- Next, convert the fruit variable into a factor whose levels follow the current row order, for example with factor(fruit, levels = fruit).
- Finally, plot the data.
Again, you could also arrange the rows of the fruit_data data frame by the percent of adults who reported eating each fruit using the order() function. Once the data are ordered, convert the fruit column into a factor, preserving the order of the values and labels.
Here is one way to accomplish this task:
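library(dplyr)
library(ggplot2)

# new_min_theme() is a custom theme (not part of ggplot2) and must be
# defined before this code runs.
fruit_data |>
  # Ascending rows: the smallest percent becomes the first factor level,
  # which ggplot draws at the bottom of the y-axis
  arrange(percent) |>
  # Store the current row order as the factor levels
  mutate(fruit = factor(fruit, levels = fruit)) |>
  ggplot(aes(x = percent, y = fruit)) +
  # Light vertical gridlines every 10 percentage points
  geom_vline(xintercept = seq(0, 0.5, by = 0.1),
             color = "#BFBFBF",
             linewidth = 0.20) +
  # Lollipop stems
  geom_segment(aes(y = fruit, yend = fruit, x = 0, xend = percent),
               color = "#5b5b5b",
               linewidth = 0.75) +
  # Lollipop heads
  geom_point(color = "#7030A0", size = 5, alpha = 1) +
  # Show proportions as percentages
  scale_x_continuous(limits = c(0, 0.5),
                     breaks = seq(0, 0.5, by = 0.1),
                     labels = paste0("\n",
                                     seq(0, 0.5, by = 0.1) * 100,
                                     "%")) +
  labs(x = "", y = "") +
  new_min_theme()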
How do you order your data using ggplot?
Let me know in the comments.
Want to learn more about why you should order your data with intention? Read my three-part series of posts called Order Your Data with Intention: