Analytics Made Accessible

Work Smarter, Not Harder.

GIF showing a tile grid map of Africa with different tiles (countries) highlighted in purple.

I absolutely love using R for data visualization. Why? Because R makes it easy to create multiple plots with the same, consistent formatting with little effort. In fact, you can wrap the entire plotting process in a nice little function that you can use whenever you want. Trust me, once you give it a try, you'll never go back!

 

What is a function?

A function is a block of code you can reuse to perform a specific task. That's a fancy way of saying if you find yourself repeating the same series of steps over…

and over

and over

and over again…

Consider writing a function.

Functions can save you time and a whole lot of manual copy-paste work.

 

Writing Functions in R

Functions take inputs, execute a task, and return an output. The "body" of the function contains the statements and expressions that perform specific actions or calculations.

functionName = function(argument1, argument2){
    # place statements that perform 
    # specific actions or calculations here
}

where

  • functionName is the name of your function.

  • argument1 is the first argument, and argument2 is the second argument.

  • The commented text is where you would enter statements to be executed.

  

A Simple Example

Here is a simple example. Say you want to create a function called wordJoin that combines two words into one.

You could write:

wordJoin = function(word1, word2) {
    paste0(word1, word2)
}

Note: word1 is the first argument, and word2 is the second argument. Also, yes, paste0() is a built-in function.

Enter the function into your console and call the function to join the two words mouse and trap together:

wordJoin("mouse", "trap")
[1] "mousetrap"
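
One detail worth pointing out: wordJoin has no return() call. R returns the value of the last expression evaluated in the function body, so the two versions in the sketch below should behave the same. (wordJoinExplicit is a hypothetical name used only for this comparison.)

```r
# Implicit return: the value of the last expression is returned.
wordJoin = function(word1, word2) {
    paste0(word1, word2)
}

# Explicit return: same result, spelled out with return().
# (wordJoinExplicit is a hypothetical name used only for comparison.)
wordJoinExplicit = function(word1, word2) {
    joined = paste0(word1, word2)
    return(joined)
}

# Compare the two versions on the same inputs
identical(wordJoin("mouse", "trap"), wordJoinExplicit("mouse", "trap"))
```

Either style works. An explicit return() can make longer functions easier to read.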

Pretty easy, right? Well, you can use the same logic to write functions that create stylized charts.

 

Creating a Chart in R

ggplot2 is my go-to package for creating charts in R. The package offers A LOT of flexibility and works well with a laundry list of companion packages to do everything from customizing text to building custom color palettes to adding interactivity.

Say you work as a dashboard developer at an organization whose mission is to promote inclusive growth and sustainable development in Africa. Your boss asks you to create a series of static country profiles. Each profile should include a map of Africa with the current country highlighted and each country's two-letter country code (abbreviation) visible.

You set off to work and decide to produce a tile grid map.

## load the plotting package
library(ggplot2)
## set plot dimensions
dev.new(width = 5.25, height = 4.75)
## draw one gray square per country and label it with its code
ggplot(tileGrid, aes(x = x, y = y)) +
    geom_point(aes(color = "#B3B3B3"), shape = "square", size = 13) +
    scale_color_identity() +
    geom_text(aes(label = code, color = "#444444"), vjust = 0.5, hjust = 0.5,
              fontface = "plain", size = 4.5, family = "Helvetica Neue") +
    scale_x_continuous(limits = c(10,20)) +
    scale_y_reverse() +
    theme_void()

Tile Grid Map of Africa.

Ok, it looks great. But your boss wants EACH country to have its own profile where its corresponding tile on the map is highlighted.

So, you set off to add that feature, starting with the country of Ghana, which has the two-letter country code "GH":
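
The code for this step is not shown here, so what follows is only a minimal sketch of one way it could look. It assumes the tile colors are stored in a tileColor column and hard-codes "GH" as the country to highlight. The small tileGrid data frame is a hypothetical stand-in with made-up coordinates, and the font family is left at the default so the sketch does not depend on Helvetica Neue being installed.

```r
library(ggplot2)

# Hypothetical stand-in for the real tileGrid data (made-up coordinates).
tileGrid = data.frame(
    code = c("DZ", "GH", "NG", "KE"),
    x    = c(13, 12, 14, 18),
    y    = c(2, 5, 5, 6)
)

# Highlight Ghana: purple tile with white text, gray tile otherwise.
tileGrid$tileColor = ifelse(tileGrid$code == "GH", "#702963", "#B3B3B3")
tileGrid$textColor = ifelse(tileGrid$tileColor == "#702963",
                            "#FFFFFF", "#444444")

# Same layers as the plain map, but colors now come from the data.
ghanaMap = ggplot(tileGrid, aes(x = x, y = y)) +
    geom_point(aes(color = tileColor), shape = "square", size = 13) +
    geom_text(aes(label = code, color = textColor), size = 4.5) +
    scale_color_identity() +
    scale_y_reverse() +
    theme_void()

ghanaMap
```

Highlighting a different country means editing the "GH" in the tileColor line and re-running the whole block.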

Tile Grid Map of Africa, with the tile representing Ghana highlighted in purple.

Awesome, 50+ more countries to go.

Yes, you read that right. You would have to update the tileColor variable about FIFTY more times.

Or you could put in a little extra work and create a function.

 

Creating a Chart Function

The key to creating useful functions is to know:

  1. What the function should return.

  2. What the function needs to do its work.

  3. How the function will work. (In other words, have an idea of how to get from input to output. By the time I need/want to use a function, I typically have working code I can fiddle with in a function environment.)

So, let's answer those questions using our map example:

  1. I want the function to return a tile grid map, where each tile represents a different country, AND each country's two-letter country code is clearly visible.

  2. For the function to work, I need to feed it a data source that contains:

    • A column with the two-letter country code and

    • (row and column) Coordinates to plot the tile grid.

  3. I want the function to (loosely) follow the logical flow of my previously written R code.

But I also need the function to return a tile grid map where:

  • A specified country's tile is highlighted in a different color from the rest.

And because I cannot remember the two-letter abbreviation code for 50+ countries, I want to add a chart title that displays the full name of the highlighted country. So, we also need our data source to contain:

  • A column with the full name of each country.
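
Put together, those requirements describe a data source with four columns. Here is a hypothetical excerpt, using the column names that appear later in this post (code, name, x, and y). The coordinates are made up for illustration and are not the real tile grid values.

```r
# Hypothetical excerpt of a tile grid data source (made-up coordinates).
tileGridExample = data.frame(
    code = c("DZ", "GH", "KE"),               # two-letter country code
    name = c("Algeria", "Ghana", "Kenya"),    # full country name
    x    = c(13, 12, 18),                     # column position of the tile
    y    = c(2, 5, 6)                         # row position of the tile
)

# Confirm that every column the function will rely on is present.
requiredCols = c("code", "name", "x", "y")
all(requiredCols %in% names(tileGridExample))
```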

Using the parameters outlined above, I created a function called colorTileGrid that has 6 arguments:

  1. datasource is the name of the main data source, passed as a character string (in this case, it will be the name of a data frame).

  2. abbrev is the 2-letter abbreviation code for the country that will be highlighted.

  3. abbrevCol is the column in the main data source that contains the 2-letter abbreviation code of each country.

  4. countryCol is the column in the main data source that contains the full name of each country.

  5. column is the column in the main data source that contains x-axis coordinates for plotting each tile.

  6. row is the column in the main data source that contains y-axis coordinates for plotting each tile.

library(ggplot2)
library(ggtext)  # provides element_markdown() for the styled title

colorTileGrid = function(datasource, abbrev, abbrevCol, countryCol, column, row){
    ## Set Up
    # get the dataset by name (datasource is a character string)
    dataset = get(datasource)
    # set colors (tile color and text color)
    dataset[["tileColor"]] = ifelse(dataset[[abbrevCol]] == abbrev,
                              "#702963", "#B3B3B3")
    dataset[["textColor"]] = ifelse(dataset[["tileColor"]] ==
                              "#702963", "#FFFFFF", "#444444")
    # flag current country (for chart title); ** renders as bold
    currentCountry = dataset[[countryCol]][dataset[[abbrevCol]] == abbrev]
    currentCountry = paste0("**", currentCountry, "**")
    # build the title as a single string with no embedded line breaks
    chartTitle = paste0("<span style = 'font-size:14pt; ",
        "font-family: Helvetica Neue;color:#3e3e3e;'>Current Country: ",
        "</span><span style = 'font-size:14pt; font-weight:bold; ",
        "font-family: Helvetica Neue;color:#702963;'>",
        currentCountry, "</span>")
    ## Create the chart
    tileGGplot = ggplot(dataset, aes(x = .data[[column]],
                        y = .data[[row]])) +
        geom_point(aes(color = .data[["tileColor"]]), shape = "square",
                   size = 13) +
        scale_color_identity() +
        labs(title = chartTitle) +
        geom_text(aes(label = .data[[abbrevCol]], color =
                  .data[["textColor"]]), vjust = 0.5, hjust = 0.5,
                  fontface = "plain", size = 4.5,
                  family = "Helvetica Neue") +
        scale_y_reverse() +
        theme_void() +
        theme(plot.title = element_markdown(vjust = 1, hjust = 0,
              padding = unit(c(t = 0, r = 0, b = 0.5, l = 0), "cm")),
              plot.margin = margin(t = 0.5, r = 1, b = 2, l = 1, "cm"))
    return(tileGGplot)
}

But we still have the same problem. Even though we have created a function that speeds up the process of generating each profile, we still would have to update the abbrev argument 50+ times to create a tile map for each country in our dataset.

I don't know about you, but that still seems like A LOT of work.

 

Using a Function to Create MANY Charts

Thankfully, we can use a function like pmap() from the purrr package to quickly generate all 50+ of our tile maps.

The easiest way to do this is to create a new data frame where each column represents a function argument, and each row contains the information required to generate a plot. (I am calling my new data frame tileGridMaps.)

tileGridMaps = data.frame(dataset = "tileGrid", cc = tileGrid$code, 
ccCol = "code", ctryCol = "name", col = "x", row = "y")

Let's take a closer look. Print the first row of this new data frame:

tileGridMaps[1,]
   dataset cc ccCol ctryCol col row
1 tileGrid DZ  code    name   x   y

The first row will produce a tile grid map highlighting the country with the 2-letter country code "DZ" (Algeria). How do I know this? Let’s break it down:

  • values in the dataset column will be passed to the datasource argument.

  • values in the cc (current country 2-letter country code) column will be passed to the abbrev argument.

  • values in the ccCol (country 2-letter code) column will be passed to the abbrevCol argument.

  • values in the ctryCol (full name for each country) column will be passed to the countryCol argument.

  • values in the col and row (coordinates to plot each tile) columns will be passed to the column and row arguments.
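
To see the row-by-row mechanics in isolation, here is a toy sketch. pmap() treats the data frame as a list of columns, steps through them in parallel, and calls the function once per row, matching column names to argument names. (argTable and describePlot are hypothetical names, and describePlot only builds a text label instead of a chart.)

```r
library(purrr)

# Toy argument table: one row per call, one column per argument.
argTable = data.frame(cc = c("DZ", "GH"), ctryCol = c("name", "name"))

# Hypothetical stand-in for a plotting function; it only builds a label.
describePlot = function(cc, ctryCol) {
    paste0("highlight ", cc, " using column ", ctryCol)
}

# pmap() calls describePlot once per row and returns a list of results.
labels = pmap(argTable, describePlot)
labels[[2]]
```

In this sketch the column names match the argument names, so the function can be passed to pmap() directly. When the names differ, a small wrapper function can map each column to the right argument.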

With that in mind, let’s generate the maps!

# load purrr for pmap()
library(purrr)
# generate the maps (one call to colorTileGrid per row of tileGridMaps)
dev.new(width = 5.60, height = 6.25)
pmap(tileGridMaps, function(dataset, cc, ccCol, ctryCol, col, row)
    colorTileGrid(datasource = dataset, abbrev = cc, abbrevCol = ccCol,
                  countryCol = ctryCol, column = col, row = row))

And there you have it!

GIF showing a tile grid map of Africa with different tiles (countries) highlighted in purple.

Now, here is where I make a few mandatory comments about error handling and other checks. In practice, it is important to check the inputs of your functions. For example, you may want to stop the function from executing if an object you referenced does not exist in the current environment or is of the wrong class (e.g., data frame instead of a character vector). 
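
As a rough illustration of what such checks might look like, here is a minimal sketch in base R. checkTileInputs is a hypothetical helper (it is not part of colorTileGrid as written above), and the specific checks and error messages are assumptions about what would be useful here.

```r
# Hypothetical helper that validates inputs before any plotting happens.
checkTileInputs = function(datasource, abbrev, abbrevCol) {
    # datasource must be the name of an existing object, given as a string
    if (!is.character(datasource) || length(datasource) != 1) {
        stop("datasource must be a single character string naming a data frame")
    }
    if (!exists(datasource)) {
        stop("object '", datasource, "' does not exist")
    }
    dataset = get(datasource)
    if (!is.data.frame(dataset)) {
        stop("'", datasource, "' is not a data frame")
    }
    # the code column must exist and contain the requested country code
    if (!abbrevCol %in% names(dataset)) {
        stop("column '", abbrevCol, "' not found in '", datasource, "'")
    }
    if (!abbrev %in% dataset[[abbrevCol]]) {
        stop("country code '", abbrev, "' not found in column '", abbrevCol, "'")
    }
    invisible(TRUE)
}

# Example with a small made-up data frame
demoGrid = data.frame(code = c("DZ", "GH"), name = c("Algeria", "Ghana"))
checkTileInputs("demoGrid", "GH", "code")   # valid inputs: no error

# An unknown country code stops with an error; capture the message instead
result = tryCatch(checkTileInputs("demoGrid", "ZZ", "code"),
                  error = function(e) conditionMessage(e))
result
```

A helper like this could be called at the top of a plotting function so that bad inputs fail early with a readable message.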


How do you create functions? What does your process look like? Let me know your thoughts in the comments.

  

** This post uses Maarten Lambrechts' world tile grid coordinates, available for download from their GitHub. **