List Files in a Directory in R
Sometimes you want to know what files and folders are within a specific directory. In R, you can print the names of the files and folders in a given directory using the list.files() function. The list.files() function has two main arguments:
path: the path to a folder on your computer, specified as a character vector. (The default is the current working directory.)
pattern: an optional regular expression (regex). Only file names that match the regular expression will be returned when the pattern argument is specified.
Let's look at several examples.
Download the corresponding R syntax and folder for this post. Make sure to close your current R session (if you have one open), then open the R syntax file from the folder to start a new session.
List all files
Run the setup syntax and print the current directory's contents (the "List of Files in Directory" folder) by entering list.files() into your console. The folder contains four data files (2 CSV files and 2 XLSX files) and one R syntax file.
#### List Contents of Current Directory to Console ####
### learn more about list.files()
?list.files()
### list folder contents
list.files()
[1] "Applicants_Batch_1.csv"
[2] "Applicants_Batch_2.csv"
[3] "Applicants_Batch_3.xlsx"
[4] "Applicants_Batch_4.xlsx"
[5] "List Files in Directory.R"
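The example above relies on the default path (the current working directory). As a minimal sketch of the path argument, the code below builds a throwaway folder inside R's temporary directory and lists it explicitly. The folder name list_files_demo and the two empty files are made up for illustration; full.names is an additional list.files() argument not covered elsewhere in this post.

```r
# Build a throwaway folder so the example does not depend on your own files
# ("list_files_demo" is a hypothetical name used only for this sketch)
demo_dir <- file.path(tempdir(), "list_files_demo")
unlink(demo_dir, recursive = TRUE)   # start from a clean folder
dir.create(demo_dir)

# Create two empty files named after files from this post
invisible(file.create(file.path(
  demo_dir,
  c("Applicants_Batch_1.csv", "Applicants_Batch_3.xlsx")
)))

# Point list.files() at the folder with the path argument
demo_files <- list.files(path = demo_dir)
print(demo_files)

# full.names = TRUE returns each name with the folder path prepended,
# which is handy when you want to pass the result straight to a reader function
demo_paths <- list.files(path = demo_dir, full.names = TRUE)
print(demo_paths)
```

Because the folder lives under tempdir(), it is discarded when the R session ends.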
List files with a .csv extension
Suppose, however, you only want to print the files with a .csv extension. This is where you need to specify the pattern argument. The double backslash escapes the dot so that it matches a literal period rather than any character, and the dollar sign matches the end of the string (in this case, ensuring that the filenames returned are those that end in .csv):
## only list contents within current directory that have a .csv extension
list.files(pattern="\\.csv$")
[1] "Applicants_Batch_1.csv"
[2] "Applicants_Batch_2.csv"
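To see why the escaped dot and the dollar sign matter, here is a small sketch that compares a loose pattern with the anchored one. The folder csv_pattern_demo and its three filenames are hypothetical and are created in R's temporary directory only for this comparison.

```r
# Hypothetical folder and filenames, created only to compare the two patterns
demo_dir <- file.path(tempdir(), "csv_pattern_demo")
unlink(demo_dir, recursive = TRUE)   # start from a clean folder
dir.create(demo_dir)
invisible(file.create(file.path(
  demo_dir,
  c("scores.csv", "scores.csv.bak", "notes_csv.txt")
)))

# A loose pattern matches "csv" anywhere in the filename,
# so all three files are returned
loose <- list.files(path = demo_dir, pattern = "csv")
print(loose)

# The anchored pattern requires a literal ".csv" at the end of the filename,
# so only scores.csv is returned
strict <- list.files(path = demo_dir, pattern = "\\.csv$")
print(strict)
```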
Find specific files with multiple search patterns
What if you wanted to search for and return files with specific names? One way to accomplish this task is to build a single pattern string (using the paste function) that contains the strings you want to search for, separated by a vertical bar ( | ). And suppose you are unsure whether the filenames contain uppercase or lowercase characters. In that case, you can specify a third argument (within list.files()) called ignore.case and set it equal to TRUE, which tells R that the pattern matching should be case-insensitive. Note that by using the vertical bar below, you are searching for and returning filenames that contain either “batch_1” OR “batch_3”.
## only list contents within current directory that have
## the strings Batch_1 or Batch_3 in the filename
batch1_3 <- paste("Batch_1","Batch_3", sep = "|")
list.files(pattern = batch1_3, ignore.case = TRUE)
[1] "Applicants_Batch_1.csv"
[2] "Applicants_Batch_3.xlsx"
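If the search strings are already stored in a vector, paste() with the collapse argument builds the same kind of pattern. The sketch below is one possible variation, not the only approach: search_terms is a hypothetical vector, and grep() is used only to apply the regular expression to a plain character vector holding the filenames from this post, so the code runs without the example folder.

```r
# Hypothetical vector of search strings (lowercase on purpose)
search_terms <- c("batch_1", "batch_3")

# collapse = "|" joins the elements of one vector into a single pattern string
batch_pattern <- paste(search_terms, collapse = "|")
print(batch_pattern)

# The filenames from the example folder, typed in by hand
file_names <- c(
  "Applicants_Batch_1.csv",
  "Applicants_Batch_2.csv",
  "Applicants_Batch_3.xlsx",
  "Applicants_Batch_4.xlsx",
  "List Files in Directory.R"
)

# ignore.case = TRUE lets the lowercase terms match "Batch_1" and "Batch_3";
# value = TRUE returns the matching names instead of their positions
matched <- grep(batch_pattern, file_names, ignore.case = TRUE, value = TRUE)
print(matched)
```

Inside the example folder, the same pattern can be passed directly as list.files(pattern = batch_pattern, ignore.case = TRUE).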
Seeing (and knowing) the contents of a directory is important, especially if you need to access or use the files.
Do you check the contents of a directory before you begin a project? Why or why not? Let me know in the comments.