R - Dynamic reference to files for read csv - r

I would like to make a script that reads data from the correct folder. I have several lines in my code refering to the foldername, therefore I would like to make this dynamic. Is it possible to make the reference to a folder name dynamic? See below what I would like to do
# Clarifies the name of the folder, afterwards "Foldername" will be used as reference
FolderA <- Foldername
# Read csv to import the data from the selected location
data1 <- read.csv(file="c:/R/Foldername/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/Foldername/datafile2.csv", header=TRUE, sep=",")
I am trying to get the same result as what I would get with this code:
data1 <- read.csv(file="c:/R/FolderA/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/FolderA/datafile2.csv", header=TRUE, sep=",")
Can somebody please clarify how it would be possible to make this dynamic?

You could use paste0 for this:
FolderA <- "Foldername"
paste0("c:/R/", FolderA, "/datafile1.csv")
#[1] "c:/R/Foldername/datafile1.csv"
So in your case:
data1 <- read.csv(file=paste0("c:/R/", FolderA, "/datafile1.csv"), header=TRUE, sep=",")

A slight generalization of #LyzandeR's answer,
make_files <- function(directory, filenames) {
sprintf("C:/R/%s/%s", directory, filenames)
}
##
Files <- sprintf("file%i.csv", 1:3)
##
make_files("FolderA", Files)
#[1] "C:/R/FolderA/file1.csv" "C:/R/FolderA/file2.csv" "C:/R/FolderA/file3.csv"

you could also try the following method. The loop will create a list with output file, but if your files all have the same column names you could just rbind them together (method 2). This method will allow you to specify your folder, then use the list.files function to extract all files with extension ".csv". This way if you have many csv files in a folder you won't have to write them all out individually.
# Specify working directory or location of files:
FolderA = "c:/R/Foldername"
# identify all files with specific extension:
files = list.files(FolderA,pattern="*.csv")
Method 1 - Separate by lists
data = NULL
for(i in 1:length(files)){
data[[i]] = read.csv(files[i],header=F,stringsAsFactors=F)
}
Method 2 - single dataframe
data = NULL
for(i in 1:length(files)){
df = read.csv(files[i],header=F,stringsAsFactors=F)
data = rbind(data,df)
}

Related

How to 'read.csv' many files in a folder using R?

How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop but use assign to dynamically name the dataframes if you want
for(file in files){
assign(paste0(file, "_df"),read.csv(file))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in 1:length(files)){
file <- files[index]
dfs[index] <- read.csv(file)
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path as it will attempt to determine what the path separator is for operating system the code is running on. So you want to use:
read.csv(files.path(link , x), header=FALSE)
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}

Loop subset over several files in a directory and output files into a new directory with a suffix

I have figured out some part of the code, I will describe below, but I find it hard to iterate (loop) the function over a list of files:
library(Hmisc)
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311")
# This is a vector of values that I want to exclude from the files
setwd("full_path_of_directory_with_desired_files")
filepath <- "//full_path_of_directory_with_desired_files"
list.files(filepath)
predict_files <- list.files(filepath, pattern="predict.txt")
# all files that I want to filter have _predict.txt in them
predict_full <- file.path(filepath, predict_files)
# generates full pathnames of all desired files I want to filter
sample_names <- sample_names <- sapply(strsplit(predict_files , "_"), `[`, 1)
Now here is an example of a simple filtering I want to do with one specific example file, this works great. How do I repeat this in a loop on all filenames in predict_full
test_predict <- read.table("a550673-4308980_A05_RepliG_rep2_predict.txt", header = T, sep = "\t")
# this is a file in my current working directory that I set with setwd above
test_predict_filt <- test_predict[test_predict$target_id %nin% filter_173]
write.table(test_predict_filt, file = "test_predict")
Finally how do I place the filtered files in a folder with the same name as original with the suffix filtered?
predict_filt <- file.path(filepath, "filtered")
# Place filtered files in
filtered/ subdirectory
filtPreds <- file.path(predict_filt, paste0(sample_names, "_filt_predict.txt"))
I always get stuck at looping! It is hard to share a 100% reproducible example as everyone's working directory and file paths will be unique though all the code I shared works if you adapt it to an appropriate path name on your machine.
This should work to loop through each of the files and write them out to the new location with the filename specifications you needed. Just be sure to change the directory paths first.
filter_173 <- c("kp|917416", "kp|835898", "kp|829747", "kp|767311") #This is a vector of values that I want to exclude from the files
filepath <- "//full_path_of_directory_with_desired_files"
filteredpath <- "//full_path_of_directory_with_filtered_results/"
# Get vector of predict.txt files
predict_files <- list.files(filepath, pattern="predict.txt")
# Get vector of full paths for predict.txt files
predict_full <- file.path(filepath, predict_files)
# Get vector of sample names
sample_names <- sample_names <- sapply(strsplit(predict_files , "_"), `[`, 1)
# Set for loop to go from 1 to the number of predict.txt files
for(i in 1:length(predict_full))
{
# Load the current file into a dataframe
df.predict <- read.table(predict_full[i], header=T, sep="\t")
# Filter out the unwanted rows
df.predict <- df.predict[!(df.predict$target_id %in% filter_173)]
# Write the filtered dataframe to the new directory
write.table(df.predict, file = file.path(filteredpath, paste(sample_names[i],"_filt_predict.txt",sep = "")))
}

Importing and binding multiple and specific csv files into R

I would like to import and bind, all together in a R file, specific csv files named as "number.CSV" (e.g. 3437.CSV) which I have got in a folder with other csv files that I do not want to import.
How can I select only the ones that interest me?
I have got a list of all the csv files that I need and in the following column there are some of them.
CODE
49002
47001
64002
84008
46003
45001
55008
79005
84014
84009
45003
45005
51001
55012
67005
19004
7003
55023
55003
76004
21013
I have got 364 csv files to read and bind.
n.b. I can't select all the "***.csv" files from my folder because I have got other files that I do not need.
Thanks
You could iterate over the list of CSV files of interest, read in each one, and bind it to a common data frame:
path <- "path/to/folder/"
ROOT <- c("49002", "47001", "21013")
files <- paste0(path, ROOT)
sapply(files, bindFile, var2=all_files_df)
bindFile <- function(x, all_df) {
df <- read.csv(x)
all_df <- rbind(df, all_df)
}
Just make file names out of your numeric codes:
filenames = paste(code, 'csv', sep = '.')
# [1] "49002.csv" "47001.csv" "64002.csv" …
You might need to specify the full path to the files as well:
directory = '/example/path'
filenames = file.path(directory, filenames)
# [1] "/example/path/49002.csv" "/example/path/47001.csv" "/example/path/64002.csv" …
And now you can simply read them into R in one go:
data = lapply(filenames, read.csv)
Or, if your CSV files don’t have column headers (this is the case, in particular, when the file’s lines have different numbers of items!)
data = lapply(filenames, read.csv, header = FALSE)
This will give you a list of data.frames. If you want to bind them all into one table, use
data = do.call(rbind, data)
I don't know if you can do that from .CSV file. What you can do is open all your data and then use the command cbind.
For example:
data1 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data2 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data3 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
And then:
df <- cbind(data1$Col1, data2$col3...)
Where col is the name of the column that you want.

Batch rename list of files to totally different names using R

I have 330 files that i would like to rename using R. I saved the original names and the new names in a .csv file. I used a script which does not give an error but it does not change the names.
Here is a sample of the new names:(df1)
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010101.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010117.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS_EVI_20010201.tif
And a sample of the original names:(df2)
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001001.yL1600.EVI.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001033.yL1600.EVI.tif
D:\Modis_EVI\Original\EVI_Smoothed\ MODIS.2001049.yL1600.EVI.tif
Then here is the script i'm using:
csv_dir <- "D:\\"
df1 <- read.csv(paste(csv_dir,"New_names.csv",sep=""), header=TRUE, sep=",") # read csv
hdfs <- df1$x
hdfs <- as.vector(hdfs)
df2 <- read.csv(paste(csv_dir,"smoothed.csv",sep=""), header=TRUE, sep=",") # read csv
tifs <- df2$x
tifs <- as.vector(tifs)
for (i in 1:length(hdfs)){
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
file.rename(from =tifs[i], to = hdfs[i])
}
Any advice please?
I think you mix up the old and the new files, and you are trying to use rename the new file (names), which do not exist, to the old file names. This might work
file.rename(from =hdfs[i], to = tifs[i])
A general approach would go like this:
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
fin <- list.files(pattern='tif$')
fout <- gsub("_EVI_", ".", fin)
fout <- gsub(".tif", "yL1600.EVI.tif", fout)
for (i in 1:length(fin)){
file.rename(from=fin[i], to= fout[i])
}
To fix your script (do you really need .csv files?)
setwd("D:\\Modis_EVI\\Original\\EVI_Smoothed\\")
froms <- read.csv("d:/New_names.csv", stringsAsFactors=FALSE)
froms <- as.vector(froms$x)
First check if they exist:
all(file.exists(froms))
Perhaps you need to trim the names (remove whitespace) -- that is what the examples you give suggest
library(raster)
froms <- trim(froms)
all(file.exists(froms))
If they exist
tos <- read.csv("d:/smoothed.csv", stringsAsFactors=FALSE)
tos <- as.vector(tos$x)
# tos <- trim(tos)
for (i in 1:length(froms)) {
file.rename(froms[i], tos[i])
}

Read in multiple txt files and create a list of it to access each file by accessing the list element in R

Being relatively new to R programming I am struggling with a huge data set of 16 text files (, seperated) saved in one dierctory. All the files have same number of columns and the naming convention, for example file_year_2000, file_year_2001 etc. I want to create a list in R where i can access each file individually by accessing the list elementts. By searching through the web i found some code and tried the following but as a result i get one huge list (16,2 MB) where the output is just strange. I would like to have 16 elements in the list each represting one file read from the directory. I tried the following code but it does not work as i want:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for( i in length(file.names)){
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
year = gsub('[^0-9]', '', file)
df_list[[year]] = file
}
Any suggestions?
Thanks in advance.
Just to give more details
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for(i in seq(length(file.names))){
year = gsub('[^0-9]', '', file.names[i])
df_list[[year]] = read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
}
Maybe it would be worth joining the data frames into one big data frame with an additional column being the year?
I assume that instead of "access each file individually" you mean you want to access individually data in each file.
Try something like this (untested):
path = "~/.../.../.../Data_1999-2015"
file.names <- dir(path, pattern =".txt")
df_list = vector("list", length(file.names))
# create a list of data frames with correct length
names(df_list) <- rep("", length(df_list))
# give it empty names to begin with
for( i in seq(along=length(file.names))) {
# now i = 1,2,...,16
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
df_list[[i]] = file
# save the data
year = gsub('[^0-9]', '', file.names[i])
names(df_list)[i] <- year
}
Now you can use either df_list[[1]] or df_list[["2000"]] for year 2000 data.
I am uncertain if you are reading yout csv files in the right directory. If not, use
file <- read.csv(paste0(path, file.names[i], sep="/"),header=TRUE, sep=",", stringsAsFactors=FALSE)
when reading the file.

Resources