R: Importing Entire Folder of Files

I am using the R programming language (in R Studio). I am trying to import an entire folder of ".txt" files (notepad files) into R and "consistently" name them.
I know how to do this process manually:
#find working directory:
getwd()
[1] "C:/Users/Documents"
#import files manually and name them "consistently":
df_1 <- read.table("3rd_file.txt")
df_2 <- read.table("file_1.txt")
df_3 <- read.table("second_file.txt")
Of course, this will take a long time to do if there are 100 files.
Right now, suppose these files are in a folder: "C:/Users/Documents/files_i_want"
Is there a way to import all these files at once and name them as "df_1", "df_2", "df_3", etc.?
I found another stackoverflow post that talks about a similar problem: How to import folder which contains csv file in R Studio?
setwd("where is your folder")
#
#List file subdirectories
folders<- list.files(path = "C:/Users/Documents/files_i_want")
#
#Get all files...
files <- rep(NA,0)
for(i in c(1:length(folders)))
{
  files.i <- list.files(path = noquote(paste("C:/Users/Documents/files_i_want/", folders[i], "/", sep = "")))
  n <- length(files.i)
  files.i <- paste(folders[i], files.i, sep = "/")
  files <- c(files, files.i)
}
#
#
#Read first data file (& add file name as separate column)
T1 <- read.delim(paste("C:/Users/Documents/files_i_want", files[1], sep = ""), sep = "", header=TRUE)
T1 <- cbind(T1, "FileName" = files[1])
But this produces the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
Is this because there is a problem in the naming convention?
Thanks

You can try the following:
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, read.table)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
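If you would rather not create 100 separate objects, you can also skip list2env() and work with the named list directly; a couple of quick examples:
#Look at the first imported file
head(list_data[["df_1"]])
#Or apply the same operation to every imported file at once, e.g. count rows
sapply(list_data, nrow)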

Related

Extract column from multiple csv files and merge into new data frame in R

I want to extract a column called X1 out of 168 different .csv files, called table3_2, table3_3, table3_4, table3_5..., table3_168, all held in one folder (folder1), and then merge them into one new data frame. The contents of the column are factors.
I am trying this code but can't get it to work.
folder1 <- "folder1"
folder2 <- "folder2" # destination folder
write_to <- function(file.name) {
  file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
  df <- read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = "/")[X1]
  write.csv(df, file = past(folder2, file.name, sep= "/"))
}
files <- list.files(path = folder1, pattern = "*.csv")
lapply(X = paste(folder1, files, sep= "/"), write_to)
This comes up with the error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'folder1/folder1.csv': No such file or directory
So, I am not calling in the correct names of the table, and maybe not directing R to the correct folder (I've set the wd to folder1).
Any suggestions would be greatly appreciated.
Many thanks
There are a few minor issues that stand out, e.g. you have a typo in file = past(folder2, file.name, sep= "/") (it should be paste(), not past()), but perhaps a simpler approach would suit, e.g. using vroom:
library(vroom)
files <- fs::dir_ls(glob = "table3_*csv")
data <- vroom(files, id = "ID", col_select = c(ID, X1))
data
vroom_write(data, file = "~/xx/folder2/new_df.csv")
# replace "xx" with your path
Does this approach solve your problem?
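If you would rather stay in base R, here is a minimal sketch of the same idea (this assumes each file in folder1 has a header row with an X1 column, as the question implies; stacking with do.call(rbind, ...) is just one way to combine them):
#List the csv files with their full paths
files <- list.files("folder1", pattern = "^table3_.*\\.csv$", full.names = TRUE)
#Read each file, keep only the X1 column, and remember which file it came from
x1_list <- lapply(files, function(f) {
  df <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
  data.frame(file = basename(f), X1 = df$X1)
})
#Stack into one data frame and write it to folder2
new_df <- do.call(rbind, x1_list)
write.csv(new_df, file = file.path("folder2", "new_df.csv"), row.names = FALSE)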

R: Looping through all files in the directory, applying the column replacement command

So I have a directory of .txt files. Each contains columns representing features of some measurement. For the files in the same directory, I want to replace the content of one column named treatmentsum with 3S_TNFaCHx_cluster1_333nM+0.1ug/ml_none. (I will replace the same column with different names in different folders, and yes, I know this name is utterly annoying; trust me, there are much worse names.)
So I thought, replacing the column in a data frame is easy, I just read in the file and then do the below:
df <- read.table(x, header=TRUE)
df$treatmentsum <- c("3S_TNFaCHx_cluster1_333nM+0.1ug/ml_none")
Next, I just need to make this a function and lapply it to all files in the directory and make sure the new files are deposited in a new folder.
The full code did not work:
#set up directories
directory <- "C:/Users/XXX"
outdirectory <- "C:/Users/XXX/replaced"
#find all .txt files in the directory
files <- list.files(path=directory, pattern="*.txt", full.names=TRUE, recursive=FALSE)
lapply(files, function(x) {
  df <- read.table(x, header=TRUE)
  df$treatmentsum <- c("the new stupid name thanks to pharma")
  out <- function(df)
  # write to file
  write.table(out, outdirectory, sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE)
})
The error message that I got:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 744 elements
I tried creating the folder structure and tested out the following code; it seems to work.
source_folder <- "Folder1"
destination_folder <- "Folder2"
files <- list.files(path = source_folder, pattern = "*.txt",
                    full.names = TRUE, recursive = FALSE)
lapply(files, function(x) {
  df <- read.table(x, header = TRUE)
  df$Col2 <- c("the new stupid name thanks to pharma")
  write.table(df, paste0(destination_folder, "/", basename(x)), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = TRUE)
})
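For the original treatmentsum column the same pattern should carry over; a rough sketch only (fill = TRUE is a guess at the "line 1 did not have 744 elements" error, which usually means a row with a different number of fields or the wrong separator, so check the input files too):
files <- list.files(path = directory, pattern = "*.txt", full.names = TRUE, recursive = FALSE)
lapply(files, function(x) {
  #fill = TRUE pads short rows instead of stopping with an error (assumption, see above)
  df <- read.table(x, header = TRUE, fill = TRUE)
  df$treatmentsum <- "3S_TNFaCHx_cluster1_333nM+0.1ug/ml_none"
  write.table(df, paste0(outdirectory, "/", basename(x)), sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = TRUE)
})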

Looping through files using dynamic name variable in R

I have a large number of files to import which are all saved as zip files.
From reading other posts it seems I need to pass the zip file name and then the name of the file I want to open. Since I have a lot of them I thought I could loop through all the files and import them one by one.
Is there a way to pass the name dynamically or is there an easier way to do this?
Here is what I have so far:
Temp_Data <- NULL
Master_Data <- NULL
file.names <- c("f1.zip", "f2.zip", "f3.zip", "f4.zip", "f5.zip")
for (i in 1:length(file.names)) {
  zipFile <- file.names[i]
  dataFile <- sub(".zip", ".csv", zipFile)
  Temp_Data <- read.table(unz(zipFile, dataFile), sep = ",")
  Master_Data <- rbind(Master_Data, Temp_Data)
}
I get the following error:
In open.connection(file, "rt") :
I can import them manually using:
dt <- read.table(unz("D:/f1.zip", "f1.csv"), sep = ",")
I can create the string dynamically, but it feels long-winded and doesn't work when I wrap it with read.table(unz(...)); it seems it can't find the file name and so throws an error.
cat(paste(toString(shQuote(paste("D:/",zipFile, sep = ""))),",",
toString(shQuote(dataFile)), sep = ""), "\n")
But if I then print this to the console I get:
"D:/f1.zip","f1.csv"
I can then paste this into read.table(unz(...)) and it works, so I feel like I am close.
I've tagged in data.table since this is what I almost always use so if it can be done with 'fread' that would be great.
Any help is appreciated
You can use the list.files() command here.
First, set your working directory to where all your files are stored:
setwd("C:/Users/...")
Then get the file names:
file.names <- list.files(pattern = "*.zip", recursive = FALSE)
Then your for loop will be:
library(readr)  # write_delim() below comes from readr
for (i in 1:length(file.names)) {
  #open the files
  zipFile <- file.names[i]
  dataFile <- sub(".zip", ".csv", zipFile)
  Temp_Data <- read.table(unz(zipFile, dataFile), sep = ",")
  # your function for the opened file
  Master_Data <- rbind(Master_Data, Temp_Data)
  #finally write the file
  write_delim(x = Master_Data, path = paste(file.names[[i]]), delim = "\t",
              col_names = TRUE)
}
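Since the question mentions data.table, fread() can also read straight out of a zip via a system command; a hedged sketch (this assumes a command-line unzip is available on your PATH, which is typical on Linux/macOS but may need installing on Windows):
library(data.table)
file.names <- list.files(pattern = "*.zip", recursive = FALSE)
Master_Data <- rbindlist(lapply(file.names, function(zipFile) {
  dataFile <- sub(".zip", ".csv", zipFile)
  #fread can read the output of a shell command; unzip -p streams the csv to stdout
  fread(cmd = paste("unzip -p", shQuote(zipFile), shQuote(dataFile)))
}))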

How to bypass errors in for loops in r?

I created a for loop to merge several csv files in a directory together into one table. In some cases the files that are indicated in the loop have not been created. When the files do not exist the loop produces an error and no files are merged. I am trying to adjust the code so the loop inserts "NULL" or "error" in the parts of the matrix reserved for the files.
Here is the original code:
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in length(dirnames)){
  j <- dirnames[1]  #Take the directory folder name
  id <- gsub("_.*$", "", dirnames[1])  #Take the numeric identifier of the indicator
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))  #Merge the directory folder name and desired csv to a file path format
  data <- read.csv(fpath, header = TRUE, as.is = TRUE)
  last <- max(ncol(data))
  COMP_raw[,(1+1)] <- data[,last]
  colnames(COMP_raw)[(1+1)] <- names(data[last])
}
The above code works for every iteration where the fpath actually exists in my directory. When the csv does not exist, the following message occurs:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '2.1_PermitIndirectCosts/2.1_2016.csv': No such file or directory
I looked at a few other posts to see how to solve the issue and tried the following:
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
for(i in length(dirnames)){
  j <- dirnames[1]  #Take the directory folder name
  id <- gsub("_.*$", "", dirnames[1])  #Take the numeric identifier of the indicator
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))  #Merge the directory folder name and desired csv to a file path format
  possibleerror <- tryCatch(data <- read.csv(fpath, header = TRUE, as.is = TRUE),
                            silent = TRUE,
                            error = function(e) e)
  if(!inherits(possibleerror, "error")){
    last <- max(ncol(data))
    COMP_raw[,(1+3)] <- data[,last]
    colnames(COMP_raw)[(1+3)] <- names(data[last])
  }
}
But that is still generating an error
What about using file.exists()?
file.exists returns a logical vector indicating whether the files named by its argument exist.
COMP_raw <- cbind(m, matrix(-9999, ncol = length(dirnames), nrow = 169))
setwd() #actual wd is removed for posting
for(i in length(dirnames)){
  j <- dirnames[1]  #Take the directory folder name
  id <- gsub("_.*$", "", dirnames[1])  #Take the numeric identifier of the indicator
  fpath <- file.path(paste(j, "/", id, "_2016", ".csv", sep = ""))  #Merge the directory folder name and desired csv to a file path format
  #Check if the file exists; if not, assign a placeholder
  if(file.exists(fpath)){
    data <- read.csv(fpath, header = TRUE, as.is = TRUE)
    last <- max(ncol(data))
    COMP_raw[,(1+1)] <- data[,last]
    colnames(COMP_raw)[(1+1)] <- names(data[last])
  } else{
    colnames(COMP_raw)[(1+1)] <- "NULL"  #the string "NULL"; assigning NULL itself would error with "replacement has length zero"
  }
}
Not specific to your example (I'm on a mobile) but it should help:
var <- try(some function)
if(is(var, "try-error")){
  some other function
  next;
}
If the call fails, try() returns an object of class "try-error", which you can test for and handle accordingly; next moves on to the next item in the loop.
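Applied to the loop in the question, that might look roughly like this (the seq_along() indexing and the i + 1 column position are assumptions about what each iteration is meant to fill in):
for (i in seq_along(dirnames)) {
  j <- dirnames[i]
  id <- gsub("_.*$", "", dirnames[i])
  fpath <- file.path(j, paste0(id, "_2016.csv"))
  data <- try(read.csv(fpath, header = TRUE, as.is = TRUE), silent = TRUE)
  if (is(data, "try-error")) {
    colnames(COMP_raw)[i + 1] <- "error"  #mark the missing file and move on
    next
  }
  last <- ncol(data)
  COMP_raw[, i + 1] <- data[, last]
  colnames(COMP_raw)[i + 1] <- names(data)[last]
}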

Read multiple files and save data into one dataframe in R

I am trying to read multiple files and then combine them into one data frame. The code that I am using is as follows:
library(plyr)
mydata = ldply(list.files(path="Data load for stations/data/Predicted",pattern = "txt"), function(filename) {
  dum = read.table(filename, skip = 5, header = F, sep = " ")
  #If you want to add the filename as well on the column
  dum$filename = filename
  return(dum)
})
The error that I am getting is as follows:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'mobdata201001.txt': No such file or directory
The data files can be found on https://www.dropbox.com/sh/827kmkrwd0irehk/BFbftkks42
Any help is highly appreciated.
Alternatively, you can use the full.names argument in list.files():
list.files(path="Data load for stations/data/Predicted",
           pattern = "txt", full.names=TRUE)
It will automatically prepend the full path to each file name.
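Plugged into the ldply() call from the question, that could look like this (same code, just with full.names = TRUE added):
library(plyr)
mydata = ldply(list.files(path = "Data load for stations/data/Predicted",
                          pattern = "txt", full.names = TRUE),
               function(filename) {
  dum = read.table(filename, skip = 5, header = F, sep = " ")
  dum$filename = filename  #note: with full.names = TRUE this now stores the full path
  return(dum)
})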
Try the following code:
library(plyr)
path <- "Data load for stations/data/Predicted/"
filenames <- paste0(path, list.files(path, pattern = "txt"))
mydata = ldply(filenames, function(filename) {
  dum = read.table(filename, skip = 5, header = F, sep = " ")
  #If you want to add the filename as well on the column
  dum$filename = filename
  return(dum)
})
I think what is happening is you're generating a list of files relative to the path in list.files, and then asking read.table to take the filename without the rest of the path...
