Reading CSVs from a folder on R [duplicate] - r

d <- NULL
datafiles <- list.files(path = "C:___")
for (i in datafiles){
print(i)
j <- read.csv(i, header = T)
j$file <- i
d <- rbind(d, j)
}
When I ran just the print line, all of the CSV names in the folder were displayed, but an error pops up at the j <- line and on everything after it. When I ran the entire code, the error I got was:
Error in file(file, "rt") : cannot open the connection In addition:
Warning message:
In file(file, "rt") :
cannot open file 'xxx.csv': No such file or directory
Any suggestions would be appreciated, thanks!

You could try:
library(tidyverse)
file_list <- list.files()
df <- map_dfr(file_list, read_csv)
Or:
file_list <- list.files()
df <- do.call("rbind", lapply(file_list, read_csv))
Make sure to set your working directory correctly with setwd().
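A likely cause of the original error, sketched here on the assumption that the folder is not the working directory: list.files(path = ...) returns bare file names unless full.names = TRUE, so read.csv looks for them in the working directory and fails. The folder and file names below are made up for illustration; a temporary directory stands in for the real path.

```r
# Hypothetical demo folder: a temporary directory stands in for the real path.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(demo_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that include the folder, so read.csv can
# find the files regardless of the current working directory.
datafiles <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, tag its rows with the source file name, and stack the results.
d <- do.call(rbind, lapply(datafiles, function(f) {
  j <- read.csv(f, header = TRUE)
  j$file <- basename(f)  # keep the source file name, as in the question
  j
}))
```

With this approach setwd() is not needed, because every path handed to read.csv already contains the folder.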

Related

Error in file(file, "rt") : cannot open the connection - Unsure of what to do

I am currently working through Coursera's R Programming course and have hit a bit of a snag with this assignment. I have been getting various errors (not all of which I'm sure I've nailed down), but this is a new one and no matter what I do I can't seem to shake it.
Whenever I run the below code it comes back with
Error in file(file, "rt") : cannot open the connection
pollutantmean <- function (directory, pollutant, id){
files<- list.files(path = directory, "/", full.names = TRUE)
dat <- data.frame()
dat <- sapply(file = directory,"/", read.csv)
mean(dat["pollutant"], na.rm = TRUE)
}
I have tried numerous different solutions posted here on SO for this issue but none of it has worked. I made sure that I am running after setting the working directory to the folder with all of the CSV files and I can see all of the files in the file pane. I have also moved that working directory around a few times since some of the suggestions were to put it on the desktop, etc. but none of that has worked. I am currently running R Studio as an admin but that does not seem to have done anything and I have also modified the permissions on the specdata file to ensure there's no weird restrictions there. Any help is appreciated.
Here are two possible implementations:
# list all files in "directory", read them, combine and then take mean of "pollutant" column
pollutantmean_1 <- function (directory){
files <- list.files(path = directory, full.names = TRUE)
dat <- lapply(files, read.csv) # read every listed file into a list of data frames
dat <- data.table::rbindlist(dat) |> as.data.frame()
mean(dat[, 'pollutant' ], na.rm = TRUE)
}
# list all files in "directory", read them, take the mean of "pollutant" column for each file and return them
pollutantmean_2 <- function (directory){
files <- list.files(path = directory, full.names = TRUE)
dat <- lapply(files, read.csv) # read every listed file into a list of data frames
pollutant_means <- sapply(dat, function(x) mean(x[ , 'pollutant' ], na.rm = TRUE))
names(pollutant_means) <- basename(files)
pollutant_means
}
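Both versions above look up a column literally named 'pollutant'. If the column name is meant to come from the caller, as the question's signature suggests, one possible variant passes it in as an argument. This is only a sketch: the function name pollutantmean_3, the demo folder, and the values in the files are invented for illustration.

```r
# Variant that takes the column name as an argument (an assumption about
# what the question intended; not part of the original answer).
pollutantmean_3 <- function(directory, pollutant) {
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  dat <- do.call(rbind, lapply(files, read.csv))
  mean(dat[[pollutant]], na.rm = TRUE)
}

# Invented demo data in a temporary folder.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Mean of the non-missing sulfate values across both files.
result <- pollutantmean_3(demo_dir, "sulfate")
```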

R: Importing Entire Folder of Files

I am using the R programming language (in R Studio). I am trying to import an entire folder of ".txt" files (notepad files) into R and "consistently" name them.
I know how to do this process manually:
#find working directory:
getwd()
[1] "C:/Users/Documents"
#import files manually and name them "consistently":
df_1 <- read.table("3rd_file.txt")
df_2 <- read.table("file_1.txt")
df_3 <- read.table("second_file.txt")
Of course, this will take a long time to do if there are 100 files.
Right now, suppose these files are in a folder : "C:/Users/Documents/files_i_want"
Is there a way to import all these files at once and name them as "df_1", "df_2", "df_3", etc.?
I found another stackoverflow post that talks about a similar problem: How to import folder which contains csv file in R Studio?
setwd("where is your folder")
#
#List file subdirectories
folders<- list.files(path = "C:/Users/Documents/files_i_want")
#
#Get all files...
files <- rep(NA,0)
for(i in c(1:length(folders)))
{
files.i <- list.files(path = noquote(paste("C:/Users/Documents/files_i_want/",folders[i], "/", sep = "")))
n <- length(files.i)
files.i <- paste(folders[i], files.i, sep = "/")
files <- c(files, files.i)
}
#
#
#Read first data file (& add file name as separate column)
T1 <- read.delim(paste("C:/Users/Documents/files_i_want", files[1], sep = ""), sep = "", header=TRUE)
T1 <- cbind(T1, "FileName" = files[1])
But this produces the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
Is this because there is a problem in the naming convention?
Thanks
You can try the following:
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, read.table)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
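One likely reason the code in the question fails is that paste(..., sep = "") joins the folder and the file name with no "/" between them, so the resulting path does not exist. A small sketch of the difference, using the folder and one file name from the question; nothing is read from disk here.

```r
folder <- "C:/Users/Documents/files_i_want"
fname  <- "file_1.txt"

# paste() with sep = "" glues the two strings together with no separator.
bad_path <- paste(folder, fname, sep = "")

# file.path() inserts the "/" separator between path components.
good_path <- file.path(folder, fname)

print(bad_path)
print(good_path)
```

Using full.names = TRUE in list.files, as in the answer above, avoids building paths by hand altogether.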

Error in loop reading multiple text files

I'm a bit stuck with this code... The purpose is to read only the text files from a folder that contains a few different kinds of files, take one column from each, and create a data frame from all the extracted columns (cbind.fill is a hand-made function that adds a new column and fills the "empty" spaces with NA values). Here is the code:
setwd("...folderOfInterest/")
genes_data <- data.frame()
for(i in list.files(pattern = "^GO_.*txt", full.names = TRUE)){
print(i) #this works perfectly, it only prints desired files...
q <- read.table(i, header = TRUE, sep = "\t", quote = NULL)
genes_data <- cbind.fill(genes_data, q[,2])
}
As @Adam B suggests, here is the print(i) output and a screenshot of the folder (folder_screenshot):
[1] "./GO_ALPHA_AMINO_ACID_CATABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_AMINO_ACID_METABOLIC_PROCESS.xls"
[1] "./GO_ALPHA_BETA_T_CELL_ACTIVATION.xls"
[1] "./GO_AMINO_ACID_BETAINE_METABOLIC_PROCESS.xls"
[1] "./GO_AMINO_ACID_IMPORT.xls"
[1] "./GO_AMINO_ACID_TRANSMEMBRANE_TRANSPORT.xls"
[1] "./GO_AMINO_ACID_TRANSPORT.xls"
[1] "./GO_AMINOGLYCAN_BIOSYNTHETIC_PROCESS.xls"
[1] "./GO_ANGIOGENESIS.xls"
[1] "./GO_ANION_TRANSPORT.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION.xls"
[1] "./GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls"
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION_OF_ENDOGENOUS_ANTIGEN.xls': No such file or directory
(note: the files' extension is .xls, but really they are .txt files)
It prompts this message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file './GO_ANTIGEN_PROCESSING_AND_PRESENTATION.txt': No such file or directory
Running only q <- read.table(i, header = TRUE, sep = "\t", quote = NULL) produces the same error message.
I think I'm in the correct folder (because print(i) works fine). I've also changed the full.names option and assigned the list.files result to a variable outside the loop, but nothing seems to work. If anybody has an idea, it would be welcome!
I've tried it on randomly generated files and it works. You probably do not need to cd into the directory with the data, just give the list.files a dir argument with the path to your data directory.
GOfls <- list.files("indata", pattern = "^GO_.*\\.txt", full.names = TRUE)
head(GOfls)
[1] "indata/GO_amswylfbgp.txt" "indata/GO_amswylfbgptxt" "indata/GO_apqqqktvir.txt"
[4] "indata/GO_arwudmbzsr.txt" "indata/GO_autljyljgn.txt" "indata/GO_beeqcmnayk.txt"
# lapply -> do.call for reading and binding the data is better approach
gene_data <- do.call('cbind', lapply(GOfls, function(path) read.delim(path)[,2]))
# have a look at the data
dim(gene_data)
[1] 100 100
I have tried to reproduce your problem this way (it's optional text):
dir.create("indata")
fls <- lapply(1:100, function(i) data.frame(matrix(rnorm(1000), ncol = 10)))
names(fls) <- replicate(100, paste0("./indata/", "GO_",
paste0(sample(letters, 10, replace = T),
collapse = ""), ".txt"
)
)
lapply(names(fls), function(x) write.table(fls[[x]], x, quote = F, sep = "\t"))
head(dir("indata"))
[1] "GO_acebruujkw.pdf" "GO_amswylfbgp.txt" "GO_amswylfbgptxt" "GO_apqqqktvir.txt"
[5] "GO_arwudmbzsr.txt" "GO_autljyljgn.txt"
# I have added some renamed .txt files (.pdf, .tiff, .gel) to the indata
rm(list = ls())
That's solved! It's a bit strange, but after copying the folder of interest to the desktop the code works again.
A mate and I saw that hard disk's activity was collapsed, so we thought that maybe there could be a problem in the process of reading... so copying the folder was the (simple) solution!
Nevertheless, if anybody has an idea that explains this strange situation I'm sure it'll be useful! Thanks a lot!
EDIT
I've done some tests, and the problem may be the folder path, which might be too long and so breaks the loop.
I think it's because you're searching for .xls files but then trying to open them as .txt files.
In Excel, try saving the files as comma- or tab-delimited text files.
If you want to open Excel files directly, there are a few packages that can do that. Try readxl.
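The path-length explanation from the EDIT is plausible on Windows, where paths have traditionally been limited to roughly 260 characters. A minimal diagnostic sketch, assuming that limit is the culprit: the helper name check_paths and the 260 threshold are illustrative choices, not something taken from the thread.

```r
# Hypothetical helper: for each matching file, report whether R can see it
# and how long its full path is. The 260 threshold is an assumption
# (the classic Windows MAX_PATH limit).
check_paths <- function(dir = ".", pattern = NULL, limit = 260) {
  fls <- list.files(dir, pattern = pattern, full.names = TRUE)
  full <- normalizePath(fls, winslash = "/", mustWork = FALSE)
  data.frame(
    file     = basename(fls),
    exists   = file.exists(fls),
    n_chars  = nchar(full),
    too_long = nchar(full) >= limit,
    stringsAsFactors = FALSE
  )
}

# Demo on a temporary folder with one short-named file.
demo_dir <- file.path(tempdir(), "path_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a\tb", file.path(demo_dir, "GO_demo.txt"))
report <- check_paths(demo_dir, pattern = "^GO_.*\\.txt$")
print(report)
```

If a file shows up in the listing but its full path is near the limit, moving the folder to a shorter location (as the asker did) is a reasonable thing to try.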

Error in file(file, "rt") : cannot open the connection in r

I am new to programming. I am currently doing the R Programming course on Coursera and got this error while doing the assignment named "pollutantmean". I searched the forums and Stack Overflow but was not able to fix it. I appreciate your help. Thanks.
I got this error :
Error in file(file, "rt") : cannot open the connection In addition: Warning message:
In file(file, "rt") : cannot open file 'NA': No such file or directory
Note: I have a folder "specdata", which is the working directory. This "specdata" folder has all 332 CSV files. I want to calculate the mean of one of the pollutant columns, the one named by "pollutant", in those files. "directory" is the location of those files, and "id" is an integer vector of monitor numbers. Here is my code:
pollutantmean <- function(directory, pollutant, id = 1:332) {
files_full <- list.files(directory, full.names = TRUE)
dat <- data.frame()
for (i in id) {
dat <- rbind(dat, read.csv(files_full[i]))
}
mean(dat[, pollutant], na.rm = TRUE)
}
pollutantmean("specdata","sulfate",id = 1:10)
Judging by the error, list.files seems to be returning an empty vector of file names.
By default, list.files returns only the file names inside the directory. In that case, you should set the working directory first, so that read.csv reads the files inside the working directory.
setwd("<directory name>")
Now your function should work fine.
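The 'NA' in the warning suggests that files_full[i] was NA, which is what R returns when a vector is indexed past its end, including an empty vector. Since the question says the working directory is "specdata" itself, list.files("specdata") would be looking for a specdata folder inside specdata, which may be why nothing was found. A short sketch of that behaviour plus a guard; the directory name is deliberately nonexistent, and the guard is only a suggestion.

```r
# list.files() on a directory that cannot be found returns an empty vector.
files_full <- list.files("no_such_directory_for_demo", full.names = TRUE)
n_found <- length(files_full)

# Indexing past the end of a vector yields NA, which read.csv would then
# try to open as a file named 'NA'.
first_file <- files_full[1]

# A guard like this fails early with a clearer message.
guard_msg <- tryCatch(
  {
    if (n_found == 0) stop("no files found; check the directory and getwd()")
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(guard_msg)
```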

R: Cannot Open File : No such file or directory

I have script as below:
setwd ("I:/prep/Coord/RData/test")
#load .csv files
a.files <- grep("^Whirr", dir(), value=TRUE) #pattern matching
b.files <- paste0("Files_", a.files)
for(i in length(a.files)){
a <- read.table(a.files[i], header=T, sep=",", row.names=1) #read files start with Whirr_
b <- read.table(b.files[i], header=T, sep=",", row.names=1) #read files start with Files_
a
b
cr <- as.matrix(a) %*% as.matrix(t(a))
cr
diag(cr)<-0
cr
#write to file
write.csv(cr, paste0("CR_", a.files[i], ".csv"))
}
Basically, what I want to do is pair up files whose names end the same way, do the calculation, and write the result to a file.
When I tried to print a.files and b.files, the output seemed OK to me. The output is below:
> a.files <- grep("^Whirr", dir(), value=TRUE) #pattern matching
> b.files <- paste0("Files_", a.files, sep="")
Error: could not find function "paste0"
> a.files
[1] "Whirr_127.csv" "Whirr_128.csv"
> b.files
[1] "Files_ Whirr_127.csv" "Files_ Whirr_128.csv"
>
I tried to feed the script multiple files, but I got the error message below:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'Files_ Whirr_128.csv': No such file or directory
So I tried to use file.choose, but that doesn't work for me either.
I would appreciate help from the experts.
Change the line:
b.files <- paste0("Files_", a.files)
to:
b.files <- paste("Files_", a.files, sep="")
You are using a version of R that does not have paste0 (I see that code was given to you in an earlier answer). This means you were still using an earlier version of b.files, perhaps one that had been constructed using paste.
One important lesson here is that whenever you get an error message about a line, such as Error: could not find function "paste0", that line did not run! You have to fix that error before you paste the code, or tell us about the error when you do; otherwise we assume the b.files <- paste0("Files_", a.files) line works.
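The stray space in 'Files_ Whirr_128.csv' is consistent with b.files having been built by paste with its default separator, which is a single space. A brief sketch of the difference, using the file names shown in the question:

```r
a.files <- c("Whirr_127.csv", "Whirr_128.csv")

# paste() separates its arguments with a space by default.
with_space <- paste("Files_", a.files)

# sep = "" removes the separator; paste0() is shorthand for this in
# versions of R that provide it.
no_space <- paste("Files_", a.files, sep = "")

print(with_space)
print(no_space)
```

Printing b.files right before the read.table call is a quick way to confirm which version of the vector the loop is actually using.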
