Files with unknown file names in R

I have a folder containing a lot of CSV files.
Can I read all of them, for example as zoo objects, without knowing the file names?
UPDATE
I tried that:
files <- list.files( "C://Users//ramid//Desktop//Files//" );
(na.omit(files))
for( i in files ) {
  filePath <- gsub(" ","", paste("C://Users//ramid//Desktop//Files//",files[i],".csv"), fixed=TRUE)
  cat(filePath)
  df <- read.csv(gsub(" ","", filePath, fixed=TRUE), header = TRUE, sep = ";",stringsAsFactors=FALSE)
}
However I am getting an error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C://Users//ramid//Desktop//Files//NA.csv': No such file or directory
I do not have any NA in my files list.

I'd use a combination of list.files and lapply:
library(zoo)
list_of_files = list.files('.', pattern = '\\.csv$', full.names = TRUE)
list_of_csv_contents = lapply(list_of_files, read.csv)
list_of_zoo = lapply(list_of_csv_contents, zoo)
Or wrap both the read.csv and zoo in one step:
read_into_zoo = function(path) {
  contents = read.csv(path)
  zoo_contents = zoo(contents)
  return(zoo_contents)
}
list_of_zoo = lapply(list_of_files, read_into_zoo)
Storing things in lists/arrays/vectors/matrices and looping over them with the apply family is a pattern that works very well in R.
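As for the NA.csv error in the question: for (i in files) iterates over the file names themselves, not over indices, so files[i] indexes the character vector by a string it doesn't contain as a name and returns NA. (Note also that list.files already returns names with their .csv extension, so pasting ".csv" back on doubles it.) A minimal demonstration:

```r
files <- c("a.csv", "b.csv")

# Indexing an unnamed character vector with a string returns NA --
# this is exactly where the "NA.csv" in the error message came from:
files["a.csv"]   # returns NA

# Iterate over the values directly (or over seq_along(files)) instead:
for (f in files) {
  cat(f, "\n")
}
```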

Related

Extract column from multiple csv files and merge into new data frame in R

I want to extract a column called X1 from 168 different .csv files, called table3_2, table3_3, table3_4, table3_5, ..., table3_168, all held in one folder (folder1), and then merge them into one new data frame. The contents of the column are factors.
I'm trying this code but can't get it to work.
folder1 <- "folder1"
folder2 <- "folder2" # destination folder
write_to <- function(file.name) {
  file.name <- paste0(tools::file_path_sans_ext(basename(file.name)), ".csv")
  df <- read.csv(paste(folder1, file.name, sep = "/"), header = FALSE, sep = "/")[X1]
  write.csv(df, file = past(folder2, file.name, sep= "/"))
}
files <- list.files(path = folder1, pattern = "*.csv")
lapply(X = paste(folder1, files, sep= "/"), write_to)
This comes up with the error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'folder1/folder1.csv': No such file or directory
So, I am not calling in the correct names of the table, and maybe not directing R to the correct folder (I've set the wd to folder1).
Any suggestions would be greatly appreciated.
Many thanks
There are a few minor issues that stand out. For example, you have a typo in file = past(folder2, file.name, sep= "/") (it should be paste(), not past()). But perhaps a simpler approach would suit, e.g. using vroom:
library(vroom)
files <- fs::dir_ls(glob = "table3_*csv")
data <- vroom(files, id = "ID", col_select = c(ID, X1))
data
vroom_write(data, file = "~/xx/folder2/new_df.csv")
# replace "xx" with your path
Does this approach solve your problem?
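If you'd rather avoid extra packages, the same extraction can be sketched in base R. This assumes the files do have a header row containing an X1 column (the question's header = FALSE and the readr-style X1 name are at odds, so check which applies to your files):

```r
# Read each table3_*.csv from folder1, keep only X1, tag each row with the
# source file name, and combine everything into one data frame
files <- list.files("folder1", pattern = "^table3_.*\\.csv$", full.names = TRUE)

pieces <- lapply(files, function(f) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  data.frame(ID = basename(f), X1 = df$X1)
})

new_df <- do.call(rbind, pieces)
write.csv(new_df, file.path("folder2", "new_df.csv"), row.names = FALSE)
```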

R: Importing Entire Folder of Files

I am using the R programming language (in R Studio). I am trying to import an entire folder of ".txt" files (notepad files) into R and "consistently" name them.
I know how to do this process manually:
#find working directory:
getwd()
[1] "C:/Users/Documents"
#import files manually and name them "consistently":
df_1 <- read.table("3rd_file.txt")
df_2 <- read.table("file_1.txt")
df_3 <- read.table("second_file.txt")
Of course, this will take a long time to do if there are 100 files.
Right now, suppose these files are in a folder : "C:/Users/Documents/files_i_want"
Is there a way to import all these files at once and name them as "df_1", "df_2", "df_3", etc.?
I found another stackoverflow post that talks about a similar problem: How to import folder which contains csv file in R Studio?
setwd("where is your folder")
#
#List file subdirectories
folders<- list.files(path = "C:/Users/Documents/files_i_want")
#
#Get all files...
files <- rep(NA,0)
for(i in c(1:length(folders)))
{
  files.i <- list.files(path = noquote(paste("C:/Users/Documents/files_i_want/",folders[i], "/", sep = "")))
  n <- length(files.i)
  files.i <- paste(folders[i], files.i, sep = "/")
  files <- c(files, files.i)
}
#
#
#Read first data file (& add file name as separate column)
T1 <- read.delim(paste("C:/Users/Documents/files_i_want", files[1], sep = ""), sep = "", header=TRUE)
T1 <- cbind(T1, "FileName" = files[1])
But this produces the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
Is this because there is a problem in the naming convention?
Thanks
You can try the following :
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, read.table)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
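As a sanity check, here is the same pattern run end to end against a throwaway temporary directory (the file names are made up for the demo):

```r
# Create two small .txt files in a temporary folder
dir <- file.path(tempdir(), "files_i_want")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(x = 1:3), file.path(dir, "3rd_file.txt"))
write.table(data.frame(x = 4:6), file.path(dir, "file_1.txt"))

# Read them all, name them df_1, df_2, ..., and push into the global env
filenames <- list.files(dir, full.names = TRUE)
list_data <- lapply(filenames, read.table)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
list2env(list_data, .GlobalEnv)

exists("df_1") && exists("df_2")   # TRUE
```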

error reading and merging a folder of data into one dataframe

If it helps, I'm running R 3.3.1 on a MacBook Pro with OS X El Capitan...
I am trying to read in a folder of similar data files. I've checked the directory and the files are where they should be:
list.files('../data/')
[1] "B101.txt" "B101p2.txt" "B116.txt" "B6.txt" "B65.txt" "B67.txt" "B67p2.txt"
[8] "B70.txt" "B71.txt" "B71p2.txt" "B95.txt" "B95p2.txt" "B96.txt" "B96p2.txt"
[15] "B98.txt" "B98p2.txt" "B99.txt" "B99p2.txt"
The following is my code and error:
a = ldply(
.data = list.files(
path = '../data/'
)
, .fun = function(x){
to_return = read.table(
file = x
, skip = 20
, sep = '\t'
, fill = TRUE
)
return(to_return)
}
, .progress = 'text'
)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'B101.txt': No such file or directory
I do not know what the problem is as all searches for those errors suggest fixing the directory. I have also checked the data files and can read an individual file using:
read.table('../data/B101.txt', skip = 20, sep = '\t', fill=TRUE)
Could someone please help me to fix the problem of reading in the whole folder. I'm trying to sort out the script with a small number of files but will need it to run for a much larger number, so reading them in one by one isn't practical. Thanks.
By default, list.files returns only the filename itself, not including the leading (relative or absolute) path (if any). When dealing with files potentially in another directory, you need to include full.names = TRUE:
a = ldply(
.data = list.files(
path = '../data/',
full.names = TRUE
)
, .fun = function(x){
to_return = read.table(
file = x
, skip = 20
, sep = '\t'
, fill = TRUE
)
return(to_return)
}
, .progress = 'text'
)
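The same fix carries over to base R if you'd rather not depend on plyr: get the full paths once, read each file, and bind the pieces (a sketch using the question's read.table arguments):

```r
# full.names = TRUE returns paths that read.table can open from anywhere
files <- list.files(path = '../data/', full.names = TRUE)
pieces <- lapply(files, read.table, skip = 20, sep = '\t', fill = TRUE)
a <- do.call(rbind, pieces)
```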

How can I use a variable as an argument to a function, specifically unz() in R

I am writing an R function that reads CSV files from a subdirectory in a ZIP file without first unzipping it, using read.csv() and unz().
The CSV files are named with leading 0 as in 00012.csv, 00013.csv etc.
The function has the following parameters: MyZipFile, ASubDir, VNum (a vector e.g. 1:42) which forms the filename.
What I want is to use the variable PathNfilename in unz().
# Incorporate the directory in the ZIP file while constructing the filename using stringr package
PathNfilename <- paste0("/", ASubDir, "/", str_pad(Vnum, 5, pad = "0"), ".csv", sep="")
What works is:
csvdata <- read.csv(unz(description = "MyZipFile.zip", filename = "ASubDirectory/00039.csv"), header=T, quote = "")
What I need is something along these lines:
csvdata <- read.csv(unz(description = "MyZipFile.zip", filename = PathNFileName), header=T, quote = "")
The error that I get is:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file '/ASubDir/00039.csv' in zip file 'MyZipFile.zip'
I'd like to understand why I'm getting the error and how to resolve it. Is it a scoping issue?
Try building PathNfilename without the leading /:
library(stringr)
ASubDir <- "ASubDirectory"
Vnum <- 1:5
PathNfilename <- file.path(ASubDir,
                           paste0(str_pad(Vnum, 5, pad = "0"), ".csv"))
PathNfilename
#> [1] "ASubDirectory/00001.csv" "ASubDirectory/00002.csv"
#> [3] "ASubDirectory/00003.csv" "ASubDirectory/00004.csv"
#> [5] "ASubDirectory/00005.csv"
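If stringr isn't available, sprintf does the zero padding in base R, and once the member paths are right you can read them all in one pass. (MyZipFile.zip and ASubDirectory are the question's placeholder names, so the actual read is left commented out here.)

```r
ASubDir <- "ASubDirectory"
Vnum <- 1:5
# %05d left-pads each number with zeros to width 5
PathNfilename <- file.path(ASubDir, sprintf("%05d.csv", Vnum))
PathNfilename[1]
#> [1] "ASubDirectory/00001.csv"

# Each unz() call opens one member of the zip as a connection,
# which read.csv then consumes:
# csvdata <- lapply(PathNfilename, function(fn)
#   read.csv(unz("MyZipFile.zip", fn), header = TRUE, quote = ""))
```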

Read multiple files and save data into one dataframe in R

I am trying to read multiple files and then combine them into one data frame. The code that I am using is as follows:
library(plyr)
mydata = ldply(list.files(path = "Data load for stations/data/Predicted", pattern = "txt"), function(filename) {
  dum = read.table(filename, skip = 5, header = F, sep = " ")
  # If you also want to add the filename as a column
  dum$filename = filename
  return(dum)
})
The error that I am getting is as follows:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'mobdata201001.txt': No such file or directory
The data files can be found on https://www.dropbox.com/sh/827kmkrwd0irehk/BFbftkks42
Any help is highly appreciated.
Alternatively, you can use the full.names argument of list.files:
list.files(path = "Data load for stations/data/Predicted",
           pattern = "txt", full.names = TRUE)
It will automatically prepend the path to each file name.
Try the following code:
library(plyr)
path <- "Data load for stations/data/Predicted/"
filenames <- paste0(path, list.files(path, pattern = "txt"))
mydata = ldply(filenames, function(filename) {
  dum = read.table(filename, skip = 5, header = F, sep = " ")
  # If you also want to add the filename as a column
  dum$filename = filename
  return(dum)
})
I think what is happening is that list.files is generating the file names relative to the path, and read.table is then being asked to open each file name without the rest of the path, so it looks in the working directory and fails.
