R load csv files from folder

I am loading a bunch of csv files simultaneously from a local directory using
the following code:
myfiles = do.call(rbind, lapply(files, function(x) read.table(x, stringsAsFactors = FALSE, header = F, fill = T, sep=",", quote=NULL)))
and getting an error message:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
I suspect that quotes are the cause: when I inspect the number of columns in each of the 4 files, I see that file number 3 contains 10 columns (incorrect) while the rest contain only 9 (correct). Looking into the corrupted file, the extra column is definitely caused by a quote character that splits a field.
Any help appreciated

Found the answer: the quote parameter should be set to quote = "\""
myfiles = do.call(rbind, lapply(files, function(x) read.table(x, stringsAsFactors = FALSE, header = F, fill = T, sep=",", quote = "\"")))
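For reference, a minimal sketch of how the files vector can be built, assuming the CSVs sit in a local folder (the "data" path is only a placeholder):
# Hypothetical setup: list every .csv file in a local folder, then bind them
# with the corrected quote argument.
files <- list.files(path = "data", pattern = "\\.csv$", full.names = TRUE)
myfiles <- do.call(rbind, lapply(files, function(x)
  read.table(x, stringsAsFactors = FALSE, header = FALSE, fill = TRUE,
             sep = ",", quote = "\"")))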

Related

R: Trying to read several .txt files from a directory into a nested list

Sorry in advance, but I don't think I can make this entirely reproducible as it involves reading in txt files; you can test it out quite easily with a folder of a few tab-delimited txt files containing some random numbers.
I have a folder with several txt files inside; I would like to read each of them into a nested list. Currently I can read one txt file at a time with this code:
user_input <- readline(prompt="paste the path for the folder here: ")
files <- list.files(path = user_input, pattern = NULL, all.files = FALSE, full.names = TRUE)
thefiles <- data.frame(files)
thefiles
Sfiles <- split(thefiles, thefiles$files)
Sfiles
input1 <- print(Sfiles[1])
But I want to read all of the files in the given directory.
I suppose it would then be a list of dataframes?
Here are some of the things I've tried:
- I guessed this would just paste all of the files in the directory, but that's not entirely what I want to do.
{paste(thefiles,"/",files[[i]],".txt",sep="")
}
- This was meant to use lapply to execute read.delim on all of the files in the folder.
The error it gives is:
Error in file(file, "rt") : invalid 'description' argument
files_test <- list.files(path=user_input, pattern="*.txt", full.names=TRUE, recursive=FALSE)
lapply(thefiles, transform, files = read.delim(files, header = TRUE, sep = "\t", dec = "."))
- I tried it on its own as well; it also doesn't work:
read.delim(files_test, header = TRUE, sep = "\t", dec = ".")
- I tried a for loop too:
test2 <- for (i in 1:length(Sepfiles)) {read.delim(files_test, header = TRUE, sep = "\t", dec = ".")}
Is there anything obvious that I'm doing wrong? Any pointers would be appreciated
Thanks
This should work if the read.delim part is correct:
thefiles <- list.files(path = user_input, pattern = ".txt$", ignore.case = TRUE, full.names = TRUE, recursive = FALSE)
lapply(thefiles, function(f) read.delim(f, header = TRUE, sep = "\t", dec = "."))
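If you also want each element of the resulting list named after its source file, a minimal sketch (assuming the same thefiles vector):
dfs <- lapply(thefiles, function(f) read.delim(f, header = TRUE, sep = "\t", dec = "."))
names(dfs) <- basename(thefiles)  # name each data frame after its file
dfs[[1]]                          # first data frame, or dfs[["<file name>"]]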

Warning message in R when using colClasses when reading csv files

I am using lapply to read a list of files. The files have multiple rows and columns, and I am interested in the first row of the first column. The code I am using is:
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
The first row has three columns but I am only reading the first one. From other posts on Stack Overflow I found that the way to do this is to use colClasses = c('character', 'NULL', 'NULL'). While this approach works, I would like to know the underlying issue that causes the following warning message to be generated, and hopefully prevent it from popping up:
"In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3"
It's to let you know that you're keeping only one column of the data out of three, because it doesn't know how to handle a colClasses of "NULL". Note that your NULL is in quotation marks.
An example:
write.csv(data.frame(fi = letters[1:3],
                     fy = rnorm(3, 500, 1),
                     fo = rnorm(3, 50, 2)),
          file = "a.csv", row.names = FALSE)
write.csv(data.frame(fib = letters[2:4],
                     fyb = rnorm(3, 5, 1),
                     fob = rnorm(3, 50, 2)),
          file = "b.csv", row.names = FALSE)
file_list=list("a.csv","b.csv")
lapply(file_list, read.csv,sep=',', header = F, col.names=F, nrow=1, colClasses = c('character', 'NULL', 'NULL'))
Which results in:
[[1]]
FALSE.
1 fi
[[2]]
FALSE.
1 fib
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 3
Which is the same as if you used:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c('character', 'asdasd', 'asdasd'))
But the warning goes away (and you get the rest of the row as a result) if you do:
lapply(file_list, read.csv,sep=',', header = F, col.names=F,
nrow=1, colClasses = c( 'character',NULL, NULL))
You can see where a function's errors and warnings come from by entering its name without anything following it (for example, read.table) to print its source code, then searching for your particular warning within it.
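As a side note, a minimal alternative sketch that sidesteps the warning (assuming the same file_list): read only the first row of each file as character and keep the first field, so no columns need to be dropped via colClasses.
# Read the first row of each file, all columns as character, and keep column 1.
lapply(file_list, function(f)
  read.csv(f, header = FALSE, nrows = 1, colClasses = "character")[1, 1])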

How to get the name back and write a csv after using llply in a list of dataframes in R

I need to add 2 columns to a list of csv files and then write the CSVs again into a folder, so what I did is use llply.
data_files <- list.files(pattern= ".csv$", recursive = T, full.names = F)
x <- llply(data_files, read.csv, header = T)
y <- llply(x, within, Cf <- var1 * 8)
z <- llply(y, within, Pc <- Cf + 1)
When I tried to write the files again using write.table in a loop:
lapply(z, FUN = function(eachPath) {
b <- read.csv(eachPath, header = F)
write.table(b, file = eachPath, row.names = F, col.names = T, quote = F)
})
I get this error and I think it is because z is a list of lists.
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
What I think needs to be done is to convert z into a list of data frames. I would like advice on how to do that, plus how to add a command to extract the name of each file from a column containing the sample ID.
Thanks
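A minimal sketch of one way around the write step, assuming z is the list of modified data frames and data_files still holds the matching file paths in the same order:
# Pair each modified data frame with its original path and write it back out;
# sep = "," keeps the output comma-separated like the input files.
Map(function(df, path) {
  write.table(df, file = path, row.names = FALSE, col.names = TRUE,
              quote = FALSE, sep = ",")
}, z, data_files)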

Read Several Files in R - TAB Delimited Files

I would like to modify the piece of code below, which reads several .csv (comma-separated values) files, to tell it that the files are tab-delimited, i.e., .tsv files.
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
For individual files, I did (using the readr package):
data_1 <- readr::read_delim("dataset_1.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
Any help? Thanks,
Ricardo.
I guess what you are looking for is the following:
Version 1: User defined function
my_read_delim <- function(path) {
  readr::read_delim(path, "\t", escape_double = FALSE, trim_ws = TRUE)
}
lapply(temp, my_read_delim)
Version 2: Using the ... argument of lapply
lapply has ... as its third argument, which means any arguments after the second are passed on to the function given as the second argument:
lapply(temp, readr::read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Version two is essentially the same as version one, but more compact.
Assuming all files do have the same columns:
In most applications, after reading the data in via read_delim you will want to rbind the results. You can use map_df from the purrr package to streamline this as follows:
require(purrr)
require(readr)
# or require(tidyverse)
temp <- list.files(pattern="*.csv")
map_df(temp, read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
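If you prefer to skip the delim argument entirely, readr also ships read_tsv, which reads tab-delimited files by default; a minimal sketch, assuming the same files (they can keep their .csv names):
temp <- list.files(pattern = "*.csv")      # files are named .csv but tab-delimited
myfiles <- lapply(temp, readr::read_tsv)   # read_tsv expects tab-separated content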

Constantly amending dataframe

I have a dataframe that is generated from a folder into which users place several .csv files. The .csv files will always have the same column structure, but they vary in row length. The idea is to make a single dataframe from all of the .csv files. When I use the code below with multiple .csv files I receive the following error message: "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 88, 259"
temp <- list.files(pattern="*.csv", path = dir, full.names = TRUE)
importDM<-lapply(temp, read.csv, header = TRUE)
rawDM <- as.data.frame(importDM)
rawDM$Created.Date <- as.Date(rawDM$Created.Date...Time, format="%d/%m/%Y")
rawDM$Week <- strftime(rawDM$Created.Date,format="%W")
Something that will also be an issue down the road is that I want only the first .csv file's header to be used, as I believe the code as it stands will just lapply the header into the dataframe with each .csv file added.
Cheers,
Found an answer on a blog elsewhere; here is the final code:
temp <- list.files(pattern="*.csv", path = dir, full.names = TRUE)
importDM<-do.call("rbind", lapply(temp, read.csv, header = TRUE))
rawDM <- as.data.frame(importDM)
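For reference, a minimal sketch of the same row-bind with dplyr::bind_rows (assuming dplyr is acceptable here), which matches columns by name and returns a data frame directly:
library(dplyr)
rawDM <- bind_rows(lapply(temp, read.csv, header = TRUE))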
