I have more than 70 CSV files and I am trying to merge them row-wise (they all have the same columns). I tried to combine them using this code:
library(tidyverse)
library(plyr)
library(readr)
setwd("*\\data")
myfolder <- "test"
allfiles <- list.files(path = myfolder, pattern = "*.csv", full.names = TRUE)
allfiles
combined_csv <- ldply(allfiles, read.csv)
Once I run this code I get a warning message:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
It looks like I am losing some rows. How can I fix this?
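Before merging, one way to locate the offending file(s) is to read each one individually and capture warnings; a minimal sketch using the allfiles vector from the question:
# Report which files warn (e.g. "EOF within quoted string"
# usually points at an unmatched quote in that file).
for (f in allfiles) {
  res <- tryCatch(read.csv(f), warning = function(w) w)
  if (inherits(res, "warning")) message(f, ": ", conditionMessage(res))
}
# If a file has stray quotes, re-reading it with quote = ""
# often avoids silently merged or dropped rows.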
It is possible that the same columns in different files are read as different types when some files contain 'character' elements and others are purely numeric. One method is to read every column as "character", rbind the elements, and then use type.convert to convert the column classes automatically based on the values they hold:
library(data.table)
out <- rbindlist(lapply(list.files(path = myfolder, full.names = TRUE),
                        fread, colClasses = "character"))
out <- type.convert(out, as.is = TRUE)
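For reference, type.convert works column by column; a tiny illustration with made-up values:
type.convert(data.frame(a = c("1", "2"), b = c("x", "3")), as.is = TRUE)
#   a b
# 1 1 x
# 2 2 3
# column a is converted to integer; b stays character because of "x"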
Try this:
library(dplyr)
library(readr)
myfolder <- "test"
df <- list.files(path = myfolder, full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
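As a side note (not part of the original answer), if you also want to record which file each row came from, naming the list lets bind_rows add an id column; a sketch using the same myfolder:
files <- list.files(path = myfolder, full.names = TRUE)
df <- files %>%
  setNames(basename(files)) %>%   # names become the id values
  lapply(read_csv) %>%
  bind_rows(.id = "source_file")  # adds a source_file column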
The goal is to combine multiple single-column .txt files from different subfolders, cbind them into one dataframe (each file becomes one column), and keep the file names as column names. An example of the .txt file:
0.348107
0.413864
0.285974
0.130399
...
My code:
# list all the files in the folder
# (recursive = TRUE includes subdirectories; use FALSE otherwise)
listfile <- list.files(path = "",
                       pattern = "txt", full.names = TRUE, recursive = TRUE)

# extract the files with folder name aINS
listfile_aINS <- listfile[grep("aINS", listfile)]

# inspect file names
head(listfile_aINS)

# combine all the text files in listfile_aINS and store them in dataframe 'Data'
for (i in 1:length(listfile_aINS)) {
  if (i == 1) {
    assign("Data", read.table(listfile_aINS[i], header = FALSE, sep = ","))
  }
  if (i != 1) {
    assign(paste0("Test", i), read.table(listfile_aINS[i], header = FALSE, sep = ","))
    Data <- cbind(Data, get(paste0("Test", i)))  # cbind combines by column; rbind by row
    rm(list = ls(pattern = "Test"))
  }
}
rm(list = ls(pattern = "list.+?"))
I ran into two problems:
1. R returns this error because the .txt files have different numbers of rows:
"Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 37, 36"
I have too many files, so I hope to work around the error without padding the files to the same length.
2. My code won't keep the file names as the column names.
It will be easier to write a function and then rbind() the data from each file. The resulting data frame will have a file column with the filename from the listfile_aINS vector.
read_file <- function(filename) {
  dat <- read.table(filename, header = FALSE, sep = ",")
  dat$file <- filename
  return(dat)
}
all_dat <- do.call(rbind, lapply(listfile_aINS, read_file))
If they don't all have the same number of rows it might not make sense to have each column be a file, but if you really want that you could make it into a wide dataset with NA filling out the empty rows:
library(dplyr)
library(tidyr)
all_dat %>%
  group_by(file) %>%
  mutate(n = 1:n()) %>%
  pivot_wider(names_from = file, values_from = V1)
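If only the bare file names should appear as column names (problem 2 above), stripping the directories first is a small tweak on the same idea, not from the original answer:
all_dat %>%
  mutate(file = basename(file)) %>%  # drop directory part of the path
  group_by(file) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = file, values_from = V1)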
I am combining a number of files that are essentially .txt files, though called .sta.
I've used the following code to combine them, after having trouble with base R apply-family approaches:
library(plyr)
myfiles <- list.files(path="LDI files", pattern ="*.sta", full.names = TRUE)
dat_tab <- ldply(myfiles, read.table, header= TRUE, sep = "\t", skip = 5)
I want to add a column containing values taken from the file names. File name examples are "GFREX28-00-1" and "GFREX1534-00-1". I want to keep the digits immediately after GFREX, before the first dash.
I'm not sure I understood your question correctly, so here is a tentative answer. The idea is to assign a new column to the data.frame before returning it.
filepaths <- list.files(path = "LDI files", pattern = "*.sta",
                        full.names = TRUE)
filesnames <- list.files(path = "LDI files", pattern = "*.sta",
                         full.names = FALSE)
dat_tab <- lapply(seq_along(filepaths), function(i) {
  df <- read.table(filepaths[i], header = TRUE, sep = "\t", skip = 5)
  df$fn <- gsub("GFREX", "", filesnames[i])
  df
})
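If the goal is specifically the digits between "GFREX" and the first dash, a capture group does that; a sketch with the example names from the question:
fn <- c("GFREX28-00-1", "GFREX1534-00-1")
sub("^GFREX(\\d+)-.*$", "\\1", fn)
# [1] "28"   "1534"
# inside the lapply above this would be:
# df$fn <- sub("^GFREX(\\d+)-.*$", "\\1", filesnames[i])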
I am trying to import multiple CSVs from a folder at once, but the CSVs do not have column names. The following code works, but the first row is converted into column names:
dat <- list.files(pattern="*.csv") %>% lapply(read.csv)
When I try to use the code below:
dat <- list.files(pattern="*.csv") %>% lapply(read.csv(header = FALSE))
I get the following error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : argument "file" is missing, with no default
Any idea how I can avoid this?
The issue comes from incorrectly specifying the additional parameters to FUN. From ?lapply:
lapply(X, FUN, ...)
... optional arguments to FUN.
You need to make a tiny change to your code to get it to work:
dat <- list.files(pattern="*.csv") %>% lapply(read.csv, header=FALSE)
If you're in the tidyverse you might want:
list.files(pattern = ".csv") %>%
  purrr::map(readr::read_csv, col_names = FALSE)
(watch out for differences in default behaviour between read.csv and readr::read_csv)
I have a file in which every row is a string of numbers. Example of a row: 0234
Example of this file:
00020
04921
04622
...
When I use read.table it deletes the leading zeros in each row (00020 becomes 20, 04921 becomes 4921, ...). I use:
example <- read.table(fileName, sep = "\t", check.names = FALSE)
After this, to obtain a vector, I use as.vector(unlist(example)).
I have tried different options of read.table but the problem remains.
By default, read.table inspects the column values and sets the column types accordingly. If we want a specific type, we can specify it with colClasses:
example <- read.table(fileName, sep = "\t", check.names = FALSE,
                      colClasses = "character", stringsAsFactors = FALSE)
When colClasses is not specified, the function uses type.convert to assign the column types automatically based on the values; from the source of read.table:
read.table  # function source (excerpt)
...
data[[i]] <- if (is.na(colClasses[i]))
    type.convert(data[[i]], as.is = as.is[i], dec = dec,
        numerals = numerals, na.strings = character(0L))
...
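That automatic conversion is exactly what strips the leading zeros; a minimal illustration:
type.convert(c("00020", "04921"), as.is = TRUE)
# [1]   20 4921           -- converted to integer, leading zeros gone
type.convert(c("00020", "0492x"), as.is = TRUE)
# [1] "00020" "0492x"     -- one non-numeric value keeps the column character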
If I understand the issue correctly, you read in your data file with read.table but since you want a vector, not a data frame, you then unlist the df. And you want to keep the leading zeros.
There is a simpler way of doing the same: use scan.
example <- scan(file = fileName, what = character(), sep = "\t")
I would like to modify the piece of code below, which reads several .csv (comma-separated values) files, to tell it that the files are tab-delimited, i.e., .tsv files.
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
For individual files, I did (using the readr package):
data_1 <- readr::read_delim("dataset_1.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
Any help is appreciated.
I guess what you are looking for is the following:
Version 1: User-defined function
my_read_delim <- function(path){
readr::read_delim(path, "\t", escape_double = FALSE, trim_ws = TRUE)
}
lapply(temp, my_read_delim)
Version 2: Using the ... argument of lapply
lapply has ... as its third argument, which means that arguments after the second are passed on to the function specified as the second argument:
lapply(temp, readr::read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Version two is essentially the same as version one, just more compact.
Assuming all files have the same columns: in most applications, after reading the data in via read_delim you will want to rbind the results. You can use map_df from the purrr package to streamline this as follows:
require(purrr)
require(readr)
# or require(tidyverse)
temp <- list.files(pattern="*.csv")
map_df(temp, read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
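As a usage note: in purrr >= 1.0.0, map_df is superseded; the equivalent with map() plus list_rbind() looks like this:
library(purrr)
library(readr)
temp <- list.files(pattern = "*.csv")
temp |>
  map(read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE) |>
  list_rbind()  # row-binds the list of data frames into one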