I would like to modify the piece of code below, which reads several .csv (comma-separated values) files, to tell it that the files are actually tab-delimited, i.e., .tsv files.
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
For individual files, I did (using the readr package):
data_1 <- readr::read_delim("dataset_1.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
Any help? Thanks,
Ricardo.
I guess what you are looking for is the following:
Version 1: User-defined function
my_read_delim <- function(path) {
  readr::read_delim(path, "\t", escape_double = FALSE, trim_ws = TRUE)
}
lapply(temp, my_read_delim)
Version 2: Using the ... argument of lapply
lapply's third argument is ..., which means that any arguments after the second are passed on to the function given as the second argument:
lapply(temp, readr::read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Version 2 is essentially the same as version 1, just more compact.
Assuming all the files have the same columns:
In most applications, after reading the data in via read_delim you will want to rbind the results. You can use map_df from the purrr package to streamline this as follows:
require(purrr)
require(readr)
# or require(tidyverse)
temp <- list.files(pattern="*.csv")
map_df(temp, read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
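As a side note, if you also want a column recording which file each row came from, map_df can add one when its input is named (this relies on dplyr being installed; the column name "source" is just an illustrative choice):
library(purrr)
library(readr)

temp <- list.files(pattern = "*.csv")

# set_names(temp) names each path after itself; .id turns those names
# into a "source" column in the combined data frame
map_df(set_names(temp), read_delim, delim = "\t",
       escape_double = FALSE, trim_ws = TRUE, .id = "source")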
Related
I would like to import several ".txt" files into R and add a new "ID" column containing the file name.
I found the code below, which works to import all the txt files, but I don't get my "ID" column.
listfile <- list.files("C:/Users/........", pattern = "txt", full.names = TRUE, recursive = TRUE)
for (i in 1:length(listfile)) {
  if (i == 1) {
    assign(paste0("Data"), read.table(listfile[i], header = TRUE, sep = ",", skipNul = TRUE))
  }
}
rm(list = ls(pattern = "list.+?"))
All the file names have this format: "XXX-M_N6 2021-04-16.txt", with different letters instead of "XXX" and other dates as well.
Any idea how to do that?
Thanks!
library(purrr)
purrr::map_dfr(listfile,
               ~ {cbind(read.table(.x, header = TRUE, sep = ",", skipNul = TRUE),
                        ID = .x)})
Using base R, you can try something like:
path_arqs <- dir()  # assuming your csv files are in the working directory and all have the same structure
df <- do.call(rbind, lapply(path_arqs, read.csv))
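If you also want the "ID" column asked for in the question, here is a base R sketch (tools::file_path_sans_ext() strips the extension; the column name "ID" comes from the question):
df_lst <- lapply(listfile, function(f) {
  d <- read.table(f, header = TRUE, sep = ",", skipNul = TRUE)
  # tag each data frame with its own file name, without path or extension
  d$ID <- tools::file_path_sans_ext(basename(f))
  d
})
Data <- do.call(rbind, df_lst)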
I'm trying to separate out a single column in multiple csv files. I've already done it for one single file with this code:
tempmax <- read.csv(file="path", header=TRUE, sep=";", fill = TRUE)
colnames(tempmax) = c("Fecha", "Hora", "Temperatura max")
rbind(tempmax)
write.csv(tempmax, "path", sep = ";", append = FALSE, row.names = FALSE, col.names = FALSE)
However, I haven't found a way to do it for multiple csv files saved in a folder. I would like to do the same for each: read, modify and write a new one.
I used this to read the multiple files:
getwd <- ("path")
filenames <- list.files("path",
pattern = "*.csv", full.names = TRUE)
But I just can't find the way to edit what I want (I'm pretty new to R).
I appreciate the help. Thanks!
If we have several files, we can use lapply. It is not clear what transformation is needed, so the file is written back after selecting the first column:
lapply(filenames, function(file) {
  tempmax <- read.csv(file = file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  # write.csv() ignores sep/col.names, so use write.table() for a ";"-delimited file without a header
  write.table(tempmax[1], file, sep = ";", append = FALSE,
              row.names = FALSE, col.names = FALSE)
})
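If the intention is to keep the originals and write each result to a new file instead of overwriting, a small variation of the same loop (the "_new" suffix is just an illustrative choice):
lapply(filenames, function(file) {
  tempmax <- read.csv(file = file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  # hypothetical naming scheme: "data.csv" -> "data_new.csv"
  out <- sub("\\.csv$", "_new.csv", file)
  write.table(tempmax[1], out, sep = ";", row.names = FALSE, col.names = FALSE)
})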
I am combining a number of files that are essentially .txt files, though called .sta.
I've used the following code to combine them, after having trouble with base R apply/lapply and dplyr:
library(plyr)
myfiles <- list.files(path="LDI files", pattern ="*.sta", full.names = TRUE)
dat_tab <- ldply(myfiles, read.table, header= TRUE, sep = "\t", skip = 5)
I want to add a column whose values are taken from the file names. Example file names are "GFREX28-00-1" and "GFREX1534-00-1". I want to keep the digits immediately after GFREX, before the first dash (-).
I'm not sure if I understood your question correctly, so I'll provide a tentative answer. The idea is to assign a new column to the data.frame before returning it.
filepaths <- list.files(path="LDI files", pattern ="*.sta",
full.names = TRUE)
filesnames <- list.files(path="LDI files", pattern ="*.sta",
full.names = FALSE)
dat_tab <- lapply(1:length(filepaths), function(i) {
df <- read.table(filepaths[i] header= TRUE, sep = "\t", skip = 5)
df$fn <- gsub("GFREX","",filesnames[i])
df
})
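If the goal is to keep only the digits immediately after GFREX and before the first dash (rather than everything left after removing "GFREX"), a capturing regex can replace the gsub() call; a quick check on the example names from the question:
# inside the lapply() above, replace the gsub() line with:
# df$fn <- sub("^GFREX(\\d+)-.*$", "\\1", filesnames[i])
sub("^GFREX(\\d+)-.*$", "\\1", c("GFREX28-00-1", "GFREX1534-00-1"))
# [1] "28"   "1534"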
I'm fairly new to R, so my apologies if this is a very basic question.
I'm trying to read two Excel files in using the list.files(pattern) method, then using a for loop to bind the files and replace values in the bound file. However, my script is producing output from only one file, which means the binding is not happening.
The file names are fact_import_2020 and fact_import_20182019.
FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
FileCount <- length(FileNames)
for(i in 1:FileCount){
MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
Use full.names = TRUE in list.files(); FileNames will then hold the full paths of the files.
Then loop over the file names instead of the file count.
I think this is what you are trying to do. I am guessing here; please see below.
You are getting data from only one file because each pass of the for() loop overwrites MOH_TotalHC with the data read in that iteration, so only the last file survives.
library(readxl)

FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = TRUE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

# list of data frames, one per Excel file
df_lst <- lapply(FileNames, function(fn) {
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})

# combine the data from all files
MOH_TotalHC <- do.call('rbind', df_lst)

# keep complete cases only
MOH_TotalHC[complete.cases(MOH_TotalHC), ]
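If you would rather keep the for loop from your question, here is a minimal sketch (assuming readxl is loaded and FileNames holds full paths) that collects each file into a list instead of overwriting a single object:
library(readxl)

df_lst <- vector("list", length(FileNames))
for (i in seq_along(FileNames)) {
  # read each file once and keep it in its own list slot
  df_lst[[i]] <- read_excel(FileNames[i], sheet = 1, range = cell_cols("A:I"))
}
MOH_TotalHC <- do.call(rbind, df_lst)
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]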
A potential solution is below. It is taken from here, and this seems like a duplicate question.
Potential solution:
library(readxl)
library(data.table)

# Set your path here
FilePath <- "//srdceld2/project2/"

# Update the pattern to suit your needs. Currently it's set for xlsx files only.
file.list <- list.files(path = FilePath, pattern = "*.xlsx", full.names = TRUE)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("a:i"))

# name the list elements after the files (names<- is enough; attr()/setattr() would do the same)
names(df.list) <- file.list

# final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)
Assumptions and call-outs:
The files in the folder are of similar file types, for example xlsx.
The files could have different sets of columns, and NULLs as well; rbindlist() with fill = TRUE copes with that (see the small illustration below).
Note that column order matters, so if a new file brings extra columns the number of output columns can change.
Note: like #Sathish, I am guessing what the input could look like.
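A small illustration of the use.names/fill behaviour with toy data (not from the question):
library(data.table)

a <- data.table(x = 1:2, y = c("a", "b"))
b <- data.table(x = 3, z = TRUE)

# rows coming from b get NA in column y, rows from a get NA in column z
rbindlist(list(a, b), use.names = TRUE, fill = TRUE)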
Sorry in advance, but I don't think I can make this entirely reproducible as it involves reading in txt files; you can test it out quite easily with a folder of a few tab-delimited txt files containing some random numbers.
I have a folder with several txt files inside, and I would like to read each of them into a nested list. Currently I can read one txt file at a time with this code:
user_input <- readline(prompt="paste the path for the folder here: ")
files <- list.files(path = user_input, pattern = NULL, all.files = FALSE, full.names = TRUE)
thefiles <- data.frame(files)
thefiles
Sfiles <- split(thefiles, thefiles$files)
Sfiles
input1 <- print(Sfiles[1])
But I want to read all of the files in the given directory.
I suppose it would then be a list of dataframes?
Here are some of the things I've tried:
- I guessed this would just paste all of the files in the directory, but that's not entirely what I want to do.
{paste(thefiles,"/",files[[i]],".txt",sep="")
}
- This was meant to use lapply to run read.delim on all of the files in the folder.
The error it gives is:
Error in file(file, "rt") : invalid 'description' argument
files_test <- list.files(path=user_input, pattern="*.txt", full.names=TRUE, recursive=FALSE)
lapply(thefiles, transform, files = read.delim(files, header = TRUE, sep = "\t", dec = "."))
- I tried it on its own as well; it also doesn't work:
read.delim(files_test, header = TRUE, sep = "\t", dec = ".")
-I tried a for loop too:
test2 <- for (i in 1:length(Sepfiles){read.delim(files_test, header = TRUE, sep = "\t", dec = "."})
Is there anything obvious that I'm doing wrong? Any pointers would be appreciated
Thanks
This should work if the read.delim part is correct:
thefiles <- list.files(path = user_input, pattern = "\\.txt$", ignore.case = TRUE,
                       full.names = TRUE, recursive = FALSE)
lapply(thefiles, function(f) read.delim(f, header = TRUE, sep = "\t", dec = "."))
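As a follow-up, naming the list elements by file makes individual data frames easy to pull out later (myfiles is just a hypothetical name for whatever you assign the result to):
# assign the lapply() result, then label each element with its file name,
# e.g. myfiles[["somefile.txt"]]
myfiles <- lapply(thefiles, function(f) read.delim(f, header = TRUE, sep = "\t", dec = "."))
names(myfiles) <- basename(thefiles)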