I would like to import several ".txt" files into R and add a new "ID" column containing the file name.
I found something like this that works to import all the txt files, but it doesn't give me my "ID" column.
listfile <- list.files("C:/Users/........", pattern = "txt", full.names = TRUE, recursive = TRUE)
for (i in 1:length(listfile)) {
  if (i == 1) {
    assign(paste0("Data"), read.table(listfile[i], header = TRUE, sep = ",", skipNul = TRUE))
  }
}
rm(list = ls(pattern = "list.+?"))
All the file names have this format: "XXX-M_N6 2021-04-16.txt", with different letters instead of "XXX" and different dates as well.
Any idea how to do that?
Thanks!
library(purrr)
purrr::map_dfr(listfile,
               ~ cbind(read.table(.x, header = TRUE, sep = ",", skipNul = TRUE),
                       ID = .x))
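Note that the ID column will contain the full path here, because listfile was built with full.names = TRUE. If only the file name itself is wanted, basename() can strip the directory part; a small variation on the same idea:
purrr::map_dfr(listfile,
               ~ cbind(read.table(.x, header = TRUE, sep = ",", skipNul = TRUE),
                       ID = basename(.x)))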
Using base R, you can try something like:
path_arqs <- dir()  # assuming your csv files are all in the working directory and share the same columns
df <- do.call(rbind, lapply(path_arqs, read.csv))
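If an ID column is needed here as well, a base R sketch (reusing the listfile vector from the question and assuming all files share the same columns):
df_list <- lapply(listfile, function(f) {
  cbind(read.table(f, header = TRUE, sep = ",", skipNul = TRUE),
        ID = basename(f))  # file name without the directory part
})
Data <- do.call(rbind, df_list)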
ListOfFileNames <- list.files(path = "D:/in/",
                              pattern = '*.txt', recursive = TRUE)
options(stringsAsFactors = F)
setwd("D:/in/")
outFile <- file("output.txt", "w")
for (i in ListOfFileNames){
  x = read.delim(ListOfFileNames[i], skip = 29, nrows = 1)
  x = as.character(x)
  writeLines(x, paste('D:/out/out.csv', sep = ","))
}
These are the txt files that I have.
I would like to extract rows 30 and 63 from each txt file and save them into one txt file. How can I solve this in R? This is the code I tried to extract row 30 and store it in one csv file, but it doesn't work. Could you please help?
Thanks
You can try:
ListOfFileNames <- list.files(path = "D:/in/",
                              pattern = '*.txt', recursive = TRUE, full.names = TRUE)
result <- do.call(rbind, lapply(ListOfFileNames, function(x)
  read.csv(x)[c(30, 63), ]))
write.csv(result, 'D:/out/out.csv', row.names = FALSE)
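If rows 30 and 63 are meant as raw line numbers in the text files (rather than data rows after a header), a sketch based on readLines() avoids any parsing assumptions:
lines_wanted <- lapply(ListOfFileNames, function(x) readLines(x)[c(30, 63)])
writeLines(unlist(lines_wanted), "D:/out/out.txt")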
I'm trying to separate a single column in multiple csv files. I've already done it for one single file with this code:
tempmax <- read.csv(file="path", header=TRUE, sep=";", fill = TRUE)
colnames(tempmax) = c("Fecha", "Hora", "Temperatura max")
rbind(tempmax)
write.csv(tempmax, "path", sep = ";", append = FALSE, row.names = FALSE, col.names = FALSE)
However, I haven't found a way to do it for multiple csv files saved in a folder. I would like to do the same for each of them: read, modify, and write the new file.
I used this to read the multiple files:
getwd <- ("path")
filenames <- list.files("path",
pattern = "*.csv", full.names = TRUE)
But I just can't find the way to edit what I want (I'm pretty new to R).
I appreciate the help. Thanks!
If we have several files, we can use lapply. It is not clear what the transformation should be, so here the file is written back after selecting the first column:
lapply(filenames, function(file){
  tempmax <- read.csv(file = file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  write.csv(tempmax[1], file, row.names = FALSE)
})
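If the original files should not be overwritten, one possible variation writes each result into a separate output folder; the "out" folder below is only an example name:
out_dir <- "out"                       # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)
lapply(filenames, function(file){
  tempmax <- read.csv(file = file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  write.csv(tempmax[1], file.path(out_dir, basename(file)), row.names = FALSE)
})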
I'm fairly new to R, so my apologies if this is a very basic question.
I'm trying to read two Excel files in using the list.files(pattern) method, then using a for loop to bind the files and replace values in the bound file. However, my script produces the output of only one file, meaning that the binding is not happening.
The file names are fact_import_2020 and fact_import_20182019.
FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
FileCount <- length(FileNames)
for(i in 1:FileCount){
  MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
  MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
}
Use full.names = TRUE in list.files(). After this, FileNames will contain the full paths of the files. Then loop through the file names instead of the file count.
I think you are trying to do the following; I am guessing here, please see below.
You are getting data from only one file because each pass of the for() loop overwrites the data read on the previous pass.
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
full.names = TRUE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
# list of data frames, one per Excel file
df_lst <- lapply(FileNames, function(fn){
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})
# combine all the data
MOH_TotalHC <- do.call('rbind', df_lst)
# keep complete cases
MOH_TotalHC[complete.cases(MOH_TotalHC), ]
The potential solution is below. This solution is taken from here and seems like a duplicate question.
Potential solution:
library(readxl)
library(data.table)
#Set your path here
FilePath <- "//srdceld2/project2/"
#Update the pattern to suit your needs. Currently, its just set for XLSX files
file.list <- list.files(path = FilePath, pattern = "*.xlsx", full.names = T)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("a:i"))
# name the list elements after the files (attr<-, names<-, and data.table::setattr are equivalent here; one is enough)
names(df.list) <- file.list
#final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)
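Since the list elements were just named after the files, rbindlist() can also record which file each row came from through its idcol argument; the column name "source_file" is only an example:
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE, idcol = "source_file")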
Assumptions and call outs:
The files in the folder are of a similar file type, for example xlsx.
The files could have different sets of columns, and NULLs as well.
Note that the order of the columns matters, so if there are more columns in a new file the number of output columns could be different.
Note: like @Sathish, I am guessing what the input could look like.
Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
When I run my code, I don't get any errors, but the code isn't doing anything. I originally thought this was a problem with file permissions, but I'm still having the problem after changing them. Not sure what to try next.
Here's the code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS data set and save the reduced copy
del.items <- function(file)
{
  data <- fread(input = paste0("../data/pecfas|cafas/raw",
    str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/raw/",
    file), sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
    skip = 0, colClasses = "character", data.table = FALSE)
  data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(T(Initial|[0-9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$", x = names(data))]
  write.csv(data, file = paste0("../data/pecfas|cafas/items-del",
    str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/items-del/",
    sub(pattern = "ExportData_", x = file, replacement = "")) %>% tolower,
    row.names = FALSE)
}
# delete items from all cafas data sets
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("../data/pecfas/raw/", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
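Since the loops finish without errors, a first thing worth checking is whether list.files() finds any CSVs at all, and whether the relative "../data/..." paths resolve from the current working directory. A minimal sketch of such a check, assuming the folder layout used in the code above:
getwd()                                   # the "../data/..." paths are relative to this
file.exists("../data/cafas/raw/")         # does the raw cafas folder resolve?
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
length(cafas.files)                       # 0 would explain why nothing seems to happen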
I would like to modify the piece of code below, which reads several .csv (comma-separated values) files, so that it knows the files are tab-delimited, i.e., .tsv files.
temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)
For individual files, I did (using the readr package):
data_1 <- readr::read_delim("dataset_1.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
Any help? Thanks,
Ricardo.
I guess what you are looking for is the following:
Version 1: User defined function
my_read_delim <- function(path){
  readr::read_delim(path, "\t", escape_double = FALSE, trim_ws = TRUE)
}
lapply(temp, my_read_delim)
Version 2: Using the ... argument of lapply
lapply() has ... as its third argument, which means that any arguments after the second are passed on to the function supplied as the second argument:
lapply(temp, readr::read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Version 2 is essentially the same as Version 1, just more compact.
Assuming all files do have the same columns:
In most applications, after reading the data in via read_delim you will want to rbind the results. You can use map_df from the purrr package to streamline this as follows:
require(purrr)
require(readr)
# or require(tidyverse)
temp <- list.files(pattern="*.csv")
map_df(temp, read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE)
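If a column recording the source file is wanted here too, naming the vector of paths first lets map_df() add it through its .id argument; the column name "file" is only an example:
names(temp) <- basename(temp)
map_df(temp, read_delim, delim = "\t", escape_double = FALSE, trim_ws = TRUE, .id = "file")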