I'm fairly new to R, so my apologies if this is a very basic question.
I'm trying to read two Excel files in, using the list.files(pattern) method, then using a for loop to bind the files and replace values in the bound file. However, the output that my script is producing is the output from only one file, meaning that it is not binding.
The file names are fact_import_2020 and fact_import_20182019.
FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
FileCount <- length(FileNames)
for(i in 1:FileCount){
MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
}
Use full.names = TRUE in list.files() so that FileNames holds the full path of each file.
Then loop over the file names themselves rather than over a file count.
I am guessing at what you are trying to do; please see below.
You are getting data from only one file because each pass of the for() loop reads the same file (FileNames[i]) twice and then overwrites MOH_TotalHC, so only the last file read survives.
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
full.names = TRUE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
# list of data frames, one per Excel file
df_lst <- lapply(FileNames, function(fn){
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})
# combine the data frames into one
MOH_TotalHC <- do.call('rbind', df_lst)
# keep complete cases only
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
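If you would rather keep a for loop, here is a minimal sketch of the same idea: store each file's data in its own list slot and bind once after the loop. To keep it runnable without your files, it writes two small CSV files to a temporary folder as stand-ins for the Excel files (the folder, the id/value columns and the use of read.csv() are assumptions for illustration; with real data you would call read_excel() as above).

```r
# Stand-in data: two small CSV files in a temporary folder (hypothetical contents)
demo_dir <- tempfile("fact_import_demo_")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, value = c(10, NA)),
          file.path(demo_dir, "fact_import_2020.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(demo_dir, "fact_import_20182019.csv"), row.names = FALSE)

# full.names = TRUE returns paths that can be passed straight to the reader
FileNames <- list.files(demo_dir, pattern = "^fact_import_20", full.names = TRUE)

# One list slot per file, so nothing is overwritten between iterations
pieces <- vector("list", length(FileNames))
for (i in seq_along(FileNames)) {
  pieces[[i]] <- read.csv(FileNames[i])
}

# Bind once, after the loop, then drop incomplete rows
MOH_TotalHC <- do.call(rbind, pieces)
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
```

Binding once at the end also avoids repeatedly copying a growing data frame inside the loop.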
A potential solution is below. It is taken from here, and this seems to be a duplicate question.
library(readxl)
library(data.table)
#Set your path here
FilePath <- "//srdceld2/project2/"
#Update the pattern to suit your needs. Currently it is set for .xlsx files only
file.list <- list.files(path = FilePath, pattern = "\\.xlsx$", full.names = TRUE)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("a:i"))
#Name each list element after its file; attr(df.list, "names") <- file.list and
#setattr(df.list, "names", file.list) are equivalent ways to do this
names(df.list) <- file.list
#final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)
Assumptions and callouts:
The files in the folder are all of the same type, for example xlsx.
The files could have different sets of columns, and missing values as well.
Note that with use.names = TRUE columns are matched by name, and with fill = TRUE any column missing from a file is filled with NA, so if a new file has extra columns the number of output columns will grow.
Note: Like @Sathish, I am guessing at what the input looks like.
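To illustrate how rbindlist() treats files with different columns, here is a small sketch with two in-memory data frames standing in for the sheets (the column names and the "a.xlsx"/"b.xlsx" labels are made up for illustration). It also uses idcol, which turns the list names into a column so each row records its source file.

```r
library(data.table)

# Two stand-in data frames: different column order, and one extra column
df.list <- list(
  "a.xlsx" = data.frame(id = 1:2, x = c("p", "q")),
  "b.xlsx" = data.frame(x = "r", id = 3L, extra = TRUE)
)

# use.names = TRUE matches columns by name; fill = TRUE pads missing ones with NA;
# idcol = "source" stores the list names (here, the file labels) in a new column
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE, idcol = "source")
print(dfFinal)
```

The rows from "a.xlsx" get NA in the extra column, and the id and x values line up even though the second data frame lists them in a different order.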
Related
Consider a folder 'C:/ZFILE' that contains many zip files.
Now, consider that each of these zip files contains many CSV files, one of which is named 'NAME.CSV'; all these scattered 'NAME.CSV' files are identically named and structured (i.e., same columns).
How can I rbind all these scattered CSV files?
The script below does that, but a function would be more appropriate.
How can I do this?
Thanks
zfile <- "C:/ZFILE"
zlist <- list.files(path = zfile, pattern = "\\.zip$", recursive = FALSE, full.names = TRUE)
zlist # list all zip files in the zfile folder
zunzip <- lapply(zlist, unzip, exdir = zfile) # unzip every archive into the zfile folder (may take time depending on the number of archives)
library(data.table) # rbindlist & fread
csv_name <- "NAME.CSV"
csv_list <- list.files(path = zfile, pattern = paste0("\\", csv_name, "$"), recursive = TRUE, ignore.case = FALSE, full.names = TRUE)
csv_list # list all 'NAME.CSV' files under the zfile folder
csv_rbind <- rbindlist(sapply(csv_list, fread, simplify = FALSE), idcol = 'filename')
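You can try this type of function (you can pass the unzip call directly to the cmd parameter of data.table::fread()):
library(data.table)
get_zipped_csv <- function(path) {
  # full paths of the zip archives in `path`
  fnames <- list.files(path, pattern = "\\.zip$", full.names = TRUE)
  # `unzip -p` streams the archive contents to stdout; shQuote() protects paths with spaces.
  # It streams every file in the archive, so append a member name to extract just one
  rbindlist(lapply(fnames, \(f) fread(cmd = paste("unzip -p", shQuote(f)))[, src := f]))
}
Usage:
get_zipped_csv(path = "C:/ZFILE")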
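If you prefer not to depend on an external unzip program, a base R variant is possible with unz(), which opens a connection to a single member of an archive without extracting it. This is only a sketch: read_csv_from_zips is a hypothetical helper name, and it assumes each archive stores the file at its top level under exactly the name given in csv_name.

```r
# Sketch: read one named CSV out of every zip archive in a folder, base R only.
# Assumption: each archive holds the file at its top level under exactly `csv_name`.
read_csv_from_zips <- function(path, csv_name = "NAME.CSV") {
  zips <- list.files(path, pattern = "\\.zip$", full.names = TRUE)
  if (length(zips) == 0) {
    return(data.frame())  # nothing to read
  }
  pieces <- lapply(zips, function(z) {
    # unz() opens a connection to a single member of the archive;
    # read.csv() opens and closes that connection itself
    df <- read.csv(unz(z, csv_name))
    df$filename <- z  # remember which archive each row came from
    df
  })
  do.call(rbind, pieces)
}
```

Usage would be read_csv_from_zips("C:/ZFILE"). If the CSV sits in a subfolder inside the archive, csv_name has to include that internal path.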
I have multiple excel files, with multiple sheets. I need to extract certain data from each sheet and combine all the data together. For one sheet I do the following:
supdata = read_excel("Data/Exercise/IDNo-03.xlsx", sheet="Supervised", skip = 2)
ID = read_excel("Data/Exercise/IDNo-03.xlsx", sheet="Measurements", col_names = FALSE)
id = as.character( ID[1,1])%>%
str_replace("Participant ", "")
mass = as.numeric(ID[3,5])
supdata = supdata%>%
mutate(ID = id, Mass = mass)
This works. I need to do this for all the files.
I've tried this:
dir_path <- "Data/Exercise/"
list = list.files(path = dir_path, "*.xlsx")
all = lapply(list, function(x){
supdata = read_excel(x, sheet="Supervised", skip = 2)
ID = read_excel(x, sheet="Measurements", col_names = FALSE)
id = as.character( ID[1,1])%>%
str_replace("Participant ", "")
mass = as.numeric(ID[3,5])
supdata = supdata%>%
mutate(ID = id, Mass = mass)
})
list identifies the relevant files in the specified path, but I get an error:
Error: `path` does not exist: ‘IDNo-03.xlsx’
What am I doing wrong? Is there another way to approach this problem?
If I can get this bit working I will then do:
dat = do.call("rbind.data.frame", all)
list.files() without full.names = TRUE returns only the file names, without the path:
list.files(file.path(getwd(), "Downloads"), pattern ="\\.csv")
#[1] "testing.csv"
If we specify full.names = TRUE, the full path is returned:
list.files(file.path(getwd(), "Downloads"), pattern ="\\.csv", full.names = TRUE)
#[1] "/Users/akrun/Downloads/testing.csv"
When we loop over the bare file names, R looks for each file in the working directory and therefore gives the error.
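The difference can be reproduced end to end with a throwaway folder. This is a minimal sketch that uses a small CSV file instead of the Excel workbooks so that it runs with base R alone; the folder and its contents are made up for illustration.

```r
# Throwaway folder with one small file (stand-in for Data/Exercise/)
dir_path <- tempfile("exercise_demo_")
dir.create(dir_path)
write.csv(data.frame(x = 1:2), file.path(dir_path, "IDNo-03.csv"), row.names = FALSE)

# Bare names: only resolvable if the working directory happens to be dir_path
short <- list.files(dir_path, pattern = "\\.csv$")
# Full paths: resolvable from anywhere
full <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

file.exists(short)  # usually FALSE, which is what triggers the `path` error
file.exists(full)

# Looping over the full paths works, and the pieces can then be bound
all_data <- lapply(full, read.csv)
dat <- do.call(rbind, all_data)
```

In the original code the same change would be list.files(path = dir_path, pattern = "\\.xlsx$", full.names = TRUE), with the body of the lapply() left as it is.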
I'm trying to separate a single column in multiple CSV files. I've already done it for one file with this code:
tempmax <- read.csv(file="path", header=TRUE, sep=";", fill = TRUE)
colnames(tempmax) = c("Fecha", "Hora", "Temperatura max")
rbind(tempmax)
write.csv(tempmax, "path", sep = ";", append = FALSE, row.names = FALSE, col.names = FALSE)
However, I haven't found a way to do it for multiple CSV files saved in a folder. I would like to do the same: read, modify and write the new one.
I used this to read the multiple files:
getwd <- ("path")
filenames <- list.files("path",
pattern = "*.csv", full.names = TRUE)
But I just can't find a way to edit what I want (I'm pretty new to R).
I appreciate the help. Thanks!
If we have several files, we can use lapply. The intended transformation is not clear, so here the file is written back with only the first column selected. Note that write.csv() ignores sep, append and col.names (with a warning), so write.table() is used instead:
lapply(filenames, function(file){
  tempmax <- read.csv(file = file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  write.table(tempmax[1], file, sep = ";", append = FALSE,
              row.names = FALSE, col.names = FALSE)})
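As a variation, here is a sketch of the same read, modify, write cycle that keeps all three renamed columns and writes the results to a separate output folder instead of overwriting the inputs. The two demo files and the "out" folder are assumptions made so the example is self-contained; with real data you would point filenames at your own folder.

```r
# Demo input: two small semicolon-separated files in a temporary folder
demo_dir <- tempfile("tempmax_demo_")
dir.create(demo_dir)
writeLines(c("a;b;c", "2020-01-01;12:00;25.1"), file.path(demo_dir, "station1.csv"))
writeLines(c("a;b;c", "2020-01-02;13:00;26.3"), file.path(demo_dir, "station2.csv"))

filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Write the modified copies next to the originals, in a separate folder
out_dir <- file.path(demo_dir, "out")
dir.create(out_dir)

for (file in filenames) {
  tempmax <- read.csv(file, header = TRUE, sep = ";", fill = TRUE)
  colnames(tempmax) <- c("Fecha", "Hora", "Temperatura max")
  # write.table() honours sep; basename() reuses the input file name
  write.table(tempmax, file.path(out_dir, basename(file)),
              sep = ";", row.names = FALSE, quote = FALSE)
}
```

Writing to a separate folder means the loop can be rerun safely while you are still working out the transformation.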
I am combining a number of files that are essentially .txt files, though called .sta.
I've used the following code to combine them after having trouble with base R apply and dplyr lapply:
library(plyr)
myfiles <- list.files(path="LDI files", pattern ="*.sta", full.names = TRUE)
dat_tab <- ldply(myfiles, read.table, header= TRUE, sep = "\t", skip = 5)
I want to add a column whose values are taken from the file names. Example file names are "GFREX28-00-1" and "GFREX1534-00-1". I want to keep the digits immediately after GFREX, before the first dash (-).
I'm not sure I understood your question correctly, so this is a tentative answer. The idea is to add a new column to each data.frame before returning it.
filepaths <- list.files(path="LDI files", pattern ="\\.sta$",
                        full.names = TRUE)
filesnames <- basename(filepaths)
dat_tab <- lapply(seq_along(filepaths), function(i) {
  df <- read.table(filepaths[i], header= TRUE, sep = "\t", skip = 5)
  # keep only the digits between "GFREX" and the first dash
  df$fn <- sub("^GFREX([0-9]+)-.*$", "\\1", filesnames[i])
  df
})
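The file-name part can be checked on its own before touching any data. A small sketch using the two example names from the question, with a ".sta" extension added on the assumption that the real files carry it:

```r
# Example names from the question (the ".sta" extension is assumed)
filesnames <- c("GFREX28-00-1.sta", "GFREX1534-00-1.sta")

# Capture the run of digits after "GFREX" and before the first dash
ids <- sub("^GFREX([0-9]+)-.*$", "\\1", filesnames)
ids
# [1] "28"   "1534"

# Convert to numbers if the column should be numeric
ids_num <- as.integer(ids)
```

Since lapply() returns a list of data frames, a final do.call(rbind, dat_tab) would stack them into one table, much as ldply() did in the original code.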
I was wondering whether someone has an idea how to read the EXIF data from multiple image directories. I have gathered image data, but for single samples this is often stored in multiple subdirectories. So far, I've tried this:
multidirdata <- list.dirs("D:/F04", full.names = TRUE, recursive = TRUE)
for (i in 1 : length(multidirdata)){
setwd("C:/exiftool/")
multisubdirdata <- list.dirs(multidirdata[i])
for (j in 1 : length(multisubdirdata)){
filelist <- list.files(path = multisubdirdata, pattern = ".tif", full.names = TRUE)
fulldata <- data.frame(system('exiftool -FileName -GPSLatitude -GPSLongitude -DateTimeOriginal -,
"D:\\F04\\0005SET\\000"', intern = TRUE))
img.df <- read.delim2(textConnection(fulldata), stringsAsFactors = FALSE, header = FALSE,
col.names = c("File", "Lat", "Lon", "Time"))
setwd(multisubdirdata[j])
write.csv(fulldata, file = paste("multipts", "csv", sep = "."), row.names = TRUE, append = FALSE)
}
}
As you can see, this only asks the EXIF data from "D:\F04\0005SET\000" and not from other directories such as "D:\F04\0005SET\001".
Preferably, I'd like to set a vector of all needed image directories through the vectors multidirdata and multisubdirdata, and use those in the EXIF command.
Paying attention to the Common Mistake that StarGeek mentioned made it work for me now:
setwd("C:/exiftool/")
fulldata <- system('exiftool -FileName -GPSLatitude -GPSLongitude -DateTimeOriginal -ext tif -r. "D:\\GIS\\Congo\\F04"', intern = TRUE)