lapply function across specified Excel sheets - R

I have multiple Excel files, each with multiple sheets. I need to extract certain data from each sheet and combine all the data together. For one sheet I do the following:
library(readxl)
library(dplyr)
library(stringr)

supdata = read_excel("Data/Exercise/IDNo-03.xlsx", sheet = "Supervised", skip = 2)
ID = read_excel("Data/Exercise/IDNo-03.xlsx", sheet = "Measurements", col_names = FALSE)
id = as.character(ID[1, 1]) %>%
  str_replace("Participant ", "")
mass = as.numeric(ID[3, 5])
supdata = supdata %>%
  mutate(ID = id, Mass = mass)
This works. I need to do this for all the files.
I've tried this:
dir_path <- "Data/Exercise/"
list = list.files(path = dir_path, pattern = "*.xlsx")
all = lapply(list, function(x){
  supdata = read_excel(x, sheet = "Supervised", skip = 2)
  ID = read_excel(x, sheet = "Measurements", col_names = FALSE)
  id = as.character(ID[1, 1]) %>%
    str_replace("Participant ", "")
  mass = as.numeric(ID[3, 5])
  supdata = supdata %>%
    mutate(ID = id, Mass = mass)
})
list identifies the relevant files in the specified path, but I get an error:
Error: `path` does not exist: ‘IDNo-03.xlsx’
What am I doing wrong? Is there another way to approach this problem?
If I can get this bit working I will then do:
dat = do.call("rbind.data.frame", all)

list.files() without full.names = TRUE returns only the file names, without the full path:
list.files(file.path(getwd(), "Downloads"), pattern ="\\.csv")
#[1] "testing.csv"
If we specify full.names = TRUE:
list.files(file.path(getwd(), "Downloads"), pattern ="\\.csv", full.names = TRUE)
#[1]"/Users/akrun/Downloads/testing.csv"
When we loop over those files without the path, R looks for each file in the working directory, and that is what gives the error.
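Applied to the code in the question, the fix is the single full.names argument; here is a sketch (the sheet names, skip value, and cell positions are taken from the question):
library(readxl)
library(dplyr)
library(stringr)

dir_path <- "Data/Exercise/"
# full.names = TRUE returns "Data/Exercise/IDNo-03.xlsx" instead of "IDNo-03.xlsx",
# so read_excel() can locate each file from the working directory
files <- list.files(path = dir_path, pattern = "\\.xlsx$", full.names = TRUE)

all <- lapply(files, function(x) {
  supdata <- read_excel(x, sheet = "Supervised", skip = 2)
  ID <- read_excel(x, sheet = "Measurements", col_names = FALSE)
  id <- as.character(ID[1, 1]) %>% str_replace("Participant ", "")
  mass <- as.numeric(ID[3, 5])
  supdata %>% mutate(ID = id, Mass = mass)
})

dat <- bind_rows(all)  # or do.call("rbind.data.frame", all), as planned in the question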

Related

Processing multiple files with different output names in R

I have a data set of about 50 CSV files: crasha, crashabd, crashd, …
I wrote a function that makes some changes to and analyzes a single file. I want a dynamic name for each output. For example, I want newcrasha, newcrashabd, newcrashd, … as the output CSV files. In other words, I want to take the names of the imported files and use them as the output filenames.
For example:
filenames <- list.files(path = "D:/health/car crash/", pattern = "csv", full.names = TRUE)
analyze <- function(filename) {
  # Input is a character string naming a csv file.
  crash <- read.csv(file = filename, header = TRUE)
  # merge and sum (crashcounter and NUMBER_INJURED)
  newcrash <- crash %>%
    group_by(COLLISION_DATE) %>%
    summarise(crashcounter = sum(crashcounter), NUMBER_INJURED = sum(NUMBER_INJURED))
  write.csv(newcrash, "D:/health/car crash/newcrash.csv", row.names = FALSE)
}
filenames <- filenames[1:50]
for (f in filenames) {
  analyze(f)
}
Thank you for any help
Try this, following the suggestion of @mhovd:
filenames <- list.files(path = "D:/health/car crash/", pattern = "csv", full.names = TRUE)
analyze <- function(filename) {
  # Input is a character string naming a csv file.
  crash <- read.csv(file = filename, header = TRUE)
  # merge and sum (crashcounter and NUMBER_INJURED)
  newcrash <- crash %>%
    group_by(COLLISION_DATE) %>%
    summarise(crashcounter = sum(crashcounter), NUMBER_INJURED = sum(NUMBER_INJURED))
  # build each output name from the input name: "crasha.csv" -> "newcrasha.csv"
  new.name <- paste0("D:/health/car crash/new", basename(tools::file_path_sans_ext(filename)), ".csv")
  write.csv(newcrash, file = new.name, row.names = FALSE)
}
lapply(filenames[1:50], analyze)
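tools::file_path_sans_ext() strips the .csv extension and basename() drops the directory part, so an input of D:/health/car crash/crasha.csv yields the output name D:/health/car crash/newcrasha.csv.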

for loop in R not binding files

I'm fairly new to R, so my apologies if this is a very basic question.
I'm trying to read two Excel files in using list.files() with a pattern, then using a for loop to bind the files and replace values in the bound file. However, my script's output is the output from only one file, meaning that the files are not binding.
The file names are fact_import_2020 and fact_import_20182019.
FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = FALSE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
FileCount <- length(FileNames)
for (i in 1:FileCount) {
  MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
  MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
}
Use full.names = TRUE in list.files(). After this, FileNames holds the full paths of the files. Then loop over the filenames themselves instead of a file count.
I think you are trying to do the following (I am guessing here; please see below). You are getting data from only one file because each pass through the for() loop overwrites the result of the previous one.
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = TRUE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
# list of data frames, one per Excel file
df_lst <- lapply(FileNames, function(fn) {
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})
# combine the data
MOH_TotalHC <- do.call('rbind', df_lst)
# keep complete cases
MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
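Because do.call('rbind', df_lst) stacks every data frame in the list in a single call, nothing is overwritten between iterations, which is what went wrong in the original loop.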
The following is another potential solution. It is taken from here, and the question looks like a duplicate.
Potential solution:
library(readxl)
library(data.table)
# Set your path here
FilePath <- "//srdceld2/project2/"
# Update the pattern to suit your needs. Currently it matches only XLSX files.
file.list <- list.files(path = FilePath, pattern = "*.xlsx", full.names = TRUE)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("A:I"))
# name each list element after its source file
names(df.list) <- file.list
# final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)
Assumptions and call-outs:
The files in the folder are of similar file types, for example xlsx.
The files could have different sets of columns, and NULLs as well.
Note that the order of the columns matters, so if a new file has more columns, the number of output columns could differ; see the sketch below.
Note: like @Sathish, I am guessing what the input could look like.
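A minimal sketch of how rbindlist() handles files with different column sets (the toy tables here are hypothetical, not data from the thread):
library(data.table)

# two toy tables standing in for two Excel files with different columns
a <- data.table(id = 1:2, x = c("p", "q"))
b <- data.table(id = 3:4, y = c(10, 20))

# use.names matches columns by name; fill = TRUE pads missing columns with NA
rbindlist(list(a, b), use.names = TRUE, fill = TRUE)
#    id    x  y
# 1:  1    p NA
# 2:  2    q NA
# 3:  3 <NA> 10
# 4:  4 <NA> 20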

R read_excel or readxl Multiple Files with Multiple Sheets - Bind

I have a directory full of .xlsx files. They all have multiple sheets. I want to extract the same sheet from all of the files and append them into a tibble.
I have found numerous solutions for extracting multiple sheets from a single Excel file; however, not a single sheet from multiple files.
I have tried:
paths = as.tibble(list.files("data/BAH", pattern = ".xlsx", full.names = TRUE, all.files = FALSE))
test <- paths %>% read_xlsx(sheet = "Portal", col_names = TRUE)
I know the paths variable contains all of my file names with paths. However, I am not sure how to iterate through each file, appending just the sheet = "Portal" data to a csv file.
The error is:
Error: path must be a string
I have tried passing paths in as a vector and as a tibble, and I tried subscripting it as well; all failed.
So, in summary: I have a directory of xlsx files, and I need to extract a single sheet from each one and append them to a csv file. I have tried using purrr with some map functions but could not get that to work either.
My goal was to use the Tidy way.
Thanks for any hints.
You have to use lapply() or map(). Try
test <- lapply(paths, read_xlsx, sheet = "Portal", col_names = TRUE)
or
library(purrr)
test <- map_dfr(paths, read_xlsx, sheet = "Portal", col_names = TRUE)
If you use lapply(), you can then bind the resulting list of data frames with
library(dplyr)
test %>% bind_rows()
(map_dfr() already row-binds, so no extra step is needed there.)
library(tidyverse)
library(readxl)
library(fs)
# Get all files
xlsx_files <- fs::dir_ls("data/BAH", regexp = "\\.xlsx$")
# read the "Portal" sheet from each file and row-bind them;
# .id = 'source' adds a column recording which file each row came from
portal_tabs <- map_dfr(xlsx_files, read_xlsx, sheet = "Portal", col_names = TRUE, .id = 'source')

My R code isn't throwing any errors, but it's not doing what it's supposed to

Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
When I run my code, I don't get any errors, but the code isn't doing anything. I originally thought that this was a problem with file permissions, but I'm still having the problem after changing them. Not sure what to try next.
Here's the code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
# data set and save the reduced copy
del.items <- function(file)
{
  data <- read.csv(input = paste0("../data/", str_match(pattern = "cafas|pecfas", string = file) %>% tolower,
                                  "/raw/", file),
                   sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
                   skip = 0, colClasses = "character", data.table = FALSE)
  data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(T(Initial|[0-9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$",
                     x = names(data))]
  write.csv(data, file = paste0("../data/", str_match(pattern = "cafas|pecfas", string = file) %>% tolower,
                                "/items-del/", sub(pattern = "ExportData_", x = file, replacement = "")) %>% tolower,
            sep = ",", row.names = FALSE, col.names = TRUE)
}
# delete items from all cafas data sets
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("../data/pecfas/raw/", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
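Since the loops iterate over whatever list.files() returns, one quick check (sketched here as a guess, not a confirmed diagnosis) is whether the pattern and the relative paths actually match any files; if they match none, the loops run zero times and the script silently does nothing:
# a first diagnostic, assuming the directory layout from the question
getwd()  # is the working directory where the relative path "../data" expects it to be?
length(list.files("../data/cafas/raw/", pattern = ".csv"))  # 0 means del.items() is never called
file.exists("../data/cafas/raw/")                           # does the directory resolve at all?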

R write.csv is creating an empty file

Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
Currently, the files are being created in the right directory with the right file name, but the .csv files that are being created are empty.
I am currently getting the following warning message:
Warning in fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS", : Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: (variable names).
This is my code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
# data set and save the reduced copy
del.items <- function(file){
  data <- fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS", string = file) %>% tolower,
                               "/raw/", file),
                sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
                skip = 0, colClasses = "character", data.table = FALSE)
  data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(TR?(Initial|[0-9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$",
                     x = names(data))]
  write.csv(data, file = paste0("data/", str_match(pattern = "CAFAS|PECFAS", string = file) %>% tolower,
                                "/items-del/", sub(pattern = "ExportData_", x = file, replacement = "")) %>% tolower,
            row.names = FALSE)
}
# delete items from all cafas data sets
cafas.files <- list.files("data/cafas/raw", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("data/pecfas/raw", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
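One detail worth sketching as a possible cause (a guess, not confirmed in the thread): data[-grep(...)] only behaves as intended when grep() finds matches. If the pattern matches no column names, grep() returns integer(0), and data[integer(0)] selects zero columns, which would yield exactly the kind of empty CSV files described:
# hypothetical illustration of the grep() pitfall, not code from the thread
df <- data.frame(a = 1:2, b = 3:4)

idx <- grep("^no_such_column$", names(df))
idx             # integer(0): no matches
ncol(df[-idx])  # 0 columns: negative indexing with integer(0) drops everything

# grepl() returns a logical vector, so "keep the non-matching columns" is safe
ncol(df[!grepl("^no_such_column$", names(df))])  # 2 columns, as intended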
