I am trying to read multiple Excel files in a folder using the read.xlsx2 function, but I only need to read a particular sheet titled 'Returns' or 'Prices'.
Is there a way to give an 'OR' argument in the function, and to skip a file if it contains neither of those sheets?
P.S.: Each file will have either a 'Returns' sheet or a 'Prices' sheet, or neither, but never both, so there cannot be a clash.
Thanks
You could read all the sheet names of each file, use intersect() to pick whichever of 'Returns' or 'Prices' is present, and then read the file with that sheet; if neither is present, the file is skipped.
Using readxl you can do it like this:
library(readxl)
all_files <- list.files(pattern = '\\.xlsx$')
result <- lapply(all_files, function(x) {
  all_sheets <- excel_sheets(x)
  correct_sheet <- intersect(all_sheets, c('Returns', 'Prices'))
  if (length(correct_sheet)) read_xlsx(x, correct_sheet)
})
result will be a list of data frames (with NULL entries for the files that had neither sheet). If you want to combine the data into one data frame and they have the same column names, you can use do.call(rbind, result); rbind simply ignores the NULL elements.
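For example, a minimal sketch of that last step, using the objects from the snippet above:
# drop the files that had neither sheet, then stack the rest into one data frame
result <- result[!vapply(result, is.null, logical(1))]
combined <- do.call(rbind, result)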
I have a folder of Excel files that each contain multiple sheets. The sheets are named the same in each workbook. I'm trying to import one specific named sheet from every Excel file as a separate data frame. I have been able to import them; however, the names become df_1, df_2, df_3, etc. I've been trying to take the first word of the Excel file name and use that to identify the data frame.
For example, for an Excel file named "AAPL Multiple Sheets", the sheet I'm importing as a data frame would be named "Balance", and I would like "AAPL Balance df" as the result.
The code that came closest to what I'm looking for is below; however, it names each data frame df_1, df_2, and so on.
library(purrr)
library(readxl)
files_list <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/',
                         pattern = "*.xlsx", full.names = TRUE)
files_list %>%
  walk2(1:length(files_list),
        ~ assign(paste0("df_", .y), read_excel(path = .x), envir = globalenv()))
I tried using the file path variable 'files_list' in the paste0 function to label them and ended up with
df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx1, df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx2,
and so on.
I also tried to make a list of file names to use. This read the file names and created a list, but I couldn't make it work with the code above:
files_Names <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/', pattern = NULL, all.files = FALSE, full.names = FALSE)
which resulted in
"AAPL Analysis of Data.xlsx" for all the files in the list.
You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use spaces in object names, hence the underscores
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, env = .GlobalEnv)
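A hedged sketch of that combine-into-one-data-frame route, assuming all the "Balance" sheets share the same columns (bind_rows() comes from dplyr, which is already attached above via tidyverse):
# one data frame, with a "source" column recording which file each row came from
combined_df <- bind_rows(list_of_dfs, .id = "source")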
I hope I've reproduced your example correctly without a full reprex. I would create a function to have more control over the new file name.
I would suggest:
library(purrr)
library(readxl)
library(openxlsx)
target_folder <- 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data'
files_list <- list.files(path = target_folder,
                         pattern = "*.xlsx", full.names = TRUE)
tease_out <- function(file) {
  data <- read_excel(file, sheet = "Balance")
  filename <- basename(file) %>% tools::file_path_sans_ext()
  new_filename <- paste0(target_folder, "/", filename, " Balance df.xlsx")
  write.xlsx(data, file = new_filename)
}
map(files_list, tease_out)
Let me know if it works. I assume you are only targeting the sheet "Balance"?
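If what you actually want is a named list of data frames in your session rather than new files on disk, a minimal sketch along the same lines (using the naming convention from the other answer, i.e. the first word of the file name plus "_Balance_df") could be:
# read the "Balance" sheet of every file and name each element after the file's first word
balance_dfs <- set_names(
  map(files_list, read_excel, sheet = "Balance"),
  paste0(sub(" .*$", "", basename(files_list)), "_Balance_df")
)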
Suppose I have two Excel files, each named packs.xlsx, and each contains multiple sheets. I want to iteratively create a data frame using only one sheet from each file, the sheet named "summary" in each case. How can I go about this using purrr and readxl while also adding a field which contains the filename?
I'm successful when I save the sheets as CSVs using the following code:
filenames <- list.files(pattern="packs*.*csv")
dat <- map_dfr(filenames, read_xlsx, sheet = "summary") %>% glimpse()
How would I go about adding a field to show which file a given row came from? Thanks for any insight that can be offered!
Supposing the two packs.xlsx files are in different subfolders:
library(readxl)
filenames <- list.files(pattern = "packs\\.xlsx$", recursive = TRUE)
df <- lapply(filenames, function(fn) {
  # get the sheet detail
  xl <- read_excel(fn, sheet = "summary")
  # add the filename as a field
  xl$filename <- fn
  # function return
  xl
})
# if both summary sheets have the same format, you can combine them into one
fin <- do.call(rbind, df)
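Since the question asks about purrr, here's a hedged one-liner doing the same thing, assuming both summary sheets share the same columns:
library(purrr)
# set_names() names the vector of paths with its own values, so .id = "filename" records the source file
fin <- map_dfr(set_names(filenames), read_excel, sheet = "summary", .id = "filename")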
I am trying to read a bunch of Excel files, and all of the sheets from those files, into R. I would then like to save each sheet as a separate data frame, with the name of the data frame the same as the name of the sheet. Some files only have one sheet, while others have more than one, so I'm not sure how to specify all sheets as opposed to just a number.
I have tried:
library(XLConnect)
files.list <- list.files(recursive = T, pattern = '*.xlsx') # get files list from folder
for (i in 1:length(files.list)) {
  wb <- loadWorkbook(files.list[i])
  sheet <- getSheets(wb, sheet = )
  for (j in 1:length(sheet)) {
    tmp <- read.xlsx(files.list[i], sheetIndex = j,
                     sheetName = NULL,
                     as.data.frame = TRUE, header = F)
    if (i == 1 & j == 1) dataset <- tmp else dataset <- rbind(dataset, tmp)
  }
}
and I get an error "could not find function "loadWorkbook"". At one point I resolved that issue and got an error "could not find function "getSheets"". I have had some issues getting this package to work so if anyone has a different alternative I would appreciate it!
You could try with readxl...
I've not tested this for the case of different workbooks with duplicate worksheet names.
There were a couple of issues with your code:
the list.files pattern included a ., which is a regex metacharacter, so it needs to be escaped with \\
as #deschen pointed out, the Excel-handling functions you called come from the openxlsx package
library(readxl)
files.list <- list.files(recursive = T, pattern = '\\.xlsx$') # get files list from folder
for (i in seq_along(files.list)) {
  sheet_nm <- excel_sheets(files.list[i])
  for (j in seq_along(sheet_nm)) {
    assign(x = sheet_nm[j],
           value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]),
           envir = .GlobalEnv)
  }
}
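As an aside, if you'd rather avoid assign() and keep everything in one object, a minimal sketch of the same two loops collecting into a named list could look like this (as noted above, a sheet name that appears in more than one workbook would overwrite the earlier entry):
library(readxl)
files.list <- list.files(recursive = TRUE, pattern = '\\.xlsx$')
all_sheets <- list()
for (f in files.list) {
  for (s in excel_sheets(f)) {
    all_sheets[[s]] <- read_xlsx(path = f, sheet = s)
  }
}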
I'm pretty sure the loadWorkbook function comes from the openxlsx package. So you should use:
library(openxlsx)
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
This can easily be done using a for loop, but I am looking for a solution with lapply or dplyr.
I have multiple data frames which I want to export to an Excel file, each in a separate sheet (I saw many questions and answers along similar lines but couldn't find one that addressed naming the sheets dynamically).
I want to name each sheet after the data frame it contains. For simplicity, I have named the data frames in a pattern (say df1 to df10). How do I do this?
Below is a reproducible example of my attempt with two data frames, mtcars and cars (it works, but without good sheet names).
names_of_dfs <- c('mtcars', 'cars')
# variable 'combined' below will have all dfs separately stored in it
combined <- lapply(as.list(names_of_dfs), get)
names(combined) <- names_of_dfs # naming the list but unable to use below
multi.export <- function(df, filename) {
  return(xlsx::write.xlsx(df, file = filename,
                          sheetName = paste0('sheet', sample(c(1:200), 1)),
                          append = T))
}
lapply(combined, function(x) multi.export(x, filename = 'combined.xlsx'))
If it can be done more easily with some other R package, please do suggest it.
Here's an approach with writexl:
library(writexl)
write_xlsx(setNames(lapply(names_of_dfs, get), names_of_dfs),
           path = "Test.xlsx")
We need setNames because the sheet names are taken from the list names; otherwise the unnamed list that lapply returns would result in default sheet names.
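If you want to double-check the output, a small hedged usage note (assuming readxl is installed):
# should print the data frame names used as sheet names, e.g. "mtcars" "cars"
readxl::excel_sheets("Test.xlsx")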
Try something like this:
library(xlsx)
# Workbook
wb <- createWorkbook()
# Lapply over the list names so each data frame goes to a sheet of that name
lapply(names(combined), function(s) {
  sht <- createSheet(wb, s)
  addDataFrame(combined[[s]], sht)
})
saveWorkbook(wb, "combined.xlsx")
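As a side note, openxlsx can do the same thing in one call, since its write.xlsx() writes each element of a named list to its own sheet, named after the list element:
# one sheet per element of the named list 'combined'
openxlsx::write.xlsx(combined, file = "combined.xlsx")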
I'm trying to read an xlsx file and write a data frame into the same sheet of the xlsx without removing the other rows.
I tried the XLConnect library with the function appendWorksheet(), but the data is not written in the correct place, and the xlsx library, but I can't find a function similar to appendWorksheet.
I just want to read my xlsx file and write the data that is in a data frame into the same xlsx file without removing the previous rows.
There doesn't seem to be a great way to append data to an xlsx file. You can create a function which reads in the sheet, appends the new data to the data frame, and then overwrites the xlsx.
library(xlsx)
appendxlsx <- function(df2, path, sheetName) {
  df1 <- read.xlsx2(path, sheetName = sheetName)
  colnames(df2) <- colnames(df1)
  df <- rbind(df1, df2)
  write.xlsx2(df, path, sheetName = sheetName,
              row.names = FALSE) # avoid writing row names as an extra column on each append
}
Then you would just call the function, supplying the data frame with the new rows (df2), the path to the xlsx, and the sheet name. Something like this:
appendxlsx(df2=df2, "/path/to/xlsx", sheetName = "Sheet1")
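If you'd rather not rewrite the whole sheet, a hedged alternative is openxlsx, which can write into an existing sheet in place. This is not the approach above; the path and sheet name here are just placeholders, and it assumes df2 has the same column order as the existing sheet:
library(openxlsx)
wb <- loadWorkbook("/path/to/file.xlsx")
existing <- readWorkbook("/path/to/file.xlsx", sheet = "Sheet1")
# the header is row 1 and the data ends at nrow(existing) + 1, so start on the next row
writeData(wb, sheet = "Sheet1", x = df2,
          startRow = nrow(existing) + 2, colNames = FALSE)
saveWorkbook(wb, "/path/to/file.xlsx", overwrite = TRUE)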