This can easily be done using a for loop, but I am looking for a solution with lapply or dplyr.
I have multiple dataframes which I want to export to an Excel file in separate sheets (I saw many questions and answers along similar lines but couldn't find one that addressed naming sheets dynamically).
I want to name the sheet by the name of the dataframe. For simplicity, I have named the dataframes in a pattern (say df1 to df10). How do I do this?
Below is a reproducible example of my attempt with two dataframes, mtcars and cars (it works, but without good sheet names).
names_of_dfs=c('mtcars','cars')
# variable 'combined' below will have all dfs separately stored in it
combined = lapply(as.list(names_of_dfs), get)
names(combined)=names_of_dfs # naming the list but unable to use below
multi.export=function(df,filename){
return(xlsx::write.xlsx(df,file = filename,
sheetName = paste0('sheet',sample(c(1:200),1)),
append = T))
}
lapply(combined, function(x) multi.export(x,filename='combined.xlsx'))
If it can be done more easily with some other R package, please do suggest it.
Here's an approach with writexl:
library(writexl)
write_xlsx(setNames(lapply(names_of_dfs,get),names_of_dfs),
path = "Test.xlsx")
We need to use setNames because the names of the sheets are set from the list names. Otherwise, the unnamed list that lapply returns will result in default sheet names.
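As a variation on the same idea, here is a minimal sketch that uses mget() instead of lapply() plus setNames(), since mget() already returns a list named after the objects it looks up. The output path in tempdir() is only an assumption for illustration, and the write step is skipped if writexl is not installed.

```r
names_of_dfs <- c("mtcars", "cars")

# mget() looks objects up by name and returns a list that is already named;
# inherits = TRUE lets it find the built-in datasets on the search path
sheets <- mget(names_of_dfs, inherits = TRUE)
print(names(sheets))

# The list names become the sheet names (write step is optional here)
if (requireNamespace("writexl", quietly = TRUE)) {
  out_path <- file.path(tempdir(), "Test.xlsx")  # hypothetical output location
  writexl::write_xlsx(sheets, path = out_path)
}
```

For your df1 to df10 case, the vector of names could be built with paste0("df", 1:10).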
Try something like this:
library(xlsx)
#Workbook
wb = createWorkbook()
#Lapply
lapply(names(combined), function(s) {
sht = createSheet(wb, s)
addDataFrame(combined[[s]], sht)
})
saveWorkbook(wb, "combined.xlsx")
I have a folder of excel files that contain multiple sheets each. The sheets are named the same in each wb. I'm trying to import one specific named sheet for all excel files as separate data frames. I have been able to import them in; however, the names become df_1, df_2, df_3, etc... I've been trying to take the first word of the excel file name and use that to identify the df.
For example, for an Excel file named "AAPL Multiple Sheets", the sheet I'm importing as a df is named "Balance". I would like "AAPL Balance df" as the result.
The code that came closest to what I'm looking for is below; however, it names each data frame df_1, df_2, and so on.
library(purrr)
library(readxl)
files_list <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/',
pattern = "*.xlsx",full.names = TRUE)
files_list %>%
walk2(1:length(files_list),
~ assign(paste0("df_", .y), read_excel(path = .x), envir = globalenv()))
I tried using the file path variable 'files_list' in the paste0 function to label them and ended up with,
df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx1, df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx2,
and so on.
I tried to make a list of file names to use. This read the file names and created a list but I couldn't make it work with the code above.
files_Names<-list.files(path='C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/', pattern=NULL, all.files=FALSE, full.names=FALSE)
This resulted in,
"AAPL Analysis of Data.xlsx" for all the files in the list.
You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
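To illustrate the naming step on its own and what staying in the list structure looks like, here is a small base R sketch. The two file paths and the placeholder data frames are made up for the example; in practice the list would come from the read step above.

```r
# Hypothetical file paths standing in for files_list
files_list <- c("C:/data/AAPL Multiple Sheets.xlsx",
                "C:/data/MSFT Multiple Sheets.xlsx")

# First word of each file name: drop everything from the first space onward
first_word <- sub(" .*$", "", basename(files_list))
df_names <- paste0(first_word, "_Balance_df")

# Placeholder data frames standing in for the imported "Balance" sheets
list_of_dfs <- list(data.frame(cash = 1), data.frame(cash = 2))
names(list_of_dfs) <- df_names

# Individual frames are still reachable by name without touching the global environment
print(list_of_dfs[["AAPL_Balance_df"]])
```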
I hope I could reproduce your example without code. I would create a function to have more control for the new filename.
I would suggest:
library(purrr)
library(readxl)
library(openxlsx)
target_folder <- 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data'
files_list <- list.files(path = target_folder,
pattern = "*.xlsx", full.names = TRUE)
tease_out <- function(file) {
data <- read_excel(file, sheet = "Balance")
filename <- basename(file) %>% tools::file_path_sans_ext()
new_filename <- paste0(target_folder, "/", filename, " Balance df.xlsx")
write.xlsx(data, file = new_filename)
}
map(files_list, tease_out)
Let me know if it works. I assume you are just targeting for the sheet "Balance"?
I have a dataframe that contains the names of a bunch of .CSV files. It looks like the snippet at the end of this question.
What I'm trying to do is read each of these .CSVs and append the results into three different dataframes, based on what's in the file names:
Create a dataframe with all results from .CSV files with -callers- in its file name
Create a dataframe with all results from .CSV files with -results in its filename
Create a dataframe with all results from .CSV files with -script_results- in its filename
The command to actually convert the .CSV file into a dataframe looks like this if I were using the first .CSV in the dataframe below:
data <- aws.s3::s3read_using(read.csv, object = "s3://abc-testtalk/08182020-testpilot-arizona-results-08-18-2020--08-18-2020-168701001.csv")
But what I'm trying to do is:
Iterate ALL the .csv files under Key using the s3read_using function
Put them in three separate dataframes based on the file names as listed above
Key
08182020-testpilot-arizona-results-08-18-2020--08-18-2020-168701001.csv
08182020-testpilot-arizona-results-08-18-2020--08-18-2020-606698088.csv
08182020-testpilot-arizona-script_results-08-18-2020--08-18-2020-114004469.csv
08182020-testpilot-arizona-script_results-08-18-2020--08-18-2020-450823767.csv
08182020-testpilot-iowa-callers-08-18-2020-374839084.csv
08182020-testpilot-maine-callers-08-18-2020-396935866.csv
08182020-testpilot-maine-results-08-18-2020--08-18-2020-990912614.csv
08182020-testpilot-maine-script_results-08-18-2020--08-18-2020-897037786.csv
08182020-testpilot-michigan-callers-08-18-2020-367670258.csv
08182020-testpilot-michigan-follow-ups-08-18-2020--08-18-2020-049435266.csv
08182020-testpilot-michigan-results-08-18-2020--08-18-2020-544974900.csv
08182020-testpilot-michigan-script_results-08-18-2020--08-18-2020-239089219.csv
08182020-testpilot-nevada-callers-08-18-2020-782329503.csv
08182020-testpilot-nevada-results-08-18-2020--08-18-2020-348644934.csv
08182020-testpilot-nevada-script_results-08-18-2020--08-18-2020-517037762.csv
08182020-testpilot-new-hampshire-callers-08-18-2020-134150800.csv
08182020-testpilot-north-carolina-callers-08-18-2020-739838755.csv
08182020-testpilot-pennsylvania-callers-08-18-2020-223839956.csv
08182020-testpilot-pennsylvania-results-08-18-2020--08-18-2020-747438886.csv
08182020-testpilot-pennsylvania-script_results-08-18-2020--08-18-2020-546894204.csv
08182020-testpilot-virginia-callers-08-18-2020-027531377.csv
08182020-testpilot-virginia-follow-ups-08-18-2020--08-18-2020-419338697.csv
08182020-testpilot-virginia-results-08-18-2020--08-18-2020-193170030.csv
Create 3 empty dataframes. You will probably also need to specify column names matching the column names of each of the files you want to append:
results <- data.frame()
script_results <- data.frame()
callers <- data.frame()
Then iterate over file_name and read each file into the data object. Depending on which pattern ("-results-", "-script_results-" or "-callers-") is contained in the name of each file, it will be appended to the correct dataframe:
for (file in file_name) {
data <- aws.s3::s3read_using(read.csv, object = paste0("s3://abc-testtalk/", file))
# grepl() takes the pattern first and the string to search second
if (grepl("-results-", file)) { results <- rbind(results, data)}
if (grepl("-script_results-", file)) { script_results <- rbind(script_results, data)}
if (grepl("-callers-", file)) { callers <- rbind(callers, data)}
}
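The matching step can be checked without touching S3. The sketch below classifies a few of the keys from the question; note that grepl() expects the pattern as its first argument and the text as its second. The classify() helper and the "other" bucket (for files such as the follow-ups ones) are my own additions for illustration.

```r
keys <- c(
  "08182020-testpilot-arizona-results-08-18-2020--08-18-2020-168701001.csv",
  "08182020-testpilot-arizona-script_results-08-18-2020--08-18-2020-114004469.csv",
  "08182020-testpilot-iowa-callers-08-18-2020-374839084.csv",
  "08182020-testpilot-michigan-follow-ups-08-18-2020--08-18-2020-049435266.csv"
)

# Hypothetical helper: returns the group a single key belongs to
classify <- function(key) {
  if (grepl("-script_results-", key, fixed = TRUE)) {
    "script_results"
  } else if (grepl("-results-", key, fixed = TRUE)) {
    "results"
  } else if (grepl("-callers-", key, fixed = TRUE)) {
    "callers"
  } else {
    "other"
  }
}

kind <- vapply(keys, classify, character(1), USE.NAMES = FALSE)

# Keys grouped by kind; each group could then be read and row-bound once
by_kind <- split(keys, kind)
print(kind)
```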
As an alternative to @JohnFranchak's recommendation for map_dfr (which likely works just fine), the method that I referenced in comments would look something like this:
alldat <- lapply(setNames(nm = dat$file_name),
function(obj) aws.s3::s3read_using(read.csv, object = obj))
callers <- do.call(rbind, alldat[grepl("-callers-", names(alldat))])
results <- do.call(rbind, alldat[grepl("-results-", names(alldat))])
script_results <- do.call(rbind, alldat[grepl("-script_results-", names(alldat))])
others <- do.call(rbind, alldat[!grepl("-(callers|results|script_results)-", names(alldat))])
The do.call(rbind, ...) part is analogous to dplyr::bind_rows and data.table::rbindlist in that it accepts a list of frames, and the result is a single frame. Some differences:
do.call(rbind, ...) really requires all columns to exist in all frames (for data frames, rbind matches columns by name, so a different order alone is tolerated). It's not hard to enforce this externally (e.g., adding missing columns, rearranging), but it's not automatic.
data.table::rbindlist will complain for the same conditions (missing columns or different order), but it has fill= and use.names= arguments that need to be set TRUE.
dplyr::bind_rows will fill and row-bind by-name by default, without message or warning. (I don't agree that a default of silence is good all of the time, but it is the simplest.)
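As a rough illustration of "enforcing this externally" for the do.call(rbind, ...) case, here is a base R sketch that adds any missing columns as NA before binding. The two small frames are invented for the example.

```r
frames <- list(
  a = data.frame(x = 1:2, y = c("p", "q")),
  b = data.frame(x = 3L)                 # column y is missing here
)

# Union of all column names across the frames
all_cols <- unique(unlist(lapply(frames, names)))

filled <- lapply(frames, function(d) {
  for (col in setdiff(all_cols, names(d))) {
    d[[col]] <- NA                       # add each missing column as NA
  }
  d[all_cols]                            # put the columns in a common order
})

combined <- do.call(rbind, filled)
print(combined)
```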
Lastly, my use of setNames(nm=..) is merely to assign the filename to each object. This is not strictly necessary since we still have dat$file_name, but I've found that with two separate objects, it is feasible to accidentally change (delete, append, or reorder) one of them and not the other, so I prefer to keep the names and the objects (frames) perfectly tied together. These two calls are relatively the same in the resulting named-list:
lapply(setNames(nm = dat$file_name), ...)
sapply(dat$file_name, ..., simplify = FALSE)
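A quick way to convince yourself of that, using nchar as a stand-in for the read function and two made-up file names:

```r
file_name <- c("a-callers-1.csv", "b-results-2.csv")  # hypothetical names

via_setnames <- lapply(setNames(nm = file_name), nchar)
via_sapply   <- sapply(file_name, nchar, simplify = FALSE)

# Both results are lists named by the file names
print(names(via_setnames))
print(names(via_sapply))
```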
I have several dataframes (mydf1, mydf2, mydf3 etc.). How can I export each dataframe to a separate Excel file so that the name of the file is the name of the dataframe (e.g. mydf1.xlsx)?
I've tried to put them in a list and do a loop as below. It nearly gives me what I want, but I don't know how to make R name the Excel files properly instead of 1.xlsx, 2.xlsx etc. Any ideas?
install.packages("writexl")
library(writexl)
list_of_dfs <- lapply(ls(pattern="mydf"), function(x) get(x))
for (i in c(1:length(list_of_dfs))){
write_xlsx(list_of_dfs[i], paste(i,".xlsx"))
}
Try the following.
use mget to get all df's in one go, no need for lapply;
the list of df's is a named list and the names can be used to assemble the filenames.
The corrected code is then:
library(writexl)
list_of_dfs <- mget(ls(pattern = "mydf"))
for(i in seq_along(list_of_dfs)){
filename <- paste0(names(list_of_dfs)[i], ".xlsx")
write_xlsx(list_of_dfs[[i]], filename)
}
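If you prefer to avoid the index, the same thing can be sketched with Map(), pairing each data frame with its file name. The two tiny data frames and the tempdir() output folder are assumptions for the example, and the write step is skipped if writexl is not installed. Also note that paste0() is used rather than paste(), because paste(i, ".xlsx") would insert a space before the extension.

```r
# Two small stand-in data frames
mydf1 <- data.frame(a = 1:3)
mydf2 <- data.frame(b = letters[1:2])

list_of_dfs <- mget(ls(pattern = "^mydf"))
filenames <- paste0(names(list_of_dfs), ".xlsx")
print(filenames)

if (requireNamespace("writexl", quietly = TRUE)) {
  out_dir <- tempdir()  # hypothetical output folder
  invisible(Map(function(df, f) writexl::write_xlsx(df, file.path(out_dir, f)),
                list_of_dfs, filenames))
}
```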
I have a vector of file paths called dfs, and I want to create a dataframe from each of those files and bind them together into one huge dataframe, so I did something like this:
for (df in dfs){
clean_df <- bind_rows(as.data.table(read.delim(df, header=T, sep="|")))
return(clean_df)
}
but only the last item in the dataframe is being returned. How do I fix this?
I'm not sure about your file format, so I'll take common .csv as an example. Replace the a * i part with actually reading all the different files, instead of just generating mockup data.
files = list()
for (i in 1:10) {
a = read.csv('test.csv', header = FALSE)
a = a * i
files[[i]] = a
}
full_frame = data.frame(data.table::rbindlist(files))
The problem is that you can only pass one file at a time to the function read.delim(). So the solution would be to use a function like lapply() to read in each file specified in your df.
Here's an example, and you can find other answers to your question here.
library(tidyverse)
df <- c("file1.txt","file2.txt")
all.files <- lapply(df,function(i){read.delim(i, header=T, sep="|")})
clean_df <- bind_rows(all.files)
(clean_df)
Note that you don't need the function return(); wrapping clean_df in parentheses prompts R to print the variable.
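Here is a self-contained version of the same pattern that you can run as-is. It writes two small pipe-delimited files to a temporary folder first (the file contents are invented), and uses base do.call(rbind, ...) in place of bind_rows so that no extra package is needed. The original loop kept only the last file because clean_df was overwritten on every iteration.

```r
# Create two small pipe-delimited files to read back in
paths <- file.path(tempdir(), c("file1.txt", "file2.txt"))
writeLines(c("id|value", "1|a", "2|b"), paths[1])
writeLines(c("id|value", "3|c"), paths[2])

# Read every file into a list element, then bind the list once
all_files <- lapply(paths, read.delim, header = TRUE, sep = "|")
clean_df <- do.call(rbind, all_files)
print(clean_df)
```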
I am trying to read multiple excel files in a folder using the read.xlsx2 function. I only need to read a particular sheet titled 'Returns' or 'Prices'.
Is there a way I can give an 'OR' argument in the function and also skip a file if it contains neither of the sheets?
P.s.: Each file will have either a 'Returns' or a 'Prices' sheet or neither but not both so there cannot be a clash.
Thanks
You could read all the sheet names of each file, use intersect to select whichever of 'Returns' or 'Prices' is present, and then read the Excel file with that sheet.
Using readxl you can do this as :
library(readxl)
all_files <- list.files(pattern = '\\.xlsx$')
result <- lapply(all_files, function(x) {
all_sheets <- excel_sheets(x)
correct_sheet <- intersect(all_sheets, c('Returns', 'Prices'))
if(length(correct_sheet)) read_xlsx(x, correct_sheet)
})
result will be a list of dataframes, with NULL for any file that has neither sheet. If you want to combine the data into one dataframe and they have the same column names, you can use do.call(rbind, result).
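If you would rather drop the skipped files explicitly before combining, a small sketch (with a made-up result list standing in for the real one) could look like this:

```r
# Stand-in for the list returned by lapply(): the NULL is a skipped file
result <- list(data.frame(x = 1), NULL, data.frame(x = 2))

# Keep only the files that actually had one of the sheets
kept <- Filter(Negate(is.null), result)

combined <- do.call(rbind, kept)
print(combined)
```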