R: Export several dataframes to different Excel files

I have several dataframes (mydf1, mydf2, mydf3, etc.). How can I export each dataframe to a separate Excel file so that the name of the file is the name of the dataframe (e.g. mydf1.xlsx)?
I've tried to put them in a list and loop over it as below. It nearly gives me what I want, but I don't know how to make R name the Excel files properly instead of 1.xlsx, 2.xlsx, etc. Any ideas?
install.packages("writexl")
library(writexl)
list_of_dfs <- lapply(ls(pattern="mydf"), function(x) get(x))
for (i in c(1:length(list_of_dfs))) {
  write_xlsx(list_of_dfs[i], paste(i, ".xlsx"))
}

Try the following:
Use mget to get all dfs in one go; no need for lapply.
The list of dfs is a named list, and the names can be used to assemble the filenames.
The corrected code is then:
library(writexl)
list_of_dfs <- mget(ls(pattern = "mydf"))
for (i in seq_along(list_of_dfs)) {
  filename <- paste0(names(list_of_dfs)[i], ".xlsx")
  write_xlsx(list_of_dfs[[i]], filename)
}

Related

Read a folder of excel files and import individual sheets as separate df's in R with Names

I have a folder of Excel files that each contain multiple sheets. The sheets are named the same in each workbook. I'm trying to import one specific named sheet from every Excel file as a separate data frame. I have been able to import them; however, the names become df_1, df_2, df_3, etc. I've been trying to take the first word of the Excel file name and use that to identify the df.
For example, for an Excel file named "AAPL Multiple Sheets", the sheet I'm importing as a df would be named "Balance", and I would like "AAPL Balance df" as the resulting name.
The code that came closest to what I'm looking for is located below; however, it names each data frame df_1, df_2, and so on.
library(purrr)
library(readxl)
files_list <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/',
                         pattern = "*.xlsx", full.names = TRUE)
files_list %>%
  walk2(1:length(files_list),
        ~ assign(paste0("df_", .y), read_excel(path = .x), envir = globalenv()))
I tried using the file path variable 'files_list' in the paste0 function to label them and ended up with,
df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx1, df_C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/.xlsx2,
and so on.
I tried to make a list of file names to use. This read the file names and created a list but I couldn't make it work with the code above.
files_Names <- list.files(path = 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data/',
                          pattern = NULL, all.files = FALSE, full.names = FALSE)
Which resulted in this,
"AAPL Analysis of Data.xlsx" for all the files in the list.
You can do the following (note that I'm using the openxlsx package for reading in Excel files, but you can replace that part with readxl of course):
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# Use lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(files_list, function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the global environment.
# I highly recommend NOT doing this; in 99% of cases it's better to keep
# working in the list structure or to combine the individual dfs into one large df.
list2env(list_of_dfs, envir = .GlobalEnv)
I hope I could reproduce your example without your actual files. I would create a function to have more control over the new filename.
I would suggest:
library(purrr)
library(readxl)
library(openxlsx)
target_folder <- 'C:/Users/example/Drive/Desktop/Total_Related_Data/Analysis of Data'
files_list <- list.files(path = target_folder,
                         pattern = "*.xlsx", full.names = TRUE)
tease_out <- function(file) {
  data <- read_excel(file, sheet = "Balance")
  filename <- basename(file) %>% tools::file_path_sans_ext()
  new_filename <- paste0(target_folder, "/", filename, " Balance df.xlsx")
  write.xlsx(data, file = new_filename)
}
map(files_list, tease_out)
Let me know if it works. I assume you are just targeting the sheet "Balance"?

Best way to import multiple .csv files into separate data frames? lapply() [duplicate]

Suppose we have files file1.csv, file2.csv, ... , and file100.csv in directory C:\R\Data and we want to read them all into separate data frames (e.g. file1, file2, ... , and file100).
The reason for this is that, despite having similar names they have different file structures, so it is not that useful to have them in a list.
I could use lapply but that returns a single list containing 100 data frames. Instead I want these data frames in the Global Environment.
How do I read multiple files directly into the global environment? Or, alternatively, how do I unpack the contents of a list of data frames into it?
Thank you all for replying.
For completeness, here is my final answer for loading any number of (tab-)delimited files, in this case with 6 columns of data each, where column 1 is character, column 2 is factor, and the remainder numeric:
## Read files named xyz1111.csv, xyz2222.csv, etc.
filenames <- list.files(path = "../Data/original_data",
                        pattern = "xyz+.*csv")
## Create list of data frame names without the ".csv" part
names <- substr(filenames, 1, 7)
## Load all files
for (i in names) {
  filepath <- file.path("../Data/original_data", paste(i, ".csv", sep = ""))
  assign(i, read.delim(filepath,
                       colClasses = c("character", "factor", rep("numeric", 4)),
                       sep = "\t"))
}
Quick draft, untested:
Use list.files() (aka dir()) to dynamically generate your list of files.
This returns a vector; just run along the vector in a for loop.
Read the i-th file, then use assign() to place the content into a new variable file_i.
That should do the trick for you.
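A minimal sketch of that recipe (my own untested reading of it, using each file's base name rather than a file_i counter, and assuming the csv files sit in the working directory):
# Dynamically list the csv files
filenames <- list.files(pattern = "\\.csv$")
for (f in filenames) {
  # Strip the extension and assign the file's content to that name
  assign(sub("\\.csv$", "", f), read.csv(f))
}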
Use assign with a character variable containing the desired name of your data frame.
for (i in 1:100) {
  oname <- paste("file", i, sep = "")
  assign(oname, read.csv(paste(oname, ".csv", sep = "")))
}
This answer is intended as a more useful complement to Hadley's answer.
While the OP specifically wanted each file read into their R workspace as a separate object, many other people naively landing on this question may think that that's what they want to do, when in fact they'd be better off reading the files into a single list of data frames.
So for the record, here's how you might do that.
# If the path is different than your working directory
# you'll need to set full.names = TRUE to get the full paths.
my_files <- list.files("path/to/files")
# Further arguments to read.csv can be passed through lapply's ...
all_csv <- lapply(my_files, read.csv)
# Set the name of each list element to its respective file name.
# Note full.names = FALSE to get only the file names, not the full path.
names(all_csv) <- gsub(".csv", "",
                       list.files("path/to/files", full.names = FALSE),
                       fixed = TRUE)
Now any of the files can be referred to by all_csv[["filename"]], which really isn't much worse than just having separate filename variables in your workspace, and often it is much more convenient.
Here is a way to unpack a list of data.frames using just lapply:
filenames <- list.files(path = "../Data/original_data",
                        pattern = "xyz+.*csv", full.names = TRUE)
filelist <- lapply(filenames, read.csv)
# If necessary, assign names to the data.frames
names(filelist) <- c("one", "two", "three")
# The invisible() wrapper keeps lapply from printing the data.frames to the console
invisible(lapply(names(filelist), function(x) assign(x, filelist[[x]], envir = .GlobalEnv)))
Reading all the CSV files from a folder and creating objects named after the file names:
setwd("your path to folder where CSVs are")
filenames <- gsub("\\.csv$", "", list.files(pattern = "\\.csv$"))
for (i in filenames) {
  assign(i, read.csv(paste(i, ".csv", sep = "")))
}
A simple way to access the elements of a list from the global environment is to attach the list. Note that this actually creates a new environment on the search path and copies the elements of your list into it, so you may want to remove the original list after attaching to prevent having two potentially different copies floating around.
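A minimal sketch of the attach() approach (my illustration, reusing the filelist object from the lapply answer above):
attach(filelist)   # copies the list's elements into a new environment on the search path
head(one)          # each data frame is now reachable by its bare name
rm(filelist)       # remove the original list so only one copy remains
detach(filelist)   # when finished, drop the attached environment again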
I want to update the answer given by Joran:
# If the path is different than your working directory
# you'll need to set full.names = TRUE to get the full paths.
my_files <- list.files(path = "set your directory here", full.names = TRUE)
# full.names = TRUE is important to add here
# Further arguments to read.csv can be passed through lapply's ...
all_csv <- lapply(my_files, read.csv)
# Set the name of each list element to its respective file name.
# Note full.names = FALSE to get only the file names, not the full path.
names(all_csv) <- gsub(".csv", "",
                       list.files("copy and paste your directory here", full.names = FALSE),
                       fixed = TRUE)
#Now you can create a dataset based on each filename
df <- as.data.frame(all_csv$nameofyourfilename)
A simplified version, assuming your csv files are in the working directory:
listcsv <- list.files(pattern = "\\.csv$")        # vector of csv file names
names <- substr(listcsv, 1, nchar(listcsv) - 4)   # file names without the .csv extension
# Cycle through the names and assign each relevant dataframe using read.csv
for (k in 1:length(listcsv)) {
  assign(names[[k]], read.csv(listcsv[k]))
}
# Copy all the files you want to read into your working directory
a <- dir()
# Remove the ".csv" from each filename (lapply already iterates; no outer loop needed)
list1 <- lapply(a, function(x) gsub(".csv", "", x, fixed = TRUE))
# Final step: read each file and assign it to its name
for (i in list1) {
  assign(i, read.csv(paste(i, ".csv", sep = "")))
}
Use list.files and map_dfr to read many csv files
df <- list.files(data_folder, full.names = TRUE) %>%
  map_dfr(read_csv)
Reproducible example
First write sample csv files to a temporary directory.
It's more complicated than I thought it would be.
library(dplyr)
library(purrr)
library(purrrlyr)
library(readr)
data_folder <- file.path(tempdir(), "iris")
dir.create(data_folder)
iris %>%
  # Keep the Species column in the output:
  # create a duplicate column that will be used as the grouping variable
  mutate(species_group = Species) %>%
  group_by(species_group) %>%
  nest() %>%
  by_row(~write.csv(.$data,
                    file = file.path(data_folder, paste0(.$species_group, ".csv")),
                    row.names = FALSE))
Read these csv files into one data frame.
Note the Species column has to be present in the csv files, otherwise we would lose that information.
iris_csv <- list.files(data_folder, full.names = TRUE) %>%
  map_dfr(read_csv)

loop through .rds files from directory and convert to data frame

I am trying to loop through a list containing the file paths to all .rds files in a folder.
I can easily load one .rds file and then convert it to a data frame, as shown below. However, the issue is how to load all the files and convert each .rds file into a separate dataframe. I suspect a for-loop using file_path as input is necessary.
user_x <- readRDS("/Users/marcoliedecke/Desktop/thesis/data/VK_Data/75315975_VK_user.rds") # load data
user_x_df <- as.data.frame(user_x) # convert to dataframe
file_path <- list.files(".../VK_Data", pattern="*.rds", full.names=TRUE)
print(file_path)
[1] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/103656622_VK_user.rds"
[2] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/11226063_VK_user.rds"
[3] "/Users/marcoliedecke/Desktop/thesis/data/VK_Data/112552215_VK_user.rds"
(...)
Yes you have the right idea. You can make a for-loop to do this. Since you have your vector of file paths, you could do something like I do below. Note that I don't know how you want your data frames named, but this will read everything in.
library(tidyverse)
## Counter used to number the data frames
file_number <- 1
for (file in file_path) {
  name <- paste0("df_", file_number) ## getting a new name ready
  x <- readRDS(file)                 ## reading in the file
  assign(name, x)                    ## assigning the data the name we created two lines up
  file_number <- file_number + 1
}
This gives you as many data frames named df_* as there are files in your file_path vector.
You could then append all of them together (assuming they have the same column names and column types) using this:
full_data <- mget(ls(pattern = "^df_")) %>%
  reduce(bind_rows)
In the above code, the ls(pattern = "^df_") call returns the names of all of the objects in your global environment that start with "df_". The mget function will grab these for you as a list, and the reduce(bind_rows) should append them all together.

Iterating over CSVs to different dataframes based on file names

I have a dataframe that contains the names of a bunch of .CSV files. It looks like the snippet below:
I'm trying to convert each of these .CSVs into a dataframe and append the results, creating three different dataframes based on what's in the file names:
Create a dataframe with all results from .CSV files with -callers- in its file name
Create a dataframe with all results from .CSV files with -results in its filename
Create a dataframe with all results from .CSV files with -script_results- in its filename
The command to actually convert a .CSV file into a dataframe looks like this, using the first .CSV in the dataframe below:
data <- aws.s3::s3read_using(read.csv, object = "s3://abc-testtalk/08182020-testpilot-arizona-results-08-18-2020--08-18-2020-168701001.csv")
But what I'm trying to do is:
Iterate ALL the .csv files under Key using the s3read_using function
Put them in three separate dataframes based on the file names as listed above
Key
08182020-testpilot-arizona-results-08-18-2020--08-18-2020-168701001.csv
08182020-testpilot-arizona-results-08-18-2020--08-18-2020-606698088.csv
08182020-testpilot-arizona-script_results-08-18-2020--08-18-2020-114004469.csv
08182020-testpilot-arizona-script_results-08-18-2020--08-18-2020-450823767.csv
08182020-testpilot-iowa-callers-08-18-2020-374839084.csv
08182020-testpilot-maine-callers-08-18-2020-396935866.csv
08182020-testpilot-maine-results-08-18-2020--08-18-2020-990912614.csv
08182020-testpilot-maine-script_results-08-18-2020--08-18-2020-897037786.csv
08182020-testpilot-michigan-callers-08-18-2020-367670258.csv
08182020-testpilot-michigan-follow-ups-08-18-2020--08-18-2020-049435266.csv
08182020-testpilot-michigan-results-08-18-2020--08-18-2020-544974900.csv
08182020-testpilot-michigan-script_results-08-18-2020--08-18-2020-239089219.csv
08182020-testpilot-nevada-callers-08-18-2020-782329503.csv
08182020-testpilot-nevada-results-08-18-2020--08-18-2020-348644934.csv
08182020-testpilot-nevada-script_results-08-18-2020--08-18-2020-517037762.csv
08182020-testpilot-new-hampshire-callers-08-18-2020-134150800.csv
08182020-testpilot-north-carolina-callers-08-18-2020-739838755.csv
08182020-testpilot-pennsylvania-callers-08-18-2020-223839956.csv
08182020-testpilot-pennsylvania-results-08-18-2020--08-18-2020-747438886.csv
08182020-testpilot-pennsylvania-script_results-08-18-2020--08-18-2020-546894204.csv
08182020-testpilot-virginia-callers-08-18-2020-027531377.csv
08182020-testpilot-virginia-follow-ups-08-18-2020--08-18-2020-419338697.csv
08182020-testpilot-virginia-results-08-18-2020--08-18-2020-193170030.csv
Create 3 empty dataframes. You will probably also need to indicate column names matching the column names from the files you want to append:
results <- data.frame()
script_results <- data.frame()
callers <- data.frame()
Then iterate over file_name and read each file into the data object. Conditionally on which pattern ("-results-", "-script_results-" or "-callers-") is contained in the name of each file, it will be appended to the correct dataframe:
for (file in file_name) {
  data <- aws.s3::s3read_using(read.csv, object = paste0("s3://abc-testtalk/", file))
  if (grepl("-results-", file)) { results <- rbind(results, data) }
  if (grepl("-script_results-", file)) { script_results <- rbind(script_results, data) }
  if (grepl("-callers-", file)) { callers <- rbind(callers, data) }
}
As an alternative to #JohnFranchak's recommendation for map_dfr (which likely works just fine), the method that I referenced in comments would look something like this:
alldat <- lapply(setNames(nm = dat$file_name),
                 function(obj) aws.s3::s3read_using(read.csv, object = obj))
callers <- do.call(rbind, alldat[grepl("-callers-", names(alldat))])
results <- do.call(rbind, alldat[grepl("-results-", names(alldat))])
script_results <- do.call(rbind, alldat[grepl("-script_results-", names(alldat))])
others <- do.call(rbind, alldat[!grepl("-(callers|results|script_results)-", names(alldat))])
The do.call(rbind, ...) part is analogous to dplyr::bind_rows and data.table::rbindlist in that it accepts a list of frames, and the result is a single frame. Some differences, illustrated with a toy sketch after this list:
do.call(rbind, ...) really requires all columns to exist in all frames; it errors otherwise. It's not hard to enforce this externally (e.g., adding missing columns, rearranging), but it's not automatic.
data.table::rbindlist will complain for the same conditions (missing columns or different order), but it has fill= and use.names= arguments that need to be set TRUE.
dplyr::bind_rows will fill and row-bind by-name by default, without message or warning. (I don't agree that a default of silence is good all of the time, but it is the simplest.)
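A toy sketch of those differences (hypothetical frames, not from the question):
frames <- list(data.frame(a = 1, b = 2), data.frame(b = 3, c = 4))
do.call(rbind, frames)                                        # error: columns don't match
data.table::rbindlist(frames, use.names = TRUE, fill = TRUE)  # fills missing a/c with NA
dplyr::bind_rows(frames)                                      # same fill behavior, silently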
Lastly, my use of setNames(nm=..) is merely to assign the filename to each object. This is not strictly necessary since we still have dat$file_name, but I've found that with two separate objects, it is feasible to accidentally change (delete, append, or reorder) one of them and not the other, so I prefer to keep the names and the objects (frames) perfectly tied together. These two calls are relatively the same in the resulting named-list:
lapply(setNames(nm = dat$file_name), ...)
sapply(dat$file_name, ..., simplify = FALSE)

Use R to write multiple sheets in excel with dynamic sheetNames

This can easily be done using a for loop, but I am looking for a solution with lapply or dplyr.
I have multiple dataframes which I want to export to excel file in separate sheets (I saw many questions and answers on similar lines but couldn't find one that addressed naming sheets dynamically).
I want to name the sheet by the name of the dataframe. For simplicity, I have named the dataframes in a pattern (say df1 to df10). How do I do this?
Below is a reproducible example of my attempt with two dataframes, mtcars and cars (which works, but without good sheet names).
names_of_dfs <- c('mtcars', 'cars')
# The variable 'combined' below will have all dfs stored separately in it
combined <- lapply(as.list(names_of_dfs), get)
names(combined) <- names_of_dfs  # naming the list, but unable to use the names below
multi.export <- function(df, filename) {
  return(xlsx::write.xlsx(df, file = filename,
                          sheetName = paste0('sheet', sample(c(1:200), 1)),
                          append = TRUE))
}
lapply(combined, function(x) multi.export(x, filename = 'combined.xlsx'))
In case it can be done more easily with some other R package, please do suggest.
Here's an approach with writexl:
library(writexl)
write_xlsx(setNames(lapply(names_of_dfs, get), names_of_dfs),
           path = "Test.xlsx")
We need to use setNames because the names of the sheets are set from the list names. Otherwise, the unnamed list that lapply returns will result in default sheet names.
Try something like this:
library(xlsx)
# Workbook
wb <- createWorkbook()
# Lapply over the list names to create one sheet per data frame
lapply(names(combined), function(s) {
  sht <- createSheet(wb, s)
  addDataFrame(combined[[s]], sht)
})
saveWorkbook(wb, "combined.xlsx")
