Lapply with read_csv - r

I am having trouble understanding lapply with read_csv function. The question is if Lapply creates an array of dataframes where I can access each dataframe using data[i]?
What I did:
I have downloaded the 5 cities data set (found here: https://archive.ics.uci.edu/ml/machine-learning-databases/00394/FiveCitiePMData.rar) and wrote R code to extract the 5 csv files and save to a dataframe as follows:
cities <- list.files('FiveCities')
cities_df <- lapply(cities, read.csv)
My goal was to create a workbook and save each of the csv files into an xlsx file with each csv being a sheet in the workbook as follows:
wb <- createWorkbook()
for(i in 1:length(cities)){
sheet <- addWorksheet(wb , i)
writeData(wb, sheet, cities_df[i])
}
What I am confused on is accessing each csv like this cities_df[i]. I thought cities_df[i] accesses the ith row of the dataframe and not a separate dataframe as a whole. Does lapply create an array of dataframes called cities_df[i] or what happens? If it does create an array then how come I can simply call cities_df and receive a result without specifying which dataframe in the array to call?

Here is complete code to create the Excel workbook and save it to a file FiveCities/cities.xlsx.
cities <- list.files('FiveCities', full.names = TRUE)
cities_df <- lapply(cities, read.csv)
names(cities_df) <- sub("\\.csv", "", basename(cities))
wb <- createWorkbook()
for(i in names(cities_df)){
sheet <- addWorksheet(wb , i)
writeData(wb, i, cities_df[[i]])
}
saveWorkbook(wb, file = "FiveCities/cities.xlsx")

This code may help!
library(plyr)
library(readr)
library(tidyverse)
library(openxlsx)
mydir = "C:/Users/mouad/Desktop/assasins creed/new"
myfiles = list.files(path=mydir, pattern="*.csv", full.names=TRUE)
str_length(mydir)
mylist=lapply(1:5, function(j) read_csv(myfiles[[j]]))
setwd(mydir)
wb <- createWorkbook()
lapply(1:length(mylist), function(i){
addWorksheet(wb=wb, sheetName = substr(myfiles[i],str_length(mydir)+1,60))
writeData(wb, sheet = i, mylist[[i]][length(mylist[[i]])])
})
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)
read.xlsx("test.xlsx", sheet = 1)

Related

Reading xlsx with multiple sheets in R for duplication removal

I have a excel file which has multiple sheets embedded in it. My main goal is to basically remove all rows which are appearing multiple times in a single sheet and have to do this for every sheet.
I have written the code below but the code is only reading the first sheet and also giving ' ...' in first row and column. Can someone help me out where I might be going wrong. Thank you in advanced
**config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1]
count_of_xl_sheets <- length(config_xl_sheets_name)
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)**
this is my approach
library(readxl)
library(data.table)
library(openxlsx)
file.to.read <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)
# read sheets from the file to a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
data <- setDT(readxl::read_excel(file.to.read, sheet = x))
#remove puplicates
data[!duplicated(data), ]
})
# create a new workbook
wb <- createWorkbook()
# create new worksheets an write to them
for (i in seq.int(L)) {
addWorksheet(wb, sheets.to.read[i])
writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")

R. Appending list to list and export each list to individual excel sheets in one excel workbook

I'm trying to understand why part of my appended list is getting chopped off when exporting to excel. I can separate a dataframe by a grouping variable into separate lists:
data(iris)
split_tibble <- function(tibble, col = 'col') tibble %>% split(., .[, col])
spliris = split_tibble(iris,'Species') #creates list for each species having all variables
I have a separate list that looks like this:
mylist = list(cbind(col1 = c("val1","val2","val3","val4"),col2 = c("A","B","C","D")))
names(mylist) = "Table"
And I combine them into one list:
newlist = c(mylist,spliris) #looks correct so far
And I write out to an excel wb
#Create workbook with a sheet for each list element
library(openxlsx)
wb <- createWorkbook()
lapply(seq_along(newlist), function(i){
addWorksheet(wb=wb, sheetName = names(newlist[i]))
writeData(wb, sheet = i, newlist[[i]][-length(newlist[[i]])])
})
But when I save the workbook, the first sheet "Table" is incomplete and the two columns are just a single column.
Why does this happen? If I do not append the lists together and just write out the iris-lists it works perfectly:
#This works
wb <- createWorkbook()
lapply(seq_along(spliris), function(i){
addWorksheet(wb=wb, sheetName = names(spliris[i]))
writeData(wb, sheet = i, spliris[[i]][-length(spliris[[i]])])
})

How to skip/handle an empty-sheet/No-sheet in R Excel Automation?

I am trying to merge/bind a huge set of data. The code for that is written and working fine. The problem comes while trying to combine data where there is no sheet present. Is there any ways by which I can skip the error?
library(tidyverse)
library(xlsx)
files <- list.files(pattern="*.xlsx")
read_Sheet_1 <- lapply(files, readxl::read_excel, sheet = "Sheet 1")
Sheet_1 = do.call(rbind, read_Sheet_1)
read_Sheet_2 <- lapply(files, readxl::read_excel, sheet = "Sheet 2")
Sheet_2 = do.call(rbind, read_Sheet_2)
read_Sheet_3 <- lapply(files, readxl::read_excel, sheet = "Sheet 3")
Sheet_3 = do.call(rbind, read_Sheet_3)
write.xlsx(as.data.frame(Sheet_1), file="Final.xlsx", sheetName="Sheet_1", row.names=FALSE)
write.xlsx(as.data.frame(Sheet_2), file="Final.xlsx", sheetName="Sheet_2", append=TRUE, row.names=FALSE)
write.xlsx(as.data.frame(Sheet_3), file="Final.xlsx", sheetName="Sheet_3", append=TRUE, row.names=FALSE)
Expected Result: Merged rows from each sheets in one Final.xlsx file
Actual Result: Even though the functions merges the rows together. Error comes if there is just 2 sheets in one of the file. Example: File3.xlsx has just "Sheet 1" and "Sheet 3" in it, not "Sheet 2". So this will throw an error for entire "Sheet_2" data frame.
You may use safely from the purrr package:
library(tidyverse)
library(xlsx)
files <- list.files(pattern="*.xlsx")
read_excel_safe <- function(file, sheet) {
read_excel_safely <- safely(readxl::read_excel, otherwise = NULL)
read_excel_safely(file, sheet = sheet)$result
}
Sheet_1 <- files %>%
map(.f = read_excel_safe, sheet = 'Sheet 1') %>%
reduce(rbind)
The function read_excel_safely will return a named list with error and results. If there is an error, result will be NULL. And it should affect the rbind when performing that reduce step.

Saving Excel files in loop in r

I have data frame and created a subset of it. I split the data frame and its
subset by a variable factors. I want to save it in excel file. I want to
write a loop to create multiple excel files data frame and subset files are
in sheets by a variable factor.
I had written a code its just saving the last kind of variable workbook.
How to create all the workbooks.
rm(list = ls())
mtcars
split_mtcars <- split(mtcars, mtcars$cyl)
split_mtcars_subset <- split(mtcars[,2:4], mtcars$cyl)
cyl_type <- names(split_mtcars)
for(i in length(cyl_type)){
wb <- createWorkbook()
addWorksheet(wb, "raw")
addWorksheet(wb, "subset")
writeData(wb, 1, split_mtcars[[i]])
writeData(wb, 2, split_mtcars_subset[[i]])
saveWorkbook(wb, file = paste0(cyl_type[i],".xlsx"), overwrite = TRUE)
}
Thanks In advance
Consider by to split your data frame by factor(s) to avoid the need of intermediate objects and hide the loop. Below outputs your workbook and builds a list of data frames.
split_mtcars <- by(mtcars, mtcars$cyl, function(sub) {
wb <- createWorkbook()
addWorksheet(wb, "raw")
addWorksheet(wb, "subset")
writeData(wb, 1, sub)
writeData(wb, 2, sub[,2:5])
saveWorkbook(wb, file = paste0(sub$cyl[1],".xlsx"), overwrite = TRUE)
return(sub) # TO REPLICATE split()
})

List of data.frame's to individual excel worksheets - R

I have a list of data.frame's that I would like to output to their own worksheets in excel. I can easily save a single data frame to it's own excel file but I'm not sure how to save multiple data frames to the their own worksheet within the same excel file.
library(xlsx)
write.xlsx(sortedTable[1], "c:/mydata.xlsx")
Specify sheet name for each list element.
library(xlsx)
file <- paste("usarrests.xlsx", sep = "")
write.xlsx(USArrests, file, sheetName = "Sheet1")
write.xlsx(USArrests, file, sheetName = "Sheet2", append = TRUE)
Second approach as suggested by #flodel, would be to use addDataFrame. This is more or less an example from the help page of the said function.
file <- paste("usarrests.xlsx", sep="")
wb <- createWorkbook()
sheet1 <- createSheet(wb, sheetName = "Sheet1")
sheet2 <- createSheet(wb, sheetName = "Sheet2")
addDataFrame(USArrests, sheet = sheet1)
addDataFrame(USArrests * 2, sheet = sheet2)
saveWorkbook(wb, file = file)
Assuming you have a list of data.frames and a list of sheet names, you can use them pair-wise.
wb <- createWorkbook()
datas <- list(USArrests, USArrests * 2)
sheetnames <- paste0("Sheet", seq_along(datas)) # or names(datas) if provided
sheets <- lapply(sheetnames, createSheet, wb = wb)
void <- Map(addDataFrame, datas, sheets)
saveWorkbook(wb, file = file)
Here's the solution with openxlsx:
## create data;
dataframes <- split(iris, iris$Species)
# create workbook
wb <- createWorkbook()
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, dataframes, names(dataframes))
## Save workbook to excel file
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)
.. however, openxlsx is also able to use it's function openxlsx::write.xlsx for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map() is if you want to format the sheets in a specific way.
The following code works perfectly, which is from:https://rpubs.com/gbganalyst/RdatatoExcelworkbook
packages <- c("openxlsx", "readxl", "magrittr", "purrr", "ggplot2")
if (!require(install.load)) {
install.packages("install.load")
}
install.load::install_load(packages)
list_of_mydata
write.xlsx(list_of_mydata, "Excel workbook.xlsx")
lets say your list of data frames is called Lst and that the workbook you want to save to is called wb.xlsx. Then you can use:
library(xlsx)
counter <- 1
for (i in length(Lst)){
write.xlsx(x=Lst[[i]],file="wb.xlsx",sheetName=paste("sheet",counter,sep=""),append=T)
counter <- counter + 1
}
G
I think the simplest solution is still missing. Using the writexl package you can write a list of data frames easily:
list_of_dfs <- list(iris, iris)
writexl::write_xlsx(list_of_dfs, "output.xlsx")
Also, if you have a named list then those names become the sheet names:
names(list_of_dfs) <- c("a", "b")
writexl::write_xlsx(list_of_dfs, "output.xlsx")
Alternatively, the rio package allows for more export control and the syntax and handling of named lists is similar:
rio::export(list_of_dfs, "output.xlsx")
You can also easily output them to their own workbooks as well.

Resources