How to append data to an existing xlsx sheet in R?

I'm trying to read an xlsx file and write a data frame into the same sheet of that file without removing the rows that are already there.
I tried the XLConnect package (library("XLConnect")) and its appendWorksheet() function, but the data is not written in the correct place. I also tried the xlsx package (library("xlsx")), but I can't find a function similar to appendWorksheet().
I just want to read my xlsx file and write the data from my data frame into the same file without removing the previous rows.

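There doesn't seem to be a great way to append data to an xlsx file with the xlsx package. You can create a function that reads in the sheet, appends the new rows to it, and then overwrites the xlsx file.
library(xlsx)
# Read the existing sheet, append the new rows, and overwrite the file.
# Note: write.xlsx2() without append = TRUE replaces the whole workbook,
# so any other sheets in the file are lost.
appendxlsx <- function(df2, path, sheetName) {
  df1 <- read.xlsx2(path, sheetName = sheetName)
  colnames(df2) <- colnames(df1)  # align names so rbind() matches columns
  df <- rbind(df1, df2)
  write.xlsx2(df, path, sheetName = sheetName, row.names = FALSE)
}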
Then you would just call the function, supplying the data frame of new rows (df2), the path to the xlsx file, and the sheet name. Something like this:
appendxlsx(df2=df2, "/path/to/xlsx", sheetName = "Sheet1")
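If you need to keep the other sheets in the workbook intact, another option is to write only the new rows below the existing data instead of rewriting the whole file. The sketch below swaps in the openxlsx package rather than xlsx, and `append_rows()` is a hypothetical helper name. It assumes the sheet has a single header row in row 1 and no blank rows inside the data (openxlsx's read.xlsx() skips empty rows by default, which would throw off the row count).

```r
library(openxlsx)

# Hypothetical helper: append the rows of `new_rows` below the existing data
# in `sheet` of the workbook at `path`, leaving all other sheets untouched.
# Assumes one header row in row 1 and no blank rows inside the existing data.
append_rows <- function(new_rows, path, sheet) {
  wb <- loadWorkbook(path)
  existing <- read.xlsx(wb, sheet = sheet)
  # Header is row 1 and data fills rows 2..(nrow + 1), so the next free row is nrow + 2
  start_row <- nrow(existing) + 2
  writeData(wb, sheet = sheet, x = new_rows,
            startRow = start_row, colNames = FALSE)
  saveWorkbook(wb, path, overwrite = TRUE)
  invisible(start_row)
}

# Demo on a temporary workbook with two sheets
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Sheet1")
addWorksheet(wb, "Other")
writeData(wb, "Sheet1", data.frame(id = 1:3, name = c("a", "b", "c")))
writeData(wb, "Other", data.frame(note = "keep me"))
saveWorkbook(wb, path)

append_rows(data.frame(id = 4:5, name = c("d", "e")), path, "Sheet1")
out <- read.xlsx(path, sheet = "Sheet1")
```

Because the workbook is loaded and saved as a whole, the "Other" sheet survives, which is the main difference from rewriting the file with write.xlsx2().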

Related

Merging Excel sheets (12 sheets, each with more than 140k rows) into one data frame, then exporting it to CSV/txt

I am a newbie and need help with manipulating the data I have.
I have an Excel workbook with 12 sheets, each with approximately 140k rows.
Is it possible to combine them all into one in R and then export the result to csv or txt, please?
Thank you
I tried using readxl and the tidyverse:
library(readxl)
library(dplyr)
path <- "C:/data"
setwd(path)
sheet <- excel_sheets("df.xlsx")
Data <- lapply(setNames(sheet, sheet), function(x) read_excel("df.xlsx", sheet = x))
Data <- bind_rows(Data, .id = "sheet")
lapply(Data, function(x) write.table(Data(x), 'data0.csv', append = TRUE, sep = ','))
But I still don't get a single file containing all the sheets combined.
Here's a simple approach using data.table and openxlsx. I'm not sure about the full structure of your data, but you can easily perform other operations when reading in the data (if needed) before combining it all and writing it to an output file.
library(data.table)
library(openxlsx)
file <- 'my_file.xlsx' # full path and name of your file
sheet_names <- getSheetNames(file = file)
# loop through sheet names to read data
data_list <- lapply(sheet_names, function(z) {
  dat <- as.data.table(read.xlsx(xlsxFile = file, sheet = z))
  dat$sheet <- z # added to check which sheet the data was retrieved from
  # other operations could be added here, e.g. any sheets that contain
  # "raw" in the name need additional calculations
  return(dat)
})
# bind all data together; fill = TRUE pads columns missing from some sheets with NA
data_combined <- rbindlist(l = data_list, use.names = TRUE, fill = TRUE)
# write to a csv file: 12 sheets x ~140k rows exceeds Excel's limit of 1,048,576 rows per sheet
fwrite(x = data_combined, file = 'new_file_name.csv')
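For comparison, the readxl/dplyr attempt in the question was close: reading every sheet into a named list and calling bind_rows() with the `.id` argument (note the leading dot) already yields one combined data frame, so it only needs a single write at the end instead of the lapply()/append loop. A minimal sketch, assuming readxl and dplyr are installed; `sheets_to_csv()` is a hypothetical helper name.

```r
library(readxl)
library(dplyr)

# Hypothetical helper: stack every sheet of `xlsx_path` into one data frame,
# with a `sheet` column recording where each row came from, and write it
# to a single CSV file.
sheets_to_csv <- function(xlsx_path, csv_path) {
  sheets <- excel_sheets(xlsx_path)
  data <- lapply(setNames(sheets, sheets),
                 function(s) read_excel(xlsx_path, sheet = s))
  combined <- bind_rows(data, .id = "sheet")
  # One write of the combined data, no append loop needed
  write.csv(combined, csv_path, row.names = FALSE)
  invisible(combined)
}
```

Usage would be something like `sheets_to_csv("C:/data/df.xlsx", "C:/data/data0.csv")`.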

read.xlsx2 | Skipping if sheetName does not exist

I am trying to read multiple Excel files in a folder using the read.xlsx2 function. I only need to read a particular sheet titled 'Returns' or 'Prices'.
Is there a way I can give an 'OR' argument to the function and also skip a file if it contains neither of the sheets?
P.S.: Each file has either a 'Returns' sheet, a 'Prices' sheet, or neither, but never both, so there cannot be a clash.
Thanks
You could read all the sheet names in each file, use intersect() to select whichever of 'Returns' or 'Prices' is present, and then read the file using that sheet.
Using readxl you can do this as follows:
library(readxl)
all_files <- list.files(pattern = '\\.xlsx$')
result <- lapply(all_files, function(x) {
  all_sheets <- excel_sheets(x)
  correct_sheet <- intersect(all_sheets, c('Returns', 'Prices'))
  # files with neither sheet fall through and return NULL
  if (length(correct_sheet)) read_xlsx(x, correct_sheet)
})
result will be a list of data frames, with NULL for any file that had neither sheet. If the data frames all have the same column names, you can combine them into one data frame with do.call(rbind, result).
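Because files with neither sheet produce NULL entries, it can help to drop them explicitly and record which file each data frame came from before combining. A minimal base-R sketch: the toy `result` list and its file names are hypothetical stand-ins for the output of the lapply() call above, and it assumes every non-NULL data frame has the same columns.

```r
# Stand-in for the list returned by lapply(all_files, ...) above:
# one element per file, NULL where neither sheet was found.
result <- list(
  data.frame(date = c("2020-01", "2020-02"), value = c(1.5, 2.5)),
  NULL,
  data.frame(date = "2020-03", value = 3.5)
)
# In the real code this would be names(result) <- all_files
names(result) <- c("fund_a.xlsx", "fund_b.xlsx", "fund_c.xlsx")  # hypothetical names

# Drop files that had neither a 'Returns' nor a 'Prices' sheet
found <- Filter(Negate(is.null), result)

# Add the source file as a column, then stack everything into one data frame
tagged <- Map(function(df, file) cbind(file = file, df), found, names(found))
combined <- do.call(rbind, unname(tagged))
```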

Export data frames in list to xlsx with named sheets

I need to create an xlsx file from my list of data frames. I came across this openxlsx solution (see below, or the 5th answer by Syed). However, my list has 51 named data frames; what changes do I need to make to the code below for a long list of data frames? The Excel file I create will not open.
require(openxlsx)
list_of_datasets <- list("Name of DataSheet1" = dataframe1, "Name of Datasheet2" = dataframe2)
write.xlsx(list_of_datasets, file = "writeXLSX2.xlsx")
I ran write.xlsx(listname, file = ""), and the command ran successfully and created an xlsx file, but opening it throws the error "microsoft excel unable to open file because it is corrupt". I tried shortening the list to 1-2 data frames, but it still won't open.
EDIT: SOLVED BY HACK for now
#extract all data frames
list2env(soup ,.GlobalEnv)
#reassign names and form new list
list_of_datasets1 <- list("filename"=dataframe,.....)
#write new list
write.xlsx(list_of_datasets1, file = "template.xlsx")
One way to create a workbook with multiple named worksheets is to use createWorkbook(), addWorksheet(), writeDataTable(), and saveWorkbook() (in that order) instead of write.xlsx(). Here is an example that generates worksheets based on a list of data frames that I create with random data.
library(openxlsx)
id <- 1:5
# create data frames
aList <- lapply(id, function(x) {
  # generate output as list so we can use id as index to worksheets
  list(data.frame(matrix(runif(50), nrow = 10, ncol = 5)), x)
})
# initialize a workbook
wb <- createWorkbook("Workbook")
# add worksheets to workbook (the Workbook object is modified in place)
lapply(aList, function(x) {
  addWorksheet(wb, paste("worksheet", x[[2]]))
  writeDataTable(wb, paste("worksheet", x[[2]]), x[[1]])
})
# save workbook to disk once all worksheets and data have been added
# (the ./data directory must already exist)
saveWorkbook(wb, file = "./data/newWorkbook.xlsx", overwrite = TRUE)
The resulting workbook contains 5 tabs, one per data frame.
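With the xlsx package, you can also use append = TRUE to add new sheets to an existing file and name them as you want (the sheet name must not already exist in the file):
library(xlsx)
write.xlsx(datatable, file = "File.xlsx", sheetName = "sheet1", append = TRUE, row.names = FALSE)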
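One possible cause of the "corrupt file" error that is worth ruling out (an assumption; the thread does not confirm it) is invalid sheet names: Excel limits a sheet name to 31 characters, forbids the characters : \ / ? * [ ], and requires names to be unique within a workbook. With 51 named data frames, long or duplicated names are easy to hit. A minimal base-R sketch that cleans the list names before writing; `clean_sheet_names()` is a hypothetical helper.

```r
# Hypothetical helper: make list names usable as Excel sheet names.
# Excel rules assumed here: at most 31 characters, none of : \ / ? * [ ],
# and no two sheets with the same name (compared case-insensitively).
clean_sheet_names <- function(nms) {
  # Replace forbidden characters with "_"
  for (ch in c(":", "\\", "/", "?", "*", "[", "]")) {
    nms <- gsub(ch, "_", nms, fixed = TRUE)
  }
  nms <- substr(nms, 1, 31)  # enforce the 31-character limit
  # De-duplicate by appending _1, _2, ... while staying within 31 characters
  k <- 0
  while (any(dup <- duplicated(tolower(nms)))) {
    for (i in which(dup)) {
      k <- k + 1
      suffix <- paste0("_", k)
      nms[i] <- paste0(substr(nms[i], 1, 31 - nchar(suffix)), suffix)
    }
  }
  nms
}
```

You would apply it with `names(list_of_datasets) <- clean_sheet_names(names(list_of_datasets))` before calling write.xlsx(list_of_datasets, file = ...).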

Writing data to formatted xlsx sheets using XLConnect

I can use split() and the code below to split my data frame by column values (DF$Name) and write each piece out to an individual CSV file.
DF <- read.csv("~/Downloads/DataDownload_2012.csv")
# one data frame per value of Name
DFNames <- split(DF, DF$Name)
for (name in names(DFNames)) {
  fn <- paste0("Expenses/", gsub(" ", "", name), ".CSV")
  write.csv(DFNames[[name]], fn, row.names = FALSE)
}
But I would like to write them out into pre-formatted Excel files. I can use XLConnect and the code below to write my data frame into an Excel file, but I can't do it for multiple files based on the column variable.
library(XLConnect)
wb <- loadWorkbook("Income.xlsx")
xldf <- readWorksheet(wb, sheet = getSheets(wb)[1])
sheet_name <- "Data"
renameSheet(wb, sheet = getSheets(wb)[1], newName = sheet_name)
writeWorksheet(wb, xldf, sheet = getSheets(wb)[1], startRow = 2, header = FALSE)
saveWorkbook(wb, "income_data.xlsx")
Help?
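One way to approach this, sketched here with the openxlsx package swapped in for XLConnect, is to load a fresh copy of the formatted template for each group, write that group's rows below the template's header row, and save it under a per-group file name. It assumes the template's first sheet has its header in row 1 (matching startRow = 2 in the question) and that writing values into already-styled cells keeps their formatting, which is worth checking against your own template. `write_groups_to_template()` and the output file naming are hypothetical.

```r
library(openxlsx)

# Hypothetical helper: write one workbook per value of `group_col`,
# each starting from the formatted `template` (header in row 1 of sheet 1).
write_groups_to_template <- function(df, group_col, template, out_dir) {
  pieces <- split(df, df[[group_col]])
  paths <- character(0)
  for (name in names(pieces)) {
    wb <- loadWorkbook(template)  # fresh copy of the template for every group
    writeData(wb, sheet = 1, x = pieces[[name]],
              startRow = 2, colNames = FALSE)
    out <- file.path(out_dir, paste0(gsub(" ", "", name), ".xlsx"))
    saveWorkbook(wb, out, overwrite = TRUE)
    paths <- c(paths, out)
  }
  invisible(paths)
}

# Demo: build a tiny stand-in template with a bold header row
template <- tempfile(fileext = ".xlsx")
tpl <- createWorkbook()
addWorksheet(tpl, "Data")
writeData(tpl, "Data", as.data.frame(t(c("Name", "Amount"))), colNames = FALSE)
addStyle(tpl, "Data", createStyle(textDecoration = "bold"),
         rows = 1, cols = 1:2, gridExpand = TRUE)
saveWorkbook(tpl, template)

# Toy data split by Name into Food.xlsx and Rent.xlsx
DF <- data.frame(Name = c("Rent", "Food", "Rent"), Amount = c(900, 120, 950))
out_dir <- tempdir()
paths <- write_groups_to_template(DF, "Name", template, out_dir)
```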

How do I append data from a data frame in R to an Excel sheet that already exists

I have created dozens of data frames in R and would like to append them all to one sheet in an Excel file.
Here are two of the pages I have looked at in an attempt to find an answer (I don't have 10 reputation, so I can't paste all four URLs I have visited):
Write data to Excel file using R package xlsx
The author says: "You can also add the dataframes to a particular starting place in the sheet using the startRow and startCol arguments to the addDataFrame function."
Here is the suggested code:
workbook.sheets
workbook.test
addDataFrame(x = sample.dataframe, sheet = workbook.test,
             row.names = FALSE, startColumn = 4) # write data to sheet starting on line 1, column 4
saveWorkbook(workbook.sheets, "test.excelfile.xlsx") # and of course you need to save it.
Based on this suggestion, this was my attempt in RStudio:
addDataFrame(df_fl1, sheet = "AllData2.xlsx", startRow = 712)
This was R's output:
Error in sheet$getWorkbook : $ operator is invalid for atomic vectors
I've also tried this page:
Tutorial on Reading and Importing Excel Files into R
"If, however, you want to write the data frame to a file that already exists, you can execute the following command:"
write.xlsx(df,
           "<name and extension of your existing file>",
           sheetName = "Data Frame",
           append = TRUE)
write.xlsx(df_fl3, "AllData2.xlsx", sheetName="Salinity1", append=TRUE)
I tried this code and it overwrote the data that was already in the sheet. How can I append data from the data frames into an Excel sheet?
Appending to an existing Excel worksheet is a bit of a pain. Instead, read all of your Excel data files into R, combine them within R, and then write the single combined data frame to a new Excel file (or write to a csv file if you don't need the data to be in an Excel workbook). See code below for both the easy way and the hard way.
Easy Way: Do all the work in R and save a single combined data frame at the end
For example, if all of your Excel data files are in the current working directory and the first worksheet in each Excel file contains the data, you could do the following:
library(xlsx)
# Get file names
file.names = list.files(pattern="xlsx$")
# Read them into a list
df.list = lapply(file.names, read.xlsx, sheetIndex=1, header=TRUE)
Then combine them into a single data frame and write to disk:
df = do.call(rbind, df.list)
write.xlsx(df, "combinedData.xlsx", sheetName="data", row.names=FALSE)
Hard Way: Append successive data frames to a pre-existing Excel worksheet
Create a list of data frames that we want to write to Excel (as discussed above, in your actual use case, you'll read your data files into a list in R). We'll use the built-in iris data frame for illustration here:
df.list = split(iris, iris$Species)
To write each data frame to a single Excel worksheet, first, create an Excel workbook and the worksheet where we want to write the data:
wb = createWorkbook()
sheet = createSheet(wb, "data")
# Add the first data frame
addDataFrame(df.list[[1]], sheet=sheet, row.names=FALSE, startRow=1)
Now append all of the remaining data frames using a loop. Increment startRow each time so that the next data frame is written in the correct location.
# header plus the first data frame occupy rows 1 to nrow + 1
startRow = nrow(df.list[[1]]) + 2
for (i in seq_along(df.list)[-1]) {
  addDataFrame(df.list[[i]], sheet=sheet, row.names=FALSE, col.names=FALSE,
               startRow=startRow)
  startRow = startRow + nrow(df.list[[i]])
}
Save the workbook:
saveWorkbook(wb, "combinedData.xlsx")
addDataFrame is useful if you want to lay out various summary tables in various parts of an Excel worksheet and make it all look nice for presentation. However, if you're just combining raw data into a single data file, I think it's a lot easier to do all the work in R and then just write the combined data frame to an Excel worksheet (or csv file) at the end.
To get around the original error that you mentioned:
Error in sheet$getWorkbook : $ operator is invalid for atomic vectors
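The error occurs because addDataFrame() expects a sheet object, not a file name. You can load the workbook first and pass one of its sheets:
wb <- loadWorkbook("<name and extension of your existing file>")
addDataFrame(df, getSheets(wb)[["<sheetname>"]], startRow = 712)
saveWorkbook(wb, "<filename>")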
