Import tabbed spreadsheet into list in R - r

My data exists as a tabbed spreadsheet, and I'm trying to write a script to import it.
library(readxl)
oput <- 0
tabnames <- excel_sheets("dataset.xlsx")
for(x in seq_along(tabnames)){
assign(tabnames[x], read_excel("dataset.xlsx", sheet = tabnames[x], col_names = TRUE))
}
This works, giving me multiple datasheets in the environment:
tab1
tab2
...
What I would like to do is have these outputs as items in a list:
>oput
$tab1
[1] data1
$tab2
[1] data2
...
But I can't get this working properly
assign(oput[[x]], read_excel("dataset.xlsx", sheet = tabnames[x], col_names = TRUE))
and
assign(oput$x, read_excel("dataset.xlsx", sheet = tabnames[x], col_names = TRUE))
both give:
Error in assign(oput[[x]], read_excel("dataset.xlsx", :
invalid first argument
It's obviously an error on my part in identifying the sheetname variable.
What's the correct way of doing this, please?

Found previously on SO with some slightly different search terms. Apologies for the duplicate post.
How to read all worksheets in an Excel Workbook into an R list with data.frame elements using XLConnect?
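One way to get the list described in the question is to skip assign() and build the list directly with lapply(). This is a minimal sketch using readxl rather than the XLConnect approach of the linked question; the temporary demo workbook (built with writexl) and its contents are assumptions that stand in for dataset.xlsx.

```r
library(readxl)
library(writexl)  # assumption: only used here to build a small demo workbook

# Demo workbook with two tabs, standing in for "dataset.xlsx"
path <- tempfile(fileext = ".xlsx")
write_xlsx(list(tab1 = data.frame(a = 1:3),
                tab2 = data.frame(b = c("x", "y"))), path)

tabnames <- excel_sheets(path)

# Read every sheet into one list element, then name the elements after the tabs
oput <- lapply(tabnames, function(tab) {
  read_excel(path, sheet = tab, col_names = TRUE)
})
names(oput) <- tabnames
```

After this, oput$tab1 and oput[["tab2"]] hold the individual sheets, and nothing is written into the global environment.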

Related

Select specific data frames from global environment [duplicate]

I am surprised to find that there is no easy way to export multiple data.frames to multiple worksheets of an Excel file. I tried the xlsx package, but it seems it can only write to one sheet (overwriting the old sheet); I also tried the WriteXLS package, but it gives me an error every time...
My code structure is like this: by design, in each iteration the output data frame (tempTable) and the sheet name (sn) are updated and exported into one tab.
for (i in 2 : ncol(code)){
...
tempTable <- ...
sn <- ...
WriteXLS("tempTable", ExcelFileName = "C:/R_code/../file.xlsx",
SheetNames = sn);
}
I can export to several CSV files, but there has to be an easy way to do that in Excel, right?
You can write to multiple sheets with the xlsx package. You just need to use a different sheetName for each data frame and you need to add append=TRUE:
library(xlsx)
write.xlsx(dataframe1, file="filename.xlsx", sheetName="sheet1", row.names=FALSE)
write.xlsx(dataframe2, file="filename.xlsx", sheetName="sheet2", append=TRUE, row.names=FALSE)
Another option, one that gives you more control over formatting and where the data frame is placed, is to do everything within R/xlsx code and then save the workbook at the end. For example:
wb = createWorkbook()
sheet = createSheet(wb, "Sheet 1")
addDataFrame(dataframe1, sheet=sheet, startColumn=1, row.names=FALSE)
addDataFrame(dataframe2, sheet=sheet, startColumn=10, row.names=FALSE)
sheet = createSheet(wb, "Sheet 2")
addDataFrame(dataframe3, sheet=sheet, startColumn=1, row.names=FALSE)
saveWorkbook(wb, "My_File.xlsx")
In case you might find it useful, here are some interesting helper functions that make it easier to add formatting, metadata, and other features to spreadsheets using xlsx:
http://www.sthda.com/english/wiki/r2excel-read-write-and-format-easily-excel-files-using-r-software
You can also use the openxlsx library to export multiple datasets to multiple sheets in a single workbook. The advantage of openxlsx over xlsx is that openxlsx removes the dependency on Java libraries.
Write a list of data.frames to individual worksheets using list names as worksheet names.
require(openxlsx)
list_of_datasets <- list("Name of DataSheet1" = dataframe1, "Name of Datasheet2" = dataframe2)
write.xlsx(list_of_datasets, file = "writeXLSX2.xlsx")
There's a new library in town, from rOpenSci: writexl
Portable, light-weight data frame to xlsx exporter based on
libxlsxwriter. No Java or Excel required
I found it better and faster than the above suggestions (working with the dev version):
library(writexl)
sheets <- list("sheet1Name" = sheet1, "sheet2Name" = sheet2) #assume sheet1 and sheet2 are data frames
write_xlsx(sheets, "path/to/location")
Many good answers here, but some of them are a little dated. If you want to add further worksheets to a single file then this is the approach I find works for me. For clarity, here is the workflow for openxlsx version 4.0
# Create a blank workbook
OUT <- createWorkbook()
# Add some sheets to the workbook
addWorksheet(OUT, "Sheet 1 Name")
addWorksheet(OUT, "Sheet 2 Name")
# Write the data to the sheets
writeData(OUT, sheet = "Sheet 1 Name", x = dataframe1)
writeData(OUT, sheet = "Sheet 2 Name", x = dataframe2)
# Export the file
saveWorkbook(OUT, "My output file.xlsx")
EDIT
I've now trialled a few other answers, and I actually really like @Syed's. It doesn't exploit all the functionality of openxlsx but if you want a quick-and-easy export method then that's probably the most straightforward.
I'm not familiar with the package WriteXLS; I generally use XLConnect:
library(XLConnect)
##
newWB <- loadWorkbook(
filename="F:/TempDir/tempwb.xlsx",
create=TRUE)
##
for(i in 1:10){
wsName <- paste0("newsheet",i)
createSheet(
newWB,
name=wsName)
##
writeWorksheet(
newWB,
data=data.frame(
X=1:10,
Dataframe=paste0("DF ",i)),
sheet=wsName,
header=TRUE,
rownames=NULL)
}
saveWorkbook(newWB)
This can certainly be vectorized, as @joran noted above, but just for the sake of generating dynamic sheet names quickly, I used a for loop to demonstrate.
I used the create=TRUE argument in loadWorkbook since I was creating a new .xlsx file, but if your file already exists then you don't have to specify this, as the default value is FALSE.
If your data is small, R has many packages and functions that will do the job.
write.xlsx, write.xlsx2 and XLConnect also work, but they are sometimes slow compared to openxlsx.
So if you are dealing with large data sets and run into Java errors, I would suggest having a look at openxlsx, which reduced the time to 1/12th for me.
I've tested all of them and was really impressed with the performance of openxlsx.
Here are the steps for writing multiple datasets into multiple sheets.
install.packages("openxlsx")
library("openxlsx")
start.time <- Sys.time()
# Creating large data frame
x <- as.data.frame(matrix(1:4000000,200000,20))
y <- as.data.frame(matrix(1:4000000,200000,20))
z <- as.data.frame(matrix(1:4000000,200000,20))
# Creating a workbook
wb <- createWorkbook()  # the file name is supplied later, in saveWorkbook()
Sys.setenv("R_ZIPCMD" = "C:/Rtools/bin/zip.exe") ## path to zip.exe
The Sys.setenv("R_ZIPCMD" = "C:/Rtools/bin/zip.exe") call has to point to a fixed path, because it references the zip utility that ships with Rtools.
Note: If Rtools is not installed on your system, please install it first. Here is the link (choose the appropriate version):
https://cran.r-project.org/bin/windows/Rtools/
Check the options shown in the image linked below (select all the check boxes during installation):
https://cloud.githubusercontent.com/assets/7400673/12230758/99fb2202-b8a6-11e5-82e6-836159440831.png
# Adding worksheets: parameters for addWorksheet are 1. workbook object 2. sheet name
addWorksheet(wb, "Sheet 1")
addWorksheet(wb, "Sheet 2")
addWorksheet(wb, "Sheet 3")
# Writing data into the respective sheets: parameters for writeData are 1. workbook object 2. sheet index or sheet name 3. data frame
writeData(wb, 1, x)
# In case you would like the sheet to have a filter for ease of access, pass withFilter = TRUE to writeData.
writeData(wb, 2, x = y, withFilter = TRUE)
## Similarly writeDataTable is another way for representing your data with table formatting:
writeDataTable(wb, 3, z)
saveWorkbook(wb, file = "Example.xlsx", overwrite = TRUE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
The openxlsx package is really good for reading and writing huge data sets from and to Excel files, and it has lots of options for custom formatting within Excel.
The nice part is that we don't have to worry about Java heap memory here.
I had this exact problem and I solved it this way:
library(openxlsx) # loads library and doesn't require Java installed
your_df_list <- c("df1", "df2", ..., "dfn")
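# mget() fetches the data frames by name and returns them as a named list;
# write.xlsx() then writes one sheet per list element, named after the element.
# (Calling write.xlsx() once per data frame would overwrite the file each time.)
write.xlsx(x = mget(your_df_list),
file = "your_spreadsheet_name.xlsx")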
That way you won't have to create a very long list manually if you have tons of dataframes to write to Excel.
I regularly use the rio package for exports of all kinds. Using rio, you can pass a list, naming each tab and specifying the dataset. rio wraps other in/out packages, and for export to Excel it uses openxlsx.
library(rio)
filename <- "C:/R_code/../file.xlsx"
export(list(sn1 = tempTable1, sn2 = tempTable2, sn3 = tempTable3), filename)
tidy way of taking one dataframe and writing sheets by groups:
library(tidyverse)
library(xlsx)
mtcars %>%
mutate(cyl1 = cyl) %>%
group_by(cyl1) %>%
nest() %>%
ungroup() %>%
mutate(rn = row_number(),
app = rn != 1,
q = pmap(list(rn,data,app),~write.xlsx(..2,"test1.xlsx",as.character(..1),append = ..3)))
For me, WriteXLS provides the functionality you are looking for. Since you did not specify which errors it returns, I show you an example:
Example
library(WriteXLS)
x <- list(sheet_a = data.frame(a=letters), sheet_b = data.frame(b = LETTERS))
WriteXLS(x, "test.xlsx", names(x))
Explanation
If x is:
a list of data frames, each one is written to a single sheet
a character vector (of R objects), each object is written to a single sheet
something else, then see also what the help states:
More on usage
?WriteXLS
shows:
`x`: A character vector or factor containing the names of one or
more R data frames; A character vector or factor containing
the name of a single list which contains one or more R data
frames; a single list object of one or more data frames; a
single data frame object.
Solution
For your example, you would need to collect all data.frames in a list during the loop, and use WriteXLS after the loop has finished.
Session info
R 3.2.4
WriteXLS 4.0.0
I do it in this way for openxlsx using following function
mywritexlsx<-function(fname="temp.xlsx",sheetname="Sheet1",data,
startCol = 1, startRow = 1, colNames = TRUE, rowNames = FALSE)
{
if(! file.exists(fname))
wb = createWorkbook()
else
wb <- loadWorkbook(file =fname)
sheet = addWorksheet(wb, sheetname)
writeData(wb,sheet,data,startCol = startCol, startRow = startRow,
colNames = colNames, rowNames = rowNames)
saveWorkbook(wb, fname,overwrite = TRUE)
}
I do this all the time, all I do is
WriteXLS::WriteXLS(
all.dataframes,
ExcelFileName = xl.filename,
AdjWidth = T,
AutoFilter = T,
FreezeRow = 1,
FreezeCol = 2,
BoldHeaderRow = T,
verbose = F,
na = '0'
)
and all those data frames come from here
all.dataframes <- vector()
for (obj.iter in all.objects) {
obj.name <- obj.iter
obj.iter <- get(obj.iter)
if (is.data.frame(obj.iter)) {
all.dataframes <- c(all.dataframes, obj.name)
}
}
obviously sapply routine would be better here
for a lapply-friendly version..
library(data.table)
library(xlsx)
path2txtlist <- your.list.of.txt.files
wb <- createWorkbook()
lapply(seq_along(path2txtlist), function (j) {
sheet <- createSheet(wb, paste("sheetname", j))
addDataFrame(fread(path2txtlist[j]), sheet=sheet, startColumn=1, row.names=FALSE)
})
saveWorkbook(wb, "My_File.xlsx")

Read specific sheets from an excel file list

I need to read specific sheets from a list of excel files.
I have >500 excel files, ".xls" and ".xlsx".
Each file can have different sheets, but I only want to read the sheets whose names match a specific expression, like pattern = "^Abc"; not all files have sheets matching this pattern.
I've written code that reads one file, but when I try to extend it to multiple files, it always returns an error.
# example with 3rd file
# 2 sheets have the pattern
list_excels <- list.files(path = "path_to_folder", pattern = ".xls*")
sheet_names <- excel_sheets(list_excels[[3]])
list_sheets <- lapply(excel_sheets(list_excels[[3]]), read_excel, path = list_excels[[3]])
names(list_sheets) <- sheet_names
do.call("rbind", list_sheets[grepl(pattern = "^Abc", sheet_names)])
But when I try to read multiple Excel files, I either get an error or something in the loop slows the computation down a lot.
Here are some examples.
This loop doesn't return an error, but it takes at least 30 seconds per list element; I've never waited for it to finish.
for (i in seq_along(list_excels)) {
sheet_names <- excel_sheets(list_excels[[i]])
list_sheets <- lapply(excel_sheets(list_excels[[i]]), read_excel, path = list_excels[[i]])
names(list_sheets) <- sheet_names[i]
list_sheets[grepl(pattern = "^Abc", sheet_names)]
}
This loop is missing the final part, merging the sheets with this code:
list_sheets[grepl(pattern = "^Abc", sheet_names)]
I've tried to sum the rows of each sheet and store the result in a vector, but I think the loop breaks when a file has no sheet matching the pattern.
x <- c()
for(i in seq_along(list_excels)) {
x[i] <- nrow(do.call("rbind",
lapply(excel_sheets(list_excels[[i]]),
read_excel,
path = list_excels[[i]])[grepl(pattern = "^Abc",
excel_sheets(list_excels[[i]]))]))
}
I also tried the purrr library to read everything, with the same result as the first loop example.
list_test <- list()
for(i in seq_along(list_excels)) {
list_test[[i]] <- excel_sheets(list_excels[[i]]) %>%
set_names() %>%
map(read_excel, path = list_excels[[i]])
}
The last example works with one Excel file but not with multiple. It just reads a named sheet.
# One file works
data.frame(readWorksheetFromFile(list_excels[[1]], sheet = "Abc"))
#Multiple file returns an error
for(i in seq_along(list_excels)) {
data.frame(readWorksheetFromFile(list_excels[[i]], sheet = "Abc"))
}
#Returns the following error
#Error: IllegalArgumentException (Java): Sheet index (-1) is out of range (0..1)
Could someone help me?
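A possible approach, sketched under assumptions rather than taken from an answer in this thread: list the sheet names of each file first, keep only those matching the pattern, read just those sheets, and return NULL for files without a match so the loop does not break. The helper name read_matching_sheets and the two demo workbooks (built with writexl in a temporary folder) are hypothetical; readxl handles both .xls and .xlsx.

```r
library(readxl)
library(writexl)  # assumption: only used here to build two demo workbooks

# Demo folder: one file with two matching sheets, one file with none
demo_dir <- tempfile("excels_")
dir.create(demo_dir)
write_xlsx(list(Abc1  = data.frame(id = 1:2),
                Abc2  = data.frame(id = 3:5),
                Other = data.frame(id = 9)),
           file.path(demo_dir, "a.xlsx"))
write_xlsx(list(Other = data.frame(id = 7)),
           file.path(demo_dir, "b.xlsx"))

list_excels <- list.files(demo_dir, pattern = "\\.xlsx?$", full.names = TRUE)

# Read and row-bind only the sheets whose names match `pattern`
read_matching_sheets <- function(path, pattern = "^Abc") {
  sheets <- excel_sheets(path)
  wanted <- sheets[grepl(pattern, sheets)]
  if (length(wanted) == 0) return(NULL)  # no matching sheet in this file
  do.call(rbind, lapply(wanted, function(s) read_excel(path, sheet = s)))
}

per_file <- lapply(list_excels, read_matching_sheets)
names(per_file) <- basename(list_excels)
per_file <- Filter(Negate(is.null), per_file)  # drop files without a match

# Row count per file, as in the nrow() attempt above
n_rows <- vapply(per_file, nrow, integer(1))
```

Reading only the matching sheets, instead of reading every sheet and filtering afterwards, should also reduce the time spent per file, although how much depends on the workbooks. The rbind step assumes the matching sheets share the same columns.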

Import data from excel but get warning messages

I import data from Excel, and since I have multiple Excel files I read them all at once.
Here is my code:
library(readxl)
library(data.table)
file.list <- dir(path = "path/", pattern='\\.xlsx', full.names = T)
df.list <- lapply(file.list, read_excel)
data <- rbindlist(df.list)
However, I get these warning messages between df.list <- lapply(file.list, read_excel) and data <- rbindlist(df.list).
Warning messages:
1: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[3083, 9]: expecting date: got '2015/07/19'
2: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[3084, 9]: expecting date: got '2015/07/20'
What's going on? How can I check and correct?
Following up on my comment, I'm submitting this as an answer. Have you looked at the respective rows in your Excel sheet? To me it seems that something is going on there: maybe there is an empty cell before or after these rows, a stray space or something like that, or the date format in these cells differs from that of the other cells.
It is not an elegant solution, but you can set the parameter guess_max to the number of rows in your data file; this eliminates the warnings and the side effects.
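As a rough sketch of the guess_max suggestion: extra arguments given to lapply() are passed on to read_excel() for every file. The demo files (built with writexl) and the value 10000 are assumptions; in practice guess_max would be at least the number of rows in the largest sheet. With the whole column inspected, a column that mixes real dates and text is likely to be read as text, so it may still need converting afterwards.

```r
library(readxl)
library(writexl)  # assumption: only used here to build two small demo files

demo_dir <- tempfile("xlsx_demo_")
dir.create(demo_dir)
write_xlsx(data.frame(id = 1:3,
                      day = c("2015/07/18", "2015/07/19", "2015/07/20")),
           file.path(demo_dir, "a.xlsx"))
write_xlsx(data.frame(id = 4:5,
                      day = c("2015/07/21", "2015/07/22")),
           file.path(demo_dir, "b.xlsx"))

file.list <- dir(path = demo_dir, pattern = "\\.xlsx$", full.names = TRUE)

# guess_max is forwarded to read_excel(), so column types are guessed
# from up to 10000 rows of each sheet
df.list <- lapply(file.list, read_excel, guess_max = 10000)
data <- do.call(rbind, df.list)
```

If the column type is known in advance, passing col_types to read_excel() instead of relying on guessing is another option.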

Weird behavior lapplying XLConnect functions to list of workbooks

Part 1:
I am trying to rename worksheets in a list of workbooks using lapply and XLConnect (I need to rename them for the next part of the code to run properly, more on this in part 2):
library(XLConnect)
# testWB.xlsx contains a blank worksheet called Sheet1
testWB <- rep(lapply("testWB.xlsx", loadWorkbook), 3)
lapply(1:length(testWB), function(x) {
renameSheet(testWB[[x]], "Sheet1", "test1")
})
Gives me the error:
`Error: IllegalArgumentException (Java): Sheet index (-1) is out of range (0..0)`
But:
renameSheet(testWB[[1]], "Sheet1", "test1")
Renames the sheet as it is supposed to. It is weird, renameSheet does NOT work with lapply, but getActiveSheetIndex does work with lapply.
unlist(lapply(1:length(testWB), function(x) {
getActiveSheetIndex(testWB[[x]])
}))
[1] 1 1 1
I've tested other XLConnect functions and some work in lapply and others do not.
Part 2:
I need to rename sheets to get the writeWorksheet function to work. E.g.:
cell_data <- c("Larry", "Curly", "Moe")
unlist(lapply(1:length(testWB), function(x) {
writeWorksheet(testWB[[x]], cell_data[x], sheet = "sheet1", header = F)
readWorksheet(testWB[[x]], "Sheet1", header = F)
}))
Col1 Col1 Col1
"Larry" "Curly" "Moe"
But looking at testWB after running the above loop:
unlist(lapply(1:length(testWB), function(x) {
readWorksheet(testWB[[x]], "Sheet1", header = F)
}))
Col1 Col1 Col1
"Moe" "Moe" "Moe"
As you can see this ends up inputting Moe into all of the A1 cells of each sheet in each workbook instead of Larry, Curly, Moe in A1 cell of each respective workbook. If the workbook sheets have different names (e.g., Sheet1, Sheet2, Sheet3) then it works properly. Hence my issue in part one.
Since I have not gotten this to work, I have had to reconstruct testWB.xlsx templates in R and reapply formatting. testWB.xlsx in reality is quite a nuanced excel form so recreating it is not ideal.
I hope I am just missing something small and thank you in advance for any suggestions.
XLConnect version 0.2-12
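A possible explanation, offered as an assumption and not confirmed anywhere in this thread: rep(lapply("testWB.xlsx", loadWorkbook), 3) loads the file once and then repeats the loaded object. If an XLConnect workbook behaves as a reference to a single Java object, all three list elements point at the same workbook, which would explain both symptoms: the second renameSheet call no longer finds "Sheet1", and the last write ("Moe") shows up everywhere. The sketch below mimics this with plain R environments; load_fake_workbook is a hypothetical stand-in, not an XLConnect function.

```r
# Stand-in for loadWorkbook(): returns a mutable reference object (an environment)
load_fake_workbook <- function(path) {
  wb <- new.env()
  wb$path <- path
  wb$cell <- NA_character_
  wb
}

cell_data <- c("Larry", "Curly", "Moe")

# Pattern from the question: load once, then repeat the loaded object.
# All three list elements are the same environment.
shared <- rep(lapply("testWB.xlsx", load_fake_workbook), 3)
for (i in seq_along(shared)) shared[[i]]$cell <- cell_data[i]
shared_cells <- vapply(shared, function(wb) wb$cell, character(1))

# Alternative: repeat the path and load three times, giving three objects
separate <- lapply(rep("testWB.xlsx", 3), load_fake_workbook)
for (i in seq_along(separate)) separate[[i]]$cell <- cell_data[i]
separate_cells <- vapply(separate, function(wb) wb$cell, character(1))
```

Here shared_cells ends up with the last value in every position, while separate_cells keeps one value per object. If the same holds for XLConnect, lapply(rep("testWB.xlsx", 3), loadWorkbook) would give three independent in-memory workbooks, each of which would then need to be saved under its own file name.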

Using XLConnect to write data to excel template gives empty cell error

I am using the R package XLConnect to write data frames to an existing excel worksheet in an existing workbook. The excel workbook has a worksheet for the raw data, which I populate using writeWorksheet() in R, and another worksheet for the formatted data which references the raw data worksheet and performs calculations. When I write my data to the raw data worksheet in R, however, the formatted data worksheet does not update and gives an error that "Formula refers to empty cell", even though those cells have data in them. I'm uncertain if this error is due to R and XLConnect or something in my workbook. When I simply copy and paste my data directly into the cells in my Raw Data Import worksheet I do not receive an error. Please see example below, and thank you for your help:
In R:
library(XLConnect)
# Creating first data frame
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
df.1<-data.frame(1, 1:10, sample(L3, 10, replace = TRUE))
# Creating second data frame
L4 <- LETTERS[4:7]
fac <- sample(L4, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
df.2<-data.frame(1, 1:10, sample(L4, 10, replace = TRUE))
# Reading in workbook
wb <- loadWorkbook(xlname)
wbnames <- as.vector(getSheets(wb)) # where wbnames is of length two
# [1] "Raw Data Import" "Formatted Data"
# Writing df.1 and df.2 to specific locations in Raw Data Import worksheet
writeWorksheet(wb,df.1,sheet=wbnames[1],startRow=3,header=F)
writeWorksheet(wb,df.2,sheet=wbnames[1],startRow=15,header=F)
# Saving workbook
saveWorkbook(wb)
You can use the XLConnect function setForceFormulaRecalculation(). This forces Excel to recalculate formula values upon opening the worksheet. The second argument allows you to specify a sheet to recalculate. If it is set to "*", it will recalculate all of the formulas in the workbook.
wb <- loadWorkbook(xlname)
wbnames <- as.vector(getSheets(wb))
writeWorksheet(wb,df.1,sheet=wbnames[1],startRow=3,header=F)
writeWorksheet(wb,df.2,sheet=wbnames[1],startRow=15,header=F)
setForceFormulaRecalculation(wb,"*",TRUE)
saveWorkbook(wb,'~/test.xls')
I had no success until I added a createSheet operation for each sheet. If you want to use existing sheet information, then look at getSheets and ?getSheetPos.
> wb <- loadWorkbook('~/test.xls')
> wbnames <- as.vector(getSheets(wb))
> createSheet(wb, "test1"); createSheet(wb, 'test2')
> writeWorksheet(wb,df.1,sheet='test1',startRow=3,header=F)
> writeWorksheet(wb,df.2,sheet='test2',startRow=15,header=F)
> saveWorkbook(wb,'~/test.xls')
When I later ran your code I saw that both dataframes got written to Sheet1. If I interposed a saveWorkbook-operation I got the data on different sheets:
> writeWorksheet(wb,df.1,sheet='Sheet1',startRow=3,header=F); saveWorkbook(wb,'~/test.xls')
> writeWorksheet(wb,df.2,sheet='Sheet2',startRow=15,header=F)
> saveWorkbook(wb,'~/test.xls')
XLConnect is a quasi-commercial product and I don't think they encourage questions to SO, so you might want to contact Mirai Solutions, GMBH.
