How to extract sheet names from an Excel file in R

I have loaded a workbook into R and read in the worksheets using XLConnect, but I was wondering whether there is a way to extract the names of the sheets, perhaps as a vector?
So far my code is:
dataIn <- loadWorkbook(filenames[1])
lst = readWorksheet(dataIn, sheet = getSheets(dataIn), startRow=1, startCol=1, header=TRUE)
...and I want to extract the sheet names of the sheets in lst.

Another really nice package developed by the folks at RStudio is readxl. It's easy to get the Excel sheet names with the excel_sheets() function.
library(readxl)
path <- "path/to/your/file.xlsx"
excel_sheets(path = path)

You are looking for getSheets, which returns all worksheet names in a workbook. You already call it in your code: getSheets(dataIn) gives the names as a vector.

In the "openxlsx" package the corresponding function is getSheetNames:
library(openxlsx)
path <- "path/to/your/file.xlsx"
getSheetNames(path)

Related

How to convert xlsx files to csv files in RStudio? Need to convert multiple workbooks all with multiple spreadsheets

Trying to write an R script that will convert multiple xlsx workbook files within a folder while also converting the sheets within the workbook as separate csv files.
Looking for a single script to automatically apply code to all workbooks and their spreadsheets.
For reading Excel files, there are several packages.
I personally am happy with the xlsx package, which you can use to read Excel files, as well as their individual sheets. This article looks like it will give you the gist of it.
You should then be able to export each worksheet you read to a CSV file with R's built-in write.csv (or write.csv2) function.
Below is an example to convert a single xlsx workbook to multiple csv files.
Note that type conversions are not guaranteed to be correct.
xlsx_path <- "path_to_xlsx.xlsx"
sheet_names <- readxl::excel_sheets(xlsx_path)
# read all sheets into a list of data frames
xlsx_data <- purrr::map(
  sheet_names,
  ~ readxl::read_excel(xlsx_path, .x, col_types = "text", col_names = FALSE)
)
# write the list of data frames to csv files
purrr::walk2(
  xlsx_data, sheet_names,
  ~ readr::write_csv(.x, paste0(xlsx_path, "-", .y, ".csv"), col_names = FALSE)
)
# for sheets named sheet1, sheet2, ... the csv files will be saved as:
# path_to_xlsx.xlsx-sheet1.csv, path_to_xlsx.xlsx-sheet2.csv, ...
If you need to apply this to many xlsx files, use list.files() to get the paths of all xlsx files, then write a for loop or use another map function to repeat the process for each file.
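A minimal sketch of that folder-level loop, assuming readxl is installed. The helper names sheets_to_csv and convert_folder, the output naming scheme, and the choice to write the CSV files next to each workbook are illustrative assumptions, not part of the original answer. Sheet names containing characters that are invalid in file names would need extra handling.

```r
library(readxl)

# Hypothetical helper: writes one CSV per sheet of a single workbook
# and returns the paths of the files it created.
sheets_to_csv <- function(xlsx_path, out_dir = dirname(xlsx_path)) {
  base <- tools::file_path_sans_ext(basename(xlsx_path))
  out_files <- character(0)
  for (sheet in excel_sheets(xlsx_path)) {
    # Read every cell as text so no type guessing takes place
    df <- read_excel(xlsx_path, sheet = sheet,
                     col_types = "text", col_names = FALSE)
    out_file <- file.path(out_dir, paste0(base, "-", sheet, ".csv"))
    write.table(df, out_file, sep = ",", qmethod = "double",
                row.names = FALSE, col.names = FALSE, na = "")
    out_files <- c(out_files, out_file)
  }
  out_files
}

# Hypothetical helper: applies sheets_to_csv to every .xlsx file in a folder.
convert_folder <- function(folder) {
  xlsx_files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
  unlist(lapply(xlsx_files, sheets_to_csv))
}
```

Calling convert_folder("path/to/folder") would then produce files such as workbook-sheet.csv for each sheet of each workbook in that folder.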
If you are using RStudio, you may already have the readxl package installed. Its documentation explains workflows for many common use cases here: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html
They also provide this nice code snippet to do what you are asking for:
library(readxl)
# read one sheet, write it to "<workbook>-<sheet>.csv", and return the data frame
read_then_csv <- function(sheet, path) {
  pathbase <- tools::file_path_sans_ext(basename(path))
  df <- read_excel(path = path, sheet = sheet)
  write.csv(df, paste0(pathbase, "-", sheet, ".csv"),
            quote = FALSE, row.names = FALSE)
  df
}
path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)
xl_list <- lapply(sheets, read_then_csv, path = path)
names(xl_list) <- sheets
If you go here and put "excel" and "xls" in the search bar, you'll get a list of packages and functions which might help.

Extract the number of sheets from an Excel workbook in R (without XLConnect)

I'm relatively new to R (and programming).
I have an Excel workbook with 36 sheets, but suppose that I don't know how many sheets there are and I want my code to find that out for me. I have tried something like:
options(java.parameters = "-Xmx6g")
library(XLConnect)
myWorkbook <- loadWorkbook(filename)
numberofsheets <- length(getSheets(myWorkbook))
But even though I set my memory to 6GB I still run into memory errors with XLConnect, so I would like to use other packages (e.g. xlsx, openxlsx). Is there a way to find out the number of sheets in an Excel workbook without using XLConnect?
Thanks for your help.
Maybe try:
library( readxl )
length( excel_sheets( filename ) )
This should do exactly what you want.
gdata::sheetCount("your_path_here.xlsx")
Also, to list the sheet names as a character vector:
library(readxl)
file <- 'your_path_here.xlsx'
sheets <- excel_sheets(file)

How do I modify an existing sheet in an Excel Workbook using Openxlsx package in R?

I am using the "openxlsx" package to read and write Excel files. I have a fixed file with a sheet called "Data" which is used by formulas in other sheets. I want to update this Data sheet without touching the others.
I am trying the following code:
write.xlsx(x = Rev_4, file = "Revenue.xlsx", sheetName="Data")
But this erases the excel file and creates a new one with just the new data in the "Data" sheet while all else gets deleted. Any Advice?
Try this:
library(openxlsx)
wb <- loadWorkbook("Revenue.xlsx")
writeData(wb, sheet = "Data", Rev_4, colNames = FALSE)
saveWorkbook(wb, "Revenue.xlsx", overwrite = TRUE)
You need to load the complete workbook, then modify its data and then save it to disk. With writeData you can also specify the starting row and column. And you could also modify other sections before saving to disk.
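As an illustration of the starting row and column arguments, here is a small self-contained sketch, assuming openxlsx is installed. It builds a demo workbook in a temporary file as a stand-in for "Revenue.xlsx"; the sheet "Summary", the column names, and the values are made up for the example.

```r
library(openxlsx)

# Build a small demo workbook in a temporary file (stand-in for "Revenue.xlsx")
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Data")
addWorksheet(wb, "Summary")
writeData(wb, sheet = "Data",
          x = data.frame(month = c("Jan", "Feb"), revenue = c(100, 120)))
writeData(wb, sheet = "Summary", x = data.frame(note = "untouched"))
saveWorkbook(wb, path, overwrite = TRUE)

# Reload the workbook and overwrite only a block of "Data":
# the values are written starting at row 2, column 2, without a header row
wb <- loadWorkbook(path)
new_values <- data.frame(revenue = c(150, 175))
writeData(wb, sheet = "Data", x = new_values,
          startRow = 2, startCol = 2, colNames = FALSE)
saveWorkbook(wb, path, overwrite = TRUE)

# Read the sheet back to inspect the result
updated <- read.xlsx(path, sheet = "Data")
```

The other sheet is carried along unchanged because the whole workbook object is saved back, not just the modified sheet.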
I've found the xlsx2dfs package. It depends on openxlsx and helps with writing many sheets to an xlsx file, which may make this easier:
library(xlsx2dfs)
# However, be careful, the function xlsx2dfs assumes
# that all sheets contain simple tables. If that is not the case,
# use the accepted answer!
dfs <- xlsx2dfs("Revenue.xlsx") # all sheets of file as list of dfs
dfs[["Data"]] <- Rev_4 # replace df of sheet "Data" by updated df Rev_4
dfs2xlsx(dfs, "Revenue.xlsx") # caution: this overwrites the existing file!

Convert .xlsm to .xlsx in R

I would like to convert an Excel file (say its name is "Jimmy") that is saved as a macro-enabled workbook (Jimmy.xlsm) to Jimmy.xlsx.
I need this to be done in a coding environment. I cannot simply change this by opening the file in Excel and assigning a different file-type. I am currently programming in R. If I use the function
file.rename("Jimmy.xlsm", "Jimmy.xlsx")
the file becomes corrupted.
In your framework you have to read in the sheet and write it back out. Suppose you have an XLSM file (with macros, I presume) called "testXLSMtoX.xlsm" containing one sheet with tabular columns of data. This will do the trick:
library(xlsx)
r <- read.xlsx("testXLSMtoX.xlsm", 1) # read the first sheet
# provides a data frame
# use the first column in the spreadsheet to create row names then delete that column from the data frame
# otherwise you will get an extra column of row index numbers in the first column
r2w<-data.frame(r[-1],row.names=r[,1])
w <- write.xlsx(r2w,"testXLSMtoX.xlsx") # write the sheet
The macros will be stripped out, of course.
That's an answer, but I would question what you are trying to accomplish. In general it is easier to control R from Excel than Excel from R. I use RExcel from http://rcom.univie.ac.at/, which is not open source but pretty robust.
Here is a function that converts XLSM files to XLSX files with the R package RDCOMClient:
library(RDCOMClient)
convert_XLSM_File_To_XLSX <- function(path_XLSM_File, path_XLSX_File)
{
  xlApp <- COMCreate("Excel.Application")
  xlApp[["Visible"]] <- FALSE
  xlApp[["DisplayAlerts"]] <- FALSE
  xlWbk <- xlApp$Workbooks()$Open(path_XLSM_File)
  xlWbk$SaveAs(path_XLSX_File, 51) # 51 is Excel's file format code for .xlsx
  xlWbk$Close()
  xlApp$Quit()
}
# path_XLSM_File and path_XLSX_File are the paths of the input and output files
convert_XLSM_File_To_XLSX(path_XLSM_File, path_XLSX_File)

r - read.xlsx from .xlsx with unknown number of sheets

Suppose I have an Excel file that I would like to read into R with the read.xlsx function. The file consists of an unknown number of spreadsheets (there are about 200 such files, so manually checking the number of sheets would be a huge pain). Each spreadsheet is organized like a proper data frame.
I would like to have those spreadsheets one on top of another.
I write something like:
columnsILike <- c(1,40)
for(i in 1:numberOfSheets){
dfInd <- read.xlsx("myfile.xlsx", i, # number of sheet
colIndex=columnsILike, endRow=201, startRow=2,
header=F)
PreviousEmptyDataFrame <- rbind(PreviousEmptyDataFrame, dfInd)
}
write.csv(PreviousEmptyDataFrame, "data.csv")
Question is, how do I know number of sheets in advance?
getSheets(loadWorkbook("file_path")) in the xlsx package should return a list of the sheets in the workbook, so you can take the length of that list to find the number of sheets.
This answer is rather late, but wouldn't this be simpler?
gdata::sheetCount("myworkbook.xlsx")
You can also use package XLConnect if the workbook isn't too large.
library(XLConnect)
wb <- loadWorkbook("myworkbook.xlsx")
result <- do.call(rbind,lapply(getSheets(wb),
function(sheet)readWorksheet(wb,sheet)))
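If Java-based packages are a problem, a similar pattern may work with openxlsx. This is only a sketch under assumptions: it builds a demo workbook in a temporary file as a stand-in for "myfile.xlsx", and the sheet names and columns are invented for the example. It also assumes every sheet has the same columns, since rbind needs matching column names.

```r
library(openxlsx)

# Demo workbook with three identically structured sheets (stand-in for "myfile.xlsx")
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
for (nm in c("a", "b", "c")) {
  addWorksheet(wb, nm)
  writeData(wb, sheet = nm, x = data.frame(id = 1:2, value = c(10, 20)))
}
saveWorkbook(wb, path, overwrite = TRUE)

# The sheet names are read from the file, so the count need not be known in advance
sheet_names <- getSheetNames(path)
number_of_sheets <- length(sheet_names)

# Read each sheet and stack the resulting data frames on top of one another
stacked <- do.call(rbind, lapply(sheet_names, function(s) read.xlsx(path, sheet = s)))
```

For the 200-file case in the question, the same lapply could be wrapped in an outer loop over list.files().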
