Reading all sheets in multiple excel files into R - r

I am trying to read a bunch of excel files, and all of the sheets from these files into R. I would like to then save each sheet as a separate data frame with the name of the data frame the same name as the name of the sheet. Some files only have 1 sheet, while others have more than one sheet so I'm not sure how to specify all sheets as opposed to just a number.
I have tried:
library(XLConnect)
files.list <- list.files(recursive=T,pattern='*.xlsx') #get files list from folder
for (i in 1:length(files.list)){
wb <- loadWorkbook(files.list[i])
sheet <- getSheets(wb, sheet = )
for (j in 1:length(sheet)){
tmp<-read.xlsx(files.list[i], sheetIndex=j,
sheetName=NULL,
as.data.frame=TRUE, header=F)
if (i==1&j==1) dataset<-tmp else dataset<-rbind(dataset,tmp)
}
}
and I get an error "could not find function "loadWorkbook"". At one point I resolved that issue and got an error "could not find function "getSheets"". I have had some issues getting this package to work so if anyone has a different alternative I would appreciate it!

You could try with readxl...
I've not tested this for the case of different workbooks with duplicate worksheet names.
There were a number of issues with your code:
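the list.files pattern is a regular expression, in which . matches any character, so a literal . needs to be escaped with \\
As @deschen pointed out, loadWorkbook is also available in the openxlsx package; note too that read.xlsx with a sheetIndex argument comes from the xlsx package, not XLConnect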
library(readxl)
# get the list of .xlsx files; the pattern is a regex, so the dot is escaped
files.list <- list.files(recursive = TRUE, pattern = '\\.xlsx$')
for (i in seq_along(files.list)) {
  sheet_nm <- excel_sheets(files.list[i])
  for (j in seq_along(sheet_nm)) {
    # create one data frame per sheet, named after the sheet
    assign(x = sheet_nm[j], value = read_xlsx(path = files.list[i], sheet = sheet_nm[j]), envir = .GlobalEnv)
  }
}
Created on 2022-01-31 by the reprex package (v2.0.1)
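For the untested case of duplicate worksheet names across workbooks, one option is to collect everything in a single named list rather than calling assign. This is only a sketch: read_all_sheets is a hypothetical helper name, the "file::sheet" key format is my own choice, and the demo uses the example workbook that ships with readxl rather than your files.

```r
library(readxl)

# Hypothetical helper: read every sheet of every workbook into one named list.
read_all_sheets <- function(files) {
  out <- list()
  for (f in files) {
    for (s in excel_sheets(f)) {
      # Prefix the sheet name with the file name so that sheets with the
      # same name in different workbooks do not overwrite each other
      key <- paste(tools::file_path_sans_ext(basename(f)), s, sep = "::")
      out[[key]] <- read_excel(f, sheet = s)
    }
  }
  out
}

# Demo on the example workbook bundled with readxl
demo_file <- readxl_example("datasets.xlsx")
sheets <- read_all_sheets(demo_file)
names(sheets)
```

With your own data you would pass the files.list vector instead of demo_file and then pick out a sheet with sheets[["workbook::sheet"]].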

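I'm pretty sure the loadWorkbook function you want comes from the openxlsx package. (XLConnect exports a function of the same name, so the error suggests XLConnect itself may not have loaded.) So you could use: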
library(openxlsx)
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf

Related

read.xlsx2 | Skipping if sheetName does not exist

I am trying to read multiple excel files in a folder using the read.xlsx2 function. I only need to read a particular sheet titled 'Returns' or 'Prices'.
Is there a way I can give an 'OR' argument in the function and also skip a file if it contains neither of the sheets?
P.s.: Each file will have either a 'Returns' or a 'Prices' sheet or neither but not both so there cannot be a clash.
Thanks
You could read all the sheet names of each file and use intersect to select whichever of 'Returns' or 'Prices' is present, then read that sheet from the file.
Using readxl you can do this as :
library(readxl)
all_files <- list.files(pattern = '\\.xlsx$')
result <- lapply(all_files, function(x) {
all_sheets <- excel_sheets(x)
correct_sheet <- intersect(all_sheets, c('Returns', 'Prices'))
if(length(correct_sheet)) read_xlsx(x, correct_sheet)
})
result will have a list of dataframes. If you want to combine the data into one dataframe and if they have same column names you can use do.call(rbind, result)
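Files that contain neither sheet come back as NULL in result. If you want to know which files were skipped before combining, something like the following might help. It is a sketch using stand-in data in place of the real lapply output; the file names and data frames here are made up for illustration.

```r
# Stand-in for the lapply() result: NULL marks a file with neither sheet
all_files <- c("a.xlsx", "b.xlsx", "c.xlsx")
result <- list(
  data.frame(id = 1:2, value = c(0.1, 0.2)),
  NULL,
  data.frame(id = 3L, value = 0.3)
)

# Name each element after its file so skipped files can be identified
names(result) <- all_files

# Keep only the files that were actually read
kept <- Filter(Negate(is.null), result)
skipped <- setdiff(all_files, names(kept))

# Combine the remaining data frames (assumes identical column names)
combined <- do.call(rbind, kept)
```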

How to use R to copy a data range from one Excel spreadsheet to another?

I want to write an R script that copies the data range A5:X1000 from "WorksheetX" in "WorkbookX", and pastes values to the same range (A5:X1000) in "WorksheetY" in "WorkbookY". Both of the workbooks are in the same directory.
Is this possible?
The openxlsx package is, in my opinion, the best for writing to Excel files. There are many more options for reading from Excel files (such as the readxl package.)
I can't use this package on the machine I'm on at the moment, but this should work.
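# NOT TESTED
library(openxlsx)
# read A5:X1000 as plain values (no header row)
foo <- read.xlsx("WorkbookX.xlsx", sheet = "WorksheetX", rows = 5:1000, cols = 1:24, colNames = FALSE)
# write into the existing workbook rather than creating a new file
wb <- loadWorkbook("WorkbookY.xlsx")
writeData(wb, sheet = "WorksheetY", x = foo, startRow = 5, startCol = 1, colNames = FALSE)
saveWorkbook(wb, "WorkbookY.xlsx", overwrite = TRUE)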
You can write to multiple sheets with the xlsx package. You just need to use a different sheetName for each data frame and you need to add append=TRUE:
library(xlsx)
write.xlsx(dataframe1, file="filename.xlsx", sheetName="sheet1")
write.xlsx(dataframe2, file="filename.xlsx", sheetName="sheet2", append=TRUE)
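If you would rather avoid the Java dependency of the xlsx package, openxlsx can do the same thing: its write.xlsx accepts a named list and writes one sheet per element. A minimal sketch, using built-in datasets and a temporary file as placeholders for your own data frames and file name:

```r
library(openxlsx)

# Placeholder data frames; the list names become the sheet names
sheets <- list(sheet1 = head(mtcars), sheet2 = head(iris))

# Temporary output file, used here only so the example is self-contained
out_file <- tempfile(fileext = ".xlsx")
write.xlsx(sheets, file = out_file)

# Check which sheets ended up in the workbook
getSheetNames(out_file)
```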

Store contents of workbook to separate data.frames in a list

I am trying to read in an Excel workbook with an unknown number of sheets, and store each sheet as part of a variable (result[1] gives sheet 1, result[2] gives sheet 2, etc). I started trying to find a way to do it using the XLConnect package (which I could get to work correctly on Linux). I stopped when I realized I had broken almost every R convention there is.... Anyone have a better solution, using the XLConnect package?
require(XLConnect)
demoExcelFile <- system.file("demoFiles/multiregion.xlsx", package = "XLConnect")
endloop<<-F
x<<-1
result<<-NULL
while(!endloop){
result[x] <<- tryCatch({
readWorksheetFromFile(demoExcelFile,sheet=x)
x<<-x+1
}, error = function(e) {
endloop<<-T
})
}
Note: I'm open to using other packages, I just haven't been able to find another one that works reliably on 64 bit Linux Mint
Use the readxl package which has a function to list sheet names.
library(readxl)
library(purrr)
# get the sheet names
sheetnames <- excel_sheets("path/to/myfile.xlsx")
# loop through them and read each sheet into an item in a list.
# alternatively, use lapply() instead of map()
listofsheets <- map(sheetnames, ~ read_excel("path/to/myfile.xlsx", sheet = .x))
I would recommend using readxl from tidyverse. You could write something like:
library(readxl)
sheets <- excel_sheets("insert_filepath/workbook.xlsx")
data <- list()
for (i in seq_along(sheets)) {
data[[i]] <- read_excel("insert_filepath/workbook.xlsx", sheet = sheets[i])
}
Because I don't have your Excel file, I can't reproduce your data exactly. But this should be a general solution that finds all the sheet names in your Excel file and then loops through each sheet and reads them into a list called 'data'
require(XLConnect)
# Load workbook
wb <- loadWorkbook(system.file("demoFiles/multiregion.xlsx", package = "XLConnect"))
# Read all worksheets into a list of data.frames
listOfDfs <- readWorksheet(wb, sheet = getSheets(wb))

Working with excel and r2excel in R

I created a function that takes an Excel file and splits it into smaller files using the r2excel package. Basically, the function reads an Excel file that contains all the students in our district and creates an individual file for each teacher in a school (e.g. a class list). It seems to work fine on one Excel file; however, when I tested it on a different one, it produced some files and then suddenly stopped. My workaround was to remove some of the rows that caused the problem and rerun the function, but this is only a temporary solution.
Below is the error I received.
Error in .jnew("java/io/File", file) :
java.lang.NoSuchMethodError:
Here is my code:
df <- read.csv("bigfile.csv")
extract <- function(name){
temp_df <- subset(df, `Teacher Name` == name)
temp_df <- temp_df[order(temp_df$Class, temp_df$`Student Name`),]
wb <- createWorkbook(type="xlsx")
sheet <- createSheet(wb, sheetName = "Class List")
xlsx.addTable(wb, sheet, temp_df, fontColor="darkblue", row.names=FALSE, startCol=1,fontSize=11)
xlsx.addLineBreak(sheet,0)
filename <- paste(unique(temp_df$`School Name`), unique(temp_df$`Teacher Name`),sep=" ")
filename <- paste(filename, " 2D.xlsx", sep="")
saveWorkbook(wb, filename)
}
lapply(unique(df$`Teacher Name`), extract)
Can someone please explain to me what the error implies as I am not familiar with r2excel or java? Is there something wrong with my excel file or did I not implement r2excel correctly? I am using the latest R and Rstudio. Thank you

Read all worksheets in an Excel workbook into an R list with data.frames

I understand that XLConnect can be used to read an Excel worksheet into R. For example, this would read the first worksheet in a workbook called test.xls into R.
library(XLConnect)
readWorksheetFromFile('test.xls', sheet = 1)
I have an Excel Workbook with multiple worksheets.
How can all worksheets in a workbook be imported into a list in R where each element of the list is a data.frame for a given sheet, and where the name of each element corresponds to the name of the worksheet in Excel?
Updated answer using readxl (22nd June 2015)
Since posting this question the readxl package has been released. It supports both xls and xlsx format. Importantly, in contrast to other excel import packages, it works on Windows, Mac, and Linux without requiring installation of additional software.
So a function for importing all sheets in an Excel workbook would be:
library(readxl)
read_excel_allsheets <- function(filename, tibble = FALSE) {
# I prefer straight data.frames
# but if you like tidyverse tibbles (the default with read_excel)
# then just pass tibble = TRUE
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
x
}
This could be called with:
mysheets <- read_excel_allsheets("foo.xls")
Old Answer
Building on the answer provided by @mnel, here is a simple function that takes an Excel file as an argument and returns each sheet as a data.frame in a named list.
library(XLConnect)
importWorksheets <- function(filename) {
# filename: name of Excel file
workbook <- loadWorkbook(filename)
sheet_names <- getSheets(workbook)
names(sheet_names) <- sheet_names
sheet_list <- lapply(sheet_names, function(.sheet){
readWorksheet(object=workbook, .sheet)})
}
Thus, it could be called with:
importWorksheets('test.xls')
Note that most of XLConnect's functions are already vectorized. This means that you can read in all worksheets with one function call without having to do explicit vectorization:
require(XLConnect)
wb <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect"))
lst = readWorksheet(wb, sheet = getSheets(wb))
With XLConnect 0.2-0 lst will already be a named list.
I stumbled across this old question and I think the easiest approach is still missing.
You can use rio to import all excel sheets with just one line of code.
library(rio)
data_list <- import_list("test.xls")
If you're a fan of the tidyverse, you can easily import them as tibbles by adding the setclass argument to the function call.
data_list <- import_list("test.xls", setclass = "tbl")
If they all have the same format, you can easily row-bind them by setting the rbind argument to TRUE.
data_list <- import_list("test.xls", setclass = "tbl", rbind = TRUE)
From official readxl (tidyverse) documentation (changing first line):
library(readxl)
library(purrr) # provides %>%, set_names() and map()
path <- "data/datasets.xlsx"
path %>%
  excel_sheets() %>%
  set_names() %>%
  map(read_excel, path = path)
Details at:
http://readxl.tidyverse.org/articles/articles/readxl-workflows.html#iterate-over-multiple-worksheets-in-a-workbook
Since this is the top search hit for reading a multi-sheet Excel file into a list, here is the openxlsx solution:
filename <-"myFilePath"
sheets <- openxlsx::getSheetNames(filename)
SheetList <- lapply(sheets,openxlsx::read.xlsx,xlsxFile=filename)
names(SheetList) <- sheets
Adding to Paul's answer. The sheets can also be concatenated using something like this:
data = path %>%
excel_sheets() %>%
set_names() %>%
map_df(~ read_excel(path = path, sheet = .x), .id = "Sheet")
Libraries needed:
if(!require(pacman))install.packages("pacman")
pacman::p_load("tidyverse","readxl","purrr")
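In recent purrr releases map_df is marked as superseded, so the same concatenation can also be written with map followed by list_rbind. This sketch assumes purrr 1.0.0 or later and uses the example workbook bundled with readxl as a stand-in for your own path:

```r
library(readxl)
library(purrr)

# Example workbook shipped with readxl (replace with your own file)
path <- readxl_example("datasets.xlsx")

data <- path %>%
  excel_sheets() %>%
  set_names() %>%
  map(~ read_excel(path, sheet = .x)) %>%
  # Stack the sheets; the list names go into a "Sheet" column
  list_rbind(names_to = "Sheet")
```

Columns that exist in only some sheets are filled with NA in the others, as with map_df.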
You can load the work book and then use lapply, getSheets and readWorksheet and do something like this.
wb.mtcars <- loadWorkbook(system.file("demoFiles/mtcars.xlsx",
package = "XLConnect"))
sheet_names <- getSheets(wb.mtcars)
names(sheet_names) <- sheet_names
sheet_list <- lapply(sheet_names, function(.sheet){
readWorksheet(object=wb.mtcars, .sheet)})
To read multiple sheets from a workbook, use readxl package as follows:
library(readxl)
library(dplyr)
library(purrr) # for set_names() and map()
final_dataFrame <- bind_rows(path_to_workbook %>%
  excel_sheets() %>%
  set_names() %>%
  map(read_excel, path = path_to_workbook))
Here, bind_rows (dplyr) will put all data rows from all sheets into one data frame, and path_to_workbook is the location of your data: "dir/of/the/data/workbook".
excel.link will do the job.
I actually found it easier to use compared to XLConnect (not that either package is that difficult to use). Learning curve for both was about 5 minutes.
As an aside, you can easily find all R packages that mention the word "Excel" by browsing to
http://cran.r-project.org/web/packages/available_packages_by_name.html
Just simplifying the very useful response of @Jeromy Anglim:
allsheets <- sapply(readxl::excel_sheets("your_file.xlsx"), simplify = F, USE.NAMES = T,
function(X) readxl::read_excel("your_file.xlsx", sheet = X))
I tried the approaches above but ran into problems with the size of the 20 MB Excel file I needed to convert, so they did not work for me.
After more research I stumbled upon openxlsx and this one finally did the trick (and fast)
Importing a big xlsx file into R?
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf

Resources