Import Multiple Sheets into Multiple Data Frames in R

I have an Excel file with many sheets, and I need code that imports each sheet into a separate data frame named after the sheet in Excel.
For example, tabs A, B, and C would be imported as data frames A, B, and C respectively.
In other threads, I saw code like:
length(excel_sheets(filename)) to get the number of sheets in the file
Then create a list that contains each tab:
read_excel_allsheets <- function(filename) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
names(x) <- sheets
x
}
But I do not know how the tabs get imported into R from there.
Would greatly appreciate the help.
Thanks in advance!

Here's one way to do it:
# write test data
tf <- writexl::write_xlsx(
list("the mtcars" = mtcars, "iris data" = iris),
tempfile(fileext = ".xlsx")
)
# read excel sheets
sheets <- readxl::excel_sheets(tf)
lst <- lapply(sheets, function(sheet)
readxl::read_excel(tf, sheet = sheet)
)
names(lst) <- sheets
# shove them into global environment
list2env(lst, envir = .GlobalEnv)

Your function reads in all the tabs and saves them as elements of a single list (because of lapply()). You can take the elements out of the list with list2env:
your_excel_list <- read_excel_allsheets("test.xlsx")
list2env(your_excel_list, .GlobalEnv)
You'll see that the named elements of your list are now data frames (tbl_df objects, to be precise) in your global environment.
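To see what list2env() does without touching an Excel file, here is a minimal base R sketch. The list `sheets_list` and the environment `target` are made-up names for illustration; a fresh environment is used instead of .GlobalEnv so the example leaves the workspace alone.

```r
# Stand-in for the named list returned by read_excel_allsheets();
# the data is invented for illustration
sheets_list <- list(
  A = data.frame(x = 1:3),
  B = data.frame(y = c("a", "b"))
)

# Copy every named element into an environment as a separate object
target <- new.env()
list2env(sheets_list, envir = target)

# Each sheet is now its own object in `target`
ls(target)
get("A", envir = target)
```

Replacing `target` with `.GlobalEnv` gives the behaviour described above, with one object per sheet name.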

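You could also read everything in one line, after loading the readxl and dplyr packages (dplyr re-exports the %>% pipe). Note that this reads the same named sheet from every .xlsx file in the working directory and stacks the results into a single data frame:
data <- lapply(list.files(pattern = "\\.xlsx$"), function(x) read_excel(x, sheet = "(sheetname)")) %>% bind_rows()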

Related

Read and save multiple sheet

I found this code and would like to save the different sheets from Excel in R. How can I change the code?
library(readxl)
multiplesheets <- function(fname) {
# getting info about all excel sheets
sheets <- readxl::excel_sheets(fname)
tibble <- lapply(sheets, function(x) readxl::read_excel(fname, sheet = x))
data_frame <- lapply(tibble, as.data.frame)
# assigning names to data frames
names(data_frame) <- sheets
# print data frame
print(data_frame)
}
# specifying the path name
path <- "/Users/mallikagupta/Desktop/Gfg.xlsx"
multiplesheets(path)
You can use purrr::walk2().
Since multiplesheets() returns a list of sheets, you can use walk2() to walk over the list of sheets and a vector of file names in parallel, saving each sheet under its name:
sheets <- multiplesheets(path)
filenames <- paste0(names(sheets), ".csv")
## install.packages("purrr") #(if you haven't already)
purrr::walk2(sheets, filenames, write.csv) # or try readr::write_csv() for nicer output
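If you would rather not depend on purrr, the same parallel walk can be sketched in base R with Map(). The `sheets` list below is a made-up stand-in for the output of multiplesheets(), and the files go to a temporary directory (`out_dir` is an illustrative name), so treat this as a sketch rather than a drop-in replacement.

```r
# Stand-in for the list returned by multiplesheets(); the data is invented
sheets <- list(
  Sheet1 = data.frame(id = 1:2, value = c("a", "b")),
  Sheet2 = data.frame(id = 3:4, value = c("c", "d"))
)

# One file name per sheet, written to a temporary directory
out_dir <- tempdir()
filenames <- file.path(out_dir, paste0(names(sheets), ".csv"))

# Map() pairs each data frame with its file name, like walk2()
invisible(Map(function(df, f) write.csv(df, f, row.names = FALSE),
              sheets, filenames))

# Check that every file was written
file.exists(filenames)
```

Note that write.csv() writes row names by default, which is why `row.names = FALSE` is set here.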

Reading xlsx with multiple sheets in R for duplication removal

I have an Excel file with multiple sheets. My main goal is to remove all rows that appear multiple times within a single sheet, and to do this for every sheet.
I have written the code below, but it only reads the first sheet and also gives ' ...' in the first row and column. Can someone help me figure out where I am going wrong? Thank you in advance.
config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1]
count_of_xl_sheets <- length(config_xl_sheets_name)
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)
This is my approach:
library(readxl)
library(data.table)
library(openxlsx)
file.to.read <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)
# read sheets from the file to a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
data <- setDT(readxl::read_excel(file.to.read, sheet = x))
# remove duplicates
data[!duplicated(data), ]
})
# create a new workbook
wb <- createWorkbook()
# create new worksheets and write to them
for (i in seq_along(L)) {
addWorksheet(wb, sheets.to.read[i])
writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")
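The key point is that the de-duplication has to be kept per sheet; in the question's loop, updateddata is overwritten on every iteration, so only the last sheet survives. A minimal base R sketch of the per-sheet step, using an invented list `sheets` in place of the data read from Excel:

```r
# Stand-in for the list of sheets read from the workbook; data is invented
sheets <- list(
  s1 = data.frame(id = c(1, 1, 2), v = c("a", "a", "b")),
  s2 = data.frame(id = c(3, 3), v = c("c", "c"))
)

# unique() on a data frame drops duplicated rows; lapply() keeps one
# de-duplicated data frame per sheet instead of overwriting a single object
deduped <- lapply(sheets, unique)

# Number of rows left in each sheet
sapply(deduped, nrow)
```

The resulting list keeps the sheet names, so it can be written back sheet by sheet as in the workbook loop above.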

Export List of Lists as CSV into Separate Files or Excel Sheets

I have a list of lists, mylists, containing data.table objects:
x <- rep("example",5)
y <- 1:5
list1 <- list('a'= data.table(x,y),'b' = data.table(x,y))
list2 <- list('c'= data.table(x,y), 'd' = data.table(x,y))
mylists <- list('Output1'= list1,'Output2' =list2)
mylists
I want to export every object of every list as a separate CSV file (preferably using fwrite from data.table), named after the object, e.g. Output1_a.csv.
I cannot rbind to one data.frame/table as the data needs to be kept separate.
I've tried using
lapply(mylists,fwrite)
but have trouble producing separate files with different names.
Additionally, how could I produce an xlsx file where all objects of mylists are stored in separate sheets named as described above?
I'd like to know both ways as this might be useful for the future.
For the xlsx version, you could do something like this:
x <- rep("example",5)
y <- 1:5
list1 <- list('a'= data.table(x,y),'b' = data.table(x,y))
list2 <- list('c'= data.table(x,y), 'd' = data.table(x,y))
mylists <- list('Output1'= list1,'Output2' =list2)
purrr::walk(names(mylists),
function(x){
writexl::write_xlsx(mylists[[x]],
path = paste0(x, ".xlsx"))
})
This will produce xlsx files named after the outer list elements, with sheets named after the inner list elements.
For the CSV version, I would do something like this, first flattening the list:
mylists_flat <- unlist(mylists, recursive = FALSE)
purrr::walk(names(mylists_flat),
function(x){
write.csv(mylists_flat[[x]],
file = paste0(x, ".csv"))
})
This should produce CSV files named <outerlist_name>.<innerlist_name>.csv
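If the underscore form from the question (Output1_a.csv) matters, one option is to build the file names yourself instead of relying on the names that unlist() generates. The sketch below is base R only: it uses plain data frames and write.csv() so it runs without extra packages, and writes to a temporary directory (`out_dir` is an illustrative name). With data.table loaded, data.table::fwrite(x, file) could be swapped in for the write.csv() call.

```r
# Same shape as mylists in the question, but with plain data frames
x <- rep("example", 5)
y <- 1:5
mylists <- list(
  Output1 = list(a = data.frame(x, y), b = data.frame(x, y)),
  Output2 = list(c = data.frame(x, y), d = data.frame(x, y))
)

out_dir <- tempdir()

# Walk the outer and inner names and join them with an underscore
for (outer in names(mylists)) {
  for (inner in names(mylists[[outer]])) {
    file <- file.path(out_dir, paste0(outer, "_", inner, ".csv"))
    write.csv(mylists[[outer]][[inner]], file, row.names = FALSE)
  }
}

# List the files that were written
list.files(out_dir, pattern = "^Output[12]_[a-d]\\.csv$")
```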

Assigning the filename and sheet name to (multiple) observations in R

I have written a function that, given the path to a folder, takes all the Excel files inside it and merges them into a data frame with some modest modifications.
Yet there are two small things I would like to add but am struggling with:
Each file has a country code in its name, and I would like the function to create an additional column in the data frame, "Country", in which each observation is assigned that country code. Example file name: BGR_CO2_May12
Each file is composed of many sheets, each representing a year and named after that year. I would like the function to create another column, "Year", in which each observation is assigned the name of the sheet it comes from.
Is there a neat way to do it? Possibly without modifying the current function?
multmerge_xls_TEST <- function(mypath2) {
library(dplyr)
library(readxl)
library(XLConnect)
library(XLConnectJars)
library(stringr)
# Get the list of Excel files in the given folder
re_file <- ".+\\.xls.?"
testFiles <- list.files(path = mypath2,
pattern = re_file,
full.names = TRUE)
# This function rbinds in a single dataframe the content of multiple sheets in the same workbook
# (assuming that all the sheets have the same column types)
# It also removes the first sheet (no data there)
rbindAllSheets <- function(file) {
wb <- loadWorkbook(file)
removeSheet(wb, sheet = 1)
sheets <- getSheets(wb)
do.call(rbind,
lapply(sheets, function(sheet) {
readWorksheet(wb, sheet)
})
)
}
# Getting a single dataframe for all the Excel files and cutting out the unnecessary variables
result <- do.call(rbind, lapply(testFiles, rbindAllSheets))
result <- result[,c(1,2,31)]
result
}
Try making a wrapper around readWorksheet(). This would store the file name into the variable Country and the sheet name into Year. You would need to do some regex on the file though to get the code only.
You could also skip the wrapper and simply add the mutate() line within your current function. Note this uses the dplyr package, which you already have referenced.
read_worksheet <- function(sheet, wb, file) {
readWorksheet(wb, sheet) %>%
mutate(Country = file,
Year = sheet)
}
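For the regex step mentioned above, a minimal sketch follows. It assumes every file name starts with the country code followed by an underscore, as in the BGR_CO2_May12 example; `extract_country` is a hypothetical helper name, and the folder in the call is invented.

```r
# Hypothetical helper: keep everything before the first underscore
# of the file name (assumes names like BGR_CO2_May12.xlsx)
extract_country <- function(file) {
  sub("_.*$", "", basename(file))
}

extract_country("/some/folder/BGR_CO2_May12.xlsx")
# → "BGR"
```

Inside read_worksheet(), the mutate() call could then use `Country = extract_country(file)` instead of the full path.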
So then you could do something like this within the function you already have.
rbindAllSheets <- function(file) {
wb <- loadWorkbook(file)
removeSheet(wb, sheet = 1)
sheets <- getSheets(wb)
do.call(rbind,
lapply(sheets, read_worksheet, wb = wb, file = file)
)
}
As another note, bind_rows() is another dplyr function which can take the place of your do.call(rbind, ...) calls.
bind_rows(lapply(sheets, read_worksheet, wb = wb, file = file))

How to unlist a list to a specific level

The following function can be used to load an entire excel workbook
install.packages("xlsx")
library("xlsx")
library(readxl)
read_excel_allsheets <- function(filename)
{
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
names(x) <- sheets
x
}
mysheets <- read_excel_allsheets(choose.files())
I am trying to find a way to unlist all worksheets into the global environment as separate data frames. Thus far I have been accessing them one at a time using mysheets$, but this is inefficient for the large workbooks I am using.
I have tried
unlist(mysheets, recursive=F)
but it does not provide the desired result.
I don't see what is inefficient about using mysheets$. You could rename mysheets to a shorter name if that's what's bothering you. If you really want to move the dfs to the global environment, you could use this hack:
for (s in names(mysheets)) {
eval(parse(text = sprintf("%s <- mysheets[['%s']]", make.names(s), s)))
}
rm(mysheets)
Note that the dfs' names will be changed by make.names. If you'd like to keep the original names, change the line inside the for loop to:
eval(parse(text = sprintf("`%s` <- mysheets[['%s']]", s, s)))
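As an alternative to building code strings, assign() can do the same job directly; unlike the eval(parse()) version, it does not break when a sheet name contains a quote character. This is a sketch under assumptions: `mysheets` is an invented stand-in for the list read from Excel, and a fresh environment `target` is used so the example does not clutter the workspace (use `.GlobalEnv` to get the behaviour asked for).

```r
# Stand-in for the list returned by read_excel_allsheets(); data is invented
mysheets <- list(
  `Sheet 1` = data.frame(a = 1:2),
  Totals    = data.frame(b = 3:4)
)

target <- new.env()

# Create one object per sheet; make.names() turns "Sheet 1" into "Sheet.1"
for (s in names(mysheets)) {
  assign(make.names(s), mysheets[[s]], envir = target)
}

ls(target)
```

Dropping make.names() keeps the original sheet names, which then need backticks when they are not syntactic R names.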
