Reading xlsx with multiple sheets in R for duplication removal

Reading xlsx with multiple sheets in R for duplication removal - r

I have a excel file which has multiple sheets embedded in it. My main goal is to basically remove all rows which are appearing multiple times in a single sheet and have to do this for every sheet.
I have written the code below but the code is only reading the first sheet and also giving ' ...' in first row and column. Can someone help me out where I might be going wrong. Thank you in advanced
**config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1]
count_of_xl_sheets <- length(config_xl_sheets_name)
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)**

this is my approach
library(readxl)
library(data.table)
library(openxlsx)
file.to.read <- "./testdata.xlsx"
sheets.to.read <- readxl::excel_sheets(file.to.read)
# read sheets from the file to a list and remove duplicate rows
L <- lapply(sheets.to.read, function(x) {
data <- setDT(readxl::read_excel(file.to.read, sheet = x))
#remove puplicates
data[!duplicated(data), ]
})
# create a new workbook
wb <- createWorkbook()
# create new worksheets an write to them
for (i in seq.int(L)) {
addWorksheet(wb, sheets.to.read[i])
writeData(wb, i, L[[i]] )
}
# write the workbook to disk
saveWorkbook(wb, "testdata_new.xlsx")

Related

Read and save multiple sheet

I have found this code and I would like to save the different sheets in R from Excel, how can I change the code?
library(readxl)
multiplesheets <- function(fname) {
# getting info about all excel sheets
sheets <- readxl::excel_sheets(fname)
tibble <- lapply(sheets, function(x) readxl::read_excel(fname, sheet = x))
data_frame <- lapply(tibble, as.data.frame)
# assigning names to data frames
names(data_frame) <- sheets
# print data frame
print(data_frame)
}
# specifying the path name
path <- "/Users/mallikagupta/Desktop/Gfg.xlsx"
multiplesheets(path)

You can use purrr::walk2()
Since the output of multiplesheets is a list of sheets, you can use walk2() to walk across the list of sheets and a list of their names, saving each sheet by its name:
sheets <- multiplesheets(path)
filenames <- paste0(names(sheets), ".csv")
## install.packages("purrr") #(if you haven't already)
purrr::walk2(sheets, filenames, write.csv) # or try readr::write_csv() for nicer output

Assigning the filename and sheet name to (multiple) observations in R

I have written a function that, after giving the direction of the folder, takes all the excel files inside it and merges them into a data frame with some modest modifications.
Yet I have two small things I would like to add but struggle with:
Each file has a country code in the name, and I would like the function to create an additional column in the data frame, "Country", where each observation would be assigned such country code. name example: BGR_CO2_May12
Each file is composed of many sheets, with each sheet representing the year; these sheets are also called by these years. I would like the function to create another column, "Year", where each observation would be assigned the name of the sheet that it comes from.
Is there a neat way to do it? Possibly without modifying the current function?
multmerge_xls_TEST <- function(mypath2) {
library(dplyr)
library(readxl)
library(XLConnect)
library(XLConnectJars)
library(stringr)
# This function gets the list of files in a given folder
re_file <- ".+\\.xls.?"
testFiles <- list.files(path = mypath2,
pattern = re_file,
full.names = TRUE)
# This function rbinds in a single dataframe the content of multiple sheets in the same workbook
# (assuming that all the sheets have the same column types)
# It also removes the first sheet (no data there)
rbindAllSheets <- function(file) {
wb <- loadWorkbook(file)
removeSheet(wb, sheet = 1)
sheets <- getSheets(wb)
do.call(rbind,
lapply(sheets, function(sheet) {
readWorksheet(wb, sheet)
})
)
}
# Getting a single dataframe for all the Excel files and cutting out the unnecessary variables
result <- do.call(rbind, lapply(testFiles, rbindAllSheets))
result <- result[,c(1,2,31)]

Try making a wrapper around readWorksheet(). This would store the file name into the variable Country and the sheet name into Year. You would need to do some regex on the file though to get the code only.
You could also skip the wrapper and simply add the mutate() line within your current function. Note this uses the dplyr package, which you already have referenced.
read_worksheet <- function(sheet, wb, file) {
readWorksheet(wb, sheet) %>%
mutate(Country = file,
Year = sheet)
}
So then you could do something like this within the function you already have.
rbindAllSheets <- function(file) {
wb <- loadWorkbook(file)
removeSheet(wb, sheet = 1)
sheets <- getSheets(wb)
do.call(rbind,
lapply(sheets, read_worksheet, wb = wb, file = file)
)
}
As another note, bind_rows() is another dplyr function which can take the place of your do.call(rbind, ...) calls.
bind_rows(lapply(sheets, read_worksheet, wb = wb, file = file))

Looping read.xlsx for reading in multiple sheets and saving them as separate DF´s

I want to read in sheets 3:8 of my excel file and save them separately.
I got something like this:
for (y in 2012:2017){
save("Year" ,y)<- for (i in 3:8)
{
read_xlsx("/Users/.../Desktop/Kriminalität.xlsx", sheet = i , skip = 4)
}

I am guessing you want to save 5 sheets as separate dataframes and name them as Year2012,Year2013....Year2017.
Create an empty list and read the sheets as elements.Name these elements accordingly and then unlist to get separate dataframes
library(openxlsx)
x=list()
for(i in 3: 8){
x[[i]]=read.xlsx("check.xlsx",sheet = i,colNames = T)
}
names(x)=paste0("Year",c(2012:2017))
list2env(x,envir=.GlobalEnv)

setwd("your directory to excel ")
library(readxl)
data="Kriminalität.xlsx" # your excel name
n=8 #number of the sheets in your excel
for (i in 3:n){
y=paste("sheet",i,sep="")
assign(y, read_xlsx(data, sheet = i ,skip = 4))
}
in section for (i in 3:n) you could define your sheets to read, for example for (i in 3:8) means read sheet3 to sheet8
results for a excel with 3 sheets for (i in 1:3):

Write multiple data frames to correct excel sheets using for loops

I have a nested for loop situation where I am looping through several excel work sheets, applying several functions, generating a single data.frame from each work sheet, and writing all of this back to a new work book whose sheet names are the same as the original work book.
At the moment the sheet names are correct in the new work book, but the data frames keep writing over each other so that every sheet has the same data frame. How can I get each data frame to write to its respective excel sheet?
library(xlsx)
library(agricolae)
#Read all worksheets in one file
read_excel_allsheets <- function(filename) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
names(x) <- sheets
x
}
mysheets <- read_excel_allsheets("Loop_help.xls")`
for (i in 1:length(mysheets)){ #loop through sheets in workbook
file<-as.data.frame(mysheets[i])
for(i in 2:ncol(file)){ #loop through columns in sheets
if(var(file[,i]) > 0){ #exclude columns that are all zeros
#create data frame (df)
#do stuff
}
#dostuff
}
#do more stuff
n<-excel_sheets("Loop_help.xls") #sheet names
for (i in 1:length(mysheets)) {#write new sheets to excel
if (i == 1)
write.xlsx(df, file="Loop.Help.xls", sheetName = n[i])
else write.xlsx(df, file="Loop.Help.xls", sheetName = n[i],
append = TRUE)
}
}

Thank you to #Parfait for leading me to this answer.
Not only did I have a superfluous for loop in there, but I was calling i in both loops, when I needed to use a different letter (k).
Here is the final code:
for (k in 1:length(mysheets)){#loop through sheets in workbook
file<-as.data.frame(mysheets[k])
for(i in 2:ncol(file)){#loop through columns in sheets
if(var(file[,i]) > 0){#exclude columns that are all zeros
#create data frame (df)
#do stuff
}
}
#do more stuff
n<-excel_sheets("Loop_help.xls") #sheet names
if (k == 1)
write.xlsx(df, file="Loop.help.xls", sheetName = n[k])#write first sheet
else write.xlsx(df, file="Loop.help.xls", sheetName = n[k],
append = TRUE) #append additional sheets
}

List of data.frame's to individual excel worksheets - R

I have a list of data.frame's that I would like to output to their own worksheets in excel. I can easily save a single data frame to it's own excel file but I'm not sure how to save multiple data frames to the their own worksheet within the same excel file.
library(xlsx)
write.xlsx(sortedTable[1], "c:/mydata.xlsx")

Specify sheet name for each list element.
library(xlsx)
file <- paste("usarrests.xlsx", sep = "")
write.xlsx(USArrests, file, sheetName = "Sheet1")
write.xlsx(USArrests, file, sheetName = "Sheet2", append = TRUE)
Second approach as suggested by #flodel, would be to use addDataFrame. This is more or less an example from the help page of the said function.
file <- paste("usarrests.xlsx", sep="")
wb <- createWorkbook()
sheet1 <- createSheet(wb, sheetName = "Sheet1")
sheet2 <- createSheet(wb, sheetName = "Sheet2")
addDataFrame(USArrests, sheet = sheet1)
addDataFrame(USArrests * 2, sheet = sheet2)
saveWorkbook(wb, file = file)
Assuming you have a list of data.frames and a list of sheet names, you can use them pair-wise.
wb <- createWorkbook()
datas <- list(USArrests, USArrests * 2)
sheetnames <- paste0("Sheet", seq_along(datas)) # or names(datas) if provided
sheets <- lapply(sheetnames, createSheet, wb = wb)
void <- Map(addDataFrame, datas, sheets)
saveWorkbook(wb, file = file)

Here's the solution with openxlsx:
## create data;
dataframes <- split(iris, iris$Species)
# create workbook
wb <- createWorkbook()
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, dataframes, names(dataframes))
## Save workbook to excel file
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)
.. however, openxlsx is also able to use it's function openxlsx::write.xlsx for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map() is if you want to format the sheets in a specific way.

The following code works perfectly, which is from:https://rpubs.com/gbganalyst/RdatatoExcelworkbook
packages <- c("openxlsx", "readxl", "magrittr", "purrr", "ggplot2")
if (!require(install.load)) {
install.packages("install.load")
}
install.load::install_load(packages)
list_of_mydata
write.xlsx(list_of_mydata, "Excel workbook.xlsx")

lets say your list of data frames is called Lst and that the workbook you want to save to is called wb.xlsx. Then you can use:
library(xlsx)
counter <- 1
for (i in length(Lst)){
write.xlsx(x=Lst[[i]],file="wb.xlsx",sheetName=paste("sheet",counter,sep=""),append=T)
counter <- counter + 1
}
G

I think the simplest solution is still missing. Using the writexl package you can write a list of data frames easily:
list_of_dfs <- list(iris, iris)
writexl::write_xlsx(list_of_dfs, "output.xlsx")
Also, if you have a named list then those names become the sheet names:
names(list_of_dfs) <- c("a", "b")
writexl::write_xlsx(list_of_dfs, "output.xlsx")
Alternatively, the rio package allows for more export control and the syntax and handling of named lists is similar:
rio::export(list_of_dfs, "output.xlsx")
You can also easily output them to their own workbooks as well.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Reading xlsx with multiple sheets in R for duplication removal - r

Related

Read and save multiple sheet

Assigning the filename and sheet name to (multiple) observations in R

Looping read.xlsx for reading in multiple sheets and saving them as separate DF´s

Write multiple data frames to correct excel sheets using for loops

List of data.frame's to individual excel worksheets - R

Categories

Resources