r batch reading xls files from specific sheet - r

I have around 30 excel files in my folder.
I am interested in reading them all.
I used this code below
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
The problem is that each Excel file has multiple sheets, and I am interested in the contents of only one of them, sheetName = "Piano". I do not need the other sheets.
So, in addition to reading all 30 Excel files, how can I ensure that R reads only the data from sheetName = "Piano" in each of them? Thanks.

We can make use of the sheet argument of read_excel. According to ?read_excel
sheet - Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first sheet.
library(purrr)  # map() comes from purrr, not dplyr
library(readxl)
file.list <- list.files(pattern = "\\.xlsx$")
df.list <- map(file.list, read_excel, sheet = 'Piano')
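If some of the workbooks might not contain a sheet named Piano, read_excel stops with an error at the first such file. A minimal sketch of one way to guard against that, assuming a hypothetical helper read_sheet_from_all (my own name, not part of readxl) and that all files sit in one folder:

```r
library(readxl)

# Hypothetical helper: read one named sheet from every .xlsx file in a folder,
# skipping workbooks that do not contain that sheet.
read_sheet_from_all <- function(dir = ".", sheet = "Piano") {
  paths <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  # TRUE for each workbook that actually has the requested sheet
  has_sheet <- vapply(paths, function(p) sheet %in% excel_sheets(p), logical(1))
  keep <- paths[has_sheet]
  out <- lapply(keep, read_excel, sheet = sheet)
  # Name each data frame after its source file so it can be traced back
  names(out) <- basename(keep)
  out
}

# Usage: df.list <- read_sheet_from_all(".", "Piano")
```

The result is a list named by file, so it is easy to see which workbooks were skipped.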

Related

Combining multiple excel files by the name of their sheets using rstudio

My folder contains 45 different Excel files, each with only one sheet and a different data structure. I want to import them into R and then export them into one single Excel file. I can get that far with the help of Google. However, I want each sheet in the exported file to keep the name it had in its original file. Below is my code so far; it runs fine, just not the way I want, and I can't figure out how to change it.
setwd("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\")
data.files <- list.files(pattern = "*.xlsx")
# read the files
files <- lapply(data.files, function(x) read.xlsx2(x, sheetIndex = 1))
for (i in 1:length(data.files)){
# loading the workbook or excel file
wbook <- loadWorkbook(data.files[i])
#extracting sheets from the individual workbooks
sheet <- getSheets(wbook)
for (j in 1:length(sheet)){
assign(paste("global.", i,j, sep = ""),
read.xlsx2(data.files[i], sheetIndex=j,
as.data.frame=TRUE, header=TRUE))
}
}
for (i in 1:length(data.files)) {
if(i==1)
write.xlsx(files[[i]], file="global-data.xlsx",
sheetName = paste("global",i))
else
write.xlsx(files[[i]], file="global-data.xlsx",
sheetName = paste("global",i), append=TRUE)
}
Created on 2022-08-22 by the reprex package (v2.0.1)
Currently it names the sheets in the exported file Global1/2/3 etc., whereas I want the sheet names to be XYZ/ABC/DEF etc., exactly as they were when the files were imported into RStudio.
EDIT
To make it clearer: I have a folder with 45 files in it. All of them have a different structure. I do not need to create a single DF.
E.g. my first Excel file is named "SLIDE1", and its sheet is also named "SLIDE1". In the same way, the 45th file has a sheet named "SLIDE45".
After importing all 45 of them, I want to export them as one single Excel file with 45 worksheets, named "SLIDE1", "SLIDE2", ..., "SLIDE45".
This should work:
library(tidyverse)
files <- list.files("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\",
pattern = "\\.xlsx$",
full.names = TRUE)
files_list <- map(files, rio::import) |>
setNames(tools::file_path_sans_ext(basename(files)))
rio::export(files_list, "C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\insulin.xlsx")
map iterates over the vector files, applies the same function to each file (in this case import), and returns a list with one element per iteration (essentially the same as lapply).
rio::import can handle most file types automatically and is easier to use than loadWorkbook.
setNames assigns a name to each list element.
rio::export likewise can handle most file types and uses sensible default values for export. When given a named list of data.frames, it writes each element into its own sheet, named after the list element.
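The answer above names each sheet after its file, which matches the question only because file and sheet names coincide. If they differ, the original sheet name can be read with readxl::excel_sheets. A minimal sketch, assuming each workbook has a single sheet with a unique name; combine_first_sheets is a hypothetical helper, and writexl::write_xlsx is swapped in for rio::export here:

```r
library(readxl)
library(writexl)

# Hypothetical helper: read the first sheet of each workbook and write all of
# them into one workbook, keeping the original sheet names.
combine_first_sheets <- function(files, out) {
  sheets <- lapply(files, read_excel)  # read_excel reads the first sheet by default
  # Look up the name of the first sheet in each source workbook
  sheet_names <- vapply(files, function(f) excel_sheets(f)[1], character(1),
                        USE.NAMES = FALSE)
  names(sheets) <- sheet_names
  # A named list of data frames is written as one sheet per element
  write_xlsx(sheets, out)
  invisible(sheet_names)
}

# Usage (output written outside the input folder so it is not picked up on a rerun):
# files <- list.files("input", pattern = "\\.xlsx$", full.names = TRUE)
# combine_first_sheets(files, "combined.xlsx")
```

Excel requires sheet names to be unique within a workbook, so this sketch would need extra handling if two source files share a sheet name.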

Reading multiple excel sheets into separate data frames then append with sheets from another excel file in R

I am trying to read multiple excel sheets of an excel file into separate dataframes in R. I have 31 sheets. So I tried this:
setwd('E:/Report_folder')
x1_data <- 'E:/Report_folder/report_1h.xlsx'
sheets <- excel_sheets(x1_data)
book <- data.frame()
for (i in 1:31){
book <- read_excel(path = x1_data, sheet = i)
names(book) <- sheets[i]
}
But somehow it doesn't work. Can someone tell me why?
I also have multiple excel files with same sheet names and format, and I want to append all the sheets with similar names and formats in different excel files into separate data frames. Shall I use a loop to read each sheet in those excel files, then append to the associated sheet of the first excel file?
Thank you.
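In the loop above, book is overwritten on every pass, so only the last sheet survives, and names(book) <- sheets[i] tries to rename the columns of book rather than label the data frame. A minimal sketch of one alternative, keeping every sheet in a named list and then row-binding same-named sheets across files; read_all_sheets and append_books are hypothetical helper names, and the sketch assumes all workbooks share the sheet names and column layout of the first one:

```r
library(readxl)

# Hypothetical helper: read every sheet of one workbook into a named list.
read_all_sheets <- function(path) {
  sheets <- excel_sheets(path)
  out <- lapply(sheets, function(s) read_excel(path, sheet = s))
  names(out) <- sheets
  out
}

# Hypothetical helper: for each sheet name, stack that sheet from all workbooks.
append_books <- function(paths) {
  books <- lapply(paths, read_all_sheets)
  sheet_names <- names(books[[1]])  # assumes every workbook has these sheets
  out <- lapply(sheet_names, function(s) {
    do.call(rbind, lapply(books, function(b) b[[s]]))
  })
  names(out) <- sheet_names
  out
}

# Usage: all_data <- append_books(c("report_1h.xlsx", "report_2h.xlsx"))
# all_data[["Sheet1"]] then holds the rows of Sheet1 from both files.
```

The second file name in the usage comment is made up for illustration. Keeping the sheets in a list avoids creating 31 separate variables and makes the append step a single lapply.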

Combine csv files of different formats and make into one excel with different sheets

I have four CSV files with different formats and variables, and I am combining them into one Excel file using the code below.
library(rJava)
library(xlsx)
rm(list = ls())
# getting the path of all reports (they are in csv format)
files <- list.files(pattern = "\\.csv$")
# creating work book
wb <- createWorkbook()
# going through each csv file
for (item in files)
{
# create a sheet in the workbook
sheet <- createSheet(wb, sheetName=strsplit(item,"[.]")[[1]][1])
# add the data to the new sheet
addDataFrame(read.csv(item), sheet,row.names=FALSE)
}
# saving the workbook
saveWorkbook(wb, "crosstabs of data.xlsx")
In one of the CSV files a variable is named Source / Medium, but in the output Excel file it appears as Source...Medium.
The variable % New Sessions appears as X..New.Sessions,
and in general every space or delimiter in a variable name is replaced by . in the output Excel file.
How can I overcome this? I need the variable names in the output Excel file to be exactly the same as in the CSV files.
This happens because read.csv changes the header names. With header=T, a column header like gi/joe is converted to gi.joe. So you need to convert the header names back, using:
names(df) <- gsub("\\.","/",names(df))
Or, if acceptable, simply read the headers as data:
addDataFrame(read.csv(item,header=F), sheet,row.names=FALSE)
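Another option, not part of the answer above, is to stop read.csv from rewriting the headers in the first place via its check.names argument. A small sketch using an inline CSV string made up for illustration:

```r
# Made-up two-column CSV with headers like those in the question
csv_text <- "Source / Medium,% New Sessions\ngoogle / organic,45.2\n"

# Default behaviour: headers are passed through make.names()
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers exactly as written in the file
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)  # "Source...Medium" "X..New.Sessions"
print(kept_names)     # "Source / Medium" "% New Sessions"
```

Unlike the gsub approach, this does not have to guess which character each dot originally stood for.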
On a separate note, names like gi/joe do not appear to be allowed as Excel sheet names. To check this limitation on the Excel side, open Excel and try to name a sheet hi/5. You should get an error saying: The sheet name contains invalid characters: : \ / ? * [ ]. [I tested this on Excel for Mac 15.19.1.]

How do I modify an existing sheet in an Excel Workbook using Openxlsx package in R?

I am using "openxlsx" package to read and write excel files. I have a fixed file with a sheet called "Data" which is used by formulas in other sheets. I want to update this Data sheet without touching the other.
I am trying the following code:
write.xlsx(x = Rev_4, file = "Revenue.xlsx", sheetName="Data")
But this erases the excel file and creates a new one with just the new data in the "Data" sheet while all else gets deleted. Any Advice?
Try this:
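wb <- loadWorkbook("Revenue.xlsx")  # load the whole workbook, all sheets included
writeData(wb, sheet = "Data", Rev_4, colNames = FALSE)  # write only into the "Data" sheet
saveWorkbook(wb, "Revenue.xlsx", overwrite = TRUE)  # save back over the same file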
You need to load the complete workbook, then modify its data and then save it to disk. With writeData you can also specify the starting row and column. And you could also modify other sections before saving to disk.
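As a rough, self-contained illustration of the starting row and column arguments, the sketch below builds a throwaway workbook in a temporary file (the sheet contents are made up), then overwrites only the values of one column in the Data sheet while leaving the header row and the other sheet alone:

```r
library(openxlsx)

# Build a small example workbook with two sheets (made-up data)
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Data")
addWorksheet(wb, "Summary")
writeData(wb, "Data", data.frame(month = c("Jan", "Feb"), revenue = c(10, 20)))
writeData(wb, "Summary", data.frame(note = "untouched"))
saveWorkbook(wb, path, overwrite = TRUE)

# Reload it and replace only the revenue values:
# column 2, starting at row 2, so the existing header in row 1 is kept
wb <- loadWorkbook(path)
new_values <- data.frame(revenue = c(11, 22))
writeData(wb, sheet = "Data", x = new_values,
          startCol = 2, startRow = 2, colNames = FALSE)
saveWorkbook(wb, path, overwrite = TRUE)

updated <- read.xlsx(path, sheet = "Data")
```

Note that writeData overwrites cells but does not clear old ones, so if the new data has fewer rows than the old data, leftover rows remain in the sheet.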
I've found the xlsx2dfs package. It depends on openxlsx and helps to write many sheets to an xlsx file, which may make this easier (see the package documentation):
library(xlsx2dfs)
# However, be careful, the function xlsx2dfs assumes
# that all sheets contain simple tables. If that is not the case,
# use the accepted answer!
dfs <- xlsx2dfs("Revenue.xlsx") # all sheets of file as list of dfs
dfs[["Data"]] <- Rev_4 # replace df of sheet "Data" by updated df Rev_4 (use [[ ]] to assign a whole data frame)
dfs2xlsx(dfs, "Revenue.xlsx") # careful: this overwrites the existing file!

How to ignore hidden data when importing from Excel

I have a collection of excel files which I am importing into R.
The files contain hidden data which I would like to disregard, e.g. by simply not importing it, or by importing it with a flag indicating that it was hidden so that I can then drop it.
The files contain two types of hidden data:
Complete sheets are hidden
Specific Rows within a sheet are hidden.
Is there a way to identify when data in excel is hidden?
Right now I am using the gdata package, but I am happy to use XLConnect or another package.
Sample Code:
library(gdata)
xlsfile <- "test.xls"
# grab all the sheet names.
# This is giving me both hidden & non-hidden sheets. I would like only the latter
sheets <- sheetNames(xlsfile)
# read in the xls file, by sheet
xlData <-
lapply(sheets, function(s)
read.xls(xlsfile, sheet=s, stringsAsFactors = FALSE))
If needed, I can create a dummy xls file and post it.
XLConnect has a nice function called isSheetHidden which does what you want. Assuming Sheet2 is hidden:
library(XLConnect)
xlsfile <- "Book1.xls"
wb <- loadWorkbook(xlsfile, create = TRUE)
isSheetHidden(wb, "Sheet1") # FALSE
isSheetHidden(wb, "Sheet2") # TRUE
In gdata you would have to write your own function that calls the underlying perl package to access the sheet property, but it is possible.
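To apply the isSheetHidden check to every sheet at once, the sheet names can be filtered before reading. A minimal sketch, assuming a hypothetical helper visible_sheets (my own name, not an XLConnect function); it does not address hidden rows within a sheet:

```r
library(XLConnect)

# Hypothetical helper: names of all sheets that isSheetHidden() reports as not hidden.
visible_sheets <- function(wb) {
  sheets <- getSheets(wb)
  hidden <- vapply(sheets, function(s) isTRUE(unname(isSheetHidden(wb, s))),
                   logical(1), USE.NAMES = FALSE)
  sheets[!hidden]
}

# Usage:
# wb <- loadWorkbook("Book1.xls")
# xlData <- lapply(visible_sheets(wb), function(s) readWorksheet(wb, sheet = s))
```

The list returned by lapply in the usage comment then contains only the sheets that are not hidden.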
