Combining multiple Excel files by the name of their sheets using RStudio - r

In my folder I have 45 different Excel files, each with only one sheet and a different data structure. I want to import them into R and then export them back into one single Excel file. Up to this point I can manage with the help of Google. However, I want the sheet names in the exported file to be what they were in the original files. Below is what my code looks like so far, and I can't figure out how to change it (this code works fine, just not the way I want).
library(xlsx)

setwd("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\")
data.files <- list.files(pattern = "*.xlsx")

# read the first sheet of each file
files <- lapply(data.files, function(x) read.xlsx2(x, sheetIndex = 1))

for (i in 1:length(data.files)) {
  # loading the workbook or excel file
  wbook <- loadWorkbook(data.files[i])
  # extracting sheets from the individual workbooks
  sheet <- getSheets(wbook)
  for (j in 1:length(sheet)) {
    assign(paste("global.", i, j, sep = ""),
           read.xlsx2(data.files[i], sheetIndex = j,
                      as.data.frame = TRUE, header = TRUE))
  }
}

for (i in 1:length(data.files)) {
  if (i == 1)
    write.xlsx(files[[i]], file = "global-data.xlsx",
               sheetName = paste("global", i))
  else
    write.xlsx(files[[i]], file = "global-data.xlsx",
               sheetName = paste("global", i), append = TRUE)
}
Created on 2022-08-22 by the reprex package (v2.0.1)
Currently the sheets in the exported file end up named global 1, global 2, global 3, etc., while I want them to keep the names (e.g. XYZ/ABC/DEF) they had when I imported the files into RStudio.
EDIT
To make it clearer: I have a folder with 45 files inside, all with different structures. My goal is not to create a single data frame.
For example, my first Excel file is named "SLIDE1", and its sheet is also named "SLIDE1". In the same way, the 45th file has a sheet named "SLIDE45".
After importing all 45 of them, I want to export them back as one single Excel file with 45 different worksheets, where the worksheets are named "SLIDE1", "SLIDE2", ..., "SLIDE45" and so on.

This should work:
library(tidyverse)
files <- list.files("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\",
                    pattern = "\\.xlsx$",
                    full.names = TRUE)

files_list <- map(files, rio::import) |>
  setNames(tools::file_path_sans_ext(basename(files)))

rio::export(files_list, "C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\insulin.xlsx")
map iterates over the vector files, applies the same function to each file (in this case import), and returns a list with one element per iteration (essentially the same as lapply).
rio::import can handle most file types automatically and is easier to use than loadWorkbook.
setNames sets a name for each list element.
rio::export likewise can handle most file types and uses sensible default values for export. When given a named list of data frames, it writes each element into its own sheet.
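The approach above names each output sheet after its source file, which works here because the file name and sheet name coincide. If they can differ, here is a variant (a sketch, assuming the readxl and writexl packages are installed; the folder path is a placeholder) that names each output sheet after the actual first-sheet name of the source file:

```r
library(readxl)   # excel_sheets(), read_excel()
library(writexl)  # write_xlsx()

files <- list.files("path/to/folder", pattern = "\\.xlsx$", full.names = TRUE)

# read each file's first sheet and record that sheet's original name
sheet_names <- vapply(files, function(f) excel_sheets(f)[1], character(1))
files_list  <- setNames(lapply(files, read_excel), sheet_names)

# write one workbook whose sheets carry the original sheet names
write_xlsx(files_list, "path/to/folder/combined.xlsx")
```

writexl, like rio::export, writes a named list of data frames as one sheet per element.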

Related

Read excel folder with some containing worksheets in R

I have a folder containing Excel files. Some of them have multiple worksheets, and I want to know how to handle the special case (maybe with a for loop) of identifying which files have worksheets and then selecting only certain tabs, joining all of the Excel files together at the end. In addition, these tabs need to skip 40 lines, which I assume means skip = 40. The code I have so far looks like a giant mess.
library(readxl)
library(dplyr)

files <- list.files(path = "/Users/Desktop/folder2", pattern = "*.xlsx", full.names = TRUE)
files_join <- lapply(files, read_excel) %>%
  bind_rows()
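One way to pick certain tabs per file is to inspect each file's sheets before reading. A sketch (the sheet name "Sheet2" and skip = 40 are assumptions for illustration; readxl::excel_sheets() lists each file's tabs so you can decide which to read):

```r
library(readxl)
library(dplyr)

files <- list.files("/Users/Desktop/folder2", pattern = "\\.xlsx$", full.names = TRUE)

read_one <- function(f) {
  tabs <- excel_sheets(f)
  # if the file has a "Sheet2" tab, read that one and skip the first 40 rows;
  # otherwise fall back to the first sheet
  if ("Sheet2" %in% tabs) {
    read_excel(f, sheet = "Sheet2", skip = 40)
  } else {
    read_excel(f, sheet = 1)
  }
}

files_join <- lapply(files, read_one) %>% bind_rows()
```

Adjust the condition inside read_one to whatever rule identifies the files that need special handling.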

r batch reading xls files from specific sheet

I have around 30 excel files in my folder.
I am interested in reading them all.
I used this code below
library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
The problem is that each Excel file has multiple sheets and I am interested in the contents of only one sheet, sheetName = "Piano", not in the contents of the other sheets.
So how can I ensure that, in addition to reading all 30 Excel files, R reads only the data from sheetName = "Piano" in each of them? Thanks.
We can make use of the sheet argument of read_excel. According to ?read_excel
sheet - Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first sheet.
library(purrr)
library(readxl)

df.list <- map(file.list, read_excel, sheet = 'Piano')
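Note that read_excel errors if a file has no "Piano" sheet. If some of the 30 files might lack it, a defensive sketch (assuming the purrr and readxl packages) that reads only the files containing that sheet:

```r
library(purrr)
library(readxl)

file.list <- list.files(pattern = "\\.xlsx$")

# keep only the files that actually contain a "Piano" sheet, then read it
files_with_piano <- keep(file.list, ~ "Piano" %in% excel_sheets(.x))
df.list <- map(files_with_piano, read_excel, sheet = "Piano")
```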

How to convert xlsx files to csv files in RStudio? Need to convert multiple workbooks all with multiple spreadsheets

I'm trying to write an R script that converts multiple xlsx workbook files within a folder, saving the sheets within each workbook as separate csv files.
I'm looking for a single script that automatically applies the code to all workbooks and their spreadsheets.
For reading Excel files, there are several packages.
I personally am happy with the xlsx package, which you can use to read Excel files, as well as their individual sheets. This article looks like it will give you the gist of it.
Each worksheet you read out you should then be able to export to CSV files by using R's built-in write.csv (or write.csv2) method.
Below is an example to convert a single xlsx workbook to multiple csv files.
Note that type conversions are not guaranteed to be correct.
xlsx_path <- "path_to_xlsx.xlsx"
sheet_names <- readxl::excel_sheets(xlsx_path)

# read all sheets into a list of data frames
xlsx_data <- purrr::map(
  sheet_names,
  ~ readxl::read_excel(xlsx_path, .x, col_types = "text", col_names = FALSE)
)

# write the list of data frames to csv files
purrr::walk2(
  xlsx_data, sheet_names,
  ~ readr::write_csv(.x, paste0(xlsx_path, "-", .y, ".csv"), col_names = FALSE)
)

# csv files will be saved as:
# path_to_xlsx.xlsx-sheet1.csv, path_to_xlsx.xlsx-sheet2.csv, ...
If you need to apply this to many xlsx files, use list.files() to get the paths to all of them, then write a for loop or use another map function to iterate the process.
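That outer iteration could look like the following sketch, where the per-file conversion above is wrapped in a function xlsx_to_csvs() (a name chosen here for illustration) and the folder path is a placeholder:

```r
# wrapper around the per-file sheet-to-csv conversion shown above
xlsx_to_csvs <- function(xlsx_path) {
  sheet_names <- readxl::excel_sheets(xlsx_path)
  xlsx_data <- purrr::map(
    sheet_names,
    ~ readxl::read_excel(xlsx_path, .x, col_types = "text", col_names = FALSE)
  )
  purrr::walk2(
    xlsx_data, sheet_names,
    ~ readr::write_csv(.x, paste0(xlsx_path, "-", .y, ".csv"), col_names = FALSE)
  )
}

# apply it to every workbook in the folder
all_xlsx <- list.files("path/to/folder", pattern = "\\.xlsx$", full.names = TRUE)
purrr::walk(all_xlsx, xlsx_to_csvs)
```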
If you are using RStudio, it is possible that you already have the readxl package installed. Its site explains many workflows for common use cases: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html
They also provide this nice code snippet to do what you are asking for:
read_then_csv <- function(sheet, path) {
  pathbase <- tools::file_path_sans_ext(basename(path))
  df <- read_excel(path = path, sheet = sheet)
  write.csv(df, paste0(pathbase, "-", sheet, ".csv"),
            quote = FALSE, row.names = FALSE)
  df
}

path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)
xl_list <- lapply(excel_sheets(path), read_then_csv, path = path)
names(xl_list) <- sheets
If you go here and put "excel" and "xls" in the search bar, you'll get a list of packages and functions that might help.

How do I modify an existing sheet in an Excel workbook using the openxlsx package in R?

I am using the "openxlsx" package to read and write Excel files. I have a fixed file with a sheet called "Data" which is used by formulas in other sheets. I want to update this Data sheet without touching the others.
I am trying the following code:
write.xlsx(x = Rev_4, file = "Revenue.xlsx", sheetName="Data")
But this erases the Excel file and creates a new one with just the new data in the "Data" sheet, while everything else gets deleted. Any advice?
Try this:
library(openxlsx)

wb <- loadWorkbook("Revenue.xlsx")
writeData(wb, sheet = "Data", Rev_4, colNames = FALSE)
saveWorkbook(wb, "Revenue.xlsx", overwrite = TRUE)
You need to load the complete workbook, then modify its data, and then save it back to disk. With writeData you can also specify the starting row and column, and you could modify other sections before saving to disk.
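For example, to overwrite only part of the "Data" sheet while leaving the rest of its cells alone (a sketch, assuming the openxlsx package and a workbook laid out as described; the offsets are illustrative):

```r
library(openxlsx)

wb <- loadWorkbook("Revenue.xlsx")

# write the new values starting at cell B2, leaving row 1 and column A intact
writeData(wb, sheet = "Data", x = Rev_4,
          startRow = 2, startCol = 2, colNames = FALSE)

saveWorkbook(wb, "Revenue.xlsx", overwrite = TRUE)
```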
I've found this package. It depends on openxlsx and helps to insert many sheets into an xlsx file. Maybe it makes things easier:
Package documentation
library(xlsx2dfs)

# However, be careful: xlsx2dfs assumes that all sheets contain
# simple tables. If that is not the case, use the accepted answer!
dfs <- xlsx2dfs("Revenue.xlsx")  # all sheets of the file as a list of data frames
dfs[["Data"]] <- Rev_4           # replace the data frame of sheet "Data" with the updated Rev_4
dfs2xlsx(dfs, "Revenue.xlsx")    # careful: this overwrites the existing file!

To stack up results in one masterfile in R

Using this script I have created a specific folder for each csv file and then saved all my further analysis results in that folder. The name of the folder and the csv file are the same. The csv files are stored in the main/master directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the particular filename
Read fitted values file
Add a row/column stating the name of the site/ unique ID
Add it to the masterfile which is stored in the main directory with a title specifying site name/filename. It can be stacked by rows or by columns it doesn't really matter.
Come to the main directory to pick the next file
Repeat the loop
Using merge(), rbind(), or cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd("path")   # main directory
path <- "path"  # needed for convenience while switching back to the main directory

# list all csv files in the main directory
files <- list.files(path = path, pattern = "*.csv")

for (i in seq(1, length(files), by = 1)) {
  fileName <- read.csv(files[i])            # read the csv file
  base <- strsplit(files[i], ".csv")[[1]]   # get the file name without extension
  setwd(file.path(path, base))              # set the working directory to the folder of the same name
  # read the fitted-values csv file for the site (paste0 avoids the space
  # that paste() would insert into the file name)
  master <- read.csv(paste0(base, "_fiited_values curve.csv"))
}
I want to construct a for loop to make one master file from the files in the different directories. I do not want to merge them all under one column name.
For example, if I have 50 similar csv files, each with two columns of data, I would like one csv file that accommodates all of them, but in their original format rather than appended to existing rows/columns. I would then have 100 columns of data.
Please tell me what further information can I provide?
For reading a group of files from a number of different directories, with path names patha, pathb, pathc:

paths <- c('patha', 'pathb', 'pathc')
files <- unlist(sapply(paths, function(path) list.files(path, pattern = "*.csv", full.names = TRUE)))
listContainingAllFiles <- lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:

library(data.table)
listContainingAllFiles <- lapply(files, fread)
Either way this will give you a list of all objects, kept separate. If you want to join them together vertically/horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: note that cbind only makes sense if corresponding rows actually mean something across files. It makes far more sense to just create a field tracking which location the data came from.
If you want to use the names of the files as the method of determining sample location (I don't see where you're getting this info in your example), then you want to do this as you read in the files, so:

listContainingAllFiles <- lapply(files,
                                 function(file) data.frame(filename = file,
                                                           read.csv(file)))
Then later you can split that column to get your details (assuming, of course, you have a standard naming convention).
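For instance, if the files follow a pattern like "siteA_2020.csv" (a made-up convention for illustration), the tracked filename column can be split afterwards:

```r
# combine the list, then derive site/year from the tracked file name
all_data <- do.call(rbind, listContainingAllFiles)

base  <- tools::file_path_sans_ext(basename(all_data$filename))
parts <- strsplit(base, "_")                          # "siteA_2020" -> "siteA", "2020"
all_data$site <- vapply(parts, `[`, character(1), 1)  # first piece
all_data$year <- vapply(parts, `[`, character(1), 2)  # second piece
```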
