Merging multiple Excel files in R

I am trying to read multiple Excel files in RStudio so that I can append them into one large dataset. For some reason none of the Excel files are being read. The working directory is correct, but the result in R is always empty.
The files are named sbir_award_1, sbir_award_2, etc. Any help would be greatly appreciated.
library(tidyverse)
library(readxl)
library(writexl)
sbir <- list.files(pattern = "data/raw/sbir/sbir_award_.xlsx")
data <- list()
df.list <- lapply(file.list,read.xlsx)
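One likely reason the result is empty: list.files() treats pattern as a regular expression and matches it against file names only, so a pattern that contains the directory never matches. The snippet also passes file.list, which is never defined, and read.xlsx, which readxl does not provide. Below is a minimal sketch of one way this could look, assuming the files sit in data/raw/sbir, end in .xlsx, and that readxl and dplyr are installed. The function names find_sbir_files and read_sbir are made up for illustration.

```r
# Sketch only: assumes files named like data/raw/sbir/sbir_award_1.xlsx
# and that the readxl and dplyr packages are installed.

# list.files() matches `pattern` (a regular expression) against file names only,
# so the directory goes in `path`, not in `pattern`.
find_sbir_files <- function(dir = "data/raw/sbir") {
  list.files(
    path = dir,
    pattern = "^sbir_award_.*\\.xlsx$",
    full.names = TRUE
  )
}

# Read the first sheet of every workbook and stack the rows into one data frame.
read_sbir <- function(dir = "data/raw/sbir") {
  files <- find_sbir_files(dir)
  sheets <- lapply(files, readxl::read_excel)
  names(sheets) <- basename(files)
  # .id records which file each row came from
  dplyr::bind_rows(sheets, .id = "source_file")
}

# Usage (commented out because it needs the real files):
# sbir <- read_sbir()
```

If the columns do not have the same types in every file, bind_rows() may stop with an error, so it can be worth checking a couple of files individually first.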

Related

R read multiple password protected xlsx files in directory and convert to csv

I was hoping someone could give me some pointers on how to do this. I have multiple xlsx files in a directory that I'd like to convert to csv and then combine into one csv. I can do this for regular xlsx files with the code below, but now I have to be able to read xlsx files that are password protected. Any ideas how I'd get around that?
library(rio)
library(plyr)
##LOAD, CONVERT AND COMBINE TO CSV
#convert all xlsx to csv
xls <- dir(pattern = "xlsx")
created <- mapply(convert, xls, gsub("xlsx", "csv", xls))
unlink(xls)
#combine csv files in directory
combined_ll <- ldply(list.files(), read.csv, header=TRUE)
They all have the same password, so in theory I should be able to read and unlock each file as it converts them. Any pointers would be so appreciated.
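One possible direction, offered as a sketch rather than a tested solution: the XLConnect package (which depends on Java) lets loadWorkbook() take a password argument, so each file could be opened, read, and written out as csv in a loop. This assumes XLConnect is installed, that the data is on the first sheet, and that all files share one password. The helper names csv_name, read_protected and convert_protected are made up for illustration.

```r
# Sketch only: assumes the XLConnect package (which needs Java) is installed
# and that every workbook shares one password.

# Build the csv path for a given xlsx path (only the extension is changed).
csv_name <- function(xlsx_path) {
  sub("\\.xlsx$", ".csv", xlsx_path, ignore.case = TRUE)
}

# Open one protected workbook and return its first sheet as a data frame.
read_protected <- function(xlsx_path, password) {
  wb <- XLConnect::loadWorkbook(xlsx_path, password = password)
  XLConnect::readWorksheet(wb, sheet = 1)
}

# Write one csv per workbook and return all rows combined.
convert_protected <- function(dir = ".", password) {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  tables <- lapply(files, read_protected, password = password)
  for (i in seq_along(files)) {
    write.csv(tables[[i]], csv_name(files[i]), row.names = FALSE)
  }
  # rbind() requires the same columns in every file
  do.call(rbind, tables)
}

# Usage (commented out; "my_password" is a placeholder):
# combined <- convert_protected(".", password = "my_password")
```

Unlike the original snippet, this sketch does not delete the xlsx files, so nothing is lost if a password turns out to be wrong.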

How to Read Selected Multiple Files in a Folder with sparklyr?

I'd like to read multiple SELECTED files with sparklyr. I have multiple csv files (e.g. a1.csv, a2.csv, a3.csv, a4.csv, a5.csv) in a folder, and I'd like to read a2.csv, a3.csv and a4.csv at once if possible.
I know I can read a csv file with spark_read_csv(sc, "cash", "/dir1/folder1/a2"), so I tried
a_all <- data.frame(col1=integer(),col2=integer())
a_all <- sdf_copy_to(sc, a_all, "a_all")
for (i in 2:4) {
  tmp1 <- spark_read_csv(sc = sc, name = "tmp1", paste0("/dir1/folder1/a", i))
  a_all <- sdf_bind_rows(a_all, tmp1)
}
As a result, I get a spark_tbl that binds a2.csv, a3.csv and a4.csv together, like rbind(a2, a3, a4).
I think there is an easier way to do it (maybe without a for loop) by using path=, but I am not sure how to select only a few csv files in a folder. Please help!
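One thing that may be worth trying: Spark's file readers generally accept Hadoop-style glob patterns in the path, so a bracket range could select a2 to a4 in a single call. The sketch below only builds such a path. It assumes the files really end in .csv, that the numeric suffixes are single digits, and that sc is an open sparklyr connection. selected_csv_glob is a made-up helper name.

```r
# Sketch only: assumes files named a1.csv ... a5.csv under /dir1/folder1.
# The bracket glob below only covers single-digit suffixes.

selected_csv_glob <- function(dir, stem, from, to) {
  stopifnot(from >= 0, to <= 9, from <= to)
  paste0(dir, "/", stem, "[", from, "-", to, "].csv")
}

path <- selected_csv_glob("/dir1/folder1", "a", 2, 4)
# path is "/dir1/folder1/a[2-4].csv"

# With an open connection `sc`, the read could then be a single call
# (commented out because it needs a running Spark session):
# a_all <- sparklyr::spark_read_csv(sc, name = "a_all", path = path)
```

Whether the glob is expanded depends on the Spark version and file system behind the connection, so it is worth checking the row count of the result against the three files.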

Convert XLS to CSV - R (Tried Rio Package)

I have a list of files in a directory which I'm trying to convert to csv. I tried the rio package and the solutions suggested here.
The output is a list of empty CSV files with no content. This could be because the first 8 rows of the xls files contain an image and a few empty lines, with a couple of cells filled with text.
Is there any way I could skip those first 8 lines in all of the xls files before converting?
I tried exploring options from the openxlsx and readxl packages; any suggestions or guidance would be helpful.
Please do not mark as duplicate since I have a different problem than the one that was already answered
Maybe the following will work. At least it does for my own mock-up of an Excel file with a picture at the top.
library("readxl") # To read xls/xlsx
library("readr") # Fast csv write
indata <- read_excel("~/cowexcel.xlsx", skip = 8)
# The output file is passed by position; recent readr versions deprecate the `path` argument
write_csv(indata, "cow.csv")
If you are running this for several files then combine it into a function. Note that the function below does no checking and might overwrite existing csv files.
convert_excel_to_csv <- function(name) {
  indata <- read_excel(name, skip = 8)
  write_csv(indata, paste0(tools::file_path_sans_ext(name), ".csv"))
}
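Since the function above does no checking, here is a hedged sketch of how it might be applied to a whole folder while leaving existing csv files alone. It assumes readxl and readr are installed and that every file has the same 8 leading rows to skip. The names csv_target, pending_conversions and convert_all are made up for illustration.

```r
# Sketch only: assumes readxl and readr are installed and that every file
# has the same number of leading rows to skip.

# Same naming rule as above: swap the extension for .csv
csv_target <- function(name) {
  paste0(tools::file_path_sans_ext(name), ".csv")
}

# Keep only the inputs whose csv does not exist yet, so nothing is overwritten.
pending_conversions <- function(files) {
  files[!file.exists(csv_target(files))]
}

convert_all <- function(dir = ".", skip = 8) {
  files <- list.files(dir, pattern = "\\.xlsx?$", full.names = TRUE)
  for (f in pending_conversions(files)) {
    indata <- readxl::read_excel(f, skip = skip)
    readr::write_csv(indata, csv_target(f))
  }
  invisible(csv_target(files))
}

# Usage (commented out because it needs real Excel files):
# convert_all("path/to/folder")
```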
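Although I was not able to do the conversion with rio, I read each file as xls and wrote it back as csv using the code below. It worked fine in testing; I hope it works without a glitch in implementation.
library(xlsx) # read.xlsx() with sheetIndex/startRow comes from the xlsx package
# pattern is a regular expression, not a glob
files <- list.files(pattern = "\\.xlsx?$")
y <- NULL
for (i in files) {
  # startRow = 9 skips the first 8 rows
  x <- read.xlsx(i, sheetIndex = 1, header = TRUE, startRow = 9)
  y <- rbind(y, x)
}
dt <- Sys.Date()
fn <- paste("path/", dt, ".csv", sep = "")
write.csv(y, fn, row.names = FALSE)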

Creating a loop for creating multiple sheet from multiple excel files in R

I have multiple excel files with data. I wanted to split the data in each excel file into multiple sheets within that particular excel file. I have already managed to do that with the following code:
library(openxlsx) # package names are case-sensitive
data <- read.xlsx(file.choose())
splitdata <- split(data, data$Assigned)
splitdata
workbook <- createWorkbook()
Map(function(data, name) {
  addWorksheet(workbook, name)
  writeDataTable(workbook, name, data)
}, splitdata, names(splitdata))
saveWorkbook(workbook, file = "WorkbookWithMultipleSheets.xlsx", overwrite = TRUE)
However, I have more than 50 excel files, for which I need to create multiple sheets using the code above. Is there any way to create a loop so that I won't have to write this data for each excel file that I have?
Any help is appreciated! Thank you!
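One way this could be approached is to wrap the steps above in a function that takes a file path, then apply it to every file in a folder. The following is only a sketch: it assumes openxlsx is installed and that every workbook has an Assigned column. The "_split" output suffix, the "excel_files" folder, and the function names are made up for illustration.

```r
# Sketch only: assumes openxlsx is installed and every workbook has an
# `Assigned` column. The "_split" suffix is an arbitrary choice.

# Output path next to the input, e.g. team.xlsx -> team_split.xlsx
split_output_name <- function(path) {
  paste0(tools::file_path_sans_ext(path), "_split.xlsx")
}

# Split one workbook by `Assigned` and save one sheet per group.
split_into_sheets <- function(path) {
  data <- openxlsx::read.xlsx(path)
  splitdata <- split(data, data$Assigned)
  workbook <- openxlsx::createWorkbook()
  for (name in names(splitdata)) {
    openxlsx::addWorksheet(workbook, name)
    openxlsx::writeDataTable(workbook, name, splitdata[[name]])
  }
  out <- split_output_name(path)
  openxlsx::saveWorkbook(workbook, file = out, overwrite = TRUE)
  out
}

# Usage (commented out because it needs the real files):
# files <- list.files("excel_files", pattern = "\\.xlsx$", full.names = TRUE)
# outputs <- vapply(files, split_into_sheets, character(1))
```

Writing to a new file keeps the originals untouched. Two things to watch for: Excel limits sheet names to 31 characters, and a second run would also pick up the "_split" files unless they are moved or filtered out.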

Dynamically converting a list of Excel files to csv files in R

I currently have a folder containing all Excel (.xlsx) files, and using R I would like to automatically convert all of these files to CSV files using the "openxlsx" package (or some variation). I currently have the following code to convert one of the files and place it in the same folder:
convert("team_order\\team_1.xlsx", "team_order\\team_1.csv")
I would like to automate the process so it does it to all the files in the folder, and also removes the current xlsx files, so only the csv files remain. Thanks!
You can try this using rio, since it seems like that's what you're already using:
library("rio")
xls <- dir(pattern = "xlsx")
created <- mapply(convert, xls, gsub("xlsx", "csv", xls))
unlink(xls) # delete xlsx files
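Another option is to read each file with readxl and write it out as csv:
library(readxl)
# Create a vector of Excel files to read (pattern is a regular expression)
files.to.read <- list.files(pattern = "\\.xlsx$")
# Read each file and write it to csv
lapply(files.to.read, function(f) {
  df <- read_excel(f, sheet = 1)
  # Replace only the extension, not every "xlsx" in the file name
  write.csv(df, sub("\\.xlsx$", ".csv", f), row.names = FALSE)
})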
You can remove the files with the command below. However, this is dangerous to run automatically right after the previous code. If the previous code fails for some reason, the code below will still delete your Excel files.
lapply(files.to.read, file.remove)
You could wrap it in a tryCatch() block to be safe.
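A rough sketch of that idea: convert each file inside tryCatch() and delete the Excel file only when its csv was actually written. The converter is passed in as a function so the same loop works with rio or readxl. convert_then_remove and convert_one are made-up names, and treating a non-empty csv as success is an assumption about what counts as a good conversion.

```r
# Sketch only: `convert_one` is a hypothetical function(src, dest) that writes
# `dest` from `src`, for example function(src, dest) rio::convert(src, dest).

convert_then_remove <- function(files, convert_one) {
  removed <- character(0)
  for (src in files) {
    dest <- sub("\\.xlsx$", ".csv", src)
    # Skip anything that is not an .xlsx file, so a source is never its own target
    if (identical(dest, src)) next
    ok <- tryCatch({
      convert_one(src, dest)
      # Count the conversion as successful only if a non-empty csv exists
      file.exists(dest) && file.size(dest) > 0
    }, error = function(e) {
      message("Skipping ", src, ": ", conditionMessage(e))
      FALSE
    })
    # Delete the Excel file only after a successful conversion
    if (ok && file.remove(src)) removed <- c(removed, src)
  }
  removed
}

# Usage (commented out because it needs real files and the rio package):
# files <- list.files(pattern = "\\.xlsx$")
# convert_then_remove(files, function(src, dest) rio::convert(src, dest))
```

The return value lists the Excel files that were deleted, which makes it easy to see which ones failed and are still in the folder.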
