Batch convert Excel files to PDFs in R

I have a folder of Excel (xlsx) sheets that I want to convert to PDFs in R. I've tried reading the worksheets into R directly (using almost all packages) but the data is never read properly. I'm dealing with Excel spreadsheets from several different people, so I assume this is because of differences in how the files were saved on everyone's computers.
I figure that converting these files to PDFs would mean that they are all formatted the same and therefore will be easier to work with.
Is it possible to convert files from excel worksheets to PDFs using R without opening the files/reading them into R as this is where the errors occur?

This should work with R 3.6.2:
# install RDCOMClient for 3.6.2
url <- "http://www.omegahat.net/R/bin/windows/contrib/3.5.1/RDCOMClient_0.93-0.zip"
install.packages(url, repos=NULL, type="binary")
# install R.utils
install.packages("R.utils")
library(RDCOMClient)
library(R.utils)
# list the Excel files in the folder
# replace "Path to your folder" with the path to your folder
folder <- getAbsolutePath("Path to your folder")
files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
# batch convert: one PDF per workbook, written to the same folder
lapply(files, function(x) {
  file <- getAbsolutePath(x) # Excel needs an absolute path
  ex <- COMCreate("Excel.Application") # create COM object
  book <- ex$workbooks()$Open(file) # open Excel file
  sheet <- book$Worksheets()$Item(1) # pointer to first worksheet
  sheet$Select() # select first worksheet
  ex[["ActiveSheet"]]$ExportAsFixedFormat(Type=0, # export as PDF
    # file.path() adds the separator that paste0() would omit
    Filename=file.path(folder, sub("\\.xlsx$", ".pdf", basename(x))),
    IgnorePrintAreas=FALSE)
  ex[["ActiveWorkbook"]]$Save() # save workbook
  ex$Quit() # close Excel
})

Related

Read in feather file directly from GitHub in R

How can I read in a .feather file from the web (e.g. GitHub) in R? I can read formats such as .csv or .dta directly from GitHub as raw files:
# CSV
coursedata <- read.csv(file = 'https://raw.githubusercontent.com/MarcoKuehne/seminars_in_applied_economics/main/Data/GF_2020.csv')
# DTA
library(haven)
soep <- read_dta("https://github.com/MarcoKuehne/seminars_in_applied_economics/blob/main/Data/soep_lebensz_en.dta?raw=true")
But the same approach fails for arrow and read_feather.
library(arrow)
digital <- read_feather("https://github.com/MarcoKuehne/seminars_in_applied_economics/blob/main/Data/Digital_Literacy_EN.feather?raw=true")
Is there a direct way or a nested command? Or am I required to download the file manually or programmatically as a temporary file?
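One possible workaround, if the reader will not accept a URL, is to download the file to a temporary path first and read it from there. The sketch below assumes that approach; read_remote is a hypothetical helper name, not a function from arrow.

```r
# Hypothetical helper: download a remote file to a temporary path in
# binary mode, then hand that path to any reader function.
# The temporary file is removed when the R session ends.
read_remote <- function(url, reader, ...) {
  tmp <- tempfile()
  download.file(url, tmp, mode = "wb", quiet = TRUE)  # "wb" keeps binary files intact
  reader(tmp, ...)
}

# Usage with arrow (needs the arrow package and network access):
# digital <- read_remote(
#   "https://github.com/MarcoKuehne/seminars_in_applied_economics/blob/main/Data/Digital_Literacy_EN.feather?raw=true",
#   arrow::read_feather
# )
```

Passing the reader as an argument keeps the helper independent of any one file format.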

Read latest SPSS file from directory

I am trying to read the latest SPSS file from a directory that contains several SPSS files. I want to read only the newest file from a list of 3 files, which changes over time. Currently, I have manually entered the filename (for example SPSS-1568207835.sav), which works fine, but I want to make this dynamic and automatically fetch the latest file. Any help would be greatly appreciated.
setwd('/file/path/for/this/file/SPSS')
library(expss)
expss_output_viewer()
mydata = read_spss("SPSS-1568207835.sav",reencode = TRUE)
w <- data.frame(mydata)
args <- commandArgs(TRUE)
This should return a character string with the filename of the most recently modified .sav file:
# get all .sav files
all_sav <- list.files(pattern ='\\.sav$')
# use file.info to get the index of the file most recently modified
all_sav[with(file.info(all_sav), which.max(mtime))]
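The same idea can be wrapped in a small function so that it does not depend on the working directory. This is only a sketch; latest_file is a hypothetical name, and the read_spss call in the usage comment is taken from the question.

```r
# Hypothetical helper: return the path of the most recently modified
# file in `dir` whose name matches `pattern`.
latest_file <- function(dir = ".", pattern = "\\.sav$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("no files matching ", pattern, " in ", dir)
  files[which.max(file.info(files)$mtime)]
}

# Usage with expss, as in the question:
# mydata <- expss::read_spss(latest_file("/file/path/for/this/file/SPSS"), reencode = TRUE)
```

Stopping early when nothing matches avoids passing an empty path to the reader.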

Dynamically converting a list of Excel files to csv files in R

I currently have a folder containing all Excel (.xlsx) files, and using R I would like to automatically convert all of these files to CSV files using the "openxlsx" package (or some variation). I currently have the following code to convert one of the files and place it in the same folder:
convert("team_order\\team_1.xlsx", "team_order\\team_1.csv")
I would like to automate the process so it does it to all the files in the folder, and also removes the current xlsx files, so only the csv files remain. Thanks!
You can try this using rio, since it seems like that's what you're already using:
library("rio")
xls <- dir(pattern = "\\.xlsx$") # match only names ending in .xlsx
created <- mapply(convert, xls, sub("\\.xlsx$", ".csv", xls))
unlink(xls) # delete xlsx files
library(readxl)
# Create a vector of Excel files to read (names ending in .xlsx)
files.to.read = list.files(pattern="\\.xlsx$")
# Read the first sheet of each file and write it to csv
lapply(files.to.read, function(f) {
  df = read_excel(f, sheet=1)
  write.csv(df, sub("\\.xlsx$", ".csv", f), row.names=FALSE)
})
You can remove the files with the command below. However, this is dangerous to run automatically right after the previous code. If the previous code fails for some reason, the code below will still delete your Excel files.
lapply(files.to.read, file.remove)
You could wrap it in a try/catch block to be safe.
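A minimal sketch of that idea follows, assuming the source file should be deleted only when its own conversion succeeded. convert_then_remove and its convert_one argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: convert each file with `convert_one(src, dest)` and
# delete the source only if the conversion ran without error and the
# output file exists. Assumes every name in `files` ends in .xlsx.
convert_then_remove <- function(files, convert_one) {
  vapply(files, function(src) {
    dest <- sub("\\.xlsx$", ".csv", src)
    ok <- tryCatch({
      convert_one(src, dest)
      file.exists(dest)
    }, error = function(e) {
      message("Skipping ", src, ": ", conditionMessage(e))
      FALSE
    })
    if (ok) file.remove(src)
    ok
  }, logical(1))
}

# Usage with readxl, as in the answer above:
# convert_then_remove(list.files(pattern = "\\.xlsx$"), function(src, dest) {
#   write.csv(readxl::read_excel(src, sheet = 1), dest, row.names = FALSE)
# })
```

The returned logical vector shows which files were converted, so any failures can be inspected afterwards.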

Print/save Excel (.xlsx) sheet to PDF using R

I want to print an Excel file to a PDF file after manipulating it. For the manipulation I used the xlsx package, which works fine. There is a function printSetup, but I cannot find a function to start the printing. Is there a solution for this?
library(xlsx)
file <- "test.xlsx"
wb <- loadWorkbook(file)
sheets <- getSheets(wb) # get all sheets
sheet <- sheets[[1]] # get first sheet
# HERE: MAGIC TO SAVE THIS SHEET TO PDF
A solution might be possible using DCOM through the RDCOMClient package, though I would prefer a platform-independent solution (e.g. using xlsx) as I work on macOS. Any ideas?
Below is a solution using the DCOM interface via RDCOMClient. This is not my preferred solution as it only works on Windows. A platform-independent solution would still be appreciated.
library(RDCOMClient)
library(R.utils)
file <- "file.xlsx" # relative path to Excel file
ex <- COMCreate("Excel.Application") # create COM object
file <- getAbsolutePath(file) # convert to absolute path
book <- ex$workbooks()$Open(file) # open Excel file
sheet <- book$Worksheets()$Item(1) # pointer to first worksheet
sheet$Select() # select first worksheet
ex[["ActiveSheet"]]$ExportAsFixedFormat(Type=0, # export as PDF
Filename="my.pdf",
IgnorePrintAreas=FALSE)
ex[["ActiveWorkbook"]]$Save() # save workbook
ex$Quit() # close Excel
An open-source, cross-platform way to do this would be with LibreOffice, like so:
library("XLConnect")
x <- rnorm(100)
y <- x ^ 2
writeWorksheetToFile("test.xlsx", data.frame(x = x, y = y), "Data")
tmpDir <- file.path(tempdir(), "LOConv")
system2("libreoffice", c(paste0("-env:UserInstallation=file://", tmpDir), "--headless", "--convert-to pdf",
"--outdir", getwd(), file.path(getwd(),"test.xlsx")))
Ideally you'd then remove the folder referenced by tmpDir but that would be platform specific.
Note this assumes libreoffice is in your path. If it isn't, then the command would need to be altered to include the full path to the libreoffice executable.
The reason for the env bit is that headless libreoffice will only do anything otherwise if it isn't already running in GUI mode. See http://ask.libreoffice.org/en/question/1686/how-to-not-connect-to-a-running-instance/ for more info.
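For a whole folder, the same command could be run once per file. The sketch below reuses only the flags shown above and assumes libreoffice is on the path; lo_pdf_args and convert_folder_to_pdf are hypothetical helper names.

```r
# Hypothetical helper: build the LibreOffice arguments shown above for one file.
lo_pdf_args <- function(file, outdir, profile_dir) {
  c(paste0("-env:UserInstallation=file://", profile_dir),
    "--headless",
    "--convert-to", "pdf",
    "--outdir", shQuote(outdir),  # quote paths that may contain spaces
    shQuote(file))
}

# Hypothetical helper: convert every .xlsx file in `folder` to PDF.
# Assumes the libreoffice executable is on the PATH.
convert_folder_to_pdf <- function(folder, outdir = folder) {
  files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
  profile_dir <- file.path(tempdir(), "LOConv")
  for (f in files) {
    system2("libreoffice", lo_pdf_args(normalizePath(f), outdir, profile_dir))
  }
  invisible(files)
}

# Usage:
# convert_folder_to_pdf("Path to your folder")
```

Keeping the argument construction in its own function makes it possible to inspect the command without launching LibreOffice.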
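You could use the pdf function, assuming firstsheet, secondsheet and thirdsheet are graphics objects that draw when printed (printing a plain data frame only writes text to the console, not to the PDF):
library(grid) # provides grid.newpage()
pdf(file="myfile.pdf", width=8.5, height=11)
print(firstsheet)
grid.newpage()
print(secondsheet)
grid.newpage()
print(thirdsheet)
dev.off()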

In R, opening an object saved to Excel through shell.exec

I would like to be able to open files quickly in Excel after saving them. I learned about shell.exec from the SO question "R opening a specific worksheet in a excel workbook using shell.exec".
On my Windows system, I can do so with the following code and could perhaps turn it into a function: saveOpen <- function {... . However, I suspect there are better ways to accomplish this modest goal.
I would appreciate any suggestions to improve this multi-step effort.
# create tiny data frame
df <- data.frame(names = c("Alpha", "Baker"), cities = c("NYC", "Rome"))
# save the data frame to an Excel file in the working directory
save.xls(df, filename = "test file.xlsx")
# I have to reenter the file name and add a forward slash for the paste() command below to create a proper file path
name <- "/test file.xlsx"
# add the working directory path to the file name
file <- paste0(getwd(), name)
# with shell and .exec for Windows, open the Excel file
shell.exec(file = file)
Do you just want to create a helper function to make this easier? How about
save.xls.and.open <- function(dataframe, filename, ...) {
  save.xls(dataframe, filename=filename, ...) # write the argument, not the global df
  cmd <- file.path(getwd(), filename) # full path to the saved file
  shell.exec(cmd) # open it with the associated application (Windows only)
}
then you just run
save.xls.and.open(df, filename="testfile.xlsx")
I guess it doesn't seem like all that many steps to me.
