In R, opening an object saved to Excel through shell.exec - r

I would like to be able to open files quickly in Excel after saving them. I learned from R opening a specific worksheet in a excel workbook using shell.exec 1 on SO
On my Windows system, I can do so with the following code and could perhaps turn it into a function: saveOpen <_ function {... . However, I suspect there are better ways to accomplish this modest goal.
I would appreciate any suggestions to improve this multi-step effort.
# create tiny data frame
df <- data.frame(names = c("Alpha", "Baker"), cities = c("NYC", "Rome"))
# save the data frame to an Excel file in the working directory
save.xls(df, filename "test file.xlsx")
# I have to reenter the file name and add a forward slash for the paste() command below to create a proper file path
name <- "/test file.xlsx"
# add the working directory path to the file name
file <- paste0(getwd(), name)
# with shell and .exec for Windows, open the Excel file
shell.exec(file = file)

Do you just want to create a helper function to make this easier? How about
save.xls.and.open <- function(dataframe, filename, ...) {
save.xls(df, filename=filename, ...)
cmd <- file.path(getwd(), filename)
shell.exec(cmd)
}
then you just run
save.xls.and.open(df, filename ="testfile.xlsx")
I guess it doesn't seem like all that many steps to me.

Related

Creating objects from all .xlsx documents in working directory

I am trying to create objects from all files in working directory with name of the original file. I tried to go the following way, but couldn't solve appearing problems.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx",file_objects,value = T))
for (i in xlsx_objects) {
xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried to paste [i]item from "xlsx_objects" with path to WD but it only created a list of files names from docs in WD.
I also find information, that read.csv can read only one file at the time, but I guess that it should be the case with for loop, right? It is reading only one file at the time.
Using lapply (as described in this forum) I was able to get the data in the environment, but argument header didn't work, I lost names of my docs in that object which does not have desired structure. I am though looking for having these files in separated objects without calling every document exclusively.
IIUC, you could do something like:
files = list.files("PATH TO THE FILE", full.names = T, pattern = 'xlsx')
list_files = map(files, readxl::read_excel)
(You can't use read.csv to read excel files)
Also I recommend reading about R Projects so you don't have to use setwd() ever again, which makes your code harder to reproduce down the pipeline

Read latest SPSS file from directory

I am trying to read the latest SPSS file from the directory which has several SPSS files. I want to read only the newest file from a list of 3 files which changes with time. Currently, I have manually entered the filename (SPSS-1568207835.sav for ex.) which works absolutely fine, but I want to make this dynamic and automatically fetch the latest file. Any help would be greatly appreciated.
setwd('/file/path/for/this/file/SPSS')
library(expss)
expss_output_viewer()
mydata = read_spss("SPSS-1568207835.sav",reencode = TRUE)
w <- data.frame(mydata)
args <- commandArgs(TRUE)
This should return a character string for the filename of the .sav file modified most recently
# get all .sav files
all_sav <- list.files(pattern ='\\.sav$')
# use file.info to get the index of the file most recently modified
all_sav[with(file.info(all_sav), which.max(mtime))]

Create a .tar.gz file from serialized content string with R

Given a .tar.gz file on my hard disk, would like to create that exact file, but with R code alone (e.g. with the help of serialization). The goal is to not refer to the file itself, but to generate a plain text variable containing the content of the file and after that to write the file to the file system. I thought about the following:
Take the base64 string of the file (base64 serialization).
Write it to the file system as a binary file.
But the following code generates an empty file:
zzfil <- tempfile("testfile")
zz <- file(zzfil, "wb")
file_content <- "H4sIAAAAAAAAA+1YbW/bNhD2Z/6KW/zBNpLIerHjQmvapo6HBWgyw3ZXDE1X0BJtEZFIgaTguEb/+06S7drJumJA5m6DHsAQxOM9PPF4uscyTJuUBnd0zmzbaV8Oxv3R1XBy9ctN7clgI846nfzq9Lr27rVAr2fXHM+zvV6303N6NdvxHDSDXTsAMm2owlDE/K/nfcv+H8WwzL0PZu8gkMkyxcG1lUy4ifH2XUQNmIhtxuFSMg3Nwgp9qlmL/MqU5lL4YFuOZZOLzERS5Z4SFkoaBtyQa8qFwR9DwwTZ1stCsh2H50uZKc3i2SstE7aImGKWYOYFuWQ6UDw1xRrXUjGgU5kZWOShcQNhEVFCl1Pky80mogKkYBAjcYsA4q1mMEN+0LgyTkd6AVyETBgu5hiOonNF0wgt3ERcFI+8s7BF3vCACb3Zkbi8A67zCDIkUi/JQAQyRDof3k5+On2GgadMhNqHETRfnINndSyvRa6SVCqDo/N5GkvjFjbXci2ndQKGT6e4sfmQg9PdFnlDPy0vqaGYLpUxcsNYqPsySXlMyx0RkqxzE/rg2s6zU7t76jngOL7T870uBtP/EcScbPK/n/b2zcX1YDy86A+e8ox9q/6x8Mv6d92eZztY/+6Z163q/xBg9/kBHFJjmBLNo9/fv/dpnEbU//Dh+KhFahX+33hQ/6P2P7BG0eO73a/Xv20/qH/nDGUAdKv6P3z+IxbH0hod8P2P2T5b59/zOo6d6z+706ve/4dAHX7OE34CC6ni8AdSJ3XUZLmS0YDCid3TJEUNMstEkCsMEDRhITSKU9LAuYuIBxGkCpWbhsYeV8Mq2H6TGQRIFTOqRKnJSsm2kX200Ii58srlFozGJgu5BGr8wh8gMib12211mt7NtRXR0AqkJT61C/MY9SFkms2yGO7YciqpCkEjoQkyDGkm1eOFNsSvMx6H+JghjFgsaQhbNQyNvlExHMM44jOD19eNwqMfseBuZ9oOHnoMSo8JFtifOzzymDQIqTfgVdmTSbHH8Px0u/nNFqxQwBab3Tza22ts1Z8fO3/Mq3ufISeI+VRRtWyuNWfrC+eV3gpRLrAw4piFL4+KCTlNaWuGqEDPExNQpU+AMt28P0/SeaucdgxzJpOPeISMRBWdYNBQh5DNaBYbmCItRlr13X/p+z+h4ukVwN/v/67Tc6r+/53yv1YA4cH+/7mP8u91u17V/w+B27yhr4qUfya3NOZUb+9M/l1nte4z74o+g6OZxrOyKhtMM287t+GXTyMrMvyKFMB5azGhd52rN3CFChUqfB/8AQr6tbUAGgAA"
writeBin(RCurl::base64Decode(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
How should I serialize the file instead? I.e. how should I fill the functions file_to_string and string_to_file?
file_to_string <- function(input_file){
# Return a serialized string of input_file
}
string_to_file <- function(input_string){
# Return content to write to a file
}
original_file <- "original.tar.gz"
zzfil <- tempfile("copy")
zz <- file(zzfil, "wb")
file_content <- file_to_string(original_file)
writeBin(string_to_file(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
For me, using R 3.4.4 on platform x86_64-pc-linux-gnu, RCurl version 1.95-4.10, the example code produces a non-empty file that can be read back in using readBin, so i can't reproduce your empty file issue.
But that's not the main issue here.
UsingwriteBin does not achieve what you want to do: it's use case is to store an R-Object (a vector) in a binary format on the filesystem and read it back in with readBin; not to read in a binary file, then manipulate it and save the new version or generate a binary file that is meant to be understood by anything else besides readBin.
In my humble opinion: R is probably not the right tool to do binary patches.

How to read a csv file or load an excel workbook by ignoring some characters in the file path?

I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.
For example, here are 3 paths to those files:
G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx
G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx
G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx
I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.
Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where i need help). I know a loop is probably not the most efficient way, but i'm open to suggestions, should you have any:
require(XLConnect)
path= "G://User//Documents//daily_data//Op_Schedule_"
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
scheduleList = rep(list(matrix(1,1,1)),length(wd.seq))
for(i in 1:length(wd.seq)) {
wb = loadWorkbook(file= paste0(path,wd.seq[i],"???",".xlxs"))
scheduleList[[i]] = readWorksheet(wb,sheet='=SCHEDULE', header = TRUE)
}
`
Thanks for reading and suggestions, if any.
Mathieu
I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the filenames into a list using the list.files() function, for example
fileList <- list.files(""G://User//Documents//daily_data//")
And then load the xlsx files looping through the list with a for loop
for(i in fileList) {
loadWorkbook(file = i)
}
I haven't used the XLConnect function before so that exact code probably doesn't work, but the loop will iterate through all the files in that directory and so you can construct your loading call using the i variable for the filename (it won't be an absolute path though, so you might need to use paste to add the first part of the filepath)
I realize there might be other files in the directory that are not excel files, you could use grepl to select only files containg "OP_Schedule_"
fileListClean <- fileList[grepl("Op_Schedule_",fileList)]
or perhaps only selecting .xlsx files in the directory:
fileListClean <- fileList[grepl(".xlsx",fileList)]
Edit to fit your reply:
Since you need to fit it to a sequence, you can do it as you did earlier:
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")
And then use grepl to only pick files starting with that extensions:
fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]
Full disclosure: The last part i got from this SO answer: grep using a character vector with multiple patterns

Print/save Excel (.xlsx) sheet to PDF using R

I want to print an Excel file to a pdf file after manipulating it. For the manipulation I used the .xlsx package which works fine. There is a function printSetup but I cannot find a function to start the printing. Is there a solution for this?
library(xlsx)
file <- "test.xlsx"
wb <- loadWorkbook(file)
sheets <- getSheets(wb) # get all sheets
sheet <- sheets[[1]] # get first sheet
# HERE: MAGIC TO SAVE THIS SHEET TO PDF
It may be a solution using DCOM through the RDCOMClient package, though I would prefer a plattform independent solution (e.g. using xlsx) as I work on MacOS. Any ideas?
Below a solution using the DCOM interface via the RDCOMClient. This is not my preferred solution as it only works on Windows. A plattform independent solution would still be appreciated.
library(RDCOMClient)
library(R.utils)
file <- "file.xlsx" # relative path to Excel file
ex <- COMCreate("Excel.Application") # create COM object
file <- getAbsolutePath(file) # convert to absolute path
book <- ex$workbooks()$Open(file) # open Excel file
sheet <- book$Worksheets()$Item(1) # pointer to first worksheet
sheet$Select() # select first worksheet
ex[["ActiveSheet"]]$ExportAsFixedFormat(Type=0, # export as PDF
Filename="my.pdf",
IgnorePrintAreas=FALSE)
ex[["ActiveWorkbook"]]$Save() # save workbook
ex$Quit() # close Excel
An open source and cross platform way to do this would be with libreoffice as so:
library("XLConnect")
x <- rnorm(1:100)
y <- x ^ 2
writeWorksheetToFile("test.xlsx", data.frame(x = x, y = y), "Data")
tmpDir <- file.path(tempdir(), "LOConv")
system2("libreoffice", c(paste0("-env:UserInstallation=file://", tmpDir), "--headless", "--convert-to pdf",
"--outdir", getwd(), file.path(getwd(),"test.xlsx")))
Ideally you'd then remove the folder referenced by tmpDir but that would be platform specific.
Note this assumes libreoffice is in your path. If it isn't, then the command would need to be altered to include the full path to the libreoffice executable.
The reason for the env bit is that headless libreoffice will only do anything otherwise if it isn't already running in GUI mode. See http://ask.libreoffice.org/en/question/1686/how-to-not-connect-to-a-running-instance/ for more info.
You could use the pdf function:
pdf(file="myfile.pdf", width=8.5, height=11)
print(firstsheet)
grid.newpage()
print(secondsheet)
grid.newpage()
print(thirdsheet)
dev.off()

Resources