Batch file oddly looping through subsection of R file

I have a batch file test.bat on my Windows desktop with the following content:
cd /d W:\r\dev\
"C:\Program Files\R\R-3.5.0\bin\i386\Rscript.exe" scripts\some_function.R
When executed, the referenced R script some_function.R begins to run and prints messages to the console to let me know where it is. However, it stops after reaching a certain point in the R code, then starts over, and continues to do so until the session auto-terminates.
I've inspected the R code for any indication of why the batch file would keep returning to the beginning, and I've found none. The R file works fine when run directly (from within RStudio, for example).
I'm well aware that naming the batch file after a command that appears inside it can be problematic (the batch file ends up calling itself), but that's not what is happening here. I've tried multiple names for the batch file, but that doesn't solve the problem.
Below is the beginning of the code in the referenced file some_function.R that keeps getting re-executed:
library(data.table)

dir   <- gsub(x = getwd(), pattern = "(.:/r/)(.*)",   replacement = "\\1")
envir <- gsub(x = getwd(), pattern = "(.:/r/)(\\w+)", replacement = "\\2")
cat(paste("directory = "), dir, "\n", "environment = ", envir, "\n")

##############################################################
#                   Load System Parameters                   #
##############################################################
cat("Loading system parameters ...", "\n")
sys_param_file_path <- paste0(dir, "shared/files/system/sys_param.csv")
sys_params <- data.table(read.csv(sys_param_file_path, stringsAsFactors = FALSE))
sys_params <- sys_params[param_envir == envir, ]
for(i in 1:nrow(sys_params)){
  if(sys_params[i, param_type] == "environment path"){
    sys_params[i, param_val := paste(dir, envir, param_val, sep = "")]
  }
  if(sys_params[i, param_type] == "root path"){
    sys_params[i, param_val := paste(substr(dir, 1, nchar(dir) - 1), param_val, sep = "")]
  }
}
cat(paste("Loaded", nrow(sys_params), "parameters", "\n"))
and the console output when the batch file is called:
C:\Users\me\Desktop> cd /d W:\r\dev\
W:\r\dev>"C:Program Files\R\R-3.5.0\bin\i386\Rscript.exe" scripts\some_function.R
Warning message:
package 'data.table' was built under R version 3.5.1
directory = W:/r/
environment = dev
Loading system parameters ...
Loaded 32 parameters
directory = W:/r/
environment = dev
Loading system parameters ...
Loaded 32 parameters
directory = W:/r/
environment = dev
Loading system parameters ...
Loaded 32 parameters
What am I missing?
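One diagnostic sketch (not part of the original script) that may help narrow this down: log the process id and the invocation arguments at the very top of some_function.R. If the pid changes between repeats, the script is being relaunched from outside R; if it stays the same, the re-execution is happening within a single session. Base R only:
# diagnostic sketch, assuming the script is launched via Rscript
cat("pid:", Sys.getpid(), "| started:", format(Sys.time()), "\n")
cat("args:", paste(commandArgs(), collapse = " "), "\n")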

Related

How to call a parallelized script from command prompt?

I'm running into this issue and, for the life of me, I can't figure out how to solve it.
Quick summary before example:
I have several hundred data sets from which I want to create reports every day. To do this efficiently, I parallelized the process with doParallel. From within RStudio, the process works fine, but when I try to automate it via Task Scheduler on Windows, I can't seem to get it to work.
The process within RStudio is:
I call a script that sources all of my other scripts; each individual script has a header section that performs the appropriate package imports. For instance, it would look like:
get_files <- function(){
  get_files.create_path() -> path
  for(file in path){
    if(!(file.info(paste0(path, file))[['isdir']])){
      source(paste0(path, file))
    }
  }
}
get_files.create_path <- function(){
  return(<path to directory>)
}
# self call
get_files()
This would simply be run via "Source on Save" and brings everything I need into the .GlobalEnv.
From there, I could simply type parallel_report(), which calls a script that sources another script housing the parallelization of the report generation. There was an issue a while back with calling the parallelization directly (I wonder if this is related?), so I had to make the doParallel script a plain script rather than a function. That meant it couldn't be brought in with the get_files script, which would have started the report generation every time I brought everything in. Thus, I had to put it in its own script and save it elsewhere, to be called only when necessary. The parallel_report() function would simply be:
parallel_report <- function(){
  source(<path to script>)
}
Then the script that is sourced is the real parallelization script, and it would look something like:
library(foreach) # foreach() and %dopar% come from here

doParallel::registerDoParallel(cl = (parallel::detectCores() - 1))
foreach(name = report.list$names,
        .packages = c('tidyverse', 'knitr', 'lubridate', 'stringr', 'rmarkdown'),
        .export = c('generate_report'),
        .errorhandling = 'remove') %dopar% {
  tryCatch(expr = {
    generate_report(name)
  }, error = function(e){
    error_handler(error = e, caller = paste0("generate report for ", name, " from parallel"), line = 28)
  })
}
doParallel::stopImplicitCluster()
The generate_report function is simply an .Rmd and render() caller:
generate_report <- function(<arguments>){
  # stuff
  generate_report.render(<arguments>)
  # stuff
}
generate_report.render <- function(<arguments>){
  rmarkdown::render(
    paste0(data.information#location, 'report_generator.Rmd'),
    params = list(
      name = name,
      date = date,
      thoughts = thoughts,
      auto = auto),
    output_file = paste0(str_to_upper(stock), '_report_', str_remove_all(date, '-'))
  )
}
So to recap, in RStudio I would simply perform the following:
1 - Source-on-Save the script to bring everything in
2 - type parallel_report()
2.a - this directly calls the doParallel-ization of generate_report
2.b - generate_report calls an .Rmd file that houses the required function calls and whatnot to produce the reports
And the process starts and successfully completes without a hitch.
To make the process automatic via the Task Scheduler, I made a script for the Task Scheduler to call, named automatic_caller:
source(<path to the get_files script>) # this brings in all the scripts and data into the global,
                                       # just as if it were being done manually
tryCatch(
  expr = {
    parallel_report()
  }, error = function(e){
    error_handler(error = e, caller = "parallel_report from automatic_calling", line = 39)
  })
The error_handler function is just an in-house script used to log errors throughout.
So then, in the Task Scheduler's task, I have Rscript.exe called, with automatic_caller after it. Everything within the automatic_caller script works except for the report generation.
The process completes almost instantly, and the only output I get is an error:
"pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available)."
But rmarkdown is listed in the .packages argument of the foreach call, it is used explicitly in the scripts that need it, and in the actual generate_report it is called directly via rmarkdown::render().
So - I am at a complete loss.
Thoughts and suggestions would be completely appreciated.
So pandoc is apparently an executable that converts files from one format to another. RStudio comes with its own pandoc executable, so when running the scripts from RStudio, R knew where to point when pandoc was required.
From the command prompt, the system did not know to look inside of RStudio, so simply downloading pandoc as a standalone executable gives the system the proper pointer.
Downloaded pandoc and everything works fine.
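If a standalone install is not an option, another possibility (a sketch, assuming RStudio is installed and bundles pandoc under its bin folder, which varies by RStudio version) is to point R at RStudio's copy before rendering:
# hypothetical path: adjust to wherever pandoc lives in your RStudio install
Sys.setenv(RSTUDIO_PANDOC = "C:/Program Files/RStudio/bin/pandoc")
rmarkdown::pandoc_available() # should now return TRUE
rmarkdown::pandoc_version()   # and report the version found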

Finding and setting the working directory within a super computer

Obviously one can find and set working directories with getwd() and setwd(), but this question is a bit more complicated. I'm running two files on a supercomputer: a regular .R file, which calls the other file (a .stan file). The way I submit work to the supercomputer is that I have to zip a folder containing the data, the .R file, and the .stan file. I upload this folder, and I pull it in by setting it as one of the parameters on the supercomputer. I read the data using the standard read.csv() command and everything is hunky-dory.
However, when I call the .stan file from the .R file, it can't access it because it needs to know the working directory.
This is the error I get:
> fit <- stan(file="ace_thresholds.stan", data=stanData, cores = 4)
Error in file(fname, "rt") : cannot open the connection
In addition: Warning messages:
1: In normalizePath(file) :
path[1]="ace_thresholds.stan": No such file or directory
2: In file(fname, "rt") :
cannot open file 'ace_thresholds.stan': No such file or directory
Error in get_model_strcode(file, model_code) :
cannot open model file "ace_thresholds.stan"
Calls: stan -> stan_model -> stanc -> get_model_strcode
Execution halted
When I tried setting the working directory to the unzipped NSG_stan folder (which is what I assumed the wd to be), I received this error:
fit <- stan(file="NSG_stan/ace_thresholds.stan", data=stanData, cores = 4)
Error in file(fname, "rt") : cannot open the connection
In addition: Warning messages:
1: In normalizePath(file) :
path[1]="NSG_stan/ace_thresholds.stan": No such file or directory
2: In file(fname, "rt") :
cannot open file 'NSG_stan/ace_thresholds.stan': No such file or directory
Error in get_model_strcode(file, model_code) :
cannot open model file "NSG_stan/ace_thresholds.stan"
Calls: stan -> stan_model -> stanc -> get_model_strcode
Execution halted
So I tried running print(getwd()) within the script and in the printout I see that the wd is
"/projects/ps-nsg/home/nsguser/ngbw/workspace/NGBW-JOB-RTOOL_TG-EBE9CDBF28BF42AF8CB6EC9355006B3E/NSG_stan"
which means that the working directory will shift with every job. So to set the working directory accurately, I'd need to set it to the current folder from within the script. I looked at various posts on how to do this, like the following:
# install.packages("rstudioapi") # run this if it's your first time using it to install
library(rstudioapi) # load it
# the following line is for getting the path of your current open file
current_path <- getActiveDocumentContext()$path
# The next line set the working directory to the relevant one:
setwd(dirname(current_path ))
# you can make sure you are in the right directory
print( getwd() )
The issue with this is that it's a supercomputer, so it's difficult to install packages: every time I want to install something, I have to email the folks associated with the supercomputer, and that all takes time.
I've looked over this thread as well: R command for setting working directory to source file location in Rstudio. There appears to be a lot of dissent over what works. I tried a couple of them, and they didn't work:
setwd(getSrcDirectory()[1])
this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)
I've included the .R file below, in the event that it helps, but I think this is probably a pretty easy answer for someone with a decent amount of coding experience. It's the "ace_thresholds.stan" call in the final line that I need either to point at the current wd, or to precede with code that sets the wd so that the call works. Does that make sense?
Thanks much!
dat <- ace.threshold.t2.samp
dat <- subset(dat, !is.na(rw))
dat$condition <- factor(dat$condition)
dat$pid <- factor(dat$pid)
nTotal <- dim(dat)[1]
nCond <- length(unique(dat$condition))
nSubj <- length(unique(dat$pid))
intensity <- dat$rw
condition <- as.numeric(dat$condition)
pid <- as.numeric(dat$pid)
correct <- dat$correct_button == "correct"
chancePerformance <- 1/2
stanData <- list(nTotal=nTotal, nLevels=nCond, nSubj = nSubj, subject = pid, intensity=intensity, level=condition, correct=correct, chancePerformance=chancePerformance)
fit.rw <- stan(file="ace_thresholds.stan", data=stanData, cores = 4, control=list(max_treedepth=15, adapt_delta=0.90))
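One package-free possibility worth sketching (an assumption: the job launches the script via Rscript rather than R CMD BATCH): Rscript passes the script's own path among its command-line arguments as --file=..., so the script can recover its directory and setwd() there before calling stan():
# sketch: set the wd to this script's own directory when run via Rscript,
# assuming ace_thresholds.stan sits next to the .R file
args <- commandArgs(trailingOnly = FALSE)
file_arg <- grep("^--file=", args, value = TRUE)
if (length(file_arg) == 1) {
  setwd(dirname(normalizePath(sub("^--file=", "", file_arg))))
}
print(getwd()) # confirm before calling stan()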

Issue when creating zip file from directory

I'm having trouble creating a zip file from R. The same code worked perfectly at work with R version 3.4.2 on a 32-bit computer.
Now I am trying to run the same thing with R version 3.5.1 on a 64-bit computer, and the zip() command does not seem to work. What is going on?
zip(zipfile = "test.zip",files=list.files(getwd()))
#create zip from whole directory, on 1st machine it works, now nothing happens
I checked the source code for zip(), and when I debugged it, I found that the system2 command does nothing.
zip <- function (zipfile, files, flags = "-r9X", extras = "",
                 zip = Sys.getenv("R_ZIPCMD", "zip"))
{
    if (missing(flags) && (!is.character(files) || !length(files)))
        stop("'files' must a character vector specifying one or more filepaths")
    args <- c(flags, shQuote(path.expand(zipfile)), shQuote(files),
              extras)
    if (.Platform$OS.type == "windows")
        invisible(system2(zip, args))
    else invisible(system2(zip, args))
}
# I ran this manually when trying to debug; nothing happens
system2(zip, args) ## zip is a parameter here, not a function
####
Browse[2]> zip
[1] "zip"
Browse[2]> args
[1] "-r9X" "\"bla.zip\""
[3] "\"[Content_Types].xml\"" "\"_rels\""
[5] "\"docProps\"" "\"xl\""
[7] ""
For example, an absurd call does not give an error:
system2("blablađ", 2) ## does nothing, but no error or warning either
I am stuck trying to understand how the system2() function works and what I need to change to create a compressed folder.
Thanks
EDIT: After taking into account the help from the comments, I got the following error:
Browse[2]> system2(zip, args,stderr = T)
Error in system2(zip, args, stderr = T) : '"zip"' not found
SOLVED: After installing Rtools for R 3.5, it worked.
From the zip help:
zip(zipfile, files, flags = "-r9X", extras = "",
zip = Sys.getenv("R_ZIPCMD", "zip"))
zip: A character string specifying the external command to be used.
As you can see, the zip function has an argument zip to specify the external command to be used. On my machine it is:
λ where zip
C:\Oracle\Ora11\BIN\zip.exe
C:\Program Files\Rtools\bin\zip.exe
The zip program is available in Rtools, but it is usually also available on any (Windows?) machine.
To check whether zip is found by R, type:
> Sys.which("zip")
zip
"C:\\Oracle\\Ora11\\bin\\zip.exe"
If you get "", that means zip is not in the path, and if it is neither in the environment variable R_ZIPCMD, you have to specify its path in the zip argument.

How to fix "Unable to find GhostScript executable to run checks on size reduction" error upon package check in R?

In the Revolution R Enterprise console,
devtools::check("C:/Users/User/Documents/Revolution/mypackage")
produced
checking sizes of PDF files under 'inst/doc' ... NOTE
Unable to find GhostScript executable to run checks on size reduction
with no other warnings/errors/notes. So (even though, AFAIK, this note is not that important for the eventual check), I wanted to get rid of it, since I want to put .PDF files produced outside of R into the mypackage\inst\doc folder.
I have Ghostscript installed on my notebook. I got help via:
> help("R_GSCMD")
R_GSCMD: Optional. The path to Ghostscript, used by dev2bitmap, bitmap and embedFonts.
Consulted when those functions are invoked.
Since it will be treated as if passed to system, spaces and shell metacharacters should be escaped.
> Sys.getenv("R_GSCMD")
[1] ""
What I did (and got an error again) is:
> Sys.setenv("R_GSCMD") <- "C:\\Program Files (x86)\\gs\\gs9.19\\bin\\gswin32c.exe"
Error in Sys.setenv("R_GSCMD") <- "C:\\Program Files (x86)\\gs\\gs9.19\\bin\\gswin32c.exe" :
target of assignment expands to non-language object
Upon digging deeper, I found: "These errors occur when one tries to assign a value to a variable that doesn't exist, or that R can't treat as a name. (A name is a variable type that holds a variable name.)"
What I am basically trying to do is to set my GS executable (C:\Program Files (x86)\gs\gs9.19\bin\gswin32c.exe) to "R_GSCMD".
Any help would be greatly appreciated.
Consulting ?Sys.setenv confirms my expectation that the call should instead be:
Sys.setenv(R_GSCMD = "C:\\Program Files (x86)\\gs\\gs9.19\\bin\\gswin32c.exe")
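You can confirm the assignment took effect by running Sys.getenv("R_GSCMD") again; it should now return the Ghostscript path instead of "".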
Because the gs versions change all the time, you may like a little R script for it!
system.partition = 'c:'
dirs = c('Program Files', 'Program Files (x86)')
for (dir in dirs) {
  dir.list = list.dirs(file.path(system.partition, dir), recursive = FALSE)
  GsinList = grepl(pattern = 'gs', x = dir.list)
  if (sum(GsinList) > 0) {
    gsDirectory = which(GsinList == TRUE)
    GsExeFiles = list.files(
      dir.list[gsDirectory],
      recursive = TRUE,
      pattern = 'gswin',
      include.dirs = TRUE,
      full.names = TRUE
    )[1]
    message('Gs found! ~> ', GsExeFiles)
    Sys.setenv(R_GSCMD = GsExeFiles)
    break
  }
}
Gs found! ~> c:/Program Files/gs/gs9.21/bin/gswin64.exe

Error with tex2docx function from reports R package

I'm trying to reproduce the example for the tex2docx function in the reports R package and am getting the following error.
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex",
                   package = "reports")
BIB <- system.file("extdata/docs/example.bib", package = "reports")
tex2docx(DOC, file.path(getwd(), "test.docx"), path = NULL, bib.loc = BIB)
Error message:
pandoc.exe: Error reading bibliography `C:/Users/Muhammad'
citeproc: the format of the bibliographic database could not be recognized
using the file extension.
docx file generated!
Warning message:
running command 'C:\Users\MUHAMM~1\AppData\Local\Pandoc\pandoc.exe -s C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/doc_library/apa6.qual_tex/doc.tex -o C:/Users/Muhammad Yaseen/Documents/test.docx --bibliography=C:/Users/Muhammad Yaseen/R/win-library/3.0/reports/extdata/docs/example.bib' had status 23
I wonder how to get the tex2docx function in the reports R package working properly.
As described in the comments above, the error is caused by passing a filename/path that includes spaces which are neither escaped nor quoted. A workaround could be wrapping all file paths and names in shQuote before passing them to the command line with system.
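To illustrate the quoting idea outside the package (a sketch with hypothetical paths; the -s/-o flags mirror the pandoc command in the warning above):
input  <- "C:/Users/Muhammad Yaseen/doc.tex"   # hypothetical path containing spaces
output <- "C:/Users/Muhammad Yaseen/test.docx" # hypothetical output path
system(paste("pandoc", "-s", shQuote(input), "-o", shQuote(output)))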
Code: https://github.com/trinker/reports/pull/31
Demo:
Loading package
library(reports)
Creating a dummy dir with a space in the name that would hold the bib file
dir.create('foo bar')
file.copy(system.file("extdata/docs/example.bib", package = "reports"), 'foo bar/example.bib')
Specifying the source and the copied bib file:
DOC <- system.file("extdata/doc_library/apa6.qual_tex/doc.tex", package = "reports")
BIB <- 'foo bar/example.bib'
Running the test:
tex2docx(DOC, file.path(getwd(), "test2.docx"), path = NULL, bib.loc = BIB)
Disclaimer: I tried to test this pull request, but I could not set up an environment with all the needed tools to run R CMD check with vignettes and everything else in 5 minutes (sorry, I'm on vacation right now, just enjoying the siesta after lunch), so please consider this pull request "untested", although it should work.
