Can a Y/N prompt in the RStudio Console be deactivated?

I'm using a function from an R package called RAC (R Package for Aqua Culture). It generates a Y/N prompt in the console window prior to execution. Is there a way to deactivate the prompt or automatically answer N every time?
The function Bass_pop_main will generate:
Do you want to change the inputs? [y/n]
Here's an example:
library(RAC)
setwd("../RAC_seabass") #working directory
userpath <- "../RAC_seabass" #userpath
Bass_pop_skeleton(userpath) #create input and output folders
forcings <- Bass_pop_dataloader(userpath) #load environmental variables
output <- Bass_pop_main(userpath, forcings) #run growth model

I'm not aware of a setting you can supply externally that will answer "n" automatically every time. However, you can modify the source code of Bass_pop_main to suit your needs and use the modified copy. The source is printed if you enter Bass_pop_main (without parentheses) in the console.
library(RAC)
Bass_pop_main_revised <- function(userpath, forcings) {
  # Note: the original source begins with rm(list = ls()), which inside a
  # function would delete the arguments userpath and forcings; it is omitted here.
  cat("Sea Bass population bioenergetic model\n")
  cat(" \n")
  currentpath <- getwd() # kept from the original source; not used below
  out_pre <- Bass_pop_pre(userpath, forcings)
  Param <- out_pre[[1]]
  Tint  <- out_pre[[2]]
  Gint  <- out_pre[[3]]
  Food  <- out_pre[[4]]
  IC    <- out_pre[[5]]
  times <- out_pre[[6]]
  Dates <- out_pre[[7]]
  N     <- out_pre[[8]]
  CS    <- out_pre[[9]]
  out_RKsolver <- Bass_pop_loop(Param, Tint, Gint, Food, IC, times, N, userpath)
  out_post <- Bass_pop_post(userpath, out_RKsolver, times, Dates, N, CS)
  cat(" ")
  cat("End")
  return(out_post)
}
Now use the Bass_pop_main_revised function instead of Bass_pop_main and it will never ask for input:
setwd("../RAC_seabass")
userpath <- "../RAC_seabass"
Bass_pop_skeleton(userpath)
forcings <- Bass_pop_dataloader(userpath)
output <- Bass_pop_main_revised(userpath, forcings)
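If you would rather not maintain a copy of the function body, another option is to shadow the prompt function itself. Below is a minimal sketch, assuming the y/n question comes from a direct readline() call inside Bass_pop_main; if the package reads input some other way (e.g. via scan()), or the prompt lives in a helper function, this will not catch it:
library(RAC)
# Hypothetical wrapper: run Bass_pop_main in an environment where readline()
# always answers "n" (everything else still resolves in the RAC namespace).
auto_no <- new.env(parent = environment(Bass_pop_main))
auto_no$readline <- function(prompt = "") "n"
Bass_pop_main_auto <- Bass_pop_main
environment(Bass_pop_main_auto) <- auto_no
output <- Bass_pop_main_auto(userpath, forcings)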

Related

Using callr to display an (estimated) progress bar without stopping the script

I would like to run a very simple script concurrently or asynchronously, displaying an estimated progress bar.
This works well enough when using system2() like this:
path <- '../Desktop/.../My_Skript_Dir/'
system2(command = "cmd.exe",
input = paste('"./R-4.2.1/bin/Rscript.exe"',
paste0(path, '/Progress_Bar.R')), wait = FALSE)
If possible, I would like to avoid using system2(), though. I recently found out that callr might do the trick, and it almost works, using this function from the "Progress_Bar" script:
estimated_progress <- function(df = NULL, add_time = FALSE){
  require(tcltk)
  require(callr)
  pred <- round(nrow(df) * 0.6) # predicted number of steps
  callr::r_bg(func = function(pred){ # open a background R session
    pb1 <- tcltk::tkProgressBar(title = 'PB', label = 'PB', min = 0, max = pred, initial = 0)
    for (index in seq(pred)){
      tcltk::setTkProgressBar(pb = pb1, value = index)
      Sys.sleep(1)
    }
  }, args = list(pred))
}
df <- data.frame(matrix(nrow = 200, ncol = 3)) # dummy data
estimated_progress(df = df, add_time = FALSE)
When I do this, the progress bar opens in a new window as expected.
It keeps going for the next 1-3 function calls (for example invisible(pbapply::pblapply(1:200000, function(x) x**3))), but any more than that and estimated_progress() aborts.
What am I missing here? I am sure it's quite obvious, and I have read that callr can work asynchronously (look here), but I can't make it work.
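One likely explanation (an assumption based on how callr/processx manage processes, not something verified against this script): r_bg() returns a process handle, and the background process is killed once that handle is garbage collected. estimated_progress() returns the handle, but the call above discards it, so the next garbage collection, triggered after a few more function calls, ends the progress bar. Keeping a reference should let it run to completion:
pb_proc <- estimated_progress(df = df)  # store the r_process handle
pb_proc$is_alive()                      # TRUE while the background session runs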

Loop in R through variable names with values as endings and create new variables from the result

I have 24 variables called empl_1 to empl_24 (e.g. empl_2, empl_3, ...).
I would like to write a loop in R that takes the values 1-24 and puts them in the respective places, so the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1_ipw to ye24_ipw, ipw_atet_1 to ipw_atet_24, and so on):
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
Coming from a Stata environment, I tried:
for (i in seq_anlong(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i]<-treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot =2
}
However, this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)
You can try with lapply:
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
  treatweight(y = x[insample == 1], d = treat_ipw,
              x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result is a list storing the output for all 24 variables in the same object, which is easier to manage and process than 24 separate vectors.
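The individual pieces can then be extracted with the usual list operations. For example (assuming, as in the question, that each treatweight() result contains an $se component):
result[[1]]                                        # full result for empl_1
se_values <- sapply(result, function(res) res$se)  # all standard errors at once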

how to write out multiple files in R?

I am a newbie R user. Now, I have a question related to writing out multiple files with different names. Let's say that my data has the following structure:
IV_HAR_m1 <- matrix(rnorm(2000 * 30), ncol = 30, nrow = 2000)
DV_HAR_m1 <- matrix(rnorm(2000 * 10), ncol = 10, nrow = 2000)
I am trying to estimate multiple LASSO regressions. At the beginning, I was storing the iterations in one object called Dinamic_beta. This object was stored in only one file, and it saved the required information each time my code iterated.
For this I was using stew(), which belongs to the pomp package, but the total process takes 5 or 6 days and I am worried about a power outage or a failure of my computer.
Now, I want to save each environment (iteration) in a .Rnd file. I do not know how to do that, but the code that I am using is the following:
library(glmnet)
library(Matrix)
library(pomp)
space <- 7 # the number of files that I want to create
Dinamic_betas <- array(NA, c(10, 31, (nrow(IV_HAR_m1) - space)))
dimnames(Dinamic_betas) <- list(NULL, NULL, NULL) # one entry per dimension
set.seed(12345)
stew( # stew saves the environment in a .Rnd file
  file = "Dinamic_LASSO_RD", { # the name required by stew for creating one file with all information
    for (i in 1:dim(Dinamic_betas)[3]) {
      tryCatch( # print messages
        expr = {
          cv_dinamic <- cv.glmnet(IV_HAR_m1[i:(space + i - 1), ],
                                  DV_HAR_m1[i:(space + i - 1), ], alpha = 1,
                                  family = "mgaussian", thresh = 1e-08, maxit = 10^9)
          LASSO_estimation_dinamic <- glmnet(IV_HAR_m1[i:(space + i - 1), ],
                                             DV_HAR_m1[i:(space + i - 1), ],
                                             alpha = 1, lambda = cv_dinamic$lambda.min,
                                             family = "mgaussian")
          coefs <- as.matrix(do.call(cbind, coef(LASSO_estimation_dinamic)))
          Dinamic_betas[, , i] <- t(coefs)
        },
        error = function(e) {
          message("Caught an error!")
          print(e)
        },
        warning = function(w) {
          message("Caught a warning!")
          print(w)
        },
        finally = {
          message("All done, quitting.")
        }
      )
      if (i %% 400 == 0) { print(i) }
    }
  }
)
If someone can suggest another package that stores the outputs in different files, I would be grateful.
Try adding this just before the close of your loop:
save.image(paste0("Results_iteration_", i, ".RData"))
This should save your entire workspace to disk at every iteration. You can then use load() to restore the workspace from any iteration. Let me know if this works.
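A lighter-weight alternative, assuming you only need the coefficient array rather than the whole workspace, is to save just that object with saveRDS(), which writes much smaller files:
# inside the loop, e.g. at the existing i %% 400 checkpoint
saveRDS(Dinamic_betas, file = paste0("Dinamic_betas_iteration_", i, ".rds"))
# after a crash, load the newest file (this filename is illustrative):
Dinamic_betas <- readRDS("Dinamic_betas_iteration_400.rds")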

Load the MNIST digit recognition dataset with R and see any results

In the book "Machine Learning: A Probabilistic Perspective" by Kevin P. Murphy, the first task reads:
Exercise 1.1 KNN classifier on shuffled MNIST data
Run mnist1NNdemo and verify that the misclassification rate (on the first 1000 test cases) of MNIST of a 1-NN classifier is 3.8%. (If you run it on all 10,000 test cases, the error rate is 3.09%.) Modify the code so that you first randomly permute the features (columns of the training and test design matrices), as in shuffledDigitsDemo, and then apply the classifier. Verify that the error rate is not changed.
My understanding is that the exercise is asking for a 1-NN classification after loading the files (kNN() in R).
The files:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
are taken from The MNIST DATABASE.
I found a popular template for loading the files:
# for the kNN() function
library(VIM)
load_mnist <- function() {
  load_image_file <- function(filename) {
    ret <- list()
    f <- file(filename, 'rb')
    readBin(f, 'integer', n = 1, size = 4, endian = 'big') # magic number (discarded)
    ret$n <- readBin(f, 'integer', n = 1, size = 4, endian = 'big') # number of images
    nrow <- readBin(f, 'integer', n = 1, size = 4, endian = 'big')
    ncol <- readBin(f, 'integer', n = 1, size = 4, endian = 'big')
    x <- readBin(f, 'integer', n = ret$n * nrow * ncol, size = 1, signed = FALSE)
    ret$x <- matrix(x, ncol = nrow * ncol, byrow = TRUE) # one image per row
    close(f)
    ret
  }
  load_label_file <- function(filename) {
    f <- file(filename, 'rb')
    readBin(f, 'integer', n = 1, size = 4, endian = 'big') # magic number (discarded)
    n <- readBin(f, 'integer', n = 1, size = 4, endian = 'big') # number of labels
    y <- readBin(f, 'integer', n = n, size = 1, signed = FALSE)
    close(f)
    y
  }
  train <<- load_image_file("train-images.idx3-ubyte")
  test <<- load_image_file("t10k-images.idx3-ubyte")
  train$y <<- load_label_file("train-labels.idx1-ubyte")
  test$y <<- load_label_file("t10k-labels.idx1-ubyte")
}
show_digit <- function(arr784, col = gray(12:1/12)) {
  image(matrix(arr784, nrow = 28)[, 28:1], col = col)
}
According to the comments, this should then work at the command line (instead, I get the error shown):
# Error "Error in matrix(arr784, nrow = 28) : object 'train' not found"
show_digit(train$x[5,])
The question is how can I use the show_digit function ?
What I figured out for the problem:
First run the whole file in RStudio or ESS, then call load_mnist() from the console.
After that, execute show_digit(train$x[3,]) in the console again and it works.
The KNN classification can be done on the whole data set:
a <- knn(train$x, test$x, train$y)
but it is a very slow process. The result can be checked with table(test$y, a), where test$y holds the true labels and a the predictions.
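To reproduce the figure from the exercise, the classifier can be run as a 1-NN on just the first 1000 test cases. A sketch, assuming class::knn() and the train/test lists created by load_mnist() above (still slow, but far faster than the full test set):
library(class)
pred <- knn(train$x, test$x[1:1000, ], factor(train$y), k = 1)
mean(pred != test$y[1:1000])  # misclassification rate; the book reports 3.8%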

Only one processor being used while running NetLogo models using parApply

I am using the 'RNetLogo' package to run sensitivity analyses on my NetLogo model. My model has 24 parameters I need to vary - so parallelising this process would be ideal! I've been following along with the example in Thiele's "Parallel processing with the RNetLogo package" vignette, which uses the 'parallel' package in conjunction with 'RNetLogo'.
I've managed to get R to initialise the NetLogo model across all 12 of my processors, which I've verified using gui=TRUE. The problem comes when I try to run the simulation code across the 12 processors using 'parApply'. This line runs without error, but it only runs on one of the processors (using around 8% of my total CPU power). Here's a mock-up of my R code file; I've included some commented-out code at the end, showing how I run the simulation without trying to parallelise:
### Load packages
library(parallel)
### Set up initialisation function
prepro <- function(dummy, gui, nl.path, model.path) {
  library(RNetLogo)
  NLStart(nl.path, gui = gui)
  NLLoadModel(model.path)
}
### Set up finalisation function
postpro <- function(x) {
  NLQuit()
}
### Set paths
# For NetLogo
nl.path <- "C:/Program Files/NetLogo 6.0/app"
nl.jarname <- "netlogo-6.0.0.jar"
# For the model
model.path <- "E:/Model.nlogo"
# For the function "sim" code
sim.path <- "E:/sim.R"
### Set base values for parameters
base.param <- c('prey-max-velocity' = 25,
'prey-agility' = 3.5,
'prey-acceleration' = 20,
'prey-deceleration' = 25,
'prey-vision-distance' = 10,
'prey-vision-angle' = 240,
'time-to-turn' = 5,
'time-to-return-to-foraging' = 300,
'time-spent-circling' = 2,
'predator-max-velocity' = 35,
'predator-agility' = 3.5,
'predator-acceleration' = 20,
'predator-deceleration' = 25,
'predator-vision-distance' = 20,
'predator-vision-angle' = 200,
'time-to-give-up' = 120,
'number-of-safe-zones' = 1,
'number-of-target-patches' = 5,
'proportion-obstacles' = 0.05,
'obstacle-radius' = 2.0,
'obstacle-radius-range' = 0.5,
'obstacle-sensitivity-for-prey' = 0.95,
'obstacle-sensitivity-for-predators' = 0.95,
'safe-zone-attractiveness' = 500
)
## Get names of parameters
param.names <- names(base.param)
### Load the code of the simulation function (name: sim)
source(file=sim.path)
### Convert "base.param" to a matrix, as required by parApply
base.param <- matrix(base.param, nrow=1, ncol=24)
### Get the number of simulations we want to run
design.combinations <- length(base.param[[1]])
already.processed <- 0
### Initialise NetLogo
processors <- detectCores()
cl <- makeCluster(processors)
clusterExport(cl, 'sim')
gui <- FALSE
invisible(parLapply(cl, 1:processors, prepro, gui=gui, nl.path=nl.path, model.path=model.path))
### Run the simulation across all processors, using parApply
sim.result.base <- parApply(cl, base.param, 1, sim,
param.names,
no.repeated.sim = 100,
trace.progress = FALSE,
iter.length = design.combinations,
function.name = "base parameters")
### Run the simulation on a single processor
#sim.result.base <- sim(base.param,
# param.names,
# no.repeated.sim = 100,
# my.nl1,
# trace.progress = TRUE,
# iter.length = design.combinations,
# function.name = "base parameters")
Here's a mock-up of the 'sim' function (adapted from Thiele's paper "Facilitating parameter estimation and sensitivity analyses of agent-based models - a cookbook using NetLogo and R"):
sim <- function(param.set, parameter.names, no.repeated.sim, trace.progress, iter.length, function.name) {
  # Some security checks
  if (length(param.set) != length(parameter.names))
  { stop("Wrong length of param.set!") }
  if (no.repeated.sim <= 0)
  { stop("Number of repetitions must be > 0!") }
  if (length(parameter.names) <= 0)
  { stop("Length of parameter.names must be > 0!") }
  # Create an empty object to save the simulation results
  eval.values <- NULL
  # Run the repeated simulations (to control stochasticity)
  for (i in 1:no.repeated.sim)
  {
    # Create a random seed for NetLogo from R, based on min/max of NetLogo's random seed
    NLCommand("random-seed", runif(1, -2147483648, 2147483647))
    ## This is the stuff for one simulation
    cal.crit <- NULL
    # Set NetLogo parameters to current parameter values
    lapply(seq_along(parameter.names), function(x) { NLCommand("set ", parameter.names[x], param.set[x]) })
    NLCommand("setup")
    # This should run "go" until prey-win != 5, i.e. when the pursuit ends
    NLDoCommandWhile("prey-win = 5", "go")
    # Report a value
    prey <- NLReport("prey-win")
    # Report another value
    pred <- NLReport("predator-win")
    ## Extract the values we are interested in
    cal.crit <- rbind(cal.crit, c(prey, pred))
    # Append to former results
    eval.values <- rbind(eval.values, cal.crit)
  }
  ## Make sure eval.values has column names (it is a matrix, so use colnames())
  colnames(eval.values) <- c("PreySuccess", "PredSuccess")
  # Return the mean of the repeated simulation results
  if (no.repeated.sim > 1) {
    return(colMeans(eval.values))
  }
  else {
    return(eval.values)
  }
}
I think the problem might lie in the "nl.obj" string that RNetLogo uses to identify the NetLogo instance you want to run the code on. I've tried several methods of fixing this, but haven't found a solution that works. When I initialise NetLogo across all the processors using the code provided in Thiele's example, I don't set an "nl.obj" value for each instance, so I'm guessing RNetLogo uses some kind of default list?

However, in Thiele's original code, the "sim" function requires you to specify which NetLogo instance you want to run it on, so R spits an error when I try to run the final line (Error in checkForRemoteErrors(val) : one node produced an error: argument "nl.obj" is missing, with no default). I have modified the "sim" function code so that it doesn't require this argument and just accepts the default setting for nl.obj, but then my simulation only runs on a single processor. So I think that, by default, "sim" must be running the code on a single instance of NetLogo, and I'm not certain how to fix it.
This is also the first time I've used the 'parallel' package, so I could be missing something obvious to do with 'parApply'. Any insight would be much appreciated!
Thanks in advance!
I am still in the process of applying a similar technique to perform a Morris elementary-effects screening with my NetLogo model. For me the parallel execution works fine. I compared your script to mine and noticed that in my version the 'parApply' call of the simulation function (simfun) is wrapped in an anonymous function (see below). Maybe adding that wrapper already solves your issue.
sim.results.morris <- parApply(cl, mo$X, 1, function(x) {simfun(param.set=x,
no.repeated.sim=no.repeated.sim,
parameter.names=input.names,
iter.length=iter.length,
fixed.values=fixed.values,
model.seed=new.model.seed,
function.name="Morris")})
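Applied to the script in the question, the call would look something like this (a sketch using the argument names from the asker's sim() above):
sim.result.base <- parApply(cl, base.param, 1, function(x) {
  sim(param.set = x,
      parameter.names = param.names,
      no.repeated.sim = 100,
      trace.progress = FALSE,
      iter.length = design.combinations,
      function.name = "base parameters")
})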
