I have a question about implementing tryCatch in R. I have a for loop that runs a multiverse analysis (read: many variations of the same test). Before running the test, I shuffle the independent variables. Sometimes that produces an unlucky combination of independent variables that makes it impossible to run the analysis, and the analysis throws an error. I would like the loop to simply reshuffle and try again whenever that happens. From previous Stack Overflow posts I gathered that tryCatch should do what I want, but I can't find any information on how to implement tryCatch properly. Does anyone have a link or know how to do that?
Please find below my code:
#Note: This won't run on your machine, because it uses self-written functions which are too long to post here. It would be sufficient if you can tell me where to put the tryCatch calls or point me to a link which explains how to use it to avoid loop terminations.
#set up number of iterations for permutations
permutation <- 1:500
#setup count of iterations
count <- 0
set.seed(117)
#set up empty dataframe
df_permutation <- data.frame()
#set up permutation loop
for (i in permutation){
#shuffling of the independent variables
simulate$shuffledemotion <- permute(simulate$Emotion)
simulate$shuffledgender <- permute(simulate$ModelGender)
simulate$shuffledmask <- permute(simulate$MaskStatus)
#run the multiverse, make sure it has the same settings as the original multiverse
df_mult_sim_shuffled <- multiverse.freq.anova(dataframe = simulate, valuevariable = "latency", idvariable = "pp_num", within1 = "shuffledemotion", within2 = "shuffledmask", within3 = "shuffledgender", between1 = NA, TransformationTypes = c("raw"), FixedTrimmingTypes = c("nofixedtrimming"), DataTrimmingTypes = c("notrimming"), data.lower = 1, data.upper = 3, data.step =0.5, fixed.min.lower = 0.05, fixed.min.upper = 0.3, fixed.min.step = 0.05, fixed.max.lower = 8, fixed.max.upper = 10, fixed.max.step = 0.1, RawData = TRUE)
#add +1 to the count for each iteration
count = count + 1
#save the dataset
df_permutation_prelim <- as.data.frame(cbind(df_mult_sim_shuffled, count))
df_permutation <- as.data.frame(rbind(df_permutation, df_permutation_prelim))
}
You can do it with just try. I'd use a while loop so that it keeps retrying until 500 runs have completed. Something like this:
count <- 0
set.seed(117)
while (count < 500) {
x <- try({
# replace this with your code
if (runif(1) > 0.99) stop()
})
if (!inherits(x, 'try-error')) count <- count+1 else message('tried but failed ', count)
}
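Since the question asked about tryCatch specifically, the same retry pattern can be sketched with it. This is only an illustration under assumptions: run_once is a hypothetical stand-in for one shuffle plus one call to the analysis function, and the failure rate and the target of 20 runs are made up.

```r
set.seed(117)

# Hypothetical stand-in for "shuffle + run the analysis once".
# It fails at random to mimic an unlucky shuffle.
run_once <- function() {
  if (runif(1) > 0.7) stop("unlucky shuffle")
  data.frame(p = runif(1))
}

results <- list()
failures <- 0
while (length(results) < 20) {
  # On error, the handler returns NULL instead of stopping the loop
  out <- tryCatch(run_once(), error = function(e) NULL)
  if (is.null(out)) {
    failures <- failures + 1  # nothing is stored; the next pass reshuffles
    next
  }
  out$count <- length(results) + 1
  results[[length(results) + 1]] <- out
}

# Bind once at the end instead of growing a data frame inside the loop
df_permutation <- do.call(rbind, results)
```

Only successful runs are counted, so the loop always ends with the requested number of rows, and failures records how many shuffles had to be discarded.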
Skipping error in for-loop provides a good response!
Sorry, I hadn't seen that earlier!
I am new to R and have difficulties using "if" and for loops. Sorry if this is a duplicate.
As you can see in the chunk of code below, I try to create 100 lm models and save a model only when its R is more than 0.7.
However, the code saves all 100 lm models.
I suspect the statement (!is.na(lm.cv.r[i]) > 0.70) is wrong, but I cannot figure out why.
# lets use USArrests data as an example
data("USArrests")
head(USArrests)
df.norm <- USArrests
set.seed(100)
lm.cv.mse <- NULL
lm.cv.r <- NULL
k <- 100
for(i in 1:k){
index.cv <- sample(1:nrow(df.norm),round(0.8*nrow(df.norm)))
df.cv.train <- df.norm[index.cv, ]
df.cv.test <- df.norm[-index.cv, ]
lm.cv <- glm(Rape~., data = df.cv.train)
lm.cv.predicted <- predict(lm.cv, df.cv.test)
# USArrests has no column called "target"; the response modelled above is Rape
lm.cv.mse[i] <- sum((df.cv.test$Rape - lm.cv.predicted)^2)/nrow(df.cv.test)
lm.cv.r[i] <- as.numeric(round(cor(lm.cv.predicted, df.cv.test$Rape, method = "pearson"), digits = 3))
if (!is.na(lm.cv.r[i]) > 0.70){
saveRDS(lm.cv, file = paste("lm.cv", lm.cv.r[i], ".rds", sep = ''))
}
}
I'm not familiar with lm, so I will assume your code is working and the problem is, as you said, the if statement.
Try this out:
if (!is.na(lm.cv.r[i]) && lm.cv.r[i] > 0.7){
saveRDS(lm.cv, file = paste("lm.cv", lm.cv.r[i], ".rds", sep = ''))
}
So in your code
(!is.na(lm.cv.r[i]) > 0.70)
the comparison is evaluated before the negation, because ! has lower precedence than > in R. The condition is therefore read as !(is.na(lm.cv.r[i]) > 0.70). Assuming that lm.cv.r[i] is not NA, is.na() returns FALSE, FALSE > 0.70 is FALSE (FALSE is coerced to 0), and negating that gives TRUE. The correlation value itself is never compared with 0.70.
In conclusion, you are saving every model, since the if condition is TRUE in every iteration.
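A quick check at the console makes the difference visible. This is a minimal sketch with a made-up correlation value r; it relies on R's operator precedence, where ! binds more loosely than >.

```r
r <- 0.35  # made-up correlation, clearly below the 0.70 threshold

# `!` binds more loosely than `>`, so this is read as !(is.na(r) > 0.70)
cond_original <- !is.na(r) > 0.70

# Test for NA first, then compare the value itself
cond_fixed <- !is.na(r) && r > 0.70

print(c(original = cond_original, fixed = cond_fixed))
```

The original condition is TRUE even though r is below the threshold, while the rewritten one is FALSE.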
I am trying something pretty simple: I want to run a bunch of regressions in parallel. When I use the following data generator (PART 1), the parallel part does not work and gives the error listed below.
#PART 1
p <- 20; rho<-0.7;
cdc<- diag(p)
for( i in 1:(p-1) ){ for( j in (i+1):p ){
cdc[i,j] <- cdc[j,i] <- rho^abs(i-j)
}}
my.data <- mvrnorm(n=100, mu = rep(0, p), Sigma = cdc)
The parallel part does work, however, if I generate the data as in PART 2:
# PART 2
my.data<-matrix(rnorm(1000,0,1),nrow=100,ncol=10)
I wrote the function that I want to run in parallel as:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-coef(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
Then I run it with parLapply as follows:
{
no_cores <- detectCores(logical = TRUE)
myclusternumber<-(no_cores-1)
cl <- makeCluster(myclusternumber)
registerDoParallel(cl)
p1 <- ncol(my.data)
obj<-splitIndices(p1, myclusternumber)
clusterExport(cl,list('parallel_fun','my.data','obj'),envir=environment())
clusterEvalQ(cl, {
library(MASS)
library(Matrix)
library(BAS)
})
newresult<-parallel::parLapply(cl,obj,fun = parallel_fun,my.data)
stopCluster(cl)
}
But whenever I use PART 1, I get the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: object 'my_df' not found
But this should not happen; the data frame should be created. I have no idea why this is happening. Any help is appreciated.
Posting this as one possible workaround, see if it works:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my_df <<- my_df
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-BAS:::coef.bas(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
The issue seems to be with the BAS:::coef.bas function, which calls eval in order to get my_df and fails to find it when called in parallel. The "hack" here is to push my_df out of the function into an enclosing environment (in practice the worker's global environment) by calling my_df <<- my_df.
There should be a better way to do this, but <<- might be the fastest one. In general, <<- may cause unwanted behaviour, especially when used in loops. Assigning a unique variable name before exporting (and not forgetting to remove it after use) is one way to tackle that.
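The "unique name, then clean up" idea can be sketched in base R as follows. This is an assumption-laden illustration, not what BAS requires: fit_one and the name my_df_tmp_fit_one are hypothetical, and assign() is used in place of <<- so that the target environment is explicit.

```r
fit_one <- function(dat) {
  tmp_name <- "my_df_tmp_fit_one"  # hypothetical name, chosen to be unique

  # Publish the data under that name in the global environment
  assign(tmp_name, dat, envir = globalenv())

  # Remove it again when the function exits, even if an error occurs
  on.exit(rm(list = tmp_name, envir = globalenv()), add = TRUE)

  # Stand-in for code that looks the data up by name in the global environment
  nrow(get(tmp_name, envir = globalenv()))
}

n <- fit_one(data.frame(y = 1:3))
```

Because the cleanup is registered with on.exit, the temporary object does not linger between iterations or leak into later calls.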
I am trying to learn how to implement control structures such as for and while loops.
I created a function that simulates betting in a famous Brazilian lotto.
In the lotto, a player bets on 6 unique integers from a 1:60 vector (called your_bet).
The function samples 6 values from the 1 to 60 universe ("result") and tests how many values in result match your_bet, printing out:
your_bet
result
total points (out of 6 possible)
one of three possible commentaries on the result of the bet.
Code as follows:
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE)){
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", as.character(your_bet))), collapse="")
print(paste(c("Result", as.character(result))), collapse="")
print(paste(c("Total points", as.character(total_points))), collapse="")
if (total_points==6)
print("You are a millonaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
}
```
I then tried to implement a loop that would make the function run over and over again until total points >= a given target (here target_points), modifying the function as below.
```
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
target_points){ #inserted stubborn_until_x_points and target_points arguments
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", as.character(your_bet))), collapse="")
print(paste(c("Result", as.character(result))), collapse="")
print(paste(c("Total points", as.character(total_points))), collapse="")
if (total_points==6)
print("You are a millonaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
if (stubborn_until_x_points==TRUE){ #Inserted WHILE loop here
while(total_points < target_points){
LOTTO(your_bet, stubborn_until_x_points=TRUE, target_points)}
}
}
```
This did make the function repeat in a loop, but for some reason, it keeps looping even when the condition is met.
Observation - I realized that when the condition is met on the first run, it actually stops, but after entering a loop, it goes on forever.
I can't find what is wrong. Any ideas?
Thanks in advance.
I haven't dug very deep into the function, but the only case in which this can never stop by design is when target_points > n_samples (6 here).
The problem in that case is quite obvious and simple to fix: reduce target_points to at most 6, or add n_samples (6 at the moment) as an argument and make sure it is at least target_points. Rather than this, though, I suspect the main problem lies within the recursive function. R is rather restrictive when it comes to recursion. If one tries a simple recursion, for example
i <- 0
f <- function(){
i <<- i + 1
f()
}
f()
i
# 896 on my pc
one can see that we cannot use recursion for very deep recursive functions (in R). This throws the very unhelpful error
Error: C stack usage 7974196 is too close to the limit
To alleviate this, one simply has to remove the recursion ("simply" in italics, because sometimes it is not simple). In this case we just move the while loop so that it contains the main body of the function, and use an if statement to break early if necessary.
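The idea can be sketched on a toy example first. Each function call has its own local variables, so a recursive call cannot update total_points in the frame of the call whose while loop is testing it; drawing inside the loop avoids that. In this sketch draw_until is a hypothetical name and sample(0:6, 1) is a made-up stand-in for scoring one lotto draw.

```r
set.seed(1)

draw_until <- function(target) {
  tries <- 0
  total <- -Inf  # always smaller than target, so the loop runs at least once
  while (total < target) {
    # Stand-in for one draw; the loop variable is updated in this same frame
    total <- sample(0:6, 1)
    tries <- tries + 1
  }
  c(total = total, tries = tries)
}

res <- draw_until(5)
```

The loop condition now sees every new value of total, so the function returns as soon as the target is reached, and no call stack builds up.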
Below is a slightly modified version of the function (note that sample_n and number_range have been added as arguments).
In this function the while loop has been moved to contain the main body, and the result is instead printed at the end (using the variable res to figure out the result). At the end of the loop I use a if(isFALSE(stubborn_until_x_points))break statement to exit early if necessary.
LOTTO <- function(your_bet,
sample_n = 6,
number_range = 1:60,
stubborn_until_x_points = FALSE,
target_points){
if(missing(target_points) || target_points > sample_n)
stop('missing target_points or target_points too large')
total_points <- -Inf # Always smaller than target_points
i <- 0
res <- 0
# If you want a new bet every iteration.
# Add this at the end of the loop, but remove the 'if'
if(missing(your_bet))
your_bet <- sample(number_range, size=sample_n, replace=FALSE)
while(total_points < target_points){
result <- sample(number_range, size=sample_n, replace=FALSE)
logical_vector <- your_bet %in% result
total_points <- sum(logical_vector)
if (total_points==6){
res <- 1
}else if (total_points==5){
res <- 2
}
i <- i + 1
if(isFALSE(stubborn_until_x_points))
break
}
if(res == 1)
cat(res <- 'You\'re a millionaire!\n', sep = '\n')
else if(res == 2)
cat(res <- '5 points, you are rich!\n', sep = '\n')
else
cat(res <- 'Better luck next time.', sep = '\n')
c(result = res, number_of_tries = i)
}
The function is called as before, but now also returns the number of attempts and the result obtained from trials as shown below.
LOTTO(target_points = 6, stubborn_until_x_points = TRUE)
You're a millionaire!
#Output:
result number_of_tries
"You're a millionaire!\n" "8297820"
Removing the recursion, putting the body of the function inside the while loop, assigning -Inf to the initial total_points and adding the break statement were all very useful.
Adapting the answer by @Oliver yielded exactly what I was looking for:
LOTTO<-function(your_bet=sample(1:60, size=6, replace=FALSE), stubborn_until_x_points=FALSE,
target_points=0){
total_points<--Inf
while(total_points < target_points){
result<-sample(1:60, size=6, replace=FALSE)
logical_vector<-(your_bet %in% result)
total_points<-sum(as.integer(logical_vector))
print(paste(c("Your bet:", as.character(your_bet))), collapse="")
print(paste(c("Result", as.character(result))), collapse="")
print(paste(c("Total points", as.character(total_points))), collapse="")
if (total_points==6)
print("You are a millonaire")
else if (total_points==5)
print("5 points, you are rich!")
else print("good luck next time")
if (isFALSE(stubborn_until_x_points==TRUE))
break
}
}
His answer, however, yields interesting additional results and better control over inadequate argument inputs.
I am aware that the "skip error in for loop" has been answered multiple times (see How to skip an error in a loop or Skip Error and Continue Function in R). But all answers are complex and difficult to apply to a different situation for a novice.
I am performing a Gaussian histogram fitting on hundreds of datasets using a piece of code.
results = list()
for(i in 1:length(T_files)){
R = Table[i][,1]
tab = data.frame(x = seq_along(R), r = R)
res = nls(R ~ k*exp(-1/2*(x-mu)^2/sigma^2), start=c(mu=15,sigma=5, k=1) , data = tab)
v = summary(res)$parameters[,"Estimate"]
fun = function(x) v[3]*exp(-1/2*(x-v[1])^2/v[2]^2)
results[[i]] = fun(seq(0, 308, 1))/max(fun_SP(seq(0, 308, 1)))/2
}
The code works on most datasets when each one is tested individually. The loop, however, stops with the "error in nls(...): singular gradient" message. I want to skip over this error and continue with the next dataset.
I know that a tryCatch function may be used, but the line containing the nls function is complex and I have not found a way to use tryCatch correctly in this line. Any advice is welcome :-)
Use the function try. It captures the error instead of stopping, so you can test the result and skip to the next data frame if the call failed. Something like this:
error <- try(your code...)
if(inherits(error, "try-error")) next # skip to the next one
In your case, it might look like this:
results = list()
for(i in 1:length(T_files)){
R = Table[i][,1]
tab = data.frame(x = seq_along(R), r = R)
error = try(res <- nls(R ~ k*exp(-1/2*(x-mu)^2/sigma^2), start=c(mu=15,sigma=5, k=1) , data = tab))
if(!inherits(error, "try-error")){
v = summary(res)$parameters[,"Estimate"]
fun = function(x) v[3]*exp(-1/2*(x-v[1])^2/v[2]^2)
results[[i]] = fun(seq(0, 308, 1))/max(fun_SP(seq(0, 308, 1)))/2
}else{
next # the fit failed, so skip to the next data frame
}
}
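For a version that can be run as is, here is a sketch with tryCatch on simulated data. Everything about the data is an assumption: datasets is a hypothetical list standing in for the real tables, and the flat second series is only expected, not guaranteed, to make nls fail.

```r
set.seed(42)
x <- 1:30
gauss <- function(x, mu, sigma, k) k * exp(-0.5 * (x - mu)^2 / sigma^2)

# Hypothetical stand-ins for the real datasets: two noisy Gaussian bumps
# and one flat series that nls() is not expected to be able to fit.
datasets <- list(
  gauss(x, 15, 5, 1) + rnorm(30, sd = 0.01),
  rep(0, 30),
  gauss(x, 12, 4, 2) + rnorm(30, sd = 0.01)
)

# Preallocate so that skipped datasets simply stay NULL
results <- vector("list", length(datasets))
for (i in seq_along(datasets)) {
  tab <- data.frame(x = x, r = datasets[[i]])
  fit <- tryCatch(
    nls(r ~ k * exp(-0.5 * (x - mu)^2 / sigma^2),
        start = c(mu = 15, sigma = 5, k = 1), data = tab),
    error = function(e) {
      # Report which dataset failed and why, then hand back NULL
      message("dataset ", i, " skipped: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(fit)) next  # move on to the next dataset
  results[[i]] <- coef(fit)
}
```

Entries of results that are still NULL afterwards mark the datasets whose fit failed, so they are easy to find and inspect separately.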
I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using RStudio, and when I run the summary to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to only show, say, the first 100 components?
I tried this and it seems to be working:
l = loadings(prin)
l[,1:100]
It's pretty easy to modify print.summary.princomp (you can see the original code by typing stats:::print.summary.princomp) to do this:
pcaPrint <- function (x, digits = 3, loadings = x$print.loadings, cutoff = x$cutoff,n, ...)
{
#Check for sensible value of n; default to full output
if (missing(n) || n > length(x$sdev) || n < 1){n <- length(x$sdev)}
vars <- x$sdev^2
vars <- vars/sum(vars)
cat("Importance of components:\n")
print(rbind(`Standard deviation` = x$sdev[1:n], `Proportion of Variance` = vars[1:n],
`Cumulative Proportion` = cumsum(vars)[1:n]))
if (loadings) {
cat("\nLoadings:\n")
cx <- format(round(x$loadings, digits = digits))
cx[abs(x$loadings) < cutoff] <- paste(rep(" ", nchar(cx[1,
1], type = "w")), collapse = "")
print(cx[,1:n], quote = FALSE, ...)
}
invisible(x)
}
pcaPrint(summary(princomp(USArrests, cor=TRUE),
loadings = TRUE, cutoff = 0.2), digits = 2,n = 2)
Edited to include a basic check for a sensible value for n. Now that I've done this, I wonder if it isn't worth suggesting to R Core as a permanent addition; it seems simple and like it might be useful.
You can put the loadings in matrix form: save the matrix to a variable and then subset it (a la matrix[,1:100]) to see the first/middle/last n. In this example, I've used head(), which returns the first rows. Each column is a principal component.
head(
matrix(
prin$loadings,
ncol=length(dimnames(prin$loadings)[[2]]),
nrow=length(dimnames(prin$loadings)[[1]])
),
100)
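If redefining the print method is more than is needed, the same numbers can be pulled out of the princomp object directly. This is a sketch on USArrests with n = 2 as a made-up cutoff; it assumes prin was created with princomp, whose sdev and loadings components are used here.

```r
prin <- princomp(USArrests, cor = TRUE)
n <- 2  # number of leading components to show

# Proportion and cumulative proportion of variance, first n components only
vars <- prin$sdev^2 / sum(prin$sdev^2)
importance <- rbind(
  `Standard deviation`     = prin$sdev,
  `Proportion of Variance` = vars,
  `Cumulative Proportion`  = cumsum(vars)
)[, 1:n]

# Loadings of the first n components (columns are components, rows variables)
first_loadings <- unclass(loadings(prin))[, 1:n]

print(round(importance, 3))
print(round(first_loadings, 3))
```

Indexing columns keeps whole components, whereas head() on the loadings matrix would keep the first rows, that is, the first variables.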