I am aware that the "skip error in for loop" question has been answered multiple times (see How to skip an error in a loop or Skip Error and Continue Function in R). But the answers are complex and, for a novice, difficult to apply to a different situation.
I am performing a Gaussian histogram fit on hundreds of datasets using the following piece of code:
results = list()
for(i in 1:length(T_files)){
  R = Table[i][,1]
  tab = data.frame(x = seq_along(R), r = R)
  res = nls(R ~ k*exp(-1/2*(x-mu)^2/sigma^2), start = c(mu=15, sigma=5, k=1), data = tab)
  v = summary(res)$parameters[,"Estimate"]
  fun = function(x) v[3]*exp(-1/2*(x-v[1])^2/v[2]^2)
  results[[i]] = fun(seq(0, 308, 1))/max(fun_SP(seq(0, 308, 1)))/2
}
The code works on most datasets when tested on each one individually. However, the loop stops with the "Error in nls(...): singular gradient" message. I want to skip the failing dataset and continue with the next one.
I know that tryCatch may be used, but the line containing the nls call is complex and I have not found a way to use tryCatch correctly on it. Any advice is welcome :-)
Use the function try: it lets you capture the error and then add a condition on whether an error occurred. Something like this:
error <- try(your code...)
if(class(error) != "try-error"){ process the result } else { pass to the next one }
In your case, it might look like this:
results = list()
for(i in 1:length(T_files)){
  R = Table[i][,1]
  tab = data.frame(x = seq_along(R), r = R)
  error = try(res <- nls(R ~ k*exp(-1/2*(x-mu)^2/sigma^2), start = c(mu=15, sigma=5, k=1), data = tab))
  if(class(error) != "try-error"){
    v = summary(res)$parameters[,"Estimate"]
    fun = function(x) v[3]*exp(-1/2*(x-v[1])^2/v[2]^2)
    results[[i]] = fun(seq(0, 308, 1))/max(fun_SP(seq(0, 308, 1)))/2
  } else {
    next  # nls failed for this dataset, skip to the next one
  }
}
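Since the question explicitly asks about tryCatch, here is a minimal equivalent sketch: replace the try() call and the if/else above with the snippet below and let the rest of the loop body follow unconditionally.
res = tryCatch(
  nls(R ~ k*exp(-1/2*(x-mu)^2/sigma^2), start = c(mu=15, sigma=5, k=1), data = tab),
  error = function(e) NULL  # return NULL instead of stopping on "singular gradient"
)
if(is.null(res)) next  # skip this dataset and continue with the next one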
I am trying something pretty simple: I want to run a bunch of regressions in parallel. When I use the following data generator (PART 1), the parallel part does not work and gives the error listed below.
#PART 1
library(MASS)  # for mvrnorm
p <- 20; rho <- 0.7
cdc <- diag(p)
for( i in 1:(p-1) ){
  for( j in (i+1):p ){
    cdc[i,j] <- cdc[j,i] <- rho^abs(i-j)
  }
}
my.data <- mvrnorm(n = 100, mu = rep(0, p), Sigma = cdc)
The parallel part does work, however, if I generate the data as in PART 2:
# PART 2
my.data<-matrix(rnorm(1000,0,1),nrow=100,ncol=10)
I configured the function that I want to run in parallel as:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-coef(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
So I run parLapply in the following way:
{
library(parallel)    # makeCluster, clusterExport, clusterEvalQ, splitIndices, parLapply
library(doParallel)  # registerDoParallel
no_cores <- detectCores(logical = TRUE)
myclusternumber <- (no_cores - 1)
cl <- makeCluster(myclusternumber)
registerDoParallel(cl)
p1 <- ncol(my.data)
obj<-splitIndices(p1, myclusternumber)
clusterExport(cl,list('parallel_fun','my.data','obj'),envir=environment())
clusterEvalQ(cl, {
library(MASS)
library(Matrix)
library(BAS)
})
newresult<-parallel::parLapply(cl,obj,fun = parallel_fun,my.data)
stopCluster(cl)
}
But whenever I generate the data with PART 1, I get the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: object 'my_df' not found
But this should not happen: the data frame my_df is created inside the function, so I have no idea why this is happening. Any help is appreciated.
Posting this as one possible workaround, see if it works:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my_df <<- my_df
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-BAS:::coef.bas(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
The issue seems to be with the BAS:::coef.bas function, which calls eval in order to get my_df and fails to do so when called in parallel. The "hack" here is to force my_df out to the parent environment by calling my_df <<- my_df.
There should be a better way to do this, but <<- might be the fastest one. In general, <<- may cause unwanted behaviour, especially when used in loops. Assigning a unique variable name before exporting (and not forgetting to remove it after use) is one way to tackle these problems, as sketched below.
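For illustration, an untested sketch of that alternative (the name my_df_parallel_tmp is just a placeholder I made up) could look like this inside the loop, replacing the my_df <<- my_df line:
# put the data under a hopefully unique name instead of overwriting my_df in the global environment
assign("my_df_parallel_tmp", my_df, envir = globalenv())
my.model <- bas.lm(y ~ ., data = my_df_parallel_tmp, alpha = 3,
                   prior = "ZS-null", force.heredity = FALSE, pivot = TRUE)
cf <- BAS:::coef.bas(my.model, estimator = "MPM")
rm("my_df_parallel_tmp", envir = globalenv())  # don't forget to remove it after use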
I have a question about implementing tryCatch in R. I have a for loop which runs a multiverse analysis (read: many variations of the same test). However, before running the test, I shuffle the independent variables. Sometimes that results in an unlucky combination of independent variables which makes it impossible to run the analysis, and the analysis throws an error. I would like the loop to just reshuffle and try again whenever that happens. From previous Stack Overflow posts I saw that tryCatch should do what I want, but I can't find any information on how to implement it properly. Does anyone have a link or know how to do that?
Please find below my code:
#Note: This won't run on your machine, because it uses self-written functions which are too long to post here. It would be sufficient if you could tell me where to put the tryCatch calls or point me to a link which explains how to use it to avoid loop termination.
#set up number of iterations for permutations
permutation <- 1:500
#setup count of iterations
count <- 0
set.seed(117)
#set up empty dataframe
df_permutation <- data.frame()
#set up permutation loop
for (i in permutation){
  #shuffling of the independent variables
  simulate$shuffledemotion <- permute(simulate$Emotion)
  simulate$shuffledgender <- permute(simulate$ModelGender)
  simulate$shuffledmask <- permute(simulate$MaskStatus)
  #run the multiverse, make sure it has the same settings as the original multiverse
  df_mult_sim_shuffled <- multiverse.freq.anova(
    dataframe = simulate, valuevariable = "latency", idvariable = "pp_num",
    within1 = "shuffledemotion", within2 = "shuffledmask", within3 = "shuffledgender",
    between1 = NA, TransformationTypes = c("raw"), FixedTrimmingTypes = c("nofixedtrimming"),
    DataTrimmingTypes = c("notrimming"), data.lower = 1, data.upper = 3, data.step = 0.5,
    fixed.min.lower = 0.05, fixed.min.upper = 0.3, fixed.min.step = 0.05,
    fixed.max.lower = 8, fixed.max.upper = 10, fixed.max.step = 0.1, RawData = TRUE)
  #add +1 to the count for each iteration
  count = count + 1
  #save the dataset
  df_permutation_prelim <- as.data.frame(cbind(df_mult_sim_shuffled, count))
  df_permutation <- as.data.frame(rbind(df_permutation, df_permutation_prelim))
}
You can do it just with try. I'd use a while loop so that it keeps retrying until it reaches 500 completed runs. Something like this:
count <- 0
set.seed(117)
while (count < 500) {
x <- try({
# replace this with your code
if (runif(1) > 0.99) stop()
})
if (!inherits(x, 'try-error')) count <- count+1 else message('tried but failed ', count)
}
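If you'd rather use tryCatch, as asked, an equivalent sketch of the same retry loop (the runif() line again stands in for your shuffling and multiverse code) would be
count <- 0
set.seed(117)
while (count < 500) {
  ok <- tryCatch({
    # replace this with your code
    if (runif(1) > 0.99) stop()
    TRUE  # reached only when no error was thrown
  }, error = function(e) FALSE)
  if (ok) count <- count + 1 else message('tried but failed ', count)
}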
Skipping error in for-loop provides a good response!
Sorry, I hadn't seen that earlier!
I made this code using a for-statement. (The main purpose of this code is to list different webpages, which are obtained via httr and rvest.)
r = "asdgkjkhdf"
t = "osrt"
all = c()
for(i in 1:400)
{
y = paste(r, i, sep = '')
d = paste(y, t, sep = '')
all = c(all, d)
}
all
I got results like these (the pasted numbers keep accumulating in each result):
[1]asdgkjkhdf1osrt
[2]asdgkjkhdf12osrt
[3]asdgkjkhdf123osrt
[4]asdgkjkhdf1234osrt
...
But I want results like these, regardless of how many numbers I put in the for() function:
[1]asdgkjkhdf1osrt
[2]asdgkjkhdf2osrt
...
[400]asdgkjkhdf400osrt
What should I change in order to get results like those above?
Should I use paste(substr(), substr(), sep='')?
If you really want to use a for-statement, you can use the following:
r = "asdgkjkhdf"
t = "osrt"
all = c()
for (idx in 1:400)
all = c(all, paste0(r, idx, t))
However, in R you should prefer code without for-statements since, in general, it is less readable and hurts performance. The solution without the for-statement (given by Roland in the comments) is
all <- paste0(r, 1:400, t)
Note that paste0("string") is just shorthand for paste("string", sep = '').
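As a quick illustration with the values defined above:
paste0(r, 1:3, t)
# [1] "asdgkjkhdf1osrt" "asdgkjkhdf2osrt" "asdgkjkhdf3osrt"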
I'm having an issue in R where I am running a cor.test on a data frame where there are multiple groups.
I am trying to obtain the correlation coefficient for one dependent variable and multiple independent variables contained in a data frame. The data frame has 2 grouping columns for subsetting the data. Here is an example:
DF <- data.frame(group1=rep(1:4,3),group2=rep(1:2,6),x=rnorm(12),v1=rnorm(12),v2=rnorm(12),v3=rnorm(12))
I created the following script that uses plyr to calculate the correlation coefficient for each of the groups and then loop through for each of the variables.
library(plyr)
group_cor <- function(DF,x,y)
{
return(data.frame(cor = cor.test(DF[,x], DF[,y])$estimate))
}
resultDF <- ddply(DF, .(group1,group2), group_cor,3,4)
for(i in 5:6){
resultDF2 <- ddply(DF, .(group1,group2), group_cor,3,i)
resultDF <- merge(resultDF,resultDF2,by=c("group1","group2"))
rm(resultDF2)
}
This works fine. The problem I'm running into is when there aren't enough values in a group to calculate the correlation coefficient. For example: when I change the data frame created above to now include a few key NA values and then try to run the same loop:
DF[c(2,6,10),5]=NA
for(i in 5:6){
resultDF2 <- ddply(DF, .(group1,group2), group_cor,3,i)
resultDF <- merge(resultDF,resultDF2,by=c("group1","group2"))
rm(resultDF2)
}
I get the following error "Error: not enough finite observations"
I understand why I get this error and am not expecting to get a correlation coefficient for these cases. But what I would like to do is pass out a null value and move on to the next group instead of stopping my code at an error.
I've tried using a wrapper with try() but can't seem to pass that variable into my result data frame.
Any help on how to get around this would be much appreciated.
I invariably forget how to use try if I haven't used it in, oh, a day or something. This link helped me remember the basics.
For your function, you could add it in like this:
group_cor = function(DF,x,y) {
check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
if(class(check) != "try-error")
return(data.frame(cor = cor.test(DF[,x], DF[,y])$estimate))
}
However, this won't return anything for the group with the error. That's actually OK if you use the all argument when you merge. Here's another way to merge: save everything into a list with lapply and then merge with Reduce.
allcor = lapply(4:6, function(i) ddply(DF, .(group1,group2), group_cor, 3, i))
Reduce(function(...) merge(..., by = c("group1", "group2"), all = TRUE), allcor)
If you want to fill in with NA inside the function rather than waiting to fill in using merge, you could change your function to:
group_cor2 = function(DF,x,y) {
check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
if(class(check) == "try-error")
return(data.frame(cor = NA))
return(data.frame(cor = cor.test(DF[,x], DF[,y])$estimate))
}
Finally (and outside the scope of the question), depending on what you are doing with your output, you might consider naming your columns uniquely, based on which columns the cor.test was run on, so that merge doesn't name them all with suffixes. There is likely a better way to do this, maybe with merge and the suffixes argument.
group_cor3 = function(DF,x,y) {
check = try(cor.test(DF[,x], DF[,y])$estimate, silent = TRUE)
if(class(check) != "try-error") {
dat = data.frame(cor = cor.test(DF[,x], DF[,y])$estimate)
names(dat) = paste("cor", x, "vs", y, sep = ".")
dat
}
}
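An untested sketch of how group_cor3 could be combined with the same lapply/Reduce pattern as above (allcor3 is just an arbitrary name); because each column name already encodes the variable pair, merge no longer needs to add suffixes:
allcor3 = lapply(4:6, function(i) ddply(DF, .(group1,group2), group_cor3, 3, i))
Reduce(function(...) merge(..., by = c("group1", "group2"), all = TRUE), allcor3)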
I am trying to analyze how varying starting conditions and variable values in a set of differential equations (which describe the progression of a disease through a population) influence the dynamics of the system (as seen in a graph). I have written the code and it works perfectly well, but it forces me to change a value and then re-run the whole code.
I am therefore trying to put this code inside of manipulate() so I can manipulate the variables and immediately see the effect on the produced graph:
library(deSolve)
library(manipulate)
xyz <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
dt <- 1
dv <- v0*v1*cos(2*pi*t/33)
dX <- (v*N-B*X*Y-(mu+N/K)*X)
dY <- (B*X*Y-(mu+m+g+N/K)*Y)
dZ <- (g*Y-(mu+N/K)*Z)
dN <- (v-mu-N/K)*N-m*Y
return(list(c(dt,dv,dX, dY, dZ, dN)))
})
}
times <- seq(0,365*3,by = 1)
init <- c(t=0,v=0.02,X=995,Y=5,Z=0,N=1000)
parameters <- c(B=0.14,mu=.01,m=.075,g=.025,K=10000,v0=.02,v1=.5)
manipulate(
out <- as.data.frame(ode(y = init, times = seq(0,365*3,by=1), func = xyz, parms = parameters)),
matplot(times,out[4:6],type="l",xlab="Time",ylab="Susceptibles and Recovereds",main="SIR Model",lwd=1,lty=1,bty="l",col=2:4),
B=slider(0,1,initial=0.14,step=0.01)
)
I keep getting error messages regardless of whether I put all or part of the code inside manipulate(), define variables outside or inside of it, or anything else. Any help would be greatly appreciated!
When I first ran your code, the error I encountered was:
Error in manipulate(out <- as.data.frame(ode(y = init, times = seq(0, :
all controls passed to manipulate must be named
This error occurred because the second argument to manipulate was the matplot() command rather than a named control argument (such as a slider). So I placed the first two lines within curly braces to make them a single expression:
manipulate( {
out <- as.data.frame(ode(y = init, times = seq(0,365*3,by=1), func = xyz, parms = parameters))
matplot(times,out[4:6],type="l",xlab="Time",ylab="Susceptibles and Recovereds",main="SIR Model",lwd=1,lty=1,bty="l",col=2:4)
}, B=slider(0,1,initial=0.14,step=0.01)
)
This eliminates the error, but moving the slider doesn't do anything to the plot. Why? Because the slider named B doesn't refer to anything within the expression passed to manipulate(). I solved that by moving the parameters <- ... line into the manipulate expression and then changing that line so that there was a variable B (not just a name in the list); in other words, we need B=B instead of B=0.14. Now the plot changes when you move the slider, which I believe is what you wanted:
manipulate( {
parameters <- c(B=B,mu=.01,m=.075,g=.025,K=10000,v0=.02,v1=.5)
out <- as.data.frame(ode(y = init, times = seq(0,365*3,by=1), func = xyz, parms = parameters))
matplot(times,out[4:6],type="l",xlab="Time",ylab="Susceptibles and Recovereds",main="SIR Model",lwd=1,lty=1,bty="l",col=2:4)
}, B=slider(0,1,initial=0.14,step=0.01)
)
The same approach can also be written by wrapping the ODE call and the plot in a function and passing B to it:
times <- seq(0,365*3,by = 1)
init <- c(t=0,v=0.02,X=995,Y=5,Z=0,N=1000)
plot.ode <- function(B.param) {
parameters <- c(B=B.param,mu=.01,m=.075,g=.025,K=10000,v0=.02,v1=.5)
out <- as.data.frame(ode(y = init, times = seq(0,365*3,by=1), func = xyz, parms = parameters))
matplot(times,out[4:6],type="l",xlab="Time",ylab="Susceptibles and Recovereds",main="SIR Model",lwd=1,lty=1,bty="l",col=2:4, ylim=c(0,200))
}
manipulate(plot.ode(B), B=slider(0,1,initial=0.14,step=0.01))
Seems a little odd that only the red curve is influenced by changing B.