My analysis will randomly sample values from pre-specified distributions for each parameter. I am using the rdecision package for this.
For example, in a simplified case with just two parameters:
v1 <- BetaModVar$new("Beta1", alpha = a1, beta = b1, units="")
v2 <- BetaModVar$new("Beta2", alpha = a2, beta = b2, units="")
I want to create
v3 <- v1 + v2
However, this is not possible, because v1 and v2 are model variable objects rather than numbers. How can I create v3? Essentially, it would combine values drawn at random from the respective distributions of v1 and v2.
Assuming your question is about using the rdecision package, you can add the v1 and v2 model variables by creating v3 as an expression model variable.
library(rdecision)
library(rlang)
a1 <- 2
b1 <- 2
a2 <- 20
b2 <- 20
# with model variables
v1 <- BetaModVar$new(description = "v1", units = "", alpha = a1, beta = b1)
v2 <- BetaModVar$new(description = "v2", units = "", alpha = a2, beta = b2)
v3 <- ExprModVar$new(description = "v3", units = "", quo = quo(v1 + v2))
print(v3$mu_hat())
print(v3$sigma_hat())
# with base R
V1 <- rbeta(1000, shape1 = a1, shape2 = b1)
V2 <- rbeta(1000, shape1 = a2, shape2 = b2)
V3 <- V1 + V2
print(mean(V3))
print(sd(V3))
Expression model variables can be used in decision analytic models in the same way as regular model variables whose uncertainty follows a particular distribution. However, some of their properties (such as the standard deviation) may not be directly computable, so expression model variables provide the mu_hat() and sigma_hat() methods, which estimate the mean and standard deviation from samples, if you want to summarise their distributions as in the example above.
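As a sanity check on the simulated summaries, the mean and variance of a sum of two independent Beta variables can also be computed analytically: a Beta(alpha, beta) variable has mean alpha/(alpha+beta) and variance alpha*beta/((alpha+beta)^2 (alpha+beta+1)), and for independent variables the means and variances add. The sketch below uses base R only and assumes v1 and v2 are sampled independently, as in the base R example above; the helper beta_moments is hypothetical.

```r
# Analytic moments of a Beta(alpha, beta) distribution (hypothetical helper)
beta_moments <- function(alpha, beta) {
  s <- alpha + beta
  c(mean = alpha / s, var = alpha * beta / (s^2 * (s + 1)))
}

a1 <- 2;  b1 <- 2
a2 <- 20; b2 <- 20

m1 <- beta_moments(a1, b1)
m2 <- beta_moments(a2, b2)

# Assuming v1 and v2 are independent, means and variances simply add
mu_sum <- m1[["mean"]] + m2[["mean"]]
sd_sum <- sqrt(m1[["var"]] + m2[["var"]])

# Monte Carlo estimate of the same quantities, for comparison
set.seed(1)
V3 <- rbeta(1e5, shape1 = a1, shape2 = b1) + rbeta(1e5, shape1 = a2, shape2 = b2)
print(c(analytic_mean = mu_sum, mc_mean = mean(V3)))
print(c(analytic_sd = sd_sum, mc_sd = sd(V3)))
```

With these parameters the analytic mean is exactly 1 and the standard deviation is sqrt(0.05 + 1/164), about 0.237, so the mu_hat() and sigma_hat() estimates should land close to those values.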
I am trying to perform an nls fit on multiple experiments (say exp1 and exp2).
When running the code on R 3.6 (or previous versions) it works nicely (for instance here: https://www.tutorialspoint.com/execute_r_online.php on 3.4.1).
On R 4.0.2, I get an error:
Error in nls(donnees$ally ~ donnees$allx * a[donnees$allexp] + b[donnees$allexp], : Missing value or an infinity produced when evaluating the model
valx1<-c(1,2,3,4,5,6)
valx2<-c(2,7,8,9)
allx<-c(valx1,valx2)
valy1<-c(2,3,8,9,10,12)
valy2<-c(3,10,11,12)
ally<-c(valy1,valy2)
exp1<-rep('exp1',length(valx1))
exp2<-rep('exp2',length(valx2))
allexp<-c(exp1,exp2)
df<-data.frame(allx,ally,allexp)
fitexp<-nls(df$ally ~ df$allx*a[df$allexp]+b[df$allexp],start=list(a=c(1,1),b=c(1,1)))
summary(fitexp)
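In R 4.0.0 and later, allexp is character, whereas in earlier versions of R it would have been a factor, because the default of stringsAsFactors in data.frame() changed from TRUE to FALSE. Indexing the unnamed start vectors a and b with character strings gives NA, hence the error. Convert allexp to a factor or a numeric variable; we convert it to a factor below. Also, the data frame is normally passed as a separate argument rather than referenced inside the formula.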
df2 <- transform(df, allexp = factor(allexp))
nls(ally ~ allx * a[allexp] + b[allexp], df2,
start = list(a = c(1, 1),b = c(1, 1)))
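The difference between the two R versions can be reproduced without nls. In this minimal base R sketch (the values in a are arbitrary), indexing an unnamed vector with character strings returns NA, while indexing with a factor uses the factor's integer codes:

```r
a <- c(1.5, 2.5)                     # unnamed coefficient vector, like a in the start list
g_chr <- c("exp1", "exp1", "exp2")   # character grouping (data.frame default in R >= 4.0)
g_fac <- factor(g_chr)               # factor grouping (data.frame default before R 4.0)

a[g_chr]  # NA NA NA: no element of a is named "exp1" or "exp2"
a[g_fac]  # 1.5 1.5 2.5: the factor is used via its integer codes 1, 1, 2
```

This is why every model prediction becomes NA with a character allexp, and nls reports a missing value when evaluating the model.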
Also note that nls is not needed: the model is linear in its parameters, so lm could have been used with an appropriate formula:
df3 <- within(df, {
b1 = +(allexp == "exp1")
b2 = +(allexp == "exp2")
a1 = allx * b1
a2 = allx * b2
})
lm(ally ~ a1 + a2 + b1 + b2 + 0, df3)
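The same model (a separate intercept and slope per experiment) can also be written with a factor and an interaction term instead of hand-built indicator columns. A minimal sketch, reusing the question's data with allexp converted to a factor; since the model is linear, the point estimates should agree with the nls fit and with fitting each experiment on its own:

```r
valx1 <- c(1, 2, 3, 4, 5, 6); valx2 <- c(2, 7, 8, 9)
valy1 <- c(2, 3, 8, 9, 10, 12); valy2 <- c(3, 10, 11, 12)
df <- data.frame(
  allx = c(valx1, valx2),
  ally = c(valy1, valy2),
  allexp = factor(rep(c("exp1", "exp2"), c(length(valx1), length(valx2))))
)

# 0 + allexp: one intercept per experiment; allexp:allx: one slope per experiment
fit <- lm(ally ~ 0 + allexp + allexp:allx, data = df)
print(coef(fit))

# Equivalent separate fits, for comparison
fit1 <- lm(ally ~ allx, data = df, subset = allexp == "exp1")
fit2 <- lm(ally ~ allx, data = df, subset = allexp == "exp2")
```

The coefficients come out in the order intercept exp1, intercept exp2, slope exp1, slope exp2, i.e. b[1], b[2], a[1], a[2] in the nls parameterization.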
I am looking for an approach to compare linear, non-linear, and differently parameterized non-linear models. For this:
#Packages
library(nls2)
library(minpack.lm)
# Data set - Diameter in function of Feature and Age
Feature<-sort(rep(c("A","B"),22))
Age<-c(60,72,88,96,27,
36,48,60,72,88,96,27,36,48,60,72,
88,96,27,36,48,60,27,27,36,48,60,
72,88,96,27,36,48,60,72,88,96,27,
36,48,60,72,88,96)
Diameter<-c(13.9,16.2,
19.1,19.3,4.7,6.7,9.6,11.2,13.1,15.3,
15.4,5.4,7,9.9,11.7,13.4,16.1,16.2,
5.9,8.3,12.3,14.5,2.3,5.2,6.2,8.6,9.3,
11.3,15.1,15.5,5,7,7.9,8.4,10.5,14,14,
4.1,4.9,6,6.7,7.7,8,8.2)
d<-dados <- data.frame(Feature,Age,Diameter)
str(d)
I will create three different models: two non-linear models with specific parameterizations and one linear model. In my example, I assume that all the coefficients of each model are significant (without considering the actual results).
# Model 1 non-linear
e1<- Diameter ~ a1 * Age^a2
# Algorithm: Levenberg-Marquardt
m1 <- nlsLM(e1, data = d,
start = list(a1 = 0.1, a2 = 10),
control = nls.control(maxiter = 1000))
# Model 2 linear
m2<-lm(Diameter ~ Age, data=d)
# Model 3 another non-linear
e2<- Diameter ~ a1^(-Age/a2)
m3 <- nls2(e2, data = d, alg = "brute-force",
start = data.frame(a1 = c(-1, 1), a2 = c(-1, 1)),
control = nls.control(maxiter = 1000))
Now my idea is to find the "better" model despite the different nature of each model, so I tried a proportional measure: I compare the residual sum of squares of each model with the total sum of squares of the data set. Doing this for models 1 and 2, I get:
## Residual sum of squares approach (like a pseudo R2)
#Model 1
SQEm1<-summary(m1)$sigma^2*summary(m1)$df[2]# residual sum of squares of model
SQTm1<-var(d$Diameter)*(length(d$Diameter)-1)#total sum of squares in data set
R1<-1-SQEm1/SQTm1
R1
#Model 2
SQEm2<-summary(m2)$sigma^2*summary(m2)$df[2]# residual sum of squares of model
R2<-1-SQEm2/SQTm1
R2
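One way to sanity-check this measure: sigma^2 times the residual degrees of freedom is the residual sum of squares, which deviance() also returns, and for a linear model with an intercept 1 - RSS/TSS is exactly the ordinary R2 reported by summary(). A minimal base R check, using the built-in cars data set as a stand-in for the question's data:

```r
m <- lm(dist ~ speed, data = cars)

rss_sigma <- summary(m)$sigma^2 * summary(m)$df[2]  # sigma^2 * residual df
rss_dev   <- deviance(m)                             # residual sum of squares
tss       <- var(cars$dist) * (nrow(cars) - 1)       # total sum of squares

pseudo_r2 <- 1 - rss_dev / tss
print(c(rss_sigma = rss_sigma, rss_dev = rss_dev))
print(c(pseudo_r2 = pseudo_r2, r_squared = summary(m)$r.squared))
```

For a non-linear fit (nls or nlsLM) the same 1 - RSS/TSS quantity can be computed from deviance(), but it loses the variance-decomposition interpretation of R2 and can even be negative, so it is best read as a descriptive summary rather than a formal comparison.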
In my humble opinion, model 1 is "better" than model 2. My question is: does this approach sound correct? Is there another way to compare these types of models?
Thanks in advance!
#First cross-validation approach ------------------------------------------
#Cross-validation model 1
set.seed(123) # for reproducibility
n <- nrow(d)
frac <- 0.8
ix <- sample(n, frac * n) # indexes of in sample rows
e1<- Diameter ~ a1 * Age^a2
# Algorithm: Levenberg-Marquardt
m1 <- nlsLM(e1, data = d,
start = list(a1 = 0.1, a2 = 10),
control = nls.control(maxiter = 1000), subset = ix)# in sample model
BOD.out <- d[-ix, ] # out of sample data
pred <- predict(m1, newdata = BOD.out)
act <- BOD.out$Diameter
RSS1 <- sum( (pred - act)^2 )
RSS1
#[1] 56435894734
#Cross-validation model 2
m2<-lm(Diameter ~ Age, data=d, subset = ix)# in sample model
BOD.out2 <- d[-ix, ] # out of sample data
pred <- predict(m2, newdata = BOD.out2)
act <- BOD.out2$Diameter
RSS2 <- sum( (pred - act)^2 )
RSS2
#[1] 19.11031
# Sum of squares approach -----------------------------------------------
deviance(m1)
#[1] 238314429037
deviance(m2)
#[1] 257.8223
Based on gfgm's and G. Grothendieck's comments: RSS2 is lower than RSS1, and comparing deviance(m1) with deviance(m2) gives the same ranking, so model 2 is better than model 1.
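A single 80/20 split can change noticeably with the random seed. A sketch of k-fold cross-validation, which predicts every observation exactly once out of sample, is below. cv_rss is a hypothetical helper, the built-in cars data stand in for the question's data, and the power-model starting values taken from a log-log regression are an assumption that usually gives nls a reasonable start.

```r
# Hypothetical helper: total out-of-sample RSS from k-fold cross-validation
cv_rss <- function(data, fit_fun, response, k = 5, seed = 123) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  rss <- 0
  for (f in seq_len(k)) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    fit   <- fit_fun(train)
    pred  <- predict(fit, newdata = test)
    rss   <- rss + sum((test[[response]] - pred)^2)
  }
  rss
}

# Linear model
fit_lin <- function(dat) lm(dist ~ speed, data = dat)

# Power model a * speed^b, started from a log-log linear fit
fit_pow <- function(dat) {
  st <- coef(lm(log(dist) ~ log(speed), data = dat))
  nls(dist ~ a * speed^b, data = dat,
      start = list(a = exp(st[[1]]), b = st[[2]]))
}

rss_lin <- cv_rss(cars, fit_lin, "dist")
rss_pow <- cv_rss(cars, fit_pow, "dist")
print(c(linear = rss_lin, power = rss_pow))
```

Using the same seed for both models means they are evaluated on identical folds, so the comparison is paired; any model with a predict() method that accepts newdata (lm, nls, nlsLM) can be plugged in.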
I am trying to run a Monte Carlo simulation of a difference in differences estimator, but I am running into an error. Here is the code I am running:
# Set the random seed
set.seed(1234567)
library(MonteCarlo)
#Set up problem, doing this before calling the function
# set sample size
n<- 400
# set true parameters: betas and sd of u
b0 <- 1 # intercept for control data (b0 in diffndiff)
b1 <- 1 # shift on both control and treated after treatment (b1 in
#diffndiff)
b2 <- 2 # difference between intercept on control vs. treated (b2-this is
#the level difference pre-treatment to compare to coef on treat)
b3 <- 3 # shift after treatment that is only for treated group (b3-this is
#the coefficient of interest in diffndiff)
b4 <- 0 # parallel time trend (not measured in diffndiff) biases b0,b1 but
#not b3 that we care about
b5 <- 0 # allows for treated group trend to shift after treatment (0 if
#parallel trends holds)
su <- 4 # std. dev for errors
dnd <- function(n,b0,b1,b2,b3,b4,b5,su){
#initialize a time vector (set observations equal to n)
timelength = 10
t <- c(1:timelength)
num_obs_per_period = n/timelength #allows for multiple observations in one
#time period (can simulate multiple states within one group or something)
t0 <- c(1:timelength)
for (p in 1:(num_obs_per_period-1)){
t <- c(t,t0)
}
T<- 5 #set treatment period
g <- t >T
post <- as.numeric(g)
# assign equal amounts of observations to each state to start with (would
#like to allow selection into treatment at some point)
treat <- vector()
for (m in 1:(round(n/2))){
treat <- c(treat,0)
}
for (m in 1:(round(n/2))){
treat <- c(treat,1)
}
u <- rnorm(n,0,su) #This assumes the mean error is zero
#create my y vector now from the data
y<- b0 + b1*post + b2*treat + b3*treat*post + b4*t + b5*(t-T)*treat*post +u
interaction <- treat*post
#run regression
olsres <- lm(y ~ post + treat + interaction)
olsres$coefficients
# assign the coefficients
bhat0<- olsres$coefficients[1]
bhat1 <- olsres$coefficients[2]
bhat2<- olsres$coefficients[3]
bhat3<- olsres$coefficients[4]
bhat3_stderr <- coef(summary(olsres))["interaction", "Std. Error"] # std. error of bhat3 (row 3 would be treat)
#Here I will use bhat3 to conduct a t-test and determine if this was a pass
#or a fail
tval <- (bhat3-b3)/ bhat3_stderr
#decision at the 5% significance level (FALSE means |t| <= 1.96,
#so we fail to reject the null)
decision <- abs(tval) > 1.96
decision <- unname(decision)
return(list(decision))
}
#Define a parameter grid to simulate over
from <- -5
to <- 5
increment <- .25
gridparts<- c(from , to , increment)
b5_grid <- seq(from = gridparts[1], to = gridparts[2], by = gridparts[3])
parameter <- list("n" = n, "b0" = b0, "b1" = b1, "b2" = b2, "b3" = b3,
                  "b4" = b4, "b5" = b5_grid, "su" = su)
#Now simulate this multiple times in a monte carlo setting
results <- MonteCarlo(func = dnd ,nrep = 100, param_list = parameter)
And the error that comes up is:
in results[[i]] <- array(NA, dim = c(dim_vec, nrep)) :
attempt to select less than one element in integerOneIndex
This leads me to believe that something is attempting to access the "0th" element of a vector, which doesn't exist in R as far as I understand. However, I can't tell whether this comes from my code or from inside the package, and I can't make sense of the package code that runs when I call it.
I am also open to hearing about other methods that will essentially replace simulate() from Stata.
The function passed to MonteCarlo must return a list with named components. Changing the last line of dnd from return(list(decision)) to
return(list("decision" = decision))
should work.
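Since you mention being open to alternatives: the same kind of simulation can be run in base R with replicate() and sapply(), without a package. The sketch below is a simplified rewrite of dnd (the name dnd_once is hypothetical) that keeps the question's data-generating process. It takes both the estimate and its standard error from the interaction row of the coefficient table, and it names the treatment period T0 to avoid masking R's T (TRUE) shorthand.

```r
set.seed(1234567)

# One replication: returns TRUE if H0 (interaction coefficient == true b3) is rejected
dnd_once <- function(n = 400, b0 = 1, b1 = 1, b2 = 2, b3 = 3, b4 = 0, b5 = 0,
                     su = 4, timelength = 10, T0 = 5) {
  t <- rep(seq_len(timelength), times = n / timelength)  # time index, repeated
  post <- as.numeric(t > T0)                              # after treatment period
  treat <- rep(c(0, 1), each = n / 2)                     # half control, half treated
  u <- rnorm(n, 0, su)
  y <- b0 + b1 * post + b2 * treat + b3 * treat * post + b4 * t +
    b5 * (t - T0) * treat * post + u
  treat_post <- treat * post
  est <- coef(summary(lm(y ~ post + treat + treat_post)))["treat_post", ]
  tval <- (est[["Estimate"]] - b3) / est[["Std. Error"]]
  abs(tval) > 1.96                                        # 5% two-sided test
}

# Rejection rate over a grid of b5 values, 100 replications each
b5_grid <- seq(-5, 5, by = 0.25)
rej_rate <- sapply(b5_grid, function(b5) mean(replicate(100, dnd_once(b5 = b5))))
print(data.frame(b5 = b5_grid, rejection_rate = rej_rate))
```

When parallel trends hold (b5 = 0) the rejection rate should be near the nominal 5%; as |b5| grows, the interaction estimate absorbs the treated-group trend and the rate approaches 1.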
I have a logistic regression using glm and I would like to add a term of the form
$$c_1 \frac{k + a c_2}{t + c_2}$$
where k and t are columns in a data frame and a is a constant. I would like R to find the best-fit values of c1 and c2. Is this possible?
If I only wanted a fixed value, say c2 = 2,
$$c_1 \frac{k + 2a}{t + 2}$$
I could just write
glm( model$y ~ I((model$k + 2*a)/(model$t + 2)) + model$otherterms,
family = binomial(logit) )
which is similar to what I am doing now. But I don't think that 2 is optimal and iterating 'manually' is very time-consuming.
You can use the function gnm from the gnm package.
gnm(y ~ Mult(1,                  # c1
             offset(k) + 1,      # c3 = a*c2 (estimated freely here)
             Inv(offset(t) + 1)) # c2
    + other terms,
    family = binomial,
    data = models)
EDIT (solution for constrained coefficients)
# term: C1 * (k + C2 * a) / (t + C2); C2 appears twice, so only two predictors are needed
term_fun <- function(predLabels, varLabels){
  paste0(predLabels[1], "*(", varLabels[1],
         "+", predLabels[2], "*3)/(", # a = 3 for example
         varLabels[2], "+", predLabels[2], ")")}
Ratio <- function(t,x){
list(predictors = list(C1 = 1, C2 = 1),
variables = list(substitute(t), substitute(x)),
term = term_fun)
}
class(Ratio) <- "nonlin"
fit <- gnm(Y~Ratio(k,t), data=models, family=binomial)
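If gnm is not available, the two coefficients can also be estimated in base R by profiling: for a fixed c2 the model is an ordinary logistic regression in which c1 is the slope on the constructed term, so the glm deviance can be minimised over c2 with optimize(). The sketch below uses synthetic data generated only for illustration; the true values (c1 = 0.5, c2 = 2, a = 3, intercept -1), the search interval for c2, and the helper names are all assumptions.

```r
set.seed(42)
n <- 5000
a <- 3
dat <- data.frame(k = runif(n, 0, 10), t = runif(n, 0, 10))

# Synthetic response from the model: logit(p) = -1 + c1 * (k + a*c2) / (t + c2)
eta <- -1 + 0.5 * (dat$k + a * 2) / (dat$t + 2)
dat$y <- rbinom(n, 1, plogis(eta))

# For a fixed c2, build the ratio term and fit an ordinary logistic regression
fit_for_c2 <- function(c2, data, a = 3) {
  data$ratio <- (data$k + a * c2) / (data$t + c2)
  glm(y ~ ratio, family = binomial(link = "logit"), data = data)
}

# Profile deviance as a function of c2; the interval keeps t + c2 > 0
prof_dev <- function(c2) deviance(fit_for_c2(c2, dat))
opt <- optimize(prof_dev, interval = c(0.1, 10))

best_c2 <- opt$minimum
best_fit <- fit_for_c2(best_c2, dat)
print(best_c2)          # estimate of c2
print(coef(best_fit))   # intercept and c1 (slope on the ratio term)
```

The standard errors that glm reports for best_fit treat c2 as known, so they understate the uncertainty; a joint fit such as the gnm approach above accounts for it.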