Cohen's d for a mixed-effects model - r

Has anyone worked with the powerlmm package?
I want to calculate statistical power for my model with the study_parameters() function. It takes multiple inputs, one of which is Cohen's d.
How do I figure out the Cohen's d for my model? Also, for which effects exactly should I specify Cohen's d: the fixed effects, the random effects, or the entire model?
p <- study_parameters(n1 = 6,
                      n2 = per_treatment(control = 27, treatment = 55),
                      sigma_subject_intercept = 4.56,
                      sigma_subject_slope = 0.22,
                      sigma_error = 5.39,
                      cor_subject = -0.19,
                      dropout = dropout_manual(0.0000000, 0.2666667, 0.3466667,
                                               0.4666667, 0.47, 0.7333333),
                      effect_size = cohend(0.5),
                      standardizer = "posttest_SD"))
get_power(p)
Some details about the study:
2 groups: Treatment and Control
6 Timepoints
n(control)=27, n(treatment)=55
Thanks in advance for your answers.

You get Cohen's d from the output of the function. I see you made a small error in your code: delete the ")" after cohend(0.5, so that standardizer = "posttest_SD" becomes an argument of cohend() rather than of study_parameters(), and it works.
By definition, you are interested in the effect size of the fixed effect.
Hope it works out for you.
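For reference, this is the corrected call from the question, with standardizer moved inside cohend() by removing the extra parenthesis (everything else unchanged):

```r
library(powerlmm)

p <- study_parameters(n1 = 6,
                      n2 = per_treatment(control = 27, treatment = 55),
                      sigma_subject_intercept = 4.56,
                      sigma_subject_slope = 0.22,
                      sigma_error = 5.39,
                      cor_subject = -0.19,
                      dropout = dropout_manual(0.0000000, 0.2666667, 0.3466667,
                                               0.4666667, 0.47, 0.7333333),
                      effect_size = cohend(0.5,
                                           standardizer = "posttest_SD"))
get_power(p)
```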

Related

Calculating MDE for a difference-in-difference clustered randomized trial (in R)

I'm looking to calculate the Minimum Detectable Effect (MDE) for a potential Difference-in-Differences design where treatment and control are assigned at the cluster level and my outcome at the individual level is dichotomous. To do this I'm working in R and using the clusterPower package; specifically, I'm using the cpa.did.binary function. The help file for this function notes that d is "The expected absolute difference." I'm interpreting this as a Minimum Detectable Effect; is that correct? If this is the MDE, is the output the expected difference in logits?
Thanks to anyone who can help. Alternatively, if you have a better package or way of calculating the MDE, that is also welcome.
# Input
cpa.did.binary(power = .8,
               nclusters = 10,
               nsubjects = 100,
               p = .5,
               d = NA,
               ICC = .04,
               rho_c = 0,
               rho_s = 0)
# Output
> d
> 0.2086531

How to estimate MLEs in a functional response

So I have been trying to understand how you estimate maximum likelihood estimates in R. Here, for the Gammarus dataset of the frair package, what are by = 0.1, a = 1.2, h = 0.08, and T = 40/24, and how do you get these values? Can someone please explain it to me?
with(gammarus, plot(density, eaten,
                    xlab = "Prey Density", ylab = "No. Prey Eaten"))
x <- with(gammarus, seq(from = min(density), to = max(density),
                        by = 0.1))
lines(x, rogersII(X = x, a = 1.2, h = 0.08, T = 40/24), col = 'grey50', lty = 2)
lines(x, rogersII(X = x, a = 0.6, h = 0.16, T = 40/24), col = 'grey50', lty = 2)
I am expecting to know about functional response analysis and maximum likelihood estimation in detail.
I don't think you can expect Stack Overflow to teach you "about functional response analysis and maximum likelihood estimation in detail" - that's too broad a topic for a SO question, which is intended to solve a particular programming problem.
library(frair)
data(gammarus)
## adding a bit of noise makes it easier to identify overlapping/repeated points
plot(jitter(eaten) ~ jitter(density), gammarus)
## grid of densities at which to draw the curves
x <- with(gammarus, seq(from = min(density), to = max(density), by = 0.1))
lines(x, rogersII(X = x, a = 1.2, h = 0.08, T = 40/24), col = 'grey50', lty = 2)
lines(x, rogersII(X = x, a = 0.6, h = 0.16, T = 40/24), col = 'grey50', lty = 2)
The parameters used here are examples only, probably derived by visually examining the data (and knowing that the attack rate a corresponds to the initial slope of the functional response curve and 1/h corresponds to the asymptote at high densities: T is used for the exposure time of the experiment).
From ?frair::gammarus:
Total experimental time was 40 hours.
Whoever wrote the example wanted the time units to be in days rather than hours (a has units of 1/time and h has units of time), so they used T = 40/24 as the duration of the experiment. (The experimental duration T must be specified for Rogers-type functional responses, which allow for depletion, but not for simple Holling-type responses.)
To estimate the parameters, you need to use frair_fit; you must provide reasonable starting values, which is one reason for doing the preliminary graph.
ff <- frair_fit(eaten ~ density, gammarus, response = 'rogersII',
                start = list(a = 1.2, h = 0.08), fixed = list(T = 40/24))
## add line to existing plot
lines(ff, col = 2)
One place you could look for more information on functional responses and MLEs (besides in the published literature) would be here, e.g. chap 3/p. 12, all of chaps 6/7 ...

Fitting a physical model to a specific data using nls: over-parameterization or unidentifiable parameters?

I have a somewhat complex physical model with five unknown parameters to fit, but no success so far.
I used nls2 first to get some estimates for the start values, but then nls, nlxb, and nlsLM all threw the famous "singular gradient matrix at initial parameter estimates" error.
I took the start values for nls2 from the literature, so I think I have good starting values, at least for nls2. The parameter estimates from nls2 also make good physical sense; however, this doesn't resolve the singular gradient matrix error.
Since it's a physical model, every coefficient has a physical meaning, and I prefer not to fix any of them.
I should also mention that all five unknown parameters in the model equation are positive and the shape parameter m can go up to 2.
Reading through many posts and trying the suggested solutions, I have come to the conclusion that I have either an over-parameterization problem or unidentifiable parameters.
My question is: should I stop trying to use nls with this specific model (with this many unknown parameters), or is there a way out?
I am quite new to the topic, so any help, mathematical or code-wise, is greatly appreciated.
Here is my MWE:
library(nls2)
# Data
x <- c(0, 1000, 2000, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 5000)
y <- c(1.0, 0.99, 0.98, 0.95, 0.795, 0.59, 0.35, 0.295, 0.175, 0.14, 0.095)
# Start values for nls2
bounds <- data.frame(a = c(0.8, 1.5), b = c(1e+5, 1e+7), c = c(0.4, 1.4),
                     n = c(0.1, 2), m = c(0.1, 2))
# Model equation function
mod <- function(x, a, b, c, n, m){
  t <- b*85^n*exp(-c/0.0309)
  (1 - exp(-(a/(t*x))^m))
}
# # Model equation
# mod <- y ~ (1 - exp(-(a/(b*85^n*exp(-c/0.0309)*x))^m))
# Model fit with nls2
fit2 <- nls2(y ~ mod(x, a, b, c, n, m), data = data.frame(x, y),
             start = bounds, algorithm = "brute-force")
# Model fit with nls
fit <- nls(y ~ mod(x, a, b, c, n, m), data = data.frame(x, y), start = coef(fit2))
The more I look at this the more confused I get, but I'm going to try again.
Looking again at your expression, we have the expression inside the exponential
-(a/(b*85^n*exp(-c/0.0309)*x))^m
We can rewrite this as
-( [a/(b*85^n*exp(-c/0.0309))] * 1/x )^m
(please check my algebra!)
If this is correct, then the whole bracketed term doesn't affect the shape of the dependence on x: it all collapses to a single constant in the equation. (In other words, {a, b, c, n} are all jointly unidentifiable.) Lumping that term into a single parameter phi:
1 - exp(-(phi/x)^m)
phi is a scale parameter (it has the same units as x and should be roughly the same magnitude as a typical value of x): let's try a starting value of 2500 (the mean value of x)
m is a shape parameter; we can't go too badly wrong starting from m==1
Now nls works fine without any extra help:
n1 <- nls(y ~ 1 - exp(-(phi/x)^m), start = list(phi = 2500, m = 1),
          data = data.frame(x, y))
and gets phi=2935, m=6.49.
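As a quick numerical check of the identifiability argument (a base-R sketch with arbitrary parameter values): rescaling a and b by the same factor leaves phi, and hence the predictions, unchanged.

```r
# Identifiability check: two different (a, b) pairs with the same ratio
# give the same lumped constant phi = a/(b*85^n*exp(-c/0.0309)),
# and therefore identical predictions.
mod <- function(x, a, b, c, n, m) 1 - exp(-(a/(b*85^n*exp(-c/0.0309)*x))^m)
x  <- c(1000, 2500, 5000)
y1 <- mod(x, a = 1,  b = 1e-4, c = 0.03, n = 0.5, m = 2)
y2 <- mod(x, a = 10, b = 1e-3, c = 0.03, n = 0.5, m = 2)  # a and b scaled by 10
all.equal(y1, y2)  # TRUE
```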
Plot predictions:
plot(x, y, ylim = c(0, 1))
xvec <- seq(0, 5000, length = 101)
lines(xvec, predict(n1, newdata = data.frame(x = xvec)))
Another way to think about what this curve is doing: we can transform the equation to -log(1-y) = phi^m*(1/x)^m: that is, -log(1-y) should follow a power-law curve with respect to 1/x.
Here's what that looks like:
plot(1/x, -log(1-y))
## curve() uses "x" as the current x-axis variable, i.e.
## read "x" as "1/x" below.
with(as.list(coef(n1)), curve(phi^m*x^m, add=TRUE))
In this format, it appears to fit the central data well but fails for large values of 1/x (the x=0 point is missing here because it goes to infinity).
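The transformation itself is easy to verify numerically (a base-R sketch using the fitted values; the x grid is arbitrary):

```r
# If y = 1 - exp(-(phi/x)^m), then -log(1-y) = phi^m * (1/x)^m.
phi <- 2935
m   <- 6.49
x   <- c(2500, 3000, 4000)
y   <- 1 - exp(-(phi/x)^m)
all.equal(-log(1 - y), phi^m * (1/x)^m)  # TRUE
```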

Reproducing output from one statistical program to R

A previous employee at my organization performed all of their analyses in a statistical program other than R (with no documentation), and no one currently employed knows which program was used. Comparing the model output table to Google search results, I think they used Statistica. In an effort to be transparent with other organizations who work with us, I'm trying to replicate their work and potentially reevaluate it.
Model: They modeled the relationship between three variables which I will call A, B, C. Variables were chosen based on exploratory analyses (i.e., correlation matrices and GLM modeling). Parameter estimates are used for prediction purposes. From what I can tell, they used a GLM with a log-link function to model C as a function of A and B.
Data:
A <- c(0.937918714, 1.277501774, 34.46428571, 3.843879361, 5.135520685, 0.324675325, 1.038421599, 0.333333333, 0.058139535, 0.09009009, 0.080515298, 5.174234424, 10.625, 21.9047619, 0.162074554, 2.372881356, 1.084430674, 18.53658537, 6.438631791, 0.172413793, 0.291120815, 9.090909091, 5.882352941)
B <- c(0.416666667, 0.555555556, 0.833333333, 0.4, 0.833333333, 0.4, 0.625, 0.625, 0.294117647, 0.37037037, 0.285714286, 1.111111111, 0.588235294, 0.476190476, 0.555555556, 0.833333333, 0.666666667, 0.476190476, 0.208333333, 0.163934426, 0.163934426, 0.3125, 0.454545455)
C <- c(0.009533367, 0.020812183, 0.056208054, 0.015002587, 0.042735043, 0.013661202, 0.004377736, 0.00635324, 0.001345895, 0.001940492, 0.00446144, 0.043768997, 0.021134594, 0.004471772, 0.023488256, 0.029441118, 0.052287582, 0.003526093, 0.030984508, 0.010891089, 0.020812686, 0.016032064, 0.018145161)
My Approach:
I combined each vector into a data frame (dat) and modeled using the following:
glm(formula = C ~ A + B, family = binomial(link = logit), data = dat)
The Question:
I notice we have different parameter estimates; in fact, their analysis includes 'Scale' as a factor, with an associated parameter estimate and standard error (see below). I haven't figured out how to include a separate 'Scale' factor. My parameter estimates are close to theirs, but are obviously different given the inclusion of a new variable.
Anyone familiar with this [Statistica] output and how I could replicate it in R? Primarily, how would I incorporate the Scale factor into my analyses?
Side-note
I've also posted this to Reddit (r/rstats - Replicating an analysis performed in different software).
Much appreciated!

Use R's NeuralNetTools Library to Plot the Network Structure of an H2O Deep Neural Network

I want to be able to use R's NeuralNetTools library to plot the network layout of an h2o deep neural network. Below is sample code that plots the network layout of a model from the neuralnet package.
library(NeuralNetTools)
library(neuralnet)
data(neuraldat)
wts_in <- neuralnet(Y1 ~ X1 + X2 + X3, data = neuraldat, hidden = c(4),
                    rep = 1)
plotnet(wts_in)
I want to do the same thing but with an H2O deep learning model. The code below shows how to generate a layout knowing only the number of layers and the weight structure.
library(NeuralNetTools)
# B1-H11, I1-H11, I2-H11, B1-H12, I1-H12, I2-H12, B2-H21, H11-H21, H12-H21,
# B2-H22, H11-H22, H12-H22, B3-O1, H21-O1, H22-O1
wts_in <- c(1.12, 1.49, 0.16, -0.11, -0.19, -0.16, 0.5, 0.2, -0.12, -0.1,
            0.89, 0.9, 0.56, -0.52, 0.81)
struct <- c(2, 2, 2, 1)    # two inputs, two hidden layers (two nodes each), one output
x_names <- c("No", "Yes")  # input variable names
y_names <- c("maybe")      # output variable names
plotnet(wts_in, struct = struct)
Below is the neuralnet model from above, but fitted with H2O. I'm stumped on how to get the number of layers.
library(h2o)
h2o.init()
neuraldat.hex <- as.h2o(neuraldat)
h2o_neural_model <- h2o.deeplearning(x = 1:4, y = 5,
                                     training_frame = neuraldat.hex,
                                     hidden = c(2, 3),
                                     epochs = 10,
                                     model_id = NULL)
h2o_neural_model@model
I can use the weights function h2o.weights(object, matrix_id = 1) and the bias function h2o.biases(object, vector_id = 1) to build the structure, but I need to determine the number of layers. I know I can specify the number of layers going into the model, but I sometimes write code that determines the number of layers programmatically, so I need a way to extract the layers of the network structure and the weights for the plotnet() function below.
plotnet(wts_in, struct=struct)
As an alternative, it would be nice if I had a ggplot2 function instead of the plotnet() function.
Any help is greatly appreciated.
I know it's been 8 months and it's likely that you've already figured it out. However, I will post my solution for those who run into the same problem.
The key here is the export_weights_and_biases parameter of h2o.deeplearning(), together with the h2o.weights() and h2o.biases() functions, which give the parameters you're looking for.
All that's left is ordering the data.
# Load your data
neuraldat.hex <- as.h2o(neuraldat)
h2o_neural_model <- h2o.deeplearning(x = 1:4, y = 5,
                                     training_frame = neuraldat.hex,
                                     hidden = c(2, 3),
                                     epochs = 10,
                                     model_id = NULL,
                                     export_weights_and_biases = TRUE)  # notice this parameter!
# for each layer, starting from the left hidden layer,
# append the bias and incoming weights of each node in the
# layer to a numeric vector.
wts <- c()
for (l in 1:(length(h2o_neural_model@allparameters$hidden) + 1)) {
  wts_in <- h2o.weights(h2o_neural_model, l)
  biases <- as.vector(h2o.biases(h2o_neural_model, l))
  for (i in 1:nrow(wts_in)) {
    wts <- c(wts, biases[i], as.vector(wts_in[i, ]))
  }
}
# generate struct from the 'units' column in the model summary
struct <- h2o_neural_model@model$model_summary$units
# plot it
plotnet(wts, struct = struct)
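The interleaving that the loop performs (each node's bias followed by its incoming weights) can be checked without h2o, using the hand-entered 2-2-1 network from the question; flatten_wts here is just an illustrative helper, not part of any package:

```r
# Flatten per-layer weight matrices and bias vectors into the order
# plotnet() expects: for each node, bias first, then incoming weights.
flatten_wts <- function(weights, biases) {
  wts <- c()
  for (l in seq_along(weights)) {
    W <- weights[[l]]                       # rows = nodes in layer l
    for (i in 1:nrow(W)) {
      wts <- c(wts, biases[[l]][i], W[i, ])
    }
  }
  wts
}
weights <- list(matrix(c(1.49, 0.16, -0.19, -0.16), 2, byrow = TRUE),  # hidden layer 1
                matrix(c(0.2, -0.12, 0.89, 0.9),    2, byrow = TRUE),  # hidden layer 2
                matrix(c(-0.52, 0.81),              1, byrow = TRUE))  # output layer
biases  <- list(c(1.12, -0.11), c(0.5, -0.1), 0.56)
wts_demo <- flatten_wts(weights, biases)
identical(wts_demo, c(1.12, 1.49, 0.16, -0.11, -0.19, -0.16, 0.5, 0.2, -0.12,
                      -0.1, 0.89, 0.9, 0.56, -0.52, 0.81))  # TRUE
```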
The h2o object returned by the deeplearning function is quite complex, and one can get lost in the documentation.
