I have a variable called "duration_bout" which corresponds to a duration in seconds. Here's what the distribution looks like:
It looks like a Poisson distribution, but my durations are a continuous variable. If I transform my data to integers, Poisson fits rather well:
I think what I really have here is rather a gamma distribution, but I can't figure out how to use this family in glmmTMB! Several questions:
family=gamma() asks me to define an "x" parameter; what is it?
How come the Poisson model fits? How correct is it to transform my data to integers in order to use it?
Thanks!
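A likely cause of the "x" message, offered as an assumption since the exact error text is not shown: in R, lowercase gamma() is the mathematical gamma function, which takes an argument x, whereas the GLM family object is Gamma() with a capital G. The sketch below uses base glm() on simulated data so that it runs without extra packages; the group variable and the simulated values are hypothetical stand-ins for the real data.

```r
# Simulate positive, right-skewed durations (hypothetical data).
set.seed(1)
n <- 500
group <- factor(rep(c("a", "b"), each = n / 2))
mu <- ifelse(group == "a", 2, 5)              # true mean duration per group
duration_bout <- rgamma(n, shape = 3, rate = 3 / mu)

# gamma(x) is the gamma function; Gamma() builds a GLM family object.
gamma_of_5 <- gamma(5)                        # 24, i.e. factorial(4)
fam <- Gamma(link = "log")

# Gamma GLM on the continuous durations, no rounding to integers needed.
fit <- glm(duration_bout ~ group, family = fam)
est <- exp(coef(fit))                         # mean of group a, then ratio b / a
```

With glmmTMB the same family object should be usable, for example glmmTMB(duration_bout ~ group + (1 | id), family = Gamma(link = "log"), data = mydata), where id and mydata are placeholder names.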
I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
The normalised data values (for any distribution) are exactly equal to the residuals from a gamlss fit,
m1 <- gamlss()
which can be accessed by
residuals(m1) or
m1$residuals
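The reason this works is that gamlss residuals are normalised quantile residuals: each observation is passed through the fitted CDF and then through the standard normal quantile function. The sketch below reproduces that idea in base R for a gamma distribution, as an illustration only; it fits the parameters with optim() rather than gamlss, and the simulated y is a hypothetical stand-in for real data.

```r
set.seed(42)
y <- rgamma(500, shape = 2, rate = 0.5)       # hypothetical positive data

# Negative log-likelihood, parameterised on the log scale so that
# shape and rate stay positive during optimisation.
nll <- function(par) {
  -sum(dgamma(y, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
}
est <- exp(optim(c(0, 0), nll)$par)           # fitted shape and rate

# Normalised values: fitted CDF first, then the standard normal quantile.
u <- pgamma(y, shape = est[1], rate = est[2])
z <- qnorm(u)
```

If the fitted distribution is adequate, z should look like a standard normal sample, which can be checked with qqnorm(z).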
Is there an algorithm available in R that can decompose a gamma distribution into two (or more) gamma distributions? If so, can you give me an example? I have time series data on the movement of an animal, and when I plot it against time it looks like a gamma distribution. The animal can be in one of two states: hungry or not hungry. My immediate reaction was to use a Hidden Markov Model and see whether I can recover the two states. I tried the depmix() function from the depmixS4 library in R, but I don't really know how to use it with a gamma distribution. The following is the code that I wrote, but it says that I need an argument for gamma, which I don't understand. Can someone tell me what parameter I should use and how to determine it? Thanks!
mod <- depmix(freq ~ 1, data = mod.data, nstates = 2, family = gamma())
fit.mod <- fit(mod)
Thank you!
I'm trying to estimate an SS model from this paper that has the following form:
Setting the order of the first lag polynomial to zero and the second one to one, we can reformulate it using terms from the MARSS package guide when applicable (x is the state, y is the observed variable, d is exogenous):
The MARSS package allows for estimation of a simpler model that doesn't include lagged variables in the measurement equation. Is there a way to estimate this one using MARSS or any other package without rewriting the estimation routine for this special case? Maybe there is a way to reformulate it so it could be "fed" to MARSS or some other package?
Take a look at how, say, the BSM structural time series model or an ARMA model is formulated as a MARSS model, i.e. a multivariate state-space model. That'll give you an idea of how to recast your model in multivariate state-space form.
Basically, your x will look like
See how the x_2 is just a dummy that is forced to be x(t-1)?
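Reconstructed from the B and Q matrices in the model list further down, and therefore an assumption about what the answer intended, the stacked state equation is presumably:

```latex
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}
=
\begin{bmatrix} b & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+
\begin{bmatrix} w_t \\ 0 \end{bmatrix},
\qquad w_t \sim N(0, q)
```

The second row reads x_{2,t} = x_{1,t-1} with no noise, which is what makes x_2 a lagged copy of the state.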
Now the y equation
The d and a are your D and A. I wrote them in lowercase to indicate that they are scalars, but they can be matrices in general (if y is multivariate, say). Your inputs are d_t and y_{t-1}. You prepare them as a matrix with 2 rows and one column per time step and pass it as an input.
Be careful with your initial condition specification. It is probably best and easiest to set it at t=1 and either estimate it or use a diffuse prior.
You can fit this model with MARSS. You can fit it with any Kalman filter function that allows you to pass in inputs in the y equation (some do, some don't). KFAS::KFS() allows that using the SSMcustom() function.
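Reconstructed from the Z, D and R entries in the model list further down, and again an assumption about the intended form, the observation equation is presumably:

```latex
y_t
=
\begin{bmatrix} z & c \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}
+
\begin{bmatrix} d & a \end{bmatrix}
\begin{bmatrix} d_t \\ y_{t-1} \end{bmatrix}
+ v_t,
\qquad v_t \sim N(0, r)
```

Here the coefficient d multiplies the exogenous input d_t and the coefficient a multiplies the lagged observation y_{t-1}, so the lag enters as a known input rather than as part of the state.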
In MARSS the model list will look like so
mod.list=list(
B=matrix(list("b",1,0,0),2,2),
U=matrix(0,2,1),
Q=matrix(list("q",0,0,0),2,2),
Z=matrix(c("z", "c"),1,2),
A=matrix(0),
R=matrix("r"),
D=matrix(c("d", "a"),1,2),
x0=matrix(c("x1","x2"),2,1),
tinitx=1,
d=rbind(dt[2:TT],y[1:(TT-1)])
)
dat <- y[2:TT] # since you need y_{t-1} in the d (inputs)
fit <- MARSS(dat, model=mod.list)
It'll probably complain that it wants initial conditions for x0. Anything will work. The EM algorithm isn't sensitive to that like a BFGS or Newton algorithm. But method="BFGS" is actually often better for this type of structural ts model and in that case pick a reasonable initial condition for x (reasonable = close to your data in this case I think).
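To make the alignment of the data and the inputs concrete, here is a minimal base-R sketch that simulates a series of the assumed form (state AR(1), observation depending on the current and lagged state, an exogenous input and the lagged observation) and builds the two objects passed to MARSS. All parameter values and the names c_coef, d_coef and d_inputs are made up for illustration.

```r
set.seed(1)
TT <- 50
dt <- rnorm(TT)                        # hypothetical exogenous series d_t
b <- 0.8; z <- 1; c_coef <- 0.5        # made-up state and loading parameters
d_coef <- 0.3; a <- 0.2                # made-up input coefficients

x <- numeric(TT)
y <- numeric(TT)
x[1] <- rnorm(1)
y[1] <- z * x[1] + rnorm(1, sd = 0.1)
for (t in 2:TT) {
  x[t] <- b * x[t - 1] + rnorm(1, sd = 0.5)
  y[t] <- z * x[t] + c_coef * x[t - 1] + d_coef * dt[t] +
    a * y[t - 1] + rnorm(1, sd = 0.1)
}

# Inputs: row 1 is d_t, row 2 is y_{t-1}; one column per fitted time step.
d_inputs <- rbind(dt[2:TT], y[1:(TT - 1)])
dat <- y[2:TT]                         # the first observation is lost to the lag
```

Both objects cover time steps 2 to TT, so column k of d_inputs holds the inputs for element k of dat.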
I have a vector of data. I need to build the density / distribution function and, from that, draw a random sample, i.e. I need a function similar to rnorm(), rpois(), rbinom(), etc., but for a distribution built from a vector of data. All in R. Thank you so much.
It has nothing to do with generating stochastic random deviates.
I know the function sample() does something similar, but not exactly. If I use sample() I only obtain elements of my original data, as from a discrete distribution, and I need a continuous distribution.
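One way to get continuous draws, sketched here under the assumption that a Gaussian kernel density estimate is an acceptable model of the data: resample the data with sample() and add normal noise whose standard deviation is the bandwidth chosen by density(). That is equivalent to sampling from the kernel density estimate. The function name rempirical is hypothetical, and x below is simulated stand-in data.

```r
# Draw n values from the Gaussian kernel density estimate of x
# (a "smoothed bootstrap"): pick a data point, then jitter it by the kernel.
rempirical <- function(n, x) {
  bw <- density(x)$bw                  # default bandwidth of the Gaussian kernel
  sample(x, size = n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)
}

set.seed(1)
x <- rexp(200, rate = 0.2)             # stand-in for the observed vector
draws <- rempirical(1000, x)
```

Note that the noise can push draws slightly outside the observed range (below zero for positive data, for instance). If that matters, quantile(x, runif(n)) is an alternative that interpolates the empirical quantile function and stays within the range of x.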
Let's say I have a response variable which is not normally distributed and an explanatory variable. Let's create these two variables first (coded in R):
set.seed(12)
resp = (rnorm(120)+20)^3.79
expl = rep(c(1,2,3,4),30)
I run a linear model and I realize that the residuals are not normally distributed. (I know running a Shapiro might not be enough to justify that the residuals are not normally distributed but it is not the point of my question)
m1=lm(resp~expl)
shapiro.test(residuals(m1))
0.01794
Therefore I want to transform my response variable (looking for a transformation with a Box-Cox for example).
m2=lm(resp^(1/3.79)~expl)
shapiro.test(residuals(m2))
0.4945
OK, now my residuals are normally distributed, which is fine! I now want to make a graphical representation of my data and my model. But I do not want to plot my response variable in the transformed form because I would lose a lot of its intuitive meaning. Therefore I do:
plot(x=expl,y=resp)
What if I now want to add the model? I could do this
abline(m2) # m2 is the model with transformed variable
but of course the line does not fit the data represented. I could do this:
abline(m1) # m1 is the model with the original variable.
but it is not the model I ran for the statistics! How can I re-transform the line predicted by m2 so that it fits the data?
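plotexpl <- seq(1,4,length.out=10) # grid of x values to draw the curve on
predresp <- predict(m2,newdata=data.frame(expl=plotexpl)) # predictions on the transformed scale
lines(plotexpl, predresp^(3.79)) # undo the 1/3.79 power to return to the original scale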
I won't discuss the statistical issues here (e.g. a non-significant test does not mean that H0 is true and your model is not better than the mean).
Since you mention that the transformation may be based on the Box-Cox formula, I would like to point out an issue you might want to consider. According to the Box-Cox transformation formula in Box, George E. P.; Cox, D. R. (1964), "An analysis of transformations", your implementation (if it is meant to be a Box-Cox one) needs a slight edit: the transformed y should be (y^(lambda)-1)/lambda instead of y^(lambda). (y^(lambda) on its own is the Tukey transformation, which is a distinct formula.)
So, the code should be:
lambda=1/3.79 # the exponent used in the question
m2=lm((resp^lambda-1)/lambda~expl) # Box-Cox: (y^lambda - 1)/lambda
shapiro.test(residuals(m2))
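To draw the fitted line on the original scale after a Box-Cox transformation, the predictions have to go through the inverse, (lambda*z + 1)^(1/lambda). Below is a minimal sketch, assuming lambda = 1/3.79 (the exponent used in the question); the helper names boxcox_fwd and boxcox_inv are hypothetical and introduced only for this example.

```r
set.seed(12)
resp <- (rnorm(120) + 20)^3.79
expl <- rep(c(1, 2, 3, 4), 30)
lambda <- 1 / 3.79                     # assumed: the exponent from the question

# Box-Cox transform and its inverse (valid for lambda != 0 and y > 0).
boxcox_fwd <- function(y, lambda) (y^lambda - 1) / lambda
boxcox_inv <- function(z, lambda) (lambda * z + 1)^(1 / lambda)

m_bc <- lm(boxcox_fwd(resp, lambda) ~ expl)

# Predict on the transformed scale, then map back to the original scale.
plotexpl <- seq(1, 4, length.out = 10)
pred_bc <- predict(m_bc, newdata = data.frame(expl = plotexpl))
pred_orig <- boxcox_inv(pred_bc, lambda)
```

The curve can then be added with plot(expl, resp) followed by lines(plotexpl, pred_orig). Because (y^lambda - 1)/lambda is a linear function of y^lambda, the residuals differ only by a constant factor from those of the plain power model, so the back-transformed curve coincides with the one obtained from resp^lambda.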
More information
Correct implementation of Box-Cox transformation formula by boxcox() in R:
https://www.r-bloggers.com/on-box-cox-transform-in-regression-models/
A great comparison between Box-Cox transformation and Tukey transformation. http://onlinestatbook.com/2/transformations/box-cox.html
One could also find the Box-Cox transformation formula on Wikipedia:
en.wikipedia.org/wiki/Power_transform#Box.E2.80.93Cox_transformation
Please correct me if I misunderstood your implementation.