How to mix two given Student distributions with a Gaussian copula? - r

In R, I have simulated two independent Student-t samples, X1 and X2, with 5 and 10 degrees of freedom respectively. I want to consider different mixtures of these data. First, I opt for a linear mixture Y = RX, where R is a rotation matrix. No problem for this part.
The problem is that I also want a non-linear mixture of X1 and X2, built with a Gaussian copula.
I know that I can use the R copula package to simulate two Student distributions coupled by a Gaussian copula. But as far as I know, this package cannot solve my problem, because it simulates new data and doesn't use X1 and X2 to create the mixture.
There is obviously something that I don't understand. Does anyone have an answer or any idea how to solve the problem? That would be great!
Many thanks.

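Do you mean a mixture distribution? If so, you can use the copula package, which also provides mixture models. For example,
library(copula)
Cop <- mixCopula(list(frankCopula(-5), claytonCopula(4)))  # mixture of a Frank and a Clayton copula
Cdat <- rCopula(500, Cop)  # simulate 500 observations from the mixture
Res <- fitCopula(Cop, Cdat)  # fit the mixture copula to the simulated data
This simulates data from a mixture of a Frank and a Clayton copula and then fits the mixture to that data. You can build mixtures of other copula families in the same way.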
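If the goal is instead to keep the simulated values of X1 and X2 and only impose a Gaussian-copula dependence between them, one possible approach is rank reordering: draw a correlated bivariate normal sample and rearrange each existing sample so that its ranks match those of the normal draws. The following is a minimal base-R sketch of that idea, not something the copula package does for you; the correlation `rho` and the names `Z1`, `Z2`, `Y1`, `Y2` are assumptions made for illustration.

```r
set.seed(123)
n  <- 1000
X1 <- rt(n, df = 5)    # existing Student-t sample, 5 degrees of freedom
X2 <- rt(n, df = 10)   # existing Student-t sample, 10 degrees of freedom
rho <- 0.7             # assumed Gaussian copula correlation (hypothetical value)

# Correlated standard normals; their joint ranks follow the Gaussian copula
Z1 <- rnorm(n)
Z2 <- rho * Z1 + sqrt(1 - rho^2) * rnorm(n)

# Reorder each original sample so that its ranks match those of Z1 and Z2
Y1 <- sort(X1)[rank(Z1)]
Y2 <- sort(X2)[rank(Z2)]

# The marginal values are unchanged, but Y1 and Y2 are now dependent
cor(Y1, Y2, method = "spearman")
```

Because only the order of the values changes, the marginal samples are exactly those of X1 and X2, while the rank dependence between Y1 and Y2 is the one carried by the normal draws.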

Related

How to transform data after fitting a distribution with gamlss?

I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
The normalised data values (for any distribution) are exactly equal to the residuals from a gamlss fit,
m1 <- gamlss()
which can be accessed by
residuals(m1) or
m1$residuals
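As a hedged illustration of the statement above, the sketch below fits a gamma model to simulated data and compares the residuals with a manual transformation through the fitted distribution function. The gamma family (`GA`), the simulated data, and the names `dat` and `z_manual` are assumptions chosen for the example; the same pattern should apply to other continuous gamlss families with their own `p` functions.

```r
library(gamlss)
library(gamlss.dist)  # provides rGA() and pGA()

set.seed(1)
dat <- data.frame(y = rGA(200, mu = 5, sigma = 0.5))  # simulated gamma data

# Intercept-only fit; trace = FALSE silences the iteration log
m1 <- gamlss(y ~ 1, family = GA, data = dat,
             control = gamlss.control(trace = FALSE))

# Normalized (quantile) residuals returned by gamlss
z <- residuals(m1)

# Manual equivalent: push y through the fitted CDF, then through qnorm()
z_manual <- qnorm(pGA(dat$y,
                      mu    = fitted(m1, what = "mu"),
                      sigma = fitted(m1, what = "sigma")))

all.equal(as.numeric(z), as.numeric(z_manual))
```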

How to use weights in multivariate linear regression in R with lm?

I've got a linear regression that looks like:
multivariateModel = lm(cbind(y1, y2, y3)~., data=temperature)
I need to do two things with this, which I've found difficult to do. First is to extract the variances, and right now I'm using sigma(multivariateModel), which has returned
y1 y2 y3
31.22918 31.83245 31.01727
I would like to turn those three sigmas into variances (sd^2) and use them as weights in my regression. Currently, weights=cbind(31.22918, 31.83245, 31.01727) does not work, and neither does a matrix with three columns in which those values are repeated.
Is there a way to add these as a weighted matrix so that I can get out a fitted model with this, or is there another package I need to use besides lm for this? Thanks.
Here is a link to the dataset: https://docs.google.com/spreadsheets/d/1zm9pPqOnkBdsPekOf8IoXN8yLr82CCFBuc9EtxN5JII/edit?usp=sharing
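For context, the `weights` argument of `lm` is documented as a numeric vector with one weight per observation, which would explain why a per-response matrix is rejected. The sketch below uses simulated data with hypothetical columns (`x1`, `x2`, `w`, `w1`) in place of the linked dataset. It shows how to get the variances from `sigma()`, how per-observation weights can be passed to a multivariate fit, and that weighting a single response by a constant leaves its coefficients unchanged.

```r
set.seed(1)
n <- 60
# Simulated stand-in for the real data (all column names are hypothetical)
temperature <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
temperature$y1 <- 1 + 2 * temperature$x1 + rnorm(n)
temperature$y2 <- -1 + 0.5 * temperature$x2 + rnorm(n)
temperature$y3 <- temperature$x1 - temperature$x2 + rnorm(n)
temperature$w  <- runif(n, 0.5, 2)   # one weight per observation

# Unweighted multivariate fit and residual variances per response
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2, data = temperature)
resid_var <- sigma(fit)^2

# Per-observation weights are accepted as a vector of length n
fit_w <- lm(cbind(y1, y2, y3) ~ x1 + x2, data = temperature, weights = w)

# A constant weight for a whole response does not change its coefficients
temperature$w1 <- 1 / resid_var[["y1"]]
fit_y1 <- lm(y1 ~ x1 + x2, data = temperature, weights = w1)
all.equal(coef(fit_y1), coef(fit)[, "y1"])
```

If different responses really need different weighting across observations, fitting one `lm` per response is one simple option.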

R: functions to determine distance of multivariate data to normal distribution

I have multivariate data and I want to compute how far the complete data set is from a multivariate normal distribution, using R. I have seen functions such as the Shapiro-Wilk test, but they only tell me that the data do not follow a normal distribution when the p-value is below 0.05. I want to know how far the data are from the normal distribution. Can anyone point me to functions I could use?
Use the mqqnorm function from the RVAideMemoire package. It shows, among others, Mahalanobis distances. From the function example:
library(RVAideMemoire)
x <- 1:30+rnorm(30)
y <- 1:30+rnorm(30,1,3)
mqqnorm(cbind(x,y))
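To turn the Mahalanobis distances into a single number, one option is to compare the squared distances with their reference chi-square distribution, which has as many degrees of freedom as there are variables. The base-R sketch below uses the Kolmogorov-Smirnov statistic as that distance; this choice of measure and the names `dat`, `d2`, and `ks` are assumptions for illustration, not part of RVAideMemoire.

```r
set.seed(1)
x <- 1:30 + rnorm(30)
y <- 1:30 + rnorm(30, 1, 3)
dat <- cbind(x, y)
p <- ncol(dat)

# Squared Mahalanobis distance of each row to the sample mean
d2 <- mahalanobis(dat, center = colMeans(dat), cov = cov(dat))

# Under multivariate normality d2 is approximately chi-square with p df;
# the KS statistic is the largest gap between the two distribution functions
ks <- ks.test(d2, "pchisq", df = p)
ks$statistic
```

A statistic near 0 means the distances look like those of a multivariate normal sample, and larger values mean a larger departure.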

Generalized linear model fit with constraints and poisson error

I am currently trying to fit a linear model to count data whose errors follow a Poisson distribution. Namely, I would like to minimize the following
where i indexes the samples, β is a vector of m coefficients, and x consists of m independent (explanatory) variables. The coefficients in β should sum to 1, and each should be larger than 0.
I am using R and I tried the package glmc without much success. The only example in the documentation confuses me, as I don't see how the constraint matrix Amat enforces a constraint on the coefficients. Is there another example I could look at, or another package?
I also tried solving this analytically with medium success.
Any help is appreciated!
kind regards, Lena
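One way to handle both constraints without a dedicated package is to reparameterize β with a softmax, so that any unconstrained vector maps to positive coefficients that sum to 1, and then minimize with `optim`. The sketch below assumes the objective is the Poisson negative log-likelihood with an identity link (the formula itself is not shown above, so this is an assumption). The simulated data and the names `softmax`, `negloglik`, and `beta_true` are hypothetical.

```r
set.seed(42)
n <- 500
m <- 3
X <- matrix(runif(n * m, 1, 10), n, m)   # positive explanatory variables
beta_true <- c(0.2, 0.3, 0.5)            # hypothetical true coefficients
y <- rpois(n, drop(X %*% beta_true))     # Poisson counts with mean X %*% beta

# Map an unconstrained vector to positive weights that sum to 1
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Assumed objective: Poisson negative log-likelihood with identity link.
# The last softmax input is fixed at 0 so the parameterization is identifiable.
negloglik <- function(theta) {
  mu <- drop(X %*% softmax(c(theta, 0)))
  -sum(dpois(y, mu, log = TRUE))
}

opt <- optim(rep(0, m - 1), negloglik, method = "BFGS")
beta_hat <- softmax(c(opt$par, 0))
beta_hat
sum(beta_hat)
```

The softmax keeps every coefficient strictly positive, so a solution with a coefficient exactly at 0 can only be approached, not reached.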

How to choose the param in fitCopula

Suppose I have two random variables X1 and X2, both normally distributed. In R I would generate samples like this:
> X1 <- rnorm(1000,mean=2.4,sd=1.3)
> X2 <- rnorm(1000,mean=1.5,sd=0.9)
So each sample is normally distributed with the given mean and sd. My goal is to fit a copula C to these samples, assuming a certain family for C. For simplicity, assume the dependence follows a t-copula.
The first step would be to transform them into (pseudo) uniform distribution. We would look at U1=F1(X1) and U2=F2(X2). In R I would do this with the following code
> U1 <- pnorm(X1,mean=2.4,sd=1.3)
> U2 <- pnorm(X2,mean=1.5,sd=0.9)
Then I would fit a t-copula using the copula package. I know that I could fit a multivariate t distribution directly, but I would like to understand how this works in the package. The function fitCopula needs an object of class copula, so I would obviously hand over a t-copula. I'm not sure how to choose its parameters, since they are what should be estimated. So how can I fit a t-copula to U1 and U2?
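A hedged sketch of one way to do this is shown below. The `tCopula` object passed to `fitCopula` only specifies the family, dimension, and dispersion structure; its parameters do not need to be chosen in advance, because `fitCopula` estimates them. To have something meaningful to estimate, the example simulates dependent data from a t-copula with assumed values (`rho = 0.6`, `df = 5`) instead of the independent draws above. Those values and the names `true_cop` and `U_sim` are hypothetical.

```r
library(copula)

set.seed(1)
# Simulate dependent uniforms from a t-copula with assumed parameters
true_cop <- tCopula(param = 0.6, dim = 2, df = 5)
U_sim <- rCopula(1000, true_cop)

# Give them the normal margins from the question
X1 <- qnorm(U_sim[, 1], mean = 2.4, sd = 1.3)
X2 <- qnorm(U_sim[, 2], mean = 1.5, sd = 0.9)

# Transform back to uniforms with the known marginal distributions
U1 <- pnorm(X1, mean = 2.4, sd = 1.3)
U2 <- pnorm(X2, mean = 1.5, sd = 0.9)

# The copula object names the family; correlation and df are estimated
fit <- fitCopula(tCopula(dim = 2), data = cbind(U1, U2), method = "ml")
coef(fit)
```

When the margins are not known, `pobs(cbind(X1, X2))` gives rank-based pseudo-observations that can be passed to `fitCopula` with its default `method = "mpl"`.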
