I am new to the Julia programming language. I am fitting a linear mixed-effects model and am finding it difficult to save the fixed- and random-effects estimates to .csv files.
Here is some example code:
using MixedModels
@time modelOutput = fit(lmm(Y ~ A + B + (0 + A | group), data))
There is documentation on how to obtain the fixed effects (fixef(modelOutput)) and random effects (ranef(modelOutput)), but I run into errors when I try to put them into a DataFrame.
Any advice is appreciated.
Okay, I actually took the time to do this for you. A CoefTable is a type defined in statmodels. Given this, we can extract the relevant information from the CoefTable instance as follows:
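using DataFrames
ct = coeftable(modelOutput)   # CoefTable of the fitted model (older layout with a ct.mat field; newer versions store the columns in ct.cols)
df = DataFrame(variable = ct.rownms,
               Estimate = ct.mat[:,1],
               StdError = ct.mat[:,2],
               z_val = ct.mat[:,3])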
This gives an nvar-by-4 DataFrame, which you can then write to CSV with writetable("output.csv", df).
I had a number of problems getting the accepted answer to work; Julia has evolved a lot since then. I rewrote it based primarily on code from the jglmm R package, with some adaptation/cobbling-together from other sources ...
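using CSV, DataFrames, MixedModels

"""
    outfun(m, outfn = "output.csv")

Write the coefficient table of a fitted model `m` to the CSV file `outfn`.
"""
function outfun(m, outfn = "output.csv")
    ct = coeftable(m)
    # one DataFrame column per CoefTable column; makeunique guards against duplicate headers
    coef_df = DataFrame(ct.cols, string.(ct.colnms); makeunique = true)
    coef_df[!, :term] = ct.rownms   # keep the term (row) names as their own column
    CSV.write(outfn, coef_df)
end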
Greetings, everyone.
I successfully fitted a PLS-R model in R using the code below:
pls_modB_Kexch_2 <- plsr(Av.K_exc~., data = trainKexch.sar.veg, scale=TRUE,method= "s",validation='CV')
The regression coefficients for ncomp = 11 were:
(Intercept) = -4.692966e+05
Easting = 6.068582e+03, Northings = 7.929767e+02
sigma_vv = 8.024741e+05, sigma_vh = -6.375260e+05
gamma_vv = -7.120684e+05, gamma_vh = 4.330279e+05
beta_vv = -8.949598e+04, beta_vh = 2.045924e+05
c11_db = 2.305016e+01, c22_db = -4.706773e+01
c12_real = -1.877267e+00
It predicts new data sets well when applied within the R environment.
My challenge is presenting this model as an equation of the form y = sum(A*X) + B0, where A are the coefficients of the respective variables X,
or in any other mathematical form that can be presented academically.
I tried the direct way, multiplying each coefficient by its variable and summing the products, but a quick manual check of the predictions gave me strange results. Am I missing something here? Please help.
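One possible reason for the strange manual predictions: with scale = TRUE, plsr() divides each predictor by its standard deviation before fitting, and, as far as we understand the pls documentation, coef() then returns coefficients for those scaled predictors. A raw-scale equation would then need each coefficient divided by the SD of its predictor, while the intercept stays as it is (the predictors are divided, not centred). The base-R sketch below illustrates only that algebra, with lm() on SD-scaled predictors standing in for plsr(); the helper unscale_coefs() and the simulated data are hypothetical.

```r
# Hypothetical helper: convert coefficients for SD-scaled predictors (x / sd(x))
# into coefficients for the raw predictors. The intercept is unchanged because
# the predictors were only divided by their SDs, not centred.
unscale_coefs <- function(intercept, b_scaled, sds) {
  stopifnot(length(b_scaled) == length(sds))
  out <- c(unname(intercept), unname(b_scaled) / unname(sds))
  names(out) <- c("(Intercept)", names(sds))
  out
}

# Simulated data; lm() on SD-scaled predictors stands in for plsr(scale = TRUE)
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n, mean = 100, sd = 20), x2 = rnorm(n, mean = 5, sd = 0.5))
y <- 3 + 0.2 * X[, "x1"] - 4 * X[, "x2"] + rnorm(n)

sds <- apply(X, 2, sd)
Xs <- sweep(X, 2, sds, "/")          # divide each column by its SD
fit_scaled <- lm(y ~ Xs)

b_raw <- unscale_coefs(coef(fit_scaled)[1], coef(fit_scaled)[-1], sds)

# y = b0 + sum(b_i * x_i) now works directly on the raw data
manual_pred <- unname(b_raw[1]) + drop(X %*% b_raw[-1])
print(round(b_raw, 4))
```

With the model from the question, the analogous step would presumably be B <- drop(coef(pls_modB_Kexch_2, ncomp = 11, intercept = TRUE)) followed by unscale_coefs(B[1], B[-1], sds), where sds are the SDs plsr() used (we believe they are stored in the fitted object's scale component; confirm with str() and check a few manual predictions against predict()).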
I am very new to R, and this is my first time encountering the eval() function. I am trying to use the med and boot.med functions from the mma package to conduct a mediation analysis. med and boot.med take models such as linear models, together with data frames that specify the mediators and predictors, and then estimate the mediation effect of each mediator.
The package author provides the flexible option of specifying one's own custom.function. The source code of med shows that custom.function is passed to eval(). So I tried to insert the gbmt function as the custom function. However, R keeps giving me this error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and have tried many ways of specifying the number-of-trees parameter n.trees, but nothing works (I believe others have raised similar issues).
The following codes are part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
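The substitution step above can be reproduced in base R to see exactly what med() evaluates. In this sketch, fit_custom() is a hypothetical stand-in for that part of med(), not the package's code, and lm() replaces gbmt() so that it runs without extra packages. It shows that the template string only becomes the model-fitting call; the error message quoted above refers to prediction, which med() presumably carries out elsewhere, outside this string.

```r
# Hypothetical stand-in for the fitting step inside mma::med():
# placeholders in the template are replaced by names of local objects,
# and the resulting text is parsed and evaluated in this function's frame.
fit_custom <- function(custom.function, y, x2, w, j = 1) {
  cf1 <- gsub("responseY", "y[,j]", custom.function)
  cf1 <- gsub("dataset123", "x2", cf1)
  cf1 <- gsub("weights123", "w", cf1)
  message("Evaluating: ", cf1)     # the exact call that will be run
  eval(parse(text = cf1))          # y, j, x2 and w are visible here
}

set.seed(2)
x2 <- data.frame(a = rnorm(20), b = rnorm(20))
y <- cbind(resp = 1 + 2 * x2$a - x2$b + rnorm(20))   # response matrix, one column
w <- rep(1, 20)

template <- "lm(responseY ~ ., data = dataset123, weights = weights123)"
fit <- fit_custom(template, y, x2, w)
coef(fit)
```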
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here the glm is the custom function. This example code works and you can replicate it easily (if you have mma installed and loaded). However when I am trying to use the gbmt function on a survival object, I got errors and here is what my code looks like:
temp1 <- med(data = data.surv, n = 2, type = "link",
             custom.function = 'gbmt(responseY ~ .,
                                     data = dataset123,
                                     distribution = dist,
                                     train_params = start_stop,
                                     cv_folds = 10,
                                     keep_gbm_data = TRUE)')
Does anyone have an idea where the number-of-trees argument n.trees can be added in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
I changed the custom.function to gbmt and used a survival object as responseY and the error occurs. When I use the gbmt function on my data outside the med function, there is no error.
Problem
I am using R 3.3.3 on Windows 10 (64-bit). I fit a model with glmmPQL as follows:
library(MASS)
library(nlme)
library(dplyr)
model<-glmmPQL(a ~ b + c + d, data = trainingDataSet, family = binomial, random = list( ~ 1 | e), correlation = corAR1())
The predictions are computed as follows:
p <- predict(model, newdata = testingDataSet, type = "response", level = 0)   # (1.0)
The output it gives is as follows:
I then try to measure the performance of these predictions using the following code:
library(ROCR)   # provides prediction()
pr <- prediction(p, testingDataSet$a)   # (1.1)
This gives the following error (1.2):
Error in prediction(p, testingDataSet$a) :
  Format of predictions is invalid.
I have successfully used prediction() with other models (glm, svm, nn), whose predictions are produced like this:
model <- glm(a ~ b + c + e, family = binomial(link = 'logit'), data = trainingDataSet)
p <- predict(model, newdata = testingDataSet, type = "response")   # (1.3)
Attempts
I believe the fix is to get p into the same format as the output of (1.3). I have tried the following in R without success.
I have tried converting p from (1.0) with as.numeric(), as.list() and other functions so that it looks like the p object from (1.3). In other words, I believe the format is the reason it is not working.
No matter which mutate() call or conversion I try, I cannot get it into the form produced by (1.3), in particular with the index shown as the names of the values.
I'm coming up empty-handed on Stack Overflow and in the R help files. class(p) returns "numeric" for both objects.
Question
Given the above, how can I get the output of glmmPQL into a format that prediction() can use?
In other words, how can I make the output of (1.0) match the output of (1.3)? My attempts have failed, and I would deeply appreciate it if someone more skilled in R could point out where I am going wrong.
If you use as.numeric(p), you'll get the values you want; the only remaining difference is that the GLM output has names. You can add these with something like:
p <- as.numeric(p)
names(p) <- seq_along(p)
If this doesn't work, you can use str(p) to examine the structure of the object in more depth.
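To pin down what actually separates the two objects, their attributes can be compared directly. The sketch below uses made-up stand-ins, p_glm for the output of (1.3) and p_glmm for the output of (1.0); it assumes, without our having checked the glmmPQL source, that the extra structure is an attribute attached to an otherwise plain numeric vector, which is exactly what str() would reveal.

```r
# Made-up stand-ins: p_glm mimics predict.glm() output (a named numeric vector);
# p_glmm mimics a prediction vector that carries an extra attribute.
p_glm  <- c("1" = 0.12, "2" = 0.85, "3" = 0.40)
p_glmm <- structure(c(0.12, 0.85, 0.40), label = "Predicted values")

str(p_glmm)                 # lists any attributes attached to the vector
all.equal(p_glm, p_glmm)    # describes how the two objects differ

# as.numeric() drops every attribute; the names are then added back
p_fixed <- as.numeric(p_glmm)
names(p_fixed) <- seq_along(p_fixed)
identical(p_fixed, p_glm)
```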
I'm constructing a piecewise structural equation model using the piecewiseSEM package in R (Lefcheck - https://cran.r-project.org/web/packages/piecewiseSEM/vignettes/piecewiseSEM.html)
I already created the model set and I could evaluate the model fit, so the model itself works. Also, the data fits the model (p = 0.528).
However, I cannot extract the path coefficients.
This is the error I get: Error in cbind(Xlarge, Xsmall) : number of rows of matrices must match (see arg 2)
I already tried the following (but it did not work):
standardising my data because of the warning: Some predictor variables are on very different scales: consider rescaling
adapting my data (removing some NA values)
This is my modellist:
predatielijst = list(
lmer(plantgrootte ~ gapfraction + olsen_P + (1|plot_ID), data = d),
glmer(piek1 ~ gapfraction + olsen_P + plantgrootte + (1|plot_ID),
family = poisson, data = d),
glmer(predatie ~ piek1 + (1|plot_ID), family = binomial, data = d)
)
with "predatie" being a binary variable (yes or no) and all the rest continuous variables (gapfraction, plantgrootte, olsen_P & piek1)
Thanks in advance!
Try installing the development version:
library(devtools)
install_github("jslefche/piecewiseSEM@2.0")
Replace list with psem and run the coefs or summary function. It will likely get rid of your error. If not, open a bug on GitHub!
WARNING: this will overwrite your current version from CRAN. You will need to reinstall from CRAN to get version 1.4 back.
Try using lme (from the nlme package) instead of glmer. As far as I understand, the problem here is that lmer does not provide p-values (while lme does).
Hope this works.
I'm using the plm package for panel data to do instrumental variable estimation. However, it seems that calculating cluster robust standard errors by using the vcovHC() function is not supported.
More specifically, when I use the vcovHC() function, the following error message is displayed:
Error in vcovG.plm(x, type = type, cluster = cluster, l = 0, inner = inner, :
  Method not available for IV
Example:
data("Wages", package = "plm")
IV <- plm(lwage ~ south + exp | wks + south,
data = Wages, model = "pooling", index = 595)
vcvIV <- vcovHC(IV)
According to this thread, someone worked on a fix two years ago. Is there any progress on the issue? I know that the packages "lfe" and "ivpack" allow computing cluster-robust standard errors for IV estimation, but neither allows for random effects/intercepts.
It is indeed not implemented. However, you can use Schrimpf's clustered-errors function, which is applied directly to an object of class plm.
Using your example:
library(plm)
data("Wages", package = "plm")
IV <- plm(lwage ~ south + exp | wks + south, data = Wages, model = "pooling", index = 595)
Wages$id <- rep(1:595, each = 7)
cl.plm(Wages, IV, Wages$id)   # Schrimpf's function; not part of plm, so define it first
Here I'm using Wages$id as the panel's first dimension, around which the clusters are formed. You may want to compare these results with those obtained in other software. In any case, the code is simple, so it is easy to adapt. The cl.plm function is based on Arai's clustering notes, which can help you further.
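For reference, the kind of estimator cl.plm computes can be written out directly. The base-R sketch below computes pooled 2SLS coefficients and a cluster-robust covariance matrix with the small-sample factor (G/(G-1))((N-1)/(N-K)) that, as we recall, Arai's notes use. The helper iv_cluster() and the simulated data are hypothetical; this is not the cl.plm code, and it covers only the pooling case, not within or random-effects transformations.

```r
# Hypothetical helper: pooled 2SLS with a cluster-robust (CR1) covariance matrix.
# y: response vector; X: regressors incl. an intercept column;
# Z: instruments incl. an intercept column (ncol(Z) >= ncol(X)); cl: cluster ids.
iv_cluster <- function(y, X, Z, cl) {
  Pz_X  <- Z %*% solve(crossprod(Z), crossprod(Z, X))  # first-stage fitted regressors
  bread <- solve(crossprod(Pz_X))
  b     <- bread %*% crossprod(Pz_X, y)                 # 2SLS coefficients
  u     <- drop(y - X %*% b)                            # residuals use the original X
  S     <- rowsum(Pz_X * u, group = cl)                 # per-cluster score sums (G x K)
  n <- nrow(X); k <- ncol(X); G <- nrow(S)
  dfc   <- (G / (G - 1)) * ((n - 1) / (n - k))          # small-sample correction
  V     <- dfc * bread %*% crossprod(S) %*% bread
  list(coef = drop(b), vcov = V, se = sqrt(diag(V)))
}

# Simulated panel: 40 individuals observed 5 times; x is endogenous, z is an instrument
set.seed(3)
id <- rep(1:40, each = 5)
z  <- rnorm(200)
e  <- rnorm(200)
x  <- 0.8 * z + 0.5 * e + rnorm(200)
y  <- 1 + 2 * x + e
X  <- cbind(const = 1, x = x)
Z  <- cbind(const = 1, z = z)

fit <- iv_cluster(y, X, Z, id)
fit$coef
fit$se
```

With the Wages example, X would hold the constant, south and exp, Z the constant, wks and south, and cl the individual id; since that mapping is ours, compare the output against cl.plm or the Stata commands before relying on it.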
You can obtain the same result from cl.plm doing this in Stata:
ivregress 2sls lwage south (exp = wks), vce(cluster id) small
Or for the within model:
xtset id time, generic
xtivreg2 lwage south (exp = wks), fe small cluster(id)
Note, however, that I used the small-sample formulation in Stata, which is not a big deal. In any case, cl.plm handles plm objects properly.
For the sake of completeness: as suggested by @Helix123, you can use the development version (1.6-1) of the plm package and proceed as you did in your question.