how to re-calculate the dependent variable of an R model - formula

result~1+Speed+Reltailsize+tailcontrst+relsvl:Speed+Reltailsize:Speed+Reltailsize:relsvl+strategy:relsvl+tailcontrst:Reltailsize
I have fitted it using
require(MASS)
polr(result~1+Speed+Reltailsize+tailcontrst+relsvl:Speed+Reltailsize:Speed+Reltailsize:relsvl+strategy:relsvl+tailcontrst:Reltailsize, data=results)
and got the coefficients:
my question now is how do I do now to re calculate values of "result" substituting the values of the factors?
I have two specific doubt about that,
1) what do I substitute ":" for? Maybe "/" ?
2) Do i need to re use a probit link function? how do I do this?

I advanced in my problem. : cannot be substituted by / because it means that the factor vectors should be casewise multiplied (Venables & Ripley, 2002).
still dont know how to do an ordered probit transformation with values 1,2 and 3.
cheers
Agus

I advanced more. Now i fit the model using multinom from nnet package and use predict.nnet to get the predicted values from a dataframe with real data.
M=multinom(result~1+Speed+relsvl, data=results)
summary(M)
require(nnet)
d=predict (M, realdata)
Cheers

Related

Covariance structure in lme - AR(1)

My response variable is Yijk corresponding to the recovery time of
patient i (i=1,...,I)
with treatment j (j=1,...,J)
and measured at time k (k=1,...,K)
I would like to fit the following model:Model equation, where:
μ is a global fixed intercept
αj is a fixed effect for the treatment
bik is a random effect with the following covariance structure. Denote bi the K-dimensional vector of effect for the patient i, then its variance-covariance matrix would have the following AR(1) structure.
Variance covariance matrix
uijk is the usual error term with variance σ²
Consider the following line of command:
lme(recovery ~ treatment, method="REML", random=~1|patient, correlation=corAR1,form=~time|patient,data=data)
Several questions:
What does this correlation argument correspond to? The structure of covariance of what? Is that the var-cov matrix which I defined as R?
Does the line actually do what I would like to?
If not, what does it do?
If not, is there a way to do what I would like to?
Thank you in advance!
First, you have a command lme, I will assume that is meant to be nlme because a) lme isn't an R command in any package that I know of or that R could find and b) correlation isn't an option in lme4
Second, in the documentation for nlme they have this:
an optional corStruct object describing the within-group correlation
structure. See the documentation of corClasses for a description of
the available corStruct classes. Defaults to NULL, corresponding to no
within-group correlations.
and in corClasses it says
corAR1 autoregressive process of order 1.
So, the answers to your first two questions appears to be "Yes".

R: glm (multiple linear regression) ignores/removes some predictor variables

I have posted this question before, but I believe that I had not explained the problem well and that it was over-complicated, so I deleted my previous post and I am posting this one instead. I am sorry if this caused any inconvenience.
I also apologize in advance for not being able to provide example data, I am using very large tables, and what I am trying to do works fine with simpler examples, so providing example data cannot help. It has always worked for me until now. So I am just trying to get your ideas on what might be the issue. But if there is any way I could provide more information, do let me know.
So, I have a vector corresponding to a response variable and a table of predictor variables. The response vector is numeric, the predictor variables (columns of the table) are in the binary format (0s and 1s).
I am running the glm function (multivariate linear regression) using the response vector and the table of predictors:
fit <- glm(response ~ as.matrix(predictors), na.action=na.exclude)
coeff <- as.vector(coef(summary(fit))[,4])[-1]
When I have been doing that in the past, I would extract the vector of regression coefficient to use it for further analysis.
The problem is that now the regression returns a vector of coefficients which is missing some values. Essentially some predictor variables are not attributed a coefficient at all by glm. But there are no error messages.
The summary of the model looks normal, but some predictor variables are missing like I mentioned. Most other predictors have assigned data (coefficient, pvalue, etc.).
About 30 predictors are missing from the model, over 200.
I have tried using different response variables (vectors), but I am getting the same issue, although the missing predictors vary depending on the response vector...
Any ideas on what might be going on? I think this can happen if some variables have 0 variance, but I have checked that. There are also no NA values and no missing values in the tables.
What could cause glm to ignore/remove some predictor variables?
Any suggestion is welcome!
EDIT: I found out that the predictors that were removed has values identical to another predictor. There should still be a way to keep them, and they would get the same regression coefficient for example
Your edit explains why you are not getting those variables. That was going to be my first question. (This question would be better posed on Cross validated because it is not an R error, it is a problem with your model.)
They would not get the same coefficients: Say you have a 1:1 relationship, Y = X + e, Then fit simple model Y ~ X + X. Each X is going to be assigned ANY value such that the sum is equal to 1. There is no solution. Y = 0.5X + 0.5X may be the most obvious to us, but Y = 100X -99X is just as valid.
You also cannot have any predictors that are linear sums of other predictors for the same reason.
If you really want those values you can generate them from what you have. However I do not recommend it because the assumptions are going to be on very thin ice.

Predict y value for a given x in R

I have a linear model:
mod=lm(weight~age, data=f2)
I would like to input an age value and have returned the corresponding weight from this model. This is probably simple, but I have not found a simple way to do this.
Its usually more robust to use the predict method of lm:
f2<-data.frame(age=c(10,20,30),weight=c(100,200,300))
f3<-data.frame(age=c(15,25))
mod<-lm(weight~age,data=f2)
pred3<-predict(mod,f3)
This spares you from wrangling with all of the coefs when the models can be potentially large.
If your purposes are related to just one prediction you can just grab your coefficient with
coef(mod)
Or you can just build a simple equation like this.
coef(mod)[1] + "Your_Value"*coef(mod)[2]

Pseudo R squared for cumulative link function

I have an ordinal dependent variable and trying to use a number of independent variables to predict it. I use R. The function I use is clm in the ordinal package, to perform a cumulative link function with a probit link, to be precise:
I tried the function pR2 in the package pscl to get the pseudo R squared with no success.
How do I get pseudo R squareds with the clm function?
Thanks so much for your help.
There are a variety of pseudo-R^2. I don't like to use any of them because I do not see the results as having a meaning in the real world. They do not estimate effect sizes of any sort and they are not particularly good for statistical inference. Furthermore in situations like this with multiple observations per entity, I think it is debatable which value for "n" (the number of subjects) or the degrees of freedom is appropriate. Some people use McFadden's R^2 which would be relatively easy to calculate, since clm generated a list with one of its values named "logLik". You just need to know that the logLikelihood is only a multiplicative constant (-2) away from the deviance. If one had the model in the first example:
library(ordinal)
data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)
fm0 <- clm(rating ~ 1, data = wine)
( McF.pR2 <- 1 - fm1$logLik/fm0$logLik )
[1] 0.1668244
I had seen this question on CrossValidated and was hoping to see the more statistically sophisticated participants over there take this one on, but they saw it as a programming question and dumped it over here. Perhaps their opinion of R^2 as a worthwhile measure is as low as mine?
Recommend to use function nagelkerke from rcompanion package to get Pseudo r-squared.
When your predictor or outcome variables are categorical or ordinal, the R-Squared will typically be lower than with truly numeric data. R-squared merely a very weak indicator about model's fit, and you can't choose model based on this.

Running predict() after tobit() in package AER

I am doing a tobit analysis on a dataset where the dependent variable (lets call it y) is left censored at 0. So this is what I do:
library(AER)
fit <- tobit(data=mydata,formula=y ~ a + b + c)
This is fine. Now I want to run the "predict" function to get the fitted values. Ideally I am interested in the predicted values of the unobserved latent variable "y*" and the observed censored variable "y" [See Reference 1].
I checked the documentation for predict.survreg [Reference 2] and I don't think I understood which option gives me the predicted censored variables (or the latent variable).
Most examples I found online advise the following :
predict(fit,type="response").
Again, its not clear what kind of predictions these are.
My guess is that the "type" option in the predict function is the key here, with type="response" meant for the censored variable predictions and type="linear" meant for latent variable predictions.
Can someone with some experience here, shed some light for me please ?
Many Thanks!
References:
http://en.wikipedia.org/wiki/Tobit_model
http://astrostatistics.psu.edu/datasets/2006tutorial/html/survival/html/predict.survreg.html
Generally predict-"response" results have been back-transformed to the original scale of data from whatever modeling transformations were used in a regression, whereas the "linear" predictions are the linear predictors on the link transformed scale. In the case of tobit which has an identity link, they should be the same.
You can check my meta-prediction easily enough. I just checked it with the example on the ?tobit page:
plot(predict(fm.tobit2, type="response"), predict(fm.tobit2,type="linear"))
I posted a similar question on stats.stackexchange and I got an answer that could be useful for you:
https://stats.stackexchange.com/questions/149091/censored-regression-in-r
There one of the authors of the package shows how to calculate the mean of (ie. prediction) of $Y$ where $Y = max(Y^*,0)$. Using the package AER this has to be done somewhat "by hand".

Resources