R Multiple Regression with Two Predictor Variables

I have a data.frame with columns X, Y, a, b, c, d, e.
Is there a package that lets me predict both X and Y at the same time?
Thanks for your help.

Try the car package: see ?linearHypothesis, in particular the example
"a multivariate linear model for repeated-measures data"
(see ?OBrienKaiser for a description of the data set used in that example).
However, it might not be appropriate for the actual X and Y you have.
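If the goal is simply to fit X and Y on the same predictors at once, base R can already do this: putting cbind(X, Y) on the left-hand side of lm() fits a multivariate linear model (class "mlm"). A minimal sketch, with simulated data standing in for the real data.frame (the column values are hypothetical):

```r
# Hypothetical data: two responses (X, Y) and five predictors (a..e)
set.seed(1)
n <- 50
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n),
                  d = rnorm(n), e = rnorm(n))
dat$X <- 1 + 2 * dat$a - dat$b + rnorm(n)
dat$Y <- -1 + 0.5 * dat$c + dat$d + rnorm(n)

# cbind() on the left-hand side fits both responses in one "mlm" object
fit <- lm(cbind(X, Y) ~ a + b + c + d + e, data = dat)
coef(fit)              # one column of coefficients per response

# Joint predictions for new rows: a matrix with columns X and Y
newdata <- data.frame(a = 0, b = 0, c = 0, d = 0, e = 0)
predict(fit, newdata)

# Multivariate tests for each term (Pillai's trace by default)
anova(fit)
```

The coefficient estimates are the same as fitting lm() to X and Y separately; what the multivariate fit adds is joint inference, e.g. anova() on the mlm object or car::linearHypothesis() as suggested above.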

Related

Linear regression with errors on x and y

I have two variables, x and y, each of which has an error in x and y associated with each point. I'm trying to fit a linear regression model in R which takes account of the error in both variables. I see that you can use weights in lm() to weight the regression based on errors but as far as I can see this can only incorporate errors on one variable. Is there any way to fit a linear model which takes into account errors on both of the variables?
Thanks to @Stéphane Laurent for the answer.
The package "deming" contains a function to do exactly this.
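For reference, when the ratio of the two error variances is known and constant, the Deming estimate has a closed form. The base-R sketch below assumes exactly that (delta = error variance in y / error variance in x); it does not use the per-point errors the way the deming package does (its deming() function documents xstd and ystd arguments for that), so treat it as an illustration of the technique rather than a replacement. The helper name deming_fit is hypothetical.

```r
# Closed-form Deming regression (errors in both x and y).
# Assumption: delta = var(error in y) / var(error in x) is known and constant.
deming_fit <- function(x, y, delta = 1) {
  mx <- mean(x)
  my <- mean(y)
  sxx <- mean((x - mx)^2)
  syy <- mean((y - my)^2)
  sxy <- mean((x - mx) * (y - my))
  slope <- (syy - delta * sxx +
              sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) / (2 * sxy)
  c(intercept = my - slope * mx, slope = slope)
}

# Simulated example: true line y = 1 + 2 * x, noise on both axes
set.seed(3)
x_true <- runif(100, 0, 10)
x_obs <- x_true + rnorm(100, sd = 1)
y_obs <- 1 + 2 * x_true + rnorm(100, sd = 1)

# Both error SDs are 1 here, so delta = 1 (orthogonal regression)
deming_fit(x_obs, y_obs, delta = 1)
coef(lm(y_obs ~ x_obs))  # ordinary least squares: slope attenuated toward 0 in expectation
```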

Is it possible to analyse the effect of both factors and numeric variables at the same time in a limma model?

I am trying to perform an analysis of gene expression data with the limma R package. My model includes factors and numeric covariates, and I'm not able to get the results for both types of variables at once.
Here is an example:
library(limma)
# One column per level of Factor (no intercept), plus a slope for NumericCov
design <- model.matrix(~0+Factor+NumericCov, data=sampleData)
fit <- lmFit(geneExprData, design)
# Pairwise level comparisons, plus the covariate's coefficient on its own
cont.matrix <- makeContrasts(Factor1=FactorLevel2-FactorLevel1,
                             Factor2=FactorLevel3-FactorLevel2,
                             Factor3=FactorLevel1-FactorLevel3,
                             NumericCov=NumericCov,
                             levels=design)
fit <- contrasts.fit(fit, cont.matrix)
fit <- eBayes(fit)
topTable(fit, coef="Factor1")
topTable(fit, coef="Factor2")
topTable(fit, coef="Factor3")
topTable(fit, coef="NumericCov")
Is this correct? Or should I just not use a contrast matrix for the analysis of the effect of numeric covariates?
If I do not use the makeContrasts function, it is more difficult to look at the differences between all the levels of the factor (which I need to do).
So if this is not correct, is there nevertheless a way to define the contrasts in order to do both parts of the analysis at once?
Thanks in advance!
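In makeContrasts(), each contrast is a column of weights on the coefficients of the design matrix. The base-R sketch below (hypothetical data, with lm.fit standing in for limma's lmFit on a single gene) builds the same contrast matrix by hand, to show that the NumericCov "contrast" simply selects the covariate's coefficient, so listing it next to the factor comparisons leaves its estimate unchanged. Note also that the three factor contrasts are linearly dependent (they sum to zero).

```r
# Hypothetical sample sheet: a 3-level factor and one numeric covariate
set.seed(42)
sampleData <- data.frame(
  Factor = factor(rep(c("Level1", "Level2", "Level3"), each = 4)),
  NumericCov = runif(12, min = 20, max = 60)
)

# Same design as in the question: one column per factor level, plus the slope
design <- model.matrix(~ 0 + Factor + NumericCov, data = sampleData)
colnames(design)  # "FactorLevel1" "FactorLevel2" "FactorLevel3" "NumericCov"

# Hand-built equivalent of the makeContrasts() call: rows = coefficients,
# columns = contrasts, entries = weights on each coefficient
cont.matrix <- cbind(
  Factor1    = c(-1,  1,  0, 0),  # FactorLevel2 - FactorLevel1
  Factor2    = c( 0, -1,  1, 0),  # FactorLevel3 - FactorLevel2
  Factor3    = c( 1,  0, -1, 0),  # FactorLevel1 - FactorLevel3
  NumericCov = c( 0,  0,  0, 1)   # the covariate's coefficient on its own
)
rownames(cont.matrix) <- colnames(design)

# One simulated gene, fitted by least squares (lm.fit stands in for lmFit)
expr <- 5 + as.integer(sampleData$Factor) + 0.02 * sampleData$NumericCov +
  rnorm(12, sd = 0.1)
coefs <- lm.fit(design, expr)$coefficients

# Contrast estimates are t(C) %*% beta, as contrasts.fit() computes per gene
estimates <- drop(t(cont.matrix) %*% coefs)
estimates
```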

Bivariate partial dependence with randomForest in R

I have a dataset with a binary dependent variable and a number of predictors, including participant. I am trying to examine the idiosyncratic effects of different predictors for different participants. In order to do that, I'm trying to look at the effect of interactions between participant id and the other predictors on the dependent variable. I'm using randomForest in R. I can fit the forest successfully, and can produce partial dependence plots for individual variables. What I need, however, are partial dependence plots for pairs of variables - participant + others. Is this possible?
For reference, my code:
library(randomForest)
data_sample <- data_raw[sample(1:nrow(data_raw), 500, replace = FALSE), ]
test_rf <- randomForest(perceptually.rhotic ~ vowel + speaker + modified_clip_start +
                          function_word + year_of_birth + gender + fathers_job_type + prepausal,
                        data = data_sample, ntree = 500, mtry = 3)
partialPlot(test_rf, pred.data = data_sample, x.var = "speaker")
# What I am after, which partialPlot does not support:
# partialPlot(test_rf, pred.data = data_sample, x.var = c("speaker", "vowel"))
Thanks very much in advance for any advice anyone can offer!
The plotmo R package will plot partial dependencies for all variables and pairs of variables (bivariate dependencies) for "any" model. For example:
library(randomForest)
data(trees)
mod <- randomForest(Volume~., data=trees)
library(plotmo)
plotmo(mod, pmethod="partdep") # plot partial dependencies
which gives a grid of partial dependence plots (image not included).
You can specify exactly which variable and variable pairs get plotted using plotmo's all1, all2, degree1, and degree2 arguments. Additional examples are in the vignette for the plotmo package.
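If you would rather see what is being computed, a two-variable partial dependence is just the model's average prediction over the data with both variables fixed at each point of a grid. The base-R sketch below is model-agnostic; the function name partial_dependence_2d is hypothetical, and lm() on the built-in trees data stands in for a random forest so that the example runs without extra packages.

```r
# Two-variable partial dependence: for each grid point (v1, v2), set those two
# columns to the grid values in every row and average the model's predictions.
partial_dependence_2d <- function(model, data, v1, v2, n = 10,
                                  pred_fun = function(m, d) predict(m, newdata = d)) {
  # Factors use their levels; numeric variables use n evenly spaced values
  grid_values <- function(x) {
    if (is.factor(x)) levels(x) else seq(min(x), max(x), length.out = n)
  }
  grid <- expand.grid(grid_values(data[[v1]]), grid_values(data[[v2]]),
                      stringsAsFactors = FALSE)
  names(grid) <- c(v1, v2)
  grid$yhat <- vapply(seq_len(nrow(grid)), function(i) {
    d <- data
    for (v in c(v1, v2)) {
      val <- rep(grid[[v]][i], nrow(d))
      # Keep factor columns as factors with the original levels
      d[[v]] <- if (is.factor(data[[v]])) factor(val, levels = levels(data[[v]])) else val
    }
    mean(pred_fun(model, d))
  }, numeric(1))
  grid
}

# Stand-in model on a built-in data set (a random forest is passed the same way)
fit <- lm(Volume ~ Girth * Height, data = trees)
pd <- partial_dependence_2d(fit, trees, "Girth", "Height", n = 5)
head(pd)

# Reshape for a surface plot: rows follow Girth, columns follow Height
z <- matrix(pd$yhat, nrow = 5)
persp(unique(pd$Girth), unique(pd$Height), z,
      xlab = "Girth", ylab = "Height", zlab = "partial dependence", theta = 30)
```

For a classification forest like the one in the question, predict() returns class labels by default, so one would pass something like pred_fun = function(m, d) predict(m, newdata = d, type = "prob")[, 2] (which probability column you want is an assumption about your class coding). With speaker as a factor, the grid runs over its levels.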

hurdle models using continuous data and covariates

I was wondering if I could get some advice about fitting hurdle models using continuous data and covariates.
I have some continuous data that are generally well fit by a right-skewed distribution such as a Pareto, Gamma, or Weibull distribution. However, there are several zeros in my data which are important to my analysis. In addition, I have some categorical (two-level) covariates and would like to model the parameters of a distribution as a function of these covariates in order to formally evaluate their importance (e.g., using AIC).
I have seen examples of hurdle models fit using continuous data but have not yet found any examples of how to incorporate covariates and a model-selection framework. Does anyone have any suggestions as to how to proceed or know of any R packages that allow this procedure? I have included some code below to reproduce the type of data I am working with. The non-zero data are generated via a generalized Pareto distribution from the package texmex. The parameters were estimated directly from my non-zero data. I have also included the code to plot the data in a histogram to see their distribution.
library("texmex")
set.seed(101)
zeros <- rep(0,8)
non_zeros <- rgpd(17, sigm=exp(-10.4856), xi=0.1030, u = 0)
all.data <- c(zeros,non_zeros)
hist(non_zeros,breaks=50,xlim=c(0,0.00015),ylim=c(0,9),main="",xlab="",
col="gray")
hist(zeros,add=TRUE,col="black",breaks=100,xlim=c(0,0.00015),ylim=c(0,9))
legend("topright",legend=c("zeros"),col="black",lwd=8)
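One common way to do this is a two-part (hurdle) model: a logistic regression for zero versus non-zero, plus a model for the positive values only. Because the likelihood factorizes into those two parts, the AIC of the whole hurdle model is the sum of the two parts' AICs, so covariates can be compared with ordinary glm() fits. The sketch below swaps in a Gamma GLM for the positive part (not the generalized Pareto used above) and uses simulated data with a hypothetical two-level covariate grp; the helper fit_hurdle is also hypothetical.

```r
# Simulated, hypothetical data: 200 observations, two-level covariate grp
set.seed(101)
n <- 200
grp <- factor(rep(c("A", "B"), each = n / 2))
is_pos <- rbinom(n, size = 1, prob = ifelse(grp == "A", 0.7, 0.5))
y <- ifelse(is_pos == 1,
            rgamma(n, shape = 2, rate = ifelse(grp == "A", 1, 0.5)),
            0)
dat <- data.frame(y = y, grp = grp, pos = as.integer(y > 0))

# Part 1: zero vs non-zero (logistic regression on all rows)
# Part 2: size of the non-zero values (Gamma GLM, log link, positive rows only)
fit_hurdle <- function(zero_formula, pos_formula, data) {
  m_zero <- glm(zero_formula, family = binomial, data = data)
  m_pos <- glm(pos_formula, family = Gamma(link = "log"),
               data = data[data$y > 0, ])
  list(zero = m_zero, pos = m_pos, aic = AIC(m_zero) + AIC(m_pos))
}

# Candidate models: covariate in neither part, in the zero part only, in both
h_null <- fit_hurdle(pos ~ 1, y ~ 1, dat)
h_zero <- fit_hurdle(pos ~ grp, y ~ 1, dat)
h_both <- fit_hurdle(pos ~ grp, y ~ grp, dat)
c(null = h_null$aic, zero_only = h_zero$aic, both = h_both$aic)  # lower is better
```

Other positive-valued distributions can be swapped in for the second part in the same way, as long as their fits report a log-likelihood on the positive observations.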

Running predict() after tobit() in package AER

I am doing a tobit analysis on a dataset where the dependent variable (let's call it y) is left-censored at 0. So this is what I do:
library(AER)
fit <- tobit(data=mydata,formula=y ~ a + b + c)
This is fine. Now I want to run the "predict" function to get the fitted values. Ideally I am interested in the predicted values of the unobserved latent variable "y*" and the observed censored variable "y" [See Reference 1].
I checked the documentation for predict.survreg [Reference 2] and I don't think I understood which option gives me the predicted censored variables (or the latent variable).
Most examples I found online advise the following:
predict(fit, type="response")
Again, it's not clear what kind of predictions these are.
My guess is that the "type" option in the predict function is the key here, with type="response" meant for the censored variable predictions and type="linear" meant for latent variable predictions.
Can someone with experience here shed some light on this, please?
Many thanks!
References:
1. http://en.wikipedia.org/wiki/Tobit_model
2. http://astrostatistics.psu.edu/datasets/2006tutorial/html/survival/html/predict.survreg.html
Generally, type="response" predictions are back-transformed to the original scale of the data from whatever transformation the model used, whereas type="linear" predictions are the linear predictor on the link-transformed scale. For tobit, which has an identity link, the two should be the same.
You can check my meta-prediction easily enough. I just checked it with the example on the ?tobit page:
plot(predict(fm.tobit2, type="response"), predict(fm.tobit2,type="linear"))
I posted a similar question on stats.stackexchange and I got an answer that could be useful for you:
https://stats.stackexchange.com/questions/149091/censored-regression-in-r
There, one of the authors of the package shows how to calculate the mean (i.e., the prediction) of $Y$, where $Y = \max(Y^*, 0)$. With the AER package this has to be done somewhat "by hand".
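For a tobit model left-censored at 0 with latent $Y^* \sim N(\mu, \sigma^2)$, the mean of the observed $Y = \max(Y^*, 0)$ is $\Phi(\mu/\sigma)\,\mu + \sigma\,\phi(\mu/\sigma)$, while the latent mean is just $\mu$ (the linear predictor). A base-R sketch of that formula, allowing a left limit other than 0; the helper name tobit_expected_y is hypothetical.

```r
# E[Y] for Y = max(Y*, left) with Y* ~ N(mu, sigma^2):
#   left * P(Y* <= left) + mu * P(Y* > left) + sigma * phi((mu - left) / sigma)
tobit_expected_y <- function(mu, sigma, left = 0) {
  z <- (mu - left) / sigma
  left * pnorm(-z) + mu * pnorm(z) + sigma * dnorm(z)
}

tobit_expected_y(0, 1)  # equals dnorm(0), about 0.399

# Latent vs observed means for a few linear-predictor values
mu <- c(-2, 0, 2)
cbind(latent = mu, observed = tobit_expected_y(mu, sigma = 1))

# Monte Carlo check of the formula
set.seed(7)
sim <- pmax(rnorm(1e5, mean = 0.5, sd = 2), 0)
c(simulated = mean(sim), formula = tobit_expected_y(0.5, 2))
```

With an AER fit such as fit from the question, mu would be predict(fit, type = "lp") and sigma would be fit$scale. This assumes censoring on the left only; a model that is also censored on the right needs an extra term.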
