Order lattice panel by regression intercept - r

Example dataset here
Let us build a simple lattice plot from this data for linear regression, with separate panels for each Subject
xyplot(Measurement~HOL|Subject,groups=Treatment,data=Data,
type=c('p','r'),auto.key=T,aspect="xy")
The issue is, I would like to visually inspect if the slopes-and-intercepts are correlated. Thus, I would like to order the panels by linear-regression intercept as opposed to by Subject (this was done in Douglas Bates' book "lme4: Mixed-effects modeling with R" Figure 3.1, but I cannot find example code). I know I can change the order of panels by hand by adding
index.cond=list(c(1,2,3, etc))
But this is extraordinarily inefficient, especially since I would like to do this for multiple response variables.
Does anyone have an automated way to do this? I am also open to attempting this in ggplot2 if it has any built-in functions, but as I understand it, there is no easy way there to bank the aspect ratio to 45 degrees, as
aspect="xy"
does in lattice.
Thank you in advance for any thoughts

If you want to order by regression intercept, it's best to run the regression. For example, with your data we can do
cf<-sapply(Data$Subject, function(x)
coef(lm(Measurement~HOL, data=subset(Data, Subject==x))))
which gives an intercept (first row of cf) and a slope (second row) for the subject of each observation. We can then create a new factor of Subjects ordered by the intercept with
Sx<-reorder(Data$Subject, cf[1,])
and then use that variable as the conditioning variable in the plot (Treatment remains the grouping variable)
xyplot(Measurement~HOL|Sx,groups=Treatment,data=Data,
type=c('p','r'),auto.key=T,aspect="xy")
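And in ggplot2 you can fix the ratio of y-axis units to x-axis units with +coord_fixed(ratio=1); note that this fixes the scale ratio rather than banking the lines to 45 degrees as aspect="xy" does.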
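The steps above can be sketched end to end. The example dataset is not reproduced here, so the data below are simulated stand-ins that only reuse the question's column names (Subject, HOL, Measurement, Treatment). This variant fits one regression per subject with split() rather than one per row; treat it as an illustration under those assumptions, not as the original poster's data or the only way to do it.

```r
library(lattice)

set.seed(1)
# Hypothetical data: 8 subjects, 6 HOL values each (names follow the question)
Data <- expand.grid(HOL = 1:6, Subject = factor(paste0("S", 1:8)))
idx <- as.integer(Data$Subject)
true_int <- rnorm(8, mean = 10, sd = 3)
true_slope <- rnorm(8, mean = 1, sd = 0.3)
Data$Measurement <- true_int[idx] + true_slope[idx] * Data$HOL +
  rnorm(nrow(Data), sd = 0.5)
Data$Treatment <- factor(c("A", "B")[(idx %% 2) + 1])

# One regression per subject; keep only the fitted intercept
intercepts <- sapply(split(Data, Data$Subject), function(d) {
  coef(lm(Measurement ~ HOL, data = d))[["(Intercept)"]]
})

# Re-level Subject so that the levels run from smallest to largest intercept
Data$Sx <- factor(Data$Subject, levels = names(sort(intercepts)))

# lattice lays out panels in factor-level order
p <- xyplot(Measurement ~ HOL | Sx, groups = Treatment, data = Data,
            type = c("p", "r"), auto.key = TRUE, aspect = "xy")
print(p)
```

To repeat this for several response variables, the intercept step can be wrapped in a function that takes the response name and builds the formula with reformulate().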

Related

Finding how a variable affects the output of a time-series random-forest regression model

I created a random-forest regression model in R for time-series data that has three predictors and one output variable.
Is there a way to find (perhaps in more absolute terms) how changes in a specific variable affect the prediction output?
I know about variable importance, but I am not trying to find the variables that have the biggest effect. Instead, I am trying to see how the prediction output would change if I pick input variable X_1 and increase (or decrease) its value.
Does it even make sense to do this? Is it even possible with a random-forest model? Rereading my question a few times made me doubtful, but any insight or recommendation would be greatly appreciated.
I would guess what this question is actually about is called exploratory data analysis (EDA). For starters, I would calculate the correlations between the variables to get a feeling for the strength of the [linear] relationship between two variables. Further, I would look at scatter plots between the variables to get a feeling for the relationships. Depending on the variables [linear] regression could tell how an increase in variable x1 would affect variable x2.
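As a rough illustration of the exploratory steps just described, the sketch below uses the built-in mtcars data as a stand-in (the asker's time-series data are not shown, so the variable names here are assumptions). It computes pairwise correlations, draws scatter plots, and fits a simple linear regression whose slope estimates the change in one variable per unit increase in another.

```r
# Stand-in data: mpg plays the role of the output, wt/hp/disp the predictors
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Pairwise (linear) correlations between all variables
cor_mat <- cor(vars)
print(round(cor_mat, 2))

# Scatter-plot matrix to look at the shape of each relationship
pairs(vars)

# Simple linear regression: the wt coefficient is the estimated change
# in mpg for a one-unit increase in wt
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

These summaries describe linear, pairwise relationships in the data; they do not describe what the fitted random forest itself has learned.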

Integrating a multiple regression equation (including covariates) into ggplot2 graph

I am currently generating simple linear regression plots in ggplot2 with the following code (toy example)
library(ggplot2)
Data<-data.frame(Age=c(40,41,42,43,44,45,46,47,48,49),
Height=c(185,184,183,182,181,186,187,188,189,190),
Sex=c("Male","Male","Male","Male","Male","Female","Female","Female","Female","Female"),
Weight=c(84,83,82,81,80,85,86,87,88,89),
BMI=c(20,21,22,23,24,25,26,27,28,29))
points<-c("#FF9999","turquoise")
lines<-c("dark red","blue")
linetypes<-c(1,1)
x<-ggplot(Data,aes(x=Age,y=Weight))+
geom_point(aes(fill=factor(Sex)),size=4,alpha=1,shape=21,color="transparent")+
geom_smooth(aes(color=Sex,linetype=Sex),method="lm",formula=y~x,lwd=1,se=F)+
scale_fill_manual(values=points,labels=c("Female","Male"))+
scale_colour_manual(values=lines)+
scale_linetype_manual(values=linetypes)+
theme_classic(base_size=15)+
theme(legend.position="top",legend.direction="horizontal")+
labs(fill="Points",colour="Lines",x='Age',y='Weight')+
guides(fill=guide_legend(override.aes=list(shape=c(16,16),alpha=1,size=4,color=c("#FF9999","turquoise"))))+
guides(colour=guide_legend(override.aes=list(linetype=c(1,1),alpha=1,size=1,color=c("dark red","blue"))))+
guides(linetype="none")
x
I am stuck as to how to add one or more covariates in order to turn the plots from simple into multiple linear regression, e.g., from
Weight~Age to Weight~Age+Height+BMI
I am sure that it must have something to do with the method="lm" and formula=y~x arguments, but I am not clear on exactly what changes need to be made.
Any help would be very much appreciated. Thank you.

Is there an R function for creating an interaction plot of a panelAR model?

In order to strengthen the interpretation of an interaction term I would like to create an interaction plot.
Starting Point: I am analyzing a panel data frame with which I fitted a feasible generalized least squares model by using the panelAR function. It includes an interaction term of two continuous variables.
What I want to do: To create an interaction plot, e.g. following the style of “plot_model” from the package sjPlot (see Three-Way-Interactions: link).
Problem: I could neither find any package which supports the type of my model nor a different way to get a plot.
Question: Is there any workaround which can be used for obtaining an interaction plot or even a package which supports a panelAR model?
Since I am quite new to R I would appreciate every kind of help. Thank you very much

Did I just do an ANCOVA or MANOVA?

I’m trying to do an ANCOVA here ...
I want to analyze the effect of EROSION FORCE and ZONATION on all the species (listed with small letters) in each POOL.STEP (ranging from 1-12/1-4), while controlling for the effect of FISH.
I’m not sure if I’m doing it right. What is the command for ANCOVA?
So far I used lm(EROSIONFORCE~ZONATION+FISH,data=d), which yields:
So what I see here is that both erosion force percentage (intercept?) and sublittoral zonation are significant in some way, but I’m still not sure if I’ve done an ANCOVA correctly here or is this just an ANOVA?
In general, ANCOVA (analysis of covariance) is simply a special case of the general linear model with one categorical predictor (factor) and one continuous predictor (the "covariate"), so lm() is the right function to use.
However ... the bottom line is that you have a moderately challenging statistical problem here, and I would strongly recommend that you try to get local help (if you're working within a research group, can you consult with others in your group about appropriate methods?). I would suggest following up either on CrossValidated or r-sig-ecology@r-project.org
by putting EROSIONFORCE on the left side of the formula, you're specifying that you want to use EROSIONFORCE as a response (dependent) variable, i.e. your model is estimating how erosion force varies across zones and for different fish numbers - nothing about species response
if you want to analyze the response of a single species to erosion and zone, controlling for fish numbers, you need something like
lm(`Acmaeidae s...` ~ EROSIONFORCE+ZONATION+FISH, data=your_data)
the lm() suggestion above would do each species independently, i.e. you'd have to do a separate analysis for each species. If you also want to do it separately for each POOL.STEP you're going to have to do a lot of separate analyses. There are various ways of automating this in R; the most idiomatic is probably to melt your data (see reshape2::melt or tidyr::gather) into long format and then use lmList from lme4.
since you have count data with low means, i.e. lots of zeros (and a few big values), you should probably consider a Poisson or negative binomial model, and possibly even a zero-inflated/hurdle model (i.e. analyze presence-absence and size of positive responses separately)
if you really want to analyze the joint distribution of all species (i.e., a response of a multivariate analysis, which is the M in MANOVA), you're going to have to work quite a bit harder ... there are a variety of joint species distribution models by people like Pierre Legendre, David Warton and others ... I'd suggest you try starting with the mvabund package, but you might need to do some reading first
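The per-species workflow described in the points above can be sketched as follows. The data are simulated, the species columns sp_a and sp_b are hypothetical names, and base R split()/lapply() is swapped in for reshape2::melt and lme4::lmList so that the example has no package dependencies. It fits one Poisson GLM per species, following the suggestion for low-mean count data.

```r
set.seed(42)
n <- 60
# Hypothetical wide-format data: predictors plus one count column per species
d <- data.frame(
  EROSIONFORCE = runif(n, 0, 100),
  ZONATION = factor(sample(c("littoral", "sublittoral"), n, replace = TRUE)),
  FISH = rpois(n, 3)
)
d$sp_a <- rpois(n, exp(0.5 + 0.01 * d$EROSIONFORCE))
d$sp_b <- rpois(n, exp(1.0 - 0.1 * d$FISH))

# Reshape to long format: one row per observation per species
species <- c("sp_a", "sp_b")
long <- do.call(rbind, lapply(species, function(s) {
  data.frame(d[c("EROSIONFORCE", "ZONATION", "FISH")],
             species = s, count = d[[s]])
}))

# One Poisson regression per species: response to erosion and zone,
# controlling for fish numbers
fits <- lapply(split(long, long$species), function(dd) {
  glm(count ~ EROSIONFORCE + ZONATION + FISH, family = poisson, data = dd)
})

# Coefficients side by side, one row per species
coef_table <- t(sapply(fits, coef))
print(round(coef_table, 3))
```

Splitting on interaction(species, POOL.STEP) instead would give one fit per species and pool step, provided each subset has enough observations to estimate the model.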

Function to plot model with one variable varying and others constant

It's simple, but I can't remember what this procedure is called, hence I was not able to find the function to do so. I want to explore the effects and gradients of a simple lm() model by plotting the response to one variable at a time, the others being kept constant.
Can anybody tell me which function to use to do so? I seem to remember it's a function generating several plots, or something like this. It could be something akin to sensitivity analysis... Sorry for the beginner question.
Thank you in advance!
The car package has a lot of utilities for analyzing regression models. This sounds like a component+residual plot (or partial residuals plot).
library(car) # for crPlots(...)
fit <- lm(mpg~wt+hp+disp, mtcars)
crPlots(fit)
As noted in the comments, termplot(...) does basically the same thing.
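For comparison, here is a minimal base-R sketch of the same idea without the car package. It first builds the "one variable varying, others constant" curve by hand with predict(), holding hp and disp at their sample means (that choice of reference values is an assumption; medians or other values would work too), and then calls termplot() with partial residuals.

```r
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Grid where wt varies over its observed range and the other
# predictors are held constant at their means
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp = mean(mtcars$hp),
  disp = mean(mtcars$disp)
)
grid$mpg_hat <- predict(fit, newdata = grid)

# Predicted response as a function of wt alone
plot(grid$wt, grid$mpg_hat, type = "l",
     xlab = "wt", ylab = "predicted mpg (hp, disp at their means)")

# Base-R term plots with partial residuals, one plot per model term
termplot(fit, partial.resid = TRUE, se = TRUE)
```

Because the model is linear with no interactions, the hand-built curve is a straight line whose slope equals the fitted wt coefficient; the same grid approach carries over to models where that is not the case.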
