Quadratic quantile regression in R - r

Does anyone know how to fit a quadratic (or higher order) model on a continuous variable and do quantile regression on it in R? Additionally, how do you tell what level of tau fits the data better?
The values for "den" are fish densities (count/m^3) and salinity = salinity (ppt). The full dataset is 1500 observations and I'd like to predict fish density using salinity. A plot with all the data looks semi-quadratic, but I'd like to compare that fit to others using quantile regression. I just can't figure out how to make the relationship in the model non-linear. Is it den ~ salinity + salinity^2?
df <- structure(list(den = c(0, 12, 8.33, 5, 0, 0, 1, 1.33, 0, 3), salinity = c(37, 35, 36, 39, 36, 37, 35, 38, 36, 37)), row.names = c(86L,
240L, 394L, 548L, 702L, 856L, 1010L, 1164L, 1318L, 1472L), class = "data.frame")
library(quantreg)
quantreg75 <- rq(den ~ salinity, data=df, tau=0.75)
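A minimal sketch of one way to write the quadratic term, assuming the quantreg package and the ten-row sample above. In an R formula, salinity^2 on its own is read as formula crossing and reduces to salinity, so the square has to be written as I(salinity^2) or poly(salinity, 2). Note also that tau is not a tuning parameter: each tau estimates a different conditional quantile, so AIC is more naturally used to compare the linear and quadratic forms at the same tau than to pick a "best" tau. The names fit_lin, fit_quad and aic_values are illustrative only.

```r
library(quantreg)

# Sample data from the question, rebuilt as a plain data frame
df <- data.frame(
  den      = c(0, 12, 8.33, 5, 0, 0, 1, 1.33, 0, 3),
  salinity = c(37, 35, 36, 39, 36, 37, 35, 38, 36, 37)
)

# Linear and quadratic fits at the same quantile (tau = 0.75).
# poly(salinity, 2) uses orthogonal polynomials; the fitted curve is the
# same as with den ~ salinity + I(salinity^2), but better conditioned.
fit_lin  <- rq(den ~ salinity,          data = df, tau = 0.75)
fit_quad <- rq(den ~ poly(salinity, 2), data = df, tau = 0.75)

# Compare the two functional forms at this tau (lower AIC is preferred)
aic_values <- c(linear = AIC(fit_lin), quadratic = AIC(fit_quad))
print(aic_values)
```

With only ten rows the fit may warn that the solution is nonunique; the full 1500-observation dataset should be less prone to that.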

Multilevel modeling for repeated measures data - time and lagged variables

I'm very new to multilevel modeling and to analyzing repeated-measures data. I am trying to figure out whether my model, fitted with lmer() from the lme4 package, is set up correctly to answer the question I want to answer. I want to see if ius moderates the relationship between na and worry.
Variables
id - subject id
count - time variable; day of collection
worry - outcome (collected daily, continuous variable)
na - predictor (collected daily, continuous variable)
ius - moderator (collected at baseline, continuous variable)
I also created lag variables for na (lag_na) and worry (lag_worry) so I can control for the previous day's na and worry, though I'm not sure if this was the right thing to do.
Here is my code:
library(lme4)
# Here's an example dataset:
Dataset <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), levels = c("1", "2", "3",
"5"), class = "factor"), count = c(1, 2, 3, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 5, 6), na = c(1, 0, 0, 18, 13, 3, 5, 29, 15,
19, 21, 3, 5, 2, 2, 18, 19), worry = c(0, 1, 0, 0, 0, 0, 0, 2,
2, 1, 2, 0, 0, 3, 0, 4, 3), ius = c(35, 35, 35, 65, 65, 65, 65,
44, 44, 44, 44, 53, 53, 53, 53, 53, 53), lag_na = c(NA, 1, 0,
NA, 18, 13, 3, NA, 29, 15, 19, NA, 3, 5, 2, 2, 18), lag_worry = c(NA,
0, 1, NA, 0, 0, 0, NA, 2, 2, 1, NA, 0, 0, 3, 0, 4)), row.names = c(NA,
-17L), groups = structure(list(id = structure(1:4, levels = c("1",
"2", "3", "5"), class = "factor"), .rows = structure(list(1:3,
4:7, 8:11, 12:17), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
model <- lmer(worry ~ na*ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
For a variable to be an "effect moderator" (at least as the term is used in epidemiologic discussion) there would need to be a material change in the predictions between models with and without the interaction term. You have a model with an interaction between ius and na:
> model <- lmer(worry ~ na*ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
> model
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: worry ~ na * ius + lag_na + lag_worry + count + (1 | id)
Data: Dataset
AIC BIC logLik deviance df.resid
49.0113 54.0958 -15.5056 31.0113 4
Random effects:
Groups Name Std.Dev.
id (Intercept) 0.7525
Residual 0.6096
Number of obs: 13, groups: id, 4
Fixed Effects:
(Intercept) na ius lag_na lag_worry count na:ius
2.185346 -0.169745 -0.092891 0.128599 -0.871304 1.004783 0.003021
# Now remove the interaction term
> model <- lmer(worry ~ na + ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
> model
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: worry ~ na + ius + lag_na + lag_worry + count + (1 | id)
Data: Dataset
AIC BIC logLik deviance df.resid
47.4562 51.9758 -15.7281 31.4562 5
Random effects:
Groups Name Std.Dev.
id (Intercept) 0.7212
Residual 0.6325
Number of obs: 13, groups: id, 4
Fixed Effects:
(Intercept) na ius lag_na lag_worry count
1.474439 -0.006056 -0.076298 0.122280 -0.840278 0.951945
From what I can see there is almost no change in measures of global fit (AIC, BIC or deviance). Do you want to proceed further in determining what the differences in predictions are with such a small dataset? There would be a difference in the predictions between these two models, but there seems to be little evidence that they would be considered "material". The method of examining what the data shows versus the respective models is described in this post to the stats.SE forum: https://stats.stackexchange.com/questions/33059/testing-for-moderation-with-continuous-vs-categorical-moderators/33090#33090
Make a scatterplot with worry on the y-axis and na on the x-axis. Then, for the non-interaction model, plot the single line at the mean of ius. You're going to find some difficulty in doing this sensibly because the values of ius are not at all normally distributed. I discovered this when I went to color the points in a scatterplot:
findInterval(Dataset$ius, c(30,45, 52, 66))
[1] 1 1 1 3 3 3 3 1 1 1 1 3 3 3 3 3 3
> table(Dataset$ius)
35 44 53 65
3 4 6 4
When you plot the points with the four groups, you find that the ranges of the outcome and the predictor within groups of identical ius measures are much smaller than the full dataset ranges. It really makes little sense to use an interaction model with continuous variables in this setting:
png(); plot(worry~jitter(na,3), Dataset, col=2+findInterval(Dataset$ius, c(30,36, 52, 56, 66))); dev.off()
So I see two compelling reasons not to use this analysis as evidence for effect moderation. Whether you want to build a categorical prediction model might be determined by how much more data could be gathered. This seems to be a pretty sparse dataset for any conclusions, but there is a suggestion of some sort of grouping effect.
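The comparison of global fit described above can also be expressed as a likelihood-ratio test. The sketch below assumes lme4 and rebuilds the example data as a plain data frame; model_add, model_int and cmp are illustrative names (the transcript above reuses model for both fits). From the printed deviances the difference is 31.4562 - 31.0113 = 0.4449 on one degree of freedom, which is the quantity anova() tests.

```r
library(lme4)

# Example data from the question, rebuilt without the grouped_df wrapper
Dataset <- data.frame(
  id        = factor(rep(c("1", "2", "3", "5"), times = c(3, 4, 4, 6))),
  count     = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6),
  na        = c(1, 0, 0, 18, 13, 3, 5, 29, 15, 19, 21, 3, 5, 2, 2, 18, 19),
  worry     = c(0, 1, 0, 0, 0, 0, 0, 2, 2, 1, 2, 0, 0, 3, 0, 4, 3),
  ius       = rep(c(35, 65, 44, 53), times = c(3, 4, 4, 6)),
  lag_na    = c(NA, 1, 0, NA, 18, 13, 3, NA, 29, 15, 19, NA, 3, 5, 2, 2, 18),
  lag_worry = c(NA, 0, 1, NA, 0, 0, 0, NA, 2, 2, 1, NA, 0, 0, 3, 0, 4)
)

# Additive model and interaction model, both fitted by maximum likelihood
model_add <- lmer(worry ~ na + ius + lag_na + lag_worry + count + (1 | id),
                  REML = FALSE, data = Dataset)
model_int <- lmer(worry ~ na * ius + lag_na + lag_worry + count + (1 | id),
                  REML = FALSE, data = Dataset)

# Likelihood-ratio test for the na:ius interaction term
cmp <- anova(model_add, model_int)
print(cmp)
```

With 13 usable rows and 4 subjects, fitting warnings are possible and the test has very little power, so this is a formal restatement of the comparison rather than additional evidence.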

How do I plot 3 variables along the x-axis in R plot?

How do I label the x-axis as actual, knn, and pca and plot its respective values along the y-axis?
dat.pca.knn <- rbind(actual, knn, pca)
plot(c(1, ncol(dat)),range(dat.pca.knn),type="n",main="Profile plot of probeset 206054_at\nwith actual and imputed values of GSM146784_Normal",xlab="Actual vs imputed values",ylab="Expression Intensity",axes=F,col="red")
axis(side=1,at=1:length(dat.pca.knn),labels=dimnames(dat.pca.knn)[[2]],cex.axis=0.4,col="1",las=2,tick=T)
axis(side=2)
for(i in 1:length(dat.pca.knn)) {
dat.y <- as.numeric(dat.pca.knn[i,])
lines(c(1:ncol(dat.pca.knn)),dat.y,lwd=2,type="p",col="1")
}
Data
dput(dat[1:2,])
structure(c(1942.1, 40.1, 2358.3, 58.2, 2465.2, 132.6, 2732.9,
64.3, 1952.2, 66.1, 2048.3, 69, 2109, 109.7, 3005.1, 59.4, 2568.1,
81.7, 2107.7, 100.8, 1940.2, 170.1, 2608.8, 186.7, 1837.2, 103.8,
1559.2, 86.8, 2111.6, 86, 2641, 152.7, 1972.7, 124.8, 1737.2,
115, 1636.1, 202.1, 2718.4, 257.3), .Dim = c(2L, 20L), .Dimnames = list(
c("1007_s_at", "1053_at"), c("GSM146778_Normal", "GSM146780_Normal",
"GSM146782_Normal", "GSM146784_Normal", "GSM146786_Normal",
"GSM146789_Normal", "GSM146790_Normal", "GSM146792_Normal",
"GSM146794_Normal", "GSM146796_Normal", "GSM146779_Tumor",
"GSM146781_Tumor", "GSM146783_Tumor", "GSM146785_Tumor",
"GSM146787_Tumor", "GSM146788_Tumor", "GSM146791_Tumor",
"GSM146793_Tumor", "GSM146795_Tumor", "GSM146797_Tumor")))
dat.pca.knn
> print(dat.pca.knn)
[,1]
actual 8385.300
knn 7559.533
pca 10418.002
You probably need a barplot, if I understand you correctly.
# just to recreate your data
dat.pca.knn <- dplyr::tribble(
~actual, ~knn, ~pca,
8385.300, 7559.533, 10418.002
)
with(
dat.pca.knn,
barplot(height = c(actual, knn, pca),
names.arg = c("actual", "knn", "pca"))
)

Problem with post-hoc emmeans() test after lmerTest

I have a dataset looking at a response variable (Fat %), over time (Week 0-4), and over a treatment condition -- short vs long day.
I fitted an lmer model to test whether the variables and their interaction term were significant, and the test came out significant. I want to look further at the interaction term (so basically a Tukey test, but still accounting for the repeated measures). That's when I started to use the emmeans package, and it is not giving me the full output I would like. I would love any suggestions, thank you.
here is my data set:
structure(list(`Bird ID` = c(61, 62, 71, 72, 73, 76, 77, 63,
64, 69), Day = c("long", "long", "long", "long", "long", "long",
"long", "short", "short", "short"), Week = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `Body Weight` = c(34.57, 41.05, 37.74, 37.04, 33.38,
35.6, 31.88, 34.32, 35.5, 35.78), `Fat %` = c(2.42718446601942,
2.07515423443634, 11.7329093799682, 8.61137591356848, 5.36031238906638,
7.9879679144385, 1.2263099219621, 5.17970401691332, 8.73096446700508,
3.62993896562801), `Lean %` = c(97.5728155339806, 97.9248457655636,
88.2670906200318, 91.3886240864315, 94.6396876109336, 92.0120320855615,
98.7736900780379, 94.8202959830867, 91.2690355329949, 96.370061034372
), `Fat(g)` = c(0.7, 0.74, 3.69, 2.71, 1.51, 2.39, 0.33, 1.47,
2.58, 1.13), `Lean(g)` = c(28.14, 34.92, 27.76, 28.76, 26.66,
27.53, 26.58, 26.91, 26.97, 30), ID = c(1, 2, 3, 4, 5, 6, 7,
8, 9, 10)), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
code I have tried:
model:
model3b <- lmer( `Fat %` ~ Day + Week + Day:Week + (1|ID), data=jussara_data)
summary(model3b)
resp <- jussara_data$`Fat %`
f1 <- jussara_data$Week
f2 <- jussara_data$Day
fit1 = lm(log(resp) ~ f1 + f2 + f1:f2, data = jussara_data)
emm1 = emmeans(fit1, specs = pairwise ~ f1:f2)
emm1$emmeans
emm1$contrasts
I was hoping the contrasts would give me a summary looking something like this (but I need the repeated measures included, not just this ANOVA analysis):
Fat % groups
4:short 32.065752 a
3:short 27.678036 a
2:short 21.358485 b
4:long 13.895404 c
1:short 13.138941 c
2:long 12.245741 c
3:long 12.138498 c
1:long 10.315978 cd
0:short 6.134327 d
0:long 5.631602 d
but instead only gave me this:
f1 f2 emmean SE df lower.CL upper.CL
2 long 2.24 0.0783 66 2.09 2.40
2 short 2.80 0.0783 66 2.64 2.95
Results are given on the log (not the response) scale.
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
2 long - 2 short -0.556 0.111 66 -5.025 <.0001
Results are given on the log (not the response) scale.
Thank you for the help!
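One possible explanation for the two-row output, offered as a sketch rather than a definitive answer: Week enters the model as a numeric covariate, so emmeans() evaluates it at a single reference value (its mean, the 2 in the f1 column) instead of at each week. Converting Week to a factor and calling emmeans() on the lmer fit itself keeps the repeated-measures structure. The data below are simulated purely for illustration, since the sample above only contains Week 0, and birds, fat_pct, fit, emm and tukey are hypothetical names.

```r
library(lme4)
library(emmeans)

# Simulated stand-in data: 10 birds, weeks 0 to 4, two day-length groups
set.seed(1)
birds <- expand.grid(ID = factor(1:10), Week = 0:4)
birds$Day <- ifelse(as.integer(birds$ID) <= 5, "long", "short")
bird_effect <- rnorm(10, sd = 1)  # per-bird random intercepts
birds$fat_pct <- 5 + birds$Week +
  2 * birds$Week * (birds$Day == "short") +
  bird_effect[as.integer(birds$ID)] +
  rnorm(nrow(birds))

# Key step: treat Week as a factor so every week gets its own estimate
birds$Week <- factor(birds$Week)

# Mixed model with a random intercept per bird (repeated measures)
fit <- lmer(fat_pct ~ Day * Week + (1 | ID), data = birds)

# Estimated marginal means for all Week x Day cells, then all pairwise
# comparisons (Tukey adjustment is the default for pairwise contrasts)
emm   <- emmeans(fit, ~ Week * Day)
tukey <- pairs(emm)
print(emm)
print(tukey)
```

A compact letter display like the one in the question can probably be obtained by passing emm to cld() from the multcomp package, though that step is not shown here.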

Plotting both a GLM and LM of same data

I would like to plot both a linear model (LM) and non-linear (GLM) model of the same data.
Between 16% and 84% response, the LM and the GLM should line up (citation: section 3.5).
I have included a more complete chunk of the code because I am not sure at which point I should cut the linear model, or at which point I have messed up. I think the problem is with the linear model.
The code below results in the following image:
My Objective (taken from previous citation-link).
Here is my data:
mydata3 <- structure(list(
dose = c(0, 0, 0, 3, 3, 3, 7.5, 7.5, 7.5, 10, 10, 10, 25, 25, 25, 50, 50, 50),
total = c(25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L),
affected = c(1, 0, 1.2, 2.8, 4.8, 9, 2.8, 12.8, 8.6, 4.8, 4.4, 10.2, 6, 20, 14, 12.8, 23.4, 21.6),
probability = c(0.04, 0, 0.048, 0.112, 0.192, 0.36, 0.112, 0.512, 0.344, 0.192, 0.176, 0.408, 0.24, 0.8, 0.56, 0.512, 0.936, 0.864)),
.Names = c("dose", "total", "affected", "probability"),
row.names = c(NA, -18L),
class = "data.frame")
My script:
#load libraries
library(ggplot2)
library(drc) # glm model
library(plyr) # rename function
library(scales) #log plot scale
#Creating linear model
mod_linear <- lm(probability ~ (dose), weights = total, data = mydata3)
#Creating data.frame: note values 3 and 120 refer to 16% and 84% response in sigmoidal plot
line_df <-expand.grid(dose=exp(seq(log(3),log(120),length=200)))
#Extracting values from linear model
p_line_df <- as.data.frame(cbind(dose = line_df,
predict(mod_linear, newdata=data.frame(dose = line_df),
interval="confidence",level=0.95)))
#Renaming linear df columns
p_line_df <-rename(p_line_df, c("fit"="probability"))
p_line_df <-rename(p_line_df, c("lwr"="Lower"))
p_line_df <-rename(p_line_df, c("upr"="Upper"))
p_line_df$model <-"Linear"
#Create sigmoidal dose-response curve using drc package
mod3 <- drm(probability ~ (dose), weights = total, data = mydata3, type ="binomial", fct=LL.2(names=c("Slope:b","ED50:e")))
#data frame for ggplot2
base_DF_3 <-expand.grid(dose=exp(seq(log(1.0000001),log(10000),length=200)))
#extract data from model
p_df3 <- as.data.frame(cbind(dose = base_DF_3,
predict(mod3, newdata=data.frame(dose = base_DF_3),
interval="confidence", level=.95)))
#renaming columns
p_df3 <-rename(p_df3, c("Prediction"="probability"))
p_df3$model <-"Sigmoidal"
#combining Both DataFames
p_df_all <- rbind(p_df3, p_line_df)
#plotting
ggplot(p_df_all, aes(x=dose,y=probability, group=model))+
geom_line(aes(x=dose,y=probability,group=model,linetype=model),show.legend = TRUE)+
scale_x_log10(breaks = c(0.000001, 10^(0:10)),labels = c(0, math_format()(0:10)))
Looking at the reference you provided, what the authors describe is the use of a linear model to approximate the central portion of a (sigmoidal) logistic function. The linear model that achieves this is a straight line that passes through the inflection point of the logistic curve, and has the same slope as the logistic function at that inflection point. We can use some basic algebra and calculus to solve this problem.
From ?LL.2, we see that the form of the logistic function being fitted by drm is
f(x) = 1 / {1 + exp(b(log(x) - log(e)))}
We can get the values of the coefficient in this equation by
b = mod3$coefficients[1]
e = mod3$coefficients[2]
Now, by differentiation with respect to log(x), the slope of the logistic function in log-dose coordinates is given by
dy/d(log(x)) = -(b * exp((log(x)-log(e))*b)) / (1+exp((log(x)-log(e))*b))^2
At the inflection point, the dose (x) is equal to the coefficient e, thus the slope at the inflection point simplifies (greatly) to
sl50 = -b/4
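As a quick sanity check on that simplification, the slope at the inflection point can be compared numerically with -b/4. This is only a sketch: the values of b and e below are hypothetical placeholders, not the coefficients fitted by drm, and ll2 is an illustrative helper written in terms of u = log(dose).

```r
# Hypothetical parameter values, chosen only for illustration
b <- -1.2
e <- 15

# Two-parameter log-logistic curve as a function of u = log(dose)
ll2 <- function(u) 1 / (1 + exp(b * (u - log(e))))

# Central finite difference of the curve at the inflection point u = log(e)
h <- 1e-6
numeric_slope  <- (ll2(log(e) + h) - ll2(log(e) - h)) / (2 * h)
analytic_slope <- -b / 4

# The two slopes should agree up to numerical error
print(c(numeric = numeric_slope, analytic = analytic_slope))
print(isTRUE(all.equal(numeric_slope, analytic_slope, tolerance = 1e-6)))
```

The same check can be repeated with the fitted coefficients by replacing the two placeholder values.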
Since we also know that the inflection point occurs at the point where probability = 0.5 and dose = e, we can construct the straight line (in log-transformed coordinates) like this:
linear_probability = sl50 * (log(p_df3$dose) - log(e)) + 0.5
Now, to plot the logistic and linear functions together:
p_df3_lin = p_df3
p_df3_lin$model = 'linear'
p_df3_lin$probability = linear_probability
p_df_all <- rbind(p_df3, p_df3_lin)
ggplot(p_df_all, aes(x=dose,y=probability, group=model))+
geom_line(aes(x=dose,y=probability,group=model,linetype=model),show.legend = TRUE)+
scale_x_log10(breaks = c(0.000001, 10^(0:10)),labels = c(0, math_format()(0:10))) +
scale_y_continuous(limits = c(0,1))

Plot Regression Surface

I am reading a book by Cohen, Cohen, Aiken and West (2003), "Applied Multiple Regression Correlation Analysis for the Behavioral Sciences", and have come across a 3D plot of a regression surface showing interaction and no interaction (p. 259). The graphs look like they may have been created using R. I like the graphs as a teaching tool and would like to reproduce them. The plots look something like this:
The only addition to the Cohen et al. plots would be lines across the planes at the mean, +1sd, and -1sd of x2. This would be an excellent addition if possible (generally most things are possible with R).
I have provided a sample data set below with a DV (y), 2 predictors and centered predictors. How would I use R to generate the regression surface (plane) plot showing an interaction model and an additive model, for both the centered and uncentered data? (I assume the technique will be the same but want to make sure.)
Total of 4 plots:
1. uncentered no interaction
2. uncentered interaction
3. centered no interaction
4. centered interaction
DF<-structure(list(y = c(-1.22, -1.73, -2.64, -2.44, -1.11, 2.24,
3.42, 0.67, 0.59, -0.61, -10.77, 0.93, -8.6, -6.99, -0.12, -2.29,
-5.16, -3.35, -3.35, -2.51, 2.21, -1.18, -5.21, -7.74, -1.34),
x1 = c(39.5, 41, 34, 30.5, 31.5, 30, 41.5, 24, 43, 39, 25.5,
38.5, 33.5, 30, 41, 31, 25, 37, 37.5, 24.5, 38, 37, 41, 37,
36), x2 = c(61L, 53L, 53L, 44L, 49L, 44L, 57L, 47L, 54L,
48L, 46L, 59L, 46L, 61L, 55L, 57L, 59L, 59L, 55L, 50L, 62L,
55L, 55L, 52L, 55L), centered.x1 = c(5.49702380952381, 6.99702380952381,
-0.0029761904761898, -3.50297619047619, -2.50297619047619,
-4.00297619047619, 7.49702380952381, -10.0029761904762, 8.99702380952381,
4.99702380952381, -8.50297619047619, 4.49702380952381, -0.50297619047619,
-4.00297619047619, 6.99702380952381, -3.00297619047619, -9.00297619047619,
2.99702380952381, 3.49702380952381, -9.50297619047619, 3.99702380952381,
2.99702380952381, 6.99702380952381, 2.99702380952381, 1.99702380952381
), centered.x2 = c(9.80357142857143, 1.80357142857143, 1.80357142857143,
-7.19642857142857, -2.19642857142857, -7.19642857142857,
5.80357142857143, -4.19642857142857, 2.80357142857143, -3.19642857142857,
-5.19642857142857, 7.80357142857143, -5.19642857142857, 9.80357142857143,
3.80357142857143, 5.80357142857143, 7.80357142857143, 7.80357142857143,
3.80357142857143, -1.19642857142857, 10.8035714285714, 3.80357142857143,
3.80357142857143, 0.803571428571431, 3.80357142857143)), .Names = c("y",
"x1", "x2", "centered.x1", "centered.x2"), row.names = c(NA,
25L), class = "data.frame")
Thank you in advance.
EDIT: The following code plots the plane but will not work when you have an interaction (which is really what I'm interested in). Additionally, I don't know how to plot the high (+1sd), low (-1sd) and mean lines for x2 either.
x11(10,5)
s3d <- scatterplot3d(DF[,c(2,3,1)], type="n", highlight.3d=TRUE,
angle=70, scale.y=1, pch=16, main="scatterplot3d")
# Now adding a regression plane to the "scatterplot3d"
my.lm <- with(DF, lm(y ~ x1 + x2))
s3d$plane3d(my.lm, lty.box = "solid")
An attempt to plot an interaction plane (Seen here):
s3d <- scatterplot3d(DF[,c(2,3,1)], type="n", highlight.3d=TRUE,
angle=70, scale.y=1, pch=16, main="scatterplot3d")
my.lm <- with(DF, lm(y ~ x1 + x2 + x1:x2 ))
s3d$plane3d(my.lm, lty.box = "solid")
Yielded the following error:
Error in segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max, lty = ltya, :
cannot mix zero-length and non-zero-length coordinates
Here's how I would do it (adding a bit of color) with packages 'rms' and 'lattice':
require(rms) # also need to have Hmisc installed
require(lattice)
ddI <- datadist(DF)
options(datadist="ddI")
lininterp <- ols(y ~ x1*x2, data=DF)
bplot(Predict(lininterp, x1=25:40, x2=45:60),
lfun=wireframe, # bplot passes extra arguments to wireframe
screen = list(z = -10, x = -50), drape=TRUE)
And the non-interaction model:
lin.no.int <- ols(y ~ x1 + x2, data=DF) # additive model, no x1:x2 term
bplot(Predict(lin.no.int, x1=25:40, x2=45:60), lfun=wireframe, col=2:8, drape=TRUE,
screen = list(z = -10, x = -50),
main="Estimated regression surface with no interaction")
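For the lines at the mean and at plus or minus one SD of x2 that the question asks about, one base-graphics alternative is persp() combined with trans3d(). This is a sketch under stated assumptions, not part of the rms approach above: it refits the interaction model with lm(), uses only y, x1 and x2 from the sample data, and the names fit_int, x1_seq, x2_seq, z, pm and x2_levels are illustrative.

```r
# Uncentered sample data from the question (y, x1, x2 only)
DF <- data.frame(
  y  = c(-1.22, -1.73, -2.64, -2.44, -1.11, 2.24, 3.42, 0.67, 0.59, -0.61,
         -10.77, 0.93, -8.6, -6.99, -0.12, -2.29, -5.16, -3.35, -3.35,
         -2.51, 2.21, -1.18, -5.21, -7.74, -1.34),
  x1 = c(39.5, 41, 34, 30.5, 31.5, 30, 41.5, 24, 43, 39, 25.5, 38.5, 33.5,
         30, 41, 31, 25, 37, 37.5, 24.5, 38, 37, 41, 37, 36),
  x2 = c(61, 53, 53, 44, 49, 44, 57, 47, 54, 48, 46, 59, 46, 61, 55, 57,
         59, 59, 55, 50, 62, 55, 55, 52, 55)
)

# Interaction model; use y ~ x1 + x2 instead for the additive surface
fit_int <- lm(y ~ x1 * x2, data = DF)

# Grid of predictor values and the predicted surface over that grid
x1_seq <- seq(min(DF$x1), max(DF$x1), length.out = 25)
x2_seq <- seq(min(DF$x2), max(DF$x2), length.out = 25)
z <- outer(x1_seq, x2_seq, function(a, b) {
  predict(fit_int, newdata = data.frame(x1 = a, x2 = b))
})

# Draw the surface; persp() returns the projection matrix used by trans3d()
pm <- persp(x1_seq, x2_seq, z, theta = 30, phi = 20, ticktype = "detailed",
            xlab = "x1", ylab = "x2", zlab = "predicted y")

# Simple regression lines of y on x1 at mean(x2) - 1 SD, mean, and + 1 SD
x2_levels <- mean(DF$x2) + c(-1, 0, 1) * sd(DF$x2)
for (lev in x2_levels) {
  z_line <- predict(fit_int, newdata = data.frame(x1 = x1_seq, x2 = lev))
  lines(trans3d(x1_seq, lev, z_line, pmat = pm), lwd = 2, col = "red")
}
```

The centered versions follow the same pattern with centered.x1 and centered.x2 substituted for x1 and x2.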
