Adjusting range of predicted values in ggeffects - r

I am using ggpredict to plot the marginal effects of temperature (a continuous variable) from a glmm zero-inflated model:
pr1 = ggpredict(mod, "temp", type = "re.zi")
The function is working properly, but it only returns predicted values for 7 temperatures. Does anyone know how to increase the number of x values, to 25 for example?
Thanks,
Andrew

According to the section Marginal Effects at Specific Values of the documentation at https://rdrr.io/github/strengejacke/ggeffects/man/ggpredict.html, the basic syntax for ggpredict looks like this:
pr1 = ggpredict(mod, terms = "temp", type = "re.zi")
You can request a random sample of any size by adding sample=n in square brackets after the variable name (note the space between temp and the opening bracket), e.g. terms = "temp [sample=25]", which samples 25 values at random from all observed values of temp:
pr1 = ggpredict(mod, terms = "temp [sample=25]", type = "re.zi")
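If you would rather have predictions at every observed value of temp than at a random subset, the same documentation describes an all tag for the bracket syntax:
pr1 = ggpredict(mod, terms = "temp [all]", type = "re.zi")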

Related

Using predict in metafor when each author has multiple rows in the data

I'm running a meta-analysis where I'm interested in the effect of X on the effect of age on habitat use (raw mean values and variances) using the metafor package.
An example of one of my models is:
mod6 <- rma.mv(
  yi = Used_value,
  V = Used_variance,
  slab = Citation,
  mods = ~ Age + poly(Slope, degree = 2),
  random = ~ 1 | Region,
  data = vel.focal,
  method = "ML"
)
My justification for not using Citation as a random effect is that Region alone accounts for more of the heterogeneity than random = list( ~ 1 | Citation/ID, ~ 1 | Region) or than Citation/ID by itself.
What I need as output is the prediction for each age by region, but predict() on the model (and the associated forest plot) returns a prediction for each row, as it assumes each row in the data is a unique study. In my case it is not, as my input values are separated by age and season.
predict(mod6)
pred se ci.lb ci.ub pi.lb pi.ub
Riehle and Griffith 1993.1 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.2 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Riehle and Griffith 1993.3 9.3437 2.3588 4.7205 13.9668 0.2362 18.4511
Spina 2000.1 8.7706 2.7386 3.4030 14.1382 -0.7364 18.2776
Spina 2000.2 8.5407 2.7339 3.1824 13.8991 -0.9611 18.0426
Spina 2000.3 8.5584 2.7406 3.1868 13.9299 -0.9509 18.0676
Vondracek and Longanecker 1993.1 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.2 12.6116 2.5138 7.6847 17.5385 3.3462 21.8769
Vondracek and Longanecker 1993.3 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Vondracek and Longanecker 1993.4 12.3817 2.5327 7.4176 17.3458 3.0965 21.6669
Does anybody know a way to modify the arguments inside predict() to tell it how you want your predictions output or to tell it that there are multiple rows per slab?
You need to use the newmods argument to specify the values of Age for which you want predicted values. You will also have to plug in something for the linear and quadratic terms of the Slope variable (e.g., holding Slope constant at its mean, in which case the quadratic term is just the mean squared). Region is not a fixed effect, so it is not relevant if you want to compute predicted values based on the fixed effects. If you want to compute BLUPs for those random effects, you can do so with ranef() and then combine them with the fixed-effects predictions. That is the general idea, but implementing it requires a bit of programming; a rough sketch is below.
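A minimal sketch of that idea, assuming Age is numeric and Slope enters as a raw polynomial (with the default orthogonal poly() basis you would have to evaluate the basis at the new values instead); the object names are illustrative:
# fixed-effects predictions at each observed age, Slope held at its mean
slope.mean <- mean(vel.focal$Slope)
ages <- sort(unique(vel.focal$Age))
newX <- cbind(ages, slope.mean, slope.mean^2) # column order must match the model coefficients (minus intercept)
pred.fixed <- predict(mod6, newmods = newX)
# BLUPs for the Region random effect, to be combined with pred.fixed as needed
blups <- ranef(mod6)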

How to deal with a quadratic model that has too many fitted values?

I'm trying to fit a quadratic regression model to a dataset and then plot the curve on a scatterplot. The dataset is about number of episodes and screentime for characters in a TV show.
I plotted a scatterplot with episodes on the x axis and screentime on the y axis; this worked fine.
Then I created the model as follows:
#ordering
gottemp <- got[order(got$episodes),]
#plotting
plot(screentime~episodes, data = gottemp, xlab ="Number of episodes", ylab = "Screentime (minutes)", col=c("blue","red")[gender], pch=c(1,2)[gender])
legend("topleft",pch = c(1,2),col=c("blue","red"),c("female","male"))
title("Plot of Screentimes vs Number of Episodes")
#creating model and plotting line
model <- lm(screentime~episodes+I(episodes^2), data = got)
lines(fitted(model))
This gives me a model with the correct coefficients, but the plotted line is not what I would expect. When I view the model I see that there are 113 fitted values, which I think is because some characters have the same number of episodes; to fix this, I think there should be only one fitted value for each number of episodes.
Something like
nd <- data.frame(episodes = seq(min(got$episodes), max(got$episodes), length = 51))
nd$screentime <- predict(model, newdata = nd)
with(nd, lines(episodes, screentime))
should do what you want: lines() just connects successive points in the order they are given, so you want predictions on a single sorted grid of x values rather than one fitted value per data row. There's probably a duplicate around somewhere ...

Coefficients of linear regression y=mx+c using lm() differ in magnitude from what I expect

ddd = lm(`USER ID` ~ `CREATED ON`, data = Data1)
summary(ddd)
The slope of the line in the plot should be approximately (6000 - 0)/(2017 - 2016) = 6000, but the slope reported by summary() is 2.204e-04. How does this make sense?
(USER ID and CREATED ON are the same as the number of users and time shown in the plot.)
I generated the plot using plot(Data1$`CREATED ON`, Data1$`USER ID`, cex = 0.5, xlab = "Time", ylab = "No. of Users") and then abline(lm(`USER ID` ~ `CREATED ON`, Data1), col = 4).
At time = 2017 the number of users is about 6000 and at time = 2016 it is about 0, so the slope should be (6000 - 0)/(2017 - 2016) = 6000, but the slope shown is of magnitude 10^-4.
The CREATED ON column is a date-time type: class(Data1$`CREATED ON`) gives "POSIXct" "POSIXt".
Check as.integer(Data1$`CREATED ON`). Date and DateTime objects are stored as numbers that can be large: a POSIXct value is the number of seconds since 1970-01-01, so your slope is in users per second, and 2.204e-04 users/second times roughly 3.15e7 seconds/year is about 6950 users/year, consistent with the plot.
In general, why not just extract the model matrix to see what the columns are?
model.matrix.lm(ddd)
This immediately exposes the problem: the regression coefficients are computed from this model matrix.
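A quick check of that arithmetic (a sketch; ddd is the model fitted above):
# the slope is in users per second because CREATED ON is POSIXct (seconds)
secs.per.year <- 365.25 * 24 * 3600
unname(coef(ddd)[2]) * secs.per.year # roughly 6950 users per year, matching the plot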

How can I extract confidence intervals from ezBoot (ez package)?

I am using the ezBoot function from the ez package. I would like to extract the confidence intervals that are plotted with the ezPlot2 function of the same package.
An example can be found in the ezBoot function:
#Read in the ANT data (see ?ANT).
data(ANT)
head(ANT)
ezPrecis(ANT)
#Run ezBoot on the accurate RT data
rt = ezBoot(
data = ANT
, dv = rt
, wid = subnum
, within = .(cue,flank)
, between = group
, iterations = 1e1 #1e3 or higher is best for publication
)
#plot the full design
p = ezPlot2(
preds = rt
, x = flank
, split = cue
, col = group
)
print(p)
How do I extract the confidence intervals?
Never mind, I didn't read the complete set of arguments for the ezPlot2 function (http://www.inside-r.org/packages/cran/ez/docs/ezPlot2). To be fair, the initial description says the function is for displaying; however, if the parameter do_plot is set to TRUE then it will return the point predictions (I am guessing these are the averages) and confidence intervals:
do_plot: Logical. If TRUE, no plot will be produced but instead a data frame
containing point predictions and confidence limits will be returned.
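Following that documentation, something like this should return the data frame of point predictions and confidence limits instead of drawing the plot (a sketch reusing the rt object from above):
ci = ezPlot2(
preds = rt
, x = flank
, split = cue
, col = group
, do_plot = TRUE #per the quoted docs; if your version behaves the opposite way, use FALSE
)
head(ci)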

r qqp function - why is the 'perfect fit' a flat line at 0?

This may be more of a statistical question than a programming one. I just wanted to make sure I was getting the programming right first.
I have a large count dataset (108 sites with 31 species = 3348 observations), but a lot of these are 0 counts because not every species was present at every site. Log transformation has been suggested to me, but others have said that you shouldn't log transform count data. Here is my data for the first 8 species (it also contains the very abundant species with the highest counts):
example.abund <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,
0,0,1,0,8,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,2,0,3,1,0,0,0,0,0,0,0,0,0,
2,0,1,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,
0,1,0,0,0,28,1,0,1,0,0,1,0,2,0,0,2,0,0,0,1,0,0,0,1,0,0,0,2,0,0,1,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,2,0,1,0,0,8,7,7,1,1,13,0,8,0,3,0,1,1,
1,4,4,0,1,0,1,0,0,0,0,6,5,2,0,2,58,4,2,47,4,0,0,0,2,59,2,0,0,6,1,36,28,2,
1,1,0,6,0,0,2,5,0,0,0,0,87,7,0,1,1,1,0,0,1,1,0,6,11,0,0,0,3,0,4,0,7,2,
0,5,0,4,1,0,1,12,0,2,0,9,0,1,0,0,0,24,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,3,1,0,1,0,1,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,1,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,15,0,2,
81,0,1,32,26,13,2,61,0,66,2,2,0,17,43,43,0,25,19,2,25,26,91,61,0,13,0,62,186,1,4,22,1,50,3,67,86,11,56,26,74,0,6,8,7,0,152,8,14,1,97,1,0,12,11,3,1,1,112,2,35,36,5,61,26,211,15,8,173,17,97,22,18,88,11,1,66,15,3,3,3,2,0,1,0,41,9,14,1,0,38,0,0,51,27,11,38,31,1,0,221,68,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,2,0,0,2,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,29,0,0,0,0,
0,82,12,0,0,3,0,9,0,0,164,0,0,0,0,1,0,15,0,0,0,6,56,0,0,0,6,0,0,1,0,5,5,8,
0,4,0,0,6,0,0,2,0,0,3,0,0,0,0,683,0,0,0,0,3,149,252,11,13,195,19,0,59,0,0,1,28,0,
0,0,0,0,0,0,0,0,0,0,31,55,85,0,142,0,44,52,0,0,192,0,45,0,0,0,0,0,0,11,2,0,0,6,
0,0,0,0,0,0,0,0,0,0,0,0,0,19,3,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0)
I need to fit a mixed model to the data, but first I am trying to figure out the most appropriate distribution to use. I was following the steps in this blog, but all of the red lines (meant to represent the 'perfect fit' for each distribution) come out as 0 along the entire plot.
My question is: have I coded this correctly and there are just so many 0s in my data that the perfect fit is 0? Or is there something wrong with the way I have coded it?
Code example:
#qqp() is from the car package; fitdistr() is from MASS
library(car)
library(MASS)
#add 1 so that the distributions that do not allow 0s (lognormal, gamma) can be fitted
example.abund.1 <- example.abund + 1
hist(example.abund)
qqp(example.abund, "norm")
qqp(example.abund.1, "lnorm") #lognormal
#the negative binomial, Poisson and gamma need estimated parameters:
nbinom <- fitdistr(example.abund.1, "Negative Binomial")
qqp(example.abund.1, "nbinom", size = nbinom$estimate[[1]], mu = nbinom$estimate[[2]])
poisson <- fitdistr(example.abund.1, "Poisson")
qqp(example.abund.1, "pois", lambda = poisson$estimate)
gamma <- fitdistr(example.abund.1, "gamma")
qqp(example.abund.1, "gamma", shape = gamma$estimate[[1]], rate = gamma$estimate[[2]])
