I'm attempting to grab one plot from a multiple plot output. For example
library(mboost)
mod <- gamboost(Ozone ~ ., data = airquality[complete.cases(airquality), ])
plot(mod)
The above creates a plot for each variable's "partial effect". The same could be said for the residual plots created when plotting a linear model (lm). I've attempted to save the output in a list, akin to how ggplots can be saved, and have spent a few hours searching for how to extract just one plot, but have failed. Any advice?
As for the context of the question, I'm trying to put the plots into a shiny app and have a variable number of plots show up as output.
Session info is as follows:
R version 2.15.2 (2012-10-26)
Platform: i386-redhat-linux-gnu (32-bit)
Many functions that produce multiple plots also have an argument to select a subset of the plots. In the case of plot.lm it is the which argument. So saying plot(fit, which=1) will only produce one plot.
You can check the mboost documentation to see if there is a similar argument for that plotting function.
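For example, a minimal sketch with plot.lm (the lm fit here is only illustrative):
fit <- lm(Ozone ~ Wind + Temp, data = airquality)  # illustrative model
plot(fit, which = 1)        # residuals-vs-fitted plot only
plot(fit, which = c(1, 2))  # first two diagnostic plots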
Essentially, @greg-snow gave a proper solution; I will elaborate on it a bit.
In mboost you can use
plot(mod, which = "Day")
to plot the effect of Day only. As which is matched using regular expressions, you can do much more with it. In a model with both linear and smooth effects you can, for example, select all smooth effects for plotting:
airquality$Month <- as.factor(airquality$Month)
mod <- gamboost(Ozone ~ bbs(Solar.R) + bbs(Wind) + bbs(Temp) + bols(Month) + bbs(Day),
                data = airquality[complete.cases(airquality), ])
## now plot bbs effects, i.e., smooth effects:
par(mfrow = c(2,2))
plot(mod, which = "bbs")
## or the linear effect only
par(mfrow = c(1,1))
plot(mod, which = "bols")
You can use any portion of the base-learner name (see, e.g., names(coef(mod))) to define the effect to be plotted. You can also use integer values to define which effect to plot:
plot(mod, which = 1:2)
Note that this can also be used to extract certain coefficients, e.g.,
coef(mod, which = 1)
coef(mod, which = "Solar")
coef(mod, which = "bbs(Solar.R)")
are all the same. For more on how to specify which, both in coef and plot please see our tutorial paper (Hofner et al. (2014), Model-based Boosting in R - A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29:3-35. DOI 10.1007/s00180-012-0382-5).
We acknowledge that this currently isn't documented in mboost but it is on our todo list (see github issue 14).
(I'm not familiar with GAMBoost.)
Looking at the documentation for ?plot.GAMBoost, I see there is an argument called select. I gather you would set this argument to the variable you are interested in, and you would then get just the single plot you want. This is analogous to the which argument in plot.lm that @GregSnow notes.
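A hedged sketch of what that might look like (untested; gb and the exact meaning of select are assumptions to be checked against ?plot.GAMBoost):
plot(gb, select = 2)  # assumed usage: gb is a fitted GAMBoost object, 2 an index of the covariate to plot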
Forest plots made with the forestmodel package are really nice for multivariable Cox regression. However, I have problems substituting the original variable names and factor levels of my data frame with the final presentation labels (i.e., variable age2 shown as "Age"; factor levels 0 as "<60" and 1 as "≥60"). I have rather limited knowledge of R coding, but I tried the expss package to add labels to the variables and factor levels. However, coxph() no longer works once the labels are applied.
This is my coding:
Cox proportional model:
mcox<-coxph(pblsurv~age2+sex1+origin,data = pbl)
Forest plot using forestmodel package:
print(forest_model(mcox))
The variable names in the final plot were age2, sex1, origin; therefore, I used the expss package to add labels:
pbl <- apply_labels(pbl,
                    age2 = "Age", age2 = c("<60" = 0, "≥60" = 1),
                    sex1 = "Gender", sex1 = c("Female" = 0, "Male" = 1),
                    origin = "Ethnicity", origin = c("Non-hispanic" = 0, "Hispanic" = 1))
However, after applying the labels, coxph() did not work:
mcox<-coxph(pblsurv~age2+sex1+origin,data = pbl)
Error in coxph(pblsurv ~ age2 + sex1 + origin, data = pbl) :
data contains an infinite predictor
Any idea what additional code to use in print(forest_model(mcox)) for a final journal presentation?
I have also enjoyed reporting association measurements using the package forestmodel and have run into the same problem several times. I searched for solutions and found different ways, but all quite complex, especially for R beginners (which is my case). I found a simple solution to the column name issue using the var_label function from the labelled package:
labelled::var_label(pbl$age2) <- "Age group"
labelled::var_label(pbl$sex1) <- "Gender"
labelled::var_label(pbl$origin) <- "Ethnicity"
After that, run forest_model() again.
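For a self-contained sketch of the whole workflow, here is what it could look like using the survival package's lung data purely as stand-in data (lung2, m, and the labels are illustrative, not from the original question):
library(survival)
library(labelled)
library(forestmodel)

## stand-in data: recode and label as you would with your own data frame
lung2 <- lung
lung2$sex <- factor(lung2$sex, levels = c(1, 2), labels = c("Male", "Female"))
var_label(lung2$age) <- "Age (years)"
var_label(lung2$sex) <- "Gender"

m <- coxph(Surv(time, status) ~ age + sex, data = lung2)
print(forest_model(m))  # the plot should pick up the variable labels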
This may or may not be a simple question. Any help is appreciated.
I have come across a paper on GARCH and long memory. It has a figure, Fig. 1.1, that I haven't learnt how to reproduce in R. The author says the ACF follows a corresponding hyperbolic function. This is very important for discovering whether the data has long memory or not. So I want to apply this technique to my squared returns. The sample data is supplied in this link.
My code is:
data <- read.csv("sample.csv", header = TRUE)
lret <- 100 * diff(log(data$CLOSE))
acf(lret^2)
How do we find the hyperbolic function for the ACF, and how do we add it to the ACF plot?
ACF with hyperbolic line
Mikosch and Starica stress that the ACF does not follow a hyperbolic function; that figure is devoted to showing how a misuse of statistical tools can lead to wrong conclusions - the data is shown in the other windows of figure 1.1 to be uncorrelated! Anyways, that is a discussion for Cross Validated Stack Exchange.
You can make non-linear regression fits with nls. I have used the ACF of an AR(2)-process with parameters 0.8 and 0.1 as an example (fit will of course be incorrect here but it demonstrates a few of the problems you may experience when working with autocorrelation functions).
set.seed(1e2)
## AR(2) simulation
arsim <- arima.sim(list(ar = c(0.8, 0.1)), n = 1000)
## Autocorrelation function of absolute values:
myacf <- acf(abs(arsim), ci = 0)
## Fit acf = b * x^(-c), dropping lag 0
nls_fit <- nls(y ~ b * x^(-c),
               data = data.frame(x = myacf$lag[-1], y = myacf$acf[-1]),
               start = list(b = 1, c = 1))
## Add the fitted hyperbolic curve to the ACF plot
curve(nls_fit$m$getPars()[1] * x^(-nls_fit$m$getPars()[2]),
      add = TRUE, col = "red")
Note how I remove the data at lag 0, since 0^(-c) is not defined. This is in agreement with what the authors usually do (they ignore lag 0; it never makes sense to plot it anyway, though why it is the default of plot.acf I do not know).
Mikosch usually suggests removing the iid confidence bands that are shown by default when the data are clearly not iid. You do this with the plot.acf option ci = 0.
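Applied to the squared log-returns from the question, the same recipe would look roughly like this (assuming lret exists; the start values may need tweaking, and whether a hyperbolic decay is appropriate at all is exactly what Mikosch and Starica dispute):
## ACF of squared returns without iid bands
sq_acf <- acf(lret^2, lag.max = 100, ci = 0)
## Fit acf = b * x^(-c), again dropping lag 0
hyp_fit <- nls(y ~ b * x^(-c),
               data = data.frame(x = sq_acf$lag[-1], y = sq_acf$acf[-1]),
               start = list(b = 0.5, c = 0.5))
## Overlay the fitted hyperbola
curve(coef(hyp_fit)["b"] * x^(-coef(hyp_fit)["c"]), add = TRUE, col = "red")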
I ran a GAMM model with a large dataset (over 20,000 cases) using mgcv. Because of the large number of data points, it is very difficult to see the smoothed lines among the residual points in the plot. Is it possible to specify different colors for the points and the smoothed fit lines?
Here is an example adopted from the mgcv documentation:
library(mgcv)
## simple examples using gamm as alternative to gam
set.seed(0)
dat <- gamSim(1,n=200,scale=2)
b <- gamm(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b$gam, pages = 1, residuals = TRUE, col = '#FF8000', shade = TRUE, shade.col = 'gray90')

plot(absbmidiffLog.GAMM$gam, pages = 1, residuals = TRUE, pch = 19, cex = 0.01, scheme = 1,
     col = '#FF8000', shade = TRUE, shade.col = 'gray90')
I have looked into the visreg package, but it does not seem to work with gamm objects.
I also found it surprisingly difficult/impossible to choose 2 different colors.
A workaround for me is to add the points later, i.e., first call plot.gam with residuals = FALSE and then add the points with the base R points() call.
This only works properly, though, if you shift the gam plot to its proper mean (the shift argument below). Here is the code for one of the terms; use a for loop to get all four on one page (a sketch of that loop follows the code).
library(mgcv)
## simple examples using gamm as alternative to gam
set.seed(0)
dat <- gamSim(1,n=200,scale=2)
b <- gamm(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b$gam, select=3, shift = coef(b$gam)[1], residuals=FALSE, col='#FF8000', shade=T, shade.col='gray90')
points(y~x3, data=dat,pch=20,cex=0.75,col=rgb(1,0.65,0,0.25))
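The for loop mentioned above could look like this (same b and dat objects as before; purely a sketch):
par(mfrow = c(2, 2))
xvars <- c("x0", "x1", "x2", "x3")
for (i in seq_along(xvars)) {
  ## smooth term i, shifted to the response scale, without residual points
  plot(b$gam, select = i, shift = coef(b$gam)[1], residuals = FALSE,
       col = '#FF8000', shade = TRUE, shade.col = 'gray90')
  ## add the raw data points in a different (semi-transparent) colour
  points(dat[[xvars[i]]], dat$y, pch = 20, cex = 0.75, col = rgb(1, 0.65, 0, 0.25))
}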
I ran a three way repeated measures ANOVA with ezANOVA.
anova_1 <- ezANOVA(data = main_data, dv = .(rt), wid = .(id),
                   within = .(A, B, C), type = 3, detailed = TRUE)
I'm trying to see what's going on with the residuals via a QQ plot, but I don't know how to get to them or if they're even there. With my lme models I simply extract them from the model
main_data$model_residuals <- as.numeric(residuals(model_1))
and plot them
residuals_qq<-ggplot(main_data, aes(sample = main_data$model_residuals)) +
stat_qq(color="black", alpha=1, size =2) +
geom_abline(intercept = mean(main_data$model_residuals), slope = sd(main_data$model_residuals))
I'd like to use ggplot since I'd like to keep a sense of consistency in my graphing.
EDIT
Maybe I wasn't clear in what I'm trying to do. With lme models I can simply create the variable model_residuals in the main_data data.frame from the residuals object, which then contains the residuals I plot in ggplot. I want to know if something similar is possible for the residuals from ezANOVA, or if there is a way I can get hold of the residuals for my ANOVA.
I had the same trouble with ezANOVA. The solution I went for was to switch to ez.glm (from the afex package). Both ezANOVA and ez.glm wrap a function from a different package, so you should get the same results.
This would look like this for your example:
anova_1<-ez.glm("id", "rt", main_data, within=c("A","B","C"), return="full")
nice.anova(anova_1$Anova) # show the ANOVA table like ezANOVA does.
Then you can pull out the lm object and get your residuals in the usual way:
residuals(anova_1$lm)
Hope that helps.
Edit: A few changes to make it work with the current version of afex:
anova_1 <- aov_ez("id", "rt", main_data, within = c("A", "B", "C"))
print(anova_1)
print(anova_1$Anova)
summary(anova_1$Anova)
summary(anova_1)
Then you can pull out the lm object and get your residuals in the usual way:
residuals(anova_1$lm)
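To feed these into the same ggplot QQ code as in the question, something along these lines should work (note that for a purely within-subjects design the underlying lm fit is multivariate, so residuals() may return a matrix with one column per cell; flattening it with as.numeric is a pragmatic shortcut):
library(ggplot2)
res <- as.numeric(residuals(anova_1$lm))  # flatten to a plain vector
ggplot(data.frame(model_residuals = res), aes(sample = model_residuals)) +
  stat_qq(color = "black", alpha = 1, size = 2) +
  geom_abline(intercept = mean(res), slope = sd(res))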
A quite old post, I know, but it's possible to use ggplot to plot the residuals after modelling your data with the ez package by using this call:
proj(ez_outcome$aov)[[3]][, "Residuals"]
then:
qplot(proj(ez_outcome$aov)[[3]][, "Residuals"])
Hope it helps.
Also potentially adding to an old post, but I butted up against this problem as well and as this is the first thing that pops up when searching for this question I thought I might add how I got around it.
I found that if you include the return_aov = TRUE argument in the ezANOVA call, then the residuals are in there, but ezANOVA partitions them across the resulting list, within each main and interaction effect, similar to what base aov() does if you include an Error term for subject ID, as in this case.
These can be pulled out into their own list with purrr by mapping the residuals() function over the aov element of the ezANOVA output, rather than over the main output. So, from the question example, it becomes:
anova_1 <- ezANOVA(data = main_data, dv = .(rt), wid = .(id),
within = .(A,B,C), type = 3, detailed = TRUE, return_aov = TRUE)
ezanova_residuals <- purrr::map(anova_1$aov, residuals)
This will produce a list where each entry contains the residuals from one stratum of the ezANOVA model (effects and interactions), i.e. $(Intercept), $id, $id:A, $id:B, $id:A:B, etc.
I found it useful to then stitch these together into a tibble using enframe and unnest (as the list components will probably have different lengths) to very quickly get them into long format, which can then be plotted or tested:
library(tidyverse)  # for enframe(), unnest(), and %>%
ezanova_residuals_tbl <- enframe(ezanova_residuals) %>% unnest(cols = value)
hist(ezanova_residuals_tbl$value)
shapiro.test(ezanova_residuals_tbl$value)
I've not used this myself but the mapping idea also works for the coefficients and fitted.values functions to pull them out of the ezANOVA results, if needed. They might come out in some odd formats and need some extra manipulation afterwards though.
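A sketch of that same mapping idea for the other extractors (untested; the shapes of the results will differ between strata):
ezanova_fitted <- purrr::map(anova_1$aov, fitted.values)
ezanova_coefs  <- purrr::map(anova_1$aov, coefficients)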
I have conducted an NMDS analysis and have plotted the output too. However, I am unsure how to actually report the results from R. Which parts of the following output are of most importance? The graph that is produced also shows two clear groups; how are you supposed to describe these results?
MDS.out
Call:
metaMDS(comm = dgge2, distance = "bray")
global Multidimensional Scaling using monoMDS
Data: dgge2
Distance: bray
Dimensions: 2
Stress: 0
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘dgge2’
The most important pieces of information are that stress = 0, which means an essentially perfect fit, and yet there is no convergent solution. This happens if you have six or fewer observations for two dimensions, or if you have degenerate data. You should not use NMDS in these cases. Current versions of vegan issue a warning when the stress is (nearly) zero; perhaps you had an outdated version.
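As a hedged illustration of that degenerate case (the exact warning text depends on your vegan version):
library(vegan)
data(dune)
## with only a handful of sites, two-dimensional NMDS can reach (near) zero stress
small <- dune[1:5, colSums(dune[1:5, ]) > 0]
metaMDS(comm = small, distance = "bray")  # recent vegan versions warn about nearly zero stress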
I think the best way to interpret the result is simply to plot the ordination. You can use the plot and text methods provided by the vegan package. Here I am creating a ggplot2 version (to get the legend gracefully):
library(vegan)
library(ggplot2)
data(dune)
ord <- metaMDS(comm = dune)

ord_spec <- scores(ord, "spec")
ord_spec <- cbind.data.frame(ord_spec, label = rownames(ord_spec))
ord_sites <- scores(ord, "sites")
ord_sites <- cbind.data.frame(ord_sites, label = rownames(ord_sites))

ggplot(data = ord_spec, aes(x = NMDS1, y = NMDS2)) +
  geom_text(aes(label = label, col = 'species')) +
  geom_text(data = ord_sites, aes(label = label, col = 'sites'))