Is there any way to set different labels with plot.gam? - r

I am trying to plot my GAM result and I want the plot labels to be in Chinese. However, the x label I supply is used for all of the plots. How can I create a different x-label for each plot?
fit <- gam(happiness ~ s(age) + s(edu) + s(mobility), family = ocat(R=5), data = data)
plot(fit, xlab = c("年龄","教育"))

You could simply change the column names, not sure how to do this in Chinese though.
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)[1:3]
names(dat)[2:3] <- c("ONE", "TWO")
b <- gam(y~s(ONE)+s(TWO),data=dat)
plot(b,pages=1,residuals=TRUE) ## show partial residuals
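If renaming the columns is not convenient, another option is to draw each smooth separately and pass its own label. The sketch below is only an illustration on simulated data: it relies on the `select` argument of `plot.gam` to pick one smooth term at a time, and the vector `x_labels` is a name introduced here. Whether the Chinese characters render correctly depends on the fonts available to your graphics device.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b <- gam(y ~ s(x0) + s(x1), data = dat)

# One label per smooth term, in the order the terms appear in the formula
x_labels <- c("年龄", "教育")

# Draw each smooth in its own panel with its own x label
op <- par(mfrow = c(1, length(x_labels)))
for (i in seq_along(x_labels)) {
  plot(b, select = i, xlab = x_labels[i])
}
par(op)
```

The loop replaces the single `plot(b, pages = 1)` call, so the panel layout is set by hand with `par(mfrow = ...)`.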

Related

Add r square and equation on a scatter plot

My r² and equation don't show on my scatter plot when I use the visreg function
library(visreg)
variable1_lm<- lm(variable1 ~ variable2,
data = PCA)
visreg(variable1_lm , "variable2", gg = TRUE)
What code should I use so it displays on top of it?
Thanks
You should provide reproducible data with your code. Here is an example with the iris data set that comes with R:
library(visreg)
library(ggplot2)
SL.lm <- lm(Sepal.Length~Sepal.Width, iris)
RSQ <- summary(SL.lm)$r.squared
visreg(SL.lm, gg=TRUE) + ggtitle(paste("R-Square =", round(RSQ, 4)))
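The question also asks for the fitted equation. A possible extension, sketched under the assumption that a title and subtitle are an acceptable place for the text, builds the equation string from `coef()`. The names `cf`, `eq` and `p` are introduced here for illustration.

```r
library(visreg)
library(ggplot2)

SL.lm <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Build "y = a + b x" (or "y = a - b x") from the fitted coefficients
cf <- coef(SL.lm)
eq <- sprintf("y = %.3f %s %.3f x",
              cf[1], ifelse(cf[2] < 0, "-", "+"), abs(cf[2]))
rsq <- summary(SL.lm)$r.squared

# Equation as the title, R-squared as the subtitle
p <- visreg(SL.lm, "Sepal.Width", gg = TRUE) +
  ggtitle(eq, subtitle = sprintf("R-squared = %.4f", rsq))
print(p)
```

To place the text inside the panel instead, `annotate("text", ...)` with coordinates chosen for your data would be the usual ggplot2 route.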

How to plot numerous polygons in each data category?

I am working with ggplot to plot bivariate data in groups along with standard ellipses of these data using a separate set of tools. These return n=100 x,y coordinates that define each ellipse, and then for each group, I would like to plot about 10-25 ellipses.
Conceptually, how can this be achieved? I can plot a single ellipse easily using geom_polygon, but I am confused how to get the data organized to make it work so multiple ellipses are plotted and guides (color, fills, linetypes, etc.) are applied per group.
In the traditional R plotting, I could just keep adding lines using a for loop.
Thanks!
UPDATE: Here is a CSV containing 100 coordinates for a single ellipse.
Data
Let's say I have three groups of bivariate data to which the ellipse fitting has been applied: Green, Red, Blue. For each group, I'd like to plot several ellipses.
I don't know how I would organize the data so that it works in the long format preferred by ggplot and preserves the group affiliations. Would a list work?
UPDATE2:
Here is a csv of raw x and y data organized into two groups: river and lake
Data
The data plot like this:
test.data <- read.csv("ellipse_test_data.csv")
ggplot(test.data) +
geom_point(aes(x, y, color = group)) +
theme_classic()
I am using a package called SIBER, which fits Bayesian ellipses to the data so that groups can be compared by ellipse area, etc. The code below produces a list with one element per group of data. Each element is an n x 6 matrix (n = number of posterior draws) with one row per fitted ellipse: the first four columns are the covariance matrix Sigma in vector format and the last two are the bivariate means:
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^5 # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 100 # thin the posterior by this many
parms$n.chains <- 2 # run this many chains
# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3
# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the
# means. Fitting is via the JAGS method.
ellipses.test <- siberMVN(siber.test, parms, priors)
First few rows of the first element in the list:
$`1.river`
Sigma2[1,1] Sigma2[2,1] Sigma2[1,2] Sigma2[2,2] mu[1] mu[2]
[1,] 1.2882740 2.407070e-01 2.407070e-01 1.922637 -15.52846 12.14774
[2,] 1.0677979 -3.997169e-02 -3.997169e-02 2.448872 -15.49182 12.37709
[3,] 1.1440816 7.257331e-01 7.257331e-01 4.040416 -15.30151 12.14947
I would like to be able to extract a random number of these ellipses and plot them with ggplot using alpha transparency.
The package SIBER has a function (addEllipse) to convert each of these six-value entries to a set number of x and y points that define an ellipse, but I don't know how to organize that output for ggplot. I thought there might be an elegant way to do this entirely within ggplot.
The ideal output would be something like this, but in ggplot so the ellipses could match the aesthetics of the levels of data:
Here is some code to do this on the bundled demo dataset from SIBER.
In this example we try to create some plots of the multiple samples of the posterior ellipses using ggplot2.
library(SIBER)
library(ggplot2)
library(dplyr)
library(ellipse)
Fit a basic SIBER model to the example data bundled with the package.
# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)
# Calculate summary statistics for each group: TA, SEA and SEAc
group.ML <- groupMetricsML(siber.example)
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4 # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10 # thin the posterior by this many
parms$n.chains <- 2 # run this many chains
# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3
# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the
# means. Fitting is via the JAGS method.
ellipses.posterior <- siberMVN(siber.example, parms, priors)
# The posterior estimates of the ellipses for each group can be used to
# calculate the SEA.B for each group.
SEA.B <- siberEllipses(ellipses.posterior)
siberDensityPlot(SEA.B, xticklabels = colnames(group.ML),
xlab = c("Community | Group"),
ylab = expression("Standard Ellipse Area " ('\u2030' ^2) ),
bty = "L",
las = 1,
main = "SIBER ellipses on each group"
)
Now we want to create some plots of some sample ellipses from these distributions. We need to create a data.frame object of all the ellipses for each group. In this example we simply take the first 10 posterior draws, assuming them to be independent of one another, but you could take a random sample if you prefer.
# how many of the posterior draws do you want?
n.posts <- 10
# decide how big an ellipse you want to draw
p.ell <- 0.95
# for a standard ellipse use
# p.ell <- pchisq(1,2)
# a list to store the results
all_ellipses <- list()
# loop over groups
for (i in 1:length(ellipses.posterior)){
  # a dummy variable to build in the loop
  ell <- NULL
  post.id <- NULL
  for ( j in 1:n.posts){
    # covariance matrix
    Sigma <- matrix(ellipses.posterior[[i]][j,1:4], 2, 2)
    # mean
    mu <- ellipses.posterior[[i]][j,5:6]
    # ellipse points
    out <- ellipse::ellipse(Sigma, centre = mu , level = p.ell)
    ell <- rbind(ell, out)
    post.id <- c(post.id, rep(j, nrow(out)))
  }
  ell <- as.data.frame(ell)
  ell$rep <- post.id
  all_ellipses[[i]] <- ell
}
ellipse_df <- bind_rows(all_ellipses, .id = "id")
# now we need the group and community names
# extract them from the ellipses.posterior list
group_comm_names <- names(ellipses.posterior)[as.numeric(ellipse_df$id)]
# split them and convert to a matrix, NB byrow = T
split_group_comm <- matrix(unlist(strsplit(group_comm_names, "[.]")),
nrow(ellipse_df), 2, byrow = TRUE)
ellipse_df$community <- split_group_comm[,1]
ellipse_df$group <- split_group_comm[,2]
ellipse_df <- dplyr::rename(ellipse_df, iso1 = x, iso2 = y)
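If you would rather draw a random sample of posterior ellipses than the first ten, the same construction works with sampled row indices. The sketch below is a base-R illustration only: `post` is a simulated stand-in for one element of `ellipses.posterior` (it does not come from SIBER), and `ellipse_points()` is a hypothetical helper that computes the ellipse directly from the Cholesky factor of Sigma rather than calling `ellipse::ellipse()`.

```r
set.seed(1)

# Hypothetical stand-in for one element of ellipses.posterior:
# one row per posterior draw, columns = Sigma (4 values) then mu (2 values).
n.draws <- 200
post <- cbind(1 + runif(n.draws), 0.2, 0.2, 2 + runif(n.draws),
              rnorm(n.draws, -15, 0.1), rnorm(n.draws, 12, 0.1))

# Points on the ellipse that contains a proportion `level` of a bivariate normal
ellipse_points <- function(Sigma, mu, level = 0.95, n = 100) {
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))   # 2 x n unit circle
  r      <- sqrt(qchisq(level, df = 2))     # radius on the Mahalanobis scale
  pts    <- t(mu + r * t(chol(Sigma)) %*% circle)
  data.frame(x = pts[, 1], y = pts[, 2])
}

# Randomly chosen draws instead of the first n.posts rows
n.posts <- 10
keep <- sample(nrow(post), n.posts)

sampled_ellipses <- do.call(rbind, lapply(keep, function(j) {
  out <- ellipse_points(matrix(post[j, 1:4], 2, 2), post[j, 5:6])
  out$rep <- j   # keeps each ellipse identifiable for aes(group = rep)
  out
}))
```

In the loop above, the equivalent change would be to iterate over a vector of sampled row indices instead of `1:n.posts`.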
Now to create the plots. First plot all the raw data as we want.
first.plot <- ggplot(data = demo.siber.data, aes(iso1, iso2)) +
geom_point(aes(color = factor(group):factor(community)), size = 2)+
ylab(expression(paste(delta^{15}, "N (\u2030)")))+
xlab(expression(paste(delta^{13}, "C (\u2030)"))) +
theme(text = element_text(size=15))
print(first.plot)
Now we can try to add the posterior ellipses on top and facet by group
second.plot <- first.plot + facet_wrap(~factor(group):factor(community))
print(second.plot)
# rename columns of ellipse_df to match the aesthetics
third.plot <- second.plot +
geom_polygon(data = ellipse_df,
mapping = aes(iso1, iso2,
group = rep,
color = factor(group):factor(community),
fill = NULL),
fill = NA,
alpha = 0.2)
print(third.plot)
Facet-wrapped plot of sample of posterior ellipses by group

Graphing a large number of plots

I'm fitting a dose-response curve to many data sets that I want to plot to a single file.
Here's what one data set looks like:
df <- data.frame(dose=c(10,0.625,2.5,0.156,0.0391,0.00244,0.00977,0.00061,10,0.625,2.5,0.156,0.0391,0.00244,0.00977,0.00061,10,0.625,2.5,0.156,0.0391,0.00244,0.00977,0.00061),viability=c(6.12,105,57.9,81.9,86.5,98.3,96.4,81.8,27.3,85.2,80.8,92,82.5,110,90.2,76.6,11.9,89,35.4,79,95.8,117,82.1,95.1),stringsAsFactors=F)
Here's the dose-response fit:
library(drc)
fit <- drm(viability~dose,data=df,fct=LL.4(names=c("Slope","Lower Limit","Upper Limit","ED50")))
Now I'm predicting values in order to plot the curve:
pred.df <- expand.grid(dose=exp(seq(log(max(df$dose)),log(min(df$dose)),length=100)))
pred <- predict(fit,newdata=pred.df,interval="confidence")
pred.df$viability <- pred[,1]
pred.df$viability.low <- pred[,2]
pred.df$viability.high <- pred[,3]
And this is what a single plot looks like:
library(ggplot2)
p <- ggplot(df,aes(x=dose,y=viability))+geom_point()+geom_ribbon(data=pred.df,aes(x=dose,y=viability,ymin=viability.low,ymax=viability.high),alpha=0.2)+labs(y="viability")+
geom_line(data=pred.df,aes(x=dose,y=viability))+coord_trans(x="log")+theme_bw()+scale_x_continuous(name="dose",breaks=sort(unique(df$dose)),labels=format(signif(sort(unique(df$dose)),3),scientific=T))+ggtitle(label="all doses")
Adding a few parameter estimates to the plot:
params <- signif(summary(fit)$coefficient[-1,1],3)
names(params) <- c("lower","upper","ed50")
p <- p + annotate("text",size=3,hjust=0,x=2.4e-3,y=5,label=paste(sapply(1:length(params),function(p) paste0(names(params)[p],"=",params[p])),collapse="\n"),colour="black")
Which gives:
Now suppose I have 20 of these that I want to cram in a single figure file.
I thought that a reasonable solution would be to use grid.arrange:
As an example I'll loop 20 times on this example data set:
plot.list <- vector(mode="list",20)
for(i in 1:20){
plot.list[[i]] <- ggplot(df,aes(x=dose,y=viability))+geom_point()+geom_ribbon(data=pred.df,aes(x=dose,y=viability,ymin=viability.low,ymax=viability.high),alpha=0.2)+labs(y="viability")+
geom_line(data=pred.df,aes(x=dose,y=viability))+coord_trans(x="log")+theme_bw()+scale_x_continuous(name="dose",breaks=sort(unique(df$dose)),labels=format(signif(sort(unique(df$dose)),3),scientific=T))+ggtitle(label="all doses")+
annotate("text",size=3,hjust=0,x=2.4e-3,y=5,label=paste(sapply(1:length(params),function(p) paste0(names(params)[p],"=",params[p])),collapse="\n"),colour="black")
}
And then plot using:
library(grid)
library(gridExtra)
grid.arrange(grobs=plot.list,ncol=3,nrow=ceiling(length(plot.list)/3))
Which is obviously poorly scaled. So my question is how to create this figure with better scaling, meaning that all objects are compressed proportionally in a way that produces a figure that is still visually interpretable.
You should set the device size so that the plots remain readable, e.g.
pl = replicate(11, qplot(1,1), simplify = FALSE)
g = arrangeGrob(grobs = pl, ncol=3)
ggsave("plots.pdf", g, width=15, height=20)
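One way to apply this to the 20-plot case is to derive the device size from the grid dimensions, so that every panel gets roughly the same area however many plots there are. This is a sketch rather than a tuned recipe: the placeholder plots stand in for the dose-response plots, and `panel.in` (inches per panel) and `out.file` are names and values assumed here.

```r
library(ggplot2)
library(gridExtra)

# Placeholder plots standing in for the 20 dose-response plots
plot.list <- replicate(
  20,
  ggplot(data.frame(x = 1, y = 1), aes(x, y)) + geom_point(),
  simplify = FALSE
)

n.col    <- 3
n.row    <- ceiling(length(plot.list) / n.col)
panel.in <- 4   # assumed size of each panel, in inches

g <- arrangeGrob(grobs = plot.list, ncol = n.col)

# Scale the page with the grid so each panel keeps a readable size
out.file <- file.path(tempdir(), "plots.pdf")
ggsave(out.file, g,
       width = n.col * panel.in, height = n.row * panel.in,
       limitsize = FALSE)
```

If a single page becomes too tall, `gridExtra::marrangeGrob()` can split the same list of plots across several pages.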

Place different QQ plot (with different datasets) in the same coordinate system

I can only get the QQ plots one at a time, one for each dataset.
library(fitdistrplus)
x1<-c(1300,541,441,35,278,167,276,159,126,60.8,160,5000,264.6,379,170,251.3,155.84,187.01,850)
x2<-c(25,500,42,100,10,8.2,76,2.2,7.86,50)
y1<-log10(x1)
y2<-log10(x2)
x1.logis <- fitdist(y1, "logis", method="mle")
x2.logis <- fitdist(y2, "logis", method="mle")
ppcomp(x1.logis, addlegend=FALSE)
ppcomp(x2.logis, addlegend=FALSE)
How can I place the two QQ plots in the same coordinate system?
Use ggplot2. You need to extract the fitted values from each fitdist object and put them in a new data frame. Use ggplot2 layers to add the fitted values from the two data sets, and then add an abline.
library(ggplot2)
fittedx1 <- data.frame(x = sort(plogis(x1.logis$data,
location = x1.logis$estimate[1],
scale = x1.logis$estimate[2])),
p = (1:length(x1.logis$data))/length(x1.logis$data))
fittedx2 <- data.frame(x = sort(plogis(x2.logis$data,
location = x2.logis$estimate[1],
scale = x2.logis$estimate[2])),
p = (1:length(x2.logis$data))/length(x2.logis$data))
fitted <- rbind(fittedx1,fittedx2) #You need to combine the two datasets
#Add a variable that identifies which dataset the values belong to
#Then you can use the col option in ggplot to give each data set its own color!
fitted$set <- c(rep("1", nrow(fittedx1)), rep("2", nrow(fittedx2)))
#Now plot
ggplot(fitted) +
geom_point(aes(p, x, col=set), shape=1, size=3) +
geom_abline(intercept=0, slope=1)
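The code above compares fitted and empirical probabilities, which matches what `ppcomp` draws (a P-P plot). If a Q-Q plot is what you are after, a similar data frame can be built from theoretical quantiles instead. The following is a sketch under that assumption: `qq_frame()` is a helper name introduced here, and the plotting positions come from base R's `ppoints()`.

```r
library(fitdistrplus)
library(ggplot2)

x1 <- c(1300,541,441,35,278,167,276,159,126,60.8,160,5000,264.6,379,170,
        251.3,155.84,187.01,850)
x2 <- c(25,500,42,100,10,8.2,76,2.2,7.86,50)
y1 <- log10(x1)
y2 <- log10(x2)

# Theoretical vs. empirical quantiles for one data set (hypothetical helper)
qq_frame <- function(y, label) {
  fit <- fitdist(y, "logis", method = "mle")
  data.frame(
    theoretical = qlogis(ppoints(length(y)),
                         location = unname(fit$estimate["location"]),
                         scale    = unname(fit$estimate["scale"])),
    empirical   = sort(y),
    set         = label
  )
}

qq <- rbind(qq_frame(y1, "1"), qq_frame(y2, "2"))

# Both data sets in one coordinate system, with the 1:1 reference line
ggplot(qq, aes(theoretical, empirical, colour = set)) +
  geom_point(shape = 1, size = 3) +
  geom_abline(intercept = 0, slope = 1)
```

Each data set is compared against its own fitted logistic distribution, so points near the 1:1 line indicate a good fit for that set.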

Change colors of select lines in ggplot2 coefficient plot in R

I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I don't think you can do this with a plot produced by coefplot.lm. The coefplot package uses ggplot2 as its plotting system, which is good in itself, but it does not let you control the colors as easily as you would like. To get the desired colors, you need a variable in your data set that color-codes the values, and you need to map that variable to color inside aes() in the layer that draws the coefficient dots. Apparently this is not possible with the output of the coefplot.lm function. You might be able to change the colors using ggplot2's ggplot_build() function, but I would say it's easier to write your own function for this task.
I've done this once to plot odds ratios. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot: first, we extract the coefficients from a model object and prepare the data set for plotting; second, we actually plot it.
The code for extracting coefficients and data set preparation
df_plot_odds <- function(x){
  tmp <- data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
  odds <- tmp[-1, ]  # drop the intercept
  names(odds) <- c('OR', 'lower', 'upper')
  odds$vars <- row.names(odds)
  # blue for positive effects (OR > 1), red otherwise
  odds$col <- ifelse(odds$OR > 1, 'blue', 'red')
  odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
  return(odds)
}
Plot the output of the extract function
plot_odds <- function(df_plot_odds, xlab="Odds Ratio", ylab="", asp=1){
  require(ggplot2)
  p <- ggplot(df_plot_odds, aes(x=vars, y=OR, ymin=lower, ymax=upper)) +
    geom_errorbar(aes(color=col), width=0.1) +
    geom_point(aes(color=col), size=3) +
    geom_hline(yintercept = 1, linetype=2) +
    # named values so each code keeps its color even if only one is present
    scale_color_manual('Effect', labels=c('Positive','Negative'),
                       values=c(blue='blue', red='red')) +
    coord_flip() +
    theme_bw() +
    theme(legend.position="none", aspect.ratio = asp) +
    ylab(xlab) +
    xlab(ylab) #switch because of the coord_flip() above
  return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
plot <- plot_odds(df)
plot
Which yields
Note that I chose theme_bw() as the default. The output is a ggplot2 object, so you can change it quite a lot.
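For a linear model, the same idea can also be applied on the raw coefficient scale, coloring by the sign of the estimate as the question asks. This is a minimal sketch, not a replacement for coefplot: the data frame `coef_df` and its column names are introduced here, and the intervals come from `confint()`.

```r
library(ggplot2)

set.seed(123)
dat  <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)

# One row per coefficient, with its 95% confidence interval
ci <- confint(mod1)
coef_df <- data.frame(
  term     = names(coef(mod1)),
  estimate = unname(coef(mod1)),
  lower    = ci[, 1],
  upper    = ci[, 2]
)

# Color code: sign of the point estimate
coef_df$sign <- ifelse(coef_df$estimate < 0, "negative", "positive")

p <- ggplot(coef_df, aes(x = estimate, y = term, colour = sign)) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_segment(aes(x = lower, xend = upper, yend = term)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(negative = "red", positive = "blue")) +
  theme_bw()
print(p)
```

Naming the entries of `values` ties each color to its sign, so the mapping stays correct when all estimates happen to share one sign.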
