Customize formula in geom_smooth / ggplot2 / R

I want to customize the formula used in geom_smooth like this:
library(MASS)
library(ggplot2)
data("Cars93", package = "MASS")
str(Cars93)
Cars93.log <- transform(Cars93, log.price = log(Price))
log.model <- lm(log.price ~ Horsepower*Origin, data = Cars93.log)
summary(log.model)
plot(log.model)
p <- ggplot(data = Cars93.log, aes(x = Horsepower, y = log.price, colour = Origin)) +
geom_point(aes(shape = Origin, color = Origin)) + # points
facet_grid(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
lm.mod <- function(df) {
y ~ x*Cars93.log$Origin
}
p_smooth <- by(Cars93.log, Cars93.log$Origin,
function(x) geom_smooth(data=x, method = lm, formula = lm.mod(x)))
p + p_smooth
However, I get an error saying that the computation failed because the variables have different lengths.
length(Cars93.log$log.price)
length(Cars93.log$Origin)
length(Cars93.log$Horsepower)
But when I check the length for each variable they're all the same... Any ideas, what's wrong?
Thanks a lot, Martina

I agree with @Rui Barradas: the issue seems to be the lm.mod and p_smooth definitions and the use of by().
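To see where the length mismatch comes from: inside by(), each subset has fewer rows than the full Cars93.log$Origin vector that the formula refers to. The following is a minimal sketch that reproduces the same kind of mismatch with a plain lm() call; the names cars_log, usa and res are illustrative and not from the original post.

```r
library(MASS)  # provides the Cars93 data set

data("Cars93", package = "MASS")
cars_log <- transform(Cars93, log.price = log(Price))

# One of the subsets that by() would hand to the function
usa <- subset(cars_log, Origin == "USA")

nrow(usa)                # rows in the subset
length(cars_log$Origin)  # rows in the full data: more than in the subset

# The formula mixes columns of the subset with a full-length vector,
# so model fitting stops with an error; capture its message instead of failing
res <- tryCatch(
  lm(log.price ~ Horsepower * cars_log$Origin, data = usa),
  error = function(e) conditionMessage(e)
)
res
```

Checking length() on the columns of the full data frame does not reveal this, because the short vectors only exist inside the per-group subsets.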
Once you are making a distinction by Origin (e.g., by doing either facet_wrap or color = Origin) then geom_smooth will automatically run different models for those facets.
p <- ggplot(data = Cars93.log,
aes(x = Horsepower, y = log.price, color = Origin)) +
geom_point(aes(shape = Origin)) +
facet_wrap(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
p + geom_smooth(method = lm, formula = y ~ x)
You can convince yourself that this gives the same fitted lines as log.model by extending the x-axis limits to see where each geom_smooth line would cross the y-axis (e.g., + coord_cartesian(xlim = c(0, 300))).
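The same check can be done numerically. This is a small sketch, assuming that a grouped geom_smooth(method = lm, formula = y ~ x) amounts to one separate lm() fit per Origin; the names cars_log, full_fit, group_fits and from_full are illustrative.

```r
library(MASS)  # provides the Cars93 data set

data("Cars93", package = "MASS")
cars_log <- transform(Cars93, log.price = log(Price))

# Interaction model from the question: one fit with group-specific terms
full_fit <- lm(log.price ~ Horsepower * Origin, data = cars_log)

# Separate straight-line fits, one per level of Origin
group_fits <- lapply(
  split(cars_log, cars_log$Origin),
  function(d) coef(lm(log.price ~ Horsepower, data = d))
)

# Rebuild each group's intercept and slope from the interaction coefficients
b <- coef(full_fit)
from_full <- list(
  "USA"     = c(b[["(Intercept)"]], b[["Horsepower"]]),
  "non-USA" = c(b[["(Intercept)"]] + b[["Originnon-USA"]],
                b[["Horsepower"]] + b[["Horsepower:Originnon-USA"]])
)

all.equal(unname(group_fits[["USA"]]), from_full[["USA"]])
all.equal(unname(group_fits[["non-USA"]]), from_full[["non-USA"]])
```

The fitted lines agree, but the confidence bands need not: the interaction model pools the residual variance across both groups, while the per-group fits estimate it separately.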
You can also see the difference if geom_smooth does not receive color = Origin, which is what happens when you comment it out of the initial ggplot() call (a single regression line is then fitted to all the data):
p <- ggplot(data = Cars93.log,
aes(x = Horsepower, y = log.price)) + # color = Origin)) +
geom_point(aes(shape = Origin)) +
#facet_wrap(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
p + geom_smooth(method = lm, formula = y ~ x)

Related

How to color the area between two geom_smooth lines?

I have 3 columns in a data frame from which I want to create a visualisation with geom_smooth() :
ggplot(my_data_frame) +
aes(x = fin_enquete,
y = intentions,
colour = candidat) +
geom_point(alpha = 1/6,
shape = "circle",
size = .5) +
geom_smooth(mapping = aes(y = erreur_inf),
size = .5,
span = .42,
se = FALSE) +
geom_smooth(mapping = aes(y = erreur_sup),
size = .5,
span = .42,
se = FALSE) +
geom_smooth(method = "loess",
size = 1.5,
span = .42,
se = FALSE) +
labs(x = "Date de fin d'enquête",
y = "Pourcentage d'intentions de vote") +
theme_minimal() +
theme(text = element_text(family = "DIN Pro")) +
coord_cartesian(expand = F) +
easy_remove_legend()
[Figure: three lines drawn with geom_smooth]
I would like to color the area between the upper and the lower line. I know the geom_ribbon() function but I am not sure I can use it in this situation.
Does anybody have a solution?
Have a nice day!
You could use geom_ribbon and calculate the loess model yourself within the geom_ribbon call.
Toy random data
dat <- data.frame(x=1:100, y=runif(100), y2=runif(100)+1, y3=runif(100)+2)
Now suppose we want a smoothed ribbon between y and y3, with y2 drawn as a line between them:
ggplot( dat , aes(x, y2)) +
geom_ribbon(aes(ymin=predict(loess(y~x)),
ymax=predict(loess(y3~x))), alpha=0.3) +
geom_smooth(se=F)
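A variant of the same idea, as a hedged sketch: compute the two loess curves once, store them as columns, and pass the span explicitly so the ribbon edges use the same smoothing as the lines (0.42 is the span from the question). The column names lower and upper are illustrative. With grouped data such as colour = candidat, the predictions would have to be computed per group, which this sketch does not do.

```r
library(ggplot2)

set.seed(1)  # only to make the toy data reproducible
dat <- data.frame(x = 1:100, y = runif(100),
                  y2 = runif(100) + 1, y3 = runif(100) + 2)

# Smoothed lower and upper edges, with the same span as the plotted line
dat$lower <- predict(loess(y ~ x, data = dat, span = 0.42))
dat$upper <- predict(loess(y3 ~ x, data = dat, span = 0.42))

p <- ggplot(dat, aes(x, y2)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.42, se = FALSE)
p
```

Precomputing the edges keeps the aes() call simple and makes it easy to inspect the values that define the ribbon.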
You could use lapply() to fit smooths over a range of df values, such as c(5, 11, 13), and then draw only the two edges of the standard-error band with geom_ribbon().
Sample code:
library(ggplot2)
ggplot(data = mtcars,
mapping = aes(x = wt,
y = mpg)) +
geom_point(size = 2)+
lapply(c(5,11, 13), function (i) {
geom_smooth(
data = ~ cbind(., facet_plots = i),
method = lm,
se=F,
formula = y ~ splines::bs(x, i)
)
})+
#facet_wrap(vars(facet_plots))
geom_ribbon(
stat = "smooth",
method = "loess",
se = TRUE,
alpha = 0, # or, use fill = NA
colour = "black",
linetype = "dotted")+
theme_minimal()

How can I add a layer showing the distribution on a conditional variable in a probability plot in R studio?

I am fitting the following regression:
model <- glm(DV ~ conditions + predictor + conditions*predictor, family = binomial(link = "probit"), data = d)
I use 'sjPlot' (and 'ggplot2') to make the following plot:
library("ggplot2")
library("sjPlot")
plot_model(model, type = "pred", terms = c("predictor", "conditions")) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
But I can't figure out how to add a layer showing the distribution on the conditioning variable like I can easily do by setting "hist = TRUE" using 'interplot':
library("interplot")
interplot(model, var1 = "conditions", var2 = "predictor", hist = TRUE) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
I have also tried a bunch of layers using just ggplot, with no success:
ggplot(d, aes(x=predictor, y=DV, color=conditions))+
geom_smooth(method = "glm") +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
I am open to any suggestions!
I've obviously had to try to recreate your data to get this to work, so it won't be faithful to your original, but if we assume your plot is something like this:
p <- plot_model(model, type = "pred", terms = c("predictor [all]", "conditions")) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
p
Then we can add a histogram of the predictor variable like this:
p + geom_histogram(data = d, inherit.aes = FALSE,
aes(x = predictor, y = ..count../1000),
fill = "gray85", colour = "gray50", alpha = 0.3)
And if you wanted to do the whole thing in ggplot, you need to remember to tell geom_smooth that your glm is a probit model, otherwise it will just fit a normal linear regression. I've copied the color palette over too for this example, though note the smoothing lines for the groups start at their lowest x value rather than extrapolating back to 0.
ggplot(d, aes(x = predictor, y = DV, color = conditions))+
geom_smooth(method = "glm", aes(fill = conditions),
method.args = list(family = binomial(link = "probit")),
alpha = 0.15, size = 0.5) +
xlab("Xlab") +
scale_fill_manual(values = c("#e41a1c", "#377eb8")) +
scale_colour_manual(values = c("#e41a1c", "#377eb8")) +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title") +
geom_histogram(aes(y = ..count../1000),
fill = "gray85", colour = "gray50", alpha = 0.3)
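The point about telling geom_smooth the model family can be illustrated outside ggplot2. This is a small sketch on simulated data (x, y, linear_fit and probit_fit are illustrative names): glm() without a family argument defaults to gaussian, which is an ordinary linear regression, so its predictions are not constrained to lie between 0 and 1.

```r
set.seed(42)  # simulated data, only for illustration
x <- runif(500)
y <- rbinom(500, 1, pnorm(-1 + 2 * x))  # binary outcome from a probit model

linear_fit <- glm(y ~ x)  # no family given: gaussian, i.e. linear regression
probit_fit <- glm(y ~ x, family = binomial(link = "probit"))

family(linear_fit)$family
family(probit_fit)$link

# Predictions outside the observed x range show the difference most clearly
new_x <- data.frame(x = c(-2, 3))
linear_pred <- predict(linear_fit, new_x, type = "response")
probit_pred <- predict(probit_fit, new_x, type = "response")
linear_pred  # a straight line, free to leave the [0, 1] interval
probit_pred  # probabilities, always inside [0, 1]
```

Passing method.args = list(family = binomial(link = "probit")) to geom_smooth, as in the answer, makes the plotted curve correspond to probit_fit rather than linear_fit.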
Data
set.seed(69)
n_each <- 500
predictor <- rgamma(2 * n_each, 2.5, 3)
predictor <- 1 - predictor/max(predictor)
log_odds <- c((1 - predictor[1:n_each]) * 5 - 3.605,
predictor[n_each + 1:n_each] * 0 + 0.57)
DV <- rbinom(2 * n_each, 1, exp(log_odds)/(1 + exp(log_odds)))
conditions <- factor(rep(c(" ", " "), each = n_each))
d <- data.frame(DV, predictor, conditions)

Add item to legend by theme options

I have a data frame d like this:
d <- data.frame("name" = c("pippo","pluto","paperino"),
"id" = c(1,2,3),"count" = c(10,20,30),
"pvalue"=c(0.01,0.02,0.05),
geneRatio=c(0.5,0.8,0.2),
type=c("KEGG","Reactome","Reactome"))
and I plot a dotplot using the library ggplot:
ggplot(data = d,aes(geneRatio,name,size=count,colour = pvalue)) +
geom_point()+
ggtitle("Significantly Pathways") +
xlab("Gene Ratio") +
ylab("Pathways")+
theme(axis.text.y = element_text(color=d$type))
This is the plot at the moment
I would like to add the information on "type" contained in data frame d to the legend.
I would like to have a new item in the legend with color red = Reactome and color black= KEGG
Not saying that this is a good idea, but you can add a nonsensical geom to force the adding of a guide:
d <- data.frame("name" = c("pippo","pluto","paperino"),
"id" = c(1,2,3),
"count" = c(10,20,30),
"pvalue"=c(0.01,0.02,0.05), # must be named pvalue, since aes() below maps colour = pvalue
geneRatio=c(0.5,0.8,0.2),
type=c("KEGG","Reactome","Reactome")
)
library(ggplot2)
ggplot(data = d, aes(geneRatio,name,colour = pvalue)) +
geom_point(aes(size=count))+
geom_polygon(aes(geneRatio,name,fill = type)) +
ggtitle("Significantly Pathways") +
xlab("Gene Ratio") +
ylab("Pathways") +
scale_fill_manual(values = c('Reactome'='red', 'KEGG'='black')) +
theme(axis.text.y = element_text(color=d$type))
geom_polygon may not work with your actual data, and you may not find a suitable 'nonsensical' geom. I agree with @zx8754, a facet would be clearer:
ggplot(data = d, aes(geneRatio,name,colour = pvalue)) +
geom_point(aes(size=count)) +
ggtitle("Significantly Pathways") +
xlab("Gene Ratio") +
ylab("Pathways") +
facet_grid(type ~ ., scales = 'free_y', switch = 'y')
You could accomplish this using annotate, but it is a bit manual.
ggplot(data = d, aes(geneRatio, name, size = count, colour = pvalue)) +
geom_point() +
ggtitle("Significantly Pathways") +
xlab("Gene Ratio") +
ylab("Pathways")+
theme(axis.text.y = element_text(color=d$type)) +
annotate("text", x = 0.25, y = 3.5, label = "Reactome", color = "red") +
annotate("text", x = 0.25, y = 3.4, label = "KEGG", color = "black")
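A caveat about theme(axis.text.y = element_text(color = d$type)), which several of the snippets above rely on: since R 4.0, data.frame() keeps strings as character, so d$type holds "KEGG" and "Reactome" rather than factor codes, and these are not colour names. Vectorised colours in element_text() are also not officially supported by ggplot2, and they follow the order of the axis labels, not the row order of d. The sketch below, with the illustrative names type_cols, axis_order and axis_cols, builds the colour vector explicitly under the assumption that the y-axis uses the default alphabetical order.

```r
library(ggplot2)

d <- data.frame(name = c("pippo", "pluto", "paperino"),
                count = c(10, 20, 30),
                pvalue = c(0.01, 0.02, 0.05),
                geneRatio = c(0.5, 0.8, 0.2),
                type = c("KEGG", "Reactome", "Reactome"))

# Explicit colour for each pathway type
type_cols <- c(KEGG = "black", Reactome = "red")

# Default order of a discrete axis built from a character column: sorted values
axis_order <- sort(unique(as.character(d$name)))

# Look up each label's type, then its colour, in axis order
axis_cols <- unname(type_cols[as.character(d$type)[match(axis_order, d$name)]])

p <- ggplot(d, aes(geneRatio, name, size = count, colour = pvalue)) +
  geom_point() +
  xlab("Gene Ratio") +
  ylab("Pathways") +
  # vectorised colours here are an unofficial ggplot2 feature and may warn
  theme(axis.text.y = element_text(colour = axis_cols))
p
```

This only colours the axis labels; the legend entry itself still needs one of the approaches above (a fill guide, a facet, or annotations).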

ggplot error bar legend

I am having difficulties adding a legend to my error bar plot. I tried several commands that I've seen in other threads, but unfortunately they don't work (I am sure I'm missing something, but I can't figure out what).
library(ggplot2)
errors=matrix(c(-3.800904,-3.803444,-3.805985,-3.731204,-3.743969,
-3.756735,-3.742510,-3.764961,-3.787413,-3.731204,-3.743969,-3.756735,
-3.711420,-3.721589,-3.731758,-3.731204,-3.743969,-3.756735,-3.636346,
-3.675159,-3.713971,-3.731204,-3.743969,-3.756735),nrow=4,byrow=TRUE)
modelName=c("model 1","model 2","model 3","model 0")
boxdata=data.frame(errors,modelName)
colnames(boxdata)=c("icp","pred","icm","icp_obs","obs","icm_obs","model")
qplot(boxdata$model,boxdata$pred,
main = paste("confidance level 95% for age ", age_bp + start_age - 1,sep="")) +
geom_errorbar(aes(x=boxdata$model, ymin=boxdata$icm, ymax=boxdata$icp), width=0.20,col='deepskyblue') +
geom_point(aes(x=boxdata$model,y=boxdata$obs),shape=4,col="orange") +
geom_errorbar(aes(x=boxdata$model, ymin=boxdata$icm_obs, ymax=boxdata$icp_obs), width=0.20,col='red') +
scale_shape_manual(name="legend", values=c(19,4)) +
scale_color_manual(name="legend", values = c("black","orange")) +
xlab("models") +
ylab("confidence level")
The problem is that you are using wide form data rather than long form data. You need to convert the data from wide to long before plotting if you want to get a legend.
library(ggplot2)
errors=matrix(c(-3.800904,-3.803444,-3.805985,-3.731204,-3.743969,
-3.756735,-3.742510,-3.764961,-3.787413,-3.731204,-3.743969,-3.756735,
-3.711420,-3.721589,-3.731758,-3.731204,-3.743969,-3.756735,-3.636346,
-3.675159,-3.713971,-3.731204,-3.743969,-3.756735),nrow=4,byrow=TRUE)
errors = rbind(errors[, 1:3], errors[,4:6]) # manually reshaping the data
modelName=c("model 1","model 2","model 3","model 0")
type = rep(c("model", "obs"), each = 4)
boxdata=data.frame(errors,modelName, type)
colnames(boxdata)=c("icp","pred","icm","model", "type")
ggplot(boxdata, aes(x = model, y = pred, ymax = icp, ymin = icm,
group = type, colour = type, shape = type)) +
geom_errorbar(width=0.20) +
geom_point() +
scale_shape_manual(values=c(19, 4)) +
scale_color_manual(values = c("black","orange")) +
xlab("models") +
ylab("confidence level")
Output that looks closer to yours can be generated by:
ggplot(boxdata, aes(x = model, y = pred, ymax = icp, ymin = icm,
group = type, colour = type, shape = type)) +
geom_errorbar(width=0.20) +
geom_point(colour = rep(c("black","orange"), each = 4)) +
scale_shape_manual(values=c(19, 4)) +
scale_color_manual(values = c("deepskyblue", "red")) +
xlab("models") +
ylab("confidence level")
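The manual rbind() on matrix columns works but relies on column positions. As a hedged alternative sketch, the same wide-to-long reshape can be done by column name in base R; the names wide, long_cols, pred_part, obs_part and long are illustrative.

```r
errors <- matrix(c(-3.800904,-3.803444,-3.805985,-3.731204,-3.743969,
-3.756735,-3.742510,-3.764961,-3.787413,-3.731204,-3.743969,-3.756735,
-3.711420,-3.721589,-3.731758,-3.731204,-3.743969,-3.756735,-3.636346,
-3.675159,-3.713971,-3.731204,-3.743969,-3.756735), nrow = 4, byrow = TRUE)

# Wide form: one row per model, separate columns for predictions and observations
wide <- data.frame(errors,
                   model = c("model 1", "model 2", "model 3", "model 0"))
colnames(wide)[1:6] <- c("icp", "pred", "icm", "icp_obs", "obs", "icm_obs")

# Long form: stack the prediction and observation columns under shared names
long_cols <- c("model", "icp", "pred", "icm")
pred_part <- wide[, long_cols]
obs_part  <- setNames(wide[, c("model", "icp_obs", "obs", "icm_obs")], long_cols)

long <- rbind(cbind(pred_part, type = "model"),
              cbind(obs_part, type = "obs"))
long
```

The resulting long data frame has the same columns as boxdata in the answer, so it can be passed to the ggplot() calls above unchanged.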

Regression line lost after factor conversion

In the following plot, time is on the x-axis but tick marks do not show for every year:
ggplot(mm, aes(x = time, y = value)) +
geom_point(aes(color = variable)) +
geom_line(stat = "smooth", method = "lm", alpha = 0.5) +
facet_grid(variable ~ ., scales = "free_y") +
theme(legend.position="none") +
coord_fixed(ratio = 10)
In order to fix this, I have converted the time variable to a factor, which works but then the linear regression disappears:
ggplot(mm, aes(x = factor(time), y = value)) +
geom_point(aes(color = variable)) +
geom_line(stat = "smooth", method = "lm", alpha = 0.5) +
facet_grid(variable ~ ., scales = "free_y") +
theme(legend.position = "none") +
coord_fixed(ratio = 10)
Is there a workaround for this with geom_line?
I think that scale_x_date is what you are looking for.
First, some reproducible data:
df <-
data.frame(
y = 99:117
, x = seq(as.Date("1999-01-01")
, as.Date("2017-01-01")
, "year")
)
Then, this is how you can set "pretty" break points while still getting a tick at each year. If you want every year labelled, use date_breaks = "1 year" instead of the breaks and date_minor_breaks arguments used here:
ggplot(df, aes(x = x, y = y) ) +
geom_smooth(method = "lm") +
geom_point() +
scale_x_date(breaks = pretty(df$x)
, date_minor_breaks = "1 year"
, date_labels = "%Y")
gives
Or, if your years are just numeric (and not dates), you can use scale_x_continuous for a similar effect:
df <-
data.frame(
y = 99:117
, x = 1999:2017
)
ggplot(df, aes(x = x, y = y) ) +
geom_smooth(method = "lm") +
geom_point() +
scale_x_continuous(breaks = pretty(df$x)
, minor_breaks = unique(df$x)) +
theme_gray()
Gives a plot that is indistinguishable from above.
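As for why the regression line disappears with factor(time): with a discrete x, ggplot2 by default puts each x value in its own group, so there is nothing to fit a line through. If you do want to keep the factor axis, a commonly used workaround is to override the grouping for the smooth layer with aes(group = 1). This is a minimal sketch on toy data (the small wiggle added to y is only there to avoid a perfect fit), not a replacement for the scale-based solutions above.

```r
library(ggplot2)

df <- data.frame(
  y = 99:117 + rep(c(0, 1.5), length.out = 19),
  x = 1999:2017
)

p <- ggplot(df, aes(x = factor(x), y = y)) +
  geom_point() +
  # group = 1 puts all points in one group, so a single line can be fitted
  geom_line(aes(group = 1), stat = "smooth", method = "lm",
            formula = y ~ x, alpha = 0.5)

# Data computed for the second layer; it is empty when the fit could not be made
smooth_data <- layer_data(p, 2)
nrow(smooth_data)
p
```

Note that the fit then uses the integer positions of the factor levels as x, so it only matches the numeric regression when the years are evenly spaced and none are missing.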
