The mpg dataset in R

I am trying to figure out a way to color the points in a geom_point plot by transmission type, but in the mpg dataset the trans column uses many different values for automatic and manual transmissions (for example auto(l4) and manual(m5)). How can I recode the values in the trans column to be either Auto for automatic or Manual for manual transmissions? I also attached a picture of the desired graph for reference.
Here is my main plot code:
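library(ggplot2)  # mpg, ggplot()
library(ggalt)    # geom_encircle()
library(ggExtra)  # ggMarginal()
# note: annotate_textp() is not part of ggplot2 and has to be defined separately
data <- mpg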
n <- nrow(mpg)
mpg_auto <- subset(mpg, substring(trans[1:n],1,1)=="a")
mpg_manual <- subset(mpg, substring(trans[1:n],1,1)=="m")
mpg$trans <- factor(mpg$trans, levels=c(mpg_auto,mpg_manual),
labels = c("0","1"))
mpg_select <- subset(mpg, mpg$hwy > 30 & mpg$displ < 3)
mpg_select <- as.data.frame(mpg_select)
gg<- ggplot(mpg) + aes(x = displ, y = hwy) +
geom_point(aes(col = trans))+
geom_encircle(aes(x = displ, y = hwy),
data = mpg_select,
color= "darkgreen",
expand = .05,
size = 2) +
annotate_textp(x = .2, y = .9, size = 15,
label = "Efficient Vehicle", color = "darkgreen")+
labs(y = "Hwy MPG",
x = "Displacement")
ggMarginal(gg, type= "density", alpha = 0.5,
groupColour = TRUE, groupFill = TRUE)
Picture of the plot with the above code: https://ibb.co/fGMSXdn

Here's a good way to relabel the transmission (I create a new column named transmission, but you could just as easily overwrite the existing column).
library(ggplot2)  # provides the mpg dataset
mpg$transmission <- ifelse(substring(mpg$trans, 1, 1) == "a", "automatic", "manual")
With that done, coloring is easy:
gg <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = transmission)) +
labs(y = "Hwy MPG",
x = "Displacement")
gg
I've left out all your non-standard ggplot stuff because I'm not sure what package(s) it's from. It doesn't seem related to your issue anyway, so you should be able to just add it back in.
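The question asked for the labels Auto and Manual rather than automatic/manual. A minimal sketch of that variant, assuming only ggplot2 (the names mpg2 and trans_type are made up for illustration): it turns the recoded values into a factor so the legend order is fixed, and maps colour in the top-level aes() so every layer added later inherits the grouping.

```r
library(ggplot2)

# Work on a copy; overwriting mpg$trans directly would work just as well.
mpg2 <- mpg

# Every trans value starts with "auto" or "manual" (e.g. "auto(l4)", "manual(m5)"),
# so the prefix alone decides the group. The factor fixes the legend order.
mpg2$trans_type <- factor(ifelse(startsWith(mpg2$trans, "auto"), "Auto", "Manual"),
                          levels = c("Auto", "Manual"))

gg <- ggplot(mpg2, aes(x = displ, y = hwy, colour = trans_type)) +
  geom_point() +
  labs(y = "Hwy MPG", x = "Displacement", colour = "Transmission")
gg
```

The extra layers from the question (geom_encircle(), the annotation, ggMarginal()) can be added to gg as before.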

Related

How can I add annotation in ggplotly animation?

I am creating an animated plotly graph in R for my assignment, comparing several models fitted to varying numbers of observations. I would like to add an annotation showing the RMSE of the current model, i.e. text that changes together with the slider. Is there an easy way to do that?
Here is my dataset, stored on GitHub; it already contains a variable with the RMSE: data
The base ggplot graphic is as follows:
library(tidyverse)
library(plotly)
p <- ggplot(values_predictions, aes(x = x)) +
geom_line(aes(y = preds_BLR, frame = n, colour = "BLR")) +
geom_line(aes(y = preds_RLS, frame = n, colour = "RLS")) +
geom_point(aes(x = x, y = target, frame = n, colour = "target"), alpha = 0.3) +
geom_line(aes(x = x, y = sin(2 * pi * x), colour = "sin(2*pi*x)"), alpha = 0.3) +
ggtitle("Comparison of performance") +
labs(y = "predictions and targets", colour = "colours")
I then convert it to plotly and add animation options:
plot <- ggplotly(p) %>%
animation_opts(easing = "linear",redraw = FALSE)
plot
Thanks!
You can add annotations to a ggplot graph using the annotate function: http://ggplot2.tidyverse.org/reference/annotate.html
df <- data.frame(x = rnorm(100, mean = 10), y = rnorm(100, mean = 10))
# Build model
fit <- lm(x ~ y, data = df)
# function finds RMSE
RMSE <- function(error) { sqrt(mean(error^2)) }
library(ggplot2)
ggplot(df, aes(x, y)) +
geom_point() +
annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 2,
label = paste("RMSE", RMSE(fit$residuals)) )
There seems to be a problem converting annotations from ggplot to plotly. One workaround is to add the annotation on the plotly side with layout():
library(plotly)
p <- ggplot(df, aes(x, y)) + geom_point()
ggplotly(p) %>%
layout(annotations = list(x = 12, y = 13, text = paste("RMSE",
RMSE(fit$residuals)), showarrow = FALSE))
Here's an example of adding data-dependent text to ggplotly, using the built-in iris dataset with the correlation as the text.
library(plotly)
library(ggplot2)
library(dplyr)
mydata = iris %>% rename(variable1=Sepal.Length, variable2= Sepal.Width)
shift_right = 0.1 # number from 0-1 where higher = more right
shift_down = 0.02 # number from 0-1 where higher = more down
p = ggplot(mydata, aes(variable1,variable2))+
annotate(geom = "text",
label = paste0("Cor = ",as.character(round(cor.test(mydata$variable1,mydata$variable2)$estimate,2))),
x = min(mydata$variable1)+abs(shift_right*(min(mydata$variable1)-max(mydata$variable1))),
y = max(mydata$variable2)-abs(shift_down*(min(mydata$variable2)-max(mydata$variable2))), size=4)+
geom_point()
ggplotly(p) %>% style(hoverinfo = "none", traces = 1) # remove hover on text
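None of the snippets above makes the text change with the slider. One option, sketched here on made-up data because the GitHub dataset isn't reproduced (toy, rmse_df and the label position are hypothetical), is to precompute one RMSE per frame and give a geom_text() layer the same frame aesthetic as the lines, so that ggplotly() animates the text along with the lines. ggplot2 will warn that it ignores the unknown aesthetic frame; that is expected, since plotly reads frame itself. I have not checked how text traces behave with animation_opts(redraw = FALSE), so the sketch leaves the animation options at their defaults.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for values_predictions: noisy predictions for three sample sizes n
set.seed(1)
toy <- expand.grid(x = seq(0, 1, length.out = 25), n = c(10, 50, 100))
toy$pred <- sin(2 * pi * toy$x) + rnorm(nrow(toy), sd = 1 / sqrt(toy$n))

# One RMSE per frame (here measured against the true curve sin(2*pi*x))
toy$sq_err <- (toy$pred - sin(2 * pi * toy$x))^2
rmse_df <- aggregate(sq_err ~ n, data = toy, FUN = function(e) sqrt(mean(e)))
names(rmse_df)[names(rmse_df) == "sq_err"] <- "rmse"
rmse_df$label <- sprintf("RMSE = %.3f", rmse_df$rmse)

# The text layer shares the frame aesthetic with the lines, one label per frame
p <- ggplot(toy, aes(x = x)) +
  geom_line(aes(y = pred, frame = n)) +
  geom_text(data = rmse_df, aes(x = 0.8, y = 1.5, label = label, frame = n))

fig <- ggplotly(p)
fig
```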

Plotting points and lines separately in R with ggplot

I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
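library(ggplot2)
library(dplyr)    # filter()
library(reshape2) # melt()
library(grid)     # pushViewport(), viewport(), grid.layout()
xseq <- 1:100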
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
We set linetype = "solid" for ylines and "blank" (no line) for the two point series. Note that the legend lists the series in factor-level order, which melt() takes from the column order of my_data (ypoints, ylines, ypoints2), so the override vectors must follow that order.
Similarly, for ylines we set no shape (NA), and for the other two we set whatever shape we need (I just used 7 and 8 as an example). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you choose the right shapes for your needs.
If you are happy with plain dots, you can remove shape = variable from aes(), drop scale_shape_manual(...), and use shape = c(19, NA, 19) in override.aes instead.
# shapes in legend order: ypoints, ylines, ypoints2 (NA = no point for the line)
my_shapes <- c(ypoints = 7, ylines = NA, ypoints2 = 8)
ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable)) +
geom_path(data = filter(melted_data, variable == "ylines")) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c(ypoints = "red", ylines = "green", ypoints2 = "blue"),
guide = guide_legend(override.aes = list(
linetype = c("blank", "solid", "blank"),
shape = unname(my_shapes)))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines

Dynamically create expression for ggplot legend?

I want to plot a line graph with multiple lines, coloured by a grouping variable, and set the legend labels via the scale command:
scale_color_manual(values = colors_values, labels = ...)
The legend labels are as follows: "x^2", "x^3", "x^4" etc., where the range is created dynamically. I would now like to create the expressions dynamically as label text, i.e.
"x^2" should become x²
"x^3" should become x³
etc.
The number of legend labels varies, so I thought about something like as.expression(sprintf("x^%i", number)), which of course does not work as the labels argument of the scale function.
I have searched Google and Stack Overflow, but I haven't found a working solution yet, so I hope someone can help me here.
Here's a reproducible example:
library(ggplot2)
poly.term <- runif(100, 1, 60)
resp <- rnorm(100, 40, 5)
poly.degree <- 2:4
geom.colors <- scales::brewer_pal(palette = "Set1")(length(poly.degree))
plot.df <- data.frame()
for (i in poly.degree) {
mydat <- na.omit(data.frame(x = poly.term, y = resp))
fit <- lm(mydat$y ~ poly(mydat$x, i, raw = TRUE))
plot.df <- rbind(plot.df, cbind(mydat, predict(fit), sprintf("x^%i", i)))
}
colnames(plot.df) <- c("x","y", "pred", "grp")
ggplot(plot.df, aes(x, y, colour = grp)) +
stat_smooth(method = "loess", se = F) +
geom_line(aes(y = pred)) +
scale_color_manual(values = geom.colors
# here I want to change legend labels
# labels = expression???
)
I would like the legend labels to be x², x³ and x⁴.
ggplot(plot.df, aes(x, y, colour = grp)) +
stat_smooth(method = "loess", se = F) +
geom_line(aes(y = pred)) +
scale_color_manual(values = setNames(geom.colors,
paste0("x^",poly.degree)),
labels = setNames(lapply(poly.degree, function(i) bquote(x^.(i))),
paste0("x^",poly.degree)))
It's important to ensure correct mapping if you change values or labels in the scale. Thus, you should always use named vectors.
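An alternative that stays closer to the sprintf() idea in the question is to parse() the strings into an expression vector, which scale labels accept. A minimal sketch on toy curves standing in for plot.df (grp_names, lab and the toy data are made up here); the breaks are set explicitly so each label is paired with the right group, in line with the advice above about keeping the mapping explicit:

```r
library(ggplot2)

poly.degree <- 2:4
grp_names <- sprintf("x^%i", poly.degree)   # "x^2" "x^3" "x^4"

# parse() turns each string into a plotmath expression (x^2 renders as x with superscript 2)
lab <- parse(text = grp_names)

# Hypothetical toy data: one curve per polynomial degree
x <- seq(0, 2, length.out = 50)
plot.df <- data.frame(x = rep(x, length(poly.degree)),
                      pred = unlist(lapply(poly.degree, function(i) x^i)),
                      grp = rep(grp_names, each = length(x)))

geom.colors <- scales::brewer_pal(palette = "Set1")(length(poly.degree))

p <- ggplot(plot.df, aes(x, pred, colour = grp)) +
  geom_line() +
  scale_color_manual(values = setNames(geom.colors, grp_names),
                     breaks = grp_names,  # labels are matched to breaks by position
                     labels = lab)
p
```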

Bar plot of group means with lines of individual results overlaid

This is my first Stack Overflow post and I am a relatively new R user, so please go gently!
I have a data frame with three columns: a participant identifier, a condition (a factor with two levels, Placebo or Experimental), and an outcome score.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
I would like to construct a bar plot with two bars showing the mean outcome score for each condition, with the standard deviation as an error bar. I would then like to overlay lines connecting each participant's scores across the two conditions, so that the plot displays the individual responses as well as the group means. If possible, I would also like to include an axis break.
I couldn't find any advice in other threads; apologies if I am repeating a question.
Many thanks.
P.S. I realise that presenting data in this way will not be to everyone's taste. It is for a specific requirement!
This ought to work:
library(ggplot2)
library(dplyr)
dat.summ <- dat %>% group_by(Condition) %>%
summarize(mean.outcome = mean(Outcome),
sd.outcome = sd(Outcome))
ggplot(dat.summ, aes(x = Condition, y = mean.outcome)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean.outcome - sd.outcome,
ymax = mean.outcome + sd.outcome),
color = "dodgerblue", width = 0.3) +
geom_point(data = dat, aes(x = Condition, y = Outcome),
color = "firebrick", size = 1.2) +
geom_line(data = dat, aes(x = Condition, y = Outcome, group = ID),
color = "firebrick", size = 1.2, alpha = 0.5) +
scale_y_continuous(limits = c(0, max(dat$Outcome)))
Some people are better with ggplot's stat functions and arguments than I am and might do it differently. I prefer to just transform my data first.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
dat.w <- reshape(dat, direction = 'wide', idvar = 'ID', timevar = 'Condition')
means <- colMeans(dat.w[, 2:3])
sds <- apply(dat.w[, 2:3], 2, sd)
ci.l <- means - sds
ci.u <- means + sds
ci.width <- .25
bp <- barplot(means, ylim = c(0,20))
segments(bp, ci.l, bp, ci.u)
segments(bp - ci.width, ci.u, bp + ci.width, ci.u)
segments(bp - ci.width, ci.l, bp + ci.width, ci.l)
segments(x0 = bp[1], x1 = bp[2], y0 = dat.w[, 2], y1 = dat.w[, 3], col = 1:10)
points(c(rep(bp[1], 10), rep(bp[2], 10)), dat$Outcome, col = 1:10, pch = 19)
Here is a method using the transformations inside ggplot2 (note that mean_se draws ±1 standard error rather than the standard deviation; since ggplot2 3.3.0, fun = replaces the older fun.y = argument):
ggplot(dat) +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun=mean, geom="bar") +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun.data="mean_se", geom="errorbar", col="green", width=.8, size=2) +
geom_line(aes(x=Condition, y=Outcome, group=ID), col="red")
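If the error bars should show ±1 standard deviation, as the question asks, while staying inside ggplot2, one option is a small custom summary function passed to fun.data (mean_sd is a name made up here; ggplot2's own mean_sdl() does the same but needs the Hmisc package). A sketch assuming ggplot2 3.3.0 or later for the fun = argument; the axis break is not addressed:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo", 10), rep("Experimental", 10)),
                  Outcome = rnorm(20, 15, 2),
                  ID = factor(rep(1:10, 2)))

# Mean +/- 1 SD, returned as the y/ymin/ymax columns that stat_summary() expects
mean_sd <- function(y) {
  m <- mean(y)
  s <- sd(y)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(dat, aes(x = Condition, y = Outcome)) +
  stat_summary(fun = mean, geom = "bar", fill = "grey70") +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.3) +
  geom_line(aes(group = ID), colour = "firebrick", alpha = 0.5) +  # one line per participant
  geom_point(colour = "firebrick")
p
```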

Color one point and add an annotation in ggplot2?

I have a data frame a with three columns:
GeneName, Index1, Index2
I draw a scatterplot like this:
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5)
Then I want to color the point whose GeneName is "G1" and add a text box near that point. What might be the easiest way to do it?
You could create a subset containing just that point and then add it to the plot:
library(ggplot2)
# create the subset
g1 <- subset(a, GeneName == "G1")
# plot the data
ggplot(a, aes(log10(Index1+1), Index2)) + geom_point(alpha=1/5) + # this is the base plot
geom_point(data=g1, colour="red") + # this adds a red point
geom_text(data=g1, label="G1", vjust=1) # this adds a label for the red point
NOTE: Since everyone keeps up-voting this question, I thought I would make it easier to read.
Something like this should work. You may need to mess around with the x and y arguments to geom_text().
library(ggplot2)
highlight.gene <- "G1"
set.seed(23456)
a <- data.frame(GeneName = paste("G", 1:10, sep = ""),
Index1 = runif(10, 100, 200),
Index2 = runif(10, 100, 150))
a$highlight <- ifelse(a$GeneName == highlight.gene, "highlight", "normal")
textdf <- a[a$GeneName == highlight.gene, ]
mycolours <- c("highlight" = "red", "normal" = "grey50")
a
textdf
ggplot(data = a, aes(x = Index1, y = Index2)) +
geom_point(size = 3, aes(colour = highlight)) +
scale_color_manual("Status", values = mycolours) +
geom_text(data = textdf, aes(x = Index1 * 1.05, y = Index2, label = "my label")) +
theme(legend.position = "none")
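Since the question asks for a text box, geom_label() may be closer to what is wanted than geom_text(): it draws the label inside a rounded rectangle. A sketch on made-up data shaped like the question's a; the nudge of 2 y-units is an arbitrary choice you would adjust to your data:

```r
library(ggplot2)

# Hypothetical data with the three columns described in the question
set.seed(23456)
a <- data.frame(GeneName = paste0("G", 1:10),
                Index1 = runif(10, 100, 200),
                Index2 = runif(10, 100, 150))
g1 <- subset(a, GeneName == "G1")

p <- ggplot(a, aes(log10(Index1 + 1), Index2)) +
  geom_point(alpha = 1/5) +
  geom_point(data = g1, colour = "red", size = 3) +
  # geom_label() draws the text in a box; nudge_y moves it off the point
  geom_label(data = g1, aes(label = GeneName), nudge_y = 2)
p
```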
