I am using ggplot2 to plot points from a .csv file that contains just one column used as x values and one column used as y values. I am a little confused as to how ggplot decides what to make a legend for, and I haven't found any good examples online.
I would like the legend to show that geom_point is stress vs strain, and my geom_smooth is the best fit line.
Here is my code:
library(ggplot2)
imported = read.csv("data.csv")
Strain = imported$Strain
Stress = imported$Stress..N.m.2.
err = .0005
gg <-
ggplot(imported, aes(x=Strain, y=Stress)) +
geom_point(aes(group = "Points"), shape = 79, colour = "black", size = 2, stroke = 4) +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
geom_errorbarh(xmin = Strain - err, xmax = Strain + err, show.legend = TRUE) +
theme_gray() + ggtitle("Stress vs Strain") +
theme(legend.position = "top")
gg
And it is producing the following plot:
[image: my plot]
Edit: added an approach at the top that creates a legend for each geom by mapping a dummy constant to a separate aesthetic in each layer.
library(ggplot2)
ggplot(mtcars, aes(mpg, wt)) +
geom_point(aes(color = "point")) + # dummy mapping to color
geom_smooth(method = "lm", se = FALSE, color = "orange",
aes(linetype = "best fit")) + # dummy mapping to linetype
geom_errorbarh(aes(xmin = mpg - 2, xmax = mpg + 1)) +
scale_color_manual(name = "Stress vs. Strain", values = "black") +
scale_linetype_manual(name = "Best fit line", values = "solid")
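A variation on the same idea, offered only as a sketch: if a single legend listing both layers is preferred, both geoms can map a constant string to colour, and scale_colour_manual can then assign the actual colours by name. The label strings "Stress vs. strain" and "Best fit line" are my own choice here, and mtcars again stands in for the real data.

```r
library(ggplot2)

# Both layers map a constant string to colour, so one colour legend lists both.
# The label strings are illustrative; mtcars stands in for the stress/strain data.
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(colour = "Stress vs. strain")) +
  geom_smooth(aes(colour = "Best fit line"),
              method = "lm", formula = y ~ x, se = FALSE) +
  scale_colour_manual(
    name = NULL,
    values = c("Stress vs. strain" = "black", "Best fit line" = "orange")
  )
p
```

With this approach each legend key shows both a point and a line glyph by default; guide_legend(override.aes = ...) can probably be used to tidy that up if it matters.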
original answer:
Note the difference in legend here:
library(ggplot2)
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() +
geom_errorbarh(aes(xmin = mpg - 2, xmax = mpg + 1),
show.legend = TRUE) # error bars reflected in legend
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() +
geom_errorbarh(aes(xmin = mpg - 2, xmax = mpg + 1),
show.legend = FALSE) # error bars not shown in legend
Related
I am trying to add labels to a line graph but am unable to do so.
I want to add labels such that the blue line is labeled 'model_1', the red line 'model_2', and the dark green line 'model_3'.
The code is attached below:
p1 <- ggplot(data = Auto, aes(x = horsepower, y = mpg)) +
geom_point() +
geom_line(aes(y = fitted(lm_mpg_1)), color = "blue", size = 1) +
geom_line(aes(y = fitted(lm_mpg_2)), color = "red", size = 1) +
geom_line(aes(y = fitted(lm_mpg_3)), color = "darkgreen", size = 1)
I have tried to use geom_text, geom_label and the annotate function, but they all give me an error.
The code I tried was:
p1 + geom_text(label = c('model_1','model_2','model_3'))
You haven't shared any data. You can use dput to share it. In the meantime, I have used mtcars as an example below:
# library
library(ggplot2)
# Keep 30 first rows in the mtcars natively available dataset
data=head(mtcars, 30)
# 1/ add text with geom_text, use nudge to nudge the text
ggplot(data, aes(x=wt, y=mpg)) +
geom_point() + # Show dots
geom_text(
label=rownames(data),
nudge_x = 0.25, nudge_y = 0.25,
check_overlap = TRUE
)
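# 2/ add boxed labels with geom_label
# (check_overlap is a geom_text argument, so it is dropped here)
ggplot(data, aes(x=wt, y=mpg)) +
geom_point() + # Show dots
geom_label(
label=rownames(data),
nudge_x = 0.25, nudge_y = 0.25
)
# 3/ add a single piece of text at fixed coordinates with annotate
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + annotate("text", x = 4, y = 25, label = "Some text")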
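To come back to the original goal of naming each fitted line, here is a rough sketch of one way to do it: put the fitted values of all models into one long data frame, take one row per model at the largest x value, and pass only those rows to geom_text. Since the Auto data and the lm_mpg_* objects are not available, the two models below (fit_1, fit_2) and the helper data frames (fits, ends) are stand-ins I made up on mtcars.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's lm_mpg_* models, fitted on mtcars
fit_1 <- lm(mpg ~ wt, data = mtcars)
fit_2 <- lm(mpg ~ poly(wt, 2), data = mtcars)

# Long format: one row per observation per model
fits <- rbind(
  data.frame(wt = mtcars$wt, mpg = unname(fitted(fit_1)), model = "model_1"),
  data.frame(wt = mtcars$wt, mpg = unname(fitted(fit_2)), model = "model_2")
)

# One row per model, taken at the largest wt, to carry the label
ends <- do.call(rbind, lapply(split(fits, fits$model),
                              function(d) d[which.max(d$wt), ]))

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_line(data = fits, aes(colour = model)) +
  geom_text(data = ends, aes(label = model, colour = model),
            hjust = 0, nudge_x = 0.05, show.legend = FALSE) +
  expand_limits(x = max(mtcars$wt) + 0.6) # leave room for the labels
p
```

The original attempt, geom_text(label = c('model_1','model_2','model_3')), most likely fails because the label vector has three elements while the layer inherits the full data set; giving geom_text its own small data frame avoids that mismatch.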
I made a visualization of a regression. Currently this is what the graph looks like.
The regression lines are hard to see since they are the same color as the scatter plot dots.
My question is, how do I make the regression lines a different color from the scatter plot dots?
Here is my code:
(ggplot(data=df, mapping=aes(x='score', y='relent',
color='factor(threshold)'))+
geom_point()+
scale_color_manual(values=['darkorange', 'purple'])+
geom_smooth(method='lm',
formula = 'y ~ x+I(x**2)',se=False, )+
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed")+
labs(y = "Yield",
x = "Score")+
theme_bw()
)
One option to achieve your desired result would be to "duplicate" your threshold column with different values, e.g. in the code below I map 0 to 2 and 1 to 3. This duplicated column can then be mapped to the color aes inside geom_smooth, which lets you set different colors for the regression lines.
My code below uses R and ggplot2, but to the best of my knowledge it could easily be adapted to plotnine:
n <- 1000
df <- data.frame(
relent = c(runif(n, 100, 200), runif(n, 150, 250)),
score = c(runif(n, 764, 766), runif(n, 766, 768)),
threshold = c(rep(0, n), rep(1, n))
)
df$threshold_sm <- c(rep(2, n), rep(3, n))
library(ggplot2)
p <- ggplot(data = df, mapping = aes(x = score, y = relent, color = factor(threshold))) +
scale_color_manual(values = c("darkorange", "purple", "blue", "green")) +
geom_vline(xintercept = 766, color = "red", size = 1, linetype = "dashed") +
labs(
y = "Yield",
x = "Score"
) +
theme_bw()
p +
geom_point() +
geom_smooth(aes(color = factor(threshold_sm)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
)
A second option would be to add some transparency to the points so that the lines stand out more clearly. As a side benefit, this also deals with the overplotting of the points:
p +
geom_point(alpha = .3) +
geom_smooth(aes(color = factor(threshold)),
method = "lm",
formula = y ~ x + I(x**2), se = FALSE
) +
guides(color = guide_legend(override.aes = list(alpha = 1)))
Compare:
library(ggplot2)
library(magrittr) # provides %>%
iris %>%
ggplot(aes(Petal.Length, Sepal.Width, color = Species)) +
geom_point() +
geom_smooth(method = "lm", aes(group = Species))
With:
iris %>%
ggplot(aes(Petal.Length, Sepal.Width)) +
geom_point(aes(color = Species)) +
geom_smooth(method = "lm", aes(group = Species))
When aes(color = ...) is specified inside of ggplot(), it is applied to both of the subsequent geoms. Moving it to geom_point() applies it to the points only.
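Building on that, here is a small sketch (not from the original posts) that keeps one regression line per group but draws all lines in a fixed colour, so they stand out from the coloured points. The choice of "black" is arbitrary.

```r
library(ggplot2)

# Colour is mapped only in geom_point(); geom_smooth() keeps the grouping
# but takes a constant colour set outside aes().
p <- ggplot(iris, aes(Petal.Length, Sepal.Width)) +
  geom_point(aes(colour = Species)) +
  geom_smooth(aes(group = Species), method = "lm", formula = y ~ x,
              colour = "black", se = FALSE)
p
```

Because colour is a constant in geom_smooth(), the lines do not appear in the colour legend; only the points do.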
I need to combine the boxplot with the histogram using ggplot2. So far I have this code.
library(dplyr)
library(ggplot2)
data(mtcars)
dat <- mtcars %>% dplyr::select(carb, wt) %>%
dplyr::group_by(carb) %>% dplyr::mutate(mean_wt = mean(wt), carb_count = n())
plot<-ggplot(data=mtcars, aes(x=carb, y=..count..)) +
geom_histogram(alpha=0.3, position="identity", lwd=0.2,binwidth=1)+
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_text(data=aggregate(mean_wt~carb+carb_count,dat,mean), aes(carb, carb_count+0.5, label=round(mean_wt,1)), color="black")
plot + geom_boxplot(data = mtcars,mapping = aes(x = carb, y = 6*wt,group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA)+
scale_y_continuous(name = "Count",
sec.axis = sec_axis(~./6, name = "Weight"))
This results in
However, I don't want the secondary y axis to be the same length as the primary y axis. I want the secondary y axis to be shorter and to sit in the top right corner only. Let's say the secondary y axis should span 20-30 on the primary y axis, and the box plot should scale with that axis.
Can anyone help me with this?
Here's one approach, where I adjusted the secondary axis formula and tweaked the way it's labeled. (EDIT: adjusted to make boxplots bigger, per OP comment.)
plot + geom_boxplot(data = mtcars,
# Adj'd scaling so each 1 wt = 2.5 count
aes(x = carb, y = (wt*2.5)+10,group=carb),
color="black", fill="red", alpha=0.2,
width=0.5, outlier.shape = NA)+ # Wider width
scale_y_continuous(name = "Count", # Adj'd labels to limit left to 0, 5, 10
breaks = 5*0:5, labels = c(5*0:2, rep("", 3)),
# Adj'd scaling to match the wt scaling
sec.axis = sec_axis(~(.-10)/2.5, name = "Weight",
breaks = c(0:5))) +
theme(axis.title.y.left = element_text(hjust = 0.15, vjust = 1),
axis.title.y.right = element_text(hjust = 0.15, vjust = 1))
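The key point in the code above is that the formula given to sec_axis has to be the exact inverse of the transformation applied to wt inside aes. A quick way to convince yourself, written as a standalone check (the helper names to_count and to_weight are mine, not part of ggplot2):

```r
# Forward transform used in aes(): weight -> position on the count axis
to_count <- function(wt) wt * 2.5 + 10

# Inverse transform used in sec_axis(): count-axis position -> weight
to_weight <- function(y) (y - 10) / 2.5

# Weights 0 and 5 land at 10 and 22.5 on the primary (count) axis
to_count(c(0, 5))

# Round trip recovers the original weights
all.equal(to_weight(to_count(mtcars$wt)), mtcars$wt)
```

To move the boxplots to a different band of the primary axis, change the offset and the multiplier in both functions together, so that the two stay inverses of each other.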
You might also consider an alternative using the patchwork package, coincidentally written by the same developer who implemented secondary scales in ggplot2...
# Alternative solution using patchwork
library(patchwork)
plot2 <- ggplot(data=mtcars, aes(x=carb, y=..count..)) +
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_boxplot(data = mtcars,
aes(x = carb, y = wt, group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA) +
scale_y_continuous(name = "Weight") +
scale_x_continuous(labels = NULL, name = NULL,
expand = c(0, 0.85), breaks = c(2,4,6,8))
plot2 + plot + plot_layout(nrow = 2, heights = c(1,3)) +
labs(x=NULL)
I want to plot the distribution of a variable by Class and add vertical lines marking the mean of each Class subset, colored by Class. While I succeed in coloring the distributions by Class, the vertical lines appear gray. For a reproducible example see below:
library(data.table)
library(ggplot2)
library(ggthemes)
data(mtcars)
setDT(mtcars)
mtcars[, am := factor(am, levels = c(1, 0))]
mean_data <- mtcars[, .(mu = mean(hp)), by = am]
ggplot(mtcars, aes(x = hp, fill = am , color = am)) +
geom_histogram(aes(y=..density..), position="identity",alpha = 0.4) + guides(color = FALSE) +
geom_density (alpha = 0.5)+
geom_vline(data = mean_data, xintercept = mean_data$mu, aes(color = as.factor(mean_data$am)), size = 2, alpha = 0.5) +
ggtitle("Hp by am") + scale_fill_discrete(labels=c("am" , "no am")) +
labs(fill = "Transmission") + theme_economist()
This code renders the following plot:
Your advice will be appreciated.
You need to include the xintercept mapping in your aes call, so that ggplot properly maps all the aesthetics:
ggplot(mtcars, aes(x = hp, fill = am , color = am)) +
geom_histogram(aes(y=..density..), position="identity",alpha = 0.4) + guides(color = FALSE) +
geom_density (alpha = 0.5)+
geom_vline(data = mean_data, aes(xintercept = mu, color = as.factor(am)), size = 2, alpha = 0.5) +
ggtitle("Hp by am") + scale_fill_discrete(labels=c("am" , "no am")) +
labs(fill = "Transmission") + theme_economist()
Anything you put in a geom call that's not in aes gets treated as a one-off value, and doesn't get all the mapped aesthetics applied to it.
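A stripped-down sketch of that difference, using base R instead of data.table (the names cars, means and mu are made up for this example): the first geom_vline maps xintercept and colour inside aes, so it gets one line per group in the group's colour; the second sets both as constants, so it draws a single line in exactly the colour given.

```r
library(ggplot2)

# Fresh copy of mtcars with am as a factor
cars <- transform(as.data.frame(datasets::mtcars), am = factor(am))

# Per-group means of hp, one row per level of am
means <- aggregate(hp ~ am, data = cars, FUN = mean)
names(means)[2] <- "mu"

p <- ggplot(cars, aes(x = hp, colour = am)) +
  geom_density() +
  # Mapped: both aesthetics come from columns of `means`
  geom_vline(data = means, aes(xintercept = mu, colour = am)) +
  # Constant: a single overall mean in a fixed colour
  geom_vline(xintercept = mean(cars$hp), colour = "grey40",
             linetype = "dashed")
p
```

Note that colour is repeated inside the geom_vline aes call rather than relying on the mapping in ggplot(), which mirrors what the corrected code above does.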
I'm using the code below to generate a simple chart.
# Data import -------------------------------------------------------------
data(mtcars)
mtcars$model <- rownames(mtcars)
# Graph: Income Broadband -------------------------------------------------
# Lib.
require(ggplot2); require(directlabels)
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.5)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
As illustrated below, the labels on the chart are placed far away from the points. I would like to amend this and place the point labels closer to the points on the graph. Naturally, for the sake of readability, I would like the labels not to overlap. In addition, I would like the solution to be easy to reproduce, as I will have to apply it across a number of charts. mlabvpos in Stata, as discussed here, provides some of this functionality. I'm looking for a similar solution in R.
Edit
Following the comments, it appears the problem is not associated with the hjust settings. For instance, for the code:
# Graph definition
ggplot(data = mtcars, aes(x = mpg, y = disp)) +
geom_point(shape = 1, colour = "black", size = 3, fill = "black") +
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
geom_dl(aes(label = model), list("smart.grid", cex = 0.5, hjust = -.001)) +
xlab("MPG") +
ylab("DISP") +
theme_bw()
The labels are still misplaced:
On the same lines, running the code with no hjust settings does not place the labels in a more sensible manner: