Using the following website (http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html), I made the graph below:
mtcars$`car name` <- rownames(mtcars) # create new column for car names
mtcars$mpg_z <- round((mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg), 2) # compute normalized mpg
mtcars$mpg_type <- ifelse(mtcars$mpg_z < 0, "below", "above") # above / below avg flag
mtcars <- mtcars[order(mtcars$mpg_z), ] # sort
mtcars$`car name` <- factor(mtcars$`car name`, levels = mtcars$`car name`) # convert to factor to retain sorted order in plot.
library(ggplot2)
theme_set(theme_bw())
# Plot
ggplot(mtcars, aes(x=`car name`, y=mpg_z, label=mpg_z)) +
geom_point(stat='identity', aes(col=mpg_type), size=6) +
scale_color_manual(name="Mileage",
labels = c("Above Average", "Below Average"),
values = c("above"="#00ba38", "below"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Diverging Dot Plot",
subtitle="Normalized mileage from 'mtcars': Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
My Question: I want to modify the above graph so that there are "2 dots" (green and red) on each horizontal line, representing the values of two different variables.
I created a data set for this example:
my_data = data.frame(var_1_col = "red", var_2_col = "green", var_1 = rnorm(8,10,10), var_2 = rnorm(8,5,1), name = c("A", "B", "C", "D", "E", "F", "G", "H"))
var_1_col var_2_col var_1 var_2 name
1 red green 14.726642 4.676161 A
2 red green 11.011187 4.937376 B
3 red green 12.418489 5.869617 C
4 red green 21.935154 5.641106 D
5 red green 20.209498 6.193123 E
6 red green -5.339944 5.187093 F
7 red green 20.540806 3.895683 G
8 red green 21.619631 4.097438 H
Then I tried to create the graph, but it comes out empty:
# Plot
ggplot(my_data, aes(x=name, y=var_1, label=name)) +
geom_point(stat='identity', aes(col=var_1_col), size=6) +
scale_color_manual(name="Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("Var 1"="#00ba38", "Var 2"="#f8766d")) +
geom_text(color="white", size=2) +
labs(title="Plot",
subtitle="Plot: Dotplot") +
ylim(-2.5, 2.5) +
coord_flip()
Ideally, I would like the graph to look something like this:
Can someone please show me how to do this?
Thanks!
Note: var_1 could be some variable like "average fuel price" and var_2 could be "median fuel price"
I recommend putting the data into long format, since that is the format ggplot2 works best with. I would also drop the two color columns, because the colors can be set in scale_color_manual instead. Then, in the aes for geom_point, we can map color to the variable name so that the two variables are colored differently (i.e., each is treated as its own group). We can still set all of the labels, names, and colors in scale_color_manual.
library(tidyverse)
my_data %>%
select(-c(var_1_col, var_2_col)) %>%
pivot_longer(-name, names_to = "variable", values_to = "value") %>%
ggplot(., aes(x = name, y = value, label = name)) +
geom_point(stat = 'identity', aes(color = variable), size = 6) +
scale_color_manual(
name = "Var 1 or Var 2",
labels = c("Var 1", "Var 2"),
values = c("#00ba38", "#f8766d")
) +
labs(title = "Plot",
subtitle = "Plot: Dotplot") +
coord_flip() +
theme_bw()
I want to modify [...], representing the values of two different variables.
If you're looking to plot two different variables on the same graph (and they share a common axis, like the names in this case), you can add two separate geom_point layers.
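library(ggplot2)
ggplot(my_data) +
geom_point(aes(x=name, y=var_1, col=var_1_col)) +
geom_point(aes(x=name, y=var_2, col=var_2_col)) +
scale_colour_identity(guide = "legend") + # use the colour names stored in the data ("red", "green") as-is
coord_flip()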
You don't always have to define the axes/colors/labels in the initial ggplot call. If you specify only the dataset there, you are free to choose different variables in each of the layers that follow. That's how you can draw multiple layers on one plot :)
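To get a properly labelled legend with this layer-by-layer approach, one option is to map colour to a fixed label inside each layer's aes() and then choose the actual colours in scale_colour_manual(). This is a minimal sketch that assumes only the var_1, var_2 and name columns from the question; the labels "Var 1" and "Var 2", the legend title and the seed are illustrative choices, not anything from the original answer.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed; values are random, as in the question
my_data <- data.frame(
  var_1 = rnorm(8, 10, 10),
  var_2 = rnorm(8, 5, 1),
  name  = c("A", "B", "C", "D", "E", "F", "G", "H")
)

p <- ggplot(my_data) +
  # each layer maps colour to a constant label, which becomes a legend entry
  geom_point(aes(x = name, y = var_1, colour = "Var 1"), size = 4) +
  geom_point(aes(x = name, y = var_2, colour = "Var 2"), size = 4) +
  # named values tie each label to a colour (Var 1 red, Var 2 green)
  scale_colour_manual(name = "Variable",
                      values = c("Var 1" = "#f8766d", "Var 2" = "#00ba38")) +
  labs(x = NULL, y = "Value") +
  coord_flip()
print(p)
```

Because the colours are looked up by name, the legend text and the colours can be changed independently.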
"Gender" is stored as an integer: 1 and 0
I am trying to graph it in a bar chart with "Male" and "Female" on the x-axis,
compared with the total number of bets per gambler.
I think there must be an easier way of doing this than changing every value in the dataset into a string.
I am then trying to give Male a blue color and Female purple or yellow.
Thank you, everyone!
Code:
# scatter plot of gambler gender against total number of bets
# alpha reduces overplotting; remove if unnecessary
p_1 <- ggplot(data = data, aes(x = Gender, y = BetsA )) +
geom_point(alpha = 0.1)
p_1 + ggtitle(label = "Gender Correlated with Total Number of Bets") + # for the main title
xlab(label = "Gender of Gambler") + # for the x axis label
ylab(label = "Total Number of Bets" ) # for the y axis label
One option would be to first convert your Gender column to a factor. Afterwards, you could use the labels argument of scale_x_discrete to assign your desired labels to 0 and 1. For the colors you can do basically the same: map factor(Gender) to the color aesthetic, then set your desired colors via the values argument of scale_color_manual:
Using some fake random example data:
set.seed(123)
# Create example data
data <- data.frame(
Gender = rep(c(0,1), 100),
BetsA = runif(200, 0, 40000)
)
library(ggplot2)
ggplot(data = data, aes(x = factor(Gender), y = BetsA, color = factor(Gender) )) +
geom_point(alpha = 0.1) +
scale_x_discrete(labels = c("0" = "Male", "1" = "Female")) +
scale_color_manual(values = c("0" = "blue", "1" = "purple"), labels = c("0" = "Male", "1" = "Female")) +
ggtitle(label = "Gender Correlated with Total Number of Bets") +
xlab(label = "Gender of Gambler") +
ylab(label = "Total Number of Bets" )
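If you would rather relabel the data once instead of repeating the labels in every scale, a sketch of the same factor idea is to convert Gender with factor(..., labels = ...) up front. This assumes, as the answer above does, that 0 means Male and 1 means Female; the axis and the legend then pick up the labels automatically.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(
  Gender = rep(c(0, 1), 100),
  BetsA  = runif(200, 0, 40000)
)

# Recode once: 0 -> "Male", 1 -> "Female" (assumed coding)
data$Gender <- factor(data$Gender, levels = c(0, 1), labels = c("Male", "Female"))

p <- ggplot(data, aes(x = Gender, y = BetsA, colour = Gender)) +
  geom_point(alpha = 0.1) +
  # colours are now matched to the new labels, not to 0/1
  scale_colour_manual(values = c("Male" = "blue", "Female" = "purple")) +
  ggtitle("Gender Correlated with Total Number of Bets") +
  xlab("Gender of Gambler") +
  ylab("Total Number of Bets")
print(p)
```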
I have a relatively large dataset that I can share here.
I am trying to plot all the lines (not just a summary line such as a mean or a median) of y against x = G, with the data grouped by I and P, so that each level of I is shown in a different colour and each level of P with a different line type.
The problem I have is that the graph I get is a zig-zag line graph along the x-axis. The aim, obviously, is to have a line for each combination of data, avoiding the zig-zag. I have read that this problem could be related to the way the data is grouped. I have tried several combinations of data grouping using group but I can't solve the problem.
The code I use is as follows:
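# Packages used below (brewer.pal() comes from RColorBrewer)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
# Selecting colours
colours<-brewer.pal(n = 11, name = "Spectral")[c(9,11,1)]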
#Creating plot
data %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
geom_line(aes(linetype=P,color=I),size=0.2)+
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
scale_x_continuous(breaks = seq(0,100, by=25), limits=c(0,100)) +
scale_y_continuous(breaks = seq(0,1, by=0.25), limits=c(0,1)) +
labs(x = "Time", y = "Value") +
theme_classic()
I also tried, unsuccessfully, adding group=interaction(I, P) inside ggplot(aes()), as suggested in other forums.
Following @JonSpring's point:
dd2 <- (filter(dd,G %in% c(16,17))
%>% group_by(P,I,G)
%>% summarise(n=length(unique(y)))
)
shows that you have many different values of y for each combination of G/I/P:
# A tibble: 12 x 4
# Groups: P, I [6]
P I G n
<chr> <chr> <dbl> <int>
1 heterogeneity I005 16 34
2 heterogeneity I005 17 37
3 heterogeneity I010 16 34
... [etc.]
One way around this, if you so choose, is to use stat_summary() to have R collapse the y values in each group to their mean:
(dd %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
stat_summary(fun=mean, geom="line",
aes(linetype=P,color=I,group=interaction(I,P)),size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
You could also do this yourself with group_by() + summarise() before calling ggplot.
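A sketch of that pre-summarising route follows. Since the real data isn't shown, it builds a small made-up dd with the same column names (G, I, P, y); the levels, the five replicates per cell and the random values are all hypothetical. summarise() collapses each G/I/P cell to its mean, so geom_line() gets exactly one point per x within each I/P line and the zig-zag disappears.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's data: several y values per G/I/P cell
set.seed(42)
dd <- expand.grid(G = 1:20, I = c("I005", "I010"),
                  P = c("heterogeneity", "homogeneity"), rep = 1:5,
                  stringsAsFactors = FALSE)
dd$y <- runif(nrow(dd))

# One mean per G/I/P combination
dd_mean <- dd %>%
  group_by(I, P, G) %>%
  summarise(y = mean(y), .groups = "drop")

p <- ggplot(dd_mean, aes(x = G, y = y, colour = I, linetype = P,
                         group = interaction(I, P))) +
  geom_line() +
  scale_linetype_manual(values = c("solid", "dashed")) +
  labs(x = "Time", y = "Value") +
  theme_classic()
print(p)
```

Compared with stat_summary(), this keeps the summarised table around, which is handy if you also want to inspect or export the means.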
There's not enough information in the data set as presented to identify individual lines. If we are willing to assume that the order of the values within a given I/G/P group is an appropriate indexing variable, then we can do this:
## add index variable
dd3 <- dd %>% group_by(P,I,G) %>% mutate(index=seq(n()))
(dd3 %>%
ggplot(aes(x = G, y = y, color = I, linetype=P)) +
geom_line(aes(group=interaction(index,I,P)), size=0.2) +
scale_linetype_manual(values=c("solid", "dashed")) +
scale_color_manual(values=colours) +
labs(x = "Time", y = "Value") +
theme_classic()
)
If this isn't what you had in mind, then you need to provide more information ...
I have a df like this:
set.seed(123)
df <- data.frame(Delay=rep(-5:6, times=8, each=1),
ID= rep(c("A","B","C","D"), times=1, each=24),
variable=rep(c("R2","SE"), times=4, each=12),
value=c(0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73,0.3,0.4,0.51,0.58,0.64,0.78,0.68,0.63,0.54,0.45,0.32,0.22,0.78,0.68,0.59,0.55,0.47,0.35,0.28,0.41,0.50,0.58,0.63,0.73))
df$ID <- as.factor(df$ID)
df$variable <- as.factor(df$variable)
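library(ggplot2)
library(ggthemes) # provides theme_hc()
library(ggsci)    # provides scale_color_jco()
Plot<- ggplot(df[df$ID=="B",], aes(x=Delay, y=value, group=variable, colour=variable)) +
geom_point(size=1) +
geom_line() +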
theme_hc() +
theme(legend.position="right") +
labs(x= '\nDelay',y=expression(R^{2})) +
guides(color=guide_legend(override.aes=list(fill=NA))) +
scale_x_continuous(breaks=seq(-5,5,1)) +
scale_color_jco()
Plot
I am plotting only the data for ID B.
I would like to add a vertical line at the minimum value of SE and another at the maximum value of R2, each in the same colour as its variable. However, I don't know how to do it: as you can see below, the vertical lines come out black, and I don't know how to specify the colours I used previously.
Plot <- Plot + geom_vline(xintercept = 0)
Plot
Does anyone know how to add both vertical lines using the same colours as the variables?
You don't need to find the color to instruct ggplot2 to reuse it: you can supply "new data" with your desired x-intercept lines, and identify each v-line as belonging to a particular variable to use that variable's color.
I don't have your original Plot object or call, so my colors/theme will be different.
library(ggplot2)
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = 0, variable = "R2"))
Or with multiple v-lines:
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable),
data = data.frame(Delay = c(-1, 1, 2), variable = c("R2", "SE", "R2")))
This edit might answer this and your other question:
mins <- do.call(rbind, by(df, df[,c("ID", "variable")], function(z) z[which.min(z$value),]))
mins
# Delay ID variable value
# 12 6 A R2 0.22
# 36 6 B R2 0.22
# 60 6 C R2 0.22
# 84 6 D R2 0.22
# 19 1 A SE 0.28
# 43 1 B SE 0.28
# 67 1 C SE 0.28
# 91 1 D SE 0.28
ggplot(df[df$ID == "B",], aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins)
Or if you want to see multiple IDs, you can facet,
ggplot(df, aes(Delay, value, color = variable)) +
geom_line() +
geom_vline(aes(xintercept = Delay, color = variable), data = mins) +
facet_wrap("ID")
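The question actually asks for the maximum of R2 and the minimum of SE, rather than both minima. A sketch of the same "new data" idea for that case, rebuilding the question's df (the same 24 values repeat for each ID, which rep() reproduces) and restricting to ID B; the helper names dB, r2, se and vlines are just illustrative.

```r
library(ggplot2)

# Rebuild the question's data: the same 24 values repeat for each ID
pattern <- c(0.3, 0.4, 0.51, 0.58, 0.64, 0.78, 0.68, 0.63, 0.54, 0.45, 0.32, 0.22,
             0.78, 0.68, 0.59, 0.55, 0.47, 0.35, 0.28, 0.41, 0.50, 0.58, 0.63, 0.73)
df <- data.frame(Delay    = rep(-5:6, times = 8),
                 ID       = factor(rep(c("A", "B", "C", "D"), each = 24)),
                 variable = factor(rep(c("R2", "SE"), times = 4, each = 12)),
                 value    = rep(pattern, times = 4))

dB <- df[df$ID == "B", ]
r2 <- dB[dB$variable == "R2", ]
se <- dB[dB$variable == "SE", ]

# One row per v-line; 'variable' links each line to that variable's colour
vlines <- data.frame(
  Delay    = c(r2$Delay[which.max(r2$value)], se$Delay[which.min(se$value)]),
  variable = c("R2", "SE")
)

p <- ggplot(dB, aes(Delay, value, colour = variable)) +
  geom_line() +
  geom_vline(aes(xintercept = Delay, colour = variable), data = vlines)
print(p)
```

Because the v-lines go through the same colour scale as the lines, any palette you add later (for example scale_color_jco()) recolours both consistently.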
I think @r2evans's approach to your specific problem is the correct one. However, to answer the more general question of how to retrieve the colours from an applied colour scale (e.g. if you want to modify them), you can get them without going through ggplot_build(), as follows:
Plot$scales$get_scales("colour")$palette(2)
[1] "#0073C2FF" "#EFC000FF"
So we could do:
# Get colours
my_blue <- Plot$scales$get_scales("colour")$palette(2)[1]
my_yellow <- Plot$scales$get_scales("colour")$palette(2)[2]
# Get index of max R2 and min SE
maxR2 <- which.max(df$value[df$ID == "B" & df$variable == "R2"])
minSE <- which.min(df$value[df$ID == "B" & df$variable == "SE"])
# Get value of Delay at maxR2 and minSE
D_R2 <- df$Delay[df$ID == "B" & df$variable == "R2"][maxR2]
D_SE <- df$Delay[df$ID == "B" & df$variable == "SE"][minSE]
# Plot lines at the correct positions and with the desired colours
Plot + geom_vline(xintercept = D_R2, colour = my_blue) +
geom_vline(xintercept = D_SE, colour = my_yellow)
I have data similar to the example data below, and I am attempting to draw a histogram of the Measurment column faceted on the Genotype column. Ultimately, I would like the colours of the bars to depend on the Genotype and Condition columns.
Crucially, Genotype B individuals were never measured under Condition L.
This is what the data looks like:
library(ggplot2)
library(dplyr)
set.seed(123)
DF <- data.frame(Genotype = rep(c("A", "B"), 500),
Condition = sample(c("E", "L"), 1000, replace = T),
Measurment = round(rnorm(500,10,3), 0))
DF <- anti_join(DF, filter(DF, Genotype == "B" & Condition != "E")) # drop Genotype B rows measured under L
head(DF)
Genotype Condition Measurment
1 A L 18
2 A L 2
3 B E 18
4 B E 18
5 B E 16
6 B E 16
Now, to specify the colours of the bars, I thought it easiest to create a new column of hex codes, such that all individuals of Genotype B are one colour, and individuals of Genotype A are a second colour if measured under Condition E and a third colour if measured under Condition L.
DF <- DF %>% mutate(colr = ifelse(Genotype == "B", "#409ccd",
ifelse(Condition == "E", "#43cd80", "#ffc0cb")))
I can then draw a histogram faceted on the Genotype column like so:
ggplot(data=DF, aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
facet_wrap(~Genotype, nrow=2) +
scale_fill_manual(values = c("#409ccd","#ffc0cb","#43cd80")) +
theme(legend.position="none")
and it looks like this:
However, as you can see, the columns for Genotype B are twice the size of those for Genotype A. How can I shrink the Genotype B columns to the same size as the Genotype A ones?
I considered adding dummy entries to my data where Genotype B has Condition L entries, but the binning function then counts these as measurements, which is misleading. I also have a version of this using geom_bar(), but that results in a similar problem. ggplot must have a way of doing this.
Any help appreciated.
Something like this, maybe?
ggplot(data=DF, aes(Measurment, fill = Condition)) +
geom_histogram(data=subset(DF, Genotype!="B"),aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
geom_histogram(data=subset(DF, Genotype=="B"),aes(x = Measurment, y=..count.., fill = colr), position=position_nudge(x=0.25), binwidth = 0.5) +
facet_wrap(~Genotype, nrow=2) +
scale_fill_identity() +
theme(legend.position="none")
Do you want something like the following? I assumed that by the size of the columns you meant the bar width.
library(grid)
library(gridExtra)
p1 <- ggplot(data=DF[DF$Genotype=='A',], aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), position='dodge', binwidth = 1) +
scale_fill_manual(values = c("#43cd80","#ffc0cb")) +
theme(legend.position="none")
p2 <- ggplot(data=DF[DF$Genotype=='B',], aes(Measurment, fill = Condition)) +
geom_histogram(aes(y=..count.., fill = colr), binwidth = 0.5, boundary = 1) +
scale_fill_manual(values = c("#409ccd")) +
theme(legend.position="none")
grid.arrange(p1, p2)
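Another option, assuming ggplot2 3.0.0 or later, is position_dodge(preserve = "single"). With that setting each dodged bar is sized as one slot out of the largest number of groups found at any x position (here two, from Genotype A), so the lone Genotype B bars should come out as narrow as the paired Genotype A bars without splitting the plot. This sketch rebuilds the question's DF and colr column and, like the first answer, uses scale_fill_identity() to take the hex codes as-is; filter() is used in place of the question's anti_join() but keeps the same rows.

```r
library(ggplot2)
library(dplyr)

set.seed(123)
DF <- data.frame(Genotype   = rep(c("A", "B"), 500),
                 Condition  = sample(c("E", "L"), 1000, replace = TRUE),
                 Measurment = round(rnorm(500, 10, 3), 0))
# Genotype B was never measured under Condition L
DF <- filter(DF, !(Genotype == "B" & Condition == "L"))
DF <- DF %>% mutate(colr = ifelse(Genotype == "B", "#409ccd",
                           ifelse(Condition == "E", "#43cd80", "#ffc0cb")))

p <- ggplot(DF, aes(Measurment, fill = colr)) +
  # preserve = "single": every bar gets the width of one dodged slot
  geom_histogram(binwidth = 1, position = position_dodge(preserve = "single")) +
  facet_wrap(~Genotype, nrow = 2) +
  scale_fill_identity()
print(p)
```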
I've got a dataframe that looks like:
df <- data.frame(Date = as.Date(c("06-08-10", "06-09-10", "06-10-10", "06-11-10", "06-13-10"), format = "%m-%d-%y"),
closed_this_year_cum_gv = c(3, 5, 6, 7, NA),
opened_this_year_cum_gv = c(2, 5, 6, 8, 10),
closed_last_year_cum_gv = c(5, 6, 7, 8, 10),
opened_last_year_cum_gv = c(5, 6, 8, 10, NA))
and have this framework for a plot using ggplot2:
ggplot(df, aes(x=Date))+
geom_line(aes(y=closed_this_year_cum_gv, color="blue"),linetype="dashed")+
geom_line(aes(y=opened_this_year_cum_gv, color="blue"))+
geom_line(aes(y=closed_last_year_cum_gv, color="red"),linetype="dashed")+
geom_line(aes(y=opened_last_year_cum_gv, color="red"))+
xlab("Date")+
ylab("Millions of Dollars")+
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012")
I tried this with the sample data but for some reason the lines aren't showing up (they're showing up with my real data). I want the NAs to not be graphed, which is why they aren't 0.
In my real data it graphs, but the legend title is "blue" and its entries are labelled "blue" and "red". I want them to be labelled by year and opened/closed. I've tried various methods, but nothing seems to override the legend.
How do I control the legend title and labels?
Edit: changed to class "Date"
ggplot is generally happier to be fed data in 'long' format, as opposed to wide. Among other things, that makes it easier to map different aesthetics to variables in the data set.
# some data massage before the plot
# reshape data from wide to long format
library(reshape2)
library(ggplot2)
df2 <- melt(df, id.vars = "Date")
# convert variable 'Date' to class 'Date' (no effect if it already is a Date)
df2$Date <- as.Date(df2$Date, format = "%m-%d-%y")
# create two variables
# var1: opened vs closed
df2$var1 <- ifelse(grepl(x = df2$variable, pattern = "opened"), "Opened", "Closed")
# set factor levels so that 'opened' comes before 'closed'
df2$var1 <- factor(df2$var1, levels = c("Opened", "Closed"))
# var2: this vs last year
df2$var2 <- ifelse(grepl(x = df2$variable, pattern = "this"), "This year", "Last year")
# plot
# use default colours, slightly pale 'red' and 'blue'
ggplot(df2, aes(x = Date, y = value, linetype = var1, colour = var2, group = interaction(var1, var2))) +
geom_line()
# if you want to set colours to red and blue, add a manual scale to the plot
last_plot() + scale_colour_manual(values = c("red", "blue"))
Update following comment
If you only want one legend, one possibility is to let both linetype and colour depend on 'variable'.
# set factor levels so that 'opened' comes before 'closed', and 'last' before 'this'
df2$variable <- factor(df2$variable,
levels = c("opened_last_year_cum_gv",
"closed_last_year_cum_gv",
"opened_this_year_cum_gv",
"closed_this_year_cum_gv")
)
ggplot(df2, aes(x = Date, y = value, linetype = variable, colour = variable, group = variable)) +
geom_line() +
scale_colour_manual(values = rep(c("red", "blue"), each = 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year")) +
scale_linetype_manual(values = rep(c("solid", "dashed"), 2),
name = "",
labels = c("Opened last year",
"Closed last year",
"Opened this year",
"Closed this year"))
You need to specify appropriate mappings in aes(). Try this:
ggplot(df, aes(x=Date)) +
geom_line(aes(y=closed_this_year_cum_gv, color="this", linetype="closed")) +
geom_line(aes(y=opened_this_year_cum_gv, color="this", linetype="opened")) +
geom_line(aes(y=closed_last_year_cum_gv, color="last", linetype="closed")) +
geom_line(aes(y=opened_last_year_cum_gv, color="last", linetype="opened")) +
xlab("Date") +
ylab("Millions of Dollars") +
ggtitle("Cumulative Sum of TGV for Opened and Closed Cases - 2013 vs. 2012") +
scale_colour_manual(name="year", values=c("this"="blue", "last"="red")) +
scale_linetype_manual(name="type", values=c(2, 1))