Add legend to ggplot2 line with point plot - r

I have a question about legends in ggplot2. I managed to plot two lines and two points in the same graph and want to add a legend with the two colors used. This is the code used
P <- ggplot() + geom_point(data = data_p,aes(x = V1,y = V2),shape = 0,col = "#56B4E9") + geom_line(data = data_p,aes(x = V1,y = V2),col = "#56B4E9")+geom_point(data = data_p,aes(x = V1,y = V3),shape = 1,col = "#009E73") + geom_line(data = data_p,aes(x = V1,y = V3),col = "#009E73")
and the output is
[image: output plot]
I tried scale_color_manual, scale_shape_manual and scale_line_manual, but they don't work.
P + scale_color_manual(name = "group",values = c('#56B4E9' = '#56B4E9','#009E73' = '#009E73'),
breaks = c('#56B4E9','#009E73'),labels = c('B','H'))
I want it like this
Here is the simple data if it can help you.
5 0.49216 0.45148
10 0.3913 0.35751
15 0.32835 0.30361
data_p

I would approach this problem in two steps.
Generally, to get stuff in the guides, ggplot2 wants you to put "aesthetics" like colour inside the aes() function. I typically do this inside the ggplot() rather than individually for each "geom", especially if everything kind of makes sense in a single dataframe.
My first step would be to reshape your dataframe slightly. I would use the package tidyr (part of the tidyverse, like ggplot2; it is really nice for reformatting data and worth learning as you go), and do something like this:
library(ggplot2)
library(magrittr) #provides the %>% pipe
#key is the new variable that will be your color variable
#value is the numbers that had been in V2 and V3 that will now be your y-values
newdf <- data_p %>% tidyr::gather(key = "color", value = "yval", V2, V3)
#now, I would rewrite your plot slightly
P<-(newdf %>% ggplot(aes(x=V1,y=yval, colour=color))
#when you put the ggplot inside parentheses,
#you can add each new layer on its own line, starting with the "+"
+ geom_point()
+ geom_line()
+ scale_color_manual(values=c("#56B4E9","#009E73"))
#theme classic is my preferred look in ggplot, usually
+ theme_classic()
)
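If reshaping is not an option, a rough alternative is to keep the wide data and put a constant string inside aes() for each layer; ggplot2 then treats the string as a group and builds the legend from it. This is only a sketch: the data frame is rebuilt from the three sample rows in the question, and the labels "B" and "H" are taken from the question's scale_color_manual attempt.

```r
library(ggplot2)

# The three sample rows from the question
data_p <- data.frame(V1 = c(5, 10, 15),
                     V2 = c(0.49216, 0.3913, 0.32835),
                     V3 = c(0.45148, 0.35751, 0.30361))

# Constants inside aes() become legend keys; the manual scales then
# assign the colour and shape that belong to each key.
P <- ggplot(data_p, aes(x = V1)) +
  geom_line(aes(y = V2, colour = "B")) +
  geom_point(aes(y = V2, colour = "B", shape = "B")) +
  geom_line(aes(y = V3, colour = "H")) +
  geom_point(aes(y = V3, colour = "H", shape = "H")) +
  scale_colour_manual(name = "group", values = c(B = "#56B4E9", H = "#009E73")) +
  scale_shape_manual(name = "group", values = c(B = 0, H = 1)) +
  ylab("value")
```

Giving both scales the same name and the same keys lets ggplot2 merge them into a single legend.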

Related

Bug in ggplot2?

I'm currently working on plotting simple plots using ggplot2.
The graph looks good, but there is one tiny detail I can't fix.
When you look at the legend, it says "Low n" twice. One of them should be "High n".
Here is my code:
half_plot <- ggplot() +
ggtitle(plot_title) +
geom_line(data = plot_dataframe_SD1, mapping = aes(x = XValues, y = YValues_SD1, color = "blue")) +
geom_line(data = plot_dataframe_SD2, mapping = aes(x = XValues, y = YValues_SD2, color = "green")) +
xlim(1, 2) +
ylim(1, 7) +
xlab("Standard Deviation") +
ylab(AV_column_name) +
scale_fill_identity(name = 'the fill', guide = 'legend',labels = c('m1')) +
scale_colour_manual(name = 'Legend',
values =c('blue'='blue','green'='green'),
labels = c(paste("Low ", Mod_column_name), paste("High ", Mod_column_name)))
Here is the graph I get in my output:
So do you know how to fix this?
And there is one more thing that makes me curious: I can't remember changing anything in this code, but I know that the legend worked just fine a few days ago. I saved pictures I made with this code and they look all right.
Also if you have any further suggestions how to upgrade the graph, these suggestions are very welcome too.
When asking questions, it will help us if you provide a reproducible example including the data. With some sample data, there are a couple ways to fix it.
Sample data
library(dplyr)
plot_dataframe_SD1 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD1=XValues*2)
plot_dataframe_SD2 = data.frame(XValues=seq(1,2,by=.2)) %>%
mutate(YValues_SD2=XValues*5)
The simplest way to modify your code is to supply the desired color label in the aesthetic.
Mod_column_name = 'n'
low_label = paste('Low',Mod_column_name)
high_label = paste('High',Mod_column_name)
half_plot <- ggplot() +
# put the desired label name in the aesthetic
# link describing the bang bang operator (!!) https://www.r-bloggers.com/2019/07/bang-bang-how-to-program-with-dplyr/
geom_line(data=plot_dataframe_SD1,mapping=aes(x=XValues,y=YValues_SD1,color=!!low_label)) +
geom_line(data=plot_dataframe_SD2,mapping=aes(x=XValues,y=YValues_SD2,color=!!high_label)) +
# name the colors so each one is matched to its label rather than to alphabetical order
scale_color_manual(values=setNames(c('blue','green'),c(low_label,high_label)),
breaks=c(low_label,high_label))
A more general approach is to join the dataframes and pivot the joined df to have a column with the SD values and another to specify how to separate the colors. This makes it easier to plot without having to make multiple calls to geom_line.
# Join the dfs, pivot the SD columns longer, and make a new column with your desired labels
joined_df = plot_dataframe_SD1 %>% full_join(plot_dataframe_SD2,by='XValues') %>%
tidyr::pivot_longer(cols=contains('YValues'),names_to='df_num',values_to='SD') %>%
mutate(label_name=if_else(df_num == 'YValues_SD1',paste('Low',Mod_column_name),paste('High',Mod_column_name)))
# Simplified plot
ggplot(data=joined_df,aes(x=XValues,y=SD,color=label_name)) +
geom_line() +
# named values tie each color to its label; unnamed values would follow alphabetical order
scale_color_manual(values=setNames(c('blue','green'),
c(paste('Low',Mod_column_name),paste('High',Mod_column_name))))

r facet_wrap not grouping properly with geom_point

I'm struggling with facet_wrap in R. It should be simple; however, the facet variable is not being picked up. Here is what I'm running:
plot = ggplot(data = item.household.descr.count, mapping = aes(x=item.household.descr.count$freq, y = item.household.descr.count$descr, color = item.household.descr.count$age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
I colored the faceting variable to try to help illustrate what is going on. The plot should have only one color in each facet instead of what you see here. Does anyone know what is going on?
This error is caused by the fact that you are using $ and the data frame name to refer to your variables inside aes(). With ggplot() you should use only variable names in aes(), as the data frame is already named in data=.
plot = ggplot(data = item.household.descr.count,
mapping = aes(x=freq, y = descr, color = age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
Here is an example using the diamonds dataset.
diamonds2<-diamonds[sample(nrow(diamonds),1000),]
# with $ inside aes() (the problem described above)
ggplot(diamonds2,aes(diamonds2$carat,diamonds2$price,color=diamonds2$color))+geom_point()+
facet_wrap(~color)
# with bare column names
ggplot(diamonds2,aes(carat,price,color=color))+geom_point()+
facet_wrap(~color)

How to obtain y-axis-labels in ggplot2? [duplicate]

I have created a function for creating a barchart using ggplot.
In my figure I want to overlay the plot with white horizontal bars at the position of the tick marks like in the plot below
p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_bar(stat = 'identity')
# By inspection I found the y-tick positions to be c(50,100,150)
p + geom_hline(aes(yintercept = seq(50,150,50)), colour = 'white')
However, I would like to be able to change the data, so I can't use static positions for the lines like in the example. For example I might change Sepal.Width to Sepal.Length in the example above.
Can you tell me how to:
1. get the tick positions from my ggplot; or
2. get the function that ggplot uses for tick positions so that I can use this to position my lines.
so I can do something like
tickpositions <- ggplot_tickpostion_fun(iris$Sepal.Width)
p + scale_y_continuous(breaks = tickpositions) +
geom_hline(aes(yintercept = tickpositions), colour = 'white')
A possible solution for (1) is to use ggplot_build to grab the content of the plot object. ggplot_build results in "[...] a panel object, which contain all information about [...] breaks".
ggplot_build(p)$layout$panel_ranges[[1]]$y.major_source
# [1] 0 50 100 150
See edit for pre-ggplot2 2.2.0 alternative.
Check out ggplot2::ggplot_build - it can show you lots of details about the plot object. You have to give it a plot object as input. I usually like to str() the result of ggplot_build to see what all the different values it has are.
For example, I see that there is a panel --> ranges --> y.major_source vector that seems to be what you're looking for. So to complete your example:
p <- ggplot() +
geom_bar(data = iris, aes(x = Species, y = Sepal.Width), stat = 'identity')
pb <- ggplot_build(p)
str(pb)
y.ticks <- pb$panel$ranges[[1]]$y.major_source
p + geom_hline(aes(yintercept = y.ticks), colour = 'white')
Note that I moved the data argument from the main ggplot function to inside geom_bar, so that geom_hline would not try to use the same dataset and throw errors when the number of rows in iris is not a multiple of the number of lines we're drawing. Another option would be to pass a data = data.frame() argument to geom_hline; I cannot comment on which one is a more correct solution, or if there's a nicer solution altogether. But the gist of my code still holds :)
For ggplot2 3.1.0 this worked for me:
ggplot_build(p)$layout$panel_params[[1]]$y.major_source
#[1] 0 50 100 150
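Since the location of the breaks inside the built plot has moved between releases, a small wrapper that tries the known locations in turn may be more robust. This is only a sketch: get_y_breaks is a hypothetical helper, not part of ggplot2, and the panel_params[[1]]$y$breaks location is an assumption that, as far as I know, applies to ggplot2 3.3.0 and later.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_bar(stat = "identity")

# Hypothetical helper: look up the y breaks of the first panel
get_y_breaks <- function(plot) {
  params <- ggplot_build(plot)$layout$panel_params[[1]]
  breaks <- if (!is.null(params$y.major_source)) {
    params$y.major_source   # location reported above for 3.1.0
  } else {
    params$y$breaks         # assumed location in ggplot2 >= 3.3.0
  }
  breaks[!is.na(breaks)]    # breaks outside the limits can come back as NA
}

y_ticks <- get_y_breaks(p)

# yintercept is set outside aes(), so its length need not match nrow(iris)
p2 <- p + geom_hline(yintercept = y_ticks, colour = "white")
```

Passing yintercept as a plain argument rather than inside aes() also sidesteps the length mismatch between the tick vector and the rows of iris.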
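For sure you can. Read the help file for the seq() function.
# x stands for the values you are plotting
seq(from = min(x), to = max(x), length.out = 5)
and do something like this.
p <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_bar(stat = 'identity')
# yintercept is set outside aes() because it is a fixed vector, not a column of iris
p + geom_hline(yintercept = seq(from = min(iris$Sepal.Width), to = max(iris$Sepal.Width), length.out = 5), colour = 'white')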

'Merging' two plots using ggplot2 and R

Given the following dataset:
data = cbind(1:10,c('open','reopen','closed'),letters[1:3],1:10)
data = rbind(data,cbind(1:10,c('open','closed','reopen'),letters[1:3],5:10))
data = rbind(data,cbind(1:10,c('closed','open','reopen'),letters[1:3],3:10))
data = data.frame(data);
colnames(data) <- c("id","status","author","when")
I'd like to get a plot similar to the following:
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,colour = author))
But, as such I get a single legend by 'author' with the status and author values. How can I get the same result but with a legend for author and other for status? My rationale is that I want to layer two plots of the same dataset on top of each other.
I don't think you can have different color scales / legends for one ggplot. You could hack something together (see this question for legend hacking), but in this case, where one of your geoms is a point, you could just use fill and one of the point shapes that can be filled in.
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id, fill = author),
shape = 21, color = NA, size = 4)
Here the colors used are the same for each, but you can edit the color or fill scales individually, e.g., adding
scale_fill_brewer(type = "qual") +
scale_color_brewer(type = "qual", palette = 2)
I do agree with AndyClifton that using color in two ways will be hard to distinguish. You could also experiment with line types, point shapes, or even plotting with geom_text using a word, a letter, or a number as a label instead of points. You say you have more than 6 values for author, but it will be very difficult to distinguish more than 6 colors for author, especially when color is also being used for status.
Let's take your data. First you should be aware of a problem: your when and id columns are strings, so you are plotting 1, 10, 2, 3, ... not 1, ..., 9, 10. We can fix that:
data$when.num <-as.numeric(as.character(data$when))
data$id.num <-as.numeric(as.character(data$id))
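The detour through as.character() matters when the column has been stored as a factor. A tiny base-R illustration (the vector f is made up for this example):

```r
# A factor built from numbers stored as text
f <- factor(c("1", "10", "2", "3"))

levels(f)                    # "1" "10" "2" "3"  (sorted as text)
as.numeric(f)                # 1 2 3 4   (level codes, not the values)
as.numeric(as.character(f))  # 1 10 2 3  (the values themselves)
```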
Then we'll plot it but use different shapes to get two different legends:
require(ggplot2)
p <- ggplot(data, aes(x = when.num, y = id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,shape = author))
print(p)
And you get this:
I think this is much clearer than using coloured points for the author, but this is a question of taste.

Omitting a Missing x-axis Value in ggplot2 (Convert range to categorical variable)

I am using ggplot to generate a chart that summarises a race made up from several laps. There are 24 participants in the race, numbered 1-12, 14-25; I am plotting out a summary measure for each participant using ggplot, but ggplot assumes I want the number range 1-25, rather than categories 1-12, 14-25.
What's the fix for this? Here's the code I am using (the data is sourced from a Google spreadsheet).
sskey='0AmbQbL4Lrd61dHlibmxYa2JyT05Na2pGVUxLWVJYRWc'
library("ggplot2")
require(RCurl)
gsqAPI = function(key,query,gid){ return( read.csv( paste( sep="", 'http://spreadsheets.google.com/tq?', 'tqx=out:csv', '&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ) ) ) }
sin2011racestatsX=gsqAPI(sskey,'select A,B,G',gid='13')
sin2011proximity=gsqAPI(sskey,'select A,B,C',gid='12')
h=sin2011proximity
k=sin2011racestatsX
l=subset(h,lap==1)
ggplot() +
geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
scale_x_discrete(limits =c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL','','SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))+
xlab(NULL) + opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) +
geom_point(aes(x=l$car, y=l$pos, pch=3, size=2)) +
geom_point(aes(x=k$driverNum, y=k$classification,size=2), label='Final') +
geom_point(aes(x=k$driverNum, y=k$grid, col='red')) +
ylab("Position")+
scale_y_discrete(breaks=1:24,limits=1:24)+ opts(legend.position = "none")
Expanding on my cryptic comment, try this:
#Convert these to factors with the appropriate labels
# Note that I removed the ''
h$car <- factor(h$car,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
k$driverNum <- factor(k$driverNum,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
l=subset(h,lap==1)
ggplot() +
geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
geom_point(aes(x=l$car, y=l$pos, pch=3, size=2)) +
geom_point(aes(x=k$driverNum, y=k$classification,size=2), label='Final') +
geom_point(aes(x=k$driverNum, y=k$grid, col='red')) +
ylab("Position") +
scale_y_discrete(breaks=1:24,limits=1:24) + opts(legend.position = "none") +
opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) + xlab(NULL)
Calling scale_x_discrete is no longer necessary. And stylistically, I prefer putting opts and xlab stuff at the end.
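To see why the factor conversion removes the empty slot, here is a cut-down base-R sketch. The car numbers and the four labels are an illustrative subset of the lists above, and unlike the code above it passes levels explicitly so that the pairing of number and label is visible.

```r
# Numeric ids with a gap (there is no car 13)
car <- c(1, 2, 12, 14, 2)

# As a number, the axis would span 1..14 and leave a hole at 13.
# As a factor, only the levels that exist become categories.
car_f <- factor(car, levels = c(1, 2, 12, 14),
                labels = c("VET", "WEB", "MAL", "SUT"))

levels(car_f)        # "VET" "WEB" "MAL" "SUT"
as.character(car_f)  # "VET" "WEB" "MAL" "SUT" "WEB"
```

When levels is omitted, factor() uses the sorted unique values, so the labels must be listed in that same order.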
Edit
A few notes in response to your comment. Many of your difficulties can be eased by a more streamlined use of ggplot. Your data is in an awkward format:
library(plyr) #for ddply
library(reshape2) #for melt
#Summarise so we can use geom_linerange rather than geom_step
d1 <- ddply(h,.(car),summarise,ymin = min(pos),ymax = max(pos))
#R has a special value for missing data; use it!
k$classification[k$classification == 'null'] <- NA
k$classification <- as.integer(k$classification)
#The other two data sets should be merged and converted to long format
d2 <- merge(l,k,by.x = "car",by.y = "driverNum")
colnames(d2)[3:5] <- c('End of Lap 1','Final Position','Grid Position')
d2 <- melt(d2,id.vars = 1:2)
#Now the plotting call is much shorter
ggplot() +
geom_linerange(data = d1,aes(x= car, ymin = ymin,ymax = ymax)) +
geom_point(data = d2,aes(x= car, y= value,shape = variable),size = 2) +
opts(title="F1 2011 Korea \nRace Summary Chart", axis.text.x=theme_text(angle=-90, hjust=0)) +
labs(x = NULL, y = "Position", shape = "")
A few notes. You were setting aesthetics to fixed values (size = 2) which should be done outside of aes(). aes() is for mapping variables (i.e. columns) to aesthetics (color, shape, size, etc.). This allows ggplot to intelligently create the legend for you.
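The difference between mapping and setting can be seen on any dataset. The sketch below uses the built-in mtcars data purely as a stand-in; the plot names are made up for the illustration.

```r
library(ggplot2)

# Mapped: size = 2 inside aes() is treated as data,
# so ggplot2 builds a size scale and a legend for it
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = 2))

# Set: size = 2 outside aes() is a fixed parameter; no scale, no legend
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 2)

# Mapping a real column is what produces a meaningful legend
p_column <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = factor(cyl)), size = 2)
```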
Merging the second two data sets and then melting it creates a grouping variable for ggplot to use in the legend. I used the shape aesthetic since a few values overlap; using color may make that hard to spot. In general, ggplot will resist mixing aesthetics into a single legend. If you want to use shape, color and size you'll get three legends.
I prefer setting labels using labs, since you can do them all in one spot. Note that setting the aesthetic label to "" removes the legend title.
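For readers on a recent ggplot2, note that opts() and theme_text() were later replaced by theme() and element_text(), and the title can go into labs(). A rough equivalent of the labelling part is sketched below; the toy data frame is hypothetical and only stands in for d2.

```r
library(ggplot2)

# Toy data standing in for d2 (hypothetical values)
toy <- data.frame(
  car      = factor(c("VET", "WEB", "VET", "WEB")),
  value    = c(1, 2, 1, 3),
  variable = c("Grid Position", "Grid Position",
               "Final Position", "Final Position")
)

p <- ggplot(toy, aes(x = car, y = value, shape = variable)) +
  geom_point(size = 2) +
  # all labels in one place; shape = "" removes the legend title
  labs(title = "F1 2011 Korea \nRace Summary Chart",
       x = NULL, y = "Position", shape = "") +
  # theme() + element_text() take the place of opts() + theme_text()
  theme(axis.text.x = element_text(angle = -90, hjust = 0))
```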

Resources