Change label keys and title on a density curve plot with ggplot2 - r

I have a density plot with this code:
p <- ggplot(data=paddling, aes(frequency, color=type, fill=type))
p <- p + geom_density(alpha=0.2)
p <- p + scale_x_continuous(limits=c(0, 1000),name='Frequency (Hz)')
I would like to change the legend keys and legend title. I tried using:
p <- p + scale_fill_discrete(name='Paddling type',labels=c("Hands only", "Hands and feet"))
But it just added another legend on top of the other one:
Any help would be greatly appreciated! Thank you!!

What you were doing was halfway there. Because two aesthetics are in use (fill and color), each of them gets its own legend. If you change the legend title and labels only for fill, the color legend is left unchanged and is drawn separately. The solution is to add a matching 'scale_color_discrete':
#generate data
set.seed(123)
n=1000
paddling <- data.frame(frequency=runif(n,0,n),
type=sample(c("hand_only","with_feet"),n,T))
#plot
p <- ggplot(data=paddling, aes(frequency, color=type, fill=type)) +
geom_density(alpha=0.2)+
scale_x_continuous(limits=c(0, 1000),name='Frequency (Hz)') +
scale_fill_discrete(name='Paddling type',labels=c("Hands only", "Hands and feet"))+
scale_color_discrete(name='Paddling type',labels=c("Hands only", "Hands and feet"))
p
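This can become cumbersome when there are many mapped aesthetics or many levels. An alternative is to change the data itself (both the column name and the factor levels), so that the default legend already shows the desired text:
#note the backticks needed for the space in the column name
#factor() is needed because type is a character vector (no levels) when stringsAsFactors is FALSE, the default since R 4.0
paddling$`Paddling type` <- factor(paddling$type)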
levels(paddling$`Paddling type`)
levels(paddling$`Paddling type`) <- c("Hands only","Hands and feet")
p2 <- ggplot(data=paddling, aes(frequency, color=`Paddling type`, fill=`Paddling type`)) +
geom_density(alpha=0.2)+
scale_x_continuous(limits=c(0, 1000),name='Frequency (Hz)')
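A further option, sketched here with the same simulated data, is to keep the legend title and labels in variables and pass them to both scales. The names legend_title and type_labels are introduced only for this example. Because both scales then carry the same title and labels, ggplot2 merges them into a single legend:

```r
library(ggplot2)

# Same simulated data as in the answer above
set.seed(123)
n <- 1000
paddling <- data.frame(frequency = runif(n, 0, n),
                       type = sample(c("hand_only", "with_feet"), n, TRUE))

# Hypothetical helper variables: defined once, reused for fill and color
legend_title <- "Paddling type"
type_labels <- c("Hands only", "Hands and feet")  # order follows the sorted values of type

p3 <- ggplot(paddling, aes(frequency, color = type, fill = type)) +
  geom_density(alpha = 0.2) +
  scale_x_continuous(limits = c(0, 1000), name = "Frequency (Hz)") +
  scale_fill_discrete(name = legend_title, labels = type_labels) +
  scale_color_discrete(name = legend_title, labels = type_labels)
p3
```

If the title or a label is changed for only one of the two scales, the legends split again, which is the behaviour described in the question.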

Related

R ggplot2 overlapping histogram, adding in legend for overlapping part

I have a histogram that is plotting 2 different groups with some overlap between them. I have been able to manually color the groups and a legend is generated for each group, however I am asking how to add into the legend a color and label for the overlapping part?
For example, in the above histogram I would like to add a legend for the purplish part where A and B overlap (which should be labeled as "Overlap" in the legend, underneath B).
Code for generating above histogram:
set.seed(42)
n <- 100
dat <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
x=rnorm(n))
ggplot(dat, aes(x=x, fill=group)) + geom_histogram(alpha=.5, position="identity") +
scale_fill_manual(values=c("blue","red"))
A partial solution that avoids the overlap by dodging the bars
Sample code:
library(ggplot2)
ggplot(dat, aes(x=x, fill=group)) +
geom_histogram(position = position_dodge(width = 0.6))+
scale_fill_manual(values=c("blue","red"))+
scale_y_continuous(expand=c(0,0))+
theme_bw()
Plot:

Ggplot with more than two legends

I have a data.frame which I'd like to scatter plot using ggplot.
The data have 3 factors whose levels I'd like to show in the legend, although the color of the points will only be according to one of these factors (df$group below).
Here's what I have so far:
set.seed(1)
df <- data.frame(x=rnorm(100),y=rnorm(100),
group=LETTERS[sample(5,100,replace=T)],
type=letters[sample(3,100,replace=T)],
background=sample(4,100,replace=T),stringsAsFactors=F)
df$group <- factor(df$group,LETTERS[1:5])
df$type <- factor(df$type,letters[1:3])
df$background <- factor(df$background,c(1:4))
I manually specify colors:
require(RColorBrewer)
require(scales)
all.colors <- hcl(h=seq(0,(12-1)/(12),length=12)*360,c=100,l=65,fixup=TRUE)
group.colors <- all.colors[1:5]
type.colors <- all.colors[6:8]
background.colors <- all.colors[9:12]
This is what I have for showing the 3 factors in the legend (df$group and df$type):
require(ggplot2)
ggplot(df,aes(x=x,y=y,colour=group,fill=type,alpha=background))+geom_point(cex=2,shape=1,stroke=1)+
theme_bw()+theme(strip.background=element_blank())+scale_color_manual(drop=FALSE,values=group.colors,name="group")+
guides(fill=guide_legend(override.aes=list(colour=type.colors,pch=0)))
So my question is how to get background.colors to appear in the legend under "background", instead of the default gray-scale colors that currently appear there.
ggplot(df,aes(x=x, y=y, colour=group, fill=type, alpha=background))+
geom_point(cex=2, shape=1, stroke=1) +
theme_bw() +
theme(strip.background=element_blank()) +
scale_color_manual(drop=FALSE, values=group.colors, name="group") +
guides(fill=guide_legend(override.aes=list(colour=type.colors,pch=0)),
alpha=guide_legend(override.aes=list(colour=background.colors,pch=0)))

ggplot2: add conditional density curves describing both dimensions of scatterplot

I have scatterplots of 2D data from two categories. I want to add density lines for each dimension -- not outside the plot (cf. Scatterplot with marginal histograms in ggplot2) but right on the plotting surface. I can get this for the x-axis dimension, like this:
set.seed(123)
dim1 <- c(rnorm(100, mean=1), rnorm(100, mean=4))
dim2 <- rnorm(200, mean=1)
cat <- factor(c(rep("a", 100), rep("b", 100)))
mydf <- data.frame(cbind(dim2, dim1, cat))
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
It looks like this:
But I want an analogous pair of density curves running vertically, showing the distribution of points in the y-dimension. I tried
stat_density(aes(y=dim2, x=0+(..scaled..)), position="identity", geom="line")
but receive the error "stat_density requires the following missing aesthetics: x".
Any ideas? thanks
You can get the densities of the dim2 variables. Then, flip the axes and store them in a new data.frame. After that it is simply plotting them on top of the other graph.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() +
stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
stuff <- ggplot_build(p)
xrange <- stuff$layout$panel_params[[1]]$x.range # extract the x range, to make the new densities align with y-axis (ggplot2 >= 3.0; older versions stored it at stuff[[2]]$ranges[[1]]$x.range)
## Get densities of dim2
ds <- do.call(rbind, lapply(unique(mydf$cat), function(lev) {
dens <- with(mydf, density(dim2[cat==lev]))
data.frame(x=dens$y+xrange[1], y=dens$x, cat=lev)
}))
p + geom_path(data=ds, aes(x=x, y=y, color=factor(cat)))
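The same idea can be sketched without ggplot_build(). This variant is an assumption-laden illustration, not the answerer's code: it rebuilds the data with cat kept as a factor, rescales each density to a maximum of 1 (mimicking ..scaled..), and shifts it by a hypothetical x_offset so the vertical curves sit to the left of the points:

```r
library(ggplot2)

# Simulated data in the spirit of the question (cat kept as a factor here)
set.seed(123)
mydf <- data.frame(dim1 = c(rnorm(100, mean = 1), rnorm(100, mean = 4)),
                   dim2 = rnorm(200, mean = 1),
                   cat = factor(rep(c("a", "b"), each = 100)))

x_offset <- -2  # hypothetical position of the curves' baseline on the x-axis

# One density per category; x and y are swapped so the curve runs vertically
ds_scaled <- do.call(rbind, lapply(levels(mydf$cat), function(lev) {
  dens <- density(mydf$dim2[mydf$cat == lev])
  data.frame(x = x_offset + dens$y / max(dens$y),  # scaled to a peak of 1
             y = dens$x,
             cat = lev)
}))

ggplot(mydf, aes(x = dim1, y = dim2, colour = cat)) +
  geom_point() +
  geom_path(data = ds_scaled, aes(x = x, y = y, colour = cat))
```

geom_path() is used rather than geom_line() because it connects the points in the order of the rows, which matters once the density is drawn along the y-axis.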
So far I can produce:
distrib_horiz <- stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + distrib_horiz
And:
distrib_vert <- stat_density(data=mydf, aes(x=dim2, y=(-2+(..scaled..))),
position="identity", geom="line")
ggplot(data=mydf, aes(x=dim2, y=dim1, colour=as.factor(cat))) +
geom_point() + distrib_vert + coord_flip()
But combining them is proving tricky.
So far I have only a partial solution since I didn't manage to obtain a vertical stat_density line for each individual category, only for the total set. Maybe this can nevertheless help as a starting point for finding a better solution. My suggestion is to try with the ggMarginal() function from the ggExtra package.
p <- ggplot(data=mydf, aes(x=dim1, y=dim2, colour=as.factor(cat))) +
geom_point() + stat_density(aes(x=dim1, y=(-2+(..scaled..))),
position="identity", geom="line")
library(ggExtra)
ggMarginal(p,type = "density", margins = "y", size = 4)
This is what I obtain:
I know it's not perfect, but maybe it's a step in a helpful direction. At least I hope so. Looking forward to seeing other answers.

easiest way to discretize continuous scales for ggplot2 color scales?

Suppose I have this plot:
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, colour=Sepal.Length)) + scale_colour_gradient()
what is the correct way to discretize the color scale, like the plot shown below the accepted answer here (gradient breaks in a ggplot stat_bin2d plot)?
ggplot correctly recognizes discrete values and uses discrete scales for these, but my question is if you have continuous data and you want a discrete colour bar for it (with each square corresponding to a value, and squares colored in a gradient still), what is the best way to do it? Should the discretizing/binning happen outside of ggplot and get put in the dataframe as a separate discrete-valued column, or is there a way to do it within ggplot? an example of what I'm looking for is similar to the scale shown here:
except I'm plotting a scatter plot and not something like geom_tile/heatmap.
thanks.
The solution is slightly complicated, because you want a discrete scale. Otherwise you could probably simply use round.
library(ggplot2)
bincol <- function(x,low,medium,high) {
breaks <- function(x) pretty(range(x), n = nclass.Sturges(x), min.n = 1)
colfunc <- colorRampPalette(c(low, medium, high))
binned <- cut(x,breaks(x))
res <- colfunc(length(unique(binned)))[as.integer(binned)]
names(res) <- as.character(binned)
res
}
labels <- unique(names(bincol(iris$Sepal.Length,"blue","yellow","red")))
breaks <- unique(bincol(iris$Sepal.Length,"blue","yellow","red"))
breaks <- breaks[order(labels,decreasing = TRUE)]
labels <- labels[order(labels,decreasing = TRUE)]
ggplot(iris) +
geom_point(aes(x=Sepal.Width, y=Sepal.Length,
colour=bincol(Sepal.Length,"blue","yellow","red")), size=4) +
scale_color_identity("Sepal.Length", labels=labels,
breaks=breaks, guide="legend")
You could try the following. I have modified your example code accordingly:
#I am not so great at R, so I'll just make a data frame this way
#I am convinced there are better ways. Oh well.
df<-data.frame()
for(x in 1:10){
for(y in 1:10){
newrow<-c(x,y,sample(1:1000,1))
df<-rbind(df,newrow)
}
}
colnames(df)<-c('X','Y','Val')
#This is the bit you want
p<- ggplot(df, aes(x=X,y=Y,fill=cut(Val, c(0,100,200,300,400,500,Inf))))
p<- p + geom_tile() + scale_fill_brewer(type="seq",palette = "YlGn")
p<- p + guides(fill=guide_legend(title="Legend!"))
#Tight borders
p<- p + scale_x_continuous(expand=c(0,0)) + scale_y_continuous(expand=c(0,0))
p
Note the strategic use of cut to discretize the data followed by the use of color brewer to make things pretty.
The result looks as follows.
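The same cut-based binning can be applied to the scatter plot from the question. This is a minimal sketch: the break points (4 to 8 in steps of 1) and the column name Sepal.Length.bin are choices made for this example only.

```r
library(ggplot2)

# Bin the continuous variable outside ggplot and store it as a factor column.
# Sepal.Length in iris lies between 4.3 and 7.9, so these breaks cover all rows.
iris_binned <- transform(iris,
                         Sepal.Length.bin = cut(Sepal.Length, breaks = seq(4, 8, by = 1)))

ggplot(iris_binned, aes(x = Sepal.Width, y = Sepal.Length, colour = Sepal.Length.bin)) +
  geom_point(size = 3) +
  scale_colour_brewer(name = "Sepal.Length", palette = "YlGn")  # sequential palette, one colour per bin
```

Because the binned column is a factor, ggplot2 uses a discrete scale and the legend shows one key per interval.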

When I use stat_summary with line and point geoms I get a double legend

I have data for 4 sectors (A,B,C,D) and 5 years. I would like to draw 4 lines, 1 for each sector, adding a point for every year and add a fifth line representing the mean line using the stat_summary statement and controlling the line colors by means of scale_color_manual and point shapes in aes() argument. The problem is that if I add the point geom the legend is split in two parts one for point shapes and one for line colors. I didn't understand how to obtain 1 legend combining colors and points.
Here is an example. First of all let's build the data frame dtfr as follows:
a <- 100; b <- 100; c <- 100; d <- 100
for(k in 2:5){
a[k] <- a[k-1]*(1+rnorm(1)/100)
b[k] <- b[k-1]*(1+rnorm(1)/100)
c[k] <- c[k-1]*(1+rnorm(1)/100)
d[k] <- d[k-1]*(1+rnorm(1)/100)
}
v <- numeric()
for(k in 1:5){ v <- c(v,a[k],b[k],c[k],d[k]) }
dtfr <- data.frame(Year=rep(2008:2012,1, each=4),
Sector=rep(c("A","B","C","D"),5),
Value=v,
stringsAsFactors=F)
Now let us start to draw our graph with ggplot2. In the first graph we draw the line and point geoms without the mean line:
library(ggplot2)
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
# stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
In this graph we have the legend with line colors and point shapes all in one:
But if I use the stat_summary to draw the mean line using the following code:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
ggtitle("Test for ggplot2 graph")
I get the mean (red) line, but the legend is split into two parts, one for line colors and one for point shapes. At this point my question is: how can I get the graph with the mean line and a legend like the one in the first graph? That is, how do I get a single legend combining lines and shapes in the second graph, where the mean line is drawn?
Try this:
ggplot(dtfr, aes(x=Year, y=Value)) +
geom_line(aes(group=Sector, color=Sector)) +
geom_point(aes(color=Sector, shape=Sector)) +
stat_summary(aes(colour="mean",shape="mean",group=1), fun.y=mean, geom="line", size=1.1) +
scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
scale_shape_manual(values=c(1:4, 32)) +
ggtitle("Test for ggplot2 graph")
Maybe someone more knowledgeable can come in and correct my explanation (or provide a better solution), but here's how I understand it: You have 5 values in the color scale, but you only have 4 in the shape scale; you're missing a value for "mean". So the scales aren't really compatible in a way. You can fix this by assigning a blank shape (32) to your mean line.
Here is a different approach that calculates the summary/mean beforehand and adds it as an additional level to the data frame before building the plot.
The approach can be used to easily add an additional line but with a specific color, which may be desired for a summary/mean for example.
First, I calculate the mean and add it to the dtfr of the OP.
library(dplyr)  # provides the %>% pipe used below
dtfr2 <- dtfr %>%
dplyr::group_by(Year) %>%
dplyr::summarise(Value = mean(Value)) %>%
dplyr::mutate(Sector = NA) %>%
dplyr::bind_rows(dtfr)
dtfr2 now has additional rows with the mean values stored in Value and NAs in Sector.
Then, building the plot is easy:
p1 <- ggplot(dtfr2, aes(x=Year, y=Value, color = Sector, shape = Sector)) +
geom_line() +
geom_point()
Finally, you may tweak the legend a little:
p1 +
scale_color_discrete(labels = c(letters[1:4], "M"), na.value = "black") +
scale_shape_discrete(labels = c(letters[1:4], "M"))
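A base R variant of the same precompute-then-plot idea is sketched below. It departs from the answer above in two ways, both assumptions of this example: the data are a simplified stand-in for dtfr, and the mean rows get an explicit "Mean" value in Sector instead of NA, so no na.value handling is needed.

```r
library(ggplot2)

# Simplified stand-in for the question's dtfr (values are illustrative only)
set.seed(1)
dtfr <- data.frame(Year = rep(2008:2012, each = 4),
                   Sector = rep(c("A", "B", "C", "D"), 5),
                   Value = 100 + rnorm(20),
                   stringsAsFactors = FALSE)

# Yearly mean across sectors, labelled as an extra "sector"
means <- aggregate(Value ~ Year, data = dtfr, FUN = mean)
means$Sector <- "Mean"

# Append the mean rows; column order is aligned with dtfr before rbind()
dtfr3 <- rbind(dtfr, means[, names(dtfr)])

# color and shape are mapped to the same variable, so one legend is drawn
ggplot(dtfr3, aes(x = Year, y = Value, color = Sector, shape = Sector)) +
  geom_line() +
  geom_point()
```

Since both aesthetics now share the same five levels, the color and shape legends merge without a blank shape being assigned by hand.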
