I'm mapping size to a variable with something like a log distribution - mostly small values but a few very large ones. How can I make the legend display custom values in the low-value range? For example:
df = data.frame(x=rnorm(2000), y=rnorm(2000), v=abs(rnorm(2000)^5))
p = ggplot(df, aes(x, y)) +
geom_point(aes(col=v, size=v), alpha=0.75) +
scale_size_area(max_size = 10)
print(p)
I've tried the p + guides(shape=guide_legend(override.aes=list(size=8))) solution posted in this SO question, but it makes no difference in my plot. In any case, I'd like to use specific legend size values, e.g. v = c(10,25,50,100,250,500), instead of the default range, e.g. c(100,200,300,400).
Grateful for assistance.
To get different break points for size in the legend, add the breaks= argument to scale_size_area(). With breaks= you can set the break points at exactly the positions you need.
ggplot(df, aes(x, y)) +
geom_point(aes(col=v, size=v), alpha=0.75) +
scale_size_area(max_size = 10,breaks=c(10,25,50,100,250,500))
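One thing worth checking, as a sketch rather than part of the original answer: as far as I know, ggplot2 drops breaks that fall outside the range of the scale, so a requested value such as 500 only shows up in the legend if the data actually reach it. The helper name `breaks_in_range` below is made up for illustration.

```r
# Hypothetical helper: keep only the requested legend breaks that the data cover.
breaks_in_range <- function(values, wanted) {
  rng <- range(values, na.rm = TRUE)
  wanted[wanted >= rng[1] & wanted <= rng[2]]
}

set.seed(1)
v <- abs(rnorm(2000)^5)                  # same skewed shape as the question's `v`
wanted <- c(10, 25, 50, 100, 250, 500)

size_breaks <- breaks_in_range(v, wanted)
size_breaks                              # the subset that the legend can display
```

The result can then be passed on as scale_size_area(max_size = 10, breaks = size_breaks).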
Related
I am trying to use scale_y_continuous() with a faceted histogram and running into an issue. I am hoping to get each count to be a percentage instead. My code is:
ggplot(d, aes(x = likely_att)) +
geom_histogram(binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
It looks like the distributions themselves are accurate, but the scaling is off: the percentages are "200 000%", "5 000%", etc. and that seems wrong, but I'm not quite sure why it's happening.
There are many more "yes" than "no" or "separated" married values in my dataset, which is why I use scales = "free_y" and why I'm hoping to just have percentages shown and only need one axis value shown.
I can't share this exact data for privacy reasons, but the likely_att variable is just a 1-5 numeric var, and married is a character var with 3 values: yes, no, separated.
In case it's helpful, I basically want it to look just like this image, but with percentages instead of counts, so I can just have one single y axis on the far left with 0 - 100 %
The problem is that percent_format() changes the way the labels are printed, but it doesn't actually rescale the numbers. To do that, you could use the computed density variable and multiply it by the bin width (which turns the density into the proportion of observations in each bin), then apply the percent formatting.
ggplot(d, aes(x = likely_att)) +
stat_bin(aes(y=..density..*.5, group = married),
binwidth = 0.5, color = "black") +
facet_wrap(~married, scales = "free_y") +
theme_classic() +
scale_y_continuous(labels = percent_format())
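To see why multiplying the density by the bin width gives a proportion, here is a small base R check. The data are made up (a stand-in for one facet of likely_att), and the bin edges are my own choice, so ggplot2 may place its bins differently; the identity itself does not depend on that.

```r
# Made-up stand-in for one facet's `likely_att` values (integers 1 to 5).
likely_att <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)
binwidth <- 0.5

# Bin edges of width 0.5 that cover the values 1..5.
edges <- seq(0.75, 5.25, by = binwidth)
h <- hist(likely_att, breaks = edges, plot = FALSE)

share_from_density <- h$density * binwidth      # what the answer puts on the y axis
share_from_counts  <- h$counts / sum(h$counts)  # proportion of rows in each bin

all.equal(share_from_density, share_from_counts)
sum(share_from_density)                         # the shares of one panel add up to 1
```

In newer ggplot2 versions, after_stat(density) is the preferred spelling of ..density.., so the aesthetic can also be written as y = after_stat(density) * 0.5.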
This question is motivated by a previous post illustrating various ways to change how axis scales are plotted in a ggplot figure, from the default exponential notation to the full integer value (when one's axis values are very large). While I am able to convert the axis scales from exponential notation to full values, I am unclear how one would achieve the same goal for the values appearing in the legend.
While I understand that one can manually change the length of the legend scale with "scale_color..." or "scale_fill..." followed by the "limits" argument, this does not appear to be a solution to getting my legend values to show "6000000000" rather than "6e+09" (or "0" rather than "0e+00" for that matter).
The following example should suffice. My hope is that someone can point out how to use the 'scales' package for legend scales rather than axis scales.
Thanks very much.
library(ggplot2)
library(scales)
Data <- data.frame(
pi = c(2,71,828,1828,45904,523536,2874713,52662497,757247093,6999595749),
e = c(3,14,159,2653,58979,311599,7963468,54418516,1590576171, 99),
face = 1:10)
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000))
myplot
Use the comma formatter from the scales package in scale_color_gradientn() by setting labels = comma, e.g.:
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000), labels = comma)
myplot
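The labels argument also accepts any function that turns the break values into strings, so if the thousands separators are not wanted, a small custom formatter is an option. This is only a sketch; the name `plain_number` is made up here and is not part of scales or ggplot2.

```r
# Hypothetical label function: all digits, no exponent, no thousands separator.
plain_number <- function(x) {
  format(x, scientific = FALSE, trim = TRUE)
}

plain_number(c(0, 6e9))
```

Passing labels = plain_number to scale_color_gradientn() in place of labels = comma should then give legend labels in the "6000000000" style mentioned in the question.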
Here is my dummy code:
set.seed(1)
df <- data.frame(xx=sample(10,6),
yy=sample(10,6),
type2=c('a','b','a','a','b','b'),
type3=c('A','C','B','A','B','C')
)
ggplot(data=df, mapping = aes(x=xx, y=yy)) +
geom_point(aes(shape=type3, fill=type2), size=5) +
scale_shape_manual(values=c(24,25,21)) +
scale_fill_manual(values=c('green', 'red'))
The resulting plot has a legend, but its 'type2' section doesn't reflect the fill colours of the scale - is that by design?
I know this is an old thread, but I ran into this exact problem and want to post this here for others like me. While the accepted answer works, the less risky, cleaner method is:
library(ggplot2)
ggplot(data=df, mapping = aes(x=xx, y=yy)) +
geom_point(aes(shape=type3, fill=type2), size=5) +
scale_shape_manual(values=c(24,25,21)) +
scale_fill_manual(values=c(a='green',b='red'))+
guides(fill=guide_legend(override.aes=list(shape=21)))
The key is to change the shape in the legend to one of those that can have a 'fill'.
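As a rough illustration of why this works: point shapes 21 to 25 have both a border (colour) and an interior (fill), while shapes 0 to 20 are drawn with colour only. As far as I can tell, the keys of the fill legend fall back to geom_point()'s default shape 19, which ignores fill, so they all come out black. The helper `uses_fill` is made up for this sketch.

```r
# Hypothetical helper: does a point shape respond to the fill aesthetic?
# Shapes 21 to 25 have a border (colour) and an interior (fill);
# shapes 0 to 20 are drawn with colour only.
uses_fill <- function(shape) shape %in% 21:25

uses_fill(c(24, 25, 21))  # the shapes set in scale_shape_manual()
uses_fill(19)             # geom_point()'s default shape
```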
Here's a different workaround.
library(ggplot2)
ggplot(data=df, mapping = aes(x=xx, y=yy)) +
geom_point(aes(shape=type3, fill=type2), size=5) +
scale_shape_manual(values=c(24,25,21)) +
scale_fill_manual(values=c(a='green',b='red'))+
guides(fill=guide_legend(override.aes=list(colour=c(a="green",b="red"))))
Using guide_legend(...) with override.aes is a way to influence the appearance of the guide (the legend). The hack is that here we are "overriding" the fill colors in the guide with the colors they should have had in the first place.
I played with the data and came up with this idea. I first assigned shape in the first geom_point. Then, I made the shapes empty. In this way, outlines stayed in black colour. Third, I manually assigned specific shape. Finally, I filled in the symbols.
ggplot(data=df, aes(x=xx, y=yy)) +
geom_point(aes(shape = type3), size = 5.1) + # Plot with three types of shape first
scale_shape(solid = FALSE) + # Make the shapes empty
scale_shape_manual(values=c(24,25,21)) + # Assign specific types of shape
geom_point(aes(color = type2, fill = type2, shape = type3), size = 4.5)
I'm not sure if what you want looks like this?
ggplot(df,aes(x=xx,y=yy))+
geom_point(aes(shape=type3,color=type2,fill=type2),size=5)+
scale_shape_manual(values=c(24,25,21))
How does one distinguish 4 different factors (not using size)? Is it possible to use hollow and solid points to distinguish a variable in ggplot2?
test=data.frame(x=runif(12,0,1),
y=runif(12,0,1),
siteloc=as.factor(c('a','b','a','b','a','b','a','b','a','b','a','b')),
modeltype=as.factor(c('q','r','s','q','r','s','q','r','s','q','r','s')),
mth=c('Mar','Apr','May','Mar','Apr','May','Mar','Apr','May','Mar','Apr','May'),
yr=c(2010,2011,2010,2011,2010,2011,2010,2011,2010,2011,2010,2011))
where x are observations and y are modeling results and I want to compare different model versions across several factors. Thanks!
I think it is very difficult to visually distinguish or compare x and y values according to 4 factors. I would use faceting and reduce the number of factors, for example with interaction().
Here is an example using geom_bar:
set.seed(10)
library(reshape2)
test.m <- melt(test,measure.vars=c('x','y'))
ggplot(test.m)+
geom_bar(aes(x=interaction(yr,mth),y=value,
fill=variable),stat='identity',position='dodge')+
facet_grid(modeltype~siteloc)
I really like using interaction by agstudy - I would probably try this first. But if keeping things unchanged then:
4 factors could be accommodated with faceting and 2 axes. Then there are 2 metrics, x and y: one option is a bubble chart with the two metrics distinguished by color, by shape, or by both (jitter added to reduce overlap between shapes):
library(reshape2)  # provides melt()
testm = melt(test, id=c('siteloc', 'modeltype', 'mth', 'yr'))
# by color
ggplot(testm, aes(x=siteloc, y=modeltype, size=value, colour=variable)) +
geom_point(shape=21, position="jitter") +
facet_grid(mth~yr) +
scale_size_area(max_size=40) +
scale_shape(solid=FALSE) +
theme_bw()
#by shape
testm$shape = as.factor(with(testm, ifelse(variable=='x', 21, 25)))
ggplot(testm, aes(x=siteloc, y=modeltype, size=value, shape=shape)) +
geom_point(position="jitter") +
facet_grid(mth~yr) +
scale_size_area(max_size=40) +
scale_shape(solid=FALSE) +
theme_bw()
# by shape and color
ggplot(testm, aes(x=siteloc, y=modeltype, size=value, colour=variable, shape=shape)) +
geom_point(position="jitter") +
facet_grid(mth~yr) +
scale_size_area(max_size=40) +
scale_shape(solid=FALSE) +
theme_bw()
UPDATE:
This is an attempt, based on the 1st comment by Dominik, to show whether (x,y) is above or below the 1:1 line and how big the ratio x/y or y/x is. A blue triangle means x/y>1, a red circle otherwise (no need for melt in this case):
test$shape = as.factor(with(test, ifelse(x/y>1, 25, 21)))
test$ratio = with(test, ifelse(x/y>1, x/y, y/x))
ggplot(test, aes(x=siteloc, y=modeltype, size=ratio, colour=shape, shape=shape)) +
geom_point() +
facet_grid(mth~yr) +
scale_size_area(max_size=40) +
scale_shape(solid=FALSE) +
theme_bw()
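A side note on the ratio used for the point size: it is always at least 1, since whichever of x/y and y/x is larger gets picked. Assuming x and y are positive (as with the runif() data here), pmax() gives the same values; a quick check with made-up data:

```r
set.seed(10)
x <- runif(12, 0, 1)
y <- runif(12, 0, 1)

ratio_ifelse <- ifelse(x / y > 1, x / y, y / x)  # as written in the answer
ratio_pmax   <- pmax(x / y, y / x)               # same values, assuming x, y > 0

all.equal(ratio_ifelse, ratio_pmax)
all(ratio_pmax >= 1)
```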
You can use hollow and solid points, but only with certain shapes as described in this answer.
So, that leaves you with fill, colour, shape, and alpha as your aesthetic mappings. It looks ugly, but here it is:
ggplot(test, aes(x, y,
fill=modeltype,
shape=siteloc,
colour=mth,
alpha=factor(yr)
)) +
geom_point(size = 4) +
scale_shape_manual(values=21:25) +
scale_alpha_manual(values=c(0.35,1))
Ugly, but I guess it is what you asked for. (I haven't bothered to figure out what is happening with the legend -- it obviously isn't displaying the borders right.)
If you want to map a variable to a kind of custom aesthetic (hollow and solid), you'll have to go a little further:
test$fill.type<-ifelse(test$yr==2010,'other',as.character(test$mth))
cols<-c('red','green','blue')
ggplot(test, aes(x, y,
shape=modeltype,
alpha=siteloc,
colour=mth,
fill=fill.type
)) +
geom_point(size = 10) +
scale_shape_manual(values=21:25) +
scale_alpha_manual(values=c(1,0.5)) +
scale_colour_manual(values=cols) +
scale_fill_manual(values=c(cols,NA))
Still ugly, but it works. I don't know a cleaner way of mapping yr to one colour if it is 2010 and to the mth colour if not; I'd be happy if someone showed me a cleaner way to do that. And now the guide (legend) is totally wrong, but you can fix that manually.
I have the following plot but do not want the legend for point size to show. Also how can I change the title for the factor(grp)? Sorry I know this should be an easy one but I am stuck.
df1<-data.frame(x=c(3,4,5),y=c(15,20,25),grp=c(1,2,2))
p<-ggplot(df1,aes(x,y))
p<-p+ geom_point(aes(colour=factor(grp),size=4))
p
df2<-data.frame(x=c(3.5,4.5,5.5),y=c(15.5,20.5,25.5))
p<-p + geom_path(data=df2,aes(x=x,y=y))
p
To change the legend title, it's easier (I find) to just change the column name in the data frame:
df1$grp = factor(df1$grp)
colnames(df1)[3] = "Group"
The reason size appears in the legend is that you have made it an aesthetic, which it shouldn't be. An aesthetic is something that varies with the data. Here size is fixed, so it goes outside aes():
p = ggplot(df1,aes(x,y))
p = p+ geom_point(aes(colour=Group), size=4)
You can also change the name of the legend in ggplot itself:
p = p + scale_colour_discrete(name="Group")
Leave the size out of the aesthetics.
ggplot(df1,aes(x,y)) + geom_point(aes(colour = factor(grp)), size=4) +
scale_colour_discrete(name = "Grp")