I can't seem to fill a boxplot by a continuous value using ColorBrewer, and I suspect it's just a simple syntax swap somewhere, since I can get the outlines of the boxes to change based on continuous values. Here's the data I'm working with:
data <- data.frame(
value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE)
)
I want to make a paneled boxplot, ordered by median "value", with the depth of the fill color of each box increasing with "value" (I know this is redundant, but bear with me for the sake of the example).
(Ordering the data):
orderindex <- order(as.numeric(by(data$value, data$animals, median)))
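# factor() is needed because data.frame() keeps strings as character (no levels) since R 4.0.0
data$animals <- ordered(data$animals, levels=levels(factor(data$animals))[orderindex])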
If I create the boxplot with panels, I can adjust the color of the outlines:
library(ggplot2)
first <- qplot(animals, value, data = data, colour=animals)
second <- first + geom_boxplot() + facet_grid(~region)
third <- second + scale_colour_brewer()
print(third)
But I want to do to the fill of each box what I did to the outlines (so each box gets darker as "value" increases). I thought it might be a matter of putting the scale_colour_brewer() call inside the aesthetic mapping of geom_boxplot, i.e.
second <- first + geom_boxplot(aes(scale_colour_brewer())) + facet_grid(~region)
but that doesn't do the trick. I assume it's a matter of where scale_colour_brewer() goes; I just don't know where!
(There is a similar question, Add color to boxplot - "Continuous value supplied to discrete scale" error, but it's not quite what I'm looking for: there the box colors don't increase along a gradient of some continuous value; the values are basically factors. Also, the geom_boxplot example on the ggplot2 site, which uses the mtcars dataset,
http://docs.ggplot2.org/0.9.3.1/geom_boxplot.html
doesn't seem to work when I set "fill" to "value"; I get the error:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
)
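To fill the boxes rather than color their outlines, use fill=animals instead of colour=animals, and in the same way replace scale_colour_brewer() with scale_fill_brewer(). Because animals is already ordered by median value, the default sequential palette ("Blues") makes the boxes darker from the lowest-median animal to the highest: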
qplot(animals, value, data = data, fill=animals)+
geom_boxplot() + facet_grid(~region) + scale_fill_brewer()
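The answer above fills by the discrete animals factor. To fill each box by an actual continuous number, one option is to compute a per-box summary and pair it with scale_fill_distiller(), the continuous counterpart of scale_fill_brewer(). The following is a minimal sketch, not the original answer: box_median is a hypothetical helper column, the ordering uses base R's reorder() instead of the by()/order() steps in the question, and ggplot() is used because qplot() is deprecated in recent ggplot2. The fill value must be constant within each box, otherwise the boxplot statistic cannot carry it through.

```r
library(ggplot2)

set.seed(1)  # reproducible sample
data <- data.frame(
  value   = sample(1:50),
  animals = sample(c("cat", "dog", "zebra"), 50, replace = TRUE),
  region  = sample(c("forest", "desert", "tundra"), 50, replace = TRUE)
)

# Order the x-axis levels by each animal's overall median value
data$animals <- reorder(factor(data$animals), data$value, FUN = median)

# Hypothetical helper column: median of each animal x region box.
# It is constant within a box, so the boxplot stat keeps it as the fill value.
data$box_median <- ave(data$value, data$animals, data$region, FUN = median)

p <- ggplot(data, aes(animals, value, fill = box_median)) +
  geom_boxplot() +
  facet_grid(~region) +
  # direction = 1 keeps the Blues palette order: light = low, dark = high
  scale_fill_distiller(palette = "Blues", direction = 1)

print(p)
```

scale_fill_distiller() interpolates a ColorBrewer palette over a continuous range; its default direction is reversed, hence direction = 1 for "darker means larger".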
Related
I have used the following code to generate a plot with ggplot:
I want the legend to show runs 1-8 and only the volumes 12.5 and 25. Why doesn't it show them?
And is it possible to show all the points in the plot even though they overlap? Right now the plot shows only 4 of the 8 points because of the overlap.
You've already been given part of your answer. Here's a solution that takes your additional comment into account, with some explanation.
For reference, you were looking to:
Change a continuous variable to a discrete/discontinuous one and have that reflected in the legend.
Show runs 1-8 labeled in the legend
Disconnect lines based on some criteria in your dataset.
First, I'm representing your data here again in a way that is reproducible (and takes away the extra characters so you can follow along directly with all the code):
library(ggplot2)
mydata <- data.frame(
`Run`=c(1:8),
"Time"=c(834, 834, 584, 584, 1184, 1184, 938, 938),
`Area`=c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
`Volume`=c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)
Changing to a Discrete Variable
If you check the type of each column (run str(mydata)), you'll see that mydata$Run is an int and the rest of the columns are num. Every column is numeric, so ggplot2 treats it as a continuous variable: values between the observed ones are assumed to be meaningful, and any legend is drawn to cover that whole range. That is why you get a continuous color scale instead of a discrete one.
To force ggplot2 to use a discrete scale, make the variable discrete by converting it to a factor. You can either convert it before plotting (e.g. mydata$Run <- as.factor(mydata$Run)) or convert it inline, writing aes(size = factor(Run), ...) instead of aes(size = Run, ...).
Converting inline changes the variable's name in the legend to the expression itself (for example "factor(Run)"), so you also have to set the legend title in labs(). Below, Volume is converted inline with as.factor(Volume), so its legend title is reset with color='Volume'. In the end, the plot code looks like this:
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line() +
labs(
x = "Area", y = "Time",
# This has to be changed now
color='Volume'
) +
theme_bw()
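As an alternative to converting inline, the column can be converted once before plotting. The aesthetic then refers to a plain column name, so the legend title is already "Volume" and no labs() override is needed. A sketch using the reproducible mydata from above:

```r
library(ggplot2)

mydata <- data.frame(
  Run    = 1:8,
  Time   = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area   = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
  Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)

# Convert once, before plotting: Volume becomes discrete with two levels,
# and the legend title stays "Volume"
mydata$Volume <- factor(mydata$Volume)

p <- ggplot(data = mydata, aes(x = Area, y = Time)) +
  geom_point(aes(color = Volume, size = Run)) +
  geom_line() +
  labs(x = "Area", y = "Time") +
  theme_bw()
```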
Note that the code above refers to Run, not mydata$Run. When using ggplot2, refer to columns by name only: mydata$Run inside aes() often appears to work, but it bypasses the data argument and can break, for example when faceting or when a layer uses different data.
Disconnect Lines
Your lines are connected across all the data because geom_line() gets no information other than the x= and y= aesthetics. If you want separate lines, just as with separate colors or shapes of points, you need to supply an aesthetic to split them on. The two lines differ by Volume in your dataset, so that's the variable to use, while keeping the same color for both. For this we use the group= aesthetic, which tells ggplot2 to draw a separate line for each group of data defined by that aesthetic.
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
theme_bw()
Show Runs 1-8 Labeled in Legend
Here I'm reading a bit into what exactly you meant by "showing runs 1-8" in the legend. It could mean one of two things; I'll assume you want both and show you how to do each.
Listing and showing sizes 1-8 in the legend.
To set the values shown in the scale (legend) for size, use the scale_ function for that aesthetic. Recall that since mydata$Run is an int, it is treated as continuous. A size scale can't be drawn as a continuous bar, so its legend shows points at a few break values. That means we don't need to convert Run to a factor; we only need to tell ggplot2 to show a break at every value from 1 to 8, which you can do with scale_size_continuous(breaks=...).
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_point(aes(color =as.factor(Volume), size = Run)) +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", color='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
Showing all of your runs as points.
The note about showing all runs might also mean you want to literally see each run represented as a discrete point in your plot. For this... well, they already are! ggplot2 is plotting each of your points from your data into the chart. Since some points share the same values of x= and y=, you are getting overplotting - the points are drawn over top of one another.
If you want to visually see each point represented here, one option could be to use geom_jitter() instead of geom_point(). It's not really great here, because it will look like your data has different x and y values, but it is an option if this is what you want to do. Note in the code below I'm also changing the shape of the point to be a hollow circle for better clarity, where the color= is the line around each point (here it's black), and the fill= aesthetic is instead used for Volume. You should get the idea though.
set.seed(1234) # using the same randomization seed ensures you have the same jitter
ggplot(data = mydata, aes(x=Area, y=Time)) +
geom_jitter(aes(fill =as.factor(Volume), size = Run), shape=21, color='black') +
geom_line(aes(group=as.factor(Volume))) +
labs(
x = "Area", y = "Time", fill='Volume'
) +
scale_size_continuous(breaks=c(1:8)) +
theme_bw()
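Instead of calling set.seed() before plotting, the jitter can be made reproducible through the seed argument of position_jitter(), used with geom_point() (geom_jitter() itself has no seed argument). A sketch assuming ggplot2 >= 3.0.0; the width and height values are illustrative amounts in data units, not tuned for this dataset:

```r
library(ggplot2)

mydata <- data.frame(
  Run    = 1:8,
  Time   = c(834, 834, 584, 584, 1184, 1184, 938, 938),
  Area   = c(55.308, 55.308, 79.847, 79.847, 81.236, 81.236, 96.842, 96.842),
  Volume = c(12.5, 12.5, 12.5, 12.5, 25.0, 25.0, 25.0, 25.0)
)

# The seed is stored in the position object, so every rendering of the plot
# jitters the points identically; width/height are illustrative values only
jit <- position_jitter(width = 2, height = 20, seed = 1234)

p <- ggplot(data = mydata, aes(x = Area, y = Time)) +
  geom_point(aes(fill = as.factor(Volume), size = Run),
             shape = 21, color = "black", position = jit) +
  geom_line(aes(group = as.factor(Volume))) +
  labs(x = "Area", y = "Time", fill = "Volume") +
  scale_size_continuous(breaks = 1:8) +
  theme_bw()
```

As with geom_jitter(), the lines still pass through the original, un-jittered positions.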
I would like to plot some horizontal lines onto a scatterplot (e.g. with geom_hline) and then put some error ribbons around those lines that have different widths for each line.
I have a data frame consisting of a continuous x and y and grouping factor:
#make the dataframe:
so<-data.frame(expand.grid(x=c(1:5),sys=c("a","b","c","d")))
so$y<-c(1,2,1,3,2,2,1,3,2,3,4,3,2,3,4,5,4,3,4,5)
And a second dataframe with information for some hlines and error ribbons that I would like to add to the plot:
#make the second dataframe:
so2<-data.frame(sys=c("a","b","c","d"),yint=c(1.4,2.3,3.5,4.6),low=c(1.2,2.1,3.4,4.1),
upp=c(1.6,2.7,3.6,4.7))
I can create a plot with the hlines:
ggplot(so,aes(x=x,y=y,colour=sys)) +
geom_point(position=position_jitter()) +
geom_hline(data=so2,aes(yintercept=yint,colour=sys))
But if I try to put the ribbons around them, geom_ribbon() fails because so2 has no x values:
ggplot(so,aes(x=x,y=y,colour=sys)) +
geom_point(position=position_jitter()) +
geom_hline(data=so2,aes(yintercept=yint,colour=sys))+
geom_ribbon(data=so2,aes(ymin=low,ymax=upp))
#Error in FUN(X[[i]], ...) : object 'x' not found
Is it possible to get geom_ribbon to act like geom_hline? Or is there a workaround of e.g. plotting the upper and lower bounds as hlines and somehow shading between them?
I'm not sure I understand what you're trying to achieve, but if you use geom_rect() instead of geom_ribbon() you can indicate the upper/lower bounds, e.g.
library(tidyverse)
so<-data.frame(expand.grid(x=c(1:5),sys=c("a","b","c","d")))
so$y<-c(1,2,1,3,2,2,1,3,2,3,4,3,2,3,4,5,4,3,4,5)
#make the second dataframe:
so2<-data.frame(sys=c("a","b","c","d"),yint=c(1.4,2.3,3.5,4.6),low=c(1.2,2.1,3.4,4.1),
upp=c(1.6,2.7,3.6,4.7))
ggplot(so,aes(x=x,y=y,colour=sys)) +
geom_point(position=position_jitter()) +
geom_hline(data=so2,aes(yintercept=yint, colour=sys)) +
geom_rect(data = so2, aes(ymin = low, ymax = upp,
xmin = 0.5, xmax = 5.5, fill=sys),
alpha = 0.2, inherit.aes = FALSE)
The issue with geom_ribbon() is that it draws a band whose bounds can change along x, whereas here each group has a single upper/lower bound for all values of x. So I don't know how to make it work with geom_ribbon() unless your actual data differs from this minimal reproducible example. Hopefully this helps and makes sense.
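One workaround that keeps geom_ribbon(), not taken from the answer above, is to give each group's constant bounds explicit x values: repeat every row of so2 at the smallest and largest x of the data. merge() with no shared columns returns every combination of rows, which does exactly that; so2_x is a hypothetical name for the expanded data.

```r
library(ggplot2)

so <- data.frame(expand.grid(x = 1:5, sys = c("a", "b", "c", "d")))
so$y <- c(1,2,1,3,2,2,1,3,2,3,4,3,2,3,4,5,4,3,4,5)
so2 <- data.frame(sys = c("a", "b", "c", "d"), yint = c(1.4, 2.3, 3.5, 4.6),
                  low = c(1.2, 2.1, 3.4, 4.1), upp = c(1.6, 2.7, 3.6, 4.7))

# Cross join: each group's bounds repeated at min(x) and max(x),
# which gives geom_ribbon() the x values it needs
so2_x <- merge(so2, data.frame(x = range(so$x)))

p <- ggplot(so, aes(x = x, y = y, colour = sys)) +
  geom_point(position = position_jitter()) +
  geom_hline(data = so2, aes(yintercept = yint, colour = sys)) +
  geom_ribbon(data = so2_x, aes(x = x, ymin = low, ymax = upp, fill = sys),
              alpha = 0.2, inherit.aes = FALSE)
```

These ribbons stop at the first and last x value. To stretch the bands across the whole panel, as geom_hline() does, the geom_rect() version above also accepts xmin = -Inf and xmax = Inf.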
I'm plotting a sort of choropleth of up to three selectable species abundances across a research area. This toy code behaves as expected and does almost what I want:
library(dplyr)
library(ggplot2)
square <- expand.grid(X=0:10, Y=0:10)
sq2 <- square[rep(row.names(square), 2),] %>%
arrange(X,Y) %>%
mutate(SPEC = rep(c('red','blue'),len=n())) %>%
mutate(POP = ifelse(SPEC %in% 'red', X, Y)) %>%
group_by(X,Y) %>%
mutate(CLR = rgb(X/10,0,Y/10)) %>% ungroup()
ggplot(sq2, aes(x=X, y=Y, fill=CLR)) + geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=c('red','blue'), breaks=c('#FF0000','#0000FF'))
Producing this:
A modified version properly plots the real map, appropriately mixing the RGBs to show the species proportions per map unit. But given that mixing, the real data does not necessarily include the specific values listed in breaks, in which case no entry appears in the legend for that species. If you change the last line of the example to
labels=c('red','blue','green'), breaks=c('#FF0000','#0000FF','#00FF00'))
you get the same legend as shown, with only 'red' and 'blue' displayed, as there is no green in it. Searching the data for each max(Species) and assigning those to the legend is possible but won't make good legend keys for species that only occur in low proportions. What's needed is for the legend to display the idea of the entities present, not their attested presences -- three colors in the legend even if only one species is detected.
I'd think that scale_fill_manual() or the override.aes argument might help me here but I haven't been able to make any combination work.
Edit: Episode IV -- A New Dead End
(Thanks @r2evans for fixing my omission of packages.)
I thought I might be able to trick the legend by mutating a further column into the df in the processing pipe called spCLR to represent the color ('#FF0000', e.g.) that codes each entry's species (redundant info, but fine). Now the plotting call in my real version goes:
df %>% [everything] %>%
ggplot(aes(x = X, y = Y, height = WIDTH, width = WIDTH, fill = CLR)) +
geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=spCODE, breaks=spCLR)
But this gives the error: Error in check_breaks_labels(breaks, labels) : object 'spCLR' not found. That seems weird since spCLR is indeed in the pipe-modified df, and of all the values supplied to the ggplot functions spCODE is the only one present in the original df -- so if there's some kind of scope problem I don't get it. [Re-edit -- I see that neither labels nor breaks wants to look at df$anything. Anyway.]
I assume (rightly?) there's some way to make this one work [?], but it still wouldn't make the legend show 'red', 'blue' and 'green' in my toy example -- which is what my original question is really about -- because there is still no actual green-data present in that. So to reiterate, isn't there any way to force a ggplot2 legend to show the things you want to talk about, rather than just the ones that are present in the data?
I have belatedly discovered that my question is a near-duplicate of this. The accepted answer there (from @joran) doesn't work for this but the second answer (from @Axeman) does. So the way for me to go here is that the last line should be
labels=c('red','blue','green'), limits=c('#FF0000','#0000FF','#00FF00'))
setting the limits argument instead of breaks, and now both my example and my real version work as desired.
I have to say I spent a lot of time digging around in the ggplot2 reference without ever gaining a suspicion that limits() was the correct alternative to breaks() -- which is explicitly mentioned in that ref page while limits() does not appear. The ?limits() page is quite uninformative, and I can't find anything that lays out the distinctions between the two: when this rather than that.
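The distinction can be seen by inspecting scale objects directly. In a discrete scale, limits is the set of possible values (by default derived from the data), and breaks selects which of those values get legend keys; a break that is not among the limits is dropped. That is why '#00FF00' vanished when it was listed only in breaks. The sketch below uses scale_fill_manual() instead of the identity scale for simplicity, and calls the scale's ggproto methods train(), get_limits() and get_breaks(), which belong to ggplot2's extension internals rather than the everyday user API, so treat it as illustrative only.

```r
library(ggplot2)

cols <- c(red = "#FF0000", blue = "#0000FF", green = "#00FF00")

# 1) breaks only: limits come from the (simulated) data, which has no "green"
sc_breaks <- scale_fill_manual(values = cols, breaks = names(cols))
sc_breaks$train(c("red", "blue"))   # values "present in the data"
lim_b <- sc_breaks$get_limits()     # domain: red and blue only
brk_b <- sc_breaks$get_breaks()     # the "green" break lies outside it and is dropped

# 2) limits: "green" is declared part of the domain even though absent
sc_limits <- scale_fill_manual(values = cols, limits = names(cols))
sc_limits$train(c("red", "blue"))
lim_l <- sc_limits$get_limits()     # red, blue and green
brk_l <- sc_limits$get_breaks()     # breaks default to the limits

print(list(breaks_only = as.vector(brk_b), with_limits = as.vector(brk_l)))
```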
I assume from the heatmap use case that you have no other need for colour mapping in the chart. In that case, a possible workaround is to leave the fill scale alone and instead create an invisible geom layer with a colour aesthetic mapping to generate the desired legend:
ggplot(sq2, aes(x=X, y=Y)) +
geom_tile(aes(fill = CLR)) + # move fill mapping here so new point layer doesn't inherit it
scale_fill_identity() + # scale_*_identity has guide = "none" by default
# add an invisible point layer with a colour (not fill) mapping, placed at x/y
# positions within the range already covered by the geom_tile layer above
geom_point(data = . %>%
slice(1:3) %>%
# optional: list colours in the desired label order
mutate(col = forcats::fct_inorder(c("red", "blue", "green"))),
aes(colour = col),
alpha = 0) +
# add colour scale with alpha set to 1 (overriding alpha = 0 above),
# also make the shape square & larger to mimic the default legend keys
# associated with fill scale
scale_color_manual(name = "Species",
values = c("red" = '#FF0000', "blue" = '#0000FF', "green" = '#00FF00'),
guide = guide_legend(override.aes = list(alpha = 1, shape = 15, size = 5)))
I want to create a histogram in R and ggplot2, in which the bins are filled based on their continuous x-value. Most tutorials only feature coloring by discrete values or density/count.
Following this example, I was able to color the bins with a rainbow scale:
df <- data.frame(x = runif(100))
ggplot(df) +
geom_histogram(aes(x), fill = rainbow(30))
Rainbow histogram
I want to use a color gradient in which the bins go from blue (lowest) to yellow (highest). The scale_fill_gradient() function seems to achieve that, yet when I insert it in place of rainbow() for the fill argument I receive an error:
ggplot(df) +
geom_histogram(aes(x), fill = scale_fill_gradient(low='blue', high='yellow'))
Error: Aesthetics must be either length 1 or the same as the data (30): fill
I tried several ways to supply a length of 30 for the scale, yet I get the same error every time. So my questions are:
Is scale_color_gradient the right function for the fill argument or do i have to use another one?
If it is the right function, how can I correctly supply the length?
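If you want different colors for each bin, map fill to the computed bin position inside aes() with fill = after_stat(x) (the older ..x.. notation also works but is deprecated in recent ggplot2). This is a necessary quirk of geom_histogram(): the fill has to refer to the bin midpoints computed by stat_bin(), not to the raw data. Note also that a scale is not a value for the fill argument; it is added to the plot with +. Using scale_fill_gradient() with your preferred colors then shades the bins from blue to yellow:
ggplot(df, aes(x, fill = after_stat(x))) +
geom_histogram() +
scale_fill_gradient(low='blue', high='yellow')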
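To confirm that the gradient follows the bin position rather than the count, the built layer can be inspected with layer_data(), which returns one row per bin with its computed x (bin centre), count and mapped fill colour. A sketch assuming ggplot2 >= 3.3.0 for after_stat(); bins = 30 is written out only to show that it is the default number of bins, which is why fill = rainbow(30) in the question happened to have the right length.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = runif(100))

p <- ggplot(df, aes(x, fill = after_stat(x))) +
  geom_histogram(bins = 30) +
  scale_fill_gradient(low = "blue", high = "yellow")

# One row per bin: computed bin centre (x), count and the mapped fill colour
bins <- layer_data(p)
head(bins[, c("x", "count", "fill")])
```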
I've made a plot using a data frame and ggplot. Here's the plot for example
I'll be using this in a presentation. In one slide, I'm going to talk about epsilon=0.1, and in the next I'll be talking about epsilon=0.5. My question is: How do I make one particular plot thicker? i.e. I wish to create a plot where the orange graph corresponding to epsilon=0.1 is thick (and thus highlighted), so the audience knows that is the graph I'm referring to.
What I would do is add an additional column to the data, thickness, and map it to the linewidth aesthetic of geom_line() (size in ggplot2 versions before 3.4.0). Assign a higher value in thickness where epsilon equals 0.1:
df$thickness = ifelse(df$epsilon == 0.1, 2, 1)
and use it in aes() of geom_line():
ggplot(df, aes(x, y, color = as.factor(epsilon))) +
geom_line(aes(linewidth = thickness)) + scale_linewidth_identity()
You can change the value in the ifelse() call to change which line gets highlighted. scale_linewidth_identity() prevents ggplot from rescaling the values, so the values in thickness are used as-is.
An example with the built-in dataset mtcars:
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_line(aes(linewidth = ifelse(cyl == 6, 2, 0.5))) +
scale_linewidth_identity()
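Another option, not part of the answer above, is to redraw only the highlighted series as an extra layer with a fixed linewidth set outside aes(). Because the thickness is not mapped, no identity scale is needed and no thickness legend can appear. A sketch with mtcars, assuming ggplot2 >= 3.4.0 where lines use linewidth:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_line() +                                 # all series at the default width
  geom_line(data = subset(mtcars, cyl == 6),    # redraw only the series to highlight
            linewidth = 2)                      # fixed value, outside aes()
```

The second layer inherits the color mapping, so the highlighted line keeps its series color; for the presentation, only the subset() condition changes between slides.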