Adding shaded target region to ggplot2 barchart - r

I have two data frames: one I am using to create the bars in a barchart and a second that I am using to create a shaded "target region" behind the bars using geom_rect.
Here is example data:
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
target.data <- data.frame(crop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
I start with the means of test.data for the bars and means of target.data for the line in the target region:
library(ggplot2)
a <- ggplot(test.data, aes(y=mean, x=crop)) + geom_hline(aes(yintercept = mean, color = crop), target.data) + geom_bar(stat="identity")
a
So far so good, but then when I try to add a shaded region to display the min-max range of target.data, there is an issue. The shaded region appears just fine, but somehow, the crops from target.data are getting added to the x-axis. I'm not sure why this is happening.
b <- a + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=min, ymax=max, fill = crop), data = target.data, alpha = 0.5)
b
How can I add the geom_rect shapes without adding those extra names to the x-axis of the bar-chart?

This is a solution to your question, but I'd like to better understand you problem because we might be able to make a more interpretable plot. All you have to do is add aes(x = NULL) to your geom_rect() call. I took the liberty to change the variable 'crop' in add.data to 'brop' to minimize any confusion.
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
add.data <- data.frame(brop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
ggplot(test.data, aes(y=mean, x=crop)) +
geom_hline(data = add.data, aes(yintercept = mean, color = brop)) +
geom_bar(stat="identity") +
geom_rect(data = add.data, aes(xmin=-Inf, xmax=Inf, x = NULL, ymin=min, ymax=max, fill = brop),
alpha = 0.5, show.legend = F)
In ggplot calls all of the aesthetics or aes() are inherited from the intial call:
ggplot(data, aes(x=foo, y=bar)).
That means that regardless of what layers I add on geom_rect(), geom_hline(), etc. ggplot is looking for 'foo' to assign to x and 'bar' to assign to y, unless you specifically tell it otherwise. So like aeosmith pointed out you can clear all inherited aethesitcs for a layer with inherit.aes = FALSE, or you can knock out single variables at a time by reassigning them as NULL.

Related

How to change the legend from geom_area to color in geom_line [duplicate]

I have a graph of wind speeds against direction which has a huge numeber of points, and so am using alpha=I(1/20) in addition to color=month
Here is a sample of code:
library(RMySQL)
library(ggplot2)
con <- dbConnect(...)
wind <- dbGetQuery(con, "SELECT speed_w/speed_e AS ratio, dir_58 as dir, MONTHNAME(timestamp) AS month, ROUND((speed_w+speed_e)/2) AS speed FROM tablename;");
png("ratio-by-speed.png",height=400,width=1200)
qplot(wind$dir,wind$ratio,ylim=c(0.5,1.5),xlim=c(0,360),color=wind$month,alpha=I(1/30),main="West/East against direction")
dev.off()
This produces a decent graph, however my issue is that the alpha of the legend is 1/30th also, which makes it unreadable. Is there a way I can force the legend to be 1 alpha instead?
Here is an example:
Update With the release of version 0.9.0, one can now override aesthetic values in the legend using override.aes in the guides function. So if you add something like this to your plot:
+ guides(colour = guide_legend(override.aes = list(alpha = 1)))
that should do it.
I've gotten around this by doing a duplicate call to the geom using an empty subset of the data and using the legend from that call. Unfortunately, it doesn't work if the data frame is actually empty (e.g. as you'd get from subset(diamonds,FALSE)) since ggplot2 seems to treat this case the same as it treats NULL in place of a data frame. But we can get the same effect by taking a subset with only one row and setting it to NaN on one of the plot dimensions, which will prevent it from getting plotted.
Based off Chase's example:
# Alpha parameter washes out legend:
gp <- ggplot() + geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1)
print(gp)
# Full color legend:
dummyData <- diamonds[1, ]
dummyData$price <- NaN
#dummyData <- subset(diamonds, FALSE) # this would be nicer but it doesn't work!
gp <- ggplot() +
geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1, legend=FALSE) +
geom_point(data=dummyData, aes(depth, price, colour=clarity), alpha=1.0, na.rm=TRUE)
print(gp)
A bit of googling turned up this post which doesn't seem to indicate that ggplot currently supports this option. Others have addressed related problems by using gridExtra and using viewPorts as discussed here.
I'm not that sophisticated, but here's one approach that should give you the desired results. The approach is to plot the geom twice, once without an alpha parameter and outside of the real plotting area. The second geom will include the alpha parameter and suppress the legend. We will then specify the plotting region with xlim and ylim. Given that you are a lot of points, this will roughly double the plotting time, but should give you the effect you are after.
Using the diamonds dataset:
#Alpha parameter washes out legend
ggplot(data = diamonds, aes(depth, price, colour = clarity)) +
geom_point(alpha = 1/10)
#Fully colored legend
ggplot() +
geom_point(data = diamonds, aes(depth, price, colour =clarity), alpha = 1/10, legend = FALSE) +
geom_point(data = diamonds, aes(x = depth - 999999, y = price - 999999, colour = clarity)) +
xlim(40, 80) + ylim(0, 20000)

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

Legend type when plotting different geometries [duplicate]

Let's say I don't need a 'proper' variable mapping but still would like to have legend keys to help the chart understanding. My actual data are similar to the following df
df <- data.frame(id = 1:10, line = rnorm(10), points = rnorm(10))
library(ggplot2)
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, colour = "points"))
Basically, I would like the legend key relative to points to be.. just a point, without the line in the middle. I got close to that with this:
library(reshape2)
df <- melt(df, id.vars="id")
ggplot() +
geom_point(aes(id, value, shape = variable), df[df$variable=="points",]) +
geom_line(aes(id, value, colour = variable), df[df$variable=="line",])
but it defines two separate legends. Fixing the second code (and having to reshape my data) would be fine too, but I'd prefer a way (if any) to manually change any legend key (and keep using the first approch). Thanks!
EDIT :
thanks #alexwhan you refreshed my memory about variable mapping. However, the easiest way I've got so far is still the following (very bad hack!):
df <- data.frame(id = 1:10, line = rnorm(10), points = rnorm(10))
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, shape = "points")) +
theme(legend.title=element_blank())
which is just hiding the title of the two different legends.
Other ideas more than welcome!!!
You can use override.aes= inside guides() function to change default appearance of legend. In this case your guide is color= and then you should set shape=c(NA,16) to remove shape for line and then linetype=c(1,0) to remove line from point.
ggplot(df) +
geom_line(aes(id, line, colour = "line")) +
geom_point(aes(id, points, colour = "points"))+
guides(color=guide_legend(override.aes=list(shape=c(NA,16),linetype=c(1,0))))
I am not aware of any way to do this easily, but you can do a hack version like this (using your melted dataframe):
p <- ggplot(df.m, aes(id, value)) +
geom_line(aes(colour = variable, linetype = variable)) + scale_linetype_manual(values = c(1,0)) +
geom_point(aes(colour = variable, alpha = variable)) + scale_alpha_manual(values = c(0,1))
The key is that you need to get the mapping right to have it displayed correctly in the legend. In this case, getting it 'right', means fooling it to look the way you want it to. It's probably worth pointing out this only works because you can set linetype to blank (0) and then use the alpha scale for the points. You can't use alpha for both, because it will only take one scale.

ggplot2 add manual bar to existing plot

I'm trying to add a single, manual bar to the existing area (ribbon) plot. Ideally I just wanted to specify the x (position) and y (value) for the bar.
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),PU=c(10,20,30,40,50,60,70,80,90,100))
MyPlot <- ggplot(ExampleData,aes(x=myID))
MyPlot <- MyPlot + geom_ribbon(aes(ymin=0, ymax=PU), fill="lightgray", color="darkgray", size=1)
MyPlot <- MyPlot + geom_col(aes(x=4,y=40), color="red", linetype="solid", size=1)
MyPlot
It is almost working, but for some reason the value of 40 is becoming 400, and ideally I should be able to specify the width of the bar (should be half of what we see below).
Thank you for any help!
Maybe something more like this?
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),
PU=c(10,20,30,40,50,60,70,80,90,100))
bar <- data.frame(xmin = 4,xmax= 4.5,ymin = 0,ymax = 40)
ggplot() +
geom_ribbon(data = ExampleData,
aes(x = myID,ymin=0, ymax=PU),
fill="lightgray",
color="darkgray", size=1) +
geom_rect(data = bar,
aes(xmin = xmin,xmax = xmax,ymin = ymin,ymax = ymax),
color = "red")
The 40 vs 400 issue you mention happens when you specify a data frame at the top ggplot() level and then try to add layers where all the aesthetics are intended to be "set" rather than "mapped". The most common case when this happens is when people are adding text labels and you end up with many many copies of each text label plotted on top of each other.
In this case, ggplot is trying to interpret the x and y values you give geom_col in the context of ExampleData, and so ends up repeating those single values 10 times and stacking the resulting bars.

Add a box for the NA values to the ggplot legend for a continuous map

I have got a map with a legend gradient and I would like to add a box for the NA values. My question is really similar to this one and this one. Also I have read this topic, but I can't find a "nice" solution somewhere or maybe there isn't any?
Here is an reproducible example:
library(ggplot2)
map <- map_data("world")
map$value <- setNames(sample(-50:50, length(unique(map$region)), TRUE),
unique(map$region))[map$region]
map[map$region == "Russia", "value"] <- NA
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black")
So I would like to add a black box for the NA value for Russia. I know, I can replace the NA's by a number, so it will appear in the gradient and I think, I can write a workaround like the following, but all this workarounds do not seem like a pretty solution for me and also I would like to avoid "senseless" warnings:
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black") +
geom_point(aes(x = -100, y = -50, size = "NA"), shape = NA, colour = "black") +
guides(size = guide_legend("NA", override.aes = list(shape = 15, size = 10)))
Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 1 rows containing missing values (geom_point).
One approach is to split your value variable into a discrete scale. I have done this using cut(). You can then use a discrete color scale where "NA" is one of the distinct colors labels. I have used scale_fill_brewer(), but there are other ways to do this.
map$discrete_value = cut(map$value, breaks=seq(from=-50, to=50, length.out=8))
p = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=discrete_value)) +
scale_fill_brewer(palette="RdYlBu", na.value="black") +
coord_quickmap()
ggsave("map.png", plot=p, width=10, height=5, dpi=150)
Another solution
Because the original poster said they need to retain the color gradient scale and the colorbar-style legend, I am posting another possible solution. It has 3 components:
We need to trick ggplot into drawing a separate color scale by using aes() to map something to color. I mapped a column of empty strings using aes(colour="").
To ensure that we do not draw a colored boundary around each polygon, I specified a manual color scale with a single possible value, NA.
Finally, guides() along with override.aes is used to ensure the new color legend is drawn as the correct color.
p2 = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=value, colour="")) +
scale_fill_gradient2(low="brown3", mid="cornsilk1", high="turquoise4",
limits=c(-50, 50), na.value="black") +
scale_colour_manual(values=NA) +
guides(colour=guide_legend("No data", override.aes=list(colour="black")))
ggsave("map2.png", plot=p2, width=10, height=5, dpi=150)
It's possible, but I did it years ago. You can't use guides. You have to set individually the continuous scale for the values as well as the discrete scale for the NAs. This is what the error is telling you and this is how ggplot2 works. Did you try using both scale_continuous and scale_discrete since your set up is rather awkward, instead of simply using guides which is basically used for simple plot designs?

Resources