Possible to force non-occurring elements to show in ggplot legend? - r

I'm plotting a sort of choropleth of up to three selectable species abundances across a research area. This toy code behaves as expected and does almost what I want:
library(dplyr)
library(ggplot2)
square <- expand.grid(X=0:10, Y=0:10)
sq2 <- square[rep(row.names(square), 2),] %>%
arrange(X,Y) %>%
mutate(SPEC = rep(c('red','blue'),len=n())) %>%
mutate(POP = ifelse(SPEC %in% 'red', X, Y)) %>%
group_by(X,Y) %>%
mutate(CLR = rgb(X/10,0,Y/10)) %>% ungroup()
ggplot(sq2, aes(x=X, y=Y, fill=CLR)) + geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=c('red','blue'), breaks=c('#FF0000','#0000FF'))
Producing this:
A modified version properly plots the real map, appropriately mixing the RGBs to show the species proportions per map unit. But given that mixing, the real data does not necessarily include the specific values listed in breaks, in which case no entry appears in the legend for that species. If you change the last line of the example to
labels=c('red','blue','green'), breaks=c('#FF0000','#0000FF','#00FF00'))
you get the same legend as shown, with only 'red' and 'blue' displayed, as there is no green in it. Searching the data for each max(Species) and assigning those to the legend is possible but won't make good legend keys for species that only occur in low proportions. What's needed is for the legend to display the idea of the entities present, not their attested presences -- three colors in the legend even if only one species is detected.
I'd think that scale_fill_manual() or the override.aes argument might help me here but I haven't been able to make any combination work.
Edit: Episode IV -- A New Dead End
(Thanks @r2evans for fixing my omission of packages.)
I thought I might be able to trick the legend by mutating a further column into the df in the processing pipe called spCLR to represent the color ('#FF0000', e.g.) that codes each entry's species (redundant info, but fine). Now the plotting call in my real version goes:
df %>% [everything] %>%
ggplot(aes(x = X, y = Y, height = WIDTH, width = WIDTH, fill = CLR)) +
geom_tile() +
scale_fill_identity("Species", guide="legend",
labels=spCODE, breaks=spCLR)
But this gives the error: Error in check_breaks_labels(breaks, labels) : object 'spCLR' not found. That seems weird since spCLR is indeed in the pipe-modified df, and of all the values supplied to the ggplot functions spCODE is the only one present in the original df -- so if there's some kind of scope problem I don't get it. [Re-edit -- I see that neither labels nor breaks wants to look at df$anything. Anyway.]
I assume (rightly?) there's some way to make this one work [?], but it still wouldn't make the legend show 'red', 'blue' and 'green' in my toy example -- which is what my original question is really about -- because there is still no actual green-data present in that. So to reiterate, isn't there any way to force a ggplot2 legend to show the things you want to talk about, rather than just the ones that are present in the data?

I have belatedly discovered that my question is a near-duplicate of this. The accepted answer there (from @joran) doesn't work for this but the second answer (from @Axeman) does. So the way for me to go here is that the last line should be
labels=c('red','blue','green'), limits=c('#FF0000','#0000FF','#00FF00'))
using the limits argument instead of breaks, and now my example and my real version work as desired.
I have to say I spent a lot of time digging around in the ggplot2 reference without ever suspecting that limits was the correct alternative to breaks. The latter is explicitly mentioned on that reference page, while limits does not appear. The ?limits page is quite uninformative, and I can't find anything that lays out the distinction between the two: when to use this rather than that.
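A minimal sketch of the difference, using the toy grid from the question. The object names sq, species_cols, p_breaks and p_limits are made up for illustration. As far as I understand it, on a discrete scale breaks only selects which of the values already inside the scale's limits get a legend key, and when limits is not supplied the limits are taken from the data, so a break that never occurs in the data is dropped. Supplying limits defines the set of values directly, and the legend keys follow from it.

```r
library(ggplot2)

# Toy grid as in the question: only red/blue mixtures, the green channel is always 0.
sq <- expand.grid(X = 0:10, Y = 0:10)
sq$CLR <- rgb(sq$X / 10, 0, sq$Y / 10)

# Hypothetical lookup of the species we want to talk about, present or not.
species_cols <- c(red = "#FF0000", blue = "#0000FF", green = "#00FF00")

# breaks: green is requested, but it is not in the data-derived limits.
p_breaks <- ggplot(sq, aes(X, Y, fill = CLR)) +
  geom_tile() +
  scale_fill_identity("Species", guide = "legend",
                      labels = names(species_cols),
                      breaks = unname(species_cols))

# limits: the scale is told explicitly which values exist.
p_limits <- ggplot(sq, aes(X, Y, fill = CLR)) +
  geom_tile() +
  scale_fill_identity("Species", guide = "legend",
                      labels = names(species_cols),
                      limits = unname(species_cols))
```

Printing p_breaks and p_limits side by side should reproduce the two behaviours described above: two legend keys for the first, three for the second.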

I assume from the heatmap use case that you have no other need for colour mapping in the chart. In this case, a possible workaround is to leave the fill scale alone, & create an invisible geom layer with colour aesthetic mapping to generate the desired legend instead:
ggplot(sq2, aes(x=X, y=Y)) +
geom_tile(aes(fill = CLR)) + # move fill mapping here so new point layer doesn't inherit it
scale_fill_identity() + # scale_*_identity has guide = "none" by default, so no fill legend is drawn
# add an invisible layer with a colour (not fill) mapping, with x/y coordinates inside
# the range of the geom_tile layer above
geom_point(data = . %>%
slice(1:3) %>%
# optional: list colours in the desired label order
mutate(col = forcats::fct_inorder(c("red", "blue", "green"))),
aes(colour = col),
alpha = 0) +
# add colour scale with alpha set to 1 (overriding alpha = 0 above),
# also make the shape square & larger to mimic the default legend keys
# associated with fill scale
scale_color_manual(name = "Species",
values = c("red" = '#FF0000', "blue" = '#0000FF', "green" = '#00FF00'),
guide = guide_legend(override.aes = list(alpha = 1, shape = 15, size = 5)))

Related

How can I automatically combine legends in ggplot when one categorical variable is a nested subset of the other?

I am working with a set of biological data, an example of which is as follows...
library(tidyverse)
data<-data.frame(
order=c("Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Perissodactyla","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Perissodactyla","Perissodactyla","Artiodactlya","Artiodactlya","Artiodactlya","Proboscidea","Perissodactyla","Perissodactyla","Perissodactyla","Perissodactyla","Perissodactyla","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Proboscidea","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya
","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Perissodactyla","Perissodactyla","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Perissodactyla","Perissodactyla","Perissodactyla","Perissodactyla","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya","Artiodactlya"),
family=c("Bovidae","Bovidae","Bovidae","Cervidae","Bovidae","Bovidae","Bovidae","Antilocapridae","Bovidae","Cervidae","Cervidae","Suidae","Bovidae","Cervidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Camelidae","Camelidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Cervidae","Cervidae","Bovidae","Bovidae","Bovidae","Bovidae","Tayassuidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Rhinocerotidae","Cervidae","Cervidae","Hippopotamidae","Bovidae","Bovidae","Cervidae","Bovidae","Bovidae","Rhinocerotidae","Rhinocerotidae","Bovidae","Cervidae","Cervidae","Elephantidae","Equidae","Equidae","Equidae","Equidae","Equidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Giraffidae","Bovidae","Cervidae","Hippopotamidae","Bovidae","Bovidae","Tragulidae","Suidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Camelidae","Camelidae","Bovidae","Elephantidae","Bovidae","Bovidae","Bovidae","Cervidae","Cervidae","Cervidae","Cervidae","Cervidae","Cervidae","Cervidae","Tragulidae","Moschidae","Moschidae","Cervidae","Cervidae","Cervidae","Cervidae","Cervidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Cervidae","Cervidae","Giraffidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Cervidae","Bovidae","Tayassuidae","Bovidae","Suidae","Suidae","Bovidae","Bovidae","Suidae","Suidae","Suidae","Bovidae","Bovidae","Bovidae","Bovidae","Cervidae","Cervidae","Cervidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Rhinocerotidae","Rhinocerotidae","Cervidae","Bovidae","Suidae","Suidae","Suidae","Suidae","Bovidae","Bovidae","Tapiridae","Tapiridae","Tapiridae","Tapiridae","Tayassuidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Bovidae","Tragulidae","Tragulidae","Tragulidae","Tragulidae","Camelidae","Camelidae"),
y=c(-0.001863125,-0.102166005,-0.082681787,-0.034819115,-0.203971372,-0.14608609,-0.124751937,-0.060228156,-0.26087436,-0.058897042,-0.021611439,0.055752278,0.122263006,-0.123637398,0.170119323,0.108469085,0.013105529,-0.182166486,0.157200336,0.215896371,-0.069659876,-0.035434215,0.019311512,-0.442534178,-0.135233526,-0.011888411,-0.029017369,-0.050831484,0.050087067,-0.201617248,-0.216700049,0.010216729,-0.098640743,0.021208928,-0.040445327,0.040405856,-0.17503882,0.035631826,-0.028174861,-0.073752856,-0.056991923,0.060354632,-0.086089782,0.383185174,-0.029138017,-0.185213812,0.314313318,0.033053219,0.056801385,-0.113646176,-0.03677128,-0.056918928,0.25985937,0.28245838,-0.236766645,-0.207875916,0.059422923,0.497636371,0.091523907,-0.148471043,-0.018225364,0.023796819,-0.110043566,-0.240739321,-0.12247444,-0.42435168,0.048939773,-0.240873039,-0.235269232,-0.186239865,-0.039528773,-0.101909681,-0.050572131,0.257243087,-0.160718975,0.036663509,-0.050196678,0.094949553,0.015759057,-0.07056069,-0.086559596,0.024333424,0.007231104,-0.274224886,-0.22015192,-0.211627074,0.426924042,-0.082525667,-0.131491001,-0.178848667,-0.003254554,-0.116941389,-0.09253145,-0.170562248,-0.0715427,-0.172290183,-0.071607145,-0.075225703,-0.234899977,-0.219243408,-0.125744609,-0.071939465,-0.143435232,-0.057530919,0.146632885,0.036838395,0.139369968,-0.03057342,-0.060313888,-0.143063257,-0.120948779,-0.106405272,-0.043558975,-0.143716061,-0.028329571,0.127768192,-0.043361589,-0.198177174,-0.07460359,-0.04461378,-0.050870971,-0.216857513,0.059916575,0.017476752,0.000504442,-0.154878275,-0.064869262,-0.173919684,-0.220541878,0.077789088,-0.289841358,0.107983011,0.136913671,-0.024349708,-0.143390158,0.055258952,0.172375097,0.162804834,-0.215683398,-0.225801926,0.043443631,-0.032301548,-0.117351289,-0.131936913,-0.046815394,-0.116160051,-0.069521163,0.054158881,-0.161774587,-0.14186065,0.323341205,0.295046188,0.034689389,-0.375590361,-0.032423553,0.107506313,0.088711276,0.12349364,-0.117709213,
0.26788402,-0.053017818,-0.007870963,0.073981956,-0.077337294,0.077832946,-0.231063286,-0.01175521,-0.100368044,0.089356041,-0.068279906,0.122043925,-0.116726111,-0.063823308,-5.16E-05,-0.174039832,-0.160276752,-0.090239835,-0.022758897,-0.41815848,-0.192715601),
x=c(3.208441356,3.124588445,3.281248903,3.412877815,3.067591548,3.192802934,3.087885763,3.115415346,3.12499302,3.14482989,3.027751994,3.006893708,3.397121498,3.207634367,3.430877175,3.394930445,3.434355962,3.438225808,3.247873551,3.438257472,3.127752516,3.334252642,3.458311439,3.479717375,3.133538908,3.06679081,2.99343623,3.224377652,3.198973072,3.058220344,3.068204421,3.035769836,3.22549666,3.197225406,2.938519725,3.007831012,2.99343623,2.925827575,2.974636761,2.931966115,2.880813592,3.105510185,2.929418926,3.532822299,3.331312058,3.136088637,3.212498338,3.241795431,3.268845404,3.159717546,3.232585053,3.133538908,3.394858641,3.446340342,2.95658859,3.007534418,3.236903523,3.53140227,3.124504225,3.413048169,3.332332398,3.316079605,3.355930187,3.092204872,2.988413942,3.086074954,2.888224438,3.023663918,2.981750673,3.049835132,3.468697676,3.190877725,3.191730393,3.492377901,3.384484836,3.27277331,2.885587356,3.211947744,3.296575356,3.151965256,3.191868021,3.155336037,3.125887718,3.216211914,3.252002216,3.124226996,3.517468145,2.737272177,2.792509648,2.729124275,3.019393264,3.02596091,2.942775895,2.971275849,3.004751156,3.01911629,2.949390007,2.725225362,2.894133222,2.963787827,3.020293816,2.981356442,2.924623828,2.858837851,2.982271233,2.988049072,3.031024499,3.105368235,3.12742737,3.145196406,2.669316881,2.645606462,2.759037976,3.140388312,3.068585203,3.26020285,3.108498601,2.92800816,3.292497894,3.220003426,3.295699017,2.983153657,3.344496133,3.195470679,3.125604258,3.132049171,3.095595491,3.034828916,3.106417565,2.932465493,3.068185862,3.098369548,3.079586399,2.584331224,2.806204493,2.709108767,3.082323374,3.036807434,3.093561758,3.022881813,3.075106513,3.176091259,2.812913357,2.882026208,3.217396199,2.896007841,2.890234666,3.088042782,3.061461418,3.116590166,3.536558443,3.501671858,3.181266849,3.171897335,3.151632475,3.070037867,3.148385521,3.161368002,2.938007311,3.377682241,3.301073423,3.362618171,3.231405869,3.332299129,3.006551609,3.028083007,3.148669341,
3.372912003,3.260071388,3.22816929,3.379686983,3.109281304,3.173037857,3.29666519,2.652971172,2.664936787,2.724531025,2.602059991,3.23192811,3.110408903))
I am trying to make a ggplot of my data and separate the points by shape and color for easier reading. In this case because the data are biological all of the categorical variables are nested, i.e. "Bovidae", "Camelidae", "Cervidae" are always "Artiodactyla", "Equidae" and "Rhinocerotidae" are always "Perissodactyla", etc.
However, this creates an issue when I try to plot the data in that I cannot easily get the different shapes and colors/fills to form a single legend. Normally if the shape and fills have the same level it is possible to make them form a single legend, but here even though shape is a nested subset of the values used to scale color I cannot easily combine them into a single legend, see below.
ggplot(data,aes(x,y))+
geom_point(aes(color=family,shape=order))+
labs(color="family",shape="family")
I know I can put a manual override for the legend using a command like the following:
scale_shape_manual(values=c(1,2,3),guide=FALSE)+
guides(shape=guide_legend(override.aes=list(shape=c(1,1,1,3,2...)))) ## i.e., code it whatever order of shapes desired
However, this requires me to manually type out the order of the various categories every time. That is very inefficient when I have a large number of categories, as can be seen in the above graph, and the order of the categories is not always the same across my graphs (because data is missing for some groups or only a certain subset is under consideration). It is also very easy to introduce an error if I slip up and do not enter the shapes in precisely the right order.
Is there any way to automatically change the shapes on the legend when one variable is a nested subset of the other like this? That is, every level of "family" is technically already present in "order", merely duplicated, such that it is possible to write case_when or if_else statements for them like so...
mutate(order=case_when(family=="Elephantidae"~"Proboscidea",
family %in% c("Rhinocerotidae","Tapiridae","Equidae")~"Perissodactyla",
TRUE~"Artiodactyla"))
Can that be used to automatically combine shape and color into a single legend?
You can create some small lookup tables to match family to orders and orders to shapes. Then, you can use this to override the default shapes in the colour legend. I recommend you explicitly set the breaks of the colour scale and values of the shape scale, because it is easy for mistakes to slip in.
library(tidyverse)
# data <- data.frame(...) # omitted for brevity
# Make lookup tables
fam2order <- setNames(data$order, data$family)
order2shape <- c("Artiodactlya" = 16, "Perissodactyla" = 17, "Proboscidea" = 15)
# Make fam2order unique without dropping names
fam2order <- fam2order[!duplicated(data[, c("order", "family")])]
ggplot(data, aes(x, y)) +
geom_point(aes(colour = family, shape = order)) +
scale_colour_discrete(
breaks = names(fam2order),
guide = guide_legend(
override.aes = list(shape = order2shape[fam2order])
)
) +
scale_shape_manual(values = order2shape, guide = "none")
In addition, because you've effectively made a shape palette already, you can use that lookup table directly in combination with scale_shape_identity().
ggplot(data, aes(x, y)) +
geom_point(aes(colour = family, shape = order2shape[order])) +
scale_colour_discrete(
breaks = names(fam2order),
guide = guide_legend(
override.aes = list(shape = order2shape[fam2order])
)
) +
scale_shape_identity()
Yet another option is to use the same breaks for a colour and shape scale, but simply have repeated values for manual shapes. This dispenses with the whole override approach.
ggplot(data, aes(x, y)) +
geom_point(aes(colour = family, shape = family)) +
scale_colour_discrete(
breaks = names(fam2order)
) +
scale_shape_manual(
values = setNames(order2shape[fam2order], names(fam2order))
)
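As a rough check of the lookup-table idea, here is a small base R sketch on a made-up five-row data frame. The toy data, its correctly spelled order names, and the fam2shape name are assumptions for illustration only. It shows how the two lookups chain together into one named shape vector with one entry per family.

```r
# Hypothetical toy data with the same nesting: every family belongs to one order.
toy <- data.frame(
  order  = c("Artiodactyla", "Artiodactyla", "Perissodactyla", "Proboscidea", "Artiodactyla"),
  family = c("Bovidae", "Cervidae", "Equidae", "Elephantidae", "Bovidae")
)

# family -> order, keeping one entry per family without dropping the names
fam2order <- setNames(toy$order, toy$family)
fam2order <- fam2order[!duplicated(names(fam2order))]

# order -> shape, then chain the two lookups: family -> shape
order2shape <- c(Artiodactyla = 16, Perissodactyla = 17, Proboscidea = 15)
fam2shape <- setNames(order2shape[fam2order], names(fam2order))

fam2shape
```

A vector like fam2shape is what the override.aes and scale_shape_manual(values = ...) calls above consume, so inspecting it before plotting is a cheap way to catch a family that maps to NA because its order is missing from order2shape.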

Issue adding second variable to scatter plot in R

I've been set this question for an assignment, but I've never used R before. Any help is appreciated.
Many thanks.
Question:
Produce a scatter plot to compare CO2 emissions from Brazil and Argentina between 1950 and 2019....
I can get it for Brazil but cannot figure out how to add Argentina.
I think I have to do something with geom_point and filter?
df%>%
filter(Country=="Brazil", Year<=2019 & Year>=1950) %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
labs(x = "Year", y = "CO2 emissions (tonnes)")
The answer depends on what you're looking to do, but generally adding another dimension to a scatter plot where you already have clear x and y dimensions is done by applying an aesthetic (color, shape, etc) or via faceting.
In both approaches, you actually don't want to filter the data. You use either aesthetics or faceting to "filter" in a way and map the data appropriately based on the country column in the dataset. If your dataset contains more countries than Argentina and Brazil, you will want to filter to only include those, so:
your_filtered_df <- your_df %>%
dplyr::filter(Country %in% c("Argentina", "Brazil"))
Faceting
Faceting is another way of saying you want to split up your one plot into two separate plots (one for Argentina, one for Brazil). Each plot will have the same aesthetics (look the same), but will have the appropriate "filtered" dataset.
In your case, you can try:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
facet_wrap(~Country)
Aesthetics
Here, you have a lot of options. The idea is that you tell ggplot2 to map the appearance of individual points in the point geom to the value specified in your_filtered_df$Country. You do this by placing one of the aesthetic arguments for geom_point() inside of aes(). If you use shape=, for example it might look like this:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(aes(shape=Country), na.rm =TRUE, size=2, colour="green")
This should show a plot with a legend and two different point shapes, one per country. It's very important to remember that when you put an aesthetic like shape or color or size inside of aes(), you must not also set it outside. So, this will behave correctly:
geom_point(aes(colour=Country), ...)
But this will not:
geom_point(aes(colour=Country), colour="green", ...)
When one aesthetic is outside, it overrides the one in aes(). The second one will still show all points as green.
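To see the override in action, here is a small sketch on made-up numbers. The toy data frame and its CO2 values are invented for illustration and are not real emissions data. It uses layer_data() to look at the colours ggplot2 actually computes for each point.

```r
library(ggplot2)

# Hypothetical toy data; the values are invented.
toy <- data.frame(
  Country = rep(c("Argentina", "Brazil"), each = 3),
  Year = rep(c(1950, 1985, 2019), times = 2),
  CO2_annual_tonnes = c(1, 2, 3, 2, 4, 6)
)

# colour mapped to Country only: one colour per country
p_mapped <- ggplot(toy, aes(x = Year, y = CO2_annual_tonnes)) +
  geom_point(aes(colour = Country))

# colour mapped AND set as a constant: the constant wins
p_fixed <- ggplot(toy, aes(x = Year, y = CO2_annual_tonnes)) +
  geom_point(aes(colour = Country), colour = "green")

cols_mapped <- unique(layer_data(p_mapped)$colour)
cols_fixed  <- unique(layer_data(p_fixed)$colour)
```

Here cols_mapped should hold two different colours, while cols_fixed should contain only "green".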
Don't Do this... but it works
OP posted a comment that indicated some additional hints from the professor, which was:
We were given the hint in the question "you can embed piped filter
functions within geom_point objects"
I believe they are referring to a final... very bad way of generating the points. This method would require you to have two geom_point() objects, and send each one a different filtered dataset. You can do this by accessing the data= argument within each geom_point() object. There are many problems with this approach, including the lack of a legend being generated, but if you simply must do it this way... here it is:
# painful to write this. it goes against all good practices with ggplot
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Argentina"),
color="green", shape=20) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Brazil"),
color="red", shape=20)
You should probably see why this is not a good convention. Think about what you would do for representing 50 different countries... the above codes or methods would work, but with this method, you would have 50 individual geom_point() objects in your plot... ugh. Don't make a typo!

How to constrain the format of the strings ggplot2 and interaction() generate to find colors in a named color vector?

ggplot2 cannot find the color I generate, intended for parts of the bar diagram I want to generate.
The problem comes from the way ggplot2 converts the numbers in the group and fill parameters of aes() into strings, which are then used as keys to find the correct color in the named color vector I generate.
It appears that the large numbers involved in my plots are turned into scientific notation with all trailing decimal zeros cut from the converted string, whereas the keys in the named color vector I generate always keep the first two decimal digits.
I give below a simple example that illustrates my problem
#!/usr/bin/Rscript
library(ggplot2)
frame = data.frame(
varA = c(5, 5, 5),
varB = c(1, 2, 3),
varC = as.factor(c(
4e+08,
1.05e+09,
1.75e+09
)),
varD = c(1, 1, 1)
)
colors = c("#BB0000", "#00BB00", "#0000BB")
names(colors) = c("4.00e+08.1", "1.05e+09.1", "1.75e+09.1")
plot = ggplot() +
guides(
fill = guide_legend(title = "varA")
) +
scale_fill_manual(
values = colors
) +
geom_bar(
data = frame,
aes(
varA,
varB,
group = interaction(varC, varD),
fill = interaction(varC, varD)
),
position = "dodge",
stat = "identity"
)
ggsave(file = "plot.svg", plot = plot, width = 4, height = 3)
which results in this figure:
An obvious solution to this simple example would be to fix the key for red into "4e+08" instead of "4.00e+08", but that would not work in my full-blown use case. First, because I do not hard-code the colors but generate them from the context in which I run this script. Second, because I observed that sometimes ggplot2 preserves some of my numbers as plain integers and sometimes it converts others into scientific notation, and I don't know the rules behind this decision.
As I see it, there are two solutions.
The first one would be to make sure that I generate color vector keys that are the same as the ones ggplot2 looks for. It implies that I can mitigate the second problem identified above, with documentation I failed to find.
The second (preferred) solution would be to constrain the way ggplot2 formats the color key string it uses so that it fits the named color vector I provide. I suspect this second solution is the same as constraining the way ggplot2 generates strings in the legend (more specifically, how it treats number formats) but I could not find any information about that either.
I am aware that I can give ggplot2 a simple vector of colors and it will fetch colors in the same order as it orders each bar. I cannot accept it as I want to make sure that all colors are consistent from plot to plot, even if I choose to show or to hide some bars in some of them.
My question is therefore:
How can I constrain ggplot2, probably via geom_bar() and/or interaction(), so that it looks for colors the way I need it to?
If this is not possible, where can I find how ggplot2 precisely converts (or does not convert) big numbers into scientific notations so I can generate correct name color vectors?
Version information :
R v3.4.1
ggplot2 v2.2.1
ggplot is not really at fault here. The behavior is caused by the interaction function, which creates a factor. More directly, it is caused by factor (which underlies interaction), which in turn simply calls as.character. See:
> as.character(frame$varC)
[1] "4e+08" "1.05e+09" "1.75e+09"
This should solve your problems:
frame$group <- interaction(frame$varC, frame$varD)
colors <- setNames(c("#BB0000", "#00BB00", "#0000BB"), levels(frame$group))
ggplot(frame, aes(varA, varB, group = group, fill = group)) +
geom_col(position = 'dodge') +
scale_fill_manual(values = colors)
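If the keys must keep a fixed format such as "4.00e+08", one possible alternative is to fix the factor labels up front so that interaction() inherits them. This is a sketch under that assumption, not the approach from the answer above, and the names vals, keys_sci and grp are made up for illustration.

```r
vals <- c(4e+08, 1.05e+09, 1.75e+09)

# What factor() uses by default: as.character() drops trailing zeros
as.character(vals)

# Keys with exactly two decimals in scientific notation
keys_sci <- formatC(vals, format = "e", digits = 2)

# Give the factor explicit labels, so the formatting is under our control
varC <- factor(vals, levels = vals, labels = keys_sci)
grp  <- interaction(varC, factor(c(1, 1, 1)))

# A named colour vector built with the same formatter now matches the levels
colors <- setNames(c("#BB0000", "#00BB00", "#0000BB"), paste0(keys_sci, ".1"))
levels(grp)
```

Because the labels and the colour names come from the same formatC() call, the levels of grp and names(colors) agree regardless of how as.character() would have printed the numbers.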

Dot Priority in ggplot2 jittered scatterplot [duplicate]

I'm plotting a dense scatter plot in ggplot2 where each point might be labeled by a different color:
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))
When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which?
For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels?
I'd like to find a solution that can be translated easily to rpy2.
2016 Update:
The order aesthetic has been deprecated, so at this point the easiest approach is to sort the data.frame so that the green point is at the bottom, and is plotted last. If you don't want to alter the original data.frame, you can sort it during the ggplot call - here's an example that uses %>% and arrange from the dplyr package to do the on-the-fly sorting:
library(dplyr)
ggplot(df %>%
arrange(label),
aes(x = x, y = y, color = label, size = size)) +
geom_point()
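Note that arrange(label) only puts "point" last because it happens to sort after "a" alphabetically. A base R sketch that does not depend on alphabetical order is to sort on a logical flag instead. The names draw_last and df_sorted are made up for illustration.

```r
set.seed(1)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"
df$size <- 2

# FALSE sorts before TRUE, so rows carrying the chosen label end up at the bottom
draw_last <- "point"
df_sorted <- df[order(df$label == draw_last), ]

tail(df_sorted$label, 1)
```

Passing df_sorted to the ggplot call in place of df should then draw the chosen label on top. Sorting on df$label != draw_last instead would bury it.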
Original 2015 answer for ggplot2 versions < 2.0.0
In ggplot2, you can use the order aesthetic to specify the order in which points are plotted. The last ones plotted will appear on top. To apply this, you can create a variable holding the order in which you'd like points to be drawn.
To put the green dot on top by plotting it after the others:
df$order <- ifelse(df$label=="a", 1, 2)
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=order))
Or to plot the green dot first and bury it, plot the points in the opposite order:
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=-order))
For this simple example, you can skip creating a new sorting variable and just coerce the label variable to a factor and then a numeric:
ggplot(df) +
geom_point(aes(x=x, y=y, color=label, size=size, order=as.numeric(factor(df$label))))
ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.
Where this is different, it is noted. For example
geom_line
Connect observations, ordered by x value.
and
geom_path
Connect observations in data order
There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author Hadley
The display of a plot should be invariant to the order of the data frame - anything else is a bug.
With this quote in mind, a layer is drawn in the specified order, so overplotting can be an issue, especially when creating dense scatter plots. So if you want a consistent plot (and not one that relies on the order in the data frame) you need to think a bit more.
Create a second layer
If you want certain values to appear above other values, you can use the subset argument to create a second layer to definitely be drawn afterwards. You will need to explicitly load the plyr package so .() will work.
set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
library(plyr)
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(aes(x = x, y = y, color = label, size = size),
subset = .(label == 'point'))
Update
In ggplot2_2.0.0, the subset argument is deprecated. Use e.g. base::subset to select relevant data specified in the data argument. And no need to load plyr:
ggplot(df) +
geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(data = subset(df, label == 'point'),
aes(x = x, y = y, color = label, size = size))
Or use alpha
Another approach to avoid the problem of overplotting would be to set the alpha (transparency) of the points. This will not be as effective as the explicit second layer approach above; however, with judicious use of scale_alpha_manual you should be able to get something to work.
For example:
# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size,alpha = label)) +
scale_alpha_manual(guide='none', values = c(a = 0.2, point = 1))
The fundamental question here can be rephrased like this:
How do I control the layers of my plot?
In the 'ggplot2' package, you can do this quickly by splitting each different layer into a different command. Thinking in terms of layers takes a little bit of practice, but it essentially comes down to what you want plotted on top of other things. You build from the background upwards.
Prep: Prepare the sample data. This step is only necessary for this example, because we don't have real data to work with.
# Establish random seed to make data reproducible.
set.seed(1)
# Generate sample data.
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
# Initialize 'label' and 'size' default values.
df$label <- "a"
df$size <- 2
# Label and size our "special" point.
df$label[50] <- "point"
df$size[50] <- 4
You may notice that I've added a different size to the example just to make the layer difference clearer.
Step 1: Separate your data into layers. Always do this BEFORE you use the 'ggplot' function. Too many people get stuck by trying to do data manipulation from within the 'ggplot' functions. Here, we want to create two layers: one with the "a" labels and one with the "point" labels.
df_layer_1 <- df[df$label=="a",]
df_layer_2 <- df[df$label=="point",]
You could do this with other functions, but I'm just quickly using the data frame matching logic to pull the data.
Step 2: Plot the data as layers. We want to plot all of the "a" data first and then plot all the "point" data.
ggplot() +
geom_point(
data=df_layer_1,
aes(x=x, y=y),
colour="orange",
size=df_layer_1$size) +
geom_point(
data=df_layer_2,
aes(x=x, y=y),
colour="blue",
size=df_layer_2$size)
Notice that the base plot layer ggplot() has no data assigned. This is important, because we are going to override the data for each layer. Then, we have two separate point geometry layers geom_point(...) that use their own specifications. The x and y axis will be shared, but we will use different data, colors, and sizes.
It is important to move the colour and size specifications outside of the aes(...) function, so we can specify these values literally. Otherwise, 'ggplot2' passes them through a scale and assigns colors and sizes according to the values found in the data. For instance, if you have size values of 2 and 5 in the data, the points with value 2 get some smaller size and the points with value 5 get some larger size; the drawn sizes will not be 2 and 5 themselves. The same goes for colors. I have exact sizes and colors that I want to use, so I move those arguments into the 'geom_point' function itself. Also, anything specified in the 'aes' function gets a legend entry, which is useless here.
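To see the difference between mapping and setting, here is a minimal sketch on a made-up three-row data frame (toy and the p_* names are hypothetical). It uses layer_data() to inspect the sizes each plot would actually draw with, and also shows scale_size_identity(), which is another way to have values inside aes() taken literally.

```r
library(ggplot2)

# Hypothetical three-row data set, only for illustration.
toy <- data.frame(x = 1:3, y = 1:3, size = c(2, 2, 4))

# Mapped inside aes(): the default size scale rescales the values.
p_mapped <- ggplot(toy) +
  geom_point(aes(x = x, y = y, size = size))

# Set outside aes(): the values are used exactly as given.
p_literal <- ggplot(toy) +
  geom_point(aes(x = x, y = y), size = toy$size)

# Mapped inside aes(), but with an identity scale: also used as given.
p_identity <- ggplot(toy) +
  geom_point(aes(x = x, y = y, size = size)) +
  scale_size_identity()

# layer_data() returns the computed data for one layer of a plot,
# including the size each point will be drawn with.
mapped_sizes   <- layer_data(p_mapped, 1)$size
literal_sizes  <- layer_data(p_literal, 1)$size
identity_sizes <- layer_data(p_identity, 1)$size
```

Comparing the three vectors shows that only the mapped version changes the numbers; the other two keep the sizes from the data.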
Final note: In this example, you could achieve the wanted result in many ways, but it is important to understand how 'ggplot2' layers work in order to get the most out of your 'ggplot' charts. As long as you separate your data into different layers before you call the 'ggplot' functions, you have a lot of control over how things will be graphed on the screen.
Points are plotted in the order of the rows in the data.frame. Try this:
df2 <- rbind(df[-50,],df[50,])
ggplot(df2) + geom_point(aes(x=x, y=y, color=label, size=size))
As you see the green point is drawn last, since it represents the last row of the data.frame.
Here is a way to order the data.frame to have the green point drawn first:
df2 <- df[order(-as.numeric(factor(df$label))),]
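A possibly more readable variant of the same row-ordering idea is to sort on a logical condition, since order() places FALSE before TRUE. This is only a sketch; df_top and df_bottom are names invented here, and the data is rebuilt so the snippet stands alone.

```r
# Same sample data as in the question.
set.seed(1)
df <- data.frame(x = rnorm(500))
df$y <- rnorm(500) * 0.1 + df$x
df$label <- "a"
df$label[50] <- "point"

# Rows where the condition is TRUE sort last, so the special point
# ends up in the final row and is drawn on top.
df_top <- df[order(df$label == "point"), ]

# Negating the condition moves the special point to the first row,
# so it is drawn first and can be covered by later points.
df_bottom <- df[order(df$label != "point"), ]
```

Either data frame can then be passed to ggplot() in place of df2.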

ggplot boxplot + fill + color brewer spectrum

I can't seem to be able to fill a boxplot by a continuous value using color brewer, and I know it must just be a simple swap of syntax somewhere, since I can get the outlines of the boxes to adjust based on continuous values. Here's the data I'm working with:
data <- data.frame(
value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE)
)
I want to make a paneled boxplot, ordered by median "value", with the depth of color fill for each box increasing with "value" (I know this is redundant, but bear with me for the sake of the example)
(Ordering the data):
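# Convert to a factor first: since R 4.0, data.frame() no longer turns
# strings into factors, so levels() would otherwise return NULL.
data$animals <- factor(data$animals)
orderindex <- order(as.numeric(by(data$value, data$animals, median)))
data$animals <- ordered(data$animals, levels=levels(data$animals)[orderindex])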
If I create the boxplot with panels, I can adjust the color of the outlines:
library(ggplot2)
first <- qplot(animals, value, data = data, colour=animals)
second <- first + geom_boxplot() + facet_grid(~region)
third <- second + scale_colour_brewer()
print(third)
I want to do the same thing, but with the fill of each box instead of the outline (so each box gets darker as "value" increases). I thought it might be a matter of putting "scale_colour_brewer()" inside the aesthetic argument of geom_boxplot, i.e.
second <- first + geom_boxplot(aes(scale_colour_brewer())) + facet_grid(~region)
but that doesn't seem to do the trick. I know it's a matter of positioning for this "scale_colour_brewer" argument; I just don't know where it goes!
(There is a similar question, Add color to boxplot - "Continuous value supplied to discrete scale" error, but it's not quite what I'm looking for: the colors of the boxes there don't increase along a spectrum/gradient with some continuous value; those values are basically factors. The example at the ggplot site with the cars package, http://docs.ggplot2.org/0.9.3.1/geom_boxplot.html, doesn't seem to work when I set "fill" to "value"; I get the error:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
)
To set the fill of the boxplots, use fill=animals instead of colour=animals, and in the same way replace scale_colour_brewer() with scale_fill_brewer().
qplot(animals, value, data = data, fill=animals)+
geom_boxplot() + facet_grid(~region) + scale_fill_brewer()
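Since qplot() is deprecated in recent ggplot2 releases, the same plot can be sketched with ggplot(). The version below also uses reorder() to sort the animals by median value, which is an alternative to the ordering code in the question rather than the asker's own method; the seed is added only to make the sample data reproducible.

```r
library(ggplot2)

# Sample data as in the question, with a seed for reproducibility.
set.seed(42)
data <- data.frame(
  value = sample(1:50),
  animals = sample(c("cat", "dog", "zebra"), 50, replace = TRUE),
  region = sample(c("forest", "desert", "tundra"), 50, replace = TRUE)
)

# reorder() returns a factor whose levels are sorted by the
# median of 'value' within each animal.
data$animals <- reorder(data$animals, data$value, FUN = median)

# fill = animals colours the inside of each box; scale_fill_brewer()
# assigns the shades in level order.
p <- ggplot(data, aes(x = animals, y = value, fill = animals)) +
  geom_boxplot() +
  facet_grid(~region) +
  scale_fill_brewer()
```

Calling print(p) draws the panels. Because the factor levels are sorted by median, the boxes with higher overall medians receive the later shades of the palette.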
