Setting breakpoints for data with scale_fill_brewer() function in ggplot2 - r

I am creating a map (choropleth) as described on the ggplot2 wiki. Everything works like a charm, except that I am running into an issue mapping a continuous value to the polygon fill color via the scale_fill_brewer() function.
This question describes the problem I'm having. As in the answer, my workaround has been to pre-cut my data into bins using the gtools quantcut() function:
UPDATE: This first example is actually the right way to do this
require(gtools) # needed for quantcut()
...
fill_factor <- quantcut(fill_continuous, q=seq(0,1,by=0.25))
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_factor) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr")
This works; however, I feel like I should be able to skip the step of pre-cutting my data and do something like this with the breaks option:
ggplot(mydata) +
aes(long,lat,group=group,fill=fill_continuous) +
geom_polygon() +
scale_fill_brewer(name="mybins", palette="PuOr", breaks=quantile(fill_continuous))
But this doesn't work. Instead I get an error something like:
Continuous variable (composite score) supplied to discrete scale_brewer.
Have I misunderstood the purpose of the "breaks" option? Or is breaks broken?

A major issue with pre-cutting continuous data is that there are three pieces of information used at different points in the code:
The Brewer palette -- determines the maximum number of colors available
The number of break points (or the bin width) -- has to be specified with the data
The actual data to be plotted -- influences the choice of the Brewer palette (sequential/diverging)
A true vicious circle. This can be broken by providing a function that accepts the data and the palette, automatically derives the number of break points and returns an object that can be added to the ggplot object. Something along the following lines:
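fill_brewer <- function(fill, palette) {
  require(RColorBrewer) # needed for brewer.pal.info
  require(gtools)       # needed for quantcut()
  # Maximum number of colors the chosen palette provides
  n <- brewer.pal.info$maxcolors[palette == rownames(brewer.pal.info)]
  # Build the unevaluated call quantcut(<fill>, q=...) so that aes() evaluates it in the plot data;
  # n cut points give n - 1 bins, which stays within the palette's limit
  discrete.fill <- call("quantcut", match.call()$fill, q=seq(0, 1, length.out=n))
  list(
    do.call(aes, list(fill=discrete.fill)),
    scale_fill_brewer(palette=palette)
  )
}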
Use it like this:
ggplot(mydata) + aes(long,lat,group=group) + geom_polygon() +
fill_brewer(fill=fill_continuous, palette="PuOr")
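To see what the helper derives from the palette, the lookup can be run on its own. This is only a small sketch of that one step; the names palette_name, probs and n_bins are made up for illustration, and it assumes the RColorBrewer package is installed.

```r
library(RColorBrewer)

# Hypothetical palette choice, matching the one used in the question
palette_name <- "PuOr"

# brewer.pal.info is a data frame with one row per palette
n <- brewer.pal.info[palette_name, "maxcolors"]

# n cut points between 0 and 1 define n - 1 quantile bins
probs <- seq(0, 1, length.out = n)
n_bins <- length(probs) - 1

print(n)
print(n_bins)
```

Because the number of bins is one less than the number of cut points, the result never asks the palette for more colors than it has.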

As Hadley explains, the breaks option moves the ticks, but does not make the data discrete. Therefore, pre-cutting the data as in the first example in the question is the right way to use the scale_fill_brewer() function.
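For anyone who wants to try the pre-cutting approach without gtools, here is a minimal sketch that swaps in base R's cut() and quantile() for quantcut(). The toy data frame and its column names are hypothetical, and tiles stand in for the map polygons.

```r
library(ggplot2)

# Hypothetical data: one continuous value per tile
set.seed(1)
toy <- data.frame(
  x = rep(1:4, times = 4),
  y = rep(1:4, each = 4),
  value = runif(16)
)

# Quartile boundaries, then bin the values before they reach the scale
qs <- quantile(toy$value, probs = seq(0, 1, by = 0.25))
toy$value_bin <- cut(toy$value, breaks = qs, include.lowest = TRUE)

# The fill is now a factor, so the discrete Brewer scale accepts it
p <- ggplot(toy, aes(x, y, fill = value_bin)) +
  geom_tile() +
  scale_fill_brewer(name = "mybins", palette = "PuOr")
```

include.lowest = TRUE matters here: without it the minimum value falls outside the first interval and becomes NA.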

Related

General way to break on unique values in ggplot2 continuous scales

I often need to create custom breaks for an axis or for color/fill/size so that they reflect the actual data points. Typically in my data the variable is continuous, but it is measured at discrete points. I think this may apply to many others, from what I see on SO. Below is an example of plotting cty vs. cyl from the mpg data set:
library(ggplot2)
library(magrittr) # provides %>%
mpg %>%
  ggplot(aes(cyl, cty)) +
  geom_point() +
  scale_x_continuous(breaks = unique(mpg$cyl))
But one does not really want to type different "mpg$cyl" for different exploratory data analysis all the time. So I am here to look for a general solution.
P.S. I read that ggplot2 does not pass the data to the scale functions, probably just the range for calculation. I filed an issue but have not yet received any response.
Indeed, ggplot2 does not have a general way to do this. For continuous scales, the training method is to update the range of the scale every time a new layer is examined. It makes sense in the 'grammar of graphics' that scales are mostly independent of geometry layers.
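A small sketch of what "only the range" means in practice: breaks can be given as a function, but that function receives the scale limits rather than the data. The helper name integer_breaks is made up for this illustration.

```r
library(ggplot2)

# A breaks function is handed the scale limits (a length-2 range),
# so it can place integer breaks but cannot know which values occur
integer_breaks <- function(limits) seq(ceiling(limits[1]), floor(limits[2]))

# What a range-based function can produce vs. what is actually in the data
from_range <- integer_breaks(range(mpg$cyl))
from_data <- sort(unique(mpg$cyl))

p <- ggplot(mpg, aes(cyl, cty)) +
  geom_point() +
  scale_x_continuous(breaks = integer_breaks)
```

The range-based version includes 7 even though no car in mpg has 7 cylinders, which is exactly the gap the question is about.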
You could, in theory, tackle this problem from the bottom up by making a new Range ggproto class that keeps track of unique values. However, ggplot2 does not export its Range classes, which likely means tinkering with them is not supported. Also, it's quite a task to set up a new type of scale.
Instead, I'm proposing to hack the ggplot_add() method to leak information from the global plot to the scale. The first thing to do is to wrap the constructor of a scale so that it tags an extra class onto that scale.
library(ggplot2)
scale_x_unique <- function(...) {
sc <- scale_x_continuous(...)
new <- ggproto("ScaleUnique", sc)
new
}
Next, we want to define the ggplot_add() method for our ScaleUnique class. The function below checks whether there are any user-defined breaks and, if there are none, evaluates the scale's aesthetic in the global plot data.
ggplot_add.ScaleUnique <- function(object, plot, object_name) {
# "waiver" class is for undefined arguments
if (inherits(object$breaks, "waiver")) {
# Find common aesthetic between scale and plot mapping
aes <- intersect(object$aesthetics, names(plot$mapping))
# Find out the expression associated with that aesthetic
aes <- plot$mapping[[aes[[1]]]]
# Evaluate the aesthetic
values <- rlang::eval_tidy(aes, plot$data)
# Assign unique values to breaks
object$breaks <- sort(unique(values))
}
plot$scales$add(object)
plot
}
Now you can use it like any other scale
ggplot(mpg, aes(cyl, cty)) +
geom_point() +
scale_x_unique()
Created on 2021-08-11 by the reprex package (v1.0.0)
This of course only works if the aesthetic is defined in the global plot call and the data is available in the global plot. You could in theory traverse all layers and keep updating your unique values, but this becomes cumbersome.

Histograms and Density Plots do not match up

I am creating histograms of substitutions (1st, 2nd, or 3rd sub) over time. Each histogram shows the number of subs in a given minute for a given sub number. The histograms make sense to me because for the most part they are smooth (I used a bin width of 1 minute). Nothing looks too out of the ordinary. However, when I overlay a density plot, the left tail inflates for one of the graphs and I cannot determine why.
The dataset is of substitutions, ranging from minute 1 to a maximum time. I then cut this dataset in half to only look at subs made after minute 45. I have not folded this data back. I have tried to create a reproducible example, but cannot given the data.
Code used to create the plot in R:
## Filter out subs that are not in the second half
df.half<-df[df$PeriodId>=2,]
p<-ggplot(data=df.half, aes(x=time)) +
geom_histogram(aes(y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
geom_density(alpha=.2)+
facet_grid(SUB_NUMBER ~ .)+
scale_y_continuous(limits = c(0,0.075),breaks = c(seq(0,0.075,0.025)),
minor_breaks = c(seq(0,0.075,0.025)),name='Count')
p
Why, for the First Sub is the density plot inflated in the tail if there are no values less than 45? Also why isn't the density plot more inflated in the tail for the Second Sub?
Side note: I did ask this question on Cross Validated, but was told that since it involves R, I should ask it here instead.
So I was able to change the code and get the following:
ggplot() +
geom_histogram(data=df.half, aes(x=time,y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_density(data=df.half,aes(x=time,y=..density..))+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
facet_grid(SUB_NUMBER ~ .)
This looks more correct and at least now fits the dataset. However, I am still confused as to why those issues occurred in the first place.
While there is no data sample to reproduce the error, you could try to make sure that the data and aesthetics used by geom_density are correct by specifying them explicitly. You can also try moving the geom_density line so that it comes directly after geom_histogram. Also, the y-axis label is probably wrong: it is currently set to 'Count', while the values suggest that the axis in fact shows density.
How would I specify density explicitly?
You can specify the density parameters explicitly by specifying data, aes and position directly in geom_density function call, so it would use these stated instead of inherited arguments:
ggplot() +
geom_histogram(data=df.half, aes(x=time,y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_density(data=df.half,aes(x=time,y=..density..))+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
facet_grid(SUB_NUMBER ~ .)
I do not understand how it occurred in the first place
I think in your initial code for geom_density you explicitly specified just the alpha argument. Thus, for all the other parameters it needed (data, aes, position, etc.) it used the inherited values, and apparently it did not inherit them correctly. Probably it tried to use the data argument from the geom_vline call (sumy.df.half), or it was confused by the ..density.. syntax.
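Since the original data is not available, here is a self-contained sketch of the explicit version on made-up data. The data frame subs and its values are hypothetical, and it assumes ggplot2 3.3.0 or later, where after_stat(density) is the current spelling of ..density..; it also labels the y axis as density, as suggested above.

```r
library(ggplot2)

# Hypothetical stand-in for df.half: substitution minutes in the second half
set.seed(42)
subs <- data.frame(
  time = runif(300, min = 45, max = 95),
  SUB_NUMBER = rep(c("First", "Second", "Third"), each = 100)
)

# Every layer names its own data and aesthetics, so nothing is inherited
p <- ggplot() +
  geom_histogram(data = subs, aes(x = time, y = after_stat(density)),
                 position = "identity", alpha = 0.5, binwidth = 1) +
  geom_density(data = subs, aes(x = time, y = after_stat(density))) +
  facet_grid(SUB_NUMBER ~ .) +
  ylab("Density")
```

This does not settle why the first version misbehaved; it only gives a starting point that can be run and compared against the real data.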

Set categorical axis labels with scales "free" ggplot2

I am trying to set the labels on a categorical axis within a faceted plot using the ggplot2 package (1.0.1) in R (3.1.1) with scales="free". If I plot without manually setting the axis tick labels they appear correctly (first plot), but when I try to set the labels (second plot) only the first n labels are used on both facets (not in sequence as with the original labels).
Here is a reproducible code snippet exemplifying the problem:
foo <- data.frame(yVal=factor(letters[1:8]), xVal=factor(rep(1:4,2)), fillVal=rnorm(8), facetVar=rep(1:2,each=4))
## axis labels are correct
p <- ggplot(foo) + geom_tile(aes(x=xVal, y=yVal, fill=fillVal)) + facet_grid(facetVar ~ ., scales='free')
print(p)
## axis labels are not set correctly
p <- p + scale_y_discrete(labels=c('a','a','b','b','c','d','d','d'))
print(p)
I note that I cannot set the labels correctly within the data.frame as they are not unique. Also, I am aware that I can accomplish this with grid.arrange, but this requires "manually" aligning the plots if the labels have different lengths, etc. Additionally, I would like to have the facet labels included in the plot, which is not an available option with the grid.arrange solution. I haven't tried viewports yet. Maybe that is the solution, but I was hoping for more of the faceted look for this plot, and viewports seem more similar to grid.arrange.
It seems to me as though this is a bug, but I am open to an explanation as to how this might be a "feature". I also hope that there might be a simple solution to this problem that I have not thought of yet!
The easiest method would be to create another column in your data set with the right conversion. This would also be easier to audit and manipulate. If you insist on changing manually:
You cannot simply set the labels directly, as it recycles (I think) the label vector for each facet. Instead, you need to set up a conversion using corresponding breaks and labels:
p <- p + scale_y_discrete(labels = c('1','2','3','4','5','6','7','8'), breaks=c('a','b','c','d','e','f','g','h'))
print(p)
Any y axis value of a will now be replaced with 1, b with 2 and so on. You can play around with the label values to see what I mean. Just make sure that every factor value you have is also represented in the breaks argument.
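Applied to the labels from the question, the same breaks/labels pairing might look like the sketch below. The vector name wanted is made up here, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(ggplot2)

# Data from the question
set.seed(1)
foo <- data.frame(
  yVal = factor(letters[1:8]),
  xVal = factor(rep(1:4, 2)),
  fillVal = rnorm(8),
  facetVar = rep(1:2, each = 4)
)

# Non-unique display labels, one per factor level, in level order
wanted <- c("a", "a", "b", "b", "c", "d", "d", "d")

# Pair every level with its label so each facet looks labels up by break
p <- ggplot(foo) +
  geom_tile(aes(x = xVal, y = yVal, fill = fillVal)) +
  facet_grid(facetVar ~ ., scales = "free") +
  scale_y_discrete(breaks = levels(foo$yVal), labels = wanted)
```

Using levels(foo$yVal) for the breaks guarantees that every factor value is represented, which is the condition mentioned above.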
I think I may actually have a solution to this. My problem was that my labels were not correct because as someone above has said - it seems like the label vector is recycled through. This line of code gave me incorrect labels.
ggplot(dat, aes(x, y))+geom_col()+facet_grid(d ~ t, switch = "y", scales = "free_x")+ylab(NULL)+ylim(0,10)+geom_text(aes(label = x))
However when the geom_text was moved prior to the facet_grid, the below code gave me correct labels.
ggplot(dat, aes(x, y))+geom_col()+geom_text(aes(label = x))+facet_grid(d ~ t, switch = "y", scales = "free_x")+ylab(NULL)+ylim(0,10)
There's a good chance I may have misunderstood the problem above, but I certainly solved my problem so hopefully this is helpful to someone!

How to change color/shape/size for a subset of data after plotting in ggplot2

There are lots of situations where I use ggplot to create a nice looking graph, but I would like to play around with the colors/shapes/sizes for data belonging to a certain group (e.g. to highlight it).
I understand how to set these properties differently for each group when I first create the plot. However, I would like to know if there is a simple command to change the properties after the plot has been created (preferably without having to specify the properties for all other subsets).
As an example consider the following code:
library(ggplot2)
x = seq(0,1,0.2)
y = seq(0,1,0.2)
types = c("a","a","a","b","b","c")
df = data.frame(x,y,types)
table_of_colors = c("a"="red","b"="blue","c"="green")
table_of_shapes = c("a"=15,"b"=15,"c"=16)
my_plot = ggplot(df) +
theme_bw() +
geom_point(aes(x=x,y=y,color=types,shape=types),size=10) +
scale_color_manual(values = table_of_colors) +
scale_shape_manual(values=table_of_shapes)
which produces the following plot:
I'm wondering:
1. Is there a way to change the color of the green point (type=="c") without having to type out the colors for the other points?
2. Is there a way to change the shape of the blue/red points (type %in% c("a","b")) without having to type out the shapes for all the other points?
3. The size of all points is currently set to 10. Is there a way to change the size of only the green point to say 15, while keeping the size of all remaining points at 10?
I'm not sure if this is an existing feature, but hacks are welcome (so long as the changes will be reflected in the legend).
This seems kind of hacky to me, but the code below addresses items 1 and 2 in your list:
my_plot +
scale_colour_manual(values=c(table_of_colors[1:2],c="green")) +
scale_shape_manual(values=c(a=4,b=6, table_of_shapes[3]))
I thought maybe you could change the size with something like scale_size_manual(values=c(10,10,15)), but that doesn't work, perhaps because size was hard-coded, rather than set with an aesthetic to begin with.
It would probably be cleaner to just create new vectors of shapes, colors, etc., as needed, rather than to make individual ad hoc changes like those above.
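If size is mapped as an aesthetic from the start, the size question can be handled the same way as color and shape. This is only a sketch under that assumption; table_of_sizes and new_sizes are names invented here, and the plot is rebuilt rather than taken from the question.

```r
library(ggplot2)

df <- data.frame(
  x = seq(0, 1, 0.2),
  y = seq(0, 1, 0.2),
  types = c("a", "a", "a", "b", "b", "c")
)

# Hypothetical lookup table: size is mapped to types instead of hard-coded
table_of_sizes <- c(a = 10, b = 10, c = 10)

my_plot <- ggplot(df) +
  theme_bw() +
  geom_point(aes(x = x, y = y, color = types, size = types)) +
  scale_size_manual(values = table_of_sizes)

# Later: change only group "c" and swap in the updated size scale
new_sizes <- replace(table_of_sizes, "c", 15)
bigger_c <- my_plot + scale_size_manual(values = new_sizes)
```

Adding the second scale_size_manual() replaces the first one, and ggplot2 prints a message saying so.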

Why won't ggplot2 allow me to set a size for each individual point?

I've got a scatter plot. I'd like to scale the size of each point by its frequency. So I've got a frequency column of the same length. However, if I do:
... + geom_point(size=Freq)
I get this error:
When _setting_ aesthetics, they may only take one value. Problems: size
which I interpret as all points can only have 1 size. So how would I do what I want?
Update: data is here
The basic code I used is:
dcount=read.csv(file="New_data.csv",header=T)
ggplot(dcount,aes(x=Time,y=Counts)) + geom_point(aes(size=Freq))
Have you tried..
+ geom_point(aes(size = Freq))
Aesthetics are mapped to variables in the data with the aes function. Check out http://had.co.nz/ggplot2/geom_point.html
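To make the setting vs. mapping distinction concrete, here is a short sketch on made-up data. The values in dcount are hypothetical since the original CSV is not included.

```r
library(ggplot2)

# Hypothetical stand-in for New_data.csv
dcount <- data.frame(
  Time = 1:5,
  Counts = c(3, 7, 2, 9, 4),
  Freq = c(1, 2, 2, 5, 3)
)

# Mapping: size inside aes() is looked up per row in the data
mapped_layer <- geom_point(aes(size = Freq))

# Setting: size outside aes() must be a single constant for the layer
fixed_layer <- geom_point(size = 3)

p_mapped <- ggplot(dcount, aes(x = Time, y = Counts)) + mapped_layer
p_fixed <- ggplot(dcount, aes(x = Time, y = Counts)) + fixed_layer
```

The error in the question comes from the second form being given a whole column instead of one value.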
ok, this might be what you're looking for. The code you provided above aggregates the information into four categories. If you don't want that, you can specify the categories with scale_size_manual().
sizes <- unique(dcount$Freq)
names(sizes) <- as.character(unique(dcount$Freq))
ggplot(dcount,aes(x=Time,y=Counts)) + geom_point(aes(size=as.factor(Freq))) + scale_size_manual(values = sizes/2)
If the code gd047 gave doesn't work, I'd double check that your Freq column is actually called Freq and that your workspace doesn't have some other object called Freq. Other than that, the code should work. How do you know that the scale has nothing to do with the frequency?
