assign ggplot2 geom attributes to variable - r

This is my first question on stackoverflow so please correct me if the question is unclear.
I would like to assign geom attributes for ggplot2 to a variable for reuse in multiple plots. For example, let's say I want to assign the attributes of size and shape to a variable to reuse in plotting data other than mtcars.
This code works, but if I have a lot of plots I don't want to keep re-entering the size and shape attributes.
ggplot(mtcars) +
  geom_point(aes(x = wt,
                 y = mpg),
             size = 5,
             shape = 21
  )
How should I assign a variable (eg size.shape) these attributes so that I can use it in the below code to produce the same plot?
ggplot(mtcars) +
  geom_point(aes(x = wt,
                 y = mpg),
             size.shape
  )

If you always want to use the same values for size and shape (or other aesthetics), you could use update_geom_defaults() to set the default values to other values:
update_geom_defaults("point", list(size = 5, shape = 21))
These will then be used whenever you do not specifically give values for the aesthetics.
Example
The plot you create with the usual default settings looks as follows:
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
But when you reset the defaults for size and shape, it looks different:
update_geom_defaults("point", list(size = 5, shape = 21))
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
As you can see, the actual plot is done with the same code as before, but the result is different because you changed the default values for size and shape. Of course, you can still produce plots with any value for these aesthetics, by simply providing values in geom_point():
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg), size = 2, shape = 2)
Note that the defaults are given by geom, which means that only geom_point() is affected.
This solution is convenient if there is only one set of values for size and shape that you want to use. If you have several sets of values that you want to be able to pick from when creating a plot, then you might be better off with something along the lines of the comment by lukeA.
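For several reusable sets of attributes, one possible sketch is to keep each set in a named list and splice it into the geom call with do.call(). The names size.shape, small.triangle and point_with() are made up for illustration and are not part of ggplot2:

```r
library(ggplot2)

# Hypothetical named sets of constant (non-mapped) point attributes
size.shape     <- list(size = 5, shape = 21)
small.triangle <- list(size = 2, shape = 2)

# Hypothetical helper: builds a geom_point() layer from a mapping
# plus a list of fixed attributes
point_with <- function(mapping, attrs) {
  do.call(geom_point, c(list(mapping = mapping), attrs))
}

# Same plot as in the question, with the attributes taken from the variable
p <- ggplot(mtcars) +
  point_with(aes(x = wt, y = mpg), size.shape)
```

The same list can then be reused with any other data set, and swapping size.shape for small.triangle switches the whole set of attributes at once.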

Related

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making a mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
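library(dplyr)    # %>%, group_by(), summarise()
library(forcats)  # as_factor()
library(ggplot2)
# per-Race means of ASB and YOI
g_Antisocial <- Antisocial %>%
  group_by(Race) %>%
  summarise(ASB = mean(ASB),
            YOI = mean(YOI))
# raw points (semi-transparent) with the group means drawn larger on top
Antisocial %>%
  ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
  geom_point(alpha = .4) +
  geom_point(data = g_Antisocial, size = 4) +
  theme_bw() +
  guides(color = guide_legend("Race"), shape = guide_legend("Race"))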
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
Your code is not working because you mix up the layer calls and never really specify which data belong to the scatter layer and which to the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
  geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I will skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data = and aes() call inside each geom_xxxx. This avoids confusion.
You may want to try geom_jitter() instead of geom_point() to get a better presentation of your dataset. The "few" points plotted are the result of many data points sharing the same position (and being overplotted).
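A minimal sketch of the jitter idea follows. Since the Antisocial file is not reproduced here, demo is a made-up stand-in with the same column names, and the width, height and alpha values are arbitrary choices:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the Antisocial data (same column names, invented values)
demo <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 50),
  Race = rep(c(0, 1), times = 75),
  ASB  = sample(0:6, 150, replace = TRUE)
)

# geom_jitter() adds a little random noise to each position,
# so points with identical coordinates no longer hide each other
jit <- geom_jitter(data = demo,
                   aes(x = YOI, y = ASB, colour = as.factor(Race)),
                   width = 0.3, height = 0.2, alpha = 0.5)

p <- ggplot() + jit + labs(colour = "Race")
```

Keep the jitter small relative to the spacing of the YOI values, otherwise the points of neighbouring years start to blend into each other.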
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the look you are aiming for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary; in other contexts you can assign another variable here).
ggplot() +
  geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
  geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
            size = 2, linetype = "dashed") +
  labs(x = "YOI", fill = "Race")
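If "grouped by Race" instead means one mean line per Race group, note that Table_1 is summarised by YOI only and so carries no Race information. A rough sketch of that reading, again on made-up stand-in data (demo and race_means are hypothetical names), could be:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the Antisocial data (same column names, invented values)
demo <- data.frame(
  YOI  = rep(c(90, 92, 94), each = 50),
  Race = rep(c(0, 1), times = 75),
  ASB  = sample(0:6, 150, replace = TRUE)
)

# mean ASB for every combination of YOI and Race
race_means <- aggregate(ASB ~ YOI + Race, data = demo, FUN = mean)

# one line (and one set of points) per Race group
p <- ggplot(race_means,
            aes(x = YOI, y = ASB, colour = as.factor(Race), group = Race)) +
  geom_line() +
  geom_point(size = 2) +
  labs(y = "mean ASB", colour = "Race")
```

With the real data, replacing demo by Antisocial in the aggregate() call should give the per-group means directly.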
Hope this gets you going!

ggplot not plotting the correct color [duplicate]

gb <- read.csv('results-gradient-boosting.csv')
p <- ggplot(gb) + geom_point(aes(x = pred, y = y),alpha = 0.4, fill = 'darkgrey', size = 2) +
geom_line(aes(x = pred, y = pred,color = 'darkgrey'),size = 0.6) +
geom_line(aes(x = pred, y = pred + 3,color = I("darkgrey")), linetype = 'dashed',size = 0.6) +
geom_line(aes(x = pred, y = pred -3,color = 'darkgrey'),linetype = 'dashed',size = 0.6)
My code is above. I have no idea why when I put color inside aes, the color turns out to be red. But if I put it outside of aes, it is correct. Thanks for your help!
When you put color="darkgrey" outside aes, ggplot takes it literally to mean that the line should be colored "darkgrey". But when you put color="darkgrey" inside aes, ggplot takes it to mean that you want to map color to a variable. In this case, the variable has only one value: "darkgrey". But that's not the color "darkgrey". It's just a string. You could call it anything. The color ggplot chooses will be based on the default palette. Map color to a variable when you want different colors for different levels of that variable.
For example, see what happens in the example below. The colors are chosen from ggplot's default palette and are completely independent of the names we've used for colour in each call to geom_line. You will get the same three colors when you have any color aesthetic that takes on three different unique values:
library(ggplot2)
theme_set(theme_classic())
ggplot(mtcars) +
geom_line(aes(mpg, wt, colour="green")) +
geom_line(aes(mpg, wt - 1, colour="blue")) +
geom_line(aes(mpg, wt + 1, colour="star trek"))
But now we put the colors outside aes so they are taken literally, and we comment out the third line, because it will cause an error if we don't use a valid colour.
ggplot(mtcars) +
geom_line(aes(mpg, wt), colour="green") +
geom_line(aes(mpg, wt - 1), colour="blue") #+
#geom_line(aes(mpg, wt + 1), colour="star trek")
Note that if we map colour to an actual column of mtcars (one that has three unique levels), we get the same three colors as in the first example, but now they are mapped to an actual feature of the data:
ggplot(mtcars) +
geom_line(aes(mpg, wt, colour=factor(cyl)))
And finally, what if we want to set those mapped colors to different values:
ggplot(mtcars) +
geom_line(aes(mpg, wt, colour=factor(cyl))) +
scale_colour_manual(values=c("purple", hcl(150,100,80), rgb(0.9,0.5,0.3)))
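Applied to the plot in the question, a sketch with every constant colour moved outside aes() might look as follows. The results file is not available here, so gb is simulated (a hypothetical stand-in with the same column names pred and y); colour is used for the points because fill has no visible effect on the default point shape:

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for results-gradient-boosting.csv
gb <- data.frame(pred = runif(100, min = 0, max = 30))
gb$y <- gb$pred + rnorm(100, sd = 2)

p <- ggplot(gb, aes(x = pred)) +
  # constants are set outside aes(), so they are taken literally
  geom_point(aes(y = y), alpha = 0.4, colour = "darkgrey", size = 2) +
  geom_line(aes(y = pred), colour = "darkgrey") +
  geom_line(aes(y = pred + 3), colour = "darkgrey", linetype = "dashed") +
  geom_line(aes(y = pred - 3), colour = "darkgrey", linetype = "dashed")
```

No colour legend appears in this version, because nothing is mapped to colour any more.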

ggplot conflict between fill and scale_fill_discrete/plot legend

I'm tinkering with geom_point trying to plot the following code. I have converted cars$vs to a factor with discrete levels so that I can visualize both levels of that variable in different colors by assigning it to "fill" in the ggplot aes settings.
cars <- mtcars
cars$vs <- as.factor(cars$vs)
ggplot(cars,aes(x = mpg, y = disp, fill = vs)) +
geom_point(size = 4) +
scale_fill_discrete(name = "Test")
As you can see, the graph does not differentiate between both "fill" conditions via color. However, it preserves the legend label I have specified in scale_fill_discrete.
Alternatively, I can plot the following (same code, but instead of "fill", use "color")
cars <- mtcars
cars$vs <- as.factor(cars$vs)
ggplot(cars,aes(x = mpg, y = disp, color = vs)) +
geom_point(size = 4) +
scale_fill_discrete(name = "Test")
As you can see, using "color" instead of "fill" differentiates between the levels of the factor via color, but seems to override any changes I make to the legend title using scale_fill_discrete.
Am I using "fill" incorrectly? How can I plot different levels of a factor in different colors using this method and have control over the plot legend via scale_fill_discrete?
Since you are using color as mapping, you can use scale_color_* to change the corresponding attributes instead of scale_fill_*:
ggplot(cars,aes(x = mpg, y = disp, color = vs)) +
geom_point(size = 4) +
scale_color_discrete(name = "Test")
To use a fill with geom_point you should use a fill-able shape:
ggplot(cars,aes(x = mpg, y = disp, fill = vs)) +
geom_point(size = 4, shape = 21) +
scale_fill_discrete(name = "Test")
See ?pch, which shows that shapes 21 to 25 can be colored and filled with different colors. ggplot will not use the fill unless the shape is one that is fill-able. This behavior has changed a bit in different versions, as seen in the NEWS file.
There's no reason to use fill with geom_point unless you want the outline and fill colors of the points to be different, so the other answer recommending color is probably what you want.
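For the case where outline and fill really should differ, a small sketch: the fill is mapped to vs while the outline is set to a constant. The black outline and the stroke width are arbitrary choices for illustration:

```r
library(ggplot2)

cars <- mtcars
cars$vs <- as.factor(cars$vs)

p <- ggplot(cars, aes(x = mpg, y = disp, fill = vs)) +
  # shape 21 is fill-able: fill colours the interior, colour the outline
  geom_point(size = 4, shape = 21, colour = "black", stroke = 1) +
  scale_fill_discrete(name = "Test")
```

Here scale_fill_discrete() controls the legend title, because fill is the aesthetic that is actually mapped.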

Which aesthetics go in ggplot( ) and which in geom_xx( )

How should I decide when to put parameters in ggplot( ) or in the geom_xx( )? Does it matter if the parameter is set to a constant or to a column from the data frame? What other factors (R pun unintentional) should be considered?
This seems to work fine, but has a legend which lists the transparency.
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl, alpha = 0.6))+geom_point(size = 4)
This is a slight improvement because the legend has been removed, but seems to be the same otherwise.
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl))+geom_point(size = 4, alpha = 0.6)
I understand that parameters should be defined once when possible if they apply to all geoms and separately when they should apply to only one geom or if they should override settings from ggplot. There's a helpful list of aesthetics here:
Is there a table or catalog of aesthetics for ggplot2? and a nice graphical representation here.
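As a rough illustration of the distinction (a sketch, not a definitive rule): mappings placed in ggplot() are inherited by every layer, mappings inside a geom apply to that layer only, and constants go outside aes(). The smoothing layer is added here only to show the inheritance:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +          # x and y are shared by all layers
  geom_point(aes(colour = factor(cyl)),              # colour mapped for the points only
             size = 4, alpha = 0.6) +                # constants: no legend entries
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, colour = "black")          # one overall line, not one per cyl
```

Moving colour = factor(cyl) into ggplot() would make geom_smooth() inherit the grouping as well and fit one line per cyl level.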

Map size to more than one variable with geom_point in ggplot2

Is there any option to use two variables in parameter size in ggplot function?
E.g.
ggplot(mtcars, aes(qsec,drat, size = cyl))+
geom_point()
Obviously this works, and the size of the points depends on the cyl variable. But is there an option to add a second size variable, e.g. mpg, drawn with an alpha setting? How can I do this?
I did it for just three dots as an example in Illustrator.
Thanks,
Update
Thanks to @lukeA, the following lines of code work:
ggplot() +
geom_point(data = mtcars, aes(qsec,drat, size = cyl)) +
geom_point(data = mtcars, aes(qsec,drat, size = mpg), alpha = .1, colour = "red")
But I cannot set the size range for each variable separately. With one size variable I would normally use scale_size_continuous, but with two size variables it doesn't work: calling scale_size_continuous twice doesn't change anything. Maybe it is not possible at all, but maybe somebody has found a solution.
You can do this now with ggnewscale - allowing you to have more than one scale for any aesthetic. The scales can be defined separately too. Here's an example:
library(ggplot2)
library(ggnewscale)
ggplot(mtcars, aes(x=qsec, y=drat)) + theme_bw() +
geom_point(aes(size=cyl), color='gray20') +
scale_size_continuous(range=c(0.2, 3)) +
new_scale("size") +
geom_point(aes(size=mpg), alpha=0.1, color='red') +
scale_size_continuous(range=c(5, 10))
You can define the ranges for each scale separately. In this case, that's important so that you can ensure that the range of sizes mapped to mpg is larger than the range of sizes mapped to cyl. Otherwise, both are mapped to the default range (c(1,6)), which means many points would not have a visible red point.
