How should I decide when to put parameters in ggplot( ) or in the geom_xx( )? Does it matter if the parameter is set to a constant or to a column from the data frame? What other factors (R pun unintentional) should be considered?
This seems to work fine, but has a legend which lists the transparency.
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl, alpha = 0.6))+geom_point(size = 4)
This is a slight improvement because the legend has been removed, but seems to be the same otherwise.
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl))+geom_point(size = 4, alpha = 0.6)
I understand that parameters should be defined once, in ggplot(), when they apply to all geoms, and separately in a geom when they should apply to only that geom or should override settings from ggplot(). There's a helpful list of aesthetics in the question "Is there a table or catalog of aesthetics for ggplot2?" and a nice graphical representation here.
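A minimal sketch of that rule, assuming a second layer (geom_smooth(), which is not part of the example above) is added to make the difference visible. Mappings placed in ggplot() are inherited by every layer, mappings placed in a geom apply only to that geom, and constants go outside aes():

```r
library(ggplot2)

# x and y are mapped in ggplot(), so every layer inherits them.
# colour is mapped only inside geom_point(), so only the points are split by cyl.
# size and alpha are constants, so they sit outside aes() and get no legend.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)), size = 4, alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # one line for all cars

p
```

If colour = factor(cyl) were moved into ggplot(), geom_smooth() would inherit it as well and fit one line per cylinder group.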
Related
Just encountered a surprising colour problem while doing a boxplot with ggplot2.
The same colour (#FF4040) looks drastically different whether I set it as fill parameter or later in scale_fill_manual.
Here is an example you can copy/paste using the mtcars dataset.
library(ggplot2)
data('mtcars')
ggplot (data = mtcars, aes(x = as.factor(cyl), disp)) +
geom_boxplot(aes(fill = '#FF4040'))
ggplot (data = mtcars, aes(x = as.factor(cyl), disp)) +
geom_boxplot(aes(fill = as.factor(cyl)))+
scale_fill_manual(breaks=c('4', '6', '8'),
values=c('#FF4040', '#FF4040', '#FF4040'))
Here is the comparison:
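As I said in the comments, the first example does not set the fill color; it maps fill to the constant string '#FF4040', which ggplot treats as a one-level variable and colors from its default palette. So instead of geom_boxplot(aes(fill = '#FF4040')) use geom_boxplot(fill = '#FF4040'), and you will receive the same result as the second version.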
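One way to see the difference without looking at the plot is to inspect the values ggplot actually draws with. This is only a sketch; the object names base, mapped and fixed are illustrative:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = as.factor(cyl), y = disp))

# Mapped: the string is treated as data, so the fill scale picks the colour
mapped <- base + geom_boxplot(aes(fill = "#FF4040"))

# Set: the string is used directly as the colour
fixed <- base + geom_boxplot(fill = "#FF4040")

# layer_data() returns the computed aesthetics for a layer
unique(layer_data(mapped)$fill)
unique(layer_data(fixed)$fill)
```

The mapped version reports a colour chosen by the scale, while the fixed version reports the hex code that was supplied.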
Why does this graph not show overlaps?
Some of the cars in this dataset share the same combination for x and y (displ and hwy).
For example for displ = 2 and hwy = 29, there are: 1 midsize; 6 compact and 3 subcompact.
However, in this spot there is only a green dot showing only 1 midsize. What am I misunderstanding about this graph?
Thank you so much!
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
Carsten,
The call to geom_point() draws points that share the same coordinates on top of each other, so you see only one point at each such location; this is especially noticeable with discrete values in small datasets. You can address this by using geom_jitter(), which adds a small amount of random noise to each point's position so that all points become visible.
Solution: geom_jitter()
Here we use geom_jitter(), to insert noise into the plot data allowing us to see all overlapping points.
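# Install ggplot2 only if it is missing, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
# The mpg dataset ships with ggplot2, so no data() call is needed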
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = displ, y = hwy, color = class))
Plot Output: (Points slightly shifted to distinguish each point)
Note how the inserted "noise" allows you to distinguish the plot points.
nb. The jitter geom is a convenient shortcut for geom_point(position = "jitter"). It adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.
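A hedged sketch of the geom_point(position = "jitter") form mentioned above, using position_jitter() to control the amount of noise. The width, height and seed values here are arbitrary choices for illustration, not recommendations:

```r
library(ggplot2)

# width and height are in data units; seed makes the jitter repeatable
p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy, color = class),
    position = position_jitter(width = 0.05, height = 0.5, seed = 1)
  )

p
```

Fixing the seed means the points land in the same place each time the plot is rebuilt, which helps when a figure has to be reproduced exactly.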
Apart from jitter, you can also lower the alpha argument in geom_point() to 0.3 or 0.4; by default it is 1, which means 100% opaque. Set it outside aes() so that it is a constant rather than a mapped variable with its own legend.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class), alpha = 0.3)
This will highlight areas of over-plotting, because stacked points appear darker.
The geom_jitter solution and alpha changing solution are both excellent. A third possibility is to map the size of the marker to the number of observations at those coordinates (along with an alpha adjustment) using geom_count():
library(ggplot2)
# The mpg dataset ships with ggplot2, so no data() call is needed
ggplot(data = mpg) +
geom_count(mapping = aes(x = displ, y = hwy, color = class), alpha = .5)
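To check how many observations share a location, the counts can also be tabulated directly. The following is a base R sketch; the helper column n and the data frame name counts are illustrative. As far as I know, geom_count() counts within each mapped aesthetic, so with color = class it draws one marker per class at a location rather than one per location.

```r
library(ggplot2)

# Count rows per (displ, hwy, class) combination using base R
df <- as.data.frame(mpg)
df$n <- 1L
counts <- aggregate(n ~ displ + hwy + class, data = df, FUN = sum)

# Inspect the location mentioned in the question
subset(counts, displ == 2 & hwy == 29)
```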
I'm tinkering with geom_point trying to plot the following code. I have converted cars$vs to a factor with discrete levels so that I can visualize both levels of that variable in different colors by assigning it to "fill" in the ggplot aes settings.
cars <- mtcars
cars$vs <- as.factor(cars$vs)
ggplot(cars,aes(x = mpg, y = disp, fill = vs)) +
geom_point(size = 4) +
scale_fill_discrete(name = "Test")
As you can see, the graph does not differentiate between both "fill" conditions via color. However, it preserves the legend label I have specified in scale_fill_discrete.
Alternatively, I can plot the following (same code, but instead of "fill", use "color")
cars <- mtcars
cars$vs <- as.factor(cars$vs)
ggplot(cars,aes(x = mpg, y = disp, color = vs)) +
geom_point(size = 4) +
scale_fill_discrete(name = "Test")
As you can see, using "color" instead of "fill" differentiates between the levels of the factor via color, but seems to override any changes I make to the legend title using scale_fill_discrete.
Am I using "fill" incorrectly? How can I plot different levels of a factor in different colors using this method and have control over the plot legend via scale_fill_discrete?
Since you are using color as mapping, you can use scale_color_* to change the corresponding attributes instead of scale_fill_*:
ggplot(cars,aes(x = mpg, y = disp, color = vs)) +
geom_point(size = 4) +
scale_color_discrete(name = "Test")
To use a fill with geom_point you should use a fill-able shape:
ggplot(cars,aes(x = mpg, y = disp, fill = vs)) +
geom_point(size = 4, shape = 21) +
scale_fill_discrete(name = "Test")
See ?pch, which shows that shapes 21 to 25 can be colored and filled with different colors. ggplot will not use the fill unless the shape is one that is fill-able. This behavior has changed a bit in different versions, as seen in the NEWS file.
There's no reason to use fill with geom_point unless you want the outline and fill colors of the points to be different, so the other answer recommending color is probably what you want.
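For the case where the outline and the interior should differ, a minimal sketch could look like this. The black outline and the stroke width are arbitrary choices for illustration:

```r
library(ggplot2)

cars <- mtcars
cars$vs <- as.factor(cars$vs)

# Shape 21 is a circle with a separate outline (color) and interior (fill)
p <- ggplot(cars, aes(x = mpg, y = disp, fill = vs)) +
  geom_point(size = 4, shape = 21, color = "black", stroke = 1) +
  scale_fill_discrete(name = "Test")

p
```

Here fill is mapped to vs and therefore controlled by scale_fill_discrete(), while color is a constant and produces no legend.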
When you run this code you will see that the facet with B has a red point, but it clearly should be blue. How do you set the colors properly given data frame "d"?
Thank you.
d = data.frame(x = c(1,2,3),y = c(4,5,6), color = c("red","blue","red"), group = c("A","B","A"))
d
ggplot(data= d, aes(x = x, y = y ) ) +geom_point( color = d$color)+
facet_wrap(~group)
Unlike base plots, ggplot doesn't expect you to have a column of color names in your data. It expects you to have a column that defines the variable that you want to color by, and optionally specify the mapping between that vector's values and custom colors (if you don't like the defaults).
In your data, the color column seems to be based off of the group column. This would be the canonical ggplot way to create your plot (notice that the color column is not used):
ggplot(data = d, aes(x = x, y = y, color = group)) +
geom_point() +
facet_wrap(~group)
Note that you do not need to facet and color by the same column, e.g.,
ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_point() +
facet_wrap(~ am)
The key point is that you are mapping a column to the color aesthetic inside aes(). When facets are involved, ggplot does potentially complicated splitting of the data behind the scenes. This data manipulation is based on the data provided to the data argument and the column names provided inside aes().
If you specify data$column you are passing just a vector. You have taken it from your data frame, but ggplot doesn't know that - it could have come from anywhere. This will cause mistakes in the subsetting done for the facets. You need to use aes(color = column) (note the lack of data$ - use just the column name inside aes()), and ggplot will look for a column of that name in the data and know how to correctly filter the data for each facet.
This is one way:
ggplot(data= d, aes(x = x, y = y ) ) +
geom_point(aes(color = color))+
facet_wrap(~group) +
scale_color_manual(values = c('red' = 'red','blue' = 'blue'))
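When the column already holds literal color names, as it does here, scale_color_identity() is another option that avoids spelling out the mapping by hand. A sketch with the same data frame:

```r
library(ggplot2)

d <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                color = c("red", "blue", "red"),
                group = c("A", "B", "A"))

# The values in the color column are used as-is instead of being rescaled
p <- ggplot(data = d, aes(x = x, y = y)) +
  geom_point(aes(color = color)) +
  facet_wrap(~group) +
  scale_color_identity()

p
```

Because color is mapped inside aes(), the values are split correctly across the facets. Note that the identity scale draws no legend by default.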
This is my first question on stackoverflow so please correct me if the question is unclear.
I would like to assign geom attributes for ggplot2 to a variable for reuse in multiple plots. For example, let's say I want to assign the attributes of size and shape to a variable to reuse in plotting data other than mtcars.
This code works, but if I have a lot of plots I don't want to keep re-entering the size and shape attributes.
ggplot(mtcars) +
geom_point(aes(x = wt,
y = mpg),
size = 5,
shape = 21
)
How should I assign a variable (eg size.shape) these attributes so that I can use it in the below code to produce the same plot?
ggplot(mtcars) +
geom_point(aes(x = wt,
y = mpg),
size.shape
)
If you always want to use the same values for size and shape (or other aesthetics), you could use update_geom_defaults() to set the default values to other values:
update_geom_defaults("point", list(size = 5, shape = 21))
These will then be used whenever you do not specifically give values for the aesthetics.
Example
The plot you create with the usual default settings looks as follows:
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
But when you reset the defaults for size and shape, it looks different:
update_geom_defaults("point", list(size = 5, shape = 21))
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
As you can see, the actual plot is done with the same code as before, but the result is different because you changed the default values for size and shape. Of course, you can still produce plots with any value for these aesthetics, by simply providing values in geom_point():
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg), size = 2, shape = 2)
Note that the defaults are given by geom, which means that only geom_point() is affected.
This solution is convenient if there is only one set of values for size and shape that you want to use. If you have several sets of values that you want to be able to pick from when creating a plot, then you might be better off with something along the lines of the comment by lukeA.
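For several sets of values, one possible approach (a sketch of my own, not necessarily what that comment proposed) is to keep each set in a named list and pass it to geom_point() with do.call(). The names big_hollow, small_tri and styled_points are hypothetical:

```r
library(ggplot2)

# Hypothetical named sets of constant aesthetics
big_hollow <- list(size = 5, shape = 21)
small_tri  <- list(size = 2, shape = 2)

# Hypothetical helper: builds geom_point() from a mapping plus one style list
styled_points <- function(mapping, style) {
  do.call(geom_point, c(list(mapping = mapping), style))
}

p <- ggplot(mtcars) + styled_points(aes(x = wt, y = mpg), big_hollow)

p
```

Unlike update_geom_defaults(), this leaves the session-wide defaults untouched, so other plots are unaffected.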