I have a dataset like this:
Year<-rep(2001:2005, each = 5)
name<-c("John","Ellen","Mark","Randy","Luisa")
Name<-c(rep(name,5))
Value<-sample(seq(0,25,by=1),25)
mydata<-data.frame(Year,Name,Value)
And my plot looks like this:
p <- ggplot(mydata, aes(x=Year, y=reorder(Name, desc(Name)), size = Value)) +
geom_point(aes(colour = Value,
alpha = I(as.numeric(Value > 0))))
p <- p + scale_colour_viridis_c(option = "D", direction = -1,
limits = c(1, 25)) +
scale_size_area(guide = "none") +
ylab("Name") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
I would like to improve it in two ways but I couldn't figure out how.
I would like to add a black border around the points. I know I should use pch > 20 and specify a colour, but because my colours are mapped to a feature of the dataset (they depend on Value, in this case), I don't know exactly how to do that. Note that points with Value = 0 are not plotted. Simple tricks such as plotting bigger black points under my points do not seem workable to me.
I would like to change the breaks of the colour scale (e.g., instead of having breaks every 5, I'd like to have breaks every 2.5), but it is a continuous scale, and I'm not sure how to do that.
I am not very familiar with ggplot2, so any help would be appreciated!
You can indeed use a shape above 20; I use shape = 21 here. Shapes 21 to 25 have both a border (colour) and an interior (fill), so map Value to fill and change scale_colour_viridis_c to scale_fill_viridis_c. The colour aesthetic now controls only the border, which is set to black.
For the breaks, pass them to the scale itself through its breaks argument. Combining both:
library(ggplot2)
library(dplyr)  # for desc()
ggplot(mydata, aes(x=Year, y=reorder(Name, desc(Name)), size = Value)) +
geom_point(aes(fill = Value,
alpha = I(as.numeric(Value > 0))), shape=21, color = "black") +
scale_fill_viridis_c(option = "D", direction = -1,
limits = c(1, 25), breaks=seq(1, 25, 2.5)) +
scale_size_area(guide = "none") +
ylab("Name") +
theme(axis.line = element_blank(),
axis.text.x=element_text(size=11,margin=margin(b=10),colour="black"),
axis.text.y=element_text(size=13,margin=margin(l=10),colour="black",
face="italic"),
axis.ticks = element_blank(),
axis.title=element_text(size=18,face="bold"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),
legend.text = element_text(size=14),
legend.title = element_text(size=18))
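An alternative way to handle the zero-valued points, sketched here under the assumption that dropping those rows is acceptable, is to filter them out before plotting rather than hiding them with alpha. The set.seed() call and the name visible are additions for illustration, and the y-axis reordering and theme settings are left out for brevity.

```r
library(ggplot2)

set.seed(1)  # assumption: added only so the sampled values are reproducible
mydata <- data.frame(
  Year  = rep(2001:2005, each = 5),
  Name  = rep(c("John", "Ellen", "Mark", "Randy", "Luisa"), 5),
  Value = sample(0:25, 25)
)

# Drop the zero rows up front instead of making them transparent
visible <- subset(mydata, Value > 0)

p <- ggplot(visible, aes(x = Year, y = Name, size = Value)) +
  # shape 21 has a border (colour) and an interior (fill)
  geom_point(aes(fill = Value), shape = 21, colour = "black") +
  scale_fill_viridis_c(option = "D", direction = -1,
                       limits = c(1, 25), breaks = seq(1, 25, 2.5)) +
  scale_size_area(guide = "none")

print(p)
```

With the rows removed, the alpha mapping is no longer needed, so the border and the fill are always fully opaque.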
I make a heatmap in R that shows the dependency of a variable (Corona misinformation Score) on two other variables (Indifference Score and Rigidity Score). I do not understand why ordering my data according to the Corona misinformation score makes a difference for how the heatmap looks.
Here is the code I use to generate the graph:
dset %>%
arrange(Mean_Corona) %>%
ggplot(aes(x=Mean_Rigidity, y=Mean_Indifference, fill = Mean_Corona)) +
geom_tile(alpha=0.8) +
scale_fill_distiller(palette = "RdYlGn") +
ylab("Indifference Score") +
xlab("Rigidity Score") +
labs(color="Corona Misinformation Score") +
theme(
legend.position="bottom",
panel.background = element_rect(fill = "white"),
panel.grid.major = element_line(colour = "grey70", size = 0.2),
panel.grid.minor = element_blank())
This is what the graph looks like:
If I run the same code but remove the second line (arrange(Mean_Corona) %>%), the heatmap looks instead like this:
If I order the data for the same variable in descending order, the heatmap looks different again. What I don't understand is why ordering rows in the dataset should make any difference to how the graph looks. Should not the shading of each tile just be determined by the average Corona Misinformation score for people with that score? I am stuck because I am not sure what the more accurate way of displaying my data is.
You will notice the plots have all the tiles in the same position but that some tiles have different colours. You are quite right that the ordering of Mean_Corona shouldn't make a difference, but that is true only if the position of each tile is unique. If you have multiple values for each tile position and you sort for Mean_Corona, then the lower value tiles are plotted first, and the higher values are plotted on top of the lower values. If you reverse that ordering, the higher value tiles will be obscured by the lower value tiles.
We can see this more clearly if we create a small dummy data set with 8 unique tiles but only 4 unique tile positions:
dset <- data.frame(Mean_Corona = 1:8,
Mean_Indifference = rep(c(0.5, 1.5), 4),
Mean_Rigidity = rep(c(0.5, 1.5), each = 4))
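To check whether a data set has this problem, one option is to count the rows per tile position. This is a minimal sketch on the dummy data above (repeated so the block runs on its own); the names overlaps and n_tiles are illustrative.

```r
library(dplyr)

dset <- data.frame(Mean_Corona = 1:8,
                   Mean_Indifference = rep(c(0.5, 1.5), 4),
                   Mean_Rigidity = rep(c(0.5, 1.5), each = 4))

# Count how many rows share each (x, y) tile position
overlaps <- dset %>%
  count(Mean_Rigidity, Mean_Indifference, name = "n_tiles") %>%
  filter(n_tiles > 1)

print(overlaps)
```

Any row in overlaps is a position where several tiles are drawn on top of each other, so the visible colour there depends on the row order.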
So let's plot this with the original data frame, which happens to be sorted by Mean_Corona already:
dset %>%
ggplot(aes(x=Mean_Rigidity, y=Mean_Indifference, fill = Mean_Corona)) +
geom_tile(alpha=0.8) +
scale_fill_distiller(palette = "RdYlGn") +
ylab("Indifference Score") +
xlab("Rigidity Score") +
labs(fill="Corona Misinformation Score") +  # fill, not color: the legend belongs to the fill scale
theme(
legend.position="bottom",
panel.background = element_rect(fill = "white"),
panel.grid.major = element_line(colour = "grey70", size = 0.2),
panel.grid.minor = element_blank())
Now we plot with the values in descending order. Here we see that the lower values have been plotted over the higher values:
dset %>%
arrange(-Mean_Corona) %>%
ggplot(aes(x=Mean_Rigidity, y=Mean_Indifference, fill = Mean_Corona)) +
geom_tile(alpha=0.8) +
scale_fill_distiller(palette = "RdYlGn") +
ylab("Indifference Score") +
xlab("Rigidity Score") +
labs(fill="Corona Misinformation Score") +  # fill, not color: the legend belongs to the fill scale
theme(
legend.position="bottom",
panel.background = element_rect(fill = "white"),
panel.grid.major = element_line(colour = "grey70", size = 0.2),
panel.grid.minor = element_blank())
One possible solution here is to group by both the indifference and rigidity scores, then take the average of the tiles at that position. That will ensure you have a single tile at each location that better reflects the relationship between variables.
dset %>%
group_by(Mean_Rigidity, Mean_Indifference) %>%
summarise(Mean_Corona = mean(Mean_Corona), .groups = "drop") %>%
ggplot(aes(x=Mean_Rigidity, y=Mean_Indifference, fill = Mean_Corona)) +
geom_tile(alpha=0.8) +
scale_fill_distiller(palette = "RdYlGn") +
ylab("Indifference Score") +
xlab("Rigidity Score") +
labs(fill="Corona Misinformation Score") +  # fill, not color: the legend belongs to the fill scale
theme(
legend.position="bottom",
panel.background = element_rect(fill = "white"),
panel.grid.major = element_line(colour = "grey70", size = 0.2),
panel.grid.minor = element_blank())
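As a quick check that averaging removes the dependence on row order, the sketch below summarises the dummy data after sorting it both ways and compares the results. The helper avg_tiles is a hypothetical name introduced for illustration.

```r
library(dplyr)

dset <- data.frame(Mean_Corona = 1:8,
                   Mean_Indifference = rep(c(0.5, 1.5), 4),
                   Mean_Rigidity = rep(c(0.5, 1.5), each = 4))

# Hypothetical helper: one averaged row per tile position
avg_tiles <- function(d) {
  d %>%
    group_by(Mean_Rigidity, Mean_Indifference) %>%
    summarise(Mean_Corona = mean(Mean_Corona), .groups = "drop")
}

ascending  <- avg_tiles(arrange(dset, Mean_Corona))
descending <- avg_tiles(arrange(dset, desc(Mean_Corona)))

# Compare the two summaries; the row order of the input should not matter
same_result <- isTRUE(all.equal(as.data.frame(ascending),
                                as.data.frame(descending)))
print(same_result)
```

Because each position now holds exactly one tile, nothing is drawn over anything else and the plot looks the same however the input rows are sorted.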
You should remove the alpha, because the row order determines the order in which the tiles are drawn on top of each other, and with transparency the overlapping tiles blend.
Best regards
Roel
I have a plot created using ggplot2 where I'm trying to modify some of the minor grid lines. Here is the current version:
library(tidyverse)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color))
ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
) +
geom_hline(yintercept = seq(0, length(unique(starwars$skin_color))) + .5, color="gray30")
The y axis is a factor and facet_grid is used, with a different number of categories in each panel. I added some minor grid lines using geom_hline (my understanding is that panel.grid.minor does not work with categorical data, i.e., factors).
I would like to remove the lines highlighted in yellow below, and then ADD a single black line in between the two facet grids (i.e., where the current double lines are that are highlighted in yellow).
Any way to do this? I'd prefer avoiding hard coding the position of any lines, in case the data change. Thanks.
Removing the top and bottom grid lines dynamically is relatively easy. You compute the line positions in the data set within each faceting group, drop the highest value in each group (the positions start at 1.5, so no line is drawn below the first category), and plot geom_hline with yintercept mapped inside aes(). That approach is robust to changes in the data (to see this, uncomment the # filter(!is.na(birth_year)) line below).
library(tidyverse)
library(grid)
data(starwars)
starwars = starwars %>%
filter(!is.na(homeworld), !is.na(skin_color)) %>%
mutate(tatooine = factor(if_else(homeworld == "Tatooine", "Tatooine Native", "Other Native")),
skin_color = factor(skin_color)) %>%
# filter(!is.na(birth_year)) %>%
group_by(tatooine) %>%
# here we assign the line_positions
mutate(line_positions = as.numeric(factor(skin_color, levels = unique(skin_color))),
line_positions = line_positions + .5,
line_positions = ifelse(line_positions == max(line_positions), NA, line_positions))
plot_out <- ggplot(starwars, aes(birth_year, skin_color)) +
geom_point(aes(color = gender), size = 4, alpha = 0.7, show.legend = FALSE) +
geom_hline(aes(yintercept = line_positions)) +
facet_grid(tatooine ~ ., scales = "free_y", space = "free_y", switch = "y") +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_line(colour = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.placement = "outside",
strip.background = element_rect(fill="gray90", color = "white"),
)
print(plot_out)
gives
However, adding a solid line between the facets without any hard coding is difficult. There are some possible ways to add borders between facets (see here), but if we don't know whether the facets will change, it is not obvious to which value the border should be assigned. I guess there is a possible solution that draws a hard-coded line in the plot to divide the facets, but the tricky part is to determine dynamically where that border is going to be located, based on the data and on how the facets are ultimately drawn (e.g., in which order). I'd be interested in hearing other opinions on this.
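A variation on the line-position idea, sketched on made-up data, is to keep the boundaries in a separate small data frame with one row per line and pass it to geom_hline through its data argument. The data frame toy and the names boundaries and pos are hypothetical; this only covers the lines between categories, not the border between the facets.

```r
library(dplyr)
library(ggplot2)

# Made-up data: two panels with different numbers of categories
toy <- data.frame(
  panel    = c("A", "A", "A", "B", "B"),
  category = c("x", "y", "z", "x", "w"),
  value    = c(1, 2, 3, 4, 5)
)

# One boundary after each category within a panel, except after the last one
boundaries <- toy %>%
  distinct(panel, category) %>%
  group_by(panel) %>%
  mutate(pos = row_number() + 0.5) %>%
  filter(pos < max(pos)) %>%
  ungroup()

p <- ggplot(toy, aes(x = value, y = category)) +
  geom_point() +
  # boundaries carries the faceting variable, so each line lands in its own panel
  geom_hline(data = boundaries, aes(yintercept = pos), colour = "gray30") +
  facet_grid(panel ~ ., scales = "free_y", space = "free_y")

print(p)
```

Only the set of positions per panel matters here, because with free y scales every panel numbers its own categories from 1 upwards.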
I have data that can be divided by two grouping variables. One is the year and the second is a field characteristic.
box<-as.data.frame(1:36)
box$year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997,
1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997)
box$year <- as.character(box$year)
box$case <- c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
6.00,6.11,6.40,7.00,NA,5.44,6.00, NA,6.00,
6.00,6.20,6.40,6.64,6.33,6.60,7.14,6.89,7.10,
6.73,6.27,6.64,6.41,6.42,6.17,6.05,5.89,5.82)
box$code <- c("L","L","L","L","L","L","L","L","L","L","L","L",
"L","L","L","L","L","L","M","M","M","M","M","M",
"M","M","M","M","M","M","M","M","M","M","M","M")
colour <- factor(box$code, labels = c("#F8766D", "#00BFC4"))
I want to display the points over the boxplots to see how the data are distributed. That is easily done with a single boxplot for every year:
ggplot(box, aes(x = year, y = case, fill = "#F8766D")) +
geom_boxplot(alpha = 0.80) +
geom_point(colour = colour, size = 5) +
theme(text = element_text(size = 18),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none")
But it becomes more complicated when I add the fill parameter:
ggplot(box, aes(x = year, y = case, fill = code)) +
geom_boxplot(alpha = 0.80) +
geom_point(colour = colour, size = 5) +
theme(text = element_text(size = 18),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none")
And now the question: how do I move these points onto the boxplots where they belong, so that the blue points sit on the blue boxplot and the red points on the red one?
Like Henrik said, use position_jitterdodge() and shape = 21. You can clean up your code a bit too:
There is no need to create box first and then fill it column by column.
You can let ggplot pick the colours if you wish and skip constructing the colour factor. If you want to change the defaults, look into scale_fill_manual and scale_color_manual.
box <- data.frame(year = c(1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997,
1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997),
case = c(6.40,6.75,6.11,6.33,5.50,5.40,5.83,4.57,5.80,
6.00,6.11,6.40,7.00,NA,5.44,6.00, NA,6.00,
6.00,6.20,6.40,6.64,6.33,6.60,7.14,6.89,7.10,
6.73,6.27,6.64,6.41,6.42,6.17,6.05,5.89,5.82),
code = c("L","L","L","L","L","L","L","L","L","L","L","L",
"L","L","L","L","L","L","M","M","M","M","M","M",
"M","M","M","M","M","M","M","M","M","M","M","M"))
ggplot(box, aes(x = factor(year), y = case, fill = code)) +
geom_boxplot(alpha = 0.80) +
geom_point(aes(fill = code), size = 5, shape = 21, position = position_jitterdodge()) +
theme(text = element_text(size = 18),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none")
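If the default spread of the points is too wide or too narrow, position_jitterdodge() takes jitter.width, dodge.width and seed arguments. The sketch below uses a small made-up data frame (toy) and illustrative values for those arguments. It also sets outlier.shape = NA so that outliers are not drawn twice, once by the boxplot and once as points.

```r
library(ggplot2)

# Made-up data: two years, two codes, three values per combination
toy <- data.frame(
  year = rep(c("1996", "1997"), each = 6),
  code = rep(c("L", "M"), times = 6),
  case = c(6.4, 6.0, 6.1, 6.2, 5.5, 6.4, 6.0, 6.7, 6.1, 6.3, 5.4, 6.6)
)

p <- ggplot(toy, aes(x = year, y = case, fill = code)) +
  geom_boxplot(alpha = 0.8, outlier.shape = NA) +
  geom_point(shape = 21, size = 3,
             position = position_jitterdodge(
               jitter.width = 0.1,   # horizontal noise within each box
               dodge.width  = 0.75,  # matches the default boxplot dodge
               seed         = 42     # makes the jitter reproducible
             ))

print(p)
```

Keeping dodge.width equal to the width used by the boxplots (0.75 by default) is what lines the points up with their own box.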
I see you've already accepted @JakeKaupp's nice answer, but I thought I would throw in a different option, using geom_dotplot. The data you are visualizing is rather small, so why not forego the boxplot?
ggplot(box, aes(x = factor(year), y = case, fill = code))+
geom_dotplot(binaxis = 'y', stackdir = 'center',
position = position_dodge())
I am trying to draw the following graph using the ggplot2 package, but the axis lines won't show up. The ticks are there, just not the axis lines. I have used theme(axis.line = element_line()), but it doesn't work.
Here is my code:
library(ggplot2)
ggplot(data = soepl_randsub, aes(x = year, y =satisf_org, group = id)) +
geom_point() + geom_line() +ylab("Current Life Satisfaction") +theme_bw() +
theme(plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank() ) +
theme(panel.border= element_blank()) +
theme(axis.line = element_line(color="black", size = "2"))
I am not sure what went wrong. Here is the chart.
The bug was fixed in ggplot2 v2.2.0. There is no longer a need to specify the axis lines separately.
I think this is a bug in ggplot2 v2.1.0. (See this bug report and this one.) A workaround is to set the x-axis and y-axis lines separately.
library(ggplot2)
ggplot(data = mpg, aes(x = hwy, y = displ)) +
geom_point() +
theme_bw() +
theme(plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank() )+
theme(panel.border= element_blank())+
theme(axis.line.x = element_line(color="black", size = 2),
axis.line.y = element_line(color="black", size = 2))
You don't need to specify the axis size for x and y separately. When you specify size = "2", R treats the value as a character string rather than a number, so the axis line ends up with a size of 0. Use this code:
ggplot(data = mpg, aes(x = hwy, y = displ)) +
geom_point() +
xlab("Date") +
ylab("Value of Home") +
theme_bw() +
theme(plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(panel.border = element_blank()) +
theme(axis.line = element_line(color = "black", size = 2))
axis.line inherits from line in the ggplot2 theme hierarchy, so size only has to be specified when you want a non-default value.
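The difference between the quoted and the unquoted value can be checked directly. This is a minimal sketch; the variable names size_chr and size_num are illustrative. Note that ggplot2 releases from 3.4.0 onward prefer linewidth over size for line elements and warn when size is used.

```r
library(ggplot2)

size_chr <- "2"
is.numeric(size_chr)   # FALSE: a quoted "2" is a character string

# Convert to a number before handing it to element_line()
size_num <- as.numeric(size_chr)
is.numeric(size_num)   # TRUE

p <- ggplot(mpg, aes(x = hwy, y = displ)) +
  geom_point() +
  theme_bw() +
  theme(panel.border = element_blank(),
        axis.line = element_line(colour = "black", size = size_num))

print(p)
```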