ggplot: fill a dotplot by group but keep the dot positions of the unfilled plot - r

I want to draw a dotplot whose dots are positioned as in the uncoloured figure but filled as in the coloured one. To generate the coloured figure I used the following.
Sample dataset:
data <- data.frame(estado1 = c('APLV','APLV','APLV','APLV','APLV','NO APLV','APLV','NO APLV','NO APLV','APLV','NO APLV','APLV','APLV','APLV','APLV','APLV','APLV','APLV','NO APLV','APLV'), combined_ige = c(3.6,2.84,1.2,14.33,0,0,0,0,0.07,2,0,0.3,0.11,0,0,1.31,0,0,0,0.19), sxtypes = c('skin_resp','skin','skin','skin_dig','dig','dig_resp','skin_dig','dig','dig','skin_resp','skin_dig_resp','dig','dig','dig_resp','skin_dig_resp','skin','dig','skin_dig_resp','resp','skin_dig'))
Code:
ggplot(data, aes(x=estado1, y=combined_ige, fill= sxtypes)) +
geom_dotplot(binaxis='y', stackdir='center',
stackratio=1.5, dotsize=1.2, alpha=0.6) +
geom_hline(yintercept = (0.35), linetype="dashed") +
geom_hline(yintercept = (0.77), linetype="dashed", col="red") +
xlab("Status group") +
ggtitle("IgE específicas combinadas") +
scale_y_log10(labels = function(y) format(y, scientific = FALSE))
When I use "fill = sxtypes" to colour the dots, they are grouped into layers that overlap each other. I want them to stay in the same positions as in the uncoloured figure while being coloured as in the second figure.
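One possible approach, offered as a sketch rather than a confirmed fix: geom_dotplot() has two arguments aimed at this situation. stackgroups = TRUE stacks dots across the fill groups instead of drawing each group as its own overlapping stack, and binpositions = "all" computes the bin positions from all the data together. The example reuses the sample data above. Dropping the zero values before plotting is an assumption made here, since they cannot be placed on a log scale.

```r
library(ggplot2)

data <- data.frame(
  estado1 = c('APLV','APLV','APLV','APLV','APLV','NO APLV','APLV','NO APLV','NO APLV','APLV',
              'NO APLV','APLV','APLV','APLV','APLV','APLV','APLV','APLV','NO APLV','APLV'),
  combined_ige = c(3.6,2.84,1.2,14.33,0,0,0,0,0.07,2,0,0.3,0.11,0,0,1.31,0,0,0,0.19),
  sxtypes = c('skin_resp','skin','skin','skin_dig','dig','dig_resp','skin_dig','dig','dig',
              'skin_resp','skin_dig_resp','dig','dig','dig_resp','skin_dig_resp','skin','dig',
              'skin_dig_resp','resp','skin_dig')
)

# Assumption: zeros are removed up front because log10(0) is -Inf
plot_data <- subset(data, combined_ige > 0)

p <- ggplot(plot_data, aes(x = estado1, y = combined_ige, fill = sxtypes)) +
  # stackgroups + binpositions = "all": one shared stack per bin across fill groups
  geom_dotplot(binaxis = "y", stackdir = "center",
               stackgroups = TRUE, binpositions = "all",
               stackratio = 1.5, dotsize = 1.2, alpha = 0.6) +
  geom_hline(yintercept = 0.35, linetype = "dashed") +
  geom_hline(yintercept = 0.77, linetype = "dashed", colour = "red") +
  xlab("Status group") +
  ggtitle("IgE específicas combinadas") +
  scale_y_log10(labels = function(y) format(y, scientific = FALSE))
```

Printing p draws the figure. Whether the dots land exactly where they do in the uncoloured plot depends on the binning, so the result is worth comparing against the original by eye.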

Related

R: ggplot2 density plot shows wrong fill colors

I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get each color in the right place, you need to specify fill = red_color or fill = green_color (and alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), like this:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your dataframes together, reshape them into a longer format (much better suited to ggplot) and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does it answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), ggplot essentially creates a new column of data filled with the green_color value for your green_dataframe, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color value in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by adding the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data.
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
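The swap can be reproduced without drawing anything. This is a small base R sketch, assuming (as described above) that the discrete fill values are ordered by sorting and that scale_fill_manual assigns unnamed values by position. The name mapping is introduced here for illustration only.

```r
red_color <- "#FF0000"
green_color <- "#008000"

# Inside aes(), the colour strings are just discrete data values
fill_values <- c(red_color, green_color)

# Discrete values are ordered by sorting, not by the order the layers were added
sorted_values <- sort(unique(fill_values))

# values = c(red_color, green_color) is matched to the sorted values by position
mapping <- setNames(c(red_color, green_color), sorted_values)
mapping
```

Here the data value "#FF0000" ends up paired with the green colour and "#008000" with the red one, which is the swap seen in the plot.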
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) keep constants (constant color, constant alpha, etc.) outside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
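A minimal sketch of points (a) to (c) together, using only base R for the reshaping. The names long_df, variable, value and fill_colors are made up for this example. Naming the colour vector ties each colour to a data value, so the result no longer depends on sort order.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))

# (a) one long data frame: a value column plus a column naming the source
long_df <- rbind(
  data.frame(variable = "red_variable", value = red_dataframe$red_variable),
  data.frame(variable = "green_variable", value = green_dataframe$green_variable)
)

# (c) a named vector maps each data value to its colour explicitly
fill_colors <- c(red_variable = "#FF0000", green_variable = "#008000")

p <- ggplot(long_df, aes(x = value, fill = variable)) +
  geom_density(alpha = 0.5) +  # (b) the constant alpha stays outside aes()
  scale_fill_manual(
    values = fill_colors,
    breaks = c("red_variable", "green_variable"),
    labels = c("Density of red_variable", "Density of green_variable")
  ) +
  labs(x = "X value", y = "Density", fill = "Legend")
```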

how to plot probability histogram in ggplot2

I want to plot a probability histogram overlaid with a probability density curve and compare them between two groups.
My code is as follows:
ggplot(MDmedianall, aes(x= MD_median, y=..density.., fill =IDH.type )) +
geom_histogram(alpha = 0.5,binwidth = 0.00010, position = 'identity') +
geom_density( stat="density", position="identity", alpha=0.3 ) +
scale_fill_discrete(breaks=c("0","1"), labels=c("IDH wild type","IDH mutant type")) +
scale_y_continuous(labels = scales :: percent) +
ylab("Relative cumulative frequency(%)") +
xlab("MD median value")
However, the y axis is not what I want. What is the reason for that?
Also, how can I change the line style of the curves and show it inside the coloured legend keys on the right?
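A likely cause, sketched under assumptions: a density is scaled so that its area is 1, so with a bin width of 0.0001 the bar heights are on the order of thousands and do not read as percentages. Multiplying the density by the bin width gives the share of each group's observations per bin. The data below is a simulated stand-in for MDmedianall, which is not shown in the question, and bin_width is a name introduced here. Mapping linetype to the same variable with matching labels is one way to get the line style into the same legend keys.

```r
library(ggplot2)

# Hypothetical stand-in for MDmedianall (the real data is not shown)
set.seed(1)
MDmedianall <- data.frame(
  MD_median = c(rnorm(200, mean = 0.0010, sd = 0.0002),
                rnorm(200, mean = 0.0013, sd = 0.0002)),
  IDH.type = factor(rep(c("0", "1"), each = 200))
)

bin_width <- 0.0001  # bin width used in the question

p <- ggplot(MDmedianall, aes(x = MD_median, fill = IDH.type)) +
  # density * bin width = share of the group's observations in each bin
  geom_histogram(aes(y = after_stat(density * bin_width)),
                 binwidth = bin_width, alpha = 0.5, position = "identity") +
  # scale the curve by the same factor so it lines up with the bars
  geom_density(aes(y = after_stat(density * bin_width), linetype = IDH.type),
               alpha = 0.3) +
  scale_fill_discrete(breaks = c("0", "1"),
                      labels = c("IDH wild type", "IDH mutant type")) +
  scale_linetype_discrete(breaks = c("0", "1"),
                          labels = c("IDH wild type", "IDH mutant type")) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Relative frequency (%)") +
  xlab("MD median value")

# Check: within each group the bar heights add up to 1 (that is, 100%)
bars <- ggplot_build(p)$data[[1]]
group_totals <- tapply(bars$y, bars$group, sum)
```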

ggplot2 add offset to jitter positions

I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to being what I want. But I would like each point to be plotted only on top of the violin plot for the matching assay type. That is, the jittered positions of the points should be set so that the triangles are only ever on the upper teal violin plot and the circles on the bottom red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be adjusted by numeric scaled x axis:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(df$project))

overlaying plots in ggplot2

How to overlay one plot on top of the other in ggplot2 as explained in the following sentences? I want to draw the grey time series on top of the red one using ggplot2 in R (now the red one is above the grey one and I want my graph to be the other way around). Here is my code (I generate some data in order to show you my problem, the real dataset is much more complex):
install.packages("ggplot2")
library(ggplot2)
time <- rep(1:100,2)
timeseries <- c(rep(0.5,100),rep(c(0,1),50))
upper <- c(rep(0.7,100),rep(0,100))
lower <- c(rep(0.3,100),rep(0,100))
legend <- c(rep("red should be under",100),rep("grey should be above",100))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_line(aes(colour=legend, size=legend)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_colour_manual(limits=c("grey should be above","red should be under"),values = c("grey50","red")) +
scale_fill_manual(values = c(NA, "red")) +
scale_size_manual(values=c(0.5, 1.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Convert the data you are grouping on into a factor and explicitly set the order of the levels. ggplot draws the layers according to this order. Also, for readability it is a good idea to place each scale_*_manual call next to the geom it applies to.
legend <- factor(legend, levels = c("red should be under","grey should be above"))
c <- data.frame(timeseries,upper,lower,time,legend)
ggplot(c, aes(x=time, y=timeseries)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_fill_manual(values = c("red", NA)) +
geom_line(aes(colour=legend, size=legend)) +
scale_colour_manual(values = c("red","grey50")) +
scale_size_manual(values=c(1.5,0.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Note that the order of the values in each scale_*_manual call now follows the factor levels: "red" first, then "grey".
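The effect of setting the levels can be checked in base R before plotting. This is a small sketch; the names default_levels, explicit_levels and line_colours are introduced here for illustration.

```r
legend <- c(rep("red should be under", 100), rep("grey should be above", 100))

# Default: levels are sorted alphabetically, so the grey series comes first
default_levels <- levels(factor(legend))

# Explicit: the order given in `levels` is kept, so the red series comes first
explicit_levels <- levels(
  factor(legend, levels = c("red should be under", "grey should be above"))
)

# A named vector passed as `values` avoids relying on position altogether
line_colours <- c("red should be under" = "red",
                  "grey should be above" = "grey50")
```

Passing a named vector such as line_colours to scale_colour_manual(values = ...) matches colours to levels by name, so reordering the levels later does not silently swap the colours.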

How can I force ggplot's geom_tile to fill every facet?

I am using ggplot's geom_tile to do 2-D density plots faceted by a factor. Every facet's scale goes from the minimum of all the data to the maximum of all the data, but the geom_tile in each facet only extends to the range of the data plotted in that facet.
Example code that demonstrates the problem:
library(ggplot2)
data.unlimited <- data.frame(x=rnorm(500), y=rnorm(500))
data.limited <- subset(data.frame(x=rnorm(500), y=rnorm(500)), x<1 & y<1 & x>-1 & y>-1)
mydata <- rbind(data.frame(groupvar="unlimited", data.unlimited),
data.frame(groupvar="limited", data.limited))
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar)
Run the code, and you will see two facets. One facet shows a density plot of an "unlimited" random normal distribution. The second facet shows a random normal truncated to lie within a 2x2 square about the origin. The geom_tile in the "limited" facet will be confined inside this small box instead of filling the facet.
last_plot() +
scale_x_continuous(limits=c(-5,5)) +
scale_y_continuous(limits=c(-5,5))
These last three lines plot the same data with specified x and y limits, and we see that neither facet extends the tile sections to the edge in this case.
Is there any way to force the geom_tile in each facet to extend to the full range of the facet?
I think you're looking for a combination of scales = "free" and expand = c(0,0):
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar,scales = "free") +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0))
EDIT
Given the OP's clarification, here's one option via simply setting the panel background manually:
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar) +
scale_fill_gradient(low = "blue", high = "red") +
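# opts(), theme_rect() and theme_blank() are defunct in current ggplot2; use theme(), element_rect(), element_blank()
theme(panel.background = element_rect(fill = "blue"), panel.grid.major = element_blank(),
panel.grid.minor = element_blank())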
