Inserting horizontal line to line chart in ggplot2 - r

I need to create 25 plots, each from its own dataset, and insert a horizontal line into each one. The problem is that the coordinates cannot be hardcoded, because each dataset's range varies.
I need the horizontal line to always sit at the first value of the corresponding dataset.
This is the geom I tried for the line (the y-intercept is hardcoded here, which doesn't help):
+ geom_hline(yintercept=c(75,0), linetype="dotted")
I can grab the value for each line's y-intercept (it is at the same position in every dataset) with this:
dataset[1, 6]
which I could also store in a variable like this:
coord <- dataset[1, 6]
But I have not had any success bringing this together.
I tried this, with no luck:
+ geom_hline(yintercept=coord, linetype="dotted")
Example Code:
library(dplyr)
library(ggplot2)
library(tidyquant) # provides theme_tq()
a <- c(10,40,30,22)
b <- c(1,2,3,4)
df <- data.frame(a,b)
try <- df %>% ggplot(aes(x = b, y = a)) + geom_line() + scale_y_continuous(expand = c(0,0), limits = c(0, NA)) + geom_hline(yintercept=c(30,0), linetype="dotted") + theme_tq()
Thanks in advance

I don't understand what exactly is causing you trouble. If I loop through a list of dataframes, I can set the yintercept of each corresponding plot without too much trouble. Example below:
library(ggplot2)
library(patchwork)
# Split the economics dataset as an example
datasets <- split(economics, cut(seq_len(nrow(economics)), 9))
# Loop through list of dataframes, set hline to [1, 6] (drop because tibble)
plots <- lapply(datasets, function(df) {
  ggplot(df, aes(date, unemploy)) +
    geom_line() +
    scale_y_continuous(limits = c(0, NA)) +
    geom_hline(yintercept = c(df[1, 6, drop = TRUE], 0),
               linetype = "dotted")
})
# For visualisation purposes
wrap_plots(plots)
Created on 2020-12-04 by the reprex package (v0.3.0)
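One possible reason the original attempt failed is that dataset[1, 6] on a tibble returns a one-cell tibble rather than a plain number. The sketch below pulls the first value out with [[ ]] instead, which behaves the same for data frames and tibbles. The helper name plot_with_first_value and its arguments are made up for illustration, and the toy data is the one from the question.

```r
library(ggplot2)

# Hypothetical helper: line chart with dotted lines at the first y value and at 0.
plot_with_first_value <- function(data, x, y) {
  # [[ ]] returns a plain vector for both data frames and tibbles
  first_y <- data[[y]][1]
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_line() +
    scale_y_continuous(expand = c(0, 0), limits = c(0, NA)) +
    geom_hline(yintercept = c(first_y, 0), linetype = "dotted")
}

# Toy data from the question; the first value of a is 10
df <- data.frame(a = c(10, 40, 30, 22), b = c(1, 2, 3, 4))
p <- plot_with_first_value(df, x = "b", y = "a")
```

Calling the helper inside lapply() over a list of datasets would then give each plot its own line.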

Related

How to rescale color mapping in scale_color_distiller (ggplot2)?

My data consists of three numeric variables. Something like this:
set.seed(1)
df <- data.frame(x= rnorm(10000), y= rnorm(10000))
df$col= df$x + df$y + df$x*df$y
Plotting this as a heatplot looks good:
ggplot(df, aes(x, y, col= col)) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
But real variables can have some skewness or outliers and this totally changes the plot. After df$col[nrow(df)] <- 100 same ggplot code as above returns this plot:
Clearly, the problem is that this one point changes the scale and we get a plot with little information. My solution is to rank the data with rank(), which gives a reasonable color progression for any variable I've tried so far. See here:
ggplot(df, aes(x, y, col= rank(col))) + geom_point(size= 2) + scale_color_distiller(palette = "Spectral")
The problem with this solution is that the new scale (2,500 to 10,000) is shown as the color label. I want the original scale to be shown as the color label (0 to 10). Therefore, I want the color progression to correspond to the ranked data; i.e. I need to somehow map the original values to the ranked color values. Is that possible? I tried changing the limits argument to limits= c(0, 10) inside scale_color_distiller(), but this does not help.
Sidenotes: I do not want to remove the outlier. Ranking works well. I want to use scale_color_distiller(). If possible, I do not want to use any packages other than ggplot2.
Rescale the rank to the range of your original df$col.
library(tidyverse)
set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df %>%
  mutate(
    col = x + y + x * y,
    scaled_rank = scales::rescale(rank(col), range(col))
  ) %>%
  ggplot(aes(x, y, col = scaled_rank)) +
  geom_point(size = 2) +
  scale_color_distiller(palette = "Spectral")
Created on 2021-11-17 by the reprex package (v2.0.1)
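Since the question asked to avoid extra packages, here is a rough base R sketch of the same rescaling, applied to the data with the outlier. The helper rescale_rank is a made-up name and it does not handle NA values or a constant input. Note that the legend then spans the original range, but a colour at a given legend value reflects rank position, not the raw value.

```r
library(ggplot2)

# Hypothetical helper: map the ranks of x linearly onto the original range of x.
rescale_rank <- function(x) {
  r <- rank(x)
  rng <- range(x)
  (r - min(r)) / (max(r) - min(r)) * diff(rng) + rng[1]
}

set.seed(1)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))
df$col <- df$x + df$y + df$x * df$y
df$col[nrow(df)] <- 100  # the outlier from the question
df$scaled_rank <- rescale_rank(df$col)

p <- ggplot(df, aes(x, y, col = scaled_rank)) +
  geom_point(size = 2) +
  scale_color_distiller(palette = "Spectral")
```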

generating a manhattan plot with ggplot

I've been trying to generate a Manhattan plot using ggplot, which I finally got to work. However, I cannot get the points to be colored by chromosome, despite having tried several different examples I've seen online. I'm attaching my code and the resulting plot below. Can anyone see why the code is failing to color points by chromosome?
library(tidyverse)
library(vroom)
# threshold to drop really small -log10 p values so I don't have to plot millions of uninformative points. Just setting to 0 since I'm running for a small subset
min_p <- 0.0
# reading in data to brassica_df2, converting to data frame, removing characters from AvsDD p value column, converting to numeric, filtering by AvsDD (p value)
brassica_df2 <- vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD"))
brassica_df2 <- as.data.frame(brassica_df2)
brassica_df2$AvsDD <- gsub("1:2=","",as.character(brassica_df2$AvsDD))
brassica_df2$AvsDD <- as.numeric(brassica_df2$AvsDD)
brassica_df2 <- filter(brassica_df2, AvsDD > min_p)
# setting significance threshold
sig_cut <- -log10(1)
# setting ylim for graph
ylim <- (max(brassica_df2$AvsDD) + 2)
# setting up labels for x axis
axisdf <- as.data.frame(brassica_df2 %>% group_by(chromosome) %>% summarize(center=( max(position) + min(position) ) / 2 ))
# making manhattan plot of statistically significant SNP shifts
manhplot <- ggplot(data = filter(brassica_df2, AvsDD > sig_cut), aes(x=position, y=AvsDD), color=as.factor(chromosome)) +
geom_point(alpha = 0.8) +
scale_x_continuous(label = axisdf$chromosome, breaks= axisdf$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(axisdf$chromosome)))) +
geom_hline(yintercept = sig_cut, lty = 2) +
ylab("-log10 p value") +
ylim(c(0,ylim)) +
theme_classic() +
theme(legend.position = "n")
print(manhplot)
I think you just need to move your color=... argument inside the call to aes():
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD),
color=as.factor(chromosome))
becomes...
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD, color=as.factor(chromosome)))
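To see the effect in isolation, here is a small sketch on made-up data (the toy data frame and its score column are invented, not the asker's file). It also uses length.out in rep() so the palette has exactly one colour per chromosome, which is a tidier variant of the original call rather than a required change.

```r
library(ggplot2)

set.seed(42)
# Made-up data: three chromosomes with 50 positions each
toy <- data.frame(
  chromosome = rep(c("C1", "C2", "C3"), each = 50),
  position = seq_len(150),
  score = runif(150, 0, 6)
)

n_chr <- length(unique(toy$chromosome))

# color is mapped inside aes(), so scale_color_manual() has something to act on
p <- ggplot(toy, aes(x = position, y = score, color = as.factor(chromosome))) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = rep(c("#276FBF", "#183059"), length.out = n_chr)) +
  theme_classic() +
  theme(legend.position = "none")
```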

Adding entry for NA-values in continuous ggplot-legend

There is a very similar question here: Add NA value to ggplot legend for continuous data map.
I tried to understand it, but I didn't manage to make it work for my data.
So I created a super simple example. I have this data:
set.seed(1)
df = data.frame(a=rnorm(50), b=rnorm(50), c=rep(1:5, 10))
df[sample(1:50, 10), ]$c = NA
where all columns are numeric. Now I'd like to make a ggplot with a legend entry for the NA-values. When I do the following:
ggplot(df) +
geom_point(
aes(x = a, y =b, col=c)
)
This is the result
What I want is something like this (when c is a factor, it automatically gets an entry):
ggplot(df) +
geom_point(
aes(x = a, y =b, col=factor(c))
)
Could I achieve similar results more or less easily while keeping my values as class numeric?
Defining a color for NA is easy by adding scale_color_continuous(na.value="red"), but it is not explicitly labeled in the legend.
To achieve that you could add a second color scale just for the NA value using ggnewscale:
library(ggplot2)
library(ggnewscale)
set.seed(1)
df = data.frame(a=rnorm(50), b=rnorm(50), c=rep(1:5, 10))
df[sample(1:50, 10), ]$c = NA
na.value.forplot <- 'red'
ggplot(df) +
  geom_point(aes(x = a, y = b, col = c)) +
  scale_color_continuous(guide = guide_colorbar(order = 2)) +
  # start a second, independent colour scale for the NA points
  new_scale_color() +
  geom_point(data = subset(df, is.na(c)),
             aes(x = a, y = b, col = "NA")) +
  scale_color_manual(name = NULL, labels = "NA", values = na.value.forplot)
Created on 2021-03-31 by the reprex package (v1.0.0)
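If an extra package is not an option, one possible workaround with ggplot2 alone is to draw the NA points through a different aesthetic, so they get their own legend. The sketch below maps fill to a constant label and uses shape 21, which supports a fill. It is an alternative to the ggnewscale approach above, not the same technique.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(a = rnorm(50), b = rnorm(50), c = rep(1:5, 10))
df[sample(1:50, 10), ]$c <- NA

p <- ggplot(df, aes(a, b)) +
  # non-missing values keep the continuous colour bar
  geom_point(data = subset(df, !is.na(c)), aes(colour = c)) +
  # missing values use the fill aesthetic, which gets a separate legend
  geom_point(data = subset(df, is.na(c)), aes(fill = "NA"),
             shape = 21, colour = "red") +
  scale_fill_manual(name = NULL, values = c("NA" = "red"))
```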

Violin plots with additional points

Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
p + stat_summary(fun=mean, geom="point", size=2, color="red")
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can pass any function to stat_summary, provided it returns a single value, so you can use the function sample. Put extra arguments, such as size, in fun.args:
p + stat_summary(fun = "sample", geom = "point", fun.args = list(size = 1))
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
library(dplyr) # provides group_by(), sample_n() and %>%
newdf <- group_by(df, variable) %>% sample_n(10)
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
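The question asked for exactly one new point per violin. A minimal base R sketch of that case follows; it rebuilds the data with stack() instead of melt(), and the name one_per_group is made up for illustration.

```r
library(ggplot2)

set.seed(1)
wide <- data.frame(matrix(rnorm(500), ncol = 10))
df <- stack(wide)  # base R reshape: columns 'values' and 'ind'
names(df) <- c("value", "variable")

# Draw one random row from each group
one_per_group <- do.call(rbind, lapply(split(df, df$variable), function(g) {
  g[sample(nrow(g), 1), ]
}))

p <- ggplot(df, aes(x = variable, y = value)) +
  geom_violin() +
  geom_point(data = one_per_group, colour = "red", size = 2)
```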
I had a similar problem. Code below exemplifies the toy problem - How does one add arbitrary points to a violin plot? - and solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")

Plotting two variables using ggplot2 - same x axis

I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I can combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2) # provides melt()
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
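reshape2 is retired, so the same reshape could also be written with tidyr. The sketch below assumes tidyr is installed and names the long columns variable and value to match the melt() output above.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
survey <- data.frame(
  often_post = runif(10, 0, 5),
  frequent_read = 5 * rbeta(10, 1, 1),
  often_privacy = sample(10, replace = TRUE)
)

# Long format: one row per (respondent, x variable) pair
survey_long <- pivot_longer(
  survey,
  cols = c(often_post, frequent_read),
  names_to = "variable",
  values_to = "value"
)

p <- ggplot(survey_long, aes(value, often_privacy, colour = variable)) +
  geom_point() +
  geom_smooth()
```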
You can use + to add further layers to the same ggplot object, each with its own aesthetics. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
Try this:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()

Resources