ggforce facet_zoom - labels only on zoomed example - r

I would like to label points in a scatterplot, but only those within the facet_zoom panel. Here is an example:
library(ggplot2)
library(ggforce)
library(ggrepel)
library(magrittr)
labels <- letters
example_values_x <- rnorm(26)
example_values_y <- rnorm(26)
df <- data.frame(labels,
example_values_x,
example_values_y)
df %>% ggplot(aes(y = example_values_y,
x = example_values_x)) +
geom_point() +
facet_zoom(x = example_values_x > 0.5) +
geom_label_repel(data = dplyr::filter(df, example_values_x > 0.5), aes(label = labels))
Any idea how to make it so the labels don't also appear on the non-zoomed panel?

NOTE: The following answer works with the GitHub version of ggforce. As of writing this, the version that's on CRAN appears to have a different interface for facet_zoom(), even though the package version is the same.
First, take your subset of data being labeled and add a zoom column, specifying whether the data should be rendered in the zoomed panel (TRUE), the original panel (FALSE), or both (NA):
dftxt <- dplyr::filter(df, example_values_x > 0.5) %>%
dplyr::mutate( zoom = TRUE ) ## All entries to appear in the zoom panel only
You can now pass this new data frame to geom_label_repel, while telling facet_zoom() to use the zoom column to determine where the data should be drawn:
df %>% ggplot(aes(y = example_values_y,
x = example_values_x)) +
geom_point() +
facet_zoom(x = example_values_x > 0.5, zoom.data=zoom) + # Note the zoom.data argument
geom_label_repel(data = dftxt, aes(label = labels))
Note that because the original df doesn't have a zoom column, facet_zoom() will treat it as NA and draw geom_point() in both panels, as desired.
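As a self-contained variation on the same idea, the sketch below also labels the remaining points, but only in the overview panel, by giving them zoom = FALSE. The short column names x and y, the fixed seed, and the df_labels data frame are assumptions made for illustration, and it needs a ggforce version whose facet_zoom() accepts zoom.data.

```r
library(ggplot2)
library(ggforce)
library(ggrepel)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(labels = letters, x = rnorm(26), y = rnorm(26))

# Labels inside the zoomed range: zoom = TRUE -> drawn in the zoom panel only
df_zoom_labels <- transform(df[df$x > 0.5, ], zoom = TRUE)
# Labels outside the zoomed range: zoom = FALSE -> drawn in the overview panel only
df_context_labels <- transform(df[df$x <= 0.5, ], zoom = FALSE)
df_labels <- rbind(df_zoom_labels, df_context_labels)

# df itself has no zoom column, so the points appear in both panels
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_zoom(x = x > 0.5, zoom.data = zoom) +
  geom_label_repel(data = df_labels, aes(label = labels))
p
```

Dropping df_context_labels and passing only df_zoom_labels gives the behaviour asked for in the question.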

Related

ggplotly: unable to add a frame in PCA score plot in ggplot2

I would like to make a PCA score plot using ggplot2 and then convert it into an interactive plot using plotly.
What I want to do is add a frame around each group (not an ellipse via stat_ellipse, which I know works).
My problem is that when I use the sample name as the tooltip in ggplotly, the frame disappears. I don't know how to fix it.
Below is my code
library(ggplot2)
library(plotly)
library(dplyr)
## Demo data
dat <- iris[1:4]
Group <- iris$Species
## Calculate PCA
df_pca <- prcomp(dat, center = TRUE, scale. = FALSE)
df_pcs <- data.frame(df_pca$x, Group = Group)
percentage <- round(df_pca$sdev^2 / sum(df_pca$sdev^2) * 100, 2)
percentage <- paste(colnames(df_pcs), "(", paste(as.character(percentage), "%", ")", sep = ""))
## Visualization
Sample_Name <- rownames(df_pcs)
p <- ggplot(df_pcs, aes(x = PC1, y = PC2, color = Group, label = Sample_Name)) +
xlab(percentage[1]) +
ylab(percentage[2]) +
geom_point(size = 3)
ggplotly(p, tooltip = "label")
Up to here it works! You can see that the sample names are shown properly in the ggplotly plot.
Next I tried to add a frame:
## add frame
hull_group <- df_pcs %>%
dplyr::mutate(Sample_Name = Sample_Name) %>%
dplyr::group_by(Group) %>%
dplyr::slice(chull(PC1, PC2))
p2 <- p +
ggplot2::geom_polygon(data = hull_group, aes(fill = Group), alpha = 0.1)
You can see that the static plot still works! The frame is added properly.
However, when I tried to convert it to an interactive plotly plot, the frame disappeared.
ggplotly(p2, tooltip = "label")
Thanks a lot for your help.
It works if you move the data and mapping from the ggplot() call to the geom_point() call:
p2 <- ggplot() +
geom_point(data = df_pcs, mapping = aes(x = PC1, y = PC2, color = Group, label = Sample_Name), size = 3) +
ggplot2::geom_polygon(data = hull_group, aes(x = PC1, y = PC2, fill = Group, group = Group), alpha = 0.2)
ggplotly(p2, tooltip = "label")
You might want to change the order of the geom_point and geom_polygon to make sure that the points are on top of the polygon (this also affects the tooltip location).
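The reordering mentioned above might look like the following sketch, which rebuilds the iris example from the question with the polygon layer added first so the points are drawn on top of it. The name p3 and the Sample_Name column stored inside df_pcs are introduced here for illustration only; ggplot2 may warn that geom_point() does not know the label aesthetic, which is expected because it is only there for the tooltip.

```r
library(ggplot2)
library(plotly)
library(dplyr)

# Same demo data as in the question
pca <- prcomp(iris[1:4], center = TRUE, scale. = FALSE)
df_pcs <- data.frame(pca$x, Group = iris$Species, Sample_Name = rownames(iris))

# Convex hull of each group in the PC1/PC2 plane
hull_group <- df_pcs %>%
  group_by(Group) %>%
  slice(chull(PC1, PC2))

# Polygon first, points second: later layers are drawn on top
p3 <- ggplot() +
  geom_polygon(data = hull_group,
               aes(x = PC1, y = PC2, fill = Group, group = Group),
               alpha = 0.2) +
  geom_point(data = df_pcs,
             aes(x = PC1, y = PC2, color = Group, label = Sample_Name),
             size = 3)

ggplotly(p3, tooltip = "label")
```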

generating a manhattan plot with ggplot

I've been trying to generate a Manhattan plot using ggplot, which I finally got to work. However, I cannot get the points to be colored by chromosome, despite having tried several different examples I've seen online. I'm attaching my code and the resulting plot below. Can anyone see why the code is failing to color points by chromosome?
library(tidyverse)
library(vroom)
# threshold to drop really small -log10 p values so I don't have to plot millions of uninformative points. Just setting to 0 since I'm running for a small subset
min_p <- 0.0
# reading in data to brassica_df2, converting to data frame, removing characters from AvsDD p value column, converting to numeric, filtering by AvsDD (p value)
brassica_df2 <- vroom("manhattan_practice_data.txt", col_names = c("chromosome", "position", "num_SNPs", "prop_SNPs_coverage", "min_coverage", "AvsDD", "AvsWD", "DDvsWD"))
brassica_df2 <- as.data.frame(brassica_df2)
brassica_df2$AvsDD <- gsub("1:2=","",as.character(brassica_df2$AvsDD))
brassica_df2$AvsDD <- as.numeric(brassica_df2$AvsDD)
brassica_df2 <- filter(brassica_df2, AvsDD > min_p)
# setting significance threshold
sig_cut <- -log10(1)
# setting ylim for graph
ylim <- (max(brassica_df2$AvsDD) + 2)
# setting up labels for x axis
axisdf <- as.data.frame(brassica_df2 %>% group_by(chromosome) %>% summarize(center=( max(position) + min(position) ) / 2 ))
# making manhattan plot of statistically significant SNP shifts
manhplot <- ggplot(data = filter(brassica_df2, AvsDD > sig_cut), aes(x=position, y=AvsDD), color=as.factor(chromosome)) +
geom_point(alpha = 0.8) +
scale_x_continuous(label = axisdf$chromosome, breaks= axisdf$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), length(unique(axisdf$chromosome)))) +
geom_hline(yintercept = sig_cut, lty = 2) +
ylab("-log10 p value") +
ylim(c(0,ylim)) +
theme_classic() +
theme(legend.position = "none")
print(manhplot)
I think you just need to move your color=... argument inside the call to aes():
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD),
color=as.factor(chromosome))
becomes...
ggplot(
data = filter(brassica_df2, AvsDD > sig_cut),
aes(x=position, y=AvsDD, color=as.factor(chromosome)))
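To see the effect without the original file, here is a minimal sketch on made-up data. The toy data frame and its values are purely hypothetical; only the column names follow the question. With color inside aes(), it becomes a mapping that is evaluated against the data, so scale_color_manual() has something to act on.

```r
library(ggplot2)

set.seed(42)  # assumption: simulated values, not the real SNP data
toy <- data.frame(
  chromosome = rep(paste0("chr", 1:4), each = 50),
  position   = seq_len(200),
  AvsDD      = runif(200, min = 0, max = 5)
)

n_chr <- length(unique(toy$chromosome))

# color is mapped inside aes(), so each chromosome gets its own colour
p <- ggplot(toy, aes(x = position, y = AvsDD, color = as.factor(chromosome))) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = rep(c("#276FBF", "#183059"), length.out = n_chr)) +
  ylab("-log10 p value") +
  theme_classic() +
  theme(legend.position = "none")
p
```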

ggforce: geom_mark_ellipse - How to move connectors?

I have some data for which I would like to circle some different subsets. I am using ggplot2 and ggforce to plot the data and draw an ellipse (geom_mark_ellipse) around the data.
I have an issue in that the positions of the connectors on the ellipses (for my data) are in ambiguous positions (at the conjunction of two ellipses, on the border of two ellipses that graze each other).
How can I manually set the position of the connector to the ellipse? Or at least influence them into a particular region?
I have some code below which captures the spirit in which I'm plotting my data. For the purpose of the example, how could I make all of the labels appear in the top left of the plot, or all join the ellipses at x == 0, -2, -4 for each of the factors?
library(tidyverse)
library(ggforce)
x <- c(-1,0,1,-3,-2,2,3,-5,-4,4,5)
t <- c(1,1,1,2,2,2,2,3,3,3,3)
tmp <- as_tibble_col(x, column_name = "x")
tmp <- tmp %>% mutate(t = t)
#How do I move the position of the label connectors on the ellipses?
tmp %>%
ggplot(aes(x=x, y=x)) +
geom_mark_ellipse(aes(label = t, group=t),con.cap = 0) +
geom_point()
Created on 2020-05-05 by the reprex package (v0.3.0)
I've managed to do it for my contrived example. I have yet to try it on my real data, but there is hope.
As shown in the code below, I created data to fill the area (top left) that I didn't want to have labels in, and gave it a factor of "". I manually set the colour of the connectors to NA for that factor, and got rid of the label background for everything. Because the factor is "", the label is an empty string, and nothing shows up. I also set scale_colour_manual to give the colour NA to the ellipse I didn't want to see.
I also filtered the geom_point to not show the data with a factor of "". Finally, I deleted the legend.
library(tidyverse)
library(ggforce)
x <- c(-1,0,1,-3,-2,2,3,-5,-4,4,5)
t <- c(1,1,1,2,2,2,2,3,3,3,3)
tmp <- as_tibble_col(x, column_name = "x")
tmp <- tmp %>% mutate(y=x)
tmp <- tmp %>% mutate(t = t)
#now lets add some dodging data
tmp <- tmp %>% mutate(t = as.character(t))
tmp <- tmp %>% add_row(x=c(-5,2.5,-2.5), y=c(-2.5,5,2.5),t="")
tmp %>%
ggplot(aes(x=x, y=y)) +
geom_mark_ellipse(aes(label = t, group=t, colour=factor(t)),
con.cap = 0, con.colour = c(NA, "black","black","black"),
label.fill=NA) +
scale_colour_manual(values=c(NA, "black", "black", "black")) +
geom_point(data = subset(tmp, t != "")) +
theme(legend.position = "none")
Created on 2020-05-06 by the reprex package (v0.3.0)

Fill area under time series based on factor value

I am trying to fill the area under a time series line based on a factor value of 0 and 1. The area should only be filled if the value is equal to 1.
I have managed to colour code the time series line based on the factor value with the following code:
install.packages("scales")
library("scales")
library("ggplot2")
ggplot(plot.timeseries) +
geom_line(aes(x = Date, y = Price, color = Index, group = 1)) +
scale_x_date(labels = date_format("%Y"), breaks = date_breaks("years")) +
scale_colour_manual(values = c("red3", "green3"))
This provides the following graph:
I have also tried this:
ggplot(plot.timeseries, aes(x=Date, y = Price, fill=Index)) +
geom_area(alpha=0.6) +
theme_classic() +
scale_fill_manual(values=c("#999999", "#32CD32"))
which comes out as a complete mess:
Ideally the final result should look like plot1 where the parts of the line in green are filled.
The time series data can be accessed here:
https://drive.google.com/file/d/1qWsuJk41_fJZktLCAZSgfGvoDLqTt-jk/view?usp=sharing
Any help would be greatly appreciated!
Okay, here is what I did to get the graph shown below if that is what you want.
# -------------------------------------------------------------------------
# load required packages #
library(scales)
library("ggplot2")
library(dplyr)
# -------------------------------------------------------------------------
# load the data to a df #
plot.timeseries <- get(load("TimeSeries_Data.RData"))
# -------------------------------------------------------------------------
# transform the data (my_fill_color will have green and NA values)
my_object <- plot.timeseries %>%
select(Price, Index, Date) %>%
mutate(Index_ord_factor = factor(Index, levels = unique(Index), ordered=TRUE),
my_fill_color = case_when(
Index_ord_factor > 0 ~ "green" # ordered factor enables the '>' operation
))
# -------------------------------------------------------------------------
# Plot your graph using the transformed data
ggplot(my_object, mapping = aes(x=Date, y=Price)) +
geom_line(aes(color = Index, group = 1))+
geom_col(fill =my_object$my_fill_color, width = 1)
# -------------------------------------------------------------------------
Let me know if you need elaboration to understand the script. Attached is the output on my end.
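Since the original data file may not be available, the same idea can be tried on a small made-up series. The toy data frame below is hypothetical, and the condition is written as Index == "1" instead of the ordered-factor comparison used above, which is a deliberate simplification. Rows that match no case_when() condition get NA, and bars with an NA fill are not drawn.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for plot.timeseries
toy <- data.frame(
  Date  = as.Date("2020-01-01") + 0:9,
  Price = c(10, 11, 12, 13, 12, 11, 12, 14, 15, 14),
  Index = factor(c(0, 0, 1, 1, 1, 0, 0, 1, 1, 0))
)

# "green" where Index is 1, NA (no fill) everywhere else
toy_fill <- toy %>%
  mutate(my_fill_color = case_when(Index == "1" ~ "green"))

p <- ggplot(toy_fill, aes(x = Date, y = Price)) +
  geom_line(aes(color = Index, group = 1)) +
  geom_col(fill = toy_fill$my_fill_color, width = 1)
p
```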
For those who are interested, I also received this alternative solution from Erick Chacon.
You can view his tutorial here for a better understanding of the ggplot2 extension he designed, which is used in this solution.
# Installing and loading necessary packages
install.packages("remotes")
remotes::install_github("ErickChacon/mbsi")
library(mbsi)
library(ggplot2)
load("timeseries.RData")
#converting factor to numeric
plot.timeseries$Index <- as.numeric(levels(plot.timeseries$Index))[plot.timeseries$Index]
ggplot(plot.timeseries, aes(Date, Price)) +
geom_line() +
stat_events(aes(event = I(1 * (Index > 0)), fill = "Index"),
threshold = min(plot.timeseries$Price),
fill = "green", alpha = 0.3)

jitter geom_line()

Is there a way to jitter the lines in geom_line()? I know it kinda defies the purpose of this plot, but if you have a plot with few lines and would like them all to show, it could be handy. Or maybe there is some other solution to this visibility problem?
Please see below for code:
A <- c(1,2,3,5,1)
B <- c(3,4,1,2,3)
id <- 1:5
df <- data.frame(id, A, B)
# install.packages("reshape2")
require(reshape2) # for melt
dfm <- melt(df, id=c("id"))
# install.packages("ggplot2")
require(ggplot2)
p1 <- ggplot(data = dfm, aes(x = variable, y = value, group = id,
color = as.factor(id))) +
geom_line() +
labs(x = "id # 1 is hardly visible as it is covered by id # 5") +
scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'black'))
p2 <- ggplot(subset(dfm, id != 5), aes(x = variable, y = value,
group = id, color = as.factor(id))) +
geom_line() +
labs(x = "id # 5 removed, id # 1 is visible") +
scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'black'))
# install.packages("gridExtra")
require(gridExtra) # for grid.arrange
grid.arrange(p1, p2)
You can try
geom_line(position=position_jitter(w=0.02, h=0))
and see if that works well.
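Applied to the data from the question, that suggestion could look like the sketch below. The name p_jit and the fixed seed are assumptions added here; position_jitter() is random, so the seed only makes the result repeatable. With height = 0 the lines are shifted sideways only, so two identical lines such as id 1 and id 5 separate just slightly; a small non-zero height may separate them more clearly at the cost of distorting the y values a little.

```r
library(ggplot2)
library(reshape2)  # for melt

df <- data.frame(id = 1:5, A = c(1, 2, 3, 5, 1), B = c(3, 4, 1, 2, 3))
dfm <- melt(df, id = "id")

set.seed(123)  # assumption: only to make the random jitter repeatable
p_jit <- ggplot(dfm, aes(x = variable, y = value, group = id,
                         color = as.factor(id))) +
  # jitter horizontally by up to 0.02, leave the y values untouched
  geom_line(position = position_jitter(width = 0.02, height = 0)) +
  scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'black'))
p_jit
```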
If you just want to prevent two lines from overlapping exactly, there is now a better way: position_dodge(), which "adjusts position by dodging overlaps to the side". This is nicer than adding jitter to any line, even when it's not needed.
Avoid ggplot2 lines overlapping exactly using position_dodge()
Code example:
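library(ggplot2)
library(reshape2) # for melt
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df.m <- melt(df, id.vars = "x")
# the `+` must end the line, otherwise R treats the next line as a new expression
ggplot(df.m, aes(x = x, y = value, group = variable, colour = variable)) +
geom_line(position = position_dodge(width = 0.2))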
Thanks to position_dodge(), we can now see that there are two lines in the plot, which just happen to coincide exactly.
I tend to use different linestyles, so that, say, a solid blue line "peeks through" a dashed red line on top of it.
Then again, it depends on what you want to convey to the reader. Keep in mind first and foremost that data should be points and theory should be lines, unless this makes things cluttered. Unless the y and x values are identical, it'll be easier to see the points. (Or you could apply the existing jitter function to the x-values.)
Next, if you just want to show which runs are in the "bundle" and which are outliers, overlap doesn't matter because it's very unlikely that two outliers will be near-equal.
If you want to show a bunch of near-equal runs, you may prefer (which is to say, your readers will understand better) to plot the deltas against a mean rather than the actual values.
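A rough sketch of the "deltas against a mean" idea, using the values from the question written directly in long format. The delta column and the name p_delta are introduced here for illustration. Runs with identical values (ids 1 and 5 in this data) still coincide, so this helps with near-equal runs rather than exact duplicates.

```r
library(ggplot2)

# Same values as in the question, already in long format
dfm <- data.frame(
  id       = rep(1:5, times = 2),
  variable = rep(c("A", "B"), each = 5),
  value    = c(1, 2, 3, 5, 1, 3, 4, 1, 2, 3)
)

# Delta of each run from the mean of all runs at the same x position
dfm$delta <- dfm$value - ave(dfm$value, dfm$variable)

p_delta <- ggplot(dfm, aes(x = variable, y = delta, group = id,
                           colour = factor(id))) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # the mean itself
  geom_line() +
  geom_point()
p_delta
```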
I would like to suggest a solution to a different problem than described, in which the Y axis is a factor, so position_dodge does nothing.
code:
library(tidyverse)
time_raw <- tibble(year=1900:1909,
person_A=c(rep("Rome",2),rep("Jerusalem",8)),
person_B=c(rep("Jerusalem",5),rep("Rome",5)))
achievements <- tribble(~year,~who,~what,
1900,"person_A","born",
1900,"person_B","born",
1909,"person_A","died",
1909,"person_B","died",
1905,"person_A","super star",
1905,"person_B","super star")
SCALE=0.5
jitter_locations <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(place)%>%
filter(!is.na(place)) %>%
mutate(y_place=seq_along(place))
jitter_lines <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
distinct(who) %>%
mutate(y_jitter=scale(seq_along(who))*0.015)
data_for_plot <- time_raw %>%
pivot_longer(-year,names_to="who",values_to="place") %>%
filter(!is.na(place)) %>%
left_join(achievements) %>%
left_join(jitter_locations) %>%
left_join(jitter_lines)
data_for_plot %>%
ggplot(aes(x=year,y=y_place+y_jitter,color=who,group=who))+
geom_line(size=2)+
geom_hline(aes(yintercept=y_place),size=50,alpha=0.1)+
geom_point(data = . %>% filter(!is.na(what)),size=5)+
geom_label(aes(label=what),size=3,nudge_y = -0.025)+
theme_bw()+
coord_cartesian(ylim = c(min(jitter_locations$y_place)-0.5*SCALE,
max(jitter_locations$y_place)+0.5*SCALE))+
scale_y_continuous(breaks =
min(jitter_locations$y_place):max(jitter_locations$y_place),
labels = jitter_locations$place)+
scale_x_continuous(breaks =
min(data_for_plot$year):max(data_for_plot$year))+
ylab("Place")
