How to cluster a dot plot and add a dendrogram in R?

I plotted a DotPlot following these incredible steps (https://davemcg.github.io/post/lets-plot-scrna-dotplots/) from David but I could only reach the DotPlot part. I tried to calculate and plot the dendrogram and reorder my data, but I didn't succeed.
My code:
teste2 %>% filter(bacteria %in% markers) %>%
ggplot(aes(x = bacteriophage, y = bacteria, color = eop_index, size = virulence_index)) +
geom_point() +
scale_color_viridis_c(name = 'EOP') +
cowplot::theme_cowplot() +
theme(axis.line = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ylab('') +
xlab('') +
theme(axis.ticks = element_blank())
My result: DotPlot
What I need:
The DotPlot I based my results on
It's important to say that my type of data is different. I'm not working with scRNA data, so I'm adapting the code to my needs. When I apply David's code with my variable names, the result is complete nonsense and the data is not reordered.
The code:
mat <- teste2 %>%
filter(bacteria %in% markers) %>%
pivot_wider(names_from = bacteriophage, values_from = eop_index) %>%
data.frame() # make df as tibbles -> matrix annoying
mat <- mat[,-1] #drop gene column as now in rows
clust <- hclust(dist(mat %>% as.matrix())) # hclust with distance matrix
ddgram <- as.dendrogram(clust) # create dendrogram
ggtree_plot <- ggtree::ggtree(ddgram)
ggtree_plot
dotplot <- teste2 %>% filter(bacteria %in% markers) %>%
mutate(bacteria, factor(bacteria, levels = clust$labels[clust$order])) %>%
ggplot(aes(x = bacteriophage, y = bacteria, color = eop_index, size = virulence_index)) +
geom_point() +
scale_color_viridis_c(name = 'EOP') +
cowplot::theme_cowplot() +
theme(axis.line = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ylab('') +
xlab('') +
theme(axis.ticks = element_blank())
plot_grid(ggtree_plot, dotplot, nrow = 1, rel_widths = c(0.5,2), align = 'h')
The result: DotPlot with dendrogram
Here is a sample of my data: Screenshot of data
If someone could help me, I would appreciate it very much!
Thanks in advance!
Note: I tried changing the variables in different ways, but none of them produced a correct dendrogram or reordered the data in the dot plot.
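A minimal sketch of one likely fix, using made-up data in place of teste2. The column names come from the question; the values and the helper names wide, bacteria_order and plot_data are assumptions for illustration only. Compared with the code above it does three things: it keeps only the columns needed before pivot_wider() so that each bacterium ends up in a single row, it stores the bacteria names as row names of the matrix so that hclust() has labels to return, and it assigns the reordered factor with bacteria = inside mutate().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for `teste2` (assumption: one row per bacteria/phage pair)
set.seed(1)
teste2 <- expand.grid(bacteria = paste0("bac", 1:6),
                      bacteriophage = paste0("phage", 1:4),
                      stringsAsFactors = FALSE)
teste2$eop_index <- runif(nrow(teste2))
teste2$virulence_index <- runif(nrow(teste2))

# One row per bacterium, one column per phage; drop columns that are not clustered
wide <- teste2 %>%
  select(bacteria, bacteriophage, eop_index) %>%
  pivot_wider(names_from = bacteriophage, values_from = eop_index) %>%
  as.data.frame()

# Numeric matrix with the bacteria names as row names, so hclust() keeps them as labels
mat <- as.matrix(wide[, -1])
rownames(mat) <- wide$bacteria

clust <- hclust(dist(mat))
bacteria_order <- clust$labels[clust$order]

# Overwrite `bacteria` with a factor whose levels follow the clustering order
plot_data <- teste2 %>%
  mutate(bacteria = factor(bacteria, levels = bacteria_order))

dotplot <- ggplot(plot_data,
                  aes(x = bacteriophage, y = bacteria,
                      color = eop_index, size = virulence_index)) +
  geom_point() +
  scale_color_viridis_c(name = "EOP")
```

The dendrogram can then be built from the same clust object, as in the question, and placed beside dotplot with plot_grid().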

Related

How to add legend to combined ggplot2 (different values)

I need to add a legend to my combined ggplot but don't seem to succeed. The data is made up from three different values of which "targeted" and "non targeted" are in one plot and "response" in the other. I would like to add the legend at the bottom of the combined plot.
Thanks in advance
Output of my current code
A <- ggplot(placeholder) +
geom_line(aes(x=date, y= targeted), color='#0072b2', group= 1, size=1.25) +
geom_line(aes(x=date, y= non_targeted), color='#d55e00' , group= 1, size=1.25) +
theme(axis.text.x = element_text(angle = 90)) +
labs(y= "Y1", x = "Date") + theme_classic() +
theme(axis.text.x = element_text(angle = 90))
A
B <- ggplot(media_analysis) +
geom_line(aes(x=date, y= nuisance_reports), color='#f0e442', group= 1, size=1.25) +
theme(axis.text.x = element_text(angle = 90)) +
labs(y= "Y2", x = "Date") + theme_classic() +
theme(axis.text.x = element_text(angle = 90))
B
combined <- plot_grid(A, B,
labels = c("A", "B"),
ncol = 2, nrow = 1)
combined
Easiest is to use facets. This requires minor data wrangling (see comments).
suppressMessages(library(tidyverse))
set.seed(42)
foo <- cbind(data.frame(replicate(3, rnorm(30))), date = ISOdate(1,1,1:30))
# make a data frame which has all the values in one column,
# and all the dates in another
# there needs to be a facetting variable
foo %>%
pivot_longer(starts_with("X")) %>%
## combine X1 and X2 to one category
mutate(new_name = ifelse(grepl("X[1-2]", name), "A", "B")) %>%
## now facet by this new variable
ggplot() +
geom_line(aes(date, value, color = name)) +
facet_wrap(~new_name) +
## place at bottom
theme(legend.position = "bottom")
Alternatively, combine the plots with patchwork.
This requires the color guides of both plots to have the same name and limits.
library(patchwork)
p1 <-
foo %>%
pivot_longer(matches("X[1-2]")) %>%
ggplot() +
geom_line(aes(date, value, color = name)) +
## now define the same names for your color legend and get the same limits
scale_color_brewer("color", limits = paste0("X", 1:3))
p2 <- foo %>%
ggplot() +
geom_line(aes(date, X3, color = "X3")) +
scale_color_brewer("color", limits = paste0("X", 1:3))
p1 + p2 +
## combine the legends
plot_layout(guides = "collect") +
# add tags
plot_annotation(tag_levels = "A") &
## place at bottom
theme(legend.position = "bottom")
Created on 2022-06-29 by the reprex package (v2.0.1)

Add mean line to ggplot?

I currently have this plot:
current plot without mean line
I want to add a continuous line in the plot that shows the mean value of each x-axis point.
How can I do this? Here is my code:
data <- ndpdata[which(ndpdata$FC.Fill.Size==250),] #250 fill size
data$PS_DATE <- as.Date(data$PS_DATE, "%Y-%m-%d")
data$PS_DATE <- as.Date(data$PS_DATE, "%m-%d-%Y")
data$final <- paste(data$PS_DATE, data$FC.Batch.Nbr, sep=" ") %>% na.omit()
library(tidyr)
my_df_long <- gather(data, group, y, -final)
data = my_df_long[2075:2550,] %>% na.omit()
ggplot(data, aes(final, y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325))
First, for future questions, please take note of MrFlick's comment.
We could use stat_summary. x should be a factor and in a meaningful order.
I can't test this because no data was provided:
ggplot(data, aes(x=factor(final), y, color=final), na.rm=TRUE) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) + theme(legend.position = "none") + geom_point(na.rm=TRUE) +
scale_y_discrete(breaks = c(251,270,290,310,325)) +
stat_summary(fun=mean, colour="red", geom="line", aes(group = 1))
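Since the original data is not available, here is a small self-contained sketch of the same idea on made-up data. The data frame d and its columns batch and y are assumptions for illustration; y is assumed to be numeric, because a mean can only be computed for a continuous variable.

```r
library(ggplot2)

# Made-up data: three batches with five numeric measurements each
set.seed(1)
d <- data.frame(batch = factor(rep(c("b1", "b2", "b3"), each = 5)),
                y = rnorm(15, mean = 280, sd = 10))

p <- ggplot(d, aes(x = batch, y = y)) +
  geom_point() +
  # group = 1 joins the per-batch means into one continuous line
  stat_summary(fun = mean, geom = "line", colour = "red", aes(group = 1))
p
```

Without aes(group = 1) each level of the discrete x axis forms its own group, so there would be no line connecting the means.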

How to create such a figure using ggplot2 in R?

I have a matrix with many zero elements. The column names are labeled on the horizontal axis. I'd like to show the nonzero elements explicitly as the bias from the vertical line for each column.
So how should I construct a figure such as the example using ggplot2?
Example data can be generated as follows:
set.seed(2018)
N <- 5
p <- 40
dat <- matrix(0.0, nrow=p, ncol=N)
dat[2:7, 1] <- 4*rnorm(6)
dat[4:12, 2] <- 2.6*rnorm(9)
dat[25:33, 3] <- 2.1*rnorm(9)
dat[19:26, 4] <- 3.3*rnorm(8)
dat[33:38, 5] <- 2.9*rnorm(6)
colnames(dat) <- letters[1:5]
print(dat)
Here is another option using facet_wrap and geom_col with theme_minimal.
library(tidyverse)
dat %>%
as.data.frame() %>%
rowid_to_column("row") %>%
gather(key, value, -row) %>%
ggplot(aes(x = row, y = value, fill = key)) +
geom_col() +
facet_wrap(~ key, ncol = ncol(dat)) +
coord_flip() +
theme_minimal()
To further increase the aesthetic similarity to the plot in your original post we can
move the facet strips to the bottom,
rotate strip labels,
add "zero lines" in matching colours,
remove the fill legend, and
get rid of the x & y axis ticks/labels/title.
library(tidyverse)
dat %>%
as.data.frame() %>%
rowid_to_column("row") %>%
gather(key, value, -row) %>%
ggplot(aes(x = row, y = value, fill = key)) +
geom_col() +
geom_hline(data = dat %>%
as.data.frame() %>%
gather(key, value) %>%
count(key) %>%
mutate(y = 0),
aes(yintercept = y, colour = key), show.legend = F) +
facet_wrap(~ key, ncol = ncol(dat), strip.position = "bottom") +
coord_flip() +
guides(fill = "none") +
theme_minimal() +
theme(
strip.text.x = element_text(angle = 45),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
It would be much easier if you provided some sample data. I had to create some myself, so there is no guarantee that this will work for your purpose.
set.seed(123)
# creating some random sample data
df <- data.frame(id = rep(1:100, each = 3),
x = rnorm(300),
group = factor(rep(letters[1:3], each = 100)), # factor: as.numeric() and levels() below rely on it
bias = sample(0:1, 300, replace = T, prob = c(0.7, 0.3)))
# introducing bias
df$bias <- df$bias*rnorm(nrow(df))
# calculate lower/upper bias for errorbar
df$biaslow <- apply(data.frame(df$bias), 1, function(x){min(0, x)})
df$biasupp <- apply(data.frame(df$bias), 1, function(x){max(0, x)})
Then I used a bit of a hack to plot the groups far enough apart that they do not overlap: based on group, I shifted the bias variable and also its lower and upper bounds.
# I want to print groups in sufficient distance
df$bias <- as.numeric(df$group)*5 + df$bias
df$biaslow <- as.numeric(df$group)*5 + df$biaslow
df$biasupp <- as.numeric(df$group)*5 + df$biasupp
And now it is possible to plot it:
library(ggplot2)
ggplot(df, aes(x = x, col = group)) +
geom_errorbar(aes(ymin = biaslow, ymax = biasupp), width = 0) +
coord_flip() +
geom_hline(aes(yintercept = 5, col = "a")) +
geom_hline(aes(yintercept = 10, col = "b")) +
geom_hline(aes(yintercept = 15, col = "c")) +
theme(legend.position = "none") +
scale_y_continuous(breaks = c(5, 10, 15), labels = letters[1:3])
EDIT:
To incorporate special design you can add
theme_bw() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
to your plot.
EDIT2:
To incorporate several horizontal lines, you can create a separate data frame:
df2 <- data.frame(int = unique(as.numeric(df$group)*5),
gr = levels(df$group))
And use
geom_hline(data = df2, aes(yintercept = int, col = gr))
instead of copy/pasting geom_hline for each group level.

Closing the lines in a ggplot2 radar / spider chart

I need a flexible way to make radar / spider charts in ggplot2. From solutions I've found on github and the ggplot2 group, I've come this far:
library(ggplot2)
# Define a new coordinate system
coord_radar <- function(...) {
structure(coord_polar(...), class = c("radar", "polar", "coord"))
}
is.linear.radar <- function(coord) TRUE
# rescale all variables to lie between 0 and 1
scaled <- as.data.frame(lapply(mtcars, ggplot2:::rescale01))
scaled$model <- rownames(mtcars) # add model names as a variable
as.data.frame(melt(scaled,id.vars="model")) -> mtcarsm
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_path(aes(group = model)) +
coord_radar() + facet_wrap(~ model,ncol=4) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(0.8)))
which works, except for the fact that lines are not closed.
I thought that I would be able to do this:
mtcarsm <- rbind(mtcarsm,subset(mtcarsm,variable == names(scaled)[1]))
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_path(aes(group = model)) +
coord_radar() + facet_wrap(~ model,ncol=4) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(0.8)))
in order to join the lines, but this does not work. Neither does this:
closes <- subset(mtcarsm,variable == names(scaled)[c(1,11)])
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_path(aes(group = model)) +
coord_radar() + facet_wrap(~ model,ncol=4) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(0.8))) + geom_path(data=closes)
which does not solve the problem, and also produces lots of
"geom_path: Each group consist of only one observation. Do you need to
adjust the group aesthetic?"
messages. So, how do I go about closing the lines?
/Fredrik
Using the new ggproto mechanism available in ggplot2 2.0.0, coord_radar can be defined as:
coord_radar <- function (theta = "x", start = 0, direction = 1)
{
theta <- match.arg(theta, c("x", "y"))
r <- if (theta == "x")
"y"
else "x"
ggproto("CoordRadar", CoordPolar, theta = theta, r = r, start = start,
direction = sign(direction),
is_linear = function(coord) TRUE)
}
Not sure if the syntax is perfect but it is working...
The code here seems outdated for ggplot2 2.0.0.
Try my package zmisc: devtools::install_github("jerryzhujian9/ezmisc")
After you install it, you will be able to run:
df = mtcars
df$model = rownames(mtcars)
ez.radarmap(df, "model", stats="mean", lwd=1, angle=0, fontsize=0.6, facet=T, facetfontsize=1, color=id, linetype=NULL)
ez.radarmap(df, "model", stats="none", lwd=1, angle=0, fontsize=1.5, facet=F, facetfontsize=1, color=id, linetype=NULL)
If you are curious about what's inside, see my code on GitHub:
The main codes were adapted from http://www.cmap.polytechnique.fr/~lepennec/R/Radar/RadarAndParallelPlots.html
Key points of the solution:
add a duplicate of the rows for the first variable (mpg) after melt, using rbind
inherit from CoordPolar via ggproto
set is_linear = function() TRUE in the ggproto object
Setting is_linear = function() TRUE is especially important,
since without it you will get a plot like this...
With is_linear = function() TRUE set, you get:
library(dplyr)
library(data.table)
library(ggplot2)
rm(list=ls())
scale_zero_to_one <-
function(x) {
r <- range(x, na.rm = TRUE)
min <- r[1]
max <- r[2]
(x - min) / (max - min)
}
scaled.data <-
mtcars %>%
lapply(scale_zero_to_one) %>%
as.data.frame %>%
mutate(car.name=rownames(mtcars))
plot.data <-
scaled.data %>%
melt(id.vars='car.name') %>%
rbind(subset(., variable == names(scaled.data)[1]))
# create new coord : inherit coord_polar
coord_radar <-
function(theta='x', start=0, direction=1){
# input parameter sanity check
match.arg(theta, c('x','y'))
ggproto(
NULL, CoordPolar,
theta=theta, r=ifelse(theta=='x','y','x'),
start=start, direction=sign(direction),
is_linear=function() TRUE)
}
plot.data %>%
ggplot(aes(x=variable, y=value, group=car.name, colour=car.name)) +
geom_path() +
geom_point(size=rel(0.9)) +
coord_radar() +
facet_wrap(~ car.name, nrow=4) +
theme_bw() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.x = element_blank(),
legend.position = 'none') +
labs(title = "Cars' Status")
final result
Sorry, I was being stupid. This seems to work:
library(ggplot2)
# Define a new coordinate system
coord_radar <- function(...) {
structure(coord_polar(...), class = c("radar", "polar", "coord"))
}
is.linear.radar <- function(coord) TRUE
# rescale all variables to lie between 0 and 1
scaled <- as.data.frame(lapply(mtcars, ggplot2:::rescale01))
scaled$model <- rownames(mtcars) # add model names as a variable
as.data.frame(melt(scaled,id.vars="model")) -> mtcarsm
mtcarsm <- rbind(mtcarsm,subset(mtcarsm,variable == names(scaled)[1]))
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_path(aes(group = model)) +
coord_radar() + facet_wrap(~ model,ncol=4) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(0.8)))
It turns out that geom_polygon still produces a polygon in polar coordinates, so that
# rescale all variables to lie between 0 and 1
scaled <- as.data.frame(lapply(mtcars, ggplot2:::rescale01))
scaled$model <- rownames(mtcars) # add model names as a variable
# melt the dataframe
mtcarsm <- reshape2::melt(scaled)
# plot it as using the polygon geometry in the polar coordinates
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_polygon(aes(group = model), color = "black", fill = NA, size = 1) +
coord_polar() + facet_wrap( ~ model) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(0.8)),
axis.ticks.y = element_blank(),
axis.text.y = element_blank()) +
xlab("") + ylab("")
works perfectly...
Thank you all for the help, but the answers did not cover all of my needs. I wanted to compare two data series, so I took the Mazda subset of mtcars. Some points:
Nobody mentioned the order of the x variable. ggplot2 sorts this variable for the plot but does not sort the data, which made my chart wrong at the first attempt. Sorting the data first fixed it; for me that was dplyr::arrange(plot.data, x.variable.name).
I needed to annotate the chart with values. ggplot2::annotate() works fine for this, but it was not included in the other answers.
The code above did not work for my data until I added ggplot2::geom_line.
Finally, this code chunk produced my chart:
scaled <- as.data.frame(lapply(mtcars, ggplot2:::rescale01))
scaled$model <- rownames(mtcars)
mtcarsm <- scaled %>%
filter(grepl('Mazda', model)) %>%
gather(variable, value, mpg:carb) %>%
arrange(variable)
ggplot(mtcarsm, aes(x = variable, y = value)) +
geom_polygon(aes(group = model, color = model), fill = NA, size = 1) +
geom_line(aes(group = model, color = model), size = 1) +
annotate("text", x = mtcarsm$variable, y = (mtcarsm$value + 0.05), label = round(mtcarsm$value, 2), size = 3) +
theme(strip.text.x = element_text(size = rel(0.8)),
axis.text.x = element_text(size = rel(1.2)),
axis.ticks.y = element_blank(),
axis.text.y = element_blank()) +
xlab("") + ylab("") +
guides(color = guide_legend()) +
coord_radar()
Hopefully this is useful for somebody.

ggplot2: A mean row in heatmaps

Say I created a heatmap using the function geom_raster() (from ggplot2).
What's a smart way to add a row at the bottom of the table showing (in my case) the 'Mean return' for each month over the period considered?
It would be nice if some space were left between the 1985-2013 period and the row for the average, and if the font color and the cells could be customized.
The core of my code is as follows (the object molten contains my data, originally a matrix passed through the melt() function of reshape2):
hm <- ggplot(data = molten, aes(x = factor(Var2, levels = month.abb), y=Var1, fill=value)) + geom_raster()
hm <- hm + scale_fill_gradient2(low=LtoM(100), mid=Mid, high=MtoH(100))
hm <- hm + labs(fill='% Return')
hm <- hm + geom_text(aes(label=paste(sprintf("%.1f %%", value))), size = 4)
hm <- hm + scale_y_continuous(breaks = 1985:2013)
hm <- hm + xlab(label = NULL) + ylab(label = NULL)
hm <- hm + theme_bw()
hm <- hm + theme(axis.text.x = element_text(size = 10, hjust = 0, vjust = 0.4, angle=90))
It's not very concise, but I think this should do what you need.
You didn't provide a data set, so I just made some up. Also, the LtoM and MtoH functions are not included in any R package I could find, so I did a quick Google search and found them here
The following code produces a plot hm2 with facets to make the "Mean Return" row at the bottom:
require(reshape2)
require(ggplot2)
# Random data
set.seed(100)
casted = data.frame(Var1 = rep(1985:2013, times=12), Var2 = rep(month.abb, each=29), return = rnorm(12*29, 0, 9))
molten = melt(casted, id.vars = c("Var1", "Var2"))
LtoM <-colorRampPalette(c('red', 'yellow' ))
Mid <- "snow3"
MtoH <-colorRampPalette(c('lightgreen', 'darkgreen'))
# Averages
monthly.avg = cbind(Var1 = rep("Mean", 12), dcast(molten, Var2 ~ ., mean))
colnames(monthly.avg)[3] = "Mean"
molten2 = merge(molten, melt(monthly.avg), all.x = TRUE, all.y = TRUE)
# New plot
hm2 =
ggplot(data = molten2, aes(x = factor(Var2, levels = month.abb), y=Var1, fill=value)) +
geom_raster() +
scale_fill_gradient2(low=LtoM(100), mid=Mid, high=MtoH(100)) +
labs(fill='% Return') +
geom_text(aes(label=paste(sprintf("%.1f %%", value))), size = 4) +
xlab(label = NULL) + ylab(label = NULL) +
theme_bw() +
theme(axis.text.x = element_text(size = 10, hjust = 0, vjust = 0.4, angle=90)) +
facet_grid(variable ~ ., scales = "free_y", space = "free_y") + # grid layout
theme(strip.background = element_rect(colour = 'NA', fill = 'NA'), strip.text.y = element_text(colour = 'white')) # remove facet labels
which gives the following plot:
How about this:
I created a grid to mock up your data.
The main changes are to precalculate the aggregate and "spacer" data rows and add them to molten,
then add scale_y_discrete so you can label the rows,
then make sure the format works for the grey spacer bar with no % label (see the comments in the code).
It will be easier in future if you include the data (or a sample) in the question.
require(ggplot2)
molten<-expand.grid(c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),1985:2013,0)
colnames(molten)<-c("Var2","Var1","value")
molten$value=(runif(nrow(molten))*60)-30
#create means
means<-aggregate(molten[,c(1,3)], by=list(molten$Var2),FUN=mean, na.rm=TRUE)
colnames(means)<-c("Var2","Var1","value")
means$Var1<-"MEANS"
#create spacer bar
spacer<-means
spacer$Var1<-" "
spacer$value<-NA
#append them to the data
molten<-rbind(molten,spacer,means)
hm <- ggplot(data = molten, aes(x = Var2, y=Var1, fill=value)) +
geom_raster() +
# replaced your functions for ease of use
scale_fill_gradient2(low="red", mid="yellow", high="green",na.value="grey") +
labs(fill='% Return') +
# don't format the NA vals with %, return blank
geom_text(aes(label=ifelse((is.na(value)),"",paste(sprintf("%.1f %%", value)))), size = 4) +
# make the scale discrete to add labels and enforce order (use a blank space for the spacer)
scale_y_discrete(limits = c("MEANS"," ",1985:2013)) +
xlab(label = NULL) + ylab(label = NULL) +
theme_bw() +
theme(axis.text.x = element_text(size = 10, hjust = 0, vjust = 0.4, angle=90))
hm

Resources