I am working with the R programming language. I made the following graph that shows a scatterplot between points of two different colors :
library(ggplot2)
a = rnorm(10000,10,10)
b = rnorm(10000, 10, 10)
c = as.factor("red")
data_1 = data.frame(a,b,c)
a = rnorm(10000,7,5)
b = rnorm(10000, 7, 5)
c = as.factor("blue")
data_2 = data.frame(a,b,c)
final = rbind(data_1, data_2)
my_plot = ggplot(final, aes(x=a, y=b, col = c)) + geom_point() + theme(legend.position="top") + ggtitle("My Plot")
My Question: Is there a way to "change the colors of overlapping points"?
Here is what I tried so far:
1) I found the following question (Visualizing two or more data points where they overlap (ggplot R)) and tried the strategy suggested:
linecolors <- c("#714C02", "#01587A", "#024E37")
fillcolors <- c("#9D6C06", "#077DAA", "#026D4E")
# partially transparent points by setting `alpha = 0.5`
ggplot(final, aes(a,b, colour = c, fill = c)) +
geom_point(alpha = 0.5) +
scale_color_manual(values=linecolors) +
scale_fill_manual(values=fillcolors) +
theme_bw()
This shows the two different colors along with the overlap, but it is quite dark and still not clear. Is there a way to pick better colors/resolutions for this?
2) I found the following link which shows how to make color gradients for continuous variables : https://drsimonj.svbtle.com/pretty-scatter-plots-with-ggplot2 - but I have discrete colors and I do not know how to apply this
3) I found this question over here (Any way to make plot points in scatterplot more transparent in R?) which shows to do this with the base R plot, but not with ggplot2:
addTrans <- function(color,trans)
{
# This function adds transparency to a color.
# Define transparency with an integer between 0 and 255,
# 0 being fully transparent and 255 being fully visible.
# Works with either color and trans a vector of equal length,
# or one of the two of length 1.
if (length(color)!=length(trans)&!any(c(length(color),length(trans))==1)) stop("Vector lengths not correct")
if (length(color)==1 & length(trans)>1) color <- rep(color,length(trans))
if (length(trans)==1 & length(color)>1) trans <- rep(trans,length(color))
num2hex <- function(x)
{
hex <- unlist(strsplit("0123456789ABCDEF",split=""))
return(paste(hex[(x-x%%16)/16+1],hex[x%%16+1],sep=""))
}
rgb <- rbind(col2rgb(color),trans)
res <- paste("#",apply(apply(rgb,2,num2hex),2,paste,collapse=""),sep="")
return(res)
}
cols <- sample(c("red","green","pink"),100,TRUE)
# Very transparent:
plot(final$a , final$b ,col=addTrans(cols,100),pch=16,cex=1)
But this is also not able to differentiate between the two color classes that I have.
Problem: Can someone please suggest how to fix the problem with overlapping points, so that the overlap is more clearly visible?
Thanks!
I would use a density heatmap
ggplot(final, aes(x=a, y=b, col = c))+
stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE) +
scale_fill_viridis_c() +
coord_cartesian(expand = FALSE) +
geom_point(shape = '.', col = 'white')
or
ggplot(final, aes(x=a, y=b, col = c))+
stat_density_2d(aes(fill = after_stat(level)), geom = 'polygon') +
scale_fill_viridis_c(name = "density") +
geom_point(shape = '.')
or
ggplot(final, aes(x=a, y=b, col = c))+
geom_point(alpha = 0.1) +
geom_rug(alpha = 0.01)
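To see why alpha = 0.5 looked dark in the question while alpha = 0.1 leaves room for the overlap to show, it may help to work out how opacity accumulates. This is a minimal sketch, assuming the graphics device uses standard "over" blending and that all stacked points have the same colour; `stacked_opacity` is a hypothetical helper name, not part of ggplot2.

```r
# Opacity after stacking n points of one colour, each drawn with the same alpha
# (assumes standard "over" compositing)
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

stacked_opacity(0.5, 1:5)  # 0.5 0.75 0.875 0.9375 0.96875
stacked_opacity(0.1, 1:5)  # 0.1 0.19 0.271 0.3439 0.40951
```

With alpha = 0.5, three or four stacked points are already almost opaque, so dense regions all look alike. With alpha = 0.1, opacity keeps increasing over many more layers.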
I am plotting a box-plot to see the distribution of the variable. I am also interested in seeing the number of observations in each quartile. Is there any way to add the number of observations in each quartile to the boxplot along with the values of quartiles?
I included some code below which can generate box-plot with the values of quartiles.
library(ggplot2)
library(ggrepel) # provides the "label_repel" geom used below
df <- datasets::iris
boxplot <- ggplot(df, aes(x = "", y = Sepal.Length)) +
geom_boxplot(width=0.1, position = "dodge", fill = "red") +
stat_boxplot(geom = "errorbar", width = 0.1) +
stat_summary(geom = "label_repel", fun = quantile, aes(label = after_stat(y)), # fun and after_stat() need ggplot2 >= 3.3.0
position = position_nudge(x = -0.1), size = 3) +
ggtitle("") +
xlab("") +
ylab('Sepal.Length')
I expect the values of quartiles on the left-hand side of the plot and the number of observations on the right-hand side of the plot if possible.
This would be one possibility. I always prefer to keep my additional data in a separate data frame, because this gives me more control over what is calculated and how.
The counting was inspired by https://stackoverflow.com/a/54451575
quantile_counts=function(x){
df= data.frame(label=table(cut(x, quantile(x))),
label_pos=diff(quantile(x))/2+quantile(x)[1:4])
return(df)
}
df_quantile_counts=quantile_counts(df$Sepal.Length)
boxplot <- ggplot(df, aes(x = "", y = Sepal.Length)) +
geom_boxplot(width=0.1, position = "dodge", fill = "red") +
stat_boxplot(geom = "errorbar", width = 0.1) +
stat_summary(geom = "label", fun = quantile, aes(label = after_stat(y)), # fun and after_stat() need ggplot2 >= 3.3.0
position = position_nudge(x = -0.1), size = 3) +
geom_text(data=df_quantile_counts,aes(x="",y=label_pos,label = label.Freq),
position = position_nudge(x = +0.1), size = 3) +
ggtitle("") +
xlab("") +
ylab('Sepal.Length')
HTH, Tobi
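The column name label.Freq used in geom_text() comes from how data.frame() expands a table passed under the name label. A small sketch to illustrate this (`counts` is just an illustrative variable name):

```r
x <- datasets::iris$Sepal.Length

# A one-dimensional table becomes two columns, Var1 and Freq,
# each prefixed with the argument name "label"
counts <- data.frame(label = table(cut(x, quantile(x))))
names(counts)  # "label.Var1" "label.Freq"
```

label.Var1 holds the interval labels and label.Freq holds the counts.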
@TobiO's answer is correct. However, my data was rather skewed and some cut points coincided (for example, the first and second cut points were the same), so I needed to take the unique values to calculate the number of observations in each quartile. Another point concerns the cut function, which by default does not include the starting point: its intervals are (low bound, high bound]. To include the starting point, I used the cut2 function from the Hmisc package. I also added a label_pos_extension line to keep the labels from overlapping for quartiles whose cut points are very close to each other; geom_text_repel did not prevent the overlaps.
library(Hmisc) # provides cut2()
quantile_counts2 <- function(x){
label_pos_extension <- c(0,3,4,0)
q <- quantile(x)
n_unique <- length(unique(q))
if(n_unique < 5){
# some cut points coincide: work with the unique cut points only
df <- data.frame(label = table(cut2(x, g = 4)),
label_pos = c(0, diff(unique(q)) / 2 + q[seq_len(n_unique - 1)]) + label_pos_extension[seq_len(n_unique)])
} else {
df <- data.frame(label = table(cut2(x, g = 4)),
label_pos = diff(q) / 2 + q[1:4] + label_pos_extension)
}
return(df)
}
PS. I tried to put my edited function in a comment, but it did not work.
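The half-open interval behaviour of cut() mentioned above can be checked in base R on a small vector. This sketch uses the include.lowest argument of cut() rather than Hmisc::cut2(), so it is an alternative illustration and not the method used in the answer; the vector and variable names are made up for the example.

```r
x <- 1:8
q <- quantile(x)  # 1.00 2.75 4.50 6.25 8.00

# Default: intervals are (low, high], so the minimum (1) falls outside every bin
open_counts <- table(cut(x, q))
sum(open_counts)    # 7

# include.lowest = TRUE closes the first interval on the left: [1, 2.75]
closed_counts <- table(cut(x, q, include.lowest = TRUE))
sum(closed_counts)  # 8
```

Here one observation is lost without include.lowest, which is the effect described for the first quartile.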
I have data with two variables, and I want a single plot in which a heatmap of each variable is overlaid on the other, with two color scales for the two variables. My code, while not correct, should clearly indicate what I am trying to achieve.
I have looked through several examples, but none of them show how to do this for geom_tile(); it would have been easy for geom_point. I am providing a synthetic example to show what I am doing. I get the message "Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale." Evidently only the second scale_fill_gradient is used, but I would like to see both color gradients, one per variable, in the same heatmap.
It would be great if I could find a way to get this plot. Thank you!
library(reshape2)
library(ggplot2)
set.seed(2)
m1 = matrix(rnorm(100), nrow=10)
m2 = matrix(rnorm(100), nrow=10)
M1 = melt(m1)
M2 = melt(m2)
names(M1) = c("Var1", "Var2", "value1")
names(M2) = c("Var1", "Var2", "value2")
pp1 <- ggplot() +
geom_tile(data=M1, aes(x=Var1, y=Var2, fill=value1)) +
scale_fill_gradient(low="white", high="red") +
geom_tile(data=M2, aes(x=Var1, y=Var2, fill=value2)) +
scale_fill_gradient(low="blue", high="yellow")
pp1
So the legends themselves are no problem with the ggnewscale package, the problem lies in choosing the actual colours that you want to display. So let's make a new matrix with the actual colours you want to display:
library(ggnewscale)
library(scales)
r <- rescale(M1$value1)
# 1 - rescaled value because yellow should be bottom
g <- 1 - rescale(M2$value2)
# Second scale goes from yellow (low) to blue (high)
# Yellow is 100% blue, 100% green, so blue stays invariant
rgb <- rgb(r, g, 1)
# Make new matrix
M3 <- M1
M3$value1 <- rgb
And now plotting would occur as follows:
ggplot(mapping = aes(x = Var1, y = Var2)) +
# This bit is for making scales
geom_tile(data=M1, aes(fill = value1)) +
scale_fill_gradient(low = "white", high = "red") +
new_scale_fill() +
geom_tile(data=M2, aes(fill=value2)) +
scale_fill_gradient(low="yellow", high="blue") +
new_scale_fill() +
# This is the actual colours
geom_tile(data=M3, aes(fill = value1)) +
scale_fill_identity()
The legends aren't 100% accurate since ggplot mixes colours in 'Lab' space, while we've mixed colours in rgb space, but you could replace the scale_fill_gradient() with for example scale_fill_gradientn(colours = rgb(seq(0, 1, length.out = 100), 0, 0)). Also be aware that the white-to-red scale should technically be a black-to-red scale in this example.
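To make the rgb mixing step concrete, here is a small base-R sketch of what the combined colours look like at the extremes. `rescale01` is a hypothetical stand-in for scales::rescale(), and the input values are invented for illustration.

```r
# Min-max rescale to [0, 1]; a base-R stand-in for scales::rescale()
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

value1 <- c(-2, 0, 2)  # hypothetical values of the first variable
value2 <- c(-1, 0, 1)  # hypothetical values of the second variable

r <- rescale01(value1)      # red channel rises with value1
g <- 1 - rescale01(value2)  # green channel falls as value2 rises
mixed <- rgb(r, g, 1)       # blue channel held at 1, as in the answer

mixed[1]  # both variables at their minimum: "#00FFFF"
mixed[3]  # both variables at their maximum: "#FF00FF"
```

Each tile therefore encodes value1 in its red channel and value2 in its green channel, which is why the separate legends are only approximate.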
A bivariate color legend is another option. The intervals should perhaps be based on the corresponding quantiles.
library(tidyverse)
library(cowplot)
library(reshape2) # provides melt()
set.seed(2)
m1 = matrix(rnorm(100), nrow=10)
m2 = matrix(rnorm(100), nrow=10)
M1 = melt(m1)
M2 = melt(m2)
names(M1) = c("Var1", "Var2", "value1")
names(M2) = c("Var1", "Var2", "value2")
M1$value_cut <- cut(M1$value1, breaks = 3)
M2$value_cut <- cut(M2$value2, breaks = 3)
M1$value_cut2 <- M2$value_cut
M1$cuts <- paste(M1$value_cut, M1$value_cut2, sep = "-")
levels_comb <- expand.grid(lev1 = levels(M1$value_cut), lev2 = levels(M2$value_cut))
levels_comb$cuts <- paste(levels_comb$lev1, levels_comb$lev2, sep = "-")
levels_comb$filling <- c("#be64ac","#8c62aa","#3b4994","#dfb0d6","#a5add3","#5698b9","#e8e8e8","#ace4e4","#5ac8c8")
data_m <- left_join(M1, levels_comb, by = "cuts")
plot_tile <- ggplot(data_m, aes(x = Var1, y = Var2, fill = filling)) +
geom_tile() +
scale_fill_identity() +
coord_equal() +
theme_minimal()
legend_tile <- ggplot(levels_comb, aes(x = lev1, y = lev2, fill = filling)) +
geom_tile() +
scale_fill_identity() +
coord_equal() +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggdraw() +
draw_plot(plot_tile, 0, 0, 1, 1) +
draw_plot(legend_tile, .75, .4, .3, .3)
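The remark about quantile-based intervals could be sketched as follows. This is only an illustration of the binning step, assuming terciles are wanted; the variable names are made up, and the resulting factor would take the place of cut(..., breaks = 3) above.

```r
set.seed(2)
v <- rnorm(100)

# Equal-width bins, as produced by cut(v, breaks = 3)
equal_width <- cut(v, breaks = 3)

# Tercile bins: breaks at the 0%, 33.3%, 66.7% and 100% quantiles
tercile <- cut(v, breaks = quantile(v, probs = (0:3) / 3), include.lowest = TRUE)

table(equal_width)  # counts can be very uneven
table(tercile)      # roughly a third of the values in each bin
```

With quantile breaks, each of the nine colour classes in the bivariate legend is more likely to be populated.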
I find geom_col() + facet_grid() to be a useful pattern to get at your goal of visualizing the multiple values from the same area together.
There is a little set-up overhead from your starting data:
names(M1) = c("Var1", "Var2", "value")
names(M2) = c("Var1", "Var2", "value")
M1$type <- "M1"
M2$type <- "M2"
M <- rbind(M1, M2)
But the plot is straightforward. You don't really need the fill scale anymore, but I like to keep it to highlight the value changes.
ggplot(M) +
geom_col(aes(type, value, fill = value)) +
facet_grid(Var2 ~ Var1) +
scale_fill_gradient(low="blue", high="yellow")
Not sure if this is palatable for you or not, but at least you get to see an alternative viz option.
I'm struggling with ggplot (I always do). There are a number of very similar questions about forcing ggplot to include zero-value categories in legends, here and here (for example). BUT I (think I) have a slightly different requirement, and all my mucking about with scale_x_discrete and scale_fill_manual has not helped.
Requirement: As you can see, the right-hand plot has no data in the TM=5 category, so that category is missing. What I need is for the right plot to show category 5 on the axis, but obviously with no points or box.
Current Plot Script:
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
#plot
p <- ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
geom_point() +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
Attempted Solutions:
Adding drop=FALSE to the scales (suggested by @Jarretinha here) totally borks margins and x-axis labels
> plot + scale_x_discrete(drop=FALSE) + scale_fill_manual(drop=FALSE)
Following logic from here and manually setting the labels in scale_fill_manual does nothing and results in the same right-hand plot from example above.
> p + scale_fill_manual(values = c("red", "blue", "green", "purple", "pink"),
labels = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
drop=FALSE)
Playing with this logic and trying something with scale_x_discrete changes the category names on the x-axis, but the fifth is still missing AND the margins (as in attempt 1) are borked again. It is apparent that scale_x_discrete is important BUT NOT the whole answer.
> p + scale_x_discrete(limits = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"), drop=FALSE)
ANSWER for the above example, courtesy of input from @Bouncyball & @aosmith
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
plotData$TM <- factor(plotData$TM, levels=1:5) # declare all five desired levels so the empty category is kept
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
myPalette <- c('#5c9bd4','#a5a5a4','#4770b6','#275f92','#646464','#002060')
#plot
ggplot(plotData, aes(x=TM,y=Score,color=TM)) +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes(group=TM)) +
scale_colour_manual(values = myPalette, drop=FALSE) + # new line added here
scale_x_discrete(drop=FALSE) + # new line added here
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
Here's a workaround you could use:
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y))+
geom_boxplot(aes(fill = lets))+
geom_point(data = NULL, aes(x = 'e', y = 0), pch = NA)+
scale_fill_brewer(drop = F, palette = 'Set1')+
theme_bw()
Basically, we plot an "empty" point (i.e. pch = NA) so that the category shows up on the x-axis, but has no visible geom associated with it. We also define our discrete variable, lets as a factor with five levels when only four are present in the data.frame. The missing category is the letter e.
NB: You'll have to adjust the positioning of this "empty" point so that it doesn't skew your y axis.
Otherwise, you could use the result from this answer to avoid having to plot an "empty" point.
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y)) +
geom_boxplot(aes(fill = lets)) +
scale_x_discrete(drop = F) +
scale_fill_brewer(drop = F, palette = 'Set1') +
theme_bw()
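Both variants rely on the factor carrying a level that never occurs in the data. A short base-R sketch of that idea, using a toy vector invented for illustration:

```r
# Five declared levels, but only "a", "b" and "d" occur in the data
lets <- factor(c("a", "b", "a", "d"), levels = letters[1:5])

table(lets)                # all five levels are counted; "c" and "e" are 0
nlevels(lets)              # 5
nlevels(droplevels(lets))  # 3: what remains once unused levels are dropped
```

drop = FALSE asks the scale to keep the full set of levels instead of only the ones that occur, which is why the empty category still gets a position on the axis.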
I'm trying to produce a scatter plot with geom_point where the points are circumscribed by a smoothed polygon, with geom_polygon.
Here's my point data:
set.seed(1)
df <- data.frame(x=c(rnorm(30,-0.1,0.1),rnorm(30,0,0.1),rnorm(30,0.1,0.1)),y=c(rnorm(30,-1,0.1),rnorm(30,0,0.1),rnorm(30,1,0.1)),val=rnorm(90),cluster=c(rep(1,30),rep(2,30),rep(3,30)),stringsAsFactors=F)
I color each point according to the interval that df$val falls in. Here's the interval data:
intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=F)
Assigning colors and intervals to the points:
df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
if(is.na(x)){
return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)],stringsAsFactors=F))
} else{
idx <- which(intervals.df$start <= x & intervals.df$end >= x)
return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx],stringsAsFactors=F))
}
})))
Preparing the colors for the legend, which will show each interval:
df$interval <- factor(df$interval,levels=intervals.df$interval)
colors <- intervals.df$col
names(colors) <- intervals.df$interval
Here's where I constructed the smoothed polygons (using a function courtesy of this link):
clusters <- sort(unique(df$cluster))
cluster.cols <- c("#ff00ff","#088163","#ccbfa5")
splinePolygon <- function(xy,vertices,k=3, ...)
{
# Assert: xy is an n by 2 matrix with n >= k.
# Wrap k vertices around each end.
n <- dim(xy)[1]
if (k >= 1) {
data <- rbind(xy[(n-k+1):n,], xy, xy[1:k, ])
} else {
data <- xy
}
# Spline the x and y coordinates.
data.spline <- spline(1:(n+2*k), data[,1], n=vertices, ...)
x <- data.spline$x
x1 <- data.spline$y
x2 <- spline(1:(n+2*k), data[,2], n=vertices, ...)$y
# Retain only the middle part.
cbind(x1, x2)[k < x & x <= n+k, ]
}
library(data.table)
hulls.df <- do.call(rbind,lapply(1:length(clusters),function(l){
dt <- data.table(df[which(df$cluster==clusters[l]),])
hull <- dt[, .SD[chull(x,y)]]
spline.hull <- splinePolygon(cbind(hull$x,hull$y),100)
return(data.frame(x=spline.hull[,1],y=spline.hull[,2],val=NA,cluster=clusters[l],col=cluster.cols[l],interval=NA,stringsAsFactors=F))
}))
hulls.df$cluster <- factor(hulls.df$cluster,levels=clusters)
And here's my ggplot command:
library(ggplot2)
p <- ggplot(df,aes(x=x,y=y,colour=interval))+
geom_point(cex=2,shape=1,stroke=1)+
labs(x="X", y="Y")+
theme_bw()+
theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+
scale_color_manual(drop=FALSE,values=colors,name="DE")
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster),color=hulls.df$col,fill=NA)
which produces:
My question is: how do I add a legend for the polygons under the legend for the points? I want it to be a legend with 3 lines colored according to the cluster colors, with the corresponding cluster number beside each line.
Slightly different output, only changing the last line of your code, it may solve your purpose:
p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill=cluster),alpha=0.1)
Say you want to add a legend for the_factor. My basic idea is:
(1) map the_factor to an otherwise unused aesthetic: aes(xx = the_factor)
(2) if (1) changes the appearance of the plot, cancel the effect with scale_xx_manual()
(3) modify the legend with guides(xx = guide_legend(override.aes = list()))
In your case, the fill and alpha aesthetics are unused. fill is the better choice because it has no visible effect on the points (shape 1 is hollow), so I used aes(fill=as.factor(cluster)).
p <- ggplot(df,aes(x=x,y=y,colour=interval, fill=as.factor(cluster))) + # add aes(fill=...)
geom_point(cex=2, shape=1, stroke=1) +
labs(x="X", y="Y",fill="cluster") + # add fill="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
guides(fill = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
Of course, you can make the same graph by using aes(alpha = the_factor). Because alpha does change how the points are drawn, you need to control it with scale_alpha_manual().
g <- ggplot(df, aes(x=x,y=y,colour=interval)) +
geom_point(cex=2, shape=1, stroke=1, aes(alpha=as.factor(cluster))) + # add aes(alpha)
labs(x="X", y="Y",alpha="cluster") + # add alpha="cluster"
theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
scale_alpha_manual(values=c(1,1,1)) + # add
guides(alpha = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add
g <- g+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
What you are asking for is two colour scales. My understanding is that this is not possible. But you can give the impression of having two colour scales with a bit of a cheat and using the filled symbols (shapes 21 to 25).
p <- ggplot(df, aes(x = x, y = y, fill = interval)) +
geom_point(cex = 2, shape = 21, stroke = 1, colour = NA)+
labs(x = "X", y = "Y") +
theme_bw() +
theme(legend.key = element_blank(), panel.border = element_blank(), strip.background = element_blank()) +
scale_fill_manual(drop=FALSE, values=colors, name="DE") +
geom_polygon(data = hulls.df, aes(x = x, y = y, colour = cluster), fill = NA) +
scale_colour_manual(values = cluster.cols)
p
Alternatively, use a filled polygon with a low alpha
p <- ggplot(df,aes(x=x,y=y,colour=interval))+
geom_point(cex=2,shape=1,stroke=1)+
labs(x="X", y="Y")+
theme_bw() +
theme(legend.key = element_blank(),panel.border=element_blank(), strip.background=element_blank()) +
scale_color_manual(drop=FALSE,values=colors,name="DE", guide = guide_legend(override.aes = list(fill = NA))) +
geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill = cluster), alpha = 0.2, show.legend = TRUE) +
scale_fill_manual(values = cluster.cols)
p
But this might make the point colours difficult to see.
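Another route worth trying is the ggnewscale package, whose new_scale_colour() function lets layers added after it use a second colour scale. The following is a minimal sketch, assuming ggnewscale is installed. The toy data (`pts`, `grp`) and the unsmoothed convex hulls (`hulls`) are hypothetical stand-ins for df and hulls.df from the question.

```r
library(ggplot2)
library(ggnewscale)  # assumed to be installed; provides new_scale_colour()

set.seed(1)
# Toy data (hypothetical): three clusters, each point also has a category
pts <- data.frame(
  x = c(rnorm(30, -0.1, 0.1), rnorm(30, 0, 0.1), rnorm(30, 0.1, 0.1)),
  y = c(rnorm(30, -1, 0.1), rnorm(30, 0, 0.1), rnorm(30, 1, 0.1)),
  cluster = factor(rep(1:3, each = 30)),
  grp = factor(sample(c("low", "mid", "high"), 90, replace = TRUE),
               levels = c("low", "mid", "high"))
)

# Convex hull of each cluster (not smoothed, to keep the sketch short)
hulls <- do.call(rbind, lapply(split(pts, pts$cluster), function(d) {
  d[chull(d$x, d$y), c("x", "y", "cluster")]
}))

p <- ggplot(pts, aes(x, y)) +
  geom_point(aes(colour = grp), shape = 1, stroke = 1) +
  scale_colour_manual(values = c(low = "#436CE8", mid = "#D3D3D3", high = "#C74747"),
                      name = "DE") +
  new_scale_colour() +  # layers below get their own colour scale and legend
  geom_polygon(data = hulls, aes(colour = cluster, group = cluster), fill = NA) +
  scale_colour_manual(values = c("#ff00ff", "#088163", "#ccbfa5"), name = "cluster") +
  theme_bw()
p
```

The point colours keep their own legend and the hull outlines get a second one, so neither the filled-shape trick nor a translucent fill is needed.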