Non-missing rows were removed in geom_point in ggplot - r

Why are the rows in this data reported as missing and removed from the plot even though the x values are within the range of the x-scale? I have tried adding xlim without success. What am I missing here? In the resulting figure, Gp2 (geom_point) isn't included in the plot. The code I used is as follows:
df1 <- data.frame(x=c(2,4:8),
y=c(1.030928,4.123711,3.092784,8.247423,9.278351,4.123711))
df2 <- data.frame(x=3:8,
y=c(1.700680,1.360544,4.081633,3.401361,3.061224,9.183673))
require(ggplot2)
ggplot(NULL, aes(x=x, y=y)) +
geom_bar(data = df1, aes(fill="Gp1", shape="Gp1"),
stat= "identity") +
geom_point(data = df2, stat= "identity", size = 5,
aes(shape="Gp2", fill="Gp2")) +
ylab("%") + xlab("grades") +
ggtitle("Test figure") +
scale_shape_manual(values = c(23, NA)) +
scale_fill_manual(values = c("#6699CC","#000099")) +
guides(fill = guide_legend(reverse = TRUE),
shape = guide_legend(override.aes = list(shape=0), reverse = TRUE))
This gives warning message:
Removed 6 rows containing missing values (geom_point).

Running your code piece by piece, we can easily find that scale_shape_manual is the culprit here; everything before that works fine. (If you had made a minimal example, you would have easily found that.)
You have told ggplot that the shape of every point in geom_point should be "Gp2", which is the second value on the shape scale. So it looks at the second entry in values and finds an NA there. In other words, you yourself told ggplot to give NA shapes to all points, and points with an NA shape are dropped with the "missing values" warning.
(Note that you mapped shape Gp1 in geom_bar, but geom_bar doesn't take that aesthetic.)
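A minimal sketch of one possible fix, assuming the goal is simply to draw the Gp2 points as filled diamonds: drop the unused shape mapping from the bar layer and give "Gp2" a non-NA shape by name. The named values and the use of geom_col are choices made for this illustration, not part of the original code.

```r
library(ggplot2)

df1 <- data.frame(x = c(2, 4:8),
                  y = c(1.030928, 4.123711, 3.092784, 8.247423, 9.278351, 4.123711))
df2 <- data.frame(x = 3:8,
                  y = c(1.700680, 1.360544, 4.081633, 3.401361, 3.061224, 9.183673))

p <- ggplot(NULL, aes(x = x, y = y)) +
  # bars only use fill; no shape mapping here
  geom_col(data = df1, aes(fill = "Gp1")) +
  geom_point(data = df2, aes(shape = "Gp2", fill = "Gp2"), size = 5) +
  # naming the values ties shape 23 to "Gp2" regardless of position
  scale_shape_manual(values = c(Gp2 = 23)) +
  scale_fill_manual(values = c(Gp1 = "#6699CC", Gp2 = "#000099")) +
  labs(x = "grades", y = "%", title = "Test figure")

# Inspect the computed data of the point layer (layer 2)
point_data <- layer_data(p, 2)
```

Checking point_data$shape is a quick way to confirm that no point ends up with an NA shape before printing the plot.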

Related

Colour UMAP based on expression of multiple genes in ggplot2

I was just wondering if anybody had any experience with coloring something like a UMAP made in ggplot based on the expression of multiple genes at the same time? What I want to do is something like the blend function in Seurat featureplots, but with 3 genes / colors instead of 2.
I'm looking to make something like this:
Where the colors for the genes combine where there is overlap.
What I've gotten to so far is
ggplot(FD, vars = c("UMAP_1", "UMAP_2", "FOSL2", "JUNB", "HES1"), aes(x = UMAP_1, y = UMAP_2, colour = FOSL2)) +
geom_point(size=0.3, alpha=1) +
scale_colour_gradientn(colours = c("lightgrey", colour1), limits = c(0, 0.3), oob = scales::squish) +
new_scale_color() +
geom_point(aes(colour = JUNB), size=0.3, alpha=0.7) +
scale_colour_gradientn(colours = c("lightgrey", colour2), limits = c(0.1, 0.2), oob = scales::squish) +
new_scale_color() +
geom_point(aes(colour = HES1), size=0.3, alpha=0.1) +
scale_colour_gradientn(colours = c("lightgrey", colour3), limits = c(0, 0.3), oob = scales::squish)
Where FD is a data frame containing the information from the Seurat object for the UMAP coordinates and the expression levels of the three genes of interest. All I can get is a plot where the points from one layer obscure those below it. I've tried messing around with the colours, gradients, alpha and scales, but I'm guessing I'm doing it the wrong way.
If anyone knows of a way to make this work or has any suggestions on something else to try that would be very much appreciated.
There is no 'vanilla' way of doing this in ggplot2. One can precalculate the blended colours and append invisible layers and scales with the ggnewscale package.
Let's pretend, for reproducibility purposes, that we want to make a UMAP of the iris dataset, using its sepal and petal measurements as 'genes'.
library(ggplot2)
library(scales)
library(ggnewscale)
#> Warning: package 'ggnewscale' was built under R version 4.1.1
# Calculate a UMAP
umap <- uwot::umap(iris[, 1:4])
# Combine with original data and blended colours
df <- cbind.data.frame(
setNames(as.data.frame(umap), c("x", "y")),
iris,
colour = rgb(
rescale(iris$Sepal.Length),
rescale(iris$Sepal.Width),
rescale(iris$Petal.Length)
)
)
ggplot(df, aes(x, y, colour = colour)) +
geom_point() +
scale_colour_identity() +
new_scale_colour() +
# shape = NA --> invisible layers
geom_point(aes(colour = Sepal.Length), shape = NA) +
scale_colour_gradient(low = "black", high = "red") +
new_scale_colour() +
geom_point(aes(colour = Sepal.Width), shape = NA) +
scale_colour_gradient(low = "black", high = "green") +
new_scale_colour() +
geom_point(aes(colour = Petal.Length), shape = NA) +
scale_colour_gradient(low = "black", high = "blue")
#> Warning: Removed 150 rows containing missing values (geom_point).
#> Warning: Removed 150 rows containing missing values (geom_point).
#> Warning: Removed 150 rows containing missing values (geom_point).
On the more experimental side of things, I have a package on github that has related functionality.
library(ggchromatic) # devtools::install_github("teunbrand/ggchromatic")
ggplot(df, aes(x, y, colour = rgb_spec(Sepal.Length, Sepal.Width, Petal.Length))) +
geom_point()
Created on 2021-10-18 by the reprex package (v2.0.1)
A small sidenote: a plot becomes very hard to interpret when some attributes of the data are mapped to different colour channels.
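To see what the precalculated blend does, here is a small base-R sketch with made-up expression values (the data frame expr and the helper rescale01 are hypothetical and stand in for scales::rescale). Each gene is rescaled to the 0-1 range and used as one channel of rgb().

```r
# Hypothetical expression values for three genes in four cells
expr <- data.frame(
  g1 = c(0, 10, 10, 0),
  g2 = c(0, 0, 5, 5),
  g3 = c(0, 0, 0, 2)
)

# Rescale a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# One colour per cell: g1 -> red, g2 -> green, g3 -> blue channel
blended <- rgb(rescale01(expr$g1), rescale01(expr$g2), rescale01(expr$g3))
blended
# "#000000" "#FF0000" "#FFFF00" "#00FFFF"
```

A cell expressing only the first gene is pure red, while a cell expressing the first two genes is yellow, which is the overlap behaviour asked for in the question.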

ggplot2 geom_points won't colour or dodge

So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see, the bars are nicely separated and colored in the desired colors. However, my points are all uncolored and stacked on top of each other. I would like the points to be above their designated bar and in the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things, but I can't figure out why it works for geom_col and not for geom_point.
Thanks in advance
The two basic problems you have are the color error and the points not dodging. The color error comes from geom_point(): fill = factor(Species1) is set outside aes(), so ggplot2 tries to read the species names as literal color names. Mapping it inside aes() sends it through scale_fill_manual instead. The dodging is solved by applying the group= aesthetic.
You'll see the answer to these two questions using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <-data.frame(year, species, values, pt.value)
I made the "values" set for my column heights, and I wanted to use a different y aesthetic for the points for illustrative purposes, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year is numeric, so it's best to change it into either Date format (more trouble than it's worth here) or a factor, since "2017.5" isn't going to make much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
For the plot, I'll try to create it similar to yours. In scale_fill_manual, the values= argument below is written as a named list; a named character vector (c(name1 = color1, name2 = color2, ...)), as in your code, works just as well, so that part of your code was not the problem. Note that the points in this first version have no fill set at all, which is why the error does not appear.
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. The best reasoning here I can give you is that for columns and barplots, dodging is sort of "built-in" to the geom, since the default position is "stack" and "dodge" represents an alternative method to draw the geom. For points, text, labels, and others, the default position is "identity" and you have to be more explicit in how they are going to dodge or they just don't dodge at all.
Basically, we need to let the points know what they are dodging based on. Is it "species"? With geom_col, the fill=species mapping already splits the data into groups, but the geom_point layer has no such mapping, so you need to specify it. We do that with the group= aesthetic, which lets geom_point know what to use as the criterion for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
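As a variant of the example above, the following sketch also colours the points, assuming that is the desired result. fill is mapped inside aes() in geom_point, and values= is given as a named character vector; the object names dodge and pal are made up for this illustration.

```r
library(ggplot2)

df <- data.frame(
  year     = c(rep(2017, 4), rep(2018, 4)),
  species  = rep(c("things", "things1", "wee beasties", "ew"), 2),
  values   = c(10, 5, 5, 4, 60, 10, 25, 7),
  pt.value = c(8, 7, 10, 2, 43, 12, 20, 10)
)

# Shared dodge so bars and points line up
dodge <- position_dodge(width = 0.62)
pal <- c("ew" = "skyblue1", "things" = "dodgerblue",
         "things1" = "midnightblue", "wee beasties" = "gray")

p <- ggplot(df, aes(x = factor(year), y = values, group = species)) +
  geom_col(aes(fill = species), position = dodge, width = 0.6) +
  # fill is mapped inside aes(), so it goes through scale_fill_manual
  geom_point(aes(y = pt.value, fill = species), shape = 24, size = 3,
             position = dodge) +
  scale_fill_manual(values = pal) +
  theme_bw() + labs(x = "Year")

# Computed data of the point layer: dodged x positions and resolved fills
point_data <- layer_data(p, 2)
```

Shape 24 is a fillable triangle, so the mapped fill shows up inside each point while the outline stays black.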

Stat summary for each factor in scatter plot ggplot2: What about fun.x, fun_y combinations?

I have a bunch of data for people touching bacteria for up to 5 touches. I'm comparing how much they pick up with and without gloves. I'd like to plot the mean by the factor NumberContacts and colour it red. E.g. the red dots on the following graphs.
So far I have:
require(tidyverse)
require(reshape2)
Make some data
df<-data.frame(Yes=rnorm(n=100),
No=rnorm(n=100),
NumberContacts=factor(rep(1:5, each=20)))
Calculate the mean for each group= NumberContacts
centroids<-aggregate(data=melt(df,id.vars ="NumberContacts"),value~NumberContacts+variable,mean)
Get them into two columns
centYes<-subset(centroids, variable=="Yes",select=c("NumberContacts","value"))
centNo<-subset(centroids, variable=="No",select="value")
centroids<-cbind(centYes,centNo)
colnames(centroids)<-c("NumberContacts","Gloved","Ungloved")
Make an ugly plot.
ggplot(df,aes(x=Yes,y=No))+
geom_point()+
geom_abline(slope=1,linetype=2)+
stat_ellipse(type="norm",linetype=2,level=0.975)+
geom_point(data=centroids,aes(x=Gloved,y=Ungloved),size=5,color='red')+
#stat_summary(fun.y="mean",colour="red")+ doesn't work
facet_wrap(~NumberContacts,nrow=2)+
theme_classic()
Is there a more elegant way by using stat_summary? Also, how can I change the look of the boxes at the top of my graphs?
stat_summary is not an option because (see ?stat_summary):
stat_summary operates on unique x
That is, while we can take a mean of y, x remains fixed. But we may do something else that is very concise:
ggplot(df, aes(x = Yes, y = No, group = NumberContacts)) +
geom_point() + geom_abline(slope = 1, linetype = 2)+
stat_ellipse(type = "norm", linetype = 2, level = 0.975)+
geom_point(data = df %>% group_by(NumberContacts) %>% summarise_all(mean), size = 5, color = "red")+
facet_wrap(~ NumberContacts, nrow = 2) + theme_classic() +
theme(strip.background = element_rect(fill = "black"),
strip.text = element_text(color = "white"))
which also shows that to modify the boxes above you want to look at strip elements of theme.
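If you prefer to keep the centroids as a separate data frame, a base-R sketch (assuming the same simulated df as in the question; set.seed is added here only for reproducibility) can compute both group means in one aggregate() call instead of the melt/subset/cbind steps:

```r
set.seed(42)
df <- data.frame(Yes = rnorm(n = 100),
                 No = rnorm(n = 100),
                 NumberContacts = factor(rep(1:5, each = 20)))

# Mean of Yes and No within each level of NumberContacts
centroids <- aggregate(cbind(Yes, No) ~ NumberContacts, data = df, FUN = mean)
centroids
```

Because the result keeps the column names Yes and No, it can be passed as data = centroids to a geom_point() layer that inherits the x = Yes, y = No mapping.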

How can I add the legend for a line made with stat_summary in ggplot2?

Say I am working with the following (fake) data:
var1 <- runif(20, 0, 30)
var2 <- runif(20, 0, 40)
year <- c(1900:1919)
data_gg <- cbind.data.frame(var1, var2, year)
I melt the data for ggplot:
library(reshape2) # provides melt()
data_melt <- melt(data_gg, id.vars='year')
and I make a grouped barplot for var1 and var2:
plot1 <- ggplot(data_melt, aes(as.factor(year), value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")+
xlab('Year')+
ylab('Density')+
theme_light()+
theme(panel.grid.major.x=element_blank())+
scale_fill_manual(values=c('goldenrod2', 'firebrick2'), labels=c("Var1",
"Var2"))+
theme(axis.title = element_text(size=15),
axis.text = element_text(size=12),
legend.title = element_text(size=13),
legend.text = element_text(size=12))+
theme(legend.title=element_blank())
Finally, I want to add a line showing the cumulative sum (Var1 + Var2) for each year. I manage to make it using stat_summary, but it does not show up in the legend.
plot1 + stat_summary(fun.y = sum, aes(as.factor(year), value, colour="sum"),
group=1, color='steelblue', geom = 'line', size=1.5)+
scale_colour_manual(values=c("sum"="blue"))+
labs(colour="")
How can I make it so that it appears in the legend?
Without being a ggplot2 expert: the one thing you need to change in your code is to remove the color argument that sits outside aes() in the stat_summary call.
stat_summary(fun.y = sum, aes(as.factor(year), value, col="sum"), group=1, geom = 'line', size=1.5)
Apparently, the color argument outside the aes function (so defining color as an argument) overrides the aesthetics mapping. Therefore, ggplot2 cannot show that mapping in the legend.
As far as the group argument is concerned, it is used to connect the points to make the line; you can read the details here: ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But it is not necessary to add it inside the aes call. In fact, if you leave it outside, the chart will not change.
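Put together, a self-contained sketch might look like the following. It assumes ggplot2 3.3.0 or later, where the fun argument replaces the deprecated fun.y, and it builds the long data frame by hand (data_long is a name made up here) so that reshape2 is not needed.

```r
library(ggplot2)

set.seed(1)
var1 <- runif(20, 0, 30)
var2 <- runif(20, 0, 40)

# Long format built by hand instead of melt()
data_long <- data.frame(
  year     = rep(1900:1919, times = 2),
  variable = rep(c("var1", "var2"), each = 20),
  value    = c(var1, var2)
)

p <- ggplot(data_long, aes(factor(year), value)) +
  geom_col(aes(fill = variable), position = "dodge") +
  # colour is mapped inside aes() and not set outside it, so it gets a legend
  stat_summary(aes(colour = "sum", group = 1), fun = sum, geom = "line") +
  scale_colour_manual(values = c(sum = "blue")) +
  labs(x = "Year", y = "Density", colour = NULL, fill = NULL)

# Computed data of the summary line (layer 2)
line_data <- layer_data(p, 2)
```

The line colour is then controlled through scale_colour_manual rather than through a fixed color argument.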

Add a box for the NA values to the ggplot legend for a continuous map

I have got a map with a legend gradient and I would like to add a box for the NA values. My question is really similar to this one and this one. Also I have read this topic, but I can't find a "nice" solution somewhere or maybe there isn't any?
Here is a reproducible example:
library(ggplot2)
map <- map_data("world")
map$value <- setNames(sample(-50:50, length(unique(map$region)), TRUE),
unique(map$region))[map$region]
map[map$region == "Russia", "value"] <- NA
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black")
So I would like to add a black box for the NA value for Russia. I know I can replace the NAs with a number so that they appear in the gradient, and I think I can write a workaround like the following, but none of these workarounds seems like a clean solution to me, and I would also like to avoid "senseless" warnings:
ggplot() +
geom_polygon(data = map,
aes(long, lat, group = group, fill = value)) +
scale_fill_gradient2(low = "brown3", mid = "cornsilk1", high = "turquoise4",
limits = c(-50, 50),
na.value = "black") +
geom_point(aes(x = -100, y = -50, size = "NA"), shape = NA, colour = "black") +
guides(size = guide_legend("NA", override.aes = list(shape = 15, size = 10)))
Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 1 rows containing missing values (geom_point).
One approach is to split your value variable into a discrete scale. I have done this using cut(). You can then use a discrete color scale where "NA" is one of the distinct color labels. I have used scale_fill_brewer(), but there are other ways to do this.
map$discrete_value = cut(map$value, breaks=seq(from=-50, to=50, length.out=8), include.lowest=TRUE)
p = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=discrete_value)) +
scale_fill_brewer(palette="RdYlBu", na.value="black") +
coord_quickmap()
ggsave("map.png", plot=p, width=10, height=5, dpi=150)
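One detail worth checking when binning with cut(): by default the intervals are open on the left, so a value equal to the lowest break becomes NA and would be drawn in the na.value color as if it were missing. The short base-R sketch below, using a few made-up values, shows the difference that include.lowest = TRUE makes.

```r
breaks <- seq(from = -50, to = 50, length.out = 8)
x <- c(-50, 0, NA)

# Default: intervals are (a, b], so -50 falls outside the first bin
default_bins <- cut(x, breaks = breaks)
is.na(default_bins)
# TRUE FALSE TRUE

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(x, breaks = breaks, include.lowest = TRUE)
is.na(closed_bins)
# FALSE FALSE TRUE
```

With the closed first interval, only genuinely missing values remain NA.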
Another solution
Because the original poster said they need to retain the color gradient scale and the colorbar-style legend, I am posting another possible solution. It has 3 components:
We need to trick ggplot into drawing a separate color scale by using aes() to map something to color. I mapped a column of empty strings using aes(colour="").
To ensure that we do not draw a colored boundary around each polygon, I specified a manual color scale with a single possible value, NA.
Finally, guides() along with override.aes is used to ensure the new color legend is drawn as the correct color.
p2 = ggplot() +
geom_polygon(data=map, aes(long, lat, group=group, fill=value, colour="")) +
scale_fill_gradient2(low="brown3", mid="cornsilk1", high="turquoise4",
limits=c(-50, 50), na.value="black") +
scale_colour_manual(values=NA) +
guides(colour=guide_legend("No data", override.aes=list(colour="black")))
ggsave("map2.png", plot=p2, width=10, height=5, dpi=150)
It's possible, but I did it years ago. You can't use guides. You have to set individually the continuous scale for the values as well as the discrete scale for the NAs. This is what the error is telling you and this is how ggplot2 works. Did you try using both scale_continuous and scale_discrete since your set up is rather awkward, instead of simply using guides which is basically used for simple plot designs?
