How to add labels to facet_wrap(ed) geom_count plot? - r

I am creating a faceted plot using facet_wrap. I want text labels to be included inside each bubble. Instead, it seems the total is used as the label: all panels show the same numbers but different bubble sizes (the sizes are correct).
My code:
Category1 <- c('A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B')
Category2 <- c('W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V')
Class <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
df <- data.frame(Category1, Category2, Class)
library(ggplot2)
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) + scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
g2 <- g + geom_text(data=ggplot_build(g)$data[[1]], aes(x, y, label=n), size=2) #+ scale_size(range = c(5, 15))
g2
I expect the text inside each bubble to show the count that determines its size. But the actual result is that all panels show the same numbers. I want a small bubble to carry a correspondingly small number.

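The problem is that the data returned by ggplot_build does not carry the faceting variable: it contains x, y, n and a PANEL column, but no Class column. Without Class, geom_text cannot assign the labels to facets, so every label is drawn in every panel. You need to create the count data beforehand and use it for plotting.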
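A quick way to check this is to inspect the data ggplot2 computes for the geom_count layer. The sketch below is only illustrative: it rebuilds the question's data with rep(), uses layer_data() (which returns the same data frame as ggplot_build(g)$data[[1]]), and the name built is made up for the example.

```r
library(ggplot2)

# Same values as the vectors in the question, written with rep()
Category1 <- rep(c("A", "B", "C"), length.out = 20)
Category2 <- rep(c("W", "V"), length.out = 20)
Class <- rep(1:4, length.out = 20)
df <- data.frame(Category1, Category2, Class)

g <- ggplot(df, aes(Category1, Category2)) +
  facet_wrap(~ Class, nrow = 3) +
  geom_count(col = "tomato3", show.legend = FALSE)

# Data computed for layer 1 (the geom_count layer)
built <- layer_data(g, 1)

# The counts (n) and the panel index (PANEL) are present ...
c("n", "PANEL") %in% names(built)
# ... but the faceting variable is not, so a layer fed with this data
# cannot be split by Class and ends up in every panel
"Class" %in% names(built)
```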
Create count data
library(tidyverse)
df_count <- df %>%
count(Class, Category1, Category2)
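If you would rather not load the tidyverse, an equivalent table can be built in base R. This is a sketch of an alternative, not part of the original answer; df_count_base is an illustrative name.

```r
# Same toy data as in the question
df <- data.frame(
  Category1 = rep(c("A", "B", "C"), length.out = 20),
  Category2 = rep(c("W", "V"), length.out = 20),
  Class = rep(1:4, length.out = 20)
)

# Add a column of ones and sum it within each
# Class / Category1 / Category2 combination
df_count_base <- aggregate(n ~ Class + Category1 + Category2,
                           data = transform(df, n = 1),
                           FUN = sum)

# One row per combination that occurs in df; the counts add up to nrow(df)
nrow(df_count_base)
sum(df_count_base$n) == nrow(df)
```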
Plot
There are two ways to incorporate this new data.
Method 1
The first example I show is to use both df and df_count. This method will modify your code minimally:
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) +
geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
The line geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) + is added.
Method 2
This method uses only the count data. It uses geom_point() instead of geom_count() and sets the point size from the variable n. This method is probably better in terms of code readability.
g_alternative <- ggplot(df_count, aes(Category1, Category2, label = n)) +
facet_wrap(Class ~ ., nrow = 3) +
geom_point(col="tomato3", aes(size = n), show.legend=F) +
geom_text() +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g_alternative

Related

Independent colouring of points by category and contours by height in ggplot

The following sample of R code displays contour levels and the data points used in generating the contours.
n <- 10
x <- c(rnorm(n,-1,0.5), rnorm(n,1,0.5))
y <- c(rnorm(n,-1,1), rnorm(n,1,0.5))
df <- data.frame(x,y)
# categorise the points
df$cat <- sample(c(1,2), n, replace=T)
library(ggplot2)
p <- ggplot(df)
# for manual colouring of points, but not showing contours due to error
#p <- p + geom_point(aes(x=x,y=y,col=factor(cat)))
#cols <- c("1"="red", "2"="blue")
#p <- p + scale_color_manual(values=cols)
# this works fine except I am not controlling the colours
p <- p + geom_point(aes(x=x,y=y,col=cat))
p <- p + geom_density2d(aes(x=x,y=y,color=..level..))
print(p)
I am able to colour the points according to their binary category (see commented out code above) manually if I do not display the contours, but adding the contours results in a "Continuous value supplied to discrete scale" error.
Various attempts have failed.
The question: Is it possible to colour the points (according to category) and independently colour the contour levels (according to height)?
You can try
library(tidyverse)
df %>%
ggplot(aes(x=x,y=y)) +
stat_density_2d(aes(fill = ..level..), geom = "polygon") +
geom_point(aes(color=factor(cat)), size=5) +
theme_bw()
Or switch to a point shape that supports fill, such as shape=21, so that fill colours the points and colour stays free for the contours:
df %>%
ggplot(aes(x=x,y=y)) +
geom_density2d(aes(color=..level..))+
geom_point(aes(fill=factor(cat)),color="black",shape=21, size=5) +
theme_bw() +
scale_fill_manual(values = c(2,4)) +
scale_color_continuous(low = "green", high = "orange")
Alternatively, use scale_color_gradientn(colours = rainbow(10)) in place of scale_color_continuous.
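The original attempt fails because a plot has a single colour scale, and it cannot be discrete (for factor(cat)) and continuous (for the contour level) at the same time. Both suggestions above sidestep this by moving one of the two mappings to fill. As a hedged sketch of the second variant for ggplot2 3.3.0 or later, where the ..level.. notation is deprecated in favour of after_stat(level) (the seed, the red/blue values and the name p are illustrative choices):

```r
library(ggplot2)

set.seed(42)
n <- 10
df <- data.frame(
  x = c(rnorm(n, -1, 0.5), rnorm(n, 1, 0.5)),
  y = c(rnorm(n, -1, 1), rnorm(n, 1, 0.5)),
  cat = sample(c(1, 2), 2 * n, replace = TRUE)
)

p <- ggplot(df, aes(x = x, y = y)) +
  # contours use the continuous colour scale
  geom_density_2d(aes(colour = after_stat(level))) +
  # points use the discrete fill scale (shape 21 has a fill)
  geom_point(aes(fill = factor(cat)), colour = "black", shape = 21, size = 5) +
  scale_fill_manual(values = c("1" = "red", "2" = "blue")) +
  scale_colour_gradient(low = "green", high = "orange") +
  theme_bw()
p
```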

ggplot gantt chart - consistent space between lines

I have two data frames, one larger (10 people) and one smaller (two people). I have generated a gantt chart for each data frame. How do I make the distance between lines the same in each plot (i.e. not scaled based on the number of entries)?
# Generate vectors:
name <- paste("person", seq(10), sep = '_')
start <- sample(seq(5), size = 10, replace = T)
end <- sample(seq(6,10), size = 10, replace = T)
# Generate data frames:
big_chart <- data.frame(name = c(name,name), value = c(start,end))
small_chart <- big_chart[c(1:2,11:12),]
# big plot
library(ggplot2)
ggplot(big_chart, aes(value, name)) +
geom_line()
# small plot
ggplot(small_chart, aes(value, name)) +
geom_line()
Below is one solution. It uses coord_fixed to fix the aspect ratio between the two axes, so that one step on the y-axis always takes the same space relative to one unit on the x-axis, however many people are plotted. I also fixed the x-axis range with xlim.
library(ggplot2)
ggplot(big_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)
ggplot(small_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)
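Because both plots need exactly the same settings for the spacing to match, it may be convenient to wrap the calls in a small helper. This is only a sketch: plot_gantt is a hypothetical name (not a ggplot2 function), and the default ratio of 0.5 and x-range of 0 to 10 are simply the values used above.

```r
library(ggplot2)

# Hypothetical helper: one chart with a fixed x-range and aspect ratio
plot_gantt <- function(data, ratio = 0.5, x_range = c(0, 10)) {
  ggplot(data, aes(value, name)) +
    geom_line() +
    xlim(x_range[1], x_range[2]) +
    coord_fixed(ratio = ratio)
}

# Data generated as in the question (seed added for reproducibility)
set.seed(1)
name <- paste("person", seq(10), sep = "_")
start <- sample(seq(5), size = 10, replace = TRUE)
end <- sample(seq(6, 10), size = 10, replace = TRUE)
big_chart <- data.frame(name = c(name, name), value = c(start, end))
small_chart <- big_chart[c(1:2, 11:12), ]

p_big <- plot_gantt(big_chart)
p_small <- plot_gantt(small_chart)
```

With the ratio fixed, one person always occupies half the length of one x-unit, so the small chart comes out shorter instead of spreading its two lines apart.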

ggplot2 geom_tile diagonal line overlay

I'm looking for a way to produce a diagonal slash from the bottom left to the top right corner of a cell within a plot made using geom_tile.
The input is a melted data frame with two categorical factor columns, sample and gene. I'd like to use something like geom_segment, but I'm not able to specify fractional increments. Any ideas on the best way to accomplish this?
edit: Here is a reproducible example, I can't share one from my own data, as it's protected patient information.
df <- data_frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent') )
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect))
what I'm looking for:
One way to do it:
library(ggplot2)
df <- data.frame(
x = rep(c(2, 5, 7, 9, 12), 2),
y = rep(c(1, 2), each = 5),
z = factor(1:10),
w = rep(diff(c(0, 4, 6, 8, 10, 14)), 2)
)
p <- ggplot(df, aes(x, y)) + geom_tile(aes(fill = z))
gb <- ggplot_build(p)
p + geom_segment(data=gb$data[[1]][1:2, ],
aes(x=xmin, xend=xmax, y=ymin, yend=ymax),
color="white")
In your example, you could also rely on the indices of the factor levels like this:
library(ggplot2)
# stringsAsFactors = TRUE is needed in R >= 4.0.0, where data.frame() no longer converts strings to factors
df <- data.frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent'),
stringsAsFactors = TRUE )
df$cross <- c(F,T,T,F)
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, !!cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49),
color="white", size=2)
(Note that I used data.frame and not dplyr::data_frame, so that both columns become factors; as.numeric() then returns the position of each level on the axis. Since R 4.0.0 this requires stringsAsFactors = TRUE.)
If you want a legend:
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, !!cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49, color=cross),
size=2) +
scale_color_manual(values=c("TRUE"="white", "FALSE"=NA))
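The idea behind these snippets is that each level of a discrete axis sits at an integer position (1, 2, 3, ...) and a tile extends 0.5 to either side of it. A minimal sketch that makes those positions explicit, and therefore does not depend on how the columns are stored, could look like this; the helper columns x and y and the name diag_df are illustrative, not part of the original answer.

```r
library(ggplot2)

df <- data.frame(
  gene = c("TP53", "TP53", "MTOR", "BRACA1"),
  sample = c("A", "B", "A", "B"),
  diagonal = c(FALSE, TRUE, TRUE, FALSE),
  effect = c("missense", "nonsense", "missense", "silent")
)

# Integer position of each level on the discrete axes; factor() sorts the
# levels alphabetically, which is also the default order of a discrete scale
df$x <- as.integer(factor(df$sample))
df$y <- as.integer(factor(df$gene))

# Keep only the tiles that should get a slash
diag_df <- df[df$diagonal, ]

p <- ggplot(df, aes(sample, gene)) +
  geom_tile(aes(fill = effect)) +
  # from the bottom-left to the top-right corner of each selected tile
  geom_segment(data = diag_df,
               aes(x = x - 0.5, xend = x + 0.5, y = y - 0.5, yend = y + 0.5),
               colour = "white", inherit.aes = FALSE)
p
```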
You can use geom_abline. You can tweak intercept and slope to get what you want.
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_abline(intercept = 1, slope = 1, color="white", size=2)
If you don't actually want specific lines, but just want to highlight, you can simply draw dots:
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect)) +
geom_point(aes(sample, gene))
You can make it look like a line: geom_point(aes(sample, gene), shape='/', size=10, color='white')
To have the lines be only on some tiles, simply pass only the rows with those coordinates to geom_point: geom_point(data=filter(df, diagonal), aes(sample, gene))
Alternatively, you can hack it with a manual shape scale: geom_point(aes(sample, gene, shape=diagonal)) + scale_shape_manual(values=c(' ', '/'))

R: How to spread (jitter) points with respect to the x axis?

I have the following code snippet in R:
dat <- data.frame(cond = factor(rep("A",10)),
rating = c(1,2,3,4,6,6,7,8,9,10))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
geom_point(aes(y=3)) +
geom_point(aes(y=3)) +
geom_point(aes(y=5))
This particular snippet of code produces a boxplot where one point goes over another (in the above case one point 3 goes over another point 3).
How can I move the point 3 so that the point remains in the same position on the y axis, but it is slightly moved left or right on the x axis?
This can be achieved by using the position_jitter function:
geom_point(aes(y=3), position = position_jitter(w = 0.1, h = 0))
Update:
To only plot the three supplied points you can construct a new dataset and plot that:
points_dat <- data.frame(cond = factor(rep("A", 3)), rating = c(3, 3, 5))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
geom_point(aes(x=cond, y=rating), data = points_dat, position = position_jitter(w = 0.05, h = 0))
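Jitter is random, so the points land in slightly different places each time the plot is drawn. As a sketch, assuming a ggplot2 version in which position_jitter() accepts a seed argument, the offsets can be made reproducible and then inspected with layer_data(); the seed value and the names p and jittered are arbitrary.

```r
library(ggplot2)

dat <- data.frame(cond = factor(rep("A", 10)),
                  rating = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10))
points_dat <- data.frame(cond = factor(rep("A", 3)), rating = c(3, 3, 5))

p <- ggplot(dat, aes(x = cond, y = rating)) +
  geom_boxplot() +
  # height = 0 leaves y untouched; seed fixes the horizontal offsets
  geom_point(data = points_dat,
             position = position_jitter(width = 0.05, height = 0, seed = 123))

# Positions actually used for the three points (layer 2)
jittered <- layer_data(p, 2)
jittered[, c("x", "y")]
```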
ggplot2 now includes position_dodge(). From the help's description: "Dodging preserves the vertical position of an geom while adjusting the horizontal position."
Thus you can either use it as geom_point(position = position_dodge(0.5)) or, if you want to dodge points that are connected by lines and need the dodge to be the same across both geoms, you can use something like:
dat <- data.frame(cond = rep(c("A", "B"), each=10), x=rep(1:10, 2), y=rnorm(20))
dodge <- position_dodge(.3) # how much jitter on the x-axis?
ggplot(dat, aes(x, y, group=cond, color=cond)) +
geom_line(position = dodge) +
geom_point(position = dodge)
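ggplot2 now has a separate geom for this called geom_jitter, so you don't need to pass a position argument to geom_point. Here it is applied to OP's example:
dat <- data.frame(cond = factor(rep("A",10)),
rating = c(1,2,3,4,6,6,7,8,9,10))
# the three extra points go in their own data frame: aes(y=c(3, 3, 5)) would not match the 10 rows of dat
points_dat <- data.frame(cond = factor(rep("A", 3)), rating = c(3, 3, 5))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
geom_jitter(data = points_dat, width = 0.1, height = 0)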

How to add different lines for facets

I have data where I look at the difference in growth between a monoculture and a mixed culture for two different species. Additionally, I made a graph to make my data clear.
I want a barplot with error bars, the whole dataset is of course bigger, but for this graph this is the data.frame with the means for the barplot.
plant          species    means
Mixed culture  Elytrigia  0.886625
Monoculture    Elytrigia  1.022667
Monoculture    Festuca    0.314375
Mixed culture  Festuca    0.078125
With this data I made a graph in ggplot2, where plant is on the x-axis and means on the y-axis, and I used a facet to divide the species.
This is my code:
limits <- aes(ymax = meansS$means + eS$se, ymin=meansS$means - eS$se)
dodge <- position_dodge(width=0.9)
myplot <- ggplot(data=meansS, aes(x=plant, y=means, fill=plant)) + facet_grid(. ~ species)
myplot <- myplot + geom_bar(stat="identity", position=dodge) + geom_errorbar(limits, position=dodge, width=0.25)
myplot <- myplot + scale_fill_manual(values=c("#6495ED","#FF7F50"))
myplot <- myplot + labs(x = "Plant treatment", y = "Shoot biomass (gr)")
# opts() and theme_blank() were removed from ggplot2; ggtitle(), theme() and element_blank() replace them
myplot <- myplot + ggtitle("Plant competition")
myplot <- myplot + theme(legend.position = "none")
myplot <- myplot + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
So far it is fine. However, I want to add two different horizontal lines in the two facets. For that, I used this code:
hline.data <- data.frame(z = c(0.511,0.157), species = c("Elytrigia","Festuca"))
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
However, if I do that, I get a plot where there are two extra facets in which the two horizontal lines are plotted. Instead, I want the horizontal lines to be plotted in the facets with the bars, not in two new facets. Does anyone have an idea how to solve this?
I think it makes it clearer if I put the graph I create now:
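Make sure that the variable species is identical in both datasets. If it is a factor in one of them, then it must be a factor in the other too. The example below first plots with a mismatch (X is a factor in dummy1 but a character vector in dummy2) and then again after converting dummy2$X to a factor: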
library(ggplot2)
dummy1 <- expand.grid(X = factor(c("A", "B")), Y = rnorm(10))
dummy1$D <- rnorm(nrow(dummy1))
dummy2 <- data.frame(X = c("A", "B"), Z = c(1, 0))
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
dummy2$X <- factor(dummy2$X)
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
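Applied to the data in the question, a minimal sketch with the current ggplot2 API might look like the following. It assumes species is stored the same way (here as character) in both data frames, and it leaves out the error bars because the eS object is not shown in the question; geom_col() stands in for the bar layer.

```r
library(ggplot2)

# Means from the table in the question
meansS <- data.frame(
  plant = c("Mixed culture", "Monoculture", "Monoculture", "Mixed culture"),
  species = c("Elytrigia", "Elytrigia", "Festuca", "Festuca"),
  means = c(0.886625, 1.022667, 0.314375, 0.078125)
)

# One line per facet; species must match the values used in meansS
hline.data <- data.frame(z = c(0.511, 0.157),
                         species = c("Elytrigia", "Festuca"))

p <- ggplot(meansS, aes(x = plant, y = means, fill = plant)) +
  geom_col() +
  facet_grid(. ~ species) +
  geom_hline(data = hline.data, aes(yintercept = z)) +
  scale_fill_manual(values = c("#6495ED", "#FF7F50")) +
  labs(x = "Plant treatment", y = "Shoot biomass (gr)",
       title = "Plant competition") +
  theme(legend.position = "none")
p
```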
