Combining two gene counts in one single plot using ggplot2? - r

I want to display the counts of two genes in the same graph across the same conditions. The normalized counts for these genes were obtained from DESeq2 using the plotCounts function. I want to plot the two genes in one plot with a shared x-axis, which has three conditions (ctrl, T1, T2), and different y-axes (based on the counts). One extra variable is the replicate (PAT1 to PAT5), which I want to distinguish by shape, with genes "x" and "y" shown in two different colors. I tried something like this, from the link mentioned, which has not really worked so far:
geneX
genecounts <- plotCounts(dds, gene = paste(geneX),
intgroup = c("timepoint","patient"),returnData = TRUE)
# count timepoint patient
# PAT1.ctrl 19.975535 ctrl PAT1
# PAT2.ctrl 15.095701 ctrl PAT2
# PAT3.ctrl 31.067328 ctrl PAT3
# PAT4.ctrl 23.507453 ctrl PAT4
# PAT5.ctrl 64.955803 ctrl PAT5
# PAT1.T1 25.087863 T1 PAT1
# PAT2.T1 12.265661 T1 PAT2
# PAT3.T1 21.514517 T1 PAT3
# PAT4.T1 12.853989 T1 PAT4
# PAT5.T1 29.887820 T1 PAT5
# PAT1.T2 16.234911 T2 PAT1
# PAT2.T2 7.620990 T2 PAT2
# PAT3.T2 36.834481 T2 PAT3
# PAT4.T2 7.085464 T2 PAT4
# PAT5.T2 13.330165 T2 PAT5
Second gene, Y, plotCounts output:
# count timepoint patient
# PAT1.ctrl 156949.94 ctrl PAT1
# PAT2.ctrl 164856.70 ctrl PAT2
# PAT3.ctrl 258139.79 ctrl PAT3
# PAT4.ctrl 103669.21 ctrl PAT4
# PAT5.ctrl 434170.02 ctrl PAT5
# PAT1.T1 128839.83 T1 PAT1
# PAT2.T1 98877.64 T1 PAT2
# PAT3.T1 198419.57 T1 PAT3
# PAT4.T1 97918.21 T1 PAT4
# PAT5.T1 306861.69 T1 PAT5
# PAT1.T2 124161.91 T2 PAT1
# PAT2.T2 92150.86 T2 PAT2
# PAT3.T2 265243.35 T2 PAT3
# PAT4.T2 90364.91 T2 PAT4
# PAT5.T2 399177.04 T2 PAT5
So far I have used this code to generate the individual ggplots:
# geom_beeswarm() comes from the ggbeeswarm package
ggplot(genecounts, aes(x = timepoint, y = count, color = patient)) + geom_beeswarm(cex = 3)
Any help or suggestions would be highly appreciated.

The first step is to add a column for the gene name to each data frame, then combine them.
You could start with geom_point: I would use color for patients and shape for genes. You will want to use a log scale, since the counts differ by orders of magnitude. Assuming that your data frames are named geneX and geneY:
library(dplyr)
library(ggplot2)
geneX %>%
mutate(gene = "X") %>%
bind_rows(mutate(geneY, gene = "Y")) %>%
ggplot(aes(timepoint, count)) +
geom_point(aes(color = patient, shape = gene)) +
scale_y_log10()
You can try geom_jitter instead of geom_point to avoid point overlap.
If you want to connect the points, you will need to group by both gene and patient, which is a little more work:
geneX %>%
mutate(gene = "X") %>%
bind_rows(mutate(geneY, gene = "Y")) %>%
ggplot(aes(timepoint, count)) +
geom_line(aes(color = patient, group = interaction(patient, gene))) +
geom_point(aes(color = patient, shape = gene)) +
scale_y_log10()
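If separate y-axes are preferred over a log scale, one option is to facet by gene with free y scales. The sketch below rebuilds the two data frames from the counts shown in the question (the helper names `design` and `both` are made up for illustration) and follows the question's mapping of color to gene and shape to patient. Treat it as one possible layout under those assumptions rather than the definitive one.

```r
library(dplyr)
library(ggplot2)

# Sample layout taken from the question: 5 patients x 3 timepoints,
# with patient varying fastest (PAT1.ctrl, PAT2.ctrl, ..., PAT5.T2).
timepoints <- c("ctrl", "T1", "T2")
design <- expand.grid(patient = paste0("PAT", 1:5),
                      timepoint = timepoints,
                      stringsAsFactors = FALSE)
# Fix the x-axis order explicitly instead of relying on alphabetical sorting.
design$timepoint <- factor(design$timepoint, levels = timepoints)

# Normalized counts copied from the question, in the same row order.
geneX <- mutate(design, count = c(
  19.975535, 15.095701, 31.067328, 23.507453, 64.955803,
  25.087863, 12.265661, 21.514517, 12.853989, 29.887820,
  16.234911, 7.620990, 36.834481, 7.085464, 13.330165))
geneY <- mutate(design, count = c(
  156949.94, 164856.70, 258139.79, 103669.21, 434170.02,
  128839.83, 98877.64, 198419.57, 97918.21, 306861.69,
  124161.91, 92150.86, 265243.35, 90364.91, 399177.04))

# Stack both genes; the list names become the values of the `gene` column.
both <- bind_rows(list(X = geneX, Y = geneY), .id = "gene")

# One panel per gene, each with its own y-axis; jitter only horizontally
# so that the plotted counts themselves are not altered.
p <- ggplot(both, aes(timepoint, count, color = gene, shape = patient)) +
  geom_point(position = position_jitter(width = 0.15, height = 0), size = 3) +
  facet_wrap(~ gene, scales = "free_y")
p
```

Free scales keep the linear spacing within each gene, at the cost of making the two panels not directly comparable in magnitude.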

Related

plotting two numeric variables in the same graph

I want to visualise two variables in the same graph.
The variables look like this:
> head(intp.trust_male)
# A tibble: 1 × 1
average_intp.trust
<dbl>
1 2.33
and
> head(intp.trust_fem)
# A tibble: 1 × 1
average_intp.trust
<dbl>
1 2.34
I have tried merge to put them in the same data frame, but it doesn't seem to work
Q5 <- merge(intp.trust_fem, intp.trust_male)
ggplot(data = Q5)+
aes(fill = percent_owned) +
geom_sf() +
scale_fill_viridis_c()
Can anyone help me out here, please?
Thank you :)
I think what you want to do is stack your data frames. You can do this with dplyr::bind_rows. It's not clear from your question what you're trying to accomplish because percent_owned is not a variable in the data you've shown. Generally, you could do (using geom_point):
library(dplyr)
library(ggplot2)
intp.trust_male <- mutate(intp.trust_male, label = "intp.trust_male")
intp.trust_fem <- mutate(intp.trust_fem, label = "intp.trust_fem")
df <- bind_rows(intp.trust_male, intp.trust_fem)
ggplot(df, aes(x = label, y = average_intp.trust)) +
geom_point()
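The stacking step can also be written in a single call by letting bind_rows create the label column. This is a minimal sketch that recreates the two one-row tables from the values printed in the question; the labels "male" and "female" and the choice of geom_col are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# One-row tables recreated from the values shown in the question.
intp.trust_male <- data.frame(average_intp.trust = 2.33)
intp.trust_fem  <- data.frame(average_intp.trust = 2.34)

# The list names are written into the new `label` column.
df <- bind_rows(list(male = intp.trust_male, female = intp.trust_fem),
                .id = "label")

# One bar per group.
p <- ggplot(df, aes(x = label, y = average_intp.trust)) +
  geom_col()
p
```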

adding rows to a tibble based on mostly replicating existing rows

I have data that only shows a variable if it is not 0. However, I would like to have gaps representing these 0s in the graph.
(I will be working from a large dataframe, but have created an example data based on how I will be manipulating it for this purpose.)
library(tidyverse)
library(ggplot2)
A <- tibble(
name = c("CTX_M", "CblA_1"),
rpkm = c(350, 4),
sample = "A"
)
B <- tibble(
name = c("CTX_M", "OXA_1", "ampC"),
rpkm = c(324, 357, 99),
sample = "B"
)
plot <- bind_rows(A, B)
ggplot()+ geom_col(data = plot, aes(x = sample, y = rpkm, fill = name),
position = "dodge")
Samples A and B both have CTX_M; however, the other three names are present in only one of the two samples. When I run the code, the output graph shows two bars for sample A and three bars for sample B.
Is there a way for me to add CblA_1 to sample B with rpkm = 0, and OXA_1 and ampC to sample A with rpkm = 0, while maintaining sample separation? The tibble would then contain a row for every name in both samples (order not important), and the graph would show a gap wherever rpkm is 0.
You can use complete from tidyr.
plot <- plot %>% complete(name,sample,fill=list(rpkm=0))
# A tibble: 8 x 3
name sample rpkm
<chr> <chr> <dbl>
1 ampC A 0
2 ampC B 99
3 CblA_1 A 4
4 CblA_1 B 0
5 CTX_M A 350
6 CTX_M B 324
7 OXA_1 A 0
8 OXA_1 B 357
ggplot()+ geom_col(data = plot, aes(x = sample, y = rpkm, fill = name),
position = "dodge")
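To make it clearer what complete is doing here, the same result can be built by hand in three steps: list every name/sample pair, join the observed values onto that list, and replace the resulting missing values with 0. This is only an illustrative sketch; the names `plot_data`, `all_pairs` and `filled` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Example data from the question.
A <- data.frame(name = c("CTX_M", "CblA_1"), rpkm = c(350, 4), sample = "A")
B <- data.frame(name = c("CTX_M", "OXA_1", "ampC"), rpkm = c(324, 357, 99),
                sample = "B")
plot_data <- bind_rows(A, B)

# 1. Every combination of name and sample (4 names x 2 samples).
all_pairs <- tidyr::expand(plot_data, name, sample)

# 2. Attach the observed rpkm values; unobserved pairs get NA.
# 3. Turn those NA values into explicit zeros.
filled <- all_pairs %>%
  left_join(plot_data, by = c("name", "sample")) %>%
  replace_na(list(rpkm = 0))

filled
```

In practice the single complete call is shorter; the manual version is mainly useful for checking which rows were added.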

Transition (Sankey) plot with time on x axis

I have a transition matrix as following:
1. A A B
2. B C A
3. A C C
where each column represents a period, each row represents an agent, and each letter represents a state. I would like to create a plot, such as a Sankey diagram, that shows the transitions from state to state in each period.
Agents' identities are not important. It seems that I can use the networkD3 or googleVis packages. However, since the position of each node is determined endogenously by the packages, I have no clue how to put the time aspect on the x axis.
Any help or alternative visualization suggestions will be highly appreciated,
Thanks a lot in advance,
You can reproduce the sample data by:
transitiondata <- data.frame("t1"=c("A","B","A"),
"t2"=c("A","C","C"),
"t3"=c("B","A","C"))
Self-answering from the future: ggalluvial package, which is perfect for this task, was developed during that time. In order to use it, we need to provide tidy data.
Let's load the libraries we need:
library(ggplot2)
library(ggalluvial)
library(tidyr)
library(dplyr)
Then we need to create identifiers for the data before we convert it to tidy format. So the new data is like this:
transitiondata$id <- c("id1","id2","id3")
Convert to tidy format:
transitiondata_tidy <- transitiondata %>%
gather(time, state, t1,t2,t3) %>%
mutate(time = as.factor(time), state = as.factor(state))
Here is what our data looks like:
id time state
1 id1 t1 A
2 id2 t1 B
3 id3 t1 A
4 id1 t2 A
5 id2 t2 C
6 id3 t2 C
7 id1 t3 B
8 id2 t3 A
9 id3 t3 C
And ggplot2 and ggalluvial do the trick:
ggplot(transitiondata_tidy,
aes(x = time, stratum = state, alluvium = id, fill = state, label = state)) +
geom_stratum() +
geom_text(stat = "stratum", size = 3) +
geom_flow(fill = "darkgrey", color = "black")
And our transition (Sankey) plot is ready.
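The reshaping step can also be written with pivot_longer, which tidyr now recommends over gather. The sketch below is an alternative under that assumption, not the code the answer was written with; it also adds a small count table (`state_counts`, a name made up for illustration) as a quick check of how many agents are in each state per period. Note that pivot_longer returns the rows grouped by id rather than by time, which does not matter for plotting.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, including the id column added above.
transitiondata <- data.frame(
  id = c("id1", "id2", "id3"),
  t1 = c("A", "B", "A"),
  t2 = c("A", "C", "C"),
  t3 = c("B", "A", "C"),
  stringsAsFactors = FALSE
)

# One row per agent and period.
transitiondata_tidy <- transitiondata %>%
  pivot_longer(cols = c(t1, t2, t3), names_to = "time", values_to = "state") %>%
  mutate(time = factor(time), state = factor(state))

# Number of agents in each state at each period.
state_counts <- count(transitiondata_tidy, time, state)
state_counts
```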

Plotting Bacteria according to Food Groups & Abundance in R

I have a dataframe that includes four bacteria types: R, B, P, Bi - this is in variable.x
value.y is their abundance and variable.y is various groups they are in.
I would like to plot them according to their food categories: "FiberCategory", "FruitCategory", "VegetablesCategory" & "WholegrainCategory." I have made 4 separate files, each of which looks like this:
Sample Bacteria Abundance Category Level
30841102 R 0.005293192 1 Low
30841102 P 0.000002570 1 Low
30841102 B 0.005813275 1 Low
30841102 Bi 0.000000000 1 Low
49812105 R 0.003298709 1 Low
49812105 P 0.000000855 1 Low
49812105 B 0.131147541 1 Low
49812105 Bi 0.000350086 1 Low
So, I would like a bar plot of how much of each bacteria is in each category. So it should be 4 plots, for each bacteria, with value on the y-axis and food category on the x-axis.
I have tried this code:
library(dplyr)
genus_veg %>% group_by(Genus, Abundance) %>% summarise(Abundance = sum(Abundance)) %>%
ggplot(aes(x = Level, y= Abundance, fill = Genus)) + geom_bar(stat="identity")
But get this error:
Error: cannot modify grouping variable
Any suggestions?
TL;DR Combine individual plots with cowplot
In another interpretation of the super unclear question, this time from:
Plotting Bacteria according to Food Groups & Abundance in R
and
would like to plot them according to their food categories: "FiberCategory", "FruitCategory", "VegetablesCategory" & "WholegrainCategory." I have made 4 separate files
You might be asking for:
You want a bar chart
You want 4 plots, one for each of the food categories
x-axis = bacteria type
y-axis = abundance of bacteria
Input
Let's say you have a data frame for each food category. (Again, I'm using dummy data.)
library(tidyr)
library(dplyr)
library(ggplot2)
## The categories you have defined
bacteria <- c("R", "B", "P", "Bi")
food <- c("FiberCategory", "FruitCategory", "VegetablesCategory", "WholegrainCategory")
## Create dummy data for plotting
set.seed(1)
num_rows <- length(bacteria)
num_cols <- length(food)
dummydata <-
matrix(data = abs(rnorm(num_rows*num_cols, mean=0.01, sd=0.05)),
nrow=num_rows, ncol=num_cols)
rownames(dummydata) <- bacteria
colnames(dummydata) <- food
dummydata <-
dummydata %>%
as.data.frame() %>%
tibble::rownames_to_column("bacteria") %>%
gather(food, abundance, -bacteria)
## If we have 4 data frames
filter_food <- function(dummydata, foodcat){
dummydata %>%
filter(food == foodcat) %>%
select(-food)
}
dd_fiber <- filter_food(dummydata, "FiberCategory")
dd_fruit <- filter_food(dummydata, "FruitCategory")
dd_veg <- filter_food(dummydata, "VegetablesCategory")
dd_grain <- filter_food(dummydata, "WholegrainCategory")
Where one data frame looks something like
#> dd_grain
# bacteria abundance
#1 R 0.02106203
#2 B 0.10073499
#3 P 0.06624655
#4 Bi 0.00775332
Plot
You can create separate plots. (Here, I'm using a function to generate my plots)
plot_food <- function(dd, title=""){
dd %>%
ggplot(aes(x = bacteria, y = abundance)) +
geom_bar(stat = "identity") +
ggtitle(title)
}
plt_fiber <- plot_food(dd_fiber, "fiber")
plt_fruit <- plot_food(dd_fruit, "fruit")
plt_veg <- plot_food(dd_veg, "veg")
plt_grain <- plot_food(dd_grain, "grain")
And then combine them using cowplot
cowplot::plot_grid(plt_fiber, plt_fruit, plt_veg, plt_grain)
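With more than a handful of categories, the four near-identical calls can be replaced by a loop. The following is a sketch under the same dummy-data assumptions as above (random values, not real abundances); `per_food` and `plots` are names invented for the example, and geom_col is used as the equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

bacteria <- c("R", "B", "P", "Bi")
food <- c("FiberCategory", "FruitCategory", "VegetablesCategory",
          "WholegrainCategory")

# Dummy data in long format: one row per bacteria/food pair.
set.seed(1)
dummydata <- expand.grid(bacteria = bacteria, food = food,
                         stringsAsFactors = FALSE)
dummydata$abundance <- abs(rnorm(nrow(dummydata), mean = 0.01, sd = 0.05))

# One data frame per food category, then one plot per data frame.
per_food <- split(dummydata, dummydata$food)
plots <- lapply(names(per_food), function(category) {
  ggplot(per_food[[category]], aes(x = bacteria, y = abundance)) +
    geom_col() +
    ggtitle(category)
})

# plot_grid accepts a list of plots through its plotlist argument.
cowplot::plot_grid(plotlist = plots, ncol = 2)
```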
TL;DR Plotting by facets
How you posed the question is super unclear. So I have interpreted your question from
So, I would like a bar plot of how much of each bacteria is in each category. So it should be 4 plots, for each bacteria, with value on the y-axis and food category on the x-axis.
as:
You want a bar chart
You want 4 plots, one for each of the bacteria types: R, B, P, Bi
x-axis = food category
y-axis = abundance of bacteria
Input
With regard to the input data, the data was unclear, e.g. you did not describe what "Sample", "Level", or "Category" is. Ideally, you would keep all the food categories in one data frame, e.g.
library(tidyr)
library(dplyr)
library(ggplot2)
## The categories you have defined
bacteria <- c("R", "B", "P", "Bi")
food <- c("FiberCategory", "FruitCategory", "VegetablesCategory", "WholegrainCategory")
## Create dummy data for plotting
set.seed(1)
num_rows <- length(bacteria)
num_cols <- length(food)
dummydata <-
matrix(data = abs(rnorm(num_rows*num_cols, mean=0.01, sd=0.05)),
nrow=num_rows, ncol=num_cols)
rownames(dummydata) <- bacteria
colnames(dummydata) <- food
dummydata <-
dummydata %>%
as.data.frame() %>%
tibble::rownames_to_column("bacteria") %>%
gather(food, abundance, -bacteria)
of which the output looks like:
#> dummydata
# bacteria food abundance
#1 R FiberCategory 0.021322691
#2 B FiberCategory 0.019182166
#3 P FiberCategory 0.031781431
#4 Bi FiberCategory 0.089764040
#5 R FruitCategory 0.026475389
#6 B FruitCategory 0.031023419
#7 P FruitCategory 0.034371453
#8 Bi FruitCategory 0.046916235
#9 R VegetablesCategory 0.038789068
#10 B VegetablesCategory 0.005269419
#11 P VegetablesCategory 0.085589058
#12 Bi VegetablesCategory 0.029492162
#13 R WholegrainCategory 0.021062029
#14 B WholegrainCategory 0.100734994
#15 P WholegrainCategory 0.066246546
#16 Bi WholegrainCategory 0.007753320
Plot
Once you have the data formatted as above, you can simply do:
dummydata %>%
ggplot(aes(x = food,
y = abundance,
group = bacteria)) +
geom_bar(stat="identity") +
## Split into 4 plots
## Note: can also use 'facet_grid' to do this
facet_wrap(~bacteria) +
theme(
## rotate the x-axis label
axis.text.x = element_text(angle=90, hjust=1, vjust=.5)
)
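As for the error in the question: "cannot modify grouping variable" is raised because Abundance appears in group_by and is then overwritten inside summarise. Grouping by the columns that define the bars, and summing Abundance within them, avoids it. The sketch below uses the eight rows shown in the question; the question's code refers to a Genus column, so using Bacteria here, and treating Sample as text, are assumptions.

```r
library(dplyr)
library(ggplot2)

# The eight rows shown in the question.
genus_veg <- data.frame(
  Sample = rep(c("30841102", "49812105"), each = 4),
  Bacteria = rep(c("R", "P", "B", "Bi"), times = 2),
  Abundance = c(0.005293192, 0.000002570, 0.005813275, 0.000000000,
                0.003298709, 0.000000855, 0.131147541, 0.000350086),
  Category = 1,
  Level = "Low"
)

# Group by the identifying columns only; Abundance is summarised, not grouped.
abundance_by_level <- genus_veg %>%
  group_by(Bacteria, Level) %>%
  summarise(Abundance = sum(Abundance), .groups = "drop")

p <- ggplot(abundance_by_level, aes(x = Level, y = Abundance, fill = Bacteria)) +
  geom_col(position = "dodge")
p
```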

barchart and standard errors

I have the following table in R (inspired by a cran help datasheet) :
> dfx <- data.frame(
+ group = c(rep('A', 108), rep('B', 115), rep('C', 106)),
+ sex = sample(c("M", "F","U"), size = 329, replace = TRUE),
+ age = runif(n = 329, min = 18, max = 54)
+ )
> head(dfx)
group sex age
1 A U 47.00788
2 A M 32.40236
3 A M 21.95732
4 A F 19.82798
5 A F 30.70890
6 A M 30.00830
I am interested in plotting the percentages of males (M), females (F) and "unknown" (U) in each group using barcharts, including error bars.
To make this graph, I plan to use the panel.ci/prepanel.ci commands.
I can easily build a proportion table for each group using the prop.table command:
> with(dfx, prop.table(table(group,sex), margin=1)*100)
sex
group F M U
A 29.62963 28.70370 41.66667
B 35.65217 35.65217 28.69565
C 37.73585 33.01887 29.24528
But now, I would like to build a similar table with error bars, and use these two tables to make a barchart.
If possible, I would like to use the ddply command, which I use for similar purposes (except that those were not percentages but means).
Try something like this:
library(plyr)
library(ggplot2)
summary(dfx) # for example, each variable
dfx$interaction <- interaction(dfx$group, dfx$sex)
ddply(dfx, .(interaction), summary) #group by interaction, summary on dfx
ggplot(dfx, aes(x = sex, y = age, fill = group)) + geom_boxplot()
You can get a good on-line tutorial on building graphs here.
edit
I'm pretty sure you would need more than 1 value for the proportion in order to have any error. I only see 1 value for the proportion for each unique combination of variables group and sex.
This is the most I can help you with (below), but I'd be interested to see you post an answer to your own question when you find a suitable solution.
dfx$interaction <- interaction(dfx$group, dfx$sex)
dfx.summary <- ddply(dfx, .(group, sex), summarise, total = length(group))
dfx.summary$prop <- with(dfx.summary, total/sum(total))
dfx.summary
# group sex prop
# 1 A F 0.06382979
# 2 A M 0.12158055
# 3 A U 0.14285714
# 4 B F 0.12462006
# 5 B M 0.11854103
# 6 B U 0.10638298
# 7 C F 0.10334347
# 8 C M 0.12158055
# 9 C U 0.09726444
ggplot(dfx.summary, aes(sex, total, color = group)) + geom_point(size = 5)
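One way to get error bars from a single proportion per cell is to use the binomial standard error, sqrt(p * (1 - p) / n), where n is the group size. The sketch below is an assumption about what the asker wants, not the panel.ci approach mentioned in the question: it computes within-group proportions (so each group sums to 100%) and draws one standard error either side with ggplot2. The seed and the `counts` name are added only for reproducibility and illustration.

```r
library(ggplot2)

# Data as in the question; the seed is an addition for reproducibility.
set.seed(42)
dfx <- data.frame(
  group = c(rep("A", 108), rep("B", 115), rep("C", 106)),
  sex = sample(c("M", "F", "U"), size = 329, replace = TRUE),
  age = runif(n = 329, min = 18, max = 54)
)

# Cell counts in long format: one row per group/sex combination.
counts <- as.data.frame(table(group = dfx$group, sex = dfx$sex),
                        responseName = "n")

# Group sizes, within-group proportions and binomial standard errors.
counts$total <- ave(counts$n, counts$group, FUN = sum)
counts$prop <- counts$n / counts$total
counts$se <- sqrt(counts$prop * (1 - counts$prop) / counts$total)

# Dodged bars in percent, with error bars of +/- 1 standard error.
dodge <- position_dodge(width = 0.9)
p <- ggplot(counts, aes(x = group, y = 100 * prop, fill = sex)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = 100 * (prop - se), ymax = 100 * (prop + se)),
                position = dodge, width = 0.2) +
  ylab("percent")
p
```

For small cells or proportions near 0 or 1, a different interval (for example from prop.test or binom.test) may be more appropriate than this simple approximation.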
