How to make a heatmap of 3 discrete values using ggplot2? - r

I have a problem with the aes parameters; not sure what it's trying to tell me to fix.
My data frame is like so:
> qualityScores
Test1 Test2 Test3 Test4 Test5
Sample1 1 2 2 3 1
Sample2 1 2 2 3 2
Sample3 1 2 1 1 3
Sample4 1 1 3 1 1
Sample5 1 3 1 1 2
Where 1 stands for PASS, 2 for WARN, and 3 for FAIL.
Here's the dput of my data:
structure(list(Test1 = c(1L, 1L, 1L, 1L, 1L), Test2 = c(2L, 2L,
2L, 1L, 3L), Test3 = c(2L, 2L, 1L, 3L, 1L), Test4 = c(3L, 3L,
1L, 1L, 1L), Test5 = c(1L, 2L, 3L, 1L, 2L)), .Names = c("Test1",
"Test2", "Test3", "Test4", "Test5"), class = "data.frame", row.names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"))
I am trying to make a heatmap where 1 will be represented by gree, 2 by yellow, and 3 by red, using ggplots2.
This is my code:
samples <- rownames(qualityScores)
tests <- colnames(qualityScore)
testScores <- unlist(qualityScores)
colors <- colorRampPalette(c("red", "yellow", "green"))(n=3)
ggplot(qualityScores, aes(x = tests, y = samples, fill = testScores) ) + geom_tile() + scale_fill_gradient2(low = colors[1], mid = colors[2], high = colors[3])
And I get this error message:
Error: Aesthetics must either be length one, or the same length as the dataProblems:colnames(SeqRunQualitySumNumeric)
Where am I going wrong?
Thank you.

This will be much easier if your reshape your data from wide to long format. There are many ways to do that but here I used reshape2
library(reshape2); library(ggplot2)
colors <- c("green", "yellow", "red")
ggplot(melt(cbind(sample=rownames(qualityScores), qualityScores)),
aes(x = variable, y = sample, fill = factor(value))) +
geom_tile() +
scale_fill_manual(values=colors)

Related

Add greek letters on the ggpplot axis, and add main Y axis elements [duplicate]

This question already has answers here:
Customize axis labels
(4 answers)
How to use Greek symbols in ggplot2?
(4 answers)
Closed 2 years ago.
Hello everyone and happy new year !!! I would need help in order to improve a ggplot figure.
I have a dfataframe (df1) that looks like so:
x y z
1 a 1 -0.13031299
2 b 1 0.71407346
3 c 1 -0.15669153
4 d 1 0.39894708
5 a 2 0.64465669
6 b 2 -1.18694632
7 c 2 -0.25720456
8 d 2 1.34927378
9 a 3 -1.03584455
10 b 3 0.14840876
11 c 3 0.50119220
12 d 3 0.51168810
13 a 4 -0.94795328
14 b 4 0.08610489
15 c 4 1.55144239
16 d 4 0.20220334
Here is the data as dput() and my code:
df1 <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"
), class = "factor"), y = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), z = c(-0.130312994048691, 0.714073455094197,
-0.156691533710652, 0.39894708481517, 0.644656691110372, -1.18694632145378,
-0.257204564112021, 1.34927378214664, -1.03584454605617, 0.148408762003154,
0.501192202628166, 0.511688097742773, -0.947953281835912, 0.0861048893885463,
1.55144239199118, 0.20220333664676)), class = "data.frame", row.names = c(NA,
-16L))
library(ggplot2)
df1$facet <- ifelse(df1$x %in% c("c", "d"), "cd", df1$x)
p1 <- ggplot(df1, aes(x = x, y = y))
p1 <- p1 + geom_tile(aes(fill = z), colour = "grey20")
p1 <- p1 + scale_fill_gradient2(
low = "darkgreen",
mid = "white",
high = "darkred",
breaks = c(min(df1$z), max(df1$z)),
labels = c("Low", "High")
)
p1 + facet_grid(.~facet, space = "free", scales = "free_x") +
theme(strip.text.x = element_blank())
With this code (inspired from here) I get this figure:
But I wondered if someone had an idea in order to :
Add Greek letter in the x axis text (here alpha and beta)
To add sub Y axis element (here noted as Element 1-3) where Element1 (from 0 to 1); Element2 (from 1 to 3) and Element3 (from 3 to the end)
the result should be something like:

Plot two box plots with same y variable but different x variables in ggplot2

I have a dataset with 3 columns. One is monthly expenditure (y-variable). Each value in this variable is categorised as either 1 or 0 under two different variables.
Data looks something like this:
df_UP.q234_month_exp df_UP.LFT df_UP.LF
1 NA 0 1
2 NA 1 1
3 12000 1 1
4 NA 1 1
5 20000 1 1
6 NA 0 1
Data has about 1200 rows.
I want a plot which creates a box plot for 'df_UP.q234_month_exp' as y-variable for all rows of 'df_UP.LFT' which are 1, and another box plot in the same plot with same y-variable, but for all rows of 'df_UP.LF' which are 1.
How to accomplish this using ggplot2?
You can try this:
library(tidyverse)
#Data
df <- structure(list(df_UP.q234_month_exp = c(NA, NA, 12000L, NA, 20000L,
NA), df_UP.LFT = c(0L, 1L, 1L, 1L, 1L, 0L), df_UP.LF = c(1L,
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
The code:
df %>% pivot_longer(cols = -df_UP.q234_month_exp) %>%
filter(value==1) %>% ggplot()+
geom_boxplot(aes(x = name,y = df_UP.q234_month_exp,color=name,group=name))
Output:

R - replace values by row given some statement in if loop with another value in same df

I have a dataset with which I want to conduct a multilevel analysis. Therefore I have two rows for every patient, and a couple column with 1's and 2's (1 = patient, 2 = partner of patient).
Now, I have variables with date of birth and age, for both patient and partner in different columns that are now on the same row.
What I want to do is to write a code that does:
if mydata$couple == 2, then replace mydata$dateofbirthpatient with mydata$dateofbirthpatient
And that for every row. Since I have multiple variables that I want to replace, it would be lovely if I could get this in a loop and just 'add' variables that I want to replace.
What I tried so far:
mydf_longer <- if (mydf_long$couple == 2) {
mydf_long$pgebdat <- mydf_long$prgebdat
}
Ofcourse this wasn't working - but simply stated this is what I want.
And I started with this code, following the example in By row, replace values equal to value in specified column
, but don't know how to finish:
mydf_longer[6:7][mydf_longer[,1:4]==mydf_longer[2,2]] <-
Any ideas? Let me know if you need more information.
Example of data:
# id couple groep_MNC zkhs fbeh pgebdat p_age pgesl prgebdat pr_age
# 1 3 1 1 1 1 1955-12-01 42.50000 1 <NA> NA
# 1.1 3 2 1 1 1 1955-12-01 42.50000 1 <NA> NA
# 2 5 1 1 1 1 1943-04-09 55.16667 1 1962-04-18 36.5
# 2.1 5 2 1 1 1 1943-04-09 55.16667 1 1962-04-18 36.5
# 3 7 1 1 1 1 1958-04-10 40.25000 1 <NA> NA
# 3.1 7 2 1 1 1 1958-04-10 40.25000 1 <NA> NA
mydf_long <- structure(
list(id = c(3L, 3L, 5L, 5L, 7L, 7L),
couple = c(1L, 2L, 1L, 2L, 1L, 2L),
groep_MNC = c(1L, 1L, 1L, 1L, 1L, 1L),
zkhs = c(1L, 1L, 1L, 1L, 1L, 1L),
fbeh = c(1L, 1L, 1L, 1L, 1L, 1L),
pgebdat = structure(c(-5145, -5145, -9764, -9764, -4284, -4284), class = "Date"),
p_age = c(42.5, 42.5, 55.16667, 55.16667, 40.25, 40.25),
pgesl = c(1L, 1L, 1L, 1L, 1L, 1L),
prgebdat = structure(c(NA, NA, -2815, -2815, NA, NA), class = "Date"),
pr_age = c(NA, NA, 36.5, 36.5, NA, NA)),
.Names = c("id", "couple", "groep_MNC", "zkhs", "fbeh", "pgebdat",
"p_age", "pgesl", "prgebdat", "pr_age"),
row.names = c("1", "1.1", "2", "2.1", "3", "3.1"),
class = "data.frame"
)
The following for loop should work if you only want to change the values based on a condition:
for(i in 1:nrow(mydata)){
if(mydata$couple[i] == 2){
mydata$pgebdat[i] <- mydata$prgebdat[i]
}
}
OR
As suggested by #lmo, following will work faster.
mydata$pgebdat[mydata$couple == 2] <- mydata$prgebdat[mydata$couple == 2]

Converting bar chart to pie chart in R

I have following data and code:
dd
grp categ condition value
1 A X P 2
2 B X P 5
3 A Y P 9
4 B Y P 6
5 A X Q 4
6 B X Q 5
7 A Y Q 8
8 B Y Q 2
>
>
dput(dd)
structure(list(grp = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("A", "B"), class = "factor"), categ = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
condition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("P",
"Q"), class = "factor"), value = c(2, 5, 9, 6, 4, 5, 8, 2
)), .Names = c("grp", "categ", "condition", "value"), out.attrs = structure(list(
dim = structure(c(2L, 2L, 2L), .Names = c("grp", "categ",
"condition")), dimnames = structure(list(grp = c("grp=A",
"grp=B"), categ = c("categ=X", "categ=Y"), condition = c("condition=P",
"condition=Q")), .Names = c("grp", "categ", "condition"))), .Names = c("dim",
"dimnames")), row.names = c(NA, -8L), class = "data.frame")
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)
How can I convert this bar chart to pie chart? I want 4 pies here with their sizes corresponding to heights of respective bars here. I tried following but they did not work:
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar()
ggplot(dd, aes(grp,value, fill=condition))+geom_bar(stat='identity')+facet_grid(~categ)+coord_polar('y')
I also tried to make pie chart similar to Pie charts in ggplot2 with variable pie sizes but I am not able to manage with my data. Thanks for your help.
Using the same idea as in the link you posted, you could add a column size do your dataframe that would be the sum of the values for each group, and use that as the width argument:
library(dplyr)
dd<-dd %>% group_by(categ,grp) %>% mutate(size=sum(value))
ggplot(dd, aes(x=size/2,y=value,fill=condition,width=size))+geom_bar(position="fill",stat='identity')+facet_grid(grp~categ)+coord_polar("y")
You want the group and category both to be variables for the grid, and not inside any plot. Here are two different layouts. X ought to be any single item, string, or something else.
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(~grp+categ)+coord_polar("x")
ggplot(dd, aes(x=factor(1),y=value,
fill=condition))+geom_bar(stat='identity')+
facet_grid(grp~categ)+coord_polar("x")
Something strange happened with the top opening here, maybe its just my interface. Should get you going enough though!

Calculating ratios by group with dplyr

Using the following dataframe I would like to group the data by replicate and group and then calculate a ratio of treatment values to control values.
structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"),
replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four",
"one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"),
fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"),
quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
)), .Names = c("group", "treatment", "replicate", "fatty_acid_family",
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA,
-8L))
I have tried using dplyr as follows:
group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])
but this results in Error: incompatible size (%d), expecting %d (the group size) or 1
Initially I thought this might be because I was trying to create 4 ratios from a df 8 rows deep and so I thought summarise might be the answer (collapsing each group to one ratio) but that doesn't work either (my understanding is a shortcoming).
group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])
replicate group ratio
1 four case NA
2 four controls NA
3 one case NA
4 one controls NA
5 three case NA
6 three controls NA
7 two case NA
8 two controls NA
I would appreciate some advice on where I'm going wrong or even if this can be done with dplyr.
Thanks.
You can try:
group_by(dataIn, replicate) %>%
summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
# replicate ratio
#1 four 1.078562
#2 one 1.333333
#3 three 1.070573
#4 two 1.446449
Because you grouped by replicate and group, you could not access data from different groups at the same time.
#talat's answer solved for me. I created a minimal reproducible example to help my own understanding:
df <- structure(list(a = c("a", "a", "b", "b", "c", "c", "d", "d"),
b = c(1, 2, 1, 2, 1, 2, 1, 2), c = c(22, 15, 5, 0.2, 107,
6, 0.2, 4)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# a b c
# 1 a 1 22.0
# 2 a 2 15.0
# 3 b 1 5.0
# 4 b 2 0.2
# 5 c 1 107.0
# 6 c 2 6.0
# 7 d 1 0.2
# 8 d 2 4.0
library(dplyr)
df %>%
group_by(a) %>%
summarise(prop = c[b == 1] / c[b == 2])
# a prop
# 1 a 1.466667
# 2 b 25.000000
# 3 c 17.833333
# 4 d 0.050000

Resources