Related
This question already has answers here:
Customize axis labels
(4 answers)
How to use Greek symbols in ggplot2?
(4 answers)
Closed 2 years ago.
Hello everyone and happy new year !!! I would need help in order to improve a ggplot figure.
I have a dfataframe (df1) that looks like so:
x y z
1 a 1 -0.13031299
2 b 1 0.71407346
3 c 1 -0.15669153
4 d 1 0.39894708
5 a 2 0.64465669
6 b 2 -1.18694632
7 c 2 -0.25720456
8 d 2 1.34927378
9 a 3 -1.03584455
10 b 3 0.14840876
11 c 3 0.50119220
12 d 3 0.51168810
13 a 4 -0.94795328
14 b 4 0.08610489
15 c 4 1.55144239
16 d 4 0.20220334
Here is the data as dput() and my code:
df1 <- structure(list(x = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"
), class = "factor"), y = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), z = c(-0.130312994048691, 0.714073455094197,
-0.156691533710652, 0.39894708481517, 0.644656691110372, -1.18694632145378,
-0.257204564112021, 1.34927378214664, -1.03584454605617, 0.148408762003154,
0.501192202628166, 0.511688097742773, -0.947953281835912, 0.0861048893885463,
1.55144239199118, 0.20220333664676)), class = "data.frame", row.names = c(NA,
-16L))
library(ggplot2)
df1$facet <- ifelse(df1$x %in% c("c", "d"), "cd", df1$x)
p1 <- ggplot(df1, aes(x = x, y = y))
p1 <- p1 + geom_tile(aes(fill = z), colour = "grey20")
p1 <- p1 + scale_fill_gradient2(
low = "darkgreen",
mid = "white",
high = "darkred",
breaks = c(min(df1$z), max(df1$z)),
labels = c("Low", "High")
)
p1 + facet_grid(.~facet, space = "free", scales = "free_x") +
theme(strip.text.x = element_blank())
With this code (inspired from here) I get this figure:
But I wondered if someone had an idea in order to :
Add Greek letter in the x axis text (here alpha and beta)
To add sub Y axis element (here noted as Element 1-3) where Element1 (from 0 to 1); Element2 (from 1 to 3) and Element3 (from 3 to the end)
the result should be something like:
I have a dataset where the first three columns (G1P1, G1P2, G1P3) indicate one grouping of three individuals (i.e. Sidney, Blake, Max on Row 1), the second three columns (G2P1, G2P2, G2P3) indicate another grouping of three individuals (i.e. David, Steve, Daniel on Row 2), etc.... There are a total of 12 individuals, and dataset is pretty much all the possible groupings of these 12 people (approximately 300,000 rows). Each group's cumulative test scores are represented on far right columns, (G1.Sum, G2.Sum, G3.Sum, G4.Sum
).
#### The dput(data) of the first five rows ####
data <- structure(list(X = 1:5, G1P1 = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("CHRIS", "DAVID", "EVA", "SIDNEY"), class = "factor"), G1P2 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("BLAKE", "NICK", "PATRIC", "STEVE"), class = "factor"), G1P3 = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("BEAU", "BRANDON", "DANIEL", "MAX"), class = "factor"), G2P1 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("CHRIS", "DAVID", "EVA", "SIDNEY"), class = "factor"), G2P2 = structure(c(4L, 4L, 3L, 3L, 2L), .Label = c("BLAKE", "NICK", "PATRIC", "STEVE"), class = "factor"), G2P3 = structure(c(3L, 3L, 2L, 2L, 1L), .Label = c("BEAU", "BRANDON", "DANIEL", "MAX"), class = "factor"), G3P1 = structure(c(1L, 3L, 2L, 3L, 2L), .Label = c("CHRIS", "DAVID", "EVA", "SIDNEY"), class = "factor"), G3P2 = structure(c(3L, 2L, 4L, 2L, 4L), .Label = c("BLAKE", "NICK", "PATRIC", "STEVE"), class = "factor"), G3P3 = structure(c(2L, 1L, 3L, 1L, 3L), .Label = c("BEAU", "BRANDON", "DANIEL", "MAX"), class = "factor"), G4P1 = structure(c(3L, 1L, 3L, 2L, 1L), .Label = c("CHRIS", "DAVID", "EVA", "SIDNEY"), class = "factor"), G4P2 = structure(c(2L, 3L, 2L, 4L, 3L), .Label = c("BLAKE", "NICK", "PATRIC", "STEVE"), class = "factor"), G4P3 = structure(c(1L, 2L, 1L, 3L, 2L), .Label = c("BEAU", "BRANDON", "DANIEL", "MAX"), class = "factor"), G1.Sum = c(63.33333333, 63.33333333, 63.33333333, 63.33333333, 63.33333333), G2.Sum = c(58.78333333, 58.78333333, 54.62333333, 54.62333333, 58.69), G3.Sum = c(54.62333333, 58.69, 58.78333333, 58.69, 58.78333333), G4.Sum = c(58.69, 54.62333333, 58.69, 58.78333333, 54.62333333)), .Names = c("X", "G1P1", "G1P2", "G1P3", "G2P1", "G2P2", "G2P3", "G3P1", "G3P2", "G3P3", "G4P1", "G4P2", "G4P3", "G1.Sum", "G2.Sum", "G3.Sum", "G4.Sum"), row.names = c(NA, 5L), class = "data.frame")
I was wondering how you would write an R function so for each row, you can record where the person's group score ranked. For example, on Row 1, SIDNEY was in a group with the highest score at 63.3333. So his rank would be a '1.' But for BRANDON, his group scored last (54.62333), so her rank would be 4. I would like my final data.frame output to be something like this:
ranks <- t(apply(data[grep("Sum", names(data))], 1, function(x) rep(match(x, sort(x, decreasing=T)),each=3)))
just.names <- data[grep("P", names(data))] #Subset without sums
names <- as.character(unlist(just.names[1,])) #create name vector
sapply(names, function(x) ranks[just.names == x])
# SIDNEY BLAKE MAX DAVID STEVE DANIEL CHRIS PATRIC BRANDON EVA NICK BEAU
# [1,] 1 1 1 2 2 2 4 4 4 3 3 3
# [2,] 1 1 1 2 2 2 4 4 4 3 3 3
# [3,] 1 1 1 2 2 2 4 4 4 3 3 3
# [4,] 1 1 1 2 2 2 4 4 4 3 3 3
# [5,] 1 1 1 2 2 2 4 4 4 3 3 3
We first rank the sums and replicate them 3 times each. Next we subset the larger data frame with the names only (take out the sum columns). We make a vector with the individual names. And lastly, we subset the ranks matrix that we created first by seeing where in the data frame the name appears.
Using dplyr and tidyr. First, ranking, then uniting all the rows with their rank, converting to long data, separating out the variables, then finally spreading.
It got really long, and can probably be simplified:
library(dplyr)
library(tidyr)
data[ ,14:17] <- t(apply(-data[ ,14:17], 1 , rank))
data %>% unite("g1", starts_with("G1")) %>%
unite("g2", starts_with("G2")) %>%
unite("g3", starts_with("G3")) %>%
unite("g4", starts_with("G4")) %>%
gather(Row, val, -X) %>%
select(-Row) %>%
separate(val, c("1", "2", "3", "rank")) %>%
gather(zzz, name, -X, -rank) %>%
select(-zzz) %>%
spread(name, rank)
X BEAU BLAKE BRANDON CHRIS DANIEL DAVID EVA MAX NICK PATRIC SIDNEY STEVE
1 1 3 1 4 4 2 2 3 1 3 4 1 2
2 2 3 1 4 4 2 2 3 1 3 4 1 2
3 3 3 1 4 4 2 2 3 1 3 4 1 2
4 4 3 1 4 4 2 2 3 1 3 4 1 2
5 5 3 1 4 4 2 2 3 1 3 4 1 2
Using previous answer's 'rank' matrix and library(reshape2) to convert wide data.frame to long data.frame,
ranks <- t(apply(test[grep("Sum",names(test))], 1, function (x)
rep(match(x, sort(x, decreasing=T)),each=3)))
colnames(ranks) <- names(test)[grep("P", names(test))]
# data subset
test_L <- test[,-grep("Avg", names(test))]
df_player <- data.frame(position= names(test)[grep("P", names(test))],
t(test_L[,-1]), row.names = NULL)
df_ranks <- data.frame(position=names(test)[grep("P", names(test))],
t(ranks), row.names=NULL)
# Combine two temporary data.frames
df_player_melted <- melt(df_player, id=1,
variable.name = "rowNumber", value.name = "player")
df_ranks_melted <- rank= melt(df_ranks, id=1,
variable.name = "rowNumber", value.name = "rank")
df <- cbind(df_player_melted, rank= df_ranks_melted$rank)
# cast into the output format you want
df <- dcast(df, rowNumber ~ player + rank)[1,]
I have following data and code:
mydf
grp categ condition value
1 A X P 2
2 B X P 5
3 A Y P 9
4 B Y P 6
5 A X Q 4
6 B X Q 5
7 A Y Q 8
8 B Y Q 2
>
>
mydf = structure(list(grp = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("A", "B"), class = "factor"), categ = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
condition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("P",
"Q"), class = "factor"), value = c(2, 5, 9, 6, 4, 5, 8, 2
)), .Names = c("grp", "categ", "condition", "value"), out.attrs = structure(list(
dim = structure(c(2L, 2L, 2L), .Names = c("grp", "categ",
"condition")), dimnames = structure(list(grp = c("grp=A",
"grp=B"), categ = c("categ=X", "categ=Y"), condition = c("condition=P",
"condition=Q")), .Names = c("grp", "categ", "condition"))), .Names = c("dim",
"dimnames")), row.names = c(NA, -8L), class = "data.frame")
However, following works for data.frame but not for data.table:
> data.frame(with(mydf, table(grp, categ, condition)))
grp categ condition Freq
1 A X P 1
2 B X P 1
3 A Y P 1
4 B Y P 1
5 A X Q 1
6 B X Q 1
7 A Y Q 1
8 B Y Q 1
>
> data.table(with(mydf, table(grp, categ, condition)))
V1
1: 1
2: 1
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
>
Am I making some mistake here or do I need to correct the data.table command to get other variables? It is highly unlikely that there is a bug here. With 2 variables it works all right:
> data.table(with(mydf, table(grp, categ)))
categ grp N
1: A X 2
2: B X 2
3: A Y 2
4: B Y 2
>
>
> data.frame(with(mydf, table(grp, categ)))
grp categ Freq
1 A X 2
2 B X 2
3 A Y 2
4 B Y 2
Thanks for your help.
Fixed in commit #1760 of data.table v1.9.5. Closes #1043.
I have a problem with the aes parameters; not sure what it's trying to tell me to fix.
My data frame is like so:
> qualityScores
Test1 Test2 Test3 Test4 Test5
Sample1 1 2 2 3 1
Sample2 1 2 2 3 2
Sample3 1 2 1 1 3
Sample4 1 1 3 1 1
Sample5 1 3 1 1 2
Where 1 stands for PASS, 2 for WARN, and 3 for FAIL.
Here's the dput of my data:
structure(list(Test1 = c(1L, 1L, 1L, 1L, 1L), Test2 = c(2L, 2L,
2L, 1L, 3L), Test3 = c(2L, 2L, 1L, 3L, 1L), Test4 = c(3L, 3L,
1L, 1L, 1L), Test5 = c(1L, 2L, 3L, 1L, 2L)), .Names = c("Test1",
"Test2", "Test3", "Test4", "Test5"), class = "data.frame", row.names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"))
I am trying to make a heatmap where 1 will be represented by gree, 2 by yellow, and 3 by red, using ggplots2.
This is my code:
samples <- rownames(qualityScores)
tests <- colnames(qualityScore)
testScores <- unlist(qualityScores)
colors <- colorRampPalette(c("red", "yellow", "green"))(n=3)
ggplot(qualityScores, aes(x = tests, y = samples, fill = testScores) ) + geom_tile() + scale_fill_gradient2(low = colors[1], mid = colors[2], high = colors[3])
And I get this error message:
Error: Aesthetics must either be length one, or the same length as the dataProblems:colnames(SeqRunQualitySumNumeric)
Where am I going wrong?
Thank you.
This will be much easier if your reshape your data from wide to long format. There are many ways to do that but here I used reshape2
library(reshape2); library(ggplot2)
colors <- c("green", "yellow", "red")
ggplot(melt(cbind(sample=rownames(qualityScores), qualityScores)),
aes(x = variable, y = sample, fill = factor(value))) +
geom_tile() +
scale_fill_manual(values=colors)
I have data of the form:
Day A B
1 1 4
1 2 5
1 3 6
2 2 2
2 3 4
2 5 6
3 6 7
3 4 6
And I would like to display this on a single chart, with Day along the x-axis, and with each x-position having a boxplot for each of A and B (colour coded).
Here's a (slight) modification of an example form the ?boxplot help page. The examples show off many common uses of the functions.
tg <- data.frame(
dose=ToothGrowth$dose[1:30],
A=ToothGrowth$len[1:30],
B=ToothGrowth$len[31:60]
)
head(tg)
# dose A B
# 1 0.5 4.2 15.2
# 2 0.5 11.5 21.5
# 3 0.5 7.3 17.6
# 4 0.5 5.8 9.7
# 5 0.5 6.4 14.5
# 6 0.5 10.0 10.0
boxplot(A ~ dose, data = tg,
boxwex = 0.25, at = 1:3 - 0.2,
col = "yellow",
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length",
xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
boxplot(B ~ dose, data = tg, add = TRUE,
boxwex = 0.25, at = 1:3 + 0.2,
col = "orange")
legend(2, 9, c("A", "B"),
fill = c("yellow", "orange"))
Try:
ddf = structure(list(Day = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), A = c(1L,
2L, 3L, 2L, 3L, 5L, 6L, 4L), B = c(4L, 5L, 6L, 2L, 4L, 6L, 7L,
6L)), .Names = c("Day", "A", "B"), class = "data.frame", row.names = c(NA,
-8L))
mm = melt(ddf, id='Day')
ggplot(mm)+geom_boxplot(aes(x=factor(Day), y=value, fill=variable))