How to make a merged heatmap from each pair of value columns


How can I put two columns in one heatmap?
Let's say I have the following data:
data<- structure(list(names = structure(c(5L, 1L, 10L, 2L, 6L, 4L, 9L,
7L, 11L, 3L, 8L), .Label = c("Bin", "Dari", "Down", "How", "India",
"Karachi", "Left", "middle", "Right", "Trash", "Up"), class = "factor"),
X1Huor = c(1.555555556, 5.2555556, 2.256544, 2.3654225, 1.2665545,
0, 1.889822365, 2.37232101, -1, -1.885618083, 1.128576187
), X2Hour = c(1.36558854, 2.254887, 2.3333333, 0.22255444,
2.256588, 5.66666, -0.377964473, 0.107211253, -1, 0, 0),
X3Hour = c(0, 1.222222222, 5.336666, 1.179323788, 0.832050294,
-0.397359707, 0.185695338, 1.393746295, -1, -2.121320344,
1.523019248), X4Hour = c(3.988620176, 3.544745039, -2.365555,
2.366666, 1.000000225, -0.662266179, -0.557086015, 0.862662186,
0, -1.305459824, 1.929157714), X5Hour = c(2.366666, 2.333365,
4.22222, 0.823333333, 0.980196059, -2.516611478, 2.267786838,
0.32163376, 0, -2.592724864, 0.816496581)), .Names = c("names",
"X1Huor", "X2Hour", "X3Hour", "X4Hour", "X5Hour"), class = "data.frame", row.names = c(NA,
-11L))
This data has five columns of values. I want to make a heatmap in which each cell is split in two: one half shows the value from the first column and the other half shows the value from the second column.
The same goes for the third and fourth columns,
and for the fifth and sixth (there is no sixth column, so that half can be left empty).
This is just an example to show what I am looking for. I have searched a lot, but I could not find anything like this.
The colors should range from green to red: values higher than 2 should be red and values lower than -2 should be green.
Any thoughts on how to do this?

This is a somewhat hacky solution, but it might work for you.
The idea is to use geom_polygon() to draw two triangles per cell and stack the cells on top of each other. To do that, we first need to generate the triangle coordinates:
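library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
# the following two lines create the triangle coordinates:
# every cell gets a lower-right triangle (1,1)-(2,1)-(2,2) and an
# upper-left triangle (1,1)-(1,2)-(2,2), shifted up by one unit per row
x = rep(c(1, 2, 2, 1, 1, 2), nrow(data))
y = rep(c(1, 1, 2, 1, 2, 2), nrow(data)) + rep(seq_len(nrow(data)) - 1, each = 6)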
Now that we have the coordinates, we need an id for each triangle, based on the names. Because each triangle must have a unique id, we create two versions of each name (one per hour column), sorted by name so that they line up with the row offsets above:
names <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
mutate(name = str_c(names, key)) %>%
.$name %>%
rep(each = 3)
And now we do the same with the hours:
hour <- data %>%
select(names, X1Huor, X2Hour) %>%
gather("key", "value", X1Huor, X2Hour) %>%
arrange(names, key) %>%
.$value %>%
rep(each = 3)
datapoly <- data.frame(x = x, y = y , hour = hour, names = names)
Since the datapoly data frame has no proper labels for the y-axis, we need to create them:
name_labels <- data %>%
select(names) %>%
arrange(names) %>%
.$names
The scene is now set for our graph:
ggplot(datapoly, aes(x = x, y = y)) +
geom_polygon(aes(group = names, fill = hour), color = "black") +
scale_fill_continuous(low = "green", high = "red") +
scale_y_continuous(breaks = 1:nrow(data), labels = name_labels) +
theme(axis.text.y = element_text(vjust = -2),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.title = element_blank())
The output looks like this:
A few points to keep in mind: is this really a plot you want to create and use, and is it really useful for your purposes? More traditional visualization methods may be more suitable. Also, I didn't repeat this for the other hour columns because it is quite tedious, but the method for doing so should be clear enough (I hope).
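The same idea can be extended to all column pairs at once. The following is a minimal sketch under my own assumptions, not the answer's exact code: it builds both triangles for every value column with tidyr::pivot_longer() and tidyr::uncount(), places the pairs X1/X2, X3/X4 and X5/(empty) side by side, and uses scale_fill_gradient2() with limits = c(-2, 2) and oob = scales::squish so that values above 2 are fully red and values below -2 fully green, as the question asks. The yellow midpoint colour and the helper names (value_cols, long, tri, p) are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# data from the question
data <- structure(list(names = structure(c(5L, 1L, 10L, 2L, 6L, 4L, 9L,
  7L, 11L, 3L, 8L), .Label = c("Bin", "Dari", "Down", "How", "India",
  "Karachi", "Left", "middle", "Right", "Trash", "Up"), class = "factor"),
  X1Huor = c(1.555555556, 5.2555556, 2.256544, 2.3654225, 1.2665545,
    0, 1.889822365, 2.37232101, -1, -1.885618083, 1.128576187),
  X2Hour = c(1.36558854, 2.254887, 2.3333333, 0.22255444,
    2.256588, 5.66666, -0.377964473, 0.107211253, -1, 0, 0),
  X3Hour = c(0, 1.222222222, 5.336666, 1.179323788, 0.832050294,
    -0.397359707, 0.185695338, 1.393746295, -1, -2.121320344, 1.523019248),
  X4Hour = c(3.988620176, 3.544745039, -2.365555, 2.366666, 1.000000225,
    -0.662266179, -0.557086015, 0.862662186, 0, -1.305459824, 1.929157714),
  X5Hour = c(2.366666, 2.333365, 4.22222, 0.823333333, 0.980196059,
    -2.516611478, 2.267786838, 0.32163376, 0, -2.592724864, 0.816496581)),
  class = "data.frame", row.names = c(NA, -11L))

value_cols <- c("X1Huor", "X2Hour", "X3Hour", "X4Hour", "X5Hour")

long <- data %>%
  mutate(row = as.integer(names)) %>%          # y position = factor level (alphabetical)
  pivot_longer(all_of(value_cols), names_to = "col", values_to = "value") %>%
  mutate(k    = match(col, value_cols),        # column index 1..5
         pair = (k + 1) %/% 2,                 # columns 1,2 -> 1; 3,4 -> 2; 5 -> 3
         half = if_else(k %% 2 == 1, "lower", "upper"))

# three vertices per triangle: "lower" = bottom-right half, "upper" = top-left half
tri <- long %>%
  mutate(id = row_number()) %>%
  uncount(3, .id = "v") %>%
  mutate(x = pair + if_else(half == "lower", c(0, 1, 1)[v], c(0, 0, 1)[v]),
         y = row  + if_else(half == "lower", c(0, 0, 1)[v], c(0, 1, 1)[v]))

lvls <- levels(data$names)
p <- ggplot(tri, aes(x, y, group = id, fill = value)) +
  geom_polygon(colour = "black") +
  # everything above 2 / below -2 is squished onto the red / green end
  scale_fill_gradient2(low = "green", mid = "yellow", high = "red",
                       midpoint = 0, limits = c(-2, 2),
                       oob = scales::squish) +
  scale_x_continuous(breaks = 1:3 + 0.5,
                     labels = c("X1 / X2", "X3 / X4", "X5 / (empty)")) +
  scale_y_continuous(breaks = seq_along(lvls) + 0.5, labels = lvls) +
  coord_equal() +
  labs(x = NULL, y = NULL)
p
```

Because X5Hour has no partner column, its upper-left half simply has no polygon, which leaves that half of the cell empty.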

Related

Avoid ordering hoverinfo by group when using legendgroup in plotly

I have built a Gantt chart in plotly, following an example.
I want to use trace filtering with the plotly legend, but the code creates an enormous number of traces, which results in hundreds of entries in the legend as well. To avoid that, I first use add_lines() on one row per group with showlegend = T and a legendgroup. After that, I add the remaining lines to the figure with showlegend = F and the same legendgroup. This solves the legend problem, but it messes up the text order inside the hoverinfo: the text is grouped and ordered by legendgroup instead of by x-axis value.
Data (an excerpt from Soderbergh's Sex, Lies, and Videotape :)):
library(plotly)
library(dplyr)
df_plotly = structure(list(tt = 1:9, speaker = c("A", "T", "A", "T", "A",
"T", "A", "T", "A"), min_time = c(42.328, 50.67, 53.297, 79.073,
87.54, 116.569, 120.948, 122.45, 131.959), max_time = c(50.5,
55.67, 81.573, 86.203, 112.938, 121.569, 125.948, 130.413, 136.959
), line = c("</br> Garbage.</br>All I've been thinking about all week is garbage.</br>I mean, I can't stop thinking about it.",
"</br> What kind of thoughts about garbage?", "</br> I just</br>I've gotten real concerned over what's gonna happen with all the garbage.</br>I mean, we've got so much of it.</br>You know? I mean, we have to run out of places to put this stuff eventually.</br>The last time I .</br>I started feelin' this way is when that barge was stranded</br>and, you know, it was going around the island and nobody would claim it.</br>Do you remember that?",
"</br> Yes, I remember.</br>Do you have any idea what may have triggered this concern?",
"</br> Yeah. Yeah.</br>You see, the other night, John was taking out the garbage,</br>and he kept spilling things out of the container,</br>and that made me</br>I started imagining, like,</br>a garbage can that produces garbage, and it doesn't stop.</br>It just keeps producing garbage, and it just keeps overflowing.</br>And, yy-you know, what would you do, you know, to try to stop something like that?",
"</br> Ann, do you see any pattern here?", "</br> What do you mean?",
"</br> Well, last week we were talking about your obsession</br>with the families of airline fatalities.</br>Now we're talking about your concern over the garbage problem.",
"</br> Yeah? So?"), color = structure(c(1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L), .Label = c("#66C2A5", "#FC8D62"), class = "factor")), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
Gantt chart without a legend, with the correct order inside hoverinfo:
fig = plot_ly() %>% layout(hovermode = "x unified")
df = df_plotly
# one trace per utterance, added in chronological order
for(i in seq_len(nrow(df))){
fig <- add_lines(fig,
x = c(df$min_time[i], df$max_time[i]), # x0, x1
y = c(df$speaker[i], df$speaker[i]), # y0, y1
name = df$speaker[i],
mode = "lines",
line = list(color = df$color[i], width = 20),
hoverinfo = "text",
text = df$line[i],
evaluate = T # needed to avoid lazy loading
)
}
fig
Gantt chart with a legend, but with an incorrect order inside hoverinfo:
fig = plot_ly() %>% layout(hovermode = "x unified")
# first utterance of each speaker: these traces carry the legend entries
df = df_plotly %>%
group_by(speaker) %>%
filter(min_time == min(min_time)) %>%
ungroup()
for(i in seq_len(nrow(df))){
fig <- add_lines(fig,
x = c(df$min_time[i], df$max_time[i]), # x0, x1
y = c(df$speaker[i], df$speaker[i]), # y0, y1
name = df$speaker[i],
mode = "lines",
line = list(color = df$color[i], width = 20),
legendgroup = df$speaker[i],
showlegend = T,
hoverinfo = "text",
text = df$line[i],
evaluate = T # needed to avoid lazy loading
)
}
# all remaining utterances: same legendgroup, but no legend entry
df = df_plotly %>%
group_by(speaker) %>%
filter(min_time != min(min_time)) %>%
ungroup()
for(i in seq_len(nrow(df))){
fig <- add_lines(fig,
x = c(df$min_time[i], df$max_time[i]), # x0, x1
y = c(df$speaker[i], df$speaker[i]), # y0, y1
name = df$speaker[i],
mode = "lines",
line = list(color = df$color[i], width = 20),
legendgroup = df$speaker[i],
showlegend = F,
hoverinfo = "text",
text = df$line[i],
evaluate = T # needed to avoid lazy loading
)
}
fig
Thanks!

speech-gaze activity plot in ggplot2

I have data with Utterances by speakers in conversation as well as their gazes to one another. The speakers' gazes are in columns A_aoi, B_aoi, and C_aoi, the gaze durations are in A_aoi_dur, B_aoi_dur, and C_aoi_dur. Here's a reproducible snippet of the data:
df0 <- structure(list(Line = c(105L, 106L, 107L, 109L, 110L, 111L, 112L,
113L, 114L, 115L, 116L), Speaker = c("ID01.A", NA, "ID01.A",
NA, "ID01.B", NA, "ID01.A", NA, "ID01.A", NA, "ID01.C"), Utterance = c("so you've ↑obviously↑ thought about it obviously: (.) have made a decision (.) I'm !head!ing in this door (.) one of the cleaning ladies at the UB !grabb!ed my elbow",
"(0.662)", "and said (.) ~no no no !this! is the !womens'! bathroom~=",
"(0.015)", "=((v: gasps))=", "(0.166)", "=NOW", "(0.622)", "!how! this always plays out ",
"(0.726)", "[when was] that¿="), UttStart = c(163898L, 172500L,
173162L, 176100L, 176115L, 176800L, 176966L, 177372L, 177994L,
179328L, 180054L), UttEnd = c(172500, 173162, 176100, 176115,
176800, 176966, 177372, 177994, 179328, 180054, 180668), UttDur = c(8602,
662, 2938, 15, 685, 166, 406, 622, 1334, 726, 614), A_aoi = c("*B*C*B*C*B*C*B*C*B*C",
"C*", "*B*C*C", "C", "C*", "*", "*C", "C", "C*B", "B*", "*"),
A_aoi_dur = c("21,516,79,333,200,634,233,651,17,2332,33,400,33,518,17,532,33,1900,119,1",
"414,248", "1124,412,116,533,600,153", "15", "616,69", "166",
"153,253", "622", "204,151,979", "219,507", "614"), B_aoi = c("A*A*A*A*A",
"A", "A", "A", "A", "A", "A", "A*", "*A*A", "A*A", "A*A"),
B_aoi_dur = c("475,130,567,137,1983,313,787,1400,2810", "662",
"2938", "15", "685", "166", "406", "398,224", "76,136,284,838",
"108,571,47", "116,270,228"), C_aoi = c("A", "A", "A*A*A",
"A", "A", "A", "A", "A*A", "A", "A*A", "A"), C_aoi_dur = c("8602",
"662", "1058,123,1300,144,313", "15", "685", "166", "406",
"264,351,7", "1334", "125,323,278", "614")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
EDIT: new test data with temporally overlapping Utterances:
df0 <- structure(list(Line = 137:145,
Speaker = c("ID01.A", "ID01.A-Q", NA, "ID01.A", "ID01.A-Q", "ID01.A-Q", "ID01.A-Q", "ID01.A-Q",NA),
Utterance = c("↑she gra:bs my elbow (.) I turn to !look! at her↑ and she's like ~this is a (.) womens' bathroom you can't go in there~",
"~this is a (.) womens' bathroom you can't go in there~", "(0.534)",
"and I'm like ~((silent f: blank stare))~ (.) and she didn't, she was just like ~you can't go in~ (.) I'm like ~I'm a !woman!~ she said ~no you're not you can't go in~",
"~((silent f: blank stare))~", "~you can't go in~", "~I'm a !woman!~",
"~no you're not you can't go in~", "(0.487)"),
UttStart = c(208845L, 211450L, 214136L, 214670L, 215409L, 218307L, 219235L, 220076L, 221368L),
UttEnd = c(214136, 214136, 214670, 221368, 217117, 219050, 219885, 221368, 221855),
UttDur = c(5291, 2686, 534, 6698, 1708, 743, 650, 1292, 487),
A_aoi = c("C*B*C*C*B*C*", "C*B*C*", "*B", "B*C*B*C*C*B*B", "C*B", "C*B", "*", "*B","B"),
A_aoi_dur = c("57,445,1100,135,199,333,866,302,832,33,468,521","530,302,832,33,468,521",
"144,390", "377,235,466,399,1268,132,268,132,433,6,716,1412,854","339,399,970", "73,6,664", "650", "438,854", "487"),
B_aoi = c("A*A","A", "A", "A*A*A*A*A*A", "A", "*A*A", "*A", "A*A", "A"),
B_aoi_dur = c("1691,121,3479", "2686", "534", "53,180,3333,134,253,280,203,534,1296,138,294",
"1708", "63,253,280,147", "405,245", "860,138,294", "487"),
C_aoi = c("A", "A", "A", "A*A", "A", "A*", "A", "A", "A"),
C_aoi_dur = c("5291", "2686", "534", "3766,734,2198",
"1708", "129,614", "650", "1292", "487")),
row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
What I'd like to visualize is who is looking at whom, and for how long, during each Utterance, roughly as in this schematic representation:
What I can do at present is plot the gazes on a minute-by-minute scale, but only the gazes, not the Utterances (see my earlier question "Plotting gaze movements by multiple speakers in a single plot"). Starting from the data above, this takes several transformations (shown below), but the resulting plot does not show the Utterances, and it plots the gazes per minute, whereas I need them per Utterance.
I'm fully aware that this is asking a lot, so any help is all the more appreciated.
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
# pivot_longer so that all gazes have their own row:
df0 <- df0 %>%
rename_with(~ str_c(., "_AOI"), ends_with("_aoi")) %>%
pivot_longer(cols = contains("_"),
names_to = c("Gaze_by", ".value"), #
names_pattern = "^(.*)_([^_]+$)"
) %>%
mutate(Gaze_by = sub("^(.).*", "\\1", Gaze_by)) %>%
mutate(AOI = str_replace_all(AOI, "(?<=.)(?=.)", ",")) %>%
separate_rows(c(AOI, dur), sep = ",", convert = TRUE)
# compute starttimes and endtimes for gazes:
df1 <- df0 %>%
group_by(Gaze_by) %>%
mutate(
end = cumsum(dur),
start = end - dur
)
View(df1)
# compute minutes:
df2 <- df1 %>%
mutate(
# which minute does the event start in?
minute_start = as.integer(start/60000),
# which minute does the event end in?
minute_end = as.integer(end/60000),
# does the event straddle a minute mark?
straddler = minute_end > minute_start)
View(df2)
# 1st subset of `df2`:
df2_A1 <- df2 %>%
# filter those rows that contain events straddling minute marks:
filter(straddler) %>%
# reduce the endtime to the exact minute mark:
mutate(end = minute_end*60000)
View(df2_A1)
# 2nd subset of `df2`:
df2_A2 <- df2 %>%
# filter those rows that contain events straddling minute marks:
filter(straddler) %>%
# reduce the starttime to the exact minute mark:
mutate(start = minute_end*60000)
View(df2_A2)
# 3rd subset of `df2`:
df2_A3 <- df2 %>%
# filter those rows that do not contain events straddling minute marks:
filter(!straddler)
View(df2_A3)
# row-bind all three subsets:
df4 <- rbind(df2_A1, df2_A2, df2_A3) %>%
arrange(start) %>%
mutate(
minute = as.integer(start/60000),
# reduce total starttimes to starttimes per minute:
start_pm = start - 60000*minute,
# reduce total endtimes to endtimes per minute:
end_pm = end - 60000*minute)
# plot gaze activity for **ALL** speakers:
df4 %>%
ggplot(aes(x = start_pm,
xend = end_pm,
y = minute + scale(as.numeric(as.factor(Gaze_by))) / 6,
yend = minute + scale(as.numeric(as.factor(Gaze_by))) / 6,
color = AOI)) +
# draw segments for AOI:
geom_segment(size = 2) +
# reverse y-axis scale:
scale_y_reverse(breaks = 0:max(df4$minute),
labels = paste(0:max(df4$minute), "min", " Gaze_by_A\n Gaze_by_B\n Gaze_by_C", sep = " "),
name = NULL) +
# define custom colors:
scale_colour_manual(values = c("*" = "lemonchiffon",
"A" = "darkorange",
"B" = "lawngreen",
"C" = "slateblue1")) +
# plot title:
labs(title = "Gaze activity") +
theme(axis.title.x.bottom = element_blank())

Here is a solution that gets close to what you are looking for, making use of facets. It also uses forcats::fct_reorder and stringr::str_wrap (which are both part of the tidyverse).
This also wraps any long utterances and keeps the x-scale the same for all facets, rather than allowing them to stretch to fill the width.
library(forcats)
library(stringr)
df4 %>%
mutate(#add text for y axis labels
Gaze_by = paste0("Gaze_by_", Gaze_by),
#reorder facet panels, add speaker at start, and wrap to 120 characters
Utterance = fct_reorder(str_wrap(paste0(substr(Speaker, 6, 6), ": ",
Utterance),
120),
start_pm),
#set a dummy end point for each utterance based on the longest one
max_x = UttStart - min(UttStart) + max(UttDur)) %>%
ggplot(aes(x = start_pm, xend = end_pm,
y = Gaze_by, yend = Gaze_by, #as discrete variable
color = AOI)) +
geom_segment(size = 3) +
geom_point(aes(x = max_x, y = Gaze_by), alpha = 0) + #plot invisible dummy end points
scale_y_discrete(name = NULL, limits = rev) + #rev to get A at the top
facet_wrap(~Utterance, scales = "free_x", ncol = 1) +
scale_colour_manual(values = c("*" = "lemonchiffon",
"A" = "darkorange",
"B" = "lawngreen",
"C" = "slateblue1")) +
labs(title = "Gaze activity") +
theme_minimal() + #removes a lot of lines etc
theme(strip.text = element_text(color = "blue", hjust = 0), #facet strip text
strip.background = element_rect(fill = "white", color = "white"),
axis.title.x.bottom = element_blank())
To cut the utterances into 4-second chunks, you can do something like this (map2() comes from purrr and unnest() from tidyr):
library(purrr)
library(tidyr)
df4 %>%
  group_by(Utterance) %>%
  #work out times relative to the start of each utterance and the 4-second chunks each gaze touches
  mutate(uttStart = min(start_pm),
         relStart = start_pm - uttStart,
         relEnd = end_pm - uttStart,
         subNo = map2(relStart, relEnd, ~seq(.x %/% 4000, .y %/% 4000, 1))) %>%
  ungroup() %>% #ungroup before Utterance itself is modified
  unnest(subNo) %>% #expand to one row per subutterance
  mutate(Utterance = paste0(Utterance, " (#", subNo + 1, ")"), #add sub no
         subStart = pmax(4000 * subNo, relStart), #limits on subUtt
         subEnd = pmin(4000 * (subNo + 1), relEnd), #limits on subUtt
         start_pm = uttStart + subStart, #redefine start
         end_pm = uttStart + subEnd, #redefine end
         max_x = uttStart + 4000 * (subNo + 1)) %>% #dummy end point at the end of each chunk
  mutate(Gaze_by = paste0("Gaze_by_", Gaze_by),
         Utterance = fct_reorder(str_wrap(paste0(substr(Speaker, 6, 6), ": ", Utterance),
                                          120), start_pm)) %>%
  ggplot(...) #...as per code above from this point

Comparing "Unlimited" value to numerical values in ggplot

I am trying to make a visual comparison between an input vector and my database. However, the input vector or the database may contain the string "UL", which means an infinite (unlimited) value. Think of it as an unlimited voice plan, with which you can make an unlimited number of calls.
Here is the code I have used to try to make a visual comparison between "UL" and other numerical values.
library(dplyr)
library(tidyr)
library(ggplot2)
# d is the database data.frame, with which we want to compare the input vector
d = structure(list(Type = c("H1", "H2", "H3"),
P1 = c(2000L, 1500L, 1000L),
P2 = c(60L, 40L, 20L),
P3 = c("UL", 3000L, 2000L)),
class = "data.frame",
row.names = c(NA, -3L))
# d2 is the input vector
d2 = structure(list(Type = "New_offre", P1 = 1200L, P2 = "UL", P3 = 2000),
class = "data.frame",
row.names = c(NA, -1L))
# check for unlimited values in d and d2 and replace each "UL" with twice the column's largest finite value
y1 <-rbind(d,d2)
y <- y1
if("UL" %in% y$P3){
max_P3_scale <- max(as.numeric(y[y$P3!="UL","P3"]))
y[y$P3=="UL","P3"]= 2*max_P3_scale
}
if("UL" %in% y$P2){
max_P2_scale <- max(as.numeric(y[y$P2!="UL","P2"]))
y[y$P2=="UL","P2"]= 2*max_P2_scale
}
y <- transform(y,P1=as.numeric(P1),
P2=as.numeric(P2),
P3=as.numeric(P3))
d <- y[1:nrow(d),]
d2<- y[nrow(d)+1,]
d %>% gather(var1, current, -Type) %>%
mutate(new = as.numeric(d2[cbind(rep(1, max(row_number())),
match(var1, names(d2)))]),
slope = factor(sign(current - new), -1:1)) %>%
gather(var2, val, -Type, -var1, -slope) %>%
ggplot(aes(x = factor(var2,levels = c("new","current")), y = val, group = 1)) +
geom_point(aes(fill = var2), shape = 2,size=4) +
geom_line(aes(colour = slope)) +
scale_colour_manual(values = c("green","green", "red")) +
facet_wrap(Type ~ var1,scales = "free")
My first attempt was to check whether there are "UL" values in P2 and P3. If so, I find the largest numeric value other than "UL" and replace every "UL" with twice this maximum, so that the plot always shows "UL" as the largest value.
The problem is that the plot no longer distinguishes real values from "UL" ones.
Here is what my plot looks like with this approach:
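One way to keep the two apart is to carry a logical flag next to the substituted number and map that flag to an aesthetic. The sketch below is an illustration under my own assumptions, not the question's code: it re-creates the original d and d2 (the code above overwrites them), reshapes everything with tidyr::pivot_longer(), replaces "UL" with twice the largest finite value per variable (as in the question), keeps an is_UL column, and uses it for the point shape plus a "UL" text label. The column names raw, value and is_UL are my own choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# original inputs from the question (re-created, since the code above overwrites d and d2)
d <- data.frame(Type = c("H1", "H2", "H3"),
                P1 = c(2000L, 1500L, 1000L),
                P2 = c(60L, 40L, 20L),
                P3 = c("UL", 3000L, 2000L))
d2 <- data.frame(Type = "New_offre", P1 = 1200L, P2 = "UL", P3 = 2000)

long <- bind_rows(mutate(d, across(-Type, as.character)),
                  mutate(d2, across(-Type, as.character))) %>%
  pivot_longer(-Type, names_to = "var", values_to = "raw") %>%
  group_by(var) %>%
  mutate(is_UL = raw == "UL",
         value = suppressWarnings(as.numeric(raw)),   # "UL" becomes NA here
         # stand-in position for "UL": twice the largest finite value of that variable
         value = if_else(is_UL, 2 * max(value, na.rm = TRUE), value)) %>%
  ungroup()

p <- ggplot(long, aes(x = Type, y = value)) +
  geom_point(aes(shape = is_UL), size = 3) +
  # label the substituted points so they cannot be mistaken for real values
  geom_text(data = filter(long, is_UL), aes(label = "UL"), vjust = -1) +
  scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 17), name = "Unlimited") +
  facet_wrap(~ var, scales = "free_y")
p
```

Any sentinel above the finite range would work here, because the flag, not the number itself, tells the reader which points are unlimited.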

How to plot multiple curves and color them as group using R ggplot

I have a data frame like this.
ID read1 read2 read3 read4 class
1 5820350 0.3791915 0.3747022 0.3729779 0.3724259 1
2 5820364 0.3758676 0.3711775 0.3695976 0.3693112 2
3 5820378 0.3885081 0.3823900 0.3804273 0.3797707 2
4 5820392 0.3779945 0.3729582 0.3714910 0.3709072 1
5 5820425 0.2954782 0.2971604 0.2973882 0.2973216 3
6 5820426 0.3376101 0.3368173 0.3360203 0.3359517 3
Each row represents one sample with four values, and the last column is the class of that sample. I want to plot one curve per sample, colored by class.
I tried to reshape the data frame, but then I lost the class column, which I need.
Could you please give me some hint or show me how to do that in R?
Thanks in advance.

You'll want to tidy your data first (shown below with tidyr::gather()), keeping ID and class as identifier columns so the class is not lost. Then, when you plot, set group = ID and color = factor(class) (for discrete colors):
library(tidyr)
library(ggplot2)
df <- structure(list(ID = c(5820350L, 5820364L, 5820378L, 5820392L, 5820425L, 5820426L),
read1 = c(0.3791915, 0.3758676, 0.3885081, 0.3779945, 0.2954782, 0.3376101),
read2 = c(0.3747022, 0.3711775, 0.38239, 0.3729582, 0.2971604, 0.3368173),
read3 = c(0.3729779, 0.3695976, 0.3804273, 0.371491, 0.2973882, 0.3360203),
read4 = c(0.3724259, 0.3693112, 0.3797707, 0.3709072, 0.2973216, 0.3359517),
class = c(1L, 2L, 2L, 1L, 3L, 3L)),
.Names = c("ID", "read1", "read2", "read3", "read4", "class"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
df <- gather(df, reading, value, -c(ID, class))
ggplot(df, aes(x = reading, y = value, color = factor(class))) +
geom_line(aes(group = ID))
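In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshape and plot with pivot_longer(), using the data from the question (the names long and p are my own):

```r
library(tidyr)
library(ggplot2)

df <- data.frame(ID = c(5820350L, 5820364L, 5820378L, 5820392L, 5820425L, 5820426L),
                 read1 = c(0.3791915, 0.3758676, 0.3885081, 0.3779945, 0.2954782, 0.3376101),
                 read2 = c(0.3747022, 0.3711775, 0.38239, 0.3729582, 0.2971604, 0.3368173),
                 read3 = c(0.3729779, 0.3695976, 0.3804273, 0.371491, 0.2973882, 0.3360203),
                 read4 = c(0.3724259, 0.3693112, 0.3797707, 0.3709072, 0.2973216, 0.3359517),
                 class = c(1L, 2L, 2L, 1L, 3L, 3L))

# one row per (sample, reading); ID and class stay as identifier columns
long <- pivot_longer(df, read1:read4, names_to = "reading", values_to = "value")

# one line per sample (group = ID), coloured by class
p <- ggplot(long, aes(x = reading, y = value, colour = factor(class), group = ID)) +
  geom_line() +
  labs(colour = "class")
p
```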

Here's a function that may do what you want:
library(RColorBrewer) # for brewer.pal()
PlotMultiCurve = function(x, classes, cols = NULL, colSet = "Set1", ...) {
x = as.matrix(x) # so that x[i, ] is a plain numeric vector
if(!is.factor(classes)) classes = as.factor(classes)
nClasses = length(levels(classes))
if(is.null(cols)) cols = brewer.pal(nClasses, colSet)
plot(1:ncol(x), x[1,], col = cols[classes[1]], type = "l",
ylim = range(x), xaxt = "n", ...)
axis(1, 1:ncol(x), 1:ncol(x))
for(i in 2:nrow(x)) {
par(new = T)
plot(1:ncol(x), x[i,], col = cols[classes[i]], type = "l",
ylim = range(x), axes = F, xlab = "", ylab = "")
}
}
It chooses colors automatically from the RColorBrewer package unless you provide them via cols. I copied your data directly into a text file and then ran the following:
# Prepare data
require(RColorBrewer)
myData = read.table("Data.2016-05-03.txt")
x = myData[,2:5]
classes = as.factor(myData$class)
# Plot into PNG file
png("Plot.2016-05-03.png", width = 1000, height = 1000, res = 300)
par(cex = 0.8)
PlotMultiCurve(x = x, classes = classes, xlab = "Read", ylab = "Response")
dev.off()

ggplot2 scatter plot with overlay of means and bidirectional SD bars

This question is a direct successor to a previous question called “ggplot scatter plot of two groups with superimposed means with X and Y error bars”. That question's answer looks like exactly what I am trying to accomplish; however, the code provided results in an error that I can't get around. I will use my data as an example here, but I have tried the original question's code as well, with the same result.
I have a data frame which looks like this:
First.run <- structure(list(Meta_ID = structure(c(15L, 22L, 31L, 17L), .Label = c("NM*624-46",
"NM*624-54", "NM*624-56", "NM*624-61", "NM*624-70", "NM624-36",
"NM624-38", "NM624-39", "NM624-40", "NM624-41", "NM624-43", "NM624-46",
"NM624-47", "NM624-51", "NM624-54 ", "NM624-56", "NM624-57",
"NM624-59", "NM624-61", "NM624-64", "NM624-70", "NM624-73", "NM624-75",
"NM624-77", "NM624-81", "NM624-82", "NM624-83", "NM624-84", "NM625-02",
"NM625-10", "NM625-11", "SM621-43", "SM621-44", "SM621-46", "SM621-47",
"SM621-48", "SM621-52", "SM621-53", "SM621-55", "SM621-56", "SM621-96",
"SM621-97", "SM622-51", "SM622-52", "SM623-14", "SM623-23", "SM623-26",
"SM623-27", "SM623-32", "SM623-33", "SM623-34", "SM623-55", "SM623-56",
"SM623-57", "SM623-58", "SM623-59", "SM623-61", "SM623-62", "SM623-64",
"SM623-65", "SM623-66", "SM623-67", "SM680-74", "SM681-16"), class = "factor"),
Region = structure(c(1L, 1L, 1L, 1L), .Label = c("N", "S"
), class = "factor"), Tissue = structure(c(1L, 2L, 1L, 1L
), .Label = c("M", "M*"), class = "factor"), Tag_Num = structure(c(41L,
48L, 57L, 43L), .Label = c("621-43", "621-44", "621-46",
"621-47", "621-48", "621-52", "621-53", "621-55", "621-56",
"621-96", "621-97", "622-51", "622-52", "623-14", "623-23",
"623-26", "623-27", "623-32", "623-33", "623-34", "623-55",
"623-56", "623-57", "623-58", "623-59", "623-61", "623-62",
"623-64", "623-65", "623-66", "623-67", "624-36", "624-38",
"624-39", "624-40", "624-41", "624-43", "624-46", "624-47",
"624-51", "624-54", "624-56", "624-57", "624-59", "624-61",
"624-64", "624-70", "624-73", "624-75", "624-77", "624-81",
"624-82", "624-83", "624-84", "625-02", "625-10", "625-11",
"680-74", "681-16"), class = "factor"), Lab_Num = structure(1:4, .Label = c("C4683",
"C4684", "C4685", "C4686", "C4687", "C4688", "C4689", "C4690",
"C4691", "C4692", "C4693", "C4694", "C4695", "C4696", "C4697",
"C4698", "C4699", "C4700", "C4701", "C4702", "C4703", "C4704",
"C4705", "C4706", "C4707", "C4708", "C4709", "C4710", "C4711",
"C4712", "C4713", "C4714", "C4715", "C4716", "C4717", "C4718",
"C4719", "C4720", "C4721", "C4722", "C4723", "C4724", "C4725",
"C4726", "C4727", "C4728", "C4729", "C4730", "C4731", "C4732",
"C4733", "C4734", "C4735", "C4736", "C4737", "C4738", "C4739",
"C4740", "C4741", "C4742", "C4743", "C4744", "C4745", "C4746",
"C4747", "C4748"), class = "factor"), C = c(46.5, 46.7, 45,
43.6), N = c(12.9, 13.7, 14.5, 13.4), C.N = c(3.6, 3.4, 3.1,
3.3), d13C = c(-19.7, -19.5, -19.4, -19.2), d15N = c(13.3,
12.4, 11.7, 11.9)), .Names = c("Meta_ID", "Region", "Tissue",
"Tag_Num", "Lab_Num", "C", "N", "C.N", "d13C", "d15N"), row.names = c(NA,
4L), class = "data.frame")
What I want to produce is a scatter plot of the raw data with an overlay of the means for each “Region”, with bidirectional error bars. To accomplish that, I use plyr to summarize my data and compute the means and SDs, and then I use ggplot2:
library(plyr)
library(ggplot2)
Basic <- ddply(First.run,.(Region),summarise,
N = length(d13C),
d13C.mean = mean(d13C),
d15N.mean = mean(d15N),
d13C.SD = sd(d13C),
d15N.SD = sd(d15N))
ggplot(data=First.run, aes(x = First.run$d13C, y = First.run$d15N))+
geom_point(aes(colour = Region))+
geom_point(data = Basic,aes(colour = Region))+
geom_errorbarh(data = Basic, aes(xmin = d13C.mean + d13C.SD, xmax = d13C.mean - d13C.SD,
y = d15N.mean, colour = Region, height = 0.01))+
geom_errorbar(data = Basic, aes(ymin = d15N.mean - d15N.SD, ymax = d15N.mean + d15N.SD,
x = d13C.mean,colour = Region))
But each time I run this code I get the same error and can’t figure out what the problem is.
Error: Aesthetics must either be length one, or the same length as the dataProblems:Region
Any help would be much appreciated.
Edit: Since my example data is taken from the head of my full dataset, it only includes samples from the "N" Region. With only this one region the code works fine, but if you use fix() to change the provided dataset so that at least one other Region is included (in my data the other Region is "S"), the error shows up. My mistake for not including some data from each Region.

I ended up changing two of the "N" Regions to "S" so I could calculate standard deviation for both groups.
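I think the problem was that some of your geoms were missing required aesthetics of their own (geom_point was missing x and y, for example). Every layer inherits the aesthetics from the top-level ggplot() call, so the layers using data = Basic picked up x = First.run$d13C and y = First.run$d15N, whose length no longer matched Basic's Region column once there was more than one Region. Mapping columns by name in aes() and giving each Basic layer its own x and y got everything working. I cleaned up a few other things while I was at it to shorten the code a bit.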
library(ggplot2)
ggplot(data = First.run, aes(x = d13C, y = d15N, colour = Region)) +
  geom_point() +
  geom_point(data = Basic, aes(x = d13C.mean, y = d15N.mean)) +
  geom_errorbarh(data = Basic, aes(xmin = d13C.mean - d13C.SD,
    xmax = d13C.mean + d13C.SD, y = d15N.mean, x = d13C.mean), height = .5) +
  geom_errorbar(data = Basic, aes(ymin = d15N.mean - d15N.SD,
    ymax = d15N.mean + d15N.SD, x = d13C.mean, y = d15N.mean), width = .01)
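plyr is retired in favour of dplyr, so here is a sketch of the same Basic summary with dplyr::summarise(). It runs on the d13C/d15N values of the four example rows, with rows 3 and 4 relabelled "S"; the answer above changed two rows to "S" without saying which, so that choice is my assumption.

```r
library(dplyr)

# d13C / d15N from the four example rows; Region of rows 3-4 set to "S" for illustration
First.run <- data.frame(Region = c("N", "N", "S", "S"),
                        d13C = c(-19.7, -19.5, -19.4, -19.2),
                        d15N = c(13.3, 12.4, 11.7, 11.9))

# one row per Region with counts, means and standard deviations
Basic <- First.run %>%
  group_by(Region) %>%
  summarise(N = n(),
            d13C.mean = mean(d13C),
            d15N.mean = mean(d15N),
            d13C.SD = sd(d13C),
            d15N.SD = sd(d15N))
Basic
```

The resulting Basic has the same column names as the plyr version, so it can be passed unchanged to the ggplot() call above.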
