Labeling a single point with ggrepel

Labeling a single point with ggrepel - r

I am trying to use geom_label_repel to add labels to a couple of data points on a plot. In this case, they happen to be outliers on box plots. I've got most of the code working, I can label the outlier, but for some reason I am getting multiple labels (equal to my sample size for the entire data set) mapped to that point. I'd like just one label for this outlier.
Example:
Here is my data:
dput(sus_dev_data)
structure(list(time_point = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("3", "8", "12"), class = "factor"),
days_to_pupation = c(135L, 142L, 143L, 155L, 149L, 159L,
153L, 171L, 9L, 67L, 53L, 49L, 72L, 67L, 55L, 64L, 60L, 122L,
53L, 51L, 49L, 53L, 50L, 56L, 44L, 47L, 60L)), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 20L, 21L, 22L, 23L, 24L, 26L, 27L, 28L, 29L, 30L), class = "data.frame")
and my code...
####################################################################################################
# Time to pupation statistical analysis
####################################################################################################
## linear model
pupation_Model=lm(sus_dev_data$days_to_pupation~sus_dev_data$time_point)
pupationANOVA=aov(pupation_Model)
summary(pupationANOVA)
# Tukey test to study each pair of treatment :
pupationTUKEY <- TukeyHSD(x=pupationANOVA, which = 'sus_dev_data$time_point',
conf.level=0.95)
## Function to generate significance labels on box plot
generate_label_df <- function(pupationTUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- pupationTUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels, reversed = TRUE)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
#generate labels using function
labels<-generate_label_df(pupationTUKEY , "sus_dev_data$time_point")
#rename columns for merging
names(labels)<-c('Letters','time_point')
# obtain letter position for y axis using means
pupationyvalue<-aggregate(.~time_point, data=sus_dev_data, max)
#merge dataframes
pupationfinal<-merge(labels,pupationyvalue)
####################################################################################################
# Time to pupation plot
####################################################################################################
# Plot of data
(pupation_plot <- ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(aes(x = 1, y = 9),
label = '1')
)

Here's a shorter example to demonstrate what is going on. Essentially, your labels are beng recycled to be the same length as the data.
df = data.frame(x=1:5, y=1:5)
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(aes(x = 1, y = 1), label = '1')
You can override this by providing new data for the ggrepel
ggplot(df, aes(x,y, color=x)) +
geom_point() +
geom_label_repel(data = data.frame(x=1, y=1), label = '1')

Based on your data, you have 3 outliers (one in each group), you can manually identify them by applying the classic definition of outliers by John Tukey (Upper: Q3+1.5*IQR and Lower: Q1-1.5*IQR) (but you are free to set your own rules to define an outlier). You can use the function quantile and IQR to get those points.
Here, I incorporated them in a sequence of pipe using dplyr package:
library(tidyverse)
Outliers <- sus_dev_data %>% group_by(time_point) %>%
mutate(Out_up = ifelse(days_to_pupation > quantile(days_to_pupation,0.75)+1.5*IQR(days_to_pupation), "Out","In"))%>%
mutate(Out_Down = ifelse(days_to_pupation < quantile(days_to_pupation,0.25)-1.5*IQR(days_to_pupation), "Out","In")) %>%
filter(Out_up == "Out" | Out_Down == "Out")
# A tibble: 3 x 4
# Groups: time_point [3]
time_point days_to_pupation Out_up Out_Down
<fct> <int> <chr> <chr>
1 3 9 In Out
2 8 122 Out In
3 12 60 Out In
As mentioned by #dww, you need to pass a new dataframe to geom_label_repel if you want your outliers to be single labeled. So, here we use the dataframe Outliers to feed the geom_label_repel function:
library(ggplot2)
library(ggrepel)
ggplot(sus_dev_data, aes(time_point, days_to_pupation)) +
#Alex_Theme +
geom_boxplot(fill = "grey80", outlier.size = 0.75) +
geom_text(data = pupationfinal, aes(x = time_point, y = days_to_pupation,
label = Letters),vjust=-2,hjust=.5, size = 4) +
#ggtitle(expression(atop("Days to pupation"))) +
labs(y = 'Days to pupation', x = 'Weeks post-hatch') +
scale_y_continuous(limits = c(0, 200)) +
scale_x_discrete(labels=c("3" = "13", "8" = "18",
"12" = "22")) +
geom_label_repel(inherit.aes = FALSE,
data = Outliers,
aes(x = time_point, y = days_to_pupation, label = "Out"))
And you get the following graph:
I hope it helps you to figure it how to label all your outliers.

Related

Order Bars in ggplot2 from high to low (when repeating words are used)

I am trying to reorder the bars in ggPlot2's barplot from the highest values to lowest values. Where the highest values are at the top of the barchart and the lowest values are at the bottom.
I've used this stack overflow post in other plots and it works with no problem.
However, ggPlot2 seems to have a problem when there are the same values in both facets. It does not produce the correct ordering in the plot.
Here is what it looks like now. As you can see, it is out of order. Idealy, I'd like the Unvax_to_Vax facet to read (from top to bottom): safe, sheep, good, dumb, stupid, scared and I'd like the Vax_to_Unvax facet to read (from top to bottom): stupid, selfish, ingnorant, dumb, unsade, foolish.
Here is the data and code to reproduce the figure.
df <- structure(list(Var1 = structure(c(8L, 7L, 4L, 1L, 9L, 2L, 5L,
10L, 3L, 1L, 8L, 6L), .Label = c("dumb", "foolish", "good", "ignorant",
"safe", "scared", "selfish", "stupid", "unsafe", "sheep"), class = "factor"),
Freq = c(101L, 94L, 47L, 33L, 29L, 24L, 27L, 22L, 18L, 15L,
15L, 11L), Percent = c(8.82096069868996, 8.20960698689956,
4.10480349344978, 2.882096069869, 2.53275109170306, 2.09606986899563,
5.54414784394251, 4.51745379876797, 3.69609856262834, 3.08008213552361,
3.08008213552361, 2.25872689938398), Group = c("Vax_to_Unvax",
"Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax", "Vax_to_Unvax",
"Vax_to_Unvax", "Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax",
"Unvax_to_Vax", "Unvax_to_Vax", "Unvax_to_Vax")), row.names = c(319L,
292L, 147L, 82L, 375L, 98L, 173L, 182L, 76L, 54L, 190L, 176L), class = "data.frame")
ggplot(df,
aes( x= reorder(Var1, Freq), y = Percent, fill = Group)) +
geom_bar(stat="identity") +
facet_wrap(Group ~. , scales = "free") +
coord_flip()
Thank you for your help.

Add age adjustment to geom_smooth

I need to include age adjustment in the geom_smooth line I am adding to my ggscatter plot.
my data looks like~
table link
structure(list(Time = c(0L, 0L, 0L, 0L, 6L, 12L, 18L, 18L, 0L,
12L, 18L, 6L), group = structure(c(1L, 1L, 2L, 2L, 1L, 3L, 3L,
3L, 3L, 4L, 4L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"),
Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8,
67.8, 68.3, 68.8), Average = c(96L, 90L, 94L, 94L, 96L, 96L,
92L, 120L, 114L, 109L, 113L, 103L)), row.names = c(NA, 12L
), class = "data.frame")
What I currently have (the 'Average" value have dependency in age..):
ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm')+facet_wrap(~groups)
What I would like to have is something like:
ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm', adjust= ~age)+facet_wrap(~groups)
With adjustment per each group mean age
Any suggestions?

Here is I think what you are after.
First, we need to fit the more complicated model because ggplot does not have a functionality for multivariable models (yet)
fit <- lm(Average ~ Time + group + Age, data = tdata)
Then we can use some functionality from the broom package to add the predictions and associated standard errors. With these in hand we can manually build the plot using the geom_line and geom_ribbon geoms
library(broom)
tdata %>%
bind_cols(augment(fit)) %>%
ggplot(aes(Time, Average))+
geom_point()+
geom_line(aes(x = Time, y = .fitted), size = 2, color = "blue")+
geom_ribbon(aes(ymin = .fitted + .se.fit*2, ymax = .fitted - .se.fit*2), alpha = .2)+
facet_wrap(~group)+
theme_bw()
Additionally, if you wanted to look at pooled vs non-pooled estimates
fit_no_pool <- lm(Average ~ Time + group + Age, data = tdata)
fit_complete_pool <- lm(Average ~ Time + Age, data = tdata)
library(broom)
tdata %>%
bind_cols(augment(fit_no_pool) %>% setNames(sprintf("no_pool%s", names(.)))) %>%
bind_cols(augment(fit_complete_pool) %>% setNames(sprintf("pool%s", names(.)))) %>%
ggplot(aes(Time, Average))+
geom_point()+
# Non-Pooled Estimates
geom_line(aes(x = Time, y = no_pool.fitted, color = "blue"), size = 2)+
geom_ribbon(aes(ymin = no_pool.fitted + no_pool.se.fit*2,
ymax = no_pool.fitted - no_pool.se.fit*2), alpha = .2)+
# Pooled Estimates
geom_line(aes(x = Time, y = pool.fitted, color = "orange"), size = 2)+
geom_ribbon(aes(ymin = pool.fitted + pool.se.fit*2,
ymax = pool.fitted - pool.se.fit*2), alpha = .2)+
facet_wrap(~group)+
scale_color_manual(name = "Regression",
labels = c("Pooled", "Non-Pooled"),
values = c("blue", "orange"))+
theme_bw()

One way to go is to run your model with Age as an additional predictor in your model. then use predict to get the predicted value with CIs. Append to your data then use ggplot to plot. I know you want to facet by group, so it might be worth putting it into your model as well. Just a thought. The steps would be the same.
df <- structure(list(Time = c(0L, 0L, 0L, 0L, 6L, 12L, 18L, 18L, 0L,
12L, 18L, 6L), group = structure(c(1L, 1L, 2L, 2L, 1L, 3L, 3L,
3L, 3L, 4L, 4L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"),
Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8,
67.8, 68.3, 68.8), Average = c(96L, 90L, 94L, 94L, 96L, 96L,
92L, 120L, 114L, 109L, 113L, 103L)), row.names = c(NA, 12L
), class = "data.frame")
#model adjusted for age
mod <- lm(Average ~ Time + Age, data = df)
#get prediction with CIS
premod <- predict(mod, interval = "predict")
#append to data
df2 <- cbind(df,premod)
#add prediction to ggplot with scatter plot
ggplot(df2) +
geom_point(aes(x=Time,y=Average)) +
geom_line(aes(x=Time, y = fit)) +
geom_ribbon(aes(x = Time,ymin = lwr, ymax = upr), alpha = .1)+
facet_wrap(~group)+
theme_bw()

How is geom_point removing rows containing missing values?

I'm unsure why none of my data points show up on the map.
Store_ID visits CRIND_CC ISCC EBITDAR top_bottom Latitude Longitude
(int) (int) (int) (int) (dbl) (chr) (fctr) (fctr)
1 92 348 14819 39013 76449.15 top 41.731373 -93.58184
2 2035 289 15584 35961 72454.42 top 41.589428 -93.80785
3 50 266 14117 27262 49775.02 top 41.559017 -93.77287
4 156 266 7797 25095 28645.95 top 41.6143 -93.834404
5 66 234 8314 18718 46325.12 top 41.6002 -93.779236
6 207 18 2159 17999 20097.99 bottom 41.636208 -93.531876
7 59 23 10547 28806 52168.07 bottom 41.56153 -93.88083
8 101 23 1469 11611 7325.45 bottom 41.20982 -93.84298
9 130 26 2670 13561 14348.98 bottom 41.614517 -93.65789
10 130 26 2670 13561 14348.98 bottom 41.6145172 -93.65789
11 24 27 17916 41721 69991.10 bottom 41.597134 -93.49263
> dput(droplevels(top_bottom))
structure(list(Store_ID = c(92L, 2035L, 50L, 156L, 66L, 207L,
59L, 101L, 130L, 130L, 24L), visits = c(348L, 289L, 266L, 266L,
234L, 18L, 23L, 23L, 26L, 26L, 27L), CRIND_CC = c(14819L, 15584L,
14117L, 7797L, 8314L, 2159L, 10547L, 1469L, 2670L, 2670L, 17916L
), ISCC = c(39013L, 35961L, 27262L, 25095L, 18718L, 17999L, 28806L,
11611L, 13561L, 13561L, 41721L), EBITDAR = c(76449.15, 72454.42,
49775.02, 28645.95, 46325.12, 20097.99, 52168.07, 7325.45, 14348.98,
14348.98, 69991.1), top_bottom = c("top", "top", "top", "top",
"top", "bottom", "bottom", "bottom", "bottom", "bottom", "bottom"
), Latitude = structure(c(11L, 4L, 2L, 7L, 6L, 10L, 3L, 1L, 8L,
9L, 5L), .Label = c("41.20982", "41.559017", "41.56153", "41.589428",
"41.597134", "41.6002", "41.6143", "41.614517", "41.6145172",
"41.636208", "41.731373"), class = "factor"), Longitude = structure(c(3L,
7L, 5L, 8L, 6L, 2L, 10L, 9L, 4L, 4L, 1L), .Label = c("-93.49263",
"-93.531876", "-93.58184", "-93.65789", "-93.77287", "-93.779236",
"-93.80785", "-93.834404", "-93.84298", "-93.88083"), class = "factor")), row.names = c(NA,
-11L), .Names = c("Store_ID", "visits", "CRIND_CC", "ISCC", "EBITDAR",
"top_bottom", "Latitude", "Longitude"), class = c("tbl_df", "tbl",
"data.frame"))
Creating the plot:
map <- qmap('Des Moines') +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
I get the warning message:
Removed 11 rows containing missing values (geom_point).
However, this works without the use of ggmap():
ggplot(top_bottom) +
geom_point(aes(x = as.numeric(Longitude), y = as.numeric(Latitude)),
colour = top_bottom, size = 3)
How do I get the points to overlay on ggmap??

You are using as.numeric() with a factor. As seen here that gives you a level number for the factor (not the number represented). Unsurprisingly, all those levels are points not on the canvas displayed for "Des Moines".
Use as.numeric(as.character(Latitude)) and as.numeric(as.character(Longitude)), as ugly as it seems.

Seeing the sample data, it seems that there is one data point which does not stay in the map area.
library(dplyr)
library(ggplot2)
library(ggmap)
### You can find lon/lat for bbox using your ggmap object.
### For instance, des1 <- ggmap(mymap1)
### str(des1)
### You could use bb2bbox() in the ggmap package to find lon/lat.
filter(top_bottom,
between(Latitude, 41.27057, 41.92782),
between(Longitude, -94.04787, -93.16897)) -> inside
setdiff(top_bottom, inside)
# Store_ID visits CRIND_CC ISCC EBITDAR top_bottom Latitude Longitude
#1 101 23 1469 11611 7325.45 bottom 41.20982 -93.84298
Since you used qmap() without specifying zoom, I do not know what zoom level you had. Let's play around a bit. In the first case, there is one data point missing; Removed 1 rows containing missing values (geom_point).
mymap1 <- get_map('Des Moines', zoom = 10)
ggmap(mymap1) +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
mymap2 <- get_map('Des Moines', zoom = 9)
ggmap(mymap2) +
geom_point(data = top_bottom, aes(x = as.numeric(Longitude),
y = as.numeric(Latitude)), colour = top_bottom, size = 3)
So the key thing, I think, is that you want to make sure you choose the right zoom level for your data set. For that, you may want to specify zoom in qmap(). I hope this will help you.
DATA
top_bottom <- structure(list(Store_ID = c(92L, 2035L, 50L, 156L, 66L, 207L,
59L, 101L, 130L, 130L, 24L), visits = c(348L, 289L, 266L, 266L,
234L, 18L, 23L, 23L, 26L, 26L, 27L), CRIND_CC = c(14819L, 15584L,
14117L, 7797L, 8314L, 2159L, 10547L, 1469L, 2670L, 2670L, 17916L
), ISCC = c(39013L, 35961L, 27262L, 25095L, 18718L, 17999L, 28806L,
11611L, 13561L, 13561L, 41721L), EBITDAR = c(76449.15, 72454.42,
49775.02, 28645.95, 46325.12, 20097.99, 52168.07, 7325.45, 14348.98,
14348.98, 69991.1), top_bottom = structure(c(2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("bottom", "top"), class = "factor"),
Latitude = c(41.731373, 41.589428, 41.559017, 41.6143, 41.6002,
41.636208, 41.56153, 41.20982, 41.614517, 41.6145172, 41.597134
), Longitude = c(-93.58184, -93.80785, -93.77287, -93.834404,
-93.779236, -93.531876, -93.88083, -93.84298, -93.65789,
-93.65789, -93.49263)), .Names = c("Store_ID", "visits",
"CRIND_CC", "ISCC", "EBITDAR", "top_bottom", "Latitude", "Longitude"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11"))

geom_smooth does not plot SE with separate data frame for local adjustment smoothing

I want to plot a "fitted" mixed models regression by plotting predicted/fitted values and a regression line for each condition with SE's included. Unfortunately, the SE is not plotted (although I do think that the grey shading of the legend indicates that it is processed).
newdat contains the data to plot the predicted values (geom_point); nd contains the data needed to plot the lines via geom_smooth().
newdat <- structure(list(v0 = c(55L, 90L, 30L, 23L, 74L, 48L, 25L, 25L,
60L, 69L, 55L, 41L, 34L, 41L, 53L, 76L, 72L, 64L, 34L, 37L, 75L,
21L, 26L, 14L, 24L, 19L), treatment = structure(c(2L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("hc", "nhc"), class = "factor"),
cse = c(2, 2, 6, 6, -4, -4, 5, 5, -4, -4, -3, -3, -2, -2,
3, 3, 2, 2, -4, -4, -7, -7, 4, 4, 2, 2), dv280 = structure(c(28.5954553607209,
29.0010807407473, 0.820231380215773, 3.35865456461513, 49.8359456217717,
24.461804847022, 6.23032836368822, 0.772936154511909, 41.8150506885472,
31.9089377911506, 25.2183508293096, 29.203718756273, 23.4674396239055,
18.5277638674685, 14.154110078194, 38.9009660948022, 22.6178239314942,
33.7517449606509, 26.9191029554161, 20.5609256858118, 55.5863616856965,
20.0644146304084, 2.85339319855906, 1.65402829619576, 10.8349022942953,
3.82267888202684), .Dim = c(26L, 1L), .Dimnames = list(c("1",
"2", "3", "4", "5", "6", "7", "8", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28"), NULL)), plo = c(18.2940632968672,
8.70682874092615, -9.57004073754051, -7.05295432875793, 35.2691733515267,
14.2687966060566, -3.12208622604343, -8.52627071371677, 30.1788256344375,
18.2506947724591, 14.8705702665525, 20.3644901882128, 15.3980231727933,
10.4235410902273, 3.52894178176158, 22.3750340692014, 7.67201979003711,
21.2004011925819, 16.8945364920955, 10.6654316626679, 39.1117560188314,
4.71896161593837, -5.54649636719771, -8.03839072475669, 3.25706574634023,
-4.38303434571468), phi = c(38.8968474245745, 49.2953327405684,
11.2105034979721, 13.7702634579882, 64.4027178920167, 34.6548130879875,
15.5827429534199, 10.0721430227406, 53.4512757426569, 45.5671808098421,
35.5661313920666, 38.0429473243332, 31.5368560750176, 26.6319866447097,
24.7792783746264, 55.4268981204031, 37.5636280729513, 46.3030887287198,
36.9436694187367, 30.4564197089558, 72.0609673525617, 35.4098676448784,
11.2532827643158, 11.3464473171482, 18.4127388422504, 12.0283921097684
), tlo = c(18.2877068225676, 8.70360144639113, -9.57634287064189,
-7.05924355454202, 35.2646774598802, 14.2623725847359, -3.12908722334489,
-8.53331173874155, 30.1731979587424, 18.2458999214011, 14.8642422705033,
20.3570830595245, 15.3899100922942, 10.4154628193239, 3.52277889155111,
22.371071031997, 7.6676378822382, 21.1951836536363, 16.8880045983016,
10.6588146263129, 39.1077806378248, 4.71469379607788, -5.55429056032973,
-8.04514630529966, 3.24842694535383, -4.39101280006747),
thi = c(38.9032038988741, 49.2985600351034, 11.2168056310734,
13.7765526837723, 64.4072137836632, 34.6612371093081, 15.5897439507213,
10.0791840477654, 53.456903418352, 45.5719756609001, 35.5724593881158,
38.0503544530215, 31.5449691555168, 26.6400649156131, 24.7854412648369,
55.4308611576074, 37.5680099807502, 46.3083062676655, 36.9502013125306,
30.4630367453107, 72.0649427335683, 35.4141354647389, 11.2610769574479,
11.3532028976912, 18.4213776432368, 12.0363705641212)), .Names = c("v0",
"treatment", "cse", "dv280", "plo", "phi", "tlo", "thi"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L), class =
"data.frame")
nd <- structure(list(v0 = c(55L, 90L, 30L, 23L, 74L, 48L, 25L, 25L,
60L, 69L, 55L, 41L, 34L, 41L, 53L, 76L, 72L, 64L, 34L, 37L, 75L,
21L, 26L, 14L, 24L, 19L), treatment = structure(c(2L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("hc", "nhc"), class = "factor"),
cse = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), dv280 = structure(c(32.0471186922315,
32.4527440722579, 11.1752213747477, 13.713644559147, 42.9326189587504,
17.5584781840008, 14.8594866924648, 9.4020944832885, 34.911724025526,
25.0056111281293, 20.0408558320436, 24.0262237590071, 20.0157762923948,
15.0761005359579, 19.3316050754599, 44.0784610920682, 26.0694872630048,
37.2034082921615, 20.0157762923948, 13.6575990227905, 43.5055400254093,
7.98359297012116, 9.75671986158034, 8.55735495921703, 14.2865656258059,
7.27434221353748), .Dim = c(26L, 1L), .Dimnames = list(c("1",
"2", "3", "4", "5", "6", "7", "8", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28"), NULL)), plo = c(22.5072031474275,
13.0500664185171, 4.05863186267882, 5.77259649335957, 28.5121051842211,
9.29736790581986, 7.2048329083037, 1.78971324244184, 24.2096188947274,
11.7540646048434, 10.363946079095, 16.7677263682142, 13.0339097497873,
7.77962797988299, 10.0943827426394, 29.0914605330986, 11.9778881852231,
25.4951576099086, 13.0339097497873, 6.63425372645495, 28.8025761975293,
-0.238742065354621, 2.26914358668319, -1.1076157441286, 6.49404176281806,
-1.31100367364568), phi = c(41.5870342370355, 51.8554217259987,
18.2918108868166, 21.6546926249345, 57.3531327332797, 25.8195884621816,
22.5141404766259, 17.0144757241352, 45.6138291563245, 38.2571576514153,
29.7177655849923, 31.2847211497999, 26.9976428350024, 22.3725730920328,
28.5688274082805, 59.0654616510377, 40.1610863407866, 48.9116589744144,
26.9976428350024, 20.6809443191261, 58.2085038532893, 16.2059280055969,
17.2442961364775, 18.2223256625627, 22.0790894887938, 15.8596881007206
), tlo = c(22.500339650347, 13.046690851483, 4.0494338564546,
5.76435239847513, 28.5075637077657, 9.28944290164532, 7.19628070358881,
1.78111359966419, 24.2035002837825, 11.7491226514857, 10.3571796788833,
16.7587079630338, 13.024534493447, 7.77065645403329, 10.0872945063297,
29.087090666816, 11.9732407398515, 25.4895645373964, 13.024534493447,
6.62493376528452, 28.7981219380453, -0.246704406534527, 2.26040076852036,
-1.1143904969611, 6.4856407045028, -1.3186296516059), thi = c(41.593897734116,
51.8587972930328, 18.3010088930408, 21.6629367198189, 57.3576742097352,
25.8275134663562, 22.5226926813408, 17.0230753669128, 45.6199477672694,
38.2620996047729, 29.7245319852039, 31.2937395549803, 27.0070180913427,
22.3815446178825, 28.5759156445901, 59.0698315173204, 40.1657337861582,
48.9172520469266, 27.0070180913427, 20.6902642802966, 58.2129581127733,
16.2138903467768, 17.2530389546403, 18.2291004153952, 22.0874905471091,
15.8673140786809)), .Names = c("v0", "treatment", "cse",
"dv280", "plo", "phi", "tlo", "thi"), row.names = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L), class = "data.frame")
p <- ggplot(data=newdat, mapping=aes(x=v0, y=dv280, colour=treatment)) +
geom_point() +
geom_smooth(data=nd, method='lm', se=TRUE) +
scale_colour_discrete(guide=guide_legend(title.position='left', title.hjust=1))
p + .mytheme + coord_cartesian(xlim=c(-20,100)) +
geom_hline(yintercept=0, colour='gray35', linetype='dashed') +
geom_vline(xintercept=0, colour='gray35', linetype='dashed')
This is all nice and dandy but unfortunately, the SE is not displayed:
I don't get why the SE is dropped (or perhaps overwritten by/with something else, given that the legend appears to indicate that the SE parameter is seen and recognized).
The code to generate newdat and nd is as follows with the main difference that for nd the value for cse is set to zero.
m <- lmer(dv280 ~ 1 + v0:treatment + cse + (0 + v0 | pp), data=dat, REML=TRUE)
newdat <- data.frame(
v0=dat$v0,
treatment=dat$treatment,
cse=dat$cse,
dv280=0)
newdat <- newdat[-c(9,10),]
mm <- model.matrix(terms(m), newdat)
newdat$dv280 <- mm %*% fixef(m)
pvar1 <- diag(mm %*% tcrossprod(vcov(m), mm))
tvar1 <- pvar1 + VarCorr(m)$pp[1]
newdat <- data.frame(newdat, plo=newdat$dv280 - 2 * sqrt(pvar1), phi=newdat$dv280 + 2 * sqrt(pvar1),
tlo=newdat$dv280 - 2 * sqrt(tvar1), thi=newdat$dv280 + 2 * sqrt(tvar1))
nd <- data.frame(
v0=dat$v0,
treatment=dat$treatment,
cse=0,
dv280=0)
nd <- nd[-c(9,10),]
mm <- model.matrix(terms(m), nd)
nd$dv280 <- mm %*% fixef(m)
pvar1 <- diag(mm %*% tcrossprod(vcov(m), mm))
tvar1 <- pvar1 + VarCorr(m)$pp[1]
nd <- data.frame(nd, plo=nd$dv280 - 2 * sqrt(pvar1), phi=nd$dv280 + 2 * sqrt(pvar1),
tlo=nd$dv280 - 2 * sqrt(tvar1), thi=nd$dv280 + 2 * sqrt(tvar1))

In your code,
p <- ggplot(data=newdat, mapping=aes(x=v0, y=dv280, colour=treatment)) +
geom_point() +
geom_smooth(data=nd, method='lm', se=TRUE) +
scale_colour_discrete(guide=guide_legend(title.position='left', title.hjust=1))
You use newdat as the dataset in geom_points(...), and nd in geom_smooth(...). The problem is that nd apparently contains the fitted values (e.g. the predictions). So the se around the "data" from nd is 0. You can see the by typing:
p <- ggplot(data=nd, mapping=aes(x=v0, y=dv280, colour=treatment)) +
geom_point() +
geom_smooth(data=nd, method='lm', se=TRUE) +
scale_colour_discrete(guide=guide_legend(title.position='left', title.hjust=1))
The points in nd lie perfectly on a line, so se=0. If the point of using nd is to calculate the lm using some subset of your data, then provide that as the dataset to geom_smooth(...). For example, the code below plots all the points, but lm is done on the subset where dv280 > 5:
p <- ggplot(data=newdat, mapping=aes(x=v0, y=dv280, colour=treatment)) +
geom_point() +
geom_smooth(data=subset(newdat,dv280>5), method='lm', se=TRUE) +
scale_colour_discrete(guide=guide_legend(title.position='left', title.hjust=1))
Edit: In response to OP's comment.
It looks like you are fitting an lme model to dat$dv280. So the data is in dat. On the other hand, newdat$dv280 is the prediction based on the model parameters and a model matrix containing v0, treatment, and cse. In addition, nd$dv280 is the prediction based on v0 and treatment, with cse=0. So it is not surprising that a plot of nd$dv280 vs. v0, grouped by treatment, will be perfectly linear. Nowhere in your ggplot code are you plotting the actual data (e.g., dat$dv280). The apparent scatter is solely due to the effect of cse. So at this point I'm not really sure what you are trying to demonstrate with this plot (??).

If I understand the question correctly, I think the following code might put you on the right track. You can explicitly state the standard errors:
model <- lm(dv280 ~ v0, data=newdat)
err <- stats::predict(model, newdata=newdat, se=TRUE)
newdat$ucl <- err$fit + 1.96 * err$se.fit
newdat$lcl <- err$fit - 1.96 * err$se.fit
qplot(v0, dv280, data=newdat, colour=treatment) +
geom_smooth(aes(ymin=lcl, ymax=ucl), data=newdat, method="lm")
This will give you the following figure:

scale_fill_manual based on another factor in ggplot2

I am trying to color-code my legend based on a broader categorization of the factor used to "fill" my geom_bar in ggplot2. My plot looks like this: which I got using this R code:
ggplot(df, aes(year, TOTALshark, fill=fishery)) + geom_bar(width=.5,stat="identity", position="dodge")+ facet_wrap(~div)
Here is a dput sample of my dataset:
> dput(smpl)
df <- structure(list(X1 = structure(c(6L, 11L, 22L, 27L, 10L, 10L,
6L, 11L, 6L, 10L, 8L, 6L, 6L, 4L, 22L, 18L, 10L, 10L, 11L, 6L
), .Label = c("AMERICAN PLAICE", "BIGEYE TUNA", "BIVALVE", "BLUEFIN TUNA",
"CAPELIN", "COD(ATL)", "CRAB(SNOW,QUEEN)", "HADDOCK", "HAGFISH(ATL)",
"HALIBUT(ATL)", "HALIBUT(GREENLAND)", "HERRING(ATL)", "JONAH CRAB (CANC.BOR.)",
"LOBSTER", "LONGHORN SCULPIN", "LUMPFISH", "MACKEREL(ATL)", "MONKFISH",
"PAND.BOR.", "PAND.MON.", "POLLOCK", "REDFISH", "SCALLOP", "SEA URCHINS",
"SEACU", "SILVER HAKE", "SWORDFISH", "WHELK", "WHITE HAKE", "WINTER FLOUNDER",
"WITCH FLOUNDER", "YELLOWFIN TUNA", "YELLOWTAIL FLOUNDER"), class = "factor"),
X2 = structure(c(2L, 2L, 8L, 5L, 5L, 5L, 5L, 8L, 5L, 5L,
5L, 2L, 5L, 5L, 8L, 2L, 5L, 5L, 2L, 2L), .Label = c("Dredge",
"Gillnet", "Hook", "Jigger", "Line", "Seine", "Trap", "Trawlb",
"Trawlm"), class = "factor"), fishery = structure(c(12L,
25L, 43L, 50L, 24L, 24L, 15L, 27L, 15L, 24L, 21L, 12L, 15L,
9L, 43L, 36L, 24L, 24L, 25L, 12L), .Label = c("AMERICAN PLAICE-Gillnet",
"AMERICAN PLAICE-Line", "AMERICAN PLAICE-Trawlb", "BIGEYE TUNA-Jigger",
"BIGEYE TUNA-Line", "BIVALVE-Dredge", "BLUEFIN TUNA-Hook",
"BLUEFIN TUNA-Jigger", "BLUEFIN TUNA-Line", "CAPELIN-Seine",
"CAPELIN-Trap", "COD(ATL)-Gillnet", "COD(ATL)-Hook", "COD(ATL)-Jigger",
"COD(ATL)-Line", "COD(ATL)-Trap", "COD(ATL)-Trawlb", "CRAB(SNOW,QUEEN)-Trap",
"CUSK-Line", "HADDOCK-Gillnet", "HADDOCK-Line", "HADDOCK-Trawlb",
"HAGFISH(ATL)-Trap", "HALIBUT(ATL)-Line", "HALIBUT(GREENLAND)-Gillnet",
"HALIBUT(GREENLAND)-Line", "HALIBUT(GREENLAND)-Trawlb", "HERRING(ATL)-Seine",
"HERRING(ATL)-Trawlm", "JONAH CRAB (CANC.BOR.)-Trap", "LOBSTER-Trap",
"LONGHORN SCULPIN-Trawlb", "LUMPFISH-Gillnet", "MACKEREL(ATL)-Seine",
"MACKEREL(ATL)-Trawlm", "MONKFISH-Gillnet", "MONKFISH-Trawlb",
"PAND.BOR.-Trawlb", "PAND.MON.-Trawlb", "POLLOCK-Gillnet",
"POLLOCK-Trawlb", "REDFISH-Gillnet", "REDFISH-Trawlb", "REDFISH-Trawlm",
"SCALLOP-Dredge", "SEA URCHINS-Dredge", "SEACU-Dredge", "SILVER HAKE-Trawlb",
"SWORDFISH-Jigger", "SWORDFISH-Line", "SWORDFISH-unk", "WHELK-Trap",
"WHITE HAKE-Gillnet", "WHITE HAKE-Line", "WINTER FLOUNDER-Gillnet",
"WINTER FLOUNDER-Trawlb", "WITCH FLOUNDER-Trawlb", "YELLOWFIN TUNA-Line",
"YELLOWTAIL FLOUNDER-Trawlb"), class = "factor"), year = c(2008L,
2008L, 2009L, 2009L, 2008L, 2009L, 2009L, 2008L, 2006L, 2007L,
2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2009L, 2009L, 2009L,
2009L), div = structure(c(6L, 19L, 2L, 4L, 5L, 10L, 3L, 19L,
9L, 10L, 3L, 9L, 6L, 4L, 3L, 9L, 6L, 11L, 7L, 9L), .Label = c("5Z",
"5Y", "4X", "4W", "4V", "4T", "4S", "4R", "3P", "3O", "3N",
"3M", "3L", "3K", "2J", "2H", "2G", "1F", "0B", "1B", "0A"
), class = "factor"), TOTALshark = c(3369.72, 12243.2, 6080.06,
316646.05, 18786.8, 6565.91, 1339771.2, 45841.03, 41329.64,
6411.86, 204980.36, 67608.78, 2617.05, 61547.64, 447349.44,
13226.4, 1362.55, 6012.23, 13152.51, 1067.92), cat = structure(c(1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("groundfish", "largepelagic",
"bivalve", "smallpelagic", "crabs/lobsters", "shrimps", "others"
), class = "factor")), .Names = c("X1", "X2", "fishery",
"year", "div", "TOTALshark", "cat"), class = "data.frame", row.names = c(70L,
278L, 500L, 554L, 242L, 245L, 131L, 315L, 106L, 224L, 194L, 60L,
115L, 37L, 489L, 385L, 249L, 244L, 284L, 75L))
I wish to have the same legend, but with a few colors based on which category "cat" variable (i.e.,, pelagic, groundfish) the fishery falls in.

Is this what you want?
library(ggplot2)
library(plyr)
library(gridExtra)
# create data that links colour per 'cat' with 'fishery'
# the 'cat' colours will be used as manually set fill colours.
# get 'cat' colours
# alt. 1: grab 'cat' colours from plot object
# create a plot with fill = fishery *and* colour = cat
g1 <- ggplot(df, aes(x = year, y = TOTALshark, fill = fishery, colour = cat)) +
geom_bar(width = 0.5, stat = "identity", position = "dodge") +
facet_wrap(~ div)
g1
# grab 'cat' colours for each 'fishery' from plot object
# to be used in manual fill
cat_cols <- unique(ggplot_build(g1)[["data"]][[1]]$colour)
# unique 'cat'
cat <- unique(df$cat)
# create data frame with one colour per 'cat'
df2 <- data.frame(cat = cat, cat_cols)
df2
# alt 2: create your own 'cat' colours
# number of unique 'cat'
n <- length(cats)
# select one colour per 'cat', from e.g. brewer_pal or other palette tools
cat_cols <- brewer_pal(type = "qual")(n)
# cat_cols <- rainbow(n)
# create data frame with one colour per 'cat'
df2 <- data.frame(cat, cat_cols)
df2
# select unique 'fishery' and 'cat' combinations
# in the order they show up in the legend, i.e. ordered ('arranged') by fishery
df3 <- unique(arrange(df[, c("fishery", "cat")], fishery))
df3
# add 'cat' colours to 'fishery'
# use 'join' to keep order
df3 <- join(df3, df2)
df3
# plot with fill by 'fishery' creates a fill scale by fishery,
# but colours are set manually using scale_fill_manual and the 'cat' colours from above
g2 <- ggplot(df, aes(x = year, y = TOTALshark, fill = fishery)) +
geom_bar(width = 0.5, stat = "identity", position = "dodge") +
facet_wrap(~ div, nrow = 5) +
scale_fill_manual(values = as.character(df3$cat_cols))
g2
# create plot with both 'fishery' and 'cat' legend.
# extract 'fisheries' legend
tmp <- ggplot_gtable(ggplot_build(g2))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend_fish <- tmp$grobs[[leg]]
# create a non-sense plot just to get a 'fill = cat' legend
g3 <- ggplot(df, aes(x = year, y = TOTALshark, fill = cat)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = as.character(df3$cat_cols))
# extract 'cat' legend
tmp <- ggplot_gtable(ggplot_build(g3))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend_cat <- tmp$grobs[[leg]]
# arrange plot and legends
library(gridExtra)
# quick and dirty with grid.arrange
# in the first column, put the plot (g2) without legend (removed using the 'theme' code)
# put the two legends in the second column
grid.arrange(g2 + theme(legend.position = "none"),
arrangeGrob(legend_fish, legend_cat), ncol = 2)
# arrange with viewports
# define plotting regions (viewports)
grid.newpage()
vp_plot <- viewport(x = 0.25, y = 0.5,
width = 0.5, height = 1)
vp_legend <- viewport(x = 0.75, y = 0.7,
width = 0.5, height = 0.5)
vp_sublegend <- viewport(x = 0.7, y = 0.25,
width = 0.5, height = 0.3)
print(g2 + theme(legend.position = "none"), vp = vp_plot)
upViewport(0)
pushViewport(vp_legend)
grid.draw(legend_fish)
upViewport(0)
pushViewport(vp_sublegend)
grid.draw(legend_cat)
See also #mnel's answer here for replacing values in the plot object. It might be worth trying here as well. You may also check gtable methods for arranging grobs.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Labeling a single point with ggrepel - r

Related

Order Bars in ggplot2 from high to low (when repeating words are used)

Add age adjustment to geom_smooth

How is geom_point removing rows containing missing values?

geom_smooth does not plot SE with separate data frame for local adjustment smoothing

scale_fill_manual based on another factor in ggplot2

Categories

Resources