How to use mse function - babynames example - r

So I am using the 'babynames' package in rstudio and am trying to get the 35 most common unisex names. I am trying to rank the names based on the mean squared error from the 50-50 line (however, I am not sure how to do this). Any help would be greatly appreciated! (Also below my code I will put the 'reference code' we were given that includes what the top 35 unisex names are)
Reference Code:
actual_names <- c("Jessie", "Marion", "Jackie", "Alva", "Ollie",
"Jody", "Cleo", "Kerry", "Frankie", "Guadalupe",
"Carey", "Tommie", "Angel", "Hollis", "Sammie",
"Jamie", "Kris", "Robbie", "Tracy", "Merrill",
"Noel", "Rene", "Johnnie", "Ariel", "Jan",
"Devon", "Cruz", "Michel", "Gale", "Robin",
"Dorian", "Casey", "Dana", "Kim", "Shannon")

I think there are a few ways to answer the question as posed, since there's a tradeoff between "most popular" and "most unisex."
Here's a way to prep the data to collect some stats for each name.
library(babynames)
library(tidyverse)
babynames_share <-
babynames %>%
filter(year >= 1930, year <= 2012) %>%
count(name, sex, wt = n) %>%
spread(sex, n, fill = 0) %>%
mutate(Total = F + M,
F_share = F / Total,
MS_50 = ((F_share-0.5)^2 +
(0.5-F_share)^2) / 2)
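To make the MS_50 column concrete, here is a minimal base-R sketch on made-up counts (the `toy` data frame and its numbers are assumptions for illustration, not values taken from babynames). It averages the squared distance of the female and male shares from 0.5; because the male share is 1 - F_share, this reduces to (F_share - 0.5)^2.

```r
# Hypothetical counts, chosen only to illustrate the arithmetic
toy <- data.frame(
  name = c("Casey", "Jessie", "Mary"),
  F    = c(411, 492, 990),
  M    = c(589, 508, 10)
)

toy$Total   <- toy$F + toy$M
toy$F_share <- toy$F / toy$Total
toy$M_share <- toy$M / toy$Total

# Mean squared error of the two shares from the 50-50 line
toy$MS_50 <- ((toy$F_share - 0.5)^2 + (toy$M_share - 0.5)^2) / 2

# Same quantity in simplified form, since M_share = 1 - F_share
toy$MS_50_simple <- (toy$F_share - 0.5)^2

print(toy)
```

A name split exactly 50-50 scores 0, and a name used by only one sex scores 0.25, the maximum.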
It looks like around 100 names have perfect gender parity -- but they're all quite uncommon:
babynames_share %>%
filter(F == M) %>%
arrange(-Total)
# A tibble: 100 x 6
   name         F     M Total F_share MS_50
   <chr>    <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 Tyjae      157   157   314     0.5     0
 2 Callaway   128   128   256     0.5     0
 3 Avyn       100   100   200     0.5     0
 4 Zarin       92    92   184     0.5     0
 5 Tkai        72    72   144     0.5     0
 6 Rayen       57    57   114     0.5     0
 7 Meco        43    43    86     0.5     0
 8 Pele        40    40    80     0.5     0
 9 Nijay       35    35    70     0.5     0
10 Mako        27    27    54     0.5     0
# … with 90 more rows
Or we might pick some arbitrary threshold for what counts as unisex. In the example above, I've calculated the mean squared error for the female and male percent shares. We can plot that to see very gendered names at the top (MS_50 tops out at 0.25 by this measure) and unisex names toward the bottom. But it isn't obvious to me how far down we should go to count a name as unisex. Is Casey, which is 58.9% male and therefore has a squared error of 8.9%^2 = 0.79%, unisex? Or do we need to go further, to Jessie, which is 50.8% male?
babynames_share %>%
ggplot(data = .,
aes(Total, MS_50, label = name)) +
geom_point(size = 0.2, alpha = 0.1, color = "gray30") +
geom_text(data = . %>% filter(Total > 10000),
check_overlap = TRUE, size = 3) +
scale_x_log10(breaks = c(10^(1:7)),
labels = scales::comma)
At the "Casey" level of gender parity, here are the top 35:
unisex_names <- babynames_share %>%
filter(MS_50 <= 0.00796) %>%
arrange(-Total) %>%
top_n(35, wt = Total)
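The tradeoff mentioned at the start can also be approached from the other side: set a popularity floor first, then keep the names closest to 50-50. The sketch below is an alternative to the threshold approach above, not the method used in this answer; the `toy_share` counts and the floor of 1000 are made-up assumptions.

```r
library(dplyr)

# Hypothetical totals per name, not real babynames counts
toy_share <- data.frame(
  name = c("Ashby", "Blair", "Corey", "Drew", "Emery"),
  F    = c(50, 4000, 900, 300, 5200),
  M    = c(50, 6000, 9100, 9700, 4800)
) %>%
  mutate(Total   = F + M,
         F_share = F / Total,
         MS_50   = (F_share - 0.5)^2)

# Keep reasonably common names, then take the ones nearest the 50-50 line
most_even <- toy_share %>%
  filter(Total >= 1000) %>%
  slice_min(MS_50, n = 2)

print(most_even)
```

With the real data, `n = 35` and a floor chosen by inspection would replace the toy values; which floor is appropriate is a judgment call.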
It's also interesting to see the whole spectrum of names, with most male on the bottom, female on the top, and unisex in the middle:
babynames_share %>%
ggplot(data = .,
aes(Total, F_share, label = name)) +
geom_point(size = 0.2, alpha = 0.1, color = "gray30") +
geom_text(data = . %>% filter(Total > 10000),
check_overlap = TRUE, size = 2) +
scale_x_log10(breaks = c(10^(1:7)),
labels = scales::comma)

Related

How to apply a filter ( dplyr) in ggplot2?

I'm trying to fill some specific areas of my geographic map with the purple color and I have no problem in doing that. This is the script I'm using:
right_join(prov2022, database, by = "COD_PROV") %>%
ggplot(aes(fill = `wage` > 500 & `wage` <=1000))+
geom_sf() +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_manual(values = c('white', 'purple'))
But now I want to apply a filter in my ggplot2 picture.
I need to fill the areas of the map, but only those that have the value 13 in the column(variable) COD_REG.
I have added filter( COD_REG == 13) but it doesn't work
right_join(prov2022, database, by = "COD_PROV") %>%
filter( COD_REG == 13)
ggplot(aes(fill = `wage` > 500 & `wage` <=1000))+
geom_sf() +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_manual(values = c('white', 'purple'))
R responds with:
> right_join(prov2022, database, by = "COD_PROV") %>%
+ filter( COD_REG == 13)
Error in `stopifnot()`:
! Problem while computing `..1 = COD_REG == 13`.
✖ Input `..1` must be of size 106 or 1, not size 107.
Run `rlang::last_error()` to see where the error occurred.
My database has 106 obs and 13 variables and it is like this
COD_REG COD_PROV wage
1 91 530
1 92 520
1 93 510
2 97 500
2 98 505
2 99 501
13 102 700
13 103 800
13 109 900
Where is the mistake?
Why does R report << ✖ Input ..1 must be of size 106 or 1, not size 107. >>?
How can I solve this?
I think you might have another filter function that shadows the dplyr one. You also forgot to add a %>% after your filter. Could you try this:
right_join(prov2022, database, by = "COD_PROV") %>%
dplyr::filter(COD_REG == 13) %>%
ggplot(aes(fill = `wage` > 500 & `wage` <=1000))+
geom_sf() +
theme_void() +
theme(legend.title=element_blank())+
scale_fill_manual(values = c('white', 'purple'))
The code runs as expected with the sample data.
library(tidyverse)
data <- "COD_REG COD_PROV wage
1 91 530
1 92 520
1 93 510
2 97 500
2 98 505
2 99 501
13 102 700
13 103 800
13 109 900"
read_table(data) |>
filter(COD_REG == 13)
#> # A tibble: 3 × 3
#> COD_REG COD_PROV wage
#> <dbl> <dbl> <dbl>
#> 1 13 102 700
#> 2 13 103 800
#> 3 13 109 900
Created on 2023-02-03 with reprex v2.0.2
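Two quick checks can help narrow down an error like the one in the question. This is a sketch on a cut-down, hypothetical copy of the sample data: the first line reports which package the visible `filter` comes from, and the `.data$COD_REG` form makes dplyr look up the column in the data frame only, so a same-named object elsewhere in the session (one possible explanation for the 106 vs 107 size mismatch, though that is a guess) cannot be picked up by accident.

```r
library(dplyr)

# Hypothetical subset of the sample data from the question
database <- data.frame(
  COD_REG  = c(1, 1, 2, 13, 13),
  COD_PROV = c(91, 92, 97, 102, 103),
  wage     = c(530, 520, 500, 700, 800)
)

# Which package supplies the `filter` that is currently visible?
which_filter <- environmentName(environment(filter))
print(which_filter)

# .data$COD_REG refers to the column only; it errors if the column is missing
region13 <- database %>%
  dplyr::filter(.data$COD_REG == 13)

print(region13)
```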

Produce multiple stratified frequency tables using a for-loop in R

I am trying to produce multiple frequency tables that are stratified by multiple independent variables. I can get this to work for one variable and one stratification variable, but my for-loop is broken.
library(tidyverse)
# Create example dataframe of survey data
df <- data.frame(
var1 = sample(1:7, 1000, replace = TRUE),
var2 = sample(1:7, 1000, replace = TRUE),
var3 = sample(1:7, 1000, replace = TRUE),
var4 = sample(1:7, 1000, replace = TRUE),
var5 = sample(1:7, 1000, replace = TRUE),
var6 = sample(1:7, 1000, replace = TRUE),
strat1 = sample(c("A", "B", "C"), 1000, replace = TRUE),
strat2 = sample(c("X", "Y"), 1000, replace = TRUE),
strat3 = sample(c("True", "False"), 1000, replace = TRUE)
)
Example that works for one variable and one stratification variable. I want to convert this code into a for loop:
temp_df <- df %>% count(var1)
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
strat_df <- temp_df %>%
left_join((df %>% group_by(var1, strat1) %>% count(var1) %>% pivot_wider(names_from = strat1, values_from = n)), by = "var1")
for(k in c("A","B","C")){
strat_df[paste0(k, "_pct")] <- (strat_df[[k]] / temp_df$n) * 100
}
I want this same sort of output, but with added columns for count and _pct of the other two stratification variables.
I've tried using the following for loop, but it's only giving me one row per variable and it only produces two columns for each strat variable, whereas the output I'm looking for would have a raw count and column percentage column for each category within a stratification variable. Since there are 3 strat vars, two having two categories and one having three categories, my desired output would have 13 columns including the column for "v#", "n", and "percent".
# Create a list of the variables of interest
variables <- c("var1", "var2", "var3", "var4", "var5", "var6")
# Create a list of the stratification variables
strats <- c("strat1", "strat2", "strat3")
# Create a loop that runs through each variable
for(i in variables){
# Create a frequency table for the current variable
temp_df <- df %>% count(!! i)
# Add a column for the percent of responses within each response category
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
# Add a column for the raw count for each category of the stratification variables
for(j in strats){
temp_df <- temp_df %>% group_by(!!i) %>% mutate( !!j := n() )
}
# Add a column for the percent of the stratification variable category within the response category
for(j in strats){
temp_df[paste0(j, "_pct")] <- (temp_df[[j]] / temp_df$n) * 100
}
assign(paste0(i,"_df"), temp_df)
}
This is what I would like my output to look like:
UPDATE:
Came up with a solution that outputs what I need:
for(i in variables){
j = sym(i)
temp_df <- df %>% count(!!j)
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
strat_df <- temp_df %>%
left_join((df %>% group_by(!!j, strat1) %>% count(!!j) %>% pivot_wider(names_from = strat1, values_from = n)), by = i) %>%
left_join((df %>% group_by(!!j, strat2) %>% count(!!j) %>% pivot_wider(names_from = strat2, values_from = n)), by = i) %>%
left_join((df %>% group_by(!!j, strat3) %>% count(!!j) %>% pivot_wider(names_from = strat3, values_from = n)), by = i)
for(k in c("A","B","C","X","Y","True","False")){
strat_df[paste0(k, "_pct")] <- (strat_df[[k]] / temp_df$n) * 100
}
assign(paste0(i,"_df"), strat_df)
}
Because the variables being looped over are strings, either convert each one to a symbol and unquote it with !!, or use across(all_of()):
for(i in variables){
# Create a frequency table for the current variable
temp_df <- df %>% count(across(all_of(i)))
# Add a column for the percent of responses within each response category
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
# Add a column for the raw count for each category of the stratification variables
strat_df <- temp_df %>%
left_join((df %>% group_by(across(all_of(c(i, "strat1")))) %>%
count(across(all_of(i))) %>%
pivot_wider(names_from = strat1, values_from = n)), by = i) %>%
left_join((df %>% group_by(across(all_of(c(i, "strat2")))) %>%
count(across(all_of(i))) %>%
pivot_wider(names_from = strat2, values_from = n)), by = i) %>%
left_join((df %>% group_by(across(all_of(c(i, "strat3")))) %>%
count(across(all_of(i))) %>%
pivot_wider(names_from = strat3, values_from = n)), by = i)
# Add a column for the percent of the stratification variable category within the response category
for(j in c("A","B","C","X","Y","True","False")){
strat_df[paste0(j, "_pct")] <- (strat_df[[j]] / temp_df$n) * 100
}
assign(paste0(i,"_df"), strat_df)
}
-output
> var1_df
var1 n percent A B C X Y False True A_pct B_pct C_pct X_pct Y_pct True_pct False_pct
1 1 121 12.1 36 42 43 59 62 63 58 29.75207 34.71074 35.53719 48.76033 51.23967 47.93388 52.06612
2 2 144 14.4 51 42 51 84 60 69 75 35.41667 29.16667 35.41667 58.33333 41.66667 52.08333 47.91667
3 3 147 14.7 41 39 67 60 87 73 74 27.89116 26.53061 45.57823 40.81633 59.18367 50.34014 49.65986
4 4 146 14.6 52 45 49 74 72 79 67 35.61644 30.82192 33.56164 50.68493 49.31507 45.89041 54.10959
5 5 165 16.5 51 57 57 86 79 76 89 30.90909 34.54545 34.54545 52.12121 47.87879 53.93939 46.06061
6 6 133 13.3 48 51 34 64 69 68 65 36.09023 38.34586 25.56391 48.12030 51.87970 48.87218 51.12782
7 7 144 14.4 53 44 47 67 77 73 71 36.80556 30.55556 32.63889 46.52778 53.47222 49.30556 50.69444
> var2_df
var2 n percent A B C X Y False True A_pct B_pct C_pct X_pct Y_pct True_pct False_pct
1 1 152 15.2 51 53 48 79 73 70 82 33.55263 34.86842 31.57895 51.97368 48.02632 53.94737 46.05263
2 2 147 14.7 49 46 52 73 74 55 92 33.33333 31.29252 35.37415 49.65986 50.34014 62.58503 37.41497
3 3 142 14.2 46 45 51 72 70 79 63 32.39437 31.69014 35.91549 50.70423 49.29577 44.36620 55.63380
4 4 147 14.7 50 48 49 74 73 72 75 34.01361 32.65306 33.33333 50.34014 49.65986 51.02041 48.97959
5 5 128 12.8 45 43 40 59 69 72 56 35.15625 33.59375 31.25000 46.09375 53.90625 43.75000 56.25000
6 6 152 15.2 37 52 63 74 78 83 69 24.34211 34.21053 41.44737 48.68421 51.31579 45.39474 54.60526
7 7 132 13.2 54 33 45 63 69 70 62 40.90909 25.00000 34.09091 47.72727 52.27273 46.96970 53.03030
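The same idea can be wrapped in a function that returns a named list instead of creating objects with assign(). This is only a sketch under assumptions: `freq_table` is a hypothetical helper name, the example data are smaller than in the question, and the stratum levels are read from the data rather than hard-coded, so the `_pct` columns appear next to their counts rather than at the end.

```r
library(dplyr)
library(tidyr)

set.seed(42)
df <- data.frame(
  var1   = sample(1:7, 200, replace = TRUE),
  var2   = sample(1:7, 200, replace = TRUE),
  strat1 = sample(c("A", "B", "C"), 200, replace = TRUE),
  strat2 = sample(c("X", "Y"), 200, replace = TRUE)
)

# Hypothetical helper: one frequency table for `variable`, with counts and
# within-row percentages for every level of every stratification variable
freq_table <- function(data, variable, strats) {
  out <- data %>%
    count(across(all_of(variable))) %>%
    mutate(percent = n / sum(n) * 100)
  for (s in strats) {
    wide <- data %>%
      count(across(all_of(c(variable, s)))) %>%
      pivot_wider(names_from = all_of(s), values_from = n, values_fill = 0L)
    out <- left_join(out, wide, by = variable)
    # Percentage of each response category that falls in each stratum level
    for (lvl in setdiff(names(wide), variable)) {
      out[[paste0(lvl, "_pct")]] <- out[[lvl]] / out$n * 100
    }
  }
  out
}

variables <- c("var1", "var2")
tables <- lapply(setNames(variables, variables), freq_table,
                 data = df, strats = c("strat1", "strat2"))

print(tables$var1)
```

Each table is then available as `tables$var1`, `tables$var2`, and so on.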

ggplot boxplots with 2 y axes

I have been looking everywhere to find out how to ggplot boxplots with 2 y axes.
This is what I want the plot to look like:
[Image: boxplot]
Example data:
Sample Tumor Score_1 Score_2
1 A 100 -20
2 B 80 -10
3 C 5 -5
4 C 6 -7
5 C 80 -8
6 C 70 -30
7 C 80 -5
8 C 90 -6
9 A 150 -8
10 B 1 -10
11 B 2 -10
12 B 4 -9
13 B 5 -7
14 B 8 -6
15 B 10 -4
16 B 12 -8
17 B 7 -10
18 B 6 -11
19 C 70 -15
20 C 90 -4
21 C 95 -3
22 C 120 -6
23 C 130 -9
24 C 50 -5
25 C 113 -10
26 C 100 -2
27 C 90 -1
28 C 50 -11
29 C 80 -15
30 A 200 -7
31 A 200 -4
32 A 180 -3
33 A 160 -9
34 A 107 -15
35 A 115 -11
36 A 80 -12
37 A 90 -14
38 A 130 -13
39 A 140 -9
40 A 120 -10
myboxplot <- read.csv("Example.csv")
#Set up labels
ylim.prim <- c(0, 500)
ylim.sec <- c(-35, 0)
b <- diff(ylim.prim)/diff(ylim.sec)
a <- b*(ylim.prim[1] - ylim.sec[1])
myboxplot %>%
pivot_longer(cols = c(Score_1, Score_2)) %>%
mutate(name = factor(name, levels = c("Score_1", "Score_2"))) %>%
ggplot(aes(x = Tumor)) +
geom_boxplot(aes(y = value, fill = name)) +
scale_y_continuous(name ="Score 1", sec.axis = sec_axis(~ ((. - a)/b), name = expression("Score 2"))) +
scale_x_discrete(name = "Tumor") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(plot.title = element_text(size = 14, face = "bold"),
text = element_text(size = 12),
#axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 11),
legend.position = "right") +
scale_fill_manual(values = wes_palette("GrandBudapest2"))
I do get the plot in the image (linked above). The problem is that my second set of data (the purple boxplots, "Score 2") is aligned with the first y axis rather than the second. Since that data is much smaller, with a range of -35 to 0, you can't see the difference between the tumor types. Does anyone have any ideas how to change this?
Thank you in advance!
I think the plot you are requesting might be misleading. Instead, how about a facet?
library(tidyverse)
data %>%
pivot_longer(-c("Sample","Tumor"), names_to = "Score") %>%
ggplot(aes( x= Tumor, y = value, fill = Score)) +
geom_boxplot() +
facet_wrap(.~Score, scales = "free")
Or as @NickCox suggests:
data %>%
pivot_longer(-c("Sample","Tumor"), names_to = "Score") %>%
group_by(Score,Tumor) %>%
arrange(value) %>%
mutate(xcoord = seq(-0.25,0.25,length.out = n()),
Tumor = factor(Tumor)) %>%
ggplot(aes( x= Tumor, y = value, fill = Score)) +
geom_boxplot(outlier.shape = NA, coef = 0) +
geom_point(aes(x = xcoord + as.integer(Tumor))) +
facet_wrap(.~Score, scales = "free")
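If a second axis is still wanted despite the caveat above, note that sec_axis() only relabels the axis; it does not move the data. The Score_2 values have to be rescaled onto the primary scale before plotting, and the secondary axis then applies the inverse transformation. The sketch below uses made-up data and hypothetical helper names (`to_primary`, `to_secondary`), with axis limits chosen for the toy values.

```r
library(ggplot2)

# Hypothetical toy data in the same shape as the question's example
scores <- data.frame(
  Tumor   = rep(c("A", "B", "C"), each = 4),
  Score_1 = c(100, 150, 200, 180, 80, 1, 2, 4, 5, 6, 80, 70),
  Score_2 = c(-20, -8, -7, -4, -10, -10, -10, -9, -5, -7, -8, -30)
)

ylim.prim <- c(0, 200)   # range shown on the left axis (assumed)
ylim.sec  <- c(-35, 0)   # range shown on the right axis (assumed)

# Linear map between the two ranges: primary = a + b * secondary
b <- diff(ylim.prim) / diff(ylim.sec)
a <- ylim.prim[1] - b * ylim.sec[1]
to_primary   <- function(y) a + b * y
to_secondary <- function(y) (y - a) / b

# Long format, with Score_2 already rescaled onto the primary axis
long <- rbind(
  data.frame(Tumor = scores$Tumor, Score = "Score_1",
             value = scores$Score_1),
  data.frame(Tumor = scores$Tumor, Score = "Score_2",
             value = to_primary(scores$Score_2))
)

p <- ggplot(long, aes(x = Tumor, y = value, fill = Score)) +
  geom_boxplot() +
  scale_y_continuous(name = "Score 1",
                     sec.axis = sec_axis(~ to_secondary(.), name = "Score 2"))
p
```

The facet version remains easier to read, because a shared panel invites comparisons between boxes that are on unrelated scales.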
[This was posted when the question was on Cross Validated]
Box plots I find oversold whenever, as is usual, there is scope to show more detail. Here is one of several possibilities, a quantile-box plot in Parzen's sense in which for each group a standard box showing median and quartiles is superimposed on a quantile plot, in which the implicit horizontal axis is rank order. The detail that apart from some small integers many values are just multiples of 10 is of interest and should help a little in interpretation.
This plot doesn't use R. People who use R should find doing something similar or better to be trivial -- and those whose favourite software is different should be able to say the same. If not, you need new favourite software.

How to use geom_errorbar with facet_wrap in ggplot2

I am facing a problem adding error bars to my plots. I have a data frame like this:
> str(bank1)
'data.frame': 24 obs. of 4 variables:
$ site : Factor w/ 12 levels "BED","BEU","EB",..: 8 9 10 3 11 1 6 7 5 4 ...
$ canopy : Factor w/ 3 levels "M_Closed","M_Open",..: 3 3 3 3 2 2 2 2 1 1 ...
$ variable: Factor w/ 2 levels "depth5","depth10": 1 1 1 1 1 1 1 1 1 1 ...
$ value : int 200 319 103 437 33 51 165 38 26 29 ...
I plot it like this:
gs1 <- ggplot(bank1, aes(x = canopy, y= value , fill = variable)) +
geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
This gives a plot like this:
My problem is when I want to add the error bars (standard deviation), the code does not run. I use this code:
bank2 <- bank1
bank2.mean = ddply(bank2, .(canopy, variable), summarize,
plant.mean = mean(value), plant.sd = sd(value))
gs1 <- ggplot(bank1, aes(x = canopy, y= value , fill = variable)) +
geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
geom_errorbar(aes(ymin=plant.mean-plant.sd, ymax = plant.mean +
plant.sd), width = 0.5)+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
I searched for help here, here, here and here but I did not understand how to proceed.
Kindly help!
Here I reproduce an example:
> set.seed(1)
> Data1 <- data.frame(
+ site= c("KOA","KOB","KOO","EB","PNS","BED","KB","KER","KAU","KAD","RO","BEU"),
+ variable = sample(c("depth5", "depth10"), 12, replace = TRUE),
+ canopy=sample(c("open", "M_open", "M_closed"), 12, replace = TRUE),
+ value=sample(c(100,500,50,20,112,200,230,250,300,150,160,400))
+ )
> Data1
site variable canopy value
1 KOA depth5 M_closed 20
2 KOB depth5 M_open 112
3 KOO depth10 M_closed 100
4 EB depth10 M_open 400
5 PNS depth5 M_closed 230
6 BED depth10 M_closed 50
7 KB depth10 M_open 250
8 KER depth10 M_closed 200
9 KAU depth10 M_closed 500
10 KAD depth5 open 150
11 RO depth5 M_open 300
12 BEU depth5 open 160
> gs1 <- ggplot(Data1, aes(x = canopy, y= value , fill = variable)) +
+ geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
+ xlab("Canopy cover")+ylab("Seed Bank")+
+ facet_wrap(~variable,nrow=1)
> gs1
> Data2 <- Data1
> data2.mean = ddply(Data2, .(canopy, variable), summarize,
+ plant.mean = mean(value), plant.sd = sd(value))
> gs1 <- ggplot(Data2, aes(x = canopy, y= value , fill = variable)) +
+ geom_bar(stat='identity', position = 'dodge', fill = 'darkgray')+
+ geom_errorbar(aes(ymin=plant.mean-plant.sd, ymax = plant.mean +
+ plant.sd), width = 0.5)+
+ xlab("Canopy cover")+ylab("Seed Bank")+
+ facet_wrap(~variable,nrow=1)
> gs1
Error in FUN(X[[i]], ...) : object 'plant.mean' not found
I get the same error with my original data
Here is the solution to my problem, working the way I wanted. You need these packages:
library(ggplot2)
library(dplyr)
My data frame bank1 was piped into a new data frame cleandata to calculate the mean, sd and se and summarize the results
cleandata <- bank1 %>%
group_by(canopy, variable) %>%
summarise(mean.value = mean(value),
sd.value = sd(value), count = n(),
se.mean = sd.value/sqrt(count))
The summarized results look like this:
> head(cleandata)
# A tibble: 6 x 6
# Groups: canopy [3]
canopy variable mean.value sd.value count se.mean
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Open depth5 265. 145. 4 72.4
2 Open depth10 20.5 12.8 4 6.41
3 M_Open depth5 71.8 62.6 4 31.3
4 M_Open depth10 6.5 4.20 4 2.10
5 M_Closed depth5 20 8.98 4 4.49
6 M_Closed depth10 0.5 1 4 0.5
Finally, the plotting was done with this piece of code:
gs1 <- ggplot(cleandata, aes(x=canopy, y=mean.value)) +
geom_bar(stat = "identity", color = "black", position = position_dodge())+
geom_errorbar(aes(ymin = mean.value - sd.value, ymax = mean.value + sd.value),
width=0.2)+
xlab("Canopy cover")+ylab("Seed Bank")+
facet_wrap(~variable,nrow=1)
gs1
This gives a graph with error bars (standard deviation) as given below
Problem solved! Cheers!
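The original error ("object 'plant.mean' not found") arose because the plot was built from the raw data, which has no plant.mean column; the summary lived in a separate data frame. As an alternative to summarising first, ggplot2 can compute the mean and SD per group itself with stat_summary(). This is a sketch on made-up data (the `bank` data frame below is an assumption, not the original bank1).

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 4 sites per canopy class at each of two depths
bank <- data.frame(
  canopy   = rep(c("Open", "M_Open", "M_Closed"), times = 8),
  variable = rep(c("depth5", "depth10"), each = 12),
  value    = sample(20:500, 24)
)

p <- ggplot(bank, aes(x = canopy, y = value)) +
  # Bar height is the group mean
  stat_summary(fun = mean, geom = "col", fill = "darkgray") +
  # Error bars span mean - SD to mean + SD
  stat_summary(fun = mean,
               fun.min = function(v) mean(v) - sd(v),
               fun.max = function(v) mean(v) + sd(v),
               geom = "errorbar", width = 0.2) +
  xlab("Canopy cover") + ylab("Seed Bank") +
  facet_wrap(~variable, nrow = 1)
p
```

Summarising first, as in the answer above, has the advantage that the numbers behind the bars can be inspected directly.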

plotting subset of grouped data in ggplot2

I am trying to make a plot that has mean (+/- SD) number (ID = total count per row) of Explorations on the y-axis and then grouped by both pp and type on the x-axis.
That is, I want to generate something that looks like this (hand-drawn and made up graph):
Here is how the dataframe is structured (available here).
pp crossingtype km type ID
0 Complete 80.0 DCC 10
1 Complete 80.0 DCC 4
0 Exploration 80.0 DCC 49
1 Exploration 80.0 DCC 4
0 Complete 144.0 DWC 235
1 Complete 144.0 DWC 22
0 Exploration 144.0 DWC 238
1 Exploration 144.0 DWC 18
1 Exploration 84.0 PC 40
0 Complete 107.0 PC 43
1 Complete 107.0 PC 22
0 Exploration 107.0 PC 389
I want to use ggplot2 and have tried this code:
ggplot(expMean, aes(x=as.factor(pp), y=crossingtype, color=factor(type),group=factor(type))) +
geom_point(shape=16,cex=3) +
geom_smooth(method=lm) +
facet_grid(.~type)
But it gives me this figure (which is not what I am trying to make).
How can I use ggplot2 to make the first plot?
You can do the statistical transformations within ggplot(), but my preference is to process the data first, then plot the results.
library(tidyverse)
expMean %>%
filter(crossingtype == "Exploration") %>%
group_by(type, pp) %>%
summarise(Mean = mean(ID), SD = sd(ID)) %>%
ggplot(aes(factor(pp), Mean)) +
geom_pointrange(aes(ymax = Mean + SD,
ymin = Mean - SD)) +
facet_wrap(~type) +
theme_bw()
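One caveat when trying the summarise step on the excerpt printed in the question rather than on the full linked dataset: the excerpt has only one Exploration row per type and pp combination, so each SD is NA and no error bars would be drawn. The base-R sketch below reproduces the summary on the excerpt to show this; the object names `excerpt`, `explor` and `stats` are assumptions.

```r
# The excerpt from the question, read as a data frame
excerpt <- read.table(header = TRUE, text = "
pp crossingtype km type ID
0 Complete 80.0 DCC 10
1 Complete 80.0 DCC 4
0 Exploration 80.0 DCC 49
1 Exploration 80.0 DCC 4
0 Complete 144.0 DWC 235
1 Complete 144.0 DWC 22
0 Exploration 144.0 DWC 238
1 Exploration 144.0 DWC 18
1 Exploration 84.0 PC 40
0 Complete 107.0 PC 43
1 Complete 107.0 PC 22
0 Exploration 107.0 PC 389
")

explor <- subset(excerpt, crossingtype == "Exploration")

# Mean and SD of ID per type and pp
stats <- aggregate(ID ~ type + pp, data = explor, FUN = mean)
names(stats)[names(stats) == "ID"] <- "Mean"
stats$SD <- aggregate(ID ~ type + pp, data = explor, FUN = sd)$ID

print(stats)  # SD is NA everywhere: each group has a single row here
```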
Is this what you want? This filters the data to only include Exploration, uses ID as the y variable, groups by pp and facets on type
tbl <- read_table2(
"pp crossingtype km type ID
0 Complete 80.0 DCC 10
1 Complete 80.0 DCC 4
0 Exploration 80.0 DCC 49
1 Exploration 80.0 DCC 4
0 Complete 144.0 DWC 235
1 Complete 144.0 DWC 22
0 Exploration 144.0 DWC 238
1 Exploration 144.0 DWC 18
1 Exploration 84.0 PC 40
0 Complete 107.0 PC 43
1 Complete 107.0 PC 22
0 Exploration 107.0 PC 389"
) %>%
mutate(pp = factor(pp))
ggplot(data = tbl %>% filter(crossingtype == "Exploration")) +
geom_boxplot(aes(x = pp, y = ID)) +
facet_wrap(~type)
I ran this code on the linked dataset to produce this:
Here's the approach I used. I used colour to distinguish pp instead of a two-level x-axis.
Note that I downloaded the data to my working directory, so the read.table command may need to be modified
library(dplyr)
library(ggplot2)
dat <- read.table("figshare.txt")
dat <- droplevels(filter(dat, crossingtype == "Exploration"))
dat <- dat %>%
group_by(pp, type) %>%
summarise(val = mean(ID),
SD = sd(ID))
ggplot(dat, aes(x = type, y = val, colour = as.factor(pp), group =
as.factor(pp))) +
geom_point(size = 3, position = position_dodge(width = 0.2)) +
geom_errorbar(aes(ymax = val + SD, ymin = val - SD), position =
position_dodge(width = 0.2), width = 0.2) +
labs(y = "Mean # of explorations (+/- SD)", colour = "pp")
