Add means to a boxplot - r

I want to add the mean values to my basic boxplot and found this function here.
fun_mean <- function(x){
return(data.frame(y=mean(x),label=mean(x,na.rm=T)))
}
I used it within my code, but because I have two factors, it is not working properly. Where do I have to add the second factor?
FixationT2.plot = ggplot(dataT2fix_figs,
aes(x = length, y = perc_fixated, fill = mask)) +
geom_boxplot() +
coord_cartesian (ylim =c(35, 100)) +
geom_hline(yintercept = 50) +
stat_summary(fun.y = mean, geom="point", colour="darkred", size=3) +
labs(title="") +
xlab("Länge Wort N+1") +
ylab("Fixationswahrscheinlichkeit in %\n von Wort N+1") +
guides(fill=guide_legend(title="Preview Maske"))
This is what the data looks like
Subject length mask perc_fixated
<fct> <fct> <fct> <dbl>
1 1 "kurzes\n N+1" keine Maske 41.7
2 1 "kurzes\n N+1" syntaktisch korrekt 91.7
3 1 "kurzes\n N+1" syntaktisch inkorrekt 86.7
4 1 "langes \nN+1" keine Maske 100
5 1 "langes \nN+1" syntaktisch korrekt 87.5
6 1 "langes \nN+1" syntaktisch inkorrekt 91.7
7 2 "kurzes\n N+1" keine Maske 73.3
8 2 "kurzes\n N+1" syntaktisch korrekt 84.6
9 2 "kurzes\n N+1" syntaktisch inkorrekt 83.3
10 2 "langes \nN+1" keine Maske 83.3

You can specify the dodge width for the calculated mean value layer. Right now they appear to be overlapping one another at each x-axis value. I don't see the function you mentioned (fun_mean) actually used in the ggplot code, but it shouldn't really be necessary.
Try this:
ggplot(df,
aes(x = length, y = perc_fixated, fill = mask)) +
geom_boxplot() +
stat_summary(fun = mean, geom="point", colour="darkred", size=3, # use fun.y instead in ggplot2 < 3.3.0
position = position_dodge2(width = 0.75))
# ... code for axis titles & so on omitted for brevity.
I used width = 0.75 above, because this is the default width for geom_boxplot() / stat_boxplot() (as found in the ggplot2 code here). If you specify a width explicitly in your boxplot, use that instead.
Data used:
df <- read.table(header = TRUE,
text = 'Subject length mask perc_fixated
1 1 "kurzes\n N+1" "keine Maske" 41.7
2 1 "kurzes\n N+1" "syntaktisch korrekt" 91.7
3 1 "kurzes\n N+1" "syntaktisch inkorrekt" 86.7
4 1 "langes \nN+1" "keine Maske" 100
5 1 "langes \nN+1" "syntaktisch korrekt" 87.5
6 1 "langes \nN+1" "syntaktisch inkorrekt" 91.7
7 2 "kurzes\n N+1" "keine Maske" 73.3
8 2 "kurzes\n N+1" "syntaktisch korrekt" 84.6
9 2 "kurzes\n N+1" "syntaktisch inkorrekt" 83.3
10 2 "langes \nN+1" "keine Maske" 83.3')
df$Subject <- factor(df$Subject)
(Next time, please use dput() as advised in the comments to provide your data.)
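If the mean values should also be printed as labels (which is what a helper like fun_mean is usually for), the same dodge width can be applied to a text layer. The following is only a sketch: the toy data, the helper name mean_label and the rounding to one decimal are assumptions made for illustration, and fun = requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Toy data standing in for the asker's data frame (values are made up).
set.seed(1)
toy <- expand.grid(subject = 1:8,
                   length  = c("short", "long"),
                   mask    = c("none", "correct", "incorrect"))
toy$perc_fixated <- runif(nrow(toy), 40, 100)

# Hypothetical helper: y position and rounded text label for one group.
mean_label <- function(x) {
  m <- mean(x, na.rm = TRUE)
  data.frame(y = m, label = round(m, 1))
}

dodge <- position_dodge(width = 0.75)  # default boxplot width

p <- ggplot(toy, aes(x = length, y = perc_fixated, fill = mask)) +
  geom_boxplot() +
  # one point per length/mask combination, shifted like the boxes
  stat_summary(fun = mean, geom = "point", colour = "darkred", size = 3,
               position = dodge) +
  # matching text label slightly above each point
  stat_summary(fun.data = mean_label, geom = "text", vjust = -0.7,
               position = dodge)
p
```

Because fill = mask is set in the top-level aes(), both summary layers are grouped by length and mask, so no second factor needs to be passed to the helper itself.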

In the past I have simply used the points() function to add means to my box plots, like this:
boxplot(mtcars$mpg ~ mtcars$cyl)
points(x = c(1, 2, 3),
y = tapply(mtcars$mpg, mtcars$cyl, "mean"), col = "red")
So you draw the boxplot, calculate the mean for each box, and pass those means as the y argument of points(). The x argument is just the sequence 1, 2, ... with one entry per box.
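A slightly more general variant of the same idea, sketched here with mtcars again, derives the x positions from the number of groups instead of hard-coding c(1, 2, 3). The variable name group_means is just an illustrative choice.

```r
# Means per cylinder group; tapply() returns one value per box,
# in the same factor-level order that boxplot() uses.
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

boxplot(mpg ~ cyl, data = mtcars)

# seq_along() yields 1, 2, ..., i.e. one x position per box.
points(x = seq_along(group_means), y = group_means, col = "red", pch = 19)
```

This keeps working unchanged if the grouping variable gains or loses levels.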

Related

Change position of number of bar (ggplot) with negative and positive values

I have the following code. My data contains both positive and negative values, and I want to place the labels of positive values above the top edge of the bar and the labels of negative values below the bottom edge. I would also like to know how to change the y-axis limits, how to change the order of "Flujo", and how to change the background of the graph.
tabla <-
Flujo Mes Valor
1 Qns Septiembre 79.4
2 Qnl Septiembre -97.5
3 Qh Septiembre -3.1
4 Qe Septiembre -11.3
5 Qr Septiembre 0.5
6 Qg Septiembre 16.5
7 Qm Septiembre 15.5
8 Qns Octubre 79.1
9 Qnl Octubre -87.8
10 Qh Octubre -0.8
11 Qe Octubre -1.7
12 Qr Octubre 0.0
13 Qg Octubre 36.0
14 Qm Octubre -57.9
tabla<-data.frame("Flujo"=c("Qns","Qnl","Qh","Qe","Qr","Qg","Qm","Qns","Qnl","Qh","Qe","Qr","Qg","Qm"),
"Mes"= rbind(array(" Septiembre", dim=c(7,1)) , array("Octubre", dim=c(7,1))) ,
"Valor"=round(c(s1$Qns,s1$Qnl,s1$Qh,s1$Qe,s1$Qr,s1$Qg,s1$Qm,s2$Qns,s2$Qnl,s2$Qh,s2$Qe,s2$Qr,s2$Qg,s2$Qm),digits=1))
colors <-c("Qns"="red","Qnl"="blue","Qh"="deeppink","Qe"="darkgoldenrod1","Qr"="darkblue","Qg"="green","Qm"="brown")
fig31 <- ggplot(data=tabla, aes(x=Mes, y=Valor,fill=Flujo)) +
geom_bar(stat="identity", color="black", position=position_dodge())+
theme_minimal()+
geom_text(aes(label=Valor),position=position_dodge(width=0.9),size=4, vjust=1.5)+
scale_fill_manual(values = colors) +
theme_light()
fig31
Here is one way to do it: I replaced geom_bar() with geom_col() and put an ifelse() in geom_text(). The two vjust values (-1 for non-negative values, 1.5 for negative ones) control where the numbers are placed relative to the bar edge.
fig31 <- ggplot(tabla, aes(Mes, Valor, fill=Flujo)) +
geom_col(position = position_dodge()) +
geom_text(aes(label = paste(Valor)),
position=position_dodge(width=0.9),size=3,
vjust = ifelse(tabla$Valor >= 0, -1, 1.5)) +
scale_fill_manual(values = colors) +
theme_light()
fig31
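The remaining points of the question (bar order, y-axis limits, background) can be handled in the same plot. The sketch below uses a reduced copy of the printed table. The chosen level order, the limits c(-120, 100) and the background colour "grey95" are arbitrary assumptions for illustration.

```r
library(ggplot2)

# Reduced copy of the table from the question (three fluxes, two months).
tabla <- data.frame(
  Flujo = c("Qns", "Qnl", "Qg", "Qns", "Qnl", "Qg"),
  Mes   = rep(c("Septiembre", "Octubre"), each = 3),
  Valor = c(79.4, -97.5, 16.5, 79.1, -87.8, 36.0)
)

# Factor levels control the order of the bars and of the legend entries.
tabla$Flujo <- factor(tabla$Flujo, levels = c("Qns", "Qnl", "Qg"))
tabla$Mes   <- factor(tabla$Mes, levels = c("Septiembre", "Octubre"))

fig <- ggplot(tabla, aes(Mes, Valor, fill = Flujo)) +
  geom_col(position = position_dodge(width = 0.9), colour = "black") +
  # vjust is mapped inside aes(), so it is computed from the plotted rows
  geom_text(aes(label = Valor, vjust = ifelse(Valor >= 0, -0.5, 1.5)),
            position = position_dodge(width = 0.9), size = 3) +
  scale_y_continuous(limits = c(-120, 100)) +               # y-axis limits
  theme_light() +
  theme(panel.background = element_rect(fill = "grey95"))   # plot background
fig
```

Mapping vjust inside aes() avoids depending on the row order of an external vector such as tabla$Valor.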

How to reorder discrete y axis on a plot with facets, flipped coordinates, and continuous x axis

Objective:
I am trying to draw a plot of data on cities in several countries, grouped by region, then arranged by my own order from 1 at the top in ascending order going down the plot along the y axis.
The plot is currently grouping the regions in reverse alphabetical order; I'd like it to be in alphabetical order as well. Also, it is currently arranging the nations alphabetically. My order is not being used.
What I've tried so far:
You'll see two lines of code that have been commented out below:
#aspect.ratio=1/5, #I tried this and it did not work for me
and
#coord_fixed(ratio=1/500) + #new one for fixing y axis spacing #this did not solve it either
and I've tried to solve it using the solution proposed here: https://www.r-bloggers.com/ordering-categories-within-ggplot2-facets-2/ with this code:
group_by(rdatacities$region, rdatacities$country) %>%
arrange(desc(contribution)) %>%
ungroup() %>%
mutate(country = factor(paste(country, region, sep = "__"), levels = rev(paste(country, region, sep = "__")))) %>%
# --ggplot here--
scale_x_discrete(labels = function(x) gsub("__.+$", "", x))
but group_by(rdatacities$region, rdatacities$country) %>% throws this error:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "character"
I am not an R expert, I believe this to be a minimal, complete, and verifiable example, but if it's a wall of terrible code, I apologize in advance.
library(readxl)
library(grid)
library(scales)
library(ggplot2)
library(graphics)
library(grDevices)
library(datasets)
p<-ggplot(rdatacities, aes(x=country, y=target, size=citiesorig)) +
geom_point(shape=21, alpha=.6, stroke=.75) + #municipal targets
facet_grid(region ~ ., scales = "free_y", space = "free_y") + #regional clusters
geom_point(data=rdatanation, shape=0, size=2.5, stroke=1, aes(colour='red'))
#national targets
plot <- p + theme(
plot.title = element_text(hjust=0.5), legend.key=element_rect(fill='white'),
panel.background = element_blank(), plot.margin = margin(2),
axis.ticks.y = element_blank(), panel.grid.major.y= element_line(size=.2, linetype='solid', colour='grey'),
axis.text.y = element_text(size=6),
panel.grid.major.x= element_blank(), axis.line.x = element_blank(),
axis.ticks.x=element_line(color='black'),
#aspect.ratio=1/5, #I tried this and it did not work for me
strip.background=element_rect(fill = NA, size = 0, color = "white", linetype = "blank")
) +
#coord_fixed(ratio=1/500) + #new one for fixing y axis spacing #this did not solve it either
scale_y_continuous(name="Target by 2030 as Percent of Base Year", labels = percent) +
scale_x_discrete(name="Country", expand=waiver()) +
coord_flip(ylim = c(0, 3)) +
ggtitle("National vs. Municipal GHG Emissions Targets") +
guides(colour = guide_legend(order = 1, "Targets"),
size = guide_legend(order = 2,"Municipal")) +
scale_color_manual(name="Targets", labels = c("National"), values = c('#843C0C'))
plot
Here is a sample of the data from rdatacities:
region country target cities order citiesorig
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 EAP JPN 0.660 1 71 1
2 EAP JPN 0.75 2 71 2
3 EAP JPN 0.8 1 71 1
4 EAP JPN 0.85 1 71 1
5 EAP JPN 0.88 1 71 1
6 EAP JPN 0.96 1 71 1
7 EAP KOR 0.6 1 72 1
8 ECA ALB 0.22 1 68 1
9 ECA AZE 0.2 1 65 1
10 ECA BLR 0.2 2 62 6
# ... with 391 more rows
Here is a sample of the data from rdatanation:
region countryname country target EmBase EmCBase EmC2014 UrbPop2014 `%UrbPop2014` pop2014
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 EAP Australia AUS 1 263705. 15.5 15.4 20947819 0.893 23460694
2 EAP Brunei Darussalam BRN 2.09 6194. 23.9 22.1 316547 0.769 411704
3 EAP French Polynesia PYF NA 436. 2.20 2.92 154205 0.560 275484
4 EAP Japan JPN 0.82 1096180. 8.87 9.54 118393408 0.930 127276000
5 EAP Korea, Rep. KOR 0.74 246943. 5.76 11.6 41794948 0.824 50746659
6 EAP Malaysia MYS 5.93 56593. 3.14 8.03 22371755 0.740 30228017
7 EAP Mongolia MNG 1.89 9989. 4.57 7.13 2082457 0.712 2923896
8 EAP New Caledonia NCL NA 1584. 9.27 16.0 186697 0.697 268000
9 EAP New Zealand NZL 1.1 23546. 7.07 7.69 3889661 0.863 4509700
10 EAP Palau PLW 1.73 235. 15.6 12.3 18236 0.865 21094
# ... with 70 more rows
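Two observations on the attempt above. First, group_by() expects the data frame as its first argument, followed by bare column names, so the pipeline would need to start with rdatacities %>% group_by(region, country). Passing rdatacities$region hands it a character vector, which is what the error message reports. Second, the order of a discrete axis and of the facet panels follows the factor levels, so both can be set before plotting. The sketch below uses a made-up five-row stand-in for rdatacities and, as a simplification, maps country directly to y instead of using coord_flip().

```r
library(ggplot2)

# Toy stand-in for rdatacities (values are made up); `order` is the custom rank.
cities <- data.frame(
  region  = c("ECA", "ECA", "EAP", "EAP", "EAP"),
  country = c("BLR", "ALB", "KOR", "JPN", "JPN"),
  target  = c(0.20, 0.22, 0.60, 0.66, 0.75),
  order   = c(62, 68, 72, 71, 71)
)

# Countries sorted by the custom rank. The levels are reversed because a
# discrete y axis draws the first level at the bottom.
ranked <- unique(cities$country[order(cities$order)])
cities$country <- factor(cities$country, levels = rev(ranked))

# Regions as a factor with explicit alphabetical levels;
# facet_grid() lays the panels out in level order.
cities$region <- factor(cities$region, levels = sort(unique(cities$region)))

p <- ggplot(cities, aes(x = target, y = country)) +
  geom_point(shape = 21) +
  facet_grid(region ~ ., scales = "free_y", space = "free_y")
p
```

With the real data the same two factor() calls would be applied to rdatacities (and to rdatanation, so both layers share the same levels) before the ggplot() call.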

ggplot2 geom_smooth didn't work

I'm plotting two different variables on the same plot.
sex_female is chr, including 0 and 1.
epoch_36:epoch_144 are num, time variables.
Here is my code:
total %>%
select(sex_female, epoch_36:epoch_144)%>%
gather(key = time, value = ac, epoch_36:epoch_144) %>%
group_by(sex_female,time) %>%
mutate(mean = mean(ac)) %>%
ggplot(aes(x = time, y = mean,color = sex_female)) +
geom_point(alpha = .3)+
geom_smooth(method = "lm")+
theme(axis.text.x = element_text(angle = 90,hjust = 1))
After the mutation, I got the tibble:
# A tibble: 45,780 x 4
# Groups: sex_female, time [218]
sex_female time ac mean
<chr> <chr> <dbl> <dbl>
1 1 epoch_36 49.8 54.96406
2 0 epoch_36 34.7 55.43448
3 0 epoch_36 70.9 55.43448
4 0 epoch_36 12.3 55.43448
5 1 epoch_36 102.7 54.96406
6 1 epoch_36 77.9 54.96406
7 0 epoch_36 1.1 55.43448
8 1 epoch_36 140.0 54.96406
9 1 epoch_36 51.3 54.96406
10 0 epoch_36 0.0 55.43448
# ... with 45,770 more rows
I've tried using the solution suggested in a similar question: Plot dashed regression line with geom_smooth in ggplot2, but no lines showed up. How do I fix my code to produce lines?
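Your time column is a character (categorical) column, so every time point forms its own group and geom_smooth() has nothing to fit a line through. Convert it to a numeric column first. Because the values look like "epoch_36", strip the prefix before converting (calling as.numeric() on the raw strings would only give NA):
mutTibble$time <- as.numeric(sub("^epoch_", "", mutTibble$time))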
And for plotting you can use this:
library(ggplot2)
ggplot(mutTibble,
aes(time, mean, color = factor(sex_female))) +
geom_point(alpha = 0.3)+
geom_smooth(method = "lm")+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Time",
y = "Mean",
color = "Gender (female)")
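The conversion step can be checked on a few values before touching the full tibble. This is a small base R sketch, and the vector time_chr is made up to mimic the time column shown in the question.

```r
# Example values in the same format as the `time` column.
time_chr <- c("epoch_36", "epoch_72", "epoch_144")

# Direct conversion fails: the strings are not numbers, so the result is NA.
direct <- suppressWarnings(as.numeric(time_chr))

# Strip the "epoch_" prefix first, then convert.
time_num <- as.numeric(sub("^epoch_", "", time_chr))

print(direct)
print(time_num)
```

Once time is numeric, ggplot2 treats the x axis as continuous and geom_smooth(method = "lm") can fit one line per colour group.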

How can I add a line plot in a BOX plot using secondary y axis but same x axis

I am trying to add a line plot to my box plot on a secondary y axis, but I have not been able to. How can I do this?
The code for my box plot is:
library(ggplot2)
mydata<-read.csv("boxplot2.csv")
mydata$Class <- factor(mydata$Class,labels = c("1", "2", "3", "4", "5", "6"))
p10 <- ggplot(mydata, aes(x = mydata$Class, y = log(mydata$erosion))) +
geom_boxplot()
p10
p10 <- p10 +
scale_x_discrete(name = "Mean Annual Precipitation(mm/yr)") +
scale_y_continuous(name = "Log Average Erosion Rate(m/My)")
p10 <- ggplot(mydata, aes(x = mydata$Class, y = log(mydata$erosion))) +
geom_boxplot(varwidth=TRUE)
p10 <- p10 +
scale_x_discrete(name = "Mean Annual Precipitation(mm/yr)") +
scale_y_continuous(name = "Log Average Erosion Rate(m/My)")
I want a similar figure, but with box plots instead of histograms.
Sample data:
% Vegetation erosion Class
0 0.43 1
0 0.81 1
2 0.26 1
3 1.05 1
3 0.97 1
12.76 15.97 2
12.84 17.69 2
11.01 14.76 2
13.44 17.94 2
10.76 10.65 2
7.28 67.47 2
23 120.4 3
21 298.63 3
52 21.4 3
9 64.94 3
50 291.88 3
16 493.98 3
11 183.45 3
You just have to specify different aesthetics for the geom_line, something like this:
ggplot(iris,aes(x=Species, y=Sepal.Length, fill=Species)) +
geom_boxplot() +
geom_line(aes(x=Species, y=Petal.Length, group=1), stat = "summary", fun="mean") + # use fun.y instead in ggplot2 < 3.3.0
scale_y_continuous(sec.axis = sec_axis(~.))
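When the line variable lives on a different scale than the boxes, the identity transformation ~ . is not enough. A secondary axis in ggplot2 is a one-to-one transformation of the primary axis, so the usual workaround is to rescale the line data onto the primary scale and undo that rescaling in sec_axis(). The sketch below uses iris again. The scale factor k and the choice of Petal.Width as the line variable are assumptions for illustration, and fun = requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical scale factor: stretches Petal.Width to the range of Sepal.Length.
k <- max(iris$Sepal.Length) / max(iris$Petal.Width)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # group mean of the second variable, multiplied by k to fit the primary axis
  stat_summary(aes(y = Petal.Width * k, group = 1),
               fun = mean, geom = "line", colour = "red") +
  # the secondary axis divides by k again, so it shows the original units
  scale_y_continuous(name = "Sepal.Length",
                     sec.axis = sec_axis(~ . / k, name = "Mean Petal.Width"))
p
```

For the data in the question, Class would take the place of Species, log(erosion) the place of Sepal.Length, and the vegetation column the place of Petal.Width.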

Obtain hexadecimal color codes used in “scale_fill_grey” function

I want to get the hexadecimal codes of the colors that the scale_fill_grey function uses to fill the categories of the barplot produced by the following codes:
library(ggplot2)
data <- data.frame(
Meal = factor(c("Breakfast","Lunch","Dinner","Snacks"),
levels=c("Breakfast","Lunch","Dinner","Snacks")),
Cost = c(9.75,13,19,10.20))
ggplot(data=data, aes(x=Meal, y=Cost, fill=Meal)) +
geom_bar(stat="identity") +
scale_fill_grey(start=0.8, end=0.2)
scale_fill_grey() uses grey_pal() from the scales package, which in turn uses grey.colors(). So, you can generate the codes for the scale of four colours that you used as follows:
grey.colors(4, start = 0.8, end = 0.2)
## [1] "#CCCCCC" "#ABABAB" "#818181" "#333333"
This shows a plot with the colours:
image(1:4, 1, matrix(1:4), col = grey.colors(4, start = 0.8, end = 0.2))
Alternatively, use the ggplot_build() function to read the colours from the built plot:
#assign ggplot to a variable
myplot <- ggplot(data=data, aes(x=Meal, y=Cost, fill=Meal)) +
geom_bar(stat="identity") +
scale_fill_grey(start=0.8, end=0.2)
#get build
myplotBuild <- ggplot_build(myplot)
#see colours
myplotBuild$data
# [[1]]
# fill x y PANEL group ymin ymax xmin xmax colour size linetype alpha
# 1 #CCCCCC 1 9.75 1 1 0 9.75 0.55 1.45 NA 0.5 1 NA
# 2 #ABABAB 2 13.00 1 2 0 13.00 1.55 2.45 NA 0.5 1 NA
# 3 #818181 3 19.00 1 3 0 19.00 2.55 3.45 NA 0.5 1 NA
# 4 #333333 4 10.20 1 4 0 10.20 3.55 4.45 NA 0.5 1 NA
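To get just the colour codes rather than the whole table, the fill column can be pulled out directly. This sketch uses layer_data(), the ggplot2 helper that returns the built data of one layer. The names meals and fills are illustrative, and geom_col() is used as the equivalent of geom_bar(stat = "identity").

```r
library(ggplot2)

meals <- data.frame(
  Meal = factor(c("Breakfast", "Lunch", "Dinner", "Snacks"),
                levels = c("Breakfast", "Lunch", "Dinner", "Snacks")),
  Cost = c(9.75, 13, 19, 10.20)
)

p <- ggplot(meals, aes(x = Meal, y = Cost, fill = Meal)) +
  geom_col() +
  scale_fill_grey(start = 0.8, end = 0.2)

# Built data of the first layer; the fill column holds the hex codes.
fills <- layer_data(p, 1)$fill

# Attach the category names so each colour can be looked up by meal.
names(fills) <- levels(meals$Meal)
print(fills)
```

The resulting named vector can be passed on to other plotting functions, for example as values in scale_fill_manual().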
