Create legend for line chart in R ggplot2

Hello, I am trying to add a legend to my graph.
The previous answers I have looked at all seem to rely on aes() or on the lines being related to a factor in some way. I didn't understand the answer to "Add legend to geom_line() graph in r".
In my case I simply want a legend that states "RED = No Cross Validation" and "BLUE = Cross Validation".
R Code
ggplot(data=graphDF,aes(x=rev(kAxis)))+
geom_line(y=rev(noCVErr),color="red")+
geom_point(y=rev(noCVErr),color="red")+
geom_line(y=rev(CVErr),color="blue")+
geom_point(y=rev(CVErr),color="blue")+
ylim(minErr,maxErr)+
ggtitle("The KNN Error Rate for Cross Validated and Non-Cross Validated Models")+
labs(y="Error Rate", x = "1/K")
Dataset
ks kAxis noCVAcc noCVErr CVAcc CVErr
1 1 1.00000000 1.0000000 0.00000000 0.8279075 0.1720925
2 3 0.33333333 0.9345238 0.06547619 0.8336898 0.1663102
3 5 0.20000000 0.8809524 0.11904762 0.8158645 0.1841355
4 7 0.14285714 0.8690476 0.13095238 0.8272727 0.1727273
5 9 0.11111111 0.8809524 0.11904762 0.7857398 0.2142602
6 11 0.09090909 0.8809524 0.11904762 0.7500891 0.2499109
7 13 0.07692308 0.8511905 0.14880952 0.7622103 0.2377897
8 15 0.06666667 0.7976190 0.20238095 0.7320856 0.2679144
9 17 0.05882353 0.7916667 0.20833333 0.7320856 0.2679144
10 19 0.05263158 0.7559524 0.24404762 0.7201426 0.2798574
11 21 0.04761905 0.7678571 0.23214286 0.7023173 0.2976827
12 23 0.04347826 0.7440476 0.25595238 0.6903743 0.3096257
13 25 0.04000000 0.7559524 0.24404762 0.6786096 0.3213904

It might help if you put your data into "long" form, such as this for your data frame graphDF (perhaps using pivot_longer from tidyr if necessary):
library(tidyr)
graphDF_long <- pivot_longer(data = graphDF,
cols = c(noCVErr, CVErr),
names_to = "model",
values_to = "errRate")
This creates a new data.frame called graphDF_long that has a single column for the error rate, and a new column that specifies model:
ks kAxis noCVAcc CVAcc model errRate
<int> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 1 1 0.828 noCVErr 0
2 1 1 1 0.828 CVErr 0.172
3 3 0.333 0.935 0.834 noCVErr 0.0655
4 3 0.333 0.935 0.834 CVErr 0.166
5 5 0.2 0.881 0.816 noCVErr 0.119
6 5 0.2 0.881 0.816 CVErr 0.184
....
Then, you can simplify your ggplot statement, and use an aesthetic with the column model for color:
library(ggplot2)
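# rev() is dropped here: reversing x and y but not model would pair each
# error rate with the other model's label, and geom_line() orders by x anyway
ggplot(data = graphDF_long, aes(x = kAxis, y = errRate, color = model)) +
  geom_line() +
  geom_point() +
  # breaks plus named values tie each colour and label to a specific model
  scale_color_manual(breaks = c("CVErr", "noCVErr"),
                     values = c(CVErr = "blue", noCVErr = "red"),
                     labels = c("Cross Validation", "No Cross Validation")) +
  ylim(min(graphDF_long$errRate), max(graphDF_long$errRate)) +
  ggtitle("The KNN Error Rate for Cross Validated and Non-Cross Validated Models") +
  labs(y = "Error Rate", x = "1/K")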
This generates the legend automatically.
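If reshaping is not convenient, a legend can also be produced from the wide data by mapping colour to a constant label inside aes(). The following is a minimal sketch under that assumption, using only the first four rows of the question's data; the label strings are taken from the question, and the object name p is just for illustration.

```r
library(ggplot2)

# First four rows of the question's data: 1/K and the two error rates
graphDF <- data.frame(
  kAxis   = c(1, 0.33333333, 0.2, 0.14285714),
  noCVErr = c(0, 0.06547619, 0.11904762, 0.13095238),
  CVErr   = c(0.1720925, 0.1663102, 0.1841355, 0.1727273)
)

# A constant string inside aes() is treated as a one-level discrete variable,
# so ggplot2 builds a colour scale (and therefore a legend) from the labels
p <- ggplot(graphDF, aes(x = kAxis)) +
  geom_line(aes(y = noCVErr, colour = "No Cross Validation")) +
  geom_point(aes(y = noCVErr, colour = "No Cross Validation")) +
  geom_line(aes(y = CVErr, colour = "Cross Validation")) +
  geom_point(aes(y = CVErr, colour = "Cross Validation")) +
  # Named values assign the requested colour to each label
  scale_colour_manual(name = NULL,
                      values = c("No Cross Validation" = "red",
                                 "Cross Validation" = "blue")) +
  labs(y = "Error Rate", x = "1/K")
print(p)
```

The long-format approach scales better to many series; the constant-label approach is mainly useful when there are only two or three lines.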

Related

How can you split ggplot from 12 individual bars to 3 groups of 4?

I have a bar graph with 12 individual bars. I would like to split them into their 3 respective groups, each with their own color so that they are recognized as the same group. I have been using ColorBrewer Set 3, because it is photocopy safe. When I use it on my plot it all turns one color.
In the plot, you can see the 3 groups - ELE, KEB, and SMI, each with 4 blocks. It would be great if they could be split up more cohesively.
# A tibble: 12 x 7
vid.order sum.correct n prop.correct z_score p_val sig
<chr> <int> <int> <dbl> <dbl> <dbl> <lgl>
1 ELE1 47 55 0.855 5.26 0.000000145 TRUE
2 ELE2 46 55 0.836 4.99 0.000000607 TRUE
3 ELE3 37 55 0.673 2.56 0.0104 TRUE
4 ELE4 47 55 0.855 5.26 0.000000145 TRUE
5 KEB1 40 55 0.727 3.37 0.000749 TRUE
6 KEB2 46 55 0.836 4.99 0.000000607 TRUE
7 KEB3 47 55 0.855 5.26 0.000000145 TRUE
8 KEB4 44 55 0.8 4.45 0.00000860 TRUE
9 SMI1 35 55 0.636 2.02 0.0431 TRUE
10 SMI2 46 55 0.836 4.99 0.000000607 TRUE
11 SMI3 41 55 0.745 3.64 0.000272 TRUE
12 SMI4 35 55 0.636 2.02 0.0431 TRUE
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
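Personally, I would create a column with the groups (ELE, KEB or SMI) and use it in aes(fill = ). Without a fill aesthetic, scale_fill_brewer() has nothing to colour, which is why all bars end up the same colour.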
library(data.table)
library(dplyr)
library(ggplot2)
library(jtools)
#make object data.table
setDT(byBlot_sigtests)
#create a column with the groups (vid.order but without the numbers)
byBlot_sigtests[, group := gsub("[0-9]", "", vid.order)]
#plot
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct, fill = group))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
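To separate the three groups visually as well as by colour, one option is to facet by the new group column. This is a sketch rather than part of the answer above: it uses base R instead of data.table to build the column, keeps only the two columns needed from the question's table, and omits the jtools theme so that it runs with ggplot2 alone.

```r
library(ggplot2)

# vid.order and prop.correct copied from the question's table
byBlot_sigtests <- data.frame(
  vid.order = c("ELE1", "ELE2", "ELE3", "ELE4",
                "KEB1", "KEB2", "KEB3", "KEB4",
                "SMI1", "SMI2", "SMI3", "SMI4"),
  prop.correct = c(0.855, 0.836, 0.673, 0.855,
                   0.727, 0.836, 0.855, 0.8,
                   0.636, 0.836, 0.745, 0.636)
)

# Strip the digits to get the group name (ELE, KEB or SMI)
byBlot_sigtests$group <- gsub("[0-9]", "", byBlot_sigtests$vid.order)

p <- ggplot(byBlot_sigtests, aes(x = vid.order, y = prop.correct, fill = group)) +
  geom_col() +                                   # same as geom_bar(stat = "identity")
  geom_hline(yintercept = 0.5) +
  expand_limits(y = c(0, 1)) +
  scale_fill_brewer(palette = "Set3") +
  # One panel per group; free_x drops the bars that belong to other groups
  facet_grid(~ group, scales = "free_x", space = "free_x") +
  labs(x = "video", y = "proportion natural selected")
print(p)
```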

How to graph an ANOVA w/ conditional p-values and confidence intervals in R?

I have df:
Rate Dogs MHI_2018 Points Level AGE65_MORE P_Elderly
1 0.10791173 0.00000000 59338 236.4064 C 8653 15.56267
2 0.06880040 0.00000000 57588 229.4343 C 44571 20.44335
3 0.08644537 0.00000000 50412 200.8446 C 10548 18.23651
4 0.29591635 0.00000000 29267 116.6016 A 1661 16.38390
5 0.05081301 0.00000000 37365 148.8645 B 3995 20.29980
6 0.02625200 0.00000000 45400 180.8765 D 20247 17.71748
7 0.80321285 0.02974862 39917 159.0319 D 6562 19.52105
8 0.07682852 0.00000000 42132 167.8566 D 5980 22.97173
9 0.18118814 0.00000000 47547 189.4303 B 7411 16.78482
10 0.07787555 0.00000000 39907 158.9920 B 2953 22.99665
11 0.15065913 0.00000000 39201 156.1793 C 2751 20.72316
12 0.33362247 0.00000000 46495 185.2390 B 2915 19.45019
13 0.03652168 0.00000000 49055 195.4382 B 10914 19.92988
14 0.27998133 0.00000000 42423 169.0159 A 2481 23.15446
15 0.05407451 0.00000000 40203 160.1713 A 7790 21.06202
16 0.07233796 0.00000000 39057 155.6056 A 2629 19.01765
17 0.08389061 0.00000000 45796 182.4542 B 15446 18.51106
18 0.05220569 0.00000000 34035 135.5976 B 6921 18.06578
19 0.05603418 0.00000000 39491 157.3347 B 12322 17.26133
20 0.15875536 0.00000000 60367 240.5060 C 12400 15.14282
With
AOV <- aov(Rate~Level, data = df)
TukeyHSD(AOV)
$Level
diff lwr upr p adj
B-A -0.066558621 -0.3783957 0.2452784 0.9272012
C-A -0.061063140 -0.4026635 0.2805372 0.9551663
D-A 0.126520253 -0.2624089 0.5154494 0.7890519
C-B 0.005495482 -0.2848090 0.2958000 0.9999404
D-B 0.193078874 -0.1516699 0.5378277 0.4049948
D-C 0.187583392 -0.1843040 0.5594708 0.4923479
I would now like to make a plot of this data with means and confidence intervals. I would also like to plot the p_adj between variables if it is < 0.50. Output would look like:
One solution is to use the ggsignif package, but first you need to prepare the output of TukeyHSD for use with ggsignif:
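AOV <- aov(Rate ~ Level, data = df)
# Named tukey rather than t to avoid masking base::t()
tukey <- as.data.frame(TukeyHSD(AOV)$Level)
library(tidyverse)
# Highest Rate per Level, used to place the brackets above the points
MAX <- df %>% group_by(Level) %>% summarise(Max = max(Rate))
T1 <- tukey %>% rownames_to_column("Group") %>%
  # Group looks like "B-A": the first character is Start, the last is End
  mutate(Start = sub("^(.).*", "\\1", Group),
         End = sub(".*(.)$", "\\1", Group)) %>%
  left_join(., MAX, by = c("Start" = "Level")) %>%
  left_join(., MAX, by = c("End" = "Level")) %>%
  mutate(End = factor(End)) %>%
  # Stagger bracket heights by the End level so the brackets do not overlap
  rowwise() %>% mutate(ypos = max(Max.x, Max.y) * (1 + 0.25 * as.numeric(End)))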
Source: local data frame [6 x 10]
Groups: <by row>
# A tibble: 6 x 10
Group diff lwr upr `p adj` Start End Max.x Max.y ypos
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <fct> <dbl> <dbl> <dbl>
1 B-A -0.0666 -0.378 0.245 0.927 B A 0.334 0.296 0.417
2 C-A -0.0611 -0.403 0.281 0.955 C A 0.159 0.296 0.370
3 D-A 0.127 -0.262 0.515 0.789 D A 0.803 0.296 1.00
4 C-B 0.00550 -0.285 0.296 1.00 C B 0.159 0.334 0.500
5 D-B 0.193 -0.152 0.538 0.405 D B 0.803 0.334 1.20
6 D-C 0.188 -0.184 0.559 0.492 D C 0.803 0.159 1.41
Now, you can plot your data and add significance based on the T1 dataset:
library(ggsignif)
library(ggplot2)
ggplot(df, aes(x = Level, y = Rate)) +
  geom_jitter(width = 0.2) +
  # mean_cl_normal needs the Hmisc package to be installed
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0, color = "red") +
  # after_stat(y) replaces the deprecated ..y.. notation (ggplot2 >= 3.3.0)
  stat_summary(fun = "mean", geom = "errorbar", aes(ymax = after_stat(y), ymin = after_stat(y)), col = "red", width = 0.5) +
  geom_signif(data = subset(T1, `p adj` < 0.5), manual = TRUE,
              aes(xmax = End, xmin = Start, y_position = ypos, annotations = round(`p adj`, 3)))
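To check the means and confidence intervals that the red error bars represent, they can be computed directly in base R. The sketch below assumes the usual t-based 95% interval (mean plus or minus qt(0.975, n - 1) * sd / sqrt(n)), which is what mean_cl_normal reports by default; only the Rate and Level columns from the question are used, and the name ci_by_level is illustrative.

```r
# Rate and Level copied from the 20 rows shown in the question
df <- data.frame(
  Rate = c(0.10791173, 0.06880040, 0.08644537, 0.29591635, 0.05081301,
           0.02625200, 0.80321285, 0.07682852, 0.18118814, 0.07787555,
           0.15065913, 0.33362247, 0.03652168, 0.27998133, 0.05407451,
           0.07233796, 0.08389061, 0.05220569, 0.05603418, 0.15875536),
  Level = c("C", "C", "C", "A", "B", "D", "D", "D", "B", "B",
            "C", "B", "B", "A", "A", "A", "B", "B", "B", "C")
)

# For each Level: group size, mean, and t-based 95% confidence limits
ci_by_level <- do.call(rbind, lapply(split(df$Rate, df$Level), function(x) {
  n <- length(x)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  data.frame(n = n,
             mean = mean(x),
             lower = mean(x) - half_width,
             upper = mean(x) + half_width)
}))
print(ci_by_level)
```

With groups this small (3 to 8 observations each) the intervals are wide, which is consistent with none of the Tukey comparisons being close to significant.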

Plot multiple lines in ggplot2 from

I want to produce an x,y plot, with ggplot or whatever works, from the columns in the table below. The lines should be grouped by Soil Number and Sample. Mean is my y value and SD is my error bar, while the column Day should serve as my x value, as a timeline. How do I manage this?
Results_CMT
# A tibble: 22 x 5
# Groups: Day, Soil_Number [10]
Day Soil_Number Sample Mean SD
<int> <int> <chr> <dbl> <dbl>
1 3.84 0.230
2 0 65872 R 4.82 0.679
3 1 65871 R 3.80 1.10
4 1 65872 R 3.24 1.61
5 3 65871 fLF NA NA
6 3 65871 HF 1.73 0.795
7 3 65871 oLF 0.360 0.129
8 3 65871 R 3.13 1.36
9 3 65872 fLF NA NA
10 3 65872 HF 1.86 0.374
# ... with 12 more rows
In the end there should be 8 lines (if data is found):
65871 R
65871 HF
65871 fLF
65871 oLF
65872 R
65872 HF
65872 fLF
65872 oLF
Do I have to produce another column with a combined character of Day, Soil_Number and Sample?
Thanks for any help.
Try this:
library(ggplot2)
ggplot(Results_CMT, aes(x = Day, y = Mean, colour = interaction(Sample, Soil_Number))) +
geom_line() +
geom_errorbar(aes(ymin = Mean-SD, ymax = Mean+SD), width = .2)
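On the question of a combined column: it is not required, since interaction() builds the grouping on the fly, but an explicit column gives readable legend labels. The sketch below assumes the key should combine only Soil_Number and Sample (Day is the x axis, so it should not be part of the key). It uses seven complete rows from the printed table; the column name series is hypothetical.

```r
library(ggplot2)

# Seven complete rows copied from the table in the question
Results_CMT <- data.frame(
  Day         = c(0L, 1L, 1L, 3L, 3L, 3L, 3L),
  Soil_Number = c(65872L, 65871L, 65872L, 65871L, 65871L, 65871L, 65872L),
  Sample      = c("R", "R", "R", "HF", "oLF", "R", "HF"),
  Mean        = c(4.82, 3.80, 3.24, 1.73, 0.360, 3.13, 1.86),
  SD          = c(0.679, 1.10, 1.61, 0.795, 0.129, 1.36, 0.374)
)

# One label per line, e.g. "65871 R"
Results_CMT$series <- paste(Results_CMT$Soil_Number, Results_CMT$Sample)

p <- ggplot(Results_CMT, aes(x = Day, y = Mean, colour = series, group = series)) +
  geom_line() +
  geom_point() +   # keeps series with a single observation visible
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.2)
print(p)
```

Series that have only one day of data cannot form a line, so ggplot2 emits a message for them and only the point and error bar are drawn.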

Graphing using r and points (pch)

What is wrong with my R code? I don't want to see the bar graph; I want to see only the plotting characters (pch):
plot(Graphdata$Sites,ylim=c(-1,2.5), xlab="sites",
ylab="density frequency",lwd=2)
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=2,col="green",lwd=2)
points(Graphdata$C,pch=3,col="red",lwd=2)
points(Graphdata$D,pch=4,col="orange",lwd=2)
legend("topright",
legend=c("A","B","C","D"),
col=c("red","blue","green","orange"),lwd=2)
Part of my data looks like:
Sites A B C D
1 A 2.052 2.268 1.828 1.474
2 B 0.549 0.664 0.621 1.921
3 C 0.391 0.482 0.400 0.382
4 D 0.510 0.636 0.497 0.476
5 A 0.214 0.239 0.215 0.211
6 B 1.016 1.362 0.978 0.876
......................................
.....................................
I also want the legend to show the plotting characters (pch) rather than lines.
The barplot is happening because you're plotting the factor of your Sites column. To only plot the points:
library(reshape2)
library(ggplot2)
# plots 6 categorical variables (one per row of Graphdata)
Graphdata.m = melt(t(Graphdata[-1]))
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)
If you want it to plot 4 categorical variables instead:
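Graphdata.1 = t(Graphdata[-1])
# use the site of each row as the column name; for the six rows shown
# this is c("A","B","C","D","A","B"), and it also works for the full data
colnames(Graphdata.1) = as.character(Graphdata$Sites)
Graphdata.m = melt(Graphdata.1)
ggplot(Graphdata.m,aes(Var2,value,group=Var2)) + geom_point() + ylim(-1,2.5)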
EDIT:
In base R:
plot(1,xlim=c(1,4),ylim=c(-1,2.5), xlab="sites",ylab="density frequency",lwd=2,type="n",xaxt = 'n')
axis(1,at=c(1,2,3,4),tick=T,labels=c("A","B","C","D"))
points(Graphdata$A,pch=1,col="blue",lwd=2)
points(Graphdata$B,pch=1,col="green",lwd=2)
points(Graphdata$C,pch=1,col="red",lwd=2)
points(Graphdata$D,pch=1,col="orange",lwd=2)
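For the legend request, passing pch instead of lwd to legend() makes it show symbols rather than lines. The sketch below is one possible base R version under two assumptions that the question does not confirm: that the x position of each point should be its site, and that each of the columns A to D should keep its own symbol and colour. It uses only the six rows shown in the question.

```r
# The six rows shown in the question
Graphdata <- data.frame(
  Sites = factor(c("A", "B", "C", "D", "A", "B")),
  A = c(2.052, 0.549, 0.391, 0.510, 0.214, 1.016),
  B = c(2.268, 0.664, 0.482, 0.636, 0.239, 1.362),
  C = c(1.828, 0.621, 0.400, 0.497, 0.215, 0.978),
  D = c(1.474, 1.921, 0.382, 0.476, 0.211, 0.876)
)

series <- c("A", "B", "C", "D")             # value columns to draw
pchs   <- 1:4                               # one plotting character per column
cols   <- c("blue", "green", "red", "orange")
x      <- as.integer(Graphdata$Sites)       # site A..D mapped to position 1..4

# Empty frame with a custom axis, so no bar or box plot is drawn for the factor
plot(NA, type = "n", xlim = c(1, nlevels(Graphdata$Sites)), ylim = c(-1, 2.5),
     xlab = "sites", ylab = "density frequency", xaxt = "n")
axis(1, at = seq_len(nlevels(Graphdata$Sites)), labels = levels(Graphdata$Sites))

for (i in seq_along(series)) {
  points(x, Graphdata[[series[i]]], pch = pchs[i], col = cols[i], lwd = 2)
}

# pch (not lwd) in legend() gives symbols instead of line segments;
# reusing pchs and cols keeps the legend in step with the points
legend("topright", legend = series, pch = pchs, col = cols, pt.lwd = 2)
```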

line connecting missing data R

I would like a line plot in R of the days a bird spent away from its nest.
I have missing data that is making it difficult to show the general trend. I want to replace the line for the days that I don't have information for with a dotted line. I have absolutely no idea how to do this. Is it possible to do in R?
> time.away.1
hrs.away days.rel
1 0.380 -2
2 0.950 -1
3 1.000 0
4 0.200 1
5 0.490 12
6 0.280 13
7 0.130 14
8 0.750 20
9 0.160 21
10 1.830 22
11 0.128 26
12 0.126 27
13 0.500 28
14 0.250 31
15 0.230 32
16 0.220 33
17 0.530 40
18 3.220 41
19 0.430 42
20 1.960 45
21 1.490 46
22 24.000 56
23 24.000 57
24 24.000 58
25 24.000 59
26 24.000 60
27 24.000 61
My attempt:
plot(hrs.away ~ days.rel, data=time.away.1,
type="o",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))
Here is a way using diff to make a variable that flags whether each row is exactly one day after the previous one; any other step marks a gap with missing days. Note that I renamed your data to dat.
## make the flag variable
dat$type <- c(TRUE, diff(dat$days.rel) == 1)
plot(hrs.away ~ days.rel, data=dat,
type="p",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))
legend("topright", c("missing", "sampled"), lty=c(2,1))
## Add line segments
len <- nrow(dat)
with(dat,
segments(x0=days.rel[-len], y0=hrs.away[-len],
x1=days.rel[-1], y1=hrs.away[-1],
lty=ifelse(type[-1], 1, 2),
lwd=ifelse(type[-1], 2, 1))
)
For the ggplot version, you can make another data.frame with the lagged variables used above:
library(ggplot2)
dat2 <- with(dat, data.frame(x=days.rel[-len], xend=days.rel[-1],
y=hrs.away[-len], yend=hrs.away[-1],
type=factor(as.integer(type[-1]))))
ggplot() +
geom_point(data=dat, aes(x=days.rel, y=hrs.away)) +
geom_segment(data=dat2, aes(x=x, xend=xend, y=y, yend=yend, lty=type, size=type)) +
scale_linetype_manual(values=2:1) +
scale_size_manual(values=c(0.5,1)) +
ylim(0, 4) + theme_bw()
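To see what the flag variable contains, here is a small sketch using the first eight days.rel values from the question. The names consecutive and segment_lty are illustrative only.

```r
# First eight days.rel values from the question's data
days.rel <- c(-2, -1, 0, 1, 12, 13, 14, 20)

# diff() gives the step between neighbouring days; a step of exactly 1 means
# no days are missing. The first row has no predecessor, so it is set to TRUE.
consecutive <- c(TRUE, diff(days.rel) == 1)
print(consecutive)

# Each segment joins row i to row i + 1, so it takes the flag of its end point
segment_lty <- ifelse(consecutive[-1], "solid", "dashed")
print(segment_lty)
```

The jumps from day 1 to day 12 and from day 14 to day 20 are the two segments flagged as gaps, which is where the dashed line is drawn.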
