Line connecting missing data in R

I would like a line plot in R of the days a bird spent away from its nest.
I have missing data that is making it difficult to show the general trend. I want to replace the line for the days that I don't have information for with a dotted line. I have absolutely no idea how to do this. Is it possible to do in R?
> time.away.1
hrs.away days.rel
1 0.380 -2
2 0.950 -1
3 1.000 0
4 0.200 1
5 0.490 12
6 0.280 13
7 0.130 14
8 0.750 20
9 0.160 21
10 1.830 22
11 0.128 26
12 0.126 27
13 0.500 28
14 0.250 31
15 0.230 32
16 0.220 33
17 0.530 40
18 3.220 41
19 0.430 42
20 1.960 45
21 1.490 46
22 24.000 56
23 24.000 57
24 24.000 58
25 24.000 59
26 24.000 60
27 24.000 61
My attempt:
plot(hrs.away ~ days.rel, data=time.away.1,
type="o",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))

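Here is a way that uses diff() to create a flag variable marking whether each day immediately follows the previous one (i.e., no days are missing in between). Note that I renamed your data to dat.
## rename the data and make the flag variable:
## TRUE if the day directly follows the previous one, FALSE if days are missing in between
dat <- time.away.1
dat$type <- c(TRUE, diff(dat$days.rel) == 1)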
plot(hrs.away ~ days.rel, data=dat,
type="p",
main="Time Away Relative to Nest Age",
ylab="Time spent away",
xlab="Days Relative to Initiation",
ylim=c(0,4))
## match the legend to the segments: dashed/thin for missing, solid/thick for sampled
legend("topright", c("missing", "sampled"), lty=c(2,1), lwd=c(1,2))
## Add line segments
len <- nrow(dat)
with(dat,
segments(x0=days.rel[-len], y0=hrs.away[-len],
x1=days.rel[-1], y1=hrs.away[-1],
lty=ifelse(type[-1], 1, 2),
lwd=ifelse(type[-1], 2, 1))
)
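An alternative sketch (not part of the answer above) relies on the fact that lines() leaves a gap wherever a value is NA. Expanding the data to one row per day with merge() puts NA on the unobserved days, so a solid line drawn over that grid only joins consecutive days, while a dashed line through the observed points fills the gaps. The data are copied from the question; the names all.days and full are illustrative.

```r
# Data from the question
time.away.1 <- data.frame(
  hrs.away = c(0.380, 0.950, 1.000, 0.200, 0.490, 0.280, 0.130, 0.750, 0.160,
               1.830, 0.128, 0.126, 0.500, 0.250, 0.230, 0.220, 0.530, 3.220,
               0.430, 1.960, 1.490, 24, 24, 24, 24, 24, 24),
  days.rel = c(-2, -1, 0, 1, 12, 13, 14, 20, 21, 22, 26, 27, 28, 31, 32, 33,
               40, 41, 42, 45, 46, 56, 57, 58, 59, 60, 61)
)

# One row per day over the whole range; days without observations get NA
all.days <- data.frame(days.rel = seq(min(time.away.1$days.rel),
                                      max(time.away.1$days.rel)))
full <- merge(all.days, time.away.1, by = "days.rel", all.x = TRUE)

# Empty plot with the question's labels and limits
plot(hrs.away ~ days.rel, data = time.away.1, type = "n",
     main = "Time Away Relative to Nest Age",
     ylab = "Time spent away", xlab = "Days Relative to Initiation",
     ylim = c(0, 4))
# Dashed line through all observed points, so it also bridges the gaps
with(time.away.1, lines(days.rel, hrs.away, lty = 2))
# Solid line on the daily grid: lines() breaks at every NA,
# so only consecutive observed days are joined
with(full, lines(days.rel, hrs.away, lty = 1, lwd = 2))
points(hrs.away ~ days.rel, data = time.away.1)
legend("topright", c("missing", "sampled"), lty = c(2, 1), lwd = c(1, 2))
```

The solid segments are drawn on top of the dashed line, so the dashes only remain visible across the missing stretches.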
For a ggplot2 version, you can build another data.frame from the lagged variables used above:
library(ggplot2)
dat2 <- with(dat, data.frame(x=days.rel[-len], xend=days.rel[-1],
y=hrs.away[-len], yend=hrs.away[-1],
type=factor(as.integer(type[-1]))))
ggplot() +
geom_point(data=dat, aes(x=days.rel, y=hrs.away)) +
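geom_segment(data=dat2, aes(x=x, xend=xend, y=y, yend=yend, linetype=type, linewidth=type)) +
scale_linetype_manual(values=2:1, labels=c("missing", "sampled")) +
# linewidth replaced size for lines in ggplot2 3.4.0; on older versions use size=type and scale_size_manual()
scale_linewidth_manual(values=c(0.5,1), labels=c("missing", "sampled")) +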
ylim(0, 4) + theme_bw()


How to plot a Sequential Bayes Factor as participants are added

I am currently analyzing eye-tracking data using the Sequential Bayes Factor method, and I would like to plot how the resulting Bayes Factor (BF; calculated from average looking times) changes as participants are added.
I would like the x-axis to represent the number of participants included in the calculation, and the y-axis to represent the resulting Bayes Factor.
For example, when participants 1-10 are included, BF = [y-value], and that is one plot point on the graph. When participants 1-11 are included, BF = [y-value], and that is the second plot point on the graph.
Is there a way to do this in R?
For example, I have this data set:
ID avg_PTL
<chr> <dbl>
1 D07 -0.0609
2 D08 0.0427
3 D12 0.112
4 D15 -0.106
5 D16 0.199
6 D19 0.0677
7 D20 0.0459
8 d21 -0.158
9 D23 0.0650
10 D25 0.0579
11 D27 0.0463
12 D29 0.00822
13 D30 0.00613
14 D36 -0.0484
15 D37 0.0312
16 D39 0.000547
17 D44 0.0336
18 D46 0.0514
19 D48 0.236
20 D51 -0.000487
21 D60 0.0410
22 D61 0.0622
23 D62 0.0337
24 D64 -0.125
25 D65 0.215
26 D66 0.200
And I calculate the BF with:
bf.mono.correct = ttestBF(x = avg_PTL_mono_correct$avg_PTL)
Any tips are much appreciated!
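You can use sapply() to run the test multiple times, subsetting the vector of observations to the first i participants each time. For example:
library(BayesFactor)
## run the test on participants 1..i for every i from 10 to the full sample
srange <- 10:nrow(avg_PTL_mono_correct)
BF <- sapply(srange, function(i) {
extractBF(ttestBF(x = avg_PTL_mono_correct$avg_PTL[1:i]), onlybf=TRUE)
})
plot(srange, BF, xlab="Number of participants", ylab="Bayes Factor")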
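If it helps, here is a self-contained variant using the 26 avg_PTL values printed in the question (rounded as shown there). It uses vapply() instead of sapply() so the result is guaranteed to be numeric, and plots the Bayes Factor on a log scale, a common choice because evidence for and against the null is then symmetric around 1. The dashed reference line at BF = 1 and the names n_start, n_seq and seq_bf are additions for illustration; it assumes the BayesFactor and ggplot2 packages are installed.

```r
library(BayesFactor)
library(ggplot2)

# avg_PTL values as printed in the question (rounded, in participant order)
avg_PTL <- c(-0.0609, 0.0427, 0.112, -0.106, 0.199, 0.0677, 0.0459, -0.158,
             0.0650, 0.0579, 0.0463, 0.00822, 0.00613, -0.0484, 0.0312,
             0.000547, 0.0336, 0.0514, 0.236, -0.000487, 0.0410, 0.0622,
             0.0337, -0.125, 0.215, 0.200)

n_start <- 10                          # first sample size to evaluate
n_seq <- n_start:length(avg_PTL)

# One-sample Bayes Factor (default prior) for participants 1..n, for each n
bf <- vapply(n_seq, function(n) {
  extractBF(ttestBF(x = avg_PTL[seq_len(n)]), onlybf = TRUE)
}, numeric(1))

seq_bf <- data.frame(n = n_seq, BF = bf)

p <- ggplot(seq_bf, aes(x = n, y = BF)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 1, linetype = "dashed") +  # BF = 1: no evidence either way
  scale_y_log10() +
  labs(x = "Number of participants", y = "Bayes Factor (BF10, log scale)")
print(p)
```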
This plots one Bayes Factor per sample size, from 10 participants up to the full sample.

How can you split ggplot from 12 individual bars to 3 groups of 4?

I have a bar graph with 12 individual bars. I would like to split them into their 3 respective groups, each with its own color so that they are recognized as belonging to the same group. I have been using ColorBrewer Set3 because it is photocopy safe, but when I use it on my plot, all the bars stay one color.
In the plot, you can see the 3 groups (ELE, KEB, and SMI), each with 4 blocks. It would be great if they could be split up more cohesively.
# A tibble: 12 x 7
vid.order sum.correct n prop.correct z_score p_val sig
<chr> <int> <int> <dbl> <dbl> <dbl> <lgl>
1 ELE1 47 55 0.855 5.26 0.000000145 TRUE
2 ELE2 46 55 0.836 4.99 0.000000607 TRUE
3 ELE3 37 55 0.673 2.56 0.0104 TRUE
4 ELE4 47 55 0.855 5.26 0.000000145 TRUE
5 KEB1 40 55 0.727 3.37 0.000749 TRUE
6 KEB2 46 55 0.836 4.99 0.000000607 TRUE
7 KEB3 47 55 0.855 5.26 0.000000145 TRUE
8 KEB4 44 55 0.8 4.45 0.00000860 TRUE
9 SMI1 35 55 0.636 2.02 0.0431 TRUE
10 SMI2 46 55 0.836 4.99 0.000000607 TRUE
11 SMI3 41 55 0.745 3.64 0.000272 TRUE
12 SMI4 35 55 0.636 2.02 0.0431 TRUE
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
Personally, I would create a column with the groups (ELE, KEB or SMI) and map it to the fill aesthetic with aes(fill = group):
library(data.table)
library(dplyr)
library(ggplot2)
library(jtools)
# convert byBlot_sigtests to a data.table (setDT modifies it in place)
setDT(byBlot_sigtests)
#create a column with the groups (vid.order but without the numbers)
byBlot_sigtests[, group := gsub("[0-9]", "", vid.order)]
#plot
byBlot_sigtests %>%
ggplot(aes(x=vid.order, y=prop.correct, fill = group))+
geom_bar(stat="identity", position=position_dodge(.9))+
labs(x="video", y="proportion natural selected")+
geom_hline(yintercept = .5) +
expand_limits(y=c(0,1)) +
scale_fill_brewer(palette = "Set3") +
jtools::theme_apa()
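If you would rather not bring in data.table, a base-R sub() call builds the same group column. To make the groups read as separate blocks, one option (an addition, not part of the answer above) is facet_grid() with free x scales and free space, so each group gets its own panel containing only its 4 bars. The data frame below is rebuilt from the vid.order and prop.correct columns of the question's table; jtools::theme_apa() is left out to keep the sketch dependent on ggplot2 only.

```r
library(ggplot2)

# vid.order and prop.correct copied from the question's table
byBlot_sigtests <- data.frame(
  vid.order = c("ELE1", "ELE2", "ELE3", "ELE4", "KEB1", "KEB2",
                "KEB3", "KEB4", "SMI1", "SMI2", "SMI3", "SMI4"),
  prop.correct = c(0.855, 0.836, 0.673, 0.855, 0.727, 0.836,
                   0.855, 0.8, 0.636, 0.836, 0.745, 0.636)
)

# Group label = video name with its trailing digits stripped
byBlot_sigtests$group <- sub("[0-9]+$", "", byBlot_sigtests$vid.order)

p <- ggplot(byBlot_sigtests, aes(x = vid.order, y = prop.correct, fill = group)) +
  geom_col() +
  geom_hline(yintercept = 0.5) +
  # One panel per group; free x scales/space keep only that group's bars in each panel
  facet_grid(. ~ group, scales = "free_x", space = "free_x") +
  scale_fill_brewer(palette = "Set3") +
  expand_limits(y = c(0, 1)) +
  labs(x = "video", y = "proportion natural selected")
print(p)
```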

Create legend for line chart in R ggplot2

Hello, I am trying to add a legend to my graph.
The previous answers I have looked at all seem to rely on aes() or on the lines being tied to a factor in some way. I didn't understand this answer: Add legend to geom_line() graph in r.
In my case, I simply want a legend that states "RED = No Cross Validation" and "BLUE = Cross Validation".
R Code
ggplot(data=graphDF,aes(x=rev(kAxis)))+
geom_line(y=rev(noCVErr),color="red")+
geom_point(y=rev(noCVErr),color="red")+
geom_line(y=rev(CVErr),color="blue")+
geom_point(y=rev(CVErr),color="blue")+
ylim(minErr,maxErr)+
ggtitle("The KNN Error Rate for Cross Validated and Non-Cross Validated Models")+
labs(y="Error Rate", x = "1/K")
Dataset
ks kAxis noCVAcc noCVErr CVAcc CVErr
1 1 1.00000000 1.0000000 0.00000000 0.8279075 0.1720925
2 3 0.33333333 0.9345238 0.06547619 0.8336898 0.1663102
3 5 0.20000000 0.8809524 0.11904762 0.8158645 0.1841355
4 7 0.14285714 0.8690476 0.13095238 0.8272727 0.1727273
5 9 0.11111111 0.8809524 0.11904762 0.7857398 0.2142602
6 11 0.09090909 0.8809524 0.11904762 0.7500891 0.2499109
7 13 0.07692308 0.8511905 0.14880952 0.7622103 0.2377897
8 15 0.06666667 0.7976190 0.20238095 0.7320856 0.2679144
9 17 0.05882353 0.7916667 0.20833333 0.7320856 0.2679144
10 19 0.05263158 0.7559524 0.24404762 0.7201426 0.2798574
11 21 0.04761905 0.7678571 0.23214286 0.7023173 0.2976827
12 23 0.04347826 0.7440476 0.25595238 0.6903743 0.3096257
13 25 0.04000000 0.7559524 0.24404762 0.6786096 0.3213904
It might help to put your data frame graphDF into "long" form, for example with pivot_longer() from tidyr:
library(tidyr)
graphDF_long <- pivot_longer(data = graphDF,
cols = c(noCVErr, CVErr),
names_to = "model",
values_to = "errRate")
This creates a new data frame, graphDF_long, with a single errRate column for the error rate and a new model column that records which model each row comes from:
ks kAxis noCVAcc CVAcc model errRate
<int> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 1 1 0.828 noCVErr 0
2 1 1 1 0.828 CVErr 0.172
3 3 0.333 0.935 0.834 noCVErr 0.0655
4 3 0.333 0.935 0.834 CVErr 0.166
5 5 0.2 0.881 0.816 noCVErr 0.119
6 5 0.2 0.881 0.816 CVErr 0.184
....
Then you can simplify your ggplot call and map the model column to the color aesthetic:
library(ggplot2)
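# no rev() needed: geom_line() connects points in x order, and reversing x and y
# without also reversing model would pair the error rates with the wrong model
ggplot(data = graphDF_long, aes(x = kAxis, y = errRate, color = model)) +
geom_line() +
geom_point() +
scale_color_manual(values = c(CVErr = "blue", noCVErr = "red"),
breaks = c("CVErr", "noCVErr"),
labels = c("Cross Validation", "No Cross Validation")) +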
ylim(min(graphDF_long$errRate), max(graphDF_long$errRate)) +
ggtitle("The KNN Error Rate for Cross Validated and Non-Cross Validated Models") +
labs(y="Error Rate", x = "1/K")
This generates the legend automatically.
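The aes() trick the question mentions can also be used without reshaping: map color to a constant label inside aes() in each layer, and ggplot2 then builds the legend from those labels, with scale_color_manual() assigning the actual colors. A minimal sketch, rebuilding graphDF from the printed table (kAxis is assumed to be 1/ks, which the printed values match):

```r
library(ggplot2)

# Rebuilt from the question's printed dataset
ks <- seq(1, 25, by = 2)
graphDF <- data.frame(
  kAxis = 1 / ks,
  noCVErr = c(0.00000000, 0.06547619, 0.11904762, 0.13095238, 0.11904762,
              0.11904762, 0.14880952, 0.20238095, 0.20833333, 0.24404762,
              0.23214286, 0.25595238, 0.24404762),
  CVErr = c(0.1720925, 0.1663102, 0.1841355, 0.1727273, 0.2142602,
            0.2499109, 0.2377897, 0.2679144, 0.2679144, 0.2798574,
            0.2976827, 0.3096257, 0.3213904)
)

p <- ggplot(graphDF, aes(x = kAxis)) +
  # A constant string inside aes() becomes a legend entry
  geom_line(aes(y = noCVErr, color = "No Cross Validation")) +
  geom_point(aes(y = noCVErr, color = "No Cross Validation")) +
  geom_line(aes(y = CVErr, color = "Cross Validation")) +
  geom_point(aes(y = CVErr, color = "Cross Validation")) +
  # Named values tie each legend label to a color; name = NULL drops the legend title
  scale_color_manual(name = NULL,
                     values = c("No Cross Validation" = "red",
                                "Cross Validation" = "blue")) +
  ggtitle("The KNN Error Rate for Cross Validated and Non-Cross Validated Models") +
  labs(y = "Error Rate", x = "1/K")
print(p)
```

This keeps the wide layout of the original code; the long-form version above scales better if more models are added.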

dplyr: categorical counts from a single column across multiple variables

So I have been trying to do a boxplot of "yes/no" counts for hours now.
My dataset looks like this:
> stack
Site Plot Treatment Meters Retrieved
2 Southern 18 Control -5.00 y
3 Southern 18 Control 9.55 y
4 Southern 18 Control 4.70 y
5 Southern 27 Control -5.00 y
6 Southern 27 Control 20.00 n
9 Southern 18 Control -0.10 y
17 Southern 18 Control 20.00 y
23 Southern 31 Control 100.00 y
53 Southern 25 Mu 3.55 n
54 Southern 20 Mu 5.90 y
55 Southern 25 Mu -0.10 y
56 Southern 29 Mu 9.55 y
58 Southern 25 Mu 4.70 y
60 Southern 20 Mu 2.90 y
61 Southern 24 Mu 5.90 n
62 Southern 24 Mu 3.55 y
63 Southern 20 Mu 3.55 y
65 Southern 24 Mu 0.55 y
66 Southern 29 Mu 8.90 y
68 Southern 25 Mu 8.90 y
69 Southern 29 Mu 0.55 y
70 Southern 24 Mu 1.70 y
72 Southern 29 Mu -5.00 y
76 Southern 29 Mu 1.70 y
77 Southern 25 Mu 9.55 y
78 Southern 25 Mu 13.20 y
79 Southern 29 Mu 3.55 y
80 Southern 25 Mu 15.00 y
81 Southern 25 Mu -5.00 n
84 Southern 24 Mu 8.90 y
85 Southern 20 Mu 6.55 y
86 Southern 29 Mu 2.90 y
92 Southern 24 Mu -0.10 y
93 Southern 20 Mu 100.00 y
I want to get counts of both y(yes) and n(no) of the variable "Retrieved" while grouping for "Treatment" and "Meters".
So it should look something like this:
Treatment Meters Yes No
Control -5.00 2 0
Control 9.55 1 2
Control 4.70 1 1
Control 20.00 0 2
Mu 3.55 4 0
Mu 5.90 0 1
Mu -0.10 2 2
Mu 9.55 1 0
With this data I want to make a stacked boxplot with x = Meters, y = count, and Treatment as a grid or something similar.
This is my code, but it's not working:
plot_data <- stack %>%
count(Retrieved, Treatment, Meters) %>%
group_by(Treatment, Meters) %>%
mutate(count= n)
plot_data
ggplot(plot_data, aes(x = Meters, y = count, fill = Treatment)) +
geom_col(position = "fill") +
geom_label(aes(label = count(count)), position = "fill", color = "white", vjust = 1, show.legend = FALSE) +
scale_y_continuous(labels = count)
Could you please tell me what I'm doing wrong?
geom_bar is for precisely this case, and you won't even need to use group_by or count. (From the docs: "geom_bar makes the height of the bar proportional to the number of cases in each group".)
This should do what you're looking for:
ggplot(stack, aes(x = Meters, fill = Treatment)) +
geom_bar(position = "stack")
However, the bars will be very narrow because "Meters" is continuous and has a large range. You could address this by converting it into a factor. One way to do that would be to do this first:
library(dplyr)
# the question's data frame is called stack
stack <- stack %>%
mutate(Meters = as.factor(Meters))
If you want to get the counts in the format that you mentioned (in addition to creating the plot), you could do:
library(tidyr)
stack %>%
count(Treatment, Meters, Retrieved) %>%
spread(Retrieved, n, fill = 0) %>%
rename(Yes = y, No = n)
count does group_by for you, so I didn't need to carry that over from your code. Then, spread creates the separate columns for y and n. Finally, I rename those columns to Yes and No.
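If you want the layout described in the question more literally (Retrieved yes/no counts stacked per Meters value, one panel per Treatment), a sketch could map fill to Retrieved and facet by Treatment instead of filling by Treatment. It also shows the count table with pivot_wider(), tidyr's successor to spread(). The stack data frame is rebuilt from the Treatment, Meters and Retrieved columns of the question (Site and Plot are omitted).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Treatment, Meters and Retrieved from the question's data (34 rows)
stack <- data.frame(
  Treatment = rep(c("Control", "Mu"), times = c(8, 26)),
  Meters = c(-5.00, 9.55, 4.70, -5.00, 20.00, -0.10, 20.00, 100.00,
             3.55, 5.90, -0.10, 9.55, 4.70, 2.90, 5.90, 3.55, 3.55,
             0.55, 8.90, 8.90, 0.55, 1.70, -5.00, 1.70, 9.55, 13.20,
             3.55, 15.00, -5.00, 8.90, 6.55, 2.90, -0.10, 100.00),
  Retrieved = c("y", "y", "y", "y", "n", "y", "y", "y",
                "n", "y", "y", "y", "y", "y", "n", "y", "y", "y", "y", "y", "y",
                "y", "y", "y", "y", "y", "y", "y", "n", "y", "y", "y", "y", "y")
)

# Bar height = number of rows; y/n stacked; one panel per Treatment
p <- ggplot(stack, aes(x = factor(Meters), fill = Retrieved)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Treatment, ncol = 1) +
  labs(x = "Meters", y = "count")
print(p)

# Same counts as a Yes/No table; combinations with no "n" (or no "y") get 0
counts <- stack %>%
  count(Treatment, Meters, Retrieved) %>%
  pivot_wider(names_from = Retrieved, values_from = n, values_fill = 0L) %>%
  rename(Yes = y, No = n)
counts
```

Keeping Meters numeric here and converting it with factor() only inside aes() avoids changing the column type in the data itself.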

R Programming issue intervals

I'm trying to figure out a formula to compute the midpoint of each interval, i.e. the average of its minimum and maximum.
x <- sample(10:40,100,rep=TRUE)
factorx<- factor(cut(x, breaks=nclass.Sturges(x)))
xout<-as.data.frame(table(factorx))
xout<- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))
Using the above code in the R editor program, I get the following:
xout
factorx Freq cumFreq relative
1 (9.97,13.8] 14 14 0.14
2 (13.8,17.5] 13 27 0.13
3 (17.5,21.2] 16 43 0.16
4 (21.2,25] 5 48 0.05
5 (25,28.8] 11 59 0.11
6 (28.8,32.5] 8 67 0.08
7 (32.5,36.2] 16 83 0.16
8 (36.2,40] 17 100 0.17
What I want to know is whether there is a way to calculate the midpoint of each interval. For the first interval, for example, it would be:
(13.8 + 9.97)/2
It's called the class midpoint in statistics, I believe.
Here's a one-liner that is probably close to what you want:
> sapply(strsplit(levels(xout$factorx), ","), function(x) sum(as.numeric(gsub("[[:space:]]", "", chartr(old = "(]", new = "  ", x))))/2)
[1] 11.885 15.650 19.350 23.100 26.900 30.650 34.350 38.100
# One possible solution is to split on "(", "," and "]" (xout is your data frame)
x1 <- strsplit(as.character(xout$factorx), ",|\\(|]")
# each element is c("", lower, upper), so bind them into a 3-column matrix
x2 <- do.call(rbind, x1)
xout$lower <- as.numeric(x2[, 2])
xout$higher <- as.numeric(x2[, 3])
xout$ave <- rowMeans(xout[, c("lower", "higher")])
> head(xout,3)
factorx Freq cumFreq relative lower higher ave
1 (9.97,13.7] 15 15 0.15 9.97 13.7 11.835
2 (13.7,17.5] 14 29 0.14 13.70 17.5 15.600
3 (17.5,21.2] 12 41 0.12 17.50 21.2 19.350
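Both answers parse the interval labels, which cut() rounds to 3 significant digits by default (dig.lab = 3); when breaks is a single number, cut() also moves the outer limits out by 0.1% of the range, which is where 9.97 comes from. An alternative sketch is to compute the breaks yourself and take the midpoints from them directly, so no string parsing is needed. The equal-width breaks below span exactly min(x) to max(x) with include.lowest = TRUE, so they differ slightly from the ones cut() chooses internally; set.seed(1) is only there to make the example reproducible.

```r
set.seed(1)  # only for reproducibility of this example
x <- sample(10:40, 100, replace = TRUE)

k <- nclass.Sturges(x)
# k equal-width classes spanning exactly the range of x
brks <- seq(min(x), max(x), length.out = k + 1)
factorx <- cut(x, breaks = brks, include.lowest = TRUE)

xout <- as.data.frame(table(factorx))
xout <- transform(xout,
                  cumFreq = cumsum(Freq),
                  relative = prop.table(Freq),
                  # class midpoint = (lower + upper) / 2 for each interval
                  midpoint = head(brks, -1) + diff(brks) / 2)
xout
```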
