How to plot a Sequential Bayes Factor as participants are added in R

I am currently analyzing eye-tracking data using the Sequential Bayes Factor method, and I would like to plot how the resulting Bayes Factor (BF; calculated from average looking times) changes as participants are added.
I would like the x-axis to represent the number of participants included in the calculation, and the y-axis to represent the resulting Bayes Factor.
For example, when participants 1-10 are included, BF = [y-value], and that is one plot point on the graph. When participants 1-11 are included, BF = [y-value], and that is the second plot point on the graph.
Is there a way to do this in R?
For example, I have this data set:
ID avg_PTL
<chr> <dbl>
1 D07 -0.0609
2 D08 0.0427
3 D12 0.112
4 D15 -0.106
5 D16 0.199
6 D19 0.0677
7 D20 0.0459
8 d21 -0.158
9 D23 0.0650
10 D25 0.0579
11 D27 0.0463
12 D29 0.00822
13 D30 0.00613
14 D36 -0.0484
15 D37 0.0312
16 D39 0.000547
17 D44 0.0336
18 D46 0.0514
19 D48 0.236
20 D51 -0.000487
21 D60 0.0410
22 D61 0.0622
23 D62 0.0337
24 D64 -0.125
25 D65 0.215
26 D66 0.200
And I calculate the BF with:
bf.mono.correct = ttestBF(x = avg_PTL_mono_correct$avg_PTL)
Any tips are much appreciated!

You can use sapply to run the test multiple times, subsetting the vector of observations each time. For example:
library(BayesFactor)

## sample sizes to evaluate: the first 10 participants up to the full sample
srange <- 10:nrow(avg_PTL_mono_correct)
BF <- sapply(srange, function(i) {
  extractBF(ttestBF(x = avg_PTL_mono_correct$avg_PTL[1:i]), onlybf = TRUE)
})
plot(srange, BF)
This produces a plot of the Bayes Factor against the number of participants included.
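Since Bayes Factors are usually inspected on a log scale, a possible variant of the plot is sketched below. The BF values here are synthetic (just to make the snippet self-contained), and the horizontal lines at 3 and 1/3 are only example evidence thresholds; substitute the stopping criteria from your own SBF design.

```r
## Synthetic BF trajectory, just to illustrate the plotting idiom;
## in practice srange and BF come from the sapply() call above
srange <- 10:26
BF <- exp(seq(-0.5, 2, length.out = length(srange)))

plot(srange, BF, log = "y", type = "b",
     xlab = "Participants included", ylab = "Bayes Factor (log scale)")
abline(h = c(1/3, 1, 3), lty = 2)  # example evidence thresholds
```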

Related

Is there another way to calculate within-subject Hedges' g (and its error)?

I'm carrying out a meta-analysis of within-subject (crossover) studies. I've read papers that used the esc package (the esc_mean_sd function, more precisely) to calculate Hedges' g. However, its output doubles the n of each study.
Note that n = 12 for all three studies in the data below, while the output shows n = 24.
ID mean_exp mean_con sd_exp sd_con n
1 A 150 130 15 22 12
2 B 166 145 10 8 12
3 C 179 165 11 14 12
# What I did:
e1 <- esc_mean_sd(data[1,2],data[1,4],data[1,6],
data[1,3],data[1,5],data[1,6],
r = .9,es.type = "g")
e2 <- esc_mean_sd(data[2,2],data[2,4],data[2,6],
data[2,3],data[2,5],data[2,6],
r = .9,es.type = "g")
e3 <- esc_mean_sd(data[3,2],data[3,4],data[3,6],
data[3,3],data[3,5],data[3,6],
r = .9,es.type = "g")
data2 <- combine_esc(e1, e2, e3)
colnames(data2) <- c("study","es","weight","n","se","var","lCI","uCI","measure")
head(data2, 3)
# study es weight n se var lCI uCI measure
# 1 1.80 4.18 24 0.489 0.239 0.842 2.76 g
# 2 4.53 1.60 24 0.791 0.626 2.983 6.08 g
# 3 2.14 3.71 24 0.519 0.269 1.126 3.16 g
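This one went unanswered here, but one way to sanity-check the package output is to compute the within-subject effect size by hand. Below is a minimal sketch assuming the d_z formulation: the SD of the difference scores is recovered from the two SDs and the correlation r, and Hedges' small-sample correction J = 1 - 3/(4(n-1) - 1) is applied. The helper function is hypothetical (not part of esc), and d_z is not necessarily the estimator esc uses, so the numbers need not match its output.

```r
## Hypothetical helper: within-subject Hedges' g from summary statistics
## (d_z formulation; assumes r is the pre/post correlation)
hedges_g_within <- function(m1, m2, sd1, sd2, n, r) {
  sd_diff <- sqrt(sd1^2 + sd2^2 - 2 * r * sd1 * sd2)  # SD of the difference scores
  d_z <- (m1 - m2) / sd_diff
  J <- 1 - 3 / (4 * (n - 1) - 1)                      # small-sample correction
  J * d_z
}

hedges_g_within(150, 130, 15, 22, 12, r = 0.9)  # study A: ~1.73
```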

How can I extract specific data points from a wide-formatted text file in R?

I have datasheets with multiple measurements that look like the following:
FILE DATE TIME LOC QUAD LAI SEL DIFN MTA SEM SMP
20 20210805 08:38:32 H 1161 2.80 0.68 0.145 49. 8. 4
ANGLES 7.000 23.00 38.00 53.00 68.00
CNTCT# 1.969 1.517 0.981 1.579 1.386
STDDEV 1.632 1.051 0.596 0.904 0.379
DISTS 1.008 1.087 1.270 1.662 2.670
GAPS 0.137 0.192 0.288 0.073 0.025
A 1 08:38:40 31.66 33.63 34.59 39.13 55.86
1 2 08:38:40 -5.0e-006
B 3 08:38:48 25.74 20.71 15.03 2.584 1.716
B 4 08:38:55 0.344 1.107 2.730 0.285 0.265
B 5 08:39:02 3.211 5.105 13.01 4.828 1.943
B 6 08:39:10 8.423 22.91 48.77 16.34 3.572
B 7 08:39:19 12.58 14.90 18.34 18.26 4.125
I would like to read the entire datasheet and extract the values for 'QUAD' and 'LAI' only. For example, for the data above I would only be extracting a QUAD of 1161 and an LAI of 2.80.
In the past the datasheets were formatted as long data, and I was able to use the following code:
library(stringr)
QUAD <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^QUAD).*$")))
LAI <- as.numeric(str_trim(str_extract(data, "(?m)(?<=^LAI).*$")))
data_extract <- data.frame(
QUAD = QUAD[!is.na(QUAD)],
LAI = LAI[!is.na(LAI)]
)
data_extract
Unfortunately, this does not work because of the wide formatting in the current datasheet. Any help would be hugely appreciated. Thanks in advance for your time.
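No answer was posted here, but one robust approach is to split the header line and its value line on whitespace and look the fields up by name, rather than anchoring a regex on row labels. A sketch, under the assumption that the FILE header line is always immediately followed by its value line (shown on a two-line excerpt of the data above):

```r
## Two-line excerpt of the datasheet; for a real file you would use
## txt <- readLines("datasheet.txt") and grep("^FILE", txt) to find each block
txt <- c("FILE DATE TIME LOC QUAD LAI SEL DIFN MTA SEM SMP",
         "20 20210805 08:38:32 H 1161 2.80 0.68 0.145 49. 8. 4")

hdr  <- strsplit(trimws(txt[1]), "\\s+")[[1]]
vals <- strsplit(trimws(txt[2]), "\\s+")[[1]]  # value row aligned with the header

data_extract <- data.frame(QUAD = as.numeric(vals[match("QUAD", hdr)]),
                           LAI  = as.numeric(vals[match("LAI", hdr)]))
data_extract  # QUAD = 1161, LAI = 2.8
```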

Why does MARS (the earth package) generate so many predictors?

I am working on a MARS model using the earth package in R. My dataset (CE.Rda) consists of one dependent variable (D9_RTO_avg) and 10 potential predictors (NDVI_l1, NDVI_f0, NDVI_f1, NDVI_f2, NDVI_f3, LST_l1, LST_f0, LST_f1, LST_f2, LST_f3). Here is the head of my dataset:
D9_RTO_avg NDVI_l1 NDVI_f0 NDVI_f1 NDVI_f2 NDVI_f3 LST_l1 LST_f0 LST_f1 LST_f2 LST_f3
2 1.866667 0.3082 0.3290 0.4785 0.4330 0.5844 38.25 30.87 31 21.23 17.92
3 2.000000 0.2164 0.2119 0.2334 0.2539 0.4686 35.7 29.7 28.35 21.67 17.71
4 1.200000 0.2324 0.2503 0.2640 0.2697 0.4726 40.13 33.3 28.95 22.81 16.29
5 1.600000 0.1865 0.2070 0.2104 0.2164 0.3911 43.26 35.79 30.22 23.07 17.88
6 1.800000 0.2757 0.3123 0.3462 0.3778 0.5482 43.99 36.06 30.26 21.36 17.93
7 2.700000 0.2265 0.2654 0.3174 0.2741 0.3590 41.61 35.4 27.51 23.55 18.88
After creating my earth model as follows
mymodel.mod <- earth(D9_RTO_avg ~ ., data=CE, nk=10)
I print the summary of the resulting model by typing
print(summary(mymodel.mod, digits=2, style="pmax"))
and I obtain the following output
D9_RTO_avg =
4.1
+ 38 * LST_f128.68
+ 6.3 * LST_f216.41
- 2.9 * pmax(0, 0.66 - NDVI_l1)
- 2.3 * pmax(0, NDVI_f3 - 0.23)
Selected 5 of 7 terms, and 4 of 13169 predictors
Termination condition: Reached nk 10
Importance: LST_f128.68, NDVI_l1, NDVI_f3, LST_f216.41, NDVI_f0-unused, NDVI_f1-unused, NDVI_f2-unused, ...
Number of terms at each degree of interaction: 1 4 (additive model)
GCV 2 RSS 4046 GRSq 0.29 RSq 0.29
My question is: why is earth identifying 13169 predictors when there are actually only 10? It seems that MARS is treating single observations of the candidate predictors as predictors themselves. How can I prevent MARS from doing so?
Thanks for your help
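One hedged reading of that output: term names like LST_f128.68 look like a variable name (LST_f1) fused with a level ("28.68"), which would mean some numeric columns were read in as factors or characters, so earth expands every distinct value into its own dummy predictor. That is worth checking with str(CE). The toy example below reproduces the symptom and a possible fix; note the coercion goes via character, because as.numeric() applied directly to a factor returns the internal level codes, not the values.

```r
## Toy data: a numeric column accidentally stored as a factor
CE <- data.frame(y = rnorm(5),
                 LST_f1 = factor(c("28.68", "30.1", "28.68", "25.0", "30.1")))
str(CE)  # LST_f1 shows up as a Factor, not num

## Coerce via character first; as.numeric(factor) would return level codes
CE$LST_f1 <- as.numeric(as.character(CE$LST_f1))
str(CE)  # now num
```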

Line connecting missing data in R

I would like a line plot in R of the days a bird spent away from its nest.
I have missing data that is making it difficult to show the general trend. I want to replace the line for the days that I don't have information for with a dotted line. I have absolutely no idea how to do this. Is it possible to do in R?
> time.away.1
hrs.away days.rel
1 0.380 -2
2 0.950 -1
3 1.000 0
4 0.200 1
5 0.490 12
6 0.280 13
7 0.130 14
8 0.750 20
9 0.160 21
10 1.830 22
11 0.128 26
12 0.126 27
13 0.500 28
14 0.250 31
15 0.230 32
16 0.220 33
17 0.530 40
18 3.220 41
19 0.430 42
20 1.960 45
21 1.490 46
22 24.000 56
23 24.000 57
24 24.000 58
25 24.000 59
26 24.000 60
27 24.000 61
My attempt:
plot(hrs.away ~ days.rel, data=time.away.1,
     type="o",
     main="Time Away Relative to Nest Age",
     ylab="Time spent away",
     xlab="Days Relative to Initiation",
     ylim=c(0,4))
Here is a way using diff to make a flag variable marking whether consecutive days were sampled. Note that I renamed your data to dat.
## make the flag variable
dat$type <- c(TRUE, diff(dat$days.rel) == 1)
plot(hrs.away ~ days.rel, data=dat,
     type="p",
     main="Time Away Relative to Nest Age",
     ylab="Time spent away",
     xlab="Days Relative to Initiation",
     ylim=c(0,4))
legend("topright", c("missing", "sampled"), lty=c(2,1))
## Add line segments
len <- nrow(dat)
with(dat,
     segments(x0=days.rel[-len], y0=hrs.away[-len],
              x1=days.rel[-1], y1=hrs.away[-1],
              lty=ifelse(type[-1], 1, 2),
              lwd=ifelse(type[-1], 2, 1))
)
For the ggplot version, you can make another data.frame with the lagged variables used above,
library(ggplot2)
dat2 <- with(dat, data.frame(x=days.rel[-len], xend=days.rel[-1],
                             y=hrs.away[-len], yend=hrs.away[-1],
                             type=factor(as.integer(type[-1]))))
ggplot() +
  geom_point(data=dat, aes(x=days.rel, y=hrs.away)) +
  geom_segment(data=dat2, aes(x=x, xend=xend, y=y, yend=yend, lty=type, size=type)) +
  scale_linetype_manual(values=2:1) +
  scale_size_manual(values=c(0.5,1)) +
  ylim(0, 4) + theme_bw()

Converting Repeated Measures Code in SAS to Equivalent in R

I am attempting to convert existing SAS code I have for a research project into R. I am, unfortunately, finding myself totally clueless on how to approach this for repeated measures ANOVA despite a few hours of looking at other people's questions across both StackExchange and the web at large. I suspect this may at least, if not in total, be due to my not knowing the right questions to ask and limited statistics background.
First, I will present some sample data (tab-delimited, which I'm not sure will be preserved on SE), then explain what I'm attempting to do, and then the code I have written as of this moment.
Sample data:
Full data frame at: http://grandprairiefriends.org/document/data.df
Obs SbjctID Sex Treatment Measured BirthDate DateStarted DateAssayed SubjectAge_Start_days SubjectAgeAssay.d. PreMass_mg PostMass_mg DiffMass_mg PerCentMassDiff Length_mm Width_mm PO1_abs_min PO1_r2 PO2_abs_min PO2_r2 ProteinConc_ul Protein1_net_abs Protein1_mg_ml Protein1_adjusted_mg_ml Protein2_net_abs Protein2_mg_ml Protein2_adjusted_mg_ml zPO_avg_abs_min z_Protein_avg_adjusted_mg_ml POPer_ug_Protein POPer_ug_Protein_x1000 ImgDarkness1 ImgDarkness2 ImgDarkness3 ImgDarkness4 DarknessAvg AGV_1_1 AGV_1_2 AGV_2_1 AGV_2_2 AGV_12_1 AGV_12_2 z_AGV predicted_premass resid_premass predicted_premass_calculated resid_premass_calculated predicted_postmass_calculated resid_postmass_calculated predicted_postmass resid_postmass ln_premass_mg ln_postmass_mg ln_length ln_melanization ln_po sqrt_p
1 aF001 Female a PO_P 08/05/09 09/06/09 09/13/09 32 39 282.7 309.4 26.66 9.43 10.1 5.3 0.0175 0.996 0.0201 0.996 40 0.227 0.960 0.960 0.234 1.030 1.030 0.0188 0.995 0.00031 0.31491 33.7045 35.9165 28.8383 30.3763 32.2089 NA NA NA NA NA NA NA 5.660963 -0.016576413 4.077123 1.567263 4.077123 1.657382 5.660963 0.0735429694 8.143128 8.273329 3.336283 NA -5.733124 -0.007231569
2 aF002 Female a PO_P 08/02/09 09/06/09 09/13/09 35 42 298.9 313.1 14.23 4.76 10.0 5.9 0.0123 0.999 0.0134 0.996 40 0.213 0.840 0.840 0.219 0.860 0.860 0.0129 0.850 0.00025 0.25196 31.8700 31.8800 32.4680 32.3020 32.1300 NA NA NA NA NA NA NA 5.640012 0.059996453 4.056173 1.643836 4.056173 1.690350 5.640012 0.1065103847 8.223519 8.290480 3.321928 NA -6.276485 -0.234465254
3 aF003 Female a PO_P 08/03/09 09/06/09 09/13/09 34 41 237.1 270.6 33.53 14.14 9.4 5.3 0.0227 0.992 0.0248 0.994 40 0.245 1.120 1.120 0.235 1.030 1.030 0.0238 1.075 0.00037 0.36822 36.0565 41.9355 41.6260 40.0180 39.9090 NA NA NA NA NA NA NA 5.509734 -0.041209334 3.925894 1.542630 3.925894 1.674895 5.509734 0.0910560222 7.889352 8.080018 3.232661 NA -5.392895 0.104336660
82 bM001 Male b PO_P 08/02/09 08/31/09 09/07/09 29 36 468.1 371.7 -96.38 -20.59 10.7 6.8 0.0049 0.999 0.0056 1.000 40 0.228 0.350 0.350 0.222 0.330 0.330 0.0053 0.340 0.00026 0.25735 NA NA NA NA NA NA NA NA NA NA NA NA 5.782468 0.366214334 4.198628 1.950054 4.198628 1.719513 5.640012 -0.0844204671 8.870673 8.537995 3.419539 NA -7.559792 -1.556393349
157 cM022 Male c PO_P 08/03/09 10/31/09 11/07/09 89 96 451.1 402.4 -48.71 -10.80 11.3 6.9 0.0024 0.995 0.0026 0.995 10 0.091 0.110 0.028 NA NA NA 0.0025 0.028 0.00152 1.51515 NA NA NA NA NA NA NA NA NA NA NA NA 5.897342 0.214325251 4.313502 1.798165 4.313502 1.683895 5.897342 0.1000552907 8.817303 8.652486 3.498251 NA -8.643856 -5.158429363
Explanation of what I'm looking to accomplish:
This experiment was attempting to determine if a particular feeding regime (Treatment) had an effect on the after-experiment mass of the subject (ln_postmass_mg). The mass of each individual was measured twice, once at the beginning (ln_premass_mg), and once at the end of the feeding regime. Sex, Treatment, and Measured are all categorical variables.
I have generated some R code, but its output does not match the SAS output, which is expected, since I don't believe it is coded for repeated measures. It's not clear to me whether I need to transpose or otherwise manipulate my data frame in R to perform additional analyses. I seem to be reading multiple different approaches to repeated measures problems, and am not sure which, if any, apply to my particular problem. If anyone can put me on the right track to writing the R equivalent, or has suggestions, I'd much appreciate it.
SAS Code:
/* test for effect of diet regime */
/* repeated measures ANOVA for mass */
proc glm data=No_diet_lab;
  class measured sex Treatment;
  model ln_premass ln_postmass = Measured Sex Treatment Measured*Sex Measured*Treatment Sex*Treatment Measured*Sex*Treatment / nouni;
  repeated time 2;
run;
R Code:
options(contrasts=c("contr.sum","contr.poly"))
model <- lm(cbind(ln_premass_mg, ln_postmass_mg) ~ Sex + Treatment + Measured +
              Sex:Treatment + Sex:Measured + Measured:Treatment + Sex:Treatment:Measured,
            data = diet_lab_data, na.action = na.omit)
This should hopefully replicate your SAS output:
First we'll put the data in long form:
df <- subset(diet_lab_data, select = c("SubjectID", "Sex", "Treatment", "Measured",
                                       "ln_premass_mg", "ln_postmass_mg"))
dfL <- reshape(df, varying = list(5:6), idvar = "SubjectID", direction = "long",
               v.names = "ln_mass_mg")
dfL$time <- factor(dfL$time, levels = 1:2, labels = c("pre", "post"))
head(dfL); tail(dfL)
SubjectID Sex Treatment Measured time ln_mass_mg
aF001.1 aF001 Female a PO_P pre 8.143128
aF002.1 aF002 Female a PO_P pre 8.223519
aF003.1 aF003 Female a PO_P pre 7.889352
aF004.1 aF004 Female a PO_P pre 8.521993
aF005.1 aF005 Female a PO_P pre 8.335390
aF006.1 aF006 Female a PO_P pre 8.259743
SubjectID Sex Treatment Measured time ln_mass_mg
cM033.2 cM033 Male c Melaniz post 8.163398
bF037.2 bF037 Female b Melaniz post 8.222070
cM032.2 cM032 Male c Melaniz post 8.422485
cF030.2 cF030 Female c Melaniz post 8.580447
cM039.2 cM039 Male c Melaniz post 8.710118
cM036.2 cM036 Male c Melaniz post 8.049849
That's better. Now we fit the model using aov and specifying time as a within subjects factor.
aovMod <- aov(ln_mass_mg ~ Sex * Treatment * Measured * time +
Error(SubjectID/time), data = dfL)
All that being said, I'm not sure this is the appropriate analysis, as your design is unbalanced. Consider a mixed-effects model.
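To make that last suggestion concrete, here is a minimal mixed-effects sketch using nlme (which ships with R), with a random intercept per subject standing in for the repeated measurement. The data below are synthetic, shaped like the long-format dfL above (the Measured factor is omitted from the toy for brevity); the model formula, not the numbers, is the point.

```r
library(nlme)

## Synthetic long-format data shaped like dfL (subject, between factors, time)
set.seed(1)
dfL <- expand.grid(SubjectID = paste0("S", 1:20), time = c("pre", "post"))
dfL$Sex        <- rep(rep(c("Female", "Male"), each = 10), 2)
dfL$Treatment  <- rep(rep(c("a", "b"), 10), 2)
dfL$ln_mass_mg <- 8 + rnorm(nrow(dfL), sd = 0.3)

## Random intercept per subject accounts for the paired pre/post measurements
mixMod <- lme(ln_mass_mg ~ Sex * Treatment * time,
              random = ~ 1 | SubjectID, data = dfL)
anova(mixMod)
```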
