Locating changepoints in multiple regression model

Locating changepoints in multiple regression model - r

I have some experimental data with a response variable (y) and multiple explanatory variables (x1 to x4). Zero values in the explanatory variable fields are the experimental controls.
x1 x2 x3 x4 y
0.082494269 1 16.43 328 -0.325
0.195673137 5 27.07 318 -0.3625
0.219937331 7 45.44 360 -0.525
0.059035684 4 32.68 203 -0.4125
0.381432485 8 71.38 167 -0.475
0.040377394 3 16.43 135 -0.425
0.055993298 1 21.88 154 -0.3875
0 0 0 0 -0.325
0.112635472 5 33.63 328 -0.3625
0.217039159 7 45.83 200 -0.475
0.035330022 2 17.78 117 -0.48
0.234216386 6 51.79 119 -0.45
0.085048722 6 39.71 98 -0.445
0.064759017 1 27.46 133 -0.3625
0.123863896 7 36.61 82 -0.4625
0.18932145 6 44.21 74 -0.425
0.409036425 8 62.38 154 -0.525
0 0 0 0 -0.275
0.493185115 8 120.3 103 -0.5625
0.132214199 5 26.61 222 -0.425
0 0 0 0 -0.3375
My regression models are usually bivariate (y~x1) but all the x values here could be used as predictors. What I'm trying to do is find (if there is one) the point at which the y value changes significantly using whichever combination of x values is required to elicit a change.
I'm using R and have briefly looked at the 'chngpt' package but it's a little beyond me at the moment. Has anyone had any experience using this package and is this appropriate for my purpose? Could anyone demonstrate how to do what I'm trying to do with the data provided?
Thanks

Related

how to fit curve with two exponential terms in R

the formula is
f=exp(-d*t)+exp(g*t)-1
my dataset includes many observations(f in the formula) on several different times(t) for several subjects. And I want to get estimations on d and g.
How should I code for this in R? And I don't know how to determine starting values, since every subject might have different curve shapes.
Here are some hypothetical examples:
subject t f
1 1 0 515.6
2 1 70 62.9
3 1 126 34.8
4 1 181 18.5
5 1 245 28.9
6 1 289 29.6
7 1 359 109.1
8 1 408 33.2
9 1 531 16.9
10 1 569 97.2
I have hundreds of subjects, and I want to estimate the parameters (d and g) on personal level, means different curve for different subject.

glmmTMB, post-hoc testing and glht

I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.
There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).
An example of a data frame, df, is:
CT F1 F2 R NR LOG_TOT
77 0 0 1 1 12.9
167 0 0 2 6 13.7
289 0 0 3 11 13.9
253 0 0 4 16 13.9
125 0 0 5 21 13.7
109 0 0 6 26 13.6
96 1 0 1 2 13.1
169 1 0 2 7 13.7
190 1 0 3 12 13.8
258 1 0 4 17 13.9
101 1 0 5 22 13.5
94 1 0 6 27 13.5
89 1 25 1 4 13.0
166 1 25 2 9 13.6
175 1 25 3 14 13.7
221 1 25 4 19 13.8
131 1 25 5 24 13.5
118 1 25 6 29 13.6
58 1 75 1 5 12.9
123 1 75 2 10 13.4
197 1 75 3 15 13.7
208 1 75 4 20 13.8
113 1 8 1 3 13.2
125 1 8 2 8 13.7
182 1 8 3 13 13.7
224 1 8 4 18 13.9
104 1 8 5 23 13.5
116 1 8 6 28 13.7
122 2 0 1 2 13.1
115 2 0 2 7 13.6
149 2 0 3 12 13.7
270 2 0 4 17 14.1
116 2 0 5 22 13.5
94 2 0 6 27 13.7
73 2 25 1 4 12.8
61 2 25 2 9 13.0
185 2 25 3 14 13.8
159 2 25 4 19 13.7
125 2 25 5 24 13.6
75 2 25 6 29 13.5
121 2 8 1 3 13.0
143 2 8 2 8 13.8
219 2 8 3 13 13.9
191 2 8 4 18 13.7
98 2 8 5 23 13.5
115 2 8 6 28 13.6
110 3 0 1 2 12.8
123 3 0 2 7 13.6
210 3 0 3 12 13.9
354 3 0 4 17 14.4
160 3 0 5 22 13.7
101 3 0 6 27 13.6
69 3 25 1 4 12.6
112 3 25 2 9 13.5
258 3 25 3 14 13.8
174 3 25 4 19 13.5
171 3 25 5 24 13.9
117 3 25 6 29 13.7
38 3 75 1 5 12.1
222 3 75 2 10 14.1
204 3 75 3 15 13.5
235 3 75 4 20 13.7
241 3 75 5 25 13.8
141 3 75 6 30 13.9
113 3 8 1 3 12.9
90 3 8 2 8 13.5
276 3 8 3 13 14.1
199 3 8 4 18 13.8
111 3 8 5 23 13.6
109 3 8 6 28 13.7
135 4 0 1 2 13.1
144 4 0 2 7 13.6
289 4 0 3 12 14.2
395 4 0 4 17 14.6
154 4 0 5 22 13.7
148 4 0 6 27 13.8
58 4 25 1 4 12.8
136 4 25 2 9 13.8
288 4 25 3 14 14.0
113 4 25 4 19 13.5
162 4 25 5 24 13.7
172 4 25 6 29 14.1
2 4 75 1 5 12.3
246 4 75 3 15 13.7
247 4 75 4 20 13.9
114 4 8 1 3 13.1
107 4 8 2 8 13.6
209 4 8 3 13 14.0
190 4 8 4 18 13.9
127 4 8 5 23 13.5
101 4 8 6 28 13.7
167 6 0 1 2 13.4
131 6 0 2 7 13.5
369 6 0 3 12 14.5
434 6 0 4 17 14.9
172 6 0 5 22 13.8
126 6 0 6 27 13.8
90 6 25 1 4 13.1
172 6 25 2 9 13.7
330 6 25 3 14 14.2
131 6 25 4 19 13.7
151 6 25 5 24 13.9
141 6 25 6 29 14.2
7 6 75 1 5 12.2
194 6 75 2 10 14.2
280 6 75 3 15 13.7
253 6 75 4 20 13.8
45 6 75 5 25 13.4
155 6 75 6 30 13.9
208 6 8 1 3 13.5
97 6 8 2 8 13.5
325 6 8 3 13 14.3
235 6 8 4 18 14.1
112 6 8 5 23 13.6
188 6 8 6 28 14.1
The random and nested random effects are treated as factors. The fixed effect F1 has the value 0, 1, 2, 3, 4 and 6. The fixed effect F2 has the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic unidirectional changes in the dependent variable CT rather than up and down changes.
I previously used the lme4 package to analyze the data as a mixed model:
library(lme4)
m1 <- lmer(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE)
Followed by the use of glht in the multcomp package for post-hoc analysis employing the formula approach:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
"F1 == 0",
"F1 + 8*F1:F2 == 0",
"F1 + 25*F1:F2 == 0",
"F1 + 75*F1:F2 == 0",
"F1 + (27)*F1:F2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
"F2 + 1*F1:F2 == 0",
"F2 + 2*F1:F2 == 0",
"F2 + 3*F1:F2 == 0",
"F2 + 4*F1:F2 == 0",
"F2 + 6*F1:F2 == 0",
"F2 + (3.2)*F1:F2 == 0"))
glht_omni <- glht(m1)
Here is the corresponding negative binomial glmmTMB model, which I now prefer:
library(glmmTMB)
m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE, family="nbinom2")
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (?or its more recent equivalent, emmeans).
I follwed Ben's suggestion, running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
and I can then use emmeans on the glmmTMB object. For example,
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. I can also change F1 and F2 to factors and then try this:
as.glht(emmeans(m2,~(week + 27*week:conc)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
0, 0 == 0 -6.721
1, 0 == 0 -6.621
2, 0 == 0 -6.342
3, 0 == 0 -6.740
4, 0 == 0 -6.474
6, 0 == 0 -6.967
0, 8 == 0 -6.694
1, 8 == 0 -6.651
2, 8 == 0 -6.227
3, 8 == 0 -6.812
4, 8 == 0 -6.371
6, 8 == 0 -6.920
0, 25 == 0 -6.653
1, 25 == 0 -6.648
2, 25 == 0 -6.282
3, 25 == 0 -6.766
4, 25 == 0 -6.338
6, 25 == 0 -6.702
0, 75 == 0 -6.470
1, 75 == 0 -6.642
2, 75 == 0 -6.091
3, 75 == 0 -6.531
4, 75 == 0 -5.762
6, 75 == 0 -6.612
But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?) I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
The glht software and post hoc testing carries directly over to the glmmADMB package, but glmmADMB is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each with 300,000 examples of the negative binomial mixed model, so speed is essential.
Many thanks for your suggestions and help!

The second argument (specs) to emmeans is not the same as the linfct argument in glht, so you can't use it in the same way. You have to call emmeans() using it the way it was intended. The as.glht() function converts the result to a glht object, but it really is not necessary to do that as the emmeans summary yields similar results.
I think the results you were trying to get are obtainable via
emmeans(m2, ~ F2, at = list(F2 = c(0, 8, 25, 75)))
(using the original model with the predictors as quantitative variables). This will compute the adjusted means holding F1 at its average, and at each of the specified values of F2.
Please look at the documentation for emmeans(). In addition, there are lots of vignettes that provide explanations and examples -- starting with https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html.

Following the advice of my excellent statistical consultant, I think the solution below provides what I had previously obtained using glht and linfct.
The slopes for F1 are calculated at the various levels of F2 by using contrast and emmeans to compute the differences in the dependendent variable between two values of F1 separated by one unit (i.e. c(0,1)). (Since the regression is linear, the two values of F1 are arbitrary, provided they are separated by one unit, eg c(3,4)). Vice versa for the slopes of F2.
Thus, slopes of F1 at F2 = 0, 8, 25, 75 and 27 (27 is average of F2):
contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=0)),list(c(-1,1)))
(above equivalent to: summary(m1)$coefficients$cond["F1",])
contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=8)),list(c(-1,1)))
contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=25)),list(c(-1,1)))
contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=75)),list(c(-1,1)))
contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=27)),list(c(-1,1)))
and slopes of F2 at F1 = 1, 2, 3, 4, 6 and 3.2 (3.2 is average of F1, excluding zero value):
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
(above equivalent to: summary(m1)$coefficients$cond["F2",])
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=1)),list(c(-1,1)))
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=2)),list(c(-1,1)))
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=3)),list(c(-1,1)))
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=4)),list(c(-1,1)))
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=6)),list(c(-1,1)))
contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=3.2)),list(c(-1,1)))
Interaction of F1 and F2 slopes at F1 = 0 and F2 = 0
contrast(emmeans(m1, specs=c("F1","F2"), at=list(F1=c(0,1),F2=c(0,1))),list(c(1,-1,-1,1)))
(above equivalent to: summary(m1)$coefficients$cond["F1:F2",])
From the resulting emmGrid objects provided from contrast(), one can pick out as desired the estimate of the slope (estimate), standard deviation of the estimated slope (SE), Z score for the difference of the estimated slope from a null hypothesized slope of zero (z.ratio, calculated by emmGrid from estimate divided by SE) and corresponding P value (p.value calculated by emmGrid as 2*pnorm(-abs(z.ratio)).
For example:
contrast(emmeans(m1, specs="F1", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
yields:
NOTE: Results may be misleading due to involvement in interactions
contrast estimate SE df z.ratio p.value
c(-1, 1) 0.001971714 0.002616634 NA 0.754 0.4511
Postscript added 1.25 yrs later:
The above gives the correct solutions, but as Russell Lenth pointed out the answers are more easily obtained using emtrends. However, I have selected this answer as being correct since there may be have some didactic value in showing how to calculate slopes using emmeans to find the resulting change in the predicted dependent variable when the independent variable changes by 1.

Stacking time series data vertically

I am struggling with manipulation of time series data. The dataset has first Column containing information about time points of data collection, 2nd column onwards contains data from different studies.I have several hundred studies. As an example I have included sample data for 5 studies. I want to stack the dataset vertically with time and datapoints for each study. Example data set looks like data provided below:
TIME Study1 Study2 Study3 Study4 Study5
0.00 52.12 53.66 52.03 50.36 51.34
90.00 49.49 51.71 49.49 48.48 50.19
180.00 47.00 49.83 47.07 46.67 49.05
270.00 44.63 48.02 44.77 44.93 47.95
360.00 42.38 46.28 42.59 43.25 46.87
450.00 40.24 44.60 40.50 41.64 45.81
540.00 38.21 42.98 38.53 40.08 44.78
I am looking for an output in the form of:
TIME Study ID
0 52.12 1
90 49.49 1
180 47 1
270 44.63 1
360 42.38 1
450 40.24 1
540 38.21 1
0 53.66 2
90 51.71 2
180 49.83 2
270 48.02 2
360 46.28 2
450 44.6 2
540 42.98 2
0 52.03 3
90 49.49 3
180 47.07 3
270 44.77 3
...

This is a classic 'wide to long' dataset manipulation. Below, I show the use of the base function ?reshape for your data:
d.l <- reshape(d, varying=list(c("Study1","Study2","Study3","Study4","Study5")),
v.names="Y", idvar="TIME", times=1:5, timevar="Study",
direction="long")
d.l <- d.l[,c(2,1,3)]
rownames(d.l) <- NULL
d.l
# Study TIME Y
# 1 1 0 52.12
# 2 1 90 49.49
# 3 1 180 47.00
# 4 1 270 44.63
# 5 1 360 42.38
# 6 1 450 40.24
# 7 1 540 38.21
# 8 2 0 53.66
# 9 2 90 51.71
# 10 2 180 49.83
# 11 2 270 48.02
# 12 2 360 46.28
# 13 2 450 44.60
# 14 2 540 42.98
# 15 3 0 52.03
# 16 3 90 49.49
# 17 3 180 47.07
# ...
However, there are many ways to do this in R: the most basic reference on SO (of which this is probably a duplicate) is Reshaping data.frame from wide to long format, but there are many other relevant threads (see this search: [r] wide to long). Beyond using reshape, #lmo's method can be used, as well as methods based on the reshape2, tidyr, and data.table packages (presumably among others).

Here is one method using cbind and stack:
longdf <- cbind(df$TIME, stack(df[,-1], ))
names(longdf) <- c("TIME", "Study", "id")
This returns
longdf
TIME Study id
1 0 52.12 Study1
2 90 49.49 Study1
3 180 47.00 Study1
4 270 44.63 Study1
5 360 42.38 Study1
6 450 40.24 Study1
7 540 38.21 Study1
8 0 53.66 Study2
9 90 51.71 Study2
...
If you want to change id to integers as in your example, use
longdf$id <- as.integer(longdf$id)

Merging uneven Panel Data frames in R

I have two sets of panel data that I would like to merge. The problem is that, for each respective time interval, the variable which links the two data sets appears more frequently in the first data frame than the second. My objective is to add each row from the second data set to its corresponding row in the first data set, even if that necessitates copying said row multiple times in the same time interval. Specifically, I am working with basketball data from the NBA. The first data set is a panel of Player and Date while the second is one of Team (Tm) and Date. Thus, each Team entry should be copied multiple times per date, once for each player on that team who played that day. I could do this easily in excel, but the data frames are too large.
The result is 0 observations of 52 variables. I've experimented with bind, match, different versions of merge, and I've searched for everything I can think of; but, nothing seems to address this issue specifically. Disclaimer, I am very new to R.
Here is my code up until my road block:
HGwd = "~/Documents/Fantasy/Basketball"
library(plm)
library(mice)
library(VIM)
library(nnet)
library(tseries)
library(foreign)
library(ggplot2)
library(truncreg)
library(boot)
Pdata = read.csv("2015-16PlayerData.csv", header = T)
attach(Pdata)
Pdata$Age = as.numeric(as.character(Pdata$Age))
Pdata$Date = as.Date(Pdata$Date, '%m/%e/%Y')
names(Pdata)[8] = "OppTm"
Pdata$GS = as.factor(as.character(Pdata$GS))
Pdata$MP = as.numeric(as.character(Pdata$MP))
Pdata$FG = as.numeric(as.character(Pdata$FG))
Pdata$FGA = as.numeric(as.character(Pdata$FGA))
Pdata$X2P = as.numeric(as.character(Pdata$X2P))
Pdata$X2PA = as.numeric(as.character(Pdata$X2PA))
Pdata$X3P = as.numeric(as.character(Pdata$X3P))
Pdata$X3PA = as.numeric(as.character(Pdata$X3PA))
Pdata$FT = as.numeric(as.character(Pdata$FT))
Pdata$FTA = as.numeric(as.character(Pdata$FTA))
Pdata$ORB = as.numeric(as.character(Pdata$ORB))
Pdata$DRB = as.numeric(as.character(Pdata$DRB))
Pdata$TRB = as.numeric(as.character(Pdata$TRB))
Pdata$AST = as.numeric(as.character(Pdata$AST))
Pdata$STL = as.numeric(as.character(Pdata$STL))
Pdata$BLK = as.numeric(as.character(Pdata$BLK))
Pdata$TOV = as.numeric(as.character(Pdata$TOV))
Pdata$PF = as.numeric(as.character(Pdata$PF))
Pdata$PTS = as.numeric(as.character(Pdata$PTS))
PdataPD = plm.data(Pdata, index = c("Player", "Date"))
attach(PdataPD)
Tdata = read.csv("2015-16TeamData.csv", header = T)
attach(Tdata)
Tdata$Date = as.Date(Tdata$Date, '%m/%e/%Y')
names(Tdata)[3] = "OppTm"
Tdata$MP = as.numeric(as.character(Tdata$MP))
Tdata$FG = as.numeric(as.character(Tdata$FG))
Tdata$FGA = as.numeric(as.character(Tdata$FGA))
Tdata$X2P = as.numeric(as.character(Tdata$X2P))
Tdata$X2PA = as.numeric(as.character(Tdata$X2PA))
Tdata$X3P = as.numeric(as.character(Tdata$X3P))
Tdata$X3PA = as.numeric(as.character(Tdata$X3PA))
Tdata$FT = as.numeric(as.character(Tdata$FT))
Tdata$FTA = as.numeric(as.character(Tdata$FTA))
Tdata$PTS = as.numeric(as.character(Tdata$PTS))
Tdata$Opp.FG = as.numeric(as.character(Tdata$Opp.FG))
Tdata$Opp.FGA = as.numeric(as.character(Tdata$Opp.FGA))
Tdata$Opp.2P = as.numeric(as.character(Tdata$Opp.2P))
Tdata$Opp.2PA = as.numeric(as.character(Tdata$Opp.2PA))
Tdata$Opp.3P = as.numeric(as.character(Tdata$Opp.3P))
Tdata$Opp.3PA = as.numeric(as.character(Tdata$Opp.3PA))
Tdata$Opp.FT = as.numeric(as.character(Tdata$Opp.FT))
Tdata$Opp.FTA = as.numeric(as.character(Tdata$Opp.FTA))
Tdata$Opp.PTS = as.numeric(as.character(Tdata$Opp.PTS))
TdataPD = plm.data(Tdata, index = c("OppTm", "Date"))
attach(TdataPD)
PD = merge(PdataPD, TdataPD, by = "OppTm", all.x = TRUE)
attach(PD)
Any help on how to do this would be greatly appreciated!
EDIT
I've tweaked it a little from last night, but still nothing seems to do the trick. See the above, updated code for what I am currently using.
Here is the output for head(PdataPD):
Player Date Rk Pos Tm X..H OppTm W.L GS MP FG FGA FG. X2P
22408 Aaron Brooks 2015-10-27 817 G CHI CLE W 0 16 3 9 0.333 3
22144 Aaron Brooks 2015-10-28 553 G CHI # BRK W 0 16 5 9 0.556 3
21987 Aaron Brooks 2015-10-30 396 G CHI # DET L 0 18 2 6 0.333 1
21456 Aaron Brooks 2015-11-01 4687 G CHI ORL W 0 16 3 11 0.273 3
21152 Aaron Brooks 2015-11-03 4383 G CHI # CHO L 0 17 5 8 0.625 1
20805 Aaron Brooks 2015-11-05 4036 G CHI OKC W 0 13 4 8 0.500 3
X2PA X2P. X3P X3PA X3P. FT FTA FT. ORB DRB TRB AST STL BLK TOV PF PTS GmSc
22408 8 0.375 0 1 0.000 0 0 NA 0 2 2 0 0 0 2 1 6 -0.9
22144 3 1.000 2 6 0.333 0 0 NA 0 1 1 3 1 0 1 4 12 8.5
21987 2 0.500 1 4 0.250 0 0 NA 0 4 4 4 0 0 0 1 5 5.2
21456 6 0.500 0 5 0.000 0 0 NA 2 1 3 1 1 1 1 4 6 1.0
21152 3 0.333 4 5 0.800 0 0 NA 0 0 0 4 1 0 0 4 14 12.6
20805 5 0.600 1 3 0.333 0 0 NA 1 1 2 0 0 0 0 1 9 5.6
FPTS H.A
22408 7.50 H
22144 20.25 A
21987 16.50 A
21456 14.75 H
21152 24.00 A
20805 12.00 H
And for head(TdataPD):
OppTm Date Rk X Opp Result MP FG FGA FG. X2P X2PA X2P. X3P X3PA
2105 ATL 2015-10-27 71 DET L 94-106 240 37 82 0.451 29 55 0.527 8 27
2075 ATL 2015-10-29 41 # NYK W 112-101 240 42 83 0.506 32 59 0.542 10 24
2047 ATL 2015-10-30 13 CHO W 97-94 240 36 83 0.434 28 60 0.467 8 23
2025 ATL 2015-11-01 437 # CHO W 94-92 240 37 88 0.420 30 59 0.508 7 29
2001 ATL 2015-11-03 413 # MIA W 98-92 240 37 90 0.411 30 69 0.435 7 21
1973 ATL 2015-11-04 385 BRK W 101-87 240 37 76 0.487 29 54 0.537 8 22
X3P. FT FTA FT. PTS Opp.FG Opp.FGA Opp.FG. Opp.2P Opp.2PA Opp.2P. Opp.3P
2105 0.296 12 15 0.800 94 37 96 0.385 25 67 0.373 12
2075 0.417 18 26 0.692 112 38 93 0.409 32 64 0.500 6
2047 0.348 17 22 0.773 97 36 88 0.409 24 58 0.414 12
2025 0.241 13 14 0.929 94 32 86 0.372 18 49 0.367 14
2001 0.333 17 22 0.773 98 38 86 0.442 33 58 0.569 5
1973 0.364 19 24 0.792 101 36 83 0.434 31 62 0.500 5
Opp.3PA Opp.3P. Opp.FT Opp.FTA Opp.FT. Opp.PTS
2105 29 0.414 20 26 0.769 106
2075 29 0.207 19 21 0.905 101
2047 30 0.400 10 13 0.769 94
2025 37 0.378 14 15 0.933 92
2001 28 0.179 11 16 0.688 92
1973 21 0.238 10 13 0.769 87
If there is way to truncate the output from dput(head(___)), I am not familiar with it. It appears that simply erasing the excess characters would remove entire variables from the dataset.

It would help if you posted your data (or a working subset of it) and a little more detail on how you are trying to merge, but if I understand what you are trying to do, you want each final data record to have individual stats for each player on a particular date followed by the player's team's stats for that date. In this case, you should have a team column in the Player table that identifies the player's team, and then join the two tables on the composite key Date and Team by setting the by= attribute in merge:
merge(PData, TData, by=c("Date", "Team"))
The fact that the data frames are of different lengths doesn't matter--this is exactly what join/merge operations are for.
For an alternative to merge(), you might check out the dplyr package join functions at https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

How can I get the recapture probabilities in R (which package to use) ?

I'm trying to find a way to estimate the recapture probabilities in my data. Here is an example directly from the package FSA in R.
library(FSA)
## First example -- capture histories summarized with capHistSum()
data(CutthroatAL)
ch1 <- capHistSum(CutthroatAL,cols2use=-1) # ignore first column of fish ID
ex1 <- mrOpen(ch1)
summary(ex1)
summary(ex1,verbose=TRUE)
confint(ex1)
confint(ex1,verbose=TRUE)
If you type summary(ex1,verbose=TRUE), you'll have this result
# Observables:
# m n R r z
# i=1 0 89 89 26 NA
# i=2 22 352 352 96 4
# i=3 94 292 292 51 6
# i=4 41 233 233 46 16
# i=5 58 259 259 100 4
# i=6 99 370 370 99 5
# i=7 91 290 290 44 13
# i=8 52 134 134 13 5
# i=9 18 140 0 NA NA
# Estimates (phi.se includes sampling and individual variability):
# M M.se N N.se phi phi.se B B.se
# i=1 NA NA NA NA 0.411 0.088 NA NA
# i=2 36.6 6.4 561.1 117.9 0.349 0.045 198.6 48.2
# i=3 127.8 13.4 394.2 44.2 0.370 0.071 526.3 119.7
# i=4 120.7 20.8 672.2 138.8 0.218 0.031 154.1 30.2
# i=5 68.3 4.1 301.0 21.8 0.437 0.041 304.7 25.4
# i=6 117.5 7.3 436.1 30.3 0.451 0.069 357.2 61.2
# i=7 175.1 24.6 553.7 84.3 0.268 0.072 106.9 36.2
# i=8 100.2 24.7 255.3 65.4 NA NA NA NA
# i=9 NA NA NA NA NA NA NA NA
Since, "Observables" is not in a list, I cannot extract automatically the numbers. Is it possible?
I have the same type of dataset, but the output won't show me a probability of recapture. I have an open population. That's why I try to use this package.
Here's a look of the typical dataset:
head(CutthroatAL)
# id y1998 y1999 y2000 y2001 y2002 y2003 y2004 y2005 y2006
# 1 1 0 0 0 0 0 0 0 0 1
# 2 2 0 0 0 0 0 0 0 0 1
# 3 3 0 0 0 0 0 0 0 0 1
# 4 4 0 0 0 0 0 0 0 0 1
# 5 5 0 0 0 0 0 0 0 0 1
# 6 6 0 0 0 0 0 0 0 0 1
I also tried the package mra and its F.cjs.estim() function. But, I don't have survival information...
I haven't find any function in RCapture that allows me to print a capture probability.
I'm trying to find the information pj on page 38 of this book Handbook of Capture-Recapture Analysis.
I haven't found as well in the RMark package.
So how can I estimate recapture probabilities in R?
Thanks,

If you just want to capture the "Observable" values in the summary, you can do it the same way the function does. If you look at the source for FSA:::summary.mrOpen, you can see that you can grab those values with
ex1$df[, c("m", "n", "R", "r", "z")]

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Locating changepoints in multiple regression model - r

Related

how to fit curve with two exponential terms in R

glmmTMB, post-hoc testing and glht

Stacking time series data vertically

Merging uneven Panel Data frames in R

How can I get the recapture probabilities in R (which package to use) ?

Categories

Resources