R variable evaluates differently depending on context (inside a loop or not)

EDIT: now with reproducible code/data.
I am trying to run chi-squared tests on multiple variables in my dataframe.
Using the npk dataset:
A single variable, N, produces the proper result:
npk %>%
group_by(yield, N) %>%
select(yield, N) %>%
table() %>%
print() %>%
chisq.test()
As you can see, the output of table() is in a form that chisq.test() can use.
N
yield 0 1
44.2 1 0
45.5 1 0
46.8 1 0
48.8 1 1
49.5 1 0
49.8 0 1
51.5 1 0
52 0 1
53.2 1 0
55 1 0
55.5 1 0
55.8 0 1
56 2 0
57 0 1
57.2 0 1
58.5 0 1
59 0 1
59.8 0 1
62 0 1
62.8 1 1
69.5 0 1
Pearson's Chi-squared test
data: .
X-squared = 20, df = 20, p-value = 0.4579
When I try to run multiple tests in a loop, something about how the variable is referenced changes the output of table(), and the chi-squared test cannot run.
Create the list that the loop runs through:
test_ordinal_variables <- noquote(names(npk[2:4]))
test_ordinal_variables
The loop that produces the error (1:1 for clarity; the error is repeated if you use 1:3):
for (i in 1:1){
npk %>%
group_by(yield, test_ordinal_variables[i]) %>%
select(yield, test_ordinal_variables[i]) %>%
table() %>%
print() %>%
chisq.test()
}
The output clearly shows a table that chisq.test() cannot interpret:
Adding missing grouping variables: `test_ordinal_variables[i]`
, , N = 0
yield
test_ordinal_variables[i] 44.2 45.5 46.8 48.8 49.5 49.8 51.5 52 53.2 55 55.5 55.8 56 57 57.2 58.5 59 59.8 62
N 1 1 1 1 1 0 1 0 1 1 1 0 2 0 0 0 0 0 0
yield
test_ordinal_variables[i] 62.8 69.5
N 1 0
, , N = 1
yield
test_ordinal_variables[i] 44.2 45.5 46.8 48.8 49.5 49.8 51.5 52 53.2 55 55.5 55.8 56 57 57.2 58.5 59 59.8 62
N 0 0 0 1 0 1 0 1 0 0 0 1 0 1 1 1 1 1 1
yield
test_ordinal_variables[i] 62.8 69.5
N 1 1
For some reason, test_ordinal_variables[i] does not evaluate to what I would expect inside the loop. As the message says, dplyr is "Adding missing grouping variables"; if it simply evaluated the expression instead of adding a variable, I think it would work.
On its own, it evaluates as I would expect:
> test_ordinal_variables[1]
[1] N
So why won't it do the same when it is in the loop?

Since you are passing a variable name held in a string into a dplyr chain, consider the underscore counterparts group_by_() and select_(), which accept strings. Because yield is not passed dynamically, convert it with as.symbol() so that it is processed alongside the string. Note that these underscore verbs have been deprecated since dplyr 0.7.0.
for (i in names(npk[2:4])){
npk %>%
group_by_(as.symbol("yield"), i) %>%
select_(as.symbol("yield"), i) %>%
table() %>%
print() %>%
chisq.test() %>%
print()
}
Output
N
yield 0 1
44.2 1 0
45.5 1 0
46.8 1 0
48.8 1 1
49.5 1 0
49.8 0 1
51.5 1 0
52 0 1
53.2 1 0
55 1 0
55.5 1 0
55.8 0 1
56 2 0
57 0 1
57.2 0 1
58.5 0 1
59 0 1
59.8 0 1
62 0 1
62.8 1 1
69.5 0 1
Pearson's Chi-squared test
data: .
X-squared = 20, df = 20, p-value = 0.4579
P
yield 0 1
44.2 0 1
45.5 1 0
46.8 1 0
48.8 0 2
49.5 0 1
49.8 1 0
51.5 1 0
52 0 1
53.2 0 1
55 1 0
55.5 1 0
55.8 0 1
56 1 1
57 1 0
57.2 1 0
58.5 0 1
59 0 1
59.8 1 0
62 1 0
62.8 0 2
69.5 1 0
Pearson's Chi-squared test
data: .
X-squared = 22, df = 20, p-value = 0.3405
K
yield 0 1
44.2 1 0
45.5 0 1
46.8 1 0
48.8 0 2
49.5 0 1
49.8 0 1
51.5 1 0
52 1 0
53.2 0 1
55 0 1
55.5 0 1
55.8 0 1
56 2 0
57 0 1
57.2 0 1
58.5 0 1
59 1 0
59.8 1 0
62 1 0
62.8 2 0
69.5 1 0
Pearson's Chi-squared test
data: .
X-squared = 24, df = 20, p-value = 0.2424
Warning messages:
1: In chisq.test(.) : Chi-squared approximation may be incorrect
2: In chisq.test(.) : Chi-squared approximation may be incorrect
3: In chisq.test(.) : Chi-squared approximation may be incorrect
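For context, the 3-D table in the question arises because group_by() evaluates test_ordinal_variables[i] as an expression and stores its value ("N") in a new grouping column, which select() then keeps. With dplyr 1.0.0 or later, a column name held in a string can be selected with all_of(), and group_by() is not needed before table(). A minimal sketch, assuming that dplyr version is installed; the names vars, tab and results are illustrative only:

```r
library(dplyr)

vars <- names(npk)[2:4]   # "N", "P", "K"
results <- list()

for (v in vars) {
  # all_of() tells select() that v holds a column name as a string
  tab <- npk %>%
    select(yield, all_of(v)) %>%
    table()
  print(tab)
  # chisq.test() warns here because the expected counts are small
  results[[v]] <- chisq.test(tab)
  print(results[[v]])
}
```

Storing the tests in a list makes it possible to pull out individual values afterwards, for example results$N$p.value.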


Cannot read .data file under R?

Good morning,
I need to read the following .data file : https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleveland.data
For this, I tried the following without success:
f <-file("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/cleveland.data", open="r" ,encoding="UTF-16LE")
data <- read.table(f, dec=",", header=F)
Thanks a lot for your help!
I would try to use the coatless/ucidata package to access the data.
https://github.com/coatless/ucidata
Here you can see how the package loads and processes the data file:
https://github.com/coatless/ucidata/blob/master/data-raw/heart_disease_build.R
If you wish to try out the package, you will need devtools installed. Here is what you can try:
# install.packages("devtools")
devtools::install_github("coatless/ucidata")
# load data
data("heart_disease_cl", package = "ucidata")
# show beginning rows of data
head(heart_disease_cl)
Output
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal num
1 63 Male typical angina 145 233 1 probable/definite hypertrophy 150 No 2.3 downsloping 0 fixed defect 0
2 67 Male asymptomatic 160 286 0 probable/definite hypertrophy 108 Yes 1.5 flat 3 normal 2
3 67 Male asymptomatic 120 229 0 probable/definite hypertrophy 129 Yes 2.6 flat 2 reversable defect 1
4 37 Male non-anginal pain 130 250 0 normal 187 No 3.5 downsloping 0 normal 0
5 41 Female atypical angina 130 204 0 probable/definite hypertrophy 172 No 1.4 upsloping 0 normal 0
6 56 Male atypical angina 120 236 0 normal 178 No 0.8 upsloping 0 normal 0
I found another solution with RCurl:
library(RCurl)
download <- getURL("http://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv")
data <- read.csv(text = download)
head(data)
# Output:
age anaemia creatinine_phosphokinase diabetes ejection_fraction high_blood_pressure platelets serum_creatinine
1 75 0 582 0 20 1 265000 1.9
2 55 0 7861 0 38 0 263358 1.1
3 65 0 146 0 20 0 162000 1.3
4 50 1 111 0 20 0 210000 1.9
5 65 1 160 1 20 0 327000 2.7
6 90 1 47 0 40 1 204000 2.1
serum_sodium sex smoking time DEATH_EVENT
1 130 1 0 4 1
2 136 1 0 6 1
3 129 1 1 7 1
4 137 1 0 7 1
5 116 0 0 8 1
6 132 1 1 8 1

R / PLM Package: How to include a nested effect for the same brands across countries?

I am currently analyzing a panel data set that tracks brands across countries over time. In particular, I am interested in how certain variables affect future market share.
Using the head() function, my data set looks as follows:
# A tibble: 20 x 10
Country Brand Date MS uuu vvv www xxx yyy zzz
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 BR AAA 2015Q1 13.0 46.4 66.7 23.8 9 23.7 613.
2 DE AAA 2015Q1 16.1 56.5 35.6 36.5 8 23.7 461.
3 ES AAA 2015Q1 15.4 40.9 40.9 27.6 9 23.7 548.
4 FR AAA 2015Q1 14.4 48.4 25.1 30.1 8 23.7 414.
5 IT AAA 2015Q1 15.0 33.2 33.2 35.2 9 23.7 650.
6 JP AAA 2015Q1 14.8 19.6 -5.81 31.5 9 23.7 735.
7 KO AAA 2015Q1 14.3 45.7 26.1 42.4 9 23.7 410.
8 UK AAA 2015Q1 17.6 40.6 12.7 25.9 9 23.7 660.
9 US AAA 2015Q1 3.82 -0.651 -0.651 23.2 11 23.7 429.
10 BR AAA 2015Q2 14.6 40.4 62.2 25.0 9 23.7 583.
11 DE AAA 2015Q2 16.9 52.0 30.6 35.3 9 23.7 539.
12 ES AAA 2015Q2 14.7 41.8 41.8 30.1 8 23.7 558.
13 FR AAA 2015Q2 15.2 51.7 29.1 30.2 9 23.7 445.
14 IT AAA 2015Q2 16.9 44.4 44.4 35.5 9 23.7 573.
15 JP AAA 2015Q2 16.2 20.3 -8.73 29.4 9 23.7 664.
16 KO AAA 2015Q2 16.0 49.1 31.6 40.2 9 23.7 408.
17 UK AAA 2015Q2 18.8 34.4 10.2 27.5 9 23.7 788.
18 US AAA 2015Q2 4.12 0.0770 0.0770 22.5 11 23.7 446.
19 BR AAA 2015Q3 13.4 43.5 65.6 26.6 9 23.7 624.
20 CN AAA 2015Q3 14.3 59.9 66.1 48.6 9 23.7 664.
Following some examples I found here and on other websites, I attempted to use the plm package in R to analyze the data with a nested random effects regression.
I started my code as follows:
p_crosscountry2 <- pdata.frame(crosscountry2,index=c("Brand","Date","Country"))
which is pretty much the same as specified in this thread:
estimate a repeated measures random effects model with a nested structure using `plm()`
However, there is one fundamental difference between the examples I could find and my case:
In all examples, each lowest-level unit is linked to only one higher level (for example, each student belongs to only one school), while in my case the lowest-level units can belong to several higher levels (several brands are present in multiple countries).
When I run the pdata.frame() code, it shows me the following warning message:
Warning message:
In pdata.frame(crosscountry2, index = c("Brand", "Date", :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
Looking into the table (using the proposed function), the data structure looks fine:
, , Country = BR
Date
Brand 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 2016Q2 2016Q3 2016Q4 2017Q1 2017Q2 2017Q3
AAA 1 1 1 1 1 1 1 1 1 1 1
BBB 0 0 0 0 0 0 0 0 0 0 0
CCC 1 1 1 1 1 1 1 1 1 1 1
DDD 0 0 0 0 0 0 0 0 0 0 0
EEE 0 0 0 0 0 0 0 0 0 0 0
FFF 0 0 0 0 0 0 0 0 0 0 0
GGG 0 0 0 0 0 0 0 0 0 0 0
HHH 1 1 1 1 1 1 1 1 1 1 1
III 1 1 1 1 1 1 1 1 1 1 1
JJJ 1 1 1 1 1 1 1 1 1 1 1
KKK 1 1 1 1 1 1 1 1 1 1 1
LLL 0 0 0 0 0 0 0 0 0 0 0
MMM 0 0 0 0 0 0 0 0 0 0 0
, , Country = CN
Date
Brand 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 2016Q2 2016Q3 2016Q4 2017Q1 2017Q2 2017Q3
AAA 0 0 1 1 1 1 1 1 1 1 0
BBB 0 0 1 1 1 1 1 1 1 1 0
CCC 0 0 0 0 0 0 0 0 0 0 0
DDD 0 0 0 0 0 0 0 0 0 0 0
EEE 0 0 1 1 1 1 1 1 1 1 0
FFF 0 0 0 0 0 0 0 0 0 0 0
GGG 0 0 1 1 1 1 1 1 1 1 0
HHH 0 0 1 1 1 1 1 1 1 1 0
Using the head() function, the table looks as follows:
Country Brand Date MS uuu vvv www xxx yyy zzz
BR-AAA-2015Q1 BR AAA 2015Q1 12.97 46.43449 66.66667 23.79714 9 23.7342 613.1986
BR-AAA-2015Q2 BR AAA 2015Q2 14.56 40.38156 62.16216 24.96264 9 23.7342 583.3325
BR-AAA-2015Q3 BR AAA 2015Q3 13.38 43.53741 65.64626 26.59215 9 23.7342 623.8418
BR-AAA-2015Q4 BR AAA 2015Q4 14.94 42.18077 67.14491 26.05669 9 23.7342 610.7607
BR-AAA-2016Q1 BR AAA 2016Q1 15.10 46.62681 68.57387 26.93074 9 26.6329 650.2189
BR-AAA-2016Q2 BR AAA 2016Q2 15.17 48.34142 71.68285 25.15683 9 26.6329 671.1437
BR-AAA-2016Q3 BR AAA 2016Q3 13.90 49.98002 71.39433 26.26867 9 26.6329 645.4896
BR-AAA-2016Q4 BR AAA 2016Q4 15.93 50.23791 71.45123 25.62308 9 26.6329 669.8751
BR-AAA-2017Q1 BR AAA 2017Q1 14.51 50.65138 72.48567 25.49358 9 31.3494 768.1376
BR-AAA-2017Q2 BR AAA 2017Q2 14.71 50.07792 73.29870 27.33325 9 31.3494 639.0572
BR-AAA-2017Q3 BR AAA 2017Q3 14.02 50.25853 72.64736 25.06274 9 31.3494 666.7235
BR-CCC-2015Q1 BR CCC 2015Q1 4.87 -47.35099 -24.83444 57.96842 7 4.7340 613.1986
BR-CCC-2015Q2 BR CCC 2015Q2 4.91 -60.50955 -41.71975 50.41528 8 4.7340 583.3325
BR-CCC-2015Q3 BR CCC 2015Q3 4.86 -56.59722 -39.58333 43.69692 6 4.7340 623.8418
BR-CCC-2015Q4 BR CCC 2015Q4 4.40 -57.38636 -40.62500 45.89061 7 4.7340 610.7607
BR-CCC-2016Q1 BR CCC 2016Q1 4.34 -46.82131 -27.83505 46.66406 6 4.8588 650.2189
BR-CCC-2016Q2 BR CCC 2016Q2 4.63 -45.86039 -27.02922 52.84351 8 4.8588 671.1437
BR-CCC-2016Q3 BR CCC 2016Q3 5.47 -52.37342 -32.27848 62.05175 6 4.8588 645.4896
BR-CCC-2016Q4 BR CCC 2016Q4 4.82 -43.54067 -24.40191 58.59462 7 4.8588 669.8751
BR-CCC-2017Q1 BR CCC 2017Q1 5.10 -42.12185 -22.26891 46.53642 8 4.9031 768.1376
I have two questions on this:
1) Is there a different way to define the data frame properly? I also tried different orders of the same columns but always ended up with the same warning message.
2) As a workaround, I created a combined variable of country and brand (e.g. AAA_BR, AAA_CN, CCC_BR, etc.) and used it instead of the brand variable. This version actually allowed me to run the nested random effects regression. Do you think this is a feasible option, or will I run into problems because I no longer account for the fact that the same brands are present in multiple countries?
Thanks for your support!
Sven
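The warning and the workaround described above can be illustrated in base R. The sketch below uses a small made-up data frame (not the real data) and a hypothetical column name BrandCountry. It only shows why (Brand, Date) pairs are duplicated while (Brand_Country, Date) pairs are not; it does not settle whether the resulting model is appropriate.

```r
# Toy data: one brand observed in two countries over two quarters
crosscountry2 <- data.frame(
  Country = c("BR", "DE", "BR", "DE"),
  Brand   = c("AAA", "AAA", "AAA", "AAA"),
  Date    = c("2015Q1", "2015Q1", "2015Q2", "2015Q2"),
  MS      = c(13.0, 16.1, 14.6, 16.9)
)

# Brand alone does not identify a panel unit: each (Brand, Date) pair
# occurs once per country, which is what the plm warning reports
dup_brand <- any(duplicated(crosscountry2[, c("Brand", "Date")]))

# Combining brand and country gives one row per (id, time) pair
crosscountry2$BrandCountry <- paste(crosscountry2$Brand,
                                    crosscountry2$Country, sep = "_")
dup_combined <- any(duplicated(crosscountry2[, c("BrandCountry", "Date")]))

print(c(brand_only = dup_brand, brand_country = dup_combined))

# Assumption: with plm installed, the combined id could then serve as the
# individual index, with Country as the grouping index, e.g.
# pdata.frame(crosscountry2, index = c("BrandCountry", "Date", "Country"))
```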

Use cases with higher value on one variable for each case of another variable in R

I am doing a meta-analysis in R. For each study (variable StudyID) I have multiple effect sizes. For some studies I have the same effect size multiple times depending on the level of acquaintance (variable Familiarity) between the subjects.
head(dat)
studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex N published
1 1 3.0 5.0 1 0.0462 4 0 44 1
2 1 5.0 2.5 1 0.1335 4 0 44 1
3 1 2.5 3.0 1 -0.1239 4 0 44 1
4 1 2.5 3.5 1 0.2062 4 0 44 1
5 1 2.5 3.0 1 -0.0370 4 0 44 1
6 1 3.0 5.0 1 -0.3850 4 0 44 1
Those are the first rows of the data set. In total there are over 50 studies. Most studies look like study 1 with the same value in "Familiarity" for all effect sizes. In some studies, there are effect sizes with multiple levels of familiarity. For example study 36 as seen below.
head(dat)
studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex N published
142 36 1.0 4.5 0 0.1233 5.00 0 311 1
143 36 3.5 3.0 0 0.0428 5.00 0 311 1
144 36 1.0 4.5 0 0.0986 5.00 0 311 1
145 36 1.0 4.5 1 -0.0520 5.00 0 311 1
146 36 1.5 2.5 1 -0.0258 5.00 0 311 1
147 36 3.5 3.0 1 0.1104 5.00 0 311 1
148 36 1.0 4.5 1 0.0282 5.00 0 311 1
149 36 1.0 4.5 2 -0.1724 5.00 0 311 1
150 36 3.5 3.0 2 0.2646 5.00 0 311 1
151 36 1.0 4.5 2 -0.1426 5.00 0 311 1
152 37 3.0 4.0 1 0.0118 5.35 0 123 0
153 37 1.0 4.5 1 -0.3205 5.35 0 123 0
154 37 2.5 3.0 1 -0.2356 5.35 0 123 0
155 37 3.0 2.0 1 0.1372 5.35 0 123 0
156 37 2.5 2.5 1 -0.1401 5.35 0 123 0
157 37 3.0 3.5 1 -0.3334 5.35 0 123 0
158 37 2.5 2.5 1 0.0317 5.35 0 123 0
159 37 1.0 3.0 1 -0.3025 5.35 0 123 0
160 37 1.0 3.5 1 -0.3248 5.35 0 123 0
Now, for those studies that include multiple levels of familiarity, I want to keep only the rows with one level of familiarity (in two separate versions: one with the lowest and one with the highest familiarity).
I think this is possible with the dplyr package, but I have no real code so far.
In a second step I would like to give those rows unique studyIDs for each level of familiarity (so create out of study 36 three "different" studies).
Thank you in advance!
If you want to use dplyr, you could create an alternate ID or casenum by using group_indices:
df <- df %>%
mutate(case_num = group_indices(.dots=c("studyID", "Familiarity")))
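In dplyr 1.0.0 and later, the same kind of combined ID can be produced with cur_group_id() inside a grouped mutate(). A minimal sketch on made-up rows, assuming that dplyr version; the column name case_num follows the snippet above:

```r
library(dplyr)

# Toy rows: study 1 has one familiarity level, study 36 has three
df <- data.frame(
  studyID     = c(1, 1, 36, 36, 36),
  Familiarity = c(1, 1, 0, 1, 2)
)

out <- df %>%
  group_by(studyID, Familiarity) %>%
  mutate(case_num = cur_group_id()) %>%  # one integer per studyID/Familiarity pair
  ungroup()

print(out)
```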
You could do:
library(dplyr)
df %>%
group_by(studyID) %>%
mutate(nDist = n_distinct(Familiarity) > 1) %>%
ungroup() %>%
mutate(
studyID = case_when(nDist ~ paste(studyID, Familiarity, sep = "_"), TRUE ~ studyID %>% as.character),
nDist = NULL
)
Output:
# A tibble: 19 x 9
studyID A.C.Extent Visibility Familiarity p_t_cov group.size same.sex N published
<chr> <dbl> <dbl> <int> <dbl> <dbl> <int> <int> <int>
1 36_0 1 4.5 0 0.123 5 0 311 1
2 36_0 3.5 3 0 0.0428 5 0 311 1
3 36_0 1 4.5 0 0.0986 5 0 311 1
4 36_1 1 4.5 1 -0.052 5 0 311 1
5 36_1 1.5 2.5 1 -0.0258 5 0 311 1
6 36_1 3.5 3 1 0.110 5 0 311 1
7 36_1 1 4.5 1 0.0282 5 0 311 1
8 36_2 1 4.5 2 -0.172 5 0 311 1
9 36_2 3.5 3 2 0.265 5 0 311 1
10 36_2 1 4.5 2 -0.143 5 0 311 1
11 37 3 4 1 0.0118 5.35 0 123 0
12 37 1 4.5 1 -0.320 5.35 0 123 0
13 37 2.5 3 1 -0.236 5.35 0 123 0
14 37 3 2 1 0.137 5.35 0 123 0
15 37 2.5 2.5 1 -0.140 5.35 0 123 0
16 37 3 3.5 1 -0.333 5.35 0 123 0
17 37 2.5 2.5 1 0.0317 5.35 0 123 0
18 37 1 3 1 -0.302 5.35 0 123 0
19 37 1 3.5 1 -0.325 5.35 0 123 0
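The first part of the question (keeping only the lowest or only the highest familiarity level within each study) is not covered by the code above. One possible way to do it is a grouped filter(). This is a sketch on made-up rows; the names lowest and highest are illustrative:

```r
library(dplyr)

# Toy rows: study 36 has familiarity levels 0, 1 and 2; study 37 only level 1
dat <- data.frame(
  studyID     = c(36, 36, 36, 36, 37, 37),
  Familiarity = c(0, 1, 1, 2, 1, 1),
  p_t_cov     = c(0.1233, -0.0520, -0.0258, -0.1724, 0.0118, -0.3205)
)

# Version 1: within each study, keep only rows at the lowest familiarity level
lowest <- dat %>%
  group_by(studyID) %>%
  filter(Familiarity == min(Familiarity)) %>%
  ungroup()

# Version 2: within each study, keep only rows at the highest familiarity level
highest <- dat %>%
  group_by(studyID) %>%
  filter(Familiarity == max(Familiarity)) %>%
  ungroup()

print(lowest)
print(highest)
```

Studies with a single familiarity level, such as study 37 here, pass through both versions unchanged.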

Removing data with a non-numeric column value in R

So I have a dataset that includes the lung capacity of certain individuals. I am trying to analyze the data distributions and relations. The only problem is that the data is somewhat incomplete. Some of the rows include "N/A" as the lung capacity. This is causing an issue because it is resulting in a mean and sd of always "N/A" for the different subsets. How would I form this into a subset so that it only includes the data that isn't N/A?
I've tried this:
fData1 = read.table("lung.txt",header=TRUE)
fData2= fData1[fData1$fev!="N/A"]
but this gives me an "undefined columns selected" error.
How can I make it so that I have a data set that excludes the rows with "N/A"?
Here is the beginning of my data set:
id age fev height male smoke
1 72 1.2840 66.5 1 1
2 81 2.5530 67.0 0 0
3 90 2.3830 67.0 1 0
4 72 2.6990 71.5 1 0
5 70 2.0310 62.5 0 0
6 72 2.4100 67.5 1 0
7 75 3.5860 69.0 1 0
8 75 2.9580 67.0 1 0
9 67 1.9160 62.5 0 0
10 70 NA 66.0 0 1
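One option is to apply the operations while excluding the NA values:
dat <- read.table("lung.txt", header = TRUE, na.strings = c("NA", "N/A")) # treat both spellings as missing
mean(dat$fev, na.rm = TRUE) # mean of fev column, ignoring NAs
sd(dat$fev, na.rm = TRUE) # standard deviation of fev column, ignoring NAs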
If you simply want to get rid of the NAs:
fData1 <- na.omit(fData1)
fData1 <- na.exclude(fData1) # same result
If you'd like to keep the rows that have NAs in a separate object, here are two options (use !is.na() instead to get the rows without NAs):
fData2 <- fData1[is.na(fData1$fev), ]
fData2 <- subset(fData1, is.na(fev))
If you just want to filter out rows with NA values, you can use complete.cases():
> df
id age fev height male smoke
1 1 72 1.284 66.5 1 1
2 2 81 2.553 67.0 0 0
3 3 90 2.383 67.0 1 0
4 4 72 2.699 71.5 1 0
5 5 70 2.031 62.5 0 0
6 6 72 2.410 67.5 1 0
7 7 75 3.586 69.0 1 0
8 8 75 2.958 67.0 1 0
9 9 67 1.916 62.5 0 0
10 10 70 NA 66.0 0 1
> df[complete.cases(df), ]
id age fev height male smoke
1 1 72 1.284 66.5 1 1
2 2 81 2.553 67.0 0 0
3 3 90 2.383 67.0 1 0
4 4 72 2.699 71.5 1 0
5 5 70 2.031 62.5 0 0
6 6 72 2.410 67.5 1 0
7 7 75 3.586 69.0 1 0
8 8 75 2.958 67.0 1 0
9 9 67 1.916 62.5 0 0
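The original attempt fails because fData1[condition] without a comma indexes columns, not rows. A minimal self-contained sketch of the row-wise version, using a few made-up lines of inline text in place of lung.txt and assuming the missing values may be spelled either "NA" or "N/A":

```r
# Inline stand-in for lung.txt (made-up rows, one missing fev value)
txt <- "id age fev height male smoke
1 72 1.284 66.5 1 1
2 81 N/A 67.0 0 0
3 90 2.383 67.0 1 0"

# na.strings makes read.table() turn both spellings into real NA values,
# so fev is read as a numeric column
fData1 <- read.table(text = txt, header = TRUE, na.strings = c("NA", "N/A"))

# Note the comma: the condition selects rows, and all columns are kept
fData2 <- fData1[!is.na(fData1$fev), ]

print(fData2)
print(mean(fData2$fev))
```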

How can I get the recapture probabilities in R (which package to use) ?

I'm trying to find a way to estimate the recapture probabilities in my data. Here is an example directly from the package FSA in R.
library(FSA)
## First example -- capture histories summarized with capHistSum()
data(CutthroatAL)
ch1 <- capHistSum(CutthroatAL,cols2use=-1) # ignore first column of fish ID
ex1 <- mrOpen(ch1)
summary(ex1)
summary(ex1,verbose=TRUE)
confint(ex1)
confint(ex1,verbose=TRUE)
If you type summary(ex1,verbose=TRUE), you get this result:
# Observables:
# m n R r z
# i=1 0 89 89 26 NA
# i=2 22 352 352 96 4
# i=3 94 292 292 51 6
# i=4 41 233 233 46 16
# i=5 58 259 259 100 4
# i=6 99 370 370 99 5
# i=7 91 290 290 44 13
# i=8 52 134 134 13 5
# i=9 18 140 0 NA NA
# Estimates (phi.se includes sampling and individual variability):
# M M.se N N.se phi phi.se B B.se
# i=1 NA NA NA NA 0.411 0.088 NA NA
# i=2 36.6 6.4 561.1 117.9 0.349 0.045 198.6 48.2
# i=3 127.8 13.4 394.2 44.2 0.370 0.071 526.3 119.7
# i=4 120.7 20.8 672.2 138.8 0.218 0.031 154.1 30.2
# i=5 68.3 4.1 301.0 21.8 0.437 0.041 304.7 25.4
# i=6 117.5 7.3 436.1 30.3 0.451 0.069 357.2 61.2
# i=7 175.1 24.6 553.7 84.3 0.268 0.072 106.9 36.2
# i=8 100.2 24.7 255.3 65.4 NA NA NA NA
# i=9 NA NA NA NA NA NA NA NA
Since "Observables" is not returned as a list, I cannot extract the numbers automatically. Is that possible?
I have the same type of dataset, but the output does not show me a probability of recapture. I have an open population, which is why I am trying to use this package.
Here's a look at a typical dataset:
head(CutthroatAL)
# id y1998 y1999 y2000 y2001 y2002 y2003 y2004 y2005 y2006
# 1 1 0 0 0 0 0 0 0 0 1
# 2 2 0 0 0 0 0 0 0 0 1
# 3 3 0 0 0 0 0 0 0 0 1
# 4 4 0 0 0 0 0 0 0 0 1
# 5 5 0 0 0 0 0 0 0 0 1
# 6 6 0 0 0 0 0 0 0 0 1
I also tried the mra package and its F.cjs.estim() function, but I don't have survival information.
I haven't found any function in Rcapture that lets me print a capture probability.
I'm trying to find the quantity pj on page 38 of the book Handbook of Capture-Recapture Analysis.
I haven't found it in the RMark package either.
So how can I estimate recapture probabilities in R?
Thanks,
If you just want to capture the "Observables" values in the summary, you can do it the same way the function does. If you look at the source for FSA:::summary.mrOpen, you can see that you can grab those values with
ex1$df[, c("m", "n", "R", "r", "z")]
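Once the observables are available as numbers, a capture probability can be computed by hand. The sketch below assumes that the quantity wanted is the usual Jolly-Seber estimate, the number of marked fish caught divided by the estimated number of marked fish at risk (m / M); whether this matches pj in the book is an assumption. The values are copied from the summary printed in the question rather than extracted from the mrOpen object.

```r
# m: marked fish caught at each sample; M: estimated marked fish in the population
# (both copied from the summary(ex1, verbose = TRUE) output in the question)
m <- c(0, 22, 94, 41, 58, 99, 91, 52, 18)
M <- c(NA, 36.6, 127.8, 120.7, 68.3, 117.5, 175.1, 100.2, NA)

# Jolly-Seber style capture probability; NA where M is not estimable
p <- m / M
names(p) <- paste0("i=", seq_along(p))

print(round(p, 3))
```

The first and last samples have no estimate because M is NA there.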
