Is it possible to fit a Partial Credit Model when one of the possible responses is never selected in one of the items?

I'm fitting a Partial Credit Model (PCM) with the ltm package.
Suppose my data contains 3 items, each scored 1, 2, or 3, like this:
my_data <- data.frame(
  X1 = c(1,1,3,1,1,3,1,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,1,1,3),
  X2 = c(1,1,2,3,2,3,2,3,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,2,1,1),
  X3 = c(2,1,2,2,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,3,1,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,2,1)
)
But it so happens that no one chose option 2 on the first item:
lapply(my_data, table)
$X1
1 3
13 27
$X2
1 2 3
4 20 16
$X3
1 2 3
17 14 9
Now, when I run ltm::gpcm() to fit the model and factor.scores() to examine person abilities, I get the following output:
library('ltm')
fit <- gpcm(my_data, constraint = 'rasch')
factor.scores(fit)
Call:
gpcm(data = my_data, constraint = "rasch")
Scoring Method: Empirical Bayes
Factor-Scores for observed response patterns:
X1 X2 X3 Obs Exp z1 se.z1
1 1 1 1 1 1.578 -1.414 0.744
2 1 1 2 2 0.486 -0.880 0.718
3 1 2 1 1 4.228 -0.880 0.718
4 1 2 2 3 2.209 -0.379 0.700
5 1 2 3 1 0.787 0.104 0.694
6 1 3 1 1 1.546 -0.379 0.700
7 1 3 2 3 1.343 0.104 0.694
8 1 3 3 1 0.793 0.591 0.705
9 2 1 1 1 1.159 -0.880 0.718
10 2 2 1 8 5.267 -0.379 0.700
11 2 2 2 5 4.573 0.104 0.694
12 2 2 3 2 2.701 0.591 0.705
13 2 3 1 5 3.201 0.104 0.694
14 2 3 2 1 4.607 0.591 0.705
15 2 3 3 5 4.597 1.107 0.737
It looks like X1 is treated as if it had the two possible responses "1" and "2", not "1" and "3"!
Is there any way to include the unobserved response "2" for X1?
Why is this important?
It's all about scoring. Look at rows 2 and 9 above:
Row 2 is a respondent who scored 1, 1 and 2 (on X1, X2 and X3, respectively).
Row 9 is a respondent who scored 3, 1, 1 (since X1 = 3 in the original dataset is recoded to X1 = 2 by the ltm package).
Those two people have:
exactly the same person-ability score assigned (column z1),
different raw scores (4 and 5, respectively),
which should not happen.
To be precise: I understand why this happens. My question is how to overcome this behaviour.
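For what it's worth, the conflation is easy to demonstrate programmatically from the object returned by factor.scores() (a small sketch using its score.dat component):

fs <- factor.scores(fit)$score.dat
# gpcm() recoded X1 from {1, 3} to {1, 2}; map it back before computing
# raw scores in the original coding
fs$raw <- ifelse(fs$X1 == 2, 3, 1) + fs$X2 + fs$X3
fs[c(2, 9), c("X1", "X2", "X3", "raw", "z1")]
# rows 2 and 9: raw scores 4 and 5, but the same z1 of -0.880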

Related

Flag run-length of grouped intervals

I have a dataframe grouped by grp:
df <- data.frame(
  v = rnorm(25),
  grp = c(rep("A", 10), rep("B", 15)),
  size = 2
)
I want to flag the run-length of intervals determined by size. For example, for grp == "A", size is 2, and the number of rows is 10. So the interval should have length 10/2 = 5. This code, however, creates intervals with length 2:
df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% size)
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 1
4 -0.913 A 2 1
5 0.486 A 2 2
6 -1.80 A 2 2
7 -0.370 A 2 3
8 -0.209 A 2 3
9 -0.661 A 2 4
10 -0.177 A 2 4
# … with 15 more rows
How can I flag the correct run-length of the size-determined intervals? The desired output is this:
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 0
4 -0.913 A 2 0
5 0.486 A 2 0
6 -1.80 A 2 1
7 -0.370 A 2 1
8 -0.209 A 2 1
9 -0.661 A 2 1
10 -0.177 A 2 1
# … with 15 more rows
If I interpreted your question correctly, this small change should do the trick:
df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% (n() / size))
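This works because, within each group, n() / size is the number of rows per interval (10 / 2 = 5 for group A), so integer-dividing the zero-based row number by it yields size intervals numbered from 0.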
You can use gl(), which generates a factor with first(size) levels, each repeated ceiling(n() / first(size)) times; indexing with [1:n()] trims it to the group's length:
df %>%
  group_by(grp) %>%
  mutate(interval = gl(first(size), ceiling(n() / first(size)))[1:n()])
Output:
# A tibble: 26 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <fct>
1 -1.12 A 2 1
2 3.04 A 2 1
3 0.235 A 2 1
4 -0.0333 A 2 1
5 -2.73 A 2 1
6 -0.0998 A 2 1
7 0.976 A 2 2
8 0.414 A 2 2
9 0.912 A 2 2
10 1.98 A 2 2
11 1.17 A 2 2
12 -0.509 B 2 1
13 0.704 B 2 1
14 -0.198 B 2 1
15 -0.538 B 2 1
16 -2.86 B 2 1
17 -0.790 B 2 1
18 0.488 B 2 1
19 2.17 B 2 1
20 0.501 B 2 2
21 0.620 B 2 2
22 -0.966 B 2 2
23 0.163 B 2 2
24 -2.08 B 2 2
25 0.485 B 2 2
26 0.697 B 2 2
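To sanity-check either approach, you can tabulate the interval sizes per group (a quick sketch):

df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% (n() / size)) %>%
  count(interval)
# grp A: two intervals of 5 rows; grp B: intervals of 8 and 7 rows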

Sum groupwise but using only individual's first entry

Consider the following sample data. The data has 2 individuals per group and each individual has 2 entries.
rm(list = ls()); set.seed(1234)
G <- 4         # suppose you have 4 groups
nTot <- 8      # 2 individuals per group, so 8 individuals in total
group <- rep(1:G, rep(4, G))               # group identifier
individualID <- rep(1:nTot, rep(2, nTot))  # 2 individuals per group, each with 2 entries
n <- 2 * nTot  # 16 entries in total
X <- rbinom(n, 1, 0.5)
Y <- runif(n, 0, 1)
Z <- runif(n, 0, 4)
df1 <- round(data.frame(group, individualID, X, Y, Z), 3)
> df1
group individualID X Y Z
1 1 1 0 0.286 1.219
2 1 1 1 0.267 2.029
3 1 2 1 0.187 0.724
4 1 2 1 0.232 3.039
5 2 3 1 0.317 0.805
6 2 3 1 0.303 1.035
7 2 4 0 0.159 3.969
8 2 4 0 0.040 3.229
9 3 5 1 0.219 2.213
10 3 5 1 0.811 2.586
11 3 6 1 0.526 1.247
12 3 6 1 0.915 2.487
13 4 7 0 0.831 1.319
14 4 7 1 0.046 2.008
15 4 8 0 0.456 2.708
16 4 8 1 0.265 1.940
The code below computes the sum of Func per group using the split() function.
Func <- X * Y + Z
GroupSum <- as.numeric(sapply(split(Func, group), sum))  # group sum of X*Y+Z
I would like code that splits the data and group-sums Func using only the first entry per individual, i.e. I should end up with a vector of 4 values since we have 4 groups.
We may use a group-by approach: group by 'group', slice the first row, and then summarise to get the sum of X multiplied by 'Y' and added to 'Z':
library(dplyr)
df1 %>%
  group_by(group) %>%
  slice_head(n = 1) %>%
  summarise(out = sum(X * Y + Z, na.rm = TRUE))
Output:
# A tibble: 4 × 2
group out
<dbl> <dbl>
1 1 2.19
2 2 1.31
3 3 1.50
4 4 2.52
Or use duplicated() in base R:
aggregate(out ~ group,
          transform(subset(df1, !duplicated(group)), out = X * Y + Z),
          FUN = sum)
group out
1 1 2.194
2 2 1.311
3 3 1.501
4 4 2.522
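Note that both answers keep only the first row per group, not per individual. If you literally want each individual's first entry (two rows per group here), a variation on the asker's split() approach would be (a sketch, not from the answers above):

# keep each individual's first entry, then group-sum X*Y + Z
first <- df1[!duplicated(df1$individualID), ]
as.numeric(sapply(split(with(first, X * Y + Z), first$group), sum))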

boot.ci error message in R

I get the error "All values of t are equal to 0.999999999999995 \n Cannot calculate confidence intervals".
My data is this:
$data
y x2 x3
1 2 1 1
2 0 2 1
3 -2 3 1
4 -4 4 1
5 -6 5 1
6 5 1 2
7 3 2 2
8 1 3 2
9 -1 4 2
10 -3 5 2
11 8 1 3
12 6 2 3
13 4 3 3
14 2 4 3
15 0 5 3
16 11 1 4
17 9 2 4
18 7 3 4
19 5 4 4
20 3 5 4
21 14 1 5
22 12 2 5
23 10 3 5
24 8 4 5
25 6 5 5
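The statistic function lmcoefs is not shown in the question; a hypothetical reconstruction consistent with the three coefficients reported below would be:

library(boot)
# refit the regression on each bootstrap resample and return its coefficients
lmcoefs <- function(data, indices) {
  coef(lm(y ~ x2 + x3, data = data[indices, ]))
}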
I did this:
b <- boot(data = d, statistic = lmcoefs, R = 5)
> b
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = d, statistic = lmcoefs, R = 5)
Bootstrap Statistics :
original bias std. error
t1* 1 -9.436896e-15 4.831384e-15
t2* -2 1.110223e-15 1.169693e-15
t3* 3 1.776357e-15 8.881784e-16
Then I calculated the confidence intervals using boot.ci(), which gives:
c <- boot.ci(b, conf = 0.95, type = "all")
[1] "All values of t are equal to 0.999999999999995 \n Cannot calculate confidence intervals"
This is the error I can't get rid of, and I don't understand why it occurs.
The problem is that there is no variation in the bootstrap statistics: in this data set y is exactly 1 - 2*x2 + 3*x3, so every resample reproduces the same coefficients (the bias and std. error columns above are floating-point noise on the order of 1e-15), and boot.ci() cannot build an interval from constant values. On top of that, R = 5 is far too few replications for bootstrap confidence intervals in any case.
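A minimal sketch of a fix, assuming the hypothetical lmcoefs above: make the data non-degenerate (here by adding noise to y) and use a sensible number of replications:

set.seed(1)
d$y <- d$y + rnorm(nrow(d), sd = 0.5)  # break the exact linear relationship
b <- boot(data = d, statistic = lmcoefs, R = 999)
boot.ci(b, conf = 0.95, type = "perc", index = 1)  # interval for the intercept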

Sort one column while keeping the other one fixed (within the other)

With
df <- data.frame(x = rep(1:3, each = 3),
                 y = rep(1:3, 3),
                 z = round(rnorm(9), 2))
df
x y z
1 1 1 0.55
2 1 2 0.99
3 1 3 -2.32
4 2 1 -0.25
5 2 2 1.20
6 2 3 -0.38
7 3 1 1.07
8 3 2 -0.98
9 3 3 -1.09
Is there a way to sort z within each x so that:
df.sort
x y z
1 1 3 -2.32
2 1 1 0.55
3 1 2 0.99
4 2 3 -0.38
5 2 1 -0.25
6 2 2 1.20
7 3 3 -1.09
8 3 2 -0.98
9 3 1 1.07
Thanks!
If you want to sort by z within each value of x (which is what your example shows, not quite what your question seems to ask), you can use plyr and arrange:
library(plyr)
dfa <- arrange(df, x, z)
What you are doing here is ordering first by x, then by z.
You could create a new data.frame on the fly (ordering by x first, so the x blocks stay aligned):
data.frame(x = df$x, df[order(df$x, df$z), c("y", "z")])
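For completeness, a dplyr equivalent (a sketch; .by_group requires a reasonably recent dplyr):

library(dplyr)
df %>%
  group_by(x) %>%
  arrange(z, .by_group = TRUE) %>%
  ungroup()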

How can I create a distance matrix containing the mean absolute differences between each pair of columns?

Given the matrix,
df <- read.table(text="
X1 X2 X3 X4 X5
1 2 3 2 1
2 3 4 4 3
3 4 4 6 2
4 5 5 5 4
2 3 3 3 6
5 6 2 8 4", header=T)
I want to create a distance matrix containing the mean absolute difference between each pair of columns. For example, the distance between X1 and X3 should be 1.67, given that:
(abs(1 - 3) + abs(2 - 4) + abs(3 - 4) + abs(4 - 5) + abs(2 - 3) + abs(5 - 2)) / 6 = 10 / 6 = 1.67
I have tried using the designdist() function in the vegan package this way:
designdist(t(df), method = "abs(A-B)/6", terms = "minimum")
The resulting distance for columns 1 and 3 is 0.666. The problem with this approach is that designdist() first sums all the values in each column and then takes the absolute difference of the sums, whereas I need to sum the absolute differences element-wise and then divide by N.
Here's a one-line solution. It takes advantage of dist()'s method argument to calculate the L1 norm (a.k.a. city block or Manhattan distance); transposing df first makes dist() work on pairs of columns rather than rows:
as.matrix(dist(t(df), "manhattan", diag = TRUE, upper = TRUE) / nrow(df))
To make it reproducible:
df <- read.table(text="
X1 X2 X3 X4 X5
1 2 3 2 1
2 3 4 4 3
3 4 4 6 2
4 5 5 5 4
2 3 3 3 6
5 6 2 8 4", header=T)
dmat <- as.matrix(dist(t(df), "manhattan", diag = TRUE, upper = TRUE) / nrow(df))
print(dmat, digits = 3)
#      X1    X2   X3    X4   X5
# X1 0.00 1.000 1.67 1.833 1.17
# X2 1.00 0.000 1.00 0.833 1.50
# X3 1.67 1.000 0.00 1.500 1.83
# X4 1.83 0.833 1.50 0.000 2.33
# X5 1.17 1.500 1.83 2.333 0.00
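If you would rather stay with vegan::designdist(), note that |x - y| = x + y - 2*min(x, y), so the Manhattan distance can be expressed with terms = "minimum" (a sketch, using the same literal /6 as in the question):

library(vegan)
# A + B - 2*J sums x + y - 2*min(x, y) = |x - y| over rows; divide by 6 for the mean
designdist(t(df), method = "(A+B-2*J)/6", terms = "minimum")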
