Is it possible to fit a Partial Credit Model when one of the possible responses is never selected in one of the items?
I'm fitting a Partial Credit Model (PCM) with the ltm package.
Suppose my data contain 3 items, each scored 1, 2 or 3, like this:
my_data <- data.frame(
  X1 = c(1,1,3,1,1,3,1,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,1,1,3),
  X2 = c(1,1,2,3,2,3,2,3,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,2,1,1),
  X3 = c(2,1,2,2,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,3,1,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,2,1)
)
But it so happened that no one has chosen option 2 in the first item:
lapply(my_data, table)

$X1
 1  3
13 27

$X2
 1  2  3
 4 20 16

$X3
 1  2  3
17 14  9
Now, when I run ltm::gpcm() to fit the model and factor.scores() to examine person abilities, I get the following output:
library('ltm')
fit <- gpcm(my_data, constraint = 'rasch')
factor.scores(fit)
Call:
gpcm(data = my_data, constraint = "rasch")
Scoring Method: Empirical Bayes
Factor-Scores for observed response patterns:
X1 X2 X3 Obs Exp z1 se.z1
1 1 1 1 1 1.578 -1.414 0.744
2 1 1 2 2 0.486 -0.880 0.718
3 1 2 1 1 4.228 -0.880 0.718
4 1 2 2 3 2.209 -0.379 0.700
5 1 2 3 1 0.787 0.104 0.694
6 1 3 1 1 1.546 -0.379 0.700
7 1 3 2 3 1.343 0.104 0.694
8 1 3 3 1 0.793 0.591 0.705
9 2 1 1 1 1.159 -0.880 0.718
10 2 2 1 8 5.267 -0.379 0.700
11 2 2 2 5 4.573 0.104 0.694
12 2 2 3 2 2.701 0.591 0.705
13 2 3 1 5 3.201 0.104 0.694
14 2 3 2 1 4.607 0.591 0.705
15 2 3 3 5 4.597 1.107 0.737
It looks like X1 is treated as if it had two possible responses, "1" and "2", not "1" and "3"!
Is there any way to include the unobserved response "2" for X1?
Why is this important?
It's all about scoring. Look at lines 2 and 9 of the output above:
Line 2 is a respondent who scored 1, 1 and 2 (on X1, X2 and X3, respectively).
Line 9 is a respondent who scored 3, 1 and 1 (since X1 = 3 in the original dataset is recoded to X1 = 2 by the ltm package).
Those two people have:
- exactly the same person-ability score assigned (column z1),
- different raw scores (4 and 5, respectively),
which should not happen.
To be precise: I understand why this happens. My question is how to overcome this behaviour.
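One crude workaround to sketch here (my own suggestion, not something the ltm documentation endorses, and it does bias the item parameters slightly) is to append a single artificial respondent who uses the missing category, so that gpcm() keeps all three categories for X1, and then score only the real response patterns via the resp.patterns argument of factor.scores():

library(ltm)

# Append one artificial respondent who uses the unobserved category "2"
# on every item, so gpcm() sees three categories for each item.
# Caveat: this adds a small amount of bias to the item parameters.
aug_data <- rbind(my_data, c(2, 2, 2))

fit_aug <- gpcm(aug_data, constraint = 'rasch')

# Score only the original (real) response patterns.
factor.scores(fit_aug, resp.patterns = my_data)

Whether the resulting bias is acceptable depends on the sample size; with 40 respondents, one extra pattern carries non-trivial weight, so treat this as a diagnostic sketch rather than a definitive fix.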
Related
Flag run-length of grouped intervals
I have a dataframe grouped by grp:

df <- data.frame(
  v = rnorm(25),
  grp = c(rep("A", 10), rep("B", 15)),
  size = 2
)

I want to flag the run-length of intervals determined by size. For example, for grp == "A", size is 2, and the number of rows is 10. So the interval should have length 10/2 = 5. This code, however, creates intervals with length 2:

df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% size)

# A tibble: 25 × 4
# Groups:   grp [2]
        v grp    size interval
    <dbl> <chr> <dbl>    <dbl>
 1 -0.166 A         2        0
 2 -1.12  A         2        0
 3  0.941 A         2        1
 4 -0.913 A         2        1
 5  0.486 A         2        2
 6 -1.80  A         2        2
 7 -0.370 A         2        3
 8 -0.209 A         2        3
 9 -0.661 A         2        4
10 -0.177 A         2        4
# … with 15 more rows

How can I flag the correct run-length of the size-determined intervals? The desired output is this:

# A tibble: 25 × 4
# Groups:   grp [2]
        v grp    size interval
    <dbl> <chr> <dbl>    <dbl>
 1 -0.166 A         2        0
 2 -1.12  A         2        0
 3  0.941 A         2        0
 4 -0.913 A         2        0
 5  0.486 A         2        0
 6 -1.80  A         2        1
 7 -0.370 A         2        1
 8 -0.209 A         2        1
 9 -0.661 A         2        1
10 -0.177 A         2        1
# … with 15 more rows
If I interpreted your question correctly, this small change should do the trick:

df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% (n() / size))
You can use gl:

df %>%
  group_by(grp) %>%
  mutate(interval = gl(first(size), ceiling(n() / first(size)))[1:n()])

output

# A tibble: 26 × 4
# Groups:   grp [2]
         v grp    size interval
     <dbl> <chr> <dbl> <fct>
 1 -1.12   A         2 1
 2  3.04   A         2 1
 3  0.235  A         2 1
 4 -0.0333 A         2 1
 5 -2.73   A         2 1
 6 -0.0998 A         2 1
 7  0.976  A         2 2
 8  0.414  A         2 2
 9  0.912  A         2 2
10  1.98   A         2 2
11  1.17   A         2 2
12 -0.509  B         2 1
13  0.704  B         2 1
14 -0.198  B         2 1
15 -0.538  B         2 1
16 -2.86   B         2 1
17 -0.790  B         2 1
18  0.488  B         2 1
19  2.17   B         2 1
20  0.501  B         2 2
21  0.620  B         2 2
22 -0.966  B         2 2
23  0.163  B         2 2
24 -2.08   B         2 2
25  0.485  B         2 2
26  0.697  B         2 2
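For comparison, here is a base-R sketch of the same idea (my own illustration, assuming a constant size per group and letting the last chunk absorb any remainder); the helper chunk_index() is hypothetical, not part of any package:

# Split n rows into `size` consecutive chunks of (roughly) equal width,
# e.g. n = 10, size = 2 gives 0 0 0 0 0 1 1 1 1 1.
chunk_index <- function(n, size) {
  rep(seq_len(size) - 1, each = ceiling(n / size))[1:n]
}

# Apply per group with ave(); i holds the row indices of one group.
df$interval <- ave(seq_len(nrow(df)), df$grp,
                   FUN = function(i) chunk_index(length(i), df$size[i[1]]))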
Sum groupwise but using only individual's first entry
Consider the following sample data. The data has 2 individuals per group and each individual has 2 entries.

rm(list = ls())
set.seed(1234)
G <- 4      # Suppose you have 4 groups
nTot <- 8   # We have 2 individuals per group, so in total we have 8 individuals
group <- rep(1:G, rep(4, G))               # Group identifier
individualID <- rep(1:nTot, rep(2, nTot))  # 2 individuals per group, each with 2 entries
n <- 2 * nTot                              # We have 16 entries in total
X <- rbinom(n, 1, 0.5)
Y <- runif(n, 0, 1)
Z <- runif(n, 0, 4)
df1 <- round(data.frame(group, individualID, X, Y, Z), 3)

> df1
   group individualID X     Y     Z
1      1            1 0 0.286 1.219
2      1            1 1 0.267 2.029
3      1            2 1 0.187 0.724
4      1            2 1 0.232 3.039
5      2            3 1 0.317 0.805
6      2            3 1 0.303 1.035
7      2            4 0 0.159 3.969
8      2            4 0 0.040 3.229
9      3            5 1 0.219 2.213
10     3            5 1 0.811 2.586
11     3            6 1 0.526 1.247
12     3            6 1 0.915 2.487
13     4            7 0 0.831 1.319
14     4            7 1 0.046 2.008
15     4            8 0 0.456 2.708
16     4            8 1 0.265 1.940

The code below computes the sum of Func per group using the split() function:

Func <- X * Y + Z
GroupSum <- as.numeric(sapply(split(Func, group), sum))  # Group sum of X*Y + Z

I would like code that will split the data and group-sum Func using only the first entry per individual, i.e. I should end up with a vector of 4 values, as we have 4 groups.
We may use a group by approach, i.e. grouped by 'group', slice the first row, ungroup, and then summarise to get the sum of X multiplied by 'Y' and added to 'Z':

library(dplyr)
df1 %>%
  group_by(group) %>%
  slice_head(n = 1) %>%
  summarise(out = sum(X * Y + Z, na.rm = TRUE))

-output
# A tibble: 4 × 2
  group   out
  <dbl> <dbl>
1     1  2.19
2     2  1.31
3     3  1.50
4     4  2.52

Or we can use duplicated in base R:

aggregate(out ~ group,
          transform(subset(df1, !duplicated(group)), out = X * Y + Z),
          FUN = sum)

  group   out
1     1 2.194
2     2 1.311
3     3 1.501
4     4 2.522
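If "first entry per individual" is meant literally (i.e. keep one row per individual, so each group sum combines two individuals, not one row), a base-R sketch under that reading of the question is:

# Keep the first entry of each individual, then sum X*Y + Z within each group.
first_rows <- df1[!duplicated(df1$individualID), ]
GroupSum <- with(first_rows, tapply(X * Y + Z, group, sum))
GroupSum

This still returns a vector of 4 values (one per group), but each value now aggregates both individuals' first entries.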
boot.ci error message in R
I get

All values of t are equal to 0.999999999999995 \n Cannot calculate confidence intervals

My data is this:

$data
    y x2 x3
1   2  1  1
2   0  2  1
3  -2  3  1
4  -4  4  1
5  -6  5  1
6   5  1  2
7   3  2  2
8   1  3  2
9  -1  4  2
10 -3  5  2
11  8  1  3
12  6  2  3
13  4  3  3
14  2  4  3
15  0  5  3
16 11  1  4
17  9  2  4
18  7  3  4
19  5  4  4
20  3  5  4
21 14  1  5
22 12  2  5
23 10  3  5
24  8  4  5
25  6  5  5

I did this:

b <- boot(data = d, statistic = lmcoefs, R = 5)
> b
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = d, statistic = lmcoefs, R = 5)

Bootstrap Statistics :
    original          bias     std. error
t1*        1 -9.436896e-15 4.831384e-15
t2*       -2  1.110223e-15 1.169693e-15
t3*        3  1.776357e-15 8.881784e-16

Then I calculated the confidence intervals using boot.ci, which gives:

c <- boot.ci(b, conf = 0.95, type = "all")
[1] "All values of t are equal to 0.999999999999995 \n Cannot calculate confidence intervals"

And I get this error I can't get rid of. I don't understand why either.
The problem was that I did not run enough bootstrap replications (only R = 5). With so few replicates there was almost no variation in the bootstrap statistics, and that's why I encountered this error.
Sort one column while keeping the other one fixed (within the other)
With

df <- data.frame(x = rep(1:3, each = 3),
                 y = rep(1:3, 3),
                 z = round(rnorm(9), 2))

df
  x y     z
1 1 1  0.55
2 1 2  0.99
3 1 3 -2.32
4 2 1 -0.25
5 2 2  1.20
6 2 3 -0.38
7 3 1  1.07
8 3 2 -0.98
9 3 3 -1.09

Is there a way to sort z within each x so that:

df.sort
  x y     z
1 1 3 -2.32
2 1 1  0.55
3 1 2  0.99
4 2 3 -0.38
5 2 1 -0.25
6 2 2  1.20
7 3 3 -1.09
8 3 2 -0.98
9 3 1  1.07

Thanks!
If you want to sort by z within each value of x (what your example shows, though not really what your question seems to lead towards), you can use plyr and arrange:

library(plyr)
dfa <- arrange(df, x, z)

What you are doing here is ordering first by x, then by z.
You could create a new data.frame on the fly:

data.frame(df$x, df[order(df$z), c("y", "z")])
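A base-R alternative worth sketching here (my addition): order() accepts multiple sort keys, so ordering by x first and z second keeps the x blocks fixed while sorting z within each of them:

# Sort by x, then by z within x; reset the row names for a clean index.
df.sort <- df[order(df$x, df$z), ]
rownames(df.sort) <- NULL
df.sort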
How can I create a distance matrix containing the mean absolute scores between each row?
Given the matrix

df <- read.table(text="
X1 X2 X3 X4 X5
1 2 3 2 1
2 3 4 4 3
3 4 4 6 2
4 5 5 5 4
2 3 3 3 6
5 6 2 8 4", header = TRUE)

I want to create a distance matrix containing the mean absolute difference between each pair of columns, averaging over rows. For example, the distance between X1 and X3 should be 1.67, given that:

(abs(1-3) + abs(2-4) + abs(3-4) + abs(4-5) + abs(2-3) + abs(5-2)) / 6 = 10/6 = 1.67

I have tried using the designdist() function in the vegan package this way:

designdist(t(df), method = "abs(A-B)/6", terms = "minimum")

The resulting distance for columns 1 and 3 is 0.666. The problem with this function is that it sums all the values in each column first and then takes the difference. But I need to take the absolute difference row by row, sum those, and then divide by N.
Here's a one-line solution. It takes advantage of dist()'s method argument to calculate the L1 norm, a.k.a. city-block or Manhattan distance, between each pair of rows in your data.frame:

as.matrix(dist(df, "manhattan", diag = TRUE, upper = TRUE) / nrow(df))

To make it reproducible:

df <- read.table(text="
X1 X2 X3 X4 X5
1 2 3 2 1
2 3 4 4 3
3 4 4 6 2
4 5 5 5 4
2 3 3 3 6
5 6 2 8 4", header = TRUE)

dmat <- as.matrix(dist(df, "manhattan", diag = TRUE, upper = TRUE) / nrow(df))
print(dmat, digits = 3)
#      1     2     3    4     5    6
# 1 0.00 1.167 1.667 2.33 1.333 3.00
# 2 1.17 0.000 0.833 1.17 0.833 2.17
# 3 1.67 0.833 0.000 1.00 1.667 1.67
# 4 2.33 1.167 1.000 0.00 1.667 1.33
# 5 1.33 0.833 1.667 1.67 0.000 2.33
# 6 3.00 2.167 1.667 1.33 2.333 0.00
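Note that dist() operates on the rows of its input. If the goal is the distance between columns (X1 vs X3, as in the question's worked example), the same one-liner applies to the transposed data, and dividing by nrow(df) then genuinely gives the mean over the 6 values in each column:

# Manhattan distance between columns, divided by the number of rows,
# i.e. the mean absolute difference between each pair of columns.
cmat <- as.matrix(dist(t(df), "manhattan", diag = TRUE, upper = TRUE)) / nrow(df)
cmat["X1", "X3"]   # 10/6 = 1.67, matching the hand calculation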