I'm fitting a Partial Credit Model (PCM) with the ltm package.
Suppose my data contain 3 items, each scored 1, 2 or 3, like this:
my_data<-data.frame(
X1 = c(1,1,3,1,1,3,1,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,3,1,1,3,3,3,3,3,3,3,3,1,3,3,3,1,1,3),
X2 = c(1,1,2,3,2,3,2,3,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,3,3,3,3,3,2,2,2,2,2,2,2,2,3,2,1,1),
X3 = c(2,1,2,2,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,3,1,3,3,2,3,1,2,1,1,1,3,2,2,1,1,1,2,2,1)
)
But it so happens that nobody chose option 2 on the first item:
lapply(my_data, table)
$X1

 1  3
13 27

$X2

 1  2  3
 4 20 16

$X3

 1  2  3
17 14  9
Now, when I run ltm::gpcm() to fit the model and factor.scores() to examine person abilities, I get the following output:
library('ltm')
fit<-gpcm(my_data, constraint='rasch')
factor.scores(fit)
Call:
gpcm(data = my_data, constraint = "rasch")
Scoring Method: Empirical Bayes
Factor-Scores for observed response patterns:
   X1 X2 X3 Obs   Exp     z1 se.z1
1   1  1  1   1 1.578 -1.414 0.744
2   1  1  2   2 0.486 -0.880 0.718
3   1  2  1   1 4.228 -0.880 0.718
4   1  2  2   3 2.209 -0.379 0.700
5   1  2  3   1 0.787  0.104 0.694
6   1  3  1   1 1.546 -0.379 0.700
7   1  3  2   3 1.343  0.104 0.694
8   1  3  3   1 0.793  0.591 0.705
9   2  1  1   1 1.159 -0.880 0.718
10  2  2  1   8 5.267 -0.379 0.700
11  2  2  2   5 4.573  0.104 0.694
12  2  2  3   2 2.701  0.591 0.705
13  2  3  1   5 3.201  0.104 0.694
14  2  3  2   1 4.607  0.591 0.705
15  2  3  3   5 4.597  1.107 0.737
It looks like X1 is treated as if it had two possible responses, "1" and "2", rather than "1" and "3"!
Is there any way to include the unobserved response "2" for X1?
Why is this important?
It's all about scoring. Look at lines 2 and 9 above:
Line 2 is a respondent who scored 1, 1 and 2 (on X1, X2 and X3, respectively).
Line 9 is a respondent who scored 3, 1 and 1 (since X1=3 in the original dataset is recoded to X1=2 by the ltm package).
Those two people have:
exactly the same person-ability score (column z1),
different raw scores (4 and 5, respectively),
which should not happen.
To be precise: I understand why this happens. My question is how to work around this behaviour.
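To make the tie concrete, here is a minimal base-R sketch of the two patterns under both codings. It assumes (as the output above suggests) that the observed categories of each item are renumbered consecutively, so X1's 1/3 become 1/2. The names patterns, observed and recoded are made up for this illustration and are not ltm objects.

```r
# The two respondents from lines 2 and 9, in the original 1/2/3 coding.
patterns <- data.frame(X1 = c(1, 3), X2 = c(1, 1), X3 = c(2, 1))

# Categories actually observed per item (X1 never takes the value 2).
observed <- list(X1 = c(1, 3), X2 = 1:3, X3 = 1:3)

# Assumed recoding: each response is replaced by its position
# among the observed categories of its item.
recoded <- as.data.frame(
  Map(function(x, cats) match(x, cats), patterns, observed)
)

raw_original <- unname(rowSums(patterns))  # sums in the original coding
raw_recoded  <- unname(rowSums(recoded))   # sums after consecutive renumbering

print(raw_original)
print(raw_recoded)
```

The original sums are 4 and 5, while the recoded sums are both 4. Under the Rasch-type constraint the sum score determines the ability estimate, which is consistent with both patterns receiving the same z1.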
I have a data frame grouped by grp:
library(dplyr)
df <- data.frame(
  v = rnorm(25),
  grp = c(rep("A", 10), rep("B", 15)),
  size = 2)
I want to split each group into size consecutive intervals and label every row with its interval. For example, for grp == "A", size is 2 and the number of rows is 10, so each interval should have length 10/2 = 5. This code, however, creates intervals of length 2:
df %>%
group_by(grp) %>%
mutate(
interval = (row_number() -1) %/% size)
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 1
4 -0.913 A 2 1
5 0.486 A 2 2
6 -1.80 A 2 2
7 -0.370 A 2 3
8 -0.209 A 2 3
9 -0.661 A 2 4
10 -0.177 A 2 4
# … with 15 more rows
How can I label the rows so that each group is split into size intervals of equal length? The desired output is this:
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 0
4 -0.913 A 2 0
5 0.486 A 2 0
6 -1.80 A 2 1
7 -0.370 A 2 1
8 -0.209 A 2 1
9 -0.661 A 2 1
10 -0.177 A 2 1
# … with 15 more rows
If I interpreted your question correctly, this small change should do the trick: divide by the interval length n()/size instead of by size.
df %>%
  group_by(grp) %>%
  mutate(
    interval = (row_number() - 1) %/% (n() / size))
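A quick base-R check of the arithmetic, outside dplyr: dividing the zero-based row index by the interval length n/size (rather than by size) yields size blocks per group. interval_id is a hypothetical helper written only for this sketch.

```r
# Hypothetical helper: zero-based interval label for each of n rows,
# splitting the rows into `size` consecutive blocks.
interval_id <- function(n, size) {
  (seq_len(n) - 1) %/% (n / size)
}

print(interval_id(10, 2))         # group A: 10 rows, 2 blocks
print(table(interval_id(15, 2)))  # group B: 15 rows, 2 blocks
```

For group A this gives five 0s followed by five 1s. For group B, 15 is not a multiple of 2, so the first block gets 8 rows and the second 7.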
You can use gl:
df %>%
group_by(grp) %>%
mutate(interval = gl(first(size), ceiling(n() / first(size)))[1:n()])
Output (from a run with 11 rows in group A and 15 in group B):
# A tibble: 26 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <fct>
1 -1.12 A 2 1
2 3.04 A 2 1
3 0.235 A 2 1
4 -0.0333 A 2 1
5 -2.73 A 2 1
6 -0.0998 A 2 1
7 0.976 A 2 2
8 0.414 A 2 2
9 0.912 A 2 2
10 1.98 A 2 2
11 1.17 A 2 2
12 -0.509 B 2 1
13 0.704 B 2 1
14 -0.198 B 2 1
15 -0.538 B 2 1
16 -2.86 B 2 1
17 -0.790 B 2 1
18 0.488 B 2 1
19 2.17 B 2 1
20 0.501 B 2 2
21 0.620 B 2 2
22 -0.966 B 2 2
23 0.163 B 2 2
24 -2.08 B 2 2
25 0.485 B 2 2
26 0.697 B 2 2
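For readers unfamiliar with gl: gl(n, k) builds a factor with n levels, each repeated k times in a row, and the [1:n()] subscript trims the surplus when the group size is not a multiple of n. A small base-R sketch for a group of 11 rows and 2 intervals (the situation of group A in the output above); n_rows and n_int are names chosen for this illustration.

```r
n_rows <- 11  # rows in the group
n_int  <- 2   # number of intervals (the `size` column)

# 2 levels, each repeated ceiling(11 / 2) = 6 times, gives 12 labels ...
labels <- gl(n_int, ceiling(n_rows / n_int))

# ... so keep only the first 11.
interval <- labels[seq_len(n_rows)]

print(table(interval))
```

The result is a factor with labels starting at 1 (six 1s and five 2s here). If the 0-based numeric labels from the question are needed, as.integer(interval) - 1 converts it.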
I have created a tibble thus:
library(tidyverse)
set.seed(68)
a <- c(1, 2, 3, 4, 5)
b <- runif(5)
c <- c(1, 3, 3, 3, 1)
tib <- tibble(a, b, c)
which produces this
tib
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 0.924 1
2 2 0.661 3
3 3 0.402 3
4 4 0.637 3
5 5 0.353 1
I would like to add another column, d, holding the value of b from the row whose a equals the value in column c. The resulting data frame should look like this:
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
Thanks for looking!
Use c to index the desired element of b (this works because a is simply the row number here):
tib %>% mutate(d = b[c])
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
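Note that b[c] treats c as a row position, which coincides with the value of a only because a is 1, 2, ..., 5 in this tibble. If a held arbitrary keys, one option is to look the position up with match() first. The values below are made up for illustration and are not the tibble from the question.

```r
# Made-up example in which `a` is not simply the row number.
a <- c(10, 20, 30, 40, 50)
b <- c(0.9, 0.6, 0.4, 0.7, 0.3)
c <- c(10, 30, 30, 30, 10)

# match(c, a) gives, for each element of c, the position of the matching a;
# indexing b with those positions pulls the corresponding b values.
d <- b[match(c, a)]
print(d)
```

Inside mutate() the same expression would read mutate(d = b[match(c, a)]).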
Spread row values into columns
My data look like this:
> head(df2)
ID fungi Conc Abs date_no
1 1 R3 2.500000 0.209 0
22 1 R3 1.250000 0.153 0
43 1 R3 0.625000 0.159 0
64 1 R3 0.312500 0.164 0
85 1 R3 0.156250 0.157 0
106 1 R3 0.078125 0.170 0
I used this call, which spread the date_no column into three columns but didn't populate them correctly:
separate_DF <- spread(df2, "date_no", "Abs")
What I get is this...
> head(df3)
ID fungi Conc date_no_0 date_no_1 date_no_3
1 1 R3 0.01953125 0.162 NA NA
2 1 R3 0.03906253 0.169 NA NA
3 1 R3 0.07812500 0.170 NA NA
4 1 R3 0.15625000 0.157 NA NA
5 1 R3 0.31250000 0.164 NA NA
6 1 R3 0.62500000 0.159 NA NA
What I want is for the three date columns to be populated by the Abs values, with each fungus at each concentration on its own row.
Try this one,
library(tidyr)
txt <- "fungi date Abs Conc
1 1 x 2.5
1 2 x 2.5
1 3 x 2.5
2 1 x 2.5
2 2 x 2.5
2 3 x 2.5
"
date_df <- read.table(textConnection(txt), header = TRUE)
print(spread(date_df, date, Abs, sep=""))
Result:
fungi Conc date1 date2 date3
1 1 2.5 x x x
2 2 2.5 x x x
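In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Here is a sketch of the same reshaping on the toy data from this answer, assuming a tidyr version that provides pivot_wider(); the names_prefix argument reproduces the date1, date2, date3 column names.

```r
library(tidyr)

# Same toy data as above: one row per fungi/date combination.
date_df <- data.frame(
  fungi = c(1, 1, 1, 2, 2, 2),
  date  = c(1, 2, 3, 1, 2, 3),
  Abs   = "x",
  Conc  = 2.5
)

# Every column not named in names_from/values_from identifies a row,
# so each fungi/Conc pair ends up on its own row.
wide <- pivot_wider(date_df,
                    names_from   = date,
                    values_from  = Abs,
                    names_prefix = "date")
print(wide)
```

One thing that may be worth checking in the original data: if an extra identifier column takes different values for the different dates, those dates cannot share a row, and the remaining date columns are filled with NA.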
I can't get my head around what must be a simple task: how do I get a group label as a consecutive number?
library(dplyr)
set.seed(1)
df <- data.frame(id = sample(c('a','b'), 20, T),
name = sample(c('N1', 'N2', 'N3'), 20, T),
val = runif(20)) %>%
group_by(id) %>%
arrange(id, name)
What I want is a label group_no that numbers the distinct values of the variable name consecutively within each id group. I can't find a solution in the dplyr package itself. Something like this:
# A tibble: 20 x 4
# Groups: id [2]
id name val group_no
<fct> <fct> <dbl> <int>
1 a N1 0.647 1
2 a N1 0.530 1
3 a N1 0.245 1
4 a N2 0.693 2
5 a N2 0.478 2
6 a N2 0.861 2
7 a N3 0.821 3
8 a N3 0.0995 3
9 a N3 0.662 3
10 b N1 0.553 1
11 b N1 0.0233 1
12 b N1 0.519 1
13 b N2 0.783 2
14 b N2 0.789 2
15 b N2 0.477 2
16 b N2 0.438 2
17 b N2 0.407 2
18 b N3 0.732 3
19 b N3 0.0707 3
20 b N3 0.316 3
Note that the values of name could be anything and are certainly not normally suffixed by a number as in the example (otherwise I could do sub("^N", "", df$name)).
I am looking for something a little different than the 1:n() solution in SO posts such as here.
I think in this case something as simple as:
df %>%
  mutate(group_no = as.integer(name))
will work, since name is a factor here and every level occurs in each id group.
# A tibble: 20 x 4
# Groups: id [2]
id name val group_no
<fct> <fct> <dbl> <int>
1 a N1 0.647 1
2 a N1 0.530 1
3 a N1 0.245 1
4 a N2 0.693 2
5 a N2 0.478 2
6 a N2 0.861 2
7 a N3 0.821 3
8 a N3 0.0995 3
9 a N3 0.662 3
10 b N1 0.553 1
11 b N1 0.0233 1
12 b N1 0.519 1
13 b N2 0.783 2
14 b N2 0.789 2
15 b N2 0.477 2
16 b N2 0.438 2
17 b N2 0.407 2
18 b N3 0.732 3
19 b N3 0.0707 3
20 b N3 0.316 3
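A caveat on the approach above, shown in base R: as.integer() returns the internal factor codes, so it depends on name being a factor and on the same levels occurring in every group. The vectors below are made up for illustration.

```r
# Character input: as.integer() cannot parse "N1" and yields NA (with a warning).
chr <- c("N1", "N1", "N2", "N3")
codes_chr <- suppressWarnings(as.integer(chr))

# Factor input: the codes are the positions of the values among the levels.
fct <- factor(chr)
codes_fct <- as.integer(fct)

# A group that lacks the first level keeps the global codes,
# so its numbering starts at 2 instead of 1.
codes_subset <- as.integer(fct[fct != "N1"])

print(codes_chr)
print(codes_fct)
print(codes_subset)
```

Re-creating the factor within each group, for example as.integer(factor(name)) inside a grouped mutate(), should avoid both issues, because factor() drops unused levels and also accepts character input.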
We can do
df %>%
group_by(id) %>%
mutate(group_no = cumsum(c(TRUE, name[-1] != name[-n()])))
Or with match
df %>%
group_by(id) %>%
mutate(group_no = match(name, unique(name)))
# A tibble: 20 x 4
# Groups: id [2]
# id name val group_no
# <fct> <fct> <dbl> <int>
# 1 a N1 0.647 1
# 2 a N1 0.530 1
# 3 a N1 0.245 1
# 4 a N2 0.693 2
# 5 a N2 0.478 2
# 6 a N2 0.861 2
# 7 a N3 0.821 3
# 8 a N3 0.0995 3
# 9 a N3 0.662 3
#10 b N1 0.553 1
#11 b N1 0.0233 1
#12 b N1 0.519 1
#13 b N2 0.783 2
#14 b N2 0.789 2
#15 b N2 0.477 2
#16 b N2 0.438 2
#17 b N2 0.407 2
#18 b N3 0.732 3
#19 b N3 0.0707 3
#20 b N3 0.316 3
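The two expressions agree here because df is sorted by name within id. A base-R sketch on a made-up, unsorted vector shows where they differ: the cumsum() version starts a new number at every change of value, while match() always gives the same number to the same value.

```r
# Made-up vector in which "N1" appears in two separate runs.
name <- c("N1", "N1", "N2", "N1")
n <- length(name)

# TRUE at the first element and wherever the value differs from the previous one;
# the running total of those flags numbers the runs.
run_no <- cumsum(c(TRUE, name[-1] != name[-n]))

# Position of each value among the distinct values, in order of first appearance.
value_no <- match(name, unique(name))

print(run_no)
print(value_no)
```

Here run_no is 1 1 2 3 and value_no is 1 1 2 1, so match() is the safer choice when the rows are not guaranteed to be sorted.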
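Here is a solution that uses left_join. The lookup table needs one row per (id, name) pair, numbered within each id:
df %>%
  left_join(df %>%
              distinct(id, name) %>%   # one row per id/name pair
              group_by(id) %>%
              mutate(group_no = row_number()),
            by = c("id", "name"))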