How to generate permutations in data.frame format in R?

I am preparing a questionnaire that asks which transport mode respondents would use under different conditions of travel time and cost.
There are three transport modes, two levels of travel time and three levels of travel cost, as below:
mode <- c(1:3)
time <- c(1:2)
cost <- c(1:3)
I would like to generate all combinations of travel time and cost by transport mode, but I do not know how to do this easily in R.
In the questionnaire, each question shows the three modes together, each under its own condition, as in the example below. comb is the combination number of each set of modes.
comb mode time cost
1 1 1 1
1 2 1 1
1 3 1 1
2 1 1 1
2 2 2 1
2 3 2 1
3 1 2 1
3 2 1 2
3 3 1 1
4 1 1 3
4 2 2 3
4 3 1 1
5 1 1 2
5 2 2 1
5 3 1 3
6 1 1 1
6 2 1 1
6 3 1 1
7 1 1 1
7 2 1 1
7 3 1 1
8 1 1 1
8 2 1 1
8 3 1 1
... continues until all combinations are covered
I used expand.grid(), but it returns just 18 combinations of mode, time and cost (3*2*3) and does not take into account the permutations across the set of transport modes. I also tried several permutation functions, but they did not give the desired result. I would prefer a data.frame with a grouping variable such as comb in the example.
Permute groups in a data.frame R
How to calculate permutations of group labels with R?
A simple way to generate all combinations would be highly appreciated.
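One possible reading of the question is that each of the three modes independently takes one of the 2 * 3 = 6 (time, cost) conditions, which gives 6^3 = 216 choice sets of three rows each. Under that assumption, a minimal base R sketch could look like this (the names cond, idx and design are illustrative and not from the question):

```r
time <- 1:2
cost <- 1:3
n_modes <- 3

# All (time, cost) conditions a single mode can take: 2 * 3 = 6 rows
cond <- expand.grid(time = time, cost = cost)

# One condition index per mode: 6^3 = 216 choice sets (columns Var1..Var3)
idx <- expand.grid(rep(list(seq_len(nrow(cond))), n_modes))

# Stack into long format: one row per (comb, mode)
design <- do.call(rbind, lapply(seq_len(n_modes), function(m) {
  data.frame(comb = seq_len(nrow(idx)), mode = m,
             cond[idx[[m]], ], row.names = NULL)
}))

# Sort so that the three modes of each combination sit together
design <- design[order(design$comb, design$mode), ]
rownames(design) <- NULL
head(design, 6)
```

If some choice sets are not meaningful for the survey (for example, all three modes under identical conditions), they can be dropped afterwards by filtering on comb.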

Related

How do I create an index variable based on three variables in R? [duplicate]

I'm trying to create an index variable based on an individual identifier, a test name, and the date the test was taken in R. My data has repeated students taking the same test over and over with different scores. I'd like to be able to easily identify what number try each observation is for that specific test. My data looks something like this and I'd like to create a variable like the ID variable shown. It should start over at 1 and count, in order of date, the number of observations with the same student and test name.
student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <-c("math","math","reading","math","reading","reading","reading","math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student,test,date)
print(data)
student test date
1 1 math 1
2 1 math 2
3 1 reading 3
4 1 math 3
5 1 reading 4
6 1 reading 5
7 2 reading 2
8 2 math 3
9 2 reading 5
10 3 math 1
11 3 math 2
12 3 math 3
13 3 reading 4
14 3 reading 5
I want to add a variable that indicates the attempt number for a test taken by the same student so it looks something like this:
student test date id
1 1 math 1 1
2 1 math 2 2
3 1 reading 3 1
4 1 math 3 3
5 1 reading 4 2
6 1 reading 5 3
7 2 reading 2 1
8 2 math 3 1
9 2 reading 5 2
10 3 math 1 1
11 3 math 2 2
12 3 math 3 3
13 3 reading 4 1
14 3 reading 5 2
I figured out how to create an ID variable based on only one other variable, for example the student number, but I don't know how to do it for multiple variables. I also tried cumsum, but that keeps counting with each new value and doesn't start over at 1 when there is a new value.
tests <- transform(tests, ID = as.numeric(factor(EMPLID)))
tests$id <-cumsum(!duplicated(tests[1:3]))
library(dplyr)
data %>%
  group_by(student, test) %>%
  arrange(date, .by_group = TRUE) %>%  # make sure rows are sorted by date within each group
  mutate(id = row_number()) %>%
  ungroup()
# # A tibble: 14 × 4
# student test date id
# <dbl> <chr> <dbl> <int>
# 1 1 math 1 1
# 2 1 math 2 2
# 3 1 math 3 3
# 4 1 reading 3 1
# 5 1 reading 4 2
# 6 1 reading 5 3
# 7 2 math 3 1
# 8 2 reading 2 1
# 9 2 reading 5 2
# 10 3 math 1 1
# 11 3 math 2 2
# 12 3 math 3 3
# 13 3 reading 4 1
# 14 3 reading 5 2
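If the original row order should be preserved, a base R alternative is to rank the dates within each student/test group with ave(). This is only a sketch on the example data from the question; it assumes that ties on date, if any, should be broken by row order.

```r
student <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3)
test <- c("math","math","reading","math","reading","reading","reading",
          "math","reading","math","math","math","reading","reading")
date <- c(1,2,3,3,4,5,2,3,5,1,2,3,4,5)
data <- data.frame(student, test, date)

# Rank the dates within each (student, test) group;
# ties.method = "first" breaks ties by row order
data$id <- ave(data$date, data$student, data$test,
               FUN = function(d) rank(d, ties.method = "first"))
print(data)
```

Unlike the dplyr pipeline, this leaves the rows in their original order and only adds the id column.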

Add a column that divides another column into n chunks, R

There's no easy way to describe my question, which is probably why I was not able to find an answer by searching.
I have a data frame with 3 columns: subject number, correctness and block. There are 2 participants, and each was exposed to 2 blocks of 3 stimuli.
subj corr block
1 1 1 1
2 1 0 1
3 1 1 1
4 1 1 2
5 1 1 2
6 1 1 2
7 2 0 1
8 2 1 1
9 2 1 1
10 2 0 2
11 2 1 2
12 2 1 2
What I want to do is create another column that, for each subj, divides that subject's rows into 3 even chunks (the original df has 2 blocks per subject). In general, I want to know how to divide the stimuli each subj is exposed to into N chunks and put the chunk number into another column.
subj corr block newblock
1 1 1 1 1
2 1 0 1 1
3 1 1 1 2
4 1 1 2 2
5 1 1 2 3
6 1 1 2 3
7 2 0 1 1
8 2 1 1 1
9 2 1 1 2
10 2 0 2 2
11 2 1 2 3
12 2 1 2 3
Something like this:
library(dplyr)
n_chunks = 3
df %>%
  group_by(subj) %>%
  mutate(newblock = rep(1:n_chunks, each = ceiling(n() / n_chunks))[1:n()])
How much of this is necessary depends on your use case. If you can guarantee that n_chunks evenly divides the number of observations for each subject you can simplify to:
df %>%
  group_by(subj) %>%
  mutate(newblock = rep(1:n_chunks, each = n() / n_chunks))
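For comparison, the same chunking can be sketched in base R by scaling each row's position within its subject to the range 1..n_chunks. The data frame below is rebuilt from the example in the question. Note that when the group size is not a multiple of n_chunks, this variant spreads the leftover rows differently from the rep() approach above.

```r
df <- data.frame(subj  = rep(1:2, each = 6),
                 corr  = c(1,0,1,1,1,1,0,1,1,0,1,1),
                 block = rep(rep(1:2, each = 3), 2))
n_chunks <- 3

# Within each subject, map row position i of n rows to chunk ceiling(i * n_chunks / n)
df$newblock <- ave(seq_along(df$subj), df$subj,
                   FUN = function(i) ceiling(seq_along(i) * n_chunks / length(i)))
print(df)
```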

R - Count duplicates values for each row

I'm working on a data frame for which I need to calculate Fleiss's kappa for inter-rater agreement. I'm using the 'irr' package for that.
Besides that, I need to count, for each observation, how many of the raters are in agreement.
My data looks like this:
a b c
1 1 1 1
2 1 2 2
3 2 3 2
4 3 3 1
5 4 2 1
I'm expecting something like this, where count stands for the number of raters in agreement:
a b c count
1 1 1 1 3
2 1 2 2 2
3 2 3 2 2
4 3 3 1 2
5 4 2 1 0
Thanks a lot.
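Alternative solution if your data is in a data frame called abc:
as.numeric(apply(abc, 1, function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))  # frequency of each distinct rating
  top <- ux[tab == max(tab)]     # most common rating(s); avoids shadowing base::mode
  # count the raters who gave the most common rating; NA if there is a tie
  if (length(top) == 1) sum(x == top) else NA
}))
Running it gives the following, where NA marks a row with no single most common rating: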
[1] 3 2 2 2 NA
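The question asks for 0 rather than NA when no raters agree. A minimal sketch of that variant, assuming "agreement" means the size of the largest group of identical ratings in a row, is shown below (the data frame abc is rebuilt from the question and agree is an illustrative name):

```r
abc <- data.frame(a = c(1,1,2,3,4),
                  b = c(1,2,3,3,2),
                  c = c(1,2,2,1,1))

# Size of the largest group of identical ratings in each row
agree <- apply(abc, 1, function(x) max(table(x)))

# A largest group of size 1 means nobody agrees, so report 0
abc$count <- ifelse(agree > 1, agree, 0)
print(abc)
```

With more than three raters a row can contain several equally large groups (for example 1, 1, 2, 2); this sketch would report 2 for such a row, which may or may not be what is wanted.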

Possible to arrange observations in groups of N that reflect data set proportions using R?

Are there any functions in R that arrange observations in groups of N that reflect, as closely as possible, the data set proportions of certain variables?
For example, if I have a data set with 8 observations and two variables each with two levels with data set proportions as follows:
Var1 Var2
1 0.5 0.5
2 0.5 0.5
Are there any functions that would enable me to sample optimally from the data set to create, say, groups of 2 observations that reflect the above data set proportions?
Example data:
Data <- read.table(text=" Obs Var1 Var2
1 1 1
2 1 2
3 2 1
4 2 2
5 1 1
6 1 2
7 2 1
8 2 2 ", header=T)
Desired Result:
Result <- read.table(text=" Obs Var1 Var2 Group_ID
1 1 1 1
4 2 2 1
2 1 2 2
3 2 1 2
5 1 1 3
8 2 2 3
6 1 2 4
7 2 1 4 ", header=T)
Note that all groups have proportions of .5 for each level of each variable.
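No dedicated function is named in this thread, but for small data sets the idea can be sketched as a brute-force random search: shuffle the group labels many times and keep the assignment whose within-group proportions are closest to the overall ones. Everything below (the score function, the number of tries, the seed) is an illustrative assumption, not an established method.

```r
Data <- data.frame(Obs  = 1:8,
                   Var1 = rep(c(1,1,2,2), 2),
                   Var2 = rep(c(1,2), 4))
group_size <- 2
n_groups <- nrow(Data) / group_size

# Sum of squared differences between each group's level proportions
# and the overall proportions, over both variables
score <- function(g) {
  total <- 0
  for (v in c("Var1", "Var2")) {
    overall  <- as.vector(prop.table(table(Data[[v]])))
    by_group <- prop.table(table(g, Data[[v]]), margin = 1)
    total <- total + sum((t(by_group) - overall)^2)
  }
  total
}

set.seed(1)
best <- NULL
best_score <- Inf
for (i in 1:2000) {
  # Random assignment with exactly group_size observations per group
  g <- sample(rep(seq_len(n_groups), each = group_size))
  s <- score(g)
  if (s < best_score) {
    best <- g
    best_score <- s
  }
}

Data$Group_ID <- best
Data[order(Data$Group_ID), ]
```

A score of 0 means every group matches the overall proportions exactly. For larger data sets a random search like this scales poorly, and a proper optimisation or stratified sampling approach would be more appropriate.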

Creating a fractional factorial design in R without prohibited pairs

I'm trying to write R code for a choice-based conjoint study.
I can create a factorial design using AlgDesign or conjoint. However, there are combinations of attribute levels that should not appear together.
Using an example from the web:
#Creating a full factorial design
library(AlgDesign)
ffd <- gen.factorial(c(2,2,4), varNames=c("Discount","Amount","Price"), factors="all")
ffd
Discount Amount Price
1 1 1 1
2 2 1 1
3 1 2 1
4 2 2 1
5 1 1 2
6 2 1 2
7 1 2 2
8 2 2 2
9 1 1 3
10 2 1 3
11 1 2 3
12 2 2 3
13 1 1 4
14 2 1 4
15 1 2 4
16 2 2 4
But what if "Discount" 2 ("no discount") should never be paired with "Amount" 1 ("20% discount")?
Is there a way to tell AlgDesign, conjoint or some other factorial design package to remove any prohibited pairs from the design?
Any advice would be appreciated.
You could always generate ffd as you did there, and then remove rows which meet your criteria, e.g. ffd$Discount == 2 & ffd$Amount==1 . The easy-ish way is to keep all the rows which do not meet the condition:
ffd <- ffd[ffd$Discount != 2 | ffd$Amount != 1, ]
Repeat for each condition you want to reject.
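When there are several prohibited pairs, repeating the filter by hand gets tedious. One way to sketch it is to list the prohibited pairs in a small data frame and drop every design row that matches one of them. To keep the example runnable without AlgDesign, expand.grid() stands in for gen.factorial() here; prohibited and key are hypothetical names.

```r
# Full factorial with the same factors as in the question (16 rows)
ffd <- expand.grid(Discount = factor(1:2),
                   Amount   = factor(1:2),
                   Price    = factor(1:4))

# One row per prohibited (Discount, Amount) pair; add more rows as needed
prohibited <- data.frame(Discount = "2", Amount = "1")

# Build a comparable key from the two columns involved
key <- function(d) paste(d$Discount, d$Amount)

# Keep only the rows whose key is not in the prohibited list
ffd_ok <- ffd[!key(ffd) %in% key(prohibited), ]
rownames(ffd_ok) <- NULL
ffd_ok
```

The filtered data frame can then be used as the candidate set for whatever design-selection step follows.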
