All combinations with multiple constraints - r

I wish to generate all possible combinations of a set of numbers, but with multiple constraints. I have found several similar questions on Stack Overflow, but none that appear to address all of my constraints.
Below is an example data set. This is a deterministic data set, in my mind anyway.
desired.data <- read.table(text = '
x1 x2 x3 x4
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 1
1 1 2 2
1 1 2 3
1 1 3 3
1 2 1 1
1 2 1 2
1 2 1 3
1 2 2 1
1 2 2 2
1 2 2 3
1 2 3 3
1 3 3 3
0 1 1 1
0 1 1 2
0 1 1 3
0 1 2 1
0 1 2 2
0 1 2 3
0 1 3 3
0 0 1 1
0 0 1 2
0 0 1 3
0 0 0 1
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
Here are the constraints:
Column 1 can only contain a 0 or 1
The last column can only contain 1, 2 or 3
All other columns can contain 0, 1, 2 or 3
Once a non-0 appears in a row the rest of that row cannot contain another 0
Once a 3 appears in a row the rest of that row must only contain 3's
The first non-0 number in a row must be a 1
The only way I know to generate this type of data set is to use nested for-loops as shown below. I have used this technique for years and finally decided to ask if there might be a better way.
I hope this is not a duplicate and I hope it is not considered too specialized. I create these types of data sets frequently and a simpler solution would be quite helpful.
my.data <- matrix(0, ncol = 4, nrow = 26) # one row per valid combination
my.data <- as.data.frame(my.data)
j <- 1
for(i1 in 0:1) {
  if(i1 == 0) i2.begin = 0
  if(i1 == 0) i2.end = 1
  if(i1 == 1) i2.begin = 1
  if(i1 == 1) i2.end = 3
  if(i1 == 2) i2.begin = 1
  if(i1 == 2) i2.end = 3
  if(i1 == 3) i2.begin = 3
  if(i1 == 3) i2.end = 3
  for(i2 in i2.begin:i2.end) {
    if(i2 == 0) i3.begin = 0
    if(i2 == 0) i3.end = 1
    if(i2 == 1) i3.begin = 1
    if(i2 == 1) i3.end = 3
    if(i2 == 2) i3.begin = 1
    if(i2 == 2) i3.end = 3
    if(i2 == 3) i3.begin = 3
    if(i2 == 3) i3.end = 3
    for(i3 in i3.begin:i3.end) {
      if(i3 == 0) i4.begin = 1 # 1 not 0 because last column
      if(i3 == 0) i4.end = 1
      if(i3 == 1) i4.begin = 1
      if(i3 == 1) i4.end = 3
      if(i3 == 2) i4.begin = 1
      if(i3 == 2) i4.end = 3
      if(i3 == 3) i4.begin = 3
      if(i3 == 3) i4.end = 3
      for(i4 in i4.begin:i4.end) {
        my.data[j,1] <- i1
        my.data[j,2] <- i2
        my.data[j,3] <- i3
        my.data[j,4] <- i4
        j <- j + 1
      }
    }
  }
}
my.data
Here is the output:
V1 V2 V3 V4
1 0 0 0 1
2 0 0 1 1
3 0 0 1 2
4 0 0 1 3
5 0 1 1 1
6 0 1 1 2
7 0 1 1 3
8 0 1 2 1
9 0 1 2 2
10 0 1 2 3
11 0 1 3 3
12 1 1 1 1
13 1 1 1 2
14 1 1 1 3
15 1 1 2 1
16 1 1 2 2
17 1 1 2 3
18 1 1 3 3
19 1 2 1 1
20 1 2 1 2
21 1 2 1 3
22 1 2 2 1
23 1 2 2 2
24 1 2 2 3
25 1 2 3 3
26 1 3 3 3
Sorry that I initially forgot to include Constraint #6.

Here is code that creates the desired data set for this specific example. I suspect the code can be generalized. If I succeed in generalizing it I will post the result. Although the code is messy and not intuitive, I am convinced there is a basic general pattern.
n <- 3 # non-zero numbers
m <- 4-2 # number of middle columns
x1 <- rep(1:0, c(((n*(n-1)) * (n-1) + n), (n*(n-1) + n + (n-1))))
x2 <- rep(c(1:n, 1:0), c(n*m+1, n*m+1, 1, n*m+1, n*1+1))
x3 <- rep(c(rep(1:n, n-1), n, 1:n, 1:0), c(rep(c(n,n,1), n-1), 1, n,n,1, n,1))
x4 <- c(rep(c(rep(1:n, (n-1)), n), (n-1)), n, rep(1:n,(n-1)), n, 1:n, 1)
my.data <- data.frame(x1, x2, x3, x4)
all.equal(desired.data, my.data) # desired.data is the example data set from the question
# [1] TRUE

I would use expand.grid to generate all combinations and then subset it, one constraint at a time:
x <- expand.grid(0:1, 0:3, 0:3, 1:3)
## Once a non-0 appears in a row the rest of that row cannot contain another 0
b1 <- apply(x, 1, function(z) min(diff(z != 0)) == 0)
x <- x[b1, ]
## Once a 3 appears in a row the rest of that row must only contain 3's
b1 <- apply(x, 1, function(z) min(diff(z == 3)) == 0)
x <- x[b1, ]
## The first non-0 number in a row must be a 1
b1 <- apply(x, 1, function(z) {
  w <- which(z == 0) # positions of the zeros (all leading after the first filter)
  length(w) == 0 || z[tail(w, 1) + 1] == 1
})
x <- x[b1, ]
And now sort it:
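A minimal sketch of the sorting step, rebuilt on a fresh grid so that it runs on its own. It assumes the default column names Var1 to Var4 that expand.grid gives to unnamed arguments, and it sorts by every column from left to right; the same order() call would apply to the filtered x.

```r
# All raw combinations; unnamed arguments are named Var1..Var4
x <- expand.grid(0:1, 0:3, 0:3, 1:3)

# order() with several keys sorts by the first key and breaks ties with the next
x <- x[order(x$Var1, x$Var2, x$Var3, x$Var4), ]
rownames(x) <- NULL

head(x, 3)
```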
Similar to @mrip, start from expand.grid, which can handle the first 3 constraints since they don't interact with the other columns:
step1 <- expand.grid(0:1, 0:3, 0:3, 1:3)
Next I would filter it. The difference between this approach and mrip's is that my filtering is done in one apply instead of three, so the rows are traversed once rather than three times and the filtering should be roughly 3 times faster.
filtered <- step1[apply(step1, 1, function(x) all(
  if(length(which(x==0))>0) {max(which(x==0))==length(which(x==0))} else {TRUE},
  if(length(which(x==3))>0) {min(which(x==3))==length(x)-length(which(x==3))+1} else {TRUE},
  x[!x%in%0][1]==1)), ]
That should be it. If you want to inspect each element inside the apply, here it is:
if(length(which(x==0))>0) {max(which(x==0))==length(which(x==0))} else {TRUE}
If there are any zeros, this makes sure that no non-zero value comes before them.
if(length(which(x==3))>0) {min(which(x==3))==length(x)-length(which(x==3))+1} else {TRUE}
If there are any 3s, this makes sure nothing other than a 3 comes after them.
x[!x%in%0][1]==1
This first filters the zeros out of the row, then takes the first remaining element and only allows it to be a one.
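The filter-after-expand.grid idea can be generalized to any number of columns. The sketch below is only one possible way to do it: the helper names valid_row and gen_rows, and the parameters k (number of columns) and n (largest value), are hypothetical and not taken from the answers above.

```r
# Hypothetical helper: TRUE if one row satisfies constraints 4 to 6
valid_row <- function(x, n = 3) {
  nz  <- x != 0
  top <- x == n
  all(diff(nz) >= 0) &&        # zeros may only lead the row
    all(diff(top) >= 0) &&     # once n appears, only n may follow
    x[which(nz)[1]] == 1       # first non-zero value must be 1
}

# Hypothetical helper: all valid rows for k columns and values 0..n
gen_rows <- function(k = 4, n = 3) {
  # Constraints 1 to 3 are encoded in the per-column value ranges
  cols <- c(list(0:1), rep(list(0:n), k - 2), list(1:n))
  # expand.grid varies its first argument fastest, so reverse the columns
  # going in and coming out to get rows sorted from the left
  grid <- expand.grid(rev(cols))[, k:1]
  names(grid) <- paste0("x", seq_len(k))
  out <- grid[apply(grid, 1, valid_row, n = n), ]
  rownames(out) <- NULL
  out
}

res <- gen_rows(4, 3)
nrow(res)
```

For k = 4 and n = 3 this yields the same 26 rows as the example data set. Because it still enumerates every combination before filtering, it is best suited to a modest number of columns.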


Recoding by an order in r

I have a data recoding puzzle. Here is what my sample data looks like:
df <- data.frame(
id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))
There are three ids in the dataset, and rows are ordered by a position variable. For each id, the first row where scores equals -1 needs its score recoded to 0 and its cat variable set to 1. For example, for id=1 that row is the 6th position, so in that row the score should be 0 and cat should be 1. Ids that do not have any scores=-1 should be kept as they are.
The desired output should look like below:
Any recommendations??
This may be what you are after:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(i = which(scores == -1)[1]) %>% # find the first row == -1
  mutate(scores = case_when(position == i & scores != 0 ~ 0, TRUE ~ scores), # update the score using position & i
         cat = ifelse(scores == -1, 0, 1)) %>% # then update cat
  select(-i) # remove i
After trying a few things and getting ideas from @Ricky and @e.matt, I came up with a solution.
df %>%
  filter(scores == -1) %>% # keep rows where scores == -1
  distinct(id, .keep_all = TRUE) %>% # keep the first such row per id
  mutate(first = 1) %>% # create first column
  right_join(df, by=c("id","scores","position","cat")) %>% # join back original dataset
  mutate(first = coalesce(first, 0)) %>% # replace NAs with 0
  mutate(scores = case_when(
    first == 1 ~ 0,
    TRUE ~ scores)) %>%
  mutate(cat = case_when(
    first == 1 ~ 1,
    TRUE ~ cat))
This provides my desired output.
Here is a data.table one-liner:
library( data.table )
setDT(df) # convert df to a data.table by reference
df[ df[, .(cumsum( scores == -1 ) == 1), by = .(id)]$V1, `:=`( scores = 0, cat = 1) ]
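The same "first -1 per id" idea can also be sketched in base R. This version additionally requires the row itself to be -1, because a bare cumsum(...) == 1 test would also flag any non -1 rows that directly follow the first -1 (there are none in the sample data). The name is_first is just an illustrative choice.

```r
df <- data.frame(
  id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
  scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
  position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
  cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))

# Running count of -1 scores within each id
n_neg <- ave(as.integer(df$scores == -1), df$id, FUN = cumsum)

# TRUE only on the first -1 row of each id
is_first <- df$scores == -1 & n_neg == 1

df$scores[is_first] <- 0
df$cat[is_first] <- 1
df
```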
You could do something along these lines using the dplyr package:
df = mutate(df, cat = ifelse(scores == -1, 1, cat),
scores = ifelse(scores == -1, 0, scores))
Using the mutate() function, I am re-assigning the values for the scores and cat fields according to ifelse() conditional statements. For scores, if the score is -1, the value is replaced by 0, otherwise it keeps the score as is. For cat, it also checks if scores is equal to -1, but would assign a value of 1 when the condition is met, or the already existing value of cat when the condition is not met.
After our discussion in the comments, I think something along these lines should be helpful (you may have to modify the logic since I don't exactly follow what the desired output is here):
for(i in 1:nrow(df)){
  # Check if score is -1
  if(df[i, 'scores'] == -1){
    # Update values for the next row
    df[i+1, 'scores'] <- 0
    df[i+1, 'cat'] <- 1
  }
}
Sorry that I don't really follow the desired output, hopefully this is helpful in getting you to your answer!

Generate a new variable based on values change in another variable r

I asked something very similar before, but I have a better understanding of my problem now. I will try my best to ask it as clearly as I can.
I have a sample dataset that looks like this:
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item <- c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame("id"=id, "item"=item, "sequence"=sequence, "score"=score)
> data
id identifies each student, item identifies the questions students take, sequence is the attempt number for each item, and score is the score for each attempt, taking 0, 1, or 2. Students can change their answers.
Within each id, I create a variable (status) by looking at the last two sequences (changes). Here are the recoding rules for status:
1-If there is only one attempt for each question:
a) assign "BTW" (Blank to Wrong) if the item score is 0.
b) assign "BTR" (Blank to Right) if the item score is 1.
2-If there are multiple attempts for each question:
a) assign "BTW" (Blank to Wrong) if the first item attempt score is 0.
b) assign "BTR" (Blank to Right) if the first item attempt score is 1.
c) assign "WW" for those who changed from wrong to wrong (0 to 0),
d) assign "WR" for those who changed to increasing score (0 to 1, or 1 to 2),
e) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0 ), and
f) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).
A score change from 0 to 1, 0 to 2, or 1 to 2 is considered a correct (right) change, while
a score change from 1 to 0, 2 to 0, or 2 to 1 is considered an incorrect (wrong) change.
If there is only one attempt for an item, as in id=7, then the status should be "BTR". If the score was 0, then it should be "BTW". The logic is that if the score increases, the status should be WR, and if it decreases, it should be RW. In particular, changes should be coded:
a) from 1 to 2 as WR; instead, they were coded as RR,
b) from 2 to 1 as RW; instead, they were coded as WW.
I used the code below. It did not work for some cases, for example id=1, where the status should be {BTW, WW}.
library(dplyr)
data %>% group_by(id, item) %>%
mutate(diff = c(0, diff(score)),
status = case_when(
n() == 1 & score == 0 ~ "BTW",
n() == 1 & score == 1 ~ "BTR",
diff == 0 & score == 0 ~ "WW",
diff == 0 & score > 0 ~ "RR",
diff > 0 ~ "WR",
diff < 0 ~ "RW",
TRUE ~ "oops"))
> data
the desired output would be with cases:
> desired
Any opinions?
In order to solve this, I broke the problem down into two steps. First identify the Blank to answer lines. Then once the first tries are identified then assign the change of answers to the remaining lines.
library(dplyr)
# rows that are not the first answer are assigned the string "NA"
test <- data %>% group_by(id, item) %>%
  mutate(status = case_when(
    sequence == 1 & score == 0 ~ "BTW",
    sequence == 1 & score > 0 ~ "BTR",
    TRUE ~ "NA"))
# compare each remaining attempt with the previous one for the same id and item
answer <- test %>% ungroup() %>% group_by(id, item) %>%
  transmute(sequence, score,
    status = case_when(score == 0 & score == lag(score) & status == "NA" ~ "WW",
                       score >= 1 & score == lag(score) & status == "NA" ~ "RR",
                       score > 0 & score > lag(score) & status == "NA" ~ "WR",
                       score < lag(score) & status == "NA" ~ "RW",
                       TRUE ~ status))
head(answer, 20)
tail(answer, 4)
The status column matches your sample data for all rows except row 20, please double check the calculation.
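For comparison, the same two-step logic can be sketched in a single pass with base R. This is only an illustration under two assumptions: rows are already ordered by sequence within each id and item, and any first-attempt score above 0 counts as "BTR" (the question only states this for a score of 1). The helper vector prev is a made-up name.

```r
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item <- c(1,1,2, 1,1,1, 1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame(id, item, sequence, score)

# Score of the previous attempt for the same id and item (NA on a first attempt)
prev <- ave(data$score, paste(data$id, data$item),
            FUN = function(s) c(NA, head(s, -1)))

data$status <- ifelse(is.na(prev),
  ifelse(data$score == 0, "BTW", "BTR"),   # first attempt: blank to wrong/right
  ifelse(data$score > prev, "WR",          # score went up
  ifelse(data$score < prev, "RW",          # score went down
  ifelse(data$score == 0, "WW", "RR"))))   # unchanged: wrong or right
data
```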

R For Loop Not Working

My data set test is below. I want to create a new variable "indicator" which is 1 if all variables equal 1 (for example row 3), or else 0.
I created the following for loop:
for (i in c(test$X10J,test$X20f,test$X10m,test$X10ap,test$Xmyy,test$X10junn,test$X10julyy)){
}else if(i==2|i==-1){
This creates a variable with all values=1 instead of 0 and -1.
A vectorized solution:
test$indicator <- ifelse(rowSums(test[,-1] ==1)==ncol(test[,-1]),1,0)
No need for a for loop. You can use apply
> test$indicator <- apply(test[-1], 1, function(x) ifelse(all(x == 1), 1, 0))
> test
You could just use:
indicator <- apply(test[,-1], 1, function(row)
  ifelse(all(row==1), 1, 0))
Note: the second parameter of apply is 1 if you want to iterate over rows and 2 for columns.
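To see why the loop in the question cannot work: c(test$X10J, test$X20f, ...) flattens the columns into one long vector, so i is a single cell rather than a row. A row-wise test is needed instead. The sketch below uses a small made-up data frame (the original test data is not shown here, so the column names id, a, b and c are hypothetical).

```r
# Hypothetical stand-in for the test data; the first column is an identifier
test <- data.frame(id = 1:3,
                   a = c(1, 2, 1),
                   b = c(1, 1, 1),
                   c = c(-1, 1, 1))

# A row gets 1 only if none of its values differs from 1
test$indicator <- as.integer(rowSums(test[c("a", "b", "c")] != 1) == 0)
test
```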

Conditional counting in R

I have a question I hope some of you might help me with. I am doing a thesis on pharmaceuticals and the effect of parallel imports. I am dealing with this in R, using a panel data set.
I need a variable that counts, for a given original product, how many parallel importers there are in a given time period.
Ideally, what I want here is a new column giving the number of PI-products (PI=1) for an original (PI=0) at time t. So the output would be like:
Product_ID PI t nPIcomp
I hope I have made my issue clear :)
Thanks in advance,
Something like this?
find.count <- rle(x$PI)
count <- find.count$lengths[find.count$values == 1]
x[x$PI == 0, "nPIcomp"] <- count
I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:
dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
                 FUN = function(z) {
                   n <- sum(z) # number of PI-products in this group
                   c(n, rep(NA, n)) # count on the original's row, NA on the rest
                 }))
The same idea can be used with the data.table package if your data is large and speed is a concern.
Roman's answer gives exactly what you want. In case you want to summarise the data, this would be handy, using the plyr package (df is what I have called your data.frame)...
library(plyr)
ddply( df , .(Product_ID , t ) , summarise , nPIcomp = sum(PI) )
# Product_ID t nPIcomp
#1 1 1 2
#2 1 2 4
#3 2 1 1
#4 2 2 1
#5 2 3 3
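The per-group count can also be attached to the original rows without relying on row order inside a group. The sketch below is one way to do that with ave; the small data frame is made up for illustration (the real panel data is not shown here), and n_pi is a hypothetical helper name.

```r
# Made-up panel: PI == 0 marks an original, PI == 1 a parallel import
dat <- data.frame(
  Product_ID = c(1, 1, 1, 1, 1, 2, 2, 2),
  PI         = c(0, 1, 1, 0, 1, 0, 1, 0),
  t          = c(1, 1, 1, 2, 2, 1, 1, 2))

# Number of PI-products within each Product_ID and time period
n_pi <- ave(dat$PI, dat$Product_ID, dat$t, FUN = sum)

# Keep the count on the original's row only; PI rows get NA
dat$nPIcomp <- ifelse(dat$PI == 0, n_pi, NA)
dat
```

Unlike returning c(n, rep(NA, n)) from the grouping function, this does not assume that the original product is the first row of its group.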

Changing the ID value based on another column

I have a large data set that looks something like this:
Conv. Rev. ID Order path_no
I want to make a new ID column that changes whenever a new path_no starts. So I am hoping it will look something like this:
Conv. Rev. ID Order path_no
I think rleid from data.table should do the trick. Here's one solution that uses data.table and dplyr:
dplyr::mutate(df, ID = data.table::rleid(path_no))
Or with data.table only:
library(data.table)
dt <- setDT(df)
dt[, ID := rleid(path_no)][]
You can also go for a simple for loop:
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)
nobs <- length(vals)
idx <- rep(1, nobs)
for (i in 2:nobs) {
  if (vals[i] != vals[i-1]) {
    idx[i] <- idx[i-1] + 1
  } else {
    idx[i] <- idx[i-1]
  }
}
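The loop above can also be written without an explicit loop. As a sketch using only base R, the run id is the running count of positions where the value differs from the one before it:

```r
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)

# TRUE at the first element and wherever the value changes from the previous one
changed <- c(TRUE, vals[-1] != vals[-length(vals)])

# Counting the changes so far gives the run id
idx <- cumsum(changed)
idx
```

For these values idx is 1 1 1 2 2 2 3 3 4, which matches what the loop produces.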
