R For Loop Not Working - r

My data set test is below. I want to create a new variable "indicator" which equals 1 if all variables in a row equal 1 (for example, row 3), and 0 otherwise.
id X10J X10f X10m X10ap X10myy X10junn X10julyy
1 1001 2 2 2 2 2 2 2
2 1002 1 1 -1 2 1 1 1
3 1003 1 1 1 1 1 1 1
4 1004 1 1 2 1 1 1 1
12 1012 1 2 1 1 1 1 1
I created the following for loop:
for (i in c(test$X10J,test$X20f,test$X10m,test$X10ap,test$Xmyy,test$X10junn,test$X10julyy)){
if(i==1){
test$indicator=1
}else if(i==2|i==-1){
test$indicator=0
}
}
This creates a variable whose values are all 1, instead of a mix of 0 and 1.

A vectorized solution:
test$indicator <- ifelse(rowSums(test[,-1] ==1)==ncol(test[,-1]),1,0)

No need for a for loop. You can use apply
> test$indicator <- apply(test[-1], 1, function(x) ifelse(all(x == 1), 1, 0))
> test
id X10J X10f X10m X10ap X10myy X10junn X10julyy indicator
1 1001 2 2 2 2 2 2 2 0
2 1002 1 1 -1 2 1 1 1 0
3 1003 1 1 1 1 1 1 1 1
4 1004 1 1 2 1 1 1 1 0
12 1012 1 2 1 1 1 1 1 0

You could just use:
indicator <- apply(test[,-1], 1, function(row)
{
ifelse(all(row==1), 1, 0)
})
Note: the second argument of apply is 1 to apply the function over rows and 2 to apply it over columns.
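As for why the original loop fails: `test$indicator = 1` assigns to the whole column rather than to a single row, so each iteration overwrites every row and the last value visited (a 1 from X10julyy) wins. A minimal sketch of a row-wise alternative follows; the data.frame call is my reconstruction of the table in the question, and this is one possible approach rather than the only one.

```r
# Reconstruction of the example data from the question (assumed layout)
test <- data.frame(
  id       = c(1001, 1002, 1003, 1004, 1012),
  X10J     = c(2,  1, 1, 1, 1),
  X10f     = c(2,  1, 1, 1, 2),
  X10m     = c(2, -1, 1, 2, 1),
  X10ap    = c(2,  2, 1, 1, 1),
  X10myy   = c(2,  1, 1, 1, 1),
  X10junn  = c(2,  1, 1, 1, 1),
  X10julyy = c(2,  1, 1, 1, 1)
)

vals <- test[, -1]  # every column except id

# A row qualifies when the number of cells equal to 1 matches the column count
test$indicator <- as.integer(rowSums(vals == 1) == ncol(vals))
test$indicator
```

Here as.integer turns TRUE/FALSE into 1/0, so no ifelse call is needed.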

Related

Conditional variable using R code

I have a data set named "dats".
id y i j
1 0 1 1
1 0 1 2
1 0 1 3
2 1 2 1
2 1 2 2
2 1 2 3
I want to calculate a new variable ynew = y[i,j-1] * y[i,j] (that is, y11*y12, y12*y13, and so on). I have tried it this way:
ynew <- NULL
for(p in 1)
{
for (q in ni)
{
ynew[p,q] <- dats$y[dats$i==p & dats$j==q-1]*dats$y[dats$i==p & dats$j==q]
}
}
ynew
But it throws an error!
Expected output
id y i j ynew
1 0 1 1 NA
1 0 1 2 0
1 0 1 3 0
2 1 2 1 NA
2 1 2 2 1
2 1 2 3 1
Could anybody help? TIA
Using dplyr and rollapply from the zoo package:
library(dplyr)
library(zoo)
dats %>%
group_by(id) %>%
mutate(ynew = c(NA, rollapply(y, 2, prod))) # window of width 2: product of each value and the one before it
#Source: local data frame [6 x 5]
#Groups: id [2]
# id y i j ynew
# (int) (int) (int) (int) (dbl)
#1 1 0 1 1 NA
#2 1 0 1 2 0
#3 1 0 1 3 0
#4 2 1 2 1 NA
#5 2 1 2 2 1
#6 2 1 2 3 1
Maybe we just need to multiply 'y' by its lag, grouped by 'id':
library(data.table)
setDT(dats)[, ynew := y * shift(y), by = id]
dats
# id y i j ynew
#1: 1 0 1 1 NA
#2: 1 0 1 2 0
#3: 1 0 1 3 0
#4: 2 1 2 1 NA
#5: 2 1 2 2 1
#6: 2 1 2 3 1
It could also be done with roll_prod
library(RcppRoll)
setDT(dats)[, ynew := c(NA, roll_prod(y, 2)), by = id]
dats
# id y i j ynew
#1: 1 0 1 1 NA
#2: 1 0 1 2 0
#3: 1 0 1 3 0
#4: 2 1 2 1 NA
#5: 2 1 2 2 1
#6: 2 1 2 3 1
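For comparison, the same lag-and-multiply idea can be sketched in base R with ave. This assumes the rows are already ordered by j within each id; the data frame is reconstructed from the question.

```r
# Reconstruction of the question's data (assumed to be sorted by j within id)
dats <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  y  = c(0, 0, 0, 1, 1, 1),
  i  = c(1, 1, 1, 2, 2, 2),
  j  = c(1, 2, 3, 1, 2, 3)
)

# Within each id, multiply y by its previous value; the first row has no
# predecessor, so it gets NA
dats$ynew <- ave(dats$y, dats$id,
                 FUN = function(v) v * c(NA, head(v, -1)))
dats
```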

All combinations with multiple constraints

I wish to generate all possible combinations of a set of numbers, but with multiple constraints. I have found several similar questions on Stack Overflow, but none that appear to address all of my constraints:
R: sample() command subject to a constraint
R all combinations of 3 vectors with conditions
Generate all combinations given a constraint
R - generate all combinations from 2 vectors given constraints
Below is an example data set. This is a deterministic data set, in my mind anyway.
desired.data <- read.table(text = '
x1 x2 x3 x4
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 1
1 1 2 2
1 1 2 3
1 1 3 3
1 2 1 1
1 2 1 2
1 2 1 3
1 2 2 1
1 2 2 2
1 2 2 3
1 2 3 3
1 3 3 3
0 1 1 1
0 1 1 2
0 1 1 3
0 1 2 1
0 1 2 2
0 1 2 3
0 1 3 3
0 0 1 1
0 0 1 2
0 0 1 3
0 0 0 1
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
Here are the constraints:
Column 1 can only contain a 0 or 1
The last column can only contain 1, 2 or 3
All other columns can contain 0, 1, 2 or 3
Once a non-0 appears in a row the rest of that row cannot contain another 0
Once a 3 appears in a row the rest of that row must only contain 3's
The first non-0 number in a row must be a 1
The only way I know to generate this type of data set is to use nested for-loops as shown below. I have used this technique for years and finally decided to ask if there might be a better way.
I hope this is not a duplicate and I hope it is not considered too specialized. I create these types of data sets frequently and a simpler solution would be quite helpful.
my.data <- matrix(0, ncol = 4, nrow = 25)
my.data <- as.data.frame(my.data)
j <- 1
for(i1 in 0:1) {
if(i1 == 0) i2.begin = 0
if(i1 == 0) i2.end = 1
if(i1 == 1) i2.begin = 1
if(i1 == 1) i2.end = 3
if(i1 == 2) i2.begin = 1
if(i1 == 2) i2.end = 3
if(i1 == 3) i2.begin = 3
if(i1 == 3) i2.end = 3
for(i2 in i2.begin:i2.end) {
if(i2 == 0) i3.begin = 0
if(i2 == 0) i3.end = 1
if(i2 == 1) i3.begin = 1
if(i2 == 1) i3.end = 3
if(i2 == 2) i3.begin = 1
if(i2 == 2) i3.end = 3
if(i2 == 3) i3.begin = 3
if(i2 == 3) i3.end = 3
for(i3 in i3.begin:i3.end) {
if(i3 == 0) i4.begin = 1 # 1 not 0 because last column
if(i3 == 0) i4.end = 1
if(i3 == 1) i4.begin = 1
if(i3 == 1) i4.end = 3
if(i3 == 2) i4.begin = 1
if(i3 == 2) i4.end = 3
if(i3 == 3) i4.begin = 3
if(i3 == 3) i4.end = 3
for(i4 in i4.begin:i4.end) {
my.data[j,1] <- i1
my.data[j,2] <- i2
my.data[j,3] <- i3
my.data[j,4] <- i4
j <- j + 1
}
}
}
}
my.data
dim(my.data)
Here is the output:
V1 V2 V3 V4
1 0 0 0 1
2 0 0 1 1
3 0 0 1 2
4 0 0 1 3
5 0 1 1 1
6 0 1 1 2
7 0 1 1 3
8 0 1 2 1
9 0 1 2 2
10 0 1 2 3
11 0 1 3 3
12 1 1 1 1
13 1 1 1 2
14 1 1 1 3
15 1 1 2 1
16 1 1 2 2
17 1 1 2 3
18 1 1 3 3
19 1 2 1 1
20 1 2 1 2
21 1 2 1 3
22 1 2 2 1
23 1 2 2 2
24 1 2 2 3
25 1 2 3 3
26 1 3 3 3
EDIT
Sorry that I initially forgot to include Constraint #6.
Here is code that creates the desired data set for this specific example. I suspect the code can be generalized. If I succeed in generalizing it I will post the result. Although the code is messy and not intuitive I am convinced there is a basic general pattern.
desired.data <- read.table(text = '
x1 x2 x3 x4
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 1
1 1 2 2
1 1 2 3
1 1 3 3
1 2 1 1
1 2 1 2
1 2 1 3
1 2 2 1
1 2 2 2
1 2 2 3
1 2 3 3
1 3 3 3
0 1 1 1
0 1 1 2
0 1 1 3
0 1 2 1
0 1 2 2
0 1 2 3
0 1 3 3
0 0 1 1
0 0 1 2
0 0 1 3
0 0 0 1
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
n <- 3 # non-zero numbers
m <- 4-2 # number of middle columns
x1 <- rep(1:0, c(((n*(n-1)) * (n-1) + n), (n*(n-1) + n + (n-1))))
x2 <- rep(c(1:n, 1:0), c(n*m+1, n*m+1, 1, n*m+1, n*1+1))
x3 <- rep(c(rep(1:n, n-1), n, 1:n, 1:0), c(rep(c(n,n,1), n-1), 1, n,n,1, n,1))
x4 <- c(rep(c(rep(1:n, (n-1)), n), (n-1)), n, rep(1:n,(n-1)), n, 1:n, 1)
my.data <- data.frame(x1, x2, x3, x4)
all.equal(desired.data, my.data)
# [1] TRUE
I would use expand.grid to generate all combinations and then subset it, one constraint at a time:
x<-expand.grid(0:1,0:3,0:3,1:3)
## Once a non-0 appears in a row the rest of that row cannot contain another 0
b1<-apply(x,1,function(z) min(diff(z!=0))==0)
x<-x[b1,]
## Once a 3 appears in a row the rest of that row must only contain 3's
b1<-apply(x,1,function(z) min(diff(z==3))==0)
x<-x[b1,]
## The first non-0 number in a row must be a 1
b1<-apply(x,1,function(z) {
w<-which(z==0)
length(w)==0 || z[tail(w,1)+1]==1
})
x<-x[b1,]
And now sort it:
x<-x[order(x[,1],x[,2],x[,3],x[,4]),]
x
Output:
Var1 Var2 Var3 Var4
1 0 0 0 1
9 0 0 1 1
41 0 0 1 2
73 0 0 1 3
11 0 1 1 1
43 0 1 1 2
75 0 1 1 3
19 0 1 2 1
51 0 1 2 2
83 0 1 2 3
91 0 1 3 3
12 1 1 1 1
44 1 1 1 2
76 1 1 1 3
20 1 1 2 1
52 1 1 2 2
84 1 1 2 3
92 1 1 3 3
14 1 2 1 1
46 1 2 1 2
78 1 2 1 3
22 1 2 2 1
54 1 2 2 2
86 1 2 2 3
94 1 2 3 3
96 1 3 3 3
Similar to @mrip, start from expand.grid, which can handle the first 3 constraints since they don't interact with the other columns:
step1<-expand.grid(0:1,0:3,0:3,1:3)
Next I would filter it. The difference between this approach and mrip's is that my filtering is in one apply instead of 3 so it should be around 3 times faster to filter.
filtered <- step1[apply(step1, 1, function(x) all(
  if (length(which(x == 0)) > 0) {max(which(x == 0)) == length(which(x == 0))} else {TRUE},
  if (length(which(x == 3)) > 0) {min(which(x == 3)) == length(x) - length(which(x == 3)) + 1} else {TRUE},
  x[!x %in% 0][1] == 1
)), ]
That should be it. If you want to inspect each element inside the apply here it is:
if(length(which(x==0))>0) {max(which(x==0))==length(which(x==0))} else {TRUE}
If there are any zeros then it makes sure that nothing comes before the zero
if(length(which(x==3))>0) {min(which(x==3))==length(x)-length(which(x==3))+1} else {TRUE}
If there are any 3s it makes sure nothing is after them.
x[!x%in%0][1]==1
This first filters the zeros out of the row, then takes the first remaining element and only allows it to be a 1.
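The expand.grid-then-filter idea can also be wrapped so that the number of columns is a parameter. The following is only a sketch: `is_valid_row` and `constrained_grid` are hypothetical helper names, and it assumes the same six constraints with at least two columns.

```r
# Hypothetical helper: TRUE if a row satisfies constraints 4 to 6
is_valid_row <- function(z) {
  nonzero <- z != 0
  all(diff(nonzero) >= 0) &&     # zeros may only appear before the first non-zero
    all(diff(z == 3) >= 0) &&    # once a 3 appears, the rest must be 3
    z[which(nonzero)[1]] == 1    # the first non-zero value must be 1
}

# Hypothetical wrapper: k is the number of columns (assumed k >= 2)
constrained_grid <- function(k) {
  # Constraints 1 to 3: allowed values per column
  cols <- c(list(0:1), rep(list(0:3), k - 2), list(1:3))
  g <- expand.grid(cols)
  g <- g[apply(as.matrix(g), 1, is_valid_row), ]
  g <- g[do.call(order, unname(as.list(g))), ]  # sort by all columns, left to right
  rownames(g) <- NULL
  g
}

res <- constrained_grid(4)
dim(res)
```

For k = 4 this yields the 26 rows listed in the question. The grid grows exponentially with k before filtering, so this approach is only practical for a modest number of columns.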

How can i count occurrence with few variables in R

I have some example data.frame:
x<- data.frame(c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1),c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1))
colnames(x) <- c('PV','LA','Wiz','LAg')
I want to count occurrences of each whole row. The result should look like:
PV LA Wiz LAg Replace
0 0 0 0 1
1 1 1 1 2
2 2 2 2 2
1 2 1 2 1
2 1 2 1 1
The row 0 0 0 0 occurs 1 time, the row 1 1 1 1 occurs 2 times, etc.
Do you have any idea how I can do this?
Maybe you want this?
as.data.frame(table(do.call(paste, x))) # paste all four columns so each whole row becomes one key
# Var1 Freq
#1 0 0 0 0 1
#2 1 1 1 1 2
#3 1 2 1 2 1
#4 2 1 2 1 1
#5 2 2 2 2 2
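If the counts should stay in separate columns rather than one pasted key, a base R sketch with aggregate is one option. The helper column of ones and the name `counts` are my own choices here; `Replace` follows the column name used in the question.

```r
x <- data.frame(PV  = c(0, 1, 2, 1, 2, 1, 2),
                LA  = c(0, 1, 2, 1, 2, 2, 1),
                Wiz = c(0, 1, 2, 1, 2, 1, 2),
                LAg = c(0, 1, 2, 1, 2, 2, 1))

# Add a column of ones and sum it within each distinct row pattern
counts <- aggregate(Replace ~ PV + LA + Wiz + LAg,
                    data = cbind(x, Replace = 1),
                    FUN = sum)
counts
```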

Conditional counting in R

I have a question I hope some of you might help me with. I am writing a thesis on pharmaceuticals and the effect of parallel imports. I am working on this in R with a panel data set.
I need a variable that counts, for a given original product, how many parallel importers there are in a given time period.
Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3
Ideally, what I want here is a new column giving the number of PI products (PI=1) for an original (PI=0) at time t. So the output would look like:
Product_ID PI t nPIcomp
1 0 1 2
1 1 1
1 1 1
1 0 2 4
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1 1
2 1 1
2 0 2 1
2 1 2
2 0 3 3
2 1 3
2 1 3
2 1 3
I hope I have made my issue clear :)
Thanks in advance,
Henrik
Something like this?
x <- read.table(text = "Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3", header = TRUE)
find.count <- rle(x$PI)
count <- find.count$lengths[find.count$values == 1]
x[x$PI == 0, "nPIcomp"] <- count
Product_ID PI t nPIcomp
1 1 0 1 2
2 1 1 1 NA
3 1 1 1 NA
4 1 0 2 4
5 1 1 2 NA
6 1 1 2 NA
7 1 1 2 NA
8 1 1 2 NA
9 2 0 1 1
10 2 1 1 NA
11 2 0 2 1
12 2 1 2 NA
13 2 0 3 3
14 2 1 3 NA
15 2 1 3 NA
16 2 1 3 NA
I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:
dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
FUN = function(z) {
n <- sum(z)
c(n, rep(NA, n))
}))
The same idea can be used with the data.table package if your data is large and speed is a concern.
Roman's answer gives exactly what you want. In case you want to summarise the data, this would be handy, using the plyr package (df is what I have called your data.frame):
ddply( df , .(Product_ID , t ) , summarise , nPIcomp = sum(PI) )
# Product_ID t nPIcomp
#1 1 1 2
#2 1 2 4
#3 2 1 1
#4 2 2 1
#5 2 3 3
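A variant that does not depend on the original product being the first row of its group can be sketched with ave and ifelse. It assumes, as in the example, that PI is coded 0 for the original and 1 for each parallel import; `n_pi` is a hypothetical intermediate name.

```r
x <- read.table(text = "Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3", header = TRUE)

# Number of PI == 1 rows within each (Product_ID, t) group, repeated on every row
n_pi <- ave(x$PI, x$Product_ID, x$t, FUN = sum)

# Keep the count only on the rows of the original product
x$nPIcomp <- ifelse(x$PI == 0, n_pi, NA)
x
```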

Changing the ID value based on another column

I have a large data set that looks something like this:
Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 1 3 2
1 100 1 4 2
0 0 2 1 1
0 0 2 2 1
1 150 2 3 1
1 100 2 4 2
I want to make a new ID column that changes whenever a new path_no starts. So I am hoping it will look something like this:
Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 2 3 2
1 100 2 4 2
0 0 3 1 1
0 0 3 2 1
1 150 3 3 1
1 100 4 4 2
I think rleid from data.table should do the trick. Here's one solution that uses data.table and dplyr:
dplyr::mutate(df, ID = data.table::rleid(path_no))
Conv. Rev. ID Order path_no
1 0 0 1 1 1
2 1 50 1 2 1
3 0 0 2 3 2
4 1 100 2 4 2
5 0 0 3 1 1
6 0 0 3 2 1
7 1 150 3 3 1
8 1 100 4 4 2
Or with data.table only:
dt <- setDT(df)
dt[, ID := rleid(path_no)][]
Conv. Rev. ID Order path_no
1: 0 0 1 1 1
2: 1 50 1 2 1
3: 0 0 2 3 2
4: 1 100 2 4 2
5: 0 0 3 1 1
6: 0 0 3 2 1
7: 1 150 3 3 1
8: 1 100 4 4 2
Data:
text <- "Conv. Rev. ID Order path_no
0 0 1 1 1
1 50 1 2 1
0 0 1 3 2
1 100 1 4 2
0 0 2 1 1
0 0 2 2 1
1 150 2 3 1
1 100 2 4 2"
df <- read.table(text = text, stringsAsFactors = FALSE, header = TRUE)
You can also go for a simple for loop:
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)
nobs <- length(vals)
idx <- rep(1, nobs)
for (i in 2:nobs) {
if (vals[i] != vals[i-1]) {
idx[i] <- idx[i-1] + 1
} else {
idx[i] <- idx[i-1]
}
}
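The loop can also be written without iteration. A minimal base R sketch, assuming vals has at least two elements and no NA values: flag each position where the value differs from its predecessor, then take the cumulative sum of those flags.

```r
vals <- c(1, 1, 1, 2, 2, 2, 1, 1, 2)

# TRUE marks the start of a new run; the first element always starts one
starts <- c(TRUE, vals[-1] != vals[-length(vals)])

# The running count of run starts is the run id
idx <- cumsum(starts)
idx
```

Applied to the path_no column of the question (1, 1, 2, 2, 1, 1, 1, 2), the same two lines give the IDs 1, 1, 2, 2, 3, 3, 3, 4.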
