Adding reproducible code, as suggested by the answers:
Qs<-paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID=1:5)
for(q in Qs) {
maindata[,q] <- sample(1:20,5,replace=T)
}
I have the code below. Is there a better way to achieve the output without writing each line? I thought of writing a for loop to iterate over questions 1 to 18, but felt that a for loop might not be very efficient...
ifelse(maindata$Q2_1 > 2 & maindata$Q2_1< 11 & !is.na(maindata$Q2_1), 1, 0 )+
ifelse(maindata$Q2_2 > 2 & maindata$Q2_2< 11 & !is.na(maindata$Q2_2), 1, 0)+
ifelse(maindata$Q2_3 > 2 & maindata$Q2_3< 11 & !is.na(maindata$Q2_3), 1, 0)+
ifelse(maindata$Q2_4 > 2 & maindata$Q2_4< 11 & !is.na(maindata$Q2_4), 1, 0)+
ifelse(maindata$Q2_5 > 2 & maindata$Q2_5< 11 & !is.na(maindata$Q2_5), 1, 0)+
ifelse(maindata$Q2_6 > 2 & maindata$Q2_6< 11 & !is.na(maindata$Q2_6), 1, 0)+
ifelse(maindata$Q2_7 > 2 & maindata$Q2_7< 11 & !is.na(maindata$Q2_7), 1, 0)+
ifelse(maindata$Q2_8 > 2 & maindata$Q2_8< 11 & !is.na(maindata$Q2_8), 1, 0)+
ifelse(maindata$Q2_9 > 2 & maindata$Q2_9< 11 & !is.na(maindata$Q2_9), 1, 0)+
ifelse(maindata$Q2_10 > 2 & maindata$Q2_10< 11 & !is.na(maindata$Q2_10), 1, 0)+
ifelse(maindata$Q2_11 > 2 & maindata$Q2_11< 11 & !is.na(maindata$Q2_11), 1, 0)+
ifelse(maindata$Q2_12 > 2 & maindata$Q2_12< 11 & !is.na(maindata$Q2_12), 1, 0)+
ifelse(maindata$Q2_13 > 2 & maindata$Q2_13< 11 & !is.na(maindata$Q2_13), 1, 0)+
ifelse(maindata$Q2_14 > 2 & maindata$Q2_14< 11 & !is.na(maindata$Q2_14), 1, 0)+
ifelse(maindata$Q2_15 > 2 & maindata$Q2_15< 11 & !is.na(maindata$Q2_15), 1, 0)+
ifelse(maindata$Q2_16 > 2 & maindata$Q2_16< 11 & !is.na(maindata$Q2_16), 1, 0)+
ifelse(maindata$Q2_17 > 2 & maindata$Q2_17< 11 & !is.na(maindata$Q2_17), 1, 0)+
ifelse(maindata$Q2_18 > 2 & maindata$Q2_18< 11 & !is.na(maindata$Q2_18), 1, 0)
Well, here's one way. First, let's create some sample data
Qs<-paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID=1:5)
for(q in Qs) {
maindata[,q] <- sample(1:20,5,replace=T)
}
Here we make a list of all the question names (Qs) and we create a data.frame with 5 rows where each column contains values sampled from 1:20. If we want the score for each line for each individual, we can do
score <- rowSums(sapply(Qs, function(q)
maindata[,q] > 2 & maindata[,q] <11 & !is.na(maindata[,q]) )+0)
Here I use sapply to iterate over the question names, so the formula is written once and the different question names are swapped in. The function returns a logical value, and adding zero converts FALSE to 0 and TRUE to 1. Then I use rowSums to add up the scores across each row. We can see the results with
cbind(maindata[,"ID", drop=F], score)
# ID score
# 1 1 9
# 2 2 8
# 3 3 4
# 4 4 6
# 5 5 10
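As a hedged alternative to the sapply call above, the comparison can also be applied to the whole block of question columns at once, because comparing a data.frame with a number returns a logical matrix. This is only a sketch: the names in_range and score2 are made up for illustration, and the NA handling relies on rowSums(..., na.rm = TRUE) counting an NA comparison as 0, which matches the !is.na() condition in the question.

```r
# Rebuild the sample data used in the answer above
Qs <- paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID = 1:5)
for (q in Qs) maindata[, q] <- sample(1:20, 5, replace = TRUE)

# Compare all question columns at once; the result is a logical matrix
in_range <- maindata[Qs] > 2 & maindata[Qs] < 11

# Count TRUE values per row; NA comparisons are dropped, i.e. counted as 0
score2 <- rowSums(in_range, na.rm = TRUE)

# Reference result from the sapply approach, for comparison
score <- rowSums(sapply(Qs, function(q)
  maindata[, q] > 2 & maindata[, q] < 11 & !is.na(maindata[, q])) + 0)
```

Both versions should give the same score per row; the matrix version avoids the explicit iteration over column names.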
I have two columns: one is gender and the other is a measure, as below. I want to set a different cutoff on the measure column for each gender. If the row is male (gender = 1) and measure is less than 23, the status should be 1, otherwise 0; if the row is female (gender = 2) and measure is less than 15, the status should be 1, otherwise 0.
I tried the code below, but it is not working. I appreciate your help.
d$measure_status = ifelse(d$gender ==2 & d$measure<15, 1, ifelse( d$gender ==1 & d$measure<23, 1), 0)
gender measure measure_status
2 14
2 17
1 25
1 26
You can use with and make it a single ifelse condition
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df$measure_status <- with(df, ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
df
Output:
gender measure measure_status
1 2 14 1
2 2 17 0
3 1 25 0
4 1 26 0
You can also use transform
df = data.frame(gender = c(2,2,1,1), measure = c(14,17,25,26))
df <- transform(df, measure_status = ifelse((gender == 2 & measure < 15) | (gender == 1 & measure < 23), 1, 0))
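If more groups or cutoffs are added later, one possible variation is to keep the cutoffs in a named lookup vector and compare each measure against the cutoff for its gender. This is a sketch only; the vector name cutoff is hypothetical and the coding 1 = male, 2 = female is taken from the question.

```r
df <- data.frame(gender = c(2, 2, 1, 1), measure = c(14, 17, 25, 26))

# Cutoff per gender code (1 = male, 2 = female, as in the question)
cutoff <- c("1" = 23, "2" = 15)

# Look up each row's cutoff by gender, then compare and convert to 0/1
df$measure_status <- as.integer(df$measure < cutoff[as.character(df$gender)])
df$measure_status
# 1 0 0 0
```

The result agrees with the ifelse versions above, and adding a new group only requires a new entry in cutoff.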
I have a dataframe with two columns. I want to compare the signs of the two columns row by row and see where they differ. It is easier to see with an example.
This is the dataframe:
df = data.frame(COL1 = rnorm(15, 0, 1), COL2 = rnorm(15, 0, 1))
COL1 COL2
1 0.01274137 -0.97966119
2 -0.48455106 1.19248167
3 -0.79149435 -1.45365392
4 -0.18961660 0.02216361
5 -0.34771000 1.39026672
6 0.28199427 0.49143945
7 -0.28650800 -0.71676355
8 -0.29677529 1.13092654
9 -0.24240084 0.99432286
10 2.13540200 0.66348347
11 1.94442199 0.53371032
12 -1.63108069 -0.21556863
13 0.38334186 -0.91472900
14 1.15981803 -0.54540520
15 1.04363634 -1.68835445
I would like code that compares the signs of COL1 and COL2 and tells me where they differ. The outcome should be:
# rows where the sign differs: 1, 2, 3, 4, 5, 8, 9, 13, 14, 15
Can anyone help me with this?
Thanks
You can get the sign of each element with sign, and use which to get the indices where the signs are not equal
which(sign(df$COL1) != sign(df$COL2))
Edit: Warning, all three current answers above fail when there are NA values.
set.seed(4)
df2 = data.frame(COL1 = rnorm(15, 0, 1), COL2 = rnorm(15, 0, 1))
df2[1, 1] <- NA
COL1 COL2
1 NA 0.1690268
2 -0.54249257 1.1650268
3 0.89114465 -0.0442040
4 0.59598058 -0.1003684
5 1.63561800 -0.2834446
6 0.68927544 1.5408150
7 -1.28124663 0.1651690
8 -0.21314452 1.3076224
9 1.89653987 1.2882569
10 1.77686321 0.5928969
11 0.56660450 -0.2829437
12 0.01571945 1.2558840
13 0.38305734 0.9098392
14 -0.04513712 -0.9280281
15 0.03435191 1.2401808
which(sign(df2$COL1) != sign(df2$COL2))
[1] 2 3 4 5 7 8 11
which(sign(df2[,1] * df2[,2]) == -1)
[1] 2 3 4 5 7 8 11
which(df2$COL1 < 0 & df2$COL2 > 0 | df2$COL1 > 0 & df2$COL2 < 0)
[1] 2 3 4 5 7 8 11
Here is a solution that also works when there are NA values. It tests for equality and keeps the indices where the result is not TRUE, using !(...) %in% TRUE as opposed to != TRUE, so rows with an NA comparison are kept rather than dropped
which(!(sign(df2$COL1) == sign(df2$COL2)) %in% TRUE)
[1] 1 2 3 4 5 7 8 11
Compare output of
! NA %in% TRUE
[1] TRUE
NA != TRUE
[1] NA
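To make the difference concrete, here is a small sketch on two short made-up vectors (a and b are illustrative, not the data above). Note that ! binds less tightly than %in%, so !(cond) %in% TRUE is evaluated as !((cond) %in% TRUE).

```r
a <- c(NA, -1, 2, 3)
b <- c(1, 1, -2, 3)

# which() silently drops the NA produced by comparing with a missing value
plain <- which(sign(a) != sign(b))
plain
# 2 3

# %in% never returns NA, so the row with the missing value is reported too
na_safe <- which(!(sign(a) == sign(b)) %in% TRUE)
na_safe
# 1 2 3
```

Whether a row with a missing value should count as "differs" depends on the analysis; the second form just makes that choice explicit instead of dropping the row silently.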
How about multiplying the columns together and getting the sign with sign?
which(sign(df[,1] * df[,2]) == -1)
[1] 1 2 4 5 8 9 13 14 15
You can just apply logic comparing whether the columns are < or > zero.
library(dplyr)
df %>%
filter(COL1 < 0 & COL2 > 0 | COL1 > 0 & COL2 < 0)
The index of rows can be obtained using which
which(df$COL1 < 0 & df$COL2 > 0 | df$COL1 > 0 & df$COL2 < 0)
The input vector is as below,
data=c(1,1,1,1,11,1,1,1,1,12,1,1,2,1,1,1)
I want the output to be 1,1,1,1,11,11,11,11,11,12,12,12,2,2,2,2, where the 1's following a non-1 value are replaced by that non-1 value, in R.
I tried the following code
data=c(1,1,1,1,11,1,1,1,1,12,1,1,2,1,1,1)
sapply(data, function(x) ifelse (lag(x)!=1,lag(x),x))
but it didn't yield the expected output
You can convert every 1 after the first non-1 value to NA then use zoo::na.locf():
library(zoo)
x <- c(1,1,1,1,11,1,1,1,1,12,1,1,2,1,1,1)
x[seq_along(x) > which.max(x != 1) & x == 1] <- NA
na.locf(x)
[1] 1 1 1 1 11 11 11 11 11 12 12 12 2 2 2 2
Or using replace() to add the NA values:
na.locf(replace(x, seq_along(x) > which.max(x != 1) & x == 1, NA))
In response to your comment about applying it to groups, you can use ave():
df <- data.frame(x = c(x, rev(x)), grp = rep(1:2, each = length(x)))
ave(df$x, df$grp, FUN = function(y)
na.locf(replace(y, seq_along(y) > which.max(y != 1) & y == 1, NA))
)
You can write your custom fill function:
x <- c(1,1,1,1,11,1,1,1,1,12,1,1,2,1,1,1)
myfill <- function(x) {
mem <- x[1]
for (i in seq_along(x)) {
if (x[i] == 1) {
x[i] <- mem
} else {
mem <- x[i]
}
}
x
}
myfill(x)
# 1 1 1 1 11 11 11 11 11 12 12 12 2 2 2 2
You could match unique 1 and non-1 values with the cumsum of non-1 values.
(c(1, x[x != 1]))[match(cumsum(x != 1), 0:3)]
# [1] 1 1 1 1 11 11 11 11 11 12 12 12 2 2 2 2
Data
x <- c(1, 1, 1, 1, 11, 1, 1, 1, 1, 12, 1, 1, 2, 1, 1, 1)
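The match(..., 0:3) call above hard-codes the number of non-1 values. A possible generalisation, sketched here with a hypothetical helper name fill_ones, indexes directly with the cumulative count of non-1 values plus one, so it should work for any number of non-1 values.

```r
# Replace every 1 that follows a non-1 value with that non-1 value.
# fill_ones is an illustrative name, not a function from any package.
fill_ones <- function(x) {
  # Position 1 of the lookup holds the leading value; then each non-1 value
  vals <- c(x[1], x[x != 1])
  # cumsum counts how many non-1 values have been seen so far
  vals[cumsum(x != 1) + 1]
}

x <- c(1, 1, 1, 1, 11, 1, 1, 1, 1, 12, 1, 1, 2, 1, 1, 1)
filled <- fill_ones(x)
filled
# 1 1 1 1 11 11 11 11 11 12 12 12 2 2 2 2
```

Like the other answers, this assumes the vector has no NA values.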
You can use rle from base to overwrite 1 with the value before.
x <- rle(data)
y <- c(FALSE, (x$values == 1)[-1])
x$values[y] <- x$values[which(y)-1]
inverse.rle(x)
# [1] 1 1 1 1 11 11 11 11 11 12 12 12 2 2 2 2
What is the optimal way to get the index of all elements that are repeated # times? I want to identify the elements that are duplicated more than 2 times.
rle() and rleid() both hint at the values I need, but neither method directly gives me the indices.
I came up with this code:
t1 <- c(1, 10, 10, 10, 14, 37, 3, 14, 8, 8, 8, 8, 39, 12)
t2 <- lag(t1,1)
t2[is.na(t2)] <- 0
t3 <- ifelse(t1 - t2 == 0, 1, 0)
t4 <- rep(0, length(t3))
for (i in 2:length(t3)) t4[i] <- ifelse(t3[i] > 0, t3[i - 1] + t3[i], 0)
which(t4 > 1)
returns:
[1] 4 11 12
and those are the values I need.
Are there any R-functions that are more appropriate?
Ben
One option with data.table. No real reason to use this instead of lag/shift when n = 2, but for larger n this would save you from creating a large number of new lagged vectors.
library(data.table)
which(rowid(rleid(t1)) > 2)
# [1] 4 11 12
Explanation:
rleid will produce a unique value for each "run" of equal values, and rowid will mark how many elements "into" the run each element is. What you want is elements more than 2 "into" a run.
data.table(
t1,
rleid(t1),
rowid(rleid(t1)))
# t1 V2 V3
# 1: 1 1 1
# 2: 10 2 1
# 3: 10 2 2
# 4: 10 2 3
# 5: 14 3 1
# 6: 37 4 1
# 7: 3 5 1
# 8: 14 6 1
# 9: 8 7 1
# 10: 8 7 2
# 11: 8 7 3
# 12: 8 7 4
# 13: 39 8 1
# 14: 12 9 1
Edit: If, as in the example posed by this question, no two runs (even length-1 "runs") are of the same value (or if you don't care whether the duplicates are next to each other), you can just use which(rowid(t1) > 2) instead. (This is noted by Frank in the comments.)
Hopefully this example clarifies the differences
a <- c(1, 1, 1, 2, 2, 1)
which(rowid(a) > 2)
# [1] 3 6
which(rowid(rleid(a)) > 2)
# [1] 3
You can use dplyr::lag or data.table::shift (note that the default for shift is to lag, so shift(t1, 1) is equal to shift(t1, 1, type = "lag")):
which(t1 == lag(t1, 1) & lag(t1, 1) == lag(t1, 2))
[1] 4 11 12
# Or
which(t1 == shift(t1, 1) & shift(t1, 1) == shift(t1, 2))
[1] 4 11 12
If you need it to scale for several duplicates you can do the following (thanks for the tip @IceCreamToucan):
n <- 2
df1 <- sapply(0:n, function(x) shift(t1, x))
which(rowMeans(df1 == df1[,1]) == 1)
[1] 4 11 12
This is a typical case where rle is useful, i.e.
v1 <- rle(t1)
i1 <- seq_along(t1)[t1 %in% v1$values[v1$lengths > 2]]
i2 <- t1[t1 %in% v1$values[v1$lengths > 2]]
tapply(i1, i2, function(i) tail(i, -2))
#$`8`
#[1] 11 12
#$`10`
#[1] 4
You can unlist and get it as a vector,
unlist(tapply(i1, i2, function(i) tail(i, -2)))
#81 82 10
#11 12 4
There is also a function called rleid in data.table package which we can use,
unlist(lapply(Filter(function(i) length(i) > 2, split(seq_along(t1), data.table::rleid(t1))),
function(i) tail(i, -2)))
#2 71 72
#4 11 12
Another possibility involving rle() could be:
pseudo_rleid <- with(rle(t1), rep(seq_along(values), lengths))
which(ave(t1, pseudo_rleid, FUN = function(x) seq_along(x) > 2) != 0)
[1] 4 11 12
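A shorter base R variant of the same idea, offered as a sketch: sequence() applied to the run lengths gives each element's position within its run, so the elements repeated more than n times in a row are those with a position greater than n. The names n, run_pos and idx are illustrative.

```r
t1 <- c(1, 10, 10, 10, 14, 37, 3, 14, 8, 8, 8, 8, 39, 12)
n <- 2

# Position of each element inside its run of equal consecutive values
run_pos <- sequence(rle(t1)$lengths)

# Indices of elements that are more than n deep into a run
idx <- which(run_pos > n)
idx
# 4 11 12
```

This needs no extra packages and scales to larger n without building lagged copies of the vector.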
Hello, I need help with programming in R. I have a data.frame B with four columns:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2,.......etc.)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50,...etc.)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30",.......etc.)
m<-c(0,1,1,0,1,1,0,1,0,1,0,1,...etc.)
The data has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,0,0,0,1,0,1,0,1,....)
The code using a loop looks like this:
B$t[1]=ifelse(B$y[i]==B$y[i+1] & B$z[i]==B$z[i+1] & B$x[i]==2 & B$m[1]==1,1,0)
for (i in 2:length(B$z))
{
B$t[i]<-ifelse(B$y[i]==B$y[i-1] & B$z[i]==B$z[i-1] & B$x[i]==2 & B$m[i]==1 & B$m[i]!=B$m[i-1],1,0)
}
I do not want to use a loop.
I am using base R only.
I also have a second question, for a data.frame E:
x<- c(1,2,3,1,2,3,1,2,3,1,2,3,.......etc.)
y<-c(5,5,5,8,8,8,12,12,12,19,19,19,30,30,30,50,50,50,...etc.)
z<- c("2018-11-08","2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-29","2018-11-30","2018-11-30","2018-11-30",.......etc.)
m<-c(0,1,1,0,0,1,0,1,0,1,0,1,0,0,1...etc.)
The data has 2 million rows, and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,1,....)
The code using a loop looks like this:
E$t[1]=ifelse(E$y[i]==E$y[i+1] & E$z[i]==E$z[i+1] & E$x[1]==2 & E$m[1]==1,1,0)
E$t[2]=ifelse(E$y[i]==E$y[i+1] & E$z[i]==E$z[i+1] & E$x[2]==3 & E$m[2]==1,1,0)
for (i in 3:length(E$y))
{
E$t[i]<-ifelse(E$y[i]==E$y[i-2] & E$z[i]==E$z[i-2] & E$x[i]==3 & E$m[i]==1 &
E$m[i-1]==0 & E$m[i-2]==0,1,0)
}
I do not want to use a loop.
I am using base R only.
Here is a solution with base R:
N <- nrow(B)
B$t <- ifelse(B$y==c(NA, B$y[-N]) & B$z==c(NA, B$z[-N]) & B$x==2 & B$m==1 & B$m!=c(NA, B$m[-N]), 1, 0)
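The same shifting idea can be extended to the second data.frame E, where the loop looks back one and two rows. The following is a sketch under assumptions: lag_by is a hypothetical helper (not a base R function), the small E below is made-up illustrative data rather than the asker's full data, and the condition is copied from the loop body for i >= 3. Rows whose comparison is NA are set to 0.

```r
# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_by <- function(v, k) c(rep(NA, k), head(v, -k))

# Small illustrative data (not the asker's real data)
E <- data.frame(
  x = rep(1:3, 3),
  y = rep(c(5, 8, 12), each = 3),
  z = rep(c("2018-11-08", "2018-11-09", "2018-11-11"), each = 3),
  m = c(0, 1, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Vectorised version of the loop condition for rows i >= 3
cond <- with(E,
  y == lag_by(y, 2) & z == lag_by(z, 2) & x == 3 & m == 1 &
    lag_by(m, 1) == 0 & lag_by(m, 2) == 0)

# %in% TRUE maps NA (from the first rows) to FALSE before converting to 0/1
E$t <- as.integer(cond %in% TRUE)
E$t
# 0 0 0 0 0 1 0 0 0
```

Only row 6 qualifies here: it has x == 3 and m == 1, the same y and z two rows earlier, and m == 0 in both preceding rows.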
Here is a solution with data.table:
library("data.table")
B <- data.table(
x= c(1,2,1,2,1,2,1,2,1,2,1,2), y= c(5,5,8,8,12,12,19,19,30,30,50,50),
z= c("2018-11-08", "2018-11-08", "2018-11-09", "2018-11-09", "2018-11-11", "2018-11-11", "2018-11-20",
"2018-11-20", "2018-11-29", "2018-11-29", "2018-11-30", "2018-11-30"),
m= c(0,1,1,0,1,1,0,1,0,1,0,1)
)
B[, t := ifelse(y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]), 1, 0)]
or (if logical is acceptable)
B[, t := (y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]))]
or using shift()
B[, t := (y==shift(y) & z==shift(z) & x==2 & m==1 & m!=shift(m))]
With dplyr you can use if_else and lag:
library(dplyr)
dat %>%
mutate(t = if_else(
y == lag(y) & z == lag(z) & x == 2 & m == 1 & m != lag(m), 1, 0)
) # mutate lets you create a new variable in dat (named t here)
# x y z m t
# 1 1 5 2018-11-08 0 0
# 2 2 5 2018-11-08 1 1
# 3 1 8 2018-11-09 1 0
# 4 2 8 2018-11-09 0 0
# 5 1 12 2018-11-11 1 0
# 6 2 12 2018-11-11 1 0
# 7 1 19 2018-11-20 0 0
# 8 2 19 2018-11-20 1 1
# 9 1 30 2018-11-29 0 0
# 10 2 30 2018-11-29 1 1
# 11 1 50 2018-11-30 0 0
# 12 2 50 2018-11-30 1 1
Data:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30")
m<-c(0,1,1,0,1,1,0,1,0,1,0,1)
dat <- data.frame(x, y, z, m)