How to make a binary variable? - r

I have three variables with the scale (0,1,2),
for example:
x1 x2 x3
 1  0  1
NA NA  0
 1  1  1
NA NA NA
 0  0  0
I want to create another variable, x4: if x1 and/or x2 and/or x3 is 1, then x4 has to be 1. Sample values for x4 are shown below:
x1 x2 x3 x4
 1  0  1  1
NA NA  0  0
 1  1  1  1
NA NA NA NA
 0  0  0  0
I am using RStudio. I tried the ifelse function, but I didn't get what I wanted.
Can anyone please suggest other ways to create this variable?
I used the following code:
data$hope <- ifelse(data$x1 > 0 && data$x2 > 0 && data$x3 > 0,1,0)

We could use pmax if there are only binary columns in the dataset
df1$x4 <- do.call(pmax, c(df1, na.rm = TRUE))
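For reference, here is a minimal sketch of the pmax approach on the question's sample data. The data frame is retyped from the table in the question, and the names cols, has_one and all_missing are only illustrative. Note that the attempt in the question combines the columns with `&&`, which is not element-wise (recent R versions raise an error when it is given vectors longer than one) and which expresses "and" rather than the "and/or" that is wanted. The second variant assumes that a value of 2 should not count as a 1, which the question does not state explicitly.

```r
# Sample data retyped from the question (x1..x3 hold 0, 1 or NA here)
df1 <- data.frame(
  x1 = c(1, NA, 1, NA, 0),
  x2 = c(0, NA, 1, NA, 0),
  x3 = c(1, 0, 1, NA, 0)
)
cols <- c("x1", "x2", "x3")

# Row-wise maximum, ignoring NA; rows that are entirely NA stay NA
x4_pmax <- do.call(pmax, c(df1[cols], na.rm = TRUE))

# Alternative that tests for the value 1 explicitly
# (assumption: a 2 in any column should not set x4 to 1)
has_one <- rowSums(df1[cols] == 1, na.rm = TRUE) > 0
all_missing <- rowSums(is.na(df1[cols])) == length(cols)
x4_explicit <- ifelse(all_missing, NA, as.integer(has_one))

df1$x4 <- x4_pmax
```

On data that only contains 0, 1 and NA, both variants give the same x4; they differ only when a 2 is present.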

Related

How to construct dummy matrix with a list of data

The sample data is like this:
data1:
x1 x2 x3 x4
 1  2  3  4
 2  3 -1 -1
NA NA NA NA
 0  0  0  0
 1 -1 -1 -1
NA NA NA NA
 4  3 -1 -1
 0  0  0  0
Each row of data1 lists the numbers of the groups (x1, x2, x3, x4) that the row belongs to.
-1 means that the entry is blank.
0 means that the row does not belong to the corresponding group (i.e. a 0 in x1 means the datum does not belong to group 1).
NA means missing data; NA appears at random in the dataset.
Edit:
For example, in the 1st row,
[1,2,3,4] means the first, second, third, and fourth columns.
Therefore, the 1st row of data2 will be
[1,1,1,1].
In the 2nd row,
[2,3,-1,-1] means the second and third columns; -1 means that there is a blank.
Therefore, the 2nd row of data2 will be
[0,1,1,0].
My expected outcome is :
data2:
x1 x2 x3 x4
 1  1  1  1
 0  1  1  0
NA NA NA NA
 0  0  0  0
 1  0  0  0
NA NA NA NA
 0  0  1  1
 0  0  0  0
My code is as below:
for (i in 1:8) {
  if (data1$x1[i] %in% c(0)) {
    data1[i, ] = as.list(rep(0, 4))
  } else if (is.na(data1$x1[i])) {
    data1[i, ] = as.list(rep(NA, 4))
  }
}
for (i in which(data1$x1 %nin% c(NA, 0))) {
  for (j in 1:4) {
    if (data1[i, j] < 15 & data1[i, j] > 0) {
      data1[i, j] = m
      data1[i, m] = 1
    }
  }
}
# replace -1 with 0
data1[data1 == -1] = 0
# this for loop creates the dummy matrix
for (i in which(data1$x1 %nin% c(NA, 0))) {
  m = data1[i, ]
  m = m[m > 0]
  for (j in 1:length(m)) {
    data1[i, m] = 1
  }
}
# replace the numbers greater than one with zero
data1[data1 > 1] = 0
I wonder whether there is a function that can replace the for loops. Please give me some suggestions, thank you!
I am still not entirely sure of the logic, but this might be helpful. Using apply, you can evaluate each row independently.
First, create a vector of NA. Then, for every value in the row that is greater than 0, set the element of the vector at that position (the column number given by the value) to 1.
Second, if the vector has at least one 1, change the remaining missing elements to 0.
Third, if all elements of the row are zero and no values are missing, make all values in that row 0.
The end result is a matrix in this example.
t(apply(
  data1,
  MARGIN = 1,
  \(x) {
    vec <- rep(NA, length(x))
    vec[x[x > 0]] <- 1
    if (any(vec == 1, na.rm = TRUE)) vec[is.na(vec)] <- 0
    if (any(!is.na(x)) & all(x == 0)) vec <- rep(0, length(x))
    vec
  }
))
Output
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 0 1 1 0
[3,] NA NA NA NA
[4,] 0 0 0 0
[5,] 1 0 0 0
[6,] NA NA NA NA
[7,] 0 0 1 1
[8,] 0 0 0 0
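As a possible loop-free alternative, here is a minimal sketch that uses matrix indexing instead of apply. The data frame is retyped from the question; the names m, out and hit are only illustrative, and the sketch assumes that NA only ever fills a whole row, as it does in the sample data.

```r
# Sample data retyped from the question
data1 <- data.frame(
  x1 = c(1, 2, NA, 0, 1, NA, 4, 0),
  x2 = c(2, 3, NA, 0, -1, NA, 3, 0),
  x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
  x4 = c(4, -1, NA, 0, -1, NA, -1, 0)
)

m <- as.matrix(data1)

# Start from an all-zero matrix of the same shape
out <- matrix(0, nrow(m), ncol(m), dimnames = list(NULL, colnames(m)))

# Positions (row, col) of every entry that holds a group number (> 0)
hit <- which(!is.na(m) & m > 0, arr.ind = TRUE)

# For each such entry, set column <group number> of the same row to 1
out[cbind(hit[, "row"], m[hit])] <- 1

# Rows that are entirely missing stay missing (assumption: NA fills whole rows)
out[rowSums(is.na(m)) == ncol(m), ] <- NA

out
```

The two-column matrix passed to `out[...]` picks one cell per row of the index matrix, so no explicit loop over rows or columns is needed.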

Create a new variable based on any 2 conditions being true

I have a data frame in R with 4 variables and would like to create a new variable based on any 2 conditions being true on those variables.
I attempted to create it with if/else statements, but that would require spelling out every permutation of the conditions being true. I would also need it to scale so that I can create a new variable based on any 3 conditions being true. Is there a more efficient method than if/else statements?
My example:
I have a dataframe X with following column variables
x1 = c(1,0,1,0)
X2 = c(0,0,0,0)
X3 = c(1,1,0,0)
X4 = c(0,0,1,0)
I would like to create a new variable X5 that is 1 if any 2 of the variables are true (e.g. == 1).
The new variable based on the above dataframe would produce X5 (1,0,1,0)
This can easily be done by using the apply function:
x1 = c(1,0,1,0)
x2 = c(0,0,0,0)
x3 = c(1,1,0,0)
x4 = c(0,0,1,0)
df <- data.frame(x1,x2,x3,x4)
df$x5 <- apply(df,1,function(row) ifelse(sum(row != 0) == 2, 1, 0))
x1 x2 x3 x4 x5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
apply with MARGIN = 1 applies the function to every row. To scale this up to 3...N true values, just change the number in the ifelse statement.
You can try this:
#Data
df <- data.frame(x1,X2,X3,X4)
#Code
df$X5 <- ifelse(rowSums(df,na.rm=T)==2,1,0)
x1 X2 X3 X4 X5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
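The rowSums idea above can be wrapped in a small helper so that the number of required true values becomes a parameter. This is only a sketch: flag_k_true is a hypothetical name, and it assumes that "true" means exactly equal to 1 and that NA counts as not true.

```r
# Data from the question
x1 <- c(1, 0, 1, 0)
X2 <- c(0, 0, 0, 0)
X3 <- c(1, 1, 0, 0)
X4 <- c(0, 0, 1, 0)
df <- data.frame(x1, X2, X3, X4)

# Hypothetical helper: 1 where exactly k of the columns equal 1, else 0
flag_k_true <- function(data, k) {
  as.integer(rowSums(data == 1, na.rm = TRUE) == k)
}

df$X5 <- flag_k_true(df, 2)
df
```

Replacing `==` with `>=` inside the helper would flag rows with at least k true values instead of exactly k.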
You can use:
df$X5 <- 1*(apply(df == 1, 1, sum) == 2)
or
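# mapply(sum, df) alone would sum each column; do.call passes the columns
# as parallel arguments so that sum runs across each row
df$X5 <- 1*(do.call(mapply, c(list(FUN = sum), df)) == 2)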
Output
> df
X1 X2 X3 X4 X5
1 0 1 0 1
0 0 1 0 0
1 0 0 1 1
0 0 0 0 0
Data
df <- data.frame(X1 = x1, X2, X3, X4)

Alternatives to apply same condition to multiple variables inside case_when function

I am trying to find a more efficient or elegant way to apply multiple conditions inside the case_when function.
I am creating a dummy column based on multiple conditions across specific columns of a data frame. There are many cases where I use the same is.na() for many columns. I get the correct result, but I have tried other approaches with apply, reduce and anyNA without success.
Let's say this data frame looks like the data I'm working on:
library(dplyr)
set.seed(12)
dframe <- data.frame(
  x1 = sample(letters[1:2], 10, replace = TRUE),
  x2 = sample(0:1, 10, replace = TRUE),
  x3 = sample(0:2, 10, replace = TRUE),
  x4 = sample(0:2, 10, replace = TRUE),
  x5 = sample(0:2, 10, replace = TRUE),
  x6 = sample(0:2, 10, replace = TRUE)
) %>%
  mutate_if(is.numeric, list(~na_if(., 2)))
And it looks like this:
x1 x2 x3 x4 x5 x6
1 b 1 NA 0 0 0
2 b 0 0 0 NA NA
3 b 1 0 0 0 1
4 a 0 NA 1 NA 0
5 a 1 1 NA NA NA
6 b 0 NA 1 1 1
7 a 1 1 NA NA 0
8 a 1 0 1 NA 0
9 b 1 NA NA 0 0
10 b 1 1 0 NA NA
Then, I create the column x7 based on the following conditions:
dframe %>%
  mutate(
    x7 = case_when(
      x2 == 1 &
        (!is.na(x3) | !is.na(x4) | !is.na(x5)) &
        !is.na(x6) ~ 1,
      x2 == 1 ~ 0,
      TRUE ~ NA_real_
    )
  )
resulting in:
x1 x2 x3 x4 x5 x6 x7
1 b 1 NA 0 0 0 1
2 b 0 0 0 NA NA NA
3 b 1 0 0 0 1 1
4 a 0 NA 1 NA 0 NA
5 a 1 1 NA NA NA 0
6 b 0 NA 1 1 1 NA
7 a 1 1 NA NA 0 1
8 a 1 0 1 NA 0 1
9 b 1 NA NA 0 0 1
10 b 1 1 0 NA NA 0
However, I want to find an alternative to write (!is.na(x3) | !is.na(x4) | !is.na(x5)) because in my real script I have to type this for 11 columns.
I've tried to use complete.cases(x3, x4, x5), but it doesn't follow the logic I'm using in the code.
Using anyNA(x3, x4, x5) throws Error in anyNA(x3, x4, x5) : anyNA takes 1 or 2 arguments.
Also tried the answers of a similar problem, but since I'm not using it for filtering, it didn't work out.
Maybe I'm overthinking it, but what I'm looking for is something without having to use (!is.na(x3) | !is.na(x4) | !is.na(x5)).
We could use rowSums and specify the columns by name
library(dplyr)
dframe %>%
  mutate(x7 = case_when(
    x2 == 1 &
      rowSums(!is.na(.[c("x3","x4","x5")])) > 0 &
      !is.na(x6) ~ 1,
    x2 == 1 ~ 0,
    TRUE ~ NA_real_
  )
  )
Or by position
rowSums(!is.na(.[3:5])) > 0
We could do this using inverted logic as well.
rowSums(is.na(.[c("x3","x4","x5")])) != 3
Or
rowSums(is.na(.[3:5])) != 3
We use 3 here because there are 3 columns to check in the given example (x3, x4 and x5); change the number to match your actual number of columns (11).
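For comparison, the same rowSums logic can be written without dplyr, keeping the columns to check in a character vector so that the list can grow to 11 names without changing the expression. This is a minimal sketch: the data is retyped from the printed data frame in the question (x1 is left out because the condition does not use it), and check_cols and any_present are illustrative names.

```r
# Data retyped from the printed data frame in the question (x1 omitted)
dframe <- data.frame(
  x2 = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 1),
  x3 = c(NA, 0, 0, NA, 1, NA, 1, 0, NA, 1),
  x4 = c(0, 0, 0, 1, NA, 1, NA, 1, NA, 0),
  x5 = c(0, NA, 0, NA, NA, 1, NA, NA, 0, NA),
  x6 = c(0, NA, 1, 0, NA, 1, 0, 0, 0, NA)
)

# Columns of which at least one must be non-missing; extend as needed
check_cols <- c("x3", "x4", "x5")
any_present <- unname(rowSums(!is.na(dframe[check_cols])) > 0)

# Same three branches as the case_when call: 1, 0, or NA when x2 is not 1
dframe$x7 <- ifelse(
  dframe$x2 == 1,
  as.numeric(any_present & !is.na(dframe$x6)),
  NA_real_
)
dframe
```

The resulting x7 column matches the one shown in the question for this data.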

How to parse a column while referring to values from other multiple columns?

I have this sample data frame, where columns a to d are reference columns and columns x1 to x3 need to be filled in with new values.
Here is the code to reproduce the data frame:
df1 <- data.frame(a = c(0,1,0,1), b = c(0,0,1,1), c = c(0,0,0,0), d = c(1,0,0,1),
                  x1 = c(NA, NA, NA, NA), x2 = c(NA, NA, NA, NA), x3 = c(NA, NA, NA, NA))
I want to give new values to x1 -x3 based on different value combination from column a, b, c, d. My pseudocode is as follows:
for df1[ , "x1"]:
    if a = 1: then return 1
    else: return 0
for df1[ , "x2"]:
    if a = 1 & b = 1: then return 1
    else: return 0
for df1[ , "x3"]:
    all conditions: return 1
Ideally, all the values in x1 and x2 will be changed according to their given conditions. x3 should be filled with 1 no matter what. Can anyone suggest an efficient way to loop and parse through those columns, please?
You don't need loops:
df1$x1 <- df1$a
df1$x2 <- as.integer(df1$a & df1$b)
df1$x3 <- 1
Result:
a b c d x1 x2 x3
1 0 0 0 1 0 0 1
2 1 0 0 0 1 0 1
3 0 1 0 0 0 0 1
4 1 1 0 1 1 1 1
Edit:
If columns a-d are not binary values (0 or 1) you still can use the same expressions to create columns x1-3. Let's say you have this data frame:
a b c d x1 x2 x3
1 0 0 1 5 NA NA NA
2 3 9 2 1 NA NA NA
3 4 2 3 5 NA NA NA
4 2 1 4 1 NA NA NA
And your conditions are:
x1 = 1 if (b >= 2) and (d < 4) 0 otherwise
x2 = 1 if (a > b) and (b < d) 0 otherwise
x3 = always 1
You can use the same methodology:
df1$x1 <- as.integer(df1$b >= 2 & df1$d < 4)
df1$x2 <- as.integer(df1$a > df1$b & df1$b < df1$d)
df1$x3 <- 1
Result:
a b c d x1 x2 x3
1 0 0 1 5 0 0 1
2 3 9 2 1 1 0 1
3 4 2 3 5 0 1 1
4 2 1 4 1 0 0 1
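As a self-contained illustration of the non-binary case, the three assignments can also be written in a single transform call. This is just one possible way to write it; the data frame is retyped from the example table above, without the empty x1 to x3 columns.

```r
# Example data retyped from the table above
df1 <- data.frame(
  a = c(0, 3, 4, 2),
  b = c(0, 9, 2, 1),
  c = c(1, 2, 3, 4),
  d = c(5, 1, 5, 1)
)

# Each condition is evaluated element-wise on whole columns, so no loop is needed
df1 <- transform(
  df1,
  x1 = as.integer(b >= 2 & d < 4),
  x2 = as.integer(a > b & b < d),
  x3 = 1
)
df1
```

The comparison operators and `&` are vectorised, which is why one expression per new column is enough.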

extract rows for which first non-zero element is one

I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.
Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
  no.zeroes <- x[x != 0] # remove the zeros
  to.return <- no.zeroes[1] == 1 # check whether the first non-zero number is 1
  # if a row is all 0, then to.return will be NA
  # this fixes that problem
  to.return[is.na(to.return)] <- FALSE # if row is all 0
  to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
1. Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply compares the first ([1]) non-zero (x != 0) element of x with 1. It is called once for each row; x will be a vector of length four in your example.
2. Use which to extract the indices of the candidate rows (and remove NA values, too):
desired.rows <- which(first.element.is.one)
3. Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
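A possible alternative without apply is sketched below. It uses max.col to find the position of the first non-zero entry in each row and then looks up the value at that position. The data is retyped from the question, and the names m, first_nz_col and first_nz_val are only illustrative.

```r
# Data retyped from the question
my.data <- data.frame(
  x1 = c(0, 0, 0, 2, 1, 0, 0),
  x2 = c(0, 0, 2, 1, 1, 0, 1),
  x3 = c(1, 0, 1, 2, 1, 0, 0),
  x4 = c(1, 1, 1, 1, 2, 0, 0)
)

m <- as.matrix(my.data)

# Column index of the first non-zero entry per row: the 0/1 indicator matrix
# has its maximum at every non-zero entry, and ties go to the first one
first_nz_col <- max.col((m != 0) * 1, ties.method = "first")

# Value at that position; for an all-zero row this is column 1, i.e. 0
first_nz_val <- m[cbind(seq_len(nrow(m)), first_nz_col)]

desired.result <- my.data[first_nz_val == 1, ]
desired.result
```

Because an all-zero row yields the value 0 rather than NA, no separate NA handling is needed in this variant.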
