I have a dataframe in R with 4 variables and would like to create a new variable based on any 2 conditions being true on those variables.
I attempted to create it with if/else statements, but that would require a permutation of every variable condition being true. I also need it to scale so that I can create a new variable based on any 3 conditions being true. Is there a more efficient method than using if/else statements?
My example:
I have a dataframe X with following column variables
x1 = c(1,0,1,0)
X2 = c(0,0,0,0)
X3 = c(1,1,0,0)
X4 = c(0,0,1,0)
I would like to create a new variable X5 that is 1 if any 2 of the variables are true (e.g. == 1).
For the dataframe above, the new variable would be X5 = (1,0,1,0).
This can easily be done by using the apply function:
x1 = c(1,0,1,0)
x2 = c(0,0,0,0)
x3 = c(1,1,0,0)
x4 = c(0,0,1,0)
df <- data.frame(x1,x2,x3,x4)
df$x5 <- apply(df,1,function(row) ifelse(sum(row != 0) == 2, 1, 0))
x1 x2 x3 x4 x5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
The second argument of apply, 1, means the function is applied to every row. To scale this up to 3...N true values, just change the number in the ifelse statement.
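The same idea can be wrapped in a small helper when the threshold needs to vary. This is a minimal sketch, not part of the original answer: the function name `flag_n_true` and its `at_least` argument are made up for illustration, and it assumes every column of the data frame is a 0/1 indicator.

```r
# Hypothetical helper: flag rows where `n` of the 0/1 columns are non-zero.
# at_least = FALSE -> exactly n columns are non-zero
# at_least = TRUE  -> n or more columns are non-zero
flag_n_true <- function(data, n, at_least = FALSE) {
  hits <- apply(data, 1, function(row) sum(row != 0))
  if (at_least) as.numeric(hits >= n) else as.numeric(hits == n)
}

df <- data.frame(x1 = c(1, 0, 1, 0),
                 x2 = c(0, 0, 0, 0),
                 x3 = c(1, 1, 0, 0),
                 x4 = c(0, 0, 1, 0))

x5_two   <- flag_n_true(df, 2)                   # exactly two true
x5_three <- flag_n_true(df, 3, at_least = TRUE)  # three or more true
```

Whether "any 2 conditions" should mean exactly two or at least two is not stated in the question; for the example data both readings give the same result.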
You can try this:
#Data
df <- data.frame(x1,X2,X3,X4)
#Code
df$X5 <- ifelse(rowSums(df,na.rm=TRUE)==2,1,0)
x1 X2 X3 X4 X5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
You can use:
df$X5 <- 1*(apply(df == 1, 1, sum) == 2)
or
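# mapply(sum, df) would sum each column, so pass the columns as parallel arguments to get row sums
df$X5 <- 1*(do.call(mapply, c(list(FUN = sum), df)) == 2)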
Output
> df
X1 X2 X3 X4 X5
1 0 1 0 1
0 0 1 0 0
1 0 0 1 1
0 0 0 0 0
Data
df <- data.frame(X1,X2,X3,X4)
I have three variables with the scale (0,1,2), for example:
x1 x2 x3
1 0 1
NA NA 0
1 1 1
NA NA NA
0 0 0
I want to create another variable x4: if x1 and/or x2 and/or x3 has 1, then x4 has to be 1. Sample values for x4 are below:
x1 x2 x3 x4
1 0 1 1
NA NA 0 0
1 1 1 1
NA NA NA NA
0 0 0 0
I am using RStudio. I used the ifelse function, but I didn't get what I wanted.
Can anyone please suggest other ways to create this variable?
I used the following code:
data$hope <- ifelse(data$x1 > 0 && data$x2 > 0 && data$x3 > 0,1,0)
We could use pmax if there are only binary columns in the dataset
df1$x4 <- do.call(pmax, c(df1, na.rm = TRUE))
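To see what the pmax call does, here is a small sketch on the sample values from the question. The data frame `df1` is reconstructed from the table above and the column names are taken from it; nothing else is assumed.

```r
# Example data reconstructed from the question (NA = missing)
df1 <- data.frame(x1 = c(1, NA, 1, NA, 0),
                  x2 = c(0, NA, 1, NA, 0),
                  x3 = c(1,  0, 1, NA, 0))

# Row-wise maximum across the columns.
# With na.rm = TRUE, a row is NA only when every value in it is NA.
df1$x4 <- do.call(pmax, c(df1, na.rm = TRUE))
df1$x4
```

Note that `&&` in the original attempt is not vectorised: it looks at a single value rather than each row (and recent R versions raise an error for longer inputs), which is one reason the ifelse call did not behave as hoped.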
I have a data set with lots of binary variables, all with values 0/1. I want to create a new column that counts, for each observation, how many of the binary variables equal 1: one if a single variable is 1, two if two variables are 1, and so on.
Such as:
x1 x2 x3 x4 x5
1 1 1 0 1
0 0 1 0 0
0 0 0 0 0
I want to have
x1 x2 x3 x4 x5 count
1 1 1 0 1 4
0 0 1 0 0 1
0 0 0 0 0 0
If your dataset contains only the binary variables you are interested in, you can use
df$count <- rowSums(df)
Otherwise, please provide a more detailed description of your data.
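If the data frame also holds non-binary columns, one option is to restrict the sum to the binary ones. The sketch below is only an illustration under assumptions: the `id` column and the `bin_cols` vector are invented here, since the question does not describe any other columns.

```r
# Example data from the question plus a hypothetical non-binary `id` column
df <- data.frame(id = c("a", "b", "c"),
                 x1 = c(1, 0, 0),
                 x2 = c(1, 0, 0),
                 x3 = c(1, 1, 0),
                 x4 = c(0, 0, 0),
                 x5 = c(1, 0, 0))

# Assumed: the names of the binary columns are known in advance
bin_cols <- c("x1", "x2", "x3", "x4", "x5")

# Count the 1s per row, ignoring missing values
df$count <- rowSums(df[bin_cols] == 1, na.rm = TRUE)
```

Comparing with `== 1` before summing means stray values other than 0/1 are not added into the count.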
Another option is Reduce with +
df$count <- Reduce(`+`, df)
I am facing a challenge that I cannot manage to solve. I have a list of observations x_i (the dimension is large, around 30k) and a list of observations y_j (also large). x_i and y_j are ids of the same kind of unit (say, firms).
I have a dataframe of two columns that links x_i and y_j: if they appear on the same line, it means that they are connected. What I would like is to convert this network into a large matrix M of size (unique(union(x, y))) and which takes the value 1 if the two firms are connected.
Here is an example in small dimensions:
x1 x2
x3 x6
x4 x5
x1 x5
What I would like is a matrix:
0 1 0 0 1 0
0 0 0 0 0 0
0 0 0 0 0 1
0 0 0 0 1 0
0 0 0 0 0 0
0 0 0 0 0 0
Right now, the only solution I could think of is a double loop combined with a search in the initial dataframe:
list_firm = union(as.vector(df[1]), as.vector(df[2]))
list_firm <- sort(list_firm[[1]])
list_firm <- unique(list_firm)
M <- Matrix(nrow = length(list_firm), ncol = length(list_firm))
for (i in list_firm) {
for (j in list_firm) {
M[i, j] = !is.null(which(df$col1 == i & df$col2 == j))
}
}
Where df is the two columns data frame. This is obviously much too long to run.
Any suggestion? This would be very welcome
We convert the columns to factor with levels specified as the unique elements of both columns and get the frequency with table
lvls <- sort(unique(unlist(df)))
df[] <- lapply(df, factor, levels = lvls)
table(df)
# col2
#col1 x1 x2 x3 x4 x5 x6
# x1 0 1 0 0 1 0
# x2 0 0 0 0 0 0
# x3 0 0 0 0 0 1
# x4 0 0 0 0 1 0
# x5 0 0 0 0 0 0
# x6 0 0 0 0 0 0
data
df <- structure(list(col1 = c("x1", "x3", "x4", "x1"), col2 = c("x2",
"x6", "x5", "x5")), class = "data.frame", row.names = c(NA, -4L
))
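With around 30k distinct ids, a dense 30k x 30k table may be heavy on memory. As a swapped-in alternative to table (not what the answer above uses), the same matrix can be sketched as a sparse matrix with the Matrix package. The variable names `lvls` and `M` are just for illustration.

```r
library(Matrix)  # recommended package, normally installed alongside R

df <- data.frame(col1 = c("x1", "x3", "x4", "x1"),
                 col2 = c("x2", "x6", "x5", "x5"),
                 stringsAsFactors = FALSE)

# All ids that appear in either column, in sorted order
lvls <- sort(unique(unlist(df)))

# One entry per row of df: row index from col1, column index from col2.
# Duplicate (col1, col2) pairs would be summed, giving values above 1.
M <- sparseMatrix(i = match(df$col1, lvls),
                  j = match(df$col2, lvls),
                  x = rep(1, nrow(df)),
                  dims = c(length(lvls), length(lvls)),
                  dimnames = list(lvls, lvls))
```

Only the non-zero entries are stored, so the size grows with the number of links rather than with the square of the number of firms.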
The answer provided by @akrun in the comments is a good one. However, this is a good scenario for a data structure other than a data frame. Basically, what you're looking for is an adjacency matrix, a data structure commonly used in social network analysis. To build it, we can use the igraph package in R.
library(igraph)
df = data.frame(source=c('x1', 'x3', 'x4', 'x1'), target=c('x2', 'x6', 'x5', 'x5'))
# directed=FALSE makes the links symmetric, so each edge appears in both triangles
g = graph_from_data_frame(df, directed=FALSE)
output = as.matrix(as_adjacency_matrix(g))
x1 x3 x4 x2 x6 x5
x1 0 0 0 1 0 1
x3 0 0 0 0 1 0
x4 0 0 0 0 0 1
x2 1 0 0 0 0 0
x6 0 1 0 0 0 0
x5 1 0 1 0 0 0
The output columns aren't in the exact order as your example, but this is a trivial problem to solve if needed.
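Reordering can be sketched as follows. To keep the example self-contained, `output` here is a hand-built stand-in with the same row and column names as the igraph result above; the only assumption is that the real `output` is a square matrix with matching dimnames.

```r
# Stand-in for `output`: same names and links as the result shown above
nodes <- c("x1", "x3", "x4", "x2", "x6", "x5")
output <- matrix(0, nrow = 6, ncol = 6, dimnames = list(nodes, nodes))
edges <- cbind(c("x1", "x3", "x4", "x1"),
               c("x2", "x6", "x5", "x5"))
output[edges] <- 1            # fill source -> target
output[edges[, 2:1]] <- 1     # and target -> source (undirected)

# Reorder rows and columns by name
ord <- sort(rownames(output))
output_sorted <- output[ord, ord]
```

`sort` orders names as text, so ids such as "x10" would land before "x2"; a custom ordering vector can be used instead of `ord` in that case.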
I have a dataframe:
> df <- data.frame(x = c('x1','x1','x2','x2','x2','x3','x3','x3'),
+ y = c(0,0,1,1,1,0,0,0),
+ z = c(1,1,0,0,0,0,0,0))
> df
x y z
1 x1 0 1
2 x1 0 1
3 x2 1 0
4 x2 1 0
5 x2 1 0
6 x3 0 0
7 x3 0 0
8 x3 0 0
I would like to create a subset of the rows where the y column equals 1, keep the corresponding values of the x column, and set those 1s to 0.
I have only worked out the first step, finding the matching rows:
> length(which(df$y == 1))
[1] 3
How could I get a final output like this:
x y
x2 0
x2 0
x2 0
library(dplyr)
df %>%
filter(y == 1) %>%
select(x, y) %>%
mutate(y = 0)
transform(subset(df[1:2],y==1),y=0)
x y
3 x2 0
4 x2 0
5 x2 0
If you're open to using other packages, data.table is another option:
library(data.table)
setDT(df)[y == 1, .(x, y = 0)]
# x y
#1: x2 0
#2: x2 0
#3: x2 0
I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.
Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
no.zeroes <- x[x!=0] # removing 0
to.return <- no.zeroes[1] == 1 # finding if first non-zero number is 1
# if a row is all 0, then to.return will be NA
# this fixes that problem
to.return[is.na(to.return)] <- FALSE # if row is all 0
to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
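For larger data, a vectorised variant avoids calling a function per row. This is an alternative sketch using max.col rather than the apply approach of the answers above; the names `m`, `first.nz` and `keep` are chosen here for illustration.

```r
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)

m <- as.matrix(my.data)

# Column index of the first non-zero entry in each row.
# An all-zero row has only ties, so it gets column 1, which holds a 0.
first.nz <- max.col(m != 0, ties.method = "first")

# Look up the value at (row, first non-zero column) and test it against 1
keep <- m[cbind(seq_len(nrow(m)), first.nz)] == 1
result <- my.data[keep, ]
```

Because an all-zero row points at a 0, it fails the `== 1` test on its own and no separate NA handling is needed.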
1. Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply takes the first ([1]) non-zero (x != 0) element of x and compares it to 1. It is called once for each row; in your example, x will be a vector of length four.
2. Use which to extract the indices of the candidate rows (and remove NA values, too):
desired.rows <- which(first.element.is.one)
3. Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
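A hint for the bonus question can be worked out in a few lines. This sketch only uses base R and an all-zero vector like the sixth row of my.data; the variable names are illustrative.

```r
all.zero <- c(0, 0, 0, 0)

nz    <- all.zero[all.zero != 0]  # nothing is non-zero: a length-0 vector
first <- nz[1]                    # indexing past the end of a vector gives NA
flag  <- first == 1               # comparing NA with anything gives NA

# which() returns the positions of TRUE values only, so NA entries drop out
idx <- which(c(TRUE, NA, FALSE, TRUE))
```

So the NA values arise from rows without any non-zero element, and which silently discards them.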