Collecting same answer from different questions in one variable? - r

I am completely new to R, but running out of time here.
In my dataset, I have people from several countries answering who they voted for last. People from different countries got different questions, so in each column, only the ones from the country have an answer, the rest is NA.
I am trying to collect everyone who voted for a green party in one variable. So far I have succeeded in coding it into a separate dummy variable for each country using ifelse, but I can't seem to merge these variables. So now I have, e.g., a variable for Germany, where a green vote in the German election is 1, and everyone else is 0. The same goes for France etc.
But how can I collect all this information in just one variable?
Appreciate your help.

Assuming your data set looks like this...
> ctry <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
> vote_ctry_1 <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
> vote_ctry_2 <- c(0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0)
> vote_ctry_3 <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
>
> dd <- data.frame(ctry, vote_ctry_1, vote_ctry_2, vote_ctry_3)
> dd
ctry vote_ctry_1 vote_ctry_2 vote_ctry_3
1 1 1 0 0
2 1 0 0 0
3 1 0 0 0
4 1 1 0 0
5 2 0 1 0
6 2 0 1 0
7 2 0 0 0
8 2 0 1 0
9 3 0 0 1
10 3 0 0 0
11 3 0 0 0
12 3 0 0 0
... then just add up the dummy variables:
> dd$vote_all <- vote_ctry_1 + vote_ctry_2 + vote_ctry_3
> dd
ctry vote_ctry_1 vote_ctry_2 vote_ctry_3 vote_all
1 1 1 0 0 1
2 1 0 0 0 0
3 1 0 0 0 0
4 1 1 0 0 1
5 2 0 1 0 1
6 2 0 1 0 1
7 2 0 0 0 0
8 2 0 1 0 1
9 3 0 0 1 1
10 3 0 0 0 0
11 3 0 0 0 0
12 3 0 0 0 0
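The question mentions that respondents from other countries are NA rather than 0. If the dummy variables still contain NA, adding them with + would give NA for every row. A minimal sketch of one way around that, assuming the dummy columns share a common prefix (the vote_ctry_ names and the small data frame below are only illustrative):

```r
# Illustrative data: each respondent has an answer only in their own country's column
dd <- data.frame(
  ctry        = c(1, 1, 2, 2),
  vote_ctry_1 = c(1, 0, NA, NA),
  vote_ctry_2 = c(NA, NA, 1, 0)
)

# Pick the dummy columns by name pattern instead of typing each one
vote_cols <- grep("^vote_ctry_", names(dd), value = TRUE)

# rowSums() with na.rm = TRUE ignores the NA cells; > 0 turns the sum back into a 0/1 dummy
dd$vote_all <- as.integer(rowSums(dd[vote_cols], na.rm = TRUE) > 0)
dd
```

Selecting the columns by pattern means the same line keeps working when more countries are added.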

Related

Conditionally replace all values based on value of 1 column in R

Background
The data set is given below for reproducibility
data <- structure(list(rest1 = c(1, 1, 0, 1, 1, 1, 0, 1, 0, 1),
rest2 = c(1, 0, 1, 0, 0, 1, 1, 0, 0, 0),
rest3 = c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0),
rest4 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0),
rest5 = c(1, 1, 0, 0, 0, 1, 0, 1, 0, 1),
rest6 = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
The output is given below:
A tibble: 10 x 6
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 1 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 1 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
My question
Changes need to be made based on the values of column rest6. Where rest6 is equal to 1, the other variables rest1-rest5 need to be changed to 0. Here, rows 3 and 7 need to be fixed.
The desired output is below:
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
Previous Attempts
I have attempted this with my basic knowledge of R. My logic is: if rest6 is equal to 1 and the observation is equal to 1, set it to 0; otherwise return the original value. However, this has not worked, and I am not yet as proficient in R as I would like.
data <- ifelse(data$rest6 == 1 & data[,c(2:5) == 1],
0,
data[,c(2:6)])
In another attempt, I tried to use a function() to identify where to place the values.
Thank you for your help.
A simple base R solution may be to isolate all those in which rest6 == 1 and change all values in the relevant columns to 0:
data[data$rest6 %in% 1, 1:5] <- 0
Output:
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
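A note on why %in% 1 may be preferable to == 1 here: if rest6 contained missing values, == would return NA for those rows, while %in% returns FALSE. A small sketch on a made-up plain data frame (the three-row df below is hypothetical, not the data from the question):

```r
# Hypothetical data with a missing value in rest6
df <- data.frame(rest1 = c(1, 1, 1),
                 rest2 = c(0, 1, 1),
                 rest6 = c(1, NA, 0))

df$rest6 == 1    # TRUE NA FALSE  (NA propagates through the comparison)
df$rest6 %in% 1  # TRUE FALSE FALSE  (NA is treated as "no match")

# Selecting columns by name avoids relying on their positions
df[df$rest6 %in% 1, c("rest1", "rest2")] <- 0
df
```

Only the first row is zeroed; the row with the missing rest6 is left unchanged.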
In the tidyverse, a simple solution would be to loop across columns rest1 to rest5 and use case_when to replace the values that correspond to 1 in rest6 with 0:
library(dplyr)
data <- data %>%
mutate(across(rest1:rest5,
~ case_when(rest6 == 1 ~ 0, TRUE ~ .x)))
-output
data
# A tibble: 10 × 6
rest1 rest2 rest3 rest4 rest5 rest6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 0
2 1 0 0 0 1 0
3 0 0 0 0 0 1
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 1 1 1 1 1 0
7 0 0 0 0 0 1
8 1 0 1 0 1 0
9 0 0 0 0 0 1
10 1 0 0 0 1 0
data.table solution
library(data.table)
setDT(data)
data[rest6 == 1, 1:5 := 0]

R - Function that dichotomizes certain columns of a data frame based on different thresholds

I am trying to create a function that dichotomizes certain defined columns of a data frame based on different values depending on the column.
For example, in the following data frame with conditions A, B, C and D:
A <- c(0, 2, 1, 0, 2, 1, 0, 0, 1, 2)
B <- c(0, 1, 1, 1, 0, 0, 0, 1, 1, 0)
C <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
D <- c(0, 0, 3, 1, 2, 1, 4, 0, 3, 0)
Data <- data.frame(A, B, C, D)
I would like the function to dichotomize the conditions that I select [e.g. A, B, D] and dichotomize them based on thresholds that I assign [e.g. 2 for A, 1 for B, 3 for D].
I would like the dichotomized columns to be added to the data frame with different names [e.g. A_dicho, B_dicho, D_dicho].
The final data frame should look like this (you will notice B is already dichotomized, which is fine, it should just be treated equally and added):
A B C D A_dicho B_dicho D_dicho
1 0 0 0 0 0 0 0
2 2 1 0 0 1 1 0
3 1 1 0 3 0 1 1
4 0 1 1 1 0 1 0
5 2 0 1 2 1 0 0
6 1 0 1 1 0 0 0
7 0 0 1 4 0 0 1
8 0 1 1 0 0 1 0
9 1 1 1 3 0 1 1
10 2 0 1 0 1 0 0
Could someone help me? Many thanks in advance.
Make a little threshold vector specifying the values, then Map it to the columns:
thresh <- c("A"=2, "B"=1, "D"=3)
Data[paste(names(thresh), "dicho", sep="_")] <- Map(
\(d,th) as.integer(d >= th), Data[names(thresh)], thresh
)
Data
## A B C D A_dicho B_dicho D_dicho
##1 0 0 0 0 0 0 0
##2 2 1 0 0 1 1 0
##3 1 1 0 3 0 1 1
##4 0 1 1 1 0 1 0
##5 2 0 1 2 1 0 0
##6 1 0 1 1 0 0 0
##7 0 0 1 4 0 0 1
##8 0 1 1 0 0 1 0
##9 1 1 1 3 0 1 1
##10 2 0 1 0 1 0 0
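Since the question asks for a function, the Map call above could be wrapped as follows. This is only a sketch: the function name dichotomize and the suffix argument are made up for illustration, and values at or above the threshold are coded 1, as in the answer.

```r
# Hypothetical wrapper around the Map approach
dichotomize <- function(data, thresh, suffix = "_dicho") {
  # thresh is a named numeric vector: names are columns, values are thresholds
  stopifnot(all(names(thresh) %in% names(data)))
  new_names <- paste0(names(thresh), suffix)
  data[new_names] <- Map(function(col, th) as.integer(col >= th),
                         data[names(thresh)], thresh)
  data
}

# Small illustrative data frame (first three rows of A, B and D from the question)
Data <- data.frame(A = c(0, 2, 1), B = c(0, 1, 1), D = c(0, 0, 3))
out <- dichotomize(Data, c(A = 2, B = 1, D = 3))
out
```

Columns that are not named in thresh are left untouched, so C from the question would simply be carried along.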

How to apply a function to several columns listed in a vector in a function

Within a function, I am trying to add a column to a data frame that holds the row-wise minimum of several other columns, whose names are passed as an argument to the function.
A minimal data set would be:
C1 <- c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0)
C2 <- c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0)
C3 <- c(0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1)
C4 <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
Data <- data.frame(C1, C2, C3, C4)
If I want the minimum from C1, C2, and C4, outside a function, I would call:
Data$Min <- pmin(Data$C1, Data$C2, Data$C4)
Inside a function, however, I struggle and was only able to produce this:
min.col <- function(data, conditions){
data$Min <- pmin(data[[conditions]]) # [[ ]] is the wrong way to refer to the conditions, but I do not find how to
# After that, I go on here with my function based on the column data$Min but it is not relevant for the present problem.
}
To be called by:
min.col(Data, conditions=c("C1", "C2", "C4"))
Anyone there to help? Many thanks in advance!
These use only base R.
1) We can use do.call("pmin", ...) like this.
f <- function(data, cols) transform(data, min = do.call("pmin", data[cols]))
f(Data, c("C1", "C2", "C4"))
giving:
C1 C2 C3 C4 min
1 1 0 0 0 0
2 0 1 1 0 0
3 1 1 0 0 0
4 1 1 0 1 1
5 0 0 0 0 0
6 0 1 0 0 0
7 1 0 1 0 0
8 1 0 1 1 0
9 0 0 1 0 0
10 0 1 0 0 0
11 0 1 1 0 0
12 1 1 1 1 1
13 1 0 0 0 0
14 0 1 0 0 0
15 0 0 1 0 0
2) or use apply
f2 <- function(data, cols) transform(data, min = apply(data[cols], 1, min))
f2(Data, c("C1", "C2", "C4"))
3) or Reduce
f3 <- function(data, cols) transform(data, min = Reduce(pmin, data[cols]))
f3(Data, c("C1", "C2", "C4"))
4) If data[cols] contains only 0s and 1s, the row minimum is 1 exactly when the row contains no 0s, and 0 otherwise. Since 0 is regarded as FALSE and any other number as TRUE when coerced to logical, !data[cols] marks the zeros, rowSums counts them per row, and +! turns "no zeros" into 1:
f4 <- function(data, cols) transform(data, min = +!rowSums(!data[cols]))
f4(Data, c("C1", "C2", "C4"))
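If the selected columns may contain missing values, the do.call approach from 1) can pass na.rm through to pmin. A minimal sketch in the shape of the function from the question; the name min_cols and the small data frame are assumptions for illustration:

```r
# Hypothetical variant of approach 1) that adds the column and returns the data frame
min_cols <- function(data, cols, na.rm = FALSE) {
  # Build the argument list for pmin(): the selected columns plus the na.rm flag
  args <- c(unname(as.list(data[cols])), list(na.rm = na.rm))
  data$Min <- do.call(pmin, args)
  data
}

# Illustrative data with one missing value
Data <- data.frame(C1 = c(1, 0, NA), C2 = c(1, 1, 1), C4 = c(0, 1, 1))
res_keep <- min_cols(Data, c("C1", "C2", "C4"))                # NA propagates
res_drop <- min_cols(Data, c("C1", "C2", "C4"), na.rm = TRUE)  # NA ignored
res_drop
```

Because the function returns the whole data frame, further steps based on the Min column can follow inside it before the final return.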
We can do this pretty quickly using some of the functions from the tidyverse packages. The key here is not to use quotation marks, wrap the columns in vars, and then use the triple bang !!! to separate and evaluate in the function.
library(tidyverse)
min.col <- function(data, conditions){
data %>%
mutate(Min = pmin(!!!conditions))
}
min.col(Data, vars(C1, C2))
#> C1 C2 C3 C4 Min
#> 1 1 0 0 0 0
#> 2 0 1 1 0 0
#> 3 1 1 0 0 1
#> 4 1 1 0 1 1
#> 5 0 0 0 0 0
#> 6 0 1 0 0 0
#> 7 1 0 1 0 0
#> 8 1 0 1 1 0
#> 9 0 0 1 0 0
#> 10 0 1 0 0 0
#> 11 0 1 1 0 0
#> 12 1 1 1 1 1
#> 13 1 0 0 0 0
#> 14 0 1 0 0 0
#> 15 0 0 1 0 0

Is there a way to create new var containing conditional value in R?

I am a relatively new user to R and have been struggling with this issue.
Suppose I have the following df with 5 variables a:e
year <- c(1990:1994)
a <- c(1, 0, 0, 0, 0)
b <- c(0, 1, 0, 0, 0)
c <- c(0, 0, 5, 1, 0)
d <- c(0, 0, 0, 1, 0)
e <- c(0, 2, 0, 0, 1)
df <- data.frame(year, a, b, c, d, e)
Then, how do I create a new variable "f" that contains, for each year, the number of variables a:e with a value > 0?
Any help would be much appreciated!
Edited: the desired output is column f below
year a b c d e f
1990 1 0 0 0 0 1
1991 0 1 0 0 2 2
1992 0 0 5 0 0 1
1993 0 0 1 1 0 2
1994 0 0 0 0 1 1
Use rowSums to count how many values are > 0, excluding the first column.
df$f = rowSums(df[-1] > 0)
df
# year a b c d e f
# 1 1990 1 0 0 0 0 1
# 2 1991 0 1 0 0 2 2
# 3 1992 0 0 5 0 0 1
# 4 1993 0 0 1 1 0 2
# 5 1994 0 0 0 0 1 1
We can use apply with sum
df$f <- apply(df[-1] > 0, 1, sum)
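To see why this works, it can help to look at the intermediate step: df[-1] > 0 is a logical matrix, and rowSums counts the TRUE values in each row. A small sketch on made-up data, selecting the columns by name rather than by position (an assumption about what is more robust if the column order changes):

```r
# Illustrative data, smaller than the one in the question
df <- data.frame(year = 1990:1992, a = c(1, 0, 0), e = c(0, 2, 0))

# Every column except year
value_cols <- setdiff(names(df), "year")

# Logical matrix: TRUE where a value is greater than 0
positive <- df[value_cols] > 0
positive

# TRUE counts as 1 and FALSE as 0, so the row sums are the counts
df$f <- rowSums(positive)
df
```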

How to create a 0-1 combination of n arrays with specific condition in Julia

I am setting up a huge 0-1 combination matrix for n arrays in Julia. However, I do not want all combinations, as that creates a memory usage problem. I only want the legal combinations that satisfy a specific condition: if column I and column J are both 1, the row should not be included.
I have tried some codes in https://discourse.julialang.org/t/cleanest-way-to-generate-all-combinations-of-n-arrays/20127/6 and then delete the unwanted row, but this failed when it comes to 2^34 combinations.
Let's say we have n=6, which results in 64 0-1 combinations in total,
and I want to exclude the combinations in which elements 1 and 4 are both 1, or elements 2 and 5 are both 1, or elements 3 and 6 are both 1. The matrix should contain 28 instead of 64 rows, like:
0 0 0 0 0 1
0 0 0 0 1 0
0 0 0 0 1 1
0 0 0 1 0 0
0 0 0 1 0 1
0 0 0 1 1 0
0 0 0 1 1 1
0 0 1 0 0 0
0 0 1 0 1 0
0 0 1 1 0 0
0 0 1 1 1 0
0 1 0 0 0 0
0 1 0 0 0 1
0 1 0 1 0 0
0 1 0 1 0 1
0 1 1 0 0 0
0 1 1 0 0 1
0 1 1 1 0 0
1 0 0 0 0 0
1 0 0 0 0 1
1 0 0 0 1 0
1 0 0 0 1 1
1 0 1 0 0 0
1 0 1 0 1 0
1 1 0 0 0 0
1 1 0 0 0 1
1 1 1 0 0 0
0 0 0 0 0 0
Why do you need to materialize that entire array? It's much better to create each combination on the fly when you need it, or create an iterator that gives you the permissible rows one at a time. In the discourse post you link, Stefan describes this too https://discourse.julialang.org/t/cleanest-way-to-generate-all-combinations-of-n-arrays/20127/17 , and as he also says, it's hard to give more advice without knowing what you'll use it for.
You can make an iterator that gives you mostly what you want by
iter = (x for x in Iterators.product(0:1, 0:1, 0:1, 0:1, 0:1, 0:1) if max(x[2] + x[5],x[1] + x[4], x[3] + x[6]) != 2)
You can iterate over iter in a for loop or whatever you need it for:
collect(iter)
27-element Array{NTuple{6,Int64},1}:
(0, 0, 0, 0, 0, 0)
(1, 0, 0, 0, 0, 0)
(0, 1, 0, 0, 0, 0)
(1, 1, 0, 0, 0, 0)
(0, 0, 1, 0, 0, 0)
⋮
(0, 0, 0, 0, 1, 1)
(1, 0, 0, 0, 1, 1)
(0, 0, 0, 1, 1, 1)

Resources