R: Set Column Value Based on Other Columns' Values

I need to set the values of a column to 0 or 1 based on the values of other columns.
If they are 0 or NA, the new column should be 1.
I thought about:
ifelse(df[,53:62]==0|NA, df$newCol <- 1, df$newCol <- 0)
But in the end I get only 1s in the new column.
Thanks for your help

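I think the tidyverse fits this common use case perfectly:
library(tidyverse)
# Example data: 100 columns named V1 ... V100 holding alternating 0/1 values
df_example <- matrix(c(0, 1), ncol = 100, nrow = 100) %>%
  as_tibble(.name_repair = ~ paste0("V", seq_along(.x)))
# Recode columns 53:62 in place: 1 where the value is 0 or NA, otherwise 0
df_example %>%
  mutate(across(.cols = 53:62,
                .fns = ~ if_else(.x == 0 | is.na(.x), 1, 0))) %>%
  select(V54) # show one recoded column as an example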
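The answer above recodes columns 53:62 in place. If the goal is a single new column instead, a base R sketch could look like the one below. It assumes "they are 0 or NA" means every value in the selected columns of that row, and also shows the "at least one" variant; the toy data frame, the stand-in positions 2:4 (instead of 53:62), and the column names all_zero_or_na / any_zero_or_na are hypothetical.

```r
# Toy data: an id column plus three value columns (hypothetical stand-ins for 53:62)
df <- data.frame(id = 1:5,
                 v1 = c(0, 1, NA, 0, 1),
                 v2 = c(NA, 0, NA, 2, 2),
                 v3 = c(0, 0, NA, 0, 3))
cols <- 2:4

# Logical matrix: TRUE where a value is 0 or NA.
# For NA entries, is.na() is TRUE and TRUE | NA is TRUE, so NAs count as matches.
zero_or_na <- is.na(df[cols]) | df[cols] == 0

# 1 if every selected value in the row is 0 or NA, else 0
df$all_zero_or_na <- as.integer(rowSums(zero_or_na) == length(cols))

# 1 if at least one selected value in the row is 0 or NA, else 0
df$any_zero_or_na <- as.integer(rowSums(zero_or_na) > 0)
```

The assignment to df$... creates one new column per row-wise rule, which is what the question's ifelse() attempt was aiming for.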

Related

Create columns based on other columns' names in R

I need to operate on columns based on a condition on their names. In the following reproducible example, for each column whose name ends with 'x', I create a column that multiplies the respective variable by 2:
library(dplyr)
set.seed(8)
id <- seq(1,700, by = 1)
a1_x <- runif(700, 0, 10)
a1_y <- runif(700, 0, 10)
a2_x <- runif(700, 0, 10)
df <- data.frame(id, a1_x, a1_y, a2_x)
# Create variables manually: for every column that ends with x, create one column that multiplies the respective column by 2
df <- df %>%
mutate(a1_x_new = a1_x*2,
a2_x_new = a2_x*2)
Since I'm working with several columns, I need to automate this process. Does anybody know how to achieve this? Thanks in advance!
Try this:
df %>% mutate(
across(ends_with("x"), ~ .x*2, .names = "{.col}_new")
)
Thanks @RicardoVillalba for the correction.
You could use transmute() and across() to generate the new columns for the column names ending in "x". Then use rename_with() to add the "_new" suffix, and bind_cols() to attach the result to the original data frame.
library(dplyr)
df <- df %>%
transmute(across(ends_with("x"), ~ . * 2)) %>%
rename_with(., ~ paste0(.x, "_new")) %>%
bind_cols(df, .)
Result:
head(df)
id a1_x a1_y a2_x a1_x_new a2_x_new
1 1 4.662952 0.4152313 8.706219 9.325905 17.412438
2 2 2.078233 1.4834044 3.317145 4.156466 6.634290
3 3 7.996580 1.4035441 4.834126 15.993159 9.668252
4 4 6.518713 7.0844794 8.457379 13.037426 16.914759
5 5 3.215092 3.5578827 8.196574 6.430184 16.393149
6 6 7.189275 5.2277208 3.712805 14.378550 7.425611
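For comparison, the same "multiply every column ending in x by 2" step can be sketched in base R without dplyr. This is a minimal sketch on a small hypothetical data frame; grep() with ignore.case = TRUE is used to mirror ends_with(), which ignores case by default.

```r
# Small hypothetical data frame with the same column layout as the question
df <- data.frame(id = 1:3,
                 a1_x = c(1, 2, 3),
                 a1_y = c(4, 5, 6),
                 a2_x = c(7, 8, 9))

# Names ending in "x"; ignore.case = TRUE mirrors ends_with()'s default
x_cols <- grep("x$", names(df), value = TRUE, ignore.case = TRUE)

# Multiply all matching columns at once and store them as "<name>_new"
df[paste0(x_cols, "_new")] <- df[x_cols] * 2
```

Assigning to new names through df[...] appends the columns at the end, matching the order produced by the across(.names = "{.col}_new") answer.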

replace_na with tidyselect?

Suppose I have a data frame with a bunch of columns where I want to do the same NA replacement:
dd <- data.frame(x = c(NA, LETTERS[1:4]), a = rep(NA_real_, 5), b = c(1:4, NA))
For example, in the data frame above I'd like to do something like replace_na(dd, where(is.numeric), 0) to replace the NA values in columns a and b.
I could do
num_cols <- purrr::map_lgl(dd, is.numeric)
r <- as.list(setNames(rep(0, sum(num_cols)), names(dd)[num_cols]))
replace_na(dd, r)
but I'm looking for something tidier/more idiomatic/nicer ...
If we need to do the replacement dynamically with where(is.numeric), we can wrap it in across():
library(dplyr)
library(tidyr)
dd %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
Or we can specify the replace argument as a named list of column/value pairs:
replace_na(dd, list(a = 0, b = 0))
This list can be created programmatically: select the numeric columns, turn their names into name/value pairs (for example with tibble::deframe(), or, as below, with summarise() setting every column to 0), convert the result to a list, and pass it to replace_na():
library(dplyr)
library(tidyr)
dd %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), ~ 0)) %>%
  as.list() %>%        # pass a plain named list as `replace`
  replace_na(dd, .)
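Without tidyr, the same "replace NA with 0 in every numeric column" step can be sketched in base R. This is a minimal sketch using the question's dd; vapply() picks out the numeric columns and replace() swaps their NAs for 0, leaving the character column x untouched.

```r
dd <- data.frame(x = c(NA, LETTERS[1:4]), a = rep(NA_real_, 5), b = c(1:4, NA))

# Logical vector: which columns are numeric (here a and b)
num_cols <- vapply(dd, is.numeric, logical(1))

# Replace NA with 0 in each numeric column only
dd[num_cols] <- lapply(dd[num_cols], function(col) replace(col, is.na(col), 0))
```

Note that replacing with the double 0 turns the integer column b into a double column; use 0L instead if the integer type matters.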

Find the count of zero values in each row of a data frame

I am trying to count the zeros in each row and then subtract that count from 5,
e.g. in Excel: =3-COUNTIF(SM1:SM3,0)
Is there any solution for this?
df <- data.frame("T_1_1"= c(68,NA,0,105,NA,0,135,NA,24),
"T_1_2"=c(26,NA,0,73,NA,97,46,NA,0),
"T_1_3"=c(93,32,NA,103,NA,0,147,NA,139),
"S_2_1"=c(69,67,94,0,NA,136,NA,92,73),
"S_2_2"=c(87,67,NA,120,NA,122,0,NA,79),
"S_2_3"= c(150,0,NA,121,NA,78,109,NA,0),
"T_1_0"= c(79,0,0,NA,98,NA,15,NA,2)
)
df <- df %>% mutate(ltc = (5-rowSums(select(., matches('T_1[1-9]')) == 0,na.rm = TRUE)))
I believe you forgot an underscore in matches().
library(dplyr)
df %>%
  mutate(ltc = 5 - rowSums(select(., matches('T_1_[1-9]')) == 0, na.rm = TRUE))
Here is a base R option using rowSums(); note that it counts zeros across all columns, not only the T_1_* ones:
df$ltc <- 5 - rowSums(df == 0, na.rm = TRUE)
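To make the counting concrete, here is a small worked sketch on the first three rows of four of the question's columns (the subset is chosen only for illustration). It shows both the all-columns count from the base R answer and a count restricted to T_1_1, T_1_2, ... using the same pattern as the corrected matches() call.

```r
# Hypothetical subset of the question's data (first three rows, four columns)
df <- data.frame(T_1_1 = c(68, NA, 0),
                 T_1_2 = c(26, NA, 0),
                 S_2_1 = c(0, 67, 94),
                 T_1_0 = c(79, 0, 0))

# df == 0 yields a logical matrix (NA where the value is NA);
# na.rm = TRUE makes rowSums() skip those NAs when counting zeros
ltc_all <- 5 - rowSums(df == 0, na.rm = TRUE)

# Restrict the count to T_1_1, T_1_2, ... (underscore before the digit, so T_1_0 is excluded)
t_cols <- grepl("^T_1_[1-9]$", names(df))
ltc_t <- 5 - rowSums(df[t_cols] == 0, na.rm = TRUE)

ltc_all  # 4 4 2
ltc_t    # 5 5 3
```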

Combine mutate(across()) and case_when() to fill multiple columns with 0 depending on a condition

In a dplyr workflow, I am trying to put a 0 in every column after the newvar column when newvar == 0, and otherwise leave the values unchanged.
I modified the iris dataset:
library(dplyr)
n <- 150 # sample size
iris1 <- iris %>%
mutate(id = row_number(), .before = Sepal.Length) %>%
mutate(newvar = sample(c(0,1), replace=TRUE, size=n), .before = Sepal.Length ) %>%
mutate(across(.[,3:ncol(.)], ~ case_when(newvar==0 ~ 0)))
I tried a solution like the one in How to combine the across () function with mutate () and case_when () to mutate values in multiple columns according to a condition?.
My understanding:
with .[,3:ncol(.)] I go through the columns after the newvar column;
with case_when(newvar==0 I try to set the condition;
with ~ 0 after newvar==0 I try to say: put 0 if the condition is fulfilled.
I know that I am doing something wrong, but I don't know what! Thank you for your help.
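.[,3:ncol(.)] gives you the values of those columns, not the column positions; across() expects a column selection, so using 3:ncol(.) should work fine. Note also that case_when() returns NA for rows that match no condition, so you need a branch that keeps the original value when newvar is not 0.
In general, it is also better to avoid referring to columns by position and to use their names instead. You can do all of this in one mutate() call.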
library(dplyr)
n <- 150
iris %>%
  mutate(id = row_number(),
         newvar = sample(c(0, 1), replace = TRUE, size = n),
         # 0 where newvar == 0; keep the original value otherwise
         across(Sepal.Length:Petal.Width, ~ case_when(newvar == 0 ~ 0,
                                                      TRUE ~ .x)))
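In base R the same "zero out several columns where newvar == 0, else leave them alone" logic reduces to a single subset assignment. This is a sketch assuming the four numeric iris columns are the targets; the name iris1 and the fixed seed are only there to make the example self-contained and reproducible.

```r
set.seed(1)  # fixed seed so the random newvar is reproducible

iris1 <- iris
iris1$newvar <- sample(c(0, 1), size = nrow(iris1), replace = TRUE)

# Columns to overwrite, chosen by name rather than position
num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Rows where newvar == 0 get 0 in every selected column; other rows are untouched
iris1[iris1$newvar == 0, num_cols] <- 0
```

Because only the matching rows are assigned, there is no "else" branch to forget, which is the NA pitfall of case_when() without a fallback.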

Unexpected behavior with case_when and is.na

I want to change all NA values in a column to 0 and all other values to 1. However, I can't get the combination of case_when and is.na to work.
# Create dataframe
a <- c(rep(NA,9), 2, rep(NA, 10))
b <- c(rep(NA,9), "test", rep(NA, 10))
df <- data.frame(a,b, stringsAsFactors = F)
# Create new column (c), where all NA values in (a) are transformed to 0 and other values are transformed to 1
df <- df %>%
mutate(
c = case_when(
a == is.na(.$a) ~ 0,
FALSE ~ 1
)
)
I expect column (c) to contain all 0s and a single 1, but it's all 0s.
It does work when I use an if_else() statement with is.na(), like:
df <- df %>%
  mutate(
    c = if_else(is.na(a), 0, 1)
  )
What is going on here?
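is.na(a) already returns the logical condition, so use it directly; a == is.na(a) instead compares the values of a with TRUE/FALSE. The catch-all branch also needs TRUE, because a FALSE condition never matches. You should be doing this instead: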
df %>%
mutate(
c = case_when(
is.na(a) ~ 0,
TRUE ~ 1
)
)
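To see why the original condition never selects anything, and how the same 0/1 column can be built without case_when(), here is a small base R sketch on the question's column a (the intermediate name cond is just for illustration).

```r
df <- data.frame(a = c(rep(NA, 9), 2, rep(NA, 10)))

# What the question's condition actually computes: a is compared with TRUE/FALSE
# (coerced to 1/0), so NA entries give NA and the single 2 gives 2 == 0, i.e. FALSE
cond <- df$a == is.na(df$a)

# is.na() is already the test; negate it and convert the logical to 0/1
df$c <- as.integer(!is.na(df$a))
```

Since cond is never TRUE, the first case_when() branch cannot fire, and FALSE ~ 1 cannot fire either, so the question's version never produces the expected 1.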
