Mutating a variable based on whether other multiple columns are all NA - r

I am trying to create a column and set it to 1 when all of a particular set of columns (with a similar name pattern) are NA.
This is what I have tried so far and doesn't seem to work.
Any help would be appreciated thanks!
mutate(
column_to_create =
case_when(
is.na(vars(matches('pattern'))) ~ as.character(1)
)
)

You can try -
library(dplyr)
df <- df %>%
mutate(column_to_create = as.integer(rowSums(!is.na(select(.,
matches('pattern')))) == 0))
This should give 1 when every column whose name matches 'pattern' is NA in that row, and 0 otherwise.
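To see what the rowSums() approach does, here is a small sketch on made-up data. The data frame, the column names and the pattern "score" are all assumptions for illustration, not taken from the question.

```r
library(dplyr)

# Hypothetical data: two columns match the pattern "score", "id" does not
df <- data.frame(
  id      = 1:3,
  score_a = c(NA, 1, NA),
  score_b = c(NA, NA, 2)
)

result <- df %>%
  # count the non-NA values per row across the matching columns;
  # a count of 0 means every matching column is NA in that row
  mutate(all_missing = as.integer(rowSums(!is.na(select(., matches("score")))) == 0))

result$all_missing
```

Only the first row has NA in both matching columns, so only that row gets 1.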

Related

R how to add another column to a dataset based on 2 other columns

I have a data set of messages exchanged in an organization. I want to create another column with case_when: if sender_department == receiver_department, assign "intra"; if sender_department != receiver_department, assign "inter".
I'm doing this to find the proportion of inter- and intra-departmental messages over the period.
I've used the code below
intra_inter_msg <- DF %>%
mutate(inter_intra = case_when(sender_department == receiver_department, ~"intra", ,
sender_department != receiver_department, ~"inter"))
and I got this error
Error in mutate():
! Problem while computing inter_intra = case_when(...).
Caused by error in case_when():
! Case 1 (sender_department == receiver_department) must be a two-sided formula, not a logical
I made a little example DF to test it:
require(dplyr)
DF = data.frame (sender_department = c("econ","math","history"),receiver_department = c("econ","history","math"))
DF
intra_inter_msg <- DF %>%
mutate(inter_intra = case_when(sender_department == receiver_department ~"intra",
sender_department != receiver_department ~"inter"))
intra_inter_msg
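The difference from the failing code is that each case has to be a single two-sided formula, condition ~ value, with no comma between the condition and the ~. Since the stated goal was the proportion of each type, a possible next step could look like the sketch below. It reuses the toy DF from above; the names msg_share and proportion are made up for this example, and if_else() is swapped in for case_when() because there are only two outcomes.

```r
library(dplyr)

# Same toy data as in the example above
DF <- data.frame(
  sender_department   = c("econ", "math", "history"),
  receiver_department = c("econ", "history", "math")
)

msg_share <- DF %>%
  # two outcomes only, so if_else() is enough here
  mutate(inter_intra = if_else(sender_department == receiver_department,
                               "intra", "inter")) %>%
  count(inter_intra) %>%                # one row per category, with column n
  mutate(proportion = n / sum(n))       # share of all messages

msg_share
```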

NA values are not recognized properly using dplyr

I have a dataset with two columns, one of which has missing values.
I load it using
data <- read_excel("file.xlsx") %>%
select("ID", "Value")
The tibble looks like this:
ID    Value
1     2
NA    4
32    1
The NAs are recognized as such.
However, I use
data["ID"=="NA"] <- NA
to ensure that this is not the problem (R: is.na() does not pick up NA value).
When I try to filter:
data %>%
filter(!is.na(ID))
the whole tibble stays the same, and no row is deleted.
So I try
data %>%
mutate(
isna <- is.na(ID)
)
and all isna are FALSE.
Why doesn't dplyr recognize the NAs?
I am grateful for any help!
data["ID"=="NA"] <- NA
does nothing. The condition "ID"=="NA" is always FALSE, since you are comparing two unequal string literals ("ID" and "NA"). To fix it, use e.g.
data[data$ID == "NA", "ID"] <- NA
Welcome to SO! Use this to turn the "NA" strings into real NAs and then drop those rows:
data <- data %>%
mutate(ID = ifelse(ID == "NA",NA,ID)) %>%
filter(!is.na(ID))
Why not directly
data %>%
filter(ID != "NA")
or
subset(data, ID != "NA")
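A small base R sketch of the underlying issue, on made-up data where ID is a character column holding the literal string "NA" (an assumption about what read_excel returned here):

```r
# Hypothetical data: ID is character and contains the text "NA", not a missing value
data <- data.frame(ID = c("1", "NA", "32"), Value = c(2, 4, 1),
                   stringsAsFactors = FALSE)

is.na(data$ID)   # "NA" is an ordinary string, so nothing is flagged as missing

# %in% never returns NA, so this is safe even if real NAs are already present
data$ID[data$ID %in% "NA"] <- NA

# Now is.na() sees the missing value and the row can be dropped
cleaned <- data[!is.na(data$ID), ]
cleaned
```

If the strings come straight from the spreadsheet, read_excel() also has an na argument for listing strings that should be read as missing, which may avoid the clean-up step altogether.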

New column / mutate based on existing column

I want to add a new column to a dataframe df based on a condition from the existing columns e.g.,
df$TScore = as.factor(0)
df$TScore =
if_else(df$test_score >= '8.0', 'high',
if_else(!is.na(df$test_score), 'low', 'NA'))
The problem I am facing is, for some cases TScore is what I would expect it to be i.e., 'high' when the score is 8 or greater but for some cases it is not correct.
Is there an error in the above code? There are lots of NAs in this data.
I am also struggling with how to write it using dplyr(). So far, I have written this:
df$TScore = df %>%
filter(test_score >= 8) %>%
mutate(TScore = 'high')
But as we would expect, the dimensions do not match. Following error is given:
Error in `$<-.data.frame`(`*tmp*`, appScore, value = list(cluster3 = c(1L, : replacement has 126 rows, data has 236
Any advice would be greatly appreciated.
We don't need the filter; instead, we can use ifelse or case_when
library(dplyr)
df <- df %>%
mutate(TScore = case_when(test_score >= 8 ~'high', TRUE ~ "low"))
If we want to avoid the explicit assignment (<-), we can use the compound assignment operator %<>% from magrittr
library(magrittr)
df %<>%
mutate(TScore = case_when(is.na(test_score) ~ NA_character_,
test_score >= 8 & !is.na(test_score) ~'high',
TRUE ~ "low"))
The error occurred because a filtered data.frame (with fewer rows) was assigned to a new column in the original dataset.
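The inconsistent results in the original if_else attempt most likely come from comparing against the string '8.0': R then converts the numbers to text and compares them as strings. A short sketch with made-up scores (the vector and variable names are assumptions for illustration):

```r
scores <- c(7.5, 8, 9, 10, NA)   # hypothetical test scores

# Comparing with a string coerces the numbers to text, so "10" sorts before "8.0"
as_string <- scores >= "8.0"

# Comparing with a number gives the intended result
as_number <- scores >= 8

# Base R equivalent of the case_when() above, keeping NA as NA
TScore <- ifelse(is.na(scores), NA_character_,
                 ifelse(scores >= 8, "high", "low"))

as_string
as_number
TScore
```

With the string comparison, 8 and 10 are not classed as at least '8.0', which would explain why only some rows looked right.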

Dynamic sum/count condition while assignment

I have two data frames (table1 and randomdata) with the following schema:
#randomdata
randomdata$cube = {1,5,3,3,4,5,5,2,2,6,1,2,....} (1000 rows)
#table1
table1$side = {1,2,3,4,5,6} (6 rows)
table1$frequency = NULL
I want to count the occurrences of each side of the cube (in the first 10 rows of randomdata$cube) and assign the result to the corresponding row of table1$frequency (based on table1$side).
I can do this successfully this way:
table1$frequency[1] <- sum(randomdata$cube[1:10] == 1)
table1$frequency[2] <- sum(randomdata$cube[1:10] == 2)
table1$frequency[3] <- sum(randomdata$cube[1:10] == 3)
...
table1$frequency[6] <- sum(randomdata$cube[1:10] == 6)
This works very well, but there must be a better way.
Instead of 6 statements, I imagine something like this:
table1$frequency <- sum(randomdata$cube[1:10] == table1$side)
Can someone show me a more dynamic way to do this?
Thank you.
We can do this by converting the 'cube' column to a factor with levels specified as 1:6 and then calling table. Without the explicit levels, sides that do not occur would be dropped from the table output; with them, a missing side gets a count of 0
# as.integer() stores plain counts rather than a table object
table1$frequency <- as.integer(table(factor(randomdata$cube[1:10], levels = 1:6)))
Or using tidyverse
library(tidyverse)
randomdata %>%
slice(1:10) %>%
count(cube = factor(cube, levels = 1:6), .drop = FALSE) %>%
pull(n) %>%
mutate(table1, frequency = .)
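The one-liner imagined in the question does not work because == compares the two vectors element by element (recycling the shorter one) rather than each side against all rolls. A sketch that stays close to that idea is below; it uses the first values shown in the question as toy data, and first10 is a name made up for this example.

```r
# Toy data: the first values from the question (the real data has 1000 rows)
randomdata <- data.frame(cube = c(1, 5, 3, 3, 4, 5, 5, 2, 2, 6, 1, 2))
table1 <- data.frame(side = 1:6)

first10 <- randomdata$cube[1:10]

# For each side, count how many of the first 10 rolls equal that side
table1$frequency <- sapply(table1$side, function(s) sum(first10 == s))

table1
```

This gives the same counts as the table(factor(...)) approach; it is just the six original statements written as one loop over table1$side.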

Is there an R function to fill NA value of a column in a specific way?

So I have a data frame like this:
I'd like all of the missing values (NAs), and only those, to be replaced by this formula: Value1 / Value2
I know how to do this with a loop, but on a large data frame it takes time, so I was wondering if there is any function or tip that gives the expected result faster.
Not a direct function but something like this would work
#Get indices for NA non-zero values
inds1 <- is.na(df$Result) & df$Value2 != 0
#Get indices for NA zero values
inds2 <- is.na(df$Result) & df$Value2 == 0
#Replace them
df$Result[inds1] <- df$Value1[inds1]/df$Value2[inds1]
df$Result[inds2] <- 0
perfect for tidyverse
library(tidyverse)
d %>%
mutate(Result = ifelse(is.na(Result), Value1/Value2, Result))
or
d %>%
mutate(Result = case_when(is.na(Result) & Value2 == 0 ~ Value2,
is.na(Result) ~ Value1/Value2,
TRUE ~ Result))
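Another way to write the same idea is with coalesce(), which keeps existing values and only fills the NAs. This is a sketch on made-up data, since the original data frame was not shown; the column names follow the answers above, and filling with 0 when Value2 is 0 mirrors the first answer rather than anything stated in the question.

```r
library(dplyr)

# Hypothetical data: Result has gaps, and one row has Value2 == 0
d <- data.frame(
  Value1 = c(10, 4, 3, 0),
  Value2 = c(2, 0, 3, 5),
  Result = c(NA, NA, 7, NA)
)

filled <- d %>%
  mutate(Result = coalesce(
    Result,                                    # keep existing values
    if_else(Value2 == 0, 0, Value1 / Value2)   # fill NAs; 0 where division is undefined
  ))

filled
```

Everything here is vectorised, so it should scale to large data frames without an explicit loop.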
