Mutate in R conditional - r

I have a dataframe df with columns payment_type and payment_amount.
I want to do a conditional mutate so that if payment_type is "tpt" or "trt", payment_amount is multiplied by 5.
df$payment_amount<-df%>%select(payment_amount)%>%filter(payment_type=='tpt' | payment_type=='trt')%>%mutate(payment_amount=payment_amount*100)
But this isn't working.
TIA

Here is an alternative to Park's answer with %in%.
df %>%
  mutate(payment_amount = case_when(
    payment_type %in% c("tpt", "trt") ~ 5 * payment_amount,
    TRUE ~ payment_amount
  ))

Try
df %>%
  mutate(payment_amount = case_when(
    payment_type == "tpt" | payment_type == "trt" ~ 5 * payment_amount,
    TRUE ~ payment_amount
  ))
Approach based on the question:
df$payment_amount[df$payment_type == 'tpt' | df$payment_type == 'trt'] <- df %>%
  filter(payment_type == 'tpt' | payment_type == 'trt') %>%
  select(payment_amount) %>%
  mutate(payment_amount = payment_amount * 5) %>%
  pull(payment_amount)
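The attempt in the question fails for two reasons: select(payment_amount) drops payment_type before filter() needs it, and the pipeline returns a shorter data frame rather than a column of the original length. For a quick check on toy data, here is a minimal sketch. The values in the data frame are made up for illustration, and if_else is used instead of case_when, which should be equivalent for a two-way condition.

```r
library(dplyr)

# Toy data (made-up values); column names follow the question.
df <- tibble(
  payment_type   = c("tpt", "trt", "abc"),
  payment_amount = c(10, 20, 30)
)

# Multiply by 5 only where payment_type matches; leave other rows unchanged.
df <- df %>%
  mutate(payment_amount = if_else(
    payment_type %in% c("tpt", "trt"),
    payment_amount * 5,
    payment_amount
  ))

print(df)
```

Because the condition and both branches are evaluated row by row, the result always has the same number of rows as the input.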

Related

How do I assign group level value - based on row level values - to df using dplyr

I have the following decision rules:
RELIABILITY LEVEL   DESCRIPTION
LEVEL I             Multiple regression
LEVEL II            Multiple regression + mechanisms specified (all interest variables)
LEVEL III           Multiple regression + mechanisms specified (all interest + control vars)
The first three columns are the data upon which the 4th column should be reproduced using dplyr.
The reliability level should be the same for the whole table (model)... I want to code it using dplyr.
Here is my try so far... As you can see, I can't get it to be the same for the whole model
library(tidyverse)
library(readxl)
library(effectsize)
df <- read_excel("https://github.com/timverlaan/relia/blob/59d2cbc5d7830c41542c5f65449d5f324d6013ad/relia.xlsx")
df1 <- df %>%
group_by(study, table, function_var) %>%
mutate(count_vars = n()) %>%
ungroup %>%
group_by(study, table, function_var, mechanism_described) %>%
mutate(count_int = case_when(
function_var == 'interest' & mechanism_described == 'yes' ~ n()
)) %>%
mutate(count_con = case_when(
function_var == 'control' & mechanism_described == 'yes' ~ n()
)) %>%
mutate(reliable_int = case_when(
function_var == 'interest' & count_vars/count_int == 1 ~ 1)) %>%
mutate(reliable_con = case_when(
function_var == 'control' & count_vars/count_con == 1 ~ 1)) %>%
# group_by(study, source) %>%
mutate(reliable = case_when(
reliable_int != 1 ~ 1,
reliable_int == 1 ~ 2,
reliable_int + reliable_con == 2 ~ 3))
# ungroup() %>%
The code settled on is:
library(tidyverse)
library(readxl)
df <- read_excel("C:/Users/relia.xlsx")
df <- df %>% select(-reliability_score)
test<-df %>% group_by(study,model,function_var) %>%
summarise(count_yes=sum(mechanism_described=="yes"),n=n(),frac=count_yes/n) %>%
mutate(frac_control=frac[function_var=="control"],
frac_interest=frac[function_var=="interest"]) %>%
mutate(reliability = case_when(
frac_control == 1 & frac_interest != 1 ~ -99,
frac_control != 1 & frac_interest != 1 ~ 2,
frac_interest == 1 & frac_control != 1 ~ 3,
frac_interest ==1 & frac_control == 1 ~ 4)) %>% group_by(study,model) %>% summarise(reliability=mean(reliability))
df_reliability<-left_join(df,test)
View(df_reliability)
However, I would prefer to do this all within one dplyr pipe. If anyone has a solution I would love to hear it...
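One possible single-pipe version is sketched below. It rests on assumptions not confirmed by the question: the model is identified by study and table (the settled code uses model instead), the levels are coded 1, 2 and 3, and the toy rows are made up. The idea is that all() inside a grouped mutate returns one value per group, which mutate then repeats for every row of that group.

```r
library(dplyr)

# Toy data (hypothetical values); column names follow the question.
df <- tibble(
  study               = c(1, 1, 1, 1, 1, 1),
  table               = c("A", "A", "B", "B", "C", "C"),
  function_var        = c("interest", "control", "interest", "control", "interest", "control"),
  mechanism_described = c("yes", "yes", "yes", "no", "no", "yes")
)

df_reliability <- df %>%
  group_by(study, table) %>%
  mutate(
    # One TRUE/FALSE per group, recycled to every row of the group.
    all_interest = all(mechanism_described[function_var == "interest"] == "yes"),
    all_control  = all(mechanism_described[function_var == "control"] == "yes"),
    reliability = case_when(
      all_interest & all_control ~ 3,  # LEVEL III
      all_interest               ~ 2,  # LEVEL II
      TRUE                       ~ 1   # LEVEL I
    )
  ) %>%
  ungroup()

print(df_reliability)
```

Since no summarise() step is involved, no join back to the original data is needed.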

How to select the rows in R data.frame?

How can I select the rows that have the value 1 at least once across the 4 columns, or that have only 0 in all columns?
We can use filter with if_any/if_all
library(dplyr)
df1 %>%
filter(if_any(everything(), ~ .== 1)|if_all(everything(), ~ . == 0))
Or with base R
df1[(rowSums(df1 == 1) > 0) | (rowSums(df1 == 0) == ncol(df1)),]
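To see which rows both versions keep, here is a small sketch on made-up data (the column names a to d and the values are assumptions, since the question does not show its data frame):

```r
library(dplyr)

# Toy data: row 1 contains a 1, row 2 is all zeros,
# row 3 has neither a 1 nor only zeros, row 4 mixes 0 and 3.
df1 <- data.frame(
  a = c(1, 0, 2, 0),
  b = c(0, 0, 2, 3),
  c = c(0, 0, 2, 0),
  d = c(0, 0, 2, 0)
)

# dplyr: keep rows with any 1, or with 0 in every column.
kept_dplyr <- df1 %>%
  filter(if_any(everything(), ~ .x == 1) | if_all(everything(), ~ .x == 0))

# Base R equivalent using row-wise counts.
kept_base <- df1[(rowSums(df1 == 1) > 0) | (rowSums(df1 == 0) == ncol(df1)), ]

print(kept_dplyr)
```

Only the first two rows should survive in both results.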

How can I change multiple columns using the same condition in R?

I need to recode some columns in my data; there are 29 columns that use the same coding.
The cells are coded with numbers, like this:
1 - Normal
2 - Altered
3 - NA
I want to create a for loop to change all columns at the same time. I need to transform the number codes (1; 2; 3) into names (Normal; Altered; NA).
This is what I am trying to do. I don't get any error message, but it isn't working:
for (i in names(df[,123:151])){
mutate(i = case_when(
i == 1 ~ 'Normal',
i == 2 ~ 'Altered',
i == 3 ~ 'NA'))
}
An easy way to do this would be to use dplyr from tidyverse.
library(tidyverse)
#make test dataframe
col1 <- c("1", "2", "3")
col2 <- c(3, 2, 2)
df <- data.frame(col1, col2)
df_recoded<-df %>%
mutate(across(.cols = everything(), ~case_when(
. == 1 ~ 'Normal',
. == 2 ~ 'Altered',
. == 3 ~ NA_character_)))
Try this:
df %>% mutate(across(.cols = names(df)[123:151],
  .fns = ~ recode(., `1` = "Normal", `2` = "Altered", `3` = "NA", .default = NA_character_)))
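In the loop from the question, i holds a column name as a string, so i == 1 compares the name rather than the column's values. Another option, sketched here under the assumption that the codes are exactly the integers 1 to 3, is to index a lookup vector inside across(). The data frame and its column names are made up; in the question the target columns would be 123:151.

```r
library(dplyr)

# Toy data (made up); id stands in for columns that must stay untouched.
df <- data.frame(id = 1:3, col1 = c(1, 2, 3), col2 = c(3, 2, 2))

# Position i of the lookup vector holds the label for code i.
labels <- c("Normal", "Altered", NA_character_)

# Replace each code by the label stored at that position.
df_recoded <- df %>%
  mutate(across(c(col1, col2), ~ labels[.x]))

print(df_recoded)
```

Here code 3 becomes a real missing value; use the string "NA" in the lookup vector if a text label is wanted instead.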

Automate Dplyr's mutate function

What is the best way to automate the mutate function within a single dplyr pipeline?
It is easiest to demonstrate with an example.
In the first part of the example I create new columns by hand, one for each value of the variable carb. Now imagine I need to automate this step so that it iterates over all unique values of carb and creates a new column for each value.
Is there a way to do so?
library(tidyverse)
cr <-
mtcars %>%
group_by(gear) %>%
nest()
# This is 'by-hand' approach of what I would like to do - How to automate it? E.g. we do not know all values of 'carb'
cr$data[[1]] %>%
mutate(VARIABLE1 =
case_when(carb == 1 ~ hp/mpg,
TRUE ~ 0)) %>%
mutate(VARIABLE2 =
case_when(carb == 2 ~ hp/mpg,
TRUE ~ 0)) %>%
mutate(VARIABLE4 =
case_when(carb == 4 ~ hp/mpg,
TRUE ~ 0))
# This is a pseudo-idea of what I need to do. Is there any way to change the iteration value within ONE dplyr pipeline?
vals <- cr$data[[1]] %>% pull(carb) %>% sort %>% unique()
for (i in vals) {
message(i)
cr$data[[1]] %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be first element of vals
TRUE ~ 0)) %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be second element of vals
TRUE ~ 0)) %>%
mutate(paste('VARIABLE', i, sep = '') = case_when(carb == i ~ hp/mpg, # At this line, all i shall be third element of vals
TRUE ~ 0))
}
One way would be to use dummy_cols from the package fastDummies.
Doing it for one data frame at a time:
library(fastDummies)
cr$data[[1]] %>%
  dummy_cols(select_columns = 'carb') %>%
  mutate(across(starts_with('carb_'), ~ .x * hp / mpg))
You can also do this first and then group by gear, since the gear value is not used in the calculation, so the order does not matter. For that:
cr_new <- mtcars %>%
  dummy_cols(select_columns = 'carb') %>%
  mutate(across(starts_with('carb_'), ~ .x * hp / mpg)) %>%
  group_by(gear) %>%
  nest()
Perhaps something like this would help:
library(dplyr)
library(purrr)
bind_cols(mtcars, map_dfc(unique(mtcars$carb),
  ~ mtcars %>%
    transmute(!!paste0('carb', .x) := case_when(carb == .x ~ hp/mpg, TRUE ~ 0))))
It sounds a lot like what's called "the XY-problem".
https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Please read about tidy data, and/or tidyr's pivot_longer/pivot_wider. Column names should not encode information.
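As a rough illustration of the pivot_wider route mentioned above, the sketch below builds one column per carb value without listing the values by hand. The helper columns row and carb_key, and the VARIABLE prefix taken from the question, are choices made here for illustration and not part of any answer.

```r
library(dplyr)
library(tidyr)

wide <- mtcars %>%
  mutate(
    row      = row_number(),  # unique id so every car stays on its own row
    carb_key = carb,          # copy of carb that is consumed for the new names
    ratio    = hp / mpg
  ) %>%
  pivot_wider(
    names_from   = carb_key,
    values_from  = ratio,
    names_prefix = "VARIABLE",
    values_fill  = 0          # cars with a different carb value get 0
  )

print(names(wide))
```

Each new column holds hp/mpg for cars with that carb value and 0 otherwise, which should match the by-hand mutate calls in the question.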

Use dplyr to substitute apply

I have a table like this (but the number of columns can vary; I have a number of ref_* + alt_* pairs):
+--------+-------+-------+-------+-------+
| GeneID | ref_a | alt_a | ref_b | alt_b |
+--------+-------+-------+-------+-------+
| a1 | 0 | 1 | 1 | 3 |
| a2 | 1 | 1 | 7 | 8 |
| a3 | 0 | 1 | 1 | 3 |
| a4 | 0 | 1 | 1 | 3 |
+--------+-------+-------+-------+-------+
and need to filter out rows that have ref_a + alt_a < 10 and ref_b + alt_b < 10. It's easy to do it with apply, creating additional columns and filtering, but I'm learning to keep my data tidy, so trying to do it with dplyr.
I would use mutate first to create columns with sums and then filter by these sums. But can't figure out how to use mutate in this case.
Edited:
Number of columns is not fixed!
You do not need to mutate here. Just do the following:
require(tidyverse)
df %>%
filter(ref_a + alt_a < 10 & ref_b + alt_b < 10)
If you want to use mutate first you could go with:
df %>%
mutate(sum1 = ref_a + alt_a, sum2 = ref_b + alt_b) %>%
filter(sum1 < 10 & sum2 < 10)
Edit: Not knowing the number of variables in advance makes it a bit more complicated. However, I think you could use the following code to perform this task (assuming that the variable names are all formatted with "_a", "_b" and so on). I hope there is a shorter way to perform this task :)
df$GeneID <- as.character(df$GeneID)
df %>%
gather(variable, value, -GeneID) %>%
rowwise() %>%
mutate(variable = unlist(strsplit(variable, "_"))[2]) %>%
ungroup() %>%
group_by(GeneID, variable) %>%
summarise(sum = sum(value)) %>%
filter(sum < 10) %>%
summarise(keepGeneID = ifelse(n() == (ncol(df) - 1)/2, TRUE, FALSE)) %>%
filter(keepGeneID == TRUE) %>%
select(GeneID) -> ids
df %>%
filter(GeneID %in% ids$GeneID)
Edit 2: After some rework I was able to improve the code a bit:
df$GeneID <- as.character(df$GeneID)
df %>%
gather(variable, value, -GeneID) %>%
rowwise() %>%
mutate(variable = unlist(strsplit(variable, "_"))[2]) %>%
ungroup() %>%
group_by(GeneID, variable) %>%
summarise(sum = sum(value)) %>%
group_by(GeneID) %>%
summarise(max = max(sum)) %>%
filter(max < 10) -> ids
df %>%
filter(GeneID %in% ids$GeneID)
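A shorter variant may be possible with pivot_longer and its ".value" sentinel, which splits names such as ref_a into a ref column and a sample label. This is only a sketch: it assumes every column other than GeneID follows the ref_*/alt_* pattern, it keeps the rows where every pair sums to less than 10 (as the code above does), and the sample column name is invented here.

```r
library(dplyr)
library(tidyr)

# Data copied from the table in the question.
df <- tibble(
  GeneID = c("a1", "a2", "a3", "a4"),
  ref_a  = c(0, 1, 0, 0),
  alt_a  = c(1, 1, 1, 1),
  ref_b  = c(1, 7, 1, 1),
  alt_b  = c(3, 8, 3, 3)
)

keep_ids <- df %>%
  # One row per GeneID and sample, with ref and alt as separate columns.
  pivot_longer(-GeneID, names_to = c(".value", "sample"), names_sep = "_") %>%
  group_by(GeneID) %>%
  summarise(all_below = all(ref + alt < 10)) %>%
  filter(all_below) %>%
  pull(GeneID)

result <- df %>%
  filter(GeneID %in% keep_ids)

print(result)
```

Because the pairs are discovered from the column names, the same code should work for any number of ref_*/alt_* pairs.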
