Use a list in a mutate function in R

I am trying to use a list of column names inside mutate. Please see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then I use it here:
Nested_Train %>%
mutate(Category = paste0(glue_col(Pasting), sep='_'))
But this does not have the desired effect: it just returns the column names themselves (f_risk_code, f_risk_category) as the Category instead of the actual values of the risk code and risk category fields. Any help appreciated.

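You can use do.call with the character vector of column names (Grouping; Pasting is a single string, so it cannot be used to index columns):
library(dplyr)
Nested_Train %>% mutate(Category = do.call(paste0, .[Grouping]))
To get the "_" separator, use paste with sep instead:
Nested_Train %>% mutate(Category = do.call(paste, c(.[Grouping], sep = "_")))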

With the tidyverse, we can use invoke from purrr (note that invoke is deprecated as of purrr 1.0.0 in favour of exec):
library(dplyr)
library(purrr)
Nested_Train %>%
mutate(Category = invoke(paste0, .[Grouping]))

This worked for me. Capture the columns as expressions and splice them into paste with !!!, for example:
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))
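For reference, here is a minimal base R sketch of the same idea on a toy data frame. The data and the name toy are made up for illustration; only the column names come from the question. It pastes the columns named in a character vector with "_" as the separator:

```r
# Toy stand-in for Nested_Train (hypothetical data)
toy <- data.frame(
  f_risk_code      = c("A1", "B2"),
  f_risk_category  = c("low", "high"),
  stringsAsFactors = FALSE
)

# Character vector of column names, one name per element
Grouping <- c("f_risk_code", "f_risk_category")

# c() appends sep to the list of selected columns, so this is equivalent to
# paste(toy$f_risk_code, toy$f_risk_category, sep = "_")
toy$Category <- do.call(paste, c(toy[Grouping], sep = "_"))
```

The key point is that the column names must be separate elements of the vector; a single string such as 'f_risk_code, f_risk_category' cannot select two columns.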

Related

How to add a new variable to each data frame in a list

I want to build a variable ID from STUDYID + SUBJECT for every dataset in my environment whose name ends in _OK. What should I do?
I can think of
list<-mget(ls(pattern = "_OK$"))
but how can I then create the new variable ID for all of those data frames? I think lapply or map should work, but I am not sure how to use them. Could someone give me an example?
Thanks.
You can use:
list_data <- mget(ls(pattern = "_OK$"))
list_data <- lapply(list_data, function(x) transform(x, ID = paste(STUDYID, SUBJECT, sep = "-")))
Or, using the tidyverse:
library(dplyr)
library(purrr)
list_data <- map(list_data, ~ .x %>% mutate(ID = paste(STUDYID, SUBJECT, sep = "-")))
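A small self-contained sketch of the lapply approach follows. The two data frames and their contents are invented for illustration, and the list is built by hand here rather than with mget so that the example does not depend on what is in your environment:

```r
# Hypothetical data frames standing in for the *_OK datasets
list_data <- list(
  demo_OK = data.frame(STUDYID = "S1", SUBJECT = c("001", "002"),
                       stringsAsFactors = FALSE),
  lab_OK  = data.frame(STUDYID = "S2", SUBJECT = "003",
                       stringsAsFactors = FALSE)
)

# Add the ID column to every data frame; lapply returns a new list,
# so the result has to be assigned to keep it
with_id <- lapply(list_data, function(x) {
  transform(x, ID = paste(STUDYID, SUBJECT, sep = "-"))
})
```

If the modified data frames should replace the originals in the workspace, list2env(with_id, envir = globalenv()) is one way to write them back under their list names.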

How to change variable to factor based on its name in some list by using across?

(I am new to R.)
I am trying to convert the columns of the data frame members to factors when their names appear in a list, to_factors_list.
I have tried some code using mutate(across()), but it gives errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
############### output ###############
'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Both of the attempts below give errors:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what is wrong. How can I make this work?
In base R, this can be done with lapply:
members[to_factors_list] <- lapply(members[to_factors_list], factor)
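across expects a column selection, not a predicate on the column values. Wrap the external character vector in all_of so the selection is unambiguous:
members %>% mutate(across(all_of(to_factors_list), factor))
Or, if you prefer the older dplyr syntax:
members %>% mutate_at(vars(all_of(to_factors_list)), factor)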
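To make the two steps explicit (find the low-cardinality columns by name, then convert them), here is a base R sketch on a toy data frame. The data, the threshold of 3, and the variable names n_unique and to_factors are assumptions for illustration, not part of the original question:

```r
# Hypothetical miniature of the members data
toy <- data.frame(
  season = c("Spring", "Autumn", "Spring"),
  sex    = c("M", "F", "M"),
  age    = c(31, 45, 28),
  stringsAsFactors = FALSE
)

# Step 1: count distinct values per column and keep the names below a threshold
n_unique   <- vapply(toy, function(col) length(unique(col)), integer(1))
to_factors <- names(n_unique)[n_unique < 3]

# Step 2: convert only those columns, selecting them by name
toy[to_factors] <- lapply(toy[to_factors], factor)
```

The attempts in the question failed because the predicate compared column values with the list of names; selecting by name, as here, avoids that.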

Pipeline %>% in R

I want to use the pipe operator %>% from the tidyverse (magrittr) to make this more readable:
myChargingDevices<-data.frame(fromJSON(jsonFile))
myChargingDevices<-myChargingDevices %>%
mutate(myTime=ymd_hms(lastUpdateCheck))
myChargingDevices<-myChargingDevices[order(myChargingDevices$myTime,decreasing = TRUE),]
myChargingDevices$lastUpdateCheck<-NULL
Is there a more convenient way to do this?
Thanks in advance.
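Like this:
library(jsonlite)   # fromJSON
library(dplyr)      # %>%, mutate, arrange, select
library(lubridate)  # ymd_hms
myChargingDevices <- jsonFile %>%
fromJSON() %>%
data.frame() %>%
mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
arrange(desc(myTime)) %>%
select(-lastUpdateCheck)
I cannot test it because you did not provide a reproducible example.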
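Since the question has no reproducible data, here is a small self-contained sketch of the same pipeline. The JSON string and its id field are invented for illustration, and it assumes the jsonlite, dplyr, and lubridate packages are installed:

```r
library(jsonlite)
library(dplyr)
library(lubridate)

# Hypothetical JSON payload; fromJSON accepts a JSON string as well as a file
json_text <- '[
  {"id": 1, "lastUpdateCheck": "2021-03-01 10:00:00"},
  {"id": 2, "lastUpdateCheck": "2021-03-05 08:30:00"},
  {"id": 3, "lastUpdateCheck": "2021-02-20 17:45:00"}
]'

myChargingDevices <- json_text %>%
  fromJSON() %>%                              # array of objects -> data frame
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%  # parse the timestamp
  arrange(desc(myTime)) %>%                   # newest first
  select(-lastUpdateCheck)                    # drop the raw text column
```

Each step of the original script maps to one verb in the chain, so no intermediate reassignment is needed.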

R Dplyr top_n does not work when used within function

My dplyr function looks like this
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I wrapped the top_n call in interp because I thought the problem was due to lazyeval.
However, this is not the case.
Using the function below (no interp for top_n), I get a result, but it is not the top 5 rows I want.
From reading other Stack Overflow posts, I understand that this has to do with ungroup, but I am not sure how to fix it.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
My solution: remove the quotes around QoL in top_n, drop the ungroup step, and add an additional argument to arrange:
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
If anyone's got a more efficient way, please share
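In current dplyr (1.0.0 or later), the same idea can be sketched without lazyeval. This is not the original poster's code: the toy data are invented, n = 2 stands in for 5 to keep the example small, and slice_max replaces top_n. After summarise drops the last grouping level, the result is still grouped by globalsegment, so the top rows are taken per segment:

```r
library(dplyr)

# Hypothetical data: two segments, three accounts each, two rows per account
toy <- data.frame(
  globalsegment = rep(c("A", "B"), each = 6),
  Account       = rep(paste0("acct", 1:3), times = 4),
  SVM_LABEL_QOL = c("QoL", "QoL", "Other", "QoL", "Other", "Other",
                    "Other", "QoL", "QoL", "Other", "QoL", "QoL"),
  stringsAsFactors = FALSE
)

top2 <- toy %>%
  filter(!is.na(SVM_LABEL_QOL)) %>%
  group_by(globalsegment, Account) %>%
  # share of rows labelled "QoL"; keep the globalsegment grouping afterwards
  summarise(QoL = round(mean(SVM_LABEL_QOL == "QoL"), 2),
            .groups = "drop_last") %>%
  # top 2 accounts within each globalsegment
  slice_max(QoL, n = 2, with_ties = FALSE) %>%
  ungroup()
```

Calling ungroup before selecting the top rows, as in the question, would instead rank all accounts together across segments.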

Can one argument be mapped to more than one argument in a user-defined function?

Assume I want to run this:
MS_date<-bind_inpatient_MSW %>%
arrange(NRIC,
APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION) %>%
group_by(NRIC,
APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION) %>%
mutate(n_marital_status=n_distinct(MARITAL_STATUS,na.rm=TRUE))
and this
TH_date<-bind_inpatient_MSW %>%
arrange(NRIC,
APPROVED_DATE_BILL) %>%
group_by(NRIC,
APPROVED_DATE_BILL) %>%
mutate(n_TH=n_distinct(TYPE_OF_HOUSING,na.rm=TRUE))
These two differ in the variables used to arrange and group the data frame, as well as in the variable that is added. I would like to write a user-defined function so that I don't have to write this more than once. I tried the following:
df_date<-function(df,grpby,cntby){
dfnew<-df %>%
arrange(grpby) %>%
group_by(grpby) %>%
mutate(n=n_distinct(cntby,na.rm=TRUE))
return(dfnew)
}
Then I call
df_date(bind_inpatient_MSW,NRIC,APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION,MARITAL_STATUS)
and
df_date(bind_inpatient_MSW,NRIC,APPROVED_DATE_BILL,TYPE_OF_HOUSING)
but neither works. How can I solve this?
You can try something like:
fun <- function(dat,group,ctnby) {
dat %>%
group_by_(group) %>%
do((function(., ctnby) {
with(., data.frame(n = n_distinct(get(ctnby))))
}
)(.,ctnby))
}
fun(mtcars,"cyl","hp")
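This passes the column names as strings, so group_by_ and get look the columns up by name instead of relying on non-standard evaluation. Note that group_by_ is deprecated and do is superseded in current dplyr.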
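With current dplyr, a hedged alternative is to pass the column names as strings and convert them with rlang::syms and the .data pronoun. The function name count_distinct_by and its argument names are made up for this sketch, and mtcars stands in for the real data:

```r
library(dplyr)

# group_cols: character vector of one or more grouping columns
# count_col:  single column name whose distinct values are counted
count_distinct_by <- function(df, group_cols, count_col) {
  groups <- rlang::syms(group_cols)  # strings -> symbols, spliced with !!!
  df %>%
    arrange(!!!groups) %>%
    group_by(!!!groups) %>%
    mutate(n = n_distinct(.data[[count_col]], na.rm = TRUE)) %>%
    ungroup()
}

# Several grouping columns go in as one vector argument
result <- count_distinct_by(mtcars, c("cyl", "gear"), "carb")
```

Passing the grouping columns as a single character vector is what lets one argument stand for any number of columns, which is what the two calls in the question need.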
