I am trying to use a vector of column names inside mutate(), please see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then I use it here:
Nested_Train %>%
mutate(Category = paste0(glue_col(Pasting), sep='_'))
But this does not have the desired effect: Category just contains the literal text 'f_risk_code, f_risk_category' instead of the values of the f_risk_code and f_risk_category columns. Any help appreciated.
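You can use do.call(). Note that paste0() has no sep argument, so use paste() with sep = "_":
library(dplyr)
# Pasting above is a single string; Grouping holds one name per column, which is what .[ ] needs.
# .[Grouping] selects those columns and do.call() passes them to paste() as separate arguments.
Nested_Train %>% mutate(Category = do.call(paste, c(.[Grouping], sep = "_")))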
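A quick self-contained check of the do.call() idiom. The two-row tibble is hypothetical toy data standing in for Nested_Train; it assumes dplyr with the magrittr pipe, where the dot inside mutate() refers to the piped data frame.

```r
library(dplyr)

# Hypothetical toy data standing in for Nested_Train
Nested_Train <- tibble(
  f_risk_code = c("A1", "B2"),
  f_risk_category = c("fire", "flood")
)
# One string per column name
Grouping <- c("f_risk_code", "f_risk_category")

# c(.[Grouping], sep = "_") builds a list of the two columns plus sep,
# which do.call() expands into paste(f_risk_code, f_risk_category, sep = "_")
result <- Nested_Train %>%
  mutate(Category = do.call(paste, c(.[Grouping], sep = "_")))

result$Category
```

Because the column names are plain strings, the same call works for any number of columns listed in Grouping.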
With purrr we can use invoke(), although it is deprecated as of purrr 1.0.0 in favour of exec():
library(dplyr)
library(purrr)
Nested_Train %>%
  mutate(Category = invoke(paste, .[Grouping], sep = "_"))
This was the solution for me.
Try something like:
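library(dplyr)
# exprs() captures the column names unevaluated; !!! splices them into paste()
# With a character vector of names, rlang::syms(Grouping) gives the same kind of list
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))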
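Another option, not from the answers above, is tidyr::unite(), which takes a tidyselect column selection directly, so a character vector can be passed through all_of(). A minimal sketch on hypothetical toy data standing in for Nested_Train:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for Nested_Train
Nested_Train <- tibble(
  f_risk_code = c("A1", "B2"),
  f_risk_category = c("fire", "flood")
)
Grouping <- c("f_risk_code", "f_risk_category")

# unite() pastes the selected columns into one new column;
# remove = FALSE keeps the original columns alongside it
united <- Nested_Train %>%
  unite("Category", all_of(Grouping), sep = "_", remove = FALSE)

united$Category
```

unite() places the new column where the first selected column was; with the default remove = TRUE the source columns would be dropped.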
I want to build a variable ID from STUDYID + SUBJECT in every data frame in my environment whose name ends in _OK. What should I do?
I can think of:
list<-mget(ls(pattern = "_OK$"))
Then how can I create the new variable ID in all the data frames whose names end in _OK? I think lapply or map should work, but I am not sure how to use either of them. Could someone give me an example?
Thanks.
You can use:
list_data <-mget(ls(pattern = "_OK$"))
lapply(list_data, function(x) transform(x,ID = paste(STUDYID,SUBJECT,sep = "-")))
Or, using the tidyverse:
library(dplyr)
library(purrr)
map(list_data, ~.x %>% mutate(ID = paste(STUDYID,SUBJECT,sep = "-")))
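Both answers return a modified list but leave the original data frames untouched. A minimal sketch of writing the results back under their original names with base R's list2env(); the two data frames are hypothetical toy data:

```r
# Hypothetical data frames whose names end in _OK
DM_OK <- data.frame(STUDYID = "S1", SUBJECT = c("001", "002"))
AE_OK <- data.frame(STUDYID = "S1", SUBJECT = "003", AETERM = "Headache")

# Collect every object whose name ends in _OK into a named list
list_data <- mget(ls(pattern = "_OK$"))

# Add ID to each data frame
list_data <- lapply(list_data, function(x) transform(x, ID = paste(STUDYID, SUBJECT, sep = "-")))

# Overwrite DM_OK and AE_OK in the current environment with the new versions
invisible(list2env(list_data, envir = environment()))
```

Keeping the data in the list and working on list_data directly is often simpler than writing back, but list2env() helps when later code expects the individual objects.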
(I am new to R.)
I am trying to convert columns of the data frame members to factors if their names appear in the vector to_factors_list.
I have tried some code using mutate(across()), but it gives errors.
Data preparation:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
# output:
# 'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Both of the following attempts give errors:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what is wrong. How can I make this work?
In base R, this can be done with lapply():
members[to_factors_list] <- lapply(members[to_factors_list], factor)
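The first argument of across() is a tidyselect column selection, not a formula, so pass the names through all_of():
members %>% mutate(across(all_of(to_factors_list), factor))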
Or, with the older (now superseded) mutate_at() syntax:
members %>% mutate_at(vars(all_of(to_factors_list)), factor)
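A self-contained sketch on a hypothetical three-row tibble (not the Himalayan data). It shows selection by name with all_of(), and an alternative that skips the helper vector by selecting with where() on a predicate over each column's values. The toy threshold of 3 distinct values stands in for the original 15:

```r
library(dplyr)

# Hypothetical toy data standing in for members
members_toy <- tibble(
  season = c("Spring", "Autumn", "Spring"),
  sex = c("M", "F", "M"),
  age = c(40, 32, 51)
)
to_factors_list <- c("season", "sex")

# Select by name: all_of() errors if any listed name is missing
by_name <- members_toy %>%
  mutate(across(all_of(to_factors_list), factor))

# Select by predicate: convert columns with fewer than 3 distinct values
by_predicate <- members_toy %>%
  mutate(across(where(function(x) n_distinct(x) < 3), factor))
```

any_of() can replace all_of() if some names in the vector may be absent from the data.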
I want to use the %>% pipe (from magrittr, re-exported by dplyr and purrr) to make this more readable:
myChargingDevices<-data.frame(fromJSON(jsonFile))
myChargingDevices<-myChargingDevices %>%
mutate(myTime=ymd_hms(lastUpdateCheck))
myChargingDevices<-myChargingDevices[order(myChargingDevices$myTime,decreasing = TRUE),]
myChargingDevices$lastUpdateCheck<-NULL
Any ideas for doing this more conveniently?
Thanks in advance
Like this:
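library(jsonlite)   # fromJSON(), assuming jsonlite is the JSON package in use
library(dplyr)      # mutate(), arrange(), select()
library(lubridate)  # ymd_hms()
myChargingDevices <- jsonFile %>%
  fromJSON() %>%
  data.frame() %>%
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
  arrange(desc(myTime)) %>%
  select(-lastUpdateCheck)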
I cannot test it because you did not provide reproducible data.
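To make the pipeline testable, here is a sketch with a hypothetical inline JSON string standing in for jsonFile (jsonlite's fromJSON() accepts a JSON string as well as a file path or URL). The id values and timestamps are made up:

```r
library(jsonlite)
library(dplyr)
library(lubridate)

# Hypothetical JSON standing in for the real file
jsonFile <- '[
  {"id": "a", "lastUpdateCheck": "2020-01-01 10:00:00"},
  {"id": "b", "lastUpdateCheck": "2020-03-15 08:30:00"},
  {"id": "c", "lastUpdateCheck": "2020-02-10 12:45:00"}
]'

myChargingDevices <- jsonFile %>%
  fromJSON() %>%                              # array of objects -> data frame
  data.frame() %>%
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
  arrange(desc(myTime)) %>%                   # newest first
  select(-lastUpdateCheck)

myChargingDevices
```

arrange(desc(myTime)) replaces the order(..., decreasing = TRUE) indexing, and select(-lastUpdateCheck) replaces assigning NULL to the column.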
My dplyr function looks like this:
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp() call because I thought the problem was due to lazyeval. However, this is not the case.
Using the function below (without interp() around top_n()), I get a result, but not the top 5 rows I expected.
From other Stack Overflow posts, I understand that this has to do with ungroup(), but I am not sure how to implement the fix.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
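My solution: remove the quotes around QoL (wt = "QoL" is a constant string, so top_n() cannot rank by the column), drop ungroup() so that top_n() works within each globalsegment, and add globalsegment to arrange():
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
# summarising drops the last grouping level, so the result stays grouped by globalsegment
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
# no ungroup() here: top_n() keeps the 5 highest QoL values within each globalsegment
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}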
If anyone has a more efficient way, please share.
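One possible modernisation, assuming dplyr 1.0.0 or later: plain summarise() replaces summarise_() with interp(), mean(x == "QoL") gives the same share as sum(x %in% 'QoL')/n() once NAs are filtered out, and slice_max() supersedes top_n(). The function name top_qol, the n_top parameter and the toy data are hypothetical:

```r
library(dplyr)

top_qol <- function(df, n_top = 5) {
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    # share of rows labelled "QoL"; .groups = "drop_last" leaves the
    # result grouped by globalsegment only
    summarise(QoL = round(mean(SVM_LABEL_QOL == "QoL"), 2), .groups = "drop_last") %>%
    # top n_top accounts within each globalsegment
    slice_max(QoL, n = n_top, with_ties = FALSE) %>%
    ungroup() %>%
    arrange(globalsegment, desc(QoL))
}

# Hypothetical toy data: two segments, three accounts, two rows per account
qol_data <- tibble(
  globalsegment = rep(c("S1", "S2"), each = 6),
  Account = rep(c("a", "b", "c"), times = 4),
  SVM_LABEL_QOL = c("QoL", "QoL", "Other", "QoL", "Other", "Other",
                    "Other", "QoL", "Other", "QoL", "QoL", "Other")
)

res <- top_qol(qol_data, n_top = 2)
```

with_ties = FALSE caps each segment at exactly n_top rows; the default with_ties = TRUE, like top_n(), may return more when values tie.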
Assume I want to run this:
MS_date<-bind_inpatient_MSW %>%
arrange(NRIC,
APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION) %>%
group_by(NRIC,
APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION) %>%
mutate(n_marital_status=n_distinct(MARITAL_STATUS,na.rm=TRUE))
and this
TH_date<-bind_inpatient_MSW %>%
arrange(NRIC,
APPROVED_DATE_BILL) %>%
group_by(NRIC,
APPROVED_DATE_BILL) %>%
mutate(n_TH=n_distinct(TYPE_OF_HOUSING,na.rm=TRUE))
These two differ in the variables used to arrange and group the data frame, and in the variable that is added. I would like to write a user-defined function so that I don't have to write this more than once. I tried the following:
df_date<-function(df,grpby,cntby){
dfnew<-df %>%
arrange(grpby) %>%
group_by(grpby) %>%
mutate(n=n_distinct(cntby,na.rm=TRUE))
return(dfnew)
}
I then called df_date(bind_inpatient_MSW,NRIC,APPROVED_DATE_BILL,APPROVED_DATE_FF_APPLICATION,MARITAL_STATUS)
and
df_date(bind_inpatient_MSW,NRIC,APPROVED_DATE_BILL,TYPE_OF_HOUSING)
Neither call works. How can I solve this?
You can try something like:
fun <- function(dat,group,ctnby) {
dat %>%
group_by_(group) %>%
do((function(., ctnby) {
with(., data.frame(n = n_distinct(get(ctnby))))
}
)(.,ctnby))
}
fun(mtcars,"cyl","hp")
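This passes the column names as strings and sidesteps non-standard evaluation by using do() together with get(). Note that group_by_() has been deprecated since dplyr 0.7.0, and do() is superseded as of dplyr 1.0.0.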
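With current dplyr, a sketch that stays closer to the original mutate() version uses tidy evaluation: the grouping columns are forwarded through ..., the counted column is embraced with {{ }}, and the new column name is injected with !! and :=. The argument names cntby and .name and the toy data are hypothetical:

```r
library(dplyr)

# ... : grouping/sorting columns; cntby: column to count; .name: output column
df_date <- function(df, ..., cntby, .name = "n") {
  df %>%
    arrange(...) %>%
    group_by(...) %>%
    mutate(!!.name := n_distinct({{ cntby }}, na.rm = TRUE))
}

# Hypothetical toy data standing in for bind_inpatient_MSW
toy <- tibble(
  NRIC = c("A", "A", "A", "B"),
  APPROVED_DATE_BILL = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-01-01")),
  MARITAL_STATUS = c("Single", "Married", NA, "Single")
)

res <- df_date(toy, NRIC, APPROVED_DATE_BILL,
               cntby = MARITAL_STATUS, .name = "n_marital_status")
```

Because the grouping columns go through ..., cntby must be passed by name, e.g. df_date(bind_inpatient_MSW, NRIC, APPROVED_DATE_BILL, cntby = TYPE_OF_HOUSING, .name = "n_TH").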