Pipeline %>% in R

I want to use the pipe %>% from the tidyverse/purrr to make this more readable:
myChargingDevices<-data.frame(fromJSON(jsonFile))
myChargingDevices<-myChargingDevices %>%
mutate(myTime=ymd_hms(lastUpdateCheck))
myChargingDevices<-myChargingDevices[order(myChargingDevices$myTime,decreasing = TRUE),]
myChargingDevices$lastUpdateCheck<-NULL
Any ideas on how to do this more conveniently?
Thanks in advance

Like this:
myChargingDevices <- jsonFile %>%
fromJSON %>%
data.frame %>%
mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
arrange(desc(myTime)) %>%
select(-lastUpdateCheck)
I cannot test it because you did not provide reproducible code.
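Since the original data was not posted, here is a minimal reproducible sketch of the same pipeline. It assumes `fromJSON()` comes from jsonlite and that `jsonFile` holds JSON text; the JSON string, its `id` field and the timestamps are made up for illustration.

```r
library(jsonlite)   # assumed source of fromJSON()
library(dplyr)
library(lubridate)

# Hypothetical JSON standing in for the asker's jsonFile (the real data was not posted)
jsonFile <- '[{"id": 1, "lastUpdateCheck": "2020-01-01 10:00:00"},
              {"id": 2, "lastUpdateCheck": "2020-03-01 08:30:00"},
              {"id": 3, "lastUpdateCheck": "2020-02-15 12:00:00"}]'

myChargingDevices <- jsonFile %>%
  fromJSON() %>%                               # array of objects -> data frame
  data.frame() %>%
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%  # parse the timestamp
  arrange(desc(myTime)) %>%                    # newest first
  select(-lastUpdateCheck)                     # drop the raw string column

print(myChargingDevices)
```

`arrange(desc(myTime))` replaces the `order(..., decreasing = TRUE)` indexing, and `select(-lastUpdateCheck)` replaces assigning `NULL` to the column.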

Related

Use a list in a mutate function in R

I am trying to use a vector of column names inside mutate(); see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then I use it here:
Nested_Train %>%
mutate(Category = paste0(glue_col(Pasting), sep='_'))
But this does not have the desired effect: it just returns the literal text f_risk_code, f_risk_category as the Category instead of the actual values of the risk code and risk category fields. Any help appreciated.
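You can use do.call(). Note that Pasting must hold the column names as separate strings, like Grouping, i.e. c('f_risk_code', 'f_risk_category'):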
library(dplyr)
Nested_Train %>% mutate(Category = do.call(paste0, .[Pasting]))
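With the tidyverse, we can use purrr's invoke() (it has since been deprecated in favour of exec(), so do.call() is the more future-proof choice):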
library(dplyr)
library(purrr)
Nested_Train %>%
mutate(Category = invoke(paste0, .[Pasting]))
This was the solution for me:
Try something like:
library(dplyr)
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))
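Putting the answers together for the original question, a minimal sketch might look like this. The two-row `Nested_Train` is a hypothetical stand-in, and `paste()` with `sep = "_"` is used instead of `paste0()` because the question wanted an underscore between the values (`paste0()` has no separator). The second variant converts the strings to symbols with `rlang::syms()` and splices them with `!!!`, the same idea as the `rlang::exprs()` answer.

```r
library(dplyr)

# Hypothetical stand-in for Nested_Train (the real data was not posted)
Nested_Train <- data.frame(
  f_risk_code = c("A1", "B2"),
  f_risk_category = c("high", "low"),
  stringsAsFactors = FALSE
)

# Column names as a character vector, one name per element
Pasting <- c("f_risk_code", "f_risk_category")

# Option 1: do.call() on the selected columns plus an explicit separator
out1 <- Nested_Train %>%
  mutate(Category = do.call(paste, c(.[Pasting], sep = "_")))

# Option 2: turn the strings into symbols and splice them into paste()
out2 <- Nested_Train %>%
  mutate(Category = paste(!!!rlang::syms(Pasting), sep = "_"))

print(out1)
```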

R Dplyr top_n does not work when used within function

My dplyr function looks like this:
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp() call because I thought the problem was due to lazyeval.
However, this is not the case.
Using the function below (without interp() around top_n), I get a result, but I do not see the top 5 results as desired.
From other Stack Overflow posts, I understand that this has to do with ungroup(), but I am not sure how to implement this.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
My solution: remove the quotes around QoL in top_n(), drop the ungroup() step, and add globalsegment as an additional argument to arrange():
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
If anyone has a more efficient way, please share.
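In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()` and the underscore verbs such as `summarise_()` are deprecated, so a possible modern version is sketched below. The data frame is hypothetical sample data using the question's column names, and the function name and its `n` argument are illustrative. `mean(x == "QoL")` equals `sum(x %in% 'QoL')/n()` once NAs are filtered out.

```r
library(dplyr)

# Hypothetical sample data using the column names from the question
df <- data.frame(
  globalsegment = rep(c("S1", "S2"), each = 6),
  Account = rep(c("a", "b", "c"), times = 4),
  SVM_LABEL_QOL = c("QoL", "Other", "QoL", "QoL", NA, "Other",
                    "Other", "QoL", "QoL", "QoL", "Other", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n accounts by QoL share within each segment
convert_to_top_n_df <- function(df, n = 5) {
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    # Share of rows labelled "QoL"; "drop_last" leaves the data grouped by globalsegment
    summarise(QoL = round(mean(SVM_LABEL_QOL == "QoL"), 2),
              .groups = "drop_last") %>%
    # Because we are still grouped by globalsegment, this keeps the top n per segment
    slice_max(QoL, n = n) %>%
    ungroup() %>%
    arrange(globalsegment, desc(QoL))
}

top1 <- convert_to_top_n_df(df, n = 1)
print(top1)
```

Like `top_n()`, `slice_max()` keeps ties by default (`with_ties = TRUE`), so a segment can return more than `n` rows.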

Printing intermediate results without breaking pipeline in tidyverse

Is there a command I can add to tidyverse pipelines that does not break the flow but produces a side effect, like printing something out? The use case I have in mind is something like this. In the case of a pipeline
data %>%
mutate(new_var = <some time consuming operation>) %>%
mutate(new_var2 = <some other time consuming operation>) %>%
...
I would like to add some command to the pipeline that would not modify the end result, but would print out some progress or the state of things. Maybe something like this:
data %>%
mutate(new_var = <some time consuming operation>) %>%
command_x(print("first operation done")) %>%
mutate(new_var2 = <some other time consuming operation>) %>%
...
Does such a command_x already exist?
For the specific case of printing an intermediate step in the pipeline, just use %>% print() %>%. E.g.,
mtcars %>%
filter(cyl == 4) %>%
print() %>%
summarise(mpg = mean(mpg))
For a simple status message, you'd do:
pipe_message = function(.data, status) {message(status); .data}
mtcars %>%
filter(cyl == 4) %>%
pipe_message("first operation done") %>%
select(cyl)
See the answer by @MrFlick for a more general solution that works with functions other than print().
You could easily write your own function
pass_through <- function(data, fun) {fun(data); data}
And use it like
mtcars %>% pass_through(. %>% ncol %>% print) %>% nrow
Here we use the . %>% syntax to create an anonymous function. You could also write the function more explicitly:
mtcars %>% pass_through(function(x) print(ncol(x))) %>% nrow
You can also do it on the fly with an anonymous function:
mtcars %>% ( function(x){print(x); return(x)} ) %>% nrow()
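magrittr also ships a "tee" pipe, `%T>%`, which evaluates its right-hand side only for the side effect and passes the left-hand side on unchanged, so it behaves much like the `command_x` asked about. It is exported by magrittr rather than dplyr, hence the extra `library()` call; the message text is just an example.

```r
library(dplyr)
library(magrittr)  # %T>% (the tee pipe) lives in magrittr, not dplyr

n_rows <- mtcars %>%
  filter(cyl == 4) %T>%
  # Runs message() for its side effect; the filtered data is passed on unchanged
  { message("first operation done: ", nrow(.), " rows") } %>%
  nrow()

print(n_rows)
```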

Getting the tidyr::nest() -> purrr::map() workflow to work for the special case of no grouping variable

I'm trying to write a function that does a split-apply-combine in which the split variable(s) are parameters and, importantly, a null split is acceptable. For example, it should run statistics either on subsets of the data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering if there is a more elegant way to fix this without needing the dummy variable.
Hmm, that behavior is a little surprising to me. A fix is easy though: you just have to make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.
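With tidyr 1.0 and later, `unnest()` expects its columns to be named and `group_by_()` is deprecated, so `somefun3` may emit warnings or behave differently. A possible alternative, sketched under that assumption, skips `group_by()` entirely and nests every column except the grouping ones. With no grouping variables, `-all_of(character(0))` selects all columns, so the whole data frame becomes a single nested row. `somefun4` is a hypothetical name.

```r
library(dplyr)
library(tidyr)
library(purrr)

somedata <- expand.grid(a = 1:3, b = 1:3)

# Hypothetical variant of somefun3 for the tidyr >= 1.0 interface
somefun4 <- function(df_in, grpvars = NULL) {
  grpvars <- as.character(grpvars)  # NULL becomes character(0)
  df_in %>%
    # Nest all non-grouping columns; with no grouping variables this
    # nests everything into one row
    nest(data = -all_of(grpvars)) %>%
    mutate(X2.Resid = map(data, ~ with(.x, chisq.test(b)$residuals))) %>%
    unnest(c(data, X2.Resid))
}

somefun4(somedata, "a")
somefun4(somedata)
```

Both calls return nine rows. `chisq.test()` warns that the approximation may be incorrect because the expected counts are small; that is a property of this toy data, not of the pipeline.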

Converting a code from dplyr to base R

I have to convert two sets of dplyr code into base R, because a package I use does not support dplyr. My code is below. Could anybody help me convert it? I am not very experienced with base R.
#Annual average
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
mutate(s1 = Qsim-Qobs,
s2 = abs(s1),
s3 = sum(s2),
s4 = sum(Qobs),
s5 = s3/s4,
s6 = 1-s5) %>%
summarise(AAVE = mean(s6))
#Annual peak error, Qobs
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
filter(Qobs == max(Qobs)) %>%
select(year, Qobs) %>%
ungroup
From your comments, I assume this is your problem: you can't use dplyr together with hydromad because the two packages collide, e.g. they have functions with the same name.
One way to work around this is to not load dplyr and instead call its functions with the package prefix: dplyr::filter, dplyr::select, etc. Note that you then have to load the magrittr package to use the pipe %>%.
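As a rough sketch of that workaround, here is the annual-peak snippet with every dplyr verb namespace-qualified and only magrittr attached. The four-row `riverObs` is hypothetical sample data, and dplyr and lubridate are assumed to be installed (just not attached).

```r
library(magrittr)  # supplies %>% without attaching dplyr

# Hypothetical sample data; the real riverObs was not posted
riverObs <- data.frame(
  date = as.Date(c("2000-01-01", "2000-06-01", "2001-02-01", "2001-07-01")),
  Qobs = c(10, 30, NA, 40),
  Qsim = c(12, 27, 18, 44)
)

# Annual peak Qobs, calling dplyr functions through the dplyr:: prefix
peaks <- riverObs %>%
  dplyr::filter(!is.na(Qobs)) %>%
  dplyr::mutate(year = lubridate::year(date)) %>%
  dplyr::group_by(year) %>%
  dplyr::filter(Qobs == max(Qobs)) %>%
  dplyr::select(year, Qobs) %>%
  dplyr::ungroup()

print(peaks)
```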
Try this (note that it uses the data.table package rather than base R):
library(data.table)
#Annual average
dt <- as.data.table(riverObs)
dt <- na.omit(dt, cols = "Qobs")  # like filter(!is.na(Qobs)): only Qobs is checked for NA
dt[, year := year(date)]
dt[, s1 := Qsim - Qobs]
dt[, s2 := abs(s1)]
dt[, s3 := sum(s2), by = year]
dt[, s4 := sum(Qobs), by = year]
dt[, s5 := s3 / s4]
dt[, s6 := 1 - s5]
AAVE <- dt[, .(AAVE = mean(s6)), by = year]  # one row per year, like summarise()
#Annual peak error, Qobs
peaks <- as.data.table(riverObs)
peaks <- na.omit(peaks, cols = "Qobs")
peaks[, year := year(date)]
peaks[, max_Qobs := max(Qobs), by = year]
peaks <- peaks[Qobs == max_Qobs, .(year, Qobs)]  # compare with the per-year maximum, not the overall one
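Since the question asked for base R specifically, here is a possible sketch using only base functions: `tapply()` for the per-year sums and `ave()` to broadcast each year's maximum back onto its rows. The six-row `riverObs` is hypothetical sample data with the question's column names. Because `s6` is constant within a year, `mean(s6)` in the dplyr code reduces to `1 - sum(abs(Qsim - Qobs)) / sum(Qobs)` per year.

```r
# Hypothetical sample data; the real riverObs was not posted
riverObs <- data.frame(
  date = as.Date(c("2000-01-01", "2000-06-01", "2000-09-01",
                   "2001-02-01", "2001-07-01", "2001-11-01")),
  Qobs = c(10, 30, NA, 20, 40, 40),
  Qsim = c(12, 27, 15, 18, 44, 38)
)

obs <- riverObs[!is.na(riverObs$Qobs), ]           # filter(!is.na(Qobs))
obs$year <- as.integer(format(obs$date, "%Y"))     # year without lubridate

# Annual average: 1 - sum(|Qsim - Qobs|) / sum(Qobs), per year
abs_err <- tapply(abs(obs$Qsim - obs$Qobs), obs$year, sum)
sum_obs <- tapply(obs$Qobs, obs$year, sum)
AAVE <- data.frame(year = as.integer(names(abs_err)),
                   AAVE = as.numeric(1 - abs_err / sum_obs))

# Annual peak: keep rows where Qobs equals that year's maximum (ties kept)
is_peak <- obs$Qobs == ave(obs$Qobs, obs$year, FUN = max)
peaks <- obs[is_peak, c("year", "Qobs")]
rownames(peaks) <- NULL

print(AAVE)
print(peaks)
```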
