Converting code from dplyr to base R

I have to convert two sets of code written for dplyr into base R, because a package I use does not support dplyr. My code is below. Could anybody help me convert it? I am not very experienced with base R.
#Annual average
riverObs %>%
  filter(!is.na(Qobs)) %>%
  mutate(year = lubridate::year(date)) %>%
  group_by(year) %>%
  mutate(s1 = Qsim - Qobs,
         s2 = abs(s1),
         s3 = sum(s2),
         s4 = sum(Qobs),
         s5 = s3 / s4,
         s6 = 1 - s5) %>%
  summarise(AAVE = mean(s6))
#Annual peak error, Qobs
riverObs %>%
  filter(!is.na(Qobs)) %>%
  mutate(year = lubridate::year(date)) %>%
  group_by(year) %>%
  filter(Qobs == max(Qobs)) %>%
  select(year, Qobs) %>%
  ungroup()

From your comments, I assume the underlying problem is this: you can't load dplyr together with hydromad because there are collisions between the two packages, i.e. functions that share the same name in both.
One way to work around this is not to load dplyr at all and instead call its functions with the namespace prefix: dplyr::filter(), dplyr::select(), and so on. One thing you do have to account for is that the pipe %>% then has to come from somewhere, for example by loading the magrittr package.
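For illustration, the first pipeline could then look like this (a sketch based on the code above; it assumes riverObs has date, Qobs and Qsim columns):
library(magrittr)  # provides %>% without attaching dplyr
riverObs %>%
  dplyr::filter(!is.na(Qobs)) %>%
  dplyr::mutate(year = lubridate::year(date)) %>%
  dplyr::group_by(year) %>%
  dplyr::mutate(s1 = Qsim - Qobs,
                s2 = abs(s1),
                s3 = sum(s2),
                s4 = sum(Qobs),
                s5 = s3 / s4,
                s6 = 1 - s5) %>%
  dplyr::summarise(AAVE = mean(s6))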

Try this with data.table:
library(data.table)
riverObs <- as.data.table(riverObs)
# na.omit() drops rows with an NA in any column; to mirror the dplyr filter
# exactly, use riverObs <- riverObs[!is.na(Qobs)] instead
riverObs <- na.omit(riverObs)
riverObs[, year := year(date)]
# Annual average
riverObs[, s1 := Qsim - Qobs]
riverObs[, s2 := abs(s1)]
riverObs[, s3 := sum(s2), by = "year"]
riverObs[, s4 := sum(Qobs), by = "year"]
riverObs[, s5 := s3 / s4]
riverObs[, s6 := 1 - s5]
riverObs[, AAVE := mean(s6), by = "year"]
# unique(riverObs[, .(year, AAVE)]) gives one row per year, like summarise()
# Annual peak error, Qobs
riverObs[, max_Qobs := max(Qobs), by = "year"]
riverObs <- riverObs[Qobs == max_Qobs, ]   # keep each year's peak, not the global max
riverObs <- riverObs[, .(year, Qobs)]
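Since the question asks for base R specifically, here is a dependency-free sketch of the same two computations, starting again from the original riverObs data frame (untested against the real data; it assumes riverObs has date, Qobs and Qsim columns and that date is a Date):
# keep only rows with an observed flow
obs <- riverObs[!is.na(riverObs$Qobs), ]
obs$year <- as.integer(format(obs$date, "%Y"))
# Annual average: per year, AAVE = 1 - sum(|Qsim - Qobs|) / sum(Qobs)
abs_err  <- tapply(abs(obs$Qsim - obs$Qobs), obs$year, sum)
qobs_sum <- tapply(obs$Qobs, obs$year, sum)
AAVE <- data.frame(year = as.integer(names(abs_err)), AAVE = 1 - abs_err / qobs_sum)
# Annual peak: the largest Qobs in each year
peaks <- aggregate(Qobs ~ year, data = obs, FUN = max)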

Related

Use a list in a mutate function in R

I am trying to use a list in a mutate call; please see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then using it in here:
Nested_Train %>%
  mutate(Category = paste0(glue_col(Pasting), sep = '_'))
But this is not having the desired effect: it just returns the literal text f_risk_code, f_risk_category as Category instead of the values of the actual risk code and risk category fields. Any help appreciated.
You can use do.call:
library(dplyr)
Nested_Train %>% mutate(Category = do.call(paste0, .[Pasting]))
With tidyverse, we can use invoke
library(dplyr)
library(purrr)
Nested_Train %>%
  mutate(Category = invoke(paste0, .[Pasting]))
This was the solution for me; try something like:
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))
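Note that for the do.call and invoke answers to work, Pasting has to be a plain character vector of column names (like Grouping above), not a single comma-separated string. A self-contained sketch with mtcars, using column names chosen purely for illustration:
library(dplyr)
cols <- c("hp", "cyl")  # character vector of column names
mtcars %>% mutate(Category = do.call(paste, c(.[cols], sep = "_")))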

PDF: Table Extraction - Tabulizer (R)

I'm trying to extract a table from a PDF with the R tabulizer package. The functions run fine, but they don't capture all of the data in the table.
Below is my code:
library(tabulizer)
library(tidyverse)
library(abjutils)
D_path = "https://github.com/financebr/files/raw/master/Compacto09-08-2019.pdf"
out <- extract_tables(D_path,encoding = 'UTF-8')
arrumar_nomes <- function(x) {
  x %>%
    tolower() %>%
    str_trim() %>%
    str_replace_all('[[:space:]]+', '_') %>%
    str_replace_all('%', 'p') %>%
    str_replace_all('r\\$', '') %>%
    abjutils::rm_accent()
}
tab_tidy <- out %>%
  map(as_tibble) %>%
  bind_rows() %>%
  set_names(arrumar_nomes(.[1, ])) %>%
  slice(-1) %>%
  mutate_all(funs(str_replace_all(., '[[:space:]]+', ' '))) %>%
  mutate_all(str_trim)
Comparing the PDF table (D_path) with the tab_tidy data, you can see that some information is missing. The first column, whose cells are merged, is not picked up by extract_tables(), and the rows containing the "Boi Gordo" and "Boi Magro" information are not returned by the function either.
The rest comes through in perfect condition. Do you know why this happens and how to solve it? The existing questions on the forum that deal with this do not have many answers.

Pipeline %>% in R

I want to use the pipe %>% from tidyverse/purrr to make this more readable:
myChargingDevices<-data.frame(fromJSON(jsonFile))
myChargingDevices<-myChargingDevices %>%
mutate(myTime=ymd_hms(lastUpdateCheck))
myChargingDevices<-myChargingDevices[order(myChargingDevices$myTime,decreasing = TRUE),]
myChargingDevices$lastUpdateCheck<-NULL
Any ideas on how to do this more conveniently?
Thanks in advance
Like this:
myChargingDevices <- jsonFile %>%
  fromJSON %>%
  data.frame %>%
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
  arrange(desc(myTime)) %>%
  select(-lastUpdateCheck)
I cannot test it, because you do not give reproducible code.
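For a self-contained check, here is a sketch with a small inline JSON string standing in for jsonFile (the id field and the timestamps are made up; only lastUpdateCheck comes from the question):
library(jsonlite)   # fromJSON
library(dplyr)
library(lubridate)  # ymd_hms
jsonFile <- '[{"id": 1, "lastUpdateCheck": "2020-01-01 10:00:00"},
              {"id": 2, "lastUpdateCheck": "2020-01-02 12:30:00"}]'
myChargingDevices <- jsonFile %>%
  fromJSON() %>%
  data.frame() %>%
  mutate(myTime = ymd_hms(lastUpdateCheck)) %>%
  arrange(desc(myTime)) %>%
  select(-lastUpdateCheck)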

R Dplyr top_n does not work when used within function

My dplyr function looks like this
convert_to_top5_df = function(df)
{
  require(dplyr)
  require(lazyeval)
  require(tidyr)
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    select(globalsegment, Account, SVM_LABEL_QOL) %>%
    group_by(globalsegment, Account) %>%
    summarise_(QoL = interp(~round(sum(SVM_LABEL_QOL %in% 'QoL') / n(), 2))) %>%
    ungroup(globalsegment, Account) %>%
    arrange(desc(QoL)) %>%
    interp(~top_n(5, wt = "QoL"))
}
I added the interp wrapper because I thought the problem was due to lazy evaluation, but that is not the case.
Using the function below (no interp around top_n), I get a result, but I do not see the top 5 rows as desired.
Reading other Stack Overflow posts, I understand that this has to do with ungroup, but I am not sure how to implement it.
convert_to_top5_df = function(df)
{
  require(dplyr)
  require(lazyeval)
  require(tidyr)
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    select(globalsegment, Account, SVM_LABEL_QOL) %>%
    group_by(globalsegment, Account) %>%
    summarise_(QoL = interp(~round(sum(SVM_LABEL_QOL %in% 'QoL') / n(), 2))) %>%
    ungroup(globalsegment, Account) %>%
    arrange(desc(QoL)) %>%
    top_n(5, wt = "QoL")
}
Any ideas?
My solution: remove the quotes around QoL in top_n and add an additional argument to arrange:
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df = function(df)
{
  require(dplyr)
  require(lazyeval)
  require(tidyr)
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    select(globalsegment, Account, SVM_LABEL_QOL) %>%
    group_by(globalsegment, Account) %>%
    summarise_(QoL = interp(~round(sum(SVM_LABEL_QOL %in% 'QoL') / n(), 2))) %>%
    top_n(5, QoL) %>%
    arrange(globalsegment, desc(QoL))
}
If anyone's got a more efficient way, please share
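For what it's worth, on current dplyr versions summarise_ is deprecated and top_n is superseded, so the same function can be written without lazyeval; a sketch under the same column-name assumptions:
convert_to_top5_df <- function(df) {
  require(dplyr)
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    summarise(QoL = round(sum(SVM_LABEL_QOL %in% 'QoL') / n(), 2),
              .groups = "drop_last") %>%   # stays grouped by globalsegment
    slice_max(QoL, n = 5) %>%              # top 5 within each globalsegment
    arrange(globalsegment, desc(QoL))
}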

Weird error with lapply and dplyr/magrittr

Here's a piece of code:
library(dplyr)  # needed for group_by() and do()
data <- data.frame(a = runif(20), b = runif(20), subject = rep(1:2, 10)) %>%
  group_by(subject) %>%
  do(distance = dist(.))
#no dplyr
intermediate <- lapply(data$distance, as.matrix)
mean.dists <- apply(simplify2array(intermediate), MARGIN = c(1, 2), FUN = mean)
#dplyr
mean.dists <- lapply(data$distance, as.matrix) %>%
  apply(simplify2array(.), MARGIN = c(1, 2), FUN = mean)
Why does the "no dplyr" version work while the "dplyr" version throws the error "dim(X) must have a positive length"? They seem identical to me.
The issue is that you haven't quite fully implemented the pipeline. This is a magrittr question and has little to do with dplyr: because the dot only appears inside the nested call simplify2array(.), magrittr still inserts the left-hand side as the first argument of apply(), so apply() receives the list produced by lapply() as X, and a list has no dim(). Pipe each step instead:
data$distance %>%
  lapply(as.matrix) %>%
  simplify2array %>%
  apply(MARGIN = 1:2, FUN = mean)
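Equivalently, the original one-liner works if the last step is wrapped in braces, which tells magrittr not to insert the left-hand side as a first argument (a minimal sketch of that alternative):
mean.dists <- lapply(data$distance, as.matrix) %>%
  { apply(simplify2array(.), MARGIN = c(1, 2), FUN = mean) }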
