Here's a piece of code:
data <- data.frame(a=runif(20),b=runif(20),subject=rep(1:2,10)) %>%
group_by(subject) %>%
do(distance = dist(.))
#no dplyr
intermediate <- lapply(data$distance,as.matrix)
mean.dists <- apply(simplify2array(intermediate),MARGIN = c(1,2),FUN=mean)
#dplyr
mean.dists <- lapply(data$distance,as.matrix) %>%
apply(simplify2array(.),MARGIN=c(1,2),FUN=mean)
Why does the "no dplyr" version work, and the "dplyr" version throws the error, "dim(X) must have a positive length"? They seem identical to me.
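The issue is that the pipeline isn't doing what you expect. You are using magrittr here, and the issue has little to do with dplyr. When . appears only inside a nested call such as simplify2array(.), magrittr still inserts the left-hand side as the first argument, so your last step is evaluated as apply(., simplify2array(.), MARGIN = c(1,2), FUN = mean). apply() therefore receives the plain list as X, and a list has no dim attribute, hence the error. Give each function its own pipe step instead (or wrap the call in braces, {apply(simplify2array(.), MARGIN = c(1,2), FUN = mean)}, which stops the automatic first-argument insertion):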
data$distance %>%
lapply(as.matrix ) %>%
simplify2array %>%
apply(MARGIN=1:2, FUN=mean)
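To see why the one-line version fails, here is a base R sketch that writes out the call the pipe actually builds. The list of two dist objects is a hypothetical stand-in for data$distance; no magrittr is needed to reproduce the error.

```r
# Hypothetical stand-in for data$distance: a list of two 'dist' objects
set.seed(1)
distance <- list(dist(matrix(runif(20), ncol = 2)),
                 dist(matrix(runif(20), ncol = 2)))
intermediate <- lapply(distance, as.matrix)  # list of two 10 x 10 matrices

# What `intermediate %>% apply(simplify2array(.), MARGIN = c(1,2), FUN = mean)`
# expands to: the list itself is inserted as the first argument (X)
piped_error <- tryCatch(
  apply(intermediate, simplify2array(intermediate), MARGIN = c(1, 2), FUN = mean),
  error = conditionMessage
)
# piped_error is "dim(X) must have a positive length" (in an English locale)

# The intended call: stack the matrices into a 10 x 10 x 2 array, then average
mean.dists <- apply(simplify2array(intermediate), MARGIN = c(1, 2), FUN = mean)
```

The list has no dim attribute, so apply() stops before it ever looks at the extra argument.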
I am trying to use a list in a mutate, please see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then using it here:
Nested_Train %>%
mutate(Category = paste0(glue_col(Pasting), sep='_'))
But this is not having the desired effect: it just returns the text f_risk_code, f_risk_category as the Category instead of the actual risk code and risk category values. Any help appreciated.
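You can use do.call, passing the vector of column names (Grouping; Pasting is a single string, so it cannot select columns):
library(dplyr)
Nested_Train %>% mutate(Category = do.call(paste0, .[Grouping]))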
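do.call() works because a data frame is a list of columns: do.call(f, df[cols]) calls f with each selected column as a separate argument. A minimal base R sketch, using hypothetical toy data with the two column names from Grouping. Since paste0() has no sep argument, the question's sep='_' is simply pasted on as one more string; the second line assumes an underscore separator was what was wanted and uses paste() instead.

```r
# Hypothetical toy data with the two columns named in the question
Nested_Train <- data.frame(f_risk_code = c("A1", "B2"),
                           f_risk_category = c("High", "Low"),
                           stringsAsFactors = FALSE)
Grouping <- c("f_risk_code", "f_risk_category")

# Same as paste0(Nested_Train$f_risk_code, Nested_Train$f_risk_category)
joined <- do.call(paste0, Nested_Train[Grouping])

# Add sep to the argument list and switch to paste() for an "_" separator
joined_sep <- do.call(paste, c(Nested_Train[Grouping], sep = "_"))
```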
With the tidyverse, we can use purrr's invoke() (note that it is deprecated as of purrr 1.0.0 in favour of exec()):
library(dplyr)
library(purrr)
Nested_Train %>%
mutate(Category = invoke(paste0, .[Grouping]))
This was the solution for me:
Try something like:
library(dplyr)
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))
Pretty basic, but I don't think I really understand the funs() deprecation in dplyr 0.8.0:
library(dplyr)
library(lubridate)
Lab_import_sql <- Lab_import %>%
select_if(~sum(!is.na(.)) > 0) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, funs(ifelse(is.character(.), trimws(.),.))) %>%
mutate_at(.vars = Lab_import %>% select_if(grepl("'",.)) %>% colnames(),
.funs = gsub,
pattern = "'",
replacement = "''") %>%
mutate_if(is.character, funs(ifelse(is.character(.), paste0("'", ., "'"),.))) %>%
mutate_if(is.Date, funs(ifelse(is.Date(.), paste0("'", ., "'"),.)))
Edit:
Thanks everyone for the input, here's reproducible code and my solution:
library(dplyr)
library(lubridate)
import <- data.frame(Test_Name = "Fir'st Last",
Test_Date = "2019-01-01",
Test_Number = 10)
import_sql <-import %>%
select_if(~!all(is.na(.))) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, trimws) %>%
mutate_if(is.character, list(~gsub("'", "''",.))) %>%
mutate_if(is.character, list(~paste0("'", ., "'"))) %>%
mutate_if(is.Date, list(~paste0("'", ., "'")))
As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:
Before:
funs(name = f(.))
After:
list(name = ~f(.))
So here, the call funs(ifelse(is.character(.), trimws(.),.)) can instead become list(~ifelse(is.character(.), trimws(.),.)). This uses the tidyverse formula notation for anonymous functions: a one-sided formula (an expression beginning with ~) is turned into a function of one argument, and wherever that argument would go in the body you write . (or .x). You can still pass ordinary functions inside list().
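As a rough base R analogy (not how dplyr implements it internally), ~paste0("'", ., "'") behaves like an ordinary one-argument function whose argument happens to be named ., and applying it to the character columns mimics mutate_if(is.character, list(~paste0("'", ., "'"))). The sketch reuses the toy import data from the edit above, with stringsAsFactors = FALSE assumed so the strings stay character.

```r
# Toy data from the question's edit
import <- data.frame(Test_Name = "Fir'st Last", Test_Date = "2019-01-01",
                     Test_Number = 10, stringsAsFactors = FALSE)

# Roughly what the lambda ~paste0("'", ., "'") stands for
quote_it <- function(.) paste0("'", ., "'")

# Base R stand-in for mutate_if(is.character, list(~paste0("'", ., "'")))
chr_cols <- vapply(import, is.character, logical(1))
import[chr_cols] <- lapply(import[chr_cols], quote_it)
```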
Note the difference between the .funs argument of mutate_if and the funs() function, which wrapped other functions to pass to .funs; .funs = gsub, for example, still works. You only needed funs() if you wanted to apply several functions to the selected columns, or to name the results by passing them as named arguments. You can do all the same things with list().
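You are also duplicating work by putting ifelse inside mutate_if: mutate_if(is.character, ...) already restricts the call to character columns, so checking again with is.character() is redundant. Worse, ifelse() returns a result as long as its test, and is.character(.) is a single TRUE, so that line keeps only the first trimmed value and mutate then recycles it down the whole column. The line simplifies to mutate_if(is.character, trimws), and since you apply only one function, you need neither funs() nor list().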
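A small base R check of the ifelse() pitfall, with a hypothetical character vector x standing in for one column:

```r
x <- c("  alpha ", " beta", "gamma  ")

# is.character(x) is a single TRUE, so ifelse() returns a length-1 result
via_ifelse <- ifelse(is.character(x), trimws(x), x)  # "alpha"

# mutate() would recycle that single value down the whole column
recycled <- rep(via_ifelse, length.out = length(x))

# Calling trimws() directly keeps every element
direct <- trimws(x)
```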
My dplyr function looks like this:
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp() call because I thought the problem was due to lazyeval.
However, this is not the case.
Using the function below (no interp for top_n), I get a result; however, I do not see the top 5 results as desired.
Reading other Stack Overflow posts, I understand that this has to do with ungroup, but I'm not sure how to implement this.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?
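My solution: remove the quotes around QoL and add an additional argument to arrange. With the quotes, wt is the literal string "QoL" rather than the column, so every row gets the same rank and top_n() keeps all of them. The version below also drops ungroup(): summarise() removes only the last grouping level (Account), so the result is still grouped by globalsegment and top_n() picks the top 5 accounts within each segment.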
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
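For comparison, the same "top 5 per segment" selection can be sketched in base R. This assumes a hypothetical, already summarised data frame with one row per globalsegment/Account and its QoL share. It ranks within each segment using ties.method = "min", which mirrors the min_rank() that top_n() uses, so ties at the cut-off would all be kept. In current dplyr, top_n() is superseded by slice_max().

```r
# Hypothetical summarised data: one row per (globalsegment, Account)
summ <- data.frame(
  globalsegment = rep(c("A", "B"), each = 8),
  Account = paste0("acc", 1:16),
  QoL = c(0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6,
          0.4, 0.55, 0.95, 0.05, 0.65, 0.35, 0.75, 0.25),
  stringsAsFactors = FALSE
)

# Keep rows ranked 1..5 within each segment (largest QoL first)
top5 <- do.call(rbind, lapply(split(summ, summ$globalsegment), function(g) {
  g[rank(-g$QoL, ties.method = "min") <= 5, ]
}))

# Same ordering as arrange(globalsegment, desc(QoL))
top5 <- top5[order(top5$globalsegment, -top5$QoL), ]
rownames(top5) <- NULL
```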
}
If anyone's got a more efficient way, please share
I have to convert two sets of code written for dplyr into base R, as a package I use does not support dplyr. My code is below. Could anybody help convert it? I am not very experienced with base R.
#Annual average
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
mutate(s1 = Qsim-Qobs,
s2 = abs(s1),
s3 = sum(s2),
s4 = sum(Qobs),
s5 = s3/s4,
s6 = 1-s5) %>%
summarise(AAVE = mean(s6))
#Annual peak error, Qobs
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
filter(Qobs == max(Qobs)) %>%
select(year, Qobs) %>%
ungroup
From your comments, I assume that the following is your problem: You can't use dplyr together with hydromad because there are collisions between the two packages. E.g. there are functions with the same name in both packages.
One way to work around this issue is, instead of loading dplyr, to call its functions with the namespace prefix: dplyr::filter, dplyr::select, etc. Note that to use the pipe %>% without loading dplyr you would then have to load the magrittr package (or, in R 4.1 and later, use the native pipe |>).
Try this (it uses the data.table package rather than base R, but avoids dplyr):
library(data.table)
# Annual average
dt <- as.data.table(riverObs)
dt <- dt[!is.na(Qobs)]  # like filter(!is.na(Qobs)); na.omit() would also drop rows with NA in other columns
dt[, year := year(date)]
dt[, s1 := Qsim - Qobs]
dt[, s2 := abs(s1)]
dt[, s3 := sum(s2), by = year]
dt[, s4 := sum(Qobs), by = year]
dt[, s5 := s3 / s4]
dt[, s6 := 1 - s5]
dt[, .(AAVE = mean(s6)), by = year]  # one row per year, like summarise()
# Annual peak error, Qobs
dt <- as.data.table(riverObs)
dt <- dt[!is.na(Qobs)]
dt[, year := year(date)]
dt[, max_Qobs := max(Qobs), by = year]
dt[Qobs == max_Qobs, .(year, Qobs)]  # compare with the per-year max, not the overall max
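Strictly speaking, the data.table answer is not base R. Here is a minimal base R sketch of both pipelines, assuming riverObs has a Date column date and numeric columns Qobs and Qsim; the toy data is hypothetical. Because s3 and s4 are per-year sums, s6 is constant within a year, so mean(s6) reduces to 1 - sum(|Qsim - Qobs|) / sum(Qobs) per year.

```r
# Hypothetical toy data standing in for riverObs
riverObs <- data.frame(
  date = as.Date(c("2000-01-01", "2000-06-01", "2000-09-01", "2001-02-01", "2001-07-01")),
  Qobs = c(10, NA, 14, 8, 12),
  Qsim = c(11, 9, 13, 10, 12)
)

# filter(!is.na(Qobs)) and mutate(year = lubridate::year(date))
obs <- riverObs[!is.na(riverObs$Qobs), ]
obs$year <- as.integer(format(obs$date, "%Y"))

# Annual average: one value per year
aave <- sapply(split(obs, obs$year),
               function(d) 1 - sum(abs(d$Qsim - d$Qobs)) / sum(d$Qobs))
AAVE <- data.frame(year = as.integer(names(aave)), AAVE = unname(aave))

# Annual peak: rows where Qobs equals that year's maximum
peak <- obs[obs$Qobs == ave(obs$Qobs, obs$year, FUN = max), c("year", "Qobs")]
```

ave() returns the group maximum aligned with each row, which plays the role of group_by(year) followed by filter(Qobs == max(Qobs)).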
I've been playing with magrittr, and I really like the resulting code. It's clean and can really save on typing.
How can I rename list elements with magrittr?
In typical base R:
data_lists <- paste0("q",2011:2015)
data_lists <- lapply(data_lists,get)
names(data_lists) <- paste0("q",2011:2015)
In magrittr, I thought:
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
names(.) %<>% paste0("q",2011:2015) # this is wrong
But... no dice.
magrittr provides a number of aliases for problems of this nature. Here is an example sequence using the alias set_names(), which stands for `names<-`:
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
set_names(paste0("q",2011:2015))
See ?extract for more aliases
Because names(x) <- value is itself a call to the replacement function `names<-` (in R almost everything is a function call), you could also do:
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
`names<-`(paste0("q",2011:2015))
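Both forms work without magrittr as well. In this base R sketch, the objects q2011 to q2015 are hypothetical stand-ins for the question's data. stats::setNames() is the base counterpart of magrittr's set_names(), and calling `names<-` directly returns a renamed copy.

```r
# Hypothetical objects standing in for q2011 ... q2015
for (yr in 2011:2015) assign(paste0("q", yr), data.frame(year = yr))
obj_names <- paste0("q", 2011:2015)
fetched <- lapply(obj_names, get, envir = environment())

# Calling the replacement function directly returns a renamed copy
via_replacement <- `names<-`(fetched, obj_names)

# stats::setNames() does the same thing in one step
via_setNames <- setNames(fetched, obj_names)
```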