Weird error with lapply and dplyr/magrittr

Here's a piece of code:
library(dplyr)
data <- data.frame(a=runif(20),b=runif(20),subject=rep(1:2,10)) %>%
  group_by(subject) %>%
  do(distance = dist(.))
#no dplyr
intermediate <- lapply(data$distance,as.matrix)
mean.dists <- apply(simplify2array(intermediate),MARGIN = c(1,2),FUN=mean)
#dplyr (this version throws the error)
mean.dists <- lapply(data$distance,as.matrix) %>%
  apply(simplify2array(.),MARGIN=c(1,2),FUN=mean)
Why does the "no dplyr" version work, and the "dplyr" version throws the error, "dim(X) must have a positive length"? They seem identical to me.

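The issue is that the pipeline isn't fully piped. When the dot appears only inside a nested call such as simplify2array(.), magrittr still inserts the left-hand side as the first argument, so the call becomes apply(., simplify2array(.), MARGIN=c(1,2), FUN=mean). apply then receives the plain list as X, and a list has no dim, hence the error. This is magrittr behaviour and has little to do with dplyr. Pipe each step separately instead: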
data$distance %>%
lapply(as.matrix ) %>%
simplify2array %>%
apply(MARGIN=1:2, FUN=mean)
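
To see the placeholder rule in isolation, here is a minimal sketch on a small hand-made list (the object x is an assumption for illustration, not the asker's data). It shows the nested-dot call failing and two ways around it: piping each step, or wrapping the call in braces so magrittr does not insert the left-hand side as the first argument.

```r
library(magrittr)

# Hypothetical stand-in for lapply(data$distance, as.matrix)
x <- list(matrix(1:4, 2), matrix(5:8, 2))

# Nested dot: magrittr evaluates apply(x, simplify2array(x), ...), so X is a list
failing <- tryCatch(
  x %>% apply(simplify2array(.), MARGIN = c(1, 2), FUN = mean),
  error = function(e) conditionMessage(e)
)

# Fix 1: pipe each step, so apply receives the 3-d array as X
piped <- x %>% simplify2array %>% apply(MARGIN = c(1, 2), FUN = mean)

# Fix 2: braces suppress the automatic first-argument insertion
braced <- x %>% { apply(simplify2array(.), MARGIN = c(1, 2), FUN = mean) }
```

Both fixes return the element-wise mean of the two matrices, while failing holds the error message from the original form.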

Related

Use a list in a mutate function in R

I am trying to use a list in a mutate, please see below:
Grouping <- c('f_risk_code', 'f_risk_category')
Pasting <- c('f_risk_code, f_risk_category')
Then using it in here:
Nested_Train %>%
mutate(Category = paste0(glue_col(Pasting), sep='_'))
But this does not have the desired effect: it just returns the column names (f_risk_code, f_risk_category) as the Category instead of the actual values of the risk code and risk category fields. Any help appreciated.

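You can use do.call with a character vector of column names (Grouping from the question, not the single string stored in Pasting):
library(dplyr)
Nested_Train %>% mutate(Category = do.call(paste0, .[Grouping]))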

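With tidyverse, we can use invoke (note that invoke was deprecated in purrr 1.0.0 in favour of exec). Again, index with the vector of column names:
library(dplyr)
library(purrr)
Nested_Train %>%
  mutate(Category = invoke(paste0, .[Grouping]))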

This was the solution for me. Try something like:
cols <- rlang::exprs(hp, cyl)
mtcars %>% mutate(out = paste(!!!cols, sep = "_"))
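
The answers above can be tied back to the question's own column names. The following is a minimal sketch, assuming a toy data frame (nested_train and its values are made up for illustration): rlang::syms turns the character vector into symbols that can be spliced with !!!, and a base R do.call gives the same result without tidy evaluation.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for Nested_Train
nested_train <- data.frame(
  f_risk_code = c("A1", "B2"),
  f_risk_category = c("low", "high"),
  stringsAsFactors = FALSE
)
grouping <- c("f_risk_code", "f_risk_category")

# Splice the column names as symbols so paste sees the columns themselves
result <- nested_train %>%
  mutate(Category = paste(!!!syms(grouping), sep = "_"))

# Base R equivalent: pass the selected columns plus sep as arguments to paste
base_category <- do.call(paste, c(nested_train[grouping], sep = "_"))
```

Using paste with sep = "_" rather than paste0 is a deliberate choice here, since paste0 has no separator and would append "_" as an extra string.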

dplyr works using funs, but gives error with list [duplicate]

Pretty basic but I don't think I really understand the change:
library(dplyr)
library(lubridate)
Lab_import_sql <- Lab_import %>%
select_if(~sum(!is.na(.)) > 0) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, funs(ifelse(is.character(.), trimws(.),.))) %>%
mutate_at(.vars = Lab_import %>% select_if(grepl("'",.)) %>% colnames(),
.funs = gsub,
pattern = "'",
replacement = "''") %>%
mutate_if(is.character, funs(ifelse(is.character(.), paste0("'", ., "'"),.))) %>%
mutate_if(is.Date, funs(ifelse(is.Date(.), paste0("'", ., "'"),.)))
Edit:
Thanks everyone for the input, here's reproducible code and my solution:
library(dplyr)
library(lubridate)
import <- data.frame(Test_Name = "Fir'st Last",
Test_Date = "2019-01-01",
Test_Number = 10)
import_sql <-import %>%
select_if(~!all(is.na(.))) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, trimws) %>%
mutate_if(is.character, list(~gsub("'", "''",.))) %>%
mutate_if(is.character, list(~paste0("'", ., "'"))) %>%
mutate_if(is.Date, list(~paste0("'", ., "'")))

As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:
Before:
funs(name = f(.))
After:
list(name = ~f(.))
So here, the call funs(ifelse(is.character(.), trimws(.),.)) can become instead list(~ifelse(is.character(.), trimws(.),.)). This uses the tidyverse formula notation for anonymous functions: a one-sided formula (an expression beginning with ~) is interpreted as an anonymous function, and the dot (.) stands for the function's argument. You can still use full functions inside list.
Note the difference between the .funs argument of mutate_if and the funs() function which wrapped other functions to pass to .funs; i.e. .funs = gsub still works. You only needed funs() if you needed to apply multiple functions to selected columns or to name them something by passing them as named arguments. You can do all the same things with list().
You also are duplicating work by adding ifelse inside mutate_if; that line could be simplified to mutate_if(is.character, trimws) since if the column is character already you don't need to check it again with ifelse. Since you apply only one function, no need for funs or list at all.
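
As a small illustration of the points above, here is a sketch on a made-up one-row data frame (the padded value is an assumption added so that trimws has something to do). It compares the list(~ ...) form, the bare function form, and a named list entry.

```r
library(dplyr)

# Hypothetical toy data with surrounding whitespace to trim
import <- data.frame(
  Test_Name = "  Fir'st Last  ",
  Test_Number = 10,
  stringsAsFactors = FALSE
)

# Formula lambda inside list(): the replacement for funs()
with_lambda <- import %>% mutate_if(is.character, list(~ trimws(.)))

# A single plain function needs neither funs() nor list()
with_function <- import %>% mutate_if(is.character, trimws)

# A named entry adds a new column (input name + "_" + function name)
with_name <- import %>% mutate_if(is.character, list(trimmed = ~ trimws(.)))
```

The first two calls overwrite Test_Name in place with the trimmed value; the named version keeps the original column and adds Test_Name_trimmed alongside it.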

R Dplyr top_n does not work when used within function

My dplyr function looks like this
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
interp(~top_n(5,wt = "QoL"))
}
I added the interp call because I thought the problem was due to lazyeval.
However, this is not the case.
Using the function below (no interp for top_n), I get a result, but I do not see only the top 5 results as desired.
From reading other Stack Overflow posts, I understand that this has to do with ungroup, but I am not sure how to implement it.
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
ungroup(globalsegment,Account) %>%
arrange(desc(QoL)) %>%
top_n(5,wt = "QoL")
}
Any ideas?

My solution: remove the quotation marks around QoL in top_n and add an additional argument to arrange:
#Function to convert dataframe for pie chart analysis (Global)
convert_to_top5_df=function(df)
{
require(dplyr)
require(lazyeval)
require(tidyr)
df %>%
filter(!is.na(SVM_LABEL_QOL)) %>%
select(globalsegment,Account,SVM_LABEL_QOL) %>%
group_by(globalsegment,Account) %>%
summarise_(QoL=interp(~round(sum(SVM_LABEL_QOL %in% 'QoL')/n(),2))) %>%
top_n(5,QoL) %>%
arrange(globalsegment,desc(QoL))
}
If anyone's got a more efficient way, please share
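
A likely reason the quoted version returned every row is that wt = "QoL" ranks by a constant string, so all rows tie for first place. The following is a sketch of the same function without lazyeval, assuming a dplyr version where summarise accepts unquoted expressions; the toy data frame is invented for illustration. Since sum(x %in% 'QoL')/n() is the share of matching rows, mean() is used as shorthand.

```r
library(dplyr)

convert_to_top5_df <- function(df) {
  df %>%
    filter(!is.na(SVM_LABEL_QOL)) %>%
    group_by(globalsegment, Account) %>%
    # Share of rows labelled "QoL" per account
    summarise(QoL = round(mean(SVM_LABEL_QOL %in% "QoL"), 2)) %>%
    # Result is still grouped by globalsegment, so this is top 5 per segment
    top_n(5, QoL) %>%
    arrange(globalsegment, desc(QoL))
}

# Hypothetical data: six accounts with 1 to 6 "QoL" rows out of 10 each
toy <- data.frame(
  globalsegment = "EMEA",
  Account = rep(letters[1:6], each = 10),
  SVM_LABEL_QOL = unlist(lapply(1:6, function(k) rep(c("QoL", "Other"), c(k, 10 - k)))),
  stringsAsFactors = FALSE
)
top5 <- convert_to_top5_df(toy)
```

With this toy input the account with the lowest share is dropped and the remaining five come back sorted from highest to lowest QoL.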

Converting a code from dplyr to base R

I have to convert two sets of dplyr code into base R, as a package I use does not support dplyr. My code is below. Could anybody help with converting it? I am not very experienced with base R.
#Annual average
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
mutate(s1 = Qsim-Qobs,
s2 = abs(s1),
s3 = sum(s2),
s4 = sum(Qobs),
s5 = s3/s4,
s6 = 1-s5) %>%
summarise(AAVE = mean(s6))
#Annual peak error, Qobs
riverObs %>%
filter(!is.na(Qobs)) %>%
mutate(year = lubridate::year(date)) %>%
group_by(year) %>%
filter(Qobs == max(Qobs)) %>%
select(year, Qobs) %>%
ungroup

From your comments, I assume that the following is your problem: You can't use dplyr together with hydromad because there are collisions between the two packages. E.g. there are functions with the same name in both packages.
One way to work around this issue is to, instead of loading dplyr, call its functions as follows: dplyr::filter, dplyr::select, etc. One thing you do have to account for is that to use the pipe %>% you would have to load the magrittr package.

Try this with data.table (not base R, but it avoids dplyr):
library(data.table)
# Annual average
riverObs <- as.data.table(riverObs)
riverObs <- riverObs[!is.na(Qobs)]  # drop only rows with missing Qobs
riverObs[, year := year(date)]
riverObs[, s1 := Qsim - Qobs]
riverObs[, s2 := abs(s1)]
riverObs[, s3 := sum(s2), by = "year"]
riverObs[, s4 := sum(Qobs), by = "year"]
riverObs[, s5 := s3 / s4]
riverObs[, s6 := 1 - s5]
riverObs[, AAVE := mean(s6), by = "year"]  # stored on every row of the year
# Annual peak error, Qobs
riverObs <- as.data.table(riverObs)
riverObs <- riverObs[!is.na(Qobs)]
riverObs[, year := year(date)]
riverObs[, max_Qobs := max(Qobs), by = "year"]
riverObs <- riverObs[Qobs == max_Qobs]  # compare with the per-year maximum, not the global one
riverObs <- riverObs[, .(year, Qobs)]
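
Since the question asked for base R specifically, here is a minimal sketch that uses no packages at all. The toy riverObs data frame is an assumption made so the example runs; only the column names date, Qobs and Qsim come from the question. The chain of s1 to s6 in the dplyr code reduces to 1 - sum(|Qsim - Qobs|) / sum(Qobs) per year, which is what the first part computes.

```r
# Hypothetical data with the question's column names
riverObs <- data.frame(
  date = as.Date(c("2000-03-01", "2000-09-01", "2001-03-01", "2001-09-01", "2001-12-01")),
  Qobs = c(10, 20, 30, NA, 50),
  Qsim = c(12, 18, 33, 40, 45)
)

# Shared preparation: drop missing observations and extract the year
obs <- riverObs[!is.na(riverObs$Qobs), ]
obs$year <- as.integer(format(obs$date, "%Y"))

# Annual average: one value per year
aave <- sapply(split(obs, obs$year), function(d) {
  1 - sum(abs(d$Qsim - d$Qobs)) / sum(d$Qobs)
})
annual_average <- data.frame(year = as.integer(names(aave)), AAVE = unname(aave))

# Annual peak: keep rows where Qobs equals its per-year maximum
is_peak <- obs$Qobs == ave(obs$Qobs, obs$year, FUN = max)
annual_peak <- obs[is_peak, c("year", "Qobs")]
```

split plus sapply plays the role of group_by plus summarise, and ave returns the group maximum aligned with the original rows, which mirrors the grouped filter.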

Renaming with list with magrittr

I've been playing with magrittr, and I really like the resulting code. It's clean and can really save on typing.
How can I rename list elements in magrittr:
In typical base R:
data_lists <- paste0("q",2011:2015)
data_lists <- lapply(data_lists,get)
names(data_lists) <- paste0("q",2011:2015)
In magrittr, I thought:
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
names(.) %<>% paste0("q",2011:2015) # this is wrong
But... no dice.

Magrittr uses a number of aliases for problems of this nature. Here is an example sequence using the alias set_names()
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
set_names(paste0("q",2011:2015))
See ?extract for more aliases

Because everything in R is a function (mostly), you could also do
data_lists <-
paste0("q",2011:2015) %>%
lapply(.,get) %>%
`names<-`(paste0("q",2011:2015))
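
Both answers can be checked side by side with a small sketch. To keep it self-contained, nchar is used here as a stand-in for get (an assumption for illustration, since the q2011 to q2015 objects are not available). The last line shows a base R option that may also be worth knowing: sapply with simplify = FALSE names the result after a character input automatically.

```r
library(magrittr)

nms <- paste0("q", 2011:2015)

# magrittr alias for `names<-`
with_alias <- nms %>% lapply(nchar) %>% set_names(nms)

# Calling the replacement function directly
with_backticks <- nms %>% lapply(nchar) %>% `names<-`(nms)

# Base R: a character input supplies the names when simplify = FALSE
auto_named <- sapply(nms, nchar, simplify = FALSE)
```

All three produce the same named list, so the choice is mostly a matter of readability within the pipeline.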
