tidy function cannot be used within future_map?

I have the R code below.
On the last line, the map() function worked well.
However, when I changed it to future_map(), I got the following error message:
"Error: Problem with mutate() column model.
i model = future_map(splits, fun1).
x no applicable method for 'tidy' applied to an object of class "c('lmerMod', 'merMod')""
Any idea what's wrong? Thanks.
fun1 <- function(data) {
data %>% analysis %>%
lmer(val ~ period + (1 | id), data = .) %>% tidy
}
plan(multisession)
raw %>%
nest(data = -c(analyte, var)) %>%
mutate(boot = future_map(data, ~ bootstraps(
data = .x,
times = 5,
strata = id
),
.progress = T)) %>%
unnest(boot) %>%
mutate(model =future_map(splits, fun1))

I experienced exactly the same problem with one of my scripts. In order to get future_map to work properly with tidy, I needed to explicitly reference the broom package (i.e. I needed to use broom::tidy in place of tidy). In your example, you are attempting to extract summary statistics from a mixed model, so the code should run without error if we modify fun1 to be as follows:
fun1 <- function(data) {
data %>% analysis %>%
lmer(val ~ period + (1 | id), data = .) %>% broom.mixed::tidy
}
UPDATE (13-Dec-2021):
After a bit more reading, I now understand that the problem, as described in the original post, is due to the broom.mixed package not being attached in the R environment(s) where the future is evaluated. Instead of modifying fun1 (which is a very hacky way of resolving the problem), we should make use of the .options argument of future_map to guarantee that broom.mixed is attached (and all associated functions are available) in the future environments. The following code should run without error:
fun1 <- function(data) {
data %>%
analysis %>%
lmer(val ~ period + (1 | id), data = .) %>%
tidy
}
plan(multisession)
raw %>%
nest(data = -c(analyte, var)) %>%
mutate(boot = future_map(data, ~ bootstraps(data = .x,
times = 5,
strata = id),
.progress = T)) %>%
unnest(boot) %>%
mutate(model = future_map(splits,
fun1,
.options = furrr_options(packages = "broom.mixed")))
My take-home from this is that it's probably good practice to always list the packages that we need to use (as a character vector) using the .options argument of future_map, just to be on the safe side. I hope this helps!
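The error message hints at the mechanism: the tidy generic was found, but no method for lmerMod objects was available in the worker session. Below is a minimal base R sketch of that situation. The generic describe() and the class toy_model are hypothetical stand-ins and are not part of broom or broom.mixed.

```r
# Hypothetical generic standing in for tidy().
describe <- function(x, ...) UseMethod("describe")

# Hypothetical object standing in for a fitted lmer model.
obj <- structure(list(value = 1), class = "toy_model")

# No method exists yet, so S3 dispatch fails. This mirrors a worker
# session in which the package providing the method was never loaded.
before <- tryCatch(describe(obj), error = function(e) conditionMessage(e))

# Defining the method (roughly what loading the package does)
# lets the same call succeed.
describe.toy_model <- function(x, ...) paste("toy_model with value", x$value)
after <- describe(obj)
```

Here before holds the captured error message and after holds the method's result, which is the same before/after contrast as running fun1 without and with broom.mixed available on the workers.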

Related

Problems generating tree diagram with hctreemap2

library(highcharter)
library(dplyr)
library(viridisLite)
library(forecast)
library(treemap)
data("Groceries", package = "arules")
dfitems <- tbl_df(Groceries@itemInfo)
set.seed(10)
dfitemsg <- dfitems %>%
mutate(category = gsub(" ", "-", level1),
subcategory = gsub(" ", "-", level2)) %>%
group_by(category, subcategory) %>%
summarise(sales = n() ^ 3 ) %>%
ungroup() %>%
sample_n(31)
dfitemsg %>%
hctreemap2(group_vars = c("category","subcategory"),
size_var = "sales")%>%
hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
Pop: {point.value:,.0f}<br>
GNI: {point.colorValue:,.0f}")
The error is the following:
Error in hctreemap2(., group_vars = c("category", "subcategory"), size_var = "sales") : Treemap data uses same label at multiple levels.
I have tried everything I can think of and it still doesn't work. Could someone with experience explain to me what is happening?
When I tried your code, it also stated that the function was deprecated and to use data_to_hierarchical. Although, it's never quite that simple, right? I tried multiple ways to get hctreemap2 to work, but wasn't able to discern that issue. From there I turned to the package recommended data_to_hierarchical. Now that worked without an issue--once I figured out the right type, which in hindsight seemed kind-of obvious.
That being said, this is what I've got:
data_to_hierarchical(data = dfitemsg,
group_vars = c(category,subcategory),
size_var = sales) %>%
hchart(type = "treemap") %>%
hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
Pop: {point.value:,.0f}<br>
GNI: {point.colorValue:,.0f}")
You didn't actually designate a color, so the GNI comes up blank.
Let me know if you run into any issues.
Based on your comment:
I have not found a way to change the color to density, which is what both hctreemap2 and treemap appear to do. The function data_to_hierarchical codes the colors to the first grouping variable or the level 1 variable.
Inadvertently, I did figure out why the function hctreemap2 would not work. It checks to see if any category label is the same as a subcategory label. I didn't go through all of the data, but I know there is a perfumery perfumery. I don't understand why that's a hard stop. If that is a problem for this call, why wouldn't data_to_hierarchical be looking for this issue as well?
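The kind of check described above can be reproduced in base R. This is only a sketch under assumptions: the two-row data frame is made up, and lapply() stands in for the purrr::map() call used inside hctreemap2.

```r
# Made-up data where "perfumery" appears at both levels.
df <- data.frame(
  category    = c("perfumery", "drinks"),
  subcategory = c("perfumery", "beer"),
  stringsAsFactors = FALSE
)

# Collect the unique labels of each grouping column, then flatten them.
labels <- unlist(lapply(df[c("category", "subcategory")], unique))

# anyDuplicated() returns 0 when all labels are distinct and the index
# of the first repeated label otherwise.
has_clash <- anyDuplicated(labels) > 0
```

With this data has_clash is TRUE, which is the condition that triggers the "Treemap data uses same label at multiple levels." stop.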
So, I changed the function. First, I called the function itself.
x = hctreemap2
Then I selected it from the environment pane. Alternatively, you can code View(x).
This view is read-only, but it's easier to read than the console. I copied the function and assigned it to its original name with changes. I removed two pieces of the code, which changed nothing structurally speaking to how the chart is created.
I removed the first line of code in the function:
.Deprecated("data_to_hierarchical")
and this code (about a third of the way down)
if (data %>% select(!!!group_syms) %>% map(unique) %>% unlist() %>%
anyDuplicated()) {
stop("Treemap data uses same label at multiple levels.")
}
This left me to recreate the function with this code:
hctreemap2 <- function (data, group_vars, size_var, color_var = NULL, ...)
{
assertthat::assert_that(is.data.frame(data))
assertthat::assert_that(is.character(group_vars))
assertthat::assert_that(is.character(size_var))
if (!is.null(color_var))
assertthat::assert_that(is.character(color_var))
group_syms <- rlang::syms(group_vars)
size_sym <- rlang::sym(size_var)
color_sym <- rlang::sym(ifelse(is.null(color_var), size_var, color_var))
data <- data %>% mutate_at(group_vars, as.character)
name_cell <- function(..., depth) paste0(list(...),
seq_len(depth),
collapse = "")
data_at_depth <- function(depth) {
data %>%
group_by(!!!group_syms) %>%
summarise(value = sum(!!size_sym), colorValue = sum(!!color_sym)) %>%
ungroup() %>%
mutate(name = !!group_syms[[depth]], level = depth) %>%
mutate_at(group_vars, as.character()) %>% {
if (depth == 1) {
mutate(., id = paste0(name, 1))
}
else {
mutate(.,
parent = pmap_chr(list(!!!group_syms[seq_len(depth) - 1]),
name_cell, depth = depth - 1),
id = paste0(parent, name, depth))
}
}
}
treemap_df <- seq_along(group_vars) %>% map(data_at_depth) %>% bind_rows()
data_list <- treemap_df %>% highcharter::list_parse() %>%
purrr::map(~.[!is.na(.)])
colorVals <- treemap_df %>%
filter(level == length(group_vars)) %>% pull(colorValue)
highchart() %>%
hc_add_series(data = data_list, type = "treemap",
allowDrillToNode = TRUE, ...) %>%
hc_colorAxis(min = min(colorVals), max = max(colorVals), enabled = TRUE)
}
Now your code, as originally written, will work. Doing this does not change the highcharter package itself, so if you think you'll use the modified function in the future, save its code as well. You will also need the purrr library. Since you already load dplyr (where most conflicts, if any, occur), you could just load tidyverse, which attaches several packages at once, including both dplyr and purrr.
This is what it will look like with set.seed(10):
If you drill down on the largest block:
It looks odd to me, but I'm guessing that's what you were looking for to begin with.

step_mutate() couldn't find the function str_remove()

I have a recipe with the step_mutate() function in between, performing text data transformations on titanic dataset, supported by the stringr package.
library(tidyverse)
library(tidymodels)
extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")
rf_recipe <-
recipe(Survived ~ ., data = titanic_train) %>%
step_impute_mode(Embarked) %>%
step_mutate(Cabin = if_else(is.na(Cabin), "Yes", "No"),
Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))) %>%
step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>%
update_role(PassengerId, Name, new_role = "id")
This set of transformations works perfectly well with rf_recipe %>% prep() %>% bake(new_data = NULL).
When I try to fit a random forest model with hyperparameter tuning and 10-fold cross-validation within a workflow, all models fail. The output of the .notes column explicitly says that there was a problem with mutate() column Title: couldn't find the function str_remove().
doParallel::registerDoParallel()
rf_res <-
tune_grid(
rf_wf,
resamples = titanic_folds,
grid = rf_grid,
control = control_resamples(save_pred = TRUE)
)
As this post suggests, I've explicitly told R that str_remove should be found in the stringr package. Why isn't this working, and what could be causing it?
I don't think this will fix the error, but just in case: str_extract is not written as stringr::str_extract. Did you load the package?
The error shows up because step_impute_knn(), and subsequently the gower::gower_topn function, transform all characters to factors. To overcome this issue I had to apply the prep() and bake() functions directly, without including the recipe in the workflow.
prep_recipe <- prep(rf_recipe)
train_processed <- bake(prep_recipe, new_data = NULL)
test_processed <- bake(prep_recipe, new_data = titanic_test %>%
mutate(across(where(is.character), as.factor)))
Now the models converge.

error in could not find function in r with pipeline

I'm just starting with R, so this may very well be a very simple question, but...
Running the code below keeps telling me "Error in a(.) : could not find function "a""
and that it is unable to find Baseline_MAP in mutate(Delta_MAP = Group_MAP - Baseline_MAP, ...), even though it is defined earlier.
I've tried changing the name 'a' to something more elaborate, but this makes no difference.
If I try to assign it to a variable
(e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File) %>% ...)
it just tells me:
"Error in a %>% filter(Period == "Baseline") %>% group_by(File) %>% :
could not find function "%>%<-"
I'd really be grateful for any help with this.
a <- read_csv("file.csv")
summary(a)
a %>%
filter(Period == "Baseline") %>%
group_by(File)%>%
summarise(Baseline_MAP = mean(MAP_Mean, na.rm=T),
Baseline_SBP = mean(SBP_Mean, na.rm=T),
Baseline_LaserMc1 = mean(Laser1_Magic, na.rm=T),
Baseline_Laser1 = mean(Laser1_Mean, na.rm=T))%>%
a%>%
filter(Period != "Baseline") %>%
group_by(File)%>%
summarise(Group_MAP = mean(MAP_Mean, na.rm=T),
Group_SBP = mean(SBP_Mean, na.rm=T),
Group_Laser_1Magic = mean(Laser1_Magic, na.rm=T),
Group_Laser_1 = mean(Laser1_Mean, na.rm=T))
a%>%
mutate(Delta_MAP = Group_MAP - Baseline_MAP,
Delta_MAP_Log = log(Group_MAP)-log(Baseline_MAP),
Delta_SBP = Group_SBP - Baseline_SBP,
Delta_SBP_Log = log(Group_SBP)-log(Baseline_SBP),
Delta_Laser1_Magic = Group_Laser_1Magic - Baseline_LaserMc1,
Delta_Laser1_Log = log(Group_Laser_1Magic)-log(Baseline_LaserMc1))
%>% comes from the package "magrittr" and is re-exported by "dplyr". So make sure you load one of them, e.g. library(dplyr).
Next, %>% does not assign the result to a variable. I.e.
a %>% mutate(foo=bar(x))
does not alter a. It will just show the result on the console (and none if you are running the script or calling it from a function).
You might be confusing the pipe-operator with %<>% (found in the package magrittr) which uses the left-hand variable as input for the pipe, and overwrites the variable with the modified result.
Finally, when you write
If I try to assign it to a variable (e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File)%>%)
You are assigning the result from the pipeline to a variable baseline -- this however does not modify the variable-names in the data frames (i.e. the column names).
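To make the point about assignment concrete, here is a small sketch of the overall shape the question seems to be after: store each summary in its own variable, combine them by File, and only then compute the differences. The data are made up, only the MAP_Mean column is used, and base R is used so the example has no dependencies. With dplyr the same three steps would be two assigned summarise() results, a join on File, and a mutate().

```r
# Made-up stand-in for the contents of file.csv.
a <- data.frame(
  File     = c("f1", "f1", "f2", "f2"),
  Period   = c("Baseline", "Drug", "Baseline", "Drug"),
  MAP_Mean = c(90, 80, 100, 85),
  stringsAsFactors = FALSE
)

# Step 1: per-file baseline means, assigned to a variable.
baseline <- aggregate(MAP_Mean ~ File, data = subset(a, Period == "Baseline"), FUN = mean)
names(baseline)[2] <- "Baseline_MAP"

# Step 2: per-file means for all other periods, also assigned.
group <- aggregate(MAP_Mean ~ File, data = subset(a, Period != "Baseline"), FUN = mean)
names(group)[2] <- "Group_MAP"

# Step 3: combine by File so both columns live in one data frame,
# then compute the difference.
deltas <- merge(baseline, group, by = "File")
deltas$Delta_MAP <- deltas$Group_MAP - deltas$Baseline_MAP
```

The key design point is that Baseline_MAP and Group_MAP only exist as columns of the summarised results, so they have to be brought together in one data frame before they can be subtracted.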

Return function not accessible in R

I have written a function that returns an object named final. However, when I try to access the object outside of the function, I get the error "object not found".
I am not sure where I am going wrong; this seems fairly simple and correct. When I take the statements out of the function and just run them, I am able to access the final object. Only when I try to return the object from the function am I unable to do so.
I am not sure why this is happening.
myfunction <- function(lo,X_train,y_train,X_test,y_test,pred){
loan_number<-as.numeric(testing$lo)
xgb.train = xgb.DMatrix(data=X_train,label=y_train)
xgb.test = xgb.DMatrix(data=X_test,label=y_test)
explainer = buildExplainer(xgb,xgb.train, type="binary", base_score = 0.5, trees = NULL)
pred.breakdown = explainPredictions(xgb, explainer, X_test)
pred.breakdown<-as.data.frame(pred.breakdown)
pred.breakdown <- pred.breakdown %>% do(.[!duplicated(names(.))])
pred_break<-pred.breakdown %>%
#Create an id by row
dplyr::mutate(id=1:n()) %>%
#Reshape
pivot_longer(cols = -id) %>%
#Arrange
arrange(id,-value) %>%
#Filter top 5
group_by(id) %>%
dplyr::mutate(Var=1:n()) %>%
filter(Var<=5) %>%
select(-c(value,Var)) %>%
#Format
dplyr::mutate(Var=paste0('Attribute',1:n())) %>%
pivot_wider(names_from = Var,values_from=name) %>%
ungroup() %>%
select(-id)
pred_break<-as.data.frame(pred_break)
prop_score<-pred
final<-as.data.frame(cbind(loan_number,prop_score,pred_break))
print("final exec")
return(final)
}
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
final<-as.data.frame(final)
I print "final exec" to check whether everything is working. It seems weird that I am not able to access the final object that is passed to the return statement.
R is primarily a functional programming language with lexical scoping. This line:
myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
runs your function and returns the value of final, but that value goes only to the console. The function's return value needs to be assigned to a variable in order to be used, as @Duck suggests:
final <- myfunction(loan_number,X_train,y_train,X_test,y_test,pred)
This final is a different object from the final inside your function. That one is inaccessible outside of the function.
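A stripped-down sketch of the same behaviour, with a hypothetical function make_final() standing in for myfunction:

```r
# Hypothetical miniature of the function in the question.
make_final <- function(x) {
  # `final` is local to this call and disappears when the call ends.
  final <- data.frame(value = x * 2)
  return(final)
}

# Calling without assignment only prints the returned value.
make_final(1:3)

# Assigning the return value keeps it available afterwards.
result <- make_final(1:3)
```

After this runs, result holds the data frame, while the name final was never created outside the function.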

NSE on complex expressions with dplyr's do()

Can someone help me understand how NSE works with dplyr when the variable reference is in the form ".$mpg" .
After reading here, I thought using as.name would do it, since I have a character string that gives a variable name.
For example, this works:
mtcars %>%
summarise_(interp(~mean(var), var = as.name("mpg")))
and this doesn't work:
mtcars %>%
summarise_(interp(~mean(var), var = as.name(".$mpg")))
but this does:
mtcars %>%
summarise(mean(.$mpg))
and so does this:
mtcars %>%
summarise(mean(mpg))
I want to be able to specify the variable in the form .$mpg so that I can use it with do() when I don't have the option of specifying a dot for the data like in the following example:
library(dplyr)
library(broom)
mtcars %>%
tbl_df() %>%
slice(., 1) %>%
do(tidy(prop.test(.$mpg, .$disp, p = .50)))
I chose random variables here to demonstrate how the prop.test function works; please don't interpret this as misuse of the test.
Eventually, I want to turn this into a function like this:
library(lazyeval)
library(broom)
library(dplyr)
p_test <- function(x, miles, distance){
x %>%
tbl_df() %>%
slice(., 1) %>%
do_(tidy(prop.test(miles, distance, p = .50)))
}
p_test(mtcars, ".$mpg", ".$disp")
I originally thought that I would have to do something like:
interp(~var, var = as.name(miles)), where miles would get replaced with .$mpg, but as I mentioned at the top, this does not seem to work.
The reason is that as.name creates an unevaluated variable name, but .$mpg, when used in code, is not a variable name. Rather, it’s a complex expression which is equivalent to:
`$`(., mpg)
That is, it’s a function call to the function $ with two arguments. Using as.name causes R to subsequently search for a variable with the name `.$mpg` rather than calling the above-described function.
That’s the explanation of why your attempt doesn’t work. The solution is then relatively straightforward: instead of creating an unevaluated variable name, we need to create an unevaluated function call expression. We can do this in various ways, and I’m going to show two here.
The first is simply to call parse:
p_test = function (data, miles, distance) {
x = parse(text = miles)[[1]]
n = parse(text = distance)[[1]]
data %>%
slice(1) %>%
do_(interp(~tidy(prop.test(x, n, p = 0.5)), x = x, n = n))
}
Now you can call p_test(mtcars, '.$mpg', '.$disp') and get the desired result.
However, a more dplyr-y way of doing the same thing would be to pass unevaluated objects to p_test:
p_test(mtcars, mpg, disp)
… and we can easily do this with a simple change:
p_test_ = function (data, var1, var2) {
data %>%
slice(1) %>%
do_(interp(~tidy(prop.test(.$x, .$n, p = 0.5)),
x = as.name(var1), n = as.name(var2)))
}
p_test = function (data, var1, var2) {
p_test_(data, substitute(var1), substitute(var2))
}
Now the following two pieces of code both work:
p_test(mtcars, mpg, disp)
p_test_(mtcars, 'mpg', 'disp')
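The difference between a variable name and a call expression, which the explanation above relies on, can be checked directly in base R. This sketch uses only as.name(), parse() and eval(). The variable names sym and expr are just illustrative.

```r
# as.name() turns the whole string into a single symbol.
sym <- as.name(".$mpg")

# parse() reads the string as code, giving a call to `$`.
expr <- parse(text = ".$mpg")[[1]]

sym_is_name  <- is.name(sym)
expr_is_call <- is.call(expr)

# The first element of the call is the function being called.
called_fun <- as.character(expr[[1]])

# Evaluating the call with `.` bound to mtcars extracts the column.
mpg_values <- eval(expr, list(. = mtcars))
```

Evaluating sym in the same way would instead look for a single variable literally named `.$mpg`, which is why the as.name() attempt in the question fails.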
