The following explain_tidymodels is created, to to display partial dependence plots.
explainer <- explain_tidymodels(rf_vi_fit, data = Data_train, y = Data_train$Lead_week)
Now i'm creating plots by doing the following:
model_profile(explainer, variables = c("Month", "AC")) %>% plot()
It automatically gives me plots in blue. However, how do I change the plot colour to red?
I already tried things like %>% plot(color = "red") and %>% plot(col = "red") but both do not work.
Any suggestions to change the partial dependence plots to a prefered colour? Thanks in advance!
I suggest that you access the underlying data and make your own plot using ggplot2, like I show here:
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(DALEXtra)
#> Loading required package: DALEX
#> Welcome to DALEX (version: 2.2.0).
#> Find examples and detailed introduction at: http://ema.drwhy.ai/
#> Additional features will be available after installation of: ggpubr.
#> Use 'install_dependencies()' to get all suggested dependencies
#>
#> Attaching package: 'DALEX'
#> The following object is masked from 'package:dplyr':
#>
#> explain
data(ames)
ames_train <- ames %>%
transmute(Sale_Price = log10(Sale_Price),
Gr_Liv_Area = as.numeric(Gr_Liv_Area),
Year_Built, Bldg_Type)
rf_model <-
rand_forest(trees = 1000) %>%
set_engine("ranger") %>%
set_mode("regression")
rf_wflow <-
workflow() %>%
add_formula(
Sale_Price ~ Gr_Liv_Area + Year_Built + Bldg_Type) %>%
add_model(rf_model)
rf_fit <- rf_wflow %>% fit(data = ames_train)
explainer_rf <- explain_tidymodels(
rf_fit,
data = dplyr::select(ames_train, -Sale_Price),
y = ames_train$Sale_Price,
label = "random forest"
)
#> Preparation of a new explainer is initiated
#> -> model label : random forest
#> -> data : 2930 rows 3 cols
#> -> data : tibble converted into a data.frame
#> -> target variable : 2930 values
#> -> predict function : yhat.workflow will be used ( [33m default [39m )
#> -> predicted values : No value for predict function target column. ( [33m default [39m )
#> -> model_info : package tidymodels , ver. 0.1.3 , task regression ( [33m default [39m )
#> -> predicted values : numerical, min = 4.896787 , mean = 5.220582 , max = 5.51655
#> -> residual function : difference between y and yhat ( [33m default [39m )
#> -> residuals : numerical, min = -0.8021289 , mean = 5.872977e-05 , max = 0.3613971
#> [32m A new explainer has been created! [39m
pdp_rf <- model_profile(explainer_rf, N = NULL,
variables = "Gr_Liv_Area", groups = "Bldg_Type")
as_tibble(pdp_rf$agr_profiles) %>%
mutate(`_label_` = stringr::str_remove(`_label_`, "random forest_")) %>%
ggplot(aes(`_x_`, `_yhat_`, color = `_label_`)) +
geom_line(size = 1.2, alpha = 0.8) +
labs(x = "Gross living area",
y = "Sale Price (log)",
color = NULL,
title = "Partial dependence profile for Ames housing sales",
subtitle = "Predictions from a random forest model")
Created on 2021-05-27 by the reprex package (v2.0.0)
You can also access pdp_rf$cp_profiles.
Related
I am pretty new to survey data and gtsumarry package.
I'm trying to create a stratified table from the survey data using the following code, and I get the error "Error: Problem with mutate() input tbl".
# Reading the subset of the data
fileUrl <- "https://raw.github.com/Shadi-Sadie/Paper-1-Cancer-Screening-and-Immigrants/master/Cleaned%20Data/subset.csv"
SData<-read.csv( fileUrl , header = TRUE, sep ="," )
# Setting the weight
options( "survey.replicates.mse" = TRUE)
svy <- svrepdesign(repweights = "PWGTP[0-9]+",
weights = ~PWGTP,
combined.weights = TRUE,
type = "JK1",
scale = 4/80, rscales = rep(1, 80),
data = SData)
# creating the table
SData %>%
select(CITG, HICOV, ESRG , EXPANSION) %>%
tbl_strata(
strata = CITG,
.tbl_fun =
~ .x %>% tbl_svysummary(
by = EXPANSION,
include = c(CITG, HICOV, ESRG , EXPANSION),
label = list(CITG ~ "Nativity",
HICOV~ "Any health insurance",
ESRG~ "Employment",
EXPANSION ~ "Expansion" )
)
)
If it is possible to use tbl_svysummary() with the tbl_strata() could anyone tell me where I'm doing wrong?
Thanks for updating with a reproducible post. I made the following changes:
You were passing the data frame to tbl_strata() and it needed to be updated to the survey design object.
The stratifying variable should no be listed in the tbl_summary(include=) argument.
Happy Porgramming!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'
fileUrl <- "https://raw.github.com/Shadi-Sadie/Paper-1-Cancer-Screening-and-Immigrants/master/Cleaned%20Data/subset.csv"
SData <- read.csv(fileUrl, header = TRUE, sep = ",")
# Setting the weight
options("survey.replicates.mse" = TRUE)
svy <- survey::svrepdesign(
repweights = "PWGTP[0-9]+",
weights = ~PWGTP,
combined.weights = TRUE,
type = "JK1",
scale = 4 / 80, rscales = rep(1, 80),
data = SData
)
# creating the table
tbl <-
svy %>%
tbl_strata(
strata = CITG,
.tbl_fun =
~ .x %>% tbl_svysummary(
by = EXPANSION,
include = c(HICOV, ESRG, EXPANSION),
label = list(
HICOV = "Any health insurance",
ESRG = "Employment",
EXPANSION = "Expansion"
)
)
)
Created on 2022-05-01 by the reprex package (v2.0.1)
I'm generating several chorddiag plots in R and would like to combine them together to a single plot. Here's an example list of 3 chorddiag plots:
library(chorddiag)
m <- matrix(c(11975, 5871, 8916, 2868,
1951, 10048, 2060, 6171,
8010, 16145, 8090, 8045,
1013, 990, 940, 6907),
byrow = TRUE,
nrow = 4, ncol = 4)
haircolors <- c("black", "blonde", "brown", "red")
dimnames(m) <- list(haircolors,haircolors)
groupColors <- c("#000000", "#FFDD89", "#957244", "#F26223")
ll <- lapply(1:3,function(i) chorddiag(m, groupColors = groupColors, groupnamePadding = 20))
If these were plotly object I'd use plotly's subplot function. Is there anything equivalent for case of:
> class(ll[[1]])
[1] "chorddiag" "htmlwidget"
I haven't tried the chorddiag package (I don't think it's on CRAN, maybe some other repos?), but the manipulateWidget package may be what you want. example(combineWidgets) has this code:
data(iris)
library(manipulateWidget); library(plotly)
#> Loading required package: ggplot2
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
combineWidgets(title = "The Iris dataset",
plot_ly(iris, x = ~Sepal.Length, type = "histogram", nbinsx = 20),
plot_ly(iris, x = ~Sepal.Width, type = "histogram", nbinsx = 20),
plot_ly(iris, x = ~Petal.Length, type = "histogram", nbinsx = 20),
plot_ly(iris, x = ~Petal.Width, type = "histogram", nbinsx = 20)
)
Created on 2022-03-24 by the reprex package (v2.0.1)
EDITED to add:
Okay, I found chorddiag on Github: https://github.com/mattflor/chorddiag/ . After running your code, this combines the three diagrams:
manipulateWidgets::combineWidgets(shiny::tagList(ll))
They don't resize nicely; I suspect that's because chorddiag wants to be fullscreen, but maybe it's a problem in manipulateWidgets. You'll probably have to patch one or the other.
Please see the map I drew below using the tmap package. I did not any find parameters that I can use to customize the font of the histogram legend. From the code below, you can see that I've already set the legend.text.fontface = 'bold'. However, this did not work.
psp1 <- tm_shape(province) +
tm_borders(col = 'black') +
tm_shape(county) +
tm_polygons(col = '+1 °C', title = 'Changes in %', style = 'pretty', aes.palette = 'div', n=5, legend.hist = T) +
tm_compass(north = 0, type = 'arrow', show.labels =0, position = c('right','top')) +
tm_layout(legend.format = list(fun = function(x) formatC(x, digits = 1, format = "f")),
legend.outside = T, legend.outside.position = 'bottom',
legend.hist.width = 1,
legend.hist.height = 0.5,
legend.stack = 'horizontal',
legend.title.fontface = 'bold',
legend.text.fontface = 'bold')
Very interesting question. Indeed, it does not seem possible to change the font of the labels for the histogram using legend.text.fontface = 'bold'
Hopefully, it is possible to change this using the base R library grid on which the tmap library is based on.
So, please find below one possible solution to your request (hoping that this answer does not come too late and that it will still be useful to you)
Preliminary note for other Stackoverflow users: to run the reprex below correctly you will need to first download the data made available by the OP in this post.
Reprex
STEP 1 - BUILDING THE MAP WITH THE LEGEND
library(sf)
library(tmap)
library(RColorBrewer)
setwd("Add the path to your working directory")
# Import data
province <- st_read("province.shp")
county <- st_read("county.shp")
# Split the 'sf' object 'county' into a list of five 'sf' objects
county_warm_list <- split(county , f = county$warming)
# Build the map with the legend
psp1 <- tm_shape(province) +
tm_borders(col = 'black') +
tm_shape(st_sf(county_warm_list[[3]])) + # using the scenario +3°C
tm_polygons(col = 'estimate',
title = 'Changes in %',
style = 'pretty',
aes.palette = 'div',
n=5,
legend.hist = TRUE,
midpoint = 0) +
tm_compass(north = 0,
type = 'arrow',
show.labels =0,
position = c('right','top')) +
tm_layout(legend.show = TRUE,
legend.format = list(fun = function(x) formatC(x, digits = 1, format = "f")),
legend.outside = TRUE,
legend.outside.position = 'bottom',
legend.hist.width = 1,
legend.hist.height = 0.5,
legend.stack = 'horizontal',
legend.title.fontface = 'bold',
legend.text.fontface = 'bold')
STEP 2 - BOLD ALL THE LABELS IN THE LEGEND (i.e. including those in the histogram)
library(grid)
# Convert the 'tmap' object psp1 into a 'grob' object ('grob' = 'grid graphical object')
psp1 <- tmap_grob(psp1)
# Find the name of the element we want to change using 'grid.list()' which
# returns a listing of 'grobs' (including gTree)
grid.ls(psp1)
#> GRID.gTree.41
#> multiple_1
#> BG
#> mapBG
#> mapElements
#> GRID.gTree.11
#> tm_polygons_1_2
#> GRID.gTree.12
#> tm_polygons_1_3
#> GRID.rect.13
#> meta_with_bg
#> meta
#> GRID.gTree.16
#> GRID.gTree.15
#> compass
#> GRID.polygon.14
#> outside_legend !!!! "outside_legend" element !!!!
#> meta_with_bg
#> meta
#> legend
#> GRID.rect.39
#> GRID.gTree.40
#> GRID.gTree.19
#> GRID.gTree.18
#> GRID.text.17
#> GRID.gTree.23
#> GRID.gTree.22
#> GRID.rect.20
#> GRID.text.21
#> GRID.gTree.38
#> GRID.gTree.37
#> GRID.gTree.36
#> GRID.gTree.25
#> GRID.rect.24
#> GRID.gTree.27
#> GRID.polyline.26
#> GRID.gTree.29
#> GRID.text.28
#> GRID.gTree.33
#> GRID.gTree.30
#> GRID.lines.31
#> GRID.polyline.32
#> GRID.gTree.35
#> GRID.text.34
In the listing of grob objects just above, you can see an element named "outside_legend". So, we will modify it to bold the fonts of the legend:
# Edit the 'outside_legend' element of the 'grob' object 'psp1' using
# 'editGrob()' and save it in the new 'grob' object 'my_map'
my_map <- editGrob(psp1, gPath("outside_legend"), gp = gpar(fontface = "bold"))
# Draw the 'grob' object 'my_map'
# !!!! NB: may take a few seconds to be displayed in the graphic device !!!!
grid.draw(my_map)
STEP 3 - SAVING THE MAP EITHER MANUALLY OR PROGRAMMATICALLY
(in the latter case, you need to install the rstudioapi library)
rstudioapi::savePlotAsImage(
"my_map.png", # add the path if different of the working directory
format = "png", # other possible formats: "jpeg", "bmp", "tiff", "emf", "svg", "eps"
width = 670,
height = 710
)
And that's it :-)
Created on 2022-01-30 by the reprex package (v2.0.1)
I have a question regarding the gtsummary package and how to set a default theme:
DF:
library(tidyverse)
library(gtsummary)
library(gapminder)
gap <- gapminder %>%
dplyr::mutate_all(~ifelse(
sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
as.character(.),
NA)
) %>%
dplyr::mutate_at(vars(year:gdpPercap), ~as.numeric(.)
)
my_theme <-
list(
"tbl_summary-str:default_con_type" = "continuous2",
"tbl_summary-str:continuous_stat" = c("{median} ({p25} - {p75})",
"{mean} ({sd})",
"{min} - {max}",
"{N_miss} ({p_miss}%)"),
"tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
"style_number-arg:big.mark" = ""
)
gtsummary::set_gtsummary_theme(my_theme)
gap[, -1] %>%
gtsummary::tbl_summary(
# remove default missing values because we specified them in
# the theme above already
missing = "no"
)
The above gives me an error when I include the continent as a categorical variable and does not display a table with summary statistics for the continent. Do I have to specify some arguments in the tbl_summary function or do I have to add/remove some arguments in the theme function? Thank you!
Thank you so much for reporting this unexpected behavior. This is indeed a bug, and it has been fixed in the dev version (which you can install from github).
remotes::install_github("ddsjoberg/gtsummary", quiet = TRUE)
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.3.5.9003'
my_theme <-
list(
"tbl_summary-str:default_con_type" = "continuous2",
"tbl_summary-str:continuous_stat" = c("{median} ({p25} - {p75})", "{mean} ({sd})"),
"tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)"
)
set_gtsummary_theme(my_theme)
tbl <-
gapminder::gapminder[, 2:3] %>%
tbl_summary()
Created on 2020-10-25 by the reprex package (v0.3.0)
Using gtsummary package, when I try to style my regression table to a journal format (JAMA in this case), I get the following error:
Error in theme_gtsummary_journal(., journal = "jama") :
'list' object cannot be coerced to type 'logical'
Does anyone know why this is? The code for the data and library is as follows:
# load packages
library(gtsummary)
# dummy data
crime <-data.frame(State = sample(c("SF", "AR", "NYC","MN"),13000,replace = TRUE),
Year = sample(as.factor(c(1990, 2000)),13000, replace = TRUE)
)
# multinom model with visual
glm(Year ~ State, data = crime, family = binomial) %>%
tbl_regression(exponentiate = TRUE) %>%
theme_gtsummary_journal(journal = "jama")
The issue is that in gtsummary you don't add a theme like in ggplot2. Instead you have to set the theme. See gtsummary themes vignette:
# load packages
library(gtsummary)
set.seed(42)
# dummy data
crime <-data.frame(State = sample(c("SF", "AR", "NYC","MN"),13000,replace = TRUE),
Year = sample(as.factor(c(1990, 2000)),13000, replace = TRUE)
)
tab <- glm(Year ~ State, data = crime, family = binomial) %>%
tbl_regression(exponentiate = TRUE)
# Set theme
theme_gtsummary_journal(journal = "lancet")
tab
# Reset theme
reset_gtsummary_theme()