Using metafor::rma with broom::tidy? - r

I'm a complete R novice and would like to do the following:
library(metafor)
library(broomExtra)
df <-
escalc(
measure = "RR",
ai = tpos,
bi = tneg,
ci = cpos,
di = cneg,
data = dat.bcg
)
meta_analysis <- rma(yi, vi, data = df, method = "EB")
meta_analysis
tidy(meta_analysis)
Why does tidy(meta_analysis)always give me NULL?

You can use broomExtra::tidy_parameters function if there is no tidier in broom:
library(metafor)
#> Loading required package: Matrix
#> Loading 'metafor' package (version 2.1-0). For an overview
df <-
escalc(
measure = "RR",
ai = tpos,
bi = tneg,
ci = cpos,
di = cneg,
data = dat.bcg
)
meta_analysis <- rma(yi, vi, data = df, method = "EB")
broomExtra::tidy_parameters(meta_analysis)
#> # A tibble: 1 x 8
#> term type estimate std.error statistic p.value conf.low conf.high
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 overall summary -0.715 0.181 -3.95 0.0000774 -1.07 -0.360

Checked the documentation (?tidy). Seems there is no tidy method for an object of class rma. From the docs of broomExtra::tidy:
Checks if a tidy method exits for a given object, either in broom or
in broom.mixed. If it does, it turn an object into a tidy tibble, if
not, return a NULL. In case of data frames, a tibble data frame is
returned.

Related

RMSE value on the example of randomForrest

I am watching one of the solutions for House Prices Kaggle competition. I would like to know how do you get RMSE value from this:
Subset the train rows and selected features
dt.train <- fulldt %>% filter(Set == "Train") %>% select("Id", "OverallQual", "TotalArea", "AreaAbvground", "GarageArea", "TotalBaths", "YearBuilt", "Neighborhood", "MSSubClass", "FireplaceQu", "ExterQual", "KitchenQual", "BsmtQual", "HouseStyle") %>% mutate(SalePrice = log(raw.train$SalePrice))
Same for the test features
dt.test <- fulldt %>% filter(Set == "Test") %>%
select("Id", "OverallQual", "TotalArea", "AreaAbvground", "GarageArea", "TotalBaths", "YearBuilt",
"Neighborhood", "MSSubClass", "FireplaceQu", "ExterQual", "KitchenQual", "BsmtQual", "HouseStyle")
Random Forest model
fit <- randomForest(SalePrice ~ ., data = dt.train, importance = T)
Use new model to predict SalePrice values from the test set
pred <- exp(predict(fit , newdata = dt.test))
How do you get RMSE value from pred ?
Let's calculate the RMSE of the training and test rows based on the minimal example iris data:
library(tibble)
library(randomForest)
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
train_df <- head(iris, 100)
test_df <- tail(iris, 50)
model <- randomForest(Sepal.Length ~ ., data = train_df, importance = T)
# Test RMSE
tibble(
truth = predict(model, newdata = test_df),
predicted = test_df$Sepal.Length
) %>%
rmse(truth, predicted)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.836
# Train RMSE
tibble(
truth = predict(model, newdata = train_df),
predicted = train_df$Sepal.Length
) %>%
rmse(truth, predicted)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.265
Created on 2021-12-13 by the reprex package (v2.0.1)

extract certain elements from lists of lists

I have a list that contains outputs from multiple correlation tests
dput(head(corr[1:2]))
list(structure(list(statistic = c(S = 1486), parameter = NULL,
p.value = 0.219369570345178, estimate = c(rho = 0.265810276679842),
null.value = c(rho = 0), alternative = "two.sided", method = "Spearman's rank correlation rho",
data.name = "x$theta.x and x$theta.y"), class = "htest"),
structure(list(statistic = c(S = 1852), parameter = NULL,
p.value = 0.699151237307271, estimate = c(rho = 0.0849802371541502),
null.value = c(rho = 0), alternative = "two.sided", method = "Spearman's rank correlation rho",
data.name = "x$theta.x and x$theta.y"), class = "htest"))
I would like to extract into a separate data frame p.value and estimate. For each element I can do it like this:
corr[[1]][3]
$p.value
[1] 0.2193696
> corr[[1]][4]
$estimate
rho
0.2658103
But I did not have any success in trying to extract those values from the entire list at once.
We can also use extract function from magrittr package for this purpose:
library(purrr)
df %>% map_dfr(magrittr::extract, c("estimate", "p.value"))
# A tibble: 2 x 2
estimate p.value
<dbl> <dbl>
1 0.266 0.219
2 0.0850 0.699
We could do
do.call(rbind, lapply(corr, \(x) data.frame(x[3:4])))
p.value estimate
rho 0.2193696 0.26581028
rho1 0.6991512 0.08498024
You can use [ to extract specific element.
as.data.frame(t(sapply(corr, `[`, c(3, 4))))
# p.value estimate
#1 0.219 0.266
#2 0.699 0.085
Moreover, using broom::tidy might be simpler.
purrr::map_df(corr, broom::tidy)
# estimate statistic p.value method alternative
# <dbl> <dbl> <dbl> <chr> <chr>
#1 0.266 1486 0.219 Spearman's rank correlation rho two.sided
#2 0.0850 1852 0.699 Spearman's rank correlation rho two.sided

object '...' not found in R Functions with lm -->> (Error in eval(predvars, data, env) : object '...' not found)

I'm using the moderndrive package to calculate a linear regression but using a function. I am trying to create a function where i can just pass in two selected columns(e.g deaths & cases, titles of the columns) from my data frame (Rona_2020). Below is the function...
score_model_Fxn <- function(y, x){
score_mod <- lm(y ~ x, data = Rona_2020)
Reg_Table <- get_regression_table(score_mod)
print(paste('The regression table is', Reg_Table))
}
when I run the function ...
score_model_Fxn(deaths, cases)
I get ...
Error in eval(predvars, data, env) : object 'deaths' not found
What should i do? I have looked several similar issues but to no avail.
What you want to do by passing deaths and cases is called non-standard evaluation. You need to combine this with computing on the language if you want to run a model with the correct formula and scoping. Computing on the language can be done with substitute and bquote.
library(moderndive)
score_model_Fxn <- function(y, x, data){
#get the symbols passed as arguments:
data <- substitute(data)
y <- substitute(y)
x <- substitute(x)
#substitute them into the lm call and evaluate the call:
score_mod <- eval(bquote(lm(.(y) ~ .(x), data = .(data))))
Reg_Table <- get_regression_table(score_mod)
message('The regression table is') #better than your paste solution
print(Reg_Table)
invisible(score_mod) #a function should always return something useful
}
mod <- score_model_Fxn(Sepal.Length, Sepal.Width, iris)
#The regression table is
## A tibble: 2 x 7
# term estimate std_error statistic p_value lower_ci upper_ci
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 intercept 6.53 0.479 13.6 0 5.58 7.47
#2 Sepal.Width -0.223 0.155 -1.44 0.152 -0.53 0.083
print(mod)
#
#Call:
#lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
#
#Coefficients:
#(Intercept) Sepal.Width
# 6.5262 -0.2234
You could have the function return Reg_Table instead if you prefer.
One of the coolest ways of doing this is using the new recipes package to generate the formula for us and then manipulating a tibble to produce or result
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(moderndive)
score_model_Fxn <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{x}},new_role = "outcome") %>%
update_role({{y}},new_role = "predictor") %>%
formula()
Reg_Table <- mtcars %>%
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(paste('The regression table is', Reg_Table))
Reg_Table
}
k <- mtcars %>%
score_model_Fxn(x = cyl,y = gear)
#> [1] "The regression table is list(term = c(\"intercept\", \"gear\"), estimate = c(10.585, -1.193), std_error = c(1.445, 0.385), statistic = c(7.324, -3.101), p_value = c(0, 0.004), lower_ci = c(7.633, -1.978), upper_ci = c(13.537, -0.407))"
k
#> [[1]]
#> # A tibble: 2 x 7
#> term estimate std_error statistic p_value lower_ci upper_ci
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 intercept 10.6 1.44 7.32 0 7.63 13.5
#> 2 gear -1.19 0.385 -3.10 0.004 -1.98 -0.407
Created on 2020-06-09 by the reprex package (v0.3.0)
For those that might be interested...I modified Bruno's answer.
library(tidyverse); library(recipes); library(moderndive)
score_model_Fxn2 <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{y}},new_role = "outcome") %>%
update_role({{x}},new_role = "predictor") %>%
formula()
Reg_Table <- df %>%
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(Reg_Table)
}
score_model_Fxn2()

How to create dataframe using results (output) of sens.slope function?

I have an Excel data with multiple sheets. I imported them into R and applied Mann-Kendall trend test with the function sens.slope(). The results of this function are in htest class, but I want to put them in a table.
I installed packages needed and imported each sheets of dataset.
require(readxl)
require(trend)
tmin1 <- read_excel("C:/TEZ/ANALİZ/future_projection/2051-2100/model 3-3/average_tmin_3_3_end.xlsx", sheet = "acipayam")
tmin2 <- read_excel("C:/TEZ/ANALİZ/future_projection/2051-2100/model 3-3/average_tmin_3_3_end.xlsx", sheet = "adana")
...
tmin57 <- read_excel("C:/TEZ/ANALİZ/future_projection/2051-2100/model 3-3/average_tmin_3_3_end.xlsx", sheet = "yumurtalik")
Then, specified the columns for trend test.
x1<-tmin1$`13`
x2<-tmin1$`14`
x3<-tmin1$`15`
x4<-tmin1$`16`
x5<-tmin1$`17`
...
x281<-tmin57$`13`
x282<-tmin57$`14`
x283<-tmin57$`15`
x284<-tmin57$`16`
x285<-tmin57$`17`
And appplied the function.
sens.slope(x1)
sens.slope(x2)
sens.slope(x3)
....
sens.slope(x285)
The result is looking like this.
> sens.slope(x1)
Sen's slope
data: x1
z = 4.6116, n = 49, p-value = 3.996e-06
alternative hypothesis: true z is not equal to 0
95 percent confidence interval:
0.03241168 0.08101651
sample estimates:
Sen's slope
0.05689083
> sens.slope(x2)
Sen's slope
data: x2
z = 6.8011, n = 49, p-value = 1.039e-11
alternative hypothesis: true z is not equal to 0
95 percent confidence interval:
0.05632911 0.08373755
sample estimates:
Sen's slope
0.07032428
...
How can I put these values in a single table and write them to an Excel file? (names of needed values are statistic and estimates in the function.)
There is a package broom precisely for this:
library(tidyverse)
library(trend)
sens.slope(runif(1000)) %>%
broom::tidy()
# A tibble: 1 x 7
statistic p.value parameter conf.low conf.high method alternative
<dbl> <dbl> <int> <dbl> <dbl> <chr> <chr>
1 0.548 0.584 1000 -0.0000442 0.0000801 Sen's slope two.sided
And if you have many data frames, bind them all into one list and loop it over with map_df:
A = tibble(Value = runif(1000))
B = tibble(Value = runif(1000))
C = tibble(Value = runif(1000))
D = tibble(Value = runif(1000))
list(A,B,C,D) %>%
map_df(~.x %>%
pull(1) %>%
sens.slope() %>%
broom::tidy())
# A tibble: 4 x 7
statistic p.value parameter conf.low conf.high method alternative
<dbl> <dbl> <int> <dbl> <dbl> <chr> <chr>
1 -0.376 0.707 1000 -0.0000732 0.0000502 Sen's slope two.sided
2 -2.30 0.0215 1000 -0.000138 -0.0000110 Sen's slope two.sided
3 -1.30 0.194 1000 -0.000104 0.0000209 Sen's slope two.sided
4 0.674 0.500 1000 -0.0000410 0.0000848 Sen's slope two.sided
Edit: Just realised that broom::tidy in this case doesn't provide the estimate (haven't encountered this before), here is the solution without using broom:
A = tibble(Value = runif(1000))
B = tibble(Value = runif(1000))
C = tibble(Value = runif(1000))
D = tibble(Value = runif(1000))
list(A,B,C,D) %>%
purrr::map_df(.,~{
Test = sens.slope(.x %>% pull(1))
Test = tibble(Estimate = Test["estimates"] %>% unlist,
Statistic = Test["statistic"] %>% unlist)
}
)
# A tibble: 4 x 2
Estimate Statistic
<dbl> <dbl>
1 -0.0000495 -1.55
2 -0.00000491 -0.155
3 0.0000242 0.755
4 -0.0000301 -0.921
Try using lists instead of having so many objects in global environment.
Now since you already have them, you can combine them in a list, apply sens.slope on each one, extract statistic and estimates from them an get the dataframe.
library(trend)
output <- data.frame(t(sapply(mget(paste0('x', 1:285)), function(y)
{temp <- sens.slope(y);c(temp$statistic, temp$estimates)})))
You can now write this dataframe as csv using write.csv.
write.csv(output, 'output.csv', row.names = FALSE)

arguments provided as a list not getting evaluated properly

I am working on a custom function whose goal is to run a function (..f) for all combinations of grouping variables grouping.var provides for a given dataframe and then tidy those results into a dataframe using broom package.
Here is a custom function I've written. Note that ... are supplied to ..f, while additional arguments for broom::tidy method are supplied via tidy.args list.
# setup
set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)
# custom function
grouped_tidy <- function(data,
grouping.vars,
..f,
...,
tidy.args = list()) {
# check how many variables were entered for grouping variable vector
grouping.vars <-
as.list(rlang::quo_squash(rlang::enquo(grouping.vars)))
grouping.vars <-
if (length(grouping.vars) == 1) {
grouping.vars
} else {
grouping.vars[-1]
}
# quote all argument to `..f`
dots <- rlang::enquos(...)
# running the grouped analysis
df_results <- data %>%
dplyr::group_by(.data = ., !!!grouping.vars, .drop = TRUE) %>%
dplyr::group_map(
.tbl = .,
.f = ~ broom::tidy(
x = rlang::exec(.fn = ..f, !!!dots, data = .x),
unlist(tidy.args)
))
# return the final dataframe with results
return(df_results)
}
As shown by examples below, although this function works, I am doubtful the tidy.args list is getting evaluated properly because irrespective of what conf.level I choose, I always get the same results to the 4th decimal place.
95% CI
# using the function to get 95% CI
grouped_tidy(
data = ggplot2::diamonds,
grouping.vars = c(cut),
..f = stats::lm,
formula = price ~ carat - 1,
tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)
#> # A tibble: 5 x 8
#> # Groups: cut [5]
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
99% CI
# using the function to get 99% CI
grouped_tidy(
data = ggplot2::diamonds,
grouping.vars = c(cut),
..f = stats::lm,
formula = price ~ carat - 1,
tidy.args = list(conf.int = TRUE, conf.level = 0.99)
)
#> # A tibble: 5 x 8
#> # Groups: cut [5]
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
Any idea on how I can change the function so that the list of arguments will be evaluated properly by broom::tidy?
set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)
grouped_tidy <- function(data,
grouping.vars,
..f,
...,
tidy.args = list()) {
# functions passed to group_map must accept
# .x and .y arguments, where .x is the data
tidy_group <- function(.x, .y) {
# presumes ..f won't explode if called with these args
model <- ..f(..., data = .x)
# mild variation on do.call to call function with
# list of arguments
rlang::exec(broom::tidy, model, !!!tidy.args)
}
data %>%
group_by(!!!grouping.vars, .drop = TRUE) %>%
group_map(tidy_group) %>%
ungroup() # don't get bitten by groups downstream
}
grouped_tidy(
data = ggplot2::diamonds,
# wrap grouping columns in vars() like in scoped dplyr verbs
grouping.vars = vars(cut),
..f = stats::lm,
formula = price ~ carat - 1,
tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)
#> # A tibble: 5 x 8
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
Created on 2019-02-23 by the reprex package (v0.2.1)

Resources