I am trying to add a footnote to the bottom of my plot with betas, standard errors, and p values directly from the model summary I saved. However, it keeps telling me there is an unexpected error in the parse text. Any help would be greatly appreciated!
exact error:
Error in parse(text = text[[i]]) : <text>:1:26: unexpected input
1: 'Main effect of age: ' $
^
minimal reproducible example:
id<-rep(1:50)
tst<-c(sample(7:9,50, replace = T))
mydf<-data.frame(id,tst)
mydf$age<-sample(40:90,50, replace = T)
mydf$bmi<-sample(20:30,50, replace = T)
mydf$sex<-sample(1:2,50, replace = T)
##Overall model##
model <- lm( tst ~ age*sex + bmi , data = mydf)
summary(model)
model.df<-ggpredict(model, terms = c("age", "sex"))
model.plot<-plot(model.df)+theme(legend.position="none")+
theme(plot.title = element_text(hjust = 0.5))+
annotate("text", x = 0, y = 0.05, parse = TRUE, size = 4,
label = " 'Main effect of age: ' $\beta == %.2g ",
coef(model)[2])
(model.plot)
It seems the syntax of the string you pass for parsing is wrong. Also, your code would add the text to each facet; I'm not sure whether that was the intended outcome (if so, just use parse = FALSE and the paste0(...) expression from below for your annotation). If you want a global footnote, you could add a caption like so:
library(ggiraphExtra)
library(ggplot2)
set.seed(1234)
mydf <- data.frame(
id = 1:50,
tst = sample(7:9, 50, replace = T),
age = sample(40:90,50, replace = T),
bmi = sample(20:30,50, replace = T),
sex = sample(1:2,50, replace = T)
)
##Overall model##
model <- lm( tst ~ age*sex + bmi , data = mydf)
summary(model)
#>
#> Call:
#> lm(formula = tst ~ age * sex + bmi, data = mydf)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.24030 -0.67286 -0.07152 0.62489 1.27281
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.99309 1.66745 2.994 0.00446 **
#> age 0.01975 0.02389 0.827 0.41274
#> sex 1.21860 0.95986 1.270 0.21077
#> bmi 0.06602 0.03805 1.735 0.08955 .
#> age:sex -0.01852 0.01532 -1.209 0.23307
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.7507 on 45 degrees of freedom
#> Multiple R-squared: 0.125, Adjusted R-squared: 0.04721
#> F-statistic: 1.607 on 4 and 45 DF, p-value: 0.189
model.df <- ggPredict(model, terms = c("age", "sex"))
model.plot <- model.df +
theme(legend.position="none",
plot.title = element_text(hjust = 0.5)) +
labs(caption = paste0(
"Main effect of age: β = ",
sprintf("%.2g", coef(model)[2])))
model.plot
Created on 2022-06-30 by the reprex package (v2.0.1)
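For reference, here is a minimal base-R sketch of why the original label fails to parse and what a parseable plotmath string could look like. The `'text' * beta == value` spelling is just one possible plotmath form, not necessarily what the asker had in mind, and the names `bad`, `good` and `good_expr` are made up for this illustration.

```r
set.seed(1234)
mydf <- data.frame(
  tst = sample(7:9, 50, replace = TRUE),
  age = sample(40:90, 50, replace = TRUE),
  bmi = sample(20:30, 50, replace = TRUE),
  sex = sample(1:2, 50, replace = TRUE)
)
model <- lm(tst ~ age * sex + bmi, data = mydf)

# The original label: "$" is not valid at this position and "\b" is a
# backspace escape inside an R string, so parse() stops with an error.
bad <- tryCatch(
  parse(text = " 'Main effect of age: ' $\beta == %.2g "),
  error = function(e) conditionMessage(e)
)
bad

# Fill in the number with sprintf() first, then use plotmath syntax:
# a quoted string, "*" to juxtapose, and the bare symbol name beta.
good <- sprintf("'Main effect of age: ' * beta == %.2g", coef(model)[["age"]])
good_expr <- parse(text = good)[[1]]
good_expr
```

A string like `good` is the kind of input that annotate("text", ..., parse = TRUE) expects as its label.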
I am using the cmprsk package to create a series of regressions. In the real models I used, I specified my models in the same way that is shown in the example that produces mel2 below. My problem is, I want the Melanoma$ in front of the coefficients to go away, as happens if I had specified the model like in mel1. Is there a way to delete that data frame prefix out of the object without re-running it?
library(cmprsk)
data(Melanoma, package = "MASS")
head(Melanoma)
mel1 <- crr(ftime = Melanoma$time, fstatus = Melanoma$status, cov1 = Melanoma[, c("sex", "age")], cencode = 2)
covs2 <- model.matrix(~ Melanoma$sex + Melanoma$age)[, -1]
mel2 <- crr(ftime = Melanoma$time, fstatus = Melanoma$status, cov1 = covs2, cencode = 2)
What I want:
What I have:
You could use the data argument in model.matrix, and wrap the crr call in with(Melanoma, ...)
covs2 <- model.matrix(~ sex + age, data = Melanoma)[, -1]
mel2 <- with(Melanoma, crr(ftime = time, fstatus = status,
cov1 = covs2, cencode = 2))
mel2$coef
#> sex age
#> 0.58838573 0.01259388
If you are stuck with existing models like this:
covs2 <- model.matrix(~ Melanoma$sex + Melanoma$age)[, -1]
mel2 <- crr(ftime = Melanoma$time, fstatus = Melanoma$status,
cov1 = covs2, cencode = 2)
You could simply rename the coefficients like this
names(mel2$coef) <- c("sex", "age")
mel2
#> convergence: TRUE
#> coefficients:
#> sex age
#> 0.58840 0.01259
#> standard errors:
#> [1] 0.271800 0.009301
#> two-sided p-values:
#> sex age
#> 0.03 0.18
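With many covariates, typing the new names by hand gets tedious. As a sketch, the prefix can also be stripped with sub(). The vector below is a stand-in that mimics names(mel2$coef) so the snippet runs without cmprsk; on the real object the same sub() call would be applied to names(mel2$coef).

```r
# Stand-in for mel2$coef, with the unwanted "Melanoma$" prefix
coefs <- c("Melanoma$sex" = 0.58838573, "Melanoma$age" = 0.01259388)

# "$" is a regex metacharacter, so it has to be escaped
names(coefs) <- sub("^Melanoma\\$", "", names(coefs))
coefs
```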
I am developing a Shiny app that plots a scatterplot with a spline fit on top of it. The degree of the spline can be changed with a slider whose value ranges from 2 to 12, as shown below:
ui <- tabPanel(sidebarLayout(
sidebarPanel(sliderInput('degree', 'Degree of the Polynomial:', min = 2, max = 12, value = 3, step = 1)),
mainPanel(plotlyOutput("plot"))))
Below is the server side code:
server <- function(input, output, session){
observeEvent(input$degree, {
output$plot <- renderPlotly({
m <- lm(formula = y ~ splines::bs(x, df = input$degree), df4)
#plot
g <- ggplot(data = df4, aes_string(x = df4$x, y = df4$y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = input$degree), method = "lm", color = "green3", level = 1, size = 1)
h <- g + xlab("X (mm)") + ylab("Z (um)")
ggplotly(h) %>% add_annotations(text= sprintf("R^2: %f", summary(m)[8]), xref="paper", yref="paper", x=0.05,y=0.9)
})
})
}
df4 is the dataset that has been used to plot the scatterplot, which looks like this:
Now I want the value of the degree of spline fit to get selected automatically based on the R^2 value.
For example, if 0.8 is the set threshold for the R^2 value, then that degree of spline function should get automatically selected as the default value of the slider, where the value of R^2 crosses the threshold of 0.8 for the first time.
All in all, I want the default set value of the slider (which is set to 3 here) to be dynamic based on the set threshold value of R^2.
This should do it. You need to estimate the model outside of the rendered output so you can identify the correct degree. Then, you need to use renderUI() to build the slider so you can pass the identified value of degree to the value argument. Then, you can make the plot without being inside the event observer because it's already a reactive function and observing the degree input slider.
ui <- fluidPage(sidebarLayout(
sidebarPanel(uiOutput("slider")),
mainPanel(plotlyOutput("plot"))))
server <- function(input, output, session){
library(ggplot2)
library(plotly)
library(splines)
set.seed(1)
## set number of observations
n <- 400
## generate x in [0,1]
x <- 0:(n-1)/(n-1)
## create complex function of x
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
## create y = f(x) + random noise
y <- f + rnorm(n, 0, sd = 2)
df4 <- data.frame(x=x, y=y)
deg <- 2
r2 <- 0
while(r2 < .8){
deg <- deg + 1
m <- lm(formula = y ~ splines::bs(x, df = deg), df4)
r2 <- summary(m)$r.squared
}
output$slider <- renderUI(sliderInput('degree',
'Degree of the Polynomial:',
min = 2,
max = 300,
value = deg,
step = 1) )
output$plot <- renderPlotly({
#plot
m <- lm(formula = y ~ splines::bs(x, df = input$degree), df4)
g <- ggplot(data = df4, aes(x = x, y = y)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = input$degree), method = "lm", color = "green3", level = 1, size = 1)
h <- g + xlab("X (mm)") + ylab("Z (um)")
ggplotly(h) %>% add_annotations(text = sprintf("R^2: %f", summary(m)$r.squared), xref = "paper", yref = "paper", x = 0.05, y = 0.9)
})
}
shinyApp(ui, server)
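One caveat with the while loop above: if R^2 never reaches the threshold, it keeps running. A possible variation, sketched here outside of Shiny, caps the search. The helper name `find_min_df`, its `max_df` argument and the toy sine data are assumptions made for this illustration and are not part of the app above.

```r
library(splines)

# Return the smallest df whose spline fit reaches the R^2 threshold,
# or NA if no df up to max_df gets there.
find_min_df <- function(data, threshold = 0.8, min_df = 3L, max_df = 30L) {
  for (d in seq(min_df, max_df)) {
    fit <- lm(y ~ bs(x, df = d), data = data)
    if (summary(fit)$r.squared >= threshold) return(d)
  }
  NA_integer_
}

# Toy data: one period of a sine wave plus a little noise
set.seed(1)
toy <- data.frame(x = seq(0, 1, length.out = 200))
toy$y <- sin(2 * pi * toy$x) + rnorm(200, sd = 0.1)

deg <- find_min_df(toy)
deg
```

In the app, the returned value could be passed to the value argument of sliderInput(), with a fallback default for the NA case.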
EDIT add file upload
I added a file upload button and text box along with variable choosers for the x- and y- variables from the names in the uploaded dataset.
ui <- fluidPage(sidebarLayout(
sidebarPanel(
fileInput('file1', 'Choose file to upload',
accept = c(
'text/csv',
'text/comma-separated-values',
'text/tab-separated-values',
'text/plain',
'.csv',
'.tsv'
)
),
uiOutput("xvar"),
uiOutput("yvar"),
uiOutput("slider")),
mainPanel(plotlyOutput("plot"))))
server <- function(input, output, session){
library(ggplot2)
library(plotly)
library(splines)
df4 <- reactive({
req(input$file1)
inFile <- input$file1
read.csv(inFile$datapath, header = TRUE)
})
output$xvar <- renderUI({
req(df4())
selectInput("xvar", "X-variable", choices=names(df4()), selected = NULL)
})
output$yvar <- renderUI({
req(df4())
selectInput("yvar", "Y-variable", choices=names(df4()), selected = NULL)
})
deg <- reactive({
req(input$yvar)
degr <- 2
r2 <- 0
while(r2 < .8){
degr <- degr + 1
form <- paste(input$yvar, "~ splines::bs(", input$xvar, ", df = ", degr, ")")
m <- lm(formula = form, df4())
r2 <- summary(m)$r.squared
}
degr
})
output$slider <- renderUI({
req(deg())
sliderInput('degree',
'Degree of the Polynomial:',
min = 2,
max = 300,
value = deg(),
step = 1) })
output$plot <- renderPlotly({
req(deg())
#plot
form <- paste(input$yvar, "~ splines::bs(", input$xvar, ", df = ", input$degree, ")")
m <- lm(formula = form, df4())
g <- ggplot(data = df4(), aes_string(x = input$xvar, y = input$yvar)) + theme_bw() +
geom_point(colour = "blue", size = 0.1)+
geom_smooth(formula = y ~ splines::bs(x, df = input$degree), method = "lm", color = "green3", level = 1, size = 1)
h <- g + xlab("X (mm)") + ylab("Z (um)")
ggplotly(h) %>% add_annotations(text = sprintf("R^2: %f", summary(m)$r.squared), xref = "paper", yref = "paper", x = 0.05, y = 0.9)
})
}
shinyApp(ui, server)
This is tricky without some sample data, but suppose we had the following data set:
set.seed(1)
df4 <- data.frame(x = 1:10, y = rnorm(10, (1:10)/10))
df4
#> x y
#> 1 1 -0.5264538
#> 2 2 0.3836433
#> 3 3 -0.5356286
#> 4 4 1.9952808
#> 5 5 0.8295078
#> 6 6 -0.2204684
#> 7 7 1.1874291
#> 8 8 1.5383247
#> 9 9 1.4757814
#> 10 10 0.6946116
When plotted, it looks like this:
plot(df4)
so it has a slight upwards trend.
If we want to find the number of splines that gives a fit with r squared > 0.8 we can do:
library(splines)
i <- 3
while(summary(lm(formula = y ~ bs(x, df = i), df4))$r.squared < 0.8) i <- i + 1
So now i is the lowest number of splines that gives an r squared of 0.8 or more:
i
#> [1] 8
And we can fit i into our fixed model:
fit <- lm(formula = y ~ splines::bs(x, df = i), df4)
summary(fit)
#>
#> Call:
#> lm(formula = y ~ splines::bs(x, df = i), data = df4)
#>
#> Residuals:
#> 1 2 3 4 5 6 7 8
#> 0.00008 -0.00216 0.01512 -0.04776 0.08208 -0.08208 0.04776 -0.01512
#> 9 10
#> 0.00216 -0.00008
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.5265 0.1360 -3.871 0.1609
#> splines::bs(x, df = i)1 4.4178 0.4344 10.170 0.0624 .
#> splines::bs(x, df = i)2 -4.1409 0.4194 -9.874 0.0643 .
#> splines::bs(x, df = i)3 5.2151 0.3247 16.064 0.0396 *
#> splines::bs(x, df = i)4 -1.3020 0.3068 -4.244 0.1473
#> splines::bs(x, df = i)5 2.3384 0.3245 7.206 0.0878 .
#> splines::bs(x, df = i)6 1.9458 0.4199 4.634 0.1353
#> splines::bs(x, df = i)7 2.0650 0.4309 4.792 0.1310
#> splines::bs(x, df = i)8 1.2212 0.1924 6.349 0.0995 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.136 on 1 degrees of freedom
#> Multiple R-squared: 0.9974, Adjusted R-squared: 0.9769
#> F-statistic: 48.6 on 8 and 1 DF, p-value: 0.1105
and add the fitted curve to the plot:
lines(10:100/10, predict(fit, newdata = list(x = 10:100/10)), col = "red")
Created on 2020-11-30 by the reprex package (v0.3.0)
Assume I have data with a dependency y(t) and parameters p1, p2 and p3
which might influence the value y(t).
I create 3 linear equations that depend on the following combinations of the
parameters p1 and p2. p3 has no impact on y(t); it is assigned at random.
You can find a reproducible example at the end of the question.
The 3 equations are
p1 p2 Equation
1 1 5 + 3t
2 1 1 - t
2 2 3 + t
A plot of the 3 equations including random data looks like the following:
Now, if I call lm() (For formulae see here) based on my random data, I get the following result.
lm(formula = y ~ .^2, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-1.14707 -0.22785 0.00157 0.23099 1.10528
Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.83711 0.17548 27.565 <2e-16 ***
t 2.97316 0.02909 102.220 <2e-16 ***
p12 -3.86697 0.21487 -17.997 <2e-16 ***
p22 2.30617 0.20508 11.245 <2e-16 ***
p23 NA NA NA NA
p32 0.16518 0.21213 0.779 0.4375
p33 0.23450 0.22594 1.038 0.3012
t:p12 -4.00574 0.03119 -128.435 <2e-16 ***
t:p22 2.01230 0.03147 63.947 <2e-16 ***
t:p23 NA NA NA NA
t:p32 0.01155 0.03020 0.383 0.7027
t:p33 0.02469 0.03265 0.756 0.4508
p12:p22 NA NA NA NA
p12:p23 NA NA NA NA
p12:p32 -0.10368 0.21629 -0.479 0.6325
p12:p33 -0.11728 0.21386 -0.548 0.5843
p22:p32 -0.20871 0.19633 -1.063 0.2896
p23:p32 NA NA NA NA
p22:p33 -0.44250 0.22322 -1.982 0.0495 *
p23:p33 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4112 on 136 degrees of freedom
Multiple R-squared: 0.9988, Adjusted R-squared: 0.9987
F-statistic: 8589 on 13 and 136 DF, p-value: < 2.2e-16
If I only want to consider parameters with high significance, I would argue to ignore parameters close to zero. If I understand correctly, zero-parameters do not lead to "new lines". I then obtain the following simplified model (values are rounded for readability):
Estimate
(Intercept) 5 ***
t 3 ***
p12 -4 ***
p22 2 ***
t:p12 -4 ***
t:p22 2 ***
I would then reconstruct the theoretical model as follows from the estimate
above (only highly significant parameters!):
p1 p2 Equation Result
1 1 5+3t 5+3t
1 2 5+3t+p22+t:p22*t 7+5t
2 1 5+3t+p12+t:p12*t 1-t
2 2 5+3t+p22+t:p22*t+p12+t:p12*t 3+t
Now, 7 + 5t is obviously wrong, but I am not sure about the reason.
I guess lm adds the parameters successively, so the corresponding model
y ~ t:p2 is not contained in the model above?
This question and references therein might be related, but I didn't look at the lm result - so there is nothing about that.
Reproducible example:
r <- generate_3lines(sigma = 0.5, slopes = c(3, 1, -1), offsets = c(5, 3, 1))
t_m <- r$t_m; y_m <- r$y_m; y_t <- r$y_t; rm(r)
mydata <- generate_randomdata(t_m, y_m, y_t)
# What the raw data looks like:
plot(t_m[[1]], y_t[[1]], type = "l", lty = 3, col = "black", main = "Raw data",
xlim = c(0, 10), ylim = c(min(mydata$y), max(mydata$y)), xlab = "t", ylab = "y")
lines(t_m[[2]], y_t[[2]], col = "black", lty = 3)
lines(t_m[[3]], y_t[[3]], col = "black", lty = 3)
points(x = mydata$t, y = mydata$y)
fit <- lm(y ~ .^2, data = mydata) # Not all levels / variables are linearly independent
print(summary(fit))
and the functions
generate_3lines <- function(sigma = 0.5, slopes = c(3, 1, -1), offsets = c(5, 3, 1)) {
t <- seq(0,10, length.out = 1000) # large sample of x values
t_m <- list()
y_m <- list()
y_t <- list()
for (i in 1:3) {
set.seed(33*i)
t_m[[i]] <- sort(sample(t, 50, replace = F))
set.seed(33*i)
noise <- rnorm(10, 0, sigma)
y_m[[i]] <- slopes[i]*t_m[[i]] + offsets[i] + noise
y_t[[i]] <- slopes[i]*t_m[[i]] + offsets[i]
}
return(list(t_m = t_m, y_m = y_m, y_t = y_t))
}
generate_randomdata <- function(t_m, y_m, y_t) {
# Final data set
df1 <- data.frame(t = t_m[[1]], y = y_m[[1]], p1 = rep(1), p2 = rep(1),
p3 = sample(c(1, 2, 3), length(t_m[[1]]), replace = T))
df2 <- data.frame(t = t_m[[2]], y = y_m[[2]], p1 = rep(2), p2 = rep(2),
p3 = sample(c(1, 2, 3), length(t_m[[1]]), replace = T))
df3 <- data.frame(t = t_m[[3]], y = y_m[[3]], p1 = rep(2), p2 = rep(3),
p3 = sample(c(1, 2, 3), length(t_m[[1]]), replace = T))
mydata <- rbind(df1, df2, df3)
mydata$p1 <- factor(mydata$p1)
mydata$p2 <- factor(mydata$p2)
mydata$p3 <- factor(mydata$p3)
mydata <- mydata[sample(nrow(mydata)), ]
return(mydata)
}
Edit after input from @MrFlick: The question is now also on Cross Validated
Comment: It seems, the fit is not really automated in ggplot, see here
In brief, everything is OK with the model and the result from lm. As explained in this answer on Cross Validated, 7+5t is just an extrapolation to a range without data: no observation has p1 = 1 together with p2 = 2. Furthermore, the synthetic data suffers from collinearity.
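Both points can be checked directly. The sketch below builds a small stand-in data set with the same (p1, p2) combinations that generate_randomdata() assigns (1/1, 2/2, 2/3), not the question's exact data; the names `d`, `cells`, `fit_both` and `fit_p2` are made up for illustration.

```r
set.seed(1)
n <- 50
d <- data.frame(
  t  = runif(3 * n, 0, 10),
  p1 = factor(rep(c(1, 2, 2), each = n)),
  p2 = factor(rep(c(1, 2, 3), each = n))
)
g <- as.integer(d$p2)  # group index 1..3
d$y <- c(5, 3, 1)[g] + c(3, 1, -1)[g] * d$t + rnorm(3 * n, sd = 0.5)

# Only 3 of the 6 possible (p1, p2) cells contain data;
# in particular there is nothing at p1 = 1, p2 = 2.
cells <- table(p1 = d$p1, p2 = d$p2)
cells

# p1 is fully determined by p2, so using both yields NA coefficients
fit_both <- lm(y ~ t * (p1 + p2), data = d)
anyNA(coef(fit_both))

# One factor is enough to recover the three lines
fit_p2 <- lm(y ~ t * p2, data = d)
round(coef(fit_p2), 1)
```

Any line reconstructed for an empty cell of `cells` combines coefficients that were never estimated together, which is why it need not match any of the generating equations.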
I am trying to create a plot that shows all coefficients from my linear model, with their statistical details attached to each point using the ggrepel package. I have managed to create the basic plot, but I haven't been able to figure out how to use plotmath when creating the labels. For example, in the plot produced below, I would like to use italics for the t-value (t) and p-value (p). Additionally, if I were to include estimates, I might also want to include the Greek letter beta (β) in the label.
# loading needed libraries
library(ggrepel)
#> Loading required package: ggplot2
library(ggplot2)
library(GGally)
library(tidyverse)
# creating a dataframe containing results
(label_df <- broom::tidy(x = stats::lm(data = mtcars, wt ~ am*cyl), conf.int = TRUE) %>%
dplyr::filter(.data = ., term != "(Intercept)") %>%
dplyr::select(.data = ., term, estimate, conf.low, conf.high, statistic, p.value) %>%
purrrlyr::by_row(
.d = .,
..f = ~ paste(
"t = ",
round(.$statistic, digits = 3),
", p = ",
round(.$p.value, digits = 3),
sep = ""
),
.collate = "rows",
.to = "label",
.labels = TRUE
)
)
#> # tibble [3 x 7]
#> term estimate conf.low conf.high statistic p.value label
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 am -0.956 -2.58 0.668 -1.21 0.238 t = -1.206, p = 0~
#> 2 cyl 0.304 0.135 0.473 3.68 0.000989 t = 3.678, p = 0.~
#> 3 am:cyl 0.0328 -0.234 0.300 0.252 0.803 t = 0.252, p = 0.~
# creating the model coefficient plot using ggcoef
plot <- GGally::ggcoef(x = stats::lm(data = mtcars, wt ~ am*cyl), exclude_intercept = TRUE)
# adding labels using ggrepel
plot +
ggrepel::geom_label_repel(
data = label_df,
mapping = ggplot2::aes(x = estimate, y = term, label = label),
size = 3,
box.padding = grid::unit(x = 0.75, units = "lines"),
fontface = "bold",
direction = "y",
color = "black",
label.size = 0.25,
segment.color = "black",
segment.size = 0.5,
segment.alpha = NULL,
min.segment.length = 0.5,
max.iter = 2000,
point.padding = 0.5,
force = 2,
na.rm = TRUE
)
If I use something like base::substitute or base::bquote to create the label inside purrrlyr, I get the following error:
.f must return either data frames or vectors for non-list collation
I can get rid of this error by converting it to character type but then the labels get all messed-up.
# creating a dataframe containing results
(label_df <- broom::tidy(x = stats::lm(data = mtcars, wt ~ am*cyl), conf.int = TRUE) %>%
dplyr::filter(.data = ., term != "(Intercept)") %>%
dplyr::select(.data = ., term, estimate, conf.low, conf.high, statistic, p.value) %>%
purrrlyr::by_row(
.d = .,
..f = ~ as.character(bquote(
"t = "~.(round(.$statistic, digits = 3))~
", p = "~
.(round(.$p.value, digits = 3))
)),
.collate = "rows",
.to = "label",
.labels = TRUE
)
)
#> # tibble [9 x 8]
#> term estimate conf.low conf.high statistic p.value .row label
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr>
#> 1 am -0.956 -2.58 0.668 -1.21 0.238 1 ~
#> 2 am -0.956 -2.58 0.668 -1.21 0.238 1 "\"t = \" ~~
#> 3 am -0.956 -2.58 0.668 -1.21 0.238 1 0.238
#> 4 cyl 0.304 0.135 0.473 3.68 0.000989 2 ~
#> 5 cyl 0.304 0.135 0.473 3.68 0.000989 2 "\"t = \" ~~
#> 6 cyl 0.304 0.135 0.473 3.68 0.000989 2 0.001
#> 7 am:cyl 0.0328 -0.234 0.300 0.252 0.803 3 ~
#> 8 am:cyl 0.0328 -0.234 0.300 0.252 0.803 3 "\"t = \" ~~
#> 9 am:cyl 0.0328 -0.234 0.300 0.252 0.803 3 0.803
Created on 2018-06-13 by the reprex package (v0.2.0).
Based on the discussion in the comments, you need to use mathematical annotations correctly to avoid errors, see link.
The label format below works for me and includes the beta estimates with the Greek symbol. Wrapping the pieces in list() is needed to get commas in plotmath. The labels are only rendered as plotmath if parse = TRUE is set in geom_label_repel().
(label_df <- broom::tidy(x = stats::lm(data = mtcars, wt ~ am*cyl), conf.int = TRUE) %>%
dplyr::filter(.data = ., term != "(Intercept)") %>%
dplyr::select(.data = ., term, estimate, conf.low, conf.high, statistic, p.value) %>%
purrrlyr::by_row(
.d = .,
..f = ~ paste(
"list(italic(t)==",
round(.$statistic, digits = 3),
", ~italic(p)==",
round(.$p.value, digits = 3),
", ~beta==",
round(.$estimate, digits = 3),
")",
sep = ""
),
.collate = "rows",
.to = "label",
.labels = TRUE
)
)
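To check that strings in this format are valid plotmath before plotting, they can be run through parse(). The sketch below rebuilds the same labels with base R only (no broom or purrrlyr), so the object names `cf`, `labels` and `parsed` are assumptions for illustration, not part of the pipeline above.

```r
fit <- lm(wt ~ am * cyl, data = mtcars)

# Coefficient table without the intercept row
cf <- coef(summary(fit))[-1, , drop = FALSE]

# Same label layout as above: list(italic(t)==..., ~italic(p)==..., ~beta==...)
labels <- paste0(
  "list(italic(t)==", round(cf[, "t value"], 3),
  ", ~italic(p)==", round(cf[, "Pr(>|t|)"], 3),
  ", ~beta==", round(cf[, "Estimate"], 3),
  ")"
)
labels

# Each label must parse to a single call, otherwise parse = TRUE will fail
parsed <- lapply(labels, function(l) parse(text = l)[[1]])
```

If parse() errors on one of the strings, the same string would also fail inside the plot when parse = TRUE is used.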
I would like to call glm inside my own function. A minimal example is:
my.glm <- function(...){
fit <- glm(...)
summary(fit)
}
However, it gives an error.
a <- data.frame(x=rpois(100, 2), y=rnorm(100) )
glm(x ~ 1, offset=y, family=poisson, data=a)
my.glm(x ~ 1, offset=y, family=poisson, data=a) # error eval(expr, envir, enclos)
What can I do?
You can use match.call to expand the ..., and modify its output to make it a call to glm:
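my.glm <- function(...){
  cl <- match.call()
  # swap the function being called: my.glm(...) becomes glm(...)
  cl[[1]] <- quote(glm)
  # evaluate in the caller's frame so data, offset, etc. are found there
  fit <- eval(cl, parent.frame())
  summary(fit)
}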
my.glm(x ~ 1, offset=y, family=poisson, data=a)
Call:
glm(formula = x ~ 1, family = poisson, data = a, offset = y)
Deviance Residuals:
Min 1Q Median 3Q Max
-7.1789 -0.8575 0.3065 1.5343 4.4896
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.07628 0.07433 1.026 0.305
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 346 on 99 degrees of freedom
Residual deviance: 346 on 99 degrees of freedom
AIC: 559.46
Number of Fisher Scoring iterations: 6
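A quick way to convince yourself that a match.call() wrapper behaves like a direct call is to compare the two fits. This is a small sketch with a fixed seed; `my_glm` here returns the fit itself rather than its summary so the coefficients can be compared, which is a deviation from the function above.

```r
set.seed(42)
a <- data.frame(x = rpois(100, 2), y = rnorm(100))

my_glm <- function(...) {
  cl <- match.call()
  cl[[1]] <- quote(glm)     # turn my_glm(...) into glm(...)
  eval(cl, parent.frame())  # evaluate where the wrapper was called
}

direct  <- glm(x ~ 1, offset = y, family = poisson, data = a)
wrapped <- my_glm(x ~ 1, offset = y, family = poisson, data = a)

all.equal(coef(direct), coef(wrapped))
```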
A solution in case you want to modify some things and then pass them to glm() along with ... (i.e. ...: additional arguments passed to glm()). This requires the rlang package, but there's probably a way to do it without.
glm_wrap <- function(data, formula, ...) {
#e.g. modify data and formula
data$new <- data$x + rnorm(nrow(data))
f <- update(formula, .~. + new)
#construct new call
new_call <- as.call(c(list(rlang::sym("glm"), formula = f, data = data), rlang::exprs(...)))
eval(new_call)
}
The resulting call is unfortunately long and ugly though.
df <- data.frame(y = 1:10, x = rnorm(10), z = runif(10, 1, 3))
glm_wrap(data = df, formula = y~x, family = gaussian(link = "log"), offset = log(z))
#>
#> Call: glm(formula = y ~ x + new, family = gaussian(link = "log"), data = structure(list(
#> y = 1:10, x = c(0.788586544201169, -0.191055916962356, -0.709038064642618,
#> -1.43594109422505, 0.139431523468874, 1.58756249459749, -0.699123220004699,
#> 0.824223253644347, 0.979299697212903, -0.766809343110728),
#> z = c(1.40056129638106, 1.53261906700209, 1.59653351828456,
#> 2.90909940004349, 2.1954998113215, 2.77657635230571, 2.63835062459111,
#> 2.78547951159999, 2.52235971018672, 1.20802361145616), new = c(-1.4056733559404,
#> -0.590623492831404, -0.460389391631124, 0.376223909604533,
#> -0.0865283753921801, 1.42297343043252, -0.391232902630507,
#> 0.835906008542682, 1.49391399054269, -0.861719595343475)), row.names = c(NA,
#> -10L), class = "data.frame"), offset = log(z))
#>
#> Coefficients:
#> (Intercept) x new
#> 0.87768 0.05808 0.03074
#>
#> Degrees of Freedom: 9 Total (i.e. Null); 7 Residual
#> Null Deviance: 79.57
#> Residual Deviance: 77.64 AIC: 56.87
Created on 2021-02-07 by the reprex package (v0.3.0)
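For what it's worth, a base-R variant seems possible as well. The sketch below is an assumption-laden alternative, not the answer's method: it captures the extra arguments with substitute(list(...)) instead of rlang::exprs(), and passes the data as the symbol `data` rather than the data frame itself, which also keeps the printed call short. The name `glm_wrap_base` and the Poisson toy data are made up for illustration.

```r
glm_wrap_base <- function(data, formula, ...) {
  # e.g. modify data and formula, as in the rlang version
  data$new <- data$x + rnorm(nrow(data))
  f <- update(formula, . ~ . + new)
  # capture the extra arguments unevaluated
  dots <- as.list(substitute(list(...)))[-1]
  # refer to the modified data by name so it is not pasted into the call
  new_call <- as.call(c(list(quote(glm), formula = f, data = quote(data)), dots))
  # evaluated in this function's frame, where the modified data lives
  eval(new_call)
}

set.seed(1)
df <- data.frame(y = rpois(50, 3), x = rnorm(50), z = runif(50, 1, 3))
fit <- glm_wrap_base(data = df, formula = y ~ x, family = poisson, offset = log(z))
fit$call
```

The trade-off is that the stored call refers to `data`, a name that only exists inside the wrapper, so functions that re-evaluate the call later (update(), for example) may not find it.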