Convert tibble into model class - r

I have an lm object that I generated from a dataframe df:
model <- lm(y ~ poly(x1,...,xN,degree=DEGREE,raw=TRUE), data=df)
Each row of the model object represents a term of the polynomial uniquely determined by the powers of the variables x1,...,xN.
I'd like to modify this model by focusing on rows with significant p-values. I applied
t <- summary(model) %>% broom::tidy();
attach(t)
model_slim <- t %>% dplyr::filter(p.value<0.1)
detach(t)
My objective is to run the prediction again to see how severely it has changed.
Question
How can I convert this object back to a model so I can apply predict? Is there a better way to do it?
EDIT
attach(iris)
model <- lm(Sepal.Width ~ poly(Sepal.Length,Petal.Length,degree=2))
detach(iris)
t <- summary(model) %>% broom::tidy();
attach(t)
model_slim <- t %>% dplyr::filter(p.value<0.8)
detach(t)
Restating the question: given broom::tidy(): model -> tibble, is there an inverse function which takes tibble -> model? My objective is to call predict(model_slim, dataframe).
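No answer is recorded for this question. As far as I know, broom offers no inverse of tidy(): the tibble holds only the coefficient table, not the data or the terms object that predict needs. One possible workaround, sketched here under assumptions, is to refit by least squares on the design-matrix columns whose p-values pass the threshold. The names keep, design and slim are illustrative, raw = TRUE is borrowed from the first snippet, and screening terms by p-value before refitting is shown only because the question asks for it.

```r
# Full polynomial model on iris (raw polynomial, as in the first snippet)
full <- lm(Sepal.Width ~ poly(Sepal.Length, Petal.Length, degree = 2, raw = TRUE),
           data = iris)

# p-value per term; the intercept is always retained
pvals <- summary(full)$coefficients[, "Pr(>|t|)"]
keep  <- union("(Intercept)", names(pvals)[pvals < 0.1])

# Rebuild the full model's design matrix for any data frame
design <- function(data) {
  tt <- delete.response(terms(full))
  model.matrix(tt, model.frame(tt, data))
}

# Refit on the retained columns only
X    <- design(iris)[, keep, drop = FALSE]
slim <- lm.fit(X, iris$Sepal.Width)

# Predict with both models on the same rows and compare
newdata   <- head(iris)
pred_full <- predict(full, newdata)
pred_slim <- drop(design(newdata)[, keep, drop = FALSE] %*% slim$coefficients)
print(cbind(full = pred_full, slim = pred_slim))
```

Because slim is the plain list returned by lm.fit rather than an lm object, predictions are computed as a matrix product instead of through predict.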

Related

output step_lencode_mixed (from R package embed)

I have three questions about the sample code below which illustrates the use of step_lencode_mixed.
I read in the vignette that: "For each factor predictor, a generalized linear model is fit to the outcome and the coefficients are returned as the encoding."
In the output from the example below the column 'partial' is the return from step_lencode_mixed. My questions:
Should I use this partial as the encoded categorical variable "where_town" in the new model to be fitted?
Is there a complete model (Class ~ ., data = okc_train) with all variables on Class fitted in the background and is the contribution from variable "where_town" returned as partial?
If I convert the partial with the logit2prob function, I notice that the outcome is almost identical to the rate. For that reason I suppose the outcome is not a coefficient?
Thanks a lot!
# ------------------------------------------------------------------------------
# Feature Engineering and Selection: A Practical Approach for Predictive Models
# by Max Kuhn and Kjell Johnson
#
# ------------------------------------------------------------------------------
#
# Code for Section 5.4 at
# https://bookdown.org/max/FES/categorical-supervised-encoding.html
#
# ------------------------------------------------------------------------------
#
# Code requires these packages:
library(tidymodels)
library(embed)
# Create example data ----------------------------------------------------------
load("../Data_Sets/OkCupid/okc.RData")
load("../Data_Sets/OkCupid/okc_binary.RData")
options(width = 120)
partial_rec <-
recipe(Class ~ ., data = okc_train) %>%
step_lencode_mixed(
where_town,
outcome = vars(Class)
) %>%
prep()
okc_train2 <- okc_train %>% select(where_town, Class)
partial_rec2 <-
recipe(Class ~ ., data = okc_train2) %>%
step_lencode_mixed(
where_town,
outcome = vars(Class)
) %>%
prep()
# Organize results -------------------------------------------------------------
partial_pooled <-
tidy(partial_rec, number = 1) %>%
dplyr::select(-terms, -id) %>%
setNames(c("where_town", "partial"))
partial_pooled <- left_join(partial_pooled, okc_props)
partial_pooled2 <-
tidy(partial_rec2, number = 1) %>%
dplyr::select(-terms, -id) %>%
setNames(c("where_town", "partial"))
all.equal(partial_pooled, partial_pooled2)
>
[1] TRUE
Should I use this partial as the encoded categorical variable "where_town" in the new model to be fitted?
Yes. You don't have to do it manually though. The bake() function does that for you automatically (same as if you include the recipe in a workflow).
Is there a complete model (Class ~ ., data = okc_train) with all variables on Class fitted in the background and is the contribution from variable "where_town" returned as partial?
Yes. There is more information in the tidymodels book (section 17.3).
If I convert the partial with the logit2prob function, I notice that the outcome is almost identical to the rate. For that reason, I suppose the outcome is not a coefficient?
A simpler method to do the conversion to the rate is binomial()$linkinv(partial_pooled$partial).
The value given in the partial column is the log-odds value (hence the negative numbers); it is estimated with a logistic regression fit as a mixed model. That model uses an empirical Bayes estimation method that shrinks the coefficient estimates toward the overall (population) estimate.
The amount of shrinkage, for this model, is based on a few things but is mostly driven by the per-category sample size. Smaller sample sizes are affected more than categories with larger amounts of data. So the raw and shrunken estimates for berkeley are about the same since there were 2676 data points there but belvedere_tiburon has larger differences in estimates because the sample size was 35.
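The log-odds to rate conversion mentioned above can be checked in base R. This is a small sketch with made-up log-odds values; logit2prob is a hypothetical stand-in for the helper named in the question, whose actual definition is not shown there.

```r
# Hypothetical helper standing in for the question's logit2prob
logit2prob <- function(logit) exp(logit) / (1 + exp(logit))

# Made-up log-odds values, not taken from the OkCupid data
log_odds <- c(-2.5, -1, 0, 1.2)

p_manual <- logit2prob(log_odds)          # hand-written inverse logit
p_link   <- binomial()$linkinv(log_odds)  # inverse link of the binomial family
p_plogis <- plogis(log_odds)              # logistic distribution function

print(cbind(log_odds, p_manual, p_link, p_plogis))
```

All three columns agree, so any of them can be applied to the partial column to get values on the rate scale.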

Use Apply to iterate through multiple response variables in a model and then input that model into stepwise model

I am trying to use the apply function to create multiple linear models. My data has 32 linear models that will be created using 30 predictor variables. In this case, I am using the dummy data mtcars with only 5 predictors and 5 response variables, as seen below (my actual data has about 30 of each). I want to use the lapply() function to create the models, then feed each model separately into a stepAIC function, and then feed those models separately into a stepVIF function. In the end, I want to output a list of all the model results after the stepVIF function. The beginnings of my code are below. I believe my major issue currently is the multiple NAs in my data. I only wish to remove NAs from the predictor column when that model runs, and not at the beginning from the entire data frame; otherwise it removes some data that can be used. I can get this to work if I input one response variable at a time and copy and paste the code over and over.
n=6
full.model <- lapply(1:n, function(p) lm((p)~., cars_data[, c(p, 7:11)])) #Fit full lm model to the last 5 variables in mt cars and create a model for the first 6 variables in mtcars
step.model <- stepAIC(full.model, direction = "both", trace = FALSE) #Fit stepwise regression model
stepmod3 <- stepVIF(model = step.model, threshold = 10, verbose = TRUE)
I am currently running into a "variable lengths differ" error.
This should do it:
n=6
#Fit full lm model to the last 5 variables in mt cars and create a model for the first 6 variables in mtcars
full.model <- lapply(1:n, function(p) lm(reformulate(names(mtcars)[7:11], response=names(mtcars)[p]), mtcars))
step.model <- lapply(full.model, function(x)MASS::stepAIC(x, direction = "both", trace = FALSE)) #Fit stepwise regression model
stepmod3 <- lapply(step.model, function(x)pedometrics::stepVIF(model = x, threshold = 10, verbose = TRUE))
When missing data patterns are different.
When missing data patterns differ and you only want to delete the observations that have some missing data for that particular model, the easiest thing is to embed the data in the call. For example, here's what it would look like with the mtcars data:
library(dplyr)
n=6
#Fit full lm model to the last 5 variables in mt cars and create a model for the first 6 variables in mtcars
full.model <- lapply(1:n, function(p){
args <- list(
formula = reformulate(names(mtcars)[7:11], response=names(mtcars)[p]),
data= mtcars %>% select(all_of(p),7:11) %>% na.omit())
do.call("lm", args)
})
step.model <- lapply(full.model, function(x)MASS::stepAIC(x, direction = "both", trace = FALSE)) #Fit stepwise regression model
stepmod3 <- lapply(step.model, function(x)pedometrics::stepVIF(model = x, threshold = 10, verbose = TRUE))
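To see what reformulate builds and how rows are dropped per model, here is a small base R sketch. It assumes the default na.action (na.omit) is in effect. cars_na is a hypothetical copy of mtcars with a few responses blanked out for illustration only.

```r
# Hypothetical copy of mtcars with missing responses, for illustration only
cars_na <- mtcars
cars_na$mpg[1:3] <- NA
cars_na$cyl[5]   <- NA

predictors <- names(cars_na)[7:11]

# One model per response; reformulate() builds e.g. mpg ~ qsec + vs + am + gear + carb
models <- lapply(1:2, function(p) {
  f <- reformulate(predictors, response = names(cars_na)[p])
  lm(f, data = cars_na)
})

# Each model only drops the rows that are incomplete for its own variables
n_used <- sapply(models, nobs)
print(n_used)
```

Each fit uses a different number of rows. Missing values in the predictors are a different matter for stepwise selection, since dropping a predictor can change the number of usable rows between steps, which is one reason to pass na.omit-ed data as in the answer above.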

How to regress a list of covariates with a desired predictor and dependent variable and return a table of coefficients and p-values using lme and lmer

I have a dataset with a rather large number of variables. In the dataset I have a predictor and an outcome variable I want to investigate. I want to find covariates with either a significant effect on the outcome variable, or a significant interaction effect between the predictor and the covariate on the outcome variable.
It would therefore be convenient to be able to regress all the covariates in turn with the desired predictor on the dependent variable and create a table over the effects and interaction effects of the covariates with their respective p-values.
I want to do something like this:
library(dplyr)
# Generating sample data
set.seed(5)
df <- data.frame(matrix(round(abs(2*rnorm(100*100)), digits = 0), ncol=100))
# Selecting covariates
covar <- names(df)[! names(df) %in% c("X1", "X2")]
# Run the lm function over the list of covariates. I should get the covariate coefficients from each regression, but I get an error when I try to run this step.
coeff <- lapply(covar, function(x){
# Retrieve coefficient matrix
summary(lm(X1 ~ X2 + x + X2*x, df))$coefficients %>%
# Coerce into dataframe and filter for covariates and interaction effects
as.data.frame(.) %>%
filter(row.names(.) %in% grep(x, rownames(.), value =
TRUE))}) %>%
# Finally I want to join all data frames into one
bind_rows(.)
I could use some help with the syntax. I get the following error when I try to run the function:
Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': variable lengths differ (found for 'x')
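Inside the function passed to lapply, x is a character string (the covariate's name), not a column of df, so lm cannot match it to the data and reports that variable lengths differ. Build the model formula as a string with paste0 instead of writing x directly in the formula.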
lapply(covar, function(x){
modd <- paste0("X1 ~ X2 +", x, "+ X2 *", x)
summary(lm(modd, df))$coefficients %>%
as.data.frame(.) %>%
filter(row.names(.) %in% grep(x, rownames(.), value =
TRUE))}) %>%
bind_rows(.)
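An alternative to pasting strings is reformulate, which builds the formula object directly. The sketch below uses base R only and a smaller made-up data frame than the question's. The extra term and covariate columns are my own additions so that each row of the combined table stays identifiable.

```r
# Smaller made-up data than in the question: columns X1..X5
set.seed(5)
df <- data.frame(matrix(rnorm(50 * 5), ncol = 5))
covar <- setdiff(names(df), c("X1", "X2"))

coeff <- lapply(covar, function(x) {
  # Builds X1 ~ X2 + <x> + X2:<x> from character strings
  f <- reformulate(c("X2", x, paste0("X2:", x)), response = "X1")
  tab <- as.data.frame(summary(lm(f, data = df))$coefficients)
  tab$term <- rownames(tab)
  tab$covariate <- x
  # Keep the covariate's main effect and its interaction with X2
  tab[tab$term %in% c(x, paste0("X2:", x)), ]
})

result <- do.call(rbind, coeff)
rownames(result) <- NULL
print(result)
```

Matching term names exactly, rather than with grep, avoids picking up covariates whose names merely contain x as a substring.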

Fitting a quadratic curve for each data set that has different lengths

I would like to fit a quadratic to (Time,SkinTemp) for each id in the following data.frame df. Each id has a different number of Time,SkinTemp entries so I'm stuck with 'predict'
df<-data.frame(Time=seq(65),
SkinTemp=rnorm(65,37,0.5),
id=rep(1:10,c(5,4,10,6,7,8,9,8,4,4)))
So far I have:
#Fit the model y=x^2+x+C
fitted_models = df %>% group_by(id) %>% do(model = lm(SkinTemp ~ Time+I(Time^2), data = .))
So far so good. Here's where I'm stuck. How do I pass the original Time data into the predict function below?
#Predict data points for each quadratic
predQ<-sapply(unique(df$id), function(x) predict(fitted_models$model[[x]]))
Use fitted:
lapply(fitted_models$model, fitted)
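If predictions at specific Time values are needed rather than the in-sample fitted values, each model can be paired with its own subset of the data. The following is a base R sketch of that idea using split and Map instead of the dplyr pipeline; the names by_id, models and preds are illustrative.

```r
set.seed(1)
df <- data.frame(Time = seq(65),
                 SkinTemp = rnorm(65, 37, 0.5),
                 id = rep(1:10, c(5, 4, 10, 6, 7, 8, 9, 8, 4, 4)))

# One data frame per id, then one quadratic fit per data frame
by_id  <- split(df, df$id)
models <- lapply(by_id, function(d) lm(SkinTemp ~ Time + I(Time^2), data = d))

# Pair each model with the data it should predict on; here the original rows,
# but any data frame with a Time column would do
preds <- Map(function(m, d) predict(m, newdata = d), models, by_id)

print(lengths(preds))
```

The result is a list because the groups have different lengths, which is also why sapply cannot simplify the output to a matrix.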

extract log rank (score) test result with p-value for Coxph Model

I have 100 replicates of a coxph model fitted in a loop. I am trying to extract the log-rank score test result with p-values for each replicate into a data frame or list. I am using the following, but it gives me only the log-rank score, not the p-value. Any help will be much appreciated.
I can share dataset, but am not sure how to attach here.
thanks,
Krina
Repl_List <- unique(dat3$Repl)
doLogRank = function(sel_name) {
dum <- dat3[dat3$Repl == sel_name,]
reg <- with(dum, coxph(Surv(TIME_day, STATUS) ~ Treatment, ties = "breslow"))
LogRank <- with(reg, reg$score)
}
LogRank <- t(as.data.frame(lapply(Repl_List, doLogRank)))
Here is a mock example that I took from the help page of the coxph function. I just replicated the dataset 100 times to create your scenario. I highly recommend using the tidyverse packages for this kind of work. broom is a great addition along with dplyr and tidyr.
library(survival)
library(tidyverse)
library(broom)
test <- data.frame(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
Below I am replicating the dataset 100 times using the replicate function.
r <- replicate(test,n = 100,simplify = FALSE) %>% bind_rows %>%
mutate(rep = rep(seq(1,100,1),each=7))
I set up the Cox model as a small function that I can then pass on to each replicate of the dataframe.
cxph_mod <- function(df) {
coxph(Surv(time, status) ~ x + strata(sex), df)
}
Below is the step-by-step process of fitting the model and extracting the values.
tidyr::nest the dataframe
purrr::map the model into each nest
nest is a function in library(tidyr)
map is a function similar to lapply in library(purrr)
nested <- r %>%
group_by(rep) %>%
nest %>%
mutate(model = data %>% map(cxph_mod))
Look into the first rep to see the coxph output. You will see the model object stored in the cells of the dataframe, allowing easier access.
nested %>% filter(rep==1)
With each model object, now use broom to get the parameter estimates from the model into the nested dataset
nested <- nested %>%
mutate(
ests = model %>% map(broom::tidy)
)
tidyr::unnest to view the estimates from fitting each resampled dataset
ests <- nested %>% ungroup() %>% dplyr::select(rep, ests) %>% unnest(ests) %>% dplyr::select(rep, estimate:p.value)
In this case since I am repeating the same dataset 100 times, the pvalue will be the same, but in your case you will have 100 different datasets and hence 100 different p.values.
ggplot(data=ests,aes(y=p.value,x=rep))+geom_point()
Vijay
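The estimates above are per-coefficient tests. For the score (log-rank) test asked about in the question, one option is the sctest element of summary() on a coxph fit, which carries the statistic, its degrees of freedom and the p-value. The sketch below is a base R version under the same mock data; only 3 replicates are used to keep it short, and reps and score_tests are illustrative names.

```r
library(survival)

# Mock data from the coxph help page, as in the answer above
test <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                   status = c(1, 1, 1, 0, 1, 1, 0),
                   x      = c(0, 2, 1, 1, 1, 0, 0),
                   sex    = c(0, 0, 0, 0, 1, 1, 1))

# Three identical replicates stand in for the question's 100 datasets
reps <- lapply(1:3, function(i) test)

score_tests <- do.call(rbind, lapply(seq_along(reps), function(i) {
  fit <- coxph(Surv(time, status) ~ x + strata(sex), data = reps[[i]])
  sc  <- summary(fit)$sctest  # named vector: test, df, pvalue
  data.frame(rep = i,
             score   = sc[["test"]],
             df      = sc[["df"]],
             p.value = sc[["pvalue"]])
}))

print(score_tests)
```

With the question's own data, the same pattern would loop over the values of Repl and use the Treatment model with ties = "breslow".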
