is there an R function by which you I can extract GAM summary output as a table to be presented in a document? - r

I have run a GAM and obtained a summary output. I want to extract the output of summary into a table, so that I can use it in a document.

Let's say your gam model is called gamfit.
You can use l <- as.list(summary(gamfit)) to store this in a list.
Then access the parts with l[[1]], l[[2]], ...
The problem with this is that, depending on how many variables are used in the analysis, the data you want may move around and the list items aren't named. If the model isn't going to change, then it's fine.
Also note that all the information about the model is stored in gamfit.
Explore this in the Rstudio Environment window or at the command line with names(gamfit), followed by gamfit$name_of_interest.

Related

How to extract variables from tab_model in R to create new data frame?

Example output of tab_model
I have created a table from tab_model that includes multiple models and wish to extract all 'p-values' and 'Estimates/Odds Ratio' to create a data frame that includes these. Output of tab_model is an html file. I am unable to find a function to pull this info in accordance, any ideas on how I could do this?
For example, I want to retrieve all p-values and Estimates for variable 'age' in all of my models...Only 3 in example image but I have hundreds
You should get these values from the regression models themselves, instead of outputting them to a HTML-table, and then extract them.
Without further knowledge of your process and data it is difficult to provide a more concrete answer.

Store regression models in dataframe

I conduct a large number of regression analyses using ols and cph (different models, sensitivity analyses etc) which takes on my computer around two hours. Therefore, I would like to save these models so that I don't have to re-run the same analyses every time I want to work with them. The models all have very structured names, so I can create a list of names as follows:
model.names <- list()[grep("^im", ls())
But how can I use this to save those models? Could they be placed into a data frame?
I think you are looking for save()
save writes an external representation of R objects to the specified file. The objects can be read back from the file at a later date by using the function load or attach (or data in some cases).

What data structure in R is suitable to store models?

Say I'm training models:
lm.1, lm.2, ... , lm.100
I would like to refer back to these later in my code for various purposes: say to inspect coefficients, run test data against them, etc.
What data structure should I use to store them in?
A list is what I've been using but lists for some reason seem a bit unwieldy of all the data structures in R
It is certainly a list.
A complete lm object is a list; the only thing in R that can hold a list is just a list. We have no other option.
In some cases, even if the return of a model is just a vector, we can not guarantee the resulting vectors are of equal length for all models we try, so we still have to use a list.

Generating variable names for dataframes based on the loop number in a loop in R

I am working on developing and optimizing a linear model using the lm() function and subsequently the step() function for optimization. I have added a variable to my dataframe by using a random generator of 0s and 1s (50% chance each). I use this variable to subset the dataframe into a training set and a validation set If a record is not assigned to the training set it is assigned to the validation set. By using these subsets I am able to estimate how good the fit of the model is (by using the predict function for the records in the validation set and comparing them to the original values). I am interested in the coefficients of the optimized model and in the results of the KS-test between the distributions of the predicted and actual results.
All of my code was working fine, but when I wanted to test whether my model is sensitive to the subset that I chose I ran into some problems. To do this I wanted to create a for (i in 1:10) loop, each time using a different random subset. This turned out to be quite a challenge for me (I have never used a for loop in R before).
Here's the problem (well actually there are many problems, but here is one of them):
I would like to have separate dataframes for each run in the loop with a unique name (for example: Run1, Run2, Run3). I have been able to create a variable with different strings using paste(("Run",1:10,sep=""), but that just gives you a list of strings. How do I use these strings as names for my (subsetted) dataframes?
Another problem that I expect to encounter:
Subsequently I want to use the fitted coefficients for each run and export these to Excel. By using coef(function) I have been able to retrieve the coefficients, however the number of coefficients included in the model may change per simulation run because of the optimization algorithm. This will almost certainly give me some trouble with pasting them into the same dataframe, any thoughts on that?
Thanks for helping me out.
For your first question:
You can create the strings as before, using
df.names <- paste(("Run",1:10,sep="")
Then, create your for loop and do the following to give the data frames the names you want:
for (i in 1:10){
d.frame <- # create your data frame here
assign(df.name[i], d.frame)
}
Now you will end up with ten data frames with ten different names.
For your second question about the coefficients:
As far as I can tell, these don't naturally fit into your data frame structure. You should consider using lists, as they allow different classes - in other words, for each run, create a list containing a data frame and a numeric vector with your coefficients.
Don't create objects with numbers in their names, and then try and access them in a loop later, using get and paste and assign. The right way to do this is to store your elements in an R list object.

function for removing nonsignificant variables at one step in R

I am trying to automate logistic regression in R.
Basically, my source code will generate a new equation everyday as the input data is updated,
(Variables, data format etc are same) and print out te significant variables with corresponding coefficients.
When I use step function, sometimes the resulting coefficients are not significant. Therefore, I want to update my set of coefficients and get rid of all the ones that are not significant enough.
Is there a function or automated way of doing it?
If not, the only way I can think of is writing a script on another language that takes the coefficients and corresponding P value and checking significance, and rerunning R accordingly. But even for that, do you know how can I get only P values and coefficients of variables. I can either print whole summary of regression result with "summary" function. I can't reach only P values.
Thank you very much
It's a bit hard for me without sample code and data, but you can subset based on variable values like this,
newdata <- data[ which(data$p.value < 0.5), ]
You can inspect your R object using str, see ?str to figure out how to select whatever you want to use in your subset $p.value or $residuals.
If this doesn't answer your question try submitting some sample code and data.
Best,
Eric

Resources