I want to run a linear regression where my dependent variable is data$fs_deviation_score and the independent variables are multiple columns of my data frame (columns 660 to 675).
With this function it works, but I am not able to save the output (Coefficients: Estimate, Std. Error, p-value, ...):
Reg<-lapply( data[660:675], function(x) summary(lm(data$fs_deviation_score ~ x)))
I am looking for a function to save the output (Coefficients: Estimate, Std. Error, p-value, ...).
Thanks
If you have that Reg object produced by:
Reg<-lapply( data[660:675], function(x) summary(lm(data$fs_deviation_score ~ x)))
Then all you need to do to extract the coefficients matrix from each summary object is this:
CoefMats <- lapply( Reg, coef)
The summary-objects are actually named lists (as are lm and glm objects). ?summary.lm will bring up the specific help page and the Value subsection will give you all the names. The See Also links should also be reviewed.
Had you wanted the r.squared values, you could have used this:
Rsqds <- lapply(Reg, "[[", "r.squared")
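To actually save those coefficient matrices, one option is to stack them into a single data frame and write it out with write.csv. The following is only a minimal sketch: the built-in mtcars data stands in for the asker's data frame, the three predictor columns are an arbitrary choice, and the names CoefTable, predictor, term and out_file are made up for illustration.

```r
# Stand-in data: mtcars replaces the asker's data frame (an assumption)
dat <- mtcars
Reg <- lapply(dat[c("wt", "hp", "disp")],
              function(x) summary(lm(dat$mpg ~ x)))

# One coefficient matrix per predictor
CoefMats <- lapply(Reg, coef)

# Stack the matrices into one data frame, recording predictor and term
CoefTable <- do.call(rbind, lapply(names(CoefMats), function(nm) {
  data.frame(predictor = nm,
             term = rownames(CoefMats[[nm]]),
             CoefMats[[nm]],
             row.names = NULL,
             check.names = FALSE)  # keep names such as "Std. Error" intact
}))

# Write to disk (a temporary file here; any path would do)
out_file <- tempfile(fileext = ".csv")
write.csv(CoefTable, out_file, row.names = FALSE)
```

Each predictor contributes two rows (intercept and slope), so the resulting file has one row per estimated coefficient.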
Example:
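m <- lm (as.integer (Species) ~ ., data = iris);  # formula assumed: species code against all other columns
save (m, file = "iris_lm.RData");  # the file name must be passed as file=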
Now in a new session.
load ("iris_lm.RData");
summary (m);
names (m);
You can use the save function to save the entire object and read and use it later. Note that this is a general way to store any R object.
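For a single object, saveRDS and readRDS are an alternative to save and load: the object can be read back under any name. A small sketch, assuming a model fitted on the built-in iris data and a temporary file (both chosen only for illustration):

```r
# Fit a model on a built-in data set
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# Save the single object (temporary file used for illustration)
rds_file <- tempfile(fileext = ".rds")
saveRDS(m, rds_file)

# Later, possibly in a new session: read it back under any name
m2 <- readRDS(rds_file)
coef(summary(m2))
```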
To store individual components of the model m, you can write them to a file. For example, to store the residuals:
write.csv (m$residuals, "residuals.csv");
To see which components are available, run names(m) for the model or names(summary(m)) for its summary; help(summary.lm) describes each one.
If you want to save the output of the regression model in tibble format, you can use the broom package. This works for other model objects too.
Also, the gtsummary package helps to get output in a presentation-ready table format.
For examples; see this
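A short sketch of the broom approach, assuming the package is installed (the code is wrapped in a requireNamespace check so it does nothing otherwise). The mtcars model and the object names are illustrative only.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("broom", quietly = TRUE)) {
  # One row per term: estimate, std.error, statistic, p.value
  coef_tbl <- broom::tidy(fit)
  # One-row model summary, including r.squared and adj.r.squared
  model_tbl <- broom::glance(fit)
  # A tibble can be written out like any data frame
  write.csv(coef_tbl, tempfile(fileext = ".csv"), row.names = FALSE)
}
```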
For documentation purposes, my department requires that I keep copies of the raw output from models, not just an organized table of the parts I want to keep and publish.
For example, here is a function I wrote to run a CoxPH model and organize the output into a table. I have several iterations of this function for various outcomes, I use lapply() to call them all. Here is a simplified version of the code:
myfun <- function(x){
model <- summary(coxph(Surv(FAIL, DEAD)~ x + covariate1 + covariate2, data=df))
# just pull out the parts I need
results <- data.frame(cbind(model$conf.int[,c("exp(coef)",
"lower .95", "upper .95")],
coef(model)[,"Pr(>|z|)"]))
names(results) <- c("RR", "LL", "UL", "pval")
# format the results for the table
results$answer <- paste0(format(round(results[,"RR"],2),2), " (",
format(round(results[,"LL"],2),2), ", ",
format(round(results[,"UL"],2),2), ")")
# drop all the results for covariates
results <-results[substr(row.names(results),1,nchar(expo))==expo,
c("answer","pval")]
return(results)
}
The final "results" is just a nice clean data frame of the main independent variable, RR (95% CI) and a p-value, everything runs fine.
When I call the function:
outcomes <- c("DEAD","HEARTDISEASE","LUNGCANCER")
final.results <- lapply(outcomes, function(x) myfun(x))
In this case I get a nice list of three data frames with my organized results. I can rbind() each element of the list and export it to Excel. (In truth, there are a lot more outcomes, an age-only model and a fully adjusted model, and several main effect variables, but it all basically works like the code above.)
Now, I need to print in the console the original summary() of the Cox model, and still keep my final list of organized output. I have done this by adding the following to the above code:
myfun <- function(x){
# all that code listed above #
# create a list to hold both raw output and my cleaned results
final <- list()
final$model <- model
final$results <- results
return(final)
}
Then after the function call, I do some separate coding to separate out the summary() portion for each iterative run, print them, then do the same for the final organized results.
Ideally I would just like each iterative run of the function to only return the nice clean table, but print the entire summary(model) to the console. How can I make that happen in the function that will still work with my lapply() call?
Is there a simpler method than just creating a giant list of everything and dealing with it after all the models are run?
Thank you.
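One possible approach, sketched under assumptions: call print() on the summary inside the function, so the raw output goes to the console as a side effect, and return only the organized table. To keep the sketch runnable without the asker's data, an lm on mtcars stands in for the Cox model; myfun_sketch and the chosen columns are hypothetical. The same pattern should apply to summary(coxph(...)), since print() works on that object as well.

```r
# Stand-in for the Cox model: a linear model on mtcars (an assumption)
myfun_sketch <- function(outcome) {
  f <- reformulate(c("wt", "hp"), response = outcome)
  model <- summary(lm(f, data = mtcars))

  # Raw output goes to the console as a side effect
  cat("\n==== Outcome:", outcome, "====\n")
  print(model)

  # Only the organized table is returned
  results <- as.data.frame(coef(model))[, c("Estimate", "Pr(>|t|)")]
  names(results) <- c("estimate", "pval")
  results <- results["wt", , drop = FALSE]  # keep the main effect only
  rownames(results) <- outcome
  results
}

outcomes <- c("mpg", "qsec")
final.results <- lapply(outcomes, myfun_sketch)
do.call(rbind, final.results)
```

If the raw output has to end up in a file rather than on screen, wrapping the lapply() call in sink() or capture.output() is one way to redirect it without changing the function.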
I know there are a lot of posts about how to extract the p-value from an aov. However, I have a list with several thousand samples. I did an aov for each sample to compare two different treatments, and now I am looking for a way to get a list with all the p-values, as I cannot copy them one by one.
Is this even possible?
I had no problems doing this for the p-values created by a t-test:
results <- apply(data,1,function(x){t.test(x[1:3],x[4:6])$p.value})
data is my imported .csv, and [1:3] indicates the columns that are compared with the columns [4:6].
So that really was not a problem, but it does not seem possible to do something similar for the aov:
results <- apply(data,1,function(x){aov(x[1:3]~x[4:6])})
I cannot get a list with all the p-values (which are called Pr(>F)), which is kind of frustrating.
I hope you understand what I am trying to do.
results <- apply(data,1,function(x){anova(aov(x[1:3]~x[4:6]))[['Pr(>F)']][1]})
You'll probably want lapply if the data is in a list already. And you can use summary to get the p-values from aov:
lapply(yourData, function(x){
av <- aov(yourFormula, data = x)
summary(av)[[1]][,5]
})
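Note that a formula such as x[1:3] ~ x[4:6] treats the second treatment's values as a numeric predictor rather than as a second group. If the goal is to compare two treatments row by row, one option is to model all six values against a grouping factor. The sketch below uses simulated data; the layout (columns 1 to 3 for treatment A, 4 to 6 for treatment B) follows the question, and the object names are made up. With only two groups, the one-way ANOVA p-value coincides with that of the equal-variance t-test, which gives a handy check.

```r
set.seed(1)
# Simulated stand-in: 5 samples (rows), columns 1-3 = treatment A, 4-6 = treatment B
dat <- matrix(rnorm(30), nrow = 5)
group <- factor(rep(c("A", "B"), each = 3))

# One aov per row; the first row of the summary table is the treatment effect
pvals <- apply(dat, 1, function(x) {
  fit <- aov(x ~ group)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Cross-check against the pooled-variance t-test
pvals_t <- apply(dat, 1, function(x) {
  t.test(x[1:3], x[4:6], var.equal = TRUE)$p.value
})
all.equal(pvals, pvals_t)
```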
Function lm(...) returns an object of class 'lm'. How do I create an array of such objects? I want to do the following:
my_lm_array <- rep(as.lm(NULL), 20)
#### next, populate this array by running lm() repeatedly:
for(i in 1:20) {
my_lm_array[i] <- lm(my_data$results ~ my_data[i,])
}
Obviously the line "my_lm_array <- rep(as.lm(NULL), 20)" does not work. I'm trying to create an array of objects of type 'lm'. How do I do that?
Not sure it will answer your question, but if what you want to do is run a series of lm from a variable against different columns of a data frame, you can do something like this :
data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
my_lms <- lapply(data[,c("v1","v2")], function(v) {
lm(data$result ~ v)
})
Then, my_lms would be a list of elements of class lm.
Well, you can create an array of empty/meaningless lm objects as follows:
z <- NA
class(z) <- "lm"
lm_array <- replicate(20,z,simplify=FALSE)
but that's probably not the best way to solve the problem. You could just create an empty list of the appropriate length (vector("list",20)) and fill in the elements as you go along: R is weakly enough typed that it won't mind you replacing NULL values with lm objects. More idiomatically, though, you can run lapply on your list of predictor names:
my_data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
prednames <- setdiff(names(my_data),"result") ## extract predictor names
lapply(prednames,
function(n) lm(reformulate(n,response="result"),
data=my_data))
Or, if you don't feel like creating an anonymous function, you can first generate a list of formulae (using lapply) and then run lm on them:
formList <- lapply(prednames,reformulate,response="result") ## create formulae
lapply(formList,lm,data=my_data) ## run lm() on each formula in turn
will create the same list of lm objects as the first strategy above.
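For completeness, the pre-allocated list mentioned earlier (vector("list", n) filled in a loop) could look roughly like this. It is a sketch on simulated data, and my_lm_list is a made-up name.

```r
set.seed(42)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10), v2 = rnorm(10))
prednames <- setdiff(names(my_data), "result")

# Pre-allocate an empty list with one slot per model
my_lm_list <- vector("list", length(prednames))
names(my_lm_list) <- prednames

for (n in prednames) {
  # Use [[ ]] (not [ ]) to store a whole lm object in one list slot
  my_lm_list[[n]] <- lm(reformulate(n, response = "result"), data = my_data)
}

sapply(my_lm_list, class)
```

The single-bracket assignment in the question (my_lm_array[i] <- ...) is one reason the original loop fails: an lm object is itself a list, so it needs to go into a slot with double brackets.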
In general it is good practice to avoid using syntax such as my_data$result inside modeling formulae; instead, try to set things up so that all the variables in the model are drawn from inside the data object. That way methods like predict and update are more likely to work correctly ...
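A small illustration of why this matters, using simulated data and made-up object names: with variables drawn from data, predict() can match them in newdata; with my_data$... inside the formula it cannot, and it falls back on the original rows.

```r
set.seed(42)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10))
new_data <- data.frame(v1 = c(-1, 0, 1))

# Variables taken from `data`: predict() finds `v1` in new_data
fit_good <- lm(result ~ v1, data = my_data)
pred_good <- predict(fit_good, newdata = new_data)

# Variables hard-coded with $: new_data has no column named `my_data$v1`,
# so predict() silently reuses the original 10 rows (and warns about it)
fit_bad <- lm(my_data$result ~ my_data$v1)
pred_bad <- suppressWarnings(predict(fit_bad, newdata = new_data))

length(pred_good)  # one prediction per row of new_data
length(pred_bad)   # one prediction per row of my_data
```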
I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame but I'm having trouble.
Whenever I run the code below, I get a data frame called "coefficients" containing every zip code but with the intercept and coefficient for every zipcode being equal to the results of the simple regression lm(Sealed$hhincome ~ Sealed$square_footage).
When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with STATA, so any help would be greatly appreciated :)
R extract regression coefficients from multiply regression via lapply command
library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)
Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:
dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)
The first two arguments to lm are formula and data. Since formula is specified here, lm will pick up the next argument it is given (the relevant zipcode-specific chunk of Sealed) as the data argument ...
You are applying the function:
x <- function(df) {
lm(Sealed$hhincome ~ Sealed$square_footage)
}
to each subset of your data, so we shouldn't be surprised that the output each time is exactly
lm(Sealed$hhincome ~ Sealed$square_footage)
right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
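A sketch of the corrected function, assuming simulated data in place of SEALED.csv (the column names are taken from the question; the values are invented). To stay free of package dependencies it uses base R split() and lapply() instead of dlply() and ldply(); the function itself would work unchanged with plyr.

```r
set.seed(1)
# Simulated stand-in for SEALED.csv (column names from the question)
Sealed <- data.frame(
  zipcode = rep(c("10001", "10002", "10003"), each = 20),
  square_footage = runif(60, 500, 3000)
)
Sealed$hhincome <- 20000 + 15 * Sealed$square_footage + rnorm(60, sd = 5000)

# The function now uses its argument `df`, i.e. the zipcode-specific chunk
fit_chunk <- function(df) {
  lm(hhincome ~ square_footage, data = df)
}

# Base-R counterpart of dlply()/ldply(): split, fit, collect coefficients
regressions <- lapply(split(Sealed, Sealed$zipcode), fit_chunk)
coefficients <- do.call(rbind, lapply(regressions, coef))
coefficients
```

Each row of the result now comes from a separate fit, so the intercepts and slopes differ between zipcodes.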
The issue is not with plyr but rather in the definition of the function. You are calling a function, but not doing anything with the variable.
As an analogy,
myFun <- function(x) {
3 * 7
}
> myFun(2)
[1] 21
> myFun(578)
[1] 21
If you run this function on different values of x, it will still give you 21, no matter what x is. That is, there is no reference to x within the function. In my silly example, the correction is obvious; in your function above, the confusion is understandable. The $hhincome and $square_footage should conceivably serve as variables.
But you want your x to vary over what comes before the $. As @Joran correctly pointed out, swap Sealed$hhincome with df$hhincome (and same for $squ..) and that will help.
For each of 100 data sets, I am using lm() to generate 7 different equations and would like to extract and compare the p-values and adjusted R-squared values.
Kindly assume that lm() is in fact the best regression technique possible for this scenario.
In searching the web I've found a number of useful examples for how to create a function that will extract this information and write it elsewhere, however, my code uses paste() to label each of the functions by the data source, and I can't figure out how to include these unique pasted names in the function I create.
Here's a mini-example:
temp <- data.frame(labels=rep(1:10),LogPre= rnorm(10))
temp$labels2<-temp$labels^2
testrun<-c("XX")
for (i in testrun)
{
assign(paste(i,"test",sep=""),lm(temp$LogPre~temp$labels))
assign(paste(i,"test2",sep=""),lm(temp$LogPre~temp$labels2))
}
I would then like to extract the coefficients of each equation
But the following doesn't work:
summary(paste(i,"test",sep="")$coefficients)
and neither does this:
coef(summary(paste(i,"test",sep="")))
Both generate the error: $ operator is invalid for atomic vectors
EVEN THOUGH
summary(XXtest)$coefficients
and
coef(summary(XXtest))
work just fine.
How can I use paste() within summary() to allow me to do this for AAtest, AAtest2, ABtest, ABtest2, etc.?
Thanks!
Hard to tell exactly what your purpose is, but some kind of apply loop may do what you want in a simpler way. Perhaps something like this?
temp <- data.frame(labels=rep(1:10),LogPre= rnorm(10))
temp$labels2<-temp$labels^2
testrun<-c("XX")
names(testrun) <- testrun
out <- lapply(testrun, function(i) {
list(test1=lm(temp$LogPre~temp$labels),
test2=lm(temp$LogPre~temp$labels2))
})
Then to get all the p-values for the slopes you could do:
> sapply(out, function(i) sapply(i, function(x) coef(summary(x))[2,4]))
XX
test1 0.02392516
test2 0.02389790
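The question also asks for adjusted R-squared values. A sketch that collects both statistics into one data frame, assuming the models are kept in a named list (the names models and stats are illustrative, and the data are simulated as in the question):

```r
set.seed(1)
temp <- data.frame(labels = 1:10, LogPre = rnorm(10))
temp$labels2 <- temp$labels^2

models <- list(
  test1 = lm(LogPre ~ labels,  data = temp),
  test2 = lm(LogPre ~ labels2, data = temp)
)

# One row per model: slope p-value and adjusted R-squared
stats <- do.call(rbind, lapply(names(models), function(nm) {
  s <- summary(models[[nm]])
  data.frame(model = nm,
             slope_p = coef(s)[2, "Pr(>|t|)"],
             adj_r2 = s$adj.r.squared)
}))
stats
```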
Just using paste results in a character string, not the object with that name. You need to tell R to get the object with that name by using get.
summary(get(paste(i,"test",sep="")))$coefficients
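When several such objects exist, mget() fetches them all at once as a named list, which can then be processed with lapply(). A minimal sketch, reusing the naming scheme from the question on simulated data (one, all_names and coef_list are made-up names):

```r
set.seed(1)
temp <- data.frame(labels = 1:10, LogPre = rnorm(10))
temp$labels2 <- temp$labels^2

# Objects created by name, as in the question
for (i in c("XX")) {
  assign(paste0(i, "test"),  lm(LogPre ~ labels,  data = temp))
  assign(paste0(i, "test2"), lm(LogPre ~ labels2, data = temp))
}

# get() fetches one object by name
one <- coef(summary(get(paste0("XX", "test"))))

# mget() fetches several objects as a named list
all_names <- paste0("XX", c("test", "test2"))
coef_list <- lapply(mget(all_names), function(m) coef(summary(m)))
names(coef_list)
```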