regarding residual storing in bootstrap regression in R

I am trying to do bootstrap regression by resampling (X, Y) pairs from the original sample.
I followed a more manual approach (without using a dedicated bootstrap package).
This is my work so far:
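library(dplyr)  # provides %>% and sample_n()
set.seed(326581)
X1 <- rnorm(10, 0, 1)
Y1 <- rnorm(10, 0, 2)
data <- data.frame(X1, Y1)
# draw 100 bootstrap samples of 10 rows each, with replacement
lst <- replicate(
  100,
  data %>% sample_n(10, replace = TRUE),
  simplify = FALSE)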
The list contains 100 bootstrap samples; each sample is a data frame with 2 columns (X1, Y1) and 10 rows.
To get the bootstrap residuals, I separated the X and Y columns into two separate data frames as follows:
new1=data.frame(lapply(lst, `[`, 'X1'))
new2=data.frame(lapply(lst, `[`, 'Y1'))
After that, I tried to store the residuals from each fitted model using the following code:
res=c()
for(i in 1:100)
{
res[i]=residuals(lm(new2[,i]~new1[,i]))
}
But something seems to be wrong. Can anyone help me figure out what?
By the way, is there an easier approach than this?

You're making this unnecessarily complicated. The whole advantage of storing objects in a list is that you can easily loop through them with, e.g., lapply or sapply.
So, for example, to store the residuals of a linear model fit, you can do
res <- lapply(lst, function(df) residuals(lm(Y1 ~ X1, data = df)))
This fits a linear model of the form lm(Y1 ~ X1) to every data.frame in lst and stores the residuals in a list of 100 vectors:
length(res)
#[1] 100
You could also store the residuals from the lm fits to all 100 sampled data.frames in a 10x100 matrix by using sapply instead of lapply:
res <- sapply(lst, function(df)
residuals(lm(Y1 ~ X1, data = df)))
dim(res)
#[1] 10 100
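As for why the original loop misbehaves: res[i] addresses a single element, while residuals() returns one value per row, so R keeps only the first value and warns about the length mismatch. A minimal sketch of a corrected loop follows, assuming a small stand-in data set (dat and boot are illustrative names, not the asker's objects) and base R resampling instead of dplyr.

```r
set.seed(1)
dat <- data.frame(X1 = rnorm(10), Y1 = rnorm(10, 0, 2))  # stand-in data

# 5 bootstrap samples drawn by resampling row indices with replacement
boot <- replicate(5, dat[sample(nrow(dat), replace = TRUE), ], simplify = FALSE)

# Pre-allocate a list; each slot res[[i]] can hold a whole residual vector
res <- vector("list", length(boot))
for (i in seq_along(boot)) {
  res[[i]] <- residuals(lm(Y1 ~ X1, data = boot[[i]]))
}

lengths(res)  # one residual per row in every sample
```

The only change that matters is res[[i]] on a list in place of res[i] on a plain vector.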
Update
In response to your comment, you can do the following.
First, calculate and store the residuals and residual-derived weights in every data.frame in the list.
# Add residuals and weights to each data.frame in lst
lst <- lapply(lst, function(df) {
  df$res <- residuals(lm(Y1 ~ X1, data = df))
  df$weights <- 1 / fitted(lm(abs(res) ~ X1, data = df))^2
  df
})
Then run a weighted linear regression and return the second (slope) coefficient:
# Return the 2nd coefficient (slope) of the weighted regression
coeff <- lapply(lst, function(df)
  coefficients(lm(Y1 ~ X1, data = df, weights = weights))[2])
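Once the slopes are collected, a common next step is to summarise their bootstrap distribution. The sketch below is only an illustration under stated assumptions: it uses its own simulated data (dat, boot and slopes are hypothetical names) and plain, unweighted lm fits. A list such as coeff above could be flattened with unlist() and summarised the same way.

```r
set.seed(2)
dat <- data.frame(X1 = rnorm(30))
dat$Y1 <- 1 + 0.5 * dat$X1 + rnorm(30)  # simulated stand-in data

# 200 bootstrap samples of (X1, Y1) pairs
boot <- replicate(200, dat[sample(nrow(dat), replace = TRUE), ], simplify = FALSE)

# One slope per bootstrap sample, returned as a numeric vector
slopes <- vapply(boot,
                 function(df) coef(lm(Y1 ~ X1, data = df))[["X1"]],
                 numeric(1))

se_boot <- sd(slopes)                         # bootstrap standard error
ci_boot <- quantile(slopes, c(0.025, 0.975))  # percentile interval
```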

Related

how to store results from multiple regressions in a single dataframe in a neat way

I am going to run dozens of regressions of different Ys on the same X. I want to store the coefficients and standard errors of each regression in a single data frame.
The data frame looks like
Y1, Y2, Y3, ... , Y50, X
1, 2, 3, ..., 50, 1
...
I can do it like this for each Y:
model1 <- lm (Y1~X, data = data)
summary1 <- summary(model1)
list1 <- list(coef=summary1$coefficients[2,1],se=summary1$coefficients[2,2])
# only coef of X is of interest
And then generate the data frame I want with
df <- as.data.frame(list1,list2,...,list50)
I am a rookie; is there a neater way to do this in R? I tried to write a function that takes the variable name as input, but it fails if I define it as function(variable) and use variable directly inside the function.
Thank you so much for your inspiration.

You can use lapply to loop over each Y variable.
cols <- grep('Y\\d+', names(data), value = TRUE)  # names of the response columns
df <- do.call(rbind, lapply(cols, function(x) {
  model <- lm(reformulate('X', x), data)
  smry <- summary(model)
  # row 2 of the coefficient table is the slope on X
  data.frame(coef = smry$coefficients[2, 1],
             se = smry$coefficients[2, 2])
}))
df
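To see the pattern end to end, here is a small self-contained sketch on simulated data. The data frame, its three responses and the extra response column in the output are assumptions made for illustration; only the lapply plus rbind structure comes from the answer.

```r
set.seed(3)
X <- rnorm(40)
data <- data.frame(Y1 = 2 * X + rnorm(40),
                   Y2 = -X + rnorm(40),
                   Y3 = rnorm(40),
                   X = X)

# Response column names, matched as Y followed by digits
cols <- grep("^Y\\d+$", names(data), value = TRUE)

out <- do.call(rbind, lapply(cols, function(y) {
  # Coefficient table of the fit y ~ X
  ct <- summary(lm(reformulate("X", response = y), data = data))$coefficients
  data.frame(response = y,
             coef = ct["X", "Estimate"],
             se = ct["X", "Std. Error"])
}))
out
```

Indexing the coefficient table by name ("X", "Estimate") instead of by position keeps the code correct if more terms are added to the formula later.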

fitting linear regression models with different predictors using loops

I want to fit regression models using a single predictor variable at a time. In total I have 7 predictors and 1 response variable. I want to write a chunk of code that picks a predictor variable from the data frame and fits a model. I would further like to extract the regression coefficient (not the intercept) and its sign, and store them in 2 vectors. Here's my code:
for (x in (1:7))
{
fit <- lm(distance ~ FAA_unique_with_duration_filtered[x] , data=FAA_unique_with_duration_filtered)
coeff_values<-summary(fit)$coefficients[,1]
coeff_value<-coeff_values[2]
append(coeff_value_vector,coeff_value , after = length(coeff_value_vector))
append(RCs_sign_vector ,sign(coeff_values[2]) , after = length(RCs_sign_vector))
}
Here x should select the first column, then the 2nd, and so on. However, I am getting the following error:
Error in model.frame.default(formula = distance ~ FAA_unique_with_duration_filtered[x], :
invalid type (list) for variable 'FAA_unique_with_duration_filtered[x]'
Is there a way to do this using loops?

You don't really need loops for this.
Suppose we want to regress y1, the 5th column of the built-in anscombe dataset, separately on each of the first 4 columns.
Then:
a <- anscombe
reg <- function(i) coef(lm(y1 ~., a[c(5, i)]))[[2]] # use lm
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
# or
a <- anscombe
reg <- function(i) cov(a$y1, a[[i]]) / var(a[[i]]) # use formula for slope
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
Alternatively, use the following loop, where reg is either of the reg definitions above.
a <- anscombe
coefs <- numeric(4)
for(i in 1:4) coefs[i] <- reg(i)
signs <- sign(coefs)
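For completeness, the loop in the question fails for two reasons: single-bracket indexing such as df[x] returns a one-column data frame (a list) and not a numeric vector, and append() returns a new vector that is discarded unless it is assigned back. A minimal sketch of a working loop follows, using the built-in mtcars data as a stand-in because the asker's data frame is not available; the predictor names are chosen only for illustration.

```r
d <- mtcars                          # stand-in for the asker's data frame
predictors <- c("wt", "hp", "disp")  # illustrative predictor columns

coef_values <- numeric(0)
coef_signs <- numeric(0)
for (p in predictors) {
  # Build the formula mpg ~ <predictor> from the column name
  fit <- lm(reformulate(p, response = "mpg"), data = d)
  b <- coef(fit)[[p]]                 # slope, not the intercept
  coef_values <- c(coef_values, b)    # assign the grown vector back
  coef_signs <- c(coef_signs, sign(b))
}
names(coef_values) <- predictors
```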

Loop linear regression different predictor and outcome variables

I'm new to R but am slowly learning it to analyse a data set.
Let's say I have a data frame which contains 8 variables and 20 observations. Of the 8 variables, V1 - V3 are predictors and V4 - V8 are outcomes.
B <- matrix(1:160,
            nrow = 20,
            ncol = 8)
df <- as.data.frame(B)
Using the car package, the code to fit a simple linear regression and display the summary and confidence intervals is:
fit <- lm(V4 ~ V1, data = df)
summary(fit)
confint(fit)
How can I write code (loop or apply) so that R regresses each outcome on each predictor individually and extracts the coefficients and confidence intervals? I realise I'm probably trying to run before I can walk, but any help would be really appreciated.

You could wrap your lines in an lapply call and fit a linear model for each of the other columns (excluding the target, of course).
my.target <- 4
my.predictors <- (1:8)[-my.target]  # parentheses needed: 1:8[-my.target] evaluates to 1:8
lapply(my.predictors, function(i) {
  fit <- lm(df[, my.target] ~ df[, i])
  list(summary = summary(fit), confint = confint(fit))
})
You obtain a list of lists.
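If every outcome (V4 to V8) should be regressed on every predictor (V1 to V3), one option is to enumerate the pairs with expand.grid and collect one row per fit. This is a sketch under assumptions: it uses random data, not the 1:160 matrix, since those columns are exact linear functions of each other, and the result column names are made up for illustration.

```r
set.seed(4)
df <- as.data.frame(matrix(rnorm(160), nrow = 20, ncol = 8))  # columns V1..V8

predictors <- paste0("V", 1:3)
outcomes <- paste0("V", 4:8)
grid <- expand.grid(outcome = outcomes, predictor = predictors,
                    stringsAsFactors = FALSE)

rows <- lapply(seq_len(nrow(grid)), function(k) {
  p <- grid$predictor[k]
  fit <- lm(reformulate(p, response = grid$outcome[k]), data = df)
  ci <- confint(fit)[p, ]  # 95% interval for the slope
  data.frame(outcome = grid$outcome[k], predictor = p,
             slope = coef(fit)[[p]], lower = ci[[1]], upper = ci[[2]])
})
results <- do.call(rbind, rows)  # 15 rows, one per outcome-predictor pair
```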

So, the code for my own data, which returns the error, is:
my.target <- metabdata[c(34)]
my.predictors <- metabdata[c(18 : 23)]
lapply(my.predictors, (function(i){
fit <- lm(metabdata[, my.target] ~ metabdata[, i])
list(summary = summary(fit), confint = confint(fit))
}))
Returns:
Error: Unsupported index type: tbl_df
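A likely cause, assuming metabdata is a tibble: metabdata[c(34)] is itself a one-column tibble, not a column position, so metabdata[, my.target] tries to index a tibble with a tibble. In addition, lapply over a data frame passes the column values to the function, not their positions. One way around both problems is to work with column names and build the formula from them. The sketch below uses a hypothetical stand-in data frame (metab, with columns V1 to V6); with the real data, the names would come from names(metabdata)[34] and names(metabdata)[18:23].

```r
set.seed(5)
metab <- as.data.frame(matrix(rnorm(120), nrow = 20))  # hypothetical stand-in, V1..V6

target <- names(metab)[6]        # name of the outcome column
predictors <- names(metab)[1:3]  # names of the predictor columns

fits <- lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = target), data = metab)
  list(summary = summary(fit), confint = confint(fit))
})
names(fits) <- predictors        # e.g. fits$V1$confint
```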

Running multiple GAMM models using for loop or lapply

Can someone please help me run multiple GAMM models in a for loop or with lapply? I have a set of 10 response and 20 predictor variables arranged in columns of a large data frame.
I'd like to fit a GAMM for each predictor-response combination and summarize their coefficients and significance tests in a table.
models <- gamm(AnimalCount ~ s(temperature), data = dat, family = poisson(link = log), random = list(Province = ~1))

I think one way to do this is to create a "matrix" list where the number of rows and columns corresponds to the number of responses (i) and predictors (j), respectively. Then you can store each model result in the cell[i, j]. Let me illustrate:
## make up some data
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200,scale=2)
set.seed(1)
dat2 <- gamSim(1,n=200,scale=2)
names(dat2)[1:5] <- c("y1", paste0("x", 4:7))
d <- cbind(dat[, 1:5], dat2[, 1:5])
Now the made-up data has 2 responses (y, y1) and 8 predictors (x0 to x7). I think you can simplify the process by storing the responses and predictors in separate data frames:
d_resp <- d[ c("y", "y1")]
d_pred <- d[, !(colnames(d) %in% c("y", "y1"))]
## create a "matrix" list of dimensions i x j
results_m <- vector("list", length=ncol(d_resp)*ncol(d_pred))
dim(results_m) <- c(ncol(d_resp), ncol(d_pred))
for(i in 1:ncol(d_resp)){
for(j in 1:ncol(d_pred)){
results_m[i, j][[1]] <- gamm(d_resp[, i] ~ s(d_pred[, j]))
}
}
# flatten the "matrix" list
results_l <- do.call("list", results_m)
You can use sapply/lapply to create a data frame summarizing the coefficients, etc. Say you want to extract the fixed-effect intercepts and slopes and store them in a data frame:
data.frame(t(sapply(results_l, function(l) l$lme$coefficients$fixed)))
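An alternative, sketched here under assumptions, is to build each formula from the column names so that the fitted models carry the real variable names, and to pull the smooth-term significance test from summary() of the gam part. The second response y1 is made up for illustration, and the sketch uses the default Gaussian family with no random effect; the asker's family and random arguments would be added to the gamm call.

```r
library(mgcv)  # gamm() and gamSim()

set.seed(0)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)
dat$y1 <- dat$y + rnorm(200)  # hypothetical second response

combos <- expand.grid(response = c("y", "y1"),
                      predictor = paste0("x", 0:3),
                      stringsAsFactors = FALSE)

tab <- do.call(rbind, lapply(seq_len(nrow(combos)), function(k) {
  f <- as.formula(sprintf("%s ~ s(%s)", combos$response[k], combos$predictor[k]))
  m <- gamm(f, data = dat)
  st <- summary(m$gam)$s.table  # one row per smooth term
  data.frame(combos[k, ], edf = st[1, "edf"], p_value = st[1, "p-value"],
             row.names = NULL)
}))
tab
```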

using lm() in R for a series of independent fits

I want to use lm() in R to fit a series of (actually 93) separate linear regressions. According to the R lm() help manual:
"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix."
This works fine as long as there are no missing data points in the Y response matrix. When there are missing points, instead of fitting each regression with the available data, every row that has a missing data point in any column is discarded. Is there any way to specify that lm() should fit all of the columns in Y independently and not discard rows where an individual column has a missing data point?

If you are looking to do n regressions between Y1, Y2, ..., Yn and X, you don't specify that with a single lm() call; rather, you should use R's apply functions:
# create the response matrix and set some random values to NA
values <- runif(50)
values[sample(1:length(values), 10)] <- NA
Y <- data.frame(matrix(values, ncol=5))
colnames(Y) <- paste0("Y", 1:5)
# single regression term
X <- runif(10)
# create regression between each column in Y and X
lms <- lapply(colnames(Y), function(y) {
form <- paste0(y, " ~ X")
lm(form, data=Y)
})
# lms is a list of lm objects, can access them via [[]] operator
# or work with it using apply functions once again
sapply(lms, function(x) {
summary(x)$adj.r.squared
})
#[1] -0.06350560 -0.14319796 0.36319518 -0.16393125 0.04843368
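To check that each fit really uses its own available rows, nobs() can be called on every model. The sketch below is a self-contained illustration with hypothetical data and deliberately placed missing values; it relies on lm's default na.action (na.omit), which drops incomplete rows per model.

```r
set.seed(6)
Y <- data.frame(matrix(rnorm(50), ncol = 5))
names(Y) <- paste0("Y", 1:5)
Y$Y1[1:3] <- NA  # 3 missing values in Y1
Y$Y2[10] <- NA   # 1 missing value in Y2
Y$X <- runif(10)

# One lm per response column; rows are dropped only where that response is NA
fits <- lapply(paste0("Y", 1:5), function(y) {
  lm(reformulate("X", response = y), data = Y)
})

n_used <- sapply(fits, nobs)
n_used
#[1]  7  9 10 10 10
```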
