R regressions in a loop [duplicate]


This question already has answers here:
Linear Regression and group by in R
(10 answers)
Closed 6 years ago.
I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each categorical variable, and then store the t-stats in a data frame. I'd like to do this without a loop if possible.
Here's a sample of what I'm trying to do:
a<- c("a","a","a","a","a",
"b","b","b","b","b",
"c","c","c","c","c")
b<- c(0.1,0.2,0.3,0.2,0.3,
0.1,0.2,0.3,0.2,0.3,
0.1,0.2,0.3,0.2,0.3)
c<- c(0.2,0.1,0.3,0.2,0.4,
0.2,0.5,0.2,0.1,0.2,
0.4,0.2,0.4,0.6,0.8)
data.frame(a, b, c)
I can begin by running the following linear regression and pulling the t-statistic out very easily:
summary(lm(b~c))$coefficients[2,3]
However, I'd like to be able to run the regression for when column a is a, b, or c. I'd like to then store the t-stats in a table that looks like this:
variable t-stat
a 0.9
b 2.4
c 1.1
Hope that makes sense. Please let me know if you have any suggestions!

Here is a solution using dplyr and tidy() from the broom package. tidy() converts the output of many statistical models (e.g. lm, glm, anova) into a tidy data frame.
library(broom)
library(dplyr)
data <- tibble(a, b, c)  # data_frame() is deprecated; tibble() is the current equivalent
data %>%
group_by(a) %>%
do(tidy(lm(b ~ c, data = .))) %>%
select(variable = a, t_stat = statistic) %>%
slice(2)
# variable t_stat
# 1 a 1.6124515
# 2 b -0.1369306
# 3 c 0.8000000
Or, to extract the t-statistics for both the intercept and the slope term:
data %>%
group_by(a) %>%
do(tidy(lm(b ~ c, data = .))) %>%
select(variable = a, term, t_stat = statistic)
# variable term t_stat
# 1 a (Intercept) 1.2366939
# 2 a c 1.6124515
# 3 b (Intercept) 2.6325081
# 4 b c -0.1369306
# 5 c (Intercept) 1.4572335
# 6 c c 0.8000000
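For comparison, here is a base-R sketch that produces the same long layout (one row per group and term) without broom or dplyr. It rebuilds the question's vectors with rep() (same values), and the object names df, fits and res are only illustrative. It should give the same t statistics as the tidy() output above, but treat it as a sketch rather than a drop-in replacement.

```r
# Data from the question (same values, written more compactly)
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
df <- data.frame(a, b, c)

# Fit b ~ c separately within each level of a
fits <- lapply(split(df, df$a), function(d) lm(b ~ c, data = d))

# Stack the coefficient tables into one long data frame: one row per (group, term)
res <- do.call(rbind, lapply(names(fits), function(g) {
  cf <- coef(summary(fits[[g]]))
  data.frame(variable = g, term = rownames(cf),
             t_stat = cf[, "t value"], row.names = NULL)
}))
res
```

Because each model is kept in fits, other quantities (residuals, R-squared) can be pulled out later without refitting.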

You can use the lmList function from the nlme package to apply lm to subsets of data:
# the data
df <- data.frame(a, b, c)
library(nlme)
res <- lmList(b ~ c | a, df, pool = FALSE)
coef(summary(res))
The output:
, , (Intercept)
Estimate Std. Error t value Pr(>|t|)
a 0.1000000 0.08086075 1.236694 0.30418942
b 0.2304348 0.08753431 2.632508 0.07815663
c 0.1461538 0.10029542 1.457233 0.24110393
, , c
Estimate Std. Error t value Pr(>|t|)
a 0.50000000 0.3100868 1.6124515 0.2052590
b -0.04347826 0.3175203 -0.1369306 0.8997586
c 0.15384615 0.1923077 0.8000000 0.4821990
If you want the t values only, you can use this command:
coef(summary(res))[, "t value", -1]
# a b c
# 1.6124515 -0.1369306 0.8000000

Here's a vote for the plyr package and ddply().
library(plyr)
plyrFunc <- function(x){
mod <- lm(b~c, data = x)
return(summary(mod)$coefficients[2,3])
}
tStats <- ddply(data.frame(a, b, c), .(a), plyrFunc)
tStats
a V1
1 a 1.6124515
2 b -0.1369306
3 c 0.8000000

Use split() to subset the data and sapply() to do the looping:
dat <- data.frame(b,c)
dat_split <- split(x = dat, f = a)
res <- sapply(dat_split, function(x){
summary(lm(b~c, data = x))$coefficients[2,3]
})
Reshape the result to your needs:
data.frame(variable = names(res), "t-stat" = res)
variable t.stat
a a 1.6124515
b b -0.1369306
c c 0.8000000
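The [2,3] index used throughout this thread relies on the slope being row 2 and the t value being column 3 of the coefficient table. A sketch of the same split approach that looks the slope row up by name ("c") and keeps all four columns (estimate, standard error, t value, p-value) might look like this; vapply() is used so that every group is forced to return exactly four numbers. The names stats and out are hypothetical.

```r
# Data from the question
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
dat_split <- split(data.frame(b, c), a)

# For each group, take the slope row of the coefficient table by name,
# not by position; vapply() checks that each result has length 4
stats <- vapply(dat_split, function(x) {
  coef(summary(lm(b ~ c, data = x)))["c", ]
}, numeric(4))

# vapply() returns one column per group; transpose so groups become rows
out <- data.frame(variable = colnames(stats), t(stats),
                  check.names = FALSE, row.names = NULL)
out
```

Indexing by name means the code keeps picking the right row and column even if more predictors are added to the formula.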

You could do this:
a<- c("a","a","a","a","a",
"b","b","b","b","b",
"c","c","c","c","c")
b<- c(0.1,0.2,0.3,0.2,0.3,
0.1,0.2,0.3,0.2,0.3,
0.1,0.2,0.3,0.2,0.3)
c<- c(0.2,0.1,0.3,0.2,0.4,
0.2,0.5,0.2,0.1,0.2,
0.4,0.2,0.4,0.6,0.8)
df <- data.frame(a,b,c)
t.stats <- t(data.frame(lapply(c('a','b','c'),
function(x) summary(lm(b~c,data=df[df$a==x,]))$coefficients[2,3])))
colnames(t.stats) <- 't-stat'
rownames(t.stats) <- c('a','b','c')
Output:
> t.stats
t-stat
a 1.6124515
b -0.1369306
c 0.8000000
Unless I am mistaken, the values in your expected output are not the correct ones.
Or, if you want a data.frame with a separate variable column:
t.stats <- data.frame(t.stats)
t.stats$variable <- rownames(t.stats)
> t.stats[,c(2,1)]
variable t.stat
a a 1.6124515
b b -0.1369306
c c 0.8000000

Related

R: t test over multiple columns using t.test function

I tried to perform an independent t-test for many columns of a data frame. For example, I created a data frame:
set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
To run the tests, I used with(df, t.test(y ~ group)):
with(test_data, t.test(a ~ grp))
with(test_data, t.test(b ~ grp))
with(test_data, t.test(c ~ grp))
I would like output like this:
mean in group m mean in group y p-value
9.747412 9.878820 0.6944
15.12936 16.49533 0.07798
20.39531 20.20168 0.9027
I wonder how I can achieve this using:
1. for loop
2. apply()
3. perhaps dplyr
This link, R: t-test over all columns, is related, but it is 6 years old. Perhaps there are better ways to do the same thing now.

Use select_if() to select only the numeric columns, then use purrr::map_df() to apply t.test() against grp. Finally, use broom::tidy() to get the results in tidy format:
library(tidyverse)
res <- test_data %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
#> # A tibble: 3 x 11
#> var estimate estimate1 estimate2 statistic p.value parameter conf.low
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a -0.259 9.78 10.0 -0.587 0.565 16.2 -1.19
#> 2 b 0.154 15.0 14.8 0.169 0.868 15.4 -1.78
#> 3 c -0.359 20.4 20.7 -0.287 0.778 16.5 -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> # alternative <chr>
Created on 2019-03-15 by the reprex package (v0.2.1.9000)

Extract the estimates and the p-value from each t.test() call while iterating over the needed columns with sapply(). Build the formulas from a character vector and transpose the result with t() for output:
formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")
output <- t(sapply(formulas, function(f) {
res <- t.test(as.formula(f), data = test_data)
c(res$estimate, p.value=res$p.value)
}))
Input data (seeded for reproducibility)
set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
Output result
# mean in group m mean in group y p.value
# a ~ grp 9.775477 10.03419 0.5654353
# b ~ grp 14.972888 14.81895 0.8678149
# c ~ grp 20.383679 20.74238 0.7776188

As you asked for a for loop:
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
meanM=NULL
meanY=NULL
p.value=NULL
for (i in 1:(ncol(test_data)-1)){
tt <- t.test(test_data[,i] ~ grp)  # run each test once and reuse the result
meanM=as.data.frame(rbind(meanM, tt$estimate[1]))
meanY=as.data.frame(rbind(meanY, tt$estimate[2]))
p.value=as.data.frame(rbind(p.value, tt$p.value))
}
cbind(meanM, meanY, p.value)
It works, but I am a beginner in R, so there may be a more efficient solution.

Using lapply this is rather easy.
I have tested the code with set.seed(7060) before creating the dataset, in order to make the results reproducible.
tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))
result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
pval <- sapply(tests_list, `[[`, "p.value")
result <- cbind(result, p.value = pval)
result
# mean in group m mean in group y p.value
#[1,] 9.909818 9.658813 0.6167742
#[2,] 14.578926 14.168816 0.6462151
#[3,] 20.682587 19.299133 0.2735725
Note that a real-life application would use names(test_data)[1:3], not letters[1:3], in the first lapply() call.

This should really be a comment rather than an answer, but I am posting it as an answer because the accepted answer, while excellent, has one caveat that may cost others hours (as it did for me).
The original data posted by the OP:
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
The answer provided by @Tung:
library(tidyverse)
res <- test_data %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
The caveat is that grp has to exist as a separate vector outside the data frame. select_if(is.numeric) drops the grouping column, and the formula . ~ grp then looks for grp in the calling environment, not in test_data; if grp exists only as a column of test_data, the code fails because grp cannot be found. As far as I know, keeping the grouping variable outside the data frame is not common practice, so although the answer is neat, this is worth pointing out. I hope this saves latecomers some time.
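A minimal base-R sketch that avoids the caveat by taking grp from the data frame itself: reformulate() builds a formula such as a ~ grp for each numeric column, and data = test_data makes t.test() look up both sides inside the data frame. The names num_vars and res are illustrative, and t.test() runs its default Welch test.

```r
set.seed(333)
test_data <- data.frame(a = rnorm(20, 10, 1),
                        b = rnorm(20, 15, 2),
                        c = rnorm(20, 20, 3),
                        grp = rep(c("m", "y"), 10))

# Numeric columns to test (grp is not numeric, so it is excluded)
num_vars <- names(test_data)[vapply(test_data, is.numeric, logical(1))]

res <- do.call(rbind, lapply(num_vars, function(v) {
  # e.g. a ~ grp, evaluated inside test_data, so no global grp is needed
  tt <- t.test(reformulate("grp", response = v), data = test_data)
  data.frame(var = v,
             mean_m = unname(tt$estimate[1]),
             mean_y = unname(tt$estimate[2]),
             p.value = tt$p.value)
}))
res
```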

Is it possible to perform regression inside aggregate function? [duplicate]

This question already has an answer here:
Can't get aggregate() work for regression by group
(1 answer)
Closed 4 years ago.
For example:
FP <- data.frame(A = 1:9, B = 11:19, C = 21:29, D = 31:39 ..... N = 145:153, Date: Jan 1 to Jan 9)
(I know the syntax above is wrong. Just for your understanding)
There are n columns (say 14) plus an additional date column.
I need to run a separate simple linear regression of each of B, C, D, E ... N (dependent variables) on A (independent variable), grouped by the date column. How can I make aggregate() do this? Or is there another function that would come in handy?

When fitting and saving models, you might want to work with lists:
FP <- data.frame(A = 1:9, B = 11:19, C = 21:29, D = rep(1:3,3))
lapply(split(FP, FP$D), function(x) lm(B + C ~ A, data = x))
#$`1`
#
#Call:
#lm(formula = B + C ~ A, data = x)
#Coefficients:
#(Intercept) A
# 30 2
#
#$`2`
#Call:
#lm(formula = B + C ~ A, data = x)
#Coefficients:
#(Intercept) A
# 30 2
#$`3`
#Call:
#lm(formula = B + C ~ A, data = x)
#Coefficients:
#(Intercept) A
# 30 2
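First you split your data.frame by D, and then you run your regressions on those splits. Note that B + C on the left-hand side of the formula is ordinary arithmetic: it fits a single regression of the sum B + C on A (hence the intercept 30 = 10 + 20 and the slope 2 = 1 + 1). To regress B and C on A separately, use cbind(B, C) ~ A instead, which fits one regression per response column.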
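If the goal is one regression per response column and per group, with the coefficients collected in a single table, a sketch along these lines may help. It reuses the toy FP from this answer, with D standing in for the date column and responses listing the dependent columns (in the real data this would be B through N); both are assumptions about how the real data is laid out.

```r
FP <- data.frame(A = 1:9, B = 11:19, C = 21:29, D = rep(1:3, 3))
responses <- c("B", "C")  # hypothetical: the dependent columns to regress on A

res <- do.call(rbind, lapply(split(FP, FP$D), function(x) {
  do.call(rbind, lapply(responses, function(y) {
    # reformulate() builds e.g. B ~ A from the column name
    fit <- lm(reformulate("A", response = y), data = x)
    data.frame(group = x$D[1], response = y,
               intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
  }))
}))
rownames(res) <- NULL
res
```

In this toy data B = A + 10 and C = A + 20 exactly, so every group should report slope 1 with intercepts 10 and 20.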

using lapply to create t-test table

I want to run t-tests between two populations (in or out of the treatment group: 1 or 0, respectively, in the sample data below) across a number of variables and for different studies, all of which sit in the same data frame. In the sample data below, I want to run t-tests for all variables (Age, Dollars, DiseaseCnt) between the Treated 1/0 groups, separately for each Program rather than across the whole population. I have the logic to run the t-tests, but I need help with the final step: extracting the appropriate parts from the function output and creating something easily digestible.
Ultimately, what I want is: a table of t-stats, p-values, variable that t-test was performed on, and program for which variable was tested.
DT<-data.frame(
Treated=sample(0:1,1000,replace=T)
,Program=c('Program A','Program B','Program C','Program D')
,Age=as.integer(rnorm(1000,mean=65,sd=15))
,Dollars=as.integer(rpois(1000,lambda=1000))
,DiseaseCnt=as.integer(rnorm(1000,mean=5,sd=2)) )
progs<-unique(DT$Program) # Pull program names
vars<-names(DT)[3:5] # pull variables to run t tests
test<-lapply(progs, function(i)
tt<-lapply(vars, function(j) {t.test( DT[DT$Treated==1 & DT$Program == i,names(DT)==j]
,DT[DT$Treated==0 & DT$Program == i,names(DT)==j]
,alternative = 'two.sided' )
list(j,tt$statistic,tt$p.value) }
) )
# the nested lapply produces results in list format that can be bound together, but the complete output with both lapply calls is erroneous

You could convert it into a data.table first (in my code I call your original data frame DF):
library(data.table)
DT <- as.data.table(DF)
DT[, t.test(data=.SD, Age ~ Treated), by=Program]
Program statistic parameter p.value conf.int estimate null.value alternative
1: Program A -0.6286875 247.8390 0.5301326 -4.8110579 65.26667 0 two.sided
2: Program A -0.6286875 247.8390 0.5301326 2.4828527 66.43077 0 two.sided
3: Program B 1.4758524 230.5380 0.1413480 -0.9069634 67.15315 0 two.sided
4: Program B 1.4758524 230.5380 0.1413480 6.3211834 64.44604 0 two.sided
5: Program C 0.1994182 246.9302 0.8420998 -3.3560930 63.56557 0 two.sided
6: Program C 0.1994182 246.9302 0.8420998 4.1122406 63.18750 0 two.sided
7: Program D -1.1321569 246.0086 0.2586708 -6.1855837 62.31707 0 two.sided
8: Program D -1.1321569 246.0086 0.2586708 1.6701237 64.57480 0 two.sided
method data.name
1: Welch Two Sample t-test Age by Treated
2: Welch Two Sample t-test Age by Treated
3: Welch Two Sample t-test Age by Treated
4: Welch Two Sample t-test Age by Treated
5: Welch Two Sample t-test Age by Treated
6: Welch Two Sample t-test Age by Treated
7: Welch Two Sample t-test Age by Treated
8: Welch Two Sample t-test Age by Treated
In this format, each Program gets two rows. statistic (the t value) and parameter (the degrees of freedom) are the same on both rows; conf.int gives the lower bound on the first row and the upper bound on the second (so for Program A the confidence interval is (-4.8110579, 2.4828527)); and estimate gives the mean for group 0 on the first row and for group 1 on the second (so for Program A, the mean for Treated == 0 is 65.26667 and for Treated == 1 it is 66.43077).
This was the quickest solution I could come up with, and you could loop through vars, or perhaps there's a simpler way.
EDIT: I only verified this for Program A and Age, using the following code:
DT[Program == 'Program A', t.test(Age ~ Treated)]
Welch Two Sample t-test
data: Age by Treated
t = -0.62869, df = 247.84, p-value = 0.5301
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.811058 2.482853
sample estimates:
mean in group 0 mean in group 1
65.26667 66.43077
EDIT 2: Here is code that loops through your variables and rbinds the results together:
do.call(rbind, lapply(vars, function(x) DT[, t.test(data=.SD, eval(parse(text=x)) ~ Treated), by=Program]))

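You can get a closely related test out of a regression (it uses one residual variance pooled across all groups rather than the Welch correction that t.test() applies by default, so the numbers will differ somewhat). If you think the effect of treatment differs between programs, you should include an interaction. You can also specify multiple responses at once.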
> m <- lm(cbind(Age,Dollars,DiseaseCnt)~Treated * Program - Treated - 1, DT)
> lapply(summary(m), `[[`, "coefficients")
$`Response Age`
Estimate Std. Error t value Pr(>|t|)
ProgramProgram A 63.0875912409 1.294086510 48.7506752932 1.355786133e-265
ProgramProgram B 65.3846153846 1.400330869 46.6922616771 1.207761156e-252
ProgramProgram C 66.0695652174 1.412455172 46.7763979425 3.534894216e-253
ProgramProgram D 66.6691729323 1.313402302 50.7606640010 5.038015651e-278
Treated:ProgramProgram A 2.8593114140 1.924837595 1.4854819032 1.377339219e-01
Treated:ProgramProgram B -0.9786003470 1.919883369 -0.5097186438 6.103619649e-01
Treated:ProgramProgram C -0.5066022544 1.922108032 -0.2635659631 7.921691261e-01
Treated:ProgramProgram D -2.8657541289 1.919883369 -1.4926709484 1.358412980e-01
$`Response Dollars`
Estimate Std. Error t value Pr(>|t|)
ProgramProgram A 998.5474452555 2.681598120 372.3702808887 0.0000000000
ProgramProgram B 997.4188034188 2.901757030 343.7292623810 0.0000000000
ProgramProgram C 1001.6869565217 2.926880936 342.2370019265 0.0000000000
ProgramProgram D 1001.2180451128 2.721624185 367.8752013053 0.0000000000
Treated:ProgramProgram A -0.9899231316 3.988636646 -0.2481858388 0.8040419882
Treated:ProgramProgram B 2.5060086113 3.978370529 0.6299082986 0.5288996396
Treated:ProgramProgram C -5.4721417069 3.982980462 -1.3738811324 0.1697889454
Treated:ProgramProgram D -4.0043698991 3.978370529 -1.0065351806 0.3144036460
$`Response DiseaseCnt`
Estimate Std. Error t value Pr(>|t|)
ProgramProgram A 4.53284671533 0.1793523653 25.27341475576 3.409326912e-109
ProgramProgram B 4.56410256410 0.1940771747 23.51694665775 1.515736580e-97
ProgramProgram C 4.25217391304 0.1957575279 21.72163675698 6.839384262e-86
ProgramProgram D 4.60150375940 0.1820294143 25.27890219412 3.133081901e-109
Treated:ProgramProgram A 0.13087009883 0.2667705543 0.49057175444 6.238378600e-01
Treated:ProgramProgram B -0.02274918064 0.2660839292 -0.08549625944 9.318841210e-01
Treated:ProgramProgram C 0.47375201288 0.2663922537 1.77840010867 7.564438017e-02
Treated:ProgramProgram D -0.31090546880 0.2660839292 -1.16844887901 2.429064705e-01
You specifically care about the Treated:Program entries of the regression table.
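To turn that into the flat table the question asked for, one option is to keep only the Treated:Program rows from each response's summary. The sketch below regenerates the question's simulated data with an arbitrary seed (set.seed(1) is an assumption, so the numbers will not match the output above) and returns one row per response and program with the t value and p-value.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
DT <- data.frame(
  Treated    = sample(0:1, 1000, replace = TRUE),
  Program    = c("Program A", "Program B", "Program C", "Program D"),
  Age        = as.integer(rnorm(1000, mean = 65, sd = 15)),
  Dollars    = as.integer(rpois(1000, lambda = 1000)),
  DiseaseCnt = as.integer(rnorm(1000, mean = 5, sd = 2))
)

m <- lm(cbind(Age, Dollars, DiseaseCnt) ~ Treated * Program - Treated - 1, data = DT)

# summary() of a multi-response fit gives one summary per response ("Response Age", ...)
sums <- summary(m)
res <- do.call(rbind, lapply(names(sums), function(resp) {
  cf <- coef(sums[[resp]])
  keep <- grepl("^Treated:", rownames(cf))  # treatment effect within each program
  data.frame(response = sub("^Response ", "", resp),
             program  = sub("^Treated:Program", "", rownames(cf)[keep]),
             t_stat   = cf[keep, "t value"],
             p_value  = cf[keep, "Pr(>|t|)"],
             row.names = NULL)
}))
res
```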

You're getting errors because of bracketing problems: inside the inner function, the t.test() result is never assigned to anything, and tt$statistic refers to tt, which is only created once the whole inner lapply() has finished.
Here's one way to do it, following your version:
results <- lapply(progs, function (i) {
  DS <- subset(DT, Program == i)
  o <- lapply(vars, function (j) {
    frm <- formula(paste0(j, '~ Treated'))
    tt <- t.test(frm, data = DS)
    data.frame(Variable=j, T=tt$statistic, P=tt$p.value)
  })
  o <- do.call(rbind, o)
  o$Program <- i
  o
})
do.call(rbind, results)
Or you can avoid the explicit rbind-ing by using (e.g.) ddply (I think the rbind-ing still happens, just behind the scenes):
library(plyr)
combinations <- expand.grid(Program=progs, Y=vars)
ddply(combinations, .(Program, Y),
function (x) {
# x is a dataframe with the program and variable;
# just do the t-test and add the statistic & p-val to it
frm <- formula(paste0(x$Y, '~ Treated'))
tt <- t.test(frm, subset(DT, Program == x$Program))
x$T <- tt$statistic
x$P <- tt$p.value
x
})

Iteration of columns for linear regression in R

I am trying to select columns iteratively in order to run a linear regression for each one.
I tried something like this, but it does not seem to work:
df <- 0
x <- 0
for(i in 1:30){
reg.A_i <- lm(log(match("A", i, sep="_"))~ log(A_0) + B + C , data=y)
x <- coef(summary(reg.A_i))
df <- cbind(df[,1],x)
}
My data frame has variables like this:
A_0, A_1, A_2, A_3 .... A_30, B, C

It seems you want something like this:
set.seed(42)
#Some data:
dat <- data.frame(A0=rnorm(100, mean=20),
A1=rnorm(100, mean=30),
A2=rnorm(100, mean=40),
B=rnorm(100), C = rnorm(100))
#reshape your data
library(reshape2)
dat2 <- melt(dat, id.vars=c("A0", "B", "C"), value.name="y")
#do the regressions
library(plyr)
dlply(dat2, .(variable), function(df) {fit <- lm(log(y) ~ log(A0) + B + C, data=df)
coef(summary(fit))
})
# $A1
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.323355703 0.173727484 19.1297061 1.613475e-34
# log(A0) 0.024694764 0.057972711 0.4259722 6.710816e-01
# B 0.001001875 0.003545922 0.2825428 7.781356e-01
# C -0.003843878 0.003045634 -1.2620944 2.099724e-01
#
# $A2
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.903836714 0.145839694 26.7679986 2.589532e-46
# log(A0) -0.071847318 0.048666580 -1.4763174 1.431314e-01
# B -0.001431821 0.002976709 -0.4810081 6.316052e-01
# C 0.001999177 0.002556731 0.7819271 4.361817e-01
#
# attr(,"split_type")
# [1] "data.frame"
# attr(,"split_labels")
# variable
# 1 A1
# 2 A2
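If you prefer to keep the wide layout and loop over the columns directly, the response can be built as a string with paste0() and converted with as.formula(); match() in the question's code does not construct column names. This sketch assumes columns named A_0, A_1, ... as in the question (only A_1 and A_2 are simulated here), a data frame called dat instead of the question's y, and A_0 on the right-hand side of every model.

```r
set.seed(42)
# Hypothetical data following the question's naming scheme
dat <- data.frame(A_0 = rnorm(100, mean = 20),
                  A_1 = rnorm(100, mean = 30),
                  A_2 = rnorm(100, mean = 40),
                  B = rnorm(100), C = rnorm(100))

# Every A_k column except A_0 is a response
responses <- setdiff(grep("^A_[0-9]+$", names(dat), value = TRUE), "A_0")

fits <- lapply(responses, function(y) {
  # e.g. "log(A_1) ~ log(A_0) + B + C"
  f <- as.formula(paste0("log(", y, ") ~ log(A_0) + B + C"))
  coef(summary(lm(f, data = dat)))
})
names(fits) <- responses
fits$A_1
```

Storing the coefficient tables in a named list avoids the cbind() into a pre-initialised df in the question's loop, which fails because df <- 0 has no columns to index.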
