How can I calculate weighted standard errors and plot them in a bar plot?

I have a data frame of counts. I would like to calculate weighted proportions, plot the proportions, and also plot standard error bars for these weighted proportions.
Sample of my data frame:
head(df[1:4,])
badge year total b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9 b_10
1 15 2014 14 3 2 1 1 1 1 1 1 1 1
2 15 2015 157 13 12 11 8 6 6 6 5 5 5
3 15 2016 15 5 3 1 1 1 1 1 1 1 0
4 2581 2014 13 1 1 1 1 1 1 1 1 1 1
The data contain counts of 911 calls officers respond to in ten different police beats (b_1, b_2,...) in a given year. So officer 15 responds to 14 calls total in 2014, 3 of which were in beat 1, 2 in beat 2, and so on.
Essentially, what I want is to get the overall proportion of calls that occur within each beat. But I want these proportions to be weighted by the total number of calls.
So far, I've been able to calculate this by just adding the values within each b_ column and the total column, and calculating proportions. I have plotted these in a simple bar plot. But I haven't been able to figure out how to calculate standard errors that are weighted by total.
I have no preference for how the data are plotted. I'm mainly interested in getting the right standard errors.
Here is the code I have so far:
sums_by_beat <- apply(df[, grep('b_', colnames(df))], 2, sum)
props_by_beat <- sums_by_beat / sum(df$total)
# Bar plot of proportions by beat
barplot(props_by_beat, main='Distribution of Calls by Beat',
        xlab="Nth Most Common Division", ylim=c(0,1),
        names.arg=1:length(props_by_beat), ylab="Percent of Total Calls")
And a 30-row sample of my data:
df <- structure(list(badge = c(15, 15, 15, 2581, 2581, 2745, 2745,
3162, 3162, 3162, 3396, 3650, 3650, 3688, 3688, 3688, 3698, 3698,
3698, 3717, 3717, 3717, 3740, 3740, 3740, 3813, 3873, 3907, 3930,
4007), year = c(2014, 2015, 2016, 2014, 2015, 2015, 2016, 2014,
2015, 2016, 2016, 2014, 2015, 2014, 2015, 2016, 2014, 2015, 2016,
2014, 2015, 2016, 2014, 2015, 2016, 2016, 2015, 2014, 2014, 2014
), total = c(14, 157, 15, 13, 29, 1, 1, 754, 1172, 1039, 14,
1, 2, 34, 57, 146, 3, 7, 28, 593, 1036, 1303, 461, 952, 1370,
1, 4, 41, 5, 451), b_1 = c(3, 13, 5, 1, 3, 1, 1, 33, 84, 83,
2, 1, 2, 5, 10, 14, 2, 7, 7, 39, 72, 75, 42, 69, 81, 1, 1, 7,
1, 36), b_2 = c(2, 12, 3, 1, 2, 0, 0, 33, 61, 52, 2, 0, 0, 3,
6, 8, 1, 0, 2, 37, 65, 70, 29, 65, 75, 0, 1, 5, 1, 23), b_3 = c(1,
11, 1, 1, 2, 0, 0, 32, 57, 45, 2, 0, 0, 3, 5, 8, 0, 0, 2, 34,
62, 67, 28, 50, 73, 0, 1, 3, 1, 22), b_4 = c(1, 8, 1, 1, 2, 0,
0, 31, 44, 39, 2, 0, 0, 3, 3, 7, 0, 0, 2, 34, 61, 67, 26, 42,
72, 0, 1, 3, 1, 21), b_5 = c(1, 6, 1, 1, 1, 0, 0, 30, 42, 37,
1, 0, 0, 3, 3, 7, 0, 0, 1, 33, 53, 61, 23, 42, 67, 0, 0, 2, 1,
21), b_6 = c(1, 6, 1, 1, 1, 0, 0, 30, 40, 36, 1, 0, 0, 2, 2,
6, 0, 0, 1, 32, 53, 61, 22, 41, 63, 0, 0, 2, 0, 21), b_7 = c(1,
6, 1, 1, 1, 0, 0, 26, 39, 35, 1, 0, 0, 2, 2, 6, 0, 0, 1, 30,
47, 58, 22, 39, 62, 0, 0, 2, 0, 21), b_8 = c(1, 5, 1, 1, 1, 0,
0, 26, 39, 33, 1, 0, 0, 2, 2, 6, 0, 0, 1, 30, 47, 58, 21, 38,
59, 0, 0, 2, 0, 19), b_9 = c(1, 5, 1, 1, 1, 0, 0, 24, 34, 33,
1, 0, 0, 2, 2, 5, 0, 0, 1, 30, 43, 57, 20, 37, 57, 0, 0, 2, 0,
15), b_10 = c(1, 5, 0, 1, 1, 0, 0, 23, 34, 32, 1, 0, 0, 1, 2,
5, 0, 0, 1, 27, 40, 56, 18, 36, 55, 0, 0, 2, 0, 14)), row.names = c(NA,
30L), class = "data.frame")

There isn't (as far as I know) a built-in R function to calculate the standard error of a weighted mean, but it is fairly straightforward to calculate - with some assumptions that are probably valid in the case you describe.
See, for instance:
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Standard_error
Standard error of the weighted mean
If the elements used to calculate the weighted mean are samples from populations that all have the same variance v, then the variance of the weighted sample mean is estimated as:
var_m = v * sum( wnorm^2 ) # wnorm = weights normalized to sum to 1
And the standard error of the weighted mean is the square root of this variance:
sem = sqrt( var_m )
So, we need to calculate the sample variance from the weighted data.
Weighted variance
The weighted population variance (or biased sample variance) is calculated as:
pop_v = sum( w * (x-mean)^2 ) / sum( w )
However, if (as in the case you describe) we are working with samples taken from the population, rather than with the population itself, we need to make an adjustment to obtain an unbiased sample variance.
If the weights represent the frequencies of observations underlying each of the elements used to calculate the weighted mean & variance, then the adjustment is:
v = pop_v * sum( w ) / ( sum( w ) -1 )
However, this is not the case here, as the weights are the total frequencies of 911 calls for each officer, not the calls for each beat. So in this case the weights correspond to the reliabilities of each element, and the adjustment is:
v = pop_v * sum( w )^2 / ( sum( w )^2 - sum( w^2) )
weighted.var and weighted.sem functions
Putting all this together, we can define weighted.var and weighted.sem functions, similar to the base R weighted.mean function (note that several R packages, for instance "Hmisc", already include more-versatile functions to calculate the weighted variance):
weighted.var <- function(x, w, type = "reliability") {
  m <- weighted.mean(x, w)
  if (type == "frequency") {
    sum(w * (x - m)^2) / (sum(w) - 1)                     # frequency-weight correction
  } else {
    sum(w * (x - m)^2) * sum(w) / (sum(w)^2 - sum(w^2))   # reliability-weight correction
  }
}
weighted.sem <- function(x, w, ...) {
  sqrt(weighted.var(x, w, ...) * sum(w^2) / sum(w)^2)
}
Applied to the 911 call data in the question
In the case of the question, the elements from which we want to calculate the weighted mean and weighted SEM are the proportions of calls in each beat, for each officer.
So (finally...):
props <- t(apply(df, 1, function(row) row[-(1:3)] / row[3]))  # per-officer beat proportions
wmean_props <- apply(props, 2, function(col) weighted.mean(col, w = df$total))
wsem_props  <- apply(props, 2, function(col) weighted.sem(col, w = df$total))
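To tie this back to the plot in the question, the weighted means and their standard errors can be drawn with barplot and arrows; a minimal sketch reusing the plotting parameters from the question (error bars at +/- one weighted SEM are an assumption about what you want to show):
b <- barplot(wmean_props, main = 'Distribution of Calls by Beat',
             xlab = "Nth Most Common Division", ylim = c(0, 1),
             names.arg = 1:length(wmean_props), ylab = "Percent of Total Calls")
# Draw one weighted SEM above and below each weighted mean
arrows(b, wmean_props - wsem_props, b, wmean_props + wsem_props,
       length = 0.02, angle = 90, code = 3)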

Aren't your "proportions" actually the mean of the weighted (by total) observations? Then we could simply calculate the weighted colMeans accordingly.
df2 <- df[, grep('b_', colnames(df))]
means.w <- colMeans(df2 / df$total)
For the error bars we could use the quantiles at alpha/2 and 1 - alpha/2, i.e. for alpha == .05 we use c(.025, .975). (Error bars based on the analytical SDs would extend below zero here.)
q.w <- t(apply(df2 / df$total, 2, quantile, c(.025, .975)))
Now we store the x-positions that barplot returns invisibly,
# Bar plot of proportions by beat
b <- barplot(means.w, main='Distribution of Calls by Beat',
             xlab="Nth Most Common Division", ylim=c(0,1),
             names.arg=1:length(means.w), ylab="Percent of Total Calls")
and construct the error bars with arrows.
arrows(b, q.w[,1], b, q.w[,2], length=.02, angle=90, code=3)

Related

Finding differences between populations

I have equivalent data from 2019 and 2020. The proportions of diagnoses in 2020 look like they differ from 2019, but I'd like to ...
a) statistically test the populations are different.
b) determine which categories are the most different.
I've worked out I can do 'a' using:
chisq.test(test$count.2020, test$count.2019)
I don't know how to find out which categories are the ones that are the most different between 2020 and 2019. Any help would be amazing, thanks!
diagnosis <- data.frame(mf_label = c("Audiovestibular", "Autonomic", "Cardiovascular",
"Cerebral palsy", "Cerebrovascular", "COVID", "Cranial nerves",
"CSF disorders", "Developmental", "Epilepsy and consciousness",
"Functional", "Head injury", "Headache", "Hearing loss", "Infection",
"Maxillofacial", "Movement disorders", "Muscle and NMJ", "Musculoskeletal",
"Myelopathy", "Neurodegenerative", "Neuroinflammatory", "Peripheral nerve",
"Plexopathy", "Psychiatric", "Radiculopathy", "Spinal", "Syncope",
"Toxic and nutritional", "Tumour", "Visual system"),
count.2019 = c(5, 0, 1, 1, 2, 0, 4, 3, 0, 7, 4, 0, 24, 0, 0, 2, 22, 3, 3, 0, 3, 18, 12, 0, 0, 2, 2, 0, 1, 4, 0),
count.2020 = c(5, 1, 1, 3, 28, 9, 11, 13, 1, 13, 30, 5, 68, 1, 1, 2, 57, 14, 5, 8, 16, 37, 27, 3, 13, 17, 3, 1, 8, 13, 11))
Your chi-square test is not correct. You need to provide the counts as a table or matrix, not as two separate vectors. Because half of the cells have very small expected values, you need to use simulation to estimate the p-value:
results <- chisq.test(diagnosis[, 2:3], simulate.p.value=TRUE)
The overall table is barely significant at .05. The chisq.test function returns a list including the original data, the expected values, residuals, and standardized residuals. The manual page describes these (?chisq.test) and provides some citations for more details.
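For part (b), the standardized residuals from that same object give a reasonable ranking of which categories deviate most; a minimal sketch, assuming the diagnosis data frame above:
stdres <- results$stdres                 # standardized residuals from the test above
rownames(stdres) <- diagnosis$mf_label
# Categories ranked by how strongly 2020 deviates from what independence predicts
head(stdres[order(-abs(stdres[, "count.2020"])), ])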

Interpolating three columns

I have a set of data in ranges like:
x|y|z
-4|1|45
-4|2|68
-4|3|96
-2|1|56
-2|2|65
-2|3|89
0|1|45
0|2|56
0|3|75
2|1|23
2|2|56
2|3|75
4|1|42
4|2|65
4|3|78
Here I need to interpolate between x and y using the z value.
I tried interpolating separately for x and y using z value by using the below code:
interpol<-approx(x,z,method="linear")
interpol_1<-approx(y,z,method="linear")
Now I'm trying to use all three columns, but the values come out wrong.
In your script you forgot to refer to your data frame. Note the use of $ in the approx function:
interpol <- approx(df$x,df$z,method="linear")
interpol_1 <- approx(df$y,df$z,method="linear")
Data:
df <- data.frame(
x = c(-4, -4, -4, -2, -2, -2, 0, 0, 0, 2, 2, 2, 4, 4, 4),
y = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
z = c(45, 68, 96, 56, 65, 89, 45, 56, 75, 23, 56, 75, 42, 65, 78)
)
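If the goal is really to interpolate z as a function of x and y jointly, approx (which is univariate) will not do it; a bivariate interpolator is needed. A minimal sketch using the akima package (the package choice is my assumption, not part of the original answer):
library(akima)
# Bivariate interpolation of z over the scattered (x, y) points,
# evaluated at a single query point (x = -1, y = 1.5)
interpp(df$x, df$y, df$z, xo = -1, yo = 1.5)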

How to Obtain Constant Term in Linear Discriminant Analysis

Consider dput:
structure(list(REAÇÃO = structure(c(0, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 0, 1, 1), format.spss = "F11.0"), IDADE = structure(c(22,
38, 36, 58, 37, 31, 32, 54, 60, 34, 45, 27, 30, 20, 30, 30, 22,
26, 19, 18, 22, 23, 24, 50, 20, 47, 34, 31, 43, 35, 23, 34, 51,
63, 22, 29), format.spss = "F11.0"), ESCOLARIDADE = structure(c(6,
12, 12, 8, 12, 12, 10, 12, 8, 12, 12, 12, 8, 4, 8, 8, 12, 8,
9, 4, 12, 6, 12, 12, 12, 12, 12, 12, 12, 8, 8, 12, 16, 12, 12,
12), format.spss = "F11.0"), SEXO = structure(c(1, 1, 0, 0, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 1, 1), format.spss = "F11.0")), .Names = c("REAÇÃO",
"IDADE", "ESCOLARIDADE", "SEXO"), row.names = c(NA, -36L), class = "data.frame")
where: REAÇÃO is a dependent variable in the model.
Constant: -4.438.
How can I obtain this value using a simple function in R?
To obtain the constant term of a linear discriminant analysis in R (with library MASS):
groupmean <- model$prior %*% model$means  # prior-weighted grand mean of the predictors
constant <- groupmean %*% model$scaling   # projected onto the discriminant axis
constant
where model is the fitted lda object:
model <- lda(y ~ x1 + x2 + xn, data = mydata)
model
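As a concrete sketch with the data in the question (the formula is my assumption, since the question does not state which predictors enter the model; mydata is the data frame from the dput above):
library(MASS)
model <- lda(REAÇÃO ~ IDADE + ESCOLARIDADE + SEXO, data = mydata)
groupmean <- model$prior %*% model$means
constant <- groupmean %*% model$scaling
constant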

How to overlay survival plot

I would like to overlay two different survival curves on the same plot, for example OS and PFS (dummy results here).
N pt.   OS   OS_Time (years)   PFS   PFS_Time (years)
1       1    12                0     12
2       0    10                1     8
3       0    14                0     14
4       0    10                0     10
5       1    11                1     8
6       1    16                1     6
7       0    11                1     4
8       0    12                1     10
9       1    9                 0     9
10      1    10                1     9
First, I import my dataset:
library(readxl)
testR <- read_excel("~/test.xlsx")
View(testR)
Then, I created survfit objects for both OS and PFS:
library(survival)
OS <- survfit(Surv(OS_times, OS) ~ 1, data = test)
PFS <- survfit(Surv(PFS_time, PFS) ~ 1, data = test)
And finally, I can plot each one thanks to:
plot(OS)
plot(PFS)
for example (or ggplot2...).
Here is my question: how can I overlay the two curves on the same graph?
I tried multiplot, or
ggplot(testR, aes(x)) + # basic graphical object
geom_line(aes(y=y1), colour="red") + # first layer
geom_line(aes(y=y2), colour="green") # second layer
But it didn't work (though I'm not sure I'm using it correctly).
Can someone help me, please? Thanks a lot.
Here is my code for Data sample:
test <- structure(list(ID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 2, 3, 4, 5, 6, 7, 8, 9),
Sex = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1),
Tabac = c(2, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 1, 1),
Bmi = c(20, 37, 37, 25, 28, 38, 16, 27, 26, 28, 15, 36, 20, 17, 28, 37, 27, 26, 18),
Age = c(75, 56, 45, 65, 76, 34, 87, 43, 67, 90, 56, 37, 84, 45, 80, 87, 90, 65, 23), LN = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0),
OS_times = c(2, 4, 4, 2, 3, 5, 5, 3, 2, 2, 4, 1, 3, 2, 4, 3, 4, 3, 2),
OS = c(0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0),
PFS_time = c(1, 2, 1, 1, 3, 4, 3, 1, 2, 2, 4, 1, 2, 2, 2, 3, 4, 3, 2),
PFS = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0)),
.Names = c("ID", "Sex", "Tabac", "Bmi", "Age", "LN", "OS_times", "OS", "PFS_time", "PFS"),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -19L))
You may use the ggsurv function from the GGally package in the following way. Combine both groups of variables in a data frame and add a "type" column. Later in the call to the plot, you refer to the type.
I used your data structure and named it "test". Afterwards, I transformed it to a data frame with the name "testdf".
library(GGally)
library(survival)  # for Surv() and survfit()
testdf <- data.frame(test)
OS_PFS1 <- data.frame(life = testdf$OS, life_times = testdf$OS_times, type= "OS")
OS_PFS2 <- data.frame(life = testdf$PFS, life_times = testdf$PFS_time, type= "PFS")
OS_PFS <- rbind(OS_PFS1, OS_PFS2)
sf.OS_PFS <- survfit(Surv(life_times, life) ~ type, data = OS_PFS)
ggsurv(sf.OS_PFS)
if you want the confidence intervals shown:
ggsurv(sf.OS_PFS, CI = TRUE)
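As an alternative to GGally, base graphics can overlay the two fits directly, since lines() has a method for survfit objects; a minimal sketch using the OS and PFS fits defined in the question:
plot(OS, col = "red", conf.int = FALSE, xlab = "Years", ylab = "Survival probability")
lines(PFS, col = "blue", conf.int = FALSE)
legend("bottomleft", legend = c("OS", "PFS"), col = c("red", "blue"), lty = 1)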
Please let me know whether this is what you want.

Use and save lm summary for multiple lm fits

I work with animal trials in which I try to get information about movement for several groups of animals (normally 4 groups of 12 individuals, but not always the same).
My final data frame per trial looks like this.
> dput(aa)
structure(list(Tiempo = c(618.4, 618.6, 618.8, 619, 619.2, 619.4,
619.6, 619.8, 620, 620.2, 620.4), UT1 = c(0, 0, 15, 19, 26, 27,
29, 37, 42, 44, 45), UT2 = c(0, 0, 0, 0, 0, 1, 18, 19, 21, 21,
21), UT3 = c(0, 2, 3, 3, 3, 3, 16, 19, 20, 20, 20), UT4 = c(0,
0, 0, 0, 0, 0, 5, 17, 29, 34, 39), UT5 = c(0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), UT6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT7 = c(0,
0, 1, 2, 2, 3, 4, 6, 7, 7, 8), UT8 = c(0, 19, 20, 23, 24, 25,
33, 80, 119, 122, 130), UT9 = c(0, 1, 1, 1, 1, 3, 6, 9, 19, 19,
19), UT10 = c(0, 0, 0, 0, 0, 1, 2, 3, 10, 12, 14), TR1 = c(0,
0, 0, 0, 0, 0, 0, 1, 2, 2, 2), TR2 = c(0, 0, 0, 0, 0, 0, 2, 19,
32, 37, 43), TR3 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), TR4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR5 = c(0, 0, 0, 0, 0, 0, 13,
18, 20, 22, 26), TR6 = c(0, 2, 11, 20, 25, 29, 37, 40, 41, 42,
43), TR7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), TR8 = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), TR9 = c(0, 0, 4, 9, 16, 19, 23, 27,
31, 33, 34), TR10 = c(0, 1, 9, 25, 32, 41, 49, 49, 51, 57, 60
), UT1.1 = c(0, 10, 15, 17, 23, 31, 37, 48, 53, 57, 58), UT2.1 = c(0,
1, 1, 1, 1, 2, 2, 4, 4, 4, 4), UT3.1 = c(0, 2, 11, 14, 20, 22,
24, 25, 26, 26, 26), UT4.1 = c(0, 0, 0, 0, 0, 0, 0, 11, 13, 13,
14), UT5.1 = c(0, 3, 5, 7, 18, 19, 19, 27, 37, 39, 42), UT6.1 = c(0,
0, 0, 0, 0, 0, 2, 2, 3, 4, 4), UT7.1 = c(0, 0, 2, 8, 9, 9, 12,
16, 18, 18, 18), UT8.1 = c(0, 0, 1, 8, 13, 15, 44, 68, 80, 89,
94), UT9.1 = c(0, 1, 1, 1, 1, 2, 3, 5, 9, 10, 10), UT10.1 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UT11 = c(0, 12, 17, 17, 18, 34,
74, 116, 131, 145, 170), UT12 = c(0, 1, 2, 3, 3, 3, 5, 14, 21,
22, 24), TR1.1 = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), TR2.1 = c(0,
0, 0, 11, 16, 19, 40, 94, 121, 134, 145), TR3.1 = c(0, 0, 0,
2, 3, 5, 6, 6, 6, 7, 7), TR4.1 = c(0, 0, 0, 1, 1, 1, 1, 1, 4,
4, 5), TR5.1 = c(0, 24, 27, 28, 29, 37, 86, 151, 212, 258, 288
), TR6.1 = c(0, 0, 1, 1, 1, 2, 5, 9, 12, 12, 13), TR7.1 = c(0,
4, 7, 28, 47, 70, 108, 125, 127, 127, 127), TR8.1 = c(0, 1, 2,
2, 2, 2, 3, 3, 4, 4, 4), TR9.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), TR10.1 = c(0, 1, 1, 1, 1, 1, 13, 40, 41, 45, 49), TR11 = c(0,
0, 0, 1, 4, 8, 10, 11, 17, 23, 25), TR12 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0)), .Names = c("Tiempo", "UT1", "UT2", "UT3", "UT4",
"UT5", "UT6", "UT7", "UT8", "UT9", "UT10", "TR1", "TR2", "TR3",
"TR4", "TR5", "TR6", "TR7", "TR8", "TR9", "TR10", "UT1.1", "UT2.1",
"UT3.1", "UT4.1", "UT5.1", "UT6.1", "UT7.1", "UT8.1", "UT9.1",
"UT10.1", "UT11", "UT12", "TR1.1", "TR2.1", "TR3.1", "TR4.1",
"TR5.1", "TR6.1", "TR7.1", "TR8.1", "TR9.1", "TR10.1", "TR11",
"TR12"), row.names = c(NA, -11L), class = "data.frame")
My goal is to fit an lm to each individual (column), using the Tiempo variable as x, so I do it like this:
fit <- apply(aa, 2, function(x) lm(x ~ aa$Tiempo))
It works perfectly, but the problem is that all the valuable (and useless) information gets stored in that list of lm objects, and I can't extract the data in an efficient way. My lm object looks like this:
summary(fit)
Length Class Mode
Tiempo 12 lm list
UT1 12 lm list
UT2 12 lm list
UT3 12 lm list
UT4 12 lm list
UT5 12 lm list
UT6 12 lm list
UT7 12 lm list
UT8 12 lm list
UT9 12 lm list
UT10 12 lm list
TR1 12 lm list
TR2 12 lm list
TR3 12 lm list
TR4 12 lm list
TR5 12 lm list
TR6 12 lm list
TR7 12 lm list
TR8 12 lm list
TR9 12 lm list
TR10 12 lm list
UT1.1 12 lm list
UT2.1 12 lm list
UT3.1 12 lm list
UT4.1 12 lm list
UT5.1 12 lm list
UT6.1 12 lm list
UT7.1 12 lm list
UT8.1 12 lm list
UT9.1 12 lm list
UT10.1 12 lm list
UT11 12 lm list
UT12 12 lm list
TR1.1 12 lm list
TR2.1 12 lm list
TR3.1 12 lm list
TR4.1 12 lm list
TR5.1 12 lm list
TR6.1 12 lm list
TR7.1 12 lm list
TR8.1 12 lm list
TR9.1 12 lm list
TR10.1 12 lm list
TR11 12 lm list
TR12 12 lm list
And each animal looks like this
summary(fit$UT1)
Call:
lm(formula = x ~ aa$Tiempo)
Residuals:
Min 1Q Median 3Q Max
-6.873 -1.845 1.182 2.314 4.918
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -14642.700 1104.825 -13.25 3.29e-07 ***
aa$Tiempo 23.682 1.784 13.28 3.24e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.742 on 9 degrees of freedom
Multiple R-squared: 0.9514, Adjusted R-squared: 0.946
F-statistic: 176.3 on 1 and 9 DF, p-value: 3.24e-07
I would like to get the summary information organised in a data frame with all animals (or at least the coefficients and R-squared values) in order to keep doing some statistical analysis. Having that information could help me write a function to check whether the R-squared is below a fixed value, in which case I should inspect that fit (or discard that animal if it's really not performing well). Besides, I should find a way to make it reproducible, because nowadays I'm using
FIT<-data.frame(UT1=fit$UT1$coefficients,
UT2=fit$UT2$coefficients,
UT3=fit$UT3$coefficients,...)
This approach doesn't even achieve what I'm trying to do, and it's really precarious.
I've searched a little and found the coef function, but
coef(fit)
NULL
With your fit list, you can extract the coefficients and r-squared values with
fit <- apply(aa, 2, function(x) lm(x ~ aa$Tiempo))
mysummary <- t(sapply(fit, function(x) {
  ss <- summary(x)
  c(coef(x), r.square = ss$r.squared, adj.r.squared = ss$adj.r.squared)
}))
We use sapply to go over the list you created and extract the coefficients from the model and the r-squared values from the summary. The output is
> mysummary
(Intercept) aa$Tiempo r.square adj.r.squared
Tiempo 0.0000 1.0000000 1.0000000 1.0000000
UT1 -14642.7000 23.6818182 0.9514231 0.9460256
UT2 -8662.4182 14.0000000 0.7973105 0.7747894
UT3 -7535.5091 12.1818182 0.8404400 0.8227111
...
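From here, the R-squared screening described in the question is a one-liner; for example, with an (arbitrary, assumed) cut-off of 0.9:
# Drop the trivial Tiempo-on-Tiempo row, then list poorly fitting animals
animals <- mysummary[rownames(mysummary) != "Tiempo", ]
rownames(animals)[animals[, "r.square"] < 0.9]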
