I'm very new to multilevel modeling and doing data analysis for repeated measures. I am trying to figure out if my model is set up correctly using the nlme package and if it's set up correctly to answer the question I want to answer. I want to see if ius moderates the relationship between na and worry.
Variables
id - subject id
count - time variable; day of collection
worry - outcome (collected daily, continuous variable)
na - predictor (collected daily, continuous variable)
ius - moderator (collected at baseline, continuous variable)
I also created lag variables for na (lag_na) and worry (lag_worry) so I can control for the previous days na and worry though I'm not sure if this was the right thing to do.
Here is my code:
library(lme4)
# Here's an example dataset:
Dataset <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), levels = c("1", "2", "3",
"5"), class = "factor"), count = c(1, 2, 3, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 5, 6), na = c(1, 0, 0, 18, 13, 3, 5, 29, 15,
19, 21, 3, 5, 2, 2, 18, 19), worry = c(0, 1, 0, 0, 0, 0, 0, 2,
2, 1, 2, 0, 0, 3, 0, 4, 3), ius = c(35, 35, 35, 65, 65, 65, 65,
44, 44, 44, 44, 53, 53, 53, 53, 53, 53), lag_na = c(NA, 1, 0,
NA, 18, 13, 3, NA, 29, 15, 19, NA, 3, 5, 2, 2, 18), lag_worry = c(NA,
0, 1, NA, 0, 0, 0, NA, 2, 2, 1, NA, 0, 0, 3, 0, 4)), row.names = c(NA,
-17L), groups = structure(list(id = structure(1:4, levels = c("1",
"2", "3", "5"), class = "factor"), .rows = structure(list(1:3,
4:7, 8:11, 12:17), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
model <- lmer(worry ~ na*ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
For a variable to be an "effect moderator" (at least as the term is used in epidemiologic discussion) there would need to be a material change in the predictions from models with and without the interaction term in the model. You have a model with an interaction between ius and na
> model <- lmer(worry ~ na*ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
> model
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: worry ~ na * ius + lag_na + lag_worry + count + (1 | id)
Data: Dataset
AIC BIC logLik deviance df.resid
49.0113 54.0958 -15.5056 31.0113 4
Random effects:
Groups Name Std.Dev.
id (Intercept) 0.7525
Residual 0.6096
Number of obs: 13, groups: id, 4
Fixed Effects:
(Intercept) na ius lag_na lag_worry count na:ius
2.185346 -0.169745 -0.092891 0.128599 -0.871304 1.004783 0.003021
# Now remove the interaction term
> model <- lmer(worry ~ na + ius + lag_na + lag_worry + count + (1 | id), REML=FALSE, data = Dataset)
> model
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: worry ~ na + ius + lag_na + lag_worry + count + (1 | id)
Data: Dataset
AIC BIC logLik deviance df.resid
47.4562 51.9758 -15.7281 31.4562 5
Random effects:
Groups Name Std.Dev.
id (Intercept) 0.7212
Residual 0.6325
Number of obs: 13, groups: id, 4
Fixed Effects:
(Intercept) na ius lag_na lag_worry count
1.474439 -0.006056 -0.076298 0.122280 -0.840278 0.951945
From what I can see there is almost no change in measures of global fit (AIC, BIC or deviance). Do you want to proceed further in determining what the differences in predictions are with such a small dataset? There would be a difference in the predictions between these two models, but there seems to be little evidence that they would be considered "material". The method of examining what the data shows versus the respective models is described in this post to the stats.SE forum: https://stats.stackexchange.com/questions/33059/testing-for-moderation-with-continuous-vs-categorical-moderators/33090#33090
Plot (scatterplot) worry as the y-axis and na on the x-axis. Then for the non-interaction model plot the single line at the mean of ius, You're going to find some difficulty in doing this sensibly because the values of `ius are not at all normally distributed. (I discovered this when I went to color the points in a scatterplot:
findInterval(Dataset$ius, c(30,45, 52, 66))
[1] 1 1 1 3 3 3 3 1 1 1 1 3 3 3 3 3 3
> table(Dataset$ius)
35 44 53 65
3 4 6 4
When you plot the points with the four groups you find that the ranges of the outcome and the predictor within groups of identical ius measures are much smaller that the full dataset ranges. It really makes little sense to use an interaction model with continuous variables in this setting:
png(); plot(worry~jitter(na,3), Dataset, col=2+findInterval(Dataset$ius, c(30,36, 52, 56, 66))); dev.off()
So I see two compelling reasons not to use this analysis as evidence for effect moderation. Whether you want to built a categorical prediction model might be determined by how much more data could be gathered. Seems to be a pretty sparse dataset for any conclusions, but there is suggestion of some sort of grouping effect.
Related
I am working on an exercise asking me "Plot the residuals against Y_hat, each predictor variable, and each two-factor interaction term on separate graphs." Here is a snippet of the data set I am using:
> dput(head(Commercial_Properties, 10))
structure(list(Rental_Rates = c(13.5, 12, 10.5, 15, 14, 10.5,
14, 16.5, 17.5, 16.5), Age = c(1, 14, 16, 4, 11, 15, 2, 1, 1,
8), Op_Expense_Tax = c(5.02, 8.19, 3, 10.7, 8.97, 9.45, 8, 6.62,
6.2, 11.78), Vacancy_Rate = c(0.14, 0.27, 0, 0.05, 0.07, 0.24,
0.19, 0.6, 0, 0.03), Total_Sq_Ft = c(123000, 104079, 39998, 57112,
60000, 101385, 31300, 248172, 215000, 251015), residuals = c(`1` = -1.03567244005944,
`2` = -1.51380641405037, `3` = -0.591053402133659, `4` = -0.133568082335235,
`5` = 0.313283765150399, `6` = -3.18718522392237, `7` = -0.538356748944345,
`8` = 0.236302385996349, `9` = 1.98922037248654, `10` = 0.105829602747806
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
From here I created the proper linear model that includes two factor interaction terms:
commercial_properties_lm_two_degree_interaction <-
lm(data=Commercial_Properties,
formula=Rental_Rates ~ (Age + Op_Expense_Tax + Vacancy_Rate + Total_Sq_Ft)^2)
Next what I was hoping to accomplish was to plot the residuals not just of the linear terms, but also of the interaction terms. I attempted to do this using the residualPlots() function in the car package
library(car)
residualPlots(model=commercial_properties_lm_two_degree_interaction,
terms=~ (Age + Op_Expense_Tax + Vacancy_Rate + Total_Sq_Ft)^2)
When applied in this way the output only produced the residual plots against the linear terms, it didn't plot any interactions. So I then attempted to do it manually, but I got an error:
residualPlots(model=commercial_properties_lm_two_degree_interaction,
terms=~ Age + Op_Expense_Tax + Vacancy_Rate + Tota_Sq_Ft +
Age:Op_Expense_Tax + Age:Vacancy_Rate)
Error in termsToMf(model, terms) : argument 'terms' not interpretable.
Now if I were to do things completely manually I was able to get an interaction plot for example:
with(data=Commercial_Properties, plot(x=Op_Expense_Tax * Vacancy_Rate, y=residuals))
plotted successfully. My issue is that sure I can do this completely manually for a reasonably small amount of variables, but it will get extremely tedious once the amount of variables begins to get larger.
So my question is if there is a way to use an already created function in R to make residual plots of the interaction terms or would I be left to doing it completely manually or most likely having to write some sort of loop ?
Note, I'm not asking about partial residuals. I haven't gotten to that point in my text I'm using. Just plain interaction terms against residuals.
You could do an eval(parse()) approach using the 'term.labels' attribute.
With gsub(':', '*', a[grep(':', a)]) pull out the interaction terms and replace : with * so it can be evaluated.
a <- attr(terms(commercial_properties_lm_two_degree_interaction), 'term.labels')
op <- par(mfrow=c(2, 3))
with(Commercial_Properties,
lapply(gsub(':', '*', a[grep(':', a)]), function(x)
plot(eval(parse(text=x)), residuals, xlab=x)))
par(op)
Edit
This is how we would do this with a for loop in R (but see comments below):
as <- gsub(':', '*', a[grep(':', a)])
op <- par(mfrow=c(2, 3))
for (x in as) {
with(Commercial_Properties,
plot(eval(parse(text=x)), residuals, xlab=x)
)
}
par(op)
I have a dataset looking at a response variable (Fat %), over time (Week 0-4), and over a treatment condition -- short vs long day.
I used a lmer model test to find out if the variables and interaction term were significant and it was significant. I want to look further at the interaction term (so basically a Tukey test but still accounting for the repeated measures). That's when I started to use the emmeans package and the output is not giving me the full output I would like. Any suggestions I would love, thank you.
here is my data set:
structure(list(`Bird ID` = c(61, 62, 71, 72, 73, 76, 77, 63,
64, 69), Day = c("long", "long", "long", "long", "long", "long",
"long", "short", "short", "short"), Week = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `Body Weight` = c(34.57, 41.05, 37.74, 37.04, 33.38,
35.6, 31.88, 34.32, 35.5, 35.78), `Fat %` = c(2.42718446601942,
2.07515423443634, 11.7329093799682, 8.61137591356848, 5.36031238906638,
7.9879679144385, 1.2263099219621, 5.17970401691332, 8.73096446700508,
3.62993896562801), `Lean %` = c(97.5728155339806, 97.9248457655636,
88.2670906200318, 91.3886240864315, 94.6396876109336, 92.0120320855615,
98.7736900780379, 94.8202959830867, 91.2690355329949, 96.370061034372
), `Fat(g)` = c(0.7, 0.74, 3.69, 2.71, 1.51, 2.39, 0.33, 1.47,
2.58, 1.13), `Lean(g)` = c(28.14, 34.92, 27.76, 28.76, 26.66,
27.53, 26.58, 26.91, 26.97, 30), ID = c(1, 2, 3, 4, 5, 6, 7,
8, 9, 10)), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
code I have tried:
model:
model3b <- lmer( `Fat %` ~ Day + Week + Day:Week + (1|ID), data=jussara_data)
summary(model3b)
resp <- jussara_data$`Fat %`
f1 <- jussara_data$Week
f2 <- jussara_data$Day
fit1 = lm(log(resp) ~ f1 + f2 + f1:f2, data = jussara_data)
emm1 = emmeans(fit1, specs = pairwise ~ f1:f2)
emm1$emmeans
emm1$contrasts
The contrasts function I was hoping it would give me the summary looking something like this (but I need the repeated measures included not just this anova analysis):
Fat % groups
4:short 32.065752 a
3:short 27.678036 a
2:short 21.358485 b
4:long 13.895404 c
1:short 13.138941 c
2:long 12.245741 c
3:long 12.138498 c
1:long 10.315978 cd
0:short 6.134327 d
0:long 5.631602 d
but instead only gave me this:
f1 f2 emmean SE df lower.CL upper.CL
2 long 2.24 0.0783 66 2.09 2.40
2 short 2.80 0.0783 66 2.64 2.95
Results are given on the log (not the response) scale.
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
2 long - 2 short -0.556 0.111 66 -5.025 <.0001
Results are given on the log (not the response) scale.
Thank you for the help!
I am having issues with the ICC function from the psych package in R. Pretty much we had three technicians (AA,AB,AC) who measure 11 control solutions three times. We know the control values for these solutions(F_exp). The three measurements were averaged, leaving to AA_avg,AB_avg,AC_avg.
I am trying to calculate the Inter-rater reliability of these three technicians (It reflects the variation between 2 or more raters who measure the same group of subjects). I am planning to use ICC (2,1)
When I try to run
ICC(try[3:5]) # n*p matrix where n=subjects, p=raters.
I get the following results:
I am not sure what to do. I am feeding the data as instructed. When I do it with icc in the irr package, which is more specific with its format of data, I get:
And well and ICC of 0.999998 seems too good to be true. I would really appreciate any help. Thank you!
Here is the structure of my data:
try<-structure(list(Input = c(1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
F_Exp = c(3, 100, 1, 40, 4, 40, 4, 40, 1, 40, 100), AA_avg = c(3.11666666666667,
103.716666666667, 1, 40.8333333333333, 4.18333333333333,
40.8666666666667, 4.18333333333333, 40.9166666666667, 1.03333333333333,
40.9333333333333, 103.783333333333), AB_avg = c(3.25, 103.016666666667,
1.13333333333333, 40.8333333333333, 3.94666666666667, 40.45,
4.28333333333333, 41.1166666666667, 1.05, 40.9166666666667,
104), AC_avg = c(3.2, 103.55, 1.23333333333333, 40.9, 4.26666666666667,
40.4, 4.28333333333333, 40.9, 1.05, 40.95, 103.733333333333
), ALL_avg = c(3.18888888888889, 103.427777777778, 1.12222222222222,
40.8555555555556, 4.13222222222222, 40.5722222222222, 4.25,
40.9777777777778, 1.04444444444444, 40.9333333333333, 103.838888888889
), AA_error = c(-0.116666666666667, -3.71666666666667, 0,
-0.833333333333336, -0.183333333333334, -0.866666666666667,
-0.183333333333334, -0.916666666666664, -0.0333333333333334,
-0.93333333333333, -3.78333333333333), AB_error = c(-0.25,
-3.01666666666667, -0.133333333333333, -0.833333333333336,
0.0533333333333332, -0.450000000000003, -0.283333333333333,
-1.11666666666667, -0.05, -0.916666666666664, -4), AC_error = c(-0.2,
-3.55, -0.233333333333333, -0.899999999999999, -0.266666666666667,
-0.399999999999999, -0.283333333333333, -0.899999999999999,
-0.05, -0.950000000000003, -3.73333333333333)), row.names = c(NA,
-11L), groups = structure(list(Input = c(1, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Your raters' scores are negligibly different across both average rating and rating for each id. It can't estimate random intercept variation if there isn't any. Why don't you believe that your ICC is really high?
Between-rater means:
lapply(try[, 3:5], mean)
$AA_avg
[1] 34.96061
$AB_avg
[1] 34.90879
$AC_avg
[1] 34.95152
I'm trying to write a for loop within another for loop. The first loop grabs the ith vcov matrix from a list of variously sized matrices (vcmats below) and grabs a frame of 24 predictor models of appropriate dimension to multiply with the current vcov matrix from a list of frames (jacobians below) for the different models. The second loop should pull the jth record (row) from the selected predictor frame, correctly format it, then run the calculation with the vcov matrix and output an indicator variable and calculated result needed for post processing to the holding table (holdtab).
When I run the code below I get the following error: Error in jjacob[, 1:4] : incorrect number of dimensions because R is returning the column of 1s (i.e. the intercept column of jacobs), not the complete first record (i.e. jjacob = jacobs[1,]). I've substantially simplified the example but left enough complexity to demonstrate the problem. I would appreciate any help in resolving this issue.
vcmats <- list(structure(c(0.67553, -0.1932, -0.00878, -0.00295, -0.00262,
-0.00637, -0.1932, 0.19988, 0.00331, -0.00159, 0.00149, 2e-05,
-0.00878, 0.00331, 0.00047, -6e-05, 3e-05, 3e-05, -0.00295, -0.00159,
-6e-05, 0.00013, -2e-05, 6e-05, -0.00262, 0.00149, 3e-05, -2e-05,
2e-05, 0, -0.00637, 2e-05, 3e-05, 6e-05, 0, 0.00026), .Dim = c(6L,
6L)), structure(c(0.38399, -0.03572, -0.00543, -0.00453, -0.00634,
-0.03572, 0.10912, 0.00118, -0.00044, 0.00016, -0.00543, 0.00118,
0.00042, -3e-05, 4e-05, -0.00453, -0.00044, -3e-05, 0.00011,
5e-05, -0.00634, 0.00016, 4e-05, 5e-05, 0.00025), .Dim = c(5L,
5L)))
jacobians <- list(structure(list(intcpt = c(1, 1, 1, 1), species = c(1, 1,
0, 0), nage = c(6, 6, 6, 6), T = c(12, 50, 12, 50), hgt = c(90,
90, 90, 90), moon = c(7, 7, 7, 7), hXm = c(0, 0, 0, 0), covr = c(0,
0, 0, 0), het = c(0, 0, 0, 0)), .Names = c("intcpt", "species",
"nage", "T", "hgt", "moon", "hXm", "covr", "het"), row.names = c("1",
"1.4", "1.12", "1.16"), class = "data.frame"), structure(list(
intcpt = c(1, 1, 1, 1), species = c(1, 1, 0, 0), nage = c(6,
6, 6, 6), T = c(12, 50, 12, 50), hgt = c(0, 0, 0, 0), moon = c(7,
7, 7, 7), hXm = c(0, 0, 0, 0), covr = c(0, 0, 0, 0), het = c(0,
0, 0, 0)), .Names = c("intcpt", "species", "nage", "T", "hgt",
"moon", "hXm", "covr", "het"), row.names = c("2", "2.4", "2.12",
"2.16"), class = "data.frame"))
holdtab <- structure(list(model = structure(c(4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L), .Label = c("M.1.BaseCov", "M.2.Height", "M.5.Height.X.LastNewMoon",
"M.6.Height.plus.LastNew", "M.7.LastNewMoon", "M.G.Global"), class = "factor"),
aicc = c(341.317, 341.317, 341.317, 341.317, 342.1412, 342.1412,
342.1412, 342.1412), species = c(NA, NA, NA, NA, NA, NA,
NA, NA), condVar = c(NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("model",
"aicc", "species", "condVar"), row.names = c(1L, 2L, 3L, 4L,
25L, 26L, 27L, 28L), class = "data.frame")
jloop <- 1
for (imat in vcmats) { # Call the outside loop of vcov matrices
jacobs = jacobians[[jloop]] # Set tempvar jacobs as the jth member of the jacobians frame (n/24)
for (jjacob in jacobs) { # Call inside loop of lines in jacob (each individual set of predictor levels)
# I need to reduce the vector length to match my vcov matrix so
pt1 = jjacob[,1:4] # Separate Core columns from variable columns (because I don't want to drop species when ==0)
pt2 = jjacob[,5:9] # Pull out variable columns for next step
pt2 = pt2[,!apply(pt2 == 0, 2, all)] # Drop any variable columns that ==0
jjacob = cbind(pt1, pt2) # Reconstruct the record now of correct dimensions for the relevant vcov matrix
jjacob = as.matrix(jjacob) # Explicitly convert jjmod - I was having trouble with this previously
tj = (t(jjacob)) # Transpose the vector
condvar = jjacob %*% imat %*% tj # run the calculation
condVarTab[record,3] = jjacob[2] # Write species 0 or 1 to the output table
condVarTab[record,4] = condvar # Write the conditional variance to the table
record = record+1 # Iterate the record number for the next output run
}
jloop = jloop+1 # Once all 24 models in a frame are calculated iterate to the next frame of models which will be associated with a new vcv matrix
}
I have a list data like below. I want to perform nonlinear regression Gaussian curve fitting between mids and counts for each element of my list and report mean and standard deviation
mylist<- structure(list(A = structure(list(breaks = c(-10, -9,
-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4), counts = c(1L,
0L, 1L, 5L, 9L, 38L, 56L, 105L, 529L, 2858L, 17L, 2L, 0L, 2L),
density = c(0.000276014352746343, 0, 0.000276014352746343,
0.00138007176373171, 0.00248412917471709, 0.010488545404361,
0.0154568037537952, 0.028981507038366, 0.146011592602815,
0.788849020149048, 0.00469224399668783, 0.000552028705492686,
0, 0.000552028705492686), mids = c(-9.5, -8.5, -7.5, -6.5,
-5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5),
xname = "x", equidist = TRUE), .Names = c("breaks", "counts",
"density", "mids", "xname", "equidist"), class = "histogram"),
B = structure(list(breaks = c(-7, -6, -5,
-4, -3, -2, -1, 0), counts = c(2L, 0L, 6L, 2L, 2L, 1L, 3L
), density = c(0.125, 0, 0.375, 0.125, 0.125, 0.0625, 0.1875
), mids = c(-6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5), xname = "x",
equidist = TRUE), .Names = c("breaks", "counts", "density",
"mids", "xname", "equidist"), class = "histogram"), C = structure(list(
breaks = c(-7, -6, -5, -4, -3, -2, -1, 0, 1), counts = c(2L,
2L, 4L, 5L, 14L, 22L, 110L, 3L), density = c(0.0123456790123457,
0.0123456790123457, 0.0246913580246914, 0.0308641975308642,
0.0864197530864197, 0.135802469135802, 0.679012345679012,
0.0185185185185185), mids = c(-6.5, -5.5, -4.5, -3.5,
-2.5, -1.5, -0.5, 0.5), xname = "x", equidist = TRUE), .Names = c("breaks",
"counts", "density", "mids", "xname", "equidist"), class = "histogram")), .Names = c("A",
"B", "C"))
I have read this
Fitting a density curve to a histogram in R
but this is how to fit a curve to a histogram. what I want is Best-fit values"
" Mean"
" SD"
If I use PRISM to do it, I should get the following results
for A
Mids Counts
-9.5 1
-8.5 0
-7.5 1
-6.5 5
-5.5 9
-4.5 38
-3.5 56
-2.5 105
-1.5 529
-0.5 2858
0.5 17
1.5 2
2.5 0
3.5 2
performing nonlinear regression Gaussian curve fitting , I get
"Best-fit values"
" Amplitude" 3537
" Mean" -0.751
" SD" 0.3842
for the second set
B
Mids Counts
-6.5 2
-5.5 0
-4.5 6
-3.5 2
-2.5 2
-1.5 1
-0.5 3
"Best-fit values"
" Amplitude" 7.672
" Mean" -4.2
" SD" 0.4275
and for the third one
Mids Counts
-6.5 2
-5.5 2
-4.5 4
-3.5 5
-2.5 14
-1.5 22
-0.5 110
0.5 3
I get this
"Best-fit values"
" Amplitude" 120.7
" Mean" -0.6893
" SD" 0.4397
In order to convert the histogram back to the estimate of the mean and standard deviation. First convert the results of the bin counts times the bin. This will be an approximation of the original data.
Based on your example above:
#extract the mid points and create list of simulated data
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
#if the original data were integers then this may give a better estimate
#simdata<-lapply(mylist, function(x){rep(x$breaks[-1], x$counts)})
#find the mean and sd of simulated data
means<-lapply(simdata, mean)
sds<-lapply(simdata, sd)
#or use sapply in the above 2 lines depending on future process needs
If your data was integers then using the breaks as the bins will provide a better estimate. Depending on the function for the histogram (ie right=TRUE/FALSE) may shift the results by one.
Edit
I thought this was going to be an easy one. I reviewed the video, the sample data shown was:
mids<-seq(-7, 7)
counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
simdata<-rep(mids, counts)
The video results were mean = -0.7359 and sd= 0.4571. The solution which I found provided the closest results was using the "fitdistrplus" package:
fitdist(simdata, "norm", "mge")
Using the "maximizing goodness-of-fit estimation" resulted in mean = -0.7597280 and sd= 0.8320465.
At this point, the method above provides a close estimate but does not exactly match. I don't not know what technique was used to calculate the fit from the video.
Edit #2
The above solutions involved recreating the original data and fitting that using either the mean/sd or using the fitdistrplus package. This attempt is an attempt to perform a least-square fit using the Gaussian distribution.
simdata<-lapply(mylist, function(x){rep(x$mids, x$counts)})
means<-sapply(simdata, mean)
sds<-sapply(simdata, sd)
#Data from video
#mids<-seq(-7, 7)
#counts<-c(7, 1, 2, 2, 2, 5, 217, 70, 18, 0, 2, 1, 2, 0, 1)
#make list of the bins and distribution in each bin
mids<-lapply(mylist, function(x){x$mids})
dis<-lapply(mylist, function(x) {x$counts/sum(x$counts)})
#function to perform the least square fit
nnorm<-function(values, mids, dis) {
means<-values[1]
sds<-values[2]
#print(paste(means, sds))
#calculate out the Gaussian distribution for each bin
modeld<-dnorm(mids, means, sds)
#sum of the squares
diff<-sum( (modeld-dis)^2)
diff
}
#use optim function with the mean and sd as initial guesses
#find the mininium with the mean and SD as fit parameters
lapply(1:3, function(i) {optim(c(means[[i]], sds[[i]]), nnorm, mids=mids[[i]], dis=dis[[i]])})
This solution provides a closer answer to PRISM results, but still not the same. Here is a comparison of all the 4 solutions.
From the table, the least square fit (the one just above) provides the closest approximation. Maybe tweaking the mid points dnorm function might help. But Case B data is farthest from being normally distributed but the PRISM software still generates a small standard deviation, while the other methods are similar. It is possible the PRISM software performs some type of data filtering to remove the outliers before the fit.