Exchangeable correlations and constant variance in MCMCglmm? - r

I am attempting to fit a very simple model with MCMCglmm, but am getting quite stuck.
Imagine a class (30 students) get grades for two papers throughout the semester where the paper assignments are exactly the same (we don't want to model a difference in average scores between the papers, there are no "learning effects", and we can assume that the variance in grades are the same.)
Let $i = 1...30$ index the student, $y_{i1}$ and $y_{i2}$ index the scores for that student's first and second papers.
One way to model this data is using random intercepts for student scores to account for correlation between each students scores. Let $\mu_i$ be the student intercept, $sigma$ be the residual sd, and $\sigma_{\mu}$ be the sd of the intercepts. Then we write (in shorthand) our random intercept model at $f(y_{ij}|\mu_i) = Normal(\mu_i, \sigma)$ and $f(\mu_i) = Normal(\mu, \sigma_{\mu)$.
An alternative way to write this model would be to model the residual correlation structure more explicitly. That is, we would write that ${y_{i1}, y_{i2}}$ have a multivariate normal distribution with mean ${\mu, \mu}$ variance $\tau = \sigma^2 + \sigma_{\mu}^2$ and correlation $\rho = \sigma_{\mu}^2 / (\sigma^2 + \sigma_{\mu}^2)$.
To be clear, these models are mathematically equivalent, but statistical software will often have a specific implementation for each. For example we can fit the two approaches separately with nlme:
library(nlme)
library(tidyverse)
library(MCMCglmm)
df <-
tibble(id = factor(rep(1:100, each = 20))) %>%
mutate(paper = 1:n()) %>%
group_by(id) %>%
mutate(mu = rnorm(1),
y = mu + rnorm(n(), 0, 3))
gls(data = df,
model = y~1,
correlation = corCompSymm(form = ~ 1 | id))
lme(data = df, fixed = y ~ 1, random = ~1|id)
It seems MCMCglmm can fit the first parameterization (random intercepts) of the model just fine.
MCMCglmm(data = df,
fixed = y ~ 1,
random = ~id,
nitt = 1000, burnin = 0, thin = 1)
However, I am not seeing a way to implement the second approach. My best attempt involves "widening" the data frame and fitting a multiple response model.
df.wide <- df %>% select(- paper) %>%
pivot_wider(values_from = "y",
names_from = "obs", names_prefix = "paper") %>%
as.data.frame
MCMCglmm(fixed = cbind(paper1, paper2) ~ 1,
rcov = ~us(trait):units,
data = df.wide)
However, (1) I am not sure that I am fitting this model correctly, (2) I am not sure how to interpret the fitted values (especially since my posterior mean covariances seem much too small) and (3) there doesn't seem to be a way to get a constant variance across traits.
p.s. I would appreciate not being told to just fit the random intercept model. I am writing some course materials, and would like students to be able to more directly compare the exchangeable correlation model with other types of correlation structures that we might use when we have more than two observations (i.e. AR, Toeplitz, etc.), and I would like my students to be able to do the comparison of the two parameterizations themselves, as I would do when I used nlme.
FOLLOW-UP: I am currently trying to fit the model with BRMS, though would still be open to any "hacks" in MCMCglmm.
model1 <- brms::brm(data = df,
formula = y ~ 1 + cosy(gr = id, time = obs),
family = "gaussian",
chains = 4, thin = 1, iter = 5000, warmup = 100)

Is exchangeability + equal variances the same as what I would call compound symmetry? (I guess so, since you're using corCompSymm() in nlme) ...
As far as I can tell this isn't possible (I can't rule out that there's some way to hack it with the available variance structures, but it's far from obvious ...) From ?MCMCglmm:
Currently, the only ‘variance.functions’ available are ‘idv’,
‘idh’, ‘us’, ‘cor[]’ and ‘ante[]’. ‘idv’ fits a constant
variance across all components in ‘formula’. Both ‘idh’ and
‘us’ fit different variances across each component in
‘formula’, but ‘us’ will also fit the covariances. ‘corg’
fixes the variances along the diagonal to one and ‘corgh’
fixes the variances along the diagonal to those specified in
the prior. ‘cors’ allows correlation submatrices. ‘ante[]’
fits ante-dependence structures of different order (e.g
ante1, ante2), and the number can be prefixed by a ‘c’ to
hold all regression coefficients of the same order equal. The
number can also be suffixed by a ‘v’ to hold all innovation
variances equal (e.g ‘antec2v’ has 3 parameters).
By using the us() (unstructured, what nlme would call pdSymm for "positive-definite symmetric") structure, I believe you're not constraining the correlation parameters to be all the same (i.e., violating exchangeability).
For what it's worth, one reason (other than pedagogy) to want to specify a compound-symmetric correlation matrix explicitly rather than by composing the sum of group-level and individual-level random effects would be if you wanted to model negative compound symmetry (the sum-of-random-effects approach can only model rho>0).
My guess is that you're also restricted to answers using MCMCglmm, but if "some Bayesian MCMC approach" is good enough, then you could do this via brms or (somewhat more obscurely, sort of) glmmTMB + tmbstan (although this combination does not currently use informative priors!)

Related

Quasi-Poisson mixed-effect model on overdispersed count data from multiple imputed datasets in R

I'm dealing with problems of three parts that I can solve separately, but now I need to solve them together:
extremely skewed, over-dispersed dependent count variable (the number of incidents while doing something),
necessity to include random effects,
lots of missing values -> multiple imputation -> 10 imputed datasets.
To solve the first two parts, I chose a quasi-Poisson mixed-effect model. Since stats::glm isn't able to include random effects properly (or I haven't figured it out) and lme4::glmer doesn't support the quasi-families, I worked with glmer(family = "poisson") and then adjusted the std. errors, z statistics and p-values as recommended here and discussed here. So I basically turn Poisson mixed-effect regression into quasi-Poisson mixed-effect regression "by hand".
This is all good with one dataset. But I have 10 of them.
I roughly understand the procedure of analyzing multiple imputed datasets – 1. imputation, 2. model fitting, 3. pooling results (I'm using mice library). I can do these steps for a Poisson regression but not for a quasi-Poisson mixed-effect regression. Is it even possible to A) pool across models based on a quasi-distribution, B) get residuals from a pooled object (class "mipo")? I'm not sure. Also I'm not sure how to understand the pooled results for mixed models (I miss random effects in the pooled output; although I've found this page which I'm currently trying to go through).
Can I get some help, please? Any suggestions on how to complete the analysis (addressing all three issues above) would be highly appreciated.
Example of data is here (repre_d_v1 and repre_all_data are stored in there) and below is a crucial part of my code.
library(dplyr); library(tidyr); library(tidyverse); library(lme4); library(broom.mixed); library(mice)
# please download "qP_data.RData" from the last link above and load them
## ===========================================================================================
# quasi-Poisson mixed model from single data set (this is OK)
# first run Poisson regression on df "repre_d_v1", then turn it into quasi-Poisson
modelSingle = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
family = "poisson",
data = repre_d_v1)
# I know there are some warnings but it's because I share only a modified subset of data with you (:
printCoefmat(coef(summary(modelSingle))) # unadjusted coefficient table
# define quasi-likelihood adjustment function
quasi_table = function(model, ctab = coef(summary(model))) {
phi = sum(residuals(model, type = "pearson")^2) / df.residual(model)
qctab = within(as.data.frame(ctab),
{`Std. Error` = `Std. Error`*sqrt(phi)
`z value` = Estimate/`Std. Error`
`Pr(>|z|)` = 2*pnorm(abs(`z value`), lower.tail = FALSE)
})
return(qctab)
}
printCoefmat(quasi_table(modelSingle)) # done, makes sense
## ===========================================================================================
# now let's work with more than one data set
# object "repre_all_data" of class "mids" contains 10 imputed data sets
# fit model using with() function, then pool()
modelMultiple = with(data = repre_all_data,
expr = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
family = "poisson"))
summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
# this has quite similar structure as coef(summary(someGLM))
# but I don't see where are the random effects?
# and more importantly, I wanted a quasi-Poisson model, not just Poisson model...
# ...but here it is not possible to use quasi_table function (defined earlier)...
# ...and that's because I can't compute "phi"
This seems reasonable, with the caveat that I'm only thinking about the computation, not whether this makes statistical sense. What I'm doing here is computing the dispersion for each of the individual fits and then applying it to the summary table, using a variant of the machinery that you posted above.
## compute dispersion values
phivec <- vapply(modelMultiple$analyses,
function(model) sum(residuals(model, type = "pearson")^2) / df.residual(model),
FUN.VALUE = numeric(1))
phi_mean <- mean(phivec)
ss <- summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
## adjust
qctab <- within(as.data.frame(ss),
{ std.error <- std.error*sqrt(phi_mean)
statistic <- estimate/std.error
p.value <- 2*pnorm(abs(statistic), lower.tail = FALSE)
})
The results look weird (dispersion < 1, all model results identical), but I'm assuming that's because you gave us a weird subset as a reproducible example ...

Calculate indirect effect of 1-1-1 (within-person, multilevel) mediation analyses

I have data from an Experience Sampling Study, which consists of 8140 observations nested in 106 participants. I want to test if there is a mediation, in which I also want to compare the predictors (X1= socialInteraction_tech, X2= socialInteraction_ftf, M = MPEE_int, Y= wellbeing). X1, X2, and M are person-mean centred in order to obtain the within-person effects. To account for the autocorrelation I have fit a model with an ARMA(2,1) structure. We control for time with the variable "obs".
This is the final model including all variables of interest:
fit_mainH1xmy <- lme(fixed = wellbeing ~ 1 + obs # Controls
+ MPEE_int_centred + socialInteraction_tech_centred + socialInteraction_ftf_centred,
random = ~ 1 + obs | ID, correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
data = file, method = "ML", na.action=na.exclude)
summary(fit_mainH1xmy)
The mediation is partial, as my predictor X still significantly predicts Y after adding M.
However, I can't find a way to calculate c'(cprime), the indirect effect.
I have found the mlma package, but it looks weird and requires me to do transformations to my data.
I have tried melting the data in a long format and using lmer() to fit the model (following https://quantdev.ssri.psu.edu/sites/qdev/files/ILD_Ch07_2017_Within-PersonMedationWithMLM.html), but lmer() does not let me take into account the moving average (MA-part of the ARMA(2,1) structure).
Does anyone know how I could now obtain the indirect effect?

Random Effects in Longitudinal Multilevel Imputation Models Using MICE

I am trying to impute data in dataset with a longitudinal design. There are two predictors (experimental group, and time) and one outcome variable (score). The clustering variable is id.
Here is the toy data
set.seed(345)
A0 <- rnorm(4,2,.5)
B0 <- rnorm(4,2+3,.5)
A1 <- rnorm(4,6,.5)
B1 <- rnorm(4,6+2,.5)
A2 <- rnorm(4,10,.5)
B2 <- rnorm(4,10+1,.5)
A3 <- rnorm(4,14,.5)
B3 <- rnorm(4,14+0,.5)
score <- c(A0,B0,A1,B1,A2,B2,A3,B3)
id <- rep(1:8,times = 4, length = 32)
time <- rep(0:3, each = 8, length = 32)
group <- rep(c("A","B"), times =2, each = 4, length = 32)
df <- data.frame(id = id, group = group, time = time, score = score)
# plots
(ggplot(df, aes(x = time, y = score, group = group)) +
stat_summary(fun.y = "mean", geom = "line", aes(linetype = group)) +
stat_summary(fun.y = "mean", geom = "point", aes(shape = group), size = 3) +
coord_cartesian(ylim = c(0,18)))
# now place some NAs
df[sample(1:nrow(df), 10, replace = F),"score"] <- NA
df
If I understand this post correctly, in the predictor matrix I should specify the id clustering variable with a -2 and the two fixed predictors time and group with a 1. Like so
library(mice)
(ini <- mice(df, maxit=0))
(pred <- ini$predictorMatrix)
(pred["score",] <- c(-2, 1, 1, 0))
(imp <- mice(df,
method = c("", "", "", "2l.pan"),
pred = pred,
maxit = 1,
seed = 71152))
What i would like to know is:
Is this a longitudinal random intercepts imputation model? Specifying the id variable as -2 designates it as a 'class' variable, but in this mice primer it suggests that for multilevel models you should create a variable of all 1's in the dataframe as a constant, which is then specified as the random intercept via 2 in the predictor matrix. However, this is based on the 2l.norm function rather than the 2l.pan function, so I am not really sure where I am here. Does the 2l.pan function not require this column, or the specification of random effects?
Is there any way to specify a longitudinal random-slopes model, and, if so, how?
This answer is probably a bit late for you, but it may be able to help some people who read this in the future:
How to work with 2l.pan
Below are some details about specifying multilevel imputation models with mice. Because the application is longitudinal, I use the term "persons" to refer to units at Level 2. These are the most relevant arguments for 2l.pan as mentioned in the mice documentation:
type
Vector of length ncol(x) identifying random and class variables.
Random effects are identified by a 2. The group variable (only one
is allowed) is coded as -2. Random effects also include the fixed
effect. If for a covariates X1 group means shall be calculated and
included as further fixed effects choose 3. In addition to the
effects in 3, specification 4 also includes random effects of
X1.
There are 5 different codes you can use in the predictor matrix for variables imputed with 2l.pan. The person identifier is coded as -2 (this is different from 2l.norm). To include predictor variables with fixed or random effects, these variables are coded with 1 or 2, respectively. If coded as 2, the corresponding fixed effect is automatically included.
In addition, 2l.pan offers the codes 3 and 4, which have similar meanings as 1 and 2 but will include an additional fixed effect for the person mean of that variable. This is useful if you're trying to model within- and between-person effects of time-varying predictor variables.
intercept
Logical determining whether the intercept is automatically added.
By default, 2l.pan includes the intercept as both a fixed and a random effect. For this reason, it is not required to include a constant term in the predictor matrix. If one sets intercept=FALSE, this behavior is changed, and the intercept is dropped from the imputation model.
groupcenter.slope
If TRUE, in case of group means (type is 3 or 4) group mean
centering for these predictors are conducted before doing imputations.
Default is FALSE.
Using this option, it is possible to center predictor variables around the person mean instead of including the predictor variable "as is" (i.e., without centering). This only applies to variables coded as 3 or 4. For predictors coded as 3, this is not very important because the models with and without centering are identical.
However, when predictor variables are coded as 4 (i.e., with a random slope), then centering alters the meaning of the random effect so that the random slope no longer applies to the variable "as is" but to the within-person deviation of that variable.
In your example, you can include a simple random slope for time as follows:
library(mice)
ini <- mice(df, maxit=0)
# predictor matrix (following 'type')
pred <- ini$predictorMatrix
pred["score",] <- c(-2, 1, 2, 0)
# imputation method
meth <- c("", "", "", "2l.pan")
imp <- mice(df, method=meth, pred=pred, maxit=10, m=10)
In this example, coding time as 3 or 4 wouldn't make a lot of sense because the person means of time are identical for all persons. However, if you have time-varying covariates that you want to include as predictor variables in the imputation model, 3 and 4 can be useful.
The additional arguments like intercept and groupcenter.slope can be specified directly in the call to mice(), for example:
imp <- mice(df, ..., groupcenter.slope=TRUE)
Regarding your Questions
So, to answer your questions as stated in the post:
Yes, 2l.pan provides a multilevel (or rather two-level) imputation model. The intercept is included as both a fixed and a random effect by default (can be changed with intercept=FALSE) and need not be specified in the predictor matrix (this is in contrast to 2l.norm).
Yes, you can specify random slopes with 2l.pan. To do that, predictors with random slopes are coded as 2 or 4 in the predictor matrix. If coded
as 2, the random slope is included. If coded as 4, the random slope is included as well as an additional fixed effect for the person means of that variable. If coded as 4, the meaning of the random slope may be altered by making use of groupcenter.slope=TRUE (see above).
This article also includes some worked examples for how to work with 2l.pan and other functions for mutlivel imputation: [Link]
The pan library doesn't require an intercept term.
You can dig into the function using
library(pan)
?pan
That said mice uses a wrapper around pan called mice.impute.2l.pan with the mice library loaded you can look at the help for that function. It states: it has a parameters called intercept which is [a] Logical [and] determin[es] whether the intercept is automatically added. It is TRUE by default. This is defined as a random intercept by default. Found this out after browsing the R code for the mice wrapper:
if (intercept) {
x <- cbind(1, as.matrix(x))
type <- c(2, type)
}
Where the pan function parameter type is a Vector of length ncol(x) identifying random and class variables. The intercept is added by default and defined as a random effect.
They do provide and example like you stated with a 1 for "x" in the prediction matrix for fixed effects.
It also states for 2l.norm, The random intercept is automatically added in mice.impute.2l.norm().
It has a few examples with descriptions.
The CRAN documentation for pan might help you.

rstanarm for Bayesian hierarchical modeling of binomial experiments

Suppose there are three binomial experiments conducted chronologically. For each experiment, I know the #of trials as well as the #of successes. To use the first two older experiments as prior for the third experiment, I want to "fit a Bayesian hierarchical model on the two older experiments and use the posterior form that as prior for the third experiment".
Given my available data (below), my question is: is my rstanarm code below capturing what I described above?
Study1_trial = 70
Study1_succs = 27
#==================
Study2_trial = 84
Study2_succs = 31
#==================
Study3_trial = 100
Study3_succs = 55
What I have tried in package rstanarm:
library("rstanarm")
data <- data.frame(n = c(70, 84, 100), y = c(27, 31, 55));
mod <- stan_glm(cbind(y, n - y) ~ 1, prior = NULL, data = data, family = binomial(link = 'logit'))
## can I use a beta(1.2, 1.2) as prior for the first experiment?
TL;DR: If you were directly predicting the probability of success, the model would be a Bernoulli likelihood with parameter theta (the probability of success) that could take on values between zero and one. You could use a Beta prior for theta in this case. But with a logistic regression model, you're actually modeling the log odds of success, which can take on any value from -Inf to Inf, so a prior with a normal distribution (or some other prior that can take on any real value within some range determined by the available prior information) would be more appropriate.
For a model where the only parameter is the intercept, the prior is the probability distribution for the log odds of success. Mathematically, the model is:
log(p/(1-p)) =  a
Where p is the probability of success and a, the parameter you're estimating, is the intercept, which can be any real number. If the odds of success are 1:1 (that is, p = 0.5) then a = 0. If the odds are greater than 1:1 then a is positive. If the odds are less than 1:1 then a is negative.
Since we want a prior for a, we need a probability distribution that can take on any real value. If we didn't know anything about the odds of success, we might use a very weakly informative prior like a normal distribution with, say, mean=0 and sd=10 (this is the rstanarm default), meaning that one standard deviation would encompass odds of success ranging from about 22000:1 to 1:22000! So this prior is essentially flat.
If we take your first two studies to construct the prior, we can use the probability density based on those studies and then transform it to the log odds scale:
# Possible outcomes (that is, the possible number of successes)
s = 0:(70+84)
# Probability density over all possible outcomes
dens = dbinom(s, 70+84, (27+31)/(70+84))
Assuming we'll use a normal distribution for the prior, we want the most likely probability of success (which will be the mean for the prior) and the standard deviation of the mean.
# Prior parameters
pp = s[which.max(dens)]/(70+84) # most likely probability
psd = sum(dens * (s/max(s) - pp)^2)^0.5 # standard deviation
# Convert prior to log odds scale
pp_logodds = log(pp/(1-pp))
psd_logodds = log(pp/(1-pp)) - log((pp-psd)/(1 - (pp-psd)))
c(pp_logodds, psd_logodds)
[1] -0.5039052 0.1702006
You could generate essentially the same prior by running stan_glm on the first two studies with the default (flat) prior:
prior = stan_glm(cbind(y, n-y) ~ 1,
data = data[1:2,],
family = binomial(link = 'logit'))
c(coef(prior), se(prior))
[1] -0.5090579 0.1664091
Now let's fit the model using data from Study 3 using the default prior and the prior we just generated. I've switched to a standard data frame, since stan_glm seems to fail when the data frame has only one row (as in data = data[3, ]).
# Default weakly informative prior
mod1 <- stan_glm(y ~ 1,
data = data.frame(y=rep(0:1, c(45,55))),
family = binomial(link = 'logit'))
# Prior based on studies 1 & 2
mod2 <- stan_glm(y ~ 1,
data = data.frame(y=rep(0:1, c(45,55))),
prior_intercept = normal(location=pp_logodds, scale=psd_logodds),
family = binomial(link = 'logit'))
For comparison, let's also generate a model with all three studies and the default flat prior. We would expect this model to give virtually the same results as mod2:
mod3 <- stan_glm(cbind(y, n - y) ~ 1,
data = data,
family = binomial(link = 'logit'))
Now let's compare the three models:
library(tidyverse)
list(`Study 3, Flat Prior`=mod1,
`Study 3, Prior from Studies 1 & 2`=mod2,
`All Studies, Flat Prior`=mod3) %>%
map_df(~data.frame(log_odds=coef(.x),
p_success=predict(.x, type="response")[1]),
.id="Model")
Model log_odds p_success
1 Study 3, Flat Prior 0.2008133 0.5500353
2 Study 3, Prior from Studies 1 & 2 -0.2115362 0.4473123
3 All Studies, Flat Prior -0.2206890 0.4450506
For Study 3 with the flat prior (row 1), the predicted probability of success is 0.55, as expected, since that's what the data says and the prior provides no additional information.
For Study 3 with a prior based on studies 1 and 2, the probability of success is 0.45. The lower probability of success is due to the lower probability of success in Studies 1 and 2 adding additional information. In fact, the probability of success from mod2 is exactly what you'd calculate directly from the data: with(data, sum(y)/sum(n)). mod3 puts all the information into the likelihood instead of splitting it between the prior and the likelihood, but is otherwise essentially the same as mod2.
Answer to (now deleted) comment: If all you know is the number of trials and successes and you think that a binomial probability is a reasonable model for how the data were generated, then it doesn't matter how you split up the data into "prior" and "likelihood" or whether you shuffle the order of the data. The resulting model fit will be the same.

glm model dataset summarisation

first post, so go easy.
In the insurance world of GLMing, the classic approach is to model claims frequency and average severity. With that in mind, I built a couple of models to experiment for myself and now have a question.
Could somebody please explain how GLM handles varying levels of summarisation of a dataset, particularly with regard to error estimates?
Consider the example below. The data exhibits strong severity trends for both variables:
- A has more expensive claims than B
- Ford > Kia > Vaux > Jag
I fitted a model to unsummarised and a summarised version of the dataset, and accordingly GLM fitted the same parameters in both cases
However, GLM indicates a well fitted model to the unsummarised data. But when I summarise and use a weighted mean, ie average severity, the model fits poorly. Maybe this is as you would expect, after all the unsummarised data has more points to model with. Also, it appears the weighted mean is used to indicate RELATIVE strength, so here, specifiying the weighted mean is pointless, since they are all the same weights.
But more fundementally, can I not model average severity with GLM? I mean, I know the result of fitting a GLM to an unsummarised dataset will be a average severity, but I was hoping to fit a model to already summarised data. It appears that modelling on aggregated datasets will not give a true indication of the model fit.
Apologies if this a stupid question, I'm not a statistician, so don't fully understand the Hessian Matrix.
Please see code below:
library(boot)
library(reshape)
dataset <- data.frame(
Person = rep(c("A", "B"), each=200),
Car = rep(c("Ford", "Kia", "Vaux", "Jag"), 2, each=50),
Amount = c(rgamma(50, 200), rgamma(50, 180), rgamma(50, 160), rgamma(50, 140),
rgamma(50, 100), rgamma(50, 80), rgamma(50, 60), rgamma(50, 40))
)
Agg1 <- ddply(dataset, .(Person, Car), summarise, mean=mean(Amount), length=length(Amount))
m1 <- glm(Amount ~ Person + Car, data = dataset, family = Gamma(link="log"))
m2 <- glm(mean ~ Person + Car, data = Agg1, family = Gamma(link="log"), weights=length)
summary(m1)
summary(m2)
Thanks,
Nick
Bottom line is that both models are identical - the reason the aggregated model "fits poorly" is entirely due to the reduction in degrees of freedom due to aggregation.
Before getting into why the models are identical, I should point out that this does not necessarily mean that either model is a good fit. You should run diagnostics on both, especially using:
par(mfrow=c(2,2))
plot(m1)
When you do this. you'll see that the residuals are normally distributed (which is essential), but that they follow a pattern (-, +, -), which is disturbing. I would want to understand that before declaring that this is a good model. [Admittedly, this is made up data, but the principles apply nevertheless.]
Comparing the aggregated to base models, look at the values of the coefficients.
coef.m1 <- summary(m1)$coefficients
coef.m2 <- summary(m2)$coefficients
cbind(coef.m1[,1],coef.m2[,1])
# [,1] [,2]
# (Intercept) 5.4096980 5.4096976
# PersonB -0.9249371 -0.9249366
# CarJag -0.6144606 -0.6144602
# CarKia -0.1786556 -0.1786555
# CarVaux -0.3597925 -0.3597923
The reason you think the aggregated model is "worse" is because of the p-values, but these depend on t = coeff/se . The ratio of se in m1 vs. m2 is the same for all coefficients:
coef.m2[,2]/coef.m1[,2]
# (Intercept) PersonB CarJag CarKia CarVaux
# 7.836171 7.836171 7.836171 7.836171 7.836171
Since
se ~ sd / √ df
the ratio of se for the two models should be approx
sem1/sem2 = √( (nm1-1) / (nm2-1) )
sqrt((nrow(dataset)-1)/(nrow(Agg1)-1))
# [1] 7.549834
Frankly I'm puzzled why the ratio is not exactly equal to 7.55.
Put another way, glm(...) has no way of knowing that you aggregated your data. It thinks you are trying to fit a model with 4 parameters and an intercept to 8 data points.

Resources