I am computing average marginal effects in Julia using the Effects package. My aim is to see how weight changes between men and women at different ages. As you can see in the output below, it computes the average marginal effect at each age for men and women. However, I would like to use ranges of the age variable rather than each year on its own. For example, I would like to have the age ranges 0:5, 5:10, 10:15, and so on. This has to be done after the regression model is run, not beforehand. I tried to work it out on my own, but I am not fluent enough in Julia.
So the only line that needs to be rectified is the following:
d1 = Dict(:sex => ["male","female"],:age => [0:5; 6:20])
Here is the code:
using DataFrames, Effects, GLM, StatsModels, StableRNGs
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
                       sex=repeat(["male", "female"], inner=8),
                       weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
d1 = Dict(:sex => ["male","female"],:age => [0:5; 6:20])
ave = effects(d1, mod_uncentered)
OUTPUT
42×6 DataFrame
 Row │ sex     age    weight      err       lower      upper
     │ String  Int64  Float64     Float64   Float64    Float64
─────┼─────────────────────────────────────────────────────────
   1 │ male        0    0.287822  2.88762    -2.5998    3.17545
   2 │ female      0   56.4387    2.88762    53.5511   59.3263
   3 │ male        1    8.00869   2.71603     5.29266  10.7247
   4 │ female      1   59.8481    2.71603    57.1321   62.5641
   5 │ male        2   15.7296    2.54468    13.1849   18.2742
   6 │ female      2   63.2575    2.54468    60.7128   65.8022
   7 │ male        3   23.4504    2.37361    21.0768   25.824
   8 │ female      3   66.6669    2.37361    64.2933   69.0405
   9 │ male        4   31.1713    2.2029     28.9684   33.3742
  10 │ female      4   70.0763    2.2029     67.8734   72.2792
  11 │ male        5   38.8922    2.03264    36.8595   40.9248
  12 │ female      5   73.4857    2.03264    71.4531   75.5184
  13 │ male        6   46.613     1.86295    44.7501   48.476
  14 │ female      6   76.8951    1.86295    75.0322   78.7581
  15 │ male        7   54.3339    1.69399    52.6399   56.0279
  16 │ female      7   80.3046    1.69399    78.6106   81.9985
  17 │ male        8   62.0548    1.52602    60.5288   63.5808
  18 │ female      8   83.714     1.52602    82.1879   85.24
  19 │ male        9   69.7756    1.3594     68.4162   71.135
  20 │ female      9   87.1234    1.3594     85.764    88.4828
  21 │ male       10   77.4965    1.19469    76.3018   78.6912
  22 │ female     10   90.5328    1.19469    89.3381   91.7275
  23 │ male       11   85.2174    1.03282    84.1846   86.2502
  24 │ female     11   93.9422    1.03282    92.9094   94.975
  25 │ male       12   92.9383    0.875345   92.0629   93.8136
  26 │ female     12   97.3516    0.875345   96.4762   98.2269
  27 │ male       13  100.659     0.72515    99.934   101.384
  28 │ female     13  100.761     0.72515   100.036   101.486
  29 │ male       14  108.38      0.587838  107.792   108.968
  30 │ female     14  104.17      0.587838  103.583   104.758
....
For those familiar with R, Effects.jl is comparable to the effects package and not the emmeans package. While there is a certain amount of overlap between effects and emmeans, effects "only" makes predictions for particular values of the predictors, while emmeans is also capable of computing marginal averages over several values (e.g., ranges) of the predictors.
Effects.jl is essentially a wrapper for doing a few things:
computing a fully crossed "reference grid" of a small set of predictors
finding typical values for all other predictors in the model (generally the mean, but you can use a different summary function; note that you need to think about whether your summary function has a sensible interpretation for the contrasts associated with a categorical predictor)
adding these typical values into the reference grid for a fully specified set of data to make predictions on
computing predictions and associated standard errors based on the variance-covariance matrix of the model parameter estimates (vcov). Note that for mixed models, this means only the fixed effects play a role. (The same holds for the use of the effects package in R with models fit with lme4.)
In other words, Effects.jl doesn't understand ranges; it just understands a set of values. It doesn't know how to make a prediction for 0:5, but it does know how to make predictions for 0, 1, etc.
Since you're interested in the average prediction over a range, you can just compute the average of the predictions you have:
julia> using Statistics
julia> transform!(ave, :age => ByRow(x -> x <= 5 ? "0:5" : "6:20") => :age_bin)
42×7 DataFrame
Row │ sex age weight err lower upper age_bin
│ String Int64 Float64 Float64 Float64 Float64 String
─────┼────────────────────────────────────────────────────────────────────
1 │ male 0 0.287822 2.88762 -2.5998 3.17545 0:5
2 │ female 0 56.4387 2.88762 53.5511 59.3263 0:5
3 │ male 1 8.00869 2.71603 5.29266 10.7247 0:5
4 │ female 1 59.8481 2.71603 57.1321 62.5641 0:5
5 │ male 2 15.7296 2.54468 13.1849 18.2742 0:5
6 │ female 2 63.2575 2.54468 60.7128 65.8022 0:5
7 │ male 3 23.4504 2.37361 21.0768 25.824 0:5
8 │ female 3 66.6669 2.37361 64.2933 69.0405 0:5
9 │ male 4 31.1713 2.2029 28.9684 33.3742 0:5
10 │ female 4 70.0763 2.2029 67.8734 72.2792 0:5
11 │ male 5 38.8922 2.03264 36.8595 40.9248 0:5
12 │ female 5 73.4857 2.03264 71.4531 75.5184 0:5
13 │ male 6 46.613 1.86295 44.7501 48.476 6:20
14 │ female 6 76.8951 1.86295 75.0322 78.7581 6:20
....
julia> rms(x) = sqrt(mean(abs2, x))
rms (generic function with 1 method)
julia> combine(groupby(ave, [:sex, :age_bin]), :weight => mean, :err => rms; renamecols=false)
4×4 DataFrame
Row │ sex age_bin weight err
│ String String Float64 Float64
─────┼────────────────────────────────────
1 │ male 0:5 19.59 2.47686
2 │ female 0:5 64.9622 2.47686
3 │ male 6:20 100.659 1.04247
4 │ female 6:20 100.761 1.04247
For the error, I used root-mean-square (RMS): in other words, take the mean of the associated variances and then convert back to the standard deviation scale. (Standard errors are the standard deviation of the sampling distribution of a statistic.)
For this particular model (well-balanced data, no pesky covariates, no nonlinear transformations of the response), this works out to be the same prediction you would get from taking the mean of the predictors and then computing a single prediction:
julia> d2 = Dict(:sex => ["male","female"],:age => [ mean(0:5); mean(6:20)])
Dict{Symbol, Vector} with 2 entries:
:sex => ["male", "female"]
:age => [2.5, 13.0]
julia> effects(d2, mod_uncentered)
4×6 DataFrame
Row │ sex age weight err lower upper
│ String Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────
1 │ male 2.5 19.59 2.4591 17.1309 22.0491
2 │ female 2.5 64.9622 2.4591 62.5031 67.4213
3 │ male 13.0 100.659 0.72515 99.934 101.384
4 │ female 13.0 100.761 0.72515 100.036 101.486
The errors are somewhat smaller, because the error here reflects the uncertainty associated with a single prediction, while the error above reflects the uncertainty from several predictions.
I don't know the Effects package, but in [0:5; 6:20] the ranges are automatically expanded by Julia. Did you also try [0:5, 6:20]?
I'm trying to investigate whether the proportion of muzzle contact (mc) in primates tends to be directed more towards the mother than towards other group members (adults or juveniles). I have data over 5 years in 4 different groups. Here is an example for 3 different initiators (those initiating the mc):
age1data
initiator  receiver  count  total_init  prop_mc  subgroupsize  group
Aaa        Mother        1           3    0.333             1      1
Aaa        Adult         2           3    0.666            40      1
Aaa        Juvenile      0           3    0                20      1
Hee        Mother        0           2    0                 1      1
Hee        Adult         0           2    0                40      1
Hee        Juvenile      2           2    1                20      1
Awa        Mother        2          10    0.2               1      2
Awa        Adult         3          10    0.3               7      2
Awa        Juvenile      5          10    0.5              13      2
count: number of mc directed to an individual belonging to each receiver subgroup
total_init: total number of mc by this individual
subgroupsize: number of individuals within the group that belong to the receiver subgroup (for example, each individual has 1 mother, but group 1 has 40 adults (other than the mother) and 20 juveniles)
This is the model I tried:
glmm_ages <- glmer(((count/total_init)/subgroupsize) ~ receiver + (1|group) + (1|initiator),
                   data = age1data,
                   family = binomial)
This gives me this error message:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev, :
Downdated VtV is not positive definite
In addition: Warning message:
In eval(family$initialize, rho) : non-integer #successes in a binomial glm!
The model works when I do a simple GLM without group and initiator as random effects, but I really think I need to include them.
From what I understand, the error message means that some categories are all 1 or all 0, which is the case when an individual is only recorded muzzle contacting its mother once, for example (the dependent variable becomes 1/1/1 = 1).
I'm trying to understand what I should do from this page I found: http://bbolker.github.io/mixedmodels-misc/ecostats_chap.html#digression-complete-separation
In this section, I'm not sure how to find the number I should use instead of "10":
newdat <- subset(age1data,
                 abs(resid(glmm_ages, "pearson")) < 10)
I'm also not sure what all this means, or how I can figure out the relevant variance and standard deviation for my own dataset:
impose zero-mean Normal priors on the fixed effects (a 4 × 4 diagonal matrix with diagonal elements equal to 9, for variances of 9 or standard deviations of 3)
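If I understand the linked page correctly, that would translate to something like the sketch below using the blme package. This is just my guess, with the prior dimension matched to my 3 fixed effects (intercept plus two receiver contrasts) rather than the 4 in the linked example; the two-column successes/failures response is also my attempt to avoid the non-integer warning:
library(blme)
# My guess at the linked approach, not verified: independent N(0, 9) priors
# (i.e., sd = 3) on each of the 3 fixed-effect coefficients, hence diag(9, 3)
bglmm_ages <- bglmer(cbind(count, total_init - count) ~ receiver +
                       (1 | group) + (1 | initiator),
                     data = age1data,
                     family = binomial,
                     fixef.prior = normal(cov = diag(9, 3)))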
Can anyone help me figure out if I'm doing the right thing and whether this is the solution for me?
I apologize for the length of this post; I wanted to make sure everything was there. I hope it's clear!
I've been trying for a while to apply Laplace smoothing to different columns, but honestly I haven't found a working method, and there's not much content about this subject on the internet.
I'm currently working on a database with 43 variables and over 27k observations. The data frame (time_data) looks like this:
time_data
T01 T02 T03 T_TOT
1 1 4 3 8
2 5 2 0 7
3 3 1 10 14
T_TOT = T01 + T02 + T03
As the name implies, each value corresponds to an amount of seconds. For the sake of the example, let's say the max value T_TOT can take is 15 (seconds).
Because the values are presented as discrete data, I was planning on decomposing the Laplace smoothing into different columns and calculating the new values at the end. The problem is that, in reality, I have 42 variables, and doing this I would end up with way too many variables.
Is there any way in R to calculate the Laplace smoothing?
If not, how could I create a loop to generate the new columns with the Laplace values?
Expected Outcome:
T01 T02 T03 T_TOT L_T01 L_T02 L_T03
1 1 4 3 8 ... ... ...
2 5 2 0 7 ... ... ...
3 3 1 10 14 ... ... ...
A brief explanation of Laplace smoothing: https://programmerclick.com/article/9417297724/
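To show what I mean, this is the kind of computation I'm after, hand-rolled for the 3-column example (assuming "Laplace smoothing" here means adding a pseudo-count alpha to each column and renormalizing by the padded total):
# hand-rolled additive (Laplace) smoothing over the K time columns;
# alpha is the pseudo-count
alpha <- 1
time_cols <- c("T01", "T02", "T03")   # in the real data, all 42 columns
K <- length(time_cols)
for (col in time_cols) {
  time_data[[paste0("L_", col)]] <-
    (time_data[[col]] + alpha) / (time_data$T_TOT + alpha * K)
}
# multiply by time_data$T_TOT if the smoothed values should stay on the seconds scale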
My general problem: I tend to struggle using ggplot, because it's very data-frame-centric but the objects I work with seem to fit matrices better than data frames. Here is an example (adapted a little).
I have a quantity x that can assume values 0:5, and a "context" that can have values 0 or 1. For each context I have 7 different frequency distributions over the quantity x. (More generally I could have more than two "contexts", more values of x, and more frequency distributions.)
I can represent these 7×2 frequency distributions as a list freqs of two matrices, say:
> freqs
$`context0`
x0 x1 x2 x3 x4 x5
sample1 20 10 10 21 37 2
sample2 34 40 6 10 1 8
sample3 52 4 1 2 17 25
sample4 16 32 25 11 5 10
sample5 28 2 10 4 21 35
sample6 22 13 35 12 13 5
sample7 9 5 43 29 4 10
$`context1`
x0 x1 x2 x3 x4 x5
sample1 15 21 14 15 14 21
sample2 27 8 6 5 29 25
sample3 13 7 5 26 48 0
sample4 33 3 18 11 13 22
sample5 12 23 40 11 2 11
sample6 5 51 2 28 5 9
sample7 3 1 21 10 63 2
or a 3D array.
Or I could use a data.table tablefreqs like this one:
> tablefreqs
context x0 x1 x2 x3 x4 x5
1: 0 20 10 10 21 37 2
2: 0 34 40 6 10 1 8
3: 0 52 4 1 2 17 25
4: 0 16 32 25 11 5 10
5: 0 28 2 10 4 21 35
6: 0 22 13 35 12 13 5
7: 0 9 5 43 29 4 10
8: 1 15 21 14 15 14 21
9: 1 27 8 6 5 29 25
10: 1 13 7 5 26 48 0
11: 1 33 3 18 11 13 22
12: 1 12 23 40 11 2 11
13: 1 5 51 2 28 5 9
14: 1 3 1 21 10 63 2
Now I'd like to draw the following line plot (there's a reason why I need line plots and not, say, histograms or bar plots):
The 7 frequency distributions for context 0, with x as x-axis and the frequency as y-axis, all in the same line plot (with some alpha).
The 7 frequency distributions for context 1, again with x as x-axis and the frequency as y-axis, all in the same line plot (with alpha), but displayed upside-down below the plot for context 0.
Ggplot would surely do this very nicely, but it seems to require some acrobatics with data tables:
– If I use the data table tablefreqs, it's not clear to me how to plot all its rows having context==0 in the same plot: ggplot seems to think only column-wise, not row-wise. I could use the six values of x as table rows, but then the "context" values would also end up in a row, and I'm not sure I can subset a data table by values in a row rather than in a column.
– If I use the matrix freqs, I could create a mini data table having x as one column and one frequency distribution as another column, feed that into ggplot + geom_line, and then loop over all 7 frequency distributions. But it's not clear to me how to tell ggplot to keep the previous plots in this case. Then I'd need another loop over the two "contexts".
I'd be grateful for suggestions on how to approach this problem in particular, and more generally on what objects to choose for storing this kind of data: matrices? data tables, maybe with a different structure than shown here? some other formats?
I would suggest familiarizing yourself with the concept known as Tidy Data, a set of principles for data handling and storage that are adopted by ggplot2 and a number of other packages.
You are free to use a matrix or list of matrices to store your data; however, you can certainly store the data as you describe it (and as I understand it) in a data frame or single table with the following convention of columns:
context | sample | x | freq
I'll show you how I would convert the tablefreqs dataset you shared into that format, and then how I would go about creating the plot you describe in your question. I'm assuming in this case you only have the two values for context, although you allude to there being more; I'll try to interpret what you stated as best I can.
Create the Tidy Data frame
Your data frame as shown contains columns x0 through x5 that have values for x spread across more than one column, when you really need these to be converted into the format shown above. This is called "gathering" your data, and you can do that with tidyr::gather().
First, I also need to replicate the naming of your samples according to the matrix dataset, so I'll do that and gather your data:
library(dplyr)
library(tidyr)
library(ggplot2)
# create the sample names
tablefreqs$sample <- rep(paste0('sample',1:7), 2)
# gather the columns together
df <- tablefreqs %>%
  gather(key='x', value='freq', -c(context, sample))
Note that in the gather() function, we have to specify that the two columns df$context and df$sample are to be left alone, as they are not part of the gathering effort. But now we are left with df$x containing character values. We can't plot that, because we want x to be a number (at least... I'm assuming you do). For that, we'll convert using:
df$x <- as.numeric(gsub("[^[:digit:].]", "", df$x))
That extracts the number from each value in df$x and represents it as a number, not a character. We have the opposite issue with df$context, which is actually a discrete factor, and we should represent it as such in order to make plotting a bit easier:
df$context <- factor(df$context)
Create the Plot
Now we're ready to create the plot. From your description, I may not have this perfectly right, but it seems that you want a plot containing both context = 1 and context = 0, and when context = 1 the data should be "upside down". By that, I'm assuming you are talking about plotting df$freq when df$context == 0 and -df$freq when df$context == 1. We could do that using some fancy logic in the ggplot() call, but I find it's easier just to create a new column in your dataset to represent what we will be plotting on the y axis. We'll call this column df$freq_adj and use that for plotting:
df$freq_adj <- ifelse(df$context==1, -df$freq, df$freq)
Then we create the plot. I'll explain a bit below the result:
ggplot(df, aes(x=x, y=freq_adj)) +
  geom_line(
    aes(color=context, linetype=sample)
  ) +
  geom_hline(yintercept=0, color='gray50') +
  scale_x_continuous(expand=expansion(mult=0)) +
  theme_bw()
Without a clearer description or picture of what you were looking to do, I took some liberties here. I used color to discriminate between the two values of context, and linetype to discriminate the different samples. I also added a line at 0, since it seemed appropriate here, and the scale_x_continuous() command removes the extra white space that is otherwise put in place at the extreme ends of the data.
An alternative that is maybe closer to your description would be to physically have a separation between the two plots, and represent context = 1 as a physically separate plot compared to context = 0, with one over top of the other.
Here's the code and plot:
ggplot(df, aes(x=x, y=freq_adj)) +
  geom_line(aes(group=sample), alpha=0.3) +
  facet_grid(context ~ ., scales='free_y') +
  scale_x_continuous(expand=expansion(mult=0)) +
  theme_bw()
There the use of aes(group=sample) is quite important, since I want all the lines for each sample to be the same (alpha setting and color), yet ggplot2 needs to know that the connections between the points should be based on "sample". This is done using the group= aesthetic. The scales='free_y' argument on facet_grid() allows the y axis scale to shrink and fit the data according to each facet.
I am trying to compute robust/clustered standard errors after using mlogit() to fit a multinomial logit (MNL) model in a discrete choice problem. Unfortunately, I suspect I am having problems because I am using data in long format (this is a must in my case): I get the error #Error in ef/X : non-conformable arrays after sandwich::vcovHC(, "HC0").
The Data
For illustration, please consider the following data. It represents data from 5 individuals (id_ind) that choose among 3 alternatives (altern). Each of the five individuals chose three times; hence we have 15 choice situations (id_choice). Each alternative is represented by two generic attributes (x1 and x2), and the choices are registered in y (1 if selected, 0 otherwise).
df <- read.table(header = TRUE, text = "
id_ind id_choice altern x1 x2 y
1 1 1 1 1.586788801 0.11887832 1
2 1 1 2 -0.937965347 1.15742493 0
3 1 1 3 -0.511504401 -1.90667519 0
4 1 2 1 1.079365680 -0.37267925 0
5 1 2 2 -0.009203032 1.65150370 1
6 1 2 3 0.870474033 -0.82558651 0
7 1 3 1 -0.638604013 -0.09459502 0
8 1 3 2 -0.071679538 1.56879334 0
9 1 3 3 0.398263302 1.45735788 1
10 2 4 1 0.291413453 -0.09107974 0
11 2 4 2 1.632831160 0.92925495 0
12 2 4 3 -1.193272276 0.77092623 1
13 2 5 1 1.967624379 -0.16373709 1
14 2 5 2 -0.479859282 -0.67042130 0
15 2 5 3 1.109780885 0.60348187 0
16 2 6 1 -0.025834772 -0.44004183 0
17 2 6 2 -1.255129594 1.10928280 0
18 2 6 3 1.309493274 1.84247199 1
19 3 7 1 1.593558740 -0.08952151 0
20 3 7 2 1.778701074 1.44483791 1
21 3 7 3 0.643191170 -0.24761157 0
22 3 8 1 1.738820924 -0.96793288 0
23 3 8 2 -1.151429915 -0.08581901 0
24 3 8 3 0.606695064 1.06524268 1
25 3 9 1 0.673866953 -0.26136206 0
26 3 9 2 1.176959443 0.85005871 1
27 3 9 3 -1.568225496 -0.40002252 0
28 4 10 1 0.516456176 -1.02081089 1
29 4 10 2 -1.752854918 -1.71728381 0
30 4 10 3 -1.176101700 -1.60213536 0
31 4 11 1 -1.497779616 -1.66301234 0
32 4 11 2 -0.931117325 1.50128532 1
33 4 11 3 -0.455543630 -0.64370825 0
34 4 12 1 0.894843784 -0.69859139 0
35 4 12 2 -0.354902281 1.02834859 0
36 4 12 3 1.283785176 -1.18923098 1
37 5 13 1 -1.293772990 -0.73491317 0
38 5 13 2 0.748091387 0.07453705 1
39 5 13 3 -0.463585127 0.64802031 0
40 5 14 1 -1.946438667 1.35776140 0
41 5 14 2 -0.470448172 -0.61326604 1
42 5 14 3 1.478763383 -0.66490028 0
43 5 15 1 0.588240775 0.84448489 1
44 5 15 2 1.131731049 -1.51323232 0
45 5 15 3 0.212145247 -1.01804594 0
")
The problem
Consequently, we can fit an MNL using mlogit() and try to extract its robust variance-covariance matrix as follows:
library(mlogit)
library(sandwich)
mo <- mlogit(formula = y ~ x1 + x2 | 0,
             method = "nr",
             data = df,
             idx = c("id_choice", "altern"))
sandwich::vcovHC(mo, "HC0")
#Error in ef/X : non-conformable arrays
As we can see, there is an error produced by sandwich::vcovHC, which says that ef/X is non-conformable, where X <- model.matrix(x) and ef <- estfun(x, ...). After looking through the source code on the GitHub mirror, I spotted the problem: given that the data is in long format, ef has dimensions 15 x 2 while X has 45 x 2.
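A quick way to see the mismatch directly (relying on the mlogit methods for estfun() and model.matrix()):
dim(sandwich::estfun(mo))  # 15 x 2: one row per choice situation
dim(model.matrix(mo))      # 45 x 2: one row per alternative in the long format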
My workaround
Given that the show must go on, I am computing the robust and clustered standard errors manually, using some functions that I borrowed from sandwich and adjusted to match Stata's output.
> Robust Standard Errors
These lines are inspired by the sandwich::meat() function.
psi <- estfun(mo)   # scores: one row per choice situation (15 x 2)
k <- NCOL(psi)
n <- NROW(psi)
rval <- (n/(n-1)) * crossprod(as.matrix(psi))   # "meat" with n/(n-1) adjustment
vcov(mo) %*% rval %*% vcov(mo)                  # sandwich product
# x1 x2
# x1 0.23050261 0.09840356
# x2 0.09840356 0.12765662
Stata Equivalent
qui clogit y x1 x2 ,group(id_choice) r
mat li e(V)
symmetric e(V)[2,2]
y: y:
x1 x2
y:x1 .23050262
y:x2 .09840356 .12765662
> Clustered Standard Errors
Here, given that each individual answers 3 questions, it is highly likely that there is some degree of correlation within individuals; hence cluster corrections should be preferred in such situations. Below I compute the cluster correction for this case and show the equivalence with the Stata output of clogit, cluster().
id_ind_collapsed <- df$id_ind[!duplicated(mo$model$idx$id_choice)]  # one individual id per choice situation
psi_2 <- rowsum(psi, group = id_ind_collapsed)   # sum the scores within each individual
k_cluster <- NCOL(psi_2)
n_cluster <- NROW(psi_2)
rval_cluster <- (n_cluster/(n_cluster-1)) * crossprod(as.matrix(psi_2))
vcov(mo) %*% rval_cluster %*% vcov(mo)
# x1 x2
# x1 0.1766707 0.1007703
# x2 0.1007703 0.1180004
Stata equivalent
qui clogit y x1 x2 ,group(id_choice) cluster(id_ind)
symmetric e(V)[2,2]
y: y:
x1 x2
y:x1 .17667075
y:x2 .1007703 .11800038
The Question:
I would like to accommodate my computations within the sandwich ecosystem, meaning not computing the matrices manually but actually using the sandwich functions. Is it possible to make this work with models in long format like the one described here, for example by providing the meat and bread objects directly to perform the computations? Thanks in advance.
PS: I noted that there is a dedicated bread method in sandwich for mlogit, but I could not spot something like meat for mlogit; anyway, I am probably missing something here...
Why vcovHC does not work for mlogit
The class of HC covariance estimators can only be applied to models with a single linear predictor where the score function (aka estimating function) is the product of so-called "working residuals" and a regressor matrix. This is explained in some detail in the Zeileis (2006) paper (see Equation 7), provided as vignette("sandwich-OOP", package = "sandwich") in the package. The ?vcovHC documentation also pointed to this but did not explain it very well. I have now improved this in the documentation at http://sandwich.R-Forge.R-project.org/reference/vcovHC.html:
The function meatHC is the real work horse for estimating the meat of HC sandwich estimators - the default vcovHC method is a wrapper calling sandwich and bread. See Zeileis (2006) for more implementation details. The theoretical background, exemplified for the linear regression model, is described below and in Zeileis (2004). Analogous formulas are employed for other types of models, provided that they depend on a single linear predictor and the estimating functions can be represented as a product of “working residual” and regressor vector (Zeileis 2006, Equation 7).
This means that vcovHC() is not applicable to multinomial logit models as they generally use separate linear predictors for the separate response categories. Similarly, two-part or hurdle models etc. are not supported.
Basic "robust" sandwich covariance
Generally, for computing the basic Eicker-Huber-White sandwich covariance matrix estimator, the best strategy is to use the sandwich() function and not the vcovHC() function. The former works for any model with estfun() and bread() methods.
For linear models sandwich(..., adjust = FALSE) (default) and sandwich(..., adjust = TRUE) correspond to HC0 and HC1, respectively. In a model with n observations and k regression coefficients the former standardizes with 1/n and the latter with 1/(n-k).
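For illustration, with the model mo from your post (both calls rely only on the estfun() and bread() methods mentioned above; adjust is passed through to the meat):
sandwich(mo)                 # standardizes with 1/n (HC0 analogue)
sandwich(mo, adjust = TRUE)  # standardizes with 1/(n - k) (HC1 analogue)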
Stata, however, standardizes with 1/(n-1) in logit models, see:
Different Robust Standard Errors of Logit Regression in Stata and R. To the best of my knowledge there is no clear theoretical reason for using specifically one or the other adjustment, and already in moderately large samples the difference is negligible anyway.
Remark: The adjustment with 1/(n-1) is not directly available in sandwich() as an option. However, coincidentally, it is the default in vcovCL() without specifying a cluster variable (i.e., treating each observation as a separate cluster). So this is a convenient "trick" if you want to get exactly the same results as Stata.
Clustered covariance
This can be computed "as usual" via vcovCL(..., cluster = ...). For mlogit models you just have to take into account that the cluster variable needs to be provided only once per choice situation (as opposed to stacked several times in the long-format data).
Replicating Stata results
With the data and model from your post:
vcovCL(mo)
## x1 x2
## x1 0.23050261 0.09840356
## x2 0.09840356 0.12765662
vcovCL(mo, cluster = df$id_choice[1:15])
## x1 x2
## x1 0.1766707 0.1007703
## x2 0.1007703 0.1180004
I have a dataframe like this
id anxiety_score anxiety_diagnosis
1 0 normal
2 0 normal
3 13 mild
4 12 mild
5 15 mild
6 13 mild
7 7 normal
8 4 normal
9 16 severe
10 17 severe
11 21 severe
12 1 normal
People with one of three anxiety diagnoses (normal, mild, severe) completed my anxiety scale; their scores are shown in the column named "anxiety_score". Now I want to infer the distribution (i.e., mean and SD) of the scores according to the diagnosis. That is, once I know the distribution of the scores, I can locate each participant within it, and the location should match the diagnosis. How could I do that in R?
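A minimal sketch of the group-wise summary (assuming the data frame is named df and using dplyr):
library(dplyr)
# mean and SD of the anxiety score within each diagnosis group
df %>%
  group_by(anxiety_diagnosis) %>%
  summarise(mean_score = mean(anxiety_score),
            sd_score = sd(anxiety_score),
            n = n())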