According to Wikipedia, the entropy of a logistic distribution is ln(s) + 2, where s > 0 is the scale parameter.
However, ln(s) + 2 with s > 0 can be negative, for example:
>>> import math
>>> s = 0.1
>>> print(math.log(s) + 2)
I am confused here. How could an entropy be negative?
After checking "A Mathematical Theory of Communication", I found that "The entropy of a continuous distribution can be negative." As the logistic distribution is a continuous distribution, its entropy can be negative.
Communication systems can be classified into three main categories: discrete, continuous and mixed.
discrete: a sequence of choices from a finite set. We can think of this as the discrete variable in statistics.
continuous: "A continuous system is one in which the message and signal are both treated as continuous functions, e.g., radio or television". We can think of this as the continuous variable, which has an infinite number of values, in statistics.
mixed: discrete + continuous
The definitions/equations of "entropy" for a discrete set of probabilities and for a continuous distribution are different ("20. ENTROPY OF A CONTINUOUS DISTRIBUTION" in the paper). I only knew entropy for discrete variables before. Now it is clear to me.
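To check this numerically, here is a minimal Python sketch (the function names are my own, not from the paper or Wikipedia). It integrates -f(x) ln f(x) for a logistic density with the midpoint rule and compares the result with ln(s) + 2. It also computes a discrete entropy, which is a sum of non-negative terms and so cannot be negative.

```python
import math


def logistic_log_pdf(x, s, mu=0.0):
    """Log-density of the logistic distribution, written to avoid overflow."""
    z = abs(x - mu) / s  # the density is symmetric around mu
    return -z - math.log(s) - 2.0 * math.log1p(math.exp(-z))


def differential_entropy_logistic(s, n=100_000, half_width=40.0):
    """Midpoint-rule approximation of -integral f(x) ln f(x) dx."""
    lo, hi = -half_width * s, half_width * s  # tails beyond this are negligible
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        log_f = logistic_log_pdf(x, s)
        total -= math.exp(log_f) * log_f * step
    return total


def discrete_entropy(probabilities):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


numeric = differential_entropy_logistic(0.1)
closed_form = math.log(0.1) + 2
print(numeric, closed_form)           # both are negative and agree closely
print(discrete_entropy([0.5, 0.5]))   # never negative for a discrete variable
```

With s = 0.1 the density is taller than 1 near its centre, so ln f(x) is positive there, and that is what pushes the continuous entropy below zero.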
I have a query about the output statistics from linear mixed models (using the lmer function) compared with the output statistics from the estimated marginal means derived from this model.
Essentially, I am running an LMM comparing the within-subjects effect of different contexts (with "Negative" coded as the baseline) on enjoyment ratings. The LMM output suggests that the difference between negative and polite contexts is not significant, with a p-value of .35. See the screenshot below with the relevant line highlighted:
[Screenshot: LMM output]
However, when I then run the lsmeans function on the same model (with the Holm correction), the p-value for the comparison between Negative and Polite context categories is now .05, and all of the other statistics have changed too. Again, see the screenshot below with the relevant line highlighted:
[Screenshot: LSMeans output]
I'm probably being dense because my understanding of LMMs isn't hugely advanced, but I've tried to Google the reason for this and can't find out why the results differ. I don't think it has anything to do with the correction, because the smaller p-value is the one observed when the Holm correction is used. Why do the two outputs differ, which value should I report, and why?
Regression coefficients and marginal means are not one and the same. Once you learn these concepts it'll be easier to figure out which one is more informative and therefore which one you should report.
After we fit a regression by estimating its coefficients, we can predict the outcome y_i given the m input variables X_i = (X_i1, ..., X_im). If the inputs are informative about the outcome, the predicted y_i is different for different X_i. If we average the predictions y_i for examples with X_ij = x_j, we get the marginal effect of the jth feature at the value x_j. It's crucial to keep track of which inputs are kept fixed (and at what values) and which inputs are averaged over (aka marginalized out).
In your case, contextCatPolite in the coefficients summary is the difference between Polite and Negative when smileType is set to its reference level (no reward, I'd guess). In the emmeans contrasts, Polite - Negative is the average difference over all smileTypes.
Interactions make interpretation more challenging, and your model includes an interaction between smileType and contextCat. See the "Interaction analysis in emmeans" vignette.
To add to @dipetkov's answer, the coefficients in your LMM are based on treatment coding (sometimes called 'dummy' coding). With the interactions in the model, these coefficients are no longer "main effects" in the traditional sense of factorial ANOVA. For instance, if you have:
y = b_0 + b_1(X_1) + b_2(X_2) + b_3 (X_1 * X_2)
...b_1 is "the effect of X_1" only when X_2 = 0:
y = b_0 + b_1(X_1) + b_2(0) + b_3 (X_1 * 0)
y = b_0 + b_1(X_1)
Thus, as @dipetkov points out, 1.625 is not the difference between Negative and Polite on average across all other factors (which you get from emmeans). Instead, this coefficient is the difference between Negative and Polite specifically when smileType = 0, i.e. at the reference level of smileType.
If you use contrast coding (for example, sum-to-zero contrasts for smileType) instead of treatment coding, then the contextCat coefficients from the regression output would correspond to the comparisons from the estimated marginal means, because a value of 0 on the smileType contrasts would now represent the average across smile types. The coding scheme thus has a huge effect on the estimated values and statistical significance of regression coefficients, but it should not affect F-tests based on the reduction in deviance/variance (because no matter how you code it, a given variable explains the same amount of variance).
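Here is a small numerical sketch of this point, written in Python rather than R and using made-up cell means for a 2 x 2 design (all numbers and names are hypothetical, not taken from the model in the question). It solves for b_0 to b_3 exactly under two codings of X_2 and shows that only the meaning of b_1 changes.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical cell means, keyed by (X_1 level, X_2 level).
CELL_MEANS = {(0, 0): 5.0, (1, 0): 6.5, (0, 1): 5.0, (1, 1): 8.5}


def fit(x2_code):
    """Coefficients of y = b_0 + b_1*X_1 + b_2*X_2 + b_3*X_1*X_2."""
    rows = []
    for x1, x2_level in CELL_MEANS:
        x2 = x2_code[x2_level]
        rows.append([1.0, float(x1), x2, x1 * x2])
    return solve(rows, list(CELL_MEANS.values()))


treatment = fit({0: 0.0, 1: 1.0})   # X_2 dummy coded: 0 is the reference level
centered = fit({0: -0.5, 1: 0.5})   # X_2 contrast coded: 0 is the average level

print(treatment[1])  # X_1 difference at the reference level of X_2
print(centered[1])   # X_1 difference averaged over both levels of X_2
```

In this toy design the X_1 difference is 1.5 at one level of X_2 and 3.5 at the other. Treatment coding reports the first, the centered coding reports their average, and the interaction coefficient b_3 is the same either way.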
Seems like a very basic question but I just wanted to confirm. I'm running a multivariable linear regression model adjusted for different types of covariates (some numeric, some categorical, etc.). A sample of the model is shown below:
fit <- ols(outcome ~ exposure + age + zbmi + income + sex + ethnicity)
Both the "outcome" and "exposure" are continuous numerical variables.
My question is, if say I run the model and the beta estimate, 95% CI, and p-value look something like the one below:
B = -0.20 // 95%CI: [-0.50, -0.001] // p = 0.04
Would it be appropriate to interpret this as: "For every 1-unit increase in the exposure, there is a 0.20 decrease in the outcome"?
What I want to know is how did it determine the order of "per 1 unit increase"? Is that just the default order of how R sorts continuous variables when running it in a regression model? Also, since both my outcome and exposure are continuous variables, does this mean that it automatically sorted these variables in ascending order (by default?) when I ran the model?
Just a bit confused on whether this sorting order matters before I run any regression model using continuous variables. Any tips / help would be appreciated!
Under OLS, there is no ordering or sorting of the predictors or of the data. For each observation, the terms on the right-hand side of the equation are summed and subtracted from the left-hand side, and the coefficients are chosen to minimize the sum of the squares of these differences. Nothing in this procedure depends on the order of the rows or of the predictors.
For interpretation of your betas, each coefficient is the expected change in the outcome for a one-unit increase in that predictor with the other covariates held fixed, so your reading is appropriate and it doesn't matter in which order you take the predictors.
Side note: In reality, you will usually get some dependence among the predictors, and this will be reflected in larger standard errors.
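As an illustration, here is a minimal Python sketch with simulated data and a single predictor (the data, seed and function names are invented for this example; it is not the rms::ols model from the question). It fits the same least-squares line to the original, shuffled and sorted rows, and checks that the slope equals the change in prediction for a one-unit increase in the exposure.

```python
import math
import random


def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = math.fsum(x) / n
    mean_y = math.fsum(y) / n
    sxy = math.fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = math.fsum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
exposure = [rng.uniform(0, 10) for _ in range(200)]
outcome = [3.0 - 0.2 * xi + rng.gauss(0, 1) for xi in exposure]

intercept, slope = ols_simple(exposure, outcome)

# Same rows in a random order, and sorted by exposure.
pairs = list(zip(exposure, outcome))
shuffled = pairs[:]
rng.shuffle(shuffled)
_, slope_shuffled = ols_simple(*zip(*shuffled))
_, slope_sorted = ols_simple(*zip(*sorted(pairs)))


def predict(x):
    return intercept + slope * x


print(slope, slope_shuffled, slope_sorted)  # identical up to rounding error
print(predict(5.0) - predict(4.0))          # equals the slope
```

The "per 1 unit increase" wording comes from the model being linear in the exposure, not from any sorting of the data.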
Suppose I have the following formula for a mixed effects model:
Performance ~ 1 + WorkingHours + Tenure + (1 + WorkingHours + Tenure || JobClass)
then I can specify priors for fixed slopes and fixed intercept as:
prior = normal(c(mu1,mu2), c(sd1,sd2), autoscale = FALSE)
prior_intercept = normal(mean, scale, autoscale = FALSE)
But how do I specify the priors for random slopes and intercept using
prior_covariance = decov(regularization, concentration, shape, scale)
lkj(regularization, scale, df)
if I know the variance between the slopes and intercepts and the correlation between them.
I am unable to understand how to specify the parameters for the above mixed effects formula.
Because you're working in a Bayesian model, you aren't going to specify the correlations or variances directly. You're going to specify a prior distribution over covariance matrices (by way of the correlation matrix and vector of variances) by giving the values for a few parameters.
The regularization parameter is a positive real value that determines how likely the group-level terms are to be correlated. A value of 1 is sort of the "anything's possible" option (this is the default). Values greater than 1 mean that you believe there are few, if any, correlations. Values less than 1 mean you believe there is a lot of correlation.
The scale parameter of the covariance matrix is related to the sum of the variances. In particular, it is equal to the square root of the average variance.
The concentration parameter is used to control how the total variance is distributed among the different variables. A value of 1 is saying you don't have an expectation. Larger values say that you believe that the variables have similar proportions of the total variance. Values between 0 and 1 mean that you think there are dissimilar contributions.
In decov, the shape and scale arguments define a Gamma distribution that acts as a prior on that scale parameter.
Then, finally, in lkj, df is the degrees of freedom of the half Student t prior placed on the standard deviations, and scale is the scale of that prior.
So, decov and lkj each give you a different way to express your expectations about properties of the covariance matrix, but they won't let you specify which specific variables you believe to be correlated with which other specific variables. That is estimated as part of the model fitting process.
This is all from the rstanarm documentation.
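As I read that documentation, the decov prior assembles the covariance matrix from three pieces: a scale parameter, a vector of variance proportions that sums to 1, and a correlation matrix. The following is only a Python sketch of that assembly, not rstanarm code, and every value and name in it is hypothetical.

```python
import math


def decov_covariance(tau, proportions, corr):
    """Build a covariance matrix from a scale, variance shares and correlations.

    tau:         scale parameter, the square root of the average variance
    proportions: share of the total variance for each term (sums to 1)
    corr:        correlation matrix of the group-level terms
    """
    j = len(proportions)
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("proportions must sum to 1")
    variances = [j * tau ** 2 * p for p in proportions]  # total is j * tau^2
    sds = [math.sqrt(v) for v in variances]
    return [[sds[r] * sds[c] * corr[r][c] for c in range(j)] for r in range(j)]


# Three group-level terms: intercept, WorkingHours, Tenure (values invented).
corr = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, -0.2],
    [0.1, -0.2, 1.0],
]
cov = decov_covariance(tau=2.0, proportions=[0.5, 0.3, 0.2], corr=corr)
trace = sum(cov[i][i] for i in range(3))
print(trace)  # sum of the variances, i.e. 3 * tau^2
```

The priors then sit on these pieces: the Gamma prior on tau, the concentration on the proportions, and the regularization on the correlation matrix.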
I want to determine the marginal effects of each independent variable in a probit regression as follows:
predict the (base) probability with the mean of each variable
for each variable, predict the change in probability compared to the base probability if the variable takes the value of mean + 1x standard deviation of the variable
In one of my regressions, I have a multiplicative variable, as follows:
my_probit <- glm(a ~ b + c + I(b*c), family = binomial(link = "probit"), data=data)
Two questions:
When I determine the marginal effects using the approach above, will the value of the multiplicative term reflect the value of b or c taking the value mean + 1x standard deviation of the variable?
Same question, but with an interaction term (* and no I()) instead of a multiplicative term.
Many thanks
When interpreting the results of models involving interaction terms, the general rule is DO NOT interpret coefficients. The very presence of interactions means that the meaning of the coefficient for a term will vary depending on the other covariate values being used for prediction. The right way to go about looking at the results is to construct a "prediction grid", i.e. a set of values that are spaced across the range of interest (hopefully within the domain of data support). The two essential functions for this process are expand.grid and predict.
dgrid <- expand.grid(b=fivenum(data$b)[2:4], c=fivenum(data$c)[2:4])
# A grid with the lower hinges, medians and upper hinges for `b` and `c`.
predict(my_probit, newdata=dgrid)
You may want to have the predictions on a scale other than the default (which is to return the linear predictor), so perhaps this would be easier to interpret if it were:
predict(my_probit, newdata=dgrid, type ="response")
Be sure to read ?predict and ?predict.glm and work with some simple examples to make sure you are getting what you intended.
Predictions from models containing interactions (at least those involving 2 covariates) should be thought of as being surfaces or 2-d manifolds in three dimensions. (And for 3-covariate interactions as being iso-value envelopes.) The reason that non-interaction models can be decomposed into separate term "effects" is that the slopes of the planar prediction surfaces remain constant across all levels of input. Such is not the case with interactions, especially those with multiplicative and non-linear model structures. The graphical tools and insights that one picks up in a differential equations course can be productively applied here.
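To make the prediction-grid idea concrete outside R, here is a minimal Python sketch of a probit model with a b*c term. The coefficients are invented for illustration and are not estimates from any data. It shows that moving b by one unit changes the product term too, so the change in probability depends on the value of c.

```python
from itertools import product
from statistics import NormalDist

# Hypothetical probit coefficients for a ~ b + c + b:c.
COEF = {"intercept": -0.5, "b": 0.8, "c": 0.3, "b:c": -0.4}


def predict_prob(b, c):
    """Predicted probability: standard normal CDF of the linear predictor."""
    eta = COEF["intercept"] + COEF["b"] * b + COEF["c"] * c + COEF["b:c"] * b * c
    return NormalDist().cdf(eta)


# Assume b and c are standardized: 0 is the mean, 1 is mean + 1 SD.
b_values = [0.0, 1.0]
c_values = [-1.0, 0.0, 1.0]

grid = [(b, c, predict_prob(b, c)) for b, c in product(b_values, c_values)]
for b, c, p in grid:
    print(f"b={b:+.1f}  c={c:+.1f}  P(a=1)={p:.3f}")

# Change in probability when b goes from its mean to mean + 1 SD, at each c.
effect_of_b = {c: predict_prob(1.0, c) - predict_prob(0.0, c) for c in c_values}
print(effect_of_b)
```

Because the product is recomputed from the new b inside the prediction, there is no single "effect of b" here, only an effect at each chosen value of c.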
Suppose I have a Cox proportional hazards model in which one predictor of the event is a categorical variable (alongside other covariates such as age). For example, say the categorical variable I am interested in is tumor size, with categories 0-10, 10-20 and 20-30, and I see a trend towards a higher HR of death with increasing tumor size. How can I test this trend in R and get a p-value?
There is a bit of danger in accepting your incomplete specification since you could have a size classification that will mess this up, as for example: 0-10, 10-20, 20-30, 30-80, 80-120, 120-240.
Unless you have carefully constructed your factor to have correctly ordered ascending levels, what I am about to suggest as a shortcut will fail.
library(survival)
survmdl <- coxph(Surv(time, event) ~ as.numeric(fact), data=dat)
You will get a "test of trend", which would be interpreted as the per-category increase in the log(hazard) for rising size, and it would be a single coefficient. So post your actual factor levels if you want a better considered reply.
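To illustrate why the level order matters, here is a small Python sketch. It is not R code, and it assumes the levels were created by plain alphabetical sorting of the labels, which is what R's factor() does by default. The helper name lower_bound is made up for this example.

```python
size_labels = ["0-10", "10-20", "20-30", "30-80", "80-120", "120-240"]

# Alphabetical order of the labels (character by character).
alphabetical = sorted(size_labels)


def lower_bound(label):
    """Numeric lower bound of a size category such as '30-80'."""
    return int(label.split("-")[0])


# Order by actual tumor size.
by_size = sorted(size_labels, key=lower_bound)

# Integer scores, analogous to what as.numeric() returns for a factor.
score_alphabetical = {label: i + 1 for i, label in enumerate(alphabetical)}
score_by_size = {label: i + 1 for i, label in enumerate(by_size)}

print(alphabetical)
print(score_alphabetical["120-240"], score_by_size["120-240"])
```

With alphabetical levels the largest category, 120-240, gets score 3 instead of 6, so the single trend coefficient would no longer describe increasing size.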