Separate data in two groups - r

I have this data:
x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22)
y1 = c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9,
5, 4, 7)
plot(x1, y1)
fit <- lm(y1 ~ x1)
fit
abline(fit, col = "black")
prediction <- predict(fit)`
I want to separate the data in two groups using the next condition:
if (y1 < prediction) {print("Negative number")}
else if (y1 > prediction) {print("Positive number")}
But it appears: Warning messages:
1: In if (y1 < prediction) { :
the condition has length > 1 and only the first element will be used
2: In if (y1 > prediction) { :
the condition has length > 1 and only the first element will be used
Can someone tell me how to fix it?

Related

How to get average marginal effects (AMEs) with standard errors of a multinomial logit model?

I want to get the average marginal effects (AME) of a multinomial logit model with standard errors. For this I've tried different methods, but they haven't led to the goal so far.
Best attempt
My best attempt was to get the AMEs by hand using mlogit which I show below.
library(mlogit)
ml.d <- mlogit.data(df1, choice="Y", shape="wide") # shape data for `mlogit()`
ml.fit <- mlogit(Y ~ 1 | D + x1 + x2, reflevel="1", data=ml.d) # fit the model
# coefficient names
c.names <- all.vars(ml.fit$call)[2:4]
# get marginal effects
ME.mnl <- sapply(c.names, function(x)
stats::effects(ml.fit, covariate=x, data=ml.d),
simplify=FALSE)
# get AMEs
(AME.mnl <- t(sapply(ME.mnl, colMeans)))
# 1 2 3 4 5
# D -0.03027080 -0.008806072 0.0015410569 0.017186531 0.02034928
# x1 -0.02913234 -0.015749598 0.0130577842 0.013240212 0.01858394
# x2 -0.02724650 -0.005482753 0.0008575982 0.005331181 0.02654047
I know these values are the correct ones. However, I could not get the correct standard errors by simply doing the columns' standard deviations:
# standard errors - WRONG!
(AME.mnl.se <- t(sapply(E.mnl, colSdColMeans)))
(Note: colSdColMeans() for columns' SD is provided here.)
Accordingly this also led me to the wrong t-values:
# t values - WRONG!
AME.mnl / AME.mnl.se
# 1 2 3 4 5
# D -0.7110537 -0.1615635 0.04013228 0.4190057 0.8951484
# x1 -0.7170813 -0.2765212 0.33325968 0.3656893 0.8907836
# x2 -0.7084573 -0.1155825 0.02600653 0.1281190 0.8559794
Whereas I know the correct t-values for this case are these:
# D -9.26 -1.84 0.31 4.29 8.05
# x1 -6.66 -2.48 1.60 1.50 3.22
# x2 -2.95 -0.39 0.06 0.42 3.21
I learned that there should be a "delta method", but I only found some code for a very special case with interactions at Cross Validated.
Failed attempts
1.) Package margins doesn't seem to be able to handle "mlogit"
objects:
library(margins)
summary(margins(ml.fit))
2.) There's another package for mlogits, nnet,
library(nnet)
ml.fit2 <- multinom(Y ~ D + x1 + x2, data=df1)
summary(ml.fit2)
but margins can't handle this correctly either:
> summary(margins(ml.fit2))
factor AME SE z p lower upper
D -0.0303 NA NA NA NA NA
x1 -0.0291 NA NA NA NA NA
x2 -0.0272 NA NA NA NA NA
3.) There's also a package around that claims to calculate "Average Effects for Multinomial Logistic Regression Models",
library(DAMisc)
mnlChange2(ml.fit2, varnames="D", data=df1)
but I couldn't get a drop of milk out of it, since the function yields just nothing (even not with the function's example).
How now can we get AMEs with standard errors / t-statistics of a multinomial logit model with R?
Data
df1 <- structure(list(Y = c(3, 4, 1, 2, 3, 4, 1, 5, 2, 3, 4, 2, 1, 4,
1, 5, 3, 3, 3, 5, 5, 4, 3, 5, 4, 2, 5, 4, 3, 2, 5, 3, 2, 5, 5,
4, 5, 1, 2, 4, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 2, 4, 1, 5, 3, 1,
5, 2, 3, 4, 2, 4, 5, 2, 4, 1, 4, 2, 1, 5, 3, 2, 1, 4, 4, 1, 5,
1, 1, 1, 4, 5, 5, 3, 2, 3, 3, 2, 4, 4, 5, 3, 5, 1, 2, 5, 5, 1,
2, 3), D = c(12, 8, 6, 11, 5, 14, 0, 22, 15, 13, 18, 3, 5, 9,
10, 28, 9, 16, 17, 14, 26, 18, 18, 23, 23, 12, 28, 14, 10, 15,
26, 9, 2, 30, 18, 24, 27, 7, 6, 25, 13, 8, 4, 16, 1, 4, 5, 18,
21, 1, 2, 19, 4, 2, 16, 17, 23, 15, 13, 21, 24, 14, 27, 6, 20,
6, 19, 8, 7, 23, 11, 11, 1, 22, 21, 4, 27, 6, 2, 9, 18, 30, 26,
22, 10, 1, 4, 7, 26, 15, 26, 18, 30, 1, 11, 29, 25, 3, 19, 15
), x1 = c(13, 12, 4, 3, 16, 16, 15, 13, 1, 15, 10, 16, 1, 17,
7, 13, 12, 6, 8, 16, 16, 11, 7, 16, 5, 13, 12, 16, 17, 6, 16,
9, 14, 16, 15, 5, 7, 2, 8, 2, 9, 9, 15, 13, 9, 4, 16, 2, 11,
13, 11, 6, 4, 3, 7, 4, 12, 2, 16, 14, 3, 13, 10, 11, 10, 4, 11,
16, 8, 12, 14, 9, 4, 16, 16, 12, 9, 10, 6, 1, 3, 8, 7, 7, 5,
16, 17, 10, 4, 15, 10, 8, 3, 13, 9, 16, 12, 7, 4, 11), x2 = c(12,
19, 18, 19, 15, 12, 15, 16, 15, 11, 12, 16, 17, 14, 12, 17, 17,
16, 12, 20, 11, 11, 15, 14, 18, 10, 14, 13, 10, 14, 18, 18, 18,
17, 18, 14, 16, 19, 18, 16, 18, 14, 17, 10, 16, 12, 16, 15, 11,
18, 19, 15, 19, 11, 16, 10, 20, 14, 10, 12, 10, 15, 13, 15, 11,
20, 11, 12, 16, 16, 11, 15, 11, 11, 10, 10, 16, 11, 20, 17, 20,
17, 16, 11, 18, 19, 18, 14, 17, 11, 16, 11, 18, 14, 15, 16, 11,
14, 11, 13)), class = "data.frame", row.names = c(NA, -100L))
We can do something very similar to what is done in your linked answer. In particular, first we want a function that would compute AMEs at a given vector of coefficients. For that we can define
AME.fun <- function(betas) {
tmp <- ml.fit
tmp$coefficients <- betas
ME.mnl <- sapply(c.names, function(x)
effects(tmp, covariate = x, data = ml.d), simplify = FALSE)
c(sapply(ME.mnl, colMeans))
}
where the second half is yours, while in the first one I use a trick to take the same ml.fit object and to change its coefficients. Next we find the jacobian with
require(numDeriv)
grad <- jacobian(AME.fun, ml.fit$coef)
and apply the delta method. Square roots of the diagonal of grad %*% vcov(ml.fit) %*% t(grad) is what we want. Hence,
(AME.mnl.se <- matrix(sqrt(diag(grad %*% vcov(ml.fit) %*% t(grad))), nrow = 3, byrow = TRUE))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.003269320 0.004788536 0.004995723 0.004009762 0.002527462
# [2,] 0.004375795 0.006348496 0.008168883 0.008844684 0.005763966
# [3,] 0.009233616 0.014048212 0.014713090 0.012702188 0.008261734
AME.mnl / AME.mnl.se
# 1 2 3 4 5
# D -9.259050 -1.8389907 0.30847523 4.2861720 8.051269
# x1 -6.657611 -2.4808393 1.59847852 1.4969683 3.224159
# x2 -2.950794 -0.3902812 0.05828811 0.4197057 3.212458
which coincides with Stata's results.
If you use vce="bootstraps" within margin function then it provides SE with Confidence interval as well
summary(margins(ml.fit2,vce="bootstraps"))
The terminology for “marginal effects” is very inconsistent across
disciplines. Since you refer to the margins package, I assume that you
use the expression “Average Marginal Effects” in the same that that the
margins developers used it, which is the result of this procedure:
Compute the slope of the outcome with respect to D for every row
in the original dataset (unit-level marginal effects).
Take the average of the unit-level slopes (average marginal effect)
In models like nnet::multinom, the slopes will be different for every
level of the outcome variable. There will thus be one average marginal
effect per level, per regressor.
Using the marginaleffects package and the data you supplied, we get:
library(nnet)
library(marginaleffects)
mod <- nnet::multinom(Y ~ D + x1*x2, data=df1, trace = FALSE)
marginaleffects(mod) |> summary()
Group Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 1 D -0.027558 0.004183 -6.5878 4.4625e-11 -3.576e-02 -0.019359
2 1 x1 -0.026789 0.003916 -6.8411 7.8596e-12 -3.446e-02 -0.019114
3 1 x2 -0.026542 0.009812 -2.7051 0.00682871 -4.577e-02 -0.007311
4 2 D -0.012115 0.004702 -2.5766 0.00997729 -2.133e-02 -0.002899
5 2 x1 -0.018223 0.006017 -3.0287 0.00245619 -3.002e-02 -0.006430
6 2 x2 -0.007045 0.013101 -0.5377 0.59078427 -3.272e-02 0.018633
7 3 D 0.001536 0.005877 0.2614 0.79380433 -9.982e-03 0.013054
8 3 x1 0.012451 0.008775 1.4189 0.15592516 -4.748e-03 0.029650
9 3 x2 0.002193 0.015573 0.1408 0.88801728 -2.833e-02 0.032715
10 4 D 0.016300 0.004325 3.7689 0.00016399 7.823e-03 0.024776
11 4 x1 0.018111 0.008789 2.0606 0.03934167 8.845e-04 0.035338
12 4 x2 0.013543 0.013266 1.0208 0.30733424 -1.246e-02 0.039544
13 5 D 0.021837 0.003387 6.4479 1.1343e-10 1.520e-02 0.028475
14 5 x1 0.014449 0.005402 2.6749 0.00747469 3.862e-03 0.025037
15 5 x2 0.017851 0.009072 1.9677 0.04909878 7.048e-05 0.035631
Model type: multinom
Prediction type: probs

Quade test in R

I would like to perform a Quade test with more than one covariate in R. I know the command quade.test and I have seen the example below:
## Conover (1999, p. 375f):
## Numbers of five brands of a new hand lotion sold in seven stores
## during one week.
y <- matrix(c( 5, 4, 7, 10, 12,
1, 3, 1, 0, 2,
16, 12, 22, 22, 35,
5, 4, 3, 5, 4,
10, 9, 7, 13, 10,
19, 18, 28, 37, 58,
10, 7, 6, 8, 7),
nrow = 7, byrow = TRUE,
dimnames =
list(Store = as.character(1:7),
Brand = LETTERS[1:5]))
y
quade.test(y)
My question is as follows: how could I introduce more than one covariate? In this example the covariate is the Store variable.

How to dynamically indicate groups in an R in plot

This is my data
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22)
y = c(1, 6, 2, 5, 4, 7, 9, 6, 8, 4, 5, 6, 5, 5, 6, 7, 5, 8, 9,
5, 4, 7)
plot(x, y)
fit <- lm(y ~ x)
fit
abline(fit, col = "black", lwd = "1")
I would like to the plot to split the data into two groups, observations above the regression line and and those under the regression line. How can I do this?
You can use predict to get the fitted value at each x, and then a logical comparison between the observed and fitted to test if they're above or below the line. Then set the colors you plot based on this logical comparison.
prediction <- predict(fit)
colors<-ifelse(y>prediction,1,2)
plot(x,y,col=colors)
abline(fit, col= "black",lwd="1")

simultaneous function use in R

I have 2 function in R, the first is :
cyii=function(a,b,L)
{
d=outer(a,b,`-`);I=outer(a,b,`==`)
d=d[upper.tri(d,diag=T)];I=I[upper.tri(I,diag=T)]
L[1]^2*exp(-0.25*d^2/L[2]^2) + I*L[3]^2
}
The second function called the first function many time
zii=list()
for(i in 1:(n-1))
{
zii[[i]]=cyii(v1,v1,H[c(5*i-4,5*i-3,5*n-3+i)])
}
Where v1 is any vector of numbers and H is a vector of parameters. Below is a reproducable example.
dput(v1)=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
dput(H)=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
n=3
Is there a possible way to use the apply function or any other in order to avoid using the for loop, given that for every n I need to provide the first function with diffrent values from H
Yes, shouldn't be a problem at all
# What you supplied
cyii=function(a,b,L)
{
d=outer(a,b,`-`);I=outer(a,b,`==`)
d=d[upper.tri(d,diag=T)];I=I[upper.tri(I,diag=T)]
L[1]^2*exp(-0.25*d^2/L[2]^2) + I*L[3]^2
}
v1=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
H=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
n=3
zii=list()
for(i in 1:(n-1))
{
zii[[i]]=cyii(v1,v1,H[c(5*i-4,5*i-3,5*n-3+i)])
}
# Change it up a little to use 'lapply'
N <- 1:(n-1)
z2 <- lapply(N, function(i){
cyii(v1,v1,H[c(5*i-4,5*i-3,5*n-3+i)])
})
identical(zii,z2)
#[1] TRUE

How to get same results of Wilcoxon sign rank test in R and SAS

R code:
x <- c(9, 5, 9 ,10, 13, 8, 8, 13, 18, 30)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)
library(exactRankTests)
wilcox.exact(y,x, paired = TRUE, alternative = "two.sided")
The results: V = 3, p-value = 0.01562
SAS code:
data aaa;
set aaa;
diff=x-y;
run;
proc univariate;
var diff;
run;
The results: S=19.5 Pr >= |S| 0.0156
How to get statistics S in R?
If n<=20 the exact P was same in SAS and R,but if n>20 the results were different.
x <- c(9, 5, 9 ,10, 13, 8, 8, 13, 18, 30,9, 5, 9 ,10, 13, 8, 8, 13, 18, 30,9,11,12,10)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10,10, 6, 9, 8, 11, 4, 1, 3, 3, 10,10,12,11,12)
wilcox.exact(y,x,paired=TRUE, alternative = "two.sided",exact = FALSE)
The results: V = 34, p-value = 0.002534
The SAS results:S=92.5 Pr >= |S| 0.0009
How to get the same statistics S and P value in SAS and R? Thank you!

Resources