Quade test in R

I would like to perform a Quade test with more than one covariate in R. I know the command quade.test and I have seen the example below:
## Conover (1999, p. 375f):
## Numbers of five brands of a new hand lotion sold in seven stores
## during one week.
y <- matrix(c( 5,  4,  7, 10, 12,
               1,  3,  1,  0,  2,
              16, 12, 22, 22, 35,
               5,  4,  3,  5,  4,
              10,  9,  7, 13, 10,
              19, 18, 28, 37, 58,
              10,  7,  6,  8,  7),
            nrow = 7, byrow = TRUE,
            dimnames =
              list(Store = as.character(1:7),
                   Brand = LETTERS[1:5]))
y
quade.test(y)
My question is as follows: how could I introduce more than one covariate? In this example the covariate is the Store variable.

Related

Zelen Exact Test - Trying to use a k 2x2 array in the function zelen.test()

I am trying to use the zelen.test function on the package NSM3. I am having difficulty reading the data into the function.
You can recreate my data using
data <- c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1, 12, 13,
          74, 74, 77, 85, 31, 37, 11, 7, 18, 18, 96, 97, 48, 40)
events <- matrix(data, ncol = 2)
The documentation on CRAN states the usage as zelen.test(z, example = F, r = 3), where z is an array of k 2 x 2 matrices, example is set to FALSE because it otherwise returns a p-value for a built-in example I cannot access, and r is the number of decimal places the user wants returned in the p-value.
I've tried:
zelen.test(events, r = 4)
I thought it may want the study number and the trial data, so I tried this:
studies <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7)
data <- c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1, 12, 13,
          74, 74, 77, 85, 31, 37, 11, 7, 18, 18, 96, 97, 48, 40)
events <- matrix(cbind(studies, events), ncol = 3)
zelen.test(events, r = 4)
but it continues to return an error stating
"Error in z[1, 1, ] : incorrect number of dimensions"
for both cases I tried above.
Any help would be greatly appreciated!
If we check the source code by typing zelen.test at the console, we can see that when example = TRUE it constructs a 3D array:
...
if (example)
z <- array(c(2, 1, 2, 5, 1, 5, 4, 1), dim = c(2, 2, 2))
...
The expected dimensions of the input z are also specified in the documentation (?zelen.test):
z - data as an array of k 2x2 matrices. Small data sets only!
So we need to construct a 3-dimensional array:
library(NSM3)
z1 <- array(c(4, 2, 3, 3, 8, 3, 4, 7), c(2, 2, 2))
zelen.test(z1, r = 4)
# Zelen's test:
# P = 1
Or with a 3rd dimension of length 3:
z1 <- array(c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1), c(2, 2, 3))
zelen.test(z1, r = 4)
# Zelen's test:
# P = 0.1238
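For the full data in the question, the third dimension should have length 7 (one 2 x 2 table per study). It is not obvious from the question how the 14 x 2 events matrix maps onto those tables, but assuming each consecutive pair of rows is one study's table, a sketch would be:
# assumption: rows 1-2 of `events` are study 1's 2x2 table, rows 3-4 study 2's, and so on
k <- nrow(events) / 2
z_all <- array(NA, dim = c(2, 2, k))
for (i in seq_len(k)) z_all[, , i] <- events[(2 * i - 1):(2 * i), ]
zelen.test(z_all, r = 4)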

Logarithmic scaling with ggplot2 in R

I am trying to create a diagram using ggplot2. There are several very small values to be displayed and a few larger ones. I'd like to display all of them in an appropriate way using logarithmic scaling. This is what I do:
plotPointsPre <- ggplot(data = solverEntries,
                        aes(x = val, y = instance,
                            color = solver, group = solver))
...
finalPlot <- plotPointsPre + coord_trans(x = 'log10') + geom_point() +
  xlab("costs") + ylab("instance")
This is the result: it is just the same as without coord_trans(x = 'log10'). However, if I use it with the y-axis, the scaling does take effect.
How do I achieve the logarithmic scaling on the x-axis? It does not really seem to be about the x-axis as such: if I switch the values of x and y, then it works on the x-axis and no longer on the y-axis. So there seems to be some problem with the displayed values. Does anybody have an idea how to fix this?
Edit - Here's the data used, contained in solverEntries:
solverEntries <- data.frame(instance = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20),
solver = c(4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1, 4, 3, 2, 1),
time = c(1, 24, 13, 6, 1, 41, 15, 5, 1, 26, 16, 5, 1, 39, 7, 4, 1, 28, 11, 3, 1, 31, 12, 3, 1, 38, 20, 3, 1, 37, 10, 4, 1, 25, 11, 3, 1, 32, 18, 4, 1, 27, 21, 3, 1, 23, 22, 3, 1, 30, 17, 2, 1, 36, 8, 3, 1, 37, 19, 4, 1, 40, 21, 3, 1, 29, 11, 4, 1, 33, 10, 3, 1, 34, 9, 3, 1, 35, 14, 3),
val = c(6553.48, 6565.6, 6565.6, 6577.72, 6568.04, 7117.14, 6578.98, 6609.28, 6559.54, 6561.98, 6561.98, 6592.28, 6547.42, 7537.64, 6549.86, 6555.92, 6546.24, 6557.18, 6557.18, 6589.92, 6586.22, 6588.66, 6588.66, 6631.08, 6547.42, 7172.86, 6569.3, 6582.6, 6547.42, 6583.78, 6547.42, 6575.28, 6555.92, 6565.68, 6565.68, 6575.36, 6551.04, 6551.04, 6551.04, 6563.16, 6549.86, 6549.86, 6549.86, 6555.92, 6544.98, 6549.86, 6549.86, 6561.98, 6558.36, 6563.24, 6563.24, 6578.98, 6566.86, 7080.78, 6570.48, 6572.92, 6565.6, 7073.46, 6580.16, 6612.9, 6557.18, 7351.04, 6562.06, 6593.54, 6547.42, 6552.3, 6552.3, 6558.36, 6553.48, 6576.54, 6576.54, 6612.9, 6555.92, 6560.8, 6560.8, 6570.48, 6566.86, 6617.78, 6572.92, 6578.98))
Your data in its current form is not log distributed: most val values are around 6500, and some are about 10% higher. If you want to stretch the data, you could use a custom transformation via scales::trans_new(), or here's a simpler version that just subtracts a baseline value to make a log transform useful. After subtracting 6500, the small values are mapped to around 50 and the large values to around 1000, which is a more appropriate range for a log scale. Then we apply the same transformation to the breaks so that the labels appear in the right spots (i.e. the label 6550 is mapped to the data that is mapped to 6550 - 6500 = 50).
This method helps if you want to make the underlying values more distinguishable, but at the cost of distorting the underlying proportions between values. You might be able to help with this by picking useful breaks and labeling them with scaling stats, e.g. a break at 7000 labeled "7000 (+7% over min)".
my_breaks <- c(6550, 6600, 6750, 7000, 7500)
baseline <- 6500

library(ggplot2)
ggplot(data = solverEntries,
       aes(x = val - baseline, y = instance,
           color = solver, group = solver)) +
  geom_point() +
  scale_x_log10(breaks = my_breaks - baseline,
                labels = my_breaks, name = "val")
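If you would rather keep the original val values on the axis, the scales::trans_new() route mentioned above could look roughly like this (a sketch assuming the same baseline of 6500; shift_log_trans is just an illustrative name):
library(ggplot2)
library(scales)
baseline <- 6500
# custom "shifted log" transformation: log10(x - baseline), with its inverse for the axis
shift_log_trans <- trans_new(
  name      = "shift_log",
  transform = function(x) log10(x - baseline),
  inverse   = function(x) 10^x + baseline
)
ggplot(solverEntries,
       aes(x = val, y = instance, color = solver, group = solver)) +
  geom_point() +
  scale_x_continuous(trans = shift_log_trans,
                     breaks = c(6550, 6600, 6750, 7000, 7500), name = "val")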
Is this what you're looking for?
library(ggplot2)

x_data <- seq(from = 1, to = 50)
y_data <- 2 * x_data + rnorm(n = 50, mean = 0, sd = 5)

# non-log y
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point()

# log y scale
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_y_log10()

# log x scale
ggplot() +
  aes(x = x_data, y = y_data) +
  geom_point() +
  scale_x_log10()
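Applied to the data from the question, the fix is to use scale_x_log10() instead of coord_trans(); as noted in the previous answer, though, a plain log scale will barely spread these particular values because they only differ by roughly 10%:
library(ggplot2)
ggplot(solverEntries,
       aes(x = val, y = instance, color = solver, group = solver)) +
  geom_point() +
  scale_x_log10() +
  xlab("costs") + ylab("instance")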

How to get average marginal effects (AMEs) with standard errors of a multinomial logit model?

I want to get the average marginal effects (AME) of a multinomial logit model with standard errors. For this I've tried different methods, but they haven't led to the goal so far.
Best attempt
My best attempt was to get the AMEs by hand using mlogit which I show below.
library(mlogit)
ml.d <- mlogit.data(df1, choice="Y", shape="wide") # shape data for `mlogit()`
ml.fit <- mlogit(Y ~ 1 | D + x1 + x2, reflevel="1", data=ml.d) # fit the model
# coefficient names
c.names <- all.vars(ml.fit$call)[2:4]
# get marginal effects
ME.mnl <- sapply(c.names, function(x)
  stats::effects(ml.fit, covariate = x, data = ml.d),
  simplify = FALSE)
# get AMEs
(AME.mnl <- t(sapply(ME.mnl, colMeans)))
# 1 2 3 4 5
# D -0.03027080 -0.008806072 0.0015410569 0.017186531 0.02034928
# x1 -0.02913234 -0.015749598 0.0130577842 0.013240212 0.01858394
# x2 -0.02724650 -0.005482753 0.0008575982 0.005331181 0.02654047
I know these values are the correct ones. However, I could not get the correct standard errors by simply taking the columns' standard deviations:
# standard errors - WRONG!
(AME.mnl.se <- t(sapply(ME.mnl, colSdColMeans)))
(Note: colSdColMeans() for columns' SD is provided here.)
Accordingly this also led me to the wrong t-values:
# t values - WRONG!
AME.mnl / AME.mnl.se
# 1 2 3 4 5
# D -0.7110537 -0.1615635 0.04013228 0.4190057 0.8951484
# x1 -0.7170813 -0.2765212 0.33325968 0.3656893 0.8907836
# x2 -0.7084573 -0.1155825 0.02600653 0.1281190 0.8559794
Whereas I know the correct t-values for this case are these:
# D -9.26 -1.84 0.31 4.29 8.05
# x1 -6.66 -2.48 1.60 1.50 3.22
# x2 -2.95 -0.39 0.06 0.42 3.21
I learned that there should be a "delta method", but I only found some code for a very special case with interactions at Cross Validated.
Failed attempts
1.) Package margins doesn't seem to be able to handle "mlogit" objects:
library(margins)
summary(margins(ml.fit))
2.) There's another package that fits multinomial logit models, nnet,
library(nnet)
ml.fit2 <- multinom(Y ~ D + x1 + x2, data=df1)
summary(ml.fit2)
but margins can't handle this correctly either:
> summary(margins(ml.fit2))
factor AME SE z p lower upper
D -0.0303 NA NA NA NA NA
x1 -0.0291 NA NA NA NA NA
x2 -0.0272 NA NA NA NA NA
3.) There's also a package around that claims to calculate "Average Effects for Multinomial Logistic Regression Models",
library(DAMisc)
mnlChange2(ml.fit2, varnames="D", data=df1)
but I couldn't get a drop of milk out of it, since the function yields just nothing (not even with the function's own example).
How, then, can we get AMEs with standard errors / t-statistics for a multinomial logit model in R?
Data
df1 <- structure(list(Y = c(3, 4, 1, 2, 3, 4, 1, 5, 2, 3, 4, 2, 1, 4,
1, 5, 3, 3, 3, 5, 5, 4, 3, 5, 4, 2, 5, 4, 3, 2, 5, 3, 2, 5, 5,
4, 5, 1, 2, 4, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 2, 4, 1, 5, 3, 1,
5, 2, 3, 4, 2, 4, 5, 2, 4, 1, 4, 2, 1, 5, 3, 2, 1, 4, 4, 1, 5,
1, 1, 1, 4, 5, 5, 3, 2, 3, 3, 2, 4, 4, 5, 3, 5, 1, 2, 5, 5, 1,
2, 3), D = c(12, 8, 6, 11, 5, 14, 0, 22, 15, 13, 18, 3, 5, 9,
10, 28, 9, 16, 17, 14, 26, 18, 18, 23, 23, 12, 28, 14, 10, 15,
26, 9, 2, 30, 18, 24, 27, 7, 6, 25, 13, 8, 4, 16, 1, 4, 5, 18,
21, 1, 2, 19, 4, 2, 16, 17, 23, 15, 13, 21, 24, 14, 27, 6, 20,
6, 19, 8, 7, 23, 11, 11, 1, 22, 21, 4, 27, 6, 2, 9, 18, 30, 26,
22, 10, 1, 4, 7, 26, 15, 26, 18, 30, 1, 11, 29, 25, 3, 19, 15
), x1 = c(13, 12, 4, 3, 16, 16, 15, 13, 1, 15, 10, 16, 1, 17,
7, 13, 12, 6, 8, 16, 16, 11, 7, 16, 5, 13, 12, 16, 17, 6, 16,
9, 14, 16, 15, 5, 7, 2, 8, 2, 9, 9, 15, 13, 9, 4, 16, 2, 11,
13, 11, 6, 4, 3, 7, 4, 12, 2, 16, 14, 3, 13, 10, 11, 10, 4, 11,
16, 8, 12, 14, 9, 4, 16, 16, 12, 9, 10, 6, 1, 3, 8, 7, 7, 5,
16, 17, 10, 4, 15, 10, 8, 3, 13, 9, 16, 12, 7, 4, 11), x2 = c(12,
19, 18, 19, 15, 12, 15, 16, 15, 11, 12, 16, 17, 14, 12, 17, 17,
16, 12, 20, 11, 11, 15, 14, 18, 10, 14, 13, 10, 14, 18, 18, 18,
17, 18, 14, 16, 19, 18, 16, 18, 14, 17, 10, 16, 12, 16, 15, 11,
18, 19, 15, 19, 11, 16, 10, 20, 14, 10, 12, 10, 15, 13, 15, 11,
20, 11, 12, 16, 16, 11, 15, 11, 11, 10, 10, 16, 11, 20, 17, 20,
17, 16, 11, 18, 19, 18, 14, 17, 11, 16, 11, 18, 14, 15, 16, 11,
14, 11, 13)), class = "data.frame", row.names = c(NA, -100L))
We can do something very similar to what is done in your linked answer. In particular, first we want a function that would compute AMEs at a given vector of coefficients. For that we can define
AME.fun <- function(betas) {
  tmp <- ml.fit
  tmp$coefficients <- betas
  ME.mnl <- sapply(c.names, function(x)
    effects(tmp, covariate = x, data = ml.d), simplify = FALSE)
  c(sapply(ME.mnl, colMeans))
}
where the second half is yours, while in the first half I use a trick: I take the same ml.fit object and change its coefficients. Next we find the Jacobian with
require(numDeriv)
grad <- jacobian(AME.fun, ml.fit$coef)
and apply the delta method: the square roots of the diagonal of grad %*% vcov(ml.fit) %*% t(grad) are what we want. Hence,
(AME.mnl.se <- matrix(sqrt(diag(grad %*% vcov(ml.fit) %*% t(grad))), nrow = 3, byrow = TRUE))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.003269320 0.004788536 0.004995723 0.004009762 0.002527462
# [2,] 0.004375795 0.006348496 0.008168883 0.008844684 0.005763966
# [3,] 0.009233616 0.014048212 0.014713090 0.012702188 0.008261734
AME.mnl / AME.mnl.se
# 1 2 3 4 5
# D -9.259050 -1.8389907 0.30847523 4.2861720 8.051269
# x1 -6.657611 -2.4808393 1.59847852 1.4969683 3.224159
# x2 -2.950794 -0.3902812 0.05828811 0.4197057 3.212458
which coincides with Stata's results.
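If you also need p-values or confidence intervals for these AMEs, a minimal sketch building on the objects above, assuming the usual normal approximation, would be:
z.stats  <- AME.mnl / AME.mnl.se                  # z statistics
p.values <- 2 * pnorm(-abs(z.stats))              # two-sided p-values
ci.lower <- AME.mnl - qnorm(0.975) * AME.mnl.se   # 95% confidence bounds
ci.upper <- AME.mnl + qnorm(0.975) * AME.mnl.se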
If you use vce = "bootstrap" within the margins() function, then it provides the SE with a confidence interval as well:
summary(margins(ml.fit2, vce = "bootstrap"))
The terminology for “marginal effects” is very inconsistent across disciplines. Since you refer to the margins package, I assume that you use the expression “Average Marginal Effects” in the same way that the margins developers use it, which is the result of this procedure:
1. Compute the slope of the outcome with respect to D for every row in the original dataset (unit-level marginal effects).
2. Take the average of the unit-level slopes (average marginal effect).
In models like nnet::multinom, the slopes will be different for every level of the outcome variable. There will thus be one average marginal effect per level, per regressor.
Using the marginaleffects package and the data you supplied, we get:
library(nnet)
library(marginaleffects)
mod <- nnet::multinom(Y ~ D + x1*x2, data=df1, trace = FALSE)
marginaleffects(mod) |> summary()
Group Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 1 D -0.027558 0.004183 -6.5878 4.4625e-11 -3.576e-02 -0.019359
2 1 x1 -0.026789 0.003916 -6.8411 7.8596e-12 -3.446e-02 -0.019114
3 1 x2 -0.026542 0.009812 -2.7051 0.00682871 -4.577e-02 -0.007311
4 2 D -0.012115 0.004702 -2.5766 0.00997729 -2.133e-02 -0.002899
5 2 x1 -0.018223 0.006017 -3.0287 0.00245619 -3.002e-02 -0.006430
6 2 x2 -0.007045 0.013101 -0.5377 0.59078427 -3.272e-02 0.018633
7 3 D 0.001536 0.005877 0.2614 0.79380433 -9.982e-03 0.013054
8 3 x1 0.012451 0.008775 1.4189 0.15592516 -4.748e-03 0.029650
9 3 x2 0.002193 0.015573 0.1408 0.88801728 -2.833e-02 0.032715
10 4 D 0.016300 0.004325 3.7689 0.00016399 7.823e-03 0.024776
11 4 x1 0.018111 0.008789 2.0606 0.03934167 8.845e-04 0.035338
12 4 x2 0.013543 0.013266 1.0208 0.30733424 -1.246e-02 0.039544
13 5 D 0.021837 0.003387 6.4479 1.1343e-10 1.520e-02 0.028475
14 5 x1 0.014449 0.005402 2.6749 0.00747469 3.862e-03 0.025037
15 5 x2 0.017851 0.009072 1.9677 0.04909878 7.048e-05 0.035631
Model type: multinom
Prediction type: probs

dplyr: Create a new variable as a function of all existing variables without defining their names

In the following dataframe I want to create a new variable as the following function of all existing ones:
as.numeric(paste0(df[i,],collapse=""))
However, I don't want to define the column names explicitly because their number and names may be different each time. How can I do that using dplyr?
The equivalent in base R would be something like this:
apply(df, 1, function(x) as.numeric(paste0(x, collapse = "")))
df <- structure(list(X1 = c(50, 2, 2, 50, 5, 5, 2, 50, 5, 5, 50, 2,
5, 5, 50, 2, 2, 50, 9, 9, 9, 9, 9, 9), X2 = c(2, 50, 5, 5, 50,
2, 5, 5, 50, 2, 2, 50, 9, 9, 9, 9, 9, 9, 50, 2, 2, 50, 5, 5),
X3 = c(5, 5, 50, 2, 2, 50, 9, 9, 9, 9, 9, 9, 50, 2, 2, 50,
5, 5, 2, 50, 5, 5, 50, 2), X4 = c(9, 9, 9, 9, 9, 9, 50, 2,
2, 50, 5, 5, 2, 50, 5, 5, 50, 2, 5, 5, 50, 2, 2, 50)), class = "data.frame", .Names = c("X1",
"X2", "X3", "X4"), row.names = c(NA, -24L))
You can try:
df %>% mutate(newcol = as.numeric(do.call(paste0, df)))
Or (as you suggested, maybe more dplyr style):
df %>% mutate(newcol = as.numeric(do.call(paste0, .)))
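If you are on dplyr 1.0.0 or later, another option that avoids naming the columns is rowwise() with c_across(); this is just a sketch of the same idea, and it is typically slower than do.call(paste0, .) on large data:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(newcol = as.numeric(paste0(c_across(everything()), collapse = ""))) %>%
  ungroup()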

How to get the same results from the Wilcoxon signed rank test in R and SAS

R code:
x <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10)
library(exactRankTests)
wilcox.exact(y,x, paired = TRUE, alternative = "two.sided")
The results: V = 3, p-value = 0.01562
SAS code:
data aaa;
set aaa;
diff=x-y;
run;
proc univariate;
var diff;
run;
The results: S = 19.5, Pr >= |S| 0.0156
How do I get the statistic S in R?
If n <= 20 the exact P is the same in SAS and R, but if n > 20 the results differ.
x <- c(9, 5, 9, 10, 13, 8, 8, 13, 18, 30, 9, 5, 9, 10, 13, 8, 8, 13, 18, 30, 9, 11, 12, 10)
y <- c(10, 6, 9, 8, 11, 4, 1, 3, 3, 10, 10, 6, 9, 8, 11, 4, 1, 3, 3, 10, 10, 12, 11, 12)
wilcox.exact(y, x, paired = TRUE, alternative = "two.sided", exact = FALSE)
The results: V = 34, p-value = 0.002534
The SAS results: S = 92.5, Pr >= |S| 0.0009
How can I get the same statistic S and P value in SAS and R? Thank you!
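For reference, SAS's signed rank statistic can be reproduced in R: PROC UNIVARIATE computes S as the sum of the positive ranks minus n(n + 1)/4 over the nonzero differences, so a small sketch would be:
d <- x - y                       # differences, as in the SAS data step (diff = x - y)
d <- d[d != 0]                   # SAS drops zero differences
r <- rank(abs(d))                # ranks of |d|; ties get average ranks
n <- length(d)
(S <- sum(r[d > 0]) - n * (n + 1) / 4)
# 19.5 with the first x/y above, 92.5 with the second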
