Question: Have a look at data set Two.csv. It contains a potentially dependent binary variable Y, and two potentially independent variables {X1, X2} for each unit of measurement.
(a) Read data set Two.csv into R and have a look at the format of the dependent variable.
Discuss three models which might be appropriate in this data situation. Discuss which
aspects speak in favor of each model, and which aspects against.
(b) Suppose variable Y measures financial ratings A : y = 1, B : y = 2, and C : y = 3, that
is, the creditworthiness A: high, B: intermediate, C: low for unit of measurement firm
i. Model Y by means of an ordered Logit model as a function of {X1,X2} and estimate
your model by means of a built-in command.
(c) Explain the proportional odds assumption and test whether the assumption is critical
in the context of the data set at hand.
##a) Read data set Two.csv into R and have a look at the format of the dependent variable.
O <- read.table("C:/Users/DELL/Downloads/ExamQEIII2021/Two.csv",header=TRUE,sep=";")
str(O)
dim(O)
View(O)
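## A sketch of the discussion asked for in (a), which is omitted above: part (b)
## reveals that y takes the ordinal values 1, 2, 3, so three natural candidates are:
## 1. Ordered logit: respects the ordering of y with a single slope vector,
##    but relies on the proportional odds assumption (tested in part c).
## 2. Ordered probit: the same latent-variable structure with a normal link;
##    fits are usually very similar, but coefficients cannot be read as
##    (cumulative) log odds ratios.
## 3. Multinomial logit: needs no ordering or proportional odds assumption,
##    but discards the ordinal information and estimates a separate slope
##    vector per category.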
##b)
library(oglmx)
# Full oglmx() call for an ordered logit (no separate constant, since the
# thresholds are estimated freely):
ologit <- oglmx(y ~ x1 + x2, data = O, link = "logit", constantMEAN = FALSE,
                constantSD = FALSE, delta = 0, threshparam = NULL)
# ologit.reg() is the convenience wrapper for the same ordered logit model:
results.ologit <- ologit.reg(y ~ x1 + x2, data = O)
summary(results.ologit)
## x1  1.46251
## x2 -0.45391
margins.oglmx(results.ologit, ascontinuous = FALSE)  # built-in command for the marginal effects (AME)
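# Optionally, cross-check with MASS::polr, which fits the same proportional
# odds model; packages differ in sign conventions for the thresholds, so
# compare the slope estimates:
library(MASS)
po <- polr(ordered(y) ~ x1 + x2, data = O, method = "logistic")
summary(po)  # slopes should be close to the oglmx estimates above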
##c) Explain the proportional odds assumption and test whether the assumption
# is critical in the context of the data set at hand.
# Proportional odds (PO) assumption: in the cumulative model
# logit P(Y <= j | x) = alpha_j - beta'x, the slope vector beta is the same
# for every category j; only the thresholds alpha_j differ, so a covariate
# shifts all cumulative log odds by the same amount.
# Ordered logit WITH proportional odds (PO):
library(VGAM)
a <- vglm(y~x1+x2,family=cumulative(parallel=TRUE),data=O)
summary(a)
# Ordered logit WITHOUT proportional odds (model a imposes PO, model c does not):
c <- vglm(y~x1+x2,family=cumulative(parallel=FALSE),data=O)
summary(c)
pchisq(deviance(a)-deviance(c),df.residual(a)-df.residual(c),lower.tail=FALSE)
## 0.4936413: relaxing PO does not significantly improve the fit, so there is
# no evidence that the PO assumption is violated in this data set.
# Equivalently, via a Likelihood Ratio test computed by hand:
LLa <- logLik(a)  # small model (with PO)
LLc <- logLik(c)  # large model (without PO)
df.residual(a) - df.residual(c)  # degrees of freedom of the test
LL <- as.numeric(2*(LLc - LLa))
1 - pchisq(LL, df.residual(a) - df.residual(c))
## 0.4936413 (the same p-value as the deviance-based test above)
## Conclusion: the log-likelihoods do not differ significantly, so the data
## do not speak against the proportional odds assumption.
I am conducting a study that analyzes speakers' production and measures their average F2 values. What I need is an R function that lets me test whether these F2 values are related to 3 other variables and, if they are, which variable is the most significant. These variables have been coded as 1, 2, or 3 for things like "yes"/"no" answers or whether responses are positive, neutral, or negative (1, 2, 3 respectively).
Is there a particular technique or R function/test that we can use to approach this problem? I've considered using ANOVA or a t-test but am unsure whether these will give me what I need.
A quick solution might look like this. Here, the cor function is used; read its help page (?cor) to understand what is calculated. By default, the Pearson correlation coefficient is computed. The function below returns the variable with the highest Pearson correlation with the reference variable.
set.seed(111)
# simulated example: ref is correlated with x but not with y or z
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)
ref <- 0.5*x + 0.5*rnorm(100)
find_max_corr <- function(vars, ref){
  # correlation of each candidate variable with the reference variable
  val <- sapply(vars, cor, y = ref)
  # return the largest one, named after its variable
  val[which.max(val)]
}
find_max_corr(list('x' = x, 'y' = y, 'z' = z), ref)
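One caveat: which.max picks the largest signed value, so a strong negative correlation would be overlooked, and for variables coded 1/2/3 a rank-based correlation may be more defensible. A possible variant (find_max_abs_corr is just an illustrative name):
find_max_abs_corr <- function(vars, ref, method = "pearson"){
  # correlate every candidate with the reference variable
  val <- sapply(vars, cor, y = ref, method = method)
  # pick the correlation that is largest in absolute value
  val[which.max(abs(val))]
}
find_max_abs_corr(list('x' = x, 'y' = y, 'z' = z), ref, method = "spearman")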
Suppose I have these data
library(MASS)   # for the oats data
library(lme4)   # lmer() comes from lme4, not MASS
m <- lmer(Y ~ N*V + (1|B), data = oats)
How can I create a manual contrast in emmeans? For example
Victoria_0.2cwt 1
Victoria_0.4cwt -1
Marvellous_0.2cwt -1
Marvellous_0.4cwt 1
library(emmeans)
emm = emmeans(m, ~ V * N)
emm
contrast(emm, list(con = c(0,0,0,0,-1,1,0,1,-1,0,0,0)))
Note that the coefficients of a true contrast must sum to zero, as they do here; a vector whose coefficients do not sum to zero defines a linear function of the EMMs rather than a contrast.
Note: I may have mis-remembered the factor levels, and if so, the coefficients may need to be rearranged. They should correspond to the combinations you see when printing emm in the second line above.
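To double-check the positions, inspect the grid order directly (with ~ V * N, the levels of the first factor vary fastest):
as.data.frame(emm)[, c("V", "N")]  # the row order gives the coefficient positions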
I am experiencing difficulties estimating a BMA-model via glib(), due to multicollinearity issues, even though I have clearly specified which columns to use. Please find the details below.
The data I'll be using for the estimation via Bayesian Model Averaging:
Cij <- c(357848,766940,610542,482940,527326,574398,146342,139950,227229,67948,
352118,884021,933894,1183289,445745,320996,527804,266172,425046,
290507,1001799,926219,1016654,750816,146923,495992,280405,
310608,1108250,776189,1562400,272482,352053,206286,
443160,693190,991983,769488,504851,470639,
396132,937085,847498,805037,705960,
440832,847631,1131398,1063269,
359480,1061648,1443370,
376686,986608,
344014)
n <- length(Cij)
TT <- trunc(sqrt(2*n))  # TT = 10: the triangle has TT*(TT+1)/2 = 55 cells
i <- rep(1:TT,TT:1); #row numbers: year of origin
j <- sequence(TT:1) #col numbers: year of development
k <- i+j-1 #diagonal numbers: year of payment
#Since k=i+j-1, we have to leave out another dummy in order to avoid multicollinearity
k <- ifelse(k == 2, 1, k)
I want to evaluate the effects of i and j both as (continuous) levels and as factors, but of course not in the same model. Since i and j can each enter as a factor, as a level, or be excluded, and k can either enter as a level or be excluded, there are a total of 3 x 3 x 2 = 18 models. This brings us to the following data frame:
X <- data.frame(Cij,i.factor=as.factor(i),j.factor=as.factor(j),k,i,j)
X <- model.matrix(Cij ~ -1 + i.factor + j.factor + k + i + j,X)
X <- as.data.frame(X[,-1])
Next, via the following declaration I specify which variables to consider in each of the 18 models. As far as I can tell, no linear dependence exists within any of these specifications.
model.set <- rbind(
c(rep(0,9),rep(0,9),0,0,0),
c(rep(0,9),rep(0,9),0,1,0),
c(rep(0,9),rep(0,9),0,0,1),
c(rep(0,9),rep(0,9),1,0,0),
c(rep(1,9),rep(0,9),0,0,0),
c(rep(0,9),rep(1,9),0,0,0),
c(rep(0,9),rep(0,9),0,1,1),
c(rep(0,9),rep(0,9),1,1,0),
c(rep(0,9),rep(1,9),0,1,0),
c(rep(0,9),rep(0,9),1,0,1),
c(rep(1,9),rep(0,9),0,0,1),
c(rep(1,9),rep(0,9),1,0,0),
c(rep(0,9),rep(1,9),1,0,0),
c(rep(1,9),rep(1,9),0,0,0),
c(rep(0,9),rep(0,9),1,1,1),
c(rep(0,9),rep(1,9),1,1,0),
c(rep(1,9),rep(0,9),1,0,1),
c(rep(1,9),rep(1,9),1,0,0))
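Each row has 9 + 9 + 3 = 21 entries, one per column of X, which a quick dimension check confirms:
dim(model.set)  # 18 models x 21 candidate columns
ncol(X)         # 21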
Then I call the glib() function, telling it to select the specified columns from X according to model.set.
library(BMA)
model.glib <- glib(X,Cij,error="poisson", link="log",models=model.set)
which results in the error
Error in glim(x, y, n, error = error, link = link, scale = scale) : X matrix is not full rank
The function apparently first checks whether the full matrix is of full column rank before it evaluates which columns to select from X via model.set. How do I circumvent this, or is there another way to include all 18 models in the glib() function?
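Indeed, a quick rank check confirms that the full matrix X is rank-deficient: without an intercept column, i and j are each a constant plus a weighted sum of their own dummies, so their difference is an exact linear dependence among the columns of X.
qr(as.matrix(X))$rank  # strictly less than ncol(X)
ncol(X)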
Thank you in advance.
I would like to drop some interaction terms from an R formula. My situation is that I have one factor variable with lots of levels (call this A; it takes values from 1-50) and another continuous variable (call this B) that I would like to interact with it.
A*B
creates the terms A1:B, A2:B, A3:B, ... and I want a simple way to get rid of just the first term, A1:B.
Note: I saw some previous answers for the lm case that called update and then dropped some terms. This will not work for me as I am trying to estimate a multinomial logit model with the mlogit package, and I cannot make the first estimation without dropping some interactions.
Edit: Although I am not attempting to use lm, if I could get the following to occur, then I think it would solve my problem.
dd <- data.frame(A = sample(letters[1:10], 100, replace = TRUE), B = runif(100), z = rexp(100))
#need to drop B term below
reg1 <- lm(z~A*B, dd)
#or need to drop Aa:B term here
reg2 <- lm(z~A*B - B, dd)
#but this doesn't work (I realize why, but this is an
#example of what I would like to have happen)
reg3 <- lm(z~A*B - B - Aa:B, dd)
I think you should be able to work with contrasts here to make this happen. Here we create our own contrast function that adjusts the default contr.treatment behavior to skip the first two levels.
contr.skip2 <- function (n, contrasts = TRUE, sparse = FALSE)
{
    # start from treatment contrasts with level 1 as the base
    contr <- contr.treatment(n, 1, contrasts, sparse)
    # zero out the row for level 2 as well ...
    contr[2, ] <- 0
    # ... and drop its column, so levels 1 and 2 both fall into the baseline
    contr[, -1]
}
and then we can fit the model and pass along our special contrast
lm(z~A*B, dd, contrasts=list(A="contr.skip2"))
# Call:
# lm(formula = z ~ A * B, data = dd, contrasts = list(A = "contr.skip2"))
#
# Coefficients:
# (Intercept) Ac Ad Ae Af Ag Ah
# 1.09981 -0.14541 -0.86334 -0.18478 -0.77302 0.19681 0.23845
# Ai Aj B Ac:B Ad:B Ae:B Af:B
# -0.74962 -0.49014 0.09729 0.14705 1.09606 0.14706 0.88919
# Ag:B Ah:B Ai:B Aj:B
# -0.62796 -0.70155 1.60253 -0.20564
and as you can see we no longer have Ab terms in the model.
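Since the ultimate target is mlogit rather than lm, the same contrast can also be used when building a design matrix by hand, via the contrasts.arg argument of model.matrix (a sketch; I have not run this through mlogit itself):
mm <- model.matrix(~ A*B, dd, contrasts.arg = list(A = "contr.skip2"))
colnames(mm)  # neither "Ab" nor "Ab:B" appears among the columns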