R: How to get a confidence interval without this warning message?

The model is:
model <- glm(DW ~ P + DV_1, family = "binomial")
The variables are:
DW <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1)
P <- c(18.584898, 8.177430, -7.392020, -13.123626, 11.742363, 35.836419, 8.177430, 8.177430, 7.209096, 10.398933, -23.382043, -8.177430, 7.392020, 17.607980, -37.631207, -8.177430, 12.202439, -29.602930, -8.177430, 14.709837, 8.194932, 8.177430, -5.222738, 1.185302, 12.049662, 6.193046)
DV_1 <- c(45.49215, 55.40000, 51.63815, 36.12306, 34.78324, 41.17867, 59.14783, 62.45898, 55.04072, 53.76998, 52.31764, 44.71056, 42.23566, 50.08676, 61.34397, 49.59538, 38.21099, 51.05214, 44.69676, 40.83045, 46.09846, 53.45508, 54.73643, 50.26476, 48.75601, 53.68885)
If I try to obtain a confidence interval for each parameter with confint, I get this warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
For this specific model:
Is it better to use confint or some other function?
How do I fix that warning?
Can I get reliable confidence intervals, and how?
Thanks in advance

You can always calculate Wald-type confidence intervals for a glm like this, without relying on the profile-likelihood method that confint() uses:
exp(confint.default(model))
You can also use the Bayesian approach recommended by Sotos; it depends on what you really want to do. If this is a homework question that just asks you to fit a glm model and report confidence intervals, then the command above will do.
Also, if you fit the glm() with only one predictor and then call confint(), you will not get any warnings.
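For completeness, here is a minimal sketch (my own, using the data vectors posted above) that reproduces the fit and compares the profile-likelihood intervals from confint() with the Wald intervals from confint.default(). The warning itself just means some fitted probabilities are numerically 0 or 1, which usually points to (quasi-)separation in the data, so any interval should be interpreted cautiously:
# refit the model from the question
model <- glm(DW ~ P + DV_1, family = "binomial")
# profile-likelihood intervals; the refits done during profiling can trigger
# the "fitted probabilities numerically 0 or 1 occurred" warning
confint(model)
# Wald intervals from the estimated standard errors, and the same intervals
# exponentiated to the odds-ratio scale
confint.default(model)
exp(confint.default(model))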

Related

Random Variables with Standard Normal Distribution Falling into Specific Intervals

I'm working on a task where we look at 12 independent and identically distributed random variables, each of which has a standard normal distribution.
From that I understand we have a mean of 0 and sd of 1.
We then have an interval of (-1.644, 1.644)
To find the probability of a single random variable landing in this interval I write:
(pnorm(1.644, mean = 0, sd = 1, lower.tail=TRUE) - pnorm(-1.644, mean = 0, sd = 1, lower.tail=TRUE))
which returns a probability of 0.8998238.
I'm able to find the probability of at least one of the 12 random variables landing outside of the interval (-1.644, 1.644) with the following:
PROB_1 = 1 - (0.8998238^12)
# PROB_1 = 0.7182333
However, how would I find the probability of exactly 2 random variables landing outside of the interval? I've attempted the following:
((12*11)/2)*((1-0.7182333)^2)*(0.7182333^10)
I'm sure I'm missing something here, and there is a much easier way to solve this.
Any help is much appreciated.
You need the binomial coefficient: the number of variables falling outside the interval follows a Binomial(12, 1 - prob) distribution, so dbinom() (or the explicit formula) gives the answer directly.
prob=pnorm(1.644, mean = 0, sd = 1, lower.tail=TRUE)-pnorm(-1.644, mean = 0, sd = 1, lower.tail=TRUE)
dbinom(2, 12, 1-prob)
prob^10 * (1-prob)^2 * choose(12, 2)
# 0.2304877
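As a quick sanity check (my own addition, not part of the original answer), a small simulation should land near the same value:
# draw 12 standard normals many times and count how often exactly 2 of them
# fall outside (-1.644, 1.644)
set.seed(1)
mean(replicate(1e5, sum(abs(rnorm(12)) > 1.644) == 2))
# should be close to 0.2305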

Extracting the functional form of the likelihood function that gets formed by the msm function in R

Is there a way of extracting the functional form of the likelihood function that gets formed by the msm function in R?
How can I extract the likelihood function that gets formed in the example below? I want to try and implement my own version of the quasi-Newton maximisation algorithm to improve my understanding.
library(msm)
# look at transition counts
statetable.msm(state, PTNUM, data = cav)
# define transition intensity matrix
# 1's mean a transition can occur
# 0's mean a transition should not occur
# any number can be placed on the diagonal, as msm overwrites the diagonal
# prior to maximising
q <- rbind(
  c(0, 1, 0, 1),
  c(1, 0, 1, 1),
  c(0, 1, 0, 1),
  c(0, 0, 0, 0)
)
# fit msm to the data
# the fnscale rescales the likelihood to prevent overflow
msm.fit <- msm(state ~ years, PTNUM, data = cav, qmatrix = q, control=list(fnscale=4000))
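As far as I know, msm does not store the likelihood as a symbolic formula; it evaluates it numerically during optimisation. As a rough sketch of what that likelihood looks like for panel-observed data (ignoring the exactly observed death times and censoring that msm handles for cav), you could code the objective yourself and hand it to your own quasi-Newton routine. loglik_panel and qvec below are names I made up for illustration, and the code reuses the q matrix defined above:
library(msm)
data(cav)
# log-likelihood of a panel-observed multi-state model: P(t) = exp(Q * t),
# and the likelihood multiplies transition probabilities over consecutive
# observations within each subject
loglik_panel <- function(qvec, data, qmask) {
  Q <- qmask
  Q[qmask == 1] <- exp(qvec)   # keep the allowed intensities positive
  diag(Q) <- -rowSums(Q)       # rows of an intensity matrix sum to zero
  ll <- 0
  for (id in unique(data$PTNUM)) {
    d <- data[data$PTNUM == id, ]
    for (i in seq_len(nrow(d) - 1)) {
      dt <- d$years[i + 1] - d$years[i]
      P <- MatrixExp(Q * dt)   # transition probability matrix over the interval
      ll <- ll + log(P[d$state[i], d$state[i + 1]])
    }
  }
  ll
}
# evaluate at some starting values; this objective is what you would then
# maximise with your own quasi-Newton implementation (or optim())
loglik_panel(rep(-2, sum(q == 1)), cav, q)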

Confused by ROC curves and cutoffs in R

I have some data on study participants, the percent change in a biomarker, and their ultimate outcome. I'd like to use an ROC curve to find the best cutoff value of the biomarker for predicting the outcome using the Youden method, but I get different answers from different packages and need to know where I'm going wrong.
To set up the dataset:
ID <- c(1:17)
PercentChange <- c(-85.5927051671732, -85.4849965108165,
-63.302752293578, -33.5509138381201, -75, -87.0988867059594,
-93.2523616734143, 65.2037617554859, -19.226393629124,
-44.7095435684647, -65.7342657342657, -43.7831467295227,
-37.0022539444027, 518.75, -77.1014492753623, 20.6572769953052,
-72.0742534301856)
Outcome <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1)
df <- data.frame(ID, PercentChange, Outcome)
For outcome, 1 is favorable and 0 is unfavorable.
Now with package pROC I did:
library(pROC)
roc <- roc(df$Outcome, df$PercentChange, auc= TRUE, plot = TRUE)
coords(roc, "b", ret="t", best.method="youden")
plot(roc, print.thres="best", print.thres.best.method="youden",main = "Percent change")
This gives me a reasonable curve and a cutoff (by the Youden index) of -44.246 that I verified has the correct sensitivity and specificity listed. The cutpoint seems a bit weird since it's halfway between two of the actual values rather than an actual value, but it works.
Then using OptimalCutpoints I tried
library(OptimalCutpoints)
optimal.cutpoint.Youden <- optimal.cutpoints(X = "PercentChange", status = "Outcome", tag.healthy = 1, methods = "Youden", data = df)
summary(optimal.cutpoint.Youden)
plot(optimal.cutpoint.Youden)
This gives me a different curve, and a different cutpoint of -43.783, which is one of the two points that pROC took the midpoint of. The sensitivity and specificity are also flipped from what I calculated using that cutpoint.
Lastly, I tried the ROC() function from the Epi package:
library(Epi)
ROC(form=Outcome~PercentChange, data=df, plot="ROC", PV=TRUE, MX=TRUE)
This gave me a third, completely different curve and says "24" at the cutpoint, which doesn't make any sense. Can anyone help me figure this out? I'm not asking on the stats Stack Exchange because they'd want to get into whether Youden is appropriate or not rather than the technical application of these functions.
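To see where the packages diverge, here is a hand-rolled check (my own sketch, not code from any of the three packages) that brute-forces the Youden index over the observed values. It assumes, as pROC appears to have decided here, that values at or below the cutoff predict the favorable outcome (1):
cutoffs <- sort(unique(df$PercentChange))
youden <- sapply(cutoffs, function(cut) {
  pred <- as.integer(df$PercentChange <= cut)   # predict 1 at or below the cutoff
  sens <- mean(pred[df$Outcome == 1] == 1)      # sensitivity
  spec <- mean(pred[df$Outcome == 0] == 0)      # specificity
  sens + spec - 1
})
data.frame(cutoff = cutoffs, youden = youden)[which.max(youden), ]
# the maximum lands at -44.71; pROC reports the midpoint between that value and
# the next observed one (-43.78), which is the -44.246 you saw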

SEM-Path analysis in R

I ran a path analysis in R, and the following matrix represents the effects between variables.
M <- c(0, 0, 0, 0, 0)
p<-c(0, 0, 0, 0, 0)
O <- c(0, 0, 0, 0, 0)
T <- c(1, 0, 1, 1, 0)
Sales <- c(1, 1, 1, 1, 0)
sales_path <- rbind(M, p, O, T, Sales)
colnames(sales_path) <- rownames(sales_path)
#innerplot(sales_pls)
sales_blocks <- list(
  c("m1", "m2"),
  #c("pr"),
  c("R1"),
  #c("C1"),
  c("tt1"),
  c("Sales")
)
sales_modes = rep("A", 5)
sales_pls <- plspm(input_file, sales_path, sales_blocks, scheme = "centroid", scaled = FALSE, modes = sales_modes)
I have 2 questions:
1. Can I use the weights I receive to calculate the value of a latent variable? E.g. my M variable has the manifest variables m1 and m2; is there a formula to calculate its value?
2. The main purpose of running the path analysis is to predict Sales. Is that possible using the estimates (betas) for each latent variable?
I want to know if I am able to calculate the value of the latent variable
Yes. It's actually very easy. Your latent variable scores can be obtained by running sales_pls$scores. You can also run summary(sales_pls). For easier interpretation you may wish to have the latent variables expressed in the same scale as the indicators (manifest variables). This can be accomplished by normalizing the outer weights in each block of indicators so that the weights are expressed as proportions. However, in order to apply this normalization all the outer weights must be positive.
You can apply the function rescale() to the object sales_pls and get scores expressed in the original scale of the manifest variables. Once you get the rescaled scores you can use summary() to verify that the obtained results make sense (i.e. scores expressed in original scale of indicators):
# rescaling scores
rescaled_sales_pls = rescale(sales_pls)
# summary
summary(rescaled_sales_pls)
(again, you can also just run rescaled_sales_pls to print it)
If I can predict Sales using the betas from the output
Theoretically I guess you could, but I'm not really sure why you would want to. The utility of path analysis here is to decompose the sources of a correlation between an independent variable and a dependent variable of a multiple regression model.
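If you did want to try it, here is a rough, hypothetical sketch of what such a prediction could look like. It assumes the fitted plspm object exposes $scores and $path_coefs (worth checking in your version), and it predicts the latent Sales score rather than observed sales:
# hypothetical sketch: reconstruct the inner-model prediction of Sales
scores <- as.matrix(sales_pls$scores)   # latent variable scores
b <- sales_pls$path_coefs["Sales", ]    # path coefficients pointing into Sales
sales_hat <- scores %*% b               # predicted latent Sales scores
cor(sales_hat, scores[, "Sales"])       # how well the inner model reproduces Sales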

Sage polynomial coefficients including zeros

If we have a multivariate polynomial in Sage, for instance
f = 3*x^2*y^2 + x*y + 3
how can I display the full list of coefficients, including the zero ones for the missing terms between the maximum-degree term and the constant?
P.<x,y> = PolynomialRing(ZZ, 2, order='lex')
f=3*x^2*y^2+x*y+3
f.coefficients()
gives me the list
[3, 1, 3]
but I'd like the "full" list to put into a matrix. In the above example it should be
[3, 0, 0, 1, 0, 0, 0, 0, 3]
corresponding to terms:
x^2*y^2, x^2*y, x*y^2, x*y, x^2, y^2, x, y, constant
Am I missing something?
Your desired output isn't quite well defined, because the monomials you listed are not in the lexicographic order (which you used in the first line of your code). Anyway, using a double loop you can arrange coefficients in any specific way you want. Here is a natural way to do this:
coeffs = []
for i in range(f.degree(x), -1, -1):
    for j in range(f.degree(y), -1, -1):
        coeffs.append(f.coefficient({x: i, y: j}))
Now coeffs is [3, 0, 0, 0, 1, 0, 0, 0, 3], corresponding to
x^2*y^2, x^2*y, x^2, x*y^2, x*y, x, y^2, y, constant
The built-in .coefficients() method is only useful if you also use .monomials() which provides a matching list of monomials that have those coefficients.
