Apologies for a very basic question. I'm trying to plot a basic ROC curve, but I can't get R to recognise the vector of y values.
fullmodel= glm(culture_positive ~ No_symptoms + sex + art_status_v1 +current_cd4 +
bmi_v1 +nurse_tb_diagnosis_crp_v1 + temperature_v1,
family="binomial", data= Data1)
roc(y , fullmodel$fitted.values, plot=TRUE)
Error in roc(y, fullmodel$fitted.values, plot = TRUE) :
object 'y' not found
So 'y' is a column in my dataset Data1 labelled 'culture_positive' as per the glm but whatever I try I keep getting this message that 'y' is not found.
Once again, apologies for a basic question but it is really holding me up.
Since y is not in your global environment, you need to tell R where to find it. You can either use the column you used to fit the model:
roc(Data1$culture_positive, fullmodel$fitted.values, plot=TRUE)
or the response stored in the glm object
roc(fullmodel$y , fullmodel$fitted.values, plot=TRUE)
I would recommend the second option. It is somewhat safer because y and fitted.values come from the same object, so they have the same length and order, even if glm dropped rows with missing values.
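To illustrate why taking both vectors from the fitted object is safer, here is a minimal sketch on simulated data. The data frame d and its columns are made up for illustration, and roc() is assumed to come from the pROC package, so that call is left commented out. When a predictor contains missing values, glm drops those rows by default, and the original column no longer lines up with the fitted values:

```r
set.seed(1)

# Hypothetical data: a binary outcome and one predictor with two missing values
d <- data.frame(outcome = rbinom(50, 1, 0.5), x = rnorm(50))
d$x[c(3, 10)] <- NA

fit <- glm(outcome ~ x, family = "binomial", data = d)

n_column <- length(d$outcome)          # all rows of the original column
n_fitted <- length(fit$fitted.values)  # only the rows used in the fit
n_stored <- length(fit$y)              # response as stored in the glm object

print(c(column = n_column, fitted = n_fitted, stored = n_stored))

# Assuming pROC is installed, the matched pair can be passed straight to roc():
# pROC::roc(fit$y, fit$fitted.values, plot = TRUE)
```

The stored response and the fitted values always have the same length, whereas the raw column only matches when no rows were dropped.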
I am checking a few of my Cox multivariate regression analyses' proportional hazard assumptions using time-dependent co-variates, using the survival package. The question is looking at survival in groups with different ADAMTS13 levels (a type of enzyme).
Could I check if something is wrong with my code itself? It keeps saying Error in tt(TMAdata$ADAMTS13level.f) : could not find function "tt" . Why?
Notably, ADAMTS13level.f is a factor variable.
cox_multivariate_survival_ADAMTS13 <- coxph(Surv(TMAdata$Daysalive, TMAdata$'Dead=1')
~TMAdata$ADAMTS13level.f
+TMAdata$`Age at diagnosis`
+TMAdata$CCIwithoutage
+TMAdata$Gender.f
+TMAdata$`Peak Creatinine`
+TMAdata$DICorcrit.f,
tt(TMAdata$ADAMTS13level.f),
tt = function(x, t, ...)
{mtrx <- model.matrix(~x)[,-1]
mtrx * log(t)})
Thanks. I would like to start by ruling out fundamental mistakes or typos in my code; I have tried different permutations to no avail so far.
@Limey was on the right track!
The time-transformed version of ADAMTS13level.f needs to be added to the model formula as a term, instead of being passed as a separate argument of coxph(...).
The form of coxph call when testing the time-dependent categorical variables is described in How to use the timeSplitter by Max Gordon.
Other helpful documentation:
coxph - fit proportional hazards regression model
cox_multivariate_survival_ADAMTS13 <-
  coxph(
    Surv(
      Daysalive,
      `Dead=1`  # backticks, not quotes: a quoted string is not a column name
    ) ~
      ADAMTS13level.f
      + `Age at diagnosis`
      + CCIwithoutage
      + Gender.f
      + `Peak Creatinine`
      + DICorcrit.f
      + tt(ADAMTS13level.f),
    tt = function(x, t, ...) {
      mtrx <- model.matrix(~x)[,-1]
      mtrx * log(t)
    },
    data = TMAdata
  )
P.S. With the original data there was also a problem because Daysalive included a zero (0) value, which eventually resulted in an 'infinite predictor' error from coxph, probably because the tt function transforms time with log(t), and log(0) is -Inf. (https://rdrr.io/github/therneau/survival/src/R/coxph.R)
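For reference, here is a small self-contained sketch of the same pattern (the tt() term inside the formula, the tt function as an argument) using the veteran data set that ships with the survival package. The continuous covariate karno and the log(t + 20) transform are illustrative choices, not taken from the question; shifting time before taking the log is one way to avoid the log(0) problem mentioned above.

```r
library(survival)

# tt(karno) appears as a term in the formula; the tt argument defines
# how the covariate is combined with time at each event time.
fit <- coxph(
  Surv(time, status) ~ trt + prior + karno + tt(karno),
  data = veteran,
  tt = function(x, t, ...) x * log(t + 20)
)
print(coef(fit))

# A follow-up time of zero breaks an unshifted log transform:
zero_log <- log(0)
print(zero_log)
```

A clearly non-zero coefficient for the tt() term suggests that the effect of the covariate changes over time, i.e. that proportional hazards does not hold for it.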
I want to fit a non-linear model to real data.
The real data consists of two known numerical vectors: thickness as 'x' and fh as 'y'.
thickness=seq(0.15,2.00,by=0.05)
fh = c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141, 1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202, 1.12918, 1.10355, 1.11867, 1.09740,1.08324, 1.05687, 1.19422, 1.22984, 1.34516, 1.19713,1.25398 ,1.29885, 1.33658, 1.31166, 1.40332, 1.39550,1.37855, 1.41491, 1.59549, 1.56027, 1.63925, 1.72440, 1.74192, 1.82049)
plot(thickness,fh)
The relationship is clearly non-linear, so I am trying to fit the following non-linear function:
y= x*2/3+(2+2*a)/(3*x)
The variable a is an unknown constant, and I am trying to find the value of a that minimizes the sum of squared errors between the regression curve and the real data.
I first used a function fitModel that I found on a YouTube video, Fitting Functions to Data in R.
library(TIMP)
f=fitModel(fh~thickness^2/3+(2+2*A)/(3*thickness)) #it finds the coefficient 'A'
coef(f) # to represent just the coefficient
However, there's an error
Error in modelspec[[datasetind[i]]] : subscript out of bounds
So, as an alternative, I want to plot the sum of squared errors against 'a'. However, I am having a hard time finding 'a' and plotting this graph. By manual work, I figured out that 'a' is somewhere near 0.2, but this is not a precise value.
It would be helpful if someone could explain either:
Why the fitModel function didn't work, or
How to find the value of a and plot the graph.
You could try this instead:
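# Model: predicted y for a given constant a and x values xv
yf = function(a,xv) xv*(2/3)+(2+2*a)/(3*xv)
yf(2,thickness)
# Objective: sum of squared errors between the observed y and the model
f <- function (a,y, xv) sum((y - yf(a,xv))^2)
f(2,fh,thickness)
# Minimise the objective over a in the interval [0, 10]
xmin <- optimize(f, c(0, 10), tol = 0.0001, y=fh,xv=thickness)
xmin
# Overlay the fitted curve on the data
plot(thickness,fh)
lines(thickness,yf(xmin$minimum,thickness),col=3)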
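The question also asked for a plot of the sum of squared errors against 'a'. A minimal base R sketch, assuming the model is exactly y = x*2/3 + (2+2*a)/(3*x) as written in the question (the grid range of -1 to 1 is an arbitrary choice). Because this model is linear in a, the minimiser also has a closed form, which serves as a check on optimize. As for fitModel: as far as I can tell, the function of that name in TIMP has a different interface from fitModel in the mosaic package, which may be the one the video used, but that is an assumption.

```r
thickness <- seq(0.15, 2.00, by = 0.05)
fh <- c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141,
        1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202,
        1.12918, 1.10355, 1.11867, 1.09740, 1.08324, 1.05687, 1.19422,
        1.22984, 1.34516, 1.19713, 1.25398, 1.29885, 1.33658, 1.31166,
        1.40332, 1.39550, 1.37855, 1.41491, 1.59549, 1.56027, 1.63925,
        1.72440, 1.74192, 1.82049)

yf  <- function(a, xv) xv * (2 / 3) + (2 + 2 * a) / (3 * xv)
sse <- function(a) sum((fh - yf(a, thickness))^2)

# Evaluate the sum of squared errors on a grid of candidate values for a
a_grid   <- seq(-1, 1, by = 0.01)
sse_grid <- vapply(a_grid, sse, numeric(1))

# Numerical minimiser over the same interval
opt <- optimize(sse, c(-1, 1))

plot(a_grid, sse_grid, type = "l", xlab = "a", ylab = "Sum of squared errors")
abline(v = opt$minimum, lty = 2)

# Closed-form least-squares solution: y - 2x/3 - 2/(3x) = a * 2/(3x)
w <- 2 / (3 * thickness)
z <- fh - thickness * (2 / 3) - 2 / (3 * thickness)
a_closed <- sum(w * z) / sum(w^2)
print(c(optimize = opt$minimum, closed_form = a_closed))
```

The dashed vertical line marks the value of a at the bottom of the parabola.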
So I'm trying to fit a cubic, natural, and smoothing spline to the Auto dataset from the ISLR package. I'm having some trouble and am getting some warning/error messages which makes me think there is something wrong with my data or a matrix that I created.
What is really confusing is how this basic command throws an error.
natural.splines.fit <- lm(horsepower ~ ns(mpg, knots = c(25, 50, 75)), data = Auto)
Error in qr.default(t(const)) : NA/NaN/Inf in foreign function call
(arg 1)
There are additional errors/warnings in my code, but here is the thing: I had essentially copied the code from somewhere and it worked when I ran it on the Carseats dataset. I then only changed the variables to match the Auto dataset, which is why this is confusing me. I don't understand why I get errors for the Auto dataset but not the Carseats dataset. Does anyone have some insight?
The problem is that you are defining knots outside the range of the predictor variable: mpg in Auto runs from roughly 9 to 47, so the knots at 50 and 75 fall beyond the data. Here is some basic code that will work (I just defined knots that are within the range of the variable mpg).
x <- ISLR::Auto
natural.splines.fit <- lm(horsepower ~ ns(mpg, knots = c(10,20,30,40)), data = x)
summary(natural.splines.fit)
I believe that you are trying to place the knots at the 25th, 50th, and 75th percentiles, so I recommend first getting the values of mpg at those percentiles and then fitting the model.
Here is how I did it:
target_quantiles <- unname(quantile(x$mpg, probs = c(0.25,0.5,0.75)))
natural.splines.fit2 <- lm(horsepower ~ ns(mpg, knots = target_quantiles), data = x)
summary(natural.splines.fit2)
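The same check can be reproduced without installing ISLR. This is a sketch on the built-in mtcars data, where mpg and hp stand in for the Auto variables: it tests which of the original knots fall strictly inside the observed range and then fits with quantile-based knots.

```r
library(splines)

# Observed range of the predictor
rng <- range(mtcars$mpg)

# Knots intended as percentiles but given as raw values
bad_knots <- c(25, 50, 75)
inside <- bad_knots > rng[1] & bad_knots < rng[2]
print(data.frame(knot = bad_knots, inside_range = inside))

# Knots placed at the 25th, 50th and 75th percentiles of the predictor
knots_q <- unname(quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75)))
fit <- lm(hp ~ ns(mpg, knots = knots_q), data = mtcars)
print(coef(fit))
```

Running a range check like this before calling ns() makes it obvious when a knot vector was meant as percentiles rather than as values of the predictor.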
I calculated a linear mixed model using the packages lme4 and lsmeans with the lmer-function, where I have one dependent variable rv and the interacting factors treatment, time, age, and race. I'm interested in the response variable change over time, that's why I use the lstrends-function. So far so good. The problem is, I have to square root the response variable in order to fit the model properly. But the pairs-function only gives out a response to the square root of the rv, hard to interpret!
So I tried to back-transform the response variable after pairs:
model.lmer <- lmer(sqrt(rv) ~ treat*time*age*race + (1|individual), data=mydata)
model.lst <- lstrends(model.lmer, ~treat | age*race , var = "time", type="response")
pairs(model.lst, type="response")
This obviously doesn't work, as stated by the package itself:
# Transformed response
sqwarp.rg <- ref.grid(update(warp.lm, sqrt(breaks) ~ .))
summary(sqwarp.rg)
# Back-transformed results - compare with summary of 'warp.rg'
summary(sqwarp.rg, type = "response")
# But differences of sqrts can't be back-transformed
summary(pairs(sqwarp.rg, by = "wool"), type = "response")
# We can do it via regrid
sqwarp.rg2 <- regrid(sqwarp.rg)
summary(sqwarp.rg2) # same as for sqwarp.rg with type = "response"
pairs(sqwarp.rg2, by = "wool")
It could look like the following code:
summary(pairs(lsmeans(rg.regrid, ~ treat | race*age, trend="time")), type="response")
The problem is, I can't alter the reference grid for lstrends, only for lsmeans, because the first argument of lstrends (or of lsmeans with trend="time") has to be the linear mixed effects model (model.lmer) instead of just the reference grid, which lsmeans accepts without the trend argument... That's probably why I can't back-transform the data with
This here sums up my problem pretty well:
model.sqrt <- lmer(sqrt(rv) ~ time*treat*race*age, data=mydata)
rg <- ref.grid(model.sqrt)
rg.regrid <- regrid(rg)
summary(pairs(lsmeans(rg.regrid, ~treat | race*age*time), type = "response"))
Works perfectly.
summary(pairs(lsmeans(rg.regrid, ~treat | race*age, trend="time"), type = "response"))
Gives the following error:
Error in summary(pairs(lsmeans(rg.regrid, ~vns | gen * age, trend = "time"), :
error in evaluating the argument 'object' in selecting a method for function 'summary': Error in data[[var]] : subscript out of bounds
How to avoid the error and still be able to back-transform my data?
It does not seem to be possible at all: the back-transformation would be a complicated procedure without any obvious pattern. That's what the creator of the package said.
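A small arithmetic sketch of why a difference on the square-root scale has no unique back-transformation (the numbers are made up for illustration): two pairs of means can have the same difference of square roots while differing by different amounts on the response scale.

```r
# Two hypothetical pairs of means on the response scale
pair1 <- c(16, 9)
pair2 <- c(25, 16)

# Differences on the square-root scale are identical ...
d1 <- sqrt(pair1[1]) - sqrt(pair1[2])
d2 <- sqrt(pair2[1]) - sqrt(pair2[2])

# ... but the differences on the response scale are not
r1 <- pair1[1] - pair1[2]
r2 <- pair2[1] - pair2[2]

print(c(sqrt_diff_1 = d1, sqrt_diff_2 = d2, resp_diff_1 = r1, resp_diff_2 = r2))
```

So squaring a contrast of square roots does not recover the contrast of the original means; the answer depends on where on the scale the means sit.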
So I have a data set called x. The contents are simple enough to just write out so I'll just outline it here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: How do I create a fifth column on the right side of this data set x that will include the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert the glm$fitted as a column, but I'd like to try to write a code that predicts it based on whatever is in the race, sex, gender columns for a more generalized use.
Right now I have the following code, which I hope will create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appear to be a few typos in your code. First, the predict call for xnew uses glm5, but your model, as far as I can see, is named glm (by the way, using glm as the name of your model object is probably not a good idea, since it is also the name of the function). Second, make sure the variable race.f is actually in the dataset you want to predict from (xnew). My guess is that R can't find that variable, hence the error.
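Putting those two points together, here is a minimal sketch on simulated data (the data frame, its values and the object name fit are made up; only the column names race.f and sex.f follow the question). The data passed as newdata contains every predictor used in the formula, and the model object has a name that does not clash with glm():

```r
set.seed(42)

# Hypothetical data with factor predictors named as in the question
x <- data.frame(
  race.f = factor(sample(1:3, 200, replace = TRUE)),
  sex.f  = factor(sample(1:2, 200, replace = TRUE))
)
x$Report <- rbinom(200, 1, plogis(-0.5 + 0.6 * (x$sex.f == "2")))

fit <- glm(Report ~ race.f + sex.f, data = x,
           family = binomial(link = "logit"))

# Predict on the link scale so the interval can be built before transforming
pred <- predict(fit, newdata = x, type = "link", se.fit = TRUE)

x$PredictedProb <- plogis(pred$fit)
x$LL <- plogis(pred$fit - 1.96 * pred$se.fit)
x$UL <- plogis(pred$fit + 1.96 * pred$se.fit)

head(x)
```

Any other data frame can be passed as newdata as long as it has the same column names and factor levels as the data used to fit the model.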