Multiple non linear regression in R program - r

I am trying to use a logistic model of the form
Y = exp(ao + a1fi1....)/(1 + exp(a0 + a1fi1 ....)
for multiple non linear regression in R, The dependent variable Y is a row consisting of about 500 values and there are 33 independent variables X1, X2, X3.....X33
I am reading my data from an EXCEL file:
data1<-read.csv(file.choose(), header=TRUE)
which populates R with my data. I performed linear regression with the lm() function using input:
results<- lm(Y~ X1 + X2....X33, data = data1)
which worked perfectly fine and now I am trying to use the self starting logistic function of the form:
nls(Y ~ SSlogis(x, Asym, xmid, scal), data1)
for non linear regression; however, I do not seem to be applying the function properly. Thus my question is how would I use this function to perform multiple non linear regression analysis for my dataset?? Thank you for any help you can provide.

You simply choose the type of the model when doing regression. The codes following should help. (I used a online dataset for example)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model <- glm(admit ~ .,
family=binomial(link='logit'),
data= my data)
Then you can use following code to get more info about your model
fit
fit$resample
fit$results
fit$finalModel

Related

Including an offset when using cph {rms} for validation of a Cox model

I am externally validating and updating a Cox model in R. The model predicts 5 year risk. I don't have access to the original data, just the equation for the linear predictor and the value of the baseline survival probability at 5 years.
I have assessed calibration and discrimination of the model in my dataset and found that the model needs to be updated.
I want to update the model by adjusting baseline risk only, so I have been using a Cox model with the linear predictor ("beta.sum") included as an offset term, to restrict its coefficient to be 1.
I want to be able to use cph instead of coxph as it makes internal validation by bootstrapping much easier. However, when including the linear predictor as an offset I get the error:
"Error in exp(object$linear.predictors) :
non-numeric argument to mathematical function"
Is there something I am doing incorrectly, or does the cph function not allow an offset within the formula? If so, is there another way to restrict the coefficient to 1?
My code is below:
load(file="k.Rdata")
### Predicted risk ###
# linear predictor (LP)
k$beta.sum <- -0.2201 * ((k$age/10)-7.036) + 0.2467 * (k$male - 0.5642) - 0.5567 * ((k$epi/5)-7.222) +
0.4510 * (log(k$acr_mgmmol/0.113)-5.137)
k$pred <- 1 - 0.9365^exp(k$beta.sum)
# Recalibrated model
# Using coxph:
cox.new <- coxph(Surv(time, rrt) ~ offset(beta.sum), data = k, x=TRUE, y=TRUE)
# new baseline survival at 5 years
library(pec)
predictSurvProb(cox.new, newdata=data.frame(beta.sum=0), times = 5) #baseline = 0.9570
# Using cph
cph.new <- cph(Surv(time, rrt) ~ offset(beta.sum), data=k, x=TRUE, y=TRUE, surv=TRUE)
The model will run without surv=TRUE included, but this means a lot of the commands I want to use cannot work, such as calibrate, validate and predictSurvProb.
EDIT:
I will include a way to reproduce the error
library(purr)
library(rms)
n <- 1000
set.seed(1234)
status <- as.numeric(rbernoulli(n, p=0.1))
time <- -5* log(runif(n))
lp <- rnorm(1000, mean=-2.7, sd=1)
mydata <- data.frame(status, time, lp)
test <- cph(Surv(time, status) ~ offset(lp), data=mydata, surv=TRUE)

Is there a way to both include PCSE and Prais-Winsten correction in a fixed effects model in R (similar to the xtpcse function in Stata)?

I want to estimate a fixed effects model while using panel-corrected standard errors as well as Prais-Winsten (AR1) transformation in order to solve panel heteroscedasticity, contemporaneous spatial correlation and autocorrelation.
I have time-series cross-section data and want to perform regression analysis. I was able to estimate a fixed effects model, panel corrected standard errors and Prais-winsten estimates individually. And I was able to include panel corrected standard errors in a fixed effects model. But I want them all at once.
# Basic ols model
ols1 <- lm(y ~ x1 + x2, data = data)
summary(ols1)
# Fixed effects model
library('plm')
plm1 <- plm(y ~ x1 + x2, data = data, model = 'within')
summary(plm1)
# Panel Corrected Standard Errors
library(pcse)
lm.pcse1 <- pcse(ols1, groupN = Country, groupT = Time)
summary(lm.pcse1)
# Prais-Winsten estimates
library(prais)
prais1 <- prais_winsten(y ~ x1 + x2, data = data)
summary(prais1)
# Combination of Fixed effects and Panel Corrected Standard Errors
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
pcse.fe <- pcse(ols.fe, groupN = Country, groupT = Time)
summary(pcse.fe)
In the Stata command: xtpcse it is possible to include both panel corrected standard errors and Prais-Winsten corrected estimates, with something allong the following code:
xtpcse y x x x i.cc, c(ar1)
I would like to achieve this in R as well.
I am not sure that my answer will completely address your concern, these days I've been trying to deal with the same problem that you mention.
In my case, I ran the Prais-Winsten function from the package prais where I included my model with the fixed effects. Afterwards, I correct for heteroskedasticity using the function vcovHC.prais which is analogous to vcovHC function from the package sandwich.
This basically will give you White's/sandwich heteroskedasticity-consistent covariance matrix which, if you later fit into the function coeftest from the package lmtest, it will give you the table output with the corrected standard errors. Taking your posted example, see below the code that I have used:
# Prais-Winsten estimates with Fixed Effects
library(prais)
prais.fe <- prais_winsten(y ~ x1 + x2 + factor(Country), data = data)
library(lmtest)
prais.fe.w <- coeftest(prais.fe, vcov = vcovHC.prais(prais.fe, "HC1")
h.m1 # run the object to see the output with the corrected standard errors.
Alas, I am aware that the sandwhich heteroskedasticity-consistent standard errors are not exactly the same as the Beck and Katz's PCSEs because PCSE deals with panel heteroskedasticity while sandwhich SEs addresses overall heteroskedasticity. I am not totally sure in how much these two differ in practice, but something is something.
I hope my answer was somehow helpful, this is actually my very first answer :D

Validating a model and introducing a new predictor in glm

I am hitting my head against the computer...
I have a prediction model in R that goes like this
m.final.glm <- glm(binary_outcome ~ rcs(PredictorA, parms=kn.a) + rcs(PredictorB, parms=kn.b) + PredictorC , family = "binomial", data = train_data)
I want to validate this model on test_data2 - first by updating the linear predictor (lp)
train_data$lp <- predict(m.final.glm, train_data)
test_data2$lp <- predict(m.final.glm, test_data2)
lp2 <- predict(m.final.glm, test_data2)
m.update2.lp <- glm(binary_outcome ~ 1, family="binomial", offset=lp2, data=test_data2)
m.update2.lp$coefficients[1]
m.final.update2.lp <- m.final.glm
m.final.update2.lp$coefficients[1] <- m.final.update2.lp$coefficients[1] + m.update2.lp$coefficients[1]
m.final.update2.lp$coefficients[1]
p2.update.lp <- predict(m.final.update2.lp, test_data2, type="response")
This gets me to the point where I have updated the linear predictor, i.e. in the summary of the model only the intercept is different, but the coefficients of each predictor are the same.
Next, I want to include a new predictor (it is categorical, if that matters), PredictorD, into the updated model. This means that the model has to have the updated linear predictor and the same coefficients for Predictors A, B and C but the model also has to contain Predictor D and estimate its significance.
How do I do this? I will be very grateful if you could help me with this. Thanks!!!

Replacing intercept with dummy variables in ARIMAX models in R

I am attempting to fit an ARIMAX model to daily consumption data in R. When I perform an OLS regression with lm() I am able to include a dummy variable for each unit and remove the constant term (intercept) to avoid less then full rank matrices.
lm1 <- lm(y ~ -1 + x1 + x2 + x3, data = dat)
I have not found a way to do this with arima() which forces me to use the constant term and exclude one of the dummy variables.
with(dat, arima(y, xreg = cbind(x1, x2))
Is there a specific reason why arima() doesn't allow this and is there a way to bypass?
See the documentation for the argument include.mean in ?arima, it seems you want the following: arima(y, xreg = cbind(x1, x2), include.mean=FALSE).
Be also aware of the definition of the model fitted by ARIMA as pointed by #RichardHardy.

non linear regression 'abline'

Still quite new to R (and statistics to be honest) and I have currently only used it for simple linear regression models. But now one of my data sets clearly shows a inverted U pattern. I think I have to do a quadratic regression analysis on this data, but I'm not sure how. What I tried so far is:
independentvar2 <- independentvar^2
regression <- lm(dependentvar ~ independentvar + independentvar2)
summary (regression)
plot (independentvar, dependentvar)
abline (regression)
While this would work for a normal linear regression, it doesn't work for non-linear regressions. Can I even use the lm function since I thought that meant linear model?
Thanks
Bert
This example is from this SO post by #Tom Liptrot.
plot(speed ~ dist, data = cars)
fit1 = lm(speed ~ dist, cars) #fits a linear model
plot(speed ~ dist, data = cars)
abline(fit1) #puts line on plot
fit2 = lm(speed ~ I(dist^2) + dist, cars) #fits a model with a quadratic term
fit2line = predict(fit2, data.frame(dist = -10:130))

Resources