Hill Equation Curve Fitting NLS - r

I am trying to calculate the rate for the following data. I tried the Michaelis-Menten equation, but Km came out negative. I am now trying to fit the Hill equation, but I am getting an error message. I think my starting values are not very good. Any help would be much appreciated.
Thanks,
Krina
x <- c(0.0, 2.5, 5.0, 10.0, 25.0)
y <- c(4.91, 1.32, 1.18, 1.12, 1.09)
fo <- y ~ (Emax * x^hill) / (EC50^hill + x^hill)
st <- c(Emax = 1.06, EC50 = 0.5, hill = 1)
fit <- nls(fo, data = data.frame(x, y), start = st, trace = TRUE)
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
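One likely cause of the error: x contains 0, so whenever the optimizer tries a negative hill, 0^hill evaluates to Inf. Also, y decreases as x increases, which the increasing Hill form above cannot describe. A hedged sketch of a decreasing (inhibition-style) four-parameter alternative, with starting values read off the data; with only five points and four parameters this may still be fragile:
# Hedged sketch: decreasing Hill (inhibition) form, starting values taken from the data
fo.inh  <- y ~ bottom + (top - bottom) / (1 + (x / IC50)^hill)
st.inh  <- c(top = max(y), bottom = min(y), IC50 = 1, hill = 1)
fit.inh <- try(nls(fo.inh, data = data.frame(x, y), start = st.inh, trace = TRUE))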

I fit the data you posted to a few hundred known, named equations using a genetic algorithm for initial parameter estimation and found an excellent fit to a simple power law equation as follows (also see attached graph):
y = (a + x)^b + Offset
a = 3.6792869983309306E-01
b = -1.3439157691306818E+00
Offset = 1.0766655470363218E+00
Degrees of freedom (error): 2
Degrees of freedom (regression): 2
Chi-squared: 1.98157151386e-06
R-squared: 0.999999822702
R-squared adjusted: 0.999999645405
Model F-statistic: 5640229.45337
Model F-statistic p-value: 1.77297720061e-07
Model log-likelihood: 29.7579529506
AIC: -10.7031811802
BIC: -10.9375184328
Root Mean Squared Error (RMSE): 0.000629534989315
a = 3.6792869983309306E-01
std err: 2.36769E-06
t-stat: 2.39112E+02
p-stat: 1.74898E-05
95% confidence intervals: [3.61308E-01, 3.74549E-01]
b = -1.3439157691306818E+00
std err: 2.91468E-05
t-stat: -2.48929E+02
p-stat: 1.61375E-05
95% confidence intervals: [-1.36714E+00, -1.32069E+00]
Offset = 1.0766655470363218E+00
std err: 9.37265E-07
t-stat: 1.11211E+03
p-stat: 8.08538E-07
95% confidence intervals: [1.07250E+00, 1.08083E+00]
Coefficient Covariance Matrix
[ 2.38970842 -8.3732707 1.30483649]
[ -8.3732707 29.41789844 -4.52058247]
[ 1.30483649 -4.52058247 0.94598199]
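If you want to reproduce this fit directly in R rather than with external software, a hedged sketch using nls, seeding it with the parameter values quoted above (assuming the same x and y as in the question):
# Hedged sketch: refit the reported power-law form, starting at the published values
fit.pow <- nls(y ~ (a + x)^b + Offset,
               data  = data.frame(x, y),
               start = list(a = 0.368, b = -1.344, Offset = 1.077))
summary(fit.pow)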

I was able to get a good fit using the log-logistic model in the drc library. However, I am not able to find the parameter definitions for this model. Is it similar to the Hill model with a log transformation?
library(drc)
fit.ll <- drm(y~x, data=data.frame(x,y), fct=LL.5(), type="continuous")
print(summary(fit.ll))
plot(fit.ll)
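As far as I understand drc's parameterization (worth double-checking against the drc documentation), the four-parameter log-logistic LL.4 is f(x) = c + (d - c) / (1 + exp(b * (log(x) - log(e)))), which for x > 0 equals c + (d - c) / (1 + (x/e)^b). So e plays the role of EC50, |b| the Hill coefficient (drc's sign convention differs from the textbook increasing Hill form), and c and d are the lower and upper asymptotes. LL.5 adds a fifth asymmetry parameter, so it is no longer a plain Hill curve. A hedged sketch with named parameters:
# Hedged sketch: LL.4 with names chosen to map onto the Hill form
library(drc)
fit.hill <- drm(y ~ x, data = data.frame(x, y),
                fct = LL.4(names = c("hill", "bottom", "top", "EC50")))
coef(fit.hill)  # compare |hill| to the Hill coefficient, EC50 to the midpoint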

Related

How to extract the residual standard error from rolling regressions

I'm trying to get the residual standard deviation (residual standard error in R summaries) for rolling regressions. I'm trying to do rolling regressions on 20 days of stock returns over a total of 4000 days. I can do rolling regressions, and I can get the residual standard deviation from a regular lm regression, but not for the rolling regression.
My data is similar to the following, where the data frame has the returns of multiple stocks and the vector is an index return:
data <- as.data.frame(matrix(rexp(20000, rate = .1), ncol = 20))
vector <- rexp(1000, rate = 0.1)
I can produce a sigma for an lm regression: sigma(lm(data$V1 ~ vector))
I can produce a rolling regression with library(roll) and roll_lm(vector, data$V1, width = 20), and with library(rollRegres) and roll_regres(data$V1 ~ vector, width = 20).
Is there a way to get the residual standard error / residual standard deviation / sigma from such rolling regressions?
I would like to end up with a data frame containing only the residual standard deviations.
Thanks!
If you read the code for summary.lm, the residual standard error is the square root of the residual sum of squares (rss) divided by the residual degrees of freedom (rdf). Since roll_lm doesn't retain these quantities, you need to use the fitted coefficients to reconstruct the predictions and compute it yourself:
data <- as.data.frame(matrix(rexp(20000, rate = .1), ncol = 20))
vector <- rexp(1000, rate = 0.1)
library(roll)
WI <- 20
rlm <- roll_lm(vector, data$V1, width = WI)
rdf <- WI - ncol(rlm$coefficients)
Below we go through every window, get the prediction and calculate rss and from there get the sigma:
sigma <- sapply(1:(nrow(data) - WI + 1), function(i) {
  # basically intercept + predictor * coef
  pred <- cbind(rep(1, WI), vector[i:(i + WI - 1)]) %*% rlm$coefficients[WI + i - 1, ]
  rss  <- sum((data$V1[i:(i + WI - 1)] - pred)^2)
  sqrt(rss / rdf)
})
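As a quick sanity check (assuming the setup above), the first rolling window covers rows 1:WI, so it should reproduce a plain lm fit on the same rows:
# These two values should agree
sigma(lm(data$V1[1:WI] ~ vector[1:WI]))
sigma[1]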
We can wrap this up in a function, that takes as input an x,y:
roll_w_sigm <- function(x, y, WI = 20) {
  rlm <- roll_lm(x = x, y = y, width = WI)
  rdf <- WI - ncol(rlm$coefficients)
  rlm$sigma <- sapply(1:(length(y) - WI + 1), function(i) {
    pred <- cbind(rep(1, WI), x[i:(i + WI - 1)]) %*% rlm$coefficients[WI + i - 1, ]
    rss  <- sum((y[i:(i + WI - 1)] - pred)^2)
    sqrt(rss / rdf)
  })
  rlm
}
For 1 column:
res = roll_w_sigm(vector,data$V1)
head(res$sigma)
[1] 9.102188 9.297425 9.324338 9.509460 7.849201 7.993087
For all columns:
lapply(data, function(i) roll_w_sigm(vector, i))
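If you only want the residual standard deviations themselves, one way (using the helper above, assuming all columns share the same window width) is to keep just the sigma component:
# One column of rolling sigmas per stock, one row per complete window
sigma_df <- as.data.frame(lapply(data, function(col) roll_w_sigm(vector, col)$sigma))
head(sigma_df)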

How to validate performance of generalized linear regression model

I'm trying to validate the performance of a generalized linear model that has a continuous output. Through research I found that the most common way of validating the performance of a continuous model is to use R-squared, adjusted R-squared and RMSE (correct me if I'm wrong), rather than the confusion-matrix metrics (accuracy, precision, F1, etc.) used for binomial models.
How do I find the R-squared value for my model, based on the actual vs. predicted values? Below is the code for my glm model; the data has been split into train and test sets.
Quite new to this, so open to suggestions.
# GENERALISED LINEAR MODEL
LR_swim <- glm(racetime_mins ~ event_month + gender + place +
                 clocktime_mins + handicap_mins +
                 Wind_Speed_knots +
                 Air_Temp_Celsius + Water_Temp_Celsius + Wave_Height_m,
               data = SwimmingTrain,
               family = gaussian(link = "identity"))
summary(LR_swim)
#Predict Race_Time
pred_LR <- predict(LR_swim, SwimmingTest, type ="response")
pred_LR
Such performance measures can be implemented with a simple line of R code. So, for some dummy data:
preds <- c(1.0, 2.0, 9.5)
actuals <- c(0.9, 2.1, 10.0)
the mean squared error (MSE) is simply
mean((preds-actuals)^2)
# [1] 0.09
while the mean absolute error (MAE), is
mean(abs(preds-actuals))
# [1] 0.2333333
and the root mean squared error (RMSE) is simply the square root of the MSE, i.e.:
sqrt(mean((preds-actuals)^2))
# [1] 0.3
The last two measures have the additional advantage of being on the same scale as your original data (which is not the case for MSE).
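Applied to the model in the question (a hedged sketch, assuming SwimmingTest contains the observed racetime_mins and pred_LR is the prediction vector computed above):
# Test-set RMSE and R-squared from actuals vs. predictions
actuals <- SwimmingTest$racetime_mins
rmse    <- sqrt(mean((pred_LR - actuals)^2))
rsq     <- 1 - sum((actuals - pred_LR)^2) / sum((actuals - mean(actuals))^2)
c(RMSE = rmse, R2 = rsq)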

Fitting survival density curves using different distributions

I am working with some log-normally distributed data, and naturally I want to demonstrate that the log-normal distribution fits it better than other candidate distributions. Essentially, I want to replicate the following graph with my data:
where the fitted density curves are juxtaposed over log(time).
The text where the linked image is from describes the process as fitting each model and obtaining the following parameters:
For that purpose, I fitted four naive survival models with the above-mentioned distributions:
survreg(Surv(time,event)~1,dist="family")
and extracted the shape parameter (α) and the coefficient (β).
I have several questions regarding the process:
1) Is this the right way of going about it? I have looked into several R packages but couldn't locate one that plots density curves as a built-in function, so I feel like I must be overlooking something obvious.
2) Are the values corresponding to the log-normal distribution (μ and σ²) just the mean and the variance of the intercept?
3) How can I create a similar table in R? (Maybe this is more of a Stack Overflow question.) I know I can just cbind them manually, but I am more interested in extracting them from the fitted models. survreg objects store the coefficient estimates, but calling survreg.obj$coefficients returns a named numeric vector (instead of just a number).
4) Most importantly, how can I plot a similar graph? I thought it would be fairly simple if I just extracted the parameters and plotted them over the histogram, but so far no luck. The author of the text says he estimated the density curves from the parameters, but I just get a point estimate - what am I missing? Should I calculate the density curves manually, based on the distribution, before plotting?
I am not sure how to provide a minimal working example in this case, but honestly I just need a general solution for adding multiple density curves to survival data. On the other hand, if you think it will help, feel free to suggest what such an example should contain and I will try to produce one.
Thanks for your input!
Edit: Based on eclark's post, I have made some progress. My parameters are:
Dist <- data.frame(
  Exponential = rweibull(n = 10000, shape = 1, scale = 6.636684),
  Weibull     = rweibull(n = 10000, shape = 6.068786, scale = 2.002165),
  Gamma       = rgamma(n = 10000, shape = 768.1476, scale = 1433.986),
  LogNormal   = rlnorm(n = 10000, meanlog = 4.986, sdlog = .877)
)
However, given the massive difference in scales, this is what I get:
Going back to question number 3, is this how I should get the parameters?
Currently this is how I do it (sorry for the mess):
summary(fit.exp)
Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,
dist = "exponential")
Value Std. Error z p
(Intercept) 6.64 0.052 128 0
Scale fixed at 1
Exponential distribution
Loglik(model)= -2825.6 Loglik(intercept only)= -2825.6
Number of Newton-Raphson Iterations: 6
n= 397
summary(fit.wei)
Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,
dist = "weibull")
Value Std. Error z p
(Intercept) 6.069 0.1075 56.5 0.00e+00
Log(scale) 0.694 0.0411 16.9 6.99e-64
Scale= 2
Weibull distribution
Loglik(model)= -2622.2 Loglik(intercept only)= -2622.2
Number of Newton-Raphson Iterations: 6
n= 397
summary(fit.gau)
Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,
dist = "gaussian")
Value Std. Error z p
(Intercept) 768.15 72.6174 10.6 3.77e-26
Log(scale) 7.27 0.0372 195.4 0.00e+00
Scale= 1434
Gaussian distribution
Loglik(model)= -3243.7 Loglik(intercept only)= -3243.7
Number of Newton-Raphson Iterations: 4
n= 397
summary(fit.log)
Call:
survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,
dist = "lognormal")
Value Std. Error z p
(Intercept) 4.986 0.1216 41.0 0.00e+00
Log(scale) 0.877 0.0373 23.5 1.71e-122
Scale= 2.4
Log Normal distribution
Loglik(model)= -2624 Loglik(intercept only)= -2624
Number of Newton-Raphson Iterations: 5
n= 397
I feel like I am particularly messing up the log-normal, given that it is parameterized by a mean and variance rather than the usual shape and scale.
Try this; the idea is to generate random variables using the random distribution functions and then plot their density functions alongside the sample data. Here is an example like you need:
require(ggplot2)
require(dplyr)
require(tidyr)
# Assume this is data we have sampled from a lognormal distribution
SampleData <- data.frame(Duration = rlnorm(n = 184, meanlog = 2.859, sdlog = .246))
# Then we estimate the parameters of the candidate distributions for that sample data
# and come up with these parameters.
# We then generate a data frame with those distributions and parameters
Dist <- data.frame(
  Weibull   = rweibull(10000, shape = 1.995, scale = 22.386),
  Gamma     = rgamma(n = 10000, shape = 4.203, scale = 4.699),
  LogNormal = rlnorm(n = 10000, meanlog = 2.859, sdlog = .246)
)
# We use gather to reshape the distribution data into a form better suited
# for grouped plotting in ggplot2
Dist <- Dist %>% gather(Distribution, Duration)
# Plot the sample data as a histogram
G1 <- ggplot(SampleData, aes(x = Duration)) +
  geom_histogram(aes(y = ..density..), binwidth = 5, colour = "black", fill = "white")
# Add the density curves of the different distributions with the estimated parameters
G2 <- G1 + geom_density(aes(x = Duration, color = Distribution), data = Dist)
plot(G2)
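On the parameter question: as far as I know (worth checking against ?survreg), survreg uses a log-linear accelerated failure time parameterization, so the intercept and scale from the summaries above translate into the usual distribution parameters roughly as follows (a hedged sketch using the fits from the question):
# Translating survreg() output into the usual parameterizations
exp.rate   <- exp(-coef(fit.exp))   # exponential: rate = 1 / exp(Intercept)
wei.shape  <- 1 / fit.wei$scale     # Weibull shape
wei.scale  <- exp(coef(fit.wei))    # Weibull scale
ln.meanlog <- coef(fit.log)         # log-normal meanlog (the Intercept)
ln.sdlog   <- fit.log$scale         # log-normal sdlog (the reported Scale)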

Finding the distribution from the data

I am trying to find the distribution for this dataset. I tried the fitdistrplus package:
data <- data.matrix(Book1)
descdist(data, discrete = FALSE)
but get this error:
Error in descdist(data, discrete = FALSE) : data must be a numeric vector
You can use instead
data <- as.numeric(Book1)
descdist(data, discrete = FALSE)
This gets you this graph:
And these values:
summary statistics
------
min: 3 max: 35
median: 5
mean: 6.244898
estimated sd: 3.517
estimated skewness: 1.977063
estimated kurtosis: 9.456783
If you then decide that the closest match is an exponential distribution, you can get its parameters like this:
ft <- fitdist(data, distr = "exp" )
ft
Fitting of the distribution ' exp ' by maximum likelihood
Parameters:
estimate Std. Error
rate 0.1601307 0.002299016
And you can compare the fitted density to the data using this function:
denscomp(ft)
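If you want to compare several candidates side by side rather than committing to the exponential right away, a hedged sketch (denscomp and gofstat both accept a list of fitdist objects; this assumes the numeric vector data from above):
# Fit and compare a few candidate distributions
ft.exp   <- fitdist(data, "exp")
ft.gamma <- fitdist(data, "gamma")
ft.lnorm <- fitdist(data, "lnorm")
denscomp(list(ft.exp, ft.gamma, ft.lnorm), legendtext = c("exp", "gamma", "lnorm"))
gofstat(list(ft.exp, ft.gamma, ft.lnorm))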

How to determine the initial points of the maximum likelihood method

I'm currently working on distribution fitting. I used the fitdistr function, but I am having problems determining the initial points for the MLE. For example, I want to fit my data (rainfall, a 13149 x 1 matrix) with a gamma distribution.
fit.gamma = fitdistr(rainfall,dgamma,start=list(shape = ?, scale = ?),method="Nelder-Mead")
The fitdistrplus package is very good for this. It will guess gamma parameters for you if you don't provide starting values. Also, you can use the method of moments if the default guesses fail (see the sketch after the example below).
x <- rgamma(100, 0.5, 0.5)
library(fitdistrplus)
(pars <- fitdist(x, "gamma"))
# Fitting of the distribution ' gamma ' by maximum likelihood
# Parameters:
# estimate Std. Error
# shape 0.4443304 0.05131369
# rate 0.5622472 0.10644511
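A hedged sketch of the method-of-moments route: use fitdist with method = "mme" directly, or reuse the moment estimates as starting values for MLE (same x as above):
# Method-of-moments estimates, then reused as MLE starting values
pars.mme <- fitdist(x, "gamma", method = "mme")
pars.mme$estimate
fitdist(x, "gamma", method = "mle", start = as.list(pars.mme$estimate))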
