How to feed data into ode while doing optimisation - r

I'm new to R. I found some very useful code and tried to adapt it for my own purposes. However, I get two errors:
Error in func(time, state, parms, ...) : object 'k4' not found
Error in func(time, state, parms, ...) : object 'E' not found
I don't know where the problem is, as I can see all the parameters and the data.frame is correct as well.
Thank you everyone for taking the time to look at this. I've tried reducing the number of fitted parameters to 3 (k10, k11, k12) and using estimated values for the rest (embedded in the code). However, I still get an error message: the E value from the data.frame is not passed into the rxnrate function, and as a result ode can't use it. I've tried using events and forcing functions, but that doesn't seem to work. Thank you for spotting P4; it was a typo and should be P. I've already corrected it.
Editor's note: This was cross-posted to R-help, and that message gave the source of this code as the Stack Overflow question "r-parameter and initial conditions fitting ODE models with nls.lm."
#set working directory
setwd("~/R/wkspace")
#load libraries
library(ggplot2)
library(reshape2)
library(deSolve)
library(minpack.lm)
time=c(22,23,24,46,47,48)
cE=c(15.92,24.01,25.29,15.92,24.01,25.29)
cP=c(0.3,0.14,0.29,0.3,0.14,0.29)
cL=c(6.13,3.91,38.4,6.13,3.91,38.4)
df<-data.frame(time,cE,cP,cL)
df
names(df)=c("time","cE","cP","cL")
#rate function
rxnrate=function(t,c,parms){
#rate constants are passed through a list called parms
k1=parms$k1
k2=parms$k2
k3=parms$k3
k4=parms$k4
k5=parms$k5
k6=parms$k6
k7=parms$k7
k8=parms$k8
k9=parms$k9
k10=parms$k10
#c is the concentration of species
#derivatives dc/dt are computed below
r=rep(0,length(c))
r[1]=(k1+(k2*E^k10)/(k3^k10+E^k10))/(1+P/k6)-k4* ((1+k5*P)/(1+k7*E))*c["pLH"]; #dRP_LH/dt
r[2]=(1/k8)*k4*((1+k5*P)/(1+k7*E))*c["p"]-k9*c["L"] #dL/dt
return(list(r))
}
ssq=function(myparms){
#initial concentration
cinit=c(pLH=unname(myparms[11]),LH=unname(myparms[12]))
print(cinit)
#time points for which conc is reported
#include the points where data is available
t=c(seq(0,46,2),df$time)
t=sort(unique(t))
#parameters from the parameters estimation
k1=myparms[1]
k2=myparms[2]
k3=myparms[3]
k4=myparms[4]
k5=myparms[5]
k6=myparms[6]
k7=myparms[7]
k8=myparms[8]
k9=myparms[9]
k10=myparms[10]
#solve ODE for a given set of parameters
out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=k1,k2=k2,k3=k3,k4=k4,k5=k5,k6=k6,k7=k7,k8=k8,k9=k9,k10=k10,E=cE,P=cP))
#Filter data that contains time points
outdf=data.frame(out)
outdf=outdf[outdf$time%in% df$time,]
#Evaluate predicted vs experimental residual
preddf=melt(outdf,id.var="time",variable.name="species",value.name="conc")
expdf=melt(df,id.var="time",variable.name="species",value.name="conc")
ssqres=preddf$conc-expdf$conc
return(ssqres)
}
# parameter fitting using Levenberg-Marquardt
#initial guess for parameters
myparms=c(k1=500, k2=4500, k3=200,k4=2.42, k5=0.26,k6=12.2,k7=0.004,k8=55,k9=24,k10=8,pLH=14.5,LH=3.55)
#fitting
fitval=nls.lm(par=myparms,fn=ssq)
#summary of fit
summary(fitval)
#estimated parameter
parest=as.list(coef(fitval))
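One way to make measured values such as E and P available inside the rate function is to wrap them in interpolating functions and evaluate those at the current time t. The sketch below is a simplified, hypothetical two-state version of the model (the names E_fun, P_fun and the reduced rate expressions are assumptions for illustration, not the full model above). It also uses the state names pLH and LH consistently, since c["p"] and c["L"] in the question do not match the names given in cinit.

```r
# Measured forcing data, taken from the data frame in the question
time <- c(22, 23, 24, 46, 47, 48)
cE   <- c(15.92, 24.01, 25.29, 15.92, 24.01, 25.29)
cP   <- c(0.3, 0.14, 0.29, 0.3, 0.14, 0.29)

# Interpolating functions; rule = 2 holds the end values outside the data range
E_fun <- approxfun(time, cE, rule = 2)
P_fun <- approxfun(time, cP, rule = 2)

# Simplified rate function (hypothetical): E and P are looked up at time t
# from the functions stored in parms instead of being undefined free variables
rxnrate <- function(t, c, parms) {
  E <- parms$E_fun(t)
  P <- parms$P_fun(t)
  release <- parms$k4 * ((1 + parms$k5 * P) / (1 + parms$k7 * E)) * c[["pLH"]]
  r1 <- parms$k1 - release                         # d(pLH)/dt
  r2 <- release / parms$k8 - parms$k9 * c[["LH"]]  # d(LH)/dt
  list(c(r1, r2))
}

parms <- list(k1 = 500, k4 = 2.42, k5 = 0.26, k7 = 0.004, k8 = 55, k9 = 24,
              E_fun = E_fun, P_fun = P_fun)
cinit <- c(pLH = 14.5, LH = 3.55)

# The rate function can be checked on its own before handing it to a solver
deriv_at_22 <- rxnrate(22, cinit, parms)[[1]]

# If deSolve is installed, the same function plugs straight into ode()
if (requireNamespace("deSolve", quietly = TRUE)) {
  out <- deSolve::ode(y = cinit, times = seq(0, 48, 2),
                      func = rxnrate, parms = parms)
}
```

Passing E=cE in parms, as in the question, would hand the solver the whole vector rather than the value at time t, and the rate function would still have to extract it with parms$E before using it.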

Related

How to fit an inverse Gaussian distribution to my data, preferably using fitdist {fitdistrplus}

I am trying to analyze some reaction time data using a GLMM. To find the distribution that fits my data best, I used fitdist() for the gamma and lognormal distributions. The results showed that the lognormal fits my data better.
However, I recently read that the inverse Gaussian distribution might be a better fit for reaction time data.
I used nigFitStart to obtain the start values:
library(GeneralizedHyperbolic)
invstrt <- nigFitStart(RTtotal, startValues = "FN")
which gave me this:
$paramStart
mu delta alpha beta
775.953984862 314.662306398 0.007477984 -0.004930604
So I tried using these start parameters for fitdist:
require(fitdistrplus)
fitinvgauss <- fitdist(RTtotal, "invgauss", start = list(mu=776, delta=314, alpha=0.007, beta=-0.05))
But I get the following error:
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'start' must specify names which are arguments to 'distr'.
I also used ig_fit {goft} and got the following results:
Inverse Gaussian MLE
mu 775.954
lambda 5279.089
So this time I used these two parameters for the start argument in fitdist and still got exactly the same error:
> fitinvgauss <- fitdist(RTtotal, "invgauss", start = list(mu=776, lambda=5279))
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'start' must specify names which are arguments to 'distr'.
Someone had mentioned that changing the parameter names from mu and lambda to mean and shape had solved their problem, but I tried that and still got the same error.
Any idea how I can fix this? Or could you suggest an alternative way to fit an inverse Gaussian to my data?
Thank you.
dput(RTtotal)
c(594.96, 659.5, 706.14, 620.92, 811.05, 420.63, 457.08, 585.53,
488.59, 484.87, 496.72, 769.01, 458.92, 521.76, 889.08, 514.11,
553.09, 564.68, 1057.19, 437.79, 660.33, 639.58, 643.45, 419.47,
469.16, 457.78, 530.58, 538.73, 557.17, 1140.09, 560.03, 543.18,
1093.29, 607.59, 430.2, 712.06, 716.6, 566.69, 989.71, 449.96,
653.22, 556.52, 654.8, 472.54, 600.26, 548.36, 597.51, 471.97,
596.72, 600.29, 706.77, 511.6, 475.89, 599.13, 570.12, 767.57,
402.68, 601.56, 610.02, 891.95, 483.22, 588.78, 505.95, 554.15,
445.54, 489.02, 678.13, 532.06, 652.61, 654.79, 535.08, 1215.66,
633.6, 645.92, 454.37, 535.81, 508.97, 690.78, 685.97, 703.04,
731.99, 592.75, 662.03, 1400.33, 599.73, 1021.34, 1232.35, 855.1,
780.32, 554.4, 1965.77, 841.89, 1262.76, 721.62, 788.95, 1104.24,
1237.4, 1193.04, 513.91, 474.74, 380.56, 570.63, 700.96, 380.89,
481.96, 723.63, 835.22, 781.1, 468.76, 555.1, 522.22, 944.29,
541.06, 559.18, 738.68, 880.58, 500.14, 1856.97, 1001.59, 703.7,
1022.35, 1813.35, 1128.73, 864.75, 1166.77, 1220.4, 776.56, 2073.72,
1223.88, 617, 1387.71, 595.57, 1506.13, 678.41, 1797.87, 2111.04,
1116.61, 1038.6, 894.25, 778.51, 908.51, 1346.69, 989.09, 1334.17,
877.31, 649.31, 978.22, 1276.84, 1001.58, 1049.66, 1131.83, 700.8,
1267.21, 693.52, 1182.3)
So I'm guessing that you failed to tell us that you also have the statmod package loaded (or perhaps some other package with an 'invgauss' family that includes a dinvgauss function). You should be able to tell which package dinvgauss comes from by reading the top line of the help page for the function. So, after installing that package and reading the help page (which one should ALWAYS do) for ?dinvgauss:
fitinvgauss <- fitdist(RTtotal, "invgauss",
start = list(mean=776, dispersion=314, shape=1))
fitinvgauss
# --------------
Fitting of the distribution ' invgauss ' by maximum likelihood
Parameters:
estimate Std. Error
mean 779.2535 NA
dispersion -1007.5490 NA
shape 4972.5745 NA
All I did was read the error message and then read the help page and use the correct names for that function's parameters. (And then play around a bit to get the parameter starting values into the feasible range of values.)
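The error message says that the names in start must be arguments of the density function for the chosen distribution, i.e. the function named "d" followed by the distr string. A minimal sketch of how to check this before calling fitdist is below. The helpers density_args and start_names_ok are hypothetical names made up for illustration (they are not part of fitdistrplus), and the base R gamma density is used so that the example runs without extra packages.

```r
# List the argument names of the density function d<distr>
density_args <- function(distr) {
  names(formals(get(paste0("d", distr), mode = "function")))
}

# TRUE when every name in `start` is an argument of d<distr>
start_names_ok <- function(start, distr) {
  all(names(start) %in% density_args(distr))
}

density_args("gamma")                                   # argument names of dgamma
start_names_ok(list(shape = 2, rate = 0.01), "gamma")   # TRUE
start_names_ok(list(mu = 776, lambda = 5279), "gamma")  # FALSE
```

With the package that provides dinvgauss loaded, density_args("invgauss") would list the names that start is allowed to use.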

H2O.GeneralizedLowRankModel objective is NA when passing loss by column

I am working with the h2o glrm function. When I pass the loss_by_col argument to specify a different loss function for each column of my data frame (I have normal, Poisson, and binomial variables, so I pass the "Quadratic", "Poisson", and "Logistic" losses), the objective is not computed: testmodel@model$objective returns NaN. At the same time, summary shows that a few iterations were made and that the objective was NA for all of them. The quality of the model is very bad, but the archetypes are somehow computed, so I am confused. How should I pass a different loss for every variable in my dataset? Here is a (I hope) reproducible example:
df <- data.frame(p1 = rpois(100, 5), n1 = rnorm(100), b1 = rbinom(100, 1, 0.5))
df$b1 <- factor(df$b1)
h2df <- as.h2o(df)
testmodel <- h2o.glrm(h2df,
k=3,
loss_by_col=c("Poisson", "Quadratic", "Logistic"),
transform="STANDARDIZE")
testmodel@model$objective
summary(testmodel)
plot(testmodel)
Please note that there is a jira ticket for this here
It's interesting that you don't get an error when you run your code snippet. When I run your code snippet I get the following error:
Error: DistributedException from localhost/127.0.0.1:54321: 'Poisson loss L(u,a) requires variable a >= 0', caused by java.lang.AssertionError: Poisson loss L(u,a) requires variable a >= 0
I can resolve this error by removing transform="STANDARDIZE", because standardization can produce negative values, which the Poisson loss does not accept. For more information on what the transformations do, take a look at the user guide. For convenience, here is its definition of standardization: standardizing subtracts the mean and then divides each variable by its standard deviation.
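The effect of standardization on count data can be seen in plain R, without h2o. The sketch below simulates a Poisson column like p1 in the question (the seed and variable names are arbitrary choices for illustration) and shows that centering and scaling it necessarily produces negative values.

```r
set.seed(42)
p1 <- rpois(100, 5)        # Poisson counts, never negative

# Standardize: subtract the mean, then divide by the standard deviation
z <- as.numeric(scale(p1))

all(p1 >= 0)   # TRUE: the raw counts satisfy the Poisson loss requirement
any(z < 0)     # TRUE: every value below the mean becomes negative
```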

'Invalid parent values' error when running JAGS from R

I am running a simple generalized linear model, calling JAGS from R. The response is modeled with a negative binomial distribution. The model is being fitted to data on counts of fish, with the majority of individual counts ('C' in the data set below) being zeros.
I initially ran the model with one covariate, temperature ('Temp'). About half of the time the model ran and the other half of the time the model gave me the error, 'Error in node C[###] Invalid parent values.' The value for C[###] in the error message changes with each successive attempt to run the model.
Since my success at running the model was inconsistent, I tried adding another covariate, salinity ('Salt'). Then the model would not run at all, with the same error message as above.
Any ideas or suggestions on the source of the error are greatly appreciated.
I suspect that the initial values for the dispersion parameter, r, may be the issue. Ideally, I would add several more covariates to the model if this error can be addressed.
The data set and code are immediately below. For sake of getting the data to load properly on this website, I have omitted 662 of the 672 total values; even with the reduced data set (n = 10 instead of n = 672) the problem remains.
Thank you.
setwd("C:/Users/John/Desktop")
library('coda')
library('rjags')
library('R2jags')
set.seed(1000000000)
#data
n=10
C=c(0,0,0,0,0,1,0,0,0,1)
Temp=c(0,29.3,25.3,28.7,28.7,24.4,25.1,25.1,24.2,23.3)
Salt=c(6,6,0,6,6,0,12,12,6,12)
sink("My Model.txt")
cat("
model {
r~dunif(0,10)
beta0~dunif (-20,20)
beta1~dunif (-20,20)
beta2~dunif (-20,20)
for (i in 1:n) {
C[i] ~ dnegbin(p[i], r)
p[i] <- r/(r+lambda[i])
log(lambda[i]) <- mu[i]
mu[i] <- beta0 + beta1*Temp[i] + beta2*Salt[i]
}
}
", fill=TRUE)
sink()
n=n
C=C
Temp=Temp
Salt=Salt
#bundle data
bugs.data = list(
"n",
"C",
"Temp",
"Salt")
#parameters to monitor
params<-c(
"r",
"beta0",
"beta1",
"beta2")
#initial values
inits <- function(){list(
r=floor(runif(1,0,5)),
beta0=runif(1,-5,5),
beta1=runif(1,-5,5),
beta2=runif(1,-5,5))}
model.file <- 'My Model.txt'
jagsfit <- jags(data=bugs.data, inits=inits, params, n.iter=1000, n.thin=10, n.burnin=100, model.file)
print(jagsfit, digits=5)
This works fine for me most of the time, but it fails with the error you describe whenever the inits function samples a value of 0 for r, which you have made more likely by using floor() in the inits function (I'm not sure why you did that: r is not restricted to integers, but it must be strictly positive). Also, every time you run the model you get different initial values (unless you set a random seed in R), which makes your life more complicated than it needs to be. I generally recommend picking fixed (and probably overdispersed) initial values, such as r=0.01 and r=10 for the two chains in your example.
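A quick base R check illustrates how often the original inits function produces r = 0, followed by a sketch of fixed initial values for two chains. The beta starting values of 0 are an assumption chosen for illustration; only r=0.01 and r=10 come from the recommendation above.

```r
set.seed(1)

# floor(runif(1, 0, 5)) returns one of 0, 1, 2, 3, 4 with equal probability,
# so about one draw in five gives the invalid starting value r = 0
r_draws   <- floor(runif(10000, 0, 5))
prop_zero <- mean(r_draws == 0)

# Fixed, overdispersed initial values: one list per chain
fixed_inits <- list(
  list(r = 0.01, beta0 = 0, beta1 = 0, beta2 = 0),  # chain 1
  list(r = 10,   beta0 = 0, beta1 = 0, beta2 = 0)   # chain 2
)
```

A list like this has one element per chain, so the number of chains requested from the sampler has to match its length.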
However, JAGS picks usable initial values for this model as you can see by not providing your own inits e.g.:
library('runjags')
listdata <- lapply(bugs.data, get)
names(listdata) <- unlist(bugs.data)
run.jags(model.file, params, listdata)
I would also have a think about the prior you are using for r - it could well be that this will have a bigger effect on your posterior than intended. Another (not necessarily better) option is something like a gamma prior.
Matt

Nonlinear regression in R shows error

I am using the R function nlsLM from the package minpack.lm, and I get the following error.
I generate my own signal with noise, so I know all the parameters that I am trying to recover by regression, using the same function that I used to generate the signal.
The problem is that nlsLM runs fine and even finds the right parameter values, but at the very end, once it has found them, this error appears:
It. 23, RSS = 14.4698, Par. = 42.6727 0.78112 1 65.2211 15.6065 1
It. 24, RSS = 14.4698, Par. = 42.671 0.781102 1 65.2212 15.6069 1
Error in stats:::nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
And I do not know what to do.
Please explain what it could be, and how I could solve it!
Additional information:
#This is how I generate my signal (it is the convolution of a Gaussian with exp(-kt))
set.seed(100)
Yexp=sim_str_exp(error=10)
time=Yexp[[1]]
y=Yexp[[2]]
dataset_nls=data.frame(time,y)
start=c(tau1=.5,beta1=.5,exp_A1=.5,gaus_pos=.5,gaus_width=.5,gaus_A=0.5)
lower=c(tau1=0.01,beta1=0.01,exp_A1=0.01,gaus_pos=0.01,gaus_width=0.01,gaus_A=0.01)
upper=c(tau1=100,beta1=1,exp_A1=1,gaus_pos=100,gaus_width=850,gaus_A=1)
#here i do fitting
FIT=nlsLM(y ~ str_exp_model(time,tau1,beta1,exp_A1,gaus_pos,gaus_width,gaus_A),data=dataset_nls,start=start,lower=lower,upper=upper,trace=TRUE,algorithm="LM",na.action=na.pass,control=nls.lm.control(maxiter=200,nprint=1))
#Model_function
str_exp_model<-function(time, tau1,beta1,exp_A1,gaus_pos,gaus_width,gaus_A){
F_gen_V<-vector(length=length(time))
F_gaus_V=vector(length=length(time))
F_exp_V=vector(length=length(time))
for (i in 1:length(time)) {
F_gaus_V[i]=gaus_A*exp(-2.77*((i-gaus_pos)/gaus_width)^2)
F_exp_V[i]=exp_A1*exp(-1*(i/tau1)^beta1)
}
convolve(F_gaus_V, F_exp_V,FALSE)
}
#function for signal generation
sim_str_exp<- function(num_points=512,time_scale=512,tau1=45,beta1=.80,exp_A1=1,gaus_pos=65,
gaus_width=15, gaus_A=1,Y0=0, error=2.0, show_graph=TRUE, norm="False"){
F_gen_V<-vector(length=num_points)
time_gen_V<-vector(length=num_points)
F_gaus_V=vector(length=num_points)
F_exp_V=vector(length=num_points)
ts=time_scale/num_points
sigma=vector(length=num_points)
for (i in 1:num_points) {
F_gaus_V[i]=gaus_A*exp(-2.77*((i*ts-gaus_pos)/gaus_width)^2)
F_exp_V[i]=exp_A1*exp(-1*(i*ts/tau1)^beta1)
time_gen_V[i]=i*ts
}
F_gen_V<-(convolve(F_gaus_V, F_exp_V,FALSE))+Y0
if(norm==TRUE){
F_gen_V=F_gen_V/max(F_gen_V)}
else{;}
error_V=runif(512,-1*error, error)
for(i in 1:num_points){
F_gen_V[i]=error_V[i]/100*F_gen_V[i]+F_gen_V[i]
sigma[i]=(error_V[i]/100*F_gen_V[i])
}
RETURN=list(time=time_gen_V,y=F_gen_V,sigma=sigma)
if (show_graph==TRUE){
plot(RETURN[[1]],RETURN[[2]], type="l", main="Generated signal with noise",xlab="time, pixel",ylab="Intensity");}
else {;}
return(RETURN)
}
You haven't shown us sim_str_exp, so this example isn't fully reproducible, but I'm going to take a guess here. You say "I generate my own signal with noise", but you use Yexp=sim_str_exp(error=0) to generate the data, so it looks like you're not in fact adding any noise. (Also, your reported RSS at the last step is 1.37e-28 ...)
My guess is that you're running into a problem documented in ?nls, which is that nls() doesn't work well when there is zero noise. This is not documented in ?nlsLM, but I wouldn't be surprised if it held there too.
For convenience, here is the section I'm referring to from ?nls:
Do not use ‘nls’ on artificial "zero-residual" data.
The ‘nls’ function uses a relative-offset convergence criterion
that compares the numerical imprecision at the current parameter
estimates to the residual sum-of-squares. This performs well on
data of the form
y = f(x, theta) + eps
(with ‘var(eps) > 0’). It fails to indicate convergence on data
of the form
y = f(x, theta)
because the criterion amounts to comparing two components of the
round-off error. If you wish to test ‘nls’ on artificial data
please add a noise component, as shown in the example below.
If my hypothesis is correct then you should be able to get a fit without errors if you set the noise amplitude greater than zero.
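The advice quoted from ?nls can be sketched with a small artificial data set. The exponential model, the true values a = 5 and b = 0.2, and the noise level below are all assumptions chosen for illustration and are unrelated to the convolution model in the question. The point is only that a noise component is added before fitting.

```r
set.seed(1)
x <- 1:25

# Artificial data: exact model values plus a noise component
y_clean <- 5 * exp(-0.2 * x)
y_noisy <- y_clean + rnorm(length(x), sd = 0.05)

# Fit the same functional form that generated the data
fit <- nls(y_noisy ~ a * exp(-b * x), start = list(a = 4, b = 0.15))
coef(fit)
```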

Predicting with lm object in R - black box paradigm

I have a function that returns an lm object. I want to produce predicted values based on some new data. The new data is a data.frame in the exact format as the data passed to the lm function, except that the response has been removed (since we're predicting, not training). I would expect to execute the following, but get an error:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
In my case, ModelResponse was the name of the response column in the data I originally trained on. So just for kicks, I tried to insert an NA response:
newdata$ModelResponse = NA
predict( model , newdata )
Error in terms.default(object, data = data) : no terms component nor attribute
Highly frustrating! R's notion of models/regression doesn't match mine: 1. I train a model with some data and get a model object. 2. I can score new data from any environment/function/frame/etc., so long as the data I feed into the model object "looks like" the data I trained on (i.e. has the same column names). This is a standard black-box paradigm.
So here are my questions:
1. What concept(s) am I missing here?
2. How do I get my scenario to work?
3. How can I get model object to be portable? str(model) shows me that the model object saved the original data it trained on! So the model object is massive. I want my model to be portable to any function/environment/etc. and only contain the data it needs to score.
In the absence of str() on either the model or the data offered to the model, here's my guess regarding this error message:
predict( model , newdata )
"Error in eval(expr, envir, enclos) : object 'ModelResponse' not found"
I guess that you made a model object named "model", that your outcome variable (the left-hand side of the formula in the original call to lm) was named "ModelResponse", and that you then named a column in newdata by the same name. What you should have done was leave out the "ModelResponse" column (because that is what you are predicting) and put in "Model_Predictor1", "Model_Predictor2", etc., i.e. all the names on the right-hand side of the formula given to lm().
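A minimal sketch of the intended workflow is below, with simulated data and hypothetical predictor names x1 and x2. The model is fitted with a formula and a data argument, and newdata contains only the right-hand-side columns.

```r
set.seed(1)

# Training data: two predictors and the response column
train <- data.frame(x1 = runif(50), x2 = runif(50))
train$ModelResponse <- 1 + 2 * train$x1 - train$x2 + rnorm(50, sd = 0.1)

# Fit using column names in the formula and the data frame in `data`
model <- lm(ModelResponse ~ x1 + x2, data = train)

# New data: same predictor column names, no response column
newdata <- data.frame(x1 = c(0.2, 0.8), x2 = c(0.5, 0.1))
pred <- predict(model, newdata)
pred
```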
The coef() function will allow you to extract the information needed to make the model portable.
mod.coef <- coef(model)
mod.coef
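As a sketch of what a portable version might look like, the coefficient vector alone is enough to score new data when all predictors are numeric main effects. The built-in mtcars data and the helper name score are assumptions for illustration. Factors, interactions, or transformed terms would need the design matrix built accordingly.

```r
# Fit on a built-in data set and keep only the coefficients
model    <- lm(mpg ~ wt + hp, data = mtcars)
mod.coef <- coef(model)

# Hypothetical helper: intercept column plus the predictors named in `b`
score <- function(newdata, b) {
  X <- cbind(1, as.matrix(newdata[, names(b)[-1], drop = FALSE]))
  drop(X %*% b)
}

newdata <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
manual  <- score(newdata, mod.coef)

# The manual scores agree with predict() on the full model object
all.equal(unname(manual), unname(predict(model, newdata)))
```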
Since you expressed interest in Function from the rms/Hmisc package combination, here it is, using the help example from ols and comparing the output of the extracted function with that of the rms Predict method. Note the capitals: these functions are designed to work with the package equivalents of lm, glm(..., family="binomial"), and coxph, which in rms become ols, lrm, and cph.
> set.seed(1)
> x1 <- runif(200)
> x2 <- sample(0:3, 200, TRUE)
> distance <- (x1 + x2/3 + rnorm(200))^2
> d <- datadist(x1,x2)
> options(datadist="d") # No d -> no summary, plot without giving all details
>
>
> f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
>
> Function(f)
function(x1 = 0.50549065,x2 = 1) {0.50497361+1.0737604* x1-
0.79398383*pmax(x1-0.083887788,0)^3+ 1.4392827*pmax(x1-0.38792825,0)^3-
0.38627901*pmax(x1-0.65115162,0)^3-0.25901986*pmax(x1-0.92736774,0)^3+
0.06374433*x2+ 0.60885222*(x2==2)+0.38971577*(x2==3) }
<environment: 0x11b4568e8>
> ols.fun <- Function(f)
> pred1 <- Predict(f, x1=1, x2=3)
> pred1
x1 x2 yhat lower upper
1 1 3 1.862754 1.386107 2.339401
Response variable (y): sqrt(distance)
Limits are 0.95 confidence limits
# The "yhat" is the same as one produces with the extracted function
> ols.fun(x1=1, x2=3)
[1] 1.862754
(I have learned through experience that the restricted cubic-spline fit functions coming from rms need to have spaces and carriage returns added to improve readability. )
Thinking long-term, you should probably take a look at the caret package. Many or most modeling functions work with data frames and matrices, others have a preference, and there may be other variations of their expectations. It's important to quickly get your head around each, but if you want a single wrapper that will simplify life for you, making the intricacies into a "black box", then caret is as close as you can get.
As a disclaimer: I do not use caret, as I don't think modeling should be a black box. I've sent more than a few emails to maintainers of modeling packages after looking into their code and seeing something amiss. Wrapping that in another layer would not serve my interests. So, in the very long run, avoid caret and develop an enjoyment for dissecting what goes into and comes out of the different modeling functions. :)
