Replacing negative values in a model (system of ODEs) with zero - r

I'm currently working on solving a system of ordinary differential equations using deSolve, and was wondering if there's any way of preventing differential variable values from going below zero. I've seen a few other posts about setting negative values to zero in a vector, data frame, etc., but since this is a biological model (and it doesn't make sense for a T cell count to go negative), I need to stop it from happening to begin with so these values don't skew the results, not just replace the negatives in the final output.

My standard approach is to transform the state variables to an unconstrained scale. The most obvious/standard way to do this for positive variables is to write down equations for the dynamics of log(x) rather than of x.
For example, with the Susceptible-Infected-Recovered (SIR) model for infectious disease epidemics, where the equations are dS/dt = -beta*S*I; dI/dt = beta*S*I - gamma*I; dR/dt = gamma*I, we would naively write the gradient function as
gfun <- function(time, y, params) {
  g <- with(as.list(c(y, params)),
            c(-beta*S*I,
              beta*S*I - gamma*I,
              gamma*I))
  return(list(g))
}
If we make log(I) rather than I be the state variable (in principle we could do this with S as well, but in practice S is much less likely to approach the boundary), then we have d(log(I))/dt = (dI/dt)/I = beta*S-gamma; the rest of the equations need to use exp(logI) to refer to I. So:
gfun_log <- function(time, y, params) {
  g <- with(as.list(c(y, params)),
            c(-beta*S*exp(logI),
              beta*S - gamma,
              gamma*exp(logI)))
  return(list(g))
}
(it would be slightly more efficient to compute exp(logI) once and store/re-use it rather than computing it twice ...)
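For completeness, here is a minimal sketch (with made-up parameter values and initial conditions) of how the log-scale model above could be run with deSolve and back-transformed afterwards:
library(deSolve)
params <- c(beta = 0.1, gamma = 0.05)           # illustrative values only
y0     <- c(S = 0.99, logI = log(0.01), R = 0)  # note: I enters on the log scale
times  <- seq(0, 200, by = 1)
out    <- ode(y = y0, times = times, func = gfun_log, parms = params)
I_t    <- exp(out[, "logI"])                    # back-transform for plotting/analysis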

If a value doesn't become negative in reality but becomes negative in your model, you should change your model or, equivalently, modify your differential equations so that this is not possible. In other words: do not try to constrain your dynamical variables, constrain their derivatives. Anything else will only cause problems for your solver, whereas the solver does not care about a change in the differential equation itself.
For a simple example, suppose that:
you have a one-dimensional differential equation ẏ = f(y),
y shall not become negative,
your initial y is positive.
In this case, y can only become negative if f(0) < 0. Thus, all you have to do is to modify f such that f(0) ≥ 0 (and it is still smooth).
For a proof of principle, you can multiply f with an appropriately modified sigmoid function (smooth functions like this let you build what amounts to a logical switch without losing differentiability). This way, nothing changes for most values of y, and you only alter your differential equation when y is close to 0, i.e., in the regime where you were going to manipulate things anyway.
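As a minimal sketch (illustrative names only: f_orig stands for your original right-hand side), the modification could look like this; only the negative part of the derivative is switched off as y approaches 0, so the effective rate at y = 0 is approximately non-negative:
soft_step <- function(y, k = 1000) 1 / (1 + exp(-k * y))  # ~0 for y < 0, ~1 for y > 0

f_constrained <- function(y) {
  f <- f_orig(y)
  # leave inflow (f > 0) untouched; damp outflow (f < 0) smoothly near y = 0
  max(f, 0) + min(f, 0) * soft_step(y)
}
The steepness k is an assumption, not something the model prescribes; it controls how close to zero the damping kicks in.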
However, I would not really recommend using sigmoids without thinking about your model. If your model is totally wrong near y = 0, it will very likely already be useless for nearby values. If your simulations venture in this terrain and you want the results to be meaningful, you should fix this.

Related

How do I solve a system of second order differential equations using Octave?

I am trying to solve two equations below for my project. I was unable to find any straightforward guide to solving the differential equations.
I am trying to plot a graph of h over time for the equations, given initial values of h, mu, r, theta, g, L, and the derivatives of h and theta with respect to t. Is this possible? And if so, how?
(The two equations were posted as images in the original question and are not reproduced here.)
I tried to type the equations into Octave with the given conditions, but there seems to be an error that I am unable to identify.
Whatever numerical software you use, Octave or another one (that does not do symbolic calculation), the first step is to transform your system of N coupled ordinary differential equations (ODEs) of order p into a system of p*N coupled ODEs of order 1.
This is always done by introducing the intermediate derivatives as new variables.
For your system, setting H = dh/dt and J = dtheta/dt turns the two second-order equations (posted as images in the question) into a system of four coupled first-order ODEs in h, theta, H and J.
Then, with Octave, and as explained in doc lsode, you define a function, say Xdot = dsys(X, t), that codes this system. X is the vector [h, theta, H, J], and Xdot is the returned vector of their respective derivatives, as defined by the right-hand sides of the first-order ODEs.
So the first two elements of Xdot are trivial, just Xdot = [X(3) X(4) ...].
Of course, dsys() must also use the parameters M, g, m, µ, L, and r. As far as I understand, they cannot be passed as additional arguments to dsys(), so you must define them before calling lsode (e.g., as global variables or hard-coded inside dsys()).
For initial states, you must define the vector X0=[h0, theta0, H0, J0] of known initial values.
The vector of increasing times >= 0 at which you want to compute and get the values of X must then be defined, for instance t = 0:100. The initial time, 0, must be the first element of t.
Finally, call Xt = lsode(@dsys, X0, t). After that you should get
Xt(:,1) are the values of h(t)
Xt(:,2) are the values of theta(t)
Xt(:,3) are the values of H(t)=(dh/dt)(t)
Xt(:,4) are the values of J(t)=(dtheta/dt)(t)
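A minimal sketch of what this can look like (the right-hand sides of the two second-order equations were posted as images, so placeholders and made-up parameter values are used below):
1;                                   % script file, so function definitions may follow

function Xdot = dsys (X, t)
  % illustrative parameter values -- replace with your own
  M = 1; g = 9.81; m = 0.1; mu = 0.05; L = 1; r = 0.2;
  h = X(1); theta = X(2); H = X(3); J = X(4);
  Xdot = zeros (4, 1);
  Xdot(1) = H;                       % dh/dt     = H
  Xdot(2) = J;                       % dtheta/dt = J
  Xdot(3) = 0;                       % dH/dt     = right-hand side of your first equation
  Xdot(4) = 0;                       % dJ/dt     = right-hand side of your second equation
endfunction

X0 = [0.5; 0; 0; 0];                 % [h0; theta0; H0; J0], known initial values
t  = 0:100;                          % the initial time must come first
Xt = lsode (@dsys, X0, t);
plot (t, Xt(:,1));                   % h(t) over time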

How can I “translate” a statistical model defined on paper to the computer using R?

I initially posted this question on stats.stackexchange.com, but it was closed for being focused on programming. Hopefully I can get some help here.
I will not put many theoretical details here to make it simple, but my final goal is to implement a Hidden Markov Model using R.
Although I am fine with the theoretical model construction, when I tried to implement it I realized that I do not know basic things about computational statistics. My question goes in this direction.
Let X and Y be random variables such that X takes values in {0, 1} and Y | X ~ Normal(X, sigma^2), with P(X = 0) = p and P(X = 1) = 1 - p. If P denotes a distribution, how can I compute P(X | Y = y), i.e. the product P(X) * P(Y = y | X) up to a normalising constant,
using R?
I mean, what is the exact meaning of multiplying these two distributions (one discrete and one continuous)? How can I do this using R? The answer is obviously a function of y, but how is it represented in my code?
Does anything change if Y is also discrete, for instance if Y | X follows some discrete distribution whose parameter depends on X? How would that affect the implemented code?
I know my questions are not very specific, but I am very lost on how to start. My goal with this question is understanding how I can "translate" what I have written in paper to the computer.
Translation
The equations describe how to compute the probability distribution of X given an observation of Y=y and values for parameters p and sigma. Ultimately, you want to implement a function p_X_given_Y that takes a value of Y and returns a probability distribution for X. A good place to start is to implement the two functions used in the RHS of the expression. Something like,
p_X <- function (x, p=0.5) { switch(as.character(x), "0"=p, "1"=1-p, 0) }
p_Y_given_X <- function (y, x, sigma=1) { dnorm(y, x, sd=sigma) }
Note that p and sigma are picked arbitrarily here. These functions can then be used to define the p_X_given_Y function:
p_X_given_Y <- function (y) {
  # numerators: for each x in the support of X
  ps <- sapply(c("0" = 0, "1" = 1),
               function (x) { p_X(x) * p_Y_given_X(y, x) })
  # divide out the denominator
  ps / sum(ps)
}
which can be used like:
> p_X_given_Y(y=0)
# 0 1
# 0.6224593 0.3775407
> p_X_given_Y(y=0.5)
# 0 1
# 0.5 0.5
> p_X_given_Y(y=2)
# 0 1
# 0.1824255 0.8175745
These numbers should make intuitive sense (given p=0.5): Y=0 is more likely to come from X=0, Y=0.5 is equally likely to be X=0 or X=1, and so on. This is only one way of implementing it, where the idea is to return the "distribution of X", which in this case is simply a named numeric vector whose names ("0", "1") correspond to the support of X and whose values correspond to the probability masses.
Some alternative implementations might be:
a p_X_given_Y(x,y) that also takes a value for x and returns the corresponding probability mass
a p_X_given_Y(y) that returns another function that takes an x argument and returns the corresponding probability mass (i.e., the probability mass function)
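For instance, building on the p_X_given_Y defined above, the first alternative might be sketched as follows (the function name is illustrative):
p_X_given_Y_mass <- function (x, y) {
  dist <- p_X_given_Y(y)          # named vector over the support of X
  dist[[as.character(x)]]
}

p_X_given_Y_mass(1, y = 2)        # ~0.82, matching the example output above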

Automatically find the scaling factor of the x-axis using LsqFit (or other method)?

I have the following data: a vector B and a vector R. The vector B is the "independent" variable. For this pair I have two data sets: one is an experimental measurement (Bex, Rex) and the other is a simulation I produced (Bsim, Rsim). The simulation does not have any "scale" for the x-axis (the B vector). Therefore, when I try to fit my curve to the experiment, I have to find a scaling parameter B0 "by eye", multiply the entire Bsim vector by it, and simply plot(B0*Bsim, Rsim, Bex, Rex).
I wanted to use the package LsqFit to make the procedure automatic and more accurate. However, I am having trouble understanding how I could use it to find the scaling of the independent variable.
My first thought was to just "invert" the roles of B and R. However, there are two issues that I think make matters worse: 1) the R curve/data is not monotonic, and 2) the experimental data are much more "dense" (they have many more data points: my simulation has 120 points in total, while the experiments have some thousands).
Below I give an example of what I am trying to accomplish (of course, the answer need not use LsqFit). I also attach two figures that demonstrate everything very clearly.
#= stuff happened before this point =#
Bsim, Rsim = load(simulation)
Bex, Rex = load(experiment)

# this is what I want to do:
some_model(x, p) = ???
fit = curve_fit(some_model, Bex, Rex, [3.5])
B0 = fit.param[1]

# this is what I currently do by trial and error:
B0 = 3.85
plot(B0*Bsim, Rsim, Bex, Rex)
P.S.: The R curves (dependent variables) are both normalized by their maximum value because their scaling is not important.
A simple approach, if you can always expect both your experiment and simulation to feature one high peak, and you are sure that there is only a scaling factor rather than also an offset, is to simply multiply your Bsim vector by the ratio of the peak positions, mode_rex / mode_rsim (e.g. in your example the peak of Rsim sits at Bsim = 1 and the peak of Rex at Bex = 4, so multiply Bsim by 4). But I'm sure you've thought of this already.
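In Julia this peak-matching shortcut is a one-liner (assuming Rex and Rsim each have a single, well-defined maximum):
B0 = Bex[argmax(Rex)] / Bsim[argmax(Rsim)]   # ratio of the B-positions of the two peaks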
For a more general approach, one way is as follows:
add and load Interpolations package
Create a grid to interpolate over, e.g. Grid = 0:0.01:Bex[end]
interpolate Rex over that grid, e.g.
RexInterp = interpolate( (Bex,), Rex, Gridded(Linear()));
RexGridVec = RexInterp[Grid];
interpolate Rsim over the same grid, but introduce your multiplier on the Bsim "knots", e.g.
Multiplier = 0.1;
RsimInterp = interpolate( (Multiplier * Bsim,), Rsim, Gridded(Linear()));
RsimGridVec = RsimInterp[Grid]
Now you can calculate a square error value between RsimGridVec and RexGridVec, e.g.
SqErr = sum((RsimGridVec - RexGridVec).^2)
If you follow this technique, you can loop over a range of multipliers (say 0:0.01:10), get the square error associated with each multiplier, and find the multiplier for which the square error is minimal.
In theory, if you wanted to find the optimum for a particular offset too, you can make the offset an outer loop over a range of values. Mind you, this is a brute-force approach, but it should be reasonably efficient judging by the vectors in your graph; a sketch of the multiplier search follows below.
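A sketch of that multiplier search, wrapped in a function (function and variable names are illustrative; it uses the call syntax of recent Interpolations.jl releases, whereas the snippets above use the older indexing syntax):
using Interpolations

function find_multiplier(Bex, Rex, Bsim, Rsim; mults = 0.01:0.01:10, step = 0.01)
    grid    = Bex[1]:step:Bex[end]
    rex_itp = interpolate((Bex,), Rex, Gridded(Linear()))
    best_mult, best_err = first(mults), Inf
    for m in mults
        rsim_itp = interpolate((m .* Bsim,), Rsim, Gridded(Linear()))
        # compare only where the rescaled simulation actually covers the grid
        covered = [g for g in grid if m * Bsim[1] <= g <= m * Bsim[end]]
        isempty(covered) && continue
        # mean squared error, so different coverage lengths stay comparable
        err = sum((rsim_itp(g) - rex_itp(g))^2 for g in covered) / length(covered)
        if err < best_err
            best_mult, best_err = m, err
        end
    end
    return best_mult
end

B0 = find_multiplier(Bex, Rex, Bsim, Rsim)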

Why do the inverse t-distributions for small values differ in Matlab and R?

I would like to evaluate the inverse Student's t-distribution function for small values, e.g., 1e-18, in Matlab. The degrees of freedom is 2.
Unfortunately, Matlab returns NaN:
tinv(1e-18,2)
NaN
However, if I use R's built-in function:
qt(1e-18,2)
-707106781
The result is sensible. Why can Matlab not evaluate the function for this small value? The Matlab and R results are quite similar down to about 1e-15, but for smaller values the difference is considerable:
tinv(1e-16,2)/qt(1e-16,2) = 1.05
Does anyone know what is the difference in the implemented algorithms of Matlab and R, and if R gives correct results, how could I effectively calculate the inverse t-distribution, in Matlab, for smaller values?
It appears that R's qt may use a completely different algorithm than Matlab's tinv. I think that you and others should report this deficiency to The MathWorks by filing a service request. By the way, in R2014b and R2015a, -Inf is returned instead of NaN for small values (about eps/8 and less) of the first argument, p. This is more sensible, but I think they should do better.
In the interim, there are several workarounds.
Special Cases
First, in the case of the Student's t-distribution, there are several simple analytic solutions to the inverse CDF (quantile function) for certain integer values of ν. For your example of ν = 2:
% for v = 2
p = 1e-18;
x = (2*p-1)./sqrt(2*p.*(1-p))
which returns -7.071067811865475e+08. At a minimum, Matlab's tinv should include these special cases (it currently does so only for ν = 1). Doing so would probably improve accuracy and speed for these particular cases as well.
Numeric Inverse
The tinv function is based on the betaincinv function. It appears that it may be this function that is responsible for the loss of precision for small values of the first argument, p. However, as suggested by the OP, one can use the CDF function, tcdf, and root-finding methods to evaluate the inverse CDF numerically. The tcdf function is based on betainc, which doesn't appear to be as sensitive. Using fzero:
p = 1e-18;
v = 2;
x = fzero(@(x)tcdf(x,v)-p, 0)
This returns -7.071067811865468e+08. Note that this method is not very robust for values of p close to 1.
Symbolic Solutions
For more general cases, you can take advantage of symbolic math and variable-precision arithmetic. You can use identities in terms of Gaussian hypergeometric functions, 2F1, as given here for the CDF. Thus, using solve and hypergeom:
% Supposedly valid only for x^2 < v, but appears to work for your example
p = sym('1e-18');
v = sym(2);
syms x
F = 0.5+x*gamma((v+1)/2)*hypergeom([0.5 (v+1)/2],1.5,-x^2/v)/(sqrt(sym('pi')*v)*gamma(v/2));
sol_x = solve(p==F,x);
vpa(sol_x)
As noted above, the tinv function is based on the betaincinv function. There is no equivalent function, or even an incomplete Beta function, in the Symbolic Math Toolbox or MuPAD, but a similar 2F1 relation for the incomplete Beta function can be used:
p = sym('1e-18');
v = sym(2);
syms x
a = v/2;
F = 1-x^a*hypergeom([a 0.5],a+1,x)/(a*beta(a,0.5));
sol_x = solve(2*abs(p-0.5)==F,x);
sol_x = sign(p-0.5).*sqrt(v.*(1-sol_x)./sol_x);
vpa(sol_x)
Both symbolic schemes return results that agree, giving -707106781.186547523340184 with the default value of digits.
I've not fully validated the two symbolic methods above so I can't vouch for their correctness in all cases. The code also needs to be vectorized and will be slower than a fully numerical solution.

Can a very large (or very small) value in feature vector using SVC bias results? [scikit-learn]

I am trying to better understand how the values of my feature vector may influence the result. For example, let's say I have the following vector with the final value being the result (this is a classification problem using an SVC, for example):
0.713, -0.076, -0.921, 0.498, 2.526, 0.573, -1.117, 1.682, -1.918, 0.251, 0.376, 0.025291666666667, -200, 9, 1
You'll notice that most of the values cluster around 0; however, there is one value that is orders of magnitude larger in absolute value, -200.
I'm concerned that this value is skewing the prediction and is being weighted unfairly heavier than the rest simply because the value is so much different.
Is this something to be concerned about when creating a feature vector? Or will the statistical test I use to evaluate my vector control for this large (or small) value based on the training set I provide it with? Are there methods available in scikit-learn specifically that you would recommend for normalizing the vector?
Thank you for your help!
Yes, it is something you should be concerned about. SVMs are heavily influenced by differences in feature scale, so you need a preprocessing step to reduce this effect. The most popular ones are:
Linearly rescale each feature dimension to the [0,1] or [-1,1] interval
Normalize each feature dimension so it has mean=0 and variance=1
Decorrelate values by transformation sigma^(-1/2)*X where sigma = cov(X) (data covariance matrix)
Each of these can easily be performed using scikit-learn (although for the third one you will need SciPy for the matrix square root and inversion); a short sketch follows below.
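Here is a sketch of the three options above on made-up data (the whitening step uses SciPy for the matrix square root and inverse):
import numpy as np
from scipy.linalg import sqrtm, inv
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] *= 200                                                    # one feature on a much larger scale

X_minmax = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)   # option 1: rescale to [-1, 1]
X_std    = StandardScaler().fit_transform(X)                      # option 2: mean 0, variance 1

sigma   = np.cov(X, rowvar=False)                                 # option 3: decorrelate (whiten)
X_white = (X - X.mean(axis=0)) @ np.real(inv(sqrtm(sigma)))       # .real drops round-off imaginary parts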
I am trying to better understand how the values of my feature vector may influence the result.
Then here's the math for you. Let's take the linear kernel as a simple example. It takes a sample x and a support vector sv, and computes the dot product between them. A naive Python implementation of a dot product would be
def dot(x, sv):
    return sum(x_i * sv_i for x_i, sv_i in zip(x, sv))
Now if one of the features has a much more extreme range than all the others (either in x or in sv, or worse, in both), then the term corresponding to this feature will dominate the sum.
A similar situation arises with the polynomial and RBF kernels. The poly kernel is just a (shifted) power of the linear kernel:
def poly_kernel(x, sv, d, gamma):
    return (dot(x, sv) + gamma) ** d
and the RBF kernel is the exponential of minus the squared distance between x and sv, times a constant:
import math

def rbf_kernel(x, sv, gamma):
    diff = [x_i - sv_i for x_i, sv_i in zip(x, sv)]
    return math.exp(-gamma * dot(diff, diff))
In each of these cases, if one feature has an extreme range, it will dominate the result and the other features will effectively be ignored, except to break ties.
scikit-learn tools to deal with this live in the sklearn.preprocessing module: MinMaxScaler, StandardScaler, Normalizer.
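For example, a minimal sketch (X_train, y_train and X_test are assumed placeholders) of scaling inside a Pipeline, so the transform fitted on the training data is reused at prediction time:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)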

Resources