Add a second independent variable to an ODE in R

I am using a pharmacokinetic model to fit concentrations at different exposure times.
In addition to the independent time variable, I would like to add a vector of data for the variable cw, so that at t=1 I have the value cw=41, at t=2 the value cw=17, and so on.
How do I modify the ODE so this actually works? I can only get it to run with a single value for cw.
library(deSolve)

fun <- function(time, y, parms, ...) {
  with(as.list(c(parms, y)), {
    dCp <- k1*cw - k2*cp - k3*cp + k4*cs
    dCs <- k3*cp - k4*cs
    list(c(dCp, dCs))
  })
}

cw <- 100
y0 <- c(cp = 0, cs = 0)
times <- 0:56
parms <- c(k1 = 200, k2 = 0.1, k3 = 1, k4 = 0.1)

ode(y0, times, fun, parms)
The code works the way it is written above; now I would like to replace cw with a vector:
set.seed(42)
cw <- .Random.seed[1:57]
At the moment I get the following error if cw is a vector instead of a single number:
The number of derivatives returned by func() (58) must equal the length of the initial conditions vector (2)
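For what it's worth, the standard way to handle a time-varying input like this in deSolve is a forcing function: interpolate the cw series with approxfun() and evaluate it inside the derivative function at the current time. This is not from the original post, just a minimal sketch under that assumption, with made-up cw values:

library(deSolve)

## Hypothetical forcing data: one cw value per time point
cw_times  <- 0:56
set.seed(42)
cw_values <- runif(57, min = 10, max = 50)

## Interpolating function: returns cw at any time t the solver requests
cw_fun <- approxfun(cw_times, cw_values, rule = 2)

fun <- function(time, y, parms, ...) {
  with(as.list(c(parms, y)), {
    cw  <- cw_fun(time)  # look up cw at the current time
    dCp <- k1*cw - k2*cp - k3*cp + k4*cs
    dCs <- k3*cp - k4*cs
    list(c(dCp, dCs))
  })
}

y0    <- c(cp = 0, cs = 0)
times <- 0:56
parms <- c(k1 = 200, k2 = 0.1, k3 = 1, k4 = 0.1)
out   <- ode(y0, times, fun, parms)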

Related

What are the correct conditions to test for a weighted die?

I was given the following question: write a function, named pdice, to simulate a weighted die that has probability of landing on 1:6 of (p1,p2,p3,p4,p5,p6) respectively. You can simulate this by using the parameter prob = p in the sample function, where p is a vector of non-negative numbers of length 6 (at least one needs to be >0); it will use p/sum(p) as the probabilities. This function has arguments p, n. Check ?sample to find out what conditions you need to check for p. Your function should generate an error message if these conditions are not met. Below is my code thus far, which runs, though it gives a warning about coercion from double to logical.
pdice <- function(n, p){
  weightofDies <- c(1/40, rep(4/40, 4), 23/40)
  roll <- sample(1:6, size = n, replace = TRUE, prob = weightofDies)
  if (n > 0 && all(p = 1)) {
    return(roll)
  } else {
    print("Error, Conditions Not Met")
  }
}
I'm confused that the question says to use the parameter prob = p and then defines p as a vector of non-negative numbers of length 6. How can a probability be defined as a vector? So when it came to the conditions, I made sure the number of rolls (n) was greater than zero, and for p I went ahead and tried to make sure the probabilities in this vector added up to one. I'm not sure if my process thus far is correct; I created my biased probabilities via weightofDies.
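For reference, here is a sketch of what checking the conditions from ?sample might look like: prob must have the right length, contain no negative values or NAs, and have at least one positive entry. The p-before-n argument order follows the question text; using stop() for the error message is my assumption:

pdice <- function(p, n) {
  ## Conditions from ?sample: p must be length 6, non-negative,
  ## NA-free, and contain at least one positive value
  if (length(p) != 6)  stop("p must have length 6")
  if (any(is.na(p)))   stop("p must not contain NAs")
  if (any(p < 0))      stop("p must be non-negative")
  if (sum(p) <= 0)     stop("at least one element of p must be > 0")
  if (n <= 0)          stop("n must be a positive integer")
  ## sample() normalises the weights internally via p/sum(p),
  ## so p does not need to sum to 1
  sample(1:6, size = n, replace = TRUE, prob = p)
}

pdice(c(1/40, rep(4/40, 4), 23/40), n = 10)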

R for loop with panel data for z-score calculation

I am currently working on creating some functions in RStudio with a dataset on roughly 100,000 individuals observed from 2005 to 2013. I have an unbalanced panel with two variables of interest; let's call them x and y for the sake of simplicity.
The function I am specifying takes the form:
z = (mean(x) + mean(y)) / sd(x)
As you can see, it resembles the z-score measure often used as a normalisation technique during the pre-processing stage of model estimation.
The goal is to compute z for each individual i in the dataset while taking into account that different periods T = 1, 2, ..., t are observed for different individuals. In other words, in some cases I have data from 2008-2013, and for others from, say, 2006-2010.
At the moment I have specified my function as follows:
z1 <- function(x, y) {
  (mean(x) + mean(y)) / sd(x)
}
When I execute it as:
z1(x, y)
I only get one number as output, representing the calculation over the total number of observations (about 150,000 rows). How should I edit my code to make sure I get one number for each individual in my dataset?
I am assuming that I must use a for loop that iterates and computes the z-score for one individual at a time, but I am not sure how to specify this when writing my function.
It's returning a single value because mean(x), mean(y) and sd(x) are all single numeric values and you're not asking it to do anything else.
The following code simulates two vectors and does what (I think) you want. It would help if you were more descriptive about your task, though.
x <- rbinom(100, 3, 2/5)
y <- rpois(100, 2.5)

f <- function(mvL, mvR){
  answer <- NULL
  vector <- readline('Which?: ')
  if (vector == 'Left') {
    for (i in 1:length(mvL)) {
      answer[i] <- mvL[i] - ((mean(mvL) + mean(mvR)) / sd(mvL))
    }
  } else {
    for (i in 1:length(mvR)) {
      answer[i] <- mvR[i] - ((mean(mvL) + mean(mvR)) / sd(mvL))
    }
  }
  return(answer)
}

f(x, y)
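Note that the answer above does not group by individual, which is what the question asks for. A minimal sketch of the grouped computation, assuming a data frame df with columns id, x, and y (the names are assumptions), needs no for loop at all:

## Simulated unbalanced panel: 5 individuals with varying numbers of years
set.seed(42)
df <- data.frame(
  id = rep(1:5, times = c(4, 6, 9, 3, 7)),
  x  = rnorm(29),
  y  = rnorm(29)
)

## One z per individual, applying the question's formula within each id
z_by_id <- sapply(split(df, df$id), function(d) {
  (mean(d$x) + mean(d$y)) / sd(d$x)
})
z_by_id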

Solving ODE with deSolve in R - number of derivatives error

I am trying to use the deSolve package for a set of ODEs with auxiliary variables defined by algebraic equations. I keep getting an error saying the number of derivatives is not the same as the length of the initial conditions vector. What should I change?
# rm(list=ls())
library(deSolve)

exponential <- function(t, state, parameters){
  with(as.list(c(state, parameters)), {
    # Auxiliary variables
    fX2 <- pmax(0, 1 - (1 - (d2/r12)*(X2/K2)))
    fX1 <- X1/(X1 + k1)
    # Equations (ODE)
    dX1 <- C - d1*X1 - r12*X2*fX2*fX1  # differential equation
    dX2 <- r12*X2*fX2*fX1 - d2*X2
    return(list(c(dX1, dX2)))
  })
}

# -- RUN INFORMATION
# Set initial values and simulation time
state <- c(X1 = 2, X2 = 0.01, K2 = 10)
times <- 0:100

# Assign parameter values
parameters <- c(d1 = 0.001, d2 = 0.008, r12 = 0.3, C = 0.5, k1 = 0.001)

for (i in 1:length(times)){
  out <- ode(y = state, times = times, func = exponential, parms = parameters)
}
Error in checkFunc(Func2, times, y, rho) :
  The number of derivatives returned by func() (2) must equal the length of
  the initial conditions vector (3)
The error comes from the return in your defined function: your input state y has length 3, but you return only 2 derivatives. Since func() must return one derivative per state variable, and K2 is constant, you can fix it by returning a zero derivative for K2:
return(list(c(dX1, dX2, 0)))
Another possibility is to move K2 into the parameters, in which case your original return is right. You have to decide whether K2 is a state variable or a parameter.
And by the way: why the for loop over time? It is not necessary, because the ODEs are solved over the whole time interval you pass to ode():
out <- ode(y = state, times = times, func = exponential, parms = parameters)
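For completeness, a sketch of the second option, with K2 moved to the parameters so that the original two-derivative return is correct (the with() construct finds K2 in either vector):

state      <- c(X1 = 2, X2 = 0.01)            # K2 removed from the state...
parameters <- c(d1 = 0.001, d2 = 0.008, r12 = 0.3,
                C = 0.5, k1 = 0.001, K2 = 10) # ...and added to the parameters
out <- ode(y = state, times = 0:100, func = exponential, parms = parameters)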

cost function in cv.glm of boot library in R

I am trying to use the cross-validation function cv.glm from the boot library in R to determine the number of misclassifications when a GLM logistic regression is applied.
The function has the following signature:
cv.glm(data, glmfit, cost, K)
with the first two arguments denoting the data and the model, and K specifying the number of folds.
My problem is the cost parameter, which is defined as:
cost: A function of two vector arguments specifying the cost function
for the crossvalidation. The first argument to cost should correspond
to the observed responses and the second argument should correspond to
the predicted or fitted responses from the generalized linear model.
cost must return a non-negative scalar value. The default is the
average squared error function.
I guess for classification it would make sense to have a function that returns the rate of misclassification, something like:
nrow(subset(data, (predict >= 0.5 & data$response == "no") |
(predict < 0.5 & data$response == "yes")))
which is of course not even syntactically correct.
Unfortunately, my limited R knowledge has cost me hours here, and I was wondering if someone could point me in the right direction.
It sounds like you might do well to just use the cost function (i.e. the one named cost) defined further down in the "Examples" section of ?cv.glm. Quoting from that section:
# [...] Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
This does essentially what you were trying to do with your example. Replacing your "no" and "yes" with 0 and 1, let's say you have two vectors, predict and response. Then cost() is nicely designed to take them and return the mean misclassification rate:
## Simulate some reasonable data
set.seed(1)
predict <- seq(0.1, 0.9, by=0.1)
response <- rbinom(n=length(predict), prob=predict, size=1)
response
# [1] 0 0 0 1 0 0 0 1 1
## Demonstrate the function 'cost()' in action
cost(response, predict)
# [1] 0.3333333 ## Which is right, as 3/9 elements (4, 6, & 7) are misclassified
## (assuming you use 0.5 as the cutoff for your predictions).
I'm guessing the trickiest bit of this will be just getting your mind fully wrapped around the idea of passing a function in as an argument. (At least that was for me, for the longest time, the hardest part of using the boot package, which requires that move in a fair number of places.)
Added on 2016-03-22:
The function cost() given above is, in my opinion, unnecessarily obfuscated; the following alternative does exactly the same thing in a more expressive way:
cost <- function(r, pi = 0) {
mean((pi < 0.5) & r==1 | (pi > 0.5) & r==0)
}
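A quick sanity check (not in the original answer) that the two formulations agree, using simulated 0/1 responses and probabilities:

cost_old <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
cost_new <- function(r, pi = 0) mean((pi < 0.5) & r == 1 | (pi > 0.5) & r == 0)

set.seed(1)
pi <- runif(1000)
r  <- rbinom(1000, size = 1, prob = pi)
identical(cost_old(r, pi), cost_new(r, pi))  # TRUE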
I will try to explain the cost function in simple words. Let's take the
cv.glm(data, glmfit, cost, K) arguments step by step:
data
The data consist of many observations; think of them as a series of numbers.
glmfit
The generalized linear model fitted to the data above. There is a catch: cv.glm splits the data into K parts, fits the model on all but one part (the training set), and produces predictions for the held-out part (the test set). The output is a series with the same number of elements as the test set passed in.
cost
The cost function. It takes two arguments: first the observed responses of the test set, and second the predictions from glmfit for that test set. The default is the average squared error function: it takes the difference between each observed and predicted data point (the two series must have the same number of elements), squares it, and averages over the test set.
K
The number of groups into which the data should be split. The default gives leave-one-out cross-validation.
Judging from your description of the cost function, your prediction (x) would be a number between 0 and 1 (below 0.5 = "no", 0.5 and above = "yes") and your observation (y) would be "yes" or "no". So the error (e) between observation and prediction would be:
cost <- function(x, y){
  e <- 0
  for (i in 1:length(x)){
    if (x[i] > 0.5) {
      ## predicted "yes": add error only if the observation disagrees
      if (y[i] != 'yes') e <- e + (x[i] - 0.5)^2
    } else {
      ## predicted "no": add error only if the observation disagrees
      if (y[i] != 'no') e <- e + (0.5 - x[i])^2
    }
  }
  e <- e / length(x)  # mean squared error
  return(e)
}
Source: http://www.cs.cmu.edu/~schneide/tut5/node42.html
The cost function can optionally be defined if there is one you prefer over the default average squared error. In that case you write a function that returns the cost you want, using two inputs: (1) the vector of observed labels you are predicting, and (2) the vector of predicted probabilities from your model for those labels. For the cost function that (I think) you described in your post, you are looking for a function that returns the proportion of accurate classifications, which would look something like this:
cost <- function(labels, pred){
  mean(labels == ifelse(pred > 0.5, 1, 0))
}
With that function defined you can then pass it into your cv.glm() call, although I wouldn't recommend using your own cost function over the default one unless you have a reason to. Your example isn't reproducible, so here is another one:
> library(boot)
>
> cost <- function(labels,pred){
+ mean(labels==ifelse(pred > 0.5, 1, 0))
+ }
>
> #make model
> nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
> #run cv with your cost function
> (nodal.glm.err <- cv.glm(nodal, nodal.glm, cost, nrow(nodal)))
$call
cv.glm(data = nodal, glmfit = nodal.glm, cost = cost, K = nrow(nodal))
$K
[1] 53
$delta
[1] 0.8113208 0.8113208
$seed
[1] 403 213 -2068233650 1849869992 -1836368725 -1035813431 1075589592 -782251898
...
The cost function defined in the example for cv.glm clearly assumes that the predictions are probabilities, which would require the type="response" argument in the predict function. The documentation from library(boot) should state this explicitly. I would otherwise be forced to assume that the default type="link" is used inside the cv.glm function, in which case the cost function would not work as intended.
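One way to check this rather than assume it is to print the function's source and, for comparison, look at the two prediction scales on a fitted logistic model; the snippet below reuses the nodal data from the example above:

library(boot)

## Inspect the source of cv.glm to see which type= it uses in predict()
print(cv.glm)

## The two prediction scales on a fitted logistic model:
fit <- glm(r ~ stage + xray + acid, binomial, data = nodal)
head(predict(fit))                     # default: link scale (log-odds)
head(predict(fit, type = "response"))  # probability scale, as cost() expects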

R: looping to search for max of non-monotonic function

Refer to the R code below. The function someRfunction operates on a vector and returns a scalar value. The data are pairs (x, y), where x and y are vectors of length n, which may be large.
I want to find the value x* such that the result of someRfunction on the subset of y where x > x* is maximized. The function operates on y values and is non-monotonic in x*, so I need to evaluate it for every candidate x* (i.e. each element of x). Speed is not an issue if this is executed once, but the code will be executed many times in a simulation. Is there any way to make it more efficient/faster?
### x and y are vectors of length n
### sort x and y so that they are ordered by descending x
xord <- x[order(-x)]
yord <- y[order(-x)]

maxf <- -99999
maxcut <- NA
for (i in 1:n) {
  ### yi is the subvector of y that corresponds to y[x > x{i}],
  ### where x{i} is the (n-i+1)th order statistic of x
  yi <- yord[1:(i-1)]
  fxi <- someRfunction(yi)
  if (fxi > maxf) {
    maxf <- fxi
    maxcut <- xord[i]
  }
}
Thanks.
Edit: let someRfunction(yi)=t.test(yi)$statistic.
If you can say anything more about the function, particularly whether it is smooth and whether its gradient can be determined, you will get a better answer. At the moment the only speed increase will be modest: pre-allocate a vector to hold the results, drop the if-max clause, and then use which.max() on the vector. You might also want to look at the function optimx in the package optimx.
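A minimal sketch of that suggestion, using someRfunction = t.test(yi)$statistic from the edit (t.test() needs at least two observations, hence the loop starts at i = 3 so that yord[1:(i-1)] has length >= 2); the simulated x and y are stand-ins:

set.seed(1)
n <- 100
x <- rnorm(n)
y <- rnorm(n)

xord <- x[order(-x)]
yord <- y[order(-x)]

## Pre-allocate, fill in one pass, then locate the maximum
fvals <- rep(NA_real_, n)
for (i in 3:n) {
  fvals[i] <- t.test(yord[1:(i - 1)])$statistic
}
maxf   <- max(fvals, na.rm = TRUE)
maxcut <- xord[which.max(fvals)]  # which.max() skips the NAs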
