Consider the following ODE with boundary condition
y'[x] = (a - 1)*y[x]/x
y[1] = a*b
where a > 0, b > 0, and x > 0.
I want to solve this ODE numerically in R using the command ode from the R package deSolve.
This is, I want to calculate the solution y[x] for x>0
Questions: (i) I do not know how to include "x" in the denominator in the model equation. (ii) I also don't know how to specify the initial condition.
Disclaimer: I know this equation has an analytical solution, but I am trying to compare it with the numerical solution.
So far, I have unsuccessfully tried:
library(deSolve)
difeq <- function(x, y, parms) {
a <- parms[1]
b <- parms[2]
# model equations
dy <- y*(a - 1)/***x*** # HOW TO SPECIFY x?
# result
return( list(dy) )
}
params <- c(a = 2, b = 1)
init <- c(y = params[1]*params[2]) # HOW TO SPECIFY THE INITIAL CONDITION?
times <- seq(0,5, by = 0.1)
out <- ode(init, times, difeq, params)
plot(out)
Looking at the equation, I would say that x is equivalent to time t. Package deSolve uses time in the denominator because this is quite common, but it is not limited to time dependent systems. It can also be something else, e.g. a spatial component, just set x=t, that works exactly the same.
Note also to avoid 0 in the denominator.
library(deSolve)
difeq <- function(x, y, parms) {
a <- parms[1]
b <- parms[2]
# model equations
dy <- y * (a - 1) / x
# result
return(list(y=dy))
}
params <- c(a = 2, b = 1)
init <- c(y = unname(params[1]*params[2]))
times <- seq(1, 5, by = 0.1)
out <- ode(init, times, difeq, params)
## just rename "time" to "x"
colnames(out)[1] <- "x"
head(out)
To get the initial value at x -> 0 one may use an optimizer, run the system backwards (see separate answer) or use another solver (see CRAN Task view).
time y
1 1.0 2.0
2 1.1 2.2
3 1.2 2.4
4 1.3 2.6
5 1.4 2.8
6 1.5 3.0
7 1.6 3.2
8 1.7 3.4
Another part of the question was, how to specify the initial condition at x>0. As this is a somewhat additional question to the technical part how to include x in the denominator, I will give it as separate answer.
The approach uses a backward simulation. As negative time steps are not supported in deSolve a workaround can be used:
Instead of running the simulation from 1 to 0 backwards, we run it forward from -1 to 0.
Because of a pole at x=0, we use a small value close to zero.
In the model, we change the sign of the derivative, and as x appears also in the model, we multiply it with the sign too.
Note: this is explicitly included here to show the general approach for didactical reasons. In the special case here, signs will of course cancel out.
In the script below, we first start integration from the known initial value at x=1. Then we do the "backward" integration to estimate the initial value close to zero. Finally we run the model with the new initial value for the whole range of x and compare the results.
library(deSolve)
difeq <- function(x, y, parms, sign = 1) {
with(as.list(parms),{
dy <- sign * y * (a - 1) / (sign * x)
return(list(y=dy))
})
}
## initial simulation starting at x = 1
params <- c(a = 2, b = 1, sign = 1)
init <- c(y = params[["a"]] * params[["b"]])
times <- seq(1, 5, by = 0.1)
out1 <- ode(init, times, difeq, params)
## backwards simulation
close_to_zero <- 1e-16 # numerical precision is limited, do not decrease this more
times <- c(-1, -close_to_zero)
out0 <- ode(init, times, difeq, params, sign = -1)
## forward simulation starting close to zero
times <- c(close_to_zero, seq(0.1, 10, 0.1))
init <- c(y = out0[[nrow(out0), 2]])
cat("y[", close_to_zero,"] =", init, "\n")
out2 <- ode(init, times, difeq, params)#, hmax=0.01)
## comparison of the initial simulation with an extended time period
plot(out2, out1, lwd=c(1, 5), lty=c("solid", "dotted"), xlab="x")
What I am about to explain is kinda tricky, but I hope I can explain it clearly.
Suppose you have a function that does the Hodrick Prescott detrending, that is pretty much this:
The user picks up the λ value, and therefore for every λ it exists a series trend τ(λ).
Suppose you pick a number near 0 (on the positive side)
This number is V, for this case suppose V=0.0001278846
Then you want to compute this:
(I have the function that does)
But you want to find a λ so that F(λ) = V
How can I complete this?
I was trying to make a while statement, but could not state it correctly, then I made a for loop with an if statement to break the for loop when F(λ)-V = 0.
This is what my for loop looks like:
for(L in 1:3500){
F_ <- find_v(dataa, L)
if((F_-V)==0){
print(paste("The λ value for this series following Rule 1 is:", L))
break
}
cat(paste("The λ =",L,"has a (F-V) difference of:", (F_-V),"\n"))
where dataa is my data composed of 89 obs.
Using this for loop I see that (F-V) turns negative between L = 3276 and L = 3277.
Is there a better way to do it? Like optimization?
Because using the for loop it feels like I'm obtaining the optimal λ by the force.
Sorry for not getting my data or codes for the hodrick prescott detrending or the find_v function, they are way too long.
Since you are doing double optimization, consider the following:
The data
set.seed(0)
y <- rnorm(89)
The function to be optimized:
lfun <- function(tau, y, lambda){
n <- length(tau)
tt <- tau[-(1:2)] - 2 * tau[-c(1, n)] + head(tau, -2)
sum((y-tau)^2) + lambda *sum(tt^2)
}
The F function:
f_lambda <- function(lambda, y, V = 0){
tau <- optim(y,lfun,y = y, lambda = lambda, method = 'BFGS')$par
tt <- tail(tau,-2) - 2 * head(tau[-1], -1) + head(tau, -2)
sqrt((sum((y-tau)^2)/sum(tt^2) - V)^2)
}
Optimizing the F function:
optim(0.1, f_lambda, y = y, V=0.0001278846, method="Brent",lower=0, upper=100)
$par
[1] 0.003412824
$value
[1] 2.633131e-10
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
Now the lambda = 0.003412824 gives the desired V ie:
f_lambda(0.003412824, y)
[1] 0.0001278843
Which is very close to the V=0.0001278846 you started with.
I'm trying to simulate a Brownian bridge from Wiener process, but struggling with code.
Here is what i'm trying to do in math form:
B(t) = W (t) − tW (1)
It is important, that W(T) = 0, so that the process is pinned at the origin at both t=0 and t=T (should start and end with B(t)=B(T)= 0
Here I'm defining Wiener process:
n <- 1000
T <- 1
delta <- T/n
t <- seq(0,T, delta)
set.seed(17)
W1 <- cumsum(c(0,rnorm(n, 0, 1) * sqrt(delta)))
plot(t, W1 ,type = 'l',
xlab = 't', ylab = 'W(t)', xlim = c(0, T + delta))
Here I'm trying to build the Brownian bridge, but getting wrong result:
B <- W1 - t * W1[T] # The Brownian bridge from (0,0) to (1,target)
plot(times, B, type="l")
I expect B[1] == B[1001] to be [TRUE] and equal to 0, but in my case B[1001] is not equal to 0.
Any suggestions?
There's just one issue - in the definition of B. Instead you want
B <- W1 - t * W1[n + 1]
since the indexing n + 1 (i.e., choosing the last element corresponding to t = 1) refers to W1 as a vector rather than a process. Then indeed
B[1]
# [1] 0
B[1001]
# [1] 0
as expected. Note also though that using t and T as variable names is not advisable as both of them already have important roles; see ?t and ?T.
I am trying to write a code to solve the following problem (As stated in HW5 in the CalTech course Learning from Data):
In this problem you will create your own target function f
(probability in this case) and data set D to see how Logistic
Regression works. For simplicity, we will take f to be a 0=1
probability so y is a deterministic function of x. Take d = 2 so you
can visualize the problem, and let X = [-1; 1]×[-1; 1] with uniform
probability of picking each x 2 X . Choose a line in the plane as the
boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y
has to be -1) by taking two random, uniformly distributed points from
X and taking the line passing through them as the boundary between y =
±1. Pick N = 100 training points at random from X , and evaluate the
outputs yn for each of these points xn. Run Logistic Regression with
Stochastic Gradient Descent to find g, and estimate Eout(the cross
entropy error) by generating a sufficiently large, separate set of
points to evaluate the error. Repeat the experiment for 100 runs with
different targets and take the average. Initialize the weight vector
of Logistic Regression to all zeros in each run. Stop the algorithm
when |w(t-1) - w(t)| < 0:01, where w(t) denotes the weight vector at
the end of epoch t. An epoch is a full pass through the N data points
(use a random permutation of 1; 2; · · · ; N to present the data
points to the algorithm within each epoch, and use different
permutations for different epochs). Use a learning rate of 0.01.
I am required to calculate the nearest value to Eout for N=100, and the average number of epochs for the required criterion.
I wrote and ran the code but I'm not getting the right answers (as stated in the solutions, these are Eout is near 0.1 and the number of epochs is near 350). The required number of epochs for a delta w of 0.01 comes to far too small (around 10), leaving the error too big (around 2). I then tried to replace the criterion with |w(t-1) - w(t)| < 0.001 (rather than 0.01). Then, the average required number of epochs was about 250 and out of sample error was about 0.35.
Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.
library(pracma)
h<- 0 # h will later be updated to number of required epochs
p<- 0 # p will later be updated to Eout
C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error
d <- matrix(ncol=10000, nrow=1)
for(i in 1:10000){
C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for(g in 1:100){ # 100 runs of the experiment
x <- runif(2, min = -1, max = 1)
y <- runif(2, min = -1, max = 1)
fit = (lm(y~x))
t <- summary(fit)$coefficients[,1]
f <- function(x){ # Target function
t[2]*x + t[1]
}
A <- matrix(ncol=100, nrow=2) # Sample data
b <- matrix(ncol=100, nrow=1)
norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator
w <- c(0,0) # weights initialized to zero
for(i in 1:100){
A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
b[1, i] <- sign(A[2, i] - f(A[1, i]))
}
q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
l= 1
while(l < 1001){
E <- function(z){ # cross entropy error function
x = z[1]
y = z[2]
v = z[3]
return(log(1 + exp(-v*t(w)%*%c(x, y))))
}
err <- function(xn1, xn2, yn){ #gradient of error function
return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
}
e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
e[,1:100] = 0
perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
for(j in 1:100){ # One complete Epoch
r = A[,perm[j]] # pick the perm[j]th entry in A
s = b[perm[j]] # pick the perm[j]th entry in b
e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error
w = w - 0.01*e[,perm[j]] # update the weight vector accorng to the formula involving step size, gradient
}
q[,l] = w # the lth entry is the weight vector at the end of the lth epoch
if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm
break
}
l = l+1 # move to the next epoch
}
for(n in 1:10000){
p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
}
h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
I've been mystified by the R quantile function all day.
I have an intuitive notion of how quantiles work, and an M.S. in stats, but boy oh boy, the documentation for it is confusing to me.
From the docs:
Q[i](p) = (1 - gamma) x[j] + gamma
x[j+1],
I'm with it so far. For a type i quantile, it's an interpolation between x[j] and x [j+1], based on some mysterious constant gamma
where 1 <= i <= 9, (j-m)/n <= p <
(j-m+1)/ n, x[j] is the jth order
statistic, n is the sample size, and m
is a constant determined by the sample
quantile type. Here gamma depends on
the fractional part of g = np+m-j.
So, how calculate j? m?
For the continuous sample quantile
types (4 through 9), the sample
quantiles can be obtained by linear
interpolation between the kth order
statistic and p(k):
p(k) = (k - alpha) / (n - alpha - beta
+ 1),
where α and β are constants determined
by the type. Further, m = alpha + p(1
- alpha - beta), and gamma = g.
Now I'm really lost. p, which was a constant before, is now apparently a function.
So for Type 7 quantiles, the default...
Type 7
p(k) = (k - 1) / (n - 1). In this case, p(k) = mode[F(x[k])]. This is used by S.
Anyone want to help me out? In particular I'm confused by the notation of p being a function and a constant, what the heck m is, and now to calculate j for some particular p.
I hope that based on the answers here, we can submit some revised documentation that better explains what is going on here.
quantile.R source code
or type: quantile.default
You're understandably confused. That documentation is terrible. I had to go back to the paper its based on (Hyndman, R.J.; Fan, Y. (November 1996). "Sample Quantiles in Statistical Packages". American Statistician 50 (4): 361–365. doi:10.2307/2684934) to get an understanding. Let's start with the first problem.
where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/ n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.
The first part comes straight from the paper, but what the documentation writers omitted was that j = int(pn+m). This means Q[i](p) only depends on the two order statistics closest to being p fraction of the way through the (sorted) observations. (For those, like me, who are unfamiliar with the term, the "order statistics" of a series of observations is the sorted series.)
Also, that last sentence is just wrong. It should read
Here gamma depends on the fractional part of np+m, g = np+m-j
As for m that's straightforward. m depends on which of the 9 algorithms was chosen. So just like Q[i] is the quantile function, m should be considered m[i]. For algorithms 1 and 2, m is 0, for 3, m is -1/2, and for the others, that's in the next part.
For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the kth order statistic and p(k):
p(k) = (k - alpha) / (n - alpha - beta + 1), where α and β are constants determined by the type. Further, m = alpha + p(1 - alpha - beta), and gamma = g.
This is really confusing. What the documentation calls p(k) is not the same as the p from before. p(k) is the plotting position. In the paper, the authors write it as pk, which helps. Especially since in the expression for m, the p is the original p, and the m = alpha + p * (1 - alpha - beta). Conceptually, for algorithms 4-9, the points (pk, x[k]) are interpolated to get the solution (p, Q[i](p)). Each algorithm only differs in the algorithm for the pk.
As for the last bit, R is just stating what S uses.
The original paper gives a list of 6 "desirable properties for a sample quantile" function, and states a preference for #8 which satisfies all by 1. #5 satisfies all of them, but they don't like it on other grounds (it's more phenomenological than derived from principles). #2 is what non-stat geeks like myself would consider the quantiles and is what's described in wikipedia.
BTW, in response to dreeves answer, Mathematica does things significantly differently. I think I understand the mapping. While Mathematica's is easier to understand, (a) it's easier to shoot yourself in the foot with nonsensical parameters, and (b) it can't do R's algorithm #2. (Here's Mathworld's Quantile page, which states Mathematica can't do #2, but gives a simpler generalization of all the other algorithms in terms of four parameters.)
There are various ways of computing quantiles when you give it a vector, and don't have a known CDF.
Consider the question of what to do when your observations don't fall on quantiles exactly.
The "types" are just determining how to do that. So, the methods say, "use a linear interpolation between the k-th order statistic and p(k)".
So, what's p(k)? One guy says, "well, I like to use k/n". Another guy says, "I like to use (k-1)/(n-1)" etc. Each of these methods have different properties that are better suited for one problem or another.
The \alpha's and \beta's are just ways to parameterize the functions p. In one case, they're 1 and 1. In another case, they're 3/8 and -1/4. I don't think the p's are ever a constant in the documentation. They just don't always show the dependency explicitly.
See what happens with the different types when you put in vectors like 1:5 and 1:6.
(also note that even if your observations fall exactly on the quantiles, certain types will still use linear interpolation).
I believe the R help documentation is clear after the revisions noted in #RobHyndman's comment, but I found it a bit overwhelming. I am posting this answer in case it helps someone move quickly through the options and their assumptions.
To get a grip on quantile(x, probs=probs), I wanted to check out the source code. This too was trickier than I anticipated in R so I actually just grabbed it from a github repo that looked recent enough to run with. I was interested in the default (type 7) behavior, so I annotated that some, but didn't do the same for each option.
You can see how the "type 7" method interpolates, step by step, both in the code and also I added a few lines to print some important values as it goes.
quantile.default <-function(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE
, type = 7, ...){
if(is.factor(x)) { #worry about non-numeric data
if(!is.ordered(x) || ! type %in% c(1L, 3L))
stop("factors are not allowed")
lx <- levels(x)
} else lx <- NULL
if (na.rm){
x <- x[!is.na(x)]
} else if (anyNA(x)){
stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
}
eps <- 100*.Machine$double.eps #this is to deal with rounding things sensibly
if (any((p.ok <- !is.na(probs)) & (probs < -eps | probs > 1+eps)))
stop("'probs' outside [0,1]")
#####################################
# here is where terms really used in default type==7 situation get defined
n <- length(x) #how many observations are in sample?
if(na.p <- any(!p.ok)) { # set aside NA & NaN
o.pr <- probs
probs <- probs[p.ok]
probs <- pmax(0, pmin(1, probs)) # allow for slight overshoot
}
np <- length(probs) #how many quantiles are you computing?
if (n > 0 && np > 0) { #have positive observations and # quantiles to compute
if(type == 7) { # be completely back-compatible
index <- 1 + (n - 1) * probs #this gives the order statistic of the quantiles
lo <- floor(index) #this is the observed order statistic just below each quantile
hi <- ceiling(index) #above
x <- sort(x, partial = unique(c(lo, hi))) #the partial thing is to reduce time to sort,
#and it only guarantees that sorting is "right" at these order statistics, important for large vectors
#ties are not broken and tied elements just stay in their original order
qs <- x[lo] #the values associated with the "floor" order statistics
i <- which(index > lo) #which of the order statistics for the quantiles do not land on an order statistic for an observed value
#this is the difference between the order statistic and the available ranks, i think
h <- (index - lo)[i] # > 0 by construction
## qs[i] <- qs[i] + .minus(x[hi[i]], x[lo[i]]) * (index[i] - lo[i])
## qs[i] <- ifelse(h == 0, qs[i], (1 - h) * qs[i] + h * x[hi[i]])
qs[i] <- (1 - h) * qs[i] + h * x[hi[i]] # This is the interpolation step: assemble the estimated quantile by removing h*low and adding back in h*high.
# h is the arithmetic difference between the desired order statistic amd the available ranks
#interpolation only occurs if the desired order statistic is not observed, e.g. .5 quantile is the actual observed median if n is odd.
# This means having a more extreme 99th observation doesn't matter when computing the .75 quantile
###################################
# print all of these things
cat("floor pos=", c(lo))
cat("\nceiling pos=", c(hi))
cat("\nfloor values= ", c(x[lo]))
cat( "\nwhich floors not targets? ", c(i))
cat("\ninterpolate between ", c(x[lo[i]]), ";", c(x[hi[i]]))
cat( "\nadjustment values= ", c(h))
cat("\nquantile estimates:")
}else if (type <= 3){## Types 1, 2 and 3 are discontinuous sample qs.
nppm <- if (type == 3){ n * probs - .5 # n * probs + m; m = -0.5
} else {n * probs} # m = 0
j <- floor(nppm)
h <- switch(type,
(nppm > j), # type 1
((nppm > j) + 1)/2, # type 2
(nppm != j) | ((j %% 2L) == 1L)) # type 3
} else{
## Types 4 through 9 are continuous sample qs.
switch(type - 3,
{a <- 0; b <- 1}, # type 4
a <- b <- 0.5, # type 5
a <- b <- 0, # type 6
a <- b <- 1, # type 7 (unused here)
a <- b <- 1 / 3, # type 8
a <- b <- 3 / 8) # type 9
## need to watch for rounding errors here
fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b) # n*probs + m
j <- floor(nppm + fuzz) # m = a + probs*(1 - a - b)
h <- nppm - j
if(any(sml <- abs(h) < fuzz)) h[sml] <- 0
x <- sort(x, partial =
unique(c(1, j[j>0L & j<=n], (j+1)[j>0L & j<n], n))
)
x <- c(x[1L], x[1L], x, x[n], x[n])
## h can be zero or one (types 1 to 3), and infinities matter
#### qs <- (1 - h) * x[j + 2] + h * x[j + 3]
## also h*x might be invalid ... e.g. Dates and ordered factors
qs <- x[j+2L]
qs[h == 1] <- x[j+3L][h == 1]
other <- (0 < h) & (h < 1)
if(any(other)) qs[other] <- ((1-h)*x[j+2L] + h*x[j+3L])[other]
}
} else {
qs <- rep(NA_real_, np)}
if(is.character(lx)){
qs <- factor(qs, levels = seq_along(lx), labels = lx, ordered = TRUE)}
if(names && np > 0L) {
names(qs) <- format_perc(probs)
}
if(na.p) { # do this more elegantly (?!)
o.pr[p.ok] <- qs
names(o.pr) <- rep("", length(o.pr)) # suppress <NA> names
names(o.pr)[p.ok] <- names(qs)
o.pr
} else qs
}
####################
# fake data
x<-c(1,2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,7,99)
y<-c(1,2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,7,9)
z<-c(1,2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,7)
#quantiles "of interest"
probs<-c(0.5, 0.75, 0.95, 0.975)
# a tiny bit of illustrative behavior
quantile.default(x,probs=probs, names=F)
quantile.default(y,probs=probs, names=F) #only difference is .975 quantile since that is driven by highest 2 observations
quantile.default(z,probs=probs, names=F) # This shifts everything b/c now none of the quantiles fall on an observation (and of course the distribution changed...)... but
#.75 quantile is stil 5.0 b/c the observations just above and below the order statistic for that quantile are still 5. However, it got there for a different reason.
#how does rescaling affect quantile estimates?
sqrt(quantile.default(x^2, probs=probs, names=F))
exp(quantile.default(log(x), probs=probs, names=F))