Minimum across cumulative sums with different starting indices - r

Question: Given a vector, I want to know the minimum of a series of cumulative sums, where each cumulative sum is calculated for an increasing starting index of the vector and a fixed ending index (1:5, 2:5, ..., 5:5). Specifically, I am wondering whether this can be calculated without a for() loop, and whether there is a name for this algorithm/calculation. I am working in R.
Context: The vector of interest contains a time series of pressure changes. I want to know the largest (or smallest) net change in pressure across a range of starting points but with a fixed end point.
Details + Example:
#Example R code
diffP <- c(0, -1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0)
minNet1 <- min(cumsum(diffP))
minNet1 #over the whole vector, the "biggest net drop" (largest magnitude with negative sign) is -1.
#However, if I started a cumulative sum in the second half of diffP, I would get a net pressure change of -2.
hold <- list()
nDiff <- length(diffP)
for(j in 1:nDiff){
hold[[j]] <- cumsum(diffP[j:nDiff])
}
answer <- min(unlist(hold)) #this gives the answer that I ultimately want
Hopefully my example above helps articulate my question. answer contains the correct result, but I'd rather do this without a for() loop in R. Is there a better way to do this calculation, or a name I can put to it?

This is known as the maximum subarray problem (http://en.wikipedia.org/wiki/Maximum_subarray_problem) and is a typical interview question!
Most people (me included) would solve it with an O(n^2) algorithm, but there is in fact a much better algorithm with O(n) complexity. Here is an R implementation of Kadane's algorithm from the link above:
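max_subarray <- function(A) {
  max_ending_here <- 0  # largest sum of a subarray ending at the current element (empty subarray counts as 0)
  max_so_far <- 0       # largest such sum seen so far
  for (x in A) {
    max_ending_here <- max(0, max_ending_here + x)
    max_so_far <- max(max_so_far, max_ending_here)
  }
  max_so_far
}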
Since you are looking for the minimum subarray sum, negate both the input and the result:
-max_subarray(-diffP)
[1] -2
(Or you can also rewrite the function above and replace max with min everywhere.)
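A minimal sketch of that min-instead-of-max rewrite; the name min_subarray is only a hypothetical label. Like max_subarray() above, it allows the empty subarray, so its result is never greater than 0.

```r
# Kadane's algorithm with max replaced by min: smallest subarray sum.
# The empty subarray (sum 0) is allowed, as in max_subarray().
min_subarray <- function(A) {
  min_ending_here <- 0  # smallest sum of a subarray ending at the current element
  min_so_far <- 0       # smallest such sum seen so far
  for (x in A) {
    min_ending_here <- min(0, min_ending_here + x)
    min_so_far <- min(min_so_far, min_ending_here)
  }
  min_so_far
}

diffP <- c(0, -1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0)
min_subarray(diffP)  # -2, the same as -max_subarray(-diffP)
```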
Note that, yes, the implementation still uses a for loop, but since the algorithm is O(n) (the number of operations is of the same order as length(diffP)), it should be quite fast. It also needs only constant extra memory, since it stores and updates just a couple of variables, whereas the loop in the question builds a list of n(n+1)/2 partial sums.
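If avoiding an explicit loop is the main goal, the same minimum can also be written with vectorized base R functions. This is a different technique from Kadane's loop (a prefix-sum identity): the sum of diffP[j:k] equals S[k] - S[j-1], where S is the cumulative sum with a leading 0, so for each end index k the most negative sum is S[k] minus the running maximum of the earlier prefix sums. A minimal sketch, with min_subarray_sum as a hypothetical name:

```r
# Smallest sum over all non-empty contiguous subarrays, without an explicit loop.
# S holds prefix sums with a leading 0, so sum(x[j:k]) == S[k + 1] - S[j].
min_subarray_sum <- function(x) {
  S <- c(0, cumsum(x))
  n <- length(x)
  # For each end k, subtract the largest prefix sum that ends before position k
  min(S[-1] - cummax(S[-(n + 1)]))
}

diffP <- c(0, -1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0)
min_subarray_sum(diffP)  # -2
```

Unlike max_subarray() above, which allows an empty subarray, this version only considers non-empty ranges, so it agrees with the question's answer even when every element is positive (for c(3, 1, 2) both give 1).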

Related

Optimization problems linked together by time

Let's say I'm trying to minimize a function f1(x), with x a vector. This is a classic optimization problem, and I get a solution, say x_opt = (0, 700, 0, 1412, 0, 5466).
Now I have another function f2(x) to minimize, and I know this function relates to the same individual, so its x_opt should be close to the first one. So if I get (700, 0, 0, 1454, 0, 5700), I won't be happy with the solution, but if the first one had been (700, 0, ...), or if the second one were (0, 700, ...), I'd be happy.
Is it OK to minimize f1(x1) + f2(x2) + lambda * || x2 - x1 ||?
Which norm should I use, and should I set lambda to 1?
What if I have more than two functions, and I know f1 and f2 are more closely related than f3 and f2, or f3 and f1?
Is there any literature on this topic, or a name for it? I don't even know where to look.
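One way to experiment with the penalized joint objective from the question is base R's optim(). This is only an illustration under loud assumptions: f1 and f2 are hypothetical quadratic stand-ins centred on the two example vectors, and the penalty uses the squared Euclidean norm (smooth, so a gradient-based method such as BFGS applies) rather than the plain norm written above. A larger lambda pulls x1 and x2 closer together.

```r
# Hypothetical stand-ins for f1 and f2: quadratic bowls centred on the two
# example solutions from the question (not the real objectives).
a <- c(0, 700, 0, 1412, 0, 5466)
b <- c(700, 0, 0, 1454, 0, 5700)
f1 <- function(x) sum((x - a)^2)
f2 <- function(x) sum((x - b)^2)

# Joint objective over par = c(x1, x2) with a squared-L2 coupling penalty
joint <- function(par, lambda) {
  n <- length(par) / 2
  x1 <- par[1:n]
  x2 <- par[(n + 1):(2 * n)]
  f1(x1) + f2(x2) + lambda * sum((x2 - x1)^2)
}

# Analytic gradient of joint(), passed to optim() for a more reliable BFGS run
joint_grad <- function(par, lambda) {
  n <- length(par) / 2
  x1 <- par[1:n]
  x2 <- par[(n + 1):(2 * n)]
  c(2 * (x1 - a) - 2 * lambda * (x2 - x1),
    2 * (x2 - b) + 2 * lambda * (x2 - x1))
}

# lambda is forwarded through optim's ... to both joint() and joint_grad()
fit <- optim(c(a, b), joint, joint_grad, lambda = 1, method = "BFGS")
x1_opt <- fit$par[1:6]
x2_opt <- fit$par[7:12]
```

For these quadratic stand-ins the optimum is known in closed form (x1 = a + lambda*d, x2 = b - lambda*d with d = (b - a)/(1 + 2*lambda)), which makes it easy to see how lambda trades fit against closeness.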

Solving a function with an iterative method

I am trying to solve a function whose values are given implicitly by two equations, so I have to use an iterative method.
The values already given are V_DCL, barrier, us_r and time.
My code for the function is:
Test <- function(V_DCL, barrier, us_r, time, vola_A) {
  # V_A depends on d_1 and d_2, which in turn depend on V_A,
  # so this direct evaluation fails: d_1 is used before it is defined
  V_A <- V_DCL/(pnorm(d_1, 0, 1)) + (barrier*exp(us_r*time)*pnorm(d_2, 0, 1))/(pnorm(d_1, 0, 1))
  d_1 <- (log(V_A/barrier)+(us_r + 0.5*vola_A^2)*time)/(vola_A*sqrt(time))
  d_2 <- d_1 - vola_A*sqrt(time)
  outputs <- list(V_A, d_1, d_2, vola_A)
  return(outputs)
}
I need to get a value for V_A and vola_A.
This function is similar to the Black-Scholes formula, but I have a value for the liabilities rather than for the assets, so I rearranged it.
So far I know that I have to make an initial guess for vola_A, which then needs to be updated until all equations are consistent.
I have already looked at base R's repeat() and at the simecol package, but I could not figure out how to apply them to my code.
Can you give me some ideas? Thank you.
Edit (additional information):
The given value for us_r is 0.05; time is the time horizon, and I will work with 1 period for now. The given value for barrier is 2.683782e+13 and for V_DCL it is 4.732741e+11.
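Only one equation linking V_A and vola_A is shown above, so the sketch below does not solve for both unknowns. It only covers the inner step of such a scheme: for a fixed guess of vola_A (0.2 here, purely hypothetical), it finds V_A from the first equation with uniroot() instead of a hand-written repeat() loop. The equation is multiplied through by pnorm(d_1) to avoid dividing by a probability that can become vanishingly small, and the bracket [barrier/1000, 10*barrier] is an assumption chosen so that the residual changes sign for these inputs.

```r
# Given inputs from the question
V_DCL   <- 4.732741e+11
barrier <- 2.683782e+13
us_r    <- 0.05
time    <- 1

# Residual of the V_A equation, multiplied through by pnorm(d_1):
# V_A * N(d_1) - barrier * exp(us_r * time) * N(d_2) - V_DCL = 0
va_residual <- function(V_A, vola_A) {
  d_1 <- (log(V_A / barrier) + (us_r + 0.5 * vola_A^2) * time) / (vola_A * sqrt(time))
  d_2 <- d_1 - vola_A * sqrt(time)
  V_A * pnorm(d_1) - barrier * exp(us_r * time) * pnorm(d_2) - V_DCL
}

# Solve for V_A given a fixed asset volatility (bracket is an assumption)
solve_V_A <- function(vola_A) {
  uniroot(va_residual, lower = barrier / 1000, upper = 10 * barrier,
          vola_A = vola_A, tol = 1e-6 * barrier)$root
}

V_A <- solve_V_A(vola_A = 0.2)  # hypothetical volatility guess
```

An outer loop (for example repeat() or a second uniroot()) would then update vola_A, but that needs the second equation, which is not given here.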

How to solve a matrix equation in R

My friend and I (both non-R experts) are trying to solve a matrix equation in R. We have matrix y which is defined by:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
This matrix models how students in our school progress to the next year. Multiplying it by a vector containing the number of students in each year gives the number of students in each year one year later.
With the function
sumfun<-function(x,start,end){
return(sum(x[start:end]))
}
we add up the number of students in each year to get the total number of students in our school. We want to fill in the vector (which we multiply element-wise with our matrix) with the number of students currently in the school, and have the number of new students (the first entry of the vector) as our variable X.
For example:
sumfun(colSums(y*c(x,200,178,180,201,172,0,0,200,194,0,0)),2,6)
We want to set this expression equal to 1000, the maximum number of students our school building can house. By doing this, we can calculate how many new students our school can accept. We have no idea how to do this. We would guess X is somewhere between 100 and 300. We would be very grateful if somebody could help us with this!
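The expression colSums(y*c(...)) relies on R recycling the length-12 vector down the columns of y, so row i is scaled by the i-th entry; the column sums are then the same as the row-vector-times-matrix product v %*% y. A small check on a made-up 4x4 matrix (hypothetical data, not the school matrix):

```r
set.seed(42)
m <- matrix(runif(16), nrow = 4)  # made-up transition-style matrix
v <- c(10, 20, 30, 40)            # made-up head counts per year

# Element-wise product recycles v column by column, scaling row i by v[i]
via_colsums <- colSums(m * v)
# Same thing written as a row vector times a matrix
via_product <- drop(v %*% m)

all.equal(via_colsums, via_product)  # TRUE
```

Because of this, the total for years 2-6 is a linear function of X, which is why a one-dimensional root finder or a simple linear solve is enough.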
I'm not familiar with R, but I can guide you through the main process of solving this matrix equation. Assume that your matrix is called P, and let the current student vector be called s0:
s0 = {x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0};
Note that we leave x undefined, as we want to solve for this variable later. Even though x is unknown, we can still multiply s0 by P. We call this new vector s1.
s1 = s0.P = {0.003*x, 2.34 + 0.977*x, 192.593, 173.326, 177.355, 192.113, 0, 0, 0, 0, 0, 192.749 + 0.02*x}
This looks right: of the student years 2-6, only year 2 is affected by the number of new students (x). So if we now sum over years 2-6 as in your example, we find that the sum is:
s1[2:6] = 737.727 + 0.977*x
All that is left is to solve the simple equation s1[2:6] == 1000:
s1[2:6] == 1000
737.727 + 0.977*x == 1000
x = 268.447
Let me know if this is correct! This was all done in Mathematica.
The following code shows how to do this in R:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
sumfun<-function(x,start,end){
return(sum(x[start:end]))
}
students <- function(x) {
students = sumfun(colSums(y*c(x,200,178,180,201,172,0,0,200,194,0,0)),2,6);
return(students - 1000);
}
uniroot(students, lower=100, upper=300)$root;
The function uniroot finds a value at which a function equals 0. So if you define a function that returns the number of students for a given x minus 1000, it will find the x for which the number of students is 1000.
Note: this only describes the short-term behavior of the total number of students. To keep the number of students at 1000 in the long term, other equations must be solved.
I would suggest probing various x values and looking at the resulting totals. From that, you can see the trend and use it to work out the answer. Here is an example:
# Sample data
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
# function f returns the total number of students in years 2-6 for a given 'x'
f <- function(x) {
z <- c(x,200,178,180,201,172,0,0,200,194,0,0)
sum(t(y[,2:6]) %*% z)
}
# Let's see the plot
px <- 1:1000
py <- sapply(px,f) # will calculate the total number of students for each x from 1 to 1000
plot(px,py,type='l',lty=2)
# Analyze the matrices (the analysis is not shown here) and reproduce the linear trend
lines(px,f(0)+sum(y[1,2:6])*px,col='red',lty=4)
# obtain the answer using the linear trend
Xstudents <- (1000-f(0))/sum(y[1,2:6])
floor(Xstudents)

Generate random number with given probability

I am looking for a vectorized R solution to the following MATLAB problem:
Generate random number with given probability matlab
I'm able to generate a random event outcome from a uniform random number and the given probability of each single event (the probabilities sum to 100%, and only one event may happen) with:
sum(runif(1,0,1) >= cumsum(wdOff))
However, this only takes a single uniform random number, whereas I want it to take a vector of uniform random numbers and output the corresponding event for each entry.
So I'm looking for the R equivalent of Oleg's vectorized MATLAB solution (from the comments on the MATLAB question):
"Vectorized solution: sum(bsxfun(@ge, r, cumsum([0, prob])), 2) where r is a column vector and prob a row vector. – Oleg"
Tell me if you need more information.
You could just do a weighted random sample, without worrying about your cumsum method:
sample(c(1, 2, 3), size = 100, replace = TRUE, prob = c(0.5, 0.1, 0.4))
If you already have the uniform random numbers, you could also do:
x <- runif(10, 0, 1)
as.numeric(cut(x, breaks = c(0, 0.5, 0.6, 1)))
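For a more literal R counterpart of Oleg's bsxfun expression, outer() can play the role of bsxfun, and findInterval() performs the same bin lookup in one call. A sketch with example inputs (prob and r are illustrative values):

```r
prob <- c(0.5, 0.1, 0.4)  # event probabilities, summing to 1
r <- runif(10)            # vector of uniform random numbers

edges <- cumsum(c(0, prob))  # 0, 0.5, 0.6, 1

# Analogue of sum(bsxfun(@ge, r, cumsum([0, prob])), 2):
# compare every r against every edge, then count the TRUEs in each row
events_outer <- rowSums(outer(r, edges, ">="))

# Equivalent bin lookup without building the comparison matrix
events_interval <- findInterval(r, edges)

identical(as.integer(events_outer), events_interval)  # TRUE
```

findInterval() avoids building the length(r) x length(edges) comparison matrix, which matters when r is long.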

Square Root of a Singular Matrix in R

I need to compute A to the power of -1/2, which means the square root of the inverse of the matrix.
If A is singular, the Moore-Penrose generalized inverse is computed with the ginv function from the MASS package; otherwise the regular inverse is computed with the solve function.
Matrix A is defined below:
A <- structure(c(604135780529.807, 0, 58508487574887.2, 67671936726183.9,
0, 0, 0, 1, 0, 0, 0, 0, 58508487574887.2, 0, 10663900590720128,
10874631465443760, 0, 0, 67671936726183.9, 0, 10874631465443760,
11315986615387788, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), .Dim = c(6L,
6L))
I check singularity by comparing the rank with the dimension (rankMatrix is from the Matrix package):
library(Matrix)
rankMatrix(A) == nrow(A)
The above code returns FALSE, so I have to use ginv (from MASS) to get the inverse:
library(MASS)
A_inv <- ginv(A)
The square-root of the inverse matrix is computed with the sqrtm function from the expm package.
library(expm)
sqrtm(A_inv)
The function returns the following error:
Error in solve.default(X[ii, ii] + X[ij, ij], S[ii, ij] - sumU) :
Lapack routine zgesv: system is exactly singular
So how can we compute the square root in this case? Please note that matrix A is not always singular so we have to provide a general solution for the problem.
Your question relates to two distinct problems:
Inverse of a matrix
Square root of a matrix
Inverse
The inverse does not exist for singular matrices. In some applications, the Moore-Penrose or some other generalised inverse may be taken as a suitable substitute for the inverse. However, note that floating-point computations incur rounding errors in most cases, and these errors may make a singular matrix appear regular to the computer or vice versa.
If A always has the block structure of the matrix you give, I suggest considering only its non-diagonal block
A3 = A[ c( 1, 3, 4 ), c( 1, 3, 4 ) ]
A3
[,1] [,2] [,3]
[1,] 6.041358e+11 5.850849e+13 6.767194e+13
[2,] 5.850849e+13 1.066390e+16 1.087463e+16
[3,] 6.767194e+13 1.087463e+16 1.131599e+16
instead of all of A, for better efficiency and fewer rounding issues. The remaining diagonal entries equal to 1 stay 1 in the inverse square root, so there is no need to clutter the calculation with them. To get an impression of the impact of this simplification, note that R can calculate
A3inv = solve(A3)
while it could not calculate
Ainv = solve(A)
But we will not need A3inv, as will become evident below.
Square root
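As a general rule, not every matrix has a square root. Every invertible (complex) matrix has one, but a singular matrix may have none (for example, the 2×2 Jordan block with eigenvalue 0), and the eigendecomposition recipe used below additionally requires the matrix to have a diagonal Jordan normal form (https://en.wikipedia.org/wiki/Jordan_normal_form), i.e. to be diagonalizable. Hence, there is no truly general solution of the problem as you require.
Fortunately, just as “most” (real or complex) matrices are invertible, “most” (real or complex) matrices have a diagonal complex Jordan normal form; in particular, real symmetric matrices such as A3 always do. In this case, the Jordan normal form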
A3 = T·J·T⁻¹
can be calculated in R as follows (Diagonal() comes from the Matrix package):
library(Matrix)
X = eigen(A3)
T = X$vectors
J = Diagonal( x=X$values )
To test this recipe, compare
Tinv = solve(T)
T %*% J %*% Tinv
with A3. They should match (up to rounding errors) if A3 has a diagonal Jordan normal form.
Since J is diagonal, its square root is simply the diagonal matrix of the square roots
Jsqrt = Diagonal( x=sqrt( X$values ) )
so that Jsqrt·Jsqrt = J. Moreover, this implies
(T·Jsqrt·T⁻¹)² = T·Jsqrt·T⁻¹·T·Jsqrt·T⁻¹ = T·Jsqrt·Jsqrt·T⁻¹ = T·J·T⁻¹ = A3
so that in fact we obtain
√A3 = T·Jsqrt·T⁻¹
or in R code
A3sqrt = T %*% Jsqrt %*% Tinv
To test this, calculate
A3sqrt %*% A3sqrt
and compare with A3.
Square root of the inverse
The square root of the inverse (or, equally, the inverse of the square root) can be calculated easily once a diagonal Jordan normal form has been calculated. Instead of J use
Jinvsqrt = Diagonal( x=1/sqrt( X$values ) )
and calculate, analogously to above,
A3invsqrt = T %*% Jinvsqrt %*% Tinv
and observe
A3·A3invsqrt² = … = T·(J/√J/√J)·T⁻¹ = 1
the unit matrix so that A3invsqrt is the desired result.
In case A3 is not invertible, a generalised inverse (not necessarily the Moore-Penrose one) can be calculated by replacing all undefined entries in Jinvsqrt by 0, but as I said above, this should be done with suitable care in the light of the overall application and its stability against rounding errors.
If A3 does not have a diagonal Jordan normal form, the eigendecomposition above is not valid (and if A3 is also singular, it may have no square root at all), so the above formulas will yield some other result. To avoid running into this case through bad luck, it is best to test whether
A3invsqrt %*% A3 %*% A3invsqrt
is close enough to what you would consider an identity matrix (this only applies if A3 was invertible in the first place).
PS: Note that you can prefix a sign ± for each diagonal entry of Jinvsqrt to your liking.
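Because the A3 block here is symmetric, eigen(..., symmetric = TRUE) returns orthonormal eigenvectors, so T⁻¹ is simply t(T) and no solve() is needed. The sketch below packages the recipe above under that assumption (symmetric, positive semidefinite input); the function name inv_sqrt_sym and the relative tolerance for treating eigenvalues as zero are hypothetical choices, with near-zero eigenvalues mapped to 0 as in the generalised-inverse remark above. The tolerance is a judgement call: with entries spanning 1 to about 1e16, applying a relative cutoff to all of A would also zero out the unit diagonal entries, which is another reason to work with A3.

```r
# Inverse square root of a symmetric positive semidefinite matrix via
# A = V diag(lambda) t(V), where V is orthogonal so its inverse is t(V).
# Eigenvalues at or below tol * max|eigenvalue| are treated as 0, and their
# reciprocal square roots are set to 0 (a generalised inverse square root).
inv_sqrt_sym <- function(A, tol = sqrt(.Machine$double.eps)) {
  stopifnot(isSymmetric(A))
  e <- eigen(A, symmetric = TRUE)
  lambda <- e$values
  cutoff <- tol * max(abs(lambda))
  if (any(lambda < -cutoff)) stop("matrix has clearly negative eigenvalues")
  inv_sqrt <- numeric(length(lambda))
  keep <- lambda > cutoff
  inv_sqrt[keep] <- 1 / sqrt(lambda[keep])
  e$vectors %*% diag(inv_sqrt, nrow = length(lambda)) %*% t(e$vectors)
}

# Example on a small symmetric positive definite matrix
S <- inv_sqrt_sym(matrix(c(4, 1, 1, 3), 2))
```

Applied to A3 from the question, inv_sqrt_sym(A3) %*% A3 %*% inv_sqrt_sym(A3) can then be compared with diag(3), which is the check suggested above.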
