Using vctrs in matrices - r

I'm experimenting with the vctrs package. My actual use-case is in relevant aspects similar to the rational class implemented in the helpful S3 vectors article on the vctrs homepage, in that it uses rcrd for paired data. I'll use that for my reprex for clarity. (EDIT: I am not, however, specifically interested in rationals.) Let me paste the relevant parts first:
library(vctrs)
library(zeallot)
new_rational <- function(n = integer(), d = integer()) {
vec_assert(n, ptype = integer())
vec_assert(d, ptype = integer())
new_rcrd(list(n = n, d = d), class = "vctrs_rational")
}
rational <- function(n, d) {
c(n, d) %<-% vec_cast_common(n, d, .to = integer())
c(n, d) %<-% vec_recycle_common(n, d)
new_rational(n, d)
}
format.vctrs_rational <- function(x, ...) {
n <- field(x, "n")
d <- field(x, "d")
out <- paste0(n, "/", d)
out[is.na(n) | is.na(d)] <- NA
out
}
vec_ptype_abbr.vctrs_rational <- function(x, ...) "rtnl"
vec_ptype_full.vctrs_rational <- function(x, ...) "rational"
An example of using this:
(x <- rational(1, 1:15))
#> <rational[15]>
#> [1] 1/1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 1/11 1/12 1/13 1/14 1/15
My problem arises when trying to use a class like this in a matrix:
matrix(x, ncol = 5, nrow = 3)
#> Warning in matrix(x, ncol = 5, nrow = 3): data length [2] is not a sub-multiple
#> or multiple of the number of rows [3]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
#> [2,] Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
#> [3,] Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
Created on 2020-06-05 by the reprex package (v0.3.0)
I was hoping to get a 3-by-5 matrix with each cell containing one value from x, as would have happened if x had been a "normal" vector. Instead, I get a 3-by-5 matrix of lists, where vctrs tries to make alternating rows contain n and d values, respectively.
My question, therefore, is is it possible to get vctrs to work with matrices in the "expected" manner for a situation like this, and if so, how? By experimenting, I got the sense that this might have to do with implementing dim.rational and `dim<-.rational`, but I couldn't make it work.
EDIT: If the desired matrix is not clear (as suggested in the comments), I would like a matrix object somewhat akin to the following, which I've edited by hand:
(m <- matrix(x, ncol = 5, nrow = 3))
#> <rational[15]>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1/1 1/4 1/7 1/10 1/13
#> [2,] 1/2 1/5 1/8 1/11 1/14
#> [3,] 1/3 1/6 1/9 1/12 1/15
Such that normal matrix operations would work on m, e.g.
m[1,]
#> <rational[5]>
#> 1/1 1/4 1/7 1/10 1/13

The whole design of the rational class seems built on preserving its type safety, and hiding implementation from users, which I can see would be necessary to get it to work consistently, but this means that you can't expect it to play nicely with R's default S3 methods.
The help file for vctrs specifically says
dims(), dims<-, dimnames(), dimnames<-, levels(), and levels<- methods throw errors.
This suggests that the authors of vctrs didn't think it was a great base on which to build matrix methods.
In any case, I wouldn't be in such a hurry to try to get it into a matrix, since you can't do anything with it once it's there: there are no arithmetic methods available to you:
x + 2
#> Error: <rational> + <double> is not permitted
#> Run `rlang::last_error()` to see where the error occurred.
x * 2
#> Error: <rational> * <double> is not permitted
#> Run `rlang::last_error()` to see where the error occurred.
x + x
#> Error: <rational> + <rational> is not permitted
#> Run `rlang::last_error()` to see where the error occurred.
So you would need to define the arithmetic methods first. Before you even do that, you need $ accessors for the numerators and denominators, an is.rational function to check the type before attempting arithmetic, a function to find the greatest common denominator, and a function to simplify your rationals based on it.
`$.vctrs_rational` <- function(vec, symb) unclass(vec)[[as.character(symb)]]
is.rational <- function(num) class(num)[1] == "vctrs_rational"
gcd <- function(x, y) ifelse(x %% y, gcd(y, x %% y), y)
simplify <- function(num) {
common <- gcd(num$n, num$d)
rational(num$n / common, num$d/common)
}
So now you can do
x$n
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
x$d
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
is.rational(x)
#> [1] TRUE
And now write the arithmetic functions. For example, here is an implementation of basic arithmetic to cover numeric and rational types:
Ops.vctrs_rational <- function(vec, num)
{
if(!is.rational(vec)) {tmp <- vec; vec <- num; num <- tmp; }
if(.Generic == '*'){
if(is.rational(num)) return(simplify(rational(vec$n * num$n, vec$d * num$d)))
else return(simplify(rational(vec$n * 2, vec$d)))
}
else if (.Generic == '/'){
if(is.rational(num)) return(vec * rational(num$d, num$n))
else return(vec * rational(1, num))
}
else if (.Generic == '+'){
if(is.rational(num)){
new_n <- vec$n * (vec$d * num$d)/vec$d + num$n * (vec$d * num$d)/num$d
return(simplify(rational(new_n, vec$d * num$d)))
}
else return(simplify(rational(num * vec$d + vec$n, vec$d)))
}
else if (.Generic == '-'){
if(is.rational(num)) return(vec + rational(-num$n, num$d))
else return(vec + (-num))
}
else if (.Generic == '^'){
if(is.rational(num) | num < 0) stop("fractional and negative powers not supported")
return(simplify(rational(vec$n ^ num, vec$d ^ num)))
}
}
This now allows you to do, for example:
x * 3
#> <rational[15]>
#> [1] 3/1 3/2 1/1 3/4 3/5 1/2 3/7 3/8 1/3 3/10 3/11 1/4 3/13 3/14 1/5
x + x
#> <rational[15]>
#> [1] 2/1 1/1 2/3 1/2 2/5 1/3 2/7 1/4 2/9 1/5 2/11 1/6 2/13 1/7 2/15
(2 + x)^2 / (3 * x + 1)
#> <rational[15]>
#> [1] 3/1 25/8 49/15 27/8 121/35 169/48 25/7 289/80
#> [9] 361/99 147/40 529/143 625/168 243/65 841/224 961/255
Trying to use matrix() itself directly is probably not going to work, since matrix works by converting to a base vector and then calling C code. This strips out class information.
That means you need to define a separate rational_matrix class, which in turn would benefit from a supporting rational_vector class. We can then define specific format and print methods:
as.vector.vctrs_rational <- function(x, ...) {
n <- x$n/x$d
attr(n, "denom") <- x$d
attr(n, "numerator") <- x$n
class(n) <- "rational_attr"
n
}
rational_matrix <- function(data, nrow = 1, ncol = 1,
byrow = FALSE, dimnames = NULL){
d <- as.vector(data)
m <- .Internal(matrix(d, nrow, ncol, byrow, dimnames, missing(nrow),
missing(ncol)))
m_dim <- dim(m)
attributes(m) <- attributes(d)
dim(m) <- rev(m_dim)
class(m) <- c("rational_matrix", "matrix")
m
}
format.rational_matrix <- function(x) {
return(paste0(attr(x, "numerator"), "/", attr(x, "denom")))
}
print.rational_matrix <- function(x)
{
print(matrix(format(x), nrow = dim(x)[2]), quote = FALSE)
}
Finally, you need to overwrite matrix() to make it an S3 method, being sure you first copy the function as matrix.default
matrix.default <- matrix
matrix <- function(data = NA, ...) UseMethod("matrix")
matrix.vctrs_rational <- function(data, ...) rational_matrix(data, ...)
So now you can do:
matrix(x, nrow = 5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1/1 1/4 1/7 1/10 1/13
#> [2,] 1/2 1/5 1/8 1/11 1/14
#> [3,] 1/3 1/6 1/9 1/12 1/15
rational_matrix(x + 5, nrow = 3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 6/1 21/4 36/7 51/10 66/13
#> [2,] 11/2 26/5 41/8 56/11 71/14
#> [3,] 16/3 31/6 46/9 61/12 76/15
rational_matrix(x + x, nrow = 5)
#> [,1] [,2] [,3]
#> [1,] 2/1 1/3 2/11
#> [2,] 1/1 2/7 1/6
#> [3,] 2/3 1/4 2/13
#> [4,] 1/2 2/9 1/7
#> [5,] 2/5 1/5 2/15
However, to get this to work we had to add extra classes with attributes anyway, so my feeling is that if you want a rational class that works with matrices etc, you should do it in native S3 or one of the other object-oriented approaches available in R rather than using the vctrs package.
It's also worth saying that the above class is far from production-ready, since you would need to add in methods to test equality / inequality, methods to describe the matrix operations, an ability to convert to decimal, plotting methods, etc.

Related

Sum of correlation matrix convergence

Assume a correlation matrix P with diagonal of zero. I want to determine the order n where the sum of all the correlation matrices orders would converge i.e. diag(3)+ P + P%^%2 + P%^%3 + ... + P%^%n would converge meaning the L1 norm drops below a tol. I looked into How to find when a matrix converges with a loop but this doens't do it for me, since it doesn't keep the orders, nor it sums them up. I can do it in a really lengthy and lousy way with for loops and all but I don't want to, since I have a big df with many time windows, so I'm looking for something efficient. Thanks!
P <- matrix(c(0,0.1,0.8,0.1,0,-0.7,0.8,-0.7,0), nrow = 3, ncol = 3, byrow = TRUE)
Some notes: The %^% operator is from expm package. To sum the matrices I used matrix(mapply(sum, diag(3), P, P%^%2, P%^%3, MoreArgs=list(na.rm=T)), ncol=3).
x %^% n computes the nth power of x efficiently, but it is inefficient to compute x %^% i for all i from 0 to n, because each x %^% i requires O(log(i)) matrix multiplications.
In general, the most efficient way to compute all of the powers of x up to the nth is recursive multiplication by x, possibly taking advantage of the diagonalizability of x.
The difference is nontrivial for large n: whereas
x2 <- x %^% 2
x3 <- x %^% 3
x4 <- x %^% 4
## and so on
requires O(log(n!)) = O(n * log(n)) matrix multiplications,
x2 <- x %*% x
x3 <- x2 %*% x
x4 <- x3 %*% x
## and so on
requires just O(n).
Here is a function that recursively computes the powers of a matrix x and their sum until it encounters a power whose 1-norm is less than tol. It begins by checking that the spectral radius of x is less than 1, which is a necessary and sufficient condition for convergence of the norm of x %^% n to 0 and thus a necessary condition for convergence of the power series. It does not attempt to diagonalize x, which would simplify computation of the power series but complicate computation of norms.
f <- function(x, tol = 1e-06, nmax = 1e+03) {
stopifnot(max(abs(eigen(x, only.values = TRUE)$values)) < 1)
pow <- sum <- diag(nrow(x))
nrm <- rep.int(NA_real_, nmax + 1)
i <- 1
while ((nrm[i] <- norm(pow, "1")) >= tol && i <= nmax) {
pow <- pow %*% x
sum <- sum + pow
i <- i + 1
}
list(x = x, tol = tol, nmax = nmax, n = i - 1, sum = sum,
norm = nrm[seq_len(i)], converged = nrm[i] < tol)
}
Your matrix P has spectral radius greater than 1, hence:
P <- matrix(c(0, 0.1, 0.8, 0.1, 0, -0.7, 0.8, -0.7, 0), 3L, 3L, byrow = TRUE)
f(P)
Error in f(P) :
max(abs(eigen(x, only.values = TRUE)$values)) < 1 is not TRUE
We can always construct a matrix P whose spectral radius is less than 1, for the purpose of testing f:
set.seed(1L)
m <- 3L
V <- matrix(rnorm(m * m), m, m)
D <- diag(runif(m, -0.9, 0.9))
P <- V %*% D %*% solve(V)
all.equal(sort(eigen(P)$values), sort(diag(D))) # [1] TRUE
(fP <- f(P))
$x
[,1] [,2] [,3]
[1,] 0.26445172 0.5317116 -0.2432849
[2,] 0.04932194 0.6332122 0.1496390
[3,] -0.31174920 0.6847937 0.1682702
$tol
[1] 1e-06
$nmax
[1] 1000
$n
[1] 60
$sum
[,1] [,2] [,3]
[1,] 1.53006915 2.081717 -0.07302465
[2,] -0.04249899 4.047528 0.74063387
[3,] -0.60849191 2.552208 1.83947562
$norm
[1] 1.000000e+00 1.849717e+00 1.223442e+00 1.008928e+00 7.799426e-01
[6] 6.131516e-01 4.795602e-01 3.754905e-01 2.938577e-01 2.299751e-01
[11] 1.799651e-01 1.408263e-01 1.101966e-01 8.622768e-02 6.747162e-02
[16] 5.279503e-02 4.131077e-02 3.232455e-02 2.529304e-02 1.979107e-02
[21] 1.548592e-02 1.211727e-02 9.481396e-03 7.418905e-03 5.805067e-03
[26] 4.542288e-03 3.554202e-03 2.781054e-03 2.176090e-03 1.702724e-03
[31] 1.332329e-03 1.042507e-03 8.157298e-04 6.382837e-04 4.994374e-04
[36] 3.907945e-04 3.057848e-04 2.392672e-04 1.872193e-04 1.464934e-04
[41] 1.146266e-04 8.969179e-05 7.018108e-05 5.491455e-05 4.296896e-05
[46] 3.362189e-05 2.630810e-05 2.058529e-05 1.610736e-05 1.260351e-05
[51] 9.861865e-06 7.716607e-06 6.038009e-06 4.724558e-06 3.696822e-06
[56] 2.892650e-06 2.263410e-06 1.771049e-06 1.385792e-06 1.084340e-06
[61] 8.484627e-07
$converged
[1] TRUE
Hence convergence is attained at n = 60. You can check that the reported sum is correct by comparing against the directly (but inefficiently) calculated value:
library("expm")
all.equal(Reduce(`+`, lapply(0:fP$n, function(i) P %^% i)), fP$sum) # [1] TRUE
And just for fun:
plot(0:fP$n, fP$norm)

Implement Levenberg-Marquardt algorithm in R

I have been told to implement the Levenberg-Marquardt algorithm in R studio, considering lambda's initial value equals 10. The algorithm must stop when the gradient's norm is lower than the tolerance. I also need to print the values that x1, x2, λ, ∇f(x), d1 and d2 take for each iteration. Any ideas on how to do it? Many thanks in advance
This is what I have:
library(pracma)
library(matlib)
MetodeLM<-function(f,xi,t)
{
l=10
stop=FALSE
x<-xi
k=0
while (stop==FALSE){
dk<- inv(hessian(f,x)+l*diag(diag(hessian(f,x))))
x1<-x+dk
if (Norm(grad(f,x1))<t){
stop<-TRUE
}
else{
if (f(x1) < f(x)){
l<-l/10
k<-k+1
stop<-FALSE
}
else{
l<-l*10
stop<-FALSE
}
}
}
}
Correcting a few mistakes in your code, the following implementation of Levenberg Marquadt's algorithm should work (note that the update rule for the algorithm is shown in the following figure):
library(pracma)
# tolerance = t, λ = l
LM <- function(f, x0, t, l=10, r=10) {
x <- x0
k <- 0
while (TRUE) {
H <- hessian(f, x)
G <- grad(f, x)
dk <- inv(H + l * diag(nrow(H))) %*% G # dk <- solve(H + l * diag(nrow(H)), G)
x1 <- x - dk # update rule
print(k) # iteration
# print(l) # λ
print(x1) # x1, x2
print(G) # ∇f(x)
print(dk) # d1, d2
if (Norm(G) < t) break
l <- ifelse(f(x1) < f(x), l / r, l * r)
k <- k + 1
x <- x1 # update the old point
}
}
For example, with the following function, the non-linear optimization algorithm will quickly find a local minimum point (in the 10th iteration) as shown below
f <- function(x) {
return ((x[1]^2+x[2]-25)^2 + (x[1]+x[2]^2-25)^2)
}
x0 <- rep(0,2)
LM(f, x0, t=1e-3, l=400, r=2)
# [1] 0
# [,1]
# [1,] 0.165563
# [2,] 0.165563
# [1] -50 -50
# [,1]
# [1,] -0.165563
# [2,] -0.165563
# [1] 1
# [,1]
# [1,] 0.7986661
# [2,] 0.7986661
# [1] -66.04255 -66.04255
# [,1]
# [1,] -0.6331031
# [2,] -0.6331031
# ...
# [1] 10
# [,1]
# [1,] 4.524938
# [2,] 4.524938
# [1] 0.0001194898 0.0001194898
# [,1]
# [1,] 5.869924e-07
# [2,] 5.869924e-07
The following animation shows the convergence to the local minimum point for the function:
The following one is with LoG function

Alternative to for loop for fast calculations when equations depend on each other

I am using a for-loop to do step-by-step calculations where several equations depend on each other. Because of this dependence, I cannot find a solution where I do the calculations inside a dataframe. My main motivation is to speed up the calculations when the Time vector is very large in the reprex below.
Could you please suggest alternatives to the following for-loop based calculations, preferably inside a dataframe in R? The only thing I can think of is using for-loop in Rcpp.
Reproducible Example
last_time <- 10
STEP = 1
Time <- seq(from = 0, to = last_time, by = STEP)
## empty vectors
eq1 <- vector(mode = "double", length = length(Time))
eq2 <- vector(mode = "double", length = length(Time))
eq <- vector(mode = "double", length = length(Time))
eq3 <- vector(mode = "double", length = length(Time))
eq4 <- vector(mode = "double", length = length(Time))
## adding the first values
eq1[1] <- 25
eq2[1] <- 25
eq[1] <- 25
eq3[1] <- 100
eq4[1] <- 2
for (t in 2:length(Time)) {
## eq1
eq1[t] <- eq[t-1] + (2.5 * STEP * (1 - (eq[t-1])/25))
## eq2
eq2[t] <- (-2 * STEP) + ((-2^2) * (STEP^2)) - (2 * eq3[t-1]) - (eq[t-1] * STEP)
## min.
eq[t] <- min(eq1[t], eq2[t] )
## eq3
eq3[t] <- (eq[t] - eq[t-1])/(STEP)
## eq4
eq4[t] <- eq4[t-1] + (eq[t-1] * STEP) + (0.5 * eq3[t-1] * (STEP)^2)
}
Output:
my_data <- data.frame(Time, eq1, eq2, eq, eq3, eq4)
my_data
#> Time eq1 eq2 eq eq3 eq4
#> 1 0 25.00000 25.00000 25.00000 -256.00000 2.0000
#> 2 1 25.00000 -231.00000 -231.00000 25.60000 -101.0000
#> 3 2 -205.40000 225.00000 -205.40000 23.04000 -319.2000
#> 4 3 -182.36000 199.40000 -182.36000 20.73600 -513.0800
#> 5 4 -161.62400 176.36000 -161.62400 18.66240 -685.0720
#> 6 5 -142.96160 155.62400 -142.96160 16.79616 -837.3648
#> 7 6 -126.16544 136.96160 -126.16544 15.11654 -971.9283
#> 8 7 -111.04890 120.16544 -111.04890 13.60489 -1090.5355
#> 9 8 -97.44401 105.04890 -97.44401 12.24440 -1194.7819
#> 10 9 -85.19961 91.44401 -85.19961 11.01996 -1286.1037
#> 11 10 -74.17965 79.19961 -74.17965 0.00000 -1365.7934
Created on 2021-02-28 by the reprex package (v1.0.0)
You could define a recursive function. A loop is faster than recursion though.
g <- function(m, STEP, time, x=2) {
if (time == 0) m
else {
## eq1
m[x, 2] <- m[x - 1, 1] + 2.5*STEP*(1 - (m[x - 1, 1])/25)
## eq2
m[x, 3] <- -2*STEP + -2^2*STEP^2 - 2*m[x - 1, 4] - m[x - 1, 1]*STEP
## min.
m[x, 1] <- min(m[x, 2], m[x, 3])
## eq3
m[x - 1, 4] <- (m[x, 1] - m[x - 1, 1])/STEP
## eq4
m[x, 5] <- m[x - 1, 5] + m[x - 1, 1]*STEP + 0.5*m[x - 1, 4]*STEP^2
g(m, STEP, time - 1, x + 1)
}
}
Usage
last_time <- 10; STEP <- 1
First <- c(eq0=25, eq1=25, eq2=25, eq3=100, eq4=2)
m <- matrix(0, last_time + 1, length(First), dimnames=list(NULL, names(First)))
m[1, ] <- First
g(m, STEP, last_time)
# eq0 eq1 eq2 eq3 eq4
# [1,] 25.00000 25.00000 25.00000 -256.00000 2.0000
# [2,] -231.00000 25.00000 -231.00000 25.60000 -101.0000
# [3,] -205.40000 -205.40000 225.00000 23.04000 -319.2000
# [4,] -182.36000 -182.36000 199.40000 20.73600 -513.0800
# [5,] -161.62400 -161.62400 176.36000 18.66240 -685.0720
# [6,] -142.96160 -142.96160 155.62400 16.79616 -837.3648
# [7,] -126.16544 -126.16544 136.96160 15.11654 -971.9283
# [8,] -111.04890 -111.04890 120.16544 13.60489 -1090.5355
# [9,] -97.44401 -97.44401 105.04890 12.24440 -1194.7819
# [10,] -85.19961 -85.19961 91.44401 11.01996 -1286.1037
# [11,] -74.17965 -74.17965 79.19961 0.00000 -1365.7934
as you asked how it works:
The recursive filter function of stats::filter can be used with mapply as follows:
dataframe <-
mapply(stats::filter,
dataframe,
filter = vector,
method = "recursive")
where vector is e.g. c(25), which could be your first eq1[1] <- 25
The recursive filter works like a recursive loop but is a bit more elegant:
Then the mapply recursive filter would do:
dataframe / vector
row or timepoint 1 20
row or timepoint 2 30 + (20 * c(25))
row or timepoint 3 40 + ((20*25)+30) * c(25))
It calculates the value in the first row and uses it in the next, where it multiplies the next vector. Perhaps if you play around with stats filter and the recursive method you also get the same result. It is a row based calculation over time similar to Rcpp but more flexible.

Choose closest x elements by index in a list/vector

If I have a vector such as x <-c(1,2,3,4,5,6,7,8,9), I want a function f such that
f(vector,index,num) where it takes the vector and gives me num "closest" elements to that one on the index
Examples:
f(x,3,4) = c(1,2,4,5)
f(x,1,5) = c(2,3,4,5,6)
f(x,8,3) = c(6,7,9)
Since there is also the issue where if we have an odd num, we will need to choose whether to pick left or right side by symmetry, let's go with choosing the left side (but right side is ok too)
i.e f(x,4,5) = c(1,2,3,5,6) and f(x,7,3) = c(5,6,8)
I hope my question is clear, thank you for any help/responses!
edit: The original vector of c(1:9) is arbitrary, the vector could be a vector of strings, or a vector of length 1000 with shuffled numbers with repeats etc.
i.e c(1,7,4,2,3,7,2,6,234,56,8)
num_closest_by_indices <- function(v, idx, num) {
# Try the base case, where idx is not within (num/2) of the edge
i <- abs(seq_along(x) - idx)
i[idx] <- +Inf # sentinel
# If there are not enough elements in the base case, incrementally add more
for (cutoff_idx in seq(floor(num/2), num)) {
if (sum(i <= cutoff_idx) >= num) {
# This will add two extra indices every iteration. Strictly if we have an even length, we should add the leftmost one first and `continue`, to break ties towards the left.
return(v[i <= cutoff_idx])
}
}
}
Here's an illustration of this algorithm: we rank the indices in order of desirability, then pick the lowest num legal ones:
> seq_along(x)
1 2 3 4 5 6 7 8 9
> seq_along(x) - idx
-2 -1 0 1 2 3 4 5 6
> i <- abs(seq_along(x) - idx)
2 1 0 1 2 3 4 5 6
> i[idx] <- +Inf # sentinel to prevent us returning the element itself
2 1 Inf 1 2 3 4 5 6
Now we can just find num elements with smallest values (break ties arbitrarily, unless you have a preference (left)).
Our first guess is all indices <= (num/2) ; this might not be enough if index is within (num/2) of the start/end.
> i <= 2
TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
> v[i <= 2]
1 2 4 5
So, adapting #dash2's code to handle the corner cases where some indices are illegal (nonpositive, or > length(x)), i.e. ! %in% 1:L. Then min(elems) would be the number of illegal indices which we cannot pick, hence we must pick abs(min(elems)) more.
Notes:
in the end the code is simpler and faster to handle it by three piecewise cases. Aww.
it actually seems to simplify things if we pick (num+1) indices, then remove idx before returning the answer. Using result[-idx] to remove it.
Like so:
f <- function (vec, elem, n) {
elems <- seq(elem - ceiling(n/2), elem + floor(n/2))
if (max(elems) > length(vec)) elems <- elems - (max(elems) - length(vec))
if (elems[1] < 1) elems <- elems + (1 - elems[1])
elems <- setdiff(elems, elem)
vec[elems]
}
Giving results:
> f(1:9, 1, 5)
[1] 2 3 4 5 6
> f(1:9, 9, 5)
[1] 4 5 6 7 8
> f(1:9, 2, 5)
[1] 1 3 4 5 6
> f(1:9, 4, 5)
[1] 1 2 3 5 6
> f(1:9, 4, 4)
[1] 2 3 5 6
> f(1:9, 2, 4)
[1] 1 3 4 5
> f(1:9, 1, 4)
[1] 2 3 4 5
> f(1:9, 9, 4)
[1] 5 6 7 8
Start a function with the variable argument x first, and the reference table and n after
.nearest_n <- function(x, table, n) {
The algorithm assumes that table is numeric, without any duplicates, and all values finite; n has to be less than or equal to the length of the table
## assert & setup
stopifnot(
is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
n <= length(table)
)
Sort the table and then 'clamp' maximum and minimum values
## sort and clamp
table <- c(-Inf, sort(table), Inf)
len <- length(table)
Find the interval in table where x occurs; findInterval() uses an efficient search. Use the interval index as the initial lower index, and add 1 for the upper index, making sure to stay in-bounds.
## where to start?
lower <- findInterval(x, table)
upper <- min(lower + 1L, len)
Find the nearest n neighbors by comparing the lower and upper index distance to x, record the nearest value, and increment the lower or upper index as appropriate and making sure to stay in-bounds
## find
nearest <- numeric(n)
for (i in seq_len(n)) {
if (abs(x - table[lower]) < abs(x - table[upper])) {
nearest[i] = table[lower]
lower = max(1L, lower - 1L)
} else {
nearest[i] = table[upper]
upper = min(len, upper + 1L)
}
}
Then return the solution and finish the function
nearest
}
The code might seem verbose, but is actually relatively efficient because the only operations on the entire vector (sort(), findInterval()) are implemented efficiently in R.
A particular advantage of this approach is that it can be vectorized in it's first argument, calculating the test for using lower (use_lower = ...) as a vector and using pmin() / pmax() as clamps.
.nearest_n <- function(x, table, n) {
## assert & setup
stopifnot(
is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
n <= length(table)
)
## sort and clamp
table <- c(-Inf, sort(table), Inf)
len <- length(table)
## where to start?
lower <- findInterval(x, table)
upper <- pmin(lower + 1L, len)
## find
nearest <- matrix(0, nrow = length(x), ncol = n)
for (i in seq_len(n)) {
use_lower <- abs(x - table[lower]) < abs(x - table[upper])
nearest[,i] <- ifelse(use_lower, table[lower], table[upper])
lower[use_lower] <- pmax(1L, lower[use_lower] - 1L)
upper[!use_lower] <- pmin(len, upper[!use_lower] + 1L)
}
# return
nearest
}
For instance
> set.seed(123)
> table <- sample(100, 10)
> sort(table)
[1] 5 29 41 42 50 51 79 83 86 91
> .nearest_n(c(30, 20), table, 4)
[,1] [,2] [,3] [,4]
[1,] 29 41 42 50
[2,] 29 5 41 42
Generalize this by taking any argument and coercing it to the required form using a reference look-up table table0 and the indexes into it table1
nearest_n <- function(x, table, n) {
## coerce to common form
table0 <- sort(unique(c(x, table)))
x <- match(x, table0)
table1 <- match(table, table0)
## find nearest
m <- .nearest_n(x, table1, n)
## result in original form
matrix(table0[m], nrow = nrow(m))
}
As an example...
> set.seed(123)
> table <- sample(c(letters, LETTERS), 30)
> nearest_n(c("M", "Z"), table, 5)
[,1] [,2] [,3] [,4] [,5]
[1,] "o" "L" "O" "l" "P"
[2,] "Z" "z" "Y" "y" "w"

Weight any set of numbers to sum to zero in R

I want to write a function in R that calculates weights to sum any set of numbers in R to zero. For example if
x <- c(-5, 6, 2, 4, -3)
I want a function that would return a new vector which has been weighted to force the vector sum to zero, by taking something off the positive numbers and adding something to the negative values...
EDIT: To clarify I do not want to shift values up or down a scale... I want to weight so that the rescaled negative numbers become slightly more/less negative and the rescaled positive numbers become slightly less/more positive.
I am not sure 1) how to go about calculating the right values for proportional weights and 2) if there is a function in R that can do it?
How about
x <- scale(x)
> x
[,1]
[1,] -1.2450825
[2,] 1.1162809
[3,] 0.2576033
[4,] 0.6869421
[5,] -0.8157437
attr(,"scaled:center")
[1] 0.8
attr(,"scaled:scale")
[1] 4.658326
> sum(scale(x))
[1] 5.551115e-17
Edit:
As suggested by #Josh O'brien, setting scale = FALSE gives
scale(x, scale = FALSE)
[,1]
[1,] -5.8
[2,] 5.2
[3,] 1.2
[4,] 3.2
[5,] -3.8
attr(,"scaled:center")
[1] 0.8
sum(scale(x, scale = FALSE))
[1] 6.661338e-16
1) offsets #jdharrison has already indicated if you want a vector a such that sum(x-a) is zero then setting a to be the mean of x will do it.
2) weight vector The wording of the question seems to ask for a weight vector w such that sum(w * x) is zero.
(i) If x is not constant (i.e. its elements are not all the same) then in mathematical notation P = I-xx'/(x'x) is a projection orthogonal to x and P1 = 1 - xx'1/(x'x) is a vector in the range of P so switching to R code:
w <- 1 - x * sum(x) / sum(x*x)
is such a weight vector. We can verify this:
> sum(w*x)
[1] 2.220446e-16
(ii) If x is constant but not identically zero then choose any non-constant vector s <- seq_along(x), say. Then Ps = s - xx's/(x'x) is orthogonal to x so:
x <- c(1, 1, 1, 1)
s <- seq_along(x)
w <- s - x * sum(s*x) / sum(x*x)
sum(w * x)
giving:
> sum(w * x)
[1] 0
Elaborating #jdharrison's comment:
> x
[1] -5 6 2 4 -3
> sum(x)
[1] 4
> mean(x)
[1] 0.8
> x - mean(x)
[1] -5.8 5.2 1.2 3.2 -3.8
> sum(x - mean(x))
[1] 6.661338e-16 #floating point 0
So x - mean(x) will do the trick.
If you want to keep the sign after the rescaling...
x <- c(-5, -3, 0, 2, 4, 6, 50)
rescale_zero <- function(x){
x1 <- x[x>0]
x2 <- x[x<0]
d <- (sum(x1) + sum(x2)) / 2
w1 <- (sum(x1) - d) / sum(x1)
w2 <- (sum(x2) - d) / sum(x2)
y <- x
y[x>0] <- x1*w1
y[x<0] <- x2*w2
y
}
rescale_zero(x)
# [1] -21.875000 -13.125000 0.000000 1.129032 2.258065 3.387097 28.225806

Resources