Creating all multi indices of given length - r

Given an integer K and a dimension d, how can I get all multi-indices alpha with length(alpha) = d and sum(alpha) = K in R?
Example: For K=3 and d=2 and if we organize the multi-indices in a list alphas we would get
alphas[[1]] = c(3,0)
alphas[[2]] = c(2,1)
alphas[[3]] = c(1,2)
alphas[[4]] = c(0,3)

Based on the description, we can call expand.grid on the sequence 0:k replicated d times, then use Filter to keep only the rows whose sum equals k
f1 <- function(k, d) {
lapply(Filter(function(x) sum(x) == k,
asplit(expand.grid(rep(list(0:k), d)), 1)), unname)
}
-testing
> f1(3, 2)
[[1]]
[1] 3 0
[[2]]
[1] 2 1
[[3]]
[1] 1 2
[[4]]
[1] 0 3
Or, slightly faster, filter with rowSums
d1 <- expand.grid(rep(list(0:3), 2))
asplit(unname(d1)[rowSums(d1) == 3,], 1)
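There is also a constraint-based combinations function in RcppAlgos
f2 <- function(k, d) {
# combinations of 0:k (without repetition) whose sum equals k
out <- RcppAlgos::comboGeneral(0:k, d, constraintFun = "sum",
comparisonFun = "==", limitConstraints = k)
# reversing the columns recovers every ordering only when d = 2;
# tuples with repeated entries, e.g. c(2, 2) for k = 4, are not produced
asplit(rbind(out, out[, ncol(out):1]), 1)
}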
-testing
> f2(3, 2)
[[1]]
[1] 0 3
[[2]]
[1] 1 2
[[3]]
[1] 3 0
[[4]]
[1] 2 1
Or, as @JosephWood mentioned, permuteGeneral is more appropriate here than comboGeneral
k <- 3
d <- 2
RcppAlgos::permuteGeneral(0:k, d, repetition = TRUE, constraintFun = "sum",
comparisonFun = "==", limitConstraints = k)
Or with compositions
library(partitions)
asplit(compositions(k, d), 2)
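If you would rather avoid extra packages and the (k + 1)^d rows that expand.grid builds before filtering, the multi-indices can also be generated directly by recursion. The following is only a minimal base R sketch; the function name multi_indices is made up for this illustration.

```r
# Hypothetical helper: all vectors of d non-negative integers summing to k.
# Fix the first entry, then recurse on the remaining d - 1 entries.
multi_indices <- function(k, d) {
  if (d == 1) return(list(k))
  out <- list()
  for (first in k:0) {
    for (rest in multi_indices(k - first, d - 1)) {
      out[[length(out) + 1]] <- c(first, rest)
    }
  }
  out
}

alphas <- multi_indices(3, 2)
alphas[[1]]  # 3 0
alphas[[4]]  # 0 3
```

Only valid multi-indices are ever constructed, so the result has choose(k + d - 1, d - 1) elements.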

Related

Conditionally update rast values from another raster using terra

I am using the lapp function of {terra} in R and I want to update rast_a with values from rast_b or rast_c (and some other math) depending on the value in each cell of rast_a.
Sample data
library(terra)
rast_a <- rast(ncol = 2, nrow = 2)
values(rast_a) <- 1:4
rast_b <- rast(ncol = 2, nrow = 2)
values(rast_b) <- c(2,2,2,2)
rast_c <- rast(ncol = 2, nrow = 2)
values(rast_c) <- c(3,3,3,3)
Problem
This is my (wrong) attempt.
my_update_formula <- function(a, b, c) {
a[a == 1] <- b[a == 1] + 10 + 20 - 30
a[a == 2] <- c[a == 2] + 10 + 50 - 50
return(a)
}
result <- lapp(c(rast_a, rast_b, rast_c),
fun = my_update_formula)
values(result)
lyr1
[1,] 3
[2,] 3
[3,] 3
[4,] 4
The actual result should be 2,3,3,4. But because of the operations inside the formula, the first value gets updated twice. First it is changed from 1 to 2 (correctly) but then it fulfills the condition of the second line of code also, and is changed again (I don't want that to happen).
How can I solve this please?
You can change your formula to
f1 <- function(a, b, c) {
d <- a
d[a == 1] <- b[a == 1]
d[a == 2] <- c[a == 2] + 10
d
}
#or
f2 <- function(a, b, c) {
i <- a == 1
j <- a == 2
a[i] <- b[i]
a[j] <- c[j] + 10
return(a)
}
lapp(c(rast_a, rast_b, rast_c), fun = f1) |> values()
# lyr1
#[1,] 2
#[2,] 13
#[3,] 3
#[4,] 4
lapp(c(rast_a, rast_b, rast_c), fun = f2) |> values()
# lyr1
#[1,] 2
#[2,] 13
#[3,] 3
#[4,] 4
You can get the same result with
x <- ifel(rast_a==1, rast_b,
ifel(rast_a == 2, rast_c + 10, rast_a))
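The reason the fix works can be seen without terra at all, since lapp passes the cell values to the function as plain vectors. Here is a small base R sketch with made-up vector names (vec_a, vec_b, vec_c) that contrasts testing the condition on the already-modified vector with computing both masks up front.

```r
vec_a <- 1:4
vec_b <- rep(2, 4)
vec_c <- rep(3, 4)

# Conditions are re-evaluated on the modified vector: a cell that was
# just set to 2 also matches the second condition.
sequential <- function(a, b, c) {
  a[a == 1] <- b[a == 1]
  a[a == 2] <- c[a == 2] + 10
  a
}

# Both masks are computed from the original values before any assignment.
masked <- function(a, b, c) {
  i <- a == 1
  j <- a == 2
  a[i] <- b[i]
  a[j] <- c[j] + 10
  a
}

sequential(vec_a, vec_b, vec_c)  # 13 13 3 4
masked(vec_a, vec_b, vec_c)      # 2 13 3 4
```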

Write a loop to select all combination of variable values generating positive equation values in R

I have the following four equations (a,b,c,d), with several different variables (x,t,v,w,n,f). My goal would be to try and find all sets of variable values that would generate all positive (and non-zero) numbers for equations(a,b,c,d). A regular loop would just go through each number of the sequence generated and systematically check if it generates a positive value or not. I want it to pick up random numbers from each sequence and test it against the others in R.
For example (x=8, t = 2.1,v=13,w=1,n=10,f=1) is a possible set of combinations.
Please do not suggest analytically solving for these and then finding out values. These are simply representations of equations I'm dealing with. The equations I have are quite complex, and more than 15 variables.
#Equations
a <- x * t - 2*x
b <- v - x^2
c <- x - w*t - t*t
d <- (n - f)/t
x <- seq(from = 0.0001, to = 1000, by = 0.1)
t <- seq(from = 0.0001, to = 1000, by = 0.1)
v <- seq(from = 0.0001, to = 1000, by = 0.1)
w <- seq(from = 0.0001, to = 1000, by = 0.1)
n <- seq(from = 0.0001, to = 1000, by = 0.1)
f <- seq(from = 0.0001, to = 1000, by = 0.1)
For a start, it might be better to organize your equations and your probe values into lists:
set.seed(1222)
values <- list(x = x, t = t, v = v, w = w, n = n, f = f)
eqs <- list(
a = expression(x * t - 2 * x),
b = expression(v - x^2),
c = expression(x - w*t - t*t),
d = expression((n - f)/t)
)
Then we can define a number of samples to take randomly from each probe vector:
samples <- 3
values.sampled <- lapply(values, sample, samples)
$x
[1] 642.3001 563.1001 221.3001
$t
[1] 583.9001 279.0001 749.1001
$v
[1] 446.6001 106.7001 0.7001
$w
[1] 636.0001 208.8001 525.5001
$n
[1] 559.8001 28.4001 239.0001
$f
[1] 640.4001 612.5001 790.1001
We can then iterate over each stored equation, evaluating the equation within the "sampled" environment:
results <- sapply(eqs, eval, envir = values.sampled)
a b c d
[1,] 373754.5 -412102.82 -711657.5 -0.1380373
[2,] 155978.8 -316975.02 -135533.2 -2.0935476
[3,] 165333.3 -48973.03 -954581.8 -0.7356827
From there you can remove any value that is 0 or less:
results[results <= 0] <- NA
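Since the goal is to find sets of values for which all four equations are positive at once, a row-wise filter may be more useful than masking individual cells. The sketch below is self-contained and makes a few assumptions of its own: a coarser grid than in the question, an arbitrary seed, sampling with replacement, and the names grid, sampled and positive_sets.

```r
set.seed(1)  # arbitrary, only so that the sketch is reproducible

# Assumption: a coarser, smaller grid than in the question
grid <- seq(from = 0.0001, to = 100, by = 0.1)
vars <- c("x", "t", "v", "w", "n", "f")
samples <- 1000

# One random draw per variable and per sample, kept together row by row
sampled <- as.data.frame(
  setNames(lapply(vars, function(nm) sample(grid, samples, replace = TRUE)), vars)
)

eqs <- list(
  a = expression(x * t - 2 * x),
  b = expression(v - x^2),
  c = expression(x - w * t - t * t),
  d = expression((n - f) / t)
)

# One column per equation, one row per sampled set of values
results <- sapply(eqs, eval, envir = sampled)

# Keep only the rows where every equation is strictly positive
keep <- rowSums(results > 0) == ncol(results)
positive_sets <- sampled[keep, , drop = FALSE]
```

The feasible region for these particular equations is small, so positive_sets may well be empty unless many samples are drawn.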
If every variable draws from the same set of values (e.g. seq(from = 0.0001, to = 1000, by = 0.1)), we can approach this with much greater rigor and avoid the possibility of generating duplicates. First we create a masterFun that is essentially a wrapper for all of the functions you want to define:
masterFun <- function(y) {
## y is a vector with 6 values
## y[1] -->> x
## y[2] -->> t
## y[3] -->> v
## y[4] -->> w
## y[5] -->> n
## y[6] -->> f
fA <- function(x, t) {x * t - 2*x}
fB <- function(v, x) {v - x^2}
fC <- function(x, w, t) {x - w*t - t*t}
fD <- function(n, f, t) {(n - f)/t}
## one can easily filter out negative
## results as @jdobres has done.
c(a = fA(y[1], y[2]), b = fB(y[3], y[1]),
c = fC(y[1], y[4], y[2]), d = fD(y[5], y[6], y[2]))
}
Now, using permuteSample, which is capable of generating random permutations of a vector and subsequently applying any given user defined function to each permutation, from RcppAlgos (I am the author), we have:
## Not technically the domain, but this variable name
## is concise and very descriptive
domain <- seq(from = 0.0001, to = 1000, by = 0.1)
library(RcppAlgos)
## number of variables ... x, t, v, w, n, f
## ||
## \/
permuteSample(domain, m = 6, repetition = TRUE,
n = 3, seed = 123, FUN = masterFun)
[[1]]
a b c d
218830.316100 -608541.146040 -310624.596670 -1.415869
[[2]]
a b c d
371023.322880 -482662.278860 -731052.643620 1.132836
[[3]]
a b c d
18512.60761001 -12521.71284001 -39722.27696002 -0.09118721
In short, the underlying algorithm is capable of generating the nth lexicographical result, which allows us to apply a mapping from 1 to "# of total permutations" to the permutations themselves. For example, given the permutations of the vector 1:3:
permuteGeneral(3, 3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
We can easily generate the 2nd and the 5th permutation above without generating the first permutation or the first four permutations:
permuteSample(3, 3, sampleVec = c(2, 5))
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 3 1 2
This allows us to have a more controlled and tangible grasp of our random samples as we can now think of them in a more familiar way (i.e. a random sample of numbers).
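To make the idea of mapping an index to a result concrete for the case with repetition, here is a rough base R sketch. It is not how RcppAlgos is implemented; nth_tuple is a hypothetical helper that simply reads index - 1 as a number in base length(v).

```r
# Hypothetical helper: the index-th length-m tuple (with repetition) of v,
# in lexicographical order, obtained from the base-n digits of index - 1.
nth_tuple <- function(v, m, index) {
  n <- length(v)
  stopifnot(index >= 1, index <= n^m)
  k <- index - 1
  digits <- integer(m)
  for (pos in m:1) {        # the last position varies fastest
    digits[pos] <- k %% n
    k <- k %/% n
  }
  v[digits + 1]
}

nth_tuple(1:3, 2, 1)  # 1 1
nth_tuple(1:3, 2, 4)  # 2 1
nth_tuple(1:3, 2, 9)  # 3 3
```

Sampling tuples then reduces to sampling integers between 1 and n^m. With plain doubles this sketch is only exact while n^m stays below 2^53.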
If you actually want to see which variables were used in the above calculation, we simply drop the FUN argument:
permuteSample(domain, m = 6, repetition = TRUE, n = 3, seed = 123)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 780.7001 282.3001 951.5001 820.8001 289.1001 688.8001
[2,] 694.8001 536.0001 84.9001 829.2001 757.3001 150.1001
[3,] 114.7001 163.4001 634.4001 80.4001 327.2001 342.1001

Subsetting with negative indices: best practices?

Say I have a function for subsetting (this is just a minimal example):
f <- function(x, ind = seq(length(x))) {
x[ind]
}
(Note: one could use only seq(x) instead of seq(length(x)), but I don't find it very clear.)
So, if
x <- 1:5
ind <- c(2, 4)
ind2 <- which(x > 5) # integer(0)
I have the following results:
f(x)
[1] 1 2 3 4 5
f(x, ind)
[1] 2 4
f(x, -ind)
[1] 1 3 5
f(x, ind2)
integer(0)
f(x, -ind2)
integer(0)
For the last result, we would have wanted to get all x, but this is a common cause of error (as mentioned in the book Advanced R).
So, if I want to make a function for removing indices, I use:
f2 <- function(x, ind.rm) {
f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, seq(length(x))))
}
Then I get what I wanted:
f2(x, ind)
[1] 1 3 5
f2(x, ind2)
[1] 1 2 3 4 5
My question is:
Can I do something cleaner and that doesn't need passing seq(length(x)) explicitly in f2 but using directly the default value of f's parameter ind when ind.rm is integer(0)?
If you anticipate having "empty" negative indices a lot, you can get a performance improvement for these cases if you can avoid the indexing used by x[seq(x)] as opposed to just x. In other words, if you are able to combine f and f2 into something like:
new_f <- function(x, ind.rm){
if(length(ind.rm)) x[-ind.rm] else x
}
There will be a huge speedup in the case of empty negative indices.
n <- 1000000L
x <- 1:n
ind <- seq(0L,n,2L)
ind2 <- which(x>n+1) # integer(0)
library(microbenchmark)
microbenchmark(
f2(x, ind),
new_f(x, ind),
f2(x, ind2),
new_f(x, ind2)
)
all.equal(f2(x, ind), new_f(x, ind)) # TRUE - same result at about same speed
all.equal(f2(x, ind2), new_f(x, ind2)) # TRUE - same result at much faster speed
Unit: nanoseconds
expr min lq mean median uq max neval
f2(x, ind) 6223596 7377396.5 11039152.47 9317005 10271521 50434514 100
new_f(x, ind) 6190239 7398993.0 11129271.17 9239386 10202882 59717093 100
f2(x, ind2) 6823589 7992571.5 11267034.52 9217149 10568524 63417978 100
new_f(x, ind2) 428 1283.5 5414.74 6843 7271 14969 100
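For what it is worth, the underlying reason for the pitfall is that negating an empty index vector still gives an empty index vector, so x[-ind] cannot tell "remove nothing" apart from "select nothing". If a branch-free version is preferred, a logical mask is one option. The sketch below uses a made-up name (drop_idx) and is not benchmarked against the functions above.

```r
x <- 1:5
empty <- which(x > 5)

identical(-empty, empty)  # TRUE: both are integer(0)

# Hypothetical helper: drop positions via a logical mask, so that an
# empty ind.rm leaves the mask all TRUE and x comes back unchanged.
drop_idx <- function(x, ind.rm) {
  keep <- rep(TRUE, length(x))
  keep[ind.rm] <- FALSE
  x[keep]
}

drop_idx(x, c(2, 4))  # 1 3 5
drop_idx(x, empty)    # 1 2 3 4 5
```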
What you have isn't bad, but if you want to avoid passing the default value of a default argument you could restructure like this:
f2 <- function(x, ind.rm) {
`if`(length(ind.rm) > 0, f(x,-ind.rm), f(x))
}
which is slightly shorter than what you have.
On Edit
Based on the comments, it seems you want to be able to pass a function nothing (rather than simply not pass at all), so that it uses the default value. You can do so by writing a function which is set up to receive nothing, also known as NULL. You can rewrite your f as:
f <- function(x, ind = NULL) {
if(is.null(ind)){ind <- seq(length(x))}
x[ind]
}
NULL functions as a flag which tells the receiving function to use a default value for the parameter, although that default value must be set in the body of the function.
Now f2 can be rewritten as
f2 <- function(x, ind.rm) {
f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, NULL))
}
This is slightly more readable than what you have, but at the cost of making the original function slightly longer.
To implement "parameter1 = if(cond1) then value1 else default_value_of_param1", I used formals to get default parameters as a call:
f <- function(x, ind.row = seq_len(nrow(x)), ind.col = seq_len(ncol(x))) {
x[ind.row, ind.col]
}
f2 <- function(x, ind.row.rm = integer(0), ind.col.rm = integer(0)) {
f.args <- formals(f)
f(x,
ind.row = `if`(length(ind.row.rm) > 0, -ind.row.rm, eval(f.args$ind.row)),
ind.col = `if`(length(ind.col.rm) > 0, -ind.col.rm, eval(f.args$ind.col)))
}
Then:
> x <- matrix(1:6, 2)
> f2(x, 1:2)
[,1] [,2] [,3]
> f2(x, , 1:2)
[1] 5 6
> f2(x, 1, 2)
[1] 2 6
> f2(x, , 1)
[,1] [,2]
[1,] 3 5
[2,] 4 6
> f2(x, 1, )
[1] 2 4 6
> f2(x)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

Index in a dist matrix (1D vector) equivalent to 2D matrix indices, in R

Let's say I have a matrix that looks like this, and I convert it into a dist class object (without diagonal), and then into a vector for later purposes.
m = matrix(c(0,1,2,3, 1,0,3,4, 2,3,0,5, 3,4,5,0), nrow=4)
#m:
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 1 0 3 4
[3,] 2 3 0 5
[4,] 3 4 5 0
md = as.dist(m, diag=F)
# md:
1 2 3
2 1
3 2 3
4 3 4 5
mdv = as.vector(md)
# 1 2 3 3 4 5
I can access the original matrix as usual with [], and I could easily access the one-dimensional index (of, for example, row 3, col 2) using m[ 3+((2-1)*4) ]. The dist object (and the vector) is one-dimensional, but consists only of the lower triangle of the original matrix (and also lacks one element from each original col/row, since the diagonal was removed).
How can I later access the equivalent element in the vector mdv? So e.g. how could I access the equivalent of m[3,2] (value 3) in the object mdv? (Not by the value, since there can be duplicate values, but by the index) Related Q&A resolve similar problems with as.matrix on the dist object, but that doesn't do it for me (since I need to deal with the vector).
Having the lower.tri(, diag = FALSE) distances-vector ("mdv") you could (1) find the respective dimensions of the distances-matrix ("m") and (2) convert the i + (j - 1)*nrow indices accordingly by subtracting the equivalent missing "upper.tri".
ff = function(x, i, j)
{
#assumes that 'x' is a valid distances vector that results in correct 'n'
n = (1 + sqrt(1 + 8 * length(x))) / 2
#make sure i >= j
ii = pmax(i, j); jj = pmin(i, j)
#insert 0s to handle 'i == j'
x = c(unlist(lapply(split(x, rep(seq_len(n - 1), (n - 1):1)),
function(X) c(0, X)), FALSE, FALSE), 0)
#subtract the missing `upper.tri` elements
x[(ii + (jj - 1L) * n) - cumsum(0:(n - 1))[jj]]
}
E.g.:
n = 3
m = matrix(0, n, n); m[lower.tri(m)] = runif(choose(n, 2)); m = m + t(m); x = c(as.dist(m))
m
# [,1] [,2] [,3]
#[1,] 0.0000000 0.3796833 0.5199015
#[2,] 0.3796833 0.0000000 0.4770344
#[3,] 0.5199015 0.4770344 0.0000000
m[cbind(c(2, 2, 3, 1), c(3, 2, 1, 2))]
#[1] 0.4770344 0.0000000 0.5199015 0.3796833
ff(x, c(2, 2, 3, 1), c(3, 2, 1, 2))
#[1] 0.4770344 0.0000000 0.5199015 0.3796833
n = 23
m = matrix(0, n, n); m[lower.tri(m)] = runif(choose(n, 2)); m = m + t(m); x = c(as.dist(m))
i = sample(seq_len(n), 25, TRUE); j = sample(seq_len(n), 25, TRUE)
all.equal(m[cbind(i, j)], ff(x, i, j))
#[1] TRUE
etc...
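How about this function? A dist object stores the lower triangle column by column (see ?dist), so the index also depends on the number of rows n of the original matrix:
fun <- function(r, c, n){
stopifnot(r != c)
i <- max(r, c); j <- min(r, c) # i > j: row and column in the lower triangle
n*(j-1) - j*(j-1)/2 + i - j
}
n <- 4 # nrow(m), or attr(md, "Size")
mdv[fun(1, 2, n)] # 1
mdv[fun(2, 3, n)] # 3
mdv[fun(3, 4, n)] # 5
mdv[fun(2, 1, n)] # 1
mdv[fun(3, 2, n)] # 3
mdv[fun(1, 1, n)] # stop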
Cases with r == c should be handled before applying fun. For convenience, you can write another function for handling this case.
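Any index formula for a dist vector is best checked on a matrix whose off-diagonal values are all distinct, because the example matrix in the question contains repeated values that can hide an off-by-one position. The sketch below uses the column-wise layout documented in ?dist; dist_index is a hypothetical, vectorised helper written for this check.

```r
# Hypothetical helper: position of the pair (i, j), i != j, in the vector
# of a dist object of size n (lower triangle stored column by column).
dist_index <- function(i, j, n) {
  stopifnot(all(i != j))
  hi <- pmax(i, j)  # row in the lower triangle
  lo <- pmin(i, j)  # column in the lower triangle
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

n <- 5
m <- matrix(0, n, n)
m[lower.tri(m)] <- seq_len(choose(n, 2)) * 10  # distinct values, no lucky matches
m <- m + t(m)
mdv <- as.vector(as.dist(m))

# Compare every off-diagonal cell of m with the corresponding element of mdv
pairs <- which(row(m) != col(m), arr.ind = TRUE)
all(mdv[dist_index(pairs[, 1], pairs[, 2], n)] == m[pairs])  # TRUE
```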

Mapping indices of vector elements

This is probably a trivial question.
Given a vector of characters, some of which are repeating:
vec <- c("a","b","d","e","e","f","g","a","d")
I'm looking for an efficient function that will return for each unique element in vec the indices of where it appears in vec.
I imagine that the return value would be something like this list:
list(a = c(1,8), b = 2, d = c(3,9), e = c(4,5), f = 6, g = 7)
Here are a few options:
lapply(setNames(unique(vec),unique(vec)), function(x) which(x == vec) )
# or to avoid setNames and still ensure you get a list:
sapply(unique(vec), function(x) which(x == vec), simplify=FALSE)
# or even better but maybe not as extensible:
split(seq_along(vec),vec)
All giving:
$a
[1] 1 8
$b
[1] 2
$d
[1] 3 9
$e
[1] 4 5
$f
[1] 6
$g
[1] 7
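One difference that does not show up with this particular vec: the unique()-based versions return the elements in order of first appearance, while split() orders them by sorted factor level. A small sketch with a made-up vector (vec2) illustrates this.

```r
vec2 <- c("b", "a", "b")

by_unique <- sapply(unique(vec2), function(x) which(x == vec2), simplify = FALSE)
by_split  <- split(seq_along(vec2), vec2)

names(by_unique)  # "b" "a"  (order of first appearance)
names(by_split)   # "a" "b"  (sorted)

# Same content once the elements are put in the same order
all.equal(by_split[names(by_unique)], by_unique)  # TRUE
```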

Resources