Creating canonical basis vectors in R?

Is there an efficient way of creating the canonical basis vectors:
e_1=c(1,0,0,...),
e_2=c(0,1,0,...),
e_3=c(0,0,1,...),
...
for arbitrary length p in R, where p is possibly large?
I know that I could do
e_1 = rep(0,p)
e_1[1] = 1
and so on, or use diag(p)[, 1]. But I wonder if there is a more efficient way, since I only need one basis vector at a time in a loop.

It can be somewhat shorter with replace:
make_basis <- function(k, p = 10) replace(numeric(p), k, 1)
# usage
e_4 = make_basis(4)
e_4
# [1] 0 0 0 1 0 0 0 0 0 0
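Since the question mentions needing only one basis vector at a time inside a loop, here is a minimal sketch of how the helper might be used that way (the loop body is a placeholder):
p <- 5
for (k in seq_len(p)) {
  e_k <- make_basis(k, p)
  # ... use e_k here, e.g. to probe the k-th coordinate
}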

Related

Constrained optimisation with function in the constraint and binary variable

I am looking for a way to solve - in R - a constrained optimisation problem of the form
min sum(x)
s.t. f(x) < k
where x is a binary variable (either 0 or 1) of length n, f(x) is a function that depends on the entire vector x, and k is an integer constant. Thus, f(x) is not a set of n element-wise constraints on x (such as sqrt(x)), but a single constraint that is evaluated on the entire set of values of the binary variable x.
I have tried to use the ompr R package with the following syntax:
v <- 1:10
result <- MILPModel() %>%
  add_variable(x[i], i = 1:v, type = "binary") %>%
  set_objective(sum_expr(x[i], i = 1:v), sense = "min") %>%
  add_constraint(f(x) <= 60) %>%
  solve_model(with_ROI(solver = "glpk"))
but it does not work, because I believe the package does not accept a global f(x) constraint.
Here is a solution with the rgenoud package.
library(rgenoud)
g <- function(x){
  c(
    ifelse(sd(x) > 0.2, 0, 1), # set the constraint (here sd(x) > 0.2) in this way
    sum(x)                     # the objective function (to minimize/maximize)
  )
}
solution <- genoud(
  g, lexical = 2,
  nvars = 30,
  starting.values = rep(0, 30),
  Domains = cbind(rep(0, 30), rep(1, 30)),
  data.type.int = TRUE)
solution$par # the values of x
## [1] 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sd(solution$par) # is the constraint satisfied?
## [1] 0.2537081
solution$value
## [1] 0 2
## 0 is the value of ifelse(sd(x) > 0.2, 0, 1) and 2 is the value of sum(x)
See the Notes section in ?genoud to understand the lexical argument.
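To map this back onto the question's "min sum(x) subject to f(x) < k" form, here is a hedged template (make_g, f, and k are illustrative names, not part of rgenoud):
make_g <- function(f, k) {
  function(x) c(ifelse(f(x) < k, 0, 1),  # constraint-violation flag: 0 when f(x) < k holds
                sum(x))                  # the objective to minimize
}
# then pass make_g(your_f, your_k) to genoud() exactly as g above, keeping lexical = 2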

R: Remove the number of occurrences of values in one vector from another vector, but not all

Apologies for the confusing title, but I don't know how to express my problem otherwise. In R, I have the following problem which I want to solve:
x <- seq(1,1, length.out=10)
y <- seq(0,0, length.out=10)
z <- c(x, y)
p <- c(1,0,1,1,0,0)
How can I remove vector p from vector z so that a new vector i has three fewer occurrences of 1 and three fewer occurrences of 0? What do I have to do to arrive at the following result? The order of the 1's and 0's in z should not matter (they might just as well be in a random order), and other numbers can be involved as well.
i
> 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Thanks in advance!
Similar to @VincentGuillemot's answer, but in a functional programming style, using the purrr package:
i <- z
map(p, function(x) { i <<- i[-min(which(i == x))]})
i
# [1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
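Since map() is used here only for its side effect on i, purrr::walk() arguably states that intent more directly; a small variation on the snippet above:
library(purrr)
i <- z
walk(p, function(x) { i <<- i[-min(which(i == x))] })
i
# [1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0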
There might be numerous better ways to do it:
i <- z
for (val in p) {
  if (val %in% i) {
    i <- i[ -which(i == val)[1] ]
  }
}
Another solution that I like better because it does not require a test (thanks to @Franck's suggestion):
for (val in p)
  i <- i[ -match(val, i, nomatch = integer(0)) ]
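For completeness, applying this variant to the question's data (re-initializing i from z first) reproduces the expected result:
i <- z
for (val in p)
  i <- i[ -match(val, i, nomatch = integer(0)) ]
i
# [1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0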

Fastest way to find switching from positive to negative in a vector in R

I have a vector that contains both positive and negative values, for example something like
x = c(1,2,1,-2,-3,3,-4,5,1,1,-3)
Now I want to flag the indices of the vector where the value changes from positive to negative or from negative to positive. So in the example above I would want a vector of indicators that looks something like this:
y=c(0,0,0,1,0,1,1,1,0,0,1)
I am doing this in R, so if possible I would like to avoid using for-loops.
I think this should work:
+(c(0, diff(sign(x))) != 0)
#[1] 0 0 0 1 0 1 1 1 0 0 1
all.equal(+(c(0, diff(sign(x))) != 0), y)
#[1] TRUE
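If this is needed repeatedly, the idiom can be wrapped in a small helper (the name sign_change is just illustrative); note that sign() maps an exact 0 to 0, so a zero element would also be flagged as a change:
sign_change <- function(x) {
  # 1 where the sign differs from the previous element, 0 elsewhere
  +(c(0, diff(sign(x))) != 0)
}
sign_change(c(1, 2, 1, -2, -3, 3, -4, 5, 1, 1, -3))
# [1] 0 0 0 1 0 1 1 1 0 0 1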
Here's one way:
yy = rep(0, length(x))
yy[with(rle(sign(x)),{ p = cumsum(c(1,lengths)); p[ -c(1,length(p)) ] })] = 1
all.equal(yy,y) # TRUE
...which turned out more convoluted than I expected at first.

How to declare a vector of zeros in R

I suppose this is trivial, but I can't find how to declare a vector of zeros in R.
For example, in Matlab, I would write:
X = zeros(1,3);
You have several options:
integer(3)
numeric(3)
rep(0, 3)
rep(0L, 3)
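One subtlety: the first and last options give integer vectors, while the middle two give doubles. The values compare equal, but typeof() and identical() see the difference:
typeof(integer(3))   # "integer"
typeof(numeric(3))   # "double"
typeof(rep(0, 3))    # "double"
typeof(rep(0L, 3))   # "integer"
identical(integer(3), numeric(3))  # FALSE: same values, different storage type
integer(3) == numeric(3)           # TRUE TRUE TRUE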
You can also use the matrix command to create a matrix with n rows and m columns, filled with zeros:
matrix(0, n, m)
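For instance, a 2-by-3 matrix of zeros:
matrix(0, 2, 3)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    0    0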
replicate is another option:
replicate(10, 0)
# [1] 0 0 0 0 0 0 0 0 0 0
replicate(5, 1)
# [1] 1 1 1 1 1
To create a matrix:
replicate( 5, numeric(3) )
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 0
#[2,] 0 0 0 0 0
#[3,] 0 0 0 0 0
X <- c(1:3)*0
Maybe this is not the most efficient way to initialize a vector to zeros, but it only requires remembering the c() function, which tutorials very frequently cite as the usual way to declare a vector.
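For what it's worth, this generalizes to any length p and yields the same double vector as numeric(p) (a quick sanity check):
p <- 3
identical(c(1:p) * 0, numeric(p))
# [1] TRUE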
As a side note: to someone learning her way into R from other languages, the multitude of functions that do the same thing in R may be mind-blowing, just as demonstrated by the previous answers here.
Here are four ways to create a one-dimensional vector of zeros, followed by a check that they are all identical (note that identical() only compares its first two arguments, so the checks have to be chained):
numeric(2) -> a; double(2) -> b; vector("double", 2) -> c; vector("numeric", 2) -> d
identical(a, b) && identical(b, c) && identical(c, d)
# [1] TRUE
In the iteration chapter of R for Data Science, they use the "d" option, vector("numeric", ...), to create this type of vector.

R efficient way to use values as indexes

I have a 10M-row matrix of integer values.
A row in this matrix can look as follows:
1 1 1 1 2
I need to transform the row above to the following vector:
4 1 0 0 0 0 0 0 0
Another example:
1 2 3 4 5
To:
1 1 1 1 1 0 0 0 0
How can I do this efficiently in R?
Update:
There is a function that does exactly what I need: base::tabulate (suggested here before),
but it is extremely slow (it took at least 15 minutes to process my initial matrix).
I would try something like this:
m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)
j.idx <- seq_len(n)
out <- matrix(0L, m, max(x))
for (j in j.idx) {
  ij <- cbind(i.idx, x[, j])
  out[ij] <- out[ij] + 1L
}
A for loop might sound surprising for a question that asks for an efficient implementation. However, this solution is vectorized for a given column and only loops through five columns. This will be many, many times faster than looping over 10 million rows using apply.
Testing with:
n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, TRUE), n, m)
this approach takes less than six seconds while a naive t(apply(x, 1, tabulate, 9)) takes close to two minutes.
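For reference, a hedged sketch of that timing comparison (count_rows is just the column loop above wrapped in a function; absolute timings will vary by machine, and the apply version is also memory-hungry):
set.seed(1)
n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, TRUE), n, m)

count_rows <- function(x, nbins = max(x)) {
  # one row per row of x, one column per integer value, counting occurrences
  out <- matrix(0L, nrow(x), nbins)
  i.idx <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

system.time(a <- count_rows(x, 9))             # a few seconds
system.time(b <- t(apply(x, 1, tabulate, 9)))  # closer to two minutes
all.equal(a, b)                                # should be TRUE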
