Constrained optimisation with function in the constraint and binary variable - r

I am looking for a way to solve - in R - a constrained optimisation problem of the form
min sum(x)
s.t. f(x) < k
where x is a binary variable (either 0 or 1) of length n, f(x) is a function that depends on the entire vector x, and k is an integer constant. Thus, f(x) is not a set of n constraints applied to each value of x separately (such as sqrt(x)), but a single constraint that is met or not based on the entire set of values of the binary variable x.
I have tried to use ompr R package with the following syntax
v <- 10
result <- MILPModel() %>%
add_variable(x[i], i = 1:v, type = "binary") %>%
set_objective(sum_expr(x[i], i = 1:v), sense = "min") %>%
add_constraint(f(x) <= 60) %>%
solve_model(with_ROI(solver = "glpk"))
but it does not work, because I believe the package does not accept a global f(x) constraint.

Here is a solution with the rgenoud package.
library(rgenoud)
g <- function(x){
  c(
    ifelse(sd(x) > 0.2, 0, 1), # set the constraint (here sd(x)>0.2) in this way
    sum(x) # the objective function (to minimize/maximize)
  )
}
solution <- genoud(
  g, lexical = 2,
  nvars = 30,
  starting.values = rep(0, 30),
  Domains = cbind(rep(0,30), rep(1,30)),
  data.type.int = TRUE)
solution$par # the values of x
## [1] 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sd(solution$par) # is the constraint satisfied ?
## [1] 0.2537081
solution$value
## [1] 0 2 ; 0 is the value of ifelse(sd(x)>0.2,0,1) and 2 is the value of sum(x)
See the Notes section in ?genoud to understand the lexical argument.
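For small n, a result like this can be cross-checked by brute force. The following is a minimal sketch and not part of the rgenoud answer: brute_force_min is a hypothetical helper, and n = 10 is an assumed problem size chosen so that all 2^n candidate vectors can be enumerated.

```r
# Exhaustive search over all 2^n binary vectors (only feasible for small n).
# `is_feasible` is any function of the whole vector x returning TRUE/FALSE.
brute_force_min <- function(is_feasible, n) {
  # Each row of `grid` is one candidate x in {0, 1}^n.
  grid <- as.matrix(expand.grid(rep(list(0:1), n)))
  feasible <- apply(grid, 1, is_feasible)
  if (!any(feasible)) return(NULL)          # no x satisfies the constraint
  candidates <- grid[feasible, , drop = FALSE]
  sums <- rowSums(candidates)
  best <- which.min(sums)                   # first feasible row with minimal sum(x)
  list(par = unname(candidates[best, ]), value = unname(sums[best]))
}

res <- brute_force_min(function(x) sd(x) > 0.2, n = 10)
res$value   # 1
```

With n = 10 a single 1 already gives sd(x) of about 0.316, so the minimum is 1. For n = 30, as in the genoud run, a single 1 gives only about 0.183, which is why that solution needs two.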

Related

Create a vector in a loop for every pair of samples

I do a pairwise calculation between my samples and I want every pairwise calculation to be stored in a separate vector. For 3 comparisons, I have:
sample_12 <- vector(mode="numeric", length = 10)
sample_13 <- vector(mode="numeric", length = 10)
sample_23 <- vector(mode="numeric", length = 10)
Is there a possibility to create these vectors with the corresponding names in a loop so it can work for any given number of samples?
I tried the following code but I can't access the vectors outside the for-loop, how could I solve this issue?
pop = 3
sample = vector(mode="numeric", length = 10)
for (i in 1:(pop - 1)) {
  for (j in (i + 1):pop) {
    name <- paste("sample",i,j, sep = "")
    name <- vector(mode="numeric", length = 10)
  }
}
You can use the "assign" function:
pop = 3
sample = vector(mode="numeric", length = 10)
pop_combos <- combn(pop, 2)
for (i in 1:ncol(pop_combos)) {
  name <- paste("sample_",
                pop_combos[,i][1],
                pop_combos[,i][2],
                sep="")
  assign(name, sample)
}
Outside the loop you can now access the vectors:
> sample_12
[1] 0 0 0 0 0 0 0 0 0 0
Use a list:
pop = 3
combinations = apply(combn(pop, m = 2), 2, paste, collapse = "_")
sample = replicate(n = length(combinations), numeric(10), simplify = FALSE)
names(sample) = combinations
sample
# $`1_2`
# [1] 0 0 0 0 0 0 0 0 0 0
#
# $`1_3`
# [1] 0 0 0 0 0 0 0 0 0 0
#
# $`2_3`
# [1] 0 0 0 0 0 0 0 0 0 0
You can then access each element of the list, e.g., sample[["1_3"]]. This scales up very easily and doesn't require pasting together names and using assign and get, which is just asking for hard-to-find bugs. You can use lapply or for loops to iterate over each item in the list trivially. Depending on your use case, it might make more sense to use the default simplify = TRUE inside replicate and keep it as a matrix or data frame. The only reason to use a list would be if some of the vectors needed to be different lengths.
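As a rough sketch of how the list approach extends to filling in the pairwise results directly: the data in dat and the subtraction used as the "pairwise calculation" are placeholders assumed for illustration, not taken from the question.

```r
set.seed(1)
pop <- 3
# Hypothetical input: one numeric vector of length 10 per sample.
dat <- replicate(pop, rnorm(10), simplify = FALSE)

pairs <- combn(pop, 2)   # each column is one pair (i, j) with i < j
results <- lapply(seq_len(ncol(pairs)), function(k) {
  # Placeholder pairwise calculation: element-wise difference of the two samples.
  dat[[pairs[1, k]]] - dat[[pairs[2, k]]]
})
names(results) <- apply(pairs, 2, paste, collapse = "_")

names(results)   # "1_2" "1_3" "2_3"
```

Each result is then available by name, e.g. results[["1_3"]], and the same code works unchanged for any value of pop.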
Is something like this what you are looking for?
Suppose that you save all the vectors as rows/columns in a numeric data.frame df:
list.values <- list()
col <- ncol(df)
row <- nrow(df)
for (k in seq_len(row * col)) {
  i <- (k - 1) %% row + 1   # row index of the k-th element (column-major order)
  j <- (k - 1) %/% row + 1  # column index of the k-th element
  list.values[[k]] <- df - df[i, j]
}
Now list.values[[(j - 1) * row + i]] holds the data frame of differences between all the elements and the element [i, j].
E.g. to see the differences between the whole data frame and the element [2, 3], run View(list.values[[(3 - 1) * row + 2]])

Creating canonical basis vectors in R?

Is there an efficient way of creating the canonical basis vectors:
e_1=c(1,0,0,...),
e_2=c(0,1,0,...),
e_3=c(0,0,1,...),
...
for arbitrary lengths in R and probably large p?
I know that I could do
e_1 = rep(0,p)
e_1[1] = 1
and so on or diag(p)[, 1] ... But I wonder if there is a more efficient way, since I only need one at a time in a loop.
It can be somewhat shorter with replace:
make_basis <- function(k, p = 10) replace(numeric(p), k, 1)
# usage
e_4 = make_basis(4)
e_4
# [1] 0 0 0 1 0 0 0 0 0 0
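Since only one basis vector is needed at a time inside a loop, another option is to allocate a single zero vector once and flip one entry per iteration. This is a minimal sketch under that assumption; p = 5 and the basis list (kept only to inspect the result) are illustrative choices.

```r
p <- 5                       # assumed dimension, for illustration only
e <- numeric(p)              # allocated once, all zeros
basis <- vector("list", p)   # only used here to record what each iteration saw

for (k in seq_len(p)) {
  e[k] <- 1          # e is now the k-th canonical basis vector
  basis[[k]] <- e    # ... use e here ...
  e[k] <- 0          # reset before the next iteration
}
```

After the loop e is all zeros again, and the recorded vectors are the columns of diag(p).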

Understanding R convolution code with sapply()

I am trying to break apart the R code in this post:
x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
convolve.binomial <- function(p) {
  # p is a vector of probabilities of Bernoulli distributions.
  # The convolution of these distributions is returned as a vector
  # `z` where z[i] is the probability of i-1, i=1, 2, ..., length(p)+1.
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
  z
}
convolve.binomial(x)
[1] 5.826141e-05 1.068804e-03 8.233357e-03 3.565983e-02 9.775029e-02
[6] 1.804516e-01 2.323855e-01 2.127628e-01 1.394564e-01 6.519699e-02
[11] 2.141555e-02 4.799630e-03 6.979119e-04 6.038947e-05 2.647052e-06
[16] 4.091095e-08
I tried debugging in RStudio, but it is still opaque.
The issue is with the line: sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q}).
I guess that within the context of the call convolve.binomial(x) p = q = x. At least I get identical results if I pull the lines outside the function and run sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x}):
x <- c(0.17,0.46,0.62,0.08,0.40,0.76,0.03,0.47,0.53,0.32,0.21,0.85,0.31,0.38,0.69)
n <- length(x) + 1
z <- c(1, rep(0, n-1))
# [1] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); x})
z # Is extracted by calling it and contains the correct result
My questions are:
What is the purpose of the ;q} ending within sapply()?
How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply()?
Below you can see my problem "hacking" this line of code:
(x_complem = 1 - x)
sapply(x, function(x) {z <<- x_complem * z + x * (c(0, z[-n])); x})
z # Returns 16 values and warnings
z_offset = c(0, z[-n])
# [1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sapply(x, function(x) {z <<- (1 - x) * z + x * z_offset; x})
z # Returns different values.
If you want to see the intermediate values of z as the function proceeds then insert either a cat or a print command in the code below:
sapply(x, function(x) {z <<- (1 - x) * z + x * (c(0, z[-n])); cat(z,"\n"); x})
#--------
0.83 0.17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0.4482 0.4736 0.0782 0 0 0 0 0 0 0 0 0 0 0 0 0
0.170316 0.457852 0.323348 0.048484 0 0 0 0 0 0 0 0 0 0 0 0
0.1566907 0.4348491 0.3341083 0.07047312 0.00387872 0 0 0 0 0 0 0 0 0 0 0
0.09401443 0.3235858 0.3744046 0.1759272 0.03051648 0.001551488 0 0 0 0 0 0 0 0 0 0
0.02256346 0.1491116 0.3357823 0.3267701 0.1410286 0.02356488 0.001179131 0 0 0 0 0 0 0 0 0
snipped rest of output
I think this makes it clearer what is happening: each intermediate row is a set of probabilities for a series of events. Each row sums to 1.0 and gives the probabilities of the possible success counts when only the first few Bernoulli probabilities in x have been included. The final result gives the probabilities of each possible total count after the full sequence has been processed.
Another interesting feature is that this result is invariant under random re-ordering of the probabilities in x (as it should be for the original question). Examine the intermediate results from
plot(x)
lines(seq(length(z)), z)
z2 <- convolve.binomial(sample(x) )
lines(seq(length(z)), z2, col="red" )
z3 <- convolve.binomial(sample(x) )
lines(seq(length(z)), z3, col="blue" )
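The invariance under re-ordering can also be checked numerically instead of visually. The sketch below reuses convolve.binomial as given in the question; the shortened x and the comparison against dbinom for equal probabilities are additions made here for illustration.

```r
convolve.binomial <- function(p) {   # copied from the question
  n <- length(p) + 1
  z <- c(1, rep(0, n - 1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n])); q})
  z
}

x <- c(0.17, 0.46, 0.62, 0.08, 0.40)

# Re-ordering the probabilities should not change the result (up to rounding).
same_after_shuffle <- isTRUE(all.equal(convolve.binomial(x),
                                       convolve.binomial(rev(x))))

# With identical probabilities the result should be an ordinary binomial.
matches_dbinom <- isTRUE(all.equal(convolve.binomial(rep(0.3, 4)),
                                   dbinom(0:4, size = 4, prob = 0.3)))
```

Both flags should come out TRUE; all.equal is used rather than identical because the order of the floating-point operations differs between the two calls.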
What is the purpose of the ;q} ending within sapply()?
The function within sapply returns q, but that is not actually needed. The following function works just the same.
convolve.binomial <- function(p) {
  n <- length(p) + 1
  z <- c(1, rep(0, n-1))
  sapply(p, function(q) {z <<- (1 - q) * z + q * (c(0, z[-n]))})
  z
}
How does it relate to the <<- symbol, meant to make z accessible outside of the "implicit" loop that is sapply()?
In R, if you look up the documentation for the <<- operator using ?'<<-' it says that
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment.
In the function convolve.binomial the value z is defined local to the function. So z <<- actually redefines z in the convolve.binomial function.
So to summarize, the z <<- in the sapply call changes the z variable already defined in convolve.binomial and we eventually return this z. The ;q} ending is not needed within sapply().
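To make the role of z <<- more concrete, the same computation can be written as an ordinary for loop, where z is updated by plain assignment and no superassignment is needed. This is a sketch of an equivalent formulation; convolve_binomial_loop is a name made up here, not code from the original post.

```r
convolve_binomial_loop <- function(p) {
  n <- length(p) + 1
  z <- c(1, rep(0, n - 1))   # start: probability 1 of zero successes
  for (q in p) {
    # With probability 1 - q the count stays where it is;
    # with probability q it moves up by one (the shifted copy of z).
    z <- (1 - q) * z + q * c(0, z[-n])
  }
  z
}

x <- c(0.17, 0.46, 0.62)
z <- convolve_binomial_loop(x)
```

Because z changes at every step, both 1 - q and the shifted vector c(0, z[-n]) have to be computed inside the loop from the current q and the current z; precomputing either of them once outside gives a different result.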

R: Remove the number of occurrences of values in one vector from another vector, but not all

Apologies for the confusing title, but I don't know how to express my problem otherwise. In R, I have the following problem which I want to solve:
x <- seq(1,1, length.out=10)
y <- seq(0,0, length.out=10)
z <- c(x, y)
p <- c(1,0,1,1,0,0)
How can I remove the values in vector p from vector z, so that a new vector i has three occurrences of 1 and three occurrences of 0 fewer than z? In other words, what do I have to do to arrive at the following result? In the solution, the order of 1's and 0's in z should not matter (they might just as well be in a random order), and there can be other numbers involved as well.
i
> 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Thanks in advance!
Similar to @VincentGuillemot's answer, but in functional programming style. Uses the purrr package:
library(purrr)
i <- z
map(p, function(x) { i <<- i[-min(which(i == x))]})
i
> i
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0
There might be numerous better ways to do it:
i <- z
for (val in p) {
  if (val %in% i) {
    i <- i[ - which(i==val)[1] ]
  }
}
Another solution that I like better because it does not require a test (thanks to @Franck's suggestion). It assumes that every value in p still occurs in i:
i <- z
for (val in p)
  i <- i[ - match(val, i, nomatch = integer(0) ) ]
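The loop-based answers can be wrapped in a small function that also copes with values of p that do not occur in z. This is a minimal sketch: remove_once is a hypothetical helper name, and the explicit is.na check is an addition for safety rather than something the answers above require.

```r
remove_once <- function(z, p) {
  for (val in p) {
    pos <- match(val, z)            # first position of val in z, NA if absent
    if (!is.na(pos)) z <- z[-pos]   # drop only that single occurrence
  }
  z
}

z <- c(rep(1, 10), rep(0, 10))
p <- c(1, 0, 1, 1, 0, 0)
i <- remove_once(z, p)
table(i)   # seven 0s and seven 1s remain
```

Values in p that are not present in z are skipped, so remove_once(c(1, 2), 3) returns c(1, 2) unchanged.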

How to replace all negative values in vector by 0 in R

I have a randomly generated vector from a normal distribution with 50 elements
vector<-c(rnorm(50))
I want to change all negative values to 0 and positive values to 1
I used the following indexing, but then I do not get a vector with the 50 elements I expect:
vector[ vector< 0 ] <- 1
vector[ vector> 0 ] <- 0
How should I proceed?
Generate some data
x = rnorm(50)
then either
x = ifelse(x > 0, 1, 0)
or
x[x < 0] = 0
x[x > 0] = 1
Or even better
as.numeric (x>0)
However since the standard normal is symmetric about 0, why not simulate directly via
sample(0:1, 50, replace=TRUE)
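The problem is that the first assignment replaces all values smaller than 0 by 1, which is larger than 0, so the second assignment then turns those values into 0 as well.
So the trick is to switch
vector[ vector< 0 ] <- 1
vector[ vector> 0 ] <- 0
into
vector[ vector> 0 ] <- 0
vector[ vector< 0 ] <- 1
This keeps the coding used in the question's code (negative values become 1). To get 0 for negative and 1 for positive values, as asked in the text, swap the two assigned values as well.
Note that exact zeros stay 0, so you are also slightly biased towards 0, but that should only be marginal.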
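The effect of the assignment order is easy to see on a small fixed vector. The values in v below are made up for illustration; the two-step version is the one from the question, and as.numeric(v > 0) is the one-step alternative suggested above.

```r
v <- c(-1.5, 0.3, -0.2, 2.1)   # illustrative values, not from the question

# Order used in the question: negatives become 1 first ...
wrong <- v
wrong[wrong < 0] <- 1
# ... and are then caught by the second test, because 1 > 0.
wrong[wrong > 0] <- 0
wrong   # 0 0 0 0

# A single comparison avoids the ordering problem altogether.
right <- as.numeric(v > 0)
right   # 0 1 0 1
```

Computing the result from one logical comparison means no value is ever tested after it has already been overwritten.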
