Speeding code up. Loop over sum with kernel function - r

I am running below code to evaluate a function at each value of r.
For each element of r, the function calculates the sum of elements of a matrix product. Before doing this, values of M are adjusted based on a kernel function.
# (1) set-up with toy data
r <- seq(0, 10, 1)
bw <- 25
M <- matrix(data = c(0, 1, 2,
1, 0, 1,
2, 1, 0), nrow = 3, ncol = 3)
X <- matrix(rep(1, 9), 3, 3)
#
# (2) computation
res <- c()
# loop, calculationg sum, Epanechnikov kernel
for(i in seq_along(r)) {
res[i] <- sum(
# Epanechnikov kernel
ifelse(-bw < (M - r[i]) & (M - r[i]) < bw,
3 * (1 - ((M - r[i])^2 / bw^2)) / (4*bw),
0) * X,
na.rm = TRUE
)
}
# result
res
I am looking for recommendations to speed this up using base R. Thank you!

Using outer:
Mr <- outer(c(M), r, "-")
colSums(3*(1 - Mr^2/bw^2)/4/bw*(abs(Mr) < bw)*c(X))
#> [1] 0.269424 0.269760 0.269232 0.267840 0.265584 0.262464 0.258480 0.253632 0.247920 0.241344 0.233904
I'll also note that the original for loop solution can be sped up by pre-allocating res (e.g., res <- numeric(length(r))) prior to the for loop.

Related

Genetic algorythm (GA) to select the optimal n values of a vector

I have to choose 10 elements of a vector to maximizes a function. Since the vector is pretty long there are to many possibilities (~1000 choose 10) to compute them all. So I started to look into the GA package to use a genetic algorithm.
I came up with this MWE:
values <- 1:1000
# Fitness function which I want to maximise
f <- function(x){
# Choose values
y <- values[x]
# From the first 10 sum up the odd values.
y <- ifelse(y %% 2 != 0, y, 0)
y <- y[1:10]
return(sum(y))
}
# Maximum value of f for this example
y <- ifelse(values %% 2 != 0, values, 0)
sum(sort(y, decreasing = TRUE)[1:10])
# [1] 9900
# Genetic algorithm
GA <- ga(type = "permutation", fitness = f, lower = rep(1, 10), upper = rep(1000, 10), maxiter = 100)
summary(GA)
The results are a bit underwhelming. From summary(GA), I get the feeling that the algorithm always permutates all 1000 values (the solution goes from x1 to x1000) which leads to an inefficient optimization. How can I tell the algorithm that it should only should use 10 values (so the solution is x1 .. x10)?
You should read https://www.jstatsoft.org/article/view/v053i04. You don't have permutation problem but selection one hence you should use binary type of genetic algorithm. Because you want to select exclusively 10 (10 ones and 990 zeroes) you should probably write your own genetic operators because that is constraint that will hardly ever be satisfied by default operators (with inclusion of -Inf in fitness function if you have more than 10 zeroes). One approach:
Population (k tells how much ones you want):
myInit <- function(k){
function(GA){
m <- matrix(0, ncol = GA#nBits, nrow = GA#popSize)
for(i in seq_len(GA#popSize))
m[i, sample(GA#nBits, k)] <- 1
m
}
}
Crossover
myCrossover <- function(GA, parents){
parents <- GA#population[parents,] %>%
apply(1, function(x) which(x == 1)) %>%
t()
parents_diff <- list("vector", 2)
parents_diff[[1]] <- setdiff(parents[2,], parents[1,])
parents_diff[[2]] <- setdiff(parents[1,], parents[2,])
children_ind <- list("vector", 2)
for(i in 1:2){
k <- length(parents_diff[[i]])
change_k <- sample(k, sample(ceiling(k/2), 1))
children_ind[[i]] <- if(length(change_k) > 0){
c(parents[i, -change_k], parents_diff[[i]][change_k])
} else {
parents[i,]
}
}
children <- matrix(0, nrow = 2, ncol = GA#nBits)
for(i in 1:2)
children[i, children_ind[[i]]] <- 1
list(children = children, fitness = c(NA, NA))
}
Mutation
myMutation <- function(GA, parent){
ind <- which(GA#population[parent,] == 1)
n_change <- sample(3, 1)
ind[sample(length(ind), n_change)] <- sample(setdiff(seq_len(GA#nBits), ind), n_change)
parent <- integer(GA#nBits)
parent[ind] <- 1
parent
}
Fitness (your function adapted for binary GA):
f <- function(x, values){
ind <- which(x == 1)
y <- values[ind]
y <- ifelse(y %% 2 != 0, y, 0)
y <- y[1:10]
return(sum(y))
}
GA:
GA <- ga(
type = "binary",
fitness = f,
values = values,
nBits = length(values),
population = myInit(10),
crossover = myCrossover,
mutation = myMutation,
run = 300,
pmutation = 0.3,
maxiter = 10000,
popSize = 100
)
Chosen values
values[which(GA#solution[1,] == 1)]

Time varying parameter-matrix in deSolve R

I am struggling with this for so long. I have a logistic growth function where the growth parameter
r is a matrix. The model is constructed in a way that I have as an output two N the N1 and N2.
I would like to be able to change the r parameter over time. When time < 50 I would like
r = r1 where
r1=matrix(c(
2,3),
nrow=1, ncol=2
When time >= 50 I would like r=r2 where
r2=matrix(c(
1,2),
nrow=1, ncol=2
Here is my function. Any help is highly appreciated.
rm(list = ls())
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
r=matrix(c(
4,5),
nrow=1, ncol=2)
K=100
params <- list(r,K)
y<- c(N1=0.1, N2=0.2)
times <- seq(0,100,1)
out <- ode(y, times, model, params)
plot(out)
I would like ideally something like this but it does not work
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
r = ifelse(times < 10, matrix(c(1,3),nrow=1, ncol=2),
ifelse(times > 10, matrix(c(1,4),nrow=1, ncol=2), matrix(c(1,2),nrow=1, ncol=2)))
print(r)
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
Thank you for your time.
Here a generic approach that uses an extended version of the approx function. Note also some further simplifications of the model function and the additional plot of the parameter values.
Edit changed according to the suggestion of Lewis Carter to make the parameter change at t=3, so that the effect can be seen.
library(simecol) # contains approxTime, a vector version of approx
model <- function(time, N, params) {
r <- approxTime(params$signal, time, rule = 2, f=0, method="constant")[-1]
K <- params$K
dN <- r*N*(1-N/K)
return(list(c(dN), r))
}
signal <- matrix(
# time, r[1, 2],
c( 0, 2, 3,
3, 1, 2,
100, 1, 2), ncol=3, byrow=TRUE
)
## test of the interpolation
approxTime(signal, c(1, 2.9, 3, 100), rule = 2, f=0, method="constant")
params <- list(signal = signal, K = 100)
y <- c(N1=0.1, N2=0.2)
times <- seq(0, 10, 0.1)
out <- ode(y, times, model, params)
plot(out)
For a small number of state variables like in the example, separate signals with approxfun from package stats will look less generic but may be slighlty faster.
As a further improvement, one may consider to replace the "hard" transitions with a more smooth one. This can then directly be formulated as a function without the need of approx, approxfun or approxTime.
Edit 2:
Package simecol imports deSolve, and we need only a small function from it. So instead of loading simecol it is also possible to include the approxTime function explicitly in the code. The conversion from data frame to matrix improves performance, but a matrix is preferred anyway in such cases.
approxTime <- function(x, xout, ...) {
if (is.data.frame(x)) {x <- as.matrix(x); wasdf <- TRUE} else wasdf <- FALSE
if (!is.matrix(x)) stop("x must be a matrix or data frame")
m <- ncol(x)
y <- matrix(0, nrow=length(xout), ncol=m)
y[,1] <- xout
for (i in 2:m) {
y[,i] <- as.vector(approx(x[,1], x[,i], xout, ...)$y)
}
if (wasdf) y <- as.data.frame(y)
names(y) <- dimnames(x)[[2]]
y
}
If you want to pass a matrix parameter you should pass a list of parameters and you can modify it inside the model when your time limit is exceeded (in the example below you don't even have to pass the r matrix to the model function)
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
if(time < 3) r = matrix(c(2,3), nrow = 1, ncol = 2)
else r = matrix(c(1,3), nrow = 1, ncol = 2)
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
y <- c(N1=0.1, N2=0.2)
params <- list(r = matrix(c(0,0), nrow = 1, ncol = 2), K=100)
times <- seq(0,10,0.1)
out <- ode(y, times, model, params)
plot(out)
You can see examples of this for instance with Delay Differential Equations ?dede

Why does my for loop return only 1 row,after using rbind in every iteration?

I am trying to obtain a set of solutions for the following equation:
y=x^2 - t[i] ,for all the values that the parameter t varies in [2,3].
I tried to implement a for loop that in every iteration computes the calculations and rbinds the result to a dataframe to be used for later.
t<-seq(from = 2, to = 3, by = 0.005)
x<-seq(from = 0, to = 30, by = 0.05)
d<-data.frame()
for (i in length(t)) {
y<- x^2 - t[i]
d<-rbind(d,y)
}
d
I expect the output of the for loop to be a dataframe of 201 rows and 601 columns, but the actual output is only one 1 row of 601 columns .
It will create 201 rows if you change the for loop to iterate through 1:length(t).
t <- seq(from = 2, to = 3, by = 0.005)
x <- seq(from = 0, to = 30, by = 0.05)
d <- data.frame()
for (i in 1:length(t)) {
y <- x^2 - t[i]
d <- rbind(d,y)
}
str(d)

How to run for loop R program faster?

I am using the following r code to compute the loglikelihood for left side and right side for each i = 1,2,...,200.
But I want to do this procedure for large number of generated dataset, for instance a = 10000 and iterate the entire loop for 1000 times. How can I speed up the the following program? Am I able to use applyfunction instead of for function?
Thank you in advance!
n1 = 100
n2 = 100
a = 1000
n= n1 + n2
# number of simulated copies of y
sim.data = matrix(NA, nrow = n, ncol = a)
for (i in 1:a) {
#for(j in 1:a){
sim.data[,i] = c(rnorm(n1, 2, 1), rnorm(n-n1, 4, 1))
#}
}
dim(sim.data)
# Compute the log-likelihood
B = ncol(sim.data)
loglike_profb = matrix(NA, n - 1, B)
for (j in 1:B) {
for (i in 1:(n - 1)) {
loglike_profb[i, j] = -0.5*(sum(((sim.data[1:i,j]) - mean(sim.data[1:i,j]))^2) + sum(((sim.data[(i + 1):n,j]) - mean(sim.data[(i +1):n,j]))^2))
}
}
You can put the calculation of the loglike_profb into a function and then use mapply
loglike_profb_func <- function(i,j){
-0.5*(sum(((sim.data[1:i,j]) - mean(sim.data[1:i,j]))^2) + sum(((sim.data[(i + 1):n,j]) - mean(sim.data[(i +1):n,j]))^2))
}
mapply(loglike_profb_func, rep(1:(n-1),B), rep(1:B,(n-1)))

Generalized Assignment Problem with Genetic Algorithms in R

How can I implement a Generalized Assignment Problem: https://en.wikipedia.org/wiki/Generalized_assignment_problem to be solved with Genetic Algorithms https://cran.r-project.org/web/packages/GA/GA.pdf in R.
I have a working example of the code but its not working:
require(GA)
p <- matrix(c(5, 1, 5, 1, 5, 5, 5, 5, 1), nrow = 3)
t <- c(2, 2, 2)
w <- c(2, 2, 2)
assigment <- function(x) {
f <- sum(x * p)
penalty1 <- sum(w)*(sum(t)-sum(w*x))
penalty2 <- sum(w)*(1-sum(x))
f - penalty1 - penalty2
}
GA <- ga(type = "binary", fitness = assigment, nBits = length(p),
maxiter = 1000, run = 200, popSize = 20)
summary(GA)
It seems there are problems in your definition of the fitness function, i.e. the assigment() function.
x is a binary vector, and not a matrix as in the theory, so sum(x * p) is not doing what you likely expect (note that x has length 9 and p is a 3x3 matrix in your example);
the constrain on the sum of x_{ij} is not correctly taken into account by the penalty2 term;
the penalisations should act differently for penalty1 and penalty2, the first is an inequality (i.e. <=) while the second is a strict equality (i.e. =).
w is defined as a vector, but it should be a matrix of the same size as x

Resources