NSGA2 Genetic Algorithm in R

I am working with the nsga2 function in R (package mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, to 3 decimal places) instead of letting them vary arbitrarily?
2) How do I set an equality constraint? The examples online all seem to deal with >= or <= rather than =. Not sure if I'm doing it right.
My entire relevant code for reference, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
Relevant sample part of code reproduced here:
VTR <- get.hist.quote(instrument = "VTR",
                      start = "2010-01-01", end = "2015-12-31",
                      quote = c("AdjClose"), provider = "yahoo",
                      compress = "d")

ObjFun1 <- function(xh) {
  # f1: portfolio risk measure from HSVaR_P
  f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP),
                    xh, 0.05, 2))
  tempt <- merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
  tempt2 <- tempt[(nrow(tempt) - (2 * N)):nrow(tempt), ]
  # replace missing prices with 0
  for (i in 1:nrow(tempt2)) {
    for (j in 1:ncol(tempt2)) {
      if (is.na(tempt2[i, j])) {
        tempt2[i, j] <- 0
      }
    }
  }
  # f2: negated absolute weighted sum, so both objectives are minimised
  f2 <- (-1) * abs(sum(xh * t(tempt2)))
  c(f1 = f1, f2 = f2)
}

Constr <- function(xh) {
  totwt <- 1 - sum(-xh)
  totwt2 <- sum(xh) - 1
  c(totwt, totwt2)
}

Solution1 <- nsga2(ObjFun1, n.projects, 2,
                   lower.bounds = rep(0, n.projects),
                   upper.bounds = rep(1, n.projects),
                   popsize = n.solutions, constraints = Constr, cdim = 1,
                   generations = generations)
The function HSVaR_P returns matrix(x,2*500,1).
Even when I set generations = 1, the code does not seem to run. Clearly there is an error somewhere, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
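To make the question concrete, here is a minimal toy sketch of what I think both fixes might look like (not my actual objectives). It assumes mco's convention that a solution is feasible when every constraint value is non-negative, and it relaxes the equality to a small tolerance, since a continuous search will essentially never hit sum(xh) == 1 exactly. I also notice that my Constr returns two values while I pass cdim = 1; should cdim match the length of the constraint vector?

library(mco)

eps <- 1e-3  # tolerance for treating sum(x) as "equal to" 1

# hypothetical toy objectives, just to keep the sketch self-contained
obj <- function(x) {
  x <- round(x, 3)            # question 1: limit solutions to 3 decimal places
  c(sum(x^2), -sum(sqrt(x)))  # nsga2 minimises both objectives
}

# question 2: equality sum(x) == 1 encoded as |sum(x) - 1| <= eps,
# i.e. eps - |sum(x) - 1| >= 0 (assumed feasibility convention)
constr <- function(x) eps - abs(sum(x) - 1)

sol <- nsga2(obj, idim = 3, odim = 2,
             lower.bounds = rep(0, 3), upper.bounds = rep(1, 3),
             constraints = constr, cdim = 1,
             popsize = 52, generations = 100)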
Thanks.

Multiple forcings in a multi-patch ODE model - R package deSolve and compiled C code

I am trying to create an SEIR model with multiple patches using the package deSolve in R. At each time step, there is some movement of individuals between patches that can infect individuals in other patches. I also have an external forcing parameter that is specific to each patch (representing different environmental conditions). I've been able to get this working in base R, but given the number of patches and compartments and the duration of the model, I'm trying to convert it to compiled code to speed it up.
I've gotten the different patches working, but am struggling with how to incorporate a different forcing parameter for each patch. When forcings are provided, there is an automatic check, checkforcings (https://rdrr.io/cran/deSolve/src/R/forcings.R), that doesn't allow a matrix with more than two columns, and I'm not quite sure what the best workaround is. Should I write my own ode and checkforcings functions to override this? Restructure the forcings data once it gets into C? My final model has 195 patches, so I'd prefer to be able to automate it somehow so that I am not writing out thousands of equations or hundreds of functions.
Also fine if the answer is just, do this in a different language, but would appreciate insight into what language I should switch to. Julia maybe?
Below is code for a very simple example that just highlights this "different forcings in different patches" problem.
R Code
# Packages #########################################################
library(deSolve)
library(ggplot2); theme_set(theme_bw())
library(tidyr)
library(dplyr)

# Initial Parameters and things ####################################
times <- 1:500
n_patch <- 2
patch_ind <- 100
state_names <- c("S", "I")
n_state <- length(state_names)
x <- rep(0, n_patch * n_state)
names(x) <- unlist(lapply(state_names, function(x) paste(x,
  stringr::str_pad(seq(n_patch), width = 3, side = "left", pad = 0),
  sep = "_")))

# start with infected individuals in patch 1
x[startsWith(names(x), "S")] <- patch_ind
x['S_001'] <- x['S_001'] - 5
x['I_001'] <- x['I_001'] + 5
x['I_002'] <- x['I_002'] + 20

params <- c(gamma = 0.1, betam = 0.2)

# seasonality
forcing <- data.frame(times = times,
                      rain = rep(rep(c(0.95, 1.05), each = 50), 5))

new_approx_fun <- function(rain.column, t) {
  approx_col <- approxfun(rain.column, rule = 2)
  return(approx_col(t))
}

rainfall2 <- data.frame(P1 = forcing$rain,
                        P2 = forcing$rain + 0.01)
# model in R
r.mod2 <- function(t, x, params) {
  # turn state vector into a matrix:
  # columns are different states, rows are different patches
  states <- matrix(x, nrow = n_patch, ncol = n_state, byrow = F)
  S <- states[, 1]
  I <- states[, 2]
  N <- rowSums(states[, 1:2])
  with(as.list(params), {
    # seasonal forcing
    rain <- as.numeric(apply(as.matrix(rainfall2), MARGIN = 2,
                             FUN = new_approx_fun, t = t))
    dS <- gamma * I - rain * betam * S * I / N
    dI <- rain * betam * S * I / N - gamma * I
    return(list(c(dS, dI), rain))
  })
}

out.R2 <- data.frame(ode(y = x, times = times, func = r.mod2,
                         parms = params))
# create seasonality for C
ftime <- seq(0, max(times), by = 0.1)
rain.ft <- approx(times, rainfall2$P1, xout = ftime, rule = 2)$y
forcings2 <- cbind(ftime, rain.ft, rain.ft + 0.01)

# C model
system("R CMD SHLIB ex-patch-season-multi.c")
dyn.load(paste("ex-patch-season-multi", .Platform$dynlib.ext, sep = ""))
out.dll <- data.frame(ode(y = x, times = times, func = "derivsc",
                          dllname = "ex-patch-season-multi", initfunc = "parmsc",
                          parms = params, forcings = forcings2,
                          initforc = "forcc", nout = 1, outnames = "rain"))
C code
#include <R.h>
#include <math.h>
#include <Rmath.h>

// this is for testing to try and get different forcing for each patch //

/* define parameters, pay attention to order */
static double parms[2];
static double forc[1];
#define gamma parms[0]
#define betam parms[1]

// define forcing
#define rain forc[0]

/* initialize parameters */
void parmsc(void (* odeparms)(int *, double *)) {
    int N = 2;
    odeparms(&N, parms);
}

/* forcing */
void forcc(void (* odeforcs)(int *, double *)) {
    int N = 1;
    odeforcs(&N, forc);
}

/* model function */
void derivsc(int *neq, double *t, double *y, double *ydot, double *yout, int *ip) {
    // use for-loops for patches
    // define all variables at start of block
    int npatch = 2;
    double S[npatch]; double I[npatch]; double N[npatch];
    int i;
    for (i = 0; i < npatch; i++) {
        S[i] = y[i];
    }
    for (i = 0; i < npatch; i++) {
        int ind = npatch + i;
        I[i] = y[ind];
    }
    for (i = 0; i < npatch; i++) {
        N[i] = S[i] + I[i];
    }
    // Susceptible
    for (i = 0; i < npatch; i++) {
        ydot[i] = gamma*I[i] - rain*betam*I[i]*S[i]/N[i];
    }
    // Infected
    for (i = 0; i < npatch; i++) {
        int ind = npatch + i;
        ydot[ind] = rain*betam*I[i]*S[i]/N[i] - gamma*I[i];
    }
    yout[0] = rain;
}
The standard way to supply multiple forcings to compiled code in the deSolve package is described on the lsoda help page:
forcings only used if ‘dllname’ is specified: a list with the forcing function data sets, each present as a two-columned matrix
Such a list can be created automatically in a script.
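For instance, a small sketch that builds one two-column (time, value) matrix per patch from the rainfall2 data frame and ftime grid defined in the question, and passes the list to ode:

forcings_list <- lapply(seq_len(ncol(rainfall2)), function(j) {
  cbind(ftime, approx(times, rainfall2[[j]], xout = ftime, rule = 2)$y)
})

out.dll <- data.frame(ode(y = x, times = times, func = "derivsc",
                          dllname = "ex-patch-season-multi", initfunc = "parmsc",
                          parms = params, forcings = forcings_list,
                          initforc = "forcc", nout = 1, outnames = "rain"))

On the C side, forc would then be declared as static double forc[2] (one entry per patch), forcc would pass N = 2, and the patch loops would read forc[i] instead of a single rain value.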
There are also other ways possible with some creative C or Fortran programming.
For more complex models, I would recommend the rodeo package. It allows you to specify dynamic models in tabular form (CSV, LibreOffice, Excel), including parameters and forcing functions. The package's code generator then creates fast Fortran code that can be solved with deSolve. An overview can be found in the paper by Kneis et al. (2017), https://doi.org/10.1016/j.envsoft.2017.06.036, and a more extended tutorial at https://dkneis.github.io/ .

Catching the print output of a function

I am using the fda package, in particular the function fRegress. This function calls another function, eigchk, which checks whether the coefficient matrix is singular.
Here is the function as the package authors (J. O. Ramsay, Giles Hooker, and Spencer Graves) wrote it.
eigchk <- function(Cmat) {
  # check Cmat for singularity
  eigval <- eigen(Cmat)$values
  ncoef <- length(eigval)
  if (eigval[ncoef] < 0) {
    neig <- min(length(eigval), 10)
    cat("\nSmallest eigenvalues:\n")
    print(eigval[(ncoef - neig + 1):ncoef])
    cat("\nLargest eigenvalues:\n")
    print(eigval[1:neig])
    stop("Negative eigenvalue of coefficient matrix.")
  }
  if (eigval[ncoef] == 0) stop("Zero eigenvalue of coefficient matrix.")
  logcondition <- log10(eigval[1]) - log10(eigval[ncoef])
  if (logcondition > 12) {
    warning("Near singularity in coefficient matrix.")
    cat(paste("\nLog10 Eigenvalues range from\n",
              log10(eigval[ncoef]), " to ", log10(eigval[1]), "\n"))
  }
}
As you can see, the last if condition checks whether logcondition is greater than 12, issues a warning, and then prints the range of eigenvalues.
The following code applies regularization with a roughness penalty. The code is taken from the book "Functional Data Analysis with R and MATLAB".
annualprec <- log10(apply(daily$precav, 2, sum))
tempbasis <- create.fourier.basis(c(0, 365), 65)
tempSmooth <- smooth.basis(day.5, daily$tempav, tempbasis)
tempfd <- tempSmooth$fd

templist <- vector("list", 2)
templist[[1]] <- rep(1, 35)
templist[[2]] <- tempfd

conbasis <- create.constant.basis(c(0, 365))
betalist <- vector("list", 2)
betalist[[1]] <- conbasis

SSE <- sum((annualprec - mean(annualprec))^2)

Lcoef <- c(0, (2 * pi / 365)^2, 0)
harmaccelLfd <- vec2Lfd(Lcoef, c(0, 365))
betabasis <- create.fourier.basis(c(0, 365), 35)
lambda <- 10^12.5
betafdPar <- fdPar(betabasis, harmaccelLfd, lambda)
betalist[[2]] <- betafdPar

annPrecTemp <- fRegress(annualprec, templist, betalist)
betaestlist2 <- annPrecTemp$betaestlist
annualprechat2 <- annPrecTemp$yhatfdobj

SSE1.2 <- sum((annualprec - annualprechat2)^2)
RSQ2 <- (SSE - SSE1.2) / SSE
Fratio2 <- ((SSE - SSE1.2) / 3.7) / (SSE1.2 / 30.3)

resid <- annualprec - annualprechat2
SigmaE. <- sum(resid^2) / (35 - annPrecTemp$df)
SigmaE <- SigmaE. * diag(rep(1, 35))
y2cMap <- tempSmooth$y2cMap
stderrList <- fRegress.stderr(annPrecTemp, y2cMap, SigmaE)

betafdPar <- betaestlist2[[2]]
betafd <- betafdPar$fd
betastderrList <- stderrList$betastderrlist
betastderrfd <- betastderrList[[2]]
As the penalty factor, the authors use a particular lambda.
The following code implements the search for an appropriate lambda.
loglam <- seq(5, 15, 0.5)
nlam <- length(loglam)
SSE.CV <- matrix(0, nlam, 1)
for (ilam in 1:nlam) {
  lambda <- 10^loglam[ilam]
  betalisti <- betalist
  betafdPar2 <- betalisti[[2]]
  betafdPar2$lambda <- lambda
  betalisti[[2]] <- betafdPar2
  fRegi <- fRegress.CV(annualprec, templist, betalisti)
  SSE.CV[ilam] <- fRegi$SSE.CV
}
By varying loglam and cross-validating, I am supposed to acquire the best lambda; yet if the range of loglam is too big, or its values lead the coefficient matrix toward singularity, I receive the following message:
Log10 Eigenvalues range from
-5.44495317739048 to 6.78194912518214
This message is created by the function eigchk, as I mentioned above.
Now my question is: is there any way to catch this so-called warning? By catch I mean some function or method that alerts me when this has happened, so that I can adjust the values of loglam. Since the function appears to just print the message rather than define a distinct warning, I have run out of ideas.
Thank you all a lot for your suggestions.
By "catch the warning", if you mean, will alert you that there is a potential problem with loglam, then you might want to look at try and tryCatch functions. Then you can define the behavior you want implemented if any warning condition is satisfied.
If you just want to store the output of the warning (which might be assumed from the question title, but may not be what you want), then try looking into capture.output.
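For instance, a sketch built on the loop from the question: eigchk does signal a formal warning ("Near singularity in coefficient matrix.") in addition to the printed text, so a warning handler can catch it and skip that loglam value. The eigenvalue range itself goes through cat, so wrapping the call in capture.output would be needed if you also want to store that text.

for (ilam in 1:nlam) {
  lambda <- 10^loglam[ilam]
  betalisti <- betalist
  betafdPar2 <- betalisti[[2]]
  betafdPar2$lambda <- lambda
  betalisti[[2]] <- betafdPar2
  fRegi <- tryCatch(
    fRegress.CV(annualprec, templist, betalisti),
    warning = function(w) {
      message("loglam = ", loglam[ilam], " skipped: ", conditionMessage(w))
      NULL
    }
  )
  SSE.CV[ilam] <- if (is.null(fRegi)) NA else fRegi$SSE.CV
}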

R - Arrays with variable dimension

I have a weird question.
Essentially, I have a function which takes a data frame of dimension Nx(2k) and transforms it into an array of dimension Nx2xk. I then use that array in various places in the function.
My issue is this: when k == 1 (a data frame of just two columns), I'm left with a matrix of dimension Nx2, and even worse, if N = 1, I'm stuck with a matrix of dimension 1x2.
I would like to write myArray[thisRow,,] to select that slice of the array, but this falls short in the N = 1, k = 2 case. I tried myArray[thisRow,,,drop = FALSE], but that gives an 'incorrect number of dimensions' error. The same issue arises for the Nx2 case.
Is there a workaround for this issue, or do I need to break my code into cases? (A sketch of one idea follows the sample code below.)
Sample Code Shown Below:
library(abind)

thisFunction <- function(myDF)
{
  nGroups = NCOL(myDF)/2
  afMyArray = myDF
  if (nGroups > 1)
  {
    afMyArray = abind(lapply(1:nGroups, function(g) { myDF[, 2*(g-1) + 1:2] }),
                      along = 3)
  }
  sapply(1:NROW(myDF),
         function(r)
         {
           thisSlice = afMyArray[r, , ]
           # *some operation on thisSlice*
         })
}
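One workaround I have sketched is to build the 3-D array unconditionally, so the slicing is uniform even when nGroups == 1 or N == 1, and then to drop only the row dimension by hand; is something like this the idiomatic approach?

thisFunction2 <- function(myDF) {
  nGroups <- NCOL(myDF) / 2
  # column-major filling pairs columns (2g-1, 2g) with slice g,
  # matching the abind construction above
  afMyArray <- array(as.matrix(myDF), dim = c(NROW(myDF), 2, nGroups))
  sapply(seq_len(NROW(myDF)), function(r) {
    thisSlice <- afMyArray[r, , , drop = FALSE]  # 1 x 2 x nGroups, never collapses
    dim(thisSlice) <- c(2, nGroups)              # drop only the row dimension
    sum(thisSlice)  # placeholder for *some operation on thisSlice*
  })
}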
Thanks,
James

Incorporating point error information into a distance function--how to do it in R?

I have been working with the proxy package in R to implement a distance measure that weights Euclidean distance by the propagated errors of each individual point. The formula to do this is
sqrt((x_i - x_j)^2 + (y_i - y_j)^2 + ... + (n_i - n_j)^2) / sqrt((σ_xi^2 + σ_xj^2) + (σ_yi^2 + σ_yj^2) + ... + (σ_ni^2 + σ_nj^2))
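In R terms, for two observations with coordinate vectors p.i, p.j and propagated-error vectors s.i, s.j (hypothetical names), that is:

p.i <- c(1.0, 2.0); p.j <- c(1.5, 1.0)  # hypothetical coordinates
s.i <- c(0.1, 0.2); s.j <- c(0.1, 0.1)  # hypothetical propagated errors
sqrt(sum((p.i - p.j)^2)) / sqrt(sum(s.i^2 + s.j^2))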
I was able to get proxy to work for me in a basic sense (see proxy package in R, can't make it work) and replicated plain Euclidean distance functionality, hooray for the amateur.
However, once I started writing the function for the error-weighted distance, I immediately ran into a difficulty: I need to read in the errors as distinct from the points and have them processed distinctly.
I know that R has very strong functionality and I'm sure it can do this, but for the life of me, I don't know how. It looks like proxy's dist can handle two matrix inputs, but how would I tell it that matrix X is the points and matrix Y is the errors, and then have each go to its appropriate part of the function before being ultimately combined into the distance measure?
I had been hoping to use proxy directly, but I also realized that it looks like I can't. I believe I was able to come up with a function that works. First, the distance function:
DistErrAdj <- function(x, y) {
  # pooled propagated error: sqrt of the summed squared errors,
  # matching the denominator of the formula above
  sqrt(sum((x^2) + (y^2)))
}
Followed, of course, by
library(proxy)
pr_DB$set_entry(FUN=DistErrAdj,names="DistErrAdj")
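As a quick sanity check of the registered measure on a hypothetical error matrix (one row of propagated errors per observation):

errs <- matrix(runif(12, 0.1, 0.5), nrow = 4)  # hypothetical errors: 4 points, 3 coordinates
dist(errs, method = "DistErrAdj")              # pairwise pooled errors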
Then, I took code already kindly written from augix (http://augix.com/wiki/Make%20trees%20in%20R,%20test%20its%20stability%20by%20bootstrapping.html) and altered it to suit my needs, to wit:
boot.errtree <- function(x, q, B = 1001, tree = "errave") {
  library(ape)
  library(protoclust)
  library(cluster)
  library(proxy)
  func <- function(x) {
    tr <- agnes(dist(x, method = "euclidean") / dist(q, method = "DistErrAdj"),
                diss = TRUE, method = "average")
    tr <- as.phylo(as.hclust(tr))
    return(tr)
  }
  if (tree == "errprot") {
    func <- function(x) {
      tr <- protoclust(dist(x, method = "euclidean") / dist(q, method = "DistErrAdj"))
      tr <- as.phylo(tr)
      return(tr)
    }
  }
  if (tree == "errdiv") {
    func <- function(x) {
      tr <- diana(dist(x, method = "euclidean") / dist(q, method = "DistErrAdj"),
                  diss = TRUE)
      tr <- as.phylo(as.hclust(tr))
      return(tr)
    }
  }
  tr_real <- func(x)
  plot(tr_real)
  bp <- boot.phylo(tr_real, x, FUN = func, B = B)
  nodelabels(bp)
  return(bp)
}
It seems to work.
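For example, a hypothetical call, with pts holding the measurements and errs the matching propagated errors (both names invented for illustration):

pts  <- matrix(rnorm(40), nrow = 8)             # hypothetical measurements
errs <- matrix(runif(40, 0.05, 0.2), nrow = 8)  # hypothetical matching errors
bp <- boot.errtree(pts, errs, B = 101, tree = "errave")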

(in R) Why is the result of ksvm using a user-defined linear kernel different from that of ksvm using "vanilladot"?

I wanted to use a user-defined kernel function for ksvm in R, so as practice I tried to write a vanilla-dot kernel and compare it with the "vanilladot" kernel built into kernlab.
I wrote my kernel as follows.
#
### vanilla kernel with class "kernel"
#
kfunction.k <- function() {
  k <- function(x, y) { crossprod(x, y) }
  class(k) <- "kernel"
  k
}

l <- 0.1; C <- 1/(2*l)

### use kfunction.k
tmp <- ksvm(x, factor(y), scaled = FALSE, type = "C-svc", kernel = kfunction.k(), C = C)
alpha(tmp)[[1]]
ind <- alphaindex(tmp)[[1]]
x.s <- x[ind, ]; y.s <- y[ind]
w.class.k <- t(alpha(tmp)[[1]] * y.s) %*% x.s
w.class.k
I thought the result of this operation would be equal to that of the following. However, it isn't.
#
### use "vanilladot"
#
l <- 0.1; C <- 1/(2*l)
tmp1 <- ksvm(x, factor(y), scaled = FALSE, type = "C-svc", kernel = "vanilladot", C = C)
alpha(tmp1)[[1]]
ind1 <- alphaindex(tmp1)[[1]]
x.s <- x[ind1, ]; y.s <- y[ind1]
w.tmp1 <- t(alpha(tmp1)[[1]] * y.s) %*% x.s
w.tmp1
I think this problem may be related to the kernel class.
When the class is set to "kernel", the problem occurs.
However, when the class is set to "vanillakernel", the result of ksvm using the user-defined kernel is equal to that of ksvm using "vanilladot" built into kernlab.
#
### vanilla kernel with class "vanillakernel"
#
kfunction.v.k <- function() {
  k <- function(x, y) { crossprod(x, y) }
  class(k) <- "vanillakernel"
  k
}
# The only difference between kfunction.k and kfunction.v.k is "class(k)".

l <- 0.1; C <- 1/(2*l)

### use kfunction.v.k
tmp <- ksvm(x, factor(y), scaled = FALSE, type = "C-svc", kernel = kfunction.v.k(), C = C)
alpha(tmp)[[1]]
ind <- alphaindex(tmp)[[1]]
x.s <- x[ind, ]; y.s <- y[ind]
w.class.v.k <- t(alpha(tmp)[[1]] * y.s) %*% x.s
w.class.v.k
I don't understand why the result differs from "vanilladot" when the class is set to "kernel".
Is there an error in my operation?
First, it seems like a really good question!
Now to the point. In the sources of ksvm we can find where the line is drawn between using a user-defined kernel and the built-ins:
if (type(ret) == "spoc-svc") {
  if (!is.null(class.weights))
    weightedC <- class.weights[weightlabels] * rep(C, nclass(ret))
  else weightedC <- rep(C, nclass(ret))
  yd <- sort(y, method = "quick", index.return = TRUE)
  xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
  count <- 0
  if (ktype == 4)
    K <- kernelMatrix(kernel, x)
  resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)),
                as.integer(ncol(xd)), as.double(rep(yd$x - 1, 2)),
                as.double(K), as.integer(if (sparse) xd@ia else 0),
                as.integer(if (sparse) xd@ja else 0), as.integer(sparse),
                as.integer(nclass(ret)), as.integer(count), as.integer(ktype),
                as.integer(7), as.double(C), as.double(epsilon),
                as.double(sigma), as.integer(degree), as.double(offset),
                as.double(C), as.double(2), as.integer(0), as.double(0),
                as.integer(0), as.double(weightedC), as.double(cache),
                as.double(tol), as.integer(10), as.integer(shrinking),
                PACKAGE = "kernlab")
  reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
  alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) + 1)],
                         nclass(ret)))[reind, , drop = FALSE]
  coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[, x][alpha(ret)[, x] != 0])
  names(coef(ret)) <- lev(ret)
  alphaindex(ret) <- lapply(sort(unique(y)), function(x) which(alpha(ret)[, x] != 0))
  xmatrix(ret) <- x
  obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
  names(alphaindex(ret)) <- lev(ret)
  svindex <- which(rowSums(alpha(ret) != 0) != 0)
  b(ret) <- 0
  param(ret)$C <- C
}
The important part comes down to two things. First, if we provide ksvm with our own kernel, then ktype = 4 (while for vanillakernel, ktype = 0), which causes two changes:
in the case of a user-defined kernel, the kernel matrix is computed up front instead of the kernel function being used during optimization
the tron_optim routine is run with the information regarding the kernel
Now, in svm.cpp we can find the tron routines, and in tron_run (called from tron_optim) we see that the LINEAR kernel has a separate optimization routine:
if (param->kernel_type == LINEAR)
{
    /* lots of code here */
    while (Cpj < Cp)
    {
        totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w,
                             Cpj, Cnj, param->eps, sii, param->shrinking,
                             param->qpsize);
        /* lots of code here */
    }
    totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
                         param->eps, sii, param->shrinking, param->qpsize);
    delete[] w;
}
else
{
    Solver_B s;
    s.Solve(l, BSVC_Q(*prob, *param, y), minus_ones, y, alpha, Cp, Cn,
            param->eps, sii, param->shrinking, param->qpsize);
}
As you can see, the linear case is treated in a more complex, more detailed way. There is an inner optimization loop that calls the solver many times. It would require a really deep analysis of the actual optimization being performed here, but at this step one can answer your question as follows:
There is no error in your operation.
kernlab's svm has a separate routine for training an SVM with a linear kernel, selected based on the type of kernel passed to the code; changing "kernel" to "vanillakernel" made ksvm think it was actually working with vanillakernel, and so it performed this separate optimization routine.
It does not seem to be a bug; in fact, the linear SVM is very different from the kernelized version in terms of efficient optimization techniques. The amount of heuristics and numerical issues that have to be taken care of is really big. As a result, some approximations are required, and they can lead to different results. While for a rich feature space (like the one induced by the RBF kernel) it should not really matter, for simple kernels like the linear one these simplifications can lead to significant output changes.
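To see the effect end to end, here is a self-contained sketch on hypothetical toy data, reusing kfunction.k from the question; the two weight vectors typically come out close but not identical, for the reasons above.

library(kernlab)
set.seed(42)
x <- matrix(rnorm(60), ncol = 2)
y <- rep(c(-1, 1), each = 15)
x[y == 1, ] <- x[y == 1, ] + 2  # shift one class so the problem is learnable

kfunction.k <- function() {
  k <- function(x, y) { crossprod(x, y) }
  class(k) <- "kernel"
  k
}

C <- 1 / (2 * 0.1)
m.user <- ksvm(x, factor(y), scaled = FALSE, type = "C-svc", kernel = kfunction.k(), C = C)
m.van  <- ksvm(x, factor(y), scaled = FALSE, type = "C-svc", kernel = "vanilladot", C = C)

# recover the primal weight vector from each fit, as in the question
w <- function(m) t(alpha(m)[[1]] * y[alphaindex(m)[[1]]]) %*% x[alphaindex(m)[[1]], ]
w(m.user)
w(m.van)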
