Catching the print of the function - r

I am using the fda package, in particular the function fRegress. This function calls another function, eigchk, which checks whether the coefficient matrix is singular.
Here is the function as the package authors (J. O. Ramsay, Giles Hooker, and Spencer Graves) wrote it.
eigchk <- function(Cmat) {
# check Cmat for singularity
eigval <- eigen(Cmat)$values
ncoef <- length(eigval)
if (eigval[ncoef] < 0) {
neig <- min(length(eigval),10)
cat("\nSmallest eigenvalues:\n")
print(eigval[(ncoef-neig+1):ncoef])
cat("\nLargest eigenvalues:\n")
print(eigval[1:neig])
stop("Negative eigenvalue of coefficient matrix.")
}
if (eigval[ncoef] == 0) stop("Zero eigenvalue of coefficient matrix.")
logcondition <- log10(eigval[1]) - log10(eigval[ncoef])
if (logcondition > 12) {
warning("Near singularity in coefficient matrix.")
cat(paste("\nLog10 Eigenvalues range from\n",
log10(eigval[ncoef])," to ",log10(eigval[1]),"\n"))
}
}
As you can see, the last if condition checks whether logcondition is greater than 12 and, if so, prints the range of the eigenvalues.
The following code implements regularization with a roughness penalty. It is taken from the book "Functional Data Analysis with R and MATLAB".
annualprec = log10(apply(daily$precav,2,sum))
tempbasis =create.fourier.basis(c(0,365),65)
tempSmooth=smooth.basis(day.5,daily$tempav,tempbasis)
tempfd =tempSmooth$fd
templist = vector("list",2)
templist[[1]] = rep(1,35)
templist[[2]] = tempfd
conbasis = create.constant.basis(c(0,365))
betalist = vector("list",2)
betalist[[1]] = conbasis
SSE = sum((annualprec - mean(annualprec))^2)
Lcoef = c(0,(2*pi/365)^2,0)
harmaccelLfd = vec2Lfd(Lcoef, c(0,365))
betabasis = create.fourier.basis(c(0, 365), 35)
lambda = 10^12.5
betafdPar = fdPar(betabasis, harmaccelLfd, lambda)
betalist[[2]] = betafdPar
annPrecTemp = fRegress(annualprec, templist, betalist)
betaestlist2 = annPrecTemp$betaestlist
annualprechat2 = annPrecTemp$yhatfdobj
SSE1.2 = sum((annualprec-annualprechat2)^2)
RSQ2 = (SSE - SSE1.2)/SSE
Fratio2 = ((SSE-SSE1.2)/3.7)/(SSE1.2/30.3)
resid = annualprec - annualprechat2
SigmaE. = sum(resid^2)/(35-annPrecTemp$df)
SigmaE = SigmaE.*diag(rep(1,35))
y2cMap = tempSmooth$y2cMap
stderrList = fRegress.stderr(annPrecTemp, y2cMap, SigmaE)
betafdPar = betaestlist2[[2]]
betafd = betafdPar$fd
betastderrList = stderrList$betastderrlist
betastderrfd = betastderrList[[2]]
As the penalty factor the authors use one fixed value of lambda.
The following code implements the search for an appropriate lambda.
loglam = seq(5,15,0.5)
nlam = length(loglam)
SSE.CV = matrix(0,nlam,1)
for (ilam in 1:nlam) {
lambda = 10^loglam[ilam]
betalisti = betalist
betafdPar2 = betalisti[[2]]
betafdPar2$lambda = lambda
betalisti[[2]] = betafdPar2
fRegi = fRegress.CV(annualprec, templist,
betalisti)
SSE.CV[ilam] = fRegi$SSE.CV
}
By varying loglam and using cross-validation I expect to find the best lambda. However, if loglam is too long, or its values drive the coefficient matrix towards singularity, I receive the following message:
Log10 Eigenvalues range from
-5.44495317739048 to 6.78194912518214
It is created by the function eigchk, as mentioned above.
Now my question: is there any way to catch this so-called warning? By "catch" I mean some function or method that alerts me when this has happened, so that I can adjust the values of loglam. Since, as far as I can see, the function defines no actual warning besides this printed message, I have run out of ideas.
Thank you all a lot for your suggestions.

If by "catch the warning" you mean being alerted that there is a potential problem with loglam, then look at the try and tryCatch functions. With them you can define the behavior you want whenever a warning condition is raised. Note that eigchk does call warning("Near singularity in coefficient matrix.") just before it prints the eigenvalue range, so there is a real warning condition to catch.
If you just want to store the printed output (which might be assumed from the question title, but may not be what you want), then look into capture.output.
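A minimal sketch of both ideas combined, in base R. The function noisy_check is a hypothetical stand-in that mimics only the last branch of eigchk (a warning followed by cat output), so the example runs without fda; check_quietly is likewise an illustrative name, not part of any package. It uses withCallingHandlers rather than tryCatch so that the wrapped call keeps running after the warning is noted.

```r
# Hypothetical stand-in for eigchk's near-singularity branch:
# it raises a warning and then prints a message with cat().
noisy_check <- function(logcondition) {
  if (logcondition > 12) {
    warning("Near singularity in coefficient matrix.")
    cat("\nLog10 Eigenvalues range spans", logcondition, "\n")
  }
  invisible(logcondition)
}

# Run the check, record whether the warning fired, and keep the printed text.
check_quietly <- function(logcondition) {
  flagged <- FALSE
  printed <- capture.output(
    withCallingHandlers(
      noisy_check(logcondition),
      warning = function(w) {
        if (grepl("Near singularity", conditionMessage(w))) {
          flagged <<- TRUE                 # remember that it happened
          invokeRestart("muffleWarning")   # continue without emitting the warning
        }
      }
    )
  )
  list(flagged = flagged, printed = printed)
}

res_ok  <- check_quietly(5)    # well-conditioned case
res_bad <- check_quietly(13)   # near-singular case
```

In the cross-validation loop from the question, the same wrapper could presumably be placed around the fRegress.CV call, storing the flag for each ilam so that the offending values of loglam can be dropped afterwards.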

Related

(R) Error in optim - attempt to apply non-function, when function is defined

Not sure what I'm doing wrong here. I'm trying to get a cross-validation score for a mixture-of-two-gammas model.
llikGammaMix2 = function(param, x) {
if (any(param < 0) || param["p1"] > 1) {
return(-Inf)
} else {
return(sum(log(
dgamma(x, shape = param["k1"], scale = param["theta1"]) *
param["p1"] + dgamma(x, shape = param["k2"], scale = param["theta2"]) *
1
(1 - param["p1"])
)))
}
}
initialParams = list(
theta1 = 1,
k1 = 1.1,
p1 = 0.5,
theta2 = 10,
k2 = 2
)
for (i in 1:nrow(cichlids)) {
SWS1_training <- cichlids$SWS1 - cichlids$SWS1[i]
SWS1_test <- cichlids$SWS1[i]
MLE_training2 <-
optim(
par = initialParams,
fn = llikGammaMix2,
x = SWS1_training,
control = list(fnscale = -1)
)$par
LL_test2 <-
optim(
par = MLE_training2,
fn = llikGammaMix2,
x = SWS1_test,
control = list(fnscale = -1)
)$value
}
print(LL_test2)
This runs until it gets to the first optim(), then spits out Error in fn(par, ...) : attempt to apply non-function.
My first thought was a silly spelling error somewhere, but that doesn't seem to be the case. Any help is appreciated.
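I believe the issue is in the return statement. A stray 1 sits on its own line directly before (1 - param["p1"]). Because the surrounding expression is still open at that point, R reads the two lines together as 1(1 - param["p1"]), that is, a call to 1 as if it were a function, which is what "attempt to apply non-function" means. It's unclear whether you meant to multiply or add that last quantity; since this is a mixture, I'm guessing you meant the second density to be multiplied by it: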
return(sum(log(dgamma(x, shape = param["k1"], scale = param["theta1"]) *
param["p1"] +
dgamma(x, shape = param["k2"], scale = param["theta2"]) *
(1 - param["p1"])))) ## ISSUE HERE: Is this what you meant?
There could be other issues with the code. I would double check that the function you are optimizing is what you think it ought to be. It's also hard to tell unless you give a reproducible example we might be able to use. Try to clear up the above issue and let us know if there are still problems.
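As a hedged illustration of the corrected likelihood in use, here is a sketch on simulated data (the question's cichlids data are not available, so x_sim and the snake_case names are assumptions). It also passes the starting values as a named numeric vector built with c(), which is the form optim documents for par, rather than a list.

```r
# Log-likelihood of a two-component gamma mixture; param is a named numeric vector.
llik_gamma_mix2 <- function(param, x) {
  if (any(param < 0) || param["p1"] > 1) return(-Inf)
  dens <- param["p1"] * dgamma(x, shape = param["k1"], scale = param["theta1"]) +
    (1 - param["p1"]) * dgamma(x, shape = param["k2"], scale = param["theta2"])
  sum(log(dens))
}

# Named numeric vector of starting values (same values as in the question).
initial_params <- c(theta1 = 1, k1 = 1.1, p1 = 0.5, theta2 = 10, k2 = 2)

# Simulated stand-in data: a mix of two gamma samples.
set.seed(42)
x_sim <- c(rgamma(60, shape = 2, scale = 1), rgamma(40, shape = 3, scale = 8))

# fnscale = -1 turns optim's minimisation into maximisation.
fit <- optim(par = initial_params, fn = llik_gamma_mix2, x = x_sim,
             control = list(fnscale = -1))
```

For leave-one-out cross-validation, the training set would presumably be cichlids$SWS1[-i] (drop the i-th value) rather than cichlids$SWS1 - cichlids$SWS1[i] (subtract it), and the held-out score would be llik_gamma_mix2(fit$par, SWS1_test) evaluated directly instead of a second optim call.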

Storing information during optim()

I have a general function; below I provide a simple linear regression as an example:
x = 1:30
y = 0.7 * x + 32
Data = rnorm(30, mean = y, sd = 2.5);
lin = function(pars = c(grad,cons)) {
expec = pars[1] * x + pars[2];
SSE = sum((Data - expec)^2)
return(SSE)
}
start_vals = c(0.2,10)
lin(start_vals)
estimates = optim(par = start_vals, fn = lin);
## plot the data
Fit = estimates$par[1] * x + estimates$par[2]
plot(x,Data)
lines(x, Fit, col = "red")
So that's straightforward. What I want is to store the expectation for the last set of parameters, so that once I have finished optimizing I can view it. I have tried using a global container and populating it whenever the function is executed, but it doesn't work, e.g.
Expectation = c();
lin = function(pars = c(grad,cons)) {
expec = pars[1] * x + pars[2];
Expectation = expec;
SSE = sum((Data - expec)^2)
return(SSE)
}
start_vals = c(0.2,10)
estimates = optim(par = start_vals, fn = lin);
Expectation ## print the expectation that would relate to estimates$par
I know that this is trivial to do outside of the function, but my actual problem (which is analogous to this) is much more complex. Basically I need to return internal information that can't be retrospectively calculated. Any help is much appreciated.
You should use <<- instead of = in your lin function: Expectation <<- expec. With =, the assignment creates a new local variable inside lin and the global Expectation is never touched. The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned.
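A small sketch of this on the question's own example. One caveat that is an addition here, not part of the original answer: the last parameter vector that optim happens to evaluate is not guaranteed to be the one it returns, so the sketch calls lin once more with estimates$par to make sure the stored expectation matches the reported estimates.

```r
set.seed(1)
x <- 1:30
Data <- rnorm(30, mean = 0.7 * x + 32, sd = 2.5)

Expectation <- NULL   # container in the enclosing environment

lin <- function(pars) {
  expec <- pars[1] * x + pars[2]
  Expectation <<- expec          # superassignment updates the outer variable
  sum((Data - expec)^2)
}

estimates <- optim(par = c(0.2, 10), fn = lin)

# Re-evaluate at the returned optimum so Expectation corresponds to estimates$par.
invisible(lin(estimates$par))
```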

NSGA2 Genetic Algorithm in R

I am working on the NSGA2 package on R (library mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, maybe up to 3 decimal places) instead of infinite?
2) How do I set an equality constraint (the examples online all seem to be about >= or <= rather than =)? Not sure if I'm doing it right.
My entire relevant code for reference, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
Relevant sample part of code reproduced here:
VTR = get.hist.quote(instrument = 'VTR',
start="2010-01-01", end = "2015-12-31",
quote = c("AdjClose"),provider = "yahoo",
compress = "d")
ObjFun1 <- function (xh){
f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP), xh, 0.05, 2))
tempt = merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
tempt2 = tempt[(nrow(tempt)-(2*N)):nrow(tempt),]
for (i in 1:nrow(tempt2))
{
for (j in 1:ncol(tempt2))
{
if (is.na(tempt2[i,j]))
{
tempt2[i,j] = 0
}
}
}
f2 <- ((-1)*abs(sum((xh*t(tempt2)))))
c(f1=f1,f2=f2)
}
Constr <- function(xh){
totwt <- (1-sum(-xh))
totwt2 <- (sum(xh)-1)
c(totwt,totwt2)
}
Solution1 <- nsga2(ObjFun1, n.projects, 2,
lower.bounds=rep(0,n.projects), upper.bounds=rep(1,n.projects),
popsize=n.solutions, constraints = Constr, cdim=1,
generations=generations)
The function HSVaR_P returns matrix(x,2*500,1).
Even when I set generations = 1, the code does not seem to run. Clearly there should be some error in the code, somewhere, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
Thanks.

R fit user defined distribution

I am trying to fit my own distribution to my data, find the optimum parameters of the distribution to match the data and ultimately find the FWHM of the peak in the distribution. From what I've read, the package fitdistrplus is the way to do this. I know the data takes the shape of a lorentzian peak on a quadratic background.
[Figure: plot of the raw data]
The raw data used:
data = c(0,2,5,4,5,4,3,3,2,2,0,4,4,2,5,5,3,3,4,4,4,3,3,5,5,6,6,8,4,0,6,5,7,5,6,3,2,1,7,0,7,9,5,7,5,3,5,5,4,1,4,8,10,2,5,8,7,14,7,5,8,4,2,2,6,5,4,6,5,7,5,4,8,5,4,8,11,9,4,8,11,7,8,6,9,5,8,9,10,8,4,5,8,10,9,12,10,10,5,5,9,9,11,19,17,9,17,10,17,18,11,14,15,12,11,14,12,10,10,8,7,13,14,17,18,16,13,16,14,17,20,15,12,15,16,18,24,23,20,17,21,20,20,23,20,15,20,28,27,26,20,17,19,27,21,28,32,29,20,19,24,19,19,22,27,28,23,37,41,42,34,37,29,28,28,27,38,32,37,33,23,29,55,51,41,50,44,46,53,63,49,50,47,54,54,43,45,58,54,55,67,52,57,67,69,62,62,65,56,72,75,88,87,77,70,71,84,85,81,84,75,78,80,82,107,102,98,82,93,98,90,94,118,107,113,103,99,103,96,108,114,136,126,126,124,130,126,113,120,107,107,106,107,136,143,135,151,132,117,118,108,120,145,140,122,135,153,157,133,130,128,109,106,122,133,132,150,156,158,150,137,147,150,146,144,144,149,171,185,200,194,204,211,229,225,235,228,246,249,238,214,228,250,275,311,323,327,341,368,381,395,449,474,505,529,585,638,720,794,896,919,1008,1053,1156,1134,1174,1191,1202,1178,1236,1200,1130,1094,1081,1009,949,890,810,760,690,631,592,561,515,501,489,467,439,388,377,348,345,310,298,279,253,257,259,247,237,223,227,217,210,213,197,197,192,195,198,201,202,211,193,203,198,202,174,164,162,173,170,184,170,168,175,170,170,168,162,149,139,145,151,144,152,155,170,156,149,147,158,171,163,146,151,150,147,137,123,127,136,149,147,124,137,133,129,130,128,139,137,147,141,123,112,136,147,126,117,116,100,110,120,105,91,100,100,105,92,88,78,95,75,75,82,82,80,83,83,66,73,80,76,69,81,93,79,71,80,90,72,72,63,57,53,62,65,49,51,57,73,54,56,78,65,52,58,49,47,56,46,43,50,43,40,39,36,45,28,35,36,43,48,37,36,35,39,31,24,29,37,26,22,36,33,24,31,31,20,30,28,23,21,27,26,29,21,20,22,18,19,19,20,21,20,25,18,12,18,20,20,13,14,21,20,16,18,12,17,20,24,21,20,18,11,17,12,5,11,13,16,13,13,12,12,9,15,13,15,11,12,11,8,13,16,16,16,14,8,8,10,11,11,17,15,15,9,9,13,12,3,11,14,11,14,13,8,7,7,15,12,8,12,14,9,5,2,10,8)
I have calculated the equations which define the distribution and cumulative distribution:
dFF <- function(x,a,b,c,A,gamma,pos) a + b*x + (c*x^2) + ((A/pi)*(gamma/(((x-pos)^2) + (gamma^2))))
pFF <- function(x,a,b,c,A,gamma,pos) a*x + (b/2)*(x^2) + (c/3)*(x^3) + A/2 + (A/pi)*(atan((x - pos)/gamma))
I believe these to be correct. From what I understand, a distribution fit should be possible using just these definitions using the fitdist (or mledist) method:
fitdist(data,'FF', start = list(0,0.3,-0.0004,70000,13,331))
mledist(data,'FF', start = list(0,0.3,-0.0004,70000,13,331))
This returns the statement 'function cannot be evaluated at initial parameters> Error in fitdist(data, "FF", start = list(0, 0.3, -4e-04, 70000, 13, 331)):the function mle failed to estimate the parameters, with the error code 100' in the first case and in the second I just get a list of 'NA' values for the estimates.
I then calculated a function to give the quantile distribution values to use the other fitting methods (qmefit):
qFF <- function(p,a,b,c,A,gamma,pos)
{
qList = c()
axis = seq(1,600,1)
aF = dFF(axis,a,b,c,A,gamma,pos)
arr = histogramCpp(aF) # change data to a histogram format
for(element in 1:length(p)){
q = quantile(arr,p[element], names=FALSE)
qList = c(qList,q)
}
return(qList)
}
Part of this code requires calling the c++ function (by using the library Rcpp):
#include <Rcpp.h>
#include <vector>
#include <math.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::vector<int> histogramCpp(NumericVector x) {
std::vector<int> arr;
double number, fractpart, intpart;
for(int i = 0; i <= 600; i++){
number = (x[i]);
fractpart = modf(number , &intpart);
if(fractpart < 0.5){
number = (int) intpart;
}
if(fractpart >= 0.5){
number = (int) (intpart+1);
}
for(int j = 1; j <= number; j++){
arr.push_back(i);
}
}
return arr;
}
This C++ function just turns the data into a histogram format: if the first element of the vector describing the data is 4, then '1' is added 4 times to the returned vector, and so on. This also seems to work, as sensible values are returned.
[Figure: quantiles returned for probabilities from 0 to 1 in steps of 0.001]
The 'qmefit' method can then be attempted through the fitdist function:
fitdist(data,'FF', start = list(0,0.3,-0.0004,70000,13,331), method = 'qme', probs = c(0,0.3,0.4,0.5,0.7,0.9))
I chose the 'probs' values randomly as I don't fully understand their meaning. This either straight-up crashes the R session or after a brief stuttering returns a list of 'NA' values as estimates and the line <std::bad_alloc : std::bad_alloc>
I am not sure if I am making a basic mistake here and any help or recommendations are appreciated.
In the end I managed to find a work-around for this using the rPython package and lmfit from python. It solved my issue and might be useful for others with the same issue. The R-code was as follows:
library(rPython)
python.load("pyFit.py")
python.assign("row",pos)
python.assign("vals",vals)
python.exec("FWHM,ERROR,FIT = fitDist(row,vals)")
FWHM = python.get("FWHM")
ERROR = python.get("ERROR")
cFIT = python.get("FIT")
and the called python code was:
from lmfit import Model, minimize, Parameters, fit_report
from sklearn import mixture
import numpy as np
import matplotlib.pyplot as plt
import math
def cauchyDist(x,a,b,c,d,e,f,g,A,gamma,pos):
    return a + b*x + c*pow(x,2) + d*pow(x,3) + e*pow(x,4) + f*pow(x,5) + g*pow(x,6) + (A/np.pi)*(gamma/((pow((x-pos),2)) + (pow(gamma,2))))
def fitDist(row, vals):
    gmod = Model(cauchyDist)
    x = np.arange(0,600)
    result = gmod.fit(vals, x=x, a = 0, b = 0.3, c = -0.0004, d = 0, e = 0, f= 0, g = 0, A = 70000, gamma = 13, pos = row)
    newFile = open('fitData.txt', 'w')
    newFile.write(result.fit_report())
    newFile.close()
    with open('fitData.txt', 'r') as inF:
        for line in inF:
            if 'gamma:' in line:
                j = line.split()
        inF.close()
    FWHM = float(j[1])
    error = float(j[3])
    fit = result.best_fit
    fit = fit.tolist()
    return FWHM, error, fit
I increased the order of polynomial to obtain a better fit for the data and returned the FWHM, its error and the values for the fit. There are likely much better ways of achieving this but the final fit is as I needed.
Final fit. Red data points are raw data, the black line is the fitted distribution.
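For readers who would rather stay in R, the same kind of curve fit can be sketched with nls from the stats package. This is only an illustration under stated assumptions: the data below are synthetic (generated from a Lorentzian on a quadratic background plus noise), and the names lorentz_bg, b0, b1, b2, amp, hwhm and pos are chosen here, not taken from the question. In this parameterisation the width parameter is the half width at half maximum, so the full width at half maximum is twice the fitted value.

```r
set.seed(123)
x <- 1:600

# Lorentzian peak on a quadratic background (same form as dFF in the question).
lorentz_bg <- function(x, b0, b1, b2, amp, hwhm, pos) {
  b0 + b1 * x + b2 * x^2 + (amp / pi) * hwhm / ((x - pos)^2 + hwhm^2)
}

# Synthetic stand-in data with known parameters and Gaussian noise.
y <- lorentz_bg(x, b0 = 5, b1 = 0.3, b2 = -0.0005,
                amp = 50000, hwhm = 12, pos = 330) + rnorm(length(x), sd = 5)

# Nonlinear least squares, starting from the question's initial guesses.
fit <- nls(y ~ lorentz_bg(x, b0, b1, b2, amp, hwhm, pos),
           start = list(b0 = 0, b1 = 0.3, b2 = -0.0004,
                        amp = 70000, hwhm = 13, pos = 331))

est  <- coef(fit)
fwhm <- 2 * est[["hwhm"]]   # FWHM of a Lorentzian is twice its half width
```

Treating the vector as y-values of a curve, rather than as a sample to be fitted with fitdist, matches what the lmfit work-around does; standard errors for the parameters are available from summary(fit).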

Ifelse statement inside deSolve not working

I want to create a dynamic model of butterfly ecology using deSolve. The simulation runs over several simulated years and some events are triggered by the day of the year (so I added a state variable, days). To trigger those events I want to use an ifelse statement, and it works fine until I put an operation involving another state variable inside the ifelse: D.egg.sus=(ifelse(days<270,(400 * adult.sus),0)).
When I do so, the simulation runs, but it seems to ignore the ifelse statement.
Can anyone help me please? Here is my full code:
days = 1
egg.sus = 0
larvae.sus = 0
pupae.sus = 0
adult.sus = 1000
state = c(days = days, egg.sus=egg.sus, larvae.sus=larvae.sus,
pupae.sus=pupae.sus, adult.sus=adult.sus)
model = function(t, state, parameters)
{
with(as.list(c(state, parameters)),
{
D.Days = 1
D.egg.sus =
( ifelse(days<270, (400*adult.sus) ,0)) ## This is the line causing trouble
(- egg.sus/5)
(- egg.sus * rbeta(1, 6.038892/5,1.4612593)*.95)
D.larvae.sus =
(+ egg.sus/5)
(- larvae.sus * rbeta(1, 0.248531/14,0.2094379)*0.95)
(- larvae.sus/14)
D.pupae.sus =
(+ larvae.sus/14)
(- pupae.sus * rbeta(1, 0.022011/15, 1.43503))
(- pupae.sus/15)
D.adult.sus =
(+ pupae.sus/15)
(- adult.sus/30)
list(c( D.Days, D.egg.sus, D.larvae.sus,D.pupae.sus, D.adult.sus))
}
)}
events <- data.frame(var = c('days'),
time = seq(364,73000,by=365) ,
value = 0,
method = "rep")
require(deSolve)
times = seq(1,900, by = 1)
out = ode(y=state, times = times, func = model, parms = parameters, events = list(data=events))
dev.cur()
plot(out, col = 2)
I don't know about five years ago, but at the time of writing ifelse works just fine with deSolve. Your issue seems to be that the value produced by your ifelse does not end up in the model the way you expected. You might instead use a flag variable, or save the result of ifelse to a variable that you then use in your model.
Here is a small example demonstrating how you can use a flag in your model parameters:
library(deSolve)
# Our model function, first-order
# One parameter is a flag that is used by the ifelse to set Ka to zero if TRUE.
onecomp <- function(t, state, parameters) {
with(as.list(c(state, parameters)), {
Ka = ifelse(flag == TRUE, 0, Ka) # Set Ka to zero when the flag is TRUE
dX <- - X*Ka
dY <- X*Ka - Y*Ke
list(c(dX, dY))
})
}
times <- seq(0, 24, by = 0.01)
parameters <- c(Ka = 0.8 , Ke = 0.2, flag = FALSE)
state <- c(X = 100 , Y = 0)
# Test with flag = FALSE, where we expect normal transfer.
out <- ode(y = state, times = times, func = onecomp, parms = parameters)
plot(out)
# Test with flag = TRUE, where Ka is set to zero and we expect no transfer.
parameters <- c(Ka = 0.8 , Ke = 0.2, flag = TRUE)
out <- ode(y = state, times = times, func = onecomp, parms = parameters)
plot(out)
Created on 2021-01-13 by the reprex package (v0.3.0)
The model in the question has several issues:
You can use the simulation time directly instead of a state variable days, because simulation time is passed to the function as t. Then just use the modulo operator %% and you don't need events anymore.
The parameters are all hard-coded, so use parms=NULL in the ode function.
Line breaks are wrong. R continues a line if (and only if) it is not yet syntactically complete. Therefore, remove the obsolete parentheses and, for example, put the - operator at the end of the line.
Using random numbers (e.g. rbeta or rgamma) within an ODE function is a very bad idea, especially for solvers with automatic time steps. ODEs are deterministic by definition. One may consider a fixed time-step solver instead, e.g. method="euler" with a very small time step, or (much better) provide the random values as an external input (forcing).
If you use an external input, you can avoid the ifelse anyway.
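The first three points can be sketched as follows. This is an illustration, not the original author's model: the function name butterfly and the fixed mortality rates m.egg, m.larvae and m.pupae are assumptions that replace the random rbeta draws, and their values are placeholders. The day of the year comes from t %% 365, and every continued line ends with an operator so that R keeps reading the expression.

```r
# Derivative function in the form deSolve expects: function(t, state, parms).
butterfly <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    day    <- t %% 365                               # day of year from simulation time
    laying <- if (day < 270) 400 * adult.sus else 0  # eggs are laid only before day 270

    # Operators at the END of each line keep the expression syntactically open.
    d.egg <- laying -
      egg.sus / 5 -
      egg.sus * m.egg
    d.larvae <- egg.sus / 5 -
      larvae.sus * m.larvae -
      larvae.sus / 14
    d.pupae <- larvae.sus / 14 -
      pupae.sus * m.pupae -
      pupae.sus / 15
    d.adult <- pupae.sus / 15 -
      adult.sus / 30

    list(c(d.egg, d.larvae, d.pupae, d.adult))
  })
}

state <- c(egg.sus = 0, larvae.sus = 0, pupae.sus = 0, adult.sus = 1000)
parms <- c(m.egg = 0.4, m.larvae = 0.05, m.pupae = 0.001)  # placeholder rates

# Evaluate the derivatives on a laying day and on a non-laying day.
d_day10  <- butterfly(10,  state, parms)[[1]]
d_day300 <- butterfly(300, state, parms)[[1]]
```

With deSolve loaded, the function can be passed to the solver as ode(y = state, times = times, func = butterfly, parms = parms), without the days state variable and without the events argument.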
