I have a function R^5 -> R and I want to find its minimum. R offers several optimizers, such as optim, optimize, or fminbnd from the pracma package, but they seem to accept a function of only one argument, and I don't understand the help pages.
mindisturbed <- function(a,d1,d2,d3,p){
sum((data^(- a) * (d1 + d2*cos(log(data)*2*pi/p) + d3 *
sin(log(data)*2*pi/p)) - log(j))^2)
}
The "data" and "j" variables are defined in my global environment. Both are vectors of length k. The arguments of the function are all numeric values of length 1. The function is a residual sum of squares.
Does anyone know how to minimize this function with respect to all of its arguments?
Assuming data and j are vectors of the same length try the following. You may or may not need better starting values.
1) Use optim like this
st <- c(a = 1, d1 = 1, d2 = 1, d3 = 1, p = 1)
f <- function(x) mindisturbed(x[1], x[2], x[3], x[4], x[5])
optim(st, f)
2) or nls with default algorithm where st is from (1)
fo <- log(j) ~ data^(- a) * (d1 + d2*cos(log(data)*2*pi/p) + d3 *
sin(log(data)*2*pi/p))
nls(fo, start = st)
3) or nls with plinear algorithm. In that case the RHS of the formula is a matrix with column names d1, d2 and d3 such that first column multiplies d1, second d2 and third d3. Only the nonlinear parameters, i.e. a and p, are specified in start.
fo2 <- log(j) ~ data^(-a) * cbind(d1 = 1,
d2 = cos(log(data)*2*pi/p),
d3 = sin(log(data)*2*pi/p))
nls(fo2, start = c(a = 0.1, p = 0.1), algorithm = "plinear")
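To see why only a and p need starting values in (3): for fixed a and p the model is linear in d1, d2 and d3, so those three can be obtained by ordinary least squares. The sketch below mimics that idea by hand with lm.fit and optim. It is not what nls does internally, and the helper names design and profile_rss as well as the simulated data are made up for illustration.

```r
# Simulated stand-ins for the question's data and j (assumption)
set.seed(123)
n <- 100
data <- sort(runif(n, 1, 2))
j <- seq_len(n)

# Design matrix for fixed nonlinear parameters a and p
design <- function(a, p) {
  data^(-a) * cbind(d1 = 1,
                    d2 = cos(log(data) * 2 * pi / p),
                    d3 = sin(log(data) * 2 * pi / p))
}

# Residual sum of squares after solving for d1, d2, d3 by least squares
profile_rss <- function(theta) {
  X <- design(theta[[1]], theta[[2]])
  if (!all(is.finite(X))) return(Inf)  # guards against e.g. p == 0
  sum(lm.fit(X, log(j))$residuals^2)
}

start <- c(a = 0.1, p = 0.1)
res <- optim(start, profile_rss)  # Nelder-Mead over a and p only

# Recover the linear coefficients at the fitted a and p
d_hat <- lm.fit(design(res$par[[1]], res$par[[2]]), log(j))$coefficients
res$par
d_hat
```

The search is over two parameters instead of five, which is usually less sensitive to poor starting values.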
Note
The question did not include data and j but we can use these to try it out.
set.seed(123)
n <- 100
data <- runif(n, 1, 2)
j <- 1:n
o <- order(data)
j <- j[o]
data <- data[o]
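Putting (1) together with the test data gives a script that runs end to end. The larger maxit, the convergence check and the comparison with the starting value are additions for illustration. The start values are the same untuned guesses as above, so the result may only be a local minimum.

```r
# Test data as in the Note
set.seed(123)
n <- 100
data <- runif(n, 1, 2)
j <- 1:n
o <- order(data)
j <- j[o]
data <- data[o]

# Objective from the question
mindisturbed <- function(a, d1, d2, d3, p) {
  sum((data^(-a) * (d1 + d2 * cos(log(data) * 2 * pi / p) +
                    d3 * sin(log(data) * 2 * pi / p)) - log(j))^2)
}

# optim wants a single vector argument, so wrap the five parameters
st <- c(a = 1, d1 = 1, d2 = 1, d3 = 1, p = 1)
f <- function(x) mindisturbed(x[1], x[2], x[3], x[4], x[5])

res <- optim(st, f, control = list(maxit = 5000))
res$par          # estimated a, d1, d2, d3, p
res$value        # residual sum of squares at the estimate
res$convergence  # 0 means the optimizer reports convergence
f(st)            # residual sum of squares at the start, for comparison
```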
Related
I apologise if this is a duplicate; I've read answers to similar questions to no avail.
I'm trying to integrate under a curve, given a specific formula (below) for said integration.
As a toy example, here's some data:
library(deSolve) # provides lsoda()
Antia_Model <- function(t,y,p1){
r <- p1[1]; k <- p1[2]; p <- p1[3]; o <- p1[4]
P <- y[1]; I <- y[2]
dP = r*P - k*P*I
dI = p*I*(P/(P + o))
list(c(dP,dI))
}
r <- 0.25; k <- 0.01; p <- 1; o <- 1000 # Note that r can range btw 0.1 and 10 in this model
parms <- c(r, k, p, o)
P0 <- 1; I0 <- 1
N0 <- c(P0, I0)
TT <- seq(0.1, 50, 0.1)
results <- lsoda(N0, TT, Antia_Model, parms, verbose = FALSE)
P <- results[,2]; I <- results[,3]
As I understand it, I should be able to use the auc() function from the MESS package (can I just use the integrate() function? Unclear...), which should look something like this:
auc(P, TT, from = x1, to = x2, type = "spline")
Though I don't really understand how to use the "from" and "to" arguments, or how to incorporate "u" from the original integration formula...
Using the integrate() function seems more intuitive, but if I try:
u <- 1
integrand <- function(P) {u*P}
q <- integrate(integrand, lower = 0, upper = Inf)
I get this error:
# Error in integrate(integrand, lower = 0, upper = Inf) :
# the integral is probably divergent
As you can tell, I'm pretty lost, so any help would be greatly appreciated! Thank you so much! :)
integrand is technically acceptable, but with u = 1 it is just the identity function f(x) = x. The area under it over [0, Inf) is infinite, so the integral diverges.
From the documentation of integrate the first argument is:
an R function taking a numeric first argument and returning a numeric vector of the same length. Returning a non-finite element will generate an error.
If you instead integrate a function with finite area, such as a pulse, integrate returns a finite value:
pulse <- function(x) {ifelse(x < 5 & x >= 0, 1, 0)}
integrate(pulse, lower = 0, upper = Inf)
#> 5 with absolute error < 8.5e-05
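To integrate u*P(t) over the simulated time span rather than over [0, Inf), the integrand has to be a function of t that returns the solution values. Here is a minimal sketch, assuming P(t) = exp(-t) as a stand-in for the lsoda output so that the result can be checked against the exact answer. splinefun interpolates the sampled curve, and the trapezoid sum is a cross-check.

```r
TT <- seq(0.1, 50, 0.1)
P <- exp(-TT)   # stand-in for results[, 2] (assumption)
u <- 1

# Interpolating function of t built from the sampled values
P_fun <- splinefun(TT, P)
integrand <- function(t) u * P_fun(t)

# Integrate over the range that was actually simulated
q <- integrate(integrand, lower = min(TT), upper = max(TT),
               subdivisions = 1000L)

# Trapezoid rule on the raw grid as a cross-check
trap <- u * sum(diff(TT) * (head(P, -1) + tail(P, -1)) / 2)

# Exact value for the stand-in curve
exact <- u * (exp(-0.1) - exp(-50))
c(spline = q$value, trapezoid = trap, exact = exact)
```

With real lsoda output, only the line defining P would change.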
I need to find the value of a parameter that makes my function produce a specific result.
I wrote something like this:
## Defining the function
f = function(a, b, c, x) sqrt(t(c(a, b, c, x)) %*% rho %*% c(a, b, c, x))
## Set the inputs needed
rho <- matrix(c(1,0.3,0.2,0.4,
0.3,1,0.1,0.1,
0.2,0.1,1,0.5,
0.4,0.1,0.5,1),
nrow = 4, ncol = 4)
target <- 10000
## Optimize
output <- optimize(f, c(0, target), tol = 0.0001, a = 1000, b = 1000, c = 1000, maximum = TRUE)
I would like to derive the value of x at which my function reaches the target value.
Thanks,
Ric
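You can find one such x in closed form. For a symmetric matrix (like the one you have) whose leading eigenvalue lambda1 is positive, scaling the leading eigenvector by target / sqrt(lambda1) gives a vector x for which sqrt(t(x) %*% rho %*% x) equals the target: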
spectral_decomp <- eigen(rho, TRUE)
eigen_vec1 <- spectral_decomp$vectors[,1]
lambda1 <- spectral_decomp$values[[1]]
target <- 1000
x <- (target / sqrt(lambda1)) * eigen_vec1
check:
sqrt(matrix(x, nrow = 1) %*% rho %*% matrix(x, ncol = 1))
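If the aim is instead to keep a, b and c fixed at 1000 and solve for the scalar x at which f reaches the target, a one-dimensional root finder is enough. Here is a minimal sketch with uniroot, assuming the question's rho and target = 10000 and that a root lies in [0, target]; g is a helper name introduced here.

```r
rho <- matrix(c(1, 0.3, 0.2, 0.4,
                0.3, 1, 0.1, 0.1,
                0.2, 0.1, 1, 0.5,
                0.4, 0.1, 0.5, 1),
              nrow = 4, ncol = 4)

# Same function as in the question, returning a plain scalar
f <- function(a, b, c, x) {
  v <- c(a, b, c, x)
  sqrt(drop(t(v) %*% rho %*% v))
}

target <- 10000

# Root of g is the x at which f equals target
g <- function(x) f(1000, 1000, 1000, x) - target
sol <- uniroot(g, interval = c(0, target), tol = 1e-9)

sol$root
f(1000, 1000, 1000, sol$root)  # should be close to target
```

uniroot needs g to change sign over the interval; if it does not, widen the interval or check that the target is reachable.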
I am running a nonlinear regression model that needs initial values to start, but the number of variables I want to include may be too large to manually type all the values - therefore I was wondering if there's an alternative to that.
set.seed(12345)
y = rnorm(100, 1000,150)
x1 = rnorm(100,10000,251)
x2 = rnorm(100, 3000,654)
x3 = rnorm(100, 25000,100)
x4 = rnorm(100, 200000,589)
x5 = rnorm(100, 31657,296)
adstock <- function(x,rate=0){
return(as.numeric(stats::filter(x=log(x+1),filter=rate,method="recursive")))
}
library(minpack.lm)
nlsLM(y~b0
+ b1 * adstock(x1, r1)
+ b2 * adstock(x2, r2)
+ b3 * adstock(x3, r3)
+ b4 * adstock(x4, r4)
+ b5 * adstock(x5, r5)
, algorithm = "LM"
# this is where I need to paste the results from the loop
, start = c(b0=1,b1=1,b2=1,b3=1,b4=1,b5=1
,r1=0.1,r2=0.1,r3=0.1,r4=0.1,r5=0.1
)
# end
, control = list(maxiter = 200)
)
My idea was to use a loop to pass the values to the model, but I can't make it work (the following code should be for b_i coefficients)
test_start <- NULL
for(i in 1:(5+1)) {
test_start[i] = paste0("b",i-1,"=",1)
}
cat(test_start)
This is the result, which is not exactly what the model expects:
b0=1 b1=1 b2=1 b3=1 b4=1 b5=1
How can I pass the results of the loop to the model?
Also, how can I add r_i start coefficients to b_i start coefficients in the loop?
Any help would be greatly appreciated.
PS: at the moment I want to assign the same value to each of b0,b1,...,b5 (in this case, 1) and the same value to each of r1,r2,...,r5 (in this case, 0.1)
Define the data as DF and the formula as fo and then grep out the b and r variables. The line defining v creates a vector with their names and the line defining st a named vector with value 1 for the b's and 0.1 for the r's.
DF <- data.frame(y, x1, x2, x3, x4, x5)
n <- ncol(DF) - 1
rhs <- c("b0", sprintf("b%d * adstock(x%d, r%d)", 1:n, 1:n, 1:n))
fo <- reformulate(rhs, "y")
v <- grep("[br]", all.vars(fo), value = TRUE)
st <- setNames(grepl("b", v) + 0.1 * grepl("r", v), v)
st
nlsLM(fo, DF, start = st, algorithm = "LM", control = list(maxiter = 200))
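If you only need the start vector and already know how many coefficients there are, it can also be built directly without parsing the formula. A small sketch (make_start is a hypothetical helper, not part of minpack.lm):

```r
# Named start vector: b0..bn set to b_value, r1..rn set to r_value
make_start <- function(n, b_value = 1, r_value = 0.1) {
  c(setNames(rep(b_value, n + 1), paste0("b", 0:n)),
    setNames(rep(r_value, n), paste0("r", seq_len(n))))
}

st <- make_start(5)
st
```

The result is a named numeric vector, which is the form the start argument expects; the loop with paste0 in the question produced character strings instead.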
Regarding the comment try defining rhs like this. In the first line take whatever subset of labs you want, e.g. labs <- labels(...)[1:9] or change the formula in the first line, e.g. labs <- labels(terms(y ~ .*(1 + x1), data = DF))
labs <- labels(terms(y ~ .^2, data = DF))
labs <- sub(":", "*", labs)
n <- length(labs)
rhs <- c("b0", sprintf("b%d * adstock(%s, r%d)", 1:n, labs, 1:n))
I'm doing Maximum Likelihood Estimation using maxLik, which requires specifying starting values. Instead of specifying a single value, is there any way that allows me to use all the values from a matrix as the start value?
My current code of maxLik is:
f12 <- function(param){
alpha <- param[1]
rho <- param[2]
lambda <- param[3]
u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
p <- 1/(1 + exp(-rho*u))
f <- sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))}
ml <- maxLik(f12, start = c(alpha = 1, rho=2, lambda = 1), method = "NM")
I create a dataframe with the upper and lower bounds of potential start values:
st <- expand.grid(alpha = seq(0, 2, len = 100), rho = seq(0, 1, len = 100), lambda = seq(0, 2, len = 100))
There are 3 parameters in my function. My goal is to loop over all rows of the data frame st, run the model from each set of start values, and then select the best vector of start values.
Thanks!
Consider Map (a wrapper around mapply) to pass the columns of st elementwise to your function. Map returns a list of maxLik objects (which inherit from class maxim and are themselves lists of components), one per row of st.
Notice that the input parameters a, r, and l are passed into the start argument of maxLik() in place of the hard-coded values; f12 itself is left untouched.
maxLik_run <- function(a, r, l) {
tryCatch({
f12 <- function(param){
alpha <- param[1]
rho <- param[2]
lambda <- param[3]
u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
p <- 1/(1 + exp(-rho*u))
f <- sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))
}
return(maxLik(f12, start = c(alpha = a, rho = r, lambda = l), method = "NM"))
}, error = function(e) return(NA))
}
st <- expand.grid(alpha = seq(0, 2, len = 100),
rho = seq(0, 1, len = 100),
lambda = seq(0, 2, len = 100))
maxLik_list <- Map(maxLik_run, st$alpha, st$rho, st$lambda)
To answer the question of which is the best vector of start values after running the model from a variety of starting parameters, you first need a definition of "best". Once you have one, you can use Filter() on the returned list to select the element or elements that meet it.
Below is a demonstration that finds the fit with the highest maximised log-likelihood (the maximum component). Use the estimate component if you need the parameter values. Note that the returned list can contain more than one item if several fits share the highest value:
# drop runs that failed (maxLik_run returns NA on error)
fits <- Filter(is.list, maxLik_list)
highest_value <- max(sapply(fits, function(item) item$maximum))
maxLik_item_list <- Filter(function(i) i$maximum == highest_value, fits)
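The same multi-start pattern can be tried without the p12 data. The sketch below swaps in optim and a made-up objective with several local minima (toy_fn and run_one are hypothetical names) purely to show the Map, tryCatch and Filter steps. It is not the maxLik model from the question, and because optim minimises, "best" here means the lowest value.

```r
# Made-up objective with more than one local minimum
toy_fn <- function(par) sin(3 * par[1]) + (par[1] - 0.5)^2 + par[2]^2

# One optimisation run from a given start; NA if the run fails
run_one <- function(x0, y0) {
  tryCatch(optim(c(x = x0, y = y0), toy_fn),
           error = function(e) NA)
}

# Grid of start values, one run per row
starts <- expand.grid(x = seq(-3, 3, length.out = 5),
                      y = seq(-1, 1, length.out = 3))
fits <- Map(run_one, starts$x, starts$y)

# Keep successful runs and pick the one with the lowest objective
fits <- Filter(is.list, fits)
values <- sapply(fits, function(fit) fit$value)
best <- fits[[which.min(values)]]
best$par
best$value
```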
In your log-likelihood function you unpack alpha, rho and lambda from param, while your grid st already holds candidate values for them, so each row of st can be used as a start vector. Also avoid assigning the result to a variable called f12 inside the function, since that is the function's own name. Because the function takes a single parameter vector, you can maximise the likelihood from one row of st at a time and run the code with apply like this:
# create a function that finds the MLE starting from one row of st
maxlike <- function(a) {
f12 <- function(param){
alpha <- param[1]
rho <- param[2]
lambda <- param[3]
u <- 0.5*(p12$v_50_1)^alpha + 0.5*lambda*(p12$v_50_2)^alpha
p <- 1/(1 + exp(-rho*u))
sum(p12$gamble*log(p) + (1-p12$gamble)*log(1-p))
}
# a is one row of st, a named vector with alpha, rho and lambda
maxLik(f12, start = a, method = "NM")
}
# then use apply on st; MARGIN = 1 passes each row to maxlike
mle <- apply(st,1,maxlike)
mle
Suppose that I want to evaluate an expression containing two nested integrals like (this is an example, the actual function is uglier)
where a and b are the boundaries, c and d are known parameters and f(x) and F(x) are the density and distribution of the random variable x. In my problem f(x) and F(x) are nonparametrically found, so that I know their values only for certain specific values of x. How would you set the integral?
I did:
# Create the data
val <- runif(300, min=1, max = 10) #use the uniform distribution
CDF <- (val - 1)/(10 - 1)
pdf <- 1 / (10 - 1)
data <- data.frame(val = val, CDF = CDF, pdf = pdf)
c = 2
d = 1
# Inner integral
integrand1 <- function(x) {
i <- which.min(abs(x - data$val))
FF <- data$CDF[i]
ff <- data$pdf[i]
(1 - FF)^(c/d) * ff
}
# Vectorize the inner integral
Integrand1 <- Vectorize(integrand1)
# Outer integral
integrand2 <- function(x){
i <- which.min(abs(x - data$val))
FF <- data$CDF[i]
ff <- data$pdf[i]
(quadgk(Integrand1, x, 10) / FF) * c * ff
}
# Vectorize the outer integral
Integrand2 <- Vectorize(integrand2)
# Solve
require(pracma)
quadgk(Integrand2, 1, 10)
The integral is extremely slow. Is there a better way to solve this? Thank you.
---------EDIT---------
In my problem the pdf and CDF are computed from a vector of values v as follows:
# Create the original data
v <- runif(300, min = 1, max = 10)
require(np)
# Compute the CDF and pdf
v.CDF.bw <- npudistbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
v.pdf.bw <- npudensbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
# Extend v on a grid (I add this step because the v vector in my data
# is not very large. In this way I approximate the estimated pdf and CDF
# on a grid)
val <- seq(from = min(v), to = max(v), length.out = 1000)
data <- data.frame(val)
CDF <- npudist(bws = v.CDF.bw, newdata = data$val, edat = data )
pdf <- npudens(bws = v.pdf.bw, newdata = data$val, edat = data )
data$CDF <- CDF$dist
data$pdf <- pdf$dens
Have you considered using approxfun?
It takes vectors x and y and gives you a function that linearly interpolates between those. So for example, try
x <- runif(1000)+runif(1000)+2*(runif(1000)^2)
dx <- density(x)
fa <- approxfun(dx$x,dx$y)
curve(fa,0,2)
fa(0.4)
You should be able to call it using your gridded evaluations. It may be faster than what you're doing (as well as more accurate).
(edit: yes, as you say, splinefun should be fine if it's fast enough for your needs)
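As a sketch of how this could plug into the question's setup: turn the gridded pdf and CDF into functions once with approxfun, so the integrand is vectorized and no which.min lookup or Vectorize is needed. The code uses the uniform(1, 10) toy distribution from the question, for which the inner integral has the closed form (1 - F(x))^3 / 3 when c/d = 2. The names pdf_vals, CDF_vals, c_par and d_par are introduced here (the last two to avoid masking c()), and base R's integrate stands in for quadgk.

```r
# Gridded pdf and CDF, as in the question's toy example
val <- seq(1, 10, length.out = 1000)
CDF_vals <- (val - 1) / (10 - 1)
pdf_vals <- rep(1 / (10 - 1), length(val))

# Interpolating functions; rule = 2 holds the end values outside the grid
F_fun <- approxfun(val, CDF_vals, rule = 2)
f_fun <- approxfun(val, pdf_vals, rule = 2)

c_par <- 2
d_par <- 1

# Inner integral from x to the upper boundary
inner_integrand <- function(t) (1 - F_fun(t))^(c_par / d_par) * f_fun(t)
inner <- function(x) integrate(inner_integrand, lower = x, upper = 10)$value

inner(4)               # numerical value
(1 - F_fun(4))^3 / 3   # closed form for comparison
```

The outer integral can be built the same way on top of inner, wrapped in Vectorize or sapply, since integrate needs a vectorized integrand.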