Related
I'm trying to run the following function mentioned below using OptimParallel in R on a certain data set. The code is as follows:
install.packages("optimParallel")
install.packages('parallel')
library(parallel)
library(optimParallel)
library(doParallel)
library(data.table)
library(Rlab)
library(HDInterval)
library(mvtnorm)
library(matrixStats)
library(dplyr)
library(cold)
## Bolus data:
data("bolus")
d1 <- bolus
d1$group <- ifelse(d1$group == "2mg",1,0)
colnames(d1) <- c("index",'group',"time","y")
d2 <- d1 %>% select(index, y, group, time)
colnames(d2) <- c('index','y','x1','x2') ### Final data
## Modification of the objective function:
## Another approach:
dpd_poi <- function(x,fixed = c(rep(FALSE,5))){
params <- fixed
dpd_1 <- function(p){
params[!fixed] <- p
alpha <- params[1]
beta_0 <- params[2]
beta_1 <- params[3]
beta_2 <- params[4]
rho <- params[5]
add_pi <- function(d){
k <- beta_0+(d[3]*beta_1)+(d[4]*beta_2)
k1 <- exp(k) ## for Poisson regression
d <- cbind(d,k1)
}
dat_split <- split(x , f = x$index)
result <- lapply(dat_split, add_pi)
result <- rbindlist(result)
result <- as.data.frame(result)
colnames(result) <- c('index','y','x1','x2','lamb')
result_split <- split(result, f = result$index)
expression <- function(d){
bin <- as.data.frame(combn(d$y , 2))
pr <- as.data.frame(combn(d$lamb , 2))
## Evaluation of the probabilities:
f_jk <- function(u,v){
dummy_func <- function(x,y){
ppois(x, lambda = y)
}
dummy_func_1 <- function(x,y){
ppois(x-1, lambda = y)
}
k <- mapply(dummy_func,u,v)
k_1 <- mapply(dummy_func_1,u,v)
inv1 <- inverseCDF(as.matrix(k), pnorm)
inv2 <- inverseCDF(as.matrix(k_1), pnorm)
mean <- rep(0,2)
lower <- inv2
upper <- inv1
corr <- diag(2)
corr[lower.tri(corr)] <- rho
corr[upper.tri(corr)] <- rho
prob <- pmvnorm(lower = lower, upper = upper, mean = mean, corr = corr)
prob <- (1+(1/alpha))*(prob^alpha)
## First expression: (changes for Poisson regression)
lam <- as.vector(t(v))
v1 <- rpois(1000, lambda = lam[1])
v2 <- rpois(1000, lambda = lam[2])
all_possib <- as.data.frame(rbind(v1,v2))
new_func <- function(u){
k <- mapply(dummy_func,u,v)
k_1 <- mapply(dummy_func_1,u,v)
inv1_1 <- inverseCDF(as.matrix(k), pnorm)
inv2_1 <- inverseCDF(as.matrix(k_1), pnorm)
mean1 <- rep(0,2)
lower1 <- inv2_1
upper1 <- inv1_1
corr1 <- diag(2)
corr1[lower.tri(corr1)] <- rho
corr1[upper.tri(corr1)] <- rho
prob1 <- pmvnorm(lower = lower1, upper = upper1, mean = mean1, corr = corr1)
prob1 <- prob1^(alpha)
}
val <- apply(all_possib, 2, new_func)
val_s <- mean(val) ## approximation
return(val_s - prob)
}
final_res <- mapply(f_jk, bin, pr)
final_value <- sum(final_res)
}
u <- sapply(result_split,expression)
return(sum(u))
}
}
## run the objective function:
cl <- makeCluster(25)
setDefaultCluster(cl=cl)
clusterExport(cl,c('d2','val'))
clusterEvalQ(cl,c(library(data.table), library(Rlab),library(HDInterval),library(mvtnorm),library(matrixStats),library(dplyr),library(cold)))
val <- dpd_poi(d2, c(0.5,FALSE,FALSE,FALSE,FALSE))
optimParallel(par = c(beta_0 =1, beta_1 =0.1 ,beta_2 = 1,rho=0.2),fn = val ,method = "L-BFGS-B",lower = c(-10,-10,-10,0),upper = c(Inf,Inf,Inf,1))
stopCluster(cl)
After running for some time, it returns the following error:
checkForRemoteErrors(val)
9 nodes produced errors; first error: missing value where TRUE/FALSE needed
However, when I make a minor change in the objective function (pick 2 random numbers from rpois instead of 1000) and run the same code using optim, it converges and gives me a proper result. This is a Monte Carlo simulation and it does not make sense to draw so few Poisson variables. I have to use optimParllel, otherwise, it takes way too long to converge. I could also run this code using simulated data.
I'm unable to figure out where the issue truly lies. I truly appreciate any help in this regard.
I am trying to create a Hamming distance measure for the pvclust clustering method. (There isn't one defined for this function.) I'm based on the example given for the cosine measure:
cosine <- function(x) {
x <- as.matrix(x)
y <- t(x) %*% x
res <- 1 - y / (sqrt(diag(y)) %*% t(sqrt(diag(y))))
res <- as.dist(res)
attr(res, "method") <- "cosine"
return(res)
}
I try to do it this way:
hamming <- function(x) {
x <- as.matrix(x)
y <- t(x) %*% x
res <- sum(y != y)
res <- as.dist(res)
attr(res, "method") <- "hamming"
return(res)
}
Unfortunately it doesn't work properly. Anyone have any postings, where is the error and how to fix it?
Try this
hamming <- function(x) {
x <- as.matrix(x)
y <- (1 - x) %*% t(x)
res <- y + t(y)
res <- as.dist(res)
attr(res, "method") <- "hamming"
return(res)
}
I have a code which has been used for some paper.
After defining the function to be optimized, the author used the Nelder-Mead method to estimate the parameters needed. When I run the code, it freezes after 493 function evaluations have been used, it doesn't show any kind of error message or anything. I've been trying to find some info but I haven't been lucky. How can I modify the optim command in order to evaluate all possible combinations, and/or what is preventing the function from being optimized?
Here's the code. It's relatively long, BUT the second-to-last line (system.time(stcopfit...)) is the ONLY ONE I need to make work / fix / modify. So you can just copy&paste the code (as I said, taken from the author of the mentioned paper) and let it run, you don't have to go through the all code, just the last few lines. This is the data over which to run the optimization, i.e. a matrix of [0,1] uniform variables of dimension 2172x9.
Any help is appreciated, thanks!
Here's a screenshot in RStudio (it took around 2 minutes to arrive at 493, and then it's been stuck like this for the last 30 minutes):
Code:
#download older version of "sn" package
url <- "https://cran.r-project.org/src/contrib/Archive/sn/sn_1.0-0.tar.gz"
install.packages(url, repos=NULL, type="source")
install.packages(signal)
library(sn)
library(signal)
#1. redefine qst function
qst <- function (p, xi = 0, omega = 1, alpha = 0, nu = Inf, tol = 1e-08)
{
if (length(alpha) > 1)
stop("'alpha' must be a single value")
if (length(nu) > 1)
stop("'nu' must be a single value")
if (nu <= 0)
stop("nu must be non-negative")
if (nu == Inf)
return(qsn(p, xi, omega, alpha))
if (nu == 1)
return(qsc(p, xi, omega, alpha))
if (alpha == Inf)
return(xi + omega * sqrt(qf(p, 1, nu)))
if (alpha == -Inf)
return(xi - omega * sqrt(qf(1 - p, 1, nu)))
na <- is.na(p) | (p < 0) | (p > 1)
abs.alpha <- abs(alpha)
if (alpha < 0)
p <- (1 - p)
zero <- (p == 0)
one <- (p == 1)
x <- xa <- xb <- xc <- fa <- fb <- fc <- rep(NA, length(p))
nc <- rep(TRUE, length(p))
nc[(na | zero | one)] <- FALSE
fc[!nc] <- 0
xa[nc] <- qt(p[nc], nu)
xb[nc] <- sqrt(qf(p[nc], 1, nu))
fa[nc] <- pst(xa[nc], 0, 1, abs.alpha, nu) - p[nc]
fb[nc] <- pst(xb[nc], 0, 1, abs.alpha, nu) - p[nc]
regula.falsi <- FALSE
while (sum(nc) > 0) {
xc[nc] <- if (regula.falsi)
xb[nc] - fb[nc] * (xb[nc] - xa[nc])/(fb[nc] - fa[nc])
else (xb[nc] + xa[nc])/2
fc[nc] <- pst(xc[nc], 0, 1, abs.alpha, nu) - p[nc]
pos <- (fc[nc] > 0)
xa[nc][!pos] <- xc[nc][!pos]
fa[nc][!pos] <- fc[nc][!pos]
xb[nc][pos] <- xc[nc][pos]
fb[nc][pos] <- fc[nc][pos]
x[nc] <- xc[nc]
nc[(abs(fc) < tol)] <- FALSE
regula.falsi <- !regula.falsi
}
x <- replace(x, zero, -Inf)
x <- replace(x, one, Inf)
Sign <- function(x) sign(x)+ as.numeric(x==0)
q <- as.numeric(xi + omega * Sign(alpha)* x)
names(q) <- names(p)
return(q)
}
#2. initial parameter setting
mkParam <- function(Omega, delta, nu){
ndim <- length(delta)+1;
R <- diag(ndim);
for (i in 2:ndim){
R[i,1] <- R[1,i] <- delta[i-1];
if (i>=3){for (j in 2:(i-1)){R[i,j] <- R[j,i] <- Omega[i-1,j-1];}}
}
LTR <- t(chol(R));
Mtheta <- matrix(0, nrow=ndim, ncol=ndim);
for (i in 2:ndim){
Mtheta[i,1] <- acos(LTR[i,1]);
cumsin <- sin(Mtheta[i,1]);
if (i >=3){for (j in 2:(i-1)){
Mtheta[i,j] <- acos(LTR[i,j]/cumsin);
cumsin <- cumsin*sin(Mtheta[i,j]);}
}
}
c(Mtheta[lower.tri(Mtheta)], log(nu-2));
}
#3. from internal to original parameters
paramToExtCorr <- function(param){
ntheta <- dim*(dim+1)/2;
theta <- param[1:ntheta];
ndim <- (1+sqrt(1+8*length(theta)))/2;
LTR <- diag(ndim);
for (i in 2:ndim){
LTR[i,1] <- cos(theta[i-1]);
cumsin <- sin(theta[i-1]);
if (i >=3){for (j in 2:(i-1)){
k <- i+ndim*(j-1)-j*(j+1)/2;
LTR[i,j] <- cumsin*cos(theta[k]);
cumsin <- cumsin*sin(theta[k]);}
}
LTR[i,i] <- cumsin;
}
R <- LTR %*% t(LTR);
R;
}
#4. show estimated parameters and log likelihood
resultVec <- function(fit){
R <- paramToExtCorr(fit$par);
logLik <- -fit$value;
Omega <- R[-1, -1];
delta <- R[1, -1];
ntheta <- dim*(dim+1)/2;
nu <- exp(fit$par[ntheta+1])+2;
c(Omega[lower.tri(Omega)], delta, nu, logLik);
}
#5. negative log likelihood for multivariate skew-t copula
stcopn11 <- function(param){
N <- nrow(udat);
mpoints <- 150;
npar <- length(param);
nu <- exp(param[npar])+2;
R <- paramToExtCorr(param);
Omega <- R[-1, -1];
delta <- R[1, -1];
zeta <- delta/sqrt(1-delta*delta);
iOmega <- solve(Omega);
alpha <- iOmega %*% delta / sqrt(1-(t(delta) %*% iOmega %*% delta)[1,1]);
ix <- matrix(0, nrow=N, ncol=dim);
lm <- matrix(0, nrow=N, ncol=dim);
for (j in 1:dim){
minx <- qst(min(udat[,j]), alpha=zeta[j], nu=nu);
maxx <- qst(max(udat[,j]), alpha=zeta[j], nu=nu);
xx <- seq(minx, maxx, length=mpoints);
px <- sort(pst(xx, alpha=zeta[j], nu=nu));
ix[,j] <- pchip(px, xx, udat[,j]);
lm[,j] <- dst(ix[,j], alpha=zeta[j], nu=nu, log=TRUE);
}
lc <- dmst(ix, Omega=Omega, alpha=alpha, nu=nu, log=TRUE);
-sum(lc)+sum(lm)
}
#6. sample setting
dim <- 9;
smdelta <- c(-0.36,-0.33,-0.48,-0.36,-0.33,-0.48,-0.36,-0.33,-0.48);
smdf <- 5;
smOmega <- cor(udat);
smzeta <- smdelta/sqrt(1-smdelta*smdelta);
iOmega <- solve(smOmega);
smalpha <- iOmega %*% smdelta /sqrt(1-(t(smdelta) %*% iOmega %*% smdelta)[1,1]);
#7. estimation
iniPar <- mkParam(diag(dim),numeric(dim),6);
system.time(stcopfit<-optim(iniPar,stcopn11,control=list(reltol=1e-8,trace=6)));
resultVec(stcopfit);
The parameters you arrive at by step 493 lead to an infinite loop in your qst function: not having any idea what this very complex code is actually doing, I'm afraid I can't diagnose further. Here's what I did to get that far:
I stated cur.params <- NULL in the global environment, then put cur.params <<- params within stcopn11; this saves the current set of parameters to the global environment, so that when you break out of the optim() call manually (via Control-C or ESC depending on your platform) you can inspect the current set of parameters, and restart from them easily
I put in old-school debugging statements (e.g. cat("entering stcopn11\n") and cat("leaving stcopn11\n") at the beginning and at the next-to-last line of the objective function, a few within stopc11 to indicate progress markers within)
once I had the "bad" parameters I used debug(stcopn11) and stcopn11(cur.param) to step through the function
I discovered that it was hanging on dimension 3 (j==3 in the for loop within stcopn11) and particularly on the first qst() call
I added a maxit=1e5 argument to qst; initialized it <- 1 before the while loop; set it <- it+1 each time through the loop; changed the stopping criterion to while (sum(nc) > 0 && it<maxit); and added if (it==maxit) stop("hit max number of iterations in qst") right after the loop
1e5 iterations in qst took 74 seconds; I have no idea whether it might stop eventually, but didn't want to wait to find out.
This was my modified version of stcopn11:
cur.param <- NULL ## set parameter placeholder
##5. negative log likelihood for multivariate skew-t copula
stcopn11 <- function(param,debug=FALSE) {
cat("stcopn11\n")
cur.param <<- param ## record current params outside function
N <- nrow(udat)
mpoints <- 150
npar <- length(param)
nu <- exp(param[npar])+2
R <- paramToExtCorr(param)
Omega <- R[-1, -1]
delta <- R[1, -1]
zeta <- delta/sqrt(1-delta*delta)
cat("... solving iOmega")
iOmega <- solve(Omega)
alpha <- iOmega %*% delta /
sqrt(1-(t(delta) %*% iOmega %*% delta)[1,1])
ix <- matrix(0, nrow=N, ncol=dim)
lm <- matrix(0, nrow=N, ncol=dim)
cat("... entering dim loop\n")
for (j in 1:dim){
if (debug) cat(j,"\n")
minx <- qst(min(udat[,j]), alpha=zeta[j], nu=nu)
maxx <- qst(max(udat[,j]), alpha=zeta[j], nu=nu)
xx <- seq(minx, maxx, length=mpoints)
px <- sort(pst(xx, alpha=zeta[j], nu=nu))
ix[,j] <- pchip(px, xx, udat[,j])
lm[,j] <- dst(ix[,j], alpha=zeta[j], nu=nu, log=TRUE)
}
lc <- dmst(ix, Omega=Omega, alpha=alpha, nu=nu, log=TRUE)
cat("leaving stcopn11\n")
-sum(lc)+sum(lm)
}
I have a working implementation of multivariable linear regression using gradient descent in R. I'd like to see if I can use what I have to run a stochastic gradient descent. I'm not sure if this is really inefficient or not. For example, for each value of α I want to perform 500 SGD iterations and be able to specify the number of randomly picked samples in each iteration. It would be nice to do this so I could see how the number of samples influences the results. I'm having trouble through with the mini-batching and I want to be able to easily plot the results.
This is what I have so far:
# Read and process the datasets
# download the files from GitHub
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')
# we can standardize the x vaules using scale()
x <- scale(x)
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')
# combine the datasets
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
str(data3)
head(data3)
################ Regular Gradient Descent
# http://www.r-bloggers.com/linear-regression-by-gradient-descent/
# vector populated with 1s for the intercept coefficient
x1 <- rep(1, length(data3$area_sqft))
# appends to dfs
# create x-matrix of independent variables
x <- as.matrix(cbind(x1,x))
# create y-matrix of dependent variables
y <- as.matrix(y)
L <- length(y)
# cost gradient function: independent variables and values of thetas
cost <- function(x,y,theta){
gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
return(t(gradient))
}
# GD simultaneous update algorithm
# https://www.coursera.org/learn/machine-learning/lecture/8SpIM/gradient-descent
GD <- function(x, alpha){
theta <- matrix(c(0,0,0), nrow=1)
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
# gradient descent α = (0.001, 0.01, 0.1, 1.0) - defined for 500 iterations
alphas <- c(0.001,0.01,0.1,1.0)
# Plot price, area in square feet, and the number of bedrooms
# create empty vector theta_r
theta_r<-c()
for(i in 1:length(alphas)) {
result <- GD(x, alphas[i])
# red = price
# blue = sq ft
# green = bedrooms
plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
lines(result[,2],type="b",col="#0072B2",lwd=0.35)
lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}
Is it more practical to find a way to use sgd()? I can't seem to figure out how to have the level of control I'm looking for with the sgd package
Sticking with what you have now
## all of this is the same
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')
x <- scale(x)
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
x1 <- rep(1, length(data3$area_sqft))
x <- as.matrix(cbind(x1,x))
y <- as.matrix(y)
L <- length(y)
cost <- function(x,y,theta){
gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
return(t(gradient))
}
I added y to your GD function and created a wrapper function, myGoD, to call yours but first subsetting the data
GD <- function(x, y, alpha){
theta <- matrix(c(0,0,0), nrow=1)
theta_r <- NULL
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
myGoD <- function(x, y, alpha, n = nrow(x)) {
idx <- sample(nrow(x), n)
y <- y[idx, , drop = FALSE]
x <- x[idx, , drop = FALSE]
GD(x, y, alpha)
}
Check to make sure it works and try with different Ns
all.equal(GD(x, y, 0.001), myGoD(x, y, 0.001))
# [1] TRUE
set.seed(1)
head(myGoD(x, y, 0.001, n = 20), 2)
# x1 V1 V2
# V1 147.5978 82.54083 29.26000
# V1 295.1282 165.00924 58.48424
set.seed(1)
head(myGoD(x, y, 0.001, n = 40), 2)
# x1 V1 V2
# V1 290.6041 95.30257 59.66994
# V1 580.9537 190.49142 119.23446
Here is how you can use it
alphas <- c(0.001,0.01,0.1,1.0)
ns <- c(47, 40, 30, 20, 10)
par(mfrow = n2mfrow(length(alphas)))
for(i in 1:length(alphas)) {
# result <- myGoD(x, y, alphas[i]) ## original
result <- myGoD(x, y, alphas[i], ns[i])
# red = price
# blue = sq ft
# green = bedrooms
plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
lines(result[,2],type="b",col="#0072B2",lwd=0.35)
lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}
You don't need the wrapper function--you can just change your GD slightly. It is always good practice to explicitly pass arguments to your functions rather than relying on scoping. Before you were assuming that y would be pulled from your global environment; here y must be given or you will get an error. This will avoid many headaches and mistakes down the road.
GD <- function(x, y, alpha, n = nrow(x)){
idx <- sample(nrow(x), n)
y <- y[idx, , drop = FALSE]
x <- x[idx, , drop = FALSE]
theta <- matrix(c(0,0,0), nrow=1)
theta_r <- NULL
for (i in 1:500) {
theta <- theta - alpha*cost(x,y,theta)
theta_r <- rbind(theta_r,theta)
}
return(theta_r)
}
This is my program for qqplot for burr X type distribution. I know coding is right, but I don't understand why I can't run plot?
burrx.loglike <- function(params, x)
{
theta <- params[1]
sigma <- params[2]
n <- length(x)
if (theta <= 0 || sigma <= 0)
{
ans <- -Inf
}
else
{
ans <- (n*log(2) + sum(log(x)) + n*log(theta) - 2*n*log(sigma)
- sum(x^2)/sigma^2 + (theta-1)*sum(log(1-exp(-1*(x/sigma)^2))))
}
return(ans)
}
burrx.mle2 <- function(x, par0=c(1,1))
{
temp.mle <- optim(par0, burrx.loglike, x=x, method="Nelder-Mead", control=list(fnscale=-1))
return(temp.mle)
}
qqburrx <- function(x, theta, sigma, use.mle=TRUE)
{
# Check to see if we calculate the MLE.
if(use.mle == TRUE)
{
par0 <- c(theta,sigma)
temp.mle <- burrx.mle2(x, par0)
theta <- temp.mle$par[1]
sigma <- temp.mle$par[2]
}
# Sample Quantiles
x.sort <- sort(x)
# Theoretical Quantiles
n <- length(x)
i <- 1:n
x.quantiles <- qburrx(q=i/(n+1), theta=theta, sigma=sigma)
# Plot the data.
plot.min <- min(x.sort, x.quantiles)
plot.max <- max(x.sort, x.quantiles)
plot(x.quantiles, x.sort,
main="Burr type X Q-Q Plot\nNote: For the BurrX to be appropriate,data must fall near the 40deg line.",
xlab="Theoretical Quantiles", ylab="Sample Quantiles",
xlim=c(plot.min,plot.max), ylim=c(plot.min,plot.max))
# Add 45-degree line
line.coord <- c(plot.min, plot.max)
lines(line.coord, line.coord)
}
Basically I'm a beginner of r. Maybe I'm making some error at the time of input parameter.
just use
qqburrx(1:2000,2,1,use.mle=TRUE)
change x as a:b according to your parameter.