Plot the iterations of optim vs the log-likelihood values in R

I have a maximum likelihood estimation function and I am using the optim function. I would like to plot the iteration outputs against the log-likelihood values.
Here is an example very similar to my more complex function:
y <- rnorm(1000,2,2)
myfunc <- function(xvec){
fn <- function(theta) { sum ( 0.5*(xvec - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) }
optim(par = c(0,5), fn, hessian=TRUE,method = "L-BFGS-B",lower=c(0,0),control = list(trace=1))
}
myfunc(y)
The output is:
iter 10 value 12.001318
final value 12.001318
iter 10 is the iteration step.
value 12.001318 is the log-likelihood value (strictly, the negative log-likelihood up to a constant, since optim minimizes fn).
My function returns 100 of them. I know that I need to store them first and then plot them, but how do I do that in R?
Any help, please?

There are a couple of options. Option 1: add REPORT = 1 to the control list, and the function value will be printed at every iteration. You would then have to process this printed output somehow, for example by using sink() and then removing the extra text. Option 2: run optim one iteration at a time; you can then easily store the log-likelihood value at each step and plot it. Partial code for both options is presented below.
# generating random values
set.seed(10)
y <- rnorm(1000,2,2)
#### option 1 ####
# intermediate results printed
myfunc <- function(xvec){
fn <- function(theta) { sum ( 0.5*(xvec - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) }
optim(par = c(0,5), fn, hessian=TRUE,method = "L-BFGS-B",lower=c(0,0),control = list(trace=1,REPORT=1))
}
# running optimization with input y
myfunc(y)
# would need to copy values or otherwise post process to make plot
#### option 2 ####
# running optimization one iteration at a time
fn <- function(theta) { sum ( 0.5*(y - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) }
# starting values; store the objective value at the start
theta_hat <- c(0,5)
loglvals <- fn(theta_hat)
# running the loop: each call does at most one iteration (maxit = 1) and
# restarts from the previous estimate; every restart discards L-BFGS-B's
# stored curvature information, so the path can differ from a single run
for(i in 1:100){
temp1 <- optim(theta_hat, fn, method = "L-BFGS-B", lower = c(0,0), control = list(maxit = 1))
theta_hat <- temp1$par
loglvals <- c(loglvals, temp1$value)
}
# plotting results
# trimming the length of loglvals because the function converged
# before the loop calling optim stopped;
# unique() is used to find where optim stopped returning new values
n_keep <- length(unique(loglvals))
plot(seq_len(n_keep) - 1, loglvals[seq_len(n_keep)]
,ylab='negative log likelihood'
,xlab='iteration')
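For option 1, the printed trace does not have to be copied by hand. A minimal sketch, assuming the L-BFGS-B trace lines have the form "iter N value X" as shown in the question: capture.output() diverts printed output the same way sink() does, so the trace comes back as a character vector, and the iteration numbers and objective values can then be extracted with regular expressions. The names trace_out, trace_lines, iter and val are illustrative.

```r
set.seed(10)
y <- rnorm(1000, 2, 2)

# negative log-likelihood (up to a constant), as in the question
fn <- function(theta) sum(0.5 * (y - theta[1])^2 / theta[2] + 0.5 * log(theta[2]))

# capture everything optim prints while reporting every iteration;
# the assignment to fit still happens in the calling environment
trace_out <- capture.output(
  fit <- optim(par = c(0, 5), fn, method = "L-BFGS-B", lower = c(0, 0),
               control = list(trace = 1, REPORT = 1))
)

# keep only the per-iteration lines, e.g. "iter    3 value ..."
trace_lines <- grep("^iter", trace_out, value = TRUE)
iter <- as.integer(sub("^iter\\s+(\\d+).*$", "\\1", trace_lines))
val  <- as.numeric(sub("^.*value\\s+", "", trace_lines))

plot(iter, val, type = "b", xlab = "iteration",
     ylab = "negative log-likelihood")
```

Because the values are parsed from printed text, they carry only the six decimals that the trace prints; fit$value still holds the full-precision final value.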

Related

R - Sample function doesn't seem to be working in loop

So, I'm relatively new to R and have the following problem:
I want to run 1000 generations of a population of some organism. At each generation there is a certain probability to change from one environment to the other (there are just two different "environments").
Now, the code works just fine and I do get the desired results. However one small issue that still needs to be resolved is that for every run, the initial environment seems to be set at environment 1 even though I defined the initial environment to be randomly sampled (should be either environment 1 OR 2; you can find this in line 12 of the second block of code).
If anybody could help me resolve this issue, I would be very thankful.
simulate_one_gen_new <- function(K, N_total_init, N_wt, N_generalist, N_specialist, growth_wt, growth_generalist, growth_specialist, mut_rate) {
scaling <- min(K/(N_wt + N_generalist + N_specialist),1)
# draw offspring according to Poisson distribution
offsp_wt <- rpois(1, scaling * N_wt * growth_wt)
offsp_generalist <- rpois(1, scaling * N_generalist * growth_generalist)
offsp_specialist <- rpois(1, scaling * N_specialist * growth_specialist)
# draw new mutants according to Poisson distribution
mut_wt_to_generalist <- rpois(1, N_wt * (mut_rate/2))
mut_wt_to_specialist <- rpois(1, N_wt * (mut_rate/2))
# determine new population sizes of wild type and mutant
N_wt_new <- max(offsp_wt - mut_wt_to_specialist - mut_wt_to_generalist, 0)
N_generalist_new <- max(offsp_generalist + mut_wt_to_generalist,0)
N_specialist_new <- max(offsp_specialist + mut_wt_to_specialist,0)
N_total_new <- N_wt_new + N_generalist_new + N_specialist_new
return(c(N_total_new, N_wt_new, N_generalist_new, N_specialist_new))
}
# Test the function
print(simulate_one_gen_new(100,100,100,0,0,0.9,1.0,1.1,0.001))
The code block above is needed to simulate one single generation.
simulate_pop_new <- function(K, N_total_init,N_init_wt,
growth_vec1, growth_vec2, growth_vec3,
mut_rate, switch_prob) {
# determine that there are no mutants present at time 0
N_init_generalist <- 0
N_init_specialist <- 0
# Create the vector in which to save the results, including the index of the environment
pop_vector <- c(N_total_init,N_init_wt, N_init_generalist, N_init_specialist, 1)
# initiate the variables
pop_new <- c(N_total_init, N_init_wt, N_init_generalist, N_init_specialist)
# determine that the first environment is either 1 or 2
env_temp <- sample(1:2, size = 1, replace = T)
tmax <- 1000
j <- 0
# run the simulation until generation t_max
for (i in 1:tmax) {
# redefine the current population one generation later
pop_new <- c(simulate_one_gen_new(K,pop_new[1],pop_new[2],pop_new[3],pop_new[4], growth_vec1[env_temp],growth_vec2[env_temp], growth_vec3[env_temp],mut_rate),env_temp)
# add the new population sizes to the output vector
pop_vector <- rbind(pop_vector,pop_new)
# determine whether environmental switch occurs and make it happen
env_switch <- rbinom(1,1,switch_prob)
if (env_switch==1)
{
if(env_temp==1) env_temp <- 2
else env_temp <- 1
}
# condition to stop the simulation before t_max: either the population has only one of the two mutants left, or the whole population goes extinct
if ((pop_new[2] == 0 & pop_new[3] == 0) | (pop_new[2] == 0 & pop_new[4] == 0)){j=j+1}
if (j == 100) break #here we let it run 100 generations longer after the conditions above are met
}
# define the row and column names of the output vector
rownames(pop_vector) <- 0:(nrow(pop_vector) - 1) # one row per generation starting at 0; nrow() also covers a simulation that stopped early
colnames(pop_vector) <- c("total","wt","generalist","specialist","env")
# return the result
return(pop_vector)
}
# Test the function and plot the result
# create your simulation data
output <- simulate_pop_new(1000,1000,1000,c(0.98,0.99),c(1.04,1.02),c(0.96,1.1),0.001,0.5)
# show the last few lines of the data table
print(tail(output))
# determine x axis range
x_range <- 0:(length(output[,1])-1)
# Create data frame from output (or just rename it)
df <- data.frame(output)
# Add a new column to our output that simply shows the Generations
df$Generation<-1:nrow(df)
# Manually create data frame where the genotypes are not separate but all in one column. Note that we need to repeat/ add together all other values since our "Genotype" column will be three times longer.
Genotype <- rep(c("wt", "generalist", "specialist"), each = length(output[,1]))
PopSize <- c(df$wt, df$generalist, df$specialist)
Generation <- rep(df$Generation, 3)
environment <- rep(df$env, 3)
# Let's also create a column solely for the total population
All_Genotypes <- df$generalist + df$wt + df$specialist
N_tot <- rep(All_Genotypes, 3)
# Create a new data frame containing the modified columns which we will be using for plotting
single_run <- data.frame(Generation, Genotype, PopSize, N_tot, environment)
print(tail(single_run))
Above is the second block of code, which simulates 1000 generations.
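One likely cause, offered as an assumption rather than a confirmed diagnosis: sample(1:2, size = 1) does return both values, but the initial row of pop_vector is built with a hard-coded 1 in its env column before env_temp is drawn, so generation 0 always reports environment 1. A minimal sketch of drawing the environment first and storing it in the first row; make_initial_row is a hypothetical helper, not part of the original code.

```r
set.seed(1)

# sample() itself is not the problem: across many calls both environments appear
draws <- replicate(1000, sample(1:2, size = 1))
print(table(draws))

# hypothetical helper: draw the starting environment first, then store it
# in the env column of the initial row instead of a hard-coded 1
make_initial_row <- function(N_total_init, N_init_wt) {
  env_temp <- sample(1:2, size = 1)
  c(total = N_total_init, wt = N_init_wt, generalist = 0, specialist = 0,
    env = env_temp)
}

initial_envs <- replicate(200, make_initial_row(1000, 1000)[["env"]])
print(table(initial_envs))
```

In simulate_pop_new this corresponds to moving the sample() line above the pop_vector <- c(...) line and replacing the final 1 with env_temp, so the recorded starting environment is the one actually used for the first generation.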

How to plot a histogram from the R output?

I am trying to plot a histogram from R output that is not a data frame. Below are my code and the output.
x <- replicate(1000,
{y <- rpois(200, 1)
{lambda0 <- 1
for(i in 1:1)
{
if( i == 1 ) cat( sprintf("%15s %15s %15s %15s\n", "LogL", "Score", "Information", "New Estimate"))
logL <- sum((-lambda0) + y*(log(lambda0)))
score <- sum((y/lambda0)-1)
information <- sum(y/(lambda0)^2)
lambda1 <- lambda0 + score/information
cat( sprintf("%15.4f %15.4f %15.4f %15.5f\n", logL, score, information, lambda1))
lambda0 <- lambda1
}
}
})
Below is my output
I'm trying to take the new estimates from the output and create a histogram. Can you please help?
Thank you.
You need to store the value for New Estimate during your loop. This way you can retrieve your results after the loop is finished. Normally when using a loop, you specify a variable in advance in which you can save the result for each iteration. E.g.:
numbers <- 1:3
result <- vector("list", length(numbers))
for (i in seq_along(numbers)) {
result[[i]] <- numbers[[i]] + 1
}
In this example there is a vector of three numbers; you want to add one to each number and save the result. You can do this by creating a list of length 3 beforehand (preallocating is better, but not necessary) and, in the ith iteration, saving the result in the ith element of the list.
After the loop finishes, you can retrieve the results from the result variable, and the ith result with double square brackets: result[[i]].
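Applied to the question, a minimal sketch: instead of only printing the new estimate with cat(), make lambda1 the last expression of the block passed to replicate(). replicate() then collects the 1000 estimates into a numeric vector that hist() can plot directly. The printing is dropped here for brevity, and new_estimates is an illustrative name.

```r
set.seed(1)

new_estimates <- replicate(1000, {
  y <- rpois(200, 1)
  lambda0 <- 1
  for (i in 1:1) {
    score <- sum((y / lambda0) - 1)
    information <- sum(y / (lambda0)^2)
    lambda1 <- lambda0 + score / information
    lambda0 <- lambda1
  }
  lambda1  # the last value of the block is what replicate() keeps
})

hist(new_estimates, main = "Histogram of New Estimate", xlab = "New Estimate")
```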

How do I run a function for multiple parameters, return an output and have this in a single data table?

I have developed code that calculates a value for a given set of parameters; this works for a single set of parameters.
library(spatstat)
library(ggplot2)
library(dplyr)
library(tidyr)
#Generating a clustered landscape
dim <- 2000
radiusCluster<-100
lambdaParent<-.02
lambdaDaughter<-30
hosts<-900
randmod<-0
numbparents<-rpois(1,lambdaParent*dim)
xxParent<-runif(numbparents,0+radiusCluster,dim-radiusCluster)
yyParent<-runif(numbparents,0+radiusCluster,dim-radiusCluster)
numbdaughter<-rpois(numbparents,(lambdaDaughter))
sumdaughter<-sum(numbdaughter)
theta<-2*pi*runif(sumdaughter)
rho<-radiusCluster*sqrt(runif(sumdaughter))
xx0=rho*cos(theta)
yy0=rho*sin(theta)
xx<-rep(xxParent,numbdaughter)
yy<-rep(yyParent,numbdaughter)
xx<-xx+xx0
yy<-yy+yy0
cds<-data.frame(xx,yy)
is_outlier<-function(x){
x > dim| x < 0
}
cds<-cds[!(is_outlier(cds$xx)|is_outlier(cds$yy)),]
sampleselect<-sample(1:nrow(cds),hosts,replace=F)
cds<-cds%>%slice(sampleselect)
randfunction<-function(x){
x<-runif(length(x),0,dim)
}
randselect<-sample(1:nrow(cds),floor(hosts*randmod),replace=F)
cds[randselect,]<-apply(cds[randselect,],1,randfunction)
landscape<-ppp(x=cds$xx,y=cds$yy,window=owin(xrange=c(0,dim),yrange=c(0,dim)))
ggplot(data.frame(landscape))+geom_point(aes(x=x,y=y))+coord_equal()+theme_minimal()
#Calculating a metric for clustering
kk<-Kest(landscape)
plot(kk)
kk_iso<-kk$iso
kk_pois<-kk$theo
kk_div_na<-kk_iso/kk_pois
kk_div_0<-replace_na(kk_div_na,0)
kk_mean<-round(mean(kk_div_0),3)
So I can say for radiusCluster of 100 and randmod of 0, I get a kk_mean of "value". I want to use radiusCluster and randmod as my variables and run this experiment for a set of these variables. I begin by generating the data table that I want.
random_parameter<-rep(c(0,.5,1),3)
radiusCluster_parameter<-rep(c(100,300,600),each=3)
Cluster_metric<-rep(NA,length(radiusCluster_parameter))
parameter_table<-data.frame(random_parameter,radiusCluster_parameter,Cluster_metric)
colnames(parameter_table)<-c("r", "rho", "sigma")
Here r is randmod, rho is radiusCluster and sigma is kk_mean.
Then I create a function of the above code for generating the clustered landscape and calculating the metric.
cluster_function <- function (dim,
lambdaParent,
lambdaDaughter,
hosts,
randmod,
radiusCluster) {
numbparents <- rpois(1, lambdaParent * dim)
xxParent <- runif(numbparents, 0 + radiusCluster, dim - radiusCluster)
yyParent <- runif(numbparents, 0 + radiusCluster, dim - radiusCluster)
numbdaughter <- rpois(numbparents, (lambdaDaughter))
sumdaughter <- sum(numbdaughter)
theta <- 2 * pi * runif(sumdaughter)
rho <- radiusCluster * sqrt(runif(sumdaughter))
xx0 = rho * cos(theta)
yy0 = rho * sin(theta)
xx <- rep(xxParent, numbdaughter)
yy <- rep(yyParent, numbdaughter)
xx <- xx + xx0
yy <- yy + yy0
cds <- data.frame(xx, yy)
is_outlier <- function(x) {
x > dim | x < 0
}
cds <- cds[!(is_outlier(cds$xx) | is_outlier(cds$yy)), ]
sampleselect <- sample(1:nrow(cds), hosts, replace = F)
cds <- cds %>% slice(sampleselect)
randfunction <- function(x) {
x <- runif(length(x), 0, dim)
}
randselect <- sample(1:nrow(cds), floor(hosts * randmod), replace = F)
cds[randselect, ] <- apply(cds[randselect, ], 1, randfunction)
landscape<-ppp(x=cds$xx,y=cds$yy,window=owin(xrange=c(0,dim),yrange=c(0,dim)))
ggplot(data.frame(landscape))+geom_point(aes(x=x,y=y))+coord_equal()+theme_minimal()
kk<-Kest(landscape)
plot(kk)
kk_iso<-kk$iso
kk_pois<-kk$theo
kk_div_na<-kk_iso/kk_pois
kk_div_0<-replace_na(kk_div_na,0)
kk_mean<-round(mean(kk_div_0),3)
}
I then try running cluster_function for a set of parameters; however, this does not work.
cluster_function(dim <- 2000,
lambdaParent <-.02,
lambdaDaughter<-30,
hosts<-900,
randmod<-0,
radiusCluster<-0)
The parameters are defined in the global environment, but nothing happens. So I decide to remove the landscape and ggplot commands from the function and assign the function's result to output. Hopefully the output will then be a data frame of the coordinates that I generated in cds, which can be used in a ppp() call and plotted.
output<-cluster_function(dim <- 2000,
lambdaParent <-.02,
lambdaDaughter<-30,
hosts<-900,
randmod<-0,
radiusCluster<-0)
Output is numeric (empty). How can I get the function to work for the parameters in the cluster_function() and is it possible to run this for multiple parameters? I was thinking something along the lines of:
for (i in length(parameter_table)){
cluster_function(dim <- 2000,
lambdaParent <-.02,
lambdaDaughter<-30,
hosts<-900,
randmod<-parameter_table[i,"r"],
radiusCluster<-parameter_table[i,"rho"])
I then try running cluster_function for a set of parameters, however, this does not work
It looks like it's working to me ;) Do you want the ggplot to be printed? Inside a function a ggplot object is not printed automatically, so you can add p <- ggplot(...) followed by print(p) to see it (you may need to refresh the plot viewer).
Output is numeric (empty). How can I get the function to work
Add an explicit return: return(cds)
And you can of course run the function multiple times. A for loop works, or you could check out purrr::pmap() or mapply(). Good luck!
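A minimal sketch of filling the sigma column of parameter_table, once with a for loop over the rows and once with mapply(). To keep it runnable without spatstat, metric_fn is a hypothetical stand-in for cluster_function(); with the real function you would also pass the fixed arguments (dim = 2000, lambdaParent = .02, and so on) and make sure it ends with return(kk_mean). The loop runs over seq_len(nrow(parameter_table)), because length() of a data frame gives its number of columns, not rows.

```r
# hypothetical stand-in for cluster_function(): any function of randmod and
# radiusCluster that returns a single number works the same way
metric_fn <- function(randmod, radiusCluster) {
  round(radiusCluster * (1 - randmod) / 1000, 3)
}

parameter_table <- data.frame(r = rep(c(0, .5, 1), 3),
                              rho = rep(c(100, 300, 600), each = 3),
                              sigma = NA_real_)

# option A: loop over the rows and write each result into its row
for (i in seq_len(nrow(parameter_table))) {
  parameter_table$sigma[i] <- metric_fn(randmod = parameter_table$r[i],
                                        radiusCluster = parameter_table$rho[i])
}

# option B: mapply() walks the two columns in parallel; fixed arguments of
# the real function would go in MoreArgs = list(dim = 2000, ...)
sigma_mapply <- mapply(metric_fn,
                       randmod = parameter_table$r,
                       radiusCluster = parameter_table$rho)

print(parameter_table)
```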

Looping a Rayleigh test to find highest value

I am a beginner with R, so hopefully this will be an easy fix.
I am trying to use a for loop on a dataset for neuron firing direction in order to:
Incrementally add the next value from the dataset to a vector
Run a Rayleigh test on that vector and save it to a variable
Test if the Rayleigh test I just ran has a larger statistic than the Rayleigh test in the previous iteration, as well as a p-value of less than .05
If the value is larger, save the statistic value, so that the next loop can compare to it
If the value is larger, save the vector
So far I have this for the code, and after going through it for a long time I'm at a loss for why it's not working. Every time I run it, the for loop goes all the way to the end and just reports the rayleigh value and vector for the whole dataset, which I know for sure isn't correct.
(I'm using the circular package for the rayleigh test function)
# This first line is just to create an initial rayleigh statistic to compare to in the loop that is low
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions) {
data_vector <- append(data_vector, i)
ray_lee_test <- rayleigh.test(data_vector)
if ((ray_lee_test$statistic>best_rayleigh$statistic)&(ray_lee_test$p.value<=.05)) {
best_rayleigh <- ray_lee_test
best_rayleigh_vector <- data_vector
} else {
NULL
}
}
Any help is appreciated. Thank you!
Update: I tried using && instead of single & in the if statement, however it returned the same result
The following code doesn't give warnings and selects the vector with highest test statistic and "significant" p-value.
library(circular)
set.seed(2020)
firing_directions <- rvonmises(n = 25, mu = circular(pi), kappa = 2)
plot(firing_directions)
best_rayleigh <- rayleigh.test(circular(1:10))
for(i in seq_along(firing_directions)){
dv <- firing_directions[seq_len(i)]
rltest <- rayleigh.test(dv)
if((rltest$statistic > best_rayleigh$statistic) && (rltest$p.value <= 0.05)){
best_rayleigh <- rltest
best_rayleigh_vector <- dv
}
}
best_rayleigh
#
# Rayleigh Test of Uniformity
# General Unimodal Alternative
#
#Test Statistic: 0.8048
#P-value: 0.0298
best_rayleigh_vector
#Circular Data:
#Type = angles
#Units = radians
#Template = none
#Modulo = asis
#Zero = 0
#Rotation = counter
#[1] 4.172219 2.510826 2.997495 4.095335 3.655613
I think the NULL is causing some issues; I'm not sure what happens when the else branch returns NULL. You only need to update the vector if it passes your criteria:
library(circular)
firing_directions= rvonmises(n=25, mu=circular(pi), kappa=2)
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions){
data_vector <- c(data_vector, i)
ray_lee_test <- rayleigh.test(data_vector)
if ((ray_lee_test$statistic>best_rayleigh$statistic)&(ray_lee_test$p.value<=.05)) {
best_rayleigh <- ray_lee_test
best_rayleigh_vector <- data_vector
}
}
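For reference, and assuming the statistic rayleigh.test() reports is the mean resultant length of the angles (the "Test Statistic" shown above), it can be computed for every prefix of the data at once with cumsum(). This shows why short prefixes tend to have large statistics: a single angle always has a resultant length of 1, which is why the p-value condition does most of the filtering. The p-value itself still has to come from rayleigh.test(); the simulated angles and the names below are illustrative.

```r
set.seed(2020)
# illustrative angles in radians (plain numeric, no circular package needed)
angles <- rnorm(25, mean = pi, sd = 0.8)

# mean resultant length of the first k angles, for every k at once
k <- seq_along(angles)
rbar <- sqrt(cumsum(cos(angles))^2 + cumsum(sin(angles))^2) / k

plot(k, rbar, type = "b", xlab = "number of angles",
     ylab = "mean resultant length")
```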

Value-at-Risk (Extreme-Value Theory) using Monte Carlo Simulation in R

I have code that successfully calculates VaR based on Extreme Value Theory using historical data. I'm trying to run this same code on multiple simulated price paths (i.e. calculating a VaR for each path) and then taking the median or average of those VaRs.
Every example I could find online had the simulation function return the price at the end of the period and then they replicated the function X many time. That makes sense to me, except that I essentially need to calculate value-at-risk for each simulated path. Below is the code I have so far. I can say that the code works when using historical data (i.e. the "evt" function works fine and the datatable is populated correctly when the lossOnly, u, and evtVar lines aren't in a function). However, I've been trying to implement simulation in the second function and trying various combinations, which have all failed.
library('RODBC')
library('nor1mix')
library('fExtremes')
library('QRM')
library('fGarch')
#function for computing the EVT VaR
evt <- function(data,u){
#fit excess returns to gpd to get estimates
gpdfit = tryCatch({
gpdfit <- gpdFit(data,u,type="mle")
}, warning = function(w) {
gpdfit <- gpdFit(data,u,type="mle",optfunc="nlminb")
return(gpdfit)
}, error = function(e) {
gpdfit <- gpdFit(data,u,type="pwm",optfunc="nlminb")
return(gpdfit)
}, finally = {})
#now calculate VaRs
xi <- gpdfit@fit$par.ests["xi"]
beta <- gpdfit@fit$par.ests["beta"]
Nu <- length(gpdfit@data$exceedances)
n <- length(data)
evtVar95 <- (u+((beta/xi)*(((n/Nu)*.05)^(-xi) - 1.)))*100
evtVar99 <- (u+((beta/xi)*(((n/Nu)*.01)^(-xi) - 1.)))*100
evtVar997 <- (u+((beta/xi)*(((n/Nu)*.003)^(-xi) - 1.)))*100
evtVar999 <- (u+((beta/xi)*(((n/Nu)*.001)^(-xi) - 1.)))*100
#return calculations
return(cbind(evtVar95,evtVar99,evtVar997,evtVar999,u,xi,beta,Nu,n))
}
#data <- read.table("pricedata.txt")
prices <- data$V1
returns <- diff(log(prices)) #or returns <- log(prices[-1]/prices[-n])
xi <- mean(returns)
std <- sd(returns)
N <- length(prices)
lstval <- prices[N]
options(scipen = 999)
p <- c(lstval, rep(NA, N-1))
gen.path <- function(){
N <- length(prices)
for(i in 2:N)
p[i] <- p[i-1] * exp(rnorm(1, xi, std))
# plot(p, type = "l", col = "brown", main = "Simulated Price")
#evt calculation
#first get only the losses and then make them absolute
lossOnly <- abs(p[p<0])
#get threshold
u <- quantile(lossOnly, probs = 0.9, names=FALSE)
evtVar <- evt(lossOnly,u)
return(evtVar)
}
runs <- 10
sim.evtVar <- replicate(runs, gen.path())
evtVar <- mean(sim.evtVar)
#add data to total table
VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
In short, I'm trying to run the value-at-risk function (first function) within the Monte Carlo function (second function) and put the average simulated values into a data table. I know the first function works, but it's the second function that's driving me crazy. These are the errors I'm getting:
> sim.evtVar <- replicate(runs, gen.path())
Error in if (xi > 0.5) { : missing value where TRUE/FALSE needed
Called from: .gpdpwmFit(x, u)
Browse[1]> evtVar <- mean(sim.evtVar)
Error during wrapup: object 'sim.evtVar' not found
Browse[1]>
> #add data to total table
> VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
Error: object 'evtVar' not found
> DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ""function"" to a data.frame
Any help you can provide is greatly appreciated! Thank you in advance!
I think the problem is this line:
lstval <- prices[N]
Because a stock price can never be negative, this line in your function produces an empty vector:
lossOnly <- abs(p[p<0])
You should try instead:
lstval <- min(returns)
if you want the most negative return in your dataset.
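A different fix from the one above, named plainly: compute the losses from the simulated log returns of each path rather than from the prices, and average each statistic across paths with rowMeans(), since mean() on the replicate() result collapses everything into a single number. A minimal base-R sketch under these assumptions: the price series is synthetic, made up only so the sketch runs, and evt_stub is a hypothetical stand-in for the evt() function so that fExtremes is not needed.

```r
set.seed(42)
# synthetic stand-in for data$V1 (not real prices)
prices <- 100 * exp(cumsum(rnorm(500, 0, 0.01)))
returns <- diff(log(prices))
mu <- mean(returns)
sigma <- sd(returns)
N <- length(prices)

# hypothetical stand-in for evt(); returns a named vector like the original's columns
evt_stub <- function(losses, u) {
  c(var95 = quantile(losses, 0.95, names = FALSE), u = u, n = length(losses))
}

gen_path <- function() {
  p <- numeric(N)
  p[1] <- prices[N]
  for (i in 2:N) p[i] <- p[i - 1] * exp(rnorm(1, mu, sigma))
  sim_returns <- diff(log(p))
  # losses are the negative simulated returns, not negative prices
  lossOnly <- abs(sim_returns[sim_returns < 0])
  u <- quantile(lossOnly, probs = 0.9, names = FALSE)
  evt_stub(lossOnly, u)
}

sim <- replicate(10, gen_path())  # one column per simulated path
DF <- data.frame(mean = rowMeans(sim), median = apply(sim, 1, median))
print(DF)
```

With the real evt(), which returns a one-row matrix from cbind(), wrapping its result in drop() or as.vector() inside gen_path would give the same column-per-path layout.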
