Is there a way in R to find a maximum value during a three point estimate

I am using the R programming language. Suppose I have the following 3-point estimate data: Data
Here, Task 1 and Task 2 are done in parallel, whereas Task 3 and Task 4 are done in series, with Task 4 dependent on the completion of Task 3. So the minimum time for Task 1 and Task 2 together is 10, the most likely is 20, and the maximum is 40. This is then added to the times for Task 3 and Task 4 to give the total time.
When a three-point cost estimate is given, the min, most likely, and max costs are added together and a simulation (1000, 10000, or however many trials) is run. But for time, the general rule is: times for tasks in series are added; the time for tasks in parallel equals the time taken by the longest task.
How is the time estimation done in R, given that we add up rows for multiple simulations in one go?
code:
inv_triangle_cdf <- function(P, vmin, vml, vmax){
Pvml <- (vml-vmin)/(vmax-vmin)
return(ifelse(P < Pvml,
vmin + sqrt(P*(vml-vmin)*(vmax-vmin)),
vmax - sqrt((1-P)*(vmax-vml)*(vmax-vmin))))
}
#no of simulation trials
n=1000
#read in cost data
task_costs <- read.csv(file="task_costs.csv", stringsAsFactors = F)
str(task_costs)
#set seed for reproducibility
set.seed(42)
#create data frame with rows = number of trials and cols = number of tasks
csim <- as.data.frame(matrix(nrow=n,ncol=nrow(task_costs)))
# for each task
for (i in 1:nrow(task_costs)){
#set task costs
vmin <- task_costs$cmin[i]
vml <- task_costs$cml[i]
vmax <- task_costs$cmax[i]
#generate n random numbers (one per trial)
psim <- runif(n)
#simulate n instances of task
csim[,i] <- inv_triangle_cdf(psim,vmin,vml,vmax)
}
#sum costs for each trial
ctot <- csim[,1] + csim[,2] + csim[,3] + csim[,4] #costs add
ctot
How can I update this in order to accommodate time duration from the data given above?
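One way to read the rule above is to apply it per trial: take the element-wise maximum of the two parallel tasks with `pmax()`, then add the serial tasks. The sketch below assumes a hypothetical `task_times` data frame with columns `tmin`, `tml`, and `tmax` (the original data table is not reproduced here, so the numbers are made up for illustration). Note that this is not the same as taking the maximum of the three-point parameters first and simulating a single combined task.

```r
# Inverse CDF of the triangular distribution (same as in the question)
inv_triangle_cdf <- function(P, vmin, vml, vmax) {
  Pvml <- (vml - vmin) / (vmax - vmin)
  ifelse(P < Pvml,
         vmin + sqrt(P * (vml - vmin) * (vmax - vmin)),
         vmax - sqrt((1 - P) * (vmax - vml) * (vmax - vmin)))
}

# Hypothetical durations (assumption): Tasks 1 and 2 run in parallel,
# Tasks 3 and 4 run in series after them.
task_times <- data.frame(
  task = paste("Task", 1:4),
  tmin = c(5, 10, 8, 6),
  tml  = c(15, 20, 12, 10),
  tmax = c(30, 40, 20, 18)
)

n <- 1000      # number of simulation trials
set.seed(42)   # reproducibility

# One column of simulated durations per task, one row per trial
tsim <- sapply(seq_len(nrow(task_times)), function(i) {
  inv_triangle_cdf(runif(n),
                   task_times$tmin[i], task_times$tml[i], task_times$tmax[i])
})

# Parallel tasks: the longer of the two in each trial; serial tasks: add
ttot <- pmax(tsim[, 1], tsim[, 2]) + tsim[, 3] + tsim[, 4]
summary(ttot)
```

`pmax()` works row by row, so each trial gets its own "longest parallel task" before the serial durations are added.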

Related

Calculate PMF for a fair coin tossed 10 times

A fair coin is tossed 10 times. Find the Probability Mass Function (PMF) of X and the length of the longest run of heads observed. Need to write R code to accomplish this task. Following are some of the functions to use for this task:
as.integer(), intToBits(), rev(), rle().
I have the starting idea of the function to use, but do not have sufficient knowledge to tie it together to calculate the PMF and calculate the length of the longest run.
toBinary <- function(n){
paste0(as.integer(rev(intToBits(n)[1:10])),collapse = "")
}
toBinary(4)
toBinary(1023)
for(i in 0:1023){
print(toBinary(i))
}
I have added the following lines to complete the task:
# number of trials
trials <- 10
# probability of success
success <- 0.5
# x is number of random variable X
x <- 0:trials
# number of probabilities for a binomial distribution
prob <- dbinom(x,trials,success)
prob
# create a table with the data from above
prob_table<-cbind(x,prob)
prob_table
# specify the column names from the probability table
colnames(prob_table)<-c("x", "P(X=x)")
prob_table
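The `dbinom()` table above gives the PMF of the number of heads, not of the longest run. A minimal sketch for the longest run, assuming it is acceptable to enumerate all 2^10 equally likely sequences with `intToBits()` and measure runs with `rle()` (the helper name `longest_run` is made up for illustration):

```r
# Length of the longest run of heads (1s) in the n-bit pattern of k
longest_run <- function(k, n = 10) {
  bits <- as.integer(intToBits(k)[1:n])   # 0 = tails, 1 = heads
  r <- rle(bits)
  heads <- r$lengths[r$values == 1]
  if (length(heads) == 0) 0L else max(heads)
}

# Every integer 0..1023 corresponds to one equally likely sequence of 10 tosses
runs <- vapply(0:1023, longest_run, integer(1))

# PMF of the longest run: count sequences per run length, divide by 2^10
pmf <- table(runs) / 1024
pmf
```

The bit order does not matter here, so `rev()` is left out; reversing a sequence does not change its longest run.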

MCMC in R Modify Proposal

I've been working with MCMC for population genetics and I have some doubts.
I'm not experienced in statistics and because of that I have difficulty.
I have code to run MCMC, 1000 iterations. I start by creating a matrix with 0's (50 columns = 50 individuals and 1000 lines for 1000 iterations).
Then I create a random vector to substitute the first line of the matrix. This vector has 1's and 2's, representing population 1 or population 2.
I also have genotype frequencies and the genotypes of the 50 individuals.
What I want is to, according to the genotype frequencies and genotypes, determine to what population an individual belongs.
Then, I'll keep changing the population assigned to a random individual and checking if the new value should be accepted.
niter <- 1000
z <- matrix(0,nrow=niter,ncol=ncol(targetinds))
z[1,] <- sample(1:2, size=ncol(z), replace=T)
lhood <- numeric(niter)
lhood[1] <- compute_lhood_K2(targetinds, z[1,], freqPops)
accepted <- 0
priorz <- c(1e-6, 0.999999)
for(i in 2:niter) {
z[i,] <- z[i-1,]
# propose new vector z, by selecting a random individual, proposing a new zi value
selind <- sample(1:nind, size=1)
# proposal probability of selecting individual at random
proposal_ratio_ind <- log(1/nind)-log(1/nind)
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
# proposal probability of changing the index of individual is 1/2
proposal_ratio_cluster <- log(1/2)-log(1/2)
propratio <- proposal_ratio_ind+proposal_ratio_cluster
# compute f(x_i|z_i*, p)
# the probability of the selected individual given the two clusters
probindcluster <- compute_lhood_ind_K2(targetinds[,selind],freqPops)
# likelihood ratio f(x_i|z_i*,p)/f(x_i|z_i, p)
lhoodratio <- probindcluster[z[i,selind]]-probindcluster[z[i-1,selind]]
# prior ratio pi(z_i*)/pi(z_i)
priorratio <- log(priorz[z[i,selind]])-log(priorz[z[i-1,selind]])
# accept new value according to the MH ratio
mh <- lhoodratio+propratio+priorratio
# reject if the random value is larger than the MH ratio
if(runif(1)>exp(mh)) {
z[i,] <- z[i-1,] # keep the same z
lhood[i] <- lhood[i-1] # keep the same likelihood
} else { # if accepted
lhood[i] <- lhood[i-1]+lhoodratio # update the likelihood
accepted <- accepted+1 # increase the number of accepted
}
}
I have been asked to change the proposal probability so that the newly proposed values are proportional to the likelihood. This supposedly leads to a Gibbs sampling MCMC algorithm.
I don't know what to change in the code to do this. I also don't understand the concept of proposal probability very well, or how to choose the prior.
I would be grateful if someone could clarify my doubts.
Your current proposal is done here:
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
If the individual is assigned to cluster 1, you propose to switch the assignment deterministically by assigning them to cluster 2 (and vice versa).
You didn't show us what freqPops is, but if you want to propose according to freqPops then I believe the above code has to be replaced by
z[i,selind] <- sample(c(1,2),size=1,prob=freqPops)
(at least that is what I understand when you say you want to propose based on the likelihood - however, that statement of yours is unclear).
For this to be a valid MCMC Gibbs sampling algorithm, you also need to change the next line of code:
proposal_ratio_cluster <- log(freqPops[z[i-1,selind]])-log(freqPops[z[i,selind]])
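Another common reading of "propose proportional to the likelihood" is a textbook Gibbs step: draw the new label for the selected individual from its full conditional, which is proportional to prior times likelihood, and always accept it. The sketch below is an assumption about what is wanted, not the asker's required solution; `gibbs_update` is a hypothetical helper, and `loglik` stands in for a vector of per-cluster log-likelihoods such as `compute_lhood_ind_K2()` appears to return.

```r
# Draw a cluster label from the full conditional of one individual.
# loglik: log f(x_i | z_i = k, p) for each cluster k (hypothetical input)
# prior:  prior probability of each cluster
gibbs_update <- function(loglik, prior) {
  logpost <- loglik + log(prior)       # unnormalised log posterior
  w <- exp(logpost - max(logpost))     # subtract the max for numerical stability
  w <- w / sum(w)                      # normalise to probabilities
  sample(seq_along(w), size = 1, prob = w)
}

set.seed(1)
# Made-up example: the data favour cluster 2 and the prior is flat
draws <- replicate(1000, gibbs_update(loglik = c(-12.3, -10.1),
                                      prior = c(0.5, 0.5)))
table(draws) / length(draws)
```

With a proposal of this form the Metropolis-Hastings ratio equals 1, so the accept/reject step becomes unnecessary.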

Simulate Compound poisson process in r

I'm trying to simulate a compound Poisson process in R. The process is defined by $\sum_{j=1}^{N_t} Y_j$, where $Y_j$ is an i.i.d. sequence of $N(0,1)$ values and $N_t$ is a Poisson process with parameter $1$. I'm trying to simulate this in R without luck. I have an algorithm to compute this as follows:
Simulate the compound Poisson process from 0 to T:
Initiate: $k = 0$
Repeat while $\sum_{i=1}^k T_i < T$:
Set $k = k+1$
Simulate $T_k \sim \mathrm{Exp}(\lambda)$ (in my case $\lambda = 1$)
Simulate $Y_k \sim N(0,1)$ (this is just a special case; I would like to be able to change this to any distribution)
The trajectory is given by $X_t = \sum_{j=1}^{N_t} Y_j$, where $N_t = \sup\{k : \sum_{i=1}^k T_i \leq t\}$
Can someone help me to simulate this in r so that I can plot the process? I have tried, but can't get it done.
Use cumsum for the cumulative sums that determine the times N_t as well as the X_t. This illustrative code specifies the number of times to simulate, n, simulates the times in n.t and the values in x, and (to display what it has done) plots the trajectory.
n <- 1e2
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X")
This algorithm, since it relies on low-level optimized functions, is fast: the six-year-old system I tested it on will generate over three million (time, value) pairs per second.
That's usually good enough for simulation, but it doesn't quite satisfy the problem, which asks to generate a simulation out to time T. We can leverage the preceding code, but the solution is a little trickier. It computes a reasonable upper limit on how many times will occur in the Poisson process before time T. It generates the inter-arrival times. This is wrapped in a loop that will repeat the procedure in the (rare) event the time T is not actually reached.
The additional complexity doesn't change the asymptotic calculation time.
T <- 1e2 # Specify the end time
T.max <- 0 # Last time encountered
n.t <- numeric(0) # Inter-arrival times
while (T.max < T) {
#
# Estimate how many random values to generate before exceeding T.
#
T.remaining <- T - T.max
n <- ceiling(T.remaining + 3*sqrt(T.remaining))
#
# Continue the Poisson process.
#
n.new <- rexp(n)
n.t <- c(n.t, n.new)
T.max <- T.max + sum(n.new)
}
#
# Sum the inter-arrival times and cut them off after time T.
#
n.t <- cumsum(n.t)
n.t <- n.t[n.t <= T]
#
# Generate the iid random values and accumulate their sums.
#
x <- c(0,cumsum(rnorm(length(n.t))))
#
# Display the result.
#
plot(stepfun(n.t, x), xlab="t", ylab="X", sub=paste("n =", length(n.t)))
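Since the question also asks for a way to swap in another jump distribution, here is a hedged sketch that wraps the simulation in a function. It uses a different but standard technique from the loop above: draw the number of events on [0, T] from a Poisson distribution, then place the event times as sorted uniforms. The function name `simulate_cpp` and its arguments are made up for illustration.

```r
# Simulate a compound Poisson process on [0, t_end].
# lambda: rate of the Poisson process
# rjump:  function(n) returning n i.i.d. jump sizes (rnorm by default)
simulate_cpp <- function(t_end, lambda = 1, rjump = rnorm) {
  n <- rpois(1, lambda * t_end)          # number of jumps in [0, t_end]
  times <- sort(runif(n, 0, t_end))      # given n, jump times are sorted uniforms
  x <- c(0, cumsum(rjump(n)))            # process value before and after each jump
  list(times = times, x = x)
}

set.seed(1)
path <- simulate_cpp(100)

# stepfun() needs at least one jump, so guard against the (rare) empty path
if (length(path$times) > 0) {
  plot(stepfun(path$times, path$x), xlab = "t", ylab = "X")
}

# Exponential jumps instead of normal ones
path_exp <- simulate_cpp(100, rjump = function(n) rexp(n, rate = 2))
```

Passing the jump sampler as a function keeps the time simulation untouched when the distribution of $Y_k$ changes.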

Running into problems with running null model of genetic drift in R

I've been trying to wrangle a basic model of genetic drift in R.
However, every time I try to run the program it won't stop, and I have to manually stop it.
My complete code:
trials <- 100 #initialize the number of times you'll generate the time to fixation
fixation <- trials #Create a vector that records the number of generations until fixation of the alleles.
genVector <- numeric(trials)
for(i in 1:trials){
pop <- c(rep('a',20), rep('b',20)) #Initialize the population with equal numbers of both a and b alleles, for twenty individuals, or 40 alleles.
genTime <- 1 #Number of generations
freq <- length(pop[grep('a', pop)])/length(pop)
while(freq > 0 | freq < 1){ #While the frequency of a in the population is greater than 0 or less than 1, perform the following calculations
pop <- sample(pop, length(pop), replace = TRUE) #Randomly select 40 alleles with constant replacement
freq <- length(pop[grep('a', pop)])/length(pop)
genTime <- genTime + 1 #Add one to the generation time
}
genVector[i] <- genTime
}
I believe I have isolated the problem to the while loop I am using, within the for loop. I have no idea why it won't stop running though. Any comments or suggestions would be greatly appreciated!
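A likely cause is the loop condition: `freq > 0 | freq < 1` is TRUE for every possible value of `freq` (at fixation, `freq` is 0 or 1, and each of those still satisfies one side of the "or"), so the loop never exits. The loop should continue only while both alleles are still present, which means "and" rather than "or". A minimal sketch of the corrected loop, keeping the original variable names and also using `mean(pop == "a")` as a shorter way to get the frequency:

```r
trials <- 100                  # number of times to measure time to fixation
genVector <- numeric(trials)   # generations until fixation, one per trial
set.seed(1)                    # reproducibility (not in the original code)

for (i in seq_len(trials)) {
  pop <- c(rep("a", 20), rep("b", 20))   # 40 alleles, half a and half b
  genTime <- 1
  freq <- mean(pop == "a")               # frequency of allele a
  # Continue only while neither allele has been lost
  while (freq > 0 && freq < 1) {
    pop <- sample(pop, length(pop), replace = TRUE)
    freq <- mean(pop == "a")
    genTime <- genTime + 1
  }
  genVector[i] <- genTime
}

summary(genVector)
```

`&&` is used because the condition is a single TRUE/FALSE value; the vectorized `&` would also work here.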

Fast loan rate calculation for a big number of loans

I have a big data set (around 200k rows) where each row is a loan. I have the loan amount, the number of payments, and the loan payment.
I'm trying to get the loan rate.
R doesn't have a function for calculating this (at least base R doesn't have it, and I couldn't find it).
It isn't hard to write both an NPV and an IRR function:
Npv <- function(i, cf, t=seq(from=0,by=1,along.with=cf)) sum(cf/(1+i)^t)
Irr <- function(cf) { uniroot(Npv, c(0,100000), cf=cf)$root }
And you can just do
rate = Irr(c(-amt,rep(pmt,times=n))) # the loan amount is an outflow, so it enters with a negative sign
The problem is when you try to calculate the rate for a lot of payments. Because uniroot is not vectorized, and because rep takes a surprising amount of time, you end up with a slow calculation. You can make it faster if you do some math and figure out that you are looking for the roots of the following equation
zerome <- function(r) amt/pmt-(1-1/(1+r)^n)/r
and then use that as input for uniroot. On my PC, this takes around 20 seconds to run for my 200k-row data set.
The problem is that I'm trying to do some optimization, and this is a step of the optimization, so I'm trying to speed it up even more.
I've tried vectorization, but because uniroot is not vectorized, I can't go further that way. Is there any root finding method that is vectorized?
Thanks
Instead of using a root finder, you could use a linear interpolator. You will have to create one interpolator for each value of n (the number of remaining payments). Each interpolator will map (1-1/(1+r)^n)/r to r. Of course you will have to build a grid fine enough so it will return r to an acceptable precision level. The nice thing with this approach is that linear interpolators are fast and vectorized: you can find the rates for all loans with the same number of remaining payments (n) in a single call to the corresponding interpolator.
Now some code that proves it is a viable solution:
First, we create interpolators, one for each possible value of n:
n.max <- 360L # 30 years
one.interpolator <- function(n) {
r <- seq(from = 0.0001, to = 0.1500, by = 0.0001)
y <- (1-1/(1+r)^n)/r
approxfun(y, r)
}
interpolators <- lapply(seq_len(n.max), one.interpolator)
Note that I used a precision of 1/100 of a percent (1bp).
Then we create some fake data:
n.loans <- 200000L
n <- sample(n.max, n.loans, replace = TRUE)
amt <- 1000 * sample(100:500, n.loans, replace = TRUE)
pmt <- amt / (n * (1 - runif(n.loans)))
loans <- data.frame(n, amt, pmt)
Finally, we solve for r:
library(plyr)
system.time(ddply(loans, "n", transform, r = interpolators[[n[1]]](amt / pmt)))
# user system elapsed
# 2.684 0.423 3.084
It's fast. Note that some of the output rates are NA but it is because my random inputs made no sense and would have returned rates outside of the [0 ~ 15%] grid I selected. Your real data won't have that problem.
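If a precomputed grid is not wanted, another option is to vectorize a simple root finder by hand. The sketch below runs bisection on all loans at once, using the same equation as `zerome` in the question. The function name `annuity_rate` and the bracketing interval of (0, 1] per period are assumptions; loans whose rate lies outside that bracket will converge to one of its ends rather than to a meaningful rate.

```r
# Vectorized bisection for r in amt/pmt = (1 - (1 + r)^(-n)) / r.
# amt, pmt, n are vectors of equal length; lo and hi bracket the per-period rate.
annuity_rate <- function(amt, pmt, n, lo = 1e-8, hi = 1, iters = 60) {
  lo <- rep(lo, length(amt))
  hi <- rep(hi, length(amt))
  target <- amt / pmt
  for (k in seq_len(iters)) {
    mid <- (lo + hi) / 2
    f <- (1 - (1 + mid)^(-n)) / mid   # annuity factor, decreasing in the rate
    too_low <- f > target             # factor too large means the rate is too small
    lo[too_low] <- mid[too_low]
    hi[!too_low] <- mid[!too_low]
  }
  (lo + hi) / 2
}

# Check on made-up loans whose true rates are known
true_r <- c(0.005, 0.01, 0.02)
n_pay  <- c(12, 60, 360)
amt_ex <- c(1000, 20000, 150000)
pmt_ex <- amt_ex * true_r / (1 - (1 + true_r)^(-n_pay))
est_r  <- annuity_rate(amt_ex, pmt_ex, n_pay)
```

Each iteration halves every bracket, so 60 iterations are far more than double precision needs; fewer iterations trade accuracy for speed.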
