Running into problems running a null model of genetic drift in R

I've been trying to wrangle a basic model of genetic drift in R.
However, every time I try to run the program it won't stop, and I have to manually stop it.
My complete code:
trials <- 100 #initialize the number of times you'll generate the time to fixation
fixation <- trials #Create a vector that records the number of generations until fixation of the alleles.
genVector <- numeric(trials)
for(i in 1:trials){
pop <- c(rep('a',20), rep('b',20)) #Initialize the population with equal numbers of both a and b alleles, for twenty individuals, or 40 alleles.
genTime <- 1 #Number of generations
freq <- length(pop[grep('a', pop)])/length(pop)
while(freq > 0 | freq < 1){ #While the frequency of a in the population is greater than 0 or less than 1, perform the following calculations
pop <- sample(pop, length(pop), replace = TRUE) #Randomly select 40 alleles with constant replacement
freq <- length(pop[grep('a', pop)])/length(pop)
genTime <- genTime + 1 #Add one to the generation time
}
genVector[i] <- genTime
}
I believe I have isolated the problem to the while loop I am using, within the for loop. I have no idea why it won't stop running though. Any comments or suggestions would be greatly appreciated!
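One likely cause is the loop condition itself: `freq > 0 | freq < 1` is true for every possible frequency (at fixation, `freq == 1` still satisfies `freq > 0`; at loss, `freq == 0` still satisfies `freq < 1`), so the loop can never exit. Requiring both conditions with `&&` stops the loop as soon as the allele is fixed or lost. The following is a minimal sketch of the same simulation with that change; the helper name `time_to_fixation` and the seed are illustrative and not part of the original post.

```r
# Simulate generations until allele 'a' is fixed (freq == 1) or lost (freq == 0).
time_to_fixation <- function(n_a = 20, n_b = 20) {
  pop <- c(rep("a", n_a), rep("b", n_b))
  genTime <- 1                      # start at generation 1, as in the original code
  freq <- mean(pop == "a")          # frequency of allele 'a'
  while (freq > 0 && freq < 1) {    # continue only while both alleles are present
    pop <- sample(pop, length(pop), replace = TRUE)
    freq <- mean(pop == "a")
    genTime <- genTime + 1
  }
  genTime
}

set.seed(1)                         # arbitrary seed for reproducibility
trials <- 100
genVector <- replicate(trials, time_to_fixation())
summary(genVector)
```

Using `mean(pop == "a")` instead of `grep` gives the same frequency here and avoids pattern matching on the allele labels.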

Related

Is there a way in R to find a maximum value during a three point estimate

I am using the R programming language. Suppose I have the following 3 point estimate data : Data
Here, Task 1 and Task 2 are done in parallel, whereas Task 3 and Task 4 are done in series, with Task 4 dependent on the completion of Task 3. So the minimum time for Task 1 and Task 2 together is 10, the most likely is 20, and the maximum is 40. This is then added to the times for Tasks 3 and 4 to give the total time.
For a three-point cost estimate, the min, most likely, and max costs are added together and a simulation (1,000 trials, 10,000 trials, whatever) is run. For time, however, the general rule is: times for tasks in series are added; the time for tasks in parallel equals the time of the longest task.
How can the time estimation be carried out in R, given that we are adding up rows for multiple simulations in one go?
code:
inv_triangle_cdf <- function(P, vmin, vml, vmax){
Pvml <- (vml-vmin)/(vmax-vmin)
return(ifelse(P < Pvml,
vmin + sqrt(P*(vml-vmin)*(vmax-vmin)),
vmax - sqrt((1-P)*(vmax-vml)*(vmax-vmin))))
}
#no of simulation trials
n=1000
#read in cost data
task_costs <- read.csv(file="task_costs.csv", stringsAsFactors = F)
str(task_costs)
#set seed for reproducibility
set.seed(42)
#create data frame with rows = number of trials and cols = number of tasks
csim <- as.data.frame(matrix(nrow=n,ncol=nrow(task_costs)))
# for each task
for (i in 1:nrow(task_costs)){
#set task costs
vmin <- task_costs$cmin[i]
vml <- task_costs$cml[i]
vmax <- task_costs$cmax[i]
#generate n random numbers (one per trial)
psim <- runif(n)
#simulate n instances of task
csim[,i] <- inv_triangle_cdf(psim,vmin,vml,vmax)
}
#sum costs for each trial
ctot <- csim[,1] + csim[,2] + csim[,3] + csim[,4] #costs add
ctot
How can I update this in order to accommodate time duration from the data given above?
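One possible way to handle durations is to simulate every task as before, take the element-wise maximum of the parallel tasks with `pmax`, and then add the tasks that run in series. The sketch below assumes a data frame `task_times` with columns `tmin`, `tml`, and `tmax`; those names and all the numbers in it are hypothetical placeholders, since the original data are not shown.

```r
# Inverse CDF of the triangular distribution (same formula as in the question)
inv_triangle_cdf <- function(P, vmin, vml, vmax) {
  Pvml <- (vml - vmin) / (vmax - vmin)
  ifelse(P < Pvml,
         vmin + sqrt(P * (vml - vmin) * (vmax - vmin)),
         vmax - sqrt((1 - P) * (vmax - vml) * (vmax - vmin)))
}

# Hypothetical three-point time estimates; replace with the real data
task_times <- data.frame(
  task = c("Task 1", "Task 2", "Task 3", "Task 4"),
  tmin = c(5, 10, 8, 4),
  tml  = c(10, 20, 12, 6),
  tmax = c(20, 40, 20, 10)
)

n <- 1000
set.seed(42)

# One column of n simulated durations per task
tsim <- sapply(seq_len(nrow(task_times)), function(i) {
  inv_triangle_cdf(runif(n), task_times$tmin[i], task_times$tml[i], task_times$tmax[i])
})

# Tasks 1 and 2 run in parallel: the longer one determines the elapsed time
t_parallel <- pmax(tsim[, 1], tsim[, 2])
# Tasks 3 and 4 run in series afterwards: their durations add
ttot <- t_parallel + tsim[, 3] + tsim[, 4]
summary(ttot)
```

Taking the maximum within each trial, rather than combining the three-point estimates of the parallel tasks up front, keeps the rule "parallel time equals the longest task" true for every simulated scenario.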

How to generate samples of uniform random variable and count how many times 2*mean is greater than upper parameter

I don't know how to write code that runs 1000 experiments, each drawing 10 data points from a uniform random variable between 0 and 10, and returns how many times twice the sample mean is greater than 10. I was hoping for some help with it.
Thank you
results <- rep(x=0,times=1000)
for(i in 1:1000) {
dat <- runif(n=10,min=0,max=10)
if(mean(dat) >= 5) {
results[[i]] <- 1
}
}
sum(results)
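The same count can also be obtained without an explicit loop. The sketch below draws all samples at once into a matrix with one experiment per column; the seed and the variable names are arbitrary choices for illustration.

```r
set.seed(123)                       # arbitrary seed for reproducibility
n_experiments <- 1000
n_points <- 10

# Each column holds one experiment of 10 draws from Uniform(0, 10)
samples <- matrix(runif(n_experiments * n_points, min = 0, max = 10),
                  nrow = n_points)

# TRUE where twice the sample mean exceeds the upper parameter 10
exceeds <- 2 * colMeans(samples) > 10
sum(exceeds)
```

Because the sample mean of a Uniform(0, 10) variable is symmetric around 5, the count should land near half of the experiments.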

Error for using for-loop when producing simulations

I'm trying to use a for loop to simulate 1,000 portfolios with 3 bonds in each portfolio and find the probability that two out of three bonds default.
Here's my code (with comments):
#Reproducibility
set.seed(33)
#Number of trials
n<-1000
#Initialize variables
numberofdefaults<-0
counter<-0
portfolio <- 0
for (i in 1:n){
portfolio[i] <- rbinom(3, 1, prob = 0.127) # generate three random binomial deviates with probability of success ("default" in my case) 0.127 and store them in a vector
numberofdefaults[i] <- sum(portfolio[i] == 1) # find the number of defaults in the vector (1 for default) and add them up
if (numberofdefaults[i] == 2) { # if number of defaults is 2, then add 1 to the counter
counter<-counter+1
}
}
When I execute the code, I keep getting an error message: number of items to replace is not a multiple of replacement length
Thanks so much for taking the time. Any suggestions would be appreciated.
Your code is not working as intended. portfolio is a vector, so when you assign the result of rbinom(), which has 3 elements, to portfolio[i], you are attempting to cram 3 elements into one element (the ith element, in that iteration of the for loop). R warns you that it can't do that (and stores only the first element each time). Instead, you want portfolio to be a list.
set.seed(33)
#Number of trials
n<-1000
#Initialize variables
numberofdefaults<-0
counter<-0
portfolio <- list() # Change this
for (i in 1:n){
portfolio[[i]] <- rbinom(3, 1, prob = 0.127) # Change this
numberofdefaults[i] <- sum(portfolio[[i]] == 1) # Change this
if (numberofdefaults[i] == 2) {
counter<-counter+1
}
}
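As an alternative sketch, the number of defaults per portfolio can be drawn in a single vectorized call, since the sum of three independent 0/1 draws is a binomial count with `size = 3`. The variable names below are illustrative; `dbinom` gives the exact probability for comparison.

```r
set.seed(33)
n <- 1000
p_default <- 0.127

# Number of defaults in each 3-bond portfolio, one draw per portfolio
defaults <- rbinom(n, size = 3, prob = p_default)

# Simulated probability that exactly two of the three bonds default
est <- mean(defaults == 2)
# Exact binomial probability: choose(3, 2) * p^2 * (1 - p)
exact <- dbinom(2, size = 3, prob = p_default)
c(simulated = est, exact = exact)
```

This avoids growing vectors inside a loop and makes it easy to check the simulation against the closed-form value.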

Keep rank over time in R

Is there a way in R to check if ranks are kept over time for individual observations?
I have measured a number of plants over the years and want to check if large plants stay large and small plants stay small (i.e. if the large plants prevent other plants from growing). The plants are ranked 1-5 in size (from small to large).
I have measured about 1000 plants.
Very grateful for any answers or comments.
/Stina
Maybe you could do something like this?
# create random data
plantId <- sample(1:50,1000,replace=TRUE)
rank <- sample(1:5,1000,replace=TRUE)
time <- as.POSIXct(sample(1000000:10000000,1000,replace=FALSE)+10000000*rank,origin="1970-01-01")
myData <- data.frame(plantId , rank, time )
# function to calculate the time a plant has a given rank
getRankTime <- function(id,testRank,data=myData){
plantData <- data[data$plantId==id,] # use the 'data' argument rather than the global myData
if(nrow(plantData) < 2){ # only one observed value of this plant
return(NA)
}else if(all(plantData$rank != testRank)){ # plant was never of the rank under consideration
return(NA)
}else{ # calculate the (censored) time the plant stay(ed) in rank 'testRank'
startObsTimeInRank <- min(plantData$time[plantData$rank == testRank])
if(any(plantData$rank > testRank)){
endObsTimeInRank <- min(plantData$time[plantData$rank > testRank])
}else{
# either take the last time
endObsTimeInRank <- max(plantData$time[plantData$rank == testRank])
# alternatively use the current time
# endObsTimeInRank <- Sys.time()
}
return(as.numeric(difftime(endObsTimeInRank, startObsTimeInRank, units = "days"))) # fixed units so values are comparable across plants
}
}
# calculate the average time plants stay in a rank
allPlantIds <- unique(myData$plantId)
stayInRankTime <- list()
for(runRank in 1:5){
stayInRankTime[[runRank]] <- sapply(allPlantIds, function(runPlatId) getRankTime(runPlatId,runRank) )
}
# average time plants stay in a certain rank
avgRankTime <- lapply(stayInRankTime,function(x)mean(x, na.rm =TRUE))
avgRankTime

MCMC in R Modify Proposal

I've been working with MCMC for population genetics and I have some doubts.
I'm not experienced in statistics and because of that I have difficulty.
I have code to run MCMC, 1000 iterations. I start by creating a matrix with 0's (50 columns = 50 individuals and 1000 lines for 1000 iterations).
Then I create a random vector to substitute the first line of the matrix. This vector has 1's and 2's, representing population 1 or population 2.
I also have genotype frequencies and the genotypes of the 50 individuals.
What I want is to, according to the genotype frequencies and genotypes, determine to what population an individual belongs.
Then, I'll keep changing the population assigned to a random individual and checking if the new value should be accepted.
niter <- 1000
z <- matrix(0,nrow=niter,ncol=ncol(targetinds))
z[1,] <- sample(1:2, size=ncol(z), replace=T)
lhood <- numeric(niter)
lhood[1] <- compute_lhood_K2(targetinds, z[1,], freqPops)
accepted <- 0
priorz <- c(1e-6, 0.999999)
for(i in 2:niter) {
z[i,] <- z[i-1,]
# propose new vector z, by selecting a random individual, proposing a new zi value
selind <- sample(1:nind, size=1)
# proposal probability of selecting individual at random
proposal_ratio_ind <- log(1/nind)-log(1/nind)
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
# proposal probability of changing the index of individual is 1/2
proposal_ratio_cluster <- log(1/2)-log(1/2)
propratio <- proposal_ratio_ind+proposal_ratio_cluster
# compute f(x_i|z_i*, p)
# the probability of the selected individual given the two clusters
probindcluster <- compute_lhood_ind_K2(targetinds[,selind],freqPops)
# likelihood ratio f(x_i|z_i*,p)/f(x_i|z_i, p)
lhoodratio <- probindcluster[z[i,selind]]-probindcluster[z[i-1,selind]]
# prior ratio pi(z_i*)/pi(z_i)
priorratio <- log(priorz[z[i,selind]])-log(priorz[z[i-1,selind]])
# accept new value according to the MH ratio
mh <- lhoodratio+propratio+priorratio
# reject if the random value is larger than the MH ratio
if(runif(1)>exp(mh)) {
z[i,] <- z[i-1,] # keep the same z
lhood[i] <- lhood[i-1] # keep the same likelihood
} else { # if accepted
lhood[i] <- lhood[i-1]+lhoodratio # update the likelihood
accepted <- accepted+1 # increase the number of accepted
}
}
I have been asked to change the proposal probability so that the newly proposed values are proportional to the likelihood. Supposedly, this leads to a Gibbs sampling MCMC algorithm.
I don't know what to change in the code to do this. I also don't fully understand the concept of proposal probability or how to choose the prior.
Grateful if someone knows how to clarify my doubts.
Your current proposal is done here:
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
If the individual is assigned to cluster 1, you deterministically propose switching them to cluster 2 (and vice versa).
You didn't show us what freqPops is, but if you want to propose according to freqPops then I believe the above code has to be replaced by
z[i,selind] <- sample(c(1,2),size=1,prob=freqPops)
(at least that is what I understand when you say you want to propose based on the likelihood - however, that statement of yours is unclear).
For this to be a valid MCMC Gibbs sampling algorithm, you also need to change the next line of code:
proposal_ratio_cluster <- log(freqPops[z[i-1,selind]])-log(freqPops[z[i,selind]])
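If "proportional to the likelihood" is instead read as drawing the new label from its full conditional distribution, a Gibbs update for one individual could look like the sketch below. It assumes that `compute_lhood_ind_K2` returns a length-2 vector of log-likelihoods, one per cluster, which the subtraction in `lhoodratio` suggests but the post does not confirm. The helper `gibbs_draw_z` and all numbers in the example are hypothetical.

```r
# Draw a cluster label from its full conditional:
# P(z_i = k | x_i) is proportional to prior[k] * exp(loglik[k])
gibbs_draw_z <- function(loglik, prior) {
  logpost <- loglik + log(prior)
  # subtract the maximum before exponentiating to avoid numerical underflow
  w <- exp(logpost - max(logpost))
  sample(seq_along(w), size = 1, prob = w / sum(w))
}

set.seed(1)
loglik <- c(-12.3, -10.1)   # hypothetical log-likelihoods under clusters 1 and 2
prior  <- c(0.5, 0.5)       # hypothetical prior over the two clusters
draws  <- replicate(5000, gibbs_draw_z(loglik, prior))
table(draws) / length(draws)
```

With this kind of update every draw is kept, so the accept/reject step of Metropolis-Hastings is no longer needed for the selected individual.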
