Loop structure for basic simulation model in R - r

I'm trying to write a basic model that simulates the growth of a population (whose initial size is drawn randomly from a normal distribution) and then grows by a user defined amount each 'year' (currently 2 individuals in the code below for arguments sake). The output that is produced only shows the results of one simulation and, within this simulation, the population hasn't grown at all i.e. for each 'year' the population hasn't grown/doesn't add to the previous 'years' population. I'm assuming that I've stuffed something up in the loop structure and keen for any advice!
n.years <- 3
n.sim <- 5
store.growth <- matrix(ncol=3,nrow= (n.years * n.sim))
for (i in 1:n.sim) {
init.pop.size <- rnorm(1,100,10)
for (j in 1:n.years){
#grow population
grow.pop <- init.pop.size + 5
store.growth[j,] <- cbind(grow.pop, n.years, n.sim)
}
}
store.growth

Related

Optimising assigments of individuals to groups (with defined maximum capacity) according to individuals' preferences

I am attempting to populate a set of forests with individuals from various species. Each forest has a defined capacity which it cannot exceed (given by Forest Area * Organism Density). Each species has a set population size, which is a fraction of the total population, itself determined by the sum of all forest areas * organism density.
The species also have defined preferences in regards to which forests they are assigned to, according to a relationship with a characteristic that varies between the forests, say rainfall. This results in a matrix of probabilities of a given species individual being assigned to a given forest. In order to give an equal chance for each species to be allocated to its preferred forest, I am assigning one individual at a time, repeatedly iterating through all the species in order, until both the non-assigned species populations have been exhausted and all the forests are at their maximum capacity.
Ideally the proportion of a species' population in a forest will be as close as possible to the probability of that species being assigned to that forest. This will involve a compromise between all species to minimise the total error between the population proportions and the species specific probabilities of assignment. (Thanks to commenters for making this a clearer problem)
At the moment I am doing this with a for, if/else, while loop. Here, when a patch reaches its capacity it is removed from the selection process, and when a species' population has all been assigned, it is iterated over. The species population sizes are stored in one data.frame and the forest capacities are stored in another, and are adjusted accordingly when individuals are assigned.
These criteria have led me to struggle to see an alternative method than the loops (reproducible example below). However, it is very slow as I often have total population sizes in the 100s of millions. I feel there must be a much neater and faster alternative to using loops, perhaps residing in the way I have structured the input data or in the way I provide equal oppurtunity to each species (i.e. a way this doesn't have to be sequential), but I cannot figure one out. All help is greatly appreciated.
set.seed(999)
#Generate data
nSpecies <- 50 #Number of species
max_area <- 10000 #Maximum Area of a forest
nForests <- 20 #Number of different forests
areas <- round(rbeta(nForests, 1, 2) * max_area) #Generate random forest areas
total_area <- sum(areas) #Find total area of all forests
density <- 10 #Set organism density
total_population <- total_area * density #Find total population size across all forests
pop_structure <- table(sample(1:nSpecies, total_population, replace = T)) #Generate species populations
forests <- data.frame(Name = 1:nForests,
Capacity = (areas * density), #Find max population size of each forest
Rainfall = sample(0:10000, nForests, replace = T)) #Generate forest characteristic variable (e.g. rainfall)
species <- data.frame(Species = 1:nSpecies,
Individuals = as.numeric(pop_structure),
Rain_Response = rnorm(nSpecies, 0, 2)) #Generate species rainfall response
#Generate probabilities of assignment to each forest for each species
assignment_probs <- matrix(NA, nrow = nSpecies, ncol = nForests)
for(i in 1:nSpecies){
for(x in 1:nForests){
#Probability of assignment to forest = Exponent of species rain response * log(Rainfall in Forest)
assignment_probs[i,x] <- exp(species$Rain_Response[i] * log(forests$Rainfall[x]))
}
#Scale to sum to 1
assignment_probs[i,] <- (assignment_probs[i,] / sum(assignment_probs[i,]))
}
#Allocate species individuals to a forest
forest_comms <- matrix(0, nrow = nForests, ncol = nSpecies) #Empty community matrix
possible_forests <- 1:nForests #Vector to remove forests from selection without effecting other data
done <- FALSE #Used to exit loop when finished
while(sum(species$Individuals) > 0){ #While individuals in the species pool remain to be assigned...
for(sp in 1:nSpecies){ #Repeatedly assign one individual from each species until all done
if(species$Individuals[sp] > 0){ #If species individuals remain to be assigned, proceed. Else, skip
vacancies <- 0 #Set vacancies to 0 to enter next loop
while(vacancies == 0){ #If there are 0 vacancies in forest selected in next section, retry assignment
forest <- sample(possible_forests, 1, prob = assignment_probs[sp, possible_forests]) #Randomly select forest according to generated assignment probabilities
vacancies <- species$Individuals[forest] #Find no. of individual vacancies yet to be filled in the forest
if(vacancies > 0){ #If vacancies available in forest...
forest_comms[forest, sp] <- (forest_comms[forest, sp] + 1) #Assign an individual to the forest
species$Individuals[sp] <- (species$Individuals[sp] - 1) #Decrease species count by 1
forests$Individuals[forest] <- (forests$Individuals[forest] - 1) #Decrease remaining vacancies in forest by 1
} else { #If forest is already full...
possible_forests <- possible_forests[!possible_forests %in% forest] #Remove forest from selection process
}
if(length(possible_forests) == 1){ #If only one forest has vacancies...
for(i in 1:nrow(species)){ #Assign all remaining individuals to that forest
forest_comms[possible_forests, i] <- (forest_comms[possible_forests, i] + species$Individuals[i])
}
species$Individuals <- 0 #Set population to 0 (all individuals have been assigned)
done <- TRUE #Convert 'done' to true to end loop
break
}
}
}
}
cat('\n', sum(species$Individuals))
if(done){break}
}
sum(forest_comms) == total_population

How can I repeat these two lines of code 100+ times?

I'm still new to the programming world and looking for some guidance on a model I am building for individual animal growths over time.
The goal for the code I'm working with is to
i) Generate random starting sizes of animals from a given distribution
ii) Give each of these individuals a starting growth rate from a given distribution
iii) Calculate new size of individual after 1 year
iv) Assign a new growth rate from above distribution
v) Calculate the new size of individual after another year.
So far I have the code below, and what I want to do is repeat the last two lines of code x amount of times without I having to physically run the code over and over.
# Generate starting lengths
lengths <- seq(from=4.4, to=5.4, by =0.1)
# Generate starting ks (growth rate)
ks <- seq(from=0.0358, to=0.0437, by =0.0001)
#Create individuals
create.inds <- function(id = NaN, length0=NaN, k1=NaN){
inds <- data.frame(id=id, length0 = length0, k1=k1)
inds
}
# Generate individuals
inds <- create.inds(id=1:n.initial,
length=sample(lengths,100,replace=TRUE),
k1=sample(ks, 100, replace=TRUE))
# Calculate new lengths based on last and 2nd last columns and insert into next column
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
Any and all assistance would be appreciated, also if you think there is a better way to write this please let me know.
i think what you are asking is a simple loop:
for (i in 1:100) { #replace 100 with the desired times you want this to excecute
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
}

MCMC in R Modify Proposal

I've been working with MCMC for population genetics and I have some doubts.
I'm not experienced in statistics and because of that I have difficulty.
I have code to run MCMC, 1000 iterations. I start by creating a matrix with 0's (50 columns = 50 individuals and 1000 lines for 1000 iterations).
Then I create a random vector to substitute the first line of the matrix. This vector has 1's and 2's, representing population 1 or population 2.
I also have genotype frequencies and the genotypes of the 50 individuals.
What I want is to, according to the genotype frequencies and genotypes, determine to what population an individual belongs.
Then, I'll keep changing the population assigned to a random individual and checking if the new value should be accepted.
niter <- 1000
z <- matrix(0,nrow=niter,ncol=ncol(targetinds))
z[1,] <- sample(1:2, size=ncol(z), replace=T)
lhood <- numeric(niter)
lhood[1] <- compute_lhood_K2(targetinds, z[1,], freqPops)
accepted <- 0
priorz <- c(1e-6, 0.999999)
for(i in 2:niter) {
z[i,] <- z[i-1,]
# propose new vector z, by selecting a random individual, proposing a new zi value
selind <- sample(1:nind, size=1)
# proposal probability of selecting individual at random
proposal_ratio_ind <- log(1/nind)-log(1/nind)
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
# proposal probability of changing the index of individual is 1/2
proposal_ratio_cluster <- log(1/2)-log(1/2)
propratio <- proposal_ratio_ind+proposal_ratio_cluster
# compute f(x_i|z_i*, p)
# the probability of the selected individual given the two clusters
probindcluster <- compute_lhood_ind_K2(targetinds[,selind],freqPops)
# likelihood ratio f(x_i|z_i*,p)/f(x_i|z_i, p)
lhoodratio <- probindcluster[z[i,selind]]-probindcluster[z[i-1,selind]]
# prior ratio pi(z_i*)/pi(z_i)
priorratio <- log(priorz[z[i,selind]])-log(priorz[z[i-1,selind]])
# accept new value according to the MH ratio
mh <- lhoodratio+propratio+priorratio
# reject if the random value is larger than the MH ratio
if(runif(1)>exp(mh)) {
z[i,] <- z[i-1,] # keep the same z
lhood[i] <- lhood[i-1] # keep the same likelihood
} else { # if accepted
lhood[i] <- lhood[i-1]+lhoodratio # update the likelihood
accepted <- accepted+1 # increase the number of accepted
}
}
It is asked that I have to change the proposal probability so that the new proposed values are proportional to the likelihood. This leads to a Gibbs sampling MCMC algorithm, supposedly.
I don't know what to change in the code to do this. I also don't understand very well the concept of proposal probability and how to chose the prior.
Grateful if someone knows how to clarify my doubts.
Your current proposal is done here:
# propose a new index for the selected individual
if(z[i,selind]==1) {
z[i,selind] <- 2
} else {
z[i,selind] <- 1
}
if the individual is assigned to cluster 1, then you propose to switch assignment deterministically by assigning them to cluster 2 (and vice versa).
You didn't show us what freqPops is, but if you want to propose according to freqPops then I believe the above code has to be replaced by
z[i,selind] <- sample(c(1,2),size=1,prob=freqPops)
(at least that is what I understand when you say you want to propose based on the likelihood - however, that statement of yours is unclear).
For this now to be a valid mcmc gibbs sampling algorithm you also need to change the next line of code:
proposal_ratio_cluster <- log(freqPops[z[i-1,selind]])-log(fregPops[z[i,selind]])

Portfolio loss estimation using pairwise correlations

im trying to estimate the portfolio losses using pairwise correlation matrix. The current code which runs fine using two 'for' loops to do the job and takes a fair amount of time. Is there any way to optimize the code to reduce the run time?
Corp here is the number of obligors or counterparties (1000 in this case)
Edit: Trying to explain the logic a little better here
So there is a data frame df that contains borrower details (Client_ID, Sector, Exposure, Probablity of default (PD) ). For example
Client ID is 1, 2, 3, 4,5 uptil 1000
Sector ID could be anything from 1 to 21
Exposure is any amount from 0 to 99999999
So i need to identify all client_ID pairs, multiply their exposures, and here is where it gets complicates, i need to also multiply their correlation value . This correlation value is stored in a separate 21*21 matrix, and is chosen on the basis of sectors to which the two clients belong. What is the most efficient way to do this? The code below does the stuff fine, but it takes a while, and i feel there must be an easier way to do this? merge? Expand.grid?
RC_UL=function(N_EAD, N_PD, N_LGD, N_LGD_VAR,N_CO_FAC_LOAD,V_SECTOR_ID,N_DFLT_PT)
{for (i in 1:corp){
for(j in 1:corp ){
if (i==j) {
#borrower correlation with himself is always 1
sigma.ijk <- cbind( c(1, 1),c(1, 1))
} else
{ sigma.ijk <- cbind( c(1, N_CO_FAC_LOAD[i]*N_CO_FAC_LOAD[j] *rho[V_SECTOR_ID[i], V_SECTOR_ID[j]]),c(N_CO_FAC_LOAD[i]*N_CO_FAC_LOAD[j] * rho[V_SECTOR_ID[i], V_SECTOR_ID[j]], 1))
}
#rho is the correlation matrix across sectors (not shown in the code snippet #i have provided, but it is a 21*21 matrix corresponding to the 21 sectors #that the counterparties belong to
#N_CO_FAC_LOAD is an input that is specific to a sector
#UL for each borrower
UL[i]<-UL[i]+N_LGD[i] * N_LGD[j] *
N_EAD[i] * N_EAD[j] *
( pmvnorm(upper=c(N_DFLT_PT[i], N_DFLT_PT[j]), mean=c(0,0),sigma=sigma.ijk) - N_PD[i] * N_PD[j] )
}}
for (i in 1:corp){N_RC_UL[i]<-(UL[i]/sqrt(sum(UL)))}
return(N_RC_UL)
}
b<-RC_UL(EAD, PD, LGD, N_LGD_VAR,fac_weight_s,sector,def_pt1)

plot multiple fit and predictions for logistic regression

I am running multiple times a logistic regression over more than 1000 samples taken from a dataset. My question is what is the best way to show my results ? how can I plot my outputs for both the fit and the prediction curve?
This is an example of what I am doing, using the baseball dataset from R. For example I want to fit and predict the model 5 times. Each time I take one sample out (for the prediction) and use another for the fit.
library(corrgram)
data(baseball)
#Exclude rows with NA values
dataset=baseball[complete.cases(baseball),]
#Create vector replacing the Leage (A our N) by 1 or 0.
PA=rep(0,dim(dataset)[1])
PA[which(dataset[,2]=="A")]=1
#Model the player be league A in function of the Hits,Runs,Errors and Salary
fit_glm_list=list()
prd_glm_list=list()
for (k in 1:5){
sp=sample(seq(1:length(PA)),30,replace=FALSE)
fit_glm<-glm(PA[sp[1:15]]~baseball$Hits[sp[1:15]]+baseball$Runs[sp[1:15]]+baseball$Errors[sp[1:15]]+baseball$Salary[sp[1:15]])
prd_glm<-predict(fit_glm,baseball[sp[16:30],c(6,8,20,21)])
fit_glm_list[[k]]=fit_glm;prd_glm_list[[k]]=fit_glm
}
There are a number of issues here.
PA is a subset of baseball$League but the model is constructed on columns from the whole baseball data frame, i.e. they do not match.
PA is treated as a continuous response when using the default family (gaussian), it should be changed to a factor and binomial family.
prd_glm_list[[k]]=fit_glm should probably be prd_glm_list[[k]]=prd_glm
You must save the true class labels for the predictions otherwise you have nothing to compare to.
My take on your code looks like this.
library(corrgram)
data(baseball)
dataset <- baseball[complete.cases(baseball),]
fits <- preds <- truths <- vector("list", 5)
for (k in 1:5){
sp <- sample(nrow(dataset), 30, replace=FALSE)
fits[[k]] <- glm(League ~ Hits + Runs + Errors + Salary,
family="binomial", data=dataset[sp[1:15],])
preds[[k]] <- predict(fits[[k]], dataset[sp[16:30],], type="response")
truths[[k]] <- dataset$League[sp[1:15]]
}
plot(unlist(truths), unlist(preds))
The model performs poorly but at least the code runs without problems. The y-axis in the plot shows the estimated probabilities that the examples belong to league N, i.e. ideally the left box should be close to 0 and the right close to 1.

Resources