Storing consecutive values from a function in a vector in R

I've been messing around with writing functions to calculate homework answers (in this case the present value of money) and I've run into a bit of an issue.
Here's the code:
pv <- function(x, y, z) {
  list2 <- 0
  ans <- 0
  for (t in z) {
    fv <- x
    d <- y
    rate <- (1 + d)^t
    ans[t] <- fv / rate
  }
  return(ans)
}
To calculate present value I want to apply the function to one value over a range of years (say z = 1:10) and have the value for each year stored in a vector. What I have works, but this strategy breaks down in other applications. For example, when I want to pass in a vector of values (I have another function that takes the ans vector as input) over a range of years, I have trouble getting back a usable vector.

I had to make a change in your for loop, then I used mapply:
pv <- function(x, y, z) {
  list2 <- 0
  ans <- 0
  for (t in 1:z) { # changed from t in z
    fv <- x
    d <- y
    rate <- (1 + d)^t
    ans[t] <- fv / rate
  }
  return(ans)
}
mapply(pv, c(1000, 1200, 1500, 5600), c(.05, .02, .03, .09), 5)
Output:
         [,1]     [,2]     [,3]     [,4]
[1,] 952.3810 1176.471 1456.311 5137.615
[2,] 907.0295 1153.403 1413.894 4713.408
[3,] 863.8376 1130.787 1372.712 4324.227
[4,] 822.7025 1108.615 1332.731 3967.181
[5,] 783.5262 1086.877 1293.913 3639.616
Each row contains the present value of each x for a given period, i.e. row 1 holds the present values for the 1st period. If you want just the 5th period for all values in question:
mapply(pv, c(1000, 1200, 1500, 5600), c(.05, .02, .03, .09), 5)[5,]
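For comparison, the same table can be produced without any loop at all, since the present-value formula is fully vectorized; a sketch of my own using outer() on the same inputs (not part of the original answer):
fv <- c(1000, 1200, 1500, 5600)
d <- c(.05, .02, .03, .09)
# outer() crosses every period t with every asset j; entry [t, j] is fv[j] / (1 + d[j])^t
outer(1:5, seq_along(fv), function(t, j) fv[j] / (1 + d[j])^t)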

pv <- function(fv, d, t) fv / (1 + d)^t
pv(1.05^2, 0.05, c(1, 2))
Here's an explanation: in R, arithmetic operations are vectorized over numeric data, so explicit looping is often unnecessary.
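For instance, the original goal (one value discounted over a range of years) becomes a single call; a minimal sketch of my own with illustrative inputs (1000 at 5% over years 1 to 10):
pv(1000, 0.05, 1:10)
# [1] 952.3810 907.0295 863.8376 822.7025 783.5262 ... (one present value per year)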

Related

Efficiently change individual elements in matrix/array in R

I am running a simulation in R, which I am trying to make more efficient.
A little bit of background: this is an abstract simulation to test the effects of mutation on a population. The population has N individuals, and each individual has a genotype of M letters, where each letter can be one of the twenty amino acids (which I denote 0:19).
One of the most (computationally) expensive tasks involves taking a matrix "mat" with M rows and N columns, which initially starts as a matrix of all zeroes,
mat <- matrix(rep(0,M*N),nrow=M)
And then changing (mutating) at least one letter in the genotype of each individual. The reason I say "at least" is that I would ideally like to set a mutation rate (mutrate) such that, if I set it to 2 in my overall simulation function, it causes 2 mutations per individual in the matrix.
I found two rather computationally expensive ways to do so. As you can see below, only the second method incorporates the mutation rate parameter mutrate (I could not easily think of how to incorporate it into the first).
# method 1
for (i in 1:N) {
  position <- floor(runif(N, min = 0, max = M))
  letter <- floor(runif(N, min = 0, max = 19))
  mat[position[i], i] <- letter[i]
}
# method 2, somewhat faster and incorporates the mutation rate
mat <- apply(mat, 2, function(x) (x + sample(c(rep(0, M - mutrate), sample(0:19, size = mutrate)) %% 20)))
The second method incorporates a modulus because genotype values have to be between 0 and 19 as I mentioned.
A few additional notes for clarity:
I don't strictly need every individual to get exactly the same number of mutations. That said, the distribution should be narrow enough that, if mutrate = 2, most individuals get two mutations, some one, some maybe three; I don't want one individual getting a huge number of mutations while many individuals get none. Notably, some mutations will change a letter into the same letter, so for a large population size N the expected average number of mutations is slightly less than the assigned mutrate.
I believe the answer has something to do with using the square-bracket subsetting syntax to obtain one random element from every column of the matrix mat. However, I could not find any information about how to isolate one random element from every column. mat[sample(1:M), sample(1:N)] obviously gives you the whole matrix... perhaps I am missing something obvious here.
Any help is greatly appreciated!
To answer your last question first: you can access a single cell in a matrix with mat[row, column], or multiple scattered cells by their sequential cell IDs. R stores matrices in column-major order, so cell 1,1 is the first cell, followed by 2,1, 3,1, etc.:
mat <- matrix(rep(0, 5*5), nrow = 5)
mat[c(1, 3, 5, 7, 9)] <- c(1, 2, 3, 4, 5)
mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    4    0    0    0
[3,]    2    0    0    0    0
[4,]    0    5    0    0    0
[5,]    3    0    0    0    0
Accessing and overwriting individual cells is fast too, however. The fastest way I could think of to perform your task is to first create vectors of the values we want: a vector of column indices (every column repeated mutrate times), a vector of random row indices, and a vector of new values for these row/column combinations.
cols <- rep(seq_len(N), mutrate)
rows <- sample(M, N * mutrate, replace = TRUE)
values <- sample(genotypes, N * mutrate, replace = TRUE) - 1 # -1 offset since genotypes are 0-indexed
for (i in seq_len(N * mutrate)) {
  mat[rows[i], cols[i]] <- values[i]
}
Instead of that for-loop to update the matrix, we can also calculate the cell-IDs so we can update all matrix cells in one go:
cols <- rep(seq_len(N), mutrate)
rows <- sample(M, N * mutrate, replace = TRUE)
cellid <- rows + (cols - 1) * M
values <- sample(genotypes, N * mutrate, replace = TRUE) - 1 # -1 offset since genotypes are 0-indexed
mat[cellid] <- values
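As an aside (my addition, not in the original answer), R also accepts a two-column matrix of (row, column) pairs as an index, which achieves the same one-shot update without computing cell IDs by hand:
# matrix indexing: each row of cbind(rows, cols) names one cell to update
mat[cbind(rows, cols)] <- values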
Benchmarking the methods on a 10000 x 6000 matrix (M = 10000 rows by N = 6000 columns) shows how fast each one is:
N = 6000 # individuals
M = 10000 # genotype length
genotypes = 20
mutrate = 2
method1 <- function() {
  mat <- matrix(rep(0, M * N), nrow = M)
  for (i in 1:(N * mutrate)) {
    position <- sample(M, 1)
    letter <- sample(genotypes, 1) - 1
    mat[position, (i - 1) %% N + 1] <- letter # wrap around the columns; i %% N alone would select column 0
  }
  return(mat)
}
method2 <- function() {
  mat <- matrix(rep(0, M * N), nrow = M)
  mat <- apply(mat, 2, function(x) (x + sample(c(rep(0, M - mutrate), sample(0:19, size = mutrate)) %% 20)))
}
method3 <- function() {
  mat <- matrix(rep(0, M * N), nrow = M)
  cols <- rep(seq_len(N), mutrate)
  rows <- sample(M, N * mutrate, replace = TRUE)
  values <- sample(genotypes, N * mutrate, replace = TRUE) - 1 # -1 offset since genotypes are 0-indexed
  for (i in seq_len(N * mutrate)) {
    mat[rows[i], cols[i]] <- values[i]
  }
  return(mat)
}
method4 <- function() {
  mat <- matrix(rep(0, M * N), nrow = M)
  cols <- rep(seq_len(N), mutrate)
  rows <- sample(M, N * mutrate, replace = TRUE)
  cellid <- rows + (cols - 1) * M
  values <- sample(genotypes, N * mutrate, replace = TRUE) - 1 # -1 offset since genotypes are 0-indexed
  mat[cellid] <- values
  return(mat)
}
benchmark <- function(func, times = 10) {
  begin <- as.numeric(Sys.time())
  for (i in seq_len(times))
    retval <- eval(parse(text = func))
  end <- as.numeric(Sys.time())
  cat(func, 'took', (end - begin) / times, 'seconds\n')
  return(retval)
}
ret1 <- benchmark('method1()')
ret2 <- benchmark('method2()')
ret3 <- benchmark('method3()')
ret4 <- benchmark('method4()')
I've modified your first method to speed it up and to incorporate mutrate.
method1() took 0.8936087 seconds
method2() took 8.767686 seconds
method3() took 0.7008878 seconds
method4() took 0.6548331 seconds
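As a side note (my addition): if the microbenchmark package is available, it gives more robust timings than the hand-rolled timer above:
library(microbenchmark)
microbenchmark(method1(), method2(), method3(), method4(), times = 10)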

Creating many pseudo-random matrices at the same time in R? Comparing the sign of values at matching positions?

I can make one pseudo-random matrix with the following:
nc <- 14
nr <- 14
set.seed(111)
M <- matrix(sample(
  c(runif(58, min = -1, max = 0),
    runif(71, min = 0, max = 0), # 71 zeros
    runif(nr * nc - 129, min = 0, max = +1))), nrow = nr, nc = nc)
The more important question: I need 1000 matrices with the same counts of negative, positive, and zero values; only their locations within the matrices should vary.
I can make the matrices one by one, but I want to do this task faster.
The less important question: given the 1000 matrices, I need to determine, for every position across the matrices, how many positive, negative, or zero values occurred there. For example:
MATRIX_A[9, 1] = -0.2
MATRIX_B[9, 1] = -0.5
MATRIX_C[9, 1] = 0.1
MATRIX_D[9, 1] = 0.0
MATRIX_E[9, 1] = 0.9
What I need:
FINAL_MATRIX_positive[9, 1] = 2/5 = 40% (or 0.4, or the count 2)
because at this position, 2 of the 5 matrices held a positive value; I also need the same for negative and zero values.
If it isn't possible to do this in R, I can compare them "manually" in Excel.
Thank you for your help!
Actually you are almost there!
You can try the code below, where replicate generates the random matrix 1000 times and Reduce collects the statistics for each position:
nc <- 14
nr <- 14
N <- 1000
lst <- replicate(
  N,
  matrix(sample(
    c(
      runif(58, min = -1, max = 0),
      runif(71, min = 0, max = 0),
      runif(nr * nc - 129, min = 0, max = +1)
    )
  ), nrow = nr, nc = nc),
  simplify = FALSE
)
pos <- Reduce(`+`, lapply(lst, function(M) M > 0)) / N
neg <- Reduce(`+`, lapply(lst, function(M) M < 0)) / N
zero <- Reduce(`+`, lapply(lst, function(M) M == 0)) / N
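Since every entry is positive, negative, or zero, the three proportion matrices should sum to 1 at every position; a quick sanity check (my addition):
stopifnot(all(abs(pos + neg + zero - 1) < 1e-12))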
I use a function for your simulation scheme:
my_sim <- function(n_neg = 58, n_0 = 71, n_pos = 67) {
  res <- c(runif(n_neg, min = -1, max = 0),
           rep(0, n_0),
           runif(n_pos, min = 0, max = +1))
  return(sample(res))
}
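A quick check (my addition) that each draw has the intended composition of 58 negatives, 71 zeros, and 67 positives:
table(sign(my_sim()))
# -1  0  1
# 58 71 67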
Then, I simulate your matrices (I store them in a list):
N <- 1000
nr <- 14
nc <- nr
set.seed(111)
my_matrices <- list()
for (i in 1:N) {
  my_matrices[[i]] <- matrix(my_sim(), nrow = nr, ncol = nc)
}
Finally, I compute the proportion of positive numbers for the position row 1 and column 9:
sum(sapply(my_matrices, function(x) x[1,9]) > 0)/N
# [1] 0.366
However, if you are interested in all the positions, these lines will do the job:
aux <- lapply(my_matrices, function(x) x > 0)
FINAL_MATRIX_positive <- 0
for (i in 1:N) {
  FINAL_MATRIX_positive <- FINAL_MATRIX_positive + aux[[i]]
}
FINAL_MATRIX_positive <- FINAL_MATRIX_positive / N
# row 1, column 9
FINAL_MATRIX_positive[1, 9]
# [1] 0.366
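The negative and zero proportions follow the same pattern; equivalently, the Reduce idiom from the first answer condenses the accumulation loop (my addition):
FINAL_MATRIX_negative <- Reduce(`+`, lapply(my_matrices, function(x) x < 0)) / N
FINAL_MATRIX_zero <- Reduce(`+`, lapply(my_matrices, function(x) x == 0)) / N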

How to randomise a matrix element for each iteration of a loop?

I'm working with the popbio package on a population model. It looks something like this:
library(popbio)
babies <- 0.3
kids <- 0.5
teens <- 0.75
adults <- 0.98
A <- c(0,      0,    0,    0,     teens*0.5, adults*0.8,
       babies, 0,    0,    0,     0,         0,
       0,      kids, 0,    0,     0,         0,
       0,      0,    kids, 0,     0,         0,
       0,      0,    0,    teens, 0,         0,
       0,      0,    0,    0,     teens,     adults)
A <- matrix(A, ncol = 6, byrow = TRUE)
N <- c(10, 10, 10, 10, 10, 10)
N <- matrix(N, ncol = 1)
model <- pop.projection(A, N, iterations = 10)
model
I'd like to know how I can randomise the input so that at each iteration, which represents a year in this case, I get a different input for the matrix elements. So, for instance, my model runs for 10 years, and I'd like the baby survival rate to change for each year. babies <- rnorm(1, 0.3, 0.1) doesn't do it, because that still leaves me with a single value, just randomly selected.
Update: This is distinct from running 10 separate models with different initial random values. I'd like the update to occur within a single model run, which itself has 10 iterations in the pop.projection function.
Hope you can help.
I know this answer is very late, but here's one approach using expressions. First, use an expression to create the matrix.
vr <- list(babies = 0.3, kids = 0.5, teens = 0.75, adults = 0.98)
Ax <- expression(matrix(c(
  0,      0,    0,    0,     teens*0.5, adults*0.8,
  babies, 0,    0,    0,     0,         0,
  0,      kids, 0,    0,     0,         0,
  0,      0,    kids, 0,     0,         0,
  0,      0,    0,    teens, 0,         0,
  0,      0,    0,    0,     teens,     adults), ncol = 6, byrow = TRUE))
A1 <- eval(Ax, vr)
lambda(A1)
[1] 1.011821
Next, use an expression to create vital rates with rnorm or other functions.
vr2 <- expression( list( babies=rnorm(1,0.3,0.1), kids=0.5, teens=0.75, adults=0.98 ))
A2 <- eval(Ax, eval( vr2))
lambda(A2)
[1] 1.014586
Apply the expression to 100 matrices.
x <- sapply(1:100, function(x) lambda(eval(Ax, eval(vr2))))
quantile(x, c(.05,.95))
      5%      95%
0.996523 1.025900
Finally, make two small changes to pop.projection by adding the vr option and a line to evaluate A at each time step.
pop.projection2 <- function(Ax, vr, n, iterations = 20)
{
  x <- length(n)
  t <- iterations
  stage <- matrix(numeric(x * t), nrow = x)
  pop <- numeric(t)
  change <- numeric(t - 1)
  for (i in 1:t) {
    stage[, i] <- n
    pop[i] <- sum(n)
    if (i > 1) {
      change[i - 1] <- pop[i]/pop[i - 1]
    }
    ## evaluate Ax with a fresh draw of the vital rates at each time step
    A <- eval(Ax, eval(vr))
    n <- A %*% n
  }
  colnames(stage) <- 0:(t - 1)
  w <- stage[, t]
  pop.proj <- list(lambda = pop[t]/pop[t - 1], stable.stage = w/sum(w),
                   stage.vectors = stage, pop.sizes = pop, pop.changes = change)
  pop.proj
}
n <- c(10, 10, 10, 10, 10, 10)
pop.projection2(Ax, vr2, n, 10)
$lambda
[1] 0.9874586
$stable.stage
[1] 0.33673579 0.11242588 0.08552367 0.02189786 0.02086656 0.42255023
$stage.vectors
0 1 2 3 4 5 6 7 8 9
[1,] 10 11.590000 16.375700 19.108186 20.2560223 20.5559445 20.5506251 20.5898222 20.7603581 20.713271
[2,] 10 4.147274 3.332772 4.443311 5.6693931 1.9018887 6.8455597 5.3879202 10.5214540 6.915534
[3,] 10 5.000000 2.073637 1.666386 2.2216556 2.8346965 0.9509443 3.4227799 2.6939601 5.260727
[4,] 10 5.000000 2.500000 1.036819 0.8331931 1.1108278 1.4173483 0.4754722 1.7113899 1.346980
[5,] 10 7.500000 3.750000 1.875000 0.7776139 0.6248948 0.8331209 1.0630112 0.3566041 1.283542
[6,] 10 17.300000 22.579000 24.939920 25.8473716 25.9136346 25.8640330 25.9715930 26.2494195 25.991884
$pop.sizes
[1] 60.00000 50.53727 50.61111 53.06962 55.60525 52.94189 56.46163 56.91060 62.29319 61.51194
$pop.changes
[1] 0.8422879 1.0014610 1.0485765 1.0477793 0.9521023 1.0664832 1.0079517 1.0945797 0.9874586

What is wrong with my starting values?

I am using the nleqslv package in R to solve a nonlinear system of equations. The R code is given below:
require(nleqslv)
x <- c(6,12,18,24,30)
NMfun1 <- function(k, n) {
  y <- rep(NA, length(k))
  y[1] <- -(5/k[1])+sum(x^k[2]*exp(k[3]*x))+2*sum(k[4]*x^k[2]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[2] <- -sum(log(x))-sum(1/(k[2]+k[3]*x))+sum(k[1]*x^k[2]*exp(k[3]*x)*log(x))+2*sum(k[1]*k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)*log(x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[3] <- -sum(x/(k[2]+k[3]*x))+sum(k[1]*x^(k[2]+1)*exp(k[3]*x))-sum(x)+2*sum(k[4]*x^k[2]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[4] <- -(5/(1-k[4]))+2*sum(exp(-k[1]*x^k[2]*exp(k[3]*x))/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  return(y)
}
kstart <- c(0.05, 0, 0.35, 0.9)
NMfun1(kstart)
nleqslv(kstart, NMfun1, control=list(btol=.0001),method="Newton")
The estimated values obtained for k are 0.04223362, -0.08360564, 0.14216026, 0.37854908. But the estimated values of k should all be greater than zero.
OK, so you want real solutions larger than 0, if they exist of course.
Make a new function that squares its input argument before passing it to NMfun1, then use the searchZeros function in the nleqslv package to search for solutions, like this:
NMfun1.alt <- function(k0, n) NMfun1(k0^2, n)
# use set.seed for reproducibility
set.seed(413)
# generate 100 random starting values
xstart <- matrix(runif(4*100, min = 0, max = 1), nrow = 100, ncol = 4)
z <- searchZeros(xstart, NMfun1.alt)
z
ksol <- z$x^2
ksol
# in this case there are two solutions
NMfun1(ksol[1,])
NMfun1(ksol[2,])
The output of the last four non-comment lines of this code is:
> ksol <- z$x^2
> ksol
            [,1]     [,2]       [,3]        [,4]
[1,] 0.002951051 1.669142 0.03589502 0.001167185
[2,] 0.002951051 1.669142 0.03589502 0.001167185
> NMfun1(ksol[1,])
[1]  3.231138e-11  3.602561e-13 -4.665268e-12 -1.119105e-13
> NMfun1(ksol[2,])
[1]  1.532663e-12  1.085046e-14  6.894485e-14 -2.664535e-15
You will see that the solution contained in object z has a negative element; squaring it makes the corresponding parameter positive. From this experiment it appears that your system has a single positive solution.
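A quick check on the recovered solution, reusing the objects from the run above (my addition):
all(ksol > 0)               # TRUE: squaring makes every parameter non-negative
max(abs(NMfun1(ksol[1,])))  # ~3e-11, i.e. the equations hold to numerical precision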

How can I efficiently generate a dataframe of simulated values?

I'm trying to generate a data frame of simulated values based on existing distribution parameters. My main data frame contains the mean and standard deviation for each observation, like so:
example.data <- data.frame(country=c("a", "b", "c"),
score_mean=c(0.5, 0.4, 0.6),
score_sd=c(0.1, 0.1, 0.2))
# country score_mean score_sd
# 1 a 0.5 0.1
# 2 b 0.4 0.1
# 3 c 0.6 0.2
I can use sapply() and a custom function to use the score_mean and score_sd parameters to randomly draw from a normal distribution:
score.simulate <- function(score.mean, score.sd) {
  return(mean(rnorm(100, mean = score.mean, sd = score.sd)))
}
simulated.scores <- sapply(example.data$score_mean,
                           FUN = score.simulate,
                           score.sd = example.data$score_sd)
# [1] 0.4936432 0.3753853 0.6267956
This will generate one round (or column) of simulated values. However, I'd like to generate a lot of columns (like 100 or 1,000). The only way I've found to do this is to wrap my sapply() function inside a generic function inside lapply() and then convert the resulting list into a data frame with ldply() in plyr:
results.list <- lapply(1:5, FUN = function(x) sapply(example.data$score_mean, FUN = score.simulate, score.sd = example.data$score_sd))
library(plyr)
simulated.scores <- as.data.frame(t(ldply(results.list)))
#           V1        V2        V3        V4        V5
# V1 0.5047807 0.4902808 0.4857900 0.5008957 0.4993375
# V2 0.3996402 0.4128029 0.3875678 0.4044486 0.3982045
# V3 0.6017469 0.6055446 0.6058766 0.5894703 0.5960403
This works, but (1) it seems really convoluted, especially the as.data.frame(t(ldply(lapply(... FUN=function(x) sapply ...)))) chain, and (2) it is really slow with many iterations or bigger data: my actual dataset has 3,000 rows, and running 1,000 iterations takes 1 to 2 minutes.
Is there a more efficient way to create a data frame of simulated values like this?
The quickest way I can think of is to take advantage of the vectorisation built into rnorm. Both the mean and sd arguments are vectorised; however, you can only supply a single integer for the number of draws. If you supply a vector to the mean and sd arguments, R will cycle through them until it has completed the required number of draws. Therefore, just make the argument n to rnorm a multiple of the length of your mean vector. The multiplier will be the number of replicates for each row of your data.frame; in the function below this is n.
I can't think of a faster way than using base::rnorm on its own.
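To see the recycling in action, a minimal illustration (my addition): with two means, consecutive draws alternate between them.
set.seed(1)
rnorm(6, mean = c(0, 100), sd = c(1, 1))
# draws alternate between the two means: ~0, ~100, ~0, ~100, ~0, ~100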
Worked example
# example data
df <- data.frame(country = c("a", "b", "c"),
                 mean = c(1, 10, 100),
                 sd = c(1, 2, 10))
# function which returns a matrix, taking column vectors as arguments for mean and sd
normv <- function(n, mean, sd) {
  out <- rnorm(n * length(mean), mean = mean, sd = sd)
  return(matrix(out, ncol = n, byrow = FALSE))
}
# reproducible result (note the order of magnitude of the rows and the input sample data)
set.seed(1)
normv(5, df$mean, df$sd)
#            [,1]      [,2]       [,3]        [,4]        [,5]
# [1,]  0.3735462  2.595281   1.487429   0.6946116   0.3787594
# [2,] 10.3672866 10.659016  11.476649  13.0235623   5.5706002
# [3,] 91.6437139 91.795316 105.757814 103.8984324 111.2493092
This can be done very quickly if you remember that rnorm(1, mean, sd) is the same as rnorm(1)*sd + mean, so using your data frame df, you can generate sim simulations of your obs observations like:
obs <- nrow(df)
sim <- 1000
mat <- data.frame(matrix(rnorm(obs * sim), obs, sim) * df$sd + df$mean)
You can check that this has the desired means with rowMeans(mat); since mat is a data frame, coerce a row to numeric before taking its standard deviation, e.g. sd(unlist(mat[1, ])) for row 1.
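Concretely, the checks look like this (my addition; the results will be close to, not exactly, the inputs):
rowMeans(mat)     # approximately df$mean: 1, 10, 100
apply(mat, 1, sd) # approximately df$sd: 1, 2, 10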
