Trouble coding a number of matrix models to run simultaneously - r

I made a matrix-based population model, but I would like to run more than one simultaneously to represent different groups of animals, so that dispersing individuals can move between matrices. I originally just repeated everything to get a second matrix, but then I realised that because I run the model in a for loop and break under certain conditions (when that specific matrix should stop running, i.e. that group has died out), it is, understandably, stopping the whole model rather than just that single matrix.
I was wondering if anyone had suggestions on the best way to code the model so that, instead of breaking and stopping the whole for loop, it just stops running that specific matrix. I'm a little stumped. I have included a single run of one matrix below.
Also, if anyone has a more efficient way of creating and running 9 matrices than writing everything out 9 times, advice would be much appreciated.
n.steps <- 100
mats <- array(0, c(85, 85, n.steps))   # transition matrix for every time step
ns   <- array(0, c(85, n.steps))       # population vector for every time step

# initial population
ns[1, 1]  <- 0
ns[12, 1] <- rpois(1, 3)
ns[24, 1] <- rpois(1, 3)
ns[85, 1] <- 1

birth     <- 4
nextbreed <- 12

for (i in 2:n.steps) {
  # set up an empty matrix
  mat <- matrix(0, nrow = 85, ncol = 85)

  # survival for the first age classes (sub-diagonal entries)
  surv.age.1 <- 0.95
  x <- 2:10
  diag(mat[x, (x - 1)]) <- surv.age.1

  # survival for the older age classes, reduced by an age-dependent dispersal fraction
  surv.age.a <- 0.97
  disp  <- 1:74
  disp  <- disp * -0.001
  disp1 <- 0.13
  disp  <- 1 - (disp + disp1)
  survdisp <- surv.age.a * disp
  x <- 11:84
  diag(mat[x, (x - 1)]) <- survdisp

  # breeding happens every 12 steps
  if (i == nextbreed) {
    pb <- 1
  } else {
    pb <- 0
  }
  if (pb == 1) {
    nextbreed <- nextbreed + 12
  }
  mat[1, 85]  <- pb * birth
  mat[85, 85] <- 1

  # the group dies out with probability 1.5%
  death <- sample(c(replicate(1000,
    sample(c(1, 0), prob = c(0.985, 1 - 0.985), size = 1))), 1)
  if (death == 0) {
    break   # this stops the whole loop, which is the problem
  }

  mats[,, i] <- mat
  ns[, i]    <- mat %*% ns[, i - 1]
}

group.size <- apply(ns[1:85, ], 2, sum)
plot(group.size)
View(mat)
View(ns)

As somebody else suggested on Twitter, one solution might be to simply turn the matrix into all 0s whenever death happens. It looks to me like death is the probability that a local population disappears? In which case it seems to make good biological sense to just turn the entire population matrix into 0s.
A few other small changes: I made a list of replicate simulations so I could summarize them easily.
If I understand correctly,
death <- sample(c(replicate(1000, sample(c(1, 0), prob = c(0.985, 1 - 0.985), size = 1))), 1)
says "a local population dies completely with probability 1.5%". In which case, I think you could replace it with rbinom(). I did that below, and my plots look similar to those I made with your code.
Hope that helps!
lots <- replicate(100, simplify = FALSE, expr = {
  # start every replicate from a fresh state (uses n.steps and birth defined above)
  ns <- array(0, c(85, n.steps))
  ns[1, 1]  <- 0
  ns[12, 1] <- rpois(1, 3)
  ns[24, 1] <- rpois(1, 3)
  ns[85, 1] <- 1
  mats <- array(0, c(85, 85, n.steps))
  nextbreed <- 12

  for (i in 2:n.steps) {
    # set up an empty matrix
    mat <- matrix(0, nrow = 85, ncol = 85)
    surv.age.1 <- 0.95
    x <- 2:10
    diag(mat[x, (x - 1)]) <- surv.age.1
    surv.age.a <- 0.97
    disp  <- 1:74
    disp  <- disp * -0.001
    disp1 <- 0.13
    disp  <- 1 - (disp + disp1)
    survdisp <- surv.age.a * disp
    x <- 11:84
    diag(mat[x, (x - 1)]) <- survdisp
    if (i == nextbreed) {
      pb <- 1
    } else {
      pb <- 0
    }
    if (pb == 1) {
      nextbreed <- nextbreed + 12
    }
    mat[1, 85]  <- pb * birth
    mat[85, 85] <- 1
    # the group survives this step with probability 0.985 (dies with probability 1.5%)
    death <- rbinom(1, size = 1, prob = 0.985)
    if (death == 0) {
      mat[] <- 0   # this group has died out: zero its matrix instead of breaking
    }
    mats[,, i] <- mat
    ns[, i]    <- mat %*% ns[, i - 1]
  }
  ns
})
lapply(lots, FUN = function(x) apply(x[1:85,],2,sum))
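To address the question of running 9 groups without writing everything out 9 times, here is a sketch (my own code, not the poster's model, with a deliberately simplified stand-in for the full transition matrix): keep each group's population array in a list and track a logical alive flag per group, so that a death only stops updates for that one group rather than breaking the whole loop.
n.groups <- 9
n.steps  <- 100

build.mat <- function() {
  # simplified stand-in for the full matrix construction above:
  # survival on the sub-diagonal plus a self-loop for the last stage
  mat <- matrix(0, 85, 85)
  diag(mat[2:85, 1:84]) <- 0.95
  mat[85, 85] <- 1
  mat
}

ns.list <- lapply(1:n.groups, function(g) {
  ns <- array(0, c(85, n.steps))
  ns[12, 1] <- rpois(1, 3)
  ns[24, 1] <- rpois(1, 3)
  ns[85, 1] <- 1
  ns
})
alive <- rep(TRUE, n.groups)

for (i in 2:n.steps) {
  for (g in 1:n.groups) {
    if (!alive[g]) next                # skip groups that have already died out
    if (rbinom(1, 1, 0.985) == 0) {    # this group dies with probability 1.5%
      alive[g] <- FALSE
      next
    }
    ns.list[[g]][, i] <- build.mat() %*% ns.list[[g]][, i - 1]
  }
}

sapply(ns.list, colSums)   # group size over time: one row per step, one column per group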


Looping a Rayleigh test to find highest value

I am a beginner with R, so hopefully this will be an easy fix.
I am trying to use a for loop on a dataset of neuron firing directions in order to:
Incrementally add the next value from the dataset to a vector
Run a Rayleigh test on that vector and save it to a variable
Test whether the Rayleigh test I just ran has a larger statistic than the Rayleigh test from the previous loop iteration, as well as a p-value of less than .05
If the value is larger, save the statistic value, so that the next loop can compare to it
If the value is larger, save the vector
So far I have this for the code, and after going through it for a long time I'm at a loss as to why it's not working. Every time I run it, the for loop goes all the way to the end and just reports the Rayleigh value and vector for the whole dataset, which I know for sure isn't correct.
(I'm using the circular package for the rayleigh.test function)
library(circular)
# create an initial Rayleigh statistic that is deliberately low, to compare against in the loop
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions) {
  data_vector <- append(data_vector, i)
  ray_lee_test <- rayleigh.test(data_vector)
  if ((ray_lee_test$statistic > best_rayleigh$statistic) & (ray_lee_test$p.value <= .05)) {
    best_rayleigh <- ray_lee_test
    best_rayleigh_vector <- data_vector
  } else {
    NULL
  }
}
Any help is appreciated. Thank you!
Update: I tried using && instead of a single & in the if statement; however, it returned the same result.
The following code doesn't give warnings and selects the vector with the highest test statistic and a "significant" p-value.
library(circular)
set.seed(2020)
firing_directions <- rvonmises(n = 25, mu = circular(pi), kappa = 2)
plot(firing_directions)
best_rayleigh <- rayleigh.test(circular(1:10))
for (i in seq_along(firing_directions)) {
  dv <- firing_directions[seq_len(i)]
  rltest <- rayleigh.test(dv)
  if ((rltest$statistic > best_rayleigh$statistic) && (rltest$p.value <= 0.05)) {
    best_rayleigh <- rltest
    best_rayleigh_vector <- dv
  }
}
best_rayleigh
#
# Rayleigh Test of Uniformity
# General Unimodal Alternative
#
#Test Statistic: 0.8048
#P-value: 0.0298
best_rayleigh_vector
#Circular Data:
#Type = angles
#Units = radians
#Template = none
#Modulo = asis
#Zero = 0
#Rotation = counter
#[1] 4.172219 2.510826 2.997495 4.095335 3.655613
I think the NULL is throwing up some issues; I'm not sure what will happen if you throw in a NULL. You only update the vector if it passes your criteria:
library(circular)
firing_directions <- rvonmises(n = 25, mu = circular(pi), kappa = 2)
best_rayleigh <- rayleigh.test(1:10)
data_vector <- c()
for (i in firing_directions) {
  data_vector <- c(data_vector, i)
  ray_lee_test <- rayleigh.test(data_vector)
  if ((ray_lee_test$statistic > best_rayleigh$statistic) & (ray_lee_test$p.value <= .05)) {
    best_rayleigh <- ray_lee_test
    best_rayleigh_vector <- data_vector
  }
}

Speed up while loop in R

As part of a project I made a smoother to smooth over missing data. I use the slope of the previous data points to calculate new values; after calculating each new point, I use it to calculate the next value (and so on). Hence I used a while loop to calculate each value (both from left to right and from right to left, eventually taking the average of these two values). This script works fine!
Although I expect I could accelerate this significantly with a function from the apply family, I still want to use this while loop. The script is, however, really slow (about 3 days for ~2,500,000 data points). Do you have tips (for the current script) on how to speed things up?
# assumes the data frames temp and weight_id and the scalar factor already exist

# Loop from: bottom -> top
number_rows <- nrow(weight_id)
i <- nrow(weight_id)
while (i >= 1) {
  j <- as.integer(weight_id[i, 1])
  prev1 <- temp[j + 1, ]$new_MAP_bottom
  if (j < max(weight_id)) {
    previous_slope <- ifelse((temp[j + 2, ]$duration - temp[j + 1, ]$duration) > 0,
                             prev1 - temp[j + 2, ]$new_MAP_bottom, 0)
  } else {
    previous_slope <- 0
  }
  new_MAP <- round(prev1 + ((previous_slope - (factor * temp[j, ]$steps)) / (1 + factor)), digits = 2)
  temp[j, ]$new_MAP_bottom <- new_MAP
  i <- i - 1
}

# Loop from: top -> bottom
weight_factor <- 0
i <- 1
while (i <= nrow(weight_id)) {
  j <- as.integer(weight_id[i, 1])
  prev1 <- temp[j - 1, ]$new_MAP_top
  if (j > 2) {
    previous_slope <- ifelse((temp[j - 1, ]$duration - temp[j - 2, ]$duration) > 0,
                             prev1 - temp[j - 2, ]$new_MAP_top, 0)
  } else {
    previous_slope <- 0
  }
  new_MAP <- round(prev1 + ((previous_slope + (factor * temp[j, ]$steps)) / (1 + factor)), digits = 2)
  temp[j, ]$new_MAP_top <- new_MAP

  # Take the weighted average of the two approaches (top -> bottom / bottom -> top)
  if (weight_factor < 1) { weight_factor <- temp[j, ]$weight - 1 }
  weight_top    <- weight_factor
  weight_bottom <- temp[j, ]$weight - weight_factor
  if (weight_top > weight_bottom) { weight_top <- weight_top - 1 }
  if (weight_top < weight_bottom) { weight_bottom <- weight_bottom - 1 }
  temp[j, ]$MAP <- round(((new_MAP * weight_top) + (temp[j, ]$new_MAP_bottom * weight_bottom)) / (weight_top + weight_bottom), digits = 0)
  weight_factor <- weight_factor - 1
  i <- i + 1
}
I did not read all of your code, especially without example data, but from the textual description it sounds like plain linear interpolation. Please check whether the built-in functions approx and approxfun already do what you are trying to implement yourself, as these will be better optimized than anything you can write with reasonable effort.
par(mfrow=c(2,1))
example <- data.frame(x = 1:14,
                      y = c(3, 4, 5, NA, NA, NA, 6, 7, 8.1, 8.2, NA, 8.4, 8.5, NA))
plot(example)
f <- approxfun(example)
plot(example$x, f(example$x))
The apply family tends to give you shorter, more succinct code, but not necessarily much more speed than loops. If you are after speed, first check whether somebody else has already implemented what you need, then try vectorization.
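As a toy illustration of that difference (my own example, not from the original post), here is the same element-wise calculation done once in a loop and once as a single vectorized call; the vectorized version is typically orders of magnitude faster:
n <- 1e6
x <- runif(n)

# element-by-element loop
y1 <- numeric(n)
system.time(for (i in seq_len(n)) y1[i] <- round(x[i] * 2 + 1, digits = 2))

# the same computation vectorized over the whole vector at once
system.time(y2 <- round(x * 2 + 1, digits = 2))

identical(y1, y2)   # TRUE: same result, very different run time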
Edit:
The following runs in about a second on my computer. If it does something close enough to your own "linear smoother" that you can replace yours with it, that is a speedup from about 3 days to about a second.
n <- 2500000
example <- data.frame(x = 1:n,
                      y = sample(1:1000, n, replace = TRUE))
example$y[sample(1:n, n/5)] <- NA
print(Sys.time())
f <- approxfun(example)
mean(f(example$x))
print(Sys.time())

statistical moments in R

I've got a data set in R of a variable, repeated 10,000 times and sampled 200 times on each repeat, so a 10,000 by 200 matrix. I would like to calculate statistical moments of the variable up to an arbitrary order, so in the end I would like a numeric vector holding the values of the moments.
I can get the variance and the mean for the data set using colMean and colVar, but they only go so far.
I am also aware of the moments package in R; however, its all.moments command returns moments for each time course, i.e. it treats each column or row as an individual variable, which is not what I want.
Does anyone know an equivalent to colMean and colVar for higher order moments? And if possible also for cross moments?
Many thanks!
I stole this code from an obscure R package e1071:
theskew <- function(x) {
  # sample skewness: third central moment scaled by sd^3
  x <- as.vector(x)
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}
thekurt <- function(x) {
  # excess kurtosis: fourth central moment scaled by var^2, minus 3
  x <- as.vector(x)
  sum((x - mean(x))^4) / (length(x) * var(x)^2) - 3
}
You can fold these into your code by feeding them one column at a time.
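For example (my own snippet, with a stand-in matrix X in place of your 10,000 by 200 data):
X <- matrix(rnorm(10000 * 200), nrow = 10000, ncol = 200)   # stand-in for your data
col.skew <- apply(X, 2, theskew)   # skewness of each column
col.kurt <- apply(X, 2, thekurt)   # excess kurtosis of each column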
Okay, I did this yesterday; for posterity, here is a loop that will do what I asked. It assumes your data is a time course of a variable you are measuring, and that you want the moments of that variable:
rm(list = ls())
yourdata <- read.table("whereveryourdatais/and/variableyouwant")
yourdata <- t(yourdata)   # only do this at your own discretion
mu <- colMeans(yourdata)  # column means (the first raw moments)

NumMoments <- 5

# raw moments: the mean of X^i for each column
rawmoments <- matrix(NA, nrow = NumMoments, ncol = ncol(yourdata))
for (i in 1:NumMoments) {
  rawmoments[i, ] <- colMeans(yourdata^i)
}
plot(rawmoments[1, ])

# center each column on its mean (the first raw moment)
holder <- sweep(yourdata, 2, rawmoments[1, ])

# central moments: the mean of (X - mean)^i for each column
centmoments <- matrix(NA, nrow = NumMoments, ncol = ncol(yourdata))
for (i in 1:NumMoments) {
  centmoments[i, ] <- colMeans(holder^i)
}
Then centmoments holds the central moments and rawmoments holds the raw moments; you can specify how many moments to take by changing the value of NumMoments.
Note that the first row of centmoments will be approximately 0.
Is this what you're looking for?
X <- matrix(1:12, 3, 4) # your data
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
for(i in 1:NumMoments) {
moments[i, ] <- colMeans(X^i)
}
EDIT:
okay, apparently you want "central moments"
X <- matrix(1:12, 3, 4)
NumMoments <- 5
moments <- matrix(NA, nrow = NumMoments, ncol = ncol(X))
moments[1, ] <- colMeans(X)   # first raw moments (column means), needed below for centering
Y <- X
for (i in 1:ncol(X)) {
  Y[, i] <- Y[, i] - moments[1, i]
}
for (i in 2:NumMoments) {
  moments[i, ] <- colMeans(Y^i)
}

How to create adjacency matrix from grid coordinates in R?

I'm new to this site. I was wondering if anyone had experience with turning a list of grid coordinates (shown in the example code below as df) into an adjacency matrix. I've written a function that can handle the job for very small data sets, but the run time increases dramatically as the size of the data set increases (I think 800 pixels would take about 25 hours). It's because of the nested for loops, but I don't know how to get around them.
## Dummy Data
x <- c(1,1,2,2,2,3,3)
y <- c(3,4,2,3,4,1,2)
df <- as.data.frame(cbind(x,y))
df
## Here's what it looks like as an image
a <- c(NA,NA,1,1)
b <- c(NA,1,1,1)
c <- c(1,1,NA,NA)
image <- cbind(a,b,c)
f <- function(m) t(m)[,nrow(m):1]
image(f(image))
## Here's my adjacency matrix function that's slowwwwww
adjacency.coordinates <- function(x, y) {
  df <- as.data.frame(cbind(x, y))
  colnames(df) <- c("V1", "V2")
  df <- df[with(df, order(V1, V2)), ]
  adj.mat <- diag(1, dim(df)[1])
  for (i in 1:dim(df)[1]) {
    for (j in 1:dim(df)[1]) {
      if ((df[i, 1] - df[j, 1] == 0) & (abs(df[i, 2] - df[j, 2]) == 1) |
          (df[i, 2] - df[j, 2] == 0) & (abs(df[i, 1] - df[j, 1]) == 1)) {
        adj.mat[i, j] <- 1
      }
    }
  }
  return(adj.mat)
}
## Here's the adjacency matrix
adjacency.coordinates(x,y)
Does anyone know of a way to do this that will work well on a set of coordinates a couple of thousand pixels long? I've tried converting to a SpatialGridDataFrame and going from there, but it doesn't get the adjacency matrix right. Thank you so much for your time.
While I thought igraph might be the way to go here, I think you can do it more simply like:
result <- apply(df, 1, function(pt)
(pt["x"] == df$x & abs(pt["y"] - df$y) == 1) |
(abs(pt["x"] - df$x) == 1 & pt["y"] == df$y)
)
diag(result) <- 1
And avoid the loopiness and get the same result:
> identical(adjacency.coordinates(x,y),result)
[1] TRUE
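Another fully vectorized option (my own sketch, not part of the answer above): for integer grid coordinates, two cells are orthogonal neighbours exactly when their Manhattan distance equals 1, so dist() can build the whole adjacency matrix in one call:
adj <- 1 * (as.matrix(dist(df, method = "manhattan")) == 1)   # 1 where cells are orthogonal neighbours
diag(adj) <- 1                                                # keep the self-adjacency used in the original function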

How to optimize 2 loops

I am running a simulation trying to find the probability of something taking place in a number of binomial trials. I start by specifying the data
iter=5000
data=data.frame(prob=runif(300), value=runif(300))
data<-data[sample(nrow(data), iter, replace=T),]
then I add the trials
cols <- c("one","two","three","four","five","six",
"seven","eight","nine","ten","eleven","twelve")
data[,cols] <- NA
one contains the result of a single binomial trial, two contains the results of two binomial trials, and so on. If a binomial event takes place in any of the one, two, three, ..., twelve trials, the cell is marked 1, else 0.
Then I run the trials for iter=5000 simulations
for (col in 3:14) {
  for (i in 1:iter) if (sum(rbinom((col - 2), 1, data[i, 1])) > 0) data[i, col] <- 1 else data[i, col] <- 0
}
Then I evaluate mean(data$value[data$one==0]) through mean(data$value[data$twelve==0]).
My problem is that the simulation code takes forever for iter>15000.
Any ideas?
sim2 <- function(iter) {
  dat <- data.frame(prob = runif(300), value = runif(300))
  dat <- dat[sample(nrow(dat), iter, replace = TRUE), ]
  cols <- c("one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine", "ten", "eleven", "twelve")
  dat[, cols] <- 0
  for (col in 3:14) {
    dat[, col] <- as.numeric(vapply(dat[, 1],
                                    function(p) { sum(rbinom((col - 2), 1, p)) > 0 },
                                    FUN.VALUE = TRUE))
  }
  vapply(3:14, function(col) { mean(dat$value[dat[, col] == 0]) }, FUN.VALUE = 1)
}
For iter of 16000, this runs in 2.29s on my machine, compared to an (estimated) 1781s for the ordering in your original algorithm. In general, don't assign individual elements in the data frame when you can assign the whole column at once. There may be more improvements possible, but I'll stop at >750x speedup (and changing the algorithm from running time of O(n^2) to O(n)).
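One further simplification along the same lines (my own sketch, not part of the answer above): rbinom() is vectorized over its probability argument, and "at least one success in k trials with probability p" is the same event as a single Binomial(k, p) draw being positive, so the vapply() can be dropped entirely:
# drop-in replacement for the inner loop of sim2(), assuming dat is set up as above
for (col in 3:14) {
  k <- col - 2
  dat[, col] <- as.numeric(rbinom(nrow(dat), size = k, prob = dat[, 1]) > 0)
}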
