How to write for loop when function increases with each iteration? - r

I am trying to estimate the probability of detecting animals from n.sites over multiple observation periods when animals are removed and detection changes in time and space. It works if I do something like this for 5 observation periods:
for(i in 1:nsites){
mu[i,1] <- p[i,1]
mu[i,2] <- p[i,2]*(1-p[i,1])
mu[i,3] <- p[i,3]*(1-p[i,1])*(1-p[i,2])
mu[i,4] <- p[i,4]*(1-p[i,1])*(1-p[i,2])*(1-p[i,3])
mu[i,5] <- p[i,5]*(1-p[i,1])*(1-p[i,2])*(1-p[i,3])*(1-p[i,4])
}
The probability at time 2 depends on the probability at time 1, and the probability at time 3 depends on the probabilities at times 1 and 2. If I were only doing this for 5 time periods it wouldn't be a big deal to write this out. But as I get to 10, 15, 20+ time periods, it is quite messy to write out. I feel like there should be a way to write this loop without typing out each step, but I just can't think of how to do it. Maybe additional indexing, another control statement, or a power function? If p[i] were the same in each jth observation (i.e. p[i,1] = p[i,2] = p[i,3], etc.) it would be:
p[i]*(1-p[i])^5
Any suggestions would be greatly appreciated.
This is BUGS language code. I work in R and send the code to JAGS via the rjags package. BUGS, R, or pseudo code would suit my purposes.
Here is R code that would simulate the problem:
set.seed(123)
testp <- matrix(runif(108, 0.1, 0.5), 108, 5)  # note: only 108 draws for a 108 x 5 matrix, so the values are recycled and each row ends up constant
testmu <- matrix(NA, 108, 5)
nsites <- 108
for(i in 1:nsites){
testmu[i,1] <- testp[i,1]
testmu[i,2] <- testp[i,2]*(1-testp[i,1])
testmu[i,3] <- testp[i,3]*(1-testp[i,1])*(1-testp[i,2])
testmu[i,4] <- testp[i,4]*(1-testp[i,1])*(1-testp[i,2])*(1-testp[i,3])
testmu[i,5] <- testp[i,5]*(1-testp[i,1])*(1-testp[i,2])*(1-testp[i,3])*(1-testp[i,4])
}
Thanks for any help.
Dan

This really does look like a task well suited to R's Reduce:
testmu3 <- matrix(NA, 108, 5)
nsites = 108
np = 5
for (i in 1:nsites) {
testmu3[ i, ] <- Reduce( function(x,y) x*(1-y), testp[i, ],
accumulate=TRUE)
}
max(abs(testmu3-testmu))
[1] 0
The accumulate parameter creates a growing vector of intermediate results.
> testp[1, ]
[1] 0.215031 0.215031 0.215031 0.215031 0.215031
> Reduce( function(x,y) x*(1-y), testp[1, ], accumulate=TRUE)
[1] 0.21503101 0.16879267 0.13249701 0.10400605 0.08164152
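One caveat: because runif(108, 0.1, 0.5) supplies only 108 values for a 108 x 5 matrix, every row of testp is constant after recycling, and in that special case the Reduce call above (which keeps p[i,1] as the leading factor) reproduces mu exactly. With detection probabilities that genuinely vary across periods, you want the current period's p in front and the running product of the earlier (1 - p) terms behind it. A minimal sketch with cumprod (testmu3b is just an illustrative name):
testmu3b <- matrix(NA_real_, nsites, np)
for (i in 1:nsites) {
  # escape detection in periods 1..(j-1), then get detected in period j
  testmu3b[i, ] <- testp[i, ] * cumprod(c(1, 1 - testp[i, -np]))
}
max(abs(testmu3b - testmu))  # should be at floating-point noise, even when p varies across periods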

@Frank's answer is cleaner (and probably faster), but this will also work and might be a little easier to understand.
testmu2 <- matrix(NA, 108, 5)
nsites = 108
np = 5
for (i in 1:nsites) {
fac <- 1
testmu2[i,1] <- testp[i,1]
for (j in 2:np) {
fac <- fac * (1-testp[i,j-1])
testmu2[i,j] <- testp[i,j] * fac
}
}
max(abs(testmu2-testmu))
[1] 2.775558e-17

Here's one way:
testmu2 <- testp*t(apply(cbind(1,1-testp[,-5]),1,cumprod))
On my computer, they almost match:
> max(abs(testmu2-testmu))
[1] 2.775558e-17
I don't know about BUGS/JAGS, but the idea is to take the cumulative product of your 1-p matrix across its columns first, and then take p*result.
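If the hard-coded 5 becomes awkward as the number of observation periods grows, the same idea can be wrapped in a small helper (removal_mu is a hypothetical name; it assumes p is a sites-by-periods matrix like testp):
removal_mu <- function(p) {
  np <- ncol(p)
  # row i, column j of the t(apply(...)) term is the probability of escaping detection in periods 1..(j-1)
  p * t(apply(cbind(1, 1 - p[, -np, drop = FALSE]), 1, cumprod))
}
removal_mu(testp)  # should reproduce testmu up to floating-point error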

Related

Efficient algorithm to turn matrix subdiagonals into columns - r

I have a non-square matrix and need to do some calculations on its subdiagonals. I figured out that the best way is to turn the subdiagonals into columns/rows and use functions like cumprod. Right now I use a for loop and exdiag defined as below:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
However, it turns out not to be very efficient. Do you know any other algorithm to achieve that kind of result?
A little example to show what I am doing:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
mat <- matrix(1:72, nrow = 12, ncol = 6)
newmat <- matrix(nrow=11, ncol=6)
for (i in 1:11){
newmat[i,] <- c(cumprod(exdiag(mat,i)),rep(0,max(6-12+i,0)))
}
Best regards,
Artur
The fastest, but by far the most cryptic, solution to get all possible diagonals from a non-square matrix would be to treat your matrix as a vector and simply construct an id vector for selection. In the end you can transform it back to a matrix if you want.
The following function does that:
exdiag <- function(mat){
NR <- nrow(mat)
NC <- ncol(mat)
smalldim <- min(NC,NR)
if(NC > NR){
id <- seq_len(NR) +
seq.int(0,NR-1)*NR +
rep(seq.int(1,NC - 1), each = NR)*NR
} else if(NC < NR){
id <- seq_len(NC) +
seq.int(0,NC-1)*NR +
rep(seq.int(1,NR - 1), each = NC)
} else {
return(diag(mat))
}
out <- matrix(mat[id],nrow = smalldim)
id <- (ncol(out) + 1 - row(out)) - col(out) < 0
out[id] <- NA
return(out)
}
Keep in mind you have to take into account how your matrix is formed.
In both cases I follow the same logic:
First, construct a sequence indicating positions along the smallest dimension.
To this sequence, add 0, 1, 2, ... times the row length.
This creates the first diagonal. After doing this, you simply add a sequence that shifts the entire previous sequence by 1 (either down or to the right) until you reach the end of the matrix. To shift right, I need to multiply this sequence by the number of rows.
In the end you can use these indices to select the correct positions from mat, and return all that as a matrix. Due to the vectorized nature of this code, you have to check that the last subdiagonals are correct. These contain less elements than the first, so you have to replace the values not part of that subdiagonal by NA. Also here you can simply use an indexing trick.
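A tiny illustration of the index arithmetic on a made-up 2 x 3 matrix (so NC > NR, the first branch):
m <- matrix(1:6, nrow = 2)                # 2 x 3, filled column-wise
seq_len(2) + seq.int(0, 1) * 2            # 1 4: linear indices of the main diagonal
(seq_len(2) + seq.int(0, 1) * 2) + 1 * 2  # 3 6: the same diagonal shifted one column to the right
m[c(3, 6)]                                # the superdiagonal elements, 3 and 6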
You can use it as follows:
> diag1 <- exdiag(amatrix)
> diag2 <- exdiag(t(amatrix))
> identical(diag1, diag2)
[1] TRUE
In order to come to your result:
amatrix <- matrix(1:72, ncol = 6)
diag1 <- exdiag(amatrix)
res <- apply(diag1,2,cumprod)
res[is.na(res)] <- 0
t(res)
You can modify the diag() function.
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
exdiag2 <- function(matrix, off){diag(matrix[-1:-off,])}
Speed Test:
mat = diag(10, 10000,10000)
off = 4
> system.time(exdiag(mat,4))
user system elapsed
7.083 2.973 10.054
> system.time(exdiag2(mat,4))
user system elapsed
5.370 0.155 5.524
> system.time(diag(mat))
user system elapsed
0.002 0.000 0.002
It looks like the subsetting from the matrix takes a lot of time, but it still performs better than your implementation. Maybe there are other subsetting approaches that outperform my solution. :)
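A quick sanity check on the question's 12 x 6 example, just to confirm the two functions agree for a single offset:
mat <- matrix(1:72, nrow = 12, ncol = 6)
identical(exdiag(mat, 4), exdiag2(mat, 4))  # TRUE: both return 5 18 31 44 57 70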

Trouble coding a number of matrix models to run simultaneously

I made a matrix-based population model; however, I would like to run more than one simultaneously in order to represent different groups of animals, so that dispersing individuals can move between matrices. I originally just repeated everything to get a second matrix, but then I realised that because I run the model using a for loop and break() under certain conditions (when that specific matrix should stop running, i.e. that group has died out), it is, understandably, stopping the whole model rather than just that single matrix.
I was wondering if anyone had any suggestions on the best way to code the model so that instead of breaking, and stopping the whole for loop, it just stops running that specific matrix. I'm a little stumped. I have included a single run of one matrix below.
Also, if anyone has a more efficient way of creating and running 9 matrices than writing everything out 9 times, advice would be much appreciated.
n.steps <- 100
mats <- array(0,c(85,85,n.steps))
ns <- array(0,c(85,n.steps))
ns[1,1]<-0
ns[12,1]<-rpois(1,3)
ns[24,1]<-rpois(1,3)
ns[85,1] <- 1
birth<-4
nextbreed<-12
for (i in 2:n.steps){
# set up an empty matrix;
mat <- matrix(0,nrow=85,ncol=85)
surv.age.1 <- 0.95
x <- 2:10
diag(mat[x,(x-1)]) <- surv.age.1
surv.age.a <- 0.97
disp <- 1:74
disp <- disp*-0.001
disp1<-0.13
disp<-1-(disp+disp1)
survdisp<-surv.age.a*disp
x <- 11:84
diag(mat[x,(x-1)])<-survdisp
if (i == nextbreed) {
pb <- 1
} else {
pb <- 0
}
if (pb == 1) {
(nextbreed <- nextbreed+12)
}
mat[1,85] <- pb*birth
mat[85,85]<-1
death<-sample(c(replicate(1000,
sample(c(1,0), prob=c(0.985, 1-0.985), size = 1))),1)
if (death == 0) {
break()}
mats[,,i]<- mat
ns[,i] <- mat%*%ns[,i-1]
}
group.size <- apply(ns[1:85,],2,sum)
plot(group.size)
View(mat)
View(ns)
As somebody else suggested on Twitter, one solution might be to simply turn the matrix into all 0s whenever death happens. It looks to me like death is the probability that a local population disappears? In which case it seems to make good biological sense to just turn the entire population matrix into 0s.
A few other small changes: I made a list of replicate simulations so I could summarize them easily.
If I understand correctly,
death<-sample(c(replicate(1000,sample(c(1,0), prob=c(0.985, 1-0.985), size =1))),1)
says " a local population dies completely with probability 1.5% ". In which case, I think you could replace it with rbinom(). I did that below and my plots look similar to those I made with your code.
Hope that helps!
lots <- replicate(100, simplify = FALSE, expr = {
for (i in 2:n.steps){
# set up an empty matrix;
mat <- matrix(0,nrow=85,ncol=85)
surv.age.1 <- 0.95
x <- 2:10
diag(mat[x,(x-1)]) <- surv.age.1
surv.age.a <- 0.97
disp <- 1:74
disp <- disp*-0.001
disp1<-0.13
disp<-1-(disp+disp1)
survdisp<-surv.age.a*disp
x <- 11:84
diag(mat[x,(x-1)])<-survdisp
if (i == nextbreed) {
pb <- 1
} else {
pb <- 0
}
if (pb == 1) {
(nextbreed <- nextbreed+12)
}
mat[1,85] <- pb*birth
mat[85,85]<-1
death<-rbinom(1, size = 1, prob = 0.6)
if (death == 0) {
mat <- 0
}
mats[,,i]<- mat
ns[,i] <- mat%*%ns[,i-1]
}
ns
})
lapply(lots, FUN = function(x) apply(x[1:85,],2,sum))
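And, for example, to look at every replicate's group-size trajectory at once (a sketch; it assumes the objects from the question, such as n.steps, are still in the workspace):
sizes <- sapply(lots, function(x) colSums(x[1:85, ]))  # n.steps x 100 matrix of group sizes
matplot(sizes, type = "l", lty = 1, xlab = "time step", ylab = "group size")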

statistical moments in R

I've got a data set in R of a variable, repeated 10,000 times and sampled 200 times on each repeat, so a 10,000 by 200 matrix. I would like to calculate statistical moments for the variable up to an arbitrary order, so in the end I would like a numeric vector holding the values of the moments.
I can get the variance and the mean for the data set using colMean and colVar, but they only go so far.
I am also aware of the moments package in R; however, the all.moments command is returning moments for each time course, treating each column or row as an individual variable, which is not what I want.
Does anyone know an equivalent to colMean and colVar for higher order moments? And if possible also for cross moments?
Many thanks!
I stole this code from an obscure R package e1071:
theskew<- function (x) {
x<-as.vector(x)
sum((x-mean(x))^3)/(length(x)*sd(x)^3)
}
thekurt <- function (x) {
x<-as.vector(x)
sum((x-mean(x))^4)/(length(x)*var(x)^2) - 3
}
You can fold that into your code by feeding them one column at a time.
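For example, applied column by column to the 10,000 x 200 matrix (here called X as a placeholder; switch to MARGIN = 1 if you want one value per repeat instead):
skews <- apply(X, 2, theskew)  # one skewness value per column
kurts <- apply(X, 2, thekurt)  # one excess-kurtosis value per column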
Okay, I did this yesterday; for posterity, here is a loop that will do what I asked.
Provided your data is a time course of a variable you are measuring, and you want the moments of that variable:
rm(list=ls())
yourdata<-read.table("whereveryourdatais/and/variableyouwant")
yourdata<-t(yourdata) #only do this at your own discretion
mu <- colMeans(yourdata)  # column means
NumMoments <- 5
rawmoments <- matrix(NA, nrow=NumMoments, ncol=ncol(yourdata))
for(i in 1:NumMoments) {
rawmoments[i, ] <- colMeans(yourdata^i)
}
plot(rawmoments[1,])
holder<-matrix(NA,nrow=nrow(yourdata),ncol=ncol(yourdata))
middles<-matrix(NA,nrow=1,ncol=ncol(yourdata))
for(j in 1:nrow(yourdata)){
for(o in 1:ncol(rawmoments)){
middles[o]<-yourdata[j,o]-rawmoments[1,o]
}
holder[j,] <- middles
}
centmoments<-matrix(NA,nrow=NumMoments,ncol=ncol(yourdata))
for(i in 1:NumMoments){
centmoments[i,]<-colMeans(holder^i)
}
Then centmoments has the central moments and rawmoments has the raw moments; you can specify how many moments to take by changing the value of NumMoments.
Note that the first row in "centmoments" will be approximately 0.
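The same result can be obtained more compactly (a sketch, reusing yourdata and NumMoments from above): center the columns once with sweep(), then take column means of increasing powers.
centered <- sweep(yourdata, 2, colMeans(yourdata))  # subtract each column's mean
centmoments2 <- t(sapply(1:NumMoments, function(i) colMeans(centered^i)))
max(abs(centmoments2 - centmoments))  # should be at floating-point noise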
Is this what you're looking for?
X <- matrix(1:12, 3, 4) # your data
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
for(i in 1:NumMoments) {
moments[i, ] <- colMeans(X^i)
}
EDIT:
okay, apparently you want "central moments"
X <- matrix(1:12, 3, 4)
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
moments[1, ] <- colMeans(X)  # fill in the first raw moment before centering
Y <- X
for(i in 1:ncol(X)) {
Y[, i] <- Y[, i] - moments[1, i]
}
for(i in 2:NumMoments) {
moments[i, ] <- colMeans(Y^i)
}

How to create adjacency matrix from grid coordinates in R?

I'm new to this site. I was wondering if anyone had experience with turning a list of grid coordinates (shown in the example code below as df) into an adjacency matrix. I've written a function that can handle the job for very small data sets, but the run time increases rapidly as the size of the data set increases (I think 800 pixels would take about 25 hours). It's because of the nested for loops, but I don't know how to get around them.
## Dummy Data
x <- c(1,1,2,2,2,3,3)
y <- c(3,4,2,3,4,1,2)
df <- as.data.frame(cbind(x,y))
df
## Here's what it looks like as an image
a <- c(NA,NA,1,1)
b <- c(NA,1,1,1)
c <- c(1,1,NA,NA)
image <- cbind(a,b,c)
f <- function(m) t(m)[,nrow(m):1]
image(f(image))
## Here's my adjacency matrix function that's slowwwwww
adjacency.coordinates <- function(x,y) {
df <- as.data.frame(cbind(x,y))
colnames(df) = c("V1","V2")
df <- df[with(df,order(V1,V2)),]
adj.mat <- diag(1,dim(df)[1])
for (i in 1:dim(df)[1]) {
for (j in 1:dim(df)[1]) {
if((df[i,1]-df[j,1]==0)&(abs(df[i,2]-df[j,2])==1) | (df[i,2]-df[j,2]==0)&(abs(df[i,1]-df[j,1])==1)) {
adj.mat[i,j] = 1
}
}
}
return(adj.mat)
}
## Here's the adjacency matrix
adjacency.coordinates(x,y)
Does anyone know of a way to do this that will work well on a set of coordinates a couple of thousand pixels long? I've tried converting to a SpatialGridDataFrame and going from there, but it won't get the adjacency matrix correct. Thank you so much for your time.
While I thought igraph might be the way to go here, I think you can do it more simply like:
result <- apply(df, 1, function(pt)
(pt["x"] == df$x & abs(pt["y"] - df$y) == 1) |
(abs(pt["x"] - df$x) == 1 & pt["y"] == df$y)
)
diag(result) <- 1
And avoid the loopiness and get the same result:
> identical(adjacency.coordinates(x,y),result)
[1] TRUE
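Another loop-free option (a sketch using base dist()): two pixels are orthogonal neighbours exactly when their Manhattan distance is 1, so the whole adjacency matrix is just a thresholded distance matrix. Like the apply() version it builds a dense n x n matrix, which is fine for a couple of thousand pixels; note it keeps df's row order, whereas adjacency.coordinates() sorts by V1, V2 first.
d <- as.matrix(dist(df, method = "manhattan"))
adj <- (d == 1) * 1  # 1 where pixels touch horizontally or vertically
diag(adj) <- 1       # keep the self-adjacency used above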

Calculating all distances between one point and a group of points efficiently in R

First of all, I am new to R (I started yesterday).
I have two groups of points, data and centers, the first one of size n and the second of size K (for instance, n = 3823 and K = 10), and for each i in the first set, I need to find j in the second with the minimum distance.
My idea is simple: for each i, let dist[j] be the distance between i and j; I only need to use which.min(dist) to find what I am looking for.
Each point is an array of 64 doubles, so
> dim(data)
[1] 3823 64
> dim(centers)
[1] 10 64
I have tried with
for (i in 1:n) {
for (j in 1:K) {
d[j] <- sqrt(sum((centers[j,] - data[i,])^2))
}
S[i] <- which.min(d)
}
which is extremely slow (with n = 200, it takes more than 40s!!). The fastest solution that I wrote is
distance <- function(point, group) {
return(dist(t(array(c(point, t(group)), dim=c(ncol(group), 1+nrow(group)))))[1:nrow(group)])
}
for (i in 1:n) {
d <- distance(data[i,], centers)
which.min(d)
}
Even if it does a lot of computation that I don't use (because dist(m) computes the distances between all rows of m), it is much faster than the other one (can anyone explain why?), but it is not fast enough for what I need, because it will not be used only once. And also, the distance code is very ugly. I tried to replace it with
distance <- function(point, group) {
return (dist(rbind(point,group))[1:nrow(group)])
}
but this seems to be twice as slow. I also tried to use dist for each pair, but it is also slower.
I don't know what to do now. It seems like I am doing something very wrong. Any idea on how to do this more efficiently?
ps: I need this to implement k-means by hand (and I need to do it, it is part of an assignment). I believe I will only need Euclidean distance, but I am not yet sure, so I would prefer to have some code where the distance computation can be replaced easily. stats::kmeans does all the computation in less than one second.
Rather than iterating across data points, you can just condense that to a matrix operation, meaning you only have to iterate across K.
# Generate some fake data.
n <- 3823
K <- 10
d <- 64
x <- matrix(rnorm(n * d), ncol = n)
centers <- matrix(rnorm(K * d), ncol = K)
system.time(
dists <- apply(centers, 2, function(center) {
colSums((x - center)^2)
})
)
Runs in:
user system elapsed
0.100 0.008 0.108
on my laptop.
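The nearest-center assignment is then just a row-wise which.min over dists; note that dists holds squared distances, which is fine because the minimiser is the same with or without the square root.
S <- max.col(-dists)  # index of the nearest center for each point
# equivalently: S <- apply(dists, 1, which.min)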
rdist() is an R function from the {fields} package which can quickly calculate the distances between two sets of points in matrix format.
https://www.image.ucar.edu/~nychka/Fields/Help/rdist.html
Usage:
library(fields)
#generating fake data
n <- 5
m <- 10
d <- 3
x <- matrix(rnorm(n * d), ncol = d)
y <- matrix(rnorm(m * d), ncol = d)
rdist(x, y)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.512383 3.053084 3.1420322 4.942360 3.345619
[2,] 3.531150 4.593120 1.9895867 4.212358 2.868283
[3,] 1.925701 2.217248 2.4232672 4.529040 2.243467
[4,] 2.751179 2.260113 2.2469334 3.674180 1.701388
[5,] 3.303224 3.888610 0.5091929 4.563767 1.661411
[6,] 3.188290 3.304657 3.6668867 3.599771 3.453358
[7,] 2.891969 2.823296 1.6926825 4.845681 1.544732
[8,] 2.987394 1.553104 2.8849988 4.683407 2.000689
[9,] 3.199353 2.822421 1.5221291 4.414465 1.078257
[10,] 2.492993 2.994359 3.3573190 6.498129 3.337441
You may want to have a look at the apply functions.
For instance, this code
for (j in 1:K)
{
d[j] <- sqrt(sum((centers[j,] - data[i,])^2))
}
Can easily be substituted by something like
dt <- data[i,]
d <- apply(centers, 1, function(x){ sqrt(sum((x - dt)^2)) })
You can definitely optimise it more, but you get the point, I hope.
dist works fast because it's vectorized and calls internal C functions.
Your code in the loop could be vectorized in many ways.
For example to compute distance between data and centers you could use outer:
diff_ij <- function(i,j) sqrt(rowSums((data[i,]-centers[j,])^2))
X <- outer(seq_len(n), seq_len(K), diff_ij)
This gives you an n x K matrix of distances, and it should be much faster than the loop.
Then you could use max.col to find the maximum in each row (see the help page; there are some nuances when there are many maxima). X must be negated because we are searching for the minimum.
CL <- max.col(-X)
To be efficient in R you should vectorize as much as possible. Loops can in many cases be replaced by a vectorized substitute. Check the help for rowSums (whose help page also covers rowMeans, colSums and colMeans), pmax and cumsum. You could search SO, e.g.
https://stackoverflow.com/search?q=[r]+avoid+loop (copy & paste this link, I don't know how to make it clickable) for some examples.
My solution:
# data is a matrix where each row is a point
# point is a vector of values
euc.dist <- function(data, point) {
apply(data, 1, function (row) sqrt(sum((point - row) ^ 2)))
}
You can try it like this:
x <- matrix(rnorm(25), ncol=5)
euc.dist(x, x[1,])
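Hooking it up to the original problem (a sketch, using the data and centers objects from the question) still loops over the data points, so the vectorized answers above will be faster, but it reads clearly:
S <- sapply(seq_len(nrow(data)), function(i) which.min(euc.dist(centers, data[i, ])))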
