Error with parallelization in R for S4 objects - r

I am trying to optimize a function that I will run on several rasters with millions of cells each, so I want to parallelize it.
The initial Raster
So this is the initial raster:
library(raster)
SPA <- raster(nrows=3, ncols=3, xmn = -10, xmx = -4, ymn = 4, ymx = 10)
values(SPA) <- c(0.1, 0.4, 0.6, 0, 0.2, 0.4, 0, 0.1, 0.2)
plot(SPA)
The objective of the function is to get a dataframe with the distance between all of the cells present in the raster with a column from, a column to, and a column distance.
Transition layer
In order to do that, I create a transition layer using the gdistance package:
library(gdistance)
h16 <- transition(SPA, transitionFunction=function(x){1},16,symm=FALSE)
h16 <- geoCorrection(h16, scl=FALSE)
and the origin points for every cell:
B <- xyFromCell(SPA, cell = 1:ncell(SPA))
head(B)
x y
[1,] -9 9
[2,] -7 9
[3,] -5 9
[4,] -9 7
[5,] -7 7
[6,] -5 7
Distance function
With help from some Stack Overflow answers, I wrote this function, which is faster than accCost in gdistance:
accCost2 <- function(x, fromCoords) {
fromCells <- cellFromXY(x, fromCoords)
tr <- transitionMatrix(x)
tr <- rBind(tr, rep(0, nrow(tr)))
tr <- cBind(tr, rep(0, nrow(tr)))
startNode <- nrow(tr)
adjP <- cbind(rep(startNode, times = length(fromCells)), fromCells)
tr[adjP] <- Inf
adjacencyGraph <- graph.adjacency(tr, mode = "directed", weighted = TRUE)
E(adjacencyGraph)$weight <- 1/E(adjacencyGraph)$weight
return(shortest.paths(adjacencyGraph, v = startNode, mode = "out")[-startNode])
}
What I want to parallelize
And using apply I get my desired data.frame
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(apply(B,1, accCost2, x = h16)))
head(connections)
from to dist
1 1 1 0.0
2 1 2 219915.7
3 1 3 439831.3
4 1 4 221191.8
5 1 5 312305.7
6 1 6 493316.1
This is what I tried with parApply
library("parallel")
cl = makeCluster(3)
clusterExport(cl, c("B", "h16", "accCost2"))
clusterEvalQ(cl, library(gdistance), library(raster))
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),to = rep(1:nrow(B), nrow(B)), dist =as.vector(parRapply(cl, B,1, accCost2, x = h16)))
stopCluster(cl)
But I get the following error:
Error in x[i, , drop = FALSE] : object of type 'S4' is not subsettable
I am fairly new to parallelization, and I am not sure what I am doing wrong.

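There are several syntax issues in your code. parRapply() has the signature parRapply(cl, x, FUN, ...) and takes no margin argument, so in your call the named x = h16 is matched to the matrix argument, and parRapply tries to split the S4 TransitionLayer by rows. That is what produces the "not subsettable" error. clusterEvalQ() also accepts only a single expression, so the two library() calls cannot be passed as separate arguments.
This code works for me.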
library("parallel")
accCost_wrap <- function(x){accCost2(h16,x)}
#Instead of including h16 in the parRapply function,
#just get it in the node environment
cl = makeCluster(3)
clusterExport(cl, c("h16", "accCost2"))
#B will be "sent" to the nodes through the parRapply function.
clusterEvalQ(cl, {library(gdistance)})
#raster is a dependency of gdistance, so no need to include raster here.
pp <- parRapply(cl, x=B, FUN=accCost_wrap)
stopCluster(cl)
connections <- data.frame(from = rep(1:nrow(B), each = nrow(B)),
to = rep(1:nrow(B), nrow(B)),
dist = as.vector(pp))
Your version of accCost is indeed faster than the version in gdistance. Your version omits the checks to see if your points are within the extent of your transition layer. Proceed with caution.
(You could make your function even faster by taking the cell numbers as input. Also, sending so much data back from each node does not seem very efficient.)
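To see the calling convention in isolation, here is a minimal sketch that needs only base R. The coordinate matrix and the dist_from function are hypothetical stand-ins for B and accCost2; the point is only that parRapply takes no margin argument and forwards extra named arguments through `...`.

```r
library(parallel)

# Hypothetical stand-ins: a small coordinate matrix and a toy distance function
B_toy <- cbind(x = c(-9, -7, -5), y = c(9, 9, 9))
dist_from <- function(from, pts) {
  sqrt((pts[, 1] - from[1])^2 + (pts[, 2] - from[2])^2)
}

cl <- makeCluster(2)
# parRapply(cl, x, FUN, ...): rows of x are passed to FUN one at a time,
# and pts = B_toy is forwarded to FUN via `...`
par_res <- parRapply(cl, B_toy, dist_from, pts = B_toy)
stopCluster(cl)

# Sequential reference using apply() over rows
seq_res <- as.vector(apply(B_toy, 1, dist_from, pts = B_toy))
print(all.equal(as.vector(par_res), seq_res))
```

Because the argument is named pts rather than x, it cannot be confused with parRapply's own x argument.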

Related

How to generate iterations of list object in R

I'd like to generate a set of list objects.
To start, I have a 2x2 matrix from which I should get a list of outputs.
The list contains a projection matrix, an asymptotic dynamic, a transient dynamic, and a matrix of elasticities, hence 4 objects. I can get all of them from the function projection.
My difficulty is this:
In task 1, I'd like to vary one of the elements of the starting matrix (the third argument, gama) and get a list with one output per value.
What I did keeps only the first element of the list for each iteration.
#Creating function projection matrix
projection<- function(sigma1,sigma2,gama,phi){
A <- matrix(c(sigma1*(1-gama),phi,sigma1*gama, sigma2),
byrow = T, ncol = 2)
if(sigma1>1|sigma1<0){stop("sigma1 must be bounded in 0 and 1")}
if(gama>1|gama<0){stop("gama must be bounded in 0 and 1")}
if(phi<0){stop("phi must be greater or equal to 0")}
library(popbio)
e.a <- eigen.analysis(A)
as <- e.a$lambda1
tr <- -log(as)
Dynamic <- list(projection.matrix = A, assymtotic.dynamic=as,
transient.dynamic=tr, Elasticity=e.a$elasticities)
return(Dynamic)
}
#Try with B
B <- projection(0.5,0.9,0.1,1.5)
#Task 1
Task1 <- function(Gama){
n <- length(as.vector(Gama))
g <- list()
for (i in 1:n){g[i]<-projection(sigma1 = 0.5,sigma2 = 0.9,
gama = Gama[i],phi = 1.5)}
return(g)
}
G <- seq(from=0, to=1, by= 0.1)
Task1(G)
There's a fairly easy fix. Instead of using [<- to assign the indexed projection object, use [[<-, and don't forget to assign the result to a name so you can inspect and use it. Otherwise the result is only printed at the console, and the copy inside the function's (temporary) environment gets garbage-collected.
Task1 <- function(Gama){
n <- length(as.vector(Gama))
g <- list()
for (i in 1:n){g[[i]]<-projection(sigma1 = 0.5,sigma2 = 0.9,
gama = Gama[i],phi = 1.5)}
return(g)
}
G <- seq(from=0, to=1, by= 0.1)
resG <- Task1(G)
resG[1]
#--- result is a list of list.
[[1]]
[[1]]$projection.matrix
[,1] [,2]
[1,] 0.5 1.5
[2,] 0.0 0.9
[[1]]$assymtotic.dynamic
[1] 0.9
[[1]]$transient.dynamic
[1] 0.1053605
[[1]]$Elasticity
[,1] [,2]
[1,] 0 0
[2,] 0 1
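The difference between the two assignment forms, and an lapply() alternative to the explicit loop, can be sketched with base R alone. make_result is a hypothetical stand-in for projection(), used here so the snippet runs without popbio.

```r
# Hypothetical stand-in for projection(): returns a list with two elements
make_result <- function(gama) list(gama = gama, doubled = 2 * gama)

# [[<- stores the whole returned list as one element
a <- list()
a[[1]] <- make_result(0.1)

# [<- tries to spread the returned list over one slot and keeps only its
# first element (R warns about the length mismatch)
b <- list()
suppressWarnings(b[1] <- make_result(0.1))

str(a)
str(b)

# lapply() builds the list of lists directly, one entry per value of G
G <- seq(from = 0, to = 1, by = 0.1)
res <- lapply(G, make_result)
length(res)
```

With the real function, the same pattern would be lapply(G, function(g) projection(0.5, 0.9, g, 1.5)).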

Creating more pseudo-random matrices same time in R? Comparing the points sign matching?

I can make one pseudo-random matrix with the following :
nc=14
nr=14
set.seed(111)
M=matrix(sample(
c(runif(58,min=-1,max=0),runif(71, min=0,max=0),
runif(nr*nc-129,min=0,max=+1))), nrow=nr, nc=nc)
The more important question: I need 1000 matrices with the same number of negative, positive, and zero values; only their locations in the matrix should vary.
I can make matrices one by one, but I want to do this task faster.
The less important question: once I have the 1000 matrices, I need to count, for every position in the matrices, how many positive, negative, or zero values occurred there. For example:
MATRIX_A
[,1]
[9,] -0,2
MATRIX_B
[,1]
[9,] -0,5
MATRIX_C
[,1]
[9,] 0,1
MATRIX_D
[,1]
[9,] 0,0
MATRIX_E
[,1]
[9,] 0,9
What I need:
FINAL_MATRIX_positive
[,1]
[9,] (2/5*100)=40% or 0,4 or 2
because 2 of the 5 matrices have a positive value at this position. I need the same for negative and zero values.
If it isn't possible to do this in R, I can compare them "manually" in Excel.
Thank you for your help!
Actually you are almost there!
You can try the code below, where replicate generates the random matrix 1000 times and Reduce computes the statistics for each position:
nc <- 14
nr <- 14
N <- 1000
lst <- replicate(
N,
matrix(sample(
c(
runif(58, min = -1, max = 0),
runif(71, min = 0, max = 0),
runif(nr * nc - 129, min = 0, max = +1)
)
), nrow = nr, nc = nc),
simplify = FALSE
)
pos <- Reduce(`+`,lapply(lst,function(M) M > 0))/N
neg <- Reduce(`+`,lapply(lst,function(M) M < 0))/N
zero <- Reduce(`+`,lapply(lst,function(M) M == 0))/N
I use a function for your simulation scheme:
my_sim <- function(n_neg = 58, n_0 = 71, n_pos = 67){
res <- c(runif(n_neg, min=-1, max=0),
rep(0, n_0),
runif(n_pos, min=0, max=+1))
return(sample(res))
}
Then, I simulate your matrices (I store them in a list):
N <- 1000
nr <- 14
nc <- nr
set.seed(111)
my_matrices <- list()
for(i in 1:N){
my_matrices[[i]] <- matrix(my_sim(), nrow = nr, ncol = nc)
}
Finally, I compute the proportion of positive numbers for the position row 1 and column 9:
sum(sapply(my_matrices, function(x) x[1,9]) > 0)/N
# [1] 0.366
However, if you are interested in all the positions, these lines will do the job:
aux <- lapply(my_matrices, function(x) x > 0)
FINAL_MATRIX_positive <- 0
for(i in 1:N){
FINAL_MATRIX_positive <- FINAL_MATRIX_positive + aux[[i]]
}
FINAL_MATRIX_positive <- FINAL_MATRIX_positive/N
# row 1, column 9
FINAL_MATRIX_positive[1, 9]
# [1] 0.366
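As a compact variant of the approaches above, the matrices can also be kept in a single 3D array and summarised with rowMeans(). This is only a sketch, and it assumes that reshuffling one fixed pool of values is acceptable instead of redrawing the values for every matrix; the names vals and arr are introduced here for illustration.

```r
set.seed(111)
nr <- 14
nc <- 14
N <- 1000

# One fixed pool: 58 negative, 71 zero, and the remaining 67 positive values
vals <- c(runif(58, min = -1, max = 0),
          rep(0, 71),
          runif(nr * nc - 129, min = 0, max = 1))

# replicate() simplifies the N shuffled matrices into a 14 x 14 x 1000 array
arr <- replicate(N, matrix(sample(vals), nrow = nr, ncol = nc))

# With dims = 2, rowMeans averages over the third dimension, i.e. over the
# N matrices, separately for every (row, column) position
pos  <- rowMeans(arr > 0, dims = 2)
neg  <- rowMeans(arr < 0, dims = 2)
zero <- rowMeans(arr == 0, dims = 2)

dim(pos)
```

At every position the three proportions add up to 1, which is a quick sanity check on the result.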

Parallelization Apply to parRapply

My data set is:
ll <- matrix(c(5, 6, 60, 60), ncol=2)
And I use the function spDistsN1 from the library "sp" to obtain a distance matrix with apply:
apply(ll, 1, function(x) spDistsN1(as.matrix(ll), x, longlat = T))
But I want to do it with parallelization, so for that:
library(parallel)
ncore <- detectCores()
cl <- makeCluster(ncore)
clusterEvalQ(cl = cl, expr = c(library(sp)))
parRapply(cl = cl, x = ll, FUN = function(x) spDistsN1(as.matrix(ll), x,
longlat = T))
It shows the following error:
Error in checkForRemoteErrors(val) :
4 nodes produced errors; first error: object 'll' not found
How do I fix it?
An easier alternative to using parallel's parApply() or parRapply() is to use future_apply() of the future.apply package (disclaimer: I'm the author) because global variables are automatically exported - no need to worry about parallel::clusterExport() etc. Just use it as you would use apply(), e.g.
library(sp)
library(future.apply)
plan(multisession) ## parallelize on local machine (multiprocess is deprecated in recent versions of future)
ll <- matrix(c(5, 6, 60, 60), ncol = 2)
## Sequentially
y0 <- apply(ll, 1, function(x) spDistsN1(ll, x, longlat = TRUE))
print(y0)
# [,1] [,2]
# [1,] 0.00000 55.79918
# [2,] 55.79918 0.00000
## In parallel
y1 <- future_apply(ll, 1, function(x) spDistsN1(ll, x, longlat = TRUE))
print(y1)
# [,1] [,2]
# [1,] 0.00000 55.79918
# [2,] 55.79918 0.00000
print(identical(y1, y0))
# [1] TRUE
You may also find the blog post future.apply - Parallelize Any Base R Apply Function helpful.
You need to export all variables to workers. See ?parallel::clusterExport.
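A minimal sketch of that fix, assuming a plain Euclidean distance as a stand-in for spDistsN1 so that the snippet runs without the sp package. The worker processes start with empty workspaces, so ll has to be copied to them before the function that refers to it is called.

```r
library(parallel)

ll <- matrix(c(5, 6, 60, 60), ncol = 2)

cl <- makeCluster(2)
# Copy `ll` from the current environment into each worker's global environment
clusterExport(cl, "ll", envir = environment())

# Stand-in for spDistsN1 (assumption): Euclidean distance from point x to
# every row of ll; `ll` is looked up on the worker
res <- parRapply(cl, ll, function(x) {
  sqrt((ll[, 1] - x[1])^2 + (ll[, 2] - x[2])^2)
})
stopCluster(cl)

matrix(res, nrow = nrow(ll))
```

With the real function, clusterEvalQ(cl, library(sp)) would still be needed so that spDistsN1 is available on the workers.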

PSO and K-means based Text Document Clustering in R

I am new to Particle Swarm Optimization. I read a research paper on clustering based on PSO and K-means, but I did not find any working example. Any help is much appreciated. Thanks in advance!
I want to perform text document clustering using PSO and K-means in R. My basic idea is that PSO first gives me optimised values for the cluster centroids, and I then use those values as the initial centroids for k-means to get the document clusters.
Below is the code showing what I have done so far.
#Import library
library(pdist)
library(hydroPSO)
#Create matrix and suppose it is our document term matrix which we get after
#the cleaning of corpus
#(In my actual data I have 20 documents with 951 terms, i.e., dim(dtm) = 20*951)
matri <- matrix(data = seq(1, 20, 1), nrow = 4, ncol = 7, byrow = TRUE)
matri
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 7
[2,] 8 9 10 11 12 13 14
[3,] 15 16 17 18 19 20 1
[4,] 2 3 4 5 6 7 8
#Initially select first and second row as centroids
cj <- matri[1:2,]
#Calculate Euclidean Distance of each data point from centroids
vm <- as.data.frame(t(as.matrix(pdist(matri, cj))))
vm
V1 V2 V3 V4
1 0.00000 18.52026 34.81379 2.645751
2 18.52026 0.00000 21.51744 15.874508
#Create binary matrix S in which 1 means Instance Ii is allocated to the cluster Cj otherwise 0.
S <- matrix(data = NA, nrow = nrow(vm), ncol = ncol(vm))
for(i in 1:nrow(vm)){
for(j in 1:ncol(vm)){
cd <- which.min(vm[, j])
ifelse(cd==i, S[i,j] <-1, S[i,j] <-0)
}
}
S
[,1] [,2] [,3] [,4]
[1,] 1 0 0 1
[2,] 0 1 1 0
#Apply `hydroPSO()` to get optimised values of centroids.
set.seed(5486)
D <- 4 # Dimension
lower <- rep(0, D)
upper <- rep(10, D)
m_s <- matrix(data = NA, nrow = nrow(S), ncol = ncol(matri))
Fn= function(y) { #Objective Function which has to be minimised
for(j in 1:ncol(matri)){
for(i in 1:nrow(matri)){
for(k in 1:nrow(y)){
for(l in 1:ncol(y)){
m_s[k,] <- colSums(matri[y[k,]==1,])/sum(y[k,])
}
}
}
}
sm <- sum(m_s)/ nrow(S)
return(sm)
}
hh1 <- hydroPSO(S,fn=Fn, lower=lower, upper=upper,
control=list(write2disk=FALSE, npart=3))
But the above hydroPSO() call is not working. It gives the error "Error in 1:nrow(y) : argument of length 0". I searched for it but didn't find a solution that works for me.
I also changed my objective function, and this time hydroPSO() ran, but I suspect not correctly. I am passing my initial centroid matrix, whose dimension is 2*7, as the parameter, but the function returns only 1*7 optimised values. I don't understand why.
set.seed(5486)
D <- 7# Dimension
lower <- rep(0, D)
upper <- rep(10, D)
Fn = function(x){
vm <- as.data.frame(t(as.matrix(pdist(matri, x))))
S <- matrix(data = NA, nrow = nrow(vm), ncol = ncol(vm))
for(i in 1:nrow(vm)){
for(j in 1:ncol(vm)){
cd <- which.min(vm[, j])
ifelse(cd==i, S[i,j] <-1, S[i,j] <-0)
}
}
m_s <- matrix(data = NA, nrow = nrow(S), ncol = ncol(matri))
for(j in 1:ncol(matri)){
for(i in 1:nrow(matri)){
for(k in 1:nrow(S)){
for(l in 1:ncol(S)){
m_s[k,] <- colSums(matri[S[k,]==1,])/sum(S[k,])
}
}
}
}
sm <- sum(m_s)/ nrow(S)
return(sm)
}
hh1 <- hydroPSO(cj,fn=Fn, lower=lower, upper=upper,
control=list(write2disk=FALSE, npart=2, K=2))
Output of the above function.
## $par
## Param1 Param2 Param3 Param4 Param5 Param6 Param7
## 8.6996174 2.1952303 5.6903588 0.4471795 3.7103161 1.6605425 8.2717574
##
## $value
## [1] 61.5
##
## $best.particle
## [1] 1
##
## $counts
## function.calls iterations regroupings
## 2000 1000 0
##
## $convergence
## [1] 3
##
## $message
## [1] "Maximum number of iterations reached"
I guess I am passing parameters to hydroPSO() incorrectly. Please show me where I am going wrong.
Thank you very much!
Instead of passing cj to hydroPSO(), I used as.vector(t(cj)) in my second approach, and it worked fine for me. I got 14 optimised values.
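The flattening step can be checked in base R without running the optimiser. This sketch assumes the optimiser hands the objective function a plain numeric vector, so the objective has to rebuild the 2 x 7 centroid matrix itself; the names par0 and centroids are introduced here for illustration, and the bounds would presumably need one entry per parameter (14) as well.

```r
# A 2 x 7 centroid matrix like cj (first two rows of the toy matrix)
cj <- matrix(c(1:7, 8:14), nrow = 2, byrow = TRUE)

# Row-major flatten: centroid 1 first, then centroid 2 (14 values in total)
par0 <- as.vector(t(cj))
length(par0)

# Inside the objective function, the vector can be turned back into a matrix
centroids <- matrix(par0, nrow = nrow(cj), byrow = TRUE)
identical(centroids, cj)

# Bounds with one entry per parameter (assumption: same 0..10 range as before)
lower <- rep(0, length(par0))
upper <- rep(10, length(par0))
```

The byrow = TRUE in the rebuild matters: it undoes the t() used when flattening, so each row is again one centroid.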

How to randomise a matrix element for each iteration of a loop?

I'm working with the popbio package on a population model. It looks something like this:
library(popbio)
babies <- 0.3
kids <- 0.5
teens <- 0.75
adults <- 0.98
A <- c(0,0,0,0,teens*0.5,adults*0.8,
babies,0,0,0,0,0,
0,kids,0,0,0,0,
0,0,kids,0,0,0,
0,0,0,teens,0,0,
0,0,0,0,teens,adults
)
A <- matrix ((A), ncol=6, byrow = TRUE)
N<-c(10,10,10,10,10,10)
N<-matrix (N, ncol=1)
model <- pop.projection(A,N,iterations=10)
model
I'd like to know how I can randomise the input so that at each iteration, which represents a year in this case, I get a different input for the matrix elements. So, for instance, my model runs for 10 years, and I'd like the baby survival rate to change each year. babies <- rnorm(1,0.3,0.1) doesn't do it, because that still leaves me with a single value, just randomly selected.
Update: This is distinct from running 10 separate models with different initial, random values. I'd like the update to occur within a single model run, which itself has 10 iterations in the pop.projection function.
Hope you can help.
I know this answer is very late, but here's one approach using expressions. First, use an expression to create the matrix.
vr <- list( babies=0.3, kids=0.5, teens=0.75, adults=0.98 )
Ax <- expression( matrix(c(
0,0,0,0,teens*0.5,adults*0.8,
babies,0,0,0,0,0,
0,kids,0,0,0,0,
0,0,kids,0,0,0,
0,0,0,teens,0,0,
0,0,0,0,teens,adults), ncol=6, byrow = TRUE ))
A1 <- eval(Ax, vr)
lambda(A1)
[1] 1.011821
Next, use an expression to create vital rates with rnorm or other functions.
vr2 <- expression( list( babies=rnorm(1,0.3,0.1), kids=0.5, teens=0.75, adults=0.98 ))
A2 <- eval(Ax, eval( vr2))
lambda(A2)
[1] 1.014586
Apply the expression to 100 matrices.
x <- sapply(1:100, function(x) lambda(eval(Ax, eval(vr2))))
quantile(x, c(.05,.95))
5% 95%
0.996523 1.025900
Finally, make two small changes to pop.projection by adding the vr option and a line to evaluate A at each time step.
pop.projection2 <- function (Ax, vr, n, iterations = 20)
{
x <- length(n)
t <- iterations
stage <- matrix(numeric(x * t), nrow = x)
pop <- numeric(t)
change <- numeric(t - 1)
for (i in 1:t) {
stage[, i] <- n
pop[i] <- sum(n)
if (i > 1) {
change[i - 1] <- pop[i]/pop[i - 1]
}
## evaluate Ax
A <- eval(Ax, eval(vr))
n <- A %*% n
}
colnames(stage) <- 0:(t - 1)
w <- stage[, t]
pop.proj <- list(lambda = pop[t]/pop[t - 1], stable.stage = w/sum(w),
stage.vectors = stage, pop.sizes = pop, pop.changes = change)
pop.proj
}
n <-c(10,10,10,10,10,10)
pop.projection2(Ax, vr2, n, 10)
$lambda
[1] 0.9874586
$stable.stage
[1] 0.33673579 0.11242588 0.08552367 0.02189786 0.02086656 0.42255023
$stage.vectors
0 1 2 3 4 5 6 7 8 9
[1,] 10 11.590000 16.375700 19.108186 20.2560223 20.5559445 20.5506251 20.5898222 20.7603581 20.713271
[2,] 10 4.147274 3.332772 4.443311 5.6693931 1.9018887 6.8455597 5.3879202 10.5214540 6.915534
[3,] 10 5.000000 2.073637 1.666386 2.2216556 2.8346965 0.9509443 3.4227799 2.6939601 5.260727
[4,] 10 5.000000 2.500000 1.036819 0.8331931 1.1108278 1.4173483 0.4754722 1.7113899 1.346980
[5,] 10 7.500000 3.750000 1.875000 0.7776139 0.6248948 0.8331209 1.0630112 0.3566041 1.283542
[6,] 10 17.300000 22.579000 24.939920 25.8473716 25.9136346 25.8640330 25.9715930 26.2494195 25.991884
$pop.sizes
[1] 60.00000 50.53727 50.61111 53.06962 55.60525 52.94189 56.46163 56.91060 62.29319 61.51194
$pop.changes
[1] 0.8422879 1.0014610 1.0485765 1.0477793 0.9521023 1.0664832 1.0079517 1.0945797 0.9874586
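The same idea can also be sketched without expressions, using a small helper that rebuilds the matrix from the vital rates at every time step. This is an illustrative base-R variant, not part of popbio; build_A is a hypothetical helper, and clamping the random survival rate to [0, 1] is an added assumption.

```r
# Hypothetical helper: build the projection matrix from the vital rates
build_A <- function(babies, kids = 0.5, teens = 0.75, adults = 0.98) {
  matrix(c(0, 0, 0, 0, teens * 0.5, adults * 0.8,
           babies, 0, 0, 0, 0, 0,
           0, kids, 0, 0, 0, 0,
           0, 0, kids, 0, 0, 0,
           0, 0, 0, teens, 0, 0,
           0, 0, 0, 0, teens, adults),
         ncol = 6, byrow = TRUE)
}

set.seed(1)
years <- 10
n <- rep(10, 6)          # initial stage vector
pop <- numeric(years)    # total population size per year

for (t in seq_len(years)) {
  pop[t] <- sum(n)
  # Draw a new baby survival rate each year; clamp to [0, 1] (assumption)
  babies_t <- min(max(rnorm(1, mean = 0.3, sd = 0.1), 0), 1)
  n <- as.vector(build_A(babies_t) %*% n)
}

pop
```

Other vital rates could be randomised the same way by drawing them inside the loop and passing them to build_A.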
