Distribute function computation over several cores - R

I have a function like the following one:
FXN <- function(a, b, c, d, e, f, g) {
  value <- (a*b/c + d^e) - f + g
  write.table(value, paste(a, b, c, d, e, f, g, ".txt", sep = ""))
}
(My actual function is a lot more computationally heavy; this just demonstrates the basic idea: do some computation on a single core and write the result to a file.)
If I want to evaluate this function in parallel on 64 cores for all combinations of
a = c(1, 2, 3, 4, 5)
b = c(5, 6, 7, 8, 9)
c = c(11, 12, 13, 14, 15)
d = c(3, 4)
e = c(5, 7, 8)
f = c(4, 5, 6)
g = c(2, 4, 6)
How can I do this? There are 6750 combinations of these values, so the function needs to be run 6750 times.
I am trying the following, but I am not sure how foreach handles the function computation. Ideally, for my actual function, it would just run the function on each core, write the table, and move on to the next combination that hasn't already been run.
library(foreach)
library(doMC)   # provides registerDoMC()
n_cores = 64
registerDoMC(n_cores)
foreach (a = c(1, 2, 3, 4, 5)) %dopar% {
  foreach (b = c(5, 6, 7, 8, 9)) %dopar% {
    foreach (c = c(11, 12, 13, 14, 15)) %dopar% {
      foreach (d = c(3, 4)) %dopar% {
        foreach (e = c(5, 7, 8)) %dopar% {
          foreach (f = c(4, 5, 6)) %dopar% {
            foreach (g = c(2, 4, 6)) %dopar% {
              FXN(a = a, b = b, c = c, d = d, e = e, f = f, g = g)
            }
          }
        }
      }
    }
  }
}
Would this theoretically work? What other way could I distribute the 6750 combinations over 64 cores, speeding up the computations?
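One alternative I've been considering (a rough, untested sketch of my own; the combos data frame below is just my naming, not working code) is to flatten all 6750 combinations with expand.grid() and run a single %dopar% loop over its rows:
library(foreach)
library(doMC)
registerDoMC(64)

# every combination of the seven parameters, one row each (6750 rows)
combos <- expand.grid(a = c(1, 2, 3, 4, 5),
                      b = c(5, 6, 7, 8, 9),
                      c = c(11, 12, 13, 14, 15),
                      d = c(3, 4),
                      e = c(5, 7, 8),
                      f = c(4, 5, 6),
                      g = c(2, 4, 6))

# one flat parallel loop; each iteration handles a single row/combination
foreach(i = seq_len(nrow(combos))) %dopar% {
  do.call(FXN, as.list(combos[i, ]))
}
Would something like this spread the 6750 calls more evenly across the 64 cores than the nested loops above?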
Thanks!

Related

Unlist LAST level of a list in R

I have a list of lists, ll:
ll <- list(a = list(data.frame(c = 1, d = 2), data.frame(h = 3, j = 4)),
           b = list(data.frame(c = 5, d = 6), data.frame(h = 7, j = 9)))
I want to unnest/unlist the last level of the structure (the interior list). Note that every list contains the same structure. I want to obtain lj:
lj <- list(a = (data.frame(c = 1, d = 2, h = 3, j = 4)), b = data.frame(c = 5, d = 6, h = 7, j = 9))
I have tried the following code without any success:
lj_not_success <- unlist(ll, recursive = F)
However, this code unlists the FIRST level, not the LAST one.
Any clue?
We may need to cbind the inner list elements instead of unlisting, as the expected output is also a list of data.frames:
ll_new <- lapply(ll, function(x) do.call(cbind, x))
Checking:
> identical(lj, ll_new)
[1] TRUE
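As a quick look at what the combined elements contain (this str() call is just my own illustration, not part of the answer; the same do.call(cbind, ...) pattern also works if each inner list holds more than two data frames):
str(ll_new$a)
'data.frame': 1 obs. of  4 variables:
 $ c: num 1
 $ d: num 2
 $ h: num 3
 $ j: num 4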

R optim: correct usage for a function with more than one argument

I am writing a function to be minimized with optim. The task is to solve several similar optimization problems in a loop.
# K and k are always the same (they are read from a file)
K <- matrix(data = c(1, 2, 1, 2, 1,
                     2, 16, 2, 1, 2,
                     1, 2, 8, 2, 1,
                     2, 1, 2, 16, 2,
                     1, 2, 1, 2, 32),
            nrow = 5, ncol = 5, byrow = TRUE)
k <- c(-2, 4, 12, 0, 2)
# j will be changed
minimize <- function(beta){ #function to minimize (for beta)
  value <- (1/2)*(t(beta)%*%K%*%beta) - t(k)%*%beta + j*abs(sum(beta)-n_s)
  return(value)
}
myfunc <- function(K, k, m) #K is matrix, k is vector
{
  j_values <- 10^seq(-5, 5, length = m)
  for (i in 1:m)
  {
    current_j_value <- j_values[i]
    #I want to set j in minimize function as current_j_value (and also my k and K from file)
    # and then minimize it
    myans <- optim(c(0, 0, 0, 0, 0), minimize) # using minimize(K, k, j) doesn't work
    print(myans$par)
  }
}
myfunc(K, k, 5)
My question is how to give arguments to my minimize function (to create it dynamically?) and then use it in optim.
If you want to pass extra parameters to your minimize function, you can add them to the optim call via the dots (...); from the documentation (see ?optim):
... Further arguments to be passed to fn and gr.
So, including j, k, K and n_s in minimize:
minimize <- function(beta, j, k, K, n_s){ #function to minimize (for beta)
  value <- (1/2)*(t(beta)%*%K%*%beta) - t(k)%*%beta + j*abs(sum(beta)-n_s)
  return(value)
}
and then adding these to the optim call (I have set n_s = 0) like so:
myfunc <- function(K, k, m) #K is matrix, k is vector
{
  j_values <- 10^seq(-5, 5, length = m)
  for (i in 1:m)
  {
    current_j_value <- j_values[i]
    #I want to set j in minimize function as current_j_value (and also my k and K from file)
    # and then minimize it
    myans <- optim(c(0, 0, 0, 0, 0), minimize, j = current_j_value, k = k, K = K, n_s = 0)
    print(myans$par)
  }
}
Running this then gives:
> myfunc(K, k, 5)
[1] -6.7956860 0.7999990 1.9999999 0.5333326 0.1290324
[1] -6.7911329 0.7996483 2.0000002 0.5329818 0.1290322
[1] -5.3512894 0.6889257 1.9999436 0.4222287 0.1290095
[1] -2.80295781 0.61426579 1.95348934 0.24715200 -0.01194974
[1] -1.2999142 0.4313710 1.3088572 -0.5764644 0.1361504
All the code together
# K and k are always the same (they are read from a file)
K <- matrix(data = c(1, 2, 1, 2, 1,
                     2, 16, 2, 1, 2,
                     1, 2, 8, 2, 1,
                     2, 1, 2, 16, 2,
                     1, 2, 1, 2, 32),
            nrow = 5, ncol = 5, byrow = TRUE)
k <- c(-2, 4, 12, 0, 2)
# j will be changed
minimize <- function(beta, j, k, K, n_s){ #function to minimize (for beta)
  value <- (1/2)*(t(beta)%*%K%*%beta) - t(k)%*%beta + j*abs(sum(beta)-n_s)
  return(value)
}
myfunc <- function(K, k, m) #K is matrix, k is vector
{
  j_values <- 10^seq(-5, 5, length = m)
  for (i in 1:m)
  {
    current_j_value <- j_values[i]
    #I want to set j in minimize function as current_j_value (and also my k and K from file)
    # and then minimize it
    myans <- optim(c(0, 0, 0, 0, 0), minimize, j = current_j_value, k = k, K = K, n_s = 0)
    print(myans$par)
  }
}
myfunc(K, k, 5)
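As an aside, since the question also mentions creating the function dynamically, a closure-based sketch along those lines could look like the following. make_minimize and myfunc2 are names I've made up, and this is an untested alternative rather than part of the solution above:
make_minimize <- function(j, k, K, n_s) {
  # returns a new single-argument function with j, k, K and n_s fixed
  function(beta) {
    (1/2)*(t(beta) %*% K %*% beta) - t(k) %*% beta + j*abs(sum(beta) - n_s)
  }
}

myfunc2 <- function(K, k, m) {
  j_values <- 10^seq(-5, 5, length = m)
  for (current_j_value in j_values) {
    fn <- make_minimize(current_j_value, k, K, n_s = 0)
    print(optim(c(0, 0, 0, 0, 0), fn)$par)   # should match the loop above
  }
}

myfunc2(K, k, 5)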

Identify and quantify cluster types in igraph

I have the following graph, made from this data:
library(igraph)
K01 <- cbind(c(1, 3, 4, 6, 7, 8, 9, 11, 13, 14),
             c(2, 4, 5, 7, 8, 6, 10, 12, 14, 15))
K02 <- graph_from_edgelist(K01, directed = FALSE)
I'd like to be able to quantify the number and type of all subclusters in the graph. That is, I'd like to be able to separate clusters of the same size by whether or not they're circular, or by other properties of the nodes.
I can get cluster size pretty easily with:
K03 <- components(K02)
K04 <- groups(K03)
But this doesn't help me sort by cluster structure.
I'm not entirely sure, but it seems that, as long as there are no loops, a group is circular if its number of edges is equal to or greater than its number of vertices.
with(do.call(rbind, lapply(K04, function(x){
  g = induced_subgraph(K02, x)
  data.frame(n = vcount(g),
             type = ifelse(ecount(g) >= vcount(g), "circular", "simple"))
})), table(n, type))
# type
#n simple circular
# 2 3 0
# 3 2 1
So, thanks to d.b for his help on this. I broke down and wrote a function that I think performs the task I was trying to accomplish. It's a bit verbose and probably clunky, but it seems to work for what I need it to.
######
# take an igraph graph
# and return a nested list the length of the number of unique cluster structures
# where each list item is a list of clusters that shared that structure
# names ?
######
CategorizeSubGraphs <- function(IGraph) {
  cat("\n")
  # get groups
  IGroups <- groups(components(IGraph))
  Structures <- vector("list", length = length(IGroups))
  StructNames <- vector("list", length = length(IGroups))
  AllSubGraphs <- vector("list", length = length(IGroups))
  # initialize a progress bar
  pBar <- txtProgressBar(style = 1L)
  # Get all graph structures, remove names for unique()
  for (i in seq_along(Structures)) {
    AllSubGraphs[[i]] <- induced_subgraph(graph = IGraph, vids = IGroups[[i]])
    Structures[[i]] <- AllSubGraphs[[i]][seq(length(AllSubGraphs[[i]][1]))]
    StructNames[[i]] <- dimnames(Structures[[i]])[[1]]
    dimnames(Structures[[i]]) <- list(NULL, NULL)
    setTxtProgressBar(pb = pBar, value = i / length(Structures))
  }
  cat("\n")
  # categorize by structure
  GraphSizes <- sapply(unique(Structures), function(x) nrow(x))
  AllSizes <- sapply(Structures, function(x) nrow(x))
  GraphTemplates <- unique(Structures)
  Result <- vector("list", length = length(GraphTemplates))
  StructureCategory <- vector("integer", length = length(AllSizes))
  # assign each subgraph to a category
  for (i in seq_along(StructureCategory)) {
    Catch <- which(GraphSizes == AllSizes[i])
    if (length(Catch) == 1L) {
      StructureCategory[i] <- Catch
    } else {
      for (j in seq_along(Catch)) {
        if (all(GraphTemplates[[Catch[j]]] == Structures[[i]])) {
          StructureCategory[i] <- Catch[j]
        }
      }
    }
    setTxtProgressBar(pb = pBar, value = i / length(StructureCategory))
  }
  Count <- rep(0L, length(Result))
  ResultSizes <- sapply(unique(StructureCategory),
                        function(x) length(which(StructureCategory == x)))
  for (i in seq_along(Result)) {
    Result[[i]] <- vector("list", length = ResultSizes[i])
  }
  # collect all subgraphs into their distinct categories
  for (i in seq_along(StructureCategory)) {
    Count[StructureCategory[i]] <- Count[StructureCategory[i]] + 1L
    Result[[StructureCategory[i]]][[Count[StructureCategory[i]]]] <- AllSubGraphs[[i]]
  }
  cat("\n")
  return(Result)
}
So this tries to, and I think successfully does, sort and summarize all the subgraphs in a given graph.
So given:
K01 <- cbind(c(1, 3, 4, 6, 7, 8, 9, 11, 13, 14, 16, 18, 18, 17, 20, 21, 21),
             c(2, 4, 5, 7, 8, 6, 10, 12, 14, 15, 18, 17, 19, 16, 21, 22, 23))
K02 <- graph_from_edgelist(K01, directed = FALSE)
Running the function on this graph:
K03 <- CategorizeSubGraphs(IGraph = K02)
length(K03) # the number of distinct subgraph types
[1] 5
lengths(K03) # the number of individual subgraphs for each type
[1] 3 2 1 1 1
And you can plot the individual subgraphs/subclusters out appropriately, and grab their names from the graph object.
I've not tested this with graphs with interconnected subgraphs, because I need it with a fairly specific type of data, but it's here if anyone else has this question!
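As a possible shortcut, igraph can also test structural equivalence directly: decompose() splits a graph into its connected components and isomorphic() compares two graphs. The grouping loop below is my own rough, untested sketch; note that it groups by unlabeled structure (isomorphism), which may differ slightly from the adjacency-matrix comparison in the function above:
library(igraph)

subs <- decompose(K02)   # one igraph object per connected component

# assign each component to the class of the first earlier component it is
# isomorphic to; otherwise open a new class
classes <- integer(length(subs))
for (i in seq_along(subs)) {
  hit <- 0L
  for (j in seq_len(i - 1L)) {
    if (isomorphic(subs[[i]], subs[[j]])) {
      hit <- classes[j]
      break
    }
  }
  classes[i] <- if (hit == 0L) max(classes) + 1L else hit
}

table(classes)        # how many components fall into each structural type
split(subs, classes)  # the components themselves, grouped by type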

Passing vector elements to call as consecutive arguments

Given function and test vector:
multiply_stuff <- function(...) {
  arguments <- list(...)
  Reduce(f = `*`, x = arguments)
}
test_vec <- c(1, 20, 3, 40, 5, 60)
I would like to create an unevaluated call that automatically lists all elements of the passed vector as arguments. In this example, that would be equivalent to the expression:
call("multiply_stuff",
     test_vec[1],
     test_vec[2],
     test_vec[3],
     test_vec[4],
     test_vec[5],
     test_vec[6])
Attempts
For instance for the vector:
test_vec_B <- c(1, 5, 6, 8, 9, 11, 12, 14, 20, 11)
I would like to automatically list all test_vec_B arguments within call("multiply_stuff",...). Naturally this won't work:
call("multiply_stuff", test_vec_B)
call("multiply_stuff", list(test_vec_B))
Desired results
Unevaluated expression equivalent to:
call(
  "multiply_stuff",
  test_vec_B[1],
  test_vec_B[2],
  test_vec_B[3],
  test_vec_B[4],
  test_vec_B[5],
  test_vec_B[6],
  test_vec_B[7],
  test_vec_B[8],
  test_vec_B[9],
  test_vec_B[10]
)
You can create a call object and then add arguments to it:
multiply_stuff <- function(...) {
  arguments <- list(...)
  Reduce(f = `*`, x = arguments)
}

test_vec_B <- c(1, 5, 6, 8, 9, 11, 12, 14, 20, 11)

get_call <- function(f, arg_vector){
  my_call <- call(f)
  my_call[2:(length(arg_vector) + 1)] <- arg_vector
  return(my_call)
}
multiply_stuff(1, 5, 6, 8, 9, 11, 12, 14, 20, 11)
[1] 878169600
test_call <- get_call("multiply_stuff", test_vec_B)
eval(test_call)
[1] 878169600
Explanation: when you create a call object, you can access/modify the function and its arguments by index just as usual. Index 1 is the function being called; indices from 2 onwards are its arguments. Run this to verify:
test_call2 <- call("sum", 1, 2)
test_call2[1]
test_call2[2]
test_call2[3]
eval(test_call2)
test_call2[3] <- 1234
eval(test_call2)
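For what it's worth, the same unevaluated call can also be built in one step with as.call(); this is a separate sketch of mine (test_call3 is my own name), not part of the answer above:
test_call3 <- as.call(c(list(as.symbol("multiply_stuff")), as.list(test_vec_B)))
test_call3
multiply_stuff(1, 5, 6, 8, 9, 11, 12, 14, 20, 11)
eval(test_call3)
[1] 878169600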

How to efficiently produce a desired matrix in R?

I was trying to produce the following 7 x 4 matrix in R:
m = matrix(c(seq(25, 1, by = -4),
             seq(26, 2, by = -4),
             seq(27, 3, by = -4),
             seq(28, 4, by = -4)), nrow = 7, ncol = 4)
BUT, I'm wondering if I could achieve the same matrix with more efficient R code than what I used above?
Here's a solution:
m <- matrix(rev(c(1:28)),nrow=7,ncol=4,byrow = TRUE)[,rev(1:4)]
And this one is even faster:
m <- matrix(28:1,nrow=7,ncol=4,byrow = TRUE)[,4:1]
m = matrix(c(rep(seq(25, 1, by = -4),4) + rep(c(0:3),each=7) ), nrow = 7, ncol = 4)
Not sure if you would call this more efficient...
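Another compact variant (my own addition, not one of the answers above) builds the same values arithmetically with outer(), since column j is just the first column shifted up by j - 1; m_outer is my own name here:
m_outer <- outer(seq(25, 1, by = -4), 0:3, `+`)
identical(m_outer, matrix(c(seq(25, 1, by = -4),
                            seq(26, 2, by = -4),
                            seq(27, 3, by = -4),
                            seq(28, 4, by = -4)), nrow = 7, ncol = 4))
[1] TRUE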
