Parallel computing on two R servers using batchtools/BatchJobs

I'm trying to use batchtools/BatchJobs for parallel computing on two Unix-based R servers. I'm completely new to this, so I followed a few articles and the package documentation. I have added some links below:
batchtools,
BatchJobs
So far I have not really understood how to use batchtools across multiple machines; my best guess from the docs is the sketch below.
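(An untested sketch: in batchtools, makeClusterFunctionsSSH takes a list of Worker objects, and the ncpus value here is my assumption.)
library(batchtools)
reg = makeRegistry("TestExpBatchtools")
reg$cluster.functions = makeClusterFunctionsSSH(
  workers = list(Worker$new("sla19438", ncpus = 4))
)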
With BatchJobs, on the other hand, I have made better progress. I first made an SSH connection from the terminal and then executed the following lines:
reg = makeRegistry("TestExp")
reg$cluster.functions = makeClusterFunctionsSSH(worker = makeSSHWorker(nodename = "sla19438")) # By BatchJobs
# Test function
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)
BatchJobs::batchMap(reg = reg, fun = piApprox, n = rep(1e7, 10))
getJobTable()
BatchJobs::submitJobs(reg = reg, resources = list(walltime = 3600, memory = 1024))
getStatus(reg = reg)
loadResult(reg = reg, id = 5)
mean(sapply(1:10, loadResult, reg = reg))
It works and gives me the results, but I can't see any indication of the jobs being run on the other machine (sla19438) when I run "top" in the terminal.
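As a sanity check, I am thinking of mapping a job over Sys.info() and seeing which hostname comes back (a sketch; setConfig is how the BatchJobs docs register cluster functions, and this is otherwise untested):
library(BatchJobs)
setConfig(cluster.functions = makeClusterFunctionsSSH(makeSSHWorker(nodename = "sla19438")))
reg2 = makeRegistry(id = "WhereAmI")
batchMap(reg2, function(i) Sys.info()[["nodename"]], 1)
submitJobs(reg2)
waitForJobs(reg2)
loadResult(reg2, 1) # should return "sla19438" if the job really ran remotely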
Please help me understand what I'm doing wrong. Maybe some configuration is needed, but I can't find any material online that breaks the steps down for a newbie like me.
Thanks

Related

XPlot trying to visualize data

I am having a really hard time trying to visualize some data using F#. I am trying to achieve this in a Linux environment, using Jupyter notebooks that I am running on localhost. I am following this article.
Everything seems to be fine: I managed to load all the needed script files, such as MathNet.Numerics and XPlot. I don't get any errors, my terminal is fine, and the kernel is in place. So why am I not getting any graph representation after I run my code?
It only says that I get back XPlot.Plotly.PlotlyChart; where is the actual graph? I am not sure if this is enough information to help me out; if not, let me know and I will fill in more. I tried different browsers as well, which didn't help.
Actual code:
#load #"<project-root>/.paket/load/net45/MathNet.Numerics.fsx"
#load #"<project-root>/.paket/load/net45/MathNet.Numerics.FSharp.fsx"
#load #"<project-root>/.paket/load/net45/XPlot.Plotly.fsx"
open System
open System.Linq
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
open XPlot.Plotly
let n = 40
let nbsim = 1000
let lambda = 0.2
let randomSeed = 1111
let exponential = Exponential.Samples(new Random(randomSeed), lambda) |> Seq.take (n * nbsim) |> List.ofSeq
let m = Matrix<float>.Build.DenseOfRowMajor(nbsim, n, exponential)
let means = m.RowSums() / (float n)
means.Average()
let historyTrace =
    Histogram(
        x = means,
        xbins =
            Xbins(
                start = 2.8,
                ``end`` = 7.75,
                size = 0.08
            ),
        marker =
            Marker(
                color = "yellow",
                line =
                    Line(
                        color = "grey",
                        width = 1
                    )
            ),
        opacity = 0.75,
        name = "Exponential distribution"
    ) :> Trace
let meanTrace =
    Scatter(
        x = [5; 5],
        y = [0; 60],
        name = "Theoretical mean"
    ) :> Trace
// Or use plain historyTrace below
[historyTrace; meanTrace]
|> Chart.Plot
|> Chart.WithXTitle("Means")
|> Chart.WithYTitle("Frequency")
|> Chart.WithTitle("Distribution of 1000 means of exponential distribution")
Please note that the #load statements include a <project-root> placeholder. I am using Paket to generate the scripts for #load.
This worked for me in the F# Azure Notebook.
Make sure to include this in a cell before you invoke the chart:
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
This is a quote from FSharp for Azure Notebooks:
Note that we had to #load two helper scripts in order to load the
assemblies we need and to enable Display to show our charts. The first
downloads and installs the required Paket packages, and the second
sets up Display support.
The key line for you is: #load "XPlot.Plotly.fsx"
That is the one that lets you display the chart in the notebook.
This is my code in the Azure notebook:
// cell 1
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
// cell 2
Paket.Package [ "MathNet.Numerics"
                "MathNet.Numerics.FSharp" ]
#load "Paket.Generated.Refs.fsx"
// cell 3
open System
open System.Linq
open MathNet.Numerics.Distributions
open MathNet.Numerics.LinearAlgebra
open XPlot.Plotly
let n = 40
let nbsim = 1000
let lambda = 0.2
let randomSeed = 1111
let exponential = Exponential.Samples(new Random(randomSeed), lambda) |> Seq.take (n * nbsim) |> List.ofSeq
let m = Matrix<float>.Build.DenseOfRowMajor(nbsim, n, exponential)
...

Unable to start Julia connection on port 1023: all connections are in use

I am trying to run a Julia function from R using the XRJulia package. Below is my code snippet.
## start
library(XRJulia)
prevInterface <- XR::getInterface()
if (is.null(prevInterface)) {
  ev <- RJulia(.makeNew = TRUE)
} else {
  ev <- RJulia(.makeNew = FALSE)
}
juliaAddToPath(directory = '/home/.julia/lib/v0.6/', package = NULL, evaluator = ev)
runjl <- juliaEval('function sum(a, b)
    c = a + b;
    return c
end
')
runjl_function <- JuliaFunction(runjl)
sum_result <- runjl_function(1, 5)
XR::rmInterface(XR::getInterface())
## end
This code works fine. But occasionally, when I run it multiple times, I get the error:
Unable to start Julia connection on port 1023: all connections are in use.
How do I close all Julia connections, and what is the systematic way to do this? Please suggest.
There is the function ServerQuit() in RJuliaConnect.R:
https://github.com/johnmchambers/XRJulia/blob/master/R/RJuliaConnect.R
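A sketch of how that might be used, assuming the evaluator object exposes ServerQuit() as a method (untested):
ev <- XR::getInterface()
if (!is.null(ev)) {
  ev$ServerQuit()       # ask the Julia server process to quit, freeing the port
  XR::rmInterface(ev)   # then remove the interface object on the R side
}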

NSGA2 Genetic Algorithm in R

I am working with the NSGA2 implementation in R (package mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, to 3 decimal places) instead of using full floating-point precision?
2) How do I set an equality constraint? The examples online all seem to deal with >= or <= rather than =, and I'm not sure I'm doing it right.
My entire relevant code, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
A relevant sample of the code is reproduced here:
VTR = get.hist.quote(instrument = 'VTR',
                     start = "2010-01-01", end = "2015-12-31",
                     quote = c("AdjClose"), provider = "yahoo",
                     compress = "d")
ObjFun1 <- function(xh) {
  f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP), xh, 0.05, 2))
  tempt = merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
  tempt2 = tempt[(nrow(tempt) - (2 * N)):nrow(tempt), ]
  tempt2[is.na(tempt2)] <- 0 # replace any NAs with 0 (same effect as an element-wise loop)
  f2 <- ((-1) * abs(sum((xh * t(tempt2)))))
  c(f1 = f1, f2 = f2)
}
Constr <- function(xh) {
  totwt <- (1 - sum(-xh))
  totwt2 <- (sum(xh) - 1)
  c(totwt, totwt2)
}
Solution1 <- nsga2(ObjFun1, n.projects, 2,
                   lower.bounds = rep(0, n.projects), upper.bounds = rep(1, n.projects),
                   popsize = n.solutions, constraints = Constr, cdim = 1,
                   generations = generations)
The function HSVaR_P returns matrix(x, 2*500, 1).
Even when I set generations = 1, the code does not seem to run. Clearly there is an error somewhere in the code, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
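In case it clarifies 1) and 2): my understanding (which may be wrong) is that mco treats constraint values >= 0 as feasible, so an equality such as sum(x) = 1 could be approximated by two inequalities with a small tolerance, and precision could be limited by rounding inside the objective. A sketch, where eps and the rounding are my own additions:
eps <- 1e-6
ConstrEq <- function(xh) {
  c(sum(xh) - 1 + eps,  # feasible when sum(xh) >= 1 - eps
    1 - sum(xh) + eps)  # feasible when sum(xh) <= 1 + eps
}
# cdim would then need to be 2 to match the two constraint values
ObjFun1Rounded <- function(xh) {
  ObjFun1(round(xh, 3)) # evaluate objectives on weights rounded to 3 decimals
}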
Thanks.

Torch nn: current error is always nan

I've written the following code:
require 'nn'
require 'cunn'

-- read the training set from a binary file
file = torch.DiskFile('train200.data', 'r')
size = file:readInt()
inputSize = file:readInt()
outputSize = file:readInt()

dataset = {}
function dataset:size() return size end

for i = 1, dataset:size() do
  local input = torch.Tensor(inputSize)
  for j = 1, inputSize do
    input[j] = file:readFloat()
  end
  local output = torch.Tensor(outputSize)
  for j = 1, outputSize do
    output[j] = file:readFloat()
  end
  dataset[i] = {input:cuda(), output:cuda()}
end

-- two hidden Tanh layers, MSE loss
net = nn.Sequential()
hiddenSize = inputSize * 2
net:add(nn.Linear(inputSize, hiddenSize))
net:add(nn.Tanh())
net:add(nn.Linear(hiddenSize, hiddenSize))
net:add(nn.Tanh())
net:add(nn.Linear(hiddenSize, outputSize))
criterion = nn.MSECriterion()
net = net:cuda()
criterion = criterion:cuda()

trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.02
trainer.maxIteration = 100
trainer:train(dataset)
And it should work (at least I think so), and it does work correctly when inputSize = 20. But when inputSize = 200, the current error is always nan. At first I thought the file-reading part was incorrect; I rechecked it several times and it works fine. I also found that a learning rate that is too small or too large can sometimes cause this, so I tried learning rates from 0.00001 up to 0.8, but the result is the same. What am I doing wrong?
Thanks,
Igor

Increase the GitHub rate limit in R

I am using the rgithub package in R to extract pull requests from GitHub, and I want to increase the API rate limit. I have registered an application and generated the client IDs; in the code below, ctx holds those client details. To increase the rate limit I am running the curl command shown at the end. Is this the correct way to increase the rate limit? Please suggest any other possible way.
library(github)   # rgithub
library(plyr)     # try_default
library(magrittr) # %>%
ctx = interactive.login("clientid", "client_secret_id")
owner = "a"
repo = "repo_name"
comments <- function(i) {
  commits <- get.issue.comments(owner = owner, repo = repo, number = i,
                                ctx = get.github.context(), per_page = 100)
  links <- digest_header_links(commits)
  number_of_pages <- links[2, ]$page
  if (number_of_pages != 0)
    try_default(for (n in 1:number_of_pages) {
      if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
        Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`)
                  - as.POSIXct(Sys.time()) %>% as.integer())
      else
        get.issue.comments(owner = owner, repo = repo, number = i,
                           ctx = get.github.context(), per_page = 100, page = n)
    }, default = NULL)
  else
    return(commits)
}
list <- c(issueid) # an issue id on GitHub
comments_lists <- lapply(list, comments)
curl -i 'https://api.github.com/users/whatever?client_id=&client_secret=yyyy'
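For reference, this is how I would double-check which rate limit the credentials actually get (a sketch using httr directly rather than rgithub; the client id and secret are the same placeholders as above):
library(httr)
resp <- GET("https://api.github.com/rate_limit",
            query = list(client_id = "clientid", client_secret = "client_secret_id"))
content(resp)$resources$core # "limit" should show 5000 once the credentials are accepted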
