Running R code on Linux in parallel on a computing cluster

I've recently moved my Windows R code to a Linux installation to run DEoptim on a function. On my Windows system it all worked fine using:
ans <- DEoptim(Calibrate, lower, upper,
               DEoptim.control(trace = TRUE, parallelType = 1,
                               parVar = parVarnames3,
                               packages = c("hydromad", "maptools", "compiler", "tcltk", "raster")))
where the function 'Calibrate' is built from multiple functions. On the Windows system I simply downloaded the packages needed into the R library. The parallelType=1 option ran the code across a series of cores.
However, now I want to put this code onto a Linux-based computing cluster. The function 'Calibrate' works fine standalone, as does DEoptim when run on one core. However, when I specify parallelType=1, the code fails and returns:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
7 nodes produced errors; first error: there is no package called ‘raster’
This error is reproduced for whatever package I try to load, even though the
library(raster)
command works fine and 'raster' is clearly listed as installed when I call all the libraries using:
library()
So my gut feeling is that even though all the packages and libraries load fine, the 'packages' element of DEoptim.control is looking in a different place because I have used a personal library. An example of how the packages were installed is below:
install.packages("/home/antony/R/Pkges/raster_2.4-15.tar.gz",rpeo=NULL,target="source",lib="/home/antony/R/library")
I also set the library path option as below:
.libPaths('/home/antony/R/library')
Does anybody have any idea what I am doing wrong, and how to set the 'packages' option in DEoptim.control so that DEoptim runs across multiple cores in parallel?
Many thanks, Antony
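A sketch of one likely fix, assuming the diagnosis above is right: the worker R processes that DEoptim spawns do not inherit an interactive .libPaths() call, but they do inherit environment variables and consult R_LIBS_USER at startup. Setting that variable before the parallel run begins should let the workers find 'raster' in the personal library (paths are taken from the question; the DEoptim call shown in comments repeats the one above).

```r
## Assumption: workers fail because they look only in the system
## library. Point R_LIBS_USER at the personal library before DEoptim
## creates its cluster; child R processes inherit the variable.
lib <- "/home/antony/R/library"

Sys.setenv(R_LIBS_USER = lib)   # picked up by worker R sessions
.libPaths(lib)                  # and by the current session

## Then run DEoptim as before:
## ans <- DEoptim(Calibrate, lower, upper,
##                DEoptim.control(trace = TRUE, parallelType = 1,
##                                parVar = parVarnames3,
##                                packages = c("hydromad", "maptools",
##                                             "compiler", "tcltk", "raster")))
```

Adding `R_LIBS_USER=/home/antony/R/library` to `~/.Renviron` achieves the same thing persistently, since every new R process reads that file at startup.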

Related

Why is R not plotting anything and crashes?

Suddenly R is not working properly anymore. Everything that requires some sort of visualization causes R to run indefinitely and ultimately crash. Even the simplest code, such as hist(rnorm(50)), does not produce anything. After a while I get the message: "Terminate R, R is not responding to your request to interrupt processing so to stop the current operation you may need to terminate R entirely".
I use an M1 MacBook and installed the most recent version of R (v4.2.1, the Apple silicon arm64 build for M1 Macs) and RStudio Desktop (2022.07.1+554). All packages are up to date. I tried restarting R, reinstalling R, and dev.off(). All the other functions work fine.
Does anyone know what to do?
I found a solution to my problem. Apparently R was trying to find a font to use to display images. When I restarted my computer, R gave the error message: "In doTryCatch(return(expr), name, parentenv, handler) :
no font could be found for family "Arial" "
A resolution for this problem is provided elsewhere:
RStudio cannot find fonts to be used in plotting
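A minimal sketch of the kind of workaround the linked answer describes, assuming the hang really comes from the graphics device searching for the missing "Arial" family: explicitly select a generic family the system is guaranteed to provide before plotting.

```r
## Assumption: the font lookup is what stalls rendering. Using the
## built-in "sans" family avoids searching for "Arial" entirely.
pdf(NULL)                 # null device, so this sketch runs anywhere
par(family = "sans")      # generic family every platform maps somewhere
hist(rnorm(50))           # renders without a font lookup
invisible(dev.off())
```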

R processx package error Cannot write connection (system error 32, Broken pipe)

I am getting the error below after a long-running process in an Ubuntu Docker container. I am using the rocker/tidyverse:3.6.3 base image. My forecast takes about 2 hours to run with multidplyr and builds a fable, or forecast table, with the fable package.
At the end of the script there is a write to Hive, where the function below writes the dataframe to Hive. This is where the error happens, as far as I can follow from the messages I have built into the script. For shorter runs the code works just fine and the table is built in Hive. Unfortunately, I can't provide a reprex because it is internal to my work.
<c_error in rethrow_call(c_processx_connection_write_bytes, con, str):
Cannot write connection (system error 32, Broken pipe) #processx-connection.c:627 (processx_c_connection_write_bytes)>
in process
The dependencies for the function are ssh, dplyr, readr, askpass, and magrittr, but the error comes from a package that I am unfamiliar with, processx. I believe it is an RStudio-supported package because it lives at r-lib.org.
The function being used can be found here; it is too long to paste:
https://github.com/Fredo-XVII/RToolShed/blob/master/R/write_df_to_hive3.R
Any help would be greatly appreciated. Thank you!
P.S. I was not able to add #processx as a tag, so if anyone can add it, I would be grateful.
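One hedged avenue worth trying, given the symptom: "Broken pipe" (system error 32) typically means the other end of the connection closed while the 2-hour forecast was running, so the write fails only when it finally happens. Opening the ssh session just before the write, and retrying once on failure, sidesteps a stale connection. The helper below is a generic retry wrapper; the commented usage shows how it might wrap the linked function (the session handling and call shape there are assumptions, not taken from that code).

```r
## Generic retry wrapper: run write_fn, and on error try again with a
## fresh attempt, up to 'attempts' times.
write_with_retry <- function(write_fn, attempts = 2) {
  for (i in seq_len(attempts)) {
    ok <- tryCatch({
      write_fn()
      TRUE
    }, error = function(e) {
      message("write attempt ", i, " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(invisible(TRUE))
  }
  stop("all write attempts failed")
}

## Hypothetical usage (host and call shape are assumptions):
## write_with_retry(function() {
##   session <- ssh::ssh_connect("user@hive-host")  # fresh session per attempt
##   on.exit(ssh::ssh_disconnect(session))
##   write_df_to_hive3(df, session)                 # function linked above
## })
```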

Deploy custom R script as web service Azure ML Studio

I have an R script which takes as input an Excel file with two columns containing dates and values, and gives as output 3 dates with the corresponding prediction values. I have already implemented it successfully in Azure Machine Learning Studio using three nodes: one containing the zipped packages I use, one with the input .csv file, and one with the R script.
The problem is that when I deploy it as a web service and try to give new values for Col1 and Col2 as input, I receive the following error:
FailedToParseValue: Failed to parse value '90000, 950000, 970000' as type 'System.Double'., Error code: LibraryExecutionError, Http status code:400
The zipped libraries I attach are: Hmisc, gdata, forecast, lubridate, fma, expsmooth, ggplot2, tsibble, fpp2, and plyr. I have also tried using the notebooks provided, but with no luck, as I always face some kind of problem with package installation. Moreover, I tried to follow this approach https://azure.github.io/azureml-sdk-for-r/articles/train-and-deploy-to-aci/train-and-deploy-to-aci.html locally from RStudio, but I have difficulty adapting it to my case.
Any help would be greatly appreciated!
I didn't have any success installing packages via a zip. However, the following worked for me in an Execute R Script module and installed all dependencies as well.
if (!require("package", character.only = TRUE)) install.packages("package", repos = "https://ftp.heanet.ie/mirrors/cran.r-project.org/")
Make sure that your repo is from CRAN in your country.
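The one-liner above can be wrapped as a helper and applied to the question's whole package list (the mirror URL is kept from the answer; swap in a CRAN mirror near you):

```r
## Install a package from CRAN only if it is missing, then attach it.
ensure_pkg <- function(p, repo = "https://ftp.heanet.ie/mirrors/cran.r-project.org/") {
  if (!requireNamespace(p, quietly = TRUE))
    install.packages(p, repos = repo)
  library(p, character.only = TRUE)   # character.only: 'p' is a string
}

## Package list from the question:
## for (p in c("Hmisc", "gdata", "forecast", "lubridate", "fma",
##             "expsmooth", "ggplot2", "tsibble", "fpp2", "plyr"))
##   ensure_pkg(p)
```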

RStudio: Rook does not work?

I would like to build a simple web server using Rook; however, I am getting strange errors when trying it in RStudio:
The code
library(Rook)
s <- Rhttpd$new()
s$start()
print(s)
returns the rather unhelpful error:
Error in listenPort > 0 :
comparison (6) is possible only for atomic and list types
When trying the same code in a plain R console, everything works, so I would like to understand why this happens and how I can fix it.
RStudio is version 0.99.484 and R is 3.2.2.
I've experienced the same thing.
TL;DR: This pull request solves the problem: https://github.com/jeffreyhorner/Rook/pull/31
RStudio is treated differently, and the Rook port is the same as the tools:::httpdPort value. The problem is that the current Rook master reads tools:::httpdPort directly, but it is a function, which is why it needs to be evaluated first.
If you want it solved right now, without waiting for the merge into master, install devtools and load the package from my fork on GitHub:
install.packages("devtools")
library(devtools)
install_github("filipstachura/Rook")
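The distinction the answer describes can be checked directly (a small sketch; whether httpdPort is a closure or a plain value depends on the R version, which is exactly why comparing it unevaluated triggers the "atomic and list types" error):

```r
## In newer R versions tools:::httpdPort is a function, so it must be
## called rather than compared; comparing the closure itself is what
## produces "comparison ... is possible only for atomic and list types".
is_fun <- is.function(tools:::httpdPort)
port   <- if (is_fun) tools:::httpdPort() else tools:::httpdPort
```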

RStudio cannot find any package after laptop restart

My R script worked fine in RStudio (version 0.98.1091) on Windows 7. Then I restarted my laptop, opened RStudio again, and now it produces the following error messages each time I try to execute my code:
cl <- makeCluster(mc); # build the cluster
Error: could not find function "makeCluster"
> registerDoParallel(cl)
Error: could not find function "registerDoParallel"
> fileIdndexes <- gsub("\\.[^.]*","",basename(SF))
Error in basename(SF) : object 'SF' not found
These error messages are slightly different each time I run the code. It seems that RStudio cannot find any function that is used in the code.
I restarted the R session, cleaned the workspace, and restarted RStudio. Nothing helps.
It should be noted that after many attempts to execute the code, it finally initialized. However, after 100 iterations it crashed with a message about the unavailability of localhost.
Add library(*the package needed/where the function is*) for each of the packages you're using.
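Following the answer above, a minimal sketch for the question's first two lines (the worker count is arbitrary, and the doParallel lines are commented out in case that package is not installed): attaching parallel before makeCluster() is called makes the function visible again.

```r
library(parallel)        # ships with R; provides makeCluster()
# library(doParallel)    # would provide registerDoParallel()

mc  <- 2                          # e.g. two workers
cl  <- makeCluster(mc)            # found now that parallel is attached
res <- clusterEvalQ(cl, 1 + 1)    # quick sanity check on the workers
# registerDoParallel(cl)          # works once doParallel is attached
stopCluster(cl)
```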
