R ggplot encoding/locale issue with ssh -X - r

[ This is a cross posting to the R-help mailing list post where this question has remained unanswered so far ]
I am struggling with remote R sessions and a (I suspect) locale related
encoding problem: Using the X11 device (X11forwarding enabled),
whenever I try to plot something containing umlauts using ggplot2, I am
seeing sth like
Error in grid.Call(L_stringMetric, as.graphicsAnnot(x$label)) :
invalid use of -61 < 0 in 'X11_MetricInfo'
Using base graphics is fine as is plotting to another device (pdf, say).
Here is some code to reproduce:
## ssh -X into the remote server
## start R at the remote server
plot(1:10, 1:10, main = "größe")
## this opens a plot window and works as expected
library("ggplot2")
qplot(1:10, 1:10)
## this works still
qplot(1:10, 1:10) + xlab("größe")
## I get the ERROR above
My setup:
locally:
Linux (Debian GNU/Linux 9)
remotely
Linux (RHEL Server release 7.3 (Maipo)
(Maybe) relevant bits of my .ssh/config:
Host theserver
HostName XXX.XXX.XXX.XXX
ForwardX11 yes
ForwardX11Timeout 596h
IdentityFile ~/.ssh/id_rsa
IdentitiesOnly yes
ForwardAgent yes
ServerAliveInterval 300

What version of R do you have (on the remote machine)?
I can replicate this with:
x11(type="Xlib")
library(grid)
convertHeight(stringDescent("größe"), "in")
on R 3.2.5, but not on, e.g., R 3.4.0 (just running R locally in
both cases).

Related

Why won't R.matlab connect to MATLAB?

I am trying to use R.matlab for the first time. I run the following my_sample_program.R in MATLAB using the command:
system('/usr/local/bin/R CMD BATCH my_sample_program.R out.txt');
where the R code is as follows:
test<-function(x,y)
{
library(R.matlab)
matlab <- Matlab(host="localhost",port=9927)
open(matlab)
setVerbose(matlab,-2)
x <- getVariable(matlab, "x")
y <- getVariable(matlab, "y")
some_other_function(x,y,exact = TRUE)
}
test(x,y)
Running the system command is successful from MATLAB's point of view. But I get the error in the out.txt file that says
Error in throw.default(sprintf("Failed to connect to MATLAB on host '%s' (port %d) after trying %d times for approximately %.1f seconds.", :
Failed to connect to MATLAB on host 'localhost' (port 9927) after trying 30 times for approximately 30.0 seconds.
(I used the command sudo lsof | grep localhost in Terminal to find that MATLAB is associated with port 9927.)
My question: How can I connect to MATLAB via R.matlab?

How to connect R Studio to Tableau?

I'm trying to connect R Studio to Tableau Desktop to do some data analysis work, but an error has occurred during connection saying: localhost:6311: Connection refused
I'm using MacOS version 10.13.6
Coding on R:
install.packages("Rserve")
library(Rserve)
Rserve()
Try adding the following parameter to your Rserve() line which will hard-code the port:
Rserve(port = 6311)
If that doesn't work, it is worth troubleshooting the port with the following command in terminal (telnet might need to be installed as it is not installed by default):
telnet localhost 6311
the return from the telnet command should be
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Rsrv0103QAP1
More information on the above here
If the command returns a failure, the problem is certainly outside of Tableau.
Some thoughts from there would be to edit the R config file to explicitly accept remote connections.

How to run r script on remote GPU?

I am trying to run r script on a GPU server provided by the institute.
Specifications of GPU server are as follows:
Host Name: gpu01.cc.iitk.ac.in,
Configuration: Four Tesla T10 GPUs added to each machine with 8 cores in each
Operating System: Linux
Specific Usage: Parallel Programming under Linux using CUDA with C Language
R code:
setwd("~/Documents/tm dataset")
library(ssh)
session <- ssh_connect("dgaurav#gpu01.cc.iitk.ac.in")
print(session)
out <- ssh_exec_wait(session, command = 'articles1_test.R')
Error:
ksh: articles1_test.R: not found
Your dataset and script are only on local machine... you nneed to cooy the to remote server before you can run them.

How to setup workers for parallel processing in R using snowfall and multiple Windows nodes?

I’ve successfully used snowfall to setup a cluster on a single server with 16 processors.
require(snowfall)
if (sfIsRunning() == TRUE) sfStop()
number.of.cpus <- 15
sfInit(parallel = TRUE, cpus = number.of.cpus)
stopifnot( sfCpus() == number.of.cpus )
stopifnot( sfParallel() == TRUE )
# Print the hostname for each cluster member
sayhello <- function()
{
info <- Sys.info()[c("nodename", "machine")]
paste("Hello from", info[1], "with CPU type", info[2])
}
names <- sfClusterCall(sayhello)
print(unlist(names))
Now, I am looking for complete instructions on how to move to a distributed model. I have 4 different Windows machines with a total of 16 cores that I would like to use for a 16 node cluster. So far, I understand that I could manually setup a SOCK connection or leverage MPI. While it appears possible, I haven’t found clear and complete directions as to how.
The SOCK route appears to depend on code in a snowlib script. I can generate a stub from the master side with the following code:
winOptions <-
list(host="172.01.01.03",
rscript="C:/Program Files/R/R-2.7.1/bin/Rscript.exe",
snowlib="C:/Rlibs")
cl <- makeCluster(c(rep(list(winOptions), 2)), type = "SOCK", manual = T)
It yields the following:
Manually start worker on 172.01.01.03 with
"C:/Program Files/R/R-2.7.1/bin/Rscript.exe"
C:/Rlibs/snow/RSOCKnode.R
MASTER=Worker02 PORT=11204 OUT=/dev/null SNOWLIB=C:/Rlibs
It feels like a reasonable start. I found code for RSOCKnode.R on GitHub under the snow package:
local({
master <- "localhost"
port <- ""
snowlib <- Sys.getenv("R_SNOW_LIB")
outfile <- Sys.getenv("R_SNOW_OUTFILE") ##**** defaults to ""; document
args <- commandArgs()
pos <- match("--args", args)
args <- args[-(1 : pos)]
for (a in args) {
pos <- regexpr("=", a)
name <- substr(a, 1, pos - 1)
value <- substr(a,pos + 1, nchar(a))
switch(name,
MASTER = master <- value,
PORT = port <- value,
SNOWLIB = snowlib <- value,
OUT = outfile <- value)
}
if (! (snowlib %in% .libPaths()))
.libPaths(c(snowlib, .libPaths()))
library(methods) ## because Rscript as of R 2.7.0 doesn't load methods
library(snow)
if (port == "") port <- getClusterOption("port")
sinkWorkerOutput(outfile)
cat("starting worker for", paste(master, port, sep = ":"), "\n")
slaveLoop(makeSOCKmaster(master, port))
})
It’s not clear how to actually start a SOCK listener on the workers, unless it is buried in snow::recvData.
Looking into the MPI route, as far as I can tell, Microsoft MPI version 7 is a starting point. However, I could not find a Windows alternative for sfCluster. I was able to start the MPI service, but it does not appear to listen on port 22 and no amount of bashing against it with snowfall::makeCluster has yielded a result. I’ve disabled the firewall and tried testing with makeCluster and directly connecting to the worker from the master with PuTTY.
Is there a comprehensive, step-by-step guide to setting up a snowfall cluster on Windows workers that I’ve missed? I am fond of snowfall::sfClusterApplyLB and would like to continue using that, but if there is an easier solution, I’d be willing to change course. Looking into Rmpi and parallel, I found alternative solutions for the master side of the work, but still little to no specific detail on how to setup workers running Windows.
Due to the nature of the work environment, neither moving to AWS, nor Linux is an option.
Related questions without definitive answers for Windows worker nodes:
How to set up cluster slave nodes (on Windows)
Parallel R on a Windows cluster
Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?
There were several options for HPC infrastructure considered: MPICH, Open MPI, and MS MPI. Initially tried to use MPICH2 but gave up as the latest stable release 1.4.1 for Windows dated back by 2013 and no support since those times. Open MPI is not supported by Windows. Then only the MS MPI option is left.
Unfortunately snowfall does not support MS MPI so I decided to go with pbdMPI package, which supports MS MPI by default. pbdMPI implements the SPMD paradigm in contrast withRmpi, which uses manager/worker parallelism.
MS MPI installation, configuration, and execution
Install MS MPI v.10.1.2 on all machines in the to-be Windows HPC cluster.
Create a directory accessible to all nodes, where R-scripts / resources will reside, for example, \HeadMachine\SharedDir.
Check if MS MPI Launch Service (MsMpiLaunchSvc) running on all nodes.
Check, that MS MPI has the rights to run R application on all the nodes on behalf of the same user, i.e. SharedUser. The user name and the password must be the same for all machines.
Check, that R should be launched on behalf of the SharedUser user.
Finally, execute mpiexec with the following options mentioned in Steps 7-10:
mpiexec.exe -n %1 -machinefile "C:\MachineFileDir\hosts.txt" -pwd
SharedUserPassword –wdir "\HeadMachine\SharedDir" Rscript hello.R
where
-wdir is a network path to the directory with shared resources.
–pwd is a password by SharedUser user, for example, SharedUserPassword.
–machinefile is a path to hosts.txt text file, for example С:\MachineFileDir\hosts.txt. hosts.txt file must be readable from the head node at the specified path and it contains a list of IP addresses of the nodes on which the R script is to be run.
As a result of Step 7 MPI will log in as SharedUser with the password SharedUserPassword and execute copies of the R processes on each computer listed in the hosts.txt file.
Details
hello.R:
library(pbdMPI, quiet = TRUE)
init()
cat("Hello World from
process",comm.rank(),"of",comm.size(),"!\n")
finalize()
hosts.txt
The hosts.txt - MPI Machines File - is a text file, the lines of which contain the network names of the computers on which R scripts will be launched. In each line, after the computer name is separated by a space (for MS MPI), the number of MPI processes to be launched. Usually, it equals the number of processors in each node.
Sample of hosts.txt with three nodes having 2 processors each:
192.168.0.1 2
192.168.0.2 2
192.168.0.3 2

How can I shut down Rserve gracefully?

I have tried many options both in Mac and in Ubuntu.
I read the Rserve documentation
http://rforge.net/Rserve/doc.html
and that for the Rserve and RSclient packages:
http://cran.r-project.org/web/packages/RSclient/RSclient.pdf
http://cran.r-project.org/web/packages/Rserve/Rserve.pdf
I cannot figure out what is the correct workflow for opening/closing a connection within Rserve and for shutting down Rserve 'gracefully'.
For example, in Ubuntu, I installed R from source with the ./config --enable-R-shlib (following the Rserve documentation) and also added the 'control enable' line in /etc/Rserve.conf.
In an Ubuntu terminal:
library(Rserve)
library(RSclient)
Rserve()
c<-RS.connect()
c ## this is an Rserve QAP1 connection
## Trying to shutdown the server
RSshutdown(c)
Error in writeBin(as.integer....): invalid connection
RS.server.shutdown(c)
Error in RS.server.shutdown(c): command failed with satus code 0x4e: no control line present (control commands disabled or server shutdown)
I can, however, CLOSE the connection:
RS.close(c)
>NULL
c ## Closed Rserve connection
After closing the connection, I also tried the options (also tried with argument 'c', even though the connection is closed):
RS.server.shutdown()
RSshutdown()
So, my questions are:
1- How can I close Rserve gracefully?
2- Can Rserve be used without RSclient?
I also looked at
How to Shutdown Rserve(), running in DEBUG
but the question refers to the debug mode and is also unresolved. (I don't have enough reputation to comment/ask whether the shutdown works in the non-debug mode).
Also looked at:
how to connect to Rserve with an R client
Thanks so much!
Load Rserve and RSclient packages, then connect to the instances.
> library(Rserve)
> library(RSclient)
> Rserve(port = 6311, debug = FALSE)
> Rserve(port = 6312, debug = TRUE)
Starting Rserve...
"C:\..\Rserve.exe" --RS-port 6311
Starting Rserve...
"C:\..\Rserve_d.exe" --RS-port 6312
> rsc <- RSconnect(port = 6311)
> rscd <- RSconnect(port = 6312)
Looks like they're running...
> system('tasklist /FI "IMAGENAME eq Rserve.exe"')
> system('tasklist /FI "IMAGENAME eq Rserve_d.exe"')
Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
Rserve.exe 8600 Console 1 39,312 K
Rserve_d.exe 12652 Console 1 39,324 K
Let's shut 'em down.
> RSshutdown(rsc)
> RSshutdown(rscd)
And they're gone...
> system('tasklist /FI "IMAGENAME eq Rserve.exe"')
> system('tasklist /FI "IMAGENAME eq Rserve_d.exe"')
INFO: No tasks are running which match the specified criteria.
Rserve can be used w/o RSclient by starting it with args and/or a config script. Then you can connect to it from some other program (like Tableau) or with your own code. RSclient provides a way to pass commands/data to Rserve from an instance of R.
Hope this helps :)
On a Windows system, if you want to close an RServe instance, you can use the system function in R to close it down.
For example in R:
library(Rserve)
Rserve() # run without any arguments or ports specified
system('tasklist /FI "IMAGENAME eq Rserve.exe"') # run this to see RServe instances and their PIDs
system('TASKKILL /PID {yourPID} /F') # run this to kill off the RServe instance with your selected PID
If you have closed your RServe instance with that PID correctly, the following message will appear:
SUCCESS: The process with PID xxxx has been terminated.
You can check the RServe instance has been closed down by entering
system('tasklist /FI "IMAGENAME eq Rserve.exe"')
again. If there are no RServe instances running any more, you will get the message
INFO: No tasks are running which match the specified criteria.
More help and info on this topic can be seen in this related question.
Note that the 'RSClient' approach mentioned in an earlier answer is tidier and easier than this one, but I put it forward anyway for those who start RServe without knowing how to stop it.
If you are not able to shut it down within R, run the codes below to kill it in terminal. These codes work on Mac.
$ ps ax | grep Rserve # get active Rserve sessions
You will see outputs like below. 29155 is job id of the active Rserve session.
29155 /Users/userid/Library/R/3.5/library/Rserve/libs/Rserve
38562 0:00.00 grep Rserve
Then run
$ kill 29155

Resources