multicore on Linux does not use multiple CPUs

I am using R 2.14.0 (64-bit) on Linux. I followed the example described here, and then ran it as follows:
library(doMC)      # also loads foreach and iterators
registerDoMC()     # registers the default number of worker cores

# x (a two-column matrix) and trials are defined in the linked example
system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  }
})
However, top shows it using only one CPU core. To demonstrate this another way: when I check a process that does use all cores, I see -
ignorant#mybox: ~/R$ ps -p 5369 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
5369 5369 0 0.1
5369 5371 1 0.0
5369 5372 2 0.0
5369 5373 3 0.0
5369 5374 4 0.0
5369 5375 5 0.0
5369 5376 6 0.0
5369 5377 7 0.0
But in this case, I see -
ignorant#mybox: ~/R$ ps -p 7988 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
7988 7988 0 19.9
ignorant#mybox: ~/R$ ps -p 7991 -L -o pid,tid,psr,pcpu
PID TID PSR %CPU
7991 7991 0 19.9
How can I get it to use multiple cores? I am using multicore rather than doSMP or something else because I do not want each worker process to hold its own copy of my data.

You could try executing your script using the command:
$ taskset 0xffff R --slave -f parglm.R
If this fixes the problem, you may have a version of R that was built against OpenBLAS or GotoBLAS2. Those libraries set the CPU affinity so that only one core can be used, which is a known problem.
If you want to run your example interactively, start R with:
$ taskset 0xffff R
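To confirm that affinity is the culprit without restarting anything, taskset can also inspect and change the mask of an already-running process (7988 is the R worker PID from the ps output above; substitute your own):

```shell
# Show the current CPU affinity mask of the running R process:
taskset -p 7988
# A mask of 1 means it is pinned to CPU 0; widen it to all CPUs:
taskset -p 0xffff 7988
```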

First, you might want to look at htop, which is probably available in your distribution's repositories; it clearly shows the usage of each CPU.
Second, have you tried setting the number of cores on the machine directly?
Run this with htop open:
library(doMC)
registerDoMC(cores = 12)  # set this to the number of cores you actually have
system.time({
  r <- foreach(1:1000, .combine = cbind) %dopar% {
    mean(rnorm(100000))
  }
})
# I get:
# user system elapsed
# 12.789 1.136 1.860
If the user time is much higher than the elapsed time, you are probably using more than one core (a rule of thumb, not a guarantee).
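If htop is not available, the per-CPU counters in /proc/stat (Linux only) give a crude view of which cores are busy; take two snapshots while the loop runs and compare:

```shell
# One cpuN line per logical CPU; the first field after the label is
# cumulative user time in jiffies. Cores whose numbers grow between
# the two snapshots are the busy ones.
grep '^cpu[0-9]' /proc/stat
sleep 1
grep '^cpu[0-9]' /proc/stat
```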

Related

Solaris: Equivalent command for chronyc sources and chronyc tracking

As per the title, I want to check the output of the equivalent commands, but on a Solaris 11 box.
Is there a similar command?
Solaris 11 ships the standard NTP daemon, so you can use:
ntpq -p
to get the list of peers. The same command also shows some local time-sync parameters:
root#sol1:/etc/inet# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+time.cloudflare 10.74.8.178 3 u 15 64 3 9.287 -241.90 210.255
*mail.eban-meban 147.125.80.35 3 u 36 64 1 4.415 -192.07 16.270
-purple.bonev.co 151.237.71.222 2 u 14 64 3 5.197 -242.41 219.576
+ntp.netguard.bg 20.39.126.15 2 u 47 64 3 4.570 -162.62 147.576
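For a chronyc tracking-style summary (stratum, offset, root delay of the selected source) rather than the peer list, ntpq's readvar command is the closest analogue I know of:

```shell
# Prints the NTP daemon's system variables, roughly the information
# that chronyc tracking reports:
ntpq -c rv
```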

How to readRDS from stdin?

I am trying to use the following command to read an RDS file, but it doesn't work. My OS is Mac OS X.
$ lr -e "readRDS(file('stdin'))" < /tmp/x.rds
Error in readRDS(file("stdin")) : unknown input format
$ lr -p -e "readRDS('/dev/stdin')" < /tmp/x.rds
Error in readRDS("/dev/stdin") : error reading from connection
But this works.
$ lr -p -e "readRDS('/tmp/x.rds')"
x y
1 1 11
2 2 12
3 3 13
Does anybody know how to readRDS from stdin? Thanks.
It works for me (on Linux, using littler 0.3.9 with R-devel) when using '/dev/stdin' instead of 'stdin'; so try:
lr -p -e "print(readRDS('/dev/stdin'))" < /tmp/x.rds
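The underlying issue is that saveRDS() gzip-compresses by default, and readRDS() does not add a decompression layer when handed an already-open connection; wrapping the connection in gzcon() should supply it. A sketch using Rscript (assuming R is on the PATH; the file path matches the question):

```shell
# gzcon() adds the gzip decompression that a bare file("stdin")
# connection lacks, which is what triggers "unknown input format":
Rscript -e 'saveRDS(data.frame(x = 1:3, y = 11:13), "/tmp/x.rds")'
Rscript -e 'print(readRDS(gzcon(file("stdin"))))' < /tmp/x.rds
```

The same wrapping should work under littler: lr -p -e "print(readRDS(gzcon(file('stdin'))))" < /tmp/x.rds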

R's pipe() function and Ubuntu console gives different result

I am using R on my Ubuntu machine with the latest configuration.
In R, I get the result below:
> read.fwf(pipe('ps -ef | grep /var/lib/docker/'), width = 60)
V1
1 root 29155 29151 0 11:18 pts/0 00:00:00 sh -c ps -ef
2 root 29157 29155 0 11:18 pts/0 00:00:00 grep /var/li
However, in the Ubuntu console I get a different result:
ps -ef | grep /var/lib/docker/
root 29150 2509 0 11:17 pts/0 00:00:00 grep --color=auto /var/lib/docker/
I wanted R to fetch the PID of the /var/lib/docker/ process, which according to the Ubuntu console is 2509.
Can anyone help me understand why I am getting different results, and how to fetch the PID correctly?
Thanks,
Use ps() from the ps package. This function returns a data frame of process information.
library(ps)
pid_df <- ps()
pid_df$pid[grepl("docker", pid_df$name)]
or in one line (note grepl(), not grep(): subset() needs a logical vector):
subset(ps(), grepl("docker", name))$pid
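An alternative without extra packages is pgrep, which matches process names (or full command lines with -f) and never lists itself, so the sh and grep helpers that pipe() spawns cannot pollute the result. Matching the docker path from the question would be pgrep -f /var/lib/docker/; the demo below matches a process it starts itself, so it can run anywhere:

```shell
# -f matches against the full command line, so a process can be found
# by an argument such as a path; the $ anchor keeps the pattern from
# matching anything but the sleep we start here:
sleep 293 &
pid=$!
pgrep -f 'sleep 293$'   # prints the sleep's PID; pgrep never lists itself
kill "$pid"
```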

Filter output of 'ps aux'

Running ps aux returns:
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -bash
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -test1
I am attempting to filter on bash with wildcards, so that just
USER 131 2.1 0.1 23423 423 FFF/5 R 10:12 0:00 -bash
is returned:
ps aux|grep "*bash*"
which returns an invalid-option error:
grep: invalid option -- 'p'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
How do I filter the output for bash?
You should just use ps aux | grep 'bash' and it will work the way you want. Inside grep, * is the regex repetition operator ("zero or more of the preceding item"), not the shell's * wildcard character, so "*bash*" does not mean what you intended.
ps aux | grep bash | grep -v grep
to return all bash processes while excluding the grep process itself
Some versions of ps support this directly. For example, to list all processes whose name is bash, run ps like this:
ps -C bash
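A related idiom avoids the self-match problem without a second grep: a character class like [b]ash matches bash but not the grep command line itself, because grep's own argument contains the brackets literally. The demo below uses a process it starts itself, so the match is guaranteed:

```shell
# "[s]leep 287" matches the sleep's command line but not grep's own;
# the same trick for the question is: ps aux | grep '[b]ash'
sleep 287 &
pid=$!
ps aux | grep '[s]leep 287'
kill "$pid"
```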

Hung parallel processes in R: icc vs gcc

I've noticed strange behaviour when launching parallel processes in R that appears only when R is built with icc: the spawned parallel processes are not killed when the main process ends.
Example code is as follows:
library(foreach)
library(doMC)
registerDoMC(cores = 4)

d <- rep(1, 16)
t <- foreach(i = 1:4, .combine = c) %dopar% {
  s <- foreach(1:4, .combine = c) %do% 1 * 1
}
identical(t, d)
Here we see that the 4 spawned processes are orphaned when the script completes:
build$ Rscript HungRProcs.R
Loading required package: iterators
Loading required package: parallel
[1] TRUE
build$ ps -elf | grep R
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
1 S root 1427 2 0 80 0 - 0 worker May15 ? 00:00:00 [SCIF INTR 0]
0 S build 19173 26999 0 80 0 - 35960 poll_s 12:27 pts/1 00:00:00 vim RStats-3.0.3-dw.spec
1 S walling 24425 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24426 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24427 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
1 S walling 24428 1 1 80 0 - 51468 hrtime 13:11 pts/5 00:00:00 /home1/00157/walling/software/R-3.1.0/bin/exec/R --slave --no-restore --file=HungRProcs.R --args
0 R walling 24430 21882 0 80 0 - 27561 - 13:11 pts/5 00:00:00 ps -elf
0 S walling 24431 21882 0 80 0 - 25814 pipe_w 13:11 pts/5 00:00:00 grep R
The configure used for the icc build is as follows:
build$ ./configure --prefix=/home1/00157/walling/software/R-3.1.0 CC=icc F77=ifort FC=ifort CXX=icpc
If built with gcc, the spawned processes are terminated when the main process completes. The configure used for the gcc build is as follows:
build$ ./configure --prefix=/home1/00157/walling/software/R-3.1.0 CC=gcc F77=gfortran FC=gfortran CXX=gcc
I have run tests against both R 3.0.3 and 3.1.0, with different parallel backends (doMC, doSNOW, and plain mclapply), with multiple versions of the GNU and Intel compilers, and on both CentOS 5.10 and 6.5. All test cases show the same behaviour.
Any ideas why the compiler would affect proper termination of spawned sub-processes?
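None of this explains the compiler difference, but as a stopgap the orphans can be cleaned up by matching the script name on their command lines (HungRProcs.R is the script from the example above):

```shell
# pkill -f matches against the full command line, so it catches forked
# R workers still carrying --file=HungRProcs.R; the [.] is the same
# self-match-avoidance trick as grep '[b]ash', and || true keeps the
# command from failing when there is nothing left to kill:
pkill -f 'HungRProcs[.]R' || true
```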
