Rserve - Scalability related

My requirement is to execute an R script through a Java web service. The web service needs to handle a concurrency of 50.
We are using Rserve to execute the R script from the Java code. To achieve this, we created 50 Rserve instances on the Linux server, each started on a different port. Inside the Java application we created a connection pool of 50 RConnection objects, each linked to one of the Rserve instances. For every execution, we fetch an RConnection from the pool, execute the R script, get the response value, and then return the RConnection to the pool.
When the web service is accessed by a single user, the R execution completes in 1 second. However, if I run the same web service with a concurrency of 50, it takes around 30 seconds to execute the R script inside Rserve.
Since the actual R execution takes only 1 second for a single user, I think I'm doing something wrong with Rserve. Any pointers would help.

Although I think it is best to use a single Rserve instance on Linux and let it fork a child process per connection, that by itself may not speed up processing at all.
From your question it is not clear whether the application is used intensively, with many concurrent requests being processed continually. If that is the case, I assume your R code is CPU intensive and the different processes simply have to share CPU time, increasing the wall-clock time needed to complete.
I tested exactly that kind of scenario and observed the following with top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
33839 ***** 20 0 269792 57104 3496 R 10.3 1.5 0:15.33 Rserve
33847 ***** 20 0 269776 57100 3496 R 10.3 1.5 0:09.86 Rserve
33849 ***** 20 0 269792 57104 3496 R 10.3 1.5 0:08.20 Rserve
33855 ***** 20 0 269528 56840 3496 R 10.3 1.5 0:04.92 Rserve
29725 ***** 20 0 268872 56836 4020 R 10.0 1.5 1360:13 Rserve
33841 ***** 20 0 269784 57100 3496 R 10.0 1.5 0:14.42 Rserve
33843 ***** 20 0 269796 57104 3496 R 10.0 1.5 0:12.50 Rserve
33844 ***** 20 0 269792 57104 3496 R 10.0 1.5 0:11.72 Rserve
33852 ***** 20 0 269512 56836 3496 R 10.0 1.5 0:06.38 Rserve
33856 ***** 20 0 269520 56836 3496 R 10.0 1.5 0:04.05 Rserve
33842 ***** 20 0 269776 57100 3496 R 9.3 1.5 0:13.20 Rserve
33851 ***** 20 0 269784 57100 3496 R 9.3 1.5 0:06.69 Rserve
33857 ***** 20 0 269512 56836 3496 R 9.3 1.5 0:03.15 Rserve
33834 ***** 20 0 269792 57112 3496 R 9.0 1.5 0:18.56 Rserve
33835 ***** 20 0 269784 57100 3496 R 9.0 1.5 0:17.33 Rserve
33837 ***** 20 0 269776 57100 3496 R 9.0 1.5 0:16.46 Rserve
33846 ***** 20 0 269784 57100 3496 R 9.0 1.5 0:10.17 Rserve
33848 ***** 20 0 269796 57104 3496 R 9.0 1.5 0:08.61 Rserve
33853 ***** 20 0 269532 56840 3496 R 9.0 1.5 0:05.34 Rserve
33858 ***** 20 0 269532 56840 3496 R 9.0 1.5 0:02.27 Rserve
33838 ***** 20 0 269796 57104 3496 R 8.6 1.5 0:15.74 Rserve
The %CPU column sums to about 200%, corresponding to the two CPU cores available.
As you can see, the processes all have the same priority (PR=20) and nearly equal shares of %CPU, around 10% each, so each of them is allocated only about a tenth of a core and will therefore take roughly 10 times longer to complete than a single Rserve instance running alone.
It is not 20 times longer, because a single Rserve process would only utilise one CPU core, leaving the other core idle.
You simply need more CPUs if you want to speed up the calculations. Also, if you don't want the 51st (or 101st, or 1001st) concurrent user to be denied access, it is better to implement a message queue. You can create multiple workers for the queue, which lets you distribute the workload over many CPUs, on different machines if necessary.
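As a very rough illustration of the worker side of such a queue, here is a minimal R sketch that polls a directory acting as the queue. The paths and the do_work() function are placeholders, not part of your setup, and a real deployment would use a proper broker (RabbitMQ, Redis, etc.):
# Minimal queue-worker sketch; /var/queue/* and do_work() are hypothetical.
repeat {
  jobs <- list.files("/var/queue/jobs", pattern = "\\.rds$", full.names = TRUE)
  for (f in jobs) {
    job <- readRDS(f)                                   # payload written by the web service
    result <- do_work(job)                              # your CPU-intensive R routine
    saveRDS(result, file.path("/var/queue/done", basename(f)))
    file.remove(f)
  }
  Sys.sleep(0.1)                                        # avoid busy-waiting
}
Start as many such workers as you have cores (or machines) available; the web service then only enqueues jobs and polls for results.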

Extending a Logical Volume on RHEL7

I am using a RHEL7 box, created by our in-house vm-provisioning system.
They create logical volumes for the likes of /var, /home, swap, etc. using two pools of space. I was following the examples and descriptions of how to add some of that unallocated space to a volume from https://www.tecmint.com/extend-and-reduce-lvms-in-linux/, and am stuck getting resize2fs to behave as expected.
Using lvdisplay, I found the appropriate volume:
--- Logical volume ---
LV Path /dev/rootvg/lvvar
LV Name lvvar
VG Name rootvg
LV UUID WGkYI1-WG0S-uiXS-ziQQ-4Pbe-rv1H-0HyA2a
LV Write Access read/write
LV Creation host, time localhost, 2018-06-05 16:10:01 -0400
LV Status available
# open 1
LV Size 2.00 GiB
Current LE 512
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:5
I found the associated volume group with vgdisplay:
--- Volume group ---
VG Name rootvg
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 8
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 7
Open LV 7
Max PV 0
Cur PV 1
Act PV 1
VG Size <49.00 GiB
PE Size 4.00 MiB
Total PE 12543
Alloc PE / Size 5120 / 20.00 GiB
Free PE / Size 7423 / <29.00 GiB
VG UUID 5VkgVi-oZ56-KqMk-6vmf-ttNo-EMHG-quotwk
I decided to take 4 GiB of the free PEs and extended the volume with:
lvextend -l +1024 /dev/rootvg/lvvar
which answered as expected:
Size of logical volume rootvg/lvvar changed from 2.00 GiB (512 extents) to 6.00 GiB (1536 extents).
Logical volume rootvg/lvvar successfully resized.
But when I try to use resize2fs - I get this:
# resize2fs /dev/mapper/rootvg-lvvar
resize2fs 1.42.9 (28-Dec-2013)
resize2fs: Bad magic number in super-block while trying to open /dev/mapper/rootvg-lvvar
I'm sure it's something dumb I'm missing - can anyone push me in the right direction here?
The filesystem on that volume is almost certainly XFS, which is the default on RHEL 7; resize2fs only understands ext2/3/4, which is why it reports a bad magic number. Use xfs_growfs instead:
xfs_growfs /dev/mapper/rootvg-lvvar
Note that xfs_growfs grows the filesystem while it is mounted, so there is no need to unmount /var first.

Looping GAMS optim model to iterate

I have written a dispatch model in GAMS which optimizes by minimizing system costs. I want to loop runs of the model: run the optimization, save the output, vary a single parameter (storageCap) by increasing it a small fraction each iteration, and run the model again. GDXRRW does not seem to be able to run on R v3.3.1 ("Bug in Your Hair").
Are you sure about gdxrrw not working on R 3.3.1? It certainly works for me:
(1) Install gdxrrw using
install.packages("C:\\GAMS\\win64\\24.7\\gdxrrw\\win3264\\gdxrrw_1.0.0.zip",repos=NULL)
(2) Use a GAMS script like:
set i /i1*i10/;
parameter p(i);
p(i) = uniform(0,1);
display p;
execute_unload "p.gdx",p;
execute '"c:\program files\R\R-3.3.1\bin\Rscript.exe" p.R';
$onecho > p.R
R.version
library(gdxrrw)
p<-rgdx.param("p.gdx","p");
p
$offecho
You will see something like:
--- Job Untitled_56.gms Start 08/18/16 15:29:58 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows
GAMS 24.6.1 Copyright (C) 1987-2016 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G150803/0001CV-GEN
Amsterdam Optimization Modeling Group DC10455
--- Starting compilation
--- Untitled_56.gms(17) 3 Mb
--- Starting execution: elapsed 0:00:00.013
--- Untitled_56.gms(5) 4 Mb
--- GDX File C:\tmp\p.gdx
--- Untitled_56.gms(6) 4 Mb
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 3.1
year 2016
month 06
day 21
svn rev 70800
language R
version.string R version 3.3.1 (2016-06-21)
nickname Bug in Your Hair
i p
1 i1 0.17174713
2 i2 0.84326671
3 i3 0.55037536
4 i4 0.30113790
5 i5 0.29221212
6 i6 0.22405287
7 i7 0.34983050
8 i8 0.85627035
9 i9 0.06711372
10 i10 0.50021067
*** Status: Normal completion
--- Job Untitled_56.gms Stop 08/18/16 15:29:59 elapsed 0:00:00.907
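As for the looping itself, one way is to drive GAMS from R, passing storageCap on the command line and reading each result back from a GDX file. The sketch below makes several assumptions about your setup: the model file is called dispatch.gms, it picks up %storageCap% set via the --storageCap command-line parameter, it unloads its objective value as totalCost into results.gdx, and the GAMS system directory is the one shown:
library(gdxrrw)
igdx("C:/GAMS/win64/24.7")                        # point gdxrrw at your GAMS system directory

storage.values <- seq(100, 200, by = 5)           # hypothetical storageCap values
results <- numeric(length(storage.values))

for (k in seq_along(storage.values)) {
  # --storageCap is available inside dispatch.gms as %storageCap%
  system(sprintf("gams dispatch.gms --storageCap=%g", storage.values[k]))
  # assumes the model ends with: execute_unload "results.gdx", totalCost;
  results[k] <- rgdx.scalar("results.gdx", "totalCost")
}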

R readHTMLTable() function error

I'm running into a problem when trying to use the readHTMLTable function in the R package XML. When running
library(XML)
library(RCurl)   # getURL() below comes from RCurl
baseurl <- "http://www.pro-football-reference.com/teams/"
team <- "nwe"
year <- 2011
theurl <- paste(baseurl,team,"/",year,".htm",sep="")
readurl <- getURL(theurl)
readtable <- readHTMLTable(readurl)
I get the error message:
Error in names(ans) = header :
'names' attribute [27] must be the same length as the vector [21]
I'm running 64 bit R 2.15.1 through R Studio 0.96.330. It seems there are several other questions that have been asked about the readHTMLTable() function, but none addressed this specific question. Does anyone know what's going on?
When readHTMLTable() complains about the 'names' attribute, it's a good bet that it's having trouble matching the data with what it's parsed for header values. The simplest way around this is to simply turn off header parsing entirely:
table.list <- readHTMLTable(theurl, header=F)
Note that I changed the name of the return value from "readtable" to "table.list". (I also skipped the getURL() call since 1. it didn't work for me and 2. readHTMLTable() knows how to handle URLs). The reason for the change is that, without further direction, readHTMLTable() will hunt down and parse every HTML table it can find on the given page, returning a list containing a data.frame for each.
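If you want to see what actually came back before picking a table, something like this works (a small sketch, reusing table.list from above):
names(table.list)            # table ids or captions, where the page provides them
sapply(table.list, nrow)     # number of rows parsed for each table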
The page you pointed it at is fairly rich, with 8 separate tables:
> length(table.list)
[1] 8
If you are only interested in a single table on the page, you can use the which argument to specify it and receive its contents as a data.frame directly.
This could also cure your original problem if it had choked on a table you're not interested in. Many pages still use tables for navigation, search boxes, etc., so it's worth taking a look at the page first.
But this is unlikely to be the case in your example, since it actually choked on all but one of them. In the unlikely event that the stars aligned and you were only interested in the successfully parsed third table on the page (passing statistics), you could grab it like this, keeping header parsing on:
> passing.df = readHTMLTable(theurl, which=3)
> print(passing.df)
No. Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% Lng Y/A AY/A Y/C Y/G Rate Sk Yds NY/A ANY/A Sk% 4QC GWD
1 12 Tom Brady* 34 QB 16 16 13-3-0 401 611 65.6 5235 39 6.4 12 2.0 99 8.6 9.0 13.1 327.2 105.6 32 173 7.9 8.2 5.0 2 3
2 8 Brian Hoyer 26 3 0 1 1 100.0 22 0 0.0 0 0.0 22 22.0 22.0 22.0 7.3 118.7 0 0 22.0 22.0 0.0

Memory problems with large-scale social network visualization using R and Cytoscape

I'm relatively new to R and am trying to solve the following problem:
I work on a Windows 7 Enterprise platform with the 32bit version of R
and have about 3GB of RAM on my machine. I have large-scale social
network data (c. 7,000 vertices and c. 30,000 edges) which are
currently stored in my SQL database. I have managed to pull this data
(omitting vertex and edge attributes) into an R dataframe and then
into an igraph object. For further analysis and visualization, I would
now like to push this igraph into Cytoscape using RCytoscape.
Currently, my approach is to convert the igraph object into an
graphNEL object since RCytoscape seems to work well with this object
type. (The igraph plotting functions are much too slow and lack
further analysis functionality.)
Unfortunately, I always run into memory issues when running this
script. It has worked previously with smaller networks though.
Does anyone have an idea on how to solve this issue? Or can you
recommend any other visualization and analysis tools that work well
with R and can handle such large-scale data?
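For concreteness, the conversion step described above looks roughly like this (a sketch; g stands for the igraph object built from the edge data frame, and the function name depends on the igraph version):
library(igraph)
library(graph)                      # provides the graphNEL class
# g is the igraph object built from the SQL edge data
gnel <- igraph.to.graphNEL(g)       # called as_graphnel(g) in newer igraph releases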
Sorry for taking several days to get back to you.
I just ran some tests in which
1) an adjacency matrix is created in R
2) an R graphNEL is then created from the matrix
3) (optionally) node & edge attributes are added
4) a CytoscapeWindow is created, displayed, laid out, and redrawn
(all times are in seconds)
nodes edges attributes? matrix graph cw display layout redraw total
70 35 no 0.001 0.001 0.5 5.7 2.5 0.016 9.4
70 0 no 0.033 0.001 0.2 4.2 0.5 0.49 5.6
700 350 no 0.198 0.036 6.0 8.3 1.6 0.037 16.7
1000 500 no 0.64 0.07 12.0 9.8 1.8 0.09 24.9
1000 500 yes 0.42 30.99 15.7 29.9 1.7 0.08 79.4
2000 1000 no 3.5 0.30 73.5 14.9 4.8 0.08 96.6
2500 1250 no 2.7 0.45 127.1 18.3 11.5 0.09 160.7
3000 1500 no 4.2 0.46 236.8 19.6 10.7 0.10 272.8
4000 2000 no 8.4 0.98 502.2 27.9 21.4 0.14 561.8
To my complete surprise, and chagrin, there is an exponential slowdown in 'cw' (the new.CytoscapeWindow method) -- which makes no sense at all. It may be that your memory exhaustion is related to that, and it is quite fixable.
I will explore this, and probably have a fix in the next week.
By the way, did you know that you can create a graph object directly from an adjacency matrix? The resulting graphAM coerces readily to graphNEL with as(g, "graphNEL"):
g = new("graphAM", adjMat = matrix, edgemode = "directed")
Thanks, Ignacio, for your most helpful report. I should have done these timing tests long ago!
Paul
It has been a while since I used Cytoscape, so I am not exactly sure how to do it, but the manual states that you can use text files as input via the "Table Import" feature.
In igraph you can use the write.graph() function to export a graph in a number of formats. This way you can circumvent having to convert to a graphNEL object, which might be enough to keep you from running out of memory.
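A minimal sketch of that route, assuming your igraph object is called g (file names are arbitrary):
library(igraph)
# GraphML keeps vertex/edge attributes and can be opened directly in Cytoscape:
write.graph(g, file = "network.graphml", format = "graphml")
# Or a plain edge list, suitable for Cytoscape's "Table Import" feature:
write.graph(g, file = "network_edges.txt", format = "edgelist")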

Solaris 10 i386 vmstat giving more free than swap

How come, when running vmstat on Solaris 10 i386, I get more free memory than swap? Isn't free the portion of swap that is available?
$ vmstat
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 -- -- in sy cs us sy id
1 0 0 7727088 17137388 37 303 1 0 0 0 0 -0 4 0 0 7247 7414 8122 4 1 95
No. Free RAM represents the part of RAM that is immediately available for use, while free swap represents the part of virtual memory which is neither allocated nor reserved. Reserved memory doesn't physically use any storage (RAM or disk).
Have a look at the swap -s output for details.
