I'm trying to use the listBuckets function from the RAmazonS3 package, but I am getting HTTP/1.1 403 Forbidden.
First I'm setting the authentication options as described in the manual:
options(AmazonS3 = c('login' = 'secret'))
I replaced login with my access key ID (20 characters), and secret with my secret access key (40 characters). When I run listBuckets(), I get the following error:
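For illustration, the substitution looks like this, using the placeholder credentials from AWS's own documentation rather than real keys:
# name = 20-character access key ID, value = 40-character secret access key
options(AmazonS3 = c('AKIAIOSFODNN7EXAMPLE' = 'wJalrXtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'))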
Error in UseMethod("xmlSApply") :
no applicable method for 'xmlSApply' applied to an object of class "NULL"
It's not returning any data, so it must not be authenticating properly. Digging into it, there is a getURL call within listBuckets; the verbose output of that call is:
* About to connect() to proxy proxyname.domain.com port xx (#0)
* Trying xxx.xxx.xxx.xxx... * connected
* Connected to proxyname.domain.com (xxx.xxx.xxx.xxx) port xx (#0)
> GET http://s3.amazonaws.com HTTP/1.1
Host: s3.amazonaws.com
Accept: */*
Proxy-Connection: Keep-Alive
Date: Fri, 12 Sep 2014 09:02:41 EDT
Authorization: AWS [login]:[unknown 27-character code]=
< HTTP/1.1 403 Forbidden
< x-amz-request-id: [unknown 16-character code]
< x-amz-id-2: [unknown 64-character code]
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 12 Sep 2014 13:02:40 GMT
< Server: AmazonS3
< Cache-Control: proxy-revalidate
< Proxy-Connection: Keep-Alive
< Connection: Keep-Alive
<
* Connection #0 to host proxyname.domain.com left intact
Any ideas where I'm going wrong?
I'm not sure what value to use for login, so I've also tried my AWS account name, my AWS account user name, and literally 'login', but get a similar error.
Before finding the RAmazonS3 package, I had started to write my own S3 API functions using the AWS Command Line Interface, and I was able to successfully list the buckets and their objects when calling the AWS CLI from R.
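For reference, that fallback amounted to shelling out along these lines (a sketch; it assumes the aws CLI is on the PATH and already configured via aws configure):
# list all buckets, capturing stdout as a character vector
buckets <- system("aws s3 ls", intern = TRUE)
# list the objects in one bucket ("my-bucket" is a hypothetical name)
objects <- system("aws s3 ls s3://my-bucket/", intern = TRUE)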
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.3 bitops_1.0-6 RAmazonS3_0.1-5
loaded via a namespace (and not attached):
[1] digest_0.6.4 tools_3.1.1 XML_3.98-1.1
Just to repeat what's in the comment above: I have noticed that most of the R packages that connect to AWS services are out of date, so I have created a new package, AWSConnect, that allows a user to do most basic operations with S3 and EC2. In that package, the function s3.ls() is designed to list the buckets on S3.
Please feel free to use it, and report any bugs/requests/issues.
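A minimal usage sketch (the exact signature of s3.ls() isn't shown here, so the argument-free call and any prior credential setup are assumptions):
library(AWSConnect)
s3.ls()  # assumes AWS credentials have already been configured for the package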
The package works in most cases:
login = AWS access key ID
secret = AWS secret access key
Example:
markus_test is a newly created bucket with no public permissions
auth <- c("AKIAJN6VFFXXXXXXXXXX" ="d95ij4uy0i6n+auvhwLLP6VQiz27OdXXXXXXXXXX")
listBucket("markus_test", auth)
Key LastModified ETag Size
1 rmr2_example.R 2014-09-15 23:38:48 c7f4544cf972bed52fa84164cf2505bf 1248
Owner.ID Owner.DisplayName
1 2a22982b6e7216f42abd2e8848f07a8ada0b1c11318dc8331aee068f29b7765d markusataws
StorageClass
1 STANDARD
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RAmazonS3_0.1-5
loaded via a namespace (and not attached):
[1] digest_0.6.4 RCurl_1.95-4.3 tools_3.1.1 XML_3.98-1
I have seen many questions on curl's timeout problem. However, I am still confused about my case.
My problem occurred when I was using quantmod::getSymbols. Every trial ended up with
Warning message:
x = 'AAPL', get = 'stock.prices': Error in curl::curl_fetch_memory(cu, handle = h): Timeout was reached: [finance.yahoo.com] Operation timed out after 10008 milliseconds with 0 out of 0 bytes received
Note that I am using a proxy. I have tried switching the proxy on and off, and running
httr::set_config(httr::use_proxy(
"127.0.0.1:xxxxxx", port = 8080,
username = "xxxx", password = "****"
), override = TRUE)
However, nothing works.
After getting confused by quantmod's internal details, I decided to experiment with plain curl_fetch_memory.
The example in curl_fetch_memory's documentation works normally on my computer:
curl::curl_fetch_memory("http://httpbin.org/cookies/set?foo=123&bar=ftw")
curl_fetch_memory does not work for "https://finance.yahoo.com/" (note that I cannot access the website in a web browser without my proxy).
The following code can successfully fetch results from https://finance.yahoo.com/:
curl_opts <- list(
ssl_verifypeer = 0L,
proxyuserpwd = "xxxx:****",
proxyauth = as.integer(15),
proxy = "127.0.0.1:xxxxxx",
proxyport = 8080
)
cookie_handler <- curl::new_handle()
curl::handle_setopt(handle=cookie_handler, .list=curl_opts)
curl::curl_fetch_memory("https://finance.yahoo.com/",
handle = cookie_handler)
It seems that quantmod is unable to use the proxy settings properly. However, there is no option to set a proxy inside the package's functions. How can I solve my problem?
System:
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
# ...
I think I have found the solution. All you need is to switch on the proxy and run
Sys.setenv(ALL_PROXY = "YOUR_PROXY_HOST")
before the request.
However, I currently still have no idea how curl uses the ALL_PROXY setting. I would appreciate it if anyone can offer an explanation.
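For what it's worth, libcurl falls back to the ALL_PROXY environment variable when no proxy option is set on a handle, which is why setting it before the request works. A minimal sketch (the proxy address and port are placeholders):
# route all curl traffic through the proxy; must be set before the handle is used
Sys.setenv(ALL_PROXY = "http://127.0.0.1:8080")
library(quantmod)
getSymbols("AAPL", src = "yahoo")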
I've read a few posts on sftp with R, but was not able to address the problem that I have. There's a decent chance I'm just not searching in the right place, and if that's the case, please point me in the right direction. Here's where I'm at:
> library(RCurl)
> curlVersion()
$age
[1] 3
$version
[1] "7.43.0"
$version_num
[1] 469760
$host
[1] "x86_64-apple-darwin15.0"
$features
ipv6 ssl libz ntlm asynchdns spnego largefile ntlm_wb
1 4 8 16 128 256 512 32768
$ssl_version
[1] "SecureTransport"
$ssl_version_num
[1] 0
$libz_version
[1] "1.2.5"
$protocols
[1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap" "imaps" "ldap" "ldaps" "pop3" "pop3s" "rtsp" "smb" "smbs" "smtp" "smtps" "telnet" "tftp"
$ares
[1] ""
$ares_num
[1] 0
$libidn
[1] ""
Right away, I notice that sftp is not among the protocols supported by my current build of RCurl, which is my main problem. As a result, when I run the code below, I get the following error:
# Input
protocol <- "sftp"
server <- "00.000.00.00"
userpwd <- "userid:userpass"
tsfrFilename <- 'myfile.txt'
outFilename <- 'myfile.txt'
# Run
url <- paste0(protocol, "://", server, "/", tsfrFilename)  # note the "/" between host and path
data <- getURL(url = url, userpwd = userpwd)
Error in function (type, msg, asError = TRUE) :
Protocol "sftp" not supported or disabled in libcurl
I actually have a second question as well. My understanding is that getURL grabs data from the other server and pulls it to my local machine, whereas I would like to put a file onto the server from my local machine.
To summarize: (1) can I update RCurl / libcurl in R to support sftp, and (2) how do I put files from my local machine onto the server, rather than getting files from the server onto my local machine?
Thanks!
I found the answer, for the most part...
Following this link addressed the problem for me: http://andrewberls.com/blog/post/adding-sftp-support-to-curl
I've successfully added sftp support to cURL; however, I am now struggling to update the RCurl package to pick up the new build...
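As for the second question (putting a file onto the server rather than getting one), RCurl provides ftpUpload(), which despite its name also accepts sftp:// URLs once libcurl has sftp support. A sketch under that assumption, reusing the placeholders from the question:
library(RCurl)
ftpUpload(what = "myfile.txt",                               # local file to send
          to = "sftp://00.000.00.00/remote/path/myfile.txt", # remote destination (placeholder path)
          userpwd = "userid:userpass")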
I've successfully run h2o from R on a Linux machine and wanted to install it on Windows too, but h2o will not initialise for me. The full output is pasted below, but the key seems to be these lines:
[1] "Failed to connect to 127.0.0.1 port 54321: Connection refused"
curl: (1) Protocol "'http" not supported or disabled in libcurl
Judging from this and this experience, it might be something to do with single quotes versus double quotes somewhere; but that seems unlikely, because then no-one would be able to get the h2o / R / Windows combination working, and I gather that some people have. On the other hand, this question suggests the problem may be that my curl installation does not have SSL enabled. So I downloaded curl from scratch via this wizard, as recommended on the h2o page, selecting the 64-bit, generic version with both SSL and SSH enabled, and added its folder to my Windows PATH. But no difference.
I've just noticed my Java runtime environment is old and will update that as well, but on the face of it, it's not obvious that that could be the problem.
Any suggestions welcomed.
> library(h2o)
> h2o.init()
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
C:\Users\PETERE~1\AppData\Local\Temp\Rtmpa6G3WA/h2o_Peter_Ellis_started_from_r.out
C:\Users\PETERE~1\AppData\Local\Temp\Rtmpa6G3WA/h2o_Peter_Ellis_started_from_r.err
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
............................................................
ERROR: Unknown argument (Ellis_cns773)
Usage: java [-Xmx<size>] -jar h2o.jar [options]
(Note that every option has a default and is optional.)
-h | -help
Print this help.
-version
Print version info and exit.
-name <h2oCloudName>
Cloud name used for discovery of other nodes.
Nodes with the same cloud name will form an H2O cloud
(also known as an H2O cluster).
-flatfile <flatFileName>
Configuration file explicitly listing H2O cloud node members.
-ip <ipAddressOfNode>
IP address of this node.
-port <port>
Port number for this node (note: port+1 is also used).
(The default port is 0.)
-network <IPv4network1Specification>[,<IPv4network2Specification> ...]
The IP address discovery code will bind to the first interface
that matches one of the networks in the comma-separated list.
Use instead of -ip when a broad range of addresses is legal.
(Example network specification: '10.1.2.0/24' allows 256 legal
possibilities.)
-ice_root <fileSystemPath>
The directory where H2O spills temporary data to disk.
-log_dir <fileSystemPath>
The directory where H2O writes logs to disk.
(This usually has a good default that you need not change.)
-log_level <TRACE,DEBUG,INFO,WARN,ERRR,FATAL>
Write messages at this logging level, or above. Default is INFO.
-flow_dir <server side directory or HDFS directory>
The directory where H2O stores saved flows.
(The default is 'C:\Users\Peter Ellis\h2oflows'.)
-nthreads <#threads>
Maximum number of threads in the low priority batch-work queue.
(The default is 99.)
-client
Launch H2O node in client mode.
Cloud formation behavior:
New H2O nodes join together to form a cloud at startup time.
Once a cloud is given work to perform, it locks out new members
from joining.
Examples:
Start an H2O node with 4GB of memory and a default cloud name:
$ java -Xmx4g -jar h2o.jar
Start an H2O node with 6GB of memory and specify the cloud name:
$ java -Xmx6g -jar h2o.jar -name MyCloud
Start an H2O cloud with three 2GB nodes and a default cloud name:
$ java -Xmx2g -jar h2o.jar &
$ java -Xmx2g -jar h2o.jar &
$ java -Xmx2g -jar h2o.jar &
[1] "127.0.0.1"
[1] 54321
[1] TRUE
[1] -1
[1] "Failed to connect to 127.0.0.1 port 54321: Connection refused"
curl: (1) Protocol "'http" not supported or disabled in libcurl
[1] 1
Error in h2o.init() : H2O failed to start, stopping execution.
In addition: Warning message:
running command 'curl 'http://localhost:54321'' had status 1
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252 LC_MONETARY=English_New Zealand.1252
[4] LC_NUMERIC=C LC_TIME=English_New Zealand.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] h2o_3.6.0.8 statmod_1.4.22
loaded via a namespace (and not attached):
[1] tools_3.2.3 RCurl_1.95-4.7 jsonlite_0.9.19 bitops_1.0-6
We pushed a fix to master for this issue: https://0xdata.atlassian.net/browse/PUBDEV-2526. If you want to try it out now, you can build from master as follows:
git clone https://github.com/h2oai/h2o-3
cd h2o-3
./gradlew build -x test
R CMD INSTALL ./h2o-r/R/src/contrib/h2o_3.7.0.99999.tar.gz
Or download the next nightly release tomorrow.
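For background, the Protocol "'http" message suggests the single-quoted URL in curl 'http://localhost:54321' was handed to curl verbatim, since cmd.exe does not treat single quotes as quoting characters. A quick hedged check from R on Windows:
# cmd.exe honours double quotes but passes single quotes through literally
system('curl "http://localhost:54321"')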
I have my database password stored in a pgpass.conf file. I am connecting to the database from R with RPostgres without specifying a password, so it is read from pgpass.conf, like this:
con <- dbConnect(RPostgres::Postgres(),
dbname = "dbname",
user = "username",
host = "localhost",
port = "5432")
It usually works perfectly; however, when I try to connect to the database from a Shiny app, it doesn't work. The connection definition is exactly the same as above and placed in the server.R script. When I run the Shiny app with default arguments, I get an error:
FATAL: password authentication failed for user "username"
password retrieved from file "C:\Users\...\AppData\Roaming/postgresql/pgpass.conf"
When the password is explicitly given in the connection definition:
con <- dbConnect(RPostgres::Postgres(),
dbname = "dbname",
user = "username",
host = "localhost",
password = "mypass",
port = "5432")
everything works.
To make things stranger, when the port for Shiny is set to some value, for example shiny::runApp(port = 4000), the connection is established without specifying a password, but ONLY the first time - that is, when the app is closed and reopened in the same R session, the error occurs again.
I've tested the 'RPostgreSQL' package - it doesn't work either; only the error message is different:
Error in postgresqlNewConnection(drv, ...) :
RS-DBI driver: (could not connect postgres#localhost on dbname "dbname")
I use 32-bit R, but I've tested it on 64-bit and it was the same. The Shiny app was run both in a browser (Chrome) and in the RStudio Viewer.
Here is my session info:
R version 3.2.2 (2015-08-14)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RPostgres_0.1 DBI_0.3.1.9008 shiny_0.12.2
loaded via a namespace (and not attached):
[1] R6_2.1.1 htmltools_0.2.6 tools_3.2.2 rstudioapi_0.3.1 Rcpp_0.12.1.3 jsonlite_0.9.17 digest_0.6.8
[8] xtable_1.7-4 httpuv_1.3.3 mime_0.4 RPostgreSQL_0.4
There's likely something different about the environment in which the command is run between Shiny and your system R GUI. I get around this by storing my credentials in an Renviron file:
readRenviron("~/.Renviron")
con <- dbConnect(RPostgres::Postgres(),
dbname = Sys.getenv('pg_db'),
user = Sys.getenv('api_user'),
...)
The nice thing about that is that you can maintain separate Renviron files for staging and production environments. This allows your script to take a commandArgs() value specifying which DB credentials to use:
#!/usr/bin/env Rscript
environ_path <- switch(commandArgs(trailingOnly = TRUE)[1],
                       'staging'    = "~/staging.Renviron",
                       'production' = "~/production.Renviron")
readRenviron(environ_path)
Then from BASH:
Rscript analysis.R staging
The error is on the PostgreSQL side:
C:\Users\...\AppData\Roaming/postgresql/pgpass.conf
The file path mixes "/" with "\".
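If the mixed separators are indeed the culprit, a hedged workaround is to point libpq at the file explicitly through its standard PGPASSFILE environment variable, using one separator style throughout (the path below is a placeholder, not the asker's actual path):
Sys.setenv(PGPASSFILE = "C:/Users/yourname/AppData/Roaming/postgresql/pgpass.conf")
con <- DBI::dbConnect(RPostgres::Postgres(),
                      dbname = "dbname",
                      user = "username",
                      host = "localhost",
                      port = "5432")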
I am having trouble initialising a connection to an AWS EC2 instance from R, as I keep getting the error Permission denied (publickey). I am currently using Mac OS X 10.6.8 as my OS.
The code that I try to run in the terminal ($) and then R (>) is as follows:
$ R --vanilla
> require(snowfall)
> sfInit(parallel=TRUE,socketHosts =list("ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com"))
Permission denied (publickey)
but weirdly, when trying to ssh into the instance, I don't need a password, as I had already imported the public key into the instance upon initialization (I think),
so from my normal terminal...when running
$ ssh ubuntu@ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com
it automatically connects... (so I'm not 100% sure if it's a passwordless issue, as in Using snow (and snowfall) with AWS for parallel processing in R)
I have tried looking through a fair amount of the material on keys etc., but none of it seems to be making much of a difference. Also, my ~/.ssh/authorized_keys is a folder rather than a file for some reason, and I can't access it even when trying sudo cd .ssh/authorized_keys... in terms of permissions it has drw-------
The end goal is to connect to a lot of EC2 instances and use foreach to carry out some parallel processing... but connecting to one for now would be nice. Also, I would like to use my own AMI, so StarCluster isn't really what I am looking for (unless I am able to use private AMIs and run all commands privately...).
Also, if doRedis is better, and someone could show me how one would connect to the EC2 instance from a local machine, that would be good too...
EDIT
I have managed to deal with the ssh password-less login by using makePSOCKcluster from the parallel package, as shown in R and makePSOCKcluster EC2 socketConnection... but am now coming across the socketConnection issues shown in the question at that link...
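For reference, a minimal sketch of that makePSOCKcluster approach (hostname, user, and key path are placeholders; the key is passed by embedding the -i option in rshcmd):
library(parallel)
cl <- makePSOCKcluster(
  "ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com",
  user = "ubuntu",                          # remote login user
  rshcmd = "ssh -i ~/.ssh/my-ec2-key.pem"   # hypothetical key path
)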
Any ideas how to connect to it?
Also, as proof that everything is working, I guess the following command/function should work to get all the different IP addresses
d <- parLapply(cl1, 1:length(cl1), function(x) system("ifconfig", intern = TRUE)[2])
where cl1 is the output of the make*cluster function
NOTE: since the bounty is really for the question in the link... I don't mind which question you post an answer to... but so long as something is written on this question that links it to the correct answer on the linked question, I will award the points accordingly...
I had quite a few issues with parallel EC2 setup too when trying to keep the master node local. Using StarCluster to set up the pool helped greatly, but the real improvement came from using StarCluster and having the master node within the EC2 private IP pool.
StarCluster sets up all of the key handling for all the nodes, as well as any mounts used. Dynamic node allocation wasn't doable, but unless spot instances are going to be used long term and your bidding strategy doesn't 'keep' your instances, dynamic allocation shouldn't be an issue.
Some other lessons learned:
Create a variable containing the private IPs to pass to createCluster, and export it, so that it is easier to restart with the same nodes.
Have the master node run byobu and set it up for R session logging.
Running RStudio server on the master can be very helpful at times, but should be a different AMI than the slave nodes. :)
Have the control script offload .rda data files to a path that is remotely monitored for new files, so they are downloaded automatically.
Use htop to monitor the slaves so you can easily see the instances and determine script requirements (memory/cpu/scalability).
Make use of processor hyper-threading enable/disable scripts.
I had quite a bit of an issue with the slave connections and serialize/unserialize, and found that one of the factors was the connection limit, which needed to be reduced by the number of nodes; and when the control script was stopped, the easiest method of cleanup was restarting the master R session and using a script to kill the slave processes, instead of waiting for a timeout.
It did take a bit of work to set up, but hopefully these thoughts help...
Although it was 8 months ago, and both StarCluster and R have changed since, here's some of how it was set up... You'll find 90% of this in the StarCluster docs.
Set up the .starcluster/config AWS and key-pair sections based on the security info from the AWS console.
Define the [smallcluster]
key-name
availability-zone
Define a cluster template extending [smallcluster], using AMIs based on the StarCluster 64-bit HVM AMI. Instead of creating new public AMI instances, I just saved a configured instance (with all the tools I needed) and used that as the AMI.
Here's an example of one...
[cluster Rnodes2]
EXTENDS=smallcluster
MASTER_INSTANCE_TYPE = cc1.4xlarge
MASTER_IMAGE_ID= ami-7621f91f
NODE_INSTANCE_TYPE = cc2.8xlarge
NODE_IMAGE_ID= ami-7621f91f
CLUSTER_SIZE= 8
VOLUMES= rdata
PLUGINS= pkginstaller
SPOT_BID= 1.00
Set up the shared volume; this is where the screen/byobu logs, the main .R script checkpoint output, shared R data, and the source for the production package live. It was monitored for new files in a child path called export, so if the cluster or control script died/abended, at most a bounded number of records would be lost and need to be re-calculated.
After creating the shared volume, the definition was simply:
[volume rdata]
VOLUME_ID = vol-1145497c
MOUNT_PATH = /rdata
The package installer, which ensured the latest (and identical) R versions on all nodes:
[plugin pkginstaller]
setup_class = starcluster.plugins.pkginstaller.PackageInstaller
packages = r-base, r-base-dev, r-recommended
Lastly, access permissions for both ssh and RStudio server. HTTPS via a proxy would be safer, but since RStudio was only used for the control script setup...
[permission ssh]
# protocol can be: tcp, udp, or icmp
protocol = tcp
from_port = 22
to_port = 22
# [permission http]
protocol = tcp
from_port = 8787
to_port = 8787
Then start up a cluster using the StarCluster interface. It handles all of the access controls, system names, shares, etc. Once the cluster was running, I ran an ssh session into each node from my local system and ran a script to stop hyper-threading:
#!/bin/sh
# disable hyperthreading
for cpunum in $(
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list |
cut -s -d, -f2- | tr ',' '\n' | sort -un); do
echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done
then started an htop session on each for monitoring scalability against the exported checkpoint logs.
Then I logged into the master, started a screen session (I've since come to prefer byobu), and fired up R from within the StarCluster-mounted volume. That way, when the cluster stopped for some reason, I could easily set up again just by starting R. Once in R, the first thing was to create a workers.list variable using the nodeXXX names, which was simply something along the lines of:
cluster.nodes <- c("localhost", paste("node00", 1:7, sep='' ) )
workers.list <- rep( cluster.nodes, 8 )
Then I loaded up the control script, quit, and saved the workspace. The control script handled all of the table output for exporting and checkpointing, and the par-wrapped calls to the production package. The main function of the script also took a cpus argument, which is where the workers list was placed; it was then passed as cores to the cluster initializer.
initialize.cluster <- function( cores )
{
if( exists( 'cl' ) ) stopCluster( cl )
print("Creating Cluster")
cl <- makePSOCKcluster( cores )
print("Cluster created.")
assign( 'cl', cl, envir=.GlobalEnv )
print( cl )
# All workers need to have the bounds generator functions...
clusterEvalQ( cl, require('scoreTarget') )
# All workers need to have the production script and package.
clusterExport( cl, varlist=list('RScoreTarget', 'scoreTarget'))
return ( cl )
}
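For illustration, a hypothetical direct call (in practice the control script's main function did this internally, passing its cpus argument through as cores):
# workers.list was created earlier from the node names
cl <- initialize.cluster( workers.list )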
Once the R session was restarted (after initially creating workers.list), the control script was sourced and the main function called. That was it. With this setup, if the cluster ever stopped, I'd just quit the R session on the main host, stop the slave processes via htop on each of the slaves, and start up again.
Here's an example of it in action:
R
R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> source('/rdata/buildSatisfactionRangeTable.R')
Loading required package: data.table
data.table 1.7.7 For help type: help("data.table")
Loading required package: parallel
Loading required package: scoreTarget
Loading required package: Rcpp
> ls()
[1] "build.satisfaction.range.table" "initialize.cluster"
[3] "initialize.table" "parallel.choices.threshold"
[5] "rolled.lower" "rolled.upper"
[7] "RScoreTarget" "satisfaction.range.table"
[9] "satisfaction.search.targets" "search.range.bound.offsets"
[11] "search.range.bounds" "search.range.center"
[13] "Search.Satisfaction.Range" "update.bound.offset"
[15] "workers.list"
> workers.list
[1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[7] "localhost" "localhost" "node001" "node002" "node003" "node004"
[13] "node005" "node006" "node007" "node001" "node002" "node003"
[19] "node004" "node005" "node006" "node007" "node001" "node002"
[25] "node003" "node004" "node005" "node006" "node007" "node001"
[31] "node002" "node003" "node004" "node005" "node006" "node007"
[37] "node001" "node002" "node003" "node004" "node005" "node006"
[43] "node007" "node001" "node002" "node003" "node004" "node005"
[49] "node006" "node007" "node001" "node002" "node003" "node004"
[55] "node005" "node006" "node007" "node001" "node002" "node003"
[61] "node004" "node005" "node006" "node007" "node001" "node002"
[67] "node003" "node004" "node005" "node006" "node007" "node001"
[73] "node002" "node003" "node004" "node005" "node006" "node007"
[79] "node001" "node002" "node003" "node004" "node005" "node006"
[85] "node007" "node001" "node002" "node003" "node004" "node005"
[91] "node006" "node007" "node001" "node002" "node003" "node004"
[97] "node005" "node006" "node007" "node001" "node002" "node003"
[103] "node004" "node005" "node006" "node007" "node001" "node002"
[109] "node003" "node004" "node005" "node006" "node007" "node001"
[115] "node002" "node003" "node004" "node005" "node006" "node007"
> build.satisfaction.range.table(500000, FALSE, workers.list )
[1] "Creating Cluster"
[1] "Cluster created."
socket cluster with 120 nodes on hosts ‘localhost’, ‘node001’, ‘node002’, ‘node003’, ‘node004’, ‘node005’, ‘node006’, ‘node007’
Parallel threshold set to: 11000
Starting at: 2 running to: 5e+05 :: Sat Apr 14 22:21:05 2012
If you have read down to here, you may be interested to know that I tested every cluster setup I could (including openMPI) and found that there wasn't a speed difference; perhaps that is because my calculations were so CPU-bound, perhaps not.
Also, don't give up, even though it can be a pain to get going with HPC; it can be totally worth it. I would still be waiting to complete the first 100,000 iterations of the calculations I was running had I stuck with a naive implementation in base R on a commodity workstation (well, not really, as I would never have stuck with R :D). With the cluster, 384,000 iterations completed in under a week. The time it took to set up (and it took a lot of it) was totally worth it.