Unexpected behavior of R after install on another EC2 instance

I've been fighting this problem for two days straight, including a completely sleepless night, and I'm really starting to lose my patience and strength. It all started after I decided to provision another (paid) AWS EC2 instance to test my R code for dissertation data analysis. Previously I was using a single free-tier t1.micro instance, which is painfully slow, especially when testing/running particular code. My time is worth much more than the reasonable number of cents per hour that Amazon charges.
Therefore, I provisioned an m3.large instance, which I hope has enough power to crunch my data comfortably fast. After the EC2-specific setup, which included selecting Ubuntu 14.04 LTS as the operating system and some security configuration, I installed R and RStudio Server per the standard instructions, via sudo apt-get install r-base r-base-dev as the ubuntu user. I also created ruser as a dedicated user for running R sessions. Basically, the same procedure as on the smaller instance.
The current situation is that any command I issue at the R session command line results in a message like this: Error: could not find function "sessionInfo". The only function that works is q(). I suspect a permissions problem, but I'm not sure how to investigate permission-related problems in an R environment. I'm also curious what could cause such a situation, considering that I was following recommendations from R Project and RStudio sources.

I was able to pinpoint the place that I think caused all that horror: a small configuration file, /etc/R/Rprofile.site, which I had previously updated with directives borrowed from R experts' posts here on Stack Overflow. After removing the questionable contents, I was able to run R commands successfully. Out of curiosity, and to share this hard-earned knowledge, here are the removed contents:
local({
  # Add DISS_FLOSS_PKGS to the default packages, set a CRAN mirror.
  DISS_FLOSS_PKGS <- c("RCurl", "digest", "jsonlite",
                       "stringr", "XML", "plyr")
  #old <- getOption("defaultPackages")
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.us.r-project.org"
  #options(defaultPackages = c(old, DISS_FLOSS_PKGS), repos = r)
  # NOTE: this REPLACES the default packages instead of appending to them.
  options(defaultPackages = DISS_FLOSS_PKGS, repos = r)
  #lapply(list(DISS_FLOSS_PKGS), function() library)
  library(RCurl)
  library(digest)
  library(jsonlite)
  library(stringr)
  library(XML)
  library(plyr)
})
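For reference, a variant that appends to the existing defaults instead of replacing them avoids the problem entirely. The key point: sessionInfo() lives in utils, and overwriting defaultPackages means utils (along with stats, graphics, etc.) is never attached at startup. A minimal sketch (library() calls are best left out of Rprofile.site, since startup errors there are hard to diagnose):
local({
  DISS_FLOSS_PKGS <- c("RCurl", "digest", "jsonlite",
                       "stringr", "XML", "plyr")
  old <- getOption("defaultPackages")  # keeps datasets, utils, stats, ...
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.us.r-project.org"
  options(defaultPackages = unique(c(old, DISS_FLOSS_PKGS)), repos = r)
})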
Any comments on this will be appreciated!

Related

R session aborts when I use assignTaxonomy

I have been having this problem for more than a week now, and I am running out of time and patience. It occurs both when I run my script on a Mac and when I run it on a PC (more RAM makes no difference to the result; it just aborts faster). When I run the following line on my dataset, the session aborts.
set.seed(119)
tax_PR2 <- assignTaxonomy(seqtab,
                          "~/Desktop/Documents/Bruts/aeDNA_data_shared/pr2_version_4.11.1_dada2.fasta",
                          multithread = TRUE)
Does anyone have any idea what the problem is? I verified my dataset (seqtab is currently reported by R as a large matrix of 3,930,724 elements, 20.2 MB), I verified the free space on my computer, I have all the packages needed to run this line of code, and I tried different sources for the PR2 genome database (PR2 version 4.11.1, 4.12.0, etc.), always with the same result.
If you have any ideas I would appreciate them. I hope the information I gave is sufficient.
Packages installed:
library(BiocManager)
library(Rcpp)
library(dada2)
library(ff)
library(ggplot2)
library(gridExtra)
library(phyloseq)
library(vegan)
This is probably caused by a bug that was introduced in 1.14; see the GitHub issue here for more information: https://github.com/benjjneb/dada2/issues/916
We've just identified the cause, and a fix should be out soon. In the meantime, the workaround is to turn off multithreading, or to revert to the previous release, 1.12.
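For immediate use, the single-threaded call would look like this (a sketch based on the call in the question; the Bioconductor version mapping in the comments is an assumption, so verify it before reverting):
set.seed(119)
tax_PR2 <- assignTaxonomy(seqtab,
                          "~/Desktop/Documents/Bruts/aeDNA_data_shared/pr2_version_4.11.1_dada2.fasta",
                          multithread = FALSE)  # sidestep the 1.14 multithreading bug
# Or revert to the previous release (assuming dada2 1.12 maps to Bioconductor 3.9):
# BiocManager::install(version = "3.9")
# BiocManager::install("dada2")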

Possibility of using multiple CRAN mirrors? [RStudio]

I've spent the last 4 hours trying to find out why I wasn't able to install any packages in R (it started with me trying to install a package I was developing). It would just stall for 5+ minutes without doing anything or outputting anything to the terminal. I tracked the issue down to the following call:
repos <- structure("https://cran.rstudio.com/", .Names = "CRAN")
type <- "both"
utils::available.packages(utils::contrib.url(repos, type), type = type)
I realized after a while that it was just the RStudio CRAN mirror; when I changed the repo to "https://cran.case.edu/", it started working. However, there was no indication of this on the surface, and it took me forever to find out what the problem was.
My question is this: Is there a way of using multiple CRAN repos so when one fails, others are used? Or at least a way that will prevent this from happening again?
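One option that may at least soften this failure mode (a sketch; a hanging mirror can still block until the timeout expires): options(repos) accepts a named vector of several repositories, and available.packages() merges the indexes it can reach, while options(timeout =) caps how long any single download may hang.
# Register more than one CRAN mirror; package lookups consult all of them.
options(repos = c(CRAN  = "https://cran.rstudio.com/",
                  CRAN2 = "https://cran.case.edu/"))
# Bound how long a single download may block (seconds).
options(timeout = 30)
install.packages("jsonlite")  # resolved across the listed repos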

RStudio Connect, Packrat, and custom packages in local repos

We recently got RStudio Connect in my office. For our work, we have made custom packages, which we have kept up to date amongst ourselves by opening the project and build+reloading.
I understand the only way I can get our custom packages to work within apps on RStudio Connect is to set up a local repo and set our options(repos) to include it.
Currently I have the following:
library(drat)
RepoAddress <- "C:/<RepoPath>" # high-level path
drat::insertPackage(<sourcePackagePath>, repodir = RepoAddress)
# Add this new repo to R's knowledge of repos.
options(repos = c(options("repos")$repos, LocalCurrent = paste0("file:", RepoAddress)))
# Install <PackageName> from the local repo :)
install.packages("<PackageName>")
Currently this works nicely and I can install my custom package from the local repo. This indicates to me that the local repo is set up correctly.
As an additional aside, I have changed the DESCRIPTION file to have an extra line saying repository:LocalCurrent.
However, when I try to deploy a Shiny app or Rmd which references <PackageName>, I get the following error on deploy:
Error in findLocalRepoForPkg(pkg, repos, fatal = fatal) :
  No package '<PackageName>' found in local repositories specified
I understand this is a problem with packrat being unable to find my local repo during the deploy process (I believe at the stage where it uses packrat::snapshot()). This is confusing, since I would have thought packrat would use my options("repos") repos, just as install.packages does. If I step through the functions, I can see the particular point of failure is packrat:::findLocalRepoForPkg("<PackageName>", repos = packrat::get_opts("local.repos")), which fails even after I define packrat::set_opts("local.repos" = c(CurrentRepo2 = paste0("file:", RepoAddress))).
If I drill into packrat:::findLocalRepoForPkg, it fails because it can't find a file/folder called "C://". I would have thought that was guaranteed to fail, because repos follow the C://bin/windows/contrib/3.3/ structure; at no point would a repo have the structure it's looking for.
I think this last part is showing I'm materially misunderstanding something. Any guidance on configuring my repo so packrat can understand it would be great.
One should always check which options RStudio Connect supports at the moment:
https://docs.rstudio.com/connect/admin/r/package-management/#private-packages
Personally I dislike all the options for including local/private packages, as they defeat the purpose of having a nice easy target for deploying Shiny apps. In many cases I can't just set up local repositories in the organization, because I do not have clearance for that. It is also inconvenient that I have to email IT support to make them manually install new packages. Overall I think RStudio Connect is a great product because it is simple, but when it comes to local packages it really is not.
I found a nice alternative/hack to the official RStudio recommendations. I suppose this would also work with shinyapps.io, but I have not tried. The solution goes like this:
(1) Add to global.R: if(!require(local_package)) devtools::load_all("./local_package")
(2) Write a script that copies all your source files, so that you get a Shiny app with a source directory for the local package inside it; you could call the directory ./inst/shinyconnect/ or whatever, and the local package would be copied to ./inst/shinyconnect/local_package (a sketch of this step is given below).
(3) Add the script ./shinyconnect/packrat_sees_these_dependencies.R to the Shiny folder; this will be picked up by the packrat manifest.
(4) Hack rsconnect/packrat to ignore specifically named packages when building the manifest.
(1)
#start of global.R...
#load more packages for shiny
library(devtools)  #needed for load_all
library(shiny)
library(htmltools) #or whatever you need
#load the already-built local_package, or for RStudio Connect pseudo-build on the fly and load
if(!require(local_package)) {
  #if local_package is here, just build it in ~2 seconds with devtools::load_all()
  if(file.exists("./DESCRIPTION")) load_all(".") #for local tests on PC/Mac, where the shinyapp is inside local_package
  if(file.exists("./local_package/DESCRIPTION")) load_all("./local_package/") #for shiny connect, where local_package is inside the shinyapp
}
library(local_package) #now local_package must load
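(2) A minimal sketch of the copy step mentioned in the list above (the exact set of files is an assumption; adjust for what your package actually contains):
# Run from the package root: stage the package sources inside the app dir.
app_pkg <- "./inst/shinyconnect/local_package"
dir.create(app_pkg, recursive = TRUE, showWarnings = FALSE)
file.copy(c("DESCRIPTION", "NAMESPACE"), app_pkg, overwrite = TRUE)
file.copy("R", app_pkg, recursive = TRUE)  # copies ./R into app_pkg/R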
(3)
Make a script that loads all the dependencies of your local package. Packrat will see this; the script will never actually be executed. Place it at ./shinyconnect/packrat_sees_these_dependencies.R:
#these code lines will be recognized by packrat, and the packages will be added to the manifest
library(randomForest)
library(MASS)
library(whateverpackageyouneed)
(4) During deployment, the manifest generator (packrat) must ignore the existence of any named local_package. This is an option in packrat, but rsconnect does not expose it. A hack is to load rsconnect into memory and modify the sub-sub-sub-function performPackratSnapshot() to allow this. In the script below, I do that and deploy a Shiny app.
library(rsconnect)
orig_fun = getFromNamespace("performPackratSnapshot", pos = "package:rsconnect")
#packages you want to include manually, and have packrat ignore
ignored_packages = c("local_package")
#hijack rsconnect
local({
  assignInNamespace("performPackratSnapshot", value = function (bundleDir, verbose = FALSE) {
    owd <- getwd()
    on.exit(setwd(owd), add = TRUE)
    setwd(bundleDir)
    srp <- packrat::opts$snapshot.recommended.packages()
    packrat::opts$snapshot.recommended.packages(TRUE, persist = FALSE)
    packrat::opts$ignored.packages(get("ignored_packages", envir = .GlobalEnv)) #ignore the packages named above
    print("ignoring following packages")
    print(get("ignored_packages", envir = .GlobalEnv))
    on.exit(packrat::opts$snapshot.recommended.packages(srp, persist = FALSE), add = TRUE)
    packages <- c("BiocManager", "BiocInstaller")
    for (package in packages) {
      if (length(find.package(package, quiet = TRUE))) {
        requireNamespace(package, quietly = TRUE)
        break
      }
    }
    suppressMessages(packrat::.snapshotImpl(project = bundleDir,
                                            snapshot.sources = FALSE, fallback.ok = TRUE, verbose = FALSE,
                                            implicit.packrat.dependency = FALSE))
    TRUE
  },
  pos = "package:rsconnect"
  )
}, envir = as.environment("package:rsconnect"))
new_fun = getFromNamespace("performPackratSnapshot", pos = "package:rsconnect")
rsconnect::deployApp(appDir = "./inst/shinyconnect/", appName = "shinyapp_name", logLevel = "verbose", forceUpdate = TRUE)
The problem is one of nomenclature.
I have set up a repo in the CRAN sense. It works fine and is OK. When packrat refers to a local repo, however, it means a plain directory containing package source trees, not a CRAN-style repository.
This explains why findLocalRepoForPkg doesn't look like it will work: it is designed for a different kind of repo.
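To make the distinction concrete (a sketch, with a hypothetical path): packrat's local.repos option points at a parent directory holding package source trees, so it expects a layout like <parent>/<PackageName>/DESCRIPTION rather than bin/windows/contrib/3.3/:
# A packrat-style local repo is just a folder of package source trees:
#   C:/LocalPackages/<PackageName>/DESCRIPTION   (hypothetical path)
packrat::set_opts("local.repos" = "C:/LocalPackages")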
Feel free to also reach out to support@rstudio.com.
I believe the local-package code path is triggered in packrat because of the missing Repository: line in the DESCRIPTION file of the package. You mentioned you added this line; could you try the case-sensitive version?
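For illustration, the line would sit in DESCRIPTION like this (the surrounding fields are placeholders; the point is the capital R, with the value matching the repo name registered above):
Package: <PackageName>
Version: 0.1.0
Repository: LocalCurrent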
That said, RStudio Connect will not be able to install the package from the RepoAddress as you've specified it (hardcoded to a Windows share). We'd recommend hosting your repo over HTTPS from a server that both your dev environment and RStudio Connect can access. To make this type of repo setup much easier, we just released RStudio Package Manager, which you (and IT!) may find a compelling alternative to manually managing releases of your internal packages via drat.

Executing a SAS program in R using system() Command

My company recently converted to SAS and did not buy the SAS/SHARE license, so I cannot connect to the server via ODBC. I am not a SAS user, but I am writing a program that needs to query data from the server, and I want my R script to call a .sas program to retrieve the data. I think this is possible using
df <- system("sas -SYSIN path/to/sas/script.sas")
but I can't seem to make it work. I have spent a few hours on Google and decided to ask here.
error message:
running command 'sas -SYSIN C:/Desktop/test.sas' had status 127
Thanks!
Assuming your SAS program generates a SAS dataset, you'll need to do two things:
Through shell or system, make SAS run the program, but first cd into the directory containing the SAS executable, in case that directory isn't in your PATH environment variable.
setwd("c:\\Program Files\\SASHome 9.4\\SASFoundation\\9.4\\")
return.code <- shell("sas.exe -SYSIN c:\\temp\\myprogram.sas")
Note that what this returns is NOT the data itself, but the code issued by the OS telling you whether the task succeeded. A code of 0 means the task succeeded.
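For example, a minimal check on that status code might look like this (a sketch, reusing the paths assumed above):
return.code <- shell("sas.exe -SYSIN c:\\temp\\myprogram.sas")
if (return.code == 0) {
  message("SAS finished; the dataset should now be in c:\\temp")
} else {
  stop("SAS exited with status ", return.code)
}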
In the SAS program, all I did was create a copy of sashelp.baseball in the c:\temp directory.
Import the generated dataset into R using one of the packages written for that. haven is the most recent and, IMO, the most reliable one.
# Install haven from CRAN:
install.packages("haven")
# Load it and import the dataset:
library(haven)
myData <- read_sas("c:\\temp\\baseball.sas7bdat")
And there you should have it!

External Scripting and R (Kognitio)

I have created the R script environment (using the command create script environment RSCRIPT command '/usr/local/R/bin/Rscript --vanilla --slave') and tried running an R script, but it fails with the error message below.
ERROR: RS 10 S 332659 R 31A004F LO:Script stderr: external script vfork child: No such file or directory
Is it because of the line below, which I am using in the script?
mydata <- read.csv(file=file("stdin"), header=TRUE)
if (nrow(mydata) > 0){
I am not sure what it is expecting.
I also have one more question:
1) Do we need to install R on our Unix box, or does the Kognitio package already include it?
I suspect the problem here is that you have not installed the R environment on ALL the database nodes in your system. It must be installed on every DB node involved in processing (as explained in chapter 10 of the Kognitio Guide, which you can download from http://www.kognitio.com/forums/viewtopic.php?t=3), or you will see errors like "external script vfork child: No such file or directory".
You would normally use a remote deployment tool (e.g. HP's RDP) to ensure the installation is identical on all DB nodes. Alternatively, you can use the Kognitio wxsync tool to synchronise files across nodes.
Section 10.6 of the Kognitio Guide also explains how to constrain which DB nodes are involved in processing. This is appropriate if your script environment should not run on all nodes for some reason (e.g. it has an expensive per-node/per-core licence), but that does not seem applicable to R.
