Executing a SAS program in R using system() Command - r

My company recently converted to SAS and did not buy the SAS SHARE license so I cannot ODBC into the server. I am not a SAS user, but I am writing a program that needs to query data from the server and I want to have my R script call a .sas program to retrieve the data. I think this is possible using
df <- system("sas -SYSIN path/to/sas/script.sas")
but I can't seem to make it work. I have spent a few hours on Google and decided to ask here.
Error message:
running command 'sas -SYSIN C:/Desktop/test.sas' had status 127
Thanks!

Assuming your SAS program generates a SAS dataset, you'll need to do two things:
Through shell or system, make SAS run the program, but first change into the directory containing the SAS executable in case that directory isn't in your PATH environment variable.
setwd("c:\\Program Files\\SASHome 9.4\\SASFoundation\\9.4\\")
return.code <- shell("sas.exe -SYSIN c:\\temp\\myprogram.sas")
Note that what this returns is NOT the data itself, but the code issued by the OS telling you if the task succeeded or not. A code 0 means task has succeeded.
In the sas program, all I did was to create a copy of sashelp.baseball in the c:\temp directory.
Import the generated dataset into R using one of the packages written for that. Haven is the most recent and IMO most reliable one.
# Install haven from CRAN (needed only once):
install.packages("haven")
# Load the package and import the dataset written to c:\temp:
library(haven)
myData <- read_sas("c:\\temp\\baseball.sas7bdat")
And there you should have it!
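The two steps above can be combined into one small function. This is a minimal sketch, not the answerer's code: run_sas() is a hypothetical helper, and it calls system2() with the path to the executable instead of using setwd(), which is a swapped-in technique. According to R's documentation for system(), a status of 127 means the command could not be run at all, which usually suggests that sas was not found on the PATH.

```r
# Hypothetical helper: run a SAS program and return the OS exit status.
run_sas <- function(program, sas_exe = Sys.which("sas")) {
  # Sys.which() returns "" when the executable is not on the PATH
  if (!nzchar(sas_exe) || !file.exists(sas_exe)) {
    stop("SAS executable not found; pass its full path via 'sas_exe'")
  }
  status <- system2(sas_exe, args = c("-SYSIN", shQuote(program)))
  if (status != 0) {
    warning("SAS exited with status ", status)
  }
  invisible(status)
}

# Usage (paths are the ones assumed in the answer above; adjust to your setup):
# status <- run_sas("c:/temp/myprogram.sas",
#                   sas_exe = "c:/Program Files/SASHome 9.4/SASFoundation/9.4/sas.exe")
# if (status == 0) myData <- haven::read_sas("c:/temp/baseball.sas7bdat")
```

Failing early with a clear message when the executable is missing avoids the opaque status 127 from the question.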

Related

Deploy custom R script as web service Azure ML Studio

I have an R script which takes as input an excel file with two columns containing dates-values and it gives as output 3 dates with the corresponding prediction values. I have already successfully implemented it in Azure Machine Learning Studio using three nodes. One containing the zipped packages I use, one with the input .csv file and the last one with the R script.
The problem is when I deploy it as a web service and I try to give as input new values for Col1 and Col2, I receive the following error.
FailedToParseValue: Failed to parse value '90000, 950000, 970000' as type 'System.Double'., Error code: LibraryExecutionError, Http status code:400
The zipped libraries I use attached are: Hmisc, gdata, forecast, lubridate, fma, expsmooth, ggplot2, tsibble, fpp2, and plyr. I have also tried using the notebooks provided but no good luck as I always face some kind of problem with package installation. Moreover, I tried to follow this approach https://azure.github.io/azureml-sdk-for-r/articles/train-and-deploy-to-aci/train-and-deploy-to-aci.html locally from R Studio but I have difficulty in adapting it to my case.
Any help would be greatly appreciated!
I didn't have any success installing packages via a zip. However, the following worked for me in an Execute R Script module and also installed all dependencies.
if(!require(package)) install.packages("package",repos = "https://ftp.heanet.ie/mirrors/cran.r-project.org/")
Make sure to use a CRAN mirror in your country.
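For several packages at once, the same idea can be wrapped in a helper that first works out which packages are missing. This is only a sketch: missing_packages() is a hypothetical name, the package list is a subset of the one in the question, and the install call is left commented out because it needs network access.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("forecast", "lubridate", "plyr")  # subset of the packages named in the question
to_install <- missing_packages(pkgs)

# Mirror URL taken from the answer above; pick one close to you.
# if (length(to_install) > 0) {
#   install.packages(to_install,
#                    repos = "https://ftp.heanet.ie/mirrors/cran.r-project.org/")
# }
```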

Publishing AzureML Webservice from R requires external zip utility

I want to deploy a basic trained R model as a webservice to AzureML. Similar to what is done here:
http://www.r-bloggers.com/deploying-a-car-price-model-using-r-and-azureml/
Since that post, the publishWebService function in the R AzureML package has changed: it now requires a workspace object as its first parameter, so my R code looks as follows:
library(MASS)
library(AzureML)
PredictionModel = lm( medv ~ lstat , data = Boston )
PricePredFunktion = function(percent)
{return(predict(PredictionModel, data.frame(lstat =percent)))}
myWsID = "<my Workspace ID>"
myAuth = "<my Authorization code>"
ws = workspace(myWsID, myAuth, api_endpoint = "https://studio.azureml.net/", .validate = TRUE)
# publish the R function to AzureML
PricePredService = publishWebService(
ws,
"PricePredFunktion",
"PricePredOnline",
list("lstat" = "float"),
list("mdev" = "float"),
myWsID,
myAuth
)
But every time I execute the code I get the following error:
Error in publishWebService(ws, "PricePredFunktion", "PricePredOnline", :
Requires external zip utility. Please install zip, ensure it's on your path and try again.
I tried installing programs that handle zip files (like 7zip) on my machine as well as calling the utils library in R which allows R to directly interact with zip files. But I couldn't get rid of the error.
I also found the R package code that is throwing the error, it is on line 154 on this page:
https://github.com/RevolutionAnalytics/AzureML/blob/master/R/internal.R
but it didn't help me in figuring out what to do.
Thanks in advance for any Help!
The Azure Machine Learning API requires the payload to be zipped, which is why the package insists on the zip utility being installed. (This is an unfortunate situation, and hopefully we can find a way in future to include a zip with the package.)
It is unlikely that you will ever encounter this situation on Linux, since most (all?) Linux distributions include a zip utility.
Thus, on Windows, you have to do the following procedure once:
Install a zip utility (RTools has one and this works)
Ensure the zip is on your path
Restart R – this is important, otherwise R will not recognize the changed path
Upon completion, the litmus test is if R can see your zip. To do this, try:
Sys.which("zip")
You should get a result similar to this:
zip
"C:\\Rtools\\R-3.1\\bin\\zip.exe"
In other words, R should recognize the installation path.
On previous occasions when people told me this didn’t work, it was always because they thought they had a zip in the path, but it turned out they didn’t.
One last comment: installing 7zip may not work. The reason is that 7zip contains a utility called 7zip, but R will only look for a utility called zip.
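The litmus test described above can be scripted so that it runs before any attempt to publish. A minimal sketch, assuming nothing beyond base R; has_zip() is a hypothetical helper name and is not part of the AzureML package.

```r
# Hypothetical helper: TRUE if R can find an executable named "zip" on the PATH.
has_zip <- function() {
  zip_path <- Sys.which("zip")
  found <- nzchar(zip_path)
  if (found) {
    message("zip found at: ", zip_path)
  } else {
    message("zip not found; install one (e.g. via Rtools), add it to PATH, restart R")
  }
  unname(found)
}

has_zip()
```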
I saw this link earlier, but the points that had kept my code from working were:
1. The address and path of Rtools were not as straightforward as expected
2. You need to restart R
With regard to the address, always check where Rtools was actually installed. I also used this code to set the path, and ALWAYS ADD ZIP at the end
##Rtools.bin="C:\\Users\\User_2\\R-Portable\\Rtools\\bin"
Rtools.bin="C:\\Rtools\\bin\\zip"
sys.path = Sys.getenv("PATH")
if (Sys.which("zip") == "" ) {
system(paste("setx PATH \"", Rtools.bin, ";", sys.path, "\"", sep = ""))
}
Sys.which("zip")
You should get a return value of
"C:\\RTools\\bin\\zip"
Andrie's comment here: https://github.com/RevolutionAnalytics/AzureML/commit/9cf2c5c59f1f82b874dc7fdb1f9439b11ab60f40
implies that we can just download RTools and be done with it.
Download RTools from:
https://cran.r-project.org/bin/windows/Rtools/
During installation select the check box to modify the PATH
At first it didn't work. I then tried R32bit, and that seemed to work. Then R64 bit started working again. Honestly, not sure if I did something in the middle to make it work. Only takes a few minutes so worth a punt.
Try the following
-Download the Rtools file which usually contains the zip utility.
-Copy all the files in the "bin" folder of "Rtools"
-Paste them in "~/RStudio/bin/x64" folder
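As an alternative to changing the persistent PATH or copying files around, the search path can also be extended for the current R session only, which takes effect without restarting R. This is a sketch under assumptions: prepend_path() is a hypothetical helper, and the Rtools directory in the usage comment is a guess that has to match the real install location. Entries in PATH are directories, so the directory that contains zip.exe is what gets added.

```r
# Hypothetical helper: put 'dir' at the front of PATH for the current R session only.
prepend_path <- function(dir) {
  old_path <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(dir, old_path, sep = .Platform$path.sep))
  invisible(old_path)  # returned so the caller can restore it later
}

# Usage (assumed install location; check where Rtools actually lives on your machine):
# prepend_path("C:\\Rtools\\bin")
# Sys.which("zip")
```

The change is lost when the session ends, so it would need to go into a startup file to be permanent.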

External Scripting and R (Kognitio)

I have created the R script environment (used this command to create it "create script environment RSCRIPT command '/usr/local/R/bin/Rscript --vanilla --slave'") and tried running the one R script but it fails with the below error message.
ERROR: RS 10 S 332659 R 31A004F LO:Script stderr: external script vfork child: No such file or directory
Is it because of the line below, which I am using in the script?
mydata <- read.csv(file=file("stdin"), header=TRUE)
if (nrow(mydata) > 0){
I am not sure what it is expecting.
I have one more question:
1) Do we need to install the R package on our Unix box? If not, does the Kognitio package already include it?
I suspect the problem here is that you have not installed the R environment on ALL the database nodes in your system - it must be installed on every DB node involved in processing (as explained in chapter 10 of the Kognitio Guide which you can download from http://www.kognitio.com/forums/viewtopic.php?t=3) or you will see errors like "external script vfork child: No such file or directory".
You would normally use a remote deployment tool (e.g. HP's RDP) to ensure the installation was identical on all DB nodes. Alternatively, you can leverage the Kognitio wxsync tool to synchronise files across nodes.
Section 10.6 of the Kognitio Guide also explains how to constrain which DB nodes are involved in processing - this is appropriate if your script environment should not run on all nodes for some reason (e.g. it has an expensive per-node/per-core licence). That does not seem appropriate for using R though.

Unexpected behavior of R after install on another EC2 instance

I've been fighting this problem for the second day straight, after a completely sleepless night, and I'm really starting to lose my patience and strength. It all started after I decided to provision another (paid) AWS EC2 instance in order to test my R code for dissertation data analysis. Previously I was using a single free tier t1.micro instance, which is painfully slow, especially when testing/running particular code. Time is much more valuable than the reasonable number of cents per hour that Amazon is charging.
Therefore, I provisioned a m3.large instance, which I hope should have enough power to crunch my data comfortably fast. After EC2-specific setup, which included selecting Ubuntu 14.04 LTS as an operating system and some security setup, I installed R and RStudio Server per instructions via sudo apt-get install r-base r-base-dev as ubuntu user. I also created ruser as a special user for running R sessions. Basically, the same procedure as on the smaller instance.
The current situation is that any command that I issue at the R session command line results in messages like this: Error: could not find function "sessionInfo". The only function that works is q(). I suspect a permissions problem here; however, I'm not sure how to approach investigating permission-related problems in the R environment. I'm also curious what the reasons for this situation could be, considering that I was following recommendations from R Project and RStudio sources.
I was able to pinpoint the place that I think caused all that horror: it was just a small configuration file, "/etc/R/Rprofile.site", which I had previously updated with directives borrowed from R experts' posts here on StackOverflow. After removing the questionable contents, I was able to run R commands successfully. Out of curiosity and for sharing this hard-earned knowledge, here are the removed contents:
local({
# add DISS_FLOSS_PKGS to the default packages, set a CRAN mirror
DISS_FLOSS_PKGS <- c("RCurl", "digest", "jsonlite",
"stringr", "XML", "plyr")
#old <- getOption("defaultPackages")
r <- getOption("repos")
r["CRAN"] <- "http://cran.us.r-project.org"
#options(defaultPackages = c(old, DISS_FLOSS_PKGS), repos = r)
options(defaultPackages = DISS_FLOSS_PKGS, repos = r)
#lapply(list(DISS_FLOSS_PKGS), function() library)
library(RCurl)
library(digest)
library(jsonlite)
library(stringr)
library(XML)
library(plyr)
})
Any comments on this will be appreciated!
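One likely explanation, offered as a hedged reading rather than a confirmed diagnosis: options(defaultPackages = DISS_FLOSS_PKGS) replaces the default package list instead of extending it, so packages such as utils (the home of sessionInfo()) are no longer attached at startup, while q() still works because it lives in base. The commented-out line in the snippet above shows the extending form. A minimal sketch of that form follows; extend_default_packages() is a hypothetical helper name.

```r
# Hypothetical helper: add packages to the startup list without dropping the defaults.
extend_default_packages <- function(extra) {
  old <- getOption("defaultPackages")  # normally includes utils, stats, methods, ...
  new <- unique(c(old, extra))
  options(defaultPackages = new)
  invisible(new)
}

# In Rprofile.site this could be called as, for example:
# extend_default_packages(c("RCurl", "digest", "jsonlite", "stringr", "XML", "plyr"))
```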

How to make a log-file of an R-session which combines commands, results and warnings/messages/errors from the R-console

I would like to produce a log-file which keeps track of all commands (stdin), results (stdout) and errors/warnings/messages (stderr) in the R console.
I am aware that there are a lot of logging-packages and I tried several like TeachingDemos (seems to ignore stderr completely) or R2HTML (seems to ignore messages), however, none of them seems to include everything from stderr.
Only knitr and markdown seem to be able to include everything into a single file. But using this workaround, I have to write R-scripts and I cannot freely write commands in the console. Furthermore, I cannot include the knitr or markdown command in the same R-script (which is of course a minor problem).
Here is an example:
library(TeachingDemos)
library(R2HTML)
library(TraMineR)
logdir <- "mylog.dir"
txtStart(file=file.path(logdir, "test.txt"), commands=TRUE,
results=TRUE, append=FALSE)
HTMLStart(outdir = logdir, file = "test", echo=TRUE, HTMLframe=FALSE)
## Messages, warnings and errors
message("Print this message.")
warning("Beware.")
"a" + 1
geterrmessage()
## Some example application with the TraMiner package
## which uses messages frequently
data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])
mvad.ham <- seqdist(mvad.seq, method="HAM")
txtStop()
HTMLStop()
If you are running R from a Unix/Linux/Mac/etc. terminal, you can do:
R | tee mydir/mylog.txt
On Windows, you can run the script in batch mode with
R CMD BATCH yourscript.R
and your result will appear in the same folder as yourscript.Rout
On unices, I've often used the following code idiom with bash:
Some Command 2>&1 | tee NameOfOutputFile.txt
The "2>&1" says to take stderr and redirect it to stdout, which then gets piped to "tee". I will be experimenting with this and other ways of logging an R session.
Another Unix trick is the "script" command, which starts a subshell whose I/O (basically everything you type and see in return) is logged to the specified file. You then exit the shell to end the logging. Again, subject to experimentation. Since "sink" is native to R, I'll be trying that first.
By the way, I picked these tricks up using Solaris, but they work the same running Cygwin, the Unix emulator on Windows. A long time ago, I found that my Cygwin images were more up-to-date than the institutional installations of Solaris because the administrators had much more responsibility than just keeping Solaris up-to-date. Mind you, the institutional machines were more powerful, so even though Cygwin was way more convenient, my personal machine simply didn't fill the need.
AFTERNOTE:
I took example code from page 99 of Shumway's Time Series Analysis and Its Applications With R examples. Here are the contents of a test file palette.R:
r
plot(gnp)
acf2(gnp, 50)
gnpgr = diff(log(gnp)) # growth rate
plot(gnpgr)
acf2(gnpgr, 24)
sarima(gnpgr, 1, 0, 0) # AR(1)
sarima(gnpgr, 0, 0, 2) # MA(2)
ARMAtoMA(ar=.35, ma=0, 10) # prints psi-weights
quit("no")
exit
I invoked it using:
script < palette.R
It captures the commands from palette.R and the corresponding output. So, seems usable for batch mode. For interactive mode, I'm going to go with my original plan and use sink.
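For the sink approach mentioned above, a minimal sketch could look like the following. with_log() is a hypothetical helper, not something from the posts. It diverts both stdout and the message stream to one file. It does not echo the commands themselves, and warnings that R defers until the top-level call has finished may still appear on the console rather than in the file.

```r
# Hypothetical helper: evaluate 'code' with stdout and messages diverted to 'path'.
with_log <- function(path, code) {
  con <- file(path, open = "wt")
  sink(con)                    # results (stdout)
  sink(con, type = "message")  # messages and errors (stderr)
  on.exit({
    sink(type = "message")     # restore stderr first
    sink()                     # then stdout
    close(con)
  })
  force(code)                  # 'code' is evaluated lazily, so it runs here
  invisible(path)
}

log_file <- tempfile(fileext = ".log")
with_log(log_file, {
  print(1 + 1)
  message("Print this message.")
})
readLines(log_file)
```

Restoring the sinks in on.exit() means the console comes back even if the logged code stops with an error.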
I had a similar problem and in my case, I was unable to redirect truly all the output into my log file, as it was dependent on R running in interactive mode.
Specifically, in my log file, I wanted to be able to keep track of the progress bar generated by the rjags::update() function which, however, requires interactive mode:
The progress bar is suppressed if progress.bar is "none" or NULL, if
the update is less than 100 iterations, or if R is not running
interactively.
Therefore, I needed to trick R into thinking that it was running interactively, while it was in fact run from a bash script (interactive_R.sh, below) using a here document:
interactive_R.sh
#!/bin/bash
R --interactive << EOT
# R code starts here
print(interactive())
quit("no")
# R code ends here
EOT
(Sidenote: Make sure to avoid the $ character in your R code, as this would not be processed correctly - for example, retrieve a column from a data.frame() by using df[["X1"]] instead of df$X1.)
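The reason for the sidenote is that, in a here document with an unquoted delimiter, the shell expands $X1 as a shell variable before R ever sees the code. The two column accessors are equivalent in R, as this small sketch with a made-up data frame illustrates. Quoting the delimiter (<< 'EOT') is another way to switch off shell expansion, although that was not tested in the post.

```r
# Made-up example data; the column names mirror the sidenote above.
df <- data.frame(X1 = c(1, 2, 3), X2 = c("a", "b", "c"))

col_brackets <- df[["X1"]]  # contains no $, so the shell leaves it alone
col_dollar <- df$X1         # inside an unquoted here document the shell would expand $X1

identical(col_brackets, col_dollar)
```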
Then you can simply run this script and send its contents into a log file using the bash command below:
$ ./interactive_R.sh > outputFile.log 2>&1
Your log file will then look as follows:
outputFile.log
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
> # R code starts here
> print(interactive())
[1] TRUE
> quit("no")
