Azure, R: showing the Standard Error

When I develop an experiment in Azure ML I can insert the "Execute R Script" module. After I have run it, I can explore the outputs the module produces.
My problem is that I have two modules: I filter a dataset in the first and use the resulting dataset in the second.
Then I create a web service with it.
Problem: when the filtering returns an empty dataset, this can break the functions in the second module.
I want to find a way to write to the "Standard Error" output. I have tried to use:
if (length(dataset$column1)==0) {warning("Empty filtering!!!!")}
but it does not work.

According to the R manual page for the NULL object, try using the function is.null(x) as the if condition.
Also note that NA and NULL are two distinct concepts in R; see the blog post http://www.r-bloggers.com/r-na-vs-null/ for the difference, and use the function is.na(x) rather than is.null(x) when testing for NA values.
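For example, a minimal sketch along these lines (the maml.* calls are the classic Azure ML Studio "Execute R Script" interface and the column name column1 is taken from the question; treat the exact API as an assumption):
dataset <- maml.mapInputPort(1)   # map the first input port to a data frame
if (is.null(dataset) || nrow(dataset) == 0) {
  # message() and warning() both write to standard error, which the module
  # surfaces in its "Standard Error" log output
  message("Empty filtering!")
} else if (all(is.na(dataset$column1))) {
  message("column1 contains only NA values")
}
maml.mapOutputPort("dataset")     # pass the data through to the output port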


Variables used in Test Package in RStudio

I am trying to fix an issue in an R project (that I'm not too familiar with). The test script that is executed when running "Test Package" in RStudio uses a variable; let's call it x. From the result of the test I can tell that the data assigned to this variable is outdated, and I want to update it.
The problem is, I just cannot figure out where this variable is actually defined. I have grepped the entire source code of the package, and the only occurrence I find is the one in the test script; there is no declaration. It is not defined in the script itself nor anywhere else in the source code.
When I just load the package, the variable is not defined. Somehow it is defined when the test runs, though, because only when I change the name in the test script to some dummy name do I get an error saying it isn't defined.
Is there a specific place where I could look, or maybe a simple trick for figuring out where and how the variable is defined?
Thanks
Edit: the actual variable name is a bit complicated, it is not x
The Find in Files option in RStudio may help.
You can search through multiple files within a folder.
If you get too many matches to sort through (I'm really hoping your variable is not actually called x!), you can try using a regular expression.
In RStudio, open Edit > Find in Files (Ctrl+Shift+F), point it at the package directory, and search for the variable name (or a regular expression matching an assignment to it); that should show you where it is defined.
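If you prefer to search from R itself rather than through the IDE, a rough sketch like this lists every file that contains an assignment to the variable (the path "path/to/package" and the pattern for x are placeholders to adapt):
files <- list.files("path/to/package", pattern = "\\.[Rr]$",
                    recursive = TRUE, full.names = TRUE)
hits <- lapply(files, function(f) grep("\\bx\\s*(<-|=)", readLines(f), value = TRUE))
names(hits) <- files
Filter(length, hits)   # keep only the files with at least one matching line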

Function parameters - replace by reference

Thanks for all your advice. My remaining question is this:
Can I replace the column name 'sulfate' in the following statement ...
dataclean <- datatable$sulfate[!datanas]
.... with a reference to a parameter 'pollutant', which may or may not have a value of 'sulfate'?
When you pass values to arguments, inside the function they behave as if they were objects in your workspace. But the environment is not your workspace; it is the function's own environment.
So in your case, directory would be a character string and it would work, but only the first time: your working directory has now changed, and you need to revert to the previous one for the function to work again. This can get pretty messy, so what I like to do is refer to raw files by their full path, as in the sketch below; see ?list.files for more info.
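A rough sketch of the full-path approach (the directory "path/to/rawdata" is a placeholder):
files <- list.files("path/to/rawdata", pattern = "\\.csv$", full.names = TRUE)
datatable <- read.csv(files[1])   # read the first file by its full path, no setwd() needed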
For your second question, your best bet for referring to a column by a name stored in a variable is to do
x[, pollutant]
It is convenient to add the drop = FALSE argument there, in order to keep what I'm assuming is a data.frame as a data.frame.
You could improve your function by also making datatable an argument, as in the sketch below; that way you have all the objects bundled together nicely.
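A minimal sketch of what that could look like, assuming datatable is a data.frame, pollutant is a column name given as a string, and the NA handling mirrors the datanas vector from the question (clean_pollutant is a made-up name):
clean_pollutant <- function(datatable, pollutant = "sulfate") {
  datanas <- is.na(datatable[, pollutant])    # which rows are missing
  # drop = FALSE keeps the result as a data.frame rather than a bare vector
  datatable[!datanas, pollutant, drop = FALSE]
}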
The most important thing to note here is debugging. You should learn to use at least browser(). This function stops the execution of your function at the very step where it was called. This enables you, in the R console, to inspect objects inside the function and run code to see what's going on. This can speed up the development of code, at least initially, when you usually haven't internalized all the data structures and paradigms yet.
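For instance, dropping browser() into the function pauses it right there so you can inspect the arguments interactively (a sketch, not the question's actual code):
clean_pollutant_debug <- function(datatable, pollutant) {
  browser()    # execution pauses here; type n to step, c to continue, Q to quit
  datatable[, pollutant, drop = FALSE]
}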

How to use strings stored in vectors in another function

Hello, I'm trying this for loop in which I load multiple libraries in R.
lbs<-c("plyr","dplyr","magrittr","readr")
for (i in 1:length(lbs)) {
library(lb[i])
}
but I get this error:
Error in library(lb[i]) : 'package' must be of length 1
My question covers two dilemmas.
How do I use strings stored in a vector as arguments to another function?
How do I tell RStudio to load certain libraries by default every time I open R?
In short:
The library() function is a bit weird: it treats an unquoted argument as a literal package name. Try library(lbs[i], character.only = TRUE) (note that your vector is named lbs, not lb), as in the sketch below. There is an example illustrating this at the very bottom of ?library, for what it's worth.
Read ?Startup, in particular about using .Rprofile files.
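Putting both pieces together, a minimal sketch (using the lbs vector from the question):
lbs <- c("plyr", "dplyr", "magrittr", "readr")
for (i in seq_along(lbs)) {
  # character.only = TRUE makes library() treat lbs[i] as a string
  # holding the package name, not as the name of a package itself
  library(lbs[i], character.only = TRUE)
}
# equivalent without an explicit loop:
invisible(lapply(lbs, library, character.only = TRUE))
To have these loaded on startup, the same lines can go in an .Rprofile file (see ?Startup for where R looks for it).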

Automatic Plotting in *.r file

I have an *.r file in which I conduct a chi-square test of independence and write the result to an HTML file. It's working fine, but I'd like to add a graph.
Doing it by hand with commands at the R prompt works perfectly, but the exact same commands do not work in the *.r file, and I want it to happen automatically.
mat1 <-matrix(c(12,3,2,12),nrow=2,byrow=T)
attach(mat1)
png('independence.png')
barplot(mat1,beside=TRUE)
dev.off()
Is there an additional command necessary?
kind regards
If you have an error in a script with no try or tryCatch, the entire script fails. By trying to attach a matrix, you throw an error with the message:
Error in attach(mat1) :
'attach' only works for lists, data frames and environments
So you should pay more attention to the error messages in interactive mode, and if you are planning to use .r files for production you should learn to use R's error-handling routines. The attach function is a common source of new-user errors, although this particular error is not especially common. The more typical problems with attach involve regression functions whose authors expect entire objects, usually data frames, to be passed to a data argument.
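A corrected sketch of the script, with the unnecessary attach() dropped and the plotting wrapped in tryCatch so a failure is reported instead of silently killing the rest of the file:
mat1 <- matrix(c(12, 3, 2, 12), nrow = 2, byrow = TRUE)
tryCatch({
  png("independence.png")        # open the PNG device
  barplot(mat1, beside = TRUE)   # grouped bar plot of the 2x2 table
  dev.off()                      # close the device so the file is written
}, error = function(e) {
  message("Plotting failed: ", conditionMessage(e))
})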

Specify my dataset as working dataset

I am a newbie to R.
I can successfully load my dataset into RStudio, and I can see my dataset in the workspace.
When I run the command summary(mydataset), I get the expected summary of all my variables.
However, when I run
data(mydataset)
I get the following warning message:
In data(mydataset) : data set ‘mydataset’ not found
I need to run the data() command as recommended in the example for fitLogRegModel(), which is part of the PredictABEL package.
Does anybody have a hint on how I can specify mydataset as working dataset?
You don't need to use the data() command. You can just pass your data to the function:
riskmodel <- fitLogRegModel(data=mydataset, cOutcome=2,
cNonGenPreds=3:10, cNonGenPredsCat=6:8,
cGenPreds=c(11, 13:16), cGenPredsCat=0)
The example uses data(ExampleData) so that it can make data that is in the package available to you. Since you already have your data, you don't need to load it.
An alternative, although it has its drawbacks, is to use attach(mydataset). You can then refer to variables without the mydataset$ prefix. The main drawback, as far as I know (although I'd welcome the views of more expert R users), is that if you modify a variable after attaching, the change is not part of the dataset. This can cause confusion and lead to "gotchas", as the sketch below illustrates. In any case, many expert R users counsel against the use of attach.
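A small made-up illustration of that gotcha (the data frame is invented purely for the example):
mydataset <- data.frame(age = c(30, 45), outcome = c(0, 1))
attach(mydataset)
age <- age + 1        # creates a new object, age, in the global environment
detach(mydataset)
age                   # 31 46  (the modified copy)
mydataset$age         # 30 45  (the column in the data frame is unchanged)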
