R checking a parameter is defined - r

I'm looking for a general practice how to check a parameter was defined in a function.
I came up with these three ideas. Which one is the proper way of doing it?
Unfortunately, the third one is not working. substitute() is working differently in a function and I couldn't figure it out how to use it properly.
file.names <- list(
cov.value <- "cov.rds",
plot.name <- "plot.pdf"
)
test1 <- function(file.names){
is.save <- !missing(file.names)
}
test2 <- function(file.names = NULL) {
is.save <- !is.null(file.names)
}
test3 <- function(file.names = NULL) {
is.save <- exists(as.character(substitute(file.names)))
}

I personally think the second approach with a default value is MUCH easier to use & understand. (And the third approach is just really bad)
...especially when you are writing a wrapper function that needs to pass an argument to it. How to pass a "missing"-value is not obvious!
wraptest1 <- function(n) {
file.names <- if (n > 0) sample(LETTERS, n)
else alist(a=)[[1]] # Hacky way of assigning 'missing'-value
print(test1(file.names))
}
wraptest1(2) # TRUE
wraptest1(0) # FALSE
wraptest2 <- function(n) {
file.names <- if (n > 0) sample(LETTERS, n)
else NULL # Much easier to read & understand
print(test2(file.names))
}
wraptest2(2) # TRUE
wraptest2(0) # FALSE
[granted, there are other ways to work around passing the missing value, but the point is that using a default value is much easier...]
Some default values to consider are NULL, NA, numeric(0), ''

It's generally a good idea to look at code from experienced coders---and R itself has plenty of examples in the R sources.
I have seen both your first and second example being used. The first one is pretty idiomatic; I personally still use the second more often by force of habit. The third one I find too obscure.

Related

Storing input value to check wether postcondition holds true when applying Design-by-Contract

I make use of the assertthat package quite often to check postconditions in functions. When reading more about the idea of Design by Contract I stumbled upon the idea to make checks of output in comparison to input values.
The most simple example is the following:
toggle <- function(x)!x
One can immediately state that x == !old_x must always be true. (old_x stands for the value of x before evaluation.)
(Of course this example is oversimplified and the postcondition does not add more useful information for humans or computers. A more useful example is on the bottom of the question..)
So I can extend my toggle function as follows to check that condition with every call:
toggle <- function(x){
old_x <- x
x <- !x
assertthat::assert_that(x == !old_x)
return(x)
}
This works of course but I wondered if there's another way to access the value of old_x without explicitely store it (or the result) under a new name. And without splitting the postcondition-checking code to the top and bottom of the function. Something along the line of how R evaluates function calls..
One attempt I can think of is to use sys.call and eval.parent to access to the old values:
toggle <- function(x){
x <- !x
.x <- eval.parent(as.list(sys.call())[[2]])
assertthat::assert_that(x == !.x)
return(x)
}
This works, but I still need to assign a new variable .x and also the subsetting with [[2]] is not flexible yet. However writing it like assertthat::assert_that(x == !eval.parent(as.list(sys.call())[[2]]) does not work and playing around with the search levels of sys.call(-1 ..) did not help.
Another (a bit more useful) example where the postcondition adds some information:
increment_if_smaller_than_2 <- function(x){
old_x <- x
x <- ifelse(x < 2, x <- x + 1, x)
assertthat::assert_that(all(x >= old_x))
return(x)
}
Any hints?
You can access the old parameter-values by accessing it via the parent environment. For this solution to work, you need to introduce new variable(s) for the return-result, i.e. retval, to prevent re-assignments to method-params. IMHO this isn't a serious drawback, since it's good programming-style not to overwrite method-parameters anyway. You could i.e. do the following:
test <- function(.a) {
retval <- 2 * .a
assertthat::assert_that(abs(retval) >= abs(.a))
return(retval)
}
a <- 42
test(a)
# [1] 84
If you would like to take it a step further and submit the assertion-function dynamically you could do that as follows:
test_with_assertion <- function(.a, assertion) {
retval <- 2 * .a
assertthat::assert_that(assertion(retval, eval.parent(.a)))
return(retval)
}
a <- 42
test_with_assertion(a, function(new_value, old_value)
abs(new_value) >= abs(eval.parent(old_value)) )
# [1] 84
Does this do, what you intended to do?

User defined function - issue with return values

I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
for (j in 1:20) {
trialvec[i*j] <- segmentvalues[i]
}
}
#vector categorisation
vcategorise <- function(categories, data) {
#categorises a vector into a list of vectors
#requires plyr and dyplyr
assignment <- list()
catlength <- length(categories)
for (i in 1:length(catlength)){
for (j in 1:length(data)) {
if (any(contains(categories[i], ignore.case = TRUE,
as.vector(data[j])))) {
assignment[[i]][j] <- data[j]
}
}
}
return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
You are indexing assignments -- which is ok, even if at an index that doesn't have a value, that just gives you NULL -- and then indexing into what you get there -- which won't work if you get NULL. And NULL you will get, because you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
assignment <- vector("list", length = length(data) * length(categories))
n <- length(data)
for (i in 1:length(categories)){
for (j in 1:length(data)) {
assignment[(i-1)*n + j] <-
if (any(contains(categories[i],
ignore.case = TRUE,
as.vector(data[j])))) {
data[j]
} else {
NA
}
}
}
dim(assignment) <- c(length(data), length(categories))
assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.

Alternative to missing() inside a function

I want to detect if a variable is missing inside a function without calling the missing() function. I've found two alternatives, but they both seem crude.
Alternative 1
It seems that a variable that is missing has the environmnent class "name" but it seems intuitively wrong to use this construct:
a <- function(a, b){
e <- environment()
if(class(e[["b"]]) == "name")
e$b <- a
print(b)
}
Alternative 2
I guess a possible solution is to use parse and eval but it seems just as crude as the previous solution:
a <- function(a, b){
e <- environment()
if(eval(parse(text = sprintf("missing(%s)", "b"))))
e$b <- a
print(b)
}
Background
I need this as I'm changing the API and I would like to loop over all the old argument names within the ... and send a warning that the user should update to the new parameter names. This is why missing() doesn't work, my current setup is:
# Warnings due to interface changes in 1.0
API_changes <-
c(rowname = "rnames",
headings = "header",
halign = "align.header")
dots <- list(...)
fenv <- environment()
for (i in 1:length(API_changes)){
old_name <- names(API_changes)[i]
new_name <- API_changes[i]
if (old_name %in% names(dots)){
if (class(fenv[[new_name]]) == "name"){
fenv[[new_name]] <- dots[[old_name]]
dots[[old_name]] <- NULL
warning("Deprecated: '", old_name, "'",
" argument is now '", new_name ,"'",
" as of ver. 1.0")
}else{
stop("You have set both the old parameter name: '", old_name, "'",
" and the new parameter name: '", new_name, "'.")
}
}
}
Gosh,-- do we really have to point you to the appropriate fortune entry concerning eval(parse()) ?
Anyway,what's wrong with looping over the contents of dots<-list(...) ? It's not a time-pig by any means.
But my fundamental response is: you've made a mistake by allowing valid or invalid arguments within the ... entries. I don't know why you set up your previous function that way, but it's probably much cleaner, and safer, in the long run to eliminate this construct from your updated release. There's a reason that functions&packages come with help pages. Much as I approve of back-compatibility, I don't think you're doing anyone a favor here. Further, it's not clear to me how or why you'd want a required argument to be passed via ... . And if it's not required, then you don't want to emulate missing in the first place.
Your users will very quickly :-) realize that they've got invalid argument names. Regardless of whether you provide this transitional set of warning messages, they'll either adapt, or emigrate from your code to other options.

More efficient than for loop in R

I wonder are there more efficient ways to assign values to a new variable in a data frame, than using for loops. I have two recent example:
[1] Getting normalized Leveshtein distance using vwr package:
rst34$Levenshtein = rep(0, nrow(rst34))
for (i in 1:nrow(rst34)) {
rst34$Levenshtein[i] = levenshtein.distance(
as.character(rst34$target[i]), as.character(rst34$prime[i]))[[1]] /
max(nchar(as.character(rst34$target[i])), nchar(as.character(rst34$prime[i]))
)
}
[2] Extracting substring from another variable:
rst34$Experiment = 'rst4'
for (i in 1:nrow(rst34)) {
rst34$Experiment[i] = unlist(strsplit(as.character(rst34$subject[i]), '[.]'))[1]
}
Also, I think that there should be no difference between initializations in two examples:
rst34$Levenshtein = rep(0, nrow(rst34))
rst34$Experiment = 'rst4'
Many thanks!
Something like...
rst34$Experiment = sapply(rst34$subject, function(element){
unlist(strsplit(as.character(element), '[.]'))[1]
})
Should hopefully do the trick. I don't have your data, so I couldn't actually test it out.
It would only make sense to apply nchar to a character variable so the as.character calls are probably not needed:
rst34$Levenshtein <-
levenshtein.distance( rst34$target, rst34$prime) /
pmax(nchar(rst34$target),
nchar(rst34$prime ) )

Trying to vectorize a for loop in R

UPDATE
Thanks to the help and suggestions of #CarlWitthoft my code was simplified to this:
model <- unlist(sapply(1:length(model.list),
function(i) ifelse(length(model.list[[i]][model.lookup[[i]]] == "") == 0,
NA, model.list[[i]][model.lookup[[i]]])))
ORIGINAL POST
Recently I read an article on how vectorizing operations in R instead of using for loops are a good practice, I have a piece of code where I used a big for loop and I'm trying to make it a vector operation but I cannot find the answer, could someone help me? Is it possible or do I need to change my approach? My code works fine with the for loop but I want to try the other way.
model <- c(0)
price <- c(0)
size <- c(0)
reviews <- c(0)
for(i in 1:length(model.list)) {
if(length(model.list[[i]][model.lookup[[i]]] == "") == 0) {
model[i] <- NA
} else {
model[i] <- model.list[[i]][model.lookup[[i]]]
}
if(length(model.list[[i]][price.lookup[[i]]] == "") == 0) {
price[i] <- NA
} else {
price[i] <- model.list[[i]][price.lookup[[i]]]
}
if(length(model.list[[i]][reviews.lookup[[i]]] == "") == 0) {
reviews[i] <- NA
} else {
reviews[i] <- model.list[[i]][reviews.lookup[[i]]]
}
size[i] <- product.link[[i]][size.lookup[[i]]]
}
Basically the model.list variable is a list from which I want to extract a particular vector, the location from that vector is given by the variables model.lookup, price.lookup and reviews.lookup which contain logical vectors with just one TRUE value which is used to return the desired vector from model.list. Then every cycle of the for loop the extracted vectors are stored on variables model, price, size and reviews.
Could this be changed to a vector operation?
In general, try to avoid if when not needed. I think your desired output can be built as follows.
model <- unlist(sapply(1:length(model.list), function(i) model.list[[i]][model.lookup[[i]]]))
model[model=='']<-NA
And the same for your other variables. This assumes that all model.lookup[[i]] are of length one. If they aren't, you won't be able to write the output to a single element of model in the first place.
I would also note that you are grossly overcoding, e.g. x<-0 is better than x<-c(0), and don't bother with length evaluation on a single item.

Resources