Conditional expression for '...' not working in function - r

I am trying to pass multiple if-else conditions in my function and so I am using dots because the function will take many arguments.
I am trying to return yes when the first element of dots is a list. For example (simple version):
testt <- function(...){
  ar <- list(...)
  val <- ar[[1]]
  if(is(val == 'list') == TRUE){
    print('yes')
  } else {print('no')}
}
testt(list(1))
>[1] "no"
Warning message:
In if (is(val == "list") == TRUE) { :
the condition has length > 1 and only the first element will be used
It won't accept that ar[[1]] is a list. How can I work around this? Additionally, is the warning telling me something about why this is not working?
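A likely cause, assuming the intent was to test the class of val: is(val == 'list') first compares val with the string 'list' and then asks is() for the classes of the resulting logical. That returns more than one class name, which is what the warning about a condition of length > 1 points at. Passing the class name as the second argument (or using is.list()) yields a single TRUE/FALSE. The name first_is_list below is hypothetical, just for illustration.

```r
library(methods)  # provides is(); attached by default in interactive sessions

# Hypothetical helper: report whether the first argument in ... is a list
first_is_list <- function(...) {
  ar <- list(...)
  val <- ar[[1]]
  # is(object, class2) returns a single TRUE/FALSE
  if (is(val, "list")) "yes" else "no"
}

print(first_is_list(list(1)))  # a list was passed
print(first_is_list(1))        # a bare number was passed
```

is.list(val) would work equally well here and avoids the dependency on the methods package.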

Related

How to include a not-yet indexed parameter into if-else

I have the following function:
foo <- function(...){
  dots <- list(...)
  response <- dots[[1]]
  if(is(dots[[2]],'list') == TRUE){print('yes')} else print('no')
}
This produces the following output:
foo('yes'):
Error in dots[[2]] : subscript out of bounds
How can I use a 'not-yet' indexed parameter so that I can stall the function when it's TRUE or when it's FALSE? For example, when it's TRUE I would do some stuff based on this; otherwise, when it is FALSE, the part of the function that uses it won't run.
However, R wants me to at least index dots with some list values.
For example, if I wanted to use just:
foo('yes')
>Error in dots[[2]] : subscript out of bounds
#otherwise
foo('yes',list('some','list'))
>[1] "yes"
I want to be able to run foo('yes') and for it to print no. Essentially, some parameters won't get used in the function, and so in this case when it's not assigned anything then run the else statement.
Picking up on @Rui Barradas's and @Allan Cameron's comments, I can achieve the same behaviour as function(pred=NULL,...) by using:
foo <- function(...){
  dots <- list(...)
  response <- dots[[1]]
  print(response)
  if(length(dots) > 1){
    if(is(dots[[2]],'list') == TRUE){
      print('yes')
    } else print('no')
  } else if (length(dots) == 1){
    dots[[2]] <- NULL
  }
}
Results:
> foo('yes',list(1, 2, 3))
[1] "yes"
[1] "yes"
> foo('yes')
[1] "yes"
Are there any cleaner alternatives to this that reduce the amount of code? My approach produces quite a bit of clutter. The only issue I have with this is that if I wanted dots[[3]], I would have to implement further conditionals to access it or set it to NULL.
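One possible way to cut the clutter, sketched here rather than offered as the definitive answer, is a small accessor that returns NULL for positions that were not supplied. Since is.list(NULL) is FALSE, a missing argument then falls through to the else branch without extra length checks. The helper name nth_or_null is hypothetical.

```r
# Hypothetical helper: return the i-th element of a list, or NULL if it is absent
nth_or_null <- function(x, i) {
  if (length(x) >= i) x[[i]] else NULL
}

foo <- function(...) {
  dots <- list(...)
  response <- nth_or_null(dots, 1)
  pred <- nth_or_null(dots, 2)
  # is.list(NULL) is FALSE, so a missing second argument yields "no"
  if (is.list(pred)) "yes" else "no"
}

print(foo("yes", list(1, 2, 3)))
print(foo("yes"))
```

The same accessor covers dots[[3]] and beyond. If the optional arguments have fixed meanings, naming them in the signature with NULL defaults, as in function(response, pred = NULL, ...), is arguably clearer still.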

Using group_modify to apply function to grouped dataframe

I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
  highest_conc <- 0
  for (i in 1:nrow(df)) {
    curr_time <- df$event_time[i]
    within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
    num_buyers <- length(unique(within1hr$userid))
    curr_conc <- nrow(within1hr)/num_buyers
    if (curr_conc > highest_conc) {
      highest_conc <- curr_conc
    }
  }
  mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { # df is the rows of data for one shop
  highest_conc <- 0
  for (i in 1:nrow(df)) {
    curr_time <- df$event_time[i]
    within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
    num_buyers <- length(unique(within1hr$userid))
    curr_conc <- nrow(within1hr)/num_buyers
    if (curr_conc > highest_conc) {
      highest_conc <- curr_conc
    }
  }
  mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing on additional arguments to other functions, but I do not see either of these happening here. Do let me know if I am missing something or if you have a better solution for my code.
The dots don't do much in that case, but group_modify() requires them for your function to work. The function you pass is converted by a helper function, as_group_map_function(). This helper checks whether the function has at least two arguments and, if not, whether it has ... to absorb the extra ones:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
  .f <- rlang::as_function(.f)
  if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
    stop("The function must accept at least two arguments. You can use ... to absorb unused components")
  }
  .f
}
I'm not 100% sure why it is done, but from a quick peek at the source code it looks like there is a point where two arguments and ... are passed to the 'converted' version of your function (technically no conversion happens here; it only takes place if you pass a formula instead of a function). My best guess is that this is the reason: the function needs some way of dealing with at least two arguments, and if it doesn't need them, it needs ... to 'absorb' them; otherwise the call would fail.
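The effect can be reproduced in base R without dplyr. The sketch below assumes, as the group_modify() documentation describes, that .f is called with the group's rows first and a one-row key second. call_like_group_modify is a hypothetical stand-in for that call, not dplyr's actual implementation.

```r
# Hypothetical stand-in for how group_modify() invokes .f: rows first, key second
call_like_group_modify <- function(.f, rows, key) .f(rows, key)

one_arg   <- function(df) nrow(df)       # no room for the key
with_dots <- function(df, ...) nrow(df)  # ... silently absorbs the key

rows <- data.frame(userid = c(1, 2, 2))
key  <- data.frame(shop = "A")

res_dots <- call_like_group_modify(with_dots, rows, key)
res_one  <- tryCatch(
  call_like_group_modify(one_arg, rows, key),
  error = function(e) "error: unused argument"
)
print(res_dots)
print(res_one)
```

The one-argument function fails as soon as a second argument is supplied, which is the failure the check in as_group_map_function() reports up front with a clearer message.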

Default argument to function only if condition is met

I am currently writing a function that I want to pass default arguments if a condition is met. If the condition is not met, no argument should be passed. How can I achieve this?
I tried it with ifelse and NULL like in this minimal example but it did not work:
my_function <- function(.data,
                        .variable = ifelse("var1" %in% names(.data), "var1", NULL)){
  ...
}
If "var1" is not a variable name of .data and I don't pass another argument to .variable, I want to get an error like "argument ".variable" is missing, with no default". My solution works but I get other error messages.
It seems that ifelse doesn't like having NULL as the response in the case the condition is FALSE:
ifelse(2 < 1, 1, NULL)
# Error in ans[!test & ok] <- rep(no, length.out = length(ans))[!test & :
# replacement has length zero
# In addition: Warning message:
# In rep(no, length.out = length(ans)) :
# 'x' is NULL so the result will be NULL
It seems to come from the fact that ifelse returns
A vector of the same length and attributes (including dimensions and
"class") as test and data values from the values of yes or no.
and
If yes or no are too short, their elements are recycled.
Seeing rep in the error message, together with the fact that length(NULL) is zero, seems to be good evidence. So, instead you may want to use, e.g.,
my_function <- function(.data, .variable = if("var1" %in% names(.data)) "var1" else NULL) {
  is.null(.variable)
}
my_function("1")
# [1] TRUE
See ?ifelse for other warnings.
I would suggest not doing it directly in the default argument, but at the start of the function, with something along the lines of:
my_function <- function(.data,
                        .variable = NULL) {
  if (is.null(.variable)) {
    if ("var1" %in% names(.data)) {
      .variable <- "var1"
    } else {
      stop(".variable undefined with no suitable default")
    }
  }
  ...
}
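If the error should come from the default itself, as the question asks, one option is to put stop() in the else branch of the default. This is only a sketch and relies on R evaluating default arguments lazily, that is, only when .variable is first used inside the function. The error text is copied from the question; the function body is a placeholder.

```r
# Sketch: the default expression is only evaluated when .variable is first used
my_function <- function(.data,
                        .variable = if ("var1" %in% names(.data)) "var1" else
                          stop("argument \".variable\" is missing, with no default")) {
  .variable  # placeholder body: just return the resolved value
}

print(my_function(data.frame(var1 = 1)))    # default resolves to "var1"
print(my_function(data.frame(x = 1), "x"))  # explicit value, default never evaluated
res <- tryCatch(my_function(data.frame(x = 1)), error = function(e) e)
print(conditionMessage(res))
```

Because of lazy evaluation, the stop() call is never reached when the caller supplies .variable explicitly.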

R - How do I calculate the percentage of sentences containing a string?

I want to calculate the percentage of sentences in a text that contain a double quotation mark and have written the following function to do so:
library(tokenizers)
quote_ratio <- function(text){
  sentences <- tokenize_sentences(text, simplify = TRUE)
  quote_sentences <- 0
  for (i in sentences){
    quote_hits <- grepl('\\"', i)
    if (quote_hits == TRUE) {
      quote_sentences <- quote_sentences + 1
    }
  }
  ratio <- quote_sentences / length(sentences)
  return (ratio)
}
The function works in many cases but with more data I run into the issue of having NA and/or NULL values in my sentences.
library(tm)
corpus = VCorpus(DirSource("/path/to/directory"))
ratios <- tm_map(corpus, content_transformer(quote_ratio))
# Error in if (quote_hits == TRUE) { : argument is of length zero
# In addition: Warning message:
# In if (quote_hits == TRUE) { : the condition has length > 1 and only the first element will be used
I've tried changing the if statement to check for null and NA values as follows:
if (!is.na(quote_hits) && !is.null(quote_hits) && quote_hits == TRUE) {
But this only produces more errors:
# Error in if (!is.na(quote_hits) && !is.null(quote_hits) && quote_hits == : missing value where TRUE/FALSE needed
Is there a better way to formulate the if statement and/or function? Many thanks.
EDIT:
I later realized it was likely a mistake to use the tm_map and content_transformer functions to calculate this. The function worked just fine when I stored the texts in a vector and used lapply.
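One way to sidestep the scalar if altogether is to let grepl() work on the whole vector of sentences and take the mean of the result. The sketch below assumes the sentences have already been split (for example by tokenize_sentences()), drops NA entries, and returns NA for empty input. quote_ratio_vec is a hypothetical name.

```r
# Hypothetical vectorised variant: share of sentences containing a double quote
quote_ratio_vec <- function(sentences) {
  sentences <- sentences[!is.na(sentences)]     # ignore missing sentences
  if (length(sentences) == 0) return(NA_real_)  # nothing to count
  # grepl() is vectorised; mean() of a logical vector is the share of TRUEs
  mean(grepl('"', sentences, fixed = TRUE))
}

print(quote_ratio_vec(c('He said "hi".', "No quotes here.", NA)))
print(quote_ratio_vec(character(0)))

# If a scalar test is still needed, isTRUE() is FALSE for NA, NULL and
# zero-length input, so it never raises the errors shown above
print(isTRUE(logical(0)))
```

The original guard fails because is.na(NULL) has length zero, and a zero-length value on the left of && yields NA rather than FALSE.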

Semi-automating argument validation for R functions

I would like the end-user functions in my R package (S3 style) to validate their arguments and give the user informative errors or warnings when a particular validity check fails.
The obvious (but tedious and unmaintainable) way to do this would be:
foo <- function(aa, bb, cc, dd){
  if(length(aa) != 1) stop("The argument 'aa' must have a single value")
  if(!is.numeric(aa)) stop("The argument 'aa' must be numeric")
  if(!is.character(bb)) stop("The argument 'bb' must be a character")
  if(length(bb) > 4 || length(bb) < 2) stop("The argument 'bb' must be a vector with a length between 2 and 4")
  if(!is.recursive(cc)) stop("The argument 'cc' must be a list-like object")
  if(!is.integer(dd)) stop("The argument 'dd' must contain only integers")
  if(any(dd < aa)) stop("All values in the argument 'dd' must be greater than the value of argument 'aa'")
  ## ...and so on
}
I'm assuming that I'm by far not the first one to do this. So, can anybody suggest a package that automates all or part of such validation tasks? Or, failing that, some concise, generic idioms that will limit the ugliness to as few lines as possible within each function?
Thanks.
stopifnot might be similar to what you're looking for. The error messages won't be quite as nice, though:
foo <- function(x){
  stopifnot(length(x) == 1, is.numeric(x))
  return(x)
}
which gives
> foo(c(1,3))
Error: length(x) == 1 is not TRUE
> foo("a")
Error: is.numeric(x) is not TRUE
> foo(3)
[1] 3
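The messages can be made friendlier by naming the arguments to stopifnot(); the name of a failing condition is then used as the error message. This is a sketch that requires R 4.0.0 or later, and the wording of the messages is borrowed from the question.

```r
# Requires R >= 4.0.0, where argument names are used as the error messages
foo <- function(x) {
  stopifnot(
    "The argument 'x' must have a single value" = length(x) == 1,
    "The argument 'x' must be numeric" = is.numeric(x)
  )
  x
}

msg <- tryCatch(foo(c(1, 3)), error = function(e) conditionMessage(e))
print(msg)
print(foo(3))
```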
You can write a helper function like this (rudimentary example):
validate <- function(x, ...){
  for(s in c(...)) switch(s,
    lengthone = if(length(x)!=1) stop("argument has length != 1."),
    numeric = if(!all(is.numeric(x))) stop("non-numeric arguments."),
    positive = if(any(x <= 0)) stop("non-positive arguments."),
    nonmissing = if(any(is.na(x))) stop("Missing values in arguments.")
  )
}
Results:
> validate(1, "numeric", "positive")
> validate(0, "numeric", "positive")
Error in validate(0, "numeric", "positive") : non-positive arguments.
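To get closer to the informative messages asked for in the question, the helper could also report which argument failed. The variant below is only a sketch: validate_arg is a hypothetical name, and it uses deparse(substitute(x)) to recover the expression the caller passed.

```r
# Hypothetical variant of validate() that names the offending argument
validate_arg <- function(x, ...) {
  arg <- deparse(substitute(x))  # the expression the caller passed, as text
  for (s in c(...)) {
    switch(s,
      lengthone = if (length(x) != 1) stop("'", arg, "' must have length 1.", call. = FALSE),
      numeric   = if (!is.numeric(x)) stop("'", arg, "' must be numeric.", call. = FALSE),
      positive  = if (any(x <= 0)) stop("'", arg, "' must be positive.", call. = FALSE),
      stop("unknown check: ", s, call. = FALSE)  # default branch: unrecognised check name
    )
  }
  invisible(x)
}

foo <- function(aa) {
  validate_arg(aa, "lengthone", "numeric", "positive")
  aa
}

msg <- tryCatch(foo(0), error = function(e) conditionMessage(e))
print(msg)
```

Each end-user function then needs one line per argument, and the set of checks lives in a single place.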
