Simple if-else loop in R - r

Can someone tell me what is wrong with this if-else loop in R? I frequently can't get if-else loops to work. I get an error:
if(match('SubjResponse',names(data))==NA) {
observed <- data$SubjResponse1
}
else {
observed <- data$SubjResponse
}
Note that data is a data frame.
The error is
Error in if (match("SubjResponse", names(data)) == NA) { :
missing value where TRUE/FALSE needed

This is not a full example as we do not have the data but I see these issues:
You cannot test for NA with ==, you need is.na()
Similarly, the output of match() and friends is usually tested for NULL or length()==0
I tend to write } else { on one line.
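To illustrate the first point, here is a minimal sketch using a made-up data frame (its contents are an assumption, since the question does not include the data). It shows that comparing with == NA yields NA rather than TRUE or FALSE, while is.na() or %in% gives a condition that if can use.

```r
# Hypothetical stand-in for the asker's data: only SubjResponse1 exists
data <- data.frame(SubjResponse1 = c(1, 0, 1))

idx <- match("SubjResponse", names(data))  # no match, so idx is NA
cmp <- idx == NA                           # any comparison with NA is NA

# is.na() returns a proper TRUE/FALSE that if() accepts
observed <- if (is.na(idx)) {
  data$SubjResponse1
} else {
  data$SubjResponse
}

# %in% tests for the column name directly and never returns NA
has_col <- "SubjResponse" %in% names(data)
```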

As @DirkEddelbuettel noted, you can't test NA that way. But you can make match not return NA:
By using nomatch=0 and reversing the if clause (since 0 is treated as FALSE), the code can be simplified. Furthermore, another useful coding idiom is to assign the result of the if clause, that way you won't mistype the variable name in one of the branches...
So I'd write it like this:
observed <- if(match('SubjResponse',names(data), nomatch=0)) {
data$SubjResponse # match found
} else {
data$SubjResponse1 # no match found
}
By the way if you "frequently" have problems with if-else, you should be aware of two things:
The object to test must not contain NA or NaN, or be a string (mode character) or some other type that can't be coerced into a logical value. Numeric is OK: 0 is FALSE, anything else (except NA/NaN) is TRUE.
The length of the object should be exactly 1 (a scalar value). If it is longer, you get a warning in R versions before 4.2.0 and an error from 4.2.0 onwards. If it is shorter, you get an error.
Examples:
len3 <- 1:3
if(len3) 'foo' # WARNING before R 4.2.0, ERROR since: the condition has length > 1
len0 <- numeric(0)
if(len0) 'foo' # ERROR: argument is of length zero
badVec1 <- NA
if(badVec1) 'foo' # ERROR: missing value where TRUE/FALSE needed
badVec2 <- 'Hello'
if(badVec2) 'foo' # ERROR: argument is not interpretable as logical
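One way to guard against these cases is to reduce the condition to a single TRUE or FALSE before handing it to if. The sketch below reuses the example vectors above; the result names r1 to r3 are only for illustration.

```r
len3 <- 1:3
# any() (or all()) collapses a longer vector into one logical value
r1 <- if (any(len3 > 0)) 'foo' else 'bar'

len0 <- numeric(0)
# check the length first; && stops before touching the empty vector
r2 <- if (length(len0) > 0 && len0[1] > 0) 'foo' else 'bar'

badVec1 <- NA
# isTRUE() is FALSE for NA, so the condition is never missing
r3 <- if (isTRUE(badVec1)) 'foo' else 'bar'
```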

Related

make function detect nonexistent column when specified as df$x

I have functions that operate on a single vector (for example, a column in a data frame). I want users to be able to use $ to specify the columns that they pass to these functions; for example, I want them to be able to write myFun(df$x), where df is a data frame. But in such cases, I want my functions to detect when x isn't in df. How may I do this?
Here is a minimal illustration of the problem:
myFun <- function (x) sum(x)
data(iris)
myFun(iris$Petal.Width) # returns 179.9
myFun(iris$XXX) # returns 0
I don't want the last line to return 0. I want it to throw an error message, as XXX isn't a column in iris. How may I do this?
One way is to run as.character(match.call()) inside the function. I could then use the parts of the resulting string to determine the name of df, and in turn, I could check for the existence of x. But this seems like a not-so-robust solution.
It won't suffice to throw an error whenever x has length 0: I want to detect whether the vector exists, not whether it has length 0.
I searched for related posts on Stack Overflow, but I didn't find any.
iris$XXX returns NULL, and that NULL is passed to sum:
sum(NULL)
#[1] 0
Note that both iris$XXX and iris[['XXX']] return NULL. If we want an error for a nonexistent column, dplyr::select, dplyr::pull or subset will raise one:
iris %>%
select(XXX)
Error: Can't subset columns that don't exist.
✖ Column XXX doesn't exist.
Run rlang::last_error() to see where the error occurred.
Or with pull
iris %>%
pull(XXX)
Error: object 'XXX' not found
Run rlang::last_error() to see where the error occurred.
Or with subset
subset(iris, select = XXX)
Error in eval(substitute(select), nl, parent.frame()) :
object 'XXX' not found
We could make the function throw an error if NULL is passed. Because of the way the function takes its argument, it receives only the value and no information about the object the value came from.
myFun <- function (x) {
stopifnot(!is.null(x))
sum(x)
}
However, this would be a non-specific error, because NULL can reach the function in other ways as well, e.g. when the caller passes a value that is legitimately NULL.
If we need to check whether the column is valid, then the data and the column name should both be passed into the function:
myFun2 <- function(data, colnm) {
stopifnot(exists(colnm, data))
sum(data[[colnm]])
}
myFun2(iris, 'XXX')
#Error in myFun2(iris, "XXX") : exists(colnm, data) is not TRUE
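A variant with a more descriptive error message could look like the sketch below. The name myFun3 and the wording of the message are made up for illustration; the check uses %in% on names(data) instead of exists().

```r
# Hypothetical variant of myFun2 that names the missing column in the error
myFun3 <- function(data, colnm) {
  if (!(colnm %in% names(data))) {
    stop("column '", colnm, "' not found in data", call. = FALSE)
  }
  sum(data[[colnm]])
}

total <- myFun3(iris, 'Petal.Width')

# Capture the error object instead of aborting, to inspect its message
err <- tryCatch(myFun3(iris, 'XXX'), error = function(e) e)
```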

Why does using paste in for loop return error?

I have a few problems concerning the same topic.
(1) I am trying to loop over:
premium1999 <- as.data.frame(coef(summary(data1999_mod))[c(19:44), 1])
for 10 years, in which I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- as.data.frame(coef(summary(paste0('data',year,'_mod')))[c(19:44), 1])
}
Note:
data1999_mod holds regression results, from which I want to extract some of the estimates as a data frame.
The coef(summary(data1999_mod)) looks like this:
#A matrix: ... of type dbl
               Estimate    Std. Error      t value      Pr(>|t|)
age        0.0388573570  2.196772e-03   17.6883885   3.362887e-6
age_sqr   -0.0003065876  2.790296e-05  -10.9876373  5.826926e-28
relation   0.0724525759  9.168118e-03    7.9026659  2.950318e-15
sex       -0.1348453659  8.970138e-03  -15.0326966  1.201003e-50
marital    0.0782049161  8.928773e-03    8.7587533  2.217825e-18
reg        0.1691004469  1.132230e-02   14.9351735  5.082589e-50
...
However, it returns Error: $ operator is invalid for atomic vectors, even though I did not use the $ operator here.
(2) Also,
I want to create a column 'year' containing repeated values of the associated year and am trying to loop over this:
premium1999$year <- 1999
In which I wrote:
for (i in seq(1999,2008)) {
assign(paste0('premium',i)[['year']], i)
}
In this case, it returns Error in paste0("premium", i)[["year"]]: subscript out of bounds
(3) Moreover, I'd like to repeat some rows and loop over:
premium1999 <- rbind(premium1999, premium1999[rep(1, 2),])
for 10 years again and I wrote:
for (year in seq(1999,2008)) {
paste0('premium',year) <- rbind(paste0('premium',year), paste0('premium',year)[rep(1, 2),])
}
This time it returns Error in paste0("premium", year)[rep(1, 2), ]: incorrect number of dimensions
I also tried to loop over a few other similar things but I always get Error.
Each code works fine individually.
I could not find what I did wrong. Any help or suggestions would be very highly appreciated.
The problem with the code is that paste0() returns a character string; it does not refer to the object whose name is that string. For example, paste0('data',year,'_mod') returns a character vector of length 1, i.e., "data1999_mod", and not the object data1999_mod.
Put simply, there is a huge difference between "data1999_mod"["Estimate"] and data1999_mod["Estimate"]. Subsetting the result of paste0() gives the former, whereas the expected output comes only from the latter. That is why you are getting Error: $ operator is invalid for atomic vectors: summary() and coef() are applied to a character string, and coef() uses $ internally to extract the coefficients.
The same mistake appears in all of your code. To retrieve an object by the name that paste0() builds, wrap the call in get(). To assign to an object by such a name, use assign(), because a paste0() call cannot be the target of <-.
As you have not supplied a reproducible sample, I couldn't test this. However, you can try running these.
#(1)
for (year in seq(1999,2008)) {
mod <- get(paste0('data',year,'_mod')) # look the model up by name
assign(paste0('premium',year), as.data.frame(coef(summary(mod))[c(19:44), 1]))
}
#(2)
for (i in seq(1999,2008)) {
df <- get(paste0('premium',i)) # fetch, modify, then write back
df[['year']] <- i
assign(paste0('premium',i), df)
}
#(3)
for (year in seq(1999,2008)) {
df <- get(paste0('premium',year))
assign(paste0('premium',year), rbind(df, df[rep(1, 2),]))
}
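An alternative that sidesteps building object names with paste0() is to keep the models in a named list and loop over that list. The sketch below is only an illustration: the models are stand-in lm fits on the built-in mtcars data, and the names mods and premiums, the three example years, and the row selection 2:3 are assumptions (the question uses rows 19:44 of its own models).

```r
# Stand-in models: one lm fit per year, stored in a named list
years <- 1999:2001
mods <- lapply(years, function(y) lm(mpg ~ wt + hp, data = mtcars))
names(mods) <- years

# One data frame of estimates per year, with the year added as a column
premiums <- lapply(names(mods), function(y) {
  est <- coef(summary(mods[[y]]))[2:3, 1]  # first column holds the estimates
  data.frame(term = names(est), estimate = unname(est), year = as.integer(y))
})
names(premiums) <- names(mods)

# Repeat the first row twice at the end of every data frame
premiums <- lapply(premiums, function(df) rbind(df, df[rep(1, 2), ]))
```

Individual results are then available as premiums[["1999"]], and do.call(rbind, premiums) stacks all years into one data frame.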

missing x with no default. Calling functions within functions in R

I'm writing code to solve a sudoku puzzle, following a YouTube video that implements the same algorithm in Python. The code requires three functions to:
Find an empty square.
Insert a number into the empty square.
Test whether this number is valid to solve the puzzle.
This is using a backtracking algorithm for the solver.
I am having an issue when calling the functions together, where I get the error:
Error in free_squ(x) : argument "x" is missing, with no default
In addition: Warning message:
In if (empty_sq == FALSE) { :
the condition has length > 1 and only the first element will be used
Called from: free_squ(x)
This is confusing, as I only get it when running this code. I can write other functions that call the individual functions to analyse the argument passed into the outer function:
function1(argument){
function2(argument){
function3(argument){
***DO STUFF***}}}
Why, in the following code, does the function called within the main function not recognise the argument?
sudoku_solve <- function(x){
empty_sq <- free_squ(x) # Define a new object to give coordinates of empty square
if(empty_sq == FALSE){ # If no empty square can be found
return(x) # Return the matrix
} else{
empty_sq <- empty_sq # Pointless line kept for clarity
}
for(i in c(1:9)){ # Integers to insert into the found empty square
if(valid(x, i, empty_sq) == TRUE){ # can the intiger be placed in this square?
x[empty_sq[1], empty_sq[2]] = i # if i valid, insert into empty square
}
if(sudoku_solve()){ # are all i's valid?
return(TRUE) # All i's valid
} else{
x[empty_sq[1], empty_sq[2]] = 0 # reset the initial try and try again with another
}
}
return(FALSE)
}
I have named the sudoku puzzle 'puzzle', and call the function by the following:
sudoku_solve(puzzle)
I think in the following statement, you are not passing any value to the function and x does not have a default value either.
if(sudoku_solve()){ # are all i's valid?
return(TRUE) # All i's valid
}
Hence, although the argument is passed in the initial call, the recursive call inside the loop is made without an argument. The missing x then reaches free_squ(x) inside sudoku_solve(), which raises the error.
empty_sq <- free_squ(x)
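Make sure you are passing a value to sudoku_solve, or else set a default value for x either in sudoku_solve or in the free_squ function. The accompanying warning presumably arises because empty_sq holds the coordinates of a square, so empty_sq == FALSE has length greater than 1.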
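A sketch of how the recursion could be written so that the grid is always passed on is given below. The question does not show free_squ or valid, so their bodies here are assumptions, as is the convention of returning the solved matrix (or NULL when no solution exists). That convention is used because R passes copies of x, so changes made inside a recursive call are not visible to the caller unless they are returned.

```r
# Hypothetical helper: coordinates c(row, col) of the first empty (0) square,
# or NULL when the grid is full
free_squ <- function(x) {
  idx <- which(x == 0, arr.ind = TRUE)
  if (nrow(idx) == 0) return(NULL)
  idx[1, ]
}

# Hypothetical helper: TRUE if i does not yet occur in the row, column or
# 3x3 box of square sq
valid <- function(x, i, sq) {
  rw <- sq[1]
  cl <- sq[2]
  box_rows <- 3 * ((rw - 1) %/% 3) + 1:3
  box_cols <- 3 * ((cl - 1) %/% 3) + 1:3
  !(i %in% x[rw, ]) && !(i %in% x[, cl]) && !(i %in% x[box_rows, box_cols])
}

sudoku_solve <- function(x) {
  empty_sq <- free_squ(x)
  if (is.null(empty_sq)) return(x)       # no empty square left: solved
  for (i in 1:9) {
    if (valid(x, i, empty_sq)) {
      x[empty_sq[1], empty_sq[2]] <- i
      solved <- sudoku_solve(x)          # pass the updated grid explicitly
      if (!is.null(solved)) return(solved)
      x[empty_sq[1], empty_sq[2]] <- 0   # backtrack and try the next number
    }
  }
  NULL                                   # no number fits: dead end
}
```

Calling sudoku_solve(puzzle) then returns the completed grid, assuming puzzle is a 9x9 numeric matrix in which 0 marks an empty square.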

Using ifelse in R when one of the options produces NAs?

I want to vectorize a function that relies on checking a condition and depending on whether this condition is TRUE or FALSE, return the outcome of one of two functions, respectively. The problem is that, when the condition is FALSE, the first function cannot be evaluated. Then, ifelse returns the correct values but it also produces a warning. I would like to produce a function that does not produce warnings.
I have tried ifelse(), but it does not work. I was expecting that this command would skip the evaluation of the first function when the condition is FALSE.
Here is an illustrative piece of R code
p = c(-1,1,-1,1,-1,-1,-1,1)
ifelse(p>0, sqrt(p), p^2)
which returns
[1] 1 1 1 1 1 1 1 1
Warning message:
In sqrt(p) : NaNs produced
As you can see, the outcome is correct but, for some reason, the first function is evaluated even where the condition is FALSE. I would like to avoid this.
We can create a numeric vector and then fill the elements based on the condition put forward by 'p'
out <- numeric(length(p))
out[p > 0] <- sqrt(p[p > 0])
out[p <= 0] <- p[p <= 0]^2
With ifelse we need to have all arguments of the same length. According to ?ifelse
ifelse(test, yes, no)
A vector of the same length and attributes (including dimensions and
"class") as test and data values from the values of yes or no
What happens is that both calculations are done on the entire vector, and the values are then picked from one result or the other based on the test condition. For sqrt, the negative values give a warning and produce NaN. While the NaN elements don't show up in the output, the warning has already been issued. The warning is harmless here and can be suppressed with suppressWarnings.
Avoidance through ifelse probably isn't possible. My understanding of the ifelse process is
Create a vector of values based on the expression in yes
Create a vector of values based on the expression in no
Use the result of test to decide whether each element comes from yes or no.
If an error will occur in either yes or no, ifelse will fail.
To get around this, you need to only evaluate expressions where they will succeed. (such as in akrun's answer, a variant of which is given here for completeness)
p = c(-1,1,-1,1,-1,-1,-1,1)
condition <- p > 0
result <- numeric(length(p))
result[condition] <- sqrt(p[condition]) # sqrt only where p is positive
result[!condition] <- p[!condition]^2 # square everywhere else
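The same idea can be wrapped in a small function. The sketch below (the function name sqrt_or_square is made up for illustration, and it assumes p contains no NA values) computes each branch only on the elements that need it, so sqrt never sees a negative number. It also shows an ifelse variant that clamps the input with pmax so that the yes branch is safe to evaluate on the whole vector.

```r
# Hypothetical helper; assumes p is numeric without NA values
sqrt_or_square <- function(p) {
  positive <- p > 0
  result <- numeric(length(p))
  result[positive] <- sqrt(p[positive])   # sqrt only sees positive values
  result[!positive] <- p[!positive]^2     # squares for everything else
  result
}

p <- c(-1, 1, -1, 1, -1, -1, -1, 1)
out <- sqrt_or_square(p)

# ifelse variant: pmax() replaces negatives by 0 before sqrt() is applied,
# so evaluating the yes branch on the full vector raises no warning
out2 <- ifelse(p > 0, sqrt(pmax(p, 0)), p^2)
```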

r - Check if any value in a data.frame column is null

I am trying to see if the data.frame column has any null values to move to the next loop. I am currently using the code below:
if (is.na(df[,relevant_column]) == TRUE ){next}
which spits out the warning:
In if (is.na(df_cell_client[, numerator]) == TRUE) { ... : the
condition has length > 1 and only the first element will be used
How do I check if any of the values are null and not just the first row?
(I assume by "null" you really mean NA, since a data.frame cannot contain NULL in that sense.)
Your problem is that if expects a single logical, but is.na(df[,relevant_column]) is returning a vector of logicals. any reduces a vector of logicals into a single global "or" of the vector:
Try:
if (any(is.na(df[,relevant_column]))) {next}
BTW: == TRUE is unnecessary. Keep it if you feel you want the clarity in your code, but I think you'll find most R code does not use that. (I've also seen something == FALSE, equally "odd/wrong", where ! something should work ... but I digress.)
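As a small illustration of the check inside a loop, the sketch below uses a made-up data frame and the base R function anyNA(), which is shorthand for any(is.na(...)). The column names and the complete_cols vector are assumptions for the example only.

```r
# Hypothetical data: column b contains a missing value
df <- data.frame(a = c(1, 2, 3), b = c(1, NA, 3), c = c(4, 5, 6))

complete_cols <- character(0)
for (col in names(df)) {
  if (anyNA(df[[col]])) next               # skip columns with any NA
  complete_cols <- c(complete_cols, col)   # record columns without NA
}
```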
