ifelse function on a vector - r

I am using the ifelse function to obtain either a vector of NA, if all the values of the vector are NA, or a vector of all the values not equal to "NA_NA". In my example, I would like to obtain this result:
[1] "14_mter" "78_ONHY"
but I am obtaining this
[1] "14_mter"
my example:
vect=c("NA_NA", "14_mter", "78_ONHY")
out=ifelse(all(is.na(vect)), vect, vect[which(vect!="NA_NA")])
What is wrong in this function ?

ifelse is vectorized and its result is as long as the test argument. all(is.na(vect)) always has length one, hence the single-element result. A regular if/else clause is fine here.
vect <- c("NA_NA", "14_mter", "78_ONHY")
if (all(is.na(vect))) {
  out <- vect
} else {
  out <- vect[vect != "NA_NA"]
}
out
#> [1] "14_mter" "78_ONHY"
Additional note: there is no need for which() here; the logical vector can be used to subset directly.
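To see where the single "14_mter" comes from, here is a minimal sketch using the vector from the question. It uses only base R; the intermediate names test and res are introduced here purely for illustration.

```r
vect <- c("NA_NA", "14_mter", "78_ONHY")

# The test is a single FALSE, so ifelse() builds a length-1 result
test <- all(is.na(vect))
length(test)
#> [1] 1

# With a length-1 FALSE test, only the first element of `no` is used
res <- ifelse(test, vect, vect[vect != "NA_NA"])
res
#> [1] "14_mter"
```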

The ifelse help file, referring to its three arguments test, yes and no, says:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
so if the test has a length of 1, which is the case for the code in the question, then the result will also have length 1. Instead try one of these.
1) Use if instead of ifelse. if returns the value of the chosen leg so just assign that to out.
out <- if (all(is.na(vect))) vect else vect[which(vect != "NA_NA")]
2) The collapse package has an allNA function so a variation on (1) is:
library(collapse)
out <- if (allNA(vect)) vect else vect[which(vect != "NA_NA")]
3) Although not recommended, if you really wanted to use ifelse, it could be done by wrapping each leg in list(...) so that the condition and two legs all have the same length, i.e. 1.
out <- ifelse(all(is.na(vect)), list(vect), list(vect[which(vect != "NA_NA")])) |>
  unlist()

If the NA value is always the string NA_NA, this works:
grep("NA_NA", vect, value = TRUE, invert = TRUE)
[1] "14_mter" "78_ONHY"
The pattern matches the NA_NA value; the invert = TRUE argument negates the match(es) and returns the unmatched values instead.
Data:
vect=c("NA_NA", "14_mter", "78_ONHY")
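One caveat, shown as a small sketch: grep does regular-expression matching anywhere in the string, so the pattern above would also drop values that merely contain NA_NA. The vector vect2 below is made up for illustration, and anchoring the pattern with ^ and $ is just one possible way to match only the exact placeholder.

```r
# Hypothetical data: the third value contains "NA_NA" but is not the placeholder
vect2 <- c("NA_NA", "14_mter", "78_NA_NA_x")

# Unanchored pattern: removes every element containing "NA_NA"
loose <- grep("NA_NA", vect2, value = TRUE, invert = TRUE)
loose   # only "14_mter" is left

# Anchored pattern: removes only elements that are exactly "NA_NA"
exact <- grep("^NA_NA$", vect2, value = TRUE, invert = TRUE)
exact   # "14_mter" and "78_NA_NA_x" are kept
```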

Related

Change numeric(0) to NA in pipe, but return unchanged otherwise

How can I recode a vector to NA if it has zero length (numeric(0)), but return the vector unchanged otherwise? Preferably in the tidyverse using pipes.
My attempt:
library(tidyverse)
empty_numeric <- numeric(0)
empty_numeric |>
if_else(length(.) > 0, true = NA, false = . )
#> Error: `condition` must be a logical vector, not a double vector.
You can’t use a vectorised if_else here because its output is the same length as its input (i.e. always a single element). Instead, you’ll need to use conventional if. And neither will work with the built-in |> pipe since that restricts how the call can be formed (in particular, it only allows substituting the LHS into top-level arguments, and only once). By contrast, we need to repeat the LHS, and substitute it into a nested expression.
Using the ‘magrittr’ pipe operator works, however:
myvec %>% {if (length(.) > 0L) . else NA}
Or, if you prefer writing this using function call syntax:
myvec %>% `if`(length(.) > 0L, ., NA)
To be able to use the native pipe, we need to wrap the logic into a function:
na_if_null = function (x) {
  if (length(x) == 0L) as(NA, class(x)) else x
}
myvec |> na_if_null()
(The as cast is there to ensure that the return type is the same as the input type, regardless of the type of x.)
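As a quick check of the type-preserving behaviour, the sketch below repeats the function and applies it to a few inputs. It assumes R 4.1 or later for the native pipe; the variable names empty_num, empty_chr and kept are only illustrative.

```r
na_if_null <- function(x) {
  # Zero-length input: return an NA of the same class; otherwise return x as is
  if (length(x) == 0L) as(NA, class(x)) else x
}

empty_num <- numeric(0) |> na_if_null()    # NA stored as a double
empty_chr <- character(0) |> na_if_null()  # NA stored as a character
kept      <- c(1.5, 2) |> na_if_null()     # non-empty input is unchanged

typeof(empty_num)
#> [1] "double"
typeof(empty_chr)
#> [1] "character"
```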

Function does not work with lubridate/mutate/across but works with a loop

I try to fix dates (years) using a function
change_century <- function(x){
a <- year(x)
ifelse(test = a >2020,yes = year(x) <- (year(x)-100),no = year(x) <- a)
return(x)
}
The function works for specific row or using a loop for one column (here date of birth)
for (i in c(1:nrow(Df))){
Df_recode$DOB[i] <- change_century(Df$DOB[i])
}
Then I try to use mutate/across
Df_recode <- Df %>% mutate(across(list_variable_date,~change_century(.)))
It does not work. Is there something I am getting wrong? thank you !
Try:
change_century <- function(x){
  a <- year(x)
  newx <- ifelse(test = a > 2020, yes = a - 100, no = a)
  return(newx)
}
(Frankly, the use of newx as a temporary storage and then returning it was done that way solely to introduce minimal changes in your code. In general, in this case one does not need return, in fact theoretically it adds an unnecessary function to the evaluation stack. I would tend to have two lines in that function: a <- year(x) and ifelse(..), without assignment. The default behavior in R is to return the value of the last expression, which in my case would be the results of ifelse, which is what we want. Assigning it to newx and then return(newx) or even just newx as the last expression has exactly the same effect.)
Rationale
ifelse cannot have variable assignment within it. That's not to say that it is a syntax error (it is not), but that it is counter to its intent. You are asking the function to go through each condition found in test=, and return a value based on it. ifelse does not evaluate yes= or no= element by element: whenever test= contains both TRUE and FALSE values, both yes= and no= are evaluated completely, and then ifelse joins them together as needed.
For demonstration,
ifelse(test = c(TRUE, FALSE, TRUE), yes = 1:3, no = 11:13)
The return value is something like:
c(
if (test[1]) yes[1] else no[1],
if (test[2]) yes[2] else no[2],
if (test[3]) yes[3] else no[3]
)
# c(1, 12, 3)
To capture the results of the zipped-together yeses and nos c(1, 12, 3), one must capture the return value from ifelse itself, not inside of the call to ifelse.
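Note that a function built this way returns year numbers rather than dates. If corrected dates are wanted back, one option is to capture the result of ifelse and write it into the year component of the date. The sketch below does this in base R via POSIXlt so that it runs without extra packages; the name change_century_date and the sample dates are made up for illustration. With lubridate loaded, assigning the ifelse result through year(x) <- ... should have the same effect.

```r
# Hypothetical helper: shift years after 2020 back by a century, keep Date class
change_century_date <- function(x) {
  lt <- as.POSIXlt(x)          # lt$year counts years since 1900
  yr <- lt$year + 1900
  # Capture the value returned by ifelse() instead of assigning inside it
  lt$year <- ifelse(yr > 2020, yr - 100, yr) - 1900
  as.Date(lt)
}

dob <- as.Date(c("2045-03-01", "1987-11-23"))
fixed <- change_century_date(dob)
format(fixed)
#> [1] "1945-03-01" "1987-11-23"
```

Because the function is vectorized and returns a Date vector of the same length as its input, it can be applied to a whole column at once.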
Another point that may be relevant: ifelse(cond, yes, no) is not at all a shortcut for if (cond) { yes } else { no }. Some key differences:
in if, the cond must always be exactly length 1, no more, no less.
In R < 4.2, length 0 returns an error argument is of length zero (see ref), while length 2 or more produces a warning the condition has length > 1 and only the first element will be used (see ref1, ref2).
In R >= 4.2, both conditions (should) produce an error (no warnings).
ifelse is intended to be vectorized, so the cond can be any length. yes= and no= should either be the same length or length 1 (recycling is in effect here); cond= should really be the same length as the longer of yes= and no=.
if does short-circuiting, meaning that if (TRUE || stop("quux")) 1 will never attempt to evaluate stop. This can be very useful when one condition will fail (logically or with a literal error) if attempted on a NULL object, such as if (!is.null(quux) && quux > 5) ....
Conversely, ifelse has no element-wise short-circuiting: cond= is always evaluated in full, yes= is evaluated in full if any element of cond= is TRUE, and no= is evaluated in full if any element is FALSE.
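A small sketch of that last difference, using made-up values: ifelse computes a whole branch rather than individual elements, so sqrt() is applied to the negative value too, even though that element of the result is taken from the other branch. The warning is suppressed here only to keep the example quiet.

```r
x <- c(4, -9, 16)

# sqrt(x) is computed for all three elements, including -9 (which gives NaN
# with a warning); ifelse() then picks per element from the two branches
res <- suppressWarnings(ifelse(x >= 0, sqrt(x), NA_real_))
res
#> [1]  2 NA  4

# if() short-circuits: quux > 5 is never evaluated because quux is NULL
quux <- NULL
label <- if (!is.null(quux) && quux > 5) "big" else "not big"
label
#> [1] "not big"
```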

function with vector R - argument is of length zero

I wrote this function, lockdown_func(beta.hat_func).
First: I get the error "argument is of length zero".
Second: when I compute it without the date indices, it doesn't change the values as it should; the output vector contains the same value at every index.
date= c(seq(from=30, to=165))
beta.hat_func <- c(rep(x = beta.hat, times = 135))
beta.hat <- beta0[which.min(SSE)]
#implement function for modeling
lockdown_func <- function(beta.hat_func,l){
h=beta.hat_func
{
for(i in 1:length(h))
if(date[i]>60 | date[i]<110){
beta.hat_func[i]=beta.hat_func[i]*exp(-l*(date[i]-date[i-1]))
}else{
beta.hat_func[i]=beta.hat_func[i]
}
return(h)
}
}
lockdown_func(beta.hat_func,0.03)
A few comments:
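Did you mean to apply an AND rather than an OR to get the date range between 60 and 110? This would be date[i]>60 && date[i]<110 (it's better to use the double-&& if you are computing a length-1 logical value).
Because the condition uses OR, it is TRUE for every date, including i=1, so date[i-1] will refer to date[0], which is a length-0 vector.
Also, the function returns h, a copy made before the loop, rather than the modified beta.hat_func, which would explain why the output looks unchanged.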
You might want something like:
l_dates <- date>60 & date<110 ## single-& here for vectorized operation
beta.hat_func[l_dates] <- beta.hat_func[l_dates]*exp(-l*c(0, diff(date))[l_dates]) ## pad diff() so element i is date[i]-date[i-1]
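Put together as a function, a vectorized version might look like the sketch below. The value 0.4 for beta.hat is a placeholder (the real value comes from the asker's fit, which is not shown), and the argument names beta and in_window are introduced here for illustration. As in the original loop, each element is scaled independently; the decay is not accumulated over time.

```r
date <- seq(from = 30, to = 165)
beta.hat <- 0.4                               # placeholder value, not from the question
beta.hat_func <- rep(beta.hat, length(date))  # one value per date

lockdown_func <- function(beta, date, l) {
  in_window <- date > 60 & date < 110   # vectorized AND, one logical per date
  step <- c(0, diff(date))              # step[i] is date[i] - date[i - 1]
  beta[in_window] <- beta[in_window] * exp(-l * step[in_window])
  beta                                  # return the modified vector
}

out <- lockdown_func(beta.hat_func, date, 0.03)
out[date %in% c(60, 61, 110)]   # only the value for date 61 is scaled
```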

How to check if vector is a single NA value without length warning and without suppression

I have a function with NA as a default, but if not NA should be a character vector not restricted to size 1. I have a check to validate these, but is.na produces the standard warning when the vector is a character vector with length greater than 1.
so_function <- function(x = NA) {
if (!(is.na(x) | is.character(x))) {
stop("This was just an example for you SO!")
}
}
so_function(c("A", "B"))
#> Warning in if (!(is.na(x) | is.character(x))) {: the condition has length >
#> 1 and only the first element will be used
An option to prevent the warning I came up with was to use identical:
so_function <- function(x = NA) {
if (!(identical(x, NA) | is.character(x))) {
stop("This was just an example for you SO!")
}
}
My issue here is that this function will generally be taking Excel sheet data loaded into R as inputs, and the NA values generated from that are often NA_character_, NA_integer_, and NA_real_, so identical(x, NA) is often FALSE when I actually need it to be TRUE.
For the broader context, I am experiencing this issue for S3 classes I am creating for a package, and the function below approximates how I am validating multiple attributes for that class, which is when the warnings are appearing. Because of this, I am trying to avoid suppressing warnings as the solution, so would be interested to know what best practice exists to solve this issue.
Edit
In order to make use cases clearer, this is validating attributes for a class, where I want to ensure the attribute is either a single NA value, or a character vector of any length:
so_function(NA_character_) # should pass
so_function(NA_integer_) # should pass
so_function(c(NA, NA)) # should fail
so_function(c("A", "B")) # should pass
so_function(c(1, 2, 3)) # should fail
The length warning comes from the use of if, which expects a length 1 vector, and is.na which is vectorised.
You could wrap the is.na call in any or all to compress it to a length 1 vector, but there may be edge cases where that doesn't work as you expect. I would instead use short-circuit evaluation so that is.na is only checked when x has length 1:
so_function <- function(x = NA) {
if (!((length(x)==1 && is.na(x)) | is.character(x))) {
stop("This was just an example for you SO!")
}
}
so_function(NA_character_) # should pass
so_function(NA_integer_) # should pass
so_function(c(NA, NA)) # should fail
Error in so_function(c(NA, NA)) : This was just an example for you SO!
so_function(c("A", "B")) # should pass
so_function(c(1, 2, 3)) # should fail
Error in so_function(c(1, 2, 3)) : This was just an example for you SO!
Another option is to use NULL as the default value instead.
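The same check can be pulled out into a small predicate and run against the use cases from the question. This is only a sketch: the helper name is_single_na_or_character is made up, and it uses || throughout since every operand has length 1.

```r
# Hypothetical helper: TRUE for a single NA of any type, or any character vector
is_single_na_or_character <- function(x) {
  # length(x) == 1L guards is.na(), so the condition is always length 1
  (length(x) == 1L && is.na(x)) || is.character(x)
}

is_single_na_or_character(NA_character_)  # TRUE
is_single_na_or_character(NA_integer_)    # TRUE
is_single_na_or_character(c(NA, NA))      # FALSE
is_single_na_or_character(c("A", "B"))    # TRUE
is_single_na_or_character(c(1, 2, 3))     # FALSE
```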
I don't think the problem arises from is.na() - it is a vectorized function which produces a vector as an output. is.character(x) on the other hand is not vectorized so it only will output a single value.
You can leverage apply-like functions to overcome this e.g.
sapply(c("a", NA, 5), is.character)
if also functions similarly - you are better off using ifelse for by-element comparison.
I don't think I quite grasped what you want to do with your function, but it could be rewritten like this:
so_function_2 <- function(x = NA) {
condit <- !(is.na(x) | sapply(x, is.character))
ifelse(condit, "This was just an example for you SO!", "FALSE")
}

Count number of rows matching a criteria

I am looking for a command in R which is equivalent of this SQL statement. I want this to be a very simple basic solution without using complex functions OR dplyr type of packages.
Select count(*) as number_of_states
from myTable
where sCode = "CA"
so essentially I would be counting number of rows matching my where condition.
I have imported a csv file into mydata as a data frame. So far I have tried these to no avail.
nrow(mydata$sCode == "CA") ## ==>> returns NULL
sum(mydata[mydata$sCode == 'CA',], na.rm=T) ## ==>> gives Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables
sum(subset(mydata, sCode='CA', select=c(sCode)), na.rm=T) ## ==>> FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables
sum(mydata$sCode == "CA", na.rm=T) ## ==>> returns count of all rows in the entire data set, which is not the correct result.
and some variations of the above samples. Any help would be appreciated! Thanks.
mydata$sCode == "CA" will return a logical vector, with a TRUE value everywhere that the condition is met. To illustrate:
> mydata = data.frame(sCode = c("CA", "CA", "AC"))
> mydata$sCode == "CA"
[1] TRUE TRUE FALSE
There are a couple of ways to deal with this:
1) sum(mydata$sCode == "CA"), as suggested in the comments; because TRUE is interpreted as 1 and FALSE as 0, this should return the number of TRUE values in your vector.
2) length(which(mydata$sCode == "CA")); the which() function returns a vector of the indices where the condition is met, the length of which is the count of "CA".
Edit to expand upon what's happening in #2:
> which(mydata$sCode == "CA")
[1] 1 2
which() returns a vector identifying each row where the condition is met (in this case, rows 1 and 2 of the data frame). The length() of this vector is the number of occurrences.
sum is used to add elements; nrow is used to count the number of rows in a rectangular array (typically a matrix or data.frame); length is used to count the number of elements in a vector. You need to apply these functions correctly.
Let's assume your data is a data frame named "dat". Correct solutions:
nrow(dat[dat$sCode == "CA",])
length(dat$sCode[dat$sCode == "CA"])
sum(dat$sCode == "CA")
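These three agree when sCode has no missing values, but they can differ once NA is present. The sketch below uses a made-up data frame (the pop column and its values are invented) to show the difference: a logical index containing NA keeps an extra all-NA row, while which() and na.rm = TRUE ignore it.

```r
# Hypothetical data with one missing state code
dat <- data.frame(sCode = c("CA", "CA", "AC", NA),
                  pop   = c(10, 20, 30, 40))

n_sum   <- sum(dat$sCode == "CA", na.rm = TRUE)      # NA comparison is dropped
n_which <- length(which(dat$sCode == "CA"))          # which() skips NA
n_safe  <- nrow(dat[which(dat$sCode == "CA"), ])     # rows 1 and 2 only
n_naive <- nrow(dat[dat$sCode == "CA", ])            # NA index adds an all-NA row

c(n_sum, n_which, n_safe, n_naive)
#> [1] 2 2 2 3
```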
mydata$sCode is a vector, which is why nrow returns NULL.
mydata[mydata$sCode == 'CA',] returns a data.frame of the rows where sCode == 'CA'. sCode contains character values, which is why sum gives you the error.
In subset(mydata, sCode='CA', select=c(sCode)), you should use sCode=='CA' instead of sCode='CA'. subset then returns a one-column data frame of the rows where sCode equals CA, so you should count its rows:
nrow(subset(na.omit(mydata), sCode=='CA', select=c(sCode)))
Or you can try this: sum(na.omit(mydata$sCode) == "CA")
With the dplyr package, use
nrow(filter(mydata, sCode == "CA"))
All the other solutions provided here gave me the same error as multi-sam, but that one worked.
Just give a try using subset
nrow(subset(data,condition))
Example
nrow(subset(myData,sCode == "CA"))
To get the number of observations, counting the rows of the subsetted dataset would be more appropriate:
nrow(dat[dat$sCode == "CA",])
grep command can be used
CA = mydata[grep("CA", mydata$sCode), ]
nrow(CA)
Call nrow passing as argument the name of the dataset:
nrow(dataset)
I'm using this short function to make it easier using dplyr:
countc <- function(.data, ..., preserve = FALSE){
  return(nrow(filter(.data, ..., .preserve = preserve)))
}
With this you can just use it like filter. For example:
countc(data, active == TRUE)
[1] 42
