Specify class of NA in R (for if_else, dplyr) - r

The if_else() function in dplyr requires that the values returned for TRUE and for FALSE are of the same class.
I wish to return NA from my if_else() statement.
But e.g.
if_else(mtcars$cyl > 5, NA, 1)
returns
Error: false has type 'double' not 'logical'
This is because a bare NA is logical, whereas 1 is numeric (double).
Wrapping as.numeric() around the NA works fine: e.g.
if_else(mtcars$cyl > 5, as.numeric(NA), 1)
returns
[1] NA NA 1 NA NA NA NA 1 1 NA NA NA NA NA NA NA NA 1 1 1 1 NA NA NA NA 1 1 1 NA NA NA 1
which is what I am hoping for.
But this feels kinda silly/unnecessary. Is there a better way of inputting NA as a "numeric NA" than wrapping it like this?
NB this only applies to the stricter dplyr::if_else not base::ifelse.

You can use NA_real_, the NA constant of type double:
if_else(mtcars$cyl > 5, NA_real_, 1)
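As a small illustration of why this works, base R ships one typed NA constant per atomic type. The sketch below only inspects their types; the variable names na_types and same_value are made up for the example.

```r
# Each typed NA constant reports a different storage type.
na_types <- c(
  plain     = typeof(NA),
  real      = typeof(NA_real_),
  integer   = typeof(NA_integer_),
  character = typeof(NA_character_)
)
print(na_types)

# as.numeric(NA) produces the same value as NA_real_,
# so the two spellings are interchangeable inside if_else().
same_value <- identical(as.numeric(NA), NA_real_)
print(same_value)
```

So NA_integer_ or NA_character_ would be the counterpart to reach for when the other branch is an integer or a string.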

Try the base function, which does not enforce matching types:
ifelse(mtcars$cyl > 5, NA, 1)

Or you can use if_else_ from the package hablar. It is as rigid as if_else from dplyr about types, but allows for generic NA. See,
library(hablar)
if_else_(mtcars$cyl > 5, NA, 1)

Related

Index TRUE occurrences preserving NA in a new vector

I have what some of you might categorise as a dumb question, but I cannot solve it. I have this vector:
a <- c(NA,NA,TRUE,NA,TRUE,NA,TRUE)
And I want to get this in a new vector:
b <- c(NA,NA,1,NA,2,NA,3)
That simple. All the ways I am trying do not preserve the NA and I need them untouched. I would prefer if there would be a way in base R.
In base R, use cumsum() while excluding the NA values:
a <- c(NA,NA,TRUE,NA,TRUE,NA,TRUE)
a[!is.na(a)] <- cumsum(a[!is.na(a)])
Output:
[1] NA NA 1 NA 2 NA 3
Using replace from base R
b <- replace(a, !is.na(a), seq_len(sum(a, na.rm = TRUE)))
b
[1] NA NA 1 NA 2 NA 3
Or slightly more compact option (if the values are logical/numeric)
cumsum(!is.na(a)) * a
[1] NA NA 1 NA 2 NA 3
Update
If the OP's vector is
a <- c(NA,NA,TRUE,NA,FALSE,NA,TRUE)
(a|!a) * cumsum(replace(a, is.na(a), 0))
[1] NA NA 1 NA 1 NA 2
Replacing the non-NA values with their cumulative sum:
replace(a, !is.na(a), cumsum(na.omit(a)))
# [1] NA NA 1 NA 2 NA 3
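For reuse, the cumsum() idea from the answers above could be wrapped in a small helper that leaves the input untouched. This is only a sketch; the name index_true is hypothetical and not from any package.

```r
# Hypothetical helper: running count of TRUE values, keeping NA positions as NA.
index_true <- function(a) {
  out <- rep(NA_integer_, length(a))  # start with all NA
  seen <- !is.na(a)                   # positions that hold TRUE or FALSE
  out[seen] <- cumsum(a[seen])        # running count over the non-NA values
  out
}

all_true   <- index_true(c(NA, NA, TRUE, NA, TRUE, NA, TRUE))
with_false <- index_true(c(NA, NA, TRUE, NA, FALSE, NA, TRUE))
print(all_true)
print(with_false)
```

Note that a FALSE entry gets the running count so far, which matches the behaviour shown in the update above.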

Select columns, excluding some which are all NA

Suppose I have this dataframe
df <- data.frame(keep = c(1, NA, 2),
                 also_want = c(NA, NA, NA),
                 maybe = c(1, 2, NA),
                 maybe_2 = c(NA, NA, NA))
Edit: In the actual dataframe there are many columns I'd like to keep, so spelling them all out isn't viable. These columns are all the columns that do not start with maybe. The maybe columns, instead, do have a common naming like maybe, maybe_1 etc. that could work with grep or stringr::str_detect
I want to select keep, and also_want. I also want any of the maybe columns that have values other than NA
desired_df
  keep also_want maybe
1    1        NA     1
2   NA        NA     2
3    2        NA    NA
I can use select_if to get all columns that have non-NA values, but then I lose also_want
library(dplyr)
df %>%
  select_if(~sum(!is.na(.)) > 0)
  keep maybe
1    1     1
2   NA     2
3    2    NA
Thoughts?
With dplyr 1.0.0 you can use the where function inside a select statement to test for conditions that your variables have to satisfy, but first you specify the variables you also want to keep.
EDIT
I've added the condition that only the "maybe" variables have to contain values other than NA; the first argument selects every column that does not start with "maybe".
df %>%
  select(!starts_with("maybe"), starts_with("maybe") & where(~sum(!is.na(.)) > 0))
Output
#   keep also_want maybe
# 1    1        NA     1
# 2   NA        NA     2
# 3    2        NA    NA
Following your comments, in base R we can use
df[, !apply(
  rbind(
    grepl("maybe", colnames(df)),
    !apply(df, 2, function(x) !all(is.na(x)))
  ),
  2, all)]
  keep also_want maybe
1    1        NA     1
2   NA        NA     2
3    2        NA    NA
Or if you prefer seeing the same code all on 1 line:
df[,!apply(rbind(grepl("maybe",colnames(df)),!apply(df, 2, function(x) !all(is.na(x)))),2,all)]
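The same base R logic can perhaps be read more easily as two named logical vectors. This is a sketch under the question's assumption that the optional columns all start with "maybe"; the names is_maybe, has_value and result are made up for the example.

```r
df <- data.frame(keep = c(1, NA, 2),
                 also_want = c(NA, NA, NA),
                 maybe = c(1, 2, NA),
                 maybe_2 = c(NA, NA, NA))

# TRUE for the optional columns
is_maybe <- startsWith(names(df), "maybe")
# TRUE for columns holding at least one non-NA value
has_value <- colSums(!is.na(df)) > 0

# Keep every non-maybe column, plus any maybe column that has data.
result <- df[, !is_maybe | has_value, drop = FALSE]
print(result)
```

drop = FALSE keeps the result a data frame even if only one column survives.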
I eventually figured this out. Using str_detect to select all non-maybe columns, and then using a one-liner inside sapply to also select any other columns (i.e. any maybe columns) that have non-NA values.
library(dplyr)
library(stringr)
df %>%
  select_if(stringr::str_detect(names(.), "maybe", negate = TRUE) |
              sapply(., function(x) sum(!is.na(x)) > 0))
  keep also_want maybe
1    1        NA     1
2   NA        NA     2
3    2        NA    NA

ifelse r - x and y lengths differ

I'm trying to use an ifelse on an array called "OutComes" but it's giving me some trouble.
     PersonNumber Risk_Factor OC_Death OnsetAge Clinical CS_Death Cure AC_Death
[1,]            1           1 99.69098       NA       NA       NA   NA       NA
[2,]            2           1 60.68009       NA       NA       NA   NA       NA
[3,]            3           0 88.67483       NA       NA       NA   NA       NA
[4,]            4           0 87.60846       NA       NA       NA   NA       NA
[5,]            5           0 78.23118       NA       NA       NA   NA       NA
Now I will try to use an apply to analyse this table's Risk_Factor Column and apply one of two functions to replace the OnsetAge column's NA's.
I've been using an apply function -
apply(OutComes, 1, function(x) ifelse(OutComes[,"Risk_Factor"] == 1,
HighOnsetFunction(x), OnsetFunction(x)))
However, this doesn't work because the ifelse itself fails, with the error:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I'm not sure what's going on in this ifelse or what the x and y lengths are.
There is a mistake in your apply call. You are applying a function with argument x (one row of OutComes), but within ifelse you use OutComes[,"Risk_Factor"], which is a whole column of the original matrix, not a single number. One simple solution is to do
apply(OutComes, 1, function(x) ifelse(x["Risk_Factor"] == 1,
HighOnsetFunction(x), OnsetFunction(x)))
But when dealing with a scalar, there is no real need to use ifelse, so it may be more efficient to write
apply(OutComes, 1, function(x) if (x["Risk_Factor"] == 1) HighOnsetFunction(x) else OnsetFunction(x))
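Here is a minimal runnable sketch of that row-wise pattern. The question does not show HighOnsetFunction or OnsetFunction, so the two definitions below are invented stand-ins, and the small matrix is made-up data with only the relevant columns.

```r
# Hypothetical stand-ins for the asker's functions (assumptions, not the real ones).
HighOnsetFunction <- function(row) unname(row["OC_Death"]) * 0.5
OnsetFunction     <- function(row) unname(row["OC_Death"]) * 0.8

# Made-up data shaped like the question's matrix.
OutComes <- cbind(
  PersonNumber = 1:3,
  Risk_Factor  = c(1, 1, 0),
  OC_Death     = c(100, 60, 80),
  OnsetAge     = NA_real_
)

# x is one row, so x["Risk_Factor"] is a single number and plain if/else is enough.
OutComes[, "OnsetAge"] <- apply(OutComes, 1, function(x) {
  if (x["Risk_Factor"] == 1) HighOnsetFunction(x) else OnsetFunction(x)
})
print(OutComes)
```

Each function returns a single number here, so apply() gives back one value per row that can be written straight into the OnsetAge column.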

passing positive results from multiple columns into a single new column in r

I am trying to work out a way to create a single column from multiple columns in R. What I want to do is for R to go through all rows for multiple columns and if it finds a positive result in one of those columns, to pass that result into an 'amalgam' column (sorry I don't know a better word for it).
See the toy dataset below
x <- c(NA, NA, NA, NA, NA, 1)
y <- c(NA, NA, 1, NA, NA, NA)
z <- c(NA, 1, NA, NA, NA, NA)
df <- data.frame(cbind(x, y, z))
df[, "compCol"] <- NA
df
   x  y  z compCol
1 NA NA NA      NA
2 NA NA  1      NA
3 NA  1 NA      NA
4 NA NA NA      NA
5 NA NA NA      NA
6  1 NA NA      NA
I need to pass positive results from each of the columns into the compCol column while changing negative results to 0. So that it looks like this.
   x  y  z compCol
1 NA NA NA       0
2 NA NA  1       3
3 NA  1 NA       2
4 NA NA NA       0
5 NA NA NA       0
6  1 NA NA       1
I know it probably requires an if else statement nested inside a for loop but all the ways I have tried result in errors that I don't understand.
I tried the following just for a single column
for (i in 1:length(x)) {
  if (df$x[i] == 1) {
    df$compCol[i] <- df$x[i]
  }
}
But it didn't work at all.
I got the message 'Error in if (df$x[i] == 1) { : missing value where TRUE/FALSE needed'
And that makes sense but I can't see where to put the TRUE/FALSE statement
You can also use reshaping with NA removal
library(dplyr)
library(tidyr)
df.id = df %>% mutate(ID = 1:n() )
df.id %>%
gather(variable, value,
x, y, z,
na.rm = TRUE) %>%
left_join(df.id)
We can use max.col. Create a logical matrix ('ind') by checking whether the selected columns are greater than 0 and are not NA. max.col gives the column index of the maximum for each row; multiplying by rowSums of 'ind' makes the result 0 for rows that have no TRUE values.
ind <- df > 0 & !is.na(df)
df$compCol <- max.col(ind) *rowSums(ind)
df$compCol
#[1] 0 3 2 0 0 1
Or another option is pmax after multiplying with the col(df)
do.call(pmax,col(df)*replace(df, is.na(df), 0))
#[1] 0 3 2 0 0 1
NOTE: I used the dataset before creating the 'compCol' in the OP's post.
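As for the error in the question's loop: df$x[i] == 1 evaluates to NA when the value is missing, and if() cannot branch on NA. A loop version can be sketched by wrapping the comparison in isTRUE(), which turns NA into FALSE. The names cols and compCol below are just for illustration, and the loop is likely slower than the vectorised answers above.

```r
x <- c(NA, NA, NA, NA, NA, 1)
y <- c(NA, NA, 1, NA, NA, NA)
z <- c(NA, 1, NA, NA, NA, NA)
df <- data.frame(x, y, z)

cols <- c("x", "y", "z")
compCol <- numeric(nrow(df))  # defaults to 0 for rows with no positive result

for (i in seq_len(nrow(df))) {
  for (j in seq_along(cols)) {
    value <- df[i, cols[j]]
    # isTRUE() is FALSE for NA, so missing values are skipped instead of erroring.
    if (isTRUE(value > 0)) compCol[i] <- j
  }
}

df$compCol <- compCol
print(df)
```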

R: How to manage heading and trailing NAs when using rollmean()

I have this vector
vec <- c(NA, 1, 2, 3, 4, NA)
for which I wish to calculate a rollmean of a window of size 3 aligned to the right (that is, if I understand correctly looking backwards)
The expected rolling mean of my vector would be
# [1] NA NA NA 2 3 NA
and yet if I do
rollmean(vec, 3, align='right', fill=NA)
I obtain
# [1] NA NA NA NA NA NA
You can use rollapply() (also from the zoo package) with mean instead.
rollapply(vec,3,mean,fill=NA,align="right")
[1] NA NA NA 2 3 NA
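To see what that right-aligned window computes, the same result can be sketched in base R without zoo. This is only an illustration, using embed() to lay out the windows; the names k, windows and roll are made up for the example.

```r
vec <- c(NA, 1, 2, 3, 4, NA)
k <- 3  # window size

# Each row of embed() is one window of k consecutive values
# (most recent value first), ending at positions k, k + 1, ...
windows <- embed(vec, k)

# Right alignment: the first k - 1 positions have no complete window, so pad with NA.
# rowMeans() returns NA for any window that contains an NA.
roll <- c(rep(NA_real_, k - 1), rowMeans(windows))
print(roll)
```

A window's mean is NA whenever the window touches one of the NA values, which is why only positions 4 and 5 get a number.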
