Removing/ignoring NA values in an if loop - r

So I have this simple if loop
chick<-lapply(1:length(t),function(i){
if(t[[i]]<0.01){
chick=1
}else 0
})
So basically when t<0.01 it prints out 1, and if not it prints 0. But there are times when my data has NA values, like the sample below. How can I assign 0 to the NA values as well? Otherwise I get an error similar to this:
Error in if (t[[i]] < 0.01) { : missing value where TRUE/FALSE needed
Here is a sample output from data called 't'
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] 0.01
thanks again

Use is.na to drop the NA elements first:
t_ok <- t[!is.na(t)]
if(t_ok[[i]]<0.01) ...
More importantly though, since you are assigning the result to chick, do not also assign to chick inside your lapply (or similar) statement. It will give you results different from what you are expecting. Also note that i must run over the filtered list, which is shorter than t. Instead try
chick <- lapply(seq_along(t_ok),function(i)
if(t_ok[[i]]<0.01) 1 else 0
)
Or better yet, use ifelse.
chick <- ifelse(t[!is.na(t)] < 0.01, 1, 0)
If you want chick to be the same length as t, then use the '|' operator in ifelse, as suggested by NPE (but use a single | and not || inside ifelse, because | is the vectorized form).

The following will check whether t[[i]] is either NA or less than 0.01:
if (is.na(t[[i]]) || t[[i]] < 0.01) {
....

Or you could just use...
chick <- numeric( length(t) )
chick[ t < 0.01 ] <- 1
... and avoid loops and checking for NA altogether.
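A minimal sketch of that approach on made-up data, assuming the values are held in an atomic numeric vector (called t_vec here, a hypothetical name, so that base::t is not masked). A logical index that contains NA is accepted in an assignment when the replacement has length 1, and wrapping the test in which() makes the dropping of NA positions explicit:

```r
# Made-up sample values mirroring the question's data, plus one small value
t_vec <- c(NA, NA, 0.01, 0.005)

# Start from all zeros, then flag the positions below the threshold
chick <- numeric(length(t_vec))
chick[which(t_vec < 0.01)] <- 1   # which() silently drops the NA positions
chick
```

Only the last element is below 0.01, so it is the only one set to 1; the NA positions keep their initial 0.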

Why not just this (invert the test and return 0 for the complementary test as well as for NA):
chick <- ifelse(is.na(t)|t>=0.01, 0, 1)
This works because TRUE|NA returns TRUE (whereas FALSE|NA returns NA), so an NA element is already caught by is.na(t) and the NA produced by the comparison does not matter. See the ?Logic page. It's also more efficient than looping with lapply. If you need the results in list format you could either apply as.list to the result or test each element inside lapply:
lapply(t, function(x) if( is.na(x) || x>=0.01 ) { 0 }else{ 1 })
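The behaviour of NA inside a logical OR can be checked directly. This is a small sketch on made-up data; t_vec is a hypothetical atomic vector standing in for the question's t:

```r
# Three-valued logic: OR is TRUE as soon as one side is TRUE
na_with_true  <- TRUE | NA    # TRUE
na_with_false <- FALSE | NA   # NA

# Made-up sample values mirroring the question's data
t_vec <- c(NA, NA, 0.01, 0.005)

# NA elements hit is.na() first, so they land in the "0" branch
chick <- ifelse(is.na(t_vec) | t_vec >= 0.01, 0, 1)
chick
```

The result has the same length as t_vec, with 0 for the two NA entries and for 0.01, and 1 for 0.005.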

Related

Performing operation along vector in R - only returns a single value

I'm trying to perform a conditional operation on a vector xt, given a value lambdat.
Outside of the ifelse() function the operations work, but the full code doesn't. See example below, cheers!
xt <- c(1,2,3)
lambdat <- 1
bc_applied_columnt <- ifelse(lambdat != 0, (xt^(lambdat)-1)/lambdat, log(xt))
This returns 0 (first value in the vector xt), but I'd like it to return the output of (xt^(lambdat)-1)/lambdat or log(xt) - depending on the condition.
ifelse returns output of the same length as the condition that you test. Since lambdat != 0 has length 1, ifelse returns output of length 1 as well. When you have only one value to check, use if/else.
xt <- c(1,2,3)
lambdat <- 1
if(lambdat != 0) (xt^(lambdat)-1)/lambdat else log(xt)
#[1] 0 1 2
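To make the length rule concrete, here is a short sketch that compares the two forms and wraps the if/else version in a function. The function name box_cox is an assumption chosen for illustration and does not appear in the question:

```r
xt <- c(1, 2, 3)
lambdat <- 1

# ifelse(): the result is as long as the test, here a single value
short <- ifelse(lambdat != 0, (xt^lambdat - 1) / lambdat, log(xt))
length(short)

# if/else: the scalar test picks a branch, and the branch is evaluated on all of x
box_cox <- function(x, lambda) {
  if (lambda != 0) (x^lambda - 1) / lambda else log(x)
}

box_cox(xt, 1)   # uses the power branch
box_cox(xt, 0)   # uses the log branch
```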

If - then - do/else in R [duplicate]

I want to dummy code i.e. create flag variables for column Species.
I wrote the below code:
create_dummies <- function(data, categorical_preds){
if (categorical_preds == "setosa"){data$setosa_flg <- 1}
else {data$setosa_flg <- 0}
if (categorical_preds == "versicolor"){data$versicolor_flg <- 1}
else {data$versicolor_flg <- 0}
if (categorical_preds == "virginica"){data$virginica_flg <- 1}
else {data$virginica_flg <- 0}
return(data)
}
create_dummies(iris,iris$Species)
I got a warning:
Warning messages:
1: In if (categorical_preds == "setosa") { :
the condition has length > 1 and only the first element will be used
2: In if (categorical_preds == "versicolor") { :
the condition has length > 1 and only the first element will be used
3: In if (categorical_preds == "virginica") { :
the condition has length > 1 and only the first element will be used
Then I changed the code to:
create_dummies <- function(data, categorical_preds){
ifelse(categorical_preds == "setosa",data$setosa_flg <- 1,data$setosa_flg <- 0)
ifelse(categorical_preds == "versicolor",data$versicolor_flg <- 1,data$versicolor_flg <- 0)
ifelse(categorical_preds == "virginica",data$virginica_flg <- 1,data$virginica_flg <- 0)
return(data)
}
create_dummies(iris,iris$Species)
No warning this time but the new dummy variables are always 0.
As a next step I want to avoid hardcoding, so I wrote
create_dummies <- function(data, categorical_preds){
catvar <- (unique(categorical_preds))
for (i in 1:length(catvar)){
iris[catvar[i]] <- ifelse(iris$Species == catvar[i],1,0)
}
return(data)
}
create_dummies(iris,iris$Species)
What is wrong with this?
Questions:
Why are the first 2 versions of the code not working?
What is difference between if(){} and ifelse() function in R?
In ifelse(), if the condition is true, how can I do multiple action?
example: ifelse(categorical_preds == "setosa",data$setosa_flg <- 1 print(iris$Species),data$setosa_flg <- 0).
The warning message:
the condition has length > 1 and only the first element will be used
tells you that using a vector in an if condition is equivalent to using only its first element (from R 4.2.0 onwards this is an error rather than a warning):
[if (v == 1)] ~ [if (v[1] == 1)] ## v here is a vector
You should use the vectorized ifelse and assign its result. For example you can write your condition like this:
create_dummies<-function(data, categorical_preds){
## here I show only the first condition
data$setosa_flg <-
ifelse (categorical_preds=="setosa",1,0)
data
}
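The same idea extends to the non-hardcoded version. The third attempt in the question assigns to iris inside the function but returns data, so the new columns never reach the result. A possible sketch, which keeps the question's signature and assumes the _flg suffix is the naming you want:

```r
create_dummies <- function(data, categorical_preds) {
  # Work with plain strings so the values can be used as column names
  values <- as.character(categorical_preds)
  for (lvl in unique(values)) {
    # One 0/1 column per distinct value, added to 'data' (not to 'iris')
    data[[paste0(lvl, "_flg")]] <- ifelse(values == lvl, 1, 0)
  }
  data
}

out <- create_dummies(iris, iris$Species)
colSums(out[, c("setosa_flg", "versicolor_flg", "virginica_flg")])
```

Each flag column should sum to 50, since iris holds 50 rows per species.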
iris$Species is a vector. An if statement is a control statement designed to work only on a scalar boolean condition. In R, when you compare a vector with a string, the output is a vector of booleans telling whether each element of the vector is equal to the string.
Use if/else when you build a function and want to run certain parts of it only when a given condition is true (a single condition, length == 1). Use ifelse when transforming the columns of your data.frame.
Help on if else:
cond A length-one logical vector that is not NA. Conditions of length
greater than one are accepted with a warning, but only the first
element is used. Other types are coerced to logical if possible,
ignoring any class.
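A short side-by-side sketch of the two constructs, using made-up values. It also touches on the "multiple actions" question: a braced if branch can hold several statements, whereas ifelse only selects values:

```r
species <- c("setosa", "virginica", "setosa")  # made-up sample values

# ifelse(): one result per element of the test
flags <- ifelse(species == "setosa", 1, 0)
flags

# if/else: a single TRUE/FALSE decides which branch runs,
# and a braced branch can contain several statements
if (any(species == "setosa")) {
  n_setosa <- sum(species == "setosa")
  message("found ", n_setosa, " setosa rows")
} else {
  n_setosa <- 0
}
```

Reducing the vector with any() (or all()) is one way to get the length-one condition that if requires.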
For this purpose (if the vector is a factor) you can use model.matrix to create dummy variables.
mat<-model.matrix(~iris$Species-1)
mat<-as.data.frame(mat)
names(mat)<-levels(iris$Species) # columns follow the factor's level order
> str(mat)
'data.frame': 150 obs. of 3 variables:
$ setosa : num 1 1 1 1 1 1 1 1 1 1 ...
$ versicolor: num 0 0 0 0 0 0 0 0 0 0 ...
$ virginica : num 0 0 0 0 0 0 0 0 0 0 ...

Sum of functions in R

I have a function in R and I wish to take the sum of this function with different values. However, since I have a break condition (made by an if statement) I cannot just do this explicitly:
F<- function(x) if(x<5) 1 else 0
sum(F(seq(1,10,1)))
#[1] 1
#Warning message:
#In if (x < 5) 1 else 0 :
# the condition has length > 1 and only the first element will be used
so it is trying to do the sequence of the function and not the sum of the sequence. I wish to avoid the for loop as this can make long codes very cluttered; specifically to avoid ugly nested for loops.
How do I go about this?
You can use Vectorize:
F_v <- Vectorize(F)
sum(F_v(seq(1,10,1)))
# [1] 4
If you would rather not write a for loop, sapply is another option that keeps the code compact:
sum(sapply(seq(1,10,1), function(x) {if(x<5) 1 else 0}))
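For a condition this simple, the function can also be written so that it is vectorized from the start. A minimal sketch, where F_vec is a hypothetical name for the rewritten function:

```r
x <- seq(1, 10, 1)

# A logical vector sums as 0/1, so the count can be taken directly
count_direct <- sum(x < 5)

# Or keep a function, but build it on ifelse() so it accepts vectors
F_vec <- function(x) ifelse(x < 5, 1, 0)
count_fn <- sum(F_vec(x))

c(count_direct, count_fn)
```

Both give 4, matching the Vectorize result, without any explicit or hidden per-element function calls.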

My user-defined function with nested if else statements isn't correctly evaluating vector inputs in R - please help

I am trying to create a function that shows how many "person-years" an individual has contributed to a given age-group in a given period. If the person is alive during the specified interval, the person contributes to the time-interval. For example, for the age-group 0-1, an individual who came under observation at age 0.5 and left at age 3 will have contributed 0.5 years to the person-years for the 0-1 age group.
I've been able to run this code successfully over a for-loop, but it takes forever, so I'm trying to implement a vector-based function instead. The function works fine for individual entries, but cannot handle the vectors I pass to it, giving the error: "...the condition has length > 1 and only the first element will be used"
The function I've written is as follows:
pyears01.smm <- function(ageent, ageleave) {
if ( is.na(ageent) | is.na(ageleave) )
{NA} else
if( ageent > 1 )
{0}
if ( ageent <= 1 && ageleave > 1 )
{1-ageent} else
if( ageent <= 1 && ageleave <= 1 )
{ageleave-ageent}
}
which works fine for evaluating the following:
pyears.smm(0,5)
[1] 1
pyears.smm(0.5,0.75)
[1] 0.25
pyears.smm(2,3)
[1] 0
but does not evaluate NAs correctly:
> pyears.smm(NA,NA)
[1] 0
> pyears.smm("NA",5)
[1] 0
and doesn't handle vectors correctly:
x <- c(0,0.5,2,5)
y <- c(5,0.75,3,NA)
z<- pyears.smm(x,y)
Warning message:
In if (!is.na(ageent) & ageent <= 1 & !is.na(ageleave) & ageleave > :
the condition has length > 1 and only the first element will be used
> z
[1] 1.0 0.5 -1.0 -4.0
I have read that elseif takes vectors while if statements like this can only evaluate single elements, but I have several layers of nested if statements, so I'm not sure how to fix this. Any suggestions would be appreciated. Thanks!
The warning message you are getting is a common one, especially if you are coming from another programming language. You are looking for the ifelse() function, which operates on vectors. As the warning message told you, if only used the first element of the condition. Here's the ifelse() version of your code:
pyears01.smm2 <- function(ageent, ageleave){
  ifelse(is.na(ageent) | is.na(ageleave), NA,
    ifelse(ageent > 1, 0,
      ifelse(ageent <= 1 & ageleave > 1, 1 - ageent, ageleave - ageent)))
}
> pyears01.smm2(NA, NA)
[1] NA
> pyears01.smm2(NA, 5)
[1] NA
> x <- c(0,0.5,2,5)
> y <- c(5,0.75,3,NA)
> pyears01.smm2(x,y)
[1] 1.00 0.25 0.00 NA
If you Google or search on SO for differences between if else and ifelse(), I'm sure you'll find some good stuff. Here's one link that rose to the top: http://rwiki.sciviews.org/doku.php?id=tips:programming:ifelse
The vectorised form of an if-else construct is ifelse (not elseif). However, you don't really need it for this exercise. Instead, use pmax and pmin to get the (elementwise) upper and lower bounds for the exposure interval for each observation, and also to handle the case where the ages at entry and exit are outside the interval entirely.
pyears01.smm <- function(ageent, ageleave)
pmax(0, pmin(ageleave, 1) - pmax(ageent, 0))
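As a quick check, the pmax/pmin version can be run on the sample vectors from the question. This sketch assumes ages are non-negative and that ageleave is not smaller than ageent:

```r
pyears01.smm <- function(ageent, ageleave)
  pmax(0, pmin(ageleave, 1) - pmax(ageent, 0))

# Sample vectors from the question
x <- c(0, 0.5, 2, 5)
y <- c(5, 0.75, 3, NA)

# Exposure in the 0-1 interval: clip exit at 1, clip entry at 0, floor at 0
res <- pyears01.smm(x, y)
res
```

The NA in y propagates through pmin and pmax, so the fourth result is NA without any explicit is.na test, and the third is floored at 0 because that person entered after age 1.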
The problem you are trying to solve has already been addressed in two packages that I am aware of: "survival" and "Epi". You are (unnecessarily) reinventing the Lexis diagram.
