Say I have a logical vector like this:
rex <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, NA)
How do I create a counter that counts only the TRUE values and gives consecutive TRUEs the same value? For example, I would like to get something like this:
1, NA, 2,2,NA, 3, NA
The accepted solution doesn't work for the following case:
rex <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, NA, NA, TRUE)
It prints 1 NA 2 2 NA 3 NA NA NA.
The following function returns the correct answer:
solution <- function (rex) {
  result <- c()
  counter <- 1
  consecutive <- FALSE
  for (i in 1:length(rex)) {
    if (rex[i] == TRUE && !is.na(rex[i]) && consecutive) {
      result <- c(result, counter)
      consecutive <- TRUE
    } else if (rex[i] == TRUE && !is.na(rex[i]) && !consecutive) {
      result <- c(result, counter)
      consecutive <- TRUE
    } else {
      if (i < length(rex) && rex[i+1] == TRUE && !is.na(rex[i+1])) {
        counter <- counter + 1
      }
      result <- c(result, NA)
      consecutive <- FALSE
    }
  }
  return(result)
}
Calling solution(rex) prints 1 NA 2 2 NA 3 NA NA 4, which is the correct answer.
You could use rle and its inverse:
inverse.rle(modifyList(b<-rle(rex),list(values = with(b,cumsum(values)*NA^(!values)))))
[1] 1 NA 2 2 NA 3 NA
This can also be written as:
inverse.rle(`[[<-`(b<-rle(rex),"values",with(b,cumsum(values)*NA^(!values))))
To break it down:
b <- rle(rex)
b$values <- cumsum(b$values) * NA^(!b$values)
inverse.rle(b)
A twist on the answer provided by @Onyambu could be:
with(rle(!is.na(rex) & rex), rep(cumsum(values), lengths)) * rex^NA
[1] 1 NA 2 2 NA 3 NA
It should also work when there are NAs not just at the end of the vector.
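As a quick check of that claim, the same idea can be wrapped in a small helper and run on the vector with NAs in the middle. This is only a sketch: the name true_run_id is made up for illustration, and the final multiplication is replaced by an explicit assignment of NA to make each step visible.

```r
# Hypothetical helper; the name is illustrative and not from any package.
true_run_id <- function(rex) {
  is_true <- !is.na(rex) & rex                    # NA is treated as "not TRUE"
  runs <- rle(is_true)                            # runs of TRUE / not-TRUE
  ids <- rep(cumsum(runs$values), runs$lengths)   # running count of TRUE runs, one per element
  ids[!is_true] <- NA                             # blank out everything that is not TRUE
  ids
}

rex <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, NA, NA, TRUE)
true_run_id(rex)
# [1]  1 NA  2  2 NA  3 NA NA  4
```

Because the NAs are turned into FALSE before rle is called, cumsum never sees an NA and the counter keeps going after them.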
I want to dummy-code the Species column, i.e. create a flag variable for each of its values.
I wrote the code below:
create_dummies <- function(data, categorical_preds){
  if (categorical_preds == "setosa"){data$setosa_flg <- 1}
  else {data$setosa_flg <- 0}
  if (categorical_preds == "versicolor"){data$versicolor_flg <- 1}
  else {data$versicolor_flg <- 0}
  if (categorical_preds == "virginica"){data$virginica_flg <- 1}
  else {data$virginica_flg <- 0}
  return(data)
}
create_dummies(iris,iris$Species)
I got a warning:
Warning messages:
1: In if (categorical_preds == "setosa") { :
the condition has length > 1 and only the first element will be used
2: In if (categorical_preds == "versicolor") { :
the condition has length > 1 and only the first element will be used
3: In if (categorical_preds == "virginica") { :
the condition has length > 1 and only the first element will be used
Then I changed the code to:
create_dummies <- function(data, categorical_preds){
  ifelse(categorical_preds == "setosa",data$setosa_flg <- 1,data$setosa_flg <- 0)
  ifelse(categorical_preds == "versicolor",data$versicolor_flg <- 1,data$versicolor_flg <- 0)
  ifelse(categorical_preds == "virginica",data$virginica_flg <- 1,data$virginica_flg <- 0)
  return(data)
}
create_dummies(iris,iris$Species)
No warning this time, but the new dummy variables are always 0.
As a next step I wanted to avoid hardcoding, so I wrote:
create_dummies <- function(data, categorical_preds){
  catvar <- (unique(categorical_preds))
  for (i in 1:length(catvar)){
    iris[catvar[i]] <- ifelse(iris$Species == catvar[i],1,0)
  }
  return(data)
}
create_dummies(iris,iris$Species)
What is wrong with this?
Questions:
Why are the first two versions of the code not working?
What is the difference between if(){} and the ifelse() function in R?
In ifelse(), how can I perform multiple actions when the condition is true?
Example: ifelse(categorical_preds == "setosa",data$setosa_flg <- 1 print(iris$Species),data$setosa_flg <- 0).
The warning message:
the condition has length > 1 and only the first element will be used
tells you that using a vector in an if condition is equivalent to using only its first element:
[if (v == 1)] ~ [if (v[1] == 1)] ## v here is a vector
You should use the vectorized ifelse. For example you can write your condition like this:
create_dummies <- function(data, categorical_preds){
  ## here I show only the first condition
  data$setosa_flg <-
    ifelse(categorical_preds == "setosa", 1, 0)
  data
}
iris$Species is a vector. An if statement is a control statement designed to work only on a scalar boolean condition. In R, when you compare a vector with a string, the output is a vector of booleans telling whether each element of the vector is equal to the string.
if/else should be used inside a function to run a certain part of the code when a single condition (length 1) is true. ifelse should be used when transforming the columns of your data.frame.
From the help page for if:
cond A length-one logical vector that is not NA. Conditions of length
greater than one are accepted with a warning, but only the first
element is used. Other types are coerced to logical if possible,
ignoring any class.
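To make the difference concrete, here is a minimal sketch on a made-up three-element vector (species, is_setosa and flag are illustrative names). Note that the help text quoted above describes older behaviour: since R 4.2.0, a condition of length greater than one in if is an error rather than a warning.

```r
species <- c("setosa", "versicolor", "setosa")

# Comparing a vector with a string yields one logical value per element.
is_setosa <- species == "setosa"

# ifelse() is vectorised: it picks a value for every element of the test.
flag <- ifelse(is_setosa, 1, 0)
flag
# [1] 1 0 1

# if () needs a single TRUE/FALSE, so reduce the vector first, e.g. with any().
if (any(is_setosa)) {
  message("at least one setosa row")
}
```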
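For this purpose (if the vector is a factor) you can use model.matrix to create the dummy variables. The columns come out in the order of the factor levels, so they are renamed with levels() here:
mat <- model.matrix(~iris$Species - 1)
mat <- as.data.frame(mat)
names(mat) <- levels(iris$Species)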
> str(mat)
'data.frame': 150 obs. of 3 variables:
$ setosa : num 1 1 1 1 1 1 1 1 1 1 ...
$ versicolor: num 0 0 0 0 0 0 0 0 0 0 ...
$ virginica : num 0 0 0 0 0 0 0 0 0 0 ...
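The loop version in the question assigns into the global iris inside the function but returns the untouched data argument, so the caller never sees the new columns. A possible non-hardcoded version is sketched below; make_flags is a hypothetical name, and the "_flg" suffix simply follows the naming used in the question. Converting the column to character first sidesteps any question of how a factor is interpreted when used as an index.

```r
# Sketch of a generic version; `make_flags` is a made-up name, not a package function.
make_flags <- function(data, column) {
  values <- as.character(data[[column]])
  for (level in unique(values)) {
    # Write to the function argument `data`, not to the global `iris`.
    data[[paste0(level, "_flg")]] <- ifelse(values == level, 1, 0)
  }
  data
}

flagged <- make_flags(iris, "Species")
head(flagged[, c("Species", "setosa_flg", "versicolor_flg", "virginica_flg")])
```

The function returns a modified copy, so the result has to be assigned, as with flagged above; iris itself is left unchanged.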
I would like to remove the repeated ones but keep the first in a binary vector:
x = c(0,0,1,1,0,1,0,1,1,1,0,1) # the input
y = c(0,0,1,0,1,0,1,0,1) # the desired output
i.e., one 1 and two 1's of the first and third set of 1's are removed, respectively, and the first in the set is kept.
I am trying to use rle with cumsum but have not yet figured it out. Any suggestion would be appreciated.
Using rle/inverse.rle
res <- rle(x)
res$lengths[res$values == 1] <- 1
inverse.rle(res)
## [1] 0 0 1 0 1 0 1 0 1
We can use diff:
x[c(1, diff(x)) == 1 | x == 0]
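Broken into steps, the one-liner above can be read as follows. This is just an unrolled sketch of the same expression; step, starts_run and keep are illustrative names.

```r
x <- c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1)

# Change from the previous element; a 1 is prepended so the result has the
# same length as x and a leading 1 in x would count as the start of a run.
step <- c(1, diff(x))

starts_run <- step == 1       # TRUE at position 1 and wherever a 0 is followed by a 1
keep <- starts_run | x == 0   # keep every 0 and the first 1 of each run
y <- x[keep]
y
# [1] 0 0 1 0 1 0 1 0 1
```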
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x[!(x == 1 & #remove each value that is a 1
c(x[-1] == 1, FALSE) #followed by a 1 (never the case for the last value)
)]
#[1] 0 0 1 0 1 0 1 0 1
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x1 <- rle(x)
x1$lengths[x1$values==1] <- 1
inverse.rle(x1)
Depending on the vector size you could loop through it and use conditions for appending the value to the result. Here is a simple solution using your given input.
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)
prev <- 0
y <- c()
for (i in x) {
  if (i == 1) {
    if (prev != 1) {
      y <- append(y, i)
    }
  } else {
    y <- append(y, i)
  }
  prev <- i
}
I came to R from SAS, where a numeric missing value is treated as negative infinity in comparisons. So we can just say:
positiveA = A > 0;
In R, I have to be more verbose:
positiveA <- ifelse(is.na(A),0, ifelse(A > 0, 1, 0))
I find this syntax hard to read. Is there any way I can modify the ifelse function so that NA is a special value that is always false in every comparison? If not, treating NA as -Inf would work too.
Similarly, I would like NA to be set to '' (blank) in ifelse statements on character variables.
Thanks.
This syntax is easier to read:
x <- c(NA, 1, 0, -1)
(x > 0) & (!is.na(x))
# [1] FALSE TRUE FALSE FALSE
(The outer parentheses aren't necessary, but will make the statement easier to read for almost anyone other than the machine.)
Edit:
## If you want 0s and 1s
((x > 0) & (!is.na(x))) * 1
# [1] 0 1 0 0
Finally, you can make the whole thing into a function:
isPos <- function(x) {
  # parenthesise the whole condition: * binds tighter than &
  ((x > 0) & (!is.na(x))) * 1
}
isPos(x)
# [1] 0 1 0 0
Replacing an NA value with zero seems rather strange behaviour to expect. R treats NA values as missing (although, far behind the scenes where you should never need to go, NA_integer_ is stored as the most negative integer and NA_real_ as a special NaN).
All you need to do is A > 0, or as.numeric(A > 0) if you want 0/1 rather than TRUE/FALSE:
# some dummy data
A <- seq(-1,1,l=11)
# add NA value as second value
A[2] <- NA
positiveA <- A>0
positiveA
[1] FALSE NA FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
as.numeric(positiveA) #
[1] 0 NA 0 0 0 0 1 1 1 1 1
Note that ifelse(A > 0, 1, 0) would also work.
The NA values are "retained", or dealt with appropriately. R is sensible here.
Try this:
positiveA <- ifelse(!is.na(A) & A > 0, 1, 0)
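If the pattern is needed in many places, the NA check can be hidden in a small helper so that the call site stays close to the SAS one-liner. This is a sketch under assumptions: is_true is a made-up name rather than an existing function, and A and s are toy vectors chosen for illustration.

```r
# Hypothetical helper: TRUE only where the condition is known to be TRUE,
# so an NA condition is treated as FALSE.
is_true <- function(cond) !is.na(cond) & cond

A <- c(-1, NA, 0.5, 2)
positiveA <- as.integer(is_true(A > 0))
positiveA
# [1] 0 0 1 1

# For character data, NA can be replaced by "" with plain indexing.
s <- c("a", NA, "c")
s[is.na(s)] <- ""
s
# [1] "a" ""  "c"
```

This leaves ifelse itself untouched, which is safer than redefining a base function whose NA handling other code may rely on.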
If you are working with integers you can use %in%.
For example, if your numbers only go up to 2:
test <- c(NA, 2, 1, 0, -1)
Other people have suggested using
(test > 0) & (!is.na(test))
or
ifelse(!is.na(test) & test > 0, 1, 0)
My solution is simpler and gives the same result as the first expression, provided every positive value that can occur is listed on the right-hand side:
test %in% 1:2
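The shortcut depends on that list being complete. A small sketch with two extra values (3 and 1.5, added here purely for illustration) shows where the two approaches diverge:

```r
test <- c(NA, 2, 1, 0, -1, 3, 1.5)

by_in <- test %in% 1:2                   # TRUE only for values that equal 1 or 2
by_cmp <- (test > 0) & (!is.na(test))    # TRUE for any non-missing positive value

by_in
# [1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE
by_cmp
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
```

Both treat NA as FALSE, but only the comparison picks up positive values outside 1:2.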
You can use the missing argument in if_else_ from hablar:
library(hablar)
x <- c(NA, 1, 0, -1)
if_else_(x > 0, TRUE, FALSE, missing = FALSE)
which gives you
[1] FALSE TRUE FALSE FALSE