I want to dummy-code the column Species, i.e. create a flag variable for each of its values.
I wrote the code below:
create_dummies <- function(data, categorical_preds){
if (categorical_preds == "setosa"){data$setosa_flg <- 1}
else {data$setosa_flg <- 0}
if (categorical_preds == "versicolor"){data$versicolor_flg <- 1}
else {data$versicolor_flg <- 0}
if (categorical_preds == "virginica"){data$virginica_flg <- 1}
else {data$virginica_flg <- 0}
return(data)
}
create_dummies(iris,iris$Species)
I got a warning:
Warning messages:
1: In if (categorical_preds == "setosa") { :
the condition has length > 1 and only the first element will be used
2: In if (categorical_preds == "versicolor") { :
the condition has length > 1 and only the first element will be used
3: In if (categorical_preds == "virginica") { :
the condition has length > 1 and only the first element will be used
Then I changed the code to:
create_dummies <- function(data, categorical_preds){
ifelse(categorical_preds == "setosa",data$setosa_flg <- 1,data$setosa_flg <- 0)
ifelse(categorical_preds == "versicolor",data$versicolor_flg <- 1,data$versicolor_flg <- 0)
ifelse(categorical_preds == "virginica",data$virginica_flg <- 1,data$virginica_flg <- 0)
return(data)
}
create_dummies(iris,iris$Species)
No warning this time, but the new dummy variables are always 0.
As a next step I want to avoid hardcoding, so I wrote:
create_dummies <- function(data, categorical_preds){
catvar <- (unique(categorical_preds))
for (i in 1:length(catvar)){
iris[catvar[i]] <- ifelse(iris$Species == catvar[i],1,0)
}
return(data)
}
create_dummies(iris,iris$Species)
What is wrong with this?
Questions:
Why are the first 2 versions of the code not working?
What is the difference between if(){} and ifelse() in R?
In ifelse(), if the condition is true, how can I perform multiple actions?
example: ifelse(categorical_preds == "setosa",data$setosa_flg <- 1 print(iris$Species),data$setosa_flg <- 0).
The warning message:
the condition has length > 1 and only the first element will be used
tells you that a vector used as an if condition is evaluated using only its first element (in R 4.2.0 and later this is an error rather than a warning):
[if (v == 1)] ~ [if (v[1] == 1)] ## v here is a vector
You should use the vectorized ifelse. For example you can write your condition like this:
create_dummies<-function(data, categorical_preds){
## here I show only the first condition
data$setosa_flg <-
ifelse (categorical_preds=="setosa",1,0)
data
}
iris$Species is a vector. An if statement is a control statement designed to work only on a scalar boolean condition. In R, when you compare a vector with a string, the output is a vector of booleans telling whether each element of the vector is equal to the string.
if ... else should be used when you write a function, to run certain parts of it only when a given condition is true (a single condition, length == 1). ifelse should be used when transforming your data.frame.
From the help page for if:
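To tie this back to the third, non-hardcoded attempt in the question: that version assigns to iris inside the function (which only changes a local copy) and then returns data unchanged. Below is a minimal sketch of one way to write it, assuming the column is a factor or character vector and that the `_flg` suffix from the question is wanted. The helper name `make_flags` is made up for illustration.

```r
# Hypothetical helper: adds one 0/1 flag column per distinct value of `categorical_preds`.
make_flags <- function(data, categorical_preds) {
  # as.character(): a factor used as an index is matched by its integer codes,
  # not by its labels (see ?Extract), so work with plain strings instead
  for (lev in as.character(unique(categorical_preds))) {
    # write to `data`, the function argument, not to the global `iris`
    data[[paste0(lev, "_flg")]] <- ifelse(categorical_preds == lev, 1, 0)
  }
  data
}

flagged <- make_flags(iris, iris$Species)
colSums(flagged[, c("setosa_flg", "versicolor_flg", "virginica_flg")])
```

Because ifelse() is vectorized, each flag column is filled for all rows in one step; the loop only runs once per distinct value.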
cond A length-one logical vector that is not NA. Conditions of length
greater than one are accepted with a warning, but only the first
element is used. Other types are coerced to logical if possible,
ignoring any class.
For this purpose (if the vector is a factor) you can use model.matrix to create dummy variables.
mat<-model.matrix(~iris$Species-1)
mat<-as.data.frame(mat)
names(mat)<-levels(iris$Species) # columns follow the factor levels, not the order of appearance
> str(mat)
'data.frame': 150 obs. of 3 variables:
$ setosa : num 1 1 1 1 1 1 1 1 1 1 ...
$ versicolor: num 0 0 0 0 0 0 0 0 0 0 ...
$ virginica : num 0 0 0 0 0 0 0 0 0 0 ...
I want to explain my problem with some code:
example_1 <- sample(-100:100, 100) # simple sample for my question
example_1[30] <- NA # changed one of them to NA
not_equal_zero <- matrix(NA, 100, 1) # matrix to find out if there is any zeros (1 for TRUE, 0 for FALSE)
for (i in 1:100) { # check each observation if it is 0 assign 1 to "not equal zero matrix"
if (example_1[i] == 0) {
not_equal_zero[i] <- 1
} else {
not_equal_zero[i] <- 0
}
}
When i = 30 it hits the NA and terminates with an error. I am not checking only against zero; I have other special values too. What is the solution for this problem?
2 == 0 # it gives FALSE
0 == 0 # it gives TRUE
NA == 0 # it gives NA but i need FALSE
NA gives NA when compared to anything. You probably want to replace:
if (example_1[i] == 0)
with:
if (!is.na(example_1[i]) && example_1[i] == 0)
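As a rough sketch of the same idea without the loop: the comparison can be done on the whole vector at once, treating NA as "not zero". The names `is_zero` and `is_special`, and the special values 0 and 99, are only illustrative.

```r
set.seed(1)                              # only so the sample is reproducible
example_1 <- sample(-100:100, 100)
example_1[30] <- NA

# TRUE only where the value is present and equal to 0; FALSE & NA is FALSE
is_zero <- !is.na(example_1) & example_1 == 0
not_equal_zero <- as.integer(is_zero)    # 1 for TRUE, 0 for FALSE, as in the question

not_equal_zero[30]
anyNA(not_equal_zero)

# For several special values, %in% never returns NA
is_special <- example_1 %in% c(0, 99)
```

Note the single `&` here: `&&` is meant for one condition inside if(), while `&` works element by element.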
I would like to remove the repeated ones but keep the first in a binary vector:
x = c(0,0,1,1,0,1,0,1,1,1,0,1) # the input
y = c(0,0,1,0,1,0,1,0,1) # the desired output
i.e., one 1 and two 1's of the first and third set of 1's are removed, respectively, and the first in the set is kept.
I am trying to use rle with cumsum but have not yet figured it out. Any suggestion would be appreciated.
Using rle/inverse.rle
res <- rle(x)
res$lengths[res$values == 1] <- 1
inverse.rle(res)
## [1] 0 0 1 0 1 0 1 0 1
We can use diff:
x[c(1, diff(x)) == 1 | x == 0]
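To see why the diff version works, it may help to look at the intermediate vectors. This is only a walk-through of the one-liner above; the names `step` and `keep` are made up here.

```r
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)

# diff(x) is 1 exactly where a 0 is followed by a 1, i.e. where a run of 1's starts.
# Prepending 1 lines the result up with x, and makes a leading 1 count as a run start.
step <- c(1, diff(x))

# Keep every 0, and keep a 1 only if it starts a run.
keep <- step == 1 | x == 0
y <- x[keep]
y
```

This relies on x containing only 0's and 1's; for other values the rle approach is the safer choice.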
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x[!(x == 1 & #remove each value that is a 1
c(x[-1] == 1, FALSE) #followed by a 1 (never the case for the last value)
)]
#[1] 0 0 1 0 1 0 1 0 1
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x1 <- rle(x)
x1$lengths[x1$values==1] <- 1
inverse.rle(x1)
Depending on the vector size you could loop through it and use conditions for appending the value to the result. Here is a simple solution using your given input.
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)
prev <- 0
y <- c()
for(i in x){
if (i == 1){
if (prev != 1){
y <- append(y,i)
}
}else{
y <- append(y,i)
}
prev <- i
}
Say, I have a data frame as follows
1 2
1 4
1 6
1 7
1 9
While running a loop from 1:10, I want to retrieve only those numbers which are present along with 1 in the table above, namely, 2,4,6,7,9. This is my code using the which condition, however, I get an error saying, "Error in if : argument is of length zero". I also tried with ==TRUE instead of >0, and still get the same error.
for(i in 1:10)
{
if(which((mydata[,1] == 1) & (mydata[,2] == i)) > 0)
{
print("yes");
}
else
{
print("no")
}
}
As suggested, you would have to check the length of which's output:
if (length(which(mydata[,1] == 1 & mydata[,2] == i)) > 0)
A more appropriate tool for this is any:
if (any(mydata[,1] == 1 & mydata[,2] == i))
I also suggest removing the two sets of innermost parentheses, since the == operator has higher precedence than & (see ?Syntax).
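For illustration, here is a small sketch with the table from the question rebuilt as a data frame (the column names `a` and `b` are assumptions, since the question does not show them). It also shows why the original test fails: which() returns an empty integer vector when nothing matches, so the comparison with 0 has length zero.

```r
mydata <- data.frame(a = c(1, 1, 1, 1, 1), b = c(2, 4, 6, 7, 9))

# No row has b == 3, so which() is empty and there is nothing for if() to test
length(which(mydata[, 1] == 1 & mydata[, 2] == 3) > 0)

# any() always returns a single TRUE/FALSE, so it is safe inside if()
found <- vapply(1:10, function(i) any(mydata[, 1] == 1 & mydata[, 2] == i), logical(1))
(1:10)[found]

# Without a loop: second-column values paired with 1 that fall in 1:10
paired <- intersect(mydata[mydata[, 1] == 1, 2], 1:10)
paired
```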
So I have this simple if loop
chick<-lapply(1:length(t),function(i){
if(t[[i]]<0.01){
chick=1
}else 0
})
So basically when t<0.01 it returns 1, and otherwise 0. But sometimes my data has NA values like the ones below. How can I assign 0 to the NA values as well? If I don't, I get an error similar to this:
Error in if (t[[i]] < 0.01) { : missing value where TRUE/FALSE needed
Here is a sample output from data called 't'
[[1]]
[1] NA
[[2]]
[1] NA
[[3]]
[1] 0.01
thanks again
Use is.na:
if(!is.na(t[[i]]) && t[[i]]<0.01) ...
More importantly though, since you are assigning the result of lapply to chick, do not also assign to chick inside the function. The inner assignment only creates a local variable and can give you results different from what you are expecting. Instead try
chick <- lapply(seq_along(t),function(i)
if(!is.na(t[[i]]) && t[[i]]<0.01) 1 else 0 # test the NA first so that if() never sees NA
)
Or better yet, use ifelse.
chick <- ifelse(t[!is.na(t)] < 0.01, 1, 0)
If you want chick to be the same length as t, then use the '|' operator in ifelse, as suggested by NPE (but use a single |, not ||, in ifelse).
The following will check whether t[[i]] is either NA or less than 0.01:
if (is.na(t[[i]]) || t[[i]] < 0.01) {
....
Or you could just use...
chick <- numeric( length(t) )
chick[ t < 0.01 ] <- 1
... and avoid loops and checking for NA altogether.
Why not just this (invert the test and return 0 for the complementary test as well as for NA):
chick <- ifelse(is.na(t)|t>=0.01, 0, 1)
This works because is.na(t) is TRUE wherever t is NA, and TRUE|NA returns TRUE. See the ?Logic page. It's also more efficient than looping with lapply. I suppose if you need the results in list format you could either apply as.list to the result or use:
lapply(t, function(x) if( is.na(x) || x>=0.01 ) { 0 }else{ 1 })
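A quick way to check how `|` treats NA and to compare the approaches above, assuming `t` is a list of length-one numerics as in the question. The fourth element and the conversion with unlist() are additions made here for illustration.

```r
t <- list(NA, NA, 0.01, 0.001)   # sample data; the last element is added for illustration

TRUE | NA     # TRUE: the result does not depend on the missing value
FALSE | NA    # NA: the result would depend on the missing value

tv <- unlist(t)                  # plain numeric vector
chick <- ifelse(is.na(tv) | tv >= 0.01, 0, 1)
chick

# Same result in list form, testing one element at a time with ||
chick_list <- lapply(t, function(x) if (is.na(x) || x >= 0.01) 0 else 1)
```

The `||` form is safe inside the function because is.na(x) is checked first, so the comparison is never reached for a missing value.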