Writing an Indicator function in R - r

I am trying to create an indicator variable, Z, in R; i.e., if I have some event A, I want Z to be 1 if A is true and 0 if A is false.
I have tried this:
Z=0
if(A==(d>=5 && d<=10))
{
Z=1
}
else
{
Z=0
}
But this doesn't work. I was also thinking I could try to write a separate function called indicator:
indicator = function()
Any suggestions would be really helpful, thank you

You could easily write something like this
indicator<-function(condition) ifelse(condition,1,0)
ifelse can be used on vectors, but it works perfectly fine on single logical values (TRUE or FALSE).

Booleans FALSE / TRUE can be coerced to be 0 or 1 and then be multiplied:
Indicator<-function(x, min=0, max=Inf)
{
as.numeric(x >= min) * as.numeric(x <= max)
}
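As a minimal sketch of how this applies to the original question (the sample values in d are made up for illustration), the bounds 5 and 10 can be passed as min and max:

```r
# Indicator function from the answer above: 1 inside [min, max], 0 outside
Indicator <- function(x, min = 0, max = Inf) {
  as.numeric(x >= min) * as.numeric(x <= max)
}

# Hypothetical sample values for d
d <- c(3, 5, 7.5, 10, 12)

# Z is 1 where 5 <= d <= 10 and 0 elsewhere
Z <- Indicator(d, min = 5, max = 10)
print(Z)
```

Because the comparisons are vectorized, the same call works for a single number or a whole vector.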

You can use
a <- data.frame(a = -5:5, b = 1:11)
indicator <- function(data) I(data > 0) + 1 - 1
indicator(a)
Here a can be a vector, a data frame, and so on.
You can change the logical condition inside I() to whatever you are interested in.

There is no need to define A, just test the condition.
Also, remember that && and & have different uses in R (& compares element by element, while && is meant for single logical values; see R - boolean operators && and || for more details), so maybe that is part of your problem.
if (d >= 5 & d <= 10) {
  Z <- 1
} else {
  Z <- 0
}
Or, as suggested in the other answer, use ifelse:
z <- ifelse((d>=5 & d<=10), 1, 0)
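A small sketch of the distinction, using made-up values: the elementwise & pairs naturally with ifelse on a vector, while && suits a plain if on a single value.

```r
# Hypothetical vector of values: & compares element by element
d <- c(2, 5, 8, 11)
z_vec <- ifelse(d >= 5 & d <= 10, 1, 0)

# Hypothetical single value: && is the scalar operator intended for if()
d1 <- 8
if (d1 >= 5 && d1 <= 10) {
  z_one <- 1
} else {
  z_one <- 0
}

print(z_vec)
print(z_one)
```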

Related

Confusion of nested ifelse expression

Consider below expression:
x$Y = ifelse(x$A<= 5 & abs(x$B) >= 2,
ifelse(x$B> 2 ,"YES","NO"),
'NA')
What I understand is that if A <= 5 and abs(B) >= 2 then all are YES, and if not then NO, but I am confused by the second ifelse condition. Any help will be highly appreciated.
Thanks
This code aims to define a new column, Y in the data set x. The column Y will populate based on the following statements:
If we rewrite your ifelse expression using expanded syntax, it might be easier to understand.
x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2, ifelse(x$B > 2, "YES", "NO"), 'NA')
# becomes (conceptually, applied to each row in turn)
if (x$A <= 5 & abs(x$B) >= 2) {
  if (x$B > 2) {
    x$Y <- "YES"
  } else {
    x$Y <- "NO"
  }
} else {
  x$Y <- 'NA'  # the string "NA", as in the original expression
}
The second, nested ifelse() corresponds to the inner if above. It checks whether x$B is greater than 2. Because the outer check abs(x$B) >= 2 has already passed, the only alternatives are that x$B is exactly 2 or that it is -2 or less. If x$B is greater than 2, x$Y is set to YES; otherwise it is set to NO.
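To see the three possible outcomes side by side, here is a small sketch on a made-up data frame (the values of A and B are assumptions chosen to hit each branch):

```r
# Hypothetical data covering every branch of the nested ifelse
x <- data.frame(A = c(1, 3, 4, 9, 2),
                B = c(3, -4, 2, 5, 1))

x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2,
              ifelse(x$B > 2, "YES", "NO"),  # outer test passed: look at B itself
              "NA")                          # outer test failed: the string "NA"

print(x)
```

Rows that fail the outer test get the string "NA" regardless of B; rows that pass get YES only when B is above 2.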

How can I create a function that converts a vector of fractional odds to percentages?

converter2 <- function(odds){
if(grepl("/", odds) == T){
x <- str_split(odds, "/")
y <- as.numeric(x[[1]][1])
z <- as.numeric(x[[1]][2])
a <- (1 / ((y/z) + 1)) * 100
return(a)
}
else{
x <- as.numeric(odds)
x <- 1/(x + 1)
return(x*100)
}
}
This is the code I have been using to create a function that converts a single character string of fractional odds to a percentage (e.g. if you write "7/2" it will return 22.222), but it doesn't work on a vector, returning the warning:
1: In if (grepl("/", odds) == T) { :
the condition has length > 1 and only the first element will be used
Does anyone have a good way to fix this? I was thinking of using an ifelse statement but can't figure out what will work. Thanks.
I have been out of touch with R so bear with me.
sapply(vector, converter2)
This would apply the function on every element and return a vector.
Another option is
Vectorize(converter2)(c("7/22","3/7"))
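Another possibility is to make the function itself accept a vector. The following is only a sketch in base R; the name odds_to_percent is hypothetical, and it assumes each element is either "a/b" or a plain number:

```r
# Hypothetical vectorized converter: fractional odds -> implied percentage
odds_to_percent <- function(odds) {
  # Split every element on "/" (elements without "/" stay as one piece)
  parts <- strsplit(as.character(odds), "/", fixed = TRUE)
  # Turn each element into a single numeric ratio
  ratio <- vapply(parts, function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p[1] / p[2] else p[1]
  }, numeric(1))
  100 / (ratio + 1)
}

result <- odds_to_percent(c("7/2", "3/7", "4"))
print(result)
```

Here the if sits inside the function applied to one element at a time, so its condition always has length one.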

Use if else statement for Dummy-Coding in R

I tried to write an if/else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
df$iscd1151 <- 1
} else {
df$iscd1151 <- 0
}
This should be a reasonable solution.
First we'll find the positions of the two relevant columns, and then we'll apply a function over the rows (MARGIN = 1) that checks whether the first column is 1 and sets the second column accordingly.
# assumes df already contains an iscd1151 column to overwrite
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")
mat <- apply(df, MARGIN = 1, function(x) {
  if (x[col1] == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0
  }
  x
})
Unfortunately, this transforms the original data frame into a transposed matrix. We can re-transpose the matrix back and turn it back into a data frame with the following.
new_df <- as.data.frame( t(mat))
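As for why the original attempt fails: if() expects a single TRUE or FALSE, whereas df$iscd115 == 1 produces one value per row. A vectorized sketch that avoids apply() altogether might look like this (the data frame contents are made up for illustration):

```r
# Hypothetical data frame with the column from the question
df <- data.frame(iscd115 = c(1, 3, 1, 2))

# ifelse() evaluates the test for every row at once
df$iscd1151 <- ifelse(df$iscd115 == 1, 1, 0)

# Equivalent: coerce the logical comparison directly to 0/1
dummy_alt <- as.integer(df$iscd115 == 1)

print(df)
```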

R: Simple Function with For Loop

I have an elementary question that I sadly cannot figure out. I have a set of numeric vectors of 1s and 0s; one is stored in the return variable below, and their sums are stored in the totals variable. I would like to check each of these individual vectors to see if there were consecutive zeroes in the result, and then return the total number of times this occurred. However, I'm quite rusty and/or bad at for loops/functions and cannot get this result. My latest attempt is below. Any suggestions are welcome - appreciate the help.
set.seed(1)
return = ifelse(runif(10) <= 0.6, 1, 0)
totals = sapply(1:10, function (x) sum(ifelse(runif(10)<=0.6,1,0)))
sums = function (x) {
g = 0
for (i in 1:length(x)-1) {
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Although this is not the most efficient way to do so (see akrun's answer), we can get your for loop to work:
sums=function (x)
{
g=0
# watch your brackets! 1:3-1 returns c(0,1,2), not c(1,2)!
for (i in 1:length(x)-1)
{
# To test for equality, use a double ==, rather than a single.
# also, your 'g' variable is not updated, which is what you want to do.
sum(ifelse (x[i]+x[i+1]=0,1,0))
}
return (g)
}
Corrected:
sums <-function(x)
{
g=0
for (i in 1:(length(x)-1))
{
g= g+ifelse(x[i]+x[i+1]==0,1,0)
}
return (g)
}
You can call your function by:
return=ifelse(runif(10)<=0.6,1,0)
sums(return)
Or to generate ten vectors with random 1's and 0's, and apply your function to them, you could do:
totals = lapply(1:10, function (x) ifelse(runif(10) <= 0.6, 1, 0))
sapply(totals,sums)
Hope this helps!
If we are looking for the number of times consecutive 0's occur (i.e. greater than 1) and its length, then use rle
with(rle(return), lengths[values==0 & lengths > 1])
#[1] 4
The return vector is
return
#[1] 1 1 1 0 1 0 0 0 0 1
Here we can see the run of 4 consecutive 0's, which shows that the answer matches the initial vector.
A for loop (incorrect answer just for the sake of answering)
sums <- function (x) {
g <- 0
for (i in tail(seq_along(x), -1)) {
if(x[i-1]==0 & x[i]==0) {
g <- g+1
}
}
g
}
sums(return)
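Note that the two approaches count different things: rle reports runs of zeros, while the loop counts adjacent pairs of zeros. A short sketch on the vector shown above makes the difference visible (the names n_runs and n_pairs are just illustrative):

```r
# The vector from the answer above
x <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)

# Runs of more than one consecutive zero, via run-length encoding
runs <- rle(x)
zero_runs <- runs$lengths[runs$values == 0 & runs$lengths > 1]
n_runs <- length(zero_runs)

# Adjacent zero pairs, vectorized equivalent of the for loop
n_pairs <- sum(head(x, -1) == 0 & tail(x, -1) == 0)

print(zero_runs)
print(n_runs)
print(n_pairs)
```

A single run of four zeros contains three adjacent pairs, so which count is wanted depends on how "the number of times this occurred" is defined.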

Confused about if statement and for loop in R

I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level, but when I write a loop to do this I get an error.
For example, if the column is made up of the levels a, b, c and I want to code a dummy variable of 1 or 0 for each one, the code I have to create one is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
if (data[,1] == "a") {
h[i] = 1
} else {
h[i] = 0
}
}
cbind(data, h)
This gives me the error message "the condition has length > 1 and only the first element will be used". I have seen answers elsewhere on this site saying I should try to write my own function to solve problems and avoid for loops, but I don't really understand a) how to solve this by writing a function (at least immediately) or b) the benefit of doing this as a function rather than with loops.
Also I ended up using the ifelse statement to create each vector and then cbind to add it to the data frame but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
We could even simplify: since h is initialized to 0, there is no need to set it to 0 in the else case, so we can just move on:
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  }
}
A more substantial improvement would be to introduce vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results, and a vector of "if false" results, and combines them:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, your test case and results are so simple, that we can do even better.
data[, 1] == "a" ## run this and look at the output
The above code is just a boolean vector of TRUE and FALSE. If we run as.numeric() on it TRUE values will be coerced to 1s and FALSE values will be coerced to 0s. So we can just do
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
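As a quick check that the three versions agree, here is a sketch on a made-up data frame (the column name group and its values are assumptions). The last line extends the same idea to one dummy column per level, which is what the question was ultimately after:

```r
# Hypothetical data: one factor column with levels a, b, c
data <- data.frame(group = c("a", "b", "c", "a"), stringsAsFactors = TRUE)

# Loop version
h_loop <- rep(0, nrow(data))
for (i in seq_len(nrow(data))) {
  if (data[i, 1] == "a") {
    h_loop[i] <- 1
  }
}

# Vectorized versions
h_ifelse  <- ifelse(data[, 1] == "a", 1, 0)
h_numeric <- as.numeric(data[, 1] == "a")

# One 0/1 column per factor level, named after the level
dummies <- sapply(levels(data$group), function(lv) as.numeric(data$group == lv))

print(cbind(data, dummies))
```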
