Consider the expression below:
x$Y = ifelse(x$A<= 5 & abs(x$B) >= 2,
ifelse(x$B> 2 ,"YES","NO"),
'NA')
What I understand is that if A is <= 5 and B is >= 2 then all rows are YES, and if not then NO, but I am confused by the second ifelse condition. Any help will be highly appreciated.
Thanks
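This code defines a new column, Y, in the data set x. Y is filled in row by row according to the following rules: if A <= 5 and abs(B) >= 2, then Y is "YES" when B > 2 and "NO" otherwise; if that first condition fails, Y is the string 'NA'.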
If we rewrite your ifelse expression using expanded syntax, it might be easier to understand.
x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2, ifelse(x$B > 2, "YES", "NO"), 'NA')
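# becomes, conceptually, for one row of x at a time
# (if() is not vectorized, so this is for illustration only)
if (x$A <= 5 & abs(x$B) >= 2) {
  if (x$B > 2) {
    x$Y <- "YES"
  } else {
    x$Y <- "NO"
  }
} else {
  x$Y <- 'NA'
}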
The second, nested ifelse() corresponds to the inner if above. It only matters for rows that already passed the earlier check abs(x$B) >= 2, so x$B is either at least 2 or at most -2. If x$B is greater than 2, x$Y is assigned "YES"; otherwise (x$B is exactly 2, or at most -2) it is assigned "NO".
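To see the nested ifelse() at work, here is a small sketch on made-up data (the values of A and B below are hypothetical, not from the question). Each row falls into one of the three branches described above.

```r
# Hypothetical example data, chosen to hit every branch
x <- data.frame(A = c(1, 3, 4, 9, 2),
                B = c(3, -4, 2, 5, 1))

# Outer test: A <= 5 and |B| >= 2; inner test: B > 2
x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2,
              ifelse(x$B > 2, "YES", "NO"),
              "NA")

print(x)
```

Rows 4 and 5 fail the outer test (A is too large, or |B| is too small), so they get the string "NA", which is ordinary text and not a missing value.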
I am quite new to this kind of function in R. What I am trying to do is to use the if statement over a vector.
Specifically, let's say we have a vector of characters:
id <- c('4450', '73635', '7462', '12')
What I'd like to do is replace the elements that contain a specific number of characters with a particular term. Here is what I have tried so far:
for (i in 1:length(id)) {
if(nchar(i) > 3) {
id[i] <- 'good'
}
else id[i] <- 'bad'
}
However, the code doesn't work and I don't understand why. Also I'd like to ask you:
How can I use multiple conditions in this example? For instance, replace elements with nchar(i) > 6 by 'mild', those with nchar(i) < 2 by 'not bad', and so on.
In your for statement, i is the loop index (1, 2, 3, ...), not the actual element of your vector, so nchar(i) measures the index rather than the string.
I think your code would work if you replace:
if(nchar(i) > 3)
by
if(nchar(id[i]) > 3)
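As a sketch of that fix, the loop below indexes the vector with id[i], and a one-line vectorized version follows for comparison. The names labels_loop and labels_vec are just for illustration; the results go into new vectors so the original id stays untouched.

```r
id <- c('4450', '73635', '7462', '12')

# Loop version: measure the element id[i], not the index i
labels_loop <- character(length(id))
for (i in seq_along(id)) {
  if (nchar(id[i]) > 3) {
    labels_loop[i] <- 'good'
  } else {
    labels_loop[i] <- 'bad'
  }
}

# Vectorized version: nchar() and ifelse() both work on whole vectors
labels_vec <- ifelse(nchar(id) > 3, 'good', 'bad')

print(labels_loop)
print(labels_vec)
```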
You could use dplyr::case_when to include multiple such conditions.
temp <- nchar(id)
id1 <- dplyr::case_when(temp > 6 ~ 'mild',
temp < 2 ~ 'not bad',
#default condition
TRUE ~ 'bad')
Or using nested ifelse
id1 <- ifelse(temp > 6, 'mild', ifelse(temp < 2, 'not bad', 'bad'))
I'd like to know the number of rows of a data frame after filtering it on multiple conditions. I have 2 methods I've used, but I'm a little stumped because they're giving me different outputs.
Method 1
x <- df[df$gender=='male',]
x <- x[x$stat == 0,]
nrow(x)
OUTPUT = Some Number
Method 2
nrow(sqldf('SELECT * FROM df WHERE gender == "male" AND stat == 0'))
OUTPUT = Some Number
I'm a little confused as to why the outputs would be different? Any ideas?
It looks like in method one you assign x to df[df$gender=='male', ] and then overwrite x with x[x$stat == 0, ], so the second filter is applied to the rows that survived the first. Off the top of my head, with no dataset, x <- df[df$gender=='male' & df$stat == 0, ] should apply both conditions in one step, although I have never done it this way. I would use the subset function with x <- subset(df, gender=='male' & stat == 0) and then nrow(x).
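One possible cause of the mismatch, which I can't confirm without the data, is missing values. When a logical index contains NA, bracket indexing returns an all-NA row for it, and nrow() counts that row. subset() and which() drop such rows, as SQL's WHERE clause generally does. A sketch on made-up data (the data frame and the n_* names are hypothetical):

```r
# Hypothetical data with some missing values
df <- data.frame(gender = c('male', 'male', NA, 'female', 'male'),
                 stat   = c(0, NA, 0, 0, 1))

# Method 1 style: NA in the condition produces extra all-NA rows
x <- df[df$gender == 'male', ]
x <- x[x$stat == 0, ]
n_bracket <- nrow(x)

# subset() keeps only rows where the condition is TRUE
n_subset <- nrow(subset(df, gender == 'male' & stat == 0))

# which() also discards NA before indexing
n_which <- nrow(df[which(df$gender == 'male' & df$stat == 0), ])

print(c(bracket = n_bracket, subset = n_subset, which = n_which))
```

If the counts differ like this on your data, checking sum(is.na(df$gender)) and sum(is.na(df$stat)) would be a reasonable first step.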
Suppose I have a file A with (id,x,y) and another file B with (ID, xmin, xmax,ymin,ymax), with dim(A)~50000 and dim(B)~3000.
What I need is to add an additional column to A where each row is a vector composed of all the B$ID[j] for which A$x[i] is between B$xmin[j] and B$xmax[j] and, simultaneously, A$y[i] is between B$ymin[j] and B$ymax[j].
This vector will have a min dimension of 1 and a max dimension of 4.
(essentially I have a grid and I want to know in which cells of the grid the elements of A are falling. They will always fall in at least one cell to a maximum of 4)
How can I express this?
Thanks for your help
Here you go. I could not test this with your data, however, so there might be an error.
getIDs <- function (x, y) {
  found <- c()
  # loop over every row index of B (base R has nrow(), not nrows())
  for ( j in seq_len(nrow(B)) ) {
    if ( x >= B[j,"xmin"] && x <= B[j,"xmax"] &&
         y >= B[j,"ymin"] && y <= B[j,"ymax"] ) {
      found <- append(found, B[j, "ID"])
    }
  }
  return(found)
}
# one call per row of A; the result is a list when rows match different numbers of cells
A$NewCol <- apply( A[, c("x", "y")], 1, function(x) getIDs(x[1], x[2]) )
I suggest you check this out here: Call apply-like function on each row of dataframe with multiple arguments from each row
Not very proud of this but it works:
library(data.table)
A=data.table(id=c(1,1,1,1,1,2,2,2,2,2,2),x=c(1:5,2:7),y=c((3:7),(4:9)))
B=data.table(ID=c(1,2),xmin=c(1,2),xmax=c(5,7),ymin=c(3,4), ymax=c(7,9))
# rowA is (id, x, y) and rowB is (ID, xmin, xmax, ymin, ymax), so x is rowA[2] and y is rowA[3]
A$newcol <- apply(A,1,function(rowA) B$ID[apply(B,1,function(rowB) rowA[2]>=rowB[2] & rowA[2]<=rowB[3] & rowA[3]>=rowB[4] & rowA[3]<=rowB[5])])
I will work on finding the data.table / dplyr alternative which will be, I hope, nicer and more generic
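In the meantime, here is a base R sketch of the same lookup, using the same toy values but as plain data frames (the name cells is just for illustration). It compares each point of A against all rows of B at once, so only one explicit pass over A is needed. I have not timed it on data of the size mentioned in the question.

```r
# Same toy values as above, as plain data frames
A <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                x  = c(1:5, 2:7),
                y  = c(3:7, 4:9))
B <- data.frame(ID = c(1, 2), xmin = c(1, 2), xmax = c(5, 7),
                ymin = c(3, 4), ymax = c(7, 9))

# For each row i of A, test the point against every cell of B at once
cells <- lapply(seq_len(nrow(A)), function(i) {
  inside <- A$x[i] >= B$xmin & A$x[i] <= B$xmax &
            A$y[i] >= B$ymin & A$y[i] <= B$ymax
  B$ID[inside]
})

# Store the result as a list column: one vector of matching IDs per row
A$newcol <- cells
print(A)
```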
I am trying to create an indicator variable, Z, in R, i.e. if I have some event A, I want Z to give a result of 1 if A is true and 0 if A is false.
I have tried this:
Z=0
if(A==(d>=5 && d<=10))
{
Z=1
}
else
{
Z=0
}
But this doesn't work. I was also thinking I could try to write a separate function called indicator:
indicator = function()
Any suggestions would be really helpful, thank you
You could easily write something like this
indicator<-function(condition) ifelse(condition,1,0)
ifelse can be used on vectors, but it works perfectly fine on single logical values (TRUE or FALSE).
Booleans FALSE / TRUE can be coerced to be 0 or 1 and then be multiplied:
Indicator<-function(x, min=0, max=Inf)
{
as.numeric(x >= min) * as.numeric(x <= max)
}
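A quick usage sketch for the range from the question, d between 5 and 10. The values of d are made up, and Z_alt is just an illustrative name for the equivalent one-liner.

```r
Indicator <- function(x, min = 0, max = Inf) {
  # each comparison gives TRUE/FALSE, coerced to 1/0; the product is 1 only if both hold
  as.numeric(x >= min) * as.numeric(x <= max)
}

d <- c(3, 5, 7.5, 10, 12)   # hypothetical values

Z <- Indicator(d, min = 5, max = 10)

# Equivalent: combine the conditions first, then coerce once
Z_alt <- as.numeric(d >= 5 & d <= 10)

print(Z)
```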
You can use
a <- data.frame(a = -5:5, b = 1:11)
indicator <- function(data) I(data > 0) + 1 - 1
indicator(a)
a can be vector, data frame...
And you can change the logical condition inside the I() function to whatever you are interested in.
There is no need to define A, just test the condition.
Also, remember that && and & have different uses in R (see R - boolean operators && and || for more details), so maybe that is part of your problem.
# d is a single value here; else must sit on the same line as the closing brace
if (d>=5 & d<=10) {
  Z <- 1
} else {
  Z <- 0
}
Or, as suggested in the other answer, use ifelse:
Z <- ifelse((d>=5 & d<=10), 1, 0)
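To make the difference concrete, a small sketch with made-up values of d: & compares element by element and suits ifelse(), while && expects a single TRUE or FALSE on each side and suits if().

```r
# Vector case: & works element-wise, ifelse() returns one value per element
d <- c(3, 7, 12)
in_range <- d >= 5 & d <= 10
Z <- ifelse(in_range, 1, 0)

# Scalar case: && takes single values, which is what if() needs
d1 <- 7
Z1 <- if (d1 >= 5 && d1 <= 10) 1 else 0

print(in_range)
print(Z)
print(Z1)
```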
I have two columns (df$Z and df$A)
I basically want to say: if df$Z is less than 5, then fill df$A with an NA and if not then leave df$A alone. I've tried these things but am not sure where I'm going wrong or what the error message means.
if(df$X<5){df$A <- NA}
Error:
In if (df$X < 5) { : the condition has length > 1 and only the first element will be used
I also tried to do something more like this.
for(i in dfX){
if(df$X<5){
df$A <- "NA"
}
}
No if statement needed. That's the magic of vectorization.
df$A[df$Z < 5] <- NA
A simple way is the "is.na<-" function:
is.na(df$A) <- df$Z < 5
The vectorized form of the if statement in R is the ifelse() function:
df$A <- ifelse( df$X < 5, NA, df$A )
However, in this case I would also go with @mark-heckmann's solution.
And please note that "NA" is not the same as NA: the first is a character string, the second is a missing value.
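A short sketch of that last point on made-up data (the data frame and the by_* names are hypothetical). Using the string "NA" turns the whole column into text, and is.na() no longer finds anything.

```r
# Hypothetical data
df <- data.frame(Z = c(2, 7, 4, 9), A = c(10, 20, 30, 40))

# Logical indexing with a real missing value
by_index <- df$A
by_index[df$Z < 5] <- NA

# ifelse() with a real missing value: the column stays numeric
by_ifelse <- ifelse(df$Z < 5, NA, df$A)

# ifelse() with the string "NA": everything is coerced to character
by_string <- ifelse(df$Z < 5, "NA", df$A)

print(by_index)
print(by_ifelse)
print(by_string)
print(sum(is.na(by_string)))
```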