I have two columns (df$Z and df$A)
I basically want to say: if df$Z is less than 5, then fill df$A with an NA and if not then leave df$A alone. I've tried these things but am not sure where I'm going wrong or what the error message means.
if(df$X<5){df$A <- NA}
Error:
In if (df$X < 5) { : the condition has length > 1 and only the first element will be used
I also tried to do something more like this.
for(i in df$X){
if(df$X<5){
df$A <- "NA"
}
}
No if statement needed. That's the magic of vectorization.
df$A[df$Z < 5] <- NA
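As a minimal sketch of what that line does, here it is on a small made-up data frame (the values of Z and A are invented purely for illustration):

```r
# Toy data frame; the values are made up for illustration
df <- data.frame(Z = c(2, 7, 4, 9), A = c(10, 20, 30, 40))

# df$Z < 5 is a logical vector: TRUE FALSE TRUE FALSE
# Only the elements of df$A where it is TRUE are overwritten
df$A[df$Z < 5] <- NA

print(df$A)  # NA 20 NA 40
```

The comparison is evaluated once for the whole column, which is why no loop or if statement is needed.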
A simple way is the "is.na<-" function:
is.na(df$A) <- df$Z < 5
The vectorized form of the if statement in R is the ifelse() function:
df$A <- ifelse( df$X < 5, NA, df$A )
However, in this case I would also go with @mark-heckmann's solution.
And please note that "NA" is not the same as NA.
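A short sketch of that difference, using two made-up vectors x and y (hypothetical names, not from the question): assigning the string "NA" silently turns a numeric vector into a character vector, and is.na() does not treat the string as missing.

```r
x <- c(1, 2, 3)
y <- x

x[2] <- NA     # a real missing value; x stays numeric
y[2] <- "NA"   # a two-character string; y is coerced to character

is.na(x)   # FALSE  TRUE FALSE
is.na(y)   # FALSE FALSE FALSE
class(x)   # "numeric"
class(y)   # "character"
```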
Consider below expression:
x$Y = ifelse(x$A<= 5 & abs(x$B) >= 2,
ifelse(x$B> 2 ,"YES","NO"),
'NA')
What I understand is that if A is <= 5 and B is >= 2, then all are YES, and if not, then NO, but I am confused about the second ifelse condition. Any help will be highly appreciated.
Thanks
This code defines a new column, Y, in the data set x. Y is populated as follows: where A <= 5 and abs(B) >= 2, Y is "YES" if B > 2 and "NO" otherwise; in every other row, Y is the string 'NA'.
If we rewrite your ifelse expression using expanded syntax, it might be easier to understand.
x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2, ifelse(x$B > 2, "YES", "NO"), 'NA')
# becomes, conceptually, for each row
if (x$A <= 5 & abs(x$B) >= 2) {
  if (x$B > 2) {
    x$Y <- "YES"
  } else {
    x$Y <- "NO"
  }
} else {
  x$Y <- 'NA'  # the string 'NA', as in the original, not a missing value
}
The nested ifelse() corresponds to the inner if above. It checks whether x$B is greater than 2. Because the earlier check already guarantees abs(x$B) >= 2, the only other possibilities are that x$B is exactly 2 or is less than or equal to -2. If x$B is greater than 2, x$Y is assigned "YES"; otherwise it is assigned "NO".
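To see the branches in action, here is a small sketch on made-up data (the values of A and B are invented so that every branch is hit, including the B == 2 edge case):

```r
# Toy data; each row exercises a different branch
x <- data.frame(A = c(1, 3, 5, 9, 2),
                B = c(3, -4, 2, 5, 1))

x$Y <- ifelse(x$A <= 5 & abs(x$B) >= 2,
              ifelse(x$B > 2, "YES", "NO"),
              'NA')

print(x$Y)  # "YES" "NO" "NO" "NA" "NA"
```

Rows 4 and 5 fail the outer condition (A > 5, and abs(B) < 2 respectively), so they get the string 'NA'. Row 3 passes the outer condition with B exactly 2, fails B > 2, and therefore gets "NO".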
I am quite new to this kind of function in R. What I am trying to do is to use the if statement over a vector.
Specifically, let's say we have a vector of characters:
id <- c('4450', '73635', '7462', '12')
What I'd like to do is to substitute those elements containing a specific number of characters with a particular term. Here what I tried so far:
for (i in 1:length(id)) {
if(nchar(i) > 3) {
id[i] <- 'good'
}
else id[i] <- 'bad'
}
However, the code doesn't work and I don't understand why. Also I'd like to ask you:
How can I use multiple conditions in this example? For instance, substitute those elements with nchar(i) > 6 with 'mild', those with nchar(i) < 2 with 'not bad', and so on.
In your for statement, i is the iterator, not the actual element of your vector.
I think your code would work if you replace:
if(nchar(i) > 3)
by
if(nchar(id[i]) > 3)
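A sketch of the corrected loop, for reference. Writing into a separate vector (here called result, a name chosen for this example) keeps the original id intact. The last line shows a vectorized equivalent that avoids the loop altogether.

```r
id <- c('4450', '73635', '7462', '12')

# nchar(i) measures the loop counter (1, 2, 3, 4), which is always 1 character long,
# so the original condition was never TRUE. nchar(id[i]) measures the element itself.
result <- character(length(id))
for (i in seq_along(id)) {
  if (nchar(id[i]) > 3) {
    result[i] <- 'good'
  } else {
    result[i] <- 'bad'
  }
}
print(result)  # "good" "good" "good" "bad"

# Vectorized equivalent, no loop needed
result_vec <- ifelse(nchar(id) > 3, 'good', 'bad')
```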
You could use dplyr::case_when to include multiple such conditions.
temp <- nchar(id)
id1 <- dplyr::case_when(temp > 6 ~ 'mild',
temp < 2 ~ 'not bad',
#default condition
TRUE ~ 'bad')
Or using nested ifelse
id1 <- ifelse(temp > 6, 'mild', ifelse(temp < 2, 'not bad', 'bad'))
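With the id vector from the question, every element has between 2 and 6 characters, so all of them fall through to the default. The sketch below uses a made-up vector (id2, a hypothetical name) that hits each branch of the nested ifelse():

```r
# Made-up ids chosen so that each condition matches at least once
id2 <- c('4450', '1234567', '7', '12')
n <- nchar(id2)   # 4 7 1 2

# Conditions are checked in order: > 6 first, then < 2, then the default
labels <- ifelse(n > 6, 'mild',
                 ifelse(n < 2, 'not bad', 'bad'))

print(labels)  # "bad" "mild" "not bad" "bad"
```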
I use ifelse to sanitise data in a real-world database that is subject to typing errors.
Let's say I want to sanitise a value of x which I know can't be above 100 in real-world situations, so I just want to turn everything above 100 into NA so that it is not included in the analysis.
So I would do:
df$x <- ifelse(df$x > 100, NA, df$x)
This turns all values above 100 into NA and keeps the others.
This feels quite cumbersome and makes the code unreadable when I use the real variable names which are quite long.
Is there any shorter way to do what I am trying to perform?
Thanks!
The simplest way I am aware of is with function is.na<-.
is.na(df$x) <- df$x > 100
Explanation.
Function is.na<- is a generic function defined in file
src/library/base/R/is.R as
`is.na<-` <- function(x, value) UseMethod("is.na<-")
One method is defined in the file, the default method.
`is.na<-.default` <- function(x, value)
{
    x[value] <- NA
    x
}
This is what S3's method dispatch mechanism calls in the answer's code line. An alternative way of calling it is the functional form.
`is.na<-`(df$x, df$x > 100)
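A small sketch on made-up values shows both forms. Note that the replacement form modifies df$x in place, while the functional form only returns a modified copy (df2 and cleaned are names chosen for this example):

```r
# Toy data; 150 and 101 are the "typing errors" above 100
df <- data.frame(x = c(5, 150, 99, 101, 100))
df2 <- df   # untouched copy for the functional form

# Replacement form: df$x is changed in place
is.na(df$x) <- df$x > 100
print(df$x)  # 5 NA 99 NA 100

# Functional form: returns a new vector, df2$x itself is unchanged
cleaned <- `is.na<-`(df2$x, df2$x > 100)
```

The value 100 is kept because the condition is strictly greater than 100.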
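Use data.table:
library(data.table)
setDT(df)  # convert df to a data.table by reference
df[x > 100, x := NA]  # set x to NA in the rows where x > 100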
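If the operation is to be applied to several columns:
# column.names is assumed to already hold the candidate column names;
# keep only those that actually exist in df
column.names <- names(df)[names(df) %in% column.names]
for (i.col in column.names) {
  set(df, which(df[[i.col]] > 100), i.col, NA)
}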
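If data.table is not available, the same multi-column idea can be sketched in base R. This is a different technique from the set() loop above, shown on a made-up data frame; the column names x, y and id and the vector cols are assumptions for this example.

```r
# Toy data; only the numeric columns x and y should be cleaned
df <- data.frame(x  = c(5, 150, 99),
                 y  = c(101, 20, 300),
                 id = c("a", "b", "c"))

cols <- c("x", "y")   # hypothetical list of columns to sanitise

# Apply the same replacement to each selected column
df[cols] <- lapply(df[cols], function(v) {
  v[which(v > 100)] <- NA   # which() drops any NA in the comparison
  v
})

print(df$x)  # 5 NA 99
print(df$y)  # NA 20 NA
```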
Try this; it should help.
df <- data.frame('X'=c(1,2,3,4,NA,100,101,102))
df$X <- as.numeric(df$X)
df$X <- ifelse((is.na(df$X) | df$X >100),NA,df$X)
You can use the column index instead of column names then.
col <- which(names(df) == 'x')
df[[col]] <- df[[col]] * c(1, NA)[(df[[col]] > 100) + 1]
Or
df[[col]] <- with(df, replace(df[[col]], df[[col]] > 100, NA))
So here you use the column name only once.
I have a dataframe column containing NAs, and I want to know how I can use apply (or lapply, sapply, ...) on that column.
I've tried apply and lapply, but they return errors.
The function I want to apply to the column is:
a.b <- function(x, y = 165){
if (x < y)
return('Good')
else if (x > y)
return('Bad')
}
the column of the dataframe is:
data$col = 180 170 NA NA 185 185
When I use apply I get:
apply(data$col, 2, a.b)
Error in apply(data$col, 2, a.b) :
dim(X) must have a positive length
I have tried dim(data$col) and it returns NULL, which I think is because of the NAs.
I also use lapply and I get:
lapply(data$col, a.b)
Error in if (x < y) return("Good") else if (x > y) return("Bad") :
missing value where TRUE/FALSE needed
This is for a course of R for beginners that I am doing so I am sorry if I made some mistakes. Thanks for taking your time to read it and trying to help.
apply is used on a matrix, not a vector. Try:
a.b <- function(x, y = 165){
  if (is.na(x)){
    return("NA")
  } else if (x < y){
    return('Good')
  } else if (x > y){
    return('Bad')
  }
}
data$col=sapply(data$col,a.b)
You should be able to solve this with mapply by specifying the values to pass into your parameters:
mapply(a.b, x = data[,'col'], y = 165)
Note that you may need to modify your a.b() function in order to handle the NAs.
There are a few issues going on here:
apply is meant to run on something with a dimension to act over, which is what the MARGIN argument refers to. A single column, which is what you're passing to apply, has no dimension. See below:
> dim(mtcars)
[1] 32 11
> dim(mtcars$cyl)
NULL
apply and lapply are meant to run over all columns (or rows, if you're using that margin for apply). If you want to replace just one column, you should not use apply. Do something like data$my_col <- my_func(data$my_col) if you want to replace my_col with the result of passing it to my_func.
NA values do not return TRUE or FALSE when using an operator on them. Note that 7 < NA will return NA. Your if statement is looking for a TRUE or FALSE value but getting an NA value, hence the error in your second attempt. If you want to handle NA values, you may need to incorporate that into your function with is.na.
Your function should be vectorized. See circle 3 of the R-Inferno. Currently, it will just return length-1 vectors of "Good" or "Bad". My hunch is that what you want is similar to the following (although not exactly the same if x == y):
a.b <- function(x, y = 165){
ifelse(x < y, "Good", "Bad")
}
I believe using the above info should get you where you want to be.
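Putting those points together, here is a sketch of the vectorized function applied to the values from the question. The data frame is rebuilt here from those values, and the column name label is made up for this example. ifelse() propagates NA on its own, so no explicit is.na() check is needed in this version.

```r
# A comparison with NA yields NA, which is what if() cannot handle
7 < NA   # NA

a.b <- function(x, y = 165) {
  # Note: x == y falls into the "Bad" branch in this version
  ifelse(x < y, "Good", "Bad")
}

data <- data.frame(col = c(180, 170, NA, NA, 185, 185))

# Call the function on the whole column; no apply/lapply required
data$label <- a.b(data$col)
print(data$label)  # "Bad" "Bad" NA NA "Bad" "Bad"
```

The missing entries stay as real NA values rather than the string "NA".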
I have just started to learn R and am trying to do the following task.
I have a vector of 10 random values, a few of which are NA and the rest numeric, like
a <- rnorm(100)
b <- rep(NA, 100)
c <- sample(c(a, b), 10)
Now I want to make another vector "d" which holds the indices of all the NA values in "c", for example
d <- c(2, 7, 9)
I tried
d <- which(c %in% is.na(c))
but it's not giving me the desired result.
Also, what is wrong with this code I tried for the same purpose?
navects <- function(x) {
for(i in 1:length(x)) {
if(is.na(x[i])) c(i)
}
}
You can try with which
which(is.na(c))
NOTE: c is also a function, so it is better not to name objects with c.
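A sketch with a fixed vector (named x here to avoid masking c(), with values made up so the result is reproducible) shows which(is.na()) alongside a repaired version of the loop. The original loop computed c(i) but never stored or returned it, so the function returned nothing useful. The first attempt, c %in% is.na(c), matches the values against TRUE/FALSE instead of testing for missingness.

```r
# Fixed example vector with NAs at positions 2, 7 and 9
x <- c(1.2, NA, 0.5, 3, -1, 4, NA, 2, NA, 7)

d <- which(is.na(x))
print(d)  # 2 7 9

# Loop version that actually accumulates and returns the indices
navects <- function(x) {
  idx <- integer(0)
  for (i in seq_along(x)) {
    if (is.na(x[i])) idx <- c(idx, i)
  }
  idx
}

navects(x)  # same result as which(is.na(x))
```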