How to deal with NA when using lapply in R

I have a data frame err consisting of 796 rows and 54432 columns.
I need to keep only the columns whose values all lie strictly between -20 and 20.
This is my approach:
do.call(cbind, (lapply(err, function(x) if((all(x<20) & all(x>-20))) return(x) )))
I have NA values in all of the columns, so I got:
Error in if ((all(x < 20) & all(x > -20))) return(x) :
missing value where TRUE/FALSE needed
I updated the command to use !is.na:
do.call(cbind, (lapply(err, function(x) if(!is.na(all(x<20) & all(x>-20))) return(x) )))
But in this case all the columns are returned and the filter does not work.
Any help?

Since I don't have an example data frame, check whether this works for you:
do.call("cbind", lapply(err, function(x) if (min(x, na.rm = TRUE) > -20 && max(x, na.rm = TRUE) < 20) return(x)))

Using apply:
err[apply(err, 2, function(x) min(x, na.rm = TRUE) > -20 & max(x, na.rm = TRUE) < 20)]
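To see why the original attempt fails and what na.rm changes, here is a minimal sketch on a small made-up data frame (the column names and values are hypothetical, not the asker's data). all() returns NA when no value is FALSE but at least one is NA, and that NA is what if() rejects:

```r
# Hypothetical stand-in for the asker's 'err' data frame
err <- data.frame(
  a = c(1, NA, -5),
  b = c(25, 3, NA),
  c = c(NA, -30, 2),
  d = c(0, 19, -19)
)

# all() propagates NA unless a FALSE is present
all(c(1, NA) < 20)   # NA
all(c(25, NA) < 20)  # FALSE

# One logical per column: TRUE when every non-missing value lies in (-20, 20)
in_range <- vapply(err, function(x) all(abs(x) < 20, na.rm = TRUE), logical(1))

# Keep the qualifying columns; drop = FALSE keeps a data frame even for one column
kept <- err[, in_range, drop = FALSE]
names(kept)  # "a" "d"
```

Note that a column consisting only of NA would pass this check, because all() of an empty set is TRUE.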

Related

Create indicator variables within a list

I have a list containing sequences of numbers. I want to create a list that indicates all non-zero elements up to the first element that matches a defined limit. I also want to create a list that indicates all non-zero elements after the first element to match the defined limit.
I prefer a base R solution. Presumably the solution will use lapply, but I have not been able to come up with a simple solution.
Below is a minimally reproducible example in which the limit is 2:
my.limit <- 2
my.samples <- list(0,c(1,2),0,c(0,1,1),0,0,0,0,0,c(1,1,2,2,3,4),c(0,1,2),0,c(0,0,1,1,2,2,3))
Here are the two desired lists:
within.limit <- list(0,c(1,1),0,c(0,1,1),0,0,0,0,0,c(1,1,1,0,0,0),c(0,1,1),0,c(0,0,1,1,1,0,0))
outside.limit <- list(0,c(0,0),0,c(0,0,0),0,0,0,0,0,c(0,0,0,1,1,1),c(0,0,0),0,c(0,0,0,0,0,1,1))
We can use match with the nomatch argument set to a very large number (it should be greater than the length of any vector in the list; Inf does not work here because nomatch is coerced to integer, which turns Inf into NA).
within.limit1 <- lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= match(my.limit, x, nomatch = 1000)))
outside.limit1 <- lapply(my.samples, function(x)
+(seq_along(x) > match(my.limit, x, nomatch = 1000)))
Checking that the output matches the desired lists:
all(mapply(function(x, y) all(x == y), within.limit, within.limit1))
#[1] TRUE
all(mapply(function(x, y) all(x == y), outside.limit, outside.limit1))
#[1] TRUE
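As a variation on the match approach, the magic number can be avoided by using one past the length of each vector as the nomatch value. This is only a sketch: first_hit is a hypothetical helper name, and the sample list is a shortened copy of the one in the question.

```r
my.limit <- 2
my.samples <- list(0, c(1, 2), 0, c(0, 1, 1), c(1, 1, 2, 2, 3, 4))

# Position of the first element equal to the limit,
# or length(x) + 1 when the limit never occurs (hypothetical helper)
first_hit <- function(x, limit) match(limit, x, nomatch = length(x) + 1L)

# Unary + turns the logical vector into 0/1 integers
within.limit2 <- lapply(my.samples, function(x)
  +(x > 0 & seq_along(x) <= first_hit(x, my.limit)))
outside.limit2 <- lapply(my.samples, function(x)
  +(seq_along(x) > first_hit(x, my.limit)))

within.limit2[[5]]   # 1 1 1 0 0 0
outside.limit2[[5]]  # 0 0 0 1 1 1
```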
I would do:
within.limit <- lapply(my.samples, function(x)
+(x!=0 & (x<my.limit | cumsum(x == my.limit)==1)))
outside.limit <- lapply(my.samples, function(x)
+(x!=0 & (x>my.limit | cumsum(x == my.limit)>1)))
Another option is a helper function:
foo <- function(samples, limit, within = TRUE) {
  # pick the comparison operator depending on which list is wanted
  `%cp%` <- if (within) `<=` else `>`
  lapply(samples, function(x) pmin(x, seq_along(x) %cp% match(limit, x, nomatch = 1e8)))
}
all.equal(foo(my.samples, my.limit, FALSE), outside.limit)
# [1] TRUE
all.equal(foo(my.samples, my.limit, TRUE), within.limit)
# [1] TRUE
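We can also use findInterval, provided each vector is sorted in non-decreasing order (as in the example data). With left.open = TRUE it counts the elements strictly below the limit, so adding 1 gives the position of the first element that reaches the limit:
lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= findInterval(my.limit, x, left.open = TRUE) + 1))
and
lapply(my.samples, function(x) +(seq_along(x) > findInterval(my.limit, x, left.open = TRUE) + 1))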

I can't get rid of the NA no matter what I try - missing value where TRUE/FALSE needed

I was trying to optimize my loop but came across an issue for which I haven't found a direct solution here. I already checked other threads, such as "Error in if/while (condition) {: missing Value where TRUE/FALSE needed", but they didn't help me solve my problem; I still have the same issue.
This is my code:
output <- character (nrow(df)) # predefine the length and type of the vector
condition <- (df$price < df$high & df$price > df$low) # condition check outside the loop
system.time({
for (i in 1:nrow(df)) {
if (condition[i]) {
output[i] <- "1"
}else if (!condition[i]){
output[i] <- "0"
}else {
output[i] <- NA
}
}
df$output <- output
})
I am basically checking whether my price is within a certain range. If it is inside the range I assign it a 1, and if it is outside the range I assign it a 0. However, I have a couple of NA values, and my loop stops the moment it reaches an NA.
Below you can see the working code if I filter out the NAs. But I would like a way to handle the NAs as well.
df<- df%>% filter(!is.na(price))
output <- character (nrow(df)) # predefine the length and type of the vector
condition <- (df$price < df$high & df$price > df$low) # condition check outside the loop
system.time({
for (i in 1:nrow(df)) {
if (condition[i]) {
output[i] <- "1"
}else {
output[i] <- "0"
}
}
df$output <- output
})
Any idea how I could handle the NAs?
if/else in R cannot handle an NA condition. You could try this: first check whether the condition is NA, and only then check whether it is TRUE or FALSE.
output <- character (nrow(df)) # predefine the length and type of the vector
condition <- (df$price < df$high & df$price > df$low) # condition check outside the loop
system.time({
for (i in 1:nrow(df)) {
if(is.na(condition[i])){
output[i] <- NA
}else if (condition[i]) {
output[i] <- "1"
}else{
output[i] <- "0"
}
}
df$output <- output
})
I think you can do :
df$output <- as.integer(df$price < df$high & df$price > df$low)
which would handle all the cases.
For example,
df <- data.frame(price = c(10, 23, NA, 50), high = 25, low = 5)
df$output <- as.integer(df$price < df$high & df$price > df$low)
df
# price high low output
#1 10 25 5 1
#2 23 25 5 1
#3 NA 25 5 NA
#4 50 25 5 0
We can also do
df$output <- +(df$price < df$high & df$price > df$low)
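If the character values "1" and "0" from the original loop are wanted rather than integers, a vectorised ifelse is one possible sketch. It reuses the example data frame from the answer above; cond is just an illustrative variable name. ifelse returns NA wherever the test itself is NA, so no explicit NA branch is needed:

```r
df <- data.frame(price = c(10, 23, NA, 50), high = 25, low = 5)

# The comparison is NA wherever price is NA
cond <- df$price < df$high & df$price > df$low

# "1" inside the range, "0" outside, NA where the condition is NA
df$output <- ifelse(cond, "1", "0")
df$output  # "1" "1" NA "0"
```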

Returning absent values without inducing integer(0)

I want to identify which values in one vector are present in another vector. Sometimes, in my application, none of the values of the first vector are present; in such cases I would like NA. My current approach returns integer(0) when this occurs:
l <- 1:3
m <- 2:5
n <- 4:6
l[l %in% m]
[1] 2 3
l[l %in% n]
integer(0)
This post discusses how to capture integer(0) using length, but is there a way to avoid integer(0) in the first place, and do this operation in just one step? Answers to the previous question suggest that any could be used but I fail to see how that would work in this example.
You could catch the integer(0) with a custom function:
l <- 1:3
m <- 2:5
n <- 4:6
returnsafe <- function(a, b) {
result <- a[a %in% b]
if(is.integer(result) && length(result) == 0L) {
return(NA)
} else {
return(result)
}
}
> returnsafe(l, n)
[1] NA
You can do:
l[match(l, n)]
[1] NA NA NA
Or:
any(l[match(l, n)])
[1] NA
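For the any() idea mentioned in the question, one possible sketch is to test whether any element matches before subsetting. in_or_na is a hypothetical helper name, not an existing function:

```r
l <- 1:3
m <- 2:5
n <- 4:6

# Return the matching values of a, or NA when nothing in a occurs in b
in_or_na <- function(a, b) {
  hits <- a %in% b
  if (any(hits)) a[hits] else NA
}

in_or_na(l, m)  # 2 3
in_or_na(l, n)  # NA
```

Unlike l[match(l, n)], this returns only the matching values of l when there are any, and a single NA otherwise.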

Repeated conditional change with sapply or a loop in R

I am trying to do a conditional change for a list of 11 columns in R. My condition is always the same: survey$only0 == 1. I wrote the following code:
survey$w.house[survey$only0 == 1] <- 1
survey$w.inc[survey$only0 == 1] <- 1
survey$w.jobs[survey$only0 == 1] <- 1
survey$w.com[survey$only0 == 1] <- 1
survey$w.edu[survey$only0 == 1] <- 1
survey$w.env[survey$only0 == 1] <- 1
survey$w.health[survey$only0 == 1] <- 1
survey$w.satisf[survey$only0 == 1] <- 1
survey$w.safe[survey$only0 == 1] <- 1
survey$w.bal[survey$only0 == 1] <- 1
survey$w.civic[survey$only0 == 1] <- 1
My code works well, but I would like to shorten it using a loop or a function such as sapply or lapply. Does anyone know how to do it?
Thank you for your help !
David
We can do this easily with lapply by looping over the columns of interest ('nm1') and replacing their values with 1 where 'only0' is 1.
survey[nm1] <- lapply(survey[nm1], function(x) replace(x, survey$only0==1, 1))
Or, as @Vlo mentioned, the anonymous function call is not needed:
survey[nm1] <- lapply(survey[nm1], replace, list = survey$only0==1, values=1)
where
nm1 <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
You can try,
survey[survey$only0 == 1, cols] <- 1
where cols are the columns for which you want to check the condition.
cols <- c("w.house", "w.inc", "w.jobs", "w.com", "w.edu", "w.env",
"w.health", "w.satisf", "w.safe", "w.bal", "w.civic")
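A small worked sketch of the row/column assignment, on a made-up two-column survey (the values are hypothetical). It assumes only0 may contain NA, which the question does not state; wrapping the condition in which() drops those rows so they are left untouched:

```r
# Hypothetical miniature of the 'survey' data frame
survey <- data.frame(
  only0   = c(1, 0, NA, 1),
  w.house = c(5, 6, 7, 8),
  w.inc   = c(2, 3, 4, 5)
)
cols <- c("w.house", "w.inc")

# which() keeps only the positions where the condition is TRUE (NA is dropped)
rows <- which(survey$only0 == 1)
survey[rows, cols] <- 1

survey$w.house  # 1 6 7 1
survey$w.inc    # 1 3 4 1
```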

How to properly avoid if-expressions by using vector indices?

x is a vector of integers ranging between 1 and 100.
I created a function that determines which category a number falls in:
x ∈ [1, 20]: small
x ∈ [21, 50]: med
x ∈ [51, 100]: large
Here is the function:
x <- c(1:99)
vector.fun<-function(x){
x[x >= 1 & x <=20] <-"small"
x[x >= 21 & x <=50] <-"med"
x[x >=51 & x <=99] <-"large"
return(x)
}
vector.fun(89)
However, as you can see, in the function above my vector is 1:99 instead of 1:100. For some reason, when I change it to:
x <- c(1:100)
vector.fun<-function(x){
x[x >= 1 & x <=20] <-"small"
x[x >= 21 & x <=50] <-"med"
x[x >=51 & x <=100] <-"large"
return(x)
}
vector.fun(100)
it doesn't recognise any number in the last line, x[x >=51 & x <=100] <-"large", and when it does, it returns "med" instead of "large" as it should.
What am I doing wrong? What should I change in my function so that 100 is included and returns "large"?
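It is indeed a coercion problem, as mentioned in the comments. The first assignment turns x into a character vector, so the later comparisons are string comparisons, and as strings "51" <= "100" is FALSE.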
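The coercion can be observed directly with a short sketch (the three input values are arbitrary examples):

```r
x <- c(5, 60, 100)

# Assigning a string coerces the whole vector to character
x[x >= 1 & x <= 20] <- "small"
class(x)  # "character"
x         # "small" "60" "100"

# From here on, comparisons against numbers are done on strings,
# character by character, so "60" sorts after "100"
"60" <= "100"  # FALSE
x >= 51 & x <= 100
```

This is why the 1:99 version appeared to work: as strings, "51" up to "99" do compare as less than or equal to "99".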
If you want to keep your function structure the way you created it, you can alter it as follows:
vector.fun<-function(y){
x <- y
x[y >= 1 & y <=20] <-"small"
x[y >= 21 & y <=50] <-"med"
x[y >=51 & y <=100] <-"large"
return(x)
}
Although the solution suggested by @alexis_laz is more concise and elegant:
vector.fun<-function(x){
cut(x, c(0,20,50,100), labels = c("small", "med", "large"))
}
Keep in mind, this second version will produce a factor type vector, while the first version will produce a character type vector.
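If a character vector is preferred, the cut result can be wrapped in as.character. A minimal sketch, where vector.fun.chr is a hypothetical name for the wrapped version:

```r
# cut() uses right-closed intervals by default: (0,20], (20,50], (50,100]
vector.fun.chr <- function(x) {
  as.character(cut(x, c(0, 20, 50, 100), labels = c("small", "med", "large")))
}

vector.fun.chr(c(1, 20, 21, 50, 51, 100))
# "small" "small" "med" "med" "large" "large"
```

Values outside (0, 100] come back as NA with this approach.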
