Looking for a better way: How can I make R check the values of a flexible subset of multiple columns element-wise (let's say Var2 and Var3 here) and write the result of the check to a new logical column?
Is there a shorter, more elegant way than using row-wise apply() here?
df <- read.csv(
text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)
criticalColumns <- c("Var2", "Var3")
df$criticalColumnsAreEmpty <-
apply(df[, criticalColumns], 1, function(curRow) {
return(all(curRow == ""))
})
I could also do this explicitly, but then it is not flexible:
df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""
Desired output:
   Var1 Var2 Var3 criticalColumnsAreEmpty
1                                    TRUE
2               a                   FALSE
3          a                        FALSE
4     a    a    a                   FALSE
5     a         a                   FALSE
6          a                        FALSE
7                                    TRUE
8               a                   FALSE
9          a                        FALSE
10              a                   FALSE
We can use rowSums on the logical matrix
df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
Another option, which avoids converting to a matrix and so saves memory on big datasets, is to loop over the columns, check whether each element is blank, and combine the results with Reduce and &:
Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
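As a rough sketch of how the two approaches compare, the snippet below wraps the Reduce version in a small helper and checks it against the rowSums version on a cut-down copy of the sample data. The helper name all_blank and the five-row data frame are made up for illustration; they are not part of the original post.

```r
# Hypothetical helper (name chosen for this sketch): TRUE for every row in
# which all columns listed in `cols` hold an empty string.
all_blank <- function(data, cols) {
  Reduce(`&`, lapply(data[cols], function(x) !nzchar(as.character(x))))
}

# Cut-down illustration data, not the full data frame from the question
df <- data.frame(
  Var1 = c("", "", "", "a", "a"),
  Var2 = c("", "", "a", "a", ""),
  Var3 = c("", "a", "", "a", "a"),
  stringsAsFactors = FALSE
)

criticalColumns <- c("Var2", "Var3")

# Column-wise version: no intermediate matrix is built
res_reduce <- all_blank(df, criticalColumns)

# Matrix version: a row qualifies when it has zero non-blank cells
res_rowsums <- rowSums(df[criticalColumns] != "") == 0

print(res_reduce)
print(unname(res_rowsums))
```

Because the column names are passed as a character vector, checking a different subset only means changing criticalColumns.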
I am trying to change the logical values (elements) of my list based on another list. Basically, where both lists are "TRUE", I want to change the value in the main list to "FALSE". Both lists are lengths of 5. For example
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <-list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
List B has sequences as name attributes.
Desired output:
Output <-
list(c(TRUE,FALSE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,FALSE,FALSE))
In other words, elements in List_A stay the same unless the matching elements in both lists are TRUE, in which case they are replaced with FALSE.
I've tried running the for loop below but it doesn't work and I don't know how I would redirect the output, if it did.
for(i in 1:length(List_A)) { List_A[[i]][List_B[[i]]] <- FALSE }
You can use the Map function.
Where both values are TRUE, set the element to FALSE; otherwise keep the value from List_A.
Output <- Map(function(x, y) replace(x, x & y, FALSE), List_A, List_B)
Output
#[[1]]
#[1] TRUE FALSE TRUE
#[[2]]
#[1] FALSE FALSE TRUE
#[[3]]
#[1] FALSE FALSE FALSE
#[[4]]
#[1] TRUE TRUE FALSE
#[[5]]
#[1] TRUE FALSE FALSE
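Setting an element to FALSE wherever both lists are TRUE is the same as keeping an element of List_A only where List_B is FALSE, i.e. x & !y. A small sketch, assuming plain logical vectors without NA and using shortened two-element lists for illustration, checks that the two formulations agree:

```r
# Shortened illustration lists (first two elements of the question's data)
List_A <- list(c(TRUE, FALSE, TRUE), c(FALSE, TRUE, TRUE))
List_B <- list(c(FALSE, FALSE, FALSE), c(TRUE, TRUE, FALSE))

# Formulation from the answer: overwrite positions where both are TRUE
via_replace <- Map(function(x, y) replace(x, x & y, FALSE), List_A, List_B)

# Equivalent logical expression: keep x only where y is FALSE
via_logic <- Map(function(x, y) x & !y, List_A, List_B)

print(identical(via_replace, via_logic))
```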
We can use map2 from purrr; .x & !.y keeps a value from List_A only where List_B is FALSE
library(purrr)
map2(List_A, List_B, ~ .x & !.y)
I often run into a situation where I have two vectors and I want to check whether each element of vector 1 is in vector 2. I typically do it with sapply(), but wanted to know if there is a more concise way to do it or a single built-in function for this. For example:
v1 = c(1,1,3,4,5,7)
v2 = c(1,5)
# desired output: [1] TRUE TRUE FALSE FALSE TRUE FALSE
# my solution
sapply(v1, function(x) x %in% v2)
We can just use %in% as it is vectorized
v1 %in% v2
#[1] TRUE TRUE FALSE FALSE TRUE FALSE
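For reference, %in% is a thin wrapper around match(), which is why it handles the whole vector at once. A quick sketch, using the vectors from the question, showing that the sapply() loop, %in% and the equivalent match() call all agree:

```r
v1 <- c(1, 1, 3, 4, 5, 7)
v2 <- c(1, 5)

# Element-by-element loop from the question
by_loop <- sapply(v1, function(x) x %in% v2)

# Vectorized membership test
by_in <- v1 %in% v2

# What %in% does internally: positions of matches, 0 meaning "not found"
by_match <- match(v1, v2, nomatch = 0L) > 0L

print(by_in)
```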
1) I need to intersect two vectors and return a vector of the same size containing the intersected values.
intersect() does not return a vector with the same size.
2) Also, why does this return c(TRUE, TRUE, TRUE) and not c(FALSE, TRUE, TRUE)?
set1 = c(TRUE,FALSE,TRUE)
set2 = c(FALSE,FALSE,TRUE)
testset = set1 %in% set2
> print(testset)
[1] TRUE TRUE TRUE
I got as result TRUE TRUE TRUE and I need FALSE FALSE TRUE.
To do the intersection, you need to use the & operator, as follows:
testset = set1 & set2
This will give you the following result: FALSE FALSE TRUE
Hope it helps.
A %in% B checks, for every element in A, whether that element occurs anywhere in B. The result always has the same length as A. Here set2 contains both TRUE and FALSE, so every element of set1 is found in it. Try e.g.
1:3 %in% 1:9
1:9 %in% 1:3
I think what you want is this:
set1 == set2
[1] FALSE TRUE TRUE
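To see the three operators side by side, here is a short sketch using the vectors from the question (the variable names on the left are just for illustration):

```r
set1 <- c(TRUE, FALSE, TRUE)
set2 <- c(FALSE, FALSE, TRUE)

# Membership: is each element of set1 found anywhere in set2?
in_result <- set1 %in% set2

# Element-wise AND: TRUE only where both vectors are TRUE at that position
and_result <- set1 & set2

# Element-wise equality: TRUE where both vectors hold the same value
eq_result <- set1 == set2

print(in_result)
print(and_result)
print(eq_result)
```

Which one is wanted depends on whether "intersect" means "both TRUE at the same position" (use &) or "same value at the same position" (use ==).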
I'm trying to test a column of my dataset for dynamically changing given values. The values come from a previous calculation and change all the time, such that the ifelse command cannot be used.
I tried it with a for-loop since it needs to be flexible but it was not working. An example of my problem is below:
require(dplyr)
data <- data.frame(step=c(1,1,1,1,3,3,3,3,4,4,5,6,7,7,7,7,4,4,4,4,6,5,7,7,3,4,3,1))
data <- mutate(data, col2 = 0)
data <- mutate(data, col3 = 0)
data_check <- data.frame(step=c(3,4))
for(j in 1:length(data_check)){
for(i in 1:nrow(data)){
if(data$step[i] == data_check[j]){
data <- mutate(data, Occurrence = 1)
} else {
data <- mutate(data, Occurrence = 0)
}
}
}
The goal is to get an additional column 'Occurrence' in the dataset, which tells if any of the given values occur or not.
I'm not sure I understand what you're trying to do, but if you want to test whether each entry in data$step is present in data_check, then something like:
data_check <- c(3, 4) # a plain vector is all the %in% operator needs
data$Occurrence <- data$step %in% data_check
data$Occurrence
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[25] TRUE TRUE TRUE FALSE
EDIT: and as Eumenedies said, wrap the result in as.numeric() if you want 1/0 instead of TRUE/FALSE.
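Putting it together, a minimal sketch that keeps data_check as a data frame, as in the question, and stores the result as 1/0. The shortened step column is only for illustration.

```r
# Shortened illustration data, not the full column from the question
data <- data.frame(step = c(1, 3, 4, 5, 3))
data_check <- data.frame(step = c(3, 4))

# Compare against the column of data_check, then convert TRUE/FALSE to 1/0
data$Occurrence <- as.numeric(data$step %in% data_check$step)

print(data)
```

Since data_check$step is looked up each time the line runs, the check follows whatever values the previous calculation put there.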
I have the following data frame:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
unique(df$Logical)
length(df$Logical == TRUE)
I'm trying to figure out how many TRUE values I have in my df$Logical column. But it seems I'm missing something, and length(df$Logical == TRUE) returns the number of records in my logical column.
What am I doing wrong in this particular case? The desired result is 2 for the TRUE values in the df$Logical column. Many thanks in advance.
We need to specify the string in lower case, as the values are 'true'/'false' and not the logical constants TRUE/FALSE. Also, instead of length, sum should be used: sum counts the TRUE elements of a logical vector.
sum(df$Logical == "true")
#[1] NA
Because the column contains an NA, the comparison yields NA for that element and so does the sum; use na.rm = TRUE to drop it
sum(df$Logical == "true", na.rm = TRUE)
#[1] 2
The length of the comparison result is always the same as the length of the column, i.e. the number of rows of the dataset,
length(df$Logical == "true")
#[1] 6
because the comparison returns a logical vector of length 6.
df$Logical == "true"
#[1] TRUE NA FALSE TRUE FALSE FALSE
To get the counts of both true and false, we can use table
table(df$Logical)
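By default table() drops NA values. If the missing entries should be counted as well, the useNA argument can be set. A small sketch, with the column from the question written as a plain vector for illustration:

```r
# The Logical column from the question as a plain character vector
logical_col <- c("true", NA, "false", "true", "", "false")

# Default: NA is left out of the counts
counts_default <- table(logical_col)

# useNA = "ifany" adds a category for NA when at least one is present
counts_with_na <- table(logical_col, useNA = "ifany")

print(counts_default)
print(counts_with_na)
```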
First of all, "true" and "false" as you put them into your data frame are not Booleans but simple strings.
Moreover, length(df$Logical == TRUE) will always return 6 in this example, i.e. the number of elements in the column. This is because df$Logical == TRUE returns a sequence of TRUE or FALSE. In your case it will return
FALSE NA FALSE FALSE FALSE FALSE
because the boolean expression is never true. However, the length of this will be 6 as returned by length().
To overcome the problem you might define your data frame like this
df <- data.frame("Logical"=c(TRUE,NA,FALSE,TRUE,NA,FALSE),
"Numeric"=c(1,2,3,4,5,6))
And then you can sum up the number of TRUE
sum(df$Logical == TRUE, na.rm = TRUE)
[1] 2
na.rm = TRUE is important here because otherwise the sum will return NA if one or more elements are NA.
Alternatively, you can work with strings to indicate true or false (and treat empty strings as NA).
Then you could write
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
sum(df$Logical == "true", na.rm = TRUE)
[1] 2
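If the strings are known to be only "true", "false", empty or NA, another possibility is to convert them with as.logical(), which turns "true"/"false" into TRUE/FALSE and anything it does not recognise, including "", into NA. A sketch under that assumption (n_true is just an illustrative name):

```r
df <- data.frame(Logical = c("true", NA, "false", "true", "", "false"),
                 Numeric = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# Convert the strings to real logical values; "" and NA both become NA
df$Logical <- as.logical(df$Logical)

# Count the TRUE values, ignoring the NA entries
n_true <- sum(df$Logical, na.rm = TRUE)

print(df$Logical)
print(n_true)
```

After the conversion the column behaves like the logical data frame shown earlier, so sum() can be applied to it directly.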