I am writing a function that takes a data frame and a number in the range 1 to 5.
Let's say we want that number to be 3 in this case:
how_much = 100
A <- sample(how_much, replace = TRUE, x = 1:5)
B <- sample(how_much, replace = TRUE, x = 1:5)
VennData <- data.frame(A, B)
The function should then return a table like the one below:
count A B
24 TRUE TRUE
20 TRUE FALSE
13 FALSE TRUE
43 FALSE FALSE
Here we can see that there are 24 observations where both A and B are equal to 3,
20 observations where A is equal to 3 and B is not,
13 observations where A is not equal to 3 and B is, and so on.
With set.seed(43)
library(dplyr)
VennData %>%
mutate(A = (A == 3),
B = (B == 3)) %>%
count(A, B)
## A tibble: 4 x 3
# A B n
# <lgl> <lgl> <int>
#1 FALSE FALSE 64
#2 FALSE TRUE 20
#3 TRUE FALSE 13
#4 TRUE TRUE 3
In base R,
aggregate(Count ~ ., transform(VennData, A = A == 3, B = B == 3, Count = 1), sum)
# A B Count
#1 FALSE FALSE 64
#2 TRUE FALSE 13
#3 FALSE TRUE 20
#4 TRUE TRUE 3
An option with data.table
library(data.table)
set.seed(43)
setDT(VennData)[, .N, .(A = A == 3, B = B == 3)]
# A B N
#1: FALSE FALSE 64
#2: FALSE TRUE 20
#3: TRUE TRUE 3
#4: TRUE FALSE 13
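Since the question asks for a function, any of the approaches above can be wrapped in one. Here is a minimal base R sketch, assuming the columns are named A and B as in the example. The names count_matches and toy are made up for illustration and are not from the original post.

```r
# Hypothetical helper: tabulate rows by whether A and B equal `value`.
count_matches <- function(data, value) {
  # Fixing the factor levels keeps all four combinations in the result,
  # even those that occur zero times.
  a <- factor(data$A == value, levels = c(TRUE, FALSE))
  b <- factor(data$B == value, levels = c(TRUE, FALSE))
  as.data.frame(table(A = a, B = b), responseName = "count")
}

# Small hand-made input so the counts are easy to check by eye.
toy <- data.frame(A = c(3, 3, 1, 2, 3), B = c(3, 1, 3, 2, 3))
res <- count_matches(toy, 3)
print(res)
```

Note that A and B come back as factors here; wrap them in as.logical() if logical columns are needed.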
Related
I have a dataframe of two columns of T and F.
I want to know
which row is T in the first and F in the second
which row is F in the first and T in the second
which row is F in both
I have very few clues on the matter. Can someone shed some light?
You can use case_when():
library(dplyr)
df = data.frame(x = c("T","T","F","F","F"), y = c("T","F","T","F","X"))
df %>%
mutate(condition = case_when(
x == "T" & y == "T" ~ "Both are T",
x == "T" & y == "F" ~ "First is T, second is F",
x == "F" & y == "F" ~ "Both are F",
x == "F" & y == "T" ~ "First is F, second is T",
TRUE ~ "Something else"
))
#> x y condition
#> 1 T T Both are T
#> 2 T F First is T, second is F
#> 3 F T First is F, second is T
#> 4 F F Both are F
#> 5 F X Something else
Created on 2021-08-05 by the reprex package (v2.0.0)
Here is one possible way to solve your problem:
library(dplyr)
df <- data.frame(a = rep(c(T, F, T, F), each=2),
b = rep(c(T, T, F, F), each=2))
# a b
# 1 TRUE TRUE
# 2 TRUE TRUE
# 3 FALSE TRUE
# 4 FALSE TRUE
# 5 TRUE FALSE
# 6 TRUE FALSE
# 7 FALSE FALSE
# 8 FALSE FALSE
df %>%
mutate(newcol = case_when(a & !b ~ "first=T second=F",
!a & b ~ "first=F second=T",
!a & !b ~ "both=F",
TRUE ~ "other"))
# a b newcol
# 1 TRUE TRUE other
# 2 TRUE TRUE other
# 3 FALSE TRUE first=F second=T
# 4 FALSE TRUE first=F second=T
# 5 TRUE FALSE first=T second=F
# 6 TRUE FALSE first=T second=F
# 7 FALSE FALSE both=F
# 8 FALSE FALSE both=F
You can treat the columns [a, b] as the bits of a 2-bit binary number: a*2 + b converts each row from binary to decimal (0 to 3), so a*2 + b + 1 maps the four combinations to the indices 1, 2, 3, 4.
Try the base R code below
transform(
df,
newcol = c("both=F", "first=F,second=T", "first=T,second=F", "other")[a * 2 + b + 1]
)
which gives
a b newcol
1 TRUE TRUE other
2 TRUE TRUE other
3 FALSE TRUE first=F,second=T
4 FALSE TRUE first=F,second=T
5 TRUE FALSE first=T,second=F
6 TRUE FALSE first=T,second=F
7 FALSE FALSE both=F
8 FALSE FALSE both=F
Data
df <- data.frame(a = rep(c(T, F, T, F), each=2),
b = rep(c(T, T, F, F), each=2))
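The index arithmetic from this answer can be checked in isolation. This is only a sketch: the vectors a, b, idx and labels below are illustrative, with one entry per possible combination.

```r
# One entry per combination of the two logical columns.
a <- c(FALSE, FALSE, TRUE, TRUE)
b <- c(FALSE, TRUE, FALSE, TRUE)

# Logicals are coerced to 0/1 in arithmetic, so a * 2 + b is the
# decimal value of the 2-bit number "ab"; adding 1 gives a valid R index.
idx <- a * 2 + b + 1

labels <- c("both=F", "first=F,second=T", "first=T,second=F", "other")
print(data.frame(a, b, idx, label = labels[idx]))
```

The same idea extends to three columns with a * 4 + b * 2 + c + 1 and a vector of eight labels.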
Let's say I have a data frame:
data <- data.frame(w = c(1, 2, 3, 4), x = c(F, F, F, F), y = c(T, T, F, T),
z = c(T, F, F, T), z1 = c(12, 4, 5, 15))
data
#> w x y z z1
#> 1 1 FALSE TRUE TRUE 12
#> 2 2 FALSE TRUE FALSE 4
#> 3 3 FALSE FALSE FALSE 5
#> 4 4 FALSE TRUE TRUE 15
Question
How do I filter the rows in which all boolean variables are FALSE? In this case, row 3.
Or in other words, I would like to get a data frame that has at least one TRUE value per row.
Expected output
#> w x y z z1
#> 1 1 FALSE TRUE TRUE 12
#> 2 2 FALSE TRUE FALSE 4
#> 3 4 FALSE TRUE TRUE 15
Attempt
library(tidyverse)
data %>% filter(x == T | y == T | z == T)
#> w x y z z1
#> 1 1 FALSE TRUE TRUE 12
#> 2 2 FALSE TRUE FALSE 4
#> 3 4 FALSE TRUE TRUE 15
Above is a working option, but not scalable at all. Is there a more convenient option using the dplyr's filter() function?
rowSums() is a good option - TRUE is 1, FALSE is 0.
cols = c("x", "y", "z")
## all FALSE
df[rowSums(df[cols]) == 0, ]
## at least 1 TRUE
df[rowSums(df[cols]) >= 1, ]
## etc.
With dplyr, I would use the same idea like this:
df %>%
filter(
rowSums(select(., all_of(cols))) >= 1
)
With dplyr's filter(),
library(dplyr)
filter(data, (x + y + z) > 0 )
w x y z z1
1 1 FALSE TRUE TRUE 12
2 2 FALSE TRUE FALSE 4
3 4 FALSE TRUE TRUE 15
# after @Gregor Thomas's suggestion on using TRUE or FALSE
df[!(apply(!df[, c('x', 'y', 'z')], 1, all)), ]
# without rowSums
df[!(apply(df[, c('x', 'y', 'z')] == FALSE, 1, all)), ]
# with rowSums
df[rowSums(df[, c('x', 'y', 'z')] == FALSE) != 3, ]
# w x y z z1
#1 1 FALSE TRUE TRUE 12
#2 2 FALSE TRUE FALSE 4
#4 4 FALSE TRUE TRUE 15
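To avoid typing the column names at all, the logical columns can be detected automatically. This is a base R sketch built on the rowSums idea above, using the data frame from the question; the names is_lgl and keep are illustrative.

```r
data <- data.frame(w = c(1, 2, 3, 4), x = c(FALSE, FALSE, FALSE, FALSE),
                   y = c(TRUE, TRUE, FALSE, TRUE),
                   z = c(TRUE, FALSE, FALSE, TRUE), z1 = c(12, 4, 5, 15))

# Flag which columns are logical, whatever they are called.
is_lgl <- vapply(data, is.logical, logical(1))

# TRUE counts as 1, so a row sum of at least 1 means "at least one TRUE".
keep <- rowSums(data[is_lgl]) >= 1

result <- data[keep, ]
print(result)
```

This scales to any number of logical columns, but it assumes every logical column should take part in the test.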
I am trying to create three new columns with values depending on a particular order of three logical type columns.
e.g. I have this:
a b c
1 TRUE TRUE TRUE
2 TRUE FALSE TRUE
3 TRUE FALSE TRUE
Going across each row, if the values are TRUE, TRUE, TRUE as in row 1, then the three new columns should get the values 1,1,1, but if the order is TRUE, FALSE, TRUE as in rows 2 and 3, then the values would be 2,3,3. Note that TRUE does not equal 1; rather, each output value is one I define depending on all three logical values (8 possible combinations in total, each defined by three separate numbers). So I get something like this:
a b c d e f
1 TRUE TRUE TRUE 5 5 2
2 TRUE FALSE TRUE 2 3 3
3 TRUE FALSE TRUE 2 3 3
If someone could point me in the right direction to do this as efficiently as possible it would be greatly appreciated as I am relatively new to R.
If there is no logic in getting values for the columns and you need to add conditions individually for each combination you can use if/else.
df[c('d', 'e', 'f')] <- t(apply(df, 1, function(x) {
if (x[1] && x[2] && x[3]) c(5, 5, 2)
else if (x[1] && !x[2] && x[3]) c(2, 3, 3)
#add more conditions
#....
}))
df
# a b c d e f
#1 TRUE TRUE TRUE 5 5 2
#2 TRUE FALSE TRUE 2 3 3
#3 TRUE FALSE TRUE 2 3 3
Here's a dplyr solution using case_when. On the left side of the ~ you define your conditions, and on the right side of the ~ you assign the value to use when those conditions are met. If none of the conditions is met for a row (i.e. they all evaluate to FALSE), case_when returns NA for that row.
df %>%
mutate(d =
case_when(
a == TRUE & b == TRUE & c == TRUE ~ 5,
a == TRUE & b == FALSE & c == TRUE ~ 2
),
e =
case_when(
a == TRUE & b == TRUE & c == TRUE ~ 5,
a == TRUE & b == FALSE & c == TRUE ~ 3
),
f =
case_when(
a == TRUE & b == TRUE & c == TRUE ~ 2,
a == TRUE & b == FALSE & c == TRUE ~ 3
))
Which gives you:
a b c d e f
<lgl> <lgl> <lgl> <dbl> <dbl> <dbl>
1 TRUE TRUE TRUE 5 5 2
2 TRUE FALSE TRUE 2 3 3
3 TRUE FALSE TRUE 2 3 3
Data:
df <- tribble(
~a, ~b, ~c,
TRUE, TRUE, TRUE,
TRUE, FALSE, TRUE,
TRUE, FALSE, TRUE
)
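With eight combinations, writing out every condition gets long. One alternative, sketched here in base R, is to keep the mapping in a small lookup table and match rows against it. The lookup table, the key helper and res are illustrative names, and only the two combinations given in the question are filled in.

```r
df <- data.frame(a = c(TRUE, TRUE, TRUE),
                 b = c(TRUE, FALSE, FALSE),
                 c = c(TRUE, TRUE, TRUE))

# One row per combination of a, b, c with the values to assign.
# Only the two combinations from the question are listed; add the rest.
lookup <- data.frame(a = c(TRUE, TRUE), b = c(TRUE, FALSE), c = c(TRUE, TRUE),
                     d = c(5, 2), e = c(5, 3), f = c(2, 3))

# Build a text key such as "TRUE FALSE TRUE" for each row.
key <- function(x) paste(x$a, x$b, x$c)

# match() keeps the original row order; unlisted combinations give NA.
res <- cbind(df, lookup[match(key(df), key(lookup)), c("d", "e", "f")])
rownames(res) <- NULL
print(res)
```

The mapping then lives in data rather than in code, so adding a combination means adding one row to lookup.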
I have a large matrix filled with True/False values under each column. Is there a way I can summarize the matrix so that every row is unique and there is a new column with the count of how often that row appeared?
Example:
A B C D E
[1] T F F T F
[2] T T T F F
[3] T F F T T
[4] T T T F F
[5] T F F T F
Would become:
A B C D E total
[1] T F F T F 2
[2] T T T F F 2
[3] T F F T T 1
EDIT
I cbind this matrix with a new column rev so I now have a data.frame that looks like
A B C D E rev
[1] T F F T F 2
[2] T T T F F 3
[3] T F F T T 5
[4] T T T F F 2
[5] T F F T F 1
And would like a data.frame that also sums the rev column as follows:
A B C D E rev total
[1] T F F T F 3 2
[2] T T T F F 5 2
[3] T F F T T 5 1
An approach with dplyr:
Use as.data.frame (or, as here, as_tibble) first if you start from a matrix. In the end you need a data.frame anyway, as you'll have both numeric and logical columns in your table.
mat <- matrix(
c(T, F, F, T, F, T, T, T, F, F, T, F, F, T, T, T, T, T, F, F, T, F, F, T, F),
ncol = 5,
byrow = TRUE,
dimnames = list(NULL, LETTERS[1:5])
)
library(dplyr)
mat %>%
as_tibble %>% # convert matrix to tibble, to be able to group
group_by_all %>% # group by every column so we can count by group of equal values
tally %>% # tally will add a count column and keep distinct grouped values
ungroup # ungroup the table to be clean
#> # A tibble: 3 x 6
#> A B C D E n
#> <lgl> <lgl> <lgl> <lgl> <lgl> <int>
#> 1 TRUE FALSE FALSE TRUE FALSE 2
#> 2 TRUE FALSE FALSE TRUE TRUE 1
#> 3 TRUE TRUE TRUE FALSE FALSE 2
Created on 2018-05-29 by the reprex package (v0.2.0).
And a base solution:
df <- as.data.frame(mat)
df$n <- 1
aggregate(n~.,df,sum)
# A B C D E n
# 1 TRUE TRUE TRUE FALSE FALSE 2
# 2 TRUE FALSE FALSE TRUE FALSE 2
# 3 TRUE FALSE FALSE TRUE TRUE 1
Or as a one liner: aggregate(n~.,data.frame(mat,n=1),sum)
count function from plyr is exactly what you are looking for (suppose m is your matrix):
plyr::count(m)
# x.A x.B x.C x.D x.E freq
#1 TRUE FALSE FALSE TRUE FALSE 2
#2 TRUE FALSE FALSE TRUE TRUE 1
#3 TRUE TRUE TRUE FALSE FALSE 2
If you have an object mat as defined in @Moody_Mudskipper's answer, you can do
library(data.table)
dt <- as.data.table(mat)
dt[, .N, by = names(dt)]
# A B C D E N
# 1: TRUE FALSE FALSE TRUE FALSE 2
# 2: TRUE TRUE TRUE FALSE FALSE 2
# 3: TRUE FALSE FALSE TRUE TRUE 1
Explanation
by = <names> divides the data table into groups of rows, where the value of all the variables in <names> is equal across rows. If you do by = names(dt) it will divide into groups where all variables are equal.
.N is the number of observations in the given group of rows.
For your edit, if your data.frame is named df, you can do
setDT(df) # convert to data table
df[, .(rev = sum(rev), total = .N), by = A:E] # get desired output
# A B C D E rev total
# 1: TRUE FALSE FALSE TRUE FALSE 3 2
# 2: TRUE TRUE TRUE FALSE FALSE 5 2
# 3: TRUE FALSE FALSE TRUE TRUE 5 1
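The edited question (summing rev as well as counting) can also be handled in base R by extending the aggregate idea from the earlier answer. This is a sketch that rebuilds the example data by hand; the row order of the result may differ from the data.table output.

```r
# Example data from the edited question, rebuilt by hand.
df <- data.frame(
  A = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  B = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  C = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  D = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  E = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  rev = c(2, 3, 5, 2, 1)
)

# A column of ones turns "count the rows" into "sum a column".
df$total <- 1

# cbind() on the left-hand side sums both columns within each group.
res <- aggregate(cbind(rev, total) ~ A + B + C + D + E, data = df, FUN = sum)
print(res)
```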
Suppose I have an outcome like this:
df<-data.frame(id=rep(letters[1:4], each=4), stringsAsFactors=FALSE,
test=c(rep(FALSE, 4), rep(c(FALSE, TRUE), 4), rep(TRUE, 4)))
id test
1 a FALSE
2 a FALSE
3 a FALSE
4 a FALSE
5 b FALSE
6 b TRUE
7 b FALSE
8 b TRUE
9 c FALSE
10 c TRUE
11 c FALSE
12 c TRUE
13 d TRUE
14 d TRUE
15 d TRUE
16 d TRUE
What I want to see is whether the test results are consistent within each subject, such that:
id consist
1 a TRUE
2 b FALSE
3 c FALSE
4 d TRUE
What is an easy way to realize this in R?
Here is a method using aggregate:
aggregate(test ~ id, data=df, FUN=function(x) min(x) == max(x))
id test
1 a TRUE
2 b FALSE
3 c FALSE
4 d TRUE
For each id, the function checks whether the minimum of the test results equals the maximum.
A second method is to check if there are any differences in the values using diff:
aggregate(test ~ id, data=df, FUN=function(x) max(abs(diff(x))) == 0)
id test
1 a TRUE
2 b FALSE
3 c FALSE
4 d TRUE
Here, the maximum of the absolute differences is taken: it is 0 only when no two consecutive values differ.
You could also check whether either TRUE or FALSE is absent from a group, using a combination of table and rowSums:
rowSums(table(df) == 0)
# a b c d
# 1 0 0 1
Or closer to your desired output
data.frame(test = rowSums(table(df) == 0) == 1)
# test
# a TRUE
# b FALSE
# c FALSE
# d TRUE
Here is an option using data.table
library(data.table)
setDT(df)[, .(consist= all(test)| all(!test)) , by = id]
# id consist
#1: a TRUE
#2: b FALSE
#3: c FALSE
#4: d TRUE
Or use uniqueN
setDT(df)[,.(consist = uniqueN(test)==1) , by = id]
Another approach using dplyr package
df %>% group_by(id) %>% summarise(consist = var(test) == 0)
Thanks to @David Arenburg's comment, we can simplify the above using base R:
data.frame(test=with(df, tapply(test, id, var)) == 0)
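One more base R variant, sketched here, counts the distinct values per id directly instead of relying on the variance. The names consist and out are illustrative.

```r
df <- data.frame(id = rep(letters[1:4], each = 4), stringsAsFactors = FALSE,
                 test = c(rep(FALSE, 4), rep(c(FALSE, TRUE), 4), rep(TRUE, 4)))

# A group is consistent when it contains exactly one distinct value.
consist <- tapply(df$test, df$id, function(x) length(unique(x)) == 1)

out <- data.frame(id = names(consist), consist = as.vector(consist))
print(out)
```

Unlike the var() versions, this returns TRUE rather than NA for an id with a single observation, since var() of a length-one vector is NA.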