Count by row with variable criteria - r

I have a data.frame in which I want to count, for each row, the values that meet a specified criterion. The part I cannot figure out is that I want a different criterion for each row.
Say I have 10 rows: I want 10 different criteria, one per row.
I tried count.above <- rowSums(Data > rate), where rate is a vector holding the 10 criteria, but R used only the first one as the criterion for the whole frame.
I imagine I could split my frame into 10 vectors and perform this task on each, but I thought there would be a simple way to do this without resorting to that.

Edit: this depends on whether you want to operate over rows or columns. See below.
This is a job for mapply and Reduce. Suppose you have a data frame along the lines of
df1 <- data.frame(a=1:10,b=2:11,c=3:12)
Let's say we want to count the rows where a > 6, b > 3 and c > 5. mapply compares each column with its own threshold:
mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)
$a
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
$b
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
$c
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now we use Reduce to find those which are all TRUE:
Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
Lastly, we use sum to add them all up:
sum(Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)))
[1] 4
If you want a result for each row rather than a global aggregate, then apply is the function to use:
apply(df1,1,function(v) sum(v>c(6,3,5)))
[1] 0 0 1 2 2 2 3 3 3 3
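The per-row count above can also be obtained without calling a function once per row. A minimal sketch, assuming the same df1 and thresholds as in this answer (the names thresholds, above and per_row are introduced here only for illustration): with its default simplification, mapply returns a logical matrix with one column per data frame column, and rowSums then counts the TRUE values in each row.

```r
df1 <- data.frame(a = 1:10, b = 2:11, c = 3:12)
thresholds <- c(6, 3, 5)   # illustrative name: a > 6, b > 3, c > 5

# mapply pairs each column with its threshold; with the default
# SIMPLIFY = TRUE the equal-length results are bound into a logical matrix
above <- mapply(">", df1, thresholds)

# count, for each row, how many columns exceed their threshold
per_row <- rowSums(above)
per_row
```

Whether this is preferable to apply is mostly a matter of taste; it avoids the coercion of each row to a single vector that apply performs.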

Given the dummy data (from @zx8754's solution)
# dummy data
df1 <- data.frame(matrix(1:15, nrow = 3))
myRate <- c(7, 5, 1)
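Solution using apply, courtesy of @JDL. Each column is compared element-wise with myRate, so the value in row i is tested against myRate[i]:
rowSums(apply(df1, 2, function(v) v > myRate))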
Alternative solution using the Reduce pattern:
Reduce(function(l, v) cbind(l[,1] + (l[,2] > myRate), l[,-2:-1]),
       1:ncol(df1),
       cbind(0, df1))
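Another way to express "one criterion per row" is base R's sweep(). The following is only a sketch under the same dummy data, not a method proposed in the answers above; the names above and count_above are illustrative.

```r
df1 <- data.frame(matrix(1:15, nrow = 3))
myRate <- c(7, 5, 1)       # one threshold per row

# sweep() applies ">" between the matrix and myRate along margin 1 (rows),
# so every value in row i is compared with myRate[i]
above <- sweep(as.matrix(df1), 1, myRate, ">")
count_above <- rowSums(above)

# sanity check against the apply-based solution
stopifnot(all(count_above == rowSums(apply(df1, 2, function(v) v > myRate))))
count_above
```

Making the margin explicit (1 for rows) documents the intent, which can help when the data frame is square and a row-wise or column-wise mix-up would otherwise go unnoticed.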

Related

How does R use square brackets to return values in a vector?

I came across a question like this: "retrieve all values less than or equal to 5 from a vector of sequence 1 through 9 having a length of 9". Based on my knowledge so far, and after some trial and error, I ended up with the following code:
vec <- c(1:9) ## assigns the sequence 1 to 9 to vec
lessThanOrEqualTo5 <- vec[vec <= 5]
lessThanOrEqualTo5
[1] 1 2 3 4 5
I know that the code vec <= 5 returns the following logical vector:
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
So my question is: how does R use these logical values to return the values that satisfy the condition, given that the code effectively ends up with a structure like vec[TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE]?
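The structure described in the question can be written out explicitly. A minimal sketch (the names keep, by_logical, by_hand and by_which are illustrative): a logical index keeps the elements at the positions where the index is TRUE. Note that the hand-written form needs c() around the logical values to be valid R.

```r
vec <- 1:9
keep <- vec <= 5           # logical vector, one element per element of vec

# a logical index keeps the positions where the index is TRUE
by_logical <- vec[keep]

# the same selection written out by hand, and via integer positions
by_hand  <- vec[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)]
by_which <- vec[which(keep)]

by_logical
```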

Structure numbers in a vector R

I am trying to subset rows in R depending on the values in column y, as shown below.
I have the data set "data", which looks like this:
data <- data.frame(y = c(0,0,2000,1500,20,77,88),
a = "bla", b = "bla")
And would end up with this:
I have this R code:
data <- arrange(subset(data, y != 0 & y < 1000 & y !=77 & [...]), desc(y))
print(head(data, n =100))
Which works.
However, I would like to collect the values to exclude in a list, such as:
[0, 1000, 77]
and somehow loop through it, with the lowest possible running time, instead of hardcoding the values directly in the formula. Any ideas?
The list should only contain values for "!=" operations:
[0, 77]
and the "<" condition should remain in the formula or go in another list.
I'm going to answer your original question because it's more interesting. I hope you won't mind.
Imagine you had values and operators to apply to your data:
my.operators <- c("!=","<","!=")
my.values <- c(0,1000,77)
You can use Map from base R to apply a function to two vectors. Here I'll use get so we can obtain the actual operator given by the character string.
Map(function(x,y)get(y)(data$y,x),my.values,my.operators)
[[1]]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[[2]]
[1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE
[[3]]
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE
As you can see, we get a list with one logical vector for each (value, operator) pair.
To better understand what's going on here, consider only the first value of each vector:
get("!=")(data$y,0)
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Now we can use Reduce to combine a list of logical vectors with &. For example, applying only != to every value:
Reduce(`&`,lapply(my.values,function(x) data$y!=x))
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE
And finally subset the data, combining the logical vectors produced by Map for every (value, operator) pair:
data[Reduce("&",Map(function(x,y)get(y)(data$y,x),my.values,my.operators)),]
   y   a   b
5 20 bla bla
7 88 bla bla
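For the edited version of the question, where the list holds only "!=" values and the "<" condition stays in the formula, a vectorised sketch with %in% may be enough. This is an illustration under the question's sample data, not part of the answer above; the names exclude and kept are hypothetical, and base order() stands in for arrange(desc(y)).

```r
data <- data.frame(y = c(0, 0, 2000, 1500, 20, 77, 88),
                   a = "bla", b = "bla")
exclude <- c(0, 77)        # hypothetical name: values handled with "!="

# %in% tests every y against the whole exclusion vector at once;
# the "<" condition stays in the formula, as the question asks
kept <- data[!(data$y %in% exclude) & data$y < 1000, ]

# sort by y in decreasing order (base R stand-in for arrange(desc(y)))
kept <- kept[order(kept$y, decreasing = TRUE), ]
kept
```

Because %in% is vectorised over both arguments, no explicit loop over the exclusion list is needed.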

in R- for all values that are TRUE, is the same row, previous column also TRUE?

EDITED:
In R, I'm trying to generate a data frame full of logicals that tells me, for all values that are TRUE, whether the same row in the previous column is also TRUE. The columns represent time points, and I want to know, for any row that's TRUE, whether it is the first instance of that row being TRUE. Note: I only need it to look back one time point (column). If it was TRUE three columns ago, but not in the previous one, it's still considered a new instance.
example data frame:
T1<- c(TRUE, TRUE, FALSE)
T2<- c(FALSE, TRUE, FALSE)
T3<- c(TRUE, FALSE, TRUE)
df<- data.frame(cbind(T1,T2,T3))
df
looks like:
T1 T2 T3
1 TRUE FALSE TRUE
2 TRUE TRUE FALSE
3 FALSE FALSE TRUE
Since I'm asking about the previous column, I need to add a null column at the beginning:
df_w_null<-cbind("null_col"= logical(nrow(df)), df)
df_w_null
looks like:
null_col T1 T2 T3
1 FALSE TRUE FALSE TRUE
2 FALSE TRUE TRUE FALSE
3 FALSE FALSE FALSE TRUE
For each row, where a value is TRUE, is it the first instance of TRUE? (Is the previous column TRUE? If yes, it's not a new instance, so print FALSE.)
for (i in 2:ncol(df_w_null)){
status[i]<- as.data.frame(apply((!df_w_null[,i, drop=FALSE] == df_w_null[,i-1, drop=FALSE]), 1, isTRUE))
status<- data.frame(status)
return(status)
}
looks like:
status[,2:ncol(df_w_null)]
1 TRUE TRUE TRUE
2 TRUE FALSE TRUE
3 FALSE FALSE TRUE
#expected result:
1 TRUE FALSE TRUE
2 TRUE FALSE FALSE
3 FALSE FALSE TRUE
There are a lot of little steps going on here. First, the data.frame gets split up into pairs of adjacent columns; then each pair is checked to see whether it meets the requirement of FALSE followed by TRUE; and finally the resulting logical vectors are reassembled into a data.frame.
as.data.frame(do.call(cbind, lapply(
  setNames(lapply(2:ncol(df_w_null), function(x) data.frame(df_w_null[x-1], df_w_null[x])),
           names(df_w_null)[-1]),
  function(x) ifelse(x[,1] == FALSE & x[,2] == TRUE, TRUE, FALSE))))
T1 T2 T3
1 TRUE FALSE TRUE
2 TRUE FALSE FALSE
3 FALSE FALSE TRUE
Here's a data frame with all values in the first column FALSE
df1 <- cbind(FALSE, df)
You want TRUE wherever column i is not TRUE (the last column is never a "previous" column, hence !df1[, -ncol(df1)]) AND column i + 1 is TRUE (the first column is never a "current" column, hence df1[, -1]). We have
> (!df1[, -ncol(df1)]) & (df1[, -1])
T1 T2 T3
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
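The same shift-and-compare idea can be wrapped in a small helper. This is a sketch only, assuming the question's example data and working on a logical matrix; the function name first_true and the variable prev are illustrative, not taken from the answers.

```r
df <- data.frame(T1 = c(TRUE, TRUE, FALSE),
                 T2 = c(FALSE, TRUE, FALSE),
                 T3 = c(TRUE, FALSE, TRUE))

first_true <- function(d) {
  m <- as.matrix(d)                                  # logical matrix
  prev <- cbind(FALSE, m[, -ncol(m), drop = FALSE])  # columns shifted right by one
  m & !prev                                          # TRUE now, not TRUE one step before
}

res <- first_true(df)
res
```

Padding with a FALSE column plays the role of the null column from the question, so the first time point counts as a new instance whenever it is TRUE.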

Which indices are FALSE?

which() conveniently gives all the indices which are TRUE in x. What is a simple way to get all the indices of x which are FALSE?
Sample data:
x <- c(TRUE, TRUE, FALSE, FALSE)
x
[1] TRUE TRUE FALSE FALSE
The which function gives the indices of the TRUE values:
which(x)
[1] 1 2
If we need the indices of the FALSE values instead, we negate x first:
which(!x)
[1] 3 4
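Note that !which(x) is not equivalent. It negates the integer indices returned by which(x), and since every nonzero index is treated as TRUE, the result is one FALSE per TRUE element of x rather than a set of positions: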
!which(x)
[1] FALSE FALSE
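One detail worth keeping in mind is how which() treats missing values. A short sketch (x_na and the result names are illustrative): which() skips NA, so a position holding NA is reported neither by which(x) nor by which(!x).

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
false_idx <- which(!x)           # indices of the FALSE elements

# which() skips NA, so NA positions are reported by neither call
x_na <- c(TRUE, NA, FALSE)
false_idx_na <- which(!x_na)
na_idx <- which(is.na(x_na))     # NA positions need their own test

false_idx
```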

reporting identical values across columns in matrix

I have a matrix that I am performing a for loop over. I want to know if the values of position i in the for loop exist anywhere else in the matrix, and if so, report TRUE. The matrix looks like this
dim
x y
[1,] 5 1
[2,] 2 2
[3,] 5 1
[4,] 5 9
In this case, dim[1,] is the same as dim[3,] and should therefore report TRUE if I am in position i=1 in the for loop. I could write another for loop to deal with this, but I am sure there are more clever and possibly vectorized ways to do this.
We can use duplicated
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
duplicated(m1) returns a logical vector that is TRUE for every row that repeats an earlier row:
duplicated(m1)
#[1] FALSE FALSE TRUE FALSE
In this case, the third row is a duplicate of the first row. If we need to flag both the first and the third row, we can also check for duplicates from the reverse side and use | to make both positions TRUE, i.e.
duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE FALSE FALSE
duplicated(m1)|duplicated(m1, fromLast=TRUE)
#[1] TRUE FALSE TRUE FALSE
According to ?duplicated, the input data can be
x: a vector or a data frame or an array or ‘NULL’.
data
m1 <- cbind(x=c(5,2,5,5), y=c(1,2,1,9))
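If the for loop from the question has to stay, the flags can be computed once before the loop and looked up by position. The sketch below assumes the m1 defined above; has_twin and same are illustrative names, and the row comparison inside the loop is just one possible way to find the matching positions.

```r
m1 <- cbind(x = c(5, 2, 5, 5), y = c(1, 2, 1, 9))

# computed once, outside the loop: TRUE for rows that occur more than once
has_twin <- duplicated(m1) | duplicated(m1, fromLast = TRUE)

for (i in seq_len(nrow(m1))) {
  if (has_twin[i]) {
    # compare row i with every row; t(m1) puts one row per column so the
    # row vector is recycled down each column
    same <- which(colSums(t(m1) == m1[i, ]) == ncol(m1))
    message("row ", i, " also appears at: ",
            paste(setdiff(same, i), collapse = ", "))
  }
}
```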
