How to use tabulate function to count zeros? - r

I am trying to count integers in a vector that also contains zeros. However, tabulate doesn't count the zeros. Any ideas what I am doing wrong?
Example:
> tabulate(c(0,4,4,5))
[1] 0 0 0 2 1
but the answer I expect is:
[1] 1 0 0 0 2 1

Use a factor and define its levels
tabulate(factor(c(0,4,4,5), 0:5))
#[1] 1 0 0 0 2 1
The explanation for the behaviour you're seeing is in ?tabulate (bold face mine)
bin: a numeric vector (of positive integers), or a factor. Long
vectors are supported.
In other words, if you pass a numeric vector, it must contain positive integers (greater than 0); zeros are not counted. Otherwise, use a factor.
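Another option, sketched here on the assumption that the vector holds only non-negative whole numbers, is to shift every value up by one so that 0 lands in the first bin. The variable names are illustrative only.

```r
x <- c(0, 4, 4, 5)

# shift by one so that 0 maps to bin 1, 1 maps to bin 2, and so on
counts <- tabulate(x + 1)

# label each bin with the original value it counts
names(counts) <- 0:max(x)
counts
```

This avoids building a factor, but it only works when no value is negative.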

I got annoyed enough by tabulate to write a short function that can count not only the zeroes but any other integers in a vector:
my.tab <- function(x, levs) {
  sapply(levs, function(n) {
    length(x[x == n])
  })
}
The parameter x is an integer vector that we want to tabulate. levs is another integer vector that contains the "levels" whose occurrences we count. Let's set x to some integer vector:
x <- c(0,0,1,1,1,2,4,5,5)
A) Use my.tab to emulate R's built-in tabulate. Zeros will be ignored:
my.tab(x, 1:max(x))
# [1] 3 1 0 1 2
B) Count the occurrences of integers from 0 to 6:
my.tab(x, 0:6)
# [1] 2 3 1 0 1 2 0
C) If you want to know (for some strange reason) only how many 1-s and 4-s your x vector contains, but ignore everything else:
my.tab(x, c(1,4))
# [1] 3 1
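For comparison, the same counts can probably be obtained without a custom function by combining factor and table, as in the first answer. This is a minimal sketch; levs is simply the vector of values to count, and anything in x that is not listed in levs is dropped.

```r
x <- c(0, 0, 1, 1, 1, 2, 4, 5, 5)
levs <- 0:6

# factor() fixes which values get a bin; table() counts them, including empty bins
counts <- as.vector(table(factor(x, levels = levs)))
counts

# restricting the levels reproduces case C above
as.vector(table(factor(x, levels = c(1, 4))))
```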

Related

How can I use rowSums with conditions to return binary value?

Say I have a data frame with a column for summed data. What is the most efficient way to return a binary 0 or 1 in a new column if any value in columns a, b, or c is NOT zero? rowSums is fine for a total, but I also need a simple indicator of whether anything differs from a given value.
tt <- data.frame(a=c(0,-5,0,0), b=c(0,5,10,0), c=c(-5,0,0,0))
tt[, ncol(tt)+1] <- rowSums(tt)
This yields:
> tt
   a  b  c V4
1  0  0 -5 -5
2 -5  5  0  0
3  0 10  0 10
4  0  0  0  0
The fourth column is a simple sum of the data in the first three columns. How can I add a fifth column that returns a binary 1/0 value if any value differs from a criterion set on the first three columns?
For example, is there a simple way to return a 1 if any of a, b, or c are NOT 0?
as.numeric(rowSums(tt != 0) > 0)
# [1] 1 1 1 0
tt != 0 gives us a logical matrix telling us where there are values not equal to zero in tt.
When the sum of each row is greater than zero (rowSums(tt != 0) > 0), we know that at least one value in that row is not zero.
Then we convert the result to numeric (as.numeric(.)) and we've got a binary vector result.
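Since the question's tt already carries a sum column, it may be safer to apply the test only to the columns of interest. The sketch below assumes the column names a, b and c from the question; total, flag and check_cols are hypothetical names chosen for illustration.

```r
tt <- data.frame(a = c(0, -5, 0, 0), b = c(0, 5, 10, 0), c = c(-5, 0, 0, 0))
tt$total <- rowSums(tt)            # 'total' is a hypothetical column name

check_cols <- c("a", "b", "c")     # columns the criterion applies to
# count the non-zero entries per row in those columns only, then flag rows with at least one
tt$flag <- as.integer(rowSums(tt[, check_cols] != 0) > 0)
tt
```

Row 2 shows why the sum alone is not enough: its total is 0 even though two of its values are non-zero.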
We can also use Reduce to OR together the per-column comparisons; the leading + converts the logical result to 0/1:
+(Reduce(`|`, lapply(tt, `!=`, 0)))
#[1] 1 1 1 0
One could also use the good old apply loop:
+apply(tt != 0, 1, any)
#[1] 1 1 1 0
The argument tt != 0 is a logical matrix with entries stating whether the value is different from zero. Then apply() with margin 1 is used for a row-wise operation to check if any of the entries is true. The prefix + converts the logical output into numeric 0 or 1. It is a shorthand version of as.numeric().

compare current cell and previous cell in excel style without loop

I want to create an indicator variable by comparing the current value of a variable with the previous value. The logic is: if current value = previous value, then indicator = 1, else 0. The first indicator value is missing because there is nothing to compare it with.
It needs to be fast because I have lots of groups to compare in my data (I did not include the group for simplicity).
> dt<-c('a','a','a','b','a','a','c','c')
> indicator
[1] NA 1 1 0 0 1 0 1
Using base R, you can drop the last element with head() and the first element with tail(), compare the two shifted vectors, and then add the NA to the front.
c(NA, as.numeric(head(dt, -1) == tail(dt, -1)))
If dt were a vector of numbers, you could use diff like
dn <- c(1,1,1,2,1,1,3,3)
c(NA, (diff(dn)==0)+0)
(using +0 rather than as.numeric to make the booleans 1's and 0's.)
You can use Lag from the Hmisc package, dropping the first value with [-1] and adding NA at the beginning.
library(Hmisc)
c(NA, as.numeric(dt== Lag(dt))[-1])
#[1] NA 1 1 0 0 1 0 1
You could also use rle in base R:
v <- rle(dt)[[1]]
x <- rep(1:length(v),v)
indicator <- c(NA, (diff(x)==0)*1)
#[1] NA 1 1 0 0 1 0 1
v: the run lengths, i.e. the number of times each character is repeated consecutively
x: a run index for each element of dt, which gives a numeric stand-in for dt so that diff can be used
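The question mentions groups that were left out for simplicity. One way the head/tail comparison might be extended to grouped data is sketched below. It assumes a hypothetical group vector g, with the data already sorted so that each group is contiguous; neither the name nor the layout comes from the question.

```r
dt <- c('a', 'a', 'a', 'b', 'a', 'a', 'c', 'c')
g  <- c(1, 1, 1, 1, 2, 2, 2, 2)   # hypothetical group labels, contiguous per group

# compare each value with the previous one, ignoring groups for now
same_as_prev <- c(NA, head(dt, -1) == tail(dt, -1))

# TRUE at the first row of every group, where no comparison should be made
group_start <- c(TRUE, head(g, -1) != tail(g, -1))

indicator <- as.integer(same_as_prev)
indicator[group_start] <- NA
indicator
```

Everything stays vectorised, so no loop over groups is needed.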

R - Match two vectors with conditional

I've got two binary vectors and I'm trying to find the most efficient way of comparing them based on slightly more than just a standard "are they equal?".
What I want is this: given vector x and vector y, I want to find out how many times in vector x do I have a 1 at the same index that vector y has a 0. I also need to count when vector y has a 1 + has a 0 where vector x also has a 0. (Note: if I find either of these, I can take the complement to get the other; I'm just not sure which is easier or more efficient, i.e. VectorY Score = length(VectorX) - VectorX Score.)
Ex:
vector x: 1 1 1 0 0 1 - Score: 2
vector y: 0 1 0 1 0 1 - Score: 4
I know that I could just use a for loop to go through each index, but I'd like something more efficient if possible. I have vector lengths of 100 and I need to do many of these comparisons so speed matters.
I tried to use the sum command, but I can't figure out how to add complex conditionals to it. I can find every spot that matches, but that's not enough to solve this.
Ex:
sum(vectorX == vectorY)
Sample:
> vx
[1] 1 1 1 0 0 1
> vy
[1] 0 1 0 1 0 1
You said: "how many times in vector x do I have a 1 at the same index that vector y has a 0"
> vx==1 & vy==0 # constructs this vector:
[1] TRUE FALSE TRUE FALSE FALSE FALSE
> sum(vx==1 & vy==0) # its sum is the answer (TRUE=1, FALSE=0)
[1] 2
You also said: "when vector y has a 1 + has a 0 where vector x also has a 0" which I don't understand but you can clarify that and probably work it out yourself given the answer I've just given you.
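One possible reading of the second condition, inferred from the example scores in the question rather than confirmed by the asker, is "y has a 1, or both x and y have a 0". Under that assumption, both scores can be sketched with the same logical-vector approach; score_x and score_y are illustrative names.

```r
vx <- c(1, 1, 1, 0, 0, 1)
vy <- c(0, 1, 0, 1, 0, 1)

# x has a 1 where y has a 0
score_x <- sum(vx == 1 & vy == 0)

# assumed reading: y has a 1, or y has a 0 where x also has a 0
score_y <- sum(vy == 1 | (vx == 0 & vy == 0))

# the second score is the complement of the first, as the question suggests
c(score_x, score_y, length(vx) - score_x)
```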

Implementing simple scoring function with permutation test in R

I'm new to R, and I want to calculate a specific score for a bunch of genes.
Can somebody help me implement this? :-)
I have following two vectors:
vector 1: (0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
vector 2: (1,0,0,1,0,0,0,0)
Vector 2 shows the membership of the vector 1 elements in a specific set c.
I want to implement a scoring function that does the following steps:
1) takes vector 1 and vector 2 as inputs.
2) sorts vector 1 in decreasing order and reorders vector 2 to match the sorted vector 1.
3) goes through the sorted vector 1; if, for element i of vector 1, the corresponding element of the sorted vector 2 is 1, the score is increased by (m-l), otherwise it is decreased by l, where
m= length of vector 1
l= # of non-zero elements in vector 2
4) finally permutes vector 1 and vector 2 and recalculates the score of step 3. The permutation should preserve the true membership of each vector 1 element in vector 2. For example: vector 1: (10,7,4), vector 2: (0,0,1), after one possible permutation: vector 1: (4,7,10), vector 2: (1,0,0)
Here is my attempt:
library(Matrix)  # provides nnzero()
vector1 <- c(0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
vector2 <- c(1,0,0,1,0,0,0,0)
m <- length(vector1)
l <- nnzero(vector2, na.counted = NA)
score = 0
score_function <- function(a, b) {
  a <- sort(a, decreasing = T)
  for (i in a) {
    if (b[i] == 1) {
      score = + m-1
    } else {
      score = score - l
    }
  }
  score
}
But I couldn't sort b (vector 2) according to vector 1 (a).
If you want to sort by another vector use order() as an index to "[":
> vector1<- c(0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
>
> vector2<- c(1,0,0,1,0,0,0,0)
> vector2[ order(vector1) ]
[1] 0 0 1 0 0 0 1 0
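Note that order() sorts in increasing order by default, while the question asks for decreasing order; order() accepts decreasing = TRUE for that. Steps 1 to 3 of the question could then be sketched roughly as follows. This is not the asker's code: running_score is a hypothetical helper name, base R's sum(b != 0) stands in for nnzero(), and vector 2 is assumed to contain only 0s and 1s.

```r
vector1 <- c(0.01, 0.02, 0.04, 0.5, 0.9, 0.002, 0.07, 0.008)
vector2 <- c(1, 0, 0, 1, 0, 0, 0, 0)

# running_score() is a hypothetical helper, not part of any package
running_score <- function(a, b) {
  m <- length(a)          # length of vector 1
  l <- sum(b != 0)        # number of non-zero elements in vector 2
  # reorder b so that it follows a sorted in decreasing order
  b_sorted <- b[order(a, decreasing = TRUE)]
  # +(m - l) for members of the set, -l for non-members
  steps <- ifelse(b_sorted == 1, m - l, -l)
  # score after each element of the sorted vector
  cumsum(steps)
}

running_score(vector1, vector2)
```

The sketch returns the score after every step rather than only the last value. With a 0/1 membership vector the final total is always l*(m-l) - (m-l)*l = 0, so the intermediate values are presumably what matters.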

need to count number of specific transitions in a vector in R

I am programming a sampler in R, which is basically a big for loop, and in every iteration I have to count the number of transitions in a vector. I have a vector called k, which contains zeros and ones, with 1000 entries.
I have used the following, horribly slow, code:
# we determine the number of transitions n00, n01, n10, n11
n00 <- n01 <- n10 <- n11 <- 0  # reset number of transitions between states from last time
for (j in 1:(1000 - 1)) {
  if (k[j + 1] == 1 && k[j] == 0) {
    n01 <- n01 + 1
  } else if (k[j + 1] == 1 && k[j] == 1) {
    n11 <- n11 + 1
  } else if (k[j + 1] == 0 && k[j] == 1) {
    n10 <- n10 + 1
  } else {
    n00 <- n00 + 1
  }
}
So each time the loop runs, the variables n00, n01, n10, n11 count the transitions in the vector. For example, n00 counts the number of times a 0 is followed by another 0, and so on.
This is very slow, and I am very new to R, so I am kind of desperate here. I do not understand how to use grep, if that even is possible.
Thank you for your help
Try something like this:
x <- sample(0:1,20,replace = TRUE)
table(paste0(head(x,-1),tail(x,-1)))
# 00 01 10 11
#  4  3  4  8
The head and tail return portions of the vector x: all but the last element, and then all but the first element. This means that the corresponding elements are the consecutive pairs from x.
Then paste0 converts each element to character and pastes the first elements together, then the second elements, and so on. The result is a character vector with elements like "00", "01", etc. Then table just counts up how many of each there are.
You can assign the result to a new variable like so (avoid the name T, since T is also shorthand for TRUE):
trans <- table(paste0(head(x,-1),tail(x,-1)))
Experiment yourself with each piece of the code to see how it works. Run just head(x,-1), etc. to see what each piece does.
To address the comment below, to ensure that all types appear with counts when you run table, convert it to a factor first:
x1 <- factor(paste0(head(x,-1),tail(x,-1)),levels = c('00','01','10','11'))
table(x1)
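To get the four counts back under the names used in the question, a two-way table over the same head/tail pairs may be convenient. This is a minimal sketch on a small fixed vector in place of the 1000-element k; from, to and trans are illustrative names.

```r
k <- c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0)   # small fixed example

# fixing the levels guarantees a full 2 x 2 table even if a state never occurs
from <- factor(head(k, -1), levels = 0:1)
to   <- factor(tail(k, -1), levels = 0:1)
trans <- table(from, to)

# rows are the current state, columns the next state
n00 <- trans["0", "0"]
n01 <- trans["0", "1"]
n10 <- trans["1", "0"]
n11 <- trans["1", "1"]
trans
```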
If we don't care about distinguishing the n00 and n11 cases, then this becomes much simpler:
x <- sample(0:1,20,replace = TRUE)
# [1] 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0
table(diff(x))
# -1 0 1
# 4 11 4
Since the question says that you're primarily interested in the transitions, this may be acceptable, otherwise one of the other answers would be preferable.
x <- sample(0:1, 10, replace = TRUE)
# my sample: [1] 0 0 0 0 0 1 0 1 1 0
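rl <- rle(x)
# a run of n identical values contains n - 1 transitions to the same state
zero_to_zero <- sum(rl$lengths[rl$values == 0 & rl$lengths > 1] - 1)
one_to_one <- sum(rl$lengths[rl$values == 1 & rl$lengths > 1] - 1)
# between runs, a difference of +1 is a 0 -> 1 change and -1 is a 1 -> 0 change
zero_to_one <- sum(diff(rl$values) == 1)
one_to_zero <- sum(diff(rl$values) == -1)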
x
# [1] 0 0 0 0 0 1 0 1 1 0
zero_to_zero
# [1] 4
one_to_one
# [1] 1
zero_to_one
# [1] 2
one_to_zero
# [1] 2
@joran's answer is far cleaner though. Still, I thought I might as well finish the stroll I started down the (dirty) trail and share the result.
