Implementing simple scoring function with permutation test in R


I'm new to R, and I want to calculate a specific score for a bunch of genes (a biology application).
Can somebody help me implement this? :-)
I have the following two vectors:
vector 1: (0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
vector 2: (1,0,0,1,0,0,0,0)
vector 2 shows the membership of the vector 1 elements in a specific set c.
I want to implement a scoring function that does the following steps:
1) Take vector 1 and vector 2 as inputs.
2) Sort vector 1 in decreasing order, and reorder vector 2 to match the sorted vector 1.
3) Go through the sorted vector 1: if, for element i of vector 1, the corresponding element of the sorted vector 2 is 1, the score should be increased by (m-l); otherwise it should be decreased by l.
m = length of vector 1
l = number of non-zero elements in vector 2
4) Finally, permute vector 1 and vector 2 and recalculate the score of step 3. The permutation should preserve the true membership of each vector 1 element in vector 2. For example: vector 1: (10,7,4), vector 2: (0,0,1); after one possible permutation: vector 1: (4,7,10), vector 2: (1,0,0).
Here is my attempt:
vector1<- c(0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
vector2<- c(1,0,0,1,0,0,0,0)
m<-length(vector1)
l<-nnzero(vector2, na.counted = NA)
score=0
score_function<-function (a,b){
  a<-sort(a,decreasing = T)
  for (i in a){
    if (b[i]==1) {
      score= + m-1
    } else{ score= score-l }
  }
  score
}
but I couldn't sort b (vector 2) according to vector 1 (a).

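If you want to sort by another vector, use order() as an index to "[". The example below sorts in increasing order; pass decreasing = TRUE to order() for the decreasing order asked about: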
> vector1<- c(0.01,0.02,0.04,0.5,0.9,0.002,0.07,0.008)
>
> vector2<- c(1,0,0,1,0,0,0,0)
> vector2[ order(vector1) ]
[1] 0 0 1 0 0 0 1 0
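
Building on order(), here is a minimal sketch of the whole scoring procedure, under some stated assumptions. The function name running_score and the choice of statistic are mine, not from the question. One thing to be aware of: with steps of +(m-l) for each of the l members and -l for each of the (m-l) non-members, the final total is always l*(m-l) - (m-l)*l = 0, so the sketch assumes the maximum of the running sum is the quantity of interest. Likewise, permuting both vectors together leaves the sorted result unchanged, so the sketch assumes the usual permutation-test null of shuffling the membership labels against the values.

```r
# Running score over the values in decreasing order:
# +(m - l) for a member, -l for a non-member.
running_score <- function(a, b) {
  m <- length(a)              # number of elements
  l <- sum(b != 0)            # number of members (base R, no extra package)
  b_sorted <- b[order(a, decreasing = TRUE)]
  cumsum(ifelse(b_sorted == 1, m - l, -l))
}

vector1 <- c(0.01, 0.02, 0.04, 0.5, 0.9, 0.002, 0.07, 0.008)
vector2 <- c(1, 0, 0, 1, 0, 0, 0, 0)

steps <- running_score(vector1, vector2)
steps
# [1] -2  4  2  0 -2  4  2  0

# Assumed statistic: the peak of the running sum (the final value is always 0)
observed <- max(steps)

# Assumed null: shuffle the membership labels, keep the values fixed
set.seed(1)
perm_scores <- replicate(1000, max(running_score(vector1, sample(vector2))))
p_value <- mean(perm_scores >= observed)
```

With only 8 elements and 2 members there are few distinct label arrangements, so the permutation p-value here is coarse; it is meant to show the mechanics only.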

Related

How can I use rowSums with conditions to return binary value?

Say I have a data frame with a column for summed data. What is the most efficient way to return a binary 0 or 1 in a new column if any value in columns a, b, or c is NOT zero? rowSums is fine for a total, but I also need a simple indicator of whether anything differs from a given value.
tt <- data.frame(a=c(0,-5,0,0), b=c(0,5,10,0), c=c(-5,0,0,0))
tt[, ncol(tt)+1] <- rowSums(tt)
This yields:
> tt
   a  b  c V4
1  0  0 -5 -5
2 -5  5  0  0
3  0 10  0 10
4  0  0  0  0
The fourth column is a simple sum of the data in the first three columns. How can I add a fifth column that returns a binary 1/0 value if any value in the first three columns differs from a given criterion?
For example, is there a simple way to return a 1 if any of a, b, or c is NOT 0?

as.numeric(rowSums(tt != 0) > 0)
# [1] 1 1 1 0
tt != 0 gives us a logical matrix telling us where there are values not equal to zero in tt.
When the sum of each row is greater than zero (rowSums(tt != 0) > 0), we know that at least one value in that row is not zero.
Then we convert the result to numeric (as.numeric(.)) and we've got a binary vector result.
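
To store the indicator as a new column, a small sketch along the same lines follows. The column name any_nonzero is made up for illustration, and the test is restricted to columns a, b and c so that an already-added sum column cannot affect the result.

```r
tt <- data.frame(a = c(0, -5, 0, 0), b = c(0, 5, 10, 0), c = c(-5, 0, 0, 0))

# Row total over the three data columns only
tt$total <- rowSums(tt[, c("a", "b", "c")])

# 1 if any of a, b, c differs from zero in that row, else 0
# (any_nonzero is a hypothetical column name)
tt$any_nonzero <- as.integer(rowSums(tt[, c("a", "b", "c")] != 0) > 0)

tt$any_nonzero
# [1] 1 1 1 0
```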

We can use Reduce
+(Reduce(`|`, lapply(tt, `!=`, 0)))
#[1] 1 1 1 0

One could also use the good old apply loop:
+apply(tt != 0, 1, any)
#[1] 1 1 1 0
The argument tt != 0 is a logical matrix with entries stating whether the value is different from zero. Then apply() with margin 1 is used for a row-wise operation to check if any of the entries is true. The prefix + converts the logical output into 0 or 1. It is a shorthand that behaves like as.integer() here.

How to use tabulate function to count zeros?

I am trying to count integers in a vector that also contains zeros. However, tabulate doesn't count the zeros. Any ideas what I am doing wrong?
Example:
> tabulate(c(0,4,4,5))
[1] 0 0 0 2 1
but the answer I expect is:
[1] 1 0 0 0 2 1

Use a factor and define its levels
tabulate(factor(c(0,4,4,5), 0:5))
#[1] 1 0 0 0 2 1
The explanation for the behaviour you're seeing is in ?tabulate (note "positive integers"):
bin: a numeric vector (of positive integers), or a factor. Long vectors are supported.
In other words, if you give it a numeric vector, the values need to be strictly positive integers (> 0). Otherwise, use a factor.
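
Since tabulate only counts values from 1 upward, another option is to shift the data by one before counting. This is a sketch that assumes x contains only non-negative integers; the variable name counts is just for illustration.

```r
x <- c(0, 4, 4, 5)

# Shift so that 0 falls into bin 1, 1 into bin 2, and so on
counts <- tabulate(x + 1, nbins = max(x) + 1)

# Label each bin with the original value it counts
names(counts) <- 0:max(x)
counts
# 0 1 2 3 4 5 
# 1 0 0 0 2 1 
```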

I got annoyed enough by tabulate to write a short function that can count not only the zeroes but any other integers in a vector:
my.tab <- function(x, levs) {
  sapply(levs, function(n) {
    length(x[x==n])
  })
}
The parameter x is an integer vector that we want to tabulate. levs is another integer vector that contains the "levels" whose occurrences we count. Let's set x to some integer vector:
x <- c(0,0,1,1,1,2,4,5,5)
A) Use my.tab to emulate R's built-in tabulate. Zeros will be ignored:
my.tab(x, 1:max(x))
# [1] 3 1 0 1 2
B) Count the occurrences of integers from 0 to 6:
my.tab(x, 0:6)
# [1] 2 3 1 0 1 2 0
C) If you want to know (for some strange reason) only how many 1-s and 4-s your x vector contains, but ignore everything else:
my.tab(x, c(1,4))
# [1] 3 1

Generate a vector with a set number of 1s

I want to generate a large vector of just 0s and 1s of arbitrary length, but with at most ten 1s in the vector.
(For those familiar, a 10-sparse vector of some arbitrary length.)
How can I do this in R/RStudio?

n <- 20                              # desired vector length (example value, must be at least 10)
k <- sample(0:10, 1)                 # random number of 1s between 0 and 10
rep(0, n - k)                        # generate n - k zeros
rep(1, k)                            # generate k ones
sample(c(rep(0, n - k), rep(1, k)))  # combine and permute

# function that generates a 10-sparse vector
GenerateSparceVector = function(N) {
  # number of 1s
  n = sample(1:10,1)
  # create vector
  vec = c(rep(1, n), rep(0, N-n))
  # randomise vector
  sample(vec)
}
# for reproducibility
set.seed(32)
# apply the function
GenerateSparceVector(20)
# [1] 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 1
Note that I assumed you need at least one 1 in your vector.
Every time you run it there's an equal probability of getting 1, 2, 3, ... 10 1s in your vector.
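
An alternative sketch that avoids building and shuffling the full vector: start from all zeros and set k randomly chosen positions to 1. The function name sparse_binary and its max_ones argument are hypothetical. Unlike the function above, this version allows zero 1s, which is one reading of "at most 10".

```r
# Binary vector of length N with at most max_ones ones
sparse_binary <- function(N, max_ones = 10) {
  # number of 1s, never more than the vector length
  k <- sample(0:min(max_ones, N), 1)
  vec <- integer(N)               # all zeros
  vec[sample.int(N, k)] <- 1L     # set k distinct random positions to 1
  vec
}

set.seed(32)
v <- sparse_binary(20)
length(v)   # 20
sum(v)      # somewhere between 0 and 10
```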

Sorting and Ordering in R

I am currently working through an intro class and was having some difficulty with this particular problem:
Create a function that takes in a vector of numbers V.Size and a single number N as inputs and outputs a list object of size N where each list member is a vector that contains elements of V.Size such that the largest value in V.Size is in the vector of the first list item, the second largest value in V.Size is in the vector of the second list item, etc. The (N+1) ordered value of V.Size should be in the first vector of the list, the (N+2) ordered value of V.Size should be in the second vector of the list and so on.
Now, this is what I have done thus far, I am trying to make an example code:
V.Size <- c(5,4,2,3,1)
n <- 5
Function <- c(V.Size, n)
Function
[1] 5 4 2 3 1 5
sort(Function, decreasing=TRUE)
[1] 5 5 4 3 2 1
The issue I am having is with (N+1), (N+2) and its ordering.

The first step to addressing this would be to create a vector giving the list position for each element of the sorted V.Size. This is basically the vector (1, 2, ..., N, 1, 2, ..., N, ...), with the same total length as V.Size. You can get that with:
V.Size <- c(5,4,2,3,1)
n <- 2
rep(1:n, length.out=length(V.Size))
# [1] 1 2 1 2 1
Now you can use the split function to create a list based on these assignments:
split(sort(V.Size, decreasing=TRUE), rep(1:n, length.out=length(V.Size)))
# $`1`
# [1] 5 3 1
#
# $`2`
# [1] 4 2
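
Since the exercise asks for a function, the two steps above could be wrapped up roughly as follows. The name split_by_rank is made up, and the use of factor() with explicit levels is my addition so that the result always has N members, even when N is larger than the number of values.

```r
# Distribute the values of V.Size, largest first, round-robin over N list members
split_by_rank <- function(V.Size, N) {
  ranked <- sort(V.Size, decreasing = TRUE)
  # list position of each ranked value: 1, 2, ..., N, 1, 2, ...
  position <- rep(seq_len(N), length.out = length(ranked))
  # explicit levels keep empty members when N > length(V.Size)
  unname(split(ranked, factor(position, levels = seq_len(N))))
}

split_by_rank(c(5, 4, 2, 3, 1), 2)
# [[1]]
# [1] 5 3 1
#
# [[2]]
# [1] 4 2
```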

R - Match two vectors with conditional

I've got two binary vectors and I'm trying to find the most efficient way of comparing them based on slightly more than just a standard "are they equal?".
My function is that if I have vector x and vector y I want to find out how many times in vector x do I have a 1 at the same index that vector y has a 0. I also need to when vector y has a 1 + has a 0 where vector x also has a 0. (Note: If I find either of these I can just find the inverse to get the other, I'm just not sure which is easier/more efficient ie. VectorY Score = length(VectorX) - VectorX Score)
Ex:
vector x: 1 1 1 0 0 1 - Score: 2
vector y: 0 1 0 1 0 1 - Score: 4
I know that I could just use a for loop to go through each index, but I'd like something more efficient if possible. I have vector lengths of 100 and I need to do many of these comparisons so speed matters.
I tried to use the sum command, but I can't figure out how to add complex conditionals to it. I can find every spot that matches, but that's not enough to solve this.
Ex:
sum(vectorX == vectorY)

Sample:
> vx
[1] 1 1 1 0 0 1
> vy
[1] 0 1 0 1 0 1
You said: "how many times in vector x do I have a 1 at the same index that vector y has a 0"
> vx==1 & vy==0 # constructs this vector:
[1] TRUE FALSE TRUE FALSE FALSE FALSE
> sum(vx==1 & vy==0) # its sum is the answer (TRUE=1, FALSE=0)
[1] 2
You also said: "when vector y has a 1 + has a 0 where vector x also has a 0", which I don't understand, but you can clarify that and probably work it out yourself given the answer I've just given you.
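
Putting this together, here is a sketch that reproduces both scores from the question. It assumes, as the question's own note suggests, that the second score is simply the complement of the first. The matrix X in the second half is a hypothetical way of holding many x vectors as rows, so that all of them can be scored against one y in a single call.

```r
vx <- c(1, 1, 1, 0, 0, 1)
vy <- c(0, 1, 0, 1, 0, 1)

# Positions where x has a 1 and y has a 0
score_x <- sum(vx == 1 & vy == 0)
# Assumed complement, following the note in the question
score_y <- length(vx) - score_x

score_x  # 2
score_y  # 4

# Many comparisons at once: each row of X (hypothetical) is one binary x vector.
# For 0/1 data, X %*% (1 - vy) counts, per row, the 1s that line up with a 0 in vy.
X <- rbind(vx, c(0, 0, 0, 0, 0, 0))
as.vector(X %*% (1 - vy))
# [1] 2 0
```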