Fisher test using apply function in R - r

The code is below; the problem is that the calculation is very slow.
The matrices both, gene1, gene2 and neither all have the same length (8000).
pos <- c()
neg <- c()
either <- c()
for(i in 1:ncol(both)){
x <- cbind(both[,i], gene1[,i], gene2[,i], neither[,i])
test <- apply(x, 1, function(s){fisher.test(matrix(s, nrow = 2),
alternative = "greater")$p.value})
pos <- c(test,pos)
test1 <- apply(x, 1, function(s){fisher.test(matrix(s, nrow = 2),
alternative = "less")$p.value})
neg <- c(test1, neg)
test2 <- apply(x, 1, function(s){fisher.test(matrix(s, nrow = 2))$p.value})
either <- c(test2, either)
}

You can try using lapply to loop over the different alternatives (less, greater, two.sided) and wrap the fisher.test call in your own function. Perhaps something like this:
# Return the Fisher p-value for every row of x under the given alternative
myTest <- function(altn, x){
apply(x, 1, FUN = function(s, alt) {
fisher.test(matrix(s, nrow = 2), alternative = alt)$p.value},
alt = altn)
}
pos <- c()
neg <- c()
either <- c()
for(i in 1:ncol(both)){
x <- cbind(both[,i], gene1[,i], gene2[,i], neither[,i])
rs <- lapply(c('two.sided','greater','less'),myTest,x=x)
pos <- c(rs[[2]],pos)
neg <- c(rs[[3]],neg)
either <- c(rs[[1]],either)
}
Without some test data to check against, I can't promise there are no gotchas in this, but the basic strategy should do what you want.
Note that this still calls fisher.test three times, just in a somewhat more compact form. I don't know of a function that calculates a fisher test with all three alternatives in the same call, but perhaps someone else will weigh in with one.
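For the two one-sided alternatives, the per-row fisher.test calls can probably be avoided altogether: with the default null odds ratio of 1, those p-values are hypergeometric tail probabilities, and phyper accepts whole vectors. The sketch below assumes each row of x is laid out as in the question (both, gene1, gene2, neither, filled column-wise into the 2x2 table). The helper name one_sided_fisher and the small count matrix are made up for illustration. The two-sided p-value is not covered and would still need fisher.test.

```r
# Hypothetical helper: vectorised one-sided Fisher p-values for 2x2 tables.
# Cell counts follow the layout of matrix(s, nrow = 2):
# n11 = [1,1], n21 = [2,1], n12 = [1,2], n22 = [2,2].
one_sided_fisher <- function(n11, n21, n12, n22) {
  m <- n11 + n21   # first column total
  n <- n12 + n22   # second column total
  k <- n11 + n12   # first row total
  list(
    greater = phyper(n11 - 1, m, n, k, lower.tail = FALSE),  # P(X >= n11)
    less    = phyper(n11, m, n, k)                           # P(X <= n11)
  )
}

# Illustrative counts (not from the question): one 2x2 table per row
x <- rbind(c(3, 1, 1, 3),
           c(10, 2, 3, 15),
           c(0, 5, 4, 6))
p <- one_sided_fisher(x[, 1], x[, 2], x[, 3], x[, 4])

# Spot-check the first row against fisher.test itself
ref <- fisher.test(matrix(x[1, ], nrow = 2), alternative = "greater")$p.value
all.equal(p$greater[1], ref)
```

Because phyper works on whole columns, the 8000 rows of each column of both, gene1, gene2 and neither could be passed in a single call instead of going through apply.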

Related

for loop question in r :number of items to replace is not a multiple of replacement length

Hi all,
I'm new to R. I have tried many approaches and still cannot solve this. Can anyone help?
I am trying to produce 3 sets of 100 random values that follow a chi-square distribution. The console says "number of items to replace is not a multiple of replacement length". Any hints on how to fix it?
for(i in 1:3) {
x1[i] <- rchisq(100, df=2)
n1[i] <- length(x1[i])
}
As an explanation for your problem: You are trying to store a vector of 100 elements into a single element, the ith element, of a vector, x1. To illustrate, you could put a vector of values into a vector of the same length:
x <- rnorm(6, 0, 1)
x[1:3] <- c(1,2,3)
x
## [1] 1.0000000 2.0000000 3.0000000 -0.8652300 1.3776699 -0.8817483
You could store them in a list instead: each element of a list is a vector that can be of any length. You will need double square brackets.
x1 <- list()
n1 <- integer(0) # n1 must exist before n1[i] is assigned
for(i in 1:3) {
x1[[i]] <- rchisq(100, df=2)
n1[i] <- length(x1[[i]])
}
Lists and vectors are different types of data structures in R; you can read a lot about them in Advanced R.
It depends on what containers you want to use. There are two containers that come to mind, either a list or matrix.
# list format
x1 = list();
n1 = vector();
for(i in 1:3) {
x1[[i]] <- rchisq(100, df=2)
n1[i] <- length(x1[[i]])
}
Note the double brackets [[i]], as mentioned in the comments.
# matrix format
x1 = matrix(NA, nrow = 100, ncol = 3)
n1 = vector();
for(i in 1:3) {
x1[,i] <- rchisq(100, df=2)
n1[i] <- length(x1[,i])
}
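If all draws have the same length, the loop can also be avoided. As a sketch (not part of the answers above), replicate collects the three samples as the columns of a matrix, and the lengths follow from the matrix dimensions:

```r
set.seed(1)                               # only to make the draws repeatable
x1 <- replicate(3, rchisq(100, df = 2))   # 100 x 3 matrix, one sample per column
n1 <- rep(nrow(x1), ncol(x1))             # length of each sample
dim(x1)
n1
```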

Is it possible to use vector math in R for a summation involving intervals?

Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500 length vector.
This is the equation we're using
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
output_vec <- c()
input_mean <- mean(input_vec)
iter <- seq(1, length(input_vec)-1)
for(val in iter){
iter2 <- seq((val+1), length(input_vec))
gamma_sum <- 0
for(val2 in iter2){
gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
}
output_vec[val] <- (1/length(iter2))*gamma_sum
}
return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
xbar <- mean(x)
n <- length(x)
df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
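The same sum can also be written with plain vector indexing in base R, which speaks to the original "vector math" question. This is a minimal sketch: the name gamma_vec and the toy data are made up for illustration, and it divides by n as gammabar does. Dividing by (n - k) instead would mirror the loop in the question.

```r
# Hypothetical helper: lag-k time-averaged covariance via vector indexing.
gamma_vec <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 0, k < n)
  d <- x - mean(x)                       # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / n   # pairs (x_t, x_{t-k}) for t = k+1..n
}

x_demo <- c(1, 2, 4, 7, 11)              # toy data, chosen for easy arithmetic
gamma_vec(x_demo, 1)
sapply(0:(length(x_demo) - 1), gamma_vec, x = x_demo)  # all lags at once
```

The sapply call still loops over k internally, but each lag is computed with a single vectorised product rather than an inner loop.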

How to create matrix of all 2^n binary sequences of length n using recursion in R?

I know I can use expand.grid for this, but I am trying to learn actual programming. My goal is to take what I have below and use a recursion to get all 2^n binary sequences of length n.
I can do this for n = 1, but I don't understand how I would use the same function in a recursive way to get the answer for higher dimensions.
Here is for n = 1:
binseq <- function(n){
binmat <- matrix(nrow = 2^n, ncol = n)
r <- 0 #row counter
for (i in 0:1) {
r <- r + 1
binmat[r,] <- i
}
return(binmat)
}
I know I have to use probably a cbind in the return statement. My intuition says the return statement should be something like cbind(binseq(n-1), binseq(n)). But, honestly, I'm completely lost at this point.
The desired output should basically recursively produce this for n = 3:
binmat <- matrix(nrow = 8, ncol = 3)
r <- 0 # current row of binmat
for (i in 0:1) {
for (j in 0:1) {
for (k in 0:1) {
r <- r + 1
binmat[r,] <- c(i, j, k)}
}
}
binmat
It should just be a matrix as binmat is being filled recursively.
I quickly wrote this function to generate all N^K permutations of length K for given N characters. Hope it will be useful.
gen_perm <- function(str=c(""), lst=5, levels = c("0", "1", "2")){
if (nchar(str) == lst){
cat(str, "\n")
return(invisible(NULL))
}
for (i in levels){
gen_perm(str = paste0(str,i), lst=lst, levels=levels)
}
}
# sample call
gen_perm(lst = 3, levels = c("x", "T", "a"))
I will return to your problem when I get more time.
UPDATE
I modified the code above to work for your problem. Note that the matrix being populated lives in the global environment. The function also uses the tmp variable to pass rows to the global environment. This was the easiest way for me to solve the problem. Perhaps, there are other ways.
levels <- c(0,1)
nc <- 3
m <- matrix(numeric(0), ncol = nc)
gen_perm <- function(row=numeric(), lst=nc, levels = levels){
if (length(row) == lst){
assign("tmp", row, .GlobalEnv)
with(.GlobalEnv, {m <- rbind(m, tmp); rownames(m) <- NULL})
return(invisible(NULL))
}
for (i in levels){
gen_perm(row=c(row,i), lst=lst, levels=levels)
}
}
gen_perm(lst=nc, levels=levels)
UPDATE 2
To get the expected output you provided, run
m <- matrix(numeric(0), ncol = 3)
gen_perm(lst = 3, levels = c(0,1))
m
Here levels specifies the set of values to draw from (binary in our case), m is an empty matrix to fill, gen_perm generates rows and appends them to m, and lst is the length of each permutation (it must match the number of columns of m).
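A version without global state is also possible. The sketch below is an alternative to the code above, not the answerer's method: it builds the n-column matrix from the (n - 1)-column one by prefixing a 0 and then a 1 to every row, which is the cbind idea from the question.

```r
binseq <- function(n) {
  stopifnot(n >= 1)
  # Base case: the two sequences of length 1
  if (n == 1) return(matrix(c(0, 1), ncol = 1))
  prev <- binseq(n - 1)                    # all sequences of length n - 1
  # Prefix every shorter sequence with 0, then with 1
  rbind(cbind(0, prev), cbind(1, prev))
}

binseq(3)
```

The rows come out in the same order as the nested for loops in the question, because the leading column changes slowest.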

Conditional replacement of values in an array

I want to modify an array but with an element-by-element condition. This is what I want to do
vector <- runif(18, 0,1)
xx <- array(vector, dim=c(2,3,3))
for (i in 1:2) {
for (j in 1:3) {
xx[i,j,1] <- ifelse(xx[i,j,1]<0.5,1,xx[i,j,1])
xx[i,j,2] <- ifelse(xx[i,j,2]<0.4,1.5,xx[i,j,2])
xx[i,j,3] <- ifelse(xx[i,j,3]<0.2,2,xx[i,j,3])
}
}
Is there a more efficient way to do it?
Thanks
Not sure what you mean by efficient but this avoids looping:
vector <- runif(18, 0,1)
xx <- array(vector, dim=c(2,3,3))
xx
xx[,,1][xx[,,1]<.5] <- 1
xx[,,2][xx[,,2]<.4] <- 1.5
xx[,,3][xx[,,3]<.2] <- 2
There are two ways that you could simplify this double loop
Option 1:
vector <- runif(18, 0,1)
xx <- array(vector, dim=c(2,3,3))
xx[,,1][xx[,,1]<.5] = 1
xx[,,2][xx[,,2]<.4] = 1.5
xx[,,3][xx[,,3]<.2] = 2
You still have to write one line for each condition, though.
The second way is to use lapply, but in this case you have to create three vectors: index, threshold, and substitution.
idx = 1:3
thr = c(.5, .4, .2)
sb = c(1, 1.5, 2)
lapply(idx, function(k){
xx[,,k][ xx[,,k]< thr[k] ] <<- sb[k]
})
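The per-slice thresholds can also be applied to the whole array in one step, without lapply or <<-. The following is a sketch of that idea using base R's slice.index; the helper name replace_below is made up for illustration.

```r
# Hypothetical helper: replace values below a per-slice threshold in one step.
replace_below <- function(arr, thr, sb) {
  k <- slice.index(arr, 3)      # third-dimension index of every element
  below <- arr < thr[k]         # compare each element with its slice's threshold
  arr[below] <- sb[k[below]]    # substitute the value belonging to that slice
  arr
}

set.seed(42)                    # only to make the example repeatable
xx <- array(runif(18, 0, 1), dim = c(2, 3, 3))
yy <- replace_below(xx, thr = c(.5, .4, .2), sb = c(1, 1.5, 2))
yy
```

Since the function returns a modified copy, the original xx is left untouched and can be compared against the result.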

Using function "cat" with "replicate" in R

Is there a way to combine the function "cat" with the function "replicate" in R?
I want to see how many "loops" R has already completed at a given moment. However, instead of a "for" loop, I would prefer to use "replicate". See the simple example below:
Data <- rnorm(20,20,3)
# with for loop
N <- 1000
outcome <- NULL
for(i in 1:N){
Data.boot <- sample(Data, replace=TRUE)
outcome[i] <- mean(Data.boot)
cat("\r", i, "of", N)
}
#the same but with replicate
f <- function() {
Data.boot <- sample(Data, replace=TRUE)
outcome <- mean(Data.boot)
return(outcome)
}
replicate(N, f())
Any ideas on how to use "cat" with "replicate" (or other ways to see how many times the function of interest has been executed by "replicate") would be much appreciated. Thank you!
As an alternative, you could use sapply instead of replicate:
Data <- rnorm(20,20,3)
N <- 1000
f <- function(i) {
Data.boot <- sample(Data, replace=TRUE)
cat("\r", i, "of", N)
mean(Data.boot)
}
outcome <- sapply(1:N, f)
or alternatively, using plyr, you could use raply with the progress option (if your main purpose is to see how far through you are):
outcome <- plyr::raply(N, mean(sample(Data, replace = TRUE)), .progress = "text")
You could use scoping in the following way:
i = 0
f <- function() {
Data.boot <- sample(Data, replace=TRUE)
outcome <- mean(Data.boot)
i <<- i + 1
print(i)
return(outcome)
}
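Building on the same scoping idea, a progress bar from the utils package can be advanced inside the expression passed to replicate. This is a sketch that assumes it is run at top level, so that the counter i is found outside the replicated expression:

```r
Data <- rnorm(20, 20, 3)
N <- 1000
i <- 0
pb <- txtProgressBar(min = 0, max = N, style = 3)   # text progress bar from utils
outcome <- replicate(N, {
  i <<- i + 1                  # update the counter defined outside the expression
  setTxtProgressBar(pb, i)     # redraw the bar
  mean(sample(Data, replace = TRUE))
})
close(pb)
```

Replacing the setTxtProgressBar line with cat("\r", i, "of", N) would give the same output style as the for loop in the question.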
