I'm trying to write a function that performs a given number (n) of t-tests on a random set of normal data of size k. The output should be a count of the total number of significant (<0.05) t-tests and a ratio of significant to overall t-tests. I wrote this function below:
StatPractice <- function(n, k) {
i = 1
length <- k
size <- n
while(i <= size){
k1 <- rnorm(length)
k2 <- rnorm(length)
t <- t.test(k1, k2)
p <- cbind(t$p.value)
i <- i + 1;
q <- c(p <= 0.05)
count <- length(q[q==TRUE])
prop <- count/size
print(q)
}
cat("count of significant t-tests:", count, "\n",
"proportion of significant t-tests:", prop, "\n")
}
I've tooled with this in a number of ways, but essentially, the output is something like this:
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
count of significant t-tests: 0
proportion of significant t-tests: 0
Could someone help me figure out why the count is unable to recognize q as a single vector and thus unable to give correct output for number of TRUE values?
You need a vector to store the p-values. Currently, the object p stores only the last value. You can create a vector before the loop starts (p <- numeric(size)). Within the loop, you assign the current p-value to the vector p at index i (p[i] <- t$p.value). The counting of significant p-values has to be done after the loop. Below is a modified version of your function.
StatPractice <- function(n, k) {
i = 1
length <- k
size <- n
p <- numeric(size)
while(i <= size){
k1 <- rnorm(length)
k2 <- rnorm(length)
t <- t.test(k1, k2)
p[i] <- t$p.value
i <- i + 1
}
q <- p <= 0.05
count <- sum(q)
prop <- count/size
print(q)
cat("count of significant t-tests:", count, "\n",
"proportion of significant t-tests:", prop, "\n")
}
Note that length(q[q==TRUE]) has been replaced with the simpler command sum(q). Furthermore, the function now prints q once, after the loop, instead of on every iteration.
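As a possible variation on the function above, the loop counter can be dropped altogether by letting replicate() collect the p-values. This is only a sketch: the name simulate_ttests and the alpha argument are made up for this example, and the function returns its results in a list rather than printing them.

```r
# Sketch: the same simulation without an explicit loop counter.
# `simulate_ttests` and `alpha` are names chosen for this example.
simulate_ttests <- function(n, k, alpha = 0.05) {
  # replicate() evaluates the expression n times and collects the p-values
  p <- replicate(n, t.test(rnorm(k), rnorm(k))$p.value)
  significant <- p <= alpha
  list(count = sum(significant),   # number of significant tests
       prop  = mean(significant),  # share of significant tests
       p     = p)                  # all p-values, for inspection
}

set.seed(1)
res <- simulate_ttests(n = 100, k = 20)
res$count
res$prop
```

Returning a list keeps the p-values available for later checks, which printing alone does not.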
I am trying to compute the intersection of three vectors. I want x, y and z to take the values shown below and to be used as indices into trans.list, prot.list and PT.list.
#v = number of vectors (number of sets)
V <- 3
#N= number of clusters in each set
N <- 5
intersects <- vector(length=N^V,mode='character')
for (x in 1:N) {
for (y in 1:N) {
for (z in 1:N) {
inters <- length(Reduce(intersect,
list(
trans.list[[x]][["NAME"]],
prot.list[[y]][["NAME"]],
PT.list[[z]][["NAME"]]
)
))
intersects[x, y, z] <- inters
However, at the end I can't seem to save the result at index [x, y, z]. Any advice?
[1] "1,1,1"
[1] "1,1,2"
[1] "1,1,3"
[1] "1,1,4"
[1] "1,1,5"
[1] "1,2,1"
[1] "1,2,2"
...
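One way this could be approached, sketched under assumptions: a plain vector has a single dimension, so intersects[x, y, z] cannot address it, whereas a 3-dimensional array has one cell per (x, y, z) combination. The lists below are toy stand-ins generated at random (the real trans.list, prot.list and PT.list are not shown in the question), and make_clusters is a helper invented for this example.

```r
# Toy stand-ins for trans.list, prot.list and PT.list (hypothetical data):
# each is a list of N clusters, and each cluster has a NAME vector.
set.seed(42)
N <- 5
make_clusters <- function() {
  lapply(seq_len(N), function(i) list(NAME = sample(letters, 8)))
}
trans.list <- make_clusters()
prot.list  <- make_clusters()
PT.list    <- make_clusters()

# A 3-dimensional array has one cell per (x, y, z) combination
intersects <- array(0L, dim = c(N, N, N))
for (x in seq_len(N)) {
  for (y in seq_len(N)) {
    for (z in seq_len(N)) {
      common <- Reduce(intersect, list(trans.list[[x]][["NAME"]],
                                       prot.list[[y]][["NAME"]],
                                       PT.list[[z]][["NAME"]]))
      intersects[x, y, z] <- length(common)  # size of the 3-way intersection
    }
  }
}
intersects[1, 2, 3]
```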
I am having problems when saving the results in a for loop.
I am computing a variance (this is not relevant I think) and my code is:
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
for (j in ncol(data_zeros)){
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
In particular, alpha is a vector with 20 values, sum_alpha is a scalar, data_zeros is my dataset, which has 20 rows and 50 columns, and N is the sum of each column of the dataset, so it is a vector with 50 values.
What I want to do seems very simple:
I want to get a list of 50 vectors, where each one differs from the others only in that it is multiplied by a different value of N.
I really hope that somebody can help me find the error.
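The problem is (probably) that you recompute the constants every time j is increased and, more importantly, that you clear the list in each step with the line var_s_dirm <- list(). Also note that for (j in ncol(data_zeros)) runs only once, with j equal to the number of columns; to visit every column you need 1:ncol(data_zeros).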
See if this works for you
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
for (j in 1:ncol(data_zeros)){
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
Output:
var_s_dirm
[[1]]
[1] 2.614833 2.327105 2.500483 3.047700 2.233528 2.130223 2.700103 2.869699 2.930213 2.575903 2.198459 2.846096
[13] 2.425448 3.517559 3.136266 2.565345 2.578267 2.763113 2.709707 3.420792
[[2]]
[1] 2.568959 2.286279 2.456615 2.994231 2.194343 2.092850 2.652732 2.819353 2.878806 2.530712 2.159889 2.796165
[13] 2.382897 3.455848 3.081244 2.520339 2.533034 2.714637 2.662168 3.360778
[[3]]
[1] 3.211199 2.857849 3.070769 3.742790 2.742930 2.616064 3.315916 3.524193 3.598509 3.163391 2.699862 3.495207
[13] 2.978622 4.319811 3.851556 3.150424 3.166294 3.393297 3.327711 4.200974
....
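The same list could also be built without an explicit loop by mapping over N with lapply(). The sketch below uses made-up stand-in values for alpha, sum_alpha and N so that it runs without the dirmult package; only the final lapply() line corresponds to the loop above.

```r
# Hypothetical stand-ins so the sketch runs without the dirmult package
set.seed(1)
alpha <- runif(20, 0.5, 1.5)             # plays the role of dirmult$gamma
sum_alpha <- sum(alpha)                  # plays the role of (1 - theta) / theta
N <- sample(80:120, 50, replace = TRUE)  # plays the role of colSums(data_zeros)

A <- alpha / sum_alpha
B <- 1 - A
C <- 1 + sum_alpha

# One list element per value of N, each a vector of length(alpha)
var_s_dirm <- lapply(N, function(Nj) Nj * A * B * ((Nj + sum_alpha) / C))
length(var_s_dirm)
```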
I am writing a function to perform bit inversion for each row of a binary matrix which depends on a predefined n value. The n value will determine the number of 1 bits for each row of the matrix.
set.seed(123)
## generate a random 5 by 10 binary matrix
init <- t(replicate(5, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
n <- 3
## init_1 is used to explain my problem (single row matrix)
init_1 <- t(replicate(1, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
The bit_inversion function does the following:
If the selected row has fewer 1's than n, it randomly selects as many 0 positions as the difference and inverts them (0 to 1).
Else, if the selected row has more 1's than n, it randomly selects as many 1 positions as the difference and inverts them (1 to 0).
Else, it does nothing (the row already has exactly n 1's).
Below is the function I implemented:
bit_inversion<- function(pop){
for(i in 1:nrow(pop)){
difference <- abs(sum(pop[i,]) - n)
## checking condition where there are more bits being turned on than n
if(sum(pop[i,]) > n){
## determine position of 1's
bit_position_1 <- sample(which(pop[i,]==1), difference)
## bit inversion
for(j in 1:length(bit_position_1)){
pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
}
}
else if (sum(pop[i,]) < n){
## determine position of 0's
bit_position_0 <- sample(which(pop[i,]==0), difference)
## bit inversion
for(j in 1:length(bit_position_0)){
pop[bit_position_0[j]] <- abs(pop[bit_position_0[j]] - 1)
}
}
}
return(pop)
}
Outcome:
call <- bit_inversion(init)
> rowSums(call) ## supposed to be all 3
[1] 3 4 5 4 3
But when using init_1 (a single row matrix), the function seems to work fine.
Outcome:
call_1 <- bit_inversion(init_1)
> rowSums(call_1)
[1] 3
Is there a mistake in my for and if...else loop?
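In the inner for loop over j, change the line
pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
into
pop[i,bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
You forgot the row index, so the assignment used a single linear index into the matrix instead of addressing row i. That is also why the single-row init_1 worked: with only one row, the linear index equals the column index. The line in the else if branch needs the same fix, with pop[i,bit_position_0[j]] on both sides.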
And, here is a more compact version of your for loop:
for(i in 1:nrow(pop)){
difference <- abs(sum(pop[i,]) - n)
logi <- sum(pop[i,]) > n
pop[i,sample(which(pop[i,]==logi), difference)] <- !logi
}
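For completeness, the compact loop could be wrapped into a full function roughly as follows. This is a sketch, not the original poster's code: passing n as an argument and indexing through sample.int() are choices made here. The latter avoids the case where sample() receives a single number and samples from 1:number instead.

```r
# Sketch of the full function; passing n as an argument is a choice made here.
bit_inversion <- function(pop, n) {
  for (i in seq_len(nrow(pop))) {
    ones <- sum(pop[i, ])
    difference <- abs(ones - n)
    if (difference == 0) next            # row already has n ones
    too_many <- ones > n                 # TRUE: flip 1s to 0; FALSE: flip 0s to 1
    candidates <- which(pop[i, ] == too_many)
    # index via sample.int so a single candidate is not treated as 1:candidate
    flip <- candidates[sample.int(length(candidates), difference)]
    pop[i, flip] <- as.numeric(!too_many)
  }
  pop
}

set.seed(123)
init <- t(replicate(5, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
rowSums(bit_inversion(init, n = 3))
# [1] 3 3 3 3 3
```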
I am trying to generate n random numbers whose sum is less than 1.
So I can't just run runif(3). But I can condition each iteration on the sum of all values generated up to that point.
The idea is to start with an empty vector, v, and set up a loop such that in each iteration, i, a runif() value is generated. Before it is accepted as an element of v, i.e. v[i] <- runif(), the test sum(v) > 1 is carried out: if FALSE, the entry v[i] is accepted; if TRUE, that is, the sum is greater than 1, v[i] is tossed out of the vector and iteration i is repeated.
I am far from implementing this idea, but I would like to resolve it along the lines of something similar to what follows. It's not so much a practical problem, but more of an exercise to understand the syntax of loops in general:
n <- 4
v <- 0
for (i in 1:n){
rdom <- runif(1)
if((sum(v) + rdom) < 1) v[i] <- rdom
# keep trying before moving on to iteration i + 1???? i <- stays i?????
}
I have looked into while (actually I incorporated the while function in the title); however, I need the vector to have n elements, so I get stuck if I try something that basically tells R to add random uniform realizations as elements of the vector v while sum(v) < 1, because I can end up with less than n elements in v.
Here's a possible solution. It originally used the more generic repeat; I edited it to use while instead, which saves a couple of lines.
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
while (i < n) {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
}
v
# [1] 0.89669720 0.06178627 0.01339033 0.02333120
Using a repeat block, you still have to check the stopping condition yourself. With v preallocated (so that the vector does not grow inside the loop), it looks very similar:
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
repeat {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
if (i == n) break
}
If you really want to keep exactly the same procedure that you have posted (i.e. iteratively sample the n values one at a time from the standard uniform distribution, rejecting any samples that cause your sum to exceed 1), then the following code is mathematically equivalent, shorter, and more efficient:
samp <- function(n) {
v <- rep(0, n)
for (i in 1:n) {
v[i] <- runif(1, 0, 1-sum(v))
}
v
}
Basically, this code uses the mathematical fact that if the sum of the vector is currently sum(v), then sampling from the standard uniform distribution until you get a value no greater than 1-sum(v) is exactly equivalent to sampling in the uniform distribution from 0 to 1-sum(v). The advantage of using the latter approach is that it's much more efficient -- we don't need to keep rejecting samples and trying again, and can instead just sample once for each element.
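The equivalence can be checked empirically with a small sketch like the one below. The threshold of 0.3 is an arbitrary value chosen for illustration and plays the role of 1-sum(v); both sets of draws should show a mean close to threshold / 2 and similar quantiles.

```r
set.seed(1)
threshold <- 0.3   # plays the role of 1 - sum(v); value chosen for illustration

# Rejection: draw from U(0, 1) and keep only the draws below the threshold
draws <- runif(100000)
rejection <- draws[draws < threshold]

# Direct: draw from U(0, threshold) straight away
direct <- runif(length(rejection), 0, threshold)

# Both should have a mean close to threshold / 2 and similar quantiles
c(mean(rejection), mean(direct))
rbind(quantile(rejection), quantile(direct))
```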
To get a sense of the runtime differences, consider sampling 100 observations with n=10, comparing to a working implementation of the code from your post (copied from my other answer to this question):
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
set.seed(144)
system.time(samples.OP <- replicate(100, OP(10)))
# user system elapsed
# 261.937 1.641 265.805
system.time(samples.josliber <- replicate(100, samp(10)))
# user system elapsed
# 0.004 0.001 0.004
In this case, the new approach is tens of thousands of times faster (roughly 66,000 times, going by the elapsed times above).
It sounds like you're trying to uniformly sample from a space of n variables where the following constraints hold:
x_1 + x_2 + ... + x_n <= 1
x_1 >= 0
x_2 >= 0
...
x_n >= 0
The "hit and run" algorithm is the mathematical machinery that enables you to do exactly this. In 2-dimensional space, the algorithm will sample uniformly from the triangle with vertices (0, 0), (1, 0) and (0, 1), with each location in that triangle being equally likely to be selected.
The algorithm is provided in R through the hitandrun package, which requires you to specify the linear inequalities that define the space through a constraint matrix, direction vector, and right-hand side vector:
library(hitandrun)
n <- 3
constr <- list(constr = rbind(rep(1, n), -diag(n)),
dir = c(rep("<=", n+1)),
rhs = c(1, rep(0, n)))
set.seed(144)
samples <- hitandrun(constr, n.samples=1000)
head(samples, 10)
# [,1] [,2] [,3]
# [1,] 0.28914690 0.01620488 0.42663224
# [2,] 0.65489979 0.28455231 0.00199671
# [3,] 0.23215115 0.00661661 0.63597912
# [4,] 0.29644234 0.06398131 0.60707269
# [5,] 0.58335047 0.13891392 0.06151205
# [6,] 0.09442808 0.30287832 0.55118290
# [7,] 0.51462261 0.44094683 0.02641638
# [8,] 0.38847794 0.15501252 0.31572793
# [9,] 0.52155055 0.09921046 0.13304728
# [10,] 0.70503030 0.03770875 0.14299089
Breaking down this code a bit, we generated the following constraint matrix:
constr
# $constr
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] -1 0 0
# [3,] 0 -1 0
# [4,] 0 0 -1
#
# $dir
# [1] "<=" "<=" "<=" "<="
#
# $rhs
# [1] 1 0 0 0
Reading across the first line of constr$constr we have 1, 1, 1 which indicates "1*x1 + 1*x2 + 1*x3". The first element of constr$dir is <=, and the first element of constr$rhs is 1; putting it together we have x1 + x2 + x3 <= 1. From the second row of constr$constr we read -1, 0, 0 which indicates "-1*x1 + 0*x2 + 0*x3". The second element of constr$dir is <= and the second element of constr$rhs is 0; putting it together we have -x1 <= 0 which is the same as saying x1 >= 0. The similar non-negativity constraints follow in the remaining rows.
Note that the hit and run algorithm has the nice property of having the exact same distribution for each of the variables:
hist(samples[,1])
hist(samples[,2])
hist(samples[,3])
Meanwhile, the distribution of the samples from your procedure will be highly uneven, and as n increases this problem will get worse and worse.
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
samples.OP <- t(replicate(1000, OP(3)))
hist(samples.OP[,1])
hist(samples.OP[,2])
hist(samples.OP[,3])
An added advantage is that the hit-and-run algorithm appears faster -- I generated these 1000 replicates in 0.006 seconds on my computer with hit-and-run and it took 0.3 seconds using the modified code from the OP.
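If installing a package is not an option, the same target distribution can be reached in base R with a different, standard construction. Note that this is not the hit-and-run algorithm: it normalises n + 1 standard exponential draws, which gives a flat Dirichlet sample, and drops the last coordinate, leaving a point that is uniform over the region defined by the constraints above. The function name runif_simplex is made up for this sketch.

```r
# `runif_simplex` is a name made up for this sketch
runif_simplex <- function(n.samples, n) {
  # n + 1 standard exponentials per sample, normalised to sum to 1,
  # give a flat Dirichlet draw; dropping the last column leaves
  # n non-negative values whose sum is at most 1
  e <- matrix(rexp(n.samples * (n + 1)), nrow = n.samples)
  (e / rowSums(e))[, seq_len(n), drop = FALSE]
}

set.seed(144)
s <- runif_simplex(1000, 3)
range(rowSums(s))   # all row sums lie between 0 and 1
colMeans(s)         # each should be near 1 / (n + 1) = 0.25
```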
Here's how I would do it, without any loop, if or while:
set.seed(123)
x <- runif(1) # start with the sum that you want to obtain
n <- 4 # number of generated random numbers, can be chosen arbitrarily
y <- sort(runif(n-1,0,x)) # choose n-1 random points to cut the range [0:x]
z <- c(y[1],diff(y),x-y[n-1]) # result: determine the length of the segments
#> z
#[1] 0.11761257 0.10908627 0.02723712 0.03364156
#> sum(z)
#[1] 0.2875775
#> all.equal(sum(z),x)
#[1] TRUE
The advantage here is that you can determine exactly which sum you want to obtain and how many numbers n you want to generate for this. If you set, e.g., x <- 1 in the second line, the n random numbers stored in the vector z will add up to one.
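The cut-point idea above could be packaged as a small helper, sketched here under the assumption that the desired total is passed in directly. The name random_parts is invented for this example.

```r
# `random_parts` is a name made up for this sketch
random_parts <- function(n, total) {
  cuts <- sort(runif(n - 1, 0, total))   # n - 1 cut points in [0, total]
  diff(c(0, cuts, total))                # lengths of the n resulting segments
}

set.seed(123)
z <- random_parts(4, total = 1)
length(z)
sum(z)
```

Adding 0 and total as outer boundaries before calling diff() replaces the manual handling of the first and last segment.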
I have a setup that looks like below
for(V in (seq(1, 250, by = 5))){
for(n in (seq(1, 250, by = 5))){
# 1) Working Algorithm creating a probability
#    i.e. a vector in range [0:1]
# 2) Take the natural log of this probability
a <- log(lag(Probability), base = exp(1))
# 3) calculate price differences
b <- abs(diff(Price) -1)
# 4) Then compute correlation between a and b
cor(a, b)
# 5) Here I'd like to save this in the corresponding index of matrix
}
}
So that I get a [V, n] sized matrix as output, that collects from each loop.
I have a few problems with this.
The first problem is that my correlation is not computable, as the Probability is often 0, creating a ln(0) = -Inf input in the ln(Probability) vector. Is there a way to compute the std.dev or cor of a Ln vector with -Inf inputs?
My second question is how I save this correlation output into a matrix generated for each loop?
Thanks for your help. I hope this is clear enough.
For your second question (My second question is how I save this correlation output into a matrix generated for each loop?), you could initialise a matrix before the loop and store each computed correlation in the corresponding index like:
sz <- seq(1, 250, by = 5)
out_mat <- matrix(0, nrow=length(sz), ncol=length(sz))
# then continue with your for-loop
for (V in 1:length(sz)) {
for (n in 1:length(sz)) {
# here instead of accessing V and n in computing probability
# use sz[V] and sz[n]
...
...
# after computing the correlation, here use V and n (not sz[V] or sz[n])
out_mat[V, n] <- c # c holds the value of cor(a,b)
}
}
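A complete, runnable version of this pattern might look as follows. The vectors a and b are purely illustrative placeholders (random data whose length depends on sz[V] and sz[n]), since the actual probability algorithm and price data are not shown in the question.

```r
sz <- seq(1, 250, by = 5)
out_mat <- matrix(NA_real_, nrow = length(sz), ncol = length(sz))

set.seed(1)
for (V in seq_along(sz)) {
  for (n in seq_along(sz)) {
    # Placeholder for the real algorithm: two made-up vectors whose
    # length depends on sz[V] and sz[n] (purely illustrative)
    len <- sz[V] + sz[n] + 10
    a <- rnorm(len)
    b <- a + rnorm(len)
    out_mat[V, n] <- cor(a, b)   # store the result at the loop positions
  }
}
dim(out_mat)
```

Initialising the matrix with NA_real_ rather than 0 makes any cell that was never filled easy to spot.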
What you can do with -Inf is replace that by NA, for example:
x = runif(10)
x[3] = 1/0
> is.infinite(x)
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x[is.infinite(x)] <- NA
> x
[1] 0.09936348 0.66624531 NA 0.90689357 0.71578917 0.14655174
[7] 0.59561047 0.41944552 0.67203026 0.03263173
And use the na.rm argument for sd (for cor, the corresponding argument is use = "complete.obs"):
> sd(x, na.rm = TRUE)
[1] 0.3126829
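Applied to the correlation from the question, the same idea might look like the sketch below. The vectors prob and b are made-up data; cor() has no na.rm argument, so the missing pair is dropped with use = "complete.obs" instead.

```r
set.seed(1)
prob <- c(0, runif(9))           # made-up probabilities; the first one is 0
a <- log(prob)                   # log(0) is -Inf
b <- rnorm(10)                   # made-up second series

a[is.infinite(a)] <- NA          # mark the -Inf entry as missing
cor(a, b, use = "complete.obs")  # correlation over the 9 complete pairs
sd(a, na.rm = TRUE)
```

Keep in mind that this drops the observations with probability 0 from the calculation rather than accounting for them.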