Different results on same expression - r

I have written the sum s below in two ways. The two expressions are mathematically identical, yet they give me two different answers, and the output comes out negative where it should be positive. Please tell me why this happens and how to fix it.
n <- 40
k <- 20
m <- 30
T <- 1.2
t1 <- 3
(1) The first way of expressing the sum
s <- 0
for (j in 0:(m-1)) {
a <- choose(m-1, j)*(-1)^j*(1/(n-m+1+j))*(1-exp(-T*(n-m+1+j)*(1/t1)))
s <- s + a
}
s <- s * m*choose(n, m)
(2) The second way of expressing the same sum is
s <- 0
for (j in 0:(m-1)) {
a <- choose(m-1, j)*(-1)^j*(1/(n-m+1+j))*(1-exp(-T*(n-m+1+j)*(1/t1)))*m*choose(n, m)
s <- s + a
}
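A likely explanation, offered as a sketch rather than a definitive diagnosis: the terms alternate in sign and the coefficients choose(m-1, j) reach roughly 10^8, while the true sum is many orders of magnitude smaller, so almost every digit cancels in double precision. Multiplying by m*choose(n, m) before or after summing changes how each term is rounded, so the two loops accumulate different errors. Because (1-exp(-c*(a+j)))/(a+j) is the integral of exp(-(a+j)*u) over [0, c], the scaled sum equals a regularized incomplete beta integral, which pbeta evaluates without cancellation. The names c0, p, s_stable and s_check are introduced only for this illustration.

```r
# Parameters from the question
n <- 40; m <- 30; T <- 1.2; t1 <- 3

c0 <- T / t1          # upper limit of the integral
p  <- 1 - exp(-c0)    # substitution p = 1 - exp(-u)

# m * choose(n, m) * integral_0^p x^(m-1) * (1-x)^(n-m) dx
# is the regularized incomplete beta function I_p(m, n-m+1)
s_stable <- pbeta(p, m, n - m + 1)

# Cross-check: the same quantity is P(Binomial(n, p) >= m)
s_check <- pbinom(m - 1, n, p, lower.tail = FALSE)

print(c(s_stable, s_check))
```

Both values are probabilities, so they lie between 0 and 1 and cannot come out negative.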

Related

Understanding Breakpoint function: how for loops work inside functions

I have the following exercise to be solved in R. Under the exercise, there is a hint towards the solution.
Exercise: If there are no ties in the data set, the function above will produce breakpoints with h observations in the interval between two consecutive breakpoints (except the last two perhaps). If there are ties, the function will by construction return unique breakpoints, but there may be more than h observations in some intervals.
Hint:
my_breaks <-function(x, h = 5) {
x <-sort(x)
breaks <- xb <- x[1]
k <- 1
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
else{
if(xb<x[i-1]&&x[i-1]<x[i])
{xb <- x[i-1]
breaks <-c(breaks, xb)
k <- 1
}
}
}
However, I am having a hard time understanding the above function, particularly the following lines:
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
Question:
How is the for loop supposed to act on k if k was previously defined as 1 and i is different from k? How are the breakpoints chosen according to the h=5 gap if the for loop is not acting on x? Can someone explain to me how this function works?
Thanks in advance!
First, note that your example is incomplete. The return value and the final brace are missing there. Here is the correct version.
my_breaks <-function(x, h = 5) {
x <- sort(x)
breaks <- xb <- x[1]
k <- 1
for(i in seq_along(x)[-1]){
if(k<h) {
k <- k+1
} else {
if(xb<x[i-1]&&x[i-1]<x[i]){
xb <- x[i-1]
breaks <-c(breaks, xb)
k <- 1
}
}
}
breaks
}
Let's check if it works.
my_breaks(c(1,1,1:5,8:10), 2)
#[1] 1 2 4 8
my_breaks(c(1,1,1:5,8:10), 5)
#[1] 1 3
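As you can see, everything is fine. And what is seq_along(x)[-1]? For a vector with at least two elements, it is the same as 2:length(x). So the for loop walks through the indices of x in sequence, skipping the first one.
What is the k variable for? It counts the observations seen since the last breakpoint. While k < h, the loop only increments k. Once k has reached h, the function adds x[i-1] as a new breakpoint as soon as x[i-1] is larger than the previous breakpoint xb and strictly smaller than x[i], and then resets k to 1. These two conditions keep the breakpoints unique and prevent a run of tied values from being split, which is why some intervals can hold more than h observations.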
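As a small illustration of the index expression (the vectors x and y here are made-up examples, not data from the exercise), the two forms agree for longer vectors but differ when the vector has a single element:

```r
x <- c(3, 1, 4, 1, 5)

idx_a <- seq_along(x)[-1]   # indices 2, 3, 4, 5
idx_b <- 2:length(x)        # also 2, 3, 4, 5
same_for_long_vector <- identical(idx_a, idx_b)

# Edge case: a vector of length 1
y <- 7
idx_short_a <- seq_along(y)[-1]   # empty, so a for loop would not run
idx_short_b <- 2:length(y)        # 2:1 counts down, giving 2 and 1

print(same_for_long_vector)
print(idx_short_a)
print(idx_short_b)
```

This is one reason seq_along(x)[-1] is the safer choice inside a function that might receive a very short vector.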

Cooley-Tukey FFT in R radix-2 DIT case

So I've been trying to (manually) implement the Cooley-Tukey FFT algorithm in R (for inputs of size N=2^n). I tried:
myfft <- function(s){
N <- length(s)
if (N != 1){
s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
for (k in 1:(N/2)){
t <- s[k]
s[k] <- t + exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
s[k+N/2] <- t - exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
}
}
s
}
This runs, but for N=2^n with n>1 it does not compute the right values. I also implemented a plain DFT function and compared it with the built-in fft() function: once normalized, those two agree with each other, but they disagree with my algorithm above.
If anyone is interested and sees where I went wrong, help would be greatly appreciated. I'm going mad searching for the mistake and am starting to question whether I ever understood this FFT algorithm.
UPDATE: I fixed it, I'm not 100% sure where the problem exactly was, but here is the working implementation:
myfft <- function(s){
N <- length(s)
if (N != 1){
t <- s
t[1:(N/2)] <- myfft(s[(1:(N/2))*2-1]) # 1 3 5 7 ...
t[(N/2+1):N] <- myfft(s[(1:(N/2))*2]) # 2 4 6 8 ...
s[1:(N/2)] <- t[1:(N/2)] + exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
s[(N/2+1):N] <- t[1:(N/2)] - exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
}
return(s)
}
The problem was with the following line
s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
which was overwriting part of the untransformed values that were needed on the subsequent line:
s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
For example, when N=4, the second call to myfft uses s[2] and s[4], but the assignment from the first call to myfft writes into s[1] and s[2] (thus overwriting the required original value in s[2]).
Your solution of copying the entire array prevents this overwrite.
An alternate solution commonly used is to copy the even and odd parts separately:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    # transform the two halves from copies, so nothing in s is overwritten early
    odd <- myfft(s[(1:(N/2))*2-1]) # elements 1 3 5 7 ...
    even <- myfft(s[(1:(N/2))*2]) # elements 2 4 6 8 ...
    # twiddle factors for k = 0, ..., N/2 - 1
    w <- exp(-1i*2*pi*(0:(N/2-1))/N)
    s[1:(N/2)] <- odd + w * even
    s[(N/2+1):N] <- odd - w * even
  }
  return(s)
}
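A quick way to check any implementation like these is to compare it against the built-in fft(). The sketch below restates a radix-2 decimation-in-time version under the hypothetical name myfft_dit so that the block runs on its own; it assumes the input length is a power of 2.

```r
# Radix-2 decimation-in-time FFT; assumes length(s) is a power of 2
myfft_dit <- function(s) {
  N <- length(s)
  if (N == 1) return(s)
  half <- N / 2
  odd  <- myfft_dit(s[seq(1, N, by = 2)])   # elements 1, 3, 5, ...
  even <- myfft_dit(s[seq(2, N, by = 2)])   # elements 2, 4, 6, ...
  w <- exp(-2i * pi * (0:(half - 1)) / N)   # twiddle factors
  c(odd + w * even, odd - w * even)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
max_abs_diff <- max(Mod(myfft_dit(x) - fft(x)))
print(max_abs_diff)   # should be at rounding-error level
```

Running the same comparison on the original version from the question should show a clearly nonzero difference for N >= 4, which makes the overwrite bug easy to spot.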

Simplify Simulations on R

As I mentioned in a previous question, I am brand new to programming and have no prior experience, but am very happy to be learning.
However, I've run into the following problem. My professor has given us the following:
sim1 <- function(n) {
xm <- matrix(nrow=n,ncol=2)
for (i in 1:n) {
d <- rnorm(1)
if (runif(1) < 0.5) {
xm[i,1] <- 1
xm[i,2] <- 2.5*d + 69
} else {
xm[i,1] <- 0
xm[i,2] <- 2*d + 64
}
}
return(xm)
}
With the following task: Try to improve the efficiency of this code. Use speed.test to see if it is improved for generating n=1000 observations.
I have finally at least been able to figure out what this code does, nonetheless, I am completely lost on how I could possibly make this code more efficient.
Any help means a whole lot.
Thank you!
If possible, don't use loops in R. rep and rnorm will fill vectors with 5, 10, or 500,000 values all in one call, very quickly. Calling rnorm(1) 500,000 times is a waste and much slower than simply calling rnorm(500000). It's like taking a Ferrari for a drive, going 1 foot and stopping, going 1 foot and stopping, over and over to get to your destination.
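This function returns results that are statistically very similar to those of your function. (One difference: it puts exactly half of the rows in each group, whereas sim1 decides each row with its own coin flip, so its group sizes vary a little from run to run.) Instead of using loops, it does things in the R way.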
sim2 <- function(n) {
n1 <- floor(n/2) #this is how many of the else clause we'll do
n2 <- n - n1 #this is how many of the if clause we'll do
col11 <- rep(0, n1) #bam! we have a vector filled with 0s
col12 <- (rnorm(n1) * 2) + 64 #bam! vector filled with deviates
col21 <- rep(1, n2) #bam! vector filled with 1s
col22 <- (rnorm(n2) * 2.5) + 69 #bam! vector filled with deviates
xm <- cbind(c(col11,col21), c(col12,col22)) #now we have a matrix, 2 cols, n rows
return(xm[sample(nrow(xm)),]) #shuffle the rows, return matrix
}
No loops! The functionality might be obvious but in case it is not, I'll explain. First, n1 & n2 are simply to split the size of n appropriately (accounting for odd numbers).
Next, the binomial process (i.e., if(runif(1) < 0.5) {} else {}) per element can be eliminated since we know that in sim1, on average, half of the rows fall into the if condition and half into the else (see the simulation below). We don't need to decide for each element over and over and over which random path to take when we know that it's 50/50. So, we're going to do ALL the else 50% first: we fill a vector with n/2 0s (col11) and another with n/2 random deviates (mean = 0, sd = 1 by default) and, for each deviate, multiply by 2 and add 64, with result vector col12. That 50% is done.
Next, we finish the second 50% (the if portion). We fill a vector with n/2 1s (col21) and another with random deviates and, for each deviate, multiply by 2.5 and add 69, with result vector col22.
We now have 4 vectors that we'll turn into a matrix.
STEP 1: We glue col11 (filled with n/2 0s) and col21 (filled with n/2 1s) together using the c function to get a vector (n elements).
STEP 2: Glue col12 and col22 together (filled with the deviates) using c to get a vector (like a 1 column x n row matrix). Note: 0s/1s are associated with the correct deviates based on 64/69 formulas.
STEP 3: Use cbind to make a matrix (xm) out of the vectors: 0/1 vector becomes column 1, deviate vector becomes column 2.
STEP 4: Get the number of rows in the matrix (which should just be n) using nrow.
STEP 5: Make a shuffled vector with all the row numbers randomly ordered using sample.
STEP 6: Make a new (unnamed) matrix putting xm's rows in order according to the shuffled vector.
The point of steps 4-6 is just to randomly order the rows, since the binomial process in sim1 would have produced a random order of rows.
This version runs 866% faster!
> system.time({ sim1(500000)})
user system elapsed
1.341 0.179 1.527
> system.time({ sim2(500000)})
user system elapsed
0.145 0.011 0.158
If you're concerned about whether this maintains the integrity of the binomial process, consider that the binomial process does two things: 1) It associates 1 with the 2.5*d+69 equation and 0 with the 2*d + 64 equation - the association is maintained since rows are shuffled intact; 2) about 50% go in the if clause and about 50% in the else clause, as the simulation below illustrates.
sim3 <- function(n) {
a <- 0
for(j in 1:n) {
if(runif(1) < 0.5) {
a <- a + 1
}
}
return(a/n)
}
> sim3(50)
[1] 0.46
> sim3(5000)
[1] 0.4926
> sim3(10000)
[1] 0.5022
> sim3(5000000)
[1] 0.4997844
The binomial process produces, on average, 50% 1s and 50% 0s (column 1), and the observed share gets closer to 50% as n grows.
I'll do what I think is the most obvious step, namely to move rnorm() out of the loop and take advantage of its vectorized nature (as rawr alluded to):
sim2 <- function(n) {
xm <- matrix(nrow=n, ncol=2)
d <- rnorm(n)
for (i in 1:n) {
if (runif(1) < 0.5) {
xm[i,1] <- 1
xm[i,2] <- 2.5*d[i] + 69
} else {
xm[i,1] <- 0
xm[i,2] <- 2*d[i] + 64
}
}
return(xm)
}
n <- 1e3
set.seed(1); system.time(s1 <- sim1(n)); system.time(s2 <- sim2(n))
# user system elapsed
# 0.019 0.004 0.023
# user system elapsed
# 0.010 0.000 0.009
t.test(s1[,2], s2[,2]) # Not identical, but similar, again alluded to by rawr
Just that gives us a reasonable improvement. A similar thing can be done with runif() as well, but I'll leave that to you.
If you want some reading material, I can recommend Hadley Wickham's Advanced R and its chapter on optimising code.
And in case you're wondering, it is indeed possible to eliminate both the loop and the conditionals.
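One way to eliminate both, sketched under the assumption that the goal is to keep sim1's per-row coin flip: draw all n coin flips and all n deviates at once, then build the second column arithmetically. The names sim_vec and is_one are made up for this illustration.

```r
sim_vec <- function(n) {
  is_one <- as.numeric(runif(n) < 0.5)   # 1 with probability 0.5, otherwise 0
  d <- rnorm(n)                          # all standard normal deviates at once
  # is_one == 1 gives 2.5*d + 69; is_one == 0 gives 2*d + 64
  values <- (2 + 0.5 * is_one) * d + 64 + 5 * is_one
  matrix(c(is_one, values), ncol = 2)
}

set.seed(1)
xm <- sim_vec(1000)
print(head(xm))
```

Because each row still gets its own random group, the number of 1s varies from run to run just as it does in sim1. The exact values differ from sim1 for the same seed, since the random numbers are consumed in a different order.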
One optimization I can suggest is to create the matrix with 0 as its default value. Once the matrix is filled with 0s, there is no need to write 0 into it inside the loop.
The modified code will look like:
sim1 <- function(n) {
#create matrix with 0 value.
xm <- matrix(0,nrow=n,ncol=2)
for (i in 1:n) {
d <- rnorm(1)
if (runif(1) < 0.5) {
xm[i,1] <- 1
xm[i,2] <- 2.5*d + 69
} else {
#xm[i,1] <- 0 --- No longer needed
xm[i,2] <- 2*d + 64
}
}
return(xm)
}

Loop inside a loop in R

I am trying to write R code that puts another loop inside of the one I've already created. Here is my code:
t <- rep(1,1000)
omega <- seq(from=1,to=12,by=1)
for(i in 1:1000){
omega <- setdiff(omega,sample(1:12,1))
t[i] <- length(omega)
remove <- 0
f <- length(t [! t %in% remove]) + 1
}
When I run this code, I get the number of trials, f, that it takes for t to reach zero (that is, for omega to become empty), but I want to run 10000 iterations of this experiment.
replicate is probably how you want to run the outer loop. There's also no need for the f assignment to be inside the loop. Here I've moved it outside and converted it to simply count the elements of t that are greater than 0, plus 1.
result <- replicate(10000, {
t <- rep(1, 1000)
omega <- 1:12
for(i in seq_along(t)) {
omega <- setdiff(omega,sample(1:12,1))
t[i] <- length(omega)
}
sum(t > 0) + 1
})
I suspect your code could be simplified in other ways as well, and also that you could just write down the distribution that you're looking for without simulation. I believe your variable of interest is just how long until you get at least one of each of the numbers 1:12, yes?
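If that reading is right, this is the classic coupon collector problem, and the expected number of draws has a closed form: n times the n-th harmonic number. The sketch below compares that value with a small simulation; draws_until_complete and the other names are hypothetical helpers written for this example.

```r
n_values <- 12

# Expected draws until every value has appeared: n * (1 + 1/2 + ... + 1/n)
expected_draws <- n_values * sum(1 / seq_len(n_values))

# Simulate one experiment: draw until all values have been seen
draws_until_complete <- function(n_values) {
  seen <- logical(n_values)
  draws <- 0
  while (!all(seen)) {
    seen[sample.int(n_values, 1)] <- TRUE
    draws <- draws + 1
  }
  draws
}

set.seed(42)
simulated_mean <- mean(replicate(2000, draws_until_complete(n_values)))
print(c(expected = expected_draws, simulated = simulated_mean))
```

For 12 values the closed form gives about 37.24 draws, and the simulated mean should land close to it.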
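Are you just looking to run your existing loop 10,000 times, like below? Note that t and omega have to be reset at the start of every repetition, and f[j] only needs to be computed once the inner loop has finished.
f <- rep(NA, 10000)
for(j in 1:10000) {
  # start each experiment from a fresh state
  t <- rep(1,1000)
  omega <- seq(from=1,to=12,by=1)
  for(i in 1:1000){
    omega <- setdiff(omega,sample(1:12,1))
    t[i] <- length(omega)
  }
  # number of draws it took for omega to become empty
  remove <- 0
  f[j] <- length(t [! t %in% remove]) + 1
}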

creating a loop with combinatorics in r

I'm trying to create a combinatoric function in R and it is dependent on what number I set a variable to. This tells me I would have to use a loop. Here is an example of the equation that I am trying to create and I filled in the numbers given in the example:
The equation changes depending on what number I put in for b. I'm guessing I would need a sum and a loop.
comb = function(n, x) {return(factorial(n) / (factorial(x) * factorial(n-x)))}
a <- 8
b <- 4
c <- 0:b
p <- 0.05
total = function(n) {
return(comb(a, b + c) * comb(n - a, (c - b)*(-1)) / comb(n, a) - 0.05)
}
I will then find n by setting total(n) equal to 0.
You don't need to define comb -- this is provided by the function choose in base R. I think all you need is to combine choose and sum to compute your numerator:
total <- function(n) {
sum(choose(a, b:(2*b)) * choose(n-a, b:0)) / choose(n, a) - p
}
# Plot total(n) for various n values:
a <- 8
b <- 4
p <- 0.05
n <- 20:100
plot(n, sapply(n, total))
For these parameters, total(n) crossed 0 between n=36 and n=37.
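To locate the crossing without reading it off the plot, one option is to look for the sign change in total(n) over the grid. The sketch below does that and also cross-checks the value against phyper; that cross-check assumes these particular parameters, where 2*b equals a, so the sum is a hypergeometric upper tail. The names n_grid, vals, n_last_nonneg and hyper_tail are introduced only for this example.

```r
a <- 8; b <- 4; p <- 0.05

total <- function(n) {
  sum(choose(a, b:(2 * b)) * choose(n - a, b:0)) / choose(n, a) - p
}

n_grid <- 20:100
vals <- sapply(n_grid, total)

# Last n before total(n) changes sign
n_last_nonneg <- n_grid[which(diff(sign(vals)) != 0)]
print(n_last_nonneg)

# Cross-check at n = 36: P(X >= b) for a hypergeometric draw of a items
# from a population with a "successes" and 36 - a "failures"
hyper_tail <- phyper(b - 1, a, 36 - a, a, lower.tail = FALSE) - p
print(c(total(36), hyper_tail))
```

On this grid the sign change sits at n = 36, which agrees with the crossing between 36 and 37 described above.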

Resources