I am struggling to code this recursive program and wondering if anyone could help.
I want to code this recursive equation:
for k=1,2,...
beta(k) = k - sum_{i=0}^{k-1} choose(k,i) * beta(i) * exp(-i*(k-i))
I've done it the manual way in R but would like to put it in a function.
beta0<-0
beta1<-1-choose(1,0)*beta0*exp(-0*lambdaL*(1-0))
beta2<-2-choose(2,0)*beta0*exp(-0*lambdaL*(2-0))-choose(2,1)*beta1*exp(-1*lambdaL*(2-1))
beta3<-3-choose(3,0)*beta0*exp(-0*lambdaL*(3-0))-choose(3,1)*beta1*exp(-1*lambdaL*(3-1))-choose(3,2)*beta2*exp(-2*lambdaL*(3-2))
beta4<-4-choose(4,0)*beta0*exp(-0*lambdaL*(4-0))-choose(4,1)*beta1*exp(-1*lambdaL*(4-1))-choose(4,2)*beta2*exp(-2*lambdaL*(4-2))-choose(4,3)*beta3*exp(-3*lambdaL*(4-3))
You can just define a second loop for the sum. Note that the indexing here begins with 1 rather than 0 which leads to an "index shift".
beta = numeric()
beta[1] <- 0
for (k in 1:10){
beta[k+1] <- k
for (i in 0:(k-1))
beta[k+1] <- beta[k+1] - choose(k, i)*beta[i+1]*exp(-i*(k-i))
}
beta
# [1] 0.000000 1.000000 1.264241 2.080705 3.247551 4.528104 5.748673
# [8] 6.876234 7.941197 8.972749 9.987645
I think you need 2 functions, because you need every previous value of beta as an input, and yet you need only one output. Here's what I propose (adjust it for your lambdaL; your post is ambiguous on that point, and this version corresponds to lambdaL == 1):
beta_vec <- function(k){
if(k == 0) 0 else {
beta_vec_old <- beta_vec(k-1)
c(beta_vec_old,sum(sapply(0:(k-1),function(i){1-choose(k,i)*beta_vec_old[i+1]*exp(-i*(k-i))})))
}}
beta <- function(k){
tail(beta_vec(k),1)
}
# > beta_vec(5)
# [1] 0.000000 1.000000 1.264241 2.080705 3.247551 4.528104
# > beta(5)
# [1] 4.528104
(edited for typo in code)
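Both answers above fix lambdaL at 1. As a minimal sketch, assuming the intended recursion is the one in the question's manual code (with lambdaL inside the exponent), the whole sequence can be wrapped in one function that takes the rate as an argument. The names beta_seq and K are made up for illustration:

```r
# Hypothetical helper: returns beta(0), ..., beta(K) as a vector.
# Assumes the recursion from the question's manual code:
#   beta(k) = k - sum_{i=0}^{k-1} choose(k, i) * beta(i) * exp(-i * lambdaL * (k - i))
beta_seq <- function(K, lambdaL = 1) {
  beta <- numeric(K + 1)            # beta[k + 1] stores beta(k); beta(0) = 0
  for (k in seq_len(K)) {
    i <- 0:(k - 1)                  # all earlier indices, handled in one vectorised sum
    beta[k + 1] <- k - sum(choose(k, i) * beta[i + 1] * exp(-i * lambdaL * (k - i)))
  }
  beta
}

beta_seq(5)
# [1] 0.000000 1.000000 1.264241 2.080705 3.247551 4.528104
```

With the default lambdaL = 1 this reproduces the values printed in the answers above; tail(beta_seq(k), 1) gives a single beta(k).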
I'm a beginner in R.
Here is my problem:
I have a large raster with lots of cells. It's a binary raster, so there are just 0s and 1s. I have to go through the whole raster and find the 0s. If cell [i,j] is a 0, then I need to look pairwise at its 4 neighbours.
I wanted to try this first with a small 7x7 matrix.
My idea was to use a loop like this:
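nr3 <- 0
for (i in 1:7) {
  for (j in 1:7) {
    if (m[i,j] == 0) {
      # R has no ++ operator, so the counter is incremented explicitly
      if (m[i-1,j] != 0 && m[i,j-1] != 0) nr3 <- nr3 + 1
      if (m[i-1,j] != 0 && m[i,j+1] != 0) nr3 <- nr3 + 1
      if (m[i,j+1] != 0 && m[i+1,j] != 0) nr3 <- nr3 + 1
      if (m[i+1,j] != 0 && m[i,j-1] != 0) nr3 <- nr3 + 1
    }
  }
}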
That is what it should do, but I get this error:
Error in if (m[i-1,j]!=0&&m[i,j-1]!=0) {: missing value where TRUE/FALSE needed
I can see the problem: at the boundary, not all of the neighbours exist.
That's why I tried that with
for (i in 2:6)
for (j in 2:6)
It worked, but the problem is that the cells on the boundary are then skipped.
So what could I do?
By the way, I hope there is another way to solve this task. Maybe I don't need a loop? I can imagine that this is not a very good solution for a very large raster.
Does anyone have an idea?
Make use of the raster library. This should be faster than your loop approach:
Dummy Matrix
library(raster)
#create a dummy raster
m <- raster(matrix(c(0,1,1,0,0,1,0,1,0), nrow=3))
> as.matrix(m)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 1
[3,] 1 1 0
Focal window with 4 neighbours
#define the size of your moving window (focal)
f <- matrix(c(0,1,0,1,1,1,0,1,0), nrow=3)
Use the function raster::focal for the pairwise comparison, together with a <<- assignment:
nr3 <- 0
focal(m,
w = f,
pad = T,
padValue = 0,
fun = function(x){
#x[5] corresponds to the center pixel in the moving window
if(x[5]==0){
#x[2],x[4],x[6] and x[8] are the four neighbours
if(x[2]!=0 & x[4]!=0){
nr3 <<- nr3 + 1
}
if(x[8]!=0 & x[4]!=0){
nr3 <<- nr3 + 1
}
if(x[8]!=0 & x[6]!=0){
nr3 <<- nr3 + 1
}
if(x[2]!=0 & x[6]!=0){
nr3 <<- nr3 + 1
}
}
}
)
Output:
> nr3
[1] 3
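As a cross-check that needs no package, the same count can be sketched in base R by padding the matrix with zeros and comparing shifted copies. This is an illustration rather than part of the raster approach; count_corner_pairs, pad_value and mm are hypothetical names, and it works on a plain matrix, not on a raster object:

```r
# Hypothetical helper: for every 0 cell, count the adjacent neighbour pairs
# (up/left, up/right, down/right, down/left) in which both neighbours are non-zero.
count_corner_pairs <- function(m, pad_value = 0) {
  nr <- nrow(m)
  nc <- ncol(m)
  p <- matrix(pad_value, nr + 2, nc + 2)   # one-cell border, so every cell has 4 neighbours
  p[2:(nr + 1), 2:(nc + 1)] <- m
  rows <- 2:(nr + 1)
  cols <- 2:(nc + 1)
  centre <- p[rows, cols] == 0
  up     <- p[rows - 1, cols] != 0
  down   <- p[rows + 1, cols] != 0
  left   <- p[rows, cols - 1] != 0
  right  <- p[rows, cols + 1] != 0
  sum(centre * ((up & left) + (up & right) + (down & right) + (down & left)))
}

mm <- matrix(c(0,1,1,0,0,1,0,1,0), nrow = 3)  # same values as the dummy raster
count_corner_pairs(mm)
# [1] 3
```

Padding with 0 mirrors padValue = 0 in the focal call, so boundary cells are treated the same way in both versions.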
Note:
You have to use the <<- assignment inside the function passed to focal to be able to modify a variable outside the function's environment. To quote Hadley:
The regular assignment arrow, <-, always creates a variable in the
current environment. The deep assignment arrow, <<- , never creates a
variable in the current environment, but instead modifies an existing
variable found by walking up the parent environments.
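The difference described in the quote can be seen in a few lines. This is a minimal sketch; counter, bump_local and bump_outer are names invented for the illustration:

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a new local `counter`; the outer one is untouched
  invisible(NULL)
}

bump_outer <- function() {
  counter <<- counter + 1   # finds `counter` in the enclosing environment and updates it
  invisible(NULL)
}

bump_local()
counter   # still 0
bump_outer()
counter   # now 1
```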
function(q,b,Data1,Data2){
x<-sum(
ifelse(Data1[13+q,b]/Data1[12+q,b]>Data2[13+q,1]/Data2[12+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[11+q,b]>Data2[13+q,1]/Data2[11+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[10+q,b]>Data2[13+q,1]/Data2[10+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[9+q,b]>Data2[13+q,1]/Data2[9+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[8+q,b]>Data2[13+q,1]/Data2[8+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[7+q,b]>Data2[13+q,1]/Data2[7+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[6+q,b]>Data2[13+q,1]/Data2[6+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[5+q,b]>Data2[13+q,1]/Data2[5+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[4+q,b]>Data2[13+q,1]/Data2[4+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[3+q,b]>Data2[13+q,1]/Data2[3+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[2+q,b]>Data2[13+q,1]/Data2[2+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[1+q,b]>Data2[13+q,1]/Data2[1+q,1],1,0)
)/12
}
Is there a way to simplify this? (no characters, only numbers in the data sets)
Thank you
Two pieces of knowledge you can combine to improve your code:
Firstly, you can divide a single number by a vector and R will return a vector with elementwise divisions. For example:
5 / c(1,2,3,4,5,6)
# [1] 5.0000000 2.5000000 1.6666667 1.2500000 1.0000000 0.8333333
The numerator on each side of the inequality is always the same, so you can use the above: instead of writing out the division for every inequality, you do it once against a vector of denominators.
Secondly, an expression with TRUE or FALSE will be coerced to 1 and 0 when you try to perform arithmetic operations (in your case division, or calculating a mean). Inequalities return TRUE or FALSE values. Explicitly telling R to convert them to 0 and 1 is wasted energy, because R will automatically do it in your last step.
Putting this together in a simplified function:
function(q, b, Data1, Data2){
qseq <- (1:12) + q # Replaces all "q+1", "q+2", ... , "q+12"
dat1 <- Data1[qseq, b] # Replaces all "Data1[q+1, b]", ... "Data1[q+12, b]"
dat2 <- Data2[qseq, 1] # Replaces all "Data2[q+1, 1]", ... "Data2[q+12, 1]"
mean( Data1[13+q, b]/dat1 > Data2[13+q, 1]/dat2 )
}
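To check that the vectorised form gives the same number as the explicit ifelse sum, here is a small sketch on made-up data. The matrices, the seed and the function names share_explicit and share_vectorised are assumptions for the illustration, since the question did not include example data:

```r
set.seed(42)
Data1 <- matrix(runif(60, min = 1, max = 2), nrow = 20)  # 20 rows, 3 columns (made up)
Data2 <- matrix(runif(20, min = 1, max = 2), nrow = 20)  # 20 rows, 1 column (made up)

# Same logic as the question: 12 ifelse() comparisons, summed and divided by 12
share_explicit <- function(q, b, Data1, Data2) {
  hits <- sapply(1:12, function(i) {
    ifelse(Data1[13 + q, b] / Data1[i + q, b] > Data2[13 + q, 1] / Data2[i + q, 1], 1, 0)
  })
  sum(hits) / 12
}

# Vectorised form: one division per side, logicals averaged directly
share_vectorised <- function(q, b, Data1, Data2) {
  qseq <- (1:12) + q
  mean(Data1[13 + q, b] / Data1[qseq, b] > Data2[13 + q, 1] / Data2[qseq, 1])
}

all.equal(share_explicit(2, 3, Data1, Data2), share_vectorised(2, 3, Data1, Data2))
```

With 20 rows, q can range from 0 to 7 here, because row 13 + q must exist.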
This simplifies it a bit:
function(q,b,Data1,Data2){
data1_num <- Data1[13+q,b]
data2_num <- Data2[13+q,1]
x <- 0
for (i in 1:12) {
x <- x + ((data1_num/Data1[i+q,b]) > (data2_num /Data2[i+q,1]))
}
x / 12 # return the share of the 12 comparisons that are TRUE
}
But if you provide example data and the output you are expecting, I'm sure there is a way to simplify it further.
I am trying to generate n random numbers whose sum is less than 1.
So I can't just run runif(3). But I can condition each iteration on the sum of all values generated up to that point.
The idea is to start with an empty vector, v, and set up a loop such that in each iteration, i, a runif() value is generated, but before it is accepted as an element of v, i.e. v[i] <- runif(), the test sum(v) < 1 is carried out. If the test is TRUE, the entry v[i] is accepted; if it is FALSE, that is, the sum would reach or exceed 1, v[i] is tossed out of the vector and iteration i is repeated.
I am far from implementing this idea, but I would like to resolve it along the lines of something similar to what follows. It's not so much a practical problem, but more of an exercise to understand the syntax of loops in general:
n <- 4
v <- 0
for (i in 1:n){
rdom <- runif(1)
if((sum(v) + rdom) < 1) v[i] <- rdom
# otherwise: keep trying before moving on to iteration i + 1???? i <- stays i?????
}
I have looked into while (actually I incorporated the while function in the title); however, I need the vector to have n elements, so I get stuck if I try something that basically tells R to add random uniform realizations as elements of the vector v while sum(v) < 1, because I can end up with fewer than n elements in v.
Here's a possible solution. It originally used the more generic repeat; I edited it to use while, which saves a couple of lines.
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
while (i < n) {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
}
v
# [1] 0.89669720 0.06178627 0.01339033 0.02333120
With a repeat block you have to check the stopping condition yourself, but apart from that (and with v preallocated, so the vector is not grown inside the loop) it looks very similar:
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
repeat {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
if (i == n) break
}
If you really want to keep exactly the same procedure that you have posted (i.e., iteratively sample the n values one at a time from the standard uniform distribution, rejecting any samples that cause your sum to exceed 1), then the following code is mathematically equivalent, shorter, and more efficient:
samp <- function(n) {
v <- rep(0, n)
for (i in 1:n) {
v[i] <- runif(1, 0, 1-sum(v))
}
v
}
Basically, this code uses the mathematical fact that if the sum of the vector is currently sum(v), then sampling from the standard uniform distribution until you get a value no greater than 1-sum(v) is exactly equivalent to sampling in the uniform distribution from 0 to 1-sum(v). The advantage of using the latter approach is that it's much more efficient -- we don't need to keep rejecting samples and trying again, and can instead just sample once for each element.
To get a sense of the runtime differences, consider sampling 100 observations with n=10, comparing to a working implementation of the code from your post (copied from my other answer to this question):
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
set.seed(144)
system.time(samples.OP <- replicate(100, OP(10)))
# user system elapsed
# 261.937 1.641 265.805
system.time(samples.josliber <- replicate(100, samp(10)))
# user system elapsed
# 0.004 0.001 0.004
In this case, the new approach is roughly 66,000 times faster (265.805 versus 0.004 seconds elapsed).
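A quick simulation can make the equivalence claim plausible for n = 2, where both procedures should give a first element averaging about 0.5 and a second element averaging about 0.25. This is only a sketch; rejection2, direct2, the seed and the number of replicates are choices made for the illustration:

```r
# Rejection version: redraw the second value until the pair fits under 1
rejection2 <- function() {
  v1 <- runif(1)
  v2 <- runif(1)
  while (v1 + v2 > 1) v2 <- runif(1)
  c(v1, v2)
}

# Direct version: draw the second value from Uniform(0, 1 - v1) in one go
direct2 <- function() {
  v1 <- runif(1)
  c(v1, runif(1, 0, 1 - v1))
}

set.seed(1)
reps <- 2000
m_rej <- rowMeans(replicate(reps, rejection2()))  # means of element 1 and element 2
m_dir <- rowMeans(replicate(reps, direct2()))
round(rbind(rejection = m_rej, direct = m_dir), 2)
```

The two rows should agree up to simulation noise. The rejection loop occasionally needs very many redraws when the first value is close to 1, which is consistent with the large runtime gap reported above.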
It sounds like you're trying to uniformly sample from a space of n variables where the following constraints hold:
x_1 + x_2 + ... + x_n <= 1
x_1 >= 0
x_2 >= 0
...
x_n >= 0
The "hit and run" algorithm is the mathematical machinery that enables you to do exactly this. In 2-dimensional space, the algorithm samples uniformly from the triangle with vertices (0, 0), (1, 0) and (0, 1), with each location in that triangle being equally likely to be selected.
The algorithm is provided in R through the hitandrun package, which requires you to specify the linear inequalities that define the space through a constraint matrix, direction vector, and right-hand side vector:
library(hitandrun)
n <- 3
constr <- list(constr = rbind(rep(1, n), -diag(n)),
dir = c(rep("<=", n+1)),
rhs = c(1, rep(0, n)))
set.seed(144)
samples <- hitandrun(constr, n.samples=1000)
head(samples, 10)
# [,1] [,2] [,3]
# [1,] 0.28914690 0.01620488 0.42663224
# [2,] 0.65489979 0.28455231 0.00199671
# [3,] 0.23215115 0.00661661 0.63597912
# [4,] 0.29644234 0.06398131 0.60707269
# [5,] 0.58335047 0.13891392 0.06151205
# [6,] 0.09442808 0.30287832 0.55118290
# [7,] 0.51462261 0.44094683 0.02641638
# [8,] 0.38847794 0.15501252 0.31572793
# [9,] 0.52155055 0.09921046 0.13304728
# [10,] 0.70503030 0.03770875 0.14299089
Breaking down this code a bit, we generated the following constraint matrix:
constr
# $constr
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] -1 0 0
# [3,] 0 -1 0
# [4,] 0 0 -1
#
# $dir
# [1] "<=" "<=" "<=" "<="
#
# $rhs
# [1] 1 0 0 0
Reading across the first line of constr$constr we have 1, 1, 1 which indicates "1*x1 + 1*x2 + 1*x3". The first element of constr$dir is <=, and the first element of constr$rhs is 1; putting it together we have x1 + x2 + x3 <= 1. From the second row of constr$constr we read -1, 0, 0 which indicates "-1*x1 + 0*x2 + 0*x3". The second element of constr$dir is <= and the second element of constr$rhs is 0; putting it together we have -x1 <= 0 which is the same as saying x1 >= 0. The similar non-negativity constraints follow in the remaining rows.
Note that the hit and run algorithm has the nice property of having the exact same distribution for each of the variables:
hist(samples[,1])
hist(samples[,2])
hist(samples[,3])
Meanwhile, the distribution of the samples from your procedure will be highly uneven, and as n increases this problem will get worse and worse.
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
samples.OP <- t(replicate(1000, OP(3)))
hist(samples.OP[,1])
hist(samples.OP[,2])
hist(samples.OP[,3])
An added advantage is that the hit-and-run algorithm appears faster -- I generated these 1000 replicates in 0.006 seconds on my computer with hit-and-run and it took 0.3 seconds using the modified code from the OP.
Here's how I would do it, without any loop, if or while:
set.seed(123)
x <- runif(1) # start with the sum that you want to obtain
n <- 4 # number of generated random numbers, can be chosen arbitrarily
y <- sort(runif(n-1,0,x)) # choose n-1 random points to cut the range [0, x]
z <- c(y[1],diff(y),x-y[n-1]) # result: determine the length of the segments
#> z
#[1] 0.11761257 0.10908627 0.02723712 0.03364156
#> sum(z)
#[1] 0.2875775
#> all.equal(sum(z),x)
#[1] TRUE
The advantage here is that you can determine exactly which sum you want to obtain and how many numbers n you want to generate for this. If you set, e.g., x <- 1 in the second line, the n random numbers stored in the vector z will add up to one.
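The same cutting idea can be wrapped in a small function so that the target sum is an argument. This is a sketch; the name rand_fixed_sum and its arguments are invented here, and it assumes n >= 2:

```r
# Hypothetical helper: n non-negative random numbers that add up to `total`,
# obtained by cutting the interval [0, total] at n - 1 random points.
rand_fixed_sum <- function(n, total = 1) {
  stopifnot(n >= 2, total > 0)
  cuts <- sort(runif(n - 1, 0, total))
  diff(c(0, cuts, total))   # lengths of the n segments
}

set.seed(123)
z <- rand_fixed_sum(4, total = 0.9)
all.equal(sum(z), 0.9)
# [1] TRUE
```

Choosing any total below 1 satisfies the original requirement that the sum be less than 1.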
I am a complete statistical noob and new to R, hence the question. I've tried to find an implementation of the Rao score test for the particular case when one's data is binary and each observation has a Bernoulli distribution. I stumbled upon anova in R but failed to understand how to use it. Therefore, I tried implementing the Rao score test for this particular case myself:
rao.score.bern <- function(data, p0) {
# assume `data` is a list of 0s and 1s
y <- sum(data)
n <- length(data)
phat <- y / n
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
p.value <- 2 * (1 - pnorm(abs(z)))
}
I am pretty sure that there is a bug in my code because it produces only two distinct p-values in the following scenario:
p0 <- 1 / 4
p <- seq(from=0.01, to=0.5, by=0.01)
n <- seq(from=5, to=70, by=1)
g <- expand.grid(n, p)
data <- apply(g, 1, function(x) rbinom(x[1], 1, x[2]))
p.values <- sapply(data, function(x) rao.score.bern(x[[1]], p0))
Could someone please show me where the problem is? Could you perhaps point me to a built-in solution in R?
First test, then debug.
Test
Does rao.score.bern work at all?
rao.score.bern(c(0,0,0,1,1,1), 1/6)
This prints...nothing! The last line of the function is an assignment, and the value of an assignment is returned invisibly. Fix it by replacing the last line by
2 * (1 - pnorm(abs(z)))
This eliminates the unnecessary assignment.
rao.score.bern(c(0,0,0,1,1,1), 1/6)
[1] 0.02845974
OK, now we're getting somewhere.
Debug
Unfortunately, the code still doesn't work. Let's debug by yanking the call to rao.score.bern and replacing it by something that shows us the input. Don't apply it to the large input you created! Use a small piece of it:
sapply(data[1:5], function(x) x[[1]])
[1] 0 0 0 0 0
That's not what you expected, is it? It's returning just one zero for each element of data. What about this?
sapply(data[1:5], function(x) x)
[[1]]
[1] 0 0 0 0 0
[[2]]
[1] 0 0 0 0 0 0
...
[[5]]
[1] 0 0 0 0 0 0 0 0 0
Much better! The variable x in the call to sapply refers to the entire vector, which is what you want to pass to your routine. Whence
p.values <- sapply(data, function(x) rao.score.bern(x, p0)); hist(p.values)
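The question also asked about a built-in. As far as I can tell, prop.test with correct = FALSE computes the square of the same score statistic, so its p-value should match. The sketch below puts the corrected function next to that call; the example data x are made up, and the agreement is something to verify on your own data rather than take on trust:

```r
# Corrected version of the function from the question: the last expression is the return value
rao.score.bern <- function(data, p0) {
  n <- length(data)
  phat <- sum(data) / n
  z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
  2 * (1 - pnorm(abs(z)))
}

x <- rep(c(0, 1), times = c(30, 20))   # made-up sample: 20 successes out of 50
p0 <- 1 / 4

p_manual  <- rao.score.bern(x, p0)
# Built-in one-sample proportion test without continuity correction
p_builtin <- prop.test(sum(x), length(x), p = p0, correct = FALSE)$p.value

all.equal(p_manual, p_builtin)
```

The sample is kept reasonably large so that prop.test does not warn about the chi-squared approximation.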
a) Create a vector X of length 20, with the kth element in X = 2k, for k=1…20. Print out the values of X.
b) Create a vector Y of length 20, with all elements in Y equal to 0. Print out the values of Y.
c) Using a for loop, reassign the value of the kth element in Y, for k = 1…20. When k < 12, the kth element of Y is reassigned as the cosine of k. When k ≥ 12, the kth element of Y is reassigned as the value of the integral of sqrt(t)dt from 0 to k.
For the first two questions, it is simple.
> x1 <- seq(1,20,by=1)
> x <- 2 * x1
> x
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
> y <- rep(0,20)
> y
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I got stuck on the last one:
t <- function(i) sqrt(i)
for (i in 1:20) {
if (i < 12) {
y[i] <- cos(i)
}
else if (i >= 12) {
y[i] <- integral(t, lower= 0, Upper = 20)
}
}
y # print new y
Any suggestions? Thanks.
What may help is that the command to calculate a one-dimensional integral is integrate not integral.
You have successfully completed the first two, so I'll demonstrate a different way of getting those vectors:
x <- 2 * seq_len(20)
y <- double(length = 20)
As for your function, you have the right idea, but you need to clean up your syntax a bit. Double-check your braces (a set style like Hadley Wickham's helps prevent syntax errors and makes the code more readable). You don't need the "if" in the else. Read up on integrate to see what its inputs and, importantly, its outputs are (which of them you need and how to extract it). Lastly, you need to return a value from your function. Hopefully, that's enough to help you work it out on your own. Good luck!
Update
Slightly different function to demonstrate coding style and some best practices with loops
Given that a working answer has been posted, this is what I did when looking at your question. I think it is worth posting, as it is a good habit to 1) pre-allocate answers, 2) prevent confusion about scope by not re-using the input variable name as an output, and 3) use the seq_len and seq_along constructions in for loops, per The R Inferno, which is required reading in my opinion:
tf <- function(y){
z <- double(length = length(y))
for (k in seq_along(y)) {
if (k < 12) {
z[k] <- cos(k)
} else {
z[k] <- integrate(f = sqrt, lower = 0, upper = k)$value
}
}
return(z)
}
Which returns:
> tf(y)
[1] 0.540302306 -0.416146837 -0.989992497 -0.653643621 0.283662185 0.960170287 0.753902254
[8] -0.145500034 -0.911130262 -0.839071529 0.004425698 27.712816032 31.248114562 34.922139530
[15] 38.729837810 42.666671456 46.728535669 50.911693960 55.212726149 59.628486093
To be honest, you almost have it, and it is good that you have shown some code here:
y <- rep(0,20) #y vector from question 2
for ( k in 1:20) { #start the loop
if (k < 12) { #if k less than 12
y[k] <- cos(k) #calculate cosine
} else if( k >= 12) { #else if k greater or equal to 12
y[k] <- integrate( sqrt, lower=0, upper=k)$value #see below for explanation
}
}
print(y) #prints y
> print(y)
[1] 0.540302306 -0.416146837 -0.989992497 -0.653643621 0.283662185 0.960170287 0.753902254 -0.145500034 -0.911130262 -0.839071529 0.004425698
[12] 27.712816032 31.248114562 34.922139530 38.729837810 42.666671456 46.728535669 50.911693960 55.212726149 59.628486093
First of all stats::integrate is the function you need to calculate the integral
integrate( sqrt, lower=0, upper=2)$value
The first argument is a function which in your case is sqrt. sqrt is defined already in R so there is no need to define it yourself explicitly as t <- function(i) sqrt(i)
The other two arguments as you correctly set in your code are lower and upper.
The function integrate( sqrt, lower=0, upper=2) will return:
1.885618 with absolute error < 0.00022
and that is why you need integrate( sqrt, lower=0, upper=2)$value to only extract the value.
Type ?integrate in your console to see the documentation which will help you a lot I think.
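As a sanity check on the numerical results, the integral of sqrt(t) from 0 to k has the closed form (2/3) * k^(3/2), so the whole vector can also be built without a loop. This is a sketch that goes beyond what the exercise asks for (the exercise requires a for loop); y_closed and y_num are names chosen for the illustration:

```r
k <- 1:20

# cos(k) for k < 12, closed-form integral (2/3) * k^(3/2) for k >= 12
y_closed <- ifelse(k < 12, cos(k), (2 / 3) * k^1.5)

# Numerical integrals for k = 12, ..., 20, as in the loop versions above
y_num <- sapply(12:20, function(u) integrate(sqrt, lower = 0, upper = u)$value)

# The two agree up to the numerical error of integrate()
all.equal(y_closed[12:20], y_num, tolerance = 1e-4)
```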