Understanding a breakpoint function: how for loops work inside functions (R)

I have the following exercise to solve in R. Below the exercise is a hint towards the solution.
Exercise: If there are no ties in the data set, the function above will produce breakpoints with h observations in the interval between two consecutive breakpoints (except the last two perhaps). If there are ties, the function will by construction return unique breakpoints, but there may be more than h observations in some intervals.
Hint:
my_breaks <-function(x, h = 5) {
  x <-sort(x)
  breaks <- xb <- x[1]
  k <- 1
  for(i in seq_along(x)[-1])
  {if(k<h)
    {k <- k+1}
    else{
      if(xb<x[i-1]&&x[i-1]<x[i])
      {xb <- x[i-1]
        breaks <-c(breaks, xb)
        k <- 1
      }
    }
  }
However, I am having a hard time understanding the above function, particularly the following lines:
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
Question:
How is the for loop supposed to act on k if k is previously defined as 1 and i is different from k? How are the breakpoints chosen according to the h = 5 gap if the for loop is not acting on x? Can someone explain how this function works?
Thanks in advance!

First, note that your example is incomplete: the return value and the final closing brace are missing. Here is the corrected version:
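my_breaks <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]  # the smallest value is always the first breakpoint
  k <- 1                # observations counted since the last breakpoint
  for (i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1
    } else {
      # new breakpoint only if x[i-1] is above the last one and not tied with x[i]
      if (xb < x[i-1] && x[i-1] < x[i]) {
        xb <- x[i-1]
        breaks <- c(breaks, xb)
        k <- 1
      }
    }
  }
  breaks  # return the breakpoints
}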
Let's check if it works.
my_breaks(c(1,1,1:5,8:10), 2)
#[1] 1 2 4 8
my_breaks(c(1,1,1:5,8:10), 5)
#[1] 1 3
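As you can see, everything is fine. And what is seq_along(x)[-1]? seq_along(x) is 1, 2, ..., length(x), and [-1] drops the first element, so for any x with at least two elements this is the same as 2:length(x). (The two differ when x has a single element: seq_along(x)[-1] is empty, while 2:length(x) is 2:1, i.e. c(2, 1).) So the for loop goes through each position of x in order, skipping the first, which is why the body can always look at x[i-1].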
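A quick way to see the difference between the two forms is to try them on a vector with a single element. This is a minimal illustration; the vectors x and y below are arbitrary.

```r
x <- c(7, 3, 9)
seq_along(x)[-1]  # 2 3
2:length(x)       # 2 3, same as above

y <- 5            # a vector with only one element
seq_along(y)[-1]  # integer(0): a for loop over this never runs
2:length(y)       # 2 1: a for loop over this runs twice, counting down
```

With seq_along(x)[-1], a one-element input simply skips the loop, whereas 2:length(x) would make the loop read x[2], which does not exist.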
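What is the k variable for? k is not related to i. i is the position in x, while k counts how many observations have been passed since the last breakpoint. While k < h, the loop only increments k. Once k reaches h, the function checks whether x[i-1] is strictly greater than the last breakpoint xb and strictly smaller than x[i], that is, whether x[i-1] is not tied with the next value. If so, x[i-1] becomes the new breakpoint and k is reset to 1; if not, k stays at h and the check is repeated at the next i. That is how the h parameter sets the spacing, and why ties can leave more than h observations in an interval.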
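To watch i and k evolve separately, one option is a copy of the function that prints the loop state on every iteration. my_breaks_trace is a hypothetical name; apart from the cat() line and writing else { if } as else if, it follows the corrected my_breaks above.

```r
# Same logic as my_breaks(), plus one line that prints the loop state
my_breaks_trace <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]   # first breakpoint is the minimum
  k <- 1                 # observations since the last breakpoint
  for (i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1         # fewer than h counted so far: just count
    } else if (xb < x[i - 1] && x[i - 1] < x[i]) {
      xb <- x[i - 1]     # x[i-1] is above xb and not tied with x[i]
      breaks <- c(breaks, xb)
      k <- 1
    }                    # otherwise (a tie): k stays at h, try again at next i
    cat("i =", i, " x[i-1] =", x[i - 1], " x[i] =", x[i],
        " k =", k, " breaks =", breaks, "\n")
  }
  breaks
}

b <- my_breaks_trace(c(1, 1, 1:5, 8:10), 2)
b  # 1 2 4 8
```

In the printed lines, i always advances by one, while k climbs to h and is reset to 1 only on the iterations where a breakpoint is added; at i = 3 and i = 4 the tied 1s keep k at 2.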

Related

How to generate n random variables from x using sample() function when x has geometric distribution?

I am trying to write a function in R to generate n random variables from x using the sample() function when x ~ Ge(p) (i.e., x has a geometric distribution). In my function I would like to use a while loop.
I think my function needs two inputs, size and p. I also need a for loop in my function. I think something like the following framework will work:
rGE <- function(size,p){
for
i<-1
while()
...
return(i)
}
I would like to develop the function above so that it generates n random variables from x (when x ~ Ge(p)).
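For a home-grown, inefficient (but comprehensible) version of rgeom, something like this should work. Note one difference in convention: this version counts the number of trials up to and including the first success (1, 2, ...), whereas rgeom() counts the number of failures before the first success (0, 1, ...), so subtract 1 if you want to match rgeom():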
my_rgeom <- function(n, p) {
  x <- numeric(n) ## allocate space for the results (all zeros)
  for (i in seq(n)) {
    done <- FALSE
    while (!done) {
      x[i] <- x[i] + 1        ## count one more trial
      done <- runif(1) < p    ## success with probability p
    }
  }
  return(x)
}
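A simulation check makes the difference in convention visible: trial counts start at 1 and have mean 1/p, while rgeom() values start at 0 and have mean (1 - p)/p. The seed, p and sample size below are arbitrary choices.

```r
my_rgeom <- function(n, p) {
  x <- numeric(n)              # one counter per draw, all starting at 0
  for (i in seq(n)) {
    done <- FALSE
    while (!done) {
      x[i] <- x[i] + 1         # count this trial
      done <- runif(1) < p     # success with probability p
    }
  }
  x
}

set.seed(42)
p <- 0.2
a <- my_rgeom(10000, p)        # trials up to and including the first success
b <- rgeom(10000, p)           # failures before the first success
c(min(a), min(b))              # 1 and 0 (a 0 in b is near certain with 10000 draws)
c(mean(a), 1 / p)              # both near 5
c(mean(b), (1 - p) / p)        # both near 4
```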
I'm sure you could use sample() instead of runif() for the innermost loop, but it's not obvious to me how. One piece of advice: if you're unfamiliar with programming, try writing your proposed algorithm out as pseudocode rather than jumping into R-bashing right away. It can be easier if you deal with the logic and the coding nuts-and-bolts separately ...
You could use rgeom:
set.seed(1)
rgeom(n = 10, p = .1)
#> [1] 6 3 23 3 24 13 15 2 20 3
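If the goal is a generator without any explicit loop, one option is inverse-transform sampling. This is a swapped-in technique, not something from the answers here: when U is uniform on (0, 1), floor(log(U) / log(1 - p)) follows the geometric distribution on 0, 1, 2, ..., the same convention as rgeom(); add 1 to count trials instead. rgeom_inv is a hypothetical name, and the sketch assumes 0 < p < 1.

```r
rgeom_inv <- function(n, p) {
  stopifnot(p > 0, p < 1)      # log(1 - p) must be finite and negative
  u <- runif(n)                # one uniform draw per value
  floor(log(u) / log(1 - p))   # number of failures before the first success
}

set.seed(1)
g <- rgeom_inv(10000, 0.1)
mean(g)                        # close to (1 - 0.1) / 0.1 = 9
```

In practice rgeom() itself is the simplest choice; the sketch only shows why a single uniform draw per value is enough.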
I have finally written the below function:
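rge <- function(n, p) {
  x <- numeric(n)
  for (i in seq(n)) {
    j <- 0
    while (j == 0) {
      x[i] <- x[i] + 1  # count one more trial
      # size = 1: exactly one Bernoulli(p) draw per trial (without it, sample() draws 2 values)
      j <- sample(0:1, size = 1, prob = c(1 - p, p))
    }
  }
  return(x)
}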
rge(10,.2)
I hope it really generates n random values from a geometric distribution.
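One detail worth checking in any sample()-based version: when size is omitted, sample(0:1, ...) returns length(0:1) = 2 values, so a sum of them is nonzero with probability 1 - (1 - p)^2 rather than p. The sketch below draws exactly one Bernoulli(p) value per trial and compares the mean with 1/p; rge_one is a hypothetical name, and the seed and sample size are arbitrary.

```r
set.seed(3)
p <- 0.2
n_drawn <- length(sample(0:1, replace = TRUE, prob = c(1 - p, p)))
n_drawn  # 2: size defaults to length(0:1)

rge_one <- function(n, p) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      x[i] <- x[i] + 1                                  # count this trial
      hit <- sample(0:1, size = 1, prob = c(1 - p, p))  # one Bernoulli(p) draw
      if (hit == 1) break                               # stop at the first success
    }
  }
  x
}

draws <- rge_one(10000, p)
mean(draws)  # close to 1 / p = 5
```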

R: Storing Loop Result in a Vector

I'm trying to run a loop that stores results in a vector. But I also need to increase the counter across a predetermined vector for the stored calculation to run properly. I'm stuck on two parts: (1) increasing the counter, and (2) storing the result of the loop in a vector.
I'm new to loops, so bear with the most likely incorrect syntax below; here's what I'm working with:
x <- c(.01,.05,.10,.20,.25) # observed defect rates
for(i in x) {
j <- 1
if(x < 1){
atmost2[] <- dbinom(0,size=10,prob=x[[j]])+
dbinom(1,size=10,prob=x[[j]])+
dbinom(2,size=10,prob=x[[j]]) &&
j <- j + 1
}
}
atmost2
Essentially, I'd like to store the result in a new vector, atmost2, with each iteration moving through the values in x by incrementing j, so that the prob parameter of dbinom takes each of the predetermined values in x in turn.
Can anyone help out?
A few things:
juljo is correct to initialize the vector before the loop, and they made some other corrections, but I think their code only works if you have already run:
j <- 1
Without that, juljo's code breaks.
Also, your code should not use '&&' here: && is R's logical AND operator, not a way to run two statements in a row. Just put j <- j + 1 on its own line, like this (using juljo's code):
j <- 1
x <- c(.01,.05,.10,.20,.25) # observed defect rates
atmost2 <- numeric(length(x)) # preallocate the vector to its final length, which may help with very large loops
for(i in 1:length(x)) {
  if(x[j] < 1){ # test one element; if() needs a single TRUE/FALSE
    atmost2[i] <- dbinom(0,size=10,prob=x[j])+ # note that the double brackets are gone
      dbinom(1,size=10,prob=x[j])+
      dbinom(2,size=10,prob=x[j])
  }
  j <- j + 1 # I think you want j to increment outside the if statement
}
atmost2
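This code does 'something', but I'm not sure what you are trying to do. Note that your original if(x < 1) compares the whole vector x at once, while if() needs a single TRUE or FALSE, so test one element such as x[i] instead.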
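The condition if(x < 1) is where the trouble comes from: x < 1 is a logical vector of length 5. Older versions of R only warned and used the first element; since R 4.2.0 a condition of length greater than one is an error. The small check below is written to run on either version.

```r
x <- c(.01, .05, .10, .20, .25)
length(x < 1)  # 5: one comparison per element

# Catch whichever condition this version of R signals
outcome <- tryCatch(
  { if (x < 1) "ran"; "no condition" },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome  # "error" on R >= 4.2.0, "warning" on older versions

# Testing a single element works on every version
if (x[1] < 1) "single element OK"
```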
You could also skip adding the dbinom() calls one by one and do this instead:
j <- 1
x <- c(.01,.05,.10,.20,.25) # observed defect rates
atmost2 <- numeric(length(x)) # preallocate the vector to its final length, which may help with very large loops
for(i in 1:length(x)) {
  if(x[j] < 1){ # test one element; if() needs a single TRUE/FALSE
    atmost2[i] <- sum(dbinom(0:2, size=10, prob=x[j])) # dbinom will give a vector that can be summed
  }
  j <- j + 1 # I think you want j to increment outside the if statement
}
atmost2
But I think using the j counter might be a habit from other programming languages: the loop index i already steps through the positions of x. Notice the same output using a loop without j:
x <- c(.01,.05,.10,.20,.25) # observed defect rates
atmost2 <- numeric(length(x)) # preallocate the vector to its final length, which may help with very large loops
for(i in 1:length(x)) {
  if(x[i] < 1){ # test one element; if() needs a single TRUE/FALSE
    atmost2[i] <- sum(dbinom(0:2, size=10, prob=x[i]))
  }
}
atmost2
These all produce the same output:
> atmost2
[1] 0.9998862 0.9884964 0.9298092 0.6777995 0.5255928
But I have follow up questions:
Should atmost2 be the same length as x?
Are you using the values in x as probabilities? So, atmost2 is a sum of dbinom probabilities based on the value of x[i]?
Does it have to be a loop? R works naturally with whole vectors, so the apply functions may be helpful. You might find lapply useful here.
?apply might start you off, while ?lapply describes the other apply functions.
So your code may look like this:
x <- c(.01, .05, .10, .20, .25)
atmost2 <- lapply(x, function(p) sum(dbinom(0:2, size = 10, prob = p))) # no preallocation needed
atmost2 # this is a list, not a vector
The lapply call reads like this: apply a function to each item of the list (or vector) x.
In this case, the function is the anonymous function sum(dbinom(...)).
So, lapply applies sum(dbinom(...)) to each value of x and returns a list.
Basically, it does the loop for you. It is usually more concise than a for loop, although in current versions of R it is not necessarily faster than a loop that fills a preallocated vector.
If you need atmost2 to be a vector instead of a list, you can use:
unlist(atmost2)
> unlist(atmost2)
[1] 0.9998862 0.9884964 0.9298092 0.6777995 0.5255928
Edit, based on a reminder from Rui:
Using sapply, everything else is the same but the output is indeed a vector.
x <- c(.01, .05, .10, .20, .25)
atmost2 <- sapply(x, function(p) sum(dbinom(0:2, size = 10, prob = p))) # no preallocation needed
atmost2 # this is a vector
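Since the quantity being computed is P(X <= 2) for X ~ Binomial(10, x), the cumulative distribution function pbinom() gives it directly, and it is already vectorized over prob, so neither a loop nor an apply function is needed. A sketch, assuming x holds per-item defect probabilities as in the question:

```r
x <- c(.01, .05, .10, .20, .25)            # observed defect rates
atmost2 <- pbinom(2, size = 10, prob = x)  # P(at most 2 defects in 10 items)
atmost2

# agrees with summing the three dbinom() terms for each rate
all.equal(atmost2, sapply(x, function(p) sum(dbinom(0:2, size = 10, prob = p))))
```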
How about indexing the elements like this:
x <- c(.01,.05,.10,.20,.25) # observed defect rates
atmost2 <- numeric() # Initialize vector before filling it in the loop
for(i in 1:length(x)) {
if(x[i] < 1){
atmost2[i] <- dbinom(0,size=10,prob=x[i])+
dbinom(1,size=10,prob=x[i])+
dbinom(2,size=10,prob=x[i])
}
}
atmost2

How to make an R function that loops over two lists

I have an event A that is triggered when the majority of coin tosses in a series of tosses comes up heads. I have an unfair coin and I'd like to see how the likelihood of A changes as the number of tosses change and the probability in each toss changes.
This is my function assuming 3 tosses
n <- 3
#victory requires majority of tosses heads
#tosses only occur in odd intervals
k <- seq(n/2+.5,n)
victory <- function(n,k,p){
for (i in p) {
x <- 0
for (i in k) {
x <- x + choose(n, k) * p^k * (1-p)^(n-k)
}
z <- x
}
return(z)
}
p <- seq(0,1,.1)
victory(n,k,p)
My hope is the victory() function would:
find the probability of each of the outcomes where the majority of tosses are heads, given a particular value p
sum up those probabilities and add them to a vector z
go back and do the same thing given another probability p
I tested this with n <- 3, k <- c(2,3) and p <- c(.5,.75), and the output was 0.75000, 0.84375. I know that the output should've been 0.625, 0.0984375.
I wasn't able to get exactly the result you wanted, but maybe can help you along a bit.
When looping in R, the vector you loop over stays unchanged; only the loop variable takes each of its values in turn. For example, see the differences between these loops:
test <- seq(0,1,length.out = 5)
for ( i in test){
print(test)
}
for ( i in test){
print(i)
}
for ( i in 1:length(test)){
print(test[i])
}
In your function, the outer loop sets i to each value of p, and the inner loop then reuses the same name i for the values of k. The body never uses i, though: it uses the whole vectors p and k, so every pass computes the same vectorized expression, and the inner loop adds it once per element of k.
You are also assigning to z on each pass of the outer loop, so each pass overwrites the previous result.
Try the version below. I am still not getting the answer you say, but it might help you find where the error is (printing values along the way or using debug(victory) might also help):
victory <- function(n,k,p){
  z <- numeric(length(p))  # one result per value of p (a numeric vector, not a list)
  for (i in 1:length(p)) {
    x <- 0
    for (j in 1:length(k)) {
      x <- x + choose(n, k[j]) * p[i]^k[j] * (1-p[i])^(n-k[j])
    }
    z[i] <- x
  }
  return(z)
}
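As a cross-check: the probability that the number of heads in n tosses falls in k is sum(dbinom(k, n, p)), which for n = 3 and k = 2:3 is 3p^2(1 - p) + p^3. That gives 0.5 at p = 0.5 and 0.84375 at p = 0.75, so the expected values quoted in the question (0.625 and 0.0984375) do not appear to match this formula. The sketch below loops over p and returns a numeric vector; victory2 is a hypothetical name.

```r
victory2 <- function(n, k, p) {
  z <- numeric(length(p))                          # one result per value of p
  for (i in seq_along(p)) {
    z[i] <- sum(dbinom(k, size = n, prob = p[i]))  # P(number of heads is in k)
  }
  z
}

n <- 3
k <- seq(n / 2 + .5, n)                    # 2 3: a strict majority of 3 tosses
victory2(n, k, c(.5, .75))                 # 0.50000 0.84375
1 - pbinom(floor(n / 2), n, c(.5, .75))    # same values via the CDF
```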

For-Loop in R with if-else: How to save the output

I am trying to save the output of the code below. I know "print" is the problem, but I do not know what works instead.
I also wonder whether there is another way besides the for-loop: for each value in the vector x, I want to draw a new random number (here with runif) and compare it to a given value (here, for example, 0.5). Depending on the result, a new value for x should be stored in a vector x2 (similar to the if-else example below). Without the for-loop, I could not find a way to draw a new random number for each value in the vector x.
I would be very grateful for any help!
x <- c(2,2,2,3,3,3)
for(i in x){
if(runif(1) <= 0.5){
print(i + 1)
} else {
print(i)
}
}
Or you could use lapply; then you don't have to modify an object outside your loop at each step.
x <- c(2,2,2,3,3,3)
x2 <- unlist(lapply(x, function(x){
if(runif(1) <= 0.5) return(x +1)
return(x)
}))
x2
Try this code:
x <- c(2,2,2,3,3,3)
x2<-NULL
for(i in 1:length(x)){
if(runif(1) <= 0.5){
x2[i]<-1
} else {
x2[i]<-2
}
}
Your output
x2
[1] 1 2 2 1 2 1
x2 now contains one value per element of x: 1 where runif(1) returned at most 0.5 and 2 otherwise. Replace 1 and 2 with x[i] + 1 and x[i] to get your original rule.
This is the same thing in a single line:
ifelse(runif(n = length(x))<=0.5,1,2)
[1] 1 2 2 2 1 1
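The original loop adds 1 to each element of x when its own runif(1) draw is at most 0.5. The same rule can be written without a loop by drawing one uniform number per element of x up front. A sketch; the seed is arbitrary.

```r
set.seed(7)
x <- c(2, 2, 2, 3, 3, 3)
u <- runif(length(x))             # one fresh random number per element of x
x2 <- ifelse(u <= 0.5, x + 1, x)  # x + 1 where the draw is <= 0.5, else x
x2

# equivalent: TRUE/FALSE is coerced to 1/0 when added to x
identical(x2, x + (u <= 0.5))
```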

re-expressing a simple operation as a function in R

I am trying to construct a new variable, z, using two pre-existing variables - x and y.  Suppose for simplicity that there are only 5 observations (corresponding to 5 time periods) and that x=c(5,7,9,10,14) and y=c(0,2,1,2,3). I’m really only using the first observation in x as the initial value, and then constructing the new variable z using depreciated values of x[1] (depreciation rate of 0.05 per annum) and each of the observations over time in the vector, y. The variable I am constructing takes the form of a new 5 by 1 vector, z, and it can be obtained using the following simple commands in R:
z=NULL
for(i in 1:length(x)){n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))}
The problem I am having is that I need to define this operation as a function. That is, I need to create a function f that will spit out the vector z whenever any arbitrary vectors x and y are plugged into the function, f(x,y). I’ve been going around in circles for days now and I was wondering if someone would be kind enough to provide me with a suggestion about how to proceed. Thanks in advance.
I hope the following works for you:
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getZ = function(x,y){
  z = NULL
  for(i in 1:length(x)){
    n = seq(1,i,by=1)
    z[i] = sum(c(0.95^(i-1)*x[1], 0.95^(i-n)*y[n]))
  }
  return(z) # "return = z" would only assign z to a variable named return
}
z = getZ(x,y)
z
5.000000 6.750000 7.412500 9.041875 11.589781
This version lets you pass the depreciation rate (0.05 or any other value) as the argument r:
ConstructZ <- function(x, y, r){
  d <- 1 - r                 # fraction of value retained each period
  Z <- numeric(length(x))    # preallocate the result
  for(i in seq_along(x)){
    n <- seq_len(i)
    Z[i] <- sum(c(d^(i-1)*x[1], d^(i-n)*y[n]))
  }
  return(Z)
}
Here is a cool (if I say so myself) way to implement this as an infix operator (since you called it an operation).
ff = function (x, y, i) {
n = seq.int(i)
sum(c(0.95 ^ (i - 1) * x[[1]],
0.95 ^ (i - n) * y[n]))
}
`%dep%` = function (x, y) sapply(seq_along(x), ff, x=x, y=y)
x %dep% y
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
Running the loop and recalculating the exponents on every iteration may be inefficient. Here's another way to implement your calculation:
getval <- function(x,y,lambda=.95) {
n <- length(y)
pp <- lambda^(1:n-1)
yy <- sapply(1:n, function(i) {
sum(y * c(pp[i:1], rep.int(0, n-i)))
})
pp*x[1] + yy
}
Testing with @vrajs5's sample data:
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getval(x,y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
It gives the same result, but appears to be about 10x faster when testing on larger data such as:
set.seed(15)
x <- rpois(200,20)
y <- rpois(200,20)
I'm not sure how often you will run this or on what size of data, so perhaps efficiency isn't a concern for you. I guess readability is often more important long-term for maintenance.
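Another option, not from the answers above: the sum satisfies the recurrence z[1] = x[1] + y[1] and z[i] = 0.95 * z[i - 1] + y[i], because multiplying z[i - 1] by 0.95 depreciates every earlier term one more period. That allows a single pass with no exponents at all. A sketch with r as the depreciation rate; depreciate is a hypothetical name.

```r
depreciate <- function(x, y, r = 0.05) {
  d <- 1 - r                       # fraction of value retained each period
  z <- numeric(length(y))
  z[1] <- x[1] + y[1]              # initial stock plus the first addition
  for (i in seq_along(y)[-1]) {
    z[i] <- d * z[i - 1] + y[i]    # depreciate last period's value, add y[i]
  }
  z
}

x <- c(5, 7, 9, 10, 14)
y <- c(0, 2, 1, 2, 3)
depreciate(x, y)  # 5.000000 6.750000 7.412500 9.041875 11.589781
```

Each step does constant work, so the cost grows linearly with the length of y rather than quadratically.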
