longest Collatz sequence 1:n - r

I am studying the Collatz conjecture. I spent a while researching and came across the following exercise in R. I tried it for a while but couldn't get it to work. I figured out how to generate a single Collatz sequence, but did not get any further than that.
This is the question: Write a function that expects a positive natural number n and returns the longest Collatz sequence generated by any number smaller than or equal to n. The function must also return the length of this Collatz sequence and the starting value of the sequence. To do this, you can put all the to-be-returned objects in a list and return the list.
This is how far I got:
collatz_sequence <- function(p) {
  collatz <- vector()
  collatz[1] <- p
  i <- 1
  while (p > 1) {
    if (p %% 2 == 0)
      p <- p / 2
    else
      p <- 3 * p + 1
    collatz[i+1] <- p
    i <- i + 1
    length_seq <- length(collatz)
  }
  collist <- list(collatz, length(collatz), collatz[1])
  return(collist)
}
I would really appreciate the help!
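One possible way to finish the exercise is sketched below. It assumes that "longest" means the sequence with the most terms and that ties go to the smallest starting value. The name longest_collatz and the list element names are hypothetical, chosen only for this illustration.

```r
# Generate the Collatz sequence that starts at p (assumes p is a positive integer).
collatz_sequence <- function(p) {
  collatz <- p
  while (p > 1) {
    p <- if (p %% 2 == 0) p / 2 else 3 * p + 1
    collatz <- c(collatz, p)
  }
  collatz
}

# Hypothetical helper: try every starting value 1..n and keep the longest sequence.
longest_collatz <- function(n) {
  stopifnot(n >= 1)
  best <- collatz_sequence(1)
  for (start in seq_len(n)) {
    candidate <- collatz_sequence(start)
    # strict ">" keeps the smallest starting value when lengths tie
    if (length(candidate) > length(best)) best <- candidate
  }
  list(sequence = best, length = length(best), start = best[1])
}

res <- longest_collatz(10)
res$start   # 9
res$length  # 20
```

Recomputing every sequence from scratch is simple but repeats a lot of work; caching the lengths already seen would be the usual next step for large n.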

Related

Understanding Breakpoint function: how for loops work inside functions

I have the following exercise to be solved in R. Under the exercise, there is a hint towards the solution.
Exercise: If there are no ties in the data set, the function above will produce breakpoints with h observations in the interval between two consecutive breakpoints (except the last two perhaps). If there are ties, the function will by construction return unique breakpoints, but there may be more than h observations in some intervals.
Hint:
my_breaks <-function(x, h = 5) {
x <-sort(x)
breaks <- xb <- x[1]
k <- 1
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
else{
if(xb<x[i-1]&&x[i-1]<x[i])
{xb <- x[i-1]
breaks <-c(breaks, xb)
k <- 1
}
}
}
However, I am having a hard time understanding the above function, particularly the following lines:
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
Question:
How is the for loop supposed to act on k if k was previously defined as 1 and i is different from k? How are the breakpoints chosen according to the h = 5 gap if the for loop is not acting on x? Can someone explain to me how this function works?
Thanks in advance!
First, note that your example is incomplete: the return value and the final brace are missing. Here is the corrected version.
my_breaks <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]
  k <- 1
  for (i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1
    } else {
      if (xb < x[i-1] && x[i-1] < x[i]) {
        xb <- x[i-1]
        breaks <- c(breaks, xb)
        k <- 1
      }
    }
  }
  breaks
}
Let's check if it works.
my_breaks(c(1,1,1:5,8:10), 2)
#[1] 1 2 4 8
my_breaks(c(1,1,1:5,8:10), 5)
#[1] 1 3
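As you can see, everything is fine. And what is seq_along(x)[-1]? For a vector with at least two elements, it is the same as 2:length(x). So the for loop runs over the indices of x in order, skipping the first one, and the loop body reads the data through x[i-1] and x[i].
What is the k variable for? It counts the observations since the last breakpoint. While k < h the loop only increments k; once k has reached h, the next x[i-1] that is larger than the previous breakpoint xb and smaller than x[i] becomes a new breakpoint and k is reset to 1.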
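To illustrate the point about seq_along(x)[-1], here is a small sketch with made-up vectors x and y. It also shows the one case where the two spellings differ, a vector of length one, which is a reason to prefer seq_along.

```r
x <- c(10, 20, 30, 40)   # made-up data for illustration
idx_a <- seq_along(x)[-1]
idx_b <- 2:length(x)
idx_a   # 2 3 4
idx_b   # 2 3 4

y <- 7                   # a vector of length one
one_a <- seq_along(y)[-1]
one_b <- 2:length(y)
one_a   # integer(0), so a for loop over it runs zero times
one_b   # 2 1, so a for loop over it would run twice
```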

How to make an R function that loops over two lists

I have an event A that is triggered when the majority of coin tosses in a series of tosses come up heads. I have an unfair coin and I'd like to see how the likelihood of A changes as the number of tosses changes and as the probability of heads in each toss changes.
This is my function, assuming 3 tosses:
n <- 3
#victory requires majority of tosses heads
#tosses only occur in odd intervals
k <- seq(n/2+.5,n)
victory <- function(n,k,p){
for (i in p) {
x <- 0
for (i in k) {
x <- x + choose(n, k) * p^k * (1-p)^(n-k)
}
z <- x
}
return(z)
}
p <- seq(0,1,.1)
victory(n,k,p)
My hope is the victory() function would:
find the probability of each of the outcomes where the majority of tosses are heads, given a particular value p
sum up those probabilities and add them to a vector z
go back and do the same thing given another probability p
I tested this with n <- 3, k <- c(2,3) and p <- c(.5,.75) and the output was 0.75000, 0.84375. I know that the output should've been 0.625, 0.0984375.
I wasn't able to get exactly the result you wanted, but maybe I can help you along a bit.
When looping in R, the vector you are looping over remains unchanged and only the loop variable changes. For example, see the differences between these loops:
test <- seq(0, 1, length.out = 5)
for (i in test) {
  print(test)
}
for (i in test) {
  print(i)
}
for (i in 1:length(test)) {
  print(test[i])
}
When you iterate, you first set i to an element of p, then overwrite it with an element of k, and in both cases the formula still uses the full, unchanged vectors k and p rather than i.
You are also assigning to z inside the loop over p, so each pass overwrites the result of the previous one.
Try the version below. I am still not getting the answer you say, but it might help you find where the error is (printing values along the way or using debug(victory) might also be helpful).
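victory <- function(n, k, p) {
  z <- numeric(length(p))  # one result per probability in p
  for (i in seq_along(p)) {
    x <- 0
    for (j in seq_along(k)) {
      # probability of exactly k[j] heads in n tosses when P(heads) = p[i]
      x <- x + choose(n, k[j]) * p[i]^k[j] * (1 - p[i])^(n - k[j])
    }
    z[i] <- x
  }
  return(z)
}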
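As a cross-check, the same sum can be sketched with base R's dbinom, which computes choose(n, k) * p^k * (1-p)^(n-k) directly. The name victory_binom is hypothetical. For n = 3 and k = 2:3 the binomial formula gives 0.5 at p = 0.5 and 0.84375 at p = 0.75, which differs from the expected values quoted in the question, so those may be worth rechecking.

```r
# Hypothetical vectorised variant: sum the binomial probabilities of every
# winning head count k, once for each probability in p.
victory_binom <- function(n, k, p) {
  sapply(p, function(prob) sum(dbinom(k, size = n, prob = prob)))
}

out <- victory_binom(3, 2:3, c(0.5, 0.75))
out   # 0.50000 0.84375
```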

Data generation: Creating a vector of vectors

I have a vector of positive integers of unknown length. Let's call it vector a with elements a[1], a[2], ...
I want to perform calculations on vector b where for all i, 0 <= b[i] <= a[i].
The following does not work:
for(b in 0:a)
{
# calculations
}
The best I have come up with is:
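probabilities <- function(a, p) {
  k <- a
  k[1] <- 1
  h <- rep(0, sum(a) + 1)
  # k[i] is the mixed-radix place value of position i
  for (i in seq_along(a)[-1]) {
    k[i] <- k[i-1] * (a[i-1] + 1)
  }
  # there are prod(a + 1) combinations, indexed 0 .. prod(a + 1) - 1;
  # running up to prod(a + 1) would count the all-zero b twice
  for (i in 0:(prod(a + 1) - 1)) {
    b <- a
    for (j in seq_along(a)) {
      b[j] <- floor(i / k[j]) %% (a[j] + 1)
    }
    t <- 1
    for (j in seq_along(a)) {
      t <- t * choose(a[j], b[j]) * p[j]^b[j] * (1 - p[j])^(a[j] - b[j])
    }
    h[sum(b) + 1] <- h[sum(b) + 1] + t
  }
  return(h)
}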
In the middle of my function is where I create b. I start by setting b equal to a (so that it has the same size). Then I replace every element of b with a value that is rather tricky to calculate. This seems like an inefficient solution: it works, but it is fairly slow as the numbers get large. Any ideas for how I can cut down on processing time? Essentially, the first time through, b is all zeros. Then it is 1, 0, 0, 0, ... The first element keeps incrementing until it reaches a[1]; then b[2] increments and b[1] is reset to 0, and b[1] starts incrementing again.
I know the math is sound, I just do not trust that it is efficient. I studied combinatorics for a few years, but have never studied computational complexity theory, so coming up with a fast algorithm is a bit beyond my realm of knowledge. Any ideas would be helpful!
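Two ideas that might help, sketched under assumptions. First, base R's expand.grid can enumerate every b with 0 <= b[i] <= a[i], with the first element varying fastest as described above. Second, if the goal is only the distribution of sum(b), where each b[j] is binomial with size a[j] and probability p[j], the full enumeration can be swapped for a convolution of the per-element binomial distributions, which needs far fewer operations. The names all_b and sum_distribution are hypothetical.

```r
# Enumerate every b with 0 <= b[i] <= a[i]; one row per combination.
all_b <- function(a) {
  as.matrix(expand.grid(lapply(a, function(ai) 0:ai)))
}

# Distribution of sum(b): h[s + 1] is the probability that sum(b) equals s.
# Built by convolving one binomial pmf at a time instead of enumerating.
sum_distribution <- function(a, p) {
  h <- 1  # distribution of an empty sum: probability 1 at total 0
  for (j in seq_along(a)) {
    pmf <- dbinom(0:a[j], size = a[j], prob = p[j])
    new_h <- numeric(length(h) + a[j])
    for (s in seq_along(pmf)) {
      idx <- seq_along(h) + s - 1       # shift h by (s - 1) successes
      new_h[idx] <- new_h[idx] + h * pmf[s]
    }
    h <- new_h
  }
  h
}

nrow(all_b(c(2, 3)))                      # 12 combinations
sum_distribution(c(1, 1), c(0.5, 0.5))    # 0.25 0.50 0.25
```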

Interpreting [R] Greatest Common Divisor (GCD) (and LCM) Function in {numbers} package

I don't have a background in programming (except from wrestling with R to get things done), and I'm trying to verbalize what the function for the greatest common divisor in the R {numbers} package is doing at each step. I need help understanding the flow of steps within the function:
function (n, m)
{
    stopifnot(is.numeric(n), is.numeric(m))
    if (length(n) != 1 || floor(n) != ceiling(n) || length(m) !=
        1 || floor(m) != ceiling(m))
        stop("Arguments 'n', 'm' must be integer scalars.")
    if (n == 0 && m == 0)
        return(0)
    n <- abs(n)
    m <- abs(m)
    if (m > n) {
        t <- n
        n <- m
        m <- t
    }
    while (m > 0) {
        t <- n
        n <- m
        m <- t%%m
    }
    return(n)
}
<environment: namespace:numbers>
For instance, in the if (m > n) {} part, n becomes t and ultimately it becomes m? I'm afraid to ask, because it may be painfully obvious, but I don't know what is going on. The same applies, I guess, to the loop that follows, with %% perhaps being modulo.
What it says is:
Stop if either m or n are not numeric, more than one number, or have decimals, and return the message, "Arguments 'n', 'm' must be integer scalars."
If they both are zero, return zero.
Using absolute values from now on.
Make sure that n >= m because of the algorithm we'll end up applying in the next step. If this is not the case, flip them: first place n in a temporary variable "t", then assign m to n, so that the larger number now comes first in the (n, m) pair. At this point both n and m hold the original m. Finish up by retrieving the value in the temporary variable and assigning it to m.
Now they apply the modified Euclidean algorithm to find the GCD, a more efficient version of the algorithm that shortcuts the repeated subtractions by replacing the larger of the two numbers with its remainder when divided by the smaller of the two.
The smaller number at the beginning of an iteration will be the larger one in the next iteration, so we assign it to n to get ready for that. To do so, though, we need to get the current n out of the way by assigning it to the temporary variable t. After that we take the remainder from dividing the original larger number (n), which is now stored in t, by the smaller number m. The result replaces the number stored in m.
As long as there is a remainder (modulo), the process goes on, each time with the previous smaller number m playing the role of the big guy. When there is no remainder, m becomes 0, the loop ends, and the smaller of the numbers in that final iteration (now stored in n) is returned.
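The steps above can be followed on a concrete pair of numbers with a small re-implementation. This is only a sketch of the same loop, not the package's code: the name euclid_gcd and the verbose argument are hypothetical, and the input checks are left out.

```r
# Sketch of the same Euclidean loop with optional tracing of each iteration.
euclid_gcd <- function(n, m, verbose = FALSE) {
  n <- abs(n)
  m <- abs(m)
  if (m > n) {        # swap so that the larger number is in n
    t <- n
    n <- m
    m <- t
  }
  while (m > 0) {
    if (verbose) cat("n =", n, " m =", m, " remainder =", n %% m, "\n")
    t <- n            # park the larger number
    n <- m            # the smaller number becomes the larger one
    m <- t %% m       # the remainder becomes the new smaller number
  }
  n
}

euclid_gcd(18, 48, verbose = TRUE)
# n = 48  m = 18  remainder = 12
# n = 18  m = 12  remainder = 6
# n = 12  m = 6  remainder = 0
# returns 6
```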
ADDENDUM:
Now that I know how to read this function, I see that it is limited to two numbers in the input to the function. So I entertained myself putting together a function that can work with three integers in the input:
library(numbers)
GCF <- function(x, y, z) {
  # tabulate() gives, at index i, how often i occurs as a prime factor
  tab.x <- tabulate(primeFactors(x))
  tab.y <- tabulate(primeFactors(y))
  tab.z <- tabulate(primeFactors(z))
  max.len <- max(length(tab.x), length(tab.y), length(tab.z))
  # pad with zeros so that all three tables have the same length
  tab_x <- c(tab.x, rep(0, max.len - length(tab.x)))
  tab_y <- c(tab.y, rep(0, max.len - length(tab.y)))
  tab_z <- c(tab.z, rep(0, max.len - length(tab.z)))
  GCD_elem <- numeric(max.len)
  for (i in 1:max.len) {
    # shared power of i: i raised to the smallest exponent (i^0 = 1 if not shared)
    GCD_elem[i] <- i^min(tab_x[i], tab_y[i], tab_z[i])
  }
  GrCD <- prod(GCD_elem)
  print(GrCD)
}
Also for the LCM:
LCM <- function(x, y, z) {
  tab.x <- tabulate(primeFactors(x))
  tab.y <- tabulate(primeFactors(y))
  tab.z <- tabulate(primeFactors(z))
  max.len <- max(length(tab.x), length(tab.y), length(tab.z))
  tab_x <- c(tab.x, rep(0, max.len - length(tab.x)))
  tab_y <- c(tab.y, rep(0, max.len - length(tab.y)))
  tab_z <- c(tab.z, rep(0, max.len - length(tab.z)))
  LCM_elem <- numeric()
  for (i in 1:max.len) {
    LCM_elem[i] <- i^(max(tab_x[i], tab_y[i], tab_z[i]))
  }
  LCM_elem <- LCM_elem[!LCM_elem == 0]
  LCM <- prod(LCM_elem)
  print(LCM)
}
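An alternative that avoids prime factorization altogether is to fold a two-number GCD over the inputs, since gcd(x, y, z) = gcd(gcd(x, y), z), and to use lcm(a, b) = a * b / gcd(a, b) for the LCM. The sketch below uses base R only; the names gcd2, lcm2, gcd_all and lcm_all are hypothetical, and inputs are assumed to be whole numbers.

```r
# Two-number GCD by the Euclidean algorithm.
gcd2 <- function(a, b) {
  a <- abs(a)
  b <- abs(b)
  while (b > 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Two-number LCM; dividing before multiplying keeps intermediates smaller.
lcm2 <- function(a, b) {
  if (a == 0 || b == 0) return(0)
  abs(a / gcd2(a, b) * b)
}

# Fold the two-number versions over any number of inputs.
gcd_all <- function(...) Reduce(gcd2, c(...))
lcm_all <- function(...) Reduce(lcm2, c(...))

gcd_all(12, 18, 30)  # 6
gcd_all(8, 8, 8)     # 8
lcm_all(4, 6, 10)    # 60
```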

Recursion in a prime generator

I'm making a prime generator, and to make it more efficient, I'm trying to test numbers only against primes that I've already found rather than against all numbers < sqrt of the number being tested. I'm trying to get a to be my list of primes, but I'm not sure how to make it update inside my second for loop. I think this is only testing against a <- 2 and not a <- c(a,i).
x <- 3:1000
a <- 2
for (i in x) {
  for (j in a) {
    if (i %% j == 0) {
      next
    } else {
      a <- unique(c(a, i))
    }
  }
}
a
The solution might be to cut out the second loop and compare your proposed prime number to the entire vector instead, like:
x <- 3:1000
a <- 2
for (i in x) {
  if (!any(i %% a == 0)) {
    a <- c(a, i)
  }
}
That seemed to work for me.
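To see why the single vectorised test is enough, here is a small sketch with a made-up vector of known primes: %% is applied to every element of a at once, and any() reports whether at least one of them divides the candidate.

```r
a <- c(2, 3, 5, 7)          # primes found so far (made up for illustration)
rem9 <- 9 %% a
rem9                        # 1 0 4 2
div9 <- any(rem9 == 0)      # TRUE: 3 divides 9, so 9 is rejected
div11 <- any(11 %% a == 0)  # FALSE: nothing divides 11, so 11 is appended
```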
Below is a non-recursive modification, a simple prime function that is reasonably fast for plain R. Rather than cycling through each individual value and testing its primality, it removes all of the multiples of each prime in big chunks. This leaves each subsequent remaining value as a prime. So it takes out the multiples of 2, then of 3; 4 is already gone, so the multiples of 5 go next. It is an efficient way to do it in plain R.
primest <- function(n) {
  p <- 2:n
  i <- 1
  while (p[i] <= sqrt(n)) {
    p <- p[p %% p[i] != 0 | p == p[i]]
    i <- i + 1
  }
  p
}
(You might want to see this stack question for faster methods using a sieve and also my timings of the function. What's above will run 50, maybe 500x faster than the version you're working from.)
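For reference, a sieve of Eratosthenes can be sketched in base R with a logical vector. This is a generic textbook version, not the code from the question referred to above, and the name sieve_primes is hypothetical.

```r
# Sieve of Eratosthenes: cross out multiples instead of testing divisibility.
sieve_primes <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  i <- 2
  while (i * i <= n) {
    if (is_prime[i]) {
      # smaller multiples of i were already crossed out by smaller primes
      is_prime[seq(i * i, n, by = i)] <- FALSE
    }
    i <- i + 1
  }
  which(is_prime)
}

sieve_primes(30)            # 2 3 5 7 11 13 17 19 23 29
length(sieve_primes(1000))  # 168
```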
