Optimize simple R code for Project Euler 12

The idea of Project Euler question 12 is to find the smallest triangular number with a specified number of divisors (https://projecteuler.net/problem=12). As an attempt to solve this problem, I wrote the following code:
# This function finds all divisors of x and returns them as a vector.
FUN <- function(x) {
i = 1
lst = integer(0)
while(i<=x)
{
if(x %% i ==0)
{
lst = c(lst, i)
}
i = i +1
}
return(lst)
}
and
n = 1
i=1
while (length(FUN(n))<500)
{
i = i + 1
n = n + i
}
This code produces the correct answer for a few smaller test cases: length(FUN(n))<4 will produce 6, and length(FUN(n))<6 will produce 28.
However, this simple-looking code has been running for over 24 hours (and is still running) for length(FUN(n))<500. I understand that a number with 500 divisors is probably very big, but I am wondering why it is taking so long to run.

Your FUN is far too inefficient for this task. The first triangular number with more than 500 divisors lies beyond the 12,000th one, with a value of about 75,000,000, and FUN tests every integer up to each triangular number along the way, so the number of iterations to perform is roughly
12000 * 75000000 / 2 = 450 * 10^9
This is clearly more than R's relatively slow loops can handle in a reasonable time frame.
Instead, you could apply the divisors function from the numbers package, which uses prime factorization. The following code needs about 5-6 seconds (on my machine) to find the triangular number.
library(numbers)
t <- 0
system.time(
for (i in 1:100000) {
t <- t + i
d <- length( divisors(t) )
if (d > 500) {
cat(i, t, d, '\n')
break
}
}
)
## 12375 76576500 576
## user system elapsed
## 5.660 0.000 5.658
Here, instead of calculating the i-th triangular number from scratch, i is added to the previous triangular number, although the time saved by this is minimal.
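For comparison, here is a minimal base-R sketch of the same idea without any package. It relies on the fact that divisors come in pairs (d, n/d), so only candidates up to sqrt(n) need to be tested. The names count_divisors and first_triangular_over are made up for this illustration and are not part of the numbers package.

```r
# Count the divisors of n by testing candidates up to sqrt(n) only.
# Every divisor d <= sqrt(n) is paired with the divisor n / d.
count_divisors <- function(n) {
  d <- seq_len(floor(sqrt(n)))
  small <- d[n %% d == 0]
  count <- 2L * length(small)
  # A perfect square pairs its root with itself, so count it once.
  if (small[length(small)]^2 == n) count <- count - 1L
  count
}

# Hypothetical helper: first triangular number with more than
# min_divisors divisors, built by adding i to the previous one.
first_triangular_over <- function(min_divisors) {
  i <- 1
  tri <- 1
  while (count_divisors(tri) <= min_divisors) {
    i <- i + 1
    tri <- tri + i
  }
  tri
}

first_triangular_over(5)
# [1] 28
```

With min_divisors = 500 this should reproduce the 76576500 reported above, though the run time will depend on the machine.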

Here's my attempt:
library(gmp)
library(plyr)
get_all_factors <- function(n)
{
prime_factor_tables <- lapply(
setNames(n, n),
function(i)
{
if(i == 1) return(data.frame(x = 1L, freq = 1L))
plyr::count(as.integer(gmp::factorize(i)))
}
)
lapply(
prime_factor_tables,
function(pft)
{
powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
power_grid <- do.call(expand.grid, powers)
sort(unique(apply(power_grid, 1, prod)))
}
)
}
for (i in 99691200:100000) {
if (length(get_all_factors(i)[[1]])>500) print(paste(i, length(get_all_factors(i)[[1]])))
if (i %% 100000 == 0) print(paste("-",i,"-"))
}
Let it run as long as you can be bothered...

Related

Efficiently sum all elements of vector that add up to no more than a specified limit

I need to apply code that has the following effect to a huge data set:
sum_to_limit <- function(x, limit) {
ret <- 0
if (length(na.omit(x)) > 0) {
for (i in seq_along(x)) {
if (ret + x[i] <= limit) ret <- ret + x[i]
}
}
return(ret)
}
In other words, I need to sum up all the elements of the supplied vector that add up to no more than limit. So for example sum_to_limit(c(10,10,10,10,5), 17) = 15. I have so far failed to come up with anything faster than the above, which is not really cutting it on my data. It feels like it should be possible to create a vectorized version...
You can set up an optimization problem. It might only be worthwhile if you have a few long vectors rather than many small ones:
library(lpSolve)
sum_to_limit <- function(x, limit) {
sol <- lp ("max",
objective.in = rep(1, length(x)),
const.mat = matrix(x, nrow=1),
const.dir = "<=",
const.rhs = limit,
all.bin = TRUE)
stopifnot(sol$status==0) # status 0 means a solution was found
return(x[sol$solution==1])
}
print(sum_to_limit(c(10,10,10,10,5), 17))
I found it more interesting to return the selected entries rather than the sum.
You can sort the vector and add up its elements for as long as the running sum stays within the limit.
sum_to_limit <- function(x, limit) {
x <- sort(x)
sum(x[cumsum(x) <= limit])
}
sum_to_limit(c(10,10,10,10,5), 17)
#[1] 15
sum_to_limit(c(10,10,10,10,5), 35)
#[1] 35
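One caveat worth checking on your own data: the sort-based version picks the smallest elements first, while the loop in the question walks the vector in its original order and skips anything that would overshoot. The two agree on the examples above but not in general. A small sketch, where sum_in_order and sum_sorted are just illustrative names for the two variants and x is assumed to contain no NA values:

```r
# Order-preserving version, as in the question:
# skip any element that would push the total past the limit.
sum_in_order <- function(x, limit) {
  total <- 0
  for (value in x) {
    if (total + value <= limit) total <- total + value
  }
  total
}

# Sort-based version: take the smallest elements first and stop
# as soon as the cumulative sum exceeds the limit.
sum_sorted <- function(x, limit) {
  x <- sort(x)
  sum(x[cumsum(x) <= limit])
}

sum_in_order(c(10, 10, 10, 10, 5), 17)  # 15
sum_sorted(c(10, 10, 10, 10, 5), 17)    # 15

sum_in_order(c(6, 5, 4), 10)  # 10 (takes 6, skips 5, takes 4)
sum_sorted(c(6, 5, 4), 10)    # 9  (takes 4 and 5, then stops)
```

Which behaviour is wanted depends on whether the order of the input matters for your use case.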

Struggling with simulating N rolls for K dice

The function dice takes a parameter n, the number of rolls of a single six-sided die. It returns a vector of length n whose elements are integers between 1 and 6. I have written the following code for the dice function. It seems to run properly when I test it.
dice <- function(n) {
x <- c(1:6)
sample(length(x), size = n, replace = TRUE, prob = x)
}
The function kdice takes two parameters, n and k. The parameter n is the number of rolls, and k is the number of dice rolled. The function should return the sum of the k dice for each of the n rolls. Somehow I have to use dice() within this function. Below is what I have so far; however, the function returns nothing. I have an if/else statement to make sure that at least one die was rolled at least once. The while loop is meant to output the sum of NumofDice until the roll count reaches n. I would appreciate any insights, especially on how to incorporate dice() in kdice().
kdice <- function(k, n){
NumofDice <- sample(1:6, size = k, replace = TRUE)
RollCount = 0
if(k>0 && n>0) {
while(RollCount < n) {
RollCount = RollCount + 1
sum(NumofDice)
}
}
else {
print("No number of dices were rolled")
}
}
kdice <- function(k, n){
if(k>0 && n>0){
replicate(n, sum(sample(c(1:6),k, replace=TRUE)))
}
else {
print("No number of dices were rolled")
}
}
kdice(4,2)
[1] 15 8
You can try defining kdice using replicate + colSums like below
kdice <- function(k, n) {
tryCatch(
colSums(matrix(replicate(n, dice(k)), nrow = k)),
error = function(e) print("No number of dices were rolled")
)
}
which gives results like
> kdice(4, 5)
[1] 17 14 22 13 11
> kdice(4, 0)
[1] "No number of dices were rolled"
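Note that the prob = x argument in the question's dice() weights the faces by their value, so a 6 is drawn six times as often as a 1. If a fair die is intended, a minimal sketch could look like the following. The names fair_dice and kdice_fair are invented here to avoid clashing with the functions above, and the error message is only an assumption about what you want to happen for invalid input.

```r
# Fair six-sided die: every face has the same probability.
fair_dice <- function(n) {
  sample(1:6, size = n, replace = TRUE)
}

# Roll k dice n times and return the n totals.
# All k * n rolls are drawn in one call to fair_dice(), arranged
# as a k-by-n matrix (one column per roll), then summed by column.
kdice_fair <- function(k, n) {
  if (k < 1 || n < 1) stop("k and n must both be at least 1")
  rolls <- matrix(fair_dice(k * n), nrow = k)
  colSums(rolls)
}

set.seed(1)               # only to make the example reproducible
totals <- kdice_fair(4, 5)
length(totals)
# [1] 5
```

Each total lies between k and 6 * k, which is a quick sanity check on the result.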

Large number digit sum

I am trying to create a function that computes the sum of digits of large numbers, of the order of 100^100. The approach described in this question does not work, as shown below. I tried to come up with a function that does the job, but have not been able to get very far.
The inputs would be of the form a^b, where 1 < a, b < 100 and a and b are integers. So, in that sense, I am open to making digitSumLarge a function that accepts two arguments.
digitSumLarge <- function(x) {
pow <- floor(log10(x)) + 1L
rem <- x
i <- 1L
num <- integer(length = pow)
# Individually isolate each digit starting from the largest and store it in num
while(rem > 0) {
num[i] <- rem%/%(10^(pow - i))
rem <- rem%%(10^(pow - i))
i <- i + 1L
}
return(num)
}
# Function in the highest voted answer of the linked question.
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
Consider the following tests:
x <- c(1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9)
as.numeric(paste(x, collapse = ''))
# [1] 1.234568e+17
sum(x)
# 90
digitSumLarge(as.numeric(paste(x, collapse = '')))
# 85
digitsum(as.numeric(paste(x, collapse = '')))
# 81, with warning message about loss of accuracy
Is there any way I can write such a function in R?
You need arbitrary-precision numbers. With R's numerics (double-precision floats), a^b can only be represented with limited precision, and not exactly for sufficiently large input.
library(gmp)
a <- as.bigz(13)
b <- as.bigz(67)
sum(as.numeric(strsplit(as.character(a^b), split = "")[[1]]))
#[1] 328
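If installing gmp is not an option, the two-argument form mentioned in the question can also be sketched in base R by keeping the number as a vector of decimal digits and doing the multiplication by hand. This is only an illustration under the question's assumption that a and b are small positive integers; digit_sum_pow is a made-up name, and gmp will be much faster for anything large.

```r
# Digit sum of a^b using schoolbook multiplication on a digit vector.
# The digits are stored least significant first.
digit_sum_pow <- function(a, b) {
  digits <- 1
  for (step in seq_len(b)) {
    # Multiply every digit by a, then propagate the carries.
    digits <- digits * a
    carry <- 0
    for (j in seq_along(digits)) {
      total <- digits[j] + carry
      digits[j] <- total %% 10
      carry <- total %/% 10
    }
    # Any remaining carry becomes new leading digits.
    while (carry > 0) {
      digits <- c(digits, carry %% 10)
      carry <- carry %/% 10
    }
  }
  sum(digits)
}

digit_sum_pow(2, 15)   # 2^15 = 32768
# [1] 26
```

Because every intermediate value stays far below 2^53, no precision is lost along the way.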

Sum matrix recursively from previous operations

Having the following matrix and vector:
a<-matrix(c(1,4,7,
2,5,8,
3,6,9), nrow = 3)
b <- c(1,1,1)
How do I sum recursively over each row of the matrix inside a function until I obtain a desired result, using the last result to calculate the next operation, as shown:
b<-b+a[1,]
b<-b+a[2,]
b<-b+a[3,]
b<-b+a[1,]
b<-b+a[2,]
sum(b)>100 # Sum recursively until this condition holds
This operation looks similar to the answer in "Multiply recursiverly in r". However, it uses the results of previous operations to calculate the next ones.
Here's a recursive function to do what you're after,
# Sample Data
a<-matrix(c(1,4,7,
2,5,8,
3,6,9), nrow = 3)
b <- c(1,1,1)
We create a function that references itself with a value that increments modulo the number of rows
recAdd <- function(b, a, start = 1, size = NROW(a)) {
if(sum(b) > 100) return(b)
return(recAdd(b + a[start,], a, start = start %% size + 1, size))
}
> recAdd(b,a)
[1] 30 38 46
EDIT: Alternatively, here's a way with no recursion at all, which is much faster when the target number is large relative to the sum of the matrix (but is slower on data of this size). Basically, we get to take advantage of Euclidean division:
nonrecAdd <- function(b, a, target = 100) {
# Amount still missing before sum(b) can exceed the target
Remaining <- target - sum(b)
if(Remaining < 0) return(b)
# Each full pass over the rows adds sum(a)
perloop <- sum(a)
nloops <- Remaining %/% perloop
Remaining <- Remaining %% perloop
# Partial pass: add rows until the total strictly exceeds what is left
cumulativeRowsums <- cumsum(rowSums(a))
finalindex <- which(cumulativeRowsums > Remaining)[1]
b + colSums(a) * nloops + colSums(a[1:finalindex,,drop = FALSE])
}
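A plain while loop is a third option and sidesteps R's limit on recursion depth when many passes are needed. This is just a sketch, with loop_add as an invented name; it assumes the rows of a have positive sums, otherwise the loop would never reach the target.

```r
# Cycle through the rows of a, adding one row at a time to b,
# until sum(b) exceeds the target.
loop_add <- function(b, a, target = 100) {
  row <- 1
  while (sum(b) <= target) {
    b <- b + a[row, ]
    # Move to the next row, wrapping around after the last one.
    row <- row %% nrow(a) + 1
  }
  b
}

a <- matrix(c(1, 4, 7,
              2, 5, 8,
              3, 6, 9), nrow = 3)
b <- c(1, 1, 1)
loop_add(b, a)
# [1] 30 38 46
```

On the sample data this gives the same result as recAdd(b, a).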

Track loop iterations

Flip a coin. On success you win 100; otherwise you lose 50. You keep playing as long as you have money in your pocket, a. How can the value of a at each iteration be stored?
a <- 100
while (a > 0) {
if (rbinom(1, 1, 0.5) == 1) {
a <- a + 100
} else {
a <- a - 50
}
}
As a final result, when the while loop ends, I would like to be able to look at the value of a for each iteration, instead of just the final result. I consulted the post on Counting the iteration in sapply, but I wasn't able to apply it to this case.
Store the initial value of a in a second vector, and append the new value of a at each iteration.
a <- pocket <- 100
while (a > 0) {
if (rbinom(1, 1, 0.5) == 1) {
a <- a + 100
} else {
a <- a - 50
}
pocket <- c(pocket, a)
}
Of course a vectorised approach may be more efficient, e.g.:
n <- 1000000
x <- c(100, sample(c(100, -50), n, replace=TRUE))
cumsum(x)[1:match(0, cumsum(x))]
But there's no guarantee you'll run out of money within n iterations (in which case you receive an error and can just look at cumsum(x) to see the realised trajectory).
EDIT
In response to concerns voiced by @Roland, the following approach avoids reallocation of memory at each iteration:
n <- 1e6
a <- rep(NA_integer_, n)
a[1] <- 100L # set initial value (integer)
i <- 1 # counter
while(a[i] > 0) {
# first check whether our results will fit. If not, embiggenate `a`.
if(i==length(a)) a <- c(a, rep(NA_integer_, n))
if (rbinom(1, 1, 0.5) == 1) {
a[i+1] <- a[i] + 100L
} else {
a[i+1] <- a[i] - 50L
}
i <- i + 1
}
a[seq_len(i)]
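Since the expected gain per round is positive (0.5 * 100 - 0.5 * 50 = 25), a run may go on for a very long time without ever reaching zero. A capped variant of the preallocated loop might be a safer starting point. This is a sketch only: simulate_pocket and max_steps are names invented for this example, and the cap of 10000 is arbitrary.

```r
# Simulate the game, recording the pocket value after every flip.
# The loop stops when the money runs out or after max_steps flips.
simulate_pocket <- function(start = 100, max_steps = 10000) {
  pocket <- numeric(max_steps + 1)   # preallocate the whole path
  pocket[1] <- start
  i <- 1
  while (pocket[i] > 0 && i <= max_steps) {
    step <- if (rbinom(1, 1, 0.5) == 1) 100 else -50
    pocket[i + 1] <- pocket[i] + step
    i <- i + 1
  }
  pocket[seq_len(i)]                 # drop the unused tail
}

set.seed(42)                         # only for reproducibility
path <- simulate_pocket()
head(path)
```

If the returned vector has length max_steps + 1 and its last value is still positive, the cap was hit before the money ran out.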
