I have a vector Vec with these values:
1721
979
366
299
675
1456
I am struggling to find a way to obtain which combination of n values (initially for n = 2) has a sum equal to 2020.
In the example it is easy to see this, as 1721 and 299 sum to 2020, but my data is much longer and I would like to generalize to n values, so that I have a function where I pass a vector and a value to choose how many numbers to combine (it can be 2, 3, 5, ...). My output would be c(1721, 299).
You can get all combinations of your input numbers with combn(), which returns a matrix where the combinations are columns. So then you just need to take the sum of each column and see which one is equal to your target.
Vec <- c(1721, 979, 366, 299, 675, 1456)
n <- 2
all_combinations <- combn(Vec, n)
all_combinations[, colSums(all_combinations) == 2020]
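For the example vector, only one column sums to the target, so the subsetting drops down to the matching pair:
## [1] 1721  299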
If you just want to find one solution (there might be multiple solutions) to the subset sum problem, you could try subsetsum from the adagio package:
> adagio::subsetsum(Vec, 2020)
$val
[1] 2020
$inds
[1] 1 4
such that
> Vec[adagio::subsetsum(Vec, 2020)$inds]
[1] 1721 299
Another way is using combn, e.g.,
f <- function(Vec, Tar, n) {
  Filter(
    length,
    combn(Vec, n, FUN = function(x) ifelse(sum(x) == Tar, list(x), list(NULL)))
  )
}
where the function f keeps only the size-n combinations whose sum equals the target, such that
> f(Vec,2020,2)
[[1]]
[1] 1721 299
> f(Vec,2020,3)
[[1]]
[1] 979 366 675
Note: Benchmarks at this gist. Note also the memory allocation on each approach.
Update
For a faster version of the combn recommendation, check out comboGeneral from the "RcppAlgos" package:
fun_RcppAlgos <- function(x, target, n) {
  a <- RcppAlgos::comboGeneral(x, n)
  a[which(rowSums(a) == target), ]
}
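For the example vector, this should return the matching pair (comboGeneral works on the sorted values, so the order can differ from the input):
fun_RcppAlgos(Vec, 2020, 2)
## [1]  299 1721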
For n = 2, and assuming that you're only expecting one pair to be returned, the solution is as simple as:
Vec[(2020 - Vec) %in% Vec]
## [1] 1721 299
For n = 3, my initial thought was to use combn or expand.grid (or data.table::CJ), but then I thought this might also be a good case for a for loop. Since I don't use for loops a lot, here's what I came up with:
fun_for <- function(x, target, n) {
  if (!n %in% c(2, 3)) stop("The accounting Elves are crazy!")
  if (n == 2) {
    out <- x[(target - x) %in% x]
  } else if (n == 3) {
    out <- numeric(0)
    for (i in seq_along(x)) {
      s1 <- x + x[i]                # s1[k] = x[k] + x[i]
      for (j in seq_along(s1)) {
        s2 <- s1 + x[j]             # s2[k] = x[k] + x[i] + x[j]
        if (any(s2 == target)) out <- c(out, x[which(s2 == target)])
      }
    }
    out <- unique(out)
  }
  out
}
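For example, with the vector from the question (the order of the returned values follows the loop order):
fun_for(Vec, 2020, 3)
## [1] 675 366 979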
And, for expand.grid and data.table::CJ, these were the functions I used:
fun_eg <- function(x, target, n) {
  a <- expand.grid(replicate(n, x, FALSE))
  unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
fun_cj <- function(x, target, n) {
  a <- do.call(data.table::CJ, replicate(n, x, FALSE))
  unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
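A quick check with the example data (the first matching row is returned, so the pair should come back in a different order than the input):
fun_eg(Vec, 2020, 2)
## [1]  299 1721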
The reason I'm extremely hesitant about the expand.grid type approach is that you can quickly end up having to generate a huge table against which you're going to be checking. For example, with length(x) == 500, you'd have to create a table with 125,000,000 rows and 3 columns that you're going to have to check against.
combn is a bit better. With combn, if you have length(x) == 500, you'd have to create a matrix with 3 rows and 20,708,500 columns (run choose(500, 3) to calculate the number of columns).
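You can verify both counts directly:
500^3
## [1] 1.25e+08
choose(500, 3)
## [1] 20708500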
Keeping all of that in mind, I ran some tests, which I've posted at this gist (rather than crowding this post further). This is one of those cases where a for loop makes sense, and if you continue with the Advent of Code 2020 problems, you're probably going to have to practice your looping and recursion skills a lot. Have fun!
I am using very basic coding.
Sub <- list()
for (x in Vec) {
  Sub[[as.character(x)]] <- 2020 - x  # the complement needed to reach 2020
  if (Sub[[as.character(x)]] %in% Vec) {
    print(paste0(x, ",", Sub[[as.character(x)]]))
  }
}
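With the example vector this prints the matching pair, once from each side:
#[1] "1721,299"
#[1] "299,1721"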
Related
Need to apply code that has the following effect on a huge data set:
sum_to_limit <- function(x, limit) {
  ret <- 0
  if (length(na.omit(x)) > 0) {
    for (i in seq_along(x)) {
      if (ret + x[i] <= limit) ret <- ret + x[i]
    }
  }
  return(ret)
}
In other words, I need to sum up all the elements of the supplied vector that add up to no more than limit. So for example sum_to_limit(c(10,10,10,10,5), 17) = 15. I have so far failed to come up with anything faster than the above, which is not really cutting it on my data. It feels like it should be possible to create a vectorized version...
You can set up an optimization problem. It might only be worth it if you have longer vectors rather than many small ones:
library(lpSolve)
sum_to_limit <- function(x, limit) {
  sol <- lp("max",
            objective.in = x,           # maximize the total of the selected entries
            const.mat = matrix(x, nrow = 1),
            const.dir = "<=",
            const.rhs = limit,
            all.bin = TRUE)             # each entry is either in or out
  stopifnot(sol$status == 0)            # status 0 means an optimal solution was found
  return(x[sol$solution == 1])
}
print(sum_to_limit(c(10,10,10,10,5), 17))
I found it more interesting to return the selected entries rather than the sum.
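For the example, the optimum under the 17 limit is 15 = 10 + 5, so the call above should print one of the 10s together with the 5:
#[1] 10  5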
You can sort the vector and add the elements up as long as the cumulative sum stays within the limit.
sum_to_limit <- function(x, limit) {
  x <- sort(x)
  sum(x[cumsum(x) <= limit])
}
sum_to_limit(c(10,10,10,10,5), 17)
#[1] 15
sum_to_limit(c(10,10,10,10,5), 35)
#[1] 35
I am trying to create a function that computes the sum of digits of large numbers, of the order of 100^100. The approach described in this question does not work, as shown below. I tried to come up with a function that does the job, but have not been able to get very far.
The inputs would be of the form a^b, where 1 < a, b < 100 and a and b are integers. So, in that sense, I am open to making digitSumLarge a function that accepts two arguments.
digitSumLarge <- function(x) {
  pow <- floor(log10(x)) + 1L  # number of digits
  rem <- x
  i <- 1L
  num <- integer(length = pow)
  # Individually isolate each digit starting from the largest and store it in num
  while (rem > 0) {
    num[i] <- rem %/% (10^(pow - i))
    rem <- rem %% (10^(pow - i))
    i <- i + 1L
  }
  return(sum(num))
}
# Function in the highest voted answer of the linked question.
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
Consider the following tests:
x <- c(1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9)
as.numeric(paste(x, collapse = ''))
# [1] 1.234568e+17
sum(x)
# 90
digitSumLarge(as.numeric(paste(x, collapse = '')))
# 85
digitsum(as.numeric(paste(x, collapse = '')))
# 81, with warning message about loss of accuracy
Is there any way I can write such a function in R?
You need arbitrary-precision numbers. a^b with R's numerics (double-precision floats) can only be represented with limited precision, not exactly, for sufficiently large input.
library(gmp)
a <- as.bigz(13)
b <- as.bigz(67)
sum(as.numeric(strsplit(as.character(a^b), split = "")[[1]]))
#[1] 328
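Since the inputs are of the form a^b, you could wrap this in a two-argument function (digitSumBig is a hypothetical name):
digitSumBig <- function(a, b) {
  n <- gmp::as.bigz(a)^b  # exact big-integer power
  sum(as.integer(strsplit(as.character(n), split = "")[[1]]))
}
digitSumBig(13, 67)
#[1] 328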
a <- c(0,3,7,2)
b <- 10
I am trying to distribute a certain number of units (b) over the length of vector a. Instead of adding 10/4 to every value of a, I want to fill the smallest values up first; the result vector for this case should be c(5,5,7,5).
What I've tried:
f <- sum(a) + b  # the total to reach after distributing b units
opt.vert <- function(b, a, f) {
  repeat {
    lapply(1:length(a),
           function(x) if ((a[[x]] == min(a)) && (a[[x]] < (b / length(a)))) {
             a[[x]] <- a[[x]] + 1
           } else {
             a[[x]] <- a[[x]]
           })
    if (sum(a) >= f) break
  }
  return(a)
}
Apart from that approach being horribly inelegant, it also doesn't work. I'm having a hard time figuring out what's wrong with it because it seems to drag me into an infinite loop, so I get no error message.
for (i in seq_len(b)) a[which.min(a)] <- a[which.min(a)] + 1
#[1] 5 5 7 5
Note that which.min returns the position of the first minimum. If you want to break ties differently, you'll have to modify this slightly.
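For example, a sketch that breaks ties at random instead of always taking the first minimum:
for (i in seq_len(b)) {
  idx <- which(a == min(a))  # all positions tied for the minimum
  # guard: sample(idx, 1) would draw from 1:idx if idx had length one
  pick <- if (length(idx) > 1) sample(idx, 1) else idx
  a[pick] <- a[pick] + 1
}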
(I suspect spending some time on the mathematical background of the task might lead to more efficient solutions that could avoid loops. Might be a nice puzzle for people with more spare time.)
Something like this, using Reduce to apply the increment step b times:
a <- c(0, 3, 7, 2)
b <- 10
Reduce(function(x, y) {
  idx <- which.min(x)
  x[idx] <- x[idx] + 1
  x
}, rep(1, b), a, accumulate = TRUE)
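With accumulate = TRUE the result is a list of every intermediate state; its last element is the final allocation, c(5, 5, 7, 5). Drop accumulate = TRUE if you only need the end state.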
All I want to do is implement the solution given here (the one in Python) in R.
I'm not very used to debugging in RStudio, but even after trying that I still can't figure out why my code does not work. Basically (with the example input provided) the function runs over all the numbers and then gets stuck in what looks like infinite recursion. Can someone please point me in the right direction?
subset_sum <- function(numbers, target, partial = numeric(0)) {
  s <- sum(partial, na.rm = TRUE)
  # check if the partial sum equals the target
  if (s == target) {
    cat("sum(", partial, ")", "=", target)
  }
  else if (s >= target) {
    return()  # if we reach the number why bother to continue
  }
  else {
    for (i in 1:length(numbers)) {
      n <- numbers[i]
      remaining <- numbers[i+1:length(numbers)]
      subset_sum(remaining, target, partial = append(partial, n))
    }
  }
}
subset_sum(c(3,9,8,4,5,7,10),15)
When not run in debug mode it gives me these errors:
Error: node stack overflow
Error during wrapup: node stack overflow
Here's a recursive implementation in R
subset_sum = function(numbers, target, partial = 0) {
  if (any(is.na(partial))) return()
  s = sum(partial)
  if (s == target) print(sprintf("sum(%s)=%s", paste(partial[-1], collapse = "+"), target))
  if (s > target) return()
  for (i in seq_along(numbers)) {
    n = numbers[i]
    remaining = numbers[(i+1):length(numbers)]
    subset_sum(remaining, target, c(partial, n))
  }
}
I had to add one extra catch in R, compared with the Python, to handle the case when i+1 > length(numbers), which produces an NA.
> subset_sum(c(3,9,8,4,5,7,10),15)
[1] "sum(3+8+4)=15"
[1] "sum(3+5+7)=15"
[1] "sum(8+7)=15"
[1] "sum(5+10)=15"
Your actual issue is operator precedence: numbers[i+1:length(numbers)] is parsed as numbers[i + (1:length(numbers))] rather than numbers[(i+1):length(numbers)], because : binds tighter than +, so remaining never gets shorter and the recursion never terminates. Interestingly, when I put the if(i+1 > length(numbers)) return() inside the for loop, that broke the functionality and I didn't get all the answers - the NA check needs to happen at the top of the recursive call, outside the loop.
This is not a recursive function, but it takes advantage of R's ability to handle matrix/array data. Some output is shown after #.
v <- c(3,9,8,4,5,7,10)
v <- sort(v)
# [1] 3 4 5 7 8 9 10
target <- 15
# we never need more than 4 numbers, since 3+4+5+7 (the four smallest) already exceeds 15
mincombs <- min(which(cumsum(v) > target))
# [1] 4
Combs <- combn(v, mincombs) # make combinations of numbers
ans <- mapply(function(x, y) ifelse(y > 0, paste0(paste0(Combs[1:y, x], collapse = "+"), "=", target), NA),
              1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
# [1] "3+4+8=15" "3+4+8=15" "3+5+7=15" "3+5+7=15" "3+5+7=15" "7+8=15"
Wrapped in a function:
myfun <- function(V, target) {
  V <- sort(V)
  mincombs <- min(which(cumsum(V) > target))
  Combs <- combn(V, mincombs)
  ans <- mapply(function(x, y) ifelse(y > 0, paste0(paste0(Combs[1:y, x], collapse = "+"), "=", target), NA),
                1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
  ans <- unlist(ans[lengths(ans) > 0])
  return(ans)
}
myfun(V = c(3,9,8,4,5,7,10), target = 15)
myfun(V = c(3,9,8,4,5,7,10,12,4,32),target = 20)
The idea of Project Euler problem 12 is to find the smallest triangular number with a specified number of divisors (https://projecteuler.net/problem=12). As an attempt to solve this problem, I wrote the following code:
# This function finds all the divisors of a number and returns them as a vector.
FUN <- function(x) {
  i = 1
  lst = integer(0)
  while (i <= x) {
    if (x %% i == 0) {
      lst = c(lst, i)
    }
    i = i + 1
  }
  return(lst)
}
and
n = 1
i = 1
while (length(FUN(n)) < 500) {
  i = i + 1
  n = n + i
}
This code produces the correct answer for a few smaller test cases: length(FUN(n)) < 4 will produce 6, and length(FUN(n)) < 6 will produce 28.
However, this simple-looking code has been running for over 24 hours (and is still running) for length(FUN(n)) < 500. I understand that for a number to have 500 divisors it probably has to be very big, but I am wondering why it is taking so long to run.
Your FUN is much too inefficient for this task. As the first triangular number with more than 500 divisors is above the 12,000th, with a value around 75,000,000, and FUN runs through all these numbers, the number of iterations to perform is almost
12000 * 75000000 / 2 = 450 * 10^9
This is clearly more than R's relatively slow for-loop can do in a reasonable time frame.
Instead, you could apply the divisors function from the numbers package, which employs a prime factor decomposition. The following code needs about 5-6 seconds (on my machine) to find the triangular number.
library(numbers)
t <- 0
system.time(
  for (i in 1:100000) {
    t <- t + i  # t is now the i-th triangular number
    d <- length(divisors(t))
    if (d > 500) {
      cat(i, t, d, '\n')
      break
    }
  }
)
## 12375 76576500 576
## user system elapsed
## 5.660 0.000 5.658
Instead of calculating the i-th triangular number from scratch, here i is added to the previous triangular number. The time saving is minimal.
Here's my attempt:
library(gmp)
library(plyr)
get_all_factors <- function(n)
{
  prime_factor_tables <- lapply(
    setNames(n, n),
    function(i)
    {
      if (i == 1) return(data.frame(x = 1L, freq = 1L))
      plyr::count(as.integer(gmp::factorize(i)))
    }
  )
  lapply(
    prime_factor_tables,
    function(pft)
    {
      powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
      power_grid <- do.call(expand.grid, powers)
      sort(unique(apply(power_grid, 1, prod)))
    }
  )
}
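As a quick sanity check, the triangular number found above should report its 576 divisors:
length(get_all_factors(76576500)[[1]])
## [1] 576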
for (i in 99691200:100000) {
  if (length(get_all_factors(i)[[1]]) > 500) print(paste(i, length(get_all_factors(i)[[1]])))
  if (i %% 100000 == 0) print(paste("-", i, "-"))  # progress marker
}
Let it run as long as you can be bothered...