Combinations of numbers to reach a given sum - recursive implementation in R

All I want to do is implement the solution given here (the Python one) in R.
I'm not very used to debugging in RStudio, and even after trying it I still can't figure out why my code does not work. Basically (with the example input provided) the function runs over all the numbers and then gets stuck in a sort of infinite loop (or infinite recursion). Can someone please point me in the right direction?
subset_sum <- function(numbers, target, partial = numeric(0)){
s <- sum(partial,na.rm = TRUE)
# check if the partial sum equals to target
if (s == target){
cat("sum(",partial,")","=",target)
}
else if (s >= target) {
return() # if we reach the number why bother to continue
}
else {
for(i in 1:length(numbers)){
n <- numbers[i]
remaining <- numbers[i+1:length(numbers)]
subset_sum(remaining, target, partial = append(partial,n))
}
}
}
subset_sum(c(3,9,8,4,5,7,10),15)
When not run in debug mode it gives me these errors:
Error: node stack overflow
Error during wrapup: node stack overflow
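A likely culprit, worth checking first, is operator precedence: in R the : operator binds more tightly than binary +, so numbers[i+1:length(numbers)] is evaluated as numbers[i + (1:length(numbers))]. That vector is always as long as numbers, padded with NAs instead of shrinking, and because sum(partial, na.rm = TRUE) ignores the NAs that get appended, the partial sum can stop growing while the recursion keeps going until the stack overflows. A quick check:

```r
numbers <- c(3, 9, 8, 4, 5, 7, 10)
i <- 2
# parsed as i + (1:7), i.e. indices 3:9; indices 8 and 9 are out of range, giving NA
numbers[i+1:length(numbers)]
# the intended slice: the elements after position i
numbers[(i+1):length(numbers)]
```

The recursive answer below wraps the range in parentheses in the same way.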

Here's a recursive implementation in R:
subset_sum = function(numbers,target,partial=0){
# partial starts as 0 so sum() works on the first call; partial[-1] drops that 0 when printing
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
# when i is the last index, (i+1):length(numbers) counts down and gives c(NA, last element)
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
Compared with the Python version, I had to add one extra check in R: when i+1 > length(numbers), the indexing returns an NA, so any partial containing an NA is discarded.
> subset_sum(c(3,9,8,4,5,7,10),15)
[1] "sum(3+8+4)=15"
[1] "sum(3+5+7)=15"
[1] "sum(8+7)=15"
[1] "sum(5+10)=15"
I think (but I'm not sure) that your issue was the nested if/else if logic in a recursive function. Interestingly, when I put if(i+1 > length(numbers)) return() inside the for loop, it broke the function and I didn't get all the answers; the return() calls need to be outside the loop.
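One side effect of the (i+1):length(numbers) indexing is worth knowing: when i is the last position the sequence counts down, so remaining becomes c(NA, last element) and that last element can be chosen a second time. With the example vector and target 20, the function above also prints sum(10+10)=20; the output for target 15 is unaffected. A minimal sketch of an alternative, assuming all numbers are positive, uses negative indexing, numbers[-seq_len(i)], which is empty at the end, so neither the NA check nor the dummy 0 in partial is needed. The name subset_sum2 is hypothetical, and it returns the matches instead of printing them.

```r
# Sketch (hypothetical name subset_sum2): collects matches instead of printing them.
subset_sum2 <- function(numbers, target, partial = numeric(0)) {
  s <- sum(partial)
  found <- character(0)
  if (length(partial) > 0 && s == target) {
    found <- sprintf("sum(%s)=%s", paste(partial, collapse = "+"), target)
  }
  if (s > target) return(found)  # assumes positive numbers: the sum can only grow
  for (i in seq_along(numbers)) {
    # numbers[-seq_len(i)] drops positions 1..i; at the last position it is
    # numeric(0), so no NA is created and no element can be picked twice
    rest <- numbers[-seq_len(i)]
    found <- c(found, subset_sum2(rest, target, c(partial, numbers[i])))
  }
  found
}

subset_sum2(c(3, 9, 8, 4, 5, 7, 10), 15)
```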

This is not a recursive function, but it takes advantage of R's ability to handle matrix/array data. Some output is shown in comments after #.
v <- c(3,9,8,4,5,7,10)
v <- sort(v)
# [1] 3 4 5 7 8 9 10
target <- 15
# we need to check at most 4 numbers, since 3+4+5+7 (the 4 smallest numbers) is already greater than 15
mincombs <- min(which(cumsum(v) > target))
# [1] 4
Combs <- combn(v, mincombs) # make combinations of numbers
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
# [1] "3+4+8=15" "3+4+8=15" "3+5+7=15" "3+5+7=15" "3+5+7=15" "7+8=15"
Wrapped in a function:
myfun <- function(V, target) {
V <- sort(V)
mincombs <- min(which(cumsum(V) > target))
Combs <- combn(V, mincombs)
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
return(ans)
}
myfun(V = c(3,9,8,4,5,7,10), target = 15)
myfun(V = c(3,9,8,4,5,7,10,12,4,32),target = 20)
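Note that the prefix trick can miss combinations: with the example vector the output above has no 5+10=15, because 10 is the largest value and so can never be followed by two more numbers inside a 4-number combination (the output also contains duplicates). A sketch of a variant that stays matrix-based but checks every subset size k separately with combn() and colSums(); find_sums is a hypothetical name, and it assumes positive numbers, so no subset larger than the count of smallest values that still fit under target needs to be checked:

```r
# Sketch: test every subset size k with combn() + colSums().
# Assumes positive numbers, so subsets larger than kmax cannot hit target.
find_sums <- function(v, target) {
  v <- sort(v)
  # largest k such that the k smallest values still sum to <= target
  kmax <- sum(cumsum(v) <= target)
  out <- character(0)
  for (k in seq_len(kmax)) {
    idx <- combn(length(v), k)          # each column: indices of one k-subset
    combs <- matrix(v[idx], nrow = k)   # same layout, filled with the values
    hits <- combs[, colSums(combs) == target, drop = FALSE]
    if (ncol(hits) > 0) {
      out <- c(out, paste0(apply(hits, 2, paste, collapse = "+"), "=", target))
    }
  }
  out
}

find_sums(c(3, 9, 8, 4, 5, 7, 10), 15)
```

Indexing through combn(length(v), k) rather than combn(v, k) sidesteps combn's special case where a single positive number is treated as seq_len() of that number.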

Related

Efficiently sum all elements of vector that add up to no more than a specified limit

I need to apply code with the following effect to a huge data set:
sum_to_limit <- function(x, limit) {
ret <- 0
if (length(na.omit(x)) > 0) {
for (i in seq_along(x)) {
if (ret + x[i] <= limit) ret <- ret + x[i]
}
}
return(ret)
}
In other words, I need to sum up the elements of the supplied vector, taken in order, skipping any element that would push the running total above limit. For example, sum_to_limit(c(10,10,10,10,5), 17) = 15. So far I have failed to come up with anything faster than the above, which is too slow on my data. It feels like it should be possible to write a vectorized version...
You can set this up as an optimization problem (here: select as many elements as possible whose total stays within limit). It might only be worth it if you have long vectors rather than many small ones:
library(lpSolve)
sum_to_limit <- function(x, limit) {
sol <- lp ("max",
objective.in = rep(1, length(x)),
const.mat = matrix(x, nrow=1),
const.dir = "<=",
const.rhs = limit,
all.bin = T)
stopifnot(sol$status==0) # no solution
return(x[sol$solution==1])
}
print(sum_to_limit(c(10,10,10,10,5), 17))
I found it more interesting to return the selected entries rather than the sum.
You can sort the vector and add up elements as long as the cumulative sum stays within the limit.
sum_to_limit <- function(x, limit) {
x <- sort(x)
sum(x[cumsum(x) <= limit])
}
sum_to_limit(c(10,10,10,10,5), 17)
#[1] 15
sum_to_limit(c(10,10,10,10,5), 35)
#[1] 35
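The sorting answer changes which elements get picked: it takes the smallest values first, whereas the loop in the question walks through x in its original order and skips only the elements that would overflow. The two agree on the question's example but not in general. A small comparison, with the loop rewritten via Reduce() purely to state its semantics compactly (it is not expected to be faster); both function names are hypothetical:

```r
# In-order rule from the question: add x[i] only if the total stays <= limit.
sum_to_limit_loop <- function(x, limit) {
  Reduce(function(acc, xi) if (acc + xi <= limit) acc + xi else acc, x, 0)
}

# Rule from the sorting answer: smallest values first, stop at the limit.
sum_to_limit_sorted <- function(x, limit) {
  x <- sort(x)
  sum(x[cumsum(x) <= limit])
}

sum_to_limit_loop(c(10, 10, 10, 10, 5), 17)    # 15
sum_to_limit_sorted(c(10, 10, 10, 10, 5), 17)  # 15
sum_to_limit_loop(c(10, 5, 3), 13)             # 13 (10 + 3)
sum_to_limit_sorted(c(10, 5, 3), 13)           # 8 (3 + 5)
```

If the in-order rule is what the data require, the sorted shortcut does not reproduce it.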

Find the n values whose sum is equal to 2020

I have a vector Vec with these values:
1721
979
366
299
675
1456
I am struggling to find a way to obtain which combination of n values (initially n=2) has a sum equal to 2020.
In the example it is easy to see that 1721 and 299 sum to 2020, but my real data is much longer, and I would like to generalize to n values, so that I have a function where I pass a vector and the number of values to combine (it can be 2, 3, 5, ...). My output would be c(1721,299).
You can get all combinations of your input numbers with combn(), which returns a matrix where the combinations are columns. So then you just need to take the sum of each column and see which one is equal to your target.
Vec <- c(1721,
979,
366,
299,
675,
1456)
n <- 2
all_combinations <- combn(Vec,n)
all_combinations[,colSums(all_combinations) == 2020]
If you just want to find one solution (there might be multiple solutions) to the subset sum problem, you could try subsetsum() from the adagio package:
> adagio::subsetsum(Vec, 2020)
$val
[1] 2020
$inds
[1] 1 4
such that
> Vec[adagio::subsetsum(Vec, 2020)$inds]
[1] 1721 299
Another way is using combn, e.g.,
f <- function(Vec, Tar, n) {
Filter(
length,
combn(Vec, n, FUN = function(x) ifelse(sum(x) == Tar, list(x), list(NULL)))
)
}
where a function f is defined for the objective, such that
> f(Vec,2020,2)
[[1]]
[1] 1721 299
> f(Vec,2020,3)
[[1]]
[1] 979 366 675
Note: benchmarks are at this gist. Note also the memory allocation of each approach.
Update
For a faster version of the combn recommendation, check out comboGeneral from the "RcppAlgos" package:
fun_RcppAlgos <- function(x, target, n) {
a <- RcppAlgos::comboGeneral(x, n)
a[which(rowSums(a) == target), ]
}
For n = 2, and assuming that you're only expecting one pair to be returned, the solution is as simple as:
Vec[(2020 - Vec) %in% Vec]
## [1] 1721 299
For n = 3, my initial thought was to use combn or expand.grid (or data.table::CJ), but then I thought this might also be a good case for a for loop. Since I don't use for loops a lot, here's what I came up with:
fun_for <- function(x, target, n) {
if (!n %in% c(2, 3)) stop("The accounting Elves are crazy!")
if (n == 2) {
out <- x[(target - x) %in% x]
} else if (n == 3) {
out <- numeric(0)
for (i in seq_along(x)) {
s1 <- x + x[i]
for (j in seq_along(s1)) {
s2 <- s1 + x[j]
if (any(s2 == target)) out <- c(out, x[which(s2 == target)])
}
}
out <- unique(out)
}
out
}
And, for expand.grid and data.table::CJ, these were the functions I used:
fun_eg <- function(x, target, n) {
a <- expand.grid(replicate(n, x, FALSE))
unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
fun_cj <- function(x, target, n) {
a <- do.call(data.table::CJ, replicate(n, x, FALSE))
unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
The reason I'm extremely hesitant about the expand.grid type approach is that you can quickly end up having to generate a huge table against which you're going to be checking. For example, with length(x) == 500, you'd have to create a table with 125,000,000 rows and 3 columns that you're going to have to check against.
combn is a bit better. With combn, if you have length(x) == 500, you'd have to create a matrix with 3 rows and 20,708,500 columns (run choose(500, 3) to calculate the number of columns).
Keeping all of that in mind, I ran some tests, which I've posted at this gist (rather than crowding this post further). This is one of those cases where a for loop makes sense, and if you continue with the Advent of Code 2020 problems, you're probably going to have to practice your looping and recursion skills a lot. Have fun!
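Here is a very basic loop:
Sub <- list()
for(x in Vec){
# Sub stores the partner value 2020 - x under the name of x
Sub[[as.character(x)]] <- 2020-x
# each matching pair is printed twice, once from each side
if(Sub[[as.character(x)]] %in% Vec){print(paste0(x,",",Sub[[as.character(x)]]))}
}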
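One caveat about the n = 2 shortcuts, Vec[(2020 - Vec) %in% Vec] and the loop above: a value equal to exactly half the target pairs with itself, so a single 1010 would be reported for 2020. A sketch that requires the partner to sit at a different position (pair_values is a hypothetical name; it is a plain O(n^2) check, not tuned for speed):

```r
# Hypothetical helper: keep x[i] only if target - x[i] occurs at some other position j != i.
pair_values <- function(x, target) {
  has_partner <- vapply(seq_along(x), function(i) (target - x[i]) %in% x[-i], logical(1))
  x[has_partner]
}

Vec <- c(1721, 979, 366, 299, 675, 1456)
pair_values(Vec, 2020)                 # 1721 299
pair_values(c(1010, 500, 1520), 2020)  # 500 1520; the lone 1010 is not paired with itself
```

Two copies of 1010 are still reported, since each one finds the other.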

Prime numbers from random samples in R

I wrote the following code to try to find all the prime numbers in a randomly generated data set. Sadly, something seems to have gone wrong; could anybody help me?
set.seed(20171106)
n <- 10000
num <- sample(1:100000,n,replace=TRUE)
findPrime <- function(x){
apple<-c()
n<-length(x)
for(i in n){
if(any(x[i]%%(1:(x[i]-1))!=0)) apple <-c(apple,x[i])
}
return(apple)
}
To get the results, I run:
findPrime(num)
This is the warning message:
Warning message:
In if (x[i]%%(1:(x[i] - 1)) == 0) apple <- c(apple, x[i]) :
the condition has length > 1 and only the first element will be used
So how can I fix the problem?
if statements only accept a single logical value, and in your code the condition seems to get the whole vector. I have rewritten your function using an ifelse() expression wrapped inside sapply().
I hope this works for you.
findPrime <- function(x = 0){
primes <- c()
# Prime finder
primes <- sapply(X = x,FUN = function(x) {
ifelse(any(x %% (1:(x - 1)) != 0), T, F)}
)
# Select primes
primes <- num[primes]
return(primes)
}
findPrime(num)
I have spotted another silly mistake: inside the function, change num to x in the select-primes step and invert the F, T outcomes. It should look like this:
findPrime <- function(x = 0){
primes <- c()
# Prime finder: x is prime if x >= 2 and no number in 2:(x-1) divides it;
# seq_len(x - 1)[-1] is 2:(x-1), and is empty when x == 2
primes <- sapply(X = x,FUN = function(x) {
ifelse(x < 2 || any(x %% seq_len(x - 1)[-1] == 0), F, T)}
)
# Select primes
primes <- x[primes]
return(primes)
}
I have just tried it and it works fine.
Use the gmp package, whose isprime() function returns 0 for non-prime numbers, 2 for prime numbers and 1 for probable primes, and then subset the data based on that.
For example, with a vector a = c(1:10):
a = c(1:10)
b = gmp::isprime(a)
c = cbind(a,b)
c = as.data.frame(c)
c = c[c$b==2,]
a1 = c$a
a1
The error in your code is in the loop header: for(i in n) iterates only over the single value n; it should be for(i in 1:n).
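Trial division only needs to go up to floor(sqrt(x)), because any composite x has a divisor no larger than its square root; for values up to 100000 that means at most 315 candidate divisors instead of up to 99998. A sketch along the lines of the corrected sapply() answer (is_prime and find_primes are hypothetical names):

```r
# Sketch: trial division by 2..floor(sqrt(x)) for each element.
is_prime <- function(x) {
  if (x < 2) return(FALSE)
  d <- seq_len(floor(sqrt(x)))[-1]  # candidate divisors 2..floor(sqrt(x)); empty for 2 and 3
  all(x %% d != 0)                  # all(logical(0)) is TRUE, so 2 and 3 count as prime
}

find_primes <- function(x) x[vapply(x, is_prime, logical(1))]

find_primes(1:20)  # returns 2 3 5 7 11 13 17 19
```

Applied to the question's data this would be find_primes(num).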

Summation of max function with vector input in R?

I want to build this function: $f(a,b)=\sum_{i=1}^{n}\max(a-b_i,0)$, where $b=(b_1,b_2,\ldots,b_n)$. This is what I have done:
vec<-function(a,b){
z<-0
for(i in b){
ifelse(a > i,z <- z + (a-i), 0)
}
return(data.frame(z))
}
This code gives correct output for a scalar input a, but with a vector input the answers are not always correct. For example
> vec(c(-6,5),c(3,1,3))
gives -25 for a=-6 and 8 for a=5, respectively.
But > vec(-6,c(3,1,3)) gives 0, and this is the correct answer.
Please give me some idea of how to correct it.
When you let a = c(-6,5), the condition a > i becomes c(FALSE, TRUE). Since it contains a TRUE, ifelse() evaluates z <- z + (a-i) for the whole vector, including a = -6, which is how that entry ends up at -25. I would suggest making the test element-wise, something like this:
vec<-function(a,b){
z<-0
for(i in b){
# add a - i only where a > i, element by element
z <- z + ifelse(a > i, a - i, 0)
}
return(data.frame(z))
}
I believe that what you are missing is the condition on only summing up positive numbers, as expressed by the condition 'max(a-b_i, 0)'.
This code works fine for the examples you provide:
vec <- function(a, b){
n <- NULL
for(i in 1:length(a)){
z <- a[i] - b
z <- z[z > 0]
n <- c(n, sum(z))
}
return(n)
}
For example:
> vec(c(-6, 5), c(3, 1, 3))
[1] 0 8
and
> vec(-6, c(3, 1, 3))
[1] 0
If you want to get a data.frame back, just replace return(n) with return(data.frame(n)).
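A loop-free sketch of the same formula, assuming a and b are plain numeric vectors: outer() builds the matrix of all differences a[j] - b[i] with one row per element of a, negative entries are set to 0 (the max(., 0) part), and rowSums() adds up each row. The name vec_outer is hypothetical. The matrix has length(a) * length(b) entries, so for very long inputs the loop above may be preferable.

```r
vec_outer <- function(a, b) {
  d <- outer(a, b, "-")  # d[j, i] = a[j] - b[i]
  d[d < 0] <- 0          # max(a[j] - b[i], 0)
  rowSums(d)             # one sum per element of a
}

vec_outer(c(-6, 5), c(3, 1, 3))  # 0 8
vec_outer(-6, c(3, 1, 3))        # 0
```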

Change length.out in ifelse function

I'm running a simple ifelse function
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
where shift is from the data.table package
which allows me to change, for each column in a data frame (using apply), a value that is exactly the same as the previous one. The problem is that ifelse returns a result whose length equals the length of the test. In this case, the length is that of shift(x) and not x. Therefore I end up with the first element (or the last, if using type = "lead" instead of the default "lag") of each column turned into NA.
Here is an MWE:
library(data.table) # for shift()
a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
apply(data, 2, f)
Therefore I thought I could change the ifelse function itself. I've made a few attempts to change the length.out, but I haven't succeeded yet. This is the source of ifelse:
function (test, yes, no)
{
if (is.atomic(test)) {
if (typeof(test) != "logical")
storage.mode(test) <- "logical"
if (length(test) == 1 && is.null(attributes(test))) {
if (is.na(test))
return(NA)
else if (test) {
if (length(yes) == 1 && is.null(attributes(yes)))
return(yes)
}
else if (length(no) == 1 && is.null(attributes(no)))
return(no)
}
}
else test <- if (isS4(test))
methods::as(test, "logical")
else as.logical(test)
ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
ans[test & ok] <- rep(yes, length.out = length(ans))[test &
ok]
if (any(!test[ok]))
ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
ok]
ans[nas] <- NA
ans
}
EDIT
My original code was:
copy <- copy(data)
for (j in 1: ncol(copy)) {
for (i in 2: nrow(copy)) {
if (copy[i,j] == copy[i-1,j] & !is.na(copy[i,j]) & !is.na(copy[i-1,j])) {
copy[i,j] <- copy[i-1,j] + (0.0001*sd(copy[,j], na.rm = T))
}
}
}
but using it on large matrices may be slow. It does deal with multiple repetitions.
The goal was to get to a vectorised, quicker method using a function and apply.
As you mention, your approach leads to an NA in the first element of the vector returned by f. This first element is not similar to the previous one (since there is none), so we would like to leave the first value unchanged.
A straightforward approach is to do just that. Apologies, it does not answer your title question although it does solve your problem.
f <- function(x) {
# storing the output of ifelse in a variable
out <- ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in a R function,
# the last thing evaluated is returned
out
}
Note that this will not properly take care of elements repeated more than twice (e.g. c(1,2,2,2,3)). Also, it changes all your elements the same way: in c(1,2,2,1,2,2), all the second 2s will be changed by the same amount. This may or may not be what you want.
You could hack something (a comment suggests ?rle), but I suggest changing the way you randomize your data, if this makes sense with your particular data.
Instead of adding 0.001*sd, maybe you could add Gaussian noise with this standard deviation? This obviously depends on your application.
f <- function(x) {
# adding gaussian noise with small sd to repeated values
# storing the output in a variable `out`
out <- ifelse(x==shift(x),
x + rnorm(length(x), mean=0,
sd=0.01*sd(x, na.rm = TRUE)),
x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in a R function,
# the last thing evaluated is returned
out
}
It depends on your purpose for getting rid of exactly duplicated values.
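Another way around the NA, without data.table, is to build the comparison so that it already has the right length and a FALSE in front: c(FALSE, x[-1] == x[-length(x)]). The result of ifelse() then needs no patching afterwards. This is a sketch; like the answer above it gives every element equal to its predecessor the same offset, so in c(1,2,2,2,3) the last two 2s both become 2 + 0.001*sd and are still equal to each other.

```r
f <- function(x) {
  # TRUE where x[k] equals x[k - 1]; the first element has no predecessor,
  # so it gets FALSE instead of the NA that shift() would produce
  # (assumes x has at least one element)
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  repeated[is.na(repeated)] <- FALSE  # comparisons involving NA count as "not repeated"
  ifelse(repeated, x + 0.001 * sd(x, na.rm = TRUE), x)
}

a <- c(1, 2, 2, 3, 4, 5, 6)
b <- c(4, 5, 6, 7, 8, 8, 9)
apply(data.frame(a, b), 2, f)
```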
