Filling up a vector - r

a <- c(0,3,7,2)
b <- 10
I'm trying to distribute a certain amount (b) over the elements of vector a. Instead of adding 10/4 to every value of a, I want to fill the smallest values up first. The result vector for this case should be c(5,5,7,5).
What I've tried:
f = e + b
opt.vert <- function(b, a, f) {
  repeat {
    lapply(1:length(a),
           function(x) if ((a[[x]] == min(a)) && (a[[x]]) < (b / length(a))) {
             a[[x]] <- a[[x]] + 1
           } else {
             a[[x]] <- a[[x]]
           })
    if (sum(a) >= f) break
  }
  return(a)
}
Apart from that approach being horribly inelegant, it also doesn't work. I'm having a hard time figuring out what's wrong with it because it seems to drag me into an eternal loop, so I never get an error message.

for (i in seq_len(b)) a[which.min(a)] <- a[which.min(a)] + 1
#[1] 5 5 7 5
Note that which.min returns the position of the first minimum. If you want to break ties differently, you'll have to modify this slightly.
(I suspect spending some time on the mathematical background of the task might lead to more efficient solutions that could avoid loops. Might be a nice puzzle for people with more spare time.)
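For what it's worth, here is one loop-free sketch along those lines (my own addition, assuming fractional fills would also be acceptable; for this example the result happens to be integer anyway): raise everything below a common "water level" chosen so that the total amount added equals b.
fill_up <- function(a, b) {
  s <- sort(a)
  n <- length(s)
  # cost[k]: amount needed to raise the k - 1 smallest elements up to s[k]
  cost <- (seq_len(n) - 1) * s - c(0, cumsum(s)[-n])
  k <- max(which(cost <= b))
  level <- s[k] + (b - cost[k]) / k
  pmax(a, level)
}
fill_up(c(0, 3, 7, 2), 10)
# [1] 5 5 7 5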

Something like this, using Reduce (with accumulate = TRUE every intermediate step is returned; the last element is the final vector):
a <- c(0, 3, 7, 2)
b <- 10
Reduce(function(x, y) {
  idx <- which.min(x)
  x[idx] <- x[idx] + 1
  x
}, rep(1, b), a, accumulate = TRUE)

Related

Understanding Breakpoint function: how for loops work inside functions

I have the following exercise to be solved in R. Under the exercise, there is a hint towards the solution.
Exercise: If there are no ties in the data set, the function above will produce breakpoints with h observations in the interval between two consecutive breakpoints (except the last two perhaps). If there are ties, the function will by construction return unique breakpoints, but there may be more than h observations in some intervals.
Hint:
my_breaks <-function(x, h = 5) {
x <-sort(x)
breaks <- xb <- x[1]
k <- 1
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
else{
if(xb<x[i-1]&&x[i-1]<x[i])
{xb <- x[i-1]
breaks <-c(breaks, xb)
k <- 1
}
}
}
However, I am having a hard time understanding the above function, particularly the following lines:
for(i in seq_along(x)[-1])
{if(k<h)
{k <- k+1}
Question:
How is the for loop supposed to act on k if k is previously defined as 1 and i is different from k? How are the breakpoints chosen according to the h = 5 gap if the for loop is not acting on x? Can someone explain how this function works?
Thanks in advance!
First, note that your example is incomplete: the return value and the final brace are missing. Here is the corrected version.
my_breaks <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]
  k <- 1
  for (i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1
    } else {
      if (xb < x[i-1] && x[i-1] < x[i]) {
        xb <- x[i-1]
        breaks <- c(breaks, xb)
        k <- 1
      }
    }
  }
  breaks
}
Let's check if it works.
my_breaks(c(1,1,1:5,8:10), 2)
#[1] 1 2 4 8
my_breaks(c(1,1,1:5,8:10), 5)
#[1] 1 3
As you can see, everything is fine. And what is seq_along(x)[-1]? It is the same as 2:length(x), so the for loop walks over the indices of x in order, skipping the first one.
What is the k variable for? It counts how many observations have been seen since the last breakpoint, so that a new breakpoint is only considered once h observations have accumulated.
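To make the role of k concrete, here is a short trace of my_breaks(c(1,1,1:5,8:10), 2) (my own annotation, not part of the original answer):
x <- sort(c(1, 1, 1:5, 8:10))  # 1 1 1 2 3 4 5 8 9 10; breaks = xb = 1, k = 1
# i = 2:     k < h, so k becomes 2
# i = 3, 4:  k == h, but x[i-1] is still tied with xb = 1, so no breakpoint
# i = 5:     xb < x[4] = 2 and x[4] < x[5] = 3, so 2 is appended and k resets to 1
# i = 6:     k becomes 2
# i = 7:     x[6] = 4 qualifies, so 4 is appended and k resets
# i = 8:     k becomes 2
# i = 9:     x[8] = 8 qualifies, so 8 is appended and k resets
# i = 10:    k becomes 2; the loop ends with breaks = c(1, 2, 4, 8)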

Find the n values whose sum is equal to 2020

I have a vector Vec with these values:
1721
979
366
299
675
1456
I am struggling to find a way to obtain which combination of n values (initially I would like n = 2) has a sum equal to 2020.
In the example it is easy to see this, as 1721 and 299 sum to 2020, but my data is much longer and I would like to generalize to n values, so that I have a function where I pass a vector, a target value, and the number of values to combine (it can be 2, 3, 5, ...). My output would be c(1721,299).
You can get all combinations of your input numbers with combn(), which returns a matrix where the combinations are columns. So then you just need to take the sum of each column and see which one is equal to your target.
Vec <- c(1721,
979,
366,
299,
675,
1456)
n <- 2
all_combinations <- combn(Vec,n)
all_combinations[,colSums(all_combinations) == 2020]
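For the example vector only one column matches, so the matrix indexing drops down to a plain vector:
# [1] 1721  299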
If you just want to find one solution (there might be multiple solutions) for the subset sum problem, you could try subsetsum from package adagio
> adagio::subsetsum(Vec, 2020)
$val
[1] 2020
$inds
[1] 1 4
such that
> Vec[adagio::subsetsum(Vec, 2020)$inds]
[1] 1721 299
Another way is using combn, e.g.,
f <- function(Vec, Tar, n) {
  Filter(
    length,
    combn(Vec, n, FUN = function(x) ifelse(sum(x) == Tar, list(x), list(NULL)))
  )
}
where a function f is defined for the objective, such that
> f(Vec,2020,2)
[[1]]
[1] 1721 299
> f(Vec,2020,3)
[[1]]
[1] 979 366 675
Note: Benchmarks at this gist. Note also the memory allocation on each approach.
Update
For a faster version of the combn recommendation, check out comboGeneral from the "RcppAlgos" package:
fun_RcppAlgos <- function(x, target, n) {
  a <- RcppAlgos::comboGeneral(x, n)
  a[which(rowSums(a) == target), ]
}
For n = 2, and assuming that you're only expecting one pair to be returned, the solution is as simple as:
Vec[(2020 - Vec) %in% Vec]
## [1] 1721 299
For n = 3, my initial thought was to use combn or expand.grid (or data.table::CJ), but then I thought this might also be a good case for a for loop. Since I don't use for loops a lot, here's what I came up with:
fun_for <- function(x, target, n) {
  if (!n %in% c(2, 3)) stop("The accounting Elves are crazy!")
  if (n == 2) {
    out <- x[(target - x) %in% x]
  } else if (n == 3) {
    out <- numeric(0)
    for (i in seq_along(x)) {
      s1 <- x + x[i]
      for (j in seq_along(s1)) {
        s2 <- s1 + x[j]
        if (any(s2 == target)) out <- c(out, x[which(s2 == target)])
      }
    }
    out <- unique(out)
  }
  out
}
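For reference, running the n = 3 branch on the example vector returns the triple in the order the inner loops discover it (my own run of the logic above; note that nothing in the loops prevents an element from being combined with itself, which happens not to matter for this input):
fun_for(Vec, 2020, 3)
# [1] 675 366 979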
And, for expand.grid and data.table::CJ, these were the functions I used:
fun_eg <- function(x, target, n) {
  a <- expand.grid(replicate(n, x, FALSE))
  unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
fun_cj <- function(x, target, n) {
  a <- do.call(data.table::CJ, replicate(n, x, FALSE))
  unlist(a[rowSums(a) == target, ][1, ], use.names = FALSE)
}
The reason I'm extremely hesitant about the expand.grid type approach is that you can quickly end up having to generate a huge table against which you're going to be checking. For example, with length(x) == 500, you'd have to create a table with 125,000,000 rows and 3 columns that you're going to have to check against.
combn is a bit better. With combn, if you have length(x) == 500, you'd have to create a matrix with 3 rows and 20,708,500 columns (run choose(500, 3) to calculate the number of columns).
Keeping all of that in mind, I ran some tests, which I've posted at this gist (rather than crowding this post further). This is one of those cases where a for loop makes sense, and if you continue with the Advent of Code 2020 problems, you're probably going to have to practice your looping and recursion skills a lot. Have fun!
I am using very basic coding.
Sub <- list()
for (x in Vec) {
  Sub[[as.character(x)]] <- 2020 - x
  if (Sub[[as.character(x)]] %in% Vec) {
    print(paste0(x, ",", Sub[[as.character(x)]]))
  }
}
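Since the loop reaches both members of the pair, it prints the pair twice, once in each order (roughly):
# [1] "1721,299"
# [1] "299,1721"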

Why does smart rounding work differently with map/lapply than without?

I would like to smartly round my results so that the rounded values still add up to the same total.
Can someone explain why this is different when I do it with map or lapply?
v <- c(
0.9472164,
71.5330771,
27.5197066)
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}
### works correctly
smart.round(v)
### lapply and map give a different (wrong) result
lapply(v, smart.round)
purrr::map(v, smart.round)
(I think this is merely a comment, but I have not yet earned the right to add comments.)
lapply and purrr::map process your input element by element. In your example, lapply takes the first value of v, calls smart.round on it, then moves on to the second value of v, and so on.
In total smart.round is called three times, each time without any knowledge of the other two values in v.
I'm not entirely sure why you want to use lapply here; if this is part of a more complex situation you might want to expand your question.
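To make that concrete, here is roughly what the two calls return (my own illustration of the outputs, not from the original answer):
smart.round(v)
# [1]  1 72 27   (adds up to 100, as intended)
lapply(v, smart.round)
# each value is rounded on its own: 0.947... becomes 1, 71.533... becomes 72,
# and 27.519... becomes 28, so the individually rounded values add up to 101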
I have written my own solution. It is definitely a bit cumbersome, but it works. :) My initial goal was just to input a data frame and output the rounded data frame.
The whole example here:
v <- data.frame(a = c(0.9472164,
71.5330771,
27.5197066),
b = c(4.6472164,
5.6330771,
27.1197066))
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}
rounding_function <- function(input_df) {
  output_df <- data.frame(matrix(ncol = ncol(input_df), nrow = nrow(input_df)))
  for (i in 1:nrow(input_df)) {
    a <- smart.round(as.numeric(input_df[i, ]))
    for (k in 1:ncol(input_df)) {
      output_df[i, k] <- a[k]
    }
    colnames(output_df) <- colnames(input_df)
  }
  return(output_df)
}
v_rounded <- rounding_function(v)
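If the goal is just "data frame in, rounded data frame out", a more compact variant (a sketch of the same idea, not an attempt to cover every edge case) is to apply smart.round over the rows:
rounding_function2 <- function(input_df) {
  out <- as.data.frame(t(apply(input_df, 1, smart.round)))
  colnames(out) <- colnames(input_df)
  out
}
rounding_function2(v)  # same rounded values as rounding_function(v)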

Data generation: Creating a vector of vectors

I have a vector of positive integers of unknown length. Let's call it vector a with elements a[1], a[2], ...
I want to perform calculations on vector b where for all i, 0 <= b[i] <= a[i].
The following does not work:
for(b in 0:a)
{
# calculations
}
The best I have come up with is:
probabilities <- function(a, p)
{
  # k[j] is the stride used to decode the j-th "digit" of the counter i
  k <- a
  k[1] <- 1
  h <- rep(0, sum(a) + 1)
  for (i in 2:length(a))
  {
    k[i] <- k[i-1] * (a[i-1] + 1)
  }
  for (i in 0:(prod(a + 1) - 1))
  {
    # decode i into the vector b, with 0 <= b[j] <= a[j]
    b <- a
    for (j in 1:length(a))
    {
      b[j] <- floor(i / k[j]) %% (a[j] + 1)
    }
    # probability of exactly this b under independent binomial draws
    t <- 1
    for (j in 1:length(a))
    {
      t <- t * choose(a[j], b[j]) * p[j]^b[j] * (1 - p[j])^(a[j] - b[j])
    }
    h[sum(b) + 1] <- h[sum(b) + 1] + t
  }
  return(h)
}
The middle of my function is where I create b. I start by setting b equal to a (so that it is the same size), then replace all of its elements with values that are rather tricky to calculate. This seems like an inefficient solution: it works, but it gets fairly slow as the numbers get large. Any ideas for how I can cut down on processing time? Essentially, b starts as all zeros; the next time through it is 1,0,0,0,...; the first element keeps incrementing until it reaches a[1], then b[2] increments and b[1] resets to 0, and b[1] starts incrementing again.
I know the math is sound; I just do not trust that it is efficient. I studied combinatorics for a few years, but have never studied computational complexity theory, so coming up with a fast algorithm is a bit beyond my realm of knowledge. Any ideas would be helpful!
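For what it's worth, here is one possible shortcut (my own sketch, assuming the end goal really is the distribution of sum(b), where each b[j] is an independent Binomial(a[j], p[j]) count, which is what the function above accumulates): h can be built by convolving the individual binomial probability vectors, which avoids enumerating all prod(a + 1) vectors b.
probabilities_conv <- function(a, p) {
  h <- 1
  for (j in seq_along(a)) {
    # convolve the running distribution with the pmf of Binomial(a[j], p[j])
    h <- convolve(h, rev(dbinom(0:a[j], a[j], p[j])), type = "open")
  }
  h  # h[s + 1] is P(sum(b) == s), up to floating-point noise from the FFT
}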

Distributing an amount as evenly as possible

We have a certain amount, e.g. 300 units. This amount should be distributed as evenly as possible over 40 "slots". It would be easy if every slot were the same - then it would be 7.5 per slot. However, the slots vary in size and we cannot "fill in" more than a slot's "size" allows, e.g. if it is only 5. Whatever we cannot "fill in" there has to be distributed over the other slots.
I have some basic ideas but I am far from being an expeRt and hope there is an easy way to solve this.
As an example of how this could look: in array "a" the values stand for the maxima the slots can take; a[i] is the maximum of the i-th slot. "b" is the total we have to distribute, e.g. 300.
# developing slots and their "size"
a <- rnorm(40,10,4)
sum(a)
# overall sum to distribute
b <- 300
Maybe it is possible to sort the values in increasing order and then work through them with a double for loop. a[,2] would become the column for the "filled in" amount.
for (i in 1:40) {
  a[i,2] <- a[1,2]*40
  b <- a[1,2]*40
}
for (i in 2:40) {
  a[i,2] <- a[1,2]*39
  b <- a[1,2]*39
}
etc.
I am not sure how I can put the two for loops together, or whether this is an adequate approach overall.
Happy to hear your ideas. Thanks!
First version, using a while loop:
optimal.fill <- function(a, b) {
  stopifnot(sum(a) >= b)
  d <- rep(0, length(a))
  while (b > 0) {
    has.room <- a > 0
    num.slots <- sum(has.room)
    min.size <- min(a[has.room])
    add.size <- min(b / num.slots, min.size)
    d[has.room] <- d[has.room] + add.size
    a[has.room] <- a[has.room] - add.size
    b <- b - num.slots * add.size
  }
  return(d)
}
This second version is a little harder to understand, but more elegant I feel:
optimal.fill <- function(a, b) {
  stopifnot(sum(a) >= b)
  slot.order <- order(a)
  sorted.sizes <- a[slot.order]
  n <- length(a)
  # a slot (taken in increasing order of size) gets filled completely if,
  # after filling all smaller slots, spreading the rest evenly over the
  # remaining slots still reaches its capacity
  can.fill <- (c(0, cumsum(sorted.sizes)[-n]) + sorted.sizes * (n:1)) <= b
  full.slots <- slot.order[can.fill]
  d <- rep(0, n)
  d[full.slots] <- a[full.slots]
  d[setdiff(seq_len(n), full.slots)] <-
    (b - sum(a[full.slots])) / (n - length(full.slots))
  return(d)
}
Here's another option:
optimal.fill2 <- function(a, b) {
  o <- rank(a)
  a <- sort(a)
  ca <- cumsum(a)
  foo <- (b - ca) / ((length(a) - 1):0)
  ok <- foo >= a
  a[!ok] <- foo[max(which(ok))]
  a[o]
}
