Distributing an amount as evenly as possible - r

We have a certain amount, e.g. 300 units. This amount should be distributed as evenly as possible over 40 "slots". It would be easy if every slot were the same size: each would simply get 7.5. However, the slots vary in size, and we cannot "fill in" more than a slot's size allows, e.g. if it is only 5. Whatever we cannot fill in has to be spread over the other slots.
I have some basic ideas but I am far away from being an expeRt and hope there is an easy way to solve this.
As an example of how this could look: in array "a" the values are the maxima the slots can take, so a[i] is the maximum of the i-th slot. "b" is the total we have to distribute, e.g. 300.
# developing slots and their "size"
a <- rnorm(40,10,4)
sum(a)
# overall sum to distribute
b <- 300
Maybe it is possible to sort the values in increasing order and then use a double for loop. a[,2] becomes the column for the "filled in" amount.
for i in 1:40
{a[i,2] <- a[1,2]*40
b <- a [1,2]*40}
for i in 2:40
{a[i,2] <- a[1,2]*39
b <- a[1,2]*39}
etc.
I am not sure how to put the two for loops together, or whether this is an adequate solution overall.
Happy to hear your ideas. Thanks!

First version, using a while loop:
optimal.fill <- function(a, b) {
stopifnot(sum(a) >= b)
d <- rep(0, length(a))
while(b > 0) {
has.room <- a > 0
num.slots <- sum(has.room)
min.size <- min(a[has.room])
add.size <- min(b / num.slots, min.size)
d[has.room] <- d[has.room] + add.size
a[has.room] <- a[has.room] - add.size
b <- b - num.slots * add.size
}
return(d)
}
This second version is a little harder to understand, but more elegant I feel:
optimal.fill <- function(a, b) {
stopifnot(sum(a) >= b)
slot.order <- order(a)
sorted.sizes <- a[slot.order]
# amount needed to fill the k smallest slots and raise all larger ones to the k-th size
can.fill <- cumsum(sorted.sizes) + sorted.sizes * (rev(seq_along(a)) - 1)
is.full <- logical(length(a))
is.full[slot.order[can.fill <= b]] <- TRUE
d <- a
# the slots that are not full share what is left equally
d[!is.full] <- (b - sum(a[is.full])) / sum(!is.full)
return(d)
}

Here's another option:
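optimal.fill2 <- function(a,b) {
o <- rank(a, ties.method = "first") # integer ranks, so a[o] below also works with tied sizes
a <- sort(a)
ca <- cumsum(a)
foo <- (b-ca)/((length(a)-1):0) # level left for the larger slots once the k smallest are full
ok <- foo >= a
# if not even the smallest slot gets full, every slot receives the same share
level <- if (any(ok, na.rm = TRUE)) foo[max(which(ok))] else b/length(a)
a[!ok] <- level
a[o]
}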
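A small checker can help when comparing the functions above by hand. The sketch below is not from any of the answers; the name check.fill and the tolerance tol are assumptions made for illustration. It tests that a candidate result d sums to b, respects every slot size, and leaves all non-full slots at one common level that no full slot exceeds.

```r
# Hypothetical helper: a = slot sizes, b = amount to distribute, d = candidate result.
check.fill <- function(a, b, d, tol = 1e-8) {
  full  <- abs(d - a) < tol                       # slots filled to capacity
  level <- if (all(full)) Inf else max(d[!full])  # fill level of the remaining slots
  c(sums.to.b        = abs(sum(d) - b) < tol,
    within.capacity  = all(d <= a + tol),
    common.level     = all(abs(d[!full] - level) < tol),
    full.below.level = all(a[full] <= level + tol))
}

# Worked by hand: sizes 3, 4, 10 and b = 12 should give 3, 4, 5.
good <- check.fill(c(3, 4, 10), 12, c(3, 4, 5))
# Filling only the smallest slot and splitting the rest overfills the second slot.
bad  <- check.fill(c(3, 4, 10), 12, c(3, 4.5, 4.5))
```

Here every entry of good is TRUE, while bad fails the within.capacity check. Passing the output of any fill function as d gives a quick sanity test on small inputs.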

Related

Part 2: How to make nested for-loop do every permutation

This is an update to the first question I asked. I essentially am still missing the pretty obvious learning lesson and can't expand the code that worked originally.
Trying to store a nested loop that runs all permutations of calculations.
I still don't understand how to set up the index. I am now trying to set this up with three sets of loops (and really, I want to set this up so I understand how to do it with any number of them). The code below doesn't quite work: the third loop doesn't index over every combination (though I still get 1000 calculations). I commented the part that is clearly wrong.
With only two loops, this was sufficient, per the answer of my first post:
index <- 10*(i-1) + j
So not sure why the way I changed it for 3 loops doesn't work, but it's obviously wrong.
iter = 10 #length of parameter ranges to test
perm = 3 #how many parameters are being varied
n_c <- 1
m_c <- 1
n_n <- 1
m_n <- 1
n_v <- 1
my_data_c <- vector("numeric", iter^perm)
my_data_n <- vector("numeric", iter^perm)
my_data_v <- vector("numeric", iter^perm)
rho_c_store <- vector("numeric", iter)
rho_n_store <- vector("numeric", iter)
rho_v_store <- vector("numeric", iter)
for (i in 1:iter) {
# you can move this assignment to the outer loop
rho_c <- (i / 10)
x <- (rho_c * n_c)/m_c
for (j in 1:iter) {
rho_n <- (j / 10)
y <- (rho_n * n_n)/m_n
for (k in 1:iter){
rho_v <- (k / 10)
z <- rho_v/n_v
index <- iter*(i-2)+j+ k #Clearly where the error is
rho_c_store[index] <- rho_c
rho_n_store[index] <- rho_n
rho_v_store[index] <- rho_v
my_data_c[index] <- x
my_data_n[index] <- y
my_data_v[index] <- z
}
}
}
my_data <- cbind(rho_c_store, rho_n_store, rho_v_store, my_data_c, my_data_n,my_data_v)
print(my_data)
Think of it as if you were counting: the ones move the fastest, next come the tens, then the hundreds. 001, 002, ..., 010, ..., 099, 100, 101. You can think of your loop variables i, j, k as the digits of a number: k moves the fastest, j slower and i slower still. To get the right index you have to multiply i by 100, j by 10 and k by one, giving you the index 100 * (i-1) + 10 * (j-1) + k. The -1 for i and j is necessary because the loop variables start from 1, but we want to start by adding 0 * 100 and 0 * 10.
So all you need to do is change your index calculation to index <- iter^2*(i-1)+ iter*(j-1) + k.
# I stripped the example to the essentials for easier understanding
iter = 10 #length of parameter ranges to test
perm = 3 #how many parameters are being varied
my_data_c <- vector("numeric", iter^perm)
my_data_n <- vector("numeric", iter^perm)
my_data_v <- vector("numeric", iter^perm)
for (i in 1:iter) {
x <- i / 10
for (j in 1:iter) {
y <- j / 10
for (k in 1:iter){
z <- k / 10
index <- iter^2*(i-1)+ iter*(j-1) + k
my_data_c[index] <- x
my_data_n[index] <- y
my_data_v[index] <- z
}
}
}
my_data <- cbind(my_data_c, my_data_n, my_data_v)
print(my_data)
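The digit analogy extends to any number of nested loops. As a rough sketch (flat.index is a hypothetical helper, not part of the answer), each counter minus one is weighted by a power of iter, with the slowest counter getting the highest power:

```r
# Hypothetical helper: idx holds the loop counters from slowest to fastest,
# iter is the number of values each counter runs over.
flat.index <- function(idx, iter) {
  p <- length(idx)
  sum((idx - 1) * iter^((p - 1):0)) + 1
}

flat.index(c(1, 1, 1), 10)     # 1, the first combination
flat.index(c(2, 3, 4), 10)     # 124 = 100 * 1 + 10 * 2 + 4
flat.index(c(10, 10, 10), 10)  # 1000, the last combination
```

With two counters this reduces to 10*(i-1) + j, the formula from the first post.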
Anyway, while it's great to develop a deep understanding of these things, you're certainly on the right track when you move away from looping with explicit indices and use R's toolkit for such tasks.
Hope this helps
In case anyone else wants to try this...expand.grid is definitely the way to go. Still not sure how to deal with the nested loop from a coding perspective, but oh well..
n_c <- 1
m_c <- 1
n_n <- 1
m_n <- 1
n_v <- 1
c = c(1,2)
n = c(10,20)
v = c(100,200)
parm_length = 3
iter = length(c)
test <- data.frame(matrix(ncol=parm_length, nrow=iter^parm_length))
test[] <- expand.grid(c,n,v)
x <- (test[,1] * n_c)/m_c
y <- (test[,2]* n_n)/m_n
z <- (test[,3]/n_v)
group <- cbind(test,x,y,z)
print(group)

why smart rounding works differently with map/lapply than without?

I would like to smartly round my results so that it sums up to the same sum after rounding.
Can someone explain to me why the result is different when I do it with map or lapply?
v <- c(
0.9472164,
71.5330771,
27.5197066)
smart.round <- function(x, digits = 0) {
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
}
### works correctly
smart.round(v)
### lapply and map is wrong
lapply(v,smart.round)
map(v,smart.round)
(I think this is merely a comment, but I have not yet earned the right to add comments.)
lapply and purrr::map process your input sequentially. In your example, lapply takes the first value of v and calls smart.round, then moves on to the second value of v, and so on.
In total smart.round is called three times, each time without any knowledge of the other two values in v, so each value is simply rounded to the nearest integer on its own and the overall sum is no longer preserved.
I'm not entirely sure why you are trying to use lapply here. If this is part of a more complex situation, you might want to expand your question.
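To make the difference concrete, here is a short comparison using the function and data from the question. The names whole and pieces are only illustrative, and sapply is used instead of lapply just to get a plain vector back:

```r
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

v <- c(0.9472164, 71.5330771, 27.5197066)

whole  <- smart.round(v)          # one call that sees all three values
pieces <- sapply(v, smart.round)  # three calls, one value each

sum(whole)   # 100, matching round(sum(v))
sum(pieces)  # 101, because each value was rounded up on its own
```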
I have written my own solution. Definitely a bit cumbersome but it works.. :) My initial goal was just to input a dataframe and output the rounded dataframe.
The whole example here:
v <- data.frame(a = c(0.9472164,
71.5330771,
27.5197066),
b = c(4.6472164,
5.6330771,
27.1197066))
smart.round <- function(x, digits = 0) {
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
}
rounding_function <- function(input_df) {
output_df <- data.frame(matrix(ncol = ncol(input_df), nrow = nrow(input_df)))
for (i in 1:nrow(input_df)) {
a = smart.round(as.numeric(input_df[i,]))
for (k in 1:ncol(input_df)) {
output_df[i,k]=a[k]
}
colnames(output_df) = colnames(input_df)
}
return(output_df)
}
v_rounded <- rounding_function(v)
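A shorter variant of the same row-wise idea is sketched below, assuming every column of the data frame is numeric. It relies on apply passing each row to smart.round as a numeric vector; the results come back as columns, so t() is needed to restore the original orientation.

```r
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

v <- data.frame(a = c(0.9472164, 71.5330771, 27.5197066),
                b = c(4.6472164, 5.6330771, 27.1197066))

# One smart.round call per row; t() turns the column-wise result back into rows.
v_rounded <- as.data.frame(t(apply(v, 1, smart.round)))

rowSums(v_rounded)  # compare with round(rowSums(v))
```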

Filling up a vector

a <- c(0,3,7,2)
b <- 10
I am trying to distribute a certain number of values (b) over the elements of vector a. Instead of adding 10/4 to every value of a, I want to fill up the lowest ones first. The result vector for this case should be c(5,5,7,5).
What I've tried:
f = e + b
opt.vert <- function(b,a,f) {
repeat{lapply(1:length(a),
function(x) if((a[[x]] == min(a)) && (a[[x]]) < (b/length(a))){
a[[x]] <- a[[x]] +1
} else {
a[[x]] <- a[[x]]
} )
if(sum(a) >= f) break
}
return(a)
}
Apart from that approach being horribly inelegant, it also doesn't work. I'm having a hard time figuring out what's wrong with it, because it seems to drag me into an endless loop and I therefore get no error message.
for (i in seq_len(b)) a[which.min(a)] <- a[which.min(a)] + 1
#[1] 5 5 7 5
Note that which.min returns the position of the first minimum. If you want to break ties differently, you'll have to modify this slightly.
(I suspect spending some time on the mathematical background of the task might lead to more efficient solutions that could avoid loops. Might be a nice puzzle for people with more spare time.)
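One possible loop-free take on that idea is sketched here. The helper fill.up is hypothetical, and it treats b as a continuous amount: when b does not divide evenly it returns fractional values, unlike the unit-by-unit loop above.

```r
# Hypothetical helper: raise the lowest entries of a to one common level
# so that exactly b is added in total.
fill.up <- function(a, b) {
  s <- sort(a)
  k <- seq_along(s)
  # cost[k]: amount needed to lift the k smallest entries up to the k-th smallest
  cost <- k * s - cumsum(s)
  n.low <- max(which(cost <= b))   # number of entries that end up at the level
  level <- (b + sum(s[seq_len(n.low)])) / n.low
  pmax(a, level)
}

fill.up(c(0, 3, 7, 2), 10)  # 5 5 7 5
```

For this input the three lowest entries cost 4 to level at 3, which is within b = 10, while also reaching 7 would cost 16, so the level is (10 + 0 + 2 + 3) / 3 = 5.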
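Something like this using Reduce (a fold rather than true recursion). With accumulate=TRUE it returns every intermediate vector, and the last element of the list is the result: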
a <- c(0, 3, 7, 2)
b <- 10
Reduce(function(x, y) {
idx <- which.min(x)
x[idx] <- x[idx] + 1
x
}, rep(1, b), a, accumulate=TRUE)

Creating an Equation with Summation and Constants in R

I was hoping someone could evaluate this code in which I am seeking to create a Random Variable Z using a Uniform Variable U and numerous Constant variables within a summation.
I have U(0,1) and constants a0=2.51..., a2=0.01, and b0=1...b3=0.0013.
Depending on whether U is below or above 0.5, we get either Z1 or Z2 in return. My code is below!
w <- (sqrt((-2)*(log(U))))
a[1] <- 2.515517
a[2] <- 0.802853
a[3] <- 0.010328
b[1] <- 1
b[2] <- 1.432788
b[3] <- 0.189269
b[4] <- 0.001308
U <- runif(1)
if(U<=0.5) {
print(ZOne <- ((-w)+((sum(((a[i])*(w^[i])), i=1, 3))/(sum(((b[j])*(w^[j])), j=1, 4)))))
} else {
print(ZTwo <- ((1)-(((-w)+((sum(((a[i])*(w^[i])), i=1, 3))/(sum(((b[j])*(w^[j])), j=1, 4)))))))
}
Hope this makes sense, just for reference ZOne = , when U=<0.5.
ZTwo is (1-ZOne), when U>=0.5.
If you need any clarification please just let me know. Thank you!
*PS, I somehow need to create 1000 of these variables (Z), and figured I would just use the replicate for that.
a <- numeric(3)
b <- numeric(4)
a[1] <- 2.515517
a[2] <- 0.802853
a[3] <- 0.010328
b[1] <- 1
b[2] <- 1.432788
b[3] <- 0.189269
b[4] <- 0.001308
U <- runif(1)
# You can't use U before it exists!
w <- (sqrt((-2)*(log(U))))
# Here assuming w^2 in your equation is w * w etc.
# if not, remove ^(1 : length(a)) and ^(1 : length(b))
ZOne <- (-w) + sum(a * w^(1 : length(a)))/sum(b * w^(1 : length(b)))
if (U<=0.5) {
print(ZOne)
} else {
print(ZTwo <- 1 - ZOne)
}
This seems to be what you are trying to do:
Z <- function(){
U <- runif(1) # draw U once and reuse it for w and for the branch
w <- sqrt(-2*log(U))
a <- c(2.515517, 0.802853, 0.010328)
b <- c(1, 1.432788, 0.189269,0.001308)
ZOne <- -w+sum(a*w^(1:3))/sum(b*w^(1:4))
ZTwo <- 1 - ZOne
if(U<=0.5) {
Zval <- ZOne
} else {
Zval <- ZTwo
}
Zval
}
R operates on whole vectors. Using indices (i,j) is frequently a sign of poor design. Perhaps you might want to spend a certain amount of time reading a tutorial on R programming. By making it a function, you can use Z() to create random variates at will. E.g. something like replicate(1000,Z()) will create 1000 such values.
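In the same whole-vector spirit, all draws can also be produced in one call instead of going through replicate. The following is only a sketch: z.vec and its first.power argument are hypothetical names. The default first.power = 1 matches the exponents used in the answers above; since the question labels the constants a0..a2 and b0..b3, first.power = 0 may be what the elided formula intends, but that is an assumption.

```r
# Hypothetical helper: draws n values at once.
z.vec <- function(n, first.power = 1) {
  a <- c(2.515517, 0.802853, 0.010328)
  b <- c(1, 1.432788, 0.189269, 0.001308)
  U <- runif(n)
  w <- sqrt(-2 * log(U))
  # outer(w, p, "^") has one row per draw and one column per power of w
  num <- outer(w, first.power + seq_along(a) - 1, "^") %*% a
  den <- outer(w, first.power + seq_along(b) - 1, "^") %*% b
  z.one <- as.vector(-w + num / den)
  ifelse(U <= 0.5, z.one, 1 - z.one)
}

z <- z.vec(1000)
```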

Nested rolling sum in vector

I am struggling to write efficient code to compute the result vector r from an input vector v using this function.
r(i) = \sum_{j=i-N}^{i} [v(i)-v(j)] \exp(v(i)-v(j))
where i runs from N+1 to M over the vector v. The size of v is M >> N.
Of course this is feasible with 2 nested for loops, but it is too slow for computational purposes, probably out of fashion and deprecated style...
A MWE:
for (i in c(N+1):length(v)){
csum <- 0
for (j in i:c(i-N)) {
csum <- csum + (v[i]-v[j])*exp(v[i]-v[j])
}
r[i] <- csum
}
In my real application M > 10^5 and the v vector is indeed several vectors.
I have been trying with nested applications of lapply and rollapply without success.
Any suggestion is welcome.
Thanks!
I don't know if it is any more efficient, but here is something you can try:
r[(N+1):M] <- sapply((N+1):M, function(i) tail(cumsum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])), 1))
To check that both computations give the same results, I computed r your way and r2 my way (initializing r2 to rep(NA, M)) and compared them:
all(abs(r-r2)<1e-12, na.rm=TRUE)
# [1] TRUE
NOTE: as in @lmo's answer, tail(cumsum(...), 1) can be replaced by simply using sum(...):
r[(N+1):M] <- sapply((N+1):M, function(i) sum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])))
Here is a method with a single for loop.
# create new blank vector
rr <- rep(NA,M)
for(i in (N+1):length(v)) {
rr[i] <- sum((v[i] - v[(i-N):i]) * exp(v[i] - v[(i-N):i]))
}
Check for equality:
all.equal(r, rr)
[1] TRUE
You could reduce the number of operations by one if you store the difference. This should give a small speed-up.
for(i in (N+1):length(v)) {
x <- v[i] - v[(i-N):i]
rr[i] <- sum(x * exp(x))
}
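If memory allows an (M-N) by (N+1) matrix, the loop over i can be dropped entirely. The sketch below is not from the answers; it uses made-up data and compares against the double loop from the question. It relies on stats::embed, which lays out each window v[i], v[i-1], ..., v[i-N] as one row.

```r
set.seed(1)          # made-up data, for illustration only
M <- 200; N <- 5
v <- rnorm(M)

# Reference: the double loop from the question
r <- rep(NA_real_, M)
for (i in (N + 1):M) {
  csum <- 0
  for (j in i:(i - N)) csum <- csum + (v[i] - v[j]) * exp(v[i] - v[j])
  r[i] <- csum
}

# One row per i = N+1, ..., M; the first column holds v[i] itself
win <- embed(v, N + 1)
d <- win[, 1] - win               # v[i] - v[j] for every j in the window
r2 <- c(rep(NA_real_, N), rowSums(d * exp(d)))

all.equal(r, r2)
```

For very long vectors or wide windows the matrix may become too large, in which case the single-loop form is the safer choice.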

Resources