Vector whose elements add up to a value in R

I'm trying to create a vector whose elements add up to a specific number. For example, let's say I want to create a vector with 4 elements, and they must add up to 20, so its elements could be 6, 6, 4, 4 or 2, 5, 7, 6, whatever. I tried to run some lines using sample() and seq() but I cannot do it.
Any help appreciated.

To divide 20 into 4 parts, you need three breakpoints chosen from the 19 possible breaks between 20 numbers. Your parts are then just the sizes of the intervals between 0, your breakpoints, and 20:
> sort(sample(19,3))
[1] 5 7 12
> diff(c(0, 5,7,12,20))
[1] 5 2 5 8
To test this, let's create a big matrix of them. Each column is an instance:
> trials = sapply(1:1000, function(X){diff(c(0,sort(sample(19,3)),20))})
> trials[,1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 1 8 13 3 2
[2,] 4 7 10 2 9 5
[3,] 2 11 1 4 3 7
[4,] 11 1 1 1 5 6
Do they all add to 20?
> all(apply(trials,2,sum)==20)
[1] TRUE
Are there any weird cases?
> range(trials)
[1] 1 17
No, there are no zeroes and nothing bigger than 17, which is the (1,1,1,17) case. You can't have an 18 without a zero.
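The breakpoint idea above can be wrapped in a small helper. This is only a sketch: the name random_composition and its arguments total and k are made up for illustration, and it assumes total >= k so that every part can be at least 1.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer):
# split `total` into `k` positive integer parts by drawing k - 1 distinct
# breakpoints from 1..(total - 1) and taking the gaps between them.
random_composition <- function(total, k) {
  stopifnot(k >= 1, total >= k)
  breaks <- sort(sample.int(total - 1, k - 1))  # sample.int avoids sample()'s scalar quirk
  diff(c(0, breaks, total))
}

set.seed(1)
v <- random_composition(20, 4)
v
sum(v)  # always equals 20
```

Using sample.int() rather than sample() keeps the behaviour predictable when total - 1 happens to be 1.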

foo = function(n, sum1){
#Divide sum1 into 'n' parts
x = rep(sum1/n, n)
#For each x, sample a value from 1 to that value minus one
f = sapply(x, function(a) sample(1:(a-1), 1))
#Add and subtract f from 'x' so that sum(x) does not change
x = x + sample(f)
x = x - sample(f)
x = floor(x)
x[n] = x[n] - (sum(x) - sum1)
return(x)
}

Related

Swap each value to a different value

Below is a swapping function which swaps the values less than 10 in a list:
swapFun <- function(x, n = 10){
inx <- which(x < n)
x[sample(inx)] <- x[inx]
x
}
For example, the original list is 1, 2, 3, 10, 4, 11.
After swapping by sampling, this list may be 2, 1, 4, 10, 3, 11 or 1, 3, 2, 10, 4, 11.
But I want to swap each value less than 10 to a different value less than 10.
For example, the first outcome (i.e. 2, 1, 4, 10, 3, 11) is what I want, because each value less than 10 has been swapped to a different value less than 10.
However, the second outcome (i.e. 1, 3, 2, 10, 4, 11) is not what I want, because 1 and 4 have not been swapped to a different value less than 10.
If there is no feasible solution, just print 'no feasible solution'.
Any suggestions?
Many thanks.
You are looking for a derangement of the values less than 10. By the theory of derangements, approximately 1/e (37%) of randomly chosen permutations are derangements, so a hit or miss approach is reasonable, with an important caveat.
There might be repetitions among the items less than n. Not all permutations of those items are distinguishable, so not all derangements of the items look like derangements: swapping two 2s with each other (for example) is in some sense a derangement, but it wouldn't look like a derangement. The 1/e heuristic applies to raw permutations of positions, not distinguishable permutations of values. If the number of repetitions is high, it might take longer than 1/e would suggest. If in your use-case the performance isn't satisfactory, you would need to replace sample() in the function definitions by a more sophisticated function that picks random distinguishable permutations.
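A quick simulation can illustrate both points. This is a sketch, not part of the original answer: the sizes (8 items, 10000 draws) and the repeated-value vector c(1, 2, 2, 4) are arbitrary choices for illustration.

```r
set.seed(42)
n_draws <- 10000

# Distinct items: the fraction of random permutations with no fixed point
# should be close to 1/e.
n_items <- 8
frac_distinct <- mean(replicate(n_draws, {
  p <- sample(n_items)
  all(p != seq_len(n_items))
}))
frac_distinct
exp(-1)

# Repeated values: a shuffle only counts if every position *shows* a new value.
# For c(1, 2, 2, 4) only 4 of the 24 position permutations qualify, so the
# hit rate is about 1/6, clearly below 1/e.
vals <- c(1, 2, 2, 4)
frac_repeated <- mean(replicate(n_draws, all(sample(vals) != vals)))
frac_repeated
```

The second estimate shows why the hit-or-miss loop can need more tries than the 1/e heuristic suggests when values repeat.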
As far as feasibility goes, there will be a feasible solution so long as the most common element less than n doesn't account for more than 50% of the items less than n:
derangement <- function(x){
if(max(table(x)) > length(x)/2) return(NA)
while(TRUE){
y <- sample(x)
if(all(y != x)) return(y)
}
}
swapFun <- function(x, n = 10){
inx <- which(x < n)
y <- derangement(x[inx])
if(length(y) == 1) return(NA)
x[inx] <- y
x
}
For example,
> set.seed(10)
> swapFun(c(1,2,10,4,11,2,12))
[1] 2 4 10 2 11 1 12
> swapFun(c(2,2,10,4,11,2,12))
[1] NA
Note that no valid derangement has length 1, but NA has length 1, so testing the length of y is an effective way to test whether it is possible to derange the values. The function returns NA if no derangement of the values less than n exists. You can test for NA and print "no feasible solution" if you want.
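One way to do that last step is sketched below. The wrapper name swap_or_report is hypothetical; derangement() and swapFun() are repeated from the answer so that the block runs on its own.

```r
# Repeated from the answer above so this block is self-contained.
derangement <- function(x){
  if(max(table(x)) > length(x)/2) return(NA)
  while(TRUE){
    y <- sample(x)
    if(all(y != x)) return(y)
  }
}
swapFun <- function(x, n = 10){
  inx <- which(x < n)
  y <- derangement(x[inx])
  if(length(y) == 1) return(NA)
  x[inx] <- y
  x
}

# Hypothetical wrapper: report infeasible inputs instead of returning NA.
swap_or_report <- function(x, n = 10){
  res <- swapFun(x, n)
  if(length(res) == 1 && is.na(res[1])){
    message("no feasible solution")
    return(invisible(NULL))
  }
  res
}

set.seed(10)
swap_or_report(c(1, 2, 3, 10, 4, 11))        # every value below 10 moves
swap_or_report(c(2, 2, 10, 4, 11, 2, 12))    # prints the message
```

Returning NULL invisibly is just one design choice; stopping with an error or returning the input unchanged would work equally well, depending on the caller.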
This function gives you all the unique permutations for the numbers < m while keeping the positions of numbers >= m the same.
library(combinat)
x <- c(1,2,10,4,11,2,12)
m <- 10
swapFun <- function(x, m){
  # determine positions of values to be permuted or fixed
  xi <- which(x < m)
  xj <- which(x >= m)
  # make permutations
  xp <- do.call(rbind, permn(x[xi]))
  # make matrix with permuted and fixed values
  xn <- matrix(nrow = nrow(xp), ncol = length(x))
  xn[ ,xi] <- xp
  # fill each fixed column with its own value (works even if the fixed values are unsorted)
  xn[ ,xj] <- rep(x[xj], each = nrow(xp))
  # delete duplicates
  d <- !duplicated(apply(xn, 1, paste, collapse = "_"))
  xn <- xn[d, , drop = FALSE]
  return(xn)
}
> swapFun(x,m)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 10 4 11 2 12
[2,] 1 2 10 2 11 4 12
[3,] 2 1 10 2 11 4 12
[4,] 2 1 10 4 11 2 12
[5,] 1 4 10 2 11 2 12
[6,] 4 1 10 2 11 2 12
[7,] 4 2 10 1 11 2 12
[8,] 2 4 10 1 11 2 12
[9,] 2 4 10 2 11 1 12
[10,] 4 2 10 2 11 1 12
[11,] 2 2 10 4 11 1 12
[12,] 2 2 10 1 11 4 12

distribute `n` among `k` units without repetition and zero structures in R

I was wondering if there might be a way in R to distribute n among k units without repetition (e.g., 3 5 2 is the same as 5 3 2, and 2 3 5 and 5 2 3) and without considering 0 combinations (i.e., no 9 1 0) and see the make-up of this distribution?
For example if n = 9 and k = 3 then we expect the make-up to be:
(Note: k will always be the # of columns)
3 3 3
4 3 2
4 1 4
5 2 2
5 1 3
6 2 1
7 1 1
makeup <- function(n, k){
# your suggested solution #
}
These are called integer partitions (more specifically restricted integer partitions) and can efficiently be generated with the packages partitions or arrangements like so:
partitions::restrictedparts(9, 3, include.zero = FALSE)
[1,] 7 6 5 4 5 4 3
[2,] 1 2 3 4 2 3 3
[3,] 1 1 1 1 2 2 3
arrangements::partitions(9, 3)
[,1] [,2] [,3]
[1,] 1 1 7
[2,] 1 2 6
[3,] 1 3 5
[4,] 1 4 4
[5,] 2 2 5
[6,] 2 3 4
[7,] 3 3 3
Both are much faster than the other solutions posted here:
library(microbenchmark)
microbenchmark(arrangePack = arrangements::partitions(20, 5),
partsPack = partitions::restrictedparts(20, 5, include.zero = FALSE),
myfun2(20, 5, 20),
myfun1(20, 5, 20),
makeup(20, 5),
mycomb(20, 5), times = 3, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
arrangePack 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 3
partsPack 3.070203 2.755573 2.084231 2.553477 1.854912 1.458389 3
myfun2(20, 5, 20) 10005.679667 8528.784033 6636.284386 7580.133387 5852.625112 4872.050067 3
myfun1(20, 5, 20) 12770.400243 10574.957696 8005.844282 9164.764625 6897.696334 5610.854109 3
makeup(20, 5) 15422.745155 12560.083171 9248.916738 10721.316721 7812.997976 6162.166646 3
mycomb(20, 5) 1854.125325 1507.150003 1120.616461 1284.278219 950.015812 760.280469 3
In fact, for the example below, the other functions will error out because of memory:
system.time(arrangements::partitions(100, 10))
user system elapsed
0.068 0.031 0.099
arrangements::npartitions(100, 10)
[1] 2977866
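If neither package is available, the same restricted partitions can be generated recursively in base R. The following is a minimal sketch under stated assumptions: the function name parts_exact and its max_part argument are invented for illustration, and it makes no attempt to match the speed of the packages above.

```r
# Hypothetical base-R sketch: all partitions of n into exactly k positive parts,
# one per row, each row in non-increasing order.
parts_exact <- function(n, k, max_part = n) {
  # Base case: a single part must equal n and respect the cap.
  if (k == 1) {
    if (n >= 1 && n <= max_part) return(matrix(n, nrow = 1))
    return(matrix(numeric(0), nrow = 0, ncol = 1))
  }
  lo <- ceiling(n / k)               # the largest part is at least the average
  hi <- min(max_part, n - (k - 1))   # leave at least 1 for each remaining part
  rows <- list()
  if (lo <= hi) {
    for (first in hi:lo) {
      # Remaining parts may not exceed the part just chosen.
      rest <- parts_exact(n - first, k - 1, max_part = first)
      if (nrow(rest) > 0) {
        rows[[length(rows) + 1]] <- cbind(first, rest, deparse.level = 0)
      }
    }
  }
  if (length(rows) == 0) return(matrix(numeric(0), nrow = 0, ncol = k))
  do.call(rbind, rows)
}

parts_exact(9, 3)   # 7 rows, from 7 1 1 down to 3 3 3
```

Capping each later part by the one before it is what prevents reorderings such as 3 5 1 and 5 3 1 from both appearing.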
You could use gtools::combinations for this, with the repeats.allowed = TRUE option, and keep only the rows that sum to n:
m <- gtools::combinations(9, 3, repeats.allowed = TRUE)
m[rowSums(m) == 9,]
One possible function is shown below. With options(expressions = 500000), it can go up to n = 500 (it ran successfully on my machine for n = 500, r = 3):
mycomb <- function(n, r, sumval){
  m <- gtools::combinations(n, r, repeats.allowed = TRUE)
  m[rowSums(m) == sumval,]
}
mycomb(9,3,9)
Output:
# [,1] [,2] [,3]
#[1,] 1 1 7
#[2,] 1 2 6
#[3,] 1 3 5
#[4,] 1 4 4
#[5,] 2 2 5
#[6,] 2 3 4
#[7,] 3 3 3
Here's a base solution using expand.grid. I'm not going to recommend it for large n, but it works:
makeup <- function(n, k) {
  x <- expand.grid(rep(list(1:n), k)) # generate all combinations of k values
  x <- x[rowSums(x) == n,] # filter out stuff that doesn't sum to n
  x <- as.data.frame(t(apply(x, 1, sort))) # order everything
  unique(x) # keep non-duplicates
}
A little rethinking simplifies this greatly. If we have a vector of n objects, we can break it apart at n-1 different spots. Starting from this, we can reduce the work substantially:
makeup <- function(n, k) {
  splits <- combn(n-1, k-1) # locations where to split up the data
  bins <- rbind(rep(0, ncol(splits)), splits) # add an extra "split" before the 1st element
  x <- apply(bins, 2, function(x) c(x[-1], n) - x) # count how many items in each bin
  x <- as.data.frame(t(apply(x, 2, sort))) # order everything
  unique(x) # keep non-duplicates
}
Using a matrix in base R:
myfun1 <- function( n, k){
x <- as.matrix(expand.grid( rep(list(seq_len(n)), k)))
x <- x[rowSums(x) == n,]
x[ ! duplicated( t( apply(x, 1, sort)) ),]
}
myfun1( n = 9, k = 3 )
Maybe this, using data.table:
myfun2 <- function( n, k){
require('data.table')
dt <- do.call(CJ, rep(list(seq_len(n)), k))
dt <- dt[rowSums(dt) == n,]
dt[which(!duplicated(dt[, transpose(lapply( transpose(.SD), sort ))])),]
}
myfun2( n = 9, k = 3 )
# V1 V2 V3
# 1: 7 1 1
# 2: 6 2 1
# 3: 5 3 1
# 4: 4 4 1
# 5: 5 2 2
# 6: 4 3 2
# 7: 3 3 3

From a sequence of numbers, how do I find an immediate smaller (and an immediate bigger) number than a particular random number, In R?

I have 10 increasing sequences of numbers, each of which looks like (say) x(i) <- c(2, 3, 5, 6, 8, 10, 11, 17) for i ranging from 1 to 10, and I have a randomly sampled number, say p = 9.
For each sequence x(i), I need to find the number immediately smaller than p and the number immediately bigger than p. Then, for each i (from 1 to 10), I need to take the difference of these two numbers and store the differences in a string.
For the x(i) given here, the number immediately smaller than p = 9 is 8 and the number immediately bigger than p = 9 is 10, so the difference is (10 - 8) = 2.
I am trying to write code that creates a string of these differences, where the first number is the difference for i = 1, the second number is the difference for i = 2, and so on. The string would have as many numbers as there are sequences.
I am relatively new to R, so anything involving loops throws me off a little. Any help would be appreciated. Thanks.
EDIT: I am putting the code I am working with for clarification.
fr = 100
dt = 1/1000 #bin width: 1 millisecond, expressed in seconds
duration = 2 #no of duration in s
nBins = 2000 #SpikeTrain
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
Spike_times <- function(i) {
c(dt*which( SpikeMat[i, ]==1))}
set.seed(4)
RT <- runif(1, 0 , 2)
for (i in 1:nTrials){
The explanation for this code is given in my previous question. I have 20 (the number of trials, nTrials) strings named Spike_times(i) here. Each Spike_times(i) is a string of time stamps between 0 and 2 seconds at which spikes occurred, and the strings have different numbers of entries. I also have a random time sample, RT, which is a random number between 0 and 2 seconds. Say RT is 1.17 seconds, and each Spike_times(i) is an increasing sequence of time stamps between 0 and 2 seconds.
As an example, suppose Spike_times(3) looks like 0.003 0.015 0.017 ... 1.169 1.176 1.189 ... 1.985 1.990 1.997. I need code that picks out 1.169 and 1.176, computes the difference of these entries (0.007), and stores it in another string, say W, as the third entry c(_, _, 0.007, ...). It should do this for all 20 strings Spike_times(i) and give me W with 20 entries.
I hope my question is clear enough. Please let me know if I need to correct something.
This approach should do what you want. I am making a function that extracts the desired result from a single sequence and then applying it to each sequence. I am assuming here that your sequences are row-vectors and are stacked in a matrix. If your actual data structure is different the code can be adapted, but you need to indicate how your sequences are actually stored.
x <- matrix(rep(c(2,3,5,6,8,10,11,17), 10), nrow=10, byrow = TRUE)
x
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 2 3 5 6 8 10 11 17
#> [2,] 2 3 5 6 8 10 11 17
#> [3,] 2 3 5 6 8 10 11 17
#> [4,] 2 3 5 6 8 10 11 17
#> [5,] 2 3 5 6 8 10 11 17
#> [6,] 2 3 5 6 8 10 11 17
#> [7,] 2 3 5 6 8 10 11 17
#> [8,] 2 3 5 6 8 10 11 17
#> [9,] 2 3 5 6 8 10 11 17
#> [10,] 2 3 5 6 8 10 11 17
set.seed(123)
p = sample(10, 1)
# write a function to do what you want on one sequence:
# NOTE: If p appears in the sequence I assume you want the
# closest numbers not equal to p! If you want the closest
# numbers to p including p itself change the less than/
# greater than to <= / >=
get_l_r_diff <- function(row, p) {
temp <- row - p
lower <- max(row[temp < 0])
upper <- min(row[temp > 0])
upper - lower
}
apply(x, 1, function(row)get_l_r_diff(row, p))
#> [1] 3 3 3 3 3 3 3 3 3 3
apply(x, 1, function(row) get_l_r_diff(row, 9))
#> [1] 2 2 2 2 2 2 2 2 2 2
# if the result really needs to be a string
paste(apply(x, 1, function(row) get_l_r_diff(row, 9)), collapse = "")
#> [1] "2222222222"
For your case you can just apply the two functions to your indices:
spikes <- sapply(1:20, function(i){get_l_r_diff(Spike_times(i), RT)})
By making a small change to your Spike_times function, you can do this with sapply, which returns a vector of all the calculated values:
Spike_times <- function(i) {
x <- c(dt*which( SpikeMat[i, ]==1))
min(x[x > RT]) - max(x[x < RT])
}
set.seed(4)
RT <- runif(1, 0 , 2)
results <- sapply(1:20, Spike_times)
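For long sorted sequences, base R's findInterval() offers another route. The sketch below is an alternative under stated assumptions, not either answerer's method: the helper name gap_around is invented, the input must be sorted in increasing order, and when p equals an element of the sequence that element is treated as the lower neighbour (unlike the strict < and > used above).

```r
# Hypothetical helper: gap between the neighbours of p in a sorted vector x.
gap_around <- function(x, p) {
  i <- findInterval(p, x)                         # index of the largest element <= p
  if (i == 0 || i == length(x)) return(NA_real_)  # p lies outside the range of x
  x[i + 1] - x[i]
}

x <- c(2, 3, 5, 6, 8, 10, 11, 17)
gap_around(x, 9)    # 2, i.e. 10 - 8

# Applied to several sequences at once, as sapply() is used above:
seqs <- list(x, x + 0.5, x * 2)
sapply(seqs, gap_around, p = 9)
```

Returning NA when p falls outside the sequence avoids the -Inf or Inf values (and warnings) that max() and min() produce on empty selections.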

How to sort odd and even numbers of an array in a specific format

I have a vector like this
seq_vector <- c(3,12,5,9,11,8,4,6,7,11,15,3,9,10,12,2)
I want to format them in descending order of odd numbers, followed by ascending order of even numbers. Output of above seq_vector will be
new_seq_vector <- c(15,11,11,9,9,7,5,3,3,2,4,6,8,10,12,12)
Can you please help me with the logic of the same?
Try x[order(x*v)] where v is -1 for odd, +1 for even.
Thanks to @lmo for this:
x[order( x*(-1)^x )]
# [1] 15 11 11 9 9 7 5 3 3 2 4 6 8 10 12 12
So v = (-1)^x here.
Some other ways to build v: @d.b's (-1)^(x %% 2); and mine, 1-2*(x %% 2).
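To see why this works, it can help to look at the sort key itself. The sketch below builds the key with ifelse(), which is just another spelling of x*v for illustration; the variable name key is not from the answer.

```r
x <- c(3,12,5,9,11,8,4,6,7,11,15,3,9,10,12,2)

# Odd values get a negative key, so ascending order of the key lists them
# largest first; even values keep their own value and come after all the
# negative keys, in ascending order.
key <- ifelse(x %% 2 == 1, -x, x)
key
res <- x[order(key)]
res
# [1] 15 11 11  9  9  7  5  3  3  2  4  6  8 10 12 12
```

Because every odd key is negative and every even key is positive, a single order() call is enough as long as x contains only positive integers.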
(Thanks @d.b) If x contains negative integers, an additional sorting vector is needed:
# new example
x = c(2, 5, -15, -10, 1, -3, 12)
x[order(v <- (-1)^x, x*v)]
# [1] 5 1 -3 -15 -10 2 12
Take modulus by 2 (%% 2) to determine the odd and even elements and sort accordingly.
c(sort(seq_vector[seq_vector %% 2 == 1], decreasing = TRUE), #For odd
sort(seq_vector[seq_vector %% 2 == 0])) #For even
#[1] 15 11 11 9 9 7 5 3 3 2 4 6 8 10 12 12
Use an auxiliary function.
is.odd <- function(x) (x %% 2) == 1
result <- c(sort(seq_vector[is.odd(seq_vector)], decreasing = TRUE),
sort(seq_vector[!is.odd(seq_vector)]))
result

Find consecutive sub-vectors of length k out of a numeric vector which satisfy a given condition

I have a numeric vector in R, say
v= c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
Now, I have to find all the consecutive sub-vectors of size 4 in it, with the condition that each element of a sub-vector must be greater than 2, and all sub-vectors must be disjoint, in the sense that no two sub-vectors can contain the same index. So my output will be:
(3,5,6,7),(3,4,5,7),(5,6,7,11)
Edited:
Other examples for illustration purpose: for,
v=c(3,3,3,3,1,3,3,3,3,3,3,3,3)
output will be :
(3,3,3,3), (3,3,3,3),(3,3,3,3).
and for,
v= c(2,3,5,5,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
output will be
(3,5,5,7),(3,4,5,7),(5,6,7,11)
The second condition on the output simply says that if we find a sub-array, say (v[m],v[m+1],v[m+2],v[m+3]), with each element greater than 2, then it goes into the output, and the next sub-array can only start from v[m+4] (if possible).
This solution uses embed() to create a matrix of lags and then extracts the desired rows from this matrix:
v <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
e <- embed(v, 4)
ret <- which(
apply(e, 1, function(x)all(x > 2)) &
apply(e, 1, function(x)length(unique(x)) == 4)
)
rows <- ret[c(1, 1 + which(diff(ret) > 4))]
e[rows, 4:1]
[,1] [,2] [,3] [,4]
[1,] 3 5 6 7
[2,] 3 4 5 7
[3,] 5 6 7 11
Try:
fun1 <- function(vec, n, cond1) {
lst1 <- lapply(1:(length(vec) - n+1), function(i) {
x1 <- vec[i:(i + (n-1))]
if (all(diff(x1) >= 0) & all(x1 > cond1))
x1
})
indx <- which(sapply(lst1, length) == n)
indx2 <- unlist(lapply(split(indx, cumsum(c(TRUE, diff(indx) != 1))), function(x) x[seq(1,
length(x), by = n-1)]))
lst1[indx2]
}
v1 <- c(3,3,3,3,1,3,3,3,3,3,3,3,3)
v2 <- c(2,3,5,5,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
v3 <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
fun1(v1,4,2)
#[[1]]
#[1] 3 3 3 3
#[[2]]
#[1] 3 3 3 3
#[[3]]
#[1] 3 3 3 3
fun1(v2,4,2)
#[[1]]
#[1] 3 5 5 7
#[[2]]
#[1] 3 4 5 7
#[[3]]
#[1] 5 6 7 11
fun1(v3,4,2)
#[[1]]
#[1] 3 5 6 7
#[[2]]
#[1] 3 4 5 7
#[[3]]
#[1] 5 6 7 11
Here is another idea based on rle:
ff = function(x, size, thres)
{
valid_subsets = sapply(head(seq_along(x), -(size - 1)),
function(i) all(x[i:(i + (size - 1))] > thres))
r = rle(valid_subsets)
lapply(unlist(mapply(function(a, b) a + (seq_len(b) - 1) * size,
(cumsum(r$lengths) - r$lengths + 1)[which(r$values)],
(r$lengths[which(r$values)] + size - 1) %/% size)),
function(i) x[i:(i + (size - 1))])
}
ff(c(3,3,3,3,1,3,3,3,3,3,3,3,3), 4, 2)
ff(c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4), 4, 2)
Testing on another vector (I assume this is the correct output):
set.seed(4); xx = sample(1:10, 20, T)
xx
# [1] 6 1 3 3 9 3 8 10 10 1 8 3 2 10 5 5 10 6 10 8
ff(xx, 4, 2)
#[[1]]
#[1] 3 3 9 3
#
#[[2]]
#[1] 10 5 5 10
Unless I'm missing something, on "xx" (as well as on other cases) the other posted answers do not seem to work:
fun1(xx, 4, 2)
#[[1]]
#[1] 3 8 10 10
#e[rows, 4:1]
#[1] 9 3 8 10
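The rule stated in the question (take a valid window, then resume the search right after it) can also be written as a plain left-to-right scan. This is a minimal sketch; the function name disjoint_runs and its arguments are invented for illustration, and it favours readability over speed.

```r
# Hypothetical helper: greedy scan for disjoint windows of length k whose
# elements all exceed thres.
disjoint_runs <- function(v, k, thres) {
  out <- list()
  i <- 1
  while (i + k - 1 <= length(v)) {
    window <- v[i:(i + k - 1)]
    if (all(window > thres)) {
      out[[length(out) + 1]] <- window
      i <- i + k        # jump past the accepted window so results never overlap
    } else {
      i <- i + 1        # otherwise slide forward by one position
    }
  }
  out
}

v <- c(2,3,5,6,7,6,3,2,3,4,5,7,8,9,6,1,1,2,5,6,7,11,2,3,4)
disjoint_runs(v, 4, 2)

xx <- c(6,1,3,3,9,3,8,10,10,1,8,3,2,10,5,5,10,6,10,8)
disjoint_runs(xx, 4, 2)
```

On the three vectors from the question and on the xx vector above, this scan returns the same sub-vectors as listed in the question and by ff().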

Resources