How to find the averages of any consecutive numbers in a sequence? - math

This is a bit of a math question, but I post it here too because there's a direct practical purpose and it's related to creating a faster algorithm. I want to identify users that use my app on a weekly basis. For each user I can generate a sequence of times of their interactions, and from that I can generate a sequence of the length of time between each interaction.
So given this sequence of lengths of time, how can I find sections of consecutive numbers that have an average of 7 days or less?
As an example, if I had the following sequence: [1, 11, 1, 8, 12]
[1, 11, 1, 8, 12] would be a valid stretch of numbers with an average of 7 or less (33/5 = 6.6), but [11, 1, 8, 12] would not be valid (32/4 = 8). [1, 8, 12] would again be valid (21/3 = 7).
Ideally, my output for every valid section would be the starting position of the first item and the length of the section. So [1, 11, 1, 8, 12] would be described as [1, 5] and [1, 8, 12] would be described as [3, 3].
There is a brute-force, computational approach where I take every item in the sequence as a start point and calculate the average of every possible length of following numbers up to the end of the sequence. The number of calculations grows quickly, though: a sequence of length n has n(n+1)/2 consecutive sections (one of length n, two of length n-1, three of length n-2, and so on).
I ask broadly if there's a more elegant approach that doesn't require a quadratically growing number of individual calculations for means.
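One possible way to cut down the work per section is a minimal sketch in Python, not a definitive answer. It relies on the observation that a section has a mean of at most 7 exactly when the sum of (gap - 7) over that section is at most 0, so a running prefix sum lets each section be checked with a single comparison instead of a fresh mean. The function name weekly_sections and the limit parameter are made up for illustration. Listing every valid section can still take quadratic time, because the number of valid sections can itself be quadratic.

```python
from itertools import accumulate


def weekly_sections(gaps, limit=7):
    """Return (start, length) pairs, with 1-based start, for every run of
    consecutive gaps whose mean is <= limit."""
    # prefix[i] is the sum of (gap - limit) over the first i gaps.
    prefix = [0] + list(accumulate(g - limit for g in gaps))
    sections = []
    for i in range(len(gaps)):
        for j in range(i + 1, len(gaps) + 1):
            # The section gaps[i:j] has mean <= limit exactly when its
            # shifted sum prefix[j] - prefix[i] is <= 0.
            if prefix[j] <= prefix[i]:
                sections.append((i + 1, j - i))
    return sections


sections = weekly_sections([1, 11, 1, 8, 12])
print((1, 5) in sections, (3, 3) in sections, (2, 4) in sections)
```

If only the longest valid section per start point is needed, the same prefix array can be searched instead of scanned, but that is a different output from the one asked for above.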

Related

Multidimensional selection in a 2D array

Given a 2-d array arr of size n x m, a selection is defined as an array of integers such that it contains at least [m/2] integers from each row of arr. The cost of the selection is defined as the maximum difference between any two integers of the selection.
Suppose k is the minimum cost of all the possible selections for the given 2-d array. Find the maximum value of the product of k * the number of integers considered in the selection with the minimum cost.
Example
Suppose n= 3, m = 2, and arr = [[1, 2], [3, 4], [8, 9]]
Some of the possible selections are [2, 3, 8], [1, 2, 3, 9], [1, 3, 4, 8, 9] etc. The costs of these selections are 8 - 2 = 6, 9 - 1 = 8, and 9 - 1 = 8 respectively.
Here the minimum cost of all the possible selections is 6. The possible selections with the cost 6 are [2, 4, 8] and [2, 3, 4, 8]. The maximum value of the required product is obtained using the latter selection i.e. 6 * 4 = 24. Hence the answer is 24.
How do you go about this problem? This is the basic thought process:
Find the selections (how do you do that?)
calculate the cost of each selection
determine the least cost and multiply it with the length of the selection.
Unfortunately I am stuck on the very first step which is to find the selections. How can we do that in an efficient manner? Can we use combinations on 2D arrays?
Any help would be appreciated, thank you!
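For small inputs, the three steps listed above can be sketched by brute force in Python. This is only an illustration, with two assumptions that do not come from the question: [m/2] is read as m/2 rounded up, and the name best_product is hypothetical. It enumerates every selection with itertools, so it is exponential in m and n and is not an efficient solution.

```python
from itertools import combinations, product
from math import ceil


def best_product(arr):
    """Brute force: minimum cost k times the largest selection that has cost k."""
    m = len(arr[0])
    need = ceil(m / 2)  # assumption: [m/2] means m/2 rounded up
    # For each row, every subset that contains at least `need` integers.
    per_row = [
        [c for size in range(need, m + 1) for c in combinations(row, size)]
        for row in arr
    ]
    best_cost, best_size = None, 0
    # A selection is one allowed subset from each row, merged together.
    for choice in product(*per_row):
        flat = [v for part in choice for v in part]
        cost = max(flat) - min(flat)
        if best_cost is None or cost < best_cost:
            best_cost, best_size = cost, len(flat)
        elif cost == best_cost:
            best_size = max(best_size, len(flat))
    return best_cost * best_size


print(best_product([[1, 2], [3, 4], [8, 9]]))  # 24, as in the example
```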

Rebalancing table with average value

I'm looking for an algorithm that will solve the following problem:
Given a table with values: [6, 2, 3, 1]
Get the average value per slice: 12/4 = 3
Re-balance the values in the slice, i.e. I should get formulas something like: 1) 0, 2) 6-1, 3) 0, 4) 5-2. That is, deduct from the highest value the amount needed to bring the other values up to the average. As a result I need to get the slice [3, 3, 3, 3]
Thanks
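One way to read the request is as a list of transfers from values above the average to values below it. The Python sketch below takes that reading. It assumes integer values whose sum divides evenly by their count, and the function name rebalance and the (source, destination, amount) tuple format are invented for illustration.

```python
def rebalance(values):
    """Return the balanced list and the transfers (src, dst, amount) used."""
    total, n = sum(values), len(values)
    if total % n != 0:
        raise ValueError("sum is not divisible by the number of entries")
    avg = total // n
    balance = list(values)
    donors = [i for i, v in enumerate(balance) if v > avg]
    receivers = [i for i, v in enumerate(balance) if v < avg]
    transfers = []
    d = r = 0
    while d < len(donors) and r < len(receivers):
        src, dst = donors[d], receivers[r]
        # Move as much as the donor can spare or the receiver still needs.
        amount = min(balance[src] - avg, avg - balance[dst])
        balance[src] -= amount
        balance[dst] += amount
        transfers.append((src, dst, amount))
        if balance[src] == avg:
            d += 1
        if balance[dst] == avg:
            r += 1
    return balance, transfers


print(rebalance([6, 2, 3, 1]))
# ([3, 3, 3, 3], [(0, 1, 1), (0, 3, 2)])
```

The two transfers correspond to the 6-1 and 5-2 steps in the question, with positions counted from 0.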

How to partition UUID space into N equal-size partitions?

Take a UUID in its hex representation: '123e4567-e89b-12d3-a456-426655440000'
I have a lot of such UUIDs, and I want to separate them into N buckets, where N is of my choosing, and I want to generate the bounds of these buckets.
I can trivially create 16 buckets with these bounds:
00000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
20000000-0000-0000-0000-000000000000
30000000-0000-0000-0000-000000000000
...
e0000000-0000-0000-0000-000000000000
f0000000-0000-0000-0000-000000000000
ffffffff-ffff-ffff-ffff-ffffffffffff
just by iterating over the options for the first hex digit.
Suppose I want 50 equal size buckets(equal in terms of number of UUID possibilities contained within each bucket), or 2000 buckets, or N buckets.
How do I generate such bounds as a function of N?
Your UUIDs above are 32 hex digits in length. So that means you have 16^32 ≈ 3.4e38 possible UUIDs. A simple solution would be to use a big int library (or a method of your own) to store these very large values as actual numbers. Then, you can just divide the number of possible UUIDs by N (call that value k), giving you bucket bounds of 0, k, 2*k, ... (N-1)*k, UMAX.
This runs into a problem if N doesn't divide the number of possible UUIDs. Obviously, not every bucket will have the same number of UUIDs, but in this case, they won't even be evenly distributed. For example, if the number of possible UUIDs is 32, and you want 7 buckets, then k would be 4, so you would have buckets of size 4, 4, 4, 4, 4, 4, and 8. This probably isn't ideal. To fix this, you could instead make the bucket bounds at 0, (1*UMAX)/N, (2*UMAX)/N, ... ((N-1)*UMAX)/N, UMAX. Then, in the inconvenient case above, you would end up with bounds at 0, 4, 9, 13, 18, 22, 27, 32 -- giving bucket sizes of 4, 5, 4, 5, 4, 5, 5.
You will probably need a big int library or some other method to store large integers in order to use this method. For comparison, an unsigned long long in C++ (64 bits in most implementations) can only store values up to 2^64 - 1 ≈ 1.8e19.
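As a rough sketch of the scaled-bounds idea, Python's built-in integers are arbitrary precision and the standard uuid module can build a UUID from a 128-bit integer, so no extra big int library is needed there. The function name uuid_bucket_bounds is hypothetical. The sketch treats a UUID as a plain 128-bit number and ignores the version and variant bits, which is an assumption about how the UUIDs are distributed. It scales by 2^128 rather than by UMAX so that N = 16 reproduces the bounds listed in the question.

```python
import uuid


def uuid_bucket_bounds(n):
    """Return n + 1 bounds that split the 128-bit UUID space into n buckets."""
    space = 1 << 128  # number of possible 128-bit values
    # Lower bound of bucket i is floor(i * space / n), computed exactly.
    bounds = [uuid.UUID(int=(i * space) // n) for i in range(n)]
    # The final upper bound is the largest representable UUID.
    bounds.append(uuid.UUID(int=space - 1))
    return bounds


for bound in uuid_bucket_bounds(16)[:3]:
    print(bound)
# 00000000-0000-0000-0000-000000000000
# 10000000-0000-0000-0000-000000000000
# 20000000-0000-0000-0000-000000000000
```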
If N is a power of 2, then the solution is obvious: you can split on bit boundaries as for 16 buckets in your question.
If N is not a power of 2, the buckets mathematically cannot be of exactly equal size, so the question becomes how much inequality you are willing to tolerate in the name of efficiency.
As long as N<2^24 or so, the simplest thing to do is just allocate UUIDs based on the first 32 bits into N buckets each of size 2^32/N. That should be fast enough and equal enough for most applications, and if N needs to be larger than that allows, you could easily double the bits with a smallish penalty.
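A minimal Python sketch of the first-32-bits approach might look like the following. The function name bucket_of is made up, and mapping the prefix with a multiply and shift is one choice among several; it gives N buckets of roughly 2^32/N prefixes each.

```python
import uuid


def bucket_of(u, n):
    """Map a UUID to a bucket index in range(n) using its first 32 bits."""
    top32 = u.int >> 96  # the leading 32 of the 128 bits
    # Scale the 32-bit prefix from [0, 2^32) down to [0, n).
    return (top32 * n) >> 32


print(bucket_of(uuid.UUID("123e4567-e89b-12d3-a456-426655440000"), 16))  # 1
```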

Absolute difference between a vector and a number in R

I'm new to R and I would be very grateful for an answer to my question:
I've got a vector: c(9, 11, 2, 6, 10) and the number 4 (or a vector c(4))
I want to generate a vector with the absolute difference between the first and the second one, which should look like this: c(5, 7, 2, 2, 6)
How do I do this? I can't get it to work with diff(), even after reading through the help (?diff()).
Any help is appreciated :)
x <- c(9, 11, 2, 6, 10)
abs(x - 4)
#[1] 5 7 2 2 6
abs returns the absolute value of each element of a vector. The 4 is recycled when subtracted from x. If you subtract a vector of several values instead, it is recycled too, with a warning if the length of x is not a multiple of its length.
You ran into problems with diff because it isn't designed for scalar subtraction (what you are attempting). It is better suited to finding the difference within a vector.

Getting Unique Number Combinations

Is it possible without using exponentiation to have a set of numbers that when added together, always give unique sum?
I know it can be done with exponentiation (see first answer): The right way to manage user privileges (user hierarchy)
But I'm wondering if it's possible without exponentiation.
No, you can only use exponentiation, because the sum of the lower values has to be less than the new number for the sums to be unique: 1+2=3 < 4, 1+2+4=7 < 8.
[EDIT:]
This is a layman's explanation. Of course there are other possibilities, but none as efficient as using powers of 2.
There's a chance it can be done without exponentiation (I'm no math expert), but not in any way that's more efficient than exponentiation. This is because it only takes one bit of storage space per possible value, and as an added plus you can use boolean operators to do useful stuff with the values.
If you restrict yourself to integers, the numbers have to grow at least as fast as an exponential function. If you find a function that grows faster (like, oh, maybe the Ackermann function) then the numbers produced by that will probably work too.
With floating-point numbers, you can keep adding unique irreducible roots of primes (sqrt(2), sqrt(3), sqrt(5), ...) and you will always get something unique, up until you hit the limits of floating-point precision. Not sure how many unique numbers you could squeeze out of it - maybe you should try it.
No. To see this directly, think about building up the set of basis values by considering at each step the smallest possible positive integer that could be included as the next value. The next number to add must be different from all possible sums of the numbers already in the set (including the empty sum, which is 0), and can't combine with any combination of numbers already present to produce a duplicate. So...
{} : all possible sums = {0}, smallest possible next = 1
{1} : all possible sums = {0, 1}, smallest possible next = 2
{1, 2} : all possible sums = {0, 1, 2, 3}, smallest possible next = 4
{1, 2, 4} : a.p.s. = {0, 1, 2, 3, 4, 5, 6, 7}, s.p.n. = 8
{1, 2, 4, 8} ...
And, of course, we're building up the binary powers. You could start with something other than {1, 2}, but look what happens, using the "smallest possible next" rule:
{1, 3} : a.p.s. = {0, 1, 3, 4}, s.p.n. = 5 (2 is out because it could be added to 1 giving 3, which is already there; 3 and 4 are already sums)
{1, 3, 5} : a.p.s. = {0, 1, 3, 4, 5, 6, 8, 9}, s.p.n. = 10
{1, 3, 5, 10} ...
This sequence is growing faster than the binary powers, term by term.
If you want a nice Project-Euler-style programming challenge, you could write a routine that takes a set of positive integers and determines the "smallest possible next" positive integer, under the "sums must be unique" constraint.
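One possible version of such a routine is sketched below in Python. It is a plain brute-force search, and the function names subset_sums and smallest_possible_next are made up for illustration. A candidate is accepted when adding it to any existing subset sum never lands on a sum that already exists.

```python
def subset_sums(values):
    """Return the sums of all subsets of values, including the empty sum 0."""
    sums = [0]
    for v in values:
        sums += [s + v for s in sums]
    return sums


def smallest_possible_next(values):
    """Smallest positive integer that keeps all subset sums unique."""
    sums = subset_sums(values)
    existing = set(sums)
    if len(existing) != len(sums):
        raise ValueError("existing values already produce duplicate sums")
    candidate = 1
    # Reject the candidate if any existing sum plus it is already a sum.
    # The s = 0 case also rules out candidates that are themselves sums.
    while any(s + candidate in existing for s in sums):
        candidate += 1
    return candidate


print(smallest_possible_next([1, 2, 4]))  # 8
print(smallest_possible_next([1, 3, 6]))  # 11
```

The loop always ends, because any candidate larger than the biggest existing sum is accepted.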
