As background, I'm a computer programmer and I'm working on a software library that allows a computer to quickly search through all dates to find the set of dates that satisfies some criteria. For example:
I want a list of every possible time that has ever occurred on a Friday or a Saturday in April or May during the first week of the month.
My library uses numerical sets to efficiently represent ranges of dates that satisfy a given criterion.
I've been thinking about ways to improve the performance of some parts of the library, and I think that by combining sets and some geometry I can really improve my results. However, my geometry is a bit rusty and I was hoping you might be able to help.
Here's my thought:
Certain elements of time can be represented as a circular dial. For example, minutes can be positioned on a clock with values between 0...59. We could store valid ranges as a list of arcs. For example, if we wanted all times whose minutes fall in :05-:10, we could store [5, 10]. If we wanted all times whose minutes fall in :45-:59 or :00-:15, we could store [45, 15]. Notice how this last arc "loops around" the dial. Here's a mockup showing different ranges intersecting on a dial.
My question is this:
Given a set of whole numbers between N...M arranged into a circle.
Given Arc1, which is represented by [A, B], and Arc2, which is represented by [C, D], where A, B, C, and D are all within the range N...M.
How do I determine:
A. Whether the arcs intersect.
B. If they do, what their intersection is.
C. If they do, what their union is.
Thank you so much for your help. If you're not able to help, pointing me in the right direction would also be great.
Thanks!
A simple and safe approach is to split any arc that wraps around 0 into two ordinary intervals. Then you perform pairwise interval intersection/union (for instance, if A <= D and C <= B then the intersection is [max(A, C), min(B, D)]), and merge the resulting pieces if they meet at the wrap point (0).
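For instance, here is a minimal sketch of that approach in R (my own illustration, assuming a 0..59 minute dial; the same idea works for any N..M range):

split_arc <- function(arc, lo = 0, hi = 59) {
  # a wrap-around arc like [45, 15] becomes [45, 59] and [0, 15]
  if (arc[1] <= arc[2]) list(arc) else list(c(arc[1], hi), c(lo, arc[2]))
}
intersect_intervals <- function(p, q) {
  a <- max(p[1], q[1]); b <- min(p[2], q[2])
  if (a <= b) c(a, b) else NULL
}
arc_intersection <- function(arc1, arc2, lo = 0, hi = 59) {
  pieces <- list()
  for (p in split_arc(arc1, lo, hi))
    for (q in split_arc(arc2, lo, hi)) {
      r <- intersect_intervals(p, q)
      if (!is.null(r)) pieces <- c(pieces, list(r))
    }
  pieces  # pieces that meet at the wrap point could still be merged afterwards
}
arc_intersection(c(45, 15), c(5, 50))  # gives [45, 50] and [5, 15]

The union works the same way: keep all the split pieces from both arcs and merge the ones that touch.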
It seems the primitive operation to implement would be something like "is the number X contained in the arc [A,B]". Once you have that, you could implement an [A,B]/[C,D] arc-intersection predicate along these lines (a small sketch of both follows the list of conditions below):
Arc intersection means exactly that at least one of the following conditions holds:
C is contained in [A,B]
D is contained in [A,B]
A is contained in [C,D]
B is contained in [C,D]
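A minimal sketch of both pieces, assuming the dial values are 0..N-1 and arcs run in increasing (clockwise) order (my own illustration, in R for convenience):

contains <- function(arc, x, N = 60) {
  # clockwise distance from the arc start to x, compared with the arc's length
  (x - arc[1]) %% N <= (arc[2] - arc[1]) %% N
}
arcs_intersect <- function(arc1, arc2, N = 60) {
  contains(arc1, arc2[1], N) || contains(arc1, arc2[2], N) ||
    contains(arc2, arc1[1], N) || contains(arc2, arc1[2], N)
}
arcs_intersect(c(45, 15), c(5, 50))   # TRUE: they overlap in 5..15 and 45..50
arcs_intersect(c(20, 30), c(35, 10))  # FALSE: no overlap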
One way to implement this contained-in-arc test without any branches is with some trigonometry and a vector cross product. I'm not sure it would be faster (the math-vs-branches performance tradeoff is entirely empirical), but it might be worth a try.
Denote Xa = sin(2*PI*A/N), Ya = cos(2*PI*A/N), and similarly Xb, Yb, Xc, Yc, Xd, Yd for the points B, C, and D.
C is contained in [A,B] is equivalent to:
Xa * Yc - Ya * Xc > 0
AND
Xc * Yb - Yc * Xb > 0
You can complete the other 3 conditions in an identical manner.
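A rough sketch of that idea (my own, in R, and with my own orientation convention, so the comparison signs may need to flip under a different convention; as written it also assumes each arc spans less than half the dial):

to_xy <- function(v, N = 60) c(sin(2 * pi * v / N), cos(2 * pi * v / N))
cross2 <- function(p, q) p[1] * q[2] - p[2] * q[1]
contains_trig <- function(arc, x, N = 60) {
  a <- to_xy(arc[1], N); b <- to_xy(arc[2], N); xx <- to_xy(x, N)
  # x must lie clockwise of A and counterclockwise of B (endpoints excluded
  # here because of the strict inequalities)
  cross2(xx, a) > 0 && cross2(b, xx) > 0
}
contains_trig(c(10, 25), 15)  # TRUE
contains_trig(c(10, 25), 45)  # FALSE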
Hope this turns out useful.
I would like to solve a fairly big optimization problem where time matters, but I got stuck trying to navigate the vast number of "R" packages, so I would like to ask the community directly about this problem.
I want to minimize a function:
F=(x-y)^2
where y is a given, predefined vector of 8000 values.
So I'm searching for the 8000 x-es.
I've got a matrix A (which is basically a dummy-variable matrix), with nrow=8, ncol=8000.
I also have a vector b, with 8 given values.
So, I want to solve the following problem:
min(x-y)^2
s.t:
A*x=b
Theoretically I understand everything, but somehow I fail to express F in any package that allows equality constraints.
Also (since I have no idea what the processing time will be), I would like to ask what you would do if:
F = abs(x-y)
because if minimizing the quadratic function takes too long, this second option would also satisfy me.
The data is confidential, but I can send it privately (and in a slightly altered form) if it's necessary for the solution.
Edit nr.1:
OK, I'll be more specific this time.
I've got 2 years of data (that is the 8000 measurements; each year contains 4000 measurements).
Each year has q1, q2, q3, q4, which happened somehow in the past (but will be specified as the optimum in the future, to achieve some goals).
So this is my b vector, the criteria that the optimization has to meet.
made up numbers
b<-c(20,30,40,50,60,70,80,90)
I have a matrix A, which is a binary matrix indicating where we are in time (q1, q2, etc.).
Let's say that one quarter of the year is 1 day long, so:
(there are 7 zeros in each vector, because we are talking about 2 years here and each vector marks only one quarter)
a<-c(1,0,0,0,0,0,0,0)
u<-c(0,1,0,0,0,0,0,0)
c<-c(1,0,1,0,0,0,0,0)
d<-c(0,0,0,1,0,0,0,0)
From this point another year comes in, with another q1; that is why the indicator doesn't jump back to the first position.
e<-c(0,0,0,0,1,0,0,0)
f<-c(0,0,0,0,0,1,0,0)
g<-c(0,0,0,0,0,0,1,0)
h<-c(0,0,0,0,0,0,0,1)
A<-cbind(a,u,c,d,e,f,g,h)
This is a slightly misleading way to represent the data, because here the length and the width of the matrix happen to be the same; but remember, in the original data everything is fine for matrix multiplication:
the width of A and the length of x is 8000.
There is a planned way of how things should go in each quarter; that is the "y", which is given.
made up numbers
y<-c(10,11,12,13,14,16,17,18)
So basically I want to stick to the plan as much as I can while still achieving the criteria b; that is why I want to minimize
the differences between the planned values and the x values:
min F = (Ax-y)^2
s.t.: A*x = b
Hope it's clearer. I reduced the dimension of the problem, so this way it may seem infeasible
(it's dumb, I know :)
Looks like there is nothing to optimize. E.g. with your data set:
> b<-c(20,30,40,50,60,70,80,90)
> a<-c(1,0,0,0,0,0,0,0)
> u<-c(0,1,0,0,0,0,0,0)
> c<-c(1,0,1,0,0,0,0,0)
> d<-c(0,0,0,1,0,0,0,0)
>
> e<-c(0,0,0,0,1,0,0,0)
> f<-c(0,0,0,0,0,1,0,0)
> g<-c(0,0,0,0,0,0,1,0)
> h<-c(0,0,0,0,0,0,0,1)
>
> A<-cbind(a,u,c,d,e,f,g,h)
> x <- solve(A,b)
> x
a u c d e f g h
-20 30 40 50 60 70 80 90
It would be more interesting if there were some degrees of freedom left to play with x and make it as close to y as possible.
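For what it's worth, when A does have more columns than rows (as in the full 8 x 8000 problem), the equality-constrained least-squares problem min ||x - y||^2 s.t. A x = b has a closed-form solution via the KKT conditions, x = y + t(A) %*% solve(A %*% t(A), b - A %*% y), so no optimization package is strictly needed. A small sketch with made-up data (my own illustration, not the poster's confidential data):

set.seed(1)
n <- 8000; m <- 8
A <- matrix(0, nrow = m, ncol = n)
for (i in 1:m) A[i, ((i - 1) * n / m + 1):(i * n / m)] <- 1  # quarter indicators
y <- rnorm(n, mean = 10)                       # the "plan"
b <- c(20, 30, 40, 50, 60, 70, 80, 90) * 500   # made-up quarterly targets
x <- drop(y + t(A) %*% solve(A %*% t(A), b - A %*% y))  # KKT solution
max(abs(A %*% x - b))   # constraints hold up to numerical precision
sum((x - y)^2)          # objective value

This assumes A has full row rank. For the abs(x - y) objective, the natural formulation is a linear program (auxiliary variables for the absolute values), which packages such as lpSolve can handle.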
Assume I have a series t_1, t_2, ..., t_n, ..., and new numbers keep coming in. I want to calculate an approximation of the sum/average of the last t numbers, but without storing those t numbers. The only thing stored is the previous sum/average. What is the appropriate function?
E.g.
s_1 = t_1
s_2 = f(t_2, s_1)
s_3 = f(t_3, s_2)
A possible function might look like s_2 = t_2 + s_1 * (e ^ -1), but what is the best choice?
Note: The window size t is fixed. So there is no exact solution, only an approximation, since the number that falls out of the window is not known.
Note 2: Thanks for all the discussion. I know the answer now. It is really trivial; my fault for not thinking it through. I will delete this question later. But anyway, the answer is: I should assume that the number falling out of the window equals the current average. Under this assumption, the new sum is
(old average)*(t-1) + new number
and the new average is
((old average)*(t-1)+(new number))/t
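For example, a quick sketch of this running approximation in R (just an illustration of the formula above, not an exact moving average):

approx_window_avg <- function(values, t) {
  avg <- values[1]
  for (v in values[-1]) {
    avg <- (avg * (t - 1) + v) / t   # the update derived above
  }
  avg
}
approx_window_avg(c(1, 2, 3, 4, 5, 100), t = 3)  # ~35.6, vs. the exact mean(c(4, 5, 100)) = 36.33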
First of all, this is realistically probably a question for Mathematics Stack Exchange,
but anyway, since you don't mention a programming language, I'll go with C# (with an array). Let's call your series 'mySeries':
double average = 0;
for (int i = 0; i < mySeries.Length; i++)
    average += (mySeries[i] - average) / (i + 1);  // incremental running mean
MessageBox.Show("Here is your average dawg: " + average.ToString());
EDIT
So it seems I "underestimated" what varying length numbers meant. I didn't even think about situations where the operands are 100 digits long. In that case, my proposed algorithm is definitely not efficient. I'd probably need an implementation who's complexity depends on the # of digits in each operands as opposed to its numerical value, right?
As suggested below, I will look into the Karatsuba algorithm...
Write the pseudocode of an algorithm that takes in two arbitrary length numbers (provided as strings), and computes the product of these numbers. Use an efficient procedure for multiplication of large numbers of arbitrary length. Analyze the efficiency of your algorithm.
I decided to take the (semi) easy way out and use the Russian Peasant Algorithm. It works like this:
a * b = (a/2) * 2b         if a is even
a * b = ((a-1)/2) * 2b + b  if a is odd
My pseudocode is:
rpa(x, y) {
    if x is 1
        return y
    if x is even
        return rpa(x/2, 2y)
    if x is odd
        return rpa((x-1)/2, 2y) + y
}
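For example, tracing it by hand: rpa(6, 7) -> rpa(3, 14) -> rpa(1, 28) + 14 = 28 + 14 = 42, which is 6 * 7.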
I have 3 questions:
Is this efficient for arbitrary-length numbers? I implemented it in C and tried numbers of varying length. The run-time was near-instant in all cases, so it's hard to tell empirically...
Can I apply the Master Theorem to understand the complexity...?
a = number of subproblems in the recursion = 1 (at most 1 recursive call in each case)
n/b = size of each subproblem = n/1 -> b = 1 (the problem doesn't change size...?)
f(n) = work done outside the recursive calls = O(1) -> d = 0 (the addition when a is odd)
a = 1, b^d = 1, a = b^d -> the complexity is O(n^d * log n) = O(log n)
this makes sense logically since we are halving the problem at each step, right?
What might my professor mean by providing arbitrary-length numbers "as strings"? Why do that?
Many thanks in advance
What might my professor mean by providing arbitrary-length numbers "as strings"? Why do that?
This actually changes everything about the problem (and makes your algorithm incorrect).
It means that 1234 is provided as 1,2,3,4 and you cannot operate directly on the whole number. You need to analyze your algorithm in terms of #additions, #multiplications and #divisions.
You should expect a division to be a bit more expensive than a multiplication, and a multiplication to be a lot more expensive than an addition. So a good algorithm tries to reduce the number of divisions and multiplications.
Check out the Karatsuba algorithm (but don't copy it; that's not what your teacher wants); it is one of the fastest for this kind of problem.
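To make the "operate on digits" point concrete, here is a rough sketch (my own, in R just for illustration; any language works) of the baseline schoolbook method on digit strings. Every digit of one operand is multiplied by every digit of the other, which is what makes it O(n^2) and what Karatsuba improves on:

multiply_strings <- function(a, b) {
  x <- rev(as.integer(strsplit(a, "")[[1]]))  # least-significant digit first
  y <- rev(as.integer(strsplit(b, "")[[1]]))
  res <- integer(length(x) + length(y))
  for (i in seq_along(x))
    for (j in seq_along(y))
      res[i + j - 1] <- res[i + j - 1] + x[i] * y[j]
  for (k in seq_len(length(res) - 1)) {       # propagate carries
    res[k + 1] <- res[k + 1] + res[k] %/% 10
    res[k] <- res[k] %% 10
  }
  out <- sub("^0+", "", paste(rev(res), collapse = ""))
  if (out == "") "0" else out
}
multiply_strings("1234", "5678")  # "7006652"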
Regarding 3): native integers are limited in how large (or small) the numbers they can represent are (32- or 64-bit integers, for example). To represent arbitrary-length numbers you can choose strings, because then you are not really limited by this. The problem is then, of course, that your arithmetic units are not really made to add strings ;-)