Ok,
so this is an application of existing mathematical practices, but I can't really apply them to my case.
So, I have x of a currency to increase the level of a game-object y for cost z.
z is calculated in cost(y.lvl) = c_1 * c_2^y.lvl / c_3, where the c's are constants.
I am seeking an efficient way to calculate, how often I can increase the level of y, given x. Currently I'm using a loop that does something like this:
double tempX = x;
int counter = 0;
while(tempX >= cost(y.lvl+counter)){
    tempX -= cost(y.lvl+counter);
    counter++;
}
The problem is that in some cases this loop has to iterate too many times to stay performant.
What I am looking for is essentially a function
int howManyCanBeBought(x,y.lvl), which calculates its result in a single go, instead of looping a lot of times.
I've read something about transforming recursions to generating functions and transforming them to closed formulas, but I didn't get the math behind it. Is there an easy way to it?
If I understand correctly, you're looking for the largest n such that:
\sum_{i=0}^{n} (c_1 / c_3) c_2^{lvl+i} \le x
Dividing by the constant factor:
\sum_{i=0}^{n} c_2^i \le (c_3 / (c_1 c_2^{lvl})) x
Using the formula for the sum of a geometric series:
(c_2^{n+1} - 1) / (c_2 - 1) \le (c_3 / (c_1 c_2^{lvl})) x
And solving for the maximum integer:
n = floor( \log_{c_2}( c_3 (c_2 - 1) x / (c_1 c_2^{lvl}) + 1 ) - 1 )
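As a minimal Python sketch of that closed form (not the asker's actual code): the function names and the explicit c1, c2, c3 parameters are assumptions made for illustration, and c2 > 1 is assumed so that the geometric-series formula applies. Because the sum above runs from i = 0 to n, the number of levels bought is n + 1, which is what the function returns. The two short loops only repair floating-point rounding near a boundary, so they normally run zero or one time.

```python
import math

def cost(lvl, c1, c2, c3):
    """Cost of the upgrade at level lvl: c1 * c2**lvl / c3."""
    return c1 * c2 ** lvl / c3

def total_cost(lvl, k, c1, c2, c3):
    """Cost of k consecutive upgrades starting at lvl (geometric series)."""
    return cost(lvl, c1, c2, c3) * (c2 ** k - 1) / (c2 - 1)

def how_many_can_be_bought(x, lvl, c1, c2, c3):
    """Largest k with total_cost(lvl, k) <= x, assuming c2 > 1 and x >= 0."""
    if c2 <= 1:
        raise ValueError("this sketch assumes c2 > 1")
    ratio = x * (c2 - 1) / cost(lvl, c1, c2, c3)
    k = int(math.log(ratio + 1, c2))   # closed-form estimate
    # Repair possible floating-point rounding at the boundary.
    while k > 0 and total_cost(lvl, k, c1, c2, c3) > x:
        k -= 1
    while total_cost(lvl, k + 1, c1, c2, c3) <= x:
        k += 1
    return k
```

For example, with c1 = 10, c2 = 2, c3 = 1 and lvl = 0 the upgrade costs are 10, 20, 40, 80, so a budget of 70 buys three levels.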
I'm trying to solve this equation:
(b(ax+b ) - c) % n = e
Where everything is given except x
I tried the approach of :
(A + x) % B = C
(B + C - A) % B = x
where A is (-c) and then manually solve for x given my other subs, but I am not getting the correct output. Would I possibly need to use the extended Euclidean algorithm (EEA)? Any help would be appreciated! I understand this question has been asked before; I tried those solutions, but they don't work for me.
(b*(a*x+b) - c) % n = e
can be rewritten as:
(b*a*x) % n = (e - b*b + c) % n
x = ((e - b*b + c) * modular_inverse(b*a, n)) % n
where the modular inverse of u, modular_inverse(u, n), is a number v such that u*v % n == 1. See this question for code to calculate the modular inverse.
Some caveats:
When simplifying modular equations, you can never simply divide, you need to multiply with the modular inverse.
There is no straightforward formula to calculate the modular inverse, but there is a simple, quick algorithm to calculate it, similar to calculating the gcd.
The modular inverse doesn't always exist: u has an inverse modulo n only when gcd(u, n) == 1.
Depending on the programming language, when one or both arguments are negative, the result of modulo can also be negative.
Since x is only determined modulo n, for small n only the numbers from 0 to n-1 need to be tested, so in many cases a simple loop is sufficient.
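The rewrite above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it relies on the built-in three-argument pow with exponent -1 (available from Python 3.8) to compute the modular inverse. It assumes n > 0 and refuses inputs where b*a has no inverse modulo n.

```python
from math import gcd

def solve_for_x(a, b, c, e, n):
    """Return x in [0, n) with (b*(a*x + b) - c) % n == e.

    Assumes n > 0 and gcd(b*a, n) == 1, so that b*a is invertible mod n.
    """
    u = (b * a) % n
    if gcd(u, n) != 1:
        raise ValueError("b*a has no modular inverse modulo n")
    rhs = (e - b * b + c) % n       # Python's % is non-negative for n > 0
    return (rhs * pow(u, -1, n)) % n
```

For instance, with a = 2, b = 3, c = 1, n = 11 and e = 6 this gives x = 7, and indeed 3*(2*7 + 3) - 1 = 50, which leaves remainder 6 modulo 11.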
What language are you doing this in, and are the variables constant?
Here's a quick way to determine the possible values of x in Java:
for (int x = -1000; x < 1000; x++){
    // Math.floorMod keeps the result in [0, n) even when the left side is negative
    if (Math.floorMod(b*((a*x)+b) - c, n) == e){
        System.out.println(x);
    }
}
I'm trying to find time complexity (big O) of a recursive formula.
I tried to find a solution, you may see the formula and my solution below:
Like Brenner said, your last assumption is false. Here is why: Let's take the definition of big O from the Wikipedia page (using n instead of x):
f(n) = O(g(n)) if and only if there exist constants c, n0 s.t. |f(n)| <= c |g(n)| for all n >= n0.
We want to check if O(2^(n^2)) = O(2^n). Clearly, 2^(n^2) is in O(2^(n^2)), so let's pick f(n) = 2^(n^2) and check if this is in O(2^n). Put this into the above formula:
exists c, n0: 2^(n^2) <= c * 2^n for all n >= n0
Let's see if we can find suitable constant values n0 and c for which the above is true, or if we can derive a contradiction to prove that it is not true:
Take the log on both sides:
log(2^(n^2)) <= log(c * 2^n)
Simplify:
n^2 * log(2) <= log(c) + n * log(2)
Divide by log(2):
n^2 <= log(c)/log(2) + n
It's easy to see now that there is no c, n0 for which the above is true for all n >= n0, because n^2 - n grows without bound; thus O(2^(n^2)) = O(2^n) is not a valid assumption.
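The contradiction can also be checked numerically. The sketch below (the function name is hypothetical) uses exact integer arithmetic to find, for a given constant c, the first n at which 2^(n^2) exceeds c * 2^n. Since n^2 - n keeps increasing for n >= 1, the inequality stays violated from that point on, no matter how large c is chosen.

```python
def first_n_exceeding(c: int) -> int:
    """Smallest n >= 0 with 2**(n*n) > c * 2**n, using exact integers."""
    n = 0
    while 2 ** (n * n) <= c * 2 ** n:
        n += 1
    return n
```

Even a constant as large as 10**100 only postpones the crossover to n = 19.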
The last assumption you've specified with the question mark is false! Do not make such assumptions.
The rest of the manipulations you've supplied seem to be correct. But they actually bring you nowhere.
You should have finished this exercise in the middle of your draft:
T(n) = O(T(1)^(3^log2(n)))
And that's it. That's the solution!
You could actually claim that
3^log2(n) == n^log2(3) ==~ n^1.585
and then you get:
T(n) = O(T(1)^(n^1.585))
which is somewhat similar to the manipulations you've made in the second part of the draft.
So you can also leave it like this. But you cannot mess with the exponent. Changing the value of the exponent changes the big-O classification.
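The exponent identity used above, 3^log2(n) = n^log2(3), is easy to sanity-check in floating point. This is just a small sketch with a hypothetical helper name, not part of the original derivation:

```python
import math

def exponent_forms(n: float):
    """Return (3**log2(n), n**log2(3)); the two should agree up to rounding."""
    return 3 ** math.log2(n), n ** math.log2(3)
```

Both values agree to within floating-point rounding, and math.log2(3) is roughly 1.585, matching the exponent quoted above.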
EDIT 2: this post seems to have been moved from CrossValidated to StackOverflow due to it being mostly about programming, but that means my fancy MathJax doesn't work anymore. Hopefully this is still readable.
Say I want to calculate the squared Mahalanobis distance between two vectors x and y with covariance matrix S. This is a fairly simple function defined by
M2(x, y; S) = (x - y)^T * S^-1 * (x - y)
With python's numpy package I can do this as
# x, y = numpy.ndarray of shape (n,)
# s_inv = numpy.ndarray of shape (n, n)
diff = x - y
d2 = diff.T.dot(s_inv).dot(diff)
or in R as
diff <- x - y
d2 <- t(diff) %*% s_inv %*% diff
In my case, though, I am given
m by n matrix X
n-dimensional vector mu
n by n covariance matrix S
and want to find the m-dimensional vector d such that
d_i = M2(x_i, mu; S) ( i = 1 .. m )
where x_i is the ith row of X.
This is not difficult to accomplish using a simple loop in python:
d = numpy.zeros((m,))
for i in range(m):
    diff = x[i,:] - mu
    d[i] = diff.T.dot(s_inv).dot(diff)
Of course, because the outer loop runs in Python instead of in native code in the numpy library, it's not as fast as it could be. n and m are about 3-4 and several hundred thousand respectively, and I'm doing this somewhat often in an interactive program, so a speedup would be very useful.
Mathematically, the only way I've been able to formulate this using basic matrix operations is
d = diag( X' * S^-1 * X'^T )
where
x'_i = x_i - mu
which is simple to write a vectorized version of, but this is unfortunately outweighed by the inefficiency of calculating a 10-billion-plus element matrix and only taking the diagonal... I believe this operation should be easily expressible using Einstein notation, and thus could hopefully be evaluated quickly with numpy's einsum function, but I haven't even begun to figure out how that black magic works.
So, I would like to know: is there either a nicer way to formulate this operation mathematically (in terms of simple matrix operations), or could someone suggest some nice vectorized (python or R) code that does this efficiently?
BONUS QUESTION, for the brave
I don't actually want to do this once, I want to do it k ~ 100 times. Given:
m by n matrix X
k by n matrix U
Set of n by n covariance matrices each denoted S_j (j = 1..k)
Find the m by k matrix D such that
D_i,j = M2(x_i, u_j; S_j)
Where i = 1..m, j = 1..k, x_i is the ith row of X and u_j is the jth row of U.
I.e., vectorize the following code:
# s_inv is (k x n x n) array containing "stacked" inverses
# of covariance matrices
d = numpy.zeros( (m, k) )
for j in range(k):
    for i in range(m):
        diff = x[i, :] - u[j, :]
        d[i, j] = diff.T.dot(s_inv[j, :, :]).dot(diff)
First off, it seems like maybe you're getting S and then inverting it. You shouldn't do that; it's slow and numerically inaccurate. Instead, you should get the Cholesky factor L of S so that S = L L^T; then
M^2(x, y; L L^T)
= (x - y)^T (L L^T)^-1 (x - y)
= (x - y)^T L^-T L^-1 (x - y)
= || L^-1 (x - y) ||^2,
and since L is triangular L^-1 (x - y) can be computed efficiently.
As it turns out, scipy.linalg.solve_triangular will happily do a bunch of these at once if you reshape it properly:
import numpy as np
import scipy.linalg
L = np.linalg.cholesky(S)
y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis]).T, lower=True)
d = np.einsum('ij,ij->j', y, y)
Breaking that down a bit, y[i, j] is the ith component of L^-1 (X_j - \mu). The einsum call then does
d_j = \sum_i y_{ij} y_{ij}
= \sum_i y_{ij}^2
= || y_j ||^2,
like we need.
Unfortunately, solve_triangular won't vectorize across its first argument, so you should probably just loop there. If k is only about 100, that's not going to be a significant issue.
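A rough sketch of that loop over the k covariance matrices might look like the following. The function name and argument layout are assumptions for illustration. To keep the sketch NumPy-only, it swaps in the general solver np.linalg.solve; scipy.linalg.solve_triangular with lower=True would be the drop-in that actually exploits the triangular structure of L.

```python
import numpy as np

def mahalanobis2_all(X, U, S):
    """Squared Mahalanobis distances for every (point, distribution) pair.

    X: (m, n) points, U: (k, n) centers, S: (k, n, n) covariance matrices.
    Returns D of shape (m, k) with D[i, j] = M^2(X[i], U[j]; S[j]).
    """
    m, k = X.shape[0], U.shape[0]
    D = np.empty((m, k))
    for j in range(k):
        L = np.linalg.cholesky(S[j])          # S[j] = L @ L.T
        # General solve used here as a stand-in for a triangular solve.
        y = np.linalg.solve(L, (X - U[j]).T)  # shape (n, m)
        D[:, j] = np.einsum('ij,ij->j', y, y) # squared column norms
    return D
```

Only the loop over k runs in Python; the work over the m points in each iteration stays inside NumPy.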
If you are actually given S^-1 rather than S, then you can indeed do this with einsum more directly. Since S is quite small in your case, it's also possible that actually inverting the matrix and then doing this would be faster. As soon as n is a nontrivial size, though, you're throwing away a lot of numerical accuracy by doing this.
To figure out what to do with einsum, write everything in terms of components. I'll go straight to the bonus case, writing S_j^-1 = T_j for notational convenience:
D_{ij} = M^2(x_i, u_j; S_j)
= (x_i - u_j)^T T_j (x_i - u_j)
= \sum_k (x_i - u_j)_k ( T_j (x_i - u_j) )_k
= \sum_k (x_i - u_j)_k \sum_l (T_j)_{k l} (x_i - u_j)_l
= \sum_{k l} (X_{i k} - U_{j k}) (T_j)_{k l} (X_{i l} - U_{j l})
So, if we make arrays X of shape (m, n), U of shape (k, n), and T of shape (k, n, n), then we can write this as
diff = X[np.newaxis, :, :] - U[:, np.newaxis, :]
D = np.einsum('jik,jkl,jil->ij', diff, T, diff)
where diff[j, i, k] = X[i, k] - U[j, k].
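To see that the einsum expression really matches the double loop from the question, here is a small self-contained comparison. The function names are hypothetical, and T is assumed to already hold the stacked inverse covariance matrices. Note that diff is a full (k, m, n) array, so memory use grows with k * m * n.

```python
import numpy as np

def mahalanobis2_einsum(X, U, T):
    """X: (m, n), U: (k, n), T: (k, n, n) inverse covariances -> D: (m, k)."""
    diff = X[np.newaxis, :, :] - U[:, np.newaxis, :]    # shape (k, m, n)
    return np.einsum('jik,jkl,jil->ij', diff, T, diff)

def mahalanobis2_loops(X, U, T):
    """Reference implementation with explicit Python loops."""
    m, k = X.shape[0], U.shape[0]
    D = np.zeros((m, k))
    for j in range(k):
        for i in range(m):
            diff = X[i, :] - U[j, :]
            D[i, j] = diff.dot(T[j]).dot(diff)
    return D
```

On small random inputs the two functions should agree up to floating-point rounding, which np.allclose can confirm.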
Dougal nailed this one with an excellent and detailed answer, but thought I'd share a small modification that I found increases efficiency in case anyone else is trying to implement this. Straight to the point:
Dougal's method was as follows:
def mahalanobis2(X, mu, sigma):
    L = np.linalg.cholesky(sigma)
    y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis,:]).T, lower=True)
    return np.einsum('ij,ij->j', y, y)
A mathematically equivalent variant I tried is
def mahalanobis2_2(X, mu, sigma):
    # Cholesky decomposition of inverse of covariance matrix
    # (Doing this in either order should be equivalent)
    linv = np.linalg.cholesky(np.linalg.inv(sigma))
    # Just do regular matrix multiplication with this matrix
    y = (X - mu[np.newaxis,:]).dot(linv)
    # Same as above, but note different index at end because the matrix
    # y is transposed here compared to above
    return np.einsum('ij,ij->i', y, y)
Ran both versions head-to-head 20x using identical random inputs and recorded the times (in milliseconds). For X as a 1,000,000 x 3 matrix (mu and sigma 3 and 3x3) I get:
Method 1 (min/max/avg): 30/62/49
Method 2 (min/max/avg): 30/47/37
That's about a 30% speedup for the 2nd version. I'm mostly going to be running this in 3 or 4 dimensions but to see how it scaled I tried X as 1,000,000 x 100 and got:
Method 1 (min/max/avg): 970/1134/1043
Method 2 (min/max/avg): 776/907/837
which is about the same improvement.
I mentioned this in a comment on Dougal's answer but adding here for additional visibility:
The first pair of methods above take a single center point mu and covariance matrix sigma and calculate the squared Mahalanobis distance to each row of X. My bonus question was to do this multiple times with many sets of mu and sigma and output a two-dimensional matrix. The set of methods above can be used to accomplish this with a simple for loop, but Dougal also posted a more clever example using einsum.
I decided to compare these methods with each other by using them to solve the following problem: Given k d-dimensional normal distributions (with centers stored in rows of k by d matrix U and covariance matrices in the last two dimensions of the k by d by d array S), find the density at the n points stored in rows of the n by d matrix X.
The density of a multivariate normal distribution is a function of the squared Mahalanobis distance of the point to the mean. Scipy has an implementation of this as scipy.stats.multivariate_normal.pdf to use as a reference. I ran all three methods against each other 10x using identical random parameters each time, with d=3, k=96, n=5e5. Here are the results, in points/sec:
[Method]: (min/max/avg)
Scipy: 1.18e5/1.29e5/1.22e5
Fancy 1: 1.41e5/1.53e5/1.48e5
Fancy 2: 8.69e4/9.73e4/9.03e4
Fancy 2 (cheating version): 8.61e4/9.88e4/9.04e4
where Fancy 1 is the better of the two methods above and Fancy 2 is Dougal's 2nd solution. Since Fancy 2 needs to calculate the inverses of all the covariance matrices, I also tried a "cheating version" where it was passed these as a parameter, but it looks like that didn't make a difference. I had planned on including the non-vectorized implementation, but that was so slow it would have taken all day.
What we can take away from this is that using Dougal's first method is about 20% faster than however Scipy does it. Unfortunately despite its cleverness the 2nd method is only about 60% as fast as the first. There are probably some other optimizations that can be done but this is already fast enough for me.
I also tested how this scaled with higher dimensionality. With d=100, k=96, n=1e4:
Scipy: 7.81e3/7.91e3/7.86e3
Fancy 1: 1.03e4/1.15e4/1.08e4
Fancy 2: 3.75e3/4.10e3/3.95e3
Fancy 2 (cheating version): 3.58e3/4.09e3/3.85e3
Fancy 1 seems to have an even bigger advantage this time. Also worth noting that Scipy threw a LinAlgError 8/10 times, probably because some of my randomly-generated 100x100 covariance matrices were close to singular (which may mean that the other two methods are not as numerically stable, I did not actually check the results).
I have a problem with what I guess is a rounding error with floating-point numbers in OpenEdge ABL / Progress 4GL:
display truncate(log(4) / log(2) , 0) .
This returns 1.0 but should give me 2.0.
If I use this pseudo-solution, it gives me the right answer in most cases, which hints at floating-point error.
display truncate(log(4) / log(2) + 0.00000001, 0) .
What I am after is this
find the largest x where
p^x < n, p is prime, n and x are natural numbers.
=>
x = log(n) / log(p)
Any takes on this one?
No numerical arithmetic system is exact. The natural logarithms of 4 and 2 cannot be represented exactly. Since the log function can only return a representable value, it returns an approximation of the exact mathematical result.
Sometimes this approximation will be slightly higher than the mathematical result. Sometimes it will be slightly lower. Therefore, you cannot generally expect that log(x*x) will be exactly twice log(x).
Ideally, a high-quality log implementation would return the representable value that is closest to the exact mathematical value. (This is called a “correctly rounded” result.) In that case, and if you are using binary floating-point (which is common), then log(4) would always be exactly twice log(2). Since this does not happen for you, it seems the log implementation you are using does not provide correctly rounded results.
However, for this problem, you also need log(8) to be exactly three times log(2), and so on for additional powers. Even if the log implementation did return correctly rounded results, this would not necessarily be true for all the values you need. For some y = x^5, log(y) might not be exactly five times log(x), because rounding log(y) to the closest representable value might round down while rounding log(x) rounds up, just because of where the exact values happen to lie relative to the nearest representable values.
Therefore, you cannot rely on even a best-possible log implementation to tell you exactly how many powers of x divide some number y. You can get close, and then you can test the result by confirming or denying it with integer arithmetic. There are likely other approaches depending upon the needs specific to your situation.
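The "get close, then confirm with integer arithmetic" idea can be sketched as follows. This is Python rather than ABL, purely as an illustration; the function name is hypothetical, and it assumes p >= 2 and n >= 2 and allows x = 0. The strict inequality p^x < n from the question is kept.

```python
import math

def largest_exponent(p: int, n: int) -> int:
    """Largest integer x >= 0 with p**x < n (strict), for p >= 2 and n >= 2."""
    if p < 2 or n < 2:
        raise ValueError("this sketch expects p >= 2 and n >= 2")
    # Floating-point estimate; may be off by one in either direction.
    x = max(int(math.log(n) / math.log(p)), 0)
    # Confirm or correct the estimate with exact integer arithmetic.
    while p ** x >= n:
        x -= 1
    while p ** (x + 1) < n:
        x += 1
    return x
```

With p = 2 and n = 4 this returns 1, because 2^2 is not strictly less than 4.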
I think you want:
/* find the largest x where p^x < n, p is prime, n and x is natural numbers.
*/
define variable p as integer no-undo format ">,>>>,>>>,>>9".
define variable x as integer no-undo format ">>9".
define variable n as integer no-undo format ">,>>>,>>>,>>9".
define variable i as integer no-undo format "->>9".
define variable z as decimal no-undo format ">>9.9999999999".
update p n with side-labels.
/* approximate x
*/
z = log( n ) / log( p ).
display z.
x = integer( truncate( z, 0 )). /* estimate x */
/* is p^x < n ?
*/
if exp( p, x ) >= n then
do while exp( p, x ) >= n: /* was the estimate too high? */
assign
i = i - 1
x = x - 1
.
end.
else
do while exp( p, x + 1 ) < n: /* was the estimate too low? */
assign
i = i + 1
x = x + 1
.
end.
display
x skip
exp( p, x ) label "p^x" format ">,>>>,>>>,>>9" skip
i skip
log( n ) skip
log( p ) skip
z skip
with
side-labels
.
The root of the problem is that the log function, susceptible to floating point truncation error, is being used to address a question in the realm of natural numbers. First, I should point out that actually, in the example given, 1 really is the correct answer. We are looking for the largest x such that p^x < n; not p^x <= n. 2^1 < 4, but 2^2 is not. That said, we still have a problem, because when p^x = n for some x, log(n) divided by log(p) could probably just as well land slightly above the whole number rather than below, unless there is some systemic bias in the implementation of the log function. So in this case where there is some x for which p^x=n, we actually want to be sure to round down to the next lower whole value for x.
So even a solution like this will not correct this problem:
display truncate(round(log(4) / log(2), 10) , 0) .
I see two ways to deal with this. One is similar to what you already tried, except that because we actually want to round down to the next lower natural number, we would subtract rather than add:
display truncate(log(4) / log(2) - 0.00000001, 0) .
This will work as long as n is less than 10^16, but a tidier solution would be to settle the boundary conditions with actual integer math. Of course, this will fail too if you get to numbers that are higher than the maximum integer value. But if this is not a concern, you can just use your first solution to get the approximate solution:
display truncate(log(4) / log(2) , 0) .
And then test whether the result works in the equation p^x < n. If it isn't less than n, subtract one and try again.
On a side note, by the way, the definition of natural numbers does not include zero, so if the lowest possible value for x is 1, then the lowest possible value for p^x is p, so if n is less than or equal to p, there is no natural number solution.
Most calculators can not calculate sqrt{2}*sqrt{2} either. The problem is that we usually do not have that many decimals.
Workaround: avoid TRUNCATE and use ROUND instead, like
ROUND(log(4) / log(2), 0).
ROUND(a, b) rounds the decimal a to the closest number having b decimals.
For an ocean shader, I need a fast function that computes a very approximate value for sin(x). The only requirements are that it is periodic, and roughly resembles a sine wave.
The Taylor series of sin is too slow, since I'd need to compute up to the 9th power of x just to get a full period.
Any suggestions?
EDIT: Sorry I didn't mention, I can't use a lookup table since this is on the vertex shader. A lookup table would involve a texture sample, which on the vertex shader is slower than the built in sin function.
It doesn't have to be in any way accurate, it just has to look nice.
Use a Chebyshev approximation for as many terms as you need. This is particularly easy if your input angles are constrained to be well behaved (-π .. +π or 0 .. 2π) so you do not have to reduce the argument to a sensible value first. You might use 2 or 3 terms instead of 9.
You can make a look-up table of sin values at a set of sample points and use linear interpolation between those values.
A rational algebraic function approximation to sin(x), valid from zero to π/2 is:
f = (C1 * x) / (C2 * x^2 + 1.)
with the constants:
c1 = 1.043406062
c2 = .2508691922
These constants were found by least-squares curve fitting. (Using subroutine DHFTI, by Lawson & Hanson).
If the input is outside [0, 2π], you'll need to take x mod 2 π.
To handle negative numbers, you'll need to write something like:
t = MOD(t, twopi)
IF (t < 0.) t = t + twopi
Then, to extend the range to 0 to 2π, reduce the input with something like:
IF (t < pi) THEN
  IF (t < pi/2) THEN
    x = t
  ELSE
    x = pi - t
  END IF
ELSE
  IF (t < 1.5 * pi) THEN
    x = t - pi
  ELSE
    x = twopi - t
  END IF
END IF
Then calculate:
f = (C1 * x) / (C2 * x*x + 1.0)
IF (t > pi) f = -f
The results should be within about 5% of the real sine.
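For readers who want to try the rational approximation outside a shader, here is a direct Python transcription of the steps above. It is only a sketch: the function name is hypothetical, and the constants are the ones quoted in this answer. Python's % operator already returns a non-negative result for a positive modulus, so the separate fix-up for negative inputs is not needed here.

```python
import math

C1 = 1.043406062
C2 = 0.2508691922

def approx_sin(angle: float) -> float:
    """Rational approximation of sin(angle) using the constants above."""
    t = angle % (2.0 * math.pi)           # reduce to [0, 2*pi)
    # Fold t into the first quadrant, where the fit is valid.
    if t < math.pi:
        x = t if t < math.pi / 2 else math.pi - t
    else:
        x = t - math.pi if t < 1.5 * math.pi else 2.0 * math.pi - t
    f = (C1 * x) / (C2 * x * x + 1.0)
    return -f if t > math.pi else f       # second half-period is negative
```

Comparing approx_sin against math.sin over a few periods is a quick way to check the accuracy claim above.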
Well, you don't say how accurate you need it to be. The sine can be approximated by straight lines of slopes 2/pi, -2/pi, and 2/pi on the intervals [0, pi/2], [pi/2, 3*pi/2], [3*pi/2, 2*pi]. This approximation can be had for the cost of a multiplication and an addition after reducing the angle mod 2*pi.
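A minimal Python sketch of that piecewise-linear (triangle wave) approximation, with a hypothetical function name, could look like this. Each branch is one multiplication and one addition once the angle has been reduced mod 2*pi.

```python
import math

def triangle_sin(angle: float) -> float:
    """Triangle-wave stand-in for sin(angle) with slopes +-2/pi."""
    t = angle % (2.0 * math.pi)          # reduce to [0, 2*pi)
    slope = 2.0 / math.pi
    if t < math.pi / 2:
        return slope * t                 # rising from 0 to 1
    if t < 1.5 * math.pi:
        return 2.0 - slope * t           # falling from 1 to -1
    return slope * t - 4.0               # rising from -1 back to 0
```

The wave matches the sine exactly at multiples of pi/2 and deviates most in between, so it only suits cases where a rough, periodic shape is enough.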
Using a lookup table is probably the best way to control the tradeoff between speed and accuracy.