I'm writing a vertex shader at the moment, and I need some random numbers. Vertex shader hardware doesn't have logical/bit operations, so I cannot implement any of the standard random number generators.
Is it possible to make a random number generator using only standard arithmetic? The randomness doesn't have to be particularly good!
If you don't mind crappy randomness, a classic method is
x[n+1] = (x[n] * x[n] + C) mod N
where C and N are constants, C != 0 and C != -2, and N is prime. This is a typical pseudorandom generator for Pollard rho factoring. Try C = 1 and N = 8051; those work OK.
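For illustration, a minimal sketch of that recurrence in Python (the seed is arbitrary; on shader hardware with no integer mod you would compute x - floor(x/N)*N instead):

def next_state(x, C=1, N=8051):
    # x[n+1] = (x[n]*x[n] + C) mod N -- only multiply, add and mod needed
    return (x * x + C) % N

x = 1234  # arbitrary seed
for _ in range(5):
    x = next_state(x)
    print(x)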
Vertex shaders sometimes have built-in noise generators for you to use, such as Cg's noise() function.
Use a linear congruential generator:
X_(n+1) = (a * X_n + c) mod m
Those aren't that strong, but at least they are well known and can have long periods. The Wikipedia page also has good recommendations:
The period of a general LCG is at most m, and for some choices of a much less than that. The LCG will have a full period if and only if:
1. c and m are relatively prime,
2. a - 1 is divisible by all prime factors of m,
3. a - 1 is a multiple of 4 if m is a multiple of 4
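For concreteness, a small Python sketch of an LCG whose parameters satisfy all three quoted conditions (m = 2^16, a = 4093, c = 1 are illustrative choices: gcd(1, m) = 1; a - 1 = 4092 is divisible by 2, the only prime factor of m; and 4092 is a multiple of 4):

def lcg(x, a=4093, c=1, m=2**16):
    return (a * x + c) % m

x, seen = 0, set()
while x not in seen:
    seen.add(x)
    x = lcg(x)
print(len(seen))  # 65536: the full period, as the conditions guarantee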
Believe it or not, I used newx = oldx * 5 + 1 (or a slight variation of it) in several videogames. The randomness is horrible--it's more of a scrambled sequence than a random generator. But sometimes that's all you need. If I recall correctly, it goes through all numbers before it repeats.
It has some terrible characteristics. It doesn't ever give you the same number twice in a row. A few of us did a bunch of tests on variations of it and we used some variations in other games.
We used it when there was no good modulo available to us. It's just a shift by two and two adds (or a multiply by 5 and one add). I would never use it nowadays for random numbers--I'd use an LCG--but maybe it would work OK for a shader where speed is crucial and your instruction set may be limited.
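A sketch of that generator in Python, assuming 16-bit wraparound (my guess at the original context; the explicit % 2**16 stands in for the overflow the hardware gave for free):

def scramble(x):
    return (x * 5 + 1) % 2**16  # equivalently ((x << 2) + x + 1) & 0xFFFF

x, steps = 0, 0
while True:
    x = scramble(x)
    steps += 1
    if x == 0:
        break
print(steps)  # 65536: it does go through all numbers before it repeats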
I'm trying to solve this in LPSolve IDE:
/* Objective function */
min: x + y;
/* Constraints */
r_1: 2x = 2y;
r_2: x + y = 1.11 x y;
r_3: x >= 1;
r_4: y >= 1;
but the response I get is:
Model name: 'LPSolver' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 2 variables, 5 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
The model is INFEASIBLE
lp_solve unsuccessful after 2 iter and a last best value of 1e+030
How come this can happen when x=1.801801802 and y=1.801801802 are possible solutions here?
How To Find The Solution
Let's do some math.
Your problem is:
min x+y
s.t. 2x = 2y
x + y = 1.11 x y
x >= 1
y >= 1
The first constraint 2x = 2y can be simplified to x=y. We now substitute throughout the problem:
min 2*x
s.t. 2*x = 1.11 x^2
x >= 1
And rearrange:
min 2*x
s.t. 1.11 x^2-2*x=0
x >= 1
From geometry we know that 1.11 x^2 - 2x is an upward-opening parabola whose minimum is below zero. Therefore it has exactly two roots, given by the quadratic formula: 200/111 and 0.
Only one of these satisfies the constraint x >= 1: namely 200/111.
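A quick numerical check of that root (200/111 ≈ 1.801801802, matching the values in the question):

x = y = 200 / 111
print(x + y)              # 3.6036036...
print(1.11 * x * y)       # 3.6036036... -- the constraint x + y = 1.11*x*y holds
print(x >= 1 and y >= 1)  # True -- the bounds hold too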
Why Can't I Find This Solution With My Solver?
The easy way out is to say it's because the x^2 term (x*y before the substitution) is nonlinear. But it goes a little deeper than that. Nonlinear problems can be easy to solve as long as they are convex. A convex problem is one whose constraints form a single, contiguous space such that any line drawn between two points in the space stays within the boundaries of the space.
Your problem is not convex. The constraint 1.11 x^2-2*x=0 defines an infinite number of points. No two of these points can be connected by a straight line which stays in the space defined by the constraint because that space is curved. If the constraint were instead 1.11 x^2-2*x<=0 then the space would be convex because all points could be connected with straight lines that stay in its interior.
Nonconvex problems are part of a broader class of problems called NP-hard. This means that there is not (and perhaps cannot be) any easy way of solving them. We have to be smart.
Solvers that can handle mixed-integer programming (MIP/MILP) can solve many non-convex problems efficiently, as can other techniques such as genetic algorithms. But, beneath the hood, these techniques all rely on glorified guess-and-check.
So your solver fails because the problem is nonconvex, and your solver is neither smart enough to use MIP to guess-and-check its way to a solution nor smart enough to use the quadratic formula.
How Then Can I Solve The Problem?
In this particular instance, we are able to use mathematics to quickly find a solution because, although the problem is nonconvex, it is part of a class of special cases. Deep thinking by mathematicians has given us a simple way of handling this class.
But consider a few generalizations of the problem:
(a) a x^3+b x^2+c x+d=0
(b) a x^4+b x^3+c x^2+d x+e =0
(c) a x^5+b x^4+c x^3+d x^2+e x+f=0
(a) has three potential solutions which must be checked (exact solutions are tricky), (b) has four (trickier), and (c) has five. The formulas for (a) and (b) are much more complex than the quadratic formula, and mathematicians have shown (the Abel–Ruffini theorem) that there is no formula for (c) that can be expressed using "elementary operations". Instead, we have to resort to glorified guess-and-check.
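To make "glorified guess-and-check" concrete: numerical root finders iterate toward the roots rather than evaluating a closed-form formula. A sketch with NumPy, using made-up coefficients for case (c):

import numpy as np

# np.roots finds roots numerically, via the eigenvalues of the
# companion matrix -- iteration, not a closed-form solution.
coeffs = [1, -3, 2, 5, -7, 1]  # a, b, c, d, e, f -- arbitrary example values
print(np.roots(coeffs))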
So the techniques we used to solve your problem don't generalize very well. This is what it means to live in the realm of the nonconvex and NP-hard, and it's a good reason to fund research in mathematics, computer science, and related fields.
I was going through a question which asks to calculate gcd(a-b, a^n+b^n) % (10^9+7), where a, b and n can be as large as 10^12.
I am able to solve this for very small a, b and n, and Fermat's little theorem didn't seem to help either. I reached the conclusion that if a and b are coprime then this always gives me a gcd of 2, but for the rest I am not able to get it.
I just need a little hint about what I am doing wrong when computing the gcd for large numbers. I also tried computing x^y by taking the modulus at each step, but that didn't work either.
I just need a direction and I will make my way.
Thanks in advance.
You are correct that a^n + b^n is too large to compute, and that working mod 10^9 + 7 at each step doesn't provide a way to compute the answer. But you can still use modular exponentiation by squaring with a different modulus, namely a-b.
Key observations:
1) gcd(a-b,a^n + b^n) = gcd(d,a^n + b^n) where d = abs(a-b)
2) gcd(d,a^n + b^n) = gcd(d,r) where r = (a^n + b^n) % d
3) r can be feasibly computed with modular exponentiation by squaring
The point of 1) is that different programming languages have different conventions for handling negative numbers in the mod operator. Taking the absolute value avoids such complications, though mathematically it doesn't make a difference. The key idea is that it is perfectly feasible to do the first step of the Euclidean algorithm for computing gcds. All you need is the remainder upon division of the larger by the smaller of the two numbers. After the first step is done, all of the numbers are in the feasible range.
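A sketch of those three observations in Python (the function name is mine; Python's three-argument pow does modular exponentiation by squaring):

from math import gcd

def solve(a, b, n, MOD=10**9 + 7):
    d = abs(a - b)                         # observation 1
    if d == 0:
        # a == b: gcd(0, a^n + b^n) = a^n + b^n = 2 * a^n
        return 2 * pow(a, n, MOD) % MOD
    r = (pow(a, n, d) + pow(b, n, d)) % d  # observations 2 and 3
    return gcd(d, r) % MOD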
I make extensive use of Julia's linear equation solver res = X\b. I have to use it millions of times in my program because of parameter variation. This was working fine when I was using small dimensions (up to 30), but now that I want to analyse bigger systems, up to 1000, the linear solver is no longer efficient.
I think there may be a workaround. However, I must say that sometimes my X matrix is dense and sometimes it is sparse, so I need something that works well for both cases.
The b vector is a vector with all zeroes, except for one entry which is always 1 (actually it is always the last entry). Moreover, I don't need all the res vector, just the first entry of it.
If your problem is of the form (A - µI)x = b, where µ is a variable parameter and A, b are fixed, you might work with diagonalization.
Let A = PDP° where P° denotes the inverse of P. Then (PDP° - µI)x = b can be transformed to
(D - µI)P°x = P°b,
P°x = P°b / (D - µI),
x = P(P°b / (D - µI)).
(the / operation denotes the division of the respective vector elements by the scalars D_r - µ, where D_r is the r-th diagonal entry of D.)
After you have diagonalized A, computing a solution for any µ reduces to two matrix/vector products, or a single one if you can also precompute P°b.
Numerical instability will show up in the vicinity of the eigenvalues of A.
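A NumPy sketch of this approach (sizes and matrices are made up; for a general nonsymmetric A the eigendecomposition is complex, so a tiny imaginary residue is dropped at the end):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n))  # stands in for the real fixed matrix
b = np.zeros(n); b[-1] = 1.0     # as in the question: all zeros, last entry 1

D, P = np.linalg.eig(A)          # A = P diag(D) P°, done once: O(n^3)
Pb = np.linalg.solve(P, b)       # precompute P°b

def solve_for_mu(mu):
    return (P @ (Pb / (D - mu)))[0].real  # O(n^2) per µ; first entry only

print(solve_for_mu(0.5))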
Usually when people talk about speeding up linear solvers res = X \ b, it’s for multiple bs. But since your b isn’t changing, and you just keep changing X, none of those tricks apply.
The only way to speed this up, from a mathematical perspective, seems to be to ensure that Julia is picking the fastest solver for X \ b, i.e., if you know X is positive-definite, use Cholesky, etc. Matlab’s flowcharts for how it picks the solver to use for X \ b, for dense and sparse X, are available—most likely Julia implements something close to these flowcharts too, but again, maybe you can find some way to simplify or shortcut it.
All programming-related speedups (multiple threads—while each individual solver is probably already multi-threaded, it may be worth running multiple solvers in parallel when each solver uses fewer threads than cores; #simd if you’re willing to dive into the solvers themselves; OpenCL/CUDA libraries; etc.) then can be applied.
Best approach for efficiency would be to use: JuliaMath/IterativeSolvers.jl. For A * x = b problems, I would recommend x = lsmr(A, b).
A second-best alternative would be to give a bit more information to the compiler: instead of x = inv(A'A) * A' * b, do x = inv(cholfact(A'A)) * A' * b if Cholesky decomposition works for you. Otherwise, you could try U, S, V = svd(A) and x = V * diagm(1 ./ S) * U' * b.
Unsure if x = pinv(A) * b is optimized, but might be slightly more efficient than x = A \ b.
This is a very general question regarding the maximum size of a set of linear equations to be solved by today's fastest hardware, in the form:
X = AX + B
A: NxN matrix of floats, it is sparse.
B: N-vector of floats.
solve for X.
This becomes (I-A)X = B, which is best solved using factorisation (and not matrix inversion), as I read here:
http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/
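For example, something like this SciPy sketch (sizes, density and the random A are my stand-ins) is the kind of approach I have in mind:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

N = 10_000                                        # toy size; I need N much larger
A = sp.random(N, N, density=1e-4, format="csc", random_state=0)
B = np.ones(N)

lu = spla.splu(sp.identity(N, format="csc") - A)  # factorise I - A once
X = lu.solve(B)                                   # reusable for many B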
Do you know yourselves, or have a reference to, a benchmark or paper which gives some maximum value for N on today's fastest hardware? Most benchmarks I have seen use N < 10,000. I am thinking about N > 10x10^6 or more, to be processed within a month.
Please consider not only the computational dimension but also storage for A. It can be a problem:
e.g. assuming N = 1x10^6, storage would be 1x10^12 entries x 4 bytes ≈ 4 terabytes for a totally dense matrix, which is just about manageable I guess.
Lastly, can the method to solve the system be parallelised so that I can make the assumption that with parallelisation N can be pretty large?
thanks in advance,
bliako
I have to write code to evaluate the value of the following sequence:
( pow(1,k) + pow(2,k) + ... + pow(n,k) ) % MOD
for given values of n, k and MOD.
I have tried searching for it on the internet. I found an equation, but it contains zeta functions and seems difficult to implement. I want a simpler approach for implementing it. Note that the value of n is large, so we cannot simply use brute force to pass the time limit.
Newton's identities might be of help. Calculate the coefficients of the polynomial with 1..n as roots. That's pretty trivial. Then use the identities.
It's just the first thing that comes to mind when I see sums of powers.
I think it is nicely compatible with modular arithmetic - there are only multiplications and additions.
I must admit that Newton's identities are only a rearrangement of the terms, so there is not much speed gain here.
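For concreteness, a small sketch of the idea (my own code; as said above, building the coefficients is O(n*k) work, so this is a rearrangement rather than a speed-up):

def power_sum_newton(n, k, MOD):
    # e[i] = i-th elementary symmetric polynomial of the roots 1..n (mod MOD),
    # built by expanding (x + 1)(x + 2)...(x + n) and keeping k + 1 coefficients.
    e = [1] + [0] * k
    for j in range(1, n + 1):
        for i in range(min(j, k), 0, -1):
            e[i] = (e[i] + j * e[i - 1]) % MOD
    # Newton's identities: p_m = e_1*p_(m-1) - e_2*p_(m-2) + ... + (-1)^(m-1)*m*e_m
    p = [0] * (k + 1)
    for m in range(1, k + 1):
        s = sum((-1) ** (i - 1) * e[i] * p[m - i] for i in range(1, m))
        p[m] = (s + (-1) ** (m - 1) * m * e[m]) % MOD
    return p[k]

print(power_sum_newton(10, 3, 10**9 + 7))  # 3025 = (1 + 2 + ... + 10)^2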
JUST USE PYTHON
k = int(input("Enter value for K: "))
n = int(input("Enter value for N: "))
mod = int(input("Enter value for MOD: "))

total = 0
for i in range(1, n + 1):
    total += pow(i, k, mod)  # three-argument pow keeps intermediate values small
result = total % mod
print(result)
Maybe this code is going to help.
I agree that math.stackexchange.com is a better bet.
But here are random facts that, depending on parameters, may make the problem more manageable.
First, factor MOD, solve for each prime power factor, then use the Chinese Remainder Theorem to find the answer for MOD. Thus without loss of generality, you may assume that MOD is a prime power.
Next, note that j^k mod MOD depends only on j mod MOD, so the sum of j^k over any MOD consecutive integers is the same constant S mod MOD. The total is then floor(n/MOD)*S plus the sum of the first n mod MOD terms, so you can replace n by n mod MOD once you account for the complete periods (see the sketch at the end of this answer).
Next, if MOD = p^i and j is not divisible by p, then j^((p-1) * p^(i-1)) is 1 mod MOD, so we can reduce the size of k.
Of course if k and n are both smaller than MOD and MOD is prime, this will not help you at all. (Which, depending on how this problem arises, may well be the case.)
(If k is small enough, there are explicit formulas that you can produce for the sum. But it seems that for you k can be large enough to make that approach intractable.)
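A sketch of the reduction of n described above (my own code; it runs in O(MOD) time, so it only helps when MOD is small relative to n):

def sum_powers_mod(n, k, MOD):
    # j^k mod MOD depends only on j mod MOD, so any MOD consecutive
    # integers contribute the same constant S to the sum.
    S = sum(pow(j, k, MOD) for j in range(1, MOD + 1)) % MOD
    tail = sum(pow(j, k, MOD) for j in range(1, n % MOD + 1)) % MOD
    return ((n // MOD) % MOD * S + tail) % MOD

print(sum_powers_mod(5, 2, 3))  # 1, matching (1 + 4 + 9 + 16 + 25) % 3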