[warning: biologist asking a math question]
In a linear dynamical system (LDS), what feature of the matrix controls the speed of the trajectory in state space?
Say I have a matrix M describing how the LDS evolves per discrete time unit t. The state after 10 time steps is given by M^10 applied to the initial state, and I'll call it the final state.
For the same initial condition, how should I modify M so that the system reaches the final state in arbitrarily fewer (or more) time steps? Is it trivial?
thanks,
I've fitted an HMM to my data using the hmm.discnp package in R as follows:
library(hmm.discnp)
zs <- hmm(y=lis,K=5)
Now I want to predict the next k observations (emissions) from this model. But I am only able to get the most probable state sequence for the observations that I already have, through the Viterbi algorithm.
I have t emissions already, i.e. (y(1),...,y(t)).
I want the most probable future k emissions from the fitted HMM object, i.e. (y(t+1),...,y(t+k)).
Is there a function to calculate this? If not, how do I calculate it manually?
Generating emissions from an HMM is pretty straightforward to do manually. I am not really familiar with R, but I explain here the steps to generate data as you ask.
The first thing to keep in mind is that, by its Markovian nature, the HMM has no memory. At any time, only the current state matters; what happened before is "forgotten". This means that the state at time t+1 depends only on the state at time t, and the observation at time t+1 depends only on the state at time t+1.
If you have a sequence, the first thing you can do is to find the most probable state sequence (with the Viterbi algorithm) as you did. Now you know the state that generated the last observation that you have (the one that you denote y(t)).
From this state, the transition matrix gives you the probabilities of moving to each state of the model. This is a probability mass function (pmf) and you can draw a state number from this pmf (not by hand! R should have a built-in function to draw a sample from a pmf). The state number you draw is the state in which your system is at time t+1.
With this information, you can now draw a sample observation from the probability function that is assigned to this new state (same here: if it is a Gaussian distribution, use a Gaussian random generator, which should exist in R).
From this state at time t+1, you can now apply the same procedure to reach a state at time t+2 and so on.
Keep in mind that if you run this full procedure several times (to generate samples from time t+1 to t+k), you will get different results each time. This is due to the probabilistic nature of the model. I am not sure what you mean by most probable future emissions, and I do not know whether a routine exists for that. You can compute the likelihood of the full sequence you obtain at the end (from 1 to t+k). With discrete emissions, as here, it cannot exceed the likelihood of the sequence up to t, because every extra step multiplies in probabilities of at most 1. The generated part was drawn from the model itself, so on a per-observation basis (for example the average log-likelihood per step) it should typically fit at least as well as the real data.
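The generation steps described above can be sketched in a few lines. This is only an illustration in Python, not the hmm.discnp API: the function name sample_future, the transition matrix trans and the emission matrix emis are all made-up assumptions, and in practice you would plug in the matrices estimated by your fitted model and the last state from Viterbi.

```python
import random

def sample_future(trans, emis, last_state, k, rng):
    """Draw k future (state, emission) pairs, starting from the state at time t."""
    states, emissions = [], []
    state = last_state
    for _ in range(k):
        # Next state: one draw from the transition pmf (row) of the current state.
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
        # Emission: one draw from the emission pmf attached to the new state.
        symbol = rng.choices(range(len(emis[state])), weights=emis[state])[0]
        states.append(state)
        emissions.append(symbol)
    return states, emissions

# Hypothetical 2-state, 3-symbol model (not fitted to any real data).
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
states, emissions = sample_future(trans, emis, last_state=0, k=5, rng=random.Random(1))
```

Each call with a different random generator state gives a different continuation, which is the run-to-run variation mentioned above.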
Are there functions which produce "infinite" amounts of high entropy data? Moreover, do functions exist which produce the same random data (sequentially) time after time?
I kind of know that they exist, but do they have a specific name?
Use case examples:
Using the function to generate 100 bits of random data. (Great!) But while maintaining high values of entropy.
Using the same function to generate 10000 bits of random data. (The first 100 bits generated are the same as the 100 bits of random data generated before). And while still maintaining high values of entropy
Further, how would I go about building these functions myself?
You are most likely looking for Pseudo-Random Number Generators.
They are initialized by a seed, so they take in only a finite amount of entropy.
Good generators produce output that looks high-entropy, provided you judge it only from the output (that is, you ignore the seed and the algorithm used to generate the numbers; if you know both, the entropy is obviously 0).
Most PRNG algorithms produce sequences which are uniformly distributed by any of several tests. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence without knowing the algorithm(s) used and the state with which it was initialized.
All PRNGs have a period, after which a generated sequence will restart.
The period of a PRNG is defined thus: the maximum, over all starting states, of the length of the repetition-free prefix of the sequence. The period is bounded by the number of the states, usually measured in bits. However, since the length of the period potentially doubles with each bit of "state" added, it is easy to build PRNGs with periods long enough for many practical applications.
Thus, to have two sequences of different lengths where one is the prefix of the other, you just have to run a PRNG with the same seed both times.
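As a small illustration of the prefix property, here is a sketch using Python's standard random module, whose generator is a Mersenne Twister. The helper name random_bits and the seed value are arbitrary choices for the example.

```python
import random

def random_bits(seed, n):
    """Return the first n bits produced by a generator started from the given seed."""
    rng = random.Random(seed)  # same seed -> same internal state -> same sequence
    return [rng.getrandbits(1) for _ in range(n)]

short = random_bits(seed=2024, n=100)
long = random_bits(seed=2024, n=10000)

# The 100-bit sequence is exactly the start of the 10000-bit sequence.
assert long[:100] == short
```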
Building a good one yourself would be pretty tricky, but a rather good and simple one is the Mersenne Twister, which dates back only to 1998 and is defined in a paper by Matsumoto and Nishimura [1].
A trivial example would be a linear congruential generator.
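To show how little is needed for the trivial case, here is a minimal sketch of a linear congruential generator. The default constants are the ones commonly quoted from Numerical Recipes; the names lcg and take are just for this example. It is fine for seeing the mechanics, but the low-order bits of such a generator are weak and it should not be used for anything security-related.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of integers in [0, m) from x -> (a*x + c) mod m."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state

def take(gen, n):
    """Collect the next n values from a generator."""
    return [next(gen) for _ in range(n)]

first_5 = take(lcg(seed=0), 5)
first_50 = take(lcg(seed=0), 50)
assert first_50[:5] == first_5  # same seed, so the short run is a prefix of the long one
```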
[1] Matsumoto, M.; Nishimura, T. (1998). "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator". ACM Transactions on Modeling and Computer Simulation 8 (1): 3–30. doi:10.1145/272991.272995.
In all of the simple algorithms for path tracing using lots of Monte Carlo samples, the path-tracing part of the algorithm randomly chooses between returning the emitted value for the current surface and continuing by tracing another ray from that surface's hemisphere (for example in the slides here). Like so:
TracePath(p, d) returns (r,g,b) [and calls itself recursively]:
    Trace ray (p, d) to find nearest intersection p’
    Select with probability (say) 50%:
        Emitted:
            return 2 * (Le_red, Le_green, Le_blue) // 2 = 1/(50%)
        Reflected:
            generate ray in random direction d’
            return 2 * fr(d ->d’) * (n dot d’) * TracePath(p’, d’)
Is this just a way of using russian roulette to terminate a path while remaining unbiased? Surely it would make more sense to count the emissive and reflective properties for all ray paths together and use russian roulette just to decide whether to continue tracing or not.
And here's a follow up question: why do some of these algorithms I'm seeing (like in the book 'Physically Based Rendering Techniques') only compute emission once, instead of taking in to account all the emissive properties on an object? The rendering equation is basically
L_o = L_e + integral of (light exiting other surfaces in to the hemisphere of this surface)
which seems like it counts the emissive properties in both this L_o and the integral of all the other L_o's, so the algorithms should follow.
In reality, the single emission vs. reflection calculation is a bit too simplistic. To answer the first question: the coin flip is used to terminate the path, and the factor of 2 keeps the estimate unbiased, but it leads to much greater variance (noise). The second question is a bit more complex...
In the abstract of Shirley, Wang and Zimmerman TOG 94, the authors briefly summarize the benefits and complexities of Monte Carlo sampling:
In a distribution ray tracer, the crucial part of the direct lighting
calculation is the sampling strategy for shadow ray testing. Monte
Carlo integration with importance sampling is used to carry out this
calculation. Importance sampling involves the design of
integrand-specific probability density functions which are used to
generate sample points for the numerical quadrature. Probability
density functions are presented that aid in the direct lighting
calculation from luminaires of various simple shapes. A method for
defining a probability density function over a set of luminaires is
presented that allows the direct lighting calculation to be carried
out with one sample, regardless of the number of luminaires.
If we start dissecting that abstract, here are some of the important points:
Lights aren't points: in real life, we're almost never dealing with a point light source (e.g., a single LED).
Shadows are usually soft: this is a consequence of the non-point lights. It's very rare to see a truly hard-edged shadow in real life.
Noise (especially bright sampling artifacts) is disproportionately distracting: humans have a lot of intuition about how things should look. Look at slide 5 (the glass sphere on a table) in the OP's linked presentation. Note the bright specks in the shadow.
When rendering for more visual realism, both of the sets of reflected visibility rays and lighting calculation rays must be sampled and weighted according to the surface's bidirectional reflectance distribution function.
Note that this is a guided sampling method that's distinctly different from the original question's "generate ray in random direction" method in that it is both:
More accurate: the images in the linked PDF suffer a bit from the PDF process. Figure 10 is a reasonable representation of the original. Note the lack of the bright speckle artifacts that you will sometimes see (as in figure 5 of the original presentation).
Significantly faster: as the original presentation notes, unguided Monte Carlo sampling can take quite a while to converge. More sampling rays = much more computation = more time.
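To make the difference between unguided and guided sampling concrete, here is a toy sketch that is not taken from the paper or the slides. It estimates the integral of cos(theta) over the hemisphere (which equals pi) in two ways: with uniformly random directions, and with directions drawn from a density proportional to cos(theta). The function names are invented for the example, and only cos(theta) is sampled because the integrand does not depend on the azimuth.

```python
import math
import random

def uniform_hemisphere_estimate(rng):
    """One sample with uniformly distributed directions (pdf = 1 / (2*pi))."""
    cos_theta = rng.random()  # for uniform directions, cos(theta) is uniform on [0, 1)
    pdf = 1.0 / (2.0 * math.pi)
    return cos_theta / pdf

def cosine_weighted_estimate(rng):
    """One sample with directions drawn from pdf = cos(theta) / pi."""
    cos_theta = math.sqrt(1.0 - rng.random())  # lies in (0, 1], so the pdf is never 0
    pdf = cos_theta / math.pi
    return cos_theta / pdf  # integrand / pdf is constant: every sample equals pi

def average(estimator, n, rng):
    """Monte Carlo average of n independent samples."""
    return sum(estimator(rng) for _ in range(n)) / n

rng = random.Random(0)
uniform_result = average(uniform_hemisphere_estimate, 100_000, rng)
cosine_result = average(cosine_weighted_estimate, 1_000, rng)
```

Both averages approach pi, but the uniform version is noisy while the importance-sampled version has no noise at all in this idealized case, because the density matches the integrand exactly. Real scenes never allow a perfect match, so the gain is smaller but of the same kind.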
After reading the slides (thank you for posting), I'll amend my answer as best I can.
Is this just a way of using russian roulette to terminate a path
while remaining unbiased? Surely it would make more sense to count
the emissive and reflective properties for all ray paths together
and use russian roulette just to decide whether to continue tracing
or not.
Perhaps the emitted and reflected properties are treated differently because the reflected path depends on the incident path in a way that emitted paths do not (at least for a specular surface). Does the algorithm take a Bayesian approach and use prior information about the incidence angle as a prior for predicting the reflective angle? Or is this a Feynman integration over all paths to come up with a probability? It's hard to tell without digging deeper into the details of the theory.
My earlier black body comment is quite incorrect. I see that the slides talk about (R, G, B) components; black body emissivities are integrated over all wavelengths.
And here's a follow up question: why do some of these algorithms I'm
seeing (like in the book 'Physically Based Rendering Techniques')
only compute emission once, instead of taking in to account all the
emissive properties on an object? The rendering equation is
basically
L_o = L_e + integral of (light exiting other surfaces in to the
hemisphere of this surface)
A single emissivity for the surface would assume that there's no functional relationship on wavelength or direction. I don't know how significant it is for rendering photo-realistic images.
The ones that are posted are certainly impressive. I wonder how different they would look if the complexities that you have in mind were included?
Thank you for posting a nice question - I'm voting it up. It's been a long time since I've thought about this kind of problem. I wish I could be more helpful.
Yes, that is a very basic implementation of Russian roulette, though normally the probability of terminating would take into account the light intensity (i.e. less light means the path contributes less to the final sum, so use a higher probability of terminating).
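A scalar toy model shows why both schemes stay unbiased. Assume, purely for illustration, that every surface emits le and reflects a fraction rho of what it receives, so the true radiance solves L = le + rho * L, i.e. L = le / (1 - rho). The first function mimics the 50% coin flip from the pseudocode in the question; the second always adds emission and uses Russian roulette only to decide whether to continue. The names and the parameter survive are made up for this sketch.

```python
import random

def coin_flip_estimate(le, rho, rng):
    """50/50 choice between 'emitted' and 'reflected', each weighted by 1/0.5 = 2."""
    weight = 1.0
    while True:
        if rng.random() < 0.5:
            return weight * 2.0 * le  # emitted branch ends the path
        weight *= 2.0 * rho           # reflected branch: keep tracing

def roulette_estimate(le, rho, survive, rng):
    """Always add emission; continue with probability 'survive', reweighting by 1/survive."""
    total, weight = 0.0, 1.0
    while True:
        total += weight * le
        if rng.random() >= survive:
            return total              # path terminated by roulette
        weight *= rho / survive

def mean_of(estimator, n, *args):
    """Average n independent estimates."""
    return sum(estimator(*args) for _ in range(n)) / n

rng = random.Random(0)
le, rho = 1.0, 0.4
exact = le / (1.0 - rho)
coin_mean = mean_of(coin_flip_estimate, 100_000, le, rho, rng)
roulette_mean = mean_of(roulette_estimate, 100_000, le, rho, rho, rng)
```

Both means converge to the exact value. Setting survive equal to rho is the simplest version of making the termination probability depend on how much light the path can still carry.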
Suppose I have not one spring but a system of springs, for example a 3-degree-of-freedom system of springs connected to each other in some way. I can write down a system of differential equations for it, but it is impossible to solve in a general way. The question is: are there any papers or methods for filtering such complex oscillations, in order to remove the oscillations and recover the underlying signal as well as possible? For example, if I connect 3 springs in some way and push them to start the vibrations, or put some weight on them, and then record the vibrations from each spring, are there any filtering methods that make it easy to determine each mass (in case some mass is put on top)? I am interested in filtering complex spring-like systems.
Three springs, six degrees of freedom? This is a trivial solution using finite element methods and numerical integration. It's a system of six coupled ODEs. You can apply any form of numerical integration, such as 5th order Runge-Kutta.
I'd recommend doing an eigenvalue analysis of the system first to find out something about its frequency characteristics and normal modes. I'd also do an FFT of the dynamic forces you apply to the system. You don't mention any damping, so if you happen to excite your system at a frequency close to one of its natural frequencies, you will hit a resonance and might see some interesting behavior.
Suppose the dynamic equation has this general form (sorry, I don't have LaTeX here to make it look nice):
Ma + Kx = F
where M is the mass matrix (diagonal), a is the acceleration (2nd derivative of displacements w.r.t. time), K is the stiffness matrix, and F is the forcing function.
If you're saying you know the response, you'll have to pre-multiply by the transpose of the response function and try to solve for M. It's diagonal, so you have a shot at it.
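As a rough sketch of the numerical integration suggested in this answer, the following integrates Ma + Kx = 0 (free vibration, no forcing) for a hypothetical chain of three unit masses joined by unit springs and fixed at both ends. It uses the classical 4th-order Runge-Kutta scheme rather than a 5th-order one, and all names and values are assumptions made for the example. For this particular K, the shape (1, 0, -1) is a normal mode with squared angular frequency 2, so after one period 2*pi/sqrt(2) the system should return to where it started.

```python
import math

def acceleration(x, masses, stiffness):
    """a = -M^{-1} K x for a diagonal mass matrix and zero forcing."""
    n = len(x)
    return [-sum(stiffness[i][j] * x[j] for j in range(n)) / masses[i] for i in range(n)]

def rk4_step(x, v, dt, masses, stiffness):
    """Advance positions x and velocities v by one classical RK4 step."""
    def deriv(pos, vel):
        return vel, acceleration(pos, masses, stiffness)

    def shift(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1x, k1v = deriv(x, v)
    k2x, k2v = deriv(shift(x, k1x, dt / 2), shift(v, k1v, dt / 2))
    k3x, k3v = deriv(shift(x, k2x, dt / 2), shift(v, k2v, dt / 2))
    k4x, k4v = deriv(shift(x, k3x, dt), shift(v, k3v, dt))
    new_x = [p + dt / 6 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
    new_v = [p + dt / 6 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(v, k1v, k2v, k3v, k4v)]
    return new_x, new_v

def simulate(x0, v0, t_end, steps, masses, stiffness):
    """Integrate from time 0 to t_end and return the final positions and velocities."""
    x, v = list(x0), list(v0)
    dt = t_end / steps
    for _ in range(steps):
        x, v = rk4_step(x, v, dt, masses, stiffness)
    return x, v

masses = [1.0, 1.0, 1.0]
stiffness = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
period = 2.0 * math.pi / math.sqrt(2.0)
x_end, v_end = simulate([1.0, 0.0, -1.0], [0.0, 0.0, 0.0], period, 2000, masses, stiffness)
```

Recording x at every step instead of only at the end gives the time series you would feed to an FFT to read off the natural frequencies.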
Are you connecting the springs in such a way that the behavior of the system is approximately linear? (e.g. at least as close to linear as are musical instrument springs/strings?) Is this behavior consistent over time? (e.g. the springs don't melt or break.) If so, LTI (linear time invariant) systems theory might be applicable. Given enough measurements relative to the number of degrees of freedom in the LTI system, one might be able to estimate a pole-zero plot of the system response, and go from there. Or something like a linear predictor might be useful.
Actually it is possible to solve the resulting system of differential equations as long as you know the masses, etc.
The standard approach is to use a Laplace Transform. In particular you start with a set of linear differential equations. Add variables until you have a set of first order linear differential equations. (So if you have y'' in your equation, you'd add the equation z = y' and replace y'' with z'.) Rewrite this in the form:
v' = Av + w
where v is a vector of variables, A is a matrix, and w is a constant vector. (An example of something that winds up in w is gravity.)
Now apply a Laplace transform to get
s L(v) - v(0) = A L(v) + w / s
(the Laplace transform of the constant w is w / s). Solve it to get
L(v) = inv(s I - A)(v(0) + w / s)
where inv inverts a matrix and I is the identity matrix. Apply the inverse Laplace transform (if you read up on Laplace transforms you can find tables of inverse of common types of functions - getting a complete list of the functions you actually encounter shouldn't be that hard), and you have your solution. (Be warned, these computations quickly get very complex.)
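For a single variable the whole procedure can be checked by hand. With v' = a v + w and the Laplace transform of the constant w being w / s, the transform of the solution is (v(0) + w / s) / (s - a), and the tables give v(t) = v(0) e^(a t) + (w / a)(e^(a t) - 1). The sketch below compares that closed form with a direct numerical integration; the function names and the parameter values are arbitrary choices for the example.

```python
import math

def closed_form(t, a, w, v0):
    """Inverse Laplace transform of (v0 + w/s) / (s - a), valid for a != 0."""
    return v0 * math.exp(a * t) + (w / a) * (math.exp(a * t) - 1.0)

def rk4_scalar(a, w, v0, t_end, steps):
    """Integrate v' = a*v + w from 0 to t_end with classical 4th-order Runge-Kutta."""
    dt = t_end / steps
    v = v0
    for _ in range(steps):
        k1 = a * v + w
        k2 = a * (v + dt / 2 * k1) + w
        k3 = a * (v + dt / 2 * k2) + w
        k4 = a * (v + dt * k3) + w
        v += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return v

a, w, v0 = -0.5, 2.0, 1.0
analytic = closed_form(3.0, a, w, v0)
numeric = rk4_scalar(a, w, v0, t_end=3.0, steps=1000)
```

The two values agree to many digits, and for negative a the solution settles at -w / a, which is the equilibrium you would expect from setting v' = 0.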
Now you have the ability to take a particular setup and solve for the future behavior. You also have the ability to (if you do things really carefully) figure out how the model responds to a small perturbation in parameters. But your problem is that you don't know the parameters to use. However you do have the ability to measure the positions in the system at repeated times.
If you put this together, what you can do is this. Measure the positions at a number of times. First estimate the parameters and the initial values, and predict the values a second later. Adjust your parameters (using Newton's method) until the prediction comes close enough to the values measured a second later. Then take the measurements from 5 seconds later and use the current estimate as your starting point to refine your calculations for what is happening 5 seconds later. Repeat with longer intervals to get all of your answers.
Writing and debugging this should take you some time. :-) I would strongly recommend investigating how much of this Mathematica knows how to do for you already...
Is a finite state machine just an implementation of a Markov chain? What are the differences between the two?
Markov chains can be represented by finite state machines. The idea is that a Markov chain describes a process in which the transition to a state at time t+1 depends only on the state at time t. The main thing to keep in mind is that the transitions in a Markov chain are probabilistic rather than deterministic, which means that you can't always say with perfect certainty what will happen at time t+1.
The Wikipedia article on finite-state machines has a subsection on finite Markov-chain processes; I'd recommend reading that for more information. Also, the Wikipedia article on Markov chains has a brief passage describing the use of finite state machines in representing a Markov chain. It states:
A finite state machine can be used as
a representation of a Markov chain.
Assuming a sequence of independent and
identically distributed input signals
(for example, symbols from a binary
alphabet chosen by coin tosses), if
the machine is in state y at time n,
then the probability that it moves to
state x at time n + 1 depends only on
the current state.
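A small sketch may make the contrast concrete. The first part is a deterministic finite state machine whose next state is fully determined by the current state and the input symbol; the second is a Markov chain whose next state is drawn at random with probabilities that depend only on the current state. Both models (a parity tracker and a two-state weather chain) and all names are invented for illustration.

```python
import random

# Deterministic FSM: tracks whether the number of 1s seen so far is even or odd.
PARITY_FSM = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

def run_fsm(start, inputs):
    """Feed the input symbols one by one; the result is always the same for the same input."""
    state = start
    for symbol in inputs:
        state = PARITY_FSM[(state, symbol)]
    return state

# Markov chain: no input; each row lists the probabilities of the next state.
WEATHER_CHAIN = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def run_chain(start, steps, rng):
    """Random walk over the states; different random draws give different paths."""
    state, path = start, [start]
    for _ in range(steps):
        successors = list(WEATHER_CHAIN[state])
        weights = [WEATHER_CHAIN[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
        path.append(state)
    return path

final_state = run_fsm("even", "1101")  # three 1s -> "odd"
weather_path = run_chain("sunny", 20, random.Random(3))
```

If you drive the parity machine with independent fair coin tosses as input, its state sequence becomes a Markov chain in which each state moves to either state with probability 0.5, which is the situation the quoted passage describes.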
Whilst a Markov chain is a finite state machine, it is distinguished by its transitions being stochastic, i.e. random, and described by probabilities.
The two are similar, but the other explanations here are slightly wrong. Only FINITE Markov chains can be represented by a FSM. Markov chains allow for an infinite state space. As it was pointed out, the transitions of a Markov chain are described by probabilities, but it is also important to mention that the transition probabilities can only depend on the current state. Without this restriction, it would be called a "discrete time stochastic process".
I believe this should answer your question:
https://en.wikipedia.org/wiki/Probabilistic_automaton
And, you are on to the right idea - they are almost the same, subsets, supersets and modifications depending on what adjective describes the chain or the automaton. Automata typically take an input as well, but I am sure there have been papers utilizing 'Markov-chains' with inputs.
Think gaussian distribution vs. normal distribution - same ideas different fields. Automata belong to computer science, Markov belongs to probability and statistics.
I think most of the answers are not appropriate. A Markov process is generated by a (probabilistic) finite state machine, but not every process generated by a probabilistic finite state machine is a Markov process. E.g. Hidden Markov Processes are basically the same as processes generated by probabilistic finite state machines, but not every Hidden Markov Process is a Markov Process.
Leaving the inner workings aside, a finite state machine is like a plain value, while a Markov chain is like a random variable (it adds probability on top of the plain value). So the answer to the original question is no, they are not the same. In the probabilistic sense, a Markov chain is an extension of a finite state machine.