How to detect a trend inside unsteady data (e.g. Trendly)?

I was wondering what kind of model / method / technique Trendly might use to achieve this:
[It tries to find the moments where significant changes set in and ignores random movements]
Any pointers very welcome! :)

I've never seen 'Trendly', and don't know anything about it, but if I wanted to produce that red line from that blue line, in an algorithmic fashion, I would try:
1. Fourier transform the whole data set.
2. Choose a block size longer than the period of the dominant frequency.
3. Divide the data up into blocks of the chosen size.
4. Compare adjacent blocks with a statistical test of some sort.
5. Where the test says two blocks belong to the same underlying distribution, merge them.
6. If any were merged, go back to 4.
7. The red trend line is the mean of each block.
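Not knowing Trendly's internals, here is a rough sketch of that recipe in Python; the choice of t-test, the significance level, and the merging pass are my own assumptions:

    import numpy as np
    from scipy import stats

    def dominant_period(data):
        # Steps 1-2: FFT the series and take the period of the strongest
        # non-DC frequency as the minimum block size.
        spectrum = np.abs(np.fft.rfft(data - np.mean(data)))
        k = np.argmax(spectrum[1:]) + 1          # skip the DC bin
        return int(np.ceil(len(data) / k))

    def block_trend(data, alpha=0.05):
        # Step 3: divide the data into blocks of the chosen size.
        size = dominant_period(data)
        blocks = [data[i:i + size] for i in range(0, len(data), size)]
        merged = True
        while merged:                            # step 6: repeat until stable
            merged = False
            out = [blocks[0]]
            for blk in blocks[1:]:
                # Steps 4-5: a high p-value means the t-test cannot tell the
                # blocks apart, so treat them as one underlying distribution.
                _, p = stats.ttest_ind(out[-1], blk)
                if p > alpha:
                    out[-1] = np.concatenate([out[-1], blk])
                    merged = True
                else:
                    out.append(blk)
            blocks = out
        # Step 7: the red trend line is the mean of each block.
        return [(len(blk), float(blk.mean())) for blk in blocks]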

A simple "median" function could produce smoother curves over a mostly un-smooth curve.
Otherwise, a brute-force or genetic algorithm could be used to search for the best way to split the data into sections, scoring candidates so that more sections = worse solution and less accurate lines = worse solution.
Another way would be this: start at the beginning. As soon as the line moves outside some radius (3 above or 3 below the first value, for instance), set the new height to the average of the current line's height and the previous marker.
If you keep doing that, it will ignore small fluctuations; however, a large enough fluctuation will still affect it.
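A sketch of that thresholding idea, with the radius and the averaging rule taken straight from the description above:

    def step_filter(values, radius=3.0):
        # Hold a level until the signal leaves +/- radius around it, then
        # jump to the average of the new value and the previous level.
        level = values[0]
        levels = [level]
        for v in values[1:]:
            if abs(v - level) > radius:
                level = (v + level) / 2.0
            levels.append(level)
        return levels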

Related

Calculate a dynamic iteration value when zooming into a Mandelbrot

I'm trying to figure out how to automatically adjust the maximum iteration value when moving around in the Mandelbrot fractal.
All the examples I've found use a constant of 1000 or less, but that's not enough when zooming into the fractal set.
Is there a way to determine the number of max_iterations based on for example where you are in the Mandelbrot space (x_start,x_end,y_start,y_end)?
One method I tried was to repeatedly pre-process a small area in the region of the Mset boundary with increasing iterations, until the percentage change in status from one repetition to the next was small. The problem was that this varied from place to place on the current map, since the "depth" varies across it. How to find the right place to probe? By logging the "deepest" boundary area during the previous generation (one that will still be within the next zoom area).
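A rough sketch of that "increase iterations until the change settles" probe; the grid size, starting depth, doubling schedule, and tolerance are all assumptions:

    import numpy as np

    def inside_mask(re0, re1, im0, im1, n, max_iter):
        # True where a point has not escaped after max_iter iterations.
        xs = np.linspace(re0, re1, n)
        ys = np.linspace(im0, im1, n)
        c = xs[None, :] + 1j * ys[:, None]
        z = np.zeros_like(c)
        alive = np.ones(c.shape, dtype=bool)
        for _ in range(max_iter):
            z[alive] = z[alive] ** 2 + c[alive]
            alive &= np.abs(z) <= 2.0
        return alive

    def pick_max_iter(re0, re1, im0, im1, n=64, start=256, tol=0.01):
        # Double max_iter until the fraction of probe pixels that change
        # status (inside vs. escaped) between runs drops below tol.
        m = start
        prev = inside_mask(re0, re1, im0, im1, n, m)
        while True:
            m *= 2
            cur = inside_mask(re0, re1, im0, im1, n, m)
            if np.mean(prev != cur) < tol:
                return m
            prev = cur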
But my best strategy was to avoid iterating wherever possible:
Away from the boundary of the Mset, areas of equal depth can be "contoured" and then filled with that depth. It was not an easy algorithm. Basically I followed a raster scan, but when I detected a boundary of iteration change (examining all the neighbours to ensure I wasn't close to the edge of the Mset), I would switch to a curve-stitching method to iterate around a contour back to where it started (obviously not recalculating spots I had already done), and then make a second pass filling in the raster lines within the contour with the iteration level. It was fraught with leaks, but eventually I cracked it.
Within the Mset, I followed the same approach, because the very last thing you want to do is to plough across vast areas and hit the iteration limit.
The difficult area is close to the boundary, where the iteration results can't be related to smooth contours with the neighbours. The contour-stitching method won't work here, since there is only ever one pixel of a particular depth.
The contour method will also have faults on the lower or Mset sides of this region, but since this area looks chaotic until you zoom deeper, I lived with that.
So having said all that, I simply set the iteration depth as high as I can tolerate, but perhaps you can combine my first paragraph with the area-filling techniques.
BTW, colouring the region adjacent to the Mset looks terrible when an animated smooth playback of the zoom is attempted. For that reason I coloured this area in grey scale, by comparing with neighbours: if there was too much difference, I coloured it 0x808080 at first, then adapted that depending on the predominant depth of the neighbours. All requiring fine tuning!

Problem with Principal Component Analysis

I'm not sure this is the right place but here I go:
I have a database of 300 pictures in high resolution. I want to compute the PCA on this database, and so far here is what I do:
- reshape every image as a single column vector
- create a matrix of all my data (500x300)
- compute the average column and subtract it from my matrix; this gives me X
- compute the correlation C = X'X (300x300)
- find the eigenvectors V and eigenvalues D of C
- the PCA matrix is given by XVD^(-1/2), where each column is a principal component
This is great and gives me the correct components.
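For reference, that pipeline as a NumPy sketch (names are mine; note the division by sqrt(D), which becomes ill-conditioned for near-zero eigenvalues and is relevant to the answer below):

    import numpy as np

    def gram_pca(images):
        # images: one reshaped picture per column (500 x 300 in the question).
        X = images - images.mean(axis=1, keepdims=True)  # subtract the average column
        C = X.T @ X                                      # 300 x 300 matrix
        D, V = np.linalg.eigh(C)                         # eigenvalues, ascending
        order = np.argsort(D)[::-1]                      # sort components by weight
        D, V = D[order], V[:, order]
        # X V D^(-1/2): one principal component per column. Components with
        # near-zero eigenvalues are numerically unstable here.
        return X @ V / np.sqrt(D), D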
Now what I'm doing is doing the same PCA on the same database, except that the images have a lower resolution.
Here are my results, low-res on the left and high-res on the right. As you can see, most of them are similar, but SOME images are not the same (the ones I circled).
Is there any way to explain this? My algorithm needs both sets to yield the same component images, one computed in high-res and the other in low-res; how can I make this happen?
thanks
It is very possible that the filter you used did a thing or two to some of the components. After all, lower-resolution images don't contain the higher frequencies that also contribute to which components you get. If the component weights (lambdas) of those images are small, there's also a good chance of numerical error.
I'm guessing your component images are sorted by weight. If they are, I would try a different pre-downsampling filter and see if it gives different results (essentially, obtain the lower-resolution images by different means). It is possible that the components that come out differently have a lot of frequency content in the transition band of that filter. It looks like the images circled in red are nearly perfect inversions of each other; filters can cause such things.
If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight, in which case this could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample and how you sort the images before displaying them. Also, I wouldn't expect all the images to be extremely similar, because you're essentially getting rid of quite a few frequency components. I'm pretty sure it wouldn't have anything to do with the fact that you're stretching the images into vectors to compute PCA, but try stretching them out in a different direction (take columns instead of rows or vice versa) and see whether that changes the result. If it does, then perhaps you might want to perform PCA somewhat differently, though I'm not sure how.

Minimising interpolation error between two data sets

In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we are sampling the value at different and unpredictable times, also we are alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However as shown in the three smaller boxes this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values that come close together and then drift apart between each data point; the latter results in values which remain closer, but where the average error is greater.
Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe what you ask is a question that has no straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be almost anything, so I think there is no way to guarantee convergence of the interpolations of two sample arrays.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the error. For example, if you measure drag force as a function of wing velocity, you know the relation is quadratic (F = a*V^2). Then you can choose a 2nd-order polynomial fit and get a pretty good match between the interpolations of the two series.
Try B-splines: Catmull-Rom interpolates (goes through the data points), B-spline does smoothing.
For example, for uniformly-spaced data (not your case)
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course the interpolated red/blue curves depend on the spacing of the red/blue data points, so they cannot match perfectly.
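A minimal sketch of that formula for uniformly spaced data (which, as noted, isn't quite your case):

    import numpy as np

    def bspline_smooth(data):
        # Uniform cubic B-spline evaluated at the knots:
        # out[t] = (data[t-1] + 4*data[t] + data[t+1]) / 6, endpoints kept.
        d = np.asarray(data, dtype=float)
        out = d.copy()
        out[1:-1] = (d[:-2] + 4.0 * d[1:-1] + d[2:]) / 6.0
        return out

For the unevenly spaced red/blue samples, scipy.interpolate.splrep with a smoothing factor s > 0 fits a smoothing B-spline rather than an interpolating one.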
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task.
One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines.
By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot interpolate perfectly, and particularly if the interpolation error is large compared to the errors in the samples, then you are certain to have a continuous error function whose largest errors occur farthest (in time) from your sample points. Two data sets with differing sample points should therefore exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa; with samples staggered as yours are, this is sure to be true. Thus I would expect exactly what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you do not have information about underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if looking at info below Nyquist.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (also known as aliasing). That error, being different in the two datasets, results in different values when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples with a frequency at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them away before sampling.

How to determine all line segments from a list of points generated from a mouse gesture?

Currently I am interning at a software company, and one of my tasks has been to implement the recognition of mouse gestures. One of the senior developers helped me get started and provided code/projects that use the $1 Unistroke Recognizer http://depts.washington.edu/aimgroup/proj/dollar/. I get, in a broad way, what the $1 Unistroke Recognizer does and how it works, but am a bit overwhelmed trying to understand all of its internals/finer details.
My problem is that I am trying to recognize the gesture of moving the mouse downwards, then upwards. The $1 Unistroke Recognizer determines that the gesture I created was a downwards gesture, which is in fact what it ought to do. What I really would like it to do is say "I recognize a downwards gesture AND THEN an upwards gesture."
I do not know if my incomplete understanding of the $1 Unistroke Recognizer is causing me to scratch my head, but does anyone have any ideas as to how to recognize two different gestures from moving the mouse downwards then upwards?
Here is an idea that I thought might help; I would love for someone who is an expert, or even just knows a bit more than me, to let me know what you think. Any help or resources that you know of would be greatly appreciated.
How My Application Currently Works:
The way my current application works is that I capture points from where the mouse cursor is while the user holds down the left mouse button. The list of points then gets fed to the gesture recognizer, which spits out what it thinks to be the best shape/gesture corresponding to the captured points.
My Idea:
What I wanted to do, before I feed the points to the gesture recognizer, is to somehow go through all the points and break them down into separate lines or curves. This way I could feed each line/curve in one at a time, and from the basic movements of down, up, left, right, diagonals, and curves I could determine the final shape/gesture.
One way I thought would be good in determining if there are separate lines in my list of points is sampling groups of points and looking at their slope. If the slope of a sampled group of points differed X% from some other group of sampled points then it would be safe to assume that there is indeed a separate line present.
What I Think Are Possible Problems In My Thinking:
Where do I determine the end of one line and the start of a separate line? If I used the idea of checking the slope of a group of points and determined that a separate line was present, that doesn't necessarily mean I found the slope of a separate line. For example, if you were to draw a straight-edged "L" with a right angle and sample the slope of the points around the corner of the "L", you would see that the slope gives a reasonable indication that a separate line is present, but those points don't correspond to the start of a separate line.
How do I deal with the ever-changing slope of a curved line? The gesture recognizer I use already handles curves in the way I want it to. But I don't want the method I use to determine separate lines to keep looking for these so-called separate lines within a curve, since its slope changes constantly as I sample groups of points. Would I just stop sampling points once the slope changed more than X% so many times in a row?
Maybe I'm not using the correct "type" of math for determining separate lines. Math isn't my strongest subject, but I did do some research. I tried to look into dot products to see if they would point me in some direction, but I don't know if they will. Has anyone used dot products for doing something like this, or some other method?
Final Thoughts, Remarks, And Thanks:
Part of my problem, I feel, is that I don't know how to completely ask my question. I wouldn't be surprised if this problem has already been asked (in one way or another) and a solution exists that can be Googled. But my search results on Google didn't provide any solutions, as I just don't know exactly how to ask my question yet. If you find it confusing, please let me know where and why, and I will help clarify it. In doing so, maybe my searches on Google will become more precise and I will be able to find a solution.
I just want to say thanks again for reading my post. I know it's long, but I didn't really know where else to ask it. I'm going to talk with some other people around the office, but all of my best solutions throughout school have come from the StackOverflow community, so I owe much thanks to you.
Edits To This Post:
(7/6 4:00 PM) Another idea I thought about was comparing all the points before a min/max point. For example, if I moved the mouse downwards then upwards, my starting point would be the current max point, while the point where I start moving the mouse back upwards would be my min point. I could then look to see whether there are any points after the min point and, if so, say that there could be a new potential line. I don't know how well this will work on other shapes, like stars, but that's another thing I'm going to look into. Has anyone done something similar to this before?
If your problem can be narrowed down to breaking apart a general curve into straight or smoothly curved partial lines, then you could try this.
Comparing the slope of the segments and identifying breaking points where the difference is greater than some threshold would work in a very simplified case. Imagine a perfectly formed L-shape where you have a right angle between two straight lines. Obviously the corner point would be the only one where the slope difference is above the threshold (as long as the threshold is between 0 and 90 degrees), and thus an identifiable breaking point.
However, the vertical and horizontal lines may be slightly curved, so the threshold would need to be large enough for these small differences in slope to be ignored as breaking points. You'd also have to decide how sharp a corner the algorithm should pick up as a break: is 90 degrees or more required, or is even 30 degrees enough? This is an important question.
Finally, to make this robust I would not be satisfied comparing the slopes of two adjacent segments. Hands may shake, corners may be smoothed out and the ideal conditions to find straight lines and sharp corners will probably never occur. For each point investigated for a break I would take the average slope of the N previous segments and compare it to the average slope of the N following segments. This can be efficiently implemented using a running mean. By choosing a good sample number N (depending on the accuracy of the input, the total number of points, etc) the algorithm can avoid the noise and make better detections.
Basically the algorithm would be:
For each investigated point (beginning N points into the sequence and ending N points before the end):
1. Compute the average slope of the N previous segments.
2. Compute the average slope of the N next segments.
3. If the difference of the averages is greater than the threshold, mark the current point as a breaking point.
This is quite off the top of my head. You'd have to try it in your application.
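Here is one way that running-average comparison might look in Python; note I've swapped raw slopes for segment angles (via atan2) so vertical strokes don't blow up, and the window N and the threshold are arbitrary choices:

    import math

    def mean_angle(angles):
        # Circular mean, so directions near +/-180 degrees average correctly.
        return math.atan2(sum(math.sin(a) for a in angles),
                          sum(math.cos(a) for a in angles))

    def find_breaks(points, n=5, threshold=math.radians(45)):
        # Direction of each segment between consecutive points.
        angles = [math.atan2(y2 - y1, x2 - x1)
                  for (x1, y1), (x2, y2) in zip(points, points[1:])]
        breaks = []
        for i in range(n, len(angles) - n + 1):
            before = mean_angle(angles[i - n:i])   # N previous segments
            after = mean_angle(angles[i:i + n])    # N following segments
            # Wrapped angular difference in (-pi, pi].
            diff = math.atan2(math.sin(after - before), math.cos(after - before))
            if abs(diff) > threshold:
                breaks.append(i)   # index of the point where the stroke bends
        return breaks

Neighbouring candidates will cluster around a real corner, so in practice you'd keep only the local maximum of the angle difference.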
If you work with absolute angles, like upwards and downwards, you can simply take the absolute slope between two points (not necessarily adjacent) to determine whether the movement is RIGHT, LEFT, UP, or DOWN (if that is enough of a distinction).
The art is to find a distance between points such that the angle is not random (with 1 px, the angle will always be a multiple of 45°).
There is a Firefox plugin for navigation using mouse gestures that works very well. I think it's FireGestures, but I'm not sure. You may be able to get some inspiration from it.
Additional thought: if you draw a shape by connecting successive points, then connecting back to the first point, the ratio between the enclosed area and the final line segment's length is also an indicator of the gesture's "edginess".
If you are just interested in up/down/left/right, a first approximation is to check 45-degree segments of a circle. This is easily done by checking the horizontal difference between (successive) points against the vertical difference between them.
Say the horizontal difference is positive and larger in magnitude than the vertical difference; then the movement would be 'RIGHT'.
The only difficulty then comes in distinguishing, for example, UP/DOWN from UP/RIGHT/DOWN. But this can be done with distances between points: if you determine that the mouse has moved RIGHT for less than, say, 20 pixels, then you can ignore that movement.
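A sketch of that quantisation; the 20-pixel minimum and the direction labels follow the text above, and screen coordinates are assumed (y grows downwards):

    def classify_moves(points, min_dist=20):
        # Quantise a stroke into UP/DOWN/LEFT/RIGHT moves. A direction is
        # emitted only after min_dist pixels, so short movements are ignored.
        moves = []
        ax, ay = points[0]                      # anchor: last accepted point
        for x, y in points[1:]:
            dx, dy = x - ax, y - ay
            if max(abs(dx), abs(dy)) < min_dist:
                continue                        # jitter: too short to count
            if abs(dx) > abs(dy):
                d = 'RIGHT' if dx > 0 else 'LEFT'
            else:
                d = 'DOWN' if dy > 0 else 'UP'  # screen y grows downwards
            if not moves or moves[-1] != d:
                moves.append(d)                 # collapse repeated directions
            ax, ay = x, y
        return moves

A down-then-up stroke then comes out as ['DOWN', 'UP'], which is exactly the two-part gesture the question asks for.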

Similarity Between Colors

I'm writing a program that works with images and at some point I need to posterize the image. This means I need to bin the colors, but I'm having trouble deciding how to tell how close one color is to another.
Given a color in RGB, I can think of at least 2 ways to see how different they are:
|r1 - r2| + |g1 - g2| + |b1 - b2|
sqrt((r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2)
And if I move into HSV, I can think of other ways of doing it.
So I ask, ignoring speed, what is the best way to tell how similar two colors are? Best meaning most accurate to the human eye.
Well, if speed is not an issue, the most accurate way would be to take some sample images and apply the filter to them using various cutoff values for the distance. Distance here would be determined by one of the equations on the Color_difference page that astander linked to, which means doing the calculations in one of the color spaces listed there and then converting back to sRGB or something (and converting each image into that color space first if it isn't in it to begin with). Then have a large number of people examine the images to see what looks best to them, and go with the cutoff value for the images that the majority agrees look best.
Basically, it's largely a matter of subjectivity; in fact, it also depends on how stylized you want the images to be, and you might even want to add some sort of control so that you can alter the cutoff distance on the fly.
If speed does become a bit of an issue and/or you want more simplicity, then just use your second choice for the distance calculation (which is simply the CIE76 equation; just make sure to use the L*a*b* color space) with the cutoff being around 2 or 2.3.
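For reference, a self-contained sketch of CIE76: convert sRGB to L*a*b* (D65 white point assumed) and take the Euclidean distance:

    import math

    def srgb_to_lab(r, g, b):
        # sRGB (0-255) -> CIE L*a*b* under a D65 white point.
        def lin(u):                      # undo the sRGB gamma curve
            u /= 255.0
            return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4
        rl, gl, bl = lin(r), lin(g), lin(b)
        # Linear RGB -> XYZ (standard sRGB matrix).
        x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
        y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
        z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
        def f(t):
            return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
        fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
        return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

    def delta_e_cie76(c1, c2):
        # CIE76 colour difference: Euclidean distance in L*a*b*.
        # Differences below ~2.3 are roughly the just-noticeable threshold.
        l1, a1, b1 = srgb_to_lab(*c1)
        l2, a2, b2 = srgb_to_lab(*c2)
        return math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)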
What do you mean by "posterize the image"? If you're trying to cluster the colors into bins, you should look at cluster analysis.
Just a comment if you are going to move to HSV (or similar spaces):
Differencing on H: the difference between 0° and 359° is numerically big but perceptually negligible, because hue wraps around.
A difference in H matters little when V or S is small.
For computer vision applications, what matters is often not the perceptual difference (used mostly by paint manufacturers) but whether the colors belong to the same object/segment. This means we might partially ignore V, which can change with lighting conditions.
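For the wrap-around point above, a hue comparison has to work on the circle, e.g.:

    def hue_distance(h1, h2):
        # Hue is an angle: wrap at 360 so 0 and 359 come out as 1, not 359.
        d = abs(h1 - h2) % 360
        return min(d, 360 - d)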
