Let's assume two similar time series like the ones below. They are similar but never equal: e.g. the lengths can differ, and similar parts can be separated by non-similar parts. I've tried to indicate the similarity with black arrows.
I'm not a mathematician, so I'm wondering whether there is a fast approximate (or exact) way to find a mapping table between them. I've dug into dynamic time warping, but at this point I think DTW is not what I'm looking for (not sure).
The mapping table could look like this:
Sequence  Location_Timeseries_0  Location_Timeseries_1  Length
0         LT0_0                  LT1_0                  N
1         LT0_1                  LT1_1                  M
...
Can someone point me in the right direction?
Based on what you say and show, DTW is a perfect fit.
See the bottom right of http://www.cs.ucr.edu/~eamonn/sampleslides2.jpg
or the right of http://www.cs.ucr.edu/~eamonn/sampleslides3.jpg
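To make the connection to the mapping table concrete, here is a minimal sketch of classic DTW with backtracking, assuming plain numeric lists and absolute difference as the local cost. The function name `dtw_path` is hypothetical, and this is the textbook O(n*m) version rather than a tuned implementation. The returned path is a list of index pairs (i, j) saying which sample of the first series is matched with which sample of the second; runs of consecutive pairs could then be grouped into rows of a mapping table.

```python
def dtw_path(a, b):
    """Return (total_cost, path) for aligning sequences a and b with classic DTW."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (an assumption; any distance works)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover which indices were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # both series advance
            (cost[i - 1][j], i - 1, j),          # only the first series advances
            (cost[i][j - 1], i, j - 1),          # only the second series advances
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(total, path)
```

In this toy example the repeated value in the second series is absorbed by matching index 1 of the first series twice, so the total cost is zero.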
I am trying to clean up some eye tracking data in which people are told to focus on the middle of the screen. However, the data is somewhat noisy and I am trying to clean it up in a proper way.
I have created some code that emulates the kind of data that I have and the methods I am trying to use as well as what I am presenting below.
The data complete with noise looks as follows:
I have tried a simple formula that discards all samples more than a given number of pixels away from the centre, such as:
results[results$x <= xmid+threshold & results$x >= xmid-threshold,]
But that results in data in a square shape rather than a circle:
I have tried to think about what to do here and have made it as far as to define a circle that encompasses the area that I am interested in:
However, I cannot see a straightforward way to pick only the data within that area. The solutions I have tried required several for loops and still did not give me the result I was hoping for.
I hope that some of you can point me in the right direction here. Maybe the problem is even trivial to solve in some manner that I have not yet considered? Thanks for reading this far and here is the code if you think that you can help :)
To check whether a point (x, y) lies in the circular region of radius threshold around the center (xmid, ymid), you can use the following expression (^ denotes squaring):
(x-xmid)^2 + (y-ymid)^2 <= threshold^2
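As a minimal sketch of that test, assuming samples are plain (x, y) tuples and using made-up coordinates (the helper name `inside_circle` and the sample values are hypothetical):

```python
def inside_circle(x, y, xmid, ymid, threshold):
    """True if (x, y) is within `threshold` pixels of (xmid, ymid)."""
    # Compare squared distances, so no square root is needed.
    return (x - xmid) ** 2 + (y - ymid) ** 2 <= threshold ** 2


# Hypothetical samples around an assumed screen centre of (512, 384).
samples = [(512, 384), (520, 390), (550, 420), (600, 384)]
kept = [(x, y) for x, y in samples if inside_circle(x, y, 512, 384, 50)]
print(kept)
```

The point (550, 420) lies inside the 50-pixel square but outside the circle, so it is dropped here. In R, the same expression should work as the logical row index, e.g. `results[(results$x - xmid)^2 + (results$y - ymid)^2 <= threshold^2, ]`, assuming the data frame has a `y` column and `ymid` is defined like `xmid`.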
I'm currently working towards a 3D model of this, but I thought I would start with 2D. Basically, I have a grid of longitude and latitude with NO2 concentrations across it. What I want to produce, at least for now, is a total amount of Nitrogen Dioxide between two points. Like so:
2DGrid
Basically, these two points are at different lats and lons, and as I stated, I want to find the amount of something between them. The tricky thing for me is that the model data I'm working with is gridded, so I need to account for the amount of something along a line at the lats and lons where that line cuts through the grid.
Another approach, and maybe a better one for my purposes, could be visualized like this: 3DGrid
Ultimately, I'd like to be able to create a program (in any language, honestly) that could find the amount of "something" between two points in a 3D grid. If you would like specifics, the bottom altitude is the surface and the top grid level is the top of the atmosphere. The bottom point is a measurement device looking at the sun at a certain time of day (and therefore with a certain zenith and azimuth angle). I want to find the NO2 between that measurement device and the "top of the atmosphere", which in my grid is just the top altitude level (of which there are 25).
I'm rather new to coding, Stack Exchange, and even the subject matter I'm working with, so the sparse code I've written might create more clutter than simply asking the question and seeing what methods/code you might suggest.
Hopefully my question is beneficial!
Best,
Taylor
To traverse all touched cells, you can use the Amanatides-Woo algorithm. It is suitable for both the 2D and the 3D case.
Implementation clues
To account for the quantity contributed by every cell, you can apply some model. For example, calculate the path length inside the cell (as the distance between the entry and exit points) and divide it by a normalizing factor to get the cell weight (for example, by CellSize*Sqrt(3) in the 3D case, which is the cell's diagonal length).
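A minimal 2D sketch of this kind of traversal, in the spirit of Amanatides-Woo, is shown below. It assumes square cells of size `cell` with cell (ix, iy) covering [ix*cell, (ix+1)*cell) in each axis; the function name `traverse_grid` and the `conc` values are hypothetical. It returns the path length inside each crossed cell, which can then be weighted or normalized as described above. A 3D version would add a third axis handled the same way.

```python
import math


def traverse_grid(p0, p1, cell=1.0):
    """Return [((ix, iy), length_inside_cell), ...] for the segment p0 -> p1."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = x1 - x0, y1 - y0
    seg_len = math.hypot(dx, dy)
    ix, iy = math.floor(x0 / cell), math.floor(y0 / cell)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1

    def first_crossing(start, delta, index):
        # t (in [0, 1] along the segment) of the first grid line hit on this
        # axis, and the t-increment between successive grid lines.
        if delta == 0:
            return math.inf, math.inf
        boundary = (index + (1 if delta > 0 else 0)) * cell
        return (boundary - start) / delta, cell / abs(delta)

    t_max_x, t_delta_x = first_crossing(x0, dx, ix)
    t_max_y, t_delta_y = first_crossing(y0, dy, iy)

    cells = []
    t = 0.0
    while True:
        t_next = min(t_max_x, t_max_y, 1.0)
        cells.append(((ix, iy), (t_next - t) * seg_len))
        if t_next >= 1.0:
            break
        t = t_next
        # Step across whichever grid line is reached first. When the segment
        # passes exactly through a corner, a zero-length cell entry is produced.
        if t_max_x < t_max_y:
            ix += step_x
            t_max_x += t_delta_x
        else:
            iy += step_y
            t_max_y += t_delta_y
    return cells


# Hypothetical per-cell concentrations, weighted by path length.
conc = {(0, 0): 2.0, (1, 0): 4.0, (2, 0): 6.0}
total = sum(conc[c] * length for c, length in traverse_grid((0.5, 0.5), (2.5, 0.5)))
print(total)
```

Here the weight is the raw path length; dividing by a normalizing factor, as suggested above, only rescales the result.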
I have a huge number of multidimensional data points. The points basically look like this:
[1.5,3.7,1.95,1.23] one point
[2.56,3.78,4.3,2.9] another point
...................
...................
so on
Sometimes the number of dimensions goes up to something like 20 and the number of points in this 20d space can go up to like 10 million.
I have to bin these data points considering all dimensions as "dependent", so the points have to move together. I have done binning in one dimension, but although I have been racking my brains to come up with an algorithm, I haven't been successful so far in the multi-dimensional case.
I couldn't find any Java examples of multi-dimensional binning either. If anybody can give me an idea of how to tackle this in Java, that would be a great help.
I'm not sure this is the right place but here I go:
I have a database of 300 high-resolution pictures. I want to compute the PCA on this database, and so far here is what I do:
- reshape every image as a single column vector
- create a matrix of all my data (500x300)
- compute the average column and subtract it from my matrix, which gives me X
- compute the correlation C = X'X (300x300)
- find the eigenvectors V and eigenvalues D of C
- the PCA matrix is given by X*V*D^(-1/2), where each column is a principal component
This works well and gives me correct components.
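For reference, the steps above can be written out as a short derivation. This is only a sketch, assuming X is the mean-subtracted data matrix with one image per column, V is orthonormal (C is symmetric), and D is the diagonal matrix of eigenvalues:

```latex
\begin{aligned}
C &= X^{\top} X, \qquad C V = V D \\
(X X^{\top})(X V) &= X (X^{\top} X) V = (X V) D \\
(X V)^{\top} (X V) &= V^{\top} C V = D \\
U &= X V D^{-1/2} \quad\Rightarrow\quad U^{\top} U = D^{-1/2} D \, D^{-1/2} = I
\end{aligned}
```

So the columns of U are unit-length eigenvectors of X X' with the same eigenvalues as C. Two caveats follow from this: columns with a zero eigenvalue have to be dropped before scaling by D^(-1/2) (mean subtraction leaves at most 299 nonzero eigenvalues for 300 images), and each column is only defined up to sign, since -u is an eigenvector whenever u is.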
Now what I'm doing is doing the same PCA on the same database, except that the images have a lower resolution.
Here are my results, low-res on the left and high-res on the right. As you can see, most of them are similar, but SOME images are not the same (the ones I circled).
Is there any way to explain this? I need for my algorithm to have the same images, but one set in high-res and the other one in low-res, how can I make this happen?
thanks
It is very possible that the filter you used could have done a thing or two to some of the components. After all, lower resolution images don't contain higher frequencies that, too, contribute to which components you're going to get. If component weights (lambdas) at those images are small, there's also a good possibility of errors.
I'm guessing your component images are sorted by weight. If they are, I would try to use a different pre-downsampling filter and see if it gives different results (essentially obtain lower resolution images by different means). It is possible that the components that come out differently have lots of frequency content in the transition band of that filter. It looks like images circled with red are nearly perfect inversions of each other. Filters can cause such things.
If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight and that could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample, how you sort the images before displaying them. Also, I wouldn't expect all images to be extremely similar because you're essentially getting rid of quite a few frequency components. I'm pretty sure it wouldn't have anything to do with the fact that you're stretching out images into vectors to compute PCA, but try to stretch them out in a different direction (take columns instead of rows or vice versa) and try that. If it changes the result, then perhaps you might want to try to perform PCA somewhat differently, not sure how.
Apologies if this is considered a repeat question, but the answers I've seen on here are too complex for my needs.
I simply need to find out if a line segment intersects a circle. I don't need to find the distance to the line from the circle center, I don't need to solve for the points of intersection.
The reason I need something simple is that I have to code this in SQL, cannot call out to external libraries, and need to write the formula in a WHERE clause. Basically, it has to be done in a single statement that I can plug values into.
Assuming 2 points A (Ax,Ay) and B (Bx,By) to describe the line segment, and a circle with center point C (Cx,Cy) and radius R, the formula I am currently using is:
( R*R * ( (Ax-Bx)*(Ax-Bx) + (Ay-By)*(Ay-By) ) )
-( ((Ax-Cx)*(By-Cy))-((Bx-Cx)*(Ay-Cy)) ) > 0
This formula is taken from link text, and is based on a 0,0 centered circle.
The reason I am posting is that I am getting weird results and I wondered if I did something stupid. :(
Although this doesn't exactly answer your question: do you really have to calculate this on the fly in a SQL SELECT? That means the DB system has to evaluate the formula for every single row in the table (or for every row that satisfies the remaining WHERE conditions), which might result in bad performance.
Instead, you might consider creating a separate boolean column and calculate its value in an on-insert/on-update trigger. There, in turn, you wouldn't even need to put the test in a single line formula. Using a separate column has another advantage: You can create an index on that column which allows you to get your set of intersecting/non-intersecting records very fast.
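Once the test no longer has to fit in a single expression, it can be written in a few explicit steps. The following is a minimal sketch in Python rather than SQL, using the same A, B, C, R naming as the question; the function name is hypothetical, and it treats the circle as a filled disk, so a segment lying entirely inside the circle counts as intersecting. The idea is to find the point of the segment closest to the center and compare its squared distance with R squared.

```python
def segment_intersects_circle(ax, ay, bx, by, cx, cy, r):
    """True if any point of segment AB lies within distance r of C."""
    abx, aby = bx - ax, by - ay
    len_sq = abx * abx + aby * aby
    if len_sq == 0:
        t = 0.0  # A and B coincide: the segment is a single point
    else:
        # Projection of C onto the line AB, clamped to the segment [0, 1].
        t = ((cx - ax) * abx + (cy - ay) * aby) / len_sq
        t = max(0.0, min(1.0, t))
    px, py = ax + t * abx, ay + t * aby  # closest point on the segment
    return (px - cx) ** 2 + (py - cy) ** 2 <= r * r


print(segment_intersects_circle(-2, 0, 2, 0, 0, 0, 1))  # segment crosses the circle
print(segment_intersects_circle(2, 0, 3, 0, 0, 0, 1))   # same line, but segment is outside
```

The second call shows why the clamping step matters: the infinite line through A and B passes through the circle, but the segment itself does not reach it.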