I want to analyse turning angles in animal movement. I have tracking data with 10 recordings per second. Each recording consists of the animal's position (x, y), the angle and distance relative to the previous recording, and the speed and acceleration.
I want to analyse the speed an animal has while making a turn of a particular angle; however, since the temporal resolution of my data is so high, each turn consists of a number of minute angles.
I figured there are two possible ways to work around this problem, but I do not know how to achieve either of them in R, so help would be greatly appreciated.
The first: reducing my temporal resolution by a certain factor. This has the disadvantage of possibly losing important parts of the data, but despite that, how would I be able to automatically subsample, for example, every 3rd or 10th recording of my data set?
The second: converting straight movement into so-called 'flights': rule-based aggregations of steps in approximately the same direction, separated by acute turns (see the figure). A flight between two points ends when the perpendicular distance from the main direction of that flight is larger than x, a value that can be set arbitrarily. Does anyone have an idea how to do that with the x, y positional data that I have?
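For the first option, a minimal sketch of subsampling every nth recording in R, assuming the recordings are rows of a data frame ordered by time (the data frame and column names below are made up for illustration):

# toy 10 Hz track; replace with your own data frame
track <- data.frame(t = seq(0, 9.9, by = 0.1),
                    x = cumsum(rnorm(100)),
                    y = cumsum(rnorm(100)))

step <- 3                                              # keep every 3rd recording
subsampled <- track[seq(1, nrow(track), by = step), ]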
It sounds like there are three potential things you might want help with: the algorithm, the math, or R syntax.
The algorithm you need may depend on the specifics of your data. For example, how much data do you have? What format is it in? Is it in 2D or 3D? One possibility is to iterate through your data set. With each new point, you need to check all the previous points to see if they fall within your desired column. If the data set is large, however, this might be really slow. In the worst case, all the data points are in a single flight segment, meaning you would check the first point the same number of times as you have data points, the second point one less, etc. That means n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 operations. That's O(n^2); the running time could grow quadratically with the size of your data set. Hence, you may need something more sophisticated.
The math to check whether a point is within your desired column of width x is pretty straightforward, although maybe more sophisticated math could help inform a better algorithm. One approach would be to use vector arithmetic. To take an example, suppose you have points A, B, and C. Your goal is to see if B falls in a column of width x around the vector from A to C. To do this, find a vector v orthogonal to the vector from A to C, then look at whether the magnitude of the scalar projection of the vector from A to B onto v is less than x. There is lots of literature available for help with this sort of thing; here is one example.
I think this is where I might start (with a boolean function for an individual point), since an R function to determine this would be convenient. Then write another function that takes a set of points, calculates the vector v, and calls the first function for each point in the set. Then run some data and see how long it takes.
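A rough sketch of those two functions in R, assuming the points are stored as (x, y) rows of a matrix; this only tests one candidate segment against a given width and is not a full or optimised flight-finding algorithm:

# is point b within perpendicular distance x_max of the line from a to z?
within_column <- function(a, b, z, x_max) {
  az <- z - a
  v  <- c(-az[2], az[1]) / sqrt(sum(az^2))   # unit vector orthogonal to the A-to-Z vector
  abs(sum((b - a) * v)) <= x_max             # |scalar projection of the A-to-B vector onto v|
}

# do all points of a candidate segment fall inside the column?
is_flight <- function(pts, x_max) {
  a <- pts[1, ]
  z <- pts[nrow(pts), ]
  all(apply(pts, 1, function(b) within_column(a, b, z, x_max)))
}

# toy usage: one (x, y) row per recording
pts <- cbind(x = c(0, 1, 2, 3), y = c(0, 0.1, -0.05, 0.2))
is_flight(pts, x_max = 0.3)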
I'm afraid I won't be of much help with R syntax, although it is on my list of things I'd like to learn. I checked out the manual for R last night and it had plenty of useful examples. I believe this is very doable, even for an R novice like myself. It might be kind of slow if you have a big data set. However, with something that works, it might also be easier to acquire help from people with more knowledge and experience to optimize it.
Two quick clarifying points in case they are helpful:
The above suggestion is just to start with the data for a single animal, so when I talk about growth of data I'm talking about the average data sample size for a single animal. If that is slow, you'll probably need to fix that first. Then you'll need to potentially analyze/optimize an algorithm for processing multiple animals afterwards.
I'm implicitly assuming that the definition of a flight segment is the largest set of contiguous data points where no "sub" flight segment violates the column rule. That is to say, I think I could come up with an example where a set of points satisfies your rule of falling within a column of width x around the vector to the last point, but where, looking at the column of width x around the vector to the second-to-last point, one point would no longer meet the criteria. Depending on how you define the flight segment (e.g. if you want it to be the largest possible set of points that meets your condition and you don't care about what happens inside), you may need something different (e.g. working backwards instead of forwards).
My goal is to detect any change in frames (single channel; each pixel is a depth value). On app init, I take the average of all corresponding pixels of the first 30 frames, so one average background frame is created. On arrival of each new frame, I subtract it from the saved background frame (the mean frame of the first 30 frames). The current algorithm is: I take the mean of the first 30 frames, call it bg_mean (a scalar, not a frame, say 2345). Then I calculate the mean of the new frame and compare it with bg_mean, with some threshold added to bg_mean (to avoid treating noise as change). But this method does not give good results if the distance is far. Are there any other methods?
I suspect your threshold value is the issue. I'm not familiar with depth cameras, but I'd assume the noise values might be affected by the distance. To validate this, try changing or removing the threshold to see if it improves the results.
Traditional image-processing techniques also update the background history as time goes on, so large changes to the scene are not counted as changes in every subsequent frame. Your idea is essentially how it is done, but approaches vary in how they account for noise (with standard images you also have to account for shadows, which you can avoid with a depth camera).
I'd suggest looking into built-in algorithms with more sophisticated noise removal (https://docs.opencv.org/3.4/d/d38/tutorial_bgsegm_bg_subtraction.html). Although these are designed for images, I suspect they would work similarly for depth cameras.
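As a rough illustration of the per-pixel version of the idea (the thread is language-agnostic; this sketch uses R matrices, and the frame size, depth values, and threshold are made up):

# toy setup: thirty 48 x 64 background frames of depth values around 2345
frames <- replicate(30, matrix(2345 + rnorm(48 * 64, sd = 5), 48, 64),
                    simplify = FALSE)
bg <- Reduce(`+`, frames) / length(frames)         # per-pixel mean background frame

threshold <- 20                                     # per-pixel noise tolerance (to tune)
new_frame <- bg
new_frame[10:20, 30:40] <- 1800                     # toy change in the scene
changed <- abs(new_frame - bg) > threshold          # per-pixel change mask
mean(changed)                                       # fraction of pixels flagged as changed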
I am working with some observation data and have run into a bit of an issue beyond my current capabilities. I surveyed different polygons (the column "PolygonID" in the screenshot) for lizards two times during a survey season. I want to determine the total search effort (shown in the column "Effort") for each individual polygon within each survey round. Problem is, the software I was using to collect the data sometimes creates unnecessary repeats for polygons within a survey round. There is an example of this in the screenshot for the rows with PolygonID P3.
Most of the time it does not affect the effort calculations because the start and end times for the rows (the fields used to calculate effort) are the same, and I know how to filter the dataset so that it only shows one line per polygon per survey. However, I have reason to be concerned that there might be some lines where the software glitched and assigned incorrect start and end times to one of the repeat lines. Is there a way to test with R whether the start and end times match for any such repeats, rather than manually going through all the data?
Thank you!
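A hedged sketch of one way to check this in R, assuming the data are in a data frame df with columns PolygonID, SurveyRound, StartTime and EndTime (only PolygonID and Effort are named above, so the other column names are guesses from the screenshot and may differ):

library(dplyr)

suspect <- df %>%
  group_by(SurveyRound, PolygonID) %>%
  filter(n() > 1,                                    # only polygons with repeated rows
         n_distinct(StartTime) > 1 |                 # keep only groups where the
         n_distinct(EndTime)   > 1) %>%              # start or end times disagree
  ungroup()

suspect   # rows whose repeats have mismatching start or end times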
I am working with frame-by-frame video data in tabular format. When an event happens, it is only coded for that single frame. I would like to order events in pseudotime, and there is no distinct periodicity to them. Regular one-hot encoding does not encode the 'eventness' of timestamps in close proximity to the frame labeled "hey stuff happens here". I imagine that this could be modeled as a sinusoidal function, convolved with the initial array that represents a column. This way, even when frames are not ordered, I can still see the 'eventness' of each one.
I was thinking something along these lines:
x <- c(...0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0...)
y <- c(...0,1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1,0...)
(convolve(x, y))
I am struggling with designing a for loop that might do this for each column of dummy variables.
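A hedged sketch of such a loop, assuming the dummy variables are columns of a data frame and that a triangular kernel (rather than a sinusoid) is an acceptable way to spread 'eventness' to nearby frames; the column names and sizes are made up:

smear <- function(col, half_width = 10) {
  kern <- c(1:half_width, half_width + 1, half_width:1)  # symmetric triangular kernel
  out  <- convolve(col, rev(kern), type = "open")         # full linear convolution
  out[(half_width + 1):(half_width + length(col))]        # re-align with the original frames
}

# toy data: 200 frames, two one-hot event columns
n_frames <- 200
events <- data.frame(
  ev1 = as.integer(seq_len(n_frames) %in% c(40, 120)),
  ev2 = as.integer(seq_len(n_frames) %in% c(80, 170))
)

smeared <- events                                          # same shape as the input
for (j in seq_along(events)) smeared[[j]] <- smear(events[[j]])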
Also, I am using "bit depth" as an analogy, since this is like increasing the bit depth of an audio sample. Basically, I want to trade in my NES for a Sega Genesis.
Thanks!
To the bevy of people on the edge of their seats, I ended up using a moving average to solve this problem.
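For reference, a minimal sketch of a centred moving average in R (toy data; the window width is an arbitrary choice):

x <- integer(200); x[c(40, 120)] <- 1                      # toy one-hot event column
window <- 21
x_smooth <- stats::filter(x, rep(1 / window, window), sides = 2)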
Assume I have four vectors (v1, v2, v3, v4), and I want to create a new vector (vec_new) that is not close to any of those four vectors. I was thinking about interpolation and extrapolation. Do you think they are suitable? Do they also apply to vectors, and can they generate a vector of, let's say, 300 dimensions? Another possible option would be a transformation matrix, but I am not sure it fits my concern. I think averaging and concatenation are not good options, as the result might still be close to some of those four vectors.
Based on my problem: imagine I divided my vectors into two categories. I need to find a vector which belongs to none of those categories.
Any other ideas?
Per my comment, I wouldn't expect the creation of synthetic "far away" examples to be useful for realistic goals.
Even things like word antonyms are not maximally cosine-dissimilar from each other, because among the realm of all word-meaning-possibilities, antonyms are quite similar to each other. For example, 'hot' and 'cold' are considered opposites, but are the same kind of word, describing the same temperature-property, and can often be drop-in replacements for each other in the same sentences. So while they may show an interesting contrast in word-vector space, the "direction of difference" isn't going to be through the origin -- as would create maximal cosine-dissimilarity.
And in classification contexts, even a simple 2-category classifier will need actual 'negative' examples. With only positive examples, the 'vector space' won't necessarily model anything about hypothesized-but-not-actually-present negative examples. (It's nearly impossible to divide the space into two categories without training examples showing the real "boundaries".)
Still, there's an easy way to make a vector that is maximally dissimilar to another single vector: negate it. That creates a vector that's in the exact opposite direction from the original, and thus will have a cosine-similarity of -1.0.
If you have a number of vectors against which you want to find a maximally-dissimilar vector, I suspect you can't do much better than negating the average of all the vectors. That is, average the vectors, then negate that average-vector, to find the vector that's pointing exactly-opposite the average.
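A minimal sketch of that in R, with made-up 300-dimensional vectors:

cosine <- function(a, b) sum(a * b) / sqrt(sum(a * a) * sum(b * b))

set.seed(42)
vecs <- matrix(rnorm(4 * 300), nrow = 4)   # four toy 300-dimensional vectors
vec_new <- -colMeans(vecs)                 # negate the average vector
apply(vecs, 1, cosine, b = vec_new)        # cosine similarity to each original vector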
Good luck!
I am doing image processing, in which I came across a situation where I have to compare two vectors and find an instance of the smaller vector in the larger vector.
Say the two vectors are A, with 100 elements (or entries), and B, with 10 elements. B is a model and it may not be present exactly as it is in vector A. I can compare 10 elements at a time and find the difference. The ideal case is that B is present somewhere and the difference is zero. Otherwise a minimum will result at some random location, and I am missing that location.
Please help me by suggesting an algorithm so that I can find B's closest instance in A.
What you are looking for is the cross-correlation function. The peak of the cross-correlation of the two vectors will be the point where vector B is most similar to vector A.
You may want to read an explanation of how it is implemented in MATLAB HERE, as it gives an easier explanation of how this operation can be implemented in software.
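A minimal sketch of the search in R (the linked explanation is MATLAB-flavoured), scoring every possible offset of B against A; here the score is the sum of squared differences rather than the cross-correlation itself, but the idea of sliding B along A and picking the best-scoring offset is the same:

set.seed(1)
A <- rnorm(100)
B <- A[41:50] + rnorm(10, sd = 0.1)        # toy case: a noisy copy of A[41:50]

n <- length(A); m <- length(B)
scores <- sapply(1:(n - m + 1),
                 function(i) sum((A[i:(i + m - 1)] - B)^2))
which.min(scores)                           # best-matching start position (41 in this toy case)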