Detect peaks at beginning and end of x-axis - r
I've been working on detecting peaks within a data set of thousands of y~x relationships. Thanks to this post, I've been using loess and rollapply to detect peaks by comparing the local maximum to the smooth. Since then, I've been working to optimise the span and w parameters for the loess and rollapply functions, respectively.
However, I have realised that several of my relationships have a peak at the beginning or the end of the x-axis, which are of interest to me, but these peaks are not being identified. For now, I've tried adding fake values outside my x range to imitate a peak. For example, if my x values range from -50 to 160, I created x values of -100 and 210 and assigned them a y value of 0.
This helped me to identify some of the relationships that have a peak at the beginning or the end. As you can see here:
However, for some it does not work.
Aside from the fact that I feel uncomfortable adding 'fake' values to the relationship, the smoothing frequently shifts the location of the peak and, more importantly, I still cannot find a solution that reliably detects these beginning or end peaks. Does anyone know how to work out a solution in R?
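For reference, a minimal sketch of the padding idea described above (not the original code; the span, w and pad values, the sentinel y value of 0 and the find_peaks name are all placeholders):

library(zoo)

find_peaks <- function(x, y, span = 0.25, w = 5, pad = 50) {
  # pad the series with one low "sentinel" point on each side of the x range,
  # so a peak sitting at either boundary can still win the rolling-max comparison
  x <- c(min(x) - pad, x, max(x) + pad)
  y <- c(0, y, 0)
  smoothed <- predict(loess(y ~ x, span = span), x)
  # rolling local maximum of the raw series
  local_max <- rollapply(y, width = 2 * w + 1, FUN = max, align = "center", fill = NA)
  # a peak is a point that equals its local maximum and exceeds the smooth
  which_peak <- which(!is.na(local_max) & y == local_max & y > smoothed)
  x[which_peak]
}

With this kind of padding, whether a boundary point counts as a peak depends only on how it compares to the smooth, not on having neighbours on both sides.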
Related
Segmenting lat/long data graph into lines/vectors
I have lat/lng data from multirotor UAV flights. There are a lot of data points (~13k per flight) and I wish to find line segments in the data; they give me flight speed and direction. I know that most of the flights are guided missions, meaning a point is given to fly to, but the exact points are unknown to me. Here is a graph of a single flight's lat/lng, shifted to near (0,0) so they are visible on the same time-series graph. I attempted to generate similar data, but there are several constraints and it may take more time to solve than working on the segmenting.

What I know about the data: the graphs nearly always start and end at the same point. Horizontal lines mean the UAV is stationary; these segments are expected, and the beginning and end are always stationary for takeoff and landing. There is some level of noise in the lines from the GPS accuracy, though seemingly not that much. There are a lot of data points, and the number of segments is unknown. The noise I could calculate given the segments and a least-squares fit to the line.

Currently I'm thinking of sampling the data (to decimate it a little), constructing lines, merging the lines whose angle is smaller than x (dependent on the noise), and finding the intersection points of the lines that are left (a rough sketch of this idea is below). Another thought is to look at this problem in the frequency domain: the corners should be quite high frequency, so maybe I could make a custom filter kernel that would let me use a window function and gain efficiency.

EDIT: Rewrote the question for more clarity and less rambling.
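A rough, untested sketch of the decimate-then-merge idea mentioned above (the k and angle_tol values and the segment_track name are placeholders, and the simple atan2 heading ignores lat/lon scaling):

segment_track <- function(lon, lat, k = 20, angle_tol = 10) {
  idx <- seq(1, length(lon), by = k)              # decimate the track
  dx <- diff(lon[idx])
  dy <- diff(lat[idx])
  heading <- atan2(dy, dx) * 180 / pi             # heading of each small segment
  turn <- c(0, abs(diff(heading)))                # heading change between neighbours
  turn <- pmin(turn, 360 - turn)                  # wrap-around difference
  breaks <- which(turn > angle_tol)               # sharp turns start a new segment
  groups <- cumsum(seq_along(heading) %in% breaks)
  split(idx[-length(idx)], groups)                # indices grouped by straight run
}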
Adding plotstick-like arrows to a scatterplot
This is my first post here, though I have read a lot of your Q&A these last 6 months. I'm currently working on ADCP (Acoustic Doppler Current Profiler) data, handled by the "oce" package from Dan Kelley (a little bit of advertising for those who want to deal with oceanographic data in R). I'm not very experienced in R, and I have read the question about abline for levelplot functions, "How to add lines to a levelplot made using lattice (abline somehow not working)?".

What I currently have is a levelplot representing a time series of echo intensity data (from the backscattered signal, which is monitored at the same time as the current), taken in 10 m of depth. This 10 m depth line is split into 25 rows, and each measurement is done along the line (see the code part to obtain an image of what I have; unfortunately, my reputation doesn't allow me to post images).

I then generate another plot, which represents arrows of the current direction: the length of each arrow gives an indication of the current strength, and its orientation is represented (all of this is done by taking the two components of the current intensity (East-West / North-South) and representing the resulting current). There is an arrow drawn for each tick of time (thus, for the 1000 columns of my example data, there are always two components of the current intensity). Those arrows are drawn at the beginning of each measurement cell, thus at each row of my data, allowing a representation of currents for the whole water column. You can see the code part for an "as I have it" representation of currents.

The purpose of this question is to understand how I can superimpose those two representations, drawing my current arrows at each row of the represented data, thus making a representation of current direction, intensity and echo intensity together. I can't find a link to describe what I mean, but this is something I have already seen. I tried with the panel function, which seems to be the best option, but my knowledge of R and of handling this kind of work is small, and I hope one of you may have the time and the knowledge to help me solve this problem much faster than I could. I am, of course, available to answer any questions or give precisions. I may ask a lot more; after working on a large code for 6 months, my thirst for learning is now large.
Code to represent data : Here are some data to represent what I have: U (north/south component of velocity) and V (East/west): U1= c(0.043,0.042,0.043,0.026,0.066,-0.017,-0.014,-0.019,0.024,-0.007,0.000,-0.048,-0.057,-0.101,-0.063,-0.114,-0.132,-0.103,-0.080,-0.098,-0.123,-0.087,-0.071,-0.050,-0.095,-0.047,-0.031,-0.028,-0.015,0.014,-0.019,0.048,0.026,0.039,0.084,0.036,0.071,0.055,0.019,0.059,0.038,0.040,0.013,0.044,0.078,0.040,0.098,0.015,-0.009,0.013,0.038,0.013,0.039,-0.008,0.024,-0.004,0.046,-0.004,-0.079,-0.032,-0.023,-0.015,-0.001,-0.028,-0.030,-0.054,-0.071,-0.046,-0.029,0.012,0.016,0.049,-0.020,0.012,0.016,-0.021,0.017,0.013,-0.008,0.057,0.028,0.056,0.114,0.073,0.078,0.133,0.056,0.057,0.096,0.061,0.096,0.081,0.100,0.092,0.057,0.028,0.055,0.025,0.082,0.087,0.070,-0.010,0.024,-0.025,0.018,0.016,0.007,0.020,-0.031,-0.045,-0.009,-0.060,-0.074,-0.072,-0.082,-0.100,-0.047,-0.089,-0.074,-0.070,-0.070,-0.070,-0.075,-0.070,-0.055,-0.078,-0.039,-0.050,-0.049,0.024,-0.026,-0.021,0.008,-0.026,-0.018,0.002,-0.009,-0.025,0.029,-0.040,-0.006,0.055,0.018,-0.035,-0.011,-0.026,-0.014,-0.006,-0.021,-0.031,-0.030,-0.056,-0.034,-0.026,-0.041,-0.107,-0.069,-0.082,-0.091,-0.096,-0.043,-0.038,-0.056,-0.068,-0.064,-0.042,-0.064,-0.058,0.016,-0.041,0.018,-0.008,0.058,0.006,0.007,0.060,0.011,0.050,-0.028,0.023,0.015,0.083,0.106,0.057,0.096,0.055,0.119,0.145,0.078,0.090,0.110,0.087,0.098,0.092,0.050,0.068,0.042,0.059,0.030,-0.005,-0.005,-0.013,-0.013,-0.016,0.008,-0.045,-0.021,-0.036,0.020,-0.018,-0.032,-0.038,0.021,-0.077,0.003,-0.010,-0.001,-0.024,-0.020,-0.022,-0.029,-0.053,-0.022,-0.007,-0.073,0.013,0.018,0.002,-0.038,0.024,0.025,0.033,0.008,0.016,-0.018,0.023,-0.001,-0.010,0.006,0.053,0.004,0.001,-0.003,0.009,0.019,0.024,0.031,0.024,0.009,-0.009,-0.035,-0.030,-0.031,-0.094,-0.006,-0.052,-0.061,-0.104,-0.098,-0.054,-0.161,-0.110,-0.078,-0.178,-0.052,-0.073,-0.051,-0.065,-0.029,-0.012,-0.053,-0.070,-0.040,-0.056,-0.004,-0.032,-0.065,-0.005,0.036,0.023,0.043,0.078,0.039,0.019,0.061,0.025,0.036,0.036,0.062,0.048,0.073,0.037,0.025,0.000,-0.007,-0.014,-0.050,-0.014,0.007,-0.035,-0.115,-0.039,-0.113,-0.102,-0.109,-0.158,-0.158,-0.133,-0.110,-0.170,-0.124,-0.115,-0.134,-0.097,-0.106,-0.155,-0.168,-0.038,-0.040,-0.074,-0.011,-0.040,-0.003,-0.019,-0.022,-0.006,-0.049,-0.048,-0.039,-0.011,-0.036,-0.001,-0.018,-0.037,-0.001,0.033,0.061,0.054,0.005,0.040,0.045,0.062,0.016,-0.007,-0.005,0.009,0.044,0.029,-0.016,-0.028,-0.021,-0.036,-0.072,-0.138,-0.060,-0.109,-0.064,-0.142,-0.081,-0.032,-0.077,-0.058,-0.035,-0.039,-0.013,0.007,0.007,-0.052,0.024,0.018,0.067,0.015,-0.002,-0.004,0.038,-0.010,0.056) 
V1=c(-0.083,-0.089,-0.042,-0.071,-0.043,-0.026,0.025,0.059,-0.019,0.107,0.049,0.089,0.094,0.090,0.120,0.169,0.173,0.159,0.141,0.157,0.115,0.128,0.154,0.083,0.038,0.081,0.129,0.120,0.112,0.074,0.022,-0.022,-0.028,-0.048,-0.027,-0.056,-0.027,-0.107,-0.020,-0.063,-0.069,-0.019,-0.055,-0.071,-0.027,-0.034,-0.018,-0.089,-0.068,-0.129,-0.034,-0.002,0.011,-0.009,-0.038,-0.013,-0.006,0.027,0.037,0.022,0.087,0.080,0.119,0.085,0.076,0.072,0.029,0.103,0.019,0.020,0.052,0.024,-0.051,-0.024,-0.008,0.011,-0.019,0.023,-0.011,-0.033,-0.101,-0.157,-0.094,-0.099,-0.106,-0.103,-0.139,-0.093,-0.098,-0.083,-0.118,-0.142,-0.155,-0.095,-0.122,-0.072,-0.034,-0.047,-0.036,0.014,0.035,-0.034,-0.012,0.054,0.030,0.060,0.091,0.013,0.049,0.083,0.070,0.127,0.048,0.118,0.123,0.099,0.097,0.074,0.125,0.051,0.107,0.069,0.040,0.102,0.100,0.119,0.087,0.077,0.044,0.091,0.020,0.010,-0.028,0.026,-0.018,-0.020,0.010,0.034,0.005,0.010,0.028,-0.043,0.025,-0.069,-0.003,0.004,-0.001,0.024,0.032,0.076,0.033,0.071,0.000,0.052,0.034,0.058,0.002,0.070,0.025,0.056,0.051,0.080,0.051,0.101,0.009,0.052,0.079,0.035,0.051,0.049,0.064,0.004,0.011,0.005,0.031,-0.021,-0.024,-0.048,-0.011,-0.072,-0.034,-0.020,-0.052,-0.069,-0.088,-0.093,-0.084,-0.143,-0.103,-0.110,-0.124,-0.175,-0.083,-0.117,-0.090,-0.090,-0.040,-0.068,-0.082,-0.082,-0.061,-0.013,-0.029,-0.032,-0.046,-0.031,-0.048,-0.028,-0.034,-0.012,0.006,-0.062,-0.043,0.010,0.036,0.050,0.030,0.084,0.027,0.074,0.082,0.087,0.079,0.031,0.003,0.001,0.038,0.002,-0.038,0.003,0.023,-0.011,0.013,0.003,-0.046,-0.021,-0.050,-0.063,-0.068,-0.085,-0.051,-0.052,-0.065,0.014,-0.016,-0.082,-0.026,-0.032,0.019,-0.026,0.036,-0.005,0.092,0.070,0.045,0.074,0.091,0.122,-0.007,0.094,0.064,0.087,0.063,0.083,0.109,0.062,0.096,0.036,-0.019,0.075,0.052,0.025,0.031,0.078,0.044,-0.018,-0.040,-0.039,-0.140,-0.037,-0.095,-0.056,-0.044,-0.039,-0.086,-0.062,-0.085,-0.023,-0.103,-0.035,-0.067,-0.096,-0.097,-0.060,0.003,-0.051,0.014,-0.002,0.054,0.045,0.073,0.080,0.096,0.104,0.126,0.144,0.136,0.132,0.160,0.155,0.136,0.080,0.144,0.087,0.093,0.103,0.151,0.165,0.146,0.159,0.156,0.002,0.023,-0.019,0.078,0.031,0.038,0.019,0.094,0.018,0.028,0.064,-0.052,-0.034,0.000,-0.074,-0.076,-0.028,-0.048,-0.025,-0.095,-0.098,-0.045,-0.016,-0.030,-0.036,-0.012,0.023,0.038,0.042,0.039,0.073,0.066,0.027,0.016,0.093,0.129,0.138,0.121,0.077,0.046,0.067,0.068,0.023,0.062,0.038,-0.007,0.055,0.006,-0.015,0.008,0.064,0.012,0.004,-0.055,0.018,0.042) 
U2=c(0.022,0.005,-0.022,0.025,-0.014,-0.020,-0.001,-0.021,-0.008,-0.006,-0.056,0.050,-0.068,0.018,-0.106,-0.053,-0.084,-0.082,-0.061,-0.041,-0.057,-0.123,-0.060,-0.029,-0.084,-0.004,0.030,-0.021,-0.036,-0.016,0.006,0.088,0.088,0.079,0.063,0.097,0.020,-0.048,0.046,0.057,0.065,0.042,0.022,0.016,0.041,0.109,0.024,-0.010,-0.084,-0.002,0.004,-0.033,-0.025,-0.020,-0.061,-0.060,-0.043,-0.027,-0.054,-0.054,-0.040,-0.077,-0.043,-0.014,0.030,-0.051,0.001,-0.029,0.008,-0.023,0.015,0.002,-0.001,0.029,0.048,0.081,-0.022,0.040,0.018,0.131,0.059,0.055,0.043,0.027,0.091,0.104,0.101,0.084,0.048,0.057,0.044,0.083,0.063,0.083,0.079,0.042,-0.021,0.017,0.005,0.001,-0.033,0.010,-0.028,-0.035,-0.012,-0.034,-0.055,-0.009,0.001,-0.084,-0.047,-0.020,-0.046,-0.042,-0.058,-0.071,0.013,-0.045,-0.070,0.000,-0.067,-0.090,0.012,-0.013,-0.013,-0.009,-0.063,-0.047,-0.030,0.046,0.026,0.019,0.007,-0.056,-0.062,0.009,-0.019,-0.005,0.003,0.022,-0.006,-0.019,0.020,0.025,0.040,-0.032,0.015,0.019,-0.014,-0.031,-0.047,0.010,-0.058,-0.079,-0.052,-0.044,0.012,-0.039,-0.007,-0.068,-0.095,-0.053,-0.066,-0.056,-0.033,-0.006,0.001,0.010,0.004,0.011,0.013,0.029,-0.011,0.007,0.023,0.087,0.054,0.040,0.013,-0.006,0.076,0.086,0.103,0.121,0.070,0.074,0.067,0.045,0.088,0.041,0.075,0.039,0.043,0.016,0.065,0.056,0.047,-0.002,-0.001,-0.009,-0.029,0.018,0.041,0.002,-0.022,0.003,0.008,0.031,0.003,-0.031,-0.015,0.014,-0.057,-0.043,-0.045,-0.067,-0.040,-0.013,-0.111,-0.067,-0.055,-0.004,-0.070,-0.019,0.009,0.009,0.032,-0.021,0.023,0.123,-0.032,0.040,0.012,0.042,0.038,0.037,-0.007,0.003,0.011,0.090,0.039,0.083,0.023,0.056,0.030,0.042,0.030,-0.046,-0.034,-0.021,-0.076,-0.017,-0.071,-0.053,-0.014,-0.060,-0.038,-0.076,-0.011,-0.005,-0.051,-0.043,-0.032,-0.014,-0.038,-0.081,-0.021,-0.035,0.014,-0.001,0.001,0.003,-0.029,-0.031,0.000,0.048,-0.036,0.034,0.054,0.001,0.046,0.006,0.039,0.015,0.012,0.034,0.022,0.015,0.033,0.037,0.012,0.057,0.001,-0.014,0.012,-0.007,-0.022,-0.002,-0.008,0.043,-0.041,-0.057,-0.006,-0.079,-0.070,-0.038,-0.040,-0.073,-0.045,-0.101,-0.092,-0.046,-0.047,-0.023,-0.028,-0.019,-0.086,-0.047,-0.038,-0.068,-0.017,0.037,-0.010,-0.016,0.010,-0.005,-0.031,0.004,-0.034,0.005,0.006,-0.015,0.017,-0.043,-0.007,-0.009,0.013,0.026,-0.036,0.011,0.047,-0.025,-0.023,0.043,-0.020,-0.003,-0.043,0.000,-0.018,-0.075,-0.045,-0.063,-0.043,-0.055,0.007,-0.063,-0.085,-0.031,0.005,-0.067,-0.059,-0.059,-0.029,-0.014,-0.040,-0.072,-0.018,0.039,-0.006,-0.001,-0.015,0.038,0.038,-0.009,0.026,0.017,0.056) 
V2=c(-0.014,0.001,0.004,-0.002,0.022,0.019,0.023,-0.023,0.030,-0.085,-0.007,-0.027,0.100,0.058,0.108,0.055,0.132,0.115,0.084,0.046,0.102,0.121,0.036,0.019,0.066,0.049,-0.011,0.020,0.023,0.011,0.041,0.009,-0.009,-0.023,-0.036,0.031,0.012,0.026,-0.011,0.009,-0.027,-0.033,-0.054,-0.004,-0.040,-0.048,-0.009,0.023,-0.028,0.022,0.090,0.060,0.040,0.003,-0.011,0.030,0.107,0.025,0.084,0.036,0.074,0.065,0.078,0.011,0.058,0.092,0.083,0.080,0.039,0.000,-0.027,0.035,0.011,0.004,0.023,-0.033,-0.060,-0.049,-0.101,-0.033,-0.105,-0.042,-0.088,-0.086,-0.093,-0.085,-0.028,-0.046,-0.045,-0.052,-0.009,-0.066,-0.073,-0.067,0.011,-0.057,-0.087,-0.066,-0.103,-0.075,0.003,-0.021,0.010,-0.013,0.021,0.020,0.084,0.028,0.127,0.050,0.104,0.097,0.075,0.021,0.057,0.095,0.080,0.077,0.086,0.110,0.054,0.016,0.105,0.065,0.046,0.047,0.072,0.058,0.092,0.063,0.033,0.087,0.036,0.049,0.093,0.008,0.064,0.068,0.040,0.049,0.035,0.042,0.045,0.021,0.056,0.007,0.026,0.067,0.046,0.088,0.084,0.070,0.037,0.079,0.065,0.074,0.077,0.023,0.094,0.061,0.096,0.068,0.067,0.091,0.061,0.069,0.090,0.046,0.057,0.011,-0.018,0.005,0.001,-0.023,-0.087,0.010,0.023,-0.025,-0.040,-0.059,-0.063,-0.075,-0.136,-0.078,-0.102,-0.128,-0.116,-0.091,-0.136,-0.083,-0.115,-0.063,-0.055,-0.080,-0.093,-0.099,-0.053,-0.042,-0.011,-0.034,-0.027,-0.042,-0.022,-0.008,-0.033,-0.039,-0.036,0.019,0.036,-0.002,0.000,-0.021,0.060,0.030,0.073,0.080,0.061,0.046,0.062,0.010,0.034,0.103,0.107,0.016,0.080,0.067,0.007,0.060,0.021,-0.026,0.008,0.051,0.030,0.001,-0.036,-0.047,0.000,0.006,0.006,0.013,0.009,0.019,0.009,-0.086,-0.020,0.018,0.039,0.014,0.011,0.052,0.031,0.095,0.047,0.065,0.114,0.086,0.102,0.037,0.039,0.060,0.024,0.091,0.058,0.065,0.060,0.045,0.031,0.062,0.047,0.043,0.057,0.032,0.057,0.051,0.019,0.056,0.024,-0.003,0.023,-0.013,-0.032,-0.022,-0.064,-0.021,-0.050,-0.063,-0.090,-0.082,-0.076,-0.077,-0.042,-0.060,-0.010,-0.060,-0.069,-0.028,-0.071,-0.046,-0.020,-0.074,0.080,0.071,0.065,0.079,0.065,0.039,0.061,0.154,0.072,0.067,0.133,0.106,0.080,0.047,0.053,0.110,0.080,0.122,0.075,0.052,0.034,0.081,0.118,0.079,0.101,0.053,0.082,0.036,0.033,0.026,0.002,-0.002,0.020,0.087,0.021,0.034,0.003,-0.021,0.016,-0.009,-0.045,-0.043,-0.020,0.027,0.008,-0.006,0.043,0.045,0.014,0.053,0.083,0.113,0.091,0.028,0.060,0.040,0.019,0.114,0.126,0.090,0.046,0.089,0.029,0.030,0.010,0.045,0.040,0.072,-0.033,-0.008,0.014,-0.018,-0.004,-0.037,0.015,-0.021,-0.015) bindistances=c(1.37,1.62,1.87,2.12,2.37,2.62,2.87,3.12,3.37,3.62,3.87,4.12,4.37,4.62,4.87,5.12,5.37,5.62,5.87,6.12,6.37,6.62,6.87,7.12,7.37,7.62,7.87,8.12) Then, as a representation of currents: AA=14 x11() par(mfrow=c(4,1)) plotSticks(x=seq(from=(1), to=(377), by=(1)), u=U1, v=V1, yscale=ysc,xlab='',ylab='',xaxt='n',yaxt='n',col=(rep('black',384))) axis(side=1) plotSticks(x=seq(from=(1), to=(377), by=(1)), u=U2, v=V2, yscale=ysc,xlab='',ylab='',xaxt='n',yaxt='n',col=(rep('black',384))) plotSticks(x=seq(from=(1), to=(377), by=(1)), u=U2, v=V2, yscale=ysc,xlab='',ylab='',xaxt='n',yaxt='n',col=(rep('black',384))) plotSticks(x=seq(from=(1), to=(377), by=(1)), u=U2, v=V2, yscale=ysc,xlab='',ylab='',xaxt='n',yaxt='n',col=(rep('black',384))) In order to simplify the representation, the three last plots are based on the same data.
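Note that the snippet above defines AA = 14 but then passes an undefined ysc to plotSticks; presumably the two were meant to be the same scaling factor. A self-contained version of the stick-plot part, keeping the calls as written but using seq_along to avoid a possible length mismatch in the x and col vectors (and relying on the default black colour), might look like this:

library(oce)

ysc <- 14   # assumed to be the AA scaling factor defined above
x11()
par(mfrow = c(4, 1))
plotSticks(x = seq_along(U1), u = U1, v = V1, yscale = ysc, xlab = '', ylab = '', xaxt = 'n', yaxt = 'n')
axis(side = 1)
plotSticks(x = seq_along(U2), u = U2, v = V2, yscale = ysc, xlab = '', ylab = '', xaxt = 'n', yaxt = 'n')
plotSticks(x = seq_along(U2), u = U2, v = V2, yscale = ysc, xlab = '', ylab = '', xaxt = 'n', yaxt = 'n')
plotSticks(x = seq_along(U2), u = U2, v = V2, yscale = ysc, xlab = '', ylab = '', xaxt = 'n', yaxt = 'n')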
Plotting cosine wave samples in Maple
I'm having trouble with Maple. I have a cosine wave, which I figured out how to plot, but now I have to take samples from that wave and plot those (as dots) on top of the original cosine wave. Here is the question from the assignment: "Produce the samples from Q1 above and plot the result (plot the points on a plot of the cosine wave - use different colours for both, it will look like a cosine wave with dots on it)". The problem is, my samples keep coming out as straight lines at different heights: http://i197.photobucket.com/albums/aa221/Haseo_Ame/Maple.png I'm not sure what I'm doing wrong since I've never used Maple before.
Firstly, try not to build up lists using repeated concatenation (which can incur an O(n^2) cost in resources) if you can use the seq command instead (which incurs an O(n) cost). You should always reconsider when coding like s:=[op(s),...] in a loop.

Next, a point plot needs pairs of x-y values. Your list is just a collection of scalar values, and hence is being interpreted as a collection of constant functions to be plotted. The pairs of x-y values can be in a list of (2-element) lists such as [[x1,y1],...,[xn,yn]].

It's not clear how you want your x-axis scaled, but you could start off with something like this:

s:=[seq([i, 4*cos(2*Pi*i*70/200+Pi/4)],i=0..20)]:
plot(s, style=point);

# s:=[seq([2*Pi*i*70/200+Pi/4, 4*cos(2*Pi*i*70/200+Pi/4)],i=0..20)]:

P.S. Please post source code as text, not as embedded images, so that anyone trying to help needn't type it all in.
Finding a density peak / cluster centrum in 2D grid / point process
I have a dataset with minute-by-minute GPS coordinates recorded by a person's cellphone, i.e. the dataset has 1440 rows with LON/LAT values. Based on the data I would like a point estimate (a lon/lat value) of where the participant's home is. Let's assume that home is the single location where they spend most of their time in a given 24 h interval. Furthermore, the GPS sensor has quite high accuracy most of the time; however, sometimes it is completely off, resulting in gigantic outliers. I think the best way to go about this is to treat it as a point process and use 2D density estimation to find the peak. Is there a native way to do this in R? I looked into kde2d (MASS) but this didn't really seem to do the trick: kde2d creates a 25x25 grid of the data range with density values. However, in my data the person can easily travel 100 miles or more per day, so these cells are generally far too coarse. I could narrow them down and use a much finer grid, but I am sure there must be a better way to get a point estimate.
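For what it's worth, a minimal sketch of the kde2d idea from the question, simply using a much finer grid than the default 25x25 and taking the densest cell as the point estimate (lon, lat and the home_estimate name are placeholders; the huge outliers would probably need to be clipped first so they don't stretch the grid):

library(MASS)

home_estimate <- function(lon, lat, n = 500) {
  dens <- kde2d(lon, lat, n = n)                       # density on an n x n grid
  peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
  c(lon = dens$x[peak[1]], lat = dens$y[peak[2]])      # centre of the densest cell
}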
There are "time spent" functions in the trip package (I'm the author). You can create objects from the track data that understand the underlying track process over time, and simply process the points assuming straight line segments between fixes. If "home" is where the largest value pixel is, i.e. when you break up all the segments based on the time duration and sum them into cells, then it's easy to find it. A "time spent" grid from the tripGrid function is a SpatialGridDataFrame with the standard sp package classes, and a trip object can be composed of one or many tracks.

Using rgdal you can easily transform coordinates to an appropriate map projection if lon/lat is not appropriate for your extent, but it makes no difference to the grid/time-spent calculation of line segments. There is a simple speedfilter to remove fixes that imply movement that is too fast, but that is very simplistic and can introduce new problems; in general, updating or filtering tracks for unlikely movement can be very complicated. (In my experience a basic time-spent gridding gets you as good an estimate as many sophisticated models that just open up new complications.) The filter works with Cartesian or long/lat coordinates, using tools in sp to calculate distances (long/lat is reliable, whereas a poor map projection choice can introduce problems - over short distances like humans on land it's probably no big deal). (The function tripGrid calculates the exact components of the straight line segments using pixellate.psp, but that detail is hidden in the implementation.)

In terms of data preparation, trip is strict about a sensible sequence of times and will prevent you from creating an object if the data have duplicates, are out of order, etc. There is an example of reading data from a text file in ?trip, and a very simple example with (really) dummy data is:

library(trip)
d <- data.frame(x = 1:10, y = rnorm(10), tms = Sys.time() + 1:10, id = gl(1, 5))
coordinates(d) <- ~x+y
tr <- trip(d, c("tms", "id"))
g <- tripGrid(tr)
pt <- coordinates(g)[which.max(g$z), ]
image(g, col = c("transparent", heat.colors(16)))
lines(tr, col = "black")
points(pt[1], pt[2], pch = "+", cex = 2)

That dummy track has no overlapping regions, but it shows that finding the max point in "time spent" is simple enough.
How about using the location that minimises the sum of squared distances to all the events? This might be close to the supremum of any kernel smoothing, if my brain is working right. If your data comprise two clusters (home and work) then I think the location will be in the biggest cluster rather than between them. It's not the same as the simple mean of the x and y coordinates. For an uncertainty on that, jitter your data by whatever your positional uncertainty is (it would be great if you had that value from the GPS; otherwise guess - 50 metres?) and recompute. Do that 100 times, do a kernel smoothing of those locations and find the 95% contour. Not rigorous, and I need to experiment with this minimum distance/kernel supremum thing...
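A quick sketch of this suggestion, with optim() doing the minimisation and the jitter-and-repeat step for the uncertainty (lon and lat are the vectors of fixes; the 50/111000 factor is a rough conversion of the 50 m guess into degrees and is only an assumption):

sum_sq <- function(p, lon, lat) sum((lon - p[1])^2 + (lat - p[2])^2)

centre <- optim(c(mean(lon), mean(lat)), sum_sq, lon = lon, lat = lat)$par

# jitter by the positional uncertainty and recompute 100 times
reps <- replicate(100, {
  jlon <- lon + rnorm(length(lon), sd = 50 / 111000)
  jlat <- lat + rnorm(length(lat), sd = 50 / 111000)
  optim(c(mean(jlon), mean(jlat)), sum_sq, lon = jlon, lat = jlat)$par
})
# reps is a 2 x 100 matrix of jittered estimates; kernel-smooth these and take the 95% contour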
In response to spacedman - I am pretty sure least squares won't work. Least squares is best known for bowing to the demands of outliers, without much weighting to things that are "nearby". This is the opposite of what is desired. The bisquare estimator would probably work better, in my opinion - but I have never used it. I think it also requires some tuning. It's more or less like a least squares estimator up to a certain distance from 0, and then the weighting is constant beyond that. So once a point becomes an outlier, its penalty is constant. We don't want outliers to weigh more and more as we move away from them; we would rather weight them constantly and let the optimization focus on better fitting the points in the vicinity of the cluster.
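For illustration only, a rough sketch of that bisquare idea as an iteratively reweighted location estimate (the tuning radius k, the iteration count and the bisquare_centre name are all guesses, not a tested recipe):

bisquare_centre <- function(lon, lat, k = 0.01, iters = 20) {
  p <- c(mean(lon), mean(lat))                      # start from the plain mean
  for (i in seq_len(iters)) {
    d <- sqrt((lon - p[1])^2 + (lat - p[2])^2)
    w <- ifelse(d < k, (1 - (d / k)^2)^2, 0)        # Tukey bisquare weights: zero beyond k
    p <- c(weighted.mean(lon, w), weighted.mean(lat, w))
  }
  p
}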
Minimising interpolation error between two data sets
In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis). As this happens we are sampling the value at different and unpredictable times, and we are also alternating the sampling between two data sets, indicated by red and blue. When computing the value at any time, we expect that both the red and blue data sets will return similar values. However, as shown in the three smaller boxes, this is not the case. Viewed over time, the values from each data set (red and blue) appear to diverge and then converge about the original value. Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values which come close together and then drift apart between each data point; the latter results in values which remain closer, but where the average error is greater. Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe what you ask is a question that does not have a straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be almost anything, so I think there is no way to ensure the convergence of the interpolations of two sample arrays. That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the errors. For example, if you measure the drag force as a function of the wing velocity, you know the relation is quadratic (a*V^2). Then you can choose a 2nd-order polynomial fit and get a pretty good match between the interpolations of the two series.
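As an illustration of that last point, a minimal sketch in R (t_red, y_red, t_blue and y_blue are hypothetical vectors holding the two sample series):

fit_red  <- lm(y_red  ~ poly(t_red, 2))              # 2nd-order polynomial fit to the red series
fit_blue <- lm(y_blue ~ poly(t_blue, 2))             # and to the blue series
t_grid   <- seq(min(c(t_red, t_blue)), max(c(t_red, t_blue)), length.out = 200)
pred_red  <- predict(fit_red,  newdata = data.frame(t_red  = t_grid))
pred_blue <- predict(fit_blue, newdata = data.frame(t_blue = t_grid))
# if the quadratic model is right, pred_red and pred_blue should agree closely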
Try B-splines: Catmull-Rom interpolates (goes through the data points), while a B-spline does smoothing. For example, for uniformly-spaced data (not your case):

Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6

Of course the interpolated red/blue curves depend on the spacing of the red/blue data points, so they cannot match perfectly.
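Applied to a uniformly-spaced series in R, that 1-4-1 smoothing is just a weighted moving average, for example (data_points is a hypothetical numeric vector):

smoothed <- stats::filter(data_points, c(1, 4, 1) / 6, sides = 2)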
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task. One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines. By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets. You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known. Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets, and I would challenge your original expectation that "when computing the value at any time, we expect that both red and blue data sets will return similar values." I do not expect this. If you assume that you cannot interpolate perfectly - and particularly if the interpolation error is large compared to the errors in the samples - then you are certain to have a continuous error function that exhibits its largest errors farthest (in time) from your sample points. Therefore two data sets that have differing sample points should exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa - if staggered as your points are, this is sure to be true. Thus I would expect what you show, that "viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value." (If you do not have information about the underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if you are looking at information below the Nyquist frequency.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (also known as aliasing). This error, being different in the two datasets, results in different values when you interpolate. Therefore, you need to know the highest frequency B of the original function and then collect samples at a frequency of at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them out before sampling.