How to draw millions of lines fast

The x axis can be treated as 1:n.
The y values are distributed in a limited range, [-1, 1].
I want to draw line segments connecting all the points described by the vectors above:
geom_line(aes(x, y))
Everything works well except for the performance: it takes minutes to render the final image. A sample plot is shown below.
Is there any way to improve the performance?
Thank you for your comments. I did try resampling, but it is very hard for me to do a really "smart" resampling, because we care a lot about the values that fall outside the local mean, which are usually treated as "noise" in many statistical settings. Please allow me to show the problem with an image, though I know it's not encouraged.
The image above is the original, and the one below is the resampled version. In the original image I marked the "important" information that gets lost with arrows.

Thanks a lot for all the comments. Eventually I think I can resolve this by aggregating hundreds of values into one line range.
To be more concrete, assume there are 1M points.
Group them into 10K groups, with 100 points in each group.
Get the min and max values of each group.
For each group, draw a vertical line from min to max.
This aggregation reduces the number of lines to draw by a factor equal to the group size.
Still, it surprises me a little that drawing one line can take tens of microseconds in R. At the very beginning I was wondering whether there was a solution along the lines of "hardware acceleration".
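The aggregation steps above can be sketched roughly as follows. This is a minimal sketch in base R, assuming the signal is stored in a plain numeric vector; the names y, group_size and agg are made up for illustration, and the random data is only a stand-in for the real signal.

```r
set.seed(1)
n <- 1e6
y <- runif(n, -1, 1)                      # stand-in for the real signal (assumption)
group_size <- 100
grp <- ceiling(seq_len(n) / group_size)   # group index for every point: 1..10000

agg <- data.frame(
  x    = (seq_len(max(grp)) - 0.5) * group_size,  # centre of each group on the x axis
  ymin = as.vector(tapply(y, grp, min)),          # lowest value in each group
  ymax = as.vector(tapply(y, grp, max))           # highest value in each group
)

# One vertical segment per group instead of one segment per point
plot(range(agg$x), c(-1, 1), type = "n", xlab = "x", ylab = "y")
segments(agg$x, agg$ymin, agg$x, agg$ymax)
```

With ggplot2 the same data frame could be drawn with geom_linerange(aes(x, ymin = ymin, ymax = ymax)). Because every group keeps its extremes, isolated spikes survive the reduction, which is the property plain resampling lost.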

Related

Detect peaks at beginning and end of x-axis

I've been working on detecting peaks within a data set of thousands of y~x relationships. Thanks to this post, I've been using loess and rollapply to detect peaks by comparing the local maximum to the smooth. Since then, I've been working to optimise the span and w thresholds for the loess and rollapply functions, respectively.
However, I have realised that several of my relationships have a peak at the beginning or the end of the x-axis, and these peaks are of interest to me. They are not being identified. For now, I've tried adding fake values outside the range of my x variable to imitate a peak. For example, if my x values range from -50 to 160, I created x values of -100 and 210 and assigned a y value of 0 to them.
This helped me to identify some of the relationships that have a peak at the beginning or the end, as you can see here:
However, for some of them it does not work.
Apart from the fact that I feel uncomfortable adding 'fake' values to the relationship, the smoothing frequently shifts the location of the peak and, more importantly, I cannot find a solution that reliably detects these beginning or end peaks. Does anyone know how to work out a solution in R?
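One possible workaround, offered only as a sketch and not tested on the asker's data: compare the smooth against a rolling maximum whose window is truncated at the ends of the series rather than dropped, so the first and last points can still qualify as peaks without any fake values. The code below uses base R only; the hand-written rolling maximum stands in for rollapply, and the data, span and w values are assumptions chosen for illustration.

```r
# Hypothetical data: the largest peak sits at the left edge of the x range
x <- seq(-50, 160, by = 2)
y <- exp(-(x + 50) / 40) + 0.3 * exp(-((x - 80) / 15)^2)

fit <- loess(y ~ x, span = 0.3)   # span is a tuning choice (assumption)
smooth <- fitted(fit)

# Rolling maximum with half-width w; the window shrinks near the edges
w <- 5
n <- length(smooth)
roll_max <- vapply(seq_len(n), function(i) {
  max(smooth[max(1, i - w):min(n, i + w)])
}, numeric(1))

# A point is a peak when the smooth equals the maximum of its own window
peaks <- x[smooth == roll_max]
```

The trade-off is that any series that is simply decreasing from its first point will also report an edge "peak", so a minimum-height or prominence check may still be needed.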

segmenting lat/long data graph into lines/vectors

I have lat/lng data from multirotor UAV flights. There are a lot of data points (~13k per flight) and I wish to find line segments in the data. They give me flight speed and direction. I know that most of the flights are guided missions, meaning a point is given to fly to. However, the exact points are unknown to me.
Here is a graph of a single flight, with lat/lng shifted to near (0,0) so that both are visible on the same time-series graph.
I attempted to generate similar data, but there are several constraints and it may take more time to solve that than to work on the segmenting.
The graphs nearly always start and end at the same point.
Horizontal lines mean the UAV is stationary. These segments are expected.
The beginning and end are always stationary for takeoff and landing.
There is some level of noise in the lines due to GPS accuracy, though seemingly not that much.
There are a lot of data points.
The number of segments is unknown.
I could calculate the noise given the segments and a least-squares fit to each line. Currently I'm thinking of sampling the data (to decimate it a little) and constructing lines, then merging neighbouring lines that differ by an angle smaller than x (dependent on the noise) and finding the intersection points of the lines that are left.
Another thought is to look at this problem in the frequency domain. The corners should be quite high frequency. Maybe I could make a custom filter kernel that would let me use a window function and gain efficiency.
EDIT: Rewrote the question for more clarity and less rambling.
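The decimate-and-merge idea described above might look roughly like this for a single coordinate plotted against time. It is a simplified sketch on noise-free synthetic data: the function name find_breaks and the parameters step and tol are made up for illustration, and with real GPS data tol would have to be chosen from the noise level.

```r
# Hypothetical track: stationary, steady movement, stationary again
t <- 0:300
y <- pmin(pmax((t - 100) / 100, 0), 1)

find_breaks <- function(t, y, step = 10, tol = 1e-3) {
  idx <- seq(1, length(t), by = step)       # decimate the series
  slope <- diff(y[idx]) / diff(t[idx])      # slope of each short line
  change <- abs(diff(slope)) > tol          # neighbours that cannot be merged
  t[idx[which(change) + 1]]                 # time of the shared endpoint
}

breaks <- find_breaks(t, y)
```

Neighbouring short lines whose slopes agree within tol are treated as one segment; the remaining slope changes mark the candidate waypoints.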

Adding plotstick-like arrows to a scatterplot

This is my first post here, though I have read a lot of your Q&A over the last 6 months. I'm currently working on ADCP (Acoustic Doppler Current Profiler) data, handled by the "oce" package from Dan Kelley (a little bit of advertising for those who want to deal with oceanographic data in R). I'm not very experienced in R, and I have read the question about abline for levelplot functions, "How to add lines to a levelplot made using lattice (abline somehow not working)?".
What I currently have is a levelplot representing a time series of echo intensity (from the backscattered signal, which is monitored at the same time as the current) taken over 10 m of depth. This 10 m depth line is divided into 25 rows, and a measurement is made at each of them. (See the code part to obtain an image of what I have.)
(Unfortunately, my reputation doesn't allow me to post images.)
I then generate another plot, which represents the current direction with arrows, as follows:
The length of each arrow gives an indication of the current strength.
Its orientation gives the direction (all of this is done by taking the two components of the current intensity, East-West and North-South, and plotting the resulting current).
There is an arrow drawn for each tick of time (thus, for the 1000 columns of my example data, there are always two components of the current intensity).
Those arrows are drawn at the beginning of each measurement cell, thus at each row of my data, which gives a representation of currents for the whole water column.
You can see the code part for an "as I have it" representation of currents.
The purpose of this question is to understand how I can superimpose those two representations, drawing my current arrows at each row of the represented data, so that a single plot shows current direction, current intensity and echo intensity.
I can't find any link to illustrate what I mean, but this is something I have already seen.
I tried the panel function, which seems to be the best option, but my knowledge of R and of handling this kind of work is limited, and I hope one of you may have the time and the knowledge to help me solve this problem much faster than I could.
I am, of course, available to answer any questions or give details. I may ask a lot more; after working on a large code base for 6 months, my thirst for learning is now large.
Code to represent the data:
Here are some data to represent what I have:
U (north/south component of velocity) and V (East/west):
U1= c(0.043,0.042,0.043,0.026,0.066,-0.017,-0.014,-0.019,0.024,-0.007,0.000,-0.048,-0.057,-0.101,-0.063,-0.114,-0.132,-0.103,-0.080,-0.098,-0.123,-0.087,-0.071,-0.050,-0.095,-0.047,-0.031,-0.028,-0.015,0.014,-0.019,0.048,0.026,0.039,0.084,0.036,0.071,0.055,0.019,0.059,0.038,0.040,0.013,0.044,0.078,0.040,0.098,0.015,-0.009,0.013,0.038,0.013,0.039,-0.008,0.024,-0.004,0.046,-0.004,-0.079,-0.032,-0.023,-0.015,-0.001,-0.028,-0.030,-0.054,-0.071,-0.046,-0.029,0.012,0.016,0.049,-0.020,0.012,0.016,-0.021,0.017,0.013,-0.008,0.057,0.028,0.056,0.114,0.073,0.078,0.133,0.056,0.057,0.096,0.061,0.096,0.081,0.100,0.092,0.057,0.028,0.055,0.025,0.082,0.087,0.070,-0.010,0.024,-0.025,0.018,0.016,0.007,0.020,-0.031,-0.045,-0.009,-0.060,-0.074,-0.072,-0.082,-0.100,-0.047,-0.089,-0.074,-0.070,-0.070,-0.070,-0.075,-0.070,-0.055,-0.078,-0.039,-0.050,-0.049,0.024,-0.026,-0.021,0.008,-0.026,-0.018,0.002,-0.009,-0.025,0.029,-0.040,-0.006,0.055,0.018,-0.035,-0.011,-0.026,-0.014,-0.006,-0.021,-0.031,-0.030,-0.056,-0.034,-0.026,-0.041,-0.107,-0.069,-0.082,-0.091,-0.096,-0.043,-0.038,-0.056,-0.068,-0.064,-0.042,-0.064,-0.058,0.016,-0.041,0.018,-0.008,0.058,0.006,0.007,0.060,0.011,0.050,-0.028,0.023,0.015,0.083,0.106,0.057,0.096,0.055,0.119,0.145,0.078,0.090,0.110,0.087,0.098,0.092,0.050,0.068,0.042,0.059,0.030,-0.005,-0.005,-0.013,-0.013,-0.016,0.008,-0.045,-0.021,-0.036,0.020,-0.018,-0.032,-0.038,0.021,-0.077,0.003,-0.010,-0.001,-0.024,-0.020,-0.022,-0.029,-0.053,-0.022,-0.007,-0.073,0.013,0.018,0.002,-0.038,0.024,0.025,0.033,0.008,0.016,-0.018,0.023,-0.001,-0.010,0.006,0.053,0.004,0.001,-0.003,0.009,0.019,0.024,0.031,0.024,0.009,-0.009,-0.035,-0.030,-0.031,-0.094,-0.006,-0.052,-0.061,-0.104,-0.098,-0.054,-0.161,-0.110,-0.078,-0.178,-0.052,-0.073,-0.051,-0.065,-0.029,-0.012,-0.053,-0.070,-0.040,-0.056,-0.004,-0.032,-0.065,-0.005,0.036,0.023,0.043,0.078,0.039,0.019,0.061,0.025,0.036,0.036,0.062,0.048,0.073,0.037,0.025,0.000,-0.007,-0.014,-0.050,-0.014,0.007,-0.035,-0.115,-0.039,-0.113,-0.102,-0.10
9,-0.158,-0.158,-0.133,-0.110,-0.170,-0.124,-0.115,-0.134,-0.097,-0.106,-0.155,-0.168,-0.038,-0.040,-0.074,-0.011,-0.040,-0.003,-0.019,-0.022,-0.006,-0.049,-0.048,-0.039,-0.011,-0.036,-0.001,-0.018,-0.037,-0.001,0.033,0.061,0.054,0.005,0.040,0.045,0.062,0.016,-0.007,-0.005,0.009,0.044,0.029,-0.016,-0.028,-0.021,-0.036,-0.072,-0.138,-0.060,-0.109,-0.064,-0.142,-0.081,-0.032,-0.077,-0.058,-0.035,-0.039,-0.013,0.007,0.007,-0.052,0.024,0.018,0.067,0.015,-0.002,-0.004,0.038,-0.010,0.056)
V1=c(-0.083,-0.089,-0.042,-0.071,-0.043,-0.026,0.025,0.059,-0.019,0.107,0.049,0.089,0.094,0.090,0.120,0.169,0.173,0.159,0.141,0.157,0.115,0.128,0.154,0.083,0.038,0.081,0.129,0.120,0.112,0.074,0.022,-0.022,-0.028,-0.048,-0.027,-0.056,-0.027,-0.107,-0.020,-0.063,-0.069,-0.019,-0.055,-0.071,-0.027,-0.034,-0.018,-0.089,-0.068,-0.129,-0.034,-0.002,0.011,-0.009,-0.038,-0.013,-0.006,0.027,0.037,0.022,0.087,0.080,0.119,0.085,0.076,0.072,0.029,0.103,0.019,0.020,0.052,0.024,-0.051,-0.024,-0.008,0.011,-0.019,0.023,-0.011,-0.033,-0.101,-0.157,-0.094,-0.099,-0.106,-0.103,-0.139,-0.093,-0.098,-0.083,-0.118,-0.142,-0.155,-0.095,-0.122,-0.072,-0.034,-0.047,-0.036,0.014,0.035,-0.034,-0.012,0.054,0.030,0.060,0.091,0.013,0.049,0.083,0.070,0.127,0.048,0.118,0.123,0.099,0.097,0.074,0.125,0.051,0.107,0.069,0.040,0.102,0.100,0.119,0.087,0.077,0.044,0.091,0.020,0.010,-0.028,0.026,-0.018,-0.020,0.010,0.034,0.005,0.010,0.028,-0.043,0.025,-0.069,-0.003,0.004,-0.001,0.024,0.032,0.076,0.033,0.071,0.000,0.052,0.034,0.058,0.002,0.070,0.025,0.056,0.051,0.080,0.051,0.101,0.009,0.052,0.079,0.035,0.051,0.049,0.064,0.004,0.011,0.005,0.031,-0.021,-0.024,-0.048,-0.011,-0.072,-0.034,-0.020,-0.052,-0.069,-0.088,-0.093,-0.084,-0.143,-0.103,-0.110,-0.124,-0.175,-0.083,-0.117,-0.090,-0.090,-0.040,-0.068,-0.082,-0.082,-0.061,-0.013,-0.029,-0.032,-0.046,-0.031,-0.048,-0.028,-0.034,-0.012,0.006,-0.062,-0.043,0.010,0.036,0.050,0.030,0.084,0.027,0.074,0.082,0.087,0.079,0.031,0.003,0.001,0.038,0.002,-0.038,0.003,0.023,-0.011,0.013,0.003,-0.046,-0.021,-0.050,-0.063,-0.068,-0.085,-0.051,-0.052,-0.065,0.014,-0.016,-0.082,-0.026,-0.032,0.019,-0.026,0.036,-0.005,0.092,0.070,0.045,0.074,0.091,0.122,-0.007,0.094,0.064,0.087,0.063,0.083,0.109,0.062,0.096,0.036,-0.019,0.075,0.052,0.025,0.031,0.078,0.044,-0.018,-0.040,-0.039,-0.140,-0.037,-0.095,-0.056,-0.044,-0.039,-0.086,-0.062,-0.085,-0.023,-0.103,-0.035,-0.067,-0.096,-0.097,-0.060,0.003,-0.051,0.014,-0.002,0.054,0.045,0.073,0.080,0.096,0.104,0.126,0.144,0.136,0.132,0.16
0,0.155,0.136,0.080,0.144,0.087,0.093,0.103,0.151,0.165,0.146,0.159,0.156,0.002,0.023,-0.019,0.078,0.031,0.038,0.019,0.094,0.018,0.028,0.064,-0.052,-0.034,0.000,-0.074,-0.076,-0.028,-0.048,-0.025,-0.095,-0.098,-0.045,-0.016,-0.030,-0.036,-0.012,0.023,0.038,0.042,0.039,0.073,0.066,0.027,0.016,0.093,0.129,0.138,0.121,0.077,0.046,0.067,0.068,0.023,0.062,0.038,-0.007,0.055,0.006,-0.015,0.008,0.064,0.012,0.004,-0.055,0.018,0.042)
U2=c(0.022,0.005,-0.022,0.025,-0.014,-0.020,-0.001,-0.021,-0.008,-0.006,-0.056,0.050,-0.068,0.018,-0.106,-0.053,-0.084,-0.082,-0.061,-0.041,-0.057,-0.123,-0.060,-0.029,-0.084,-0.004,0.030,-0.021,-0.036,-0.016,0.006,0.088,0.088,0.079,0.063,0.097,0.020,-0.048,0.046,0.057,0.065,0.042,0.022,0.016,0.041,0.109,0.024,-0.010,-0.084,-0.002,0.004,-0.033,-0.025,-0.020,-0.061,-0.060,-0.043,-0.027,-0.054,-0.054,-0.040,-0.077,-0.043,-0.014,0.030,-0.051,0.001,-0.029,0.008,-0.023,0.015,0.002,-0.001,0.029,0.048,0.081,-0.022,0.040,0.018,0.131,0.059,0.055,0.043,0.027,0.091,0.104,0.101,0.084,0.048,0.057,0.044,0.083,0.063,0.083,0.079,0.042,-0.021,0.017,0.005,0.001,-0.033,0.010,-0.028,-0.035,-0.012,-0.034,-0.055,-0.009,0.001,-0.084,-0.047,-0.020,-0.046,-0.042,-0.058,-0.071,0.013,-0.045,-0.070,0.000,-0.067,-0.090,0.012,-0.013,-0.013,-0.009,-0.063,-0.047,-0.030,0.046,0.026,0.019,0.007,-0.056,-0.062,0.009,-0.019,-0.005,0.003,0.022,-0.006,-0.019,0.020,0.025,0.040,-0.032,0.015,0.019,-0.014,-0.031,-0.047,0.010,-0.058,-0.079,-0.052,-0.044,0.012,-0.039,-0.007,-0.068,-0.095,-0.053,-0.066,-0.056,-0.033,-0.006,0.001,0.010,0.004,0.011,0.013,0.029,-0.011,0.007,0.023,0.087,0.054,0.040,0.013,-0.006,0.076,0.086,0.103,0.121,0.070,0.074,0.067,0.045,0.088,0.041,0.075,0.039,0.043,0.016,0.065,0.056,0.047,-0.002,-0.001,-0.009,-0.029,0.018,0.041,0.002,-0.022,0.003,0.008,0.031,0.003,-0.031,-0.015,0.014,-0.057,-0.043,-0.045,-0.067,-0.040,-0.013,-0.111,-0.067,-0.055,-0.004,-0.070,-0.019,0.009,0.009,0.032,-0.021,0.023,0.123,-0.032,0.040,0.012,0.042,0.038,0.037,-0.007,0.003,0.011,0.090,0.039,0.083,0.023,0.056,0.030,0.042,0.030,-0.046,-0.034,-0.021,-0.076,-0.017,-0.071,-0.053,-0.014,-0.060,-0.038,-0.076,-0.011,-0.005,-0.051,-0.043,-0.032,-0.014,-0.038,-0.081,-0.021,-0.035,0.014,-0.001,0.001,0.003,-0.029,-0.031,0.000,0.048,-0.036,0.034,0.054,0.001,0.046,0.006,0.039,0.015,0.012,0.034,0.022,0.015,0.033,0.037,0.012,0.057,0.001,-0.014,0.012,-0.007,-0.022,-0.002,-0.008,0.043,-0.041,-0.057,-0.006,-0.079,-0.070,-0.038,-0.04
0,-0.073,-0.045,-0.101,-0.092,-0.046,-0.047,-0.023,-0.028,-0.019,-0.086,-0.047,-0.038,-0.068,-0.017,0.037,-0.010,-0.016,0.010,-0.005,-0.031,0.004,-0.034,0.005,0.006,-0.015,0.017,-0.043,-0.007,-0.009,0.013,0.026,-0.036,0.011,0.047,-0.025,-0.023,0.043,-0.020,-0.003,-0.043,0.000,-0.018,-0.075,-0.045,-0.063,-0.043,-0.055,0.007,-0.063,-0.085,-0.031,0.005,-0.067,-0.059,-0.059,-0.029,-0.014,-0.040,-0.072,-0.018,0.039,-0.006,-0.001,-0.015,0.038,0.038,-0.009,0.026,0.017,0.056)
V2=c(-0.014,0.001,0.004,-0.002,0.022,0.019,0.023,-0.023,0.030,-0.085,-0.007,-0.027,0.100,0.058,0.108,0.055,0.132,0.115,0.084,0.046,0.102,0.121,0.036,0.019,0.066,0.049,-0.011,0.020,0.023,0.011,0.041,0.009,-0.009,-0.023,-0.036,0.031,0.012,0.026,-0.011,0.009,-0.027,-0.033,-0.054,-0.004,-0.040,-0.048,-0.009,0.023,-0.028,0.022,0.090,0.060,0.040,0.003,-0.011,0.030,0.107,0.025,0.084,0.036,0.074,0.065,0.078,0.011,0.058,0.092,0.083,0.080,0.039,0.000,-0.027,0.035,0.011,0.004,0.023,-0.033,-0.060,-0.049,-0.101,-0.033,-0.105,-0.042,-0.088,-0.086,-0.093,-0.085,-0.028,-0.046,-0.045,-0.052,-0.009,-0.066,-0.073,-0.067,0.011,-0.057,-0.087,-0.066,-0.103,-0.075,0.003,-0.021,0.010,-0.013,0.021,0.020,0.084,0.028,0.127,0.050,0.104,0.097,0.075,0.021,0.057,0.095,0.080,0.077,0.086,0.110,0.054,0.016,0.105,0.065,0.046,0.047,0.072,0.058,0.092,0.063,0.033,0.087,0.036,0.049,0.093,0.008,0.064,0.068,0.040,0.049,0.035,0.042,0.045,0.021,0.056,0.007,0.026,0.067,0.046,0.088,0.084,0.070,0.037,0.079,0.065,0.074,0.077,0.023,0.094,0.061,0.096,0.068,0.067,0.091,0.061,0.069,0.090,0.046,0.057,0.011,-0.018,0.005,0.001,-0.023,-0.087,0.010,0.023,-0.025,-0.040,-0.059,-0.063,-0.075,-0.136,-0.078,-0.102,-0.128,-0.116,-0.091,-0.136,-0.083,-0.115,-0.063,-0.055,-0.080,-0.093,-0.099,-0.053,-0.042,-0.011,-0.034,-0.027,-0.042,-0.022,-0.008,-0.033,-0.039,-0.036,0.019,0.036,-0.002,0.000,-0.021,0.060,0.030,0.073,0.080,0.061,0.046,0.062,0.010,0.034,0.103,0.107,0.016,0.080,0.067,0.007,0.060,0.021,-0.026,0.008,0.051,0.030,0.001,-0.036,-0.047,0.000,0.006,0.006,0.013,0.009,0.019,0.009,-0.086,-0.020,0.018,0.039,0.014,0.011,0.052,0.031,0.095,0.047,0.065,0.114,0.086,0.102,0.037,0.039,0.060,0.024,0.091,0.058,0.065,0.060,0.045,0.031,0.062,0.047,0.043,0.057,0.032,0.057,0.051,0.019,0.056,0.024,-0.003,0.023,-0.013,-0.032,-0.022,-0.064,-0.021,-0.050,-0.063,-0.090,-0.082,-0.076,-0.077,-0.042,-0.060,-0.010,-0.060,-0.069,-0.028,-0.071,-0.046,-0.020,-0.074,0.080,0.071,0.065,0.079,0.065,0.039,0.061,0.154,0.072,0.067,0.133,0.106,0.080,0.047,0.
053,0.110,0.080,0.122,0.075,0.052,0.034,0.081,0.118,0.079,0.101,0.053,0.082,0.036,0.033,0.026,0.002,-0.002,0.020,0.087,0.021,0.034,0.003,-0.021,0.016,-0.009,-0.045,-0.043,-0.020,0.027,0.008,-0.006,0.043,0.045,0.014,0.053,0.083,0.113,0.091,0.028,0.060,0.040,0.019,0.114,0.126,0.090,0.046,0.089,0.029,0.030,0.010,0.045,0.040,0.072,-0.033,-0.008,0.014,-0.018,-0.004,-0.037,0.015,-0.021,-0.015)
bindistances=c(1.37,1.62,1.87,2.12,2.37,2.62,2.87,3.12,3.37,3.62,3.87,4.12,4.37,4.62,4.87,5.12,5.37,5.62,5.87,6.12,6.37,6.62,6.87,7.12,7.37,7.62,7.87,8.12)
Then, as a representation of currents:
library(oce)  # provides plotSticks()
AA = 14
# ysc (the scale factor passed to yscale) is not defined in this excerpt; set it before running.
x11()
par(mfrow = c(4, 1))
# First panel: U1/V1, with an x axis drawn underneath
plotSticks(x = 1:377, u = U1, v = V1, yscale = ysc,
           xlab = '', ylab = '', xaxt = 'n', yaxt = 'n',
           col = rep('black', 384))
axis(side = 1)
# Remaining three panels: all drawn from U2/V2
for (i in 1:3) {
  plotSticks(x = 1:377, u = U2, v = V2, yscale = ysc,
             xlab = '', ylab = '', xaxt = 'n', yaxt = 'n',
             col = rep('black', 384))
}
To simplify the representation, the last three plots are based on the same data.

Exclude graph values above certain point

I would like to make sure that my web-server response time graphs show a good level of detail in the 0-5k range of the scale. However, there are occasionally metrics above the 5k mark (file downloads), which increase the scale of the graph and make it difficult to see what is going on in the regular range of values.
How do I exclude metric values above 5k from being plotted? Bear in mind that I do not want the metrics themselves to be excluded.
Or perhaps the best thing to do would be to scale down the high points with a log scale, but then I lose the actual scale information, which is quite useful at a glance.
Any help appreciated.

From the Graphite Documentation:
http://graphite.readthedocs.org/en/latest/render_api.html#ymax
Default: The highest value of any of the series displayed
Manually sets the upper bound of the graph. Can be passed any integer
or floating point number.
Example:
&yMax=0.2345
Looks like yMax parameter was only a suggestion at one point. Reported to be strictly enforced as of 0.9.5. For more: https://bugs.launchpad.net/graphite/+bug/412663
Also, from: http://graphite.wikidot.com/url-api-reference
yMin and yMax set the minimum and maximum y-values for the generated
image. A good use of these parameters would be min=0&max=100 when the
value you are graphing is a percentage.
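As a rough illustration of the parameters quoted above, a render URL for the 0-5k case could be assembled like this. The host name and metric path are hypothetical placeholders; only the yMin and yMax parameter names come from the Graphite documentation.

```r
# Hypothetical host and metric name (assumptions, replace with your own)
host   <- "graphite.example.com"
target <- "servers.web01.response_time"

# yMin/yMax pin the y-axis to 0..5000; the stored metrics are left untouched
url <- sprintf("http://%s/render?target=%s&yMin=0&yMax=5000", host, target)
print(url)
```

Values above yMax are still stored; they are simply drawn outside the visible range of the graph.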
Some other finds. Not sure if they're entirely relevant; might be helpful.
graphite-graph-dsl: A small DSL to describe graphite graphs
https://github.com/behrendsj/graphite-graph-dsl
Added ability to define the right y-axis min and max values: https://github.com/behrendsj/graphite-graph-dsl/commit/11e146b0b3eb82faa7c1f5db5af324c81db66144
graphene: Graphene is a realtime dashboard & graphing toolkit based on D3 and Backbone.
https://github.com/jondot/graphene
Define yMax support: https://github.com/jondot/graphene/pull/33

Finding a density peak / cluster centrum in 2D grid / point process

I have a dataset with minute-by-minute GPS coordinates recorded by a person's cellphone, i.e. the dataset has 1440 rows with LON/LAT values. Based on the data, I would like a point estimate (lon/lat value) of where the participant's home is. Let's assume that home is the single location where they spend most of their time in a given 24h interval. Furthermore, the GPS sensor has quite high accuracy most of the time, but sometimes it is completely off, resulting in gigantic outliers.
I think the best way to go about this is to treat it as a point process and use 2D density estimation to find the peak. Is there a native way to do this in R? I looked into kde2d (MASS), but this didn't really seem to do the trick. kde2d creates a 25x25 grid over the data range with density values. However, in my data the person can easily travel 100 miles or more per day, so these blocks are generally too coarse for an estimate. I could narrow them down and use a much larger grid, but I am sure there must be a better way to get a point estimate.
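For reference, the "much larger grid" variant mentioned above could be sketched like this. The coordinates are simulated, and the cluster location, grid size n = 200 and variable names are all assumptions for illustration; kde2d and its n argument are part of MASS.

```r
library(MASS)  # kde2d(); MASS ships with standard R installations

set.seed(1)
# Hypothetical fixes: 1000 near "home" at (10, 50), 440 scattered widely
lon <- c(rnorm(1000, 10, 0.01), runif(440, 9, 11))
lat <- c(rnorm(1000, 50, 0.01), runif(440, 49, 51))

dens <- kde2d(lon, lat, n = 200)   # n sets the grid resolution (default is 25)

# Grid cell with the highest density, converted back to coordinates
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
home <- c(lon = dens$x[[peak[["row"]]]], lat = dens$y[[peak[["col"]]]])
```

The estimate is only as fine as the grid spacing, so the resolution has to be traded against memory and run time when the daily range is large.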

There are "time spent" functions in the trip package (I'm the author). You can create objects from the track data that understand the underlying track process over time, and simply process the points assuming straight line segments between fixes. If "home" is where the largest value pixel is, i.e. when you break up all the segments based on the time duration and sum them into cells, then it's easy to find it. A "time spent" grid from the tripGrid function is a SpatialGridDataFrame with the standard sp package classes, and a trip object can be composed of one or many tracks.
Using rgdal you can easily transform coordinates to an appropriate map projection if lon/lat is not appropriate for your extent, but it makes no difference to the grid/time-spent calculation of line segments.
There is a simple speedfilter to remove fixes that imply movement that is too fast, but it is very simplistic and can introduce new problems; in general, updating or filtering tracks for unlikely movement can be very complicated. (In my experience a basic time-spent gridding gets you as good an estimate as many sophisticated models that just open up new complications.) The filter works with Cartesian or long/lat coordinates, using tools in sp to calculate distances (long/lat is reliable, whereas a poor map projection choice can introduce problems, although over short distances, like humans on land, it's probably no big deal).
(The function tripGrid calculates the exact components of the straight line segments using pixellate.psp, but that detail is hidden in the implementation).
In terms of data preparation, trip is strict about a sensible sequence of times and will prevent you from creating an object if the data have duplicates, are out of order, etc. There is an example of reading data from a text file in ?trip, and a very simple example with (really) dummy data is:
library(trip)
d <- data.frame(x = 1:10, y = rnorm(10), tms = Sys.time() + 1:10, id = gl(1, 5))
coordinates(d) <- ~x+y
tr <- trip(d, c("tms", "id"))
g <- tripGrid(tr)
pt <- coordinates(g)[which.max(g$z), ]
image(g, col = c("transparent", heat.colors(16)))
lines(tr, col = "black")
points(pt[1], pt[2], pch = "+", cex = 2)
That dummy track has no overlapping regions, but it shows that finding the max point in "time spent" is simple enough.

How about using the location that minimises the sum squared distance to all the events? This might be close to the supremum of any kernel smoothing if my brain is working right.
If your data comprise two clusters (home and work), then I think the location will be in the biggest cluster rather than between them. It's not the same as the simple mean of the x and y coordinates.
For an uncertainty on that, jitter your data by whatever your positional uncertainty is (would be great if you had that value from the GPS, otherwise guess - 50 metres?) and recompute. Do that 100 times, do a kernel smoothing of those locations and find the 95% contour.
Not rigorous, and I need to experiment with this minimum distance/kernel supremum thing...
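A quick numerical experiment along these lines, sketched on simulated data with made-up names and values: for plain Euclidean distance, the point minimising the summed squared distances coincides with the coordinate-wise mean, whereas minimising the summed unsquared distances gives the geometric median, which stays much closer to the dominant cluster.

```r
set.seed(42)
# Hypothetical projected coordinates in metres: a tight cluster plus far outliers
pts <- rbind(cbind(rnorm(200, 0, 20), rnorm(200, 0, 20)),
             cbind(runif(10, 5e4, 1e5), runif(10, 5e4, 1e5)))

# Objective functions: summed squared distance and summed plain distance
sum_sq   <- function(p) sum((pts[, 1] - p[1])^2 + (pts[, 2] - p[2])^2)
sum_dist <- function(p) sum(sqrt((pts[, 1] - p[1])^2 + (pts[, 2] - p[2])^2))

start <- apply(pts, 2, median)                       # robust starting point
least_squares <- optim(start, sum_sq, method = "BFGS")$par
geo_median    <- optim(start, sum_dist, method = "BFGS")$par

rbind(mean = colMeans(pts), least_squares, geo_median)
```

On data like this the least-squares location is dragged kilometres toward the outliers, while the geometric median remains within the cluster.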

In response to spacedman - I am pretty sure least squares won't work. Least squares is best known for bowing to the demands of outliers, without much weighting to things that are "nearby". This is the opposite of what is desired.
The bisquare estimator would probably work better, in my opinion - but I have never used it. I think it also requires some tuning.
It's more or less like a least squares estimator up to a certain distance from 0, and then the penalty is constant beyond that. So once a point becomes an outlier, its penalty is constant. We don't want outliers to weigh more and more as we move away from them; we would rather give them a constant weight and let the optimization focus on better fitting the points in the vicinity of the cluster.
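The loss described above can be written down directly. This is a sketch of Tukey's bisquare (biweight) loss; the function name bisquare_rho is made up, and the tuning constant k = 4.685 is a commonly used default for standardised residuals, which would need rescaling for distances in metres.

```r
# Tukey's bisquare loss: roughly r^2 / 2 near zero, flat at k^2 / 6 beyond |r| = k
bisquare_rho <- function(r, k = 4.685) {
  ifelse(abs(r) <= k,
         (k^2 / 6) * (1 - (1 - (r / k)^2)^3),
         k^2 / 6)
}

# Small residuals are penalised almost quadratically; large ones all cost the same
bisquare_rho(c(0, 0.1, 1, 5, 50))
```

Because the penalty is capped, moving an outlier even further away does not change the objective, which is the behaviour asked for in the question.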
