Plotting cosine wave samples in Maple

I'm having trouble with Maple.
I have a cosine wave, which I figured out how to plot, but now I have to take samples from that wave and plot them (as dots) on top of the original cosine wave.
Here is the question from the assignment:
"Produce the samples from Q1 above and plot the result (plot the points on a plot of the cosine wave - use different colours for both, it will look like a cosine wave with dots on it)"
The problem is that my samples keep coming out as straight lines at different heights:
http://i197.photobucket.com/albums/aa221/Haseo_Ame/Maple.png
I'm not sure what I'm doing wrong, since I've never used Maple before.

Firstly, try not to build up lists by repeated concatenation (which can incur O(n^2) cost in time and memory) when you can use the seq command instead (which incurs only O(n) cost). Whenever you find yourself writing s:=[op(s),...] in a loop, reconsider.
Next, a point-plot needs pairs of x-y values. Your list is just a collection of scalar values, and hence is being interpreted as a collection of constant functions to be plotted.
The pairs of x-y values can be in a list of (2-element) lists such as [[x1,y1],...,[xn,yn]].
It's not clear how you want your x-axis scaled, but you could start off with something like this:
s := [seq([i, 4*cos(2*Pi*i*70/200+Pi/4)], i = 0..20)]:
plot(s, style = point);
# alternative, with the cosine's argument on the x-axis:
# s := [seq([2*Pi*i*70/200+Pi/4, 4*cos(2*Pi*i*70/200+Pi/4)], i = 0..20)]:
P.S. Please post source code as text, not as an embedded image, so that anyone trying to help needn't type it all in.

Related

Detect peaks at beginning and end of x-axis

I've been working on detecting peaks within a data set of thousands of y~x relationships. Thanks to this post, I've been using loess and rollapply to detect peaks by comparing the local maximum to the smooth. Since then, I've been working to optimise the span and w thresholds for the loess and rollapply functions, respectively.
However, I have realised that several of my relationships have a peak at the beginning or the end of the x-axis, and those peaks are of interest to me; they are not being identified. For now, I've tried adding fake values outside my x-variable range to imitate a peak: for example, if my x values range from -50 to 160, I created x values of -100 and 210 and assigned a y value of 0 to them.
This helped me identify some of the relationships that have a peak at the beginning or the end, but for others it does not work.
Besides my discomfort at adding 'fake' values to the relationship, the smoothing frequently shifts the location of the peak, and more importantly, I still cannot find a solution that detects these beginning-or-end peaks. Does anyone know how to work out a solution that works in R?
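For what it's worth, here is a minimal R sketch of detecting endpoint peaks without fake padding points (untested against your data; x, y, span and w are stand-ins for your own variables): interior peaks come from sign changes of the smoothed slope, and an endpoint counts as a peak when it dominates its first w interior neighbours.
find_peaks <- function(x, y, span = 0.1, w = 5) {
  # x is assumed sorted; s is the loess smooth evaluated at the sample points
  s <- predict(loess(y ~ x, span = span))
  n <- length(s)
  peaks <- which(diff(sign(diff(s))) == -2) + 1   # interior local maxima
  # endpoint peaks: compare each end against its first w interior neighbours
  if (s[1] > max(s[2:(1 + w)]))       peaks <- c(1, peaks)
  if (s[n] > max(s[(n - w):(n - 1)])) peaks <- c(peaks, n)
  x[peaks]
}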

Adding plotstick-like arrows to a scatterplot

This is my first post here, though I have read a lot of the Q&A over the last six months. I'm currently working on ADCP (Acoustic Doppler Current Profiler) data, handled with the "oce" package from Dan Kelley (a little advertising for those who want to work with oceanographic data in R). I'm not very experienced in R, and I have read the question about abline for levelplot functions, "How to add lines to a levelplot made using lattice (abline somehow not working)?".
What I currently have is a levelplot representing a time series of echo-intensity data (from the backscattered signal, which is monitored at the same time as the current), taken over 10 m of depth; this 10 m depth line is divided into 25 rows, with a measurement made along each one (see the code section to obtain an image of what I have).
(unfortunately, my reputation doesn't allow me to post images).
I then generate another plot, which represents the current direction with arrows:
The length of each arrow indicates the current strength.
Its orientation gives the current direction (both are obtained by taking the two components of the current velocity (east-west / north-south) and drawing the resulting vector).
One arrow is drawn for each time step (so for the 1000 columns of my example data, there are always two components of the current velocity).
The arrows are drawn at the start of each measurement cell, i.e. at each row of my data, giving a representation of the currents over the whole water column.
See the code section for an "as I have it" representation of the currents.
The purpose of this question is to understand how I can superimpose those two representations, drawing the current arrows at each row of the plotted data, so that current direction, current strength and echo intensity are all represented together.
I can't find a link to illustrate what I mean, but it is something I have seen before.
I tried the panel function, which seems to be the best option, but my knowledge of R and of this kind of work is limited, and I hope one of you may have the time and knowledge to help me solve this problem far faster than I could alone (see the sketch after the code below).
I am, of course, available to answer any questions or give more details. After working on a large piece of code for six months, my thirst for learning is now large.
Code to represent the data - here are some data representing what I have:
U (north/south component of the velocity) and V (east/west):
U1= c(0.043,0.042,0.043,0.026,0.066,-0.017,-0.014,-0.019,0.024,-0.007,0.000,-0.048,-0.057,-0.101,-0.063,-0.114,-0.132,-0.103,-0.080,-0.098,-0.123,-0.087,-0.071,-0.050,-0.095,-0.047,-0.031,-0.028,-0.015,0.014,-0.019,0.048,0.026,0.039,0.084,0.036,0.071,0.055,0.019,0.059,0.038,0.040,0.013,0.044,0.078,0.040,0.098,0.015,-0.009,0.013,0.038,0.013,0.039,-0.008,0.024,-0.004,0.046,-0.004,-0.079,-0.032,-0.023,-0.015,-0.001,-0.028,-0.030,-0.054,-0.071,-0.046,-0.029,0.012,0.016,0.049,-0.020,0.012,0.016,-0.021,0.017,0.013,-0.008,0.057,0.028,0.056,0.114,0.073,0.078,0.133,0.056,0.057,0.096,0.061,0.096,0.081,0.100,0.092,0.057,0.028,0.055,0.025,0.082,0.087,0.070,-0.010,0.024,-0.025,0.018,0.016,0.007,0.020,-0.031,-0.045,-0.009,-0.060,-0.074,-0.072,-0.082,-0.100,-0.047,-0.089,-0.074,-0.070,-0.070,-0.070,-0.075,-0.070,-0.055,-0.078,-0.039,-0.050,-0.049,0.024,-0.026,-0.021,0.008,-0.026,-0.018,0.002,-0.009,-0.025,0.029,-0.040,-0.006,0.055,0.018,-0.035,-0.011,-0.026,-0.014,-0.006,-0.021,-0.031,-0.030,-0.056,-0.034,-0.026,-0.041,-0.107,-0.069,-0.082,-0.091,-0.096,-0.043,-0.038,-0.056,-0.068,-0.064,-0.042,-0.064,-0.058,0.016,-0.041,0.018,-0.008,0.058,0.006,0.007,0.060,0.011,0.050,-0.028,0.023,0.015,0.083,0.106,0.057,0.096,0.055,0.119,0.145,0.078,0.090,0.110,0.087,0.098,0.092,0.050,0.068,0.042,0.059,0.030,-0.005,-0.005,-0.013,-0.013,-0.016,0.008,-0.045,-0.021,-0.036,0.020,-0.018,-0.032,-0.038,0.021,-0.077,0.003,-0.010,-0.001,-0.024,-0.020,-0.022,-0.029,-0.053,-0.022,-0.007,-0.073,0.013,0.018,0.002,-0.038,0.024,0.025,0.033,0.008,0.016,-0.018,0.023,-0.001,-0.010,0.006,0.053,0.004,0.001,-0.003,0.009,0.019,0.024,0.031,0.024,0.009,-0.009,-0.035,-0.030,-0.031,-0.094,-0.006,-0.052,-0.061,-0.104,-0.098,-0.054,-0.161,-0.110,-0.078,-0.178,-0.052,-0.073,-0.051,-0.065,-0.029,-0.012,-0.053,-0.070,-0.040,-0.056,-0.004,-0.032,-0.065,-0.005,0.036,0.023,0.043,0.078,0.039,0.019,0.061,0.025,0.036,0.036,0.062,0.048,0.073,0.037,0.025,0.000,-0.007,-0.014,-0.050,-0.014,0.007,-0.035,-0.115,-0.039,-0.113,-0.102,-0.109,-0.158,-0.158,-0.133,-0.110,-0.170,-0.124,-0.115,-0.134,-0.097,-0.106,-0.155,-0.168,-0.038,-0.040,-0.074,-0.011,-0.040,-0.003,-0.019,-0.022,-0.006,-0.049,-0.048,-0.039,-0.011,-0.036,-0.001,-0.018,-0.037,-0.001,0.033,0.061,0.054,0.005,0.040,0.045,0.062,0.016,-0.007,-0.005,0.009,0.044,0.029,-0.016,-0.028,-0.021,-0.036,-0.072,-0.138,-0.060,-0.109,-0.064,-0.142,-0.081,-0.032,-0.077,-0.058,-0.035,-0.039,-0.013,0.007,0.007,-0.052,0.024,0.018,0.067,0.015,-0.002,-0.004,0.038,-0.010,0.056)
V1=c(-0.083,-0.089,-0.042,-0.071,-0.043,-0.026,0.025,0.059,-0.019,0.107,0.049,0.089,0.094,0.090,0.120,0.169,0.173,0.159,0.141,0.157,0.115,0.128,0.154,0.083,0.038,0.081,0.129,0.120,0.112,0.074,0.022,-0.022,-0.028,-0.048,-0.027,-0.056,-0.027,-0.107,-0.020,-0.063,-0.069,-0.019,-0.055,-0.071,-0.027,-0.034,-0.018,-0.089,-0.068,-0.129,-0.034,-0.002,0.011,-0.009,-0.038,-0.013,-0.006,0.027,0.037,0.022,0.087,0.080,0.119,0.085,0.076,0.072,0.029,0.103,0.019,0.020,0.052,0.024,-0.051,-0.024,-0.008,0.011,-0.019,0.023,-0.011,-0.033,-0.101,-0.157,-0.094,-0.099,-0.106,-0.103,-0.139,-0.093,-0.098,-0.083,-0.118,-0.142,-0.155,-0.095,-0.122,-0.072,-0.034,-0.047,-0.036,0.014,0.035,-0.034,-0.012,0.054,0.030,0.060,0.091,0.013,0.049,0.083,0.070,0.127,0.048,0.118,0.123,0.099,0.097,0.074,0.125,0.051,0.107,0.069,0.040,0.102,0.100,0.119,0.087,0.077,0.044,0.091,0.020,0.010,-0.028,0.026,-0.018,-0.020,0.010,0.034,0.005,0.010,0.028,-0.043,0.025,-0.069,-0.003,0.004,-0.001,0.024,0.032,0.076,0.033,0.071,0.000,0.052,0.034,0.058,0.002,0.070,0.025,0.056,0.051,0.080,0.051,0.101,0.009,0.052,0.079,0.035,0.051,0.049,0.064,0.004,0.011,0.005,0.031,-0.021,-0.024,-0.048,-0.011,-0.072,-0.034,-0.020,-0.052,-0.069,-0.088,-0.093,-0.084,-0.143,-0.103,-0.110,-0.124,-0.175,-0.083,-0.117,-0.090,-0.090,-0.040,-0.068,-0.082,-0.082,-0.061,-0.013,-0.029,-0.032,-0.046,-0.031,-0.048,-0.028,-0.034,-0.012,0.006,-0.062,-0.043,0.010,0.036,0.050,0.030,0.084,0.027,0.074,0.082,0.087,0.079,0.031,0.003,0.001,0.038,0.002,-0.038,0.003,0.023,-0.011,0.013,0.003,-0.046,-0.021,-0.050,-0.063,-0.068,-0.085,-0.051,-0.052,-0.065,0.014,-0.016,-0.082,-0.026,-0.032,0.019,-0.026,0.036,-0.005,0.092,0.070,0.045,0.074,0.091,0.122,-0.007,0.094,0.064,0.087,0.063,0.083,0.109,0.062,0.096,0.036,-0.019,0.075,0.052,0.025,0.031,0.078,0.044,-0.018,-0.040,-0.039,-0.140,-0.037,-0.095,-0.056,-0.044,-0.039,-0.086,-0.062,-0.085,-0.023,-0.103,-0.035,-0.067,-0.096,-0.097,-0.060,0.003,-0.051,0.014,-0.002,0.054,0.045,0.073,0.080,0.096,0.104,0.126,0.144,0.136,0.132,0.160,0.155,0.136,0.080,0.144,0.087,0.093,0.103,0.151,0.165,0.146,0.159,0.156,0.002,0.023,-0.019,0.078,0.031,0.038,0.019,0.094,0.018,0.028,0.064,-0.052,-0.034,0.000,-0.074,-0.076,-0.028,-0.048,-0.025,-0.095,-0.098,-0.045,-0.016,-0.030,-0.036,-0.012,0.023,0.038,0.042,0.039,0.073,0.066,0.027,0.016,0.093,0.129,0.138,0.121,0.077,0.046,0.067,0.068,0.023,0.062,0.038,-0.007,0.055,0.006,-0.015,0.008,0.064,0.012,0.004,-0.055,0.018,0.042)
U2=c(0.022,0.005,-0.022,0.025,-0.014,-0.020,-0.001,-0.021,-0.008,-0.006,-0.056,0.050,-0.068,0.018,-0.106,-0.053,-0.084,-0.082,-0.061,-0.041,-0.057,-0.123,-0.060,-0.029,-0.084,-0.004,0.030,-0.021,-0.036,-0.016,0.006,0.088,0.088,0.079,0.063,0.097,0.020,-0.048,0.046,0.057,0.065,0.042,0.022,0.016,0.041,0.109,0.024,-0.010,-0.084,-0.002,0.004,-0.033,-0.025,-0.020,-0.061,-0.060,-0.043,-0.027,-0.054,-0.054,-0.040,-0.077,-0.043,-0.014,0.030,-0.051,0.001,-0.029,0.008,-0.023,0.015,0.002,-0.001,0.029,0.048,0.081,-0.022,0.040,0.018,0.131,0.059,0.055,0.043,0.027,0.091,0.104,0.101,0.084,0.048,0.057,0.044,0.083,0.063,0.083,0.079,0.042,-0.021,0.017,0.005,0.001,-0.033,0.010,-0.028,-0.035,-0.012,-0.034,-0.055,-0.009,0.001,-0.084,-0.047,-0.020,-0.046,-0.042,-0.058,-0.071,0.013,-0.045,-0.070,0.000,-0.067,-0.090,0.012,-0.013,-0.013,-0.009,-0.063,-0.047,-0.030,0.046,0.026,0.019,0.007,-0.056,-0.062,0.009,-0.019,-0.005,0.003,0.022,-0.006,-0.019,0.020,0.025,0.040,-0.032,0.015,0.019,-0.014,-0.031,-0.047,0.010,-0.058,-0.079,-0.052,-0.044,0.012,-0.039,-0.007,-0.068,-0.095,-0.053,-0.066,-0.056,-0.033,-0.006,0.001,0.010,0.004,0.011,0.013,0.029,-0.011,0.007,0.023,0.087,0.054,0.040,0.013,-0.006,0.076,0.086,0.103,0.121,0.070,0.074,0.067,0.045,0.088,0.041,0.075,0.039,0.043,0.016,0.065,0.056,0.047,-0.002,-0.001,-0.009,-0.029,0.018,0.041,0.002,-0.022,0.003,0.008,0.031,0.003,-0.031,-0.015,0.014,-0.057,-0.043,-0.045,-0.067,-0.040,-0.013,-0.111,-0.067,-0.055,-0.004,-0.070,-0.019,0.009,0.009,0.032,-0.021,0.023,0.123,-0.032,0.040,0.012,0.042,0.038,0.037,-0.007,0.003,0.011,0.090,0.039,0.083,0.023,0.056,0.030,0.042,0.030,-0.046,-0.034,-0.021,-0.076,-0.017,-0.071,-0.053,-0.014,-0.060,-0.038,-0.076,-0.011,-0.005,-0.051,-0.043,-0.032,-0.014,-0.038,-0.081,-0.021,-0.035,0.014,-0.001,0.001,0.003,-0.029,-0.031,0.000,0.048,-0.036,0.034,0.054,0.001,0.046,0.006,0.039,0.015,0.012,0.034,0.022,0.015,0.033,0.037,0.012,0.057,0.001,-0.014,0.012,-0.007,-0.022,-0.002,-0.008,0.043,-0.041,-0.057,-0.006,-0.079,-0.070,-0.038,-0.040,-0.073,-0.045,-0.101,-0.092,-0.046,-0.047,-0.023,-0.028,-0.019,-0.086,-0.047,-0.038,-0.068,-0.017,0.037,-0.010,-0.016,0.010,-0.005,-0.031,0.004,-0.034,0.005,0.006,-0.015,0.017,-0.043,-0.007,-0.009,0.013,0.026,-0.036,0.011,0.047,-0.025,-0.023,0.043,-0.020,-0.003,-0.043,0.000,-0.018,-0.075,-0.045,-0.063,-0.043,-0.055,0.007,-0.063,-0.085,-0.031,0.005,-0.067,-0.059,-0.059,-0.029,-0.014,-0.040,-0.072,-0.018,0.039,-0.006,-0.001,-0.015,0.038,0.038,-0.009,0.026,0.017,0.056)
V2=c(-0.014,0.001,0.004,-0.002,0.022,0.019,0.023,-0.023,0.030,-0.085,-0.007,-0.027,0.100,0.058,0.108,0.055,0.132,0.115,0.084,0.046,0.102,0.121,0.036,0.019,0.066,0.049,-0.011,0.020,0.023,0.011,0.041,0.009,-0.009,-0.023,-0.036,0.031,0.012,0.026,-0.011,0.009,-0.027,-0.033,-0.054,-0.004,-0.040,-0.048,-0.009,0.023,-0.028,0.022,0.090,0.060,0.040,0.003,-0.011,0.030,0.107,0.025,0.084,0.036,0.074,0.065,0.078,0.011,0.058,0.092,0.083,0.080,0.039,0.000,-0.027,0.035,0.011,0.004,0.023,-0.033,-0.060,-0.049,-0.101,-0.033,-0.105,-0.042,-0.088,-0.086,-0.093,-0.085,-0.028,-0.046,-0.045,-0.052,-0.009,-0.066,-0.073,-0.067,0.011,-0.057,-0.087,-0.066,-0.103,-0.075,0.003,-0.021,0.010,-0.013,0.021,0.020,0.084,0.028,0.127,0.050,0.104,0.097,0.075,0.021,0.057,0.095,0.080,0.077,0.086,0.110,0.054,0.016,0.105,0.065,0.046,0.047,0.072,0.058,0.092,0.063,0.033,0.087,0.036,0.049,0.093,0.008,0.064,0.068,0.040,0.049,0.035,0.042,0.045,0.021,0.056,0.007,0.026,0.067,0.046,0.088,0.084,0.070,0.037,0.079,0.065,0.074,0.077,0.023,0.094,0.061,0.096,0.068,0.067,0.091,0.061,0.069,0.090,0.046,0.057,0.011,-0.018,0.005,0.001,-0.023,-0.087,0.010,0.023,-0.025,-0.040,-0.059,-0.063,-0.075,-0.136,-0.078,-0.102,-0.128,-0.116,-0.091,-0.136,-0.083,-0.115,-0.063,-0.055,-0.080,-0.093,-0.099,-0.053,-0.042,-0.011,-0.034,-0.027,-0.042,-0.022,-0.008,-0.033,-0.039,-0.036,0.019,0.036,-0.002,0.000,-0.021,0.060,0.030,0.073,0.080,0.061,0.046,0.062,0.010,0.034,0.103,0.107,0.016,0.080,0.067,0.007,0.060,0.021,-0.026,0.008,0.051,0.030,0.001,-0.036,-0.047,0.000,0.006,0.006,0.013,0.009,0.019,0.009,-0.086,-0.020,0.018,0.039,0.014,0.011,0.052,0.031,0.095,0.047,0.065,0.114,0.086,0.102,0.037,0.039,0.060,0.024,0.091,0.058,0.065,0.060,0.045,0.031,0.062,0.047,0.043,0.057,0.032,0.057,0.051,0.019,0.056,0.024,-0.003,0.023,-0.013,-0.032,-0.022,-0.064,-0.021,-0.050,-0.063,-0.090,-0.082,-0.076,-0.077,-0.042,-0.060,-0.010,-0.060,-0.069,-0.028,-0.071,-0.046,-0.020,-0.074,0.080,0.071,0.065,0.079,0.065,0.039,0.061,0.154,0.072,0.067,0.133,0.106,0.080,0.047,0.053,0.110,0.080,0.122,0.075,0.052,0.034,0.081,0.118,0.079,0.101,0.053,0.082,0.036,0.033,0.026,0.002,-0.002,0.020,0.087,0.021,0.034,0.003,-0.021,0.016,-0.009,-0.045,-0.043,-0.020,0.027,0.008,-0.006,0.043,0.045,0.014,0.053,0.083,0.113,0.091,0.028,0.060,0.040,0.019,0.114,0.126,0.090,0.046,0.089,0.029,0.030,0.010,0.045,0.040,0.072,-0.033,-0.008,0.014,-0.018,-0.004,-0.037,0.015,-0.021,-0.015)
bindistances=c(1.37,1.62,1.87,2.12,2.37,2.62,2.87,3.12,3.37,3.62,3.87,4.12,4.37,4.62,4.87,5.12,5.37,5.62,5.87,6.12,6.37,6.62,6.87,7.12,7.37,7.62,7.87,8.12)
Then, as a representation of currents:
ysc <- 14   # y scale for the sticks (the original set AA=14 but then used ysc; assuming they are the same)
x11()
par(mfrow = c(4, 1))   # four stacked stick plots
plotSticks(x = seq(1, 377), u = U1, v = V1, yscale = ysc,
           xlab = '', ylab = '', xaxt = 'n', yaxt = 'n', col = rep('black', 384))
axis(side = 1)
plotSticks(x = seq(1, 377), u = U2, v = V2, yscale = ysc,
           xlab = '', ylab = '', xaxt = 'n', yaxt = 'n', col = rep('black', 384))
plotSticks(x = seq(1, 377), u = U2, v = V2, yscale = ysc,
           xlab = '', ylab = '', xaxt = 'n', yaxt = 'n', col = rep('black', 384))
plotSticks(x = seq(1, 377), u = U2, v = V2, yscale = ysc,
           xlab = '', ylab = '', xaxt = 'n', yaxt = 'n', col = rep('black', 384))
To simplify the representation, the last three plots are based on the same data.
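One way to attack the superimposition question above is indeed a lattice panel function: draw the levelplot first, then add the sticks with panel.arrows inside the same panel. The sketch below uses a placeholder echo matrix (random numbers standing in for the real backscatter data) and an arbitrary arrow scale; only the first depth bin with the U1/V1 series is drawn, and the other bins would be added the same way.
library(lattice)
nt <- length(U1)                                             # number of time steps
echo <- matrix(rnorm(nt * length(bindistances)), nrow = nt)  # placeholder echo data
sc <- 5                                                      # arrow length scale, tune by eye
levelplot(echo, row.values = 1:nt, column.values = bindistances,
          xlab = "time step", ylab = "depth (m)",
          panel = function(...) {
            panel.levelplot(...)      # echo-intensity background
            # one stick per time step at the first bin's depth; repeat
            # with the other bins' U/V series for the whole water column
            panel.arrows(x0 = 1:nt, y0 = bindistances[1],
                         x1 = 1:nt + sc * U1, y1 = bindistances[1] + sc * V1,
                         length = 0.05, col = "black")
          })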

Finding a density peak / cluster centrum in 2D grid / point process

I have a dataset with minute-by-minute GPS coordinates recorded by a person's cellphone, i.e. the dataset has 1440 rows with LON/LAT values. Based on these data I would like a point estimate (lon/lat value) of where the participant's home is. Let's assume that home is the single location where they spend most of their time in a given 24 h interval. Furthermore, the GPS sensor has quite high accuracy most of the time, but sometimes it is completely off, resulting in gigantic outliers.
I think the best way to go about this is to treat it as a point process and use 2D density estimation to find the peak. Is there a native way to do this in R? I looked into kde2d (MASS), but it didn't really seem to do the trick. kde2d creates a 25x25 grid over the data range with density values; however, in my data the person can easily travel 100 miles or more per day, so these cells generally give too coarse an estimate. I could narrow them down and use a much finer grid, but I am sure there must be a better way to get a point estimate.
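(For reference, kde2d does take a grid-size argument, so the density route can be pushed to a much finer grid directly; a quick sketch, assuming hypothetical planar lon/lat vectors and ignoring projection issues:)
library(MASS)
k <- kde2d(lon, lat, n = 500)                     # 500 x 500 grid instead of 25 x 25
i <- which(k$z == max(k$z), arr.ind = TRUE)[1, ]  # densest cell
home <- c(lon = k$x[i[1]], lat = k$y[i[2]])       # crude point estimate of 'home'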
There are "time spent" functions in the trip package (I'm the author). You can create objects from the track data that understand the underlying track process over time, and simply process the points assuming straight line segments between fixes. If "home" is where the largest value pixel is, i.e. when you break up all the segments based on the time duration and sum them into cells, then it's easy to find it. A "time spent" grid from the tripGrid function is a SpatialGridDataFrame with the standard sp package classes, and a trip object can be composed of one or many tracks.
Using rgdal you can easily transform coordinates to an appropriate map projection if lon/lat is not appropriate for your extent, but it makes no difference to the grid/time-spent calculation of line segments.
There is a simple speedfilter to remove fixes that imply implausibly fast movement, but it is very simplistic and can introduce new problems; in general, updating or filtering tracks for unlikely movement can be very complicated. (In my experience a basic time-spent gridding gets you as good an estimate as many sophisticated models that just open up new complications.) The filter works with Cartesian or long/lat coordinates, using tools in sp to calculate distances (long/lat is reliable, whereas a poor map-projection choice can introduce problems - over short distances, like humans on land, it's probably no big deal).
(The function tripGrid calculates the exact components of the straight-line segments using pixellate.psp, but that detail is hidden in the implementation.)
In terms of data preparation, trip is strict about a sensible sequence of times and will prevent you from creating an object if the data have duplicates, are out of order, etc. There is an example of reading data from a text file in ?trip, and a very simple example with (really) dummy data is:
library(trip)   # trip builds on the sp classes
d <- data.frame(x = 1:10, y = rnorm(10), tms = Sys.time() + 1:10, id = gl(1, 5))
coordinates(d) <- ~x+y                   # promote to a SpatialPointsDataFrame
tr <- trip(d, c("tms", "id"))            # declare the time and id columns
g <- tripGrid(tr)                        # the time-spent grid (SpatialGridDataFrame)
pt <- coordinates(g)[which.max(g$z), ]   # centre of the cell with the most time spent
image(g, col = c("transparent", heat.colors(16)))
lines(tr, col = "black")
points(pt[1], pt[2], pch = "+", cex = 2)
That dummy track has no overlapping regions, but it shows that finding the max point in "time spent" is simple enough.
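The speedfilter step mentioned earlier could be bolted onto the same dummy object; a sketch (5 is an arbitrary cutoff, and ?speedfilter documents the units for long/lat versus Cartesian data):
keep <- speedfilter(tr, max.speed = 5)   # logical vector: fixes to retain
d2 <- as.data.frame(tr)[keep, ]          # back to a plain data frame
coordinates(d2) <- ~x+y
tr2 <- trip(d2, c("tms", "id"))          # rebuild the trip without the bad fixes
g2 <- tripGrid(tr2)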
How about using the location that minimises the sum of squared distances to all the events? This might be close to the supremum of a kernel smoothing, if my brain is working right.
If your data comprise two clusters (home and work), then I think the location will be in the biggest cluster rather than between them. It's not the same as the simple mean of the x and y coordinates.
For an uncertainty estimate, jitter your data by whatever your positional uncertainty is (it would be great if you had that value from the GPS; otherwise guess - 50 metres?) and recompute. Do that 100 times, do a kernel smoothing of those locations, and find the 95% contour.
Not rigorous, and I need to experiment with this minimum-distance/kernel-supremum thing...
In response to spacedman - I am pretty sure least squares won't work. Least squares is best known for bowing to the demands of outliers while giving little weight to nearby points. This is the opposite of what is desired.
The bisquare estimator would probably work better, in my opinion, but I have never used it. I think it also requires some tuning.
It behaves more or less like a least-squares estimator up to a certain distance from zero, and beyond that the weighting is constant. So once a point becomes an outlier, its penalty is constant. We don't want outliers to weigh more and more as we move away from them; we would rather give them constant weight and let the optimisation focus on fitting the points in the vicinity of the cluster.
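To make the exchange above concrete, here is a small planar sketch with optim (the two clusters are simulated stand-ins for home and work). With the squared objective the minimiser is just the coordinate mean, pulled toward outliers and parked between clusters; with unsquared distances (the geometric median) it settles inside the biggest cluster, which is the behaviour the robust-estimator discussion is after.
set.seed(42)
lon <- c(rnorm(1200, 5.00, 0.002), rnorm(240, 5.05, 0.002))   # home + work clusters
lat <- c(rnorm(1200, 52.00, 0.002), rnorm(240, 52.02, 0.002))
xy <- cbind(lon, lat)
obj <- function(p, pow) sum(sqrt((xy[, 1] - p[1])^2 + (xy[, 2] - p[2])^2)^pow)
home.mean <- optim(colMeans(xy), obj, pow = 2)$par   # ~ the coordinate mean
home.med  <- optim(colMeans(xy), obj, pow = 1)$par   # geometric median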

How to resample/rebin a spectrum?

In MATLAB, I frequently compute power spectra using Welch's method (pwelch), which I then display on a log-log plot. The frequencies estimated by pwelch are equally spaced, yet logarithmically spaced points would be more appropriate for the log-log plot. In particular, when the plot is saved to a PDF file, the excess of points at high frequency results in a huge file size.
What is an effective scheme to resample (rebin) the spectrum from linearly spaced to log-spaced frequencies? Or, what is a way to include high-resolution spectra in PDF files without generating excessively large file sizes?
The obvious thing to do is to simply use interp1:
rate = 16384; % sample rate (samples/sec)
nfft = 16384; % number of points in the FFT
[Pxx, f] = pwelch(detrend(data), hanning(nfft), nfft/2, nfft, rate);
f2 = logspace(log10(f(2)), log10(f(end)), 300);   % log-spaced target frequencies
Pxx2 = interp1(f, Pxx, f2);
loglog(f2, sqrt(Pxx2));
However, this is undesirable because it does not conserve power in the spectrum. For example, if there is a big spectral line between two of the new frequency bins, it will simply be excluded from the resulting log-sampled spectrum.
To fix this, we can instead interpolate the integral of the power spectrum:
df = f(2) - f(1);
intPxx = cumsum(Pxx) * df;                 % integrate
intPxx2 = interp1(f, intPxx, f2);          % interpolate the integral
Pxx2 = diff([0 intPxx2]) ./ diff([0 f2]);  % difference back to a density
This is cute and mostly works, but the bin centres aren't quite right, and it doesn't intelligently handle the low-frequency region, where the new log-spaced grid may become more finely spaced than the original one.
Other ideas:
Write a function that determines the new frequency binning and then uses accumarray to do the rebinning.
Apply a smoothing filter to the spectrum before interpolating. Problem: the smoothing kernel size would have to adapt to the desired logarithmic smoothing.
The pwelch function accepts a frequency-vector argument f, in which case it computes the PSD at the desired frequencies using the Goertzel algorithm. Maybe just calling pwelch with a log-spaced frequency vector in the first place would be adequate. (Is this more or less efficient?)
For the PDF file-size problem: include a bitmap image of the spectrum (seems kludgy - I want nice vector graphics!), or perhaps display a region (polygon/confidence band) instead of a segmented line to indicate the spectrum.
I would let pwelch do the work for me and give it the frequencies from the start. The documentation states that the frequencies you specify will be rounded to the nearest DFT bin. That shouldn't be a problem, since you are using the results to plot. If you are concerned about the runtime, I'd just try it and time it.
If you want to rebin it yourself, I think you're better off writing your own function to do the integration over each of your new bins. If you want to make your life easier, you can do what pwelch does and make sure your log bins share boundaries with your linear ones.
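Here is a sketch of that integrate-over-bins idea, written in R (as used elsewhere on this page) since the idea is language-agnostic; f and Pxx are assumed to be the linear-frequency vector and PSD from Welch's method, and geometric bin centres address the bin-centre complaint above.
rebin_log <- function(f, Pxx, nbins = 300) {
  # log-spaced bin edges spanning the positive frequencies
  edges <- 10^seq(log10(f[2]), log10(f[length(f)]), length.out = nbins + 1)
  df <- f[2] - f[1]
  intP <- cumsum(Pxx) * df                           # cumulative power
  intE <- approx(f, intP, xout = edges, rule = 2)$y  # integral at the bin edges
  list(f = sqrt(edges[-1] * edges[-(nbins + 1)]),    # geometric bin centres
       Pxx = diff(intE) / diff(edges))               # mean PSD in each bin
}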
Solution found: https://dsp.stackexchange.com/a/2098/64
Briefly, one solution to this problem is to perform Welch's method with a frequency-dependent transform length. The link above is to a dsp.SE answer containing a paper citation and a sample implementation. A disadvantage of this technique is that you can't use the FFT, but because the number of DFT points being computed is greatly reduced, this is not a severe problem.
If you want to resample an FFT at a variable (logarithmic) rate, then the smoothing or low-pass filter kernel will need to have a variable width as well to avoid aliasing (loss of sample points). Just use a different-width sinc interpolation kernel for each plotted point, with the sinc width approximately the reciprocal of the local sample spacing.

Minimising interpolation error between two data sets

At the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we are sampling the value at different and unpredictable times; we are also alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However, as shown in the three smaller boxes, this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values that come close together and then drift apart between each data point; the latter results in values which remain closer, but with a greater average error.
Can anyone suggest another strategy or interpolation method that will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe what you ask is a question that has no straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be almost anything, so I think there is no way to guarantee that the interpolations of two sample arrays converge.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimise the error. For example, if you measure drag force as a function of wing velocity, you know the relation is quadratic (a*V^2); you can then choose a 2nd-order polynomial fit and get a pretty good match between the interpolations of the two series.
Try B-splines: Catmull-Rom interpolates (goes through the data points), while a B-spline smooths.
For example, for uniformly spaced data (not your case):
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course the interpolated red/blue curves depend on the spacing of the red/blue data points, so they cannot match perfectly.
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task:
"One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines."
By definition your red interpolated curve will pass through all red data points, and your blue interpolated curve will pass through all blue points; therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation, as shown in these slides; a rough sketch of that pooled idea follows.
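A minimal R sketch of the pooled-fit suggestion (t_red/y_red/t_blue/y_blue are hypothetical vectors holding the two staggered sample sets): pool both sets and fit a single smoothing spline, so neither curve is forced through its own noisy points.
t <- c(t_red, t_blue)
y <- c(y_red, y_blue)
o <- order(t)                          # merge the staggered samples in time order
fit <- smooth.spline(t[o], y[o])       # one smooth for both data sets
grid <- seq(min(t), max(t), length.out = 500)
plot(t, y, col = rep(c("red", "blue"), c(length(t_red), length(t_blue))))
lines(predict(fit, grid), col = "black")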
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics; as with ysap, I recommend methods that utilise the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot interpolate perfectly - particularly if the interpolation error is large compared to the error in the samples - then you are certain to have a continuous error function that exhibits its largest errors furthest (in time) from the sample points. Two data sets with differing sample points should therefore exhibit the behaviour you see, because points far (in time) from red sample points may be near (in time) to blue sample points, and vice versa; with staggered points like yours, this is sure to be true. Thus I would expect what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you do not have information about the underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if you are looking at information below the Nyquist frequency.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (also known as aliasing). The error, being different in the two datasets, results in different values when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples at a frequency of at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them out before sampling.
