I cannot find the following information in the reference literature [1]:
1) How does adaptive.density() (package spatstat) handle duplicated spatial points? I have duplicated points at exactly the same position because I am combining measurements from different years, and I expect the density to be higher in those areas, but I am not sure whether that is the case.
2) Is the default value of f in adaptive.density() f=0 or f=1?
My guess is that it is f=0, so that the intensity estimate at every location equals the average intensity (the number of points divided by the window area).
Thank you for your time and input!
The default value of f is 0.1, as you can see from the "Usage" section of the help file.
The function subsamples the point pattern with this selection probability and uses the resulting pattern to generate a Dirichlet tessellation (duplicated points are ignored here). The remaining fraction (1-f) of points is used to estimate the intensity: the number of points in each tile of the tessellation divided by the tile's area (here duplicated points count equally toward the tile's total).
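To see how the two fractions interact, here is a hedged 1D toy analogue of this scheme (my own sketch, not spatstat's implementation; in 1D the Dirichlet/Voronoi tessellation reduces to intervals split at midpoints between sampled points, and `adaptive_intensity_1d` is a hypothetical name):

```python
import random

def adaptive_intensity_1d(points, f=0.1, lo=0.0, hi=1.0, seed=1):
    """1D toy analogue of the sampled-tessellation estimator:
    a fraction f of the points defines Voronoi cells (intervals split
    at midpoints); the remaining points are counted per cell and the
    count is divided by the cell length. Duplicated points in the
    counting fraction each add to their cell's count, so the estimated
    intensity is higher where measurements repeat."""
    rng = random.Random(seed)
    sample, rest = [], []
    for p in points:
        (sample if rng.random() < f else rest).append(p)
    sample = sorted(set(sample))  # duplicates are ignored in the tessellation
    # cell boundaries: the window edges plus midpoints between sampled points
    bounds = [lo] + [(a + b) / 2 for a, b in zip(sample, sample[1:])] + [hi]
    cells = list(zip(bounds, bounds[1:]))
    counts = [0] * len(cells)
    for p in rest:
        for i, (a, b) in enumerate(cells):
            if a <= p < b or (p == hi and b == hi):
                counts[i] += 1
                break
    return [((a, b), c / (b - a)) for (a, b), c in zip(cells, counts)]

# heavy duplication at x = 0.5, plus a regular background
pts = [0.5] * 60 + [i / 40 for i in range(40)]
est = adaptive_intensity_1d(pts, f=0.2)
```

With f=0, nothing is sampled for the tessellation, so the whole window is a single cell and the estimate collapses to the average intensity, matching the question's guess about that limiting case.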
I'm trying to determine whether a set of points are uniformly distributed in a 1 x 1 x 1 cube. Each point comes with an x, y, and z coordinate that corresponds to their location in the cube.
A trivial way that I can think of is to flatten the set of points into 2D projections and check how uniformly distributed each projection is; however, I do not know whether that is a correct way of doing so.
Does anyone have any ideas?
I would compute a point density map and then check it for anomalies:
definitions
Let's assume we have N points to test. If the points are uniformly distributed, they should form a uniform grid of m x m x m points:
m * m * m = N
m = N^(1/3)
To account for deviations from the uniform grid and to assess statistics, you need to divide your cube into a grid of sub-cubes, where each sub-cube holds several points (so that statistical properties can be computed). Let's assume k >= 5 points per grid cube, so:
cubes = m/k
create a 3D array of counters
We simply need one integer counter per grid cube, so:
int map[cubes][cubes][cubes];
fill it with zeroes.
process all points p(x,y,z) and update map[][][]
Simply loop through all of your points, compute the grid cube each point belongs to, and increment its counter. The indices must be clamped so that a coordinate of exactly 1.0 still lands in the last cube:
map[(int)fmin(x*cubes,cubes-1)][(int)fmin(y*cubes,cubes-1)][(int)fmin(z*cubes,cubes-1)]++;
compute average count of the map[][][]
a simple average like this will do:
avg=0;
for (xx=0;xx<cubes;xx++)
for (yy=0;yy<cubes;yy++)
for (zz=0;zz<cubes;zz++)
avg+=map[xx][yy][zz];
avg/=cubes*cubes*cubes;
now just compute the mean absolute deviation from this average
d=0;
for (xx=0;xx<cubes;xx++)
for (yy=0;yy<cubes;yy++)
for (zz=0;zz<cubes;zz++)
d+=fabs(map[xx][yy][zz]-avg);
d/=cubes*cubes*cubes;
The value d is a metric of how far the points are from uniform density, where 0 means perfectly uniform, so you can just threshold it. Note that d also depends on the number of points; my intuition tells me d >= k means totally non-uniform, so if you want to make it more robust you can normalize by k like this (the threshold might need tweaking):
d/=k;
if (d<0.25) uniform;
else nonuniform;
As you can see, all of this is O(N) time, so it should be fast enough for you. If it is not, you can evaluate only every 10th point, but skipping points like that is valid only if the order of points is random; otherwise you would need to pick N/10 random points instead. The 10 can be any constant, but keep in mind that you still need enough points for the statistics to represent your set, so I would not go below ~250 points (though that depends on what exactly you need).
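The steps above can be collected into one routine. This is a minimal Python translation of the pseudocode for illustration (the function name, the clamped binning, and the test data are my own choices, not from the answer):

```python
import random

def uniformity_metric(points, k=5):
    """Density-map uniformity metric for points in the unit cube.
    Returns d/k as described above: close to 0 for uniform data,
    much larger for clustered data."""
    n = len(points)
    m = round(n ** (1.0 / 3.0))   # side length of the ideal uniform grid
    cubes = max(1, m // k)        # grid cells per axis
    counts = [[[0] * cubes for _ in range(cubes)] for _ in range(cubes)]
    for x, y, z in points:
        # clamp so a coordinate of exactly 1.0 lands in the last cell
        ix = min(int(x * cubes), cubes - 1)
        iy = min(int(y * cubes), cubes - 1)
        iz = min(int(z * cubes), cubes - 1)
        counts[ix][iy][iz] += 1
    total_cells = cubes ** 3
    avg = n / total_cells
    d = sum(abs(counts[i][j][l] - avg)
            for i in range(cubes) for j in range(cubes) for l in range(cubes))
    d /= total_cells
    return d / k

random.seed(1)
uniform = [(random.random(), random.random(), random.random())
           for _ in range(8000)]
clustered = [(random.random() * 0.2, random.random() * 0.2, random.random() * 0.2)
             for _ in range(8000)]
print(uniformity_metric(uniform))    # low value for uniform data
print(uniformity_metric(clustered))  # much larger for clustered data
```

As noted above, the exact cutoff between "uniform" and "non-uniform" is a heuristic and would need tuning against your data.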
Here are a few of my answers using the density map technique:
Finding holes in 2d point sets?
Location of highest density on a sphere
I have a dataframe with around 10,000 points. I want to take the first point and check its distance to each point from the second onwards; if the distance to some point is less than "d" (a variable), I remove both points from the dataframe and repeat the procedure with the first point of the new dataframe.
This is done until there are only 2 points left in the dataframe.
It takes a lot of time. Is there a time-efficient way to do this?
If your points exist in 2D space (e.g. Euclidean), then you can use the cluster package:
library(cluster)
data(agriculture)
## Dissimilarities using Euclidean metric
d.agr <- daisy(agriculture, metric = "euclidean")
as.matrix(d.agr)
The final matrix gives you the "distance" between each pair of points, according to the metric you set (Euclidean in the example above).
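For reference, the removal procedure described in the question looks like this as a direct sketch (Python rather than R, for illustration; `prune_pairs` is a hypothetical name, and this is still the O(N^2) baseline, not a faster method):

```python
import math

def prune_pairs(points, d):
    """Take the first remaining point; if some later point is closer
    than d, drop both and restart with the new first point. Stop when
    no pair was removed or only two points remain."""
    pts = list(points)
    removed = True
    while removed and len(pts) > 2:
        removed = False
        x0, y0 = pts[0]
        for j in range(1, len(pts)):
            if math.hypot(x0 - pts[j][0], y0 - pts[j][1]) < d:
                del pts[j]   # delete the higher index first
                del pts[0]
                removed = True
                break
    return pts

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 9.0)]
print(prune_pairs(pts, d=1.0))  # -> [(5.0, 5.0), (9.0, 9.0)]
```

Computing the full distance matrix once (as the daisy() answer above does) avoids recomputing distances on every pass, which is where most of the time goes.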
I want to find the expected value of y given x, based on some data. I want really good estimates of the mean y-value at a few particular x-values, but I don't want/need to fit something parametric or do a regression.
Instead, I want to take my observations, bin a bunch of them where I have a lot of x-values in a small range of X, and compute the mean of y.
Is there a clever way to select, say, 6 non-overlapping regions of high density from my vector of x observations?
If so, I'll take the center of each region, grab a bunch of the closest x's (maybe 100 in my real data), and compute the associated mean(y).
Here's some example data:
# pick points for high density regions
#xobs<-runif(900)
clustery_obs<-function(x){rnorm(40,x,0.2)}
under_x<-runif(11)
xobs<-sapply(under_x, clustery_obs)
xobs<-xobs[0<xobs&xobs<1]
yfun<-function(x){rnorm(1, mean=(10*x)^2-(30*x)+3, sd=6)}
yobs<-sapply(xobs, yfun)
plot(xobs, yobs)
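One way to pick the high-density regions, sketched in Python rather than R for illustration (`dense_centers` is a hypothetical helper; this is a simple histogram-based greedy pick, with kernel density estimation as the obvious smoother alternative):

```python
import random

def dense_centers(xs, n_regions=6, n_bins=50, lo=0.0, hi=1.0):
    """Greedy selection of high-density regions on [lo, hi]:
    bin xs into a histogram, then repeatedly take the center of the
    fullest bin that is at least one bin-width away from the centers
    already chosen, until n_regions centers are found."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        i = min(max(int((x - lo) / width), 0), n_bins - 1)
        counts[i] += 1
    order = sorted(range(n_bins), key=lambda i: counts[i], reverse=True)
    centers = []
    for i in order:
        c = lo + (i + 0.5) * width
        if all(abs(c - prev) >= width for prev in centers):
            centers.append(c)
        if len(centers) == n_regions:
            break
    return centers

random.seed(0)
# one tight cluster at 0.5 plus a uniform background
xs = [random.gauss(0.5, 0.02) for _ in range(1000)] + \
     [random.random() for _ in range(100)]
centers = dense_centers(xs)
print(centers[0])  # a center close to the cluster at 0.5
```

Each returned center could then seed the "grab the ~100 nearest x's and average their y's" step from the question; the minimum spacing between centers may need to be widened so those neighborhoods do not overlap.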
In R, what method is used in boxplot() to remove outliers? In other words, what determines if a given value is an outlier?
Please note, this question is not asking how to remove outliers.
EDIT: Why is this question being downvoted? Please provide comments. The method for outlier removal is not available in any documentation I have come across.
tl;dr: outliers are points that are beyond approximately twice the interquartile range away from the median (in a symmetric case). More precisely, points beyond cutoffs equal to the 'hinges' (approximately the 1st and 3rd quartiles) +/- 1.5 times the interquartile range.
R's boxplot() function does not actually remove outliers at all; every observation in the data set is represented in the plot (unless the outline argument is FALSE). The information on which points are plotted as outliers (i.e., as individual points beyond the whiskers) is, implicitly, contained in the description of the range parameter:
range [default 1.5]: this determines how far the plot whiskers extend out from the
box. If ‘range’ is positive, the whiskers extend to the most
extreme data point which is no more than ‘range’ times the
interquartile range from the box. A value of zero causes the
whiskers to extend to the data extremes.
This has to be deconstructed a little bit more: what does "from the box" mean? To figure this out, we need to look at the Details of ?boxplot.stats:
The two ‘hinges’ are versions of the first and third quartile,
i.e., close to ‘quantile(x, c(1,3)/4)’ [... see ?boxplot.stats for slightly more detail ...]
The reason for all the complexity/"approximately equal to the quartile" language is that the developers of the boxplot wanted to make sure that the hinges and whiskers were always drawn at points representing actual observations in the data set (whereas the quartiles can be located between observed points, e.g. in the case of data sets with odd numbers of observations).
Example:
set.seed(101)
z <- rnorm(100000)
boxplot(z)
hinges <- qnorm(c(0.25,0.75))
IQR <- diff(qnorm(c(0.25,0.75)))
abline(h=hinges,lty=2,col=4) ## hinges ~ quartiles
abline(h=hinges+c(-1,1)*1.5*IQR,col=2)
## in this case hinges = +/- IQR/2, so whiskers ~ +/- 2*IQR
abline(h=c(-1,1)*IQR*2,lty=2,col="purple")
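The same fence rule can be restated outside R. A minimal Python sketch of the cutoff computation (`tukey_fences` is my own name; note it uses plain quartiles via statistics.quantiles, whereas R's boxplot uses hinges, which can differ slightly for small samples):

```python
import statistics

def tukey_fences(data, whis=1.5):
    """Lower and upper fences of the boxplot rule: points outside
    [Q1 - whis*IQR, Q3 + whis*IQR] are drawn as outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lo, hi = tukey_fences(data)
outliers = [x for x in data if x < lo or x > hi]
print(outliers)  # -> [100]
```

Setting whis=0 reproduces range=0 in R: the fences collapse to the quartiles is not quite right — rather, in R a range of zero extends the whiskers to the data extremes, so no points are flagged at all.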
I am calculating the Pearson correlation between two rasters (identical in dimensions and cell size) in a moving window, using the corLocal function from the raster package. It is not clear (to me) from the manual what the neighborhood size parameter (ngb) actually means. E.g., does ngb = 5 mean that the correlation is calculated for the focal cell plus the top, bottom, left, and right cells?
I looked at the code and corLocal calls getValuesFocal():
getValuesFocal(x, 1, nrow(x), ngb=ngb)
but I couldn't understand what getValuesFocal actually does.
The ngb parameter defines the neighborhood size. For example, I believe ngb = 5 defines a 5 x 5 neighborhood; this should be equivalent to ngb = c(5,5), a vector of two integers giving the number of rows and columns of the neighborhood or focal window. In this example, an individual cell in the output raster represents the correlation calculated from a 5 x 5 cell neighborhood in the two input rasters.
The raster library documentation on p. 118 might help too.
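To make the neighborhood concrete: with ngb = 5, each output cell is the Pearson correlation of the two rasters' values over the 5 x 5 window centered on that cell. A plain-Python sketch of that idea (my own illustration using a 3 x 3 window, not the raster package internals; edge handling here simply leaves incomplete windows as None, whereas corLocal has its own edge/NA rules):

```python
import math

def window_corr(a, b, ngb=3):
    """Moving-window Pearson correlation of two equally sized 2D grids.
    Cell (r, c) of the output is the correlation of the ngb x ngb
    neighborhoods centered there; cells whose window would run off the
    edge are left as None."""
    rows, cols = len(a), len(a[0])
    h = ngb // 2
    out = [[None] * cols for _ in range(rows)]
    for r in range(h, rows - h):
        for c in range(h, cols - h):
            xs, ys = [], []
            for dr in range(-h, h + 1):
                for dc in range(-h, h + 1):
                    xs.append(a[r + dr][c + dc])
                    ys.append(b[r + dr][c + dc])
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            sxx = sum((x - mx) ** 2 for x in xs)
            syy = sum((y - my) ** 2 for y in ys)
            if sxx > 0 and syy > 0:  # undefined for constant windows
                out[r][c] = sxy / math.sqrt(sxx * syy)
    return out

a = [[float(r + c) for c in range(5)] for r in range(5)]
b = [[float(-(r + c)) for c in range(5)] for r in range(5)]
print(window_corr(a, b)[2][2])  # -> -1.0 (perfectly anti-correlated)
```

So for ngb = 5 the correlation at each cell is computed from 25 paired values, not just the focal cell plus its four direct neighbors.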