Association between point distribution and a continuous variable using R

I have a data set consisting of the locations of trees and measurements of soil organic carbon (SOC). All points (trees and SOC samples) have x and y coordinates in the range (0, 50). First, I would like to check whether the proximity of trees (points) influences the amount of SOC (a continuous variable). Second, I'd like to do the same analysis using only a subset of the trees (say, only trees with a diameter at breast height (dbh) > 20 cm). Can this be done in R using package 'spatstat' or 'ads'? I've looked around, but haven't been able to find a solution to this problem yet. Any pointers would be greatly appreciated!
Example from Simón et al. (2013): http://postimg.org/image/goks26xr5/
Data:
library(spatstat)
soc<-data.frame(x=c(0,5,5,5,5,5,10,10,10,10,10,10,10,15,15,15,15,15.1,15.9,15,15,15,20,20,20,20,21,20,20,20,25,25,25,25,23,25,25,25,25,30,30,31.5,30,33,30,30,30,30,35,35,35,35,35,35,35,35,35,40,40,40,40,40,40,40,45,45,45,45,45,50),
y=c(25,35,30,25,20,15,40,35.2,30,25,20,15,10,45,40,35,30,25,20,15,10,5,45,40,35,30,25,20,15,10,5,50,45,40,35,30,25,20,15,10,5,0,45,40,35,30,25,20,15,10,5,40,35,30,25,20,15.5,40,35,30,25,20,15,10,35,30,25,20,15,25),
zsoc=c(2,3,4,5,6,1,2,3,4,5,2,3,4,5,3,5,6,3,4,5,3,4,5,6,8,3,4,1,3,2,5,3,2,4,6,2,4,1,1,1,1,1,1,2,3,4,1,2,3,8,1.5,2,3,4,2.3,4,5,3,4,5,6,7,8,2,1,1,1,1,1,2))
tree<-data.frame(x=c(24,18,11,9,7,6,11,11,15,13,15,22,27,29,22,20,27,28,36,34,33,32,33,42,47,47,46,46,46,43,41,35,36,37,35,35,35,34,34,33,34,34,34,33,31,29,30,29,29),
y=c(28.8,31.2,32.0,24.0,18.4,17.6,13.1,11.9,11.1,5.8,3.6,1.5,8.3,13.3,15.7,17.3,19.0,19.1,14.4,10.8,6.1,4.9,2.7,2.7,11.3,11.8,12.3,10.1,19.9,24.4,23.0,25.6,31.0,34.6,36.5,36.9,36.8,38.4,35.6,37.0,39.6,39.5,41.6,41.8,39.7,41.1,35.9,35.8, 35.0),
zdbh=c(15,49,53,53,43,32,34,46,50,32,56,32,48,42,53,52,34,47,39,48,38,36,17,33,25,21,10,11,50,36,47,50,47,12,7,8,6,6,9,16,23,8,8,21,6,10,6,21,11))
soc <- ppp(soc$x, soc$y, c(0,50), c(0,50), marks=soc$zsoc, unitname="meter")
tree <- ppp(tree$x, tree$y, c(0,50), c(0,50), marks=tree$zdbh, unitname="meter")
Hope this works!
Example reference: Simón, N., Montes, F., Díaz-Pinés, E., Benavides, R., Roig, S., Rubio, A., 2013. Spatial distribution of the soil organic carbon pool in a Holm oak dehesa in Spain. Plant and Soil 366(1-2), 537-549.
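One way to approach the first question in spatstat (a sketch, not from the original thread) is to treat the distance to the nearest tree as a spatial covariate and check whether SOC varies with it; subsetting then handles the second question:
# Distance from every location to the nearest tree, as a pixel image
dtree <- distmap(unmark(tree))
# Distance from each SOC sample point to its nearest tree
d_at_soc <- dtree[soc]
# Simple check: does SOC change with distance to the trees?
plot(d_at_soc, marks(soc), xlab = "distance to nearest tree (m)", ylab = "SOC")
cor.test(d_at_soc, marks(soc), method = "spearman")
# Same analysis using only the large trees (dbh > 20 cm)
bigtree <- subset(tree, marks > 20)
cor.test(distmap(unmark(bigtree))[soc], marks(soc), method = "spearman")
A rank correlation is only one option here; regressing marks(soc) on d_at_soc, e.g. with lm or a GAM, would be a natural next step.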

Related

Problems with rhohat in R - Distribution of archaeological finds in relation to environmental factors

I am currently working with metal axe heads from Denmark, England/Wales and the Netherlands, which I have analysed with the rhohat function in R. The spatial covariates I have employed are soil type, soil texture, soil pH, land cover and a shapefile containing estimated preservation capacity across Europe (Source: https://esdac.jrc.ec.europa.eu/content/maps-related-predicting-preservation-cultural-artefacts-and-buried-materials-soils-eu-0). This is my code (if my spatial covariate is a shapefile):
library(raster)     # raster(), extent(), rasterize()
library(spatstat)
r <- raster(ncol=180, nrow=180)
extent(r) <- extent(soil)
soiltype_raster <- rasterize(soil, r, 'FAO85LV1')   # rasterize the soil-type attribute
soiltype_im <- as.im(soiltype_raster)               # convert to a spatstat image (may need maptools)
plot(soiltype_im)
plot(axeheads, add=TRUE)
soiltype_dk <- rhohat(axeheads.pp, soiltype_im)
plot(soiltype_dk)
The values of my rhohat graphs, and the graphs themselves, don't make a lot of sense: see, for example, the rhohat of English and Welsh metal axe heads in relation to land cover, of Danish metal axe heads in relation to land cover, or of English and Welsh metal axe heads in relation to soil texture. The spatial covariate land cover is a three-digit numerical code (e.g. 111 = discontinuous urban fabric), whereas soil texture is a numerical value between 1 and 5 (coarse to fine, with additional values 0 = no information and 9 = no mineral texture).
What could be causing my funky-looking graphs, which do not seem to correspond with the information I have extracted in QGIS? What are the values on the y-axis? Are there any other functions that could be useful for investigating the relation between the distribution of finds and environmental factors? Thank you in advance!
This is a question about the function rhohat in the R package spatstat (which is not part of the base R system!).
The help file for rhohat says that the covariate must have numerical values. In your example, the covariates are really categorical values that happen to be encoded as numbers. For example, 111 means discontinuous urban fabric: what do 110 and 112 mean, and is 111 halfway between them?
For this kind of data you would be better advised to convert the covariates to categorical (factor) values and use raw estimates of intensity:
soiltype_im <- eval.im(factor(soiltype_im))        # recode the image as a factor
soiltype_tess <- tess(image=soiltype_im)           # tessellation: one tile per soil type
soiltype_count <- quadratcount(axeheads.pp, tess=soiltype_tess)   # counts per tile
soiltype_inten <- intensity(soiltype_count)        # points per unit area per soil type
The last result will be a table telling you the average number of axe heads per unit area in each type of soil. (rhohat performs a similar calculation, but assumes that the covariate is continuously varying, which does not apply in your example.)
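As a self-contained illustration of the same recipe, here is a sketch using the built-in gorillas dataset, whose gorillas.extra$vegetation covariate is already a factor-valued image:
library(spatstat)
veg <- gorillas.extra$vegetation          # factor-valued pixel image
veg_tess <- tess(image = veg)             # one tile per vegetation class
veg_counts <- quadratcount(unmark(gorillas), tess = veg_tess)
intensity(veg_counts)                     # nest sites per unit area in each class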

The meaning of cluster size in Cox process models in spatstat

In the wood of some trees, the conduits in cross sections clearly aggregate into clusters. It seems natural to fit a Cox process model in spatstat (R) to the conduit point data, and the results include an estimated "Mean cluster size". I am not sure what this index means: can I interpret it as the mean number of conduits per cluster across the whole conduit point pattern?
Code from a good example in the book follows:
fitM <- kppm(redwood ~ 1, "MatClust")
fitM
# ...
# Scale: 0.08654
# Mean cluster size: 2.525 points
In their book, the authors of spatstat explain the mean cluster size as the number of offspring points dispersed around each parent point, like plant seedlings. In my case no such process is happening: conduits are xylem cells that develop from cambium cells on the outside of the stem's annual ring; they do not disperse randomly.
I would like to estimate the mean cluster size and cluster scale for my conduit distribution data, and the Scale and Mean cluster size outputs seem to be what I want. However, the redwood data differ from mine in nature, so I am not sure what these quantities mean for my data. Furthermore, which model suits my context: Neyman-Scott, Matern cluster, Thomas, or another?
Any suggestion is appreciated.
Jingming
If you fit a parametric point process model such as a Thomas or Matern cluster process, you are assuming the data are generated by a random mechanism that produces a random number of clusters with a random number of points in each cluster. The locations of the points around each cluster centre are also random. The parameter kappa controls the expected number of clusters, mu controls the expected number of points in a cluster, and scale controls the extent of the cluster. The type of process (Thomas, Matern or others) determines the distribution of the points within a cluster. My best suggestion is to do simulation experiments to understand these different types of processes and see if they are appropriate for your needs.
For example, on average 10 clusters in the unit square with on average 5 points in each and a short spatial extent of the clusters (scale = 0.01) gives you fairly well-defined, tight clusters:
library(spatstat)
set.seed(42)
sim1 <- rThomas(kappa = 10, mu = 5, scale = 0.01, nsim = 9)
plot(sim1, main = "")
With the same number of clusters and points but a bigger spatial extent of the clusters (scale = 0.05), the picture is less clear and it is hard to see the clusters:
sim2 <- rThomas(kappa = 10, mu = 5, scale = 0.05, nsim = 9)
plot(sim2, main = "")
In conclusion: experiment with simulation, and remember to run many simulations of each experiment rather than just one, which can be very misleading.
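To relate the fitted output back to these parameters, a minimal sketch (parameters() extracts kappa, scale and mu from a fitted kppm object):
fitT <- kppm(redwood ~ 1, "Thomas")
fitM <- kppm(redwood ~ 1, "MatClust")
parameters(fitT)   # kappa = cluster intensity, scale = cluster extent, mu = mean cluster size
parameters(fitM)
# Visual check of each fit against the data (fitted vs empirical K-function)
plot(fitT)
plot(fitM)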

Using a Point Process model for Prediction

I am analysing ambulance incident data. The dataset covers three years and has roughly 250000 incidents.
Preliminary analysis indicates that the incident distribution is related to population distribution.
Fitting a point process model using spatstat agrees with this, with broad agreement in a partial residual plot.
However, it is believed that the incident distribution diverges from this population-related trend during "social hours", that is, Friday and Saturday nights and public holidays.
I want to take subsets of the data and see how they differ from the gross picture. How do I account for the difference in intensity due to the smaller number of points inherent in a subset of the data?
Or is there a way to directly use my fitted model for the gross picture?
It is difficult to provide data as there are privacy issues, and given the size of the dataset it's hard to simulate the situation. I am not by any means a statistician, hence I am floundering a bit here. I have a copy of "Spatial Point Patterns: Methodology and Applications with R", which is very useful.
I will try to explain my methodology so far with pseudocode:
inc_pts <- ppp(ambulance_x, ambulance_y, window = the_window)   # ~250k incidents
census_pts <- ppp(census_x, census_y, window = the_window)      # ~1.3m census points
Best bandwidth for the density surface by visual inspection seemed to be bw.scott. This was used to fit a density surface for the points.
inc_density <- density(inc_pts, bw.scott)
pop_density <- density(census_pts, bw.scott)
fit0 <- ppm(inc_pts ~ 1)               # ppm is fitted to the point pattern itself,
fit_pop <- ppm(inc_pts ~ pop_density)  # not to its density image
partials <- parres(fit_pop, "pop_density")
Plotting the partial residuals shows that the agreement with the linear fit is broadly acceptable, with some areas of 'wobble'.
What I am thinking of doing next:
library(dplyr)
library(tidyr)
the_ambulance_data %>%
  group_by(day_of_week, hour_of_day) %>%
  select(x_coord, y_coord) %>%
  nest() -> nested_day_hour_pts
Taking one of these list items and creating a ppp, say fri_2300hr_ppp:
fri23.den <- density(fri_2300hr_ppp, bw.scott)
fit_fri23 <- ppm(fri_2300hr_ppp ~ pop_density)
How do I then compare this ppp or density with the broader model? I can run characteristic tests for dispersion, clustering and so on. Can I compare the partial residuals of fit_pop and fit_fri23?
How do I control for the effect of the number of points on the density? I have 250k points versus maybe 8000 points in the subset. I'm thinking maybe quantiles of the density surface?
Attach marks to the ambulance data representing the subsets/categories of interest (e.g. 'busy' vs 'non-busy'). For an informal or nonparametric analysis, use tools like relrisk, or use density.splitppp after separating the different types of points with split.ppp. For a formal analysis (taking into account the sample sizes, etc.), fit several candidate models to the same data, one model having a busy/non-busy effect and another having no such effect, then use anova.ppm to test formally whether there is a busy/non-busy effect. See Chapter 14 of the book mentioned.
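A minimal sketch of that formal comparison, assuming X is the ambulance pattern with a factor mark ("busy"/"non-busy") attached and pop is a population-density image (both names are placeholders):
fit_null <- ppm(X ~ pop)                  # intensity depends on population only
fit_busy <- ppm(X ~ marks * pop)          # separate population effect per category
anova(fit_null, fit_busy, test = "Chisq") # likelihood ratio test for the busy effect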

Method to find simulation envelopes for association (pair correlation) between adults and juveniles using homogeneous Thomas process in R

I have a marked ppp object (shp) consisting of x coordinate, y coordinate, and growth stage ("adult", "juvenile", "sapling").
I found simulation envelopes for association (using the pair correlation function) between adults and juveniles under a homogeneous Poisson process:
aj1<-envelope(shp,pcfcross,nsim = 199,i = "adult", j = "juvenile",savefuns=TRUE)
Now I want to find the same thing using a homogeneous Thomas process, so I fitted a homogeneous Thomas process to the point pattern object using
fit1 <- kppm(shp ~ 1, "Thomas", method = "palm")
Then I tried to find simulation envelopes using
aj2<-envelope(fit1,pcfcross,nsim = 199,i = "adult", j = "juvenile",savefuns=TRUE)
But this did not work. If anyone has a suggestion, or an alternative method to find simulation envelopes for association (using the pair correlation function) between adults and juveniles under a homogeneous Thomas process, I would be really grateful.
From a spatial lecture note; is the following example helpful?
library(spatstat)
X <- redwood
plot(X)
plot(envelope(X))
fit <- kppm(X ~ 1, "Thomas")
plot(fit)
plot(envelope(fit))
Unfortunately, I don't think any generic software to estimate multitype Poisson cluster models exists. We hope to add such functionality to spatstat in the not-too-distant future (but realistically many months from now). I know of some people who have worked on this, and you might ask them for code. I think the paper
Jalilian, A., Guan, Y., Mateu, J. and Waagepetersen, R. (2014) Multivariate product-shot-noise Cox models, Biometrics, 71, 1022-1033.
may be very relevant for you, and you can ask the authors whether they have any code you can use.
Currently kppm does not handle marked point patterns. You would have got an error message about "cannot handle marked point patterns" when you tried to fit the model kppm(shp ~ 1, "Thomas", method="palm").
This functionality will be added to spatstat sometime in 2016.
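In the meantime, one workaround is to generate the envelope simulations yourself via random labelling (a sketch; note this tests a different null hypothesis, namely that the growth-stage labels are assigned independently of location, rather than a fitted multitype Thomas process):
# Hold the locations fixed and randomly permute the growth-stage marks
aj3 <- envelope(shp, pcfcross, nsim = 199, i = "adult", j = "juvenile",
                simulate = expression(rlabel(shp)), savefuns = TRUE)
plot(aj3)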

Normalizing data in R using a population raster

I have two pixel images that I created using spatstat: one is a density image created from a set of points (using the function density.ppp), and the other is a pixel image created from a population raster. I am wondering if there is a way to use the population raster to normalize the density image. Basically, I have a dataset of 10000+ cyber attack origin locations in the US, which I hope to investigate for spatial patterns using spatstat. However, the obvious problem is that areas of higher population have more cyber attack origins because there are more people. I would like to use the population raster to correct for that. Any ideas would be appreciated.
As the comment by @RHA says: the first solution is to simply divide by the intensity.
I don't have your data so I will make some that might seem similar. The Chorley dataset has two types of cancer cases. I will make an estimate of the intensity of lung cancer and use it as your given population density. Then a density estimate of the larynx cases serves as your estimate of the cyber attack intensity:
library(spatstat)
# Split into list of two patterns
tmp <- split(chorley)
# Generate fake population density
pop <- density(tmp$lung)
# Generate fake attack locations
attack <- tmp$larynx
# Plot the intensity of attacks relative to population
plot(density(attack)/pop)
Alternatively, you could use the inverse population density as weights in density.ppp:
plot(density(attack, weights = 1/pop[attack]))
This might be the preferred way, where you basically say that an attack occurring at e.g. a place with population density 10 only "counts" half as much as an attack occurring at a place with density 5.
I'm not sure exactly what you want to do with your analysis, but maybe you should consider fitting a simple Poisson model with ppm and seeing how your data diverge from the proposed model, to understand the behaviour of the attacks.
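A sketch of that last suggestion, continuing with the fake data above (log(pop) is one plausible choice of covariate transformation, and assumes the density values are strictly positive):
# Model attack intensity as a power of population density
fit <- ppm(attack ~ log(pop))
fit                  # a coefficient near 1 suggests intensity roughly proportional to pop
diagnose.ppm(fit)    # residual diagnostics: where do the data diverge from the model?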
