How to infer a relation/correlation between two spatial points using R

I am quite new to the area of spatial statistics, but I'm very interested. For learning and demo purposes, I've created three datasets.
Dataset - Persons: This describes individuals at a certain location with a few variables. Please note that the persons are located in the provided cities. A short explanation:
POINT_X: X-coordinate of city.
POINT_Y: Y-coordinate of city.
city: The name of the city, where they live.
ill: "1" states that they are ill. For learning purposes, all persons are ill.
job: Whether they have a job. "1" means they have one, "0" means they do not.
disnw: The distance to the nearest waterpoint.
wID: not relevant.
Dataset - City: This describes a number of cities including some variables. A short explanation of these:
city: The name of the city.
population: The population of the city.
POINT_X: X-coordinate of city.
POINT_Y: Y-coordinate of city.
ill: Number of ill persons in the city.
notill: Number of healthy persons in the city.
disnw: The distance (in km) to the nearest waterfeature.
wID: not relevant
rate_ill: The rate of ill persons in the city.
rate_notill: The rate of healthy persons in the city.
Dataset - Waterfeatures: This is a collection of spatial points that describes waterfeatures. Please note that the villages are at the same locations as the persons.
POINT_X: X-coordinate of a waterfeature.
POINT_Y: Y-coordinate of a waterfeature.
Geographic overview of the setting (red are persons, blue are waterfeatures, yellow are cities)
Now I want to check the hypothesis that cities that are nearer to waterfeatures (that is, where the variable disnw is lower) have a higher number of ill persons. In other words: is there a correlation between the number (or rate) of ill persons and the proximity to waterfeatures? I know that the datasets are possibly not representative or suitable for my hypothesis, but for now this shouldn't matter.
I've already looked at some functions and packages, but I'm very unsure about a suitable method. Methods that might be useful (at least from my point of view): semivariogram, variogram, Ripley's K function, G function, correlation coefficient.
To give you a better overview, I've created example datasets. You can find these here:
persons = read.csv("http://pastebin.com/raw.php?i=3aMGi9Ax", header = TRUE, stringsAsFactors=FALSE)
city = read.csv("http://pastebin.com/raw.php?i=Lk3KXLQT", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("http://pastebin.com/raw.php?i=hQRvMZwE", header = TRUE, stringsAsFactors=FALSE)
It would be awesome to get some input from your side. Maybe you have a tip, how to perform this kind of analysis.
Thanks in advance!
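One simple starting point for the hypothesis above is a rank correlation between disnw and rate_ill at the city level. The following is only a minimal sketch: the data frame city_demo and its values are made up for illustration, and only the column names disnw and rate_ill come from the city dataset described in the question. A plain correlation test also ignores any spatial autocorrelation between neighbouring cities, so it is a first look rather than a full spatial analysis.

```r
# Hypothetical stand-in for the city dataset; the values are invented.
city_demo <- data.frame(
  disnw    = c(0.5, 1, 2, 3, 5, 8, 12, 20),                          # distance to nearest waterfeature (km)
  rate_ill = c(0.30, 0.28, 0.22, 0.20, 0.15, 0.12, 0.08, 0.05)       # rate of ill persons
)

# Spearman's rank correlation only assumes a monotonic relationship,
# not a linear one.
res <- cor.test(city_demo$disnw, city_demo$rate_ill, method = "spearman")

# A negative estimate would be consistent with "closer to water, higher illness rate".
print(res$estimate)
print(res$p.value)
```

With the real data, the same call could be run on city$disnw and city$rate_ill after reading the CSV files.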

Related

Calculate weights from overlapping areas (layer) Rstudio

I have two shapefiles of the quartiers of Paris, one from before 1860 (available here) and one from after (available here).
Neighborhood boundaries changed over time, and I want to do an analysis with the geography from before 1860.
To do so, I need to compute how much each new neighborhood overlaps the old ones by computing weights. There can be several new neighborhoods that overlap a single old neighborhood. In the end, I want a crosswalk matrix with all the weights of new neighborhoods to old neighborhoods.
Here is a figure to illustrate, (in orange the old, in red the new)
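A possible way to build such a crosswalk is to intersect the two layers and divide each intersection's area by the area of the new neighborhood it came from. This is a sketch under assumptions: the square geometries, the helper sq(), and the column names old_id, new_id, new_area and weight are all hypothetical, and the weight is assumed to mean "share of each new neighborhood's area that falls in each old neighborhood". It uses the sf package.

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sf polygon.
sq <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Toy layers standing in for the two shapefiles.
old <- st_sf(old_id = c("O1", "O2"),
             geometry = st_sfc(sq(0, 0, 2, 2), sq(2, 0, 4, 2)))
new <- st_sf(new_id = c("N1", "N2"),
             geometry = st_sfc(sq(0, 0, 3, 2), sq(3, 0, 4, 2)))

# Area of each new neighborhood before it is cut into pieces.
new$new_area <- as.numeric(st_area(new))

# One row per (new, old) pair that overlaps.
pieces <- suppressWarnings(st_intersection(new, old))
pieces$weight <- as.numeric(st_area(pieces)) / pieces$new_area

# Crosswalk matrix: rows are new neighborhoods, columns are old ones.
crosswalk <- xtabs(weight ~ new_id + old_id, data = st_drop_geometry(pieces))
print(crosswalk)
```

With real shapefiles, both layers would first need to be read with st_read() and put in the same projected CRS so that the areas are meaningful.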

What are the names of the graphs used in the NAPLAN? Anyone know how to plot them?

In Australia we have a test for students called NAPLAN.
The results are provided in a sort of band graph mixed with a box and whisker.
Does anyone know what they are called?
They are good because they show the total range, where the student falls in the band, what the national average is, and what the student's class average is.
Essentially, 4 data points on 1 graph.

Normalizing species count data in ArcGIS Pro

I have presence points of a certain species all over the United States. I completed a spatial join between the US and said points. However, I am unsure of how to normalize the data. There is a "percent of total," but I am unsure if this is the appropriate option. Or is it as simple as just normalizing by the counts themselves?
It depends on what comparison you're trying to make with the normalized data.
If you want to look at the occurrence of that species by state, you could do a spatial join on a US States layer, then calculate a new field where the value is the species count for each state divided by the total area of the state. That would give you the normalized 'count per square mile' (or whatever unit you want).
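The difference between "percent of total" and an area-normalized count can be seen in a small calculation. The sketch below is written in R rather than ArcGIS Pro, and the state names, counts, and areas are invented purely for illustration.

```r
# Hypothetical per-state results of a spatial join; all numbers are made up.
states <- data.frame(
  state         = c("A", "B", "C"),
  species_count = c(120, 45, 300),
  area_sq_mi    = c(60000, 9000, 150000)
)

# Density: species count divided by the area of the state.
states$count_per_sq_mi <- states$species_count / states$area_sq_mi

# Percent of total: share of all points that fall in each state.
states$pct_of_total <- 100 * states$species_count / sum(states$species_count)

# Sort by density to compare the two rankings.
print(states[order(-states$count_per_sq_mi), ])
```

In this toy table the state with the fewest points has the highest density, which is why the choice of normalization depends on the comparison being made.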

Using st_join for a spatial join using largest intersection

I am using the sf package to work with spatial data in R. At this stage, I want to make a spatial join so that the tax lots of my area of study inherit the attributes of the floodplain on which they are located. For example, tax lots may be located in a floodplain categorized as X, VE, A, A0, or V (these are codes that relate to the intensity of the flood in each area).
To do this, I tested the sf function st_join, which will by default rely on st_intersects to determine the spatial join for each entity of my tax lots.
However, I am trying to figure out the criteria used by the function when a tax lot intersects with two different floodplain areas (as in the photo below, in which several lots intersect both an A floodplain and an AE floodplain). Does it take the value of the area that covers the largest part of the lot? Or is it a matter of which area appears first in the data frame?
Note that I am not interested in partitioning the intersecting lots, that is, dividing them according to the areas that intersect one or another floodplain zone.
Photo of tax lots intersecting with more than one floodplain category
By default, st_join(x, y, join = st_intersects) duplicates every feature in x that intersects more than one feature from y.
If you set the argument largest = TRUE, st_join() returns the x features augmented with the fields of y that have the largest overlap with each of the features of x.
See https://r-spatial.github.io/sf/reference/st_join.html and https://github.com/r-spatial/sf/issues/578 for more details.
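The two behaviours can be compared on a toy case. In this sketch the geometries, the helper sq(), and the column names lot_id and zone are hypothetical; only st_join() and its largest argument come from sf.

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sf polygon.
sq <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# One tax lot that straddles two flood zones: 1/4 in "A", 3/4 in "AE".
lots  <- st_sf(lot_id = 1, geometry = st_sfc(sq(0, 0, 4, 1)))
zones <- st_sf(zone = c("A", "AE"),
               geometry = st_sfc(sq(0, 0, 1, 1), sq(1, 0, 4, 1)))

# Default join: the lot is repeated once per intersecting zone.
dup <- st_join(lots, zones)

# largest = TRUE: one row per lot, taking the zone with the largest overlap.
one <- suppressWarnings(st_join(lots, zones, largest = TRUE))

print(nrow(dup))
print(one$zone)
```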

Multiple nonlinear regression in R (# of restaurants vs # of people in region)

I am trying to find the relationship between the number of people that come to a certain region and the number of accommodations, shops, restaurants, and leisure places in that region. I know the total number of people who visit a certain region, but I don't know whether they visit for accommodation or to shop, etc.
So I have plotted the number of restaurants, etc., in each region against the number of people in that region. Here is the graph. Here is some of the data I'm trying to analyze
The general shape of these points is a parabola that is rotated 90 degrees. I am not very familiar with R and cannot figure out how to find this equation, or whether this is even possible.
My goal is to get coefficients for each parameter (i.e. accommodation, restaurants, etc.) so I can conclude something like "if we add 10 restaurants, an increase of x number of people should come to the region."
Here is a snippet of some code I've tried but not succeeded
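# Fit People = sqrt(Accommodation / a); start = c(a = 1) is the initial guess for a
linez <- nls(People ~ sqrt(Accommodation / a), data = fourth, start = c(a = 1), trace = TRUE)
# newdata must use the predictor's name (Accommodation), not x; s is a grid of Accommodation values
lines(s, predict(linez, newdata = list(Accommodation = s)), col = "red")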
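To check that this kind of fit can work at all, the same model can be tried on simulated data. This is a sketch under assumptions: the data frame below is generated artificially with a known a of 0.5 and is not the asker's data, and the grid s is a hypothetical sequence of Accommodation values used only for drawing the fitted curve.

```r
set.seed(1)

# Simulated data following People = sqrt(Accommodation / a) with a = 0.5 plus noise.
fourth <- data.frame(Accommodation = seq(1, 100, length.out = 50))
fourth$People <- sqrt(fourth$Accommodation / 0.5) + rnorm(50, sd = 0.3)

# Nonlinear least squares fit of the "sideways parabola" model.
fit <- nls(People ~ sqrt(Accommodation / a), data = fourth, start = c(a = 1))

# Plot the points and overlay the fitted curve on a grid of predictor values.
s <- seq(1, 100, length.out = 200)
plot(People ~ Accommodation, data = fourth)
lines(s, predict(fit, newdata = data.frame(Accommodation = s)), col = "red")

# Estimated parameter; it should land near the value used in the simulation.
print(coef(fit))
```

Note that the key detail is that newdata names the predictor exactly as it appears in the formula.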
