How can I get the spatial correlation between two datsets in r? - r

I have two arrays:
data1=array(-10:30, c(2160,1080,12))
data2=array(-20:30, c(2160,1080,12))
#Add in some NAs
ind <- which(data1 %in% sample(data1, 1500))
data1[ind] <- NA
One is modelled global gridded data (lon,lat,month) and the other, global gridded observations (lon,lat,month).
I want to assess how 'skillful' the modelled data is at recreating the obs. I think the best way to do this is with a spatial correlation between the datasets. How can I do that?
I tried a straightforward x<-cor(data1,data2) but that just returned x<-NA_real_.
Then I was thinking that I probably have to break it up by month or season. So, just looking at one month x<-cor(data1[,,1],data2[,,1]) it returned a matrix of size 1080*1080 (most of which are NAs).
How can I get a spatial correlation between these two datasets? i.e. I want to see where the modelled data performs 'well' i.e. has high correlation with observations, or where it does badly (low correlation with observations).

Related

When setting your obsCovs for the function pcount (package unmarked) how does R "know" which obsCov observation corresponds to each y value?

I'm relatively new at R particularly with this package. I am running n-mixture models assessing detection probabilities and abundance. I have abundance data, site covariates and observation covariates. There are three repeated observations(rounds)/site. The observation covariates are set up as columns (three column/covariate, one for each round). The rows are individual sites. The abundance data is formatted similarly, with each column heading representing a different round. I've copied my code below.
y.abun2<-COYE[2:4]
obsCovs.ss <- list(temp=Covariate2021[3:5], Date=Covariate2021[13:15], Cloud=Covariate2021[17:19], Wind=Covariate2021[21:23],Observ=Covariate2021[25:27])
siteCovs.ss <- Covariate2021[c(29,30,31,32)]
coyeabund<-unmarkedFramePCount(y=y.abun2, siteCovs = siteCovs.ss,
obsCovs = obsCovs.ss)
After this I scale using this code:
coyeabund#siteCovs$TreeCover <-
scale(coyeabund#siteCovs$TreeCover)
Moving on to my model I use this code:
abun.coye.full<-pcount(~TreeCover+temp+Date+Cloud+Wind+Observ ~ HHSDI+ProportionNH+Quality, coyeabund,mixture="NB", K=132,se=TRUE)
Is the model matching the observation covariates to the abundance measurements to each round? (i.e., is it able to tell that temp column 5 corresponds to the third round of abundance measurements?)
The models seem fine so far but I am so new at this I want to confirm that I haven't gone astray.

Divide a set of curves into groups using functional data analysis

I have a dataset containing about 500 curves. In my dataset every row is a curve (which comes from some experimental measurements) and in the columns there are the measurement intervals (I don't think it's important, but intervals are not time measurements but frequency measurements).
Here you can find the data:
https://drive.google.com/file/d/1q1F1any8RlCIrn-CcQEzLWyrsyTBCCNv/view?usp=sharing
curves
t1
t2
1
-57.48
-57.56
2
-56.22
-56.28
3
-57.06
-57.12
I want to divide this dataset into 2 - 4 homogeneous groups of curves.
I've seen that there are some packages in R (fda and funHDDC) that allow you to find clusters but I don't know how to create the list with which to start the analysis, and I also don't understand why the initial dataset doesn't fit. How can I transform the data I have into a list suitable for processing with the above packages?
What results should I expect?

How to include plots / rows with zero values in the presence / absence community matrix in a CCA using R Vegan package

I am trying to do CCA using a presence / absence matrix of plant quadrat data and continuous environmental data for the same quadrats, using the Vegan package in R. Some of the quadrats have no plant species present (the row for the quadrat is full of 0's) but do have corresponding environmental data in another dataframe. The context of the study is that the environmental data is metal concentrations in soil, which are typically high where there are no plant species, so the quadrats with zero species do contribute to the data, and are not errors or NA's. When running the CCA with the R Vegan Package so far I have had to delete these rows to get it to work, otherwise it returns the error
'Error in cca.default(d$X, d$Y, d$Z) :
all row sums must be >0 in the community data matrix' .
Is there a way to include the data from quadrats that have no plant species in the CCA? I have read in this paper, which also uses the Vegan package,: https://www.researchgate.net/publication/229087061_Relationships_between_the_presence_of_odonate_species_and_environmental_characteristics_in_lowland_ponds_of_central_Italy and that has a similar research design, that they have included plots with zero species by adding a 'zero species' variable but do not elaborate on how this is done.
I am new to coding so any help is very much appreciated,
Thanks in advance
Here is how to do it. Assume your data set is called comm and it has some rows (sampling units) that have no species:
comm$ZERO <- as.numeric(rowSums(comm) == 0)
This will add a new column ZERO which is 1 for rows that had no species, and 0 for others.
Personally, I would be worried about doing this. Correspondence Analysis is a compositional analysis, and adding a column (species) that never occurs with any other species (by definition) creates a data set with two disjunct blocks. In unconstrained CA this disjunct block manifests in first eigenvalue 1 – which is the theoretical maximum in CA. This first eigenvector will separate the blocks: ZERO species and the sampling units with ZERO species in one extreme, and all other species and sampling units in another extreme of the first axis. The second axis of this ZERO ordination will be identical to the first axis without ZERO, so in effect you just add this disjunction axis to the ordination.
Things are slightly different with CCA which actually looks at the fitted values of your species, and these fitted values may not be disjunct. So technically you can do it. However, it is not quite clear to me what you do if you do so. Even if the data set is not completely disjunct with CCA, the zero sampling units will probably be far separated from other points, and all plotted in the same point.

Spatial Regressions with Panel Data in R

I have a panel dataset with several hundred regions, ~10 years and spatial data for the regions. I created a weight matrix with the spdeppackage (via the standard way, and then, nb2listw).
I have, thus, a matrix with weights for each region (in relation to the other regions) - but each region is represented just once.
I would like to run some of the spatial regressions from the spdeppackage (lagsarlm, errorsarlm), but I get an error:
Error in subset.listw(listw, subset, zero.policy = zero.policy) :
Not yet able to subset general weights lists
and
Error in lagsarlm(y ~ x1 + x2: Input data and weights have different dimensions
I assume this is because the weight matrix has only one row per region (and then, only one year can be calculated). Do you have any suggestions how to attack the issue?
My ideas revolve around the following:
Extend the spatial weight matrix OR
Tell spdep that the regions will repeat in the same order (but how?)
Looking forward to your suggestions.

How to specify subset/ sample number for permutations using specaccum() in R's vegan package

I have a community matrix (species as columns, samples as rows) from which I would like to generate a species accumulation curve (SAC) using the specaccum() and fitspecaccum() functions in R's vegan package. In order for the resulting SAC and cumulative species richness at sample X to be comparable among regions (I have 1 community matrix per region), I need to have specaccum() choose the same number of sets within each region. My problem is that some regions have a larger number of sets than others. I would like to limit the sample size to the minimum number of sets among regions (in my case, the minimum number of sets is 45, so I would like specaccum() to randomly sample 45 sets, 100 times (set permutations=100) for each region. I would like to sample from the entire data set available for each region. The code below has not worked... it doesn't recognize "subset=45". The vegan package info says "subset" needs to be logical... I don't understand how subset number can be logical, but maybe I am misinterpreting what subset is... Is there another way to do this? Would it be sufficient to run specaccum() for the entire number of sets available for each region and then just truncate the output to 45?
require(vegan)
pool1<-specaccum(comm.matrix, gamma="jack1", method="random", subet=45, permutations=100)
Any help is much appreciated.
Why do you want to limit the function to work in a random sample of 45 cases? Just use the species accumulation up to 45 cases. Taking a random subset of 45 cases gives you the same accumulation, except for the random error of subsampling and throwing away information. If you want to compare your different cases, just compare them at the sample size that suits all cases, that is, at 45 or less. That is the idea of species accumulation models.
The subset is intended for situations where you have (possibly) heterogeneous collection of sampling units, and you want to stratify data. For instance, if you want to see only the species accumulation in the "OldLow" habitat type of the Barro Colorado data, you could do:
data(BCI, BCI.env)
plot(specaccum(BCI, subset = BCI.env$Habitat == "OldLow"))
If you want to have, say, a subset of 30 sample plots of the same data, you could do:
take <- c(rep(TRUE, 30), rep(FALSE, 20))
plot(specaccum(BCI)) # to see it all
# repeat the following to see how taking subset influences
lines(specaccum(BCI, subset = sample(take)), col = "blue")
If you repeat the last line, you see how taking random subset influences the results: the lines are normally within the error bars of all data, but differ from each other due to random error.

Resources