How to extract saved envelope values in Spatstat?

I am new to both R & spatstat and am working with the inhomogeneous pair correlation function. My dataset consists of point values spread across several time intervals.
sp77.ppp = ppp(sp77.dat$Plot_X, sp77.dat$Plot_Y, window = window77, marks = sp77.dat$STATUS)
Dvall77 = envelope((Y = dv77.ppp[dv77.ppp$marks == '2']), fun = pcfinhom,
                   r = seq(0, 20, 0.25), nsim = 999, divisor = 'd',
                   simulate = expression(rlabel(dv77.ppp)[rlabel(dv77.ppp)$marks == '1'],
                                         rlabel(dv77.ppp)[rlabel(dv77.ppp)$marks == '2']),
                   savepatterns = T, savefuns = T)
I am trying to make multiple pairwise comparisons (between different time periods) and need to create a function that will, for each calculated envelope at every 'r' value, find the min and max differences between the envelopes.
My question is: how do I find the saved envelope values? I know that savefuns = T saves all the simulated function values, but I can't find how to extract them. The summary (below) says that the values are stored. How do I call and extract these values?
> print(Dvall77)
Pointwise critical envelopes for g[inhom](r)
and observed value for ‘(Y = dv77.ppp[dv77.ppp$marks == "2"])’
Edge correction: “iso”
Obtained from 999 evaluations of user-supplied expression
(All simulated function values are stored)
(All simulated point patterns are stored)
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.......................................................................................
Math.label Description
r r distance argument r
obs {hat(g)[inhom]^{obs}}(r) observed value of g[inhom](r) for data pattern
mmean {bar(g)[inhom]}(r) sample mean of g[inhom](r) from simulations
lo {hat(g)[inhom]^{lo}}(r) lower pointwise envelope of g[inhom](r) from simulations
hi {hat(g)[inhom]^{hi}}(r) upper pointwise envelope of g[inhom](r) from simulations
.......................................................................................
Default plot formula: .~r
where “.” stands for ‘obs’, ‘mmean’, ‘hi’, ‘lo’
Columns ‘lo’ and ‘hi’ will be plotted as shading (by default)
Recommended range of argument r: [0, 20]
Available range of argument r: [0, 20]
Thanks in advance for any suggestions!

If you are looking to access the values of the summary statistic (ginhom) for each of the randomly labelled patterns, this is in principle documented in help(envelope.ppp). Admittedly this help file is long, and if you are new to both R and spatstat it is easy to get lost. The clue is in the Value section: the result is a data.frame with some additional classes (envelope and fv), and as the help file says:
Additionally, if ‘savepatterns=TRUE’, the return value has an
attribute ‘"simpatterns"’ which is a list containing the ‘nsim’
simulated patterns. If ‘savefuns=TRUE’, the return value has an
attribute ‘"simfuns"’ which is an object of class ‘"fv"’
containing the summary functions computed for each of the ‘nsim’
simulated patterns.
Then of course you need to know how to access an attribute in R, which is done using attr:
funs <- attr(Dvall77, "simfuns")
Then funs is a data.frame (and fv-object) with all the function values for each randomly labelled pattern.
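To pull the raw simulated values out as ordinary numbers, something like this sketch works (the names sims, r.vals and sim.mat are just illustrative; the first column of the fv object is r and the remaining columns are the simulated curves):
sims <- as.data.frame(funs)       # fv objects inherit from data.frame
r.vals <- sims$r
sim.mat <- as.matrix(sims[, -1])  # one column per simulated pattern
# pointwise range over all simulations at each r
sim.range <- t(apply(sim.mat, 1, range))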
It is not entirely clear from your question whether you just need the values of the upper and lower curves defining the envelope. In that case you simply access them like an ordinary data.frame (and there is no need to save all the individual function values in the envelope):
lo <- Dvall77$lo
hi <- Dvall77$hi
d <- hi - lo
More elegantly you can do:
d <- with(Dvall77, hi - lo)
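And for the comparison across time periods you describe, here is a sketch of the idea (Dvall78 stands for a hypothetical second envelope object, computed the same way for another year; it is not in your post):
# envelope width at each r for the two periods
d77 <- Dvall77$hi - Dvall77$lo
d78 <- Dvall78$hi - Dvall78$lo   # Dvall78: hypothetical envelope for another period
# min and max difference between the envelope widths across all r
range(d77 - d78)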

Related

What does a proportional matrix look like for glmnet response variable in R?

I'm trying to use glmnet to fit a GLM that has a proportional response variable (using family="binomial").
The help file for glmnet says that the response variable:
"For family="binomial" should be either a factor with
two levels, or a two-column matrix of counts or proportions (the second column
is treated as the target class"
But I don't really understand how I would have a two column matrix. My variable is currently just a single column with values between 0 and 1. Can someone help me figure out how this needs to be formatted so that glmnet will run it properly? Also, can you explain what the target class means?
It is a matrix of positive-label and negative-label counts. For example, below we fit a model for the proportion of Claims among Holders:
library(glmnet)
data = MASS::Insurance
# two-column response: first column = "failures" (Holders without claims),
# second column = "successes" (Claims), i.e. the target class
y_counts = cbind(data$Holders - data$Claims, data$Claims)
x = model.matrix(~ District + Age + Group, data = data)
fit1 = glmnet(x = x, y = y_counts, family = "binomial", lambda = 0.001)
If possible, you should go back to before your calculation of the response variable and retrieve these counts. If that is not possible, you can provide a matrix of proportions (second column for success), but this assumes the weight, or n, is the same for all observations:
# convert counts to proportions; each row then implicitly gets equal weight
y_prop = y_counts / rowSums(y_counts)
fit2 = glmnet(x = x, y = y_prop, family = "binomial", lambda = 0.001)
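If your totals do differ across observations, glmnet's weights argument accepts total counts alongside a proportion response, so a sketch like this avoids the equal-n assumption:
# proportions as the response, row totals supplied as observation weights
fit3 = glmnet(x = x, y = y_prop, family = "binomial",
              weights = rowSums(y_counts), lambda = 0.001)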

Convert a list (class numeric) into a distance structure in R

I have a list that looks like this; it is a measure of dispersion for each sample:
1 2 3 4 5
0.11829384 0.24987017 0.08082147 0.13355495 0.12933790
To analyze this further I need it to be a distance structure; the vegan package needs it as a 'dist' object.
I found some solutions that apply to converting matrices to dist, but how can I turn this data into a dist object?
I am using the FD package; in its manual I found:
Still, one potential advantage of FDis over Rao’s Q is that in the unweighted case
(i.e. with presence-absence data), it opens possibilities for formal statistical tests for differences in
FD between two or more communities through a distance-based test for homogeneity of multivariate
dispersions (Anderson 2006); see betadisper for more details
I wanted to use the vegan betadisper function to test if there are differences among different regions (the grouping comes from an object region with a column region).
functional <- FD(trait, comun)
mod <- betadisper(functional$FDis, region$region)
using gowdis or fdisp from FD didn't work too.
distancias <- gowdis(rasgo)
mod <- betadisper(distancias, region$region)
dispersion <- fdisp(distancias, presence)
mod <- betadisper(dispersion, region$region)
I tried this, but I get a list object; I thought I could pass those results on to betadisper.
You cannot do this: FD::fdisp() does not return dissimilarities. It returns a list of three elements: the dispersions FDis for each sampling unit (SU), and the results of the eigen decomposition of the input dissimilarities (eig for the eigenvalues, vectors for the orthonormal eigenvectors). The FDis values are summarized for each original SU, but there is no information on the differences among SUs.
The eigen decomposition could be used to reconstruct the original input dissimilarities (your distancias from FD::gowdis()), but you can just use the input dissimilarities directly: FD::gowdis() returns a regular "dist" object that you can pass straight to vegan::betadisper(), if that gives you a meaningful analysis. For this, your grouping variable must be based on the same units as distancias. In a typical application of fdisp the units are species (taxa), but it seems you want an analysis for communities/sites/whatever, and that will not be possible with these tools.
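For illustration, a quick sketch (reusing distancias and presence from your code) of why the fdisp() result cannot be handed to betadisper(), while the gowdis() result can:
library(FD)
disp <- fdisp(distancias, presence)
str(disp)          # a list: $FDis (one dispersion per unit), $eig, $vectors
class(disp$FDis)   # "numeric" -- per-unit values, not pairwise dissimilarities
class(distancias)  # "dist" -- this is what vegan::betadisper() expects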

tapply, plotting, length doesn't match

I am trying to generate a plot from a dataset of 2 columns: the first column contains distances and the second contains correlations of something measured at those distances.
There are multiple entries with the same distance but different correlation values. I want to take the average of these entries and plot distance versus correlation. So this is what I did (the dataset is called correlationtable):
bins <- sort(unique(correlationtable[,1]))
corr <- tapply(correlationtable[,2],correlationtable[,1],mean)
plot(bins,corr,type = 'l')
However, this gives me the error that lengths of bins and corr don't match.
I cannot figure out what I am doing wrong.
I tried it with some random data and it worked every time for me. To track down the error you would need to supply us with the concrete example that did not work for you.
However, to answer the question, here is an alternative way to do the same thing:
corr <- tapply(correlationtable[,2],correlationtable[,1],mean)
bins <- as.numeric(names(corr))
plot(bins,corr,type = 'l')
This uses the fact that tapply returns a names attribute, which is then converted to numeric and used as the distances; by construction it has the same length as corr.
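For instance, a minimal reproducible sketch with dummy data (the data here are made up, since the question did not include any):
set.seed(1)
correlationtable <- data.frame(distance = sample(seq(0, 10, 0.5), 200, replace = TRUE),
                               correlation = runif(200, -1, 1))
corr <- tapply(correlationtable[, 2], correlationtable[, 1], mean)
bins <- as.numeric(names(corr))
length(bins) == length(corr)   # always TRUE by construction
plot(bins, corr, type = 'l')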

Is it possible to specify a range for numbers randomly generated by mvrnorm() in R?

I am trying to generate a random set of numbers that exactly mirrors a dataset I have (in order to test it). The dataset consists of 5 variables that are all correlated, with different means, standard deviations, and ranges (each variable is several Likert scales added together). I have been able to get mvrnorm from the MASS package to create a dataset that replicates the correlation matrix with the observed number of observations (after 500,000+ iterations), and I can easily reassign means and standard deviations through z-score transformation, but I still have specific values within each variable vector that are far above or below the possible range of the scale I wish to replicate.
Any suggestions how to fix the range appropriately?
Thank you for sharing your knowledge!
To generate a sample that does "exactly mirror" the original dataset, you need to make sure that the marginal distributions and the dependence structure of the sample matches those of the original dataset.
A simple way to achieve this is with resampling:
my.data <- matrix(runif(1000, -1, 2), nrow = 200, ncol = 5) # Some dummy data
my.ind <- sample(1:nrow(my.data), nrow(my.data), replace = TRUE)
my.sample <- my.data[my.ind, ]
This will ensure that the margins and the dependence structure of the sample (closely) matches those of the original data.
An alternative is to use a parametric model for the margins and/or the dependence structure (a copula). But as stated by @dickoa, this will require serious modeling effort.
Note that by using a multivariate normal distribution, you are (implicitly) assuming that the dependence structure of the original data is the Gaussian copula. This is a strong assumption, and it would need to be validated beforehand.
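To sketch that idea concretely (an illustration only, assuming a Gaussian copula is acceptable and using the Pearson correlation from cor() as a rough stand-in for the copula parameter): draw correlated normals with MASS::mvrnorm() and map each column back onto the observed margins with empirical quantiles, which keeps every simulated value inside the observed range:
library(MASS)
my.data <- matrix(runif(1000, -1, 2), nrow = 200, ncol = 5) # dummy data from above
n <- nrow(my.data)
# draw correlated standard normals with the data's correlation structure
z <- mvrnorm(n, mu = rep(0, 5), Sigma = cor(my.data))
# convert to uniforms, then map onto the empirical margins via quantiles
u <- pnorm(z)
synthetic <- sapply(seq_len(ncol(my.data)),
                    function(j) quantile(my.data[, j], probs = u[, j], type = 8))
# no value can fall outside the observed range of each column
apply(synthetic, 2, range)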

Generating multiple confidence intervals from samples of a normal distribution in R

I am a statistics student and R beginner (understatement of the year) trying to generate multiple confidence intervals for randomly generated samples of a normal distribution as part of an assignment.
I used the function
data <- replicate(25, rnorm(20, 50, 6))
to generate 25 samples of size n=20 from a N(50, 6^2) distribution (in a double matrix).
My question is: how do I find a 95% confidence interval for each sample of this distribution? I know that I can use colMeans(data) and apply(data, 2, sd) to find the sample mean and sample standard deviation of each sample, but I am having a brain fart trying to think of a function that can generate the confidence intervals for all columns in the double matrix (data).
As of now, my (extremely crude) solution consists of creating the functions
left <- function(x, y) {x - qnorm(0.975) * y / sqrt(20)}
right <- function(x, y) {x + qnorm(0.975) * y / sqrt(20)}
left(colMeans(data), apply(data, 2, sd))
right(colMeans(data), apply(data, 2, sd))
to generate 2 vectors of left and right bounds. Please let me know if there is a better way I can do this.
I suppose you could use the t.test() function. It returns the mean and the 95% confidence interval for a given vector of numbers.
# Create your data
data <- replicate(25, rnorm(20, 50, 6))
data <- as.data.frame(data)
After you make your data, you could apply the t.test() function to all columns using the lapply() function.
# Apply the t.test function and save the results
results <- lapply(data, t.test)
If you only want to see the confidence interval or mean returned, you can call them using the dollar sign operator. For example, for column one of your original data frame, you could type the following:
# Check 95% CI for sample one
results[[1]]$conf.int[1:2]
You could come up with a more elegant way of saving these data to a results data frame. Remember, you can always see what individual bits of information you can yank from an object by using the str() command. For example:
# Example
example <- t.test(data[,1])
str(example)
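For example, one way (just a sketch, reusing the results list from above) to gather the means and 95% CI bounds for all 25 samples into a single data frame:
# one row per sample: mean, lower bound, upper bound
ci.table <- do.call(rbind, lapply(results, function(tt) {
  data.frame(mean = unname(tt$estimate),
             lower = tt$conf.int[1],
             upper = tt$conf.int[2])
}))
head(ci.table)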
Hope this helps. Try this link for more information: Using R to find Confidence Intervals
