Numerical integration of a numerical function in R

I'm trying to apply this solution to find the p-value in an arbitrary distribution defined from experimental data. I have estimated this distribution using the density function in R. Now I would like to integrate this function to apply the solution proposed by @mpiktas. However, the integrate function requires a function as input, not two vectors x and y with the values that define the function, which is what density provides.
Any ideas on how to handle this numerical integration based on x-y values in R?
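One possible approach, shown here only as a minimal sketch: wrap the x-y vectors returned by density in an interpolating function with approxfun and pass that to integrate, or skip integrate and apply the trapezoidal rule to the vectors directly. The sample obs, the observed value q_obs and the helper trapz are hypothetical names introduced for illustration, and treating the density as 0 outside the grid is an assumption.

```r
set.seed(42)
obs <- rnorm(500)            # hypothetical sample standing in for the experimental data
d <- density(obs)            # d$x is the grid, d$y the estimated density on that grid

# Wrap the x-y pairs in a function; outside the grid the density is treated as 0 (assumption)
pdf_hat <- approxfun(d$x, d$y, yleft = 0, yright = 0)

# Sanity check: the total area should be close to 1
total_area <- integrate(pdf_hat, min(d$x), max(d$x), subdivisions = 2000L)$value

# Upper-tail p-value for a hypothetical observed value q_obs
q_obs <- 1.5
p_upper <- integrate(pdf_hat, q_obs, max(d$x), subdivisions = 2000L)$value

# Alternative without integrate(): trapezoidal rule directly on the vectors
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
tail_idx <- d$x >= q_obs
p_trapz <- trapz(d$x[tail_idx], d$y[tail_idx])
```

The two tail estimates differ slightly because the trapezoidal version starts at the first grid point at or above q_obs rather than at q_obs itself; a finer grid (the n argument of density) narrows the gap.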

Related

How to get analytical formula of probability density function and cumulative distribution function for a distribution in R?

Is there any way to print out the PDF/CDF formula for a distribution? E.g., for the normal distribution I would like to run a command and see some formula printed: f(x) = 1/sqrt(.....)...
I want to translate R's implementation of distributions like the hyperbolic and EGB2 into Python, and I hope there is a way to fetch the formula from R elegantly rather than digging through the source code.

R: functions to determine distance of multivariate data to normal distribution

I have multivariate data and I am interested in computing the distance of the complete data set from a multivariate normal distribution. I want to use R. I have seen some functions such as the Shapiro-Wilk test, but from them I can only conclude that the data do not follow a normal distribution if the p-value is less than 0.05. I want to know how far the data are from the normal distribution. Can anyone please point me to some functions I could use?
Use the mqqnorm function from the RVAideMemoire package. It shows, among other things, Mahalanobis distances. From the function example:
library(RVAideMemoire)  # provides mqqnorm()
x <- 1:30+rnorm(30)
y <- 1:30+rnorm(30,1,3)
mqqnorm(cbind(x,y))
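If you want the distances as numbers rather than only a plot, base R's mahalanobis function computes squared Mahalanobis distances. The following is a rough sketch of that idea, not what mqqnorm does internally; comparing the sorted distances with chi-square quantiles is the usual large-sample approximation, and the variable names are only illustrative.

```r
set.seed(1)
x <- 1:30 + rnorm(30)
y <- 1:30 + rnorm(30, 1, 3)
X <- cbind(x, y)

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under multivariate normality, d2 is approximately chi-square
# with ncol(X) degrees of freedom (large-sample approximation)
theo <- qchisq(ppoints(nrow(X)), df = ncol(X))
qq <- data.frame(theoretical = theo, observed = sort(d2))
```

Calling plot(qq) followed by abline(0, 1) gives a chi-square Q-Q plot; points far from the line indicate departure from multivariate normality.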

Simple variogram in R, understanding gstat::variogram() and object gstat

I have a data.frame in R whose variables represent locations and whose observations are measurements of a certain variable at those locations. I want to measure how dependence decays with distance for certain locations, so the variogram is particularly useful for my studies.
I am trying to use the gstat package, but I am a bit confused about certain parameters. As far as I understand, the (empirical) variogram should only need the following basic data:
The locations of the variables
Observations for these variables
And then other parameters like maximum distance, directions, ...
Now, the gstat::variogram() function requires an object of class gstat as its first input. Checking the documentation of the gstat() function, I see that it outputs an object of this class, but this function requires a formula argument, which is described as:
formula that defines the dependent variable as a linear model of independent variables; suppose the dependent variable has name z, for ordinary and simple kriging use the formula z~1; for simple kriging also define beta (see below); for universal kriging, suppose z is linearly dependent on x and y, use the formula z~x+y
Could someone explain to me what this formula is for?
Try
methods(variogram)
and you'll see that gstat has several methods for variogram, one requiring a gstat object as first argument.
Given a data.frame, the easiest is to use the formula method:
variogram(z~1, ~x+y, data)
which specifies that, in data, z is the observed variable of interest, ~1 specifies a constant mean model, and ~x+y specifies that the coordinates are found in columns x and y of data.
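To see why locations and observations are enough once a constant mean (z~1) is assumed, here is a rough base-R sketch of an empirical variogram. It is not gstat's implementation; the data frame dat, the cutoff and the number of bins are all made-up values for illustration.

```r
set.seed(1)
dat <- data.frame(x = runif(100), y = runif(100))   # hypothetical locations
dat$z <- sin(3 * dat$x) + rnorm(100, sd = 0.2)      # hypothetical observations

h <- as.vector(dist(dat[, c("x", "y")]))   # pairwise distances between locations
g <- as.vector(dist(dat$z))^2 / 2          # semivariances 0.5 * (z_i - z_j)^2, same pair order

cutoff <- 0.5                              # maximum distance considered (assumption)
breaks <- seq(0, cutoff, length.out = 11)  # 10 distance bins
bin <- cut(h, breaks)                      # NA for pairs farther apart than cutoff

emp_vgm <- data.frame(
  np    = as.vector(table(bin)),           # number of pairs per bin
  dist  = as.vector(tapply(h, bin, mean)), # mean distance per bin
  gamma = as.vector(tapply(g, bin, mean))  # mean semivariance per bin
)
```

With a non-constant mean model such as z~x+y, the semivariances would be computed on residuals from that linear model instead of on z itself, which is what the formula argument controls.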

Prediction at a new value using lowess function in R

I am using the lowess function to fit a regression between two variables x and y. Now I want to know the fitted value at a new value of x. For example, how do I find the fitted value at x=2.5 in the following example? I know loess can do that, but I want to reproduce someone's plot and he used lowess.
set.seed(1)
x <- 1:10
y <- x + rnorm(x)
fit <- lowess(x, y)
plot(x, y)
lines(fit)
Local regression (lowess) is a non-parametric statistical method; it's not like linear regression, where you can use the model directly to estimate new values.
You'll need to take the values from the function (that's why it only returns a list to you) and choose your own interpolation scheme. Use that scheme to predict your new points.
A common technique is spline interpolation (but there are others):
https://www.r-bloggers.com/interpolation-and-smoothing-functions-in-base-r/
EDIT: I'm pretty sure the predict function (which works on loess fits, not on the list that lowess returns) does the interpolation for you. I also can't find any information about what exactly predict uses, so I've tried to trace the source code.
https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/stats/R/loess.R
else { ## interpolate
## need to eliminate points outside original range - not in pred_
I'm sure the R code calls the underlying C implementation, but it's not well documented, so I don't know what algorithm it uses.
My suggestion is: either trust the predict function or roll your own interpolation algorithm.
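As a minimal sketch of the "roll your own" option, you can interpolate the points that lowess returns with approxfun (linear) or splinefun (cubic spline), both in base R. Which scheme is appropriate is a judgment call, and neither reproduces what loess's predict does internally.

```r
set.seed(1)
x <- 1:10
y <- x + rnorm(x)
fit <- lowess(x, y)                 # list with components x and y

# Linear interpolation between the smoothed points
f_lin <- approxfun(fit$x, fit$y)
pred_lin <- f_lin(2.5)

# Cubic spline interpolation through the smoothed points
f_spl <- splinefun(fit$x, fit$y)
pred_spl <- f_spl(2.5)
```

Note that approxfun returns NA outside the original range of x by default, so this only covers interpolation, not extrapolation.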

Extract a random sample from an unknown distribution (not generating stochastic random deviates)

I have a vector of data. I need to build the density / distribution function and, from that, extract a random sample; i.e., I need a function similar to rnorm(), rpois(), rbinom(), etc., but for a distribution built from a vector of data. All in R. Thank you so much.
It has nothing to do with generating stochastic random deviates.
I know the sample() function does something similar, but not exactly. If I use sample(), I obtain only elements from my original data, as from a discrete distribution, and I need a continuous distribution.
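One way to get continuous draws, sketched here under the assumption that a Gaussian kernel density estimate is an acceptable model of the data: resample the data with sample() and add Gaussian noise whose standard deviation equals the kernel bandwidth (a smoothed bootstrap). The function name rkde and the example vector are hypothetical; bw.nrd0 is the default bandwidth rule used by density().

```r
# Draw n values from a Gaussian kernel density estimate of `data`
# (hypothetical helper; resample the data, then jitter by the bandwidth)
rkde <- function(n, data, bw = bw.nrd0(data)) {
  sample(data, n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)
}

set.seed(123)
my_data <- rexp(200)             # hypothetical data vector
draws <- rkde(1000, my_data)     # continuous draws, not restricted to values in my_data
```

Because the kernel is symmetric, the draws can fall slightly outside the range of the original data (here, below zero), which may or may not be acceptable for bounded quantities.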
