Density estimation with dirichlet process - r

I'm interesting in estimating probability density functions in quite high dimension (around 15 or even more). I've heard about Dirichlet processes. Do you know a R package which implements this kind of methods ?
Thank You

I think you can use the "DPpackage" (http://cran.r-project.org/web/packages/DPpackage/index.html) that provides some Bayesian nonparametric density estimation functions. You can perform the Dirichlet mixture density estimation with 'DPdensity' function.

Related

I'm looking for the CaliPro codes on matlab

I am looking for the matlab code for CaliPro: A Calibration Protocol That Utilizes Parameter Density
Estimation to Explore Parameter Space and Calibrate Complex Biological Models
I am looking for the code "plot_outcomes.m" which plots the experimental data values, and each of the model simulations for both predator and prey.

Obtaining glmer coefficient confidence intervals via bootstrapping

I am in my first experience using mixed models in R for my statistical analysis. Due to my data being comprised of binary outcome variables, I have managed to build a logistic model using the glmer function of the lme4 package that I think works as I wanted it to.
I am now aiming to investigate the statistical significance of my model coefficients. I have read that generally, the best approach for generalized mixed models is to bootstrap confidence intervals, but I haven't managed to find a good, clear, explanation of how to do this in R.
Would anyone have any suggestions? Are there any packages in R that expedite this process, or do people generally build their own functions for this? I haven't really done any bootstrapping before so I'd appreciate some more in-depth answers.
If you want to compute parametric bootstrap confidence intervals, the built-in functionality
confint(fitted_model, method = "boot")
should work (see ?confint.merMod)
Also see this answer (which illustrates both parametric and nonparametric bootstrapping for user-defined quantities).
If you have multiple cores, you can speed this up by adding parallel = "multicore", ncpus = parallel::detectCores()-1 (or some other appropriate number of cores to use): see ?lme4::bootMer for details.

Is it possible to customize a likelihood function for logit models using speedglm, biglm, and glm packages

I am trying to fit a customized logistic regression/survival analysis function using the optim/maxBFGS functions in R and literally defining the functions by hand.
I was always under the impression that for the packages speedglm, biglm, and glm, the likelihood functions for logit models or whatever distribution were hardlocked. However, I was wondering if I was mistaken or if it was possible to specify my own likelihood functions. The reason being that optim/maxBFGS is a LOT slower to run than speedglm.
The R glm function is set up only to work with likelihoods from the exponential family. The fitting algorithms won't work with any other kind of likelihood, and with any other you're not in fact fitting a glm but some other kind of model.
The glm functions fit using iterated reweighted least squares; the special form of the likelihood function for the exponential families makes Newton's method for solving the max likelihood equations identical to fitting ordinary least squares regression repeatedly until convergence is achieved.
This is a faster process than generic nonlinear optimization; so if the likelihoods you want to use have been customized so that they are no longer from an exponential family, you are no longer fitting a generalized linear model. This means that the IRWLS algorithm isn't applicable, and the fit will be slower, as you are finding.

Output posterior distribution from bayesian network in R (bnlearn)

I'm experimenting with Bayesian networks in R and have built some networks using the bnlearn package. I can use them to make predictions for new observations with predict(), however I would also like to have the posterior distribution over the possible classes. Is there a way of retrieving this information?
It seems like there is a prob-parameter that does this for the naive bayes implementation in the bnlearn package, but not for networks fitted with bn.fit.
Thankful for any help with this.
See the documentation of bnlearn.
predict function implements prob only for naive.bayes and TAN.
In short, because all other methods do not necessarily compute posterior probabilities.
[bnlearn] :: predict returns the predicted values for node given the data specified by data. Depending on the
value of method, the predicted values are computed as follows:
a)parents b)bayes-lw
When using bayes-lw , likelihood weighting simulations are performed for making predictions.
Hope this helps. :)

Goodness of fit functions in R

What functions do you use in R to fit a curve to your data and test how well that curve fits? What results are considered good?
Just the first part of that question can fill entire books. Just some quick choices:
lm() for standard linear models
glm() for generalised linear models (eg for logistic regression)
rlm() from package MASS for robust linear models
lmrob() from package robustbase for robust linear models
loess() for non-linear / non-parametric models
Then there are domain-specific models as e.g. time series, micro-econometrics, mixed-effects and much more. Several of the Task Views as e.g. Econometrics discuss this in more detail. As for goodness of fit, that is also something one can spend easily an entire book discussing.
The workhorses of canonical curve fitting in R are lm(), glm() and nls(). To me, goodness-of-fit is a subproblem in the larger problem of model selection. Infact, using goodness-of-fit incorrectly (e.g., via stepwise regression) can give rise to seriously misspecified model (see Harrell's book on "Regression Modeling Strategies"). Rather than discussing the issue from scratch, I recommend Harrell's book for lm and glm. Venables and Ripley's bible is terse, but still worth a reading. "Extending the Linear Model with R" by Faraway is comprehensive and readable. nls is not covered in these sources, but "Nonlinear Regression with R" by Ritz & Streibig fills the gap and is very hands-on.
The nls() function (http://sekhon.berkeley.edu/stats/html/nls.html) is pretty standard for nonlinear least-squares curve fitting. Chi squared (the sum of the squared residuals) is the metric that is optimized in that case, but it is not normalized so you can't readily use it to determine how good the fit is. The main thing you should ensure is that your residuals are normally distributed. Unfortunately I'm not sure of an automated way to do that.
The Quick R site has a reasonable good summary of basic functions used for fitting models and testing the fits, along with sample R code:
http://www.statmethods.net/stats/regression.html
The main thing you should ensure is
that your residuals are normally
distributed. Unfortunately I'm not
sure of an automated way to do that.
qqnorm() could probably be modified to find the correlation between the sample quantiles and the theoretical quantiles. Essentially, this would just be a numerical interpretation of the normal quantile plot. Perhaps providing several values of the correlation coefficient for different ranges of quantiles could be useful. For example, if the correlation coefficient is close to 1 for the middle 97% of the data and much lower at the tails, this tells us the distribution of residuals is approximately normal, with some funniness going on in the tails.
Best to keep simple, and see if linear methods work "well enuff". You can judge your goodness of fit GENERALLY by looking at the R squared AND F statistic, together, never separate. Adding variables to your model that have no bearing on your dependant variable can increase R2, so you must also consider F statistic.
You should also compare your model to other nested, or more simpler, models. Do this using log liklihood ratio test, so long as dependant variables are the same.
Jarque–Bera test is good for testing the normality of the residual distribution.

Resources