How are asymptotic p-values calculated in Hmisc::rcorr?

I am using the rcorr function from the Hmisc package in R to compute Pearson correlation coefficients and corresponding p-values while analyzing the correlation of several fishery landings time series. The data aren't really important here; what I would like to know is: how are the p-values calculated? The documentation states that the asymptotic p-values are approximated using the t or F distributions, but I am hoping someone can point me to more information, or to an equation describing exactly how these values are computed.
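For the Pearson case, the standard asymptotic test transforms the correlation r into t = r * sqrt(n - 2) / sqrt(1 - r^2), which follows a t distribution with n - 2 degrees of freedom under the null hypothesis of zero correlation. A minimal sketch (the simulated x and y are illustrative assumptions, not your data), which in my experience reproduces the P matrix that rcorr returns:
set.seed(1)
x <- rnorm(30); y <- rnorm(30)             # illustrative data
r <- cor(x, y)
n <- length(x)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # ~ t with n - 2 df under H0: rho = 0
p <- 2 * pt(-abs(t_stat), df = n - 2)      # two-sided asymptotic p-value
# compare with: Hmisc::rcorr(cbind(x, y))$P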

Related

How to transform data after fitting a distribution with gamlss?

I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions: https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
The normalised data values (for any distribution) are exactly equal to the normalised quantile residuals from a gamlss fit,
m1 <- gamlss(y ~ 1, family = BCT)  # for example; use whichever family you fitted
which can be accessed by
residuals(m1)
or equivalently m1$residuals.
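A worked sketch of the whole flow (the simulated data and the BCT family here are illustrative assumptions, not from the question):
library(gamlss)
set.seed(1)
y <- rBCT(500, mu = 10, sigma = 0.2, nu = 1, tau = 5)  # simulated Box-Cox t data
m1 <- gamlss(y ~ 1, family = BCT)                      # fit the chosen distribution
z <- residuals(m1)    # normalised quantile residuals: approx. N(0, 1) if the fit is good
qqnorm(z); qqline(z)  # visual normality check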

MICE: Paired sample t-test and cohen's d estimation using imputed datasets?

I've created 12 imputed datasets using the MICE package and wish to run paired-sample t-tests and Cohen's d calculations on them, but I'm not sure how to do this. My end goal is to compare parameter estimates, t-test results and effect-size estimates from complete-case analysis against the adjusted (via MICE) versions; while I have no issue with the parameter estimates, I can't figure out the t-tests and Cohen's d.
I'm a bit confused about how to approach this, and searching online and in the mice package documentation has not led to much progress. I did find mi.t.test from the MKmisc package, but this appears to be for datasets imputed using Amelia, not MICE, and I can't quite figure it out. Would anyone have any advice or resources here, please?
So far I have:
Identified auxiliary variables
Created Predictor Matrix
Imputed missing data m times
Fitted linear models on each imputed dataset using with(), pooled the estimates, and obtained parameter estimates via summary()
Is there perhaps a way I can create an object of an imputed dataset that is usable with other analyses or am I looking at this in the wrong way?
I used multiple imputation for the first time in my research, but maybe I can help by passing on the tips I received (a sketch putting these steps together follows below):
Perform the paired t-test on every imputed dataset.
Use mice's pool.scalar function; its documentation is online. For Q supply the mean differences from each t-test, and for U their squared standard errors (the within-imputation variances).
Your pooled t-value is then qbar / sqrt(t); the values of qbar and t are in the output of pool.scalar.
And your pooled p-value is 2 * (1 - pt(abs(statistic), pmax(df, 0.001))), where df is also taken from the pool.scalar output.
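A minimal sketch under the assumption that the mids object is called imp and the paired columns are pre and post (both names are hypothetical):
library(mice)
# run the paired t-test in each of the m imputed datasets
tests <- lapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  t.test(d$pre, d$post, paired = TRUE)
})
Q <- sapply(tests, function(f) unname(f$estimate))  # mean differences
U <- sapply(tests, function(f) f$stderr^2)          # squared standard errors
pooled <- pool.scalar(Q, U, n = nrow(complete(imp, 1)))
t_stat <- pooled$qbar / sqrt(pooled$t)              # pooled t-value
p_val  <- 2 * (1 - pt(abs(t_stat), pmax(pooled$df, 0.001)))  # pooled p-value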
Hope this helps!

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!
m1 <- fitDist(y, k = 2)  # y is the data vector; k is the GAIC penalty
fits many distributions and chooses the best according to a generalized Akaike information criterion, GAIC(k), with penalty k for each fitted parameter in the distribution, where k is specified by the user, e.g.
k = 2 for AIC,
k = log(n) for BIC,
k = 4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of freedom), which is my preference.
m1$fits
gives the full results, from the best to the worst distribution, according to GAIC(k).
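Regarding confidence intervals: one approach (a sketch, with simulated data as an illustrative assumption) is to let fitDist() pick the family and then refit that family with gamlss(), whose objects support confint():
library(gamlss)
set.seed(1)
y <- rGA(200, mu = 2, sigma = 0.5)           # illustrative positive-valued data
m1 <- fitDist(y, k = 4, type = "realplus")   # search distributions on (0, Inf)
m1$family                                    # the chosen family
m1$fits                                      # GAIC(k) for all candidates, best first
m2 <- gamlss(y ~ 1, family = m1$family[1])   # refit the winner with gamlss()
confint(m2)                                  # Wald intervals on the link scale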

Calculating uncertainty (entropy) from density distribution in R

I wonder what the best and most correct way would be to estimate entropy from a probability density function in R. I have some real values that are not probabilities, and I would like to get some measure of the "unevenness" of those values, so I was thinking of entropy. Would something like this work:
entropy(density(dat$X), unit='log2'),
assuming that I am using the entropy function from the entropy package?
Are there other ways of estimating uncertainty from a real-valued vector?
Many thanks! PM
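One caveat worth noting: entropy() in the entropy package expects a vector of counts rather than a density object, so the real values have to be discretized first. A minimal sketch using the package's own discretize() helper (the simulated x stands in for dat$X):
library(entropy)
set.seed(1)
x <- rnorm(1000)                  # stands in for dat$X
y <- discretize(x, numBins = 30)  # bin the real values into counts
entropy(y, unit = "log2")         # plug-in entropy estimate, in bits
Note that the result depends on the number of bins, so it is best read as a relative measure when comparing vectors binned the same way.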

AUC in Weka vs R

I have received AUCs and predictions from a collaborator, generated in Weka. The statistical model behind them was cross-validated, so my dataset with the predictions includes columns for fold, predicted probability and true class. Using these data I was unable to replicate the AUCs from the predicted probabilities in R; the values always differ slightly.
Additional details:
Weka was used via GUI, not command line
I checked the AUC in R with packages pROC and ROCR
I first tried calculating the AUC over the pooled predictions (ignoring folds) and got different AUCs
Then I tried calculating the AUCs per fold and averaging; this did not match either
The model was ridge logistic regression and there is a single tie in the predictions
The first fold has one sample more than the others. I have tried taking a weighted average, but this did not work out either
I have even tested averaging the AUCs after logit-transformation (for normality)
Taking the median instead of the mean did not help either
I am familiar with how the AUC is calculated in R, but I don't see what Weka could be doing differently (the two R computations I tried are sketched below).
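For reference, a sketch of those two computations, assuming a data frame df with (hypothetical) columns fold, prob and truth:
library(pROC)
# 1) AUC over the pooled predictions, ignoring folds
auc(roc(df$truth, df$prob, quiet = TRUE))
# 2) per-fold AUCs, then the (unweighted) average
mean(sapply(split(df, df$fold),
            function(d) auc(roc(d$truth, d$prob, quiet = TRUE))))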
