How to extract values fitted to a Gaussian distribution in R?

I have a data frame x with two columns, a and b; a is of class character and b is of class numeric.
I fitted a Gaussian distribution to b using the fitdist function (fitdistrplus package).
data.fit <- fitdist(x$b,"norm", "mle")
I want to extract the elements in column a that fall in the 5% right tail of the fitted gaussian distribution.
I am not sure how to proceed because my knowledge of distribution fitting is limited.
Do I need to retain the elements in column a for which b is greater than the value obtained for the 95th percentile?
Or does the fitting imply that new values have been created for each value in b and I should use those values?
Thanks

By calling unclass(data.fit) you can see all the parts that make up the data.fit object, which include:
$estimate
mean sd
0.1125554 1.2724377
which means you can access the estimated mean and standard deviation via:
data.fit$estimate['sd']
data.fit$estimate['mean']
To calculate the 95th percentile of the fitted distribution, which is the cutoff for the upper 5% tail, you can use the qnorm() function (q is for quantile) like so:
threshold <-
qnorm(p = 0.95,
mean=data.fit$estimate['mean'],
sd=data.fit$estimate['sd'])
and you can subset your data.frame x like so:
x[x$b > threshold,# an indicator of the rows to return
'a']# the column to return
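On the second part of the question: fitting does not create new values for b. It only estimates a mean and a standard deviation, and the threshold from qnorm() is compared against the observed b. Below is a minimal base-R sketch on simulated data. The data frame, the seed, and the names mu_hat, sd_hat and tail_a are made up for illustration. The closed-form maximum-likelihood estimates stand in for fitdist(), whose numerical estimates for a normal fit should be very close to them.

```r
# Simulated stand-in for the asker's data frame (values are illustrative only)
set.seed(42)
x <- data.frame(a = paste0("id", 1:200),
                b = rnorm(200, mean = 0.1, sd = 1.3),
                stringsAsFactors = FALSE)

# Closed-form MLE of a normal: sample mean and the "divide by n" standard deviation
mu_hat <- mean(x$b)
sd_hat <- sqrt(mean((x$b - mu_hat)^2))

# 95th percentile of the fitted normal = cutoff for the upper 5% tail
threshold <- qnorm(p = 0.95, mean = mu_hat, sd = sd_hat)

# Labels in column a whose observed b lies in that tail
tail_a <- x$a[x$b > threshold]
tail_a
```

Because the cutoff comes from the fitted curve rather than from the empirical quantiles, the share of rows returned need not be exactly 5%.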

Related

Calculate GWESP for a matrix with fixed decay parameter

I'm wondering whether there is a pre-programmed R function that can calculate geometrically weighted edgewise shared partners (GWESP, per Hunter (2007)) for a given adjacency matrix with a fixed decay parameter (alpha) and return the values in matrix form.
I've looked at the xergm and igraph packages but could not find one. They only estimate GWESP while fitting network models, which is not what I want here. I only need a function that takes an adjacency matrix and returns, for each dyad, the GWESP value (with fixed decay parameter) in a matrix of the same shape.
For example
# for a given adjacency matrix (adjm)
adjm <- matrix(sample(0:1, 100, replace = TRUE, prob = c(0.6, 0.4)), ncol = 10)
# apply some functions to calculate GWESP for each dyad of adjm (and fix alpha at some value) and return a matrix of the same dimension filled with estimated GWESP values
somefunction(adjm, alpha = somevalue)
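No ready-made function is named in the thread. The following is a rough sketch only. It assumes an undirected network (symmetric adjacency matrix) and the usual form of the statistic from Hunter (2007), in which an edge with k shared partners contributes exp(alpha) * (1 - (1 - exp(-alpha))^k). Under those assumptions the per-edge terms can be computed with matrix algebra. The function name gwesp_matrix is hypothetical. Directed networks, such as the random matrix in the question (which is not symmetric), would need a choice of shared-partner definition that this sketch does not make.

```r
# Hypothetical helper: per-edge GWESP terms for an undirected 0/1 adjacency matrix
gwesp_matrix <- function(adjm, alpha) {
  stopifnot(is.matrix(adjm), nrow(adjm) == ncol(adjm), isSymmetric(unname(adjm)))
  diag(adjm) <- 0                      # ignore self-loops
  shared <- adjm %*% adjm              # shared[i, j] = number of common neighbours of i and j
  weight <- exp(alpha) * (1 - (1 - exp(-alpha))^shared)
  weight * adjm                        # non-edges contribute 0
}

# Usage: in a triangle every edge has exactly one shared partner
triangle <- matrix(1, 3, 3)
diag(triangle) <- 0
g <- gwesp_matrix(triangle, alpha = 0.5)

# Summing over the upper triangle gives the network-level statistic under these assumptions
gwesp_total <- sum(g[upper.tri(g)])
```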

Weights in Principal Component Analysis (PCA) using Psych::principal

I am computing a Principal Component Analysis with this matrix as input, using the function psych::principal. Each column of the input data holds the monthly correlations between crop yields and a climatic variable in one of 30 regions, so the goal of the PCA is to reduce the information and find similar patterns of response across regions.
pc <- principal(dat,nfactors = 9, residuals = FALSE, rotate="varimax", n.obs=NA, covar=TRUE,scores=TRUE, missing=FALSE, impute="median", oblique.scores=TRUE, method="regression")
The matrix has dimensions 10*30, and the first message I get is:
The determinant of the smoothed correlation was zero. This means the
objective function is not defined. Chi square is based upon observed
residuals. The determinant of the smoothed correlation was zero. This
means the objective function is not defined for the null model either.
The Chi square is thus based upon observed correlations. Warning
messages: 1: In cor.smooth(r) : Matrix was not positive definite,
smoothing was done 2: In principal(dat, nfactors = 3, residuals = F,
rotate = "none", : The matrix is not positive semi-definite, scores
found from Structure loadings
Nonetheless, the function seems to work. The main problem is that pc$weights turns out to be equal to pc$loadings.
When the number of columns is less than or equal to the number of rows the results are coherent, but that is not the case here.
I need the weights in order to express the score values on the same scale as the input data (correlation values).
I would really appreciate any help.
Thank you.
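The thread has no answer, but the first message can be reproduced without psych. With 10 rows and 30 columns, the 30 x 30 correlation matrix can have at most 9 non-zero eigenvalues, so its determinant is zero. A small sketch on simulated data (dat_sim and the 1e-8 tolerance are illustrative choices, not the asker's data or anything psych uses):

```r
# 10 observations of 30 variables, same shape as in the question
set.seed(1)
dat_sim <- matrix(rnorm(10 * 30), nrow = 10, ncol = 30)

# Eigenvalues of the 30 x 30 correlation matrix
r <- cor(dat_sim)
eig <- eigen(r, symmetric = TRUE, only.values = TRUE)$values

# Count eigenvalues that are numerically non-zero; centring costs one degree of freedom
n_nonzero <- sum(eig > 1e-8)
n_nonzero
```

The same rank limit applies to the covariance matrix used when covar=TRUE, which is consistent with the "not positive definite" warnings quoted above.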

Several Regressions_Modify the code

When I run the code below, I can calculate the regression coefficients for each category of c. How can I apply these estimated coefficients to calculate the residuals of all observations? For example, only 25 observations belong to c=1, but I need the fitted values and residuals of all 50 observations based on the coefficients estimated for this category.
A<-cars$speed
B<-cars$dist
c<-rep(1:2,25)
S<-data.frame(A,B,c)
library(plyr)
lmodel <- dlply(S,"c", function(d) lm(B~A, data = d))
I'm not 100% sure I understand what you mean, but the following code will give you a list of residuals. The first element of the list contains the residuals for all 50 observations using the coefficients for c=1 and the second for c=2.
residuals<- lapply(lmodel, function(x) B - coef(x)[1] - coef(x)[2]*A)
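An equivalent sketch that lets predict() apply the coefficients, which also gives the fitted values the question asks for. It uses base-R split() and lapply() in place of plyr::dlply() so that it runs without extra packages; the names models, fit_all and res_all are made up here.

```r
# Same data as in the question
S <- data.frame(A = cars$speed, B = cars$dist, c = rep(1:2, 25))

# One model per category of c (base-R stand-in for dlply)
models <- lapply(split(S, S$c), function(d) lm(B ~ A, data = d))

# Fitted values and residuals for all 50 rows under each category's coefficients
fit_all <- lapply(models, predict, newdata = S)
res_all <- lapply(models, function(m) S$B - predict(m, newdata = S))

# Residuals of every observation under the c = 1 model
head(res_all[["1"]])
```

For the rows that were used to fit a given model, these residuals match resid() of that model; for the other rows they are out-of-sample prediction errors.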

How do you select the rank-k approximation for SVDImpute (package: imputation) in R?

I have a matrix with nominal values from 1-5, with some missing values. I would like to use SVDImpute (from the "imputation" package) in R to fill in the missing values, but I am unsure of what number to use for k (rank-k approximation) in the function.
The help page description of the imputation is:
Imputation using the SVD First fill missing values using the mean of
the column Then, compute a low, rank-k approximation of x. Fill the
missing values again from the rank-k approximation. Recompute the
rank-k approximation with the imputed values and fill again, repeating
num.iters times
To me, this sounds like the column means are calculated as part of the function; is this correct? If so, how was the value of k=3 chosen for the example?
x = matrix(rnorm(100),10,10)
x.missing = x > 1
x[x.missing] = NA
SVDImpute(x, 3)
Any help is greatly appreciated.
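The steps quoted from the help page can be sketched in base R as follows. This is an illustration of the described procedure, not the package's actual code; svd_impute_sketch and its defaults are hypothetical. It does show that the column means are computed inside the function as the starting fill, so the caller does not need to supply them.

```r
# Hypothetical re-implementation of the procedure described on the help page
svd_impute_sketch <- function(x, k, num.iters = 10) {
  missing <- is.na(x)
  # Step 1: fill missing cells with the mean of their column
  col_means <- colMeans(x, na.rm = TRUE)
  filled <- x
  filled[missing] <- col_means[col(x)[missing]]
  for (i in seq_len(num.iters)) {
    # Step 2: rank-k approximation from the truncated SVD
    s <- svd(filled)
    approx <- s$u[, 1:k, drop = FALSE] %*%
      diag(s$d[1:k], nrow = k) %*%
      t(s$v[, 1:k, drop = FALSE])
    # Step 3: overwrite only the originally missing cells, then repeat
    filled[missing] <- approx[missing]
  }
  filled
}

set.seed(1)
x_obs <- matrix(rnorm(100), 10, 10)
x_obs[x_obs > 1] <- NA
imputed <- svd_impute_sketch(x_obs, k = 3)
```

The help text does not say how k=3 was chosen. One common approach, offered here only as a suggestion, is to hide some observed entries, impute them for several candidate values of k, and keep the k with the lowest reconstruction error.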

Probability transformation using R

I want to turn a continuous random variable X with cdf F(x) into a continuous random variable Y with cdf F(y) and am wondering how to implement it in R.
For example, perform a probability transformation on data following normal distribution (X) to make it conform to a desirable Weibull distribution (Y).
(x=0 has CDF F(x=0)=0.5, CDF F(y)=0.5 corresponds to y=5, then x=0 corresponds to y=5 etc.)
R has many built-in distribution functions: those starting with a 'p' (the CDFs) transform a variable to a uniform on [0, 1], and those starting with a 'q' (the quantile functions) transform from a uniform to the target distribution. So the transform in your example can be done by:
y <- qweibull( pnorm( x ), 2, 6.0056 )
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by
getting the probability of X ≤ x, i.e. F_X(x),
then determining which observation y has the same probability.
That is, you want the probability of Y ≤ y, F_Y(y), to be the same as F_X(x).
This gives F_Y(y) = F_X(x).
Therefore y = F_Y^{-1}(F_X(x)),
where F_Y^{-1} is better known as the quantile function, Q_Y. The overall transformation from X to Y is summarized as: Y = Q_Y(F_X(X)).
In your particular example, according to the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you first call pnorm and then call qweibull on the result.
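A short check of the transformation on simulated data, using the shape and scale values from the first answer (2 and 6.0056). The seed, the sample size and the names u, y and y0 are illustrative.

```r
# Standard normal sample to be mapped onto a Weibull scale
set.seed(123)
x <- rnorm(1000)

# Step 1: F_X(x), probabilities in (0, 1)
u <- pnorm(x)

# Step 2: Q_Y(u), the Weibull quantile at the same probability
y <- qweibull(u, shape = 2, scale = 6.0056)

# The median of X maps to the median of Y, which is close to 5 as in the question
y0 <- qweibull(pnorm(0), shape = 2, scale = 6.0056)
y0
```

Both steps are monotone increasing, so the ranking of the observations is preserved and only the marginal distribution changes.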
