Problems fitting arbitrary distributions using the fitdistrplus package

I am trying to fit my data using the fitdist() function from the fitdistrplus package in R. I succeeded with a normal and a lognormal distribution via the keywords 'norm' and 'lnorm', which I found by searching online.
I was not able to get other distributions working; the help of fitdist() says:
distr: A character string "name" naming a distribution for which the corresponding density function dname, the corresponding distribution function pname and the corresponding quantile function qname must be defined, or directly the density function.
I checked, and when entering ?norm or ?lnorm at the R command line, neither a function norm() nor lnorm() is found. This confuses me totally.
When I try for example fitdist(data, 'poisson'), I get the following error message:
Error in fitdist(data$time, "poisson") :
The dpoisson function must be defined
I am somewhat a noob in R, can anybody give a hint?

norm() in R is a different function that computes norms of a matrix, so it is not directly related to the normal distribution.
?Normal brings up the documentation related to the normal distribution, and you'll see the 4 functions dnorm, pnorm, qnorm and rnorm belonging to this family.
If you look at ?Lognormal you'll see the same convention with the typical 4 functions.
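More generally, you can look up ?Distributions, which links all of them. There you can see that the keyword for the Poisson distribution should actually be pois. fitdist() builds the function names by prefixing the keyword with d, p and q, so 'poisson' makes it look for dpoisson, which does not exist, while 'pois' resolves to dpois, ppois and qpois.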
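The naming rule can be checked directly in R. The sketch below is only illustrative: the simulated counts and the variable names are assumptions, not the asker's data, and the fitdist() call is skipped if fitdistrplus is not installed.

```r
# A keyword is usable if the d/p/q functions built from it exist.
distr <- "pois"
fun_names <- paste0(c("d", "p", "q"), distr)      # "dpois" "ppois" "qpois"
found <- sapply(fun_names, exists, mode = "function")
print(found)

# The keyword from the question fails this check in a plain R session.
print(exists("dpoisson", mode = "function"))

# Fit simulated count data (an assumption for illustration) with the short keyword.
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  set.seed(1)
  counts <- rpois(200, lambda = 4)
  fit <- fitdistrplus::fitdist(counts, "pois")
  print(fit$estimate)
}
```

The same check works for any other keyword listed under ?Distributions, for example "gamma", "weibull" or "exp".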

Related

R Single sample K-S Test

This site gives a great example on how to execute a two-sample K-S test in R
Upon reading the documentation, I'm completely lost about how I can execute a single sample K-S test. I understand that y can be "a numeric vector of data values, or a character string naming a cumulative distribution function or an actual cumulative distribution function such as pnorm. Alternatively, y can be an ecdf function (or an object of class stepfun) for specifying a discrete distribution."
I tried passing in one variable and got this error message: "argument "y" is missing, with no default"
Thinking about how pnorm can be used I recall this link. Not sure which one would be applicable in the case where I want to execute a single sample K-S test.
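For reference, the quoted description of y maps onto a call like the one below. This is a minimal sketch on simulated data (the sample, the mean and the sd are assumptions chosen for illustration); extra arguments after y are passed on to the named distribution function.

```r
set.seed(123)
x <- rnorm(100, mean = 5, sd = 2)   # simulated sample, not real data

# y as a character string naming a cumulative distribution function
res_chr <- ks.test(x, "pnorm", mean = 5, sd = 2)

# y as the cumulative distribution function itself
res_fun <- ks.test(x, pnorm, mean = 5, sd = 2)

print(res_chr)
```

Both calls test the same hypothesis. Note that the standard K-S p-value assumes the distribution parameters are fixed in advance rather than estimated from the same sample.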

R: robust package -- lmRob how to find the psi function used in the calculations

I am using lmRob.
require(robust)
stack.rob.int <- lmRob(Loss ~.*., data = stack.dat)
This works fine, but I was wondering how I could obtain the psi function that lmRob uses in the actual fitting. Thanks in advance for any help!
If I were to use the lmrob function in robustbase, is it possible to change the psi function by subtracting a constant from it? I am trying to implement the bootstrap as per Lahiri (Annals of Statistics, 1992), where the way to keep the bootstrap valid is to replace psi() with the original psi() minus the mean of the residuals while fitting the bootstrap for the robust linear model.

So, there is no way to access the psi function directly for robust::lmRob().
Simply put, lmRob() calls lmRob.fit() (or lmRob.wfit() if you supply weights) which subsequently calls lmRob.fit.compute() that then sets initial values for a Fortran version depending on the lmRob.control() set to either "bisquare" or "optimal".
Given the above, if you need access to the psi functions, you may wish to use robustbase, as it gives easy access to many psi functions (cf. the biweight).
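As an illustration of what such a psi function looks like, here is a minimal sketch of the textbook Tukey bisquare (biweight) psi written in base R. The function name psi_bisquare and the tuning constant 4.685 are assumptions for this example; the Fortran code behind lmRob() may use a different scaling. If robustbase is installed, the result is compared against its Mpsi() function.

```r
# Textbook Tukey bisquare psi: x * (1 - (x/c)^2)^2 inside [-c, c], 0 outside.
# psi_bisquare is a hypothetical helper, not part of robust or robustbase.
psi_bisquare <- function(x, c = 4.685) {
  ifelse(abs(x) <= c, x * (1 - (x / c)^2)^2, 0)
}

x <- seq(-6, 6, by = 0.5)
psi_vals <- psi_bisquare(x)
print(round(psi_vals, 3))

# Optional cross-check against robustbase, if it is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  print(all.equal(psi_vals, robustbase::Mpsi(x, cc = 4.685, psi = "bisquare")))
}
```

A shifted version for a bootstrap scheme could then be written as function(x) psi_bisquare(x) - k for a constant k, although plugging such a function into a fitting routine is a separate problem.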
Edit 1
Regarding:
psi function evaluated at the residuals in lmRob
No. What is available after running lmRob is described under lmRob.object; the documentation is accessible via ?lmRob.object. Regarding residuals, the following components are available in the lmRob object.
residuals: the residual vector corresponding to the estimates returned in coefficients.
T.residuals: the residual vector corresponding to the estimates returned in T.coefficients.
M.weights: the robust estimate weights corresponding to the final MM-estimates in coefficients, if applies.
T.M.weights: the robust estimate weights corresponding to the initial S-estimates in T.coefficients, if applies.
Regarding:
what does "optimal" do in lmRob?
Optimal refers to the following psi function:
sign(x) * ( -(phi'(|x|) + c) / phi(|x|) )
For other traditional psi functions, you may wish to look at robustbase's vignette or a textbook on robust statistics.

R - How to get one "summary" prediction map instead of 5 when using 5-fold cross-validation in maxent model?

I hope I have come to the right forum. I'm an ecologist making species distribution models using the maxent (version 3.3.3, http://www.cs.princeton.edu/~schapire/maxent/) function in R, through the dismo package. I have used the argument "replicates = 5", which tells maxent to do a 5-fold cross-validation. When running maxent directly from the maxent.jar file (the maxent software), an html file with statistics is produced, including the prediction maps. In R, an html file is also produced, but the prediction maps have to be extracted afterwards using the function "predict" in the dismo package. When I do this, I get 5 maps, due to the 5-fold cross-validation setting. However (and this is the problem), I want only one output map, one "summary" prediction map. I assume this is possible, although I don't know how maxent computes it. The maxent tutorial (see link above) says that:
"...you may want to avoid eating up disk space by turning off the “write output grids” option, which will suppress writing of output grids for the replicate runs, so that you only get the summary statistics grids (avg, stderr etc.)."
A list of arguments that can be put into R is found in this forum https://groups.google.com/forum/#!topic/maxent/yRBlvZ1_9rQ.
I have tried to use the argument "outputgrids=FALSE" both in the maxent function itself, and in the predict function, but it doesn't work. I still get 5 maps, even though I don't get any errors in R.
So my question is: How do I get one "summary" prediction map instead of the five prediction maps that result from the cross-validation?
I hope someone can help me with this, I am really stuck and haven't found any answers anywhere on the internet. Not even a discussion about this. Hope my question is clear. This is the R-script that I use:
model1<-maxent(x=predvars, p=presence_points, a=target_group_absence, path="//home//...//model1", args=c("replicates=5", "outputgrids=FALSE"))
model1map<-predict(model1, predvars, filename="//home//...//model1map.tif", outputgrids=FALSE)
Best regards,
Kristin

Sorry to be the bearer of bad news, but based on the source code, it looks like Dismo's predict function does not have the ability to generate a summary map.
Nitty-gritty details for those who care: When you call maxent with replicates set to something greater than 1, the maxent function returns a MaxEntReplicates object, rather than a normal MaxEnt object. When predict receives a MaxEntReplicates object, it just iterates through all of the models that it contains and calls predict on them individually.
So, what next? Fortunately, all is not lost! The reason that Dismo doesn't have this functionality is that for most kinds of model-building, there isn't actually a valid way to average parameters across your cross-validation models. I don't want to go so far as to say that that's definitely the case for MaxEnt specifically, but I suspect it is. As such, cross-validation is usually used more as a way of checking that your model building methodology works for your data than as a way of building your model directly (see this question for further discussion of that point). After verifying via cross-validation that models built using a given procedure seem to be accurate for the phenomenon you're modelling, it's customary to build a final model using all of your data. In theory this new model should only be better than models trained on a subset of your data.
So basically, assuming your cross-validated models look reasonable, you can run MaxEnt again with only one replicate. Your final result will be a model accuracy estimate based on the cross-validation and a map based on the second run with all of your data lumped together. Depending on what exactly your question is, there might be other useful summary statistics from the cross-validation that you want to use, but those are all things you've already seen in the html output.

I found this a couple of years late, but you could do something like this:
xm1 <- maxent(predictors, pres_train1) # maxent model for partition 1
xm2 <- maxent(predictors, pres_train2) # maxent model for partition 2
px1 <- predict(predictors, xm1, ext=ext, progress='') # prediction from model 1
px2 <- predict(predictors, xm2, ext=ext, progress='') # prediction from model 2
models <- stack(px1, px2) # stack the predictions from all the models
final_map <- mean(models) # cell-wise mean of all the predictions
plot(final_map) # plot the averaged map
Here xm1, xm2, ... are the maxent models for each partition in the cross-validation, px1, px2, ... are the predicted maps, and pres_train1, pres_train2, ... stand for the training presences of each partition.

Loglogistic distribution r

There are some packages in R which produce numbers coming from a loglogistic distribution.
One example is the package FAdist. In particular, I am trying to use the function rllog to obtain numbers coming from a loglogistic distribution, but it's not clear to me how the parameters are defined.
What is the full version of the pdf of a number produced when using the rllog function for shape parameter a and scale parameter b?

How can I find a distribution for a set of data, and then further propagate this distribution?

My problem is that I have a set of data which I want to fit a distribution to, and then, once I have found the distribution, run a Monte Carlo simulation on it to propagate the found distribution.
My first bit of code is:
require(fitdistrplus)
example1<-c(29,23,29,25,26,29,29,27,25,25,25,26,28,25,29,28,28,26,28,25,29,26,30)
f1<-fitdist(example1,rgamma,method="mle")
If I then use the command
print(f1)
it tells me that the shape is 204.00
and the rate is 7.568 for the gamma distribution
(please note the numbers I am fitting the distribution to are arbitrary at the moment, I would normally have hundreds of observations to fit the distribution to).
Where I now need help is when I use the code from package mc2d to propagate this distribution as follows:
require(mc2d)
ndunc(1000)
fitted<-mcstoc(rgamma, type="U", shape=204.00, rate=7.568)
Currently I am having to manually type in the shape and rate into this above function from the previous "print" of the "fitdist" command.
My question is, is there a way to get the mcstoc command to automatically pick up the shape and rate from the fitdist command so that I do not have to interrupt the code to do so manually? Or if it is not possible with the fitdistrplus package and mc2d package, then is there another package out there which might do this for me?
Many thanks in advance!

f1$estimate[1]
# shape
#204.0008
f1$estimate[2]
# rate
#7.567762
fitted<-mcstoc(rgamma, type="U", shape=f1$estimate[1], rate=f1$estimate[2])

myFunction <- function (data){
  # fit a gamma distribution by maximum likelihood
  f1<-fitdist(data,rgamma,method="mle")
  # pass the fitted shape and rate straight to mcstoc
  fitted<-mcstoc(rgamma, type="U", shape=f1$estimate[1], rate=f1$estimate[2])
  return(fitted)
}
example1<-c(29,23,29,25,26,29,29,27,25,25,25,26,28,25,29,28,28,26,28,25,29,26,30)
fitted.example1 <- myFunction(example1)
This function isn't tested.

If you do not want to type the names of the parameters, you can use do.call:
fitted <- do.call(
  function(...) mcstoc(rgamma, type="U", ...),
  as.list(f1$estimate)
)
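The do.call pattern works because the estimates form a named numeric vector, so as.list() turns them into named arguments. The sketch below shows the same mechanism in base R only; the vector est is typed in by hand from the values printed above (standing in for f1$estimate), and rgamma is called directly instead of going through mcstoc.

```r
# Stand-in for f1$estimate: a named numeric vector (values copied from the output above).
est <- c(shape = 204.0008, rate = 7.567762)

# Named access avoids relying on the position of each parameter.
shape_hat <- est[["shape"]]
rate_hat  <- est[["rate"]]

# as.list(est) becomes list(shape = ..., rate = ...), which do.call matches by name.
set.seed(42)
draws <- do.call(rgamma, c(list(n = 1000), as.list(est)))

print(mean(draws))   # should be near shape / rate
```

Because the arguments are matched by name, the same call keeps working if a different distribution with differently named parameters is fitted, as long as the sampling function is swapped accordingly.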
