This could be a quick one.
I have not been able to find a function for the mathematical "error function" or the "inverse error function" in R. I have not seen a package either.
I am aware I can script this but I thought someone must have made a package for its various approximations by now. Could be poor Googling due to generic terms "error function" ...
These are very closely related to pnorm() and qnorm(): see the last 4 lines of the example code in ?pnorm:
## if you want the so-called 'error function'
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
## (see Abramowitz and Stegun 29.2.29)
## and the so-called 'complementary error function'
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower = FALSE)
## and the inverses
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower = FALSE)/sqrt(2)
If you want to use complex-valued arguments, you need erfz from the pracma package (as commented above by @eipi10). Otherwise, it's not clear whether there's an advantage to using the versions in pracma (the implementations of pnorm() and qnorm() have been very thoroughly tested over a wide range of parameter values ...)
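As a quick sanity check of the definitions above (a minimal sketch using base R only; the test points in z are made up for illustration), the inverses should round-trip and erf(1) should agree with its tabulated value of roughly 0.8427:

```r
# Definitions from ?pnorm, with lower.tail spelled out in full
erf     <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfc    <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)
erfinv  <- function(x) qnorm((1 + x) / 2) / sqrt(2)
erfcinv <- function(x) qnorm(x / 2, lower.tail = FALSE) / sqrt(2)

z <- c(-1.5, -0.2, 0, 0.3, 2)   # arbitrary test points

# Each inverse should undo its forward function
all.equal(erfinv(erf(z)), z)
all.equal(erfcinv(erfc(z)), z)

# erf and erfc are complementary: they sum to 1
all.equal(erf(z) + erfc(z), rep(1, length(z)))

# Compare against the tabulated value erf(1) = 0.8427008 (7 d.p.)
erf(1)
```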
As far as searching goes,
library("sos")
findFn("erf")
seems to work pretty well ...
I am estimating a fairly simple McFadden choice model using a very large data set (101.6 million unit-alternatives). I can estimate this model just fine in Stata using the asclogit command, but when I try to use the mlogit package in R, I get the following error:
region1 <- mlogit(chosen ~ mean_log.wage + mean_log.rent + bornNear + Dim.1 + regionFE | 0,
shape= "long", chid.var = "chid", alt.var = "alternatives", data = ready)
Error in qr.default(na.omit(X)) : too large a matrix for LINPACK
Calls: mlogit ... model.matrix -> model.matrix.mFormula -> qr -> qr.default
If I look at the source code of qr.R it's clear that the number of elements in my design matrix is too big relative to the LINPACK limit of 2,147,483,647. However, no such limit exists for LAPACK (that I can tell, at least).
From qr.R:
qr.default <- function(x, tol = 1e-07, LAPACK = FALSE, ...)
{
x <- as.matrix(x)
if(is.complex(x))
return(structure(.Internal(La_qr_cmplx(x)), class = "qr"))
## otherwise :
if(LAPACK)
return(structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr"))
## else "Linpack" case:
p <- as.integer(ncol(x))
if(is.na(p)) stop("invalid ncol(x)")
n <- as.integer(nrow(x))
if(is.na(n)) stop("invalid nrow(x)")
if(1.0 * n * p > 2147483647) stop("too large a matrix for LINPACK")
...
qr() appears to be called in the mFormula method of mlogit, when model.matrix is being created, and probably while checking NAs. But I can't tell if there is a way to pass LAPACK = TRUE to mlogit, or if there is a way to skip the NA checking.
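To make the size limit concrete, here is a rough sketch of the arithmetic. The row count is the one quoted in the question; the small matrix X is made up purely for illustration. With 101.6 million rows, the check in qr.default fails as soon as the design matrix has 22 or more columns. Calling qr() directly with LAPACK = TRUE takes the other branch of the source shown above, which is the behaviour one would want mlogit to use.

```r
n_rows        <- 101.6e6                  # unit-alternatives, from the question
linpack_limit <- .Machine$integer.max     # 2147483647
max_cols      <- floor(linpack_limit / n_rows)
max_cols                                  # widest matrix that passes the check

# The same test qr.default performs, for 21 and 22 columns
1.0 * n_rows * 21 > linpack_limit         # FALSE: allowed
1.0 * n_rows * 22 > linpack_limit         # TRUE: "too large a matrix for LINPACK"

# On a small made-up matrix, LAPACK = TRUE selects the La_qr branch
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)
q_lapack <- qr(X, LAPACK = TRUE)
attr(q_lapack, "useLAPACK")
```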
I'm hoping @YvesCroissant will see this.
As I mentioned, I can estimate this model just fine in Stata, so it's not a question of resources. My Stata license is not portable, however, which is why I would like to use R.
Thanks to Julius' comment and this post on namespaces in R, I figured out the answer. I added the following code right after my library statements:
source("mymFormula.R")
tmpfun <- get("model.matrix.mFormula", envir = asNamespace("mlogit"))
environment(mymFormula) <- environment(tmpfun)
attributes(mymFormula) <- attributes(tmpfun) # don't know if this is really needed
assignInNamespace("model.matrix.mFormula", mymFormula, ns="mlogit")
mymFormula.R is an R script where I copy/pasted the contents of mlogit:::model.matrix.mFormula and added mymFormula <- before the function definition at the top of the file.
I viewed the contents of mlogit:::model.matrix.mFormula by typing trace(mlogit:::model.matrix.mFormula, edit=TRUE) in RStudio. (Thanks to this answer for help on how to do that.)
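Why the environment() line matters can be seen on a method that ships with R. This is only an illustrative sketch: it uses model.matrix.lm from the stats package as a stand-in for mlogit's method, my_method is a hypothetical replacement, and nothing is actually assigned into a namespace here.

```r
# Fetch a method from a package namespace without going through trace()
orig <- utils::getFromNamespace("model.matrix.lm", "stats")

# A replacement defined in a script is enclosed by the environment it was
# defined in, so it cannot see the package's internal helpers
my_method <- function(object, ...) "patched"

# Re-point the replacement at the namespace the original lives in, so that
# unexported helpers resolve the same way they do for the original
environment(my_method) <- environment(orig)
environmentName(environment(my_method))   # "stats"
```

The final assignInNamespace() call is what swaps the patched function in; it is left out here so that the sketch does not modify the stats package.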
This is the Non-Linear Equation in "mu" which I want to solve numerically using R. All of the paired (x, y) are known. So the only variable is "mu"
Now, I have written the function in R. Then, I am trying to get the root by using "rootSolve" package. But it is giving an error.
This is my code of the function:
f = function(k){
sum(((2*exp(-x) - 1)*(2*exp(-y)- 1))/
(1 + k*(2*exp(-x) - 1)*(2*exp(-y)- 1)))
}
This is the error after running "uniroot.all" from the "rootSolve" package:
> library(rootSolve)
> uniroot.all(f, interval = c(-1, 1))
numeric(0)
Warning message:
In k * (2 * exp(-x) - 1) :
longer object length is not a multiple of shorter object length
Also, I am searching my root in the interval (-1, 1).
Can someone please help? I think, my way of defining the function is wrong. Hence this error is coming.
Can anyone confirm that my way of defining the function in the picture is correct or not?
Thank you in advance!
Let me add something more:
The way I have defined my function (chances are high that the way is wrong) and given my data (x, y), I have f(-1) < f(1) and also f(-1) * f(1) < 0. These conditions are satisfied.
Additional Problem that makes me question my way of writing the function:
I cannot run curve function in R. e.g., curve(f, from = -1, to = 1)
But, if I plot Vectorize(f), then the curve function works.
Can anyone please help me in correcting the way of defining the function?
Thank you very much!
It seems that uniroot.all wants a vectorized function. I have tried to use it and get the same error as you do. I can't find anything in the documentation of uniroot.all about this.
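The vectorization issue can be seen without rootSolve at all. In this sketch (base R only; x and y are simulated stand-ins for the real data, and ks is an arbitrary set of trial values), a function built around sum() collapses a vector of k values into one number, whereas the Vectorize() wrapper returns one value per k:

```r
set.seed(13)
x <- runif(10) * 10   # made-up data
y <- runif(10) * 5

f <- function(k) {
  A <- 2 * exp(-x) - 1
  B <- 2 * exp(-y) - 1
  sum((A * B) / (1 + k * A * B))
}

ks <- c(-0.5, 0, 0.5)   # three trial values of k

# k (length 3) is recycled against A * B (length 10), which triggers the
# "longer object length" warning, and sum() then returns a single number
length(suppressWarnings(f(ks)))   # 1

# Vectorize() calls f once per element of ks
fvec <- Vectorize(f)
length(fvec(ks))                  # 3
```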
I have tried package nleqslv and can get a solution.
I have rewritten your function as follows (I think you made some errors in setting up the equation):
f <- function(k){
A <- 2*exp(-x)-1
B <- 2*exp(-y)-1
sum((A*B)/(1+k*A*B))
}
and run the function with these data
set.seed(13)
x <- runif(10)*10
y <- runif(10)*5
and then solved your function as follows:
library(nleqslv)
nleqslv(0,f)
with the following output:
$x
[1] 1.42992
$fvec
[1] 2.300633e-09
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1
$nfcnt
[1] 7
$njcnt
[1] 1
$iter
[1] 7
This uses nleqslv's default Broyden method, which is a secant method. If you want to try other methods, you can use testnslv to explore.
** Addition **
Function uniroot.all will work in my example as long as you do:
fvec <- Vectorize(f)
and change the interval to c(-1,1.7).
I am using RStudio and trying to use roc from the pROC package with boot for bootstrapping. I am following the code on this link. The code from that link uses another function with boot, which works fine. But when I try roc, it gives an error.
Below is my code: (In the output I am printing the dimensions of the sample to see how many times re-sampling is done. Here R=5, sampling is done 6 times and then error occurs).
library(boot)
library(pROC)  # provides roc()
roc_boot <- function(D, d) {
E=D[d,]
print(dim(E))
return(roc(E$x,E$y))
}
x = round(runif(100))
y = runif(100)
D = data.frame(x, y)
b = boot(D, roc_boot, R=5)
Output:
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
Error in boot(D, roc_boot, R = 5) :
incorrect number of subscripts on matrix
What is the problem here?
If I replace roc with some other function like sum, then it works perfectly (it prints the 6 lines without any error). It also gives different answers when booted multiple times (while keeping D same).
Please notice that the error is occurring after all the re-sampling is done. I cannot find the source of this particular error. I have looked at other answers like this but they don't seem to apply on my case. Can someone also explain why this error occurs and what it means, generally?
EDIT:
I returned only area under curve using following function:
roc_boot <- function(D, d) {
E=D[d,]
objectROC <- roc(E$x,E$y)
return(objectROC$auc)
}
This gives the area under the curve, but it is the same as the answer without bootstrapping, meaning there is no improvement. I need to pass the whole roc object to get any improvement from bootstrapping.
Turns out, you can't return a roc object from the statistic function passed to boot. It has to return a numeric value. So the following modification gets rid of the error (as edited in the question):
roc_boot <- function(D, d) {
E=D[d,]
objectROC <- roc(E$x,E$y)
return(objectROC$auc)
}
Moreover, as suggested by @Calimo, bootstrapping only affects the confidence interval, not the point estimate itself. In my case, there is a slight improvement in the confidence interval.
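A minimal sketch of this pattern, assuming only the boot package: to keep it self-contained, the AUC is computed with a rank-based (Mann-Whitney) formula in a hypothetical helper auc_rank instead of pROC's roc(), so the number may differ from pROC's, which can choose the comparison direction automatically. It shows that the statistic returns a single number, that b$t0 is simply the estimate on the original data, and that the bootstrap replicates in b$t are what feed the confidence interval.

```r
library(boot)

# Hypothetical stand-in for roc(...)$auc: rank-based (Mann-Whitney) AUC
auc_rank <- function(response, predictor) {
  r  <- rank(predictor)
  n1 <- sum(response == 1)
  n0 <- sum(response == 0)
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# The statistic must return a numeric value, not an object
auc_boot <- function(D, d) {
  E <- D[d, ]
  auc_rank(E$x, E$y)
}

set.seed(42)
x <- round(runif(100))
y <- runif(100)
D <- data.frame(x, y)

b <- boot(D, auc_boot, R = 200)

b$t0        # estimate on the original data, unchanged by resampling
dim(b$t)    # one row per bootstrap replicate
boot.ci(b, type = "perc")   # percentile interval built from b$t
```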
I am trying to run the ABC_sequential() function from the package EasyABC in parallel in R. But I am getting the error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: could not find function "f"
I think this is because ABC_sequential() is ultimately calling parLapplyLB() (https://github.com/cran/EasyABC/blob/master/R/EasyABC-internal.R) and I have to export the functions using clusterExport()? (parSapply not finding objects in global environment)
Because the function calls makeCluster() within it, it seems like I may have to modify the package to add clusterExport(cl, "f")? However, as I am fairly new, I haven't looked into modifying packages for my needs (and I suspect it may be more complicated than adding the one line of code). I am wondering if there is a better/easier workaround for getting my function onto the parallel nodes? Below is a simplified reproducible example based on the parallel example given in the R help for ABC_sequential:
library(EasyABC)
f <- function(x){
x = x^2
}
toy_model_parallel <- function(x){
set.seed(x[1])
2 * x[2] + f(2) + rnorm(1,0,0.1)
}
sum_stat_obs <- 6.5
pacc <- .4
toy_prior <- list(c("unif",0,1)) # a uniform prior distribution between 0 and 1
# this line of code gives the checkForRemoteErrors(val) error
ABC_Lenormand <- ABC_sequential(method="Lenormand", model=toy_model_parallel, prior=toy_prior, nb_simul=20, summary_stat_target=sum_stat_obs, p_acc_min=pacc, use_seed=TRUE, n_cluster=2)
Any advice is greatly appreciated.
You could define any necessary auxiliary functions inside the model function. In this case:
toy_model_parallel <- function(x){
f <- function(x){
x = x^2
}
set.seed(x[1])
2 * x[2] + f(2) + rnorm(1,0,0.1)
}
It looks like you need to do any worker initialization at the beginning of this function. So if your function needs to call functions from another package, you'd also need to load that package at the beginning of the model function.
I suggest that you send an email to the package developers to see if they have a better solution to this problem. If they don't, you might request that they add support for a user specified cluster object.
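For comparison, this is roughly what the export step looks like when the cluster object is under your own control. It is a sketch using only the base parallel package, not EasyABC; model is a simplified, hypothetical stand-in for the toy model. The workers start with empty global environments, so the helper f has to be sent to them explicitly:

```r
library(parallel)

f <- function(x) x^2

# Simplified stand-in for the model function; it relies on the helper f
model <- function(x) 2 * x + f(2)

cl <- makeCluster(2)

# Copy f from the current environment to every worker
clusterExport(cl, "f", envir = environment())

res <- parSapply(cl, 1:4, model)
stopCluster(cl)

res   # 6 8 10 12
```

Since ABC_sequential() creates its cluster internally, this step is not available from the outside, which is why defining the helper inside the model function is the simpler workaround.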
Suppose I want to integrate some function that involves sums and products of a few other user-defined functions. Let's take an extremely simple example, which gives the same error.
integrate(f = sin + cos, lower=0, upper=1)
This yields "Error in sin + cos : non-numeric argument to binary operator" which I think is saying it doesn't make sense to just add functions together without passing them some sort of argument. So I am a bit stuck here. This thread poses what I think is a solution to a more complicated question, that can be applied here, but it seems long for such a simple task in this case. I'm actually kind of surprised that I am unable to find passing function arguments to functions in the help manual so I think I am not using the right terminology.
Just write your own function:
> integrate(f = function(x) sin(x) + cos(x), lower=0, upper=1)
1.301169 with absolute error < 1.4e-14
In this example I've used an anonymous function, but that's not necessary. The key is to write a function that represents whatever function you want to integrate over. In this case, the function should take a vector input and add the sin and cos of each element.
Equivalently, we could have done:
foo <- function(x){
sin(x) + cos(x)
}
integrate(f = foo, lower=0, upper=1)
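If this comes up often, the wrapping can itself be put in a function. The sketch below uses a hypothetical helper, add_fns, that takes two functions and returns a new one; the result is checked against the exact value from the antiderivative sin(x) - cos(x):

```r
# Hypothetical helper: returns a function that adds the outputs of f and g
add_fns <- function(f, g) {
  function(x) f(x) + g(x)
}

res <- integrate(add_fns(sin, cos), lower = 0, upper = 1)

# Exact value: [sin(x) - cos(x)] evaluated from 0 to 1
exact <- (sin(1) - cos(1)) - (sin(0) - cos(0))

res$value
exact
abs(res$value - exact) < 1e-10
```

The returned function is vectorized because sin and cos are, which is what integrate() requires of its integrand.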
This is an old question, but I recently struggled with it, so here is a simple example in case it helps others in the future. @joran's answer is still the best.
Define your first function: f1 <- function(x){return(x*2)}
Test it: f1(8) (expect 8*2=16); returns [1] 16
Define your second function: f2 <-function(f, y){return(f+y)}
Test it: f2(f=f1(8), y=1) (expect 8*2 + 1 = 17); returns [1] 17
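Note that the call above passes the value f1(8) rather than the function f1. A variant that receives the function itself and calls it internally might look like the following sketch, where apply_then_add is a hypothetical name introduced here:

```r
f1 <- function(x) x * 2

# Hypothetical variant: f is a function and is called inside the body
apply_then_add <- function(f, x, y) {
  f(x) + y
}

apply_then_add(f1, x = 8, y = 1)     # 8*2 + 1 = 17
apply_then_add(sqrt, x = 16, y = 1)  # sqrt(16) + 1 = 5
```

This is the same mechanism integrate() relies on: it receives a function and decides for itself which points to evaluate it at.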