I am trying to write a very simple function wrapper in R that will accept f and return g, where g returns zero whenever the first argument is non-positive. I have the following code
wrapper <- function(f) {
  function(x, ...) {
    if (x <= 0) { 0 }
    else { f(x, ...) }
  }
}
The wrapper works as expected, but is there a way to maintain the function signature?
> wdnorm <- wrapper(dnorm)
> args(dnorm)
function (x, mean = 0, sd = 1, log = FALSE)
NULL
> args(wdnorm)
function (x, ...)
NULL
I would like to do something like this (but obviously it doesn't work)
args(g) <- args(f)
Is this possible in R?
Here is what you want. Though, do you really need this?
wrapper <- function(f) {
  f2 = function(x) {
    if (x <= 0) { 0 }
    else { do.call(f, as.list(match.call())[-1]) }
  }
  formals(f2) = formals(f)
  f2
}
wdnorm <- wrapper(dnorm)
args(dnorm)
args(wdnorm)
wdnorm(-5)
wdnorm(5)
Output:
> args(dnorm)
function (x, mean = 0, sd = 1, log = FALSE)
NULL
> args(wdnorm)
function (x, mean = 0, sd = 1, log = FALSE)
NULL
> wdnorm(-5)
[1] 0
> wdnorm(5)
[1] 1.48672e-06
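Named arguments are forwarded correctly as well, since match.call() captures them by name. A quick check of my own (not part of the original answer):
> wdnorm(5, sd = 2)   # forwarded as dnorm(x = 5, sd = 2)
[1] 0.00876415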
Related
Do if statements not work for integrate? I have to do something much more complicated than this, but I am supplying this example because it isolates the problem.
Kernel = function(x){
  if (abs(x) < 1) {
    w = 1 - abs(x)
  } else {
    w = 0
  }
  return(w)
}
integrate(Kernel, 0, 1)
The error message:
the condition has length > 1 and only the first element will be used
integrate() evaluates the integrand on a whole vector of points at once, so if() sees a condition of length greater than one and only uses its first element. Rewrite the kernel using vectorized functions:
Kernel = function(x){
  pmax(1 - abs(x), 0)
}
integrate(Kernel, 0, 1)
0.5 with absolute error < 5.6e-15
or even:
Kernel1 = function(x){
  ifelse(abs(x) < 1, 1 - abs(x), 0)
}
integrate(Kernel1, 0, 1)
0.5 with absolute error < 5.6e-15
If you want to maintain the way you have written your function, you have to vectorize it:
Kernel2 = function(x){
  if (abs(x) < 1) {
    w = 1 - abs(x)
  } else {
    w = 0
  }
  return(w)
}
integrate(Vectorize(Kernel2), 0, 1)
0.5 with absolute error < 5.6e-15
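Note that Vectorize() is essentially a wrapper around mapply(), so it still calls the scalar function once per point; the pmax() and ifelse() versions should be faster. A quick sanity check (my own, not from the original answer):
> Kernel2v <- Vectorize(Kernel2)
> Kernel2v(c(-2, -0.5, 0, 0.5, 2))
[1] 0.0 0.5 1.0 0.5 0.0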
I found myself often writing code such as
#' @param x input vector
#' @param ... passed to [slow_fun()]
fast_fun <- function(x, ...) {
  u <- unique(x)
  i <- match(x, u)
  v <- slow_fun(u, ...)
  v[i]
}
to accelerate a slow vectorized "pure" function, where each input entry could in principle be computed individually and the input is expected to contain many duplicates.
Now I wonder: is this the best way to achieve such a speedup, or is there some function (preferably in base R or the tidyverse) that does something like unique and match at the same time?
Benchmarks so far
Thanks for the provided answers. I've written a small benchmark suite to compare the approaches:
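(slow_fun and the settings grid are not shown here. To make the snippets below self-contained, hypothetical stand-ins along these lines could be used; both definitions are my own assumptions, not from the original post.)
# Hypothetical stand-in for the expensive vectorized function:
slow_fun <- function(u, ...) {
  vapply(u, function(e) {
    sum(sin(seq_len(1e4)))   # throwaway work to simulate per-element cost
    as.numeric(nchar(as.character(e)))
  }, numeric(1))
}
# Hypothetical benchmark grid:
settings <- expand.grid(
  n_distinct = c(10L, 1000L),
  n_total    = c(1e4L, 1e5L),
  type       = c("integer", "double", "character"),
  stringsAsFactors = FALSE
)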
method <- list(
  brute = slow_fun,
  unique_match = function(x, ...) {
    u <- unique(x)
    i <- match(x, u)
    v <- slow_fun(u, ...)
    v[i]
  },
  unique_factor = function(x, ...) {
    if (is.character(x)) {
      x <- factor(x)
      i <- as.integer(x)
      u <- levels(x)
    } else {
      u <- unique(x)
      i <- as.integer(factor(x, levels = u))
    }
    v <- slow_fun(u, ...)
    v[i]
  },
  unique_match_df = function(x, ...) {
    u <- unique(x)
    i <- if (is.numeric(x)) {
      match(data.frame(t(round(x, 10))), data.frame(t(round(u, 10))))
    } else {
      match(data.frame(t(x)), data.frame(t(u)))
    }
    v <- slow_fun(u, ...)
    v[i]
  },
  rcpp_uniquify = function(x, ...) {
    iu <- uniquify(x)
    v <- slow_fun(iu[["u"]], ...)
    v[iu[["i"]]]
  }
)
exprs <- lapply(method, function(fun) substitute(fun(x), list(fun = fun)))
settings$bench <- lapply(seq_len(nrow(settings)), function(i) {
  cat("\rBenchmark ", i, " / ", nrow(settings), sep = "")
  x <- switch(
    settings$type[i],
    integer = sample.int(
      n = settings$n_distinct[i],
      size = settings$n_total[i],
      replace = TRUE
    ),
    double = sample(
      x = runif(n = settings$n_distinct[i]),
      size = settings$n_total[i],
      replace = TRUE
    ),
    character = sample(
      x = stringi::stri_rand_strings(
        n = settings$n_distinct[i],
        length = 20L
      ),
      size = settings$n_total[i],
      replace = TRUE
    )
  )
  microbenchmark::microbenchmark(list = exprs)
})
library(tidyverse)
settings %>%
  mutate(bench = map(bench, summary)) %>%
  unnest(bench) %>%
  group_by(n_distinct, n_total, type) %>%
  mutate(score = median / min(median)) %>%
  group_by(expr) %>%
  summarise(mean_score = mean(score)) %>%
  arrange(mean_score)
Currently, the Rcpp-based approach is best in all tested settings on my machine, but it barely manages to exceed the unique-then-match method.
I suspect a greater performance advantage the longer x becomes, because unique-then-match needs two passes over the data while uniquify() needs only one.
|expr | mean_score|
|:---------------|----------:|
|rcpp_uniquify | 1.018550|
|unique_match | 1.027154|
|unique_factor | 5.024102|
|unique_match_df | 36.613970|
|brute | 45.106015|
Maybe you can try factor + as.integer, like below:
as.integer(factor(x))
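For example (my own illustration), factor() gives you both pieces at once, though the levels come out sorted rather than in first-appearance order:
> x <- c("b", "a", "b", "c")
> f <- factor(x)
> as.integer(f)   # index of each element into levels(f)
[1] 2 1 2 3
> levels(f)       # the unique values (sorted)
[1] "a" "b" "c"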
I found a cool and fast answer recently:
match(data.frame(t(x)), data.frame(t(y)))
As always, beware when working with floats. In such cases I recommend something like
match(data.frame(t(round(x, 10))), data.frame(t(round(y, 10))))
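To illustrate the row-wise matching (my own example): data.frame(t(x)) turns the rows of x into list columns, which match() then compares against the rows of y:
> x <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
> y <- matrix(c(3, 4, 1, 2, 5, 6), nrow = 3, byrow = TRUE)
> match(data.frame(t(x)), data.frame(t(y)))
[1] 2 1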
I've finally managed to beat unique() and match() by using Rcpp to hand-code the algorithm in C++, with a std::unordered_map as the core bookkeeping data structure.
Here is the source code; to use it from R, write it to a file and run Rcpp::sourceCpp() on it.
#include <Rcpp.h>
using namespace Rcpp;
template <int T>
List uniquify_impl(Vector<T> x) {
  IntegerVector idxes(x.length());
  typedef typename Rcpp::traits::storage_type<T>::type storage_t;
  std::unordered_map<storage_t, int> unique_map;
  int n_unique = 0;

  // 1. Single pass through x: assign each element a 1-based index,
  //    allocating a new index the first time a value is seen
  for (int i = 0; i < x.length(); i++) {
    storage_t curr = x[i];
    int idx = unique_map[curr];  // 0 if curr has not been seen before
    if (idx == 0) {
      unique_map[curr] = ++n_unique;
      idx = n_unique;
    }
    idxes[i] = idx;
  }

  // 2. Collect the unique values, in order of first appearance
  Vector<T> uniques(unique_map.size());
  for (auto &pair : unique_map) {
    uniques[pair.second - 1] = pair.first;
  }

  return List::create(
    _["u"] = uniques,
    _["i"] = idxes
  );
}

// [[Rcpp::export]]
List uniquify(RObject x) {
  switch (TYPEOF(x)) {
    case INTSXP: {
      return uniquify_impl(as<IntegerVector>(x));
    }
    case REALSXP: {
      return uniquify_impl(as<NumericVector>(x));
    }
    case STRSXP: {
      return uniquify_impl(as<CharacterVector>(x));
    }
    default: {
      warning(
        "Invalid SEXPTYPE %d (%s).\n",
        TYPEOF(x), type2name(x)
      );
      return R_NilValue;
    }
  }
}
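Usage looks like this (a quick check of my own, assuming the source above was saved as uniquify.cpp):
> Rcpp::sourceCpp("uniquify.cpp")
> res <- uniquify(c(3L, 1L, 3L, 2L, 1L))
> res$u   # unique values, in order of first appearance
[1] 3 1 2
> res$i   # index of each input element into res$u
[1] 1 2 1 3 2
> identical(res$u[res$i], c(3L, 1L, 3L, 2L, 1L))
[1] TRUE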
I have defined the following function
inequalizer <- function(x, caracter) {
  if (caracter == "X") {
    function(y) { y[1] < x }
  } else if (caracter == "Y") {
    function(y) { y[2] < x }
  } else {
    function(y) { y[3] < x }
  }
}
which returns one of several functions depending on the input parameters x and caracter. I have another function in which I call this function recursively, with arguments that depend on some initial data.
The function returned by "inequalizer" is saved as
function(y) {y[2] < x}
<bytecode: 'code'>
<environment: 'code'>
I want to know if there is some way to save it with the literal argument passed to x, so that with the parameters x = 1 and caracter = "Y" I would get
function(y) {y[2] < 1}
<bytecode: 'code'>
<environment: 'code'>
Maybe store x as an attribute:
inequalizer <- function(x, caracter) {
  if (caracter == "X") {
    foo = function(y) { y[1] < x }
    attr(foo, "x") = x
    foo
  } else if (caracter == "Y") {
    foo = function(y) { y[2] < x }
    attr(foo, "x") = x
    foo
  } else {
    foo = function(y) { y[3] < x }
    attr(foo, "x") = x
    foo
  }
}
myf = inequalizer(5, "X")
myf
#function(y) {y[1] < x}
#<environment: 0x000000001c12e2d0>
#attr(,"x")
#[1] 5
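If you want the literal value to actually appear in the printed body, one option (my own sketch, not part of the answer above) is to splice the values into a freshly built function body with bquote() and as.function():
inequalizer2 <- function(x, caracter) {
  idx <- switch(caracter, X = 1, Y = 2, 3)
  # build the body with the literal values of idx and x spliced in
  as.function(c(alist(y = ), list(bquote(y[.(idx)] < .(x)))))
}
inequalizer2(1, "Y")
# prints (roughly): function (y) y[2] < 1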
I have multiple functions passing around arguments that may be missing.
E.g. I have
mainfunction <- function(somearg) {
  mytest(somearg)
  fun <- function() { subfunction(somearg) }
  fun()
}
with the interesting aspect that the only interaction of mytest(somearg) with the argument is to test whether it is missing:
mytest = function(somearg) {
  print(missing(somearg))
}
subfunction then again tests whether it is missing and treats it accordingly:
subfunction = function(somearg) {
  if (missing(somearg))
    somearg = NULL
  else
    somearg = matrix(somearg, ncol = 2)
  # somearg is used here…
}
The kicker is that, with somearg missing, this doesn't work: matrix(somearg, ncol = 2) throws
argument "somearg" is missing, with no default
During debugging, I found the following:
at the start of mainfunction, missing(somearg) returns TRUE
in mytest, missing(somearg) returns TRUE
in subfunction, missing(somearg) returns FALSE (!!!!)
Therefore the matrix branch is hit, but in reality somearg is missing, so it fails…
wat.
The @BenBolker way:
mainfunction <- function(somearg = NULL) {
  mytest(somearg)
  fun <- function() { subfunction(somearg) }
  fun()
}

mytest = function(somearg) {
  print(is.null(somearg))
}

subfunction = function(somearg) {
  if (is.null(somearg))
    somearg = 1:10
  else
    somearg = matrix(somearg, ncol = 2)
  somearg
}
Another way, using an explicit missing-argument flag:
mainfunction <- function(somearg) {
  is_missing <- missing(somearg)
  mytest(is_missing)
  fun <- function() { subfunction(somearg, is_missing) }
  fun()
}

mytest = function(x) { print(x) }

subfunction = function(somearg, is_arg_missing) {
  if (is_arg_missing)
    somearg = 1:10
  else
    somearg = matrix(somearg, ncol = 2)
  somearg
}
A third way, using plain missing arg passing:
mainfunction <- function(somearg) {
  is_missing <- missing(somearg)
  mytest(somearg)
  fun <- function() {
    if (is_missing) subfunction()
    else subfunction(somearg)
  }
  fun()
}

mytest = function(somearg) {
  print(missing(somearg))
}

subfunction = function(somearg) {
  if (missing(somearg))
    somearg = 1:10
  else
    somearg = matrix(somearg, ncol = 2)
  somearg
}
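A quick demonstration of this third variant (my own run of the definitions above):
> mainfunction()
[1] TRUE
 [1]  1  2  3  4  5  6  7  8  9 10
> mainfunction(1:4)
[1] FALSE
     [,1] [,2]
[1,]    1    3
[2,]    2    4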
Using:
mean(x, trim = 0.05)
removes 2.5% from each side of the distribution, which is fine for symmetrical two-tailed data. But for one-tailed or highly asymmetric data I would like to be able to trim just one side of the distribution. Is there a function for this, or do I have to write a new one myself? If so, how?
Just create a modified mean.default. First look at mean.default:
mean.default
Then modify it to accept a new argument:
mean.default <- function (x, trim = 0, na.rm = FALSE, ..., side = "both")
{
  if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
  }
  if (na.rm)
    x <- x[!is.na(x)]
  if (!is.numeric(trim) || length(trim) != 1L)
    stop("'trim' must be numeric of length one")
  n <- length(x)
  if (trim > 0 && n) {
    if (is.complex(x))
      stop("trimmed means are not defined for complex data")
    if (any(is.na(x)))
      return(NA_real_)
    if (trim >= 0.5)
      return(stats::median(x, na.rm = FALSE))
    # only raise the lower cut-off when that side is being trimmed
    lo <- if (side == "both" || side == "right") { floor(n * trim) + 1 } else { 1 }
    # only lower the upper cut-off when that side is being trimmed
    hi <- if (side == "both" || side == "left") { n + 1 - (floor(n * trim) + 1) } else { n }
    x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    cat(c(length(x), lo, hi))  # debugging output; remove in production
  }
  .Internal(mean(x))
}
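A quick sanity check of the side convention (my own example, not from the answer; note that the cat() call also prints length(x), lo and hi before the result):
x <- c(1:10, 1000)                    # one extreme value in the upper tail
mean(x, trim = 0.1, side = "left")    # lowers hi, trimming the largest values: 5.5
mean(x, trim = 0.1, side = "right")   # raises lo, trimming the smallest values: 105.4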
I don't know of a function. Something like the following would trim off the upper tail of the distribution before taking the mean.
upper.trim.mean <- function(x, trim) {
  x <- sort(x)
  mean(x[1:floor(length(x) * (1 - trim))])
}
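For example (my own quick check):
> upper.trim.mean(c(1:9, 100), 0.1)   # drops the top 10%, here the single value 100
[1] 5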
This should account for either side, or both sides for trimming.
trim.side.mean <- function(x, trim, type="both"){
if (type == "both") {
mean(x,trim)}
else if (type == "right") {
x <- sort(x)
mean(x[1:floor(length(x)*(1-trim))])}
else if (type == "left"){
x <- sort(x)
mean(x[max(1,floor(length(x)*trim)):length(x)])}}
one.sided.trim.mean <- function(x, trim, upper = TRUE) {
  if (upper)
    mean(x[x < quantile(x, 1 - trim)])   # drop the upper tail
  else
    mean(x[x > quantile(x, trim)])       # drop the lower tail
}
I found that all the answers posted do not match when checked manually, so I created one of my own. It's long but simple enough to understand. Note that trim here is the number of observations to drop from a tail, not a proportion:
get_trim <- function(x, trim, type) {
  x <- sort(x)
  ans <- 0
  if (type == "both") {
    for (i in (trim + 1):(length(x) - trim)) {
      ans = ans + x[i]
    }
    print(ans / (length(x) - (2 * trim)))
  } else if (type == "left") {
    for (i in (trim + 1):length(x)) {
      ans = ans + x[i]
    }
    print(ans / (length(x) - trim))
  } else if (type == "right") {
    for (i in 1:(length(x) - trim)) {
      ans = ans + x[i]
    }
    print(ans / (length(x) - trim))
  }
}
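Since trim counts observations here, a call like the following (my own example) drops whole data points rather than a fraction:
> x <- c(2, 4, 6, 8, 10, 100)
> get_trim(x, 1, "both")    # drops one value from each end
[1] 7
> get_trim(x, 1, "right")   # drops the single largest value
[1] 6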