Is it possible to save variable values when using pairs? - r

Is it possible to do something like this somehow with pairs or a similar function?
var = "" #initialization
panel.pearson <- function(x, y, ...) {
horizontal <- (par("usr")[1] + par("usr")[2]) / 2;
vertical <- (par("usr")[3] + par("usr")[4]) / 2;
cor = cor.test(x,y)
cor.p = cor$p.value
cor.r = cor$estimate
cor.r = round(cor.r, digits = 2)
# test the unrounded p-value: rounding first turns e.g. 0.004 into 0 and yields "***"
stars = ifelse(cor.p < .001, "***", ifelse(cor.p < .01, "** ", ifelse(cor.p < .05, "* ", " ")))
format_r_p = paste(cor.r, stars, sep="")
text(horizontal, vertical, format_r_p, cex=2)
var = c(var, format_r_p)
}
pairs(crime, upper.panel=panel.pearson )
var would output all the format_r_p values.

It's possible, but in general it's a really bad idea: functions should not mutate global state.
Instead, keep the modification local to the code that calls pairs:
var = ''
pairs(crime, upper.panel = function (x, y, ...) {
result = panel.pearson(x, y, ...)
var <<- c(var, result)
result
})
Now, instead of making panel.pearson modify a magic global variable, we use an anonymous function defined at the call to pairs to modify a variable that lives in the same scope as that call, i.e. locally. This assumes panel.pearson no longer assigns to var itself and instead returns format_r_p as its last expression.
To modify this variable from inside the anonymous function, we use <<- instead of the normal assignment.
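A self-contained sketch of this pattern, assuming the panel function is rewritten to return its label. The wrapper name collect_pairs_labels and the use of the built-in mtcars data are illustrative choices, not part of the original code:

```r
# Hypothetical wrapper: draws the pairs plot and returns the panel labels.
collect_pairs_labels <- function(data) {
  labels <- character(0)  # accumulator local to this function

  # Panel function that draws the label and returns it (no outside state).
  panel_pearson <- function(x, y, ...) {
    usr <- par("usr")
    ct <- cor.test(x, y)
    p <- ct$p.value
    stars <- if (p < .001) "***" else if (p < .01) "**" else if (p < .05) "*" else ""
    label <- paste0(round(ct$estimate, 2), stars)
    text(mean(usr[1:2]), mean(usr[3:4]), label, cex = 2)
    label
  }

  pairs(data, upper.panel = function(x, y, ...) {
    # <<- finds `labels` in the enclosing function, not in the global environment
    labels <<- c(labels, panel_pearson(x, y, ...))
  })
  labels
}

pdf(NULL)  # null device, so the sketch also runs without a display
res <- collect_pairs_labels(mtcars[, 1:3])
invisible(dev.off())
```

Because the accumulator lives inside the wrapper, nothing outside it is modified; res holds one label per upper panel.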

Related

Using R, how to scope internal functions within a MAIN function?

My young son and I were playing a board game called Snails Pace. Simple enough, so I sat down to show him the game on the R-console.
Helper function
num.round = function(n, by=5)
{
byidx = (n %% by == 0); # these already are indexed well
new = by * as.integer((n + by) / by);
res = n;
res[!byidx] = new[!byidx];
res;
}
Primary function
snails.pace = function(moves = 200, finish.line = 8,
snail.x = NULL,
snail.y = NULL,
snail.col = NULL
)
{
if(is.null(snail.x)) { snail.x = 0*(1:6); }
if(is.null(snail.y)) { snail.y = 1*(1:6); }
if(is.null(snail.col)) { snail.col = c("orange", "blue", "pink", "green", "yellow", "red"); }
snail.rank = 0*snail.x;
crank = 1; # current rank
move.number = 0;
snails.plot = function(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank)
{
xmax = max(10, max(snail.x) );
ymax = max(8, max(snail.y) );
plot(snail.x, snail.y,
col=snail.col,
pch=16, cex=5,
xlim=c(0, num.round(xmax, 5) ),
ylim=c(0, num.round(ymax, 4) ),
axes=FALSE,
frame.plot=FALSE,
xlab="", ylab="",
main=paste0("Move #", move.number, " of ", moves)
);
#axis(gr.side("bottom"));
axis(1);
has.rank = (snail.rank != 0);
snails.lab = paste0(snail.x, "*", snail.rank);
snails.lab[!has.rank] = snail.x[!has.rank];
text(snail.x, y=snail.y, labels=snails.lab, col="black");
abline(v = finish.line, col="gray", lty="dashed");
}
snails.update = function(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank)
{
x = readline(prompt="Press [enter] to continue, [ESC] to quit");
n = sample(1:6, 1);
snail.x[n] = 1 + snail.x[n];
if( (snail.rank[n] == 0) && (snail.x[n] >= finish.line) )
{
snail.rank[n] = crank;
crank = 1 + crank;
# update to MAIN environment
assign("snail.rank", snail.rank, envir=parent.frame() );
assign("crank", crank, envir=parent.frame() );
}
snail.x;
}
snails.plot(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
while(move.number < moves)
{
move.number = 1 + move.number;
snail.x = snails.update(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
snails.plot(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
}
}
Game play
snails.pace();
Question: how to scope internal functions within the MAIN environment?
The MAIN function is snails.pace(). You will notice that in the internal function snails.update, I update two variables and assign them back to the MAIN scope using assign.
Is there a way at the MAIN level I can define all the variables and just USE them within all internal functions without having to assign them back or returning the updated values?
As you can see in my code, I pass all of the variables into the functions and either "back assign" or return any changes. I would prefer to just set up a new environment or something similar and have MAIN work the way the global environment seems to. Any suggestions on how to do that?
That is, my internal functions would not take any arguments: snails.plot = function() and snails.update = function(), as they would read the local variables of MAIN (i.e. of snails.pace()). Ideally, they would also update those local variables simply by assigning to them within the internal function.
Update
So it appears that I can drop the argument passing. See:
snails.pace2 = function(moves = 200, finish.line = 8,
snail.x = NULL,
snail.y = NULL,
snail.col = NULL
)
{
if(is.null(snail.x)) { snail.x = 0*(1:6); }
if(is.null(snail.y)) { snail.y = 1*(1:6); }
if(is.null(snail.col)) { snail.col = c("orange", "blue", "pink", "green", "yellow", "red"); }
snail.rank = 0*snail.x;
crank = 1; # current rank
move.number = 0;
snails.plot = function()
{
xmax = max(10, max(snail.x) );
ymax = max(8, max(snail.y) );
plot(snail.x, snail.y,
col=snail.col,
pch=16, cex=5,
xlim=c(0, num.round(xmax, 5) ),
ylim=c(0, num.round(ymax, 4) ),
axes=FALSE,
frame.plot=FALSE,
xlab="", ylab="",
main=paste0("Move #", move.number, " of ", moves)
);
#axis(gr.side("bottom"));
axis(1);
has.rank = (snail.rank != 0);
snails.lab = paste0(snail.x, "*", snail.rank);
snails.lab[!has.rank] = snail.x[!has.rank];
text(snail.x, y=snail.y, labels=snails.lab, col="black");
abline(v = finish.line, col="gray", lty="dashed");
}
snails.update = function()
{
x = readline(prompt="Press [enter] to continue, [ESC] to quit");
n = sample(1:6, 1);
snail.x[n] = 1 + snail.x[n];
if( (snail.rank[n] == 0) && (snail.x[n] >= finish.line) )
{
snail.rank[n] = crank;
crank = 1 + crank;
# update to MAIN environment
assign("snail.rank", snail.rank, envir=parent.frame() );
assign("crank", crank, envir=parent.frame() );
}
snail.x;
}
snails.plot();
while(move.number < moves)
{
move.number = 1 + move.number;
snail.x = snails.update();
snails.plot();
}
}
@MrFlick is correct about the lexical scoping, if I understand the above correctly. If an internal function updates something from MAIN, it has to assign it back to MAIN, I guess with <<- or assign(..., envir = parent.frame()). Is there no way to tell the internal subfunctions to scope at the same level as MAIN?
There are two completely different concepts called "parent" in R: the parent.frame() of a call, and the parent.env() of an environment.
parent.frame() walks up the chain of the stack of calls. If you have a recursive function that calls itself, it will appear multiple times in that chain.
In general, it's dangerous to use parent.frame(), because even if the context in which you use it now makes it clear which environment will be the parent.frame(), at some future time you might change your program (e.g. make the internal function into a recursive one, or call it from another internal function), and then parent.frame() will refer to something different.
The parent.env() function applies to an environment; parent.env(environment()) gives you the enclosing environment of the current one. If you call parent.env(environment()) it will always refer to the environment where your current function was defined. It doesn't matter how you called it, just how you defined it. So you always know what will happen if you assign there, and it's much safer in the long term than using parent.frame().
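The difference between the two "parents" can be seen in a small sketch. All names here (make_inner, call_it, where) are hypothetical and chosen only for illustration:

```r
make_inner <- function() {
  where <- "make_inner"  # lives in the environment where the inner function is defined
  function() {
    enclosing <- parent.env(environment())  # fixed when the function is defined
    caller    <- parent.frame()             # depends on who calls the function
    c(enclosing = get("where", envir = enclosing, inherits = FALSE),
      caller    = get("where", envir = caller, inherits = FALSE))
  }
}

call_it <- function(f) {
  where <- "call_it"  # a different variable of the same name, in the caller
  f()
}

inner <- make_inner()
res <- call_it(inner)
print(res)
```

Here the enclosing environment yields "make_inner" no matter where inner is called from, while parent.frame() yields "call_it" only because call_it happens to be the caller.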
The <<- "super-assignment" works with enclosing environments, not the stack of calls. If you do var <<- value, then as long as you are sure that var was defined in the enclosing function, you can be sure that's what gets modified.
One flaw in R is that it doesn't enforce the existence of var there, so that's why some people say <<- is "sloppy". If you accidentally forget to define it properly, or spell it wrong, R will search back through the whole chain of environments to try to do what you asked, and if it never finds a matching variable, it will do the assignment in the global environment. You almost never want to do that: keep side effects minimal.
So, to answer the question "Is there a way at the MAIN level I can define all the variables and just USE them within all internal functions without having to assign them back or returning the updating values?": as you found in your edit, the nested function can read the value of any variable in the MAIN function without requiring any special code. To modify those variables, be sure both snail.rank and crank are defined in MAIN, then use <<- in the nested function to assign new values to them.
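As a minimal sketch of that advice, here is a stripped-down, non-interactive version of the game loop. The function name race, the underscore variable names and the seed argument are assumptions made for the example; plotting and readline are left out:

```r
race <- function(n_moves = 20, finish_line = 3, seed = 1) {
  set.seed(seed)
  # All state is defined in MAIN (the body of race) ...
  snail_x    <- integer(6)
  snail_rank <- integer(6)
  next_rank  <- 1L

  # ... and the nested function reads it directly and updates it with <<-.
  update <- function() {
    n <- sample(6, 1)
    snail_x[n] <<- snail_x[n] + 1L
    if (snail_rank[n] == 0L && snail_x[n] >= finish_line) {
      snail_rank[n] <<- next_rank
      next_rank <<- next_rank + 1L
    }
    invisible(NULL)
  }

  for (i in seq_len(n_moves)) update()
  list(x = snail_x, rank = snail_rank)
}

res <- race()
```

No arguments are passed to update and nothing is returned from it; every change lands in the variables of race because they were defined there before update ran.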
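To have a function f defined within another function main such that f has the same scope as main, surround the entire body of f with eval.parent(substitute({...})). Note that the body is evaluated in whichever frame calls f, so this only behaves as intended when f is called directly from main: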
main <- function() {
f <- function() eval.parent(substitute({
a <- a + 1
b <- 0.5
}))
a <- 1
f()
f()
10 * a + b
}
main()
## [1] 30.5
The gtools package has defmacro which allows the same thing and uses the same technique internally. Also see the wrapr package.

How to best combine unique and match in R?

I found myself often writing code such as
#' @param x input vector
#' @param ... passed to [slow_fun()]
fast_fun <- function(x, ...) {
u <- unique(x)
i <- match(x, u)
v <- slow_fun(u, ...)
v[i]
}
The goal is to accelerate a slow, vectorized, "pure" function where each input entry could in principle be computed individually and where the input is expected to contain many duplicates.
Now I wonder whether this is the best way to achieve such a speedup, or whether there is some function (preferably in base R or the tidyverse) that does something like unique and match at the same time.
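To make the effect of the pattern concrete, here is a small sketch with a stand-in for slow_fun. The stand-in (a square root that counts how many elements it was asked to process) and the counter environment are assumptions for illustration only:

```r
counter <- new.env()
counter$n <- 0L

# Hypothetical stand-in for an expensive vectorized function.
slow_fun <- function(x, ...) {
  counter$n <- counter$n + length(x)  # record how many elements were processed
  sqrt(x)
}

fast_fun <- function(x, ...) {
  u <- unique(x)        # distinct inputs
  i <- match(x, u)      # position of each input within u
  v <- slow_fun(u, ...) # evaluate each distinct input once
  v[i]                  # expand back to the original length and order
}

x <- c(4, 9, 4, 4, 16, 9)
res <- fast_fun(x)
processed <- counter$n
```

With six inputs but three distinct values, slow_fun processes three elements, and the result matches a direct call on x.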
Benchmarks so far
Thanks for the provided answers. I've written a small benchmark suite to compare the approaches:
method <- list(
brute = slow_fun,
unique_match = function(x, ...) {
u <- unique(x)
i <- match(x, u)
v <- slow_fun(u, ...)
v[i]
},
unique_factor = function(x, ...) {
if (is.character(x)) {
x <- factor(x)
i <- as.integer(x)
u <- levels(x)
} else {
u <- unique(x)
i <- as.integer(factor(x, levels = u))
}
v <- slow_fun(u, ...)
v[i]
},
unique_match_df = function(x, ...) {
u <- unique(x)
i <- if (is.numeric(x)) {
match(data.frame(t(round(x, 10))), data.frame(t(round(u, 10))))
} else {
match(data.frame(t(x)), data.frame(t(u)))
}
v <- slow_fun(u, ...)
v[i]
},
rcpp_uniquify = function(x, ...) {
iu <- uniquify(x)
v <- slow_fun(iu[["u"]], ...)
v[iu[["i"]]]
}
)
exprs <- lapply(method, function(fun) substitute(fun(x), list(fun = fun)))
settings$bench <- lapply(seq_len(nrow(settings)), function(i) {
cat("\rBenchmark ", i, " / ", nrow(settings), sep = "")
x <- switch(
settings$type[i],
integer = sample.int(
n = settings$n_distinct[i],
size = settings$n_total[i],
replace = TRUE
),
double = sample(
x = runif(n = settings$n_distinct[i]),
size = settings$n_total[i],
replace = TRUE
),
character = sample(
x = stringi::stri_rand_strings(
n = settings$n_distinct[i],
length = 20L
),
size = settings$n_total[i],
replace = TRUE
)
)
microbenchmark::microbenchmark(
list = exprs
)
})
library(tidyverse)
settings %>%
mutate(
bench = map(bench, summary)
) %>%
unnest(bench) %>%
group_by(n_distinct, n_total, type) %>%
mutate(score = median / min(median)) %>%
group_by(expr) %>%
summarise(mean_score = mean(score)) %>%
arrange(mean_score)
Currently, the rcpp-based approach is best in all tested settings on my machine but barely manages to exceed the unique-then-match method.
I suspect a greater advantage in performance the longer x becomes, because unique-then-match needs two passes over the data while uniquify() only needs one pass.
|expr | mean_score|
|:---------------|----------:|
|rcpp_uniquify | 1.018550|
|unique_match | 1.027154|
|unique_factor | 5.024102|
|unique_match_df | 36.613970|
|brute | 45.106015|
Maybe you can try factor + as.integer like below
as.integer(factor(x))
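One detail worth keeping in mind with this suggestion: the integer codes index into levels(factor(x)), which are sorted, not into unique(x), which keeps first-occurrence order. A short sketch with a made-up input vector:

```r
x <- c("b", "a", "b", "c")

f <- factor(x)
i <- as.integer(f)  # codes into the sorted levels
u <- levels(f)      # "a" "b" "c"

via_factor <- u[i]                   # reconstructs x
via_unique <- unique(x)[match(x, unique(x))]  # also reconstructs x
```

Both reconstructions give back x, but i must be paired with levels(f), as the unique_factor variant in the benchmark does for character input.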
I found a cool, and fast, answer recently,
match(data.frame(t(x)), data.frame(t(y)))
As always, beware when working with floats. I recommend something like
match(data.frame(t(round(x, 10))), data.frame(t(round(y, 10))))
in such cases.
I've finally managed to beat unique() and match() using Rcpp to hand-code the algorithm in C++ using a std::unordered_map as core bookkeeping data structure.
Here is the source code, which can be used in R by writing it into a file and running Rcpp::sourceCpp on it.
#include <Rcpp.h>
#include <unordered_map>
// [[Rcpp::plugins(cpp11)]]
using namespace Rcpp;
template <int T>
List uniquify_impl(Vector<T> x) {
IntegerVector idxes(x.length());
typedef typename Rcpp::traits::storage_type<T>::type storage_t;
std::unordered_map<storage_t, int> unique_map;
int n_unique = 0;
// 1. Pass through x once
for (int i = 0; i < x.length(); i++) {
storage_t curr = x[i];
int idx = unique_map[curr];
if (idx == 0) {
unique_map[curr] = ++n_unique;
idx = n_unique;
}
idxes[i] = idx;
}
// 2. Place each unique value at the position of its first occurrence
Vector<T> uniques(unique_map.size());
for (auto &pair : unique_map) {
uniques[pair.second - 1] = pair.first;
}
return List::create(
_["u"] = uniques,
_["i"] = idxes
);
}
// [[Rcpp::export]]
List uniquify(RObject x) {
switch (TYPEOF(x)) {
case INTSXP: {
return uniquify_impl(as<IntegerVector>(x));
}
case REALSXP: {
return uniquify_impl(as<NumericVector>(x));
}
case STRSXP: {
return uniquify_impl(as<CharacterVector>(x));
}
default: {
warning(
"Invalid SEXPTYPE %d (%s).\n",
TYPEOF(x), type2name(x)
);
return R_NilValue;
}
}
}

Passing a function argument with dots

I'm trying to write a function to compute sample sizes in R.
The function uses a couple of smaller functions. I'd like to pass arguments into the smaller functions using the dots. Here is my function so far:
log_reg_var<-function(p){
if(p<=0|p>=1) stop('p must be between 0 and 1')
var<-1/(p*(1-p))
return(var)
}
samplesize<-function(method_name, beta, sigma_x, mult_cor, power= 0.8,fpr = 0.05,...){
if(method_name=='linear regression'){
var_func <- lin_reg_var
}
else if(method_name=='logistic regression'){
var_func <- log_reg_var
}
else if(method_name=='cox regression'){
var_func <- cox_reg_var
}
else if(method_name=='poisson regression'){
var_func <- pois_reg_var
}
else{
stop('method_name not recognized. method_name accepts one of: "linear regression",
"logistic regression","cox regression", or "poisson regression"')
}
top = (qnorm(1-fpr/2) + qnorm(power))^2
bottom = (beta*sigma_x)^2*(1-mult_cor)
n = (top/bottom)*var_func(...)
return(ceiling(n))
}
I should be able to do
samplesize(method_name = 'logistic regression',1,1,0,p=0.5)
>>>32
But instead I am thrown the following error:
Error in var_func(...) : argument "p" is missing, with no default
Clearly there is something wrong with me passing p through the dots, but I'm not sure what is wrong.
What is my problem here?
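The error comes from partial argument matching: p = 0.5 partially matches the formal argument power, which is listed before ..., so it never reaches the dots and var_func(...) is called with no arguments. One fix is to make p an explicit argument (an exact match takes precedence over a partial one) and pass it into log_reg_var() yourself: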
log_reg_var<-function(p){
if(p<=0|p>=1) stop('p must be between 0 and 1')
var<-1/(p*(1-p))
return(var)
}
# specify that you pass a parameter `p`
samplesize<-function(method_name, beta, sigma_x, mult_cor, power= 0.8,fpr = 0.05, p, ...){
# Initialize `var_func` to a NULL value
var_func = NULL
if(method_name=='linear regression'){
var_func <- lin_reg_var(p)
}
else if(method_name=='logistic regression'){
# pass parameter `p` into log_reg_var since there is no default
var_func <- log_reg_var(p)
}
else if(method_name=='cox regression'){
var_func <- cox_reg_var(p)
}
else if(method_name=='poisson regression'){
var_func <- pois_reg_var(p)
}
else{
stop('method_name not recognized. method_name accepts one of: "linear regression",
"logistic regression","cox regression", or "poisson regression"')
}
top = (qnorm(1-fpr/2) + qnorm(power))^2
bottom = (beta*sigma_x)^2*(1-mult_cor)
n = (top/bottom)*var_func
return(ceiling(n))
}
> samplesize(method_name ='logistic regression', 1, 1, 0, p=0.5)
[1] 32
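An alternative that keeps the dots is to move ... in front of power and fpr, since formals listed after ... are only matched by their full name. The following is a sketch under that assumption; before_dots, after_dots and samplesize2 are hypothetical names, and only the logistic case is wired up:

```r
log_reg_var <- function(p) {
  if (p <= 0 || p >= 1) stop("p must be between 0 and 1")
  1 / (p * (1 - p))
}

# Formals placed before `...` are matched partially: p = 0.5 becomes power.
before_dots <- function(beta, power = 0.8, ...) list(power = power, dots = list(...))
# Formals placed after `...` must be named in full, so p stays in the dots.
after_dots <- function(beta, ..., power = 0.8) list(power = power, dots = list(...))

a <- before_dots(1, p = 0.5)  # a$power is 0.5, a$dots is empty
b <- after_dots(1, p = 0.5)   # b$power is 0.8, b$dots$p is 0.5

# Hypothetical variant of samplesize with the dots moved forward.
samplesize2 <- function(method_name, beta, sigma_x, mult_cor, ...,
                        power = 0.8, fpr = 0.05) {
  var_func <- switch(method_name,
    "logistic regression" = log_reg_var,
    stop("method_name not recognized"))
  top <- (qnorm(1 - fpr / 2) + qnorm(power))^2
  bottom <- (beta * sigma_x)^2 * (1 - mult_cor)
  ceiling((top / bottom) * var_func(...))
}

n <- samplesize2("logistic regression", 1, 1, 0, p = 0.5)
```

The trade-off is that callers must then always spell out power = and fpr = by name.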

How to modify drawdown functions in PerformanceAnalytics package for value

I am calculating the average drawdown, average length, recovery length, etc. in R for a PnL data series rather than return data. The data frame looks like this:
PNL
2008-11-03 3941434
2008-11-04 4494446
2008-11-05 2829608
2008-11-06 2272070
2008-11-07 -2734941
2008-11-10 -2513580
I used the maxDrawDown function from the fTrading package and it worked. How can I get the other drawdown functions to work? If I run AverageDrawdown(quantbook) directly, it gives this error message:
Error in if (thisSign == priorSign) { : missing value where TRUE/FALSE needed
I checked the documentation for AverageDrawdown and it is as below:
findDrawdowns(R, geometric = TRUE, ...)
R an xts, vector, matrix, data frame, timeSeries or zoo object of asset returns
My quantbook is a data frame, but it doesn't work with this function.
If you know of any other packages that provide the same functionality, please advise.
I've modified the package's functions. Here is one solution for the PnL case (or any other case where you work with values rather than returns); I hope you find it useful. The parameter x is a data frame whose row.names are dates, so you don't have to convert between data types (which cost me a lot of trouble). With the output of findPnLDrawdown you can go on to compute the average drawdown, average length, recovery, etc.
# timeSeries() below comes from the timeSeries package
library(timeSeries)
PnLDrawdown <- function(x) {
ts = as.vector(x[,1])
cumsum = cumsum(c(0, ts))
cmaxx = cumsum - cummax(cumsum)
cmaxx = cmaxx[-1]
cmaxx = as.matrix(cmaxx)
row.names(cmaxx) = row.names(x)
cmaxx = timeSeries(cmaxx)
cmaxx
}
findPnLDrawdown <- function(R) {
drawdowns = PnLDrawdown(R)
draw = c()
begin = c()
end = c()
length = c(0)
trough = c(0)
index = 1
if (drawdowns[1] >= 0) {
priorSign = 1
} else {
priorSign = 0
}
from = 1
sofar = as.numeric(drawdowns[1])
to = 1
dmin = 1
for (i in 1:length(drawdowns)) {
thisSign =ifelse(drawdowns[i] < 0, 0, 1)
if (thisSign == priorSign) {
if (as.numeric(drawdowns[i]) < as.numeric(sofar)) {
sofar = drawdowns[i]
dmin = i
}
to = i+ 1
}
else {
draw[index] = sofar
begin[index] = from
trough[index] = dmin
end[index] = to
from = i
sofar = drawdowns[i]
to = i + 1
dmin = i
index = index + 1
priorSign = thisSign
}
}
draw[index] = sofar
begin[index] = from
trough[index] = dmin
end[index] = to
list(pnl = draw, from = begin, trough = trough, to = end,
length = (end - begin + 1),
peaktotrough = (trough - begin + 1),
recovery = (end - trough))
}
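The core of PnLDrawdown can be checked with base R alone. The sketch below uses the six rows from the question and is only an illustration of the cumsum/cummax step, not a replacement for the functions above; the variable names are assumptions:

```r
pnl <- data.frame(
  PNL = c(3941434, 4494446, 2829608, 2272070, -2734941, -2513580),
  row.names = c("2008-11-03", "2008-11-04", "2008-11-05",
                "2008-11-06", "2008-11-07", "2008-11-10")
)

# Cumulative PnL starting from 0, and its distance below the running peak.
equity <- cumsum(c(0, pnl$PNL))
drawdown <- (equity - cummax(equity))[-1]
names(drawdown) <- row.names(pnl)

# Lengths of consecutive runs spent below the previous peak.
runs <- rle(drawdown < 0)
dd_lengths <- runs$lengths[runs$values]

print(drawdown)
```

For this sample the series is at a new peak on the first four days and then 2734941 and 5248521 below the peak on the last two, i.e. a single drawdown of length 2 that has not yet recovered.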

Optim() function error

This is my code.
beta1 = function(a,b,t) { beta(a+(1/t),b) }
beta2 = function(a,b,t) { beta(a+(2/t),b) }
eb11 = function(a,b,t) { beta2(a,b,t)/beta(a,b) }
eb12 = function(a,b,t) { (beta1(a,b,t)-beta2(a,b,t))/beta(a,b) }
eb22 = function(a,b,t) { 1 + (beta2(a,b,t)-2*beta1(a,b,t))/beta(a,b) }
eb11r11 = function(a,b,t) { beta2(a,b,t)*beta(a,b)/beta1(a,b,t)^2 }
eb12r12 = function(a,b,t) { (beta1(a,b,t)-beta2(a,b,t))*beta(a,b)/beta1(a,b,t)/(beta(a,b)-beta1(a,b,t)) }
eb22r22 = function(a,b,t) { (beta(a,b)^2 + (beta2(a,b,t)-2*beta1(a,b,t))*beta(a,b))/(beta(a,b)-beta1(a,b,t))^2 }
gbetloglik = function(a,b,t) {
loglik = n1*log(eb11r11(a,b,t)) + n2*log(eb12r12(a,b,t)) + n3*log(eb22r22(a,b,t))
return(-loglik)
}
abt = optim(c(0.5,0.5,1),gbetloglik,lower=c(0.001,0.001,0.001),method="L-BFGS-B")$par
What I'd like to do is find the a, b, and t that minimize the 'gbetloglik' function, i.e. maximize the log-likelihood.
But I got this error.
Error in 2/t : 't' is missing
It seems that the third argument of function 'beta2' is missing. When I enter three numbers directly in gbetloglik function, it works well. The problem occurs only in optim() function.
Does anyone have any idea?
It looks like you are misinterpreting the first argument of the optim function. It supplies the initial values of the parameters being optimized, and optim passes that whole vector to gbetloglik as its first argument. In your case all three starting values arrive as a, while b and t are never supplied, hence the error. This call will work:
abt = optim(0.5,gbetloglik,lower=c(0.001,0.001,0.001),method="L-BFGS-B", b=0.5, t= 0.5)$par
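but it won't optimize across all three arguments; it only optimizes a for the given b and t. To have optim optimize all three, gbetloglik would need to take them as a single vector (for example function(par), with a, b and t unpacked from par[1], par[2] and par[3]). Alternatively, you can use an external package. Here is an example from nlmrt: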
library(nlmrt)  # provides nlxb(); the package must be installed
ydat = c(6.308, 6.94, 9.638, 12.866, 17.069, 23.192, 31.443, 37.558, 51.156, 64.948, 77.995, 91.972)
tdat = seq_along(ydat)
start1 = c(b1=1, b2=1, b3=1)
eunsc = y ~ b1/(1+b2*exp(-b3*tt))
anlxb1g =try(nlxb(eunsc, start=start1, trace=FALSE, data=data.frame(y=ydat, tt=tdat)))
print(anlxb1g)
anlxb1g$coefficients
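For comparison, the vector-argument form that optim expects can be sketched as follows. Since n1, n2 and n3 are not given in the question, this uses a made-up objective with a known minimum instead of gbetloglik; obj and obj_vec are hypothetical names:

```r
# Hypothetical three-argument objective with its minimum at a = 1, b = 2, t = 3.
obj <- function(a, b, t) (a - 1)^2 + (b - 2)^2 + (t - 3)^2

# optim calls fn with ONE vector, so unpack it before calling obj.
obj_vec <- function(par) obj(par[1], par[2], par[3])

fit <- optim(c(0.5, 0.5, 1), obj_vec,
             lower = rep(0.001, 3), method = "L-BFGS-B")
print(fit$par)
```

The same wrapper idea should carry over to gbetloglik: keep the three-argument function and hand optim a one-argument function that forwards par[1], par[2] and par[3].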

Resources