Simplify ave() or aggregate() with several inputs - r

How can I write this all in one line?
mydata is a "zoo" series, limit is a numeric vector of the same size
tmp <- ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cummax(x)-x)
tmp <- (tmp < limit)
final <- ave(tmp, as.Date(index(mydata)),
FUN = function(x) cumprod(x))
I've tried to pass two vectors as arguments to ave(...), but it seems to accept just one, even if I join them into a matrix.
This is just an example; any other function could be used.
Here I need to compare the value of cummax(mydata)-mydata with a numeric vector and
once it surpasses it I'll keep zeros till the end of the day. The cummax is calculated from the beginning of each day.
If limit were a single number instead of a vector (with different possible numbers) I could write it:
ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cumprod((cummax(x) - x) < limit))
But I can't use a vector longer than x there (it would need to have the same length as each day), and I don't know how to pass it as another argument to ave().

It seems this routine imposes an intraday stop-loss based on maximum drawdown. So I assume you want to be able to pass the variable limit as a second argument to your aggregation function, which currently takes only one input vector because of the way ave works.
If putting all this in one line is not an absolute must, I can share a function I've written that generalizes aggregation via "cut variables". Here's the code:
mtapplylist2 <- function(t, IDX, DEF, MoreArgs=NULL, ...)
{
if(mode(DEF) != "list")
{
cat("Definition must be list type\n");
return(NULL);
}
a <- c();
colnames <- names(DEF);
for ( i in 1:length(DEF) )
{
def <- DEF[[i]];
func <- def[1];
if(mode(func) == "character") { func <- get(func); }
cols <- def[-1];
# build the argument to be called
arglist <- list();
arglist[[1]] <- func;
for( j in 1:length(cols) )
{
col <- cols[j];
grp <- split(t[,col], IDX);
arglist[[1+j]] <- grp;
}
arglist[["MoreArgs"]] <- MoreArgs;
v <- do.call("mapply", arglist);
# print(class(v)); print(v);
if(is.matrix(v))  # class(v) can have length 2 ("matrix" "array") in R >= 4.0
{
a <- cbind(a, as.vector(v));
} else {
a <- cbind(a, v);
}
}
colnames(a) <- colnames;
return(a);
}
And you can use it like this:
# assuming you have the data in the data.frame
df <- data.frame(date=rep(1:10,10), ret=rnorm(100), limit=rep(c(0.25,0.50),50))
dfunc <- function(x, ...) { return(cummax(x)-x ) }
pfunc <- function(x,y, ...) { return((cummax(x)-x) < y) }
# assumes you have the function declared in the same namespace
def <- list(
"drawdown" = c("dfunc", "ret"),
"hasdrawdown" = c("pfunc", "ret", "limit")
);
# from R console
> def <- list("drawdown" = c("dfunc", "ret"),"happened" = c("pfunc","ret","limit"))
> dim( mtapplylist2(df, df$date, def) )
[1] 100 2
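Notice that each item in the "def" list is made up of the following parts:
the item name, which becomes the computed column name
the first element of the vector, which is the function name as a string
the remaining elements, which are the names of the columns in the input data.frame that are passed to the function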
If you look at the guts of "mtapplylist2" function, the key components would be "split" and "mapply". These functions are sufficiently fast (I think split is implemented in C).
This works with functions requiring multiple arguments, and also for functions returning vector of the same size or aggregated value.
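The split/mapply idea can be shown without the wrapper. The following is only a minimal sketch on made-up data (ret, limit and day are illustrative vectors, not the asker's zoo series): each input is split by day, mapply walks the pieces in parallel, and unsplit puts the results back in the original order.

```r
# Toy data: two "days" of four observations each (values are made up)
ret   <- c(1, 3, 2, 5, 4, 4, 2, 3)
limit <- c(0.5, 0.5, 0.5, 0.5, 1, 1, 3, 0.5)
day   <- rep(1:2, each = 4)

# Split both inputs by day and apply a two-argument function group by group
per_day <- mapply(function(x, y) cumprod((cummax(x) - x) < y),
                  split(ret, day), split(limit, day),
                  SIMPLIFY = FALSE)

# Reassemble the per-day results into one vector in the original order
flag <- unsplit(per_day, day)
print(flag)
```

Keeping SIMPLIFY = FALSE means mapply always returns a list, so unsplit works even when all groups happen to have the same length.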
Try it out and let me know if this solves your problem.
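If a one-liner with ave() is still preferred, one common workaround is to pass the row positions to ave() rather than the data, and index every input vector inside FUN. This is a sketch on made-up data, not the original zoo series; with zoo the values would come from coredata(mydata) and the grouping from as.Date(index(mydata)).

```r
# Toy data standing in for coredata(mydata), limit and the per-day grouping
vals  <- c(1, 3, 2, 5, 4, 4, 2, 3)
limit <- c(0.5, 0.5, 0.5, 0.5, 1, 1, 3, 0.5)
day   <- rep(1:2, each = 4)

# ave() groups the positions; FUN uses them to subset both vectors
one_line <- ave(seq_along(vals), day,
                FUN = function(i) cumprod((cummax(vals[i]) - vals[i]) < limit[i]))

# Two-step version from the question, for comparison
tmp   <- ave(vals, day, FUN = function(x) cummax(x) - x) < limit
final <- ave(tmp, day, FUN = function(x) cumprod(x))
print(all(one_line == final))
```

Because FUN only ever receives indices, any number of equally long vectors can be used inside it.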

Related

Name seurat function in r with name of each experiment/variable

I am using seurat to analyze some scRNAseq data, I have managed to put all the SCT integration one line codes from satijalab into a function with basically
SCT_normalization <- function (f1, f2) {
f_merge <- merge (f1, y=f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <<- PrepSCTIntegration(object.list = f.list, anchor.features = features)
return (f.list)
}
so that I will have f.list in the global environment for downstream analysis and making plots. The problem I am running into is that, every time I run the function, the output would be f.list, I want it to be specific to the input value name (i.e., f1 and/or f2). Basically something that I can set so that I would know which input value was used to generate the final output. I saw something using the assign function but someone wrote a warning about "the evil and wrong..." so I am not sure as to how to approach this.
From the sound of it, you don't need the superassignment operator <<-. In my opinion <<- should be avoided, because it can cause unexpected changes to objects outside the function. I assume this is what the other person was warning about. For example, suppose you have the following function:
AverageVector <- function(v) x <<- mean(v, na.rm = TRUE)
Now you're trying to find the average of a vector you have, along with more analysis
library(tidyverse)
x <- unique(iris$Species)
avg_sl <- AverageVector(iris$Sepal.Length)
Now, where x used to be a vector of the unique species, it is a numeric vector of length 1.
So I would remove the <<- and call your function like this
object_list_1_2 <- SCT_normalization(object1, object2)
If you want a slightly more programmatic way to keep track of the input objects, you could do something like this:
SCT_normalization <- function(f1, f2) {
f_merge <- merge (f1, y = f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <- PrepSCTIntegration(object.list = f.list, anchor.features = features)
to_return <- list(inputs = list(f1, f2), normalized = f.list)
return(to_return)
}
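To record the names of the inputs rather than the objects themselves, deparse(substitute()) can capture how each argument was written in the call. The sketch below uses a hypothetical stand-in function (combine_named) and plain vectors instead of Seurat objects, so it only illustrates the naming idea.

```r
# Hypothetical stand-in for SCT_normalization; c() replaces the real processing
combine_named <- function(f1, f2) {
  # Capture the expressions the caller used for f1 and f2
  input_names <- c(deparse(substitute(f1)), deparse(substitute(f2)))
  result <- c(f1, f2)
  list(inputs = input_names, normalized = result)
}

ctrl <- 1:3
stim <- 4:6
out <- combine_named(ctrl, stim)
print(out$inputs)

# The names can also label a list of results without using assign()
results <- list()
results[[paste(out$inputs, collapse = "_")]] <- out$normalized
print(names(results))
```

Storing results in a named list keeps everything in one place and avoids creating global variables from inside the function.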

Change data type of elements in a nested list

Is it possible to scan a list of lists for elements with a certain name and change their datatype but retain their value?
As an example, the following list containing elements 'N' of class 'character' or 'numeric'
x = list(list(N=as.character(1)),
list(a=1,b=2,c="another element",N=as.character(5)),
list(a=2,b=2,N=as.character(7),c=NULL),
list(a=2,b=2,list(N=as.character(3))))
should then become:
x = list(list(N=as.numeric(1)),
list(a=1,b=2,c="another element",N=as.numeric(5)),
list(a=2,b=2,N=as.numeric(7),c=NULL),
list(a=2,b=2,list(N=as.numeric(3))))
To be clear, the solution should allow for deeper nesting, and respect the data type of fields with names other than "N". I have not been able to find a general solution that works for lists with an arbitrary structure.
I have tried something along the lines of the solution given in this post:
a <- as.relistable(x)
u <- unlist(a)
u[names(u) == "N"] <- as.numeric(u[names(u) == "N"])
relist(u, a)
Unfortunately the substitution does not work in its current form. In addition, relist does not seem to work when the list contains NULL elements.
Use lapply to repeat the process over the list elements with a condition to check for your element of interest, so you don't inadvertently add elements to your sublists:
x <- lapply(x, function(i) {
if(length(i$N) > 0) {
i$N <- as.numeric(i$N)
}
return(i)
})
A solution that works only on a list of lists containing numbers or strings with numbers:
x <- list(list(N=as.character(1)),
list(a=1,b=2,N=as.character(5)),
list(a=2,b=2,N=as.character(7)),
list(a=2,b=2))
y1 <- lapply(x, function(y) lapply(y, as.numeric))
y2 <- list(list(N=as.numeric(1)),
list(a=1,b=2,N=as.numeric(5)),
list(a=2,b=2,N=as.numeric(7)),
list(a=2,b=2))
identical(y1,y2)
# [1] TRUE
EDIT. Here is more general code that works on nested lists of numbers and strings. It uses a recursive function as_num and the list.apply function of the rlist package.
library(rlist)
x = list(list(N=as.character(1)),
list(a=1,b=2,c="another element",N=as.character(5)),
list(a=2,b=2,N=as.character(7),c=NULL),
list(a=2,b=2,list(N=as.character(3))))
# Test if the string contains a number
is_num <- function(x) grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
# A recursive function for numeric conversion of strings containing numbers
as_num <- function(x) {
if (!is.null(x)) {
if (class(x)!="list") {
y <- x
if (is.character(x) & is_num(x)) y <- as.numeric(x)
} else {
y <- list.apply(x, as_num)
}
} else {
y <- x
}
return(y)
}
y <- list.apply(x, as_num)
z = list(list(N=as.numeric(1)),
list(a=1,b=2,c="another element",N=as.numeric(5)),
list(a=2,b=2,N=as.numeric(7),c=NULL),
list(a=2,b=2,list(N=as.numeric(3))))
identical(y,z)
# [1] TRUE
The answer provided by marco sandri can be further generalised to:
is_num <- function(x) grepl("^[-]?[0-9]+[.]?[0-9]*|^[-]?[0-9]+[L]?|^[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
as_num <- function(x) {
if (is.null(x)||length(x) == 0) return(x)
if (class(x)=="list") return(lapply(x, as_num))
if (is.character(x) & is_num(x)) return(as.numeric(x))
return(x)
}
y <- as_num(x)
identical(y,z)
This solution also allows list elements to contain numeric(0) and mixed strings such as 'data2005'.
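The answers above convert every number-like string. If only elements with a given name should change, as the question asks, a small base R recursion is enough. This is a sketch, and convert_named is a hypothetical helper name, not a function from any package.

```r
# Recursively apply `fun` to every element called `name`, at any depth
convert_named <- function(x, name = "N", fun = as.numeric) {
  if (!is.list(x)) return(x)
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  for (i in seq_along(x)) {
    if (is.null(x[[i]])) next              # leave NULL elements in place
    if (is.list(x[[i]])) {
      x[[i]] <- convert_named(x[[i]], name, fun)
    } else if (identical(nms[i], name)) {
      x[[i]] <- fun(x[[i]])
    }
  }
  x
}

x <- list(list(N = as.character(1)),
          list(a = 1, b = 2, c = "another element", N = as.character(5)),
          list(a = 2, b = 2, N = as.character(7), c = NULL),
          list(a = 2, b = 2, list(N = as.character(3))))
expected <- list(list(N = 1),
                 list(a = 1, b = 2, c = "another element", N = 5),
                 list(a = 2, b = 2, N = 7, c = NULL),
                 list(a = 2, b = 2, list(N = 3)))
print(identical(convert_named(x), expected))
```

Skipping NULL elements matters because assigning NULL with [[<- would delete the element instead of keeping it.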

*apply in r to repeat a function

I've written a function that is a simulation, that outputs a vector of 100 elements, and I want to use the *apply functions to run the function many times and store the repeated output in a new vector for each time the simulation is run.
The function looks like:
J <- c(1:100)
species_richness <- function(J){
a <- table(J)
return(NROW(a))
}
simulation <- function(J,gens,ploton=FALSE,v=0.1){
species_richness_output <- rep(NA,gens)
for(rep in 1:gens){
index1 <- sample(1:length(J),1)
if(runif(1,0,1) < v){
J[index1] <- (rep+100)
}
else{
index2 <- sample(1:length(J),1)
while(index1==index2) {
index2 <- sample(1:length(J),1)
}
J[index1] <- J[index2]
}
species_richness_output[rep] <- species_richness(J)
}
species_abundance <- function(J){
a <- table(J)
return(a)
}
abuntable <- species_abundance(J)
print(abuntable)
octaves <- function(abuntable){
oct <- (rep(0,log2(sum(abuntable))))
for(i in 1:length(abuntable)){
oct2 <- floor(log2(abuntable[i])+1)
oct[oct2] <- oct[oct2]+1
}
print(oct)
}
# octaves(c(100,64,63,5,4,3,2,2,1,1,1,1))
if(ploton==TRUE){
hist(octaves(abuntable))
}
print(species_richness(J))
return(J)
}
simulation(J, 10000,TRUE,v=0.1)
So that's my function, it takes J a vector I defined earlier, manipulates it, then returns:
the newly simulated vector J of 100 elements
a function called octave that categorises the new vector
a histogram corresponding to the above "octave"
I have tried a number of variations: using lapply, mapply
putting args=args_from_original_simulation
simulation_repeated <- c(mapply(list, FUN=simulation(args),times=10000))
but I keep getting an error with the match.fun part of the mapply function
Error in match.fun(FUN) :
'simulation(J, 10000, FALSE, 0.1)' is not a function, character or symbol
This is despite the simulation I have written showing as being saved as a function in the workspace.
Does anyone know what this error is pointing to?
In this line:
simulation_repeated <- c(mapply(list, FUN=simulation(args),times=10000))
You are not giving a function to mapply. You are (essentially) passing the result of calling simulation(args) and simulation does not return a function.
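One way to repeat a simulation is replicate(), or lapply() with an anonymous function, so that a function (not its result) is what gets called each time. The sketch below uses a hypothetical, much simplified toy_simulation rather than the function from the question.

```r
# Simplified stand-in for the question's simulation(): returns the updated J
toy_simulation <- function(J, gens) {
  for (g in seq_len(gens)) {
    idx <- sample(length(J), 2)   # two distinct positions
    J[idx[1]] <- J[idx[2]]        # one individual replaces another
  }
  J
}

J <- 1:100

# replicate() re-evaluates the expression each time; here the result
# simplifies to a matrix with one column per run
runs <- replicate(5, toy_simulation(J, gens = 50))
print(dim(runs))

# Equivalent with lapply(): the anonymous function is what gets passed
runs_list <- lapply(seq_len(5), function(i) toy_simulation(J, gens = 50))
print(length(runs_list))
```

Note the difference from the failing call: here toy_simulation(...) appears inside an expression or an anonymous function, so it is evaluated once per repetition.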

Writing functions to handle multiple data types in R/Splus?

I would like to write a function that handles multiple data types. Below is an example that works but seems clunky. Is there a standard (or better) way of doing this?
(It's times like this I miss Matlab where everything is one type :>)
myfunc = function(x) {
# does some stuff to x and returns a value
# at some point the function will need to find out the number of elements
# at some point the function will need to access an element of x.
#
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
x.vec <- as.vector(as.matrix(as.data.frame(x)))
n <- length(x.vec)
ret <- x.vec[n/3] # this line only for concreteness
return(ret)
}
Use S3 methods. A quick example to get you started:
myfunc <- function(x) {
UseMethod("myfunc",x)
}
myfunc.data.frame <- function(x) {
x.vec <- as.vector(as.matrix(x))
myfunc(x.vec)
}
myfunc.numeric <- function(x) {
n <- length(x)
ret <- x[n/3]
return(ret)
}
myfunc.default <- function(x) {
stop("myfunc not defined for class ", paste(class(x), collapse = "/"))
}
Two notes:
The ... syntax passes any additional arguments on to functions. If you're extending an existing S3 method (e.g. writing something like summary.myobject), then including the ... is a good idea, because you can pass along arguments conventionally given to the canonical function.
print.myclass <- function(x,...) {
print(x$keyData,...)
}
You can call functions from other functions and keep things nice and parsimonious.
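As a small illustration of the first note, here is a sketch of a print method that forwards ... to the underlying print call. The class name "reading" and the constructor new_reading are made up for this example.

```r
# Hypothetical constructor for a tiny S3 class
new_reading <- function(values, label) {
  structure(list(keyData = values, label = label), class = "reading")
}

# The method forwards ... so callers can use print()'s usual arguments
print.reading <- function(x, ...) {
  cat("Reading:", x$label, "\n")
  print(x$keyData, ...)
  invisible(x)
}

r <- new_reading(c(1.23456, 2.34567), "demo")
print(r, digits = 3)   # digits is passed along to print() for the numeric vector
out <- capture.output(print(r, digits = 3))
```

Without the ... in the method signature, the call print(r, digits = 3) would fail with an unused argument error.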
Hmm, your documentation for the function is
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
and if one supplies an object as you claim is required, isn't it already a vector and not a matrix or a data frame, hence obviating the need for separate methods or specific handling?
> dat <- data.frame(A = 1:10, B = runif(10))
> class(dat[,1])
[1] "integer"
> is.vector(dat[,1])
[1] TRUE
> is.vector(dat$A)
[1] TRUE
> is.numeric(dat$A)
[1] TRUE
> is.data.frame(dat$A)
[1] FALSE
I would:
myfunc <- function(x) {
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
n <- length(x)
ret <- x[n/3] # this line only for concreteness
return(ret)
}
> myfunc(dat[,1])
[1] 3
Now, if you want to handle different types of objects and extract a column, then S3 methods would be the way to go. Perhaps your example is oversimplified compared with actual use? Anyway, the S3 methods would be something like:
myfunc <- function(x, ...)
UseMethod("myfunc", x)
myfunc.matrix <- function(x, j = 1, ...) {
x <- x[, j]
myfunc.default(x, ...)
}
myfunc.data.frame <- function(x, j = 1, ...) {
x <- data.matrix(x)
myfunc.matrix(x, j, ...)
}
myfunc.default <- function(x, ...) {
n <- length(x)
x[n/3]
}
Giving:
> myfunc(dat)
[1] 3
> myfunc(data.matrix(dat))
[1] 3
> myfunc(data.matrix(dat), j = 2)
[1] 0.2789631
> myfunc(dat[,2])
[1] 0.2789631
You probably should try to use an S3 method for writing a function that will handle multiple datatypes.
A good reference is here: http://www.biostat.jhsph.edu/~rpeng/biostat776/classes-methods.pdf

Passing arguments to iterated function through apply

I have a function like this dummy-one:
FUN <- function(x, parameter){
if (parameter == 1){
z <- DO SOMETHING WITH "x"}
if (parameter ==2){
z <- DO OTHER STUFF WITH "x"}
return(z)
}
Now, I would like to use the function on a dataset using apply.
The problem is that apply(data,1,FUN(parameter=1))
won't work, as FUN doesn't know what "x" is.
Is there a way to tell apply to call FUN with "x" as the current row/col?
You want apply(data,1,FUN,parameter=1). Note the ... in the function definition:
> args(apply)
function (X, MARGIN, FUN, ...)
NULL
and the corresponding entry in the documentation:
...: optional arguments to ‘FUN’.
You can make an anonymous function within the call to apply so that FUN will know what "x" is:
apply(data, 1, function(x) FUN(x, parameter = 1))
See ?apply for examples at the bottom that use this method.
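Both approaches can be checked side by side. This is a minimal sketch in which scale_row is a made-up stand-in for the dummy FUN in the question.

```r
# Made-up stand-in for the question's FUN
scale_row <- function(x, parameter) {
  if (parameter == 1) x * 2 else x + 100
}

data <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

# Extra arguments after FUN are forwarded to it through ...
a <- apply(data, 1, scale_row, parameter = 1)

# Same thing with an anonymous wrapper
b <- apply(data, 1, function(x) scale_row(x, parameter = 1))

print(identical(a, b))
print(a)   # one column per input row
```

Note that apply() over rows returns each row's result as a column, so the output here is a 3 x 2 matrix.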
Here's a practical example of passing arguments using the ... object and *apply. An important point to remember is that anything you send through ... should be a named argument, so that R knows where to put it. For example, I could have called times <- fperform(longfunction, 10, noise = 5000), but leaving off noise = would not have worked as intended: the bare 5000 travels through ... to runtime(FUN, ...), where it is matched by position to display instead of reaching longfunction as noise. My personal style is to name all of the arguments whenever a ... is used, just to be safe.
You can see that the argument noise is being defined in the call to fperform(FUN = longfunction, ntimes = 10, noise = 5000) but isn't being used for another 2 levels with the call to diff <- rbind(c(x, runtime(FUN, ...))) and ultimately fun <- FUN(...)
# Made this to take up time
longfunction <- function(noise = 2500, ...) {
lapply(seq(noise), function(x) {
z <- noise * runif(x)
})
}
# Takes a function and clocks the runtime
runtime <- function(FUN, display = TRUE, ...) {
before <- Sys.time()
fun <- FUN(...)
after <- Sys.time()
if (isTRUE(display)) {
print(after-before)
}
else {
after-before
}
}
# Vectorizes runtime() to allow for multiple tests
fperform <- function(FUN, ntimes = 10, ...) {
out <- sapply(seq(ntimes), function(x) {
diff <- rbind(c(x, runtime(FUN, ...)))
})
}
times <- fperform(FUN = longfunction, ntimes = 10, noise = 5000)
avgtime <- mean(times[2,])
print(paste("Average Time difference of ", avgtime, " secs", sep=""))
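The effect of naming (or not naming) an argument that travels through ... can be seen in a stripped-down sketch. The functions inner, middle and outer are hypothetical and only mirror the shape of longfunction, runtime and fperform.

```r
inner  <- function(noise = 1) noise * 2                 # like longfunction
middle <- function(FUN, display = TRUE, ...) FUN(...)   # like runtime
outer  <- function(FUN, ntimes = 1, ...) middle(FUN, ...)  # like fperform

# Named: noise passes through both levels of ... and reaches inner()
named <- outer(inner, 1, noise = 5)

# Unnamed: 5 is matched by position to `display` in middle(),
# so inner() is called with no arguments and uses its default
unnamed <- outer(inner, 1, 5)

print(c(named = named, unnamed = unnamed))
```

In this sketch the unnamed call does not raise an error; it quietly returns the default result, which is arguably the harder problem to notice.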
