R - How to get a difference/sum of 2 step functions? - r

I have 2 step function objects (ecdf objects to be exact). How to calculate a step function that is a difference or sum of these two?

I just had the same question and found the following nice solution:
y1 <- c(0, 1, 2, 0)
x1 <- c(1, 2, 3)
f1 <- stepfun(x = x1, y = y1)
par(mfrow = c(2, 2))
plot(f1)
y2 <- c(0, 1, 0)
x2 <- c(1.5, 2.5)
f2 <- stepfun(x = x2, y = y2)
plot(f2)
fs <- function(x, f1, f2) {
return(f1(x) + f2(x))
}
fm <- function(x, f1, f2) {
return(f1(x) * f2(x))
}
x <- seq(0, 4, length.out = 100)
plot(x, fs(x, f1, f2), type = "s", main = "Sum f1+f2")
plot(x, fm(x, f1, f2), type = "s", main = "Multiplication f1*f2")
par(mfrow = c(1, 1))
There might be a more elegant version that overloads the + and * operators for the class. For example, for a toy class foo:
foo <- structure(list(value = 1, txt = 'a'), class = 'foo')
`+.foo` <- function(leftfoo, rightfoo) { return (paste(leftfoo$txt, rightfoo$txt)) }
foo + foo
#[1] "a a"

Depends on what you need it for. An object of class stepfun is in one sense a function; if a <- ecdf(rnorm(100)), then a(0) will evaluate to something close to .5. So you can add them just by adding functions: ec.sum <- function(x) { ec1(x) + ec2(x) }. This yields something that is effectively a step function, but not of class stepfun or ecdf.
Regardless, what you get out will not be an ecdf object, because the values will not have the correct range. But to at least recover it as a step function, you can decompose it into knots:
knots.new <- sort(unique(c(knots(ec1), knots(ec2))))
ec.new <- stepfun(knots.new, c(0, ec1(knots.new) + ec2(knots.new)))
The c(0, ... is because you need one more value than the knots (for the left-hand value of the step function), and for objects of type ecdf 0 is a safe value.
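The same knot-based idea covers a difference as well as a sum. A minimal sketch, assuming two illustrative samples (the data and the name ec.diff are made up for this example):

```r
set.seed(1)
ec1 <- ecdf(rnorm(50))             # illustrative samples, not from the question
ec2 <- ecdf(rnorm(80, mean = 1))

# union of both knot sets, sorted and without duplicates
knots.new <- sort(unique(c(knots(ec1), knots(ec2))))

# left-hand value is 0 - 0 = 0, followed by the difference at each knot
ec.diff <- stepfun(knots.new, c(0, ec1(knots.new) - ec2(knots.new)))

class(ec.diff)   # a stepfun, but no longer an ecdf
```

Because both ecdf objects are right-continuous and only jump at their own knots, evaluating at the combined knots is enough to reproduce ec1(x) - ec2(x) everywhere.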

Related

Avoiding duplication in R

I am trying to fit a variety of (truncated) probability distributions to the same very thin set of quantiles. I can do it but it seems to require lots of duplication of the same code. Is there a neater way?
I am using this code by Nadarajah and Kotz to compute the quantiles of the truncated distributions:
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...)
{
tt <- p
G <- get(paste("p", spec, sep = ""), mode = "function")
Gin <- get(paste("q", spec, sep = ""), mode = "function")
tt <- Gin(G(a, ...) + p*(G(b, ...) - G(a, ...)), ...)
return(tt)
}
where spec can be the name of any untruncated distribution for which code in R exists, and the ... argument is used to provide the names of the parameters of that untruncated distribution.
To achieve the best fit I need to measure the distance between the given quantiles and those calculated using arbitrary values of the parameters of the distribution. In the case of the gamma distribution, for example, the code is as follows:
spec <- "gamma"
fit_gamma <- function(x, l = 0, h = 20, t1 = 5, t2 = 13){
ct1 <- qtrunc(p = 1/3, spec, a = l, b = h, shape = x[1],rate = x[2])
ct2 <- qtrunc(p = 2/3, spec, a = l, b = h, shape = x[1],rate = x[2])
dist <- vector(mode = "numeric", length = 2)
dist[1] <- (t1 - ct1)^2
dist[2] <- (t2- ct2)^2
return(sqrt(sum(dist)))
}
where l is the lower truncation, h is the higher and I am given the two tertiles t1 and t2.
Finally, I seek the best fit using optim, thus:
gamma_fit <- optim(par = c(2, 4),
fn = fit_gamma,
l = l,
h = h,
t1 = t1,
t2 = t2,
method = "L-BFGS-B",
lower = c(1.01, 1.4))
Now suppose I want to do the same thing but fitting a normal distribution instead. The names of the parameters of the normal distribution that I am using in R are mean and sd.
I can achieve what I want but only by writing a whole new function fit_normal that is extremely similar to my fit_gamma function but with the new parameter names used in the definition of ct1 and ct2.
The problem of duplication of code becomes very severe because I wish to try fitting a large number of different distributions to my data.
What I want to know is whether there is a way of writing a generic fit_spec as it were so that the parameter names do not have to be written out by me.
Use x as a named list to create a list of arguments to pass into qtrunc() using do.call().
fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13){
args <- c(x, list(spec = spec, a = l, b = h))
ct1 <- do.call(qtrunc, args = c(list(p = 1/3), args))
ct2 <- do.call(qtrunc, args = c(list(p = 2/3), args))
dist <- vector(mode = "numeric", length = 2)
dist[1] <- (t1 - ct1)^2
dist[2] <- (t2 - ct2)^2
return(sqrt(sum(dist)))
}
It is called as follows, and gives the same result as your original function.
fit_distro(list(shape = 2, rate = 3), "gamma")
# [1] 13.07425
fit_gamma(c(2, 3))
# [1] 13.07425
This will work with other distributions, for however many parameters they have.
fit_distro(list(mean = 10, sd = 3), "norm")
# [1] 4.08379
fit_distro(list(shape1 = 2, shape2 = 3, ncp = 10), "beta")
# [1] 12.98371
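To plug the generic function into optim, one option is to pass a named numeric vector as par and convert it with as.list(). This is a sketch under assumptions: it relies on optim handing the names of par through to fn, and the start values and bounds in specs are illustrative, not taken from the question.

```r
# qtrunc and fit_distro as above, repeated so the sketch runs on its own
qtrunc <- function(p, spec, a = -Inf, b = Inf, ...) {
  G   <- get(paste0("p", spec), mode = "function")
  Gin <- get(paste0("q", spec), mode = "function")
  Gin(G(a, ...) + p * (G(b, ...) - G(a, ...)), ...)
}

fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13) {
  # as.list() accepts either a named list or a named numeric vector
  args <- c(as.list(x), list(spec = spec, a = l, b = h))
  ct1 <- do.call(qtrunc, c(list(p = 1/3), args))
  ct2 <- do.call(qtrunc, c(list(p = 2/3), args))
  sqrt((t1 - ct1)^2 + (t2 - ct2)^2)
}

# one entry per distribution: start values and box constraints (illustrative)
specs <- list(
  gamma = list(par = c(shape = 2, rate = 0.5), lower = c(1.01, 0.05), upper = c(20, 5)),
  norm  = list(par = c(mean = 10, sd = 3),     lower = c(0, 0.1),     upper = c(20, 50))
)

fits <- lapply(names(specs), function(s) {
  optim(par = specs[[s]]$par, fn = fit_distro, spec = s,
        method = "L-BFGS-B",
        lower = specs[[s]]$lower, upper = specs[[s]]$upper)
})
names(fits) <- names(specs)

sapply(fits, function(f) f$value)   # distance achieved by each distribution
```

Adding another distribution then only means adding one entry to specs.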

How can I make the sum of two step functions (R-stepfun) of class "stepfun"?

From the example here I tried to make the sum as class "stepfun". I thought, as.stepfun is the right choice, but my ideas don't work. What is wrong?
y1 <- c(0, 1, 2, 0)
x1 <- c(1, 2, 3)
f1 <- stepfun(x = x1, y = y1)
print(class(f1))
# [1] "stepfun" "function" # OK!!!
plot(f1)
y2 <- c(0, 1, 0)
x2 <- c(1.5, 2.5)
f2 <- stepfun(x = x2, y = y2)
plot(f2)
fs <- function(x, f1, f2) {
# y <- f1(x) + f2(x) # OK
# y <- as.stepfun(x = x, y = y, ties = "ordered", right = FALSE) # does not work
# return(y) # does not work
return(f1(x) + f2(x))
}
print(class(fs)) # [1] "function"
# attributes(fs) # no new information...
fm <- function(x, f1, f2) {
return(f1(x) * f2(x))
}
print(class(fm)) # [1] "function"
Example of an as.* conversion (here as.data.frame) which works as expected:
z <- c(1, 2)
class(z) # [1] "numeric"
class(as.data.frame(z)) # [1] "data.frame"
About internals of stepfun
function (x, y, f = as.numeric(right), ties = "ordered", right = FALSE)
{
if (is.unsorted(x))
stop("stepfun: 'x' must be ordered increasingly")
n <- length(x)
if (n < 1)
stop("'x' must have length >= 1")
n1 <- n + 1L
if (length(y) != n1)
stop("'y' must be one longer than 'x'")
rval <- approxfun(x, y[-if (right)
n1
else 1], method = "constant", yleft = y[1L], yright = y[n1],
f = f, ties = ties)
class(rval) <- c("stepfun", class(rval))
attr(rval, "call") <- sys.call()
rval
}
Thanks to the answers from @jblood94, @user2554330 and @rbm here I found an elegant way which I plan to use in my case. I hope that also helps others:
par(mfrow = c(2, 2))
y1 <- c(0, 1, 2, 0)
x1 <- c(1, 2, 3)
f1 <- stepfun(x = x1, y = y1)
y2 <- c(0, 1, 0)
x2 <- c(1.5, 2.5)
f2 <- stepfun(x = x2, y = y2)
plot(f1)
plot(f2)
'+.stepfun' <- function(f1, f2) {
xs1 <- get("x", envir = environment(f1))
xs2 <- get("x", envir = environment(f2))
xs <- sort(unique(c(xs1, xs2)))
ys <- f1(c(xs[1] - 1, xs)) + f2(c(xs[1] - 1, xs))
return(stepfun(x = xs, y = ys))
}
f1 + f2
print(class(f1 + f2))
plot(f1 + f2, main = "Sum f1+f2")
'*.stepfun' <- function(f1, f2) {
xs1 <- get("x", envir = environment(f1))
xs2 <- get("x", envir = environment(f2))
xs <- sort(unique(c(xs1, xs2)))
ys <- f1(c(xs[1] - 1, xs)) * f2(c(xs[1] - 1, xs))
return(stepfun(x = xs, y = ys))
}
f1 * f2
print(class(f1 * f2))
plot(f1 * f2, main = "Product f1*f2")
par(mfrow = c(1, 1))
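Since the original question also asks for a difference, one way to avoid writing a separate method per operator is an S3 group generic. The following is a sketch, not a tested extension of the stats package: it assumes both operands are stepfun objects built with the default right = FALSE, and it deliberately supports only binary +, - and *.

```r
# One method for the Ops group covers +, - and * between two stepfun objects.
Ops.stepfun <- function(e1, e2) {
  if (missing(e2) || !.Generic %in% c("+", "-", "*"))
    stop("only binary +, - and * are supported in this sketch")
  op <- match.fun(.Generic)
  xs <- sort(unique(c(knots(e1), knots(e2))))
  probe <- c(xs[1] - 1, xs)   # one point left of all knots, then every knot
  stepfun(x = xs, y = op(e1(probe), e2(probe)))
}

f1 <- stepfun(x = c(1, 2, 3), y = c(0, 1, 2, 0))
f2 <- stepfun(x = c(1.5, 2.5), y = c(0, 1, 0))

d <- f1 - f2
class(d)   # still a stepfun
```

Here knots() is used instead of get("x", envir = environment(f)); for stepfun objects both return the stored jump locations.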
It looks like user2554330 is right about creating stepfun objects directly from two other stepfun objects, but here is a workaround in case it's useful:
x12 <- sort(unique(c(x1, x2)))
y12 <- f1(c(x12[1] - 1, x12)) + f2(c(x12[1] - 1, x12))
fs <- stepfun(x = x12, y = y12)
UPDATE1:
Or if you want to create the function from f1 and f2:
fs <- function(f1, f2) {
xs <- sort(unique(c(get("x", envir = environment(f1)), get("x", envir = environment(f2)))))
ys <- f1(c(xs[1] - 1, xs)) + f2(c(xs[1] - 1, xs))
return(stepfun(x = xs, y = ys))
}
plot(fs(f1, f2))
UPDATE2:
Or for an arbitrarily long list of stepfun objects and the ability to specify the combining function:
y3 <- c(-1, 2, 1, 0)
x3 <- c(0.5, 1, 3)
f3 <- stepfun(x = x3, y = y3)
fs <- function(lf, fun) {
xs <- sort(unique(unlist(lapply(lf, function(x) get("x", envir = environment(x))))))
ys <- apply(mapply(function(f) f(c(xs[1] - 1, xs)), lf), 1, FUN = fun)
return(stepfun(x = xs, y = ys))
}
plot(fs(list(f1, f2, f3), "sum"))
plot(fs(list(f1, f2, f3), "prod"))
The fs function needs the "stepfun" class, not the results it returns. But your fs object won't work as a "stepfun" object, because R makes assumptions about those: they need to keep copies of the data that produced them, among other things. You can see what f1 keeps by looking at ls(environment(f1)). I don't know how those objects are used, but presumably they are needed.
Edited to add:
To turn fs into a "stepfun" object, you could try as.stepfun(fs). But this fails, with the error message
Error in as.stepfun.default(fs) :
no 'as.stepfun' method available for 'x'
The error message isn't the best (x is the internal name of the first argument, and in my opinion it would make more sense to say there's no method available for fs), but it's saying R doesn't know how to do the conversion you want.
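To see what a stepfun object carries around, and why a plain function cannot be converted, the stored data can be inspected directly. A small sketch (the exact set of names returned by ls() may differ between R versions, so no output is shown):

```r
f1 <- stepfun(x = c(1, 2, 3), y = c(0, 1, 2, 0))

ls(environment(f1))                  # objects stored alongside the function
knots(f1)                            # the jump locations, i.e. the stored x
get("x", envir = environment(f1))    # the same values, read directly

methods(as.stepfun)                  # the classes as.stepfun() can convert

# a plain closure has no such stored data, so the conversion fails
plain <- function(x) x
res <- try(as.stepfun(plain), silent = TRUE)
inherits(res, "try-error")
```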

How to automatically set-up and add functions to a model in R?

I am setting up a model, and I am trying to reduce the amount of writing I have to do.
Concretely, I am using the coala R-package to do coalescent simulations, and I am trying to easily implement a stepping-stone migration model.
A reproducible example: 4 linearly distributed populations exchange migrants according to stepping-stone pattern (only the adjacent populations).
model <- coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1) +
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4) +
feat_migration(mig_rate, 1, 2) + # mig_rate can be e.g. 0.5
feat_migration(mig_rate, 2, 1) +
feat_migration(mig_rate, 2, 3) +
feat_migration(mig_rate, 3, 2) +
feat_migration(mig_rate, 3, 4) +
feat_migration(mig_rate, 4, 3) +
sumstat_dna(name = "dna", transformation = identity)
This example works, but the downside is that I have to write many 'feat_migration' lines, although there is a clear pattern that could be automated. It is fine for a small number of populations, but I want to do a large simulation with about 70 populations. Does someone have a good idea how to automate this? The documentation has not helped me so far.
I tried two things that didn't work:
feat_migration(mig_rate, c(1,2,2,3,3,4), c(2,1,3,2,4,3))
and something like this:
migration_model <- function(){
for(i in 1:n_pops){
feat_migration(mig_rate, i, i+1) +
feat_migration(mig_rate, i+1, i)
}
}
In the latter case, I don't really know how to create all the features and add them to my model correctly.
Good ideas are very welcome! :)
Consider the higher-order functions Map (a wrapper for mapply) and Reduce to build a list of function calls and add them iteratively to the model. Specifically, Reduce helps when you need to accumulate: the result of each iteration is passed into the next one until a single final result remains.
n_pops <- 4
start_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x, x+1)))
start_pts
# [1] 1 2 2 3 3 4
end_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x+1, x)))
end_pts
# [1] 2 1 3 2 4 3
# LIST OF feat_migration()
feats <- Map(function(x, y) feat_migration(mig_rate, x, y), start_pts, end_pts)
# LIST OF FUNCTIONS
funcs <- c(coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1),
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4),
feats,
sumstat_dna(name = "dna", transformation = identity)
)
# MODEL CALL
model <- Reduce(`+`, funcs)
As an aside, the functional form for ggplot + calls is Reduce:
gp <- ggplot(df) + aes_string(x='Time', y='Data') +
geom_point() + scale_x_datetime(limits=date_range)
# EQUIVALENTLY
gp <- Reduce(ggplot2:::`+.gg`, list(ggplot(df), aes_string(x='Time', y='Data'),
geom_point(), scale_x_datetime(limits=date_range)))
The answer is a slight edit of the solution proposed by Parfait. The model initializes without errors, and can be run in the simulator without errors.
n_pops <- 4
start_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x, x+1)))
end_pts <- as.vector(sapply(seq(n_pops-1), function(x) c(x+1, x)))
# LIST OF feat_migration()
feats <- Map(function(x, y) feat_migration(mig_rate, x, y), start_pts, end_pts)
# LIST OF FUNCTIONS
funcs <- c(list(coal_model(sample_size = c(5, 5, 5, 5),
loci_number = 1,
loci_length = 10,
ploidy = 1),
feat_mutation(rate = mut_rate, # e.g. 0.1
model = "HKY",
base_frequencies = c(0.25,0.25,0.25,0.25),
tstv_ratio = 4),
sumstat_dna(name = "dna", transformation = identity)),
feats)
# MODEL CALL
model <- Reduce(`+`, funcs)
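Why the list() wrapper matters can be shown without coala. The sketch below uses hypothetical stand-ins (toy_model, toy_feature and the `+.toy_model` method are invented for illustration and are not part of coala) to mimic the model + feature idiom:

```r
# Hypothetical stand-ins for a model object and a feature object
toy_model <- function() {
  structure(list(features = character(0)), class = "toy_model")
}
toy_feature <- function(from, to) {
  structure(list(name = paste0(from, "->", to)), class = "toy_feature")
}

# "+" appends a feature to a model
`+.toy_model` <- function(e1, e2) {
  e1$features <- c(e1$features, e2$name)
  e1
}

n_pops <- 4
from <- as.vector(rbind(seq(n_pops - 1), seq(n_pops - 1) + 1))  # 1 2 2 3 3 4
to   <- as.vector(rbind(seq(n_pops - 1) + 1, seq(n_pops - 1)))  # 2 1 3 2 4 3
feats <- Map(toy_feature, from, to)

# list() keeps the model as one element; a bare c(toy_model(), feats)
# would splice the model's own fields into the list instead
parts <- c(list(toy_model()), feats)
model <- Reduce(`+`, parts)
model$features
```

Changing n_pops to 70 generates all 138 migration features without any extra lines.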

Vectorized R function to produce sets of histograms

I have a vectorized R function (see below). On each run, the function plots two histograms. My goal is that when the argument n is a vector (see the example of use below), the function plots length(n) separate sets of these histograms (e.g., if n is a vector of length 2, I expect two sets of histograms, i.e., 4 individual histograms).
I have tried the following with no success. Is there a way to do this?
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
par(mfcol = c(2, length(n)))
hist(p) ; hist(d)
}, "n")
# Example of use:
t.sim(n = c(30, 300), es = .1, n.sim = 1e3) # `n` is a vector of `2` so I expect
# 4 histograms in my graphical device
Vectorize is built on mapply, which calls the function once for each element of the input vector. Inside each call n therefore has length 1, so par(mfcol = c(2, length(n))) always sets up a 2 x 1 layout. The easier way out is probably to call par() outside the function:
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
# par(mfcol = c(2, npar))
hist(p) ; hist(d)
}, "n")
#inputs
data <- c(30,300)
par(mfcol = c(2, length(data)))
t.sim(n = data, es = c(.1), n.sim = 1e3)
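An alternative is to separate simulation from plotting, so the layout is set once for all values of n. A sketch, where the function names sim_one and plot_sims are made up for this example and n.sim is reduced to keep it quick:

```r
# simulate n.sim effect sizes and p-values for one sample size n
sim_one <- function(n, es, n.sim) {
  N <- sqrt(n / 2)   # same as sqrt(n^2 / (2 * n)) in the question
  out <- replicate(n.sim, {
    a <- t.test(rnorm(n, es, 1), rnorm(n, 0, 1), var.equal = TRUE)
    c(d = unname(a$statistic) / N, p = a$p.value)
  })
  list(d = out["d", ], p = out["p", ])
}

# one column of two histograms per simulated sample size
plot_sims <- function(sims) {
  old <- par(mfcol = c(2, length(sims)))
  on.exit(par(old))
  for (nm in names(sims)) {
    hist(sims[[nm]]$p, main = nm, xlab = "p")
    hist(sims[[nm]]$d, main = nm, xlab = "d")
  }
  invisible(sims)
}

ns <- c(30, 300)
sims <- lapply(ns, sim_one, es = 0.1, n.sim = 200)
names(sims) <- paste("n =", ns)
plot_sims(sims)
```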

label ylab in timeSeries::plot, type = 'o'

How do I label the y-axis, using timeSeries::plot, with Greek letters, i.e. change SB, SP, etc. to \alpha, \beta, etc.? I am aware I need expression() in some way. However, I can't even get to the labels (I normally use ggplot2). Code below.
# install.packages("xtable", dependencies = TRUE)
library("timeSeries")
## Load Swiss Pension Fund Benchmark Data -
LPP <- LPP2005REC[1:12, 1:4]
colnames(LPP) <- abbreviate(colnames(LPP), 2)
finCenter(LPP) <- "GMT"
timeSeries::plot(LPP, type = "o")
It has been pointed out that the object structure, obtained with str(), is quite particular in LPP compared to, say, this object z:
z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
plot(z)
If anyone has an answer to either or both, I would appreciate it. I realize I can convert the data and plot it with ggplot2 (I have seen that here on SO), but I am interested in doing it directly on the timeSeries object LPP and the stats time-series object z.
[ REVISION & Edited ]
When plot.type is "multiple", we can't define ylab directly. Both plot(ts.obj) (S3 method) and plot(timeSeries.obj) (S4 method) take colnames(obj) as ylab, and I don't know of any way to use Greek letters as column names. (The difference in structure mainly comes from the difference between S3 and S4; colnames(timeSeries.obj) is equivalent to timeSeries.obj@units; the defaults are Series i and TS.i.)
We can set the y label ourselves through the panel argument (it takes a function; the default is lines). It is called inside for(i in 1:ncol(data)). I couldn't find a way to pass a suitable "i" to the panel function (I guess there is one, but I couldn't think of it), so I recover "i" by checking which column the data matches.
for timeSeries
ylabs <- expression(alpha, beta, gamma, delta)
row1 <- LPP[1,]
timeSeries.panel.f <- function(x, y, ...) {
lines(x, y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(LPP, panel = timeSeries.panel.f, type = "o", ann = F)
title("Title")
mtext("Time", 1, line = 3)
## If you aren't too concerned about warnings, here is a more general version.
## (Many functions read `...` and they return warnings.)
timeSeries.panel.f2 <- function(x, y, ..., ylabs = ylabs, row1 = row1) {
lines(x, y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(LPP, panel = timeSeries.panel.f2, type = "o", ann = F,
ylabs = expression(alpha, beta, gamma, delta), row1 = LPP[1,])
title("Title")
mtext("Time", 1, line = 3)
for ts
ylabs <- expression(alpha, beta, gamma)
row1 <- z[1,]
ts.panel.f <- function(y, ...) {
lines(y, ...)
mtext(ylabs[which(row1 %in% y[1])], 2, line = 3)
}
plot(z, panel = ts.panel.f, ann = F)
title("Title")
mtext("Time", 1, line = 3)
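Matching on the first value breaks if two series happen to start with the same number. One possible alternative for the ts case is a panel function that counts its own calls. This is a sketch that assumes plot.ts calls the panel function exactly once per series, in column order; make_panel is a made-up helper name.

```r
# returns a panel function that labels the i-th panel with labels[i]
make_panel <- function(labels) {
  i <- 0
  function(y, ...) {
    i <<- i + 1                          # advance to the next label
    lines(y, ...)
    mtext(labels[i], side = 2, line = 3)
  }
}

z <- ts(matrix(rnorm(300), 100, 3), start = c(1961, 1), frequency = 12)
plot(z, panel = make_panel(expression(alpha, beta, gamma)), ann = FALSE)
title("Title")
mtext("Time", 1, line = 3)
```

A fresh panel function has to be created with make_panel() for each plot, since the counter is not reset.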
Of course you can also achieve it with new functions derived from the originals (mostly identical to them). I show only the modified parts.
modified plot(ts.obj) (made from plot.ts)
my.plot.ts <- function(~~~, my.ylab = NULL) {
:
nm <- my.ylab # before: nm <- colnames(x)
:
}
# use
my.plot.ts(z, my.ylab = expression(alpha, beta, gamma), type = "o")
modified plot(timeSeries.obj)
# made from `.plot.timeSeries`
my.plot.timeSeries <- function(~~~, my.ylab = NULL) {
:
my.plotTimeSeries(~~~, my.ylab = my.ylab)
}
# made from `timeSeries:::.plotTimeSeries`
my.plotTimeSeries <- function(~~~, my.ylab) {
:
nm <- my.ylab # before: nm <- colnames(x)
:
}
#use
my.plot.timeSeries(LPP, my.ylab = expression(alpha, beta, gamma, delta), type="o")
