R: use apply when arguments contain arrays and scalars

I want to avoid the following loop:
for(i in 1:2){
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
by using the apply function. The problem is that the arguments passed to apply contain both arrays (M and S) and scalars (phi2 and sig2).
I tried the following:
apply(M, 3, myFunc, S = S, phi2 = phi2, sig2 = sig2)
which resulted in an error, because the whole of S is passed to each call: myFunc receives a 3D array where it requires a matrix (see below).
Here is reproducible code:
M = array(data = c(
0.5, 0.7, 0.45,
0.5, 0.3, 0.45,
0.5, 0.7, 0.3,
0.5, 0.3, 0.7,
0.5, 0.7, 0.45,
0.5, 0.3, 0.55),
dim = c(3,2,2)
)
S = array(data = c(
0.7723229, -0.2149794, -0.2159068,
-0.2149794, 0.7723229, -0.2083123,
-0.2159068, -0.2083123, 0.7723229,
0.7723229, -0.2149794, -0.2149794,
-0.2149794, 0.7723229, -0.1783025,
-0.2149794, -0.1783025, 0.7723229,
0.7723229, -0.2149794, -0.2176665,
-0.2149794, 0.7723229, -0.2111496,
-0.2176665, -0.2111496, 0.7723229),
dim = c(3,3,2)
)
phi2 = 0.5
sig2 = 0.3
myFunc = function(M, S, phi2, sig2){
  valMult = M[,1] %*% diag(S)  # dot product of M's first column with the diagonal of S
  valEnd = valMult + phi2 - sig2
  return(valEnd)
}
vectVal = vector(length = 2)
for(i in 1:2){
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
vectVal
Does someone have an idea?

One (not particularly efficient) way would be to use plyr to split your arrays into lists (each list element is one slice along the third dimension of the corresponding array). You can then use mapply to run your function like so:
require(plyr)
ml <- alply(M, 3)
sl <- alply(S, 3)
mapply(myFunc, ml, sl, phi2, sig2)
#        1        2
# 1.474333 1.358484
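A base-R aside (my addition, assuming R >= 3.6): asplit() slices an array along a margin into a list, so the same mapply call works without plyr:
ml <- asplit(M, 3)  # list of two 3 x 2 matrices
sl <- asplit(S, 3)  # list of two 3 x 3 matrices
mapply(myFunc, ml, sl, phi2, sig2)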
Update:
A more vectorised alternative (but still not as fast as for and %*% [see @JorisMeys' comment below]) is to take the diag of each slice of S and then use colSums and elementwise multiplication to achieve the same result:
s <- apply(S,3,diag)
colSums( M[,1,] * s ) + phi2 - sig2
# [1] 1.474333 1.358484
Update, update:
@JorisMeys has written a vectorised extractor function for getting the diagonal elements of 3D square arrays. Check this out.
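For reference, a minimal sketch of such an extractor (my own illustration, not @JorisMeys' actual code): matrix indexing pulls all diagonals of an n x n x k array in one vectorised step.
diag3d <- function(a) {
  n <- dim(a)[1]
  k <- dim(a)[3]
  idx <- cbind(rep(seq_len(n), times = k),  # row index i
               rep(seq_len(n), times = k),  # column index i (the diagonal)
               rep(seq_len(k), each = n))   # slice index
  matrix(a[idx], nrow = n, ncol = k)
}
diag3d(S)  # same result as apply(S, 3, diag), without the implicit loop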

Implementing a particular approach to calculating a log-likelihood using matrix operations

I came across a mathematical expression for a log-likelihood in a CrossValidated.com answer and am unclear how I should implement it in R. I'm not sure if SO can render MathML the way CV does, but this is the first equation in the second (not accepted) answer:
$$
\ell(\mu, \Sigma) = C - \frac{m}{2}\log|\Sigma| - \frac{1}{2} \sum_{i=1}^m \operatorname{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]
$$
I am focusing on the third term of that equation, and according to another answer on that page the trace operation is not strictly necessary there. I suppose I could look at one of the several implementations in the various packages that exist, but I suspect they use more economical approaches that don't clearly follow the equation's procedure, as @onyambu did in the answer here. I'm reusing code from an earlier SO example:
library(MASS)
# Make covariance matrix. See note above re the implications of using a correlation matrix.
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1" ,"Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
obs = mvrnorm(n = 200, mu = mus, Sigma = S)
This effort was in response to a question that has since been answered correctly, but not by using a summation of a matrix expression. I think I can do it with a for loop that creates the individual contribution of each data point:
llmat.term3 <- matrix(NA, 200, 1)
for(n in 1:200) {
  llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356
.... but I'm wondering if there is a more compact matrix approach? Or, put another way, can someone fill in the gaps in my linear-algebra knowledge and explain why sum(u * solve(S, u)) is the same as the sum over i = 1, ..., N of t(obs[i,] - mus) %*% solve(S) %*% (obs[i,] - mus)?
In your code you have:
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1" ,"Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
set.seed(123)
obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
llmat.term3 <- matrix(NA, 200, 1)
for(n in 1:200) {
  llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356
compare to more compact approaches:
u <- t(obs) - mus
sum(diag(solve(S, tcrossprod(u))))
#> [1] 982.7356
sum(u * solve(S, u))
#> [1] 982.7356
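To fill the linear-algebra gap the question asks about: let $U = [u_1, \dots, u_N]$ be the $5 \times N$ matrix of centred observations (the u above). Each quadratic form is a $1 \times 1$ matrix, equal to its own trace, and traces of products can be rearranged and evaluated elementwise:
$$
\sum_{i=1}^{N} u_i^T S^{-1} u_i = \operatorname{tr}\left(U^T S^{-1} U\right) = \sum_{j,k} U_{jk} \left(S^{-1} U\right)_{jk}
$$
The last form is exactly sum(u * solve(S, u)), and the cyclic property $\operatorname{tr}(U^T S^{-1} U) = \operatorname{tr}(S^{-1} U U^T)$ gives sum(diag(solve(S, tcrossprod(u)))).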
Though the two expressions give the same results, the first seems to be quicker than the second, presumably because tcrossprod(u) collapses everything to a small 5 x 5 matrix before solve is called (u is 5 x n, so u %*% t(u) is 5 x 5), whereas the second solves for the full 5 x n matrix. The for loop takes forever to compute.
Unit: milliseconds
 expr       min        lq       mean    median        uq       max neval
    a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116    10
    b    2.8991    3.2693    3.73495    3.3675    3.7777    6.9719    10
    c    7.8176    8.5473   12.03060    9.2542   16.4089   20.1742    10
set.seed(123)
n <- 200000
obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
u <- t(obs) - mus
microbenchmark::microbenchmark(
  a = {
    llmat.term3 <- matrix(NA, n, 1)
    for(i in seq(n)) {
      llmat.term3[i] <- t(obs[i,] - mus) %*% solve(S) %*% (obs[i,] - mus)
    }
    sum(llmat.term3)
  },
  b = sum(diag(solve(S, tcrossprod(u)))),
  c = sum(u * solve(S, u)),
  check = 'equal', times = 10)
NB: it took me a while to work out the seed you used. Next time, please include it with your data generation.

How do I assign values to each other in triplets using R?

The situation is as follows:
I need to create a dataset of triplets built from a discrete distribution of stock prices S <- c(80, 100, 120, 140, 160) with probabilities P <- c(0.2, 0.3, 0.2, 0.2, 0.1), a call option C <- pmax(S - 120, 0) = c(0, 0, 0, 20, 40), and the liability of an option that pays 30 in a certain region and zero otherwise, namely L = I{110 <= S <= 150}, i.e. L <- c(0, 0, 30, 30, 0). It is important that the values stay linked: if S[1] = 80 is drawn, then C[1] and L[1] must come with it, and likewise for i = 1, 2, 3, 4, 5. How do you create a dataset of N = 10000 simulations where each draw of i carries the other two values for the same i?
This is the code I have so far. Note that X_1 = S, X_2 = C and Y = L.
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
sample(X_1 - 120, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
Y <- function(n) {
sample(L, size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
##Creating triplets##
df <- data.frame(S_T = X_1(10000), C_T = X_2(10000), L_T = Y(10000))
df
I'm not sure if you want C_T to be dependent on the S_T values. If you do, I think you just want to call X_1, assign the results to an object, then use that as the argument to X_2 (or just subtract 120, which is what X_2 does).
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
# Call that function
S_T <- X_1(10) # for practice
C_T <- S_T - 120 # that's all you're doing in function X_2, if you want to use S_T
If you want C_T to contain values independent of S_T, you can call one function from within another:
X_1 <- function(n) {
sample(c(80,100,120,140,160), size = n,
prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace=T)
}
X_2 <- function(n) {
X_1(n) - 120
}
S_T <- X_1(10) # Same as above
C_T <- X_2(10) # Gives values not dependent on S_T
EDIT to address the comment below:
It's hard to read the comment, but it looks like you want a function that takes the results of X_1 and returns a payoff based on a condition. Nested ifelse calls handle this; note that ifelse is vectorised, so the function can accept the whole vector at once.
Y <- function(X_1_vals){
  ifelse(X_1_vals == 80, 0,
    ifelse(X_1_vals == 100, 0,
      ifelse(X_1_vals == 120, 30,
        ifelse(X_1_vals == 140, 30,
          0)))) # default: any remaining price (here 160) pays 0
}
Y(X_1(10)) # one vectorised call maps every element; sapply(X_1(10), Y) also works. Assign the result to L or whatever you wish to call it.
If this all works for you, you can accept the answer.
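For completeness, a minimal sketch (my own addition, not part of the answer above) of the usual way to keep the triplets aligned: sample row indices once, then index all three vectors with them.
S <- c(80, 100, 120, 140, 160)
P <- c(0.2, 0.3, 0.2, 0.2, 0.1)
C <- pmax(S - 120, 0)   # c(0, 0, 0, 20, 40)
L <- c(0, 0, 30, 30, 0) # pays 30 when 110 <= S <= 150
idx <- sample(seq_along(S), size = 10000, prob = P, replace = TRUE)
df <- data.frame(S_T = S[idx], C_T = C[idx], L_T = L[idx])
head(df)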

Avoid negative values when solving an ODE

I am trying to model the behavior of a made-up network of 5 genes, but I have the problem that I get negative values, which makes no sense biologically.
Is there a way to limit the values to zero?
I managed to do it when I plot the results, but I don't know how to use ifelse in the main equations.
Thank you very much.
###################################################
###preliminaries
###################################################
library(deSolve)
library(ggplot2)
library(reshape2)
###################################################
### Initial values
###################################################
values <- c(A = 1,
B = 1,
D = 1,
E = 20,
R = 1)
###################################################
### Set of constants
###################################################
constants <- c(a = 1.2,
b = 0.5,
c = 1.2,
d = 1.5,
e = 0.3,
f = 0.5,
g = 1.5,
h = 0.9,
i = 1.3,
j = 1.3,
m = 0.8,
n = 0.6,
q = 1,
t = 0.0075,
u = 0.0009,
Pa = 100,
Pb = 0.05,
Pd = 0.1,
Pe = 10)
###################################################
### differential equations
###################################################
Dynamic_Model<-function(t, values, constants) {
with(as.list(c(values, constants)),{
dA <- Pa + a*D - j*A - R
dB <- Pb + b*A + e*E - m*B
dD <- Pd + d*B + f*E - g*A - n*D
dE <- Pe - h*B + i*E - q*E
dR <- t*A*B - u*D*E
list(c(dA, dB, dD, dE, dR))
})
}
###################################################
### time
###################################################
times <- seq(0, 200, by = 0.01)
###################################################
### Plotting
###################################################
out <- ode(y = values, times = times, func = Dynamic_Model, parms = constants)
out2 <- ifelse(out<0, 0, out)
out.df = as.data.frame(out2)
out.m = melt(out.df, id.vars='time')
p <- ggplot(out.m, aes(time, value, color = variable)) + geom_point(size=0.5) + ggtitle("Dynamic Model")
I agree completely with @Lutz Lehmann that the negative values are a result of the structure of the model.
The system of equations allows derivatives to remain negative even when the states are already below zero, i.e. the states can decrease further. We don't have information about what the states represent, so the following is only a technical demonstration. Here a dimensionless Monod-type feedback function fb is implemented as a safeguard; it is normally close to one. The km value should be small enough to act only for state values close to zero, yet not so small that it causes numerical errors. It can be formulated individually for each state, and other function types are also possible.
library(deSolve)
library(ggplot2)
library(reshape2)
values <- c(A = 1,
B = 1,
D = 1,
E = 20,
R = 1)
constants <- c(a = 1.2,
b = 0.5,
c = 1.2,
d = 1.5,
e = 0.3,
f = 0.5,
g = 1.5,
h = 0.9,
i = 1.3,
j = 1.3,
m = 0.8,
n = 0.6,
q = 1,
t = 0.0075,
u = 0.0009,
Pa = 100,
Pb = 0.05,
Pd = 0.1,
Pe = 10,
km = 0.001)
Dynamic_Model<-function(t, values, constants) {
with(as.list(c(values, constants)),{
fb <- function(x) x / (x+km) # feedback
dA <- (Pa + a*D - j*A - R) * fb(A)
dB <- (Pb + b*A + e*E - m*B) * fb(B)
dD <- (Pd + d*B + f*E - g*A - n*D) * fb(D)
dE <- (Pe - h*B + i*E - q*E) * fb(E)
dR <- (t*A*B - u*D*E) * fb(R)
list(c(dA, dB, dD, dE, dR))
})
}
times <- seq(0, 200, by = 0.1)
out <- ode(y = values, times = times, func = Dynamic_Model, parms = constants)
plot(out)
Additional hints:
Removal of negative values afterwards (out2 <- ifelse(out<0, 0, out)) is just wrong.
Removal of negative values in the model function, i.e. to "use the ifelse in the main equation", would also be wrong, as it can lead to a severe violation of mass balance.
The time steps don't need to be very small; they are adapted automatically by the solver anyway. Too-small time steps make your model slow and produce more output than needed.
Some of your parameters are quite large, so the model becomes very stiff.
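On the last point, a small sketch (my addition, using the same model and parameters as above): deSolve's default lsoda solver switches between stiff and non-stiff methods automatically, but an implicit solver can also be requested explicitly.
# radau is an implicit method suited to stiff systems
out_stiff <- ode(y = values, times = times, func = Dynamic_Model,
                 parms = constants, method = "radau")
plot(out_stiff)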

R match.fun does not see function from imported package

I've imported a package called "stabledist". It includes the function "rstable".
When I do this:
my_fun <- function(function_from_library)
{
function_from_library <- match.fun(function_from_library)
print(some_data <- function_from_library)
}
my_fun (5, rstable(5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0))
I get the error: "Error in match.fun(some_distr) : 'rstable(5, alpha = 1.7, beta = 0, gamma = 1, delta = 1)' is not a function, character or symbol".
Everything works fine when match.fun is deleted. Is there any way to import the library so that it is visible to other functions? Or can I just skip match.fun?
This is how I implemented my suggestion:
library(stabledist)
my_fun <- function(function_from_library, ...)
{
  function_from_library <- match.fun(function_from_library)
  print(some_data <- function_from_library(...))
}
my_fun ( rstable, 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
#[1] 1.4600308688 -0.0004999279 1.9301805374 -1.3276383194 0.9137183709
This also requires knowing how to use the ellipsis mechanism (...) for passing argument lists of arbitrary length to functions, as Roland had additionally commented. The print mechanism will not actually create a data vector of values; to do that you need to assign ("<-") the result outside the function body (and then the print() call is not needed either).
library(stabledist)
my_fun <- function(function_from_library, ...)
{
  function_from_library <- match.fun(function_from_library)
  function_from_library(...)
}
some_data <- my_fun ( rstable, 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
some_data
# 5 random values are printed at the console.
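Note that match.fun also accepts a character string, so the same wrapper works when the function is passed by name (e.g. read from user input); a quick illustration with the my_fun defined above:
some_data <- my_fun("rstable", 5, alpha = 1.7, beta = 0, gamma = 1.0, delta = 1.0)
some_data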

Iteratively define user-defined discrete distributions

I am writing a script that, using the distr package, defines some discrete distributions based on the following objects:
margins <- c("discrete1", "discrete2")
vec1 <- list(support=c(0,1,2), probabilities=c(0.2, 0.2, 0.6))
vec2 <- list(support=c(12,14,20), probabilities=c(0.1, 0.15, 0.75))
Here is the code that works as expected: it creates the two distributions.
library("distr")
discrete1 <- DiscreteDistribution(supp = vec1[[1]], prob = vec1[[2]])
ddiscrete1 <- d(discrete1) # Density function
pdiscrete1 <- p(discrete1) # Distribution function
qdiscrete1 <- q(discrete1) # Quantile function
rdiscrete1 <- r(discrete1) # Random number generator
discrete2 <- DiscreteDistribution(supp = vec2[[1]], prob = vec2[[2]])
ddiscrete2 <- d(discrete2)
pdiscrete2 <- p(discrete2)
qdiscrete2 <- q(discrete2)
rdiscrete2 <- r(discrete2)
Once the two (or possibly more) distributions are defined, my final goal is to sample random numbers from them:
rdiscrete1(100)
rdiscrete2(100)
The problem with this code is that the number of distributions can be very high. I wonder how the creation of these functions could be automated in a more elegant manner.
Also, I need the distributions to be of class DiscreteDistribution and not nested in lists (see is(discrete1) in my example).
l <- list(list(support = c(0, 1, 2), probabilities = c(0.2, 0.2, 0.6)),
          list(support = c(12, 14, 20), probabilities = c(0.1, 0.15, 0.75)))
distrs <- lapply(seq_along(l), function(n) {
  d <- DiscreteDistribution(supp = l[[n]][[1]], prob = l[[n]][[2]])
  list(d = d, dd = d(d), pd = p(d), qd = q(d), rd = r(d))
})
# First object of class DiscreteDistribution
is(distrs[[1]][[1]])
# [1] "DiscreteDistribution" "UnivariateDistribution" "AcDcLcDistribution"
# [4] "Distribution" "UnivDistrListOrDistribution"
# Random numbers
dim(sapply(distrs, function(x) x[[5]](100)))
# [1] 100 2
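If you really do want top-level objects named discrete1, discrete2, ... (each of class DiscreteDistribution) rather than a list, one option, sketched here as an illustration, is assign() in a loop, though keeping the list is usually cleaner:
for (n in seq_along(distrs)) {
  assign(paste0("discrete", n), distrs[[n]]$d)
}
is(discrete1)  # "DiscreteDistribution" "UnivariateDistribution" ...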
