I am running the code below to evaluate a function at each value of r.
For each element of r, the function calculates the sum of elements of a matrix product. Before doing this, values of M are adjusted based on a kernel function.
# (1) set-up with toy data
r <- seq(0, 10, 1)
bw <- 25
M <- matrix(data = c(0, 1, 2,
                     1, 0, 1,
                     2, 1, 0), nrow = 3, ncol = 3)
X <- matrix(rep(1, 9), 3, 3)
#
# (2) computation
res <- c()
# loop, calculating sum, Epanechnikov kernel
for(i in seq_along(r)) {
  res[i] <- sum(
    # Epanechnikov kernel
    ifelse(-bw < (M - r[i]) & (M - r[i]) < bw,
           3 * (1 - ((M - r[i])^2 / bw^2)) / (4*bw),
           0) * X,
    na.rm = TRUE
  )
}
# result
res
I am looking for recommendations to speed this up using base R. Thank you!
Using outer:
Mr <- outer(c(M), r, "-")
colSums(3*(1 - Mr^2/bw^2)/4/bw*(abs(Mr) < bw)*c(X))
#> [1] 0.269424 0.269760 0.269232 0.267840 0.265584 0.262464 0.258480 0.253632 0.247920 0.241344 0.233904
I'll also note that the original for loop solution can be sped up by pre-allocating res (e.g., res <- numeric(length(r))) prior to the for loop.
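A minimal sketch of that pre-allocated version (the same computation as the loop in the question, only res is created at full length up front):
res <- numeric(length(r))
for (i in seq_along(r)) {
  d <- M - r[i]
  res[i] <- sum(ifelse(abs(d) < bw, 3 * (1 - d^2 / bw^2) / (4 * bw), 0) * X,
                na.rm = TRUE)
}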
I have been struggling with this for a long time. I have a logistic growth function where the growth parameter
r is a matrix. The model is constructed in such a way that I get two state variables as output, N1 and N2.
I would like to be able to change the r parameter over time. When time < 50 I would like
r = r1 where
r1 = matrix(c(2, 3), nrow = 1, ncol = 2)
When time >= 50 I would like r=r2 where
r2 = matrix(c(1, 2), nrow = 1, ncol = 2)
Here is my function. Any help is highly appreciated.
rm(list = ls())
library(deSolve)
model <- function(time, y, params) {
  with(as.list(c(y, params)), {
    N <- y[paste("N", 1:2, sep = "")]
    dN <- r*N*(1-N/K)
    return(list(c(dN)))
  })
}
r = matrix(c(4, 5), nrow = 1, ncol = 2)
K=100
params <- list(r,K)
y<- c(N1=0.1, N2=0.2)
times <- seq(0,100,1)
out <- ode(y, times, model, params)
plot(out)
Ideally I would like something like this, but it does not work:
model <- function(time, y, params) {
  with(as.list(c(y, params)), {
    N <- y[paste("N", 1:2, sep = "")]
    r <- ifelse(times < 10, matrix(c(1, 3), nrow = 1, ncol = 2),
                ifelse(times > 10, matrix(c(1, 4), nrow = 1, ncol = 2),
                       matrix(c(1, 2), nrow = 1, ncol = 2)))
    print(r)
    dN <- r*N*(1-N/K)
    return(list(c(dN)))
  })
}
Thank you for your time.
Here is a generic approach that uses an extended version of the approx function. Note also some further simplifications of the model function and the additional plot of the parameter values.
Edit: changed according to the suggestion of Lewis Carter to make the parameter change at t = 3, so that the effect can be seen.
library(simecol) # contains approxTime, a vector version of approx
model <- function(time, N, params) {
  r <- approxTime(params$signal, time, rule = 2, f = 0, method = "constant")[-1]
  K <- params$K
  dN <- r*N*(1-N/K)
  return(list(c(dN), r))
}
signal <- matrix(
  # time, r[1], r[2]
  c(  0, 2, 3,
      3, 1, 2,
    100, 1, 2), ncol = 3, byrow = TRUE
)
## test of the interpolation
approxTime(signal, c(1, 2.9, 3, 100), rule = 2, f=0, method="constant")
params <- list(signal = signal, K = 100)
y <- c(N1=0.1, N2=0.2)
times <- seq(0, 10, 0.1)
out <- ode(y, times, model, params)
plot(out)
For a small number of state variables like in the example, separate signals with approxfun from package stats will look less generic but may be slightly faster.
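A rough sketch of that variant, assuming the same parameter switch at t = 3 as in the signal matrix above (the names r1fun, r2fun and model2 are only illustrative):
r1fun <- approxfun(x = c(0, 3, 100), y = c(2, 1, 1), method = "constant", rule = 2, f = 0)
r2fun <- approxfun(x = c(0, 3, 100), y = c(3, 2, 2), method = "constant", rule = 2, f = 0)
model2 <- function(time, N, params) {
  r <- c(r1fun(time), r2fun(time))   # look up the current r for each state
  dN <- r * N * (1 - N / params$K)
  list(dN, r)
}
out2 <- ode(y, times, model2, list(K = 100))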
As a further improvement, one may consider replacing the "hard" transition with a smoother one. This can then be formulated directly as a function, without the need for approx, approxfun or approxTime.
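One possible sketch of such a smooth transition, assuming a logistic blend between the two parameter sets around t0 = 3 (the function names and the steepness s are arbitrary illustrative choices):
r_smooth <- function(t, r1 = c(2, 3), r2 = c(1, 2), t0 = 3, s = 10) {
  w <- 1 / (1 + exp(-s * (t - t0)))  # weight moves smoothly from 0 to 1 around t0
  (1 - w) * r1 + w * r2
}
model3 <- function(time, N, params) {
  r <- r_smooth(time)
  dN <- r * N * (1 - N / params$K)
  list(dN, r)
}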
Edit 2:
Package simecol imports deSolve, and we need only a small function from it. So instead of loading simecol it is also possible to include the approxTime function explicitly in the code. The conversion from data frame to matrix improves performance, but a matrix is preferred anyway in such cases.
approxTime <- function(x, xout, ...) {
  if (is.data.frame(x)) {x <- as.matrix(x); wasdf <- TRUE} else wasdf <- FALSE
  if (!is.matrix(x)) stop("x must be a matrix or data frame")
  m <- ncol(x)
  y <- matrix(0, nrow = length(xout), ncol = m)
  y[,1] <- xout
  for (i in 2:m) {
    y[,i] <- as.vector(approx(x[,1], x[,i], xout, ...)$y)
  }
  if (wasdf) y <- as.data.frame(y)
  names(y) <- dimnames(x)[[2]]
  y
}
If you want to pass a matrix parameter, you should pass a list of parameters, and you can modify r inside the model function once your time limit is exceeded (in the example below you do not even have to pass the r matrix to the model function):
library(deSolve)
model <- function(time, y, params) {
  with(as.list(c(y, params)), {
    if (time < 3) {
      r <- matrix(c(2, 3), nrow = 1, ncol = 2)
    } else {
      r <- matrix(c(1, 3), nrow = 1, ncol = 2)
    }
    N <- y[paste("N", 1:2, sep = "")]
    dN <- r*N*(1-N/K)
    return(list(c(dN)))
  })
}
y <- c(N1=0.1, N2=0.2)
params <- list(r = matrix(c(0,0), nrow = 1, ncol = 2), K=100)
times <- seq(0,10,0.1)
out <- ode(y, times, model, params)
plot(out)
You can see examples of this approach, for instance, with delay differential equations (see ?dede).
I'm trying to compute the integral between 1 and some cutoff 'cut' of the function given in the R code below as 'int'. It depends on two parameters, dM[i] and dLambda[j], defined before the integration, and for each pair I save the result in the vector 'vec':
vec = c()  # vector for INT values: this is our goal
dM = seq(from = 0, to = 3, by = 0.01)          # vector for mass density parameter
dLambda = seq(from = -1.5, to = 3, by = 0.01)  # vector for vacuum energy density parameter
for (i in 1:length(dM)) {
  for (j in 1:length(dLambda)) {
    int = function(x) ((dM[i]*x^4*(x - 1) + dLambda[j]*x^2*(1 - x^2) + x^4)^(-1/2))
    cut = 30
    INT_data = integrate(int, 1, cut)
    INT = INT_data$value
    vec = c(vec, INT)
  }
}
But when I run the script I get the error "Error in integrate(int, 1, cut) : non-finite function value". Nonetheless, if I try the following code
int = function(x) ((0*x^4*(x - 1) -1.5*x^2*(1 - x^2) + x^4)^(-1/2))
cut = 30
INT_data = integrate(int, 1, cut)
INT = INT_data$value
vec = c(vec, INT)
I get the correct result without any error. So R can clearly calculate this particular integral; it just seems unable to work it out when I use the two for loops. How can I rewrite the code so that I can compute the integral for all the values of dM[i] and dLambda[j] I want?
Your function is only defined for some values of dM and dLambda. You can use the try() function to attempt the evaluation without stopping when an error occurs.
It's also a lot more efficient to pre-allocate the object to hold the results; running vec = c(vec, INT) gradually grows it, and that's very slow, because R needs to keep creating new vectors just one element longer than the last one.
This code fixes both issues, and then plots the result:
dM <- seq(from = 0, to = 3, by = 0.01) #vector for mass density parameter
dLambda <- seq(from = -1.5, to = 3, by = 0.01) #vector for vacuum energy density parameter
result <- matrix(NA, length(dM), length(dLambda))
for (i in 1:length(dM)) {
  for (j in 1:length(dLambda)) {
    int <- function(x) ((dM[i]*x^4*(x - 1) + dLambda[j]*x^2*(1 - x^2) + x^4)^(-1/2))
    cut <- 30
    INT_data <- try(integrate(int, 1, cut), silent = TRUE)
    if (!inherits(INT_data, "try-error"))
      result[i, j] <- INT_data$value
  }
}
image(dM, dLambda, result)
Edited to add: Here's how this works. If integrate signals an error in your original code, the loop will stop. try() prevents that. If there's no error, it returns the result of the integrate call. If there is an error, it returns an object with information about the error. That object has class "try-error", so the check if (!inherits(INT_data, "try-error")) is basically asking "Was there an error?" If there was an error, nothing happens, and that entry of the result is left as NA, as it was initialized. The loop then goes on to try the next dM, dLambda pair.
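A tiny standalone illustration of that try() pattern (unrelated to integrate, just to show the mechanics):
good <- try(sqrt(4), silent = TRUE)       # no error: good is simply the value 2
bad  <- try(stop("boom"), silent = TRUE)  # error: bad gets class "try-error"
inherits(good, "try-error")  # FALSE -> safe to use the value
inherits(bad,  "try-error")  # TRUE  -> skip it and leave the NA in place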
The problem is mathematical rather than coding-related. The function is not defined over the whole domain you are integrating. With dM[1] = 0 and dLambda > 1, your expression
(dM[i]*x^4*(x - 1) + dLambda[j]*x^2*(1 - x^2) + x^4)^(-1/2)
simplifies to
(dLambda[j] * x^2 * (1 - x^2) + x^4)^(-1/2)
so let's take dLambda[j] at 1.01, which is where your calculation stops:
(1.01 * x^2 * (1 - x^2) + x^4)^(-1/2)
which is
(1.01 * x^2 - 1.01 * x^4 + x^4)^(-1/2)
or
(1.01 * x^2 - 0.01 * x^4)^(-1/2)
Now, you are evaluating x between 1 and 30. So what happens when x = 11?
(1.01 * 121 - 0.01 * 14641)^(-1/2)
This leaves you with
(122.21 - 146.41)^(-1/2)
which is equivalent to
1/sqrt(-24.2)
So the reason for the error is that you are integrating a function in a domain in which it is undefined.
The function is badly behaved for other values of dM too, with infinite peaks in the middle of the range, so even using the integrate(..., stop.on.error = FALSE) option won't allow you to keep calculating, because you will get an infinite sum.
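A quick check in R illustrates this for the parameter values discussed above (dM = 0, dLambda = 1.01):
inner <- function(x) 1.01 * x^2 * (1 - x^2) + x^4  # the expression under the square root
inner(11)         # -24.2: negative inside the integration range [1, 30]
inner(11)^(-1/2)  # NaN, i.e. the "non-finite function value" that integrate reports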
Let's assume I have a data frame with 1000 entries/rows. Each row has an ID, a 2nd column with some data, and a 3rd column also with some data.
So the data frame would look something like:
ID  yesNo  ID_specific_data
1   1      4
2   0      8
3   0      43
4   1      11
5   0      9
... and so on.
I now need to do the following:
n = 4
ID_range <- c(1:n)
ID_spec_data <- floor(runif(n, min=10, max=100))
yesNo_data <- sample(c(0,1), replace=TRUE, size=n)
df <- data.frame("ID" = ID_range, "yesNo" = yesNo_data, "ID_specific_data" = ID_spec_data)
m <- 1
for (i in seq(1, 100, 1)) {
  for (j in seq(0.1, 1, 0.1)) {
    log_like_list <- c()
    for (k in seq(0.1, 1, 0.1)) {
      total_ID_list <- c()
      for (l in seq(1, length(df$ID))) {
        x = (df$ID_specific_data[[l]]*k - j) / (i*j)
        calc = pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
        total_ID_list[[l]] = calc
      }
      # log likelihood function
      final_calc = sum(df$yesNo*log(total_ID_list)+(1-df$yesNo)*log(1 - total_ID_list))
      log_like_list[[m]] = final_calc
      m <- m + 1
    }
  }
}
So basically the end result (log_like_list) should be a list/vector with 1500*200*100 values. But for each of those parameter combinations the same calculation also has to be carried out for every ID in the data frame (about 500-1000 rows in my case). All in all, a lot of calculations.
I know for loops are probably the worst thing you can do in terms of speed, but I'm not even sure that using apply would make it super fast with this many calculations. I have read about Rcpp, which in principle could reduce the calculation time the most of any option, but it requires knowledge of C++ (which I really lack), and I'm not even sure it is applicable to my problem here.
So could the calculation times be reduced significantly with any R tricks, or do I just have to wait it out?
I think your current edit is still wrong: you probably shouldn't be re-defining log_like_list inside any of the loops. Here's an alternative that first allocates all parameter combinations with expand.grid, which is a bit wasteful in terms of RAM, but I think it's manageable:
n <- 4L
df <- data.frame(
  ID = 1L:n,
  yesNo = sample(c(0,1), replace = TRUE, size = n),
  ID_specific_data = floor(runif(n, min = 10, max = 100))
)
params <- expand.grid(
  i = seq(1, 100, 1),
  j = seq(0.1, 1, 0.1),
  k = seq(0.1, 1, 0.1)
)
log_like <- sapply(1L:nrow(params), function(row_id) {
  i <- params$i[row_id]
  j <- params$j[row_id]
  k <- params$k[row_id]
  calc <- sapply(df$ID_specific_data, function(idsd) {
    x <- (idsd * k - j) / (i * j)
    pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
  })
  sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))
})
However, for your final use case, this is probably still too slow. You can try parallelization, which might give acceptable times if you have many cores:
library(doParallel)
library(itertools)
# do NOT run these lines several times without calling stopCluster() on the created workers
workers <- makeCluster(detectCores())
registerDoParallel(workers)
n <- 1000L
df <- data.frame(
  ID = 1L:n,
  yesNo = sample(c(0,1), replace = TRUE, size = n),
  ID_specific_data = floor(runif(n, min = 10, max = 100))
)
params <- expand.grid(
  i = seq(1, 150, 0.1),
  j = seq(0.1, 2, 0.01),
  k = seq(0.1, 1, 0.01)
)
params_chunk <- isplitRows(params, chunks = getDoParWorkers())
log_like_par <- foreach(param = params_chunk, .combine = c, .multicombine = TRUE) %dopar% {
  # return from foreach body here
  sapply(1L:nrow(param), function(row_id) {
    i <- param$i[row_id]
    j <- param$j[row_id]
    k <- param$k[row_id]
    calc <- sapply(df$ID_specific_data, function(idsd) {
      x <- (idsd * k - j) / (i * j)
      pnorm(x, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
    })
    # return from sapply body here
    sum(df$yesNo * log(calc) + (1 - df$yesNo) * log(1 - calc))
  })
}
stopCluster(workers); registerDoSEQ()
I tried running it on my system (4 cores), but stopped it after a few minutes. If you wait it out, let me know how long it took.
This is not going to be a 100% answer that you can copy and paste, but I think it will help you get part of the way. Mainly, you need to think about why you are spending time in loops when you are really dealing with essentially constant values.
For example
i <- seq(1, 100, 1)
j <- seq(0.1, 1, 0.1)
ioxj <- i %o% j
df_ij <- data.frame("i" = i, "j" = j, "ioxj" = ioxj)
df_ij$ixj <- df_ij$i * df_ij$j
This will get you every combination of i and j and their product, and there is no reason to use a loop to get that basic math result. You might use a loop to go through the columns at some point; that would potentially make sense because the values of i and j might change.
You can also work similarly with k.
Also, there is never a reason to do something like this
x = (df$ID_specific_data[[l]]*k - j) / (i*j)
in a loop going over each row of a data frame; that loses the whole idea of vectorization. You want to end up with this instead:
x = (df$ID_specific_data*k - j) / (i*j)
You need to play around with the code to get it exactly the way you want it, but it will be worth spending the time to do so. It's possible the occasional loop may be correct but I think you will likely end up doing something much simpler.
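As a minimal sketch of what that looks like here (assuming df, i, j and k are already defined as in the question), the row-by-row loop collapses to:
x <- (df$ID_specific_data * k - j) / (i * j)  # one vector, no loop over rows
p <- pnorm(x)                                 # probabilities for every ID at once
log_lik <- sum(df$yesNo * log(p) + (1 - df$yesNo) * log(1 - p))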
I calculated a distribution function numerically. First I plot the function. It looks wrong around 0.05. Is this due to rounding errors, please?
Second, I need to find the corresponding first and second non-central moments. That is,
EX = int x dF(x)
EX^2 = int x^2 dF(x)
Can I do this type of Lebesgue–Stieltjes integration in R, please? Is there a built-in method, please? If not in R, what package offers such a calculation, please?
I guess that alternatively I can find the numerical derivative f(x) of F(x) and then carry out the usual integration, as in
EX = int x f(x) dx
But I remember from somewhere that numerical differentiation is much less stable. Which is the right way, please?
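Something like this rough sketch is what I have in mind for the second approach (using the CDF function defined below, just to make the idea concrete):
xs  <- seq(-0.1, 0.1, by = 0.001)
Fs  <- CDF(xs)                        # CDF as defined below
fs  <- diff(Fs) / diff(xs)            # crude numerical density
mid <- (xs[-1] + xs[-length(xs)]) / 2 # midpoints of the grid
EX  <- sum(mid   * fs * diff(xs))     # approximates int x f(x) dx
EX2 <- sum(mid^2 * fs * diff(xs))     # approximates int x^2 f(x) dx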
FYI my functions are attached below.
library(mvtnorm)
library(matrixcalc)
VAR <- matrix(c(1.043856e-03, 5.044899e-04, 3.239951e-04, 2.330992e-04, 0.0001779055, 0.0001403866, 0.0001127118, 9.074962e-05, 7.157144e-05,
5.044899e-04, 5.485889e-04, 3.523165e-04, 2.534751e-04, 0.0001934568, 0.0001526582, 0.0001225642, 9.868232e-05, 7.782773e-05,
3.239951e-04, 3.523165e-04, 3.878844e-04, 2.790645e-04, 0.0002129870, 0.0001680697, 0.0001349376, 1.086447e-04, 8.568475e-05,
2.330992e-04, 2.534751e-04, 2.790645e-04, 3.123147e-04, 0.0002383642, 0.0001880950, 0.0001510153, 1.215896e-04, 9.589399e-05,
1.779055e-04, 1.934568e-04, 2.129870e-04, 2.383642e-04, 0.0002728857, 0.0002153361, 0.0001728863, 1.391990e-04, 1.097820e-04,
1.403866e-04, 1.526582e-04, 1.680697e-04, 1.880950e-04, 0.0002153361, 0.0002548851, 0.0002046389, 1.647645e-04, 1.299447e-04,
1.127118e-04, 1.225642e-04, 1.349376e-04, 1.510153e-04, 0.0001728863, 0.0002046389, 0.0002555744, 2.057751e-04, 1.622886e-04,
9.074962e-05, 9.868232e-05, 1.086447e-04, 1.215896e-04, 0.0001391990, 0.0001647645, 0.0002057751, 2.840218e-04, 2.239993e-04,
7.157144e-05, 7.782773e-05, 8.568475e-05, 9.589399e-05, 0.0001097820, 0.0001299447, 0.0001622886, 2.239993e-04, 3.974881e-04),
nrow=9, ncol=9, byrow=TRUE)
is.symmetric.matrix(VAR)
is.positive.definite(VAR)
kappa(VAR)
CDF <- function(x){
  summand <- rep(0, 5)
  for(j in 5:9){
    choice <- combn(9, j)
    for(i in 1:ncol(choice)){
      ub <- rep(Inf, 9)
      ub[choice[, i]] <- x
      summand[j-4] <- summand[j-4] + as.numeric(pmvnorm(lower=rep(-Inf, 9), upper=ub, sigma=VAR))
    }
  }
  l <- c(1, -5, 15, -35, 70)
  as.numeric(t(l)%*%summand)
}
CDF <- Vectorize(CDF)
x <- seq(-0.1, 0.1, by=0.01)
y <- CDF(x)
plot(x, y, type="l", lwd=2)
I initially plotted the result I got from taking first differences of numCDF <- CDF( seq(-10, 10, length=100) ), but that was rather disappointing, since only one value was different from 0. So I restricted the focus to:
numCDF <- CDF( seq(-.10, .10, length=100) )
plot( diff(numCDF) )
Simply plotting the values of numCDF produces similar chaotic results in the region where you expressed concern.
So I think maybe your function is not sufficiently well-behaved to yield good results.