I'm trying to make a model using deSolve with a fairly large number of states. One of the states, 'foo', is actually made up of 15 different states: foo[1,1:5], foo[2,1:5] and foo[3,1:5]. I thought it would be easiest to pass the function a matrix of states instead of typing them out individually, so that I could refer to them with indexing:
par <- rep(NA,3)
par_names <- c('alpha','prog','death_rate')
names(par) <-par_names
par['alpha'] <- 0.7
par['prog'] <- 0.8
par['death_rate'] <- 0.3
foo <- matrix(0,nrow = 3,ncol = 5)
states <- foo
my_func <- function(t,states,par){
  with(as.list(c(states,par)),{
    for (j in 1:5){
      dfoo[1,j] <- par['alpha']*par['prog']*foo[1,j] - par['death_rate']*foo[1,j]
      dfoo[2,j] <- par['prog']*foo[1,j] - par['prog']*foo[2,j] - par['death_rate']*foo[2,j]
      dfoo[3,j] <- par['prog']*foo[2,j] - par['prog']*foo[3,j] - par['death_rate']*foo[3,j]
    }
    list(c(
      dfoo[]
    ))
  })
}
times <- seq(1,365,by=1)
library(deSolve)
alldata <- as.data.frame(ode(y=states,times=times,func=my_func,parms=par))
I've tried to fix it but I just keep getting the same error:
Error in dfoo[1, j] <- par["alpha"] * par["prog"] * foo[1, j] - par["death_rate"] * :
object 'dfoo' not found
Does anyone know how this might be made to work, or is there an easier way of doing this?
Yes, you can pass a matrix in as your states. But every time ode calls your function (except for the first time) it will pass a vector rather than a matrix, so you need to convert it back to a matrix at the beginning of your function.
You use unnecessary contortions to create your data. Also, as pointed out in the comments, your function never initializes dfoo, which is what produces the 'object not found' error. Finally, the for loop in your function can be handled more cleanly with a few vectorized operations. Here is an example:
my_func <- function(t, states, par){
  # ode passes the states as a plain vector: rebuild the 3 x 5 matrix
  foo <- matrix(states, nrow = 3, ncol = 5)
  dfoo <- with(as.list(par), rbind(
    # row 1 of the derivative matrix
    (prog * alpha * foo[1,]) - (death_rate * foo[1,]),
    # rows 2 and 3: inflow from the row above, progression out, deaths
    (prog * foo[-nrow(foo),]) - (prog * foo[-1,]) - (death_rate * foo[-1,])
  ))
  list(dfoo)
}
library(deSolve)
par <- c(alpha = 0.7, prog = 0.8, death_rate = 0.3)
states <- matrix(0,nrow = 3,ncol = 5)
out <- ode(y = states, times = 1:365, func = my_func, parms = par)
alldata <- as.data.frame(out)
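If you need the matrix form back at a particular time step, the state columns of one output row can be reshaped; a minimal sketch, assuming the 3 x 5 layout above (ode flattens the matrix in column-major order, so column 1 of the output is time and columns 2:16 are the states in the same order as c(foo)):
# Reshape the states at day 100 back into a 3 x 5 matrix
foo_at_day_100 <- matrix(as.numeric(alldata[100, -1]), nrow = 3, ncol = 5)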
How would the following be written using apply?
# Variables
age <- 1:100
Y <- age+5
d <- 0.25
dx <- 5
a_x <- 1:dx
Yd <- matrix( 0, nrow=max(age), ncol=dx )
# Nested loop is computationally inefficient?
for (a in age){
  for (ax in a_x){
    Yd[a,ax] <- (Y[[a]] * (1 - d) ** (ax-1))
  }
}
My model has a lot of these nested for loop structures, because I am incompetent. I am hoping to improve the computational time using apply. I find the apply functions rather confusing to get into. I am looking for a solution that illustrates how one can obtain such nested structures using apply. Hopefully, from there on I can apply (pun intended) the solution to even more complicated nested for loops (4-5 loops within each other).
For example
Ydi <- vector("list", 6)  # note: rep(list(), 6) creates an empty list; this pre-allocates 6 slots
for (i in 1:6){
  Ydi[[i]] <- matrix( 0, nrow=max(age), ncol=dx )
}
# Nested loop is computationally inefficient?
for (i in 1:6){
  for (a in age){
    for (ax in a_x){
      Ydi[[i]][a,ax] <- (Y[[a]] * (1 - d) ** (ax-1)) + i
    }
  }
}
I would use expand.grid instead:
df <- data.frame(expand.grid(a = age, ax = a_x))
df[['Yd']] <- (df[['a']] + 5) * (1 - d) ** (df[['ax']] - 1)
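If you need Yd back in its original matrix form, the new column can be reshaped directly; a small sketch (expand.grid varies its first argument fastest, so the values already come out in column-major order over a and ax):
# Rebuild the max(age) x dx matrix from the tidy column
Yd_mat <- matrix(df[['Yd']], nrow = max(age), ncol = dx)
all.equal(Yd_mat, Yd)  # TRUE, assuming the loop version above was run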
This extends to any number of dimensions (subject to memory constraints): each additional nested loop just becomes an additional variable in your expand.grid call. For example:
new_col <- 1:2
df_2 <- data.frame(expand.grid(a = age, ax = a_x, nc = new_col))
df_2[['Yd']] <- (df_2[['a']] + 5) * (1 - d) ** (df_2[['ax']] - 1) + df_2[['nc']]
This essentially switches to a tidy data format, which is an easier way of storing multi-dimensional data.
For easier syntax, and faster speed, you can use the data.table package:
library(data.table)
dt_3 <- data.table(expand.grid(a = age, ax = a_x, nc = new_col))
dt_3[ , Yd := (a + 5) * (1 - d) ** (ax - 1) + nc]
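And if downstream code still wants the list-of-matrices layout from the question, it can be recovered from the tidy result; a sketch reusing dt_3 and the question's dimensions:
# One matrix per value of nc, each filled column-major over (a, ax)
Ydi_list <- lapply(new_col, function(k)
  matrix(dt_3[nc == k, Yd], nrow = max(age), ncol = dx))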
I am writing a for loop to calculate a numerator that is part of a larger formula. I used a for loop, but it is taking a lot of time to compute. What would be a better way to do this?
city is a dataframe with the following columns: pop, not.white, pct.not.white
n <- nrow(city)
numerator <- 0
for(i in 1:n) {
  ti <- city$pop[i]
  pi <- city$pct.not.white[i]
  for(j in 1:n) {
    tj <- city$pop[j]
    pj <- city$pct.not.white[j]
    numerator <- numerator + (ti * tj) * abs(pi - pj)
  }
}
Use the following toy data for result validation.
set.seed(0)
city <- data.frame(pop = runif(101), pct.not.white = runif(101))
The most obvious "vectorization":
# n <- nrow(city)
titj <- tcrossprod(city$pop)
pipj <- outer(city$pct.not.white, city$pct.not.white, "-")
numerator <- sum(titj * abs(pipj))
This will probably run into memory problems if n > 5000, since it materializes two n x n double matrices (8n² bytes each, about 200 MB apiece at n = 5000).
A clever workaround (exploiting symmetry; a more memory-efficient "vectorization"):
## see https://stackoverflow.com/a/52086291/4891738 for function: tri_ind
n <- nrow(city)
# indices (i, j) of the strict lower triangle only: the summand is
# symmetric in i and j and the diagonal terms vanish (pi - pj = 0),
# so the full double sum is exactly twice the lower-triangle sum
ij <- tri_ind(n, lower = TRUE, diag = FALSE)
titj <- city$pop[ij$i] * city$pop[ij$j]
pipj <- abs(city$pct.not.white[ij$i] - city$pct.not.white[ij$j])
numerator <- 2 * crossprod(titj, pipj)[1]
The ultimate solution is to write the double loop in C / C++; a sketch of that follows.
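A minimal sketch using Rcpp (assuming Rcpp is available; one possible implementation, not the only one):
library(Rcpp)
cppFunction('
double numerator_cpp(NumericVector t, NumericVector p) {
  int n = t.size();
  double acc = 0.0;
  // the same double loop as in the question, but compiled
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      acc += t[i] * t[j] * std::fabs(p[i] - p[j]);
  return acc;
}')
numerator <- numerator_cpp(city$pop, city$pct.not.white)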
I am starting out in R and trying to get this loop to execute. I am trying to get the loop to calculate consecutive distances between coordinates using a function (Vincenty's formula). 'Distfunc' is the file containing the function, which is then called to compute 'x' below. All I want is a data frame or a list of the distances between consecutive coordinates. Grateful for any help!
Distfunc <- source("F://Distfunc.R")
for (i in length(Radians)) {
  LatRad1 <- Radians[i,1]
  LongRad1 <- Radians[i,2]
  LatRad2 <- Radians[i+1,1]
  LongRad2 <- Radians[i+1,2]
  x <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
  print(data.frame(x[i]))
}
Well, without a good description of the problem you are facing and a proper reproducible example, it is very difficult to provide any good insight. To start off, see How to make a great R reproducible example?
There are many things that are not clear in the way you are doing things. First of all, why assign the results of source(...) to the variable Distfunc?
Anyway, here is some code that I put together while trying to understand this; it runs without problems, but it is not clear that it accomplishes what you expect (since you don't provide much information). In particular, the code uses the implementation of the gcd.vif function by Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/). The code below is aimed at clarity, since you mention that you are starting out in R.
# Calculates the geodesic distance between two points specified by radian latitude/longitude using
# Vincenty inverse formula for ellipsoids (vif)
# By Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/)
gcd.vif <- function(long1, lat1, long2, lat2) {
  # WGS-84 ellipsoid parameters
  a <- 6378137           # length of major axis of the ellipsoid (radius at equator)
  b <- 6356752.314245    # length of minor axis of the ellipsoid (radius at the poles)
  f <- 1/298.257223563   # flattening of the ellipsoid
  L <- long2 - long1     # difference in longitude
  U1 <- atan((1-f) * tan(lat1))  # reduced latitude
  U2 <- atan((1-f) * tan(lat2))  # reduced latitude
  sinU1 <- sin(U1)
  cosU1 <- cos(U1)
  sinU2 <- sin(U2)
  cosU2 <- cos(U2)
  cosSqAlpha <- NULL
  sinSigma <- NULL
  cosSigma <- NULL
  cos2SigmaM <- NULL
  sigma <- NULL
  lambda <- L
  lambdaP <- 0
  iterLimit <- 100
  while (abs(lambda - lambdaP) > 1e-12 & iterLimit > 0) {
    sinLambda <- sin(lambda)
    cosLambda <- cos(lambda)
    sinSigma <- sqrt( (cosU2*sinLambda) * (cosU2*sinLambda) +
                      (cosU1*sinU2 - sinU1*cosU2*cosLambda) * (cosU1*sinU2 - sinU1*cosU2*cosLambda) )
    if (sinSigma == 0) return(0)  # co-incident points
    cosSigma <- sinU1*sinU2 + cosU1*cosU2*cosLambda
    sigma <- atan2(sinSigma, cosSigma)
    sinAlpha <- cosU1 * cosU2 * sinLambda / sinSigma
    cosSqAlpha <- 1 - sinAlpha*sinAlpha
    cos2SigmaM <- cosSigma - 2*sinU1*sinU2/cosSqAlpha
    if (is.na(cos2SigmaM)) cos2SigmaM <- 0  # equatorial line: cosSqAlpha = 0
    C <- f/16*cosSqAlpha*(4 + f*(4 - 3*cosSqAlpha))
    lambdaP <- lambda
    lambda <- L + (1-C) * f * sinAlpha *
      (sigma + C*sinSigma*(cos2SigmaM + C*cosSigma*(-1 + 2*cos2SigmaM*cos2SigmaM)))
    iterLimit <- iterLimit - 1
  }
  if (iterLimit == 0) return(NA)  # formula failed to converge
  uSq <- cosSqAlpha * (a*a - b*b) / (b*b)
  A <- 1 + uSq/16384*(4096 + uSq*(-768 + uSq*(320 - 175*uSq)))
  B <- uSq/1024 * (256 + uSq*(-128 + uSq*(74 - 47*uSq)))
  deltaSigma <- B*sinSigma*(cos2SigmaM + B/4*(cosSigma*(-1 + 2*cos2SigmaM^2) -
                B/6*cos2SigmaM*(-3 + 4*sinSigma^2)*(-3 + 4*cos2SigmaM^2)))
  s <- b*A*(sigma - deltaSigma) / 1000
  return(s)  # distance in km
}
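A quick sanity check of the function on its own (a sketch; the arguments are radians, ordered long1, lat1, long2, lat2):
# One degree of longitude along the equator: on the WGS-84 ellipsoid
# this should come out near 111.32 km
gcd.vif(0, 0, pi / 180, 0)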
# Initialize the variable 'Radians' with random data
Radians <- matrix(runif(20, min = 0, max = 2 * pi), ncol = 2)
lst <- list()  # temporary list to store the results
for (i in seq(1, nrow(Radians) - 1)) {  # loop through each row of the 'Radians' matrix
  LatRad1 <- Radians[i, 1]
  LongRad1 <- Radians[i, 2]
  LatRad2 <- Radians[i + 1, 1]
  LongRad2 <- Radians[i + 1, 2]
  gcd_vif <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
  # Store the input data and the results
  lst[[i]] <- c(
    latitude_position_1  = LatRad1,
    longitude_position_1 = LongRad1,
    latitude_position_2  = LatRad2,
    longitude_position_2 = LongRad2,
    GCD = gcd_vif
  )
}
Results <- as.data.frame(do.call(rbind, lst))  # store the input data and the results in a data frame
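Since gcd.vif works on scalars, the explicit loop can also be collapsed with sapply; a sketch reusing gcd.vif and Radians from above:
# Consecutive distances without the explicit loop
n <- nrow(Radians)
GCD <- sapply(seq_len(n - 1), function(i)
  gcd.vif(Radians[i, 2], Radians[i, 1], Radians[i + 1, 2], Radians[i + 1, 1]))
Results2 <- data.frame(
  latitude_position_1  = Radians[-n, 1],
  longitude_position_1 = Radians[-n, 2],
  latitude_position_2  = Radians[-1, 1],
  longitude_position_2 = Radians[-1, 2],
  GCD = GCD
)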
I am making flow plots for spatial interaction models, with x-y coordinates for both origins and destinations.
The problem is that I keep using nested for loops (one for origins, one for destinations) to plot these lines and am sure there's a better way in R.
Anyway, to help answer this question I set up a simple reproducible example with 4 origins and 2 destinations. I suspect the answer to plotting quicker is in matrix algebra, but I am not sure where to start. Test it out and please let me know:
o <- data.frame(x = c(3,5,6,1), y = c(8,2,3,2))
plot(o)
d <- data.frame(x = c(5,3), y = c(5,3))
points(d, col="red", pch=3)
beta <- 0.6
dist <- matrix(sqrt(c(o[,1] - d[1,1], o[,1] - d[2,1])^2 +
                    c(o[,2] - d[1,2], o[,2] - d[2,2])^2), ncol = 2)
s <- dist
for(i in 1:nrow(o)){
  for(j in 1:nrow(d)){
    s[i,j] <- exp(-beta * dist[i,j])
  }
}
for(i in 1:nrow(o)){
  for(j in 1:nrow(d)){
    lines(c(o[i,1], d[j,1]), c(o[i,2], d[j,2]),
          lwd = 2 * s[i,j] / mean(s))
  }
}
Edit: for some context on this project, please see http://rpubs.com/RobinLovelace/9697
A way to replace the second loop is to use mapply:
fun <- function(row.o, row.d)
{
  lines(c(o[row.o,1], d[row.d,1]), c(o[row.o,2], d[row.d,2]),
        lwd = 2 * s[row.o,row.d] / mean(s))
}
# all combinations of rows of `d` and `o`
args.od <- expand.grid(1:nrow(o), 1:nrow(d))
mapply(fun, row.o = args.od[,1], row.d = args.od[,2])
The plot is the same as the one produced by the nested loops.
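As an aside, both loops can be avoided entirely: the first one is an elementwise operation, and base R's segments() is itself vectorized over its endpoints. A sketch reusing o, d, beta, dist and args.od from above:
# The first nested loop is elementwise, so it reduces to one line
s <- exp(-beta * dist)
# segments() draws all origin-destination lines in a single call;
# s[cbind(io, jd)] extracts s[i, j] for each (i, j) pair
io <- args.od[, 1]
jd <- args.od[, 2]
segments(o[io, 1], o[io, 2], d[jd, 1], d[jd, 2],
         lwd = 2 * s[cbind(io, jd)] / mean(s))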
I want to use arms() to get one sample each time and make a loop like the following one in my function. It runs very slowly. How could I make it run faster? Thanks.
library(HI)
dmat <- matrix(0, nrow=100, ncol=30)
system.time(
  for (d in 1:100){
    for (j in 1:30){
      y <- rep(0, 101)
      for (i in 2:100){
        y[i] <- arms(0.3, function(x) (3.5+0.000001*d*j*y[i-1])*log(x)-x,
                     function(x) (x>1e-4)*(x<20), 1)
      }
      dmat[d, j] <- sum(y)
    }
  }
)
This is a version based on Tommy's answer but avoiding all loops:
library(multicore) # or library(parallel) in 2.14.x
set.seed(42)
m <- 100
n <- 30
system.time({
  arms.C <- getNativeSymbolInfo("arms")$address
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x>1e-4)*(x<20))
  if (diff(bounds) < 1e-07) stop("pointless!")
  # create the vector of z values
  zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
  # apply the inner function to each grid point and return the matrix
  dmat <- matrix(unlist(mclapply(zval, function(z)
    sum(unlist(lapply(seq.int(100), function(i)
      .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
            0.3, 1L, parent.frame())
    )))
  )), m, byrow = TRUE)
})
On a multicore machine this will be really fast, since it spreads the load across cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and get only a slight speedup compared to Tommy's answer. But note that the result will be different for the parallel version, since it will use different RNG sequences.
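If reproducibility of the parallel run matters, one option (assuming the parallel package) is the L'Ecuyer-CMRG generator, which gives each mclapply worker its own well-defined stream; set it before running the code above:
# Reproducible RNG streams across mclapply workers
RNGkind("L'Ecuyer-CMRG")
set.seed(42)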
Note that any C code that needs to evaluate R functions will be inherently slow (because interpreted code is slow). I have added the arms.C just to remove all R->C overhead to make moli happy ;), but it doesn't make any difference.
You could squeeze out a few more milliseconds by using column-major processing (the question code was row-major, which requires re-copying, as R matrices are always column-major).
Edit: I noticed that moli changed the question slightly since Tommy answered. Instead of the sum(...) part you now have to use a loop, since the y[i] are dependent, so the function(z) would look like:
function(z) {
  y <- 0
  for (i in seq.int(99))
    y <- y + .Call(arms.C, bounds, function(x) (3.5 + z * y) * log(x) - x,
                   0.3, 1L, parent.frame())
  y
}
Well, one effective way is to get rid of the overhead inside arms. It does some checks and calls the indFunc every time, even though the result is always the same in your case. Some other evaluations can also be done outside the loop. These optimizations bring the time down from 54 secs to around 6.3 secs on my machine... and the answer is identical.
set.seed(42)
#dmat2 <- ##RUN ORIGINAL CODE HERE##
# Now try this:
set.seed(42)
dmat <- matrix(0, nrow=100, ncol=30)
system.time({
  e <- new.env()
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x>1e-4)*(x<20))
  f <- function(x) (3.5+z*i)*log(x)-x
  if (diff(bounds) < 1e-07) stop("pointless!")
  for (d in seq_len(nrow(dmat))) {
    for (j in seq_len(ncol(dmat))) {
      y <- 0
      z <- 0.00001*d*j
      for (i in 1:100) {
        y <- y + .Call("arms", bounds, f, 0.3, 1L, e)
      }
      dmat[d, j] <- y
    }
  }
})
all.equal(dmat, dmat2) # TRUE
Why not like this?
dat <- expand.grid(d=1:10, j=1:3, i=1:10)
arms.func <- function(vec) {
  require(HI)
  dji <- vec[1]*vec[2]*vec[3]
  arms.out <- arms(0.3,
                   function(x,params) (3.5 + 0.00001*params)*log(x) - x,
                   function(x,params) (x>1e-4)*(x<20),
                   n.sample=1,
                   params=dji)
  return(arms.out)
}
dat$arms <- apply(dat, 1, arms.func)
library(plyr)
out <- ddply(dat, .(d,j), summarise, arms=sum(arms))
matrix(out$arms, nrow=length(unique(out$d)), ncol=length(unique(out$j)))
However, it's still single-core and time-consuming. But that isn't R being slow; it's the arms function.