I am trying to write a function to calculate h-point. the function is defined over a rank frequency data frame.
consider the following data.frame :
DATA <-data.frame(frequency=c(49,48,46,38,29,24,23,22,15,12,12,10,10,9,9), rank=c(seq(1, 15)))
and the formula for h-point is :
if {there is an r = f(r), h-point = r }
else { h-point = f(i)j-f(j)i / j-i+f(i)-f(j) }
where f(i) and f(j) are corresponding frequencies for ith and jth ranks and i and j are adjacent ranks that i<f(i) and j>f(j).
NOW, i have tried the following codes :
fr <-function(x){d <-DATA$frequency[x]
return(d)}
for (i in 1:length(DATA$rank)) {
j <- i+1
if (i==fr(i))
return(i)
else(i<fr(i) && j>fr(j)) {
s <-fr(i)*j-fr(j)*i/j-i+fr(i)-fr(j)
return(s)
}}
I also tried:
for (i in 1:length(DATA$rank)) {
j <- i+1
if (i==fr(i))
return(i)
if (i<fr(i) while(j>fr(j))) {
s <-fr(i)*j-fr(j)*i/j-i+fr(i)-fr(j)
return(s)
}}
and neither of them works. for the DATA ,the desired result would be i=11 and j=12, so:
h-point=12×12 - 10×11 / 12 - 11 + 12 - 10
can you please tell me what I`m doing wrong here?
You could do:
h_point <- function(data){
x <- seq(nrow(data))
f_x <- data[["frequency"]][x]
h <- which(x == f_x)
if(length(h)>1) h
else{
i <- which(x<f_x)
j <- which(x>f_x)
s <- which(outer(i,j,"-") == -1, TRUE)
i <- i[s[,1]]
j <- j[s[,2]]
cat("i: ",i, "j: ", j,"\n")
f_x[i]*j - f_x[j]*i / (i-j + f_x[i]-f_x[j])
}
}
h_point(DATA)
i: 11 j: 12
[1] 34
I think I have figured out what you are trying to achieve. My loop will go through DATA and break at any point if rank == frequency for a given row. If might be more prudent to explicitly test this with DATA$rank[i] == fr(i) rather than relying on i, in case tied ranks etc.
The second if statement calculates h-point (s) for rows i and j if row i has rank that is lower than freq and row j has a rank that is higher.
Is this what you wanted?
DATA <-data.frame(frequency=c(49,48,46,38,29,24,23,22,15,12,12,10,10,9,9), rank=c(seq(1, 15)))
fr <-function(x){d <-DATA$frequency[x]
return(d)}
for(i in 1:nrow(DATA)){
j <- i+1
if (i==fr(i)){
s <- list(ij=c(i=i,j=j), h=i)
break
}else if(i <fr(i) && j>fr(j)){
s <-list(ij=c(i=i,j=j),h=fr(i)*j-fr(j)*i/j-i+fr(i)-fr(j))
}}
I am not sure the formula is correct, in your loop you had j-i but in explanation it was i-j. Not sure if the entire i-j+fr(i)-fr(j) is the denominator and similarly for the numerator. Simple fixes.
I am trying to write a function to sort a vector, but without using R's inbuilt 'Sort' function.
My Code:
sorting <- function(x){
for(i in 1:length(x)){
for(j in (i+1):length(x)){
if(x[i] > x[j]){
x[c(i,j)] = x[c(j,i)]
}
}
}
x
}
I get below output:
> x <- c(3,1,4,7,2,9)
> sorting(x)
Error in if (x[i] > x[j]) { : missing value where TRUE/FALSE needed
>
I understand that we'll get above error when the 'IF' condition returns 'NA' instead of TRUE/FALSE.
Is there an issue with the statement:
for(j in (i+1):length(x)){
Python code for same:
def sorting(a):
for i in range(len(a)):
for j in range(i+1,len(a)):
if a[i] > a[j]:
a[i],a[j] = a[j],a[i]
return a
Output:
sorting([3,1,4,7,2,9])
Out[380]: [1, 2, 3, 4, 7, 9]
In Python, the code works fine.
Could someone let me know the issue with my R code.
The problem is with that (i+1). When length(x) reaches its max value, j goes out of range. I added this: (length(x)-1).
sorting <- function(x){
for(i in 1:(length(x)-1)){
for(j in (i+1):length(x)){
if(x[i] > x[j]){
x[c(i,j)] = x[c(j,i)] # +1 for this
}
}
}
x
}
sorting(c(3,1,4,7,2,9))
[1] 1 2 3 4 7 9
Sorting <- function(x){
xs <- rep(0, length(x))
for(i in 1:length(x)){
xs[i] = min(x)
x <- x[x != min(x)]
}
xs
}
Sorting(c(3,1,4,7,2,9))
[1] 1 2 3 4 7 9
Sorting is a function with a numeric vector argument x. Initially xs, the sorted vector of the same length as x, is filled with zeroes. The for loop aims at assigning the minimum value of vector x to each component of xs (for(i in 1:length(x)){ xs[i] = min(x) ). We then remove such minimum value from x by x <- x[x != min(x)] } to find the next minimum value. Finally, xs shows the sorted vector of x.
I have the 3 following functions which I would like to make faster, I assume apply functions are the best way to go, but I have never used apply functions, so I have no idea what to do. Any type of hints, ideas and code snippets will be much appreciated.
n, T, dt are global parameters and par is a vector of parameters.
Function 1: is a function to create an m+1,n matrix containing poisson distributed jumps with exponentially distributed jump sizes. My troubles here is because I have 3 loops and I am not sure how to incorporate the if statement in the inner loop. Also I have no idea if it is at all possible to use apply functions on the outer layers of the loops only.
jump <- function(t=0,T=T,par){
jump <- matrix(0,T/dt+1,n) # initializing output matrix
U <- replicate(n,runif(100,t,T)) #matrix used to decide when the jumps will happen
Y <-replicate(n,rexp(100,1/par[6])) #matrix with jump sizes
for (l in 1:n){
NT <- rpois(1,par[5]*T) #number of jumps
k=0
for (j in seq(t,T,dt)){
k=k+1
if (NT>0){
temp=0
for (i in 1:NT){
u <- vector("numeric",NT)
if (U[i,l]>j){ u[i]=0
}else u[i]=1
temp=temp+Y[i,l]*u[i]
}
jump[k,l]=temp
}else jump[k,l]=0
}
}
return(jump)
}
Function 2: calculates a default intensity, based on Brownian motions and the jumps from function 1. Here my trouble is how to use apply functions when the variable used for the calculation is the values from the row above in the output matrix AND how to get the right values from the external matrices which are used in the calculations (BMz_C & J)
lambda <- function(t=0,T=T,par,fit=0){
lambda <- matrix(0,m+1,n) # matrix to hold intesity path output
lambda[1,] <- par[4] #initializing start value of the intensity path.
J <- jump(t,T,par) #matrix containing jumps
for(i in 2:(m+1)){
dlambda <- par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i- 1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
lambda[i,] <- lambda[i-1,]+dlambda
}
return(lambda)
}
Function 3: calculates a survival probability based on the intensity from function 2. Here a() and B() are functions that return numerical values. My problem here is that the both value i and j are used because i is not always an integer which thus can to be used to reference the external matrix. I have earlier tried to use i/dt, but sometimes it would overwrite one line and skip the next lines in the matrix, most likely due to rounding errors.
S <- function(t=0,T=T,par,plot=0, fit=0){
S <- matrix(0,(T-t)/dt+1,n)
if (fit > 0) S.fit <- matrix(0,1,length(mat)) else S.fit <- 0
l=lambda(t,T,par,fit)
j=0
for (i in seq(t,T,dt)){
j=j+1
S[j,] <- a(i,T,par)*exp(B(i,T,par)*l[j,])
}
return(S)
}
Sorry for the long post, any help for any of the functions will be much appreciated.
EDIT:
First of all thanks to digEmAll for the great reply.
I have now worked on vectorising function 2. First I tried
lambda <- function(t=0,T=T,par,fit=0){
lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
J <- jump(t,T,par,fit)
lambda[1,] <- par[4]
lambda[2:(m+1),] <- sapply(2:(m+1), function(i){
lambda[i-1,]+par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
})
return(lambda)
}
but it would only produce the first column. So I tried a two step apply function.
lambda <- function(t=0,T=T,par,fit=0){
lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
J <- jump(t,T,par,fit)
lambda[1,] <- par[4]
lambda[2:(m+1),] <- sapply(1:n, function(l){
sapply(2:(m+1), function(i){
lambda[i-1,l]+par[1]*(par[2]-max(lambda[i-1,l],0))*dt+par[3]*sqrt(max(lambda[i-1,l],0))*BMz_C[i,l]+(J[i,l]-J[i-1,l])
})
})
return(lambda)
}
This seems to work, but only on the first row, all rows after that have an identical non-zero value, as if lambda[i-1] is not used in the calculation of lambda[i], does anyone have an idea how to manage that?
I'm going to explain to you, setp-by-step, how to vectorize the first function (one possible way of vectorization, maybe not the best one for your case).
For the others 2 functions, you can simply apply the same concepts and you should be able to do it.
Here, the key concept is: start to vectorize from the innermost loop.
1) First of all, rpois can generate more than one random value at a time but you are calling it n-times asking one random value. So, let's take it out of the loop obtaining this:
jump <- function(t=0,T=T,par){
jump <- matrix(0,T/dt+1,n)
U <- replicate(n,runif(100,t,T))
Y <-replicate(n,rexp(100,1/par[6]))
NTs <- rpois(n,par[5]*T) # note the change
for (l in 1:n){
NT <- NTs[l] # note the change
k=0
for (j in seq(t,T,dt)){
k=k+1
if (NT>0){
temp=0
for (i in 1:NT){
u <- vector("numeric",NT)
if (U[i,l]>j){ u[i]=0
}else u[i]=1
temp=temp+Y[i,l]*u[i]
}
jump[k,l]=temp
}else jump[k,l]=0
}
}
return(jump)
}
2) Similarly, it is useless/inefficient to call seq(t,T,dt) n-times in the loop since it will always generate the same sequence. So, let's take it out of the loop and store into a vector, obtainig this:
jump <- function(t=0,T=T,par){
jump <- matrix(0,T/dt+1,n)
U <- replicate(n,runif(100,t,T))
Y <-replicate(n,rexp(100,1/par[6]))
NTs <- rpois(n,par[5]*T)
js <- seq(t,T,dt) # note the change
for (l in 1:n){
NT <- NTs[l]
k=0
for (j in js){ # note the change
k=k+1
if (NT>0){
temp=0
for (i in 1:NT){
u <- vector("numeric",NT)
if (U[i,l]>j){ u[i]=0
}else u[i]=1
temp=temp+Y[i,l]*u[i]
}
jump[k,l]=temp
}else jump[k,l]=0
}
}
return(jump)
}
3) Now, let's have a look at the innermost loop:
for (i in 1:NT){
u <- vector("numeric",NT)
if (U[i,l]>j){ u[i]=0
}else u[i]=1
temp=temp+Y[i,l]*u[i]
}
this is equal to :
u <- as.integer(U[1:NT,l]<=j)
temp <- sum(Y[1:NT,l]*u)
or, in one-line:
temp <- sum(Y[1:NT,l] * as.integer(U[1:NT,l] <= j))
hence, now the function can be written as :
jump <- function(t=0,T=T,par){
jump <- matrix(0,T/dt+1,n)
U <- replicate(n,runif(100,t,T))
Y <-replicate(n,rexp(100,1/par[6]))
NTs <- rpois(n,par[5]*T)
js <- seq(t,T,dt)
for (l in 1:n){
NT <- NTs[l]
k=0
for (j in js){
k=k+1
if (NT>0){
jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
}else jump[k,l]=0
}
}
return(jump)
}
4) Again, let's have a look at the current innermost loop:
for (j in js){
k=k+1
if (NT>0){
jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
}else jump[k,l]=0
}
as you can notice, NT does not depend on the iteration of this loop, so the inner if can be moved outside, as follows:
if (NT>0){
for (j in js){
k=k+1
jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
}
}else{
for (j in js){
k=k+1
jump[k,l]=0
}
}
this seems worse than before, well yes it is, but now the 2 conditions can be turned into one-liner's (note the use of sapply¹):
if (NT>0){
jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else{
jump[1:length(js),l] <- 0
}
obtaining the following jump function:
jump <- function(t=0,T=T,par){
jump <- matrix(0,T/dt+1,n)
U <- replicate(n,runif(100,t,T))
Y <-replicate(n,rexp(100,1/par[6]))
NTs <- rpois(n,par[5]*T)
js <- seq(t,T,dt)
for (l in 1:n){
NT <- NTs[l]
if (NT>0){
jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else{
jump[1:length(js),l] <- 0
}
}
return(jump)
}
5) finally we can get rid of the last loop, using again the sapply¹ function, obtaining the final jump function :
jump <- function(t=0,T=T,par){
U <- replicate(n,runif(100,t,T))
Y <-replicate(n,rexp(100,1/par[6]))
js <- seq(t,T,dt)
NTs <- rpois(n,par[5]*T)
jump <- sapply(1:n,function(l){
NT <- NTs[l]
if (NT>0){
sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else {
rep(0,length(js))
}
})
return(jump)
}
(¹)
sapply function is pretty easy to use. For each element of the list or vector passed in the X parameter, it applies the function passed in the FUN parameter, e.g. :
vect <- 1:3
sapply(X=vect,FUN=function(el){el+10}
# [1] 11 12 13
since by default the simplify parameter is true, the result is coerced to the simplest possible object. So, for example in the previous case the result becomes a vector, while in the following example result become a matrix (since for each element we return a vector of the same size) :
vect <- 1:3
sapply(X=vect,FUN=function(el){rep(el,5)})
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 3
# [3,] 1 2 3
# [4,] 1 2 3
# [5,] 1 2 3
Benchmark :
The following benchmark just give you an idea of the speed gain, but the actual performances may be different depending on your input parameters.
As you can imagine, jump_old corresponds to your original function 1, while jump_new is the final vectorized version.
# let's use some random parameters
n = 10
m = 3
T = 13
par = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
dt <- 3
set.seed(123)
system.time(for(i in 1:5000) old <- jump_old(T=T,par=par))
# user system elapsed
# 12.39 0.00 12.41
set.seed(123)
system.time(for(i in 1:5000) new <- jump_new(T=T,par=par))
# user system elapsed
# 4.49 0.00 4.53
# check if last results of the 2 functions are the same:
isTRUE(all.equal(old,new))
# [1] TRUE