Looping through vector elements using macro variables - r

Suppose the user sets dim to a positive integer n and then the values of one or more variables x_i, with i = 1, ..., n.
After some further computation I need to automatically return a vector vec of length dim of the form (0, 0, 0.2, 0, 0, ..., 0.3), where in this specific case the user has set:
dim <- 10
x_3 <- 0.2
x_10 <- 0.3
Of course, this is straightforward to do by hand:
vec <- rep(0, dim)
vec[3] <- x_3
vec[10] <- x_10
However, since I want to automate as many operations as possible, how would you link each x_i variable to the corresponding element of vec, given that it is not known in advance which variables, or how many, will be set to a non-zero value?
In other languages this could be done using a for-loop with macro variables. The syntax below is wrong, but the idea is something like this:
vec <- rep(0, dim)
for (i in 1:dim) {
if (as.integer(exists(x_i))==1) {
vec[i] <- x_i
}
}
What would you suggest? Thanks!

sapply(paste("x_",seq(dim),sep=""),function(x) if(exists(x)) get(x) else 0)
x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.3
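An alternative that avoids looking up variables by name is to keep the user-supplied values in one named vector and use the names as positions. This is only a sketch under that assumption: the names dim_n (used instead of dim so that base::dim is not masked) and x are hypothetical and do not appear in the question.

```r
# Hypothetical setup: the user supplies values keyed by position
dim_n <- 10                      # assumed name, stands in for `dim`
x <- c("3" = 0.2, "10" = 0.3)    # assumed container replacing x_3 and x_10

# Start from all zeros, then fill the positions named in x
vec <- numeric(dim_n)
vec[as.integer(names(x))] <- x

print(vec)
```

With this layout there is no need for exists() or get(), and positions that the user never set simply stay at 0.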


Lower Triangular Matrix equal to value in Julia

I am trying to select the lower triangular part of a matrix in Julia, starting from a matrix of zeros, and then set every element of that lower triangle to the value c. In R this would look something like this:
mat <- matrix(0, n,n)
mat[lower.tri(mat, diag=FALSE)] <- c
with an output that looks like this:
[,1] [,2]
[1,] 0.0000000 0
[2,] 0.4646787 0
I am able to make the zeros matrix in Julia, but I do not know how to select its lower triangle and assign the value c to it.
Here's probably the closest equivalent in Julia:
julia> n = 3; c = 0.4646787; A = zeros(n,n);
julia> A[tril!(trues(size(A)), -1)] .= c;
julia> A
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.464679 0.0 0.0
0.464679 0.464679 0.0
Of course, one of the big advantages in Julia is that you don't need to worry about creating these clever one-liners nearly as much. Writing out the naive for loops is quite easy and should have similar performance:
julia> for j in 1:size(A, 2), i in j+1:size(A, 1)
A[i, j] = c
end
julia> A
3×3 Array{Float64,2}:
0.0 0.0 0.0
0.464679 0.0 0.0
0.464679 0.464679 0.0

Reduce computation time of simple function

I have a simple function, which is called hundreds of times. Is there any way to improve the speed of its computation?
# the input variable (y) is a scalar
my_function <- function(y){
ret_value <- 0.5*((max(0,y)**2)-(3*max(0,y-1)**2)+(3*max(0,y-2)**2)-(max(0,y-3)**2))
return(ret_value)
}
I already tried an if / else if statement, which resulted in a slightly longer computation time.
I also read a lot about the Rcpp package, but it is my understanding that a C++ implementation is mostly helpful for more complicated structures such as recursion and multiple loops. I am not sure there would be any improvement for a function as simple as mine.
EDIT: The parent function is as follows and as such, a vectorization would probably be the best idea. What's the best / fastest way to do so?
val1 <- 0.9
val2 <- 0.7
F_val <- 1
loop_seq <- c(1, 2, 3)
for(i in loop_seq){
for(j in loop_seq){
F_val <- F_val + my_function(val1-i)*my_function(val2-j)
}
}
My current solution is the following, but I didn't really get any speed improvement:
ret_value <- 0.5*apply(matrix(pmax(0,rep(y_vec,each=4)+c(0,-1,-2,-3))^2*c(1,-3,3,-1), ncol=length(y_vec)),2,sum)
I think **2 is where the time is spent. I replaced it with multiplication:
mff <- function(y) {
a <- if (y < 0.0) 0.0 else y
b <- if (y < 1.0) 0.0 else y-1.0
c <- if (y < 2.0) 0.0 else y-2.0
d <- if (y < 3.0) 0.0 else y-3.0
0.5*(a*a - 3.0*b*b + 3.0*c*c - d*d)
}
On my machine, your original function (which I saved as mf):
> system.time(replicate(1e6, mf(0.7)))
user system elapsed
2.88 0.00 2.88
Mine
> system.time(replicate(1e6, mff(0.7)))
user system elapsed
1.48 0.03 1.52
UPDATE
It is also easy to vectorize this function with ifelse()
vecmff <- function(y) {
a <- ifelse(y < 0.0, 0.0, y)
b <- ifelse(y < 1.0, 0.0, y-1.0)
c <- ifelse(y < 2.0, 0.0, y-2.0)
d <- ifelse(y < 3.0, 0.0, y-3.0)
0.5*(a*a - 3.0*b*b + 3.0*c*c - d*d)
}
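A sketch of the same vectorization using pmax(), which the question's own attempt already uses, together with a way to collapse the double loop. Because every term is a product my_function(val1 - i) * my_function(val2 - j), the double sum factorises into the product of two single sums. The names vec_fun, F_loop and F_vec are made up for illustration, and no timings are claimed here.

```r
# Scalar function from the question
my_function <- function(y) {
  0.5 * (max(0, y)^2 - 3 * max(0, y - 1)^2 + 3 * max(0, y - 2)^2 - max(0, y - 3)^2)
}

# Vectorized variant (hypothetical name): pmax() clamps element-wise
vec_fun <- function(y) {
  a <- pmax(0, y)
  b <- pmax(0, y - 1)
  cc <- pmax(0, y - 2)
  d <- pmax(0, y - 3)
  0.5 * (a * a - 3 * b * b + 3 * cc * cc - d * d)
}

val1 <- 0.9
val2 <- 0.7
loop_seq <- c(1, 2, 3)

# Reference result: the double loop from the question
F_loop <- 1
for (i in loop_seq) {
  for (j in loop_seq) {
    F_loop <- F_loop + my_function(val1 - i) * my_function(val2 - j)
  }
}

# Same quantity without loops: sum_i sum_j f_i * g_j = sum(f) * sum(g)
F_vec <- 1 + sum(vec_fun(val1 - loop_seq)) * sum(vec_fun(val2 - loop_seq))
```

If the individual products are needed rather than their sum, outer(vec_fun(val1 - loop_seq), vec_fun(val2 - loop_seq)) gives the full matrix of terms.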

Speeding up an iterative function with 2 inputs/outputs

Using R, I'm wondering what the best way is to iteratively evaluate a function with multiple inputs and outputs. I'm motivated by the plots seen at: http://paulbourke.net/fractals/clifford/
The key equations are:
x_{n+1} = sin(A* y_n) + C* cos(A* x_n)
y_{n+1} = sin(B* x_n) + D* cos(B* y_n)
And I want to store the results for each iteration. I'm guessing there is a MUCH faster way than going through the loop described in the code below:
#Parameters
A <- -1.4
B <- 1.6
C <- 1.0
D <- 0.7
n_iter <- 10000000
#Initial values
x0 <- 0
y0 <- 0
#function to calculate n+1 points
cliff <- function(x,y){
c(sin(A*y) + C*cos(A*x), sin(B*x) + D*cos(B*y))
}
#matrix to store results
res_mat <- matrix(0,nrow=n_iter,ncol=2)
#recursive loop (definitely not the fastest way to do this?)
for (i in 2:n_iter){
res_mat[i,] <- cliff(res_mat[i-1,1],res_mat[i-1,2])
}
I imagine this doesn't actually have to be a single function, but 2 that operate on each other's outputs. Any insight into a more appropriate way to evaluate these functions would be greatly appreciated. I daresay I would benefit here from some general programming advice that would not necessarily be R specific.
One option would be using Rcpp; for iterative functions like this one where each new value is a complex function of the previous iteration's value, this often yields quite good speedups.
library(Rcpp)
cliff.rcpp = cppFunction("
NumericMatrix cliff(int nIter, double A, double B, double C, double D) {
NumericMatrix x(nIter, 2);
for (int i=1; i < nIter; ++i) {
x(i,0) = sin(A*x(i-1,1)) + C*cos(A*x(i-1,0));
x(i,1) = sin(B*x(i-1,0)) + D*cos(B*x(i-1,1));
}
return x;
}")
cliff.rcpp(10, 1, 2, 3, 4)
# [,1] [,2]
# [1,] 0.0000000 0.0000000
# [2,] 3.0000000 4.0000000
# [3,] -3.7267800 -0.8614156
# [4,] -3.2595913 -1.5266964
# [5,] -3.9781665 -4.2182644
# [6,] -1.1296464 -3.1953775
# [7,] 1.3346977 3.2046776
# [8,] 0.6386906 4.4230487
# [9,] 1.4501988 -2.3914781
# [10,] -0.3208062 0.5208984
We can see that this returns identical results to the code in the question:
cliff.orig <- function(n_iter, A, B, C, D) {
#function to calculate n+1 points
cliff <- function(x,y){
c(sin(A*y) + C*cos(A*x), sin(B*x) + D*cos(B*y))
}
#matrix to store results
res_mat <- matrix(0,nrow=n_iter,ncol=2)
#recursive loop (definitely not the fastest way to do this?)
for (i in 2:n_iter){
res_mat[i,] <- cliff(res_mat[i-1,1],res_mat[i-1,2])
}
res_mat
}
identical(cliff.rcpp(10, 1, 2, 3, 4), cliff.orig(10, 1, 2, 3, 4))
# [1] TRUE
For the input in the original question, the Rcpp approach yields a ~50 times speedup:
system.time(cliff.rcpp(10000000, -1.4, 1.6, 1.0, 0.7))
# user system elapsed
# 0.661 0.046 0.717
system.time(cliff.orig(10000000, -1.4, 1.6, 1.0, 0.7))
# user system elapsed
# 34.591 0.245 35.040
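If compiling C++ is not an option, a plain R variant is to keep the two coordinates in separate numeric vectors, so that each iteration does two scalar assignments instead of building a length-2 vector and assigning a matrix row. This is only a sketch: the name cliff_vec is hypothetical, and it has not been benchmarked against either version above.

```r
# Pure R variant (hypothetical name) using two vectors instead of a matrix
cliff_vec <- function(n_iter, A, B, C, D) {
  x <- numeric(n_iter)   # x[1] = 0 is the initial value
  y <- numeric(n_iter)   # y[1] = 0 is the initial value
  if (n_iter >= 2) {
    for (i in 2:n_iter) {
      # Both updates use the previous iteration's x and y
      x[i] <- sin(A * y[i - 1]) + C * cos(A * x[i - 1])
      y[i] <- sin(B * x[i - 1]) + D * cos(B * y[i - 1])
    }
  }
  # Return an unnamed two-column matrix, like the versions above
  cbind(x, y, deparse.level = 0)
}

res <- cliff_vec(10, 1, 2, 3, 4)
```

The first rows of res can be compared against the cliff.rcpp(10, 1, 2, 3, 4) output shown earlier.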

r create matrix from repeat loop output

For each value n in some vector N, I want to compute the percentage of values that exceed n for each variable in my data frame T.
Consider the following input data frame:
T <- data.frame(A=c(0.1,0.2,0.3), B=c(0.3,0.3,0.9),C=c(1,0.5,0))
T
# A B C
# 1 0.1 0.3 1.0
# 2 0.2 0.3 0.5
# 3 0.3 0.9 0.0
I would like the output to be a matrix that looks something like this:
A B C
n=0.1 66.6 100 66.6
n=0.2 33.3 100 66.6
My current implementation is not working:
n <- 0.8
repeat {
Tlogic <- T > n
TU <- as.matrix(apply(Tlogic,2,sum))
q = NULL
for (i in seq(along=TU[,1]))
{
percent <- (TU[i]/nrow(T))*100
q = c(q, percent)
}
n <- n - 0.05;
print(n);
if(log(n) < -6) break
}
Basically you're asking, for each value n in some vector N, to compute the percentage of values in each column of T that exceed n.
You can actually do this in one line in R by moving from a solution that writes out loops to one that uses the *apply functions in R:
N <- c(0.1, 0.2)
do.call(rbind, lapply(N, function(n) c(n=n, 100*colMeans(T > n))))
# n A B C
# [1,] 0.1 66.66667 100 66.66667
# [2,] 0.2 33.33333 100 66.66667
For each value n in N, the call lapply(N, function(n) c(n=n, 100*colMeans(T > n))) computes a vector that indicates n as well as the percentage of values in each column of T that exceed n. Then do.call(rbind, ...) groups all of these together into a final output matrix.
In your case, you want N to form a decreasing sequence (by 0.05 each step) from 0.8 until log(n) < -6. You can get the N vector in this case with:
N <- seq(.8, 0, -.05)
N <- N[log(N) >= -6]
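To get row labels of the form n=0.1, as in the output requested in the question, one option is to build the matrix with sapply() and set the row names afterwards. A minimal sketch, assuming the data frame is called df (a hypothetical name, used here instead of T so that the shorthand for TRUE is not masked) and pct is the result:

```r
# Data from the question, under the assumed name df
df <- data.frame(A = c(0.1, 0.2, 0.3), B = c(0.3, 0.3, 0.9), C = c(1, 0.5, 0))
N <- c(0.1, 0.2)

# One column per n from sapply(); transpose so each n becomes a row
pct <- t(sapply(N, function(n) 100 * colMeans(df > n)))
rownames(pct) <- paste0("n=", N)

print(round(pct, 1))
```

The comparison is strict, so a value equal to n does not count as exceeding it.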

Tabular data to matrix in R

I'm trying to remove the shackles of some legacy code that we use to make decision trees in a retail setting. I got to playing with hclust in R and it's beautiful and I'd like to use it. The heavy lifting for calculating distances is done in SQL and I get an output like this:
main with dist
A A 0.00
A B 1.37
A C 0.64
B B 0
B C 0.1
C C 0
That's loaded as a data frame right now (just reading the SQL query dump), but hclust wants a matrix of distances. E.g.,:
A B C
--+-----------------
A | 0
B | 1.37 0
C | 0.64 0.1 0
My thinking is too procedural and I'm trying to do it in nested loops at the moment. Can someone point me in the direction of something more R-idiomatic to do this?
Thanks!
If you are looking for an actual distance matrix in R, try:
as.dist(xtabs(dist ~ with + main, mydf), diag = TRUE)
# A B C
# A 0.00
# B 1.37 0.00
# C 0.64 0.10 0.00
I'm presuming that the combinations of "main" and "with" are unique, otherwise xtabs would sum the "dist" values.
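For completeness, a self-contained sketch that rebuilds the sample table from the question, converts it as above, and passes the result to hclust(). The data frame name mydf follows the answer; d and hc are hypothetical names, and hclust() is called with its default linkage method.

```r
# Sample data from the question, typed in by hand
mydf <- data.frame(
  main = c("A", "A", "A", "B", "B", "C"),
  with = c("A", "B", "C", "B", "C", "C"),
  dist = c(0, 1.37, 0.64, 0, 0.1, 0)
)

# Cross-tabulate into a square table, then keep the lower triangle as a dist object
d <- as.dist(xtabs(dist ~ with + main, data = mydf))

# Hierarchical clustering on the distances
hc <- hclust(d)
print(d)
```

Here B and C are merged first, since their distance of 0.1 is the smallest in the table.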
I would suggest changing from letters to numbers (which is straightforward using the ASCII codes) and then using the linearized indices of R matrices to access each pair in a vectorized manner.
Minimal example:
N <- 3
d <- data.frame(x = c(1,2), y = c(2,3), v = c(0.1, 0.2))
m <- matrix(0, N, N)
m[(d$y-1)*N+d$x] = d$v
The output is:
[,1] [,2] [,3]
[1,] 0 0.1 0.0
[2,] 0 0.0 0.2
[3,] 0 0.0 0.0
EDIT: To preserve arbitrary strings as row and col names, consider the following example:
codes <- c('A','B','C')
N <- 3
d <- data.frame(x = c('A','B'), y = c('B','C'), v = c(0.1, 0.2))
m <- matrix(0, N, N)
m[(vapply(d$y, function(x) which(codes == x), 0)-1)*N+
vapply(d$x, function(x) which(codes == x), 0)] = d$v
rownames(m) = codes
colnames(m) = codes
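The same assignment can also be written with match() and a two-column index matrix, which avoids computing the linear index by hand. A minimal sketch reusing the names codes, d and m from the example above; stringsAsFactors = FALSE is set explicitly so that the columns are character vectors on any R version.

```r
codes <- c("A", "B", "C")
d <- data.frame(x = c("A", "B"), y = c("B", "C"), v = c(0.1, 0.2),
                stringsAsFactors = FALSE)

# Square matrix of zeros with the codes as row and column names
m <- matrix(0, length(codes), length(codes), dimnames = list(codes, codes))

# Each row of the index matrix is one (row, column) pair
m[cbind(match(d$x, codes), match(d$y, codes))] <- d$v

print(m)
```

Indexing a matrix with a two-column matrix selects one element per row of the index, so each (x, y) pair receives its own value from v.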
