R: reverse behaviour of bquote

The bquote function evaluates the parts of an expression that are wrapped in a .() call. For example,
a <- 2
b <- 100
bquote(.(2 * a) * x + .(log10(b)))
would return
4 * x + 2
I want to write the opposite of this function: one that evaluates everything except the parts inside a .() call. This is the desired behaviour:
a <- 2
b <- 100
bquote(2 * a * .(x) + log10(b))
## 4 * x + 2
I understand that to do this I have to walk the abstract syntax tree and evaluate the branches that contain no .() call, but I couldn't get the recursion right.
Could you help me write such a function?

subst will substitute all variables except those within .(...), and the simplify function will simplify sub-trees that contain no variables -- omit the simplify step if simplification is not needed. No packages are used.
subst <- function(e) {
  if (typeof(e) == "language") {
    if (identical(e[[1]], as.name("."))) e[[2]]  # .(x): return the contents unevaluated
    else {
      if (length(e) > 1) e[-1] <- lapply(as.list(e[-1]), subst)  # recurse into the arguments
      e
    }
  } else {
    eval(e)  # leaf (symbol or constant): evaluate it
  }
}
simplify <- function(e) {
  if (typeof(e) == "language") {
    if (length(all.vars(e))) {  # sub-tree still contains variables
      if (length(e) > 1) {
        e[-1] <- lapply(as.list(e[-1]), simplify)  # recurse into the arguments
        e
      } else e
    } else eval(e)              # variable-free sub-tree: evaluate it
  } else e
}
inverse_bquote <- function(x, SIMPLIFY = TRUE) {
  result <- subst(substitute(x))
  if (SIMPLIFY) simplify(result) else result
}
Now test it out.
a <- 2
b <- 100
inverse_bquote(2 * a * .(x) + log10(b))
## 4 * x + 2
# without simplification
inverse_bquote(2 * a * .(x) + log10(b), SIMPLIFY = FALSE)
## 2 * 2 * x + log10(100)
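As a quick sanity check of the recursion on its own (my addition, not part of the original answer), subst() can also be applied directly to an already-quoted expression:
a <- 2
subst(quote(2 * a + .(x)))
## 2 * 2 + x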
Update: Added simplification. Made it optional.

Slow recursion even with memoization in R

I'm trying to solve problem #14 of Project Euler.
The main objective is finding the length of each Collatz sequence (the answer is the starting number that produces the longest one).
First I solved the problem with a regular loop:
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1            # hashmap[i] caches the chain length for i
  for (i in 1:n) {
    chain <- 1
    number <- i
    while (number > 1) {
      if (!is.na(hashmap[number])) {  # already computed: reuse it
        chain <- chain + hashmap[number]
        break
      }
      if (number %% 2 == 0) {
        chain <- chain + 1
        number <- number / 2
      } else {
        chain <- chain + 2            # combine 3n+1 and the halving that follows
        number <- (3 * number + 1) / 2
      }
    }
    hashmap[i] <- chain
    if (chain > max_chain) {
      max_chain <- chain
      result <- i
    }
  }
  return(result)
}
This takes only 2 seconds for n = 1000000.
Then I decided to replace the while loop with recursion:
len_collatz_chain <- function(n, hashmap) {
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    return(hashmap[n])
  }
  get_len(n)
  return(hashmap)
}
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1
  for (i in 1:n) {
    hashmap <- len_collatz_chain(i, hashmap)
    print(length(hashmap))
    if (hashmap[i] > max_chain) {
      max_chain <- hashmap[i]
      result <- i
    }
  }
  return(result)
}
This solution works, but it is very slow: almost 1 minute for n = 10000.
I suspect one of the reasons is that R copies the hashmap object on every call to len_collatz_chain.
I know about the Rcpp package, and yes, the first solution works fine, but I can't understand where I'm going wrong here.
Any tips?
For comparison, my recursive Python solution runs in about 1 second for n = 1000000:
def len_collatz_chain(n: int, hashmap: dict) -> int:
    if n not in hashmap:
        hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
    return hashmap[n]

def compute(n: int) -> int:
    result, max_chain, hashmap = 0, 0, {1: 1}
    for i in range(2, n):
        chain = len_collatz_chain(i, hashmap)
        if chain > max_chain:
            result, max_chain = i, chain
    return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary, and that this hashmap is passed around as a function argument many times.
In Python, when a dictionary is passed as a function argument, only a reference to the actual data is transferred to the called function. This is fast: the called function works on the same data as the caller.
In R, a vector passed as a function argument is copied as soon as the called function modifies it (copy-on-modify). This is potentially slow, but safer in the sense that the called function cannot alter the caller's data.
This is the main reason that Python is so much faster on your code.
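A minimal sketch (my addition, not part of the original answer) that makes the copy visible with base R's tracemem(), which prints a message whenever the traced vector is duplicated:
x <- runif(10)
tracemem(x)            # start tracing copies of x
f <- function(v) {
  v[1] <- 0            # modifying the argument triggers a duplication
  v
}
invisible(f(x))        # tracemem should report a copy here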
You can, however, alter the R code slightly so that the hashmap is no longer passed as a function argument:
len_collatz_chain <- local({
  hashmap <- 1L
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    hashmap[n]
  }
  get_len
})
compute <- function(n) {
  result <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    result[i] <- len_collatz_chain(i)
  }
  result
}
compute(n=10000)
This makes the R code much faster (Python will probably still be faster, though).
Note that I have also removed the return() statements in the R code, as they are not needed and add one level to the call stack.
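For comparison, here is a sketch (my addition, not part of the original answer) that uses an environment as the cache. R environments are hashed and passed by reference, so they behave much like Python dictionaries; note, though, that the character keys make lookups slower than integer indexing into a vector:
hashmap <- new.env(hash = TRUE)
assign("1", 1L, envir = hashmap)

len_chain <- function(n) {
  key <- as.character(n)
  if (!exists(key, envir = hashmap, inherits = FALSE)) {
    len <- if (n %% 2 == 0) 1L + len_chain(n / 2) else 2L + len_chain((3 * n + 1) / 2)
    assign(key, len, envir = hashmap)
  }
  get(key, envir = hashmap, inherits = FALSE)
}

len_chain(6)   # the sequence 6, 3, 10, 5, 16, 8, 4, 2, 1 has 9 terms with this counting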

How can I define a sequence of recursive functions?

I tried to define the following sequence of functions but failed. Any suggestions are welcome.
H = list()
H[[1]] = function(x) 1
for (i in 2:4) H[[i]] = function(x) H[[i-1]](x) * x + 1
> H
[[1]]
function (x)
1
[[2]]
function (x)
H[[i - 1]](x) * x + 1
[[3]]
function (x)
H[[i - 1]](x) * x + 1
[[4]]
function (x)
H[[i - 1]](x) * x + 1
> H[[1]](1)
1
> H[[2]](1)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Instead of defining a set of functions recursively, define a single recursive function:
H <- function(x, n) {
  if (n == 1) 1 else H(x, n-1) * x + 1
}
Then, H(x, n) returns the same as your H[[n]](x).
For completeness' sake: your approach with the for loop does not work because each function looks up i at the moment it is called, not at the moment it was defined.
At the end of the loop i is 4, so when you call H[[2]](10), R tries to compute H[[i-1]](10) * 10 + 1 = H[[3]](10) * 10 + 1 = ..., which ends in infinite recursion.
Simply put, R does not remember that i was equal to 2 at the moment H[[2]] was defined.
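If you really do want a list of functions, a minimal sketch (my addition) is to freeze the loop index in each closure's own environment, for instance with local():
H <- list()
H[[1]] <- function(x) 1
for (i in 2:4) {
  H[[i]] <- local({
    j <- i                                # capture the current loop index for this closure
    function(x) H[[j - 1]](x) * x + 1
  })
}

H[[2]](1)   # 2
H[[4]](2)   # ((1*2 + 1)*2 + 1)*2 + 1 = 15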

What is wrong with my R for-loop that sums a series?

Here is my function, which uses a loop:
answer = function(a, n) {
  for (k in 0:n) {
    x =+ (a^k)/factorial(k)
  }
  return(x)
}
answer(1, 2) should return 2.5, since that is the value of
1^0 / 0! + 1^1 / 1! + 1^2 / 2! = 1 + 1 + 0.5 = 2.5
But I get
answer(1,2)
#[1] 0.5
It looks like it fails to accumulate all three terms and just stores the newest value each time. += does not work in R, so I used =+, but it is still not right. Thanks.
There is no =+ operator in R: x =+ (a^k)/factorial(k) parses as x = +((a^k)/factorial(k)), i.e. plain assignment of the unary-plus value, so each iteration overwrites x with the current term and you are left with only the last one, 0.5. Initialize an accumulator and add to it explicitly:
answer = function(a, n) {
  x <- 0                           ## initialize the accumulator
  for (k in 0:n) {
    x <- x + (a^k) / factorial(k)  ## note how to accumulate a value in R
  }
  return(x)
}
answer(1, 2)
#[1] 2.5
There is "vectorized" solution:
answer = function(a,n) {
x <- a ^ (0:n) / factorial(0:n)
return(sum(x))
}
In this case you don't need to initialize anything. R will allocate memory behind that <- and sum.
You are using the Taylor expansion to approximate exp(a). See this Q & A on the theme. You may want to pay special attention to the "numerical convergence" issue mentioned in my answer.
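To illustrate that point, here is a minimal sketch (my addition, assuming the issue referred to is the overflow of a^k and factorial(k) for large k): build each term from the previous one as term * a / k, so that neither a^k nor factorial(k) is ever formed explicitly.
answer_stable = function(a, n) {
  term <- 1      # the k = 0 term: a^0 / 0!
  total <- 1
  for (k in seq_len(n)) {
    term <- term * a / k     # a^k/k! = (a^(k-1)/(k-1)!) * (a/k)
    total <- total + term
  }
  total
}
answer_stable(1, 2)
#[1] 2.5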

R lpsolve see all possible solutions of an integral LP

Is there a way to make lpSolve return multiple solutions? In the case below I want both (5,0) and (0,5).
If lpSolve cannot do that, is there any other R package that will return all possible solutions of an integer linear program?
library("lpSolve")
A=matrix (c(1, 1), nrow=1, byrow=TRUE)
b=(5)
signs='=='
c_=c(1,1)
res = lpSolve::lp('max', c_, A, signs, b, all.int = TRUE)
res$solution
=======================================================================
I would also like to know why the lpSolve package can provide all possible solutions when all decision variables are binary, and why it cannot do the same when the variables are general integers.
The code below works around this by encoding each integer variable in binary (nbBits bits per variable). After each solve, it adds a "no-good" cut that excludes the solution x* just found, namely sum_i (2*x*_i - 1) * x_i <= sum(x*) - 1, which every 0/1 vector except x* satisfies, and then re-solves until the model becomes infeasible. Code:
library(lpSolveAPI)

# expand integer-variable coefficients v into coefficients on the binary
# variables: entry v[i] becomes v[i]*2^0, v[i]*2^1, ..., v[i]*2^(nbits-1)
vBiv_of_v <- function(nbits, v) {
  taillev <- length(v)
  taillevBivalent <- nbits * taillev
  vBivalent <- rep(0, taillevBivalent)
  for (iLg in seq(1, taillev)) {
    iCoef <- 1
    for (iDelta in seq(1, nbits)) {
      vBivalent[(iLg - 1) * nbits + iDelta] <- iCoef * v[iLg]
      iCoef <- iCoef * 2
    }
  }
  vBivalent
}

# decode a binary solution vector back into the integer variables
vBiv_to_v <- function(nbits, vBivalent) {
  taillevBivalent <- length(vBivalent)
  taillev <- taillevBivalent / nbits
  v <- rep(0, taillev)
  for (iLg in seq(1, taillev)) {
    for (iDelta in seq(1, nbits)) {
      v[iLg] <- v[iLg] + 2^(iDelta - 1) * vBivalent[(iLg - 1) * nbits + iDelta]
    }
  }
  v
}
nbVariable <- 2
nbBits <- 3
nbVariableBivalentes <- nbVariable * nbBits
f.obj <- rep(0, nbVariableBivalentes)

mylp <- make.lp(0, nbVariableBivalentes)
set.objfn(mylp, f.obj)
add.constraint(mylp, vBiv_of_v(nbBits, c(1, 1)), "=", 5)
set.type(mylp, 1:nbVariableBivalentes, type = "binary")

repeat {
  status <- solve(mylp)
  if (status == 0) {
    last_sol <- get.variables(mylp)
    vRes <- vBiv_to_v(nbBits, last_sol)
    cat(vRes[1], vRes[2], "\n")
    # add a cut excluding the solution just found
    new_rhs <- 0
    f.condSup <- rep(0, nbVariableBivalentes)
    for (iCol in 1:nbVariableBivalentes) {
      f.condSup[iCol] <- 2 * last_sol[iCol] - 1
      new_rhs <- new_rhs + last_sol[iCol]
    }
    add.constraint(mylp, f.condSup, "<=", new_rhs - 1)
  } else if (status == 2) {
    cat("No more solution.\n")
    break
  }
}
Result:
5 0
4 1
3 2
1 4
2 3
0 5
No more solution.

Mixture modeling - trouble with infinite values from exp() and log()

I'm writing a function for Gaussian mixture models with spherical covariance structures, i.e. $\Sigma_k = \sigma_k^2 I$. This particular model is similar to the mclust package's identifier VII.
http://en.wikipedia.org/wiki/Mixture_model
Anyway, the problem I'm having is running into infinite values in the weight matrix. Definition: let W be an n x m matrix, with rows indexed by i = 1, ..., n (number of obs) and columns by m (number of mixtures). Each element of W (i.e. w_im) essentially has the form
w_im = \frac{a / b * exp(c)}{\sum_{j=1}^m [a_j / b_j * exp(c_j)]}
Computing this numerically gives me infinite values. So I'm trying to use the log identity log(x + y) = log(x) + log(1 + y/x). But the issue is that it's not as simple as log(x + y), but rather log(\sum_{j=1}^m [a_j / b_j * exp(c_j)]).
Here is some notation for the code below:
n_im = a / b * exp(c)  (the numerator of w_im);
d_m = \sum_{j=1}^m [a_j / b_j * exp(c_j)]  (the denominator); and
c_mat[i, j] is the value of the exponent for the [i, j]-th term.
n_mat[, i] <- log(a[i]) - log(b[i]) + c_mat[, i]  # numerator of w_im on the log scale
internal_vec1[i] <- (a[i] * b[1]) / (a[1] * b[i])  # internal term for the step below
c_mat2 <- cbind(rep(1, n), c_mat[, 1] - c_mat[, -1])  # since e^a / e^b = e^(a-b)
for (i in 1:n) {
  d_vec[i] <- n_mat[i, 1] + log(sum(internal_vec1 * exp(c_mat2[i, ])))
}
## still getting infinite values
I'm trying to state the problem as briefly as possible; the entire function is obviously much larger than this. But since the problem I'm running into deals specifically with infinite (and 1/infinity) values, I'm hoping this snippet is sufficient. Does anyone have a coding trick here?
Here is the solution!! (I've spent way too damn long on this.)
* The first function, log_plus(), solves the simple problem where you want log(\sum_{i=1}^n x_i).
* The second function, log_plus2(), solves the more complicated problem described above, where you want log(\sum_{i=1}^n [a_i / b_i * exp(c_i)]).
log_plus <- function(xvec) {
  m <- length(xvec)
  x <- log(xvec[1])
  for (j in 2:m) {
    sum_j <- sum(xvec[1:(j-1)])         # running sum of the first j-1 terms
    x <- x + log(1 + xvec[j] / sum_j)   # log(x+y) = log(x) + log(1 + y/x)
  }
  return(x)
}
log_plus2 <- function(a, b, c) {
  # assumes intended input of the form sum(a/b * e^c)
  if ((length(a) != length(b)) || (length(a) != length(c))) {
    stop("Input equal length vectors")
  }
  if (!(all(c > 0) || all(c < 0))) {
    stop("All values of c must be either > 0 or < 0.")
  }
  m <- length(a)
  # initialize the log sum with the first term
  x <- log(a[1]) - log(b[1]) + c[1]
  # aggregate the log sum term by term
  for (j in 2:m) {
    idx <- 1:(j - 1)
    # shift the exponents towards zero so exp() does not overflow
    c_ref <- if (all(c > 0)) min(c[idx]) else max(c[idx])
    b2 <- b[idx]
    # build the rescaled denominator: sum over i < j of a_i * prod(b2[-i]) * e^(c_i - c_ref)
    d1 <- 0
    for (i in idx) {
      d1 <- d1 + a[i] * prod(b2[-i]) * exp(c[i] - c_ref)
    }
    den <- b[j] * d1
    num <- a[j] * prod(b2) * exp(c[j] - c_ref)
    x <- x + log(1 + num / den)
  }
  return(x)
}
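For reference, the more usual route is the standard log-sum-exp trick, sketched below (my addition, assuming every ratio a_i / b_i is positive so its log exists): move each term to the log scale and factor out the largest exponent, so exp() can never overflow.
log_sum_exp <- function(v) {
  v_max <- max(v)
  v_max + log(sum(exp(v - v_max)))   # log(sum(exp(v))) without overflow
}

# log(sum(a/b * exp(c))) computed entirely on the log scale:
# log_sum_exp(log(a) - log(b) + c)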
