Z3Py solver producing different results in Jupyter

I'm learning how to use Z3Py through the Jupyter notebooks provided here, starting with guide.ipynb. I noticed something odd when running the example code below, which is included in the Boolean Logic section.
p = Bool('p')
q = Bool('q')
r = Bool('r')
solve(Implies(p, q), r == Not(q), Or(Not(p), r))
The first time I run this in the Jupyter notebook, it produces the result [p = False, q = True, r = False]. But if I run the code again (or outside of Jupyter), I instead get the result [q = False, p = False, r = True].
Am I doing something wrong to get these different results? Also, since the notebook doesn't say it, which solution is actually correct?

If you take both obtained results, i.e. the assignments to your Boolean variables, you'll see that each assignment satisfies your constraints. Hence, both results are correct.
The fact that you obtain different results on different platforms/environments might seem odd, but it can be explained: SMT solvers typically use heuristics during their solving process; these are often randomised, and different environments may yield different random seeds.
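For what it's worth, you can verify both assignments mechanically by substituting them into the constraints; here is a minimal Z3Py sketch (the seed remark at the end is an assumption that may depend on your Z3 version):
from z3 import Bool, BoolVal, And, Implies, Not, Or, simplify, substitute

p, q, r = Bool('p'), Bool('q'), Bool('r')
constraints = And(Implies(p, q), r == Not(q), Or(Not(p), r))

# Substitute each reported assignment into the constraints;
# both simplify to True, i.e. both are models of the formula.
for subs in ([(p, BoolVal(False)), (q, BoolVal(True)), (r, BoolVal(False))],
             [(p, BoolVal(False)), (q, BoolVal(False)), (r, BoolVal(True))]):
    print(simplify(substitute(constraints, *subs)))

# If you want reproducible runs, Z3 exposes random-seed parameters,
# e.g. set_param('smt.random_seed', 0); whether this removes all
# run-to-run variation may depend on the Z3 version and tactics used.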
Bottom line: it's all good :-)

Related

Correct way to generate Poisson-distributed random numbers in Julia GPU code?

For a stochastic solver that will run on a GPU, I'm currently trying to draw Poisson-distributed random numbers. I will need one number for each entry of a large array. The array lives in device memory and will also be deterministically updated afterwards. The problem I'm facing is that the mean of the distribution depends on the old value of the entry. Therefore, I would naively have to do something like:
CUDA.rand_poisson!(lambda=array*constant)
or:
array = CUDA.rand_poisson(lambda=array*constant)
Neither of these works, which does not really surprise me, but maybe I just need a better understanding of broadcasting?
Then I tried writing a kernel which looks like this:
function cu_draw_rho!(rho::CuDeviceVector{FloatType}, λ::FloatType)
    idx = (blockIdx().x - 1i32) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    @inbounds for i = idx:stride:length(rho)
        l = rho[i] * λ
        # 1. variant
        rho[i] > 0.0f0 && (rho[i] = FloatType(CUDA.rand_poisson(UInt32, 1; lambda = l)))
        # 2. variant
        rho[i] > 0.0f0 && (rho[i] = FloatType(rand(Poisson(lambda = l))))
    end
    return
end
And many slight variations of the above. I get tons of errors about dynamic function calls, which I connect to the fact that I'm calling functions meant for arrays from my kernels. The second variant, using rand(), works only without the Poisson argument (which uses the Distributions package, I guess?).
What is the correct way to do this?
You may want CURAND.jl, which provides curand_poisson.
using CURAND
n = 10
lambda = .5
curand_poisson(n, lambda)
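As an aside, and only to pin down the intended semantics rather than to offer a GPU solution: the operation being asked for is one Poisson draw per array entry, with the rate taken from that entry. On the CPU, NumPy (used here purely for illustration) expresses this directly via an array-valued lam:
import numpy as np

rng = np.random.default_rng(0)
rho = rng.uniform(0.0, 5.0, size=1000)   # stand-in for the device array
constant = 2.0
# One draw per entry; the rate is broadcast elementwise from rho * constant.
draws = rng.poisson(lam=rho * constant)
print(draws[:10])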

How to get upper and lower bounds of objective vector in gurobi R

I'm trying to get the upper and lower bound vectors of the objective vector that will keep the same optimal solution of a linear program. I am using gurobi in R to solve my LP. The Gurobi reference manual says that the attributes SAObjLow and SAObjUp will give you these bounds, but I cannot find them in the output of my gurobi call.
Is there a special way to tell the solver to return these vectors?
The only values that I see in the output of my gurobi call are status, runtime, itercount, baritercount, nodecount, objval, x, slack, rc, pi, vbasis, cbasis, objbound. The dual variables and reduced costs are returned in pi and rc, but not bounds on the objective vector.
I have tried forcing all 6 different 'methods' but none of them return what I'm looking for.
I know I can get these easily using the lpsolve R package, but I'm solving a relatively large problem and I trust gurobi more than this package.
Here's a reproducible example...
library(gurobi)
model = list()
model$obj = c(500,450)
model$modelsense = 'max'
model$A = matrix(c(6,10,1,5,20,0),3,2)
model$rhs = c(60,150,8)
model$sense = '<'
sol = gurobi(model)
names(sol)
Ideally something like SAObjLow would be one of the possible entries in sol.
Not all attributes are available in the Gurobi R interface; this includes the ones for sensitivity analysis.
You may find this example helpful.
Alternatively, you can use a different API, like Python, to query all available information.
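For instance, here is a minimal gurobipy sketch of the same LP as in the R example above; SAObjLow and SAObjUp are documented Gurobi variable attributes, though you should verify that your installed version exposes them this way:
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("sensitivity")
x1 = m.addVar(obj=500.0, name="x1")
x2 = m.addVar(obj=450.0, name="x2")
m.ModelSense = GRB.MAXIMIZE
# Same constraint matrix as model$A in the R example above.
m.addConstr(6 * x1 + 5 * x2 <= 60)
m.addConstr(10 * x1 + 20 * x2 <= 150)
m.addConstr(x1 <= 8)
m.optimize()

# Objective-coefficient ranges that keep the current basis optimal:
print(m.getAttr("SAObjLow", m.getVars()))
print(m.getAttr("SAObjUp", m.getVars()))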

CRAN package submission: "Error: C stack usage is too close to the limit"

Right upfront: this is an issue I encountered when submitting an R package to CRAN. So I
don't have control of the stack size (as the issue occurred on one of CRAN's platforms)
can't provide a reproducible example (as I don't know the exact configurations on CRAN)
Problem
When trying to submit the cSEM.DGP package to CRAN the automatic pretest (for Debian x86_64-pc-linux-gnu; not for Windows!) failed with the NOTE: C stack usage 7975520 is too close to the limit.
I know this is caused by a function with three arguments whose body is about 800 rows long. The function body consists of additions and multiplications of these arguments. It is the function varzeta6(), which you can find here (from row 647 onwards).
How can I address this?
Things I can't do:
provide a reproducible example (at least I would not know how)
change the stack size
Things I am thinking of:
try to break the function into smaller pieces. But I don't know how best to do that.
somehow precompile the function (to be honest, I am just guessing) so CRAN doesn't complain?
Let me know your ideas!
Details / Background
The reason why varzeta6() (and varzeta4() / varzeta5(), and even more so varzeta7()) are so long and R-inefficient is that they are essentially copy-pasted from Mathematica (after simplifying the Mathematica code as much as possible and adapting it to be valid R code). Hence, the code is by no means R-optimized (which @MauritsEvers rightly pointed out).
Why do we need Mathematica? Because what we need is the general form of the model-implied construct correlation matrix of a recursive structural equation model with up to 8 constructs, as a function of the parameters of the model equations. In addition, there are constraints.
To get a feel for the problem, let's take a system of two equations that can be solved recursively:
Y2 = beta1*Y1 + zeta1
Y3 = beta2*Y1 + beta3*Y2 + zeta2
What we are interested in are the covariances E(Y1*Y2), E(Y1*Y3), and E(Y2*Y3) as functions of beta1, beta2, beta3, under the constraints that
E(Y1) = E(Y2) = E(Y3) = 0,
E(Y1^2) = E(Y2^2) = E(Y3^2) = 1
E(Yi*zeta_j) = 0 (with i = 1, 2, 3 and j = 1, 2)
For such a simple model, this is rather trivial:
E(Y1*Y2) = E(Y1*(beta1*Y1 + zeta1)) = beta1*E(Y1^2) + E(Y1*zeta1) = beta1
E(Y1*Y3) = E(Y1*(beta2*Y1 + beta3*(beta1*Y1 + zeta1) + zeta2)) = beta2 + beta3*beta1
E(Y2*Y3) = ...
But you see how quickly this gets messy when you add Y4, Y5, up to Y8.
In general, the model-implied construct correlation matrix can be written as (the expression actually looks more complicated because we also allow for up to 5 exogenous constructs; this is why varzeta1() already looks complicated. But ignore this for now):
V(Y) = (I - B)^-1 V(zeta)(I - B)'^-1
where I is the identity matrix and B a lower triangular matrix of model parameters (the betas); V(zeta) is a diagonal matrix. The functions varzeta1(), varzeta2(), ..., varzeta7() compute the main diagonal elements. Since we constrain Var(Yi) to always be 1, the variances of the zetas follow. Take for example the equation Var(Y2) = beta1^2*Var(Y1) + Var(zeta1) --> Var(zeta1) = 1 - beta1^2. This looks simple here, but it becomes extremely complicated when we take the variance of, say, the 6th equation in such a chain of recursive equations, because Var(zeta6) depends on all previous covariances between Y1, ..., Y5, which are themselves dependent on their respective previous covariances.
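If it helps, the formula can be checked numerically on the two-equation example above. The following is a minimal sketch (NumPy used purely for illustration; the beta values are made up):
import numpy as np

beta1, beta2, beta3 = 0.4, 0.3, 0.2
# B is lower triangular: Y = B*Y + zeta in the notation above.
B = np.array([[0.0,   0.0,   0.0],
              [beta1, 0.0,   0.0],
              [beta2, beta3, 0.0]])
# V(zeta) is diagonal, chosen so that every Var(Yi) = 1:
V_zeta = np.diag([1.0,
                  1 - beta1**2,
                  1 - beta2**2 - beta3**2 - 2 * beta1 * beta2 * beta3])
I = np.eye(3)
L = np.linalg.inv(I - B)
V = L @ V_zeta @ L.T               # V(Y) = (I - B)^-1 V(zeta) (I - B)'^-1
print(np.round(np.diag(V), 6))     # all ones, as constrained
print(np.isclose(V[0, 1], beta1))                  # E(Y1*Y2) = beta1
print(np.isclose(V[0, 2], beta2 + beta3 * beta1))  # E(Y1*Y3) = beta2 + beta3*beta1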
OK, I don't know if that makes things any clearer. Here are the main points:
The code for varzeta1(), ..., varzeta7() is copy-pasted from Mathematica and hence not R-optimized.
Mathematica is required because, as far as I know, R cannot handle symbolic calculations.
I could R-optimize "by hand" (which is extremely tedious).
I think the structure of the varzetaX() functions must be taken as given. The question therefore is: can I somehow use these functions anyway?
One conceivable approach is to try to convince the CRAN maintainers that there's no easy way for you to fix the problem. This is a NOTE, not a WARNING; the CRAN repository policy says:
In principle, packages must pass R CMD check without warnings or significant notes to be admitted to the main CRAN package area. If there are warnings or notes you cannot eliminate (for example because you believe them to be spurious) send an explanatory note as part of your covering email, or as a comment on the submission form
So, you could take a chance that your well-reasoned explanation (in the comments field on the submission form) will convince the CRAN maintainers. In the long run it would be best to find a way to simplify the computations, but it might not be necessary to do it before submission to CRAN.
This is a bit too long as a comment, but hopefully this will give you some ideas for optimising the code for the varzeta* functions; or at the very least, it might give you some food for thought.
There are a few things that confuse me:
All varzeta* functions have arguments beta, gamma and phi, which seem to be matrices. However, in varzeta1 you don't use beta, yet beta is the first function argument.
I struggle to link the details you give at the bottom of your post with the code for the varzeta* functions. You don't explain where the gamma and phi matrices come from, nor what they denote. Furthermore, seeing that the betas are the model's parameter estimates, I don't understand why beta should be a matrix.
As I mentioned in my earlier comment, I would be very surprised if these expressions cannot be simplified. R can do a lot of matrix operations quite comfortably, there shouldn't really be a need to pre-calculate individual terms.
For example, you can use crossprod and tcrossprod to calculate cross products, and %*% implements matrix multiplication.
Secondly, a lot of mathematical operations in R are vectorised. I already mentioned that you can simplify
1 - gamma[1,1]^2 - gamma[1,2]^2 - gamma[1,3]^2 - gamma[1,4]^2 - gamma[1,5]^2
as
1 - sum(gamma[1, ]^2)
since the ^ operator is vectorised.
Perhaps more fundamentally, this seems somewhat of an XY problem to me where it might help to take a step back. Not knowing the full details of what you're trying to model (as I said, I can't link the details you give to the cSEM.DGP code), I would start by exploring how to solve the recursive SEM in R. I don't really see the need for Mathematica here. As I said earlier, matrix operations are very standard in R; analytically solving a set of recursive equations is also possible in R. Since you seem to come from the Mathematica realm, it might be good to discuss this with a local R coding expert.
If you must use those scary varzeta* functions (and I really doubt that), an option may be to rewrite them in C++ and then compile them with Rcpp to turn them into R functions. Perhaps that will avoid the C stack usage limit?

BDgraph R package producing different (but consistent) results on different OSs

I'm producing samples from a G-Wishart distribution (see, for example, Mohammadi and Wit (2015) and Mohammadi et al. (2017)) using the BDgraph package in R, but I'm getting different results from one OS to another.
The results are however consistent on the same OS across different machines!
To see this (and to give a minimal reproducible example), I'll sample from the rgwish function on one OS (say, Linux):
library(BDgraph)
N = 10000
s = 7
nu = s + 5
m = sample(5:50, s, replace = TRUE)
G = matrix(nrow = s, ncol = s,
           c(0,1,0,0,0,0,0,
             0,0,1,1,0,0,0,
             0,0,0,1,1,1,0,
             0,0,0,0,1,1,0,
             0,0,0,0,0,1,0,
             0,0,0,0,0,0,1,
             0,0,0,0,0,0,0))
sample_linux <- rgwish(n = N, adj.g = G, b = nu - s + 1, D = diag(m, s, s))
save.image("foo.RData")
I'll then save the resulting samples and the parameters somewhere, reboot into (say) Windows, and run:
load("foo.RData")
library(BDgraph)
sample_win <- rgwish(n = N, adj.g = G, b = nu - s + 1, D = diag(m, s, s))
plot(density(sample_linux[7,7,], n = 2024), type = "l")
lines(density(sample_win[7,7,], n = 2024), col = "red")
The two marginal distributions (of this last diagonal element, in this example) are clearly different in my experience.
If I however repeat the procedure on another machine with linux installed the two samples coincide.
The underlying graph G doesn't seem to matter; I've tried with both decomposable and non-decomposable graphs, and tried different formats for the adjacency matrix (with diagonal or not, symmetric or upper triangular, etc.), although the one here seems to be the preferred format, and inside the rgwish function the authors correct for it anyway.
The R version is 3.4.1 on all the machines, and BDgraph and all connected packages are at their latest available versions*.
For those who might be curious, OSX gives a consistently different third set of answers...
The only things changing that I can think of are the BLAS and LAPACK libraries, but I haven't installed any "experimental"/weird packages: OpenBLAS on both my Linux systems, and I don't even know which one on Windows (the one R comes with in the binaries from CRAN)...
EDIT: I suppose that there wasn't really a question, so...what do you think of it? Any idea why this could happen? Any idea how to solve the issue?
Until proven wrong I'll assume I'm the one doing something wrong, either in sampling or in verifying, so I decided to write here before contacting the maintainer of the package directly.
*(igraph was compiled from GitHub in both cases, as the normal install on Linux fails.)
The problem is solved as of (I believe) version 2.42 of the package.
The issue was with sampling random numbers inside an OpenMP parallel region. Linux and OSX could make use of OpenMP while my version under Windows couldn't, hence the different results under different OSs (the Windows version was the correct one, for reference).
The author of the package figured out the problem and provided the fix, which will be available from the next release as of the time of this answer.

How to handle boundary constraints when using `nls.lm` in R

I asked this question a while ago. I am not sure whether I should post this as an answer or a new question. I do not have an answer, but I "solved" the problem by applying the Levenberg-Marquardt algorithm using nls.lm in R; when the solution is at the boundary, I run the trust-region-reflective algorithm (TRR, implemented in R) to step away from it. Now I have new questions.
From my experience, done this way the program reaches the optimum and is not so sensitive to the starting values. But this is only a practical method to step around the issues I encountered using nls.lm and also other optimization functions in R. I would like to know why nls.lm behaves this way for optimization problems with boundary constraints, and how to handle boundary constraints when using nls.lm in practice.
Below I give an example illustrating the two issues using nls.lm.
It is sensitive to starting values.
It stops when some parameter reaches the boundary.
A Reproducible Example: FOCUS Dataset D
library(devtools)
install_github("KineticEval","zhenglei-gao")
library(KineticEval)
data(FOCUS2006D)
km <- mkinmod.full(parent = list(type = "SFO", M0 = list(ini = 0.1, fixed = 0, lower = 0.0, upper = Inf), to = "m1"),
                   m1 = list(type = "SFO"), data = FOCUS2006D)
system.time(Fit.TRR <- KinEval(km, evalMethod = 'NLLS', optimMethod = 'TRR'))
system.time(Fit.LM <- KinEval(km, evalMethod = 'NLLS', optimMethod = 'LM', ctr = kingui.control(runTRR = FALSE)))
compare_multi_kinmod(km, rbind(Fit.TRR$par, Fit.LM$par))
dev.print(jpeg, "LMvsTRR.jpeg", width = 480)
The differential equations that describes the model/system is:
"d_parent = - k_parent * parent"
"d_m1 = - k_m1 * m1 + k_parent * f_parent_to_m1 * parent"
In the graph, on the left is the model with initial values, in the middle is the fitted model using "TRR" (similar to the algorithm in Matlab's lsqnonlin function), and on the right is the fitted model using "LM" with nls.lm. Looking at the fitted parameters (Fit.LM$par), you will find that one fitted parameter (f_parent_to_m1) is at the boundary 1. If I change the starting value for the parameter M0_parent from 0.1 to 100, then I get the same results using nls.lm and lsqnonlin. I have many cases like this one.
newpars <- rbind(Fit.TRR$par,Fit.LM$par)
rownames(newpars)<- c("TRR(lsqnonlin)","LM(nls.lm)")
newpars
M0_parent k_parent k_m1 f_parent_to_m1
TRR(lsqnonlin) 99.59848 0.09869773 0.005260654 0.514476
LM(nls.lm) 84.79150 0.06352110 0.014783294 1.000000
Apart from the above problems, it often happens that the Hessian returned by nls.lm is not invertible (especially when some parameters are on the boundary), so I cannot get an estimate of the covariance matrix. On the other hand, the "TRR" algorithm (in Matlab) almost always gives an estimate by calculating the Jacobian at the solution point. I think this is useful, but I am also sure that the R optimization algorithms (the ones I have tried) did not do this for a reason. I would like to know whether I am wrong to use the Matlab way of calculating the covariance matrix to get standard errors for the parameter estimates.
One last note: I claimed in my previous post that Matlab's lsqnonlin outperforms R's optimization functions in almost all cases. I was wrong. The "Trust-Region-Reflective" algorithm used in Matlab is in fact slower (sometimes much slower) when implemented in R, as you can see from the above example. However, it is still more stable and reaches a better solution than R's basic optimization algorithms.
First off, I am not an expert on Matlab and Optimisation and have never used R.
I am not sure I see what your actual question is, but maybe I can shed some light into your puzzlement:
LM is a slightly enhanced Gauß-Newton approach; for problems with several local minima it is very sensitive to initial states. Including boundaries typically generates more of those minima.
TRR is akin to LM, but more robust. It has better capabilities for "jumping out of" bad local minima. It is quite feasible that it will behave better, but perform worse, than LM. Actually explaining why is very hard; you would need to study the algorithms in detail and look at how they behave in this situation.
I cannot explain the difference between Matlab's and R's implementations, but there are several extensions to TRR that Matlab perhaps uses and R does not.
Does your approach of using LM and TRR alternatingly converge better than TRR alone?
Using the mkin package, you can find the parameters using the "Port" algorithm (which is also a kind of TRR algorithm, as far as I can tell from its documentation), or the "Marq" algorithm, which uses nls.lm in the background. Then you can use "normal" starting values or "bad" starting values.
library(mkin)
packageVersion("mkin")
Recent mkin versions can speed up the process considerably, as they compile the models from automatically generated C code if a compiler is available on your system (e.g. if you have r-base-dev installed on Debian/Ubuntu, or Rtools on Windows).
This defines the model:
m <- mkinmod(parent = mkinsub("SFO", "m1"),
             m1 = mkinsub("SFO"),
             use_of_ff = "max")
You can check that the differential equations are correct:
cat(m$diffs, sep = "\n")
Then we fit in four variants, Port and LM, with or without M0 fixed to 0.1:
f.Port = mkinfit(m, FOCUS_2006_D)
f.Port.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0))
f.LM = mkinfit(m, FOCUS_2006_D, method.modFit = "Marq")
f.LM.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0),
                  method.modFit = "Marq")
Then we look at the results:
results <- sapply(list(Port = f.Port, Port.M0 = f.Port.M0, LM = f.LM, LM.M0 = f.LM.M0),
                  function(x) round(summary(x)$bpar[, "Estimate"], 5))
which are
Port Port.M0 LM LM.M0
parent_0 99.59848 99.59848 99.59848 39.52278
k_parent 0.09870 0.09870 0.09870 0.00000
k_m1 0.00526 0.00526 0.00526 0.00000
f_parent_to_m1 0.51448 0.51448 0.51448 1.00000
So we can see that the Port algorithm finds the best solution (to the best of my knowledge) even with bad starting values. The speed issue that one may have with more complicated models is alleviated by the automatic generation of C code.
