Julia: #fastmath not defined - julia

function init!(u)
n = length(u)
dx = 1.0 / (n-1)
#fastmath #inbounds #simd for i in 1:n
u[i] = sin(2pi*dx*i)
end
end
when I execute above function in IJulia, it prompts below texts:
#fastmath not defined

Thanks Simon Byrne. My julia version is v0.3.8 less than v0.4. That's why julia prompts "#fastmath not defined".

Related

Optim Julia parameter meaning

I'm trying to use Optim in Julia to solve a two variable minimization problem, similar to the following
x = [1.0, 2.0, 3.0]
y = 1.0 .+ 2.0 .* x .+ [-0.3, 0.3, -0.1]
function sqerror(betas, X, Y)
err = 0.0
for i in 1:length(X)
pred_i = betas[1] + betas[2] * X[i]
err += (Y[i] - pred_i)^2
end
return err
end
res = optimize(b -> sqerror(b, x, y), [0.0,0.0])
res.minimizer
I do not quite understand what [0.0,0.0] means. By looking at the document http://julianlsolvers.github.io/Optim.jl/v0.9.3/user/minimization/. My understanding is that it is the initial condition. However, if I change that to [0.0,0., 0.0], the algorithm still work despite the fact that I only have two unknowns, and the algorithm gives me three instead of two minimizer. I was wondering if anyone knows what[0.0,0.0] really stands for.
It is initial value. optimize by itself cannot know how many values your sqerror function takes. You specify it by passing this initial value.
For example if you add dimensionality check to sqerror you will get a proper error:
julia> function sqerror(betas::AbstractVector, X::AbstractVector, Y::AbstractVector)
#assert length(betas) == 2
err = 0.0
for i in eachindex(X, Y)
pred_i = betas[1] + betas[2] * X[i]
err += (Y[i] - pred_i)^2
end
return err
end
sqerror (generic function with 2 methods)
julia> optimize(b -> sqerror(b, x, y), [0.0,0.0,0.0])
ERROR: AssertionError: length(betas) == 2
Note that I also changed the loop condition to eachindex(X, Y) to ensure that your function checks if X and Y vectors have aligned indices.
Finally if you want performance and reduce compilation cost (so e.g. assuming you do this optimization many times) it would be better to define your optimized function like this:
objective_factory(x, y) = b -> sqerror(b, x, y)
optimize(objective_factory(x, y), [0.0,0.0])

Fast summation of symmetric matrix in Julia

I want to sum all elements in a matrix A with dimension n times n. The matrix is symmetric and has 0s on the diagonal. The fastest way to do so that I have found is simply
sum(A). However this seems wasteful since it doesn't use the fact that I only need to calculate the lower triangle of the matrix. However, sum(tril(A, -1)) is significantly slower, and sum(A[i, j] for i = 1:n-1 for j = i+1:n) even more so. Is there a more efficient way to sum the matrix?
Edit: The solution by #AboAmmar performs well. Here is code (with summing the diagonal separately, something that can be removed if there is only zeros on the diagonal) to compare:
using BenchmarkTools
using LinearAlgebra
function sum_triu(A)
m, n = size(A)
#assert m == n
s = zero(eltype(A))
for j = 2:n
#simd for i = 1:j-1
s += #inbounds A[i,j]
end
end
s *= 2
for i = 1:n
s += A[i, i]
end
return s
end
N = 1000
A = Symmetric(rand(0:9,N,N))
A -= diagm(diag(A))
#btime sum(A)
#btime 2 * sum(tril(A))
#btime sum_triu(A)
This is 2.7X faster than sum for n = 1000 matrix. Make sure to add a #simd before the loop and use #inbounds. Also, use the correct loop order for fast memory access.
function sum_triu(A)
m, n = size(A)
#assert m == n
s = zero(eltype(A))
for j = 1:n
#simd for i = 1:j
s += #inbounds A[i,j]
end
end
return 2 * s
end
Example run on my PC:
sum_triu(A) = 499268.7328022966
sum(A) = 499268.73280229873
93.000 μs (0 allocations: 0 bytes)
249.900 μs (0 allocations: 0 bytes)
How about
2 * sum(LowerTriangular(A))
help?> LA.LowerTriangular
LowerTriangular(A::AbstractMatrix)
Construct a LowerTriangular view of the matrix A.
tril creates a new matrix, which allocates memory. Since a LowerTriangular is a view into the existing matrix, there's no memory allocation.

Why two styles of executing Juilia programs are giving different results?

If a run a program written in julia as
sachin#localhost:$ julia mettis.jl then it runs sucessfully, without printing anything, though one print statement is in it.
And Secondly If run it as by going in julia:
sachin#localhost:$ julia
julia> include("mettis.jl")
main (generic function with 1 method)`
julia> main()
Then It gives some error.
I am puzzled why two style of executing is giving different result ?
Here is my code:
using ITensors
using Printf
function ITensors.op(::OpName"expτSS", ::SiteType"S=1/2", s1::Index, s2::Index; τ)
h =
1 / 2 * op("S+", s1) * op("S-", s2) +
1 / 2 * op("S-", s1) * op("S+", s2) +
op("Sz", s1) * op("Sz", s2)
return exp(τ * h)
end
function main(; N=10, cutoff=1E-8, δτ=0.1, beta_max=2.0)
# Make an array of 'site' indices
s = siteinds("S=1/2", N; conserve_qns=true)
# #show s
# Make gates (1,2),(2,3),(3,4),...
gates = ops([("expτSS", (n, n + 1), (τ=-δτ / 2,)) for n in 1:(N - 1)], s)
# Include gates in reverse order to complete Trotter formula
append!(gates, reverse(gates))
# Initial state is infinite-temperature mixed state
rho = MPO(s, "Id") ./ √2
#show inner(rho, H)
# Make H for measuring the energy
terms = OpSum()
for j in 1:(N - 1)
terms += 1 / 2, "S+", j, "S-", j + 1
terms += 1 / 2, "S-", j, "S+", j + 1
terms += "Sz", j, "Sz", j + 1
end
H = MPO(terms, s)
# Do the time evolution by applying the gates
# for Nsteps steps
for β in 0:δτ:beta_max
energy = inner(rho, H)
#printf("β = %.2f energy = %.8f\n", β, energy)
rho = apply(gates, rho; cutoff)main
rho = rho / tr(rho)
end
# #show energy
return nothing
end
There is nothing special about a function called main in Julia and defining a function is different from calling it. Consequently a file mettis.jl with the following code:
function main()
println("Hello, World!")
end
will not "do" anything when run (julia mettis.jl). However if you actually call the function at the end:
function main()
println("Hello, World!")
end
main()
you get the expected result
$ julia mettis.jl
Hello, World!

Improving the speed of a for loop in Julia

Here is my code in Julia platform and I like to speed it up. Is there anyway that I can make this faster? It takes 0.5 seconds for a dataset of 50k*50k. I was expecting Julia to be a lot faster than this or I am not sure if I am doing a silly implementation.
ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
SV = rand(10,5)
function h_score_0(ar ,SV)
m = length(ar)
SC = Array{Float64,2}(undef, size(SV, 2), m)
for iter = 1:m
nodes = ar[iter]
for jj = 1:size(SV, 2)
mx = maximum(SV[nodes, jj])
mn = minimum(SV[nodes, jj])
term1 = (mx - mn)^2;
SC[jj, iter] = (term1);
end
end
return score = sum(SC, dims = 1)
end
You have some unnecessary allocations in your code:
mx = maximum(SV[nodes, jj])
mn = minimum(SV[nodes, jj])
Slices allocate, so each line makes a copy of the data here, you're actually copying the data twice, once on each line. You can either make sure to copy only once, or even better: use view, so there is no copy at all (note that view is much faster on Julia v1.5, in case you are using an older version).
SC = Array{Float64,2}(undef, size(SV, 2), m)
And no reason to create a matrix here, and sum over it afterwards, just accumulate while you are iterating:
score[i] += (mx - mn)^2
Here's a function that is >5x as fast on my laptop for the input data you specified:
function h_score_1(ar, SV)
score = zeros(eltype(SV), length(ar))
#inbounds for i in eachindex(ar)
nodes = ar[i]
for j in axes(SV, 2)
SVview = view(SV, nodes, j)
mx = maximum(SVview)
mn = minimum(SVview)
score[i] += (mx - mn)^2
end
end
return score
end
This function outputs a one-dimensional vector instead of a 1xN matrix in your original function.
In principle, this could be even faster if we replace
mx = maximum(SVview)
mn = minimum(SVview)
with
(mn, mx) = extrema(SVview)
which only traverses the vector once, instead of twice. Unfortunately, there is a performance issue with extrema, so it is currently not as fast as separate maximum/minimum calls: https://github.com/JuliaLang/julia/issues/31442
Finally, for absolutely getting the best performance at the cost of brevity, we can avoid creating a view at all and turn the calls to maximum and minimum into a single explicit loop traversal:
function h_score_2(ar, SV)
score = zeros(eltype(SV), length(ar))
#inbounds for i in eachindex(ar)
nodes = ar[i]
for j in axes(SV, 2)
mx, mn = -Inf, +Inf
for node in nodes
x = SV[node, j]
mx = ifelse(x > mx, x, mx)
mn = ifelse(x < mn, x, mn)
end
score[i] += (mx - mn)^2
end
end
return score
end
This also avoids the performance issue that extrema suffers, and looks up the SV element once per node. Although this version is annoying to write, it's substantially faster, even on Julia 1.5 where views are free. Here are some benchmark timings with your test data:
julia> using BenchmarkTools
julia> #btime h_score_0($ar, $SV)
2.344 μs (52 allocations: 6.19 KiB)
1×5 Matrix{Float64}:
1.95458 2.94592 2.79438 0.709745 1.85877
julia> #btime h_score_1($ar, $SV)
392.035 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
1.9545848011260765
2.9459235098820167
2.794383144368953
0.7097448590904598
1.8587691646610984
julia> #btime h_score_2($ar, $SV)
118.243 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
1.9545848011260765
2.9459235098820167
2.794383144368953
0.7097448590904598
1.8587691646610984
So explicitly writing out the innermost loop is worth it here, reducing time by another 3x or so. It's annoying that the Julia compiler isn't yet able to generate code this efficient, but it does get smarter with every version. On the other hand, the explicit loop version will be fast forever, so if this code is really performance critical, it's probably worth writing it out like this.

Compiling A Mexfile using R CMD SHLIB

I am trying to import a number of Fortran 90 codes into R for a project. They were initially written with a mex (matlab integration of Fortran routines) type compilation in mind. This is what one of the codes look like:
# include <fintrf.h>
subroutine mexFunction(nlhs, plhs, nrhs, prhs)
!--------------------------------------------------------------
! MEX file for VFI3FCN routine
!
! log M_{t,t+1} = log \beta + gamma (x_t - x_{t+1})
! gamma = gamA + gamB (x_t - xbar)
!
!--------------------------------------------------------------
implicit none
mwPointer plhs(*), prhs(*)
integer nlhs, nrhs
mwPointer mxGetM, mxGetPr, mxCreateDoubleMatrix
mwPointer nk, nkp, nz, nx, nh
mwSize col_hxz, col_hz, col_xz
! check for proper number of arguments.
if(nrhs .ne. 31) then
call mexErrMsgTxt('31 input variables required.')
elseif(nlhs .ne. 4) then
call mexErrMsgTxt('4 output variables required.')
endif
! get the size of the input array.
nk = mxGetM(prhs(5))
nx = mxGetM(prhs(7))
nz = mxGetM(prhs(11))
nh = mxGetM(prhs(14))
nkp = mxGetM(prhs(16))
col_hxz = nx*nz*nh
col_xz = nx*nz
col_hz = nz*nh
! create matrix for the return arguments.
plhs(1) = mxCreateDoubleMatrix(nk, col_hxz, 0)
plhs(2) = mxCreateDoubleMatrix(nk, col_hxz, 0)
plhs(3) = mxCreateDoubleMatrix(nk, col_hxz, 0)
plhs(4) = mxCreateDoubleMatrix(nk, col_hxz, 0)
call vfi3fcnIEccB(%val(mxGetPr(plhs(1))), nkp)
return
end
subroutine vfi3fcnIEccB(optK, V, I, div, & ! output variables
alp1, alp2, alp3, V0, k, nk, x, xbar, nx, Qx, z, nz, Qz, h, nh, kp, &
alpha, beta, delta, f, gamA, gamB, gP, gN, istar, kmin, kmtrx, ksubm, hmtrx, xmtrx, zmtrx, &
nkp, col_hxz, col_xz, col_hz)
use ifwin
implicit none
! specify input and output variables
integer, intent(in) :: nk, nkp, nx, nz, nh, col_hxz, col_xz, col_hz
real*8, intent(out) :: V(nk, col_hxz), optK(nk, col_hxz), I(nk, col_hxz), div(nk, col_hxz)
real*8, intent(in) :: V0(nk, col_hxz), k(nk), kp(nkp), x(nx), z(nz), Qx(nx, nx), Qz(nz, nz), h(nh)
real*8, intent(in) :: alp1, alp2, alp3, xbar, kmin, alpha, gP, gN, beta, delta, gamA, gamB, f, istar
real*8, intent(in) :: kmtrx(nk, col_hxz), ksubm(nk, col_hz), zmtrx(nk, col_hxz), xmtrx(nk, col_hxz), hmtrx(nk, col_hxz)
! specify intermediate variables
real*8 :: Res(nk, col_hxz), Obj(nk, col_hxz), optKold(nk, col_hxz), Vold(nk, col_hxz), tmpEMV(nkp, col_hz), tmpI(nkp), &
tmpObj(nkp, col_hz), tmpA(nk, col_hxz), tmpQ(nx*nh, nh), detM(nx), stoM(nx), g(nkp), tmpInd(nh, nz)
real*8 :: Qh(nh, nh, nx), Qxh(nx*nh, nx*nh), Qzxh(col_hxz, col_hxz)
real*8 :: hp, d(nh), errK, errV, T1, lapse
integer :: ix, ih, iter, optJ(col_hz), ik, iz, ind(nh, col_xz), subindex(nx, col_hz)
logical*4 :: statConsole
! construct the transition matrix for kh --- there are nx number of these transition matrix: 3-d
Qh = 0.0
do ix = 1, nx
do ih = 1, nh
! compute the predicted next period kh
hp = alp1 + alp2*h(ih) + alp3*(x(ix) - xbar)
! construct transition probability vector
d = abs(h - hp) + 1D-32
Qh(:, ih, ix) = (1/d)/sum(1/d)
end do
end do
! construct the compound transition matrix over (z x h) space
! compound the (x h) space
Qxh = 0.0
do ix = 1, nx
call kron(tmpQ, Qx(:, ix), Qh(:, :, ix), nx, 1, nh, nh)
Qxh(:, (ix - 1)*nh + 1 : ix*nh) = tmpQ
end do
! compound the (z x h) space: h changes the faster, followed by x, and z changes the slowest
call kron(Qzxh, Qz, Qxh, nz, nz, nx*nh, nx*nh)
! available funds for the firm
Res = dexp(xmtrx + zmtrx + hmtrx)*(kmtrx**alpha) + (1 - delta)*kmtrx - f
! initializing
Obj = 0.0
optK = 0.0
optKold = optK + 1.0
Vold = V0
! Some Intermediate Variables Used in Stochastic Discount Factor
detM = beta*dexp((gamA - gamB*xbar)*x + gamB*x**2)
stoM = -(gamA - gamB*xbar + gamB*x)
! Intermediate index vector to facilitate submatrix extracting
ind = reshape((/1 : col_hxz : 1/), (/nh, col_xz/))
do ix = 1, nx
tmpInd = ind(:, ix : col_xz : nx)
do iz = 1, nz
subindex(ix, (iz - 1)*nh + 1 : iz*nh) = tmpInd(:, iz)
end do
end do
! start iterations
errK = 1.0
errV = 1.0
iter = 0
T1 = secnds(0.0)
do
if (errV <= 1D-3 .AND. errK <= 1D-8) then
exit
else
iter = iter + 1
do ix = 1, nx
! next period value function by linear interpolation: nkp by nz*nh matrix
call interp1(tmpEMV, k, detM(ix)*(matmul(dexp(stoM(ix)*xmtrx)*Vold, Qzxh(:, subindex(ix, :)))) - ksubm, kp, &
nk, nkp, col_hz)
! maximize the right-hand size of Bellman equation on EACH grid point of capital stock
do ik = 1, nk
! with istar tmpI is no longer investment but a linear transformation of that
tmpI = (kp - (1.0 - delta)*k(ik))/k(ik) - istar
where (tmpI >= 0.0)
g = gP
elsewhere
g = gN
end where
tmpObj = tmpEMV - spread((g/2.0)*(tmpI**2)*k(ik), 2, col_hz)
! direct discrete maximization
Obj(ik, subindex(ix, :)) = maxval(tmpObj, 1)
optJ = maxloc(tmpObj, 1)
optK(ik, subindex(ix, :)) = kp(optJ)
end do
end do
! update value function and impose limited liability condition
V = max(Res + Obj, 1D-16)
! convergence criterion
errK = maxval(abs(optK - optKold))
errV = maxval(abs(V - Vold))
! revise Initial Guess
Vold = V
optKold = optK
! visual
if (modulo(iter, 50) == 0) then
lapse = secnds(T1)
statConsole = AllocConsole()
print "(a, f10.7, a, f10.7, a, f8.1, a)", " errV:", errV, " errK:", errK, " Time:", lapse, "s"
end if
end if
end do
! visual check on errors
lapse = secnds(T1)
statConsole = AllocConsole()
print "(a, f10.7, a, f10.7, a, f8.1, a)", " errV:", errV, " errK:", errK, " Time:", lapse, "s"
! optimal investment and dividend
I = optK - (1.0 - delta)*kmtrx
tmpA = I/kmtrx - istar
where (tmpA >= 0)
div = Res - optK - (gP/2.0)*(tmpA**2)*kmtrx
elsewhere
div = Res - optK - (gN/2.0)*(tmpA**2)*kmtrx
end where
return
end
subroutine interp1(v, x, y, u, m, n, col)
!-------------------------------------------------------------------------------------------------------
! Linear interpolation routine similar to interp1 with 'linear' as method parameter in Matlab
!
! OUTPUT:
! v - function values on non-grid points (n by col matrix)
!
! INPUT:
! x - grid (m by one vector)
! y - function defined on the grid x (m by col matrix)
! u - non-grid points on which y(x) is to be interpolated (n by one vector)
! m - length of x and y vectors
! n - length of u and v vectors
! col - number of columns of v and y matrices
!
! Four ways to pass array arguments:
! 1. Use explicit-shape arrays and pass the dimension as an argument(most efficient)
! 2. Use assumed-shape arrays and use interface to call external subroutine
! 3. Use assumed-shape arrays and make subroutine internal by using "contains"
! 4. Use assumed-shape arrays and put interface in a module then use module
!
! This subroutine is equavilent to the following matlab call
! v = interp1(x, y, u, 'linear', 'extrap') with x (m by 1), y (m by col), u (n by 1), and v (n by col)
!------------------------------------------------------------------------------------------------------
implicit none
integer :: m, n, col, i, j
real*8, intent(out) :: v(n, col)
real*8, intent(in) :: x(m), y(m, col), u(n)
real*8 :: prob
do i = 1, n
if (u(i) < x(1)) then
! extrapolation to the left
v(i, :) = y(1, :) - (y(2, :) - y(1, :)) * ((x(1) - u(i))/(x(2) - x(1)))
else if (u(i) > x(m)) then
! extrapolation to the right
v(i, :) = y(m, :) + (y(m, :) - y(m-1, :)) * ((u(i) - x(m))/(x(m) - x(m-1)))
else
! interpolation
! find the j such that x(j) <= u(i) < x(j+1)
call bisection(x, u(i), m, j)
prob = (u(i) - x(j))/(x(j+1) - x(j))
v(i, :) = y(j, :)*(1 - prob) + y(j+1, :)*prob
end if
end do
end subroutine interp1
subroutine bisection(list, element, m, k)
!--------------------------------------------------------------------------------
! find index k in list such that (list(k) <= element < list(k+1)
!--------------------------------------------------------------------------------
implicit none
integer*4 :: m, k, first, last, half
real*8 :: list(m), element
first = 1
last = m
do
if (first == (last-1)) exit
half = (first + last)/2
if ( element < list(half) ) then
! discard second half
last = half
else
! discard first half
first = half
end if
end do
k = first
end subroutine bisection
subroutine kron(K, A, B, rowA, colA, rowB, colB)
!--------------------------------------------------------------------------------
! Perform K = kron(A, B); translated directly from kron.m
!
! OUTPUT:
! K -- rowA*rowB by colA*colB matrix
!
! INPUT:
! A -- rowA by colA matrix
! B -- rowB by colB matrix
! rowA, colA, rowB, colB -- integers containing shape information
!--------------------------------------------------------------------------------
implicit none
integer, intent(in) :: rowA, colA, rowB, colB
real*8, intent(in) :: A(rowA, colA), B(rowB, colB)
real*8, intent(out) :: K(rowA*rowB, colA*colB)
integer :: t1(rowA*rowB), t2(colA*colB), i, ia(rowA*rowB), ja(colA*colB), ib(rowA*rowB), jb(colA*colB)
t1 = (/ (i, i = 0, (rowA*rowB - 1)) /)
ia = int(t1/rowB) + 1
ib = mod(t1, rowB) + 1
t2 = (/ (i, i = 0, (colA*colB - 1)) /)
ja = int(t2/colB) + 1
jb = mod(t2, colB) + 1
K = A(ia, ja)*B(ib, jb)
end subroutine kron
My initial plan was to remove the mexFunction subroutine and compile the main Fortran subroutines using the R CMD SHLIB command but then I run into the Rtools compiler not knowing where to find the ifwin library even though I have the library in my intel fortran compiler folder.
So my first question is:
1) Is there a way for me to tell rtools where to find the ifwin library and any other library I need to include? Or is there a way to include the dependency libraries in the R CMD SHLIB command so the compiler can find the necessary libraries and compile?
2) If the answer to two is no, can I some how use the compiled version from Matlab in R. I can compile the file as is in matlab using the mex Zhang_4.f90 command with no errors.
3) Is there a way of setting up an environment in Visual Studio 2015 so I can compile Fortran subroutines for use in R using the Intel compiler?
When I take out the mexFunction subroutine and try compiling the rest of the code, I get the following error:
D:\SS_Programming\Fortran>R CMD SHLIB Zhang_4.f90
c:/Rtools/mingw_64/bin/gfortran -O2 -mtune=core2 -c Zhang_4.f90 -o
Zhang_4.o
Zhang_4.f90:6.4:
use ifwin
1
Fatal Error: Can't open module file 'ifwin.mod' for reading at (1): No
such file or directory
make: *** [Zhang_4.o] Error 1
Warning message:
running command 'make -f "C:/PROGRA~1/R/R-34~1.2/etc/x64/Makeconf" -f
"C:/PROGRA~1/R/R-34~1.2/share/make/winshlib.mk"
SHLIB_LDFLAGS='$(SHLIB_FCLDFLAGS)' SHLIB_LD='$(SHLIB_FCLD)'
SHLIB="Zhang_4.dll" SHLIB_LIBADD='$(FCLIBS)' WIN=64 TCLBIN=64
OBJECTS="Zhang_4.o"' had status 2
I don't think there is any other way then rewrite the code to not use IFWIN. Unless you manage to use Intel Fortran for R (that might require recompiling the whole R distribution...). Matlab is tied to Intel Fortran, that's why the code worked there.
You have to adjust the code anyway, you cannot use it as it stands.
Just get rid of the AllocConsole() calls and the print statements. You will need to use the R routines to print to console. See https://cran.r-project.org/doc/manuals/R-exts.html#Printing-from-FORTRAN

Resources