Undefined references to ATLAS, MPI, CBLAS when compiling HPL on CentOS 6 - mpi

I am trying to run HPL 2.1 on my CentOS 6 systems.
This is my makefile:
[root@cadejos-0 hpl]# cat Make.cadejos
#
# -- High Performance Computing Linpack Benchmark (HPL)
# HPL - 2.1 - October 26, 2012
# Antoine P. Petitet
# University of Tennessee, Knoxville
# Innovative Computing Laboratory
# (C) Copyright 2000-2008 All Rights Reserved
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at the University of
# Tennessee, Knoxville, Innovative Computing Laboratory.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = Linux_PII_CBLAS
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = $(HOME)/hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/include/mpich2-x86_64
MPinc = -I$(MPdir)
MPlib = /usr/lib64/mpich2/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /usr/include/atlas-x86_64-base/
LAinc = -I$(LAdir)
LAlib = /usr/lib64/atlas/libatlas.a /usr/lib64/atlas/libcblas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
HPL_OPTS = -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = /usr/bin/gcc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = /usr/bin/g77
LINKFLAGS = $(CCFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------
I fixed some errors here and there, but I can't get past this lot of undefined references:
/usr/bin/gcc -o HPL_pdtest.o -c -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING -I/root/hpl/include -I/root/hpl/include/Linux_PII_CBLAS -I/usr/include/atlas-x86_64-base/ -I/usr/include/mpich2-x86_64 -fomit-frame-pointer -O3 -funroll-loops ../HPL_pdtest.c
/usr/bin/g77 -DHPL_CALL_CBLAS -DHPL_DETAILED_TIMING -I/root/hpl/include -I/root/hpl/include/Linux_PII_CBLAS -I/usr/include/atlas-x86_64-base/ -I/usr/include/mpich2-x86_64 -fomit-frame-pointer -O3 -funroll-loops -o /root/hpl/bin/Linux_PII_CBLAS/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /root/hpl/lib/Linux_PII_CBLAS/libhpl.a /usr/lib64/atlas/libatlas.a /usr/lib64/atlas/libcblas.a /usr/lib64/mpich2/lib/libmpich.a
/usr/lib64/atlas/libcblas.a(cblas_dgemm.o): In function `cblas_dgemm':
(.text+0x321): undefined reference to `ATL_dsyrk'
/usr/lib64/atlas/libcblas.a(cblas_dgemm.o): In function `cblas_dgemm':
(.text+0x186): undefined reference to `ATL_dgemm'
/usr/lib64/atlas/libcblas.a(cblas_dgemm.o): In function `cblas_dgemm':
(.text+0x35e): undefined reference to `ATL_dsyreflect'
... A LOT MORE ...
/usr/lib64/mpich2/lib/libmpich.a(info_getvallen.o): In function `MPI_Info_get_valuelen':
(.text+0x37b): undefined reference to `pthread_setspecific'
/usr/lib64/mpich2/lib/libmpich.a(info_getvallen.o): In function `MPI_Info_get_valuelen':
(.text+0x38d): undefined reference to `pthread_getspecific'
/usr/lib64/mpich2/lib/libmpich.a(info_getvallen.o): In function `MPI_Info_get_valuelen':
(.text+0x3b7): undefined reference to `pthread_setspecific'
collect2: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1
make[2]: Leaving directory `/root/hpl/testing/ptest/cadejos'
make[1]: *** [build_tst] Error 2
make[1]: Leaving directory `/root/hpl'
make: *** [build] Error 2
I think it has something to do with the library paths, but I am unable to make further progress.
Has anyone run into this kind of problem?

It looks like the linker is not finding ATLAS and pthreads:
undefined reference to 'ATL_dsyreflect'
undefined reference to 'pthread_setspecific'
You will need to link these properly, or, in the case of ATLAS, the library could be built incorrectly.
I believe pthreads requires a -lpthread flag, which I do not see above, and I also do not see anything with ATLAS.

As for the Pthreads errors: every MPI library requires a certain number of external dependencies to be linked into the final executable. That's why it is not a good idea to use gcc or g77 to link an MPI executable; rather, mpicc, mpif90, or another MPI compiler wrapper should be used. In any case, Pthreads is linked in if -pthread is specified.
As for the ATLAS errors: the order in which static libraries are listed on the command line matters (it doesn't for dynamic libraries). In your case you have:
... /usr/lib64/atlas/libatlas.a /usr/lib64/atlas/libcblas.a ...
With such an arrangement the linker won't be able to resolve any references from code inside libcblas.a to code inside libatlas.a. This is due to the single-pass symbol resolution done by the linker. libcblas.a should come before libatlas.a, not after; see the sketch below.
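Putting both fixes together, the relevant lines of Make.cadejos might look like this (a sketch, assuming MPICH2's mpicc wrapper is installed under /usr/lib64/mpich2/bin; adjust the paths to your installation):
# Link with the MPI compiler wrapper so MPI's own dependencies are pulled in
LINKER = /usr/lib64/mpich2/bin/mpicc
# -pthread links in the Pthreads library that libmpich.a needs
LINKFLAGS = $(CCFLAGS) -pthread
# libcblas.a must precede libatlas.a so its ATL_* references can be resolved
LAlib = /usr/lib64/atlas/libcblas.a /usr/lib64/atlas/libatlas.a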

Related

Cannot write int16 data type using the R's rhdf5 package

In R I would like to write a matrix of integers into an HDF5 file ".h5" as an int16 data type. To do so I am using the rhdf5 package. The documentation says that you should set one of the supported H5 data types when creating the dataset. However, even when setting up the int16 data type the result is always int32. Is it possible to store the data as int16 or uint16?
library(rhdf5)
m <- matrix(1,5,5)
outFile <- "test.h5"
h5createFile(outFile)
h5createDataset(file=outFile,"m",dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile,name="m")
H5close()
h5ls(outFile)
The result is:
I used another library (hdf5r), as I could not find rhdf5:
library(hdf5r)
m <- matrix(1L,5L,5L)
outFile <- h5file("test.h5")
createDataSet(outFile, "m", m, dtype=h5types$H5T_NATIVE_INT16)
print(outFile)
print(outFile[["m"]])
h5close(outFile)
For the first print (the file)
Class: H5File
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Listing:
name obj_type dataset.dims dataset.type_class
m H5I_DATASET 5 x 5 H5T_INTEGER
Here we see it displays H5T_INTEGER as the datatype for the dataset m
and the second (the dataset)
Class: H5D
Dataset: /m
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Datatype: H5T_STD_I16LE
Space: Type=Simple Dims=5 x 5 Maxdims=Inf x Inf
Chunk: 64 x 64
We can see that it has the right datatype H5T_STD_I16LE
The code you provided works as expected; it's a limitation of the h5ls() function in rhdf5 that it doesn't report a more detailed data type. As @r2evans points out, it's technically true that it's an integer; you just want a bit more detail than that.
If we run your code and use the h5ls tool distributed by the HDF5 Group, we get more information:
library(rhdf5)
m <- matrix(1,5,5)
outFile <- tempfile(fileext = ".h5")
h5createFile(outFile)
h5createDataset(file=outFile,"m", dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile, name="m")
system2("h5ls", args = list("-v", outFile))
## Opened "/tmp/RtmpFclmR3/file299e79c4c206.h5" with sec2 driver.
## m Dataset {5/5, 5/5}
## Attribute: rhdf5-NA.OK {1}
## Type: native int
## Location: 1:800
## Links: 1
## Chunks: {5, 5} 50 bytes
## Storage: 50 logical bytes, 14 allocated bytes, 357.14% utilization
## Filter-0: shuffle-2 OPT {2}
## Filter-1: deflate-1 OPT {6}
## Type: native short
Here the most important part is the final line, which confirms the datatype is "native short", a.k.a. native int16.
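If you prefer to confirm this from within R, a minimal sketch using rhdf5's low-level H5 API (assuming your rhdf5 version exports these H5* wrappers) checks the on-disk type size directly:
library(rhdf5)
fid <- H5Fopen(outFile)   # the file written above
did <- H5Dopen(fid, "m")  # open the dataset
tid <- H5Dget_type(did)   # datatype handle for the stored dataset
H5Tget_size(tid)          # should return 2 (bytes), i.e. a 16-bit integer
H5Dclose(did)
H5Fclose(fid)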

*** caught segfault *** address 0x2aaeb4b6f440, cause 'memory not mapped' when Fortran is called by R

I have a very specific error, so googling wasn't helpful, and I'm sorry that I don't know how to provide a simple reproducible example for this issue. The code below runs perfectly on my local machine, but on the HPC it produces this error:
*** caught segfault ***
address 0x2ad718ba0440, cause 'memory not mapped'
Traceback:
1: array(.Fortran("hus_vertical_interpolation", m = as.integer(DIM[1]), n = as.integer(DIM[2]), o = as.integer(DIM[3]), p = as.integer(DIM[4]), req = as.integer(length(req_press_levels)), hus_on_model_level = as.numeric(spec_hum_data[]), pres = as.numeric(req_press_levels), pressure_full_level = as.numeric(pressure[]), hus_on_press_level = as.numeric(output_array[]))$hus_on_press_level, dim = output_DIM)
2: Specific_humidity_afterburner(spec_hum_file = q_nc.files[x], req_press_levels = required_PLev)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
The code is supposed to:
Loop over a vector of NetCDF files and pass the filename spec_hum_file to function Specific_humidity_afterburner.
The function reads the NetCDF file, extracts the data, passes it to the first compiled subroutine, does the math, and returns the values.
Take the result, pass it to another Fortran subroutine, and return the second result.
Write the second result to a new NetCDF file.
The error occurs in step 3. The R function is:
Specific_humidity_afterburner <- function(spec_hum_file, req_press_levels){
  require(ff)
  require(ncdf4)
  require(stringi)
  require(DescTools)
  library(stringr)
  library(magrittr)
  #1============================================================================
  # Reading data from netCDF file
  #2============================================================================
  # Reading other variables
  #3============================================================================
  # First Fortran subroutine
  #4============================================================================
  # load vertical interpolate subroutine for specific humidity
  dyn.load("spec_hum_afterburner/vintp2p_afterburner_hus.so")
  # check
  is.loaded("hus_vertical_interpolation")
  DIM <- dim(spec_hum_data)
  output_DIM <- c(DIM[1], DIM[2], length(req_press_levels), DIM[4])
  output_array <- ff(array(0.00, dim = output_DIM), dim = output_DIM)
  result <- array(.Fortran("hus_vertical_interpolation",
                           m = as.integer(DIM[1]),
                           n = as.integer(DIM[2]),
                           o = as.integer(DIM[3]),
                           p = as.integer(DIM[4]),
                           req = as.integer(length(req_press_levels)),
                           pres = as.numeric(req_press_levels),
                           pressure_full_level = as.numeric(pressure[]),
                           hus_on_model_level = as.numeric(spec_hum_data[]),
                           hus_on_press_level = as.numeric(output_array[]))$hus_on_press_level,
                  dim = output_DIM)
  DIMNAMES <- dimnames(spec_hum_data)
  DIMNAMES[["lev"]] <- req_press_levels
  Specific_humidity <- ff(result, dim = output_DIM, dimnames = DIMNAMES)
  rm(result)
  #5============================================================================
  # Writing NetCDF file of the interpolated values
}
Fortran subroutine:
subroutine hus_vertical_interpolation(m, n, o, p, req, pres, &
    pressure_full_level, hus_on_model_level, hus_on_press_level)
  implicit none
  integer :: m, n, o, p, req
  integer :: x, y, s, t, plev
  double precision :: pres(req), hus_on_model_level(m,n,o,p)
  double precision :: pressure_full_level(m,n,o,p)
  double precision :: delta_hus, delta_p, grad_hus_p, diff_p
  double precision, intent(out) :: hus_on_press_level(m,n,req,p)
  real :: arg = -1.0, NaN
  NaN = sqrt(arg)
  do plev = 1, req
    do t = 1, p
      do x = 1, m
        do y = 1, n
          do s = 1, o
            ! above the uppermost level
            if (pres(plev) .LT. pressure_full_level(x,y,1,t)) then
              hus_on_press_level(x,y,plev,t) = NaN
            end if
            ! in between levels
            if (pres(plev) .GE. pressure_full_level(x,y,s,t) .AND. pres(plev) .LE. &
                pressure_full_level(x,y,s+1,t)) then
              delta_hus = hus_on_model_level(x,y,s,t) - hus_on_model_level(x,y,s+1,t)
              delta_p = log(pressure_full_level(x,y,s,t)) &
                        - log(pressure_full_level(x,y,s+1,t))
              grad_hus_p = delta_hus / delta_p
              diff_p = log(pres(plev)) - log(pressure_full_level(x,y,s,t))
              hus_on_press_level(x,y,plev,t) = hus_on_model_level(x,y,s,t) &
                                               + grad_hus_p * diff_p
            end if
            ! extrapolation below the ground
            if (pres(plev) .GT. pressure_full_level(x,y,o,t)) then
              hus_on_press_level(x,y,plev,t) = hus_on_model_level(x,y,o,t)
            end if
          end do
        end do
      end do
    end do
  end do
end subroutine hus_vertical_interpolation
The Fortran subroutine was compiled with:
gfortran -fPIC -shared -ffree-form vintp2p_afterburner_hus.f90 -o vintp2p_afterburner_hus.so
The error behaviour is unpredictable; for example, it can happen at index 1, 2, 8, etc. of the loop. We have tried handing the big array to the Fortran subroutine as the last variable, which reduced the occurrence of the error.
Also, the NetCDF files have a size of ~2 GB. Another point to mention: the modules are built with EasyBuild, so conflicts are improbable, as the HPC support team stated. We have tried every solution we know of, with no progress.

Julia parallel behavior different in v0.5.0 and v0.6.0?

I get different behavior for the same code below on Julia 0.5.0 and 0.6.0:
workspace()
rmprocs(workers())
addprocs(2)
r = @spawnat workers()[2] @eval a=20000
println(fetch(r)) # prints 20000 as expected
a = 1 # assign value to a in process 1
r = @spawnat workers()[2] @eval a
println(fetch(r)) # prints 20000 as expected
r = @spawnat(workers()[2], getfield(Main, :a))
println(fetch(r)) # prints 20000 as expected, equivalent to previous fetch
@sync @spawnat workers()[2] println(a) # prints 1 as expected
r = @sync @spawnat workers()[1] @eval a=10000 # define variable a on workers()[1]
@everywhere println(a) # different results on v0.5.0 and v0.6.0
The difference is shown below: namely, workers()[2] gets a value of a from process 1 that was never explicitly assigned to it. The 0.5.0 code works like I expect it to, and 0.6.0 does not. Any ideas what could be going on here, or is it something I don't understand?
v0.5 v0.6
WARNING: rmprocs: process 1 not removed | WARNING: rmprocs: process 1 not removed
20000 | 20000
20000 | 20000
20000 | 20000
From worker 3: 1 | From worker 3: 1
1 | 1
From worker 2: 10000 | From worker 2: 10000
From worker 3: 20000 | From worker 3: 1
I think this has to do with global variables in v0.6.0 onwards.
Global constants (in module Main) are declared as constants on remote nodes too. If I had used a let statement in the above code as follows:
let a=a
    @sync @spawnat workers()[2] println(a) # prints 1 as expected
end
then 0.5.0 and 0.6.0 produce the same result with the @everywhere println(a) on the last line of my previously posted code snippet. I probably need to delve further into channels and futures for parallel code that requires data transfer, initializing variables on parallel processes, etc.
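One way to avoid relying on globals altogether is to pass the value to the worker as a function argument; a minimal sketch (the closure and its argument are serialized to the worker, so no remote global state is involved):
a = 1
# x -> x + 1 runs on the worker with the local value of a passed explicitly
r = remotecall_fetch(x -> x + 1, workers()[2], a)
println(r) # prints 2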

What is the difference between @code_native, @code_typed and @code_llvm in Julia?

While going through Julia, I wanted functionality similar to Python's dis module.
Searching over the net, I found out that the Julia community has worked on this issue and provided these mappings (https://github.com/JuliaLang/julia/issues/218):
finfer -> code_typed
methods(function, types) -> code_lowered
disassemble(function, types, true) -> code_native
disassemble(function, types, false) -> code_llvm
I have tried these personally in the Julia REPL, but I find them hard to understand.
In Python, I can disassemble a function like this.
>>> import dis
>>> dis.dis(lambda x: 2*x)
  1           0 LOAD_CONST               1 (2)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
>>>
Can anyone who has worked with these help me understand them more? Thanks.
The standard CPython implementation of Python parses source code and does some pre-processing and simplification of it – aka "lowering" – transforming it to a machine-friendly, easy-to-interpret format called "bytecode". This is what is displayed when you "disassemble" a Python function. This code is not executable by the hardware – it is "executable" by the CPython interpreter. CPython's bytecode format is fairly simple, partly because that's what interpreters tend to do well with – if the bytecode is too complex, it slows down the interpreter – and partly because the Python community tends to put a high premium on simplicity, sometimes at the cost of high performance.
Julia's implementation is not interpreted, it is just-in-time (JIT) compiled. This means that when you call a function, it is transformed to machine code which is executed directly by the native hardware. This process is quite a bit more complex than the parsing and lowering to bytecode that Python does, but in exchange for that complexity, Julia gets its hallmark speed. (The PyPy JIT for Python is also much more complex than CPython but also typically much faster – increased complexity is a fairly typical cost for speed.) The four levels of "disassembly" for Julia code give you access to the representation of a Julia method implementation for particular argument types at different stages of the transformation from source code to machine code. I'll use the following function which computes the next Fibonacci number after its argument as an example:
function nextfib(n)
    a, b = one(n), one(n)
    while b < n
        a, b = b, a + b
    end
    return b
end
julia> nextfib(5)
5
julia> nextfib(6)
8
julia> nextfib(123)
144
Lowered code. The @code_lowered macro displays code in a format that is the closest to Python bytecode, but rather than being intended for execution by an interpreter, it's intended for further transformation by a compiler. This format is largely internal and not intended for human consumption. The code is transformed into "static single assignment" (SSA) form in which "each variable is assigned exactly once, and every variable is defined before it is used". Loops and conditionals are transformed into gotos and labels using a single unless/goto construct (this is not exposed in user-level Julia). Here's our example code in lowered form (in Julia 0.6.0-pre.beta.134, which is just what I happen to have available):
julia> @code_lowered nextfib(123)
CodeInfo(:(begin
nothing
SSAValue(0) = (Main.one)(n)
SSAValue(1) = (Main.one)(n)
a = SSAValue(0)
b = SSAValue(1) # line 3:
7:
unless b < n goto 16 # line 4:
SSAValue(2) = b
SSAValue(3) = a + b
a = SSAValue(2)
b = SSAValue(3)
14:
goto 7
16: # line 6:
return b
end))
You can see the SSAValue nodes and unless/goto constructs and label numbers. This is not that hard to read, but again, it's also not really meant to be easy for human consumption. Lowered code doesn't depend on the types of the arguments, except in as far as they determine which method body to call – as long as the same method is called, the same lowered code applies.
Typed code. The @code_typed macro presents a method implementation for a particular set of argument types after type inference and inlining. This incarnation of the code is similar to the lowered form, but with expressions annotated with type information and some generic function calls replaced with their implementations. For example, here is the typed code for our example function:
julia> @code_typed nextfib(123)
CodeInfo(:(begin
a = 1
b = 1 # line 3:
4:
unless (Base.slt_int)(b, n)::Bool goto 13 # line 4:
SSAValue(2) = b
SSAValue(3) = (Base.add_int)(a, b)::Int64
a = SSAValue(2)
b = SSAValue(3)
11:
goto 4
13: # line 6:
return b
end))=>Int64
Calls to one(n) have been replaced with the literal Int64 value 1 (on my system the default integer type is Int64). The expression b < n has been replaced with its implementation in terms of the slt_int intrinsic ("signed integer less than"), and the result of this has been annotated with return type Bool. The expression a + b has also been replaced with its implementation in terms of the add_int intrinsic, and its result type annotated as Int64. And the return type of the entire function body has been annotated as Int64.
Unlike lowered code, which depends only on argument types to determine which method body is called, the details of typed code depend on the argument types:
julia> @code_typed nextfib(Int128(123))
CodeInfo(:(begin
SSAValue(0) = (Base.sext_int)(Int128, 1)::Int128
SSAValue(1) = (Base.sext_int)(Int128, 1)::Int128
a = SSAValue(0)
b = SSAValue(1) # line 3:
6:
unless (Base.slt_int)(b, n)::Bool goto 15 # line 4:
SSAValue(2) = b
SSAValue(3) = (Base.add_int)(a, b)::Int128
a = SSAValue(2)
b = SSAValue(3)
13:
goto 6
15: # line 6:
return b
end))=>Int128
This is the typed version of the nextfib function for an Int128 argument. The literal 1 must be sign extended to Int128 and the result types of operations are of type Int128 instead of Int64. The typed code can be quite different if the implementation of a type is considerably different. For example nextfib for BigInts is significantly more involved than for simple "bits types" like Int64 and Int128:
julia> @code_typed nextfib(big(123))
CodeInfo(:(begin
$(Expr(:inbounds, false))
# meta: location number.jl one 164
# meta: location number.jl one 163
# meta: location gmp.jl convert 111
z#_5 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 112:
$(Expr(:foreigncall, (:__gmpz_set_si, :libgmp), Void, svec(Ptr{BigInt}, Int64), :(&z#_5), :(z#_5), 1, 0))
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
$(Expr(:inbounds, false))
# meta: location number.jl one 164
# meta: location number.jl one 163
# meta: location gmp.jl convert 111
z#_6 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 112:
$(Expr(:foreigncall, (:__gmpz_set_si, :libgmp), Void, svec(Ptr{BigInt}, Int64), :(&z#_6), :(z#_6), 1, 0))
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
a = z#_5
b = z#_6 # line 3:
26:
$(Expr(:inbounds, false))
# meta: location gmp.jl < 516
SSAValue(10) = $(Expr(:foreigncall, (:__gmpz_cmp, :libgmp), Int32, svec(Ptr{BigInt}, Ptr{BigInt}), :(&b), :(b), :(&n), :(n)))
# meta: pop location
$(Expr(:inbounds, :pop))
unless (Base.slt_int)((Base.sext_int)(Int64, SSAValue(10))::Int64, 0)::Bool goto 46 # line 4:
SSAValue(2) = b
$(Expr(:inbounds, false))
# meta: location gmp.jl + 258
z#_7 = $(Expr(:invoke, MethodInstance for BigInt(), :(Base.GMP.BigInt))) # line 259:
$(Expr(:foreigncall, ("__gmpz_add", :libgmp), Void, svec(Ptr{BigInt}, Ptr{BigInt}, Ptr{BigInt}), :(&z#_7), :(z#_7), :(&a), :(a), :(&b), :(b)))
# meta: pop location
$(Expr(:inbounds, :pop))
a = SSAValue(2)
b = z#_7
44:
goto 26
46: # line 6:
return b
end))=>BigInt
This reflects the fact that operations on BigInts are pretty complicated and involve memory allocation and calls to the external GMP library (libgmp).
LLVM IR. Julia uses the LLVM compiler framework to generate machine code. LLVM defines an assembly-like language which it uses as a shared intermediate representation (IR) between different compiler optimization passes and other tools in the framework. There are three isomorphic forms of LLVM IR:
A binary representation that is compact and machine readable.
A textual representation that is verbose and somewhat human readable.
An in-memory representation that is generated and consumed by LLVM libraries.
Julia uses LLVM's C++ API to construct LLVM IR in memory (form 3) and then calls some LLVM optimization passes on that form. When you do @code_llvm you see the LLVM IR after generation and some high-level optimizations. Here's LLVM code for our ongoing example:
julia> @code_llvm nextfib(123)
define i64 @julia_nextfib_60009(i64) #0 !dbg !5 {
top:
br label %L4
L4: ; preds = %L4, %top
%storemerge1 = phi i64 [ 1, %top ], [ %storemerge, %L4 ]
%storemerge = phi i64 [ 1, %top ], [ %2, %L4 ]
%1 = icmp slt i64 %storemerge, %0
%2 = add i64 %storemerge, %storemerge1
br i1 %1, label %L4, label %L13
L13: ; preds = %L4
ret i64 %storemerge
}
This is the textual form of the in-memory LLVM IR for the nextfib(123) method implementation. LLVM IR is not easy to read – it's not intended to be written or read by people most of the time – but it is thoroughly specified and documented. Once you get the hang of it, it's not hard to understand. This code jumps to the label L4 and initializes the "registers" %storemerge1 and %storemerge with the i64 (LLVM's name for Int64) value 1 (their values are derived differently when jumped to from different locations – that's what the phi instruction does). It then does an icmp slt, comparing %storemerge with register %0 – which holds the argument untouched for the entire method execution – and saves the comparison result into the register %1. It does an add i64 on %storemerge and %storemerge1 and saves the result into register %2. If %1 is true, it branches back to L4, and otherwise it branches to L13. When the code loops back to L4, the register %storemerge1 gets the previous value of %storemerge and %storemerge gets the previous value of %2.
Native code. Since Julia executes native code, the last form a method implementation takes is what the machine actually executes. This is just binary code in memory, which is rather hard to read, so long ago people invented various forms of "assembly language" which represent instructions and registers with names and have some amount of simple syntax to help express what instructions do. In general, assembly language remains close to one-to-one correspondence with machine code, in particular, one can always "disassemble" machine code into assembly code. Here's our example:
julia> @code_native nextfib(123)
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[1]
pushq %rbp
movq %rsp, %rbp
movl $1, %ecx
movl $1, %edx
nop
L16:
movq %rdx, %rax
Source line: 4
movq %rcx, %rdx
addq %rax, %rdx
movq %rax, %rcx
Source line: 3
cmpq %rdi, %rax
jl L16
Source line: 6
popq %rbp
retq
nopw %cs:(%rax,%rax)
This is on an Intel Core i7, which is in the x86_64 CPU family. It only uses standard integer instructions, so it doesn't matter beyond that what the architecture is, but you can get different results for some code depending on the specific architecture of your machine, since JIT code can be different on different systems. The pushq and movq instructions at the beginning are a standard function preamble, saving registers to the stack; similarly, popq restores the registers and retq returns from the function; nopw is a 2-byte instruction that does nothing, included just to pad the length of the function. So the meat of the code is just this:
movl $1, %ecx
movl $1, %edx
nop
L16:
movq %rdx, %rax
Source line: 4
movq %rcx, %rdx
addq %rax, %rdx
movq %rax, %rcx
Source line: 3
cmpq %rdi, %rax
jl L16
The movl instructions at the top initialize registers with 1 values. The movq instructions move values between registers and the addq instruction adds registers. The cmpq instruction compares two registers and jl either jumps back to L16 or continues to return from the function. This handful of integer machine instructions in a tight loop is exactly what executes when your Julia function call runs, presented in slightly more pleasant human-readable form. It's easy to see why it runs fast.
If you're interested in JIT compilation in general as compared to interpreted implementations, Eli Bendersky has a great pair of blog posts where he goes from a simple interpreter implementation of a language to a (simple) optimizing JIT for the same language:
http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/
http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit.html

Struct member selected from type, it is not visible and will not be selected

I have a function that uses the Unix lib for its time functions:
let rfc822 (t: Unix.tm) : string =
  Printf.sprintf "%s, %s %s %d %s:%s:%s %s"
    (List.nth short_days t.tm_wday)
    (padInt t.tm_yday 2 "0")
    (List.nth short_month t.tm_mon)
    (t.tm_year + 1900)
    (padInt t.tm_hour 2 "0")
    (padInt t.tm_min 2 "0")
    (padInt t.tm_sec 2 "0")
    "GMT"
I'm getting this warning:
ocamlbuild -libs unix,str -Is recore/src,ostd/src,owebl/src app.native
+ /usr/bin/ocamlc -c -I recore/src -I ostd/src -I owebl/src -o recore/src/time.cmo recore/src/time.ml
File "recore/src/time.ml", line 45, characters 27-34:
Warning 40: tm_wday was selected from type Unix.tm.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
File "recore/src/time.ml", line 46, characters 14-21:
Warning 40: tm_yday was selected from type Unix.tm.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
File "recore/src/time.ml", line 46, characters 4-28:
Error: This expression has type 'a -> string
but an expression was expected of type string
Command exited with code 2.
Compilation unsuccessful after building 13 targets (12 cached) in 00:00:00.
Makefile:8: recipe for target 'old' failed
make: *** [old] Error 10
How do I deal with this warning? I would much rather avoid opening the Unix module if possible.
(Please ignore the actual compile error.)
You can write t.Unix.tm_yday:
$ ocaml
OCaml version 4.02.1
# let f (t: Unix.tm) = t.tm_yday;;
Warning 40: tm_yday was selected from type Unix.tm.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
val f : Unix.tm -> int = <fun>
# let f (t: Unix.tm) = t.Unix.tm_yday;;
val f : Unix.tm -> int = <fun>
Update
To find this in the documentation, you need to look for the definition of field:
field ::= [ module-path . ] field-name
A field name can include a module name (or a sequence of module names, for nested modules) before the field name itself.
Update 2
There are also two syntaxes for opening a module locally. They look like overkill for this tiny function, but might be tidier for more complex ones. The module's symbols are directly available throughout the subexpression.
$ ocaml
OCaml version 4.02.1
# let f t = Unix.(t.tm_yday);;
val f : Unix.tm -> int = <fun>
# let f t = let open Unix in t.tm_yday;;
val f : Unix.tm -> int = <fun>
These are documented as language extensions in Local opens.
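Applied to the question's rfc822 function, the local open looks like this (a sketch; short_days, short_month, and padInt are the questioner's own helpers, assumed to be defined elsewhere):
let rfc822 (t: Unix.tm) : string =
  (* open Unix locally so tm_wday, tm_year, etc. resolve without qualification *)
  let open Unix in
  Printf.sprintf "%s, %s %s %d %s:%s:%s %s"
    (List.nth short_days t.tm_wday)
    (padInt t.tm_yday 2 "0")
    (List.nth short_month t.tm_mon)
    (t.tm_year + 1900)
    (padInt t.tm_hour 2 "0")
    (padInt t.tm_min 2 "0")
    (padInt t.tm_sec 2 "0")
    "GMT"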
