Odd type construction behaviour - julia

Working with MD simulations, I need to enforce periodic boundary conditions on particle positions. The simplest way to do this is mod(particle_position, box_dimension). Since I'm working in 3D space, I made a 3D vector type:
immutable Vec3
    x::Float32
    y::Float32
    z::Float32
end
And a mod function:
f1(a::Vec3, b::Vec3) = Vec3(mod(a.x, b.x), mod(a.y, b.y), mod(a.z, b.z))
However, when using this, it fails horribly:
julia> a = Vec3(11,-2,5)
Vec3(11.0f0,-2.0f0,5.0f0)
julia> b = Vec3(10,10,10)
Vec3(10.0f0,10.0f0,10.0f0)
julia> f1(a,b)
Vec3(5.0f0,10.0f0,NaN32)
If I simply return a tuple, it works fine:
f2(a::Vec3, b::Vec3) = mod(a.x,b.x), mod(a.y,b.y), mod(a.z,b.z)
julia> f2(a,b)
(1.0f0,8.0f0,5.0f0)
To test whether the problem was the mod calls inside the type constructor, I tried a more verbose version:
function f3(a::Vec3, b::Vec3)
    x = mod(a.x, b.x)
    y = mod(a.y, b.y)
    z = mod(a.z, b.z)
    return Vec3(x, y, z)
end
julia> f3(a,b)
Vec3(5.0f0,10.0f0,NaN32)
And then, a version printing the intermediates:
function f4(a::Vec3, b::Vec3)
    x = mod(a.x, b.x)
    y = mod(a.y, b.y)
    z = mod(a.z, b.z)
    println(x, " ", y, " ", z)
    return Vec3(x, y, z)
end
julia> f4(a,b)
1.0 8.0 5.0
Vec3(1.0f0,8.0f0,5.0f0)
For some reason, this now works. I've tried this on multiple computers now, each with the same result. If someone could shed some light on this, I would be most thankful. Julia version: 0.3.2 (2014-10-21 20:18 UTC)

What works and what doesn't
I think this may be a bug, perhaps even an LLVM bug. I was able to reproduce your error on version 0.3.0, but not on version 0.4. Like you, I also obtained correct results by inserting a print statement in the middle.
Furthermore, I found that both the simpler
f1(a::Vec3, b::Vec3) = Vec3(mod(a.x,b.x),mod(a.y,b.y),1)
julia> f1(a,b)
Vec3(1.0f0,8.0f0,1.0f0)
AND the more complicated
julia> f1(a::Vec3, b::Vec3) = Vec3(mod(a.x,b.x),mod(a.y,b.y),mod(a.z,b.z) + 1)
f1 (generic function with 1 method)
julia> f1(a,b)
Vec3(1.0f0,8.0f0,6.0f0)
both work, but the following one doesn't
julia> f1(a::Vec3, b::Vec3) = Vec3(mod(a.x,b.x),mod(a.y,b.y),mod(a.z,b.z))
f1 (generic function with 1 method)
julia> f1(a,b)
Vec3(5.0f0,10.0f0,NaN32)
To LLVM
The LLVM source also looks correct. The components of each input Vec3 argument are extracted, mod is computed for each pair of components (as the sequence frem, fadd, frem), and the results are inserted into the returned Vec3.
julia> code_llvm(f1,(Vec3,Vec3))
define %Vec3 @"julia_f1;20242"(%Vec3, %Vec3) {
top:
%2 = extractvalue %Vec3 %1, 0, !dbg !1733
%3 = extractvalue %Vec3 %1, 1, !dbg !1733
%4 = extractvalue %Vec3 %1, 2, !dbg !1733
%5 = extractvalue %Vec3 %0, 0, !dbg !1733
%6 = frem float %5, %2, !dbg !1733
%7 = fadd float %2, %6, !dbg !1733
%8 = frem float %7, %2, !dbg !1733
%9 = insertvalue %Vec3 undef, float %8, 0, !dbg !1733
%10 = extractvalue %Vec3 %0, 1, !dbg !1733
%11 = frem float %10, %3, !dbg !1733
%12 = fadd float %3, %11, !dbg !1733
%13 = frem float %12, %3, !dbg !1733
%14 = insertvalue %Vec3 %9, float %13, 1, !dbg !1733
%15 = extractvalue %Vec3 %0, 2, !dbg !1733
%16 = frem float %15, %4, !dbg !1733
%17 = fadd float %4, %16, !dbg !1733
%18 = frem float %17, %4, !dbg !1733
%19 = insertvalue %Vec3 %14, float %18, 2, !dbg !1733, !julia_type !1734
ret %Vec3 %19, !dbg !1733
}
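That (frem, fadd, frem) sequence is just the standard identity mod(x, y) == rem(rem(x, y) + y, y), which maps the possibly-negative result of rem back into [0, y); a quick REPL check:
julia> x, y = -2.0f0, 10.0f0;

julia> rem(x, y)                 # frem: keeps the sign of x
-2.0f0

julia> rem(rem(x, y) + y, y)     # frem, fadd, frem
8.0f0

julia> mod(x, y)
8.0f0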
Native Code Error?
But the native instructions look incorrect: XMM2 is moved to XMM0, and XMM0 is later used as an operand to addss, but XMM2 doesn't appear to have been initialized.
julia> code_native(f1,(Vec3,Vec3))
.section __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
push RBP
mov RBP, RSP
sub RSP, 16
movss DWORD PTR [RBP - 4], XMM5
Source line: 1
movaps XMM0, XMM2
movaps XMM1, XMM5
movabs RAX, 140735600044048
call RAX
movss XMM1, DWORD PTR [RBP - 4]
addss XMM0, XMM1
movabs RAX, 140735600044048
add RSP, 16
pop RBP
jmp RAX
Update:
Submitted this issue as a possible LLVM error.

Related

Make type of literal constant depend on other variables

I have the following code in Julia, in which the literal constant 2. multiplies array elements. For now I have made the literal constant single precision (2.0f0), but I would like its type to depend on the other variables (these are either all Float64 or all Float32). How do I do this in an elegant way?
function diff!(
    at, a,
    visc, dxidxi, dyidyi, dzidzi,
    itot, jtot, ktot)

    @tturbo for k in 2:ktot-1
        for j in 2:jtot-1
            for i in 2:itot-1
                at[i, j, k] += visc * (
                    (a[i-1, j  , k  ] - 2.0f0 * a[i, j, k] + a[i+1, j  , k  ]) * dxidxi +
                    (a[i  , j-1, k  ] - 2.0f0 * a[i, j, k] + a[i  , j+1, k  ]) * dyidyi +
                    (a[i  , j  , k-1] - 2.0f0 * a[i, j, k] + a[i  , j  , k+1]) * dzidzi )
            end
        end
    end
end
In general, if you have a scalar x or an array A, you can get the type with T = typeof(x) or T = eltype(A), respectively, and then use that to convert a literal to the equivalent type, e.g.
julia> A = [1.0]
1-element Vector{Float64}:
1.0
julia> T = eltype(A)
Float64
julia> T(2)
2.0
So you could in principle use that within the function, and if everything is type-stable, this should actually be overhead-free:
julia> @code_native 2 * 1.0f0
.section __TEXT,__text,regular,pure_instructions
; ┌ @ promotion.jl:322 within `*'
; │┌ @ promotion.jl:292 within `promote'
; ││┌ @ promotion.jl:269 within `_promote'
; │││┌ @ number.jl:7 within `convert'
; ││││┌ @ float.jl:94 within `Float32'
vcvtsi2ss %rdi, %xmm1, %xmm1
; │└└└└
; │ @ promotion.jl:322 within `*' @ float.jl:331
vmulss %xmm0, %xmm1, %xmm0
; │ @ promotion.jl:322 within `*'
retq
nopw (%rax,%rax)
; └
julia> @code_native 2.0f0 * 1.0f0
.section __TEXT,__text,regular,pure_instructions
; ┌ @ float.jl:331 within `*'
vmulss %xmm1, %xmm0, %xmm0
retq
nopw %cs:(%rax,%rax)
; └
julia> @code_native Float32(2) * 1.0f0
.section __TEXT,__text,regular,pure_instructions
; ┌ @ float.jl:331 within `*'
vmulss %xmm1, %xmm0, %xmm0
retq
nopw %cs:(%rax,%rax)
; └
As it happens, however, there is a somewhat more elegant pattern in Julia: write the function signature so that it specializes parametrically on the element type of the arrays being passed in. You can then use the type parameter, without overhead, to ensure your literals are of the appropriate type:
function diff!(at::AbstractArray{T}, a::AbstractArray{T},
               visc, dxidxi, dyidyi, dzidzi,
               itot, jtot, ktot) where T <: Number
    @tturbo for k in 2:ktot-1
        for j in 2:jtot-1
            for i in 2:itot-1
                at[i, j, k] += visc * (
                    (a[i-1, j  , k  ] - T(2) * a[i, j, k] + a[i+1, j  , k  ]) * dxidxi +
                    (a[i  , j-1, k  ] - T(2) * a[i, j, k] + a[i  , j+1, k  ]) * dyidyi +
                    (a[i  , j  , k-1] - T(2) * a[i, j, k] + a[i  , j  , k+1]) * dzidzi )
            end
        end
    end
end
This sort of approach is discussed to some extent in the documentation on parametric methods in Julia.
There's a nice little function in Base:
help?> oftype
search: oftype
oftype(x, y)
Convert y to the type of x (convert(typeof(x), y)).
Examples
≡≡≡≡≡≡≡≡≡≡
julia> x = 4;
julia> y = 3.;
julia> oftype(x, y)
3
julia> oftype(y, x)
4.0
So you could use something like
two = oftype(at[i,j,k], 2)
in the appropriate place.
For multiple variables at once, you could write something like
two, visc, dxidxi, dyidyi, dzidzi = convert.(T, (2, visc, dxidxi, dyidyi, dzidzi))
at the top (with T a type parameter as in @cbk's answer), since oftype(x, y) = convert(typeof(x), y).
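For instance, broadcasting convert over a tuple returns a tuple, which destructures cleanly (the values here are just placeholders):
julia> T = Float32;

julia> two, visc = convert.(T, (2, 0.1))
(2.0f0, 0.1f0)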

Is it valid to add an entire array of bytes at once by converting them to a larger integer data type?

If I have two arrays that contain u8s, can I convert them into a larger integer type to reduce the number of additions I need to do? For example, if two byte arrays each contain 4 bytes, can I make them each into a u32, do the addition, and then convert them back?
For example:
let a = u32::from_ne_bytes([1, 2, 3, 4]);
let b = u32::from_ne_bytes([5, 6, 7, 8]);
let c = a + b;
let c_bytes = u32::to_ne_bytes(c);
assert_eq!(c_bytes, [6, 8, 10, 12]);
This example results in the correct output.
Does this always result in the right output (assuming there is no overflow)?
Is this faster than just doing the additions individually?
Does it hold true for other integer types? Such as 2 u16s in a u32 added with 2 other u16s in a u32?
If this exists and is common, what is it called?
Does this always result in the right output (assuming there is no overflow)?
Yes. Provided that each sum is less than 256, this will add the bytes as you want. You've specified "ne" in each case, for native endianness, and this works regardless of the native endianness because the operations are byte-wise.
If you wrote code to actually check that the sums are all in range, you would almost certainly undo any speed-up that you had gained (if there was any to begin with).
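The carry behaviour is easy to demonstrate; here is a quick sketch (written in Julia for brevity, assuming a little-endian machine) of both the working case and the failure mode:
# Pack four bytes into a UInt32, add, and unpack: fine while every
# per-byte sum stays below 256.
a = reinterpret(UInt32, UInt8[1, 2, 3, 4])[1]
b = reinterpret(UInt32, UInt8[5, 6, 7, 8])[1]
collect(reinterpret(UInt8, [a + b]))    # [0x06, 0x08, 0x0a, 0x0c]

# Once a per-byte sum reaches 256, the carry leaks into the neighbouring
# byte: 255 + 1 corrupts byte 1 instead of just wrapping byte 0.
x = reinterpret(UInt32, UInt8[255, 0, 0, 0])[1]
y = reinterpret(UInt32, UInt8[1, 0, 0, 0])[1]
collect(reinterpret(UInt8, [x + y]))    # [0x00, 0x01, 0x00, 0x00]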
Is this faster than just doing the additions individually?
Maybe. The only way to know for sure is to test.
Does it hold true for other integer types? Such as 2 u16s in a u32 added with 2 other u16s in a u32?
Yes, but you need to pay attention to byte order.
If this exists and is common, what is it called?
It's not common because it's usually unnecessary. This type of optimisation makes code harder to read and introduces considerable complexity and opportunities for bugs. Between them, the Rust compiler and LLVM are able to find extremely sophisticated optimisations that you would never think of, while your code stays readable and maintainable.
If it has a name, it's SIMD, and most modern processors support a form of it natively (SSE, MMX, AVX). You can do this manually using the built-in functions, e.g. core::arch::x86_64::_mm_add_epi8, but LLVM might do it automatically. It's possible that trying to do this manually could interfere with optimisations that LLVM would otherwise do, while making your code more bug-prone at the same time.
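For comparison, written at a high level and left to the compiler (a sketch in Julia, where UInt8 arithmetic wraps modulo 256), the whole operation is a one-line broadcast that LLVM will typically auto-vectorize:
# Byte-wise addition of two byte vectors; LLVM generally turns this
# broadcast loop into SIMD instructions on its own.
a = rand(UInt8, 1024)
b = rand(UInt8, 1024)
c = a .+ b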
I'm not an expert at assembly code by any means, but I took a look at the assembly generated for the following two functions:
#[no_mangle]
#[inline(never)]
pub fn f1(a1: u8, b1: u8, c1: u8, d1: u8, a2: u8, b2: u8, c2: u8, d2: u8) -> [u8; 4] {
    let a = u32::from_le_bytes([a1, b1, c1, d1]);
    let b = u32::from_le_bytes([a2, b2, c2, d2]);
    u32::to_le_bytes(a + b)
}
#[no_mangle]
#[inline(never)]
pub fn f2(a1: u8, b1: u8, c1: u8, d1: u8, a2: u8, b2: u8, c2: u8, d2: u8) -> [u8; 4] {
    [a1 + a2, b1 + b2, c1 + c2, d1 + d2]
}
The assembly for f1:
movzx r10d, byte ptr [rsp + 8]
shl ecx, 24
movzx eax, dl
shl eax, 16
movzx edx, sil
shl edx, 8
movzx esi, dil
or esi, edx
or esi, eax
or esi, ecx
mov ecx, dword ptr [rsp + 16]
shl ecx, 24
shl r10d, 16
movzx edx, r9b
shl edx, 8
movzx eax, r8b
or eax, edx
or eax, r10d
or eax, ecx
add eax, esi
ret
And for f2:
add r8b, dil
add r9b, sil
add dl, byte ptr [rsp + 8]
add cl, byte ptr [rsp + 16]
movzx ecx, cl
shl ecx, 24
movzx edx, dl
shl edx, 16
movzx esi, r9b
shl esi, 8
movzx eax, r8b
or eax, esi
or eax, edx
or eax, ecx
ret
Fewer instructions doesn't necessarily make it faster, but it's not a bad guideline.
Consider this kind of optimisation as a last resort, after careful measurement and testing.

Fibonacci with recursion steps shown indented / nested

This is the format that I need:
F(3) = F(2) + F(1) =
  F(2) = F(1) + F(0) =
    F(1) = 1
    F(0) = 1
  F(2) = 1
  F(1) = 1
F(3) = 2
and this is my code. What do I need to do to get the format I want?
Please give me a hint or something that may help, thank you. I have just started learning assembly language.
I only know how to show the first line, like f(n) = answer, but I don't know how to show the process.
.data
fib1    BYTE "f(",0
fib2    BYTE ") + f(",0
fib3    BYTE ") = ",0
intVal  DWORD ?

main PROC
    mov edx, OFFSET fib1        ; show f(intVal) =
    call WriteString
    mov edx, intVal
    call WriteDec
    mov edx, OFFSET fib3
    call WriteString
    mov ecx, intVal-1
    push intVal
    call fib
    add esp, 4
    call WriteDec               ; show result
    call crlf
    mov edx, OFFSET msg5        ; show goodbye msg
    call WriteString
    mov edx, OFFSET username
    call WriteString
    exit
main ENDP

fib PROC c
    add ecx, 1
    push ebp
    mov ebp, esp
    sub esp, 4
    mov eax, [ebp+8]            ; get value
    cmp eax, 2                  ; if ((n = 1) or (n = 2))
    je S4
    cmp eax, 1
    je S4
    dec eax                     ; do fib(n-1) + fib(n-2)
    push eax                    ; fib(n-1)
    call fib
    mov [ebp-4], eax            ; store first result
    dec dword ptr [esp]         ; (n-1) -> (n-2)
    call fib
    add esp, 4                  ; clear
    add eax, [ebp-4]            ; add result and stored first result
    jmp Quit
S4:
    mov eax, 1                  ; start from 1, 1
Quit:
    mov esp, ebp                ; restore esp
    pop ebp                     ; restore ebp
    ret
fib ENDP
END main
The code needs to output a "newline" at the end of each line of output, which could be a carriage return (00dh) followed by a linefeed (00ah), or just a linefeed (00ah), depending on the system (I don't know the Irvine setup).
The indented lines should be printed from within the fib function, which means you have to save (push/pop stack) any registers that the print functions use.
The fib function needs to print a variable number of spaces depending on the level of recursion; based on the sample output, that's 2 spaces per level of recursion.
The fib function needs to handle an input of 0 and return 0.
Note that the number of recursive calls to the fib function will be 2 * fib(n) - 1, assuming fib checks for fib(0), fib(1), and fib(2) (more if it doesn't check for fib(2)), which would be about 5.94 billion calls for fib(47) = 2971215073, the largest Fibonacci number that fits in a 32-bit unsigned integer. You may want to limit inputs to something like fib(10) = 55, fib(11) = 89, or fib(12) = 144.
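Putting those requirements together, here is a minimal sketch of the tracing logic, written in Julia rather than MASM purely to show the structure to transcribe into the fib procedure (it uses the fib(0) = 0 convention from the note above; adjust the base case if you want the question's 1, 1 start):
function fib_traced(n, depth=0)
    indent = " "^(2depth)                      # 2 spaces per recursion level
    if n <= 1                                  # base cases: fib(0) = 0, fib(1) = 1
        println(indent, "F(", n, ") = ", n)
        return n
    end
    println(indent, "F(", n, ") = F(", n - 1, ") + F(", n - 2, ") =")
    r = fib_traced(n - 1, depth + 1) + fib_traced(n - 2, depth + 1)
    println(indent, "F(", n, ") = ", r)        # resolved value, same indent
    return r
end

fib_traced(3)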

How to count number of assignments in Julia?

I know that Julia has a #time macro that outputs the amount of memory that is allocated, but is there any way to measure the number of assignments made in a function?
The problem with counting assignments is that, by the time the machine runs the code, register or memory loads and stores no longer correspond to the assignments in the original code. For instance, the code
julia> g(x) = x^3
g (generic function with 1 method)
julia> @code_llvm g(1)
define i64 @julia_g_70778(i64) #0 {
top:
%1 = mul i64 %0, %0
%2 = mul i64 %1, %0
ret i64 %2
}
julia> @code_native g(1)
.text
Filename: REPL[7]
pushq %rbp
movq %rsp, %rbp
Source line: 1
movq %rdi, %rax
imulq %rax, %rax
imulq %rdi, %rax
popq %rbp
retq
nopw %cs:(%rax,%rax)
clearly has four "assignments", two movq and two imulq. But the original code did not have a single assignment.
The closest you can get, therefore, is to use a macro to rewrite assignments so that they increment a counter (in addition to actually doing the assigning). This will of course likely slow down your code substantially, so I do not recommend it.
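For what it's worth, here is a minimal sketch of such a macro. The names (ASSIGN_COUNT, rewrite, @count_assignments) are made up for illustration, and a real version would need to skip = nodes that aren't assignments, such as for-loop specifications and keyword arguments:
const ASSIGN_COUNT = Ref(0)     # run-time counter

rewrite(x) = x                  # leave non-Expr leaves alone
function rewrite(ex::Expr)
    ex = Expr(ex.head, map(rewrite, ex.args)...)   # rewrite nested expressions first
    if ex.head == :(=)
        return quote            # increment the counter, then do the assignment
            ASSIGN_COUNT[] += 1
            $ex
        end
    end
    return ex
end

macro count_assignments(ex)
    esc(rewrite(ex))
end

ASSIGN_COUNT[] = 0
@count_assignments begin
    x = 1
    y = x + 2
end
ASSIGN_COUNT[]                  # 2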

Access the AST for generic functions in Julia

How can I access the abstract syntax tree for a generic function in Julia?
To recap: It looks like Simon was looking for the AST for a specific method associated with a generic function. We can get a LambdaStaticData object, which contains the AST, for a specific method as follows:
julia> f(x,y)=x+y
julia> f0 = methods(f, (Any, Any))[1]
((Any,Any),(),AST(:($(expr(:lambda, {x, y}, {{}, {{x, Any, 0}, {y, Any, 0}}, {}}, quote # none, line 1:
return +(x,y)
end)))),())
julia> f0[3]
AST(:($(expr(:lambda, {x, y}, {{}, {{x, Any, 0}, {y, Any, 0}}, {}}, quote # none, line 1:
return +(x,y)
end))))
julia> typeof(ans)
LambdaStaticData
Apparently this AST can either be an Expr object or a compressed AST object, represented as a sequence of bytes:
julia> typeof(f0[3].ast)
Array{Uint8,1}
The show() method for LambdaStaticData from base/show.jl illustrates how to decompress this, when encountered:
julia> ccall(:jl_uncompress_ast, Any, (Any, Any), f0[3], f0[3].ast)
:($(expr(:lambda, {x, y}, {{}, {{x, Any, 0}, {y, Any, 0}}, {}}, quote # none, line 1:
return +(x,y)
end)))
julia> typeof(ans)
Expr
Julia has four functions, plus four macros analogous to those functions, that can be used to inspect a great deal about a generic function's methods:
julia> f(x, y) = x + y
f (generic function with 1 method)
julia> methods(f)
# 1 method for generic function "f":
f(x,y) at none:1
Lowered code:
julia> code_lowered(f, (Int, Int))
1-element Array{Any,1}:
:($(Expr(:lambda, {:x,:y}, {{},{{:x,:Any,0},{:y,:Any,0}},{}}, :(begin # none, line 1:
return x + y
end))))
julia> @code_lowered f(1, 1) # Both `Int`s
...same output.
Typed code:
julia> code_typed(f, (Int, Int))
1-element Array{Any,1}:
:($(Expr(:lambda, {:x,:y}, {{},{{:x,Int64,0},{:y,Int64,0}},{}}, :(begin # none, line 1:
return (top(box))(Int64,(top(add_int))(x::Int64,y::Int64))::Int64
end::Int64))))
julia> @code_typed f(1, 1) # Both `Int`s
...same output.
LLVM code:
julia> code_llvm(f, (Int, Int))
define i64 #julia_f_24771(i64, i64) {
top:
%2 = add i64 %1, %0, !dbg !1014
ret i64 %2, !dbg !1014
}
julia> @code_llvm f(1, 1) # Both `Int`s
...same output.
Native code:
julia> code_native(f, (Int, Int))
.text
Filename: none
Source line: 1
push RBP
mov RBP, RSP
Source line: 1
add RDI, RSI
mov RAX, RDI
pop RBP
ret
julia> @code_native f(1, 1) # Both `Int`s
...same output.
Type instability warnings (v0.4+):
julia> @code_warntype f(1, 1)
Variables:
x::Int64
y::Int64
Body:
begin # In[17], line 1:
return (top(box))(Int64,(top(add_int))(x::Int64,y::Int64))
end::Int64
See also the Reflection and introspection section of the Julia documentation.
I'm not sure that there is a single AST associated with a generic function, because of multiple dispatch. If you're writing a function definition fbody, you should be able to get the AST by doing dump(quote fbody end).
