getelementptr in LLVM IR - llvm-ir

I don't understand the following code written in LLVM IR. I hope you can give me a hint.
%struct.foo_struct = type {[3 x i32], i16*, i32}
;struct foo_struct {
; [3 x i32] f0;
; i16* f1;
; i32 f2;
; };
define i32 @foo(%struct.foo_struct* %P) {
entry:
; &P[0].f1
%tmp0 = getelementptr inbounds %struct.foo_struct, %struct.foo_struct* %P, i64 0, i32 1
; P[0].f1
%tmp1 = load i16*, i16** %tmp0
; &P[0].f1[0]
%tmp2 = getelementptr inbounds i16, i16* %tmp1, i64 0
Specifically, the first instruction in entry ends with i32 1. Why i32? Since we want to jump to the next field, namely f1, we have to jump over an array (f0), which is [3 x i32]. So what is this i32? What would be there if we wanted, e.g., &P[0].f2?
Thank you for any help

Since we want to jump to the next field, namely f1, we have to jump over an array (f0), which is [3 x i32]. So what is this i32?
1 is the index of the member in the struct and i32 is the type of this index.
The fact that we're jumping over a [3 x i32] to get to f1, and thus jumping by the size of that array (12 bytes, plus whatever alignment padding the target requires), is not directly encoded in the instruction. All we're specifying is that we want the second member (i.e. the member at index 1); how that translates to an offset in bytes is calculated by LLVM from the types involved. We don't spell this out in the instruction.
What would be there if we want to have e.g. &P[0].f2?
i32 2
Why i32?
Struct indices always have the type i32. Since struct indices must always be constants, there wasn't really a need to allow different types and i32 is large enough for all practical purposes (i.e. you won't have a struct with more than 2^32 members).
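For illustration, here is a sketch of what the IR for &P[0].f2 could look like under the same struct definition (the register names %tmp3/%tmp4 are made up):
; &P[0].f2
%tmp3 = getelementptr inbounds %struct.foo_struct, %struct.foo_struct* %P, i64 0, i32 2
; P[0].f2
%tmp4 = load i32, i32* %tmp3
The first index (i64 0) still selects *P itself, and the i32 2 selects the third member, f2; LLVM computes the actual byte offset from the struct layout.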

Related

Why is it not necessary to dereference when operating on references of primitive types in Rust? [duplicate]

I'm new to Rust and trying to learn how references work. In the following code, when I want to do a calculation on a1, which is a reference to an i32, I don't have to dereference it. But with b1, which is a reference to a Box, I have to dereference it.
Actually, both let a2 = a1 * 2; and let a3 = *a1 * 2; behave the same. It looks like dereferencing primitives is optional, or the compiler is doing it implicitly for us.
fn main() {
    let a = 5;
    let b = Box::new(10);
    let a1 = &a;
    let b1 = &b;
    println!("{} {}", a1, b1);
    let a2 = a1 * 2;
    let b2 = (**b1) * 10;
    let a3 = *a1 * 2;
    println!("{} {} {}", a2, a3, b2);
}
Can someone please explain this functionality?
All of the arithmetic operators in Rust are implemented for both primitive values and references to primitives on either side of the operator. For example, see the Implementors section of std::ops::Mul, the trait that controls the overloading of the * operator.
You'll see something like:
impl Mul<i32> for i32
impl<'a> Mul<i32> for &'a i32
impl<'a> Mul<&'a i32> for i32
impl<'a, 'b> Mul<&'a i32> for &'b i32
and so on and so on.
In your example, b1 has the type &Box<i32> (i32 being the default integer type), and while Box implements many traits as a passthrough to its contained type (e.g. impl<T: Read> Read for Box<T>), the arithmetic operators are not among them. That is why you have to dereference twice: once to get from the reference to the Box, and once more to reach the i32 inside it.
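Here is a minimal sketch of just the Box case (the variable names are made up) showing which forms compile:
fn main() {
    let b = Box::new(10);
    let b1 = &b; // b1: &Box<i32>
    // let bad = b1 * 10;   // error: `Mul` is not implemented for `&Box<i32>`
    // let bad = *b1 * 10;  // error: `Mul` is not implemented for `Box<i32>` either
    let ok = **b1 * 10;     // dereference twice to reach the i32
    println!("{}", ok);
}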

Why are these two code samples equal? Difference between references, borrowed variables, pointers

fn largest(num_list: &[i32]) -> i32 {
    let mut largest = num_list[0];
    for &num in num_list {
        if num > largest {
            largest = num
        }
    }
    largest
}

fn largest2(num_list: &[i32]) -> i32 {
    let mut largest = num_list[0];
    for num in num_list {
        if num > &largest {
            largest = *num
        }
    }
    largest
}

fn main() {
    let num_list = vec![30, 20, 10, 60, 50, 40];
    let largest = largest(&num_list);
    let largest2 = largest2(&num_list);
    println!("The largest number in num_list fn is: {}.", largest);
    println!("The largest number in num_list fn is: {}.", largest2);
}
As you can see, there are some slight differences between largest and largest2. Can someone help me understand the differences here and why both samples of code function the same?
The main difference between the two samples stems from the loop statement: for num in num_list vs for &num in num_list. First of all, it is important to understand that both are equivalent to for [...] in num_list.iter(), that is, they iterate over references to the elements of num_list. These references, of type &i32, are then matched against either the pattern num or the pattern &num. In the first case we have a direct binding, so num: &i32. In the second case the irrefutable pattern &num destructures the reference and binds num to the pointed-to number, so num: i32. Incidentally, this is possible because, even though it looks like moving a value out of a borrow, i32: Copy, so Rust just copies the value and compiles it fine.
The rest is just adaptation: either you are working with a &i32, or directly with an i32. For instance, when num: &i32 and you want to compare it with largest, you can dereference num and then compare, giving *num > largest. However, num > &largest works too, because Rust knows how to compare two &i32 values by dereferencing both (so it effectively performs *num > largest). Similarly, when you assign to largest you must assign an i32, so you dereference num: largest = *num.
It's easy to understand, then, why these two pieces of code do the same thing: in one version you copy the referenced integer into num and then use it directly, whereas in the other you keep the reference in num and simply dereference it each time you need the value.
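The same distinction, reduced to a small sketch outside the question's code:
fn main() {
    let nums = vec![30, 20, 10];
    // `num` is a `&i32`: we iterate over references.
    for num in &nums {
        println!("{}", *num * 2); // dereference to reach the i32
    }
    // The `&num` pattern destructures the reference, so `num` is an `i32` (possible because i32: Copy).
    for &num in &nums {
        println!("{}", num * 2);
    }
}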

How to get a `Ptr` to an element of an `NTuple`?

Say I have a tuple of Cchar like
str = ('f', 'o', 'o', '\0', '\0')
and I want to convert it to a more traditional string. If str were a Vector, I could create a Ptr and do all sorts of things with it. I've tried various ways of passing str to methods of pointer, Ptr, Ref, and unsafe_string without success, since those normally work on arrays rather than tuples. Any suggestions?
Note: what I really have is a C struct that looks like
typedef struct foo {
    char str[FOO_STR_MAX_SZ];
    ...
} foo_t;
which Clang.jl wrapped as
struct foo_t
    str::NTuple{FOO_STR_MAX_SZ, UInt8}
    ...
end
I also played around with NTuple of Cchar (ie, Int8) instead of UInt8, and I tried to use SVector instead of NTuple as well. But I still couldn't find a way to generate a Ptr from the str field. Am I missing something?
Since you asked the question, I think collecting it to an array a = collect(x.str) is not the answer you are expecting...
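For completeness, a rough sketch of what that collect-based route could look like (x stands for the struct instance from the question, so this assumes the surrounding code):
bytes = collect(x.str)                                # NTuple{N,UInt8} -> Vector{UInt8}
s = GC.@preserve bytes unsafe_string(pointer(bytes))  # copies up to the first NUL byte
# or, keeping every byte: String(copy(bytes))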
You could use ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), a) to get the pointer of a even if a is immutable. However, blindly using it will produce some confusing results:
julia> struct foo_t
           str::NTuple{2, UInt8}
       end
julia> a = foo_t((2, 3))
foo_t((0x02, 0x03))
julia> ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), a.str)
Ptr{Nothing} @0x00007f4302c4f670
julia> ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), a.str)
Ptr{Nothing} @0x00007f4302cc47e0
We got two different pointers from the same object! The reason is that since NTuple is immutable, the compiler will do many "optimizations" for it, for example copying it every time you use it. This is why getting pointers from immutable objects is explicitly forbidden in the source code:
function pointer_from_objref(@nospecialize(x))
    @_inline_meta
    typeof(x).mutable || error("pointer_from_objref cannot be used on immutable objects")
    ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), x)
end
However, there are several workarounds. First, since the expression a.str copies the tuple, you can avoid that expression and calculate the field's address directly from the address of a plus fieldoffset(typeof(a), 1) (the 1 means str is the first field of foo_t):
julia> p = Ptr{UInt8}(ccall(:jl_value_ptr, Ptr{UInt8}, (Any,), a)) + fieldoffset(typeof(a), 1)
Ptr{UInt8} @0x00007f4304901df0
julia> p2 = Ptr{UInt8}(ccall(:jl_value_ptr, Ptr{UInt8}, (Any,), a)) + fieldoffset(typeof(a), 1)
Ptr{UInt8} @0x00007f4304901df0
julia> p === p2
true
julia> unsafe_store!(p, 5)
Ptr{UInt8} @0x00007f4304901df0
julia> a
foo_t((0x05, 0x03))
It now works. However, there are still caveats: when you try to wrap the code in a function, it becomes wrong again:
julia> mut!(a) = unsafe_store!(Ptr{UInt8}(ccall(:jl_value_ptr, Ptr{UInt8}, (Any,), a)) + fieldoffset(typeof(a), 1), 8)
mut! (generic function with 1 method)
julia> mut!(a)
Ptr{UInt8} @0x00007f42ec560294
julia> a
foo_t((0x05, 0x03))
a is not changed because, well, foo_t itself is also immutable and will be copied when passed to mut!, so the change made within the function will not be visible outside. To solve this, we need to wrap a in a mutable object to give it a stable address on the heap. Base.RefValue can be used for this purpose:
julia> b = Base.RefValue(a)
Base.RefValue{foo_t}(foo_t((0x05, 0x03)))
julia> mut!(b) = unsafe_store!(Ptr{UInt8}(ccall(:jl_value_ptr, Ptr{UInt8}, (Any,), b)) + fieldoffset(typeof(b), 1) + fieldoffset(typeof(a), 1), 8)
mut! (generic function with 1 method)
julia> mut!(b)
Ptr{UInt8} @0x00007f43057b3820
julia> b
Base.RefValue{foo_t}(foo_t((0x08, 0x03)))
julia> b[]
foo_t((0x08, 0x03))
As explained by @张实唯, str is a constant array which is stack-allocated, so you need to use pointer arithmetic to access the field. There is a package called Blobs.jl for this kind of purpose. As for the mutability, you could also use Setfield.jl for convenience.
BTW, Clang.jl does support generating mutable structs via ctx.options["is_struct_mutable"] = true.

how to get string literal from LLVM IR instruction

I want to get string literal from LLVM IR.
C source code looked like:
char *test = "string";
LLVM IR looked like:
@.str = private unnamed_addr constant [7 x i8] c"string\00", align 1
@test = global i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0), align 8
I somehow got the second line of the IR into the ArgValue variable.
My code looks like this, and now I am stuck after getting the Constant*:
GetElementPtrInst *gep = dyn_cast<GetElementPtrInst>(ArgValue);
Value *Valop = gep->getPointerOperand();
Instruction *inst = dyn_cast<Instruction>(Valop);
Constant *cda = dyn_cast<Constant>(inst->getOperand(0));
Now, after the last statement, how do I get the constant "string"? That is where I am stuck.
Constant *cda is not null; the cast on that line succeeds.
It does not work if I try to cast it to any other type.
please help...
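For what it's worth, here is a sketch of one way to read the bytes once you have reached the global @.str. It assumes cda ends up being that GlobalVariable (which depends on how ArgValue was obtained) and that the usual using namespace llvm and the GlobalVariable/Constants headers are in place:
if (auto *GV = dyn_cast<GlobalVariable>(cda)) {
    if (GV->hasInitializer()) {
        if (auto *Arr = dyn_cast<ConstantDataArray>(GV->getInitializer())) {
            if (Arr->isCString()) {
                StringRef Str = Arr->getAsCString(); // Str == "string"
                // use Str here
            }
        }
    }
}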

Julia: invoke a function by a given string

Does Julia support reflection just like Java?
What I need is something like this:
str = ARGS[1] # str is a string
# invoke the function str()
The Good Way
The recommended way to do this is to convert the function name to a symbol and then look up that symbol in the appropriate namespace:
julia> fn = "time"
"time"
julia> Symbol(fn)
:time
julia> getfield(Main, Symbol(fn))
time (generic function with 2 methods)
julia> getfield(Main, Symbol(fn))()
1.448981716732318e9
You can change Main here to any module to only look at functions in that module. This lets you constrain the set of callable functions to only those available in that module. You can use a "bare module" to create a namespace that has only the functions you populate it with, without importing all names from Base by default.
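For example, a sketch of such a restricted namespace (the module name Allowed and the exposed functions are just for illustration):
baremodule Allowed
import Base: time, sqrt   # expose only these two functions
end

fn = "sqrt"
f = getfield(Allowed, Symbol(fn))   # works: sqrt is bound in Allowed
f(2.0)                              # 1.4142135623730951

getfield(Allowed, Symbol("run"))    # ERROR: UndefVarError, since run was never exposed here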
The Bad Way
A different approach that is not recommended but which many people seem to reach for first is to construct a string for code that calls the function and then parse that string and evaluate it. For example:
julia> eval(parse("$fn()")) # NOT RECOMMENDED
1.464877410113412e9
While this is temptingly simple, it's not recommended since it is slow, brittle and dangerous. Parsing and evaling code is inherently much more complicated and thus slower than doing a name lookup in a module – name lookup is essentially just a hash table lookup. In Julia, where code is just-in-time compiled rather than interpreted, eval is much slower and more expensive since it doesn't just involve parsing, but also generating LLVM code, running optimization passes, emitting machine code, and then finally calling a function. Parsing and evaling a string is also brittle since all intended meaning is discarded when code is turned into text. Suppose, for example, someone accidentally provides an empty function name – then the fact that this code is intended to call a function is completely lost by accidental similarity of syntaxes:
julia> fn = ""
""
julia> eval(parse("$fn()"))
()
Oops. That's not what we wanted at all. In this case the behavior is fairly harmless but it could easily be much worse:
julia> fn = "println(\"rm -rf /important/directory\"); time"
"println(\"rm -rf /important/directory\"); time"
julia> eval(parse("$fn()"))
rm -rf /important/directory
1.448981974309033e9
If the user's input is untrusted, this is a massive security hole. Even if you trust the user, it is still possible for them to accidentally provide input that will do something unexpected and bad. The name lookup approach avoids these issues:
julia> getfield(Main, Symbol(fn))()
ERROR: UndefVarError: println("rm -rf /important/directory"); time not defined
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
The intent of looking up a name and then calling it as a function is explicit, instead of implicit in the generated string syntax, so at worst one gets an error about a strange name being undefined.
Performance
If you're going to call a dynamically specified function in an inner loop or as part of some recursive computation, you will want to avoid doing a getfield lookup every time you call the function. In this case all you need to do is make a const binding to the dynamically specified function before defining the iterative/recursive procedure that calls it. For example:
fn = "deg2rad" # converts angles in degrees to radians
const f = getfield(Main, Symbol(fn))
function fast(n)
    t = 0.0
    for i = 1:n
        t += f(i)
    end
    return t
end
julia> @time fast(10^6) # once for JIT compilation
0.010055 seconds (2.97 k allocations: 142.459 KB)
8.72665498661791e9
julia> @time fast(10^6) # now it's fast
0.003055 seconds (6 allocations: 192 bytes)
8.72665498661791e9
julia> @time fast(10^6) # see?
0.002952 seconds (6 allocations: 192 bytes)
8.72665498661791e9
The binding f must be constant for optimal performance, since otherwise the compiler can't know that you won't change f to point at another function at any time (or even something that's not a function), so it has to emit code that looks f up dynamically on every loop iteration – effectively the same thing as if you manually call getfield in the loop. Here, since f is const, the compiler knows f can't change so it can emit fast code that just calls the right function directly. But the compiler can sometimes do even better than that – in this case it actually inlines the implementation of the deg2rad function, which is just a multiplication by pi/180:
julia> @code_llvm fast(100000)
define double @julia_fast_51089(i64) #0 {
top:
%1 = icmp slt i64 %0, 1
br i1 %1, label %L2, label %if.preheader
if.preheader: ; preds = %top
br label %if
L2.loopexit: ; preds = %if
br label %L2
L2: ; preds = %L2.loopexit, %top
%t.0.lcssa = phi double [ 0.000000e+00, %top ], [ %5, %L2.loopexit ]
ret double %t.0.lcssa
if: ; preds = %if.preheader, %if
%t.04 = phi double [ %5, %if ], [ 0.000000e+00, %if.preheader ]
%"#temp#.03" = phi i64 [ %2, %if ], [ 1, %if.preheader ]
%2 = add i64 %"#temp#.03", 1
%3 = sitofp i64 %"#temp#.03" to double
%4 = fmul double %3, 0x3F91DF46A2529D39 ; deg2rad(x) = x*(pi/180)
%5 = fadd double %t.04, %4
%6 = icmp eq i64 %"#temp#.03", %0
br i1 %6, label %L2.loopexit, label %if
}
If you need to do this with many different dynamically specified functions, then you can even pass the function to be called in as an argument:
function fast(f, n)
    t = 0.0
    for i = 1:n
        t += f(i)
    end
    return t
end
julia> @time fast(getfield(Main, Symbol(fn)), 10^6)
0.007483 seconds (1.70 k allocations: 76.670 KB)
8.72665498661791e9
julia> @time fast(getfield(Main, Symbol(fn)), 10^6)
0.002908 seconds (6 allocations: 192 bytes)
8.72665498661791e9
This generates the same fast code as single-argument fast above, but will generate a new version for every different function f that you call it with.
