How do I define a pointer to a variable or list element in Julia? I have tried reading some resources but I am really confused about using a pointer in Julia.
You cannot have a pointer to a variable—unlike C/C++, Julia doesn't work like that: variables don't have memory locations as part of the language semantics; the compiler may store a variable in memory (or in a register), but that's none of your business 😁. Mutable objects, however, do generally live in memory and you can get a pointer to such an object using the pointer_from_objref function:
pointer_from_objref(x)
Get the memory address of a Julia object as a Ptr. The existence of
the resulting Ptr will not protect the object from garbage collection,
so you must ensure that the object remains referenced for the whole
time that the Ptr will be used.
This function may not be called on immutable objects, since they do
not have stable memory addresses.
See also: unsafe_pointer_to_objref.
Why the awful name? Because, really, why are you taking pointers to objects? You probably shouldn't do that. You can also get a pointer into an array using the pointer function:
pointer(array [, index])
Get the native address of an array or string, optionally at a given
location index.
This function is "unsafe". Be careful to ensure that a Julia reference
to array exists as long as this pointer will be used. The GC.@preserve
macro should be used to protect the array argument from garbage
collection within a given block of code.
Calling Ref(array[, index]) is generally preferable to this function
as it guarantees validity.
This is a somewhat more legitimate use case, especially for interop with C or Fortran, but be careful. The interaction between raw pointers and garbage collection is tricky and dangerous. If you're not doing interop then think hard about why you need pointers—you probably want to approach the problem differently.
I'm working on a toy JavaScript-like programming language which I'm writing in Rust, and this language has a very basic mark-and-sweep garbage collector. I have a working prototype, and the way I implemented it is to wrap Rust objects which are going to be allocated on the GC heap inside of a Box. Then I can get a pointer to the GC'd object and wrap that pointer in a dynamically-typed Value type, i.e.:
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Value
{
    Int64(i64),
    UInt64(u64),
    HostFn(HostFn),
    Fun(*mut Function),
    Str(*mut String),
    Nil,
}
These are the values which my bytecode interpreter works with. The full source code is here if anyone wants to see more about how the GC works and provide feedback. I'm trying to keep the implementation as simple as possible so that this will be beginner-friendly.
What I would like advice with is that working with mutable pointers like this in Rust is very unergonomic. Every time I want to dereference them, I have to wrap everything in unsafe. This means, if I want to implement string concatenation or equality, I have to wrap the whole thing in an unsafe block where I'll dereference and borrow both strings. I have to do the same thing if I want to implement function calls. That's going to result in unsafe blocks everywhere in the implementation of my interpreter.
So my question is: can we make this more ergonomic somehow? One potential idea would be to have a helper method on Value that can return a &'static mut String for example. This is essentially lying to the Rust compiler about lifetimes for the sake of convenience, because the programmer still needs to be careful that GC'd values are on the VM heap, otherwise they will be collected. Would this be safe though? The real danger would come if the Rust compiler were to reorder operations around. Advice welcome.
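One possible shape for the helper-method idea, as a rough sketch rather than a vetted design: keep the raw pointer dereference inside a single unsafe accessor, so callers state the GC invariant once instead of dereferencing everywhere. The trimmed-down Value, new_str, as_str and concat below are hypothetical names, and Box::into_raw merely stands in for an allocation on the GC heap. The returned borrow is tied to &self instead of 'static, which is a narrower claim but still one the compiler cannot verify.

```rust
// Simplified stand-in for the Value enum from the question (assumption:
// only the variants needed for the example are kept).
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Value {
    Int64(i64),
    Str(*mut String),
    Nil,
}

impl Value {
    /// Hypothetical constructor: Box::into_raw stands in for a GC-heap
    /// allocation. Nothing here ever frees the string.
    pub fn new_str(s: &str) -> Value {
        Value::Str(Box::into_raw(Box::new(s.to_string())))
    }

    /// Borrow the string behind a Str value, or None for other variants.
    ///
    /// # Safety
    /// The caller must guarantee the pointee is still alive (not collected)
    /// and not mutated elsewhere while the returned borrow is in use.
    pub unsafe fn as_str(&self) -> Option<&str> {
        match *self {
            Value::Str(p) => {
                // Explicit reborrow of the raw pointer; this is the only
                // dereference in the whole example.
                let s: &String = unsafe { &*p };
                Some(s.as_str())
            }
            _ => None,
        }
    }
}

/// String concatenation with one small unsafe block instead of an unsafe
/// body. Returns None if either operand is not a string.
pub fn concat(a: Value, b: Value) -> Option<Value> {
    // SAFETY (assumed): both values are rooted, so the GC cannot free them
    // during this call.
    let (sa, sb) = unsafe { (a.as_str()?, b.as_str()?) };
    Some(Value::new_str(&format!("{}{}", sa, sb)))
}
```

This does not make the code any safer than before; it only narrows where the promise "this object is still alive" has to be made.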
I have a Rust dynamic library which is intended to be called from any language. The arguments to the exported function are two char * pointers to memory and two lengths, one for each piece of memory.
The problem is that from_raw_parts reduces to a memcpy and can segfault in a variety of dangerous ways if for example the lengths are wrong. I'm then using bincode::deserialize on the slices to use them as Rust objects. Is there any safer option to deal with incoming raw pointers to memory?
No.
What you are asking doesn't make sense. To some level, the entire reason that Rust the language exists is because raw pointers are inherently dangerous. Rust's references (and their related lifetimes) are a structured way of performing compile-time checks to ensure that a pointer is valid and safe to use.
Once you start using raw pointers, the compiler can no longer help you with those pointers and it's now up to you to ensure that safety is guaranteed.
from_raw_parts reduces to a memcpy
This doesn't seem correct. No memory should be copied to create a slice. A Rust slice is effectively just a pair of (pointer, length) — the same things that you are passing in separately. I'd expect those each to be register-sized, so calling memcpy would be overkill.
Using the resulting slice could possibly involve copying the data, but that's not due to from_raw_parts anymore.
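To illustrate, here is a minimal sketch. The helper name bytes_from_raw is made up, and *const u8 stands in for the char * arguments. The only checks available on the Rust side are cheap ones such as null and size; whether the pointer really refers to len readable bytes remains the caller's promise. The resulting slice starts at the same address as the input pointer, so nothing is copied.

```rust
use std::slice;

/// Hypothetical helper for the FFI boundary: turn (pointer, length) into a
/// borrowed slice after the few checks that can be done cheaply.
///
/// # Safety
/// `ptr` must point to `len` initialized bytes that stay valid and are not
/// written to for the lifetime 'a. This cannot be verified from Rust.
pub unsafe fn bytes_from_raw<'a>(ptr: *const u8, len: usize) -> Option<&'a [u8]> {
    // Reject the inputs we *can* detect: null pointers and lengths that
    // exceed what a slice is allowed to span.
    if ptr.is_null() || len > isize::MAX as usize {
        return None;
    }
    // No copy happens here: the slice is just the same pointer plus a length.
    Some(unsafe { slice::from_raw_parts(ptr, len) })
}
```

A wrong but non-null pointer, or a length that is too large, still passes these checks, which is why the answer to "is there a safer option" is no.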
I get the impression that Rust is intended to be used in highly safe systems. Then I noticed that raw pointers allow arbitrary pointer arithmetic, and they can cause memory safety and security issues.
Basically, a pointer is an object that refers to another object. In most programming languages (I guess) a pointer is actually just a number that refers to a memory address. Rust's raw pointers are really just that - memory addresses. There are other pointer types in Rust (& references, Box, Rc, Arc), for which the compiler can verify that the memory is valid and contains what the program thinks it contains. This is not the case for raw pointers; they can in principle point to any memory location, regardless of the content. Refer to The Book for more details.
Raw pointers can only be dereferenced inside unsafe blocks. These blocks are a way for the programmer to tell the compiler "I know better than you that this is safe and I promise not to do anything stupid".
It is generally best to avoid raw pointers if possible because the compiler cannot reason about their validity, which makes them unsafe in general. Things that make raw pointers unsafe are the potential to...
access a NULL pointer,
access a dangling (freed or invalid) pointer,
free a pointer multiple times.
All of these points boil down to dereferencing the pointer, that is, using the memory it points to.
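As a small sketch of where the unsafe boundary sits (the function names are invented for this example): creating a raw pointer from a reference needs no unsafe, while reading or writing through it does.

```rust
/// Increment an i32 through a raw pointer and return the new value.
pub fn bump_through_raw(x: &mut i32) -> i32 {
    // Creating the raw pointer is safe; it is just an address.
    let p: *mut i32 = x;
    // SAFETY: p was derived from a live, exclusive reference just above.
    unsafe {
        *p += 1;
        *p
    }
}

/// Read through a pointer that may be null, falling back to 0.
pub fn read_or_default(p: *const i32) -> i32 {
    // SAFETY (assumed): p is either null or points to a live, aligned i32.
    // as_ref() only checks for null; a dangling pointer would still be
    // undefined behaviour.
    match unsafe { p.as_ref() } {
        Some(v) => *v,
        None => 0,
    }
}
```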
However, using raw pointers without dereferencing them is perfectly safe. This has a use case in finding out if two references point to the same object:
fn is_same(a: &i32, b: &i32) -> bool {
    a as *const _ == b as *const _
}
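The standard library offers std::ptr::eq for the same address comparison. The sketch below uses it under a different, made-up function name and shows that two variables holding equal values are still distinct objects.

```rust
/// Address comparison via the standard library; no dereference takes place,
/// so no unsafe is needed.
pub fn is_same_object(a: &i32, b: &i32) -> bool {
    std::ptr::eq(a, b)
}

/// Value comparison, for contrast.
pub fn is_equal_value(a: &i32, b: &i32) -> bool {
    a == b
}
```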
Another use case is the foreign function interface (FFI). If you wrap a C function that takes raw pointers as arguments, there is no way around providing them to the function. This is actually unsafe (as is the whole FFI business), because the function is likely to dereference the pointer. This means you are responsible for making sure the pointer is valid, stays valid, and is not freed multiple times.
Finally, raw pointers are used for optimization. For example, the slice iterator uses raw pointers as internal state. This is faster than indexing because it avoids bounds checks during iteration. However, it is also unsafe as far as the compiler is concerned. The library author needs to pay extra attention, so using raw pointers for optimization always comes at the risk of introducing memory bugs that you normally do not have in Rust.
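A rough sketch of the idea, not the actual standard library implementation: walk a slice with a start and an end pointer instead of an index. The function name is invented; as_ptr_range and add are real standard library methods.

```rust
/// Sum a slice by advancing a raw pointer from start to end.
pub fn sum_with_raw_pointers(data: &[i32]) -> i32 {
    // One-past-the-end range of the slice: start..end.
    let range = data.as_ptr_range();
    let mut p = range.start;
    let mut total = 0;
    while p != range.end {
        // SAFETY: p is always inside [start, end) of a live slice here, so
        // reading it and stepping by one element stays in bounds (or lands
        // exactly on the one-past-the-end address).
        unsafe {
            total += *p;
            p = p.add(1);
        }
    }
    total
}
```

Nothing checks the bounds inside the loop; correctness rests entirely on the loop condition, which is exactly the extra care the paragraph above refers to. In ordinary code, data.iter().sum() gives the same result without any unsafe.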
In summary, the four main uses of raw pointers are:
"just numbers" - you never access the memory they point to.
FFI - you pass them outside Rust.
memory-mapped I/O - to trigger I/O actions you need to access hardware registers at fixed addresses.
performance - they can be faster than other options, but the compiler won't enforce safety.
As to when raw pointers should be used, the first three points are straightforward: you will know when they apply because you have no other choice. The last point is more subtle. As with all optimizations, only use them when the benefit outweighs the effort and risk of using them.
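For the memory-mapped I/O case, access usually goes through volatile reads and writes so the compiler does not remove or merge them. The sketch below uses a made-up set_bit helper; on real hardware the pointer would be a fixed register address from the device's datasheet, whereas here an ordinary variable stands in for the register.

```rust
use std::ptr;

/// Set one bit in a 32-bit register using volatile access.
///
/// # Safety
/// `reg` must be a valid, aligned address that may be read and written as a
/// u32 (on real hardware: a mapped device register).
pub unsafe fn set_bit(reg: *mut u32, bit: u32) {
    unsafe {
        // Volatile accesses are never elided or reordered relative to other
        // volatile accesses by the compiler.
        let current = ptr::read_volatile(reg);
        ptr::write_volatile(reg, current | (1u32 << bit));
    }
}
```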
Conversely, do not use raw pointers whenever one of the other pointer types (& references, Box, Rc, Arc) does the job.
In Go I seem to have two options:
foo := Thing{}
foo.bar()
foo := &Thing{}
foo.bar()
func (self Thing) bar() {
}
func (self *Thing) bar() {
}
What's the better way to define my funcs with self Thing or with self *Thing?
Edit: this is not a duplicate of the question about methods and functions. This question has to do with Thing and &Thing and I think it's different enough to warrant its own URL.
Take a look at this item from the official FAQ:
For programmers unaccustomed to pointers, the distinction between
these two examples can be confusing, but the situation is actually
very simple. When defining a method on a type, the receiver (s in the
above examples) behaves exactly as if it were an argument to the
method. Whether to define the receiver as a value or as a pointer is
the same question, then, as whether a function argument should be a
value or a pointer. There are several considerations.
First, and most important, does the method need to modify the
receiver? If it does, the receiver must be a pointer. (Slices and maps
act as references, so their story is a little more subtle, but for
instance to change the length of a slice in a method the receiver must
still be a pointer.) In the examples above, if pointerMethod modifies
the fields of s, the caller will see those changes, but valueMethod is
called with a copy of the caller's argument (that's the definition of
passing a value), so changes it makes will be invisible to the caller.
By the way, pointer receivers are identical to the situation in Java,
although in Java the pointers are hidden under the covers; it's Go's
value receivers that are unusual.
Second is the consideration of efficiency. If the receiver is large, a
big struct for instance, it will be much cheaper to use a pointer
receiver.
Next is consistency. If some of the methods of the type must have
pointer receivers, the rest should too, so the method set is
consistent regardless of how the type is used. See the section on
method sets for details.
For types such as basic types, slices, and small structs, a value
receiver is very cheap so unless the semantics of the method requires
a pointer, a value receiver is efficient and clear.
There isn't a clear answer, but the two are completely different. Without a pointer, the receiver is passed by value: the method works on a copy, so the object you called it on cannot be modified. With a pointer, the method works on the original object, much like pass by reference. I would say the pointer variety is used more often, but it is completely situational; there is no 'better way'.
If you look at various programming frameworks/class libraries you will see many examples where the authors have deliberately chosen to do things by value or by reference. For example, in C# .NET this is the fundamental difference between a struct and a class, and types like Guid and DateTime were deliberately implemented as structs (value types). Again, I think the pointer is more often the better choice (if you look through .NET almost everything is a class, the reference type), but it definitely depends on what you wish to achieve with the type and/or how you want consumers/other developers to interact with it. You may need to consider performance and concurrency (maybe you want everything to be by value so you don't have to worry about concurrent ops on a type, maybe you need a pointer because the object's memory footprint is large and copying it would make your program too slow or consumptive).
Given the following struct:
type Exp struct {
    foo int
    bar *int
}
What is the difference in terms of performance between using a pointer and using a value in a struct? Is there any overhead, or are these just two schools of Go programming?
I would use pointers to implement a chained struct, but is this the only case where we have to use pointers in a struct in order to gain performance?
PS: in the above struct we talk about a simple int, but it could be any other type (even a custom one).
Use the form which is most functionally useful for your program. Basically, this means if it's useful for the value to be nil, then use a pointer.
From a performance perspective, primitive numeric types are always more efficient to copy than to dereference a pointer. Even more complex data structures are still usually faster to copy if they are smaller than a cache line or two (under 128 bytes is a good rule of thumb for x86 CPUs).
When things get a little larger, you need to benchmark if performance concerns you. CPUs are very efficient at copying data, and there are so many variables involved which will determine the locality and cache friendliness of your data, it really depends on your program's behavior, and the hardware you're using.
This is an excellent series of articles if you want to better understand how memory and software interact: "What every programmer should know about memory".
In short, I tell people to choose a pointer or not based on the logic of the program, and worry about performance later.
Use a pointer if you need to pass something to be modified.
Use a pointer if you need to determine if something was unset/nil.
Use a pointer if you are using a type that has methods with pointer receivers.
If the size of a pointer is less than the size of the struct member, then using a pointer is more efficient, since you don't need to copy the member but just its address. Also, if you want to be able to move or share some part of a structure, it is better to have a pointer so that you can, again, share only the address of the member. See also the Go FAQ.