Memory leak in golang slice - pointers

I just started learning Go, and while going through the slice tricks, a couple of points are very confusing. Can anyone help me clarify them?
To cut elements out of a slice, the following is given:
Approach 1:
a = append(a[:i], a[j:]...)
but there is a note that this may cause memory leaks if the elements are pointers, and the recommended way is:
Approach 2:
copy(a[i:], a[j:])
for k, n := len(a)-j+i, len(a); k < n; k++ {
a[k] = nil // or the zero value of T
}
a = a[:len(a)-j+i]
Can anyone help me understand how the memory leak happens?
I understand that a subslice is backed by the original array. My thought is that, whether or not the elements are pointers, we always have to follow approach 2.
Update after @icza's and @Volker's answers:
Let's say you have a struct:
type Books struct {
title string
author string
}
var Book1 Books
var Book2 Books
/* book 1 specification */
Book1.title = "Go Programming"
Book1.author = "Mahesh Kumar"
Book2.title = "Go Programming"
Book2.author = "Mahesh Kumar"
var bkSlice = []Books{Book1, Book2}
var bkprtSlice = []*Books{&Book1, &Book2}
Now doing:
bkSlice = bkSlice[:1]
bkSlice still holds Book2 in its backing array, so it stays in memory even though it is no longer needed.
So do we need to do
bkSlice[1] = Books{}
so that it will be garbage collected? I understand that pointers have to be set to nil, because otherwise the backing array keeps unnecessary references to objects outside of it.

The simplest case can be demonstrated with a simple slice expression.
Let's start with a slice of *int pointers:
s := []*int{new(int), new(int)}
This slice has a backing array with a length of 2, and it contains 2 non-nil pointers, pointing to allocated integers (outside of the backing array).
Now if we reslice this slice:
s = s[:1]
The length becomes 1. The backing array (holding 2 pointers) is not touched; it still holds 2 valid pointers. Even though we no longer use the 2nd pointer, it is still in memory (in the backing array), so the pointed object (the memory space storing an int value) cannot be freed by the garbage collector.
The same thing happens if you "cut" multiple elements from the middle. If the original slice (and its backing array) was filled with non-nil pointers, and if you don't zero them (with nil), they will be kept in memory.
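Approach 2 from the question can be wrapped in a small generic helper. This is a sketch assuming Go 1.18+ (type parameters); the name cut is hypothetical. It uses the zero value of T instead of nil, so the same code covers pointers, strings, and structs. If you are on Go 1.22 or later, slices.Delete from the standard library is documented to zero the vacated tail too.

```go
package main

import "fmt"

// cut removes a[i:j] from a (hypothetical helper, Go 1.18+) and zeroes the
// now-unused tail of the backing array, so the pointers stored there no
// longer keep their targets alive.
func cut[T any](a []T, i, j int) []T {
	copy(a[i:], a[j:])
	var zero T
	newLen := len(a) - (j - i)
	for k := newLen; k < len(a); k++ {
		a[k] = zero
	}
	return a[:newLen]
}

func main() {
	x, y, z, w := 1, 2, 3, 4
	s := []*int{&x, &y, &z, &w}
	s = cut(s, 1, 3) // removes &y and &z
	full := s[:cap(s)]
	fmt.Println(len(s), *s[0], *s[1])           // 2 1 4
	fmt.Println(full[2] == nil, full[3] == nil) // true true
}
```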
Why isn't this an issue with non-pointers?
Actually, this is an issue not only with pointers but with every type that refers to memory outside the backing array, including "header" types like slices and strings.
If you had a slice of type []int instead of []*int, slicing it would just "hide" elements of type int, which stay in memory as part of the backing array regardless of whether any slice still covers them. Those elements are not references to objects stored outside of the array, whereas pointers refer to objects outside of it.
If the slice contains pointers and you set them to nil before the slicing operation, and there are no other references to the pointed objects (i.e. the array was the only holder of those pointers), then those objects can be freed; they will not be kept alive just because a slice (and thus its backing array) still exists.
Update:
When you have a slice of structs:
var bkSlice = []Books{Book1, Book2}
If you slice it like:
bkSlice = bkSlice[:1]
Book2 will become unreachable via bkSlice, but it will still be in memory (as part of the backing array).
You can't nil it because nil is not a valid value for structs. You can however assign its zero value to it like this:
bkSlice[1] = Books{}
bkSlice = bkSlice[:1]
Note that a Books struct value will still be in memory as the second element of the backing array, but it will be a zero value and thus hold no string references, so the original book author and title strings can be garbage collected (if nothing else references them; more precisely, the bytes referenced by each string header).
The general rule is "recursive": you only need to zero elements that refer to memory located outside of the backing array. So if you have a slice of structs that only have, e.g., int fields, you do not need to zero them; in fact, it's just unnecessary extra work. If the struct has fields that are pointers or slices, or e.g. fields of other struct types that contain pointers or slices, then you should zero it in order to remove the reference to the memory outside of the backing array.
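Applied to the Books struct from the question, the "zero first, then reslice" order can be captured in a small generic helper. This is a sketch; truncate is a hypothetical name, and Go 1.18+ is assumed for type parameters.

```go
package main

import "fmt"

// Books mirrors the struct from the question: its string fields refer to
// byte data stored outside the slice's backing array.
type Books struct {
	title  string
	author string
}

// truncate (hypothetical helper) shortens s to n elements, first
// overwriting the dropped elements with the zero value of T so the backing
// array no longer references memory outside itself.
func truncate[T any](s []T, n int) []T {
	var zero T
	for k := n; k < len(s); k++ {
		s[k] = zero
	}
	return s[:n]
}

func main() {
	bk := []Books{{"Go Programming", "Mahesh Kumar"}, {"Go Programming", "Mahesh Kumar"}}
	bk = truncate(bk, 1)
	fmt.Println(len(bk), cap(bk))           // 1 2
	fmt.Println(bk[:cap(bk)][1] == Books{}) // true
}
```

For a struct with only int fields the loop is harmless but, per the rule above, unnecessary.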

Related

(CAPL) How do I assign an array length using a parameter?

void func(int a){
byte arr[a];
}
This code is not working. How do I assign the array length using a parameter?
In CAPL you have many options, but first you should step back and ask yourself whether you really need a variable array size at runtime. Measurement performance is what you should be concerned about, so declaring a suitable array size at design time may be a safer approach.
A global array of parametric size could look something like this:
variables
{
int arraySize = 256;
byte arr[arraySize];
}
From the docs,
Declaration of arrays (arrays, vectors, matrices) is permitted in CAPL. They are used and initialized in a manner analogous to C language.
In C, array size is constant:
Array is a type consisting of a contiguously allocated nonempty sequence of objects with a particular element type. The number of those objects (the array size) never changes during the array lifetime. [source]
This is why your code is not working: you cannot create an array of runtime-based size. Similarly, from the same source
Variable-length arrays
If expression is not an integer constant expression, the declarator is for an array of variable size. Each time the flow of control passes over the declaration, expression is evaluated (and it must always evaluate to a value greater than zero), and the array is allocated (correspondingly, lifetime of a VLA ends when the declaration goes out of scope). The size of each VLA instance does not change during its lifetime, but on another pass over the same code, it may be allocated with a different size.
This is why you should be able to define a parametric array like the one I showed you. Even if arraySize changes later in the code, arr will have 256 elements for the whole execution of your CAPL script.
void func(int a){
byte arr[a];
}
This will throw an error, because int a is not a compile-time constant, which violates the requirements above. What you can do is memcpy parts of a larger array to a location of your choice, for example a smaller array, or employ a number of "buffer" arrays, as you often see in CAPL scripts.
The gist of it is: use a larger array, and be precise about where you put your information inside of it. You must be precise because every element in the array contains some kind of data; at initialization most of it is nonsense, and there is no safeguard for you against this digital noise.

Pointer vs value receiver in Go | heap.Interface vs sort.Interface

I came across the PriorityQueue example in the container/heap package documentation.
Link: https://golang.org/pkg/container/heap/#Interface
For the Push() and Pop() functions required by heap.Interface, the implementation uses a pointer receiver. But for the Swap() function required by sort.Interface, the implementation uses a value receiver.
Why this difference?
As I understand it, Push() and Pop() are implemented on the pointer type because they need to change the underlying data. But by that logic, Swap() should also be implemented on the pointer type.
How and why does Swap() work with a value receiver, while Push() and Pop() do not?
A pointer receiver is needed when the value passed needs to be modified. In the case of Swap, the value itself (which is a slice) doesn't get modified, although the array backing the slice does get modified.
In the case of Push and Pop, the slice does get modified since in both cases the length changes (and in the case of Push the underlying array may get replaced by a new one if it has reached its capacity).
Take a look at Push implementation:
func (pq *PriorityQueue) Push(x interface{}) {
n := len(*pq)
item := x.(*Item)
item.index = n
*pq = append(*pq, item) // Here, the slice is assigned a new value
}
Push (and Pop) modify the underlying slice as well as the slice elements for the priority queue, whereas Swap will only swap two elements in the slice, and will not change the slice itself. Thus, Swap can work with a value receiver.
Internally, a slice variable holds a length, a capacity, and a pointer to the data. Swapping items changes the data, but doesn't change any of the items in the slice header. Russ Cox explained this in a blog post.
Adding items to the slice, like to push something onto a heap, may require the array to be re-allocated, which will change the capacity and the location that needs to be pointed to.
You may find this answer on pointers vs. values generally useful. There are other types, like channels and maps, that contain references, so you don't need a pointer to modify the data underneath.

Does a function parameter that accepts a string reference point directly to the string variable or the data on the heap in Rust

I've taken this picture and code from The Rust Book.
Why does s point to s1 rather than just the data on the heap itself?
If so, is this how it works? How does s point to s1? Is it allocated memory with a ptr field that contains the memory address of s1, and does s1 in turn point to the data?
In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?
This is my first systems level language, so I don't think comparisons to C/C++ will help me grok this. I think part of the problem is that I don't quite understand what exactly pointers are and how the OS allocates/deallocates memory.
fn main() {
let s1 = String::from("hello");
let len = calculate_length(&s1);
println!("The length of '{}' is {}.", s1, len);
}
fn calculate_length(s: &String) -> usize {
s.len()
}
The memory is just a huge array, which can be indexed by any offset (e.g. u64).
This offset is called an address, and a variable that stores an address is called a pointer.
However, usually only some small part of memory is allocated, so not every address is meaningful (or valid).
Allocation is a request to make a (sequential) range of addresses meaningful to the program (so it can access/modify).
Every object (and by object I mean any type) is located in allocated memory (because non-allocated memory is meaningless to the program).
A reference is actually a pointer that is guaranteed (by the compiler) to be valid (i.e. derived from the address of some object known to the compiler). Take a look at the std docs as well.
Here is an example of these concepts (playground):
fn main() {
    // In a real program this memory is implicit; here a small array stands
    // in for it (the whole address space would be `usize::MAX` bytes).
    let mut memory = [0u8; 4096];
    // Every value of `usize` type is a potential address.
    const SOME_ADDR: usize = 1234;
    // Any address can be safely cast to a raw pointer,
    // which *may* point to either valid or invalid memory.
    let _ptr = SOME_ADDR as *const u8;
    // Find the location in our memory for that address (an offset into it).
    let other_ptr: *mut u8 = unsafe { memory.as_mut_ptr().add(SOME_ADDR) };
    // Oversimplified allocation; in real life the OS hands out a block of memory.
    unsafe { *other_ptr = 15; }
    // Now it's *meaningful* (i.e. there's no undefined behavior) to make a reference.
    let refr: &u8 = unsafe { &*other_ptr };
    println!("{}", refr);
}
I hope that clarifies most things, but let's cover the questions explicitly.
Why does s point to s1 rather than just the data on the heap itself?
s is a reference (i.e. a valid pointer), so it points to the address of s1. The compiler might (and probably will) optimize it so that it occupies the same piece of memory as s1, but logically it remains a distinct object that points to s1.
How does the s point to s1. Is it allocated memory with a ptr field that contains the memory address of s1.
The chain of "pointing" still persists, so calling s.len() is internally converted to s.deref().len(), and accessing some byte of the string's buffer is converted to s.deref().ptr.add(index).deref().
Three blocks of memory are displayed in the picture: &s, &s1 and s1.ptr are different (unless optimized) memory addresses, and all of them are stored in allocated memory. The first two are actually stored in memory pre-allocated before main is called, called the stack, which is usually not referred to as allocated memory (a convention I ignore in this answer). The s1.ptr pointer, in contrast, points to memory that was allocated explicitly by the program (i.e. after entering main).
In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?
Yes, exactly. Length and capacity are just common unsigned integers.

What is the difference between Vec<struct> and &[struct]?

I often find myself getting an error like this:
mismatched types: expected `collections::vec::Vec<u8>`, found `&[u8]` (expected struct collections::vec::Vec, found &-ptr)
As far as I know, one is mutable and one isn't but I've no idea how to go between the types, i.e. take a &[u8] and make it a Vec<u8> or vice versa.
What's the difference between them? Is it the same as String and &str?
Is it the same as String and &str?
Yes. A Vec<T> is the owned variant of a &[T]. &[T] is a reference to a set of Ts laid out sequentially in memory (a.k.a. a slice). It represents a pointer to the beginning of the items and the number of items. A reference refers to something that you don't own, so the set of actions you can do with it are limited. There is a mutable variant (&mut [T]), which allows you to mutate the items in the slice. You can't change how many are in the slice though. Said another way, you can't mutate the slice itself.
take a &[u8] and make it a Vec
For this specific case:
let s: &[u8] = &[1, 2, 3]; // a slice obtained from somewhere
let v: Vec<u8> = Vec::from(s); // equivalently: s.to_vec()
However, this has to allocate memory on the heap and then copy each value into that memory. It's more expensive than the other direction, but might be the correct thing for a given situation.
or vice versa
let v = vec![1u8, 2, 3];
let s = v.as_slice();
This is basically "free" as v still owns the data, we are just handing out a reference to it. That's why many APIs try to take slices when it makes sense.

How to link Two Multi-Dimensional arrays using pointers?

I need to basically merge a Binary Heap, and Linear Probing Hashtable to make a "compound" data structure, which has the functionality of a heap, with the sorting power of a hashtable.
What I need to do is create two 2-dimensional arrays, one for each data structure (Binary Heap and Hash), and then link them to each other with pointers, so that when I change something, such as deleting a value in the Binary Heap, it also gets deleted in the Hash table.
Therefore, I need one row of the Heap array pointing from the Heap to the Hashtable, and one row of the Hashtable array pointing from the Hashtable to the Heap.
Create a container that contains both, with accessor functions/methods (depending on your language of implementation) that performs all the operations required of your algorithm.
IE:
Delete from container: does a delete from Binary and from hash.
Add to container: adds to binary and to hash.
EDIT:
Oh, an assignment - fun! :)
I'd do this:
Still implement a container, but instead of using a standard library for the btree/hash, implement them like this:
Make a type that can be put in your data member that has a pointer to the BTree node and the Hashtable Node that the data element lives in.
To delete a data element, given a pointer to it, you can perform the delete algorithm on a btree (navigate to parent from node pointer, delete child (left or right), restructure tree) and on the hash table (delete from hash list). When adding a value, perform the add algorithm on btree and hash, but be sure you update the node pointers in the data before you return.
Some pseudocode (I'll use C, but I'm not sure what language you're using):
typedef struct
{
struct _BTreeNode* btree; /* struct tags: the typedefs are defined below */
struct _HashNode* hash;
} ContainerNode;
To put data in your container:
typedef struct
{
ContainerNode node;
void* data; /* whatever the data is */
} Data;
A BTreeNode has something like:
typedef struct _BTreeNode
{
struct _BTreeNode* parent;
struct _BTreeNode* left;
struct _BTreeNode* right;
} BTreeNode;
and a HashNode has something like:
typedef struct _HashNode
{
struct _HashNode* next;
} HashNode;
/* ala singly linked list */
and your BTree would be a pointer to a BTreeNode, and your hashtable would be an array of pointers to HashNodes. Like this:
typedef struct
{
BTreeNode* btree;
HashNode* hashtable[HASHTABLESIZE];
} Container;
void delete(Container* c, ContainerNode* n)
{
delete_btree_node(n->btree);
delete_hashnode(n->hash);
}
ContainerNode* add(Container* c, void* data)
{
ContainerNode* n = malloc(sizeof(ContainerNode)); /* needs <stdlib.h> */
if (n == NULL) return NULL;
n->btree = add_to_btree(n);
n->hash = add_to_hash(n);
return n;
}
I'll let you complete those other functions (can't do the whole assignment for you ;) )
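For comparison, here is the same container idea sketched in Go rather than C, using the standard container/heap package for the binary heap and a built-in map for the hash table. This is an illustration under assumptions: string keys, int priorities, and the names entry, Container, Add, Delete and PopMin are all hypothetical. Each entry stores its current heap index (the link from the hash side into the heap), and the map provides the link from key to entry, so a delete by key touches both structures in O(log n).

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is one stored element; index is its current position in the heap.
type entry struct {
	key   string
	prio  int
	index int
}

// entryHeap implements heap.Interface and keeps every entry's index current.
type entryHeap []*entry

func (h entryHeap) Len() int           { return len(h) }
func (h entryHeap) Less(i, j int) bool { return h[i].prio < h[j].prio }
func (h entryHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}
func (h *entryHeap) Push(x interface{}) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}
func (h *entryHeap) Pop() interface{} {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // don't leave a stale pointer in the backing array
	*h = old[:n-1]
	return e
}

// Container keeps a min-heap and a map in sync; all mutation goes through
// its methods, so the two structures cannot drift apart.
type Container struct {
	h     entryHeap
	byKey map[string]*entry
}

func NewContainer() *Container { return &Container{byKey: make(map[string]*entry)} }

// Add inserts key into both structures, replacing any existing entry.
func (c *Container) Add(key string, prio int) {
	c.Delete(key)
	e := &entry{key: key, prio: prio}
	heap.Push(&c.h, e)
	c.byKey[key] = e
}

// Delete removes key from both structures using the stored heap index.
func (c *Container) Delete(key string) bool {
	e, ok := c.byKey[key]
	if !ok {
		return false
	}
	heap.Remove(&c.h, e.index)
	delete(c.byKey, key)
	return true
}

// PopMin removes and returns the key with the lowest priority.
func (c *Container) PopMin() (string, bool) {
	if c.h.Len() == 0 {
		return "", false
	}
	e := heap.Pop(&c.h).(*entry)
	delete(c.byKey, e.key)
	return e.key, true
}

func main() {
	c := NewContainer()
	c.Add("a", 5)
	c.Add("b", 1)
	c.Add("c", 3)
	c.Delete("b")
	k, _ := c.PopMin()
	fmt.Println(k, len(c.byKey)) // c 1
}
```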
Why bother with the links?
You have two associative structures; just duplicate any operation on one onto the other (ensuring that if one operation throws, you either crash the whole thing or leave the object in a valid state, if you care about such things).
Unless you can make use of the structure of one to help you with the other (and I don't see how you can, since either one can entirely rearrange its internal state on any modification operation), this is just as effective and much simpler.
Of course, this means that the O() cost of any modification operation is the cost of the more expensive of the two, and memory costs are doubled, but that is true of the original plan too, unless there is some trick I'm missing.

Resources