Language without explicit memory alloc/dealloc AND without garbage collection - pointers

I was wondering if it is possible to create a programming language without explicit memory allocation/deallocation (like C, C++ ...) AND without garbage collection (like Java, C#...) by doing a full analysis at the end of each scope?
The obvious problem is that this would take some time at the end of each scope, but I was wondering if it has become feasible with all the processing power and multiple cores in current CPUs. Do such languages exist already?
I also was wondering if a variant of C++ where smart pointers are the only pointers that can be used, would be exactly such a language (or am I missing some problems with that?).
Edit:
Well after some more research apparently it's this: http://en.wikipedia.org/wiki/Reference_counting
I was wondering why this isn't more popular. The disadvantages listed there don't seem that serious, and the overhead shouldn't be that large, in my opinion. A (non-interpreted, properly written from the ground up) language with C-family syntax and reference counting seems like a good idea to me.

The biggest problem with reference counting is that it is not a complete solution: it cannot collect cyclic structures. The overhead is incurred every time you set a reference; for many kinds of problems this adds up quickly and can be worse than just waiting for a GC to run later. (Modern GC is quite advanced and awesome - don't write it off like that!)
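To make the cycle problem concrete, here is a minimal C++ sketch (the Node type is invented for illustration) showing two reference-counted objects that keep each other alive forever:
#include <memory>

// Two nodes that point at each other through shared_ptr form a cycle:
// each keeps the other's reference count above zero, so neither is ever
// freed, even though the rest of the program can no longer reach them.
struct Node {
    std::shared_ptr<Node> next;   // strong reference participates in the cycle
    // std::weak_ptr<Node> next;  // a weak reference here would break the cycle
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;   // cycle: use counts are now 2 and 2
    // a and b go out of scope here; counts drop to 1 and 1, never to 0,
    // so both Nodes leak. A tracing GC would reclaim them.
}
Reference-counted designs work around this with weak references or an occasional cycle detector, which is extra machinery a tracing GC doesn't need.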

What you are talking about is nothing special, and it shows up all the time. The C or C++ variant you are looking for is just plain regular C or C++.
For example, write your program normally, but constrain yourself not to use any dynamic memory allocation (no new, delete, malloc, or free, or any of their friends, and make sure your libraries do the same); then you have that kind of system. You figure out in advance how much memory you need for everything you could do, and declare that memory statically (either function-level static variables or global variables). The compiler takes care of all the accounting the normal way, nothing special happens at the end of each scope, and no extra computation is necessary.
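As a rough sketch of that style (the capacity constant and names below are invented for the example, not taken from any particular code base):
#include <cstddef>

// A fixed-capacity "vector" whose storage is decided at compile time.
// MAX_ITEMS is a made-up budget you would derive from your requirements;
// nothing is ever allocated or freed at run time.
constexpr std::size_t MAX_ITEMS = 256;

struct FixedBuffer {
    int         items[MAX_ITEMS]{};  // statically sized storage
    std::size_t count = 0;

    bool push(int value) {
        if (count == MAX_ITEMS) return false;  // budget exhausted: report, don't allocate
        items[count++] = value;
        return true;
    }
};

// One instance with static storage duration: its memory exists for the
// whole life of the program, so there is nothing to clean up at scope exit.
static FixedBuffer readings;
All the storage is fixed at build time, so "running out" becomes an explicit, testable condition instead of an allocation failure.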
You can even configure your runtime environment to use a statically allocated stack space (this one isn't really under the compiler's control; it's more a matter of the linker and operating system environment). Just figure out how deep your function call chain goes and how much memory it uses (with a profiler or similar tool), and set it in your link options.
Without dynamic memory allocation (and thus no deallocation through either garbage collection or explicit management), you are limited to the memory you declared when you wrote the program. But that's ok, many programs don't need dynamic memory, and are already written that way. The real need for this shows up in embedded and real-time systems when you absolutely, positively need to know exactly how long an operation will take, how much memory (and other resources) it will use, and that the running time and the use of those resources can't ever change.
The great thing about C and C++ is that the languages require so little from the environment, and give you the tools to do so much, that smart pointers, statically allocated memory, or even some special scheme you dream up can all be implemented. Requiring their use, and the constraints you put on yourself, just becomes a policy decision. You can enforce that policy with code auditing (use scripts to scan the source or object files, and don't permit linking against the dynamic memory libraries).

Related

Could you implement async-await by memcopying stack frames rather than creating state machines?

I am trying to understand all the low-level stuff Compilers / Interpreters / the Kernel do for you (because I'm yet another person who thinks they could design a language that's better than most others)
One of the many things that sparked my curiosity is Async-Await.
I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines.
I've not found anything useful by googling ("async copy stack frame" and variations) or in the "Similar questions" section.
To me, this method seems rather complicated and overhead-heavy;
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?
I'm aware that it is architecturally impossible for some languages (I think the CLR can't do it, so C# can't either).
Am I missing something that makes this logically impossible? I would expect less complicated code and a performance boost from doing it that way; am I mistaken? I suppose when you have a deep stack hierarchy after an async call (e.g. a recursive async function) the amount of data you would have to memcopy is rather large, but there are probably ways to work around that.
If this is possible, then why isn't it done anywhere?
Yes, an alternative to converting code into state machines is copying stacks around. This is the way that the Go language does it now, and the way that Java will do it when Project Loom is released.
It's not an easy thing to do for real-world languages.
It doesn't work for C and C++, for example, because those languages let you make pointers to things on the stack. Those pointers can be used by other threads, so you can't move the stack away, and even if you could, you would have to copy it back into exactly the same place.
For the same reason, it doesn't work when your program calls out to the OS or native code and gets called back in the same thread, because there's a portion of the stack you don't control. In Java, Project Loom's 'virtual threads' will not release the thread as long as there's native code on the stack.
Even in situations where you can move the stack, it requires dedicated support in the runtime environment. The stack can't just be copied into a byte array. It has to be copied off in a representation that allows the garbage collector to recognize all the pointers in it. If C# were to adopt this technique, for example, it would require significant extensions to the common language runtime, whereas implementing state machines can be accomplished entirely within the C# compiler.
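For concreteness, here is a hand-written C++ sketch of roughly the shape a compiler-generated async/await state machine takes; every name here is invented, and no real runtime's API is implied:
#include <functional>
#include <string>

// Hand-rolled version of roughly what an async/await lowering produces:
// the function's locals and its "where was I?" position live in a heap
// object, and Resume() is re-entered every time an awaited result arrives.
struct DownloadAndSaveStateMachine {
    enum class State { Start, AfterDownload, Done } state = State::Start;

    // locals that must survive across awaits
    std::string url;
    std::string payload;

    // callbacks standing in for the awaited operations
    std::function<void(std::function<void(std::string)>)> download;
    std::function<void(const std::string&, std::function<void()>)> save;

    void Resume() {
        switch (state) {
        case State::Start:
            state = State::AfterDownload;
            // "await download(url)": register continuation and return to caller
            download([this](std::string data) {
                payload = std::move(data);
                Resume();
            });
            return;
        case State::AfterDownload:
            state = State::Done;
            save(payload, [this] { Resume(); });
            return;
        case State::Done:
            return;  // completed
        }
    }
};
The real compiler-generated code also threads a task/future object and exception handling through this, but the essential point is the same: the locals and the resume position have moved off the call stack and into a heap object, which is why no stack copying is needed.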
I would first like to begin by saying that this answer is only meant to serve as a starting point for your exploration. It includes various pointers and builds on the work of several other authors.
I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines
You understood correctly that the Async/Await implementations for C# and Rust use state machines. Let us now look at why those implementations were chosen.
To put the general structure of stack frames in very simple terms, whatever we put inside a stack frame is a temporary allocation that will not outlive the method whose call created that frame (including, but not limited to, local variables). A frame also contains the continuation, i.e. the address of the code that needs to be executed next (in other words, the address control has to return to) within the context of the calling method. In synchronous execution, methods run one after another: the caller is suspended until the called method finishes. From a stack perspective this fits intuitively; when a called method finishes, control returns to the caller and its frame can be popped off. It is also cheap and efficient from the perspective of the hardware running the code (hardware is optimised for working with stacks).
In the case of asynchronous code, the continuation of a method might have to trigger several other methods that get called from within the continuations of callers. Take a look at this answer, where Eric Lippert outlines how the stack works for an asynchronous flow. The problem with asynchronous flow is that the method calls do not exactly form a stack, and trying to handle them like pure stacks can get extremely complicated. As Eric says in that answer, that is why C# uses a graph of heap-allocated tasks and delegates that represents a workflow.
However, languages like Go handle asynchrony in a different way altogether. Go has goroutines, and there is no need for await statements. Each goroutine runs as its own lightweight thread with its own stack (originally defaulting to 8 KB), and synchronization between goroutines is achieved by communicating through channels. These lightweight threads can wait asynchronously for a read on a channel and suspend themselves. Go's earlier implementation used the split-stacks technique; it had its own problems, as listed out here, and was replaced by contiguous stacks. The article also describes the newer implementation.
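Since the rest of this thread is C++-flavoured, here is a hedged C++ analogy (not Go's actual runtime machinery, which also multiplexes goroutines onto a few OS threads) of the "synchronize by communicating over a channel" idea described above:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// A toy unbounded "channel": receivers block until a value (or close) arrives.
// This is only an analogy for the Go model described above.
template <typename T>
class Channel {
    std::queue<T> q;
    std::mutex m;
    std::condition_variable cv;
    bool closed = false;
public:
    void send(T v) {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(v)); }
        cv.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m); closed = true; }
        cv.notify_all();
    }
    std::optional<T> receive() {             // blocks, like <-ch in Go
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty() || closed; });
        if (q.empty()) return std::nullopt;  // channel closed and drained
        T v = std::move(q.front()); q.pop();
        return v;
    }
};

int main() {
    Channel<int> ch;
    std::thread producer([&] { for (int i = 0; i < 3; ++i) ch.send(i); ch.close(); });
    while (auto v = ch.receive()) std::cout << *v << '\n';
    producer.join();
}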
One important thing to note is that it is not just the complexity of handling continuations between tasks that drives the choice of Async/Await implementation; other factors, like garbage collection, also play a role. The GC process should be as performant as possible, and if we move stacks around, GC becomes less efficient because accessing an object would then require thread synchronization.
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?
In short, you can. As this answer states here, Chicken Scheme uses something similar to what you are exploring. It begins by allocating everything on the stack and moves values to the heap when the stack becomes too large for its GC activities (Chicken Scheme uses a generational GC). However, there are certain caveats with this kind of implementation. Take a look at this FAQ of Chicken Scheme. There is also a lot of academic research in this area (linked in the answer referred to at the beginning of the paragraph, which I shall summarise under further reading) that you may want to look at.
Further Reading
Continuation Passing Style
call-with-current-continuation
The classic SICP book
This answer (contains a few links to academic research in this area)
TLDR
The decision of which approach to take depends on factors that affect the overall usability and performance of the language. State machines are not the only way to implement Async/Await functionality as done in C# and Rust. A few languages, like Go, implement a contiguous-stack approach coordinated over channels for asynchronous operations. Chicken Scheme allocates everything on the stack and moves values to the heap when the stack grows too heavy for its GC algorithm to handle efficiently. Moving stacks around has its own set of implications that affect garbage collection negatively. Going through the research done in this space will help you understand the advancements and the rationale behind each approach. At the same time, you should also give some thought to how you plan on designing/implementing the other parts of your language for it to be anywhere close to usable in terms of performance and overall usability.
PS: Given the length of this answer, I will be happy to correct any inconsistencies that may have crept in.
I have been looking into various strategies for doing this myself, because I naturally think I can design a language better than anybody else - same as you. I just want to emphasize that when I say better, I actually mean better as in tastes better to my liking, and not objectively better.
I have come to a few different approaches, and to summarize: It really depends on many other design choices you have made in the language.
It is all about compromises; each approach has advantages and disadvantages.
It feels like the compiler-design community is still very focused on garbage collection and minimizing memory waste; perhaps there is room for some innovation from lazier and less purist language designers, given the vast resources available to modern computers?
How about not having a call stack at all?
It is possible to implement a language without using a call stack.
Pass continuations. The function currently running is responsible for keeping and resuming the state of the caller. Async/await and generators come naturally.
Preallocated static memory addresses for all local variables in all declared functions in the entire program. This approach causes other problems, of course.
If this is your design, then async functions seem trivial.
Tree shaped stack
With a tree shaped stack, you can keep all stack frames until the function is completely done. It does not matter if you allow progress on any ancestor stack frame, as long as you let the async frame live on until it is no longer needed.
Linear stack
How about serializing the function state? It seems like a variant of continuations.
Independent stack frames on the heap
Simply treat invocations the way you treat pointers to any other value on the heap.
All of the above are simplified approaches, but one thing they have in common, related to your question, is this:
Just find a way to store any locals needed to resume the function. And don't forget to store the program counter in the stack frame as well.
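As a tiny illustration of "store the locals plus the program counter" (everything here is invented for the example), a resumable function whose frame lives outside the call stack might look like this in C++:
#include <cstdio>

// A "stack frame" stored wherever you like: the locals plus a resume point.
// Resuming just means jumping back to wherever the saved "program counter" says.
struct CountdownFrame {
    int n;              // local variable that must survive suspension
    int resume_point;   // the saved "program counter"
};

// Returns true while there is more work to do after this resumption.
bool resume_countdown(CountdownFrame& f) {
    switch (f.resume_point) {
    case 0:
        while (f.n > 0) {
            std::printf("%d\n", f.n);
            --f.n;
            f.resume_point = 1;  // remember where to continue
            return true;         // suspend: control returns to the scheduler
    case 1:;                     // resumption re-enters the loop here
        }
        return false;            // finished
    default:
        return false;
    }
}

int main() {
    CountdownFrame frame{3, 0};  // the frame lives outside any call stack
    while (resume_countdown(frame)) { /* a scheduler could run other frames here */ }
}
A scheduler can keep any number of such frames around and resume them in whatever order it likes.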

What can pointers do that are otherwise impossible to implement?

So I'm having a hard time grasping the idea behind pointers and all that memory allocation.
I'm thinking: nowadays, with computers as powerful as they are, why do we have to use pointers at all?
Isn't there always a workaround to do things without the help of pointers?
Pointers are an indirection: instead of working with the data itself, you are working with (something) that points to the data. Depending on the semantics of the language, this allows many things: cheaply switch to another instance of data (by setting the pointer to point to another instance), passing pointers allows access to the original data without having to make (a possibly expensive) copy, etc.
Memory allocation is related to pointers, but separate: you can have pointers without allocating memory. The reason you need pointers for memory allocation is that the actual address at which the allocated block of memory resides is not known at compile time, so you can only access it via a level of indirection (i.e. pointers) -- the compiler statically allocates space for the pointer that will point to the dynamically allocated memory.
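A small C++ example of both points (the names are made up for illustration):
#include <iostream>

// A pointer lets two names refer to the same object, and lets a function
// see the caller's data without copying it.
void add_bonus(int* score) {   // works on the original, not a copy
    *score += 10;
}

int main() {
    int alice = 50;
    int bob   = 70;

    int* current = &alice;     // "points to" alice
    add_bonus(current);        // alice becomes 60
    current = &bob;            // cheaply retarget the same name to bob
    add_bonus(current);        // bob becomes 80

    // Dynamic allocation: the address isn't known until run time,
    // so a pointer is the only way to reach the new object.
    int* heap_value = new int(42);
    std::cout << alice << ' ' << bob << ' ' << *heap_value << '\n';  // 60 80 42
    delete heap_value;
}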
Pointers are incredibly powerful. Just because computers have faster processors nowadays doesn't mean that's any reason to abandon something as essential as pointers. Passing around giant chunks of memory on the stack is inefficient at best, catastrophic at worst. With pointers, you only need to maintain a reference to where the data resides, rather than duplicating huge chunks of memory each time you call a function.
Also, if you're copying all the data every time, how do you modify the original data? Aside from returning the copy of the structure in every call that touches it.
I remember reading somewhere that Dijkstra was assessing a student for a programming course; this student was quite intelligent but s/he wasn't able to solve the problem because there was sort of a mental block.
All the code was sort of ok, but what was needed was simply to use the expression
a[a[i+1]] = j;
and even though the student was that close to the solution, the goal still seemed to be miles away.
Languages "without pointers" already exist... e.g. BASIC. Without explicit pointers, that is. But the indirection idea... the idea that you can have data to mean just where to find other data is central to programming.
The very idea of array is about being able to use computed values to find other values.
Trying to hide this idea is an horrible plan. According to Dijkstra anyone that has been exposed to the BASIC language has already received such a mental mutilation that is impossible to recover as a good programmer (and probably the absence of explicit indirection was one of the problems).
I think he was exaggerating.
Just a bit.
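For what it's worth, the array-as-indirection point is easy to demonstrate: here is a "pointer-free" linked list in C++, threaded through arrays by index the way one might do it in classic BASIC or Fortran (the data is invented for the example):
#include <iostream>

// Indices into arrays give you the same indirection as pointers:
// a tiny linked list stored in two parallel arrays.
int main() {
    int value[4] = {10, 20, 30, 40};
    int next[4]  = {2, -1, 3, 1};           // node 0 -> 2 -> 3 -> 1, -1 means "end"

    for (int i = 0; i != -1; i = next[i])   // follow the "pointers"
        std::cout << value[i] << '\n';      // prints 10 30 40 20
}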

Just in Time compilation always faster?

Greetings to all the compiler designers here on Stack Overflow.
I am currently working on a project which focuses on developing a new scripting language for use with high-performance computing. The source code is first compiled into a byte code representation. The byte code is then loaded by the runtime, which performs aggressive (and possibly time-consuming) optimizations on it (which go much further than what even most "ahead-of-time" compilers do; after all, that's the whole point of the project). Keep in mind the result of this process is still byte code.
The byte code is then run on a virtual machine. Currently, this virtual machine is implemented using a straightforward jump table and a message pump. The virtual machine runs over the byte code with a pointer, loads the instruction under the pointer, looks up an instruction handler in the jump table and jumps into it. The instruction handler carries out the appropriate actions and finally returns control to the message loop. The virtual machine's instruction pointer is incremented and the whole process starts over again. The performance I am able to achieve with this approach is actually quite amazing. Of course, the code of the actual instruction handlers is again fine-tuned by hand.
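For readers who want to picture that loop, here is a stripped-down sketch of the same fetch/dispatch/execute structure in C++; the opcodes and the little stack machine are invented for illustration and are not the actual byte code being described:
#include <cstdint>
#include <cstdio>
#include <vector>

// Fetch the opcode under the instruction pointer, dispatch to its handler,
// execute, repeat.
enum Op : std::uint8_t { PUSH, ADD, PRINT, HALT };

void run(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t ip = 0;                  // instruction pointer into the byte code
    for (;;) {
        switch (code[ip++]) {            // the "jump table" (compilers lower this
        case PUSH:                       //  switch to an indirect jump)
            stack.push_back(code[ip++]);
            break;
        case ADD: {
            int b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case PRINT:
            std::printf("%d\n", stack.back());
            break;
        case HALT:
            return;
        }
    }
}

int main() {
    run({PUSH, 2, PUSH, 3, ADD, PRINT, HALT});  // prints 5
}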
Now most "professional" run-time environments (like Java, .NET, etc.) use Just-in-Time compilation to translate the byte code into native code before execution. A VM using a JIT does usually have much better performance than a byte code interpreter. Now the question is, since all an interpreter basically does is load an instruction and look up a jump target in a jump table (remember the instruction handler itself is statically compiled into the interpreter, so it is already native code), will the use of Just-in-Time compilation result in a performance gain or will it actually degrade performance? I cannot really imagine the jump table of the interpreter to degrade performance that much to make up the time that was spent on compiling that code using a JITer. I understand that a JITer can perform additional optimization on the code, but in my case very aggressive optimization is already performed on the byte code level prior to execution. Do you think I could gain more speed by replacing the interpreter by a JIT compiler? If so, why?
I understand that implementing both approaches and benchmarking will provide the most accurate answer to this question, but it might not be worth the time if there is a clear-cut answer.
Thanks.
The answer lies in the ratio of single-byte-code-instruction complexity to jump table overheads. If you're modelling high level operations like large matrix multiplications, then a little overhead will be insignificant. If you're incrementing a single integer, then of course that's being dramatically impacted by the jump table. Overall, the balance will depend upon the nature of the more time-critical tasks the language is used for. If it's meant to be a general purpose language, then it's more useful for everything to have minimal overhead as you don't know what will be used in a tight loop. To quickly quantify the potential improvement, simply benchmark some nested loops doing some simple operations (but ones that can't be optimised away) versus an equivalent C or C++ program.
When you use an interpreter, the code cache in your processor caches the interpreter code, not the byte code (which may be cached in the data cache). Since code caches are 2 to 3 times faster than data caches, IIRC, you may see a performance boost if you JIT compile. Also, the native, real code you are executing is probably PIC, something which can be avoided for JITted code.
Everything else depends on how optimized the byte code is, IMHO.
A JIT can theoretically optimize better, since it has information not available at compile time (especially about typical runtime behavior). So it can, for example, do better branch prediction, unroll loops as needed, etc.
I am sure your jump-table approach is OK, but I still think it would perform rather poorly compared to straight C code, don't you think?

Why has Ada no garbage collector?

I know GC wasn't popular in the days when Ada was developed, and for its main use case of embedded programming it is still not a good choice.
But considering that Ada is a general-purpose programming language, why wasn't a partial and optional (tracing only explicitly tagged memory objects) garbage collector introduced in later revisions of the language and in the compiler implementations?
I simply can't think of developing a normal desktop application without a garbage collector anymore.
Ada was designed with military applications in mind. One of the big priorities in its design was determinism. i.e. one wanted an Ada program to consistently perform exactly the same way every time, in any environment, under all operating systems... that kinda thing.
A garbage collector turns one application into two, working against one another. Java programs develop hiccups at random intervals when the GC decides to go to work, and if it's too slow about it there's a chance that an application will run out of heap sometimes and not others.
Simplified: A garbage collector introduces some variability into a program that the designers didn't want. You make a mess - you clean it up! Same code, same behavior every time.
Not that Ada became a raging worldwide success, mind you.
Because Ada was designed for use in defense systems which control weapons in realtime, and garbage collection interferes with the timing of your application. This is dangerous which is why, for many years, Java came with a warning that it was not to be used for healthcare and military control systems.
I believe that the reason there is no longer such a disclaimer with Java is because the underlying hardware has become much faster as well as the fact that Java has better GC algorithms and better control over GC.
Remember that Ada was developed in the 1970's and 1980's at a time when computers were far less powerful than they are today, and in control applications timing issues were paramount.
First off, there is nothing in the language really that prohibits garbage collection.
Secondly some implementations do perform garbage collection. In particular, all the implementations that target the JVM garbage collect.
Thirdly, there is a way to get some amount of garbage collection with all compilers. You see, when an access type goes out of scope, if you specifically told the language to set aside a certain amount of space for storage of its objects, then that space will be destroyed at that point. I've used this in the past to get some modicum of garbage collection. The declaration voodoo you use is:
type Foo is access Blah;
for Foo'Storage_Size use 100_000_000;  --  reserve about 100 MB for this access type's pool
If you do this, then all of the memory (100 MB here) allocated to Blah objects pointed to by Foo pointers will be cleaned up when the Foo type goes out of scope. Since Ada allows you to nest subroutines inside of other subroutines, this is particularly powerful.
To see more about what Storage_Size and storage pools can do for you, see LRM 13.11.
Fourthly, well-written Ada programs don't tend to rely on dynamic memory allocation nearly as much as C programs do. C had a number of design holes that practitioners learned to use pointers to paper over. A lot of those idioms aren't necessary in Ada.
The answer is more complicated: Ada does not require a garbage collector, because of real-time constraints and such. However, the language has been cleverly designed to allow the implementation of a garbage collector.
Although many (almost all) compilers do not include a garbage collector, there are some notable implementations:
a patch for GNAT
Ada compilers targeting the Java Virtual Machine (I don't know if those projects are still supported), which used the garbage collector of the JVM.
There are plenty of other sources about garbage collection in Ada around the web. This subject has been discussed at length, mainly because of the fierce competition with Java in the mid '90s (have a look at this page: "Ada 95 is what the Java language should have been"), when Java was "The Next Big Thing" before Microsoft drew up C#.
First off, I'd like to know who's using Ada these days. I actually like the language, and there's even a GUI library for Linux/Ada, but I haven't heard anything about active Ada development for years. Thanks to its military connections, I'm really not sure if it's ancient history or so wildly successful that all mention of its use is classified.
I think there are a couple of reasons for no GC in Ada. First, and foremost, it dates back to an era when most compiled languages used primarily stack or static memory, or in a few cases explicit heap allocate/free. GC as a general philosophy really only took off around 1990 or so, when OOP, improved memory-management algorithms and processors powerful enough to spare the cycles to run it all came into their own. What merely compiling Ada could do to an IBM 4331 mainframe in 1989 was simply merciless. Now I have a cell phone that can outperform that machine's CPU.
Another good reason is that there are people who think that rigorous program design includes precise control over memory resources, and that there shouldn't be any tolerance for letting dynamically acquired objects float. Sadly, far too many people ended up leaking memory as dynamic memory became more and more the rule. Plus, like the "efficiency" of assembly language over high-level languages, and the "efficiency" of raw JDBC over ORM systems, the "efficiency" of manual memory management tends to invert as it scales up (I've seen ORM benchmarks where the JDBC equivalent was only half as efficient). Counter-intuitive, I know, but these days systems are much better at globally optimizing large applications, and they're able to make radical re-optimizations in response to superficially minor changes, including dynamically re-balancing algorithms on the fly based on detected load.
I'm afraid I'm going to have to differ with those who say that real-time systems can't afford GC memory. GC is no longer something that freezes the whole system every couple of minutes. We have much more intelligent ways to reclaim memory these days.
Your question is incorrect: it does. See the package Ada.Finalization, which handles GC for you.
I thought I'd share a really simple example of how to implement a Free() procedure (which would be used in a way familiar to all C programmers)...
with Ada.Integer_Text_IO, Ada.Unchecked_Deallocation;
use Ada.Integer_Text_IO;

procedure Leak is
   type Int_Ptr is access Integer;

   --  Instantiate the generic deallocation procedure for this access type.
   procedure Free is new Ada.Unchecked_Deallocation (Integer, Int_Ptr);

   Ptr : Int_Ptr := null;
begin
   Ptr := new Integer'(123);  --  allocate an Integer from the storage pool
   Free (Ptr);                --  release it; Free also resets Ptr to null
end Leak;
Calling Free at the end of the program will return the allocated Integer to the Storage Pool ("heap" in C parlance). You can use valgrind to demonstrate that this does in fact prevent 4 bytes of memory being leaked.
Ada.Unchecked_Deallocation (a generic procedure) can be used with (I think) any type that may be allocated using the "new" keyword. The Ada Reference Manual ("13.11.2 Unchecked Storage Deallocation") has more details.

Execution speed of references vs pointers

I recently read a discussion regarding whether managed languages are slower (or faster) than native languages (specifically C# vs C++). One person who contributed to the discussion said that the JIT compilers of managed languages would be able to make optimizations regarding references that simply aren't possible in languages that use pointers.
What I'd like to know is: what kinds of optimizations are possible on references but not on pointers?
Note that the discussion was about execution speed, not memory usage.
In C++, references have two advantages related to optimization:
A reference is constant (refers to the same variable for its whole lifetime)
Because of this it is easier for the compiler to infer which names refer to the same underlying variables - thus creating optimization opportunities. There is no guarantee that the compiler will do better with references, but it might...
A reference is assumed to refer to something (there is no null reference)
A reference that "refers to nothing" (equivalent to the NULL pointer) can be created, but this is not as easy as creating a NULL pointer. Because of this the check of the reference for NULL can be omitted.
However, none of these advantages carry over directly to managed languages, so I don't see the relevance of that in the context of your discussion topic.
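To illustrate the two points above, a small hedged C++ sketch (whether a given compiler actually exploits these properties is not guaranteed):
// Why the two properties above can matter to an optimizer.
int by_pointer(const int* p) {
    if (p == nullptr) return 0;   // caller might pass null, so a check is natural
    return *p + *p;
}

int by_reference(const int& r) {
    // No null check is written (or expected): a valid reference always refers
    // to an object, and it cannot be reseated to a different one during its
    // lifetime, which simplifies alias reasoning.
    return r + r;
}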
There are some benefits of JIT compilation mentioned in Wikipedia:
JIT code generally offers far better performance than interpreters. In addition, it can in some or many cases offer better performance than static compilation, as many optimizations are only feasible at run-time:
The compilation can be optimized to the targeted CPU and the operating system model where the application runs. For example JIT can choose SSE2 CPU instructions when it detects that the CPU supports them. With a static compiler one must write two versions of the code, possibly using inline assembly.
The system is able to collect statistics about how the program is actually running in the environment it is in, and it can rearrange and recompile for optimum performance. However, some static compilers can also take profile information as input.
The system can do global code optimizations (e.g. inlining of library functions) without losing the advantages of dynamic linking and without the overheads inherent to static compilers and linkers. Specifically, when doing global inline substitutions, a static compiler must insert run-time checks and ensure that a virtual call would occur if the actual class of the object overrides the inlined method.
Although this is possible with statically compiled garbage collected languages, a bytecode system can more easily rearrange memory for better cache utilization.
I can't think of something related directly to the use of references instead of pointers.
Generally speaking, references make it possible to refer to the same object from different places.
A 'pointer' is the name of one mechanism used to implement references. C++, Pascal, C... have pointers; C++ offers another mechanism (with slightly different use cases) called a 'reference', but essentially these are all implementations of the general referencing concept.
So there is no reason why references are by definition faster/slower than pointers.
The real difference is in using a JIT or a classic 'up front' compiler: the JIT can take into account data that isn't available to the up-front compiler. It has nothing to do with the implementation of the concept of a 'reference'.
Other answers are right.
I would only add that any optimization won't make a hoot of difference unless it is in code where the program counter actually spends much time, like in tight loops that don't contain function calls (such as comparing strings).
An object reference in a managed framework is very different from a passed reference in C++. To understand what makes them special, imagine how the following scenario would be handled, at the machine level, without garbage-collected object references: Method "Foo" returns a string, which is stored into various collections and passed to different pieces of code. Once nothing needs the string any more, it should be possible to reclaim all memory used in storing it, but it's unclear what piece of code will be the last one to use the string.
In a non-GC system, every collection either needs to have its own copy of the string, or else needs to hold something containing a pointer to a shared object which holds the characters in the string. In the latter situation, the shared object needs to somehow know when the last pointer to it gets eliminated. There are a variety of ways this can be handled, but an essential common aspect of all of them is that shared objects need to be notified when pointers to them are copied or destroyed. Such notification requires work.
In a GC system by contrast, programs are decorated with metadata to say which registers or parts of a stack frame will be used at any given time to hold rooted object references. When a garbage collection cycle occurs, the garbage collector will have to parse this data, identify and preserve all live objects, and nuke everything else. At all other times, however, the processor can copy, replace, shuffle, or destroy references in any pattern or sequence it likes, without having to notify any of the objects involved. Note that when using pointer-use notifications in a multi-processor system, if different threads might copy or destroy references to the same object, synchronization code will be required to make the necessary notification thread-safe. By contrast, in a GC system, each processor may change reference variables at any time without having to synchronize its actions with any other processor.
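A small C++ sketch of the cost difference being described (shared_ptr stands in for the notification-based scheme; the raw pointer copy approximates what copying a GC-managed reference costs):
#include <memory>

// The "notification" cost in concrete terms: copying a shared_ptr bumps an
// atomic reference count (and decrements it again on destruction), so the
// shared control block is touched on every copy.
void copy_counted(std::shared_ptr<int> p) {
    std::shared_ptr<int> q = p;   // atomic increment on the shared count
}                                 // atomic decrements when q and p are destroyed

// Roughly what a GC'd reference copy costs: a plain store, no synchronization.
// The collector discovers reachable objects later by scanning roots.
void copy_raw(int* p) {
    int* q = p;
    (void)q;
}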
