What is Trivial Abstraction (Leaky abstraction Context) - abstraction

I was reading an article of Leaky Abstraction Law and I read something saying
All non-trivial abstractions, to some degree, are leaky.
So what does Trivial abstraction means ?
Thank you.

I have a feeling this question will get flagged as argumentative or something, but I am going to attempt an answer any way:
In the context of the article, a "leak" in the abstraction is not a memory leak, but rather, the abstraction suffering from the limitations of the underlying implementation.
The more you try to hide the underlying implementation, the less "trivial" the abstraction becomes - the more work it does.
So, it seems to me a "trivial" abstraction would be one that doesn't really do anything beyond what the underlying implementation does.
Taking the first example of the article - a trivial abstraction for IP would be some library that would let you easily build IP packets, without putting one together byte by byte your self, but would do nothing else - no retry, no protocol for verifying delivery, no packet numbering to reassemble in order like TCP does.
But in the end, since the article does not provide a clear definition, "trivial" and "non trivial" is in the eye of the beholder.
You judge where the line lies, and how much an abstraction needs to add on top of the abstracted implementation before it becomes non-trivial...
By the way, this is not a C++ specific question.

The word trivial is used for its common dictionary definition. It has nothing to do with C++ (or mathematics).
Trivial
of little value or importance.
synonyms: unimportant, insignificant, inconsequential, minor.
In programming the word toy is often used to describe trivial applications, as in the phrase a toy program. In this case a trivial abstraction would be a toy abstraction, used for demonstration purposes only: not real or practical.

Related

Could you implement async-await by memcopying stack frames rather than creating state machines?

I am trying to understand all the low-level stuff Compilers / Interpreters / the Kernel do for you (because I'm yet another person who thinks they could design a language that's better than most others)
One of the many things that sparked my curiosity is Async-Await.
I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines.
I've not found anything useful by googling ("async copy stack frame" and variations) or in the "Similar questions" section.
To me, this method seems rather complicated and overhead-heavy;
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?
I'm aware that it is architecturally impossible for some languages (I thank the CLR can't do it, so C# can't either).
Am I missing something that makes this logically impossible? I would expect less complicated code and a performance boost from doing it that way, am I mistaken? I suppose when you have a deep stack hierarchy after a async call (eg. a recursive async function) the amount of data you would have to memcopy is rather large, but there are probably ways to work around that.
If this is possible, then why isn't it done anywhere?
Yes, an alternative to converting code into state machines is copying stacks around. This is the way that the go language does it now, and the way that Java will do it when Project Loom is released.
It's not an easy thing to do for real-world languages.
It doesn't work for C and C++, for example, because those languages let you make pointers to things on the stack. Those pointers can be used by other threads, so you can't move the stack away, and even if you could, you would have to copy it back into exactly the same place.
For the same reason, it doesn't work when your program calls out to the OS or native code and gets called back in the same thread, because there's a portion of the stack you don't control. In Java, project Loom's 'virtual threads' will not release the thread as long as there's native code on the stack.
Even in situations where you can move the stack, it requires dedicated support in the runtime environment. The stack can't just be copied into a byte array. It has to be copied off in a representation that allows the garbage collector to recognize all the pointers in it. If C# were to adopt this technique, for example, it would require significant extensions to the common language runtime, whereas implementing state machines can be accomplished entirely within the C# compiler.
I would first like to begin by saying that this answer is only meant to serve as a starting point to go in the actual direction of your exploration. This includes various pointers and building up on the work of various other authors
I've checked the under-the-hood implementation for a couple languages, including C# (the compiler generates the state machine from sugar code) and Rust (where the state machine has to be implemented manually from the Future trait), and they all implement Async-Await using state machines
You understood correctly that the Async/Await implementation for C# and Rust use state machines. Let us understand now as to why are those implementations chosen.
To put the general structure of stack frames in very simple terms, whatever we put inside a stack frame are temporary allocations which are not going to outlive the method which resulted in the addition of that stack frame (including, but not limited to local variables). It also contains the information of the continuation, ie. the address of the code that needs to be executed next (in other words, the control has to return to), within the context of the recently called method. If this is a case of synchronous execution, the methods are executed one after the other. In other words, the caller method is suspended until the called method finishes execution. This, from a stack perspective fits in intuitively. If we are done with the execution of a called method, the control is returned to the caller and the stack frame can be popped off. It is also cheap and efficient from a perspective of the hardware that is running this code as well (hardware is optimised for programming with stacks).
In the case of asynchronous code, the continuation of a method might have to trigger several other methods that might get called from within the continuation of callers. Take a look at this answer, where Eric Lippert outlines the entirety of how the stack works for an asynchronous flow. The problem with asynchronous flow is that, the method calls do not exactly form a stack and trying to handle them like pure stacks may get extremely complicated. As Eric says in the answer, that is why C# uses graph of heap-allocated tasks and delegates that represents a workflow.
However, if you consider languages like Go, the asynchrony is handled in a different way altogether. We have something called Goroutines and here is no need for await statements in Go. Each of these Goroutines are started on their own threads that are lightweight (each of them have their own stacks, which defaults to 8KB in size) and the synchronization between each of them is achieved through communication through channels. These lightweight threads are capable of waiting asynchronously for any read operation to be performed on the channel and suspend themselves. The earlier implementation in Go is done using the SplitStacks technique. This implementation had its own problems as listed out here and replaced by Contigious Stacks. The article also talks about the newer implementation.
One important thing to note here is that it is not just the complexity involved in handling the continuation between the tasks that contribute to the approach chosen to implement Async/Await, there are other factors like Garbage Collection that play a role. GC process should be as performant as possible. If we move stacks around, GC becomes inefficient because accessing an object then would require thread synchronization.
Could you not implement Async-Await by simply memcopying the stack frames of async calls to/from heap?
In short, you can. As this answer states here, Chicken Scheme uses a something similar to what you are exploring. It begins by allocating everything on the stack and move the stack values to heap when it becomes too large for the GC activities (Chicken Scheme uses Generational GC). However, there are certain caveats with this kind of implementation. Take a look at this FAQ of Chicken Scheme. There is also lot of academic research in this area (linked in the answer referred to in the beginning of the paragraph, which I shall summarise under further readings) that you may want to look at.
Further Reading
Continuation Passing Style
call-with-current-continuation
The classic SICP book
This answer (contains few links to academic research in this area)
TLDR
The decision of which approach to be taken is subjective to factors that affect the overall usability and performance of the language. State Machines are not the only way to implement the Async/Await functionality as done in C# and Rust. Few languages like Go implement a Contigious Stack approach coordinated over channels for asynchronous operations. Chicken Scheme allocates everything on the stack and moves the recent stack value to heap in case it becomes heavy for its GC algorithm's performance. Moving stacks around has its own set of implications that affect garbage collection negatively. Going through the research done in this space will help you understand the advancements and rationale behind each of the approaches. At the same time, you should also give a thought to how you are planning on designing/implementing the other parts of your language for it be anywhere close to be usable in terms of performance and overall usability.
PS: Given the length of this answer, will be happy to correct any inconsistencies that may have crept in.
I have been looking into various strategies for doing this myseøf, because I naturally thi k I can design a language better than anybody else - same as you. I just want to emphasize that when I say better, I actually mean better as in tastes better for my liking, and not objectively better.
I have come to a few different approaches, and to summarize: It really depends on many other design choices you have made in the language.
It is all about compromises; each approach has advantages and disadvantages.
It feels like the compiler design community are still very focused on garbage collection and minimizing memory waste, and perhaps there is room for some innovation for more lazy and less purist language designers given the vast resources available to modern computers?
How about not having a call stack at all?
It is possible to implement a language without using a call stack.
Pass continuations. The function currently running is responsible for keeping and resuming the state of the caller. Async/await and generators come naturally.
Preallocated static memory addresses for all local variables in all declared functions in the entire program. This approach causes other problems, of course.
If this is your design, then asymc functions seem trivial
Tree shaped stack
With a tree shaped stack, you can keep all stack frames until the function is completely done. It does not matter if you allow progress on any ancestor stack frame, as long as you let the async frame live on until it is no longer needed.
Linear stack
How about serializing the function state? It seems like a variant of continuations.
Independent stack frames on the heap
Simply treat invocations like you treat other pointers to any value on the heap.
All of the above are trivialized approaches, but one thing they have in common related to your question:
Just find a way to store any locals needed to resume the function. And don't forget to store the program counter in the stack frame as well.

Etymology of reflection?

I've never found a clear explanation for the etymology of reflection in the context of computer languages, so I want to clarify this here.
"Reflection" originates from Latin and has the following definitions:
bend back
turn back
turn round
So the idea behind it is a language that is able to bend back on itself, to be able to look and manipulate its own code.
Or is there something else?
Reflection in logic, functional and object-oriented programming:
a Short Comparative Study Francois-Nicola Demers and Jacques Malenfant (PDF) seems to agree:
Reflection is the process of reasoning about and/or acting upon oneself.
The term reflection was (according to the wikipedia article) coined by Brian Cantwell Smith in his dissertation Procedural reflection in programming languages.
The prologue starts
It is a striking fact about human cognition that we can think not only about the world around us, but also about our ideas, our actions, our feelings. our past experience. This ability to reflect lies behind much of the subtlety and flexibility with which we deal with the world; it is an essential part of mastering new skills, of reacting to unexpected circumstances, ...
(---)
This last aspect -- the self-referential aspect of reflective thought -- has sparked particular interest for cognitive theorists...
(---)
In artifial intelligence, the focus on computational forms of self-referential reflective reasoning has become particularly central.
And then it summarises the reflection hypothesis as
In as much as a computational process can be constructed to reason about an external world in virtue of comprising an ingredient process (interpreter) formally manipulating representations of that world, so too a computational process could be made to reason about itself in virtue of comprising an ingredient process (interpreter) formally manipulating representations of its own operations and structures.
The use of reflection is tied to self-representation and self-reference which to me suggests that of the alternatives in the question, the closest is bend back as also given in the etymonline entry on reflection:
Of the mind, from 1670s. Meaning "remark made after turning back one's thought on some subject" is from 1640s. Spelling with -ct- recorded from late 14c., established 18c., by influence of the verb.

Does Law of Demeter also account for standard classes?

Assuming the following code:
requiredIssue.get().isDone()
where requiredIssue is an Optional and it has been verified that requiredIssue.isPresent(). Does this code break the Law of Demeter? Technically there is a tight coupling here between my class and Optional now, because isDone() now relies on get() working properly.
But is it not reasonable to assume for the standard library to work consistently?
In practical terms, the Law of Demeter means that if you have a class or function that depends on the value of isDone(), you should not pass requiredIssue to that class or function. You should simply pass the value of isDone(). If you find that following this principle leads to a proliferation of fields or parameters, you're probably violating the Single Responsibility Principle.
First,
requiredIssue.get().isDone()
does violate the Law of Demeter, which clearly enumerates the objects you can call methods on. Whatever object get() returns is not in that list. For more detail on the Law itself, see this article.
Second, the concrete example with Optional. Calling get() of an Optional has many problems and should be avoided. The Law of Demeter (which is a sort-of canary in a coalmine for bad code) correctly indicates that this construct is flawed. Problems range from temporal coupling with the isPresent() call, technical leak of having to call isPresent() in the first place, to copying the "isPresent()" logic allover the place. Most of the time if you feel the need to use get() you are already doing it wrong.
Third, the more general question of whether standard classes should be basically exempt from the Law of Demeter. They should not be exempt, if you are trying to do object-oriented programming. You have to keep in mind that some of the JDK, and most of the JEE are not actually meant to be used in an object-oriented environment. Sure, they use Java, which is a semi-object-oriented language, but they are expected to be used in a "mixed-paradigm" context, in which procedural designs are allowed. Just think of "Service" classes, pure "Data" objects (like Beans), layering, etc. These come from our procedural past, which some argue is still relevant and useful today.
Regardless where you stand on that debate, some of the standard classes/libraries/specifications are just not compatible with the Law of Demeter, or object-orientation in general. If you are willing to compromise on object-oriented principles (like most projects) that is not a problem. If you are actually not willing to compromise on a good design, but are forced to do so by libraries or standard classes, the solution is not to discuss the Law of Demeter away, but to recognize (and accept) that you are making an conscious decision to violate it.
So, is it reasonable to assume the standard library works? Sure! That does not mean they are exempt from good design principles, like the Law of Demeter, if you really want to stick with object-orientation.
Law of Demeter only applies weakly, and at system boundaries (almost never the object level, sometimes at the module level).
It's only a problem when you are forced to make 'public' a class in your module/system that would not otherwisely have to be public. IMHO Making this mistake is and/or should be rare, and avoid it it common sense, even for med-level devs.
As you can see, the standard classes (or .Net:framework/core) are NOT a violation, because often times the objects you see in their double-dots are meant to be public, anyway.
Think about the design. If you are following GRASP principles (and DDD) when choosing what kinds of objects to make, LOD will go away and lead to increased coupling.
It's nuanced, LOD is not complete crap, it's just not helpful and better to ignore, while solving the obscure problems it's meant to solve using solutions (GRASP,DDD) that address the root cause instead of just outlawing the symptoms.
More reading from people on the internet who also agree with me (which must mean I've right, huh?)
https://www.tedinski.com/2018/12/18/the-law-of-demeter.html
https://naildrivin5.com/blog/2020/01/22/law-of-demeter-creates-more-problems-than-it-solves.html
https://wiki.c2.com/?LawOfDemeterIsInvalid

How to identify that code is over abstracted?

What should be the measures that should be used to identify that code is over abstracted and very hard to understand and what should be done to reduce over abstraction?
"Simplicity over complexity, complexity over complicatedness"
So - there's a benefit to abstract something only if You are "de-leveling" complicatedness to complexity. Reasons to do that can vary: better modularity, better encapsulation etc.
Identifying over abstraction is a chicken and egg problem. In order to reduce over abstraction You need to understand actual reason behind code lines. That includes understanding idea of particular abstraction itself (in contrast to calling it over abstracted cause of lack of understanding). And that's not enough - You need to know a better, simpler solution to prove that it's over abstracted.
If You are looking for tool that could do it in Your place - look no more, only mind can reliably judge that.
I will give an answer that will get a LOT of down votes!
If the code is written in an OO language .. it is necessarily heavily over-abstracted. The purer the language the worse the problem.
Abstraction should be used with great caution. If in doubt always use concrete data structures. (You can always abstract later, this is easier than de-abstraction :)
You must be very certain you have the right abstraction in your current context, and you must be very sure that concept will stand the test of change. Abstraction has a high price in performance of both the code and the coder.
Some weak tests for over-abstraction: if the data structure is a product type (struct in C) and the programmer has written get and set method for each field, they have utterly failed to provide any real abstraction, disabled operators like C increment, for no purpose, and simply not understood that the struct field names are already the abstract representation of a product. Duplicating and laming up the interface is not a good idea.
A good test for the product case is whether there exist any data invariants to maintain. For example a pair of integers representing a rational number is almost sufficient, there's little need for any abstraction because all pairs are valid except when the denominator is zero. However for performance reasons one may choose to maintain an invariant, typically the denominator is required to be greater than zero, and the numerator and denominator are relatively prime. To ensure the invariant, the product representation is encapsulated: the initial value protected by a constructor and methods constrained to maintain the invariant.
To fix code I recommend these steps:
Document the representation invariants the abstraction is maintaining
Remove the abstraction (methods) if you can't find strong invariants
Rewrite code using the method to access the data directly.
This procedure only works for low level abstraction, i.e. abstraction of small values by classes.
Over abstraction at a higher level is much harder to deal with. Ideally you'd refactor the code repeatedly, checking to see after each step it continues to work. However this will be hard, and sometimes a major rewrite is required, rather than a refinement. It's probably not worth it unless the abstraction is so far off base it is not tenable to continue to maintain it.
Download Magento and have a look at the code, read some documents on it and have a look at their ERD: http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache
I'm not joking, this is over-abstraction.. trying to please everyone and cover every base is a terrible idea and makes life extremely difficult for everyone.
Personally I would say that "What is the ideal level of abstraction?" is a subjective question.
I don't like code that uses a new line for every atomic operation, but I also don't like 10 nested operations within one line.
I like the use of recursive functions, but I don't appreciate recursion for the sole sake of recursion.
I like generics, but I don't like (nested) generic functions that e.g. use different code for each specific type that's expected...
It is a matter of personal opinion as well as common sense. Does this answer your question?
I completely agree with what #ArnisLapsa wrote:
"Simplicity over complexity, complexity over complicatedness"
And that
an abstraction is used to "de-level" those, from complicated to complex
(and from complex to simpler)
Also, as stated by #MartinHemmings a good abstraction is quite subjective because we don't all think the same way. And actually our way of thinking change with time. So Something that someone find simple might looks complex to others, and even become simpler with more experiences. Eg. A monadic operation is something trivial for functional programmer, but can be seriously confusing for others. Similarly, a design with mutable object communicating with each other can be natural for some and feel un-trackable for others.
That being said, I would like to add a couple of indicators. Note that this applies to abstractions used in code-base, not "paradigm abstraction" such as everything-is-a-function, or everything-is-designed-as-objects. So:
To the people it concerns, the abstraction should be conceptually simpler than other alternatives, without looking at the implementation. If you find that thinking of all possible cases is simpler that reasoning using the abstraction, then this abstraction is not suitable (for you)
Its implementation should reason only about the abstraction, not the specific cases that it will be used for. As soon as the abstraction implementation has parts made for specific cases, it indicates an "unfit" abstraction. And increasing generalization to cope with each new case, is going the wrong way (and tends to fall to the next issue).
A very common indicator of over-abstraction I have found (and actually fell for) are abstractions that represent more than what is needed, now. As much as possible, they should allow to do exactly what is required, but nothing more. For example, say you're thinking of, or already have, a "2d point" abstraction for which you can define many operators you need. Then you have another need that could really be a "4d point" similar to the 2d. Don't start to use a "Ndimensionnal point" abstraction, especially thinking that you might later need it. Maybe you'll never have anything else than 2 and 4d (because it stays as "a good idea" in the backlog forever) but instead some requirements pops to convert 4d points into pairs of 2d points. That's going to be hard to generalize to n-dimensions. So, each abstraction can be checked to cover and only cover the actual needs. In my point example, the complexity "n-dimensional" is actually only used to cope with the 2 and 4d cases (and the 4d might not even be used that much).
Finally, in a more global point of view, a code-base that has many not related abstractions, is an indicator that the dev team tends to abstract every little issues. So probably many of them are or became over-abstracted.

should we teach pointers in a "fundamentals of programming" course?

I will be teaching a course on the fundamentals of programming next Fall, first year computer science course. What are the pros and cons of teaching pointers in such a course? (My position: they should be taught).
Edit: My problem with the "cater your audience" argument is that in the first couple of years in University, we (profs) do not know if students would like to be scientists or not... we wish we knew, but we have to strike a balance between those who will remain in school (4 years does not a scientist make), and those who will be engineers.
Final decision: At least references, but possibly pointers without pointer arithmetic.
At the very least you should teach references or some equivalent concept. I think you should probably take it easy on things like pointer arithmetic, c arrays and strings, but indirection is a very important concept in computer science, and students should be introduced to it.
Yes.
Pointers underpin a huge number of concepts in other, higher level languages, and I'm firmly of the opinion that you need to teach a certain amount of the lower-level stuff to facilitate a good understanding of why we bother with anything higher level at all.
Once you understand a bit about how memory is allocated, and how it's addressed and manipulated with pointers, explaining a lot of other constructs gets easier. For example, explaining a NullPointerException in Java, or even the concept of references in such languages is child's play if you've got someone who understands pointers in C (and better still, if they also grok references in C++).
Absolutely teach them. Understanding indirection is essential for programming, whether it's with pointers, references, dynamic binding, or any number of other things. Now obviously don't start off with them, but understanding indirection is at least as important as understanding control flow ideas.
The con of course is that some people just won't get it and will do poorly or drop out. If this is a course for people who want to be CS majors then don't sweat it because you're just giving them incentive to switch majors earlier rather than later. If it's more or a general ed course for people who are kind of interested in programming, then they should probably still be introduced, but not graded harshly or heavily.
During my first year as a CS student, I took a Java course in fall which was the general intro. The professor didn't teach pointers directly, but he did teach the concept of references, and why you can modify objects and not when primitives when either is passed in an argument.
During my 2nd semester, I took the next course in the series, which was about C, and this class heavily relied on pointers.
For an intro to programming class, I'd say just mention references, but not pointers directly.
I think that a "fundamentals of programming" course should at least touch on basic processor architecture and assembly language, and if it does, you can't really make a case for not discussing pointers.
If you only teach higher-level (byte-code) languages, then I guess pointers would confuse the audience.
Pros: solid understanding of the way that memory is used by the machine, the difference between (and pitfalls of) pointers to data on the heap vs. pointers to data on the stack, passing methods by address, etc.
Cons: complex for an audience who is not yet knowledgeable (or has not had enough time to assimilate the concepts) of computer architecture, including what is stack, what are registers, calling conventions, etc.
So, to summarize, it depends a lot on your audience and on the language(s) you'll tackle (pointers will be meaningless in the context of LISP or Java), as well as on how deep you are willing to go in the direction of what is heap, what is stack, how scope is translated into stack (i.e. why never to return a pointer to a local variable), etc.
When I taught pointers to an engineering class I ultimately fired up a debugger on a simple "hello world" program, and showed the students the actual machine code, register values and corresponding memory dumps, with the stack manipulation and parameter passing, etc., but they were ready for it. Would your audience be receptive to such a drill-down expedition, to ensure solid understanding of what's going on behind the scenes, and would you be willing to go to such lengths? :)
I think you shouldn't teach it first. But later, once basic concepts of programming are acquired.
A good example is the last Stroustrup book : Programming -- Principles and Practice Using C++ where he teach how to make a parser, I/O (streams) usage and GUI usage before even talking about pointers!
I think it will be a good reference for teaching because it is more natural to understand the way we build ideas instead of how much constraints (memory management for example) we have to handle at the same time to make a software work. I really recommand you this book to have a fresh perspective about teaching fundamentals of programming.
It really depends on the goal of your course - teaching programming and teaching computer science are two separate goals, and though they are not mutually exclusive, introductory classes generally do not teach both equally well. Here's an example of the difference: say we want to learn how to sort a list. A programming course in C++ would teach you to use the syntax of a std::sort function template, and homework might be writing several comparison functors. A computer science course would explain to you what a merge sort is, what the algorithm looks like in pseudo-code, and its performance/space characteristics, and homework would be writing the sort function itself.
So if you are teaching introductory programming, then yes, you should teach your students about pointers.
If you are teaching computer science, then no, there is no need to understand pointers at an introductory level.
Anybody who calls themselves a good programmer must know how pointers work; being a good programmer implies that they do not know only a single programming language, but that they know how programming languages work in general, allowing them to adapt to programming languages they haven't seen before.
This doesn't mean that a fundamentals of programming course should be teaching pointers, however.
If your goal is to give these people a complete, well rounded familiarity with programming languages in general, then yes pointers shall be part of that.
If your way of introducing them to programming is to use one programming language at first, with the intention of covering other languages in subsequent courses, and pointers are not relevant to that language, then there's no need to talk about pointers yet.
I think there's a lot to be said by starting people out in one language only, rather than trying to cover every style of language at once.
My first introductory programming course used Haskell. It wasn't until a subsequent course using C that pointers were introduced (I was already a good C and C++ programmer when I took the course; those subjects were mandatory).

Resources