First-Class Citizen

The definition of first-class citizen found in the Wikipedia article says:
An object is first-class when it
can be stored in variables and data structures
can be passed as a parameter to a subroutine
can be returned as the result of a subroutine
can be constructed at run-time
has intrinsic identity (independent of any given name)
Can someone please explain/elaborate on the 5th requirement? I feel that the article should have provided more detail about what "intrinsic identity" is meant to capture.
Perhaps we could use functions in JavaScript and functions in C in our discussion to illustrate the 5th bullet.
I believe functions in C are second-class, whereas functions in JavaScript are first-class, because we can do something like the following:
var foo = function () { console.log("Hello world"); };
This is not permitted in C.
Again, my question is really on the 5th bullet (requirement).

Intrinsic identity is pretty simple, conceptually. If a thing has it, its identity does not depend on something external to that thing. It can be aliased, referenced, renamed, what-have-you, but it still maintains whatever that "identity" is. People (most of them, anyway) have intrinsic identity. You are you, no matter what your name is, or where you live, or what physical transformations you may have suffered in life.
An electron, on the other hand, has no intrinsic identity. Perhaps introducing quantum mechanics here just confuses the issue, but I think it's a really fantastic example. There's no way to "tag" or "label" an electron such that we could tell the difference between it and a neighbor. If you replace one electron with another, there is absolutely no way to distinguish the old one from the new one.
Back to computers: an example of "intrinsic identity" might be the value returned by Object#hashCode() in Java, or whatever mechanism a JavaScript engine uses that permits this statement to be false:
{} === {} // false
but this to be true:
function foo () {}
var bar = foo;
var baz = bar;
baz === foo; // true
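This ties back to the other four bullets: an object's identity is what lets you store it, pass it around, and rename it without losing track of which object it is. A small JavaScript sketch (registry and greet are made-up names):

const registry = new Map();
function greet() { console.log("Hello world"); }

registry.set(greet, "my greeter"); // stored under its identity, not its name

const alias = greet;               // a second name for the same object
console.log(alias === greet);      // true  -- identity survives renaming
console.log(registry.get(alias));  // "my greeter" -- and survives storage

function greet2() { console.log("Hello world"); }
console.log(greet2 === greet);     // false -- same text, different identity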

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
    nested() {
        if (something)
            goto label;
    }
    nested();
label:
    print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want the nested function to be a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's alloca() and you don't allow calling a nested function by address, in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and by the caller having a "switch()" on that value. You could stick it in the return value (if the callee already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could put it in a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
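A minimal sketch of the return-value option, in C (all names here are made up for illustration):

#include <stdio.h>

enum exit_kind { EXIT_NORMAL, EXIT_TO_LABEL };

/* "nested" reports how it exited instead of jumping directly. */
static enum exit_kind nested(int something) {
    if (something)
        return EXIT_TO_LABEL; /* the nested "goto label" becomes a return code */
    return EXIT_NORMAL;
}

static void toplevel(int something) {
    if (nested(something) == EXIT_TO_LABEL)
        goto label; /* an ordinary local goto in the caller */
    /* ...code that runs only when nested() exits normally... */
label:
    printf("finished!\n"); /* the code at the label runs normally */
}

int main(void) {
    toplevel(1); /* "jumps" out of nested() to the label */
    toplevel(0); /* reaches the label by normal fall-through */
    return 0;
}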
Exceptions are complex (I can't describe how to make them work offhand, but the docs are here: https://llvm.org/docs/ExceptionHandling.html ), slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions, except that it is simpler to use and has no performance trade-off when executed; unfortunately, there are miscompiles in LLVM which you will need to be the one to fix if you start using them in earnest (see the postscript at the end of this answer).
The last two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the function's address to be taken and called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack: only the direct caller of a function that returns twice has to deal with the twice-returning call. Just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that, in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise", as with all things in optimizers, but there is no doesnotlongjmp attribute in LLVM, nor any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
    allocate val
    val <- 0
    setjmpret <- call setjmp
    br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
    val <- 1
    call foo()
    goto %after
%second setjmp return block:
    call print(val)
    goto %after
%after:
    return
The control flow graph shows that there is no path on which val <- 1 executes before print(val). The only path to print(val) has val <- 0 before it, therefore constant propagation may turn print(val) into print(0). The problem here is a missing control-flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, every call that may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control-flow edge is missing, and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM usually isn't able to analyze what calling foo() might do, and so can't perform the optimization. For instance, if val were not a fresh allocation but a pointer, then who's to say that foo() doesn't have access to the same pointer and performs "val <- 1" through it? If LLVM can't prove that impossible, the transform to print(0) is precluded. Secondly, setjmp/longjmp are just not used often in real code.

Treating single and multiple elements the same way ("transparent" map operator)

I'm working on a programming language that is supposed to be easy, intuitive, and succinct (yeah, I know, I'm the first person to ever come up with that goal ;-) ).
One of the features that I am considering for simplifying the use of container types is to make the methods of the container's element type available on the container type itself, basically as a shortcut for invoking a map(...) method. The idea is that working with many elements should not be different from working with a single element: I can apply add(5) to a single number or to a whole list of numbers, and I shouldn't have to write slightly different code for the "one" versus the "many" scenario.
For example (Java pseudo-code):
import static java.math.BigInteger.*; // ZERO, ONE, ...
...
// NOTE: BigInteger has an add(BigInteger) method
Stream<BigInteger> numbers = Stream.of(ZERO, ONE, TWO, TEN);
Stream<BigInteger> one2Three11 = numbers.add(ONE); // = 1, 2, 3, 11
// this would be equivalent to: numbers.map(ONE::add)
As far as I can tell, the concept would not only apply to "container" types (streams, lists, sets...), but more generally to all functor-like types that have a map method (e.g., optionals, state monads, etc.).
The implementation approach would probably be more along the lines of syntactic sugar offered by the compiler rather than manipulation of the actual types (Stream<BigInteger> obviously does not extend BigInteger, and even if it did, the "map-add" method would have to return a Stream<BigInteger> instead of a BigInteger, which would be incompatible with most languages' inheritance rules).
I have two questions regarding such a proposed feature:
(1) What are the known caveats of offering such a feature? Method-name collisions between the container type and the element type are one problem that comes to mind (e.g., when I call add on a List<BigInteger>, do I want to add an element to the list, or do I want to add a number to all elements of the list? The argument type should clarify this, but it's something that could get tricky).
(2) Are there any existing languages that offer such a feature, and if so, how is it implemented under the hood? I did some research, and while pretty much every modern language has something like a map operator, I could not find any language where the one-versus-many distinction is completely transparent (which leads me to believe that there is some technical difficulty that I'm overlooking here).
NOTE: I am looking at this in a purely functional context that does not support mutable data (not sure if that matters for answering these questions)
Do you come from an object-oriented background? That's my guess, because you're thinking of map as a method belonging to each individual "type", as opposed to thinking of the various things that are functors.
Compare how TypeScript would handle this if map were a property of each individual functor:
declare const someOption: Option<number>
someOption.map(val => val * 2) // Option<number>

declare const someEither: Either<string, number>
someEither.map(val => val * 2) // Either<string, number>
someEither.mapLeft(err => 'ERROR') // Either<'ERROR', number>
You could also create a constant representing each individual functor instance (option, array, identity, either, async/Promise/Task, etc.), where these constants have map as a method. Then have a standalone map method that takes one of those "functor constant"s, the mapping function, and the starting value, and returns the new wrapped value:
const option: Functor = {
    // signature sketch; the implementation is elided
    map: <A, B>(f: (a: A) => B) => (o: Option<A>): Option<B> => { /* ... */ }
}
declare const someOption: Option<number>
map(option)(val => val * 2)(someOption) // Option<number>

const either: Functor = {
    map: <E, A, B>(f: (a: A) => B) => (e: Either<E, A>): Either<E, B> => { /* ... */ }
}
declare const someEither: Either<string, number>
map(either)(val => val * 2)(someEither) // Either<string, number>
Essentially, you have a functor "map" that uses the first parameter to identify which type you're going to be mapping, and then you pass in the data and the mapping function.
However, with languages like Haskell you don't have to pass in that "functor constant", because the language resolves it for you. That's a really nice benefit, and it means even less boilerplate. It also allows you to write a lot of your code in "point-free" style, so refactoring becomes much easier if you design your language so that you don't have to manually specify the type being used in order to take advantage of map/chain/bind/etc.
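For a flavor of this, a few standard Haskell one-liners (a sketch; fmap is the Functor class's map, and the compiler picks the right instance from the type alone):

fmap (*2) (Just 5)      -- Just 10       (Maybe functor)
fmap (*2) [0, 1, 2, 10] -- [0,2,4,20]    (list functor)
fmap (*2) (Right 5)     -- Right 10      (Either e functor)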
Say you initially write your code to make a bunch of API calls over HTTP, so you use a hypothetical async monad. If your language is smart enough to know which type is being used, you could have some code like
import { map as asyncMap } from 'async' // hypothetical module
declare const apiCall: Async<number>
asyncMap(n => n * 2)(apiCall) // Async<number>
Now you change your API so it's reading a file and you make it synchronous instead:
import { map as syncMap } from 'sync' // hypothetical module
declare const apiCall: Sync<number>
syncMap(n => n * 2)(apiCall) // Sync<number>
Look how you have to change multiple pieces of the code. Now imagine you have hundreds of files and tens of thousands of lines of code.
With a point-free style, you could do
import { map } from 'functor'
declare const apiCall: Async<number>
map(n => n*2)(apiCall)
and refactor to
import { map } from 'functor'
declare const apiCall: Sync<number>
map(n => n*2)(apiCall)
If you had a centralized location of your API calls, that would be the only place you're changing anything. Everything else is smart enough to recognize which functor you're using and apply map correctly.
As for your concern about name collisions, that is a concern that will exist no matter what your language or design is. But in functional programming, add would be a combinator: the mapping function passed into fmap (the Haskell term) / map (the term in many imperative/OO languages). The function you use to add a new element to the tail end of an array/list might be called snoc ("cons", from "construct", spelled backwards: cons prepends an element to your list, snoc appends). You could also call it push or append.
As for your one-vs-many issue, these are not the same type. One is a list/array type, and the other is an identity type. The underlying code handling them would be different, as they are different functors (one contains a single element, while the other contains multiple elements).
I suppose you could create a language that disallows single elements by automatically wrapping them as single-element lists and then just uses the list's map. But this seems like a lot of work to make two things that are very different look the same.
Instead, a better approach is probably to wrap single elements in an identity type and multiple elements in a list/array, and to let identity and list/array each have their own under-the-hood handler for the functor method map, as in the sketch below.
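A minimal TypeScript sketch of that last idea (Identity, idMap, and listMap are made-up names): one wrapper for the "one" case, a plain array for the "many" case, each with its own map, and the call sites look alike.

type Identity<A> = { value: A };

// map for the "one" case: unwrap, apply, rewrap
const idMap = <A, B>(f: (a: A) => B) => (i: Identity<A>): Identity<B> =>
    ({ value: f(i.value) });

// map for the "many" case: delegate to the array's own map
const listMap = <A, B>(f: (a: A) => B) => (xs: A[]): B[] => xs.map(f);

const one: Identity<number> = { value: 5 };
const many: number[] = [0, 1, 2, 10];

idMap((n: number) => n + 1)(one);    // { value: 6 }
listMap((n: number) => n + 1)(many); // [1, 2, 3, 11]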

FRP vs. State Machine w/ Lenses for Game Loop

I'm trying to understand the practical difference between an FRP graph and a state machine with lenses, specifically for something like a game loop where the entire state is re-drawn every tick.
Using javascript syntax, the following implementations would both essentially work:
Option 1: State Machine w/ Lenses
// Using Sanctuary and partial.lenses (or Ramda) primitives
// Each update takes the state, modifies it with a lens, and returns it
let state = initialValues;
eventSource.addEventListener(update, () => {
    state = S.pipe([
        updateCharacter,
        updateBackground,
    ])(state); // the first call has the initial settings
    render(state);
});
Option 2: FRP
// Using Sodium primitives
// It's possible this isn't the best way to structure it, feel free to advise
cCharacter = sUpdate.accum(initialCharacter, updateCharacter)
cBackground = sUpdate.accum(initialBackground, updateBackground)
cState = cCharacter.lift(cBackground, mergeGameObjects)
cState.listen(render)
I see that Option 1 allows any update to get or set data anywhere in the game state; however, all the cells/behaviors in Option 2 could be adjusted to be of type GameState, and then the same thing applies. If this were the case, then I'm really confused about the difference, since it would then just boil down to:
cGameState = sUpdate
.accum(initialGameState, S.pipe(...updates))
.listen(render)
And then they're really very equivalent...
Another way to achieve that goal would be to store all the Cells in some global reference, and then any other cell could sample them for reading. New updates could be propagated for communicating. That solution also feels quite similar to Option 1 at the end of the day.
Is there a way to structure the FRP graph in such a way that it offers clear advantages over the event-driven state machine, in this scenario?
I'm not quite sure what your question is, also because you keep changing the second example in your explanatory text.
In any case, the key benefit of the FRP approach — as I see it — is the following: The game state depends on many things, but they are all listed explicitly on the right-hand side of the definition of cGameState.
In contrast, in the imperative style, you have a global variable state which may or may not be changed by code that is not shown in the snippet you just presented. For all I know, the next line could be
eventSource2.addEventListener(update, () => { state = state + 1; })
and the game state suddenly depends on a second event source, a fact that is not apparent from the snippet you showed. This cannot happen in the FRP example: All dependencies of cGameState are explicit on the right-hand side. (They may be very complicated, sure, but at least they are explicit.)
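For instance, if a second event source is supposed to influence the game state, the FRP version forces that input to appear in the definition itself. A sketch in the Sodium style of Option 2 (sOtherEvents and mergeUpdates are made-up names):

cGameState = sUpdate
    .merge(sOtherEvents, mergeUpdates) // a new input has to show up here
    .accum(initialGameState, S.pipe(...updates))
cGameState.listen(render)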

Is it possible to declare a tuple struct whose members are private, except for initialization?

Is it possible to declare a tuple struct where the members are hidden for all intents and purposes, except for initialization?
// usize isn't public since I don't want users to manipulate it directly
struct MyStruct(usize);
// But now I can't initialize the struct using an argument to it.
let my_var = MyStruct(0xff);
//           ^^^^^^^^
// How to make this work?
Is there a way to keep the member private but still allow new structs to be initialized with an argument as shown above?
As an alternative, a method such as MyStruct::new can be implemented, but I'm still interested to know if it's possible to avoid having to use a method on the type, since it's shorter and nice for types that wrap a single variable.
Background
Without going into too many details, the only purpose of this type is to wrap a single type (a helper which hides some details, adds some functionality, and is optimized away completely when compiled). In this context, it's not exactly exposing hidden internals to use the Struct(value) style of initialization.
Further, since the wrapper is zero-overhead, it's a little misleading to use a new method, which is often associated with allocation/creation rather than casting.
Just as it's convenient to type (int)v or int(v) instead of int::new(v), I'd like to do this for my own type.
It's used often, so the ability to use a short expression is very convenient. Currently I'm using a macro which calls a new method; it's OK but a little awkward/indirect, hence this question.
Strictly speaking this isn't possible in Rust.
However the desired outcome can be achieved using a normal struct with a like-named function (yes, this works!)
pub struct MyStruct {
    value: usize,
}

#[allow(non_snake_case)]
pub fn MyStruct(value: usize) -> MyStruct {
    MyStruct { value }
}
Now, you can write MyStruct(5) but not access the internals of MyStruct.
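A minimal usage sketch (assuming the pair above lives in a module named wrapper, and with a hypothetical get accessor added for illustration):

mod wrapper {
    pub struct MyStruct {
        value: usize,
    }

    #[allow(non_snake_case)]
    pub fn MyStruct(value: usize) -> MyStruct {
        MyStruct { value }
    }

    impl MyStruct {
        // hypothetical accessor, not part of the original answer
        pub fn get(&self) -> usize {
            self.value
        }
    }
}

fn main() {
    let v = wrapper::MyStruct(5); // reads like a tuple-struct constructor
    println!("{}", v.get());      // 5
    // println!("{}", v.value);   // error[E0616]: field `value` is private
}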
I'm afraid that such a concept is not possible, and for good reason. Each member of a struct, unless marked with pub, is treated as an implementation detail that should not rise to the surface of the public API, regardless of when and how the object is currently being used. From this point of view, the question's goal reaches a conundrum: wishing to keep members private while letting the API user define them arbitrarily is not only uncommon but also not very sensible.
As you mentioned, having a method named new is the recommended way of doing this. It's not like you're compromising code readability with the extra characters you have to type. Alternatively, for the case where the struct is known to wrap a single item, making the member public can be a possible solution. That, on the other hand, would allow any kind of mutation through a mutable borrow (thus possibly breaking the struct's invariants, as mentioned by @MatthieuM; see the sketch below). This decision depends on the intended API.
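To illustrate that invariant concern, a small sketch (Even is a hypothetical wrapper whose invariant is "the value is always even"):

mod numbers {
    pub struct Even(pub u32); // the pub field makes Even(value) work everywhere...
}

fn main() {
    let mut e = numbers::Even(2);
    e.0 = 3; // ...but this also compiles, silently breaking the invariant
    println!("{}", e.0);
}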

Languages supporting complete reflection

Only recently, I discovered that both Java and C# do not support reflection of local variables. For example, you cannot retrieve the names of local variables at runtime.
Although clearly this is an optimisation that makes sense, I'm curious as to whether any current languages support full and complete reflection of all declarations and constructs.
EDIT: I will qualify my "names of local variables" example a bit further.
In C#, you can output the names of parameters to methods using reflection:
foreach (ParameterInfo pi in typeof(AClass).GetMethods()[0].GetParameters())
    Trace.WriteLine(pi.Name);
You don't need to know the names of the parameters (or even of the method) - it's all contained in the reflection information. In a fully-reflective language, you would be able to do:
foreach (LocalVariableInfo lvi in typeof(AClass).GetMethods()[0].GetLocals())
    Trace.WriteLine(lvi.Name);
The applications may be limited (many applications of reflection are), but nevertheless, I would expect a reflection-complete language to support such a construct.
EDIT: Since two people have now effectively said "there's no point in reflecting local variable names", here's a basic example of why it's useful:
void someMethod()
{
    SomeObject x = SomeMethodCall();
    // do lots of stuff with x
    // sometime later...
    if (!x.StateIsValid)
        throw new SomeException(String.Format("{0} is not valid.", nameof(x)));
}
Sure, I could just hardcode "x" in the string, but correct refactoring support makes that a big no-no. nameof(x) or the ability to reflect all names is a nice feature that is currently missing.
Your introductory statement about the names of local variables drew my interest.
This code will actually retrieve the name of the local var inside the lambda expression:
static void Main(string[] args)
{
    int a = 5;
    Expression<Func<int>> expr = (() => a);
    Console.WriteLine(expr.Compile().Invoke());

    Expression ex = expr;
    LambdaExpression lex = ex as LambdaExpression;
    MemberExpression mex = lex.Body as MemberExpression;
    Console.WriteLine(mex.Member.Name);
}
Also have a look at this answer mentioning LocalVariableInfo.
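For reference, here is roughly what LocalVariableInfo gives you: local types and slot indices, but not names, since names are kept in the PDB rather than in assembly metadata (a sketch; AClass is the placeholder type from the question):

MethodBody body = typeof(AClass).GetMethods()[0].GetMethodBody();
foreach (LocalVariableInfo lvi in body.LocalVariables)
    Trace.WriteLine(lvi.LocalIndex + ": " + lvi.LocalType);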
Yes, there are languages where this is (at least kind of) possible. I would say that reflection in both Smalltalk and Python are pretty "complete" for any reasonable definition.
That said, getting the name of a local variable is pretty pointless: by definition, to get the name of that variable, you must already know its name. I wouldn't consider the lack of an operation for that exact task a lacuna in the reflection facility.
Your second example does not "determine the name of a local variable", it retrieves the name of all local variables, which is a different task. The equivalent code in Python would be:
for x in locals(): print(x)
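It also covers the GetLocals example from the question: in Python, the local variable names of a function are available from its code object, without even calling it:

def some_method():
    x = 5
    y = x * 2
    return y

# co_varnames lists the function's local variable names, in order:
print(some_method.__code__.co_varnames)  # ('x', 'y')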
Eh, in order to access a local var you have to be within the stack frame/context/whatever where the local var is valid. Since it is only valid at that point in time, does it matter whether it is called 't1' or 'myLittlePony'?
