How can I pass a non-static variable to an async block?

According to the docs on task::spawn (note that I'm using tokio::spawn, which seems similar but lacks this description):
The 'static constraint means that the closure and its return value must have a lifetime of the whole program execution. The reason for this is that threads can detach and outlive the lifetime they have been created in. Indeed if the thread, and by extension its return value, can outlive their caller, we need to make sure that they will be valid afterwards, and since we can't know when it will return we need to have them valid as long as possible, that is until the end of the program, hence the 'static lifetime.
I'm trying to pass a Url into a thread, like this,
do_update(confg.to_feed_url()).await;
Where I do this in do_update,
async fn do_update(url: Url) {
    let task = task::spawn(async {
        let duration = Duration::from_millis(5_000);
        let mut stream = time::interval(duration);
        stream.tick().await;
        loop {
            feeds::MyFeed::from_url(url.clone(), true);
            // ...
        }
    });
}
This doesn't work and generates these errors,
error[E0373]: async block may outlive the current function, but it borrows url, which is owned by the current function
I've read the docs, but this doesn't make much sense to me because I'm cloning in the block. How can I resolve this error?

In this case, if you don't need the URL elsewhere, you can write async move for your block instead of async. This moves the URL (and anything else the block captures, which here appears to be nothing) into the block.
If you do need it elsewhere, clone the url variable outside the block and then use async move in the same way, so that only the clone, not url itself, is moved into the block. The url.clone() call inside the block will then probably be unnecessary, unless the value is consumed on every iteration of a loop.
This occurs because url lives outside the block, and calling url.clone() only borrows it: Clone::clone takes &self, so the block captures url by shared reference rather than by value. Since url's scope is this function and not the life of the program, that reference does not satisfy the 'static bound, and hence you need to move the value into the block instead of borrowing it.
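A minimal sketch of the second suggestion (clone outside, then move the clone in). To avoid external crates it uses std::thread::spawn, which imposes the same 'static bound, in place of tokio::spawn, and a String in place of Url. Both substitutions, and the function name run_with_clone, are assumptions made for illustration; with tokio the closure would become an async move block.

```rust
use std::thread;

/// Hypothetical stand-in for `do_update`: `String` replaces `Url`, and
/// `std::thread::spawn` replaces `tokio::spawn` (same 'static requirement).
fn run_with_clone(url: String) -> (String, String) {
    // Clone *outside* the closure so the original stays with the caller.
    let url_for_task = url.clone();

    // `move` transfers ownership of `url_for_task` into the closure, so the
    // closure borrows nothing from this function and satisfies 'static.
    let handle = thread::spawn(move || format!("fetching {}", url_for_task));

    let result = handle.join().expect("worker thread panicked");

    // `url` itself was never moved, so it is still usable here.
    (url, result)
}
```

Dropping the `move` keyword should reproduce the same class of error as in the question (E0373), because the closure would then only borrow `url_for_task`.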

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
    nested() {
        if (something)
            goto label;
    }
    nested();
label:
    print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a "switch()" on that value. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
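The "how it exited" idea can be sketched at source level. This is Rust rather than LLVM IR, and the names Exit and trace, as well as the "code before label" step, are illustrative assumptions that do not appear in the question. In IR, the match would roughly correspond to a switch on the returned value, with one successor being the block for label.

```rust
/// How the nested function finished (hypothetical encoding).
#[derive(Debug, PartialEq)]
enum Exit {
    Normal,
    GotoLabel,
}

fn nested(something: bool) -> Exit {
    if something {
        // Instead of a non-local goto, report the requested jump target.
        return Exit::GotoLabel;
    }
    Exit::Normal
}

/// Returns a trace of the statements executed, for demonstration only.
fn toplevel(something: bool) -> Vec<&'static str> {
    let mut trace = Vec::new();
    match nested(something) {
        // Jump requested: skip the code between the call and the label.
        Exit::GotoLabel => {}
        // Normal return: run the code that precedes the label.
        Exit::Normal => trace.push("code before label"),
    }
    // label:
    trace.push("finished!");
    trace
}
```

Because the callee returns normally in both cases, its stack frame is unwound by the ordinary return path and no setjmp/longjmp or exception machinery is involved.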
Exceptions are complex (I can't describe how to make them work offhand, but the docs are here: https://llvm.org/docs/ExceptionHandling.html ), slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions but is simpler to use and has no such performance penalty when the jump is taken. Unfortunately, there are miscompiles in LLVM which you will need to be the one to fix if you start using setjmp/longjmp in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no doesnotlongjmp attribute in LLVM, nor any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
    allocate val
    val <- 0
    setjmpret <- call setjmp
    br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
    val <- 1;
    call foo();
    goto after;
%second setjmp return block:
    call print(val);
    goto after;
%after:
    return
The control flow graph shows that there is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it, therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.

Should I create new context in each incoming request?

For the past few days, I've been reading about Go, and one concept that I keep returning to is contexts.
I think I understand the motivation behind creating such a structure. The thing that I don't understand is a particular use case when using a context in the incoming HTTP request.
Let's say we have the following httpHandlerFunc. Inside that handler, we call a function that requires a context to be passed. I have often seen this solution:
func myHandler(w http.ResponseWriter, r *http.Request) {
    ctx := context.WithValue(context.Background(), "request", r)
    otherFunc(ctx)
}
My question is, why don't we just pass a context from the request, like so
func myHandler(w http.ResponseWriter, r *http.Request) {
    otherFunc(r.Context())
}
Doesn't it make more sense to pass the context of the request since we want the context to flow through our program? I thought that creating a background context is something we want to do only in the root parent, like init() function.
You might be missing the main point of contexts, presumably because of the poor HOWTOs you're dealing with.
The possibility of carrying around arbitrary values in contexts is actually a misfeature of this type, regretted by its designers because it creates an anti-pattern (a proper way to deal with context-as-some-state is to have a set of values explicitly passed around).
The chief reason contexts exist is that they provide tree-like propagation of a signal (cancellation or "done" in the case of contexts).
So the original idea behind contexts is as follows:
The "root" context object is created for an incoming request.
Each "task" that needs to be executed on behalf of the request is associated with its own context, derived from that of the request¹.
Those tasks may produce other tasks and so on.
As you can see, a hierarchy of "units of work" is formed, linked to the object which is the reason for these units to exist and execute.
When the incoming request is cancelled (the client's socket got disconnected, for example), the context object associated with it is cancelled as well, and then all the linked tasks receive that signal as it's propagated from the root of the resulting context tree down to its leaves, making sure all the tasks being executed for the request are (eventually) cancelled.
Of course, in order for this to work, each "task" (usually a goroutine doing something) is required to "listen" on the context passed to it for that "done" signal.
Contexts also support timeouts out of the box, so you might create a context which cancels itself after some fixed time interval passes.
So, back to the examples in your question.
The first example ignores the request's context completely and creates a from-scratch context ostensibly with the sole reason to carry stuff in it (bad).
The second example might use the context for its intended purpose (but we do not know as we cannot see that otherFunc).
I would advise you to read https://blog.golang.org/context, and the articles on concurrency patterns in Go linked there.
¹ Actually, a new context need not be created if the task to be controlled by it has no other policy to "add" to the existing, parent, context.
The idea of derivation is here to implement additional ways to cancel work in this particular task while still honoring the cancellation of the parent context.
For instance, a context derived for a particular task could have its own deadline or have a way to cancel only this particular context.
Of course, a complex (nested) context can be derived for a task: for example, a context with a deadline can be derived from the parent context, and then a cancellable context can be derived from the former. The result would be a context which is cancelled either explicitly by the code or when the deadline expires or when the parent context signals its cancellation.
Your two examples do entirely different things.
func myHandler(w http.ResponseWriter, r *http.Request) {
    ctx := context.WithValue(context.Background(), "request", r)
    otherFunc(ctx)
}
This creates a new context, and stores the request as a value. There is rarely, if ever, any reason to do exactly this. A far more idiomatic solution would be just to pass the request to otherFunc like so:
func myHandler(w http.ResponseWriter, r *http.Request) {
    otherFunc(r)
}
If you really do need to pass the request as a context value, you should probably do it with the current request's context, like so:
func myHandler(w http.ResponseWriter, r *http.Request) {
    ctx := context.WithValue(r.Context(), "request", r)
    otherFunc(ctx)
}

Pointers as function arguments when implementing a structure

Why is there a & symbol before self in the full_name() function but not in the to_tuple() function? When I look at them, the usage of self is similar in both functions, so why use &? Also, when I add & to to_tuple() or delete it from full_name(), the compiler throws an error. Can someone explain it?
fn full_name(&self) -> String {
    format!("{} {}", self.first_name, self.last_name)
}
fn to_tuple(self) -> (String, String) {
    (self.first_name, self.last_name)
}
full_name does not consume self, it uses a reference via &self: the members are only used via references as arguments to format!(), so a reference suffices.
to_tuple consumes self: it moves the members out of self into the returned tuple. Since self no longer owns those values after the move, it can no longer be used, so the method has to take ownership via self. (By Rust naming convention, a consuming conversion like this would usually be called into_tuple; the to_ prefix normally signals a method that borrows.)
You can change full_name to take self, that is, to move ownership. This would be inconvenient, though, as calling the function would consume the struct without any need to.
to_tuple could be changed to not consume self, yet it would then need to .clone() the members (make copies of them), which is costly.
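A small sketch that puts the three variants side by side. The struct name Person, the method to_tuple_cloned, and the helper demo are hypothetical; only full_name and to_tuple come from the question.

```rust
/// Hypothetical struct; the question does not show its definition.
struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    /// Borrows `self`: the fields are only read, so the caller keeps the value.
    fn full_name(&self) -> String {
        format!("{} {}", self.first_name, self.last_name)
    }

    /// Consumes `self`: the fields are moved into the tuple without copying.
    fn to_tuple(self) -> (String, String) {
        (self.first_name, self.last_name)
    }

    /// Borrowing alternative (hypothetical): must clone, costing two allocations.
    fn to_tuple_cloned(&self) -> (String, String) {
        (self.first_name.clone(), self.last_name.clone())
    }
}

fn demo() -> (String, (String, String), (String, String)) {
    let p = Person {
        first_name: String::from("Ada"),
        last_name: String::from("Lovelace"),
    };
    let name = p.full_name(); // `p` is only borrowed here
    let cloned = p.to_tuple_cloned(); // still only borrowed
    let moved = p.to_tuple(); // `p` is consumed here
    // p.full_name(); // would not compile: `p` was moved by `to_tuple`
    (name, cloned, moved)
}
```

Changing to_tuple to take &self without the clones is rejected by the compiler, because a String cannot be moved out from behind a shared reference.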

Purely functional feedback suppression?

I have a problem that I can solve reasonably easily with classic imperative programming using state: I'm writing a co-browsing app that shares URLs between several nodes. The program has a module for communication that I call link and one for browser handling that I call browser. Now when a URL arrives in link, I use the browser module to tell the actual web browser to start loading the URL.
The actual browser will trigger the navigation detection that the incoming URL has started to load, and hence the URL will immediately be presented as a candidate for sending to the other side. That must be avoided, since it would create an infinite loop of link-following to the same URL, along the lines of the following (very conceptualized) pseudo-code (it's Javascript, but please consider that a somewhat irrelevant implementation detail):
actualWebBrowser.urlListen.gotURL(function(url) {
    // Browser delivered an URL
    browser.process(url);
});
link.receivedAnURL(function(url) {
    actualWebBrowser.loadURL(url); // will eventually trigger above listener
});
What I did first was to store every incoming URL in browser and simply eat the URL immediately when it arrives, then remove it from a 'received' list in browser, along the lines of this:
browser.recents = {}; // <--- mutable state
browser.recentsExpiry = 40000;
browser.doSend = function(url) {
    var now = (new Date).getTime();
    link.sendURL(url); // <-- URL goes out on the network
    // Side-effect, mutating module state, clumsy clean up mechanism :(
    browser.recents[url] = now;
    setTimeout(function() { delete browser.recents[url] }, browser.recentsExpiry);
    return true;
};
browser.process = function(url) {
    if(/* sanity checks on `url`*/) {
        var now = (new Date).getTime();
        var duplicate = browser.recents[url]; // timestamp of the last send, if any
        if(! duplicate) return browser.doSend(url);
        if((now - duplicate) > browser.recentsExpiry) {
            return browser.doSend(url);
        }
        return false;
    }
};
It works but I'm a bit disappointed by my solution because of my habitual use of mutable state in browser. Is there a "Better Way (tm)" using immutable data structures/functional programming or the like for a situation like this?
A more functional approach to handling long-lived state is to use it as a parameter to a recursive function, and have one execution of the function responsible for handling a single "action" of some kind, then calling itself again with the new state.
F#'s MailboxProcessor is one example of this kind of approach. However it does depend on having the processing happen on an independent thread which isn't the same as the event-driven style of your code.
As you identify, the setTimeout in your code complicates the state management. One way you could simplify this is to have browser.process filter out any timed-out URLs before it does anything else. That would also eliminate the need for the extra timeout check on the specific URL it is processing.
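The two suggestions above (state passed in as a parameter, and expired entries filtered out up front) might be combined roughly as follows. This is a sketch in Rust rather than JavaScript; the names Recents, RECENTS_EXPIRY_MS, and the shape of process are assumptions, and the actual network send is left to the caller, who acts on the returned flag.

```rust
use std::collections::HashMap;

/// Expiry window in milliseconds, mirroring `recentsExpiry` in the question.
const RECENTS_EXPIRY_MS: u64 = 40_000;

/// Maps each recently sent URL to the time (ms) at which it was sent.
type Recents = HashMap<String, u64>;

/// Pure step function: takes the old state, returns the new state plus a
/// flag saying whether `url` should be sent. No timers, no shared mutation.
fn process(recents: Recents, url: &str, now_ms: u64) -> (Recents, bool) {
    // Drop timed-out entries first, so no per-URL expiry check is needed.
    let mut fresh: Recents = recents
        .into_iter()
        .filter(|(_, sent_at)| now_ms.saturating_sub(*sent_at) <= RECENTS_EXPIRY_MS)
        .collect();

    if fresh.contains_key(url) {
        // Seen recently: suppress the echo and leave the state unchanged.
        (fresh, false)
    } else {
        // Not seen (or expired): record it and tell the caller to send.
        fresh.insert(url.to_owned(), now_ms);
        (fresh, true)
    }
}
```

Each event handler would call process with the state returned by the previous call, so the only mutable thing left is the single binding that holds the current state.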
Even if you can't eliminate mutable state from your code entirely, you should think carefully about the scope and lifetime of that state.
For example might you want multiple independent browsers? If so you should think about how the recents set can be encapsulated to just belong to a single browser, so that you don't get collisions. Even if you don't need multiple ones for your actual application, this might help testability.
There are various ways you might keep the state private to a specific browser, depending in part on what features the language has available. For example in a language with objects a natural way would be to make it a private member of a browser object.

Groovy DSL with embedded groovy scripts

I am writing a DSL for expressing flow (original I know) in groovy. I would like to provide the user the ability to write functions that are stored and evaluated at certain points in the flow. Something like:
states {
    "checkedState" {
        onEnter { state ->
            //do some groovy things with state object
        }
    }
}
Now, I am pretty sure I could surround the closure in quotes and store that. But I would like to keep syntax highlighting and content assist if possible when editing these DSLs. I realize that the closure COULD reference artifacts from the surrounding flow definition which would no longer be valid when executing the closure in a different context, and I am fine with this. In reality I would like to use the closure syntax for a non-closure function definition.
tl;dr; I need to get the closure's code while evaluating the DSL so that it can be stored in the database and executed by a script host later.
I don't think there is a way to get a closure's source code, as this information is discarded during compilation. Perhaps you could try writing an AST transformation that would make the closure's syntax tree available at runtime.
If all you care about is storing the closure in the database, and you don't need later access to the source code, you can try serializing it and storing the serialized form.
Closure implements Serializable, and after nulling its owner, thisObject and delegate attributes I was able to serialize it, but I'm getting ClassNotFoundException on deserialization.
def myClosure = {a, b -> a + b}
Closure.metaClass.setAttribute(myClosure, "owner", null)
Closure.metaClass.setAttribute(myClosure, "thisObject", null)
myClosure.delegate = null
def byteOS = new ByteArrayOutputStream()
new ObjectOutputStream(byteOS).writeObject(myClosure)
def serializedClosure = byteOS.toByteArray()
def input = new ObjectInputStream(new ByteArrayInputStream(serializedClosure))
def deserializedClosure = input.readObject() // throws CNFE
After some searching, I found Groovy Remote Control, a library created specifically to enable serializing closures and executing them later, possibly on a remote machine. Give it a try, maybe that's what you need.
