Use Smart Sensors and still get context variable - airflow

I am on Airflow 2.1.4 and I am trying to modify a custom sensor to act as a Smart Sensor.
Among other things, to allow a custom sensor to work as a Smart Sensor you need to give it a poke_context_fields class variable. This isn't very well documented, but I think it's just a list of the arguments to __init__ that you want to also be passed to self.poke() when that is called by the Smart Sensor DAG/Shard (though I could be wrong).
So I have it like:
poke_context_fields = ['myarg1', 'myarg2']
I have tested this out, but it seems there is a problem: When the Smart Sensor DAG calls self.poke(), it forwards along those arguments as expected, but it does NOT give me the regular context variable that my poke method expects. Unfortunately my code won't work without access to that variable, because it needs certain properties like context['ds'], context['task_instance'], etc, which would only be available at execution time and not when Python parses the class variable.
I have read the following (https://github.com/apache/airflow/issues/11893) but I don't 100% follow. Is there a workaround for this or should I just conclude that I can't use Smart Sensors and should wait to use Deferrable Operators which was released in 2.2.0?

Smart Sensors is/was an experimental feature. Eventually it was decided to deprecate it (see PR) starting from version 2.3.0 in favor of Deferrable Operators.
To quote the docs:
Deferrable Operators essentially supersede Smart Sensors, and should
be preferred for almost all situations.

Related

No Global Contract available for procedure / function

I've got a procedure within a SPARK module that calls the standard Ada-Text_IO.Put_Line.
During proving I get the following warning warning: no Global contract available for "Put_Line".
I do already know how to add the respective data dependency contract to procedures and functions written by myself but how do I add them to a procedures / functions written by others where I can't edit the source files?
I looked through sections 5.2 and 7.4 of the Adacore SPARK 2014 user's guide but didn't found an example with a solution to my problem.
This means that the analyzer cannot "see" whether global variables might be affected when this function is called. It therefore assumes this call is not modifying anything (otherwise all other proofs could be refuted immediately). This is likely a valid assumption for your specific example, but it might not be valid on an embedded system, where a custom implementation of Put_Line might do anything.
There are two ways to convey the missing information:
verifier can examine the source code of the function. Then it can try to generate global contracts itself.
global contracts are specified explicitly, see RM 6.1.4 (http://docs.adacore.com/spark2014-docs/html/lrm/subprograms.html#global-aspects)
In this case, the procedure you are calling is part of the run-time system (RTS), and therefore the source is not visible, and you probably cannot/should not change it.
What to do in practice?
Suppressing warnings is almost never a good idea, especially not when you are working on something safety-critical. Usually the code has to be changed until the warning goes away, or some justification process has to start.
If you are serious about the analysis results, I recommend to not use such subprograms. If you really need output there, either write your own procedure that replaces the RTS subprogram, or ensure that the subprogram really has no side effects. This is further backed up by what Frédéric has linked: Even if the callee has no side effects, you don't know whether it raises an exception for specific inputs (e.g., very long strings).
If you are not so serious about the results, then you can consider this specific one as a warning that you could live with.
Wrapper packages for use in development of SPARK applications may be found here:
https://github.com/joakim-strandberg/aida_2012
I think you just can't add Spark contracts on code you don't own, especially code from the Ada standard.
About Text_Io, I found something that may be valuable to you in the reference manual.
EDIT
Another solution compared to what Martin said, according to "Building high integrity applications with Spark" book, is to create a wrapper package.
As Spark requires you to deal with Spark packages but allows you to depend on a Spark spec with an Ada body, the solution is to build a Spark package wrapping your Ada.Text_io calls.
It might be tedious as you will have to wrap possible exceptions, possibly define specific types and so on but this way, you'll be able to discharge VCs on your full Spark package.

Why does zumero_sync need to be called multiple times?

According to the documentation for zumero_sync:
If a large amount of information needs to be pulled from the server,
this function may need to be called more than once.
In my Android app that uses Zumero that's no problem; I just keep calling zumero_sync until the return value doesn't start with "0;".
However, now I'm trying to write an admin script that also syncs with my server dbfiles. I'd like to use the sqlite3 shell, and have the script pass the SQL to execute via command line arguments. I need to call zumero_sync in a loop (which SQLite doesn't support) to make sure the db is fully synced. If I had to, I could invoke sqlite3 in a loop (reading its output, looking for "0;"), or even write a C++ app to call the SQLite/Zumero functions natively. But it certainly would be easier if a single zumero_sync was enough.
I guess my real question is: could zumero_sync be changed so it completes the sync before returning? If there are cases where the existing behavior is more useful, maybe there could be a parameter for specifying which mode to use?
I see two basic questions here:
(1) Why does zumero_sync() work the way it does?
(2) Can it work differently?
I'll answer (2) first, since it's easier: Yes, it could work differently. Rather, we could (and probably will, soon, you brought this up) implement an additional function, named something like zumero_sync_complete(), which performs [the guts of] zumero_sync() in a loop and returns after the sync is complete.
We didn't implement zumero_sync_complete() because it doesn't add much value. It's a simple loop, so you can darn well write it yourself. :-)
Er, except in scripting environments which don't support loops. Like the sqlite3 shell.
Answer to (1):
The Zumero sync protocol is designed to give the server the flexibility to return partial results if it wants to do so. And for the sake of reducing load on the server (and increasing its scalability) it often does want to do exactly that.
Given that, one reason to expose this to the client is to increase the client's flexibility as well. As long we're making multiple roundtrips, we might as well give the client an opportunity to do something (like, maybe, update a progress bar) in between them.
Another thing a client might want to do in between loop iterations is handle an error.
Or, in the case of a multithreaded client, it might want to deal with changes that happened on the client while the sync is going on.
Which raises the question of how locking should be managed? Do we hold the sqlite write lock during the entire loop? Or only when absolutely necessary?
Bottom line: A robust app would probably want to implement the loop itself so that it can make its own decisions and retain full control over things.
But, as you observe, the sqlite3 shell doesn't have loops. And it's not an app. And it doesn't have threads. Or progress bars. So it's a use case where a simpler-and-less-powerful form of zumero_sync() would make sense.

Replacing WaitForMultipleObjects in Qt

I am not familar with WINAPI, and I am looking for a way to replace WaitForMultipleObjects used in one example I'm porting to Qt by anything using Qt only. Is it possible?
EDIT: (Providing more information as requested in comments)
A 3rd party API provides an array of events:
HANDLE m_hEv[MAX_EV];
In an endles-loop of a thread, the program waits for the events like this:
WaitForMultipleObjects(m_EvMax, m_hEv, FALSE ,INFINITE )
The HANDLE type seems to be void*.
So I wonder, if any Qt class could observe m_hEv for changes and unlock thread execution.
There is no simple way of porting WaitForMultipleObjects outside WinAPI. WinAPI has an "advantage" of that all lockable resources (sockets, files, processes) provide the same generic non-typesafe HANDLE, which is your void*. Unlike other platforms which have different ways of locking and signalling per the type of resource, the event handling in WinAPI is largely independent of the resources. Then a generic function like WaitForMultipleObjects can exist, which doesn't need to care who produced the HANDLEs. So you'll have to understand what the code is trying to do and mimic it differently per scenario.
The biggest difference is in WaitForMultipleObjects third parameter, which is FALSE in your case. Which means that the it will exit waiting as soon as any single event of the waiting array will happen. That is the easier scenario and can be replaced with a QWaitCondition.
Instead of m_hEv, you will pass a QWaitCondition* into the code which signals the event (most probably via WinAPI SetEvent(m_hEv[x]))
Instead of WaitForMultipleObjects, do QWaitCondition::wait().
Instead of SetEvent(), do QWaitCondition::wakeOne().
Would the third parameter be TRUE, then the WinAPI code waits until ALL m_hEv events are signalled. The established name for such functionality is a synchronization barrier and it can be simulated with QEventCondition too, but does not come out of the Qt box. I never needed to do any myself, but SO has some ideas how to do it:
Qt synchronization barrier?
WaitForMultipleObjects is a kind of generic function that works with many things: threads, processes, mutexes, etc. Qt is an OOP library where every class exposes the operations it supports. So the equivalent operation in Qt depends on what class you're using. For example, with threads, use QThread::wait. With mutexes, use QMutex::lock.

what are exactly MPI, MPICH, and OPENMPI? what does "implementation" mean in this context?

My question might seem silly to those who have been in the field for long time, but I appreciate your patience in elaborating it for me.
When they say MPICH is an "implementation" of MPI, what does it mean?
Is the following analogy true(?):
if we think of MPI as a set of standards for a FORTRAN compiler, then MPICH, and OPENMPI are different versions of FORTRAN compilers, like Intel.Fortran, Compaq.Fortran, GNU.Fortran, and so on.
MPI is a standard: it outlines a particular model for message passing in a distributed system. However, it only gives a series of requirements: it does not actually include any code, nor does it specify how exactly these requirements need to be fulfilled. For example, take a look at this excerpt from the official MPI 2.2 spec (as of today):
A valid MPI implementation guarantees certain general properties of
point-to-point communication, which are described in this section.
Order Messages are non-overtaking: If a sender sends two messages in succession to the same destination, and both match the same
receive, then this operation cannot receive the second message if the
first one is still pending.
It then goes on to explain the rationale behind this requirement and provide an example, but says nothing more about the requirement itself.
An MPI implementation is a library that fulfills every requirement - like the one above - in the MPI specification. However, the standard contains absolutely no requirements as to what language constructs, OS calls, 3rd party libraries, etc can/can't/should be used. Occasionally, it will give advice to implementors, like this:
Advice to implementors. The implementation may keep a reference count
of active communications that use the datatype, in order to decide
when to free it. Also, one may implement constructors of derived
datatypes so that they keep pointers to their datatype arguments,
rather then copying them. In this case, one needs to keep track of
active datatype definition references in order to know when a datatype
object can be freed. (End of advice to implementors.)
however, these are still vague, very language-agnostic, and only recommendations: an implementation can ignore every single one of these advices, and still conform to the standard.
So yes, in essence it's similar to various implementations of a compiler. If a program takes valid source code for a language, and produces binary code that does everything that the language specification says it should do given the original source code, it's a conforming compiler for that language. Similarly, if you can use a library to pass messages in a way that doesn't break any rules of the MPI spec, then that's a valid MPI implementation.

How to use non-blocking or asynchronous IO with Boost Spirit?

Does Spirit provide any capabilities for working with non-blocking IO?
To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.
The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.
It seems like that in order to support non-blocking I/O while parsing, Spirit would need some ability to partially parse the data and be able to pause and save its parse state when no more data is available. Additionally, it would need to be able to resume parsing from the saved parse state when data does become available. Or maybe I'm making this too complicated?
TODO Will post a example for a simple single-threaded 'event-based' parsing model. This is largely trivial but might just be what you need.
For anything less trivial, please heed to following considerations/hints/tips:
How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?
That doesn't usually work well due to backtracking. The caveats could be worked around by careful and judicious use of qi::hold, qi::locals and putting semantic actions with side-effects only at stations that will never be backtracked. In other words:
this is bound to be very errorprone
this naturally applies to a limited set of grammars only (those grammars with rich contextual information will not lend themselves well for this treatment).
Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.
Now, if you still want to do this:
You should be able to get spirit library thread safe / reentrant by defining BOOST_SPIRIT_THREADSAFE and linking to libboost_thread. Note this makes the gobals used by Spirit threadsafe (at the cost of fine grained locking) but not your parsers: you can't share your own parsers/rules/sub grammars/expressions across threads. In fact, you can only share you own (Phoenix/Fusion) functors iff they are threadsafe, and any other extensions defined outside the core Spirit library should be audited for thread-safety.
If you manage the above, I think by far the best approach would seem to
use boost::spirit::istream_iterator (or, for binary/raw character streams I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that depending on your grammar, quite a bit of memory could be used for buffering and the performance is suboptimal
run the parser on it's own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coprocedures')
use coarse-grained semantic actions like shown above to pass messages to another logical thread that does the actual processing.
Some more loose pointers:
you can easily 'fuse' some functions to handle lazy evaluation of your semantic action handlers using BOOST_FUSION_ADAPT_FUNCTION and friends; This reduces the amount of cruft you have to write to get simple things working like normal C++ overload resolution in semantic actions - especially when you're not using C++0X and BOOST_RESULT_OF_USE_DECLTYPE
Because you will want to avoid semantic actions with side-effects, you should probably look at Inherited Attributes and qi::locals<> to coordinate state across rules in 'pure functional fashion'.

Resources