I created a simple process definition in jBPM5 with just a single script task. I want to include a global variable, say count, that is static in the sense that the same value is shared across the various process instances. However, it is not a constant: each instance can update the value, say increment it in the first task of the process. From the script task I want to do this modification (increment) and print it to stdout. How do I do this?
// Attempted script-task body: print the shared counter, then increment it.
System.out.println(count);
kcontext.setVariable("count", count + 1);
I found the answer myself with some research: we need to use kcontext.getKnowledgeRuntime().setVariable() and .getVariable() for setting and getting a 'static' variable that is shared across process instances. However, it leads to another question in my mind: what would happen if the script task that uses setVariable is called simultaneously by multiple instances? Thanks @KrisV! Without your help I would not have been able to come to this. :)
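For reference, a minimal sketch of what the script-task body could look like with this approach; the method names follow the answer above, and the null check and initial value are assumptions:

// Read the shared variable from the knowledge runtime (shared across
// process instances), defaulting to 0 if it has not been set yet.
Integer count = (Integer) kcontext.getKnowledgeRuntime().getVariable("count");
if (count == null) {
    count = 0;
}
count = count + 1;
// Write the incremented value back so other instances see it, then print it.
kcontext.getKnowledgeRuntime().setVariable("count", count);
System.out.println(count);

Note that, as mentioned, two script tasks incrementing the variable simultaneously could race, so some external synchronization would be needed if exact counts matter.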
In the docs, they say that you should avoid passing data between tasks:
This is a subtle but very important point: in general, if two operators need to share information, like a filename or small amount of data, you should consider combining them into a single operator. If it absolutely can’t be avoided, Airflow does have a feature for operator cross-communication called XCom that is described in the section XComs.
I fundamentally don't understand what they mean. If there's no data to pass between two tasks, why are they part of the same DAG?
I've got half a dozen different tasks that take turns editing one file in place, and each sends an XML report to a final task that compiles a report of what was done. Airflow wants me to put all of that in one Operator? Then what am I gaining by doing it in Airflow? Or how can I restructure it in an Airflowy way?
Fundamentally, each instance of an operator in a DAG is mapped to a different task.
This is a subtle but very important point: in general, if two operators need to share information, like a filename or small amount of data, you should consider combining them into a single operator.
The above sentence means that if any information needs to be shared between two different tasks, it is best to combine them into one task. On the other hand, if you must use two different tasks and need to pass some information from one to the other, you can do it using Airflow's XCom, which is similar to a key-value store.
In a data engineering use case, checking a file's schema before processing is important. Imagine two tasks as follows:
Files_Exist_Check: the purpose of this task is to check whether particular files exist in a directory or not before continuing.
Check_Files_Schema: the purpose of this task is to check whether the file schema matches the expected schema or not.
It would only make sense to start your processing if the Files_Exist_Check task succeeds, i.e. you have some files to process.
In this case, you can "push" a key to XCom, like "file_exists", with the value being the count of files present in that particular directory, in the Files_Exist_Check task.
Now you "pull" this value using the same key in the Check_Files_Schema task; if it returns 0, there are no files for you to process, so you can raise an exception and fail the task, or handle it gracefully.
Hence, sharing information across tasks using XCom does come in handy in this case; a sketch of the two tasks follows below.
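A minimal sketch of those two tasks, assuming Airflow 2.x PythonOperators; the DAG id, directory path, and schedule are made up for illustration:

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

DATA_DIR = "/data/incoming"  # hypothetical directory to check

def files_exist_check(ti):
    # "Push" the number of files found so the downstream task can inspect it.
    file_count = len(os.listdir(DATA_DIR))
    ti.xcom_push(key="file_exists", value=file_count)

def check_files_schema(ti):
    # "Pull" the count pushed by the upstream task; fail fast if there is nothing to do.
    file_count = ti.xcom_pull(task_ids="Files_Exist_Check", key="file_exists")
    if not file_count:
        raise ValueError("No files to process")
    # ... actual schema validation would go here ...

with DAG("file_pipeline", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    exist = PythonOperator(task_id="Files_Exist_Check", python_callable=files_exist_check)
    schema = PythonOperator(task_id="Check_Files_Schema", python_callable=check_files_schema)
    exist >> schema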
You can refer to the following links for more info:
https://www.astronomer.io/guides/airflow-datastores/
Airflow - How to pass xcom variable into Python function
What you have to do to avoid having everything in one operator is save the data somewhere. I don't quite understand your flow, but if, for instance, you want to extract data from an API and insert it into a database, you would need to have:
a PythonOperator (or BashOperator, whatever) that takes the data from the API and saves it to S3/local file/Google Drive/Azure Storage...
a SQL-related operator that takes the data from the storage and inserts it into the database
Anyway, if you know which files you are going to edit, you may also use Jinja templates or read info from a text file and build a loop or something in the DAG. I could help you more if you clarify your actual flow a bit.
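For instance, a minimal sketch of that two-step extract-then-load pattern, using a local staging file; the URL, file path, and DAG id are made up, and the database insert is left as a placeholder:

import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

STAGING_PATH = "/tmp/api_dump.json"  # hypothetical staging file (assumes a single worker)

def extract_from_api():
    # Take the data from the API and save it to intermediate storage.
    data = requests.get("https://example.com/api/items").json()
    with open(STAGING_PATH, "w") as f:
        json.dump(data, f)

def load_into_db():
    # Take the data from the storage; the real insert (e.g. via a database hook)
    # would replace the print below.
    with open(STAGING_PATH) as f:
        data = json.load(f)
    print("loaded %d records" % len(data))

with DAG("extract_and_load", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_api)
    load = PythonOperator(task_id="load", python_callable=load_into_db)
    extract >> load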
I've decided that, as mentioned by @Anand Vidvat, they are making a distinction between Operators and Tasks here. What I think is that they don't want you to write two Operators that inherently need to be paired together and pass data to each other. On the other hand, it's fine to have one task use data from another; you just have to provide filenames etc. in the DAG definition.
For example, many of the builtin Operators have constructor parameters for files, like the S3FileTransformOperator. Confusing documentation, but oh well!
Is this a good way to fire another command from within a command handler in an Axon Framework application?
For example, I want to provide a ROLLBACK function, whose underlying process is to read the historical state of the aggregate at the given sequence number and then update the aggregate according to that historical state. Imagine it as follows:
@CommandHandler
private void on(RollbackCommand command, MetaData metaData) {
    // Query the historical state of the aggregate at the given sequence number...
    ContractAggregate ca = queryGateway.query(new QueryContractWithGivenSequenceNumber(...));
    // ...then dispatch a command to update the aggregate to that state.
    commandGateway.sendAndWait(new UpdateContractCommand(ca));
}
Will it work fine?
On "Dispatching commands from command handlers"
Command handlers can roughly exist in two areas in an Axon application:
Within the Aggregate
On a Command Handling Component
In both options, it would be possible to dispatch a command from within the command handler, but I would only advise such an operation from option 2.
The reasoning behind this is that when Axon handles a command from within an Aggregate, that exact Aggregate instance will be locked.
This is done to ensure no concurrent operations are performed on a given aggregate instance.
Knowing this, we can deduce that the subsequent command could also end up in an aggregate instance, which will be locked as well. Additionally, if the command being dispatched from within an aggregate command handler is targeted at the same aggregate instance, you'll effectively be blocking the system. Axon will eventually throw a LockAcquisitionFailedException, but nonetheless you'd have created something undesirable.
Thus, I'd only dispatch commands from within @CommandHandler annotated methods which reside on a Command Handling Component, along the lines of the sketch below.
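As an illustration, a minimal sketch of option 2; the component and handler names here are hypothetical, reusing the commands from the question:

import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.commandhandling.gateway.CommandGateway;

public class RollbackCommandHandler {

    private final CommandGateway commandGateway;

    public RollbackCommandHandler(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @CommandHandler
    public void handle(RollbackCommand command) {
        // Safe to dispatch from here: this component holds no aggregate lock,
        // so the follow-up command can acquire the aggregate's lock normally.
        commandGateway.sendAndWait(new UpdateContractCommand(/* ... */));
    }
}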
On "your use case"
I have some questions on your use case, as the blanks make me slightly concerned whether this is the best approach. Thus, let me ask you some follow up questions.
If I understand your question correctly, you want to introduce a command handler which queries the aggregate to be able to roll back its state with a single command?
Would you have a command which adjusts the entire state of the aggregate?
Or specific portions of the aggregate instance?
And, I assume the query is targeted towards a dedicated Query Model representing the aggregate, thus not Axon's idea of the #Aggregate annotated class, right?
Does Dart handle the case in which two different calls of an asynchronous function try to add two (or more) objects to a List at the same time? If it does not, is there a way for me to handle this?
I do not need those two new objects to be inserted in a particular order, because I take care of that later on; I only wondered what happens in that unlikely but still possible case.
If you're wondering if there's any kind of locking necessary to prevent race conditions in the List data structure itself, no. As pskink noted in a comment, each Dart isolate runs in its own thread, and as the "isolate" name implies, memory is not shared. Two operations therefore cannot both be actively updating a List at the same time. Once all asynchronous operations complete, your List will contain all of the added items but not with any guaranteed ordering.
If you need to prevent asynchronous operations from being interleaved, you could use package:pool, as sketched below.
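For example, a minimal sketch using package:pool, where a Pool of size 1 acts as a mutex so each read-modify-write runs without interleaving (the names here are made up):

import 'package:pool/pool.dart';

final items = <String>[];
final pool = Pool(1); // one resource: at most one operation runs at a time

Future<void> addItem(String item) => pool.withResource(() async {
      // Simulate some async work before the mutation.
      await Future<void>.delayed(const Duration(milliseconds: 10));
      items.add(item);
    });

Future<void> main() async {
  await Future.wait([addItem('a'), addItem('b'), addItem('c')]);
  print(items); // all three items are present
}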
The question
Is it possible (and if so, how) to make it so when an object's field x (that contains a timestamp) is created/updated a specific trigger will be called at the time specified in x (probably calling a serverless function)?
My Specific context
In my specific instance the object can be seen as a task. I want to make it so when the task is created a serverless function tries to complete the task and if it doesn't succeed it updates the record with the partial results and specifies in a field x when the next attempt should happen.
The attempts should not happen at a fixed interval. For example, a task may require 10 successive attempts at approximately every 30 seconds, but then it may need to wait 8 hours.
There currently is no way to (re)trigger a Cloud Function on a node after a certain timespan.
The closest you can get is by regularly running a cron job over the list of tasks; see the sketch below. For more on that, see this sample in the functions-samples repo, this blog post by Abe, and this video where Jen explains them.
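For illustration, a minimal sketch of that cron approach using scheduled Cloud Functions; it assumes Firestore, and the collection and field names ("tasks", "nextAttemptAt") are made up (the same idea applies to the Realtime Database):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.retryDueTasks = functions.pubsub
  .schedule('every 1 minutes')
  .onRun(async () => {
    const now = admin.firestore.Timestamp.now();
    // Query the list of tasks whose next attempt time has passed.
    const due = await admin.firestore()
      .collection('tasks')
      .where('nextAttemptAt', '<=', now)
      .get();
    await Promise.all(due.docs.map((doc) => attemptTask(doc)));
  });

async function attemptTask(doc) {
  // Try to complete the task; on partial success, store the partial results
  // and write the time of the next attempt back into nextAttemptAt.
}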
I admit I never like using this cron-job approach, since you have to query the list to find the items to process. A while ago, I wrote a more efficient solution that runs a priority queue in a node process. My code was a bit messy, so I'm not quite ready to share it, but it wasn't a lot (<100 lines). So if the cron-trigger approach doesn't work for you, I recommend investigating that direction.
I have a Google Apps Script that takes a spreadsheet and loops over the rows, getting the values column by column and generating an RSS feed.
I have some performance issues, and I think they're due to the for loop and querying that many values.
Any insights on how to optimize this? Thanks!
http://pastebin.com/EPN5EPAx
Calling getCell and setValue over and over again is probably what is slowing it down so much. Each time you call setValue() it makes a new IO call, which is slow. It's best to load and save your data all in one fell swoop.
For example, load all the values from the range at the start with:
var values = range.getValues();
Then iterate through the resulting two-dimensional array (instead of getCell(i, 2) use values[i - 1][1]).
When you need to change a value use:
values[i][j] = newValue;
Then when you're finished call:
range.setValues(values);
This way you minimize the IO calls to two: one to load the data at the start and one to save the changes at the end.
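Putting it together, a minimal sketch of that pattern; the sheet name, column positions, and helper function are made up for illustration:

function buildFeed() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Feed');
  var range = sheet.getDataRange();
  var values = range.getValues(); // one IO call to load everything

  for (var i = 0; i < values.length; i++) {
    // Read and modify cells in memory instead of getCell()/setValue().
    var title = values[i][0];
    values[i][1] = buildItemXml(title); // hypothetical helper building the RSS item
  }

  range.setValues(values); // one IO call to write everything back
}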