I want to issue something, e.g. a new option. Inside the flow where I'm issuing this new option, I need to get info from two separate oracles that need to provide data for the output state.
How should I do this? Should I have one output and three commands (a command with data from Oracle 1, a command with data from Oracle 2, and then the issue command), or can this be done with one command?
It's entirely up to you: your command can contain whatever data you want, so in theory you could do the whole thing with one command.
Having said that, I would probably split it out into at least two commands for clarity and privacy. The privacy benefit is that you can build a filtered transaction for each oracle to sign that contains only that oracle's command.
If you don't mind the two oracles seeing the data sent to each for signing, you could encapsulate the data in one command, e.g.:
class OracleCommand(val spotPrice: SpotPrice, val volatility: Volatility) : CommandData
where one oracle attests to spotPrice and the other to volatility.
However, you would find it hard to determine which part of the data each oracle attested to, since they will both sign over the entire filtered transaction.
Unless you know that each oracle is designed to pick out only its own data, you're probably better off going with three separate commands.
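A rough way to picture the privacy argument: the sketch below models a transaction as a list of components and builds a per-oracle filtered view in which everything except that oracle's command is replaced by its hash. It is plain Python, not Corda's API; the component names and values are hypothetical, and real Corda filtered transactions use a Merkle tree over the components, which this sketch simplifies to a single hash of the concatenated leaf hashes.

```python
import hashlib
import json

# Hypothetical transaction components: one output state and three commands.
tx = [
    {"type": "OptionState", "strike": 100},
    {"type": "SpotPriceCommand", "spotPrice": 101.5},  # attested by oracle 1
    {"type": "VolatilityCommand", "volatility": 0.2},  # attested by oracle 2
    {"type": "IssueCommand"},
]


def leaf_hash(component):
    """Deterministically hash one transaction component."""
    return hashlib.sha256(json.dumps(component, sort_keys=True).encode()).hexdigest()


def root_hash(leaf_hashes):
    """Simplified root: hash of the concatenated leaf hashes (a real Merkle tree hashes pairwise)."""
    return hashlib.sha256("".join(leaf_hashes).encode()).hexdigest()


def filter_for(components, wanted_type):
    """Reveal only components of wanted_type; replace every other component with its hash."""
    return [c if c["type"] == wanted_type else {"hidden": leaf_hash(c)} for c in components]


def root_of_filtered(filtered):
    """Recompute the root from a filtered view, as an oracle would before signing it."""
    return root_hash([c["hidden"] if "hidden" in c else leaf_hash(c) for c in filtered])


full_root = root_hash([leaf_hash(c) for c in tx])
oracle1_view = filter_for(tx, "SpotPriceCommand")
oracle2_view = filter_for(tx, "VolatilityCommand")
```

Each oracle can recompute full_root from its own view and sign it, yet oracle1_view never contains the volatility value. With a single OracleCommand carrying both fields, both views would reveal both values.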
I'm working with Progress 11.6 AppBuilder and Procedure Editor (and Data Dictionary).
We regularly make modifications to the customer's database. There are two types of modifications:
Structure modifications: these are done using the interactive GUI of the Data Dictionary.
Data modifications: these are done using the Procedure Editor.
A typical data modification in the Procedure Editor looks like this:
FOR EACH Table1 WHERE Table1.Field1 = <value>:
CREATE Table2.
Table2.Field1 = <value>.
Table2.Field2 = <some-other-value>.
END.
This completely contradicts one of the basics of software delivery quality, repeatability: there is no way to return to the previous situation!
Therefore I'm looking for ways to do this in an automatable, repeatable way, hence my questions:
What can we use instead of the Data Dictionary's interactive GUI (which has no undo feature) to perform and undo database structure modifications?
What can we do to undo database data modifications? (Is there something like an Oracle redo log or an Oracle archive log in Progress?)
In case you say "What are you talking about? You can do 'Undo transaction' in the Data Dictionary.", I mean the following:
I perform a transaction using the Data Dictionary, I leave the Data Dictionary, and the next day the customer complains. When I open the Data Dictionary at that moment, the "Undo transaction" feature is disabled.
At a high level you should be creating "df files" (DDL scripts) and applying those to the customer database rather than manually making changes. There are many ways to create those files and you can automate the entire process with the appropriate tooling.
One of the most common ways to create a df file is to create whatever new schema you need in your development database and then use the "create an incremental df" facility in the data dictionary tool. This tool compares the development database schema to the target schema and builds a "df file" (DDL script) of the differences. You could connect directly to the target db for this process or you could have an empty skeleton db that you use for this.
How to create an incremental df file
(If you then reverse the comparison you can also create a reversing df file to undo the changes.)
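The comparison idea, and why reversing it gives you an undo script, can be sketched in a few lines. This is only a model in Python: the schemas are hypothetical dicts mapping table names to field names, and the output strings are illustrative, not real df syntax (the Data Dictionary generates that for you).

```python
# Hypothetical schemas: table name -> set of field names.
dev = {"Customer": {"CustNum", "Name", "Email"}, "Invoice": {"InvNum", "CustNum"}}
target = {"Customer": {"CustNum", "Name"}}


def schema_diff(source, target):
    """List the changes needed to make `target` look like `source`."""
    changes = []
    # Tables only in source: add the table and all its fields.
    for table in sorted(source.keys() - target.keys()):
        changes.append(f"ADD TABLE {table}")
        changes += [f"ADD FIELD {table}.{f}" for f in sorted(source[table])]
    # Tables in both: add missing fields, drop extra ones.
    for table in sorted(source.keys() & target.keys()):
        changes += [f"ADD FIELD {table}.{f}" for f in sorted(source[table] - target[table])]
        changes += [f"DROP FIELD {table}.{f}" for f in sorted(target[table] - source[table])]
    # Tables only in target: drop them.
    for table in sorted(target.keys() - source.keys()):
        changes.append(f"DROP TABLE {table}")
    return changes
```

Here schema_diff(dev, target) yields the additions (the Invoice table and Customer.Email), while swapping the arguments, schema_diff(target, dev), yields the matching drops.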
Most df files consist of additions: new tables, new fields, new indexes. These can all be added online, and the whole process can be scripted. And, of course, the individual df files and all of the supporting scripts can (and should) be stored in a repository (like git or whatever).
As for the data change scripts... there's no reason that those programs cannot be written as actual programs and saved in a repository. You can enclose the whole update in a transaction and UNDO it if that is appropriate. For what it is worth, I personally do not think that is a very good idea. Especially when large amounts of data are involved you really don't want to be creating monstrous multi-gigabyte undo logs. You're better off with a second "reversing transaction" script that will roll things back piecemeal. A side benefit is that you can still use that if you decide to back out the change a day or three afterwards.
The really gory details are going to depend on your development process, the customer's change management process, and the tooling available. It kind of sounds like there is not much process or tooling at either end of this relationship, so you probably have a lot of adventures ahead of you!
In the Airflow docs, they say that you should avoid passing data between tasks:
This is a subtle but very important point: in general, if two operators need to share information, like a filename or small amount of data, you should consider combining them into a single operator. If it absolutely can’t be avoided, Airflow does have a feature for operator cross-communication called XCom that is described in the section XComs.
I fundamentally don't understand what they mean. If there's no data to pass between two tasks, why are they part of the same DAG?
I've got half a dozen different tasks that take turns editing one file in place, and each sends an XML report to a final task that compiles a report of what was done. Airflow wants me to put all of that in one Operator? Then what am I gaining by doing it in Airflow? Or how can I restructure it in an Airflowy way?
Fundamentally, each instance of an operator in a DAG is mapped to a different task.
This is a subtle but very important point: in general if two operators need to share information, like a filename or small amount of data, you should consider combining them into a single operator
The sentence above means that if two different tasks need to share information, it is best to combine them into one task. On the other hand, if you must use two different tasks and need to pass some information from one to the other, you can do it using Airflow's XCom, which is similar to a key-value store.
In a data engineering use case, checking the file schema before processing is important. Imagine two tasks as follows:
Files_Exist_Check: checks whether particular files exist in a directory before continuing.
Check_Files_Schema: checks whether the file schema matches the expected schema.
It only makes sense to start processing if the Files_Exist_Check task succeeds, i.e. there are some files to process.
In this case, in the Files_Exist_Check task you can "push" a key such as "file_exists" to XCom, with the value being the count of files present in that directory.
Then, in the Check_Files_Schema task, you "pull" this value using the same key. If it returns 0, there are no files to process, so you can raise an exception and fail the task, or handle it gracefully.
Hence sharing information across tasks using XCom comes in handy in this case.
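A minimal sketch of the two callables, assuming Airflow 2's PythonOperator (which passes context values such as ti to callables that declare them as parameters). FakeTaskInstance is a hypothetical stand-in so the logic can run without Airflow; in a real DAG you would drop it and let Airflow supply ti. The directory path in the wiring comment is made up.

```python
import os


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance, backed by a shared dict."""

    def __init__(self, store, task_id):
        self.store, self.task_id = store, task_id

    def xcom_push(self, key, value):
        self.store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key):
        return self.store.get((task_ids, key))


def files_exist_check(ti, directory):
    """Files_Exist_Check: push the number of files in `directory` under the key 'file_exists'."""
    count = sum(1 for n in os.listdir(directory) if os.path.isfile(os.path.join(directory, n)))
    ti.xcom_push(key="file_exists", value=count)
    return count


def check_files_schema(ti):
    """Check_Files_Schema: pull the count and fail the task if there is nothing to process."""
    count = ti.xcom_pull(task_ids="Files_Exist_Check", key="file_exists")
    if not count:
        raise ValueError("No files to process")  # an exception fails the Airflow task
    # ... the actual schema validation would go here (not shown) ...
    return count


# Wiring in a real DAG (not executed here):
# PythonOperator(task_id="Files_Exist_Check", python_callable=files_exist_check,
#                op_kwargs={"directory": "/data/incoming"})
# PythonOperator(task_id="Check_Files_Schema", python_callable=check_files_schema)
```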
You can refer to the following link for more info:
https://www.astronomer.io/guides/airflow-datastores/
Airflow - How to pass xcom variable into Python function
To avoid having everything in one operator, you have to save the data somewhere. I don't quite understand your flow, but if, for instance, you want to extract data from an API and insert it into a database, you would need:
A PythonOperator (or BashOperator, whatever) that takes the data from the API and saves it to S3/a local file/Google Drive/Azure Storage...
A SQL-related operator that takes the data from the storage and inserts it into the database
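A sketch of that two-step pattern in plain Python so it runs anywhere: a local JSON file stands in for S3/Google Drive/Azure Storage, a hardcoded list stands in for the API response, and sqlite3 stands in for the target database. The point is that only the file path would travel between tasks (for example as the PythonOperator's return value, which Airflow stores in XCom), not the data itself. The function names, the items table and extract.json are hypothetical.

```python
import json
import os
import sqlite3


def extract_to_storage(records, out_dir):
    """Task 1: save the extracted data to storage and return only its location."""
    path = os.path.join(out_dir, "extract.json")
    with open(path, "w") as f:
        json.dump(records, f)
    return path  # small enough to pass via XCom


def load_into_db(path, conn):
    """Task 2: read the data back from storage and insert it into the database."""
    with open(path) as f:
        records = json.load(f)
    with conn:  # commit the insert as one transaction
        conn.executemany("INSERT INTO items (id, name) VALUES (:id, :name)", records)
    return len(records)
```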
Anyway, if you know which files you are going to edit, you could also use Jinja templates, or read the info from a text file and loop over it in the DAG. I could help you more if you clarify your actual flow a little.
I've decided that, as mentioned by @Anand Vidvat, they are making a distinction between Operators and Tasks here. I think they don't want you to write two Operators that inherently need to be paired together and pass data to each other. On the other hand, it's fine to have one task use data from another; you just have to provide filenames etc. in the DAG definition.
For example, many of the builtin Operators have constructor parameters for files, like the S3FileTransformOperator. Confusing documentation, but oh well!
I want to know which tables are being read by a query.
for each Customer where CustomerID = 12345.
In the example above, this customer will eventually be found, but Progress must 'read' many tables before getting to customer 12345.
How do I know exactly which tables are read (by CustomerID) before getting to customer 12345?
*NOTE: I do not have access to modify the code being run for this selection. Ideally I would run a separate set of code that is executed at the same time as the customer query above to track the reads.
EDIT: More clearly - Can you track reads from a given program (.p) OR ProcessID and output either a RECID or the PrimaryKey to a file?
I understand the information is being read off the Disk and probably stored in a database buffer. So how would I get at the information in the database buffer?
You seem to be mixing up a few different things.
In a situation like your example, where you FIND a specific record in one (and only one) table, there is just a single record read. Progress will find that record by first scanning a relevant index. That might take 2 or 3 "logical reads" of the b-tree to get to the proper node. The record block and index blocks may or may not be read from disk; that depends on what has happened previously.
There are "Virtual System Tables" available that can tell you how many READ operations take place against a particular table or index. But they do not trace the specific ROWID or other identifying data. _TableStat and _IndexStat are aggregates for all users on the system, _UserTableStat and _UserIndexStat are specific to a particular user's activity. You do need to set the -tablerangesize and -indexrangesize parameters adequately to take advantage of these.
If you have enabled the table and index statistics then you can use a tool like ProTop - http://protop.wss.com to get insight into this activity. Or you can write your own code.
OpenEdge Auditing does not track reads. That would be prohibitively expensive.
It's probably not really a good idea but, in theory, you could write FIND triggers for the tables you are interested in. That doesn't require access to the application source but you would need a development license. It will probably kill performance to do this though - so unless this is a non-production test environment that you just want to fiddle with I wouldn't really do that.
You mention wanting to know how you got to that point. That sounds more like you might need to have a "4gl trace". One easy way to get the stack trace of a running process is to execute:
$DLC/bin/proGetStack PID (UNIX)
or
%DLC%\bin\proGetStack PID (Windows)
This command will generate a "protrace.pid" file containing a 4gl stack trace and other interesting information.
There are also more complicated ways to get that info like using PROMON and the "client statement cache" or setting various log entry types at session startup. But proGetStack is pretty convenient and requires no code or scripting changes.
Some great options from Tom above. And all of them may be relevant to you. The option he only skirts around is the logging options. I feel obliged to expand on this because I'm giving a talk on it in a couple of weeks!
If you are running a modern version of Progress, or even 10.2B08, you have client logging available to you. Start your session with these additional options:
-clientlog "\somefolder\somefile.txt"
-logentrytypes "QryInfo:3"
This will log the details of all the queries in your session to the file you specified above. If you navigate to the point in the system where you want to analyse your query, then empty the log file and save it, you can run the offending query and see all the detail you need.
The output tells you all sorts of useful info, including the number of reads on each table, compared with the number returned to the user. You also get the index selected.
Using Tom's advice and/or this will get you what you need.
I am looking for a solution to handle dynamic data (e.g. a list) for all nodes or specified nodes at the time of creating a contract in Corda. I don't think an Oracle is a good approach in my case, for the following reasons:
The data can be a list, for example of legal entity names; it does not come from the outside world, and it is not a single value;
The list depends on the particular field(s) selected, so a centralized place will perhaps be needed to maintain the data relationships;
I'd appreciate it if anyone can help with this. Thanks.
Kwan
This question is a little difficult to answer without further details on your use-case. However, on the surface, an Oracle doesn't sound like a bad solution:
The data provided by an oracle can be a list
The term "outside world" simply refers to any information not included in the transaction itself. This term should not be taken too literally.
Ultimately, you can think of an Oracle as a provider of "official" data. You request a command including the data from the oracle, include it in the transaction, and the oracle will sign over the transaction if and only if it agrees that the data in the command is true. As long as the Oracle is trusted by all parties involved, this allows data from outside the transaction to be included in the transaction in a reliable way.
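To make the "sign if and only if it agrees" rule concrete for a list-valued field, here is a toy model in Python. It is not Corda code: OFFICIAL_LISTS, the field keys, the entity names and the tx_id parameter are hypothetical, and an HMAC stands in for the oracle's real signature over the transaction.

```python
import hashlib
import hmac
import json

# Hypothetical "official" data held by the oracle: selected field value -> list of legal entity names.
OFFICIAL_LISTS = {
    "region:EU": ["Alpha Bank AG", "Beta Capital SA"],
    "region:US": ["Gamma Trust LLC"],
}
ORACLE_KEY = b"oracle-secret-key"  # stand-in for the oracle's private signing key


def query(field_value):
    """Step 1: a node asks the oracle for the data and builds a command containing it."""
    return {"field": field_value, "entities": list(OFFICIAL_LISTS[field_value])}


def sign(command, tx_id):
    """Step 2: the oracle signs only if the command's data matches its own records."""
    if OFFICIAL_LISTS.get(command["field"]) != command["entities"]:
        return None  # refuse to sign: the data in the command is not "true"
    payload = json.dumps({"tx": tx_id, "command": command}, sort_keys=True).encode()
    return hmac.new(ORACLE_KEY, payload, hashlib.sha256).hexdigest()
```

Because the oracle only checks the command against its own records, the data can be a list, and the "official" source could simply be a centrally maintained mapping rather than an external feed.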
I have multiple flat files (CSV), each with multiple records, and the files are received in random order. I have to combine their records using unique ID fields.
How can I combine them if there is no unique field common to all files, and I don't know which one will be received first?
Here are some example files:
In reality there are 16 files.
There are many more fields and records than in this example.
I would avoid trying to do this purely in XSLT/BizTalk orchestrations/C# code. These are fairly simple flat files. Load them into SQL, and create a view to join your data up.
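A small sketch of the "load into SQL, then join with a view" idea, using Python's sqlite3 as a stand-in for SQL Server and three hypothetical files (trades, trade_parties, parties) where no single ID appears in every file. Each file is loaded into its own staging table whenever it arrives, and the view chains the joins. The LEFT JOINs leave NULLs for parts that have not arrived yet, which is one possible way to approach the "how will BizTalk know it has enough data" question.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (TradeId TEXT, Amount TEXT);
CREATE TABLE trade_parties (TradeId TEXT, PartyId TEXT);
CREATE TABLE parties (PartyId TEXT, PartyName TEXT);
-- The view chains the joins, so no single ID has to appear in every file.
CREATE VIEW trade_view AS
SELECT t.TradeId, t.Amount, p.PartyId, p.PartyName
FROM trades t
LEFT JOIN trade_parties tp ON tp.TradeId = t.TradeId
LEFT JOIN parties p ON p.PartyId = tp.PartyId;
""")


def load_csv(table, text):
    """Load one CSV file (whenever it arrives) into its staging table; headers are trusted."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if rows:
        cols = list(rows[0].keys())
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({', '.join('?' * len(cols))})"
        with conn:
            conn.executemany(sql, [tuple(r[c] for c in cols) for r in rows])
    return len(rows)


def complete_rows():
    """Only fully correlated rows; incomplete ones still have NULLs from the LEFT JOINs."""
    return conn.execute("SELECT * FROM trade_view WHERE PartyName IS NOT NULL").fetchall()
```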
You can still use BizTalk to pickup/load the files. You can also still use BizTalk to execute the view or procedure that joins the data up and sends your final message.
There are a few questions that might help guide how this would work here:
When do you want to join the data together? What triggers that (a time of day, a certain number of messages received, a certain type of message, a particular record, etc)? How will BizTalk know when it's received enough/the right data to join?
What does a canonical version of this data look like? Does all of the data from all of these files truly get correlated into one entity (e.g. a "Trade" or a "Transfer" etc.)?
I'd probably start with defining my canonical entity, and then look towards the path of getting a "complete" picture of that canonical entity by using SQL for this kind of case.