I have two coordinators in Oozie which work on similar data files. I want to create a mutual exclusion between those coordinators such that:
If C1 runs then C2 should wait for C1 to complete
If C2 runs then C1 should wait for C2 to complete
C1 and C2 are Oozie coordinators.
Please let me know how we can do that in Oozie in terms of coordinators.
Is there a reason you split the things you want to do into two coordinators? If you do not want them to run in parallel, you can just as well combine them into a single coordinator/workflow and let the actions happen sequentially, as in the sketch below.
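For example, a single workflow that runs the two jobs one after the other might look like this. This is a minimal sketch: the sub-workflow app paths and action names are placeholders, not taken from your setup.

<workflow-app name="combined-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="job1"/>
    <action name="job1">
        <sub-workflow>
            <!-- placeholder path to the workflow currently driven by C1 -->
            <app-path>${wfPathJob1}</app-path>
        </sub-workflow>
        <ok to="job2"/>
        <error to="fail"/>
    </action>
    <action name="job2">
        <sub-workflow>
            <!-- placeholder path to the workflow currently driven by C2 -->
            <app-path>${wfPathJob2}</app-path>
        </sub-workflow>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>

Because job2 only starts on the ok transition of job1, the two can never run at the same time.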
I have a directed multigraph. With this query I tried to find all the nodes that are connected to the node with the uuid n1_34:
MATCH (n1:Node{uuid: "n1_34"}) -[r]- (n2:Node) RETURN n2, r
This gives me a list of n2 nodes (n1_1187, n2_2280, n2_1834, n2_932 and n2_722) and the relationships among themselves, which is exactly what I need.
Nodes n1_1187, n2_2280, n2_1834, n2_932 and n2_722 are connected to the node n1_34
Now I need to order them based on the relationships each one has within this subgraph. For example, n1_1187 should be on top with 4 relationships, while the others have 1 relationship each.
I followed this post: Extract subgraph from Neo4j graph with Cypher, but it gives me the same result as the query above. I also tried to return count(r), but it gives me 1, since it counts each distinct relationship rather than the relationships that share a common source/target.
Usually with networkx I can copy this result into a subgraph and then count the relationships of each node, as in the sketch below. Can I do that with neo4j without modifying the current graph? How?
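For reference, this is roughly what I mean in networkx (a sketch; the edge list is reconstructed from the counts above):

import networkx as nx

# Rebuild the sample graph: n1_34 is linked to five nodes,
# and n1_1187 is also linked to the other four.
edges = [("n1_34", n) for n in ["n1_1187", "n2_2280", "n2_1834", "n2_932", "n2_722"]]
edges += [("n1_1187", n) for n in ["n2_2280", "n2_1834", "n2_932", "n2_722"]]
G = nx.MultiGraph(edges)

neighbors = list(G.neighbors("n1_34"))  # nodes connected to n1_34
sub = G.subgraph(neighbors)             # induced subgraph view; G itself is unchanged
print(sorted(sub.degree, key=lambda nd: nd[1], reverse=True))
# -> [('n1_1187', 4), ('n2_2280', 1), ...] (ties in arbitrary order)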
Please help. Or is there another way?
This snippet will recreate your graph for testing purposes:
WITH ['n1_34,n1_1187','n1_34,n2_2280','n1_34,n2_1834','n1_34,n2_722',
      'n1_34,n2_932','n1_1187,n2_2280','n1_1187,n2_932','n1_1187,n2_1834',
      'n1_1187,n2_722'] AS node_relationships
UNWIND node_relationships AS relationship
WITH split(relationship, ",") AS nodes
MERGE (n1:Node {label: nodes[0]})
MERGE (n2:Node {label: nodes[1]})
MERGE (n1)-[:LINK]-(n2)
Once that is run, the graph consists of n1_34 linked to n1_1187, n2_2280, n2_1834, n2_722 and n2_932, with n1_1187 also linked to each of the other four.
Then this CQL selects the nodes in the subgraph and counts up each node's associated links, but only those to other nodes already in the subgraph:
MATCH (n1:Node {label: 'n1_34'})-[:LINK]-(n2:Node)
WITH collect(DISTINCT n2) AS subgraph_nodes
UNWIND subgraph_nodes AS subgraph_node
MATCH (subgraph_node)-[r:LINK]-(n3:Node)
WHERE n3 IN subgraph_nodes
RETURN subgraph_node.label, count(r) ORDER BY count(r) DESC
Running the above yields n1_1187 with a count of 4, and n2_2280, n2_1834, n2_932 and n2_722 with a count of 1 each.
This query should do what you need:
MATCH (n1:Node{uuid: "n1_34"})-[r]-(n2:Node)
RETURN n1, n2, count(*) AS freq
ORDER BY freq DESC
Using PROFILE to assess the efficiency of some of the existing solutions with @DarrenHick's sample data, the following is the most efficient one I have found, needing only 84 DB hits:
MATCH (n1:Node{label:'n1_34'})-[:LINK]-(n2:Node)
WITH COLLECT(n2) AS nodes
UNWIND nodes AS n
RETURN n, SIZE([(n)-[:LINK]-(n3) WHERE n3 IN nodes | null]) AS cnt
ORDER BY cnt DESC
Darren's solution (adjusted to return subgraph_node instead of subgraph_node.label, for parity) requires 92 DB hits.
@LuckyChandrautama's own solution (provided in a comment on Darren's answer, and adjusted to match Darren's sample data) uses 122 DB hits.
This shows the importance of using PROFILE to assess the performance of different Cypher solutions. You should try doing that with your actual data to see which one works best for you.
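To profile a candidate, just prefix it with PROFILE and compare the reported DB hits, e.g. for the winning query above:

PROFILE
MATCH (n1:Node {label: 'n1_34'})-[:LINK]-(n2:Node)
WITH COLLECT(n2) AS nodes
UNWIND nodes AS n
RETURN n, SIZE([(n)-[:LINK]-(n3) WHERE n3 IN nodes | null]) AS cnt
ORDER BY cnt DESC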
Would like to hear from you all on a scenario I'm facing. Consider that I have three scenarios I'd like to test, and for each scenario I have about 10 inputs over which I'd like to loop and run some tests.
The intention is that, for a given scenario, I'd like to see for what range of inputs the test passes and for what range it fails. Does this go against standard Robot Framework test suite practices?
We are testing the results of a search engine, so we do not expect all results to pass; rather, we expect to see when we are getting the most ideal results (based on when most scenarios pass for their data).
Example:
Test -> Scenario 1
    Loop ${line} in File1
        Run Actual Test 1 for Input ${line}
Test -> Scenario 2
    Loop ${line} in File2
        Run Actual Test 2 for Input ${line}
Test -> Scenario 3
    Loop ${line} in File3
        Run Actual Test 3 for Input ${line}
Imagine that the files have 5 lines each. The idea is that there would effectively be 15 tests, and I would like to know how many of those 15 pass and how many fail.
Thanks for your help. I'd really appreciate it.
Regards,
Balaji
Does this go against the standard of Robot framework test suite practices?
It sort of feels like it goes against the idea of actual e2e tests. When you're testing from the perspective of the user, you ideally don't want to ignore any failures.
Having said that, why not run such cases on a different level? Perhaps you can test some endpoints of an API.
If you really have to do it in RF, I suppose you can use e.g. the Run Keyword And Ignore Error keyword. It would not fail any test case but would still give you feedback about which data caused a failure, as in the sketch below.
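A minimal sketch of that approach (assuming Robot Framework 4+ for the FOR/IF syntax; Run Actual Test 1 and File1 are placeholder names taken from the question):

*** Settings ***
Library    OperatingSystem
Library    String

*** Test Cases ***
Scenario 1
    ${content}=    Get File    ${CURDIR}/File1
    ${lines}=    Split To Lines    ${content}
    FOR    ${line}    IN    @{lines}
        # The test case itself never fails; failing inputs are only logged.
        ${status}    ${message}=    Run Keyword And Ignore Error    Run Actual Test 1    ${line}
        IF    '${status}' == 'FAIL'
            Log    Input '${line}' failed: ${message}    WARN
        END
    END

*** Keywords ***
Run Actual Test 1
    [Arguments]    ${input}
    # Placeholder for the real search-engine checks.
    Should Not Be Empty    ${input}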
You can create keywords in place of test scenarios, containing the test steps you want to execute for the 10 different data records. For example, given an Excel file containing 10 records where, for each record, you want to execute 3 different test scenarios:
Sample code:
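(A minimal sketch, assuming the Excel records have been exported to a plain records.csv so no extra Excel library is needed; all keyword names are made up.)

*** Settings ***
Library    OperatingSystem
Library    String

*** Test Cases ***
Run All Scenarios For Each Record
    ${content}=    Get File    ${CURDIR}/records.csv
    ${records}=    Split To Lines    ${content}
    FOR    ${record}    IN    @{records}
        Scenario One      ${record}
        Scenario Two      ${record}
        Scenario Three    ${record}
    END

*** Keywords ***
Scenario One
    [Arguments]    ${record}
    Log    Executing scenario 1 steps for ${record}

Scenario Two
    [Arguments]    ${record}
    Log    Executing scenario 2 steps for ${record}

Scenario Three
    [Arguments]    ${record}
    Log    Executing scenario 3 steps for ${record}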
I have created a simple ADF pipeline that has two sources (S1, S2) and stores data from these sources into an Azure Cosmos DB sink using a left outer join (condition: S1.abc = S2.abc). After running this pipeline, I can see all columns from S1 and none of the columns from S2. Why is that? Please help me understand.
I can see all columns from S1 and none of the columns from S2
Since you mentioned left outer join in your question, I think you are using the Data Flow activity to transfer the data. I tested this on my side and it works for me.
First, please check the definition of left outer join in the official documentation.
Then please refer to my sample test:
I have two CSV files as inputs, joined in a Data Flow activity with column B as the join key. In the Cosmos DB output, where a row from the left stream has no match, the output from the right stream is NULL.
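As a made-up illustration of the left outer join semantics (the file contents below are hypothetical):

csv1 (left stream):        csv2 (right stream):
B,C                        B,D
b1,c1                      b1,d1
b2,c2

Left outer join on B:
B,C,D
b1,c1,d1
b2,c2,NULL    <- b2 has no match in csv2, so the right-stream column is NULL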
I am asking myself a question concerning parallel layouts.
Let's suppose I have a flow F which is replicated X times.
All the replicated flows are then joined on the same key, but with a different dataset each time.
I want the joins to run in a parallel layout. For this particular case, do I need to use X "Partition by Key" components, or can I put only one at the input of the Replicate (instead of one per Replicate output)?
TL;DR: Is this graph (https://ibb.co/hHmk5e) equivalent to this one (https://ibb.co/i2NNJz), supposing all joins occur on the same key?
Thank you,
Use Replicate into multiple Partition by Key components. Pay attention to the checkpoints: if you have 3 checkpoints after the Replicate, consider removing them and placing a single checkpoint before the Replicate.
I am confused about FROM and FROM NAMED graphs in SPARQL. I did read the parts of the SPARQL specification relating to these two constructs; I just want to confirm my understanding.
Suppose an RDF Dataset is located at IRI I. I is made up of:
a default graph G
3 named graphs {(I1,G1), (I2,G2), (I3,G3)}
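For concreteness, such a dataset could be serialized in TriG like this (the IRIs and triples are hypothetical):

@prefix ex: <http://example.org/> .

# default graph G
ex:s0 ex:p ex:o0 .

# named graphs (I1,G1), (I2,G2), (I3,G3)
ex:I1 { ex:s1 ex:p ex:o1 . }
ex:I2 { ex:s2 ex:p ex:o2 . }
ex:I3 { ex:s3 ex:p ex:o3 . }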
Now, suppose I have a SPARQL query:
SELECT *
FROM I
FROM I1
FROM NAMED I2
WHERE { ... }
So if I understand correctly, to evaluate this query the SPARQL service may construct the dataset behind the scenes; this merged dataset will contain:
a default graph which is the merge of I and I1
a named graph I2
Is this understanding right?
The FROM, FROM NAMED clauses describe the dataset to be queried. How that comes into being is not part of the SPARQL spec. There is a universe of graphs from which I, I1, and I2 are taken.
You are correct that the dataset for the query will have a default graph which is the merge of I and I1, and also a named graph I2.
Whether those are taken from the underlying dataset is implementation dependent. It is common to provide them from the store (the universe of graphs is the named graphs in the dataset), but it is also possible that I, I1, and I2 are fetched from the web (the universe of graphs is the web).
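As a concrete sketch (with http://example.org/ IRIs standing in for I, I1 and I2), the resulting query dataset can be probed like this:

SELECT ?s ?p ?o ?g
FROM <http://example.org/I>
FROM <http://example.org/I1>
FROM NAMED <http://example.org/I2>
WHERE {
  { ?s ?p ?o }                  # matched against the default graph: the merge of I and I1
  UNION
  { GRAPH ?g { ?s ?p ?o } }     # ?g can only bind to I2, the only named graph here
}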