The difference between a YIELD statement and a GROUP combined YIELD clause in Nebula Graph

FETCH PROP ON person "..." YIELD properties(person).name as name, properties(vertex).age as age
| GROUP BY $-.name YIELD $-.name, max($-.age) as max_age
FETCH PROP ON person "..." YIELD properties(person).name as name, properties(vertex).age as age
| YIELD $-.name, max($-.age) as max_age
As the title says, is there any difference between using a standalone YIELD statement and a GROUP BY combined with a YIELD clause?

They are equivalent. The YIELD clause in native nGQL supports both implicit and explicit GROUP BY, and the implicit GROUP BY behavior is consistent with the Cypher standard. For details, see the Neo4j docs.

Related

How to handle return value of SnowflakeOperator in Airflow

I'm currently experimenting with Airflow for monitoring tasks in Snowflake, and I'd like to run a simple DAG with one task that pushes a SQL query to Snowflake and checks that the returned value, which should be a number, is greater than a defined threshold.
So the following is basically my SQL statement in the DAG definition:
query_check = """select COUNT(*)
FROM (select CASE WHEN NAME LIKE '%SW_PRODUCTFEED%' THEN 'PRODUCTFEED'
ELSE NULL END AS TASKTREE_NAME
, NAME
, STATE
, ERROR_MESSAGE
, SCHEDULED_TIME
, QUERY_START_TIME
, NEXT_SCHEDULED_TIME
from table(TEST_DB.INFORMATION_SCHEMA.task_history())
where TASKTREE_NAME IS NOT NULL
qualify DENSE_RANK() OVER (PARTITION BY TASKTREE_NAME ORDER BY to_date(SCHEDULED_TIME) desc) < 3
order by scheduled_time desc);"""
Then the following is the definition of the DAG and the task within it:
with dag:
    query1_exec = SnowflakeCheckOperator(
        task_id="snowflake_check_task_history",
        sql=query_check,
        params={
            "check_name": "number_rows",
            "check_statement": "count >=1"
        },
        conn_id="Snowflake_test"
    )
    query1_exec
I'd like to use the SnowflakeCheckOperator to check whether the value returned by the query is greater than 1.
However, it seems that Snowflake or the SnowflakeOperator in that case is returning the result of the query in a dict object, like so:
Record: {'COUNT(*)': 10}
Therefore the check always evaluates to true, because the SnowflakeCheckOperator isn't checking against the value of Record["Count"] but against something else.
Now my question is: how do I handle the return value so that the check is evaluated against the right value? Is it possible to change the format of the return value? Or maybe get access to the value of the key in the dict object?
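One workaround I'm considering (a rough, untested sketch that reuses query_check and the "Snowflake_test" connection from above) is to skip the check operator, fetch the first row myself with SnowflakeHook, and compare the value in a PythonOperator:
from airflow.operators.python import PythonOperator
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

def check_row_count(**context):
    # Reuses the "Snowflake_test" connection defined for the DAG above.
    hook = SnowflakeHook(snowflake_conn_id="Snowflake_test")
    # get_first() returns the first result row (typically a tuple like (10,)),
    # so the value can be read by position instead of via the Record dict.
    row = hook.get_first(query_check)
    count = row[0] if row else 0
    if count < 1:  # the threshold I actually want to enforce
        raise ValueError(f"Row count check failed: {count} < 1")

check_rows = PythonOperator(
    task_id="check_task_history_rows",
    python_callable=check_row_count,
    dag=dag,
)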

Boto3: querying DynamoDB with multiple sort key values

Is there any way of supplying multiple values for a DynamoDB table's Sort Key whilst doing a query in Boto3?
For a single SK value to search on, I'm doing this:
table.query(
    IndexName="my_gsi",
    KeyConditionExpression=Key('my_gsi_pk').eq({pk value}) & Key('my_gsi_sk').eq({sk value}),
    FilterExpression={filter expression}
)
... which works.
However, my scenario involves searching on one of a couple of potential SK values, so I'd like to, in SQL terms, do something like this:
WHERE my_gsi_pk = {pk value}
AND my_gsi_sk IN ({sk value 1}, {sk value 2})
I've looked in the Boto3 documentation in the .query() section and concentrated upon the KeyConditionExpression syntax but can't identify whether this is possible or not.
The query API does not support the IN operator in the KeyConditionExpression.
Use the execute_statement API instead. This executes a PartiQL statement, which does accept the IN operator in query operations for the Partition and Sort keys:
sk = ["Foo", "Bar"]
res = client.execute_statement(
    Statement=f'SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [{",".join(["?" for k in sk])}]',
    Parameters=[{"S": "1"}] + [{"S": k} for k in sk]
)
This creates a PartiQL Statement like SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [?, ?] and substitution Parameters like [{"S": "1"}, {"S": "Foo"}, {"S": "Bar"}].
Please note that PartiQL will consume considerably more RCUs than a plain Query. You can check this by requesting ReturnConsumedCapacity=TOTAL.
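For example (a minimal sketch; statement and parameters here stand for the PartiQL string and parameter list built above):
res = client.execute_statement(
    Statement=statement,        # the PartiQL SELECT shown above
    Parameters=parameters,      # [{"S": "1"}, {"S": "Foo"}, {"S": "Bar"}]
    ReturnConsumedCapacity="TOTAL",
)
# The response then reports how much capacity the statement consumed,
# which can be compared against the same read done as a regular Query.
print(res["ConsumedCapacity"])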

Query ADX table name dynamically

I have a need to be able to query Azure Data Explorer (ADX) tables dynamically, that is, using application-specific metadata that is also stored in ADX.
If this is even possible, the way to do it seems to be via the table() function. In other words, it feels like I should be able to simply write:
let table_name = <non-trivial ADX query that returns the name of a table as a string>;
table(table_name) | limit 10
But this query fails, since I am trying to pass a variable to the table() function: "a parameter, which is not scalar constant string can't be passed as parameter to table() function". The workaround provided doesn't really help, since all the possible table names are not known ahead of time.
Is there any way to do this all within ADX (i.e. without multiple queries from the client) or do I need to go back to the drawing board?
If you know the desired output schema, you could potentially achieve this using union. Note that in this case the result schema will be the union of all the tables, so you'll need to explicitly project the columns you're interested in:
let TableA = view() { print col1 = "hello world" };
let TableB = view() { print col1 = "goodbye universe" };
let LabelTable = datatable(table_name:string, label:string, updated:datetime)
[
    "TableA", "MyLabel", datetime(2019-10-08),
    "TableB", "MyLabel", datetime(2019-10-02)
];
let GetLabeledTable = (l:string)
{
    toscalar(
        LabelTable
        | where label == l
        | order by updated desc
        | limit 1
    )
};
let table_name = GetLabeledTable('MyLabel');
union withsource = T *
| where T == table_name
| project col1

DynamoDB “OR” conditional Range query

Let's assume my table looks like:
Code |StartDate |EndDate |Additional Attributes...
ABC |11-24-2015 |11-26-2015 | ....
ABC |12-12-2015 |12-15-2015 | ....
ABC |10-05-2015 |10-10-2015 | ....
PQR |03-24-2015 |03-27-2015 | ....
PQR |05-04-2015 |05-08-2015 | ....
Provided a Code (c) and a date range (x, y), I need to be able to query items something like:
Query => (Code = c) AND ((StartDate BETWEEN x AND y) OR (EndDate BETWEEN x AND y))
I was planning to use a Primary Key as a Hash and Range Key (Code, StartDate) with an additional LSI (EndDate) and do a query on it.
I am not sure if there is a way to achieve this. I don't want to use the SCAN operation as it seems to scan the entire table which could be very costly.
Also, would like to achieve this in a single query.
One option would be to do this using a Query and a FilterExpression. There is no need to define the LSI in this case. You would query by Hash Key with the EQ operator and then narrow the results with the Filter Expression. Here is an example with the Java SDK:
Table table = dynamoDB.getTable(tableName);

Map<String, Object> expressionAttributeValues = new HashMap<String, Object>();
expressionAttributeValues.put(":x", "11-24-2015");
expressionAttributeValues.put(":y", "11-26-2015");

QuerySpec spec = new QuerySpec()
    .withHashKey("Code", "CodeValueHere")
    .withFilterExpression("(StartDate between :x and :y) or (EndDate between :x and :y)")
    .withValueMap(expressionAttributeValues);

ItemCollection<QueryOutcome> items = table.query(spec);
Iterator<Item> iterator = items.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next().toJSONPretty());
}
See Specifying Conditions with Condition Expressions for more details.
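For reference, a rough boto3 equivalent of the same approach (a sketch only; the table name, Code value, and date range are placeholders taken from the example data above):
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("my_table")  # placeholder table name

# Query on the Hash Key only, then narrow with the OR'd date conditions.
response = table.query(
    KeyConditionExpression=Key("Code").eq("ABC"),
    FilterExpression=(
        Attr("StartDate").between("11-24-2015", "11-26-2015")
        | Attr("EndDate").between("11-24-2015", "11-26-2015")
    ),
)
for item in response["Items"]:
    print(item)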
Additionally, although the previous query only uses the Hash Key, you can still group the records with the Range Key containing the dates in the following format:
StartDate#EndDate
Table Structure:
Code |DateRange             |StartDate  |EndDate
ABC  |11-24-2015#11-26-2015 |11-24-2015 |11-26-2015
ABC  |12-12-2015#12-15-2015 |12-12-2015 |12-15-2015
ABC  |10-05-2015#10-10-2015 |10-05-2015 |10-10-2015
PQR  |03-24-2015#03-27-2015 |03-24-2015 |03-27-2015
PQR  |05-04-2015#05-08-2015 |05-04-2015 |05-08-2015
This way, if you happen to query only by Hash Key, you would still get the records sorted by date. Also, I believe it is a good idea to follow the advice given about using an unambiguous date format.

cypher: how to return distinct relationship types?

How to return the distinct relationship types from all paths in cypher?
Example query:
MATCH p=(a:Philosopher)-[*]->(b:SchoolType)
RETURN DISTINCT EXTRACT( r in RELATIONSHIPS(p)| type(r) ) as RelationshipTypes
This returns a collection for each path p.
I would like to return a single collection containing the distinct relationship types across all collections.
Here is a link to a graph gist to run the query-
http://gist.neo4j.org/?7851642
You might first collect all relationships on the matched paths into a collection "allr", and then build the collection of distinct type(r) values from it:
MATCH p=(a:Philosopher)-[rel*]->(b:SchoolType)
WITH collect(rel) AS allr
RETURN reduce(allDistR = [], rcol IN allr |
    reduce(distR = allDistR, r IN rcol |
        distR + CASE WHEN type(r) IN distR THEN [] ELSE type(r) END
    )
)
Note that each element 'rcol' in the collection "allr" is in turn the collection of relationships on one matched path.
