How to Project Rows Whose Column Values Only Appear in the First Table? - azure-data-explorer

I have two Kusto tables in the same database, Open_Work_Items and Closed_Work_Items, which look like this, respectively:
Open_Work_Items:
Item ID | Opened Date
1234    | <DateTime>

Closed_Work_Items:
Item ID | Closed Date
1234    | <DateTime>
My issue is that I cannot remove work items from Open_Work_Items once their Item ID appears in Closed_Work_Items, but I would still like to query which work items are open. This means I need to find the distinct Item IDs in Open_Work_Items that do not appear in Closed_Work_Items, but I do not know which Kusto function(s) to use for that.
I've looked at the Tabular and Scalar Operators documentation, but I'm not seeing how to combine them to get what I want here. Any help/advice would be appreciated!

I think I figured it out:
Open_Work_Items | where Item_ID !in (Closed_Work_Items)
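Note that when !in is given a tabular expression, it compares against that table's first column, so this relies on Item_ID being the first column of Closed_Work_Items. An equivalent, more explicit form is a leftanti join, which keeps only the left-hand rows whose key has no match on the right (a sketch, assuming the column is named Item_ID in both tables):
Open_Work_Items
| join kind=leftanti Closed_Work_Items on Item_ID
| distinct Item_ID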

Related

projecting multiple columns in ADX with wild cards

If I have many columns and a bunch of them start with similar strings, is there a way in Kusto to select them based on this pattern, such as by using wildcards?
e.g. assuming we have columns like datafield1, datafield2, ..., something like the following would be helpful:
mytable | project datafield*
I know that this is not syntactically valid, so is there a workaround for achieving this easily?
project-keep does exactly what you want:
mytable | project-keep datafield*
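It also accepts a comma-separated list of names and wildcard patterns, so you can mix patterns with exact column names (a sketch, assuming mytable also has an id column you want to keep):
mytable | project-keep datafield*, id
The inverse operator, project-away, drops the matching columns instead of keeping them.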

Teradata LIKE ANY causing duplicates

I have a query, which looks something like the one below, whose aim is to check whether a two-letter code exists in a string. If it does, set a column to yes; if it doesn't, set it to no:
SELECT ID,
       CASE WHEN Table_a.item LIKE ANY ('%AA%','%AB%','%AC%','%AD%','%AE%','%FF%',' %GG%','%HR%','%TR%','%ST%','%VL%') THEN 'YES' ELSE 'NO' END AS foo,
       item
FROM Table_a
Unfortunately it's causing rows to appear twice:
ID  | foo | item
112 | yes | AA-FF-TT-RR
112 | no  | AA-FF-TT-RR
Does anyone know why? Am I misusing the LIKE ANY predicate? We are not on Teradata 14 yet.
Thank you for your time.
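One way to rule LIKE ANY out (and a workaround that runs on pre-14 Teradata) is to expand it into explicit ORed LIKEs, which is what the predicate is supposed to mean; a plain CASE evaluated once per row should not multiply rows, so if the duplicates disappear with this form, the LIKE ANY rewrite was the culprit. Note also the stray leading space in ' %GG%' in your pattern list, which requires a literal space before GG and is probably unintended. A sketch:
SELECT ID,
       CASE WHEN Table_a.item LIKE '%AA%'
              OR Table_a.item LIKE '%AB%'
              OR Table_a.item LIKE '%AC%'
              OR Table_a.item LIKE '%AD%'
              OR Table_a.item LIKE '%AE%'
              OR Table_a.item LIKE '%FF%'
              OR Table_a.item LIKE '%GG%'
              OR Table_a.item LIKE '%HR%'
              OR Table_a.item LIKE '%TR%'
              OR Table_a.item LIKE '%ST%'
              OR Table_a.item LIKE '%VL%'
            THEN 'YES' ELSE 'NO' END AS foo,
       item
FROM Table_a;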

How to update entries in a table within a nested dictionary?

I am trying to create an order book data structure where a top-level dictionary holds 3 basic order types, each of those types has a bid and an ask side, and each side has a list of tables, one for each ticker. For example, if I want to retrieve all the ask orders of type1 for Google stock, I'd call book[`orderType1][`ask][`GOOG]. I implemented that using the following:
bookTemplate: ([]orderID:`int$();date:"d"$();time:`time$();sym:`$();side:`$();
orderType:`$();price:`float$();quantity:`int$());
bookDict:(1#`)!enlist`orderID xkey bookTemplate;
book: `orderType1`orderType2`orderType3 ! (3# enlist(`ask`bid!(2# enlist bookDict)));
Data retrieval using book[`orderType1][`ask][`ticker] seems to be working fine. The problem appears when I try to add a new order to a specific order book, e.g.:
testorder:`orderID`date`time`sym`side`orderType`price`quantity!(111111111;.z.D;.z.T;
`GOOG;`ask;`orderType1;100.0f;123);
book[`orderType1][`ask][`GOOG],:testorder;
Executing the last query gives an 'assign error. What's the reason, and how do I solve it?
There are a couple of issues here. The first is that while you can look up into dictionaries using a series of in-line repeated keys, i.e.
q)book[`orderType1][`ask][`GOOG]
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
you can't assign values like this (you can only assign one level deep). The better approach is to use dot-indexing (and dot-amend to reassign values). However, the problem is that the value of your book dictionary gets flattened to a table because the list of dictionaries is uniform. So this fails:
q)book . `orderType1`ask`GOOG
'rank
You can see how it got flattened by inspecting it in the terminal:
q)book
| ask
----------| -----------------------------------------------------------------
orderType1| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType2| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType3| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
To prevent this flattening, you can force the value to be a mixed list by adding a generic null:
q)book: ``orderType1`orderType2`orderType3 !(::),(3# enlist(`ask`bid!(2# enlist bookDict)));
Then it looks like this:
q)book
| ::
orderType1| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Dot-indexing now works:
q)book . `orderType1`ask`GOOG
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
which means that dot-amend will now work too
q).[`book;`orderType1`ask`GOOG;,;testorder]
`book
q)book
| ::
orderType1| `ask`bid!+``GOOG!(((+(,`orderID)!,`int$())!+`date`time`sym`side`o
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
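As a quick sanity check, indexing back in now returns the keyed table containing the single testorder row (the date and time columns will show whatever .z.D and .z.T were when testorder was built):
q)book . `orderType1`ask`GOOG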
Finally, I would recommend reading this FD whitepaper on how best to store book data: http://www.firstderivatives.com/downloads/q_for_Gods_Nov_2012.pdf

DSE cassandra and spark map collections type: how to perform get operation

For example I have the following table named "example":
name | age | address
'abc' | 12 | {'street':'1', 'city':'kl', 'country':'malaysia'}
'cab' | 15 | {'street':'5', 'city':'jakarta', 'country':'indonesia'}
In Spark I can do this:
scala> val test = sc.cassandraTable("test","example")
and this:
scala> test.first.getString("name")
and this:
scala> test.first.getMap[String, String]("address")
which gives me all the fields of the address in the form of a map
Question 1: But how do I use get to access the "city" information?
Question 2: Is there a way to flatten the entire table?
Question 3: How do I go about counting the number of rows where "city" = "kl"?
Thanks
Question 3: How do we count the number of rows where city == something?
I'll answer 3 first because this may give you an easier way to work with the data. Something like:
sc.cassandraTable[(String,Map[String,String],Int)]("test","example")
.filter( _._2.getOrElse("city","NoCity") == "kl" )
.count
First, I use the type parameter [(String,Map[String,String],Int)] on my cassandraTable call to transform the rows into tuples. This gives me easy access to the Map without any casting. (The order is just how the columns appeared when I made the table in my test environment; you may have to change the ordering.)
Second, I filter based on _._2, which is shorthand for the second element of the incoming tuple. getOrElse returns the value for the key "city" if the key exists, and "NoCity" otherwise. The final equality check tests which city it is.
Finally, I call count to find the number of entries for that city.
Question 1: How do we access the map?
Once you have a Map, you can call get("key"), getOrElse("key", default), or any of the standard Scala map operations to get a value out of it.
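For example (a sketch, assuming the same tuple typing as in the answer to question 3, i.e. (name, address, age)):
val row = sc.cassandraTable[(String, Map[String,String], Int)]("test", "example").first
val city = row._2.getOrElse("city", "unknown")  // "kl" if the 'abc' row comes back first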
Question 2: How do we flatten the entire table?
Depending on what you mean by "flatten", this can be a variety of things. For example, if you want to return the entire table as an array to the driver (not recommended, since your RDD could be very big in production), you can call collect.
If you want to flatten the elements of your map into tuples, you can call toSeq and you will end up with a list of (key, value) tuples. Feel free to ask another question if this isn't the kind of "flattening" you want.
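For instance, to flatten every row's address map into (key, value) pairs across the whole table (a sketch, using the same tuple typing as above):
val kvPairs = sc.cassandraTable[(String, Map[String,String], Int)]("test", "example")
  .flatMap { case (_, address, _) => address.toSeq }  // RDD[(String, String)], e.g. ("city", "kl")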

Joining odd/even results of select for subtraction in sqlite?

I've got an SQLite table which contains start/stop timestamps. I would like to create a query that returns the total elapsed time from them.
Right now I have a SELECT (e.g. SELECT t, type FROM event WHERE t > 0 AND (name='start' OR name='stop') AND eventId=xxx ORDER BY t) which returns a table that looks something like this:
+---+-----+
|t |type |
+---+-----+
| 1|start|
| 20|stop |
|100|start|
|150|stop |
+---+-----+
The total elapsed time in the above example would be (20-1) + (150-100) = 69.
One idea I had was this: I could run two separate queries, one for the "start" fields and one for the "stop" fields, on the assumption that they would always line up like this:
+---+---+
|(1)|(2)|
+---+---+
| 1| 20|
|100|150|
+---+---+
(1) SELECT t FROM EVENT where name='start' ORDER BY t
(2) SELECT t FROM EVENT where name='stop' ORDER BY t
Then it would be simple (I think!) to just sum the differences. The only problem is, I don't know if I can join two separate queries like this. I'm familiar with joins that combine every row with every other row and then eliminate those that don't match some criteria. Here the criterion is that the row index is the same, but that isn't a database field; it's just the order of the rows in the output of the two separate SELECTs, and there isn't any database field I can use to determine it.
Or perhaps there is some other way to do this?
I do not use SQLite, but this may work. Let me know.
SELECT SUM(CASE WHEN type = 'stop' THEN t ELSE -t END) FROM event
This assumes the only values in type are start/stop.
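With the sample data this evaluates to -1 + 20 - 100 + 150 = 69, matching the hand-computed total. Combined with the filters from the original query it would look something like this (a sketch; note the original SELECT filters on a name column but displays type, so use whichever column your schema actually has):
SELECT SUM(CASE WHEN type = 'stop' THEN t ELSE -t END) AS elapsed
FROM event
WHERE t > 0 AND (type = 'start' OR type = 'stop') AND eventId = xxx;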
