Can't project/extend after `bag_unpack` if empty table - azure-application-insights

I'm using bag_unpack to explode the customDimensions column in the AppInsights traces table and want to "shape" the resultant table. All is fine if there are rows to work with. If there are not, subsequent operations that reference the exploded columns fail. For example (I boiled it down to an isolated repro),
datatable (Date:datetime, JSON:string )
[datetime(1910-06-11), '{"key": "1"}', datetime(1930-01-01), '{"key": "2"}',
datetime(1953-01-01), '{"key": "3"}', datetime(1997-06-25), '{"key": "4"}']
| where Date > datetime(2000-01-01)
| project parsed = parse_json(JSON)
| evaluate bag_unpack(parsed)
| project-rename value = key
// lots more data shaping here
Since the where filters out all rows, there is nothing to unpack. OK, that's fine, but the data-shaping ops (e.g., project-rename) fail, saying
project-rename: Failed to resolve column reference 'key'
If you change the date in the where to be say 1900-01-01 then everything works as expected.
Note as well that if you remove the bag_unpack and project-rename some other column, it works fine with no rows. For example,
datatable (Date:datetime, JSON:string )
[datetime(1910-06-11), '{"key": "1"}', datetime(1930-01-01), '{"key": "2"}',
datetime(1953-01-01), '{"key": "3"}', datetime(1997-06-25), '{"key": "4"}']
| where Date > datetime(2000-01-01)
| project-rename value = JSON
I can see that the unpack is what creates the columns, so if it never runs the columns never get created; but then why run the project at all if there are no rows?
In theory I could move the where down, but I'm not sure whether the query planner would recognize that and apply the subsequent project/data shaping only to the reduced set of rows (filtered by the where). I've got a lot of rows and typically only need to operate on a few of them.
Pointers on how to work with bag_unpack and empty tables? Or columns that may or may not be there?

You could use the column_ifexists() function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/columnifexists
For example:
... | project value = column_ifexists("key", "")
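Applied to the isolated repro above, that looks like the following (a sketch; note that the second argument to column_ifexists() supplies both the fallback value and, when the column is missing, its type):

```kusto
datatable (Date:datetime, JSON:string )
[datetime(1910-06-11), '{"key": "1"}', datetime(1930-01-01), '{"key": "2"}',
datetime(1953-01-01), '{"key": "3"}', datetime(1997-06-25), '{"key": "4"}']
| where Date > datetime(2000-01-01)
| project parsed = parse_json(JSON)
| evaluate bag_unpack(parsed)
| project value = column_ifexists("key", "")
```

This runs without error whether or not the where leaves any rows to unpack.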

Related

In KQL how can I use bag_unpack to turn a serialized dictionary object in customDimensions into columns?

I'm trying to write a KQL query that will, among other things, display the contents of a serialized dictionary called Tags which has been added to the Application Insights traces table customDimensions column by application logging.
An example of the serialized Tags dictionary is:
{
"Source": "SAP",
"Destination": "TC",
"SAPDeliveryNo": "0012345678",
"PalletID": "(00)312340123456789012(02)21234987654(05)123456(06)1234567890"
}
I'd like to use evaluate bag_unpack(...) to evaluate the JSON and turn the keys into columns. We're likely to add more keys to the dictionary as the project develops and it would be handy not to have to explicitly list every column name in the query.
However, I'm already using project to reduce the number of other columns I display. How can I use both a project statement, to only display some of the other columns, and evaluate bag_unpack(...) to automatically unpack the Tags dictionary into columns?
Or is that not possible?
This is what I have so far, which doesn't work:
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
and message has "SendPalletData"
| extend TagsRaw = parse_json(customDimensions.["Tags"])
| evaluate bag_unpack(TagsRaw)
| project timestamp, message, ActionName = customDimensions.["ActionName"], TagsRaw
| order by timestamp desc
When it runs it displays only the columns listed in the project statement (including TagsRaw, so I know the Tags exist in customDimensions).
evaluate bag_unpack(TagsRaw) doesn't automatically add extra columns to the result set unpacked from the Tags in customDimensions.
EDIT: To clarify what I want to achieve, these are the columns I want to output:
timestamp
message
ActionName
TagsRaw
Source
Destination
SAPDeliveryNo
PalletID
EDIT 2: It turned out a major part of my problem was that double quotes within the Tags data are being escaped. While the Tags as viewed in the Azure portal looked like normal JSON, and copied out as normal JSON, when I copied out the whole of a customDimensions record the Tags looked like "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\", ... with the double quotes escaped with backslashes.
The accepted answer from David Markovitz handles this situation in the line:
TagsRaw = todynamic(tostring(customDimensions["Tags"]))
A few comments:
When filtering on timestamp, it's better to use the timestamp column as is, and do the manipulations on the other side of the comparison.
When using the has[...] operators, prefer the case-sensitive ones (if feasible).
Everything extracted from a dynamic value is also dynamic, and when given a dynamic value, parse_json() (or its equivalent, todynamic()) simply returns it as is.
Therefore, we need to treat customDimensions.["Tags"] in two steps: first, convert it to a string; second, convert the result to dynamic.
To reference a field within a dynamic type you can use X.Y, X["Y"], or X['Y'].
No need to combine them as you did with customDimensions.["Tags"].
As the bag_unpack plugin doc states:
"The specified input column (Column) is removed."
In other words, TagsRaw does not exist following the bag_unpack operation.
Please note that you can add a prefix to the columns generated by bag_unpack. That might make it easier to differentiate them from the rest of the columns.
While you can use project, using project-away is sometimes easier.
// Data sample generation. Not part of the solution.
let traces =
print c1 = "some columns"
,c2 = "we"
,c3 = "don't need"
,timestamp = ago(now()%1d * rand())
,message = "abc SendPalletData xyz"
,customDimensions = dynamic
(
{
"Tags":"{\"Source\":\"SAP\",\"Destination\":\"TC\",\"SAPDeliveryNo\":\"0012345678\",\"PalletID\":\"(00)312340123456789012(02)21234987654(05)123456(06)1234567890\"}"
,"ActionName":"Action1"
}
)
;
// Solution starts here
traces
| where timestamp >= startofday(now())
and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
,ActionName = customDimensions["ActionName"]
| project-away c*
| evaluate bag_unpack(TagsRaw, "TR_")
| order by timestamp desc
The result (a single row, shown here transposed):
timestamp: 2022-08-27T04:15:07.9337681Z
message: abc SendPalletData xyz
ActionName: Action1
TR_Destination: TC
TR_PalletID: (00)312340123456789012(02)21234987654(05)123456(06)1234567890
TR_SAPDeliveryNo: 0012345678
TR_Source: SAP
Fiddle
If I understand correctly, you want to use project to limit the number of columns that are displayed, but you also want to include all of the unpacked columns from TagsRaw, without naming all of the tags explicitly.
The easiest way to achieve this is to switch the order of your steps, so that you first do the project (including the TagsRaw column) and then you unpack the tags. If desired, you can then use project-away to specifically remove the TagsRaw column after you've unpacked it.
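A sketch of that reordering against the query from the question (untested; it also folds in the todynamic(tostring(...)) conversion from the other answer, since the Tags value is an escaped JSON string):

```kusto
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
    and message has "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
| project timestamp, message, ActionName = tostring(customDimensions["ActionName"]), TagsRaw
| evaluate bag_unpack(TagsRaw)
| order by timestamp desc
```

Note that bag_unpack consumes its input column, so if you also want TagsRaw itself in the output, project an extra copy (e.g. extend TagsCopy = TagsRaw) before unpacking.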

Aggregations in Application Insights analytics treated as scalars

I have this query in application insights analytics
let total = exceptions
| where timestamp >= ago(7d)
| where problemId contains "Microsoft.ServiceBus"
| summarize sum(itemCount);
let nullContext = exceptions
| where timestamp >= ago(7d)
| where problemId contains "Microsoft.ServiceBus"
| where customDimensions.["SpecificTelemetry.Message"] == "HttpContext.Current is null"
| summarize sum(itemCount);
let result = iff(total == nullContext, "same", "different");
result
but I get this error
Invalid relational operator
I am surprised, as yesterday with the same code (as far as I remember) I was getting a different error saying that both sides of the check need to be scalars, but my understanding was that the aggregation, even if it displays a value (under sum_itemCount), is not a scalar. I couldn't find a way to transform it, or how to get rid of this error.
Thanks
Couple of issues.
First - the Invalid relational operator is probably due to the empty lines between your let statements. AI Analytics allows you to write several queries in the same window, and uses empty lines to separate those. So in order to run all the statements as a single query you need to eliminate the empty lines.
Regarding the error of "Left and right side of the relational operator must be scalars" - the result of the "summarize" operator is a table and not scalar. It can contain a single line/column or multiple of those (think of what happens if you add a "by" clause to the summarize).
To achieve what you want to do you might want to use a single query as follows:
exceptions
| where timestamp >= ago(7d)
| where problemId contains "Microsoft.ServiceBus"
| extend nullContext = customDimensions.["SpecificTelemetry.Message"] == "HttpContext.Current is null"
| summarize sum(itemCount) by nullContext
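If you do want the original two-scalar comparison, another option (a sketch, not part of the answer above) is to wrap each summarize in toscalar(), which converts a single-value tabular result into a scalar:

```kusto
let total = toscalar(exceptions
    | where timestamp >= ago(7d)
    | where problemId contains "Microsoft.ServiceBus"
    | summarize sum(itemCount));
let nullContext = toscalar(exceptions
    | where timestamp >= ago(7d)
    | where problemId contains "Microsoft.ServiceBus"
    | where customDimensions.["SpecificTelemetry.Message"] == "HttpContext.Current is null"
    | summarize sum(itemCount));
print result = iff(total == nullContext, "same", "different")
```

Note this scans the table twice, so the single summarize ... by approach above is generally cheaper.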

How to update entries in a table within a nested dictionary?

I am trying to create an order book data structure where a top level dictionary holds 3 basic order types, each of those types has a bid and ask side and each of the sides has a list of tables, one for each ticker. For example, if I want to retrieve all the ask orders of type1 for Google stock, I'd call book[`orderType1][`ask][`GOOG]. I implemented that using the following:
bookTemplate: ([]orderID:`int$();date:"d"$();time:`time$();sym:`$();side:`$();
orderType:`$();price:`float$();quantity:`int$());
bookDict:(1#`)!enlist`orderID xkey bookTemplate;
book: `orderType1`orderType2`orderType3 ! (3# enlist(`ask`bid!(2# enlist bookDict)));
Data retrieval using book[`orderType1][`ask][`ticker] seems to be working fine. The problem appears when I try to add new order to a specific order book e.g:
testorder:`orderID`date`time`sym`side`orderType`price`quantity!(111111111;.z.D;.z.T;
`GOOG;`ask;`orderType1;100.0f;123);
book[`orderType1][`ask][`GOOG],:testorder;
Executing the last statement gives an 'assign error. What's the reason, and how can I solve it?
A couple of issues here. The first is that while you can look up values in dictionaries using a series of in-line repeated keys, i.e.
q)book[`orderType1][`ask][`GOOG]
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
you can't assign values like this (can only assign at one level deep). The better approach is to use dot-indexing (and dot-amend to reassign values). However, the problem is that the value of your book dictionary is getting flattened to a table due to the list of dictionaries being uniform. So this fails:
q)book . `orderType1`ask`GOOG
'rank
You can see how it got flattened by inspecting the terminal
q)book
| ask
----------| -----------------------------------------------------------------
orderType1| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType2| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType3| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
To prevent this flattening you can force the value to be a mixed list by adding a generic null
q)book: ``orderType1`orderType2`orderType3 !(::),(3# enlist(`ask`bid!(2# enlist bookDict)));
Then it looks like this:
q)book
| ::
orderType1| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Dot-indexing now works:
q)book . `orderType1`ask`GOOG
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
which means that dot-amend will now work too
q).[`book;`orderType1`ask`GOOG;,;testorder]
`book
q)book
| ::
orderType1| `ask`bid!+``GOOG!(((+(,`orderID)!,`int$())!+`date`time`sym`side`o
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Finally, I would recommend reading this FD whitepaper on how to best store book data: http://www.firstderivatives.com/downloads/q_for_Gods_Nov_2012.pdf

How to insert all the records in a data frame into the database in R?

I have a data frame data_frm which has the following columns:
emp_id | emp_sal | emp_bonus | emp_desig_level
| | |
| | |
And I want to insert all the records present in this data frame into the database table tab1. I executed this query but got an error:
for(record in data_frm)
{
write_sql <- paste("Insert into tab1 (emp_id,emp_sal,emp_bonus,emp_desig_level) values (",data_frm[,"emp_id"],",",data_frm[,"emp_sal"],",",data_frm[,"emp_bonus"],",",data_frm[,"emp_desig_level"],")",sep="")
r <- dbSendQuery(r,write_sql)
}
I get error as:
Error in data_frm[, "emp_id"] : incorrect number of dimensions
How do I insert all the records from the data frame into database?
NOTE: I want to insert all the records of the data frame using insert statement.
dbWriteTable(conn, "RESULTS", results2000, append = T) # to protect current values
dbWriteTable(conn, "RESULTS", results2000, append = F) # to overwrite values
From the RDBI homepage at sourceforge. Hope that helps...
In your for loop, you need to use:
data_frm[record,"column_name"]
Otherwise your loop tries to insert an entire column instead of just the particular record. The loop also has to iterate over row indices rather than over the data frame itself (which iterates over columns):
for(record in seq_len(nrow(data_frm)))
{
write_sql <- paste("Insert into tab1 (emp_id,emp_sal,emp_bonus,emp_desig_level) values (",data_frm[record,"emp_id"],",",data_frm[record,"emp_sal"],",",data_frm[record,"emp_bonus"],",",data_frm[record,"emp_desig_level"],")",sep="")
res <- dbSendQuery(r,write_sql) # keep the result in a separate variable so the connection isn't overwritten
}
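As an aside, pasting values straight into SQL strings is fragile (quoting, injection). A parameterized insert handles all rows in one call; a sketch assuming a DBI backend such as RSQLite, where conn is a hypothetical open connection:

```r
library(DBI)

sql <- "INSERT INTO tab1 (emp_id, emp_sal, emp_bonus, emp_desig_level)
        VALUES (?, ?, ?, ?)"
# params takes one vector per placeholder; DBI binds them row by row
dbExecute(conn, sql, params = unname(as.list(data_frm)))
```

The database driver then handles quoting and type conversion for every row of the data frame.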
Answered here
Copied one more time:
Recently I had a similar issue.
Problem description: an MS SQL Server database with a schema. The task was to save an R data.frame object to a predefined database table without dropping it.
Problems I faced:
Some packages' functions do not support schemas, or require installing the development version from GitHub
You can save a data.frame only after a drop (delete table) operation (I needed just a "clear table" operation)
How I solved the issue:
Using a simple RODBC::sqlQuery, writing the data.frame row by row.
The solution (a couple of functions) is available here or here

Joining odd/even results of select for subtraction in sqlite?

I've got an sqlite table which contains start/stop timestamps. I would like to create a query which returns a total elapsed time from there.
Right now I have a SELECT (e.g. SELECT t,type FROM event WHERE t>0 AND (name='start' OR name='stop') AND eventId=xxx ORDER BY t) which returns a table that looks something like this:
+---+-----+
|t |type |
+---+-----+
| 1|start|
| 20|stop |
|100|start|
|150|stop |
+---+-----+
The total elapsed time in the above example would then be (20-1)+(150-100) = 69
One idea I had was this: I could run two separate queries, one for the "start" fields and one for the "stop" fields, on the assumption that they would always line up like this:
+---+---+
|(1)|(2)|
+---+---+
| 1| 20|
|100|150|
+---+---+
(1) SELECT t FROM EVENT where name='start' ORDER BY t
(2) SELECT t FROM EVENT where name='stop' ORDER BY t
Then it would be simple (I think!) to just sum the differences. The only problem is, I don't know if I can join two separate queries like this: I'm familiar with joins that combine every row with every other row and then eliminate those that don't match some criterion. In this case, the criterion is that the row index is the same, but this isn't a database field; it's just the order of the resulting rows in the output of two separate selects, so there isn't any database field I can use to determine it.
Or perhaps there is some other way to do this?
I do not use SQLite but this may work. Let me know.
SELECT SUM(CASE WHEN type = 'stop' THEN t ELSE -t END) FROM event
This assumes the only values in type are start/stop.
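The sample data makes this easy to sanity-check. A quick sketch using Python's built-in sqlite3 module (the question's SELECT mixes name and type, so the sketch assumes the column is actually called name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (t INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [(1, "start"), (20, "stop"), (100, "start"), (150, "stop")],
)

# Each stop timestamp contributes +t, each start contributes -t,
# so the sum collapses to (20-1) + (150-100).
(elapsed,) = conn.execute(
    "SELECT SUM(CASE WHEN name = 'stop' THEN t ELSE -t END) FROM event"
).fetchone()
print(elapsed)  # 69
```

This matches the hand-computed total from the question, and works in a single pass with no join at all.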
