sqlite timecode calculations - sqlite

I have two tables which I export from my video editing suite, one ("MediaPool") containing a row for each media file imported into the project, another ("Montage") for the portions of that file used in a specific edit. The fields that are associated between the two are MediaPool.FileName and Montage.Name, which are very similar (Filename only adds the file extension).
# MediaPool
Filename | Take
---------------------------------
somefile.mp4 | Getty
file2.mov | Associated Press
file3.mov | Associated Press
and
# Montage
Name | RecordIn | RecordOut
------------------------------------------
somefile | 01:01:01:01 | 01:01:20:19
somefile | 01:05:15:23 | 01:05:16:10
somefile | 01:25:19:10 | 01:30:16:04
file2 | 01:30:11:10 | 01:31:18:12
file2 | 01:40:15:22 | 01:42:21:17
The tables contain many more columns of course, but only the above is relevant.
Only the "MediaPool" table contains the field called "Take" which designates the file's copyright holder (long story). It can't be included in the "Montage" export. I needed to calculate the total duration of footage used from each source, by subtracting the RecordIn timecode from RecordOut and adding each result. This turned out to be more complicated than I expected, as I have some notions of programming but almost none when it comes to SQL (sqlite in my case).
I managed to come up with the following, which works fine and runs in under 4 seconds. However, from the little programming I've done, it seems overlong and very inelegant. Is there a shorter way to achieve this?
BTW, I'm using 25 fps timecode and I can't use LPAD in sqlite.
SELECT
Source,
SUBSTR('00' || CAST(DurationFrames/(60*60*25) AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)/(60*25) AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)%(60*25)/25 AS TEXT), -2, 2) || ':' ||
SUBSTR('00' || CAST(DurationFrames%(60*60*25)%(60*25)%25 AS TEXT), -2, 2)
AS DurationTC
FROM
(
SELECT
MediaPool.Take AS Source,
Montage.RecordIn,
Montage.RecordOut,
SUM(CAST(SUBSTR(Montage.RecordOut, 1, 2) AS INT)*3600*25 +
CAST(SUBSTR(Montage.RecordOut, 4, 2) AS INT)*60*25 +
CAST(SUBSTR(Montage.RecordOut, 7, 2) AS INT)*25 +
CAST(SUBSTR(Montage.RecordOut, 10, 2) AS INT) -
CAST(SUBSTR(Montage.RecordIn, 1, 2) AS INT)*3600*25 -
CAST(SUBSTR(Montage.RecordIn, 4, 2) AS INT)*60*25 -
CAST(SUBSTR(Montage.RecordIn, 7, 2) AS INT)*25 -
CAST(SUBSTR(Montage.RecordIn, 10, 2) AS INT))
AS DurationFrames
FROM
MediaPool
JOIN
Montage ON MediaPool.FileName LIKE '%' || Montage.Name || '%'
GROUP BY
Take
ORDER BY
Take
)

Here's a simplified query that produces the same results as yours on your test data. Mostly it uses printf() instead of a bunch of string concatenation and substr()s, and uses strftime() to calculate the total seconds of the hours minutes seconds part of the timecode:
WITH frames AS
(SELECT Take, sum((strftime('%s', substr(RecordOut,1,8))*25 + substr(RecordOut,10))
- (strftime('%s', substr(RecordIn,1,8))*25 + substr(RecordIn,10)))
AS DurationFrames
FROM MediaPool
JOIN Montage ON MediaPool.Filename LIKE Montage.Name || '.%'
GROUP BY Take)
SELECT Take AS Source
, printf("%02d:%02d:%02d:%02d", DurationFrames/(60*60*25),
DurationFrames%(60*60*25)/(60*25),
DurationFrames%(60*60*25)%(60*25)/25,
DurationFrames%(60*60*25)%(60*25)%25)
AS DurationTC
FROM frames
ORDER BY Take;

Related

Auto update Kusto to pull todays date but specific time

I am very new to Kusto queries and I have one that is giving me the proper data that I export to Excel to manage. My only problem is that I only care (right now) about yesterday and today in two separate Sheets. I can manually change the datetime with the information but I would like to be able to just refresh the data and it pull the newest number.
It sounds pretty simple but I cannot figure out how to specify the exact time I want. Has to be from 2 am day 1 until 1:59 day 2
Thanks
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= make_datetime(2023,2,15,2,0,0) and TelemetryLocalTimeStamp < make_datetime(2023,2,16,01,59,0)
| where NumberOfBinPresentations >0
ago(), now(), startofday() and some datetime arithmetic.
// Sample data generation. Not part of the solution.
let ['Telemetry.WorkStation'] = materialize(range i from 1 to 1000000 step 1 | extend NexusUid = "08463c7b-fe37-43b6-a0d2-237472b9774d", TelemetryLocalTimeStamp = ago(2d * rand()));
// Solution starts here.
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= startofday(ago(1d)) + 2h
and TelemetryLocalTimeStamp < startofday(now()) + 2h
| summarize count(), min(TelemetryLocalTimeStamp), max(TelemetryLocalTimeStamp)
count_
min_TelemetryLocalTimeStamp
max_TelemetryLocalTimeStamp
500539
2023-02-15T02:00:00.0162031Z
2023-02-16T01:59:59.8883692Z
Fiddle

Filter columns by dashboard multi-select parameter

I'm trying to render a time series, but I have too many columns to show by default. To remedy this I figured I would present the user with a multi-select of all the columns and downselect the columns I render to that list, but I can't for the life of me figure out or find an answer on how to do it.
I have data with, say, columns Time, X1, X2, ... X120 and a multi-select parameter _columns of that table | getschema | project ColumnName | where ColumnName != "Time". I want to project to Time and the contents of _columns.
I can only find how to filter rows based on some column's value vs the multi-select. I feel like I'm missing something very simple.
Updated
There is also a simple solution for data that looks something like that:
This kind of data might be created by a make-series operator with multiple aggregation functions, e.g. -
make-series series_001 = count(), series_002 = min(x), series_003 = sum(x), series_004 = avg(x), series_005 = countif(type == 1), series_006 = countif(subtype == 123) on Timestamp from ago(7d) to now() step 1d
// Data sample generation, including series creation.
// Not part of the solution.
let p_series_num = 100;
let data = materialize
(
range i from 1 to p_series_num step 1
| project series_name = strcat("series_", substring(strcat("00", i), -3))
| mv-apply range(1, 7, 1) on (summarize make_list(rand()))
| evaluate pivot(series_name, take_any(list_))
| extend Timestamp = range(now() - 6d, now(), 1d)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _series, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = 'series_001';
data
| project Timestamp, column_ifexists(['_series'], real(null))
| render timechart
Timestamp
series_001
["2022-10-08T15:59:51.4634127Z","2022-10-09T15:59:51.4634127Z","2022-10-10T15:59:51.4634127Z","2022-10-11T15:59:51.4634127Z","2022-10-12T15:59:51.4634127Z","2022-10-13T15:59:51.4634127Z","2022-10-14T15:59:51.4634127Z"]
["0.35039128090096505","0.79027849410190631","0.023939659111657484","0.14207071795033441","0.91242141133745414","0.33016368441829869","0.50674771943297525"]
Fiddle
This solution supports multi-selection.
The original data looks something like that:
// Data sample generation. Not part of the solution.
let p_start_time = startofday(ago(1d));
let p_interval = 5m;
let p_rows = 15;
let p_cols = 120;
let data = materialize
(
range Timestamp from p_start_time to p_start_time + p_rows * p_interval step p_interval
| mv-expand MetricID = range(1, p_cols) to typeof(int)
| extend MetricVal = rand(), MetricName = strcat("x", tostring(MetricID))
| evaluate pivot(MetricName, take_any(MetricVal), Timestamp)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _MetricName, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = dynamic(['x1', 'x3', 'x7', 'x100', 'x120']);
data
| project Timestamp, pa = pack_all()
| project Timestamp, cols = bag_remove_keys(pa, set_difference(bag_keys(pa), _series))
| evaluate bag_unpack(cols)
| render timechart
Timestamp
x1
x100
x120
x3
x7
2022-10-20T00:00:00Z
0.40517703772298719
0.86952520047109094
0.67458442932790974
0.20662741864260561
0.19230161743580523
2022-10-20T00:05:00Z
0.098438188653858671
0.14095230636982198
0.10269711129443576
0.99361020447683746
0.093624077251808144
2022-10-20T00:10:00Z
0.3779198279036311
0.095647188329308852
0.38967218915903867
0.62601873006422182
0.18486009896828509
2022-10-20T00:15:00Z
0.141551736845493
0.64623737123678782
... etc.
Fiddle
It might be very simple, if our tabular data (post creation of the series) looks something like that:
This kind of data might be created by a make-series operator with by clause, e.g. -
make-series count() on Timestamp from ago(7d) to now() step 1d by series_name
In that case, all we need to do is add a filter on the series name, E.g. -
// Data sample generation, including series creation.
// Not part of the solution.
let p_series_num = 100;
let data = materialize
(
range i from 1 to 1000000 step 1
| extend Timestamp = ago(rand()*7d)
,series_name = strcat("series_", substring(strcat("00", tostring(toint(rand(p_series_num)))), -3))
| make-series count() on Timestamp from ago(7d) to now() step 1d by series_name
);
// Solution starts here
// We assume the creation of a parameter named _series, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = 'series_001';
data
| where series_name == _series
| render timechart
series_name
count_
Timestamp
series_001
[1434,1439,1430,1428,1422,1372,1475]
["2022-10-07T15:54:57.3677580Z","2022-10-08T15:54:57.3677580Z","2022-10-09T15:54:57.3677580Z","2022-10-10T15:54:57.3677580Z","2022-10-11T15:54:57.3677580Z","2022-10-12T15:54:57.3677580Z","2022-10-13T15:54:57.3677580Z"]
Fiddle
Here is a solution that match the data structure in your scenario.
* It is the same solution is the other solution I just modified, but since the source data structure is different, I posted an additional answer for learning purposes.
The original data looks something like that:
The code is actually very simple, leveraging column_ifexists():
// Data sample generation. Not part of the solution.
let p_start_time = startofday(ago(1d));
let p_interval = 5m;
let p_rows = 15;
let p_cols = 120;
let data = materialize
(
range Timestamp from p_start_time to p_start_time + p_rows * p_interval step p_interval
| mv-expand MetricID = range(1, p_cols) to typeof(int)
| extend MetricVal = rand(), MetricName = strcat("x", tostring(MetricID))
| evaluate pivot(MetricName, take_any(MetricVal), Timestamp)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _MetricName, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_MetricName'] = "x42";
data
| project Timestamp, column_ifexists(['_MetricName'], real(null))
| render timechart
Timestamp
x42
2022-10-13T00:00:00Z
0.89472385054721115
2022-10-13T00:05:00Z
0.11275174098360444
2022-10-13T00:10:00Z
0.96233152692333268
2022-10-13T00:15:00Z
0.21751913633816042
2022-10-13T00:20:00Z
0.69591667527850931
2022-10-13T00:25:00Z
0.36802228024058203
2022-10-13T00:30:00Z
0.29060518653083045
2022-10-13T00:35:00Z
0.13362332423562559
2022-10-13T00:40:00Z
0.013920161307282448
2022-10-13T00:45:00Z
0.05909880950497
2022-10-13T00:50:00Z
0.146454957813311
2022-10-13T00:55:00Z
0.318823204227693
2022-10-13T01:00:00Z
0.020087435985750794
2022-10-13T01:05:00Z
0.31110660126024159
2022-10-13T01:10:00Z
0.75531136771424379
2022-10-13T01:15:00Z
0.99289833682620265
Fiddle

KQL, time difference between separate rows in same table

I have Sessions table
Sessions
|Timespan|Name |No|
|12:00:00|Start|1 |
|12:01:00|End |2 |
|12:02:00|Start|3 |
|12:04:00|Start|4 |
|12:04:30|Error|5 |
I need to extract from it duration of each session using KQL (but if you could give me suggestion how I can do it with some other query language it would be also very helpful). But if next row after start is also start, it means session was abandoned and we should ignore it.
Expected result:
|Duration|SessionNo|
|00:01:00| 1 |
|00:00:30| 4 |
You can try something like this:
Sessions
| order by No asc
| extend nextName = next(Name), nextTimestamp = next(timestamp)
| where Name == "Start" and nextName != "Start"
| project Duration = nextTimestamp - timestamp, No
When using the operator order by, you are getting a Serialized row set, which then you can use operators such as next and prev. Basically you are seeking rows with No == "Start" and next(Name) == "End", so this is what I did,
You can find this query running at Kusto Samples open database.
let Sessions = datatable(Timestamp: datetime, Name: string, No: long) [
datetime(12:00:00),"Start",1,
datetime(12:01:00),"End",2,
datetime(12:02:00),"Start",3,
datetime(12:04:00),"Start",4,
datetime(12:04:30),"Error",5
];
Sessions
| order by No asc
| extend Duration = iff(Name != "Start" and prev(Name) == "Start", Timestamp - prev(Timestamp), timespan(null)), SessionNo = prev(No)
| where isnotnull(Duration)
| project Duration, SessionNo

Regular Expression in teradata

I need to search few patterns from a column using regular expression in Teradata.
One of the example is mentioned below:
SELECT
REGEXP_SUBSTR(
REGEXP_SUBSTR('1-2-3','([0-9] *- *[0-9] *- *[0-9])',1, 1, 'i'),
'([0-9] *- *[0-9] *- *[0-9])',
1, 1, 'i'
) AS Tmp,
REGEXP_SUBSTR(
tmp,
'(^[0-9])',1,1,'i') || '-' || REGEXP_SUBSTR(tmp,'([0-9]$)',
1, 1, 'i'
) AS final_exp
;
In the above expression, I am extracting "1-3" out of a pattern like "1-2-3". Now the patterns can be anything like: 1-2-3-4-5 or 1-2,3 or 1&2-3 or 1-2,3 &4.
Is there any way that I can generalize the search pattern in regular expression like [-,&]* will only search for occurrence of this characters in order, but the characters can be present in any order in the data.
Few examples mentioned below,need is to fetch all the desired result set using a single pattern serch in expression.
Column name ==> Result
abc 1-2+3- 4 ==> 1-4
def 10,12 & 13 ==> 10-13
ijk 1,2,3, and 4 lmn ==> 1-4
abc1-2 & 3 def ==> 1-3
ikl 11 &12 -13 ==> 11-13
oAy$ 7-8 and 9 ==> 7-9
RegExp_Substr(col, '(\d+)',1, 1, 'c') || '-' ||
RegExp_Substr(col, '(\d+)(?!.*\d)',1, 1, 'c')
(\d+) = first number
(\d+)(?!.*\d) = last number (a number not followed by another number)
There's also no need for those optional parameters, because it's using the defaults anyway:
RegExp_Substr(col, '(\d+)') || '-' ||
RegExp_Substr(col, '(\d+)(?!.*\d)')

Avoid running a function multiple times in a query

I have the following query in Application Insights where I run the parsejson function multiple times in the same query.
Is it possible to reuse the data from the parsejson() function after the first invocation? Right now I call it three times in the query. I am trying to see if calling it just once might be more efficient.
EventLogs
| where Timestamp > ago(1h)
and tostring(parsejson(tostring(Data.JsonLog)).LogId) =~ '567890'
| project Timestamp,
fileSize = toint(parsejson(tostring(Data.JsonLog)).fileSize),
pageCount = tostring(parsejson(tostring(Data.JsonLog)).pageCount)
| limit 10
You can use extend for that:
EventLogs
| where Timestamp > ago(1h)
| extend JsonLog = parsejson(tostring(Data.JsonLog)
| where tostring(JsonLog.LogId) =~ '567890'
| project Timestamp,
fileSize = toint(JsonLog.fileSize),
pageCount = tostring(JsonLog.pageCount)
| limit 10

Resources