Creating a friendly name from a substring in Kusto? - azure-data-explorer

I have the following dataset in kusto:
let data = datatable(Timestamp:datetime, Name:string, Value:int)
[
datetime(2022-02-18 10:00:00 AM), "AX_100A_A00", 100,
datetime(2022-02-18 10:01:00 AM), "BX_101B_B00", 200,
datetime(2022-02-18 10:02:00 AM), "CX_102C_C00", 300,
datetime(2022-02-18 10:03:00 AM), "DX_103D_D00", 400,
datetime(2022-02-18 10:04:00 AM), "EX_104E_E00", 500,
];
data
| summarize result = max(Value) by Floor_Name = substring(Name, 3, 4)
To illustrate what I am trying to achieve here: between the two underscores there is a code that represents a specific location. I need to replace each value between those two underscores with a friendly name. Please note this is just a sample; in my real scenario I need to replace 50 names. I don't know whether I should define a dictionary variable that takes the old name as KEY and the new name as VALUE, then check for the existing key and replace it, or whether there is a better way to do it. I need to achieve the following:
100A --> New York
101B --> Geneva
102C --> France
103D --> US
104E --> Canada

You can use the dynamic data type, or bag_pack() / pack():
let data = datatable(Timestamp:datetime, Name:string, Value:int)
[
datetime(2022-02-18 10:00:00 AM), "AX_100A_A00", 100,
datetime(2022-02-18 10:01:00 AM), "BX_101B_B00", 200,
datetime(2022-02-18 10:02:00 AM), "CX_102C_C00", 300,
datetime(2022-02-18 10:03:00 AM), "DX_103D_D00", 400,
datetime(2022-02-18 10:04:00 AM), "EX_104E_E00", 500,
];
let mydict = dynamic(
{
"100A":"New York"
,"101B":"Geneva"
,"102C":"France"
,"103D":"US"
,"104E":"Canada"
}
);
data
| summarize result = max(Value) by Floor_Name = tostring(mydict[substring(Name, 3, 4)])
Floor_Name  result
New York    100
Geneva      200
France      300
US          400
Canada      500
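For readers outside Kusto, the same dictionary-lookup idea can be sketched in plain Python (the mapping and substring positions mirror the query above; the variable names are illustrative, not part of the original):

```python
# Map the 4-character location code (characters 3..6 of the name)
# to a friendly name, then take the max value per friendly name.
rows = [
    ("AX_100A_A00", 100),
    ("BX_101B_B00", 200),
    ("CX_102C_C00", 300),
]
friendly = {"100A": "New York", "101B": "Geneva", "102C": "France"}

result = {}
for name, value in rows:
    code = name[3:7]                       # substring(Name, 3, 4) in Kusto
    floor_name = friendly.get(code, code)  # fall back to the raw code
    result[floor_name] = max(result.get(floor_name, value), value)

print(result)
```

With 50 mappings the dictionary simply grows; the lookup stays a single line.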

Related

Get Other columns based on max of one column in Kusto

I am trying to write a Kusto query to find the record that has the max value in one column, grouped by another column, but I also need the third (remaining) column with it.
Let there be three columns: A (timestamp), B (impvalue: number), and C (anothervalue: string).
I need to get records grouped by C with the max timestamp and the corresponding B column.
In SQL I know how to do this using a self join. I am new to Kusto; I tried a few combinations of summarize, join, and top operators but wasn't able to make it work.
You can use the arg_max() aggregation function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/arg-max-aggfunction.
For example:
datatable(A:datetime, B:long, C:string)
[
datetime(2020-08-20 12:00:00), 50, "abc",
datetime(2020-08-20 12:10:00), 30, "abc",
datetime(2020-08-20 12:05:00), 100, "abc",
datetime(2020-08-20 12:00:00), 40, "def",
datetime(2020-08-20 12:05:00), 120, "def",
datetime(2020-08-20 12:10:00), 80, "def",
]
| summarize arg_max(A, *) by C
C    A                            B
abc  2020-08-20 12:10:00.0000000  30
def  2020-08-20 12:10:00.0000000  80
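For comparison, the same arg_max pattern sketched in Python/pandas (not from the thread): `idxmax` picks, per group, the index of the row with the latest timestamp.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2020-08-20 12:00:00", "2020-08-20 12:10:00",
                         "2020-08-20 12:05:00", "2020-08-20 12:05:00",
                         "2020-08-20 12:10:00"]),
    "B": [50, 30, 100, 120, 80],
    "C": ["abc", "abc", "abc", "def", "def"],
})

# arg_max(A, *) by C: keep, for each group C, the whole row where A is maximal
out = df.loc[df.groupby("C")["A"].idxmax()]
print(out)
```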
This isn't the most elegant solution, but it works:
let X = datatable (a: string, b: int, c: string) [
"8/24/2021, 12:40:00.042 PM", 50, "abc",
"8/24/2021, 12:40:10.042 PM", 30, "abc",
"8/24/2021, 12:40:05.042 PM", 100, "abc",
"8/24/2021, 12:40:00.042 PM", 40, "def",
"8/24/2021, 12:40:05.042 PM", 120, "def",
"8/24/2021, 12:40:10.042 PM", 80, "def"
];
X
// note: a is a string here, so max() compares lexicographically
| summarize Answer = max(a) by c
| join X on $left.Answer == $right.a, $left.c == $right.c
| project a, b, c

R: aggregating list of time series by period

What would be the best way to aggregate several time series together by reference period? Ideally using ts objects only.
For example, I have two monthly series TS1 and TS2, I want to get TSTOT:
TIME_PERIOD TS1 TS2 TSTOT
2000-01-01 25 25 50
2000-02-01 35 30 65
2000-03-01 40 30 70
I have several ts objects, so I could imagine some function working with a list.
Thank you!
If these are ts objects, we can use merge:
ts1 <- ts(c(25, 35, 40), start = c(2000, 1), freq = 12)
ts2 <- ts(c(25, 30, 30), start = c(2000, 1), freq = 12)
# merge the two series by their time index, then sum the merged columns
transform(merge(ts1, ts2, by = "row.names"), TSTOT = x.x + x.y)
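An equivalent sketch in Python/pandas (an assumption of this write-up, not from the thread): align the series on their dates and sum; `pd.concat` generalizes naturally to a whole list of series, which is what the question asks for.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="MS")  # month starts
ts1 = pd.Series([25, 35, 40], index=idx)
ts2 = pd.Series([25, 30, 30], index=idx)

# works for any number of series: align on the index, then sum row-wise
series_list = [ts1, ts2]
tstot = pd.concat(series_list, axis=1).sum(axis=1)
print(tstot)
```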

Calculate count of zeros R for specific case

I have data that looks like below
I am trying to calculate when Unit 1 went to 0 and at what time it became greater than 0 again. Suppose Unit 1 first drops to zero at 01/04/2019 02:00 and stays at zero till 01/04/2019 03:00; that should be counted as 1. The second time it goes to zero at 01/04/2019 04:30 and stays at zero till 01/04/2019 05:00, which will be counted as 2, and the same calculation for the other units.
Additionally, I am looking to capture the time difference: the first time Unit 1 went to 0 for 2 hours, the second time for 1 hour, something like this.
I am thinking this could be done with an if statement that counts until the value is greater than zero, with a loop updating the count.
I am struggling with how to incorporate time into that.
The final result should be
Unit | Went Offline| Came online
Unit 1| 01/04/2019 02:00 | 01/04/2019 03:00
Unit 1| 01/04/2019 04:30 | 01/04/2019 05:00
I would prefer some pseudo-code to start with. But here is an example solution to begin with.
library(dplyr)
# create data frame
date = format(seq(as.POSIXct("2019-04-01 00:00:00", tz="GMT"),
length.out=15, by='30 min'), '%Y-%m-%d %H:%M:%S')
unit1 = c(513, 612, 653, 0, 0, 0, 530, 630, 0, 0, 650, 512, 530, 650, 420)
data = data.frame(date, unit1)
# keep only the rows where the unit is running (non-zero)
data1 = data[data$unit1 != 0,]
# for each running row, look ahead to the next running timestamp
data1$dateTo = lead(data1$date, 1)
# calculate the time difference between consecutive running rows
data1$timediff = as.numeric(difftime(data1$dateTo, data1$date, units = "mins"))
# a gap of more than 30 minutes between running rows marks an offline period
data2 = subset.data.frame(data1, timediff > 30)
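The same run-detection idea can be sketched in Python/pandas (names and data are illustrative, not from the thread): flag the zero runs, then take the first timestamp of each run as "went offline" and the first non-zero timestamp after it as "came online".

```python
import pandas as pd

times = pd.date_range("2019-04-01 00:00", periods=15, freq="30min")
unit1 = [513, 612, 653, 0, 0, 0, 530, 630, 0, 0, 650, 512, 530, 650, 420]
df = pd.DataFrame({"date": times, "unit1": unit1})

is_zero = df["unit1"].eq(0)
# a new run id every time the zero/non-zero state flips
run_id = is_zero.ne(is_zero.shift()).cumsum()

events = []
for _, run in df[is_zero].groupby(run_id[is_zero]):
    went_offline = run["date"].iloc[0]
    after = df.loc[run.index[-1] + 1:, "date"]            # rows after the run
    came_online = after.iloc[0] if len(after) else pd.NaT
    events.append((went_offline, came_online))

print(events)
```

The difference `came_online - went_offline` then gives the duration of each outage directly.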

Creating new label and computing the corresponding value in existing columns using dplyr

I would like to create a new label in an existing column (e.g. column A) and to create a computed value in the same row in another existing column (e.g. column B).
Simulated data looks like the following:
df <- data.frame(date = as.Date(c("31-Dec-2018", "31-Dec-2018", "31-Dec-2018", "30-Sep-2018", "30-Sep-2018", "30-Jun-2018", "30-Jun-2018",
"31-Mar-2018", "31-Mar-2018"), format = "%d-%b-%Y"),
metric = c("Revenue", "Profit", "Restructuring Cost", "Revenue", "Profit", "Revenue", "Profit", "Revenue", "Profit"),
value = c(100, 50, 10, 100, 50, 90, 44, 97, 60))
There are three columns (date, financial metric, and the corresponding value for that financial metric for that particular date). For example, I would like to compute the net profit margin for each date (Profit for particular date divided by revenue for that same date). However, mutate does it wrongly; it creates a new computed column. I want the "Net Margin" label to be created in the existing "metric" column and the corresponding net margin value in the "value" column.
What I have done thus far (which is wrong) is the following:
test <- df %>%
group_by(date) %>%
mutate(net_margin = round(value/lag(value), digits = 2))
I am not sure of how to call for the metric as well. My above code uses the value of the previous row, but this may not be the case all the time.
The desired output would look something like the following:
Thanks!
We can summarise by date, calculate the ratio of the value at "Profit" to the value at "Revenue", and bind the rows to the original dataframe.
library(dplyr)
df %>%
group_by(date) %>%
summarise(value = round(value[metric == "Profit"]/value[metric == "Revenue"], 2),
metric = "Net Margin") %>%
bind_rows(df) %>%
arrange(date)
# date value metric
# <date> <dbl> <chr>
# 1 2018-03-31 0.62 Net Margin
# 2 2018-03-31 97 Revenue
# 3 2018-03-31 60 Profit
# 4 2018-06-30 0.49 Net Margin
# 5 2018-06-30 90 Revenue
# 6 2018-06-30 44 Profit
# 7 2018-09-30 0.5 Net Margin
# 8 2018-09-30 100 Revenue
# 9 2018-09-30 50 Profit
#10 2018-12-31 0.5 Net Margin
#11 2018-12-31 100 Revenue
#12 2018-12-31 50 Profit
#13 2018-12-31 10 Restructuring Cost
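The same compute-and-append pattern sketched in Python/pandas (an illustration, not from the thread; column names follow the question):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2018-12-31", "2018-12-31", "2018-06-30", "2018-06-30"],
    "metric": ["Revenue", "Profit", "Revenue", "Profit"],
    "value": [100, 50, 90, 44],
})

# per date: Profit / Revenue, rounded, labelled as a new metric row
wide = df.pivot(index="date", columns="metric", values="value")
margins = (wide["Profit"] / wide["Revenue"]).round(2).reset_index(name="value")
margins["metric"] = "Net Margin"

# append the new rows to the original long-format frame
out = pd.concat([df, margins], ignore_index=True).sort_values("date")
print(out)
```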

add data.frame column by using a look up table and linear interpolation

I'm sure this is a very common method, but I'm having trouble stating it accurately. I have a long data.frame with three columns: a date_time column, a numeric column (df$property1), and a string column.
I have another data.frame serving as a lookup table providing a relationship between "property1" and another numeric property, "property2".
I'd like to add a df$property2 column to df, approximated by linear interpolation of df$property1 against the lookup table's property1/property2 relationship. For example, if df$property1 happened to be 10, df$property2 would be 20; if df$property1 happened to be 145, df$property2 would be somewhere under, but pretty close to, 1500.
I'm hoping to learn how to create df$property2 efficiently and am interested in learning tidyverse and non-tidyverse methods.
library(tidyverse)
# create example data frame needing new column
date_time <- seq(from=as.POSIXct("2015-12-10 12:00", tz="GMT"),
to=as.POSIXct("2015-12-10 18:00", tz="GMT"), by="1 hours")
property1 <- c(1,45,12,99, 105,3,149)
df1 <- data.frame(date_time, property1) %>% mutate(class = "a")
property1 <- c(50,10,66,147, 11,190,80)
df2 <- data.frame(date_time, property1) %>% mutate(class = "b")
df <- rbind(df1, df2)
# create example look up table
property1_lookup <- c(1, 10, 15, 50, 100, 150, 99999)
property2_lookup <- c(0.001, 20, 30, 100, 500, 1500, 1501)
lookup <- data.frame(property1_lookup, property2_lookup)
Thank you.
I think this is fairly straightforward:
df$property2 = approx(x = lookup$property1_lookup,
y = lookup$property2_lookup,
xout = df$property1)$y
head(df)
# date_time property1 class property2
# 1 2015-12-10 12:00:00 1 a 0.001000
# 2 2015-12-10 13:00:00 45 a 90.000000
# 3 2015-12-10 14:00:00 12 a 24.000000
# 4 2015-12-10 15:00:00 99 a 492.000000
# 5 2015-12-10 16:00:00 105 a 600.000000
# 6 2015-12-10 17:00:00 3 a 4.445222
I'll leave it to you whether or not linear interpolation is appropriate... from your data a logarithmic interpolation might do better.
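Outside R, the same piecewise-linear lookup can be sketched with numpy.interp (same lookup table as above; not part of the thread):

```python
import numpy as np

property1_lookup = [1, 10, 15, 50, 100, 150, 99999]
property2_lookup = [0.001, 20, 30, 100, 500, 1500, 1501]

property1 = np.array([1, 45, 12, 99, 105, 3, 149])
# linear interpolation of each property1 value against the lookup table
property2 = np.interp(property1, property1_lookup, property2_lookup)
print(property2)
```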
