How to create a calculated column "Flag" using KQL - azure-data-explorer

timestamp             identifier  EDD                   ward
2022-03-04T09:00:00Z  ab1         2022-03-06T09:00:00Z  h1
2022-03-04T11:45:00Z  ab1         2022-03-07T09:00:00Z  h1
2022-03-05T11:45:00Z  ab1         2022-03-09T09:00:00Z  h1
2022-03-06T11:45:00Z  ab1         2022-03-09T09:00:00Z  G1
2022-03-04T11:45:00Z  xy          2022-03-09T09:00:00Z  A1
2022-03-04T09:00:00Z  bc          2022-03-07T09:00:00Z  S1
2022-03-06T11:45:00Z  abc         2022-03-14T09:00:00Z  G1
2022-03-05T09:00:00Z  bc          2022-03-12T09:00:00Z  S1
2022-03-07T11:45:00Z  xyz         2022-03-10T09:00:00Z  Z1
2022-03-04T11:45:00Z  def         2022-03-09T09:00:00Z  A1
2022-03-06T11:45:00Z  def         2022-03-09T09:00:00Z  R1
2022-03-07T11:45:00Z  def         2022-03-09T09:00:00Z  H1
For every change in EDD within an identifier, the new Flag column should be set to 1.
Expected output:
timestamp             identifier  EDD                   ward  Flag
2022-03-04T09:00:00Z  ab1         2022-03-06T09:00:00Z  h1
2022-03-04T11:45:00Z  ab1         2022-03-07T09:00:00Z  h1    1
2022-03-05T11:45:00Z  ab1         2022-03-09T09:00:00Z  h1    1
2022-03-06T11:45:00Z  ab1         2022-03-09T09:00:00Z  G1
2022-03-04T11:45:00Z  xy          2022-03-09T09:00:00Z  A1
2022-03-04T09:00:00Z  bc          2022-03-07T09:00:00Z  S1
2022-03-06T11:45:00Z  abc         2022-03-14T09:00:00Z  G1
2022-03-05T09:00:00Z  bc          2022-03-12T09:00:00Z  S1    1
2022-03-07T11:45:00Z  xyz         2022-03-10T09:00:00Z  Z1
2022-03-04T11:45:00Z  def         2022-03-09T09:00:00Z  A1
2022-03-06T11:45:00Z  def         2022-03-09T09:00:00Z  R1
2022-03-07T11:45:00Z  def         2022-03-09T09:00:00Z  H1

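You should use the prev() function, and compare the identifier as well so that a change is only flagged within the same identifier:
<Your query>
| sort by identifier asc, timestamp asc
| extend Flag = iff(identifier == prev(identifier) and EDD != prev(EDD), 1, 0)
Note that for prev() to work, the input to the extend operator must be serialized, for example by sorting on the values of some column, as the sort above does. This is because records in Kusto are not ordered.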
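The comparison that prev() performs on a sorted row set can be sketched outside Kusto. The following is a minimal illustration in plain Python, not KQL; the function name `flag_edd_changes` and the dictionary layout are assumptions made for the example, and only a subset of the question's rows is used.

```python
from itertools import groupby
from operator import itemgetter


def flag_edd_changes(rows):
    """Sort rows by (identifier, timestamp) and add a Flag field.

    Flag is 1 when EDD differs from the previous row of the same
    identifier, otherwise 0 (including the first row of each identifier).
    """
    # ISO-8601 timestamps in the same format sort correctly as strings.
    ordered = sorted(rows, key=itemgetter("identifier", "timestamp"))
    flagged = []
    for _, group in groupby(ordered, key=itemgetter("identifier")):
        previous_edd = None  # no previous row yet for this identifier
        for row in group:
            changed = previous_edd is not None and row["EDD"] != previous_edd
            flagged.append({**row, "Flag": int(changed)})
            previous_edd = row["EDD"]
    return flagged


# A subset of the rows from the question (hypothetical in-memory layout).
rows = [
    {"timestamp": "2022-03-04T09:00:00Z", "identifier": "ab1", "EDD": "2022-03-06T09:00:00Z"},
    {"timestamp": "2022-03-04T11:45:00Z", "identifier": "ab1", "EDD": "2022-03-07T09:00:00Z"},
    {"timestamp": "2022-03-05T11:45:00Z", "identifier": "ab1", "EDD": "2022-03-09T09:00:00Z"},
    {"timestamp": "2022-03-06T11:45:00Z", "identifier": "ab1", "EDD": "2022-03-09T09:00:00Z"},
    {"timestamp": "2022-03-04T09:00:00Z", "identifier": "bc", "EDD": "2022-03-07T09:00:00Z"},
    {"timestamp": "2022-03-05T09:00:00Z", "identifier": "bc", "EDD": "2022-03-12T09:00:00Z"},
]
flags = [r["Flag"] for r in flag_edd_changes(rows)]
print(flags)  # [0, 1, 1, 0, 0, 1]
```

Sorting first plays the role of serialization: without a defined order, "previous row" has no meaning.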

Related

Using the output of one query in another in KQL

let average = materialize(FooTable
| summarize avg(value) by group, class
| summarize arg_min(avg_value, class) by group
);
This should output something like (i.e. minimum value for the group averages per class):
group  class  avg_value
G1     C1     100
G2     C2     150
..     ..     ..
Now I would like to display every group, class and value row together with its delta from the group's minimum class average, as calculated by the query above.
FooTable
| where value > ( here I want to insert the query to get min by group and class)
Output should be something like:
group  class  min_avg_value  delta
G1     C1     100            0
G1     C2     120            20
G2     C1     200            50
G2     C2     150            0
..     ..     ..             ..
Thanks for the help in advance!
lookup
let FooTable = datatable (group:string, class:string, value:int)
[
'G1' ,'C1', 100
,'G1' ,'C2', 120
,'G2' ,'C1', 200
,'G2' ,'C2', 150
];
let average = materialize(
FooTable
| summarize avg(value) by group, class
| summarize min(avg_value) by group
);
FooTable
| lookup kind=inner average on group
| extend delta = value - min_avg_value
group  class  value  min_avg_value  delta
G1     C1     100    100            0
G1     C2     120    100            20
G2     C1     200    150            50
G2     C2     150    150            0
join
let FooTable = datatable (group:string, class:string, value:int)
[
'G1' ,'C1', 100
,'G1' ,'C2', 120
,'G2' ,'C1', 200
,'G2' ,'C2', 150
];
let average = materialize(
FooTable
| summarize avg(value) by group, class
| summarize min(avg_value) by group
);
average
| join kind=inner FooTable on group
| extend delta = value - min_avg_value
group  min_avg_value  group1  class  value  delta
G1     100            G1      C1     100    0
G1     100            G1      C2     120    20
G2     150            G2      C1     200    50
G2     150            G2      C2     150    0
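Both answers follow the same two steps: compute the minimum per-class average for each group, then attach it to every original row by group and subtract. A rough sketch of that logic in plain Python follows (not KQL; the function name `deltas_from_min_class_average` and the tuple layout are assumptions made for illustration):

```python
from collections import defaultdict


def deltas_from_min_class_average(rows):
    """rows: iterable of (group, class, value) tuples.

    Returns (group, class, value, min_avg_value, delta) tuples, where
    min_avg_value is the smallest per-class average within the group.
    """
    rows = list(rows)

    # Step 1: average value per (group, class).
    totals = defaultdict(lambda: [0.0, 0])
    for group, cls, value in rows:
        acc = totals[(group, cls)]
        acc[0] += value
        acc[1] += 1

    # Step 2: minimum of those averages per group.
    min_avg = {}
    for (group, _cls), (total, count) in totals.items():
        avg = total / count
        min_avg[group] = min(avg, min_avg.get(group, avg))

    # Step 3: look the minimum up by group for every row and subtract.
    return [(g, c, v, min_avg[g], v - min_avg[g]) for g, c, v in rows]


foo_table = [
    ("G1", "C1", 100),
    ("G1", "C2", 120),
    ("G2", "C1", 200),
    ("G2", "C2", 150),
]
result = deltas_from_min_class_average(foo_table)
for row in result:
    print(row)
```

The dictionary lookup in step 3 corresponds to the inner lookup/join on group: each input row is matched to exactly one minimum for its group.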

How to create a new column in R based on information in a sample table?

I'd like to create a new column in my data that contains the age of samples specified in another dataframe. Here's a sample of my data:
Depth Name X Statistic Total.Cell.Number
1 B1 fcs NA 95208
2 > B1 fcs/Immune Cells 43.40 41276
3 > > B1 fcs/Immune Cells/Single Cells 93.70 38686
4 > > > B1 fcs/Immune Cells/Single Cells/Live 96.90 37506
5 > > > > B1 fcs/Immune Cells/Single Cells/Live/CD45 High 9.10 3413
6 > > > > > B1 fcs/Immune Cells/Single Cells/Live/CD45 High/B Cells 7.76 265
And here's the sample information dataframe:
Sample Age
1 B1 2
2 B2 2
3 B3 2
4 B4 2
5 B5 2
6 B6 2
7 B7 12
8 B8 12
9 B9 12
10 B10 12
11 B11 12
12 B12 12
I would like to create a new column in the original dataframe, Age, that matches the Age specified for each sample in the second dataframe. Here's the catch though: because this will be part of a function with an unknown number of samples and different ages/names every time, I cannot hard code this. Anyone have any ideas?
# Join on the sample name; the key column in the first data frame is "Name" (capital N)
dplyr::left_join(df1, sampleInfo, by = c("Name" = "Sample"))

How to do a match in R with left_join using multiple columns and one "likely" column

I'm trying to match rows between two different data frames in R.
For example one data frame looks like:
df1<- data.frame(description=c("sol 100ml","200 mg","1.5 ml","10MG"),
pa=c("clorbetazol","Milk","Aciclovir","AAC"),
atc=c("x1","a2","a3","x3"))
description pa atc
sol 100ml clorbetazol x1
200 mg Milk a2
1.5 ml Aciclovir a3
10MG AAC x3
And the other one looks like:
df2 <-data.frame(Concentration=c("100","200","1.5","10"),
pa=c("clorbetazol","Milk","Aciclovir","AAC"),
atc=c("x1","a2","a3","x3"),
code=c("A101","A202","A303","A404"))
Concentration pa atc code
100 clorbetazol x1 A101
200 Milk a2 A202
1.5 Aciclovir a3 A303
10 AAC x3 A404
My question is: is there a way to match on the columns "pa" and "atc" and also use the "Concentration" column in some way (with grepl or something similar) in a left join or merge?
In the end I want to get this:
description pa atc code
sol 100ml clorbetazol x1 A101
200 mg Milk a2 A202
1.5 ml Aciclovir a3 A303
10MG AAC x3 A404
I wonder if someone can help me.
Thanks!
You can use a regex to extract the numbers, which you then match with a left join:
library(dplyr)
df1 %>%
mutate(Concentration = gsub("^.*?(\\d+(\\.)?(\\d+)?).*$", "\\1", description)) %>%
left_join(df2, by = c("pa", "atc", "Concentration")) %>%
select(-Concentration)
#> description pa atc code
#> 1 sol 100ml clorbetazol x1 A101
#> 2 200 mg Milk a2 A202
#> 3 1.5 ml Aciclovir a3 A303
#> 4 10MG AAC x3 A404
Using gsub with a regex, then merge.
res <- merge(transform(df1, Concentration=gsub("[^\\d\\.]", "",
description, perl=TRUE)),
df2, all=TRUE)[-3]
res
# pa atc description code
# 1 AAC x3 10MG A404
# 2 Aciclovir a3 1.5 ml A303
# 3 clorbetazol x1 sol 100ml A101
# 4 Milk a2 200 mg A202
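The idea shared by both answers is to derive an exact key from the free-text description (the first number in it) and then join on all three columns. A minimal sketch of that idea in plain Python, assuming the rows are held as dictionaries; `attach_codes` and `NUMBER` are names invented for this example:

```python
import re

# First integer or decimal number in a string, e.g. "100" or "1.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def attach_codes(df1_rows, df2_rows):
    """Left-join df2's code onto df1 by (pa, atc, extracted concentration)."""
    lookup = {
        (r["pa"], r["atc"], r["Concentration"]): r["code"] for r in df2_rows
    }
    joined = []
    for row in df1_rows:
        match = NUMBER.search(row["description"])
        concentration = match.group(0) if match else None
        # Rows without a match keep code=None, like NA in a left join.
        code = lookup.get((row["pa"], row["atc"], concentration))
        joined.append({**row, "code": code})
    return joined


df1 = [
    {"description": "sol 100ml", "pa": "clorbetazol", "atc": "x1"},
    {"description": "200 mg", "pa": "Milk", "atc": "a2"},
    {"description": "1.5 ml", "pa": "Aciclovir", "atc": "a3"},
    {"description": "10MG", "pa": "AAC", "atc": "x3"},
]
df2 = [
    {"Concentration": "100", "pa": "clorbetazol", "atc": "x1", "code": "A101"},
    {"Concentration": "200", "pa": "Milk", "atc": "a2", "code": "A202"},
    {"Concentration": "1.5", "pa": "Aciclovir", "atc": "a3", "code": "A303"},
    {"Concentration": "10", "pa": "AAC", "atc": "x3", "code": "A404"},
]
codes = [r["code"] for r in attach_codes(df1, df2)]
print(codes)  # ['A101', 'A202', 'A303', 'A404']
```

The match is on the extracted text, so "1.5" and "1.50" would not be treated as equal; converting both sides to numbers first would be one way to relax that.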

How to subset and apply a function across the dataset

I have a list of prices for different items in the same dataset.
abc1 <- c("2005-09-18", "ABC", 99.00)
abc2 <- c("2005-09-19", "ABC", 98.00)
abc3 <- c("2005-09-20", "ABC", 98.50)
abc4 <- c("2005-09-21", "ABC", 97.75)
def1 <- c("2005-09-14", "DEF", 79.00)
def2 <- c("2005-09-15", "DEF", 78.00)
def3 <- c("2005-09-16", "DEF", 78.50)
def4 <- c("2005-09-20", "DEF", 77.75)
df <- data.frame(rbind(abc1, abc2, abc3, abc4, def1, def2, def3, def4))
The code above produces:
X1 X2 X3
abc1 2005-09-18 ABC 99
abc2 2005-09-19 ABC 98
abc3 2005-09-20 ABC 98.5
abc4 2005-09-21 ABC 97.75
def1 2005-09-14 DEF 79
def2 2005-09-15 DEF 78
def3 2005-09-16 DEF 78.5
def4 2005-09-20 DEF 77.75
I would like to add a column, say X4, holding the variation of each day versus the previous day for a given X2. X4 would then have the following values:
X4
0,0%
-1,0%
0,5%
-0,8%
0,0%
-1,3%
0,6%
-1,0%
The goal is to do that for all the different items in X2, ideally without splitting the table. I think the dates will always be in the right order, but I would like to be safe just in case.
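We can group by 'X2' and take the difference of adjacent elements with diff. Because df was built with rbind() on character vectors, X3 is not numeric and has to be converted first:
library(dplyr)
# X3 was created as character/factor by rbind(); convert it to numeric
df$X3 <- as.numeric(as.character(df$X3))
df %>%
group_by(X2) %>%
mutate(X4 = c(0, diff(X3)))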
Or, after grouping by 'X2', take the difference between 'X3' and the lag of 'X3':
df %>%
group_by(X2) %>%
mutate(X4 = X3 - lag(X3, default = first(X3)))
Just a little hint: You wanted to calculate the difference in percent, not the absolute difference.
You have to adjust the formula to do so, otherwise your results are wrong :-).
df %>%
dplyr::group_by(X2) %>%
dplyr::mutate(X4 = (X3/lag(X3, default = first(X3)) - 1) * 100)
X1 X2 X3 X4
<fct> <fct> <dbl> <dbl>
1 2005-09-18 ABC 99 0
2 2005-09-19 ABC 98 -1.01
3 2005-09-20 ABC 98.5 0.510
4 2005-09-21 ABC 97.8 -0.761
5 2005-09-14 DEF 79 0
6 2005-09-15 DEF 78 -1.27
7 2005-09-16 DEF 78.5 0.641
8 2005-09-20 DEF 77.8 -0.955

Creating a new column in R with help of 3 existing columns

I want to create a new column "non_coded" from three existing columns: allele_2, allele_1 and A1.
The conditions I want satisfied are:
if allele_2 == A1 then non_coded = allele_1
if allele_2 != A1 then non_coded = allele_2
Thanks in advance,
Rad
OK This is what the data looks like:
SNPID chrom STRAND IMPUTED allele_2 allele_1 MAF CALL_RATE HET_RATE
1 rs1000000 12 + Y A G 0.12160 1.00000 0.2146
2 rs10000009 4 + Y G A 0.07888 0.99762 0.1386
HWP RSQ PHYS_POS A1 M1_FRQ M1_INFO M1_BETA M1_SE M1_P
1 1.0000 0.9817 125456933 A 0.1173 0.9452 -0.0113 0.0528 0.83090
2 0.1164 0.8354 71083542 A 0.9048 0.9017 -0.0097 0.0593 0.87000
The code I tried:
Hy_MVA$non_coded <- ifelse(Hy_MVA$allele_2 == Hy_MVA$A1, Hy_MVA$allele_1, Hy_MVA$allele_2)
result:
SNPID chrom STRAND IMPUTED allele_2 allele_1 MAF CALL_RATE HET_RATE
1 rs1000000 12 + Y A G 0.12160 1.00000 0.2146
2 rs10000009 4 + Y G A 0.07888 0.99762 0.1386
HWP RSQ PHYS_POS A1 M1_FRQ M1_INFO M1_BETA M1_SE M1_P non_coded
1 1.0000 0.9817 125456933 A 0.1173 0.9452 -0.0113 0.0528 0.83090 3
2 0.1164 0.8354 71083542 A 0.9048 0.9017 -0.0097 0.0593 0.87000 3
What I want:
SNPID chrom STRAND IMPUTED allele_2 allele_1 MAF CALL_RATE HET_RATE
1 rs1000000 12 + Y A G 0.12160 1.00000 0.2146
2 rs10000009 4 + Y G A 0.07888 0.99762 0.1386
HWP RSQ PHYS_POS A1 M1_FRQ M1_INFO M1_BETA M1_SE M1_P non_coded
1 1.0000 0.9817 125456933 A 0.1173 0.9452 -0.0113 0.0528 0.83090 G
2 0.1164 0.8354 71083542 A 0.9048 0.9017 -0.0097 0.0593 0.87000 G
As Chase said, use ifelse(). I guess the code then becomes:
non_coded <- ifelse(allele_2 == A1, allele_1, allele_2)
Edit
After seeing the updated question, it makes sense that you get numbers, because allele_1 and allele_2 are factors. Adding as.character() should fix this:
A1 <- c("A","A","B")
allele_1 <- as.factor(c("A","C","C"))
allele_2 <- as.factor(c("A","B","B"))
non_coded <- ifelse(allele_2 == A1, as.character(allele_1), as.character(allele_2))
non_coded
[1] "A" "B" "C"
Since you want non_coded to be one of two values:
Hy_MVA$non_coded <- Hy_MVA$allele_2
Hy_MVA$non_coded[Hy_MVA$allele_2 == Hy_MVA$A1] <- Hy_MVA$allele_1[Hy_MVA$allele_2 == Hy_MVA$A1]
That replaces values with allele_1 values in only the rows where allele_2 == A1. It sounds as though you might have a problem with ifelse converting a factor to a numeric.
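Why the question's result shows 3 in both rows can be illustrated by modelling a factor as a list of levels plus 1-based integer codes, which is how R stores it. The sketch below is plain Python, not R, and it assumes the levels are A, C, G, T (the question does not show them); picking values without converting to labels returns the codes, which is what ifelse() does with factors.

```python
# Assumed factor levels; the real levels in Hy_MVA are not shown in the question.
LEVELS = ["A", "C", "G", "T"]


def to_codes(labels):
    """Encode labels as 1-based integer codes, as R does for factors."""
    return [LEVELS.index(label) + 1 for label in labels]


def to_labels(codes):
    """Decode 1-based codes back to labels (the as.character() step)."""
    return [LEVELS[code - 1] for code in codes]


allele_2 = to_codes(["A", "G"])  # the two rows from the question
allele_1 = to_codes(["G", "A"])
a1 = ["A", "A"]

# The comparison uses the labels, like allele_2 == A1 in R.
test = [label == ref for label, ref in zip(to_labels(allele_2), a1)]

# Picking from the raw codes reproduces the unwanted numeric result.
raw = [c1 if t else c2 for t, c1, c2 in zip(test, allele_1, allele_2)]

# Converting to labels before picking gives the intended letters.
fixed = [
    l1 if t else l2
    for t, l1, l2 in zip(test, to_labels(allele_1), to_labels(allele_2))
]

print(raw)    # [3, 3]
print(fixed)  # ['G', 'G']
```

Under that assumption G is the third level, so both rows come back as 3 until the values are converted to character.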
