How to create a window of arbitrary size in Kusto? - azure-data-explorer

Using the prev() function, I can access previous rows individually.
mytable
| sort by Time asc
| extend mx = max_of(prev(Value, 1), prev(Value, 2), prev(Value, 3))
How can I define a window to aggregate over in a more generic way? Say I need the maximum of the 100 values in the previous rows. How can I write a query that does not require repeating prev() 100 times?

This can be achieved by combining scan and series_stats_dynamic().
scan is used to create an array of last x values, per record.
series_stats_dynamic() is used to get the max value of each array.
// Data sample generation. Not part of the solution
let mytable = materialize(range i from 1 to 15 step 1 | extend Time = ago(1d*rand()), Value = toint(rand(100)));
// Solution starts here
let window_size = 3; // >1
mytable
| order by Time asc
| scan declare (last_x_vals:dynamic)
with
(
step s1 : true => last_x_vals = array_concat(array_slice(s1.last_x_vals, -window_size + 1, -1), pack_array(Value));
)
| extend toint(series_stats_dynamic(last_x_vals).max)
i   Time                          Value  last_x_vals  max
5   2022-06-10T11:25:49.9321294Z  45     [45]         45
14  2022-06-10T11:54:13.3729674Z  82     [45,82]      82
2   2022-06-10T13:25:40.9832745Z  44     [45,82,44]   82
1   2022-06-10T17:38:28.3230397Z  24     [82,44,24]   82
7   2022-06-10T18:29:33.926463Z   17     [44,24,17]   44
15  2022-06-10T19:54:33.8253844Z  9      [24,17,9]    24
3   2022-06-10T20:17:46.1347592Z  43     [17,9,43]    43
12  2022-06-11T00:02:55.5315197Z  94     [9,43,94]    94
9   2022-06-11T00:11:18.5924511Z  61     [43,94,61]   94
11  2022-06-11T00:39:40.6858444Z  38     [94,61,38]   94
4   2022-06-11T03:54:59.418534Z   84     [61,38,84]   84
10  2022-06-11T05:55:38.2904242Z  6      [38,84,6]    84
6   2022-06-11T07:25:43.3977923Z  36     [84,6,36]    84
13  2022-06-11T09:36:08.7904844Z  28     [6,36,28]    36
8   2022-06-11T09:51:45.2225391Z  73     [36,28,73]   73
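To sanity-check the windowing logic outside Kusto, here is a minimal Python sketch of the same idea: keep the last window_size values in a bounded buffer and take the maximum at every row. This is not Kusto code and not part of the original answer; the function name rolling_max is hypothetical, and the input is the Value column from the sample output above, in Time order. Like the scan query, the window includes the current row.

```python
from collections import deque


def rolling_max(values, window_size):
    """For each position, return the max of the last `window_size` values,
    current value included (mirrors the scan + series_stats_dynamic query)."""
    window = deque(maxlen=window_size)  # the oldest value is dropped automatically
    result = []
    for value in values:
        window.append(value)
        result.append(max(window))
    return result


# Value column from the sample output, already sorted by Time
values = [45, 82, 44, 24, 17, 9, 43, 94, 61, 38, 84, 6, 36, 28, 73]
maxes = rolling_max(values, 3)
print(maxes)
```

The first two results are computed over shorter windows ([45] and [45,82]), just as the last_x_vals column shows for the first two rows.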

Related

Populating a column with values from two of several columns based on value in another column

I have a data cleaning/transformation problem which I've solved in a way which I'm 1,000% sure could have been solved much more simply.
Below is an example of what my data looks like initially. The first four columns are numbers I'll use for a lookup, the next is the type of the item, and the last two columns are the ones I want to fill. Based on the value of the type column, I would like to fill in the value_one and value_two columns with the values of the same-numbered column of the matching type: either one_apple and two_apple, or one_orange and two_orange. For example, if the type in the first row is "apple", I would like to fill value_one with the value of one_apple for that row, and value_two with the value of two_apple from that row.
one_apple one_orange two_apple two_orange type value_one value_two
1 23 56 90 orange NA NA
2 24 57 91 orange NA NA
3 25 58 92 apple NA NA
4 26 59 93 apple NA NA
5 27 60 94 orange NA NA
6 28 61 95 apple NA NA
...
This is what I would like that dataframe to look like after I run my code:
one_apple one_orange two_apple two_orange type value_one value_two
1 23 56 90 apple 1 56
2 24 57 91 orange 24 91
3 25 58 92 apple 3 58
4 26 59 93 apple 4 59
5 27 60 94 apple 5 60
6 28 61 95 apple 6 61
...
The way I have solved this right now is to use a for loop, which figures out the index of the columns matching the type value in that row, which(str_sub(names(example_data), start = 5) == example_data$type[i]). Then I use that index to select the correct value for the value_one column from the appropriate place, example_data[i,...)[1]] and assign it to value_one. I do the same thing for value_two.
Below I have code which first creates an example dataset like the one I want to transform, and then shows my for loop running on it to transform the data.
library(stringr) # provides str_sub()
example_data = data.frame(one_apple = 1:(1+30), one_orange = 23:(23+30), two_apple = 56:(56+30), two_orange = 90:(90+30), type = sample(c("apple","orange"), 31, replace = T), value_one = rep(NA,31), value_two = rep(NA,31))
for(i in 1:nrow(example_data)){
  # columns whose name, from the 5th character on, equals this row's type
  example_data$value_one[i] = example_data[i,which(str_sub(names(example_data), start = 5) == example_data$type[i])[1]]
  example_data$value_two[i] = example_data[i,which(str_sub(names(example_data), start = 5) == example_data$type[i])[2]]
}
This transformation works, but it is clearly not great code and I feel like I'm missing an easier way to do it with apply and without the convoluted use of which to grab column indexes and stuff. It would be very helpful to see a better way to do this.
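One simpler way to think about the lookup is to build the source column name directly from the type value instead of searching for column indexes. The sketch below illustrates that idea in Python rather than R, with each row held as a plain dictionary; the function name fill_values and the three sample rows are made up for illustration and only reuse the column names from the question.

```python
def fill_values(rows):
    """Fill value_one / value_two from the columns whose suffix matches the row's type."""
    for row in rows:
        fruit = row["type"]  # "apple" or "orange"
        row["value_one"] = row["one_" + fruit]
        row["value_two"] = row["two_" + fruit]
    return rows


# Hypothetical rows using the question's column names
rows = [
    {"one_apple": 1, "one_orange": 23, "two_apple": 56, "two_orange": 90, "type": "apple"},
    {"one_apple": 2, "one_orange": 24, "two_apple": 57, "two_orange": 91, "type": "orange"},
    {"one_apple": 3, "one_orange": 25, "two_apple": 58, "two_orange": 92, "type": "apple"},
]
filled = fill_values(rows)
for row in filled:
    print(row["type"], row["value_one"], row["value_two"])
```

The same name-building idea should carry over to R, but the exact vectorized form there is left open here.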

How to extend a hash with multiple values in R

So I understand that in R, a hash() is similar to a dictionary. I would like to extract specific values from my dataframe and put them in to a hash.
The componentindex column is where I have my keys, and the cluster.index and UniqueFileSourcesCount columns contain my values. So for the same key I have multiple values, e.g. hash {91: [1,15], [22,99], etc.}
So I would like to create a hash that contains each key with multiple values, but I'm not sure how to do that.
mini_df <- head(df,10) #using a small df
compID <- unique(mini_df$componentindex) #list with unique keys
h1 <- hash()
for (i in 1:length(mini_df)){
if(compID == mini_df[i,"componentindex"]){
h1 <- hash(mini_df[i,"componentindex"] ,c(mini_df[i,"cluster.index"],mini_df[i,"UniqueFileSourcesCount"]))
}
#h2 <- append(h2,h1)
}
If I print h1, I end up having only the last value:
<hash> containing 1 key-value pair(s).
91 : 42 5
I understand why: I don't append to this hash but overwrite it. I'm not sure how to append to or expand hashes in R, and I have not been able to find a solution yet.
mini_df:
UniqueFileSourcesCount cluster.index componentindex
1 15 1 91
2 15 10 -1
3 99 22 91
4 63 23 1675
5 12 25 91
6 6 27 91
7 50 37 91
8 5 42 91
9 2 43 -1
10 2 69 -1
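The structure being asked for, one key mapping to a growing list of value pairs, can be sketched as follows. This is a Python illustration of the target shape, not a solution using the R hash package; the variable name groups is hypothetical, and the rows are copied from mini_df above.

```python
from collections import defaultdict

# (UniqueFileSourcesCount, cluster.index, componentindex) rows from mini_df
mini_df = [
    (15, 1, 91), (15, 10, -1), (99, 22, 91), (63, 23, 1675), (12, 25, 91),
    (6, 27, 91), (50, 37, 91), (5, 42, 91), (2, 43, -1), (2, 69, -1),
]

# Append to the list stored under each key instead of replacing it
groups = defaultdict(list)
for count, cluster, component in mini_df:
    groups[component].append((cluster, count))

for component, pairs in groups.items():
    print(component, pairs)
```

The key point is that each iteration appends to the existing entry for the key, whereas the loop in the question creates a fresh hash every time.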

Splitting a matrix into multiple matrices

There are two matrices:
Matrix with 2 columns: node name and node degree (k1):
Matrix with 1 column: degrees (ms):
I need to split the first matrix into multiple matrices, where every matrix contains the nodes of the same degree, and then write the matrices to CSV files. But my code is not working. How can I do this correctly?
k1<-read.csv2("VandD.csv", header = FALSE)
fnk1<-as.matrix(k1)
ms<-read.csv2("mas.csv", header = FALSE)
massive<-as.matrix(ms)
wlk<-1
varbl<-1
rtt<-list()
for (wlk in 1:384) {
rtt<-NULL
stepen<-massive[wlk]
for (varbl in 1:2154) {
if(fnk1[varbl,2]==stepen){
kapa<-fnk1[varbl,1]
rtt<-append(rtt,kapa)
}
}
namef<-paste("reslt",stepen,".csv",sep = "")
write.csv2(rtt, file=namef)
}
k1
V1 V2
1 UC7Ucs42FZy3uYzjrqzOIHsw 81
2 UCyWDmyZRjrGHeKF-ofFsT5Q 81
3 UCIZP6nCTyU9VV0zIhY7q1Aw 81
4 UCqk3CdGN_j8IR9z4uBbVPSg 81
5 UCjWzQkWu0l1yAhcBoavokng 81
6 UCRXiA3h1no_PFkb1JCP0yMA 81
7 UC2w9SdXpwq2Uq-MV4W4A8kw 81
8 UCdJqTQJZleoxZFReiyNvn8w 81
9 UC2Qw1dzXDBAZPwS7zm37g8g 81
10 UCTOovOHTf4efJOmGvJBxIQQ 81
ms
V1
1 81
2 82
3 83
4 84
5 85
6 86
7 87
8 88
9 89
10 90
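It seems you need split. Note that the column is named V2 (capital V), because the file was read with header = FALSE:
split(k1, k1$V2)
We can also use group_split:
library(dplyr)
k1 %>%
  group_split(V2)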
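For readers who want to see the whole task (group by degree, then write one file per group) spelled out, here is a hedged Python sketch. It is not the R answer above; the helper names split_by_degree and write_groups and the three sample rows are hypothetical. The file name pattern reslt<degree>.csv and the semicolon separator follow the question's paste() call and write.csv2.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_by_degree(rows):
    """Group node names by their degree."""
    groups = defaultdict(list)
    for name, degree in rows:
        groups[degree].append(name)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one semicolon-separated file per degree; return the paths written."""
    paths = []
    for degree, names in sorted(groups.items()):
        path = os.path.join(out_dir, f"reslt{degree}.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle, delimiter=";").writerows([name] for name in names)
        paths.append(path)
    return paths


# Hypothetical (node name, degree) rows
rows = [("nodeA", 81), ("nodeB", 81), ("nodeC", 82)]
groups = split_by_degree(rows)
out_dir = tempfile.mkdtemp()  # temporary folder so the sketch leaves no files behind elsewhere
paths = write_groups(groups, out_dir)
print([os.path.basename(p) for p in paths])
```

Grouping once and then looping over the groups avoids the nested loop over every degree and every row used in the question.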

Flip Every Nth Coin in R [duplicate]

My friend gave me a brain teaser that I wanted to try on R.
Imagine 100 coins in a row, with heads facing up for all coins. Now every 2nd coin is flipped (thus becoming tails). Then every 3rd coin is flipped. How many coins are now showing heads?
To create the vector, I started with:
flips <- rep('h', 100)
levels(flips) <- c("h", "t")
Not sure how to proceed from here. Any help would be appreciated.
Try this:
coins <- rep(1, 100) # 1 = Head, 0 = Tail
n = 3 # run till the time when you flip every 3rd coin
invisible(sapply(2:n, function(i) {indices <- seq(i, 100, i); coins[indices] <<- (coins[indices] + 1) %% 2}) )
which(coins == 1)
# [1] 1 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53 54 55 59 60 61 65 66 67 71 72 73 77 78 79 83 84 85 89 90 91 95 96 97
sum(coins==1)
#[1] 49
If you run till n = 100, only the coins at the positions which are perfect squares will be showing heads.
coins <- rep(1, 100) # 1 = Head, 0 = Tail
n <- 100
invisible(sapply(2:n, function(i) {indices <- seq(i, 100, i); coins[indices] <<- (coins[indices] + 1) %% 2}) )
which(coins == 1)
# [1] 1 4 9 16 25 36 49 64 81 100
sum(coins==1)
# [1] 10
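The two counts above can be cross-checked with a short simulation. The sketch below is in Python rather than R and uses a hypothetical function name, heads_after_flips. It flips every 2nd, 3rd, ..., last_step-th coin and reports which positions still show heads. A coin at position k is flipped once for every divisor of k that is at least 2, so it ends heads-up exactly when k has an odd number of divisors, which is the case only for perfect squares.

```python
import math


def heads_after_flips(num_coins, last_step):
    """Flip every step-th coin for step = 2..last_step; return positions showing heads."""
    coins = [True] * (num_coins + 1)  # index 0 is unused; True means heads
    for step in range(2, last_step + 1):
        for pos in range(step, num_coins + 1, step):
            coins[pos] = not coins[pos]
    return [pos for pos in range(1, num_coins + 1) if coins[pos]]


heads_3 = heads_after_flips(100, 3)      # the brain teaser as stated
heads_100 = heads_after_flips(100, 100)  # flipping continued up to every 100th coin
squares = [k * k for k in range(1, math.isqrt(100) + 1)]

print(len(heads_3))
print(heads_100 == squares)
```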

Summing values after every third position in data frame in R

I am new to R. I have a data frame like following
>df=data.frame(Id=c("Entry_1","Entry_1","Entry_1","Entry_2","Entry_2","Entry_2","Entry_3","Entry_4","Entry_4","Entry_4","Entry_4"),Start=c(20,20,20,37,37,37,68,10,10,10,10),End=c(50,50,50,78,78,78,200,94,94,94,94),Pos=c(14,34,21,50,18,70,101,35,2,56,67),Hits=c(12,34,17,89,45,87,1,5,6,3,26))
Id Start End Pos Hits
Entry_1 20 50 14 12
Entry_1 20 50 34 34
Entry_1 20 50 21 17
Entry_2 37 78 50 89
Entry_2 37 78 18 45
Entry_2 37 78 70 87
Entry_3 68 200 101 1
Entry_4 10 94 35 5
Entry_4 10 94 2 6
Entry_4 10 94 56 3
Entry_4 10 94 67 26
For each entry I would like to iterate over the data.frame in 3 different modes. For example, for Entry_1, mode_1 = seq(20,50,3), mode_2 = seq(21,50,3) and mode_3 = seq(22,50,3). I would like to sum all the values in column "Hits" whose corresponding values in column "Pos" fall in mode_1, mode_2 or mode_3, and generate a data.frame like the following:
Id Mode_1 Mode_2 Mode_3
Entry_1 0 17 34
Entry_2 87 89 0
Entry_3 1 0 0
Entry_4 26 8 0
I tried the following code:
mode_1=0
mode_2=0
mode_3=0
mode_1_sum=0
mode_2_sum=0
mode_3_sum=0
for(i in dim(df)[1])
{
if(df$Pos[i] %in% seq(df$Start[i],df$End[i],3))
{
mode_1_sum=mode_1_sum+df$Hits[i]
print(mode_1_sum)
}
mode_1=mode_1_sum+counts
print(mode_1)
ifelse(df$Pos[i] %in% seq(df$Start[i]+1,df$End[i],3))
{
mode_2_sum=mode_2_sum+df$Hits[i]
print(mode_2_sum)
}
mode_2_sum=mode_2_sum+counts
print(mode_2)
ifelse(df$Pos[i] %in% seq(df$Start[i]+2,df$End[i],3))
{
mode_3_sum=mode_3_sum+df$Hits[i]
print(mode_3_sum)
}
mode_3_sum=mode_3_sum+counts
print(mode_3_sum)
}
But the above code only prints 26. Can anyone guide me on how to generate my desired output, please? I can provide more details if needed. Thanks in advance.
It's not an elegant solution, but it works.
m <- 3 # Number of modes you want
foo <- ((df$Pos - df$Start)%%m + 1) * (df$Start < df$Pos) * (df$End > df$Pos)
tab <- matrix(0,nrow(df),m)
for(i in 1:m) tab[foo==i,i] <- df$Hits[foo==i]
aggregate(tab,list(df$Id),FUN=sum)
# Group.1 V1 V2 V3
# 1 Entry_1 0 17 34
# 2 Entry_2 87 89 0
# 3 Entry_3 1 0 0
# 4 Entry_4 26 8 0
-- EXPLANATION --
First, we find the indices of df$Pos that are both bigger than df$Start and smaller than df$End. These comparisons return 1 if TRUE and 0 if FALSE. Next, we take the difference between df$Pos and df$Start, take it mod 3 (which gives a vector of 0s, 1s and 2s), and add 1 to get the right mode. We multiply these two things together, so that the values that fall within the interval retain the right mode, and the values that fall outside the interval become 0.
Next, we create an empty matrix that will contain the values. Then, we use a for-loop to fill in the matrix. Finally, we aggregate the matrix.
I tried looking for a quicker solution, but the main problem I cannot work around is the varying intervals for each row.
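The arithmetic in the explanation can be checked row by row with a small sketch. It is written in Python rather than R, and the function name mode_sums is hypothetical; the rows are the ones from the question's data frame. The mode index is (Pos - Start) mod 3, and a row only counts when Pos lies strictly between Start and End, as in the answer's code.

```python
from collections import defaultdict


def mode_sums(rows, num_modes=3):
    """Sum Hits per Id and per mode, where mode = (Pos - Start) % num_modes."""
    sums = defaultdict(lambda: [0] * num_modes)
    for entry_id, start, end, pos, hits in rows:
        totals = sums[entry_id]  # creates an all-zero row even if nothing qualifies
        if start < pos < end:    # strict bounds, matching the answer's code
            totals[(pos - start) % num_modes] += hits
    return dict(sums)


# (Id, Start, End, Pos, Hits) rows from the question
rows = [
    ("Entry_1", 20, 50, 14, 12), ("Entry_1", 20, 50, 34, 34), ("Entry_1", 20, 50, 21, 17),
    ("Entry_2", 37, 78, 50, 89), ("Entry_2", 37, 78, 18, 45), ("Entry_2", 37, 78, 70, 87),
    ("Entry_3", 68, 200, 101, 1),
    ("Entry_4", 10, 94, 35, 5), ("Entry_4", 10, 94, 2, 6),
    ("Entry_4", 10, 94, 56, 3), ("Entry_4", 10, 94, 67, 26),
]
result = mode_sums(rows)
for entry_id, totals in result.items():
    print(entry_id, totals)
```

One caveat: seq(Start, End, 3) contains Start itself, so a row with Pos equal to Start (or End) would belong to a mode under the question's definition but is excluded by the strict comparison. No row in this sample sits on a boundary, so the results match the expected table either way.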
