Accumulated data in pivot mode - azure-data-explorer


Currently I accumulate values down the columns (across years, per month) with row_cumsum:
test
| project Boenheter, Ar, Maned, ManedTLA
| extend _date = make_datetime(toint(Ar), Maned, 1)
| extend key1 = Ar, __auto0 = datetime_part('Month', startofmonth(_date))
| summarize value0 = sum(Boenheter) by key1, __auto0, ManedTLA
| order by __auto0 asc, key1 asc
| serialize value0 = row_cumsum(value0, __auto0 != prev(__auto0))
| extend __p = pack(tostring(ManedTLA), value0)
| summarize __p = make_bag(__p) by key1
| evaluate bag_unpack(__p)
| order by key1 asc
But I want to accumulate along the rows instead: Feb = Jan + Feb, Mar = Jan + Feb + Mar, and so on. For 2012, for example, that gives Feb = 304 and Mar = 624.
Does Kusto have a trick for accumulating across a row instead of down a column (as row_cumsum does in my query)?
Any help is appreciated.

Use row_cumsum with a restart on year change, before pivoting:
// Generation of a data sample. No part of the solution.
let t = materialize(range i from 1 to 200 step 1 | extend dt = ago(365d*10*rand()));
// The solution starts here.
t
| summarize count() by year = getyear(dt), month = format_datetime(dt,'MM')
| order by year asc, month asc
| extend cumsum = row_cumsum(count_, year != prev(year))
| evaluate pivot(month, any(cumsum), year)
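Sample result (the input is random, so the numbers differ on every run; months with no data are empty cells in the pivot and are omitted from each row here):
year | cumulative count per month, in month order
2012 | 2, 4, 6, 7, 10, 14, 16
2013 | 2, 3, 7, 8, 10, 11, 15, 16, 17, 18
2014 | 2, 7, 11, 12, 13, 14, 15, 17, 19, 20
2015 | 2, 3, 6, 10, 11, 12, 13, 14, 15
2016 | 1, 2, 3, 5, 6, 8, 10, 11, 12, 15, 16, 19
2017 | 1, 2, 5, 8, 13, 16, 17, 20, 21
2018 | 4, 5, 8, 12, 15, 18, 20, 23, 24, 25, 26
2019 | 5, 7, 8, 10, 11, 14, 18, 19, 20, 21
2020 | 2, 5, 8, 10, 11, 13, 15, 16, 19, 22
2021 | 2, 5, 6, 7, 8, 9, 11, 17
2022 | 2, 4, 5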
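To make the restart semantics concrete, here is a minimal Python sketch (not Kusto) of the same two steps: a running total that resets whenever a key changes, in the spirit of row_cumsum's restart argument, followed by a year × month pivot. The helper names cumsum_with_restart and pivot and the sample counts are hypothetical. Ordering by year and restarting on the year gives the per-row accumulation from the answer; ordering by month and restarting on the month gives the column-wise accumulation from the question.

```python
from collections import defaultdict


def cumsum_with_restart(rows, restart_key, value="count"):
    """Running total over rows in their current order, reset whenever
    restart_key(row) differs from the previous row's key.
    Like row_cumsum, this assumes the rows are already sorted."""
    out = []
    total = 0
    prev_key = object()  # sentinel, so the first row always starts a new run
    for row in rows:
        key = restart_key(row)
        if key != prev_key:
            total = 0
        total += row[value]
        out.append({**row, "cumsum": total})
        prev_key = key
    return out


def pivot(rows, index, column, value):
    """Build {index_value: {column_value: value}}; missing cells are simply absent."""
    table = defaultdict(dict)
    for row in rows:
        table[row[index]][row[column]] = row[value]
    return dict(table)


# Hypothetical monthly counts, already summarized by year and month.
counts = [
    {"year": 2012, "month": "01", "count": 3},
    {"year": 2012, "month": "02", "count": 5},
    {"year": 2012, "month": "03", "count": 2},
    {"year": 2013, "month": "01", "count": 4},
    {"year": 2013, "month": "02", "count": 1},
]

# Answer's approach: order by year, month and restart when the year changes.
by_year = sorted(counts, key=lambda r: (r["year"], r["month"]))
row_wise = pivot(cumsum_with_restart(by_year, lambda r: r["year"]), "year", "month", "cumsum")

# Question's approach: order by month, year and restart when the month changes.
by_month = sorted(counts, key=lambda r: (r["month"], r["year"]))
column_wise = pivot(cumsum_with_restart(by_month, lambda r: r["month"]), "year", "month", "cumsum")

print(row_wise)     # {2012: {'01': 3, '02': 8, '03': 10}, 2013: {'01': 4, '02': 5}}
print(column_wise)  # {2012: {'01': 3, '02': 5, '03': 2}, 2013: {'01': 7, '02': 6}}
```

The only difference between the two results is the sort order and the restart key; the pivot step is identical.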

Related

R: How to apply a sliding conditional check to consecutive values in sequential data

I want to apply a conditional statement to consecutive values in a sliding manner.
For example, I have a dataset like this:
data <- data.frame(ID = rep.int(c("A","B"), times = c(24, 12)),
                   time = c(1:24,1:12),
                   visit = as.integer(runif(36, min = 0, max = 20)))
which gave the table below:
> data
ID time visit
1 A 1 7
2 A 2 0
3 A 3 6
4 A 4 6
5 A 5 3
6 A 6 8
7 A 7 4
8 A 8 10
9 A 9 18
10 A 10 6
11 A 11 1
12 A 12 13
13 A 13 7
14 A 14 1
15 A 15 6
16 A 16 1
17 A 17 11
18 A 18 8
19 A 19 16
20 A 20 14
21 A 21 15
22 A 22 19
23 A 23 5
24 A 24 13
25 B 1 6
26 B 2 6
27 B 3 16
28 B 4 4
29 B 5 19
30 B 6 5
31 B 7 17
32 B 8 6
33 B 9 10
34 B 10 1
35 B 11 13
36 B 12 15
I want to flag each ID based on consecutive values of "visit".
If "visit" is less than 10 for 6 consecutive times, I'd label the ID "empty", and "busy" otherwise.
In the data above, "A" is below 10 in every row from 1 to 6, so it is "empty". "B", on the other hand, never has 6 consecutive single-digit values, so it is "busy".
If the condition isn't met in one segment of 6 values, I want to apply it to the next segment.
I'd like to achieve this in R. Any advice is appreciated.
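One way to read the requirement is "does the ID ever have 6 or more consecutive visits below 10?" Under that assumption (a sliding run rather than fixed, non-overlapping blocks of 6), the check reduces to tracking the longest run of qualifying values per ID. The sketch below does this in Python; the helper name flag_ids is hypothetical. In R, rle(visit < 10) yields the same run lengths. The visit values are copied from the table in the question.

```python
from itertools import groupby


def flag_ids(rows, threshold=10, run_length=6):
    """Label each ID "empty" if it has at least `run_length` consecutive
    values below `threshold`, else "busy".

    Assumes rows are (id, visit) pairs grouped by id and ordered by time."""
    flags = {}
    for key, group in groupby(rows, key=lambda r: r[0]):
        streak = longest = 0
        for _, visit in group:
            # Extend the current streak, or reset it on a value >= threshold.
            streak = streak + 1 if visit < threshold else 0
            longest = max(longest, streak)
        flags[key] = "empty" if longest >= run_length else "busy"
    return flags


# Visit values copied from the table in the question.
a_visits = [7, 0, 6, 6, 3, 8, 4, 10, 18, 6, 1, 13, 7, 1, 6, 1, 11, 8, 16, 14, 15, 19, 5, 13]
b_visits = [6, 6, 16, 4, 19, 5, 17, 6, 10, 1, 13, 15]
rows = [("A", v) for v in a_visits] + [("B", v) for v in b_visits]

print(flag_ids(rows))  # {'A': 'empty', 'B': 'busy'}
```

If the intended rule is fixed blocks of 6 (rows 1 to 6, then 7 to 12, and so on), the streak would instead be reset at every block boundary.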

Adding a name to fields in a newly created data frame

I have created a new data.frame from another data.frame, for example:
aaa=cbind(bb1[,1],bb1[,2],ay,ax)
I want to name bb1[,1] "prob" and bb1[,2] "recommendation", and keep the remaining names as they are. What is the syntax for this? Thanks

bb1 = data.frame(c(1:10),c(11:20))
ax = c(21:30)
ay = c(31:40)
aaa = data.frame(cbind(bb1[,1],bb1[,2],ay,ax))
# cbind() puts ay before ax, so the new names must follow the same order
colnames(aaa) = c("prob", "recommendation", "ay", "ax")
Output
aaa
prob recommendation ay ax
1 1 11 31 21
2 2 12 32 22
3 3 13 33 23
4 4 14 34 24
5 5 15 35 25
6 6 16 36 26
7 7 17 37 27
8 8 18 38 28
9 9 19 39 29
10 10 20 40 30

Sum a variable based on another variable

I have a dataset consisting of two variables, Contents and Time, like so:
Time Contents
2017M01 123
2017M02 456
2017M03 789
. .
. .
. .
2018M12 789
Now I want to create a numeric vector that aggregates Contents over six-month periods; that is, I want to sum 2017M01 to 2017M06 into one number, 2017M07 to 2017M12 into another, and so on.
I'm able to do this by indexing, but I want to be able to write "From 2017M01 to 2017M06, sum the Contents corresponding to that sequence" in my code.
I would really appreciate some help!

You can create a grouping variable based on the number of rows and the number of elements per group. In your case you want groups of 6 rows, so the number of rows in your data frame should be divisible by 6. Using iris to demonstrate (it has 150 rows, so 150 / 6 = 25):
rep(seq(nrow(iris)%/%6), each = 6)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9 10 10 10 10
#[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
There are plenty of ways to make the call read the way you want. Here is a custom function that creates the grouping variable from a range string such as "2017M01:2017M06":
f1 <- function(x, df) {
v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
i1 <- (v2 - v1) + 1
return(rep(seq(nrow(df)%/%i1), each = i1))
}
f1("2017M01:2017M06", iris)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9 10 10 10 10
#[59] 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17 18 18 18 18 18 18 19 19 19 19 19 19 20 20
#[117] 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25
EDIT: We can easily make the function handle row counts that are not divisible by the group size: append one extra group id (max + 1), repeated remainder times, i.e.
f1 <- function(x, df) {
v1 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\1', x))
v2 <- as.numeric(gsub('[0-9]{4}M(.*):[0-9]{4}M(.*)$', '\\2', x))
i1 <- (v2 - v1) + 1
# seq_len() (unlike seq()) returns an empty vector when there are fewer rows than i1
final_v <- rep(seq_len(nrow(df) %/% i1), each = i1)
if (nrow(df) %% i1 == 0) {
return(final_v)
} else {
remainder = nrow(df) %% i1
final_v1 <- c(final_v, rep(nrow(df) %/% i1 + 1, remainder))
return(final_v1)
}
}
So for a data frame with 20 rows, doing groups of 6, the above function will yield the result:
f1("2017M01:2017M06", df)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4
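The answer stops at the grouping variable; the sum itself is one more step. As a minimal Python sketch of both steps, assuming a hypothetical Contents column of 20 values (1 to 20), the helpers group_ids and sum_by_group (both hypothetical names) mirror f1 and then add up each group. Like f1, only the month parts of the range string are used. In R, the final step could be written as tapply(df$Contents, f1("2017M01:2017M06", df), sum).

```python
import re


def group_ids(range_spec, n_rows):
    """Counterpart of the R helper f1: consecutive group ids of size k
    (k = months in range_spec), with any remainder as a final group.
    As in f1, the years in the spec are ignored."""
    m = re.fullmatch(r"\d{4}M(\d+):\d{4}M(\d+)", range_spec)
    if m is None:
        raise ValueError(f"unexpected range spec: {range_spec!r}")
    k = int(m.group(2)) - int(m.group(1)) + 1  # e.g. 6 for "2017M01:2017M06"
    return [i // k + 1 for i in range(n_rows)]


def sum_by_group(values, ids):
    """Sum values that share a group id, keeping groups in first-seen order."""
    totals = {}
    for gid, value in zip(ids, values):
        totals[gid] = totals.get(gid, 0) + value
    return totals


# Hypothetical Contents column with 20 rows (values 1..20).
contents = list(range(1, 21))
ids = group_ids("2017M01:2017M06", len(contents))
print(ids)  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
print(sum_by_group(contents, ids))  # {1: 21, 2: 57, 3: 93, 4: 39}
```

Because grouping is by row position, this assumes one row per month with no gaps, the same assumption the R answer makes.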

Difference in Timestamp

I want to calculate the time difference between two events. The first five columns give the date-time of the incident; the remaining five give the date-time of death.
dat <- read.table(header=TRUE, text="
YEAR MONTH DAY HOUR MINUTE D.YEAR D.MONTH D.DAY D.HOUR D.MINUTE
2013 1 6 0 55 2013 1 6 0 56
2013 2 3 21 24 2013 2 4 23 14
2013 1 6 11 45 2013 1 6 12 29
2013 3 6 12 25 2013 3 6 23 55
2013 4 6 18 28 2013 5 3 11 18
2013 4 8 14 31 2013 4 8 14 32")
dat
YEAR MONTH DAY HOUR MINUTE D.YEAR D.MONTH D.DAY D.HOUR D.MINUTE
2013 1 6 0 55 2013 1 6 0 56
2013 2 3 21 24 2013 2 4 23 14
2013 1 6 11 45 2013 1 6 12 29
2013 3 6 12 25 2013 3 6 23 55
2013 4 6 18 28 2013 5 3 11 18
2013 4 8 14 31 2013 4 8 14 32
I want the time difference in minutes. The following code doesn't work. The timestamps should look like 2013-04-06 04:08.
library(lubridate)
dat$tstamp1 <- mdy(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE,sep = "-"))
dat$tstamp2 <- mdy(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"))
dat$diff <- dat$tstamp2 - dat$tstamp1 ### want the difference in minutes

To parse date/time strings in the "-"-separated format you're creating, you need to give a custom format and pass it to parse_date_time. For example:
parse_date_time(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
Your new code would therefore look like:
library(lubridate)
dat$tstamp1 <- parse_date_time(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
dat$tstamp2 <- parse_date_time(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
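Then the following gives the time difference in minutes (units = "mins" is set explicitly, because subtracting two date-times picks the units automatically):
dat$diff <- as.numeric(difftime(dat$tstamp2, dat$tstamp1, units = "mins"))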

You can try this:
library(lubridate)
dat$tstamp1 <- strptime(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE,sep = "-"),"%Y-%m-%d-%H-%M")
dat$tstamp2 <- strptime(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"),"%Y-%m-%d-%H-%M")
dat$diff <- difftime(as.POSIXct(dat$tstamp2), as.POSIXct(dat$tstamp1), units = "mins")
Using strptime is faster and a bit safer against unexpected data. You can read more about it in ?strptime.
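As a cross-check of the expected minutes, here is a minimal Python sketch that builds both timestamps from the component columns of the read.table() data in the question and subtracts them. The helper name minutes_between is hypothetical. It uses naive datetimes, so daylight-saving changes are ignored; in R, strptime uses the local time zone by default, so an interval that spans a DST change could differ by 60 minutes unless tz = "UTC" is given.

```python
from datetime import datetime


def minutes_between(start_parts, end_parts):
    """Difference in minutes between two (year, month, day, hour, minute) tuples."""
    start = datetime(*start_parts)
    end = datetime(*end_parts)
    return (end - start).total_seconds() / 60


# Rows from the read.table() call in the question: incident columns, then death columns.
rows = [
    ((2013, 1, 6, 0, 55), (2013, 1, 6, 0, 56)),
    ((2013, 2, 3, 21, 24), (2013, 2, 4, 23, 14)),
    ((2013, 1, 6, 11, 45), (2013, 1, 6, 12, 29)),
    ((2013, 3, 6, 12, 25), (2013, 3, 6, 23, 55)),
    ((2013, 4, 6, 18, 28), (2013, 5, 3, 11, 18)),
    ((2013, 4, 8, 14, 31), (2013, 4, 8, 14, 32)),
]
print([minutes_between(s, e) for s, e in rows])
# [1.0, 1550.0, 44.0, 690.0, 38450.0, 1.0]
```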

R: Linear extrapolation between raster layers of different dates

There is already a thread dealing with interpolation between raster layers of different years (2006, 2008, 2010, 2012). Now I'm trying to linearly extrapolate to 2020 with the approach suggested by Ram Narasimhan and approxExtrap from the Hmisc package:
library(raster)
library(Hmisc)
df <- data.frame("2006" = 1:9, "2008" = 3:11, "2010" = 5:13, "2012"=7:15)
#transpose since we want time to be the first col, and the values to be columns
new <- data.frame(t(df))
times <- seq(2006, 2012, by=2)
new <- cbind(times, new)
# Now, apply linear extrapolation to each layer of the raster
approxExtrap(new, xout=c(2006:2020), rule = 2)
But instead of getting something like this:
# times X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 2006 1 2 3 4 5 6 7 8 9
#2 2007 2 3 4 5 6 7 8 9 10
#3 2008 3 4 5 6 7 8 9 10 11
#4 2009 4 5 6 7 8 9 10 11 12
#5 2010 5 6 7 8 9 10 11 12 13
#6 2011 6 7 8 9 10 11 12 13 14
#7 2012 7 8 9 10 11 12 13 14 15
#8 2013 8 9 10 11 12 13 14 15 16
#9 2014 9 10 11 12 13 14 15 16 17
#10 2015 10 11 12 13 14 15 16 17 18
#11 2016 11 12 13 14 15 16 17 18 19
#12 2017 12 13 14 15 16 17 18 19 20
#13 2018 13 14 15 16 17 18 19 20 21
#14 2019 14 15 16 17 18 19 20 21 22
#15 2020 15 16 17 18 19 20 21 22 23
I get this:
$x
[1] 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
$y
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
This is quite confusing, as both approxTime and approxExtrap are based on approxfun.

I found a way to make this work, although it may not be the most elegant one. The basic idea is to perform a linear interpolation with approxTime first, then use lm to fit a linear model to each cell's time series and extrapolate to the final year with predict. The gap between the last year of the first interpolation and the final year is then filled by a second linear interpolation, again with approxTime.
NOTE: The first linear interpolation is not really necessary, although I don't know if it makes any difference when you use more sophisticated data.
library(raster)
library(Hmisc)
library(simecol)
df <- data.frame("2006" = 1:9, "2008" = 3:11, "2010" = 5:13, "2012"=7:15)
#transpose since we want time to be the first col, and the values to be columns
new <- data.frame(t(df))
times <- seq(2006, 2012, by=2)
new <- cbind(times, new)
# Now, apply linear interpolation to each layer of the raster
intp<-approxTime(new, 2006:2012, rule = 2)
#Extract the years from the data.frame
tm<-intp[,1]
#Define a function for a linear model using lm
lm.func<-function(i) {lm(i ~ tm)}
#Define a new data.frame without the years from intp
intp.new<-intp[,-1]
#Creates a list of the lm coefficients for each column of intp.new
lm.list<-apply(intp.new, MARGIN=2, FUN=lm.func)
#Create a data.frame with the final year of the extrapolation; the column must be named tm to match the model formula
new.pred<-data.frame(tm = 2020)
#Make predictions for the final year for each element of lm.list
pred.points<-lapply(lm.list, predict, new.pred)
#unlist the predicted points
fintime<-matrix(unlist(pred.points))
#Add the final year to the fintime matrix and transpose it
fintime.new<-t(rbind(2020,fintime))
#Convert the intp data.frame into a matrix
intp.ma<-as.matrix(intp)
#Append fintime.new to intp.ma
intp.wt<-as.data.frame(rbind(intp.ma,fintime.new))
#Perform a linear interpolation with approxTime again
approxTime(intp.wt, 2006:2020, rule = 2)
times X1 X2 X3 X4 X5 X6 X7 X8 X9
1 2006 1 2 3 4 5 6 7 8 9
2 2007 2 3 4 5 6 7 8 9 10
3 2008 3 4 5 6 7 8 9 10 11
4 2009 4 5 6 7 8 9 10 11 12
5 2010 5 6 7 8 9 10 11 12 13
6 2011 6 7 8 9 10 11 12 13 14
7 2012 7 8 9 10 11 12 13 14 15
8 2013 8 9 10 11 12 13 14 15 16
9 2014 9 10 11 12 13 14 15 16 17
10 2015 10 11 12 13 14 15 16 17 18
11 2016 11 12 13 14 15 16 17 18 19
12 2017 12 13 14 15 16 17 18 19 20
13 2018 13 14 15 16 17 18 19 20 21
14 2019 14 15 16 17 18 19 20 21 22
15 2020 15 16 17 18 19 20 21 22 23
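The extrapolation step in this answer (lm per cell, then predict for 2020) is ordinary least-squares line fitting on each cell's series. A minimal Python sketch of just that step, using the same toy data, shows why the 2020 row comes out as 15 to 23: every cell rises by exactly 1 per year. The helper name fit_line is hypothetical, and the approxTime fill between 2012 and 2020 is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one series,
    the same line lm(y ~ x) fits."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Same sample as the question: 9 cells observed in 2006, 2008, 2010 and 2012.
years = [2006, 2008, 2010, 2012]
cells = [[j, j + 2, j + 4, j + 6] for j in range(1, 10)]  # one list per cell (row of df)

predicted_2020 = []
for values in cells:
    slope, intercept = fit_line(years, values)
    predicted_2020.append(slope * 2020 + intercept)

print(predicted_2020)  # [15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0]
```

With real rasters the cell series are not exactly linear, so the fitted 2020 value will generally differ from a straight continuation of the last two observations.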

Develop Reference
