Rolling data for 12 month period - azure-data-explorer

I want to show the last 12 months, where each month shows the sum over the preceding 12 months. So January 2022 shows the sum of January 2021 -> January 2022, February 2022 shows the sum of February 2021 -> February 2022, and so on.
My current data
Expected Result
I am new to Kusto. It seems I need to use pivot together with the prev function, but the month periods are a bit confusing.

If you know for sure that you have data for each month, this will do the trick.
If not, the solution will get a bit more complicated.
The idea is to create a cumulative sum column and then match each month's cumulative sum with that of the same month in the previous year.
The difference between them is the sum of the last 12 months.
// Data sample generation. Not part of the solution.
let t = materialize(range i from 1 to 10000 step 1 | extend dt = ago(365d*5*rand()) | summarize val = count() by year = getyear(dt), month = getmonth(dt));
// Solution starts here.
t
| order by year asc, month asc
| extend cumsum_val = row_cumsum(val) - val, prev_year = year - 1
| as t2
| join kind=inner t2 on $left.prev_year == $right.year and $left.month == $right.month
| project year, month = format_datetime(make_datetime(year,month,1),'MM') , last_12_cumsum_val = cumsum_val - cumsum_val1
| evaluate pivot(month, any(last_12_cumsum_val), year)
| order by year asc
year  01    02    03    04    05    06    07    08    09    10    11    12
2018                    1901  2020  2018  2023  2032  2039  2015  2025  2039
2019  2045  2048  2029  2043  2053  2040  2041  2027  2025  2037  2050  2042
2020  2035  2016  2024  2022  1999  2009  1989  1996  1975  1968  1939  1926
2021  1926  1931  1936  1933  1945  1942  1972  1969  1981  2007  2020  2049
2022  2051  2032  2019  2002
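The arithmetic behind the cumulative-sum approach can be checked in a few lines of R. This is only a sketch on made-up values (val is a hypothetical vector of 24 consecutive monthly totals with no gaps), not a translation of the query above.

```r
# Hypothetical monthly totals for 24 consecutive months (no missing months)
val <- 1:24

# Exclusive cumulative sum: total of all months before the current one,
# mirroring row_cumsum(val) - val in the query
excl_cumsum <- cumsum(val) - val

# Subtracting the value from 12 rows earlier leaves the sum of the
# 12 months that precede the current month
last_12 <- excl_cumsum[13:24] - excl_cumsum[1:12]

# Cross-check against a direct windowed sum
direct <- sapply(13:24, function(i) sum(val[(i - 12):(i - 1)]))
stopifnot(all(last_12 == direct))
```

The subtraction only works when every month is present, which is why the answer starts with that condition.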

Another option is to follow the sliding window aggregations sample from the Kusto documentation:
let t = materialize(range i from 1 to 10000 step 1 | extend dt = ago(365d*5*rand()) | summarize val = count() by year = getyear(dt), month = getmonth(dt) | extend Date = make_datetime(year, month, 1));
let window_months = 12;
t
| extend _bin = startofmonth(Date)
| extend _range = range(1, window_months, 1)
| mv-expand _range to typeof(long)
| extend end_bin = datetime_add("month", _range, Date)
| extend end_month = format_datetime(end_bin, "MM"), end_year = datetime_part("year", end_bin)
| summarize sum(val), count() by end_year, end_month
| where count_ == 12
| evaluate pivot(end_month, take_any(sum_val), end_year)
| order by end_year asc
end_year  01    02    03    04    05    06    07    08    09    10    11    12
2018                        1921  2061  2036  2037  2075  2067  2038  2025  2029
2019      2012  2006  2015  2022  1997  2015  2012  2010  1994  2002  2029  2035
2020      2012  2002  1967  1949  1950  1963  1966  1976  1982  2016  1988  1972
2021      1990  1987  1991  1996  2026  2004  2005  1996  1991  1966  1989  1993
2022      1979  1983  1981  1977  1931
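The expand-and-aggregate idea in the sliding window query can be sketched in base R. The data frame monthly and its idx column (a running month index) are assumptions made for illustration; the real query works on dates.

```r
# Hypothetical data: 15 consecutive months, idx is a running month index
monthly <- data.frame(idx = 1:15, val = 1:15)
window <- 12

# Cross join: each month is copied once per window it contributes to
expanded <- merge(monthly, data.frame(offset = 1:window))
expanded$end_idx <- expanded$idx + expanded$offset

# Aggregate per window end and keep only windows with all 12 months
sums <- tapply(expanded$val, expanded$end_idx, sum)
counts <- tapply(expanded$val, expanded$end_idx, length)
rolling <- sums[counts == window]
```

Here rolling is named by end_idx, and the count filter plays the same role as `where count_ == 12` in the query: it drops windows at the edges that have fewer than 12 contributing months.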

Related

Fill Lagged Values Down R

I am trying to use a combination of conditional lagging and then filling values down by group. In my data, I have old_price and new_price. The new_price must always be lower than old_price. Whenever new_price is greater than old_price, I would like to lag back to the most recent value where new_price was less than old_price. In the case of Raleigh, rows 2 and 3 should lag back to 36.00. Row 4 should not lag back since new_price is already lower than old_price. When I have tried using lag, it has been applying it to row 2 (where the price is 52), but then leaving row 3 as 54.00. I would like row 3 to also lag from row 1, or from row 2 once it has the correct value.
Here is my data:
city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 52.00
Raleigh 001 Feb 2022 45.00 54.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 48.00
Desired output:
city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 36.00
Raleigh 001 Feb 2022 45.00 36.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 32.00
One approach is to convert values where new_price > old_price to NA and then fill down.
library(dplyr)
library(tidyr)
df %>%
  mutate(new_price = if_else(new_price > old_price, NA_real_, new_price)) %>%
  fill(new_price)
Output:
city sku month year old_price new_price
1 Raleigh 1 Dec 2021 45.0 36
2 Raleigh 1 Jan 2022 45.0 36
3 Raleigh 1 Feb 2022 45.0 36
4 Raleigh 1 Mar 2022 45.0 37
5 Austin 2 Dec 2021 37.5 30
6 Austin 2 Jan 2022 37.5 32
7 Austin 2 Feb 2022 37.5 32
Data:
df <- read.table(textConnection("city sku month year old_price new_price
Raleigh 001 Dec 2021 45.00 36.00
Raleigh 001 Jan 2022 45.00 52.00
Raleigh 001 Feb 2022 45.00 54.00
Raleigh 001 Mar 2022 45.00 37.00
Austin 002 Dec 2021 37.50 30.00
Austin 002 Jan 2022 37.50 32.00
Austin 002 Feb 2022 37.50 48.00"), header = TRUE)
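Since the question asks for filling down by group, it may be worth keeping the fill inside each city. The following base R sketch uses a small made-up data frame (not the data above) in which Austin's first row is invalid, and a hypothetical helper fill_down, to show that nothing leaks across groups.

```r
# Hypothetical data: Austin's first row is invalid and has nothing to inherit
df <- data.frame(
  city      = c("Raleigh", "Raleigh", "Raleigh", "Austin", "Austin"),
  old_price = c(45, 45, 45, 37.5, 37.5),
  new_price = c(36, 52, 37, 40, 30)
)

# Replace each NA with the most recent non-NA value; leading NAs stay NA
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the last non-NA seen so far
  c(NA, x[!is.na(x)])[idx + 1]
}

# Mask invalid prices, then fill down separately within each city
df$new_price <- ifelse(df$new_price > df$old_price, NA, df$new_price)
df$new_price <- ave(df$new_price, df$city, FUN = fill_down)
```

With dplyr and tidyr, adding group_by(city, sku) before fill(new_price) should have the same effect.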

Converting Dates to Julian Date

I am currently trying to do Theil-Sen trend estimates with a number of time series. How should I convert the Date variable so that it can be used with the mblm package? The dates currently look like 'Apr 1981'. I want to use monthly medians in this assessment. See the data frame below.
Thanks!
mo yr doc Date
04 1981 2.800 Apr 1981
05 1982 2.700 May 1982
10 1999 0.500 Oct 1999
05 2000 2.400 May 2000
06 2000 1.200 Jun 2000
07 2000 0.950 Jul 2000
08 2000 0.700 Aug 2000
09 2000 0.750 Sep 2000
10 2000 0.600 Oct 2000
11 2000 0.785 Nov 2000
12 2000 0.660 Dec 2000
01 2001 0.710 Jan 2001
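One possible way to get a numeric time variable is sketched below. It builds a Date from the mo and yr columns and assumes each observation can be placed on the first day of its month; the column names date, days and dec_year are made up for illustration.

```r
# First three rows of the data above
dat <- data.frame(mo = c(4, 5, 10), yr = c(1981, 1982, 1999), doc = c(2.8, 2.7, 0.5))

# Assumption: treat every month as its first day to obtain a proper Date
dat$date <- as.Date(sprintf("%d-%02d-01", as.integer(dat$yr), as.integer(dat$mo)))

# Numeric forms that can serve as the x variable in a trend model
dat$days <- as.numeric(dat$date)            # days since 1970-01-01
dat$dec_year <- dat$yr + (dat$mo - 1) / 12  # decimal year
```

Using the numeric mo and yr columns avoids parsing month abbreviations such as 'Apr', which depends on the locale.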

taking lag and capping the values with mean in dplyr

I have the following data frame in R:
name date month year hours
SSI 01-01-2016 01 2016 2000
SSI 02-01-2016 01 2016 1900
SSI 03-01-2016 01 2016 2038
SSI 04-01-2016 01 2016 2041
SSII 01-01-2016 01 2016 2000
SSII 02-01-2016 01 2016 2100
SSII 03-01-2016 01 2016 2105
SSII 04-01-2016 01 2016 2203
I want to calculate the lag of hours for every name, grouped by month and year, which I can do with the following code:
df1 <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  as.data.frame()
Where running_hrs is greater than 24 or less than 0, I want to replace those values with the mean for that month. I am doing the following:
new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(running_hrs), running_hrs)) %>%
  as.data.frame()
name date month year hours running_hrs running_hrs_new
SSI 01-01-2016 01 2016 2000 NA
SSI 02-01-2016 01 2016 1900 -100 (3/4)
SSI 03-01-2016 01 2016 2038 138 (3/4)
SSI 04-01-2016 01 2016 2041 3 3
SSII 01-01-2016 01 2016 2000 NA
SSII 02-01-2016 01 2016 2100 100 (10/4)
SSII 03-01-2016 01 2016 2105 5 5
SSII 04-01-2016 01 2016 2110 5 5
Values should be replaced by the mean of the running hours that are less than 24 and greater than or equal to zero. I think we can use a conditional mean:
library(dplyr)
library(tidyr)
new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  mutate(valid_running_hrs = ifelse(running_hrs < 24 & running_hrs > 0, running_hrs, 0)) %>%
  replace_na(list(valid_running_hrs = 0)) %>%
  group_by(name, year, month) %>%
  mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(valid_running_hrs), running_hrs)) %>%
  as.data.frame()
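The arithmetic for one group can be traced by hand in base R. This sketch uses the SSI hours from the question and reads "valid" as between 0 and 24 inclusive, which is an assumption based on the capping rule in the question.

```r
# Hours for the SSI group from the question
hours <- c(2000, 1900, 2038, 2041)

# Difference to the previous row; the first row has no predecessor
running <- c(NA, diff(hours))                 # NA, -100, 138, 3

# Valid differences are those between 0 and 24 (inclusive, by assumption)
valid <- !is.na(running) & running >= 0 & running <= 24

# Invalid rows count as zero, so the mean is taken over the whole group
group_mean <- sum(running[valid]) / length(running)   # 3 / 4

# Keep valid values and the leading NA, replace everything else
capped <- ifelse(valid | is.na(running), running, group_mean)
```

The result matches the (3/4) entries in the expected output for SSI.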

How to switch a row dimension to a column dimension in R? [duplicate]

I have a df that looks like the one below.
Year Month Cont
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
So my question is: how can I turn the values in "Month" into columns? The result should look like this.
Cont Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
You can use reshape2:
library(reshape2)
dcast(df, Year~Month, value.var="Cont")
Or tidyr:
library(tidyr)
spread(df, Month, Cont)
Please refer to the following code:
> dat <- read.table("data.txt", quote="\"", comment.char="")
> dat
V1 V2 V3
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
> library(reshape2)
> dcast(dat, V1~V2)
Using V3 as value column: use value.var to override.
V1 Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
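If no extra package is wanted, base R's reshape can produce the same wide layout. This is a sketch that rebuilds the data frame from the question; the last line only strips the "Cont." prefix that reshape adds to the new column names.

```r
# The data frame from the question
df <- data.frame(
  Year  = rep(2011:2015, times = 2),
  Month = rep(c("Apr", "Aug"), each = 5),
  Cont  = c(1376, 1232, 1360, 1294, 1344, 1933, 1930, 1821, 1845, 1855)
)

# One row per Year, one column per Month
wide <- reshape(df, idvar = "Year", timevar = "Month", direction = "wide")

# Columns come out as Cont.Apr and Cont.Aug; drop the prefix
names(wide) <- sub("^Cont\\.", "", names(wide))
```

In current tidyr, pivot_wider is the newer replacement for spread.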

sorting of month in matrix in R

I have a matrix in this format:
year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274
I need to sort the months in calendar order, i.e. Jan, Feb, Mar, ... When I sort, they get sorted alphabetically instead. I used this:
mat <- mat[order(mat[,1], decreasing = TRUE), ]
and it looks like this :
row.names April August December February January July June March May November October September
1 2015 59535 0 0 24258 22785 0 31356 40274 84211 0 0 0
2 2014 466 10982 35881 17 0 2981 1279 289 879 8911 8565 4000
Can we sort months in calendar order in R?
Suppose DF is the data frame from which you derived your matrix. We provide such a data frame in reproducible form at the end. Ensure that month and year are factors with appropriate levels. Note that month.name is a builtin variable in R that is used here to ensure that the month levels are appropriately sorted and we have assumed year is a numeric column. Then use levelplot like this:
DF2 <- transform(DF,
  month = factor(as.character(month), levels = month.name),
  year = factor(year)
)
library(lattice)
levelplot(Freq ~ year * month, DF2)
Note: Here is DF in reproducible form:
Lines <- " year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274 "
DF <- read.table(text = Lines, header = TRUE)
Assuming you want to sort by time (a dummy day 1 is added so that the month and year can be parsed as a date):
time = as.POSIXct(strptime(paste(1, mat$month, mat$year), format = "%d %B %Y"))
mat = mat[order(time), ]
Or, if you don't care about the year:
time = as.POSIXct(strptime(paste(1, mat$month, 2000), format = "%d %B %Y"))
mat = mat[order(time), ]
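A variant that does not rely on date parsing is to look up each month's position in the built-in month.name vector. The sketch below uses a few rows from the question; month_num is a helper column introduced here for illustration.

```r
# A few rows from the question's data
DF <- data.frame(
  year  = c(2014, 2015, 2014, 2015, 2014, 2015),
  month = c("April", "April", "February", "February", "January", "January"),
  Freq  = c(466, 59535, 17, 24258, 0, 22785)
)

# Position of each month in the calendar (1 = January, ..., 12 = December)
DF$month_num <- match(DF$month, month.name)

# Sort by year first, then by calendar month
sorted <- DF[order(DF$year, DF$month_num), ]
```

Because month.name always holds the English month names, this does not depend on the locale, unlike the %B format used with strptime.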
