Power BI: how to summarize the following table (max, distinct count, count)

I've been spending several days and nights trying to solve the following using DAX in Power BI.
From the 'Raw Data' table below, I'd like to produce a summary like the one shown in the attached 'Raw Data and Result Table' image.
Raw Data
Date Customer Sales
2018-01-01 A 36
2018-01-01 A 45
2018-01-01 B 16
2018-01-01 C 31
2018-01-02 D 29
2018-01-02 D 29
2018-01-02 A 26
2018-01-02 B 35
2018-01-03 C 12
2018-01-03 C 32
2018-01-03 A 33
2018-01-03 B 32

Here's one way to do it...
Create a new Calculated Table
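The calculated table itself isn't shown in the answer; a minimal DAX sketch of what it might look like, assuming the source table is named 'Raw Data' and that DailyOrders is meant to be the number of order rows per customer per day (both names are illustrative):
DailySales =
SUMMARIZE (
    'Raw Data',
    'Raw Data'[Date],
    'Raw Data'[Customer],
    "DailyOrders", COUNTROWS ( 'Raw Data' )  -- order rows for this customer on this day
)
With that column, Count of DailyOrders would give the number of days a customer placed orders, Count (Distinct) the number of distinct daily order counts, and Maximum the largest number of orders on a single day.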
Add a Matrix and, from your new calculated table, put Customer on Rows and DailyOrders into Values three times, setting the aggregations to Count, Count (Distinct), and Maximum.
The resulting Matrix then shows one row per Customer with those three aggregated values (screenshot not reproduced here).


How to generate a variable based on conditions of two others - generic formulae [duplicate]

This question already has an answer here:
generate sequence within group in R [duplicate]
(1 answer)
Closed 3 years ago.
I have a huge dataframe and need to add a new variable based on the value of three others.
The new variable has to be numeric and depends on the variables "Compartment", "Plot" and "Date". In every compartment, I will number the dates for plot x, say, 1:10 (if there are ten dates), the dates of plot y 11:20 (if also ten dates), those of plot z 21:25 (if five dates), and so on. Normally the dates are the same for every plot within each compartment, but exceptions occur.
So I need a single numeric value for each plot-date combination and they need to be in chronological order for every plot.
This post: R code: how to generate variable based on multiple conditions from other variables
gives a solution for creating a variable based on conditions of other variables, but if I have to retype this for every combination in every df, it will take me days and a massive amount of code.
Is there a generic way of solving this? With a loop or something? Thus far I couldn't think of anything better than splitting the df into separate df's per plot, linking the new variable to the date with ifelse (in ifelse in ifelse ...), and joining them together again afterwards. But this is impossible for the amount of data I have.
I have already split the large df per compartment, though, in case that helps for certain solutions.
Dummy code (NOTE all compartments have different plot names in the real data and dates sometimes differ between compartments and even plots, as does the no. of observations per combo):
# Dataframe
Comp <- rep(c("A","B","C"), each=20)
Date <- rep(rep(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"), times=4),times=3)
Plot <- rep(rep(c("P1", "P2", "P3", "P4"), each=5),times=3)
df <- data.frame(Comp, Date, Plot)
# Expected result
Comp Date Plot T
1 A 2018-01-01 P1 1
2 A 2018-01-02 P1 2
3 A 2018-01-03 P1 3
4 A 2018-01-04 P1 4
5 A 2018-01-05 P1 5
6 A 2018-01-01 P2 6
7 A 2018-01-02 P2 7
8 A 2018-01-03 P2 8
9 A 2018-01-04 P2 9
10 A 2018-01-05 P2 10
11 A 2018-01-01 P3 11
12 A 2018-01-02 P3 12
13 A 2018-01-03 P3 13
14 A 2018-01-04 P3 14
15 A 2018-01-05 P3 15
16 A 2018-01-01 P4 16
17 A 2018-01-02 P4 17
18 A 2018-01-03 P4 18
19 A 2018-01-04 P4 19
20 A 2018-01-05 P4 20
21 B 2018-01-01 P1 1
22 B 2018-01-02 P1 2
23 B 2018-01-03 P1 3
24 B 2018-01-04 P1 4
25 B 2018-01-05 P1 5
26 B 2018-01-01 P2 6
27 B 2018-01-02 P2 7
28 B 2018-01-03 P2 8
29 B 2018-01-04 P2 9
30 B 2018-01-05 P2 10
31 B 2018-01-01 P3 11
32 B 2018-01-02 P3 12
33 B 2018-01-03 P3 13
34 B 2018-01-04 P3 14
35 B 2018-01-05 P3 15
36 B 2018-01-01 P4 16
37 B 2018-01-02 P4 17
38 B 2018-01-03 P4 18
39 B 2018-01-04 P4 19
40 B 2018-01-05 P4 20
41 C 2018-01-01 P1 1
42 C 2018-01-02 P1 2
43 C 2018-01-03 P1 3
44 C 2018-01-04 P1 4
45 C 2018-01-05 P1 5
46 C 2018-01-01 P2 6
47 C 2018-01-02 P2 7
48 C 2018-01-03 P2 8
49 C 2018-01-04 P2 9
50 C 2018-01-05 P2 10
51 C 2018-01-01 P3 11
52 C 2018-01-02 P3 12
53 C 2018-01-03 P3 13
54 C 2018-01-04 P3 14
55 C 2018-01-05 P3 15
56 C 2018-01-01 P4 16
57 C 2018-01-02 P4 17
58 C 2018-01-03 P4 18
59 C 2018-01-04 P4 19
60 C 2018-01-05 P4 20
When creating your df, use stringsAsFactors = FALSE so as to not deal with factors.
df <- data.frame(Comp, Date, Plot,stringsAsFactors=FALSE)
df$z=as.numeric(as.factor(paste(df$Date,df$Plot,sep="#")))
> head(df,25)
Comp Date Plot z
1 A 2018-01-01 P1 1
2 A 2018-01-01 P2 2
3 A 2018-01-01 P3 3
4 A 2018-01-01 P4 4
5 A 2018-01-02 P1 5
6 A 2018-01-02 P2 6
7 A 2018-01-02 P3 7
8 A 2018-01-02 P4 8
9 A 2018-01-03 P1 9
10 A 2018-01-03 P2 10
11 A 2018-01-03 P3 11
12 A 2018-01-03 P4 12
13 A 2018-01-04 P1 13
14 A 2018-01-04 P2 14
15 A 2018-01-04 P3 15
16 A 2018-01-04 P4 16
17 A 2018-01-05 P1 17
18 A 2018-01-05 P2 18
19 A 2018-01-05 P3 19
20 A 2018-01-05 P4 20
21 B 2018-01-01 P1 1
22 B 2018-01-01 P2 2
23 B 2018-01-01 P3 3
24 B 2018-01-01 P4 4
25 B 2018-01-02 P1 5
First we generate a new variable that pastes the Date and Plot columns together with an uncommon separator (#; the rarer the better). We then take advantage of the as.numeric(as.factor()) combination, which first groups the new variable as a factor and then assigns a number to each level.
@Rui Barradas had the answer with a very simple line of code:
df$new <- with(df, ave(as.integer(Comp), Comp, FUN = seq_along))
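If the dates (or the number of dates) differ between plots, a slightly more generic variant is to rank the plot-date combinations within each compartment. This is only a sketch using dplyr; it assumes the Date values are ISO yyyy-mm-dd strings (so they sort chronologically as text) and that the plot names sort in the intended order:
library(dplyr)

df_numbered <- df %>%
  group_by(Comp) %>%                                # restart the numbering in every compartment
  mutate(key = paste(Plot, Date, sep = "#"),        # one key per plot-date combination
         T   = match(key, sort(unique(key)))) %>%   # consecutive numbers, plot then date order
  ungroup() %>%
  select(-key)
On the dummy data above this reproduces the expected 1:20 numbering per compartment, and it keeps numbering consecutively when a plot has extra or missing dates.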

Using data.table to create a sequence from starting points and increments

I would like to use data.table to repeatedly add an increment to a starting point.
library(data.table)
dat <- data.table(time=seq(from=as.POSIXct("2018-01-01 01:00:01"),to=as.POSIXct("2018-01-01 01:00:10"), by="secs"), int=c(2,3,3,1,10,10,10,10,10,10), x=2*1:10)
> dat
time int x
1: 2018-01-01 01:00:01 2 2
2: 2018-01-01 01:00:02 3 4
3: 2018-01-01 01:00:03 3 6
4: 2018-01-01 01:00:04 1 8
5: 2018-01-01 01:00:05 10 10
6: 2018-01-01 01:00:06 10 12
7: 2018-01-01 01:00:07 10 14
8: 2018-01-01 01:00:08 10 16
9: 2018-01-01 01:00:09 10 18
10: 2018-01-01 01:00:10 10 20
That is, starting in row 1, I would like to add the value of int to time, yielding a new time. I then need to add the value of int at that new time to arrive at a third time. The result would then be
> res
                  time int  x
1: 2018-01-01 01:00:01   2  2
2: 2018-01-01 01:00:03   3  6
3: 2018-01-01 01:00:06  10 12
I would probably know how to do this in a loop, but I wonder whether data.table can handle these sorts of problems as well.
Since the values in time are continuous, my idea was to use the cumulative values of int as an index, along the lines of
index <- dat[...,cumsum(...int...),...]
dat[index]
but I cannot get cumsum() to ignore the values in between the points of interest. Perhaps this can be done in the i part of data.table, but I would not know how. Does anyone have an idea?
# start with finding the next time
dat[, next.time := time + int][!dat, on = .(next.time = time), next.time := NA]
# do this in a loop for the actual problem, and stop when final column is all NA
dat[dat, on = .(next.time = time), t1 := i.next.time]
dat[dat, on = .(t1 = time), t2 := i.next.time]
dat
# time int x next.time t1 t2
# 1: 2018-01-01 01:00:01 2 2 2018-01-01 01:00:03 2018-01-01 01:00:06 <NA>
# 2: 2018-01-01 01:00:02 3 4 2018-01-01 01:00:05 <NA> <NA>
# 3: 2018-01-01 01:00:03 3 6 2018-01-01 01:00:06 <NA> <NA>
# 4: 2018-01-01 01:00:04 1 8 2018-01-01 01:00:05 <NA> <NA>
# 5: 2018-01-01 01:00:05 10 10 <NA> <NA> <NA>
# 6: 2018-01-01 01:00:06 10 12 <NA> <NA> <NA>
# 7: 2018-01-01 01:00:07 10 14 <NA> <NA> <NA>
# 8: 2018-01-01 01:00:08 10 16 <NA> <NA> <NA>
# 9: 2018-01-01 01:00:09 10 18 <NA> <NA> <NA>
#10: 2018-01-01 01:00:10 10 20 <NA> <NA> <NA>
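The comment in the code above suggests doing this in a loop for the real problem; a rough sketch of such a loop (assuming dat as created in the question, and stopping once the newest column is entirely NA) could look like this:
library(data.table)

dat[, next.time := time + int]                        # first hop from every row
dat[!dat, on = .(next.time = time), next.time := NA]  # discard hops that land outside the data

prev <- "next.time"
i <- 0L
repeat {
  i <- i + 1L
  new <- paste0("t", i)
  join_on <- setNames("time", prev)                   # e.g. c(next.time = "time")
  # join the previous hop column back onto time to find the following hop
  dat[dat, on = join_on, (new) := i.next.time]
  if (all(is.na(dat[[new]]))) break                   # nothing left to chain
  prev <- new
}
Reading row 1 across time, next.time, t1, ... then gives the chain 01:00:01, 01:00:03, 01:00:06, i.e. the rows of the desired result.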

Pivoting one column while keeping the rest in R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 4 years ago.
I am quite new to coding in R, and I'm working on cleaning and transforming some data.
I have looked at different uses of reshape() and the reshape2 cast functions to help me, but I have not been able to succeed.
Basically, what I would like to do is move one column up to become the column headers for the values.
This is my data:
#My data:
KEYFIGURE LOCID PRDID KEYFIGUREDATE KEYFIGUREVALUE
Sales 1001 A 2018-01-01 1
Promo 1001 A 2018-01-02 2
Disc 1001 A 2018-01-03 3
Sales 1001 B 2018-01-01 10
Promo 1001 B 2018-01-01 11
Disc 1002 B 2018-01-03 12
The result i would like to get:
LOCID PRDID KEYFIGUREDATE Sales Promo Disc
1001 A 2018-01-01 1 2
1001 A 2018-01-03 3
1001 B 2018-01-01 10 11
1002 B 2018-01-03 12
However, I am having quite some trouble figuring out how this is possible in a smart way with the reshape package.
You can do this in one line with tidyr::spread:
library(tidyr)
df %>%
  spread(KEYFIGURE, KEYFIGUREVALUE)
LOCID PRDID KEYFIGUREDATE Disc Promo Sales
1 1001 A 2018-01-01 NA NA 1
2 1001 A 2018-01-02 NA 2 NA
3 1001 A 2018-01-03 3 NA NA
4 1001 B 2018-01-01 NA 11 10
5 1002 B 2018-01-03 12 NA NA
The way the function works is that you give it two variables from your dataset: the first is the variable whose values become the new column names, and the second is the variable that supplies the values to put in those cells.
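As a side note, spread() has since been superseded in tidyr 1.0+ by pivot_wider(); the equivalent call should be roughly:
library(tidyr)

df %>%
  pivot_wider(names_from = KEYFIGURE, values_from = KEYFIGUREVALUE)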

SQLITE - Flatten a key-value table into columns [duplicate]

I have a table in SQLite called param_vals_breaches that looks like the following:
id param queue date_time param_val breach_count
1 c a 2013-01-01 00:00:00 188 7
2 c b 2013-01-01 00:00:00 156 8
3 c c 2013-01-01 00:00:00 100 2
4 d a 2013-01-01 00:00:00 657 0
5 d b 2013-01-01 00:00:00 23 6
6 d c 2013-01-01 00:00:00 230 12
7 c a 2013-01-01 01:00:00 100 0
8 c b 2013-01-01 01:00:00 143 9
9 c c 2013-01-01 01:00:00 12 2
10 d a 2013-01-01 01:00:00 0 1
11 d b 2013-01-01 01:00:00 29 5
12 d c 2013-01-01 01:00:00 22 14
13 c a 2013-01-01 02:00:00 188 7
14 c b 2013-01-01 02:00:00 156 8
15 c c 2013-01-01 02:00:00 100 2
16 d a 2013-01-01 02:00:00 657 0
17 d b 2013-01-01 02:00:00 23 6
18 d c 2013-01-01 02:00:00 230 12
I want to write a query that will show me a particular queue (e.g. "a") with the average param_val and breach_count for each param on an hour by hour basis. So transposing the data to get something that looks like this:
Results for Queue A
Hour 0 Hour 0 Hour 1 Hour 1 Hour 2 Hour 2
param avg_param_val avg_breach_count avg_param_val avg_breach_count avg_param_val avg_breach_count
c xxx xxx xxx xxx xxx xxx
d xxx xxx xxx xxx xxx xxx
Is this possible? I'm not sure how to go about it. Thanks!
SQLite does not have a PIVOT function but you can use an aggregate function with a CASE expression to turn the rows into columns:
select param,
       avg(case when time = '00' then param_val end) AvgHour0Val,
       avg(case when time = '00' then breach_count end) AvgHour0Count,
       avg(case when time = '01' then param_val end) AvgHour1Val,
       avg(case when time = '01' then breach_count end) AvgHour1Count,
       avg(case when time = '02' then param_val end) AvgHour2Val,
       avg(case when time = '02' then breach_count end) AvgHour2Count
from
(
  select param,
         strftime('%H', date_time) time,
         param_val,
         breach_count
  from param_vals_breaches
  where queue = 'a'
) src
group by param;
See SQL Fiddle with Demo

How to recreate the table by key?

I thought this would be a very easy question, but I am really new to R.
I have a data.table with lots of rows and several columns, two of which could be set as the key. I want to recreate the table by key.
For example, take the simple data below. In this case the key is ID and Act, and we get a total of 4 groups.
ID ValueDate Act Volume
1 2015-01-01 EUR 21
1 2015-02-01 EUR 22
1 2015-01-01 MAD 12
1 2015-02-01 MAD 11
2 2015-01-01 EUR 5
2 2015-02-01 EUR 7
3 2015-01-01 EUR 4
3 2015-02-01 EUR 2
3 2015-03-01 EUR 6
Here is code to generate the test data:
library(data.table)
dd <- data.table(ID = c(1,1,1,1,2,2,3,3,3),
                 ValueDate = c("2015-01-01", "2015-02-01", "2015-01-01", "2015-02-01", "2015-01-01",
                               "2015-02-01", "2015-01-01", "2015-02-01", "2015-03-01"),
                 Act = c("EUR","EUR","MAD","MAD","EUR","EUR","EUR","EUR","EUR"),
                 Volume = c(21,22,12,11,5,7,4,2,6))
After the change, each column should represent a specific group defined by the key (ID and Act).
Below is the result:
ValueDate ID1_EUR ID1_MAD ID2_EUR ID3_EUR
2015-01-01 21 12 5 4
2015-02-01 22 11 7 2
2015-03-01 NA NA NA 6
Thanks a lot !
What you are trying to do is not recreating the data.table, but reshaping it from a long format to a wide format. You can use dcast for this:
dcast(dd, ValueDate ~ ID + Act, value.var = "Volume")
which gives:
ValueDate 1_EUR 1_MAD 2_EUR 3_EUR
1: 2015-01-01 21 12 5 4
2: 2015-02-01 22 11 7 2
3: 2015-03-01 NA NA NA 6
If you want the numbers in the resulting column names to be preceded by "ID", you can use:
dcast(dd, ValueDate ~ paste0("ID",ID) + Act, value.var = "Volume")
which gives:
ValueDate ID1_EUR ID1_MAD ID2_EUR ID3_EUR
1: 2015-01-01 21 12 5 4
2: 2015-02-01 22 11 7 2
3: 2015-03-01 NA NA NA 6
