Projecting in DAX

I am still relatively new to the DAX language. My question is: how do I forward-project based on date-specific variables such as holiday performance boosts or yearly low periods?
I have two tables: one has sales data for multiple SKUs, and the other has the time-of-year variables.
My forecast factors are in a small table with the date ranges and the % change needed, like so:
Date        Increase
1/1/2018    1
4/14/2018   0.9
5/1/2018    1
6/1/2018    0.85
8/1/2018    1.05
11/18/2018  1.25
I have only gotten as far as projecting 30 days into the future, using the DATEADD function inside a CALCULATETABLE:
transOrder30 :=
VAR Days = -30
RETURN
    SUMX(
        CALCULATETABLE(
            sales,
            DATEADD('Calendar'[Date], Days, DAY),
            ALL(sales[date])
        ),
        sales[organicOrder]
    )
where organic_orders := SUM(sales[organicOrder]).
So how do I take this projected 30 days, repeat it out through future dates, and then multiply it by the time-of-year change?
In Excel that would be as easy as organicOrder * VLOOKUP(date, A:B, 2).
I have looked through this guy's post, but it doesn't work for me:
https://blog.gbrueckl.at/2015/04/recursive-calculations-powerpivot-dax/
Any help is appreciated!
Example:

Sales (actuals):
Date        Orders
2/1/2018    15
2/2/2018    19
2/3/2018    12
2/4/2018    18
2/5/2018    15
2/6/2018    14
2/7/2018    11
2/8/2018    16
2/9/2018    16
2/10/2018   18

Time-of-year factors:
Date        ToY
2/10/2018   1
2/20/2018   1.2
3/2/2018    1.5

Desired projection (Projection = avg(Orders) * ToY):
Date        Projection
2/11/2018   15.40
2/12/2018   15.40
2/13/2018   15.40
2/14/2018   15.40
2/15/2018   15.40
2/16/2018   15.40
2/17/2018   15.40
2/18/2018   15.40
2/19/2018   15.40
2/20/2018   18.48
2/21/2018   18.48
2/22/2018   18.48
2/23/2018   18.48
2/24/2018   18.48
2/25/2018   18.48
2/26/2018   18.48
2/27/2018   18.48
2/28/2018   18.48
3/1/2018    18.48
3/2/2018    23.10
3/3/2018    23.10

Create a new table where you will put your projections. One way to do this is New Table > Projection = CALENDARAUTO().
Now that you have a date column, we'll add a projected orders calculated column.
(I will assume you have two tables, Sales and Forecast, where the former corresponds to the Date and Orders columns of the first 10 rows of your example data and the latter corresponds to the Date and ToY data.)
ProjectedOrders =
VAR AverageOrder = AVERAGE(Sales[Orders])
VAR ToYDate =
    CALCULATE(
        MAX(Forecast[Date]),
        Forecast[Date] < EARLIER(Projection[Date])
    )
RETURN
    IF(
        Projection[Date] < MIN(Sales[Date]),
        BLANK(),
        IF(
            Projection[Date] IN VALUES(Sales[Date]),
            LOOKUPVALUE(Sales[Orders], Sales[Date], Projection[Date]),
            AverageOrder * LOOKUPVALUE(Forecast[ToY], Forecast[Date], ToYDate)
        )
    )
This gives a blank if the date is before any of your sales data and duplicates the sales data for dates where sales data exists. Otherwise, it multiplies the average order by the last ToY value that occurs before that row's date.

Related

poly2nb function takes too much time to be computed

I have a data frame with information about crimes (variable x) and the latitude and longitude of where each crime happened. I also have a shapefile with the districts of São Paulo city. This is df:
latitude longitude n_homdol
1 -23.6 -46.6 1
2 -23.6 -46.6 1
3 -23.6 -46.6 1
4 -23.6 -46.6 1
5 -23.6 -46.6 1
6 -23.6 -46.6 1
And sp.dist.sf, a shapefile for the districts of São Paulo, SP:
geometry NOME_DIST
1 POLYGON ((352436.9 7394174,... JOSE BONIFACIO
2 POLYGON ((320696.6 7383620,... JD SAO LUIS
3 POLYGON ((349461.3 7397765,... ARTUR ALVIM
4 POLYGON ((320731.1 7400615,... JAGUARA
5 POLYGON ((338651 7392203, 3... VILA PRUDENTE
6 POLYGON ((320606.2 7394439,... JAGUARE
With the help of @Humpelstielzchen, I joined the two datasets by doing:
sf_df = st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)
shape_df<-st_join(sp.dist.sf, sf_df, join=st_contains)
My final goal is to compute a local Moran's I statistic, and I'm trying to do that with:
library(sf)            # st_coordinates
library(spdep)         # poly2nb, nb2listw, localmoran
library(GISTools)      # auto.shading, choropleth, choro.legend
library(RColorBrewer)  # brewer.pal

sp_viz <- poly2nb(shape_df, row.names = shape_df$NOME_DIST)
xy <- st_coordinates(shape_df)
ww <- nb2listw(sp_viz, style = 'W', zero.policy = TRUE)
shape_df[is.na(shape_df)] <- 0
locMoran <- localmoran(shape_df$n_homdol, ww)
sids.shade <- auto.shading(c(locMoran[, 1], -locMoran[, 1]),
                           cols = brewer.pal(5, "PRGn"))
choropleth(shape_df, locMoran[, 1], shading = sids.shade)
choro.legend(-46.5, -20, sids.shade, fmt = "%6.2f")
title("Criminalidade (Local Moran's I)", cex.main = 2)
But when I run the code, this line takes hours to compute:
sp_viz <- poly2nb(shape_df, row.names = shape_df$NOME_DIST)
I have 15,000 observations for 93 districts. I tried running the above code with only 100 observations, and it was fast and everything went right. But with the 15,000 observations I never saw the result, because the computation goes on forever. What may be happening? Am I doing something wrong? Is there a better way to do this local Moran's I test?
As I can't just comment, here are some questions one might ask:
- How long do you mean by fast? Some of my scripts run in seconds and I would still call that slow.
- Are all your observations identically structured? Maybe poly2nb() is looping endlessly over an item with an uncommon structure. You can use the unique() function to check this.
- Did you try cutting your dataset into pieces and running each piece separately? This would help to see 1) whether one of the parts has something that needs correcting and 2) whether R is loading all the data at once and overloading your computer's memory. Beware, this happens really often with huge datasets in R (and by huge, I mean data tables weighing more than about 50 MB).
Glad to have tried to help you; do not hesitate to question my answer!
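To make those checks concrete, here is a minimal sketch in R (shape_df and NOME_DIST are the names from the question; the subset sizes in the loop are arbitrary):
library(spdep)
# How many distinct districts does the joined object really contain?
# A point-in-polygon join can repeat each district once per point,
# so poly2nb may be working on far more polygons than the 93 districts.
length(unique(shape_df$NOME_DIST))
nrow(shape_df)
# Time poly2nb on pieces of increasing size to see how the cost grows
for (n in c(100, 500, 1000)) {
  piece <- shape_df[seq_len(n), ]
  print(system.time(poly2nb(piece)))
}
If nrow(shape_df) turns out to be far larger than the number of districts, aggregating the crime counts to one row per district before calling poly2nb should bring the run time back in line with the fast 100-observation test.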

brownian.bridge slow calculation and Error in area.grid[1, 1] : incorrect number of dimensions

I am trying to calculate some BBMM.contours for caribou during a movement period in northern Canada.
I am still in the exploratory phase of using this function, and have worked through some tutorials which worked fine, but now that I am trying my sample data the brownian.bridge function seems to be taking an eternity.
I understand that this is a function that can take a long time to calculate, but I have tried subsetting my data to include fewer and fewer locations, simply to see if the end product is what I want before committing to running the dataset with thousands of locations. Currently I only have 34 locations in the subset, and I have waited overnight for it to run without it completing.
When I used some practice Panther location data with 1000 locations it took under a minute to run, so I am thinking there is something wrong with my code or my data.
Any help working through this would be greatly appreciated.
# Load package and data
library(BBMM)   # provides brownian.bridge() and bbmm.summary()
data <- X2017loc

# Parse timestamps (used to sort the data below for all caribou)
data$DT <- as.POSIXct(data$TimeStamp, format = '%Y-%m-%d %H:%M:%S')

# Sort data by animal and time
data <- data[order(data$SAMPLED_ANIMAL_ID, data$DT), ]

# Time difference between fixes, necessary in the BBMM code
### Joel is not sure about this part... timelag is maybe time until GPS upload???
timediff <- diff(data$DT)
data <- data[-1, ]
data$timelag <- as.numeric(abs(timediff))

# Set timelag
data <- data[-1, ]   # Remove first record with wrong timelag
data$SAMPLED_ANIMAL_ID <- factor(data$SAMPLED_ANIMAL_ID)
data <- data[!is.na(data$timelag), ]
data$LONGITUDE <- as.numeric(data$LONGITUDE)
data$LATITUDE <- as.numeric(data$LATITUDE)

BBMM <- brownian.bridge(x = data$LONGITUDE, y = data$LATITUDE,
                        time.lag = data$timelag, location.error = 6,
                        cell.size = 30)
bbmm.summary(BBMM)
Additional information:
Timelag is in seconds and
Collars have 6m location error
I am not certain what the cell.size refers to and how I should determine this number.
SAMPLED_ANIMAL_ID LONGITUDE LATITUDE TimeStamp timelag
218 -143.3138219 68.2468358 2017-05-01 02:00 18000
218 -143.1637592 68.2687447 2017-05-01 07:00 18000
218 -143.0699697 68.3082906 2017-05-01 12:00 18000
218 -142.8352869 68.3182258 2017-05-01 17:00 18000
218 -142.7707111 68.2892111 2017-05-01 22:00 18000
218 -142.5362769 68.3394269 2017-05-02 03:00 18000
218 -142.4734997 68.3459528 2017-05-02 08:00 18000
218 -142.3682272 68.3801822 2017-05-02 13:00 18000
218 -142.2198042 68.4023253 2017-05-02 18:00 18000
218 -142.0235464 68.3968672 2017-05-02 23:00 18000
I would suggest using cell.size = 100 instead of area.grid, since for area.grid you would have to define a single rectangular grid covering all animals (which could increase computation time).
OK, I have answered my original question: I was missing the following code to reproject the lat/long coordinates to UTM.
data <- SpatialPoints(data[, c("LONGITUDE", "LATITUDE")],
                      proj4string = CRS("+proj=longlat +ellps=WGS84"))
data <- spTransform(data, CRS("+proj=utm +zone=7 +ellps=WGS84"))
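Putting the reprojection and the cell.size suggestion together, here is a minimal sketch (pts, pts_utm, xy and fit are names introduced for illustration; it assumes data still holds the timelag column built in the code above):
library(sp)
library(BBMM)
# Reproject from lat/long to UTM zone 7 so that distances (and cell.size) are in metres
pts <- SpatialPoints(data[, c("LONGITUDE", "LATITUDE")],
                     proj4string = CRS("+proj=longlat +ellps=WGS84"))
pts_utm <- spTransform(pts, CRS("+proj=utm +zone=7 +ellps=WGS84"))
xy <- coordinates(pts_utm)
# Fit the Brownian bridge on projected coordinates with a 100 m cell size
fit <- brownian.bridge(x = xy[, 1], y = xy[, 2],
                       time.lag = data$timelag,
                       location.error = 6,
                       cell.size = 100)
bbmm.summary(fit)
Keeping the reprojected points in a separate object avoids overwriting data, which still has to supply the timelag column.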

Summing values for a month in R

Please see a sample of the data below:
3326 2015-03-03 Wm Eu Apple 2L 60
3327 2015-03-03 Tp Euro 2 Layer 420
3328 2015-03-03 Tpe 3-Layer 80
3329 2015-03-03 14/3 Bgs 145
3330 2015-03-04 T/P 196
3331 2015-03-04 Wm Eu Apple 2L 1,260
3332 2015-03-04 Tp Euro 2 Layer 360
3333 2015-03-04 14/3 Bgs 1,355
Currently, graphing this data creates a really horrible graph because the number of cartons changes so rapidly from day to day. It would make more sense to sum the cartons by month, so that each data point represents a total for that month rather than an individual day. The current range of the data is 11/01/2008-04/01/2015.
This is the code that I am using to graph (which may or may not be relevant for this):
ggvis(myfile, ~Shipment.Date, ~ctns) %>%
layer_lines()
Shipment.Date is column 2 in the data set and ctns is the 4th column.
I don't know much about R and have given it a few tries with some code that I found here, but I don't think I have found a problem similar enough to adapt the code. My idea is to create a new table, sum Act. Ctns for each month, save it as that new table, and graph from there.
Thanks for any assistance! :)
Do you need this:
data.aggregated <- aggregate(list(new.value = data$value),
                             by = list(date.time = cut(data$date.time, breaks = "1 month")),
                             FUN = sum)
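Applied to the object and column names from the question, a sketch (it assumes Shipment.Date is already a Date or POSIXct column, and that ctns may contain thousands separators such as "1,260"):
library(ggvis)
# Strip thousands separators if ctns was read in as character ("1,260" -> 1260)
myfile$ctns <- as.numeric(gsub(",", "", myfile$ctns))
# Sum cartons per calendar month
monthly <- aggregate(list(ctns = myfile$ctns),
                     by = list(month = as.Date(cut(myfile$Shipment.Date, breaks = "1 month"))),
                     FUN = sum)
# Plot the monthly totals instead of the daily values
monthly %>%
  ggvis(~month, ~ctns) %>%
  layer_lines()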

Merge spatial point dataset with Spatial grid dataset using R. (Master dataset is in SP Points format)

I am working on spatial datasets using R.
Data Description
My master dataset is in SpatialPointsDataFrame format and has surface temperature data (column names - "ruralLSTday", "ruralLSTnight") for every month. Data snippet is shown below:
Master Data - (in SpatialPointsDataFrame format)
TOWN_ID ruralLSTday ruralLSTnight year month
2920006.11 2920006 303.6800 289.6400 2001 0
2920019.11 2920019 302.6071 289.0357 2001 0
2920015.11 2920015 303.4167 290.2083 2001 0
3214002.11 3214002 274.9762 293.5325 2001 0
3214003.11 3214003 216.0267 293.8704 2001 0
3207010.11 3207010 232.6923 295.5429 2001 0
Coordinates:
longitude latitude
2802003.11 78.10401 18.66295
2802001.11 77.89019 18.66485
2803003.11 79.14883 18.42483
2809002.11 79.55173 18.00016
2820004.11 78.86179 14.47118
I want to add columns to the above data for rainfall and air temperature. This data is present as a SpatialGridDataFrame in the table "secondary_data" for every month. A snippet of "secondary_data" is shown below:
Secondary Data - (in SpatialGridDataFrame format)
month meant.69_73 rainfall.69_73
1 1 25.40968 0.6283871
2 2 26.19570 0.4580542
3 3 27.48942 1.0800000
4 4 28.21407 4.9440000
5 5 27.98987 9.3780645
Coordinates:
longitude latitude
[1,] 76.5 8.5
[2,] 76.5 8.5
[3,] 76.5 8.5
[4,] 76.5 8.5
[5,] 76.5 8.5
Question
How do I add the columns from the secondary data to my master data by matching on latitude, longitude, and month? Currently the latitude/longitude values in the two tables above will not match exactly, since the master data is a set of points and the secondary data is a grid.
Is there a way to find the grid square in the secondary data that each lat/long point of my master data falls into, and interpolate?
If your SpatialPointsDataFrame object is called x, and your SpatialGridDataFrame is called y, then
x <- cbind(x, over(x, y))
will add the attributes (grid-cell values) of y, matched to the locations of x, to the attributes of x. Matching is done point-in-grid-cell.
Interpolation is a different question; a simple way would be inverse distance weighting with the four nearest neighbours, e.g. by
library(gstat)
x = idw(meant.69_73~1, y, x, nmax = 4)
Whether you want one or the other really depends on what your grid cells mean: do they refer to (i) the point value at the grid-cell centre, (ii) a value that is constant throughout the grid cell, or (iii) an average value over the whole grid cell? In the first case, interpolate; in the second, use over; in the third, use area-to-point interpolation (not explained here).
The R package raster offers similar functionality, but uses different names.
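To see what the over() line returns, here is a self-contained toy illustration with synthetic data (the grid, the point locations, and the rainfall column are made up; only the pattern matters):
library(sp)
# A small 5 x 5 grid of 1-degree cells with a synthetic rainfall attribute
grd <- GridTopology(cellcentre.offset = c(76.5, 8.5),
                    cellsize = c(1, 1), cells.dim = c(5, 5))
y <- SpatialGridDataFrame(grd, data = data.frame(rainfall = runif(25)),
                          proj4string = CRS("+proj=longlat +ellps=WGS84"))
# Two points that fall inside the grid
pts <- data.frame(longitude = c(77.2, 79.1), latitude = c(9.3, 11.8),
                  ruralLSTday = c(303.7, 302.6))
x <- SpatialPointsDataFrame(pts[, c("longitude", "latitude")], data = pts,
                            proj4string = CRS("+proj=longlat +ellps=WGS84"))
# over() returns one row of grid-cell attributes per point;
# assigning it adds those values as new columns on the points
x$rainfall <- over(x, y)$rainfall
head(as.data.frame(x))
The cbind() form in the answer attaches all of y's columns at once; the single-column assignment here is just a way to pick out one attribute.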

easy way to subset data into bins

I have a data frame as seen below with over 1000 rows. I would like to subset the data into bins by 1m intervals (0-1m, 1-2m, etc.). Is there an easy way to do this without finding the minimum depth and using the subset command multiple times to place the data into the appropriate bins?
Temp..ºC. Depth..m. Light time date
1 17.31 -14.8 255 09:08 2012-06-19
2 16.83 -21.5 255 09:13 2012-06-19
3 17.15 -20.2 255 09:17 2012-06-19
4 17.31 -18.8 255 09:22 2012-06-19
5 17.78 -13.4 255 09:27 2012-06-19
6 17.78 -5.4 255 09:32 2012-06-19
Assuming that the name of your data frame is df, do the following:
split(df, findInterval(df$Depth..m., floor(min(df$Depth..m.)):0))
You will then get a list where each element is a data frame containing the rows that have Depth..m. within a particular 1 m interval.
Notice however that empty bins will be removed. If you want to keep them you can use cut instead of findInterval. The reason is that findInterval returns an integer vector, making it impossible for split to know what the set of valid bins is. It only knows the values it has seen and discards the rest. cut on the other hand returns a factor, which has all valid bins defined as levels.
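Here is a sketch of that cut()-based variant, using the same df and column name as above (the breaks run from the deepest whole metre up to 0, since the depths are negative):
# Whole-metre bin edges covering the full depth range
breaks <- floor(min(df$Depth..m.)):0
# cut() returns a factor whose levels cover every 1 m interval,
# so split() keeps empty intervals as zero-row data frames
bins <- split(df, cut(df$Depth..m., breaks = breaks))
sapply(bins, nrow)   # number of rows in each 1 m bin, including empty ones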
