Moving average with dynamic window - r
I'm trying to add a new column to my data table that contains the average of some of the following rows. The number of rows that go into the average, however, depends on the time stamps of the rows.
Here is some test data:
library(data.table)
DT<-data.table(Weekstart=c(1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9),Art=c("a","b","a","b","a","a","a","b","b","a","b","a","b","a","b","a"),Demand=c(1:16))
I want to add a column with the mean of all demands that occurred in the weeks ("Weekstart") up to three weeks before the respective week (grouped by Art, excluding the current week).
With rollapply from the zoo package, it works like this:
library(zoo)
setorder(DT,-Weekstart)
DT[,RollMean:=rollapply(Demand,width=list(1:3),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
The problem, however, is that some data is missing. In the example, Art b has no row for week 4, i.e. there is no Demand in week 4. Because I want the average over the three prior weeks, not the three prior rows, the average is wrong. Instead, the result for Art b for week 6 should look like this:
DT[Art=="b"&Weekstart==6,RollMean:=6]
(6 instead of 14/3, because only Week 5 and Week 3 count: (8+4)/2)
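For illustration, the two averages for Art b in week 6 can be reproduced by hand in base R. This is only a sketch of the arithmetic: the values are copied from the test data above, and the variable names (week, demand, target, by_rows, by_weeks) are made up for this example.

```r
# Demand for Art "b", copied from the test data (week 4 is missing)
week   <- c(2, 3, 5, 6, 7, 8, 9)
demand <- c(2, 4, 8, 9, 11, 13, 15)

target <- 6
idx <- which(week == target)

# Row-based window: the three rows before week 6 (weeks 5, 3 and 2)
by_rows <- mean(demand[(idx - 3):(idx - 1)])                   # 14/3

# Week-based window: only rows whose week lies in [target - 3, target - 1]
by_weeks <- mean(demand[week >= target - 3 & week < target])   # 6
```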
Here is what I tried so far:
It would be possible to loop through the minima of the week of the following rows in order to create a vector that defines for each row, how wide the 'width' should be (the new column 'rollwidth'):
i<-3
DT[,rollwidth:=Weekstart-rollapply(Weekstart,width=list(1:3),partial=TRUE,FUN=min,align="left",fill=1),.(Art)]
while (max(DT[,Weekstart-rollapply(Weekstart,width=list(1:i),partial=TRUE,FUN=min,align="left",fill=NA),.(Art)][,V1],na.rm=TRUE)>3) {
i<-i-1
DT[rollwidth>3,rollwidth:=i]
}
But that seems very unprofessional (excuse my poor skills). And, unfortunately, rollapply with width and rollwidth doesn't work as intended (it produces warnings because 'rollwidth' is treated as the whole vector of rollwidths rather than a single value per row):
DT[,RollMean2:=rollapply(Demand,width=list(1:rollwidth),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
What does work is
DT[,RollMean3:=rollapply(Demand,width=rollwidth,partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
but then again, the average includes the current week (not what I want).
Does anybody know how to pass a criterion (i.e. the difference in weeks shall be <= 3) to the width argument instead of a number of rows?
Any suggestions are appreciated!
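One possible way to express the week-based criterion, sketched here without rollapply: within each Art group, compare every row's Weekstart against the group's other Weekstart values and average the matching Demand. The helper mean_prior_weeks and the column name RollMeanWeeks are hypothetical names introduced for this sketch, and the comparison is quadratic per group, so it is probably only suitable for groups of moderate size.

```r
library(data.table)

DT <- data.table(
  Weekstart = c(1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9),
  Art = c("a","b","a","b","a","a","a","b","b","a","b","a","b","a","b","a"),
  Demand = 1:16
)

# Hypothetical helper: for each week, mean of `demand` over the
# `n_weeks` weeks before it (the week itself is excluded)
mean_prior_weeks <- function(week, demand, n_weeks = 3) {
  vapply(week, function(w) {
    in_window <- week >= w - n_weeks & week < w
    # no earlier week inside the window: return NA instead of NaN
    if (any(in_window)) mean(demand[in_window]) else NA_real_
  }, numeric(1))
}

# Apply per Art; the result does not depend on the row order of DT
DT[, RollMeanWeeks := mean_prior_weeks(Weekstart, Demand), by = Art]

DT[Art == "b" & Weekstart == 6, RollMeanWeeks]  # 6, i.e. (8+4)/2
```

Because the window is defined on Weekstart rather than on row positions, missing weeks simply contribute no rows, and no setorder call is needed beforehand.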
Related
Grouping data based on difference in days
I have a data frame with 3 columns: subid, test and day. For each subject, I want to identify which tests happened within a time frame of x days and calculate the max change in test value. Please see the example below. For each subject and a given test, I want to identify which tests happened within 3 days. So if we look at the "Day" column, the value 1 won't belong to any group, as the subsequent test was done 6 days later. The values Day = 10, 7, 8, 9 should be identified as a group, and the max change among these should be calculated. Similarly, Day = 12, 11, 10, 9 should be identified as another group, and the max change among these should be calculated. How can I do this using R? Thank you in advance.
How to combine sets of similar data under common columns (a specific case)
Here's a portion of a dataset that I have of daily closes for certain stocks within a common period of time in .xlsx format: What I need is an R script that would produce something like this: So I need a row for each stock for every day of the time period, with the corresponding price in the third column like above. Of course, I have more than 100 stocks for a period of 4 years. So that makes more than 100 rows for each day for 4 years. For example, a hundred rows for the day 5.01.2015 and so forth. I'm still very new to R so help is very much appreciated.
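A minimal sketch of such a wide-to-long reshape, assuming the spreadsheet has already been read into R with one row per day and one column per stock. The table wide, the stock columns AAA and BBB, the prices and the output column names Stock and Close are all hypothetical, since the original data is not shown.

```r
library(data.table)

# Hypothetical wide table: one row per day, one column per stock
wide <- data.table(
  Date = as.Date(c("2015-01-05", "2015-01-06")),
  AAA  = c(10.0, 10.5),
  BBB  = c(20.0, 19.5)
)

# One row per (Date, Stock) pair, price in the third column
long <- melt(wide, id.vars = "Date",
             variable.name = "Stock", value.name = "Close")

# Group the rows of each day together
setorder(long, Date, Stock)
```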
How do I calculate overlapping three-day log returns in the same dataframe in R?
I've just started learning R. As for now, I have prices PRC in a dataframe test together with the date and several other variables. My goal is to calculate the following within the same dataframe so I can maintain the connection to the date:
1. Overlapping three-day log returns
2. One-day log returns
Through other posts I came up with the following code for the three-day and one-day lag returns respectively, but I am still unsure how to incorporate it into my dataframe:
test$logR3 <- diff(log(test$PRC), lag=3)
This code currently doesn't work due to the difference in the number of rows. How do I take this into account? Can I somehow put zeros or NAs in order to fill the missing rows? Thank you in advance.
maybe something like:
days=c()
for(i in seq(3,nrow(test),3)){ #loop through it in steps of 3
  one_day_ago_diff=log(test$PRC[i])-log(test$PRC[i-1]) #difference between today and yesterday
  three_days_ago_diff=log(test$PRC[i])-log(test$PRC[i-3]) #difference between today and three days ago
  days=c(days,c(three_days_ago_diff,NA,one_day_ago_diff)) # fills empty vector with diff from 3 days ago- followed by NA to skip 2 days ago and then one day ago
}
if(length(days)<nrow(test)){days=c(days, rep(NA,nrow(test)-length(days)))} #check they're the same length
test$lags=days #add column to test
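As an alternative to the loop, a vectorised sketch: diff() returns lag fewer values than its input, so padding the front with that many NAs makes the result line up with the rows of the data frame. The prices below are made-up toy values, and logR1 is a column name introduced here by analogy with logR3 from the question.

```r
# Toy prices; `test` and `PRC` follow the names used in the question
test <- data.frame(PRC = c(100, 102, 101, 105, 107, 110))

log_prc <- log(test$PRC)

# diff(x, lag = k) has length(x) - k elements, so prepend k NAs
test$logR1 <- c(NA, diff(log_prc, lag = 1))          # one-day log returns
test$logR3 <- c(rep(NA, 3), diff(log_prc, lag = 3))  # overlapping three-day log returns
```

Each row then holds the return ending on that row's date, and the first rows, which have no earlier price to compare against, stay NA.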
Tableau - Average of Ranking based on Average
For a certain data range, for a specific dimension, I need to calculate the average value of a daily rank based on the average value. First of all this is the starting point: This is quite simple: for each day and category I get the AVG(Value) and the rank based on that AVG(Value), computed using Category. Now what I need is "just" a table with one row for each Category with the average value of that rank for the overall period. Something like this:
Category     Global Rank
A (blue)     1,6   (1+3+1+1+1+3)/6
B (orange)   2,3   (3+2+3+2+2+2)/6
C (red)      2,0   (2+1+2+3+3+1)/6
I tried using the LOD but it's not possible to use the rank table calculation inside them, so I'm wondering if I'm missing anything or if it's even possible in Tableau. Please find attached the twbx with the raw data here: Any help would be appreciated.
Get Maximum Values of a column as a function of another column
I have a column with a few dozen grades that have been assigned values Good, Average or Poor. I have a different column with employment rates. I want the maximum employment rate associated with Good, Average and Poor. I can get it to pull the value for each one in three different commands using the code below, but I need it written as a single command similar to this:
max(unHomework$Employment.Rate[unHomework$Job.Satisfaction.Category == 'Poor'])
We can use data.table
library(data.table)
setDT(unHomework)[, .(MaxER = max(Employment.Rate)), by = Job.Satisfaction.Category]