I work for a utility company and I am writing a report in which I need to count how many consecutive times a meter has estimated (1, 2, 3, 4 or more times). For example, if a meter estimated in February and March, had a normal read in April, then started estimating again in May, June, July and August, I want a column that tells me the meter estimated 4 consecutive times. Right now it is telling me 6, since the count is not stopping at the regular read. HELP!
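One way to think about the count is as a streak that resets at every regular read. Below is a minimal Python sketch of that idea, not tied to any particular reporting tool. The read codes `"E"` (estimated) and `"A"` (actual read) and the function names are assumptions made for illustration; the list holds one meter's reads in chronological order.

```python
from itertools import groupby

# Hypothetical read codes: "E" = estimated read, "A" = actual (regular) read.

def estimate_runs(reads):
    """Length of every unbroken run of estimated reads, in order."""
    return [
        sum(1 for _ in group)
        for is_estimate, group in groupby(reads, key=lambda r: r == "E")
        if is_estimate
    ]

def current_streak(reads):
    """Number of estimated reads at the end of the list, stopping at the
    most recent regular read."""
    streak = 0
    for read in reversed(reads):
        if read != "E":
            break
        streak += 1
    return streak

# February through August from the example in the question.
reads = ["E", "E", "A", "E", "E", "E", "E"]
print(estimate_runs(reads))   # [2, 4]
print(current_streak(reads))  # 4
```

Summing all estimated reads would give 6 here, which matches the symptom described; the reset at the April read is what brings the result down to 4.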
I'm having a problem with the definition of order of AR, MA and ARMA time series forecasting processes. Imagine we have a time series with data from January to December, and we're in July, trying to predict August. When we say AR(2), are we using lags relating to July and June, or can those two months be any month between January and June?
I tried checking multiple sources but they define order as different things.
The order indicates how many of the previous data points are used for forecasting. Say you use an AR(2) model and want to predict August. The last two data points available and known to you are then July and June. If you next want to predict September, your model will use the data from August and July, and so on.
You can select the data points with the order. If you simply give a number, all previous data points in that range will be used, e.g. an order of two results in the last 2 lags or an order of 5 results in the last 5 lags. You can also specify the lags in []. An order of [1,3,5] means that your model uses the last, third-last and fifth-last lags, in your case that would be July, May and March, leaving out the second- and fourth-last lag. That would then be an AR([1,3,5]) model.
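To make the lag selection concrete, here is a small Python sketch that only picks out which past observations feed the prediction; it does not fit a model. The helper name `lagged_inputs` and the use of month names as stand-in data are assumptions for illustration.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def lagged_inputs(series, target_index, lags):
    """Return the observations an AR model would use to predict
    series[target_index].

    lags may be an int p (use lags 1..p) or an explicit list such as [1, 3, 5].
    """
    if isinstance(lags, int):
        lags = range(1, lags + 1)
    lags = list(lags)
    if any(target_index - lag < 0 for lag in lags):
        raise ValueError("not enough history for the requested lags")
    return [series[target_index - lag] for lag in lags]

august = MONTHS.index("August")
print(lagged_inputs(MONTHS, august, 2))          # ['July', 'June']
print(lagged_inputs(MONTHS, august, [1, 3, 5]))  # ['July', 'May', 'March']
```

Moving the target to September shifts every input forward by one month, which is the behaviour described for AR(2) above.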
For fixed-date planning with 2-week sprints, I want to determine how many sprints I have between two dates, April 1, 2020, and June 4, 2020. The number of weeks between the two dates is 9.1. Does this mean there are 4.5 sprints?
There is no such thing as half a Sprint, so I advise rounding down to a whole number. For your calculation, I'd say you have four Sprints.
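The arithmetic can be checked with a short Python sketch. The function name `sprints_between` and the 14-day sprint length are assumptions taken from the question's "2 weeks"; the count below treats the interval as end date minus start date.

```python
from datetime import date

def sprints_between(start, end, sprint_days=14):
    """Return (total days, total weeks, whole sprints) between two dates."""
    total_days = (end - start).days
    weeks = total_days / 7
    whole_sprints = total_days // sprint_days  # partial sprints are dropped
    return total_days, weeks, whole_sprints

days, weeks, sprints = sprints_between(date(2020, 4, 1), date(2020, 6, 4))
print(days, round(weeks, 1), sprints)  # 64 9.1 4
```

The 64 days leave 8 days over after four full sprints, which is why the raw division comes out a little above 4.5.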
I have recently encountered an error message upon running my for loop as follows:
Error: object 'Month1' not found
res_org<-read.csv("XXX", sep=",", header=TRUE)
dim(res_org)
exp<-res_org
head(res_org)
exp1<-res_org
dim(exp1)
n<-length(exp1$Longitude)
i=1
for (i in 1:n){
exp$Month1<-c()
exp$Year1<-c()
Month1[i]<-exp$Month[i] - 1
if(Month1[i]<=0){
(exp1[i, 'Year1'] =(exp1$Year[i]-1))&
(exp1[i, 'Month1']=(exp1$Month[i]-1+12))
}
else{
(exp1[i, 'Year1'] = exp1$Year[i]) &
(exp1[i, 'Month1']= (exp1$Month[i] - 1))
}
}
head(exp1)
write.csv(exp1, file = "XXX")
It just seems very odd to me that R is not creating a new Month1 variable based on my specification, Month1[i]<-exp$Month[i] - 1, even though I have the variables 'Month' and 'Year' in the imported dataset.
Purpose of the above for loop:
The goal is to subtract a single interval from each month (i = 1, 2, 3,...12) while simultaneously modifying the year for the purpose of effectively preparing the annual average calculations through the subsequent kriging imputation.
For example, at an interval of one, December (12) will be converted into November (11), February (2) will be converted into January (1), January (1) will be converted into December (12), and so on. Likewise, at an interval of two, December (12) will be converted into October (10), February (2) will be converted into December (12), January (1) will be converted into November (11), and so on and so forth.
Regarding the years, all years should remain the same except when a month is converted to a month in the previous year (e.g., at an interval of one, if January (1) becomes December (12), the year changes to the previous year; at an interval of two, if February (2) becomes December (12) or January (1) becomes November (11), the year changes to the previous year as well).
So, I would greatly appreciate it if anyone could potentially offer a solution to the above error message or even provide another for loop to achieve my goal as discussed above. Thank you in advance!
Best,
Lisa
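On the error itself: in R, `Month1[i] <- ...` assigns into a standalone object named Month1, which is never created in the loop, so R has nothing to index into and reports that the object is not found. The month and year rule described above can also be written without any branching, using integer division and the modulo operator. The sketch below is in Python rather than R, purely to illustrate the arithmetic; `shift_month` is a hypothetical helper, and the year 2020 in the usage lines is an arbitrary example value.

```python
def shift_month(year, month, interval=1):
    """Move (year, month) back by `interval` months, rolling the year back
    whenever the result crosses January. month is 1..12."""
    index = (month - 1) - interval   # zero-based month, may go negative
    new_year = year + index // 12    # floor division: -1 // 12 == -1
    new_month = index % 12 + 1       # Python's % is non-negative here
    return new_year, new_month

print(shift_month(2020, 12, 1))  # (2020, 11)
print(shift_month(2020, 1, 1))   # (2019, 12)
print(shift_month(2020, 2, 2))   # (2019, 12)
print(shift_month(2020, 1, 2))   # (2019, 11)
```

The same two expressions applied to whole columns would avoid the row-by-row loop, though the exact R spelling is left to the reader.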
How can a day, week or month, essentially a range of time, be represented by a single number?
The next interval would be represented by a number 1 more than the number for the previous interval, just as the next second is 1 more than the previous second in timestamp representation.
Given a bunch of such numbers, a larger number simply means it represents a later time interval than a smaller number does.
Just realized if I stick to UTC and represent the day as YYYYMMDD, this becomes a number that I am looking for.
20180420 // 20 april 2018
20180421 // 21 april 2018
20180510 // 10 may 2018
20190101 // 1 jan 2019
This works for representing a day perfectly, I think.
For week, maybe take ceil() of the day of the month divided by 7 to get the week as a number W, and then use the format YYYYMMW.
2018043 // 3rd week of april 2018
2018045 // 5th week of april 2018; may not be a 5th week semantically, but the representation works: greater than the 4th week of april 2018 and smaller than the 1st week of may 2018
For month, simply YYYYMM works.
I feel so smart right now! 😄
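The three encodings above can be sketched in Python as follows. The function names are made up for illustration, and the week number follows the ceil(day / 7) idea from the post, so it is a week of the month and not an ISO calendar week.

```python
import math
from datetime import date

def day_key(d):
    """YYYYMMDD as an integer."""
    return d.year * 10000 + d.month * 100 + d.day

def week_key(d):
    """YYYYMMW as an integer, with W = ceil(day of month / 7), so 1..5."""
    return d.year * 1000 + d.month * 10 + math.ceil(d.day / 7)

def month_key(d):
    """YYYYMM as an integer."""
    return d.year * 100 + d.month

print(day_key(date(2018, 4, 20)))    # 20180420
print(week_key(date(2018, 4, 20)))   # 2018043
print(week_key(date(2018, 4, 29)))   # 2018045
print(month_key(date(2018, 5, 10)))  # 201805
```

These keys sort in chronological order, but consecutive intervals do not always differ by exactly 1: the day after 20180430 is 20180501.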
I have a number of data points that I am trying to extract a meaningful pattern from (or derive an equation that could then be predictive). I am trying to find a correlation (?) between RANK and DAILY SALES for any given ITEM.
So, for any given item, I have (say) two weeks of daily information, where each day consists of a pairing of Inventory and Rank.
ITEM #1
Monday: 20 in stock (rank 30)
Tuesday: 17 in stock (rank 29)
Wednesday: 14 in stock (rank 31)
The presumption is that 3 items were sold each day, and that selling ~3 a day is roughly what it means to have a rank of ~30.
Given information like this across a wide span (20,000 items, over 2 weeks) of inventory/rank/date pairings, I'd like to derive an equation/method of estimating what the daily sales would be for any given rank.
There's one problem:
The data isn't entirely clean, because -occasionally- the inventory fluctuates upward, either because of re-stocking, or because of returns. So for example, you might see something like
MONDAY: 30 in stock.
TUESDAY: 20 in stock.
WEDNESDAY: 50 in stock.
THURSDAY: 40 in stock.
FRIDAY: 41 in stock.
This indicates that, between Tuesday and Wednesday, 30 more were replenished, and on Thursday, one was returned.
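A simple way to handle the upward fluctuations is to treat any day-over-day increase as unknown sales and leave it out. The Python sketch below does that; the function name and the choice to mark such days as `None` are assumptions. It cannot detect a restock that happens on a day when stock still went down overall.

```python
def daily_sales(inventory):
    """Infer sales per day from consecutive stock counts.

    A drop in stock is counted as sales. A rise (restock or return) gives
    no usable sales figure, so that day is recorded as None.
    """
    sales = []
    for previous, current in zip(inventory, inventory[1:]):
        drop = previous - current
        sales.append(drop if drop >= 0 else None)
    return sales

# Monday..Friday from the example above.
print(daily_sales([30, 20, 50, 40, 41]))  # [10, None, 10, None]
# ITEM #1, Monday..Wednesday.
print(daily_sales([20, 17, 14]))          # [3, 3]
```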
I am planning to use the mean and standard deviation of daily sales for each given rank.
Then, given any rank, I can predict the daily sales based on the mean and standard deviation values.
Is this a correct approach? Is there a better approach for this scenario?
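The mean and standard deviation idea could be sketched like this in Python. The function name is an assumption and the (rank, sales) pairs are made-up illustrative values, not real data; days with no usable sales figure are passed as `None` and skipped.

```python
from collections import defaultdict
from statistics import mean, stdev

def sales_stats_by_rank(observations):
    """Map each rank to (mean, sample standard deviation) of daily sales.

    observations is an iterable of (rank, daily_sales) pairs. Pairs whose
    sales value is None are ignored. The standard deviation is None when a
    rank has only one observation.
    """
    by_rank = defaultdict(list)
    for rank, sales in observations:
        if sales is not None:
            by_rank[rank].append(sales)
    return {
        rank: (mean(values), stdev(values) if len(values) > 1 else None)
        for rank, values in by_rank.items()
    }

# Hypothetical observations pooled across items and days.
observations = [(30, 3), (30, 5), (29, 4), (31, None)]
stats = sales_stats_by_rank(observations)
print(stats[30][0])  # 4
print(stats[29])     # (4, None)
```

With 20,000 items many individual ranks will have few observations, so grouping neighbouring ranks into bands before averaging may give steadier estimates.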
Sounds like this could be a good read for you, fpp
It provides an introduction to timeseries forecasting. Timeseries forecasting
has a lot of nuance so it can trip people up pretty easily. Some of the issues
you have already noted (e.g. seasonality). Others pertain to the statistical
properties of such series of data. Take a look through this and