I was wondering if anybody could help. If I had a data set containing two columns, date and river flow, how could I obtain the top 100 largest river flow values, with the condition that there are at least XX days (e.g. 14 days) between each "peak" (i.e. two values that fall within two weeks of each other would only count as one peak)?
Date          Q
01/01/1990    24
02/01/1990    18
03/01/1990    40
I started by ranking all the values and then going through each peak and manually checking whether the next peak fell outside the 14-day window, but I was wondering if this could be done with a formula. Thanks.
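If you are open to doing this in R rather than with a spreadsheet formula, here is a minimal sketch of the greedy approach you describe; the column names Date and Q and the helper name top_peaks are assumptions for illustration:

top_peaks <- function(flow, n = 100, min_gap = 14) {
  # flow: data frame with a Date column (class Date) and a flow column Q
  flow <- flow[order(-flow$Q), ]                 # rank by flow, largest first
  kept <- flow[0, ]                              # empty frame with the same columns
  for (i in seq_len(nrow(flow))) {
    far_enough <- nrow(kept) == 0 ||
      all(abs(as.numeric(flow$Date[i] - kept$Date)) >= min_gap)
    if (far_enough) kept <- rbind(kept, flow[i, ])   # accept as a new peak
    if (nrow(kept) == n) break
  }
  kept
}

Calling top_peaks(your_data) would then return up to 100 rows, each at least 14 days away from every other selected peak.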
This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. I have data that looks like this:
The menge column represents how much water a person has actually drunk in each 30-minute interval (so the first value is the amount from 8:00 until just before 8:30, and so on). This is a one-day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the time series for each day. For example, given the first one or two time steps, we would predict the rest of the day, so that we know how much the person has drunk in total by 8 PM.
I am trying to model this data as a time series object in R (Google Colab), in order to use Croston's method for the forecasting. Using the ts() function, what should I set the frequency to, given that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency defines the number of (usually regularly spaced) observations within a given time period. In your example, the observations are every 30 minutes between 8AM and 8PM, and the time period is 1 day. Taking the period to be 1 day assumes that the pattern over each day is of most interest here; you could also use 1 week.
Within each day of your data (8AM-8PM) you have 24 observations (24 half-hours), so a suitable frequency for this data would be 24.
You could also pad the data with 0 values, but this isn't necessary and would complicate the model. If you padded the data so that it had observations for all half-hours of the day, the frequency would then be 48.
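To make that concrete, a minimal sketch under the assumptions above; the intake vector below is a toy stand-in for your menge column (two days of 24 half-hourly values each), and croston() comes from the forecast package:

library(forecast)

# toy stand-in for the real data: mostly zeros with occasional drinks,
# as in intermittent-demand data, 24 values per day (8:00-20:00)
intake <- c(200, 0, 0, 150, 0, 0, 0, 300, 0, 0, 0, 0,
            250, 0, 0, 0, 100, 0, 0, 0, 0, 200, 0, 0,
            0, 250, 0, 0, 100, 0, 0, 0, 0, 300, 0, 0,
            0, 150, 0, 0, 0, 0, 200, 0, 0, 0, 0, 100)

y  <- ts(intake, frequency = 24)       # 24 half-hour observations per day
fc <- croston(y, h = 24)               # forecast the next full day
plot(fc)

Note that Croston's method itself ignores the seasonal structure (it is designed for intermittent demand), so the frequency mainly matters for plotting and for any seasonal methods you compare against.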
I would like to scale and center some data. I know how to scale a whole column with
scale(data.test[, 1], center = TRUE, scale = TRUE)
I have 365 observations (one year), and would like to scale & center my data for a lookback period of 20 days.
For example:
"Normalized for a 20-day lookback period" means that to scale my first value, 01/01/2014 (dd/mm/yy), I have to scale it using only the 20 days before it, i.e. the values from 11/12/13 to 31/12/13.
For 02/01/14, scale it using the values from 12/12/13 to 01/01/14, and so on.
Normalizing over all the data would be:
(the value - the mean of all the data) / the standard deviation of all the data (see my code above)
But since I want a 20-day lookback period, meaning I only look at the 20 most recent values, it would be:
(the value - the mean of the 20 previous values) / the standard deviation of the 20 previous values
I thought about using a loop, maybe? As I am very new to R, I don't know how to write a loop, or even whether there is a better way to do what I want. Any help would be appreciated.
You want a 20-day lookback:

lookback <- 20
data.scale <- c()                          # vector to collect the scaled values
for (i in lookback:nrow(data)) {
  window <- data[(i - lookback + 1):i, 1]  # the 20 values ending at row i
  m <- mean(window, na.rm = TRUE)
  s <- sd(window, na.rm = TRUE) * sqrt((lookback - 1) / lookback)  # population SD
  data.scale <- c(data.scale, (data[i, 1] - m) / s)
}
For row 20 you normalize with the data from day 1 to day 20, for row 21 with the data from day 2 to day 21, and so on.
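If you would rather avoid the explicit loop, the same rolling window can be written with rollapplyr from the zoo package. A sketch, assuming zoo is installed; this version uses the ordinary sample SD rather than the population adjustment above:

library(zoo)

# right-aligned rolling window of 20 values ending at each row
data.scale <- rollapplyr(data[, 1], width = 20, fill = NA,
                         FUN = function(w) (w[length(w)] - mean(w, na.rm = TRUE)) /
                                            sd(w, na.rm = TRUE))

The fill = NA pads the first 19 rows, which have no full 20-value window, with NA.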
I have two time series.
Each point in either time series represents a week. A week here is not exactly a calendar week: the first week of each calendar year always starts on Jan 1, the following weeks in that year run consecutively from there, and the last week of the year may contain more than 7 days, but no more than 13.
The first time series A is stored in a compressed (.gz) text file A.gz, which looks like (each week and the corresponding time series value are separated by a comma in a line):
week,value
20060101-20060107,0
20060108-20060114,5
...
20061217-20061223,0
20061224-20061230,0
20070101-20070107,0
20070108-20070114,4
...
20150903-20150909,0
20150910-20150916,1
The second time series B is similarly stored in a compressed (.gz) text file B.gz, but covers only a subset of the period of A. It looks like:
week,value
20130122-20130128,509
20130129-20130204,204
...
20131217-20131223,150
20131224-20131231,148.0
20140101-20140107,365.0
20140108-20140114,45.0
...
20150305-20150311,0
20150312-20150318,364
I wonder how to calculate the cross-correlation between the two time series A and B (up to a specified maximum lag), and how to plot A and B in a single plot, in R?
Thanks
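A minimal sketch of one way to do this, assuming both files read cleanly with read.csv over a gzfile connection and that the week strings in B match those in A exactly (the lag of 20 weeks is an arbitrary choice for illustration):

A <- read.csv(gzfile("A.gz"), stringsAsFactors = FALSE)
B <- read.csv(gzfile("B.gz"), stringsAsFactors = FALSE)

# align the two series on the weeks they share (i.e. restrict A to B's period)
AB <- merge(A, B, by = "week", suffixes = c(".A", ".B"))
AB <- AB[order(AB$week), ]            # the YYYYMMDD prefixes sort chronologically

# cross correlation up to a chosen maximum lag
ccf(AB$value.A, AB$value.B, lag.max = 20)

# both series in one plot (note they may be on very different scales)
plot(AB$value.A, type = "l", xlab = "week index", ylab = "value")
lines(AB$value.B, col = "red")
legend("topleft", legend = c("A", "B"), col = c("black", "red"), lty = 1)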
I am trying to calculate how many days a utility bill falls within two categories (date ranges). For example, a bill range may be 16/8/14 - 14/10/14 (60 days in total, inclusive), and I want to work out how many days fall in the 1/10/14-31/3/15 season and how many fall in the 1/4/14-30/9/14 season. It should be 14 and 46, but I am only getting 13 and 44. Any suggestions? Thanks
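The shortfalls (13 instead of 14, 44 instead of 46) suggest the boundary days are not being counted in full; with inclusive endpoints, the overlap of two date ranges is min(end) - max(start) + 1 days. A minimal sketch in R with the example dates hard-coded:

# inclusive number of days a bill period overlaps a season (0 if they don't overlap)
overlap_days <- function(bill_start, bill_end, season_start, season_end) {
  d <- as.numeric(min(bill_end, season_end) - max(bill_start, season_start)) + 1
  max(0, d)
}

bill_start <- as.Date("2014-08-16")
bill_end   <- as.Date("2014-10-14")

overlap_days(bill_start, bill_end, as.Date("2014-04-01"), as.Date("2014-09-30"))  # 46
overlap_days(bill_start, bill_end, as.Date("2014-10-01"), as.Date("2015-03-31"))  # 14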
My data is the memory consumption of an application at every 10-minute interval for the last 26 days. My start date is Oct 6th 2013 and my end date is November 2nd 2013. I've read the data into a data frame and cleaned it up. Now I am trying to create a time series, something along the lines of
my_ts <- ts(mydata[3], start = c(2013, 10), frequency = 10)
I'm sure this is not correct, as the frequency is wrong; can someone point me in the right direction so I can plot the time series.
In R, frequency refers to the seasonal period of the data, i.e. frequency = the number of observations per season. In your case, the "season" is presumably one day. So you want
ts(mydata[3], start = c(2013, 10), frequency = 24 * 60 / 10)  # 144 ten-minute observations per day