My data is memory consumption of an application for every 10 minute interval for the last 26 days.My start date is Oct 6th 2013 and end date is Novemeber 2nd 2013.I've read the data in to a time frame and cleaned it up. Now am trying to create a time series , something along the lines of my_ts<-ts(mydata[3],start=c(2013,10),frequency=10)
Am sure this not correct as the frequency , can someone point me in the right direction so I can plot the time series
.
In R, frequency actually means the period of the seasonality. i.e., frequency = frequency of observations per season. In your case, the "season" is presumably one day. So you want
ts(mydata[3],start=c(2013,10),frequency=24*60/10)
Related
This is my first question on stackoverflow, sorry if the question is poorly put.
I am currently developing a project where I predict how much a person drinks each day. I currently have data that looks like this:
The menge column represents how much water a person has actually drunk in 30 minutes (So first value represents amount from 8:00 till before 8:30 etc..). This is a 1 day sample from 3 months of data. The day starts at 8 AM and ends at 8 PM.
I am trying to forecast the Time Series for each day. For example, given the first one or two time steps, we would predict the whole day and then we know how much in total the person has drunk until 8 PM.
I am trying to model this data as a Time Series object in R (Google Colab), in order to use Croston's Method for the forecasting. Using the ts() function, what should I set the frequency to knowing that:
The data is half-hourly
The data is from 8:00 till 20:00 each day (Does not span the whole day)
Would I need to make the data span the whole day by adding 0 values? Are there maybe better approaches for this? Thank you in advance.
When using the ts() function, the frequency is used to define the number of (usually regularly spaced) observations within a given time period. For your example, your observations are every 30 minutes between 8AM and 8PM, and your time period is 1 day. The time period of 1 day assumes that the patterns over each day is of most interest here, you could also use 1 week here.
So within each day of your data (8AM-8PM) you have 24 observations (24 half hours). So a suitable frequency for this data would be 24.
You can also pad the data with 0 values, however this isn't necessary and would complicate the model. If you padded the data so that it has observations for all half-hours of the day, the frequency would then be 48.
I have a dataframe with Precipitation data for every day from January 1961 to December 2017 that looks like this:
DF=data.frame(Years,Month,Day,Precipitation Value)
I want to create periods of 30 days starting with 1th of January of 1961 so the first period will be 1st january to 30th January 1961 and want R to calculate the number of days without rain (Precipitation Value=0). Then, I want to do the same with the next day: 2th January so the period will be 2nd january-31st January, etc. After that, I need R to create a data frame with all the results for the year 1961. So it should be a data frame with of only one column with values (those values will be the number of days without rain in every period).
Then I need to do the same thing with all the years. Which means I will end up with 56 dataframes (1 for each year) and after that I could make a matrix with all of them (putting each data frame as a row).
The thing is I DO NOT KNOW how to start. I have no idea about how making the loop. I know it should be really easy, but I am having trouble with doing it. Specially i do not know how to tell R to stop every different year and start over and make a NEW data frame/vector with values.
Please provide a reproducible subset of your data so others can help you more effectively. While I cannot teach you how to create a loop from scratch here is some code that I think will help. This code simply calculates the moving 30 day average of precipitation using a simple for loop. You can use dplyr to filter these moving averages by year and create data frames doing that. Note I'm not counting the number of no precipitation days here but you can modify the loop easily to do that if needed
df<-data.frame(year = rep(1967:2002, each =12*30),
month = rep(rep(1:12, each = 30), 36),
day = rep(seq(1,30, by = 1), 432),
precipitation = sample(1:2000, 12*36))
df
#create a column that goes from 1 to however long your dataframe is
df$marker <- 1:nrow(df)
#'Now we create a simple loop to calculate the mean precipitation for
#'every 30 day window. You can modify this to count the number of days with
#'0 precipitation
#'the new column moving precip will tell you the mean precipitation for the
#' past 30 days relative to its postion. So if your on row 55, it will give
#' you the mean precipitation from row 25 to 55
df$movingprecip<-NA
for(i in 1:nrow(df)){
start = i #this says we start at i
end = i + 30 #we end 30 days later from i
if(end > nrow(df)){
#here I tell R to print this if there is not enough days
#in the dataset (30 days) to calculate the 30 day window mean
#this happens at the beginning of the dataset because we need to get to the
#30th row to start calculating means
print("not able to calculate, not 30 days into the data yet")
}else{
#Here I calculate the mean the of the past 30 days of precip
df$movingprecip[end] = mean(df[start:end,4])}
}
times booked_res
11:00 23
13:00 26
15:00 27
17:00 25
19:00 28
21:00 30
So I need to use the ts() function in R to convert this frame into a time series. The column on the right are the number of people reserved in each time. How should I approach this? I'm not sure about the arguments and I don't know if the frequency should be set to 24 (hours in a day) or 10 (11:00 to 21:00) as shown above. Any help appreciated.
First, find the frequency by noting that you are taking a sample every two minutes. The frequency is the inverse of the time between samples, which is 1/2 samples per minute or 30 samples per hour. The data you're interested in is on the right, so you can just use that data vector rather than the entire data frame. The code to convert that into a time series is simply:
booked_res <- c(23,26,27,25,28,30)
ts(booked_res,frequency = 30)
A simple plot with your data might be coded like this:
plot(ts(booked_res,frequency = 30),ylab='Number of people reserved',xlab='Time (in hours) since start of sampling',main='Time series chart of people reservations')
UPDATE:
A time series model can only be created in R when the times series is stationary. Using a varying sample rate would make the time series non-stationary, and so you wouldn't be able to create a time-series object in R.
This page on Analytics Vidhya provides a nice definition of stationary and non-stationary time series, while this page on R bloggers gives some resources that relate to analyzing a non-stationary time series.
I am trying to plot a decomposed time series, but running into an error:
Error in decompose(ts_ret) : time series has no or less than 2 periods`.
I am forcing the time series to a fixed period that is higher than 2.
Why does the ts think the period is less than 2?
Shouldn't the period be set automatically based on the time intervals in the data? (which are daily)
rm(list=ls())
library(jsonlite)
library(xts)
item.id<-18
eve.url<-paste0("http://eve-marketdata.com/api/item_history2.json?char_name=demo®ion_ids=10000002&type_ids=",item.id,"&days=100")
eve.data<-data.frame(fromJSON(txt=eve.url))$emd.row
eve.data$date<-as.POSIXct(eve.data$date,format="%Y-%m-%d",tz="EST")
xxx<-xts(as.numeric(eve.data[,"avgPrice"]),eve.data$date)
colnames(xxx)<-"trit"
ts_ret<-ts(xxx,frequency=52) #but Im setting the periods here.....
plot(decompose(ts_ret))
As #ufelder pointed out my dataset was too small to look at seasonal decomposition because I only had a few months of data (measured hourly), but not an entire seasons worth (which is 4 months). To fix this I had to modify the period of the dataset to once per day by using ts(xxx,frequency=365) so decompose would compare across days, not seasons.
I have two time series.
Each point in either time series is for a week. A week here is not exactly a calendar week, but the first week in a calendar year always starts from Jan 1, and the other weeks in the same year follow that, and the last week of the year may contain more than 7 days but no more than 13 days.
The first time series A is stored in a compressed (.gz) text file A.gz, which looks like (each week and the corresponding time series value are separated by a comma in a line):
week,value
20060101-20060107,0
20060108-20060114,5
...
20061217-20061223,0
20061224-20061230,0
20070101-20070107,0
20070108-20070114,4
...
20150903-20150909,0
20150910-20150916,1
The second time series B is similarly stored in a compressed (.gz) text file B.gz, but over a subset of period of A, which looks like:
week,value
20130122-20130128,509
20130129-20130204,204
...
20131217-20131223,150
20131224-20131231,148.0
20140101-20140107,365.0
20140108-20140114,45.0
...
20150305-20150311,0
20150312-20150318,364
I wonder how to calculate the cross correlation between the two time series A and B (up to a specified maximum lag), and plot A and B in a single plot, in R?
Thanks