Time series object - r

I have one table with two columns DATE and Q.
DATE Q
--------------------
2013-01-04 932
2013-01-05 409
2013-01-08 511
2013-01-11 121
2013-01-12 252
2013-01-13 201
2013-01-14 40
2013-01-15 66
2013-01-17 NA
2013-01-18 123
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 2 variables:
$ DATE: POSIXct, format: "2013-01-04" "2013-01-05" "2013-01-08" "2013-01-11" ...
$ Q: num 932 409 511 121 252 201 40 66 NA 123 ..
As you can see from the data, the observations are irregularly spaced. The first column has been converted to a date-time class (POSIXct) and the second column is numeric. I want to convert this table into a time series object so that I can produce forecasts with the forecast package.
Can anyone help me with some code to convert this table into a ts object?

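# Build a regular daily sequence; adjust the range so it covers the dates in your table
DATE <- seq(as.Date("2018-1-1"), as.Date("2019-1-1"), by = 1)
df <- data.frame(DATE = DATE)
# Left join: days with no observation get NA (DATE must have the same class in both tables)
output <- dplyr::left_join(df, YOUR_TABLE, by = "DATE")
Your table must have a date column named "DATE". After the join, every day that is missing from your data has an NA value, and you can convert the result to a time series. I don't know whether this will help in your case; it sometimes does for me. You may also want to handle the NA values with some replacement (imputation) method.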
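As a rough illustration of the idea above, here is a base-R sketch using the ten rows from the question. It assumes DATE is stored as Date rather than POSIXct. Linear interpolation for the gaps and frequency = 7 (weekly seasonality) are assumptions made for the example, not recommendations from the original answer.

```r
# Toy copy of the table from the question (DATE stored as Date here).
tbl <- data.frame(
  DATE = as.Date(c("2013-01-04", "2013-01-05", "2013-01-08", "2013-01-11",
                   "2013-01-12", "2013-01-13", "2013-01-14", "2013-01-15",
                   "2013-01-17", "2013-01-18")),
  Q = c(932, 409, 511, 121, 252, 201, 40, 66, NA, 123)
)

# Regular daily grid spanning the observed range.
grid <- data.frame(DATE = seq(min(tbl$DATE), max(tbl$DATE), by = "day"))

# Left join: days without an observation get Q = NA.
full <- merge(grid, tbl, by = "DATE", all.x = TRUE)

# Fill the gaps by linear interpolation between known values
# (one possible replacement method among many).
known <- !is.na(full$Q)
full$Q_filled <- approx(x = as.numeric(full$DATE[known]),
                        y = full$Q[known],
                        xout = as.numeric(full$DATE))$y

# ts() needs a regular series; frequency = 7 is an assumed weekly cycle.
q_ts <- ts(full$Q_filled, frequency = 7)
```

The resulting q_ts has one value per calendar day, so it can be passed to functions that expect a regular ts object. Whether interpolation is appropriate depends on why the days are missing.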

Related

How to show the hour term when only minutes and seconds are known in a dataframe?

I have a dataframe, df
str(df)
'data.frame': 1577 obs. of 2 variables:
$ DATE: Date, format: "2016-02-01" "2016-02-01" ...
$ TIME: Factor w/ 1577 levels "00:01.3","00:01.7",..: 448 449 450 451 453 454 455 456 457 458 …
df$TIME is in "MM:SS" format, where the seconds include a decimal fraction. Since the hour term (HH) is missing, I want to add an hour to df$TIME so that it has the format HH:MM:SS. The starting hour could be 12, so that the first observation of df$TIME becomes "12:00:01.3".
Finally, I want to create a column df$Datetime that merges the DATE and TIME columns.
Please note that df$TIME does not follow any specific time interval in my case.
Please let me know if this is possible.
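One possible approach, sketched in base R on a two-row toy version of df: prepend a fixed hour of 12 to each TIME value, then parse DATE and the new time string together. The column name TIME_HMS and the choice of tz = "UTC" are assumptions for the example. The sketch does not advance the hour if the minutes wrap around past 59.

```r
# Two-row stand-in for the data frame described in the question.
df <- data.frame(
  DATE = as.Date(c("2016-02-01", "2016-02-01")),
  TIME = factor(c("00:01.3", "00:01.7"))
)

# Factor -> character, then prepend the assumed starting hour (12).
df$TIME_HMS <- paste0("12:", as.character(df$TIME))

# Combine date and time; %OS reads seconds with a fractional part.
df$Datetime <- as.POSIXct(paste(df$DATE, df$TIME_HMS),
                          format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
```

Printing a POSIXct value hides fractional seconds by default; setting options(digits.secs = 1) makes them visible.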

converting data frame (factors) into xts

I know this has been asked several times, but I could not find the right way to get around my problem. I have a very simple CSV file that I load, which looks like this:
27.07.2015,100
28.07.2015,100.1504
29.07.2015,100.1957
30.07.2015,100.5044
31.07.2015,100.7661
03.08.2015,100.9308
04.08.2015,100.8114
05.08.2015,100.6927
06.08.2015,100.7501
07.08.2015,100.7194
10.08.2015,100.8197
11.08.2015,100.8133
Now I need to convert my data.frame into xts so I can use the PerformanceAnalytics package. My data.frame has the structure:
> str(mpey)
'data.frame': 243 obs. of 2 variables:
$ V1: Factor w/ 243 levels "01.01.2016","01.02.2016",..: 210 218 228 234 241 21 30 38 45 52 ...
- attr(*, "names")= chr "5" "6" "7" "8" ...
$ V2: Factor w/ 242 levels "100","100.0062",..: 1 4 5 10 16 20 17 13 15 14 ...
- attr(*, "names")= chr "5" "6" "7" "8" ...
I tried different things with the as.xts function but could not make it work.
Could you please help me get past this?
Here's a solution using the tidyquant package, which contains as_xts() for coercing data frames to xts objects and as_tibble() for coercing time series objects such as xts to tibbles ("tidy" data frames).
Recreate your data
> data_df
# A tibble: 12 × 2
date value
<fctr> <fctr>
1 27.07.2015 100
2 28.07.2015 100.1504
3 29.07.2015 100.1957
4 30.07.2015 100.5044
5 31.07.2015 100.7661
6 03.08.2015 100.9308
7 04.08.2015 100.8114
8 05.08.2015 100.6927
9 06.08.2015 100.7501
10 07.08.2015 100.7194
11 10.08.2015 100.8197
12 11.08.2015 100.8133
First, we need to reformat your data frame. The dates and values are both stored as factors and they need to be in a date and double class, respectively. We'll load tidyquant and reformat the data frame. Note that tidyquant loads the tidyverse and financial packages so you don't need to load anything else. The date can be converted with lubridate::dmy which converts characters in a day-month-year format to date. The value needs to go from factor to character then from character to double, and this is done by nesting as.numeric and as.character.
> library(tidyquant)
> data_tib <- data_df %>%
mutate(date = dmy(date),
value = as.numeric(as.character(value)))
> data_tib
# A tibble: 12 × 2
date value
<date> <dbl>
1 2015-07-27 100.0000
2 2015-07-28 100.1504
3 2015-07-29 100.1957
4 2015-07-30 100.5044
5 2015-07-31 100.7661
6 2015-08-03 100.9308
7 2015-08-04 100.8114
8 2015-08-05 100.6927
9 2015-08-06 100.7501
10 2015-08-07 100.7194
11 2015-08-10 100.8197
12 2015-08-11 100.8133
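To see why the value column goes through as.character first, here is a small base-R sketch on three of the rows. It uses as.Date with an explicit format instead of lubridate::dmy, which is a substitution made only to keep the example free of package dependencies.

```r
# Factors as they would arrive from the CSV import.
value <- factor(c("100", "100.1504", "100.1957"))
date  <- factor(c("27.07.2015", "28.07.2015", "29.07.2015"))

# Calling as.numeric() directly on a factor returns the internal level codes.
codes <- as.numeric(value)                  # 1 2 3, not the prices

# Going through character first recovers the numbers that were printed.
numbers <- as.numeric(as.character(value))  # 100.0000 100.1504 100.1957

# Day.month.year strings parsed with an explicit format.
dates <- as.Date(as.character(date), format = "%d.%m.%Y")
```

Reading the file with stringsAsFactors = FALSE (the default since R 4.0.0) avoids the factor columns in the first place.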
Now, we can coerce to xts using the tidyquant::as_xts() function. Just specify date_col = date.
> data_xts <- data_tib %>%
as_xts(date_col = date)
> data_xts
value
2015-07-27 100.0000
2015-07-28 100.1504
2015-07-29 100.1957
2015-07-30 100.5044
2015-07-31 100.7661
2015-08-03 100.9308
2015-08-04 100.8114
2015-08-05 100.6927
2015-08-06 100.7501
2015-08-07 100.7194
2015-08-10 100.8197
2015-08-11 100.8133

Calendaring Monthly Usages for each Date

Here, I have a data set with start dates, end dates and usages. I have calculated the number of days between the two dates and derived the daily usage. (I am okay with one flat usage for each day for now.)
Now, what I want to achieve is the sum of the usage for each day of June across those time frames. For example, the first case contributes just its DAILY_USAGE:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.156250
For the second, I want to add the usage 3905 to June 1st and also to June 2nd, because it spans both June 1st and June 2nd.
2015-05-04 2015-06-02 117159.00 30 3905.3000000
I want to continue doing this for all 387 rows and, at the end, get the sum of usages for each day. I do not know how to do this for hundreds of records.
This is what my data set looks like right now:
str(YYY)
'data.frame': 387 obs. of 5 variables:
$ START_DATE : Date, format: "2015-05-01" "2015-05-04" "2015-05-11" "2015-05-13" ...
$ END_DATE : Date, format: "2015-06-01" "2015-06-01" "2015-06-01" "2015-06-01" ...
$ x : num 261605 1380796 183 103 489 ...
$ DAYS : num 32 29 22 20 19 12 1 34 30 29 ...
$ DAILY_USAGE: num 8175.16 47613.66 8.32 5.13 25.74 ...
Also, the head:
START_DATE END_DATE x DAYS DAILY_USAGE
1 2015-05-01 2015-06-01 261605.00 32 8175.1562500
2 2015-05-04 2015-06-01 1380796.00 29 47613.6551724
6 2015-05-21 2015-06-01 1392.00 12 116.0000000
7 2015-06-01 2015-06-01 2503.00 1 2503.0000000
8 2015-04-30 2015-06-02 0.00 34 0.0000000
9 2015-05-04 2015-06-02 117159.00 30 3905.3000000
10 2015-05-05 2015-06-02 193334.00 29 6666.6896552
13 2015-05-04 2015-06-03 630.00 31 20.3225806
and so on........
Example of data sets and results
I will call this data set EXAMPLE1 (3 days, mocked-up data)
START_DATE END_DATE x DAYS DAILY_USAGE
5/1/2015 6/1/2015 261605 32 8175.15625
5/4/2015 6/1/2015 1380796 29 47613.65517
5/11/2015 6/1/2015 183 22 8.318181818
4/30/2015 6/2/2015 0 34 0
5/20/2015 6/2/2015 70 14 5
6/1/2015 6/2/2015 569 2 284.5
6/1/2015 6/3/2015 582 3 194
6/2/2015 6/3/2015 6 2 3
For the above example, the answer should look like this:
DAY USAGE
6/1/2015 56280.6296
6/2/2015 486.5
6/3/2015 197
HOW?
In EXAMPLE1, for June 1st, I added the usages of all rows except the last one, because the last row's time frame does not include 06/01. It starts on 06/02 and ends on 06/03.
To get June 2nd, I added the usages from rows 4 to 8, because June 2nd lies between the start and end dates of all of those rows.
For June 3rd, I added only the last two rows to get 197.
So which rows to sum depends on the time frame given by START_DATE and END_DATE.
Hope this helps!
There might be an easier trick to do this than writing 400 lines of if-else statements.
Thank you again for your time!!
-Gyve
library(lubridate)
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
cbind.data.frame(DAY=unique(df$END_DATE),
USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x]))))
# DAY USAGE
# 1 6/1/2015 56280.63
# 2 6/2/2015 486.50
# 3 6/3/2015 197.00
Explanation
We can expand it to explain what is happening:
indx <- lapply(unique(mdy(df[,2])), '%within%', interval(mdy(df[,1]), mdy(df[,2])))
Each unique end date is tested for whether it falls within the interval spanned by the dates in the first and second columns. mdy is a quick way to parse month-day-year strings into Date objects with lubridate. The operator %within% tests a date against an interval. We create the intervals with interval(mdy(df[,1]), mdy(df[,2])). The result is a list of logical indexes that we can use to subset the data.
In our final data frame,
cbind.data.frame(DAY=unique(df$END_DATE),
creates the first column of dates.
And,
USAGE=unlist(lapply(indx, function(x) sum(df$DAILY_USAGE[x])))
takes the sum of df$DAILY_USAGE by the index that we created.
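The same logic can be sketched without lubridate, using plain Date comparisons on the EXAMPLE1 rows from the question. This is an alternative illustration, not the answer's own method. The dates are entered in ISO format here for simplicity, and the day grid (June 1st to 3rd) is chosen by hand.

```r
# EXAMPLE1 from the question, with dates already stored as Date.
df <- data.frame(
  START_DATE = as.Date(c("2015-05-01", "2015-05-04", "2015-05-11", "2015-04-30",
                         "2015-05-20", "2015-06-01", "2015-06-01", "2015-06-02")),
  END_DATE   = as.Date(c("2015-06-01", "2015-06-01", "2015-06-01", "2015-06-02",
                         "2015-06-02", "2015-06-02", "2015-06-03", "2015-06-03")),
  DAILY_USAGE = c(8175.15625, 47613.65517, 8.318181818, 0, 5, 284.5, 194, 3)
)

# Days to report on; extend the range to cover the whole month if needed.
days <- seq(as.Date("2015-06-01"), as.Date("2015-06-03"), by = "day")

# For each day, add up the daily usage of every row whose
# [START_DATE, END_DATE] range contains that day.
usage <- vapply(seq_along(days), function(i) {
  covers <- df$START_DATE <= days[i] & days[i] <= df$END_DATE
  sum(df$DAILY_USAGE[covers])
}, numeric(1))

result <- data.frame(DAY = days, USAGE = usage)
# USAGE is 56280.63, 486.5 and 197 for June 1st, 2nd and 3rd
```

Because the days come from seq() rather than from the unique end dates, this version also reports days on which no record happens to end.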

Proper date formatting in R

I am currently working with this dataset.
'data.frame': 2938 obs. of 4 variables:
$ X : int 21562 21603 21618 21620 21659 21990 21996 22024 22592 22665 ...
$ uuid : Factor w/ 2938 levels "0005d695-6bc8-48ad-b323-803499630e43",..: 2396 2910 2372 2008 2582 1405 2114 1447 2348 2503 ...
$ date : Factor w/ 2927 levels "2015-06-06T06:33:14Z",..: 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 ...
$ type : Factor w/ 1 level "productCart": 1 1 1 1 1 1 1 1 1 1 ...
There is a variable date here, where date is in this format:
date --> 2015-06-06T06:33:14Z
I want to create a new variable that holds the date in a more workable format, which should look like this:
NewDate --> 2015-06-06 06:33:14
Could you please give me any advice? I have tried a few different approaches and none of them has worked so far.
You can use as.POSIXct to convert to the 'POSIXct' class. The trailing Z marks the timestamp as UTC, so pass tz='UTC'
date1 <- as.POSIXct(date, format='%Y-%m-%dT%H:%M:%SZ', tz='UTC')
date1
#[1] "2015-06-06 06:33:14 UTC"
Or using lubridate
library(lubridate)
ymd_hms(date, tz='UTC')
#[1] "2015-06-06 06:33:14 UTC"
If we want to extract the hour and minute part, format can be used
format(date1, '%H:%M')
#[1] "06:33"
data
date <- "2015-06-06T06:33:14Z"
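To get exactly the NewDate text shown in the question, the parsed value can be formatted back to a string. A minimal base-R sketch, assuming the column is a factor as in the str() output and that the timestamps are UTC:

```r
# One-element stand-in for the factor column from the question.
date <- factor("2015-06-06T06:33:14Z")

# Factor -> character -> POSIXct; the literal T and Z are matched by the format.
parsed <- as.POSIXct(as.character(date),
                     format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Character representation without the T and Z markers.
NewDate <- format(parsed, "%Y-%m-%d %H:%M:%S")
NewDate
# "2015-06-06 06:33:14"
```

Keeping the POSIXct column (parsed) is usually more useful for arithmetic and plotting; the formatted string is mainly for display.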

Plot Time Series hourly data for 3 years

Below is what my data looks like. My data is called sales1156.
> sales1156
date.and.time hsales
06/01/11 09:00 14.00
06/01/11 10:00 28.00
06/01/11 11:00 28.00
06/01/11 12:00 28.00
06/01/11 13:00 28.00
06/01/11 14:00 28.00
The data continues till 4th Oct 2013(04/10/2013). I have used the following commands to create a time series object.
> hsales1156xts<-as.xts(sales1156,order.by=as.Date(sales1156$date.and.time,frequency=24))
> is.xts(hsales1156xts)
[1] TRUE
The problem is that I am not able to plot a proper graph.
> plot.xts(hsales1156xts) # This command is throwing a warning as mentioned below
Warning message:
In plot.xts(hsales1156xts) : only the univariate series will be plotted
Stackoverflow is not allowing me to attach the graph. Someone please help me to plot this time series. Any good read or suggestion would be great. I am unable to make much out of the xts and zoo documents. Thus a little detailed syntax and explanation is required.
The date column needs to be excluded from the data passed to as.xts(x = ...); pass it through order.by instead.
Test Example:
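require(PerformanceAnalytics)  # attaches xts, which provides as.xts
require(quantmod)  # provides chart_Series
data(economics, package = "ggplot2")  # the economics data set ships with ggplot2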
colnames(economics)
#[1] "date" "pce" "pop" "psavert" "uempmed" "unemploy"
#Subset your time series
economics_sub=economics[,c("date","uempmed")]
#Ensure your date or datetime object is in the correct format
economics_sub$date=as.Date(economics_sub$date,format="%Y-%m-%d")
#Exclude the date column when passing data to "x ="
economics_xts<-as.xts(x=economics_sub$uempmed,order.by=economics_sub$date)
colnames(economics_xts)=colnames(economics_sub)[-1]
head(economics_xts)
# uempmed
#1967-06-30 4.5
#1967-07-31 4.7
#1967-08-31 4.6
#1967-09-30 4.9
#1967-10-31 4.7
#1967-11-30 4.8
#Plot the series using the quantmod function 'chart_Series'
chart_Series(economics_xts)
Your Example:
#Data input
sales1156=read.csv(text='date.time,hsales
"06/01/11 09:00",14.00
"06/01/11 10:00",28.00
"06/01/11 11:00",28.00
"06/01/11 12:00",28.00
"06/01/11 13:00",28.00
"06/01/11 14:00",28.00',header=TRUE)
#Check format of your datetime index
str(sales1156)
#'data.frame': 6 obs. of 2 variables:
# $ date.time: Factor w/ 6 levels " 06/01/11 09:00",..: 1 2 3 4 5 6
# $ hsales : num 14 28 28 28 28 28
#The datetime index has been read as a factor and not as datetime object
#Convert datetime to appropriate format, in this case POSIXct format
sales1156$date.time=as.POSIXct(sales1156$date.time,format="%d/%m/%y %H:%M")
#Check if your formatting has worked as intended
str(sales1156)
#'data.frame': 6 obs. of 2 variables:
# $ date.time: POSIXct, format: "2011-01-06 09:00:00" "2011-01-06 10:00:00" ...
# $ hsales : num 14 28 28 28 28 28
#Conversion to xts; exclude the date column when passing data to "x ="
hsales1156xts<-as.xts(x=sales1156[,"hsales"],order.by=sales1156[,"date.time"])
colnames(hsales1156xts)=colnames(sales1156)[-1]
head(hsales1156xts)
# hsales
#2011-01-06 09:00:00 14
#2011-01-06 10:00:00 28
#2011-01-06 11:00:00 28
#2011-01-06 12:00:00 28
#2011-01-06 13:00:00 28
#2011-01-06 14:00:00 28
#Plot the series using the quantmod function 'chart_Series'
chart_Series(hsales1156xts)
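A related point about the index class: converting hourly stamps with as.Date discards the time of day, so every observation from the same day ends up with the same index value. The following base-R sketch illustrates this on three of the stamps, assuming the day/month/year reading used above and tz = "UTC".

```r
# Three hourly stamps from the question, read as day/month/year.
stamps <- c("06/01/11 09:00", "06/01/11 10:00", "06/01/11 11:00")

# Date keeps only the calendar day, so the three hours collapse to one value.
as_dates <- as.Date(stamps, format = "%d/%m/%y")

# POSIXct keeps the time of day, so each hour stays distinct.
as_times <- as.POSIXct(stamps, format = "%d/%m/%y %H:%M", tz = "UTC")

length(unique(as_dates))  # 1
length(unique(as_times))  # 3
```

For hourly data, a POSIXct index as in the answer's as.POSIXct call is therefore the safer choice for order.by.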

Resources