Convert data frame to time series R - r

I have the following data
data_sample
date Sum
1 Feb 2015 3322.01
2 Mar 2015 6652.77
3 Apr 2015 3311.12
etc
I need to convert it to a time series for forecasting.
> data <- xts(data_sample[,-1], order.by=as.Date(data_sample[,1], "%Y %m"))
Error in 1 - frac : non-numeric argument to binary operator
> data <- xts(data_sample[,-1], order.by=as.Date(data_sample[,1], "%m %Y"))
Error in 1 - frac : non-numeric argument to binary operator
> ts_ts(ts_long(data_sample))
Error in guess_time(x) :
No [time] column detected. To be explict, name time column as 'time'.

If you want to use as.Date(), you have to specify full dates.
Simply add 01 at the end of each entry.
date <- c("Feb 2015", "Mar 2015", "Apr 2015")
date <- as.Date(paste(date, "01"), format="%b %Y %d")
You can convert them back as follows,
format(date, "%b %Y")
or use as.yearmon from zoo library,
library("zoo")
as.yearmon(date)
Some examples here: Converting Date formats in R
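Putting the steps above together for the data in the question, a minimal base-R sketch might look like the following. It assumes an English locale (so that %b matches "Feb"), and the names d, start_ym and sum_ts are only illustrative. It builds a plain ts object rather than an xts one.

```r
# Sample data shaped like the question's data_sample
data_sample <- data.frame(
  date = c("Feb 2015", "Mar 2015", "Apr 2015"),
  Sum  = c(3322.01, 6652.77, 3311.12),
  stringsAsFactors = FALSE
)

# as.Date() needs a full date, so prepend a day-of-month before parsing
d <- as.Date(paste("01", data_sample$date), format = "%d %b %Y")

# Year and month of the first observation, used as the ts start
start_ym <- as.integer(c(format(d[1], "%Y"), format(d[1], "%m")))

# Monthly series: frequency = 12 periods per year
sum_ts <- ts(data_sample$Sum, start = start_ym, frequency = 12)
print(sum_ts)
```

This relies on the rows already being in chronological order with no missing months, since ts only stores a start point and a frequency.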

R has multiple ways of representing time series. Since you are working with only Date and Sum, I have created a sample time series for you. I chose random dates and numbers.
Call for Packages
library(xts)
Create a Data Frame
data_sample <- data.frame(
date = as.Date(c("2012-01-01","2013-01-01","2014-01-01")),
sum1 = c(3322.01, 6652.77, 3311.12))
head(data_sample)
Convert the date to a format which R understands. The column is already of class Date here, so no format string is needed.
rdate <- as.Date(data_sample$date)
Plot the graph
plot(data_sample$sum1~rdate,type="l",col="red")
Executing the above code produces a red line plot of sum1 against rdate.

Assuming data_sample is as shown reproducibly in the Note at the end, convert it to a time series of class zoo using read.zoo, and then either use it in that form or convert it to another class such as xts or ts using the appropriate as.* function. Here the yearmon class is used for the index, since it directly represents year and month without a day. zoo and xts keep this class, and converting to ts translates it appropriately.
library(xts) # this also loads zoo
z <- read.zoo(data_sample, FUN = as.yearmon, format = "%b %Y")
as.xts(z)
as.ts(z)
Date
It is also possible to use the Date class for the index in zoo and xts, but that does not work well with the ts class. Using Date implies that the distance between consecutive points varies with the number of days in each month, rather than the series being regularly spaced, so Date is normally not useful for forecasting monthly data.
zd <- aggregate(z, as.Date, c)
xd <- as.xts(zd)
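The spacing point can be checked in base R without zoo or xts. This is only an illustrative sketch: the three first-of-month dates stand in for the question's data, and the names gaps, m and steps are made up for the example.

```r
# First-of-month dates for the three observations in the question
d <- as.Date(c("2015-02-01", "2015-03-01", "2015-04-01"))

# Days between consecutive points: February and March differ in length
gaps <- as.numeric(diff(d))
print(gaps)

# A monthly ts index advances by exactly 1/12 of a year at every step
m <- ts(c(3322.01, 6652.77, 3311.12), start = c(2015, 2), frequency = 12)
steps <- diff(as.numeric(time(m)))
print(isTRUE(all.equal(steps, rep(1 / 12, 2))))
```

With a Date index the gaps are 28 and 31 days, whereas the ts index is evenly spaced.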
Note
Input in reproducible form
Lines <- "date,Sum
1,Feb 2015,3322.01
2,Mar 2015,6652.77
3,Apr 2015,3311.12 "
data_sample <- read.csv(text = Lines)

air1 <- type.convert(.preformat.ts(AirPassengers), as.is = TRUE)
airpassengers <- as.data.frame(air1)
View(airpassengers)
class(airpassengers)
[1] "data.frame"
This converts time series data to a data frame.

Related

Convert character to date format and then compute difference in days

I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the date format for date1 is '2/1/2000', '5/12/2000', '6/30/2015' and its class() is character. The second column of dates, date2, is in the format '2015-07-06', '2015-08-01', '2017-10-09' and its class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this:
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 to the same class, using strptime, lubridate, etc. What is the best way to standardize both columns and compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
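The fractional first value above comes from comparing clock times across a daylight-saving change. If only calendar days matter, one possible variation is to convert both columns to Date before subtracting. This is a sketch, not part of the original answer; d1, d2 and days are illustrative names, and format() is used so that y keeps its local calendar date.

```r
x <- c("2/1/2000", "5/12/2000", "6/30/2015")
y <- as.POSIXct(c("2015-07-06", "2015-08-01", "2017-10-09"))

# Parse the character column with an explicit month/day/year format
d1 <- as.Date(x, format = "%m/%d/%Y")

# Take the local calendar date of each POSIXct value
d2 <- as.Date(format(y, "%Y-%m-%d"))

# Whole-day differences, independent of time zone and DST
days <- as.numeric(abs(difftime(d2, d1, units = "days")))
print(days)
```

Here the result is 5634, 5559 and 832 whole days.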

How to convert R date() values

I've got a data frame with a lot of dates in it that were generated by the date() command in R, resembling the first data frame below. On my computer with this version of R, the date values are formatted like this: "Thu Mar 18 11:15:23 2021". I believe this is all base R stuff.
I want to strip the weekday, the hours, minutes, and seconds away, and then transform it so that it looks like this "2021-03-18". My goal dataframe is the second dataframe below. I've tried various as.Date() or strftime functions to no avail.
df <- data.frame(date=c(date(),date()),value = c(1,2))
df <- data.frame(date =c("2021-03-18","2021-03-18"), value = c(1,2))
If you don't need strings, you can skip the strftime call and only use as.Date
df <- data.frame(
date=c(date(),date()),
value = c(1,2),
stringsAsFactors = FALSE
)
df$date <- strftime(as.Date(df$date, "%c"), "%Y-%m-%d")
https://stat.ethz.ch/R-manual/R-patched/library/base/html/strptime.html
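Because %c is locale-dependent on output, it can help to spell the input format out explicitly. The sketch below assumes an English locale for the weekday and month names and uses a fixed string in place of a live date() call; stamp and parsed are illustrative names.

```r
# A string in the same layout that date() produced in the question
stamp <- "Thu Mar 18 11:15:23 2021"

# Weekday, month name, day, time, four-digit year
parsed <- as.Date(stamp, format = "%a %b %d %H:%M:%S %Y")
print(parsed)

# Back to a character string, if a string column is required
as_text <- format(parsed, "%Y-%m-%d")
```

The parsed value prints as 2021-03-18.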

Format POSIX scenario in Dates

Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
Need to code in R language.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but are missing a small detail. The second argument of as.Date must describe exactly the format your dates come in. Right now, the format "%d/%m/%y" says your date looks like 15/08/47. Three things are wrong with this: your date has no slashes, the month is an abbreviated month name rather than a number, and the year has four digits (%Y) rather than two (%y). The correct way to parse this date would be
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
You can use a package to parse dates automatically, such as lubridate.
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
#(tz = "UTC" avoids fractional days from daylight-saving or offset changes)
dt <- as.POSIXct(c("15Aug1947", "15Aug2018"), format = "%d%b%Y", tz = "UTC")
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days

Converting Forecast class representation of dates to actual calendar dates

Good day,
I am building an auto.arima forecast in R. I was able to complete the forecast successfully; however, the result does not display the date.
[Images omitted: forecast result, the plot, and the data]
If you look at the x-axis, you can see that it displays the years as periods. I would like to be able to export this data with actual dates.
I use
library("tseries")
library("forecast")
library("xts")
The code:
Pulsedata$date <- as.Date(Pulsedata$date,format = "%d-%b-%y")
PD_ts <- msts(Pulsedata$Call_volume, start = c(2016, 01), end = c(2018,
365), seasonal.periods=c(365))
DPD_ts <- decompose(PD_ts, "multiplicative")
AA <- auto.arima(ts(PD_ts,frequency=365),D=1)
Myforecast <- forecast(AA,h=365)
plot(Myforecast)
I have tried:
Anydate
sweep
as.date
lubridate
setDT
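One possible way to attach calendar dates, sketched here under stated assumptions: the observations are daily with no gaps, so the forecast horizon simply continues from the day after the last observed date. The data below are synthetic stand-ins for Pulsedata, and base R's arima() and predict() are used as a placeholder for auto.arima() and forecast(). With the forecast package, the point forecasts would come from Myforecast$mean instead of pred.

```r
set.seed(1)

# Hypothetical daily series standing in for Pulsedata
obs_dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 200)
calls <- 100 + cumsum(rnorm(200))

# Placeholder model; the question uses auto.arima() from forecast
fit <- arima(calls, order = c(1, 1, 0))
h <- 14
pred <- predict(fit, n.ahead = h)$pred

# Future calendar dates start the day after the last observation
future_dates <- seq(max(obs_dates) + 1, by = "day", length.out = h)

# Pair each forecast value with its calendar date for export
out <- data.frame(date = future_dates, forecast = as.numeric(pred))
head(out)
```

The resulting data frame can then be written out with write.csv(). If the original series has missing days, the date sequence would need to be built differently.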

Create date column from datetime in R

I am new to R and I am an avid SAS programmer and am just having a difficult time wrapping my head around R.
Within a data frame I have a date-time column stored as POSIXct, with values appearing as "2013-01-01 00:53:00". I would like to create a date column using a function that extracts the date, and another column that extracts the hour. In an ideal world I would like to be able to extract the date, year, day, month, time and hour, and add them all as additional columns within the data frame.
It is wise to always be careful with as.Date(as.POSIXct(...)):
E.g., for me in Australia:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00"))
df
# dt
#1 2013-01-01 00:53:00
as.Date(df$dt)
#[1] "2012-12-31"
You'll see that this is problematic as the dates don't match. You'll hit problems if your POSIXct object is not in the UTC timezone as as.Date defaults to tz="UTC" for this class. See here for more info: as.Date(as.POSIXct()) gives the wrong date?
To be safe you probably need to match your timezones:
as.Date(df$dt,tz=Sys.timezone()) #assuming you've just created df in the same session:
#[1] "2013-01-01"
Or safer option #1:
df <- data.frame(dt=as.POSIXct("2013-01-01 00:53:00",tz="UTC"))
as.Date(df$dt)
#[1] "2013-01-01"
Or safer option #2:
as.Date(df$dt,tz=attr(df$dt,"tzone"))
#[1] "2013-01-01"
Or alternatively use format to extract parts of the POSIXct object:
as.Date(format(df$dt,"%Y-%m-%d"))
#[1] "2013-01-01"
as.numeric(format(df$dt,"%Y"))
#[1] 2013
as.numeric(format(df$dt,"%m"))
#[1] 1
as.numeric(format(df$dt,"%d"))
#[1] 1
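The same format() approach extends to the hour and the time of day that the question also asks for. A small sketch, using a UTC timestamp so the result does not depend on the session time zone (hour and time_of_day are illustrative names):

```r
dt <- as.POSIXct("2013-01-01 00:53:00", tz = "UTC")

# Hour of day as an integer (0 to 23)
hour <- as.integer(format(dt, "%H"))

# Clock time as a character string
time_of_day <- format(dt, "%H:%M:%S")

print(hour)
print(time_of_day)
```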
Use the lubridate package. For example, if df is a data.frame with a column dt of type POSIXct, then you could:
library(lubridate)
df$date = as.Date(as.POSIXct(df$dt, tz="UTC"))
df$year = year(df$dt)
df$month = month(df$dt)
df$day = day(df$dt)
# and so on...
If you can store your data in a data.table, then this is even easier:
df[, `:=`(date = as.Date(as.POSIXct(dt, tz="UTC")), year = year(dt), ...)]
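If neither package is available, a base-R alternative (a different technique from the lubridate and data.table code above) is to convert to POSIXlt and read its components. The sketch assumes a one-row data frame with a UTC timestamp; note that POSIXlt counts years from 1900 and months from 0.

```r
df <- data.frame(dt = as.POSIXct("2013-01-01 00:53:00", tz = "UTC"))

# POSIXlt exposes the broken-down fields of each timestamp
lt <- as.POSIXlt(df$dt)

df$date  <- as.Date(lt)      # calendar date in the timestamp's own zone
df$year  <- lt$year + 1900   # years are stored as an offset from 1900
df$month <- lt$mon + 1       # months are stored as 0 to 11
df$day   <- lt$mday
df$hour  <- lt$hour

print(df)
```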
