Problems with anomalize function - r

I need to check a data set for anomalies with the anomalize() function.
First I loaded some libraries:
library(tidyverse)
library(anomalize)
library(dplyr)
library(zoo)
library(ggplot2)
library(forecast)
library(anytime)
Then I deleted all the columns that I do not need for this task:
trash1 <- ASD[, -2]
trash2 <- trash1[,-2]
trash3 <- trash2[,-2]
trash4 <- trash3[,-2]
trash5 <- trash4[,-2]
trash6 <- trash5[,-2]
trash7 <- trash6[,-4]
trash8 <- trash7[,-4]
view(trash8)
Change class from Factor to Date:
trash8$DMY <- as.Date(trash8$DMY, format="%d.%m.%y")
Then I tried to anomalize it:
trash_tbl <- as_tibble(trash8)
trash_tbl %>%
time_decompose(Qp) %>%
anomalize(remainder) %>%
time_recompose() %>%
plot_anomalies(time_recomposed = TRUE, ncol = 3 , alpha_dots = 0.5)
As a result, I get this output and error:
Converting from tbl_df to tbl_time.
Auto-index message: index = DMY
Note: Index not ordered. tibbletime assumes index is in ascending order. Results may not be as desired.
Error: Only year, quarter, month, week, and day periods are allowed for an index of class Date
Please help me with this, or tell me what I can read to solve the problem.
This is my data. DMY - Date, MCC - Factor, Art - Numeric, Qp - Numeric, Ql - Factor
  DMY MCC Art Qp Ql
1 2016-01-01 UA0000468 1801 3520 440
2 2016-01-01 UA0000468 3102 3024 604,8
3 2016-01-01 UA0000468 4419 270 521,1
4 2016-01-01 UA0000468 5537 1080 2084,4
5 2016-01-03 UA0010557 3528 180 36
6 2016-01-03 UA0010557 3529 198 39,6
...
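One thing the output above hints at: the note says the index is not ordered, and the sample shows several rows per date. Below is a minimal base-R sketch, assuming the goal is one Qp total per day, of collapsing the data to a sorted daily series before decomposition. The toy `trash8` just re-types the sample rows (deliberately shuffled); whether this alone resolves the error is not confirmed here.

```r
# Toy copy of the sample rows shown above (only the columns used here),
# deliberately out of date order.
trash8 <- data.frame(
  DMY = as.Date(c("2016-01-03", "2016-01-01", "2016-01-01",
                  "2016-01-01", "2016-01-01", "2016-01-03")),
  Qp  = c(180, 3520, 3024, 270, 1080, 198)
)

# Sum Qp per date; aggregate() returns one row per distinct DMY,
# ordered by date.
daily <- aggregate(Qp ~ DMY, data = trash8, FUN = sum)
print(daily)
```

The resulting `daily` data frame has a unique, ascending Date index, which is what the tibbletime note asks for.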

Related

Fill in missing date and fill with the data above

I researched this for a while before asking here. Can you please help me with some ideas for this issue?
My data table (df) looks like this:
client id value repmonth
123 100 2012-01-31
123 200 2012-02-31
123 300 2012-05-31
So I have two missing months, and I want my data table to look like this:
client id value repmonth
123 100 2012-01-31
123 200 2012-02-31
123 200 2012-03-31
123 200 2012-04-31
123 300 2012-05-31
The code should fill in the missing repmonth values and fill the new rows with the last value, in this case 200, and the same client id.
I have tried the following:
zoo library
tidyr library
dplyr library
posixct
As for code, I have plenty of failed attempts:
library(tidyr)
df %>%
mutate (repmonth = as.Date(repmonth)) %>%
complete(repmonth = seq.Date(min(repmonth), max(repmonth),by ="month"))
or
library(dplyr)
df$reportingDate.end.month <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df)
list_df <- list(df, df) # fake list of data.frames
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2012-01-31"),
as.POSIXct("2018-12-31"),
by="month"))
lapply(list_df, function(x){full_join(total_loan_portfolios_3$reportingDate.end.month, seq_df, by=reportingDate.end.month)})
total_loan_portfolios_3$reportingmonth_notmissing <- full_join(seq_df,total_loan_portfolios_3$reportingDate.end.month)
or
library(dplyr)
ts <- seq.POSIXt(as.POSIXct("2012-01-01",'%d/%m/%Y'), as.POSIXct("2018/12/01",'%d/%m/%Y'), by="month")
ts <- seq.POSIXt(as.POSIXlt("2012-01-01"), as.POSIXlt("2018-12-01"), by="month")
ts <- format.POSIXct(ts,'%d/%m/%Y')
df <- data.frame(timestamp=ts)
total_loan_portfolios_3 <- full_join(df,total_loan_portfolios_3$Reporting_date)
In the end, I get plenty of errors, such as
the format is not date
or
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
'from' must be a finite number
and others.
The following solution uses the lubridate and tidyr packages. Note that the dates in the OP's example are malformed, but they imply last-day-of-month input, so I tried to replicate that here. The solution creates a sequence of dates from the minimum input date to the maximum input date to get all months of interest. The start date is first normalized to the first day of its month so that the sequence is generated properly. With the sequence created, a left join merges it with the existing data, which exposes the missing months as NA rows. Finally, fill() is applied to the columns to fill in the NAs.
library(lubridate)
library(tidyr)
# Note: the OP's data has a February with 31 days, which fails to parse as a date; corrected to Feb 29 here (2012 is a leap year)
df <- data.frame(client_id=c(123,123,123),value=c(100,200,300),repmonth=c("2012-01-31","2012-02-29","2012-05-31"),stringsAsFactors = F)
df$repmonth <- ymd(df$repmonth) #convert character dates to Dates
start_month <- min(df$repmonth)
start_month <- start_month - days(day(start_month)-1) # first day of month, so that seq.Date sequences properly
all_dates <- seq.Date(from=start_month,to=max(df$repmonth),by="1 month")
all_dates <- (all_dates %m+% months(1)) - days(1) #all end-of-month-day since OP suggests having last-day-of-month input?
all_dates <- data.frame(repmonth=all_dates)
df<-merge(x=all_dates,y=df,by="repmonth",all.x=T)
df <- fill(df,c("client_id","value"))
Solution yields:
> df
repmonth client_id value
1 2012-01-31 123 100
2 2012-02-29 123 200
3 2012-03-31 123 200
4 2012-04-30 123 200
5 2012-05-31 123 300
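The first-of-month normalization in the solution above matters because stepping by month from the 31st overshoots shorter months. Here is a minimal base-R sketch of the same idea without lubridate, assuming the 2012 date range from the question; the variable names are illustrative only.

```r
# Stepping by month from a 31st rolls over past February.
naive <- seq(as.Date("2012-01-31"), by = "month", length.out = 3)
print(naive)

# Step through first-of-month dates instead, then take the day before the
# following month's first day to get each month's last day.
firsts <- seq(as.Date("2012-01-01"), as.Date("2012-05-01"), by = "month")
month_ends <- seq(firsts[2], by = "month", length.out = length(firsts)) - 1
print(month_ends)
```

The `month_ends` vector can then play the role of `all_dates` in the merge step.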

compare date variable with a list of dates

I have a df with a datetime variable (made with lubridate)
str(raw_data$date)
POSIXct[1:37166], format: "2016-11-04 09:12:38" "2016-11-04 09:04:08" "2016-11-04 09:04:14" "2016-11-04 09:08:01" "2016-11-04 09:11:56" ...
and a list of dates for a school term
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as_date(vsdate)
I want to check whether each date in raw_data falls between a start/end pair of dates in the list. I have done this in base R below, but I can't get it to work in the tidyverse:
vsdate<- c("2017/01/30","2017/03/31","2017/04/18","2017/06/30","2017/07/17","2017/09/22","2017/10/09","2017/12/22","2018/01/30","2018/03/29","2018/04/16","2018/06/29","2018/07/16","2018/09/21","2018/10/08","2018/12/21")
vsdate <- as.Date(vsdate)
raw_data$Vic.School.Term=0
raw_data[raw_data$date<=vsdate[2]& raw_data$date>=vsdate[1],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[4]& raw_data$date>=vsdate[3],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[6]& raw_data$date>=vsdate[5],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[8]& raw_data$date>=vsdate[7],"Vic.School.Term"]<-1
raw_data[raw_data$date<=vsdate[10]& raw_data$date>=vsdate[9],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[12]& raw_data$date>=vsdate[11],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[14]& raw_data$date>=vsdate[13],"Vic.School.Term"]<-1
raw_data[raw_data$date<vsdate[16]& raw_data$date>=vsdate[15],"Vic.School.Term"]<-1
and here is my failed attempt in the tidyverse:
raw_data<- raw_data <- mutate(school.term=case_when(
between(date,vsdate[1],vsdate[2] ~ 1)))
Error in between(date, vsdate[1], vsdate[2] ~ 1) :
Expecting a single value: [extent=3].
Thanks!
Your between() call is not closed properly. Its signature is between(x, left, right), but you have between(x, left, right ~ 1). See below for the first few cases:
library(dplyr)
library(lubridate)
raw_data <- data.frame( date = c("2016-11-04 09:12:38", "2016-11-04 09:04:08",
"2016-11-04 09:04:14", "2016-11-04 09:08:01",
"2016-11-04 09:11:56", "2017-02-15 09:10:01",
"2017-05-01 10:00:00")
)
raw_data %>% mutate(date = ymd_hms(date)) -> raw_data
str(raw_data)
vsdate<- ymd(c("2017/01/30","2017/03/31","2017/04/18","2017/06/30",
"2017/07/17","2017/09/22","2017/10/09","2017/12/22",
"2018/01/30","2018/03/29","2018/04/16","2018/06/29",
"2018/07/16","2018/09/21","2018/10/08","2018/12/21"))
str(vsdate)
raw_data %>% mutate(school.term = case_when(
  between(as.Date(date), vsdate[1], vsdate[2]) ~ 1,
  between(as.Date(date), vsdate[3], vsdate[4]) ~ 1,
  TRUE ~ 0))
date school.term
1 2016-11-04 09:12:38 0
2 2016-11-04 09:04:08 0
3 2016-11-04 09:04:14 0
4 2016-11-04 09:08:01 0
5 2016-11-04 09:11:56 0
6 2017-02-15 09:10:01 1
7 2017-05-01 10:00:00 1
Also note the as.Date() call inside between(): it converts the POSIXct values to Date so they can be compared with the Date values in vsdate.
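To avoid writing one case per term, the start/end pairs can be looped over. The following is a minimal base-R sketch, not part of the answer above: `in_any_term()` is a hypothetical helper, and it assumes the boundary vector alternates start, end, start, end and that both ends are inclusive (as with between()).

```r
# Hypothetical helper: returns 1 for dates inside any [start, end] pair, else 0.
in_any_term <- function(dates, boundaries) {
  d      <- as.Date(dates)
  starts <- boundaries[seq(1, length(boundaries), by = 2)]
  ends   <- boundaries[seq(2, length(boundaries), by = 2)]
  inside <- rep(FALSE, length(d))
  for (i in seq_along(starts)) {
    inside <- inside | (d >= starts[i] & d <= ends[i])
  }
  as.integer(inside)
}

# First two terms from the question, plus three of the sample timestamps.
vsdate <- as.Date(c("2017/01/30", "2017/03/31", "2017/04/18", "2017/06/30"),
                  format = "%Y/%m/%d")
dates <- as.POSIXct(c("2016-11-04 09:12:38", "2017-02-15 09:10:01",
                      "2017-05-01 10:00:00"), tz = "UTC")
print(in_any_term(dates, vsdate))  # 0 1 1
```

Inside a pipeline this could be used as mutate(school.term = in_any_term(date, vsdate)).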

How to use TTR::adjRatios() with tidyquant?

I'm trying to reproduce this example which adjusts stock prices for dividends using the tidyquant framework.
Here is the original example:
library(quantmod)
library(tidyquant)
library(timetk)
SPY.Close <- Cl(getSymbols("SPY", auto.assign=FALSE))
SPY.Div <- getDividends("SPY", auto.assign=FALSE)
# Within xts framework
SPY <- merge(SPY.Close, SPY.Div)
# now adjust close for dividends
ratios <- adjRatios(dividends=SPY[,"SPY.div"], close=SPY[,"SPY.Close"])
SPY$SPY.Adjusted <- (ratios$Split * ratios$Div) * SPY$SPY.Close
# only keep dates from the original object
SPY <- SPY[index(SPY.Close), ]
Here is my attempt to do this in tidyquant:
#convert xts to tibble
spy.tbl <- tk_tbl(merge(SPY.Close, SPY.Div), preserve_index = TRUE)
#add a splits placeholder because adjRatios() complains if it's not there.
spy.tbl$SPY.splits <- 0
spy.adj <- spy.tbl %>%
tq_mutate(
select = c(index, SPY.Close, SPY.div, SPY.splits),
mutate_fun = adjRatios,
splits = SPY.splits,
dividends = SPY.div,
close = SPY.Close
)
but this gives the error:
Error in fun_transmute(., ...) : unused argument (.)
I've tried various combinations of arguments, but I can't seem to make it work.
Just in case anyone searches for this topic, I've solved my own problem with the following code. The benefit of this is that it is done in the tidyverse framework and is easily expanded to many tickers with group_by(ticker).
data is a data frame with Close and previous close:
Date ticker Close Cl.prev
1 2017-08-14 SPY_US 246.54 244.12
2 2017-08-15 SPY_US 246.51 246.54
3 2017-08-16 SPY_US 246.94 246.51
4 2017-08-17 SPY_US 243.09 246.94
5 2017-08-18 SPY_US 242.71 243.09
6 2017-08-21 SPY_US 242.90 242.71
div.data is a tibble with only the dividend payments, Date here is the Ex-Div date.
ticker Date div
2 SPY_US 2017-09-15 1.234574
3 SPY_US 2017-12-15 1.351333
4 SPY_US 2018-03-16 1.096775
5 SPY_US 2018-06-15 1.245568
This chain merges the price data into div.data to get the prices needed to calculate the adjRatio:
div.data <- div.data %>%
left_join(., data[, c("Date", "ticker", "Close", "Cl.prev")], by = c("ticker", "Date"))
This chain calculates the adjRatio:
div.data <- div.data %>%
mutate(ratio = 1-div / Cl.prev) %>%
mutate(adjRatio = rev(cumprod(rev(ratio)))) %>%
select(-Close, -Cl.prev, -ratio)
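To make the reverse cumulative product in the chain above concrete, here is a small worked sketch in base R. The two dividends and previous closes are made-up numbers chosen for easy arithmetic, not values from the data above.

```r
# Two hypothetical dividends in date order, with the close before each ex-date.
div     <- c(1, 2)
cl_prev <- c(100, 200)

# Per-dividend ratio: 0.99 for each dividend here.
ratio <- 1 - div / cl_prev

# Reverse cumulative product: each entry multiplies its own ratio by the
# ratios of all later dividends.
adj_ratio <- rev(cumprod(rev(ratio)))
print(adj_ratio)  # 0.9801 0.9900
```

The earliest dividend carries the product of both ratios, so prices before it are scaled by every dividend paid afterwards.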
This chain merges div.data back into the price series, propagates the adjRatio and calculates the adjusted close (na.locf() and na.fill() come from the zoo package):
data.adj <- data %>%
left_join(., div.data, by = c("ticker", "Date") ) %>%
mutate(adjRatio = dplyr::lead(adjRatio, n=1)) %>%
mutate(adjRatio = na.locf(adjRatio, fromLast = TRUE, na.rm = FALSE)) %>%
mutate(adjRatio = na.fill(adjRatio, fill = 1.0)) %>%
mutate(Cl.adj = Close * adjRatio) %>%
select(-Cl.prev, -div, -adjRatio)
Here is the final data:
> head(data.adj)
Date ticker Close Cl.adj
1 2017-08-14 SPY_US 246.54 242.0153
2 2017-08-15 SPY_US 246.51 241.9858
3 2017-08-16 SPY_US 246.94 242.4079
4 2017-08-17 SPY_US 243.09 238.6286
5 2017-08-18 SPY_US 242.71 238.2556
6 2017-08-21 SPY_US 242.90 238.4421
At the moment there are only two forms, tq_mutate() and tq_mutate_xy(). The adjRatios() function takes three inputs, which would require an x, y, z form.

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to a character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"
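If neither zoo nor lubridate is available, the same result can be had in base R. The following is a minimal sketch, not one of the answers above: `end_of_month()` is a hypothetical helper that assumes "YYYY-MM" input strings, bumps the month field of a POSIXlt by one and steps back a day.

```r
# Hypothetical helper: last day of the month for "YYYY-MM" strings.
end_of_month <- function(ym) {
  lt <- as.POSIXlt(as.Date(paste0(ym, "-01")))  # first day of each month
  lt$mon <- lt$mon + 1                          # first day of the next month
  as.Date(lt) - 1                               # one day back: last day
}

print(end_of_month(c("2014-07", "2014-08", "2014-09", "2014-12")))
```

The December case is included to show that the month overflow rolls into the next year when the POSIXlt is converted back to a Date.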
