I am working with data collected monthly. In my dataset, there are some months where no data was collected and thus, there is no entry in my data. I have previously used bfastts for similar occurrences when data was collected daily, so that I may have NA values in my data. How may I do the same for monthly data, using bfastts or some other function?
eg. below if needed
2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891
I wish to have NA fields for December to March.
The question did not specify what class of object is desired but here are three. zoo supports an irregularly spaced index so it does not need to insert NA's but ts does not and converting from zoo to ts automatically inserts NA's. Convert the ts object back to zoo again or to a data frame to get a zoo or data frame object with NA's.
The zoo and data frame objects use yearmon class for the index which internally represents year/month as year + fraction where fraction is 0, 1/12, ..., 11/12 for Jan, Feb, ..., Dec and displays in meaningful form. as.Date can be used to convert yearmon objects to Date objects although in this case yearmon probably makes more sense since it directly represents year and month without day.
If you want to go in the other direction and remove the NA's, use na.omit(z_na) or na.omit(DF_na).
library(zoo)
# zoo object - no NA's
z <- read.zoo(DF, FUN = as.yearmon)
# ts object with NA's
tt <- as.ts(z)
# zoo object with NA's
z_na <- as.zoo(tt)
# data.frame with NA's
DF_na <- fortify.zoo(tt)
Note
Lines <- "2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891"
DF <- read.table(text = Lines)
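As a quick check of the round trip above, here is a minimal sketch using the same sample data. It lists the months that were filled with NA and re-indexes the series by first-of-month Date values. The names idx, missing_months and z_date are illustrative and not part of the original answer.

```r
library(zoo)

Lines <- "2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891"
DF <- read.table(text = Lines)

# irregular zoo series -> regular ts (NA's inserted) -> zoo again
z <- read.zoo(DF, FUN = as.yearmon)
z_na <- as.zoo(as.ts(z))

# as.yearmon() is applied explicitly so the index class does not depend
# on how as.zoo() chose to represent the monthly index
idx <- as.yearmon(time(z_na))

# months that had no observation in the original data
missing_months <- idx[is.na(coredata(z_na))]

# same series, indexed by the first day of each month
z_date <- zoo(coredata(z_na), as.Date(idx))
```

Printing missing_months shows which months were padded, which is a handy sanity check before passing the series on to other functions.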
Related
My zoo object
I would like to create a subset of these values (they are discharge values) containing only the December flow values.
Thank you!
We can extract the 'months' from the index with format, get a logical index by comparing with 'Dec', and use that to subset the zoo object.
z1[format(index(z1), '%b')=='Dec']
#1938-12-03 1938-12-10 1938-12-17 1938-12-24 1938-12-31
# 49 50 51 52 53
If we convert to an xts object, .indexmon from the xts package can also be used. .indexmon is zero-based, so December is 11.
library(xts)
z1[.indexmon(as.xts(z1))==11]
Another option from the comments is to use grep on the index to get the numeric positions and subset with those (from @Pierre Lafortune)
z1[grep("-12-",index(z1))]
Or with subset and months (from @G. Grothendieck)
subset(z1, months(time(z1)) == "December")
data
library(zoo)
z1 <- zoo(1:100, order.by = seq(as.Date('1938-01-01'),
length.out=100, by = '1 week'))
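Note that format(..., '%b') and months() return month names in the current locale, so comparing against 'Dec' or "December" can fail on a non-English system. A small sketch of a locale-independent variant, comparing the numeric month instead (the name dec is illustrative):

```r
library(zoo)

z1 <- zoo(1:100, order.by = seq(as.Date("1938-01-01"),
                                length.out = 100, by = "1 week"))

# "%m" gives the zero-padded month number, which does not depend on the locale
dec <- z1[format(index(z1), "%m") == "12"]
dec
```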
I have some intermittent demand data that only includes lines where demand is present. I bring it in via read.csv, and my 2 columns are Date (as date) and Quantity (as integer). Then I convert it to a zoo series and combine the daily demand into monthly demand. My final output is a zoo series with the date being the first day of the month and the summed demand for that month.
My problem is that this zoo series is missing the in between months that have zero demand and I need these to forecast intermittent demand correctly.
For example: I have quantity 2 in date 2013-01-01 and then the next line is quantity 3 in 2013-10-01. I need to add quantity zero to 2013-02-01 through 2013-09-01.
Date <- c('1/1/2013','10/1/2013','11/1/2013')
Quantity <- c(2, 3, 6)
Date <- as.Date(Date, "%m/%d/%Y")
df <- data.frame(Date, Quantity)
df <- read.zoo(df)
df
The zoo series output:
2013-01-01 2013-10-01 2013-11-01
2 3 6
Because "df" is a zoo object, you may use merge.zoo and its fill argument. The current data set is merged with an empty zoo object which contains all the desired dates.
tt <- seq(min(Date), max(Date), "month")
merge(df, zoo(, tt), fill = 0)
# 2013-01-01 2013-02-01 2013-03-01 2013-04-01 2013-05-01 2013-06-01 2013-07-01 2013-08-01 2013-09-01 2013-10-01 2013-11-01
# 2 0 0 0 0 0 0 0 0 3 6
For further examples, see ?merge.zoo ("extend an irregular series to a regular one").
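The same idea can be sketched end to end, starting from sparse daily demand. This is only an illustration under assumptions: the daily values below are made up, and the names daily, monthly, grid and filled are not from the question.

```r
library(zoo)

# hypothetical sparse daily demand: only days with demand are present
daily <- zoo(c(1, 1, 3, 6),
             as.Date(c("2013-01-05", "2013-01-20", "2013-10-02", "2013-11-15")))

# sum by month, using the first day of the month as the new index
monthly <- aggregate(daily, function(d) as.Date(as.yearmon(d)), sum)

# full monthly grid from the first to the last observed month
grid <- seq(start(monthly), end(monthly), by = "month")

# merge with an empty zoo object on the grid; missing months become 0
filled <- merge(monthly, zoo(, grid), fill = 0)
filled
```

Aggregating first and filling afterwards keeps the grid small, since only months (not days) have to be padded.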
You can use merge to add the missing rows and then set their values to zero.
First, let's create some fake data:
# Vector of dates from Jan 1, 2015, to Mar 31, 2015
dates = seq(as.Date("2015-01-01"), as.Date("2015-03-31"), by="1 day")
# Let's create data for a few of these dates, leaving some out
set.seed(55)
dat = data.frame(dates=dates[sample(1:length(dates), 70)],
quantity=sample(1:10, 70, replace=TRUE))
dat = dat[order(dat$dates),]
Now let's make believe dat is what you imported from a csv file. We want to fill in quantity=0 for the missing dates. So first we need to add rows for the missing dates. You can do this by creating a date vector containing all dates from the first date to the last date in your csv file and using the merge function. In this case, we've already created that date vector above.
Now merge in rows for the missing dates. The new rows will have NA for quantity. We'll change those NAs to zero below.
dat = merge(data.frame(dates), dat, by="dates", all.x=TRUE)
# Set missing values to zero
dat$quantity[is.na(dat$quantity)] = 0
Now you can aggregate by month, convert to a zoo series, etc.
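For completeness, here is one possible way to do that last step. It is a self-contained sketch that repeats the fake data from above; the column name month and the objects monthly and monthly_z are made up for the example.

```r
library(zoo)

# same fake data as above: 70 of the 90 days have a quantity
dates <- seq(as.Date("2015-01-01"), as.Date("2015-03-31"), by = "1 day")
set.seed(55)
dat <- data.frame(dates = dates[sample(seq_along(dates), 70)],
                  quantity = sample(1:10, 70, replace = TRUE))

# add rows for the missing dates and set their quantity to zero
dat <- merge(data.frame(dates), dat, by = "dates", all.x = TRUE)
dat$quantity[is.na(dat$quantity)] <- 0

# aggregate to monthly totals, keyed by "YYYY-MM"
dat$month <- format(dat$dates, "%Y-%m")
monthly <- aggregate(quantity ~ month, data = dat, FUN = sum)

# convert to a zoo series with a yearmon index
monthly_z <- zoo(monthly$quantity, as.yearmon(monthly$month, "%Y-%m"))
monthly_z
```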
I have an irregular time interval like this:
df=data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
I found that a zoo object can be merged with a regular time interval, filling in 0 for the missing values. However, I need to fill in the previous value instead, except at the start of each month, where the value should be 0. So the end output looks like this:
date runningtotal
2013-01-01 0
2013-01-02 0
...
2013-01-08 800
2013-01-09 800
2013-01-10 800
2013-01-11 910
2013-01-12 910
2013-01-13 1060
...
2013-02-01 0
Also, does it make sense to fill in values like this for forecasting purposes?
Thanks.
Try approxfun with the constant method. I don't have lubridate and just deal with regular Date objects. For instance:
df<-data.frame(Date=c("2013-01-08","2013-01-11","2013-01-13","2013-01-21","2013-02-06"), runningtotal=c(800,910,1060,1210,660))
df$Date<-as.Date(as.character(df$Date))
#create some new dates
newDates<-seq(df$Date[1],df$Date[5],length.out=10)
intfun<-approxfun(df$Date,df$runningtotal,method="constant",yleft=0,yright=0)
data.frame(newDates,intfun(newDates))
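To see what the step function does on a daily grid, here is a small sketch. The grid start of 2013-01-01 is an assumption taken from the desired output in the question, and the dates are passed as plain numbers to keep the example explicit. Note that this approach carries the last value across the month boundary, so it does not reset to 0 on 2013-02-01.

```r
df <- data.frame(
  Date = as.Date(c("2013-01-08", "2013-01-11", "2013-01-13",
                   "2013-01-21", "2013-02-06")),
  runningtotal = c(800, 910, 1060, 1210, 660)
)

# step function: each value holds until the next observation
step_fun <- approxfun(as.numeric(df$Date), df$runningtotal,
                      method = "constant", yleft = 0, yright = 0)

# evaluate on every day from the (assumed) start of the month
all_days <- seq(as.Date("2013-01-01"), max(df$Date), by = "day")
daily <- data.frame(Date = all_days,
                    runningtotal = step_fun(as.numeric(all_days)))
head(daily, 10)
```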
I would use na.locf from the zoo package, but you need to prepare the data before applying it.
## day() and day<-() come from lubridate; DF$Date is assumed to be POSIXct
library(lubridate)
library(zoo)
## generate a vector of dates, starting at the first day of the first month
mm <- min(DF$Date)
day(mm) <- 1
seq_dates <- seq.POSIXt(mm,max(DF$Date),by='days')
## add zero values for the beginning of each month
DF <- rbind(DF,data.frame(Date=seq_dates[day(seq_dates)==1],runningtotal=0))
## merge with the sequence of dates, and apply na.locf to carry previous values forward
na.locf(merge(seq_dates,DF,by=1,all.x=TRUE))
The idea is to apply na.locf, which replaces each missing value with the previous non-missing value. Merging your data with a full sequence of dates (from the start of the first month to the last date) inserts the missing values, and the zero rows added at each month start make the running total reset there.
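Here is a self-contained sketch of the same idea using plain Date objects and zoo only, with the data from the question. The helper names first_day, all_days, month_starts, anchors and filled are illustrative.

```r
library(zoo)

df <- data.frame(
  Date = as.Date(c("2013-01-08", "2013-01-11", "2013-01-13",
                   "2013-01-21", "2013-02-06")),
  runningtotal = c(800, 910, 1060, 1210, 660)
)

# every day from the first day of the first month to the last observation
first_day <- as.Date(format(min(df$Date), "%Y-%m-01"))
all_days <- seq(first_day, max(df$Date), by = "day")

# zero rows at each month start, unless there is already an observation that day
month_starts <- all_days[format(all_days, "%d") == "01"]
anchors <- data.frame(Date = month_starts[!month_starts %in% df$Date],
                      runningtotal = 0)

# build the series, extend it to the daily grid, carry values forward
obs <- rbind(df, anchors)
z <- zoo(obs$runningtotal, obs$Date)
filled <- na.locf(merge(z, zoo(, all_days)))
filled
```

Because a zero is placed on the first day of each month before filling, the carried-forward value resets at every month boundary, which matches the output requested in the question.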
I'm trying to analyze 1-year %-change data in R on two data series by merging them into one file. One series is weekly and the other is monthly. Converting the weekly series to monthly is the problem. Using apply.monthly() on the weekly data creates a monthly file but with intra-monthly dates that don't match the first-day-of-month format in the monthly series after combining the two files via merge.xts(). Question: How to change the resulting merged file (sample below) to one monthly entry for both series?
2012-11-01 0.02079801 NA
2012-11-24 NA -0.03375796
2012-12-01 0.02052502 NA
2012-12-29 NA 0.04442094
2013-01-01 0.01881466 NA
2013-01-26 NA 0.06370272
2013-02-01 0.01859883 NA
2013-02-23 NA 0.02999318
You can pass indexAt="firstof" in a call to to.monthly to get monthly data using the first of the month for the index.
library(quantmod)
getSymbols(c("USPRIV", "ICSA"), src="FRED")
merge(USPRIV, to.monthly(ICSA, indexAt="firstof", OHLC=FALSE))
Something like this:
do.call(rbind, by(d[-1], d[[1]] - as.POSIXlt(d[[1]])$mday, FUN=apply, 2, sum, na.rm=TRUE))
## V2 V3
## 2012-10-31 0.02079801 -0.03375796
## 2012-11-30 0.02052502 0.04442094
## 2012-12-31 0.01881466 0.06370272
## 2013-01-31 0.01859883 0.02999318
Note that the dates are encoded as row names, not as a column in the result.
It is a frequently occurring issue. And sometimes I forget my own solution for it and google does not easily lead to one. So I am posting my solution here.
Basically, just convert the index of the monthly aggregated series to yearmon. You can also optionally convert it back to yyyy-mm-dd format (the 1st of each month) with as.Date. After the exact dates are stripped and the indices are 'homogenised', all the columns align perfectly.
# Here with the %>% pipe (e.g. from dplyr)
time(myxts) <- time(myxts) %>% as.yearmon() %>% as.Date()
# or without the pipe
time(myxts) <- as.Date(as.yearmon(time(myxts)))
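A small offline sketch of this, using made-up weekly and monthly series (the values and the names weekly, monthly, weekly_m and merged are assumptions for illustration). The weekly series is collapsed with apply.monthly, which leaves intra-month dates in the index, and the index is then snapped to the first of each month before merging.

```r
library(zoo)
library(xts)

# hypothetical weekly series (Saturdays) and monthly series (first of month)
weekly <- xts(1:12, seq(as.Date("2012-11-03"), by = "week", length.out = 12))
monthly <- xts(c(10, 20, 30),
               as.Date(c("2012-11-01", "2012-12-01", "2013-01-01")))

# monthly means of the weekly data; the index is the last weekly date of each month
weekly_m <- apply.monthly(weekly, mean)

# snap the index to the first of each month so it lines up with the monthly series
index(weekly_m) <- as.Date(as.yearmon(index(weekly_m)))

merged <- merge(monthly, weekly_m)
merged
```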
I have a spreadsheet in Excel which consists of a first column of dates and then subsequent columns that hold the prices of different securities on those dates.
I saved the Excel file as a csv and then imported it into R using
prices=read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv",header = TRUE, sep = ",")
This creates the correct time series data
x<-ts(prices[,2])
but does not have the dates attached.
However the dates refer to working days. So although in general they represent Monday-Friday this is not always the case because of holidays etc.
How then can I create a time series where the dates are read in from the first column of the csv file? I can not find an example in R where this is done
As you didn't give any data, here is a made-up data.frame:
R> DF <- data.frame(date="2011-05-15", time=c("08:25:00", "08:45:00",
+ "09:05:11"), val=rnorm(3, 100, 5))
R> DF
date time val
1 2011-05-15 08:25:00 99.5926
2 2011-05-15 08:45:00 95.8724
3 2011-05-15 09:05:11 96.6436
R> DF <- within(DF, posix <- as.POSIXct(paste(date, time)))
R> DF
date time val posix
1 2011-05-15 08:25:00 99.5926 2011-05-15 08:25:00
2 2011-05-15 08:45:00 95.8724 2011-05-15 08:45:00
3 2011-05-15 09:05:11 96.6436 2011-05-15 09:05:11
R>
I used within(), but you can use other means to assign new columns. The key is that paste() allows you to combine columns, and you could use other R functions to modify the data as needed.
The key advantage of having dates and times parsed in a suitable type (like POSIXct) is that other functions can then use it. Here is zoo:
R> z <- with(DF, zoo(val, order.by=posix))
R> summary(z)
Index z
Min. :2011-05-15 08:25:00.00 Min. :95.9
1st Qu.:2011-05-15 08:35:00.00 1st Qu.:96.3
Median :2011-05-15 08:45:00.00 Median :96.6
Mean :2011-05-15 08:45:03.67 Mean :97.4
3rd Qu.:2011-05-15 08:55:05.50 3rd Qu.:98.1
Max. :2011-05-15 09:05:11.00 Max. :99.6
R>
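For the daily price data described in the question, a minimal sketch could look like the following. The csv content, the column names (Date, OptionA, OptionB) and the date format are assumptions, since no sample data was given; adjust the format string to match the actual file.

```r
library(zoo)

# stand-in for the csv file: first column holds working-day dates
csv_text <- "Date,OptionA,OptionB
2011-05-13,10.1,20.5
2011-05-16,10.4,20.1
2011-05-17,10.2,20.9"

prices <- read.csv(text = csv_text, header = TRUE)

# parse the first column as Date, then use it as the index of a zoo series
prices$Date <- as.Date(prices$Date, format = "%Y-%m-%d")
z <- zoo(as.matrix(prices[, -1]), order.by = prices$Date)
z
```

Since zoo allows an irregular index, the weekend gap between 2011-05-13 and 2011-05-16 needs no special handling.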