Convert characters to dates in a panel data set - r

I am downloading and using the following panel data,
# load / install package
library(rsdmx)
library(dplyr)
# Total
Assets.PIT <- readSDMX("http://widukind-api.cepremap.org/api/v1/sdmx/IMF/data/IFS/..Q.BFPA-BP6-USD")
Assets.PIT <- as.data.frame(Assets.PIT)
names(Assets.PIT)[10]<-"A.PI.T"
names(Assets.PIT)[6]<-"Code"
AP<-Assets.PIT[c("WIDUKIND_NAME","Code","TIME_PERIOD","A.PI.T")]
AP<-rename(AP, Country=WIDUKIND_NAME, Year=TIME_PERIOD)
My goal is to convert the column vector Year in the dataframe AP into a vector of class dates. In other words, I want R to understand the time series part of my panel data. For your information, I have quarterly data, with unbalanced date range across cross sections (in my case countries).
head(AP$Year)
[1] "2008-Q2" "2008-Q3" "2008-Q4" "2009-Q1" "2009-Q2" "2009-Q3"
Or,
AP$Year<-as.factor(AP$Year)
head(AP$Year)
[1] 2008-Q2 2008-Q3 2008-Q4 2009-Q1 2009-Q2 2009-Q3
264 Levels: 1950-Q1 1950-Q2 1950-Q3 1950-Q4 1951-Q1 1951-Q2 1951-Q3 1951-Q4 1952-Q1 1952-Q2 1952-Q3 ... 2015-Q4
Is there any easy solution to convert these character dates into time-series dates?

library(zoo)
as.Date(as.yearqtr(AP$year, format ='%YQ-%q'))
This should do it.

Related

Quantmod - Chop data and constructing matrix of return series

I am having trouble with my R assignment I am working on this semester.
Here is the part that I am tasked with doing that I am confused about:
iv. Download 3 month TBill rate from Fred for the same sample period 01/01/1993 to 12/31/2013.
Useful Hints: You may have to chop the data to match the sample period.
v. Construct a matrix of return series combining Stock, S&P500, and TBill for the sample period.
Useful Hints:
Note that the rownames for the TBill may not match with the other two return series, as the dates do not match, although the month and year matches
You have to construct the row names for each of the series as Year – Month format (e.g. 1993-01) or delete the rownames from T-bill before you can combine all three series into one Return matrix.
You have to convert the Return matrix to a dataframe before you use the lm() function.
I tried this below like I have used getSymbols before for SPY and AAPL but it pulls an entire data set rather than the specific date range. How can I chop the data so it fits the desired date range?
getSymbols('TB3MS', src = 'FRED', from = "1993-01-01", to = "2013-12-31")
Next, how would I go about constructing the matrix of return series combining all of the stocks? Can anyone point me in the right direction?
Filtering an xts object: see examples in the xts documentation ?xts.
# filter 1993 until 2013
TB3MS["1993/2013"]
But these dates are of, because tbills are at the first day of the month, the stock dates are the last day of the month. With the coredata you can extract the tbill data and stick it into the other timeseries if the rows match.
Taking the data example from your previous question, you could do something like this (and I'm creating more steps than needed, you could combine a few statements into one):
# create monthly returns of the spy data and give the column a better name than monthly.returns
spy_returns <- monthlyReturn(SPY)
colnames(spy_returns) <- "SPY_returns"
# filter the tbill data
TB3MS_1993_2013 <- TB3MS["1993/2013"]
# add tbill data to spy data
spy_returns$TB3MS <- coredata(TB3MS_1993_2013)
Merging xts objects can just be done with merge. They will be merged on the dates.
merge(spy_returns, aapl_returns) would combine these two. If you have a lot of tickers, use Reduce (check help and SO on how to use Reduce with merge) but better would be to use the tidyquant package if allowed.

Converting monthly numerics to readable dates in R

How can set R to count months instead of dates when converting integers to dates?
After reading several threads on how to convert dates in R, it seems like nobody has asked how it is possible to convert numeric dates if the numerics is given in monthly timeseries. E.g. 552 represents January 2006.
I have tried several things, such as using as.Date(dates,origin="1899-12-01"), but I reckognize that R counts days instead of months. Thus, the code on year-month number 552 above yields "1901-06-06" instead of the correct 2006-01-01.
Sidenote: I also want the format to be YEARmonth, but does R allow displaying dates without days?
I think your starting date should be '1960-01-01'.
anyway you can solve this problem using the package lubridate.
in this case you can start from a date and add months.
library(lubridate)
as.Date('1960-01-01') %m+% months(552)
it gives you
[1] "2006-01-01"
you can display only the year and month of a date, but in that case R coerces the date into a character.
format(as.Date('2006-01-01'), "%Y-%m")

Time series analysis applicability?

I have a sample data frame like this (date column format is mm-dd-YYYY):
date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6
I want to convert this data frame into time series using ts(), but the problem is: the current data frame has multiple values for the same date. Can we apply time series in this case?
Can I convert data frame into time series, and build a model (ARIMA) which can forecast count value on a daily basis?
OR should I forecast count value based on grp, but in that case, I have to select only grp and count column of a data frame. So in that case, I have to skip date column, and daily forecast for count value is not possible?
Suppose if I want to aggregate count value on per day basis. I tried with aggregate function, but there we have to specify date value, but I have a very large data set? Any other option available in r?
Can somebody, please, suggest if there is a better approach to follow? My assumption is that the time series forcast works only for bivariate data? Is this assumption right?
It seems like there are two aspects of your problem:
i want to convert this data frame into time series using ts(), but the
problem is- current data frame having multiple values for the same
date. can we apply time series in this case?
If you are happy making use of the xts package you could attempt:
dta2$date <- as.Date(dta2$date, "%d-%m-%Y")
dtaXTS <- xts::as.xts(dta2[,2:3], dta2$date)
which would result in:
>> head(dtaXTS)
count grp
2009-09-01 54 1
2009-09-01 100 2
2009-09-01 546 3
2009-10-01 67 4
2009-11-01 80 5
2009-11-01 45 6
of the following classes:
>> class(dtaXTS)
[1] "xts" "zoo"
You could then use your time series object as univariate time series and refer to the selected variable or as a multivariate time series, example using PerformanceAnalytics packages:
PerformanceAnalytics::chart.TimeSeries(dtaXTS)
Side points
Concerning your second question:
can somebody plz suggest me what is the better approach to follow, my
assumption is time series forcast is works only for bivariate data? is
this assumption also right?
IMHO, this is rather broad. I would suggest that you use created xts object and elaborate on the model you want to utilise and why, if it's a conceptual question about nature of time series analysis you may prefer to post your follow-up question on CrossValidated.
Data sourced via: dta2 <- read.delim(pipe("pbpaste"), sep = "") using the provided example.
Since daily forecasts are wanted we need to aggregate to daily. Using DF from the Note at the end, read the first two columns of data into a zoo series z using read.zoo and argument aggregate=sum. We could optionally convert that to a "ts" series (tser <- as.ts(z)) although this is unnecessary for many forecasting functions. In particular, checking out the source code of auto.arima we see that it runs x <- as.ts(x) on its input before further processing. Finally run auto.arima, forecast or other forecasting function.
library(forecast)
library(zoo)
z <- read.zoo(DF[1:2], format = "%m-%d-%Y", aggregate = sum)
auto.arima(z)
forecast(z)
Note: DF is given reproducibly here:
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)
Updated: Revised after re-reading question.

Extract numbers from string as numeric or dates in R

I am working with some hdf5 data sets. However, the dates are stored in the file and no hint of these dates from the file name. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out data to create time series identity for each of the files i.e.year month date format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the hdf5 files. E.g
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS\SWATHS\ColumnAmountNO2\Data Fields\ColumnAmountNO2Trop")
The Attr when read consist of a long string with the only required information being "2007-08-31" which is the date of acquisition. I have been able to extract this using the Stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left now is that the Date isnt recognised as numeric or date. How do I change this as I need to bind the Date with the data for all days to create a time series (more like an identifier as the data sets are irregular), please? a sample of how it looks after extracting the dates from string and binding with the CO values for each date is below
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.
Seems like you've already done all the hard work! Based off your comment, here's how you could take it across the finish line.
From your comment, seems like you have the strings in a good format. Given that your variable is named date, simply go
dateObjects<-as.Date(Date) #where Date is your variable
and either the single value or vector of character strings (as the format you gave in the comment) will now be date objects, which you could use with a library like zoo to create time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Date)
here we take your CO data, covert it to a zoo object, then assign the appropriate date to each entry, converting it from a character to a date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
Here is one approach not using string extraction. If you know how long your time series should be, which you should based on the length of your dataset and knowledge of its periodicity, you could just create a regular date series and then add that into a data.frame with other variables of interest. Assuming you have daily data the below would work. Obviously your length.out would be different.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
Edit of Original:
Oh I see. Well after reading in your new data example the below worked for me. It was a pretty straight forward transform. cheers
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...

Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle gregorian time frames. I haven't used it personally but it looks straightforward. There is a shift argument in some methods that allow you to set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, day, hour, and minute as input and outputs an POSIXct object which represents time. You just have to split the third column into two separate hour minute columns and you can use the function.

Resources