Time series data in R, problems with dates - r

Date T1V T2V T3V T1MV T2MV T3MV
1997-12-31 2.631202 2.201695 -0.660092 -0.77492483 0.282662305 4.66506798
1998-01-30 2.193793 3.763458 5.565432 3.50711734 2.874381814 5.14118430
1998-02-27 5.173496 8.727646 6.333820 2.59892279 8.363146480 9.27289259
This is the table I am working with in R. It is much bigger. The data is monthly and runs up to 2014. The different columns are just the returns on different portfolios. I always get errors when I try to use it as time series data. I installed the PerformanceAnalytics package. For example, the SharpeRatio function gives me:
> SharpeRatio(T1V)
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in names from a data object with one column, you should use the form 'data[rows, columns, drop = FALSE]'. Rownames should have standard date formats, such as '1985-03-15'.
When you look at the Date column in the table, you see that the dates are in exactly this format.
I tried a hundred things. It also doesn't let me plot the charts with lines, only with points.
Any help is much appreciated.
> dput(FactorR[1:5,])
structure(list(Date = structure(1:5, .Label = c("1997-12-31",
"1998-01-30", "1998-02-27", "1998-03-31", "1998-04-30", "1998-05-29",
"1998-06-30", "1998-07-31", "1998-08-31", "1998-09-30", "1998-10-30",
"1998-11-30", "1998-12-31", "1999-01-29", "1999-02-26", "1999-03-31",
"1999-04-30", "1999-05-31", "1999-06-30", "1999-07-30", "1999-08-31",
"1999-09-30", "1999-10-29", "1999-11-30", "1999-12-31", "2000-01-31",
"2000-02-29", "2000-03-31", "2000-04-28", "2000-05-31", "2000-06-30",
"2000-07-31", "2000-08-31", "2000-09-29", "2000-10-31", "2000-11-30",
"2000-12-29", "2001-01-31", "2001-02-28", "2001-03-30", "2001-04-30",
.
.
.
, class = "factor"),
T1V = c(2.631202, 2.193793, 5.173496, 8.033864, 1.369065),
T2V = c(2.201695, 3.763458, 8.727646, 11.375482, 3.097196
), T3V = c(-0.660092, 5.565432, 6.33382, 20.608638, 4.022475
), T1MV = c(-0.774924835, 3.507117337, 2.598922792, 16.26945887,
4.544096701), T2MV = c(0.282662305, 2.874381814, 8.36314648,
12.7091841, 1.078742371), T3MV = c(4.665067984, 5.141184302,
9.27289259, 10.62133318, 2.791853987), T1BTM = c(0.617378168,
3.498582776, 3.332624722, 8.802164975, 1.366229683), T2BTM = c(1.101407825,
5.578394125, 8.910685728, 20.05317039, 1.258609942), T3BTM = c(2.454019461,
2.445706552, 7.991651412, 10.79096755, 5.464002646), T1MOM = c(2.99986853,
4.982808153, 8.657010689, 10.60637296, 4.44333707), T2MOM = c(0.011102554,
3.184165606, 7.55229158, 11.9341773, 0.328377299), T3MOM = c(1.161834369,
3.355709694, 4.025659592, 17.12665788, 3.55822744), Rm = c(1.390935,
3.840895, 6.744987, 13.262647, 2.753486), SMB = c(-5.439992819,
-1.634066965, -6.673969798, 5.648125694, 1.752242715), HML = c(-1.836641293,
1.052876225, -4.65902669, -1.988802574, -4.097772963), MOM = c(1.838034161,
1.62709846, 4.631351096, -6.520284921, 0.885109629)), .Names = c("Date",
"T1V", "T2V", "T3V", "T1MV", "T2MV", "T3MV", "T1BTM", "T2BTM",
"T3BTM", "T1MOM", "T2MOM", "T3MOM", "Rm", "SMB", "HML", "MOM"
), row.names = c(NA, 5L), class = "data.frame")

Two things are wrong:
Your Date column doesn't contain dates but factors.
SharpeRatio doesn't know how to convert your data.frame to a time series object.
By doing the conversion manually, we can specify which column to use as time index and on-the-fly convert it to Date:
library(PerformanceAnalytics)
FactorR_xts <- xts(x = FactorR[, -1], # use all columns except for first column (date) as data
order.by = as.Date(FactorR$Date) # Convert Date column from factor to Date and use as time index
)
SharpeRatio(FactorR_xts)
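A minimal sketch of the factor problem, using a three-row stand-in for FactorR. The name toy is a hypothetical placeholder, and the xts step only runs if that package is installed:

```r
# Toy stand-in for FactorR: the Date column is a factor, as in the dput() output
toy <- data.frame(
  Date = factor(c("1997-12-31", "1998-01-30", "1998-02-27")),
  T1V  = c(2.631202, 2.193793, 5.173496)
)

class(toy$Date)   # a factor, so plotting treats it as categories, not time

# Go through character first so the labels, not the integer codes, are parsed
dates <- as.Date(as.character(toy$Date), format = "%Y-%m-%d")
class(dates)      # now a Date vector that can serve as a time index

# Optional: build the time series object the same way the answer does
if (requireNamespace("xts", quietly = TRUE)) {
  toy_xts <- xts::xts(toy[, -1, drop = FALSE], order.by = dates)
  print(toy_xts)
}
```

Keeping drop = FALSE preserves the column name when only one data column remains, which is the same point the error message makes.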

Related

Error in if (class(x) == "numeric") { : the condition has length > 1

I'm trying to visualise data of the following form:
date volaEUROSTOXX volaSA volaKENYA25 volaNAM volaNIGERIA
1 10feb2012 0.29844454 0.1675901 0.007862087 0.12084170 0.10247617
2 17feb2012 0.31811157 0.2260064 0.157017220 0.33648935 0.22584127
3 24feb2012 0.30013672 0.1039974 0.083863921 0.11694768 0.16388161
To do so, I first converted the date (stored as a character in the original data frame) into a date format, which works just fine:
vola$date <- as.Date(vola$date)
str(vola$date)
Date[1:543], format: "2012-02-10" "2012-02-17" "2012-02-24" "2012-03-02" "2012-03-09"
However, if I now try to graph my data by using the chart.TimeSeries command, I get the following:
chart.TimeSeries(volatility_annul_stringdate,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
Error in if (class(x) == "numeric") { : the condition has length > 1
I tried:
Converting my date variable (in the date format) further into a time series object:
vola$date <- ts(vola$date, frequency=52, start=c(2012,9)) #returned same error from above
Converting the whole data set using the xts command:
vol.xts <- xts(vola, order.by= vola$date, unique = TRUE ) # which then returned:
order.by requires an appropriate time-based object
#even though date is a time-series
What am I doing wrong? I am rather new to RStudio. I really want to use the chart.TimeSeries command. Can someone help me?
Thanks in advance!
My MRE:
library(PerformanceAnalytics)
vola <- structure(list(date_2 = c("2012-02-10", "2012-02-17", "2012-02-24",
"2012-03-02"), volaEUROSTOXX = c(0.298444539308548, 0.318111568689346,
0.300136715173721, 0.299697518348694), volaKENYA25 = c(0.00786208733916283,
0.157017216086388, 0.0838639214634895, 0.152377054095268), volaNAM = c(0.120841704308987,
0.336489349603653, 0.116947680711746, 0.157027021050453), volaNIGERIA = c(0.102476172149181,
0.225841268897057, 0.163881614804268, 0.317349642515182), volaSA = c(0.167590111494064,
0.226006388664246, 0.103997424244881, 0.193037077784538), date = structure(c(1328832000,
1329436800, 1330041600, 1330646400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
vola <- subset(vola, select = -c(date))
vola$date_2 <- as.Date(vola$date_2)
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#This returns the above mentioned error message.
#Thus, I tried the following:
vola$date_2 <- ts(vola$date_2, frequency=52, start=c(2012,9))
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#Which returned a different error (as described above)
#And I tried:
vol.xts <- xts(vola, order.by= vola$date_2, unique = TRUE )
#This also returned an error message.
#My intention was to then run:
#chart.TimeSeries(vol.xts,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
#main="Log Volatility",lty=1,
#legend.loc="topright")
The documentation of PerformanceAnalytics::chart.TimeSeries is a bit vague. The issue is that when passing a data frame you have to set the dates as row.names. To this end I first converted your data (which is a tibble) to a data.frame. Afterwards I added the dates as row names and dropped the date column:
library(PerformanceAnalytics)
vola <- as.data.frame(vola)
vola <- subset(vola, select = -c(date))
row.names(vola) <- as.Date(vola$date_2)
vola$date_2 <- NULL
chart.TimeSeries(vola,
lwd = 2, auto.grid = F, ylab = "Annualized Log Volatility", xlab = "Time",
main = "Log Volatility", lty = 1,
legend.loc = "topright"
)
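The same idea can be sketched on a small data frame with values rounded from the MRE. The names vola_toy and numeric_cols are hypothetical. The optional last step shows that the xts route attempted in the question also works once order.by receives a plain Date vector and the date column is kept out of the data:

```r
# Rounded values from the MRE; the dates start out as character strings
vola_toy <- data.frame(
  date_2        = c("2012-02-10", "2012-02-17", "2012-02-24"),
  volaEUROSTOXX = c(0.298, 0.318, 0.300),
  volaSA        = c(0.168, 0.226, 0.104)
)

# Keep only the numeric series and move the dates into the row names
numeric_cols <- vola_toy[, setdiff(names(vola_toy), "date_2"), drop = FALSE]
rownames(numeric_cols) <- as.character(as.Date(vola_toy$date_2))
print(numeric_cols)

# Optional xts route: dates go to order.by, not into the data matrix
if (requireNamespace("xts", quietly = TRUE)) {
  vola_xts <- xts::xts(numeric_cols, order.by = as.Date(vola_toy$date_2))
  print(vola_xts)
}
```

Either object contains only numeric columns, which is what the charting functions expect; a Date or ts column left inside the data is a likely source of the errors shown in the question.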

Labelling Variables

I have a series of variables that fall under one related question: let's say there are 20 such variables in my dataframe, each one corresponding to an option on a MC question. They are titled popn1, popn2, ..., popn20.
I want to label each variable by its option, as an example: (popn1 = Everyone; popn2=Children)
I'm using the labelVector package.
Is there a way I can do it without writing out each variable name? Ex. is there a paste function I can use, such as
df2 <- Set_label(df1,
(paste0(popn, 1:20) = "Everyone", "Children", .... "Youth"?)
This can be done in base R quite easily. Here's some sample data (using 5 columns instead of 20, to make it easier to view):
popn1 popn2 popn3 popn4 popn5
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
It looks like you already have your new column names in a character vector:
your_column_names <- c("Everyone", "Youth", "Someone", "Something", "Somewhere")
Then you just use the setNames function on the column names for your data:
colnames(data) <- setNames(your_column_names, colnames(data))
Everyone Youth Someone Something Somewhere
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
Sample Data:
data <- structure(list(popn1 = c(-0.408514139489243, 3.13789432899688,
2.40732780606037), popn2 = c(3.24071608151551, 2.51269963339946,
1.47561933493116), popn3 = c(2.73083728435832, 2.02154567048998,
2.44974180329751), popn4 = c(6.42872215439841, 3.3333709733048,
2.81744655980154), popn5 = c(8.0152099281755, 5.65440141443164,
6.29556905855252)), class = "data.frame", row.names = c(NA, -3L
))
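If the goal is to keep the popn names and attach a descriptive label to each column rather than rename it, one possible base R sketch is below. It builds the variable names with paste0 and stores each label in a "label" attribute. Storing labels in that attribute is an assumption about the convention label packages follow, and the three labels are placeholders:

```r
# Small stand-in data set with three of the popn columns
data <- data.frame(popn1 = c(1, 2), popn2 = c(3, 4), popn3 = c(5, 6))

# One label per option, in the same order as the columns
option_labels <- c("Everyone", "Children", "Youth")

# Build the column names instead of typing each one
vars <- paste0("popn", seq_along(option_labels))

# Attach each label as an attribute; the column names stay unchanged
for (i in seq_along(vars)) {
  attr(data[[vars[i]]], "label") <- option_labels[i]
}

# Collect the labels back into a named character vector to check them
labels_found <- vapply(data[vars], attr, character(1), which = "label")
print(labels_found)
```

With 20 options, only option_labels needs to grow; the loop and paste0 call stay the same.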

Why is 'weeks' from specific date not calculated?

I have a sample q below that contains three dates of dd/mm/yy in q$test
test
1 210376
2 141292
3 280280
I want to create a new covariate q$new that calculates the date difference from q$test to today.
I tried
q$new <- as.numeric(difftime(as.Date(q$test,format='%d/%m/%y'), as.Date(Sys.Date()), unit="weeks"))
But I receive an error message
Error in q$new <- as.numeric(difftime(as.Date(q$test, format =
"%d/%m/%y"), : object of type 'closure' is not subsettable
Do you have any idea what's wrong? Or do you have another solution?
q <- structure(list(test = c(210376L, 141292L, 280280L)), class = "data.frame", row.names = c(NA,
-3L))
You could do
as.numeric(difftime(Sys.Date(), as.Date(as.character(q$test), "%d%m%y"), units = "weeks"))
#[1] 2257.286 1384.143 2051.714
A few pointers:
1) Sys.Date() already returns an object of class "Date", so there is no need for as.Date there.
2) as.Date expects a character string as input, hence q$test is wrapped in as.character.
3) The format argument of as.Date describes the format of the input, not the output you want. In your case you used "%d/%m/%y", whereas the input is in the form "%d%m%y".
4) The 'closure' error itself suggests that q was not defined as a data frame when the line ran, so R picked up the base function q() instead.
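A reproducible sketch of these points, assuming a fixed reference date in place of Sys.Date() so the result does not change from day to day. The names q_df and ref are hypothetical; q_df is used so that the data frame does not share a name with the base function q():

```r
# Dates stored as integers in ddmmyy form, as in the question
q_df <- data.frame(test = c(210376L, 141292L, 280280L))

# Pad to six digits first: an integer such as 10203 would otherwise lose
# the leading zero of its day part
as_text <- sprintf("%06d", q_df$test)

# The format string describes the input layout (no separators)
parsed <- as.Date(as_text, format = "%d%m%y")
print(parsed)

# Fixed reference date (an assumption for reproducibility)
ref <- as.Date("2020-01-01")
q_df$new <- as.numeric(difftime(ref, parsed, units = "weeks"))
print(q_df)
```

Note that "%y" maps two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which matters for birth dates and similar old values.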

How to use setorder function in R

I have a mock up data set:
d1 = structure(
list(
chan1 = c(1.49955768204777, 1.57924608282282,
1.62079872172079, 1.49955768204777,
1.50897108417039, 1.47897959168283),
chan2 = c(3.71459266186863, 3.71459266186863,
3.66763591782946, 3.67359273988532,
3.66408366995924, 3.68083665073346),
chan3 = c(8.32529316285155, 6.30229174858652,
6.97551768293611, 6.52653674461786,
6.52653674461786, 6.07823977152575),
chan4 = c(11.023719681933, 11.023719681933,
11.023719681933, 11.4613297390623,
11.4613297390623, 11.5813471428122),
chan5 = c(7.32862391337389, 7.38103675023449,
7.81796038841145, 7.4216715642288,
7.51924428352424, 7.35498863975821),
rowname = c(2042051, 1454646, 289170,
3307469, 3890829, 1741489),
total_conv = c(359.161333500186, 359.161312264452,
359.16130836516, 359.161294408793,
359.161289598969, 359.161209958641),
sum = c(31.8917871020749, 30.0008869254455,
31.1056323928309, 30.5826884698421,
30.6801655213341, 30.1743917965125)
),
.Names = c("chan1", "chan2", "chan3", "chan4", "chan5",
"rowname", "total_conv", "sum"),
class = "data.frame",
row.names = c(NA, -6L)
)
Now I need to sort this data set by the total_conv and sum variables.
Here total_conv should be sorted in descending order and sum in ascending order.
When I use the following function, I am unable to sort my data set in the required format.
d1<-setorder(as.data.table(d1),-total_conv,sum)
How can I overcome this issue?
You can also try order instead of setorder:
setDT(d1)[order(-total_conv, sum)]
It will first sort by descending total_conv and then by ascending sum.
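To see the two sort directions at work, here is a small sketch with deliberately tied total_conv values, so that the second key decides the order within each tie. The toy numbers are made up. The base R order() call is the main example, and the data.table step only runs if that package is installed:

```r
# Made-up data with ties in total_conv so the second key matters
toy <- data.frame(
  total_conv = c(359.2, 359.1, 359.2, 359.1),
  sum        = c(31.9, 30.0, 30.5, 30.2)
)

# Base R: a minus sign flips a numeric key to descending order
sorted <- toy[order(-toy$total_conv, toy$sum), ]
print(sorted)

# Optional data.table equivalent; setorder() reorders dt by reference
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(toy)
  data.table::setorder(dt, -total_conv, sum)
  print(dt)
}
```

In the question's sample every total_conv value is distinct, so the sum key never gets a chance to change the order there.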

YYYYMMDD format

structure(list(date = c(20140717L, 20140611L, 20140611L, 20140704L,
20140411L, 20140906L, 20140512L, 20140717L, 20140819L, 20140415L,
20140812L, 20140403L, 20140424L, 20140818L, 20140922L, 20140625L,
20141006L, 20140918L, 20140811L, 20140819L, 20140602L, 20140626L,
20140729L, 20140624L, 20140909L, 20140705L, 20140920L, 20140515L,
20140531L, 20140628L, 20140822L, 20140508L, 20140809L, 20140627L,
20140727L, 20140711L, 20140714L, 20140710L, 20140403L, 20140525L,
20140428L, 20140501L, 20140915L, 20140510L, 20140601L, 20140921L,
20140815L, 20140610L, 20140418L, 20140812L, 20140614L, 20140814L,
20140626L, 20140412L, 20140912L, 20140514L, 20140919L, 20140706L,
20140411L, 20140711L, 20140624L, 20140430L, 20140521L, 20140418L,
20140713L, 20140424L, 20140601L, 20140923L, 20140406L, 20140905L,
20140613L, 20140412L, 20140407L, 20140402L, 20140813L, 20140903L,
20140827L, 20140521L, 20140524L, 20140404L, 20140419L, 20140412L,
20140902L, 20140623L, 20140925L, 20140528L, 20140731L, 20140513L,
20140821L, 20140703L, 20140724L, 20140818L, 20140801L, 20140628L,
20140801L, 20140521L, 20140906L, 20140725L, 20140522L, 20140927L,
20140615L, 20140920L, 20140813L, 20140815L, 20140924L, 20140614L,
20140912L, 20140710L, 20140807L, 20140501L, 20140420L, 20140630L,
20140704L, 20140401L, 20140605L, 20140928L, 20140806L, 20140614L,
20140907L, 20140704L, 20140403L, 20140804L, 20140603L, 20140728L,
20140919L, 20140731L, 20140426L, 20140930L, 20140502L, 20140827L,
20140815L, 20140628L, 20140902L, 20140616L, 20140613L, 20140726L,
20140721L, 20140425L, 20140715L, 20140607L, 20140913L, 20140621L,
20140708L, 20140427L, 20140506L, 20140425L, 20140411L, 20140615L,
20140713L, 20140424L, 20140406L, 20140711L, 20140415L, 20140909L,
20141004L, 20140725L, 20140602L, 20140405L, 20140525L, 20140605L,
20140521L, 20140506L, 20140414L, 20140916L, 20140512L, 20140830L,
20140722L, 20140711L, 20140628L, 20140613L, 20140618L, 20140719L,
20140416L, 20140727L, 20140521L, 20140718L, 20140814L, 20140515L,
20140501L, 20140725L, 20140507L, 20140619L, 20140525L, 20140609L,
20140614L, 20140402L, 20140914L, 20140517L, 20140826L, 20140602L,
20140920L, 20140718L, 20140915L, 20140715L, 20140708L, 20140419L,
20140819L, 20140501L, 20140807L, 20140404L)), .Names = "date", row.names = c(NA,
-200L), class = "data.frame")
This data frame has date values of class integer. It is just one column of my data set. The original data set also has another variable called "total sales". I want to make a plot with dates on the X axis and total sales on the Y axis.
However, because the dates are treated as integers, the plot is bad. So I want R to interpret the date column as dates so that I can get a better plot.
How can I do that? Please help me. Thank you very much.
You might have better luck with as.Date. If df is the data, then you can do
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
with(df, plot(date))
If d is the date column, you can try:
strptime(t(d),format='%Y%m%d')
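A short sketch of the full workflow, assuming a hypothetical total_sales column with made-up values, since the question only shows the date column:

```r
# Integer dates from the question plus an invented sales column
df <- data.frame(
  date        = c(20140717L, 20140611L, 20140704L, 20140411L),
  total_sales = c(120, 95, 130, 80)  # made-up values for illustration
)

# Convert via character so that "%Y%m%d" can parse the digits
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

# Sort by date so that a line plot connects points in time order
df <- df[order(df$date), ]

plot(df$date, df$total_sales, type = "l",
     xlab = "Date", ylab = "Total sales")
```

Once the column is of class Date, plot() spaces the points by actual elapsed time and labels the axis with dates instead of eight-digit integers.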
