monthlyReturn and unequal month length

I have 300+ companies and need to calculate monthly returns for them, which I will later use as one of the variables in my data set.
I downloaded prices from Yahoo and calculated monthly returns using the quantmod package:
library(quantmod)
# symbols: character vector of ticker symbols (defined elsewhere)
stockData <- lapply(symbols, function(x) getSymbols(x, auto.assign = FALSE, src = 'yahoo', from = '2000-01-01'))
stockDataReturn <- lapply(stockData, function(x) monthlyReturn(Ad(x)))
The problem is that some companies have different month-end dates (due to trading halts, etc.), which shows up in the output list: 2013-12-30 for company AAA, but 2013-12-31 for company BBB and the rest of the sample.
When I merge the list using
returns <- do.call(merge.xts, stockDataReturn)
it creates a separate row for 2013-12-30 that is NA for every company except AAA.
How can I resolve this? My understanding is that I need to switch to a month-year format and use it as the index before I merge.
Ideally, monthlyReturn would index each return by the first day of the month rather than by the last trading day.

You could use lubridate's floor_date to round every index to the same beginning-of-month date before merging. Alternatively, use ceiling_date to give all securities the same end-of-month date; note that ceiling_date(x, "month") rounds up to the first day of the next month, so subtract one day to land on the last calendar day of the month.
library(lubridate)
stockDataReturn <- lapply(stockDataReturn,
  function(x) {
    index(x) <- floor_date(index(x), "month")
    # Or, to use the calendar month end instead (ceiling_date returns the
    # first day of the next month, hence the - 1):
    # index(x) <- ceiling_date(index(x), "month") - 1
    x
  })
returns <- do.call(merge, stockDataReturn)
colnames(returns) <- symbols
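To see why normalising the index fixes the merge, here is a minimal base-R sketch that uses plain data frames instead of xts objects (an assumption made so it runs without quantmod or network access). The tickers follow the question's AAA/BBB example and the return values are made up for illustration; the hypothetical helper month_start() plays the role of floor_date(x, "month").

```r
# Made-up monthly returns: AAA's December return is stamped 2013-12-30
# (trading halt), BBB's is stamped 2013-12-31.
aaa <- data.frame(date = as.Date(c("2013-11-29", "2013-12-30")), AAA = c(0.010, -0.020))
bbb <- data.frame(date = as.Date(c("2013-11-29", "2013-12-31")), BBB = c(0.005, 0.030))

# Base-R analogue of lubridate::floor_date(d, "month"): first day of the month.
month_start <- function(d) as.Date(format(d, "%Y-%m-01"))

# Without normalising, the outer join yields three rows (two partly NA).
unaligned <- merge(aaa, bbb, by = "date", all = TRUE)

# After normalising, both December returns share the key 2013-12-01.
aaa$date <- month_start(aaa$date)
bbb$date <- month_start(bbb$date)
returns <- merge(aaa, bbb, by = "date", all = TRUE)
print(returns)
```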

Related

R generate one random date per month between defined interval

I'd like to generate a list of random dates within a defined interval using R, such that there is only one date for each month in the interval.
I've tried using a variation of the code from another solution, but I can't seem to limit it to one date per month; I get multiple dates for some months.
Here's my attempt:
df = data.frame(Date=c(sample(seq(as.Date('2020/01/01'), as.Date('2020/09/01'), by="day"), 9)))
But I still get more than one date for some months. Any input would be highly appreciated.
First, I create a table containing all the dates you want to sample from, and store the month number of each date in a second column, using the month() function from the lubridate package.
library(lubridate)
dates <- data.frame(
days = seq(as.Date('2020/01/01'), as.Date('2020/09/01'), by="day")
)
dates$month <- month(dates$days)
Then I loop over the months with lapply(). In each iteration, I select only the dates of that month from the dates table and pass them to sample().
results <- lapply(1:9, function(x){
sample_dates <- dates$days[dates$month == x]
return(sample(sample_dates, size = 1))
})
df <- data.frame(
dates = as.Date(unlist(results), origin = "1970-01-01")
)
This gives:
dates
1 2020-01-19
2 2020-02-06
3 2020-03-26
4 2020-04-13
5 2020-05-16
6 2020-06-29
7 2020-07-06
8 2020-08-21
9 2020-09-01
In other words, the idea of this approach is to give sample() only the dates of a single month in each iteration, so it picks exactly one date per month.
How about this?
First, create a function that returns a random day from the given month, then lapply it over all the months you need (1 to 9):
x <- function(month) {
  first <- as.Date(sprintf('2020-%02d-01', month))
  # Last day = first day of next month minus one, so the 1st of the next
  # month is never drawn and December does not need a "month 13"
  last <- seq(first, by = 'month', length.out = 2)[2] - 1
  sample(seq(first, last, by = 'day'), 1)
}
df <- data.frame(
dates = as.Date(unlist(lapply(1:9,x)), origin = "1970-01-01")
)
If you also want the results in random order (rather than January, February, March, ...), you only need to add a sample() call:
df <- data.frame(
dates = as.Date(unlist(sample(lapply(1:9,x))), origin = "1970-01-01")
)
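Both answers hard-code months 1 to 9 of 2020. A slightly more general base-R sketch (the helper random_date_per_month() is hypothetical) splits every day of an arbitrary interval by year-month and draws one day from each group, so it also works across year boundaries and never draws outside a partial first or last month.

```r
# Draw one random date from each calendar month that overlaps [from, to].
random_date_per_month <- function(from, to) {
  days <- seq(from, to, by = "day")
  # Group the days by "YYYY-MM"; split() keeps the Date class in each group.
  by_month <- split(days, format(days, "%Y-%m"))
  # Pick one random position within each month's days.
  picked <- vapply(by_month, function(d) as.numeric(d[sample.int(length(d), 1)]),
                   numeric(1))
  as.Date(unname(picked), origin = "1970-01-01")
}

set.seed(42)
dates <- random_date_per_month(as.Date("2020-01-01"), as.Date("2020-09-01"))
df <- data.frame(Date = dates)
print(df)
```

Because the "YYYY-MM" group names sort chronologically, the dates come back in month order; wrap the result in sample() if you want them shuffled.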

Is there a way to round dates to the next trading day while keeping both the date and the other variable columns in R?

How can I round the dates in the date column to the following business day? Each Saturday, Sunday and holiday should be moved to the following business day. Furthermore, how can I keep the values from the other columns alongside the adjusted dates?
I tried this with the bizdays package:
library(bizdays)
TestDates <- RawTweetDataWSentiment
View(TestDates)
# note: holidaysANBIMA is the Brazilian (ANBIMA) holiday list, not a UK one
cal <- create.calendar("UnitedKingdom/ANBIMA", holidays = holidaysANBIMA, weekdays = c("saturday", "sunday"))
bizdays.options$set(default.calendar = "UnitedKingdom/ANBIMA")
TestDates1 <- adjust.next(TestDates$Date, cal)
View(TestDates1)
However, this only returns the adjusted dates, not the rest of the data frame.
Does anyone know how to do this in R?
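One way to keep the other columns is to assign the adjusted dates back into the data frame, e.g. TestDates$Date <- adjust.next(TestDates$Date, cal), so every other column stays on its original row. As a package-free illustration of that idea, here is a base-R sketch with a hypothetical helper next_business_day() that rolls Saturdays, Sundays and dates found in a user-supplied holiday vector forward. The tweet data and the holiday are made up.

```r
# Roll each date forward until it is a weekday that is not in `holidays`.
next_business_day <- function(dates, holidays = as.Date(character(0))) {
  rolled <- vapply(seq_along(dates), function(i) {
    d <- dates[i]
    # %u gives the ISO weekday: 1 = Monday, ..., 6 = Saturday, 7 = Sunday
    while (format(d, "%u") %in% c("6", "7") || d %in% holidays) d <- d + 1
    as.numeric(d)
  }, numeric(1))
  as.Date(rolled, origin = "1970-01-01")
}

# Made-up tweet data; 2021-01-01 is treated as a holiday for illustration.
tweets <- data.frame(
  Date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-06")),
  Sentiment = c(0.2, -0.1, 0.4, 0.0)
)
holidays <- as.Date("2021-01-01")

# Overwrite only the Date column; Sentiment stays on the same rows.
tweets$Date <- next_business_day(tweets$Date, holidays)
print(tweets)
```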

R - Next highest value in a time series

A relatively simple question, but one for which I can't seem to find any examples.
I have simple forex price data in a two-column xts object called subx1:
Datetime, Price
2016-09-01 00:00:01, 1.11563
2016-09-01 00:00:01, 1.11564
2016-09-01 00:00:02, 1.11564
2016-09-01 00:00:03, 1.11565
... and so forth.
I'm trying to find the first time after 2pm when the price goes higher than the pre-2pm high, which is held in the before2.High column of another object, daypeakxts.
A sample of daypeakxts is:
Date, before2.High
2016-09-01, 1.11567
2016-09-02, 1.11987
Here is a (non-working) sketch of what I'm trying to do:
subxresult <- index(subx1, subx1$datetime > daypeakxts$before2.High)
... so I'm looking for the datetime at which the price satisfies a condition based on that day's value in another xts object.
You didn't provide enough data for a reproducible example, so I'm going to use some daily data that comes with the xts package.
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix, dateFormat = "Date")
# Aggregate and find the high for each week
Week.High <- apply.weekly(x, function(x) max(x$High))
# Finding the pre-2pm high would be something like:
# Pre.2pm.High <- apply.daily(x["T00:00/T14:00"], function(x) max(x$High))
# Merge the period high with the original data, and
# fill NA with the last observation carried forward
y <- merge(x, Week.High, fill = na.locf)
# Lag the period high, so it aligns with the following period
y$Week.High <- lag(y$Week.High)
# Find the first instance where the next period's high
# is higher than the previous period's high
y$First.Higher <- apply.weekly(y, function(x) which(x$High > x$Week.High)[1])
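The answer above demonstrates the merge/lag pattern on daily data. For the intraday question itself, a base-R sketch could work day by day: compute the high before the cutoff hour, then take the first later tick that exceeds it. It assumes the ticks sit in a data frame with hypothetical columns time (POSIXct) and price, sorted by time, with 2pm measured in the timestamps' own time zone; the helper first_break_after() is hypothetical and the prices are made up, loosely following the question's sample.

```r
# Made-up ticks for two days, sorted by time.
ticks <- data.frame(
  time = as.POSIXct(c("2016-09-01 13:00:00", "2016-09-01 13:30:00",
                      "2016-09-01 14:10:00", "2016-09-01 14:20:00",
                      "2016-09-01 14:30:00", "2016-09-02 10:00:00",
                      "2016-09-02 14:05:00", "2016-09-02 15:00:00"), tz = "UTC"),
  price = c(1.11560, 1.11567, 1.11565, 1.11570, 1.11580,
            1.11987, 1.11900, 1.11950)
)

# For each day: the pre-cutoff high and the time of the first later tick above it.
first_break_after <- function(ticks, cutoff_hour = 14) {
  days <- split(ticks, format(ticks$time, "%Y-%m-%d"))
  rows <- lapply(names(days), function(day) {
    d <- days[[day]]
    after <- as.numeric(format(d$time, "%H")) >= cutoff_hour
    pre_high <- if (any(!after)) max(d$price[!after]) else NA_real_
    # which() drops NA comparisons; [1] gives NA when the high is never broken
    hit <- which(after & d$price > pre_high)[1]
    data.frame(day = day, before2.High = pre_high, first_break = d$time[hit])
  })
  do.call(rbind, rows)
}

result <- first_break_after(ticks)
print(result)
```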

How to calculate summary statistics within specified date/time range within time series, using an input of multiple start and end dates?

I have a (dummy) data frame with time series data:
datetime <- as.POSIXct(seq(ISOdate(2012,12,22), ISOdate(2012,12,23), by="hour"), tz='EST')
data <- rnorm(25, 10, 5)
df <- data.frame(datetime, data)
I also have a separate data frame with start and end times as the two columns:
start <- as.POSIXct(c('2012/12/22 19:53', '2012/12/22 23:05'), tz='GMT')
end <- as.POSIXct(c('2012/12/22 21:06', '2012/12/22 23:58'), tz='GMT')
index <- data.frame(start, end)
What I'd like to do is "feed" the main data frame df the index data frame and, for each start/end combination, find the average value of data within that date/time range. This would be equivalent to subsetting df manually for each start/end time, but done in one pass. (FYI, my real data set has years of data and a hundred date/time ranges.)
The end goal is a table with three columns: start time, end time, and the average value of data within those times.
In general you don't want to grow a data frame one row at a time by calling rbind, because it is very inefficient (see the second circle of The R Inferno for details). In your case, you can use sapply to compute each mean directly:
index$mean <- sapply(1:nrow(index), function(i) mean(df[df$datetime >= index$start[i] &
df$datetime <= index$end[i],2]))
index
# start end mean
# 1 2012-12-22 19:53:00 2012-12-22 21:06:00 9.563336
# 2 2012-12-22 23:05:00 2012-12-22 23:58:00 NaN
I figured out how to do it with a for loop, but if anyone has a more efficient solution, that would be great. The for loop solution:
d <- data.frame()
for (i in 1:nrow(index)) {
  d <- rbind(d, mean(subset(df, datetime >= index[i, 1] &
                              datetime <= index[i, 2])[, 2]))
}
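With years of data and a hundred ranges, another option is to sort the timestamps once and locate each range's first and last row with findInterval() (a binary search) instead of scanning the whole data frame for every range. This is a base-R sketch with a hypothetical helper range_means(); it assumes there are no NA timestamps, and the data below are a deterministic stand-in for the question's rnorm() values so the result can be checked.

```r
# Mean of `values` whose `times` fall in [start[i], end[i]], for each range i.
range_means <- function(times, values, start, end) {
  o <- order(times)
  tt <- as.numeric(times[o])   # seconds since the epoch, sorted
  v <- values[o]
  # lo: first row with time >= start (count of times strictly before start, + 1)
  lo <- findInterval(as.numeric(start), tt, left.open = TRUE) + 1
  # hi: last row with time <= end
  hi <- findInterval(as.numeric(end), tt)
  vapply(seq_along(lo), function(i) {
    if (hi[i] < lo[i]) NaN else mean(v[lo[i]:hi[i]])
  }, numeric(1))
}

datetime <- as.POSIXct("2012-12-22 00:00:00", tz = "GMT") + 3600 * 0:24
data <- 0:24   # deterministic stand-in: value equals the hour offset
df <- data.frame(datetime, data)

start <- as.POSIXct(c("2012-12-22 19:53", "2012-12-22 23:05"), tz = "GMT")
end <- as.POSIXct(c("2012-12-22 21:06", "2012-12-22 23:58"), tz = "GMT")
index <- data.frame(start, end)

index$mean <- range_means(df$datetime, df$data, index$start, index$end)
print(index)
```

As in the sapply answer, a range that contains no observations yields NaN.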

Remove incomplete month from monthly return calculation

I have some code for grabbing stock prices and calculating monthly returns. I would like to drop the last return if the price used to calculate it did not occur at month end. For example, running the code below returns prices through 2014-06-13, yet monthlyReturn still calculates a return for June even though the month isn't complete. Is there an easy way to make sure monthlyReturn only computes returns on full months, or to drop the last month from the return vector if it wasn't calculated on a full month of prices?
library(quantmod)
symbols <- c('XLY', 'XLP', 'XLE', 'XLF', 'XLV', 'XLI', 'XLB', 'XLK', 'XLU')
Stock <- xts()
Prices <- xts()
for (i in 1:length(symbols)){
Stock <- getSymbols(symbols[i],auto.assign = FALSE)
Prices <- merge(Prices,Stock[,6])
}
returns <- do.call(cbind, lapply(Prices, monthlyReturn, leading=FALSE))
names(returns) <- symbols
I found this bit of code, but it seems to have some limitations. Is there a way to improve this?
if(tail(index(x.xts),1) != as.Date(as.yearmon(tail(index(x.xts),1)), frac=1)){
x.m.xts = x.m.xts[-dim(x.m.xts)[1],]
}
# That test isn't quite right, but it's close. It won't work on the first
# day of a new month when the last business day wasn't the last day of
# the month. It will work for the second day.
You can use negative subsetting with xts:::last.xts. This removes the last month:
last(returns, "-1 months")
But you only want to remove the last month if it hasn't ended yet, so compare the month of the last row with the month of the current date:
if (format(end(returns), "%Y%m") == format(Sys.Date(), "%Y%m"))
returns <- last(returns, "-1 month")
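The same rule can be sketched in base R as a small function that takes the reference date as a parameter, which makes it easy to check without waiting for a month to end. The helper name drop_incomplete_month() and the data-frame layout (a date column plus return columns) are assumptions for illustration, and the return values are made up; like the rule above, it assumes the price data runs up to the as-of date.

```r
# Drop the final row when it belongs to the (unfinished) month of `as_of`.
drop_incomplete_month <- function(returns, as_of = Sys.Date()) {
  last_row <- which.max(returns$date)
  if (format(returns$date[last_row], "%Y%m") == format(as_of, "%Y%m")) {
    returns[-last_row, , drop = FALSE]
  } else {
    returns
  }
}

# Made-up returns; the June row was computed from prices through 2014-06-13.
returns <- data.frame(
  date = as.Date(c("2014-04-30", "2014-05-30", "2014-06-13")),
  XLY = c(-0.031, 0.042, 0.008)
)

# June is still running on 2014-06-16, so its partial return is dropped.
trimmed <- drop_incomplete_month(returns, as_of = as.Date("2014-06-16"))
# If the data stop at May, nothing is dropped.
kept <- drop_incomplete_month(returns[1:2, ], as_of = as.Date("2014-06-16"))
```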
