Grouping an xts by month - r

I have an xts object which I have split into months using split(data, "months"). This splits my data into the months of each year. I want to group my data by month regardless of the year. Thanks.

require(xts)       # index() returns the time index of an xts object
require(lubridate) # month() extracts the month number (1-12) from a date
# split on the month of each timestamp, ignoring the year
split(data, month(index(data)))
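A minimal sketch of the same idea using only xts and base R, assuming a daily series. The series `x` and the names `mon` and `by_month` are made up for illustration; the row positions are grouped by month number and each group is then used to subset the xts object.

```r
library(xts)

# Hypothetical daily series spanning two years (2020 is a leap year)
dates <- seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

# Month number ("01".."12") of every observation, ignoring the year
mon <- format(index(x), "%m")

# Group row positions by month, then subset the xts object with each group
by_month <- lapply(split(seq_len(nrow(x)), mon), function(i) x[i, ])

names(by_month)         # one element per calendar month
nrow(by_month[["01"]])  # both Januaries together: 31 + 31 = 62 rows
```

Each list element is still an xts object, so per-month statistics can be computed with sapply(by_month, mean) or similar.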

Related

Sort date column and convert csv file to a time series

I need your help. I am new to R, I have this csv file shorturl.at/chDK9 with the "All Share Index" from the Nigerian stock exchange, formatted in a matrix, with the months as rows and the years as columns.
I am trying to do 4 things:
1. Reshape the data to four columns: Date, Month, Year, ASI.
2. The period should be a date column in the format 01-2013 for January 2013 and so on.
3. Arrange the data by date, oldest to newest.
4. Convert the data to a time-series type for analysis (xts preferably).
So far I have solved 1 & 2 above.
Please see my code below
rASI <- ASI_conv_to_USD_2003_2018
gathered.rASI <- gather(rASI, Month, ASI, -Year)
gathered.rASI$Date <- format(as.Date(paste0(gathered.rASI$Month, gathered.rASI$Year, "01"), format="%b%Y%d"), "%m-%Y")
ASI <- select(gathered.rASI, Date, Month, Year, ASI)
Created on 2020-06-04 by the reprex package (v0.3.0)
I do not know what I am doing wrong, but the date column still shows as a chr. How do I make the date column function as a proper date?
Any help would be greatly appreciated.
Data:
Year,January,February,March,April,May,June,July,August,September,October,November,December
2003,104.904946,108.036674,106.6532671,106.1211644,110.6369777,114.3109402,109.7382693,120.7042254,129.0513061,141.9747008,140.2999274,147.4647619
2004,168.4931751,184.3675093,171.8948949,194.2243976,209.6846881,218.4302457,204.5201028,179.6591854,171.788925,176.3957704,175.7856172,180.1624481
2005,174.3600786,165.874575,156.2704949,165.9616111,162.3373385,162.9130468,165.5409489,177.6973735,190.975969,200.5254592,189.5253288,187.4381323
2006,184.2754864,187.0039216,184.1151874,183.9374803,195.3248086,207.753217,220.2425152,261.5902624,257.3486166,257.9713924,257.9644269,262.3660079
2007,290.763576,321.9563671,344.0977116,373.70341,397.1224052,408.8450816,422.9554882,404.1068702,405.3995157,413.592025,462.4500768,498.6259673
2008,465.9093801,564.6059512,542.1712123,511.539673,507.3090565,481.7790407,457.4977173,411.7628813,398.2089436,312.9651073,284.4105236,240.5413384
2009,151.4739254,160.8334365,136.7210055,147.8068088,203.1480164,183.6687179,169.4245226,152.975866,150.2860646,146.6946313,142.143901,141.1054878
2010,152.3225241,155.1887111,175.6850474,178.6050908,176.5795117,171.5144595,174.5049291,163.103972,154.3394041,169.2037838,167.046543,166.6141118
2011,179.0501835,173.3762495,163.0327771,164.2939247,168.9634855,165.1146804,158.8889704,141.5247531,132.2063595,139.799399,128.3830306,132.7185019
2012,133.3492814,129.4949163,132.8047714,142.0467784,142.1346216,138.9576042,148.4574482,152.9350934,167.5144256,170.2584385,170.6456267,180.8386037
2013,205.1867431,213.04438,216.0144928,215.3981965,243.4601263,232.9424155,244.1989566,233.469857,235.6526892,242.2584675,250.74636,266.2963273
2014,261.3308857,254.8076651,249.6006828,247.9683695,267.1803131,273.6744186,271.1943568,267.5533724,265.4434783,241.8539225,209.9881459,206.9083582
2015,176.4899701,152.4243544,161.5512468,176.6316031,174.6074809,170.307101,153.589313,151.067888,158.9094935,148.4871247,140.5468193,145.7620865
2016,121.710687,125.041883,128.7848346,127.5440712,140.7794402,104.7709381,89.631776,90.34052373,92.97916325,89.3927422,83.19668309,88.25819376
2017,85.43474979,83.04616393,83.42762792,84.35732766,96.74749098,108.4396857,120.8084876,116.2751597,116.1014906,120.1450704,124.20491,125.1822913
2018,145.2937418,141.8812705,136.0134688,135.2162844,124.7488623,125.4006552,121.2306533,114.014232,107.1321563,106.06426,100.7971596,102.5464927
Here is one way: gather your data (i.e., change it from wide to long), create a date variable, and only then convert the result to xts.
## This assumes the data frame already exists (as in your example)
library(dplyr)  # provides %>%, mutate(), select(), arrange()
myxts <- ASI_conv_to_USD_2003_2018 %>%
  ## gather() changes the data from wide to long
  tidyr::gather("month", "value", -Year) %>%
  ## dmy() parses the pasted string into a Date
  mutate(dat = paste0("01 ", month, " ", Year) %>% lubridate::dmy()) %>%
  ## keep only the date and the value
  select(dat, value) %>%
  ## sort by date (not compulsory)
  arrange(dat) %>%
  ## convert to xts (tk_xts() replaces the deprecated tidyquant::as_xts())
  timetk::tk_xts(select = value, date_var = dat)
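As for why the Date column in the question shows as chr: format() always returns character strings, so the "%m-%Y" text can only serve as a display label. A rough base-R sketch of keeping a real Date alongside it follows. The three-row data frame `long` is a hypothetical extract in the long layout that gather() produces, and matching against `month.name` is just one way to avoid locale-dependent month parsing.

```r
# Hypothetical extract in long format (values taken from the data above)
long <- data.frame(
  Year  = c(2004, 2003, 2003),
  Month = c("January", "February", "January"),
  ASI   = c(168.4931751, 108.036674, 104.904946),
  stringsAsFactors = FALSE
)

# Build a real Date (first day of each month) from the year and month name
month_num <- match(long$Month, month.name)
long$Date <- as.Date(sprintf("%d-%02d-01", as.integer(long$Year), month_num))

# Sort oldest to newest using the Date column
long <- long[order(long$Date), ]

# format() gives a character label; keep it separate from the Date column
long$Label <- format(long$Date, "%m-%Y")

class(long$Date)   # "Date"
class(long$Label)  # "character"
```

The Date column is what xts needs as its index; the label column is only useful for printing.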

Converting variables in form of "2015M01" to date format in R?

I have a data frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
with month that was written in form of "M01", "M02"... and so on.
Now I want to convert this column to date format, is there a way to do it in R with lubridate?
I also want to select columns that contain one certain month from each year, like only March columns from all these years, what is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
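To pick the same month from every year (for example all March rows), one option is to filter on the month part of the converted date. This is a base-R sketch on a cut-down copy of the data from the question; the name `march` is only for illustration.

```r
# Cut-down copy of the data frame from the question
df <- data.frame(
  month  = c("2012M01", "2012M02", "2012M03", "2013M01", "2013M02", "2013M03"),
  values = c(99904, 99616, 99530, 98533, 97600, 96431),
  stringsAsFactors = FALSE
)

# Option 2 from above: assume the first day of each month
df$date <- as.Date(paste0(df$month, "01"), format = "%YM%m%d")

# Keep the rows whose month number is 03, whatever the year
march <- df[format(df$date, "%m") == "03", ]
march
```

The same filter works without any date conversion, e.g. df[grepl("M03$", df$month), ], but the Date column is more convenient for later sorting or plotting.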
Something like this will get it:
raw <- "2012M01"
dt <- strptime(paste0(raw, "01"), format = "%YM%m%d")
dt will be a POSIXlt object. strptime() does not fill in the day when a month is given without one (the result is NA), so "01" is appended to make it a complete date.

monthlyReturn and unequal month length

I have 300+ companies and need to calculate monthly return for them and later use it as one of the variables in my data set.
I downloaded prices from Yahoo and calculated monthly returns using the quantmod package:
require(quantmod)
stockData <- lapply(symbols,function(x) getSymbols(x,auto.assign=FALSE, src='yahoo', from = '2000-01-01'))
stockDataReturn <- lapply(stockData,function(x) monthlyReturn(Ad(x)))
The problem I have is that some companies have different month ends (due to trading halts, etc.) which is reflected in the output list: 2013-12-30 for company AAA and 2013-12-31 for company BBB and the rest of the sample.
When I merge the list using
returns <- do.call(merge.xts, stockDataReturn)
It creates a separate row for 2013-12-30 with all NAs except for AAA company.
How can I resolve this? My understanding is that I need to stick to a month-year format and use it as the index before I merge.
Ideally, what I want is that at the monthlyReturn stage, it uses the beginning of the month date rather than end of the month.
You could use lubridate's floor_date to merge on the same beginning-of-month timestamp rather than the end-of-month timestamp. Or use ceiling_date to round every security to the same end-of-month timestamp before merging.
library(lubridate)
stockDataReturn <- lapply(stockDataReturn,
  function(x) {
    index(x) <- floor_date(index(x), "month")
    # Or, to round to the end of the month instead (ceiling_date() alone
    # returns the first day of the next month, hence the - 1):
    # index(x) <- ceiling_date(index(x), "month") - 1
    x
  })
returns <- do.call(merge, stockDataReturn)
colnames(returns) <- symbols
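The effect of aligning the timestamps can be seen in a small base-R sketch. It uses plain data frames instead of xts objects, and the return figures and the helper `to_month_start` are invented for illustration; for Date values the helper gives the same result as floor_date(x, "month").

```r
# Hypothetical monthly returns for two companies with different month-end dates
aaa <- data.frame(date = as.Date(c("2013-11-29", "2013-12-30")),
                  AAA  = c(0.010, 0.020))
bbb <- data.frame(date = as.Date(c("2013-11-29", "2013-12-31")),
                  BBB  = c(0.030, -0.010))

# Merging on the raw dates leaves separate rows for 12-30 and 12-31
raw_merge <- merge(aaa, bbb, by = "date", all = TRUE)

# Floor every date to the first day of its month before merging
to_month_start <- function(d) as.Date(format(d, "%Y-%m-01"))
aaa$date <- to_month_start(aaa$date)
bbb$date <- to_month_start(bbb$date)

returns <- merge(aaa, bbb, by = "date", all = TRUE)
returns  # one row per month, no NAs
```

Here raw_merge has three rows with NAs in December, while returns has two complete rows.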

Sorting the date column in calendar order in R

Is there a way to sort the date column in R in calendar order, for example beginning from "Jan-16", "Feb-16", "Mar-16", or beginning with the most recent month: "May-16", "Apr-16", "Mar-16"?
Regards,
Mohan
One solution is to add the year, and then convert the vector to the Date class:
# dates
dates <- c("Jan-16", "Feb-16", "Mar-16")
# convert to Date class (note that %d reads the "16" as the day of the month)
dates <- as.Date(paste0("2016-", dates), format="%Y-%b-%d")
# get most recent date
max(dates)
# sort, most recent first
sort(dates, decreasing=TRUE)
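If the "16" in "Jan-16" is meant as the year 2016 rather than a day, a sort key can be built from the month and the two-digit year instead. This is a sketch under that assumption. The vector `labels` and the assumption that every year falls in 2000-2099 are mine, and `month.abb` is used so the result does not depend on the locale.

```r
# Hypothetical month-year labels in arbitrary order
labels <- c("Mar-16", "Jan-16", "May-16", "Feb-16", "Apr-16", "Dec-15")

# Month number from the English abbreviation, year from the two-digit suffix
mon <- match(substr(labels, 1, 3), month.abb)
yr  <- 2000 + as.integer(substr(labels, 5, 6))  # assumes years 2000-2099

# First day of each month as a sortable Date key
key <- as.Date(sprintf("%d-%02d-01", yr, mon))

sorted_asc  <- labels[order(key)]                     # oldest first
sorted_desc <- labels[order(key, decreasing = TRUE)]  # most recent first
```

Sorting the original labels by the key keeps them in their "Mon-yy" form while still ordering them by calendar date.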

how to extract all values from a data frame by month for years

I have data in a zoo data structure. I want to pull all August daily values over 10 years and compute monthly statistics for the period of record. Any thoughts on an easy way to do this?
An example of the specific date format would help, but try format().
For example:
x <- as.POSIXct("2009-08-03 12:01:59.23")
format(x,"%b")
For simplicity, just create a new column with format(), then subset on the month you're looking for.
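A rough sketch of that approach on a plain data frame with made-up daily values. The names `daily`, `august`, and the statistics chosen are assumptions. It uses "%m" rather than "%b" so the comparison does not depend on the locale; for a zoo object, index(z) would take the place of the date column.

```r
# Hypothetical daily data covering ten years
dates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by = "day")
daily <- data.frame(date = dates, value = seq_along(dates))

# New columns holding the month and year of each observation
daily$month <- format(daily$date, "%m")
daily$year  <- format(daily$date, "%Y")

# All August observations across the ten years
august <- daily[daily$month == "08", ]

# Statistics per year and for the whole period of record
aug_mean_by_year <- tapply(august$value, august$year, mean)
aug_overall <- c(mean = mean(august$value), sd = sd(august$value))
```

august has 310 rows here (31 days in each of 10 years), and aug_mean_by_year has one entry per year.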
