I am new to R and need some help. I have this csv file shorturl.at/chDK9 with the "All Share Index" from the Nigerian stock exchange, formatted as a matrix, with the years as rows and the months as columns.
I am trying to do 4 things:
1. Reshape the data to four columns: Date, Month, Year, ASI
2. Make the period a date column in the format 01-2013 for January 2013, and so on
3. Arrange the data by date, oldest to newest
4. Convert the data to a time-series type for analysis (xts preferably)
So far I have solved 1 & 2 above.
Please see my code below
rASI <- ASI_conv_to_USD_2003_2018
gathered.rASI <- gather(rASI, Month, ASI, -Year)
gathered.rASI$Date <- format(as.Date(paste0(gathered.rASI$Month, gathered.rASI$Year, "01"), format="%b%Y%d"), "%m-%Y")
ASI <- select(gathered.rASI, Date, Month, Year, ASI)
Created on 2020-06-04 by the reprex package (v0.3.0)
I do not know what I am doing wrong, but the date column still shows as a chr. How do I make the date column function as a proper date?
Any help would be greatly appreciated.
Data:
Year,January,February,March,April,May,June,July,August,September,October,November,December
2003,104.904946,108.036674,106.6532671,106.1211644,110.6369777,114.3109402,109.7382693,120.7042254,129.0513061,141.9747008,140.2999274,147.4647619
2004,168.4931751,184.3675093,171.8948949,194.2243976,209.6846881,218.4302457,204.5201028,179.6591854,171.788925,176.3957704,175.7856172,180.1624481
2005,174.3600786,165.874575,156.2704949,165.9616111,162.3373385,162.9130468,165.5409489,177.6973735,190.975969,200.5254592,189.5253288,187.4381323
2006,184.2754864,187.0039216,184.1151874,183.9374803,195.3248086,207.753217,220.2425152,261.5902624,257.3486166,257.9713924,257.9644269,262.3660079
2007,290.763576,321.9563671,344.0977116,373.70341,397.1224052,408.8450816,422.9554882,404.1068702,405.3995157,413.592025,462.4500768,498.6259673
2008,465.9093801,564.6059512,542.1712123,511.539673,507.3090565,481.7790407,457.4977173,411.7628813,398.2089436,312.9651073,284.4105236,240.5413384
2009,151.4739254,160.8334365,136.7210055,147.8068088,203.1480164,183.6687179,169.4245226,152.975866,150.2860646,146.6946313,142.143901,141.1054878
2010,152.3225241,155.1887111,175.6850474,178.6050908,176.5795117,171.5144595,174.5049291,163.103972,154.3394041,169.2037838,167.046543,166.6141118
2011,179.0501835,173.3762495,163.0327771,164.2939247,168.9634855,165.1146804,158.8889704,141.5247531,132.2063595,139.799399,128.3830306,132.7185019
2012,133.3492814,129.4949163,132.8047714,142.0467784,142.1346216,138.9576042,148.4574482,152.9350934,167.5144256,170.2584385,170.6456267,180.8386037
2013,205.1867431,213.04438,216.0144928,215.3981965,243.4601263,232.9424155,244.1989566,233.469857,235.6526892,242.2584675,250.74636,266.2963273
2014,261.3308857,254.8076651,249.6006828,247.9683695,267.1803131,273.6744186,271.1943568,267.5533724,265.4434783,241.8539225,209.9881459,206.9083582
2015,176.4899701,152.4243544,161.5512468,176.6316031,174.6074809,170.307101,153.589313,151.067888,158.9094935,148.4871247,140.5468193,145.7620865
2016,121.710687,125.041883,128.7848346,127.5440712,140.7794402,104.7709381,89.631776,90.34052373,92.97916325,89.3927422,83.19668309,88.25819376
2017,85.43474979,83.04616393,83.42762792,84.35732766,96.74749098,108.4396857,120.8084876,116.2751597,116.1014906,120.1450704,124.20491,125.1822913
2018,145.2937418,141.8812705,136.0134688,135.2162844,124.7488623,125.4006552,121.2306533,114.014232,107.1321563,106.06426,100.7971596,102.5464927
One way to do this is to gather your data (i.e., reshape it from wide to long), create a date variable, and only then convert the result to xts.
## This assumes that you already have the data frame loaded (as in your example)
library(dplyr)
myxts <- ASI_conv_to_USD_2003_2018 %>%
## gather changes the data from wide to long
tidyr::gather("month","value",-Year) %>%
## dmy creates the date variable (a real Date, first day of each month)
mutate(dat = paste0("01 ",month," ",Year) %>% lubridate::dmy()) %>%
## keep only the date and the value
select(dat, value) %>%
## sort by date (not compulsory)
arrange(dat) %>%
## convert to xts (note that tidyquant::as_xts() is deprecated)
timetk::tk_xts(select=value,date_var=dat)
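The reason the Date column in the question shows as chr is that format() always returns a character vector, so the "%m-%Y" string can only serve as a display label. As a rough sketch of the same idea without timetk, the version below keeps a real Date column next to the label and builds the xts object with xts::xts(). The names wide, long, Period and asi_xts are made up for illustration, the small data frame is a stand-in holding the first three months of 2003 and 2004 from the question, and English month names are assumed for the locale.

```r
library(tidyr)
library(xts)

# Stand-in for ASI_conv_to_USD_2003_2018: first three months of 2003 and 2004
wide <- data.frame(
  Year     = c(2003, 2004),
  January  = c(104.904946, 168.4931751),
  February = c(108.036674, 184.3675093),
  March    = c(106.6532671, 171.8948949)
)

# Wide to long: one row per Year/Month pair
long <- pivot_longer(wide, -Year, names_to = "Month", values_to = "ASI")

# A real Date (first day of each month); %B matches month names on input
# and depends on the locale (English assumed here)
long$Date <- as.Date(paste(1, long$Month, long$Year), format = "%d %B %Y")

# format() returns character, so this column is only a display label
long$Period <- format(long$Date, "%m-%Y")

# Sort oldest to newest, then build the xts object indexed by the Date column
long <- long[order(long$Date), ]
asi_xts <- xts(long$ASI, order.by = long$Date)
colnames(asi_xts) <- "ASI"
```

Sorting and time-series conversion both rely on the Date column; the "01-2003"-style label is kept only for presentation.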
I have a data frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
The month part is written in the form "M01", "M02", and so on.
Now I want to convert this column to a date format. Is there a way to do it in R with lubridate?
I also want to select columns that contain one particular month from each year, for example only the March columns from all these years. What is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
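The question also asks for one month across every year, not just a single year-month. A minimal base R sketch of that, assuming the Option 2 approach of treating each value as the first day of its month, could look like the following. The data frame is a six-row subset of the one in the question, and march is a made-up name.

```r
# Six rows taken from the data in the question
df <- data.frame(
  month  = c("2012M01", "2012M02", "2012M03", "2013M01", "2013M02", "2013M03"),
  values = c(99904, 99616, 99530, 98533, 97600, 96431)
)

# Convert to Date by appending the first day of the month
df$date <- as.Date(paste0(df$month, "01"), format = "%YM%m%d")

# Keep only the March rows from every year; with lubridate loaded,
# lubridate::month(df$date) == 3 is an equivalent condition
march <- df[format(df$date, "%m") == "03", ]
march
```

Comparing on the two-digit month string avoids any dependence on locale-specific month names.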
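Something like this will get it:
raw <- "2012M01"
# strptime() needs a day whenever a month is given, so append "01" explicitly
dt <- strptime(paste0(raw, "01"), format = "%YM%m%d")
dt will be a POSIXlt object. Note that strptime does not fill in a default day of the month: when the format specifies a month, the day has to be supplied as well (otherwise the result is NA), which is why "01" is appended before parsing.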