I have a data set that will be used for time series analysis. The date column is currently structured as follows:
> head(cam_shiller)
div stock dates
1 0.495 7.09 1933m1
2 0.490 6.25 1933m2
3 0.485 6.23 1933m3
4 0.480 6.89 1933m4
5 0.475 8.87 1933m5
6 0.470 10.39 1933m6
If I'm not mistaken, monthly data for time series should look like this: yyyy-mm. So I'm trying to make my date column look like this:
div stock dates
1 0.495 7.09 1933-01
2 0.490 6.25 1933-02
3 0.485 6.23 1933-03
4 0.480 6.89 1933-04
5 0.475 8.87 1933-05
6 0.470 10.39 1933-06
However, using the as.yearmon function produces a column full of NAs. I tried removing the 'm' and replacing it with a dash, and then running as.yearmon again. Now the results look like this:
div stock dates
1 0.495 7.09 Jan 1933
2 0.490 6.25 Feb 1933
3 0.485 6.23 Mar 1933
4 0.480 6.89 Apr 1933
5 0.475 8.87 May 1933
6 0.470 10.39 Jun 1933
How do I change the dates into the yyyy-mm format?
library(zoo)
cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')
cam_shiller$dates = gsub('m', '-', cam_shiller$dates)
cam_shiller$dates = as.yearmon(cam_shiller$dates)
Actually, for ts you only need to specify start= and frequency=; the dates column itself is not needed.
res <- ts(cam_shiller[, -3], start=1933, frequency=12)
res
# div stock
# Jan 1933 0.4950 7.09
# Feb 1933 0.4900 6.25
# Mar 1933 0.4850 6.23
# Apr 1933 0.4800 6.89
# May 1933 0.4750 8.87
# Jun 1933 0.4700 10.39
# Jul 1933 0.4650 11.23
# Aug 1933 0.4600 10.67
# Sep 1933 0.4550 10.58
# Oct 1933 0.4500 9.55
# Nov 1933 0.4450 9.78
# Dec 1933 0.4400 9.97
# Jan 1934 0.4408 10.54
# Feb 1934 0.4417 11.32
# Mar 1934 0.4425 10.74
# Apr 1934 0.4433 10.92
# May 1934 0.4442 9.81
# Jun 1934 0.4450 9.94
# Jul 1934 0.4458 9.47
# Aug 1934 0.4467 9.10
# Sep 1934 0.4475 8.88
# Oct 1934 0.4483 8.95
# Nov 1934 0.4492 9.20
# Dec 1934 0.4500 9.26
# ...
Or
ts(cam_shiller$stock, start=c(1933, 1), frequency=12)
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1933 7.09 6.25 6.23 6.89 8.87 10.39 11.23 10.67 10.58 9.55 9.78 9.97
# 1934 10.54 11.32 10.74 10.92 9.81 9.94 9.47 9.10 8.88 8.95 9.20 9.26
# 1935 9.26 8.98 8.41 9.04 9.75 10.12 10.65 11.37 11.61 11.92 13.04 13.04
# ...
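The start= value can also be derived from the first entry of the dates column rather than typed by hand. A minimal sketch, assuming the dates are strings of the form 1933m1 as in the question and that the rows are sorted and complete; the small data frame d is only a stand-in for cam_shiller:

```r
# Stand-in for the first rows of cam_shiller (values copied from the question)
d <- data.frame(
  stock = c(7.09, 6.25, 6.23, 6.89, 8.87, 10.39),
  dates = c("1933m1", "1933m2", "1933m3", "1933m4", "1933m5", "1933m6")
)

# Split "1933m1" into year and month and convert both to integers
first <- as.integer(strsplit(d$dates[1], "m", fixed = TRUE)[[1]])

# first[1] is the year, first[2] the month; frequency = 12 means monthly
stock_ts <- ts(d$stock, start = first, frequency = 12)

start(stock_ts)  # year and month of the first observation
time(stock_ts)   # numeric index: year + (month - 1) / 12
```

This avoids a mismatch between a hard-coded start and the data when the file is later updated to begin in a different month.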
It may be wise to check beforehand that there are no gaps in the data by evaluating the column and row variances of years and month matrices:
test <- do.call(rbind, strsplit(cam_shiller$dates, 'm')) |>
type.convert(as.is=TRUE)
matrixStats::colVars(matrix(test[, 1], 12))
# [1] 0 0 ...
matrixStats::rowVars(matrix(test[, 2], 12))
# [1] 0 0 0 0 0 0 0 0 0 0 0 0
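An alternative check that needs no extra package, and that also works when the series does not cover whole calendar years, is to put every date on one continuous month scale and look at the steps. This is only a sketch; the dates vector below is made up for illustration and deliberately skips 1934m3:

```r
# Hypothetical input with one missing month (1934m3)
dates <- c("1933m11", "1933m12", "1934m1", "1934m2", "1934m4")

# Parse into a two-column integer matrix: year, month
ym <- do.call(rbind, lapply(strsplit(dates, "m", fixed = TRUE), as.integer))

# Count months on one continuous scale, so Dec -> Jan is a step of 1
month_index <- ym[, 1] * 12L + ym[, 2]

# Every consecutive step should be exactly 1 month
steps <- diff(month_index)
has_gaps <- any(steps != 1L)
has_gaps

# Entries that are followed by a gap (or by a duplicate / out-of-order month)
dates[which(steps != 1L)]
```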
If you use xts::xts, it is rather picky, since it wants a time-based class such as "Date" or "POSIXct" as the index. So you need whole dates, i.e. paste on a 01 as a pseudo day.
res <- transform(cam_shiller, dates=strptime(paste(dates, '01'), format='%Ym%m %d')) |>
{\(.) xts::as.xts(.[1:2], .$dates)}()
head(res)
# div stock
# 1933-01-01 0.495 7.09
# 1933-02-01 0.490 6.25
# 1933-03-01 0.485 6.23
# 1933-04-01 0.480 6.89
# 1933-05-01 0.475 8.87
# 1933-06-01 0.470 10.39
class(res)
# [1] "xts" "zoo"
Data:
cam_shiller <- structure(list(div = c(0.495, 0.49, 0.485, 0.48, 0.475, 0.47,
0.465, 0.46, 0.455, 0.45, 0.445, 0.44, 0.4408, 0.4417, 0.4425,
0.4433, 0.4442, 0.445, 0.4458, 0.4467, 0.4475, 0.4483, 0.4492,
0.45), stock = c(7.09, 6.25, 6.23, 6.89, 8.87, 10.39, 11.23,
10.67, 10.58, 9.55, 9.78, 9.97, 10.54, 11.32, 10.74, 10.92, 9.81,
9.94, 9.47, 9.1, 8.88, 8.95, 9.2, 9.26), dates = c("1933m1",
"1933m2", "1933m3", "1933m4", "1933m5", "1933m6", "1933m7", "1933m8",
"1933m9", "1933m10", "1933m11", "1933m12", "1934m1", "1934m2",
"1934m3", "1934m4", "1934m5", "1934m6", "1934m7", "1934m8", "1934m9",
"1934m10", "1934m11", "1934m12")), row.names = c(NA, 24L), class = "data.frame")
Try lubridate::ym to parse the dates, then strftime to format them as yyyy-mm:
library(tidyverse)
cam_shiller = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv')
cam_shiller %>%
mutate(
date = lubridate::ym(dates),
date = strftime(date, "%Y-%m")
) %>%
head()
#> div stock dates date
#> 1 0.495 7.09 1933m1 1933-01
#> 2 0.490 6.25 1933m2 1933-02
#> 3 0.485 6.23 1933m3 1933-03
#> 4 0.480 6.89 1933m4 1933-04
#> 5 0.475 8.87 1933m5 1933-05
#> 6 0.470 10.39 1933m6 1933-06
Created on 2022-10-01 with reprex v2.0.2
The form in the question is already correct. It is not true that you need to change it. It renders as Jan 1933, etc. but internally it is represented as year+(month-1)/12 (where month is a number 1, 2, ..., 12) which is exactly what you need for analysis. You do not want a character string of the form yyyy-mm for analysis.
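To make that representation concrete, here is a small base R sketch of the arithmetic only. It does not use zoo, and the variable names are just for illustration:

```r
year  <- 1933
month <- 3

# Numeric value that stands for March 1933: year + (month - 1) / 12
x <- year + (month - 1) / 12
x

# Going back: recover year and month from the number
year_back  <- floor(x)
month_back <- round((x - year_back) * 12) + 1

# Build a yyyy-mm label, for display only
label <- sprintf("%d-%02d", as.integer(year_back), as.integer(month_back))
label
```

Because the value is a plain number, consecutive months are always 1/12 apart, which is what makes differences, lags and regular-spacing checks straightforward.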
If by "time series" you mean a zoo series then using u defined in the Note at the end, z below gives that with a yearmon index. The index argument to read.csv.zoo gives the column number or name of the index, the FUN argument tells it how to convert it and the format argument tells it the precise form of the dates.
If what you mean by time series is that you want a ts series then tt below gives that.
If what you mean is a data frame with a yearmon column then DF below gives that.
With either a zoo series or a ts series one could perform a variety of analyses. For example, acf(z) or acf(tt) would give the autocorrelation function.
For more information see ?read.csv.zoo. There is also an entire vignette on read.zoo and its variants. The vignettes are linked from the CRAN home page for zoo. Also see ?strptime for the percent codes.
library(zoo)
# zoo series with yearmon index
z <- read.csv.zoo(u, index = 3, FUN = as.yearmon, format = "%Ym%m")
# ts series
tt <- as.ts(z)
# data frame with yearmon column
DF <- u |>
read.csv() |>
transform(dates = as.yearmon(dates, "%Ym%m"))
A character string of the form yyyy-mm is not a suitable form for most analyses, but if you really want that anyway, then:
# zoo series with yyyy-mm character string index
z2 <- aggregate(z, format(index(z), "%Y-%m"), c)
# data.frame with yyyy-mm character string column
DF2 <- transform(DF, dates = format(dates, "%Y-%m"))
Note
u <- "https://raw.githubusercontent.com/bandcar/Examples/main/cam_shiller.csv"
I have this large xts, aggregated monthly with apply.monthly function.
2011-07-31 269.8
2011-08-31 251.0
2011-09-30 201.8
2011-10-31 95.8
2011-11-30 NA
2011-12-31 49.3
2012-01-31 77.1
...
What I want is to calculate the average of Jan-Dec months for all the period. Something like this, but in xts form:
01 541.8
02 23.0
03 34.8
04 12.8
05 21.8
06 44.8
07 22.8
08 55.0
09 287.8
10 15.8
11 113
12 419.3
I want to avoid using dplyr functions like group_by. I think there must be a solution using split and lapply / do.call.
I tried splitting the xts by year
xtsobject <- split(xtsobject, f = "years")
but then I don't know how to use lapply properly to calculate the 12 averages (Jan-Dec) over the whole period.
The question "Group by period.apply() in xts" is similar, but in my xts I don't have (or want) a new column. I think it can be done using the xts index.
Assuming the input data x, shown reproducibly in the Note at the end, use aggregate.zoo like this:
ag <- aggregate(x, cycle(as.yearmon(time(x))), mean, na.rm = TRUE)
ag
giving the following zoo series:
1 77.1
7 269.8
8 251.0
9 201.8
10 95.8
11 NaN
12 49.3
We could plot it like this:
plot(ag, type = "h")
Note
Lines <- "2011-07-31 269.8
2011-08-31 251.0
2011-09-30 201.8
2011-10-31 95.8
2011-11-30 NA
2011-12-31 49.3
2012-01-31 77.1"
library(xts)
z <- read.zoo(text = Lines)
x <- as.xts(z)
You can use the base::months function to extract the month before calculating the mean:
do.call(rbind, lapply(split(x, base::months(index(x))), mean, na.rm=TRUE))
output:
[,1]
April 165.1600
August 290.2444
December 106.8200
February 82.6300
January 62.9100
July 264.9889
June 246.4889
March 100.5500
May 246.3333
November 116.6400
October 151.3667
September 158.5667
It seems the index is a number and not a POSIXct object. You can convert it, use format to extract the months, and pass the result to tapply as the grouping variable:
tapply(xtsobject[, 1], format(as.POSIXct(zoo::index(xtsobject),
origin = '1970-01-01'), '%m'), mean, na.rm = TRUE)
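The grouping idea can be tried out on plain vectors first. This is a minimal sketch in base R: the first seven values are from the question, the last two are made-up values added only so that some months occur in more than one year.

```r
# Month-end dates and values; the last two entries are hypothetical
dates  <- as.Date(c("2011-07-31", "2011-08-31", "2011-09-30", "2011-10-31",
                    "2011-11-30", "2011-12-31", "2012-01-31",
                    "2012-07-31", "2012-08-31"))
values <- c(269.8, 251.0, 201.8, 95.8, NA, 49.3, 77.1, 100.2, 149.0)

# Two-digit month numbers sort in calendar order, unlike month names
month <- format(dates, "%m")

# Mean per calendar month across all years, ignoring NA
monthly_mean <- tapply(values, month, mean, na.rm = TRUE)
monthly_mean
```

A month whose values are all NA comes out as NaN, as in the aggregate.zoo answer. For an xts object the same call should work with the values taken from coredata() and the dates from index(), assuming the index is a Date or POSIXct.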
I am getting an "Error: Don't know how to add o to a plot" after converting my dataset to ts format. I am working on a forecasting project and I have this data set which I converted from xts to ts:
library(tidyverse)
library(TTR)
library(forecast)
[,1]
1998-12-31 0.025
1999-12-31 0.038
2000-12-31 0.086
2001-12-31 0.142
2002-12-31 0.190
2003-12-31 0.273
2004-12-31 0.394
2005-12-31 0.406
2006-12-31 0.483
2007-12-31 0.612
2008-12-31 0.746
2009-12-31 0.823
2010-12-31 0.930
2011-12-31 0.987
2012-12-31 1.064
2013-12-31 1.100
2014-12-31 1.160
2015-12-31 1.152
2016-12-31 1.204
#convert to ts
df.ts <- ts(df.xts, start = c(1998,1), end = c(2016,1), frequency = 1)
#select training period
train <- window(df.ts, end = c(2013, 1))
#simple naive forecasting model
fit5 <- train %>%
snaive(h = h)
I am getting the error once I try overlaying my forecast to the original dataset through autoplot:
autoplot(df.ts) + autoplot(fit5)
Error: Don't know how to add o to a plot
I tried changing the start and end values but I still get the same error. Am I missing something here? I'd appreciate if anyone could help me out since I am just starting to work with ts format.
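A likely cause is that autoplot(df.ts) + autoplot(fit5) tries to add one complete plot to another, whereas ggplot2 can only add layers to a plot. The forecast package provides autolayer() for that purpose. A minimal sketch under some assumptions: h = 3 is an assumed horizon (the question never defines h), and naive() is used because with frequency = 1 a seasonal naive forecast reduces to the plain naive one:

```r
library(forecast)  # naive(), plus autoplot()/autolayer() methods for ts and forecasts
library(ggplot2)

# Annual values copied from the question
y <- ts(c(0.025, 0.038, 0.086, 0.142, 0.190, 0.273, 0.394, 0.406, 0.483,
          0.612, 0.746, 0.823, 0.930, 0.987, 1.064, 1.100, 1.160, 1.152,
          1.204),
        start = 1998, frequency = 1)

# Training period up to and including 2013
train <- window(y, end = 2013)

h <- 3  # assumed forecast horizon; not given in the question

fit <- naive(train, h = h)

# One base plot of the full series, with the forecast added as a layer
p <- autoplot(y) + autolayer(fit, series = "Naive forecast")
p
```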
I have the following data that I am trying to plot with dygraphs in R:
ts.rmean ts.rmax
0001-01-01 3.163478 5.86
0002-01-01 3.095909 4.67
0003-01-01 3.112000 6.01
0004-01-01 2.922800 5.44
0005-01-01 2.981154 5.21
0006-01-01 3.089167 5.26
0007-01-01 3.168000 6.28
0008-01-01 3.040400 5.00
0009-01-01 2.809130 6.04
0010-01-01 3.002174 4.64
0011-01-01 3.002000 4.93
0012-01-01 3.081250 5.28
0013-01-01 2.687083 4.62
Each line represents a daily value between 01 Jan and 31 Dec for ts.rmean and ts.rmax. Since I have not specified the date, the x-axis of the plot shows the index of each line from 1 to 366. Is it possible to modify the data so that the x-axis shows Month-Day?
You could do something like this:
library(dygraphs)
library(xts)
#convert the rownames of your data frame to a year-month-day,
#used 2012 because it has 366 days and subsetted to fit the example
rownames(data)<-strptime(paste0("2012-",1:366),format="%Y-%j")[1:nrow(data)]
#transform to xts
data<-as.xts(data)
#plot
dygraph(data)
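The day-of-year mapping used above can be checked on its own in base R. A small sketch, where 2012 is just an arbitrary leap year (as in the answer) so that 29 Feb is included:

```r
# Day-of-year 1..366 mapped onto a leap year
day_of_year <- 1:366
dates <- as.Date(day_of_year - 1, origin = "2012-01-01")

# Month-Day labels, e.g. for printing or for custom axis labels
labels <- format(dates, "%m-%d")

head(labels, 3)
labels[c(60, 366)]
```

The dates vector (cut to the number of rows in the data) could also be passed as the order.by argument of xts::xts() instead of going through row names.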
This is my first question on this forum.
I would like to restructure my dataset.
I would like to split the column "Teams" into two columns, one with the home team and another with the away team.
I would also like to split the result into two columns, Homegoals and Awaygoals. The new columns should not have a zero in front of the actual goals scored.
BEFORE
Date Time Teams Results Homewin Draw Awaywin
18 May 19:45 AC Milan - Sassuolo 02:01 1.26 6.22 10.47
18 May 19:45 Chievo - Inter 02:01 3.73 3.42 2.05
18 May 19:45 Fiorentina - Torino 02:02 2.84 3.58 2.39
AFTER
Date Time Hometeam Awayteam Homegoals Awaygoals Homewin Draw Awaywin
18 May 19:45 AC Milan Sassuolo 2 1 1.26 6.22 10.47
18 May 19:45 Chievo Inter 2 1 3.73 3.42 2.05
18 May 19:45 Fiorentina Torino 2 2 2.84 3.58 2.39
Can R fix this problem for me? Which packages do I need?
I want to be able to do this for many Excel spreadsheets with different leagues and divisions but all with the same structure.
Can someone help me and my data.frame?
tidyr solution:
library(tidyr)
separate(your.data.frame, Teams, c('Home', 'Away'), sep = " - ")
Base R solution (following this answer):
df <- data.frame(do.call(rbind, strsplit(as.character(your.df$Teams), " - ")))
names(df) <- c("Home", "Away")
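The same base R idea extends to the Results column. A minimal sketch, using three rows copied from the question and assuming that no team name itself contains " - ":

```r
# Three rows copied from the question's BEFORE table
df <- data.frame(
  Teams   = c("AC Milan - Sassuolo", "Chievo - Inter", "Fiorentina - Torino"),
  Results = c("02:01", "02:01", "02:02"),
  stringsAsFactors = FALSE
)

# Split "Home - Away" on the literal separator " - "
teams <- do.call(rbind, strsplit(df$Teams, " - ", fixed = TRUE))

# Split "02:01" on ":"; as.integer() drops the leading zero
goals <- do.call(rbind, lapply(strsplit(df$Results, ":", fixed = TRUE), as.integer))

out <- data.frame(
  Hometeam  = teams[, 1],
  Awayteam  = teams[, 2],
  Homegoals = goals[, 1],
  Awaygoals = goals[, 2],
  stringsAsFactors = FALSE
)
out
```

The remaining columns (Date, Time and the odds) can be attached with cbind(), and wrapping the steps in a function makes it easy to apply them to each spreadsheet in turn.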
Here's an approach that uses cSplit from the splitstackshape package, which uses and returns a data.table. Presuming your original data frame is named df,
library(splitstackshape)
library(data.table)  # setnames() comes from data.table
setnames(
cSplit(df, 3:4, c(" - ", ":"))[, c(1:2, 6:9, 3:5), with = FALSE],
3:6,
paste0(c("Home", "Away"), rep(c("Team", "Goals"), each = 2))
)[]
# Date Time HomeTeam AwayTeam HomeGoals AwayGoals Homewin Draw Awaywin
# 1: 18 May 19:45 AC Milan Sassuolo 2 1 1.26 6.22 10.47
# 2: 18 May 19:45 Chievo Inter 2 1 3.73 3.42 2.05
# 3: 18 May 19:45 Fiorentina Torino 2 2 2.84 3.58 2.39