Change origin for time series in R

I have a time series that I would like to work with in R, spanning 01-01-52 to 01-01-88 (1952 to 1988), 37 observations.
However, when I read it into R, the observations from 01-01-52 to 01-01-68 are interpreted as being in 2052 and so on, rather than 1952.
How do I force R to read all the data as being from 1952 to 1988?
Link to my data: https://www.dropbox.com/s/93foyc238skt3xj/AgricIndus.csv?dl=0
This is the code I have used. Do you know what I need to do with my code to make it read properly?
library(xts)
agri <- read.table("AgricIndus.csv",
                   sep = ",", header = TRUE, skip = 0,
                   stringsAsFactors = FALSE)
agri$time <- as.Date(agri$time, "%m-%d-%y")
agri.xts <- xts(agri[, 2:3], order.by = agri$time)

One way (a hack) is to insert the century into each raw string and then parse it with a four-digit year (%Y):
agri$time   # the raw strings as read from the CSV, before any conversion
# [1] "01-01-52" "01-01-53" "01-01-54" "01-01-55" "01-01-56" "01-01-57" "01-01-58" "01-01-59" "01-01-60" "01-01-61" "01-01-62" "01-01-63" "01-01-64" "01-01-65"
# [15] "01-01-66" "01-01-67" "01-01-68" "01-01-69" "01-01-70" "01-01-71" "01-01-72" "01-01-73" "01-01-74" "01-01-75" "01-01-76" "01-01-77" "01-01-78" "01-01-79"
# [29] "01-01-80" "01-01-81" "01-01-82" "01-01-83" "01-01-84" "01-01-85" "01-01-86" "01-01-87" "01-01-88"
agri$time <- as.Date(paste0(substring(agri$time, 1, 6), "19", substring(agri$time, 7, 8)),
                     "%m-%d-%Y")

If you can be sure that your time series is regular, then it is probably easiest to generate a regular date sequence like so:
agri$time <- seq.Date(as.Date("1952-01-01"), as.Date("1988-01-01"), by = "year")
Another easy solution, which also works for irregular time series, is to parse the two-digit years 52 to 88 as literal years with format = "%m-%d-%Y" (capital "Y"!) and then add 1900 years:
df$time <- as.POSIXlt(as.Date(df$time,format = '%m-%d-%Y'))
df$time$year <-df$time$year + 1900
df$time <- as.Date(df$time)
df$time
[1] "1952-01-01" "1953-01-01" "1954-01-01" "1955-01-01"
[5] "1956-01-01" "1957-01-01" "1958-01-01" "1959-01-01"
[9] "1960-01-01" "1961-01-01" "1962-01-01" "1963-01-01"
[13] "1964-01-01" "1965-01-01" "1966-01-01" "1967-01-01"
[17] "1968-01-01" "1969-01-01" "1970-01-01" "1971-01-01"
[21] "1972-01-01" "1973-01-01" "1974-01-01" "1975-01-01"
[25] "1976-01-01" "1977-01-01" "1978-01-01" "1979-01-01"
[29] "1980-01-01" "1981-01-01" "1982-01-01" "1983-01-01"
[33] "1984-01-01" "1985-01-01" "1986-01-01" "1987-01-01"
[37] "1988-01-01"
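The root cause is documented in ?strptime: on input, %y prefixes the values 00 to 68 with 20 and the values 69 to 99 with 19. A more general sketch, assuming that no observation can lie in the future, is to parse with %y and then shift any future date back by a century. The helper name fix_century and its cutoff argument are made up for illustration and are not part of either answer:

```r
# Hypothetical helper: parse two-digit years with %y, then move every date
# that falls after `cutoff` back by 100 years.
fix_century <- function(x, format = "%m-%d-%y", cutoff = Sys.Date()) {
  d <- as.POSIXlt(as.Date(x, format = format))
  in_future <- !is.na(d) & as.Date(d) > cutoff
  d$year[in_future] <- d$year[in_future] - 100
  as.Date(d)
}

dates <- fix_century(c("01-01-52", "01-01-68", "01-01-69", "01-01-88"))
format(dates)
```

Unlike pasting in "19", this keeps working if the series later extends into years 00 and beyond, as long as the "no future dates" assumption holds.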

Related

How to sort list.files() in correct date order?

Using plain list.files() in the working directory returns the file list, but the numeric order is messed up.
f <- list.files(pattern = "\\.nc$")  # pattern is a regular expression, not a glob
f
# [1] "te1971-1.nc" "te1971-10.nc" "te1971-11.nc" "te1971-12.nc"
# [5] "te1971-2.nc" "te1971-3.nc" "te1971-4.nc" "te1971-5.nc"
# [9] "te1971-6.nc" "te1971-7.nc" "te1971-8.nc" "te1971-9.nc"
where the number after "-" is the month.
I tried the following to sort it:
myFiles <- paste("te", i, "-", c(1:12), ".nc", sep = "")
mixedsort(myFiles)  # mixedsort() comes from the gtools package
It returns the files in order, but reversed:
[1] "te1971-12.nc" "te1971-11.nc" "te1971-10.nc" "te1971-9.nc"
[5] "te1971-8.nc" "te1971-7.nc" "te1971-6.nc" "te1971-5.nc"
[9] "te1971-4.nc" "te1971-3.nc" "te1971-2.nc" "te1971-1.nc"
How do I fix this?
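The issue is that list.files() returns the names sorted alphabetically, so "te1971-10.nc" comes before "te1971-2.nc". (mixedsort() appears to treat the "-" as a minus sign, which would explain the reversed order.)
You could use gsub to capture the year and the month as groups, append "-1" as the first day of the month, coerce the result with as.Date, and order by that: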
x[order(as.Date(gsub('.*(\\d{4})-(\\d{,2}).*', '\\1-\\2-1', x)))]
# [1] "te1971-1.nc" "te1971-2.nc" "te1971-3.nc" "te1971-4.nc" "te1971-5.nc"
# [6] "te1971-6.nc" "te1971-7.nc" "te1971-8.nc" "te1971-9.nc" "te1971-10.nc"
# [11] "te1971-11.nc" "te1971-12.nc"
Data:
x <- c("te1971-1.nc", "te1971-10.nc", "te1971-11.nc", "te1971-12.nc",
"te1971-2.nc", "te1971-3.nc", "te1971-4.nc", "te1971-5.nc", "te1971-6.nc",
"te1971-7.nc", "te1971-8.nc", "te1971-9.nc")
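As an alternative sketch that avoids building dates at all, the year and month can be extracted as integers and passed to order() directly. This assumes every file name contains a four-digit year followed by "-" and a one- or two-digit month, as in the data above:

```r
x <- c("te1971-1.nc", "te1971-10.nc", "te1971-11.nc", "te1971-12.nc",
       "te1971-2.nc", "te1971-3.nc", "te1971-4.nc", "te1971-5.nc",
       "te1971-6.nc", "te1971-7.nc", "te1971-8.nc", "te1971-9.nc")

# Capture the four-digit year and the one- or two-digit month.
parts <- regmatches(x, regexec("([0-9]{4})-([0-9]{1,2})", x))
year  <- as.integer(vapply(parts, `[`, character(1), 2))
month <- as.integer(vapply(parts, `[`, character(1), 3))

# Order numerically by year first, then by month.
sorted <- x[order(year, month)]
sorted
```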

Create date with R

Hi guys,
I want to build a date in R of the form:
02-06-year
for 15 years.
Here is the code:
library(timeDate)
listHolidays
seq=0:5000
data.iniziale <- as.Date("2015-01-01")
calendario = data.iniziale + seq
l = length(calendario)
for (i in 1:l){
x[i]=as.Date(year(calendario[i]),06,02)
}
It does not work as is.
How can I build that date?
Similar to Albin's solution, but I understood the question slightly differently:
format(seq(as.Date("2015-01-01"), as.Date("2030-01-01"), "year"), "%Y-06-02")
Output:
[1] "2015-06-02" "2016-06-02" "2017-06-02" "2018-06-02" "2019-06-02" "2020-06-02" "2021-06-02"
[8] "2022-06-02" "2023-06-02" "2024-06-02" "2025-06-02" "2026-06-02" "2027-06-02" "2028-06-02"
[15] "2029-06-02" "2030-06-02"
I suggest using some of R's existing functions to make the task easier.
With the seq function you can easily generate a sequence of dates, and format controls how they are printed, as shown below:
format(seq(as.Date("2015-01-01"), as.Date("2030-01-01"), "days"), "%m-%d-%Y")
Output (partly):
[1] "01-01-2015" "01-02-2015" "01-03-2015" "01-04-2015" "01-05-2015" "01-06-2015" "01-07-2015" "01-08-2015"
[9] "01-09-2015" "01-10-2015" "01-11-2015" "01-12-2015" "01-13-2015" "01-14-2015" "01-15-2015" "01-16-2015"
Another possible solution, using lubridate:
library(tidyverse)
library(lubridate)
str_c("2-6-", 2000:2014) %>% dmy
#> [1] "2000-06-02" "2001-06-02" "2002-06-02" "2003-06-02" "2004-06-02"
#> [6] "2005-06-02" "2006-06-02" "2007-06-02" "2008-06-02" "2009-06-02"
#> [11] "2010-06-02" "2011-06-02" "2012-06-02" "2013-06-02" "2014-06-02"
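Note that format() returns character strings. If actual Date objects are wanted, a minimal base R sketch is to build ISO strings and convert them, formatting only for display. The start year 2015 is taken from the question's code, and the variable names are illustrative:

```r
years  <- 2015:2029                             # 15 consecutive years
dates  <- as.Date(sprintf("%d-06-02", years))   # 2 June of each year, as Date
labels <- format(dates, "%d-%m-%Y")             # day-month-year strings for display
labels
```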

Calculate a rolling percent change in R

I am trying to calculate a 20-day rolling percent change in R based off of a stock's closing price. Below is a sample of the most recent 100 days of closing price data. df$Close[1] is the most recent day, df$Close[2] is the previous day, and so on.
df$Close
[1] 342.94 346.22 346.18 335.24 330.45 334.20 325.45 333.79 334.90 341.66 333.74 334.49 329.75 329.82 330.56 322.81 317.87 306.84
[19] 310.39 310.60 324.46 338.03 333.12 341.06 337.25 341.01 345.30 338.69 340.77 342.96 347.56 340.89 327.74 327.64 335.37 338.62
[37] 341.13 335.85 331.62 328.08 329.98 323.57 316.92 312.22 315.81 328.69 324.61 341.88 340.78 339.99 335.34 324.76 328.53 324.54
[55] 323.77 325.45 330.05 329.22 333.64 332.96 326.23 343.01 339.39 339.61 340.65 353.58 352.96 345.96 343.21 357.48 355.70 364.72
[73] 373.06 373.92 376.53 376.51 378.69 378.00 377.57 382.18 376.26 375.28 382.05 379.38 380.66 372.63 364.38 368.39 365.51 363.35
[91] 359.37 355.12 355.45 358.45 366.56 363.18 362.65 359.96 361.13 361.61
Previously, I had used the following code to calculate the percent change:
PercChange(df, Var = 'Close', type = 'percent', NewVar = 'OneMonthChange', slideBy = 20)
which gave me the following output:
df$OneMonthChange
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02 -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02 -1.722569e-02
[10] -3.790530e-03 -3.976292e-02 -1.877438e-02 6.132910e-03 6.653644e-03 -1.434237e-02 -4.668950e-02 -6.818515e-02 -8.637785e-02
[19] -6.401906e-02 -5.327969e-02 -1.672829e-02 4.468894e-02 5.111700e-02 9.237076e-02 6.788892e-02 3.748213e-02 6.373802e-02
[28] -9.330759e-03 -2.934445e-05 8.735551e-03 3.644063e-02 4.966745e-02 -2.404651e-03 9.551981e-03 3.582790e-02 4.046705e-02
[37] 3.357067e-02 2.013851e-02 -6.054430e-03 -1.465642e-02 1.149496e-02 -5.667473e-02 -6.620702e-02 -8.065134e-02 -7.291942e-02
[46] -7.039425e-02 -8.032072e-02 -1.179327e-02 -7.080213e-03 -4.892581e-02 -5.723925e-02 -1.095635e-01 -1.193642e-01 -1.320603e-01
[55] -1.401216e-01 -1.356139e-01 -1.284428e-01 -1.290476e-01 -1.163493e-01 -1.287875e-01 -1.329666e-01 -8.598913e-02 -1.116608e-01
[64] -1.048289e-01 -1.051069e-01 -5.112310e-02 -3.134091e-02 -6.088656e-02 -6.101064e-02 -1.615522e-02 -1.021232e-02 2.703312e-02
[73] 4.954283e-02 4.315804e-02 2.719882e-02 3.670356e-02 4.422997e-02 5.011668e-02 4.552377e-02 5.688449e-02 3.507469e-02
[82] 3.391465e-02 6.444333e-02 8.011616e-02 8.157409e-02 4.583216e-02 1.691226e-02 -1.310009e-02 -6.253229e-03 -2.445900e-02
[91] -2.817816e-02 1.119052e-02 2.662970e-02 4.914242e-02 8.787654e-02 6.454450e-02 5.280729e-02 3.546875e-02 2.567525e-02
[100] 2.392683e-02
The PercChange function has now been deprecated and I need to find a new function to replace it. Essentially, I need a function that calculates the percent change of df$Close[1:20] (This would be Close of day 1 minus close of day 20, divided by close of day 20), then rolls to [2:21] for the next row, then [3:22],[4:23], and so on.
Thanks in advance!
A tidyverse approach
library(tidyr)
library(dplyr)
df %>% mutate(OneMonthChange=(Close-lead(Close, 20))/lead(Close, 20),
OneMonthChange=replace_na(OneMonthChange,0))
Close OneMonthChange
1 342.94 5.695617e-02
2 346.22 2.422862e-02
3 346.18 3.920509e-02
4 335.24 -1.706445e-02
5 330.45 -2.016308e-02
6 334.20 -1.997009e-02
etc...
Here is a simple Base R solution:
PercChange<- function(x, slideBy){
-diff(x, slideBy)/ tail(x, -slideBy)
}
PercChange(df$Close, slideBy = 20)
[1] 5.695617e-02 2.422862e-02 3.920509e-02 -1.706445e-02
[5] -2.016308e-02 -1.997009e-02 -5.748624e-02 -1.446751e-02
[9] -1.722569e-02 -3.790530e-03 -3.976292e-02 -1.877438e-02
If you want a data frame back, modify it as follows:
PercChange<- function(data, Var, NewVar, slideBy){
x <- data[[Var]]
data[NewVar] <- c(-diff(x, slideBy)/ tail(x, -slideBy), numeric(slideBy))
data
}
PercChange(df, Var = 'Close', NewVar = 'OneMonthChange', slideBy = 20)
data:
df <- structure(list(Close = c(342.94, 346.22, 346.18, 335.24, 330.45,
334.2, 325.45, 333.79, 334.9, 341.66, 333.74, 334.49, 329.75,
329.82, 330.56, 322.81, 317.87, 306.84, 310.39, 310.6, 324.46,
338.03, 333.12, 341.06, 337.25, 341.01, 345.3, 338.69, 340.77,
342.96, 347.56, 340.89, 327.74, 327.64, 335.37, 338.62, 341.13,
335.85, 331.62, 328.08, 329.98, 323.57, 316.92, 312.22, 315.81,
328.69, 324.61, 341.88, 340.78, 339.99, 335.34, 324.76, 328.53,
324.54, 323.77, 325.45, 330.05, 329.22, 333.64, 332.96, 326.23,
343.01, 339.39, 339.61, 340.65, 353.58, 352.96, 345.96, 343.21,
357.48, 355.7, 364.72, 373.06, 373.92, 376.53, 376.51, 378.69,
378, 377.57, 382.18, 376.26, 375.28, 382.05, 379.38, 380.66,
372.63, 364.38, 368.39, 365.51, 363.35, 359.37, 355.12, 355.45,
358.45, 366.56, 363.18, 362.65, 359.96, 361.13, 361.61)), class = "data.frame", row.names = c(NA,
-100L))
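Both answers above fill the last slideBy rows, which have no earlier price to compare against, with zeros. If those rows should instead be marked as missing, a base R sketch could pad with NA. The function name rolling_pct_change is made up for illustration, and the vector is assumed to be ordered most recent first, as in the question:

```r
# Percent change of each value relative to the value k positions later
# (i.e. k days earlier when the data are ordered most recent first).
rolling_pct_change <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, k < n)
  past <- c(x[(k + 1):n], rep(NA_real_, k))  # value k rows further down
  (x - past) / past
}

toy <- c(110, 100, 90, 80)
res <- rolling_pct_change(toy, k = 2)
res
```

Called as rolling_pct_change(df$Close, 20), the first element compares Close[1] with Close[21], the same pairing as the lead(Close, 20) answer.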

Dynamically replace specific characters within strings and assign them to new variables

I have a bunch of character vectors which I use to download some files (one for each month of the year), for which I have to change the date for every single link manually (at the end of the vector). It looks like this:
query_01_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2019&to=31.01.2019"
query_02_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2019&to=28.02.2019"
query_03_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2019&to=31.03.2019"
query_04_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2019&to=30.04.2019"
query_05_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2019&to=31.05.2019"
query_06_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2019&to=30.06.2019"
query_07_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2019&to=31.07.2019"
query_08_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2019&to=31.08.2019"
query_09_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2019&to=30.09.2019"
query_10_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2019&to=31.10.2019"
query_11_19 = "?format=Html&userId=1232&userHash=1277KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2019&to=30.11.2019"
query_12_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2019&to=31.12.2019"
This is already rather tedious for one year, but it becomes a real pain if I want to do this for all the following years (let's say until 2030).
Is there an easier way to do this?
Thanks in advance!
A few tricks make this easy:
1. Use seq.Date to generate the first day of each month (called here simply as seq, which dispatches to seq.Date via S3).
2. Subtract 1 from those to get the last day of the previous month.
3. Join them together with paste0 after formatting them in the dot-separated date format.
## 1
dates <- seq(as.Date("2018-01-01"), as.Date("2019-01-01"), by = "month")
dates
# [1] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01" "2018-06-01" "2018-07-01"
# [8] "2018-08-01" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01"
dates_first <- format(dates[-length(dates)], format = "%d.%m.%Y")
## 2
dates_last <- format(dates[-1] - 1L, format = "%d.%m.%Y")
dates_last
# [1] "31.01.2018" "28.02.2018" "31.03.2018" "30.04.2018" "31.05.2018" "30.06.2018" "31.07.2018"
# [8] "31.08.2018" "30.09.2018" "31.10.2018" "30.11.2018" "31.12.2018"
## 3
paste0(
"?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=",
dates_first,
"&to=",
dates_last)
# [1] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2018&to=31.01.2018"
# [2] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2018&to=28.02.2018"
# [3] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2018&to=31.03.2018"
# [4] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2018&to=30.04.2018"
# [5] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2018&to=31.05.2018"
# [6] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2018&to=30.06.2018"
# [7] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2018&to=31.07.2018"
# [8] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2018&to=31.08.2018"
# [9] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2018&to=30.09.2018"
# [10] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2018&to=31.10.2018"
# [11] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2018&to=30.11.2018"
# [12] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2018&to=31.12.2018"
(Easily could have been done with sprintf or related functions.)
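For instance, a sprintf-based sketch that covers several years at once and names each element like the original variables might look as follows. The function name build_queries and the shortened base string are placeholders, not part of the question:

```r
# Hypothetical helper: one query string per month from from_year to to_year.
build_queries <- function(base, from_year, to_year) {
  # First day of every month, plus January of the year after to_year.
  firsts <- seq(as.Date(sprintf("%d-01-01", from_year)),
                as.Date(sprintf("%d-01-01", to_year + 1)),
                by = "month")
  starts <- firsts[-length(firsts)]
  ends   <- firsts[-1] - 1               # last day of each month
  queries <- sprintf("%s&from=%s&to=%s", base,
                     format(starts, "%d.%m.%Y"),
                     format(ends, "%d.%m.%Y"))
  names(queries) <- format(starts, "query_%m_%y")
  queries
}

# Placeholder base string; substitute the real format/userId/userHash part.
queries <- build_queries("?format=Html&query=ApplicationStatusByJob", 2019, 2030)
queries[["query_02_20"]]
```

Keeping the strings in one named vector avoids creating a separate variable per month; a single link is then looked up by name.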

Create sequence of months and days with no year

Good afternoon. I am attempting to make a sequence of month-day dates. The desired result would be October 1st through December 31st. I think the code is simple enough to show my intent, but is flawed in its logic. Thanks very much for any help here.
start <- as.Date("10/1", "m%-d%")
end <- as.Date("12/31", "m%-d%")
seq(start, end, by = "1 Day")
We can create a date sequence in an arbitrary year (any year would work; it need not be 2001 as in my example), and then use the format function.
format(seq.Date(as.Date("2001-10-01"), as.Date("2001-12-31"), by = 1),
       format = "%m-%d")
# [1] "10-01" "10-02" "10-03" "10-04" "10-05" "10-06" "10-07" "10-08" "10-09" "10-10" "10-11"
# [12] "10-12" "10-13" "10-14" "10-15" "10-16" "10-17" "10-18" "10-19" "10-20" "10-21" "10-22"
# [23] "10-23" "10-24" "10-25" "10-26" "10-27" "10-28" "10-29" "10-30" "10-31" "11-01" "11-02"
# [34] "11-03" "11-04" "11-05" "11-06" "11-07" "11-08" "11-09" "11-10" "11-11" "11-12" "11-13"
# [45] "11-14" "11-15" "11-16" "11-17" "11-18" "11-19" "11-20" "11-21" "11-22" "11-23" "11-24"
# [56] "11-25" "11-26" "11-27" "11-28" "11-29" "11-30" "12-01" "12-02" "12-03" "12-04" "12-05"
# [67] "12-06" "12-07" "12-08" "12-09" "12-10" "12-11" "12-12" "12-13" "12-14" "12-15" "12-16"
# [78] "12-17" "12-18" "12-19" "12-20" "12-21" "12-22" "12-23" "12-24" "12-25" "12-26" "12-27"
# [89] "12-28" "12-29" "12-30" "12-31"
Alternatively with lubridate:
library(lubridate)
format( seq(ymd('2018-10-01'),ymd('2018-12-31'),by='days'), "%m-%d" )
[1] "10-01" "10-02" "10-03" "10-04" "10-05" "10-06" "10-07" "10-08" "10-09"
...etc to "12-31"
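The choice of placeholder year only matters when the range touches the end of February. A small wrapper, sketched here with the made-up name month_day_seq, could default to a leap year so that "02-29" is included whenever the range spans it:

```r
# Hypothetical helper: month-day labels between two "mm-dd" strings.
# The default year 2000 is a leap year, so February 29 is not skipped.
month_day_seq <- function(from, to, year = 2000) {
  start <- as.Date(sprintf("%d-%s", year, from))
  end   <- as.Date(sprintf("%d-%s", year, to))
  format(seq(start, end, by = "day"), "%m-%d")
}

md <- month_day_seq("10-01", "12-31")
length(md)
```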
