Dynamically replace specific characters within strings and assign them to new variables - r

I have a bunch of character vectors which I use to download some files (one for each month of the year), for which I have to change the date for every single link manually (at the end of the vector). It looks like this:
query_01_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2019&to=31.01.2019"
query_02_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2019&to=28.02.2019"
query_03_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2019&to=31.03.2019"
query_04_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2019&to=30.04.2019"
query_05_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2019&to=31.05.2019"
query_06_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2019&to=30.06.2019"
query_07_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2019&to=31.07.2019"
query_08_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2019&to=31.08.2019"
query_09_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2019&to=30.09.2019"
query_10_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2019&to=31.10.2019"
query_11_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2019&to=30.11.2019"
query_12_19 = "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2019&to=31.12.2019"
This is already rather tedious for one year, but it becomes a real pain if I want to do this for all the following years (let's say until 2030).
Is there an easier way to do this?
Thanks in advance!

A few tricks make this easy:
1. use seq.Date to generate the first day of each month (it is called as seq here, since R's S3 method dispatch picks the Date method automatically);
2. subtract 1 from those to get the last day of each preceding month; and
3. join them together with paste0 after formatting them in the dot-separated date format.
## 1
dates <- seq(as.Date("2018-01-01"), as.Date("2019-01-01"), by = "month")
dates
# [1] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01" "2018-06-01" "2018-07-01"
# [8] "2018-08-01" "2018-09-01" "2018-10-01" "2018-11-01" "2018-12-01" "2019-01-01"
dates_first <- format(dates[-length(dates)], format = "%d.%m.%Y")
## 2
dates_last <- format(dates[-1] - 1L, format = "%d.%m.%Y")
dates_last
# [1] "31.01.2018" "28.02.2018" "31.03.2018" "30.04.2018" "31.05.2018" "30.06.2018" "31.07.2018"
# [8] "31.08.2018" "30.09.2018" "31.10.2018" "30.11.2018" "31.12.2018"
## 3
paste0(
"?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=",
dates_first,
"&to=",
dates_last)
# [1] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.01.2018&to=31.01.2018"
# [2] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.02.2018&to=28.02.2018"
# [3] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.03.2018&to=31.03.2018"
# [4] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.04.2018&to=30.04.2018"
# [5] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.05.2018&to=31.05.2018"
# [6] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.06.2018&to=30.06.2018"
# [7] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.07.2018&to=31.07.2018"
# [8] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.08.2018&to=31.08.2018"
# [9] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.09.2018&to=30.09.2018"
# [10] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.10.2018&to=31.10.2018"
# [11] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.11.2018&to=30.11.2018"
# [12] "?format=Html&userId=1232&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7&query=ApplicationStatusByJob&from=01.12.2018&to=31.12.2018"
(Easily could have been done with sprintf or related functions.)
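The sprintf route mentioned above could look roughly like this. It is a sketch rather than part of the original answer: the names base_query, month_starts, first_day, last_day and queries are made up for illustration, and the 2019 to 2030 range is taken from the question. Naming the elements of a single vector stands in for the separate query_01_19-style variables.

```r
# Query prefix copied from the question; everything before the dates is constant.
base_query <- paste0(
  "?format=Html&userId=1232",
  "&userHash=U127KfIHaiz3ks2gXEgNctA9n8P4c87o1SFcEu2weKpNdupQwmuRaMltEN7",
  "&query=ApplicationStatusByJob"
)

# First day of every month from January 2019 to January 2031 (one extra month
# so that the last day of December 2030 can be derived).
month_starts <- seq(as.Date("2019-01-01"), as.Date("2031-01-01"), by = "month")

first_day <- format(month_starts[-length(month_starts)], "%d.%m.%Y")
last_day  <- format(month_starts[-1] - 1, "%d.%m.%Y")

# sprintf is vectorised, so this builds all 144 query strings at once.
queries <- sprintf("%s&from=%s&to=%s", base_query, first_day, last_day)

# Name each element after the pattern used in the question, e.g. "query_01_19".
names(queries) <- format(month_starts[-length(month_starts)], "query_%m_%y")

queries[["query_02_20"]]
```

A single month can then be looked up by name, which also makes it easy to loop over all of them when downloading.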

Related

How to make never ending quarters in r studio [duplicate]

I want to generate a sequence of dates at quarterly intervals between a start date and an end date. I have the code below:
> seq(as.Date('1980-12-31'), as.Date('1985-06-30'), by = 'quarter')
[1] "1980-12-31" "1981-03-31" "1981-07-01" "1981-10-01" "1981-12-31"
[6] "1982-03-31" "1982-07-01" "1982-10-01" "1982-12-31" "1983-03-31"
[11] "1983-07-01" "1983-10-01" "1983-12-31" "1984-03-31" "1984-07-01"
[16] "1984-10-01" "1984-12-31" "1985-03-31"
As you can see, this does not generate the right sequence. I don't understand where the date "1981-07-01" comes from; I would expect "1981-06-30".
Is there any way to generate such a sequence correctly with a quarterly interval?
Thanks for your time.
The from and to dates in the question are both end-of-quarter dates so we assume that that is the general case you are interested in.
1) Create a sequence of yearqtr objects yq and then convert them to Date class. frac = 1 tells it to use the last day of the period, i.e. the end of the last month of each quarter. Alternatively, just use yq, since that directly models years with quarters.
library(zoo)
from <- as.Date('1980-12-31')
to <- as.Date('1985-06-30')
yq <- seq(as.yearqtr(from), as.yearqtr(to), by = 1/4)
as.Date(yq, frac = 1)
giving:
[1] "1980-12-31" "1981-03-31" "1981-06-30" "1981-09-30" "1981-12-31"
[6] "1982-03-31" "1982-06-30" "1982-09-30" "1982-12-31" "1983-03-31"
[11] "1983-06-30" "1983-09-30" "1983-12-31" "1984-03-31" "1984-06-30"
[16] "1984-09-30" "1984-12-31" "1985-03-31" "1985-06-30"
2) or without any packages add 1 to from and to so that they are at the beginning of the next month, create the sequence (it has no trouble with first of month sequences) and then subtract 1 from the generated sequence giving the same result as above.
seq(from + 1, to + 1, by = "quarter") - 1
Using the clock package and R >= 4.1:
library(clock)
seq(year_quarter_day(1980, 4), year_quarter_day(1985, 2), by = 1) |>
set_day("last") |>
as_date()
# [1] "1980-12-31" "1981-03-31" "1981-06-30" "1981-09-30" "1981-12-31" "1982-03-31" "1982-06-30" "1982-09-30" "1982-12-31"
# [10] "1983-03-31" "1983-06-30" "1983-09-30" "1983-12-31" "1984-03-31" "1984-06-30" "1984-09-30" "1984-12-31" "1985-03-31"
# [19] "1985-06-30"
Note that this includes the final quarter. I don't know if that was your intent.
It comes down to the definition of "quarter". A quarter might well be (although it is not in R) 365/4 days. Look at the output of:
as.Date('1980-12-31')+(365/4)*(0:12)
#[1] "1980-12-31" "1981-04-01" "1981-07-01" "1981-09-30" "1981-12-31" "1982-04-01" "1982-07-01" "1982-09-30"
#[9] "1982-12-31" "1983-04-01" "1983-07-01" "1983-09-30" "1983-12-31"
In order to avoid the days of the month from surprising you, you need to use a starting day of the month between 1 and 28, at least in non-leap years.
seq(as.Date('1981-01-01'), as.Date('1985-06-30'), by = 'quarter')
[1] "1981-01-01" "1981-04-01" "1981-07-01" "1981-10-01" "1982-01-01" "1982-04-01" "1982-07-01" "1982-10-01"
[9] "1983-01-01" "1983-04-01" "1983-07-01" "1983-10-01" "1984-01-01" "1984-04-01" "1984-07-01" "1984-10-01"
[17] "1985-01-01" "1985-04-01"
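The reason "1981-07-01" shows up in the question can be reproduced by hand. As far as I understand it, seq() on Date objects steps the month field and then normalises the result, so a non-existent date such as 31 June rolls over into the next month. A minimal sketch of that effect (the names lt and rolled are just illustrative):

```r
# Start from the end-of-quarter date used in the question.
lt <- as.POSIXlt(as.Date("1980-12-31"))

# Move two quarters (six months) ahead by changing only the month field.
# This nominally asks for "1981-06-31", which does not exist.
lt$mon <- lt$mon + 6

# Converting back to Date normalises the invalid date to the following day.
rolled <- as.Date(lt)
format(rolled)
```

Starting on a day of the month that exists in every month avoids the rollover, which is why the first-of-month sequence above behaves as expected.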

how to sort list.files() in correct date order?

Calling list.files() in the working directory returns the file list, but the numeric order is messed up.
f <- list.files(pattern="*.nc")
f
# [1] "te1971-1.nc" "te1971-10.nc" "te1971-11.nc" "te1971-12.nc"
# [5] "te1971-2.nc" "te1971-3.nc" "te1971-4.nc" "te1971-5.nc"
# [9] "te1971-6.nc" "te1971-7.nc" "te1971-8.nc" "te1971-9.nc"
where the number after "-" describes the month number.
I used the following to try to sort it
myFiles <- paste("te", i, "-", c(1:12), ".nc", sep = "")
mixedsort(myFiles)
It returns the files in order, but reversed:
[1] "te1971-12.nc" "te1971-11.nc" "te1971-10.nc" "te1971-9.nc"
[5] "te1971-8.nc" "te1971-7.nc" "te1971-6.nc" "te1971-5.nc"
[9] "te1971-4.nc" "te1971-3.nc" "te1971-2.nc" "te1971-1.nc"
How do I fix this?
The issue is that the names are sorted alphabetically, so "10" comes before "2".
You could use gsub to capture the year and month as groups, append "-1" as the first day of the month, coerce the result with as.Date, and order by that.
x[order(as.Date(gsub('.*(\\d{4})-(\\d{,2}).*', '\\1-\\2-1', x)))]
# [1] "te1971-1.nc" "te1971-2.nc" "te1971-3.nc" "te1971-4.nc" "te1971-5.nc"
# [6] "te1971-6.nc" "te1971-7.nc" "te1971-8.nc" "te1971-9.nc" "te1971-10.nc"
# [11] "te1971-11.nc" "te1971-12.nc"
Data:
x <- c("te1971-1.nc", "te1971-10.nc", "te1971-11.nc", "te1971-12.nc",
"te1971-2.nc", "te1971-3.nc", "te1971-4.nc", "te1971-5.nc", "te1971-6.nc",
"te1971-7.nc", "te1971-8.nc", "te1971-9.nc")

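For the file-sorting question above, an alternative that skips the date conversion is to extract the year and month as integers and order by those. This is only a sketch under the assumption that every file name follows the te<year>-<month>.nc pattern shown in the question; the names year, month and sorted are illustrative.

```r
x <- c("te1971-1.nc", "te1971-10.nc", "te1971-11.nc", "te1971-12.nc",
       "te1971-2.nc", "te1971-3.nc", "te1971-4.nc", "te1971-5.nc",
       "te1971-6.nc", "te1971-7.nc", "te1971-8.nc", "te1971-9.nc")

# Pull the four-digit year and the one- or two-digit month out of each name.
year  <- as.integer(sub("^te(\\d{4})-(\\d{1,2})\\.nc$", "\\1", x))
month <- as.integer(sub("^te(\\d{4})-(\\d{1,2})\\.nc$", "\\2", x))

# Order by year first, then by month, both compared as numbers.
sorted <- x[order(year, month)]
sorted
```

Ordering on two integer keys also keeps files from several years in chronological order.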
Inserting Previous Dates in R Vector

I'm trying to insert the previous date for every date in a vector in R.
This is my current vector:
[1] "1990-02-08" "1990-03-28" "1990-05-16" "1990-07-05" "1990-07-13" "1990-08-22" "1990-10-03"
[8] "1990-10-29" "1990-11-14" "1990-12-07" "1990-12-18" "1991-01-08" "1991-02-01" "1991-02-07"
I'm trying to get the following:
[1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15" "1990-05-16" "1990-07-04"
etc.
I tried the following:
dates_lagged = as.Date(dates)-1
dates_combined = c(dates, dates_lagged)
However, with this method, some dates are not getting lagged.
Is there a better way to do this?
Edit: to answer the comment, this is my code (replaced CSV with its starting values):
FOMC <- read_csv(file = c("x", "1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13", "1990-08-22", "1990-10-03",
"1990-10-29", "1990-11-14", "1990-12-07"))
FOMC$x <- as.Date(FOMC$x, format = "%Y-%m-%d")
colnames(FOMC) <- "Date"
dates_vector <- FOMC[["Date"]]
FOMC = as.vector(as.Date(dates_vector))
dates_lagged = as.Date(FOMC)-1
dates_combined = c(FOMC, dates_lagged)
as.Date(dates_combined)
For some reason, there is no "1990-10-28" before "1990-10-29" for example, and I can't figure out why.
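You could interleave the lagged and original dates. rbind stacks them as two rows, and c() then flattens the matrix column by column, so each previous day lands directly before its date. rbind drops the Date class, hence the as.Date call with an origin: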
as.Date(c(rbind(dates - 1, dates)), origin = "1970-01-01")
#> [1] "1990-02-07" "1990-02-08" "1990-03-27" "1990-03-28" "1990-05-15"
#> [6] "1990-05-16" "1990-07-04" "1990-07-05" "1990-07-12" "1990-07-13"
#> [11] "1990-08-21" "1990-08-22" "1990-10-02" "1990-10-03" "1990-10-28"
#> [16] "1990-10-29" "1990-11-13" "1990-11-14" "1990-12-06" "1990-12-07"
#> [21] "1990-12-17" "1990-12-18" "1991-01-07" "1991-01-08" "1991-01-31"
#> [26] "1991-02-01" "1991-02-06" "1991-02-07"
Data
dates <- c("1990-02-08", "1990-03-28", "1990-05-16", "1990-07-05", "1990-07-13",
"1990-08-22", "1990-10-03", "1990-10-29", "1990-11-14", "1990-12-07",
"1990-12-18", "1991-01-08", "1991-02-01", "1991-02-07")
dates <- as.Date(dates)
Created on 2021-11-04 by the reprex package (v2.0.0)
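Another way to get the same layout, sketched here as an alternative rather than taken from the answer, is to append the lagged dates and sort the result. The three sample dates are a subset of the question's data. Note one assumption that changes behaviour: unique() is added to drop repeats that would appear if two input dates were consecutive days.

```r
dates <- as.Date(c("1990-02-08", "1990-03-28", "1990-10-29"))

# Append the lagged dates, then sort so each previous day sits before its date.
# c(), unique() and sort() all keep the Date class, so no origin is needed.
dates_combined <- sort(unique(c(dates - 1, dates)))
format(dates_combined)
```

Simply concatenating with c() puts all lagged dates after the original ones, which may be why "1990-10-28" seemed to be missing when looking only at the start of the output.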

Is there a specific function in R to merge 2 vectors [duplicate]

I have two vectors, one that contains a list of variables, and one that contains dates, such as
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
I want to merge them into a vector in which each variable is suffixed with each year; that is, my desired output is
> Colonnes_Pays_Principaux
[1] "PIB_2000" "PIB_2001" "ConsommationPrivee_2000"
[4] "ConsommationPrivee_2001" "ConsommationPubliques_2000" "ConsommationPubliques_2001"
[7] "FBCF_2000" "FBCF_2001" "ProductionIndustrielle_2000"
[10] "ProductionIndustrielle_2001" "Inflation_2000" "Inflation_2001"
[13] "InflationSousJacente_2000" "InflationSousJacente_2001" "PrixProductionIndustrielle_2000"
[16] "PrixProductionIndustrielle_2001" "CoutHoraireTravail_2000" "CoutHoraireTravail_2001"
Is there a simpler / more readable way than the double for loop below, which I have tried and which works?
Colonnes_Pays_Principaux <- vector()
for (Variable in (1:length(Variables_Pays))){
  for (Annee in (1:length(Annee_Pays))){
    Colonnes_Pays_Principaux=
      append(Colonnes_Pays_Principaux,
             paste(Variables_Pays[Variable],Annee_Pays[Annee],sep="_")
      )
  }
}
expand.grid will create a data frame with all combinations of the two vectors.
with(
expand.grid(Variables_Pays, Annee_Pays),
paste0(Var1, "_", Var2)
)
#> [1] "PIB_2000" "ConsommationPrivee_2000"
#> [3] "ConsommationPubliques_2000" "FBCF_2000"
#> [5] "ProductionIndustrielle_2000" "Inflation_2000"
#> [7] "InflationSousJacente_2000" "PrixProductionIndustrielle_2000"
#> [9] "CoutHoraireTravail_2000" "PIB_2001"
#> [11] "ConsommationPrivee_2001" "ConsommationPubliques_2001"
#> [13] "FBCF_2001" "ProductionIndustrielle_2001"
#> [15] "Inflation_2001" "InflationSousJacente_2001"
#> [17] "PrixProductionIndustrielle_2001" "CoutHoraireTravail_2001"
We can use outer:
c(t(outer(Variables_Pays, Annee_Pays, paste, sep = '_')))
# [1] "PIB_2000" "PIB_2001"
# [3] "ConsommationPrivee_2000" "ConsommationPrivee_2001"
# [5] "ConsommationPubliques_2000" "ConsommationPubliques_2001"
# [7] "FBCF_2000" "FBCF_2001"
# [9] "ProductionIndustrielle_2000" "ProductionIndustrielle_2001"
#[11] "Inflation_2000" "Inflation_2001"
#[13] "InflationSousJacente_2000" "InflationSousJacente_2001"
#[15] "PrixProductionIndustrielle_2000" "PrixProductionIndustrielle_2001"
#[17] "CoutHoraireTravail_2000" "CoutHoraireTravail_2001"
No real need to go beyond the basics here! Use paste for pasting the strings and rep to repeat either Annee_Pays or Variables_Pays to get all combinations:
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
# To get this in the same order as in your example:
paste(rep(Variables_Pays, rep(2, length(Variables_Pays))), Annee_Pays, sep = "_")
# Alternative order:
paste(Variables_Pays, rep(Annee_Pays, rep(length(Variables_Pays), 2)), sep = "_")
# Or, if order doesn't matter too much:
paste(Variables_Pays, rep(Annee_Pays, length(Variables_Pays)), sep = "_")
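The first variant above can also be written with the each argument of rep, which may read a little more directly. A small sketch on a shortened version of the question's vectors (the name combined is illustrative):

```r
Variables_Pays <- c("PIB", "ConsommationPrivee", "FBCF")
Annee_Pays <- c("2000", "2001")

# Repeat each variable once per year; the years are recycled alongside.
combined <- paste(rep(Variables_Pays, each = length(Annee_Pays)),
                  Annee_Pays, sep = "_")
combined
```

This gives the variable-by-variable order requested in the question without hard-coding the number of years.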
In base R:
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
cbind(paste(Variables_Pays, Annee_Pays, sep = "_"), paste(Variables_Pays, rev(Annee_Pays), sep = "_"))

Change origin for time series in r

I have a time series in R that I would like to work with, spanning from 01-01-52 to 01-01-88. (1952 to 1988). 37 observations.
However, when I read it into R, the observations from 01-01-52 to 01-01-68 are interpreted as 2052 to 2068 rather than 1952 to 1968.
How do I force R to read in all the data as being from 1952 to 1988?
Link to my data: https://www.dropbox.com/s/93foyc238skt3xj/AgricIndus.csv?dl=0
This is the code I have used. Do you know what I need to do with my code to make it read properly?
agri <- read.table("AgricIndus.csv",
sep = ",", header = TRUE, skip = 0,
stringsAsFactors = FALSE)
agri$time <- as.Date(agri$time, "%m-%d-%y")
agri.xts <- xts(agri[, 2:3], order.by = agri$time)
One way (a hack) is to insert the century into each string before parsing it. Given the original strings
agri$time
# [1] "01-01-52" "01-01-53" "01-01-54" "01-01-55" "01-01-56" "01-01-57" "01-01-58" "01-01-59" "01-01-60" "01-01-61" "01-01-62" "01-01-63" "01-01-64" "01-01-65"
# [15] "01-01-66" "01-01-67" "01-01-68" "01-01-69" "01-01-70" "01-01-71" "01-01-72" "01-01-73" "01-01-74" "01-01-75" "01-01-76" "01-01-77" "01-01-78" "01-01-79"
# [29] "01-01-80" "01-01-81" "01-01-82" "01-01-83" "01-01-84" "01-01-85" "01-01-86" "01-01-87" "01-01-88"
the conversion is:
agri$time <- as.Date(paste0(substring(agri$time, 1, 6), '19', substring(agri$time, 7, 8)),
"%m-%d-%Y")
If you can be sure that your time series is regular, it is probably easiest to generate a regular date sequence like so:
agri$time <- seq.Date(as.Date("1952-01-01"), as.Date("1988-01-01"), by = 'years')
Another easy solution, which also works for irregular time series, is to read your data as the years 52 to 88 with format = '%m-%d-%Y' (capital "Y"!) and then add 1900 years:
df$time <- as.POSIXlt(as.Date(df$time,format = '%m-%d-%Y'))
df$time$year <-df$time$year + 1900
df$time <- as.Date(df$time)
df$time
[1] "1952-01-01" "1953-01-01" "1954-01-01" "1955-01-01"
[5] "1956-01-01" "1957-01-01" "1958-01-01" "1959-01-01"
[9] "1960-01-01" "1961-01-01" "1962-01-01" "1963-01-01"
[13] "1964-01-01" "1965-01-01" "1966-01-01" "1967-01-01"
[17] "1968-01-01" "1969-01-01" "1970-01-01" "1971-01-01"
[21] "1972-01-01" "1973-01-01" "1974-01-01" "1975-01-01"
[25] "1976-01-01" "1977-01-01" "1978-01-01" "1979-01-01"
[29] "1980-01-01" "1981-01-01" "1982-01-01" "1983-01-01"
[33] "1984-01-01" "1985-01-01" "1986-01-01" "1987-01-01"
[37] "1988-01-01"
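If the two-digit years have already been parsed with %y, a further option is to shift only the dates that ended up in the future. The sketch below is not from the answers above: it uses four sample strings in place of the real file, and the cutoff of 1988 is an assumption taken from the stated end of the series.

```r
time_raw <- c("01-01-52", "01-01-68", "01-01-69", "01-01-88")  # sample values

# %y maps 00-68 to 20xx and 69-99 to 19xx, so 52 and 68 land in the future.
lt <- as.POSIXlt(as.Date(time_raw, format = "%m-%d-%y"))

# POSIXlt stores the year as an offset from 1900. Move any year later than the
# assumed cutoff (1988, the end of the series) back by one century.
cutoff <- 1988
lt$year <- as.integer(ifelse(lt$year + 1900 > cutoff, lt$year - 100, lt$year))

fixed <- as.Date(lt)
format(fixed)
```

Dates that were already parsed into the 1900s are left untouched, so the same code can be applied to the whole column.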
