I have data for which I want to calculate the growth rate relative to the same quarter of the previous year.
# dt
yq A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545
str(dt)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ yq : 'yearqtr' num 2013 Q1 2013 Q2 2013 Q3 2013 Q4 ...
$ A : int 35233684 36235895 36767497 37273346 37788578 38674955
$ B: int 270950851 274194641 275614372 277125049 278202677 281025545
- attr(*, ".internal.selfref")=<externalptr>
The code I use:
dt[, lapply(.SD, function(x)x/shift(x) - 1), .SDcols = 2:3, by = .(quarter(yq))]
quarter A B
1 NA NA
1 0.07251283 0.02676436
2 NA NA
2 0.06731060 0.02491261
3 NA NA
4 NA NA
This gives the right numbers, but not in the format I want. I would like to keep the yq column, keep the rows ordered by year and quarter, and add the growth rates as new columns, like this:
yq A B A_R B_R
2013 Q1 35233684 270950851 NA NA
2013 Q2 36235895 274194641 NA NA
2013 Q3 36767497 275614372 NA NA
2013 Q4 37273346 277125049 NA NA
2014 Q1 37788578 278202677 0.07251283 0.02676436
2014 Q2 38674955 281025545 0.06731060 0.02491261
How should I edit my code?
# Data
library(data.table)
dt <- fread("yq A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545", header = TRUE)
So I see you are using the zoo package and the function yearqtr. I am unable to get the yq column read using your fread but I just quickly reproduced the data as follows:
library(zoo)
dt<-data.table(cbind(yq=2013 + seq(0,5)/4,
A = c(35233684, 36235895, 36767497, 37273346, 37788578, 38674955),
B = c(270950851, 274194641, 275614372, 277125049, 278202677, 281025545)))
Then just converted the yq as follows:
dt[,yq:=as.yearqtr(yq)]
Now if you want to keep that column you will need to update the columns by specifying them:
cols<-c("A","B")
dt[,eval(cols):=lapply(.SD,function(x)x/shift(x) - 1), .SDcols = 2:3, by = .(quarter(yq))]
So simply add as many columns as you need to the cols vector and use eval so data.table will not create a new column named "cols"! Does this answer your question?
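Note that the call above overwrites A and B with their growth rates. A minimal sketch of a variant that writes to new columns instead, matching the A_R and B_R layout asked for in the question, might look like this. It assumes the rows are already sorted by yq; the helper column qtr and the vector new_cols are illustrative names, and the quarter number is taken from the yearqtr label with format() rather than quarter().

```r
library(data.table)
library(zoo)

dt <- data.table(
  yq = as.yearqtr(2013 + seq(0, 5) / 4),
  A  = c(35233684, 36235895, 36767497, 37273346, 37788578, 38674955),
  B  = c(270950851, 274194641, 275614372, 277125049, 278202677, 281025545)
)

cols     <- c("A", "B")
new_cols <- paste0(cols, "_R")  # illustrative names for the growth-rate columns

# Quarter number (1-4) from the yearqtr label; grouping by it makes shift()
# compare each row with the same quarter one year earlier.
dt[, qtr := as.integer(format(yq, "%q"))]

# := by group adds columns in place and keeps the original row order.
dt[, (new_cols) := lapply(.SD, function(x) x / shift(x) - 1),
   .SDcols = cols, by = qtr]

dt[, qtr := NULL]  # drop the helper column
dt
```

Wrapping new_cols in parentheses serves the same purpose as eval(cols) above: it tells data.table to use the contents of the vector as column names.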
I am not familiar with the data.table package, but here is how I would do it using dplyr.
You can first separate your yq column into two columns, y and q. I skipped this step in my code because I don't know what exact datatype you used in the original data.
Then group by q to do the calculation.
library(data.table)
dt <- fread(
"y q A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545", header = T)
library(tidyverse)
dt%>%group_by(q)%>%
arrange(y)%>%
mutate(growth_rate_over_year_A= A/lag(A)-1,
growth_rate_over_year_B= B/lag(B)-1)%>%
ungroup
output:
# A tibble: 6 x 6
y q A B growth_rate_over_year_A growth_rate_over_year_B
<int> <chr> <int> <int> <dbl> <dbl>
1 2013 Q1 35233684 270950851 NA NA
2 2013 Q2 36235895 274194641 NA NA
3 2013 Q3 36767497 275614372 NA NA
4 2013 Q4 37273346 277125049 NA NA
5 2014 Q1 37788578 278202677 0.0725 0.0268
6 2014 Q2 38674955 281025545 0.0673 0.0249
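The split step that was skipped above could look like the following base R sketch. It assumes yq is stored as text such as "2013 Q1" (for a zoo yearqtr column, format(yq) should give that text first). The names parts, y and q are illustrative.

```r
# Toy data with yq stored as character (an assumption about the input)
dt <- data.frame(
  yq = c("2013 Q1", "2013 Q2", "2013 Q3", "2013 Q4", "2014 Q1", "2014 Q2"),
  A  = c(35233684, 36235895, 36767497, 37273346, 37788578, 38674955),
  stringsAsFactors = FALSE
)

# Split each label on the space into a two-column character matrix
parts <- do.call(rbind, strsplit(dt$yq, " ", fixed = TRUE))

dt$y <- as.integer(parts[, 1])  # year, e.g. 2013
dt$q <- parts[, 2]              # quarter label, e.g. "Q1"
dt
```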
Related
I have the following dataframe:
> str(database)
'data.frame': 8547287 obs. of 4 variables:
$ cited_id : num 4.06e+08 5.41e+07 5.31e+07 5.04e+07 3.79e+08 ...
$ cited_pub_year : num 2014 1989 2002 2002 2015 ...
$ citing_id : num 3.34e+08 3.37e+08 4.06e+08 4.19e+08 4.25e+08 ...
$ citing_pub_year: num 2011 2011 2013 2014 2014 ...
The variables cited_id and citing_id contain the IDs of the objects from which this database has been obtained.
This is an example of the dataframe:
cited_id cited_pub_year citing_id citing_pub_year
1 405821349 2014 419185055 2011
2 405821349 1989 336621202 2011
3 53148996 2002 406314162 2013
4 53148996 2002 419185055 2014
5 379369076 2015 424901495 2014
6 53148996 2011 441055669 2015
7 405821349 2014 447519383 2015
8 405821349 2015 469644221 2016
9 329268142 2014 470861263 2016
10 45433355 2008 55422577 2008
For example the ID 405821349 has been cited by 419185055, 336621202, 447519383 and 469644221. For each pair of IDs I would like to calculate the intersection of their citing IDs. The quantity Pj.k below is the length of the intersection. I tried with the following code
total_id<-c(database$cited_id,database$citing_id)
total_id<-unique(total_id)
df<-data.frame(data_k=character(),data_j=character(),Pj.k=numeric(),
stringsAsFactors = F)
for (k in 1:(length(total_id)-1)) {
data_k<-total_id[k]
citing_data_k<-database[database$cited_id==data_k,]
for (j in (k+1):length(total_id)) {
data_j<-total_id[j]
citing_data_j<-database[database$cited_id==data_j,]
Pj.k<-length(intersect(citing_data_j$citing_id,citing_data_k$citing_id))
dfxx=data.frame(data_k=data_k,data_j=data_j,Pj.k=Pj.k,
stringsAsFactors = F)
df<-rbind(df,dfxx)
}
}
Anyway, it takes too long! How could I speed it up?
Using xtabs, tcrossprod and sparse matrices:
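library(Matrix)
library(data.table)
# Sparse incidence matrix: rows are cited_id, columns are citing_id
m1 <- xtabs(data = database[, c(1, 3)], sparse = TRUE)
# tcrossprod(m1) counts the citing IDs shared by each pair of cited IDs;
# triu(k = 1) keeps each pair once and drops the diagonal
m2 <- as(triu(tcrossprod(m1), k = 1), "TsparseMatrix")
# The triplet slots i and j are 0-based, hence the + 1L
df <- data.frame(
  data_k = row.names(m1)[attr(m2, "i") + 1L],
  data_j = row.names(m1)[attr(m2, "j") + 1L],
  Pj.k = attr(m2, "x"),
  stringsAsFactors = FALSE
)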
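To see why a cross-product yields the intersection sizes, here is a small dense sketch in base R on the ten sample rows from the question. It is only an illustration of the idea, not a replacement for the sparse version, which is what makes 8.5 million rows feasible. The names db, inc and shared are illustrative.

```r
db <- data.frame(
  cited_id  = c(405821349, 405821349, 53148996, 53148996, 379369076,
                53148996, 405821349, 405821349, 329268142, 45433355),
  citing_id = c(419185055, 336621202, 406314162, 419185055, 424901495,
                441055669, 447519383, 469644221, 470861263, 55422577)
)

# 0/1 incidence matrix: rows = cited IDs, columns = citing IDs
inc <- (unclass(table(db$cited_id, db$citing_id)) > 0) * 1

# Entry [j, k] is the number of citing IDs shared by cited IDs j and k;
# the diagonal holds the number of distinct citing IDs of each cited ID.
shared <- tcrossprod(inc)

# 405821349 and 53148996 are both cited by 419185055
shared["405821349", "53148996"]
```

Forcing the incidence matrix to 0/1 matters: if the same cited/citing pair appears in several rows, raw counts would inflate the products beyond the true intersection size.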
Inspired by the answers to "Count combinations of categorical variables, regardless of order, in R?", count pairs:
database = read.table(header = T, stringsAsFactors = F, text =
"cited_id cited_pub_year citing_id citing_pub_year
1 405821349 2014 419185055 2011
2 405821349 1989 336621202 2011
3 53148996 2002 406314162 2013
4 53148996 2002 419185055 2014
5 379369076 2015 424901495 2014
6 53148996 2011 441055669 2015
7 405821349 2014 447519383 2015
8 405821349 2015 469644221 2016
9 329268142 2014 470861263 2016
10 45433355 2008 55422577 2008")
database |>
dplyr::count(pairs = paste(pmin(cited_id, citing_id),
pmax(cited_id, citing_id)))
#> pairs n
#> 1 329268142 470861263 1
#> 2 336621202 405821349 1
#> 3 379369076 424901495 1
#> 4 405821349 419185055 1
#> 5 405821349 447519383 1
#> 6 405821349 469644221 1
#> 7 45433355 55422577 1
#> 8 53148996 406314162 1
#> 9 53148996 419185055 1
#> 10 53148996 441055669 1
Depending on what you actually need you might find with(database, table(cited_id = cited_id, citing_id = citing_id)) useful too.
I have three quarterly time series, beer, temp and income, all starting in 2010 Q1 and ending in 2018 Q3.
Here is my data:
> beer
Qtr1 Qtr2 Qtr3 Qtr4
2010 3.301 2.826 2.712 3.934
2011 3.192 2.975 2.865 3.789
2012 2.728 2.840 2.633 3.837
2013 3.090 2.779 2.594 3.960
2014 2.771 2.860 2.676 3.831
2015 2.986 2.558 2.810 3.743
2016 3.054 2.764 2.985 3.807
2017 3.046 2.880 2.689 4.005
2018 3.013 2.800 2.937
> temp
Qtr1 Qtr2 Qtr3 Qtr4
2010 16.766667 11.433333 9.400000 14.533333
2011 17.033333 11.966667 8.633333 13.900000
2012 15.800000 10.600000 9.700000 13.766667
2013 17.033333 11.333333 10.200000 14.866667
2014 16.266667 11.900000 9.266667 13.900000
2015 17.300000 11.400000 8.733333 13.966667
2016 18.033333 12.400000 9.300000 14.100000
2017 16.533333 11.100000 9.733333 15.300000
2018 18.400000 11.033333 9.700000
> income
Qtr1 Qtr2 Qtr3 Qtr4
2010 48.064 47.755 47.878 47.707
2011 48.226 49.063 49.322 49.518
2012 49.714 49.390 49.683 50.386
2013 50.405 51.476 52.527 53.456
2014 54.309 54.308 54.811 54.723
2015 55.254 55.913 56.472 56.316
2016 58.013 58.312 58.744 59.806
2017 59.881 60.683 61.164 61.887
2018 61.969 62.507 63.054
I tried to forecast two years of beer values using trend and seasonal dummy predictors, but R always gives me a dimension error.
> forecast(tslm(beer~temp+income+trend+season), h = 8)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variable lengths differ (found for 'trend')
In addition: Warning message:
'newdata' had 8 rows but variables found have 35 rows
Using a data.frame works, but it always produces warning messages:
> df = data.frame(beer,temp,income)
> forecast(tslm(beer~temp+income+trend+season, data = df), h = 8, newdata = df)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2018 Q4 2.991699 2.3132374 3.670161 1.9328516 4.050546
2019 Q1 1.752979 1.0424701 2.463488 0.6441168 2.861841
2019 Q2 1.667426 0.9738984 2.360954 0.5850656 2.749787
2019 Q3 1.875770 1.0662253 2.685315 0.6123465 3.139194
2019 Q4 2.884266 2.1308413 3.637691 1.7084267 4.060105
2020 Q1 1.729527 1.0011085 2.457945 0.5927141 2.866339
2020 Q2 1.599902 0.8838936 2.315910 0.4824569 2.717347
2020 Q3 1.837085 1.0376823 2.636488 0.5894896 3.084681
2020 Q4 2.800613 2.0470872 3.554139 1.6246159 3.976610
2021 Q1 1.566346 0.7583452 2.374347 0.3053320 2.827360
2021 Q2 1.537637 0.7593199 2.315954 0.3229493 2.752324
2021 Q3 1.758491 0.9202322 2.596749 0.4502548 3.066726
2021 Q4 2.766178 1.9445748 3.587782 1.4839351 4.048421
2022 Q1 1.600676 0.8060401 2.395313 0.3605199 2.840833
2022 Q2 1.610888 0.8665356 2.355241 0.4492074 2.772569
2022 Q3 1.870518 1.0513857 2.689650 0.5921317 3.148904
2022 Q4 2.855698 2.1234601 3.587935 1.7129243 3.998471
2023 Q1 1.675867 0.9187581 2.432976 0.4942778 2.857457
2023 Q2 1.590225 0.8580061 2.322445 0.4474806 2.732970
2023 Q3 1.783578 0.9603794 2.606776 0.4988456 3.068310
2023 Q4 2.829362 2.0411286 3.617595 1.5991983 4.059525
2024 Q1 1.629442 0.8509889 2.407896 0.4145418 2.844343
2024 Q2 1.546023 0.7994307 2.292615 0.3808469 2.711199
2024 Q3 1.759382 0.9209619 2.597803 0.4508937 3.067871
2024 Q4 2.906656 2.1369607 3.676351 1.7054240 4.107887
2025 Q1 1.694576 0.9426298 2.446521 0.5210444 2.868107
2025 Q2 1.585464 0.8512783 2.319649 0.4396504 2.731277
2025 Q3 1.858994 1.0774412 2.640548 0.6392561 3.078733
2025 Q4 2.836440 2.0876545 3.585226 1.6678407 4.005040
2026 Q1 1.664587 0.9073179 2.421857 0.4827478 2.846427
2026 Q2 1.628942 0.9118032 2.346081 0.5097325 2.748152
2026 Q3 1.911943 1.1070396 2.716846 0.6557631 3.168123
2026 Q4 2.916889 2.1414033 3.692375 1.7066199 4.127158
2027 Q1 1.649728 0.8839868 2.415469 0.4546670 2.844789
2027 Q2 1.619649 0.8980352 2.341262 0.4934558 2.745842
Warning messages:
1: In forecast.lm(tslm(beer ~ temp + income + trend + season, data = df), :
Could not find required variable temp in newdata. Specify newdata as a named data.frame
2: In forecast.lm(tslm(beer ~ temp + income + trend + season, data = df), :
Could not find required variable income in newdata. Specify newdata as a named data.frame
I tried renaming the columns in the data frame. This time it works, but the plot doesn't look right:
> names(df)[2] = "temp"
> names(df)[3] = "income"
> autoplot(forecast(tslm(beer~temp+income+trend+season, data = df), h = 8, newdata = df))
But when I exclude the predictor temp and income, it works well
> forecast(tslm(beer~trend+season), h = 8)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2018 Q4 3.854655 3.654084 4.055226 3.542067 4.167244
2019 Q1 3.010562 2.809372 3.211751 2.697010 3.324113
2019 Q2 2.799562 2.598372 3.000751 2.486010 3.113113
2019 Q3 2.757228 2.556039 2.958417 2.443676 3.070780
2019 Q4 3.852745 3.648494 4.056997 3.534421 4.171070
2020 Q1 3.008652 2.803430 3.213874 2.688815 3.328489
2020 Q2 2.797652 2.592430 3.002874 2.477815 3.117489
2020 Q3 2.755318 2.550096 2.960540 2.435481 3.075155
I want to forecast two years of beer values with temp, income, trend and seasonal dummies as predictors. I have tried everything I know.
Please help.
Thanks in advance.
There are a couple of problems here. The first is that you are providing historical temp and income data in the newdata argument, when they should be future values for these variables. The second issue is that the forecast package is not particularly good at finding the relevant variables in newdata and is getting confused here. Workarounds are possible, but I suggest you use the newer fable package instead of forecast which makes this sort of thing much easier.
library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)
df <- tsibble(
quarter = seq(yearquarter("2010 Q1"), to=yearquarter("2018 Q3"), by = 1),
beer = c(
3.301, 2.826, 2.712, 3.934, 3.192, 2.975, 2.865, 3.789,
2.728, 2.840, 2.633, 3.837, 3.090, 2.779, 2.594, 3.960,
2.771, 2.860, 2.676, 3.831, 2.986, 2.558, 2.810, 3.743,
3.054, 2.764, 2.985, 3.807, 3.046, 2.880, 2.689, 4.005,
3.013, 2.800, 2.937
),
temp = c(
16.766667, 11.433333, 9.400000, 14.533333, 17.033333, 11.966667, 8.633333, 13.900000,
15.800000, 10.600000, 9.700000, 13.766667, 17.033333, 11.333333, 10.200000, 14.866667,
16.266667, 11.900000, 9.266667, 13.900000, 17.300000, 11.400000, 8.733333, 13.966667,
18.033333, 12.400000, 9.300000, 14.100000, 16.533333, 11.100000, 9.733333, 15.300000,
18.400000, 11.033333, 9.700000
),
income = c(
48.064, 47.755, 47.878, 47.707, 48.226, 49.063, 49.322, 49.518,
49.714, 49.390, 49.683, 50.386, 50.405, 51.476, 52.527, 53.456,
54.309, 54.308, 54.811, 54.723, 55.254, 55.913, 56.472, 56.316,
58.013, 58.312, 58.744, 59.806, 59.881, 60.683, 61.164, 61.887,
61.969, 62.507, 63.054
),
index = quarter
)
df
#> # A tsibble: 35 x 4 [1Q]
#> quarter beer temp income
#> <qtr> <dbl> <dbl> <dbl>
#> 1 2010 Q1 3.30 16.8 48.1
#> 2 2010 Q2 2.83 11.4 47.8
#> 3 2010 Q3 2.71 9.4 47.9
#> 4 2010 Q4 3.93 14.5 47.7
#> 5 2011 Q1 3.19 17.0 48.2
#> 6 2011 Q2 2.98 12.0 49.1
#> 7 2011 Q3 2.86 8.63 49.3
#> 8 2011 Q4 3.79 13.9 49.5
#> 9 2012 Q1 2.73 15.8 49.7
#> 10 2012 Q2 2.84 10.6 49.4
#> # … with 25 more rows
train <- df %>% filter(year(quarter) <= 2016)
test <- df %>% filter(year(quarter) > 2016)
fc <- train %>%
model(TSLM(beer ~ temp + income + trend() + season())) %>%
forecast(new_data = test)
Created on 2020-04-29 by the reprex package (v0.3.0)
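The point about newdata needing future values can also be illustrated without either package. The sketch below swaps in a plain lm() with a hand-built trend and quarter factor, which is an assumption about what a trend-plus-seasonal-dummies regression amounts to, not the forecast or fable API. The future temp and income values are placeholders chosen only for illustration (the historical mean temp for that quarter, and income held at its last observed value); real forecasts need your own scenario for them.

```r
beer <- c(3.301, 2.826, 2.712, 3.934, 3.192, 2.975, 2.865, 3.789,
          2.728, 2.840, 2.633, 3.837, 3.090, 2.779, 2.594, 3.960,
          2.771, 2.860, 2.676, 3.831, 2.986, 2.558, 2.810, 3.743,
          3.054, 2.764, 2.985, 3.807, 3.046, 2.880, 2.689, 4.005,
          3.013, 2.800, 2.937)
temp <- c(16.766667, 11.433333, 9.400000, 14.533333, 17.033333, 11.966667,
          8.633333, 13.900000, 15.800000, 10.600000, 9.700000, 13.766667,
          17.033333, 11.333333, 10.200000, 14.866667, 16.266667, 11.900000,
          9.266667, 13.900000, 17.300000, 11.400000, 8.733333, 13.966667,
          18.033333, 12.400000, 9.300000, 14.100000, 16.533333, 11.100000,
          9.733333, 15.300000, 18.400000, 11.033333, 9.700000)
income <- c(48.064, 47.755, 47.878, 47.707, 48.226, 49.063, 49.322, 49.518,
            49.714, 49.390, 49.683, 50.386, 50.405, 51.476, 52.527, 53.456,
            54.309, 54.308, 54.811, 54.723, 55.254, 55.913, 56.472, 56.316,
            58.013, 58.312, 58.744, 59.806, 59.881, 60.683, 61.164, 61.887,
            61.969, 62.507, 63.054)

n       <- length(beer)
quarter <- rep(1:4, length.out = n)  # data start in Q1

hist_df <- data.frame(beer, temp, income,
                      trend  = seq_len(n),
                      season = factor(quarter))
fit <- lm(beer ~ temp + income + trend + season, data = hist_df)

# Build 8 FUTURE rows: trend continues, quarters roll on from 2018 Q3
h        <- 8
future_q <- (quarter[n] + seq_len(h) - 1) %% 4 + 1
temp_by_q <- tapply(hist_df$temp, hist_df$season, mean)

future_df <- data.frame(
  temp   = as.numeric(temp_by_q[as.character(future_q)]),  # placeholder scenario
  income = rep(income[n], h),                              # placeholder scenario
  trend  = n + seq_len(h),
  season = factor(future_q, levels = levels(hist_df$season))
)

fc <- predict(fit, newdata = future_df, interval = "prediction", level = 0.95)
fc
```

The key difference from the failing calls in the question is that future_df has exactly h rows and contains every predictor under the name used in the formula.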
I have a dataframe in R which has three columns Product_Name(name of books), Year and Units (number of units sold in that year) which looks like this:
Product_Name Year Units
A Modest Proposal 2011 10000
A Modest Proposal 2012 11000
A Modest Proposal 2013 12000
A Modest Proposal 2014 13000
Animal Farm 2011 8000
Animal Farm 2012 9000
Animal Farm 2013 11000
Animal Farm 2014 15000
Catch 22 2011 1000
Catch 22 2012 2000
Catch 22 2013 3000
Catch 22 2014 4000
....
I intend to build an R Shiny dashboard from this, with the year as a drop-down menu option, so I want the dataframe in the following format:
A Modest Proposal Animal Farm Catch 22
2011 10000 8000 1000
2012 11000 9000 2000
2013 12000 11000 3000
2014 13000 15000 4000
or the other way round where the Product Names are row indexes and Years are column indexes, either way goes.
How can I do this in R?
Your general issue is transforming long data to wide data. For this, you can use data.table's dcast function (amongst many others):
library(data.table)
dt = data.table(
Name = c(rep('A', 4), rep('B', 4), rep('C', 4)),
Year = c(rep(2011:2014, 3)),
Units = rnorm(12)
)
> dt
Name Year Units
1: A 2011 -0.26861318
2: A 2012 0.27194732
3: A 2013 -0.39331361
4: A 2014 0.58200101
5: B 2011 0.09885381
6: B 2012 -0.13786098
7: B 2013 0.03778400
8: B 2014 0.02576433
9: C 2011 -0.86682584
10: C 2012 -1.34319590
11: C 2013 0.10012673
12: C 2014 -0.42956207
> dcast(dt, Year ~ Name, value.var = 'Units')
Year A B C
1: 2011 -0.2686132 0.09885381 -0.8668258
2: 2012 0.2719473 -0.13786098 -1.3431959
3: 2013 -0.3933136 0.03778400 0.1001267
4: 2014 0.5820010 0.02576433 -0.4295621
For the next time, it is easier if you provide a reproducible example, so that the people assisting you do not have to manually recreate your data structure :)
You need to use pivot_wider from tidyr package. I assumed your data is saved in df and you also need dplyr package for %>% (piping)
library(tidyr)
library(dplyr)
df %>%
pivot_wider(names_from = Product_Name, values_from = Units)
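Applied to the rows shown in the question, that call could be tried as follows. The data frame is retyped by hand from the question's table, and wide and wide_by_book are illustrative names. The function is called directly here, so the pipe is not needed.

```r
library(tidyr)

df <- data.frame(
  Product_Name = rep(c("A Modest Proposal", "Animal Farm", "Catch 22"), each = 4),
  Year  = rep(2011:2014, times = 3),
  Units = c(10000, 11000, 12000, 13000,
            8000, 9000, 11000, 15000,
            1000, 2000, 3000, 4000)
)

# One row per year, one column per book
wide <- pivot_wider(df, names_from = Product_Name, values_from = Units)

# The other orientation: one row per book, one column per year
wide_by_book <- pivot_wider(df, names_from = Year, values_from = Units)

wide
```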
Assuming that your dataframe is ordered by Product_Name and by Year, I will generate artificial data similar to your dataframe. Try this:
Col_1 <- sort(rep(LETTERS[1:3], 4))
Col_2 <- rep(2011:2014, 3)
# artificial data
resp <- ceiling(rnorm(12, 5000, 500))
uu <- data.frame(Col_1, Col_2, resp)
uu
# output is
Col_1 Col_2 resp
1 A 2011 5297
2 A 2012 4963
3 A 2013 4369
4 A 2014 4278
5 B 2011 4721
6 B 2012 5021
7 B 2013 4118
8 B 2014 5262
9 C 2011 4601
10 C 2012 5013
11 C 2013 5707
12 C 2014 5637
>
> # Here starts
> output <- aggregate(uu$resp, list(uu$Col_1), function(x) {x})
> output
Group.1 x.1 x.2 x.3 x.4
1 A 5297 4963 4369 4278
2 B 4721 5021 4118 5262
3 C 4601 5013 5707 5637
>
output2 <- output [, -1]
colnames(output2) <- levels(as.factor(uu$Col_2))
rownames(output2) <- levels(as.factor(uu$Col_1))
# transpose the matrix
> t(output2)
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
> # or convert to data.frame
> as.data.frame(t(output2))
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
First and foremost - thank you for viewing my question - regardless of if you answer or not.
I am trying to add a column that contains the lagged values of the Quarter value to my DF, however, I get the below warning when I do so:
Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'yearqtr' elements may not preserve their attributes
Below is my sample data (my data starts on 1/3/2018)
Ticker Price Date Quarter
A 10 1/3/18 2018 Q1
A 13.5 2/15/18 2018 Q1
A 12.9 4/2/18 2018 Q2
A 11.2 5/3/18 2018 Q2
B 35.2 1/4/18 2018 Q1
B 33.1 3/2/18 2018 Q1
B 31 4/6/18 2018 Q2
... ... ... ...
XYZ 102 5/6/18 2018 Q2
I have a huge table with multiple stocks and multiple dates. The way I calculate the quarter column is :
df$quarter <- lag(as.yearqtr(df$Date))
However, I can't manage to add a column that lags the values of Quarter. Does anyone know a possible workaround?
I would like the below output:
Ticker Price Date Quarter Lag_Q
A 10 1/3/18 2018 Q1 NA
A 13.5 2/15/18 2018 Q1 NA
A 12.9 4/2/18 2018 Q2 2018 Q1
A 11.2 5/3/18 2018 Q2 2018 Q1
B 35.2 1/4/18 2018 Q1 NA
B 33.1 3/2/18 2018 Q1 NA
B 31 4/6/18 2018 Q2 2018 Q1
... ... ... ...
XYZ 102 5/6/18 2018 Q2 2018 Q1
Firstly, I'd suggest organizing your data so that each column holds the prices of an individual security and each row is a specific date. From there, you can transform all securities easily, but I'm not sure what your end goal is. The xts package is excellent, is implemented in optimized C, and is something of a standard in the securities industry. I highly suggest exploring it. But that's beyond the scope of your post!
For your data structure though, a single line should do:
df$lag_Q <- as.yearqtr( ifelse(test = (df$quarter=="2018 Q1"),
yes = NA,
no = df$quarter-0.25) )
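The line above hardcodes "2018 Q1" as the first quarter. A sketch that derives each ticker's first quarter instead might look like this. It assumes the lagged label is simply "one quarter earlier" whenever the row is not in that ticker's first quarter, as in the desired output; q_num, first_q and lag_num are illustrative names, and the arithmetic is done on plain numbers so that ifelse() has no class attributes to drop.

```r
library(zoo)

df <- data.frame(
  Ticker = c("A", "A", "A", "A", "B", "B", "B"),
  Price  = c(10, 13.5, 12.9, 11.2, 35.2, 33.1, 31)
)
df$Quarter <- as.yearqtr(c("2018 Q1", "2018 Q1", "2018 Q2", "2018 Q2",
                           "2018 Q1", "2018 Q1", "2018 Q2"))

q_num   <- as.numeric(df$Quarter)             # e.g. 2018.25 for 2018 Q2
first_q <- ave(q_num, df$Ticker, FUN = min)   # first quarter seen per ticker

# NA in a ticker's first quarter, otherwise step back one quarter (0.25 years)
lag_num  <- ifelse(q_num == first_q, NA, q_num - 0.25)
df$Lag_Q <- as.yearqtr(lag_num)
df
```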
I have a data set that contains quarterly data for 8 years. If I randomly select each quarter from one of the years I could in theory construct a "new" year. For example: new year = Q1(2009), Q2(2012), Q3(2010), Q4(2015).
The problem I have, is that I would like to construct a data set that contains all such permutations. With 8 years and 4 quarters that would give me 4^8= 65536 "new" years. Is this something best tackled with a nested loop, or are there functions out there that could work better?
We can use expand.grid to create a matrix of all possible combinations:
nrow(do.call('expand.grid', replicate(8, 1:4, simplify=FALSE)))
[1] 65536
I think you want combinations of the 8 years over 4 quarters so the number of combinations is 8^4 = 4096:
> x <- years <- 2008:2015
> length(x)
[1] 8
> comb <- expand.grid(x, x, x, x)
> head(comb)
Var1 Var2 Var3 Var4
1 2008 2008 2008 2008
2 2009 2008 2008 2008
3 2010 2008 2008 2008
4 2011 2008 2008 2008
5 2012 2008 2008 2008
6 2013 2008 2008 2008
> tail(comb)
Var1 Var2 Var3 Var4
4091 2010 2015 2015 2015
4092 2011 2015 2015 2015
4093 2012 2015 2015 2015
4094 2013 2015 2015 2015
4095 2014 2015 2015 2015
4096 2015 2015 2015 2015
> nrow(comb)
[1] 4096
Each row is a year and Var1, Var2, Var3, Var4 are the 4 quarters.
You may want to wait a bit to see if someone gives you a less 'janky' answer, but this example takes a time series, takes all permutations with no repeated quarters inside of each year, and returns those new years values with the old year and quarters info as columns.
set.seed(1234)
# Make some fake data
q_dat <- data.frame(year = c(rep(2011,4),
rep(2012,4),
rep(2013,4)),
quarters = rep(c("Q1","Q2","Q3","Q4"),3),
x = rnorm(12))
q_dat
year quarters x
1 2011 Q1 -1.2070657
2 2011 Q2 0.2774292
3 2011 Q3 1.0844412
4 2011 Q4 -2.3456977
5 2012 Q1 0.4291247
6 2012 Q2 0.5060559
7 2012 Q3 -0.5747400
8 2012 Q4 -0.5466319
9 2013 Q1 -0.5644520
10 2013 Q2 -0.8900378
11 2013 Q3 -0.4771927
12 2013 Q4 -0.9983864
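So what we are going to do is:
1. Take all possible combinations of four rows of the time series.
2. Remove every combination that uses the same row more than once. Note that this does not stop Q1 values from two different years landing in the same made-up year, as the example output further down shows.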
# Expand out all possible combinations of our three years
q_perms <- expand.grid(q1 = 1:nrow(q_dat), q2 = 1:nrow(q_dat) ,
q3 = 1:nrow(q_dat), q4 = 1:nrow(q_dat))
# remove any duplicate combinations
# EX: So we don't get c(2011Q1,2011Q1,2011Q1,2011Q1) as a year
q_perms <- q_perms[apply(q_perms,1,function(x) !any(duplicated(x))),]
# Transpose the grid, remake it as a data frame, and lapply over it
l_rand_dat <- lapply(data.frame(t(q_perms)),function(x) q_dat[x,])
# returns one unique year per list
l_rand_dat[[30]]
year quarters x
5 2012 Q1 0.4291247
6 2012 Q2 0.5060559
2 2011 Q2 0.2774292
1 2011 Q1 -1.2070657
# bind all of those together
rand_bind <- do.call(rbind,l_rand_dat)
head(rand_bind)
year quarters x
X172.4 2011 Q4 -2.3456977
X172.3 2011 Q3 1.0844412
X172.2 2011 Q2 0.2774292
X172.1 2011 Q1 -1.2070657
X173.5 2012 Q1 0.4291247
X173.3 2011 Q3 1.0844412
This is a pretty memory intensive answer. If someone can skip the 'make all possible combinations' step then that would be a significant improvement.
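One way to skip the 'make all possible combinations' step, sketched under the assumption that every made-up year should contain exactly one Q1, Q2, Q3 and Q4: choose a source year for each quarter with expand.grid and look the values up directly. With the three years of fake data this builds 3^4 = 81 rows rather than filtering 12^4 = 20736 candidates. The names year_grid, lookup and new_years are illustrative.

```r
set.seed(1234)
q_dat <- data.frame(year     = rep(2011:2013, each = 4),
                    quarters = rep(c("Q1", "Q2", "Q3", "Q4"), 3),
                    x        = rnorm(12))

years <- unique(q_dat$year)

# One column per quarter; each cell says which year that quarter is taken from
year_grid <- expand.grid(Q1 = years, Q2 = years, Q3 = years, Q4 = years)

# Value of x for the given source years and a single quarter label
lookup <- function(yr, qtr) {
  q_dat$x[match(paste(yr, qtr), paste(q_dat$year, q_dat$quarters))]
}

# Same shape as year_grid, but holding the x values instead of the years
new_years <- as.data.frame(Map(lookup, year_grid, names(year_grid)))

nrow(new_years)
head(cbind(year_grid, new_years), 3)
```

Keeping year_grid alongside new_years preserves the information about which original year each quarter came from.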