All permutations of quarterly data - r

I have a data set that contains quarterly data for 8 years. If I randomly select each quarter from one of the years I could in theory construct a "new" year. For example: new year = Q1(2009), Q2(2012), Q3(2010), Q4(2015).
The problem is that I would like to construct a data set that contains all such permutations. With 8 years and 4 quarters that would give me 4^8= 65536 "new" years. Is this something best tackled with a nested loop, or are there functions out there that could work better?

We can use expand.grid to create a data frame of all possible combinations:
nrow(do.call('expand.grid', replicate(8, 1:4, simplify=FALSE)))
[1] 65536

I think you want combinations of the 8 years over 4 quarters so the number of combinations is 8^4 = 4096:
> x <- years <- 2008:2015
> length(x)
[1] 8
> comb <- expand.grid(x, x, x, x)
> head(comb)
Var1 Var2 Var3 Var4
1 2008 2008 2008 2008
2 2009 2008 2008 2008
3 2010 2008 2008 2008
4 2011 2008 2008 2008
5 2012 2008 2008 2008
6 2013 2008 2008 2008
> tail(comb)
Var1 Var2 Var3 Var4
4091 2010 2015 2015 2015
4092 2011 2015 2015 2015
4093 2012 2015 2015 2015
4094 2013 2015 2015 2015
4095 2014 2015 2015 2015
4096 2015 2015 2015 2015
> nrow(comb)
[1] 4096
Each row is one "new" year, and Var1, Var2, Var3, Var4 give the source year used for each of the 4 quarters.
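To turn those year combinations into actual values, one option is to keep the data as a year-by-quarter matrix and look up each quarter's value by year name. This is only a sketch: the vals matrix is filled with made-up numbers, and the object names are my own rather than anything from the question.

```r
years <- 2008:2015
set.seed(1)

# Hypothetical data: vals[i, j] is the value for year i, quarter j
vals <- matrix(rnorm(length(years) * 4), nrow = length(years),
               dimnames = list(years, paste0("Q", 1:4)))

# One column per quarter; each entry is the year that quarter is taken from
comb <- expand.grid(Q1 = years, Q2 = years, Q3 = years, Q4 = years)

# For each quarter j, pick the value from the chosen source year
new_years <- sapply(1:4, function(j) vals[as.character(comb[[j]]), j])
colnames(new_years) <- colnames(vals)

dim(new_years)
```

Each row of new_years then holds the four quarterly values of one constructed year, in the same row order as comb.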

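You may want to wait a bit to see if someone gives you a less 'janky' answer, but this example takes a time series, builds every combination of four distinct observations (rows), and returns the values of those new years with the original year and quarter as columns. Note that it only rules out reusing the same row; a made-up year can still contain the same quarter label twice, as the l_rand_dat[[30]] output further down shows.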
set.seed(1234)
# Make some fake data
q_dat <- data.frame(year = c(rep(2011,4),
                             rep(2012,4),
                             rep(2013,4)),
                    quarters = rep(c("Q1","Q2","Q3","Q4"),3),
                    x = rnorm(12))
q_dat
year quarters x
1 2011 Q1 -1.2070657
2 2011 Q2 0.2774292
3 2011 Q3 1.0844412
4 2011 Q4 -2.3456977
5 2012 Q1 0.4291247
6 2012 Q2 0.5060559
7 2012 Q3 -0.5747400
8 2012 Q4 -0.5466319
9 2013 Q1 -0.5644520
10 2013 Q2 -0.8900378
11 2013 Q3 -0.4771927
12 2013 Q4 -0.9983864
So what we are going to do is:
1. Take all possible combinations of rows of the time series.
2. Remove every combination that uses the same row more than once, so no made-up year contains the same observation twice.
# Expand out all possible combinations of our three years
q_perms <- expand.grid(q1 = 1:nrow(q_dat), q2 = 1:nrow(q_dat) ,
q3 = 1:nrow(q_dat), q4 = 1:nrow(q_dat))
# remove any duplicate combinations
# EX: So we don't get c(2011Q1,2011Q1,2011Q1,2011Q1) as a year
q_perms <- q_perms[apply(q_perms,1,function(x) !any(duplicated(x))),]
# Transpose the grid, remake it as a data frame, and lapply over it
l_rand_dat <- lapply(data.frame(t(q_perms)),function(x) q_dat[x,])
# returns one made-up year per list element
l_rand_dat[[30]]
year quarters x
5 2012 Q1 0.4291247
6 2012 Q2 0.5060559
2 2011 Q2 0.2774292
1 2011 Q1 -1.2070657
# bind all of those together
rand_bind <- do.call(rbind,l_rand_dat)
head(rand_bind)
year quarters x
X172.4 2011 Q4 -2.3456977
X172.3 2011 Q3 1.0844412
X172.2 2011 Q2 0.2774292
X172.1 2011 Q1 -1.2070657
X173.5 2012 Q1 0.4291247
X173.3 2011 Q3 1.0844412
This is a pretty memory intensive answer. If someone can skip the 'make all possible combinations' step then that would be a significant improvement.
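One way to skip the full grid might be to build the combinations per quarter, so that every made-up year has exactly one Q1, one Q2, one Q3 and one Q4. A rough sketch, reusing the fake q_dat from above (idx_by_q and first_year are names I made up):

```r
set.seed(1234)
q_dat <- data.frame(year = rep(2011:2013, each = 4),
                    quarters = rep(c("Q1", "Q2", "Q3", "Q4"), 3),
                    x = rnorm(12))

# Row numbers of q_dat, grouped by quarter label
idx_by_q <- split(seq_len(nrow(q_dat)), q_dat$quarters)

# One row index per quarter: 3^4 = 81 combinations instead of 12^4
q_perms <- expand.grid(idx_by_q)

# Look up one made-up year by its row indices
first_year <- q_dat[unlist(q_perms[1, ]), ]
nrow(q_perms)
```

With 8 years of data the same grid would have 8^4 = 4096 rows, and nothing has to be filtered out afterwards.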

Related

R: Count number of new observations compared to a previous groups

I would like to know the number of new observations that occurred between groups.
If I have the following data:
Year Observation
2009 A
2009 A
2009 B
2010 A
2010 B
2010 C
I would like the output to be
Year New_Observation_Count
2009 2
2010 1
I am new to R and don't really know how to move forward. I have tried using the count function in the tidyverse package but still can't figure it out.
You can use union in Reduce:
y <- split(x$Observation, x$Year)
data.frame(Year = names(y), nNew =
diff(lengths(Reduce(union, y, NULL, accumulate = TRUE))))
# Year nNew
#1 2009 2
#2 2010 1
Data:
x <- read.table(header=TRUE, text="Year Observation
2009 A
2009 A
2009 B
2010 A
2010 B
2010 C")
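Another way to look at it, assuming the rows are already sorted by Year: an observation is "new" the first time it appears, so you can flag first occurrences with duplicated and sum the flags per year. This is just a sketch; is_new, n_new and res are illustrative names.

```r
x <- data.frame(Year = c(2009, 2009, 2009, 2010, 2010, 2010),
                Observation = c("A", "A", "B", "A", "B", "C"))

# TRUE the first time each observation value is seen (assumes rows sorted by Year)
is_new <- !duplicated(x$Observation)

# Count the first occurrences within each year
n_new <- tapply(is_new, x$Year, sum)
res <- data.frame(Year = names(n_new), nNew = as.vector(n_new))
res
```

If the data is not sorted by Year, it would need to be ordered first, otherwise the "first occurrence" could land in the wrong year.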

How can I change row and column indexes of a dataframe in R?

I have a dataframe in R which has three columns Product_Name(name of books), Year and Units (number of units sold in that year) which looks like this:
Product_Name Year Units
A Modest Proposal 2011 10000
A Modest Proposal 2012 11000
A Modest Proposal 2013 12000
A Modest Proposal 2014 13000
Animal Farm 2011 8000
Animal Farm 2012 9000
Animal Farm 2013 11000
Animal Farm 2014 15000
Catch 22 2011 1000
Catch 22 2012 2000
Catch 22 2013 3000
Catch 22 2014 4000
....
I intend to build an R Shiny dashboard from this data, with the year as a drop-down menu option, so I would like the dataframe in the following format:
A Modest Proposal Animal Farm Catch 22
2011 10000 8000 1000
2012 11000 9000 2000
2013 12000 11000 3000
2014 13000 15000 4000
or the other way round where the Product Names are row indexes and Years are column indexes, either way goes.
How can I do this in R?
Your general issue is transforming long data to wide data. For this, you can use data.table's dcast function (amongst many others):
library(data.table)
dt = data.table(
  Name = c(rep('A', 4), rep('B', 4), rep('C', 4)),
  Year = c(rep(2011:2014, 3)),
  Units = rnorm(12)
)
> dt
Name Year Units
1: A 2011 -0.26861318
2: A 2012 0.27194732
3: A 2013 -0.39331361
4: A 2014 0.58200101
5: B 2011 0.09885381
6: B 2012 -0.13786098
7: B 2013 0.03778400
8: B 2014 0.02576433
9: C 2011 -0.86682584
10: C 2012 -1.34319590
11: C 2013 0.10012673
12: C 2014 -0.42956207
> dcast(dt, Year ~ Name, value.var = 'Units')
Year A B C
1: 2011 -0.2686132 0.09885381 -0.8668258
2: 2012 0.2719473 -0.13786098 -1.3431959
3: 2013 -0.3933136 0.03778400 0.1001267
4: 2014 0.5820010 0.02576433 -0.4295621
For the next time, it is easier if you provide a reproducible example, so that the people assisting you do not have to manually recreate your data structure :)
You can use pivot_wider from the tidyr package. I assumed your data is saved in df; dplyr is loaded here for %>% (the pipe).
library(tidyr)
library(dplyr)
df %>%
pivot_wider(names_from = Product_Name, values_from = Units)
Assuming that your dataframe is ordered by Product_Name and by year, I will generate artificial data similar to your dataframe. Try this:
Col_1 <- sort(rep(LETTERS[1:3], 4))
Col_2 <- rep(2011:2014, 3)
# artificial data
resp <- ceiling(rnorm(12, 5000, 500))
uu <- data.frame(Col_1, Col_2, resp)
uu
# output is
Col_1 Col_2 resp
1 A 2011 5297
2 A 2012 4963
3 A 2013 4369
4 A 2014 4278
5 B 2011 4721
6 B 2012 5021
7 B 2013 4118
8 B 2014 5262
9 C 2011 4601
10 C 2012 5013
11 C 2013 5707
12 C 2014 5637
>
> # Here starts
> output <- aggregate(uu$resp, list(uu$Col_1), function(x) {x})
> output
Group.1 x.1 x.2 x.3 x.4
1 A 5297 4963 4369 4278
2 B 4721 5021 4118 5262
3 C 4601 5013 5707 5637
>
output2 <- output [, -1]
colnames(output2) <- levels(as.factor(uu$Col_2))
rownames(output2) <- levels(as.factor(uu$Col_1))
# transpose the matrix
> t(output2)
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
> # or convert to data.frame
> as.data.frame(t(output2))
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
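For completeness, a compact base R variant is to cross-tabulate with tapply, which gives a matrix with years as row names and product names as column names. This is only a sketch; df is rebuilt here from the numbers shown in the question, and sum is used on the assumption that there is one row per product and year.

```r
df <- data.frame(
  Product_Name = rep(c("A Modest Proposal", "Animal Farm", "Catch 22"), each = 4),
  Year = rep(2011:2014, times = 3),
  Units = c(10000, 11000, 12000, 13000,
            8000, 9000, 11000, 15000,
            1000, 2000, 3000, 4000)
)

# Rows = Year, columns = Product_Name; sum() just passes single values through
wide <- tapply(df$Units, list(df$Year, df$Product_Name), sum)
wide
```

Swapping the two grouping vectors in the list gives the transposed layout, with products as rows and years as columns.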

compute rate by group

I have data for which I want to calculate the growth rate relative to the same quarter of the previous year.
# dt
yq A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545
str(dt)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ yq : 'yearqtr' num 2013 Q1 2013 Q2 2013 Q3 2013 Q4 ...
$ A : int 35233684 36235895 36767497 37273346 37788578 38674955
$ B: int 270950851 274194641 275614372 277125049 278202677 281025545
- attr(*, ".internal.selfref")=<externalptr>
The code I use:
dt[, lapply(.SD, function(x)x/shift(x) - 1), .SDcols = 2:3, by = .(quarter(yq))]
quarter A B
1 NA NA
1 0.07251283 0.02676436
2 NA NA
2 0.06731060 0.02491261
3 NA NA
4 NA NA
I got the result; however, I want the output formatted as shown below.
I want it to keep the column yq and stay ordered by year and quarter.
yq A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545
yq A B A_R B_R
2013 Q1 35233684 270950851 NA NA
2013 Q2 36235895 274194641 NA NA
2013 Q3 36767497 275614372 NA NA
2013 Q4 37273346 277125049 NA NA
2014 Q1 37788578 278202677 0.07251283 0.02676436
2014 Q2 38674955 281025545 0.06731060 0.02491261
How should I edit my code?
# Data
library(data.table)
dt <- fread("yq A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545", header = TRUE)
So I see you are using the zoo package and the function yearqtr. I am unable to get the yq column read using your fread but I just quickly reproduced the data as follows:
library(zoo)
dt<-data.table(cbind(yq=2013 + seq(0,5)/4,
A = c(35233684, 36235895, 36767497, 37273346, 37788578, 38674955),
B = c(270950851, 274194641, 275614372, 277125049, 278202677, 281025545)))
Then just converted the yq as follows:
dt[,yq:=as.yearqtr(yq)]
Now if you want to keep that column you will need to update the columns by specifying them:
cols<-c("A","B")
dt[,eval(cols):=lapply(.SD,function(x)x/shift(x) - 1), .SDcols = 2:3, by = .(quarter(yq))]
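So simply add as many columns as you need to the cols vector, and use eval so data.table will not create a new column named "cols"! Note that this overwrites A and B with the growth rates; to keep the original values, assign to new names instead, for example paste0(cols, "_R") on the left-hand side of :=. Does this answer your question?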
I am not familiar with data.table package. But here is how I would do it using dplyr.
You can first separate your yq column into two columns, y and q. I skipped this step in my code because I don't know what exact datatype you used in the original data.
Then group by q to do the calculation.
library(data.table)
dt <- fread(
"y q A B
2013 Q1 35233684 270950851
2013 Q2 36235895 274194641
2013 Q3 36767497 275614372
2013 Q4 37273346 277125049
2014 Q1 37788578 278202677
2014 Q2 38674955 281025545", header = T)
library(tidyverse)
dt%>%group_by(q)%>%
arrange(y)%>%
mutate(growth_rate_over_year_A= A/lag(A)-1,
growth_rate_over_year_B= B/lag(B)-1)%>%
ungroup
output:
# A tibble: 6 x 6
y q A B growth_rate_over_year_A growth_rate_over_year_B
<int> <chr> <int> <int> <dbl> <dbl>
1 2013 Q1 35233684 270950851 NA NA
2 2013 Q2 36235895 274194641 NA NA
3 2013 Q3 36767497 275614372 NA NA
4 2013 Q4 37273346 277125049 NA NA
5 2014 Q1 37788578 278202677 0.0725 0.0268
6 2014 Q2 38674955 281025545 0.0673 0.0249
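If you would rather avoid extra packages, the same year-over-year rate can be sketched in base R with ave, which applies a function within each quarter and returns the result in the original row order. The assumptions here are mine: the year and quarter are stored as separate numeric columns, the rows are in chronological order, and yoy is a helper name I made up.

```r
dt <- data.frame(
  year = c(2013, 2013, 2013, 2013, 2014, 2014),
  qtr = c(1, 2, 3, 4, 1, 2),
  A = c(35233684, 36235895, 36767497, 37273346, 37788578, 38674955),
  B = c(270950851, 274194641, 275614372, 277125049, 278202677, 281025545)
)

# Growth relative to the previous value within a group (NA for the first one)
yoy <- function(x) x / c(NA, head(x, -1)) - 1

# Group by quarter, so each value is compared with the same quarter a year earlier
dt$A_R <- ave(dt$A, dt$qtr, FUN = yoy)
dt$B_R <- ave(dt$B, dt$qtr, FUN = yoy)
dt
```

The original A and B columns are left untouched, and the rows keep their year and quarter order.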

Aggregating based on previous year and this year

I have this data set
month Year Rain
10 2010 376.8
11 2010 282.78
12 2010 324.58
1 2011 73.51
2 2011 225.89
3 2011 22.96
I used
df2prnext<-
aggregate(Rain~Year, data = subdataprnext, mean)
but I need the mean value of 217.53.
I am not getting the expected result. Thank you for your help.
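aggregate(Rain~Year, ...) returns one mean per calendar year, so October to December 2010 and January to March 2011 end up in different groups. A possible approach, assuming a season runs from October to March and is labelled by the year in which it ends (that labelling and the Season column name are my assumptions), is to build a season key first and aggregate on that. For the six rows shown, the overall mean works out to about 217.75.

```r
rain <- data.frame(
  month = c(10, 11, 12, 1, 2, 3),
  Year = c(2010, 2010, 2010, 2011, 2011, 2011),
  Rain = c(376.8, 282.78, 324.58, 73.51, 225.89, 22.96)
)

# Assumption: Oct-Dec belong to the season that ends in the following year
rain$Season <- ifelse(rain$month >= 10, rain$Year + 1, rain$Year)

# One mean per season instead of one per calendar year
season_mean <- aggregate(Rain ~ Season, data = rain, FUN = mean)
season_mean
```

If the real data contains months outside October to March, the ifelse rule would need to be adjusted to whatever season definition applies.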

import txt file with desired data structure in R

The txt is like
#---*----1----*----2----*---
Name Time.Period Value
A Jan 2013 10
B Jan 2013 11
C Jan 2013 12
A Feb 2013 9
B Feb 2013 11
C Feb 2013 15
A Mar 2013 10
B Mar 2013 8
C Mar 2013 13
I tried to use read.table with readLines and count.fields as shown below:
> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
V1 V2 V3 V4
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
The problem is that it gives four attributes instead of three. I therefore manipulated my data as shown below, but I am looking for an alternative.
> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
Name Time.Period Value
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
You can use read.fwf to read fixed width files. You need to correctly specify the width of each column, in spaces.
data <- read.fwf(path, widths=c(-12, 8, -4, 2), header=T)
The key there is how you specify the width. Negative means skip that many places, positive means read that many. I am assuming entries in the last column have only 2 digits. Change widths accordingly if this is not the case. You will probably also have to fix the column names.
You will have to change the indices if the file format changes, or come up with some clever regexp to read it from the first few rows. A better solution would be to enclose your strings in " or, even better, avoid the format altogether.
?count.fields
As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file read. Filtering with count.fields(path, blank.lines.skip=FALSE) == 4 therefore drops the header row, which actually has only three fields.
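A small sketch of that behaviour, using an in-memory copy of the first few lines instead of the real file (txt, con and the column names Month and Year are my own choices): the header has three fields and the data lines have four, so you can read the four-field lines with explicit column names and paste the month and year back together.

```r
txt <- c(
  "Name Time.Period Value",
  "A Jan 2013 10",
  "B Jan 2013 11",
  "C Jan 2013 12"
)

# Count whitespace-separated fields per line: 3 for the header, 4 for the data
con <- textConnection(txt)
n_fields <- count.fields(con)
close(con)

# Read only the data lines and name the four columns explicitly
raw <- read.table(text = txt[n_fields == 4],
                  col.names = c("Name", "Month", "Year", "Value"),
                  stringsAsFactors = FALSE)

# Rebuild the three-column layout from the header
raw$Time.Period <- paste(raw$Month, raw$Year)
result <- raw[, c("Name", "Time.Period", "Value")]
result
```

Time.Period is left as a character column here; it could be passed to as.yearmon from zoo afterwards, as in the question.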
