How do I lag Quarters in R?

First and foremost - thank you for viewing my question - regardless of if you answer or not.
I am trying to add a column that contains the lagged values of the Quarter value to my DF, however, I get the below warning when I do so:
Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'yearqtr' elements may not preserve their attributes
Below is my sample data (my data starts on 1/3/2018)
Ticker Price Date Quarter
A 10 1/3/18 2018 Q1
A 13.5 2/15/18 2018 Q1
A 12.9 4/2/18 2018 Q2
A 11.2 5/3/18 2018 Q2
B 35.2 1/4/18 2018 Q1
B 33.1 3/2/18 2018 Q1
B 31 4/6/18 2018 Q2
... ... ... ...
XYZ 102 5/6/18 2018 Q2
I have a huge table with multiple stocks and multiple dates. The way I calculate the quarter column is :
df$quarter <- lag(as.yearqtr(df$Date))
However, I can't manage to add a column that lags the values of the Quarter. Would anyone know a possible workaround?
I would like the below output:
Ticker Price Date Quarter Lag_Q
A 10 1/3/18 2018 Q1 NA
A 13.5 2/15/18 2018 Q1 NA
A 12.9 4/2/18 2018 Q2 2018 Q1
A 11.2 5/3/18 2018 Q2 2018 Q1
B 35.2 1/4/18 2018 Q1 NA
B 33.1 3/2/18 2018 Q1 NA
B 31 4/6/18 2018 Q2 2018 Q1
... ... ... ...
XYZ 102 5/6/18 2018 Q2 2018 Q1

Firstly, I'd suggest organizing your data so that each column holds the prices of an individual security and each row is a specific date. From there you can transform all securities easily, though I'm not sure what your end goal is. The xts package is excellent, has been optimized in C, and is more or less the standard in the securities industry. I highly suggest exploring it, but that's beyond the scope of your post.
For your data structure though, a single line should do:
df$lag_Q <- as.yearqtr( ifelse(test = (df$quarter=="2018 Q1"),
yes = NA,
no = df$quarter-0.25) )
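The line above hard-codes "2018 Q1" as the first quarter. As a rough sketch of a more general variant, assuming the zoo package is installed and that a lag should only be filled in when the same ticker actually has rows in the previous quarter (my reading of the desired output, not something the question states), plain yearqtr arithmetic could look like this. The sample rows are copied from the question; the helper names prev_q and has_prev are just illustrative.

```r
library(zoo)  # provides as.yearqtr(); assumed to be installed

# Sample rows copied from the question
df <- data.frame(
  Ticker = c("A", "A", "A", "A", "B", "B", "B"),
  Price  = c(10, 13.5, 12.9, 11.2, 35.2, 33.1, 31),
  Date   = c("1/3/18", "2/15/18", "4/2/18", "5/3/18", "1/4/18", "3/2/18", "4/6/18"),
  stringsAsFactors = FALSE
)

# Quarter of each trade date
df$Quarter <- as.yearqtr(as.Date(df$Date, format = "%m/%d/%y"))

# A yearqtr is stored as year + (quarter - 1) / 4,
# so subtracting 0.25 steps back exactly one quarter
prev_q <- df$Quarter - 0.25

# Assumption: keep the lag only if the same ticker has data in that previous quarter
has_prev <- paste(df$Ticker, format(prev_q)) %in% paste(df$Ticker, format(df$Quarter))
prev_q[!has_prev] <- NA

df$Lag_Q <- prev_q
```

Because this never goes through dplyr's mutate(), it should also sidestep the "may not preserve their attributes" warning from the question, though I have not checked that against every dplyr version.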

Related

In R - How do I plot the data from the row where date == "2020 Q1" to the last row of the data frame?

example of the data:
Date
Inflation
2020 Q1
2
2020 Q2
2.1
2020 Q3
2
2020 Q4
2.1
I am using ggplot: ggplot(CPI,aes(x=date,y=inflation,group=1))+geom_line(), but I would like to plot only from whichever row contains date == "2020 Q1" until the last row of the set (the number of rows will change every time, as I am web scraping the data from a website).
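One possible approach, sketched in base R under a few assumptions: the columns are named date and inflation as in the ggplot call, date is stored as text, and the label "2020 Q1" is always present. The two 2019 rows below are made-up placeholders so the subset has something to drop; only the 2020 rows come from the question.

```r
# Stand-in for the scraped table; the 2019 rows are invented for illustration
CPI <- data.frame(
  date      = c("2019 Q3", "2019 Q4", "2020 Q1", "2020 Q2", "2020 Q3", "2020 Q4"),
  inflation = c(1.8, 1.9, 2, 2.1, 2, 2.1),
  stringsAsFactors = FALSE
)

# Position of the first row whose date equals "2020 Q1"
start <- match("2020 Q1", CPI$date)

# Keep everything from that row to the end, however many rows were scraped
# (assumes start is not NA, i.e. the label was found)
CPI_sub <- CPI[start:nrow(CPI), ]

# CPI_sub can then replace CPI in the existing call:
# ggplot(CPI_sub, aes(x = date, y = inflation, group = 1)) + geom_line()
```

Since the end of the range is nrow(CPI), nothing needs to change when the scraped table grows.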

Aggregating based on previous year and this year

I have this data set:
month Year Rain
10 2010 376.8
11 2010 282.78
12 2010 324.58
1 2011 73.51
2 2011 225.89
3 2011 22.96
I used
df2prnext <- aggregate(Rain ~ Year, data = subdataprnext, mean)
but I need the mean value of 217.53.
I am not getting the expected result. Thank you for your help.
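aggregate(Rain ~ Year, ...) returns one mean per calendar year, so the six rows end up in two groups. A minimal sketch of one way to average across the year boundary, assuming October to December should be counted with the following year (the Season column and the month >= 10 cutoff are my own assumptions, not from the question):

```r
# Rows copied from the question
subdataprnext <- data.frame(
  month = c(10, 11, 12, 1, 2, 3),
  Year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  Rain  = c(376.8, 282.78, 324.58, 73.51, 225.89, 22.96)
)

# Hypothetical season label: Oct-Dec rows are assigned to the next calendar year
subdataprnext$Season <- ifelse(subdataprnext$month >= 10,
                               subdataprnext$Year + 1,
                               subdataprnext$Year)

# One mean per season instead of one per calendar year
df2prnext <- aggregate(Rain ~ Season, data = subdataprnext, FUN = mean)
```

For the six rows shown this gives a single group whose mean is about 217.75, which is close to, though not exactly, the 217.53 quoted in the question.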

Combining unequal data frames and applying a calculation

I've been doing some data cleaning and regressions, and now I would like to apply the output. However, I'm stuck on the following problem.
One data frame is called "Historical" and looks like this:
Year Value
2014 5
2015 7.5
2016 11
The other data frame is called "forecast" and looks like this (new years in the future):
Year Growth
2017 0.05
2018 0.11
etc
So I would like to have one data frame to show historical values and forecasted values starting in 2017 (11*1.05)
How can I go about this?
Much appreciated
Given
a <- read.table(header=T, text="Year Value
2014 5
2015 7.5
2016 11")
b <- read.table(header=T, text="
Year Growth
2017 0.05
2018 0.11")
You could e.g. do
rbind(a, cbind(
Year=b$Year,
Value=cumprod(c(tail(a$Value, 1), 1+b$Growth))[-1])
)
# Year Value
# 1 2014 5.0000
# 2 2015 7.5000
# 3 2016 11.0000
# 4 2017 11.5500
# 5 2018 12.8205

Boxplot not plotting all data

I'm trying to plot a boxplot for a time series (e.g. http://www.r-graph-gallery.com/146-boxplot-for-time-series/) and can get every other example to work, bar my last one. I have averages per month for six years (2011 to 2016) and have data for 2014 and 2015 (albeit in small quantities), but for some reason, boxes aren't being shown for the 2014 and 2015 data.
My input data has three columns: year, month and residency index (a value between 0 and 1). There are multiple individuals (in this example, 37) each with an average residency index per month per year (including 2014 and 2015).
For example:
year month RI
2015 1 NA
2015 2 NA
2015 3 NA
2015 4 NA
2015 5 NA
2015 6 NA
2015 7 0.387096774
2015 8 0.580645161
2015 9 0.3
2015 10 0.225806452
2015 11 0.3
2015 12 0.161290323
2016 1 0.096774194
2016 2 0.103448276
2016 3 0.161290323
2016 4 0.366666667
2016 5 0.258064516
2016 6 0.266666667
2016 7 0.387096774
2016 8 0.129032258
2016 9 0.133333333
2016 10 0.032258065
2016 11 0.133333333
2016 12 0.129032258
which is repeated for each individual fish.
My code:
#make boxplot
boxplot(RI$RI~RI$month+RI$year,
xaxt="n",xlab="",col=my_colours,pch=20,cex=0.3,ylab="Residency Index (RI)", ylim=c(0,1))
abline(v=seq(0,12*6,12)+0.5,col="grey")
axis(1,labels=unique(RI$year),at=seq(6,12*6,12))
The average trend line works as per the other examples.
a=aggregate(RI$RI,by=list(RI$month,RI$year),mean, na.rm=TRUE)
lines(a[,3],type="l",col="red",lwd=2)
Any help on this matter would be greatly appreciated.
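Your problem seems to be the presence of missing values (NA) in your data; the other values are plotted correctly. I've simplified your code a bit. Note that the formula interface of aggregate drops rows where RI is NA, so in the sample shown the six missing months of 2015 are absent from a, and the trend line is padded with six leading NAs to line it up again.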
boxplot(RI$RI ~ RI$month + RI$year,
ylab="Residency Index (RI)")
a <- aggregate(RI ~ month + year, data = RI, FUN = mean, na.rm = TRUE)
lines(c(rep(NA, 6), a[,3]), type="l", col="red", lwd=2)
Also, I believe that maybe a boxplot is not the best way to depict your data. You only have one value per year/month, when a boxplot would require more. Maybe a simple scatter plot will do better.

How to find unique field values from two columns in data frame

I have a data frame containing many columns, including Quarter and CustomerID. In this I want to identify the unique combinations of Quarter and CustomerID.
For example:
masterdf <- read.csv(text = "
Quarter, CustomerID, ProductID
2009 Q1, 1234, 1
2009 Q1, 1234, 2
2009 Q2, 1324, 3
2009 Q3, 1234, 4
2009 Q3, 1234, 5
2009 Q3, 8764, 6
2009 Q4, 5432, 7")
What I want is:
FilterQuarter UniqueCustomerID
2009 Q1 1234
2009 Q2 1324
2009 Q3 8764
2009 Q3 1234
2009 Q4 5432
How can I do this in R? I tried the unique function, but it is not working as I want.
The long comments under the OP are getting hard to follow. You are looking for duplicated, as pointed out by @RomanLustrik. Use it to subset your original data.frame like this...
masterdf[ ! duplicated( masterdf[ c("Quarter" , "CustomerID") ] ) , ]
# Quarter CustomerID
#1 2009 Q1 1234
#3 2009 Q2 1324
#4 2009 Q3 1234
#6 2009 Q3 8764
#7 2009 Q4 5432
Another simple way is to use SQL queries from R; see the code below.
This assumes masterdf is the name of the original data frame...
library(sqldf)
sqldf("select Quarter, CustomerID from masterdf group by 1,2")
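For completeness, unique can also do this if it is applied to just the two columns of interest rather than to a single column or the whole data frame. A small base R sketch, with the question's data rebuilt via data.frame() for self-containment (the name combos is only illustrative):

```r
# Data from the question, rebuilt directly as a data frame
masterdf <- data.frame(
  Quarter    = c("2009 Q1", "2009 Q1", "2009 Q2", "2009 Q3", "2009 Q3", "2009 Q3", "2009 Q4"),
  CustomerID = c(1234, 1234, 1324, 1234, 1234, 8764, 5432),
  ProductID  = 1:7,
  stringsAsFactors = FALSE
)

# unique() on a two-column data frame keeps the first row of each
# distinct Quarter/CustomerID pair and drops the repeats
combos <- unique(masterdf[c("Quarter", "CustomerID")])
```

This keeps the same rows as the duplicated approach, but returns only the two selected columns.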
