I am looking for a better way to compare a value from a day (day X) to the previous day (day X-1). Here I am using the airquality dataset. Suppose I am interested in comparing the wind from one day to the wind from the previous day. Right now I am using merge() to bring together two dataframes - one current day dataframe and one from the previous day. I am also just subtracting 1 from the Day column to get the PrevDay column:
airquality$PrevDay <- airquality$Day - 1
airquality.comp <- merge(
  airquality[, c("Wind", "Day")],
  airquality[, c("Temp", "PrevDay")],
  by.x = "Day", by.y = "PrevDay")
My issue here is that I'd need to create another dataframe if I wanted to look back 2 days or if I wanted to switch Wind and Temp and look at them the other way. This just seems clunky. Can anyone recommend a better way of doing this?
IMO data.table may be harder to get used to than dplyr, but it will save your tail later when you need robust analysis:
library(data.table)
setDT(airquality)[, shift(Wind, n = 1L, type = "lag") < Wind]  # previous day; use n = 2L to look back two days
In base R, you can prepend an NA and drop the last value to build the comparison:
with(airquality, c(NA, head(Wind, -1)) < Wind)
What kind of comparison do you need?
For example, to check whether a value is greater than the previous one, you could use:
library(dplyr)
with(airquality, lag(Wind) < Wind)
Or with two lags:
with(airquality, lag(Wind, 2) < Wind)
It depends on what questions you are trying to answer, but I would look into autocorrelation (the correlation of a time series with its own lagged values). The acf() function compares the time series with itself and highlights which lags are significantly correlated.
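For example, a minimal sketch on the same data (just the default autocorrelation plot of the wind series):
acf(airquality$Wind)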
Or, if you want to compare two different metrics (such as Wind and Temp), you may want to try the ccf() function, since it takes two vectors and computes the cross-correlation at a range of lags. For example:
ccf(airquality$Wind, airquality$Temp)
If you are interested in autocorrelation or cross-correlation in particular, then you might also consider something like mutual information, which works for non-Gaussian data as well. Both the infotheo and entropy packages for R have built-in functions for this.
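For instance, a minimal sketch with infotheo (assuming its discretize() helper with the default equal-frequency binning, since mutual information needs discrete inputs):
library(infotheo)
mutinformation(discretize(airquality$Wind), discretize(airquality$Temp))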
I am trying to reproduce an equally weighted price index. I have the prices and a matrix which tells when a stock is in and out of the index. Here is a small part of the data.
table="date, A, B,C,D,E,F
1,31/01/1998,1,1,1,1,1,0
2,28/02/1998,1,1,1,1,1,0
3,31/03/1998,1,1,1,1,1,0
4,30/04/1998,1,1,1,1,1,0
5,31/05/1998,1,1,1,1,1,0"
matrix=read.csv(text=table)
table2="date,A,B,C,D,E,F
1,05/01/98,20.56,97.40,279.70,72.85,20.33,298.00
2,06/01/98,20.56,96.50,276.20,72.90,20.22,299.90
3,14/02/98,20.84,98.45,282.50,73.75,20.70,302.80
4,15/02/98,20.90,98.50,280.70,73.65,20.71,306.50
5,09/03/98,20.58,97.00,276.20,72.95,20.25,304.00"
price=read.csv(text=table2)
The order of the stocks is the same in the price and matrix data. Since I would like to multiply the matrix with the prices, I turned both into matrices:
price <- as.matrix(price)
matrix <- as.matrix(matrix)
as.Date[price[,1], format="%d/%m/%y"] #Error: object of type 'closure' is not subsettable
as.Date[matrix[,1], format="%d/%m/%Y"]
(1) Here I ran into my first problem: the dates are not recognized, neither in the matrix nor if I parse them before calling as.matrix(). I also tried the methods proposed here (Extract month and year from a zoo::yearmon object). I need the dates for the following reason: I would like to write a loop which 1. takes the month and year from the matrix and searches for the same month and year in the price data, and 2. if the same month and year are found, multiplies the row from the matrix with the corresponding rows from the prices. This is because the matrix is on a monthly basis while the prices are daily. This is my idea of the loop:
# rough idea, assuming the dates above were parsed into matrix_dates /
# price_dates, both date columns were dropped, and c was preallocated
# with the dimensions of price:
for (i in seq_len(nrow(matrix))) {
  j <- format(price_dates, "%m/%Y") == format(matrix_dates[i], "%m/%Y")
  c[j, ] <- sweep(price[j, , drop = FALSE], 2, matrix[i, ], `*`)
}
(2) However, I have never worked with loops before, and my second question is whether a for loop is suitable for this problem. The desired output of the loop would be the following:
table3="date,A,B,C,D,E,F
1,05/01/98,20.56,97.40,279.70,72.85,20.33,0
2,06/01/98,20.56,96.50,276.20,72.90,20.22,0
3,14/02/98,20.84,98.45,282.50,73.75,20.70,0
4,15/02/98,20.90,98.50,280.70,73.65,20.71,0
5,09/03/98,20.58,97.00,276.20,72.95,20.25,0"
desired_c=read.csv(text=table3)
In the end, however, I would like to have an equally weighted price index like this:
table4="date, price
1,05/01/98,98.168
2,06/01/98,97.276
3,14/02/98,99.248
4,15/02/98,98.892
5,09/03/98,97.396"
desired_index=read.csv(text=table4)
If I could compute that within the loop, that would be great. Please note that the matrix and the prices consist of many observations, so simply deleting the last column is not an option.
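For what it is worth, here is a minimal sketch of how the pieces could fit together without a loop, assuming the 0/1 matrix marks index membership, every daily date has a matching monthly row, and the index is the plain mean over the member stocks (the membership table is re-read as member here to avoid clobbering base R's matrix()). Two things to note: as.Date is a function, so it must be called with parentheses; as.Date[...] is exactly what raises the "object of type 'closure' is not subsettable" error. And a Date column cannot survive inside a character matrix, which is why the dates are "not recognized" after as.matrix(); it is easier to keep the data frames and convert only the numeric columns:
member <- read.csv(text = table)    # monthly 0/1 membership (toy data above)
price  <- read.csv(text = table2)   # daily prices (toy data above)

m_dates <- as.Date(member$date, format = "%d/%m/%Y")
p_dates <- as.Date(price$date, format = "%d/%m/%y")

# match every daily row to its monthly membership row via month/year
idx <- match(format(p_dates, "%m/%Y"), format(m_dates, "%m/%Y"))

# zero out prices of stocks that are out of the index (the desired_c above)
masked <- as.matrix(price[, -1]) * as.matrix(member[idx, -1])

# equally weighted index: mean over the stocks that are in the index
desired_index <- data.frame(date = price$date,
                            price = rowSums(masked) / rowSums(member[idx, -1]))
On the toy data this reproduces the desired output, e.g. 98.168 for 05/01/98.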
I am working with NDVI3g data sets. My problem is that I am trying to create monthly composite data sets from the bi-monthly original data sets using the maximum value composite method in R. I need your help, because I tried my best but couldn't figure it out. The files are named so that the first composite in a month carries an 'a' suffix and the second a 'b', for example:
AF99sep15a.n14-VI3g: first 15 days
AF99sep15b.n14-VI3g: last 15 days
I have 31 years of data (i.e. 1982-2012).
I would kindly appreciate help on how to combine the whole data set into monthly composites.
Given a RasterStack gimms, and since you want to combine sequential pairs of layers, I think you can do:
i <- rep(1:(nlayers(gimms) / 2), each = 2)
x <- stackApply(gimms, i, max)  # max gives the maximum value composite; mean would give monthly means
Make sure to also check out the gimms package, which includes the function monthlyComposite (with optional parallel support) to create monthly maximum value composites from the initial half-monthly layers. Needless to say, the function is heavily based on stackApply from the raster package.
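For illustration, a hedged sketch of how that might look (the path is hypothetical; it assumes gimms is the RasterStack built from the half-monthly files and that monthlyIndices() derives the month grouping from the 'a'/'b' file-name scheme, per the gimms package documentation):
library(gimms)

# 'files' holds the half-monthly NDVI3g file names (hypothetical path)
files <- list.files("path/to/ndvi3g", full.names = TRUE)

# per-pixel maximum within each month, grouped via the file names
mvc <- monthlyComposite(gimms, indices = monthlyIndices(files))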