Drawdown duration - r

How can I get the duration of the drawdowns in a zoo series?
The drawdowns can be calculated with cummax(mydata)-mydata. Whenever this value is above zero, I have a drawdown.
A drawdown is the decline from a historical peak (maximum).
It lasts until that peak value is reached again.

The PerformanceAnalytics package has several functions to do this operation.
> library(PerformanceAnalytics)
> data(edhec)
> dd <- findDrawdowns(edhec[,"Funds of Funds", drop=FALSE])
> dd$length
[1] 3 3 6 5 4 11 14 5 2 10 2 6 3 2 4 9 2 2 13 8 5 5 4 2 7
[26] 6 11 3 2 23
As a side note, if you have two dates in a time series and need to know the time between them, just use diff. You can also use the lubridate package.
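If you would rather stay in base R, the cummax idea from the question can be combined with rle to count how long each drawdown lasts. This is only a minimal sketch: the function name drawdown_lengths and the sample prices are made up for illustration, and it assumes a plain numeric vector (for a zoo object you could pass coredata(mydata)). It counts only the periods spent below the previous peak, so the numbers will not necessarily match the convention used by findDrawdowns.

```r
# Hypothetical helper: length (in observations) of each drawdown episode
drawdown_lengths <- function(x) {
  v <- as.numeric(x)
  in_dd <- (cummax(v) - v) > 0   # TRUE while below the running maximum
  runs <- rle(in_dd)             # consecutive runs of TRUE/FALSE
  runs$lengths[runs$values]      # keep only the TRUE runs (the drawdowns)
}

# Made-up price series for illustration
prices <- c(100, 98, 97, 101, 101, 99, 102, 100)
drawdown_lengths(prices)
```

The last drawdown in the example is still open at the end of the series, so its length is only a lower bound.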

Related

Distance Between Points Within Radius at Time Intervals

Data looks like this:
ID Lat Long Time
1 3 3 00:01
1 3 4 00:02
1 4 4 00:03
2 4 3 00:01
2 4 4 00:02
2 4 5 00:03
3 5 2 00:01
3 5 3 00:02
3 5 4 00:03
4 9 9 00:01
4 9 8 00:02
4 8 8 00:03
5 7 8 00:01
5 8 8 00:02
5 8 9 00:03
I want to measure how far the IDs are away from each other within a given radius at each given time interval. I am doing this on 1057 ID's across 16213 time intervals so efficiency is important.
It is important to measure distance only between points within a radius, because if the points are too far apart I don't care. I am trying to measure distances between points that are relatively close. For example, I don't care how far away ID 1 is from ID 5, but I do care how far ID 4 is from ID 5.
I am using R and the sp package.
From what I can see, many coordinate pairs are repeated. Therefore, as a starting point, I would suggest calculating the distance for each pair of coordinates only once (even if it is repeated many times in the df). Then you can filter the data and merge the tables. (I would add this as a comment, but I don't have enough reputation to do so yet.)
The first lines would be:
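library(dplyr)      # group_by(), summarise(), %>%
library(geosphere)  # distm()
# Create a data frame with no repeated coordinates
df2 <- df %>% group_by(Lat, Long) %>% summarise()
# Calculate the pairwise distance matrix; distm() expects (longitude, latitude) columns
Dist <- distm(cbind(df2$Long, df2$Lat))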
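To get from a distance matrix to "pairs within a radius at each time", one possible approach is to split the data by Time and keep only the pairs below a threshold. The sketch below uses the sample data from the question and, as a simplifying assumption, plain Euclidean distance in coordinate units via base R's dist(); for real longitude/latitude you would swap in a geodesic distance such as distm. The function name pairs_within and the radius of 1.5 are hypothetical.

```r
# Sample data from the question
df <- data.frame(
  ID   = rep(1:5, each = 3),
  Lat  = c(3, 3, 4,  4, 4, 4,  5, 5, 5,  9, 9, 8,  7, 8, 8),
  Long = c(3, 4, 4,  3, 4, 5,  2, 3, 4,  9, 8, 8,  8, 8, 9),
  Time = rep(c("00:01", "00:02", "00:03"), times = 5)
)

# Hypothetical helper: all ID pairs within `radius` for one time slice
# (assumption: planar Euclidean distance, not geodesic)
pairs_within <- function(d, radius) {
  m <- as.matrix(dist(d[, c("Long", "Lat")]))
  # upper triangle only, so each pair is reported once
  idx <- which(upper.tri(m) & m <= radius, arr.ind = TRUE)
  data.frame(
    Time = rep(d$Time[1], nrow(idx)),
    ID1  = d$ID[idx[, 1]],
    ID2  = d$ID[idx[, 2]],
    Dist = m[idx]
  )
}

# One distance matrix per time interval, then stack the results
close_pairs <- do.call(rbind, lapply(split(df, df$Time), pairs_within, radius = 1.5))
close_pairs
```

Each time slice builds its own ID-by-ID matrix, so memory use depends on the number of IDs per interval rather than on the total number of rows.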

R - How to sum a column based on date range? [duplicate]

Suppose I have df1 like this:
Date Var1
01/01/2015 1
01/02/2015 4
....
07/24/2015 1
07/25/2015 6
07/26/2015 23
07/27/2015 15
Q1: Sum of Var1 over the 3 days before 7/27/2015 (not including 7/27).
Q2: Sum of Var1 over the 3 days before 7/25/2015 (this is not the last row). Basically, I want to choose any day as the reference day and then calculate the rolling sum.
As suggested in one of the comments in the link referenced by @SeñorO, with a little bit of work you can use zoo::rollsum:
library(zoo)
set.seed(42)
df <- data.frame(d=seq.POSIXt(as.POSIXct('2015-01-01'), as.POSIXct('2015-02-14'), by='days'),
                 x=sample(20, size=45, replace=TRUE))
k <- 3
df$sum3 <- c(0, cumsum(df$x[1:(k-1)]),
             head(zoo::rollsum(df$x, k=k), n=-1))
df
## d x sum3
## 1 2015-01-01 16 0
## 2 2015-01-02 12 16
## 3 2015-01-03 15 28
## 4 2015-01-04 15 43
## 5 2015-01-05 17 42
## 6 2015-01-06 10 47
## 7 2015-01-07 11 42
The c(0, cumsum(...)) part pre-populates the first k = 3 rows, which do not have three full previous days: rollsum(x, k) returns a vector of length length(x)-k+1, so k-1 leading values are missing, and shifting everything down by one row accounts for the third. The head(..., n=-1) discards the last element, because you said that the nth entry should sum the previous 3 and not its own row.
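If you only need the sum for one chosen reference day (Q1 and Q2 in the question) rather than a whole rolling column, a plain date filter may be enough. This is a sketch under assumptions: the helper name sum_prev_days is made up, and the data frame contains only the last four rows visible in the question, since the elided rows are unknown.

```r
# Only the rows shown at the end of the question; elided rows are left out
df1 <- data.frame(
  Date = as.Date(c("07/24/2015", "07/25/2015", "07/26/2015", "07/27/2015"),
                 format = "%m/%d/%Y"),
  Var1 = c(1, 6, 23, 15)
)

# Hypothetical helper: sum Var1 over the k calendar days before `ref`,
# excluding the reference day itself
sum_prev_days <- function(d, ref, k = 3) {
  ref <- as.Date(ref)
  sum(d$Var1[d$Date >= ref - k & d$Date < ref])
}

sum_prev_days(df1, "2015-07-27")  # uses 7/24, 7/25 and 7/26
sum_prev_days(df1, "2015-07-25")  # only 7/24 is present in this reduced sample
```

Because it filters on calendar dates rather than row positions, this version also behaves sensibly when some days are missing from the data.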

(In)correct use of a linear time trend variable, and most efficient fix?

I have 3133 rows representing payments made on some of the 5296 days between 7/1/2000 and 12/31/2014; that is, the "Date" feature is non-continuous:
> head(d_exp_0014)
Year Month Day Amount Count myDate
1 2000 7 6 792078.6 9 2000-07-06
2 2000 7 7 140065.5 9 2000-07-07
3 2000 7 11 190553.2 9 2000-07-11
4 2000 7 12 119208.6 9 2000-07-12
5 2000 7 16 1068156.3 9 2000-07-16
6 2000 7 17 0.0 9 2000-07-17
I would like to fit a linear time trend variable,
t <- 1:3133
to a linear model explaining the variation in the Amount of the expenditure.
fit_t <- lm(Amount ~ t + Count, d_exp_0014)
However, this is obviously wrong, as t increments in different amounts between the dates:
> head(exp)
Year Month Day Amount Count Date t
1 2000 7 6 792078.6 9 2000-07-06 1
2 2000 7 7 140065.5 9 2000-07-07 2
3 2000 7 11 190553.2 9 2000-07-11 3
4 2000 7 12 119208.6 9 2000-07-12 4
5 2000 7 16 1068156.3 9 2000-07-16 5
6 2000 7 17 0.0 9 2000-07-17 6
Which to me is the exact opposite of a linear trend.
What is the most efficient way to get this data.frame merged to a continuous date-index? Will a date vector like
CTS_date_V <- data.frame(Date = seq(as.Date("2000/07/01"), as.Date("2014/12/31"), by = "day"))
yield different results?
I'm open to any packages (using fpp, forecast, timeSeries, xts, ts, as of right now); just looking for a good answer to deploy in functional form, since these payments are going to be updated every week and I'd like to automate the append to this data.frame.
I think some kind of transformation to regular (continuous) time series is a good idea.
You can use xts to transform time series data (it is handy, because it can be used in other packages as regular ts)
Filling the gaps
library(xts)
# convert myDate to POSIXct if necessary ("DSTday" below requires a POSIXct index)
# create xts from data frame x
ts1 <- xts(data.frame(a = x$Amount, c = x$Count), x$myDate)
ts1
# create an empty (zero-column) xts with one index entry per day
ts_empty <- xts(, seq(from = start(ts1), to = end(ts1), by = "DSTday"))
# merge the empty ts to the data and fill the gaps with 0
ts2 <- merge(ts1, ts_empty, fill = 0)
# or interpolate, for example:
ts2 <- merge(ts1, ts_empty, fill = NA)
ts2 <- na.locf(ts2)
# zoo-xts ready functions are:
# na.locf - constant previous value
# na.approx - linear approximation
# na.spline - cubic spline interpolation
Deduplicate dates
In your sample there is no sign of duplicated dates, but based on your new question they are very likely. I think you want to aggregate the values with a sum:
# colSums keeps Amount and Count separate; a plain sum would add both columns together
ts1 <- period.apply(ts1, endpoints(ts1, 'days'), colSums)
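If the only goal is a trend variable that respects the uneven spacing, an alternative to filling the gaps is to define t as the number of days since the start date. Note that this is a different technique from the xts merge above, shown here only as a sketch. The data frame d below copies the six rows from the question; the model uses t alone because Count is constant in that small sample and its coefficient could not be estimated.

```r
# The six sample rows from the question
d <- data.frame(
  myDate = as.Date(c("2000-07-06", "2000-07-07", "2000-07-11",
                     "2000-07-12", "2000-07-16", "2000-07-17")),
  Amount = c(792078.6, 140065.5, 190553.2, 119208.6, 1068156.3, 0.0),
  Count  = 9
)

# Trend in elapsed days since 2000-07-01, so gaps between payments are reflected in t
d$t <- as.numeric(d$myDate - as.Date("2000-07-01"))
d$t

# Count is constant in this sample, so only t is used here;
# with the full data the original formula Amount ~ t + Count would apply
fit_t <- lm(Amount ~ t, data = d)
coef(fit_t)
```

Since t is derived from the date column, rows appended each week get their trend value from the same line of code without re-indexing.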

Finding date gaps in R

I am using R and have a vector of dates as Day of Year (DOY) in which some days are missing. I want to find where these missing days are.
DOY <- c(1,2,5,6,7,10,15,16,17)
I want an output which tells me that missing days are between day:
2 to 5
7 to 10
10 to 15
(Or the indices of these locations)
rDOY <- range(DOY)
rnDOY <- seq(rDOY[1],rDOY[2])
rnDOY[!rnDOY %in% DOY]
[1] 3 4 8 9 11 12 13 14
If instead you don't want the missing days themselves and do want the beginnings and ends of the gaps:
> DOY[ diff(DOY)!=1]
[1] 2 7 10
> DOY[-1] [ diff(DOY)!=1]
[1] 5 10 15
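The two diff() calls above can be combined into a single table that also reports the index of each gap and how many days are missing. This is just a sketch; the column names are arbitrary.

```r
DOY <- c(1, 2, 5, 6, 7, 10, 15, 16, 17)

# Positions where the next value is not the following day
gap <- which(diff(DOY) != 1)

gaps <- data.frame(
  index   = gap,                 # position in DOY of the last day before the gap
  from    = DOY[gap],            # last day present before the gap
  to      = DOY[gap + 1],        # first day present after the gap
  missing = diff(DOY)[gap] - 1   # number of days missing in between
)
gaps
```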

Prepare Time Series for Machine Learning - Long to Wide Format

I have a data frame of time series data in a 'long' format where there is 1 row/observation per day. I would like to transform this data into a 'wide' format. Each row/observation should have the time series value for the current date and the previous 2 days.
To provide a concrete example, I will use the Air Quality data available in R. This is what my input data frame looks like.
> input <- airquality[1:4,c("Month", "Day", "Ozone")]
> input
Month Day Ozone
1 5 1 41
2 5 2 36
3 5 3 12
4 5 4 18
I would like to transform this input so that it looks like the following.
output <- data.frame(Month = 5, Day = 1:4, Ozone=c(41,36,12,18), Ozone.Prev.1=c(NA,41,36,12), Ozone.Prev.2=c(NA,NA,41,36))
> output
Month Day Ozone Ozone.Prev.1 Ozone.Prev.2
1 5 1 41 NA NA
2 5 2 36 41 NA
3 5 3 12 36 41
4 5 4 18 12 36
Any suggestions on a nice, clean way to do this? Many thanks in advance.
You can use the lag function from zoo, but the following small function does the trick without any additional packages:
shift_vector = function(vec, n) c(rep(NA, n), head(vec, -n))
output = transform(input, prev_1 = shift_vector(Ozone, 1),
prev_2 = shift_vector(Ozone, 2))
output
Month Day Ozone prev_1 prev_2
1 5 1 41 NA NA
2 5 2 36 41 NA
3 5 3 12 36 41
4 5 4 18 12 36
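If you need more than a couple of lags, the same shifting idea can be wrapped in a loop that also produces the column names from the question. A minimal sketch, assuming n_lags is smaller than the number of rows; the function name add_lags is made up.

```r
input <- airquality[1:4, c("Month", "Day", "Ozone")]

# Hypothetical helper: append columns <col>.Prev.1 ... <col>.Prev.n_lags
# (assumes n_lags < nrow(d))
add_lags <- function(d, col, n_lags) {
  for (k in seq_len(n_lags)) {
    # shift the column down by k rows, padding the top with NA
    d[[paste0(col, ".Prev.", k)]] <- c(rep(NA, k), head(d[[col]], -k))
  }
  d
}

output <- add_lags(input, "Ozone", 2)
output
```

Note that this shifts by row position, so it assumes one row per day with no missing dates.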

Resources