How can I get the duration of the drawdowns in a zoo series?
The drawdowns can be calculated with cummax(mydata) - mydata. Whenever this value is above zero, I am in a drawdown.
A drawdown measures the decline from a historical peak (maximum).
It lasts until that peak value is reached again.
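For a quick intuition before reaching for a package, the duration of each drawdown can be read off a plain vector with base R's rle; this is a minimal sketch on a toy series (values are made up), counting consecutive observations that sit below the running maximum. Note this simple count can differ slightly from how findDrawdowns measures duration:

```r
# Toy price series; cummax(x) - x > 0 marks observations in a drawdown
x <- c(10, 9, 8, 10, 11, 10, 9, 11)
dd <- cummax(x) - x          # depth of the drawdown at each point
r  <- rle(dd > 0)            # runs of in-drawdown / at-peak observations
r$lengths[r$values]          # durations of the drawdown runs: 2 2
```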
The PerformanceAnalytics package has several functions to do this operation.
> library(PerformanceAnalytics)
> data(edhec)
> dd <- findDrawdowns(edhec[,"Funds of Funds", drop=FALSE])
> dd$length
[1] 3 3 6 5 4 11 14 5 2 10 2 6 3 2 4 9 2 2 13 8 5 5 4 2 7
[26] 6 11 3 2 23
As a side note, if you have two dates in a time series and need to know the time between them, just use diff. You can also use the lubridate package.
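For example, with two Date values diff returns the elapsed time directly:

```r
# Two dates; diff gives the time between them
d <- as.Date(c("2015-01-01", "2015-03-15"))
diff(d)                 # Time difference of 73 days
as.numeric(diff(d))     # 73, as a plain number
```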
Data looks like this:
ID Lat Long Time
1 3 3 00:01
1 3 4 00:02
1 4 4 00:03
2 4 3 00:01
2 4 4 00:02
2 4 5 00:03
3 5 2 00:01
3 5 3 00:02
3 5 4 00:03
4 9 9 00:01
4 9 8 00:02
4 8 8 00:03
5 7 8 00:01
5 8 8 00:02
5 8 9 00:03
I want to measure how far the IDs are away from each other within a given radius at each given time interval. I am doing this for 1,057 IDs across 16,213 time intervals, so efficiency is important.
It is important to measure distance only between points within a radius, because if the points are too far away I don't care. I am trying to measure distances between points that are relatively close. For example, I don't care how far ID 1 is from ID 5, but I do care how far ID 4 is from ID 5.
I am using R and the sp package.
From what I can see, there will be many repeated values. Therefore, as a starting point I would suggest calculating the distance for each pair of coordinates only once (even if the pair is repeated many times in the data frame). Then you can filter the data and merge the tables. (I would add this as a comment, but I don't have enough reputation to do so yet.)
The first lines would be:
# Creating a data frame with no repeated coordinates
library(dplyr)
library(geosphere)
df2 <- df %>% group_by(Lat, Long) %>% summarise()
# Calculating distances (distm comes from the geosphere package)
Dist <- distm(cbind(df2$Long, df2$Lat))
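To illustrate the filtering step, here is a sketch using base dist() on toy planar coordinates as a stand-in for geosphere::distm (which expects lon/lat and returns metres); the coordinates and radius value are just examples:

```r
# Toy deduplicated coordinates and a radius cutoff
pts <- data.frame(Lat = c(3, 4, 9), Long = c(3, 4, 9))
D <- as.matrix(dist(cbind(pts$Long, pts$Lat)))   # pairwise distance matrix
radius <- 2
# keep only the pairs that lie within the radius (and drop self-distances)
close_pairs <- which(D > 0 & D <= radius, arr.ind = TRUE)
close_pairs   # only points 1 and 2 are within the radius of each other
```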
Suppose I have df1 like this:
Date Var1
01/01/2015 1
01/02/2015 4
....
07/24/2015 1
07/25/2015 6
07/26/2015 23
07/27/2015 15
Q1: Sum of Var1 over the 3 days before 7/27/2015 (not including 7/27).
Q2: Sum of Var1 over the 3 days before 7/25/2015 (which is not the last row); basically, I want to pick any day as the reference day and then calculate a rolling sum.
As suggested in one of the comments in the link referenced by @SeñorO, with a little bit of work you can use zoo::rollsum:
library(zoo)
set.seed(42)
df <- data.frame(d = seq.POSIXt(as.POSIXct('2015-01-01'), as.POSIXct('2015-02-14'), by = 'days'),
                 x = sample(20, size = 45, replace = TRUE))
k <- 3
df$sum3 <- c(0, cumsum(df$x[1:(k-1)]),
             head(zoo::rollsum(df$x, k = k), n = -1))
df
## d x sum3
## 1 2015-01-01 16 0
## 2 2015-01-02 12 16
## 3 2015-01-03 15 28
## 4 2015-01-04 15 43
## 5 2015-01-05 17 42
## 6 2015-01-06 10 47
## 7 2015-01-07 11 42
The 0, cumsum(...) pre-populates the first k-1 rows that rollsum cannot fill (rollsum(x, k) returns a vector of length length(x) - k + 1). The head(..., n = -1) discards the last element, because you said that the nth entry should sum the previous 3 values and not include its own row.
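The same "sum of the previous k rows, excluding the current one" logic can be checked against a direct base-R loop; the values here are the first rows of the x column from the example output above:

```r
# x values from the example; prev_sum should reproduce the sum3 column
x <- c(16, 12, 15, 15, 17, 10, 11)
k <- 3
# for each row i, sum the (up to) k values strictly before it
prev_sum <- sapply(seq_along(x), function(i)
  if (i == 1) 0 else sum(x[max(1, i - k):(i - 1)]))
prev_sum   # 0 16 28 43 42 47 42, matching sum3
```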
I have 3133 rows representing payments made on some of the 5296 days between 7/1/2000 and 12/31/2014; that is, the "Date" feature is non-continuous:
> head(d_exp_0014)
Year Month Day Amount Count myDate
1 2000 7 6 792078.6 9 2000-07-06
2 2000 7 7 140065.5 9 2000-07-07
3 2000 7 11 190553.2 9 2000-07-11
4 2000 7 12 119208.6 9 2000-07-12
5 2000 7 16 1068156.3 9 2000-07-16
6 2000 7 17 0.0 9 2000-07-17
I would like to fit a linear time trend variable,
t <- 1:3133
to a linear model explaining the variation in the Amount of the expenditure.
fit_t <- lm(Amount ~ t + Count, d_exp_0014)
However, this is obviously wrong: t increments by 1 per row, while the elapsed time between consecutive dates varies:
> head(exp)
Year Month Day Amount Count Date t
1 2000 7 6 792078.6 9 2000-07-06 1
2 2000 7 7 140065.5 9 2000-07-07 2
3 2000 7 11 190553.2 9 2000-07-11 3
4 2000 7 12 119208.6 9 2000-07-12 4
5 2000 7 16 1068156.3 9 2000-07-16 5
6 2000 7 17 0.0 9 2000-07-17 6
Which to me is the exact opposite of a linear trend.
What is the most efficient way to get this data.frame merged to a continuous date-index? Will a date vector like
CTS_date_V <- data.frame(Date = seq(as.Date("2000/07/01"), as.Date("2014/12/31"), by = "days"))
yield different results?
I'm open to any packages (using fpp, forecast, timeSeries, xts, ts, as of right now); just looking for a good answer to deploy in functional form, since these payments are going to be updated every week and I'd like to automate the append to this data.frame.
I think some kind of transformation to a regular (continuous) time series is a good idea.
You can use xts to transform the time-series data (it is handy because xts objects can be used in other packages like a regular ts).
Filling the gaps
# convert myDate to POSIXct if necessary
# create xts from data frame x
ts1 <- xts(data.frame(a = x$Amount, c = x$Count), x$myDate )
ts1
# create an empty xts series covering every day in the range
ts_empty <- xts(, seq(from = start(ts1), to = end(ts1), by = "DSTday"))
# merge the empty ts to the data and fill the gap with 0
ts2 <- merge( ts1, ts_empty, fill = 0)
# or interpolate, for example:
ts2 <- merge( ts1, ts_empty, fill = NA)
ts2 <- na.locf(ts2)
# zoo-xts ready functions are:
# na.locf - constant previous value
# na.approx - linear approximation
# na.spline - cubic spline interpolation
Deduplicate dates
In your sample there is no sign of duplicated dates, but based on your other question it is very likely. I think you want to aggregate the values with the sum function:
ts1 <- period.apply( ts1, endpoints(ts1,'days'), sum)
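If you prefer to stay in plain data frames, the same gap-filling idea works with base merge against a complete daily sequence; a minimal sketch on a few toy rows shaped like the question's data:

```r
# Sparse daily data (toy values in the shape of the question)
x <- data.frame(myDate = as.Date(c("2000-07-06", "2000-07-07", "2000-07-11")),
                Amount = c(792078.6, 140065.5, 190553.2))
# Complete daily index over the observed range
all_days <- data.frame(myDate = seq(min(x$myDate), max(x$myDate), by = "day"))
full <- merge(all_days, x, all.x = TRUE)   # left join keeps every day
full$Amount[is.na(full$Amount)] <- 0       # zero-fill the missing days
```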
I am using R and have a vector of dates as Day of Year (DOY) in which some days are missing. I want to find where these missing days are.
DOY <- c(1,2,5,6,7,10,15,16,17)
I want an output which tells me that missing days are between day:
2 to 5
7 to 10
10 to 15
(Or the indices of these locations)
rDOY <- range(DOY);
rnDOY <- seq(rDOY[1],rDOY[2])
rnDOY[!rnDOY %in% DOY]
[1] 3 4 8 9 11 12 13 14
If instead you don't want the missing days themselves but the beginnings and ends of the gaps:
> DOY[ diff(DOY)!=1]
[1] 2 7 10
> DOY[-1] [ diff(DOY)!=1]
[1] 5 10 15
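Combining those two vectors gives the "gap from / gap to" pairs the question asks for:

```r
DOY <- c(1, 2, 5, 6, 7, 10, 15, 16, 17)
gap <- diff(DOY) != 1                  # TRUE where the next recorded day is not consecutive
data.frame(last_before = DOY[-length(DOY)][gap],
           first_after = DOY[-1][gap])
# pairs: 2 -> 5, 7 -> 10, 10 -> 15
```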
I have a data frame of time series data in a 'long' format where there is 1 row/observation per day. I would like to transform this data into a 'wide' format. Each row/observation should have the time series value for the current date and the previous 2 days.
To provide a concrete example, I will use the Air Quality data available in R. This is what my input data frame looks like.
> input <- airquality[1:4,c("Month", "Day", "Ozone")]
> input
Month Day Ozone
1 5 1 41
2 5 2 36
3 5 3 12
4 5 4 18
I would like to transform this input so that it looks like the following.
output <- data.frame(Month = 5, Day = 1:4, Ozone=c(41,36,12,18), Ozone.Prev.1=c(NA,41,36,12), Ozone.Prev.2=c(NA,NA,41,36))
> output
Month Day Ozone Ozone.Prev.1 Ozone.Prev.2
1 5 1 41 NA NA
2 5 2 36 41 NA
3 5 3 12 36 41
4 5 4 18 12 36
Any suggestions on a nice, clean way to do this? Many thanks in advance.
You can use the lag function from zoo, but the following small function gets the trick done without using additional packages:
shift_vector = function(vec, n) c(rep(NA, n), head(vec, -n))
output = transform(input, prev_1 = shift_vector(Ozone, 1),
                   prev_2 = shift_vector(Ozone, 2))
output
Month Day Ozone prev_1 prev_2
1 5 1 41 NA NA
2 5 2 36 41 NA
3 5 3 12 36 41
4 5 4 18 12 36
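If you later need more than two lags, the same helper extends naturally; here is a sketch that builds any number of shifted columns (the column names and n_lags value are just an illustration):

```r
shift_vector <- function(vec, n) c(rep(NA, n), head(vec, -n))
input <- data.frame(Month = 5, Day = 1:4, Ozone = c(41, 36, 12, 18))
n_lags <- 2
# build one shifted copy of Ozone per lag, named Ozone.Prev.1, Ozone.Prev.2, ...
lags <- setNames(lapply(seq_len(n_lags), function(n) shift_vector(input$Ozone, n)),
                 paste0("Ozone.Prev.", seq_len(n_lags)))
output <- cbind(input, lags)
```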