Get highest value of X rows - r

I'm looking for a function to calculate the highest value for the prior X periods on an XTS object. The function would return a vector with such values.
I would think there are multiple ways to calculate this. Surprisingly, I could not find this covered in a prior SO question. I am hoping there is a package with a function already defined for this. If there is none, maybe someone knows how to tackle it.
The example below shows what the vector with the highest values of the prior 3 periods would look like for the XTS object XTS1.
library('xts')
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
#DESIRED OUTPUT
[,1] GetHighest(3)
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 12
2014-10-22 09:15:00 7 12
2014-10-22 09:30:00 22 12
2014-10-22 09:45:00 24 22
2014-10-22 10:00:00 30 24
2014-10-22 10:15:00 26 30
2014-10-22 10:30:00 23 30
2014-10-22 10:45:00 27 30
2014-10-22 11:00:00 30 27

You could use rollapply from zoo.
So it would look something like this:
GetHighest_3 = rollapply(data = XTS1, width = 3, FUN = max)
Then combine it:
cbind(XTS1, GetHighest_3)
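The only problem I see is that it will probably return NA for the first 2 values, not only the first, since the window has a width of 3. Each window also includes the current row, so the result would probably also need to be lagged by one period to cover only the prior periods.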
I wasn't able to test the code, since I don't have access to R right now, so there might be some misspelling.
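For reference, the desired output above can also be reproduced without a rolling-window helper. The following is a minimal base R sketch, not a tested xts solution: prior_max is a hypothetical helper name, and it works on a plain numeric vector (for an xts object one would presumably pass the underlying values, e.g. via coredata). For each row it takes the maximum of up to k preceding values, excluding the current one, so only the first row is NA.

```r
# Hypothetical helper: highest value among the (up to) k rows before each row.
prior_max <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)[-1]) {
    # window covers rows i-k .. i-1, truncated at the start of the series
    out[i] <- max(x[max(1, i - k):(i - 1)])
  }
  out
}

vals <- c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30)  # values from XTS1
highest3 <- prior_max(vals, 3)
print(highest3)
```

The loop is deliberately simple; for long series a vectorised rolling function would likely be faster.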

Related

R Group ID's with overlapping time intervals

I have a large dataset containing multiple groups of IDs, each with START and END datetimes. Within each group, I'm trying to identify where a subgroup occurs. A subgroup is formed when two IDs overlap in their START and END datetime columns. Below is a script that creates a sample dataset in R for one group. Within each group, I want to create a column called "Grp" that labels the subgroups with overlapping START and END datetimes.
What I have...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
What I want is...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1508379300,
1508363100, 1490918400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1509031800,
1509062400, 1492247700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Grp = c(1,2,2,1)), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END","Grp"))
I've tried using lubridate's interval, and finding an overlap that way, but no luck. Any help would be greatly appreciated.
After sorting by START, the condition for a new group is that the END of the previous row is less than the START of the next row:
head(df1$END, -1) < tail(df1$START,-1)
df1 <- structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
df1
ID START END
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00
df1a <- df1[ order(df1$START), ]
# compare rows of the sorted frame df1a, so the test runs in START order
df1a$grp <- cumsum( c( 1, head(df1a$END, -1) < tail(df1a$START,-1) ))
df1a
#---------------
ID START END grp
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00 1
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00 1
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00 2
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00 2
Here's a function that answers the first part of my response to the comment below:
grp_overlaps <- function(endings, beginnings){
  cumsum(c( 1, head(endings, -1) < tail(beginnings, -1) ))
}
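One caveat with comparing each START only against the END of the row directly before it: if an earlier interval is long enough to cover several later ones, a later row can be split off even though it still overlaps the earlier interval. A possible variant, sketched here under the assumption that the data is already sorted by START, compares each START against the running maximum of all previous ENDs. The name grp_overlaps_cummax and the small numeric example are made up for illustration.

```r
# Hypothetical variant: a new group starts only when START is later than
# every END seen so far (running maximum), not just the previous row's END.
# Inputs are assumed to be sorted by beginnings.
grp_overlaps_cummax <- function(endings, beginnings) {
  e <- as.numeric(endings)
  b <- as.numeric(beginnings)
  cumsum(c(1, tail(b, -1) > head(cummax(e), -1)))
}

# Row 1 spans rows 2 and 3; rows 2 and 3 do not overlap each other.
starts <- c(1, 2, 4, 20)
ends   <- c(10, 3, 5, 25)

naive  <- cumsum(c(1, head(ends, -1) < tail(starts, -1)))
robust <- grp_overlaps_cummax(ends, starts)
print(naive)   # row 3 is split off although it lies inside row 1's interval
print(robust)  # rows 1 to 3 stay together, row 4 starts a new group
```

On the sample data from the question both versions give the same grouping, since no interval there spans more than its direct neighbour.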

Finding value whose date interval encompasses punctual date

I have a data.table of theoretical values (column theoric), each valid for a date interval:
firstDate lastDate theoric
2017-01-01 2017-01-03 10
2017-01-05 2017-01-25 20
2017-02-01 2017-08-31 30
On the other hand, I have measured values taken at single points in time:
datetime measured
2017-01-02 11
2017-01-08 22
2017-01-09 19
2017-01-26 25
2017-03-02 32
I would like to have, for each measured value, the corresponding theoric value (the one whose interval includes the measurement date).
Notes: 1. Theoric intervals cannot overlap. 2. If a measurement is not within any theoric interval, return NA.
Expected output:
datetime measured theoric
2017-01-02 11 10
2017-01-08 22 20
2017-01-09 19 20
2017-01-26 25 NA
2017-03-02 32 30
Reproducible dataset:
theoricDt <- structure(list(firstDate = structure(c(1483228800, 1483574400, 1485907200), class = c("POSIXct", "POSIXt"), tzone = "GMT"), lastDate = structure(c(1483401600, 1485302400, 1504137600 ), class = c("POSIXct", "POSIXt"), tzone = "GMT"), theoric = c(10, 20, 30)), .Names = c("firstDate", "lastDate", "theoric"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
measureDt <- structure(list(datetime = structure(c(1483315200, 1483833600, 1483920000, 1485388800, 1488412800), class = c("POSIXct", "POSIXt"), tzone = "GMT"), measured = c(11, 22, 19, 25, 32)), .Names = c("datetime", "measured"), row.names = c(NA, -5L), class = c("data.table","data.frame"))
You can use a non-equi join:
measureDt[theoricDt, on = .(datetime >= firstDate, datetime <= lastDate),
theoric := i.theoric]
measureDt
# datetime measured theoric
#1: 2017-01-02 11 10
#2: 2017-01-08 22 20
#3: 2017-01-09 19 20
#4: 2017-01-26 25 NA
#5: 2017-03-02 32 30
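The non-equi join matches each row of measureDt to the row of theoricDt whose interval contains datetime, and rows with no match are left as NA. For readers without data.table, the same matching rule can be sketched in base R. This only illustrates the logic and makes no claim about performance; lookup_theoric and to_time are hypothetical names, and the epoch seconds are copied from the reproducible dataset above.

```r
# Hypothetical helper: for each timestamp, return the value of the single
# interval [first, last] that contains it, or NA if there is none.
lookup_theoric <- function(when, first, last, value) {
  when  <- as.numeric(when)
  first <- as.numeric(first)
  last  <- as.numeric(last)
  vapply(when, function(t) {
    hit <- which(t >= first & t <= last)
    if (length(hit) == 1L) value[hit] else NA_real_
  }, numeric(1))
}

to_time <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
firstDate <- to_time(c(1483228800, 1483574400, 1485907200))
lastDate  <- to_time(c(1483401600, 1485302400, 1504137600))
datetime  <- to_time(c(1483315200, 1483833600, 1483920000, 1485388800, 1488412800))

matched <- lookup_theoric(datetime, firstDate, lastDate, c(10, 20, 30))
print(data.frame(datetime, measured = c(11, 22, 19, 25, 32), theoric = matched))
```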

Function to return modified matrix in R

The following code adds vector XTS1$XTSSum2 to xts object XTS1:
library(xts)
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
colnames(XTS1) <- "XTS1"
XTS1$XTSSum2 <- XTS1$XTS1 + lag(XTS1$XTS1,1)
The following function performs the same operation.
addfunction <- function(x){
  x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
}
addfunction(XTS1)
But the vector XTS1$XTSSum2 is not stored.
How can I get addfunction to store the vector so that after running addfunction(XTS1), XTS1 will look like this:
XTS1 XTSSum2
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 19
2014-10-22 09:15:00 7 14
2014-10-22 09:30:00 22 29
2014-10-22 09:45:00 24 46
2014-10-22 10:00:00 30 54
2014-10-22 10:15:00 26 56
2014-10-22 10:30:00 23 49
2014-10-22 10:45:00 27 50
2014-10-22 11:00:00 30 57
The reproducible example uses an xts object; I presume the same solution would apply to xts objects, matrices and data frames.
The assignment is happening within the function's environment, not the global one. You need to return the modified object from the function and assign the result of the call back to XTS1. Try this:
addfunction <- function(x){
  x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
  x
}
XTS1 <- addfunction(XTS1)
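The same behaviour can be seen with a plain data frame, which is one of the cases the question asks about. The sketch below uses only base R; add_sum2 and d0 are illustrative names, and the lagged sum is built by hand rather than with lag. Calling the function without assigning the result leaves the original object untouched, because the function modifies its own copy.

```r
# Illustrative function: adds a column holding each value plus the previous one.
add_sum2 <- function(d) {
  d$Sum2 <- d$v + c(NA, head(d$v, -1))
  d  # return the modified copy explicitly
}

d0 <- data.frame(v = c(12, 7, 7, 22))

invisible(add_sum2(d0))            # result discarded
unchanged <- !("Sum2" %in% names(d0))
print(unchanged)                   # d0 itself has no new column yet

d0 <- add_sum2(d0)                 # assign the returned copy back
print(d0)
```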

Date conversion as.yearmon

I have a dataframe that looks like this:
library(zoo)
head(monthly.station6)
[,1]
1995-02-28 00:00:00 2.07
1995-03-01 00:00:00 5.70
1995-04-30 01:00:00 0.65
1995-05-31 01:00:00 1.03
1995-06-30 01:00:00 0.77
1995-07-31 01:00:00 0.39
I am applying this code: monthly.station6[,0] <- as.yearmon(monthly.station6[,0]) to try to convert the dates into a year-month format, but I think the fact that the date column is [,0] is preventing it. I'm not sure where I am going wrong; any help would be appreciated! This is the result I am after:
head(monthly.station6)
[,1]
Feb 1995 2.07
Mar 1995 5.70
Apr 1995 0.65
May 1995 1.03
Jun 1995 0.77
Jul 1995 0.39
As requested, dput(head(monthly.station6)):
structure(c(2.07, 5.7, 0.65, 1.03, 0.77, 0.39), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts",
"zoo"), index = structure(c(793929600, 794016000, 799200000,
801878400, 804470400, 807148800), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L))
1) index The object's class is "xts", not "data.frame". Use index (or time) to modify monthly.station6:
library(xts)
index(monthly.station6) <- as.yearmon(index(monthly.station6))
2) aggregate.zoo Another possibility is to use aggregate.zoo. That will return a "zoo" object, so convert it back to "xts":
library(xts)
as.xts(aggregate(monthly.station6, as.yearmon))
3) fortify.zoo Since the question mentions data.frame: if what you really wanted was a data.frame, the first statement after library creates a data.frame whose first column is Index, and the second performs the conversion to "yearmon":
library(xts)
DF <- fortify.zoo(monthly.station6)
transform(DF, Index = as.yearmon(Index))
Note: If you want just year then it cannot be an xts object but you could represent it as a data.frame. Using DF from (3):
transform(DF, Index = as.numeric(format(Index, "%Y")))

change specific hours of xts object with POSIXct index

I have a data frame which looks like this
df = data.frame (time = c("2013-12-23 00:00:00", "2013-12-23 00:13:00", "2013-12-23 00:14:00", "2013-12-23 00:14:01",
"2013-12-24 00:00:00", "2013-12-24 00:12:00", "2013-12-24 00:15:00", "2013-12-24 00:16:00"),
value = c(1, 2, 3, 4, 5, 6, 7, 8))
I transform this data frame to an xts object and use the POSIXct format for the index
df = as.xts(as.numeric(as.character(df[,"value"])), order.by = as.POSIXct(df[,"time"]))
What I now need is to change all the indices whose time is 00:00:00 to 22:00:00.
All other time indices must stay as they are.
The resulting object looks like this
>df
[,1]
2013-12-23 00:13:00 2
2013-12-23 00:14:00 3
2013-12-23 00:14:01 4
2013-12-23 22:00:00 1
2013-12-24 00:12:00 6
2013-12-24 00:15:00 7
2013-12-24 00:16:00 8
2013-12-24 22:00:00 5
Thanks for your help! Pat
We could use sub to replace '00:00:00' with '22:00:00' in the original dataset and then do the xts conversion:
df$time <- as.POSIXct(sub('00:00:00', '22:00:00', df$time),
format='%Y-%m-%d %H:%M:%S')
library(xts)
xts(df$value, order.by=df$time)
# [,1]
#2013-12-23 00:13:00 2
#2013-12-23 00:14:00 3
#2013-12-23 00:14:01 4
#2013-12-23 22:00:00 1
#2013-12-24 00:12:00 6
#2013-12-24 00:15:00 7
#2013-12-24 00:16:00 8
#2013-12-24 22:00:00 5
Here's a function that will shift the zero-hour of an xts object by n seconds.
shiftZeroHour <- function(x, n=1) {
  stopifnot(is.xts(x))
  # find zero hour in the time zone of the index
  # (tzone() replaces indexTZ(), which is deprecated in recent xts releases)
  plt <- as.POSIXlt(index(x), tz=tzone(x))
  isZeroHour <- plt$hour == 0 & plt$min == 0 & plt$sec == 0
  # shift zero hour index values
  .index(x)[isZeroHour] <- .index(x)[isZeroHour] + n
  # ensure index is ordered properly
  as.xts(x)
}
Here is how to use it with your sample data:
xdf <- structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Dim = c(8L, 1L),
index = structure(c(1387778400, 1387779180, 1387779240, 1387779241,
1387864800, 1387865520, 1387865700, 1387865760), tzone = "",
tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
shiftZeroHour(xdf, 60*60*22)
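What counts as 00:00:00 depends on the time zone used to interpret the index, which is why the function converts to POSIXlt before testing hour, minute and second. The following base R sketch shows the same idea on a plain POSIXct vector with an explicit UTC time zone (an assumption made here so the example is reproducible); shift_midnight is a hypothetical name. After shifting, the values have to be re-sorted by time, which corresponds to the "ensure index is ordered properly" step in the function above.

```r
# Hypothetical helper: add n seconds to every timestamp that falls exactly on
# midnight in the vector's own time zone; other timestamps are left as they are.
shift_midnight <- function(times, n = 1) {
  stopifnot(inherits(times, "POSIXct"))
  plt <- as.POSIXlt(times)  # uses the tzone attribute of 'times'
  is_midnight <- plt$hour == 0 & plt$min == 0 & plt$sec == 0
  times[is_midnight] <- times[is_midnight] + n
  times
}

times <- as.POSIXct(c("2013-12-23 00:00:00", "2013-12-23 00:13:00",
                      "2013-12-24 00:00:00", "2013-12-24 00:12:00"), tz = "UTC")
values <- c(1, 2, 5, 6)

shifted <- shift_midnight(times, 22 * 60 * 60)
ord <- order(shifted)  # shifted rows now belong at the end of each day
print(data.frame(time = shifted[ord], value = values[ord]))
```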
