Function to return modified matrix in R - r

The following code adds vector XTS1$XTSSum2 to xts object XTS1:
library(xts)
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
colnames(XTS1) <- "XTS1"
XTS1$XTSSum2 <- XTS1$XTS1 + lag(XTS1$XTS1,1)
The following function performs the same operation.
addfunction <- function(x){
  x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
}
addfunction(XTS1)
But the vector XTS1$XTSSum2 is not stored.
How can I get addfunction to store the vector so that after running addfunction(XTS1), XTS1 will look like this:
XTS1 XTSSum2
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 19
2014-10-22 09:15:00 7 14
2014-10-22 09:30:00 22 29
2014-10-22 09:45:00 24 46
2014-10-22 10:00:00 30 54
2014-10-22 10:15:00 26 56
2014-10-22 10:30:00 23 49
2014-10-22 10:45:00 27 50
2014-10-22 11:00:00 30 57
The reproducible example uses an xts object, but I presume the same solution would apply
to xts objects, matrices and data frames.

The assignment happens inside the function's own environment, so it modifies a local copy of x, not the XTS1 in the global environment. You need to return the modified object from the function and assign the result of the call back to XTS1. Try this:
addfunction <- function(x){
  x$XTSSum2 <- x$XTS1 + lag(x$XTS1,1)
  x
}
XTS1 <- addfunction(XTS1)
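To see why the return value matters, here is a minimal sketch using a plain data frame instead of the xts object (the question presumes the same behaviour applies). The names `df` and `add_sum` are made up for illustration, and `c(NA, head(v, -1))` is used as a base-R stand-in for `lag(v, 1)` on an ordinary vector.

```r
# Hypothetical data frame standing in for the xts object from the question.
df <- data.frame(XTS1 = c(12, 7, 7, 22))

add_sum <- function(x) {
  # x is a local copy; c(NA, head(v, -1)) shifts the values down by one row,
  # mimicking lag(v, 1) for a plain vector.
  x$XTSSum2 <- x$XTS1 + c(NA, head(x$XTS1, -1))
  x  # return the modified copy so the caller can keep it
}

add_sum(df)        # prints the modified copy, but df itself is unchanged
names(df)          # still only "XTS1"

df <- add_sum(df)  # assign the result back to keep the new column
names(df)          # now "XTS1" and "XTSSum2"
```

Calling the function without assigning its result only shows the modified copy; the object in the calling environment changes only through the reassignment.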

Related

R Group ID's with overlapping time intervals

I have a large dataset containing multiple groups of IDs, each with START and END datetimes. Within each group, I'm trying to identify where a subgroup occurs. A subgroup within a group is formed when two IDs overlap in their START and END datetime columns. Below is a script to create a sample dataset in R for one group. What I want to do is create, within each group, a column called "Grp" that groups those subgroups with overlapping START and END datetimes.
What I have...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
What I want is...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1508379300,
1508363100, 1490918400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1509031800,
1509062400, 1492247700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Grp = c(1,2,2,1)), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END","Grp"))
I've tried using lubridate's interval, and finding an overlap that way, but no luck. Any help would be greatly appreciated.
After sorting by START, the condition for starting a new group is that the END of the previous row is less than the START of the next row:
head(df1$END, -1) < tail(df1$START,-1)
df1 <- structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
df1
ID START END
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00
df1a <- df1[ order(df1$START), ]
df1a$grp <- cumsum( c( 1, head(df1a$END, -1) < tail(df1a$START,-1) ))
df1a
#---------------
ID START END grp
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00 1
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00 1
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00 2
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00 2
Here's a function that answers the first part of my response to the comment below:
grp_overlaps <- function(endings, beginnings){
  cumsum(c( 1, head(endings, -1) < tail(beginnings, -1) ))
}
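The comparison above looks only at the END of the immediately preceding row. If an earlier interval is long enough to cover several later ones, it may be safer to compare each START against the running maximum of all previous ENDs. The following is a sketch of that variant, not the answerer's method; the function name `grp_overlaps_cummax`, its argument order and the small numeric example are assumptions for illustration, and the input is assumed to be sorted by start time already.

```r
grp_overlaps_cummax <- function(starts, ends) {
  # Assumes the rows are already sorted by start time.
  s <- as.numeric(starts)
  e <- as.numeric(ends)
  running_end <- cummax(e)  # latest END seen so far
  # A new group starts when a START lies after every END seen before it.
  cumsum(c(1, tail(s, -1) > head(running_end, -1)))
}

# Hypothetical intervals: the first one spans the next two.
starts <- c(1, 2, 4, 20)
ends   <- c(10, 3, 5, 25)
grp_overlaps_cummax(starts, ends)
# 1 1 1 2
```

On the sample data from the question both approaches give the same grouping (1, 1, 2, 2); they differ only when a long interval contains shorter ones that do not overlap each other.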

Finding value whose date interval encompasses punctual date

I have a data.table of theoretical values (column theoric), each of which is valid over a date interval:
firstDate lastDate theoric
2017-01-01 2017-01-03 10
2017-01-05 2017-01-25 20
2017-02-01 2017-08-31 30
On the other hand, I have values measured at single points in time:
datetime measured
2017-01-02 11
2017-01-08 22
2017-01-09 19
2017-01-26 25
2017-03-02 32
I would like to have, for each measured value, the corresponding theoric value (the one whose interval includes the measurement date).
Notes: 1. Theoric intervals cannot overlap. 2. If a measurement is not within any theoric interval, return NA.
Expected output:
datetime measured theoric
2017-01-02 11 10
2017-01-08 22 20
2017-01-09 19 20
2017-01-26 25 NA
2017-03-02 32 30
Reproducible dataset:
theoricDt <- structure(list(firstDate = structure(c(1483228800, 1483574400, 1485907200), class = c("POSIXct", "POSIXt"), tzone = "GMT"), lastDate = structure(c(1483401600, 1485302400, 1504137600 ), class = c("POSIXct", "POSIXt"), tzone = "GMT"), theoric = c(10, 20, 30)), .Names = c("firstDate", "lastDate", "theoric"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
measureDt <- structure(list(datetime = structure(c(1483315200, 1483833600, 1483920000, 1485388800, 1488412800), class = c("POSIXct", "POSIXt"), tzone = "GMT"), measured = c(11, 22, 19, 25, 32)), .Names = c("datetime", "measured"), row.names = c(NA, -5L), class = c("data.table","data.frame"))
You can use a non-equi join:
measureDt[theoricDt, on = .(datetime >= firstDate, datetime <= lastDate),
theoric := i.theoric]
measureDt
# datetime measured theoric
#1: 2017-01-02 11 10
#2: 2017-01-08 22 20
#3: 2017-01-09 19 20
#4: 2017-01-26 25 NA
#5: 2017-03-02 32 30
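For readers who want to see the interval matching spelled out, here is a base-R sketch of the same lookup without data.table. It is an alternative illustration, not the non-equi join itself; the data frame names `theoric_df` and `measure_df` are made up, and the data are rebuilt from the dates shown in the question.

```r
# Same intervals and measurements as in the question, as plain data frames.
theoric_df <- data.frame(
  firstDate = as.POSIXct(c("2017-01-01", "2017-01-05", "2017-02-01"), tz = "GMT"),
  lastDate  = as.POSIXct(c("2017-01-03", "2017-01-25", "2017-08-31"), tz = "GMT"),
  theoric   = c(10, 20, 30)
)
measure_df <- data.frame(
  datetime = as.POSIXct(c("2017-01-02", "2017-01-08", "2017-01-09",
                          "2017-01-26", "2017-03-02"), tz = "GMT"),
  measured = c(11, 22, 19, 25, 32)
)

# For each measurement, find the single interval containing it, else NA.
measure_df$theoric <- vapply(seq_len(nrow(measure_df)), function(i) {
  d <- measure_df$datetime[i]
  hit <- which(theoric_df$firstDate <= d & theoric_df$lastDate >= d)
  if (length(hit) == 1L) theoric_df$theoric[hit] else NA_real_
}, numeric(1))

measure_df
```

This loops over the measurements one by one, so for large tables the non-equi join is likely the better choice.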

Get highest value of X rows

I'm looking for a function to calculate the highest value for the prior X periods on an XTS object. The function would return a vector with such values.
I believe there are multiple ways to calculate this. Surprisingly, I could not find this covered in a prior SO question. I am hoping there is a package with a function already defined for this. If there is none, maybe someone knows how to tackle it.
The example below shows what the vector with the highest values of the last 3 periods would look like for XTS object XTS1.
library('xts')
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
#DESIRED OUTPUT
[,1] GetHighest(3)
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 12
2014-10-22 09:15:00 7 12
2014-10-22 09:30:00 22 12
2014-10-22 09:45:00 24 22
2014-10-22 10:00:00 30 24
2014-10-22 10:15:00 26 30
2014-10-22 10:30:00 23 30
2014-10-22 10:45:00 27 30
2014-10-22 11:00:00 30 27
You could use rollapply from zoo.
So it would look something like this:
GetHighest_3 = rollapply(data = XTS1, width = 3, FUN = max)
Then combine it:
cbind(XTS1, GetHighest_3)
The only problem I see is that it will probably return NA for the first 2 values, not only the first value, since it has a width of 3.
I wasn't able to test the code, since I don't have access to R right now, so there might be some misspelling.
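Note that the desired output in the question takes the maximum over the prior rows only (the current row is excluded) and accepts shorter windows at the start. A base-R sketch of that exact rule on a plain numeric vector could look like this; the helper name `prior_max` is made up for illustration and is not part of zoo or xts.

```r
# Highest value among the k rows before each position (current row excluded).
# Positions with fewer than k prior rows use whatever prior rows exist.
prior_max <- function(v, k) {
  vapply(seq_along(v), function(i) {
    if (i == 1L) return(NA_real_)       # no prior rows at all
    max(v[max(1L, i - k):(i - 1L)])     # up to k rows before position i
  }, numeric(1))
}

v <- c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30)  # values of XTS1
prior_max(v, 3)
# NA 12 12 12 22 24 30 30 30 27
```

The result has one value per row, so it should be possible to bind it back to the xts object with cbind as in the answer above, for example after extracting the values with as.numeric(XTS1).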

change specific hours of xts object with POSIXct index

I have a data frame which looks like this
df = data.frame (time = c("2013-12-23 00:00:00", "2013-12-23 00:13:00", "2013-12-23 00:14:00", "2013-12-23 00:14:01",
"2013-12-24 00:00:00", "2013-12-24 00:12:00", "2013-12-24 00:15:00", "2013-12-24 00:16:00"),
value = c(1, 2, 3, 4, 5, 6, 7, 8))
I transform this data frame to an xts object and use the POSIXct format for the index
df = as.xts(as.numeric(as.character(df[,"value"])), order.by = as.POSIXct(df[,"time"]))
What I now need is to change all the indices whose time is 00:00:00 to 22:00:00.
All other time indices must stay as they are.
The resulting object looks like this
>df
[,1]
2013-12-23 00:13:00 2
2013-12-23 00:14:00 3
2013-12-23 00:14:01 4
2013-12-23 22:00:00 1
2013-12-24 00:12:00 6
2013-12-24 00:15:00 7
2013-12-24 00:16:00 8
2013-12-24 22:00:00 5
Thanks for your help! Pat
We could use sub to replace '00:00:00' with '22:00:00' in the original dataset and then do the xts conversion
df$time <- as.POSIXct(sub('00:00:00', '22:00:00', df$time),
format='%Y-%m-%d %H:%M:%S')
library(xts)
xts(df$value, order.by=df$time)
# [,1]
#2013-12-23 00:13:00 2
#2013-12-23 00:14:00 3
#2013-12-23 00:14:01 4
#2013-12-23 22:00:00 1
#2013-12-24 00:12:00 6
#2013-12-24 00:15:00 7
#2013-12-24 00:16:00 8
#2013-12-24 22:00:00 5
Here's a function that will shift the zero-hour of an xts object by n seconds.
shiftZeroHour <- function(x, n=1) {
  stopifnot(is.xts(x))
  # find zero hour
  plt <- as.POSIXlt(index(x), tz=indexTZ(x))
  isZeroHour <- plt$hour == 0 & plt$min == 0 & plt$sec == 0
  # shift zero hour index values
  .index(x)[isZeroHour] <- .index(x)[isZeroHour] + n
  # ensure index is ordered properly
  as.xts(x)
}
Here is how to use it with your sample data:
xdf <- structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Dim = c(8L, 1L),
index = structure(c(1387778400, 1387779180, 1387779240, 1387779241,
1387864800, 1387865520, 1387865700, 1387865760), tzone = "",
tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
shiftZeroHour(xdf, 60*60*22)
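The core of the function is the midnight test on the POSIXlt fields followed by adding seconds to the matching timestamps. Here is a small base-R sketch of just that step on a plain POSIXct vector, assuming UTC timestamps; the variable names are made up for illustration.

```r
# Hypothetical timestamps, two of them exactly at midnight.
times <- as.POSIXct(c("2013-12-23 00:00:00", "2013-12-23 00:13:00",
                      "2013-12-24 00:00:00", "2013-12-24 00:12:00"),
                    tz = "UTC")

# Break the timestamps into fields and flag the ones at 00:00:00.
plt <- as.POSIXlt(times, tz = "UTC")
is_zero_hour <- plt$hour == 0 & plt$min == 0 & plt$sec == 0

# POSIXct arithmetic is in seconds, so 22 hours is 22 * 60 * 60.
times[is_zero_hour] <- times[is_zero_hour] + 22 * 60 * 60

# Shifting breaks the chronological order, so sort again.
shifted <- sort(times)
format(shifted, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The re-sorting at the end plays the same role as the final as.xts(x) call in the function above, which restores the index order.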

Scatterplot of two xts time series

I've got two xts time series. A small sample of them:
ts1
[,1]
2009-05-06 00:00:00 38.414
2009-05-06 00:15:00 45.079
2009-05-06 00:30:00 38.878
2009-05-06 00:45:00 49.889
2009-05-06 01:00:00 41.270
2009-05-06 01:15:00 41.050
2009-05-06 01:30:00 38.951
2009-05-06 01:45:00 39.854
2009-05-06 02:00:00 37.803
2009-05-06 02:15:00 42.930
ts2
[,1]
2009-05-06 00:00:00 406.887
2009-05-06 00:15:00 413.298
2009-05-06 00:30:00 409.353
2009-05-06 00:45:00 412.312
2009-05-06 01:00:00 409.353
2009-05-06 01:15:00 415.271
2009-05-06 01:30:00 416.257
2009-05-06 01:45:00 416.257
2009-05-06 02:00:00 416.257
2009-05-06 02:15:00 419.216
Now I want to create a scatterplot of ts1 against ts2. According to the documentation on CRAN (and an example I found on Stack Overflow that does it the same way), it should work like this: plot(ts1, ts2). But I get an error.
plot(ts1,ts2)
# Error in plot(xycoords$x, xycoords$y, type = type, axes = FALSE, ann = FALSE, :
# object 'xycoords' not found
What's going wrong? It works great with a normal ts using the ~ sign, but this doesn't work with xts. I also tried plot(ts1[, 1], ts2[, 1]).
The easiest thing to do is to call plot.zoo directly, instead of allowing the plot generic to dispatch to plot.xts.
ts1 <-
structure(c(38.414, 45.079, 38.878, 49.889, 41.27, 41.05, 38.951,
39.854, 37.803, 42.93), .Dim = c(10L, 1L), index = structure(c(1241586000,
1241586900, 1241587800, 1241588700, 1241589600, 1241590500, 1241591400,
1241592300, 1241593200, 1241594100), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")
ts2 <-
structure(c(406.887, 413.298, 409.353, 412.312, 409.353, 415.271,
416.257, 416.257, 416.257, 419.216), .Dim = c(10L, 1L),
index = structure(c(1241586000, 1241586900, 1241587800, 1241588700,
1241589600, 1241590500, 1241591400, 1241592300, 1241593200, 1241594100),
tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
plot.zoo(ts1, ts2)
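Another option, sketched here under the assumption that the xts package is installed, is to align the two series on their common timestamps and pass plain numeric vectors to the base plot function, which avoids xts/zoo plot dispatch altogether. The short series `a` and `b` are made-up stand-ins for ts1 and ts2.

```r
library(xts)  # assumes the xts package (and its dependency zoo) is installed

# Hypothetical short series; the second one has one extra timestamp.
start <- as.POSIXct("2009-05-06 00:00:00", tz = "UTC")
a <- xts(c(38.414, 45.079, 38.878), order.by = start + 900 * 0:2)
b <- xts(c(406.887, 413.298, 409.353, 412.312), order.by = start + 900 * 0:3)

# Keep only the timestamps present in both series.
both <- merge(a, b, join = "inner")

# Drop the time index and plot the paired values as an ordinary scatterplot.
x <- as.numeric(both[, 1])
y <- as.numeric(both[, 2])
plot(x, y, xlab = "ts1", ylab = "ts2")
```

The inner join matters when the two series do not share exactly the same index, since a scatterplot needs the values paired by timestamp.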
