Join time series in R

I would like to create a monthly time series by extending an existing time series.
I have the time series "t1":
structure(c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101,
91.25), .Dim = c(1L, 8L), .Dimnames = list(NULL, c("2021-03-31",
"2022-03-31", "2022-05-31", "2022-09-30", "2022-12-31", "2023-03-31",
"2023-05-31", "2023-09-30")), .Tsp = c(1, 1, 1), class = c("mts",
"ts", "matrix"))
I would like to extend the above series to include monthly observations. How can I do this?

The object in the question is in an unusual form: it is a 1 x 8 "mts" matrix, i.e. 8 separate time series of one value each, whose column names are character dates. First extract the values and the character dates into a zoo object with the yearmon time class (yearmon represents a year and month without a day). Give it frequency 12 and convert it to ts class, which fills in the missing months with NA. Finally, extend it to the desired end date.
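library(zoo)
# one value per column; the column names become the yearmon index
z <- zoo(c(t1), as.yearmon(colnames(t1)), frequency = 12)
# as.ts() fills the missing months with NA; extend = TRUE pads to Nov 2024
tt <- window(as.ts(z), end = c(2024, 11), extend = TRUE)
tt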
giving this ts object:
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2021                49.25     NA     NA     NA     NA     NA     NA     NA     NA     NA
2022     NA     NA  49.25     NA  30.00     NA     NA     NA  99.25     NA     NA  99.25
2023     NA     NA 100.50     NA 101.00     NA     NA     NA  91.25     NA     NA     NA
2024     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
You can use View(as.zoo(tt)) to view tt, and na.approx(tt, na.rm = FALSE, rule = 2) to replace internal NAs with linearly interpolated values and trailing NAs with the last non-NA value.
Note
The input is shown in the question as:
t1 <- structure(c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101, 91.25), .Dim = c(1L, 8L), .Dimnames = list(NULL, c("2021-03-31", "2022-03-31", "2022-05-31", "2022-09-30", "2022-12-31", "2023-03-31", "2023-05-31", "2023-09-30")), .Tsp = c(1, 1, 1), class = c("mts", "ts", "matrix"))
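A minimal sketch of the whole pipeline, including the na.approx() step mentioned above, so the result can be checked end to end. It assumes zoo is installed and uses all eight columns of t1; `filled` is just an illustrative name.

```r
library(zoo)

t1 <- structure(c(49.25, 49.25, 30, 99.25, 99.25, 100.5, 101, 91.25),
  .Dim = c(1L, 8L),
  .Dimnames = list(NULL, c("2021-03-31", "2022-03-31", "2022-05-31",
    "2022-09-30", "2022-12-31", "2023-03-31", "2023-05-31", "2023-09-30")),
  .Tsp = c(1, 1, 1), class = c("mts", "ts", "matrix"))

# one value per column; the column names become yearmon index values
z <- zoo(c(t1), as.yearmon(colnames(t1)), frequency = 12)

# as.ts() inserts NA for the missing months; extend = TRUE pads to Nov 2024
tt <- window(as.ts(z), end = c(2024, 11), extend = TRUE)

# linear interpolation between observations; rule = 2 carries the last
# observed value forward over the trailing NAs
filled <- na.approx(tt, na.rm = FALSE, rule = 2)
```

For example, April 2022 falls halfway between March 2022 (49.25) and May 2022 (30), so it is filled with 39.625.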

Related

How do I count how many days it took to surpass a value in R?

I have a data frame df2:
date X Days
2020-01-06 525 NA
2020-01-07 799 NA
2020-01-08 782 NA
2020-01-09 542 NA
2020-01-10 638 5
2020-01-11 1000 5
2020-01-12 1400 3
2020-01-13 3500 1
For each date, I want to count how many days, counting backwards from that date, it takes for the sum of X to surpass a threshold. In this case, the threshold is 3000.
For example, on 1/13 it took 1 day because X is 3500, which already surpasses 3000. On 1/12 it took 3 days: 1400 + 1000 + 638 = 3038.
I want to compute the column Days.
dput(df2)
structure(list(date = structure(c(1578268800, 1578355200, 1578441600,
1578528000, 1578614400, 1578700800, 1578787200, 1578873600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), X = c(525, 799, 782, 542, 638, 1000,
1400, 3500), Days = c(NA, NA, NA, NA, 5, 5, 3, 1)), class = "data.frame", row.names = c(NA,
-8L))
A rolling function works well here. Unlike most rolling applications, which use a fixed window smaller than the data, we intentionally make the window full-width (nrow(df2)); with partial = TRUE, the early rows simply use all the rows available so far.
zoo::rollapplyr(
df2$X, nrow(df2),
FUN = function(z) which(cumsum(rev(z)) > 3000)[1],
partial = TRUE)
# [1] NA NA NA NA 5 5 3 1
(I'm ignoring date, assuming that the rows are consecutive days.)
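# cumulative sums walking backwards from the last row; the leading 0 lets
# each start position be compared with the sum accumulated before it
cs <- c(0, cumsum(rev(df2$X)))
out <- sapply(cs, function(x) which(cs - x > 3e3)[1])
# the last element of out corresponds to no row, so drop it before reversing
rev(head(out - seq_along(cs), -1))
#> [1] NA NA NA NA  5  5  3  1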
Created on 2022-01-06 by the reprex package (v2.0.1)
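For readers who find the rolling and cumsum tricks hard to follow, here is a plain base-R sketch of the same rule. `days_to_exceed` is a hypothetical helper name; like the answers above, it ignores date and assumes one row per consecutive day. It is O(n^2), which is fine for data of this size.

```r
# Hypothetical helper: for each row i, the number of rows (counting back
# from i, inclusive) needed for the running sum of x to exceed `threshold`.
# Returns NA when even the sum of x[1:i] does not exceed it.
days_to_exceed <- function(x, threshold) {
  vapply(seq_along(x), function(i) {
    running <- cumsum(x[i:1])          # x[i], x[i] + x[i-1], ...
    hit <- which(running > threshold)  # strictly greater, as in the question
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1))
}

X <- c(525, 799, 782, 542, 638, 1000, 1400, 3500)
days_to_exceed(X, 3000)
#> [1] NA NA NA NA  5  5  3  1
```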

Add different time periods in R (Create new total age column)

I have time periods as specified below. Each age is split across different units of time (Years, Months, Weeks, Days, Hours), and I would like to add the units together into one total age. My issue is that the R functions I have found for converting time units treat the year as a specific calendar year and the month as a specific calendar month, rather than as a number of years or a number of months.
Could you show me how to add, say, 68 years, 5 months, 3 days and 2 hours, and so on? In other words, I want to create a column with the total age in years, which I can then easily convert to the total age in months and so on.
> head(KimAge)
# A tibble: 6 x 7
     ID Years Months Weeks  Days Hours
  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1     1    68      5    NA    NA    NA
2     2    70      2    NA    NA    NA
3     3    NA     NA    NA    NA    NA
4     4    23     NA    NA    NA    NA
5     5    NA     NA    NA     3    NA
6     6    NA     NA    NA    NA    NA
In pseudo-code, I am trying to write something like:
KimAge$TotalAge = as.Year(Years) + as.month(Months) + as.week(Weeks) + as.days(Days) + as.hour(Hours)
You can use the lubridate package to create the total age column:
library(dplyr)
library(lubridate)
KimAge <- tibble(years = c(68, 70, NA, 23, NA, NA),
months = c(5, 2, NA, NA, NA, NA),
weeks = c(NA, NA, NA, NA, NA, NA),
days = c(NA, NA, NA, NA, 3, NA),
hours = c(NA, NA, NA, NA, NA, NA))
# convert NA to zero
KimAge[is.na(KimAge)] <- 0
# combine all units into one Duration (lubridate uses average lengths for years and months)
KimAge$TotalAge <- duration(year = KimAge$years,
month = KimAge$months,
week = KimAge$weeks,
day = KimAge$days,
hour = KimAge$hours)
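If you only need a single number (total age in years) rather than a lubridate Duration, a base-R sketch is a weighted row sum. It assumes the same average-length conventions that lubridate durations use (1 year = 365.25 days, 1 month = 1/12 year, 1 week = 7 days), and it returns NA for rows where every unit is missing instead of 0. `year_per_unit` is an illustrative name; the column names follow the question's KimAge.

```r
KimAge <- data.frame(
  ID     = 1:6,
  Years  = c(68, 70, NA, 23, NA, NA),
  Months = c(5, 2, NA, NA, NA, NA),
  Weeks  = NA_real_,
  Days   = c(NA, NA, NA, NA, 3, NA),
  Hours  = NA_real_
)

# years contributed by one of each unit (assumed average lengths)
year_per_unit <- c(Years = 1, Months = 1 / 12, Weeks = 7 / 365.25,
                   Days = 1 / 365.25, Hours = 1 / (24 * 365.25))

m <- as.matrix(KimAge[names(year_per_unit)])
filled <- m
filled[is.na(filled)] <- 0                     # a missing unit contributes nothing
total <- as.vector(filled %*% year_per_unit)  # weighted row sums, in years
total[rowSums(!is.na(m)) == 0] <- NA           # no information at all -> NA
KimAge$TotalAge <- total
KimAge$TotalAgeMonths <- KimAge$TotalAge * 12
```

Converting to other units is then a single multiplication, as in the last line.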
If you know the birthdate and death date:
KimAge <- tibble(birth = c("1974/03/21 12:40",
"2004/9/2 00:10",
"2014/12/12 00:00",
"2012/2/1 0:0"),
death = c("2020/03/11 16:40",
"2020/7/2 14:00",
"2021/1/4 23:01",
"2012/3/2 0:0"))
KimAge$birth <- parse_date_time(KimAge$birth, "ymd H:M")
KimAge$death <- parse_date_time(KimAge$death, "ymd H:M")
KimAge$TotalAge_d <- as.duration(KimAge$death - KimAge$birth)
KimAge$TotalAge_i <- interval(KimAge$birth, KimAge$death)
# interval version
KimAge$years = KimAge$TotalAge_i %/% years(1)
KimAge$months = KimAge$TotalAge_i %% years(1) %/%months(1)
KimAge$days = KimAge$TotalAge_i %% years(1) %% months(1) %/% days(1)
For more information, see the lubridate documentation at https://lubridate.tidyverse.org/, and read about the differences between durations (lubridate::duration()), periods (lubridate::period()) and intervals (lubridate::interval()).

What is the best way to expand the xts object and fill with NA?

Say I have the following xts object. What is the best way to add 20 more rows and fill all the entries of the new rows with NA?
my_xts <- structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, -0.626453810742332,
0.183643324222082, -0.835628612410047, 1.59528080213779, 0.329507771815361,
-0.820468384118015, 0.487429052428485, 0.738324705129217, 0.575781351653492,
-0.305388387156356, 1.51178116845085, 0.389843236411431, -0.621240580541804,
-2.2146998871775, 1.12493091814311, -0.0449336090152309, -0.0161902630989461,
0.943836210685299, 0.821221195098089, 0.593901321217509, 0.918977371608218,
0.782136300731067, 0.0745649833651906, -1.98935169586337, 0.61982574789471,
-0.0561287395290008, -0.155795506705329, -1.47075238389927, -0.47815005510862,
2.83588312039941, 4.71735910305809, 1.79442454531401, 2.77534322311874,
1.89238991883419, -0.754119113657213, 1.17001087340064, 1.2114200925793,
1.88137320657763, 4.20005074396777, 3.52635149691509, 1.67095280749283,
1.49327663972698, 3.39392675080947, 3.11332639734731, 0.62248861090096,
0.585009686075761, 2.72916392427366, 3.53706584903083, 1.77530757569954,
3.76221545290843, 2.79621176073414, 0.775947213498458, 2.68223938284885,
-0.258726192161585, 4.86604740340207, 5.96079979701172, 1.26555704706698,
-0.0882692526330615, 4.70915888232724, 2.59483618835753, 10.2048532815143,
2.88227999180049, 5.06921808735233, 3.084006476342, 0.770180373352784,
3.56637689854303, -2.41487588667311, 7.39666458468866, 3.45976001463569,
9.51783501108646, 4.42652858669899, 0.870160707234557, 4.83217906046716,
0.197707105067245, -0.760900200717306, 3.87433870655239, 1.6701243803447,
3.00331605489487, 3.22302397245499, 1.23143716143578, 1.29399380154449,
2.5944641546285, 6.53426098971961, -1.57070040128929, 4.78183856288526,
3.99885111364055, 6.18929951182909), .Dim = c(29L, 4L), .Dimnames = list(
NULL, c("x", "y1", "y2", "y3")), index = structure(c(1167667200,
1167753600, 1167840000, 1167926400, 1168012800, 1168099200, 1168185600,
1168272000, 1168358400, 1168444800, 1168531200, 1168617600, 1168704000,
1168790400, 1168876800, 1168963200, 1169049600, 1169136000, 1169222400,
1169308800, 1169395200, 1169481600, 1169568000, 1169654400, 1169740800,
1169827200, 1169913600, 1.17e+09, 1170086400), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"))
The "best" way is always debatable, but the following works without any packages beyond xts. I use seq() to create the new dates, starting one day (60*60*24 seconds) after the last timestamp of the xts object and ending 20 days after it.
Then it is just a matter of merging, and the NAs are created automatically.
library(xts)
# create additional sequence of dates.
new <- seq(from = end(my_xts) + 60*60*24,
to = end(my_xts) + 20*60*60*24,
by = "day")
my_xts_new <- merge(my_xts, new)
tail(my_xts_new)
                     x y1 y2 y3
2007-02-13 17:00:00 NA NA NA NA
2007-02-14 17:00:00 NA NA NA NA
2007-02-15 17:00:00 NA NA NA NA
2007-02-16 17:00:00 NA NA NA NA
2007-02-17 17:00:00 NA NA NA NA
2007-02-18 17:00:00 NA NA NA NA
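The same idea can be wrapped in a small helper. This is a sketch assuming the xts package; `extend_xts` is a hypothetical name, and rather than the plain date vector used above, it merges with a zero-width xts object (`xts(, order.by = ...)`, an index with no columns), so only the timestamps are added. The demo data are randomly generated, not the question's.

```r
library(xts)

# Hypothetical helper: append `n` all-NA rows after the last timestamp of `x`,
# spaced `step_secs` seconds apart (one day by default).
extend_xts <- function(x, n, step_secs = 60 * 60 * 24) {
  new_index <- end(x) + seq_len(n) * step_secs
  # a zero-width xts carries only an index; merging takes the union of the
  # two indexes and fills the new rows with NA
  merge(x, xts(, order.by = new_index))
}

# illustrative data: 29 daily rows, two columns
set.seed(1)
idx <- as.POSIXct("2007-01-01", tz = "UTC") + (0:28) * 86400
my_xts <- xts(matrix(rnorm(58), ncol = 2, dimnames = list(NULL, c("x", "y1"))),
              order.by = idx)

extended <- extend_xts(my_xts, 20)
tail(extended)
```

Passing `step_secs` explicitly makes the same helper work for hourly or weekly data.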

Grouped multiplication of matrices with unequal dimensions

I am new to R and trying to determine how I can do the following:
I have 2 matrices; each row is a date and each column is a number. The second matrix is much longer than the first. I want to create a function that multiplies the first row of the first matrix (say it's a January number) by the first 4 rows of the second matrix (which are all January numbers as well), giving 4 results. Then I want to move to the second row of the first matrix (a February number) and multiply it by the 4 February numbers from the second matrix. Ultimately, I want code that multiplies the first matrix by the second wherever the month and year match.
First Matrix
Jan 2007 143.75
Feb 2007 140.93
Second Matrix
2007-01-05 12.14
2007-01-12 10.15
2007-01-19 10.40
2007-01-26 11.13
2007-02-02 10.08
2007-02-09 11.10
2007-02-16 10.02
2007-02-23 10.58
Assuming both are matrices and the dates on the left are the row names, you can try something along these lines. Here we match the month abbreviations in the row names of the two matrices and use the resulting index to expand m1 to the length of m2. Note that this matches on the month only, not the year.
idx <- match(format(as.Date(rownames(m2)), "%b"), sub(" .*", "", rownames(m1)))
m2 * m1[idx]
# [,1]
# 2007-01-05 1745.125
# 2007-01-12 1459.062
# 2007-01-19 1495.000
# 2007-01-26 1599.938
# 2007-02-02 1420.574
# 2007-02-09 1564.323
# 2007-02-16 1412.119
# 2007-02-23 1491.039
Data:
m1 <- structure(c(143.75, 140.93), .Dim = c(2L, 1L), .Dimnames = list(
c("Jan 2007", "Feb 2007"), NULL))
m2 <- structure(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1, 10.02, 10.58
), .Dim = c(8L, 1L), .Dimnames = list(c("2007-01-05", "2007-01-12",
"2007-01-19", "2007-01-26", "2007-02-02", "2007-02-09", "2007-02-16",
"2007-02-23"), NULL))
Note: You haven't given us much information in your post, like whether or not you are doing this for multiple years, whether the dates are the row names or columns, etc. If you are doing this for multiple years, then please post a more representative data example with desired result.
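If the data span several years, one option is to match on month and year together. This sketch redefines m1 and m2 from above so it runs on its own, and assumes an English (or C) locale, since %b produces locale-dependent month abbreviations; rows of m2 whose month has no counterpart in m1 come out as NA.

```r
m1 <- structure(c(143.75, 140.93), .Dim = c(2L, 1L),
                .Dimnames = list(c("Jan 2007", "Feb 2007"), NULL))
m2 <- structure(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1, 10.02, 10.58),
                .Dim = c(8L, 1L),
                .Dimnames = list(c("2007-01-05", "2007-01-12", "2007-01-19",
                                   "2007-01-26", "2007-02-02", "2007-02-09",
                                   "2007-02-16", "2007-02-23"), NULL))

# "2007-01-05" -> "Jan 2007", the same format as m1's row names
key <- format(as.Date(rownames(m2)), "%b %Y")
idx <- match(key, rownames(m1))  # row of m1 for each row of m2 (NA if absent)
result <- m2 * m1[idx, 1]        # keeps m2's dimensions and row names
result
```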
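We can try merging on a month-year key column (duplicated row names are not a reliable merge key, because as.data.frame() may make them unique):
d1 <- data.frame(month = rownames(m1), V1 = m1[, 1])
d2 <- data.frame(month = format(as.Date(rownames(m2)), '%b %Y'), V1 = m2[, 1])
transform(merge(d1, d2, by = "month"), new = V1.x * V1.y)
#     month   V1.x  V1.y      new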
#1 Feb 2007 140.93 10.08 1420.574
#2 Feb 2007 140.93 11.10 1564.323
#3 Feb 2007 140.93 10.02 1412.119
#4 Feb 2007 140.93 10.58 1491.039
#5 Jan 2007 143.75 12.14 1745.125
#6 Jan 2007 143.75 10.15 1459.062
#7 Jan 2007 143.75 10.40 1495.000
#8 Jan 2007 143.75 11.13 1599.938
data
m1 <- structure(c(143.75, 140.93), .Dim = c(2L, 1L),
.Dimnames = list(
c("Jan 2007", "Feb 2007"), NULL))
m2 <- structure(c(12.14, 10.15, 10.4, 11.13, 10.08, 11.1,
10.02, 10.58), .Dim = c(8L, 1L),
.Dimnames = list(c("2007-01-05", "2007-01-12",
"2007-01-19", "2007-01-26", "2007-02-02", "2007-02-09",
"2007-02-16", "2007-02-23"), NULL))

subsetting and performing calculations on time series data, avoiding loops

I'm trying to figure out how to do the following without looping. I have a melted dataset of time, study site, and flow that looks like:
datetime site flow
6/1/2009 00:00 EBT NA
6/2/2009 01:00 EBT NA
6/3/2009 02:00 EBT 0.1
6/4/2009 03:00 EBT NA
6/5/2009 04:00 EBT NA
6/1/2009 00:00 MUT 0.4
6/2/2009 01:00 MUT 0.3
6/3/2009 02:00 MUT 0.2
6/4/2009 03:00 MUT NA
6/5/2009 04:00 MUT NA
I need to subset this by site and then, for periods with at least two consecutive flow measurements, perform a couple of calculations, for example the mean of the current and previous measurement.
The trick is that I need to perform the average on each set of consecutive measurements: if there are three in a row, then for each of the latter two I need the average of that measurement and the previous one. I've added a goal column to the sample data frame with the results I'd like to get.
I'd like to end up with a similar-looking data frame with the datetime, site, and result of the calculation. There is a full time series for each site.
Thanks for any help!
data generator:
sample <- structure(list(datetime = structure(c(1167627600, 1167717600,
1167807600, 1167897600, 1167987600, 1167627600, 1167717600, 1167807600,
1167897600, 1167987600, 1168077600, 1168167600, 1168257600, 1168347600,
1168437600), class = c("POSIXct", "POSIXt"), tzone = ""), site = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("EBT",
"MUT"), class = "factor"), flow = c(NA, 0.1, NA, NA, NA, NA,
0.4, 0.2, NA, NA, 0.4, 0.2, 0.1, NA, NA), goal = c(NA, NA, NA,
NA, NA, NA, NA, 0.3, NA, NA, NA, 0.3, 0.15, NA, NA)), .Names = c("datetime",
"site", "flow", "goal"), row.names = c(NA, -15L), class = "data.frame")
This splits your data frame by site and keeps only the rows that belong to runs of two or more consecutive non-NA flow values:
by(sample, sample$site, function(d) d[with(rle(!is.na(d$flow)), rep(values & lengths>=2, lengths)),])
You can then adapt the function inside to do your calculations as needed.
For instance, to add the mean of all those rows (pooled per site) as a new column, with NA where it is not defined, you can use this:
f <- function(d)
{
x <- with(rle(!is.na(d$flow)), rep(values & lengths>=2, lengths))
within(d, {avg <- NA; avg[x] <- mean(d[x,"flow"])})
}
b <- by(sample, sample$site, f)
Reduce(rbind, b)
Result:
datetime site flow avg
1 2009-06-01 01:00:00 EBT NA NA
2 2009-06-02 02:00:00 EBT NA NA
3 2009-06-03 03:00:00 EBT 0.1 NA
4 2009-06-04 04:00:00 EBT NA NA
5 2009-06-05 05:00:00 EBT NA NA
6 2009-06-01 01:00:00 MUT 0.4 0.3
7 2009-06-02 02:00:00 MUT 0.3 0.3
8 2009-06-03 03:00:00 MUT 0.2 0.3
9 2009-06-04 04:00:00 MUT NA NA
10 2009-06-05 05:00:00 MUT NA NA
EDIT: To get the mean of the current flow measurement and the previous one, you can use this:
f <- function(d)
{
within(d, avg <- (flow+c(NA,head(flow,-1)))/2)
}
Reduce(rbind, by(sample, sample$site, f))
Note that cases with a single measurement automatically become NA, because adding NA to a number gives NA. New result:
datetime site flow goal avg
1 2007-01-01 03:00:00 EBT NA NA NA
2 2007-01-02 04:00:00 EBT 0.1 NA NA
3 2007-01-03 05:00:00 EBT NA NA NA
4 2007-01-04 06:00:00 EBT NA NA NA
5 2007-01-05 07:00:00 EBT NA NA NA
6 2007-01-01 03:00:00 MUT NA NA NA
7 2007-01-02 04:00:00 MUT 0.4 NA NA
8 2007-01-03 05:00:00 MUT 0.2 0.30 0.30
9 2007-01-04 06:00:00 MUT NA NA NA
10 2007-01-05 07:00:00 MUT NA NA NA
11 2007-01-06 08:00:00 MUT 0.4 NA NA
12 2007-01-07 09:00:00 MUT 0.2 0.30 0.30
13 2007-01-08 10:00:00 MUT 0.1 0.15 0.15
14 2007-01-09 11:00:00 MUT NA NA NA
15 2007-01-10 12:00:00 MUT NA NA NA
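For comparison, a base-R sketch of the same "current plus previous, divided by two" rule without by() and Reduce(): ave() applies a function within each site and returns the results in the original row order. It assumes rows are already sorted by datetime within each site, as in the question's data; only site, flow and goal are reproduced here, in a data frame named `flows` (an illustrative name that avoids masking base::sample).

```r
flows <- data.frame(
  site = rep(c("EBT", "MUT"), c(5, 10)),
  flow = c(NA, 0.1, NA, NA, NA,
           NA, 0.4, 0.2, NA, NA, 0.4, 0.2, 0.1, NA, NA),
  goal = c(NA, NA, NA, NA, NA,
           NA, NA, 0.3, NA, NA, NA, 0.3, 0.15, NA, NA)
)

# within each site: (current + previous) / 2; NA + anything is NA, so single
# measurements and the first row of every site drop out automatically
flows$avg <- ave(flows$flow, flows$site,
                 FUN = function(f) (f + c(NA, head(f, -1))) / 2)
flows
```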
plyr functions are a good way to split data frames by one or more variables, which is what you need to do here.
I thought of two ways to handle intervals on a vector: first with an outer sum (for the means of consecutive pairs), and second by applying a function over pairs of timestamps (for generating the labels). They both do essentially the same thing, though.
library(reshape2)
library(plyr)
library(lubridate)
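meanBetween <- function(x){
  # diag(outer(a, b, "+")) picks out a[i] + b[i], so this returns
  # (x[i] + x[i + 1]) / 2 for each consecutive pair
  l <- length(x)
  diag(outer(x[1:(l-1)], x[2:l], "+"))/2
}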
output <- ddply(sample, .(site), function(df){
df <- df[order(df$datetime, decreasing=FALSE), ]
result <- meanBetween(df$flow)
names(result) <- Reduce(c, (mapply(as.interval,
df$datetime[-1],
df$datetime[1:(length(df$datetime)-1)],
SIMPLIFY=FALSE)))
result
})
melt(output) # to make it look nicer
