Use of loops and seasonal adjustment in R with multivariate timeseries - r

My goal is to seasonally adjust my data and keep only the seasonal factors.
My data consists of 60+ time series, so one of my problems is that I need to adjust
many series at once. Here is my attempt in R:
library("timeDate")
library("timeSeries")
library("seasonal")
mDxts <- structure(c(35.8, 41.6, 35.9, 36.9, 42.43, 36.067,28.67, 29.53, 32.83, 29.867,23.9, 20.8, 21.167, NA, NA, NA, NA, NA, NA, NA, 149.67,108.89, 89.067, 83.33, 77.2,64.91, 50.2, 48, 62.13, 52.93,43.2, 38.8, 37.9, 19, 18, 17, 16.5, 16, 15.5, 15), class = c("xts","zoo"), .indexCLASS = "yearqtr", tclass = "yearqtr", .indexTZ = "", tzone = "", index = structure(c(946684800,954547200, 962409600, 970358400, 978307200, 986083200, 993945600,1001894400, 1009843200, 1017619200, 1025481600, 1033430400, 1041379200,1491004800, 1498867200, 1506816000, 1514764800, 1522540800, 1530403200,1538352000), tzone = "", tclass = "yearqtr"), .Dim = c(20L, 2L), .Dimnames = list(NULL, c("depall", "ref")))
ll<-lapply(mDxts, function(e) ts(e,start=c(2000,1),frequency=4))
#I. Seasonal adjustment and saving only d10 component:
sf<-sapply(ll,function(e) try(seas(e,x11="",na.action=na.exclude, transform.function = "none",x11.mode="logadd",arima.model="(0 1 1) (0 1 0)",regression.aictest = NULL, outlier=NULL,x11.save="d10")))
As I understand it, my main problem is that sf <- sapply(ll, function(e) try(seas(e, x11.save = "d10"))) returns a list in which each element is itself a list containing d10, d11 and so on (take a look with sf.df <- as.data.frame(sf)).

Some time has passed, and I can now answer my own question.
I needed to seasonally adjust a lot of data simultaneously and then recalculate everything in one go.
The first problem was that the result of the seasonal adjustment was saved as a list. The solution is to use the function final() from the seasonal package:
# ll is the list of ts objects built in the question
sa <- sapply(ll, function(e) try(final(seas(e, x11 = "", na.action = na.exclude,
  transform.function = "none", x11.mode = "logadd", arima.model = "(0 1 1) (0 1 0)",
  regression.aictest = NULL, outlier = NULL))))
To save only the seasonal factors, the function series() from the same package is needed:
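# ll is the list of ts objects built in the question; "d10" selects the seasonal factors
sf <- sapply(ll, function(e) try(series(seas(e, x11 = "", na.action = na.exclude,
  transform.function = "none", x11.mode = "logadd", arima.model = "(0 1 1) (0 1 0)",
  regression.aictest = NULL, outlier = NULL), "d10")))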
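The difference between the two kinds of sapply() result can be illustrated without the seasonal package. The sketch below uses stats::decompose() purely as a stand-in for seas() (it is a classical decomposition, not X-11, and the two toy series are made up). Applying the fitting function alone yields a list-mode result holding every component, whereas extracting one numeric component inside the function lets sapply() simplify to a numeric matrix with one column per series.

```r
# Two made-up quarterly series (illustrative data, not from the question)
series_list <- list(
  a = ts(rep(c(10, 12, 9, 11), 3) + 1:12, start = c(2000, 1), frequency = 4),
  b = ts(rep(c(5, 7, 6, 4), 3) + 0.5 * (1:12), start = c(2000, 1), frequency = 4)
)

# Returning the whole fitted object: sapply() can only build a list-mode matrix
fits <- sapply(series_list, decompose)
is.list(fits)

# Returning a single numeric component: sapply() builds a numeric matrix
seasonal <- sapply(series_list, function(e) decompose(e)$seasonal)
is.numeric(seasonal)
dim(seasonal)
```

This is the same pattern that final() and series(..., "d10") rely on in the answer: each call returns one time series per input, which sapply() can bind into columns.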

Related

I cannot figure out why my dates don't work

structure(list(Position = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = structure(c(1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400, 1685750400), tzone = "UTC", class = c("POSIXct", "POSIXt")), Time
= structure(c(-2209017523, -2209017518, -2209017513, -2209017508, -2209017503, -2209017498, -2209017493, -2209017488, -2209017483, -2209017478), tzone = "UTC", class = c("POSIXct", "POSIXt")), DateTime = structure(c(1685808077, 1685808082, 1685808087,
1685808092, 1685808097, 1685808102, 1685808107, 1685808112, 1685808117, 1685808122), tzone = "UTC", class = c("POSIXct", "POSIXt")), Temperatuur = c(21.2, 21.2, 21.6, 21.7, 22, 22.2, 20.1, 20.2, 20.3, 20.3), Treatment = c("Tempex", "Tempex", "Tempex",
"Tempex", "Tempex", "Tempex", "Tempex", "Tempex", "Tempex", "Tempex")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
My R code:
time.start=as.POSIXct("2023-06-03, 16:00",format="%H:%M")
time.end=as.POSIXct("2023-07-03, 07:30",format="%H:%M")
ggplot(logger, aes(DateTime, Temperatuur, color = Treatment))+
geom_line(size = 1)+
scale_x_datetime(limits = c(time.start, time.end),
breaks = date_breaks("24 hours"),
labels = date_format("%H:%M"))
I just cannot figure out why I am getting this plot.
Can someone please help me?
I tried to follow advice online, but that got me nowhere. Changing from 24 to 12 hours also didn't help. Maybe the problem is in my Excel file, but everything there seems fine.
If we remove the commas from the time.* strings and fix the errant format=, it should work.
library(ggplot2)
library(scales)  # provides date_breaks() and date_format()
time.start <- as.POSIXct("2023-06-03 16:00")
time.end <- as.POSIXct("2023-07-03 07:30")
ggplot(logger, aes(DateTime, Temperatuur, color = Treatment)) +
  geom_line(size = 1) +
  scale_x_datetime(
    limits = c(time.start, time.end),
    breaks = date_breaks("24 hours"),
    labels = date_format("%H:%M"))
I think you may be conflating labels=date_format("%H:%M") with how time.start/time.end should be created (or how they should look), which is incorrect. Since the x-axis is (I'm inferring) a POSIXt object, the time.* objects must be POSIXt objects as well. You don't need to make them "look" (i.e., "%H:%M") the way you want the axis labels rendered; ggplot handles that automatically.
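To see why the original limits failed, here is a small base-R check. It uses tz = "UTC", which is an assumption chosen to match the tzone of the DateTime column; the variable names bad, good1 and good2 are illustrative.

```r
# format= describes how to READ the input string, not how to display it.
# "%H:%M" cannot parse a string that starts with a date, so the result is NA.
bad <- as.POSIXct("2023-06-03, 16:00", format = "%H:%M", tz = "UTC")
is.na(bad)

# Either drop the comma and let R try its default formats ...
good1 <- as.POSIXct("2023-06-03 16:00", tz = "UTC")
# ... or keep the comma and describe the full layout in format=.
good2 <- as.POSIXct("2023-06-03, 16:00", format = "%Y-%m-%d, %H:%M", tz = "UTC")
good1 == good2

# Display formatting is a separate step, applied to a valid date-time.
format(good1, "%H:%M")
```

With NA limits, scale_x_datetime() has no usable range to work with, which is consistent with the broken axis described in the question.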

geom_tile plot to visualize missing observations of temporal data

I'm trying to visualize missing observations in a time series dataset using R.
I have a dataset of the following format with temperature values on 5 different sensors over 15 minute time periods.
quarter_hour_readings <- structure(list(sensor_id = c(
"5a", "5b", "5c", "6a", "6b", "5a",
"5b", "5c", "6a", "6b"
), date_time = structure(c(
1573805700,
1573805700, 1573805700, 1573805700, 1573805700, 1573806600, 1573806600,
1573806600, 1573806600, 1573806600
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(
14.4, 21.8, 15, 19.2, 10, 14.7, 21.1,
15.8, 18.5, 10.4
)), class = "data.frame", row.names = c(NA, -10L))
I tried the following code to plot a tile plot of observations of each sensor over time, but an empty plot appears with no error message.
library(ggplot2)
library(dplyr)
quarter_hour_readings %>%
ggplot(aes(x = date_time,
y = sensor_id,
fill = value)) +
geom_tile()
The following is a similar plot I visualized using Tableau for the same data, where each bar is a sensor and the x-axis represents continuous-time.
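One possible way to make the gaps explicit before plotting, sketched in base R under the assumption that every sensor should report at every timestamp seen in the data: build the full sensor-by-time grid and left-join the readings onto it, so absent readings become NA. The names toy, grid and full are illustrative only, and this sketch does not diagnose why the original plot came out empty.

```r
# Shortened toy readings: sensor "5b" has no reading at the second timestamp
toy <- data.frame(
  sensor_id = c("5a", "5b", "5a"),
  date_time = as.POSIXct(c(1573805700, 1573805700, 1573806600),
                         origin = "1970-01-01", tz = "UTC"),
  value = c(14.4, 21.8, 14.7)
)

# Every combination of sensor and timestamp that should exist
grid <- expand.grid(
  sensor_id = unique(toy$sensor_id),
  date_time = unique(toy$date_time),
  stringsAsFactors = FALSE
)

# Left join: combinations without a reading get value = NA
full <- merge(grid, toy, by = c("sensor_id", "date_time"), all.x = TRUE)
full$missing <- is.na(full$value)
full
```

The missing column could then be mapped to fill in a tile plot, so that gaps are drawn as their own colour rather than left blank.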

Multiple individual graphs from a unique dataframe

I know there are already questions on similar topics, but even with their help I was not able to resolve the issue on my own. I apologize if this appears to be a duplicate, but I am a bit stuck.
I have to draw nearly 40 graphs representing body temperature variations across 24 hours (one graph per individual in the study). To do that, I tried to write a loop using the dplyr and ggplot2 packages. You can find an example of my data below. There are numerous missing values, but I don't think that is an issue for the current question.
structure(list(heures = structure(1:13, .Label = c("01:00:00",
"03:00:00", "05:00:00", "07:00:00", "08:00:00", "10:00:00", "12:00:00",
"13:30:00", "15:00:00", "17:00:00", "19:00:00", "21:00:00", "23:00:00"
), class = "factor"), x1= c(36.55, 36.5, 36.44444444,
36.6, 36.86666667, 37.26, 37, NA, NA, 37.3, 37.1, 37, 35.6),
x2 = c(NA, 34.5, 35.4, 36.1, NA, NA, NA, NA, NA,
NA, NA, NA, NA), x3 = c(36.9, 36.4, NA, NA, 36.9,
NA, NA, NA, NA, 37.5, 37.5, 36.9, 37.1), x4 = c(36,
35.8, NA, NA, NA, 37.4, 36.7, 36.3, NA, 37.5, 37, NA, NA)), class = "data.frame", row.names = c(NA,
-13L))
So far, I have written the following code with "indiv" being a dataframe containing the above presented data.
names <- c(colnames(indiv))
graph <- list()
test <- function(df, names) {
for (i in 1:length(df)) {
name <- names[i]
stock <- df %>%
filter(heures, !!name)
graph[[i]] <- ggplot(data=stock, aes(x=heures, y=stock[,2])) +
geom_point() +
labs(x="Hours (HH:MM:SS)",
y="Temperature",
title=colnames(stock[2]))
}
return(graph)
}
It returns an error that seems to indicate the filter function does not work properly:
Warning messages:
1: In Ops.factor(~heures, ~"x1") :
‘&’ not meaningful for factors
I can't figure out what I'm doing wrong here. I also tried code without the dplyr part of the current loop, but it didn't give me the desired output either.
Thank you in advance for your advice.
I came up with this idea: tidy the dataset a little to make it easier to use with ggplot, then split it and store the resulting data frames in a list. Then I use lapply with a custom function to create the plots, which avoids writing a loop.
This is not very fast if you have a lot of data, but I use this trick a lot with small datasets.
This code creates a separate plot for each individual (not facets).
library(tidyverse) # not all functions of these packages are needed here
df = structure(list(heures = structure(1:13, .Label = c("01:00:00",
"03:00:00", "05:00:00", "07:00:00", "08:00:00", "10:00:00", "12:00:00",
"13:30:00", "15:00:00", "17:00:00", "19:00:00", "21:00:00", "23:00:00"
), class = "factor"), x1= c(36.55, 36.5, 36.44444444,
36.6, 36.86666667, 37.26, 37, NA, NA, 37.3, 37.1, 37, 35.6),
x2 = c(NA, 34.5, 35.4, 36.1, NA, NA, NA, NA, NA,
NA, NA, NA, NA), x3 = c(36.9, 36.4, NA, NA, 36.9,
NA, NA, NA, NA, 37.5, 37.5, 36.9, 37.1), x4 = c(36,
35.8, NA, NA, NA, 37.4, 36.7, 36.3, NA, 37.5, 37, NA, NA)), class = "data.frame", row.names = c(NA,
-13L))
# tidy your data, good practice makes it easier to plot things with ggplot
df = df %>% pivot_longer(2:ncol(df), names_to = "individual", values_to = "temperature")
# I would do it this way:
df_list = split(df, df$individual)
plot_fun = function(df) {
title = unique(df$individual)
ggplot(df, aes(x=heures, y=temperature))+
geom_point() +
labs(title = title)
#### add here things to save your plots, store them somewhere, etc
}
lapply(df_list, FUN = plot_fun)
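The filter(heures, !!name) call in the question seems intended to keep the hour column plus one individual's column and drop the missing readings. filter() instead treats each argument as a row condition, which would be consistent with the Ops.factor warning. A minimal base-R sketch of that selection step, using a shortened copy of the data (the helper name one_individual is hypothetical):

```r
# Shortened copy of the question's data: two individuals, four time points
indiv <- data.frame(
  heures = factor(c("01:00:00", "03:00:00", "05:00:00", "07:00:00")),
  x1 = c(36.55, 36.5, 36.44, 36.6),
  x2 = c(NA, 34.5, 35.4, 36.1)
)

# Hypothetical helper: keep the hours plus one individual's readings
one_individual <- function(df, name) {
  out <- df[, c("heures", name)]      # select columns by name
  names(out)[2] <- "temperature"      # common column name for plotting
  out$individual <- name
  out[!is.na(out$temperature), ]      # drop missing readings
}

ids <- setdiff(names(indiv), "heures")
pieces <- lapply(ids, one_individual, df = indiv)
names(pieces) <- ids
sapply(pieces, nrow)
```

Each element of pieces has the heures, temperature and individual columns that plot_fun expects, so it could be passed to that function in place of the split() output.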
Using toy data, since your data frame is incomplete:
library(tidyverse)
df <- tibble(
  X = rep(1:10, times = 2),
  Y = c(1:10, seq(10, 1, -1)),
  Name = rep(c("Patient 1", "Patient 2"), each = 10)
)
# One panel per patient, stacked in rows
df %>% ggplot() +
  geom_line(aes(x = X, y = Y)) +
  facet_grid(rows = vars(Name))

Apply technical analysis indicator to multiple assets

I am trying to apply this simple technical analysis indicator to an xts object called prices, but I can't manage to create the loop for the signal. Do you have any suggestions?
library(TTR)
library(Hmisc)
library(xts)
prices = structure(c(70.27, 70.29, 70.31, 70.67, 70.41, 70.53, 70.56,
69.61, 70.32, 69.97, 70.13, 68.88, 68.97, 70.75, 71.32, 71.32,
71.32, 72.02, 72.48, 73.33, 73.59, 73.93, 73.47, 72.13, 72.17,
73.18, 72.59, 73.34, 73.43, 72.78, 72.43, 72.3, 71.27, 71.51,
71.94, 71.1, 69.77, 70.02, 70.26, 69.6, 70.13, 70.13, 71.27,
70.58, 69.52, 69.58, 69.46, 69.62, 69.07, 69.98, 44.245, 44.125,
44.09, 44.155, 43.93, 44.305, 44.065, 43.37, 43.685, 43.285,
43.355, 42.305, 42.65, 43.64, 43.885, 43.885, 43.885, 44.12,
44.385, 44.78, 44.985, 44.985, 44.865, 44.38, 44.05, 44.65, 44.065,
44.62, 44.73, 44.32, 44.275, 44.145, 43.615, 43.975, 44.52, 44.335,
43.585, 43.715, 43.83, 43.735, 44.09, 44.005, 44.775, 44.325,
43.555, 43.535, 43.325, 43.425, 43.04, 43.45, 166.09, 166.44,
165.04, 167.69, 168.08, 169.17, 168.67, 167.19, 167.19, 164.39,
163.26, 159.64, 160.33, 162.83, 163.4, 163.4, 163.4, 164.79,
166.23, 168.3, 168.29, 169.34, 168.56, 166.81, 165.39, 165.98,
162.64, 163.78, 164.91, 164, 162.1, 162.25, 161.45, 162.08, 162.37,
160.09, 157.96, 158.45, 159.95, 159.75, 160.58, 160.51, 164.09,
161.96, 160.84, 161.41, 159.48, 159.45, 158.09, 158.49, 66, 66.19,
66.31, 67.17, 66.84, 67.32, 67.26, 66.19, 66.46, 65.62, 65.61,
63.87, 64.09, 64.73, 65.72, 65.72, 65.72, 66.11, 66.96, 67.53,
67.57, 67.53, 67.25, 65.98, 65.52, 66.19, 65.23, 66.2, 66.4,
65.53, 65.52, 65.37, 64.54, 64.57, 64.85, 64, 62.94, 63.18, 63.87,
63.3, 63.9, 63.83, 64.76, 64, 63.62, 63.92, 63.02, 63.27, 62.33,
62.65), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", format = "%Y-%m-%d", class = c("xts",
"zoo"), index = structure(c(1301616000, 1301875200, 1301961600,
1302048000, 1302134400, 1302220800, 1302480000, 1302566400, 1302652800,
1302739200, 1302825600, 1303084800, 1303171200, 1303257600, 1303344000,
1303430400, 1303689600, 1303776000, 1303862400, 1303948800, 1304035200,
1304294400, 1304380800, 1304467200, 1304553600, 1304640000, 1304899200,
1304985600, 1305072000, 1305158400, 1305244800, 1305504000, 1305590400,
1305676800, 1305763200, 1305849600, 1306108800, 1306195200, 1306281600,
1306368000, 1306454400, 1306713600, 1306800000, 1306886400, 1306972800,
1307059200, 1307318400, 1307404800, 1307491200, 1307577600), tzone = "UTC", tclass = "Date"), .Dim = c(50L,
4L), .Dimnames = list(NULL, c("A", "B", "C", "D")))
#I apply the EMA indicator to the prices xts dataframe
EMA20fn <- function(x) EMA(x, n=20)
EMA20prices <- xts(apply(prices, 2, EMA20fn), order.by=index(EMA20fn(prices[,1])))
# I know how to create the signals (in EMA20prices) for a single asset, but I don't know
# what kind of loop is required to apply the signal to every asset in prices
ema20tr <- Lag(ifelse(Lag(prices[,1])<Lag(EMA20prices[,1])& prices[,1]>EMA20prices[,1],1,
ifelse(Lag(prices[,1])>Lag(EMA20prices[,1])& prices[,1]<EMA20prices[,1],-1,0)))
ema20tr[is.na(ema20tr)] <- 0
ema20sig <- ifelse(ema20tr>1,0,0)
for(i in 2:length(prices[,1])){ema20sig[i] <- ifelse(ema20tr[i]==1,1,
ifelse(ema20tr[i]==-1,0,ema20sig[i-1]))}
ema20sig[is.na(ema20sig)] <- 1
Thank you in advance for the answers!
The following changes to your code will do what you want on all four columns (with the prices data structure as per the question):
library(TTR)
library(Hmisc)
library(xts)
#I apply the EMA indicator to the prices xts dataframe
EMA20fn <- function(x) EMA(x, n=20)
EMA20prices <- xts(apply(prices, 2, EMA20fn), order.by=index(EMA20fn(prices[,1])))
# Build the crossover trigger for every column of prices, lagged by one period:
# 1 = price crossed above its EMA, -1 = price crossed below, 0 = no cross
ema20tr <- NULL
for (j in 1:ncol(prices)) {
  ema20tr <- cbind(ema20tr, Lag(ifelse(Lag(prices[,j]) < Lag(EMA20prices[,j]) & prices[,j] > EMA20prices[,j], 1,
    ifelse(Lag(prices[,j]) > Lag(EMA20prices[,j]) & prices[,j] < EMA20prices[,j], -1, 0))))
}
ema20tr[is.na(ema20tr)] <- 0
# Start from an all-zero signal with the same shape as ema20tr
ema20sig <- ifelse(ema20tr > 1, 0, 0)
# Hold the last state: 1 after an upward cross, 0 after a downward cross
for (j in 1:ncol(prices)) {
  for (i in 2:nrow(prices)) {
    ema20sig[i,j] <- ifelse(ema20tr[i,j] == 1, 1,
      ifelse(ema20tr[i,j] == -1, 0, ema20sig[i-1,j]))
  }
}
ema20sig[is.na(ema20sig)] <- 1
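The state-holding step of the double loop can also be written once as a function and applied column by column. The sketch below is base R only and runs on a small made-up trigger matrix rather than on prices; hold_signal is a hypothetical helper name, and the trigger coding (1 = enter, -1 = exit, 0 = no event) follows the ema20tr convention used above.

```r
# Hypothetical helper: turn a trigger vector into a position signal that
# keeps its previous value until the next trigger arrives.
hold_signal <- function(trigger) {
  sig <- numeric(length(trigger))          # start flat (all zeros)
  for (i in seq_along(trigger)[-1]) {      # from the second element on
    sig[i] <- if (trigger[i] == 1) 1 else if (trigger[i] == -1) 0 else sig[i - 1]
  }
  sig
}

# Made-up triggers for two assets (illustrative only)
triggers <- cbind(A = c(0, 1, 0, 0, -1, 0),
                  B = c(0, 0, 0, 1, 0, 0))

# apply() over columns replaces the outer loop over assets
signals <- apply(triggers, 2, hold_signal)
signals
```

Note that apply() returns a plain matrix, so with real data the time index would have to be reattached afterwards, for example with xts(signals, order.by = index(prices)).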

Calculate percentage to total using rowPercents

I am trying to calculate the percentage of the total for, let's say, the following reproducible example:
structure(c(197.95, 197.95, 197.95, 186.8, 190.51, 195.16, 199.81,
202.59, 202.59, 202.59, 92.28, 92.28, 90.07, 89.82, 87.36, 87.61,
90.56, 89.82, 90.07, 89.82, 20.43, 20.43, 20.43, 20.43, 20.43,
20.43, 20.43, 20.43, 20.43, 20.64, 24.7, 24.95, 24.54, 23.97,
23.97, 24.38, 24.38, 24.38, 24.54, 24.54, 37.4, 37.4, 37.4, 35.43,
35.43, 35.43, 35.43, 35.43, 35.43, 39.37, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16.05,
16.05, 16.05, 16.05, 15.62, 15.62, 16.05, 15.62, 15.62, 15.62,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), index = structure(c(470620800,
470880000, 470966400, 471052800, 471139200, 471225600, 471484800,
471571200, 471657600, 471744000), tzone = "UTC", tclass = "Date"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", class = c("xts",
"zoo"), .Dim = c(10L, 9L), .Dimnames = list(NULL, c("AVON", "BA.",
"CMRG", "COB", "MGGT", "QQ.", "RR.", "SNR", "ULE")))
I need to return my data in the same layout, but with each value expressed as a percentage of the total of the row it belongs to. I did a lot of research and tried prop.table, which returns a subscript error, and finally I used rowPercents, which is part of the RcmdrMisc package. However, I could not find a way to make it ignore the NAs in my data set.
In the example provided there are two entire columns of NA. I cannot drop them, as the full data set has values in those columns for subsequent rows.
Note that the class of my example is zoo and xts.
You don't need any external packages for this (dat is the object from the question):
dat.percent <- dat / rowSums(dat, na.rm = TRUE) * 100
Check that it works:
> all(abs(rowSums(dat.percent, na.rm = TRUE) - 100) < 0.0001)
[1] TRUE
prop.table does not seem to work with xts/zoo objects, but this works (x is the object from the question):
library(xts)
prop.table(coredata(x), 1)
It returns all NAs, which is correct since there is an NA in every row (and it is impossible to calculate the proportions without knowing every value). If you want to treat the NA values as zero, then:
prop.table( na.fill(coredata(x), 0), 1)
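The two behaviours described in these answers can be compared on a small plain matrix. This is a base-R sketch with made-up numbers (m is not the data from the question): dividing by rowSums(..., na.rm = TRUE) keeps the NA cells as NA and scales the remaining cells to 100 per row, while prop.table() propagates the NA to the whole row.

```r
# Made-up 2 x 3 matrix with one NA in each row
m <- matrix(c(10, 30, NA, 60, 20, NA), nrow = 2,
            dimnames = list(NULL, c("a", "b", "c")))

# Row percentages that ignore NA cells when computing the row total
pct <- m / rowSums(m, na.rm = TRUE) * 100
pct
rowSums(pct, na.rm = TRUE)

# prop.table() uses sum() without na.rm, so every row containing an NA is all NA
prop.table(m, 1)
```

The division works row-wise because the vector of row totals has one entry per row and R recycles it down the columns.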
