Different formats of dates using as.POSIXct() or similar (R)

I am writing a function that plots several different data frames of time series.
The data frames have several columns. The first one is always "time", and the following ones are the parameters.
Each data frame is different from the others.
Once I have imported the data set, the function creates a calendar vector "Time":
Time <- as.POSIXct(TS[,1])
In a for loop, I use xts() to create the time series of one parameter at a time, using the "time" column to order it:
xts(TS[,i],order.by = Time)
Then I plot it.
As a result, the script looks like this:
library(xts)
TS <- read.table("ts.txt", header = TRUE, dec = ".")
Time <- as.POSIXct(TS[, 1])
for (i in 2:ncol(TS)) {
  p <- plot(xts(TS[, i], order.by = Time))
  print(p)
}
I have problems with as.POSIXct() when the format of the time vector in my data frames is not yyyy-mm-dd. Here are a few examples:
In some data frames, the "time" column contains only "yyyy", and pasting "mm-dd" onto the year would not make sense because of how the data are organized (in this case, the months are the columns).
In other cases, I also have negative dates because they are BC.
Are there other functions I can use to create calendar dates suitable for xts() from formats like mine?
Here are three examples of data sets I have this problem with:
#1
year <- c(seq(1900, 2000, by = 10))
Jan <- c(rnorm(length(year), mean = 1, 5))
Feb <- c(rnorm(length(year), mean = 6, 9))
TS <- as.data.frame(cbind(year,Jan,Feb))
str(TS)
#2
year <- c(seq(-500, 2000, by = 100))
Jan <- c(rnorm(length(year), mean = 1, 5))
Feb <- c(rnorm(length(year), mean = 6, 9))
TS <- as.data.frame(cbind(year,Jan,Feb))
str(TS)
#3
time <- c("-100/01/01", "-100/06/01", "0/01/01", "1400/01/01", "2000/01/01")
people <- abs(c(rnorm(length(time), mean = 6, 9)))
TS <- data.frame(time, people, stringsAsFactors = FALSE)
str(TS)
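A minimal sketch of one possible workaround, assuming the zoo package is acceptable (this is not from the original thread). As far as I know, xts() requires order.by to be a time-based class such as Date or POSIXct, whereas zoo() also accepts a plain numeric index. Yearly data, including negative (BC) years, can therefore be indexed by the year column directly, and full BC dates like those in example #3 can first be mapped to decimal years. The decimal-year conversion is an illustrative approximation (equal-length months, 365.25-day years) and assumes astronomical year numbering, where year 0 exists, as example #3 suggests.

```r
library(zoo)

set.seed(1)
# Example #2 from the question: yearly data with BC (negative) years
year <- seq(-500, 2000, by = 100)
TS2 <- data.frame(year = year,
                  Jan = rnorm(length(year), mean = 1, sd = 5),
                  Feb = rnorm(length(year), mean = 6, sd = 9))

# zoo accepts a numeric index, so the year column can be used as-is
z2 <- zoo(as.matrix(TS2[, -1]), order.by = TS2$year)

# Same loop as in the question, one plot per parameter column
for (i in 2:ncol(TS2)) {
  plot(zoo(TS2[, i], order.by = TS2$year), main = names(TS2)[i],
       xlab = "Year", ylab = names(TS2)[i])
}

# Example #3: "year/month/day" strings, some with negative years
time <- c("-100/01/01", "-100/06/01", "0/01/01", "1400/01/01", "2000/01/01")
people <- abs(rnorm(length(time), mean = 6, sd = 9))

# Split each string into year, month and day, then convert to numbers
parts <- do.call(rbind, strsplit(time, "/"))
yr <- as.numeric(parts[, 1])
mo <- as.numeric(parts[, 2])
dy <- as.numeric(parts[, 3])

# Approximate decimal year (assumption: equal months, 365.25-day year)
dec_year <- yr + (mo - 1) / 12 + (dy - 1) / 365.25
z3 <- zoo(people, order.by = dec_year)
plot(z3, xlab = "Year", ylab = "people")
```

The trade-off is that the x-axis shows plain numbers rather than calendar dates; in exchange, nothing has to be invented for the missing month/day parts of yearly data.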

Related

Creating time series in R

I have a CSV file containing data as follows:
date, group, integer_value
The dates run from 01-January-2013 to 31-October-2015 for each of the 20 groups in the data.
I want to create a time series for each of the 20 groups, but the dates are not continuous and have sporadic gaps. Therefore
group4series <- ts(group4, frequency = 365.25, start = c(2013,1,1))
works from a programming point of view but is not correct because of the gaps in the data.
How can I use the 'date' column of the data to create the time series instead of the usual 'frequency' parameter of 'ts()' function?
Thanks!
You could use zoo::zoo instead of ts.
Since you don't provide sample data, let's generate daily data and keep a random sample of 100 days to introduce "gaps".
set.seed(2018)
dates <- seq(as.Date("2015/12/01"), as.Date("2016/07/01"), by = "1 day")
dates <- dates[sample(length(dates), 100)]
We construct a sample data.frame
df <- data.frame(
dates = dates,
val = cumsum(runif(length(dates))))
To turn df into a zoo timeseries, you can do the following
library(zoo)
ts <- with(df, zoo(val, dates))
Let's plot the timeseries
plot.zoo(ts)
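The question mentions 20 groups. A minimal sketch of extending the same zoo approach to several groups, assuming the CSV has been read into a data frame with the columns date, group and integer_value (the small data set below is made up): split() the rows by group, build one irregular zoo series per group, and optionally merge them into one wide series that has NA wherever a group has no observation.

```r
library(zoo)

set.seed(2018)
# Made-up stand-in for the CSV: 3 groups with 10 irregular dates each
df <- data.frame(
  date = as.Date("2013-01-01") + sort(sample(0:60, 30)),
  group = rep(c("g1", "g2", "g3"), each = 10),
  integer_value = sample(1:100, 30)
)

# One irregular zoo series per group, indexed by that group's own dates
series_by_group <- lapply(split(df, df$group), function(d) {
  zoo(d$integer_value, order.by = d$date)
})

# Optional: align all groups on the union of their dates (NA = no observation)
wide <- do.call(merge, series_by_group)
```

Keeping the list form avoids padding with NA when each group is analysed separately; the merged form is handy when groups have to be compared date by date.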

Multiply a timeseries by a factor in R

I have the following time series:
ts <- cbind(data.frame(date=seq(as.Date("2017/11/01"), by = "day", length.out = 30)),value=rep(5,30))
ts <- ts[order(ts$date, decreasing=T),]
I would like to adjust it by the cumulative factor below, which has values on certain dates:
cf <- cbind(data.frame(date=as.Date(c("2017/11/28", "2017/11/25","2017/11/04","2017/09/25"))),cumfactor=c(0.8,0.7,0.6,.05))
The value on each date in ts should be multiplied (adjusted) by the cumfactor in cf for the corresponding date, and that cumfactor should also be used for subsequent (earlier) dates until the next cumfactor appears for an earlier date. The first (latest) dates in ts should not be adjusted if they are later than the first (latest) cumfactor date.
I am looking for the following result:
result <- cbind(data.frame(date=seq(as.Date("2017/11/01"), by = "day", length.out = 30)),value=c(3,3,3,3,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,3.5,4,4,4,5,5))
result <- result[order(result$date, decreasing=T),]
My guess is that a for loop could be the best option but I haven't been successful at obtaining this result.
Merge ts and cf, carry forward the factors and multiply them.
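library(zoo)
# left-join the factors onto every date of ts, then reverse into descending date order
m <- merge(ts, cf, all.x = TRUE)[nrow(ts):1, ]
# carry each factor down to the following (earlier) dates; rows above the first factor
# (dates later than the latest cf date) get 1, i.e. stay unadjusted
transform(m, value = value * na.fill(na.locf0(cumfactor), 1))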
We have retained your descending sequence of dates from the question but note that in R normally time series are represented in ascending order of date.
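For comparison, a base-R sketch of the same rule without zoo; it swaps in findInterval() and is not the approach from the answer above. For each date it takes the factor of the nearest cf date on or after that date, and uses 1 when no such cf date exists. The data frame is renamed ts_df here (a hypothetical name) so it does not mask stats::ts, and dates are kept ascending until the last step.

```r
# Data from the question (ascending dates here)
ts_df <- data.frame(date = seq(as.Date("2017-11-01"), by = "day", length.out = 30),
                    value = rep(5, 30))
cf <- data.frame(date = as.Date(c("2017-11-28", "2017-11-25", "2017-11-04", "2017-09-25")),
                 cumfactor = c(0.8, 0.7, 0.6, 0.05))

# findInterval() needs its breakpoints in ascending order
cf_sorted <- cf[order(cf$date), ]

# With left.open = TRUE, i satisfies breaks[i] < d <= breaks[i + 1],
# so i + 1 points at the first cf date on or after d
idx <- findInterval(as.numeric(ts_df$date), as.numeric(cf_sorted$date),
                    left.open = TRUE) + 1

# Dates after the latest cf date get index length + 1, which maps to factor 1
factors <- c(cf_sorted$cumfactor, 1)[idx]
ts_df$value <- ts_df$value * factors

# Back to the descending order used in the question
result <- ts_df[order(ts_df$date, decreasing = TRUE), ]
```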

R check consistency of separated timeseries-table

I have a timeseries-table like this, which goes up to 2000 31 12 23 (12/31/2000 23:00):
I'd like to add temperature values from several weather stations to it. The problem is that the different time series obviously don't have the same number of rows, so there must be gaps.
How can I check whether these data frames consistently follow the pattern of hours 0-23 and months 1-12, and find out where the gaps are?
If your data is in the format shown in the link, you can probably convert it to a POSIXct object as follows (assuming your data frame is called data):
data = as.data.frame(list(YY = rep("1962",10),
MM = rep("01",10),
DD = rep("01",10),
HH = c("00","01","02","03","04",
"05","06","07","08","09")))
date = paste(data$YY,data$MM,data$DD,sep="-")
data$dateTime = as.POSIXct(paste(date,data$HH,sep=" "),format="%Y-%m-%d %H")
That should put your data into POSIXct format. If your temperature dataset also has a column called "dateTime" that is a POSIXct object, you should be able to use the merge function to combine the two data frames:
temp = as.data.frame(list(YY = rep("1962",10),
MM = rep("01",10),
DD = rep("01",10),
HH = c("00","01","02","03","04",
"05","06","07","08","09")))
date1 = paste(temp$YY,temp$MM,temp$DD,sep="-")
temp$dateTime = as.POSIXct(paste(date1,temp$HH,sep=" "),format="%Y-%m-%d %H")
temp$temp = round(rnorm(10,0,5),1)
temp = temp[,c("dateTime","temp")]
#let's say your temperature dataset is missing an entry for a certain timestamp
temp = temp[-3,]
# this data frame won't have an entry for 02:00:00
data1 = merge(data,temp)
data1
# if you want to look at time differences you can try something like this
diff(data1$dateTime)
# this one will fill in the temp value as NA at 02:00:00
data2 = merge(data,temp,all.x = T)
data2
diff(data2$dateTime)
I hope that helps; I often use the merge function when I'm trying to match up timestamps from ecological datasets.
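The gap check that diff() hints at can be made explicit. A minimal base-R sketch on a made-up hourly record with two missing hours: compare the data against a complete hourly grid to list the missing timestamps, and flag every step between consecutive observations that is longer than one hour.

```r
# Hypothetical hourly record with two missing hours (02:00 and 06:00)
full <- seq(as.POSIXct("1962-01-01 00:00", tz = "UTC"),
            as.POSIXct("1962-01-01 09:00", tz = "UTC"), by = "hour")
obs <- full[-c(3, 7)]

# 1) Timestamps expected in the complete hourly grid but absent from the data
missing <- full[!(as.numeric(full) %in% as.numeric(obs))]
print(missing)

# 2) Positions where the step to the next observation exceeds one hour
step_s <- diff(as.numeric(obs))          # differences in seconds
gap_after <- obs[which(step_s > 3600)]   # last timestamp before each gap
print(gap_after)
```

Using UTC here mirrors the follow-up below: in a zone with daylight saving time, a genuinely complete local-time record would still appear to have a gap at the spring change.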
Thank you for your answer and sorry for my late reply.
I couldn't have done it without your helpful hints, although I eventually merged all my time series in a slightly different way:
Sys.setenv(TZ='UTC') # set the session time zone to UTC to avoid DST gaps
# creating empty hourly timeseries for following join
start = strptime("1962010100", format="%Y%m%d%H")
end = strptime("2000123123", format= "%Y%m%d%H")
series62_00 <- data.frame(
MESS_DATUM=seq(start, end, by="hour",tz ='UTC'), t = NA)
# joining all the temperature series with the same timespan using the "plyr" package
library("plyr")
t_allstations <- list(series62_00,t282,t867,t1270,t2261,t2503
,t2597,t3668,t3946,t4752,t5397,t5419,t5705)
t_omain_DWD <- join_all(t_allstations, by = "MESS_DATUM", type = "left")
Using join_all with type = "left" makes sure that the date column (MESS_DATUM) is not changed and that missing temperature values are filled in as NA.
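If plyr is not available, a similar left join over a list of data frames can be sketched in base R with Reduce() and merge(all.x = TRUE). The station data frames below are made up, and each is assumed to have its own temperature column name so that merge() does not add .x/.y suffixes. Unlike join_all(), merge() returns the rows sorted by the key column.

```r
# Complete hourly skeleton (6 hours here instead of 1962-2000)
start <- as.POSIXct("1962-01-01 00:00", tz = "UTC")
skeleton <- data.frame(MESS_DATUM = seq(start, by = "hour", length.out = 6))

# Two hypothetical stations with gaps
st_a <- data.frame(MESS_DATUM = skeleton$MESS_DATUM[c(1, 2, 4, 5, 6)],
                   t_a = c(1.2, 1.0, 0.4, 0.1, -0.3))
st_b <- data.frame(MESS_DATUM = skeleton$MESS_DATUM[c(1, 3, 5)],
                   t_b = c(2.0, 1.5, 0.9))

# Left-join each station onto the skeleton in turn; missing hours become NA
combined <- Reduce(function(x, y) merge(x, y, by = "MESS_DATUM", all.x = TRUE),
                   list(skeleton, st_a, st_b))
print(combined)
```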

Use dygraph for R to plot xts time series by year only?

I am trying to use the new dygraphs for R library to plot winning times for men and women in the Boston Marathon each year. I've got a data frame of winning times in seconds; here's a portion of it:
winners <- data.frame(year=1966:1971, mensec=c(8231, 8145, 8537, 8029, 7830, 8325), womensec=c(12100, 12437, 12600, 12166, 11107, 11310))
But I don't know how to create an xts object from this. I can create a regular time series from each column and plot each one with dygraph in a separate graph:
men <- ts(winners$mensec, frequency = 1, start=winners$year[1])
dygraph(men)
women <- ts(winners$womensec, frequency = 1, start=winners$year[1])
dygraph(women)
If I cbind the two time series, however, dygraph fails:
both <- cbind(men, women)
dygraph(both)
The error message is
Error in xts(x.mat, order.by = order.by, frequency = frequency(x), ...) :
NROW(x) must match length(order.by)
Any suggestions? Thanks
This looks like a bug in as.xts.ts. It uses length(x) to create the sequence of dates for the index, which returns the number of elements for a matrix (not the number of rows).
You can work around it by using as.xts on your ts objects before calling cbind on them.
both <- cbind(men=as.xts(men), women=as.xts(women))
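The cause described above can be reproduced in base R alone: cbind() on two ts objects returns a multivariate "mts" matrix, for which length() counts every element while NROW() counts time points. A sketch (base R only; building a yearly index this way is an illustration, not the actual as.xts.ts internals):

```r
winners <- data.frame(year = 1966:1971,
                      mensec = c(8231, 8145, 8537, 8029, 7830, 8325),
                      womensec = c(12100, 12437, 12600, 12166, 11107, 11310))
men <- ts(winners$mensec, frequency = 1, start = winners$year[1])
women <- ts(winners$womensec, frequency = 1, start = winners$year[1])
both <- cbind(men, women)   # 6 x 2 multivariate ts

n_elem <- length(both)      # 12: every cell of the matrix
n_rows <- NROW(both)        # 6: one per year

# An index with one date per row, i.e. the length xts() expects for order.by
idx <- seq(as.Date(paste0(start(both)[1], "-01-01")), by = "year", length.out = n_rows)
print(idx)
```

An index built from length() would have 12 entries for 6 rows, which matches the "NROW(x) must match length(order.by)" error.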
Looks like Joshua answered the question. Here is the full answer with slightly different code and also with a year x-axis formatter.
library(xts)
library(dygraphs)
winners <- data.frame(
year=1966:1971
, mensec=c(8231, 8145, 8537, 8029, 7830, 8325)
, womensec=c(12100, 12437, 12600, 12166, 11107, 11310)
)
winners_xts <- as.xts(
winners[-1]
, order.by = as.Date(
paste0(winners$year, "-01-01")
)
)
dygraph( winners_xts ) %>%
dyAxis(
name="x"
,axisLabelFormatter = "function(d){ return d.getFullYear() }"
)
# using the original data here is a way to merge
# merge will work just like cbind
# but need to convert to xts first
men <- ts(winners$mensec, frequency = 1, start=winners$year[1])
women <- ts(winners$womensec, frequency = 1, start=winners$year[1])
do.call(merge,Map(as.xts,list(men=men,women=women)))

Time series subset plot showing data outside of the subset range in R

First of all, I'm very new to programming.
I'm working on an independent project to run a regression on Fed time series data. I have two tables: one is the 2-year Treasury yield (2YR), observed daily, and the other is the DOW index, also observed daily. My DOW data only goes back to 2006, so I want to subset the rows of 2YR to the dates for which I have DOW data.
When I create a new table in R, only the values for the subset are shown, but when I plot the data, it plots values from earlier than I need (data that isn't in the subset).
Below is my very messy work in progress:
setwd("/Users/efridge123/Desktop/R/Directory/Fed_Data")
yield <- read.table("tyield.csv", header = TRUE, sep = ",")
djia <- read.csv("DJIA.csv", header = TRUE, sep = ",")
DOW <- djia[,"VALUE"]
yield1 <- yield[-(1:7320), c("DATE", "DGS2")]
# also tried:
##yield1 <- yield[7321:9929, c("DATE","DGS2")]
yield1$DATE <- as.Date(yield1$DATE, format = "%Y-%m-%d")
dy2 <- cbind(yield1, DOW)
yield_time_series <- function(){
dy2$DOW <- NULL
plot(dy2, ylab = "2 Year Yield")
}
yield_time_series()
Below is a picture of the plot:
I suppose I could probably just extract a subset from 2YR and add it to DOW (here I did the opposite), but I would rather have working knowledge of how the problem can be resolved in R, so that in the future when I don't have the option, I have the knowledge.
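One way to make the subset follow the dates rather than row positions is to join the two tables on DATE, so that only dates present in both survive; this is a suggested alternative, not a diagnosis of the plot above. A sketch with simulated data: the tables, the column names DATE, DGS2 and VALUE, and the 2004/2006 start dates are stand-ins for the real CSV files. With the real files, DATE would first need converting with as.Date() in both tables.

```r
set.seed(1)
# Simulated stand-ins for tyield.csv and DJIA.csv
yield <- data.frame(DATE = seq(as.Date("2004-01-01"), as.Date("2007-12-31"), by = "day"))
yield$DGS2 <- 3 + cumsum(rnorm(nrow(yield), 0, 0.02))
djia <- data.frame(DATE = seq(as.Date("2006-01-01"), as.Date("2007-12-31"), by = "day"))
djia$VALUE <- 11000 + cumsum(rnorm(nrow(djia), 0, 50))

# Inner join on the date: rows are matched by DATE, not by position
dy2 <- merge(yield, djia, by = "DATE")
print(range(dy2$DATE))

# Plot only the yield column against the shared dates
plot(DGS2 ~ DATE, data = dy2, type = "l", ylab = "2 Year Yield")
```

Because the rows are matched by date, this does not depend on knowing that the DOW data starts at row 7321 of the yield table.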
