R - interpolate time series to same interval but new time points

Apologies if this is an obvious question, but I am new to R, having spent many years with SAS.
I have two data files of measurements taken at 10-second intervals, but not at the exact same time points. I would like to convert one of the time series to match the times of the other, adjusting its values with linear interpolation. In SAS I could do this pretty quickly with proc expand, but I can't find a similar function or package in R (I've looked at zoo and xts).
To show what I mean, here are snippets of my two files. In this case one time series is on the 3's and the other is on the 2's. In this particular case that 1-second difference is probably trivial, but this is a problem I run into a lot in my work, so I'd like to know if there's an easy way to recalculate, via linear interpolation, the values in the second data set so that they are also on the 3's.
Date,Time,Value
3-Nov-16,13:15:53,264.651
3-Nov-16,13:16:03,264.58
3-Nov-16,13:16:13,264.368
3-Nov-16,13:16:23,264.273
3-Nov-16,13:16:33,264.391

11/3/16,1:15:52 PM,10.1
11/3/16,1:16:02 PM,10.1
11/3/16,1:16:12 PM,10.1
11/3/16,1:16:22 PM,10.1
11/3/16,1:16:32 PM,10.1

You can use the 'approx' function. Here is an example with your data:
> input <- read.table(text = "11/3/16,1:15:52 PM,10.1
+ 11/3/16,1:16:02 PM,10.1
+ 11/3/16,1:16:12 PM,10.1
+ 11/3/16,1:16:22 PM,10.1
+ 11/3/16,1:16:32 PM,10.1", as.is = TRUE, sep = ',')
>
> # convert the time (only a time format is given, so today's date is assumed)
> input$time <- as.POSIXct(input$V2, format = "%I:%M:%S %p")
> library(lubridate)
>
> input$newtime <- input$time
>
> first <- read.table(text = "3-Nov-16,13:15:53,264.651
+ 3-Nov-16,13:16:03,264.58
+ 3-Nov-16,13:16:13,264.368
+ 3-Nov-16,13:16:23,264.273
+ 3-Nov-16,13:16:33,264.391", as.is = TRUE, sep = ',')
> first$time <- as.POSIXct(first$V2, format = "%H:%M:%S")
>
> # use "approx" to interpolate values:
> # find values at the times in "input", since "first" is on different time points
> input$result <- approx(first$time,
+                        first$V3,
+                        xout = input$time,
+                        rule = 2
+                        )$y
>
> input
       V1         V2   V3                time             newtime   result
1 11/3/16 1:15:52 PM 10.1 2017-01-11 13:15:52 2017-01-11 13:15:52 264.6510
2 11/3/16 1:16:02 PM 10.1 2017-01-11 13:16:02 2017-01-11 13:16:02 264.5871
3 11/3/16 1:16:12 PM 10.1 2017-01-11 13:16:12 2017-01-11 13:16:12 264.3892
4 11/3/16 1:16:22 PM 10.1 2017-01-11 13:16:22 2017-01-11 13:16:22 264.2825
5 11/3/16 1:16:32 PM 10.1 2017-01-11 13:16:32 2017-01-11 13:16:32 264.3792
> first
        V1       V2      V3                time
1 3-Nov-16 13:15:53 264.651 2017-01-11 13:15:53
2 3-Nov-16 13:16:03 264.580 2017-01-11 13:16:03
3 3-Nov-16 13:16:13 264.368 2017-01-11 13:16:13
4 3-Nov-16 13:16:23 264.273 2017-01-11 13:16:23
5 3-Nov-16 13:16:33 264.391 2017-01-11 13:16:33
>

I apologize that I was a bit lazy about evaluating the input exactly as you asked; I am still learning R. I wonder if this piece of code solves your fundamental issue.
The algorithm is simple:
1. Convert all times to Unix time, i.e. the number of seconds since the epoch.
2. Use the Unix time as x and the second value as y.
3. Create new data points at the second set of datetimes, also converted to Unix time.
toUnixTime <- function(dateobj){
  return(as.numeric(as.POSIXct(dateobj, origin = "1970-01-01")))
}
toDateTime <- function(unixtime){
  return(as.POSIXct(unixtime, origin = "1970-01-01"))
}
toUnix <- function(datetime){
  return(as.numeric(strptime(datetime, "%d-%b-%y,%H:%M:%S")))
}
toUnix2 <- function(datetime){
  return(as.numeric(strptime(datetime, "%m/%d/%y,%I:%M:%S %p")))
}
main <- function(){
  x <- c(toUnix("3-Nov-16,13:15:53"), toUnix("3-Nov-16,13:16:03"))
  y <- c(264.651, 264.58)
  f <- approxfun(x, y)
  f(toUnix2("11/3/16,1:16:02 PM"))
}
main()
This outputs
264.5871 for 11/3/16,1:16:02 PM
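Since the question mentions having looked at zoo, here is a minimal sketch of the same linear interpolation with zoo::na.approx. The hard-coded times and values below simply mirror the snippets above and assume the timestamps are already parsed to POSIXct.
# a minimal sketch with zoo, assuming both series' timestamps are POSIXct
library(zoo)

t1 <- as.POSIXct("2016-11-03 13:15:53") + 10 * (0:4)   # times of the first file
v1 <- c(264.651, 264.580, 264.368, 264.273, 264.391)   # its values
t2 <- as.POSIXct("2016-11-03 13:15:52") + 10 * (0:4)   # target times (second file)

z1 <- zoo(v1, t1)
# interpolate z1 linearly at the target times;
# times outside z1's range come back as NA
na.approx(z1, xout = t2, na.rm = FALSE)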

Related

Converting time in integer to MIN:SEC

I am using this code to find the difference between two times:
station_data.avg$duration[i] = if_else(station_data.avg$swath[i] != 0,
  round(difftime(station_data.avg$end[i], station_data.avg$start[i], units = "mins"), 3),
  0)
But the output is 3.116667 and I want the output to be in the format min:sec, so 3:07.
I tried
station_data.avg$duration[i]= as.character(times(station_data.avg$duration[i] / (24 * 60 )))
and was hoping that would work but it did not
You can use the chron package to convert a fractional minute (e.g., x.25, meaning 25% of a minute) into seconds out of 60 (x:15, since 0.25 * 60 = 15). An example is below, but if you edit your question to make it reproducible, I can provide more specific help.
Data
a <- Sys.time()
b <- Sys.time() + 60 * 3 + 15 # add 3 min 15 seconds
Code
difftime(b, a, units = "min")
# Time difference of 3.250006 mins
chron::times(as.numeric(difftime(b, a, units = "days")))
# [1] 00:03:15
Note the change to units = "days" in this context.
You could further parse this out by wrapping this in lubridate::hms:
lubridate::hms(
  chron::times(as.numeric(difftime(b, a, units = "days")))
)
# [1] "3M 15S"

Non-conformable arrays in eventstudies package R

I am trying my best at a simple event study in R using the market model. I am using the eventstudies package and one of the steps is the calculation of the abnormal returns, being ar <- es$z.e - esMean.
The authors of the package have provided an example for the xts object (StockPriceReturns), the events (SplitDates) and the market (OtherReturns). This looks like the following:
> library(eventstudies)
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Bajaj.Auto
2010-07-01 0.5277396
2010-07-02 -1.7309383
2010-07-05 -0.2530097
2010-07-06 -0.3167551
2010-07-07 -1.2771502
2010-07-08 -0.2827092
With this data, I would like to do an event study; see the R code below:
# 10-day window around the event
es <- phys2eventtime(z=StockPriceReturns, SplitDates, width = 10)
es.cmr <- constantMeanReturn(es$z.e[which(attributes(es$z.e)$index %in% -30:-11), ],
                             residual = FALSE)
ar <- es$z.e - es.cmr
ar <- window(ar, start = -1, end = 10)
car <- remap.cumsum(ar, is.pc = FALSE, base = 0)
rowMeans(car, na.rm = TRUE)
So, if I run ar <- es$z.e - es.cmr it returns:
> ar <- es$z.e - es.cmr
Error in `-.default`(es$z.e, es.cmr) : non-conformable arrays
I have looked at the original function (available at their GitHub) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out.
Regarding the market model: does anybody know how to do the same with the marketModel function of the eventstudies package? Here I'm not able to compute es.mm (see below) because of the error message: ROW(firm.returns) == NROW(market.returns) is not TRUE
es.mm <- marketModel(firm.returns = es$z.e[which(attributes(es$z.e)$index %in% -30:-11), ],
                     market.returns = OtherReturns[, "NiftyIndex"],
                     residual = FALSE)
Thank you very much for your help.
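For context (this does not resolve the package error), the constant-mean abnormal returns and CAR that the code above is trying to compute can be written out by hand. Below is a minimal sketch with made-up returns, assuming an estimation window of event days -30 to -11 and an event window of -1 to 10; it does not use the eventstudies package, it only illustrates the arithmetic.
# a minimal sketch of constant-mean abnormal returns and CAR, with made-up data
set.seed(42)
event.time <- -30:10
ret <- matrix(rnorm(length(event.time) * 3), ncol = 3,
              dimnames = list(event.time, c("firm1", "firm2", "firm3")))

est <- ret[as.character(-30:-11), ]   # estimation window
mu  <- colMeans(est)                  # constant mean return per firm

evt <- ret[as.character(-1:10), ]     # event window
ar  <- sweep(evt, 2, mu)              # abnormal returns: actual minus mean
car <- apply(ar, 2, cumsum)           # cumulative abnormal returns per firm
rowMeans(car)                         # average CAR across firms at each event day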

Renko Chart in R

I am trying to construct a Renko chart using data obtained from Yahoo Finance and was wondering if there is any package to do so. I had a look at most of the financial packages but was only able to find candlestick charts.
For more information on Renko charts use the link given here
Really cool question! Apparently, there is really nothing of that sort available for R. There were some attempts to do similar things (e.g., waterfall charts) on various sites, but they all don't quite hit the spot. Soooo... I made a little weekend project out of it with data.table and ggplot.
rrenko
There are still bugs, instabilities, and visual things that I would love to optimize (and the code is full of commented out debug notes), but the main idea should be there. Open for feedback and points for improvement.
Caveats: There are still cases where the data transformation screws up, especially if the size is very small or very large. This should be fixable in the near future. Also, the renko() function at the moment expects a data frame with two columns: date (x-axis) and close (y-axis).
Installation
devtools::install_github("RomanAbashin/rrenko")
library(rrenko)
Code
renko(df, size = 5, style = "modern") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
renko(df, size = 5, style = "classic") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
Data
set.seed(1702)
df <- data.frame(date = seq.Date(as.Date("2014-05-02"), as.Date("2018-05-04"), by = "week"),
                 close = abs(100 + cumsum(sample(seq(-4.9, 4.9, 0.1), 210, replace = TRUE))))
> head(df)
date close
1: 2014-05-02 104.0
2: 2014-05-09 108.7
3: 2014-05-16 111.5
4: 2014-05-23 110.3
5: 2014-05-30 108.9
6: 2014-06-06 106.5
I'm an R investment developer; I used some parts of Roman's code to optimize some lines of my Renko code. Roman's ggplot skills are awesome; the plot function was only possible because of Roman's code.
If someone is interested:
https://github.com/Kinzel/k_rrenko
It will need the packages xts, ggplot2 and data.table.
"Ativo" needs to be an xts object with one of its columns named "close" to work.
EDIT:
After TeeKea's request, here is how to use it. "Ativo" is a EURUSD 15-min xts from 2015-01-01 to 2015-06-01. If a "close" column is not found, the last column is used.
> head(Ativo)
                       Open    High     Low   Close
2015-01-01 20:00:00 1.20965 1.21022 1.20959 1.21006
2015-01-01 20:15:00 1.21004 1.21004 1.20979 1.21003
2015-01-01 20:30:00 1.21033 1.21041 1.20982 1.21007
2015-01-01 20:45:00 1.21006 1.21007 1.20978 1.21002
2015-01-01 21:00:00 1.21000 1.21002 1.20983 1.21002
2015-01-02 00:00:00 1.21037 1.21063 1.21024 1.21037
How to use krenko_plot:
krenko_plot(Ativo, 0.01,withDates = F)
Link to image krenko_plot
Compared to plot.xts
plot.xts(Ativo, type='candles')
Link to image plot.xts
There are two main arguments: size and threshold.
"size" is the size of the bricks; it is required to run.
"threshold" is the threshold for a new brick. The default is 1.
The first brick is removed to ensure reliability.
Here's a quick and dirty solution, adapted from a python script here.
# Get some test data
library(rvest)
url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20170602&end=20181126")
df <- url %>% html_table() %>% as.data.frame()
# Make sure to have your time sequence the right way up
data <- apply(df[nrow(df):1, 3:4], 1, mean)
# Build the renko function
renko <- function(data, delta){
  pre <- data[1]
  xpos <- NULL
  ypos <- NULL
  xneg <- NULL
  yneg <- NULL
  for(i in 1:length(data)){
    increment <- data[i] - pre
    incrementPerc <- increment / pre
    pre <- data[i]
    if(incrementPerc > delta){
      xpos <- c(xpos, i)
      ypos <- c(ypos, data[i])
    }
    if(incrementPerc < -delta){
      xneg <- c(xneg, i)
      yneg <- c(yneg, data[i])
    }
  }
  signal <- list(xpos = xpos,
                 ypos = unname(ypos),
                 xneg = xneg,
                 yneg = unname(yneg))
  return(signal)
}
# Apply the renko function and plot the outcome
signals <- renko(data = data, delta = 0.05)
plot(1:length(data), data, type = "l")
points(signals$xneg, signals$yneg, col = "red", pch = 19)
points(signals$xpos, signals$ypos, col = "yellowgreen", pch = 19)
NOTE: This is not a renko chart (thanks to @Roman for pointing that out). Only buy and sell signals are displayed. See the reference mentioned above...
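Building on that note, a minimal sketch of collapsing a price series into actual fixed-size bricks could look like the following. This is not part of any package mentioned above; the function name, brick size and simulated prices are made up for illustration.
# a rough sketch: turn a price series into fixed-size renko brick levels
renko_bricks <- function(prices, size) {
  bricks <- numeric(0)
  level <- prices[1]
  for (p in prices) {
    # emit one brick for every full 'size' move away from the current level
    while (p >= level + size) { level <- level + size; bricks <- c(bricks, level) }
    while (p <= level - size) { level <- level - size; bricks <- c(bricks, level) }
  }
  bricks
}

set.seed(1)
prices <- 100 + cumsum(rnorm(500))
b <- renko_bricks(prices, size = 2)
plot(b, type = "s", xlab = "brick", ylab = "price")  # step plot of the brick levels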

apply.rolling window / loop with a rank function

How can I make a rolling window / loop (look-back period of 30 days / data points) while ranking the data with base::rank? As shown below, the apply.rolling function does not seem to work.
See example below:
# example data
require(xts)
set.seed(3)
A <- matrix(runif(900, max=30), ncol=3)
Data <- xts(A, Sys.Date()-300:1)
names(Data) <- c("C1", "C2", "C3")
This results in (only the last 7 days / data points are shown):
2016-06-20 16.71131510 12.80074552 19.27525535
2016-06-21 22.92512330 25.11613536 17.45237229
2016-06-22 20.09403965 17.20945809 28.06481040
2016-06-23 28.68593738 4.84698272 18.36108782
2016-06-24 15.52956209 25.54946621 3.97892474
2016-06-25 25.76582707 18.14117193 8.17883282
2016-06-26 25.23925100 16.07418907 15.35118717
I select only the last 30 data points:
rolldata30 <- tail(Data[,2:3], 30)
rollindex30 <- tail(Data[,1], 30)
I rank the data (the last 30 data points) of columns C2 and C3 based on their original values, i.e. the period 2016-05-28 until 2016-06-26. Then I make a new vector which is the average of the two.
factorx shows the result I am interested in.
rank30 <- as.xts(apply(-rolldata30, 2, rank, na.last= "keep"))
factor <- cbind(rollindex30, global = rowMeans(rank30))
factorx <- last(factor)
Which results in:
2016-06-20 16.711315 14.5
2016-06-21 22.925123 9.5
2016-06-22 20.094040 9.0
2016-06-23 28.685937 19.0
2016-06-24 15.529562 15.0
2016-06-25 25.765827 18.5
2016-06-26 25.239251 17.0
with data on the last day:
C1 global
2016-06-26 25.23925 17
How can I make the calculation rolling in order to make the same calculation for 2016-5-27 until 2016-06-26, 2016-05-26 until 2016-06-25, etc.?
Using PerformanceAnalytics::apply.rolling gives an error:
Error in xts(x, order.by = order.by, frequency = frequency, .CLASS = "double", :
order.by requires an appropriate time-based object
require(PerformanceAnalytics)
test1 <- apply.rolling(Data, width=30, gap=30, by=1, FUN=function(x) as.xts(-x, 2, rank))
I made the following function. factorz gives the same result. Perhaps the function helps to make it rolling?
rollrank <- function(x)
{
  a <- tail(x, 30)
  b <- as.xts(apply(-a, 2, rank, na.last = "keep"))
  c <- cbind(a, global = rowMeans(b))
  d <- last(c)
  return(d)
}
factorz <- rollrank(Data[,2:3])
The FUN argument to apply.rolling doesn't make sense. I suspect you meant FUN = function(x) as.xts(apply(-x, 2, rank, na.last="keep")). But that still will not work because FUN returns an object with more than one row.
Your rollrank function comes very close to what you need, and I recommend you use rollapply instead of apply.rolling. I suggest that you make a function based on your first example, then pass that function to rollapply.
myrank <- function(x) {
  rolldata30 <- x[, 2:3]
  rollindex30 <- x[, 1]
  rank30 <- as.xts(apply(-rolldata30, 2, rank, na.last = "keep"))
  factor <- cbind(rollindex30, global = rowMeans(rank30))
  factorx <- last(factor)
  return(factorx)
}
test1 <- rollapply(Data, 30, myrank, by.column=FALSE)
tail(test1)
# C1 global
# 2016-06-23 7.806336 19.5
# 2016-06-24 17.456436 17.5
# 2016-06-25 29.196350 12.5
# 2016-06-26 25.185687 11.0
# 2016-06-27 19.775105 6.5
# 2016-06-28 12.067774 16.0

How to do statistics with date-time data

I have a series of times, as follows:
2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
....
My goal is to know what time of day is most important, e.g. most of the times fall in the period 17:00 ~ 19:00.
In order to do that, I think I should draw every single time as a point on the x-axis, with the x-axis in minutes.
I don't know how to do this exactly with R and ggplot2.
Am I on the right track? I mean, is there a better way to reach my goal?
library(chron)
library(ggplot2)

# create some test data - hrs
set.seed(123)
Lines <- "2013-12-27 00:31:15
2013-12-29 17:01:17
2013-12-31 01:52:41
"
tt0 <- times(read.table(text = Lines)[[2]]) %% 1
rng <- range(tt0)
hrs <- 24 * as.vector(sort(diff(rng) * runif(100)^2 + rng[1]))

# create the density, find its maximum and plot
d <- density(hrs)
max.hrs <- d$x[which.max(d$y)]
ggplot(data.frame(hrs)) +
  geom_density(aes(hrs)) +
  geom_vline(xintercept = max.hrs)
giving:
> max.hrs # in hours - nearly 2 am
[1] 1.989523
> times(max.hrs / 24) # convert to times
[1] 01:59:22
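If you prefer to work directly from the raw timestamps rather than a simulated sample, a minimal sketch could look like the following; it assumes the timestamps sit in a character vector and simply bins them by hour of day.
# a minimal sketch: bin the actual timestamps by hour of day
library(ggplot2)

ts <- c("2013-12-27 00:31:15", "2013-12-29 17:01:17", "2013-12-31 01:52:41")
tt <- as.POSIXlt(ts)
hrs <- tt$hour + tt$min / 60 + tt$sec / 3600   # time of day in hours

ggplot(data.frame(hrs), aes(hrs)) +
  geom_histogram(binwidth = 1, boundary = 0) +
  scale_x_continuous(breaks = seq(0, 24, 4), limits = c(0, 24))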
