Replace values in an xts object according to some events on specific dates in R - r

I have two signal series and a data series as below.
BuyDates<-seq(as.Date("2013/1/1"), as.Date("2013/3/1"), by = "5 days")
SellDates<-seq(as.Date("2013/1/1"), as.Date("2013/3/1"), by = "7 days")
data<- xts(c(rnorm(32,100,3)),seq(as.Date("2013/1/1"), as.Date("2013/2/1"), by = "days"))
What I want is this: on the dates where data gets a buy signal from BuyDates, the value of data should be replaced by 1, and on the dates in SellDates it should be replaced by -1. On the remaining days in the sequence, the 1 or -1 should be carried forward until the opposite signal arrives, and for the days before the first signal the value should be replaced with NA.
Kindly help.

You can start from a series of NAs on the same dates and assign the signals by subsetting on the dates as usual:
data<- xts(rep(NA, 32),seq(as.Date("2013/1/1"), as.Date("2013/2/1"), by = "days"))
data[BuyDates] <- 1
data[SellDates] <- -1
Then you can carry forward the non-NA values using na.locf.
na.locf(data)
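As a minimal, self-contained sketch of the same idea, the example below uses made-up dates so that the first signal falls after the start of the series. The names days, buy, sell, vals and signal are illustrative and not from the question, and the signals are marked on a plain vector before it is turned into an xts object.

```r
library(xts)

# Hypothetical ten-day window; the first signal arrives on day 3
days <- seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = "days")
buy  <- as.Date(c("2013-01-03", "2013-01-08"))
sell <- as.Date("2013-01-06")

# Start from NA everywhere, then mark the signal dates
vals <- rep(NA_real_, length(days))
vals[days %in% buy]  <- 1
vals[days %in% sell] <- -1

# Carry each signal forward until the opposite one arrives;
# na.rm = FALSE keeps the leading NAs before the first signal
signal <- na.locf(xts(vals, order.by = days), na.rm = FALSE)
```

Here the first two days stay NA, the series is 1 from 3 January, -1 from 6 January, and 1 again from 8 January.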

Related

Identify Min & Max Numeric Value within Date/Datetime range repeatedly

I am completely new to R so this is proving too complex to handle for me right now, so any help is much appreciated.
I am analysing price action data for BTC. I have 1 minute candles from 2019-09-08 19:13:00 to 2022-03-15 00:22:00 with the variables of open, high, low, close price as well as volume in BTC & USD and trade count for each of those minutes. Data source is https://www.cryptodatadownload.com/data/binance/ for anyone interested.
I cleaned up & correctly formatted the data and now want to analyse when BTC price made a low & high for various date & time ranges, for example:
What time of day, in 30-minute increments, did BTC make a low for the week?
Here is what I believe I need to do:
I need to tell R that 30 minutes is a range and identify the lowest and highest values of the "Low" and "High" variables within it, and then do the same with a day as the range and with a week as the range.
Then I'd need to mark these values, the best method I can think of would be creating a new variable and have it as a TRUE/FALSE column like so:
btcusdt_binance_fut_1min$pa.low.of.week.30min
btcusdt_binance_fut_1min$pa.high.of.week.30min
Every minute row that is within that 30min low and high will be marked TRUE and every other minute within that week will be marked FALSE.
I looked at lubridate's interval() function but as far as I know the problem is I'd need to define each year, month, week, day, 30mins interval individually with start and end time, which is obviously not feasible. I believe I run into the same problem with the subset() function.
Another option seems to be the seq() and seq.POSIXt() functions as well as the range() function, but I haven't found a way for it.
Here is all my code and I am using this data set: https://www.cryptodatadownload.com/cdd/BTCUSDT_Binance_futures_data_minute.csv
library(readr)
library(lubridate)
library(tidyverse)
library(plyr)
library(dplyr)
# IMPORT CSV FILE AS DATA SET
# Name data set & choose import file
# Skip = 1 for skipping first row of CSV
btcusdt_binance_fut_1min <-
  read.csv(
    file.choose(),
    skip = 1,
    header = TRUE,
    sep = ","
  )
# CLEAN UP & REORGANISE DATA
# Remove unix & symbol column
btcusdt_binance_fut_1min$unix = NULL
btcusdt_binance_fut_1min$symbol = NULL
# Rename date column to datetime
colnames(btcusdt_binance_fut_1min)[colnames(btcusdt_binance_fut_1min) == "date"] <-
  "datetime"
# Convert datetime column to POSIXct format
btcusdt_binance_fut_1min$datetime <-
  as_datetime(btcusdt_binance_fut_1min$datetime, tz = "UTC")
# Create variable column for each time element
btcusdt_binance_fut_1min$year <-
  year(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$month <-
  month(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$week <-
  isoweek(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$weekday <-
  wday(btcusdt_binance_fut_1min$datetime,
       label = TRUE,
       abbr = FALSE)
btcusdt_binance_fut_1min$hour <-
  hour(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$minute <-
  minute(btcusdt_binance_fut_1min$datetime)
# Reorder columns
btcusdt_binance_fut_1min <-
  btcusdt_binance_fut_1min[, c(1, 9, 10, 11, 12, 13, 14, 4, 3, 2, 5, 6, 7, 8)]
Using data.table we can do the following:
library(data.table)
btcusdt_binance_fut_1min <- data.table(datetime = seq.POSIXt(as.POSIXct("2022-01-01 0:00"), as.POSIXct("2022-01-01 2:59"), by = "1 min"))
btcusdt_binance_fut_1min[, group := format(as.POSIXct(cut(datetime, breaks = "30 min")), "%H:%M")]
The cut function "floors" each datetime to the start of its 30-minute bin. Note that the bins are counted from the first timestamp in the data (truncated to the minute), so they only line up with clock half hours if the data start on one, as in this example. The format and as.POSIXct calls are just there to remove the date part, which makes it easy to compare the same half hour across dates; if you prefer to keep a full datetime, you can drop them.
After this, the next steps are pretty straightforward (assuming the table also has High and Low columns, which the sample above does not create):
btcusdt_binance_fut_1min[, .(High = max(High), Low = min(Low)), by=.(group)]
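A self-contained sketch of the grouping step might look as follows. It is only an illustration under assumptions: the prices are random made-up numbers, the names dt, slot, by_slot, low_slot and pa_low_slot are hypothetical, and lubridate's floor_date is swapped in for cut so that the bins always align to clock half hours regardless of where the data start.

```r
library(data.table)
library(lubridate)

# Hypothetical toy data: one row per minute for three hours, made-up prices
set.seed(1)
dt <- data.table(
  datetime = seq(as.POSIXct("2022-01-01 00:00", tz = "UTC"),
                 as.POSIXct("2022-01-01 02:59", tz = "UTC"),
                 by = "1 min")
)
dt[, Low := 100 + rnorm(.N)]
dt[, High := Low + runif(.N)]

# Floor every timestamp to the clock half hour it falls in
dt[, slot := floor_date(datetime, "30 minutes")]

# One row per half-hour slot with its extremes
by_slot <- dt[, .(High = max(High), Low = min(Low)), by = slot]

# Flag every minute that lies in the slot containing the overall low
low_slot <- by_slot[which.min(Low), slot]
dt[, pa_low_slot := slot == low_slot]
```

For the weekly question, the same pattern could be repeated with an additional grouping column for the week, marking the slot with the lowest Low within each week.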

Trying to calculate a difference in dates, while excluding weekends - R-studio

IT_tickets[,"ticket_age"] <- NA
{R aging_count for Tasks}
IT_tickets$ticket_age[c(all_tasks)] <- difftime(IT_tickets$closed_at_date[c(all_tasks)], IT_tickets$sys_created_date[c(all_tasks)], units = "days")
I have this column called "ticket_age" in my dataset IT_tickets, which holds the difference in days between when a ticket was created and when it was closed. How can I recode this so that it excludes weekends from the difference in days, similar to how the NETWORKDAYS function works in Excel?
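If you don't need to account for holidays, you can count the weekdays between the two dates. Note that seq() accepts only a single start and end date, so this works for one ticket at a time, and weekdays() returns names in the current locale, so the English day names assume an English locale: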
IT_tickets$ticket_age[c(all_tasks)] <- sum(!weekdays(seq(IT_tickets$sys_created_date[c(all_tasks)],
IT_tickets$closed_at_date[c(all_tasks)],
"days")) %in% c("Saturday", "Sunday")) - 1
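If you want to include the start date in the count, remove the subtraction of 1.
Another way, starting from the calendar-day difference already stored in ticket_age. Note that this only converts whole weeks to 5 days; the remaining days are added as they are, so the result can be too high when the remainder spans a weekend: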
IT_tickets$ticket_age[c(all_tasks)] <- (IT_tickets$ticket_age[c(all_tasks)]%/%7) * 5 + IT_tickets$ticket_age[c(all_tasks)]%%7
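For several tickets at once, one possible sketch is to wrap the weekday count in a helper and apply it row by row. The helper name weekdays_between and the toy dates are made up for illustration; the sketch assumes both columns are of class Date and that tickets are never closed before they are created. It counts the weekdays after the creation date up to and including the closing date, and it uses the numeric weekday from as.POSIXlt so that it does not depend on the locale.

```r
# Count weekdays after `from` up to and including `to` (both single Dates)
weekdays_between <- function(from, to) {
  if (is.na(from) || is.na(to)) return(NA_real_)
  if (to <= from) return(0)
  days <- seq(from + 1, to, by = "day")
  wday <- as.POSIXlt(days)$wday          # 0 = Sunday, 6 = Saturday
  as.numeric(sum(!(wday %in% c(0, 6))))
}

# Hypothetical tickets: Mon to Fri, Fri to Mon, Sat to Sun
created <- as.Date(c("2024-01-01", "2024-01-05", "2024-01-06"))
closed  <- as.Date(c("2024-01-05", "2024-01-08", "2024-01-07"))

# Apply the helper to each pair of dates
ticket_age <- vapply(
  seq_along(created),
  function(i) weekdays_between(created[i], closed[i]),
  numeric(1)
)
```

With these dates the ages are 4, 1 and 0 weekdays. Holidays are not handled; they would need a separate list of dates to exclude.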

Creating an interval object in R using the lubridate package

Hi, I have data from Uber about pick-ups in NYC.
I'm trying to add a column to the raw data that indicates, for each row, which time interval it belongs to (each interval is represented by the single timepoint at its start).
This is what I want to do:
1) Create a vector containing all relevant timepoints (i.e. every 15 minutes).
2) Use the int_diff function from the lubridate package on this vector to create an interval object.
3) Run a loop over all the time points in the raw data and, for each data point, indicate which interval (represented by the single timepoint at its start) it belongs to.
I tried looking for explanations of how to use the int_diff function, but I don't understand what my vector should look like or how the syntax of int_diff works.
Thanks for the help :)
Is this what you have in mind?
library(lubridate)
start <- mdy_hm('4/11/2014 0:00') # start of the period
end <- mdy_hm('5/12/2015 0:00') # end of the period
time_seq <- seq(from = start, to = end, by = '15 mins') # sequence by 15 minutes
times <- mdy_hm(c('4/11/2014 0:12', '4/11/2014 1:24')) # times to find intervals for
dat <- data.frame(times)
dat$intervals <- cut(times, breaks = time_seq) # assign each time to an interval
# turn this into one indicator column per interval, with a 1 where the observation falls into it
intervals_cols <- model.matrix(~ -1 + intervals, dat)
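If only the start of the 15-minute interval is needed for each row, a shorter route may be lubridate's floor_date, which avoids building the full sequence and looping. This is a sketch of an alternative technique rather than the int_diff approach asked about; the pick-up times and the names pickups and interval_start are made up for illustration.

```r
library(lubridate)

# Hypothetical pick-up times
pickups <- ymd_hm(c("2014-04-11 00:12", "2014-04-11 01:24", "2014-04-11 01:30"))

# Round each time down to the start of its 15-minute interval
interval_start <- floor_date(pickups, "15 minutes")
```

The three times map to the interval starts 00:00, 01:15 and 01:30; a time that lies exactly on a boundary is assigned to the interval that starts there.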

R: How to lag xts column by one day of the set

Imagine an intra-day set of data, e.g. hourly intervals. Thanks to Google and valuable Joshua's answers to other people, I managed to create new columns in the xts object carrying DAILY Open/High/Low/Close values. These are daily values applied on intra-day intervals so all rows of the same day have the same value in particular column. Since the HLC values are look-ahead biased, I want to move them to the next day. Let's focus on just one column called Prev.Day.Close.
Actual status:
My Prev.Day.Close column carries the proper values for the current day. All "2010-01-01 ??:??" rows have the same value, the Close of the 2010-01-01 trading session. So at the moment it is not the PREVIOUS day's close, as the column name says.
What I need:
Lag the Prev.Day.Close column to the NEXT DAY OF THE SET.
I cannot lag it using lag() because it works on a row (not day) basis. Nor can it be a fixed calendar-day shift like:
C <- ave(x$Close, .indexday(x), FUN = last)
index(C) <- index(C) + 86400
x$Prev.Day.Close <- C
This solution does not work because it ignores the actual data in the set. For example, it adds new rows, because the original data set has gaps on weekends and holidays. Moreover, two particular days may not have the same number of intervals (rows), so the shifted data will not fit.
Desired result:
All rows of the first day in the set have NA in Prev.Day.Close because there is no previous day to get data from.
All rows of the second day have the same value in Prev.Day.Close: any of the values I currently have in Prev.Day.Close for the previous day.
The same applies to every following day.
If I understand correctly, here's one way to do it:
require(xts)
# sample data
dt <- .POSIXct(seq(1, 86400*4, 3600), tz="UTC")-1
x <- xts(seq_along(dt), dt)
# get the last value for each calendar day
daily.last <- apply.daily(x, last)
# merge the last value of the day with the original data set
y <- merge(x, daily.last)
# now lag the last value of the day and carry the NA forward
# y$daily.last <- na.locf(lag(y$daily.last))
y$daily.last <- lag(y$daily.last)
y$daily.last <- na.locf(y$daily.last)
Basically, you want to get the end of day values, merge them with the original data, then lag them. That will align the previous end of day values with the beginning of the day.

How do I subset every day except the last five days of zoo data?

I am trying to extract all dates except for the last five days from a zoo dataset into a single object.
This question is somewhat related to How do I subset the last week for every month of a zoo object in R?
You can reproduce the dataset with this code:
library(zoo)
set.seed(123)
price <- rnorm(365)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)
zoodata <- zoo(price, dates)
For my output, I'm hoping to get a combined dataset of everything except the last five days of each month. For example, if there are 20 days in the first month's data and 19 days in the second month's, I only want to subset the first 15 and 14 days of data respectively.
I tried using the head() function and the first() function to extract the first three weeks, but since each month will have a different amount of days according to month or leap year months, it's not ideal.
Thank you.
Here are a few approaches:
1) as.Date Let tt be the dates. Then we compute a Date vector the same length as tt which has the corresponding last date of the month. We then pick out those dates which are at least 5 days away from that:
tt <- time(zoodata)
last.date.of.month <- as.Date(as.yearmon(tt), frac = 1)
zoodata[ last.date.of.month - tt >= 5 ]
2) tapply/head For each month tapply head(x, -5) to the data and then concatenate the reduced months back together:
do.call("c", tapply(zoodata, as.yearmon(time(zoodata)), head, -5))
3) ave Define revseq which given a vector or zoo object returns sequence numbers in reverse order so that the last element corresponds to 1. Then use ave to create a vector ix the same length as zoodata which assigns such reverse sequence numbers to the days of each month. Thus the ix value for the last day of the month will be 1, for the second last day 2, etc. Finally subset zoodata to those elements corresponding to sequence numbers greater than 5:
revseq <- function(x) rev(seq_along(x))
ix <- ave(seq_along(zoodata), as.yearmon(time(zoodata)), FUN = revseq)
z <- zoodata[ ix > 5 ]
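As a quick check, approaches (1) and (3) can be compared on the complete daily series from the question. This is a sketch with the hypothetical names keep1 and keep3; on data without gaps both should keep the same dates. If days were missing they could differ, since (1) works on calendar distance to the month end while (3) drops the last five observations of each month.

```r
library(zoo)

# Complete daily series for 2013, as in the question
set.seed(123)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)
zoodata <- zoo(rnorm(365), dates)
tt <- time(zoodata)

# Approach (1): keep dates at least 5 days before the last date of the month
keep1 <- zoodata[as.Date(as.yearmon(tt), frac = 1) - tt >= 5]

# Approach (3): keep observations whose reverse position in the month exceeds 5
revseq <- function(x) rev(seq_along(x))
ix <- ave(seq_along(zoodata), as.yearmon(tt), FUN = revseq)
keep3 <- zoodata[ix > 5]

# Both drop 5 days from each of the 12 months
same_dates <- identical(time(keep1), time(keep3))
```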
ADDED Solutions (1) and (2).
This works exactly the same way as in the answer to your other question:
split the dataset by month, then remove the last 5 days of each month by adding a "-" in front of the period:
library(xts)
xts.data <- as.xts(zoodata)
lapply(split(xts.data, "months"), last, "-5 days")
And the same way, if you want it on one single object:
do.call(rbind, lapply(split(xts.data, "months"), last, "-5 days"))
