Filter specifically with hour/minutes/seconds - R

I have this df:
> Time
[1] "02:15:00" "02:30:00" "02:45:00" "03:00:00" "03:15:00" "03:30:00"
I want to delete all the time values before 3:00:00. However, I need to express the cutoff in the form hour = 3, minutes = 0, seconds = 0, like:
df <- df[df$Time < a_function(hour=3, minutes=0, seconds=0) ,]
I want to know how I can do this with time values, the same way I can do it with year, month, and day.

Why do you want to do it that way? Just asking for context. Also, have you looked at lubridate::hms()? It takes a time value and converts it to a period.
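As a minimal sketch of that kind of filter with lubridate (assuming Time is stored as "HH:MM:SS" character strings, as shown above), hms() parses the strings into Period objects and the cutoff can be built from hour/minute/second parts:
library(lubridate)
df <- data.frame(Time = c("02:15:00", "02:30:00", "02:45:00", "03:00:00", "03:15:00", "03:30:00"))
# build the cutoff from hour/minute/second components, then keep rows at or after 03:00:00
cutoff <- hours(3) + minutes(0) + seconds(0)
df <- df[hms(df$Time) >= cutoff, , drop = FALSE]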

Related

R - Use Lubridate to create 1 second intervals in datetime column where only minutes are specified

I am working with a time series that looks something like this:
# making a df with POSIXct datetime sequence with just minutes
#Make reproducible data frame:
set.seed(1234)
datetime <- rep(lubridate::ymd_hm("2016-08-01 15:10"), 60)
# Generate measured value
value <- runif(n = 60, min = 280, max = 1000)
df <- data.frame(datetime, value)
The data is actually recorded at 1 second intervals, but it appears as 60 rows with the same hour and minute, with the seconds part always at 00. I want to change it so that each minute has its seconds value increasing in one second intervals. The actual dataset includes many hours of data. Thank you
We can use
df$datetime <- with(df, datetime + seconds(seq_along(datetime)) - 1)
which shifts each row by 0, 1, ..., 59 seconds.
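If the fuller dataset contains several distinct minutes, each repeated as a block of rows, a base-R variant (a sketch, assuming each block should get offsets 0, 1, 2, ... seconds in row order) restarts the counter within every original minute:
# ave() builds a within-group row index (1, 2, ...) for each original minute,
# so subtracting 1 gives second offsets 0..n-1 inside every minute block
df$datetime <- df$datetime + ave(seq_along(df$datetime), df$datetime, FUN = seq_along) - 1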

Identify Min & Max Numeric Value within Date/Datetime range repeatedly

I am completely new to R so this is proving too complex to handle for me right now, so any help is much appreciated.
I am analysing price action data for BTC. I have 1 minute candles from 2019-09-08 19:13:00 to 2022-03-15 00:22:00 with the variables of open, high, low, close price as well as volume in BTC & USD and trade count for each of those minutes. Data source is https://www.cryptodatadownload.com/data/binance/ for anyone interested.
I cleaned up & correctly formatted the data and now want to analyse when BTC price made a low & high for various date & time ranges, for example:
What time of day, in 30 minute increments, did BTC make a low for the week?
Here is what I believe I need to do:
I need to tell R that 30 minutes is a range and identify the lowest and highest values of the "Low" and "High" variables within it, do the same with a day as a range, and likewise define a week as a range and find the lowest and highest values of the "Low" and "High" variables within it.
Then I'd need to mark these values, the best method I can think of would be creating a new variable and have it as a TRUE/FALSE column like so:
btcusdt_binance_fut_1min$pa.low.of.week.30min
btcusdt_binance_fut_1min$pa.high.of.week.30min
Every minute row that is within that 30min low and high will be marked TRUE and every other minute within that week will be marked FALSE.
I looked at lubridate's interval() function, but as far as I know the problem is that I'd need to define each year, month, week, day and 30-minute interval individually with a start and end time, which is obviously not feasible. I believe I run into the same problem with the subset() function.
Another option seems to be the seq() and seq.POSIXt() functions, as well as the range() function, but I haven't found a way to make them work.
Here is all my code and I am using this data set: https://www.cryptodatadownload.com/cdd/BTCUSDT_Binance_futures_data_minute.csv
library(readr)
library(lubridate)
library(tidyverse)
library(plyr)
library(dplyr)
# IMPORT CSV FILE AS DATA SET
# Name data set & choose import file
# Skip = 1 for skipping first row of CSV
btcusdt_binance_fut_1min <-
  read.csv(
    file.choose(),
    skip = 1,
    header = TRUE,
    sep = ","
  )
# CLEAN UP & REORGANISE DATA
# Remove unix & symbol column
btcusdt_binance_fut_1min$unix = NULL
btcusdt_binance_fut_1min$symbol = NULL
# Rename date column to datetime
colnames(btcusdt_binance_fut_1min)[colnames(btcusdt_binance_fut_1min) == "date"] <-
  "datetime"
# Convert datetime column to POSIXct format
btcusdt_binance_fut_1min$datetime <-
  as_datetime(btcusdt_binance_fut_1min$datetime, tz = "UTC")
# Create variable column for each time element
btcusdt_binance_fut_1min$year <-
  year(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$month <-
  month(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$week <-
  isoweek(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$weekday <-
  wday(btcusdt_binance_fut_1min$datetime,
       label = TRUE,
       abbr = FALSE)
btcusdt_binance_fut_1min$hour <-
  hour(btcusdt_binance_fut_1min$datetime)
btcusdt_binance_fut_1min$minute <-
  minute(btcusdt_binance_fut_1min$datetime)
# Reorder columns
btcusdt_binance_fut_1min <-
  btcusdt_binance_fut_1min[, c(1, 9, 10, 11, 12, 13, 14, 4, 3, 2, 5, 6, 7, 8)]
Using data.table we can do the following:
library(data.table)
btcusdt_binance_fut_1min <- data.table(datetime = seq.POSIXt(as.POSIXct("2022-01-01 0:00"), as.POSIXct("2022-01-01 2:59"), by = "1 min"))
btcusdt_binance_fut_1min[, group := format(as.POSIXct(cut(datetime, breaks = "30 min")), "%H:%M")]
The cut function will "floor" each datetime to its nearest, smaller half hour. The format and as.POSIXct are just there to remove the date part, to allow easy comparison between dates for the same half hours, but if you prefer to keep it a datetime you can remove these functions.
After this the next steps are pretty straightforward:
btcusdt_binance_fut_1min[, .(High = max(High), Low = min(Low)), by=.(group)]
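To get the TRUE/FALSE columns described in the question, one possible sketch (the low/high column names and the year/week columns created earlier are assumptions about the real dataset) keeps the full datetime bucket and flags, within each week, every minute that falls in the half hour containing that week's low or high:
library(data.table)
setDT(btcusdt_binance_fut_1min)
# 30-minute bucket, with the date kept this time
btcusdt_binance_fut_1min[, bucket := cut(datetime, breaks = "30 min")]
# TRUE for minutes inside the bucket holding the weekly low/high, FALSE otherwise
btcusdt_binance_fut_1min[, pa.low.of.week.30min := bucket == bucket[which.min(low)], by = .(year, week)]
btcusdt_binance_fut_1min[, pa.high.of.week.30min := bucket == bucket[which.max(high)], by = .(year, week)]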

How to create intervals of 1 hour

How can I create hourly timestamps for every date?
So, for example, from 00:00 until 23:59; one result of the function could be 10:00. I read on the internet that a loop could work, but we couldn't make it fit.
Data sample:
df = data.frame( id = c(1, 2, 3, 4), Date = c(2021-04-18, 2021-04-19, 2021-04-21
07:07:08.000, 2021-04-22))
A few points:
The input shown in the question is not valid R syntax, so we assume what we have is the data frame shown reproducibly in the Note at the end.
The question did not describe the specific output desired, so we will assume that what is wanted is a POSIXct vector of hourly values: in (1) below we assume it runs from the first hour of the minimum date to the last hour of the maximum date in the current time zone, and in (2) below we assume that we only want hourly sequences for the dates in df, also in the current time zone.
We assume that any times in the input should be dropped.
We assume that the id column of the input should be ignored.
No packages are used.
1) This calculates hour 0 of the first date and hour 0 of the day after the last date, giving rng. as.Date takes the Date part; range extracts the smallest and largest dates into a vector of two components; adding 0:1 adds 0 to the first date, leaving it as is, and 1 to the second date, converting it to the date after the last date. The format ensures that the Dates are converted to POSIXct in the current time zone rather than UTC. Then it creates an hourly sequence from those and uses head to drop the last value, since it would fall on the day after the input's last date.
rng <- as.POSIXct(format(range(as.Date(df$Date)) + 0:1))
head(seq(rng[1], rng[2], "hour"), -1)
2) Another possibility is to paste together each date with each hour from 0 to 23 and then convert that to POSIXct. This will give the same result if the input dates are sequential; otherwise, it will give the hours only for those dates provided.
with(expand.grid(Date = as.Date(df$Date), hour = paste0(0:23, ":00:00")),
sort(as.POSIXct(paste(Date, hour))))
Note
df <- data.frame( id = c(1, 2, 3, 4),
Date = c("2021-04-18", "2021-04-19", "2021-04-21 07:07:08.000", "2021-04-22"))
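As a quick illustration of the difference mentioned in (2): the Note's dates skip 2021-04-20, so with this df the two approaches return 120 and 96 hourly values respectively (this check is an illustration, not part of the original answer):
rng <- as.POSIXct(format(range(as.Date(df$Date)) + 0:1))
length(head(seq(rng[1], rng[2], "hour"), -1))  # 120: includes the absent 2021-04-20
length(with(expand.grid(Date = as.Date(df$Date), hour = paste0(0:23, ":00:00")),
  sort(as.POSIXct(paste(Date, hour)))))        # 96: only the four dates present in df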

How convert Time and Date

I have one question: how do I convert that date and time format, 20110711201023, into a number of hours? This is the output of software which I use for image analysis, and I can't change it. It is very important to define the starting Date and Time.
Format: 2011 year, 07 month, 11 day, 20 hour, 10 minute, 23 second.
Example:
Starting Date and Time - 20110709201023
First Date and Time - 20110711214020
Result = 49.5 h
I have 10000 data points in this format, so I don't want to do this manually.
I will be very grateful for any advice.
Best is to first make it a real R time object using strptime:
time_obj = strptime("20110711201023", format = "%Y%m%d%H%M%S")
If you do this with both the start and the end date, you can simply say:
end_time - start_time
to get the time difference as a difftime object, which can easily be converted to a number of hours, for example with difftime(end_time, start_time, units = "hours"). To convert a whole list of these time strings, simply do:
time_vector = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
where dat is the data.frame with the data, and time_string the column containing the time strings. Note that strptime works also on a vector (it is vectorized). You can also make the new time vector part of dat:
dat$time = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
or more elegantly (at least if you hate $ as much as me :)):
dat = within(dat, { time = strptime(time_string, format = "%Y%m%d%H%M%S") })
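As a worked check against the numbers in the question (an illustration, not part of the original answer), the two example stamps differ by roughly 49.5 hours:
start_time <- strptime("20110709201023", format = "%Y%m%d%H%M%S")
end_time <- strptime("20110711214020", format = "%Y%m%d%H%M%S")
difftime(end_time, start_time, units = "hours")  # about 49.5 hours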

Obtaining or subsetting the first 5 minutes of each day of data from an xts

I would like to subset out the first 5 minutes of time series data for each day from minutely data. However, the first 5 minutes do not occur at the same time each day, so using something like xtsobj["T09:00/T09:05"] would not work, since the beginning of the first 5 minutes changes, i.e. sometimes it starts at 9:20 am or some other random time in the morning instead of 9 am.
So far, I have been able to subset out the first minute for each day using a function like:
k <- diff(index(xtsobj))> 10000
xtsobj[c(1, which(k)+1)]
i.e. finding gaps in the data that are larger than 10000 seconds, but going from that to finding the first 5 minutes of each day is proving more difficult, as the data is not always evenly spaced out, i.e. between the first minute and the 5th minute there could be anywhere from 2 to 5 rows, and thus using something like:
xtsobj[c(1, which(k)+6)]
and then binding the results together is not always accurate. I was hoping that a function like 'first' could be used, but I wasn't sure how to do this for multiple days; perhaps that might be the optimal solution. Is there a better way of obtaining this information?
Many thanks to the stackoverflow community in advance.
split(xtsobj, "days") will create a list with an xts object for each day.
Then you can apply head to each day
lapply(split(xtsobj, "days"), head, 5)
or more generally
lapply(split(xtsobj, "days"), function(x) {
  x[1:5, ]
})
Finally, you can rbind the days back together if you want.
do.call(rbind, lapply(split(xtsobj, "days"), function(x) x[1:5, ]))
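Since the question mentions first, the same split/lapply/rbind pattern can also take the first 5 minutes by clock time rather than by row count; this assumes xts::first() accepts a period string such as "5 mins":
do.call(rbind, lapply(split(xtsobj, "days"), xts::first, "5 mins"))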
What if you use the package lubridate: first find out the starting point of each day, which according to you changes sort of randomly, and then use the function minutes.
So it would be something like:
five_minutes_after = starting_point_each_day + minutes(5)
Then you can use the usual xts subsetting, doing something like:
five_min_period = paste(starting_point_each_day, five_minutes_after, sep = '/')
xtsobj[five_min_period]
Edit:
@Joshua
I think this works, look at this example:
library(xts)
library(lubridate)
x <- xts(cumsum(rnorm(20, 0, 0.1)), Sys.time() - seq(60, 1200, 60))
starting_point_each_day= index(x[1])
five_minutes_after = index(x[1]) + minutes(5)
five_min_period = paste(starting_point_each_day,five_minutes_after,sep='/')
x[five_min_period]
In my previous example I made a mistake: I put five_min_period between quotes.
Was that what you were pointing out, Joshua? Also, maybe the starting point is not necessary; just:
until5min = paste('/', five_minutes_after, sep = "")
x[until5min]
