How do I hide/remove the year from a datetime in a query?

I want to return only the month and day from my date column, which is a datetime column.
I have tried DATENAME, DATEPART, MONTH, DAY, and YEAR, but each of them returns only a single part of the date. I'd like to keep the month and day in the same column; if that isn't possible, I'll find a workaround.
For example, 2022-01-09 should be returned as 01-09.
SELECT CAST(created_date AS date) AS [Date],
       [Name],
       COUNT(Name) AS Total,
       CAST(SUM([Total_Tax_Exclusive_Price]) AS Decimal(18,2)) AS Revenue
FROM DBO.Receipts
WHERE [name] LIKE 'Smartfruit Smoothie%'
  AND [Created_Date] BETWEEN '2022-09-15' AND '2022-12-20'
GROUP BY CAST(created_date AS date), Name
This query returns:
Date       | Name                | Total | Revenue
2022-09-15 | Smartfruit Smoothie | 10    | 46.75
2022-09-16 | Smartfruit Smoothie | 3     | 14.00
2022-09-17 | Smartfruit Smoothie | 14    | 64.75
2022-09-19 | Smartfruit Smoothie | 10    | 49.00
2022-09-20 | Smartfruit Smoothie | 6     | 29.50
2022-09-21 | Smartfruit Smoothie | 14    | 69.75

Related

R: count the dates in each row that are later than a certain date

I have a data frame with dates in the format yyyy-mm-dd hh:mm:ss.
For each row, I want to count the values that are later than 2021-05-22 00:00:00, excluding the ID and creation_date columns, and store the count in a new column, date_count.
This is the data frame I have:
ID creation_date date1 date2 date230
1 2021-03-04 06:40:37 2021-03... 2022-06... 2022-06
2 2021-03-05 04:24:43 2021-04... NA NA
3 2022-03-19 20:37:07 2022-05... 2022-06... NA
...
563 2022-04-23 20:59:45 2022-05... 2022-05... 2022-06
This is the desired result:
ID creation_date date1 date2 date230 date_count
1 2021-03-04 06:40:37 2021-03... 2022-06... 2022-06... 123
2 2021-03-05 04:24:43 2021-04... NA NA 123
3 2022-03-19 20:37:07 2022-05... 2022-06... NA 123
...
563 2022-04-23 20:59:45 2022-05... 2022-05... 2022-06... 123
Any help would be appreciated. Thank you!
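A minimal base-R sketch, assuming the date columns are stored as character strings in yyyy-mm-dd hh:mm:ss form and that comparing them in UTC is acceptable. The small data frame is hypothetical stand-in data, not the question's data: each date column is turned into a logical "later than the cutoff" column, and rowSums counts the TRUE values per row.

```r
# Hypothetical stand-in for the question's data frame (dates stored as text)
df <- data.frame(
  ID            = 1:3,
  creation_date = c("2021-03-04 06:40:37", "2021-03-05 04:24:43", "2022-03-19 20:37:07"),
  date1         = c("2021-03-10 10:00:00", "2021-04-01 08:00:00", "2022-05-02 09:30:00"),
  date2         = c("2022-06-01 12:00:00", NA, "2022-06-03 14:00:00"),
  stringsAsFactors = FALSE
)

# Cutoff from the question; UTC is an assumption
cutoff <- as.POSIXct("2021-05-22 00:00:00", tz = "UTC")

# Every column except ID and creation_date is treated as a date column
date_cols <- setdiff(names(df), c("ID", "creation_date"))

# One logical column per date column: TRUE when the value is later than the cutoff
later <- as.data.frame(lapply(df[date_cols], function(x) as.POSIXct(x, tz = "UTC") > cutoff))

# Count the TRUE values in each row; NA dates are not counted
df$date_count <- rowSums(later, na.rm = TRUE)
df
```

With 230 date columns the same code applies unchanged, since date_cols picks up every column other than the two excluded ones.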

In R, how do I check whether a date is earlier than date X?

I have a data frame (DF) with date columns. I want to check whether each date in the Beginning column is before or after 1 January 2020, and create a new column: if the date is earlier, insert 2020-01-01; otherwise, copy the original date.
The dates are in YYYY-MM-DD format:
Beginning End
2020-12-31 2021-01-12
2018-01-02 2020-03-10
2019-04-12 2020-12-04
2020-10-15 2021-03-27
I want:
Beginning End Beginning_2
2020-12-31 2021-01-12 2020-12-31
2018-01-02 2020-03-10 2020-01-01
2019-04-12 2020-12-04 2020-01-01
2020-10-15 2021-03-27 2020-10-15
The code I wrote is:
DF$Beginning_2 <- ifelse("2020-01-01" > DF$Beginning,"2020-01-01", DF$Beginning)
I'm getting this
Beginning End Beginning_2
2020-12-31 2021-01-12 18554
2018-01-02 2020-03-10 2020-01-01
2019-04-12 2020-12-04 2020-01-01
2020-10-15 2021-03-27 18453
My code only half works: it turns the result into character, but I need it to stay a Date. I tried adding as.Date all over the code, but not much changed. The biggest change was that dates later than 2020-01-01 became NA instead of "18554".
How to fix my code?
Thank you
You can use pmax:
DF$Beginning_2 <- pmax(DF$Beginning, as.Date("2020-01-01"))
#DF$Beginning_2 <- pmax(DF$Beginning, "2020-01-01") #Works also
DF
# Beginning End Beginning_2
#1 2020-12-31 2021-01-12 2020-12-31
#2 2018-01-02 2020-03-10 2020-01-01
#3 2019-04-12 2020-12-04 2020-01-01
#4 2020-10-15 2021-03-27 2020-10-15
str(DF)
#'data.frame': 4 obs. of 3 variables:
# $ Beginning : Date, format: "2020-12-31" "2018-01-02" ...
# $ End : Date, format: "2021-01-12" "2020-03-10" ...
# $ Beginning_2: Date, format: "2020-12-31" "2020-01-01" ...
Base R ifelse returns dates as plain numbers, so you need to convert them back to dates:
DF$Beginning_2 <- as.Date(ifelse(DF$Beginning > as.Date("2020-01-01"),
DF$Beginning, as.Date("2020-01-01")), origin = '1970-01-01')
Alternatively, use dplyr::if_else, which keeps the Date class of its inputs:
DF$Beginning_2 <- dplyr::if_else(DF$Beginning > as.Date("2020-01-01"),
DF$Beginning, as.Date("2020-01-01"))
DF
# Beginning End Beginning_2
#1 2020-12-31 2021-01-12 2020-12-31
#2 2018-01-02 2020-03-10 2020-01-01
#3 2019-04-12 2020-12-04 2020-01-01
#4 2020-10-15 2021-03-27 2020-10-15
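To see why the original ifelse call loses the Date class, here is a small base-R sketch with the question's Beginning dates: ifelse takes its attributes from the logical test vector, so the Date class is dropped and only the underlying day counts remain. Replacing values by logical indexing is a plain base-R alternative to pmax and if_else; it keeps the class because the assignment goes through the Date vector's own `[<-` method.

```r
d      <- as.Date(c("2020-12-31", "2018-01-02", "2019-04-12", "2020-10-15"))
cutoff <- as.Date("2020-01-01")

# ifelse() keeps the attributes of the logical test, so the Date class is lost
res <- ifelse(d > cutoff, d, cutoff)
class(res)   # "numeric"

# Logical-index replacement modifies the Date vector itself, so the class survives
out <- d
out[out < cutoff] <- cutoff
class(out)   # "Date"
out
```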

Minute time series in R. How to insert missing values in order to have the same steps in time?

I have a dataset where column 1 is a date-time and column 2 is the price at that point in time. The data is downloaded to Excel with the Bloomberg Excel add-in, and I then import the file into R with the read_excel function.
This is how the data looks in R.
The data is supposed to be at 1-minute intervals, but that is not always the case: sometimes the next row is more than 1 minute later. How can I insert extra rows for the missing minutes? For each date, I would like to have the following sequence:
08:00
08:01
08:02
...
16:58
16:59
17:00
For these points in time, I would like to keep the price from the dataset. If there is no price, the row should be marked as missing. For example, if we have:
...
12:31 100
12:32 102
12:35 101
...
then I would like to have:
...
12:31 100
12:32 102
12:33 missing
12:34 missing
12:35 101
...
What is the easiest way to do this? Thank you!
You can create an xts with the prices you have and merge it with a sequence that has a higher frequency (e.g. every minute).
library(xts)
library(lubridate)
set.seed(123)
prices <- 100 + rnorm(16)
timeindex <- seq(ymd_hm('2020-05-28 08:45'),
ymd_hm('2020-05-28 09:15'),
by = '2 mins')
prices_xts <- xts(prices, order.by = timeindex)
> head(prices_xts)
[,1]
2020-05-28 08:45:00 99.43952
2020-05-28 08:47:00 99.76982
2020-05-28 08:49:00 101.55871
2020-05-28 08:51:00 100.07051
2020-05-28 08:53:00 100.12929
2020-05-28 08:55:00 101.71506
timeindex2 <- seq(ymd_hm('2020-05-28 08:45'),
ymd_hm('2020-05-28 09:15'),
by = '1 mins')
prices_with_gaps_xts <- merge.xts(prices_xts,
timeindex2)
> head(prices_with_gaps_xts)
prices_xts
2020-05-28 08:45:00 99.43952
2020-05-28 08:46:00 NA
2020-05-28 08:47:00 99.76982
2020-05-28 08:48:00 NA
2020-05-28 08:49:00 101.55871
2020-05-28 08:50:00 NA
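The xts merge above fills the gaps between the first and last observed times. To get the full 08:00 to 17:00 grid for every date, as the question asks, one base-R sketch is to build that grid explicitly and left-join the prices onto it with merge(all.x = TRUE). The column names time and price, the UTC time zone, and the sample rows are assumptions for illustration.

```r
# Hypothetical sample with a gap: 12:33 and 12:34 are missing
prices <- data.frame(
  time  = as.POSIXct(c("2020-05-28 12:31", "2020-05-28 12:32", "2020-05-28 12:35"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  price = c(100, 102, 101)
)

# Build a one-minute grid from 08:00 to 17:00 for every date present in the data
days <- unique(as.Date(prices$time, tz = "UTC"))
grid <- do.call(rbind, lapply(days, function(d) {
  start <- as.POSIXct(paste(d, "08:00"), format = "%Y-%m-%d %H:%M", tz = "UTC")
  end   <- as.POSIXct(paste(d, "17:00"), format = "%Y-%m-%d %H:%M", tz = "UTC")
  data.frame(time = seq(start, end, by = "1 min"))
}))

# Left join: every grid minute is kept, and minutes without a price get NA
filled <- merge(grid, prices, by = "time", all.x = TRUE)
filled[filled$time >= as.POSIXct("2020-05-28 12:31", tz = "UTC") &
       filled$time <= as.POSIXct("2020-05-28 12:35", tz = "UTC"), ]
```

Building the grid per date, rather than one sequence from the first to the last timestamp, keeps overnight minutes out of the result.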

Calculation using the date function

I need to get the sales for the last month of every year.
Month Sales
01-03-2018 2351
01-06-2018 4522
01-09-2018 3632
01-12-2018 6894
01-03-2019 5469
01-06-2019 6546
01-09-2019 7885
01-12-2019 6597
01-03-2020 7845
01-06-2020 6894
01-09-2020 5469
01-12-2020 6546
01-03-2021 2351
01-06-2021 4522
01-09-2021 3632
01-12-2021 6546
01-03-2022 7885
01-06-2022 6597
01-09-2022 7845
01-12-2022 6894
Here I want to find the sales for month 12 (December) of each year.
Output should be as follows:
Month Sales
01-12-2018 6894
01-12-2019 6597
01-12-2020 6546
01-12-2021 6546
01-12-2022 6894
I can select every fourth row from the table, but I want to do it using a date function. Please help.
Make sure your Month column is a date variable, and use format to extract the month:
#Make sure it is a date variable
df$Month <- as.POSIXct(df$Month, format = '%d-%m-%Y')
df[format(df$Month, '%m') == 12,]
which gives:
Month Sales
4 2018-12-01 6894
8 2019-12-01 6597
12 2020-12-01 6546
16 2021-12-01 6546
20 2022-12-01 6894
One way with startsWith:
#Month needs to be of character type
df[startsWith(df$Month, '01-12-'), ]
# Month Sales
#4 01-12-2018 6894
#8 01-12-2019 6597
#12 01-12-2020 6546
#16 01-12-2021 6546
#20 01-12-2022 6894
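If some years might not have a December row, a more general base-R sketch is to keep the latest date within each year instead of testing for month 12. It parses the dd-mm-yyyy strings with as.Date, which is enough for day-level values; the six-row data frame is a hypothetical subset of the question's table.

```r
df <- data.frame(
  Month = c("01-03-2018", "01-06-2018", "01-09-2018", "01-12-2018",
            "01-03-2019", "01-12-2019"),
  Sales = c(2351, 4522, 3632, 6894, 5469, 6597),
  stringsAsFactors = FALSE
)
df$Month <- as.Date(df$Month, format = "%d-%m-%Y")

# Group rows by calendar year
year <- format(df$Month, "%Y")

# TRUE for the row whose date is the latest within its year
is_last <- as.numeric(df$Month) == ave(as.numeric(df$Month), year, FUN = max)
df[is_last, ]
```

On complete data this selects the same December rows as the month == 12 filter; it only differs when a year's last recorded month is not December.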

Intraday high/low clustering

I am attempting to study how intraday highs/lows cluster by time of day. I managed to attach each day's high/low to the intraday bars by using to.daily on the intraday data and merging the two:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where High == Daily.High (or Low == Daily.Low) using:
intradayhi <- intraday.merge[intraday.merge$High == intraday.merge$Daily.High]
intradaylo <- intraday.merge[intraday.merge$Low == intraday.merge$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
The subset contains duplicate days (for example, 2012-06-22 appears twice). How do I keep only the first record of each day? I would then be able to plot the count of records for each time of day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
The sample output should look like this; the count could be based either on the first result of each day or on all occurrences (more than one in a day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30
You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can simply aggregate the count based on hours and minutes:
z <- aggregate(rep(1, NROW(y)), by = list(Time = format(index(y), "%H:%M")), FUN = sum)  # summing 1s counts rows per time
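The same two steps can be sketched in base R on a plain vector of timestamps, which makes it easy to check the logic without xts. The hits vector is hypothetical (taken from the sample rows in the question), and the approach assumes the timestamps are sorted in time order, as an xts index is: duplicated() on the calendar date keeps the first hit of each day, and table() counts those first hits per time of day.

```r
# Hypothetical timestamps of bars where High == Daily.High, sorted by time
hits <- as.POSIXct(c("2012-06-19 08:45:00", "2012-06-20 13:30:00",
                     "2012-06-21 08:40:00", "2012-06-22 12:10:00",
                     "2012-06-22 12:15:00", "2012-06-26 08:45:00"), tz = "UTC")

# Keep only the first hit of each calendar day (relies on the sorted order)
first_hits <- hits[!duplicated(as.Date(hits, tz = "UTC"))]

# Count the first hits for each time of day
counts <- table(Time = format(first_hits, "%H:%M:%S"))
counts
```

To count every occurrence instead of only the first per day, pass hits rather than first_hits to table().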
