Formatting my table Oracle SQL

Formatting my table Oracle SQL - oracle11g

I have a table that has the following columns
select a.ssrmeet_crn
,a.ssrmeet_begin_time
,a.ssrmeet_end_time
,a.ssrmeet_start_date
,a.ssrmeet_end_date
,a.ssrmeet_sun_day
,a.ssrmeet_mon_day
,a.ssrmeet_tue_day
,a.ssrmeet_wed_day
,a.ssrmeet_thu_day
,a.ssrmeet_fri_day
,a.ssrmeet_sat_day
from ssrmeet a
where a.ssrmeet_term_code = 201730
and a.ssrmeet_crn = 32789
A row for one class would look like this
SSRMEET_CRN SSRMEET_BEGIN_TIME SSRMEET_END_TIME SSRMEET_START_DATE SSRMEET_END_DATE SSRMEET_SUN_DAY SSRMEET_MON_DAY SSRMEET_TUE_DAY SSRMEET_WED_DAY SSRMEET_THU_DAY SSRMEET_FRI_DAY SSRMEET_SAT_DAY
32789 1000 1050 1/8/2018 12:00:00 AM 5/4/2018 12:00:00 AM M W F
I need it to list out every class meet like this
CRN START_DATE END_DATE
32789 1/15/2018 10:00:00 AM 1/15/2018 10:50:00 AM
32789 1/22/2018 10:00:00 AM 1/22/2018 10:50:00 AM
32789 1/29/2018 10:00:00 AM 1/29/2018 10:50:00 AM
32789 2/05/2018 10:00:00 AM 2/05/2018 10:50:00 AM
32789 2/12/2018 10:00:00 AM 2/12/2018 10:50:00 AM
32789 2/19/2018 10:00:00 AM 2/19/2018 10:50:00 AM
32789 2/26/2018 10:00:00 AM 2/26/2018 10:50:00 AM
32789 3/05/2018 10:00:00 AM 3/05/2018 10:50:00 AM
32789 3/12/2018 10:00:00 AM 3/12/2018 10:50:00 AM
32789 3/19/2018 10:00:00 AM 3/19/2018 10:50:00 AM
32789 3/26/2018 10:00:00 AM 3/26/2018 10:50:00 AM
32789 4/02/2018 10:00:00 AM 4/02/2018 10:50:00 AM
32789 4/09/2018 10:00:00 AM 4/09/2018 10:50:00 AM
32789 4/16/2018 10:00:00 AM 4/16/2018 10:50:00 AM
32789 4/23/2018 10:00:00 AM 4/23/2018 10:50:00 AM
32789 4/30/2018 10:00:00 AM 4/30/2018 10:50:00 AM
32789 1/10/2018 10:00:00 AM 1/10/2018 10:50:00 AM
32789 1/17/2018 10:00:00 AM 1/17/2018 10:50:00 AM
32789 1/24/2018 10:00:00 AM 1/24/2018 10:50:00 AM
32789 1/31/2018 10:00:00 AM 1/31/2018 10:50:00 AM
32789 2/07/2018 10:00:00 AM 2/07/2018 10:50:00 AM
32789 2/14/2018 10:00:00 AM 2/14/2018 10:50:00 AM
32789 2/21/2018 10:00:00 AM 2/21/2018 10:50:00 AM
32789 2/28/2018 10:00:00 AM 2/28/2018 10:50:00 AM
32789 3/07/2018 10:00:00 AM 3/07/2018 10:50:00 AM
32789 3/14/2018 10:00:00 AM 3/14/2018 10:50:00 AM
32789 3/21/2018 10:00:00 AM 3/21/2018 10:50:00 AM
32789 3/28/2018 10:00:00 AM 3/28/2018 10:50:00 AM
32789 4/04/2018 10:00:00 AM 4/04/2018 10:50:00 AM
32789 4/11/2018 10:00:00 AM 4/11/2018 10:50:00 AM
32789 4/18/2018 10:00:00 AM 4/18/2018 10:50:00 AM
32789 4/25/2018 10:00:00 AM 4/25/2018 10:50:00 AM
32789 5/02/2018 10:00:00 AM 5/02/2018 10:50:00 AM
32789 1/12/2018 10:00:00 AM 1/12/2018 10:50:00 AM
32789 1/19/2018 10:00:00 AM 1/19/2018 10:50:00 AM
32789 1/26/2018 10:00:00 AM 1/26/2018 10:50:00 AM
32789 2/02/2018 10:00:00 AM 2/02/2018 10:50:00 AM
32789 2/09/2018 10:00:00 AM 2/09/2018 10:50:00 AM
32789 2/16/2018 10:00:00 AM 2/16/2018 10:50:00 AM
32789 2/23/2018 10:00:00 AM 2/23/2018 10:50:00 AM
32789 3/02/2018 10:00:00 AM 3/02/2018 10:50:00 AM
32789 3/09/2018 10:00:00 AM 3/09/2018 10:50:00 AM
32789 3/16/2018 10:00:00 AM 3/16/2018 10:50:00 AM
32789 3/23/2018 10:00:00 AM 3/23/2018 10:50:00 AM
32789 3/30/2018 10:00:00 AM 3/30/2018 10:50:00 AM
32789 4/06/2018 10:00:00 AM 4/06/2018 10:50:00 AM
32789 4/13/2018 10:00:00 AM 4/13/2018 10:50:00 AM
32789 4/20/2018 10:00:00 AM 4/20/2018 10:50:00 AM
32789 4/27/2018 10:00:00 AM 4/27/2018 10:50:00 AM
32789 5/04/2018 10:00:00 AM 5/04/2018 10:50:00 AM
The day columns just hold an indicator if the class meets on that day (Sunday 'U', Monday 'M', Tuesday 'T', Wednesday 'W', Thursday 'R', Friday 'F', Saturday 'S'), and the begin and end times are in 24-hour format.
I've tried a couple of different approaches, including connect by, but I'm stumped.
This is as far as I've gotten. It generates the first date for each meeting day, but I don't know where to go from here.
with
data_prep as(
select 'U' dotw_code, 'Sunday' week_day from dual
union all
select 'M', 'Monday' from dual
union all
select 'T', 'Tuesday' from dual
union all
select 'W', 'Wednesday' from dual
union all
select 'R', 'Thursday' from dual
union all
select 'F', 'Friday' from dual
union all
select 'S', 'Saturday' from dual
)
select a.ssrmeet_crn crn
,case b.dotw_code
when 'U'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'SUN')
when 'M'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'MON')
when 'T'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'TUE')
when 'W'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'WED')
when 'R'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'THU')
when 'F'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'FRI')
when 'S'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_begin_time, 'MM/DD/YYYY HH24:MI:SS'),'SAT')
end start_date
,case b.dotw_code
when 'U'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'SUN')
when 'M'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'MON')
when 'T'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'TUE')
when 'W'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'WED')
when 'R'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'THU')
when 'F'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'FRI')
when 'S'
then
next_day(to_date(to_char(ssrmeet_start_date, 'MM/DD/YYYY') || ' ' || ssrmeet_end_time, 'MM/DD/YYYY HH24:MI:SS'),'SAT')
end end_date
from ssrmeet a
left join data_prep b
on a.ssrmeet_sun_day = b.dotw_code
or a.ssrmeet_mon_day = b.dotw_code
or a.ssrmeet_tue_day = b.dotw_code
or a.ssrmeet_wed_day = b.dotw_code
or a.ssrmeet_thu_day = b.dotw_code
or a.ssrmeet_fri_day = b.dotw_code
or a.ssrmeet_sat_day = b.dotw_code
where ssrmeet_term_code = 201730
and ssrmeet_crn = 32789
If anyone could help out or even give me pointers on how to properly format my question on the forums I'd greatly appreciate it.

If I understand you correctly (that you'd like to have a list of the dates the meetings are supposed to happen between the start and end dates), then I think the following query should work for you:
WITH ssrmeet AS (SELECT 32789 ssrmeet_crn, 201730 ssrmeet_term_code, to_date('01/03/2017', 'dd/mm/yyyy') ssrmeet_start_date, to_date('01/04/2017', 'dd/mm/yyyy') ssrmeet_end_date, NULL sun_day, NULL mon_day, 'T' tue_day, 'W' wed_day, NULL thu_day, 'F' fri_day, 'S' sat_day FROM dual UNION ALL
SELECT 32790 ssrmeet_crn, 201730 ssrmeet_term_code, to_date('05/03/2017', 'dd/mm/yyyy') ssrmeet_start_date, to_date('27/03/2017', 'dd/mm/yyyy') ssrmeet_end_date, NULL sun_day, NULL mon_day, 'T' tue_day, 'W' wed_day, NULL thu_day, 'F' fri_day, 'S' sat_day FROM dual)
SELECT ssrmeet_crn,
ssrmeet_start_date,
ssrmeet_end_date,
dt,
to_char(dt, 'Dy', 'nls_date_language = english') dt_day,
CASE WHEN (to_char(dt, 'Dy', 'nls_date_language = english') = 'Mon' AND mon_day = 'M')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Tue' AND tue_day = 'T')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Wed' AND wed_day = 'W')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Thu' AND thu_day = 'R')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Fri' AND fri_day = 'F')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Sat' AND sat_day = 'S')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Sun' AND sun_day = 'U')
THEN dt
END meet_dt
FROM (SELECT ssrmeet_crn,
ssrmeet_start_date,
ssrmeet_end_date,
ssrmeet_start_date - 1 + LEVEL dt,
sun_day,
mon_day,
tue_day,
wed_day,
thu_day,
fri_day,
sat_day
FROM ssrmeet
CONNECT BY ssrmeet_crn = PRIOR ssrmeet_crn
AND ssrmeet_term_code = PRIOR ssrmeet_term_code
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= ssrmeet_end_date - ssrmeet_start_date + 1)
WHERE CASE WHEN (to_char(dt, 'Dy', 'nls_date_language = english') = 'Mon' AND mon_day = 'M')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Tue' AND tue_day = 'T')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Wed' AND wed_day = 'W')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Thu' AND thu_day = 'R')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Fri' AND fri_day = 'F')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Sat' AND sat_day = 'S')
OR (to_char(dt, 'Dy', 'nls_date_language = english') = 'Sun' AND sun_day = 'U')
THEN dt
END IS NOT NULL
ORDER BY ssrmeet_crn,
ssrmeet_start_date,
dt;
SSRMEET_CRN SSRMEET_START_DATE SSRMEET_END_DATE DT DT_DAY MEET_DT
----------- ------------------ ---------------- ----------- ------ -----------
32789 01/03/2017 01/04/2017 01/03/2017 Wed 01/03/2017
32789 01/03/2017 01/04/2017 03/03/2017 Fri 03/03/2017
32789 01/03/2017 01/04/2017 04/03/2017 Sat 04/03/2017
32789 01/03/2017 01/04/2017 07/03/2017 Tue 07/03/2017
32789 01/03/2017 01/04/2017 08/03/2017 Wed 08/03/2017
32789 01/03/2017 01/04/2017 10/03/2017 Fri 10/03/2017
32789 01/03/2017 01/04/2017 11/03/2017 Sat 11/03/2017
32789 01/03/2017 01/04/2017 14/03/2017 Tue 14/03/2017
32789 01/03/2017 01/04/2017 15/03/2017 Wed 15/03/2017
32789 01/03/2017 01/04/2017 17/03/2017 Fri 17/03/2017
32789 01/03/2017 01/04/2017 18/03/2017 Sat 18/03/2017
32789 01/03/2017 01/04/2017 21/03/2017 Tue 21/03/2017
32789 01/03/2017 01/04/2017 22/03/2017 Wed 22/03/2017
32789 01/03/2017 01/04/2017 24/03/2017 Fri 24/03/2017
32789 01/03/2017 01/04/2017 25/03/2017 Sat 25/03/2017
32789 01/03/2017 01/04/2017 28/03/2017 Tue 28/03/2017
32789 01/03/2017 01/04/2017 29/03/2017 Wed 29/03/2017
32789 01/03/2017 01/04/2017 31/03/2017 Fri 31/03/2017
32789 01/03/2017 01/04/2017 01/04/2017 Sat 01/04/2017
32790 05/03/2017 27/03/2017 07/03/2017 Tue 07/03/2017
32790 05/03/2017 27/03/2017 08/03/2017 Wed 08/03/2017
32790 05/03/2017 27/03/2017 10/03/2017 Fri 10/03/2017
32790 05/03/2017 27/03/2017 11/03/2017 Sat 11/03/2017
32790 05/03/2017 27/03/2017 14/03/2017 Tue 14/03/2017
32790 05/03/2017 27/03/2017 15/03/2017 Wed 15/03/2017
32790 05/03/2017 27/03/2017 17/03/2017 Fri 17/03/2017
32790 05/03/2017 27/03/2017 18/03/2017 Sat 18/03/2017
32790 05/03/2017 27/03/2017 21/03/2017 Tue 21/03/2017
32790 05/03/2017 27/03/2017 22/03/2017 Wed 22/03/2017
32790 05/03/2017 27/03/2017 24/03/2017 Fri 24/03/2017
32790 05/03/2017 27/03/2017 25/03/2017 Sat 25/03/2017
This works by first running a hierarchical query that produces one row for each date between the start and end dates, inclusive (dt starts at ssrmeet_start_date because LEVEL starts at 1, and the + 1 in the LEVEL limit makes sure the end date is included). The hierarchy is generated separately for each source row when there is more than one. If you only expect to do this for a single row at a time, you only need the last line of the connect by clause; the first three are what make the hierarchy run per row rather than across the whole set of rows. (I assume that the combination of ssrmeet_term_code and ssrmeet_crn is unique.)
Once you have this list, you can then check the day flags to see if the date is one of the flagged days, and if it isn't, don't output it. (N.B. since you didn't provide any sample data or expected output, I've had to guess at the table contents and the expected output.)
If you have any chance of being able to change the design, I would go with a simple Y/N (or Y/null) in the day flag columns, rather than each column having its own individual flag (i.e. mon_flag = Y, tue_flag = Y, etc, rather than mon_flag = 'M', tue_flag = 'T', etc)
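The expand-then-filter idea described in this answer can also be sketched outside the database. The following is a minimal Python illustration, not the answerer's SQL: the `row` dictionary and the `expand_meetings` helper are hypothetical names, and the values are taken from the sample row in the question. It also attaches the begin and end times, which the query above leaves out.

```python
from datetime import date, datetime, timedelta

# Hypothetical row mirroring the question's sample: CRN 32789 meets M/W/F
# from 10:00 to 10:50 between 1/8/2018 and 5/4/2018.
row = {
    "crn": 32789,
    "start_date": date(2018, 1, 8),
    "end_date": date(2018, 5, 4),
    "begin_time": "1000",  # 24-hour HHMM, as stored in the table
    "end_time": "1050",
    # Flags reordered to Monday..Sunday so they line up with date.weekday()
    # (Monday == 0). None means the class does not meet on that day.
    "day_flags": ["M", None, "W", None, "F", None, None],
}


def expand_meetings(row):
    """Return (crn, start_datetime, end_datetime) for every flagged day."""
    begin = datetime.strptime(row["begin_time"], "%H%M").time()
    end = datetime.strptime(row["end_time"], "%H%M").time()
    # + 1 so that the end date itself is part of the range
    n_days = (row["end_date"] - row["start_date"]).days + 1
    meetings = []
    for offset in range(n_days):
        day = row["start_date"] + timedelta(days=offset)
        # Keep the date only if its weekday column carries a flag
        if row["day_flags"][day.weekday()] is not None:
            meetings.append((row["crn"],
                             datetime.combine(day, begin),
                             datetime.combine(day, end)))
    return meetings


meetings = expand_meetings(row)
for crn, start, end in meetings[:3]:
    print(crn, start, end)
```

The loop over `offset` plays the role of LEVEL in the hierarchical query, and the weekday lookup replaces the long CASE expression.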

Related

Update year only in column timestamp date field SQLITE

I want to update the year only to 2025 without changing the month day and time
what I have
2027-01-01 09:30:00
2012-03-06 12:00:00
2014-01-01 17:24:00
2020-07-03 04:30:00
2020-01-01 05:50:00
2021-09-03 06:30:00
2013-01-01 23:30:00
2026-01-01 08:30:00
2028-01-01 09:30:00
What I require is below:
2025-01-01 09:30:00
2025-03-06 12:00:00
2025-01-01 17:24:00
2025-07-03 04:30:00
2025-01-01 05:50:00
2025-09-03 06:30:00
2025-01-01 23:30:00
2025-01-01 08:30:00
2025-01-01 09:30:00
I am using DB Browser for SQLite.
This is what I have tried, but it didn't work:
update t set
d = datetime(strftime('%Y', datetime(2059)) || strftime('-%m-%d', d));

You may update via a substring operation:
UPDATE yourTable
SET ts = '2025-' || SUBSTR(ts, 6, 14);
Note that SQLite does not actually have a timestamp/datetime type. Instead, these values would be stored as text, and hence we can do a substring operation on them.
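A quick way to try the substring approach is an in-memory database. The sketch below uses Python's built-in sqlite3 module; the table name `yourTable` and column `ts` follow the answer, and the three rows are copied from the question. It uses the two-argument form of SUBSTR, which returns everything from position 6 to the end of the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ts TEXT)")
conn.executemany(
    "INSERT INTO yourTable (ts) VALUES (?)",
    [("2027-01-01 09:30:00",),
     ("2012-03-06 12:00:00",),
     ("2020-07-03 04:30:00",)],
)

# Replace the four-digit year and keep everything from the first '-' onward
conn.execute("UPDATE yourTable SET ts = '2025' || SUBSTR(ts, 5)")

rows = [r[0] for r in conn.execute("SELECT ts FROM yourTable ORDER BY rowid")]
for ts in rows:
    print(ts)
```

Because this is plain text manipulation, a stored value such as 2024-02-29 would become 2025-02-29, which is not a valid date, so leap days may need separate handling.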

How to loop st_distance through list

My goal is to apply the st_distance function to a very large data frame,
yet because the data frame concerns multiple individuals, I split it using the purrr package and split function.
I have seen the use of 'lists' and 'forloops' in the past but I have no experience with these.
Below is a fraction of my dataset, I have split the dataframe by ID, into a list with 43 elements.
The st_distance function I plan to use looks something like this, if it were applied to the full data frame rather than to a split list:
PART 2:
I want to do the same as explained by Dave2e, but now for geosphere::bearing
I have attached long and lat in wgs84 to the initial data frame, which now looks like this:
ID Date Time Datetime Long Lat x y
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885 -91.7044 46.34891
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.7242 46.34506
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.7184 46.32236
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.3485
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.6685 46.32941
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.3684
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
I then tried a function similar to the one below, with the coordinates changed to x and y, but it leads to an error:
dis_list <- split(data, data$ID)
answer <- lapply(dis_list, function(df) {
start <- df[-1 , c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
end <- df[-nrow(df), c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
angles <-geosphere::bearing(start, end)
df$angles <- c(NA, angles)
df
})
answer
which gives the error
Error in .pointsToMatrix(p1) :
'list' object cannot be coerced to type 'double'

Here is a basic solution. I split the original data into multiple data frames using split() and then wrapped the distance calculation in lapply().
data <- read.table(header=TRUE, text="ID Date Time Datetime time2 Long Lat
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232")
# The coordinates are projected (EPSG:32615, WGS 84 / UTM zone 15N)
library(sf)
library(magrittr)
dfs <- split(data, data$ID)
answer <- lapply(dfs, function(df) {
# convert to an sf object and specify the coordinate system
start <- df[-1 , c("Long", "Lat")] %>%
st_as_sf(coords = c('Long', 'Lat')) %>%
st_set_crs(32615)
end <- df[-nrow(df), c("Long", "Lat")] %>%
st_as_sf(coords = c('Long', 'Lat')) %>%
st_set_crs(32615)
#long_lat <-st_transform(start, 4326)
distances <-sf::st_distance(start, end, by_element = TRUE)
df$distances <- c(NA, distances)
df
})
answer
$`10_17`
ID Date Time Datetime time2 Long Lat distances
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 NA
2 10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409.0 5179885 3777.132
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 1937.282
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 6201.824
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 3471.400
$`10_24`
ID Date Time Datetime time2 Long Lat distances
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 NA
6 10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 218.6377
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 275.9153
There should be an easier way to calculate distances between rows instead of creating 2 series of points.
Referenced: Converting table columns to spatial objects
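On the remark about an easier way: because the coordinates are already projected (in metres), the distance between consecutive fixes of the same ID is just the straight-line distance between the two points. The following is a minimal Python sketch of that idea, not a replacement for the sf code above; `step_distances` is a hypothetical helper and the points are the first seven rows of the sample data. It appears to reproduce the figures in the output above.

```python
from math import hypot

# (ID, easting, northing) taken from the first rows of the sample data
points = [
    ("10_17", 379800.5, 5181001),
    ("10_17", 383409.0, 5179885),
    ("10_17", 383191.2, 5177960),
    ("10_24", 383448.6, 5179918),
    ("10_17", 378582.5, 5182110),
    ("10_24", 383647.4, 5180009),
    ("10_24", 383407.9, 5179872),
]


def step_distances(points):
    """Distance from each point to the previous point with the same ID.

    The first point of every ID gets None, like the NA in the R output.
    """
    last_seen = {}  # ID -> coordinates of the most recent point for that ID
    result = []
    for pid, x, y in points:
        if pid in last_seen:
            px, py = last_seen[pid]
            result.append((pid, hypot(x - px, y - py)))
        else:
            result.append((pid, None))
        last_seen[pid] = (x, y)
    return result


distances = step_distances(points)
for pid, d in distances:
    print(pid, d)
```

Tracking the last point per ID avoids both the explicit split and the two shifted series of points, at the cost of assuming that the rows are already in time order.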

Unpredictable results using cut() function in R to convert dates to 15 minute intervals

OK, this is making me crazy.
I have several datasets with time values that need to be rolled up into 15 minute intervals.
I found a solution here that works beautifully on one dataset. But on the next one I try to do I'm getting weird results. I have a column with character data representing dates:
BeginTime
-------------------------------
1 1/3/19 1:50 PM
2 1/3/19 1:30 PM
3 1/3/19 4:56 PM
4 1/4/19 11:23 AM
5 1/6/19 7:45 PM
6 1/7/19 10:15 PM
7 1/8/19 12:02 PM
8 1/9/19 10:43 PM
And I'm using the following code (which is exactly what I used on the other dataset except for the names)
df$by15 = cut(mdy_hm(df$BeginTime), breaks="15 min")
but what I get is:
BeginTime by15
-------------------------------------------------------
1 1/3/19 1:50 PM 2019-01-03 13:36:00
2 1/3/19 1:30 PM 2019-01-03 13:21:00
3 1/3/19 4:56 PM 2019-01-03 16:51:00
4 1/4/19 11:23 AM 2019-01-04 11:21:00
5 1/6/19 7:45 PM 2019-01-06 19:36:00
6 1/7/19 10:15 PM 2019-01-07 22:06:00
7 1/8/19 12:02 PM 2019-01-08 11:51:00
8 1/9/19 10:43 PM 2019-01-09 22:36:00
9 1/10/19 11:25 AM 2019-01-10 11:21:00
Any suggestions on why I'm getting such random times instead of the 15-minute intervals I'm looking for? Like I said, this worked fine on the other data set.

You can use the lubridate::round_date() function, which will roll up your datetime data as follows:
library(lubridate) # To handle datetime data
library(dplyr) # For data manipulation
# Creating dataframe
df <-
data.frame(
BeginTime = c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
"1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
"1/8/19 12:02 PM", "1/9/19 10:43 PM")
)
df %>%
# First we parse the strings into datetimes (month/day/year, matching mdy_hm() in the question)
mutate(by15 = parse_date_time(BeginTime, '%m/%d/%y %I:%M %p'),
# Then we round to the nearest 15-minute interval
by15 = round_date(by15, "15 mins"))
#
# BeginTime by15
# 1/3/19 1:50 PM 2019-01-03 13:45:00
# 1/3/19 1:30 PM 2019-01-03 13:30:00
# 1/3/19 4:56 PM 2019-01-03 17:00:00
# 1/4/19 11:23 AM 2019-01-04 11:30:00
# 1/6/19 7:45 PM 2019-01-06 19:45:00
# 1/7/19 10:15 PM 2019-01-07 22:15:00
# 1/8/19 12:02 PM 2019-01-08 12:00:00
# 1/9/19 10:43 PM 2019-01-09 22:45:00
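The difference between rounding to the nearest quarter hour and rolling down to the start of the interval can be sketched in a few lines. This is a Python illustration of the arithmetic, not the R code from the answer, and the helper names `floor_to_interval` and `round_to_interval` are made up for the example. Both helpers anchor the bins to the clock (:00, :15, :30, :45). That anchoring is likely what the question runs into: cut() with breaks="15 min" appears to start its bins at the earliest timestamp in the data, so the edges only fall on the quarter hour when that first value happens to.

```python
from datetime import datetime, timedelta


def floor_to_interval(ts, minutes=15):
    """Roll ts down to the start of its interval (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def round_to_interval(ts, minutes=15):
    """Round ts to the nearest interval boundary (ties round up)."""
    step = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    rounded = int((seconds + step / 2) // step) * step
    return midnight + timedelta(seconds=rounded)


# Month/day/year with a 12-hour clock, as in the question's BeginTime column
fmt = "%m/%d/%y %I:%M %p"
ts = datetime.strptime("1/3/19 4:56 PM", fmt)
print(floor_to_interval(ts))  # start of the interval containing 16:56
print(round_to_interval(ts))  # nearest quarter hour
```

Which of the two is wanted depends on whether a value such as 4:56 PM should count toward the 4:45 bucket or the 5:00 one.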

Intraday high/low clustering

I am attempting to perform a study on the clustering of high/low points based on time. I managed to achieve the above by using to.daily on intraday data and merging the two using:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where the high == daily.high/low == daily.low using:
intradayhi <- test[test$High == test$Daily.High]
intradaylo <- test[test$Low == test$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
The subset contains duplicated results for some days; how do I keep only the first record of each day? I would then be able to plot the count of records for each period of the day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this, count could either be 1st result for day or aggregated (more than 1 occurrence in that day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30

You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can aggregate the count by hour and minute (summing a vector of ones, so that each day contributes 1 to its time slot):
z <- aggregate(rep(1, NROW(y)), by=list(Time=format(index(y),"%H:%M")), sum)
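To make the two steps concrete (first hit per day, then a count per time of day), here is a minimal Python sketch. It is not the xts code from the answer: `hits` is a hypothetical list holding the timestamps of the daily-high rows shown in the question, and the variable names are illustrative only.

```python
from collections import Counter
from datetime import datetime

# Timestamps of the rows where High == Daily.High, copied from the question
hits = [
    "2012-06-19 08:45:00", "2012-06-20 13:30:00", "2012-06-21 08:40:00",
    "2012-06-22 12:10:00", "2012-06-22 12:15:00", "2012-06-25 11:55:00",
    "2012-06-26 08:45:00", "2012-06-27 11:25:00", "2012-06-28 09:00:00",
    "2012-06-29 13:45:00",
]

# Step 1: keep only the earliest hit of each calendar day
first_per_day = {}
for stamp in sorted(hits):
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    first_per_day.setdefault(ts.date(), ts)

# Step 2: count how many days had their first hit at each time of day
counts = Counter(ts.strftime("%H:%M") for ts in first_per_day.values())

for time_of_day, n in sorted(counts.items()):
    print(time_of_day, n)
```

Dropping step 1 and counting every hit instead gives the aggregated variant mentioned in the question's edit.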

Date time am/pm in R

I'm having an issue with the datetime field of a timeseries:
> CO1temp[163:169,]
Date OPEN HIGH LOW CLOSE
163 7/11/2011 11:45:00 PM 116.30 116.30 116.09 116.18
164 7/11/2011 11:50:00 PM 116.16 116.78 116.13 116.70
165 7/11/2011 11:55:00 PM 116.69 116.83 116.51 116.65
166 7/12/2011 116.65 116.79 116.44 116.50
167 7/12/2011 12:05:00 AM 116.50 116.60 116.39 116.47
168 7/12/2011 12:10:00 AM 116.49 116.55 116.38 116.52
169 7/12/2011 12:15:00 AM 116.52 116.67 116.39 116.44
As you can see the midnight time (line 166) is not showing properly.
Which creates a NA when I create my xts object:
CO1 <- as.xts(CO1temp[, 2:5], order.by = as.POSIXct(CO1temp[,1],format='%m/%d/%Y %r'),frequency="5 minutes")
> CO1[163:169,]
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
<NA> 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
This later leads to more problems when I want to analyze this timeseries.
?strptime is quite specific about it:
The default for the format methods is "%Y-%m-%d %H:%M:%S" if any component has a time component which is not midnight, and "%Y-%m-%d" otherwise.
However my datetime is not in the standard format.
I would greatly appreciate any help.

This is kind of a hack, but it works.
You just have to append "12:00:00 AM" to your vector of dates: the ones lacking the time information will then be read correctly, and for dates that already have a time, the appended text is simply ignored and only the original time is read.
CO1 <- as.xts(CO1temp[, 2:5],
order.by = as.POSIXct(paste(CO1temp$Date,"12:00:00 AM", sep=" "),
format='%m/%d/%Y %r'),
frequency="5 minutes")
CO1
OPEN HIGH LOW CLOSE
2011-07-11 23:45:00 116.30 116.30 116.09 116.18
2011-07-11 23:50:00 116.16 116.78 116.13 116.70
2011-07-11 23:55:00 116.69 116.83 116.51 116.65
2011-07-12 00:00:00 116.65 116.79 116.44 116.50
2011-07-12 00:05:00 116.50 116.60 116.39 116.47
2011-07-12 00:10:00 116.49 116.55 116.38 116.52
2011-07-12 00:15:00 116.52 116.67 116.39 116.44
That being said, if you ended up with your dataframe as it is after using strptime then your date column is already in POSIXct format and therefore the following should work directly:
as.xts(CO1temp[, 2:5], order.by = CO1temp$Date, frequency = "5 minutes")
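The trick above depends on the parser ignoring trailing text. Where that is not the case, the same outcome can be reached with a different technique: an explicit fallback from the full format to a date-only format. The following is a minimal Python sketch of that fallback, not the R code above; `parse_stamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_stamp(text):
    """Parse 'm/d/Y I:M:S AM/PM'; treat a bare date as midnight."""
    try:
        return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        # No time component: a date-only parse gives 00:00:00
        return datetime.strptime(text, "%m/%d/%Y")


samples = ["7/11/2011 11:55:00 PM", "7/12/2011", "7/12/2011 12:05:00 AM"]
parsed = [parse_stamp(s) for s in samples]
for original, stamp in zip(samples, parsed):
    print(original, "->", stamp)
```

The bare date then sorts between 23:55 and 00:05, which is where the midnight bar belongs.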
