How to add hourly rows on a daily sequence dataframe? - r

I have daily data in the following format:
A:
DD-MM-YYYY
01-01-2000
02-01-2000
03-01-2000
04-01-2000
...
31-12-2010
31-12-2010
31-12-2010
31-12-2010
How can I add hourly values to every day and obtain a new A like this:
A:
DD-MM-YYYY hour
01-01-2000 00:00
01-01-2000 01:00
01-01-2000 02:00
01-01-2000 03:00
...
01-01-2000 21:00
01-01-2000 22:00
01-01-2000 23:00
...
...
31-12-2010 21:00
31-12-2010 22:00
31-12-2010 23:00

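This will attach 00:00 to 23:00 to each of your days (expand.grid varies its first column fastest, so the rows come out grouped by hour rather than by day):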
expand.grid(day = A$`DD-MM-YYYY`, hour = sprintf("%02d:00", 0:23))
However, in the real world you might prefer to use seq.POSIXt, which will account for leap years, daylight savings, etc.

Related

Upsample data with mean

I am trying to upsample my datetime data and fill in the gap with a mean rather than forward or backward fill.
Sample df:
TIME VALUE
01:00 4
02:00 8
03:00 2
Desired output:
TIME VALUE
01:00 4
01:30 6
02:00 8
02:30 5
03:00 2
Currently I do a straightforward resample('30min') and want to fill the NaN values
TIME VALUE
01:00 4
01:30 NaN
02:00 8
02:30 NaN
03:00 2
with the mean rather than a backward or forward fill.
I figured out one way to solve the problem:
df2 = df.resample('30min', on='TIME')
final = df2.interpolate(method='linear')
But I would be keen to see other ways to do this apart from interpolation!
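One possible alternative to interpolate, sketched under a few assumptions: the TIME column is parsed to datetimes (the date part is a placeholder), and "mean" is taken to be the average of the nearest known value on each side. The variable names upsampled and filled are illustrative only.

```python
import pandas as pd

# Sample data from the question; TIME is parsed to datetimes
# (pandas fills in a default date, which does not matter here).
df = pd.DataFrame({
    "TIME": pd.to_datetime(["01:00", "02:00", "03:00"], format="%H:%M"),
    "VALUE": [4, 8, 2],
})

# Upsample to 30-minute steps; the newly created slots hold NaN.
upsampled = df.set_index("TIME")["VALUE"].resample("30min").asfreq()

# Average the previous known value (ffill) and the next known value (bfill).
# Rows that already had a value are unchanged, since both fills agree there.
filled = (upsampled.ffill() + upsampled.bfill()) / 2

print(filled)
```

With a single gap between known points this gives the same numbers as linear interpolation. The two differ when several consecutive slots are missing: every slot in such a run receives the same midpoint value instead of a ramp.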

Unpredictable results using cut() function in R to convert dates to 15 minute intervals

OK, this is making me crazy.
I have several datasets with time values that need to be rolled up into 15 minute intervals.
I found a solution here that works beautifully on one dataset. But on the next one I try to do I'm getting weird results. I have a column with character data representing dates:
BeginTime
-------------------------------
1 1/3/19 1:50 PM
2 1/3/19 1:30 PM
3 1/3/19 4:56 PM
4 1/4/19 11:23 AM
5 1/6/19 7:45 PM
6 1/7/19 10:15 PM
7 1/8/19 12:02 PM
8 1/9/19 10:43 PM
And I'm using the following code (which is exactly what I used on the other dataset except for the names)
df$by15 = cut(mdy_hm(df$BeginTime), breaks="15 min")
but what I get is:
BeginTime by15
-------------------------------------------------------
1 1/3/19 1:50 PM 2019-01-03 13:36:00
2 1/3/19 1:30 PM 2019-01-03 13:21:00
3 1/3/19 4:56 PM 2019-01-03 16:51:00
4 1/4/19 11:23 AM 2019-01-04 11:21:00
5 1/6/19 7:45 PM 2019-01-06 19:36:00
6 1/7/19 10:15 PM 2019-01-07 22:06:00
7 1/8/19 12:02 PM 2019-01-08 11:51:00
8 1/9/19 10:43 PM 2019-01-09 22:36:00
9 1/10/19 11:25 AM 2019-01-10 11:21:00
Any suggestions on why I'm getting such random times instead of the 15-minute intervals I'm looking for? Like I said, this worked fine on the other data set.
You can use lubridate's round_date() function, which rounds your datetime data to the nearest 15 minutes. Note that the strings are month/day/year (the question parses them with mdy_hm), so the format has to start with %m/%d:
library(lubridate) # To handle datetime data
library(dplyr)     # For data manipulation
# Creating the data frame
df <- data.frame(
  BeginTime = c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
                "1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
                "1/8/19 12:02 PM", "1/9/19 10:43 PM")
)
df %>%
  # Parse the strings (month/day/year, 12-hour clock) into datetimes
  mutate(by15 = parse_date_time(BeginTime, '%m/%d/%y %I:%M %p'),
         # Round to the nearest 15-minute mark; use floor_date() to always round down
         by15 = round_date(by15, "15 mins"))
#
#        BeginTime                by15
#   1/3/19 1:50 PM 2019-01-03 13:45:00
#   1/3/19 1:30 PM 2019-01-03 13:30:00
#   1/3/19 4:56 PM 2019-01-03 17:00:00
#  1/4/19 11:23 AM 2019-01-04 11:30:00
#   1/6/19 7:45 PM 2019-01-06 19:45:00
#  1/7/19 10:15 PM 2019-01-07 22:15:00
#  1/8/19 12:02 PM 2019-01-08 12:00:00
#  1/9/19 10:43 PM 2019-01-09 22:45:00

group a column by date with different formats

I have a dataset where one column holds date and time values. Every date has multiple entries. The first row for every date has a date value in the form 29MAY2018_00:00:00.000000, while the rest of the rows for the same date have time values, e.g. 20:00 - 21:00. The problem is that I want to sum the values in another column for each day.
The sample data has the following format
Date A
29MAY2018_00:00:00.000000
20:00 - 21:00 0.009
21:00 - 22:00 0.003
22:00 - 23:00 0.0003
23:00 - 00:00 0
30MAY2018_00:00:00.000000
00:00 - 01:00 -0.0016
01:00 - 02:00 -0.0012
02:00 - 03:00 -0.0002
03:00 - 04:00 -0.0023
04:00 - 05:00 0
05:00 - 06:00 -0.0005
20:00 - 21:00 -0.0042
21:00 - 22:00 -0.0035
22:00 - 23:00 -0.0026
23:00 - 00:00 -0.001
I have created a new column
data$C[data$A ==0 ] <- 0
data$C[data$A < 0 ] <- -1
data$C[data$A > 0 ] <- 1
I need to sum the column 'C' for every date.
The output should be
A B
29-MAY-2019 4
30-MAY-2019 -9
31-MAY-2019 3
An option would be to create a grouping column based on where the full datetime format occurs in 'Date', take the first 'Date' of each group, convert it to Date class (with anydate from the anytime package), and sum the sign of 'A':
library(tidyverse)
library(anytime)
data %>%
  group_by(grp = cumsum(str_detect(Date, "[A-Z]"))) %>%
  summarise(Date = anydate(first(Date)),
            B = sum(sign(A), na.rm = TRUE))

How to parse year from a date in r [duplicate]

I have a data set with 53,000 dates and I want to extract only the year from the date variable.
Does anyone know how I can do this?
My data are as follows:
OPN_DT_TM
18/07/2003 10:55
12/06/2004 6:00
9/06/2007 12:20
29/06/2001 16:00
6/06/2000 7:55
27/11/2006 10:15
17/11/2001 17:00
12/05/2004 22:00
16/04/2005 22:00
18/03/2005 8:40
13/06/2006 11:10
30/07/2006 12:00
16/07/2002 6:10
16/07/2002 7:15
3/09/2004 6:00
9/11/2004 15:20
25/08/2005 14:15
24/11/2001 19:10
15/04/2002 6:30
20/06/2002 6:30
17/03/2003 7:00
15/01/2005 13:00
23/03/2007 1:00
21/01/2001 10:30
...
This can be achieved by converting the entries into Date format and extracting the year, for instance like this:
> format(as.Date("15/01/2005 13:00", format="%d/%m/%Y %H:%M"),"%Y")
[1] "2005"
To get in-depth knowledge about dates and times in R, please see this.

Time series is throwing up error message

I am trying to conduct a time series analysis based on this dataset:
time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
I read it in with:
austriacus<-read.table("austriacus.txt",header=T)
The time series call x.ts<-ts(POINT_X,time) is not working and produces the following error message: Error in is.data.frame(data) : object 'POINT_X' not found
Any ideas on this?
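The error itself arises because POINT_X is a column of austriacus rather than an object in your workspace, so it has to be referenced as austriacus$POINT_X. For clock-time data like this, try the zoo and chron packages: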
Lines <- "time POINT_Y POINT_X
00:00 106.78 207.44
00:30 106.61 207.6
01:00 103.72 208.33
01:30 102.57 207.35
02:00 102.27 206.3
02:30 101.6 206.43
03:00 100.66 206.73
03:30 101.11 206.5
04:00 100.95 206.63
04:30 102.02 206.27
05:00 105.83 207.93
05:30 106.98 207.15
06:00 107.32 206.28
06:30 108.36 204.7
07:00 107.97 203.41
07:30 107.76 202.63
08:00 107.85 201.13
08:30 107.6 198.74
"
library(zoo)
library(chron)
to.times <- function(x) times(paste0(x, ":00"))
# z <- read.zoo("myfile", header = TRUE, FUN = to.times)
z <- read.zoo(text = Lines, header = TRUE, FUN = to.times)
plot(z)
