create 30 min interval for time series with different start time - r

I have electricity sensor readings at 15-minute intervals, but the start time is not fixed. For example,
on this day the readings start at minute 13, and on another day they start at a different minute:
dateTime KW
1/1/2013 1:13 34.70
1/1/2013 1:28 43.50
1/1/2013 1:43 50.50
1/1/2013 1:58 57.50
.
.
.//here start from min 02
1/30/2013 0:02 131736.30
1/30/2013 0:17 131744.30
1/30/2013 0:32 131751.10
1/30/2013 0:47 131759.00
I have one year of data and I need a regular 30-minute interval starting from midnight (00:00).
I am new to R. Can anyone help me?

Maybe you can try:
dT <- as.POSIXct(strptime(df$dateTime, '%m/%d/%Y %H:%M'))
grp <- as.POSIXct(cut(c(as.POSIXct(gsub(' +.*', '', min(dT))), dT,
as.POSIXct(gsub(' +.*', '', max(dT)+24*3600))), breaks='30 min'))
df$grp <- grp[-c(1,length(grp))]
df
# dateTime KW grp
#1 1/1/2013 1:13 34.7 2013-01-01 01:00:00
#2 1/1/2013 1:28 43.5 2013-01-01 01:00:00
#3 1/1/2013 1:43 50.5 2013-01-01 01:30:00
#4 1/1/2013 1:58 57.5 2013-01-01 01:30:00
#5 1/30/2013 0:02 131736.3 2013-01-30 00:00:00
#6 1/30/2013 0:17 131744.3 2013-01-30 00:00:00
#7 1/30/2013 0:32 131751.1 2013-01-30 00:30:00
#8 1/30/2013 0:47 131759.0 2013-01-30 00:30:00
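As a hedged alternative to the cut()-based grouping above, the same 30-minute bins can be sketched with plain arithmetic on the underlying seconds. This assumes the timestamps can be treated as UTC (so that bins counted from the epoch line up with midnight) and that a mean per bin is the summary wanted; neither is stated in the question. The names bin_secs and half_hourly are illustrative only.

```r
# Sample rows copied from the question
df <- data.frame(
  dateTime = c("1/1/2013 1:13", "1/1/2013 1:28", "1/1/2013 1:43", "1/1/2013 1:58",
               "1/30/2013 0:02", "1/30/2013 0:17", "1/30/2013 0:32", "1/30/2013 0:47"),
  KW = c(34.7, 43.5, 50.5, 57.5, 131736.3, 131744.3, 131751.1, 131759),
  stringsAsFactors = FALSE
)

# Parse the timestamps (assumption: UTC)
dT <- as.POSIXct(df$dateTime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Floor each timestamp to the start of its 30-minute bin
bin_secs <- 30 * 60
df$grp <- as.POSIXct(floor(as.numeric(dT) / bin_secs) * bin_secs,
                     origin = "1970-01-01", tz = "UTC")

# One value per bin (assumption: the mean is the summary wanted)
half_hourly <- aggregate(KW ~ grp, data = df, FUN = mean)
half_hourly
```

Bins with no readings are simply absent from half_hourly; a full regular grid would need a merge against seq() of 30-minute timestamps.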
data
df <- structure(list(dateTime = c("1/1/2013 1:13", "1/1/2013 1:28",
"1/1/2013 1:43", "1/1/2013 1:58", "1/30/2013 0:02", "1/30/2013 0:17",
"1/30/2013 0:32", "1/30/2013 0:47"), KW = c(34.7, 43.5, 50.5,
57.5, 131736.3, 131744.3, 131751.1, 131759)), .Names = c("dateTime",
"KW"), class = "data.frame", row.names = c(NA, -8L))

Related

Using loop to calculate bearings over a list

My goal is to apply the geosphere::bearing function to a very large data frame.
Because the data frame covers multiple individuals, I split it using the purrr package and the split function.
I have seen lists and for loops used in the past, but I have no experience with them.
Below is a fraction of my dataset. I have split the data frame by ID into a list with 43 elements, and I have attached long and lat in WGS84 to the initial data frame.
ID Date Time Datetime Long Lat x y
10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409 5179885 -91.7044 46.34891
10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.7242 46.34506
10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.7184 46.32236
10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.3485
10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.6685 46.32941
10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.3684
10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
I then tried this code:
library(geosphere)
library(sf)
library(magrittr)
dis_list <- split(data, data$ID)
answer <- lapply(dis_list, function(df) {
start <- df[-1 , c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
end <- df[-nrow(df), c("x", "y")] %>%
st_as_sf(coords = c('x', 'y'))
angles <-geosphere::bearing(start, end)
df$angles <- c(NA, angles)
df
})
answer
which gives the error
Error in .pointsToMatrix(p1) :
'list' object cannot be coerced to type 'double'
A Google search for "pass sf points to geosphere bearings" brings up this GIS Stack Exchange answer, which seems to address the issue. I would characterize it as "how to extract numeric vectors from items that are sf-classed POINTS": https://gis.stackexchange.com/questions/416316/compute-east-west-or-north-south-orientation-of-polylines-sf-linestring-in-r
I needed to work with a single section first and then apply the lessons from @Spacedman to this task:
> st_coordinates( st_as_sf(dis_list[[1]], coords = c('x', 'y')) )
X Y
1 -91.72272 46.35156
2 -91.70440 46.34891
3 -91.72297 46.35134
4 -91.72420 46.34506
5 -91.65361 46.34712
So st_coordinates will extract the POINT-classed values into a two-column matrix that can then be passed to geosphere::bearing:
dis_list <- split(dat, dat$ID)
answer <- lapply(dis_list, function(df) {
start <- df[-1 , c("x", "y")] %>%
st_as_sf(coords = c('x', 'y')) %>% st_coordinates
end1 <- df[-nrow(df), c("x", "y")] %>%
st_as_sf(coords = c('x', 'y')) %>% st_coordinates
angles <-geosphere::bearing(start, end1)
df$angles <- c(NA, angles)
df
})
answer
#------------------------
$`10_17`
ID Date Time date time Long Lat x y
1 10_17 4/18/2017 15:02:00 4/18/2017 15:02 379800.5 5181001 -91.72272 46.35156
2 10_17 4/20/2017 6:00:00 4/20/2017 6:00 383409.0 5179885 -91.70440 46.34891
3 10_17 4/21/2017 21:02:00 4/21/2017 21:02 383191.2 5177960 -91.72297 46.35134
5 10_17 4/23/2017 12:01:00 4/23/2017 12:01 378582.5 5182110 -91.72420 46.34506
8 10_17 4/26/2017 18:02:00 4/26/2017 18:02 380691.9 5179353 -91.65361 46.34712
Datetime angles
1 4/18/2017 15:02 NA
2 4/20/2017 6:00 -78.194383
3 4/21/2017 21:02 100.694352
5 4/23/2017 12:01 7.723513
8 4/26/2017 18:02 -92.387473
$`10_24`
ID Date Time date time Long Lat x y
4 10_24 4/22/2017 10:03:00 4/22/2017 10:03 383448.6 5179918 -91.72298 46.35134
6 10_24 4/24/2017 1:00:00 4/24/2017 1:00 383647.4 5180009 -91.72515 46.34738
7 10_24 4/25/2017 16:01:00 4/25/2017 16:01 383407.9 5179872 -91.71840 46.32236
Datetime angles
4 4/22/2017 10:03 NA
6 4/24/2017 1:00 20.77910
7 4/25/2017 16:01 -10.58228
$`10_36`
ID Date Time date time Long Lat x y
9 10_36 4/27/2017 20:00:00 4/27/2017 20:00 382521.9 5175266 -91.66127 46.34850
10 10_36 4/29/2017 11:01:00 4/29/2017 11:01 383443.8 5179909 -91.70303 46.35451
11 10_36 4/30/2017 0:00:00 4/30/2017 0:00 383060.8 5178361 -91.66850 46.32941
Datetime angles
9 4/27/2017 20:00 NA
10 4/29/2017 11:01 101.72602
11 4/30/2017 0:00 -43.60192
$`10_40`
ID Date Time date time Long Lat x y
12 10_40 4/30/2017 13:02:00 4/30/2017 13:02 383426.3 5179873 -91.70263 46.35481
13 10_40 5/2/2017 17:02:00 5/2/2017 17:02 383393.7 5179883 -91.67099 46.34138
14 10_40 5/3/2017 6:01:00 5/3/2017 6:01 382875.8 5179376 -91.66324 46.34763
Datetime angles
12 4/30/2017 13:02 NA
13 5/2/2017 17:02 -58.48235
14 5/3/2017 6:01 -139.34297
$`10_88`
ID Date Time date time Long Lat x y
15 10_88 5/3/2017 19:02:00 5/3/2017 19:02 383264.3 5179948 -91.73075 46.36840
16 10_88 5/4/2017 8:01:00 5/4/2017 8:01 378554.4 5181966 -91.70413 46.35429
17 10_88 5/4/2017 21:03:00 5/4/2017 21:03 379830.5 5177232 -91.66452 46.37274
Datetime angles
15 5/3/2017 19:02 NA
16 5/4/2017 8:01 -52.55217
17 5/4/2017 21:03 -123.91920
The help page for st_coordinates characterizes its function as "retrieve coordinates in matrix form".
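For readers who want to see what a bearing calculation involves, here is a minimal base R sketch of the initial bearing on a sphere, applied per ID with split() and lapply(). It is an approximation under a spherical-Earth assumption and is not how geosphere::bearing is implemented (that function works on an ellipsoid), so results can differ slightly. The function name initial_bearing and the toy data frame pts are hypothetical, and the direction used here is from each fix to the next one.

```r
# Initial bearing in degrees (-180..180) from point 1 to point 2 on a sphere
initial_bearing <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlon <- (lon2 - lon1) * to_rad
  y <- sin(dlon) * cos(phi2)
  x <- cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)
  atan2(y, x) / to_rad
}

# Toy data (hypothetical): two individuals, coordinates in degrees
pts <- data.frame(
  ID = c("a", "a", "a", "b", "b"),
  x  = c(0, 0, 1, 10, 10),   # longitude
  y  = c(0, 1, 1, 0, -1),    # latitude
  stringsAsFactors = FALSE
)

res <- lapply(split(pts, pts$ID), function(df) {
  n <- nrow(df)
  # Bearing from each fix to the next one; the last fix has no successor
  df$angle <- c(initial_bearing(df$x[-n], df$y[-n], df$x[-1], df$y[-1]), NA)
  df
})
res
```

Because the inputs are plain numeric vectors, no conversion to sf objects is involved, which sidesteps the coercion error shown in the question.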
Given that the data is already in longitude and latitude form, just using bearing(data[, c("Long", "Lat")]) and distGeo(data[, c("Long", "Lat")]) from geosphere on the split data frames will work. There is no need to create start and end points.
library(geosphere)
dfs <- split(data, data$ID)
answer <- lapply(dfs, function(df) {
df$distances <-c(distGeo(df[,c("Long", "Lat")]))
df$bearings <- c(bearing(df[,c("Long", "Lat")]))
df
})
answer
The sf package is useful for converting between coordinate systems, but with the data set above that step can be skipped. I find the geosphere package more straightforward and simpler to use.

Adding data based on date from another dataframe

I have two datasets. One with multiple dates:
date, time
1 2013-05-01 12:43:34
2 2013-05-02 05:04:23
3 2013-05-02 09:34:34
4 2013-05-02 12:32:23
5 2013-05-03 23:23:23
6 2013-05-04 15:34:17
and one with sunrise and sunsets data:
Sunrise Sunset
2013-05-01 06:43:00 2013-05-01 21:02:12
2013-05-02 06:44:00 2013-05-02 21:03:13
2013-05-03 06:44:56 2013-05-03 21:04:02
2013-05-04 06:45:32 2013-05-04 21:05:00
I want to add a column to the first dataframe with either "Day" or "night", based on whether the date and time from the first dataframe is between the sunrise and sunset time and dates.
date, time Day or night
1 2013-05-01 12:43:34 Day
2 2013-05-02 05:04:23 Night
3 2013-05-02 09:34:34 Day
4 2013-05-02 12:32:23 Day
5 2013-05-03 23:23:23 Night
6 2013-05-04 15:34:17 Day
I tried copying and if_else functions, but the length of rows is different because for one year I have 365 sunrises and sunsets but I've also got multiple measurements for one day (total of 28000 rows).
Can anyone help me with my problem?
Thanks in advance.
df1 <- structure(list(date_time = c("2013-05-01 12:43:34", "2013-05-02 05:04:23",
"2013-05-02 09:34:34", "2013-05-02 12:32:23", "2013-05-03 23:23:23",
"2013-05-04 15:34:17")), row.names = c(NA, -6L), class = c("data.frame"))
df2 <- structure(list(Sunrise = c("2013-05-01 06:43:00", "2013-05-02 06:44:00",
"2013-05-03 06:44:56", "2013-05-04 06:45:32"), Sunset = c("2013-05-01 21:02:12",
"2013-05-02 21:03:13", "2013-05-03 21:04:02", "2013-05-04 21:05:00"
)), row.names = c(NA, -4L), class = c("data.frame"))
library(dplyr) # for mutate() and %>%
# prepare df1
df1 <- df1 %>%
mutate(date_time = as.POSIXct(date_time, tz = "UTC")) %>%
mutate(Date = as.Date(date_time))
# prepare df2
df2 <- df2 %>%
mutate(Sunrise = as.POSIXct(Sunrise, tz = "UTC")) %>%
mutate(Sunset = as.POSIXct(Sunset, tz = "UTC")) %>%
mutate(Date = as.Date(Sunrise))
library(lubridate) # for the use of interval
merge(df1, df2, by = "Date") %>%
mutate(DayOrNight = ifelse(date_time %within% interval(Sunrise, Sunset), "Day", "Night"))
# Date date_time Sunrise Sunset DayOrNight
# 1 2013-05-01 2013-05-01 12:43:34 2013-05-01 06:43:00 2013-05-01 21:02:12 Day
# 2 2013-05-02 2013-05-02 05:04:23 2013-05-02 06:44:00 2013-05-02 21:03:13 Night
# 3 2013-05-02 2013-05-02 09:34:34 2013-05-02 06:44:00 2013-05-02 21:03:13 Day
# 4 2013-05-02 2013-05-02 12:32:23 2013-05-02 06:44:00 2013-05-02 21:03:13 Day
# 5 2013-05-03 2013-05-03 23:23:23 2013-05-03 06:44:56 2013-05-03 21:04:02 Night
# 6 2013-05-04 2013-05-04 15:34:17 2013-05-04 06:45:32 2013-05-04 21:05:00 Day
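The same lookup can be sketched in base R without a merge, by matching each measurement's calendar day against the sunrise table. This is a hedged sketch that assumes all times are UTC and that every measurement day has exactly one row in df2; the helper names dt, rise, sunset_t and idx are illustrative.

```r
df1 <- data.frame(
  date_time = c("2013-05-01 12:43:34", "2013-05-02 05:04:23", "2013-05-02 09:34:34",
                "2013-05-02 12:32:23", "2013-05-03 23:23:23", "2013-05-04 15:34:17"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Sunrise = c("2013-05-01 06:43:00", "2013-05-02 06:44:00",
              "2013-05-03 06:44:56", "2013-05-04 06:45:32"),
  Sunset  = c("2013-05-01 21:02:12", "2013-05-02 21:03:13",
              "2013-05-03 21:04:02", "2013-05-04 21:05:00"),
  stringsAsFactors = FALSE
)

dt       <- as.POSIXct(df1$date_time, tz = "UTC")
rise     <- as.POSIXct(df2$Sunrise, tz = "UTC")
sunset_t <- as.POSIXct(df2$Sunset, tz = "UTC")

# For each measurement, find the row of df2 that has the same calendar day
idx <- match(format(dt, "%Y-%m-%d"), format(rise, "%Y-%m-%d"))

# "Day" when the measurement falls between that day's sunrise and sunset
df1$DayOrNight <- ifelse(dt >= rise[idx] & dt <= sunset_t[idx], "Day", "Night")
df1
```

Measurements on a day that is missing from df2 get NA instead of a label, which makes gaps in the sunrise table easy to spot.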

Unpredictable results using cut() function in R to convert dates to 15 minute intervals

OK, this is making me crazy.
I have several datasets with time values that need to be rolled up into 15 minute intervals.
I found a solution here that works beautifully on one dataset. But on the next one I try to do I'm getting weird results. I have a column with character data representing dates:
BeginTime
-------------------------------
1 1/3/19 1:50 PM
2 1/3/19 1:30 PM
3 1/3/19 4:56 PM
4 1/4/19 11:23 AM
5 1/6/19 7:45 PM
6 1/7/19 10:15 PM
7 1/8/19 12:02 PM
8 1/9/19 10:43 PM
And I'm using the following code (which is exactly what I used on the other dataset except for the names)
df$by15 = cut(mdy_hm(df$BeginTime), breaks="15 min")
but what I get is:
BeginTime by15
-------------------------------------------------------
1 1/3/19 1:50 PM 2019-01-03 13:36:00
2 1/3/19 1:30 PM 2019-01-03 13:21:00
3 1/3/19 4:56 PM 2019-01-03 16:51:00
4 1/4/19 11:23 AM 2019-01-04 11:21:00
5 1/6/19 7:45 PM 2019-01-06 19:36:00
6 1/7/19 10:15 PM 2019-01-07 22:06:00
7 1/8/19 12:02 PM 2019-01-08 11:51:00
8 1/9/19 10:43 PM 2019-01-09 22:36:00
9 1/10/19 11:25 AM 2019-01-10 11:21:00
Any suggestions on why I'm getting such random times instead of the 15-minute intervals I'm looking for? Like I said, this worked fine on the other data set.
You can use the lubridate::round_date() function, which will roll up your datetime data as follows:
library(lubridate) # To handle datetime data
library(dplyr) # For data manipulation
# Creating dataframe
df <-
data.frame(
BeginTime = c("1/3/19 1:50 PM", "1/3/19 1:30 PM", "1/3/19 4:56 PM",
"1/4/19 11:23 AM", "1/6/19 7:45 PM", "1/7/19 10:15 PM",
"1/8/19 12:02 PM", "1/9/19 10:43 PM")
)
df %>%
# First we parse the strings into datetimes (the input is month/day/year)
mutate(by15 = parse_date_time(BeginTime, '%m/%d/%y %I:%M %p'),
# Then we round to the nearest 15-minute interval
by15 = round_date(by15, "15 mins"))
#
# BeginTime by15
# 1/3/19 1:50 PM 2019-01-03 13:45:00
# 1/3/19 1:30 PM 2019-01-03 13:30:00
# 1/3/19 4:56 PM 2019-01-03 17:00:00
# 1/4/19 11:23 AM 2019-01-04 11:30:00
# 1/6/19 7:45 PM 2019-01-06 19:45:00
# 1/7/19 10:15 PM 2019-01-07 22:15:00
# 1/8/19 12:02 PM 2019-01-08 12:00:00
# 1/9/19 10:43 PM 2019-01-09 22:45:00
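One likely reason for the odd break times in the question (an interpretation, not something the answer states): with a specification such as "15 min", cut() on date-times starts its breaks at the earliest observation, truncated to the minute, and not at a round quarter hour. The sketch below reproduces that with made-up timestamps, then floors to clock-aligned quarter hours with arithmetic on the seconds. The vector tt and the UTC time zone are assumptions.

```r
# Made-up timestamps; the earliest one is at minute 06
tt <- as.POSIXct(c("2019-01-03 13:06:00", "2019-01-03 13:30:00",
                   "2019-01-03 13:50:00"), tz = "UTC")

# Breaks start at the earliest value, so the bins begin at 13:06, 13:21, 13:36
by_cut <- cut(tt, breaks = "15 min")

# Clock-aligned alternative: floor seconds since the epoch to multiples of 900
by_floor <- as.POSIXct(floor(as.numeric(tt) / 900) * 900,
                       origin = "1970-01-01", tz = "UTC")

data.frame(tt, by_cut, by_floor)
```

Here 13:30 lands in the 13:21 bin and 13:50 in the 13:36 bin under cut(), the same pattern as in the question's output, while the floored version gives 13:30 and 13:45. Unlike round_date(), flooring never moves a time forward.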

Changing quarterly data into hourly data

I have data as below. It is from 01.01.2015~31.12.2015.
The data is at quarter-hour resolution. I want to add, for example, the 0:00, 0:15, 0:30 and 0:45 values together to get hourly data. How can I convert this into hourly data?
Thank you in advance.
Date Hour Day-ahead Total Load Forecast [MW] - Germany (DE)
01.01.2015 0:00 42955
01.01.2015 0:15 42412
01.01.2015 0:30 41901
01.01.2015 0:45 41355
01.01.2015 1:00 40710
01.01.2015 1:15 40204
01.01.2015 1:30 39640
01.01.2015 1:45 39324
01.01.2015 2:00 39002
01.01.2015 2:15 38869
01.01.2015 2:30 38783
01.01.2015 2:45 38598
01.01.2015 3:00 38626
01.01.2015 3:15 38459
01.01.2015 3:30 38414
...
> dput(head(new3))
structure(list(Date = structure(c(16436, 16436, 16436, 16436,
16436, 16436), class = "Date"), Hour = c("0:00", "0:15", "0:30",
"0:45", "1:00", "1:15"), Dayahead = c("42955", "42412", "41901",
"41355", "40710", "40204"), Actual = c(42425L, 42021L, 42068L,
41874L, 41230L, 40810L), Difference = c("530", "391", "-167",
"-519", "-520", "-606")), .Names = c("Date", "Hour", "Dayahead",
"Actual", "Difference"), row.names = c(NA, 6L), class = "data.frame")
I've created a small data set as an example.
df <- read.csv(text = "Date,Hour,Val
2013-06-03,06:01,0
2013-06-03,12:08,-1
2013-06-03,12:48,3.3
2013-06-03,13:58,2
2013-06-03,13:01,12
2013-06-03,13:08,3
2013-06-03,14:48,4
2013-06-03,14:58,8
2013-06-03,15:01,9.2
2013-06-03,15:08,12.3
2013-06-03,16:48,0
2013-06-03,19:58,-10", stringsAsFactors = FALSE)
With group_by and summarize from dplyr and floor_date from lubridate this can be done:
library(dplyr)
library(lubridate)
df %>%
group_by(Hours=floor_date(ymd_hm(paste(Date, Hour)), "1 hour")) %>%
summarize(Val=sum(Val))
# # A tibble: 7 x 2
# Hours Val
# <dttm> <dbl>
# 1 2013-06-03 06:00:00 0
# 2 2013-06-03 12:00:00 2.30
# 3 2013-06-03 13:00:00 17.0
# 4 2013-06-03 14:00:00 12.0
# 5 2013-06-03 15:00:00 21.5
# 6 2013-06-03 16:00:00 0
# 7 2013-06-03 19:00:00 -10.0
Let's say your data frame is called df:
> head(df)
Date Hour Forecast
1 01.01.2015 12:00:00 AM 42955
2 01.01.2015 12:15:00 AM 42412
3 01.01.2015 12:30:00 AM 41901
4 01.01.2015 12:45:00 AM 41355
5 01.01.2015 01:00:00 AM 40710
6 01.01.2015 01:15:00 AM 40204
You can aggregate your forecast to an hourly basis with the following code:
library(lubridate)
library(plyr) # for ddply() and .()
library(magrittr) # for %>%
df$DateTime=paste(df$Date,df$Hour,sep=" ")%>%dmy_hms%>%floor_date(unit="hour")
result<-ddply(df,.(DateTime),summarize,x=sum(Forecast))
> result
DateTime x
1 2015-01-01 00:00:00 168623
2 2015-01-01 01:00:00 159878
3 2015-01-01 02:00:00 155252
4 2015-01-01 03:00:00 115499
The variable x holds the sum of the forecasts for every hour. The 00:00:00 timestamp aggregates the times 00:00, 00:15, 00:30 and 00:45.
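A base R version of the same hourly roll-up can be sketched as follows. It mirrors the column types in the question's dput output, where Date is a Date and Dayahead is stored as character, so the values are converted to numeric before summing. The eight rows are taken from the question's sample; treating the times as UTC and the names stamp, HourStart and hourly are assumptions.

```r
# Rows shaped like the question's dput: Date is a Date, Dayahead is character
new3 <- data.frame(
  Date = rep(as.Date("2015-01-01"), 8),
  Hour = c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15", "1:30", "1:45"),
  Dayahead = c("42955", "42412", "41901", "41355",
               "40710", "40204", "39640", "39324"),
  stringsAsFactors = FALSE
)

# Character values cannot be summed, so convert first
new3$Dayahead <- as.numeric(new3$Dayahead)

# Build a timestamp and truncate it to the start of its hour (assumption: UTC)
stamp <- as.POSIXct(paste(new3$Date, new3$Hour),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
new3$HourStart <- as.POSIXct(trunc(stamp, units = "hours"))

# Sum the four quarter-hour values within each hour
hourly <- aggregate(Dayahead ~ HourStart, data = new3, FUN = sum)
hourly
```

The two sums agree with the first two rows of the result shown above (168623 and 159878).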

Change 15 minute data to daily mean in R

I have 15 minute data that I want to change into daily mean. I just listed the Columbia data below, but there are other sites (CR1 and CR2) where I didn't list that data. I put my code at the bottom. I get an error at
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M", tz = "EST"))
Error in as.POSIXct.default(d[, 1], format = "%Y-%m-%d %H:%M", tz = "EST") :
do not know how to convert 'd[, 1]' to class “POSIXct”
I'm pretty new to R so I'm sorry if the answer is something incredibly simple and I should have caught it.
datetime Discharge Columbia
2014-01-19 22:00 6030 4.3
2014-01-19 22:15 5970 4.28
2014-01-19 22:30 5880 4.25
2014-01-19 22:45 5830 4.23
2014-01-19 23:00 5710 4.19
2014-01-19 23:15 5620 4.16
2014-01-19 23:30 5510 4.12
2014-01-19 23:45 5400 4.08
2014-01-20 00:00 5340 4.06
2014-01-20 00:15 5290 4.04
2014-01-20 00:30 5260 4.03
2014-01-20 00:45 5210 4.01
2014-01-20 01:00 5180 4
2014-01-20 01:15 4990 3.93
2014-01-20 01:30 4830 3.87
2014-01-20 01:45 4810 3.86
2014-01-20 02:00 4780 3.85
2014-01-20 02:15 4780 3.85
2014-01-20 02:30 4760 3.84
2014-01-20 02:45 4760 3.84
2014-01-20 03:00 4760 3.84
2014-01-20 03:15 4760 3.84
USGS_Columbia_Data <- read.csv("~/Desktop/R/USGS_Columbia_Data.csv",header=TRUE)
## daily averages of the data
library(xts)
d <- structure(list(datetime = (USGS_Columbia_Data[1]),
Columbia = (USGS_Columbia_Data[3]),
CR1 = (USGS_Columbia_Data[5]),
CR2 = (USGS_Columbia_Data[7])),
.Names = c("datetime", "Columbia", "CR1", "CR2"),
row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M", tz = "EST"))
apply.daily(x, colMeans)
The other answer works, apparently, but you can (and probably should) use xts for something like this. The problem is with your use of structure(...) to create the data frame. USGS_Columbia_Data is already a data frame. If you want to extract columns 1,3,5, and 7, do this:
d <- USGS_Columbia_Data[,c(1,3,5,7)]
colnames(d) <- c("datetime","Columbia","CR1","CR2")
You may not need the second line if USGS_Columbia_Data already has those column names. Having done that, you can create a date-indexed xts object as follows:
x <- xts(d[,-1], as.Date(d[,1], format="%Y-%m-%d"))
Then either of the following will work: (note I'm using the d from your example here).
apply.daily(x,mean)
# Discharge Columbia
# 2014-01-19 5743.75 4.201250
# 2014-01-20 4965.00 3.918571
aggregate(x,as.Date,mean)
# Discharge Columbia
# 2014-01-19 5743.75 4.201250
# 2014-01-20 4965.00 3.918571
If you want to leave the index as POSIXct, use this:
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M"))
apply.daily(x,mean)
# Discharge Columbia
# 2014-01-19 23:45:00 5743.75 4.201250
# 2014-01-20 03:15:00 4965.00 3.918571
But note the index is the last time on each date, not the date itself.
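To make the structure(...) problem described in this answer concrete, here is a small sketch. Indexing a data frame with a single bracket and one index, as in USGS_Columbia_Data[1], returns a one-column data frame and not a vector, so as.POSIXct() has nothing it knows how to convert. The data frame dat is a hypothetical stand-in built from three rows of the sample, and the base R daily mean at the end is only an illustration.

```r
# Hypothetical stand-in for USGS_Columbia_Data (three rows from the sample)
dat <- data.frame(
  datetime  = c("2014-01-19 22:00", "2014-01-19 22:15", "2014-01-20 00:00"),
  Discharge = c(6030, 5970, 5340),
  Columbia  = c(4.3, 4.28, 4.06),
  stringsAsFactors = FALSE
)

class(dat[1])    # "data.frame": still a list, not a vector
class(dat[, 1])  # "character": a plain vector that as.POSIXct() can parse

# Selecting columns keeps an ordinary data frame, so the conversion works
d <- dat[, c(1, 3)]
stamp <- as.POSIXct(d[, 1], format = "%Y-%m-%d %H:%M", tz = "EST")

# Daily mean in base R, grouping on the calendar date in the same time zone
daily <- aggregate(Columbia ~ day,
                   data = data.frame(day = as.Date(stamp, tz = "EST"),
                                     Columbia = d$Columbia),
                   FUN = mean)
daily
```

Passing tz to as.Date() matters here: without it the date is taken in UTC, which can shift late-evening readings onto the next day.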
You could use cut and aggregate:
# make certain datetime is class POSIXct
d$datetime <- as.POSIXct(d$datetime, tz='EST')
aggregate(list(Discharge = d$Discharge, Columbia = d$Columbia), list(time = cut(d$datetime, "1 day")), mean)
> aggregate(list(Discharge = d$Discharge, Columbia = d$Columbia), list(time = cut(d$datetime, "1 day")), mean)
time Discharge Columbia
1 2014-01-19 5743.75 4.201250
2 2014-01-20 4965.00 3.918571
