Change 15-minute data to daily mean in R

I have 15-minute data that I want to convert into daily means. I've only listed the Columbia data below, but there are other sites (CR1 and CR2) whose data I haven't shown. My code is at the bottom. I get an error at this line:
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M", tz = "EST"))
Error in as.POSIXct.default(d[, 1], format = "%Y-%m-%d %H:%M", tz = "EST") :
do not know how to convert 'd[, 1]' to class “POSIXct”
I'm pretty new to R so I'm sorry if the answer is something incredibly simple and I should have caught it.
datetime Discharge Columbia
2014-01-19 22:00 6030 4.3
2014-01-19 22:15 5970 4.28
2014-01-19 22:30 5880 4.25
2014-01-19 22:45 5830 4.23
2014-01-19 23:00 5710 4.19
2014-01-19 23:15 5620 4.16
2014-01-19 23:30 5510 4.12
2014-01-19 23:45 5400 4.08
2014-01-20 00:00 5340 4.06
2014-01-20 00:15 5290 4.04
2014-01-20 00:30 5260 4.03
2014-01-20 00:45 5210 4.01
2014-01-20 01:00 5180 4
2014-01-20 01:15 4990 3.93
2014-01-20 01:30 4830 3.87
2014-01-20 01:45 4810 3.86
2014-01-20 02:00 4780 3.85
2014-01-20 02:15 4780 3.85
2014-01-20 02:30 4760 3.84
2014-01-20 02:45 4760 3.84
2014-01-20 03:00 4760 3.84
2014-01-20 03:15 4760 3.84
USGS_Columbia_Data <- read.csv("~/Desktop/R/USGS_Columbia_Data.csv",header=TRUE)
## daily averages of the data
library(xts)
d <- structure(list(datetime = (USGS_Columbia_Data[1]),
Columbia = (USGS_Columbia_Data[3]),
CR1 = (USGS_Columbia_Data[5]),
CR2 = (USGS_Columbia_Data[7])),
.Names = c("datetime", "Columbia", "CR1", "CR2"),
row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M", tz = "EST"))
apply.daily(x, colMeans)

The other answer works, but you can (and probably should) use xts for something like this. The problem is your use of structure(...) to build the data frame: single-bracket indexing like USGS_Columbia_Data[1] returns a one-column data frame rather than a vector, so d[,1] is itself a data frame and as.POSIXct() does not know how to convert it. USGS_Columbia_Data is already a data frame; if you just want columns 1, 3, 5, and 7, do this:
d <- USGS_Columbia_Data[,c(1,3,5,7)]
colnames(d) <- c("datetime","Columbia","CR1","CR2")
You may not need the second line if USGS_Columbia_Data already has those column names. Having done that, you can create a date-indexed xts object as follows:
x <- xts(d[,-1], as.Date(d[,1], format="%Y-%m-%d"))
Then either of the following will give you the daily means (note I'm using the d from your example here):
apply.daily(x,mean)
# Discharge Columbia
# 2014-01-19 5743.75 4.201250
# 2014-01-20 4965.00 3.918571
aggregate(x,as.Date,mean)
# Discharge Columbia
# 2014-01-19 5743.75 4.201250
# 2014-01-20 4965.00 3.918571
If you want to leave the index as POSIXct, use this:
x <- xts(d[,-1], as.POSIXct(d[,1], format="%Y-%m-%d %H:%M"))
apply.daily(x,mean)
# Discharge Columbia
# 2014-01-19 23:45:00 5743.75 4.201250
# 2014-01-20 03:15:00 4965.00 3.918571
But note the index is the last time on each date, not the date itself.
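If you prefer those daily means labelled by calendar date rather than by the last timestamp of each day, one option is to rebuild the xts object with a Date index. A minimal sketch (the 15-minute values here are made up for illustration):
library(xts)
# toy 15-minute series standing in for the OP's data
idx <- seq(as.POSIXct("2014-01-19 22:00", tz = "EST"), by = "15 min", length.out = 22)
x   <- xts(cbind(Discharge = rnorm(22, 5000, 300),
                 Columbia  = rnorm(22, 4, 0.2)), idx)
daily <- apply.daily(x, colMeans)       # index = last timestamp of each day
daily <- xts(coredata(daily),           # same values ...
             order.by = as.Date(index(daily), tz = "EST"))  # ... re-indexed by date
daily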

You could use cut and aggregate
# make certain datetime is class POSIXct
d$datetime <- as.POSIXct(d$datetime, tz='EST')
aggregate(list(Discharge = d$Discharge, Columbia = d$Columbia), list(time = cut(d$datetime, "1 day")), mean)
time Discharge Columbia
1 2014-01-19 5743.75 4.201250
2 2014-01-20 4965.00 3.918571
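A small variant of the same idea, if you want every non-date column averaged at once rather than listing them by name (a sketch; it assumes datetime is the first column of d, as in the original post):
# average all remaining columns by calendar day
aggregate(d[, -1], by = list(time = cut(d$datetime, "1 day")), FUN = mean)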


Can you specify what space to separate columns by?

I am working with a data set called sleep with the following columns:
head(sleep)
Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
1 1503960366 4/12/2016 12:00:00 AM 1 327 346
2 1503960366 4/13/2016 12:00:00 AM 2 384 407
3 1503960366 4/15/2016 12:00:00 AM 1 412 442
4 1503960366 4/16/2016 12:00:00 AM 2 340 367
I am trying to separate the SleepDay column into two columns named "Date" and "Time".
I used the separate function and was able to create the two columns below:
separate(weight_log, Date, into = c("Date", "Time"), sep = ' ')
Id Date Time WeightKg WeightPounds Fat BMI IsManualReport LogId
1 1503960366 5/2/2016 11:59:59 52.6 115.9631 22 22.65 True 1.462234e+12
2 1503960366 5/3/2016 11:59:59 52.6 115.9631 NA 22.65 True 1.462320e+12
3 1927972279 4/13/2016 1:08:52 133.5 294.3171 NA 47.54 False 1.460510e+12
I want to keep the AM and PM next to the times, but with the function I used they seem to disappear, I assume because I am separating on a space. Is there any way to specify that I only want to split the column into two at the first space?
Edit: the sleep data set shown at the top is different from the data set I used the separate function on (weight_log), but the issue is the same.
Use the extra = "merge" argument to separate(): everything after the first split is kept together in the last column.
library(tidyr)
data.frame(SleepDay = "4/12/2016 12:00:00 AM") %>%
separate(SleepDay, into = c("Date", "Time"), sep = " ", extra = "merge")
# Date Time
#1 4/12/2016 12:00:00 AM
If you are doing further analysis or visualization, I recommend converting the text into a datetime.
library(dplyr)
library(lubridate)
data.frame(SleepDay = "4/12/2016 12:05:00 AM") %>%
mutate(SleepDay = mdy_hms(SleepDay),
SleepDay_base = as.POSIXct(SleepDay),
date = as_date(SleepDay),
time_12 = format(SleepDay, "%I:%M %p"),
time_24 = format(SleepDay, "%H:%M"))
# SleepDay SleepDay_base date time_12 time_24
#1 2016-04-12 00:05:00 2016-04-12 00:05:00 2016-04-12 12:05 AM 00:05
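If you are on a newer tidyr (1.3 or later), the same "split only at the first space" behaviour is also available through separate_wider_delim() with too_many = "merge". A sketch:
library(tidyr)
data.frame(SleepDay = "4/12/2016 12:00:00 AM") %>%
separate_wider_delim(SleepDay, delim = " ",
                     names = c("Date", "Time"), too_many = "merge")
# a tibble with Date = "4/12/2016" and Time = "12:00:00 AM"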

aggregate data frame to typical year/week

So I have a large data frame with a date-time column of class POSIXct and another column with price data of class numeric. The date-time column has values of the form "1998-12-07 02:00:00 AEST" that are half-hour observations across 20 years. A sample data set can be generated with the following code (change the 100 to however many observations you need):
data.frame(date.time = seq.POSIXt(as.POSIXct("1998-12-07 02:00:00 AEST"), as.POSIXct(Sys.Date()+1), by = "30 min")[1:100], price = rnorm(100))
I want to look at a typical year and a typical week. For the typical year I have the following code:
mean.year <- aggregate(df$price, by = list(format(df$date.time, "%m-%d %H:%M")), mean)
It seems to give me what I want:
Group.1 x
1 01-01 00:00 31.86200
2 01-01 00:30 34.20526
3 01-01 01:00 28.40105
4 01-01 01:30 26.01684
5 01-01 02:00 23.68895
6 01-01 02:30 23.70632
However, the column "Group.1" is of class character and I would like it to be of class POSIXct. How can I do this?
For the typical week I have the following code:
mean.week <- aggregate(df$price, by = list(format(df$date.time, "%wday %H:%M")), mean)
The output is as follows:
Group.1 x
1 0day 00:00 33.05613
2 0day 00:30 30.92815
3 0day 01:00 29.26245
4 0day 01:30 29.47959
5 0day 02:00 29.18380
6 0day 02:30 25.99400
Again, the column "Group.1" is of class character and I would like POSIXct. Also, I would like the day of the week shown as "Monday", "Tuesday", etc. instead of 0day. How would I do this?
Convert the datetime to a character string that can validly be converted back to POSIXct and then do so:
mean.year <- aggregate(df["price"],
by = list(time = as.POSIXct(format(df$date.time, "2000-%m-%d %H:%M"))), mean)
head(mean.year)
## time price
## 1 2000-12-07 02:00:00 -0.56047565
## 2 2000-12-07 02:30:00 -0.23017749
## 3 2000-12-07 03:00:00 1.55870831
## 4 2000-12-07 03:30:00 0.07050839
## 5 2000-12-07 04:00:00 0.12928774
## 6 2000-12-07 04:30:00 1.71506499
To get the day of the week use %a or %A -- see ?strptime for the list of percent codes.
mean.week <- aggregate(df["price"],
by = list(time = format(df$date.time, "%a %H:%M")), mean)
head(mean.week)
## time price
## 1 Mon 02:00 -0.56047565
## 2 Mon 02:30 -0.23017749
## 3 Mon 03:00 1.55870831
## 4 Mon 03:30 0.07050839
## 5 Mon 04:00 0.12928774
## 6 Mon 04:30 1.71506499
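If you want full weekday names ("Monday", "Tuesday", ...) and want them sorted in calendar order rather than alphabetically, one option (a sketch, not part of the original answer) is to build the label with %A and turn it into a factor ordered by the ISO weekday number (%u, where 1 = Monday):
lbl <- format(df$date.time, "%A %H:%M")
ord <- order(as.integer(format(df$date.time, "%u")), format(df$date.time, "%H:%M"))
mean.week <- aggregate(df["price"],
                       by = list(time = factor(lbl, levels = unique(lbl[ord]))), mean)
head(mean.week)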
Note
The input df in reproducible form -- note that set.seed is needed to make it reproducible:
set.seed(123)
df <- data.frame(date.time = seq.POSIXt(as.POSIXct("1998-12-07 02:00:00 AEST"),
as.POSIXct(Sys.Date()+1), by = "30 min")[1:100], price = rnorm(100))

Changing quarter-hourly data into hourly data

I have data as below, covering 01.01.2015 to 31.12.2015.
The data is at a quarter-hourly (15-minute) resolution, but I want to add, for example, the 0:00, 0:15, 0:30 and 0:45 values together to make one hourly value. How can I turn this into hourly data?
Thank you in advance.
Date Hour Day-ahead Total Load Forecast [MW] - Germany (DE)
01.01.2015 0:00 42955
01.01.2015 0:15 42412
01.01.2015 0:30 41901
01.01.2015 0:45 41355
01.01.2015 1:00 40710
01.01.2015 1:15 40204
01.01.2015 1:30 39640
01.01.2015 1:45 39324
01.01.2015 2:00 39002
01.01.2015 2:15 38869
01.01.2015 2:30 38783
01.01.2015 2:45 38598
01.01.2015 3:00 38626
01.01.2015 3:15 38459
01.01.2015 3:30 38414
...
> dput(head(new3))
structure(list(Date = structure(c(16436, 16436, 16436, 16436,
16436, 16436), class = "Date"), Hour = c("0:00", "0:15", "0:30",
"0:45", "1:00", "1:15"), Dayahead = c("42955", "42412", "41901",
"41355", "40710", "40204"), Actual = c(42425L, 42021L, 42068L,
41874L, 41230L, 40810L), Difference = c("530", "391", "-167",
"-519", "-520", "-606")), .Names = c("Date", "Hour", "Dayahead",
"Actual", "Difference"), row.names = c(NA, 6L), class = "data.frame")
I've created a small example data set.
df <- read.csv(text = "Date,Hour,Val
2013-06-03,06:01,0
2013-06-03,12:08,-1
2013-06-03,12:48,3.3
2013-06-03,13:58,2
2013-06-03,13:01,12
2013-06-03,13:08,3
2013-06-03,14:48,4
2013-06-03,14:58,8
2013-06-03,15:01,9.2
2013-06-03,15:08,12.3
2013-06-03,16:48,0
2013-06-03,19:58,-10", stringsAsFactors = FALSE)
With group_by and summarize from dplyr and floor_date from lubridate this can be done:
library(dplyr)
library(lubridate)
df %>%
group_by(Hours=floor_date(ymd_hm(paste(Date, Hour)), "1 hour")) %>%
summarize(Val=sum(Val))
# # A tibble: 7 x 2
# Hours Val
# <dttm> <dbl>
# 1 2013-06-03 06:00:00 0
# 2 2013-06-03 12:00:00 2.30
# 3 2013-06-03 13:00:00 17.0
# 4 2013-06-03 14:00:00 12.0
# 5 2013-06-03 15:00:00 21.5
# 6 2013-06-03 16:00:00 0
# 7 2013-06-03 19:00:00 -10.0
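Applied to the OP's own columns (the new3 data frame from the dput above), the same approach would look roughly like this; note that Dayahead is stored as character there, so it needs as.numeric() before summing (a sketch):
library(dplyr)
library(lubridate)
new3 %>%
  group_by(DateTime = floor_date(ymd_hm(paste(Date, Hour)), "1 hour")) %>%
  summarize(Dayahead = sum(as.numeric(Dayahead)),
            Actual   = sum(Actual))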
Let's say your data frame is called df:
> head(df)
Date Hour Forecast
1 01.01.2015 12:00:00 AM 42955
2 01.01.2015 12:15:00 AM 42412
3 01.01.2015 12:30:00 AM 41901
4 01.01.2015 12:45:00 AM 41355
5 01.01.2015 01:00:00 AM 40710
6 01.01.2015 01:15:00 AM 40204
You can aggregate your forecast to an hourly basis with the following code:
library(lubridate)
library(magrittr)  # provides %>%
library(plyr)      # provides ddply
df$DateTime <- paste(df$Date, df$Hour, sep = " ") %>% dmy_hms %>% floor_date(unit = "hour")
result <- ddply(df, .(DateTime), summarize, x = sum(Forecast))
> result
DateTime x
1 2015-01-01 00:00:00 168623
2 2015-01-01 01:00:00 159878
3 2015-01-01 02:00:00 155252
4 2015-01-01 03:00:00 115499
The variable x holds the sum of the forecasts for each hour; the timestamp 00:00:00 aggregates the readings at 00:00, 00:15, 00:30 and 00:45.

create 30 min interval for time series with different start time

I have electricity sensor readings at 15-minute intervals, but the start time is not fixed. For example, on the day below the readings start at minute 13, while on another day they start at a different minute.
dateTime KW
1/1/2013 1:13 34.70
1/1/2013 1:28 43.50
1/1/2013 1:43 50.50
1/1/2013 1:58 57.50
... (here the readings start at minute 02)
1/30/2013 0:02 131736.30
1/30/2013 0:17 131744.30
1/30/2013 0:32 131751.10
1/30/2013 0:47 131759.00
I have data for one year and I need regular 30-minute intervals starting from midnight (00:00).
I am new to R -- can anyone help me?
Maybe you can try:
dT <- as.POSIXct(strptime(df$dateTime, '%m/%d/%Y %H:%M'))
# pad the vector with midnight of the first day and midnight after the last day, so the
# 30-minute breaks are anchored at 00:00; the two padding entries are dropped afterwards
grp <- as.POSIXct(cut(c(as.POSIXct(gsub(' +.*', '', min(dT))), dT,
                        as.POSIXct(gsub(' +.*', '', max(dT) + 24*3600))),
                      breaks = '30 min'))
df$grp <- grp[-c(1, length(grp))]
df
# dateTime KW grp
#1 1/1/2013 1:13 34.7 2013-01-01 01:00:00
#2 1/1/2013 1:28 43.5 2013-01-01 01:00:00
#3 1/1/2013 1:43 50.5 2013-01-01 01:30:00
#4 1/1/2013 1:58 57.5 2013-01-01 01:30:00
#5 1/30/2013 0:02 131736.3 2013-01-30 00:00:00
#6 1/30/2013 0:17 131744.3 2013-01-30 00:00:00
#7 1/30/2013 0:32 131751.1 2013-01-30 00:30:00
#8 1/30/2013 0:47 131759.0 2013-01-30 00:30:00
data
df <- structure(list(dateTime = c("1/1/2013 1:13", "1/1/2013 1:28",
"1/1/2013 1:43", "1/1/2013 1:58", "1/30/2013 0:02", "1/30/2013 0:17",
"1/30/2013 0:32", "1/30/2013 0:47"), KW = c(34.7, 43.5, 50.5,
57.5, 131736.3, 131744.3, 131751.1, 131759)), .Names = c("dateTime",
"KW"), class = "data.frame", row.names = c(NA, -8L))

Averaging a continuous measurement of meteorological parameters on R

I am quite new to R, and I am trying to find a way to average continuous data over a specific period of time.
My data is a month-long recording of several parameters with 1 s time steps.
The table, read in via read.csv, has a date column and a time column, followed by several other columns with values.
TimeStamp UTC Pitch Roll Heave(m)
05-02-13 6:45 0 0 0
05-02-13 6:46 0.75 -0.34 0.01
05-02-13 6:47 0.81 -0.32 0
05-02-13 6:48 0.79 -0.37 0
05-02-13 6:49 0.73 -0.08 -0.02
So I want to average the data over specific intervals, 20 minutes for example, such that the average for 7:00 takes all the points from 6:41 to 7:00 and returns the average of that interval, and so on for the entire dataset.
The result should look like this:
TimeStamp
05-02-13 19:00 462
05-02-13 19:20 332
05-02-13 19:40 15
05-02-13 20:00 10
05-02-13 20:20 42
Here is a reproducible dataset similar to your own.
meteorological <- data.frame(
TimeStamp = rep.int("05-02-13", 1440),
UTC = paste(
rep(formatC(0:23, width = 2, flag = "0"), each = 60),
rep(formatC(0:59, width = 2, flag = "0"), times = 24),
sep = ":"
),
Pitch = runif(1440),
Roll = rnorm(1440),
Heave = rnorm(1440)
)
The first thing that you need to do is to combine the first two columns to create a single (POSIXct) date-time column.
library(lubridate)
meteorological$DateTime <- with(
meteorological,
dmy_hm(paste(TimeStamp, UTC))
)
Then set up a sequence of break points for your different time groupings. (Use ymd_hms rather than ymd here so the breaks are POSIXct, matching the DateTime column you just created.)
breaks <- seq(ymd_hms("2013-02-05 00:00:00"), ymd_hms("2013-02-06 00:00:00"), "20 mins")
Finally, you can calculate the summary statistics for each group. There are many ways to do this. ddply from the plyr package is a good choice.
library(plyr)
ddply(
meteorological,
.(cut(DateTime, breaks)),
summarise,
MeanPitch = mean(Pitch),
MeanRoll = mean(Roll),
MeanHeave = mean(Heave)
)
Please see if something simple like this works for you:
myseq <- data.frame(time=seq(ISOdate(2014,1,1,12,0,0), ISOdate(2014,1,1,13,0,0), "5 min"))
myseq$cltime <- cut(myseq$time, "20 min", labels = F)
> myseq
time cltime
1 2014-01-01 12:00:00 1
2 2014-01-01 12:05:00 1
3 2014-01-01 12:10:00 1
4 2014-01-01 12:15:00 1
5 2014-01-01 12:20:00 2
6 2014-01-01 12:25:00 2
7 2014-01-01 12:30:00 2
8 2014-01-01 12:35:00 2
9 2014-01-01 12:40:00 3
10 2014-01-01 12:45:00 3
11 2014-01-01 12:50:00 3
12 2014-01-01 12:55:00 3
13 2014-01-01 13:00:00 4
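That only builds the 20-minute group label; to actually average the sensor columns, you can feed such a cut() label into aggregate(). A sketch, reusing the meteorological data frame and its DateTime column from the first answer:
meteorological$bin <- cut(meteorological$DateTime, "20 min")
aggregate(cbind(Pitch, Roll, Heave) ~ bin, data = meteorological, FUN = mean)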
