I have created a dataframe with two columns.
> head(data_frame)
Date Rainfall
1 1992-01-06 14:00:00 0.3
2 1992-01-06 15:00:00 0.2
3 1992-01-06 16:00:00 0.3
4 1992-01-06 18:00:00 0.1
5 1992-01-06 19:00:00 0.3
6 1992-01-06 20:00:00 0.8
Rainfall is numeric and Date is POSIXct.
> class(data_frame$Date)
[1] "POSIXct"
> class(data_frame$Rainfall)
[1] "numeric"
When I try to create a time series using the xts function, I get the following error:
> time_series <- xts::xts(data_frame$Rainfall, order.by = data_frame$Date)
Error in xts::xts(data_frame$Rainfall, order.by = data_frame$Date) :
order.by requires an appropriate time-based object
xts should be able to handle POSIXct. I went through a similar question posted here, where the solution was to convert date into the above format. Looking at those answers, my code should work. I can't figure out why it is not.
Reproducible example:
head_data_frame = structure(list(
Date = structure(
c(
694659600,
694663200,
694666800,
694674000,
694677600,
694681200
),
class = "POSIXct"
),
Rainfall = c(0.3,
0.2, 0.3, 0.1, 0.3, 0.8)
),
row.names = c(NA, 6L),
class = "data.frame")
The class appears to be broken; did you use a package? Normally it's c("POSIXct", "POSIXt"), but yours is just "POSIXct".
class(head_data_frame$Date)
# [1] "POSIXct"
Fix:
class(head_data_frame$Date) <- c("POSIXct", "POSIXt")
Test:
xts::xts(head_data_frame$Rainfall, order.by = head_data_frame$Date)
# [,1]
# 1992-01-06 02:00:00 0.3
# 1992-01-06 03:00:00 0.2
# 1992-01-06 04:00:00 0.3
# 1992-01-06 06:00:00 0.1
# 1992-01-06 07:00:00 0.3
# 1992-01-06 08:00:00 0.8
Works! :)
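To see why the one-line fix works, here is a minimal base-R sketch. It assumes the relevant check is whether the object inherits from "POSIXt" (that is consistent with the error above, but I have not traced the xts internals). The variable names and the UTC time zone are only for illustration.

```r
# A date-time vector whose class attribute lost "POSIXt", as in the dput above
broken <- structure(c(694659600, 694663200, 694666800), class = "POSIXct")
inherits(broken, "POSIXt")
# [1] FALSE

# Rebuild it from the underlying seconds since 1970-01-01.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
fixed <- as.POSIXct(unclass(broken), origin = "1970-01-01", tz = "UTC")
class(fixed)
# [1] "POSIXct" "POSIXt"
```

Rebuilding with as.POSIXct() also lets you set the time zone explicitly instead of relying on the session default.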
I can get it to work if I change the timezone to UTC.
head_data_frame$Date <- lubridate::force_tz(head_data_frame$Date, tzone = "UTC")
xts::xts(head_data_frame$Rainfall, order.by = head_data_frame$Date)
# [,1]
#1992-01-06 09:00:00 0.3
#1992-01-06 10:00:00 0.2
#1992-01-06 11:00:00 0.3
#1992-01-06 13:00:00 0.1
#1992-01-06 14:00:00 0.3
#1992-01-06 15:00:00 0.8
#Warning message:
#timezone of object (UTC) is different than current timezone ().
Related
I have a selection of scattered timestamp data based on requests to a particular service. This data covers approximately 3.5-4 years of requests against this service.
I am looking to turn this selection of variable-interval timestamps into a frequency-binned timeseries in R.
How would I go about converting these timestamps into a frequency-binned timeseries, such as "between 1 and 1:15PM on this day, there were 7 requests, and between 1:15 and 1:30PM there were 2, and between 1:30 and 1:45, there were 0", being sure to also have a bin where there is nothing?
The data is just a vector of timestamps from a database dump, all of the format "2014-02-17 13:10:46". Just a big ol' vector with ~2 million elements in it.
You could use tools for handling time series data from xts and zoo. Note that you will need some artificial 'data':
library(xts)
set.seed(42)
ts.index <- ISOdatetime(2018, 1, 8, 8:9, sample(60, 10), 0)
ts <- xts(rep(1, length(ts.index)), ts.index)
aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
#>
#> 2018-01-08 08:15:00 1
#> 2018-01-08 08:30:00 3
#> 2018-01-08 08:45:00 1
#> 2018-01-08 09:00:00 1
#> 2018-01-08 09:15:00 1
#> 2018-01-08 09:45:00 3
Edit: If you want to include bins without observations, you can convert to a strictly regular ts object and replace the inserted NA values with zero:
raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct")
#> zoo(coredata(x), tt)
#> 2018-01-08 08:15:00 1
#> 2018-01-08 08:30:00 3
#> 2018-01-08 08:45:00 1
#> 2018-01-08 09:00:00 1
#> 2018-01-08 09:15:00 1
#> 2018-01-08 09:30:00 0
#> 2018-01-08 09:45:00 3
Edit 2: It also works for the provided sample data:
library(xts)
data <- c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 1292561113)
class(data) = c("POSIXct", "POSIXt")
attr(data, "tzone") <- "UTC"
dput(data)
#> structure(c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510,
#> 1292561113), class = c("POSIXct", "POSIXt"), tzone = "UTC")
ts <- xts(rep(1, length(data)), data)
raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
head(as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct"))
#> zoo(coredata(x), tt)
#> 2008-12-10 15:00:00 1
#> 2008-12-10 15:15:00 0
#> 2008-12-10 15:30:00 0
#> 2008-12-10 15:45:00 0
#> 2008-12-10 16:00:00 0
#> 2008-12-10 16:15:00 0
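If you would rather not depend on xts, the same 15-minute binning (including empty bins) can be sketched in base R with findInterval() and tabulate(). The timestamps below are made up, and UTC is assumed so that bin edges fall on quarter hours.

```r
# Made-up timestamps in the format from the question
stamps <- as.POSIXct(c("2014-02-17 13:10:46", "2014-02-17 13:12:03",
                       "2014-02-17 13:20:15", "2014-02-17 13:50:00"),
                     tz = "UTC")
bin <- 900  # bin width in seconds (15 minutes)

# Bin edges from the bin holding the earliest stamp to one past the latest
first_edge <- floor(as.numeric(min(stamps)) / bin) * bin
last_edge  <- (floor(as.numeric(max(stamps)) / bin) + 1) * bin
breaks <- seq(as.POSIXct(first_edge, origin = "1970-01-01", tz = "UTC"),
              as.POSIXct(last_edge,  origin = "1970-01-01", tz = "UTC"),
              by = bin)

# findInterval() gives the bin index of each stamp;
# tabulate() counts per bin and keeps bins with zero hits
idx    <- findInterval(as.numeric(stamps), as.numeric(breaks))
counts <- tabulate(idx, nbins = length(breaks) - 1)
binned <- data.frame(bin_start = breaks[-length(breaks)], n = counts)
binned$n
# [1] 2 1 0 1
```

Both steps are vectorised, so this should scale to a couple of million timestamps without a loop.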
I have two data frames: A
y_m_d SNOW
1 2010-01-01 0.0
2 2010-01-02 0.0
3 2010-01-03 0.1
4 2010-01-04 0.0
5 2010-01-05 0.0
6 2010-01-06 2.3
B:
time temp
1 2010-01-01 00:00:00 20.00000
2 2010-01-01 01:00:00 18.33333
3 2010-01-01 02:00:00 17.00000
4 2010-01-01 03:00:00 25.33333
5 2010-01-01 04:00:00 23.33333
I want to combine the two data frames based on time. A is a daily record and B is an hourly record. I want to put each value from A at the start of its day (00:00:00) and leave the rest of the day blank.
The result should look like this:
time temp SNOW
1 2010-01-01 00:00:00 20.00000 0.0
2 2010-01-01 01:00:00 18.33333
3 2010-01-01 02:00:00 17.00000
4 2010-01-01 03:00:00 25.33333
5 2010-01-01 04:00:00 23.33333
6 2010-01-01 05:00:00 22.66667
Could you please give me some advice?
Thank you.
Here's a quick solution:
A$y_m_d <- as.Date(A$y_m_d)
B$SNOW <- sapply(as.Date(B$time), function(x) A[A$y_m_d==x, "SNOW"])
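Note that the one-liner above copies the daily value onto every hour of that day. If SNOW should appear only on the 00:00:00 row, one option is to turn each date into its midnight timestamp and look it up with match(). This is a sketch with made-up data; it assumes both columns can be expressed in the same time zone (UTC here).

```r
# Made-up data shaped like A (daily) and B (hourly)
A <- data.frame(y_m_d = c("2010-01-01", "2010-01-02"),
                SNOW  = c(0.0, 0.1),
                stringsAsFactors = FALSE)
B <- data.frame(time = as.POSIXct(c("2010-01-01 00:00:00", "2010-01-01 01:00:00",
                                    "2010-01-02 00:00:00", "2010-01-02 01:00:00"),
                                  tz = "UTC"),
                temp = c(20, 18.3, 17, 25.3))

# Midnight timestamp of each daily record
A$time <- as.POSIXct(A$y_m_d, format = "%Y-%m-%d", tz = "UTC")

# Rows of B that are not exactly midnight have no match and get NA
B$SNOW <- A$SNOW[match(B$time, A$time)]
B$SNOW
# [1] 0.0  NA 0.1  NA
```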
This might not be the most efficient way in the world to do this, but it is a solution. I attempted to create data with the exact same variable types and structure as you.
# Create example data
y_m_d <- as.POSIXct(c("2010-01-01", "2010-01-02"), format="%Y-%m-%d")
SNOW <- c(0, 0.1)
time <- as.POSIXct(c("2010-01-01 00:00:00", "2010-01-01 01:00:00", "2010-01-01 02:00:00", "2010-01-02 00:00:00", "2010-01-02 01:00:00", "2010-01-02 02:00:00"), format="%Y-%m-%d %H:%M:%S")
temp <- rnorm(6, mean=20, sd=4)
A <- data.frame(y_m_d, SNOW)
B <- data.frame(time, temp)
# Check data
A
## y_m_d SNOW
## 1 2010-01-01 0.0
## 2 2010-01-02 0.1
B
## time temp
## 1 2010-01-01 00:00:00 17.52852
## 2 2010-01-01 01:00:00 12.42715
## 3 2010-01-01 02:00:00 21.79584
## 4 2010-01-02 00:00:00 19.90442
## 5 2010-01-02 01:00:00 16.40524
## 6 2010-01-02 02:00:00 16.86854
# Loop through days and construct new SNOW variable
days <- as.POSIXct(format(B$time, "%Y-%m-%d"), format="%Y-%m-%d")
SNOW_new <- c()
for (i in seq_len(nrow(A))) {
  # Append (rather than prepend) so the values stay in the same day order as B
  SNOW_new <- c(SNOW_new, A[i, "SNOW"], rep(NA, sum(days==A[i, "y_m_d"])-1))
}
# Create new data frame
C <- data.frame(B, SNOW_new)
## time temp SNOW_new
## 1 2010-01-01 00:00:00 17.52852 0.0
## 2 2010-01-01 01:00:00 12.42715 NA
## 3 2010-01-01 02:00:00 21.79584 NA
## 4 2010-01-02 00:00:00 19.90442 0.1
## 5 2010-01-02 01:00:00 16.40524 NA
## 6 2010-01-02 02:00:00 16.86854 NA
I put NA rather than a blank space because I assume you want the SNOW_new variable to be numeric, not character. But if you do want a blank space, you can just replace the NA in the rep function with a "".
First, make sure the time variables are in the right format:
A$y_m_d <- as.POSIXct(A$y_m_d, format="%Y-%m-%d")
B$time <- as.POSIXct(B$time, format="%Y-%m-%d %H:%M:%S")
The xts package is well suited to merging time series data:
#install.packages("xts")
library(xts)
A <- xts(A[,-1], order.by = A$y_m_d)
B <- xts(B[,-1], order.by = B$time)
merge.xts(A, B)
I have a table in R like:
start duration
02/01/2012 20:00:00 5
05/01/2012 07:00:00 6
etc... etc...
I got to this by importing a table from Microsoft Excel that looked like this:
date time duration
2012/02/01 20:00:00 5
etc...
I then merged the date and time columns by running the following code:
d.f <- within(d.f, { start=format(as.POSIXct(paste(date, time)), "%m/%d/%Y %H:%M:%S") })
I want to create a third column called 'end', calculated as the start time plus the number of hours given in duration. I am pretty sure that my time is a POSIXct vector. I have seen how to manipulate one datetime object, but how can I do that for the entire column?
The expected result should look like:
start duration end
02/01/2012 20:00:00 5 02/02/2012 01:00:00
05/01/2012 07:00:00 6 05/01/2012 13:00:00
etc... etc... etc...
Using lubridate
> library(lubridate)
> df$start <- mdy_hms(df$start)
> df$end <- df$start + hours(df$duration)
> df
# start duration end
#1 2012-02-01 20:00:00 5 2012-02-02 01:00:00
#2 2012-05-01 07:00:00 6 2012-05-01 13:00:00
data
df <- structure(list(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"
), duration = 5:6), .Names = c("start", "duration"), class = "data.frame", row.names = c(NA,
-2L))
You can simply add duration*3600 (hours converted to seconds) to the start column of the data frame, provided start is POSIXct. E.g. with one date:
start = as.POSIXct("02/01/2012 20:00:00",format="%m/%d/%Y %H:%M:%S")
start
[1] "2012-02-01 20:00:00 CST"
start + 5*3600
[1] "2012-02-02 01:00:00 CST"
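The same arithmetic is vectorised, so it applies to the whole column at once. A small sketch using the two rows from the question (tz = "UTC" is an assumption to keep the result independent of the local zone):

```r
df <- data.frame(start = c("02/01/2012 20:00:00", "05/01/2012 07:00:00"),
                 duration = 5:6,
                 stringsAsFactors = FALSE)

# Parse the character column into POSIXct first
df$start <- as.POSIXct(df$start, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# POSIXct arithmetic is in seconds, so multiply the hours by 3600
df$end <- df$start + df$duration * 3600

format(df$end, "%m/%d/%Y %H:%M:%S")
# [1] "02/02/2012 01:00:00" "05/01/2012 13:00:00"
```

Use format() only for display; keep the column itself as POSIXct so further date arithmetic keeps working.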
I have a POSIXct class vector containing am hours and I want to replace the values in a data frame containing a character class column. When I do the replacement the class changes to character. I'm proceeding as follows:
class(data2014.im.t[,2])
[1] "character"
class(horas.am)
[1] "POSIXct" "POSIXt"
head(horas.am)
[1] "1970-01-01 09:00:00 COT" "1970-01-01 10:00:00 COT" "1970-01-01 11:00:00 COT" "1970-01-01 12:00:00 COT"
[5] "1970-01-01 01:00:00 COT" "1970-01-01 02:00:00 COT"
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
class(data2014.im.t[,2])
[1] "character"
head(data2014.im.t[,2])
[1] "50400" "54000" "57600" "104400" "64800" "68400"
Evidently I would like to have a POSIXct column containing hours. Any thoughts?
You should explicitly do the conversion yourself:
#sample data
horas.am <- seq(as.POSIXct("2014-01-01 05:00:00"), length.out=10, by="2 hours")
data2014.im.t <- data.frame(a=1:10, b=rep("a",10), stringsAsFactors=FALSE)
class(data2014.im.t[,2])
# [1] "character"
class(horas.am)
# [1] "POSIXct" "POSIXt"
# NO:
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
# YES
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- as.character(horas.am)
data2014.im.t
# a b
# 1 1 2014-01-01 05:00:00
# 2 2 2014-01-01 07:00:00
# 3 3 2014-01-01 09:00:00
# 4 4 2014-01-01 11:00:00
# 5 5 2014-01-01 13:00:00
# 6 6 2014-01-01 15:00:00
# 7 7 2014-01-01 17:00:00
# 8 8 2014-01-01 19:00:00
# 9 9 2014-01-01 21:00:00
# 10 10 2014-01-01 23:00:00
class(data2014.im.t[,2])
# [1] "character"
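If the goal is a POSIXct column rather than a character one, assigning into the existing character column cannot work, because the column's type wins and the date-times are coerced to strings. One alternative is to build a new POSIXct vector, fill in the replacements, and then swap the whole column. The sketch below uses made-up data and assumes the entries that are not placeholders are already "%Y-%m-%d %H:%M:%S" strings.

```r
# Made-up data: column b mixes a placeholder ("a") with a timestamp string
d <- data.frame(a = 1:3,
                b = c("a", "2014-01-01 08:00:00", "a"),
                stringsAsFactors = FALSE)
horas <- as.POSIXct(c("2014-01-01 05:00:00", "2014-01-01 07:00:00"), tz = "UTC")

# Rows that should receive the replacement values
hit <- grepl("a", d$b, fixed = TRUE)

# Parse what is already a timestamp; the placeholders become NA
parsed <- as.POSIXct(d$b, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Assigning POSIXct into POSIXct keeps the class
parsed[hit] <- horas

# Replace the whole column in one step
d$b <- parsed
class(d$b)
# [1] "POSIXct" "POSIXt"
```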
I have read in and formatted my data set as shown below.
library(xts)
#Read data from file
x <- read.csv("data.dat", header=F)
x[is.na(x)] <- c(0) #If empty fill in zero
#Construct data frames
rawdata.h <- data.frame(x[,2],x[,3],x[,4],x[,5],x[,6],x[,7],x[,8]) #Hourly data
rawdata.15min <- data.frame(x[,10]) #15 min data
#Convert time index to proper format
index.h <- as.POSIXct(strptime(x[,1], "%d.%m.%Y %H:%M"))
index.15min <- as.POSIXct(strptime(x[,9], "%d.%m.%Y %H:%M"))
#Set column names
names(rawdata.h) <- c("spot","RKup", "RKdown","RKcon","anm", "pp.stat","prod.h")
names(rawdata.15min) <- c("prod.15min")
#Convert data frames to time series objects
data.htemp <- xts(rawdata.h,order.by=index.h)
data.15mintemp <- xts(rawdata.15min,order.by=index.15min)
#Select desired subset period
data.h <- data.htemp["2013"]
data.15min <- data.15mintemp["2013"]
I want to be able to combine hourly data from data.h$prod.h with data, with 15 min resolution, from data.15min$prod.15min corresponding to the same hour.
An example would be to take the average of the hourly value at time 2013-12-01 00:00-01:00 with the last 15 minute value in that same hour, i.e. the 15 minute value from time 2013-12-01 00:45-01:00. I'm looking for a flexible way to do this with an arbitrary hour.
Any suggestions?
Edit: Just to clarify further: I want to do something like this:
N <- NROW(data.h$prod.h)
for (i in 1:N){
prod.average[i] <- mean(data.h$prod.h[i] + #INSERT CODE THAT FINDS LAST 15 MIN IN HOUR i )
}
I found a solution to my problem by converting the 15 minute data into hourly data using the very useful .index* functions from the xts package, as shown below.
prod.new <- data.15min$prod.15min[.indexmin(data.15min$prod.15min) %in% c(45:59)]
This creates a new time series with only the values occurring in the 45-59 minute interval of each hour.
For those curious my data looked like this:
Original hourly series:
> data.h$prod.h[1:4]
2013-01-01 00:00:00 19.744
2013-01-01 01:00:00 27.866
2013-01-01 02:00:00 26.227
2013-01-01 03:00:00 16.013
Original 15 minute series:
> data.15min$prod.15min[1:8]
2013-09-30 00:00:00 16.4251
2013-09-30 00:15:00 18.4495
2013-09-30 00:30:00 7.2125
2013-09-30 00:45:00 12.1913
2013-09-30 01:00:00 12.4606
2013-09-30 01:15:00 12.7299
2013-09-30 01:30:00 12.9992
2013-09-30 01:45:00 26.7522
New series with only the last 15 minutes in each hour:
> prod.new[1:4]
2013-09-30 00:45:00 12.1913
2013-09-30 01:45:00 26.7522
2013-09-30 02:45:00 5.0332
2013-09-30 03:45:00 2.6974
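To finish the original task (averaging each hourly value with the last 15 minute value of the same hour), the filtered series still has to be lined up with the hourly timestamps. Here is a base-R sketch of that step on plain data frames. The data is made up from the values shown above, and the column and object names are only illustrative.

```r
hourly <- data.frame(
  time   = as.POSIXct(c("2013-01-01 00:00:00", "2013-01-01 01:00:00"), tz = "UTC"),
  prod.h = c(19.744, 27.866))
quarter <- data.frame(
  time = seq(as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
             by = "15 min", length.out = 8),
  prod.15min = c(16.4251, 18.4495, 7.2125, 12.1913,
                 12.4606, 12.7299, 12.9992, 26.7522))

# Keep only the readings stamped at minute 45 of each hour
last_q <- quarter[format(quarter$time, "%M") == "45", ]

# Shift each HH:45 stamp back to HH:00 so it can be matched to the hourly data
last_q$hour <- last_q$time - 45 * 60

# Average each hourly value with the matching last-quarter value
idx <- match(hourly$time, last_q$hour)
prod.average <- (hourly$prod.h + last_q$prod.15min[idx]) / 2
prod.average
# [1] 15.96765 27.30910
```

Hours with no matching 15 minute reading come out as NA rather than being dropped silently.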
Short answer
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
Long answer
Since you want to aggregate the 15 minute time series to a coarser resolution (30 minutes), you should use the dplyr package or any other package that implements the "group by" concept.
For instance:
library(dplyr)
s = seq(as.POSIXct("2017-01-01"), as.POSIXct("2017-01-02"), "15 min")
df = data.frame(time = s, value=1:97)
df is a data frame with 97 rows and two columns.
head(df)
time value
1 2017-01-01 00:00:00 1
2 2017-01-01 00:15:00 2
3 2017-01-01 00:30:00 3
4 2017-01-01 00:45:00 4
5 2017-01-01 01:00:00 5
6 2017-01-01 01:15:00 6
The cut.POSIXt, group_by and summarise functions do the work:
df %>%
group_by(t = cut(time, "30 min")) %>%
summarise(v = mean(value))
t v
1 2017-01-01 00:00:00 1.5
2 2017-01-01 00:30:00 3.5
3 2017-01-01 01:00:00 5.5
4 2017-01-01 01:30:00 7.5
5 2017-01-01 02:00:00 9.5
6 2017-01-01 02:30:00 11.5
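For anyone who prefers not to load dplyr, the same grouping can be sketched in base R with cut() and aggregate(). The only change from the example data above is an explicit tz = "UTC", which is an assumption to make the result independent of the local time zone.

```r
s  <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
          as.POSIXct("2017-01-02", tz = "UTC"), by = "15 min")
df <- data.frame(time = s, value = 1:97)

# cut() assigns each timestamp to the 30-minute interval it falls in
df$bin <- cut(df$time, "30 min")

# aggregate() then averages the values within each interval
agg <- aggregate(value ~ bin, data = df, FUN = mean)
head(agg$value)
# [1]  1.5  3.5  5.5  7.5  9.5 11.5
```

The last interval holds only the final observation (97), so its mean is that single value.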
A more robust way is to convert the 15 minute values into hourly values by taking their average, and then do whatever operation you want on the result.
### 15 Minutes Data
min15 <- structure(list(V1 = structure(1:8, .Label = c("2013-01-01 00:00:00",
"2013-01-01 00:15:00", "2013-01-01 00:30:00", "2013-01-01 00:45:00",
"2013-01-01 01:00:00", "2013-01-01 01:15:00", "2013-01-01 01:30:00",
"2013-01-01 01:45:00"), class = "factor"), V2 = c(16.4251, 18.4495,
7.2125, 12.1913, 12.4606, 12.7299, 12.9992, 26.7522)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -8L))
min15
### Hourly Data
hourly <- structure(list(V1 = structure(1:4, .Label = c("2013-01-01 00:00:00",
"2013-01-01 01:00:00", "2013-01-01 02:00:00", "2013-01-01 03:00:00"
), class = "factor"), V2 = c(19.744, 27.866, 26.227, 16.013)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -4L))
hourly
### Convert 15min data into hourly data by taking average of 4 values
min15$V1 <- as.POSIXct(min15$V1,origin="1970-01-01 0:0:0")
min15 <- aggregate(. ~ cut(min15$V1,"60 min"),min15[setdiff(names(min15), "V1")],mean)
min15
names(min15) <- c("time","min15")
names(hourly) <- c("time","hourly")
### merge the corresponding values
combined <- merge(hourly,min15)
### average of hourly and 15min values
rowMeans(combined[,2:3])