I have the following dataframe (ts1):
D1 Diff
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01
3 20/11/2014 19:00 0.03
I would like to add a new column to ts1 that will be the difference in hours between successive rows D1 (dates) in hours.
The new ts1 should be:
D1 Diff N
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
For calculating the difference in hours independently I use:
library(lubridate)
difftime(dmy_hm("29/12/2014 11:00"), dmy_hm("29/12/2014 9:00"), units="hours")
I know that for calculating the difference between each row I need to transform the ts1 into matrix.
I use the following command:
> ts1$N<-difftime(dmy_hm(as.matrix(ts1$D1)), units="hours")
And I get:
Error in as.POSIXct(time2) : argument "time2" is missing, with no default
Suppose ts1 is as shown in Note 2 at the end. Then create a POSIXct variable tt from D1, convert tt to numeric giving the number of seconds since the Epoch, divide that by 3600 to get the number of hours since the Epoch and take differences. No packages are used.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")
m <- transform(ts1, N = c(NA, diff(as.numeric(tt) / 3600)))
giving:
> m
D1 Diff N
1 20/11/2014 16:00 0.00 NA
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
Note 1: I assume you are looking for N so that you can fill in the empty hours. In that case you don't really need N. Also, it would be easier to deal with time series if you use a time series representation. First we convert ts1 to a zoo object, then we create a zero width zoo object with the datetimes that we need and finally we merge them:
library(zoo)
z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")
z0 <- zoo(, seq(start(z), end(z), "hours"))
zz <- merge(z, z0)
giving:
> zz
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00
0.00 0.01 NA 0.03
If you really did need a data frame back then:
DF <- fortify.zoo(zz)
Note 2: Input used in reproducible form is:
Lines <- "D1,Diff
1,20/11/2014 16:00,0.00
2,20/11/2014 17:00,0.01
3,20/11/2014 19:00,0.03"
ts1 <- read.csv(text = Lines, as.is = TRUE)
Thanks to #David Arenburg and #nicola:
Can use either:
res <- diff(as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours"
Or:
res <- diff(dmy_hm(ts1$D1))
and afterwards:
ts1$N <- c(NA_real_, as.numeric(res))
Related
I have a per minute timeseries for a number of years.
I need to compute a the following value for each minute data point:
q <- (Fn-Fd)/Fn
Whereby Fn is the average F value at night time between 12-1 AM and Fd is just the minute data point.
Now obviously the Fn changes each day so one approach would be to calculate Fn perhaps using a dplyr function and i would need to create a loop of some kind or re-organise my data frame...
dummy data:
#string of dates for a one month
datetime <- seq(
from=as.POSIXct("2012-1-1 0:00:00", tz="UTC"),
to=as.POSIXct("2012-2-1 0:00:00", tz="UTC"),
by="min"
)
#variable F
F <- runif(44641, min = 0, max =2)
#dataframe
df <- as.data.frame(cbind(datetime,F))
library(lubridate)
#make sure its in "POSIXct" "POSIXt" format
df$datetime <- as_datetime(df$datetime)
Or a less elegant way might be to get Fn on its own, between the times using dplyr first - i think it will be something like this:
Fn <- df %>%
filter(between(as.numeric(format(datetime, "%H")), 0, 1)) %>%
group_by(hour=format(datetime, "%Y-%m-%d %H:")) %>%
summarise(value=mean(df$F))
But I am not sure my syntax is correct there? Am I calculating the mean F between 12 and 1 AM per day?
Then i could just print the average Fn value for each min per day to my dataframe and do the simple calculation to get Q.
Thanks in advance for advice here.
Maybe something like this ?
library(dplyr)
library(lubridate)
df %>%
group_by(Date = as.Date(datetime)) %>%
mutate(F_mean = mean(F[hour(datetime) == 0]),
value = (F_mean - F)/F_mean) %>%
ungroup() %>%
select(-F_mean, -Date)
# datetime F value
# <dttm> <dbl> <dbl>
# 1 2012-01-01 00:00:00 1.97 -0.902
# 2 2012-01-01 00:01:00 0.194 0.813
# 3 2012-01-01 00:02:00 1.52 -0.467
# 4 2012-01-01 00:03:00 1.66 -0.599
# 5 2012-01-01 00:04:00 0.765 0.262
# 6 2012-01-01 00:05:00 1.31 -0.267
# 7 2012-01-01 00:06:00 1.62 -0.565
# 8 2012-01-01 00:07:00 0.642 0.380
# 9 2012-01-01 00:08:00 1.62 -0.560
#10 2012-01-01 00:09:00 1.68 -0.621
# ... with 44,631 more rows
We first group_by every date get the mean value for 0th hour (values between 00:00 to 00:59) each day and calculate value using the formula given.
I have a very big dataframe in R, containing weather data with the following format.
valid temp
1 17/08/2014 00:20 14
2 17/08/2014 00:50 14
3 17/08/2014 01:20 13.5
4 17/08/2014 01:50 13
5 17/08/2014 02:20 12
6 17/08/2014 02:50 10
I would like to convert these sub-hourly data to hourly, like the following.
valid tmpc
1 2014-08-17 00:00:00 14
2 2014-08-17 01:00:00 13.75
3 2014-08-17 02:00:00 12.5
The class of df$valid is 'factor'. I have tried first converting them to Date through POSIXct, but it gives only NA values. I have also tried changing the system locale and still I get NAs.
We can do this in base R by converting to POSIXlt, set the minute to 0, convert it back to POSIXct and aggregate to get the mean of 'temp'
df1$valid <- strptime(df1$valid, "%d/%m/%Y %H:%M")
df1$valid$min <- 0
df1$valid <- as.POSIXct(df1$valid)
aggregate(temp~valid, df1, FUN = mean)
Option 1: The lubridate solution using ceiling_date or round_date. It's not clear according to your data frame and results if what you want is to round or ceiling. For instance, in the first row you are rounding and in the third using ceiling. Anyways here the example:
library(lubridate)
df <- data.frame(i = 1, valid= "17/08/2014 01:28", temp = 14)
df$valid <- dmy_hm(df$valid)
df$valid_round <- ceiling_date(df$valid , unit="hours")
Results:
i valid temp valid_round
1 1 2014-08-17 01:28:00 14 2014-08-17 02:00:00
Option 2: using the base functions. Use:
df$valid <- as.POSIXct(strptime(df$valid, "%d/%m/%Y %H:%M", tz ="UTC"))
and then round it.
How would you calculate time difference of two consecutive rows of timestamps in minutes and add the result to a new column.
I have tried this:
data$hours <- as.numeric(floor(difftime(timestamps(data), (timestamps(data)[1]), units="mins")))
But only get difference from time zero and onwards.
Added example data with 'mins' column that I want to be added
timestamps mins
2013-06-23 00:00:00 NA
2013-06-23 01:00:00 60
2013-06-23 02:00:00 60
2013-06-23 04:00:00 120
The code that you're using with the [1] is always referencing the first element of the timestamps vector.
To do what you want, you want to look at all but the first element minus all but the last element.
mytimes <- data.frame(timestamps=c("2013-06-23 00:00:00",
"2013-06-23 01:00:00",
"2013-06-23 02:00:00",
"2013-06-23 04:00:00"),
mins=NA)
mytimes$mins <- c(NA, difftime(mytimes$timestamps[-1],
mytimes$timestamps[-nrow(mytimes)],
units="mins"))
What this code does is:
Setup a data frame so that you will keep the length of the timestamps and mins the same.
Within that data frame, put the timestamps you have and the fact that you don't have any mins yet (i.e. NA).
Select all but the first element of timestamps mytimes$timestamps[-1]
Select all but the last element of timestamps mytimes$timestamps[-nrow(mytimes)]
Subtract them difftime (since they're well-formatted, you don't first have to make them POSIXct objects) with the units of minutes. units="mins"
Put an NA in front because you have one fewer difference than you have rows c(NA, ...)
Drop all of that back into the original data frame's mins column mytimes$mins <-
Another option is to calculate it with this approach:
# create some data for an MWE
hrs <- c(0,1,2,4)
df <- data.frame(timestamps = as.POSIXct(paste("2015-12-17",
paste(hrs, "00", "00", sep = ":"))))
df
# timestamps
# 1 2015-12-17 00:00:00
# 2 2015-12-17 01:00:00
# 3 2015-12-17 02:00:00
# 4 2015-12-17 04:00:00
# create a function that calculates the lag for n periods
lag <- function(x, n) c(rep(NA, n), x[1:(length(x) - n)])
# create a new column named mins
df$mins <- as.numeric(df$timestamps - lag(df$timestamps, 1)) / 60
df
# timestamps mins
# 1 2015-12-17 00:00:00 NA
# 2 2015-12-17 01:00:00 60
# 3 2015-12-17 02:00:00 60
# 4 2015-12-17 04:00:00 120
I have rainfall data at very irregular intervals. Every time it records 0.01 inches of rain, the data logger records the time down to seconds. A few data points look like this:
datetime <- as.POSIXct(as.character(c("2/5/15 16:28:38", "2/5/15 16:29:36", "2/5/15 16:29:41", "2/5/15 16:30:00")), format="%m/%d/%y %H:%M:%S")
value <- rep(0.01, 4)
df <- data.frame(datetime, value)
df
> datetime value
> 1 2015-02-05 16:28:38 0.01
> 2 2015-02-05 16:29:36 0.01
> 3 2015-02-05 16:29:41 0.01
> 4 2015-02-05 16:30:00 0.01
I was trying to get the hang of zoo and xts, but to no avail. My end goal is to sum the "values" by even minute, like so:
2015-02-05 16:27 0
2015-02-05 16:28 0.01
2015-02-05 16:29 0.02
2015-02-05 16:30 0.01
2015-02-05 16:31 0
Does anybody have any general guidance on this? I would be much appreciative.
Read the data frame into a zoo object and calculate the times truncated to the
minute, cumsum'ing over the values by minute. Then calculate a sequence of times from one minute before to one minute after the times in our data, remove
any times already in the data and merge it filling with zeros. If you don't really need the added zero times then stop after computing zcum:
library(zoo)
z <- read.zoo(df, tz = "")
mins <- trunc(time(z), "mins")
zcum <- ave(z, mins, FUN = cumsum)
rng <- range(mins)
tt <- seq(rng[1] - 60, rng[2] + 60, by = "mins")
tt <- tt[ ! format(tt) %in% format(mins) ]
merge(zcum, zoo(, tt), fill = 0)
giving:
2015-02-05 16:27:00 2015-02-05 16:28:38 2015-02-05 16:29:36 2015-02-05 16:29:41
0.00 0.01 0.01 0.02
2015-02-05 16:30:00 2015-02-05 16:31:00
0.01 0.00
I have time series data where the values are stored at the end of a 1-min sampling interval (i.e. data for 00:00 belongs to the interval 23:59 - 00:00 etc.).
I now would like to average in 5 min intervals giving the mean concentrations at 00:05, 00:10, etc.
What I get with the code below is the averages at 00:04, 00:09, etc., which seems to be related to the endpoints function, but I cannot figure out how to average correctly (i.e. in my case minutes 00:01 to 00:05 reported as mean at 00:05 etc.)
library(zoo)
library(xts)
t1 <- as.POSIXct("2012-1-1 0:1:0")
t2 <- as.POSIXct("2012-1-1 0:15:0")
d <- seq(t1, t2, by = "1 min")
x <- rnorm(length(d))
z <- zoo(x, d)
period.apply(z,endpoints(z,"mins",5),mean)
> 2012-01-01 00:04:00 2012-01-01 00:09:00 2012-01-01 00:14:00 2012-01-01 00:15:00
0.6864088 -0.9403631 -0.4269895 0.6728044
The endpoints function is working correctly. You need to change your index values. 00:05:00 is the beginning of the 5th minute, not the end.
> z <- zoo(x, d-1)
> period.apply(z,endpoints(z,"mins",5),mean)
2012-01-01 00:04:59 2012-01-01 00:09:59 2012-01-01 00:14:59
1.2324436 -0.5881076 0.5067009