R: Subsetting a time series by a specific hour of the day

I have a long time series (zoo), 'obs', with a one-hour timestep and three years of data:
> head(obs)
time obs
2009-12-22 01:00:00 23.708
2009-12-22 02:00:00 23.708
2009-12-22 03:00:00 23.708
2009-12-22 04:00:00 23.708
2009-12-22 06:00:00 23.708
2009-12-22 07:00:00 23.708
I am only interested in the 01:00:00 reading of each day and want to subset the series to those readings only. Is there any way to do it? I am already using the 'xts' package but couldn't find a way.

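Try:
subset(obs, as.numeric(format(obs$time, "%H")) == 1)
This extracts the hour from obs$time in 24-hour format (00 to 23) and keeps only the rows where it equals 1. It assumes obs is a data frame with a time column; for a zoo or xts object, take the timestamps from index(obs) instead of obs$time.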

xts is the right package. The function you are interested in is
[.xts (Extract subsets of xts Objects)
For example:
obs["T01:00/T01:59"]
will return all observations whose time of day (the part after "T") is between 01:00 and 01:59, regardless of the date.
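
The hour-based filter can also be sketched in base R on a plain vector of timestamps. This is only an illustration: `stamps` is made-up hourly data, and reusing the same test on `index(obs)` to subset the asker's series as `obs[keep]` is an assumption about how obs is stored.

```r
# Made-up hourly timestamps covering three days (UTC, to avoid DST surprises)
stamps <- seq(as.POSIXct("2009-12-22 00:00:00", tz = "UTC"),
              by = "hour", length.out = 72)

# TRUE where the hour of day is 01; format() returns "00" to "23"
keep <- format(stamps, "%H") == "01"

# One reading per day, each at 01:00:00
one_am <- stamps[keep]
print(one_am)
```

Comparing the formatted hour as a string avoids the numeric conversion, and fixing the time zone keeps the hour from shifting with the session's locale.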

Related

Calculating mean and sd of bedtime (hh:mm) in R - problem with times before/after midnight

I got the following dataset:
data <- read.table(text="
wake_time sleep_time
08:38:00 23:05:00
09:30:00 00:50:00
06:45:00 22:15:00
07:27:00 23:34:00
09:00:00 23:00:00
09:05:00 00:10:00
06:40:00 23:28:00
10:00:00 23:30:00
08:10:00 00:10:00
08:07:00 00:38:00", header=T)
I used the chron package to calculate the average wake_time:
> mean(times(data$wake_time))
[1] 08:20:12
But when I do the same for the variable sleep_time, this happens:
> mean(times(data$sleep_time))
[1] 14:04:00
I guess the result is distorted because the sleep_time contains times before and after midnight.
But how can I solve this problem?
Additionally:
How can I calculate the sd of the times? I want to report it as, for example, "mean wake-up time 08:20 ± 44 min".
The times values are stored as numbers between 0 and 1 representing a fraction of a day. If the sleep time is earlier than the wake time, treat it as falling after midnight and "add a day" before taking the mean. For example:
library(chron)
wake <- times(data$wake_time)
sleep <- times(data$sleep_time)
times(mean(ifelse(sleep < wake, sleep+1, sleep)))
# [1] 23:40:00
And since the values are fractions of a day, to get the sd in minutes you multiply them by 24*60 before taking the sd:
sd(ifelse(sleep < wake, sleep+1, sleep) * 24*60)
# [1] 47.60252
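
The same "add a day" idea can be sketched without chron by working in minutes since midnight. The helper `to_minutes` and the variable names below are hypothetical, introduced only for this illustration; the data are the ten rows from the question.

```r
wake_time  <- c("08:38:00", "09:30:00", "06:45:00", "07:27:00", "09:00:00",
                "09:05:00", "06:40:00", "10:00:00", "08:10:00", "08:07:00")
sleep_time <- c("23:05:00", "00:50:00", "22:15:00", "23:34:00", "23:00:00",
                "00:10:00", "23:28:00", "23:30:00", "00:10:00", "00:38:00")

# Hypothetical helper: convert "HH:MM:SS" strings to minutes since midnight
to_minutes <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  sapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)))
}

wake_min  <- to_minutes(wake_time)
sleep_min <- to_minutes(sleep_time)

# Bedtimes earlier than the wake time are treated as after midnight (+1 day)
adj <- ifelse(sleep_min < wake_min, sleep_min + 24 * 60, sleep_min)

mean_min <- mean(adj) %% (24 * 60)  # wrap back onto the 24-hour clock
sd_min   <- sd(adj)                 # already in minutes

mean_label <- sprintf("%02d:%02d",
                      as.integer(mean_min %/% 60),
                      as.integer(round(mean_min %% 60)))
print(mean_label)
print(sd_min)
```

The comparison against the wake time is a heuristic that fits this dataset; bedtimes that are genuinely earlier in the day than the wake time would need a different rule.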

Getting weeks from timestamp which spans more than a year in R

I have a dataset containing a series of timestamps from 01/01/2015 to 01/01/2017 (dd/mm/yyyy). I want to convert them to week numbers, i.e. 01/01/2015 becomes Week 0, 08/01/2015 becomes Week 1, ..., and 01/01/2017 becomes Week 104 (or something around this number).
I tried the following method
> sD
"2016-04-13 05:30:00 IST" "2017-04-10 05:30:00 IST"
> format(as.Date(sD,format = guess_formats(sD,c('dmy'))), "%W")
"15" "15"
Here I get the same week number for the same date in different years. I need the output to change with the year as well. How do I go about this?
Just take the difference and specify the unit as weeks:
x <- as.Date(c("2015-01-01","2015-01-08","2017-01-01"))
difftime(x, as.Date("2015-01-01"), units="weeks")
#Time differences in weeks
#[1] 0.0000 1.0000 104.4286
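
If whole week numbers are wanted rather than fractional ones, one possible sketch is to count elapsed days and use integer division by 7. The names `start`, `days` and `week_no` are illustrative only.

```r
x     <- as.Date(c("2015-01-01", "2015-01-08", "2017-01-01"))
start <- as.Date("2015-01-01")

# Whole days elapsed since the start date
days <- as.integer(x - start)

# Integer division turns days into completed weeks since the start
week_no <- days %/% 7L
print(week_no)
```

Because the count is anchored to a fixed start date rather than to the calendar year, the same calendar date in different years gets a different week number.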

Associate numbers to datetime/timestamp

I have a dataframe df with a certain number of columns. One of them, ts, is timestamps:
1462147403122 1462147412990 1462147388224 1462147415651 1462147397069 1462147392497
...
1463529545634 1463529558639 1463529556798 1463529558788 1463529564627 1463529557370.
I have also at my disposal the corresponding datetime in the datetime column:
"2016-05-02 02:03:23 CEST" "2016-05-02 02:03:32 CEST" "2016-05-02 02:03:08 CEST" "2016-05-02 02:03:35 CEST" "2016-05-02 02:03:17 CEST" "2016-05-02 02:03:12 CEST"
...
"2016-05-18 01:59:05 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:16 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:24 CEST" "2016-05-18 01:59:17 CEST"
As you can see, my dataframe contains data across several days. Let's say there are 3. I would like to add a column containing the number 1, 2 or 3: 1 if the row belongs to the first day, 2 for the second day, and so on.
Thank you very much in advance,
Clement
One way to do this is to keep track of total days elapsed each time the date changes, as demonstrated below.
# Fake data
library(lubridate)  # provides tz() and tz<-
dat = data.frame(datetime = c(seq(as.POSIXct("2016-05-02 01:03:11"),
                                  as.POSIXct("2016-05-05 01:03:11"), length.out=6),
                              seq(as.POSIXct("2016-05-09 01:09:11"),
                                  as.POSIXct("2016-05-16 02:03:11"), length.out=4)))
tz(dat$datetime) = "UTC"
Note, if your datetime column is not already in a datetime format, convert it to one using as.POSIXct.
Now, create a new column with the day number, counting the first day in the sequence as day 1.
dat$day = c(1, cumsum(as.numeric(diff(as.Date(dat$datetime, tz="UTC")))) + 1)
dat
datetime day
1 2016-05-02 01:03:11 1
2 2016-05-02 15:27:11 1
3 2016-05-03 05:51:11 2
4 2016-05-03 20:15:11 2
5 2016-05-04 10:39:11 3
6 2016-05-05 01:03:11 4
7 2016-05-09 01:09:11 8
8 2016-05-11 09:27:11 10
9 2016-05-13 17:45:11 12
10 2016-05-16 02:03:11 15
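
The numbering above counts calendar days elapsed, so gaps in the data leave gaps in the day numbers. If consecutive numbers (1, 2, 3, ...) for the days that actually occur are preferred, a base R sketch could look like this. The sample timestamps are made up, and the rows are assumed to be sorted by time.

```r
# Made-up timestamps with a gap between 3 May and 5 May
datetime <- as.POSIXct(c("2016-05-02 01:03:11", "2016-05-02 15:27:11",
                         "2016-05-03 05:51:11", "2016-05-05 01:03:11",
                         "2016-05-09 01:09:11"), tz = "UTC")

# Calendar date of each row, computed in an explicit time zone
d <- as.Date(datetime, tz = "UTC")

# Days elapsed since the first date, counting the first day as 1
day_elapsed <- as.integer(d - d[1]) + 1L

# Consecutive numbering of the distinct dates that occur in the data
day_rank <- match(d, unique(d))

print(data.frame(datetime, day_elapsed, day_rank))
```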
I specified the timezone in the code above to avoid getting tripped up by potential silent shifts between my local timezone and UTC. For example, note the silent shift from my default local time zone ("America/Los_Angeles") to UTC when converting a POSIXct datetime to a date:
# Fake data
datetime = seq(as.POSIXct("2016-05-02 01:03:11"), as.POSIXct("2016-05-05 01:03:11"), length.out=6)
tz(datetime)
[1] ""
date = as.Date(datetime)
tz(date)
[1] "UTC"
data.frame(datetime, date)
datetime date
1 2016-05-02 01:03:11 2016-05-02
2 2016-05-02 15:27:11 2016-05-02
3 2016-05-03 05:51:11 2016-05-03
4 2016-05-03 20:15:11 2016-05-04 # Note day is different due to timezone shift
5 2016-05-04 10:39:11 2016-05-04
6 2016-05-05 01:03:11 2016-05-05
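
The shift can be reproduced in isolation by passing the time zone to as.Date explicitly. This is a minimal sketch with one made-up timestamp; it assumes the "America/Los_Angeles" zone is available in the system's time zone database.

```r
# 20:15 in Los Angeles on 3 May is already 03:15 on 4 May in UTC
x <- as.POSIXct("2016-05-03 20:15:11", tz = "America/Los_Angeles")

date_utc   <- as.Date(x, tz = "UTC")                  # date as seen in UTC
date_local <- as.Date(x, tz = "America/Los_Angeles")  # date as seen locally

print(date_utc)
print(date_local)
```

Stating the time zone in both the datetime and the date conversion removes the dependence on the session's default.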

Create hourly intervals without regard to day-month-year in R

I have a list of dates as this:
"2014-01-20 18:47:09 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:41 GMT"
I used this code to split the dates in four-hour intervals
data.frame(table(cut(datenormord, breaks = "4 hour")))
Results are these:
2013-07-22 06:00:00 144
2013-07-22 11:00:00 268
2013-07-22 16:00:00 331
2013-07-22 21:00:00 332
What I want is to see how many observations fall in each four-hour interval, ignoring days, months and years. For example, I would like to see how many observations there are from 00:00 to 04:00, summed over every day of every year in my dataset.
For example, I want something like this:
01:00:00 1230
06:00:00 2430
11:00:00 3230
You can strip the dates from your datetimes using strftime and then parse the result back into a datetime, which just assigns the current year, month and day to every data point. You can then cut and count as in your post.
datenormord <- c("2014-01-20 01:47:09 GMT", "2014-01-20 07:46:59 GMT",
                 "2014-01-20 13:46:59 GMT", "2014-01-20 18:46:59 GMT",
                 "2014-01-20 18:46:41 GMT")
datenormord <- strftime(as.POSIXlt(datenormord), format = "%H:%M:%S")
datenormord <- as.POSIXlt(datenormord, format = "%H:%M:%S")
result <- data.frame(table(cut(datenormord, breaks = "4 hour")))
You can remove the date in the final data frame as well:
result$Var1 <- with(result, format(strftime(Var1, format = "%H:%M")))
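
An alternative sketch, in case fixed bins starting at 00:00 are wanted, is to extract the hour of day as a number and cut that instead of the datetimes. The timestamps below are made up and deliberately span several dates; the bin labels are an illustrative choice.

```r
# Made-up timestamps on different days, months and years
stamps <- as.POSIXct(c("2014-01-20 01:47:09", "2014-01-21 07:46:59",
                       "2014-02-03 13:46:59", "2014-01-20 18:46:59",
                       "2015-06-11 18:46:41"), tz = "GMT")

# Hour of day as an integer from 0 to 23; the date part is ignored
hour <- as.integer(format(stamps, "%H"))

# Left-closed four-hour bins: [0,4), [4,8), ..., [20,24)
bins <- cut(hour, breaks = seq(0, 24, by = 4), right = FALSE,
            labels = sprintf("%02d:00", seq(0, 20, by = 4)))

counts <- table(bins)
print(counts)
```

Empty intervals appear with a count of zero because cut() keeps all factor levels.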

Reducing multi-column xts to single column xts based on provided column indexes

I have an xts object with multiple columns of the same type. I have another xts object with integers that correspond to column positions in the first object. I would like to generate a third xts object that contains one column holding, for each row, the value of the column indicated by the corresponding index. For example:
# xts.1:
2003-07-30 17:00:00 0.2015173 0.10159303 0.19244332 0.08138396
2003-08-28 17:00:00 0.1890154 0.06889412 0.12700216 0.04631253
2003-09-29 17:00:00 0.1336947 0.08023267 0.09167604 0.02376319
2003-10-30 16:00:00 0.1713496 0.13324238 0.11427968 0.05946272
# xts.2:
2003-07-30 17:00:00 1
2003-08-28 17:00:00 4
2003-09-29 17:00:00 2
2003-10-30 16:00:00 3
# Desired result:
2003-07-30 17:00:00 0.2015173
2003-08-28 17:00:00 0.04631253
2003-09-29 17:00:00 0.08023267
2003-10-30 16:00:00 0.11427968
I feel like I'm missing something very elementary about how to do this but, if so, it's escaping me at the moment.
Your data appear to be monthly, so I would strongly recommend you move from a POSIXct index to a Date or yearmon index. Otherwise you may run into issues with timezones and daylight saving time.
One way to solve this problem is to merge xts.1 and xts.2, then loop over all the rows of the resulting object, subsetting the column by the xts.2 column in the merged data.
Since your data are monthly, you can loop over all the observations with apply.monthly.
> xts.3 <- merge(xts.1,xts.2)
> apply.monthly(xts.3, function(x) x[,x$xts.2])
[,1]
2003-07-30 17:00:00 0.20151730
2003-08-28 17:00:00 0.04631253
2003-09-29 17:00:00 0.08023267
2003-10-30 16:00:00 0.11427968
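
The row-wise pick can also be sketched with base R matrix indexing, where a two-column matrix of (row, column) pairs selects one element per row. The example uses a plain matrix holding the values from the question; applying it to coredata(xts.1) and rebuilding an xts object from the result is an assumption, not something shown in the thread.

```r
# Values of xts.1 from the question, as a plain matrix (one row per date)
m <- matrix(c(0.2015173, 0.10159303, 0.19244332, 0.08138396,
              0.1890154, 0.06889412, 0.12700216, 0.04631253,
              0.1336947, 0.08023267, 0.09167604, 0.02376319,
              0.1713496, 0.13324238, 0.11427968, 0.05946272),
            nrow = 4, byrow = TRUE)

# Column positions from xts.2, one per row
idx <- c(1, 4, 2, 3)

# Each (row, column) pair selects a single element
picked <- m[cbind(seq_len(nrow(m)), idx)]
print(picked)
```

This avoids looping over rows and does not depend on the data being monthly.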
