I've got a dataframe with the date, hour, and minute in separate columns. I would like to plot the value column against some sort of timestamp. Is there any way to do this?
> head(fl)
date hour min value
1 2014-02-23 0 0 81
2 2014-02-23 0 1 65
3 2014-02-23 0 2 73
4 2014-02-23 0 3 81
5 2014-02-23 0 4 89
6 2014-02-23 0 5 69
...
Right now I'm using ggplot2, but it combines the minutes of every hour and day together :(
ggplot( fl, aes( min, value) ) + geom_line()
Any thoughts?
An as.POSIXct alternative that gives the same result as @RobertKrzyzanowski's answer:
fl <- data.frame(date=c('2014-02-23', '2014-02-22'), hour = c(0,0), min = c(1,2))
fl$stamp <- with(fl, as.POSIXct( paste(date,hour,min), format="%Y-%m-%d %H %M"))
#> fl
# date hour min stamp
#1 2014-02-23 0 1 2014-02-23 00:01:00
#2 2014-02-22 0 2 2014-02-22 00:02:00
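To see why a combined timestamp fixes the plot, here is a minimal sketch with made-up data (two hours of per-minute readings; the values and the `tz = "UTC"` choice are assumptions, not part of the question). The `min` column repeats every hour, while the pasted timestamp is unique per row:

```r
# Hypothetical sample: two hours of per-minute readings on one day
fl <- data.frame(
  date  = "2014-02-23",
  hour  = rep(0:1, each = 60),
  min   = rep(0:59, times = 2),
  value = seq_len(120)
)

# Paste the three columns and parse them in one step
fl$stamp <- as.POSIXct(paste(fl$date, fl$hour, fl$min),
                       format = "%Y-%m-%d %H %M", tz = "UTC")

length(unique(fl$min))    # 60: minutes collide across hours
length(unique(fl$stamp))  # 120: one timestamp per row

# With ggplot2 loaded, the plot from the question then becomes:
# ggplot(fl, aes(stamp, value)) + geom_line()
```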
Try the function ISOdatetime(Year, Month, Day, Hour, Min, Sec):
fl <- data.frame(date=c('2014-02-23', '2014-02-22'), hour = c(0,0), min = c(1,2))
zip <- function(x) do.call(Map, append(list(c), x))
args <- unname(append(zip(strsplit(as.character(fl$date), '-')), list(fl$hour, fl$min, 0)))
fl$timestamp <- do.call(ISOdatetime, args)
print(fl)
# date hour min timestamp
# 1 2014-02-23 0 1 2014-02-23 00:01:00
# 2 2014-02-22 0 2 2014-02-22 00:02:00
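If the zip/strsplit step feels heavy, the date parts can also be pulled out with format() before calling ISOdatetime. This is only a sketch of the same idea; the explicit `tz = "UTC"` is an assumption added so the result does not depend on the local time zone:

```r
fl <- data.frame(date = c("2014-02-23", "2014-02-22"),
                 hour = c(0, 0), min = c(1, 2))

# Parse the date once, then extract year, month and day as integers
d <- as.Date(fl$date)
fl$timestamp <- ISOdatetime(
  year  = as.integer(format(d, "%Y")),
  month = as.integer(format(d, "%m")),
  day   = as.integer(format(d, "%d")),
  hour  = fl$hour, min = fl$min, sec = 0,
  tz    = "UTC"
)

format(fl$timestamp, "%Y-%m-%d %H:%M")
# "2014-02-23 00:01" "2014-02-22 00:02"
```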
library(lubridate)
fl$datetime <- ymd_hm(paste(fl$date,fl$hour,fl$min,sep='-'))
ggplot(fl, aes( datetime, value) ) + geom_line()
df<- within(df[5:6], { DT=format(as.POSIXct(paste(df$date, df$time, sep = ' ')),
"%Y-%m-%d %H:%M:%S")
})
Related
I created a dataframe (Dates) of dates/times every 3 hours from 1981-2010 as follows:
# Create dates and times
start <- as.POSIXct("1981-01-01")
interval <- 60
end <- start + as.difftime(10957, units="days")
Dates = data.frame(seq(from=start, by=interval*180, to=end))
colnames(Dates) = "Date"
I now want to split the data into four separate columns with year, month, day and hour. I tried to split the dates using the following code:
Date.split = strsplit(Dates, "-| ")
But I get the following error:
Error in strsplit(Dates, "-| ") : non-character argument
If I try to convert the Dates data to characters then it completely changes the dates, e.g.
Dates.char = as.character(Dates)
gives the following output:
Dates.char Large Character (993.5 kB)
chr "c(347155200, 347166000 ...
I'm getting lost with the conversion between character and numeric and don't know where to go from here. Any insights much appreciated.
One way is to use format.
head(
setNames(
cbind(Dates,
format(Dates, "%Y"), format(Dates, "%m"), format(Dates, "%d"),
format(Dates, "%H")),
c("dates", "year", "month", "day", "hour"))
)
dates year month day hour
1 1981-01-01 00:00:00 1981 01 01 00
2 1981-01-01 03:00:00 1981 01 01 03
3 1981-01-01 06:00:00 1981 01 01 06
4 1981-01-01 09:00:00 1981 01 01 09
5 1981-01-01 12:00:00 1981 01 01 12
6 1981-01-01 15:00:00 1981 01 01 15
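The strsplit attempt from the question can also be made to work. The error arises because Dates is a data frame rather than a character vector, so a sketch along these lines formats the Date column first and splits that (a shortened six-row series and `tz = "UTC"` are assumptions made for brevity):

```r
start <- as.POSIXct("1981-01-01", tz = "UTC")
Dates <- data.frame(Date = seq(from = start, by = 3 * 3600, length.out = 6))

# strsplit() needs a character vector: format the column, not the data frame
parts <- strsplit(format(Dates$Date, "%Y-%m-%d %H"), "-| ")

# Each element is c(year, month, day, hour); stack them as integer rows
split_df <- as.data.frame(do.call(rbind, lapply(parts, as.integer)))
names(split_df) <- c("year", "month", "day", "hour")

Dates <- cbind(Dates, split_df)
Dates$hour  # 0 3 6 9 12 15
```

Converting to integer here is a design choice; drop the as.integer call to keep zero-padded strings such as "01".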
A very concise way is to decompose the POSIXlt record:
Dates = cbind(Dates, do.call(rbind, lapply(Dates$Date, as.POSIXlt)))
or
Dates <- data.frame(Dates, unclass(as.POSIXlt(Dates$Date)))
This returns some additional columns, however, which you can filter out afterwards:
# Date sec min hour mday mon year wday yday isdst zone gmtoff
# 1 1981-01-01 00:00:00 0 0 0 1 0 81 4 0 0 -03 -10800
# 2 1981-01-01 03:00:00 0 0 3 1 0 81 4 0 0 -03 -10800
# 3 1981-01-01 06:00:00 0 0 6 1 0 81 4 0 0 -03 -10800
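Note that in a POSIXlt record year counts years since 1900 and mon runs from 0 to 11, which is why the output shows 81 and 0. A small sketch of keeping only the four requested fields and shifting them to calendar values (two sample timestamps and `tz = "UTC"` are assumptions):

```r
Date <- as.POSIXct(c("1981-01-01 00:00:00", "1981-01-01 03:00:00"), tz = "UTC")
lt <- as.POSIXlt(Date, tz = "UTC")

parts <- data.frame(
  Date  = Date,
  year  = lt$year + 1900L,  # stored as years since 1900
  month = lt$mon + 1L,      # stored as 0-11
  day   = lt$mday,
  hour  = lt$hour
)
parts$year   # 1981 1981
parts$month  # 1 1
```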
I have this function to generate monthly ranges. It should account for years in which February has 28 or 29 days:
starts ends
1 2017-01-01 2017-01-31
2 2017-02-01 2017-02-28
3 2017-03-01 2017-03-31
It works with:
make_date_ranges(as.Date("2017-01-01"), Sys.Date())
But it gives an error with:
make_date_ranges(as.Date("2017-01-01"), as.Date("2019-12-31"))
Why?
make_date_ranges(as.Date("2017-01-01"), as.Date("2019-12-31"))
Error in data.frame(starts, ends) :
arguments imply differing number of rows: 38, 36
add_months <- function(date, n){
seq(date, by = paste (n, "months"), length = 2)[2]
}
make_date_ranges <- function(start, end){
starts <- seq(from = start,
to = Sys.Date()-1 ,
by = "1 month")
ends <- c((seq(from = add_months(start, 1),
to = end,
by = "1 month" ))-1,
(Sys.Date()-1))
data.frame(starts,ends)
}
## usage
make_date_ranges(as.Date("2017-01-01"), as.Date("2019-12-31"))
1) First, define start of month (som) and end of month (eom) functions which take a Date class object, a date string in standard Date format or a yearmon object, and return a Date class object giving the start or end of that month.
Using those, create a monthly Date series s using the start of each month from the month/year of from to that of to. Use pmax to ensure that the series does not extend before from and pmin so that it does not extend past to.
The input arguments can be strings in standard Date format, Date class objects or yearmon class objects. In the yearmon case it assumes the user wanted the full month for every month. (The if statement can be omitted if you don't need to support yearmon inputs.)
library(zoo)
som <- function(x) as.Date(as.yearmon(x))
eom <- function(x) as.Date(as.yearmon(x), frac = 1)
date_ranges2 <- function(from, to) {
if (inherits(to, "yearmon")) to <- eom(to)
s <- seq(som(from), eom(to), "month")
data.frame(from = pmax(as.Date(from), s), to = pmin(as.Date(to), eom(s)))
}
date_ranges2("2000-01-10", "2000-06-20")
## from to
## 1 2000-01-10 2000-01-31
## 2 2000-02-01 2000-02-29
## 3 2000-03-01 2000-03-31
## 4 2000-04-01 2000-04-30
## 5 2000-05-01 2000-05-31
## 6 2000-06-01 2000-06-20
date_ranges2(as.yearmon("2000-01"), as.yearmon("2000-06"))
## from to
## 1 2000-01-01 2000-01-31
## 2 2000-02-01 2000-02-29
## 3 2000-03-01 2000-03-31
## 4 2000-04-01 2000-04-30
## 5 2000-05-01 2000-05-31
## 6 2000-06-01 2000-06-30
2) This alternative takes the same approach but defines start of month (som) and end of month (eom) functions without using yearmon so that only base R is needed. It takes character strings in standard Date format or Date class inputs and gives the same output as (1).
som <- function(x) as.Date(cut(as.Date(x), "month")) # start of month
eom <- function(x) som(som(x) + 32) - 1 # end of month
date_ranges3 <- function(from, to) {
s <- seq(som(from), as.Date(to), "month")
data.frame(from = pmax(as.Date(from), s), to = pmin(as.Date(to), eom(s)))
}
date_ranges3("2000-01-10", "2000-06-20")
## from to
## 1 2000-01-10 2000-01-31
## 2 2000-02-01 2000-02-29
## 3 2000-03-01 2000-03-31
## 4 2000-04-01 2000-04-30
## 5 2000-05-01 2000-05-31
## 6 2000-06-01 2000-06-20
date_ranges3(som("2000-01-10"), eom("2000-06-20"))
## from to
## 1 2000-01-01 2000-01-31
## 2 2000-02-01 2000-02-29
## 3 2000-03-01 2000-03-31
## 4 2000-04-01 2000-04-30
## 5 2000-05-01 2000-05-31
## 6 2000-06-01 2000-06-30
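The base R eom relies on the fact that 32 days after the first of a month always lands in the following month, so stepping back one day from the start of that month gives the month end. A quick check of this on sample dates chosen here for illustration (a non-leap February, a leap February and a year boundary):

```r
som <- function(x) as.Date(cut(as.Date(x), "month"))  # start of month
eom <- function(x) som(som(x) + 32) - 1               # end of month

month_ends <- eom(c("2019-02-10", "2020-02-10", "2020-12-31"))
format(month_ends)
# "2019-02-28" "2020-02-29" "2020-12-31"
```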
You don't need to use seq twice -- you can subtract 1 day from the firsts of each month to get the ends, and generate one too many starts, then shift & subset:
make_date_ranges = function(start, end) {
# format(end, "%Y-%m-01") essentially truncates end to
# the first day of end's month; 32 days later is guaranteed to be
# in the subsequent month
starts = seq(from = start, to = as.Date(format(end, '%Y-%m-01')) + 32, by = 'month')
data.frame(starts = head(starts, -1L), ends = tail(starts - 1, -1L))
}
x = make_date_ranges(as.Date("2017-01-01"), as.Date("2019-12-31"))
rbind(head(x), tail(x))
# starts ends
# 1 2017-01-01 2017-01-31
# 2 2017-02-01 2017-02-28
# 3 2017-03-01 2017-03-31
# 4 2017-04-01 2017-04-30
# 5 2017-05-01 2017-05-31
# 6 2017-06-01 2017-06-30
# 31 2019-07-01 2019-07-31
# 32 2019-08-01 2019-08-31
# 33 2019-09-01 2019-09-30
# 34 2019-10-01 2019-10-31
# 35 2019-11-01 2019-11-30
# 36 2019-12-01 2019-12-31
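As a sanity check, the function above can be run on a leap year and on the range that failed in the question. This sketch repeats the definition so it runs on its own, and it assumes, as in the question, that start is the first day of a month:

```r
make_date_ranges <- function(start, end) {
  # One start too many is generated, then shifted to produce the ends
  starts <- seq(from = start,
                to = as.Date(format(end, "%Y-%m-01")) + 32,
                by = "month")
  data.frame(starts = head(starts, -1L), ends = tail(starts - 1, -1L))
}

leap <- make_date_ranges(as.Date("2020-01-01"), as.Date("2020-03-31"))
format(leap$ends)
# "2020-01-31" "2020-02-29" "2020-03-31"

full <- make_date_ranges(as.Date("2017-01-01"), as.Date("2019-12-31"))
nrow(full)  # 36, so starts and ends now have matching lengths
```

Because both columns derive from a single sequence, they cannot end up with different lengths, unlike in the original function where starts ran up to Sys.Date() - 1 regardless of end.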
I have a dataset that contains start and end time stamps, as well as a performance percentage. I'd like to calculate group statistics over hourly blocks, e.g. "the average performance for the midnight hour was x%."
My question is if there is a more efficient way to do this than a series of ifelse() statements.
# some sample data
pre.starting <- data.frame(starting = format(seq.POSIXt(from =
as.POSIXct(Sys.Date()), to = as.POSIXct(Sys.Date()+1), by = "5 min"),
"%H:%M", tz="GMT"))
pre.ending <- data.frame(ending = pre.starting[seq(1, nrow(pre.starting),
2), ])
ending2 <- pre.ending[-c(1), ]
starting2 <- data.frame(pre.starting = pre.starting[!(pre.starting$starting
%in% pre.ending$ending),])
dataset <- data.frame(starting = starting2
, ending = ending2
, perct = rnorm(nrow(starting2), 0.5, 0.2))
For example, I could create hour blocks with code along the lines of the following:
dataset2 <- dataset %>%
mutate(hour = ifelse(starting >= 00:00 & ending < 01:00, 12
, ifelse(starting >= 01:00 & ending < 02:00, 1
, ifelse(starting >= 02:00 & ending < 03:00, 13)))
) %>%
group_by(hour) %>%
summarise(mean.perct = mean(perct, na.rm=T))
Is there a way to make this code more efficient, or improve beyond ifelse()?
We can use cut to bin the ending time into hourly intervals after converting the timestamps to POSIXct, and then take the mean for each hour.
library(dplyr)
dataset %>%
mutate_at(vars(pre.starting, ending), as.POSIXct, format = "%H:%M") %>%
group_by(ending_hour = cut(ending, breaks = "1 hour")) %>%
summarise(mean.perct = mean(perct, na.rm = TRUE))
# ending_hour mean.perct
# <fct> <dbl>
# 1 2019-09-30 00:00:00 0.540
# 2 2019-09-30 01:00:00 0.450
# 3 2019-09-30 02:00:00 0.612
# 4 2019-09-30 03:00:00 0.470
# 5 2019-09-30 04:00:00 0.564
# 6 2019-09-30 05:00:00 0.437
# 7 2019-09-30 06:00:00 0.413
# 8 2019-09-30 07:00:00 0.397
# 9 2019-09-30 08:00:00 0.492
#10 2019-09-30 09:00:00 0.613
# … with 14 more rows
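The same binning can be sketched in base R without dplyr. The four rows below are made-up values picked so the hourly means are easy to verify by hand, and `tz = "UTC"` is an assumption:

```r
dataset <- data.frame(
  ending = as.POSIXct(c("2019-09-30 00:10", "2019-09-30 00:30",
                        "2019-09-30 01:10", "2019-09-30 01:50"), tz = "UTC"),
  perct  = c(0.25, 0.75, 0.5, 1)
)

# cut() assigns every timestamp to the hour it falls in
dataset$ending_hour <- cut(dataset$ending, breaks = "1 hour")

# One mean per hourly bin
hourly <- aggregate(perct ~ ending_hour, data = dataset, FUN = mean)
hourly$perct  # 0.50 0.75
```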
I have following dataframe in R.
Date Car_NO
2016-12-24 19:35:00 ABC
2016-12-24 19:55:00 DEF
2016-12-24 20:15:00 RTY
2016-12-24 20:35:00 WER
2016-12-24 21:34:00 DER
2016-12-24 00:23:00 ABC
2016-12-24 00:22:00 ERT
2016-12-24 11:45:00 RTY
2016-12-24 13:09:00 RTY
The class of the Date column is "POSIXct" "POSIXt".
I want to count car traffic per hourly slot, e.g. 12-1, 1-2, 2-3, 3-4 and so on.
My current approach is the following:
df$time <- ymd_hms(df$Date)
df$hours <- hour(df$time)
df$minutes <- minute(df$time)
df$time <- as.numeric(paste(df$hours,df$minutes,sep="."))
After this I would apply nested ifelse statements to divide it into hourly time slots, but I think that would be a long and tedious way to do it. Is there an easier approach in R?
My desired dataframe would be
Time_Slots Car_Traffic_count
00-01 2
01-02 0
02-03 0
.
.
.
19-20 2
20-21 2
21-22 1
.
.
.
Simplest would be to just use the starting hour to indicate a time interval:
# sample data
df = data.frame(time = Sys.time()+seq(1,10)*10000, runif(10) )
# summarize
library(dplyr)
library(tidyr) # complete() comes from tidyr, not dplyr
df$hour = factor(as.numeric(format(df$time,"%H")), levels = seq(0,23)) # hours run from 0 to 23
df = df %>%
group_by(hour) %>%
summarize(count=n()) %>%
complete(hour, fill = list(count = 0))
Output:
# A tibble: 24 x 2
hour count
<fctr> <dbl>
1 0 0
2 1 1
3 2 0
4 3 0
5 4 1
6 5 0
7 6 1
8 7 0
9 8 0
10 9 1
# ... with 14 more rows
You can optionally add:
df$formatted = paste0(as.character(df$hour),"-",as.numeric(as.character(df$hour))+1)
at the end to get your desired format. Hope this helps!
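For comparison, here is a base R sketch that produces the layout requested in the question directly from the sample timestamps. It swaps in table() on a 24-level factor for the dplyr pipeline; the column names are taken from the desired output and `tz = "UTC"` is an assumption:

```r
Date <- as.POSIXct(c("2016-12-24 19:35:00", "2016-12-24 19:55:00",
                     "2016-12-24 20:15:00", "2016-12-24 20:35:00",
                     "2016-12-24 21:34:00", "2016-12-24 00:23:00",
                     "2016-12-24 00:22:00", "2016-12-24 11:45:00",
                     "2016-12-24 13:09:00"), tz = "UTC")

# A factor with all 24 levels keeps empty hours in the count
hour  <- as.integer(format(Date, "%H"))
slots <- factor(hour, levels = 0:23,
                labels = sprintf("%02d-%02d", 0:23, 1:24))

counts <- as.data.frame(table(Time_Slots = slots),
                        responseName = "Car_Traffic_count")
head(counts, 3)
#   Time_Slots Car_Traffic_count
# 1      00-01                 2
# 2      01-02                 0
# 3      02-03                 0
```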
I'm trying to study times in which flow was operating at a given level. I would like to find when flows were above a given level for 4 or more hours. How would I go about doing this?
Sample code:
Date<-format(seq(as.POSIXct("2014-01-01 01:00"), as.POSIXct("2015-01-01 00:00"), by="hour"), "%Y-%m-%d %H:%M", usetz = FALSE)
Flow<-runif(8760, 0, 2300)
IsHigh<- function(x ){
if (x < 1600) return(0)
if (1600 <= x) return(1)
}
isHighFlow = unlist(lapply(Flow, IsHigh))
df = data.frame(Date, Flow, isHighFlow )
I was asked to edit my questions to supply what I would like to see as an output.
I would like to see a data frame such as the one below. The only issue is that hoursHighFlow is incorrect. I'm not sure how to fix the code to generate the correct hoursHighFlow.
temp <- df %>%
mutate(highFlowInterval = cumsum(isHighFlow==1)) %>%
group_by(highFlowInterval) %>%
summarise(hoursHighFlow = n(), minDate = min(as.character(Date)), maxDate = max(as.character(Date)))
#Then join the two tables together.
temp2<-sqldf("SELECT *
FROM temp LEFT JOIN df
ON df.Date BETWEEN temp.minDate AND temp.maxDate")
I am then able to use subset to select the periods running at a high flow rate for long enough:
t<-subset(temp2,isHighFlow==1)
t<-subset(t, hoursHighFlow>=4)
Put it in a data.table:
require(data.table)
DT <- data.table(df)
Mark runs and lengths:
DT[,`:=`(r=.GRP,rlen=.N),by={r <- rle(isHighFlow);rep(1:length(r[[1]]),r$lengths)}]
Subset to long high-flow runs (4 or more hours, as asked):
DT[isHighFlow == 1 & rlen >= 4L]
How it works:
New columns are created in the second argument of DT[i,j,by] with :=.
.GRP and .N are special variables for, respectively, the index and size of the by group.
A data.table can be subset simply with DT[i], unlike a data.frame.
Apart from subsetting, most of what works with a data.frame works the same on a data.table.
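The run-marking step does not depend on data.table itself. A minimal base R sketch of the same rle idea, on a short made-up flag vector (the variable names run_id, run_len and keep are illustrative only):

```r
isHighFlow <- c(0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1)

r <- rle(isHighFlow)
run_id  <- rep(seq_along(r$lengths), r$lengths)  # index of the run each hour belongs to
run_len <- rep(r$lengths, r$lengths)             # length of that run

# Hours that sit inside a high-flow run of 4 or more hours
keep <- isHighFlow == 1 & run_len >= 4
which(keep)  # 2 3 4 5 10 11 12 13 14
```

The two-hour high run at positions 7 and 8 is dropped, and so are the low-flow hours, whatever the length of their runs.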
Here is a solution using the dplyr package:
df %>%
mutate(interval = cumsum(isHighFlow!=lag(isHighFlow, default = 0))) %>%
group_by(interval) %>%
summarise(hoursHighFlow = n(), minDate = min(as.character(Date)), maxDate = max(as.character(Date)), isHighFlow = mean(isHighFlow)) %>%
filter(hoursHighFlow >= 4, isHighFlow == 1)
Result:
interval hoursHighFlow minDate maxDate isHighFlow
1 25 4 2014-01-03 07:00 2014-01-03 10:00 1
2 117 4 2014-01-12 01:00 2014-01-12 04:00 1
3 245 6 2014-01-23 13:00 2014-01-23 18:00 1
4 401 6 2014-02-07 03:00 2014-02-07 08:00 1
5 437 5 2014-02-11 02:00 2014-02-11 06:00 1
6 441 4 2014-02-11 21:00 2014-02-12 00:00 1
7 459 4 2014-02-13 09:00 2014-02-13 12:00 1
8 487 4 2014-02-16 03:00 2014-02-16 06:00 1
9 539 7 2014-02-21 08:00 2014-02-21 14:00 1
10 567 4 2014-02-24 11:00 2014-02-24 14:00 1
.. ... ... ... ... ...
As Frank notes, you could achieve the same result by using rle to set the intervals, replacing the mutate line with:
mutate(interval = rep(1:length(rle(df$isHighFlow)[[2]]),rle(df$isHighFlow)[[1]])) %>%
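To see why the lag comparison groups runs correctly where the question's cumsum(isHighFlow == 1) does not, here is a small base R sketch on a made-up five-hour vector (the lag is written out with head() so that no package is needed):

```r
isHighFlow <- c(0, 1, 1, 0, 1)

# Question's approach: the id increases on every high hour,
# so two consecutive high hours land in different groups
per_hour_id <- cumsum(isHighFlow == 1)
per_hour_id  # 0 1 2 2 3

# Answer's approach: the id increases only when the flag changes
previous   <- c(0, head(isHighFlow, -1))  # same as lag(isHighFlow, default = 0)
per_run_id <- cumsum(isHighFlow != previous)
per_run_id   # 0 1 1 2 3
```

With per_run_id, hours 2 and 3 share one id, so n() per group gives the true run length.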