data <- data.frame(dates = c("2014-10-28 00:01:59.526","2014-10-27 13:30:01.526"),
times = c("23:59:59","13:29:55"),
hour = c(23,13),
minute = c(59,29),
second = c(59,55))
data[,1] <- as.POSIXct(data[,1])
data[,2] <- as.factor(data[,2])
class(data[,1])
class(data[,2])
class(data[,3])
class(data[,4])
class(data[,5])
data
dates times hour minute second
1 2014-10-28 00:01:59.526 23:59:59 23 59 59
2 2014-10-27 13:30:01.526 13:29:55 13 29 55
I need to populate a new column "NewDate" with POSIXct data that combines the date and time columns, BUT if the hour column shows 23, then the date for "NewDate" should be the date from the "dates" column MINUS 1 day; otherwise it should be the date from the "dates" column.
So the final output should be:
date time hour minute second NewDate
1 2014-10-28 00:01:59.526 23:59:59 23 59 59 2014-10-27 23:59:59 #NewDate = date-1 + time
2 2014-10-27 13:30:01.526 13:29:55 13 29 55 2014-10-27 13:29:55 #NewDate = date + time
(NewDate has to be a POSIXct)
What is the best way to do this WITHOUT looping down the data frame and doing something like:
library(lubridate) #lubridate contains hour(), minute(), second()
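CorrectTIME <- function(date, hour, minute, second)
{
NewDate <- vector("numeric", length(date))
for(i in seq_along(date))
{
if(hour[i] > hour(date[i]))
{
# time belongs to the previous day: build it on the same date, then subtract one day in seconds
NewDate[i] <- ISOdatetime(year(date[i]), month(date[i]), day(date[i]), hour[i], minute[i], second[i], tz="GMT") - 86400
}else
{
NewDate[i] <- ISOdatetime(year(date[i]), month(date[i]), day(date[i]), hour[i], minute[i], second[i], tz="GMT")
}
}
# the numeric vector holds seconds since the epoch; convert it back to POSIXct and return it
as.POSIXct(NewDate, origin = "1970-01-01", tz = "GMT")
}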
with(data, paste ( format( as.Date(dates) - (hour == 23) , "%Y-%m-%d"),
paste( hour, minute, second, sep=":")))
#[1] "2014-10-27 23:59:59" "2014-10-27 13:29:55"
Oops, forgot the as.POSIXct:
as.POSIXct( with(data, paste ( format( as.Date(dates) - (hour==23) ,
"%Y-%m-%d"), paste( hour, minute, second, sep=":"))) )
[1] "2014-10-27 23:59:59 PDT" "2014-10-27 13:29:55 PDT"
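One thing to watch with the approach above: as far as I know, as.Date() on a POSIXct value defaults to UTC, so in a time zone east of UTC a timestamp just after local midnight can land on the previous calendar day. Below is a minimal sketch of the same idea with the time zone pinned explicitly. It assumes the timestamps are meant as UTC; the zone choice is my assumption, not something stated in the question.

```r
# Same sample data as in the question, with an explicit time zone (assumed UTC)
data <- data.frame(
  dates  = as.POSIXct(c("2014-10-28 00:01:59.526", "2014-10-27 13:30:01.526"), tz = "UTC"),
  hour   = c(23, 13),
  minute = c(59, 29),
  second = c(59, 55)
)

# Calendar day of each timestamp in the same zone, minus one day where hour is 23
day <- as.Date(data$dates, tz = "UTC") - (data$hour == 23)

# Glue day and clock time together and parse the result as POSIXct
data$NewDate <- as.POSIXct(
  sprintf("%s %02d:%02d:%02d", format(day),
          as.integer(data$hour), as.integer(data$minute), as.integer(data$second)),
  tz = "UTC"
)
data$NewDate
```

Using the same tz in as.POSIXct(), as.Date() and the final parse keeps the day boundary consistent regardless of the machine's locale.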
I can convert to POSIXct most of the time like for instance:
as.POSIXct( "20:16:32", format = "%H:%M:%S" )
[1] "2017-06-23 20:16:32 EDT"
But once the time goes beyond 24h, it fails:
as.POSIXct( "24:16:32", format = "%H:%M:%S" )
[1] NA
Which makes some sense, as 24:16:32 should rather be read as 00:16:32.
Such 24+ hour conventions are, however, widespread in public transportation schedules. I could of course replace all "24:" by "00:", but I am sure there is a more elegant way out.
Read the time string into a data frame dd and set next_day to 1 if the hour is 24 or more, or 0 if not. Subtract 24 from the hour if next_day is 1 and add 1 day's worth of seconds. Given that today is June 23, 2017, this would work for hours between 0 and 47.
x <- "24:16:32" # test input
dd <- read.table(text = x, sep = ":", col.names = c("hh", "mm", "ss"))
next_day <- dd$hh >= 24
s <- sprintf("%s %02d:%02d:%02d", Sys.Date(), dd$hh - 24 * next_day, dd$mm, dd$ss)
as.POSIXct(s) + next_day * 24 * 60 * 60
## "2017-06-24 00:16:32 EDT"
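A different way to get the same effect, sketched here as an alternative rather than as the answer's method: turn the clock string into a number of seconds and add it to midnight of the service day. This has no 47-hour limit. The function name parse_extended_time and its arguments are hypothetical, and UTC is assumed so that daylight-saving shifts do not interfere.

```r
# Hypothetical helper: parse "HH:MM:SS" strings where HH may be 24 or more
parse_extended_time <- function(x, service_date, tz = "UTC") {
  # split each string into hour, minute, second and convert to numbers
  parts <- matrix(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))),
                  ncol = 3, byrow = TRUE)
  secs <- parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
  # midnight of the service day plus the elapsed seconds
  as.POSIXct(paste(service_date, "00:00:00"), tz = tz) + secs
}

res <- parse_extended_time(c("20:16:32", "24:16:32", "25:00:00"), "2017-06-23")
res
```

Here "24:16:32" and "25:00:00" roll over to 2017-06-24 because the offset is simply added to the start of the day.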
I have a data frame with hour stamps and the corresponding measured temperatures. The measurements are taken continuously at random intervals. I would like to convert the hours to the respective date-times, keeping the measured temperature. My data frame looks like this (the measurements started on 20/05/2016):
Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25
I would like to create a data.frame with respective date-time and Temp like below:
Time, Temp
2016-05-20 09:25,28
2016-05-20 10:35,28.2
2016-05-20 18:25,29
2016-05-20 23:50,30
2016-05-21 01:10,31
2016-05-21 12:00,36
2016-05-22 02:00,25
I am thankful for any comments and tips on packages or functions in R that I could look at to do this. Thanks for your time.
A possible solution in base R:
df$Time <- as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',df$Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT'))
df$Time <- df$Time + cumsum(c(0,diff(df$Time)) < 0) * 86400 # 86400 = 60 * 60 * 24
which gives:
> df
Time Temp
1 2016-05-20 09:25:00 28.0
2 2016-05-20 10:35:00 28.2
3 2016-05-20 18:25:00 29.0
4 2016-05-20 23:50:00 30.0
5 2016-05-21 01:10:00 31.0
6 2016-05-21 12:00:00 36.0
7 2016-05-22 02:00:00 25.0
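To see why the cumsum(... < 0) term in the base R solution yields the right day offset, it may help to look at the intermediate values on the raw numbers alone. This is only an illustration; the variable names are mine.

```r
# The Time column from the question, as plain numbers (HH.MM)
time_of_day <- c(9.25, 10.35, 18.25, 23.50, 1.10, 12.00, 2.00)

# TRUE wherever the clock went backwards, i.e. a new day has started
went_back <- c(FALSE, diff(time_of_day) < 0)

# Running count of rollovers = number of days to add to the start date
day_offset <- cumsum(went_back)
day_offset
```

Each backwards jump bumps the count by one, and every later row keeps that count until the next jump. Note that this only works if there is at least one measurement per day; a gap of more than 24 hours cannot be detected from the clock times alone.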
An alternative with data.table (using shift to compare each time with the previous one and cumsum to count the day rollovers; rleid is not suitable here, because it would also increment on the row after each rollover):
library(data.table)
setDT(df)[, Time := as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(Time < shift(Time, fill = Time[1])) * 86400]
Or with dplyr:
library(dplyr)
df %>%
mutate(Time = as.POSIXct(strptime(paste('2016-05-20',
sprintf('%05.2f',Time)),
format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(c(0,diff(Time)) < 0)*86400)
which will both give the same result.
Used data:
df <- read.table(text='Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25', header=TRUE, sep=',')
You can use a custom date format combined with some code that detects when a new day begins (assuming the first measurement takes place earlier in the day than the last measurement of the previous day).
# starting day
start_date = "2016-05-20"
values=read.csv('values.txt', colClasses=c("character",NA))
last=c("00.00", head(values$Time, -1)) # previous row's time; "00.00" stands in before the first row
day=cumsum(values$Time<last)
Time = strptime(paste(start_date,values$Time), "%Y-%m-%d %H.%M")
Time = Time + day*86400
values$Time = Time
I have a dataframe of datetimes, like so:
library(lubridate)
date_seq <- seq.POSIXt(ymd_hm('2016-04-01 0:00'), ymd_hm('2016-04-30 23:30'), by = '30 mins')
datetimes <- data.frame(datetime = date_seq)
I've also got a dataframe containing opening times that specify a range of days over which the opening times apply and an hour range over which the store is open for the days in the date range, like so:
opening_times <- data.frame(from_date = c('2016-03-01', '2016-04-15'),
till_date = c('2016-04-15', '2016-05-20'),
from_time = c('11:00', '10:30'),
till_time = c('22:00', '23:00'))
What I would like is to mark in datetimes those rows which are inside the opening hours. That is, I want a column that is TRUE whenever the datetime in the row is within both from_date and till_date and within from_time and till_time.
If the dataset isn't too big, I'd recommend creating a new dataset from opening_times:
opening_times$from_date = as.Date(opening_times$from_date, '%Y-%m-%d')
opening_times$till_date = as.Date(opening_times$till_date, '%Y-%m-%d')
opening_times2 = do.call(
rbind,
lapply(
seq(nrow(opening_times)),
function (rownumber) {
data.frame(
date = seq.Date(
from = opening_times[rownumber,'from_date'],
to = opening_times[rownumber,'till_date'],
by = 1
),
from_time = opening_times[rownumber,'from_time'],
till_time = opening_times[rownumber,'till_time']
)
}
)
)
and then merging it with datetimes by date and checking for whether time falls between the two values.
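The merge-and-compare step could look roughly like the sketch below. It is self-contained, so it builds a small two-day stand-in for opening_times2 with a column named date (my naming, as are time and merged), and it assumes UTC timestamps. The comparison relies on zero-padded "HH:MM" strings sorting in clock order.

```r
# Half-hourly timestamps over two days (UTC assumed for the illustration)
datetimes <- data.frame(
  datetime = seq(as.POSIXct("2016-04-01 00:00", tz = "UTC"),
                 as.POSIXct("2016-04-02 23:30", tz = "UTC"), by = "30 min")
)

# Stand-in for the expanded per-day opening times table
opening_times2 <- data.frame(
  date      = as.Date(c("2016-04-01", "2016-04-02")),
  from_time = "11:00",
  till_time = "22:00"
)

# Split each timestamp into a calendar date and a zero-padded clock time
datetimes$date <- as.Date(datetimes$datetime, tz = "UTC")
datetimes$time <- format(datetimes$datetime, "%H:%M")

# Left join on date, then compare the clock time with the opening window
merged <- merge(datetimes, opening_times2, by = "date", all.x = TRUE)
merged$open <- !is.na(merged$from_time) &
  merged$time >= merged$from_time & merged$time <= merged$till_time
merged <- merged[order(merged$datetime), ]
```

If a date appears in two ranges (2016-04-15 does in the question's data), the merge produces two rows for each timestamp on that date, so the overlap would need to be resolved first.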
lubridate has a %within% function for checking whether a time is within a lubridate::interval which can make this easy once you create a vector of intervals:
# make a sequence of days in each set from opening_times
open_intervals <- apply(opening_times, 1, function(x){
dates <- seq.Date(ymd(x[1]), ymd(x[2]), by = 'day')
})
# turn each date into a lubridate::interval object with the appropriate times
open_intervals <- mapply(function(dates, from, to){
interval(ymd_hm(paste(dates, from)), ymd_hm(paste(dates, to)))
}, open_intervals, opening_times$from_time, opening_times$till_time)
# combine list items into one vector of intervals
open_intervals <- do.call(c, open_intervals)
# use lubridate::%within% to check if each datetime is in any open interval
datetimes$open <- sapply(datetimes$datetime, function(x){
any(x %within% open_intervals)
})
datetimes[20:26,]
# datetime open
# 20 2016-04-01 09:30:00 FALSE
# 21 2016-04-01 10:00:00 FALSE
# 22 2016-04-01 10:30:00 FALSE
# 23 2016-04-01 11:00:00 TRUE
# 24 2016-04-01 11:30:00 TRUE
# 25 2016-04-01 12:00:00 TRUE
# 26 2016-04-01 12:30:00 TRUE
Edit
If you have exactly two sets of hours, you can condense the whole thing into a (somewhat huge) ifelse:
datetimes$open <- ifelse(as.Date(datetimes$datetime) %within%
interval(opening_times$from_date[1],
opening_times$till_date[1]),
hm(format(datetimes$datetime, '%H:%M')) >= hm(opening_times$from_time)[1] &
hm(format(datetimes$datetime, '%H:%M')) <= hm(opening_times$till_time)[1],
hm(format(datetimes$datetime, '%H:%M')) >= hm(opening_times$from_time)[2] &
hm(format(datetimes$datetime, '%H:%M')) <= hm(opening_times$till_time)[2])
or
datetimes$open <- ifelse(as.Date(datetimes$datetime) %within%
interval(opening_times$from_date[1],
opening_times$till_date[1]),
datetimes$datetime %within%
interval(ymd_hm(paste(as.Date(datetimes$datetime), opening_times$from_time[1])),
ymd_hm(paste(as.Date(datetimes$datetime), opening_times$till_time[1]))),
datetimes$datetime %within%
interval(ymd_hm(paste(as.Date(datetimes$datetime), opening_times$from_time[2])),
ymd_hm(paste(as.Date(datetimes$datetime), opening_times$till_time[2]))))
I'm new to R, so this may very well be a simple problem, but it's causing me a lot of difficulty.
I am trying to subset one data frame to the rows that fall between two values taken from another data frame, and this is where I run into trouble. I will first describe what I've done, what is working, and then what is not working.
I have two data frames. One has a series of storm data, including dates of storm events, and the other has a series of data corresponding to discharge for many thousands of monitoring events. I am trying to see if any of the discharge data corresponds within the storm event start and end dates/times.
What I have done thus far is as follows:
Example discharge data:
X. DateTime Depth DateTime1 newcol
1 3 8/2/2013 13:15 0.038 2013-08-02 13:15:00 1375463700
2 4 8/2/2013 13:30 0.038 2013-08-02 13:30:00 1375464600
3 5 8/2/2013 13:45 0.039 2013-08-02 13:45:00 1375465500
4 6 8/2/2013 14:00 0.039 2013-08-02 14:00:00 1375466400
Example storm data:
Storm newStart newEnd
1 1 1382125500 1382130000
2 2 1385768100 1385794200
#Make a value to which the csv files are attached
CA_Storms <- read.csv(file = "CA_Storms.csv", header = TRUE, stringsAsFactors = FALSE)
CA_adj <- read.csv(file = "CA_Adj.csv", header = TRUE, stringsAsFactors = FALSE)
#strptime function (do this for all data sets)
CA_adj$DateTime1 <- strptime(CA_adj$DateTime, format = "%m/%d/%Y %H:%M")
CA_Storms$Start.time1 <- strptime(CA_Storms$Start.time, format = "%m/%d/%Y %H:%M")
CA_Storms$End.time1 <- strptime(CA_Storms$End.time, format = "%m/%d/%Y %H:%M")
#Make dates and times continuous
CA_adj$newcol <- as.numeric(CA_adj$DateTime1)
CA_Storms$newStart <- as.numeric(CA_Storms$Start.time1)
CA_Storms$newEnd <- as.numeric(CA_Storms$End.time1)
This allows me to do the following subsets successfully:
CA_adj[CA_adj$newcol == "1375463700", ]
Example output:
X. DateTime Depth DateTime1 newcol
1 3 8/2/2013 13:15 0.038 2013-08-02 13:15:00 1375463700
CA_adj[CA_adj$newcol == CA_Storms[1,19], ]
X. DateTime Depth DateTime1 newcol
7403 7408 10/18/2013 15:45 0.058 2013-10-18 15:45:00 1382125500
CA_adj[CA_adj$newcol <= CA_Storms[1,20], ]
However, whenever I try to have it move between two values, such as in:
CA_adj[CA_adj$newcol >= CA_Storms[1,19] & CA_adj$newol <= CA_Storms[1,20], ]
it responds with this:
[1] X. DateTime Depth DateTime1 newcol
<0 rows> (or 0-length row.names)
I know this output is incorrect, as, through a cursory look through my large data set, there is at least one value that falls within these criteria.
What gives?
discharge<-data.frame( x=c(3,4,5,6),
DateTime=c("8/2/2013 13:15","8/2/2013 13:30",
"8/2/2013 13:45","8/2/2013 14:00"),
Depth=c(0.038, 0.038, 0.039, 0.039)
)
discharge$DateTime1<- as.POSIXct(discharge$DateTime, format = "%m/%d/%Y %H:%M")
storm<-data.frame( storm=c(1,2),
start=c("8/2/2013 13:15","8/2/2013 16:30"),
end=c("8/2/2013 13:45","8/2/2013 16:45")
)
storm$start<- as.POSIXct(storm$start, format = "%m/%d/%Y %H:%M")
storm$end<- as.POSIXct(storm$end, format = "%m/%d/%Y %H:%M")
discharge[(discharge$DateTime1>=storm[1,2] & discharge$DateTime1<=storm[1,3]),]
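A likely cause of the zero-row result in the question itself is a typo: the second condition refers to CA_adj$newol rather than CA_adj$newcol. Asking a data frame for a column that does not exist returns NULL, the comparison on NULL has length zero, and combining that with & collapses the whole condition to length zero. The sketch below reproduces this on made-up numbers; lo and hi are stand-ins for the storm start and end values.

```r
# Small stand-in for the discharge data (values copied from the example rows)
CA_adj <- data.frame(newcol = c(1375463700, 1375464600, 1375465500, 1375466400))
lo <- 1375464600  # stand-in for the storm start
hi <- 1375465500  # stand-in for the storm end

# Misspelled column: CA_adj$newol is NULL, so the whole condition has length 0
bad <- CA_adj$newcol >= lo & CA_adj$newol <= hi
length(bad)
nrow(CA_adj[bad, , drop = FALSE])

# Correct spelling: one logical value per row
good <- CA_adj$newcol >= lo & CA_adj$newcol <= hi
CA_adj[good, , drop = FALSE]
```

R gives no warning in the misspelled case, which is why the empty result looks like a genuine "no match".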
Here is an example of a subset data in .csv files. There are three columns with no header. The first column represents the date/time and the second column is load [kw] and the third column is 1= weekday, 0 = weekends/ holiday.
9/9/2010 3:00 153.94 1
9/9/2010 3:15 148.46 1
I would like to write R code that selects the first and second columns for times from 10:00 to 20:00 on all weekdays (when the third column is 1) within the month of September, and I do not know the best and most efficient way to code this.
dt <- read.csv("file", header = F, sep=",")
#Select a column with weekday designation = 1, weekend or holiday = 0
y <- data.frame(dt[,3])
#Select a column with timestamps and loads
x <- data.frame(dt[,1:2])
t <- data.frame(dt[,1])
#convert timestamps into readable format
s <- strptime("9/1/2010 0:00", format="%m/%d/%Y %H:%M")
e <- strptime("9/30/2010 23:45", format="%m/%d/%Y %H:%M")
range <- seq(s,e, by = "min")
df <- data.frame(range)
The OP asks for the "best and efficient way to code" this without showing any "inefficient code", so @Justin is right.
It seems that the OP is new to R (and it's officially the summer of love), so I gave it a try and have a solution (not sure about efficiency...).
index <- c("9/9/2010 19:00", "9/9/2010 21:15", "10/9/2010 11:00", "3/10/2010 10:30")
index <- as.POSIXct(index, format = "%d/%m/%Y %H:%M")
set.seed(1)
Data <- data.frame(Date = index, load = rnorm(4, mean = 120, sd = 10), weeks = c(0, 1, 1, 1))
## Data
## Date load weeks
## 1 2010-09-09 19:00:00 113.74 0
## 2 2010-09-09 21:15:00 121.84 1
## 3 2010-09-10 11:00:00 111.64 1
## 4 2010-10-03 10:30:00 135.95 1
cond <- expression(format(Date, "%H:%M") < "20:00" &
format(Date, "%H:%M") > "10:00" &
weeks == 1 &
format(Date, "%m") == "09")
subset(Data, eval(cond))
## Date load weeks
## 3 2010-09-10 11:00:00 111.64 1
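Applied to the layout described in the question (three unnamed columns, month/day/year timestamps), the same filter might be sketched as follows. The rows are made up for illustration except for the first one, the default column names V1 to V3 are what read.csv assigns with header = FALSE, and the 10:00 to 20:00 bounds are treated as inclusive, which is my reading of the question.

```r
# Stand-in for the CSV contents; only the first row comes from the question
dt <- data.frame(
  V1 = c("9/9/2010 3:00", "9/9/2010 10:15", "9/11/2010 12:00", "10/1/2010 12:00"),
  V2 = c(153.94, 148.46, 150.00, 151.00),
  V3 = c(1, 1, 0, 1)
)

# Parse the month/day/year timestamps (UTC assumed)
stamp <- as.POSIXct(dt$V1, format = "%m/%d/%Y %H:%M", tz = "UTC")
hhmm  <- format(stamp, "%H:%M")  # zero-padded, so string comparison follows clock order

# Weekday flag set, month is September, time between 10:00 and 20:00 inclusive
keep <- dt$V3 == 1 & format(stamp, "%m") == "09" & hhmm >= "10:00" & hhmm <= "20:00"
result <- dt[keep, 1:2]
result
```

Only the second row survives: the first is too early in the day, the third is a weekend, and the fourth falls in October.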