Recode date (time) variable into a new discrete variable - R

I have a time variable: "00:00:29", "00:06:39", "20:43:15", ...
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 07:00:00 - 3
Thanks for any ideas :)

Assuming the time variables are strings as shown, this seems to work:
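# Convert one "HH:MM:SS" string to seconds since midnight
secNr <- function(x) {
  sum(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))) * c(3600, 60, 1))
}
# Return the shift for one time string (NA for the undefined 20:00-23:00 gap)
workShift <- function(x) {
  # which.max() on a logical vector gives the index of the first TRUE,
  # i.e. the latest boundary that x has reached
  n <- which.max(secNr(x) >= c(secNr("23:00:00"), secNr("20:00:00"), secNr("13:00:00"), secNr("07:00:00"), secNr("00:00:00")))
  c(3, NA, 2, 1, 3)[n]
}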
"workShift" computes the work shift of one such time string. If you have a vector of time strings, use "sapply". Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
 0:37:00  1:17:00  2:35:00  3:09:00  4:08:00  5:28:00  6:03:00  7:43:00
       3        3        3        3        3        3        3        1
 8:27:00  9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00
       1        1        1        1        1        2        2        2
16:56:00 17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
       2        2        2        2       NA       NA       NA        3
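As a possible vectorised alternative, the same mapping can be sketched with findInterval(), which avoids the sapply call. The helper names to_seconds and shift_of are made up for this illustration; the breakpoints and the NA gap between 20:00 and 23:00 are taken from the answer above.

```r
# Convert a character vector of "HH:MM:SS" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  as.vector(parts %*% c(3600, 60, 1))
}

# Map each time to its shift; lower bounds are inclusive, as in workShift().
shift_of <- function(x) {
  breaks <- c(0, 7, 13, 20, 23) * 3600   # interval start times in seconds
  codes  <- c(3, 1, 2, NA, 3)            # shift for each interval
  codes[findInterval(to_seconds(x), breaks)]
}

shift_of(c("00:00:29", "00:06:39", "20:43:15"))
# 3 3 NA
```

Because findInterval() works on the whole vector at once, this should scale better than calling workShift once per element.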

Related

Error in prepData function in package moveHMM contiguous data

I am trying to use the prepData function in the R package moveHMM. I am getting "Error in prepData(x, coordNames = c("lon", "lat")) : Each animal's obervations must be contiguous."
x is a data.frame with column names "ID", "lon", "lat". The ID column holds the name of each animal as a character, and lon/lat are numeric. There are no NA values and no missing rows.
I do not know what this error means or how to fix it. Please help.
x <- data.frame(dat$ID, dat$lon, dat$lat)
hmmgps <- prepData(x, coordNames=c("lon", "lat"))
The function prepData assumes that the rows for each track (or each animal) are grouped together in the data frame. The error message indicates that it is not the case, and that at least one track is split. For example, the following (artificial) data set would cause this error:
> data
  ID      lon       lat
1  1 54.08658 12.190313
2  1 54.20608 12.101203
3  1 54.18977 12.270896
4  2 55.79217  9.943341
5  2 55.88145  9.986028
6  2 55.91742  9.887342
7  1 54.25305 12.374541
8  1 54.28061 12.190078
This is because the track with ID "1" is split into two parts, separated by the track with ID "2".
The tracks need to be contiguous, i.e. all observations with ID "1" should come first, followed by all observations with ID "2". One possible solution would be to order the data by ID and by date.
Consider the same data set, with a "date" column:
> data
  ID      lon       lat                date
1  1 54.08658 12.190313 2019-09-06 14:20:00
2  1 54.20608 12.101203 2019-09-06 15:20:00
3  1 54.18977 12.270896 2019-09-06 16:20:00
4  2 55.79217  9.943341 2019-09-04 07:55:00
5  2 55.88145  9.986028 2019-09-04 08:55:00
6  2 55.91742  9.887342 2019-09-04 09:55:00
7  1 54.25305 12.374541 2019-09-06 17:20:00
8  1 54.28061 12.190078 2019-09-06 18:20:00
Using order(), you can define the ordered data set with:
> data_ordered <- data[with(data, order(ID, date)),]
> data_ordered
  ID      lon       lat                date
1  1 54.08658 12.190313 2019-09-06 14:20:00
2  1 54.20608 12.101203 2019-09-06 15:20:00
3  1 54.18977 12.270896 2019-09-06 16:20:00
7  1 54.25305 12.374541 2019-09-06 17:20:00
8  1 54.28061 12.190078 2019-09-06 18:20:00
4  2 55.79217  9.943341 2019-09-04 07:55:00
5  2 55.88145  9.986028 2019-09-04 08:55:00
6  2 55.91742  9.887342 2019-09-04 09:55:00
Then, the ordered data (excluding the date column) can be passed to prepData:
> hmmgps <- prepData(data_ordered[,1:3], coordNames = c("lon", "lat"))
> hmmgps
  ID     step      angle        x         y
1  1 16.32042         NA 54.08658 12.190313
2  1 18.85560  2.3133191 54.20608 12.101203
3  1 13.37296 -0.6347523 54.18977 12.270896
4  1 20.62507 -2.4551318 54.25305 12.374541
5  1       NA         NA 54.28061 12.190078
6  2 10.86906         NA 55.79217  9.943341
7  2 11.60618 -1.6734604 55.88145  9.986028
8  2       NA         NA 55.91742  9.887342
I hope that this helps.
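To see in advance whether an ID column is contiguous in the sense described above, a small base R check along these lines might help. The function name is_contiguous is hypothetical and is not part of moveHMM; it only uses rle() to look for an ID whose rows appear in more than one run.

```r
# TRUE if every ID occupies a single unbroken block of rows.
is_contiguous <- function(id) {
  run_ids <- rle(as.character(id))$values  # one entry per run of equal IDs
  !any(duplicated(run_ids))                # a repeated entry means a split track
}

id <- c(1, 1, 1, 2, 2, 2, 1, 1)  # ID column of the artificial data set above
is_contiguous(id)        # FALSE: track 1 is split by track 2
is_contiguous(sort(id))  # TRUE once the rows are grouped by ID
```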

Select value from time range dataframe in R

I have a dataframe of datetimes
tdata_df <- data.frame(timestamp=seq(c(ISOdate(2018,4,20)), by = (60*229), length.out = 6))
tdata_df
timestamp
1 2018-04-20 21:00:00
2 2018-04-21 00:49:00
3 2018-04-21 04:38:00
4 2018-04-21 08:27:00
5 2018-04-21 12:16:00
6 2018-04-21 16:05:00
Then I would like to look up a value for each timestamp from this time range table:
time_range_df <- data.frame(start=c("08:30","11:35","15:10","05:00"),
                            end=c("11:29","15:09","02:29","08:29"),value=c(1,2,3,4))
time_range_df
start end value
1 08:30 11:29 1
2 11:35 15:09 2
3 15:10 02:29 3
4 05:00 08:29 4
so that the result looks like this:
timestamp value
1 2018-04-20 21:00:00 3
2 2018-04-21 00:49:00 3
3 2018-04-21 04:38:00 NA
4 2018-04-21 08:27:00 4
5 2018-04-21 12:16:00 2
6 2018-04-21 16:05:00 3
Any help would be greatly appreciated.
The sqldf package provides greater flexibility for joins in such cases. The approach is:
1. Convert the times in time_range_df to offsets from midnight.
2. Add a column to tdata_df that holds the time elapsed since midnight.
3. Join both data frames where the time since midnight falls inside a range.
library(lubridate)
# Convert the "HH:MM" range limits to seconds since midnight
time_range_df$start <- as.numeric(seconds(hm(time_range_df$start)))
time_range_df$end <- as.numeric(seconds(hm(time_range_df$end)))
# Seconds since midnight for each timestamp
tdata_df$timeSinceMidnight <- as.numeric(seconds(hms(format(ymd_hms(tdata_df$timestamp),
                                                             format = "%H:%M:%S"))))
library(sqldf)
# Ranges with start > end wrap past midnight, so they are matched in two pieces
sqlquery <- "SELECT D1.timestamp, Q.value FROM tdata_df D1
  LEFT JOIN (SELECT * FROM tdata_df D, time_range_df R
    WHERE (R.start < R.end AND D.timeSinceMidnight between R.start AND R.end) OR
      (R.start > R.end AND D.timeSinceMidnight between R.start AND 86400) OR
      (R.start > R.end AND D.timeSinceMidnight between 0 and R.end)) Q
  ON D1.timestamp = Q.timestamp"
sqldf(sqlquery)
# timestamp value
# 1 2018-04-20 13:00:00 2
# 2 2018-04-20 16:49:00 3
# 3 2018-04-20 20:38:00 3
# 4 2018-04-21 00:27:00 3
# 5 2018-04-21 04:16:00 NA
# 6 2018-04-21 08:05:00 4
Data:
tdata_df <- data.frame(timestamp=seq(c(ISOdate(2018,4,20)), by = (60*229), length.out = 6))
time_range_df <- data.frame(start=c("08:30","11:35","15:10","05:00"),
end=c("11:29","15:09","02:29","08:29"),value=c(1,2,3,4))
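For comparison, the wrap-around logic can also be sketched in base R without sqldf or lubridate. This is only an illustration under some assumptions: the helper names to_minutes and lookup_value are invented here, range ends are treated as inclusive to the minute, the first matching range wins, and the timestamps are built explicitly in UTC so that they show the clock times from the question.

```r
# Minutes since midnight for "HH:MM" strings (as.character guards against factors).
to_minutes <- function(hhmm) {
  parts <- do.call(rbind, strsplit(as.character(hhmm), ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 60 + as.numeric(parts[, 2])
}

# Look up the value whose [start, end] range contains each timestamp's clock time.
lookup_value <- function(timestamp, ranges) {
  m <- as.integer(format(timestamp, "%H")) * 60L + as.integer(format(timestamp, "%M"))
  s <- to_minutes(ranges$start)
  e <- to_minutes(ranges$end)
  vapply(m, function(mi) {
    # A range with start > end wraps past midnight.
    hit <- ifelse(s <= e, mi >= s & mi <= e, mi >= s | mi <= e)
    if (any(hit)) ranges$value[which(hit)[1]] else NA_real_
  }, numeric(1))
}

time_range_df <- data.frame(start = c("08:30", "11:35", "15:10", "05:00"),
                            end   = c("11:29", "15:09", "02:29", "08:29"),
                            value = c(1, 2, 3, 4))
ts <- as.POSIXct("2018-04-20 21:00:00", tz = "UTC") + 60 * 229 * (0:5)
lookup_value(ts, time_range_df)
# 3 3 NA 4 2 3
```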

Using a rolling time interval to count rows in R and dplyr

Let's say I have a dataframe of timestamps with the corresponding number of tickets sold at that time.
             Timestamp ticket_count
                (time)        (int)
1  2016-01-01 05:30:00            1
2  2016-01-01 05:32:00            1
3  2016-01-01 05:38:00            1
4  2016-01-01 05:46:00            1
5  2016-01-01 05:47:00            1
6  2016-01-01 06:07:00            1
7  2016-01-01 06:13:00            2
8  2016-01-01 06:21:00            1
9  2016-01-01 06:22:00            1
10 2016-01-01 06:25:00            1
I want to know how to calculate, for each ticket, the number of tickets sold within a certain time frame after it. For example, I want to calculate the number of tickets sold in the 15 minutes following each ticket (including that ticket). In this case, the first row would have three tickets, the second row would have four tickets, etc.
Ideally, I'm looking for a dplyr solution, as I want to do this for multiple stores with a group_by() function. However, I'm having a little trouble figuring out how to hold each Timestamp fixed for a given row while simultaneously searching through all Timestamps via dplyr syntax.
In the current development version of data.table, v1.9.7, non-equi joins are implemented. Assuming your data.frame is called df and the Timestamp column is POSIXct type:
require(data.table) # v1.9.7+
window = 15L # minutes
(counts = setDT(df)[.(t=Timestamp+window*60L), on=.(Timestamp<t),
.(counts=sum(ticket_count)), by=.EACHI]$counts)
# [1] 3 4 5 5 5 9 11 11 11 11
# add that as a column to original data.table by reference
df[, counts := counts]
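For each row in t, all rows of df where Timestamp < t are fetched, and by=.EACHI instructs the expression sum(ticket_count) to run for each row in t. Note that the join condition only has an upper bound, so each count also includes every earlier row (the last rows show 11, the total number of tickets). For a strict 15-minute window, a lower bound on Timestamp would have to be added to the on= condition as well.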
Hope this helps.
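For reference, a window bounded on both sides, as described in the question, can be sketched in plain base R. This is an illustrative O(n^2) version rather than a tuned one; the function name rolling_count is hypothetical, and the window is assumed to include both the row's own timestamp and the timestamp exactly 15 minutes later.

```r
# For each row, sum the counts with time in [t_i, t_i + window_secs].
rolling_count <- function(time, count, window_secs = 15 * 60) {
  t <- as.numeric(time)  # seconds since the epoch
  vapply(seq_along(t), function(i) {
    sum(count[t >= t[i] & t <= t[i] + window_secs])
  }, numeric(1))
}

# The sample data from the question (time zone fixed to UTC for reproducibility).
df <- data.frame(
  Timestamp = as.POSIXct(paste("2016-01-01",
    c("05:30:00", "05:32:00", "05:38:00", "05:46:00", "05:47:00",
      "06:07:00", "06:13:00", "06:21:00", "06:22:00", "06:25:00")), tz = "UTC"),
  ticket_count = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1)
)
rolling_count(df$Timestamp, df$ticket_count)
# 3 4 3 2 1 5 5 3 2 1
```

Since the function takes plain vectors, it could presumably be called inside dplyr's mutate() after group_by(store) to get per-store counts.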
This is a simpler version of the ugly one I wrote earlier. Note that it sums tickets in fixed 15-minute bins rather than in a window that moves with each row.
# install.packages('dplyr')
library(dplyr)
your_data %>%
  mutate(timestamp = as.POSIXct(timestamp, format = '%m/%d/%Y %H:%M'),
         ticket_count = as.numeric(ticket_count)) %>%
  mutate(window = cut(timestamp, '15 min')) %>%
  group_by(window) %>%
  dplyr::summarise(tickets = sum(ticket_count))
window tickets
(fctr) (dbl)
1 2016-01-01 05:30:00 3
2 2016-01-01 05:45:00 2
3 2016-01-01 06:00:00 3
4 2016-01-01 06:15:00 3
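Here is a solution using data.table that also handles different stores.
Example data:
library(data.table)
dt <- data.table(Timestamp = as.POSIXct("2016-01-01 05:30:00") + seq(60, 120000, by = 60),
                 ticket_count = sample(1:9, 2000, replace = TRUE),
                 store = rep(c("A", "B", "C", "D"), 500))
Now apply the following:
ts <- dt$Timestamp
# for() drops the POSIXct class, so x is a plain number of seconds
for(x in ts) {
  end <- x + 900  # 15 minutes later
  # per store, sum the tickets sold in [x, x + 15 min]; later iterations
  # overwrite earlier ones, so each row keeps the sum for its own window
  dt[Timestamp <= end & Timestamp >= x, CS := sum(ticket_count), by = store]
}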
This gives you
Timestamp ticket_count store CS
1: 2016-01-01 05:31:00 3 A 13
2: 2016-01-01 05:32:00 5 B 20
3: 2016-01-01 05:33:00 3 C 19
4: 2016-01-01 05:34:00 7 D 12
5: 2016-01-01 05:35:00 1 A 15
---
1996: 2016-01-02 14:46:00 4 D 10
1997: 2016-01-02 14:47:00 9 A 9
1998: 2016-01-02 14:48:00 2 B 2
1999: 2016-01-02 14:49:00 2 C 2
2000: 2016-01-02 14:50:00 6 D 6

Alter values in one data frame based on comparison values in another in R

I am trying to subtract one hour from date/times within a POSIXct column that are earlier than or equal to a time stated in a different comparison dataframe for that particular ID.
For example:
#create sample data
Time<-as.POSIXct(c("2015-10-02 08:00:00","2015-11-02 11:00:00","2015-10-11 10:00:00","2015-11-11 09:00:00","2015-10-24 08:00:00","2015-10-27 08:00:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,01,02,02,03,03)
data<-data.frame(Time,ID)
Which produces this:
Time ID
1 2015-10-02 08:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 10:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 08:00:00 3
6 2015-10-27 08:00:00 3
I then have another dataframe with a key date and time for each ID to compare against. The Time in data should be compared against Comparison in ComparisonData for the particular ID it is associated with. If the Time value in data is earlier than or equal to the comparison value one hour should be subtracted from the value in data:
#create sample comparison data
Comparison<-as.POSIXct(c("2015-10-29 08:00:00","2015-11-02 08:00:00","2015-10-26 08:30:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,02,03)
ComparisonData<-data.frame(Comparison,ID)
This should look like this:
Comparison ID
1 2015-10-29 08:00:00 1
2 2015-11-02 08:00:00 2
3 2015-10-26 08:30:00 3
In summary, the code should check all times of a certain ID to see if any are earlier than or equal to the value specified in ComparisonData and if they are, subtract one hour. This should give this data frame as an output:
Time ID
1 2015-10-02 07:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 09:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 07:00:00 3
6 2015-10-27 08:00:00 3
I have looked at similar solutions, but I cannot work out how to compare each time against the comparison value for its own ID.
I think ddply seems quite a promising option but I'm not sure how to use it for this particular problem.
Here's a quick and efficient solution using data.table. First we join the two data sets by ID, and then we modify only the Time values that are earlier than or equal to Comparison:
library(data.table) # v1.9.6+
setDT(data)[ComparisonData, end := i.Comparison, on = "ID"]
data[Time <= end, Time := Time - 3600L][, end := NULL]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
Alternatively, we could do this in one step while joining, using ifelse (I am not sure how efficient this is, though):
setDT(data)[ComparisonData,
            Time := ifelse(Time <= i.Comparison,
                           Time - 3600L, Time),
            on = "ID"]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
I am sure there is a better solution than this, but I think this works.
for(i in 1:nrow(data)) {
  # the question asks for "earlier than or equal to", hence <=
  if(data$Time[i] <= ComparisonData[data$ID[i], 1]){
    data$Time[i] <- data$Time[i] - 3600
  }
}
#                 Time ID
#1 2015-10-02 07:00:00  1
#2 2015-11-02 11:00:00  1
#3 2015-10-11 09:00:00  2
#4 2015-11-11 09:00:00  2
#5 2015-10-24 07:00:00  3
#6 2015-10-27 08:00:00  3
This iterates through every row in data.
ComparisonData[data$ID[i], 1] gets the Comparison value from ComparisonData for the corresponding ID. This works here only because the IDs are 1, 2, 3 and so coincide with the row numbers of ComparisonData. If that value is later than or equal to the Time value in data, the time is reduced by one hour.
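If the IDs were not simply 1, 2, 3, one way to avoid relying on row positions would be to look up each cutoff with match(). The following base R sketch assumes the column names from the question; the wrapper name adjust_times is made up for illustration, and the time zone is fixed to UTC only to make the example reproducible.

```r
# Subtract one hour from every Time that is <= the Comparison value of its ID.
adjust_times <- function(data, ComparisonData) {
  # cutoff[i] is the Comparison value belonging to data$ID[i] (NA if the ID is absent)
  cutoff <- ComparisonData$Comparison[match(data$ID, ComparisonData$ID)]
  early <- !is.na(cutoff) & data$Time <= cutoff
  data$Time[early] <- data$Time[early] - 3600
  data
}

data <- data.frame(
  Time = as.POSIXct(c("2015-10-02 08:00:00", "2015-11-02 11:00:00",
                      "2015-10-11 10:00:00", "2015-11-11 09:00:00",
                      "2015-10-24 08:00:00", "2015-10-27 08:00:00"), tz = "UTC"),
  ID = c(1, 1, 2, 2, 3, 3)
)
ComparisonData <- data.frame(
  Comparison = as.POSIXct(c("2015-10-29 08:00:00", "2015-11-02 08:00:00",
                            "2015-10-26 08:30:00"), tz = "UTC"),
  ID = c(1, 2, 3)
)
result <- adjust_times(data, ComparisonData)
format(result$Time, "%H:%M")
# "07:00" "11:00" "09:00" "09:00" "07:00" "08:00"
```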

Produce weekly average plots from large dataset in R

I am quite new to R and have been struggling with trying to convert my data and could use some much needed help.
I have a dataframe which is approx. 70,000*2. This data covers a whole year (52 weeks/365 days). A portion of it looks like this:
Create.Date.Time Ticket.ID
1 2013-06-01 12:59:00 INCIDENT684790
2 2013-06-02 07:56:00 SERVICE684793
3 2013-06-02 09:39:00 SERVICE684794
4 2013-06-02 14:14:00 SERVICE684796
5 2013-06-02 17:20:00 SERVICE684797
6 2013-06-03 07:20:00 SERVICE684799
7 2013-06-03 08:02:00 SERVICE684839
8 2013-06-03 08:04:00 SERVICE684841
9 2013-06-03 08:04:00 SERVICE684842
10 2013-06-03 08:08:00 SERVICE684843
I am trying to get the number of tickets in every hour of the week (that is, hour 1 to hour 168) for each week. Hour 1 would start on Monday at 00.00, and hour 168 would be Sunday 23.00-23.59. This would be repeated for each week. I want to use the Create.Date.Time data to calculate the hour of the week the ticket is in, say for:
2013-06-01 12:59:00 INCIDENT684790 - hour 133,
2013-06-03 08:08:00 SERVICE684843 - hour 9
I am then going to compute averages for each hour and plot those. I am completely at a loss as to where to start. Could someone please point me in the right direction?
Before addressing the plotting aspect of your question, is this the format of data you are trying to get? This uses the package lubridate which you might have to install (install.packages("lubridate",dependencies=TRUE)).
library(lubridate)
##
Events <- paste(
  sample(c("INCIDENT","SERVICE"),20000,replace=TRUE),
  sample(600000:900000,20000)
)
t0 <- as.POSIXct(
  "2013-01-01 00:00:00",
  format="%Y-%m-%d %H:%M:%S",
  tz="America/New_York")
Dates <- sort(t0 + sample(0:(3600*24*365-1),20000))
Weeks <- week(Dates)
wDay <- wday(Dates,label=TRUE)
Hour <- hour(Dates)
##
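hourShift <- function(time,wday){
  # Offset in hours from Monday 00:00 to the start of each weekday.
  # Both "Tues"/"Tue" and "Thurs"/"Thu" are accepted, because the labels
  # returned by wday(label=TRUE) may differ between lubridate versions.
  hShift <- sapply(wday, function(X){
    if(X=="Mon"){
      0
    } else if(X %in% c("Tues","Tue")){
      24*1
    } else if(X=="Wed"){
      24*2
    } else if(X %in% c("Thurs","Thu")){
      24*3
    } else if(X=="Fri"){
      24*4
    } else if(X=="Sat"){
      24*5
    } else {
      24*6
    }
  })
  ##
  tOut <- hour(time) + hShift + 1
  return(tOut)
}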
##
weekHour <- hourShift(time=Dates,wday=wDay)
##
Data <- data.frame(
  Event=Events,
  Timestamp=Dates,
  Week=Weeks,
  wDay=wDay,
  dayHour=Hour,
  weekHour=weekHour,
  stringsAsFactors=FALSE)
##
This gives you:
> head(Data)
            Event           Timestamp Week wDay dayHour weekHour
1  SERVICE 783405 2013-01-01 00:13:55    1 Tues       0       25
2 INCIDENT 860015 2013-01-01 01:06:41    1 Tues       1       26
3 INCIDENT 808309 2013-01-01 01:10:05    1 Tues       1       26
4 INCIDENT 835509 2013-01-01 01:21:44    1 Tues       1       26
5  SERVICE 769239 2013-01-01 02:04:59    1 Tues       2       27
6  SERVICE 762269 2013-01-01 02:07:41    1 Tues       2       27
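The hour of the week can also be computed without relying on weekday labels, for example with base R's POSIXlt fields. This is a minimal sketch, not the answer's method: the name week_hour is hypothetical, and hour 1 is assumed to be Monday 00:00-00:59 as described in the question.

```r
# Hour of the week: 1 = Monday 00:00-00:59, ..., 168 = Sunday 23:00-23:59.
week_hour <- function(time) {
  lt <- as.POSIXlt(time)
  day_index <- (lt$wday + 6L) %% 7L  # POSIXlt wday is 0 = Sunday; shift to 0 = Monday
  day_index * 24L + lt$hour + 1L
}

# The two examples given in the question (time zone fixed to UTC here).
x <- as.POSIXct(c("2013-06-01 12:59:00", "2013-06-03 08:08:00"), tz = "UTC")
week_hour(x)
# 133 9

# Number of tickets in each of the 168 hours; empty hours are kept as zero.
counts <- table(factor(week_hour(x), levels = 1:168))
```

Applying table() within each week and averaging the results across weeks would then give the per-hour averages for plotting.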
