I have a data set that has event dates at multiple locations:
year<-rep(2010:2021,3)
month<-rep(1:3,12)
loc<-rep(letters[1:3],each=12)
event_date<-as.Date(paste(year,month,"01",sep="-"), "%Y-%m-%d")
event_data<-data.frame(loc,year,event_date)
I want to expand the data set so that, for each month of the year, I have a time-since-event variable for each location (preferably in months, but days is fine). I have tried the code below, but it gives negative values at the change of year. I want the time since the event to keep counting until the event in the following year (no negative values).
library(dplyr) # for left_join()
months<-expand.grid(year=unique(year),month=1:12)
month_data<-left_join(event_data, months, by = "year")
month_data$date<-as.Date(paste(month_data$year,month_data$month,"01",sep="-"), "%Y-%m-%d")
month_data$diff<-month_data$date-month_data$event_date
It may not be the neatest, but this has done the trick. Instead of expand.grid I have written a loop that creates a monthly sequence of dates between each event date and the next:
library(lubridate) # for year()
res<-list()
for(i in unique(event_data$loc)) {
  event_data1<-event_data[event_data$loc==i,] # outer loop splitting dataset by location
  r_list<-list() # reset for each location
  for (j in seq_along(event_data1$event_date)) {
    if (is.na(event_data1$event_date[j+1])) {
      # there is no date after the last event, so add a year's worth of months
      dates<-seq(event_data1$event_date[j],(event_data1$event_date[j]+365), by = 'month')
    } else {
      # sequence from one event to the next by month
      dates<-seq(event_data1$event_date[j],event_data1$event_date[j+1], by = 'month')
    }
    dates<-head(dates, -1) # the last date in the seq is in fact the next event so needs removed
    camp<-rep(year(dates)[1], length(dates)) # year of event
    diff<- 0:(length(camp)-1) # months since event
    loc<-rep(event_data1$loc[j], length(dates)) # location
    r_list[[j]]<-data.frame(dates,camp,diff,loc)
  }
  res<-c(res,r_list)
}
data_new<-do.call("rbind",(res))
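For comparison, here is a minimal base R sketch of the same idea without the nested loop. It is not the method used above: it assumes all event dates fall on the first of a month, uses findInterval() to look up the most recent event for every month, and stops at December of the last event year instead of adding a full year. The helper name months_since_event is made up for illustration.

```r
# Rebuild the question's data (dates built with an explicit format)
year <- rep(2010:2021, 3)
month <- rep(1:3, 12)
loc <- rep(letters[1:3], each = 12)
event_date <- as.Date(sprintf("%d-%02d-01", year, month))
event_data <- data.frame(loc, year, event_date)

# Hypothetical helper: months since the latest event, for one location
months_since_event <- function(events) {
  events <- sort(events)
  # every month start from the first event to December of the last event year
  last_month <- as.Date(format(max(events), "%Y-12-01"))
  grid <- seq(min(events), last_month, by = "month")
  # index of the most recent event at or before each month
  idx <- findInterval(as.numeric(grid), as.numeric(events))
  prev <- events[idx]
  # whole months between two first-of-month dates
  ym <- function(d) as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
  data.frame(date = grid, last_event = prev, diff = ym(grid) - ym(prev))
}

res <- lapply(split(event_data$event_date, event_data$loc), months_since_event)
data_new <- do.call(rbind, lapply(names(res), function(l) cbind(loc = l, res[[l]])))
head(data_new)
```

Because diff is always measured from the latest event on or before each month, it keeps counting across the change of year and is never negative.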
I need to run an analysis on the period from 10 AM to 4 PM.
The original data runs from 9 AM to 5 PM, every day for one year.
How can I include only that time period in the analysis?
window() in zoo does not help here.
structure(c(0, 7.12149266486255e-05, 0.000142429853297251, 0.000213644779945877,
0.000284859706594502, 0.000356074633243128, 0.000427289559891753,
0.000498504486540379, 0.000569719413189004, 0.00064093433983763,
0.000712149266486256, 0.000783364193134881, 0.000854579119783507,
0.000925794046432132, 0.000997008973080758, 0.00106822389972938,
0.00113943882637801, 0.00121065375302663, 0.00128186867967526,
0.00135308360632389, 0.00142429853297251, 0.00149551345962114,
0.00156672838626976, 0.00163794331291839, 0.00170915823956701,
0.00178037316621564, 0.00185158809286426, 0.00192280301951289,
0.00199401794616152, 0.00206523287281014), index = structure(c(1009942620,
1009942680, 1009942740, 1009942800, 1009942860, 1009942920, 1009942980,
1009943040, 1009943100, 1009943160, 1009943220, 1009943280, 1009943340,
1009943400, 1009943460, 1009943520, 1009943580, 1009943640, 1009943700,
1009943760, 1009943820, 1009943880, 1009943940, 1009944000, 1009944060,
1009944120, 1009944180, 1009944240, 1009944300, 1009944360), class = c("POSIXct",
"POSIXt")), class = "zoo")
How can I select the observations with a time after 10 AM and before 4 PM, across several days?
If z is the zoo object then
1) use this to extract hour of each time point and then subset to only those that are 10, 11, 12, 13, 14 or 15.
z[format(time(z), "%H") %in% 10:15]
2) or use this alternative which is similar but uses POSIXlt to get the hour:
z[as.POSIXlt(time(z))$hour %in% 10:15]
3) or convert the series to xts and use this:
library(xts)
x <- as.xts(z)["T10:00/T15:00"]
drop(as.zoo(x))
Omit the second line if it is ok to return an xts series.
Time Zone
Be sure that you have the time zone set correctly since the time in one time zone is not the same as in another time zone.
We can query the current time zone of the session like this:
Sys.timezone()
and can set it like this:
Sys.setenv(TZ = "...")
where ... is replaced with the time zone desired. Common settings are:
Sys.setenv(TZ = "GMT")
Sys.setenv(TZ = "") # restore default
The following will show the possible time zones that can be used:
OlsonNames()
You only need all this if the time zone of your session is not already set to the time zone of the data.
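To see why the time zone matters for the hour filter, here is a small sketch using the first timestamp of the sample series. The zone "Asia/Kolkata" is only an illustrative assumption, since the question does not say which zone the data was recorded in.

```r
# First timestamp of the sample series, as seconds since 1970-01-01 UTC
t0 <- as.POSIXct(1009942620, origin = "1970-01-01", tz = "UTC")

# The same instant has a different hour depending on the zone used for formatting
format(t0, "%H", tz = "UTC")          # "03"
format(t0, "%H", tz = "Asia/Kolkata") # "09" (assumed zone, for illustration)

# Passing tz explicitly makes the hour test independent of the session setting
keep <- as.integer(format(t0, "%H", tz = "Asia/Kolkata")) %in% 10:15
keep
```

The same tz argument can be given to format(time(z), "%H", tz = ...) in approach 1), as an alternative to changing TZ for the whole session.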
You could build a tibble with time, value and hour columns, and then keep only the rows between 10 AM and 4 PM (here df is the zoo object).
library(dplyr)
library(zoo)
tibble(time = index(df),
value = coredata(df),
hour = lubridate::hour(time)) %>%
filter(between(hour, 10, 15)) -> result
result
My question is divided into two parts.
First part:
I have a function, get_data(), which I use to pull information for a date range.
get_data <- function (fac_num, start_date, end_date) {
if (!(is.null(fac_num) | is.null(start_date) | is.null(end_date))) {
if(end_date - start_date > 7) {
start_date <- end_date - 7
#start_date <- as.Date('2017-07-05')
#end_date <- as.Date('2017-07-06')
#fac_num <- "005"
}
new_start_date <- paste0(start_date,' 05:00:00')
new_end_date <- paste0(end_date + 1,' 05:00:00')
qry <- paste0("SELECT FAC_NUM, USER_ID, APPL_ID, FUNC_ID, ST_ID, NXT_ST_ID, RESP_PRMT_DATA,
ST_DT_TM, END_DT_TM, RESP_PRMT_TY_CDE,
REQ_INP_DATA FROM OPSDBA.STG_RFS_INTERACTION WHERE TRANS_ST_DT_TM >= DATE'",
start_date,"' AND TRANS_ST_DT_TM BETWEEN TO_TIMESTAMP('",new_start_date,"', 'YYYY-MM-DD HH:MI:SS') AND TO_TIMESTAMP('",new_end_date,"', 'YYYY-MM-DD HH:MI:SS')
AND APPL_ID='CTS' AND FAC_NUM='",fac_num,"'")
and then I perform calculations on the result.
Further on in my program, I use this get_data() function to pull data for a new set of analyses.
rf_log_perform <- get_data(display_facility_decode(input$facNum2),
input$dateRange2, input$dateRange2 + 1)
Since I am using a single date here instead of a range, I have added one day to it so that get_data() still works.
I then wanted to restrict the data so that it does not show anything past 11:59 PM on the selected date.
rf_log_perform$date <- ifelse(strftime(rf_log_perform$st_dt_tm, format="%H:%M:%S")<'05:00:00',
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm - 1*86400 , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'),
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'))
Using get_data(), I can pull data for the range 08/29/2017 05:00:00 to 08/30/2017 05:00:00, which counts as one day in my example.
But for my calculations I want to discard everything after 08/29/2017 11:59:59 PM, for more accurate results.
For this purpose I added the ifelse statement above. But it isn't behaving as I expect, and I am confused as to why.
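A possible way to express both cutoffs without ifelse, sketched under the assumption that st_dt_tm is a POSIXct column and that the goal is to keep rows from 05:00:00 up to 23:59:59 on the selected date. The timestamps and the variable names operating_day, calendar_day and selected are made up for illustration.

```r
# Hypothetical timestamps standing in for rf_log_perform$st_dt_tm
st_dt_tm <- as.POSIXct(c("2017-08-29 04:59:59", "2017-08-29 05:00:00",
                         "2017-08-29 23:59:59", "2017-08-30 00:30:00"), tz = "UTC")
selected <- as.Date("2017-08-29")

# Operating day: anything before 05:00 belongs to the previous date
operating_day <- as.Date(st_dt_tm - 5 * 3600, tz = "UTC")

# Calendar day: used to drop rows after midnight on the selected date
calendar_day <- as.Date(st_dt_tm, tz = "UTC")

keep <- operating_day == selected & calendar_day == selected
st_dt_tm[keep]
```

Shifting the timestamp by five hours before taking the date avoids comparing formatted time strings, and passing tz explicitly keeps the result independent of the session time zone.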
Unfortunately I still cannot comment on the main question.
I encourage you to make two adjustments to your question to improve the chances of getting an answer:
1) Please make your example reproducible, e.g. provide date ranges, wrap your code in a well-defined function, etc.
2) Explain what you are trying to achieve: what is your intention, and what is the expected result?
I'm trying to use the bizdays package to generate a vector of business days between two dates.
fer = as.data.frame(as.Date(fer[1:938]))
#Define default calendar
bizdays.options$set(default.calendar=fer)
dt1 = as.Date(Sys.Date())
dt2 = as.Date(Sys.Date()-(365*10)) #sample 10 year window
#Create date vector
datas = bizseq(dt2, dt1)
I get this error: "Error in bizseq.Date(dt2, dt1) : Given date out of range."
The same happens with every other bizdays function.
Any ideas?
I had a similar problem, but could not apply the accepted answer to my case. What worked for me was to make sure that the first and last holidays in the vector holidays cover (or exceed) the range of dates passed to bizdays():
library(bizdays)
This works (from_date and to_date both lie within the first and last holiday provided by holidays):
holidays <- c("2016-08-10", "2016-08-13")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
#1
This does not work (to_date lies outside of the last holiday of holidays):
holidays <- c("2016-08-10", "2016-08-11")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
# Error in bizdays.Date(from, to, cal) : Given date out of range.
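The condition described above can be checked up front in base R before building the calendar. This is only a sketch of that check, assuming (as observed in this answer) that the valid range is taken from the first and last holiday. The helper name covered_by_holidays is hypothetical and not part of bizdays.

```r
# Hypothetical helper: TRUE when every date lies within the span of the holiday vector
covered_by_holidays <- function(dates, holidays) {
  dates <- as.Date(dates)
  span <- range(as.Date(holidays))
  all(dates >= span[1] & dates <= span[2])
}

# Mirrors the working example: both dates lie between the first and last holiday
covered_by_holidays(c("2016-08-11", "2016-08-12"), c("2016-08-10", "2016-08-13"))

# Mirrors the failing example: to_date lies after the last holiday
covered_by_holidays(c("2016-08-11", "2016-08-12"), c("2016-08-10", "2016-08-11"))
```

The first call returns TRUE and the second FALSE, matching the two cases shown above.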
If fer is the holidays, you can try with:
bizdays.options$set(default.calendar=Calendar(holidays=fer))
I am new to R and struggling with the fact that functions are able to operate on whole vectors without having to explicitly specify this.
My goal
I have a data frame calls with multiple columns, one of which is a “date” column. Now I want to add a new column, “daytime”, that labels the daytime the particular entry’s date falls into:
> calls
  call_id length          date  direction   daytime
1     258    531 1400594572974   outgoing afternoon
2     259      0 1375555528144 unanswered   evening
3     260    778 1385922648396   incoming   evening
What I have done so far
I have already implemented functions that return a vector of booleans, like this:
# Operates on POSIXlt timestamps
is.earlymorning <- function(date) {
hour(floor_date(date, "hour")) >= 5 & hour(floor_date(date, "hour")) < 9
}
Applied to the timestamps 2014-05-20 16:02:52, 2013-08-03 20:45:28 and 2013-12-01 19:30:48, is.earlymorning() would thus return FALSE, FALSE, FALSE. What I am currently struggling with is implementing a function that actually returns labels. What I would like the function to do is the following:
# rawDate is a long value of the date as ms since 1970
Daytime <- function(rawDate) {
date <- as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")
if (is.earlymorning(date)) {
"earlymorning"
} else if (is.morning(date)) {
"morning"
} else if (is.afternoon(date)) {
"afternoon"
} else if (is.evening(date)) {
"evening"
} else if (is.earlynight(date)) {
"earlynight"
} else if (is.latenight(date)) {
"latenight"
}
}
The problem
Obviously, my above approach does not work since the if-conditions would operate on whole vectors in my example. Is there an elegant way to solve this problem? I am sure I am confusing or missing some important points, but as I mentioned I am pretty new to R.
In short, what I want to implement is a function that returns a vector of labels according to a vector of date values:
# Insert new column with daytime labels
calls$daytime <- Daytime(calls$date)
# or something like that:
calls$daytime <- sapply(calls$date, Daytime)
# Daytime(c(1400594572974, 1375555528144, 1385922648396)) => c("afternoon", "evening", "evening")
One approach would be to use cut rather than ifelse. I am not entirely sure how you want to label hours, but this will give you the idea. foo is your data (i.e., calls).
library(dplyr)
# Following your idea
ana <- transform(foo, date = as.POSIXlt(as.numeric(date) / 1000, origin = "1970-01-01"))
ana %>%
  mutate(hour = cut(as.numeric(format(date, "%H")),
                    breaks = c(0, 4, 8, 12, 16, 20, 24),
                    labels = c("late night", "early morning",
                               "morning", "afternoon",
                               "evening", "early night"),
                    right = FALSE # intervals are [0,4), [4,8), ... so hour 0 is not dropped as NA
                    )
  )
#  call_id length                date  direction   daytime        hour
#1     258    531 2014-05-20 23:02:52   outgoing afternoon early night
#2     259      0 2013-08-04 03:45:28 unanswered   evening  late night
#3     260    778 2013-12-02 03:30:48   incoming   evening  late night
There is no need for six different functions to establish which period of the day a given date falls in. It suffices to define a vector that maps each hour to its daytime label. For instance:
Daytime<-function(rawDate) {
#change the vector according to your definition of the daytime.
#the first value corresponds to hour 0 and the last to hour 23
hours<-c(rep("latenight",5),rep("earlymorning",4),rep("morning",4),rep("afternoon",4),rep("evening",4),rep("earlynight",3))
hours[as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01")$hour+1]
}
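As a usage sketch, here is the lookup applied to the three timestamps from the question. The tz argument is an addition for illustration (it is not in the function above), so that the result does not depend on the session time zone. The hour boundaries are the ones chosen in this answer.

```r
# Same lookup as above, with an explicit time zone (tz is an added assumption)
Daytime <- function(rawDate, tz = "UTC") {
  # the first value corresponds to hour 0 and the last to hour 23
  hours <- c(rep("latenight", 5), rep("earlymorning", 4), rep("morning", 4),
             rep("afternoon", 4), rep("evening", 4), rep("earlynight", 3))
  lt <- as.POSIXlt(as.numeric(rawDate) / 1000, origin = "1970-01-01", tz = tz)
  hours[lt$hour + 1]
}

calls <- data.frame(date = c(1400594572974, 1375555528144, 1385922648396))
calls$daytime <- Daytime(calls$date)
calls$daytime
```

In UTC the three timestamps fall at hours 14, 18 and 18, so this gives "afternoon", "evening", "evening", the labels shown in the question.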
Given Thomas' hint, I solved my problem in the following (admittedly inelegant) way:
Daytime <- function(rawDates) {
dates <- as.POSIXlt(as.numeric(rawDates) / 1000, origin = "1970-01-01")
ifelse(is.earlymorning(dates), "earlymorning",
ifelse(is.morning(dates), "morning",
ifelse(is.afternoon(dates), "afternoon",
ifelse(is.evening(dates), "evening",
ifelse(is.earlynight(dates), "earlynight",
ifelse(is.latenight(dates), "latenight",
"N/A")
)
)
)
)
)
}
With more labels this approach will soon become unmaintainable. Right now it serves my purposes and I will leave it at that, since I must focus on analysing the data as soon as possible. But I will let you know if I have time left and find a less complicated solution! Thank you for your quick response, Thomas.
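One option that may scale better to more labels is findInterval() from base R, sketched here. The period boundaries in starts are assumptions (only the 5 to 9 range of is.earlymorning is given in the question), and daytime_from_hour is a made-up name.

```r
# Lower bound (hour) of each period and its label; the boundaries are assumptions
starts <- c(0, 5, 9, 13, 17, 21)
labels <- c("latenight", "earlymorning", "morning", "afternoon", "evening", "earlynight")

# Vectorised lookup: findInterval() returns the index of the period each hour falls in
daytime_from_hour <- function(hour) labels[findInterval(hour, starts)]

daytime_from_hour(c(0, 5, 8, 14, 18, 23))
```

Adding a label then only means adding one entry to each of the two vectors, instead of another nested ifelse.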
I'm measuring a physiological variable with a millisecond timestamp on a number of patients. For each patient I want to apply a factor to a subset of the timestamped rows describing their posture at that exact moment.
I've tried creating the following function, which works fine when describing the first posture. When trying to apply the next "posture-factor," the previously registered posture is deleted.
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
patient$Position[
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop] <- positiontype
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
TestPatientNew <- patientpositionslice(TestPatient,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
How do I modify the function so I can apply it repeatedly on the same patient with different postures such as "Horizontal", "Vertical", "Sitting" etc.?
Here's your solution. Probably there are more elegant ways but this is mine ;)
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
#Included column with position
TestPatient$position <- NA
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
#changed the test to ifelse() function
new<-ifelse(
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop , positiontype, patient$position)
patient$position <- new
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
#For repeated insertion use the previous object
TestPatientNew <- patientpositionslice(TestPatientNew ,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
I have commented the changes. I hope this is what you wanted; if not, just correct me.
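If there are many postures, the repeated calls can also be driven from a small table of intervals. This is a sketch under assumptions: the postures table and its column names are invented for illustration, and the times are compared as POSIXct values rather than as formatted strings.

```r
# Hypothetical patient data: 100 one-second samples starting at 12:00:00 UTC
patient <- data.frame(
  Time = seq(ISOdatetime(2011, 12, 22, 12, 0, 0, tz = "UTC"), by = "sec", length.out = 100)
)
patient$position <- NA_character_

# Hypothetical table of posture intervals (start inclusive, stop exclusive)
postures <- data.frame(
  type  = c("Horizontal", "Vertical"),
  start = as.POSIXct(c("2011-12-22 12:00:05", "2011-12-22 12:00:15"), tz = "UTC"),
  stop  = as.POSIXct(c("2011-12-22 12:00:10", "2011-12-22 12:00:20"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Each pass only overwrites the rows inside its own window, so earlier labels survive
for (i in seq_len(nrow(postures))) {
  in_window <- patient$Time >= postures$start[i] & patient$Time < postures$stop[i]
  patient$position[in_window] <- postures$type[i]
}
table(patient$position, useNA = "ifany")
```

As in the answer above, the key point is that every update works on the already-labelled data frame rather than on the original one.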