I'm trying to do some tests around measurement periods in time. I'd like to increment the size of the measurement bins (i.e. 1 month vs 2 months, etc.).
I have a data frame with a date column built with seq(), which works fine. My problem is incrementing the date by a month, a week, etc.
df1 <- data.frame(id = 1:20, date1 = seq(as.Date('2012-01-01'),by = 'month', len = 20))
df1$date2 <- df1$date1 + 30
This is obviously wrong if I want the 1st of each month or week. Is there a function or package for this type of issue?
EDIT:
This:
seq( x, by = "month", length.out = 1)
seems to work for individual cells, but won't work for a column as it returns a numeric:
df1$date2 <- sapply(df1$date1, function(x) seq( x, by = "month", length.out = 1))
> head(df1)
id date1 date2
1 1 2012-01-01 15340
2 2 2012-02-01 15371
3 3 2012-03-01 15400
4 4 2012-04-01 15431
5 5 2012-05-01 15461
6 6 2012-06-01 15492
It sounds like you're looking for cut:
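# cut() returns a factor, so convert back to Date before doing more date arithmetic
df1$date2 <- as.Date(cut(df1$date1 + as.difftime(31, units='days'), breaks='months'))
df1$date3 <- as.Date(cut(df1$date2 + as.difftime(1, units='weeks'), breaks='weeks'))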
There might be more elegant solutions but this should work -
df1$date2 <- as.Date(
  paste(
    ifelse(
      strftime(df1$date1,'%m') == 12,
      as.integer(strftime(df1$date1,'%Y')) + 1,
      as.integer(strftime(df1$date1,'%Y'))
    ),
    ifelse(
      strftime(df1$date1,'%m') == 12,
      1,
      as.integer(strftime(df1$date1,'%m')) + 1
    ),
    1,
    sep = "-"
  ),
  "%Y-%m-%d"
)
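The sapply() attempt in the question's edit returns numbers because sapply() simplifies its result and drops the Date class, and seq() with length.out = 1 only returns the starting date anyway. A minimal base R sketch that keeps the Date class, assuming a step of exactly one month is wanted; next_month is a hypothetical helper name, not an existing function:

```r
df1 <- data.frame(id = 1:20,
                  date1 = seq(as.Date("2012-01-01"), by = "month", length.out = 20))

# next_month() is a hypothetical helper: the second element of a two-step
# monthly sequence is the date one month after x.
next_month <- function(x) seq(x, by = "month", length.out = 2)[2]

# lapply() keeps each result as a Date; do.call(c, ...) joins them into one Date vector.
df1$date2 <- do.call(c, lapply(df1$date1, next_month))
head(df1, 3)
```

Changing by to "week" or "2 months" in the helper should give other bin sizes.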
I've got a Dataset that looks like this:
Year Week Cases
2010 1 2
2010 4 3
2010 5 5
2010 6 1
I would like to convert the Year-Week columns into a single timestamp column (dd/mm/yyyy). Day of the week could be the first or the last one.
Is there a simple way to solve this?
Best,
Daniel
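The weeks function in lubridate and the str_c function in stringr can do this:
library(dplyr)
library(tibble)
library(lubridate)
library(stringr)
df <- tribble(~year, ~week, 2010,1,2010,4,2010,5,2010,6)
df_tbl <- df %>%
  # beg is 1 January; beg + weeks(week) is the first day after week `week`,
  # counting week 1 from 1 January
  mutate(beg = ymd(str_c(year, "-01-01")),
         date_var = beg + weeks(week))
df_tbl$date_var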
If you count week 1 as starting on 1st January, you could do:
as.Date(paste(df$Year, 1, 1, sep = '-')) + 7 * (df$Week - 1)
#> [1] "2010-01-01" "2010-01-22" "2010-01-29" "2010-02-05"
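If you count week 1 as starting on the first Monday of the year then you could use this little function. (This is not quite the ISO 8601 rule, under which week 1 is the week containing the year's first Thursday, but the two coincide for 2010.)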
year_week <- function(year, week) {
  as.Date(unlist((Map(function(y, w) {
    d <- which(lubridate::wday(as.Date(paste(y, 1, 1:7, sep = '-'))) == 2)
    as.Date(paste(y, 1, d, sep = '-')) + (w - 1) * 7}, y = year, w = week))),
    origin = '1970-01-01')
}
This will give you the date of the nth Monday in the year, so that we have:
year_week(df$Year, df$Week)
#> [1] "2010-01-04" "2010-01-25" "2010-02-01" "2010-02-08"
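For true ISO 8601 weeks, where week 1 is the week containing 4 January (equivalently, the year's first Thursday), a base R sketch is below. iso_week_start is a hypothetical helper name, and it assumes the week numbers in the data really are ISO week numbers:

```r
# iso_week_start() is a hypothetical helper, not part of base R or lubridate.
iso_week_start <- function(year, week) {
  jan4 <- as.Date(paste0(year, "-01-04"))   # 4 January is always in ISO week 1
  wd   <- as.integer(format(jan4, "%u"))    # ISO weekday: 1 = Monday ... 7 = Sunday
  week1_monday <- jan4 - (wd - 1)           # Monday of ISO week 1
  week1_monday + 7 * (week - 1)             # Monday of the requested week
}

iso_week_start(2010, c(1, 4, 5, 6))
iso_week_start(2020, 1)   # this Monday falls in the previous calendar year
```

For 2010 this agrees with the first-Monday function; the two differ in years where 1 January falls on a Tuesday, Wednesday or Thursday.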
A subject was measured at several time points over several days. I have a column "resptime_s" (the time at which the subject answered the beep on his smartphone). Now I want to know the mean time between those responses (so between the rows of this column) with the night time taken out (night time is always from 22:30 until 7:30). Take as example:
The R script:
setwd("C:/Users/Hanne/Desktop/")
dat <- read.csv(file="datnew2.csv", sep=";",header=TRUE)
rows <- c(1:388) #time points
columns <- c(2,60) # datum and time
nVariables = 2
newdata<-dat[rows,columns]
head(newdata)
fun2 <- function(x){
bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
f <- cumsum(c(FALSE, diff(bt) < 0))
d <- rep(as.Date("2018-01-01"), length.out = length(bt))
bt <- as.POSIXct(paste(d, x))
res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1])))
unname(unlist(res))
}
fun2(newdata$resptime_s)
But the result isn't correct.
And with:
dput(head(newdata, 30))
I obtained this output:
Using lubridate's functions for working with time intervals gives a compact and readable solution.
library(tidyverse)
library(lubridate)
data <- tribble(
~time_point, ~beeptime,
1, "08:30",
2, "11:13",
3, "12:08",
4, "17:20",
5, "22:47",
6, "7:36",
7, "9:40"
) %>%
mutate(beeptime = as_datetime(hm(beeptime)))
1. Define the daytime interval
day <- interval(
as_datetime(hm("07:30")),
as_datetime(hm("22:30"))
)
2. Keep daytime beeps and estimate the time (interval) between them
# %--% is basically the same as interval() above.
data_interval <-
data %>%
filter(beeptime %within% day) %>%
mutate(beep_interval = lag(beeptime) %--% beeptime)
3. Take the average
# You can use as.numeric() to extract (e.g.) minutes, which you can
# just pass to mean().
data_interval$beep_interval %>%
as.numeric("minutes") %>%
abs() %>%
mean(na.rm = TRUE)
#> [1] 247.6
Try the following. It pastes a date that increments every time the next hour is less than the previous one. Then difftime works as expected.
fun <- function(x){
  # Hour of each beep, e.g. "7:36" -> 7
  bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
  # Day offset: goes up by one each time the hour drops (a new day has started),
  # so this also works when the data span more than two days
  day_offset <- cumsum(c(FALSE, diff(bt) < 0))
  d <- as.Date("2018-01-01") + day_offset
  beeptime <- as.POSIXct(paste(d, x))
  difftime(beeptime[-1], beeptime[1])
}
fun(newdata$beeptime)
#Time differences in hours
#[1] 2.716667 3.633333 8.833333 14.283333 23.100000 25.166667
Data.
newdata <-
structure(list(time_point = 1:7, beeptime = structure(1:7, .Label = c("08:30",
"11:13", "12:08", "17:20", "22:47", "7:36", "9:40"), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
Edit.
I believe that I have misunderstood the question. The OP does not want the differences between the first time and all others. What is needed is the differences restarting from zero every night.
If this is the case, the following function will do it.
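fun2 <- function(x){
  # Hour of each beep; a drop in the hour marks the start of a new day
  bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
  f <- cumsum(c(FALSE, diff(bt) < 0))
  d <- rep(as.Date("2018-01-01"), length.out = length(bt))
  bt <- as.POSIXct(paste(d, x))
  # Fix the units so every day's differences are reported in hours
  res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1], units = "hours")))
  unname(unlist(res))
}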
fun2(newdata$beeptime)
#[1] 0.000000 2.716667 3.633333 8.833333 14.283333 0.000000 2.066667
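If the goal is the mean time between consecutive beeps with the nights left out, one possible reading is to take gaps only between beeps on the same day and drop the overnight gap entirely. A base R sketch under that assumption; the variable names are illustrative, a new day is assumed to start whenever the clock time goes down, and beeps after 22:30 are kept as they are:

```r
beeps <- c("08:30", "11:13", "12:08", "17:20", "22:47", "7:36", "9:40")

# Minutes since midnight for each beep
parts <- strsplit(beeps, ":", fixed = TRUE)
mins  <- sapply(parts, function(p) as.integer(p[1]) * 60 + as.integer(p[2]))

# A drop in clock time marks the start of a new day
day <- cumsum(c(FALSE, diff(mins) < 0))

# Gaps between consecutive beeps within each day; overnight gaps never appear
gaps <- unlist(tapply(mins, day, diff))
mean(gaps)   # mean gap in minutes
```

Whether the overnight gap should be dropped or clamped to the 7:30 and 22:30 boundaries depends on what the OP intends, so this is only one interpretation.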
Another approach could be to convert beeptime into an offset (in seconds) from midnight using the lubridate package.
We can then write functions to calculate the difference in time excluding the night (22:30 - 7:30).
Before we start on the solution, let's look at the offsets in seconds from midnight for 7:30 and 22:30.
library(lubridate)
as.numeric(seconds(hm("7:30")))
# [1] 27000
as.numeric(seconds(hm("22:30")))
# [1] 81000
I have written two functions to calculate the difference between two times:
# Checks an individual time and shifts it to the night boundary, so that
# the time spent overnight can be excluded.
checkNightBoundry <- function(val){
  if(val < 27000){
    val = 27000
  } else if(val > 81000) {
    val = 81000
  }
  val
}
# Arguments are offsets from midnight in seconds
# Calculates the difference between two times, excluding the night
calcDifftime <- function(currVal, prevVal){
  diffTime <- 0
  currVal = checkNightBoundry(currVal)
  prevVal = checkNightBoundry(prevVal)
  if(currVal > prevVal){
    diffTime = currVal - prevVal
  }else if(currVal < prevVal){
    diffTime = (81000 - prevVal) + (currVal - 27000)
  }
  diffTime
}
Now, use the functions above:
library(dplyr)
library(lubridate)
df %>% mutate(beeptimeOffset = as.numeric(seconds(hm(beeptime)))) %>%
mutate(diffTime = mapply(calcDifftime,
beeptimeOffset, lag(beeptimeOffset, default = first(beeptimeOffset)))/3600)
# timepoint beeptime beeptimeOffset(sec) diffTime(hrs)
# 1 1 08:30 30600 0.0000000
# 2 2 11:13 40380 2.7166667
# 3 3 12:08 43680 0.9166667
# 4 4 17:20 62400 5.2000000
# 5 5 22:47 82020 5.1666667
# 6 6 7:36 27360 0.1000000
# 7 7 9:40 34800 2.0666667
Data:
df <- read.table(text =
"timepoint beeptime
1 08:30
2 11:13
3 12:08
4 17:20
5 22:47
6 7:36
7 9:40",
header = TRUE, stringsAsFactors = FALSE)
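The two helper functions above work on one value at a time, which is why mapply is needed. A vectorised base R sketch of the same idea uses pmin/pmax to clamp times to the 7:30 - 22:30 window. The function names here are hypothetical, and the times are assumed to be "H:MM" strings in chronological order with at most one night between neighbours:

```r
# to_seconds() and daytime_diff_hours() are hypothetical helper names.
to_seconds <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  sapply(parts, function(p) as.integer(p[1]) * 3600 + as.integer(p[2]) * 60)
}

day_start <- 27000   # 07:30 in seconds after midnight
day_end   <- 81000   # 22:30 in seconds after midnight

daytime_diff_hours <- function(hhmm) {
  cur  <- pmin(pmax(to_seconds(hhmm), day_start), day_end)  # clamp to daytime
  prev <- c(cur[1], head(cur, -1))                          # previous beep (first has none)
  d <- cur - prev
  # A negative difference means the night was crossed: add one daytime length
  d <- ifelse(d < 0, d + (day_end - day_start), d)
  d / 3600
}

daytime_diff_hours(c("08:30", "11:13", "12:08", "17:20", "22:47", "7:36", "9:40"))
```

On the sample data this should reproduce the diffTime column shown above.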
I made up a data frame. How can I calculate the squared difference (error) in hourly TMP and DW for each day from 1/1 to 1/9 against 1/10? I need the sum of squared differences between hour 1 to hour 24 of each day from 1/1 to 1/9 and the same hours on 1/10.
The output should look like
Date SETmp SEDW
2012/1/1 X1 Y1
......
2012/1/9 X9 Y9
Data:
set.seed(1)
dataset <- data.frame(Date = seq(from = as.POSIXct("2012-1-1 0:00", tz = "UTC"),
to = as.POSIXct("2012-1-10 23:00", tz = "UTC"),
by="hour"),
TMP = rnorm(240),
DW = rnorm(240))
If I understand your question correctly, we can get there using the by and merge functions:
# add day and hour columns (for subsetting and merge)
dataset$day <- lubridate::day(dataset$Date)
dataset$hour <- lubridate::hour(dataset$Date)
# split data apart
data_ten <- subset(dataset, day == 10)
data_one_to_nine <- subset(dataset, day != 10)
# for each date, merge to data_ten using hours
# then calculate sum of squared differences
do.call('rbind.data.frame',
  by(data_one_to_nine, data_one_to_nine$day, function(d){
    xm <- merge(d, data_ten, by = 'hour')
    data.frame(
      'Date' = unique(as.Date(d$Date)),
      'SE_TMP' = sum((xm$TMP.x - xm$TMP.y)^2),
      'SE_DW' = sum((xm$DW.x - xm$DW.y)^2),
      stringsAsFactors = FALSE
    )
  })
)
Date SE_TMP SE_DW
1 2012-01-01 59.33207 63.41261
2 2012-01-02 42.04597 58.90700
3 2012-01-03 66.15492 51.81897
4 2012-01-04 31.83438 40.68851
5 2012-01-05 30.26666 59.30694
6 2012-01-06 45.05186 55.39751
7 2012-01-07 61.93305 39.76287
8 2012-01-08 37.08246 47.81958
9 2012-01-09 58.54562 64.79331
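Because the data are strictly hourly with 24 rows per day, the same sums can also be sketched with a matrix reshape instead of merge. This assumes no missing or duplicated hours and rows sorted by time; sse_vs_last_day is a hypothetical helper name:

```r
set.seed(1)
dataset <- data.frame(Date = seq(from = as.POSIXct("2012-01-01 00:00", tz = "UTC"),
                                 to = as.POSIXct("2012-01-10 23:00", tz = "UTC"),
                                 by = "hour"),
                      TMP = rnorm(240),
                      DW = rnorm(240))

# One column per day, one row per hour (relies on 24 complete, ordered hours per day)
sse_vs_last_day <- function(x, hours = 24) {
  m <- matrix(x, nrow = hours)
  ref <- m[, ncol(m)]                              # the last day (here 10 January)
  colSums((m[, -ncol(m), drop = FALSE] - ref)^2)   # ref is recycled down each column
}

result <- data.frame(Date = unique(as.Date(dataset$Date))[1:9],
                     SE_TMP = sse_vs_last_day(dataset$TMP),
                     SE_DW = sse_vs_last_day(dataset$DW))
result
```

If hours can be missing, the merge-based version is the safer choice, since it matches rows by hour rather than by position.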
I have the following R matrix:
Date MyVal
2016 1
2017 2
2018 3
....
2026 10
What I want to do is "blow it up" so that it goes like this (where monthly values are linearly interpolated):
Date MyVal
01/01/2016 1
02/01/2016 ..
....
01/01/2017 2
....
01/01/2026 10
I realize I can easily generate the sequence using:
DateVec <- seq(as.Date(paste(minYear,"/01/01", sep = "")), as.Date(paste(maxYear, "/01/01", sep = "")), by = "month")
And I can use that to make a large matrix and then fill things in using a for loop over DateVec, but I wonder if there's a more elegant R way to do this?
You can use stats::approx:
library(stats)
ipc <- approx(df$Date, df$MyVal, xout = DateVec,
rule = 1, method = "linear", ties = mean)
You probably need to first convert the dates in your original data frame so that they include month and day and are in as.POSIXct or as.Date format.
Based on what you provided, this works:
#Make the reference data-frame for interpolation:
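DateVec <- seq(min(df$Date, na.rm=TRUE),
max(df$Date, na.rm=TRUE), by = "month")
#Interpolation (approx() returns a list with x and y; as.data.frame() turns it into a data frame):
intrpltd_df <- as.data.frame(approx(df$Date, df$MyVal, xout = DateVec,
rule = 1, method = "linear", ties = mean))
head(intrpltd_df)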
# x y
# 1 2016-01-01 1.000000
# 2 2016-02-01 1.084699
# 3 2016-03-01 1.163934
# 4 2016-04-01 1.248634
# 5 2016-05-01 1.330601
# 6 2016-06-01 1.415301
Data:
#reproducing the data-frame:
df <- data.frame(Date = as.Date(paste0(2016:2026, "/01/01")), #yyyy-mm-dd format
MyVal = 1:11)
I have a dataframe with a number of accounts, their status and the start and endtime for that status. I would like to report on the number of accounts in each of these statuses over a date range. The data looks like the df below, with the resulting report. (Actual data contains more state values. N/A values are shown with a dummy date far in the future.)
df <- data.frame(account = c(1,1,2,3),
state = c("Open","Closed","Open","Open"),
startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"),
enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)
report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6),
number.open = c(2,2,2,1,1,1)
)
I have looked at options involving rowwise() and mutate from dplyr and foverlaps from data.table, but haven't been able to code it up so it works.
(See Checking if Date is Between two Dates in R)
We can use sapply to do this for us:
report$NumberOpen <-
sapply(report$date, function(x)
sum(as.Date(df1$startdate) < as.Date(x) &
as.Date(df1$enddate) > as.Date(x) &
df1$state == 'Open'))
# report
# date NumberOpen
# 1 2016-04-01 2
# 2 2016-04-02 2
# 3 2016-04-03 2
# 4 2016-04-04 1
# 5 2016-04-05 1
# 6 2016-04-06 1
data
df1 <- data.frame(account = c(1,1,2,3),
state = c("Open","Closed","Open","Open"),
startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"),
enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)
report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6)
)
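For larger reports, the same count can be sketched without an explicit loop by comparing every report date against every "Open" interval with outer(). This keeps the strict inequalities used above, so an account is not counted as open on its start or end date; the intermediate variable names are illustrative:

```r
df1 <- data.frame(account = c(1, 1, 2, 3),
                  state = c("Open", "Closed", "Open", "Open"),
                  startdate = c("2016-01-01", "2016-04-04", "2016-03-02", "2016-08-01"),
                  enddate = c("2016-04-04", "2999-01-01", "2016-05-02", "2016-08-05"))
report <- data.frame(date = seq(as.Date("2016-04-01"), by = "1 day", length.out = 6))

open  <- df1[df1$state == "Open", ]
day   <- as.numeric(report$date)               # days since 1970-01-01
start <- as.numeric(as.Date(open$startdate))
end   <- as.numeric(as.Date(open$enddate))

# Rows are report dates, columns are "Open" intervals
is_open <- outer(day, start, ">") & outer(day, end, "<")
report$NumberOpen <- rowSums(is_open)
report
```

The logical matrix has one cell per (date, interval) pair, so memory grows with the product of the two lengths; for very large inputs a join-based approach may be preferable.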