Processing date and time data in R

Dear all, I have a data frame which comes directly from a sensor. The data provide the date and time in a single column. I want R to be able to recognise this data and then create an adjacent column in the data frame which gives a number that corresponds to a new day in the time and date column. For example 25/02/2011 13:34 in data$time.date would give 1 in the new column data$day, and 26/02/2011 13:34 in data$time.date would get 2 and so on....
Does anyone know how to go about solving this? Thanks in advance for any help.

You can use cut() to bin the date-times by day and take the integer code of each bin. Here is an example with dummy data:
> sdate <- as.POSIXlt("25/02/2011 13:34", format = "%d/%m/%Y %H:%M")
> edate <- as.POSIXlt("02/03/2011 13:34", format = "%d/%m/%Y %H:%M")
> df <- data.frame(dt = seq(from = sdate, to = edate, by = "hours"))
> head(df)
dt
1 2011-02-25 13:34:00
2 2011-02-25 14:34:00
3 2011-02-25 15:34:00
4 2011-02-25 16:34:00
5 2011-02-25 17:34:00
6 2011-02-25 18:34:00
We cut the date-time column into days using cut(). By default this returns a factor with the dates as labels; passing labels = FALSE returns the integer codes 1, 2, ... directly:
> df <- within(df, day <- cut(dt, "day", labels = FALSE))
> head(df, 13)
dt day
1 2011-02-25 13:34:00 1
2 2011-02-25 14:34:00 1
3 2011-02-25 15:34:00 1
4 2011-02-25 16:34:00 1
5 2011-02-25 17:34:00 1
6 2011-02-25 18:34:00 1
7 2011-02-25 19:34:00 1
8 2011-02-25 20:34:00 1
9 2011-02-25 21:34:00 1
10 2011-02-25 22:34:00 1
11 2011-02-25 23:34:00 1
12 2011-02-26 00:34:00 2
13 2011-02-26 01:34:00 2

You can achieve this using cut.POSIXt. For example:
dat <- data.frame(datetimes = Sys.time() - seq(360000, 0, by=-3600))
dat$day <- cut(dat$datetimes, breaks="day", labels=FALSE)
Note that this assumes your date-time column is correctly formatted as a date-time class.
See ?DateTimeClasses for details.
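Both answers assume the column is already a date-time class, whereas sensor exports usually arrive as text. A minimal sketch of the missing parsing step follows, using the column names from the question; the sample values and tz = "UTC" are assumptions made for illustration.

```r
# Hypothetical sensor export: date-times stored as text, as in the question
data <- data.frame(
  time.date = c("25/02/2011 13:34", "25/02/2011 23:59",
                "26/02/2011 00:10", "28/02/2011 09:00"),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct (the time zone is an assumption)
data$time.date <- as.POSIXct(data$time.date,
                             format = "%d/%m/%Y %H:%M", tz = "UTC")

# Calendar-day index: days with no readings still advance the counter
data$day <- cut(data$time.date, breaks = "day", labels = FALSE)

# Alternative: number only the days that actually occur in the data
dates <- as.Date(data$time.date)
data$day_seen <- match(dates, unique(dates))

data
```

The two columns differ only when a whole day is missing from the data: here 28/02/2011 gets 4 in day but 3 in day_seen.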

Related

Time series - Convert every column of dataframe to time series

I have a dataframe df in R:
month abc1 def2 xyz3
201201 1 2 4
201202 2 5 7
201203 4 11 4
201204 6 23 40
I would like to convert each of the columns (of which there are ~50, each with ~100 monthly observations) to a time series format in order to check for seasonality in the data, using the decompose function.
I assumed a for loop using the ts function would be the best way of doing this. I would like to use something along the lines of the loop below, although I realise using a function on the left side of the <- produces an error. Is there a way to dynamically name variables generated by a loop?
for(i in 2:ncol(df)) {
paste(names(df[, i]), "_ts") <- ts(df[ ,i], start = c(2012, 1), end = c(2021,11), frequency = 12)
}
You could try zoo:
test = data.frame(month=c("201201", "201202", "201203", "201204"), abc1=c(1,2,3,4), def2=c(4,6,7,10), xyz3=c(12,15,16,19))
library(zoo)
ZOO =zoo(test[, c("abc1", "def2", "xyz3")], order.by=as.Date(paste0(test$month, "01"), format="%Y%m%d"))
ts(ZOO, frequency=12)
Output:
abc1 def2 xyz3
Jan 1 1 4 12
Feb 1 2 6 15
Mar 1 3 7 16
Apr 1 4 10 19
attr(,"index")
[1] 2012-01-01 2012-02-01 2012-03-01 2012-04-01
Update:
Now with correct frequency.
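On the question of dynamically named variables: a named list is usually easier to work with than generated variable names. A base R sketch under assumptions follows; the data are random placeholders (36 months, so that decompose() has at least two full periods), and ts_list and dec are hypothetical names.

```r
# Placeholder data: 36 monthly observations for two columns
set.seed(1)
df <- data.frame(
  month = format(seq(as.Date("2012-01-01"), by = "month", length.out = 36),
                 "%Y%m"),
  abc1 = rnorm(36),
  def2 = rnorm(36)
)

# One ts object per data column, collected in a named list
ts_list <- lapply(df[-1], ts, start = c(2012, 1), frequency = 12)
names(ts_list) <- paste0(names(ts_list), "_ts")

# Decompose every series; each element has $seasonal, $trend and $random
dec <- lapply(ts_list, decompose)

names(dec)
head(dec$abc1_ts$seasonal, 12)
```

Individual series are then available as ts_list$abc1_ts, ts_list$def2_ts and so on, without assign() or a for loop.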

Count the number of active episodes per month from data with start and end dates

I am trying to get a count of active clients per month, using data that has a start and end date for each client's episode. With the code I am using, I can only count per fixed number of days, and I can't work out how to count per month.
Here is some sample data:
Start.Date <- as.Date(c("2014-01-01", "2014-01-02","2014-01-03","2014-01-03"))
End.Date<- as.Date(c("2014-01-04", "2014-01-03","2014-01-03","2014-01-04"))
Make sure the dates are dates:
Start.Date <- as.Date(Start.Date, "%d/%m/%Y")
End.Date <- as.Date(End.Date, "%d/%m/%Y")
Here is the code I am using, which currently counts the number per day:
library(plyr)
count(Reduce(c, Map(seq, Start.Date, End.Date, by = 1)))
which returns:
x freq
1 2014-01-01 1
2 2014-01-02 2
3 2014-01-03 4
4 2014-01-04 2
The "by" argument can be changed to be however many days I want, but problems arise because months have different lengths.
Would anyone be able to suggest how I can count per month?
Thanks a lot.
note: I now realize that for my example data I have only used dates in the same month, but my real data has dates spanning 3 years.
Here's a solution that seems to work. First, I set the seed so that the example is reproducible.
# Set seed for reproducible example
set.seed(33550336)
Next, I create a dummy data frame.
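# Packages: dplyr for %>% and mutate(), lubridate for day<- (used in mon_seq below)
library(dplyr)
library(lubridate)
# Test data
df <- data.frame(Start_date = as.Date(sample(seq(as.Date('2014/01/01'), as.Date('2015/01/01'), by="day"), 12))) %>%
mutate(End_date = as.Date(Start_date + sample(1:365, 12, replace = TRUE)))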
which looks like,
# Start_date End_date
# 1 2014-11-13 2015-09-26
# 2 2014-05-09 2014-06-16
# 3 2014-07-11 2014-08-16
# 4 2014-01-25 2014-04-23
# 5 2014-05-16 2014-12-19
# 6 2014-11-29 2015-07-11
# 7 2014-09-21 2015-03-30
# 8 2014-09-15 2015-01-03
# 9 2014-09-17 2014-09-26
# 10 2014-12-03 2015-05-08
# 11 2014-08-03 2015-01-12
# 12 2014-01-16 2014-12-12
The function below takes a start date and end date and creates a sequence of months between these dates.
# Sequence of months
mon_seq <- function(start, end){
# Change each day to the first to aid month counting
day(start) <- 1
day(end) <- 1
# Create a sequence of months
seq(start, end, by = "month")
}
Right, this is the tricky bit. I apply my function mon_seq to all rows in the data frame using mapply. This gives the months between each start and end date. Then, I combine all these months together into a vector. I format this vector so that dates just contain months and years. Finally, I pipe (using dplyr's %>%) this into table which counts each occurrence of year-month and I cast as a data frame.
data.frame(format(do.call("c", mapply(mon_seq, df$Start_date, df$End_date)), "%Y-%m") %>% table)
This gives,
# . Freq
# 1 2014-01 2
# 2 2014-02 2
# 3 2014-03 2
# 4 2014-04 2
# 5 2014-05 3
# 6 2014-06 3
# 7 2014-07 3
# 8 2014-08 4
# 9 2014-09 6
# 10 2014-10 5
# 11 2014-11 7
# 12 2014-12 8
# 13 2015-01 6
# 14 2015-02 4
# 15 2015-03 4
# 16 2015-04 3
# 17 2015-05 3
# 18 2015-06 2
# 19 2015-07 2
# 20 2015-08 1
# 21 2015-09 1
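The same idea can be sketched in base R without dplyr or lubridate. The three episodes below are made-up illustration data, and first_of_month is a hypothetical helper; an episode counts as active in every month it touches.

```r
# Made-up episodes spanning several months
Start.Date <- as.Date(c("2014-01-15", "2014-02-10", "2014-03-01"))
End.Date   <- as.Date(c("2014-03-05", "2014-02-20", "2014-04-30"))

# Hypothetical helper: snap a date to the first day of its month
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# For each episode, list the months it touches
months_active <- lapply(seq_along(Start.Date), function(i) {
  seq(first_of_month(Start.Date[i]), first_of_month(End.Date[i]),
      by = "month")
})

# Pool all months and count how many episodes touch each one
counts <- table(format(do.call(c, months_active), "%Y-%m"))
counts
```

Snapping both ends to the first of the month avoids the problem with seq(..., by = "month") skipping a month when the start day is late in the month.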

R: Create a New Column in R to determine Semester Based on Two Dates

I have some data. ID and date and I'm trying to create a new field for semester.
df:
id date
1 20160822
2 20170109
3 20170828
4 20170925
5 20180108
6 20180402
7 20160711
8 20150831
9 20160111
10 20160502
11 20160829
12 20170109
13 20170501
I also have a semester table:
start end season_year
20120801 20121222 Fall-2012
20121223 20130123 Winter-2013
20130124 20130523 Spring-2013
20130524 20130805 Summer-2013
20130806 20131228 Fall-2013
20131229 20140122 Winter-2014
20140123 20140522 Spring-2014
20140523 20140804 Summer-2014
20140805 20141227 Fall-2014
20141228 20150128 Winter-2015
20150129 20150528 Spring-2015
20150529 20150803 Summer-2015
20150804 20151226 Fall-2015
20151227 20160127 Winter-2016
20160128 20160526 Spring-2016
20160527 20160801 Summer-2016
20160802 20161224 Fall-2016
20161225 20170125 Winter-2017
20170126 20170525 Spring-2017
20170526 20170807 Summer-2017
20170808 20171230 Fall-2017
20171231 20180124 Winter-2018
20180125 20180524 Spring-2018
20180525 20180806 Summer-2018
20180807 20181222 Fall-2018
20181223 20190123 Winter-2019
20190124 20190523 Spring-2019
20190524 20180804 Summer-2019
I'd like to create a new field in df: if df$date falls between semester$start and semester$end, place the corresponding semester$season_year value in df.
I looked at whether the lubridate package could help, but that seems to be more for calculations.
I saw this question and it seems to be the closest to what I want but, to make things more complicated, not all of our semesters are six months long.
Does this work?
library(lubridate)
semester$start <- ymd(semester$start)
semester$end <- ymd(semester$end)
df$date <- ymd(df$date)
LU <- Map(`:`, semester$start, semester$end)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
df$semester <- semester$season_year[LU$index[match(df$date, LU$value)]]
A solution using a non-equi update join with the data.table package:
library(data.table)
setDT(df)
setDT(semester)
df[,date:=as.IDate(as.character(date), format = "%Y%m%d")]
semester[,':='(start = as.IDate(as.character(start), format = "%Y%m%d"),
end=as.IDate(as.character(end), format = "%Y%m%d"))]
df[semester, on=.(date >= start, date <= end), season_year := i.season_year]
df
# id date season_year
# 1: 1 2016-08-22 Fall-2016
# 2: 2 2017-01-09 Winter-2017
# 3: 3 2017-08-28 Fall-2017
# 4: 4 2017-09-25 Fall-2017
# 5: 5 2018-01-08 Winter-2018
# 6: 6 2018-04-02 Spring-2018
# 7: 7 2016-07-11 Summer-2016
# 8: 8 2015-08-31 Fall-2015
# 9: 9 2016-01-11 Winter-2016
# 10: 10 2016-05-02 Spring-2016
# 11: 11 2016-08-29 Fall-2016
# 12: 12 2017-01-09 Winter-2017
# 13: 13 2017-05-01 Spring-2017
Data:
df <- read.table(text="
id date
1 20160822
2 20170109
3 20170828
4 20170925
5 20180108
6 20180402
7 20160711
8 20150831
9 20160111
10 20160502
11 20160829
12 20170109
13 20170501",
header = TRUE, stringsAsFactors = FALSE)
semester <- read.table(text="
start end season_year
20120801 20121222 Fall-2012
20121223 20130123 Winter-2013
20130124 20130523 Spring-2013
20130524 20130805 Summer-2013
20130806 20131228 Fall-2013
20131229 20140122 Winter-2014
20140123 20140522 Spring-2014
20140523 20140804 Summer-2014
20140805 20141227 Fall-2014
20141228 20150128 Winter-2015
20150129 20150528 Spring-2015
20150529 20150803 Summer-2015
20150804 20151226 Fall-2015
20151227 20160127 Winter-2016
20160128 20160526 Spring-2016
20160527 20160801 Summer-2016
20160802 20161224 Fall-2016
20161225 20170125 Winter-2017
20170126 20170525 Spring-2017
20170526 20170807 Summer-2017
20170808 20171230 Fall-2017
20171231 20180124 Winter-2018
20180125 20180524 Spring-2018
20180525 20180806 Summer-2018
20180807 20181222 Fall-2018
20181223 20190123 Winter-2019
20190124 20190523 Spring-2019
20190524 20180804 Summer-2019",
header = TRUE, stringsAsFactors = FALSE)
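If the semester table is sorted by start date, a base R lookup with findInterval() is another option. This is a sketch on a four-row subset of the tables above, not a replacement for either answer; to_date, idx and inside are hypothetical names.

```r
# Subset of the semester table and of df from the question
semester <- data.frame(
  start = c(20160527L, 20160802L, 20161225L, 20170126L),
  end   = c(20160801L, 20161224L, 20170125L, 20170525L),
  season_year = c("Summer-2016", "Fall-2016", "Winter-2017", "Spring-2017"),
  stringsAsFactors = FALSE
)
df <- data.frame(id = 1:4,
                 date = c(20160822L, 20170109L, 20160711L, 20170501L))

# Hypothetical helper: YYYYMMDD integer (or string) to Date
to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")
semester$start <- to_date(semester$start)
semester$end   <- to_date(semester$end)
df$date        <- to_date(df$date)

# Index of the last semester starting on or before each date (0 = none)
idx <- findInterval(as.numeric(df$date), as.numeric(semester$start))
idx[idx == 0] <- NA

# Keep the match only if the date is also on or before that semester's end
inside <- !is.na(idx) & df$date <= semester$end[idx]
df$season_year <- ifelse(inside, semester$season_year[idx], NA)
df
```

findInterval() needs the start dates in non-decreasing order, so the semester table should be sorted first if that is not guaranteed.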

Convert YYYYMMDD to mm/dd/yyyy format in R

I have a dataframe in R, which has two variables that are dates and I need to calculate the difference in days between them. However, they are formatted as YYYYMMDD. How do I change it to a date format readable in R?
This should work
lubridate::ymd(given_date_format)
I like anydate() from the anytime package. Quick demo, with actual data:
R> set.seed(123) # be reproducible
R> data <- data.frame(inp=Sys.Date() + cumsum(runif(10)*10))
R> data$ymd <- format(data$inp, "%Y%m%d") ## as yyyymmdd
R> data$int <- as.integer(data$ymd) ## same as integer
R> library(anytime)
R> data$diff1 <- c(NA, diff(anydate(data$ymd))) # reads YMD
R> data$diff2 <- c(NA, diff(anydate(data$int))) # also reads int
R> data
inp ymd int diff1 diff2
1 2017-06-23 20170623 20170623 NA NA
2 2017-07-01 20170701 20170701 8 8
3 2017-07-05 20170705 20170705 4 4
4 2017-07-14 20170714 20170714 9 9
5 2017-07-24 20170724 20170724 10 10
6 2017-07-24 20170724 20170724 0 0
7 2017-07-29 20170729 20170729 5 5
8 2017-08-07 20170807 20170807 9 9
9 2017-08-13 20170813 20170813 6 6
10 2017-08-17 20170817 20170817 4 4
R>
Here the first column holds the actual dates we work from. Columns two and three are then generated to match the OP's requirement: YMD, as either character or integer.
We then compute the differences on them, padding with NA for the first row, which has no predecessor, and show that either input format works.
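For completeness, the same can be sketched in base R with no extra packages. The two date columns d1 and d2 below are made-up names and values; the last line covers the mm/dd/yyyy display format asked for in the title.

```r
# Made-up data: one column stored as integer, one as character
dat <- data.frame(d1 = c(20170623L, 20170701L),
                  d2 = c("20170705", "20170714"),
                  stringsAsFactors = FALSE)

# as.Date() needs text input, so convert the integer column first
dat$d1 <- as.Date(as.character(dat$d1), format = "%Y%m%d")
dat$d2 <- as.Date(dat$d2, format = "%Y%m%d")

# Subtracting two Date columns gives the difference in days
dat$days <- as.numeric(dat$d2 - dat$d1)

# Formatting is for display only; the result is character, not Date
dat$d1_us <- format(dat$d1, "%m/%d/%Y")
dat
```

Keep the Date columns for any arithmetic and use the formatted strings only for output, since character dates cannot be subtracted.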

Converting sets of calendar dates to Julian days in a data frame

I am a beginner in R and I am trying to convert sets of calendar dates to sets of Julian dates in a data frame. I know similar questions have been answered, but I have not been able to get what I want.
df <- data.frame(Date = c('2010-06-20','2005-10-19','2000-05-01','2003-04-04','2010-11-20','2009-09-14'), No = c(1, 4, 6, 11, 7, 9))
df$ jDate <- as.POSIXct(as.numeric(df$Date), origin = '1970-01-01')
gives me
df
Date No cDate
1 2010-06-20 1 1969-12-31 19:00:05
2 2005-10-19 4 1969-12-31 19:00:03
3 2000-05-01 6 1969-12-31 19:00:01
4 2003-04-04 11 1969-12-31 19:00:02
5 2010-11-20 7 1969-12-31 19:00:06
6 2009-09-14 9 1969-12-31 19:00:04
How could I get a column with Julian days in the column 'jDate'?
Thank you for your help.
You can do
df$Date <- as.Date(df$Date)
to get the date, and then
df$jDate <- format(df$Date, "%j")
to get the Julian days (day of the year, 001-366, as a character string) or
df$jDateYr <- format(df$Date, "%Y-%j")
to prepend the year (if you want). This returns
df
Date No jDate jDateYr
1 2010-06-20 1 171 2010-171
2 2005-10-19 4 292 2005-292
3 2000-05-01 6 122 2000-122
4 2003-04-04 11 094 2003-094
5 2010-11-20 7 324 2010-324
6 2009-09-14 9 257 2009-257
To read more about the possible date-time formats, see ?strptime.
Based on aosmith's comments, I did this and got what I wanted.
> df$jDate <- julian(as.Date(df$Date), origin = as.Date('1970-01-01'))
df
Date No jDate
1 2010-06-20 1 14780
2 2005-10-19 4 13075
3 2000-05-01 6 11078
4 2003-04-04 11 12146
5 2010-11-20 7 14933
6 2009-09-14 9 14501
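The two answers return different things, which is easy to miss: format(x, "%j") gives the day of the year as text, while julian() gives the number of days since an origin. A short sketch on two of the dates above, with hypothetical variable names:

```r
d <- as.Date(c("2010-06-20", "2003-04-04"))

# Day of the year, converted from text ("171", "094") to integer
day_of_year <- as.integer(format(d, "%j"))

# Days since 1970-01-01, the default origin of julian() for Date input
days_since_epoch <- as.numeric(julian(d))

# Days since an origin of your choosing
days_since_2000 <- as.numeric(julian(d, origin = as.Date("2000-01-01")))

data.frame(d, day_of_year, days_since_epoch, days_since_2000)
```

The as.numeric() call drops the "origin" attribute that julian() attaches, which keeps the data frame column plain.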
