Conditional statement in R

I'm struggling to write an if-then statement in R. I have a variable called diel, and I would like it to be either "day" or "night" based on the values of a variable called hour. I wrote the code first in SAS, and it looks like this:
length diel $5;
if 7 <= hour < 17 then diel = 'day';
if 19 <= hour <= 24 then diel = 'night';
if 0 <= hour < 5 then diel = 'night';
run;
As you can see, the hours of dusk (17-19) and dawn (5-7) are left out. This is really the problem I'm having in R: I can't figure out how to leave out dusk and dawn. When I write:
dat4$diel <- ifelse ((dat4$hour)< 17, ifelse((dat4$hour) <=7,"day","night"),"night")
it labels the correct hours as day but labels everything else as night. When I try any other combination, like adding another ifelse statement, it overwrites the first statement and labels all of the hours as day. Thanks for any suggestions!

Something like this might do:
hour <- 0:24
c('night', NA, 'day', NA, 'night')[findInterval(hour, c(0,5,7,17,19,24), rightmost.closed=TRUE)]
## [1] "night" "night" "night" "night" "night" NA NA
## [8] "day" "day" "day" "day" "day" "day" "day"
## [15] "day" "day" "day" NA NA "night" "night"
## [22] "night" "night" "night" "night"
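To see what the indexing above does, it helps to look at the raw interval numbers. In this sketch (the sample hours are arbitrary), findInterval() returns, for each hour, the index i of the interval [breaks[i], breaks[i+1]) it falls in, and that index picks the matching label. rightmost.closed = TRUE is what puts hour 24 into the last interval instead of past it.

```r
breaks <- c(0, 5, 7, 17, 19, 24)
labels <- c('night', NA, 'day', NA, 'night')  # one label per interval

hours <- c(4, 5, 16, 17, 24)
# [0,5) -> 1, [5,7) -> 2, [7,17) -> 3, [17,19) -> 4, [19,24] -> 5
idx <- findInterval(hours, breaks, rightmost.closed = TRUE)
idx
labels[idx]

# Without rightmost.closed, 24 falls beyond the last break (index 6),
# and labels[6] would silently be NA.
findInterval(24, breaks)
```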

I would do this:
dat <- data.frame(hour = 0:24)
transform(dat, diel = ifelse(hour >= 7 & hour < 17, 'day',
                             ifelse(hour >= 19 | hour < 5, 'night', NA)))
hour diel
1 0 night
2 1 night
3 2 night
4 3 night
5 4 night
6 5 <NA>
7 6 <NA>
8 7 day
9 8 day
10 9 day
11 10 day
12 11 day
13 12 day
14 13 day
15 14 day
16 15 day
17 16 day
18 17 <NA>
19 18 <NA>
20 19 night
21 20 night
22 21 night
23 22 night
24 23 night
25 24 night
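The overwriting described in the question goes away if the result starts as NA and each SAS condition is assigned in turn, the way the SAS data step does it. A minimal sketch; label_diel is a hypothetical helper name, and any hour outside 0-24 simply stays NA:

```r
# Hypothetical helper mirroring the SAS if/then statements one at a time.
label_diel <- function(hour) {
  diel <- rep(NA_character_, length(hour))  # dusk and dawn stay NA
  diel[hour >= 7 & hour < 17] <- "day"
  diel[hour >= 19 & hour <= 24] <- "night"
  diel[hour >= 0 & hour < 5] <- "night"
  diel
}

label_diel(c(0, 5, 7, 16, 17, 19, 24))
```

On the question's data this would be used as dat4$diel <- label_diel(dat4$hour). Because each assignment only touches the rows matching its own condition, later statements cannot overwrite earlier ones.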

Related

Calculating week numbers WITHOUT a yearwise reset (i.e. week_id = 55 is valid and shows it is a year after) + with a specified start date

This probably seems straightforward, but I am pretty stumped.
I have a set of dates around August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08, and the date 2011-09-03 is week 142. Note that this differs from the usual week number because the count does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
  dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16", "2008-06-03", "2009-11-14", "2010-05-05", "2011-09-03"))
data$date <- as.Date(data$dates)
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This takes the difference in days between the two dates and counts the complete 7-day periods elapsed (integer division with %/%). I add one because dates where zero weeks have elapsed since the start should be week 1 instead of week 0.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
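The question's end goal is summing sales by week. Here is a sketch of the same formula wrapped in a small function and fed to aggregate(); week_id() and the daily sales figures are hypothetical, made up only to show the shape of the result:

```r
start <- as.Date("2008-12-08")  # first day of week 1

# %/% floors, so dates before `start` get week 0, -1, ... as in the table above
week_id <- function(date, start) {
  as.integer(date - start) %/% 7L + 1L
}

# Hypothetical daily sales, only to illustrate the weekly sum
daily <- data.frame(
  date  = as.Date(c("2008-12-08", "2008-12-10", "2008-12-15", "2011-09-03")),
  sales = c(10, 5, 7, 3)
)
daily$week_id <- week_id(daily$date, start)
weekly <- aggregate(sales ~ week_id, data = daily, FUN = sum)
weekly
```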

Get the first date of the first five-consecutive dates in a list using R

I have the following data:
df <- structure(list(V1 = c("1979-01-28", "1979-01-29", "1979-01-30",
"1979-02-13", "1979-02-14", "1979-02-17", "1979-02-18", "1979-02-19",
"1979-02-20", "1979-02-21", "1979-02-22", "1979-02-23", "1979-03-07",
"1979-03-14", "1979-03-18", "1979-03-29", "1979-03-30", "1979-03-31",
"1979-04-01", "1979-04-02", "1979-04-03", "1979-04-04", "1979-04-05")), class =
"data.frame", row.names = c(NA, -23L))
This is a list of dates. The interval is daily but with gaps.
I would like to get the first date of the earliest run of five consecutive days.
So in the example above, the expected output is "1979-02-17".
Right now, I am getting the dates manually. How can I do this in R?
I'd appreciate any help on this.
Using rle and diff.
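df$V1[with(rle(diff(as.Date(df$V1)) == 1), {
  # five consecutive dates contain four one-day steps
  inds <- which.max(values & lengths >= 4)
  # seq_len() keeps this correct when the run starts at the first date
  sum(lengths[seq_len(inds - 1)]) + 1
})]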
#[1] "1979-02-17"
How about
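df = data.frame("V1" = df$V1)
# time from each date to the next one (the last row has no successor, hence NA)
df$V2 = difftime(c(tail(df$V1, -1), NA), df$V1)
# length of each run of identical gaps, repeated for every row in that run
tmp = rle(as.numeric(df$V2))
df$V3 = rep(tmp$lengths, tmp$lengths)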
df
V1 V2 V3
1 1979-01-28 24 hours 2
2 1979-01-29 24 hours 2
3 1979-01-30 336 hours 1
4 1979-02-13 24 hours 1
5 1979-02-14 72 hours 1
6 1979-02-17 24 hours 6
7 1979-02-18 24 hours 6
8 1979-02-19 24 hours 6
9 1979-02-20 24 hours 6
10 1979-02-21 24 hours 6
11 1979-02-22 24 hours 6
12 1979-02-23 288 hours 1
13 1979-03-07 168 hours 1
14 1979-03-14 96 hours 1
15 1979-03-18 264 hours 1
16 1979-03-29 24 hours 3
17 1979-03-30 24 hours 3
18 1979-03-31 24 hours 3
19 1979-04-01 23 hours 1
20 1979-04-02 24 hours 3
21 1979-04-03 24 hours 3
22 1979-04-04 24 hours 3
23 1979-04-05 NA hours 1
df$V1[which.max(df$V3>=5)]
[1] "1979-02-17"
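Both answers hard-code the run length and assume the dates are sorted. Below is a sketch of a more general helper, first_run_start() (a hypothetical name), which sorts and de-duplicates the dates, takes the run length k as a parameter, and returns NA when no run is long enough. It uses the same rle()/diff() idea as the first answer.

```r
first_run_start <- function(dates, k = 5) {
  dates <- sort(unique(as.Date(dates)))
  # TRUE where a date is exactly one day after the previous one
  step1 <- diff(dates) == 1
  runs <- rle(step1)
  # k consecutive dates contain k - 1 one-day steps
  hit <- which(runs$values & runs$lengths >= k - 1)
  if (length(hit) == 0) return(as.Date(NA))
  # position of the first date of the first qualifying run
  dates[sum(runs$lengths[seq_len(hit[1] - 1)]) + 1]
}
```

With the question's data, first_run_start(df$V1) returns "1979-02-17" as a Date, and first_run_start(df$V1, k = 8) picks the later run starting on "1979-03-29".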

Getting Mean of all aggregated values for every quarter hour in dataframe and assigning

I have some sampled data from a sensor with irregular time differences between samples, which looks like this:
> Y_cl[[1]]
index Date time Glucose POS
10 11 2017-06-10 03:01:00 136 2017-06-10 00:01:00
14 15 2017-06-10 03:06:00 132 2017-06-10 00:06:00
18 19 2017-06-10 03:11:00 133 2017-06-10 00:11:00
22 23 2017-06-10 03:16:00 130 2017-06-10 00:16:00
26 27 2017-06-10 03:20:59 119 2017-06-10 00:20:59
30 31 2017-06-10 03:26:00 115 2017-06-10 00:26:00
34 35 2017-06-10 03:30:59 117 2017-06-10 00:30:59
38 39 2017-06-10 03:36:00 114 2017-06-10 00:36:00
42 43 2017-06-10 03:40:59 113 2017-06-10 00:40:59
The data are stored as data frames in the list Y_cl; each list element holds one day. I am trying to select ALL samples within each quarter hour of the clock and take their mean, giving 4 points for each hour of each day, defined mathematically (NOT CODE) as:
mean(Glucose(H:00 <Y_cl[[1]]$time< H:15))==> Glucose_av(H:00),
mean(Glucose(H:15 <Y_cl[[1]]$time< H:30))==> Glucose_av(H:15),
mean(Glucose(H:30 <Y_cl[[1]]$time< H:45))==> Glucose_av(H:30),
mean(Glucose(H:45 <Y_cl[[1]]$time< (H+1):00))==>Glucose_av(H:45)
I have searched, but only found how to select or cut data at 15-minute differences, whereas I need to group each hour's data by the quarter of the hour it falls in, average it, and assign the result to the corresponding quarter. Y_cl[[1]]['POS'] is in standard POSIXct format. Any help would be appreciated.
Here is a solution using the lubridate and plyr packages:
library(lubridate)
library(plyr)
data <- Y_cl[[1]]  # one day's data frame from the question
data$POS <- as.POSIXct(paste(data$Date, data$time)) # POS correction: rebuild POS from Date and time
data$day <- day(data$POS)       # extract day
data$hour <- hour(data$POS)     # extract hour
data$minute <- minute(data$POS) # extract minute
Create a new variable for the quarter:
data$quarter <- NA
data$quarter[data$minute >= 0 & data$minute < 15] <- "q1" # 1st quarter
data$quarter[data$minute >= 15 & data$minute < 30] <- "q2" # 2nd quarter
data$quarter[data$minute >= 30 & data$minute < 45] <- "q3" # 3rd quarter
data$quarter[data$minute >= 45 & data$minute < 60] <- "q4" # 4th quarter
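The four assignments above can also be collapsed into one vectorized line using integer division. A sketch on a few made-up minute values; on the question's data the same line would read data$quarter <- paste0("q", data$minute %/% 15 + 1).

```r
minute <- c(0, 14, 15, 29, 30, 44, 45, 59)
# minute %/% 15 is 0, 1, 2 or 3; adding 1 gives the quarter number
quarter <- paste0("q", minute %/% 15 + 1)
quarter
```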
Summarize the data for each quarter (compute the mean of Glucose for each combination of day, hour and quarter):
output <- ddply(data, c("day", "hour", "quarter"), summarise, result = mean(Glucose))
Result:
> output
day hour quarter result
1 10 3 q1 133.6667
2 10 3 q2 121.3333
3 10 3 q3 114.6667
I did it by dividing the minutes of each time stamp by 15 and taking the floor, where YPOS is a list holding the time stamps of each day i in the list Y_cl:
SeI <- function(i){
###get the minutes, divide by 15, keep the floor and multiply by 15 (0, 15, 30 or 45), store in K1
K1 <- (floor((as.numeric(strftime(YPOS[[i]], format="%M", tz="GMT")))/15))*15
###keep the date and hour of each time stamp, store in K2
K2 <- strftime(YPOS[[i]], format="%Y-%m-%d %H", tz="GMT")
###paste K2 and K1 together and save in POSIXct format as T_av
TT <- paste0(K2, ':', K1)
T_av <- as.POSIXct(TT, format="%Y-%m-%d %H:%M", tz="GMT")
T_av
}
Then apply it over all days in the list:
lapply(seq_along(Y_cl), SeI)
My solution included taking the time stamps from the list Y_cl and saving them in YPOS.
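The answer above labels each sample with its quarter-hour start but stops before averaging. The sketch below does both steps on the question's first nine rows, rebuilt by hand and assumed to be in UTC. It floors the POSIXct seconds to multiples of 900 (15 minutes) instead of going through strftime() and paste0(); quarter_start() is a hypothetical helper name.

```r
# First rows of the question's sample (POS and Glucose only), assumed UTC
day1 <- data.frame(
  POS = as.POSIXct(c("2017-06-10 00:01:00", "2017-06-10 00:06:00",
                     "2017-06-10 00:11:00", "2017-06-10 00:16:00",
                     "2017-06-10 00:20:59", "2017-06-10 00:26:00",
                     "2017-06-10 00:30:59", "2017-06-10 00:36:00",
                     "2017-06-10 00:40:59"), tz = "UTC"),
  Glucose = c(136, 132, 133, 130, 119, 115, 117, 114, 113)
)

# Floor each time stamp to the start of its quarter hour (900 s = 15 min)
quarter_start <- function(t, tz = "UTC") {
  as.POSIXct(floor(as.numeric(t) / 900) * 900, origin = "1970-01-01", tz = tz)
}

day1$quarter <- quarter_start(day1$POS)
avg <- aggregate(Glucose ~ quarter, data = day1, FUN = mean)
avg
```

For the whole list, something like lapply(Y_cl, function(d) { d$quarter <- quarter_start(d$POS); aggregate(Glucose ~ quarter, data = d, FUN = mean) }) would give one table per day. Flooring in UTC seconds lines up with local quarter hours only when the time zone offset is a whole number of quarter hours, which is the usual case.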

Finding the time interval in which a date occurs

I'm working on time-series analyses and I'm hoping to develop multiple datasets with different units of analysis. Namely: the units in data set 1 will be districts in country X for 2-week periods within a span of 4 years (districtYearPeriodCode), the units in data set 2 will be districts in country X for 4-week periods within a span of 4 years, and so forth.
I have created a number of data frames containing start and end dates for each interval, as well as an interval ID. The one below is for the 2-week intervals.
library(lubridate)
begin <- seq(ymd('2004-01-01'), ymd('2004-06-30'), by = as.difftime(weeks(2)))
end <- seq(ymd('2004-01-14'), ymd('2004-06-30'), by = as.difftime(weeks(2)))
interval <- seq(1, 13, 1)
df2 <- data.frame(begin, end, interval)
begin end interval
1 2004-01-01 2004-01-14 1
2 2004-01-15 2004-01-28 2
3 2004-01-29 2004-02-11 3
4 2004-02-12 2004-02-25 4
5 2004-02-26 2004-03-10 5
6 2004-03-11 2004-03-24 6
7 2004-03-25 2004-04-07 7
8 2004-04-08 2004-04-21 8
9 2004-04-22 2004-05-05 9
10 2004-05-06 2004-05-19 10
11 2004-05-20 2004-06-02 11
12 2004-06-03 2004-06-16 12
13 2004-06-17 2004-06-30 13
In addition to this I have a data frame that contains observations for events, dates included. It looks something like this:
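dates5 <- as.Date(c("2004-01-01", "2004-01-02", "2004-01-03", "2004-01-04", "2004-01-05"))  # example rows
districts5 <- c("d1", "d2", "d3", "d4", "d5")
new.df3 <- data.frame(dates5, districts5)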
new.df3
dates5 districts5
1 2004-01-01 d1
2 2004-01-02 d2
3 2004-01-03 d3
4 2004-01-04 d4
5 2004-01-05 d5
Is there a function I can write or a command I can use to end up with something like this?
dates5 districts5 interval5
1 2004-01-01 d1 1
2 2004-01-02 d2 1
3 2004-01-03 d3 1
4 2004-01-04 d4 1
5 2004-01-05 d5 1
I have been trying to find an answer in the lubridate package and in other threads, but all answers seem to be tailored to finding out whether a date falls within a specific time interval, rather than identifying which interval in a group of intervals a date falls into.
Much appreciated!
I used the purrr approach outlined by @alistair in another thread. I reproduce it below:
library(purrr)  # map(), map_chr() and the %>% pipe
elements %>%
  map(~intervals$phase[.x >= intervals$start & .x <= intervals$end]) %>%
  # Clean up a bit. Shorter, but less readable: map_chr(~.x[1] %||% NA_character_)
  map_chr(~ifelse(length(.x) == 0, NA_character_, .x))
## [1] "a" "a" "a" NA "b" "b" "c"
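Because the intervals in df2 are sorted and contiguous, base R's findInterval() (the same function as in the day/night answer) can also do the lookup in one vectorized call. This is a sketch under those assumptions (begin sorted, no overlaps); events is a hypothetical stand-in for new.df3 with one date past the last interval, and df2 is rebuilt without lubridate.

```r
# 2-week intervals as in the question, built with base R
begin <- seq(as.Date("2004-01-01"), as.Date("2004-06-30"), by = "2 weeks")
df2 <- data.frame(begin, end = begin + 13, interval = seq_along(begin))

# Hypothetical events, including one date after the last interval
events <- data.frame(
  dates5 = as.Date(c("2004-01-01", "2004-01-14", "2004-01-15", "2004-06-30", "2004-07-01")),
  districts5 = c("d1", "d2", "d3", "d4", "d5")
)

# position of the last `begin` that is <= each date (0 if none)
idx <- findInterval(events$dates5, df2$begin)
idx[idx == 0] <- NA                                   # before the first interval
inside <- !is.na(idx) & events$dates5 <= df2$end[idx] # not past that interval's end
events$interval5 <- ifelse(inside, df2$interval[idx], NA)
events
```

If the intervals had gaps or overlapped, the explicit comparison in the purrr answer would be the safer choice.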

How to Parse Year + Week Number in R?

Is there a good way to convert a year + week number to a date in R? I have tried the following:
> as.POSIXct("2008 41", format="%Y %U")
[1] "2008-02-21 EST"
> as.POSIXct("2008 42", format="%Y %U")
[1] "2008-02-21 EST"
According to ?strftime:
%Y Year with century. Note that whereas there was no zero in the
original Gregorian calendar, ISO 8601:2004 defines it to be valid
(interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note
that the standard also says that years before 1582 in its calendar
should only be used with agreement of the parties involved.
%U Week of the year as decimal number (00–53) using Sunday as the
first day 1 of the week (and typically with the first Sunday of the
year as day 1 of week 1). The US convention.
This is kinda like another question you may have seen before. :)
The key issue is: what day should a week number specify? Is it the first day of the week? The last? That's ambiguous. I don't know if week one is the first day of the year or the 7th day of the year, or possibly the first Sunday or Monday of the year (which is a frequent interpretation). (And it's worse than that: these generally appear to be 0-indexed, rather than 1-indexed.) So, an enumerated day of the week needs to be specified.
For instance, try this:
as.POSIXlt("2008 42 1", format = "%Y %U %u")
The %u indicator specifies the day of the week (1-7, Monday is 1).
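To avoid depending on how a platform's strptime treats %U on input, the date can also be computed directly under the US convention quoted above (week 1 starts on the first Sunday of the year). A sketch; week_start_us() is a hypothetical helper and returns the Sunday that starts the requested week:

```r
# Hypothetical helper: the Sunday that starts week `week` of `year`
# under the %U (US) convention, where week 1 begins on the first Sunday.
week_start_us <- function(year, week) {
  jan1 <- as.Date(sprintf("%d-01-01", year))
  wday <- as.POSIXlt(jan1)$wday            # 0 = Sunday, ..., 6 = Saturday
  first_sunday <- jan1 + (7 - wday) %% 7   # day 1 of week 1
  first_sunday + (week - 1) * 7
}

wk42 <- week_start_us(2008, 42)
wk42       # the Sunday starting week 42 of 2008
wk42 + 1   # the Monday of that week
```

Week 0 (the partial week before the first Sunday) maps to a start date in the previous December, which reflects the ambiguity discussed above.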
Additional note: See ?strptime for the various options for format conversion. It's important to be careful about the enumeration of weeks, as these can be split across the end of the year, and day 1 is ambiguous: is it specified based on a Sunday or Monday, or from the first day of the year? This should all be specified and tested on the different systems where the R code will run. I'm not certain that Windows and POSIX systems sing the same tune on some of these conversions, hence I'd test and test again.
Day-of-week == zero in the POSIXlt DateTimeClasses system is Sunday. Not exactly Biblical, and not in line with R's convention of indexing from 1 either, but that's what it is. Week zero is the first (partial) week in the year. Week one (but day of week zero) starts with the first Sunday. And all the other sequence types in POSIXlt have 0 as their starting point, except $mday, which runs from 1. It is kind of interesting to see what modifying the list elements of POSIXlt objects does. The only way you can actually change a POSIXlt date is to alter the $year, the $mon or the $mday elements. The others seem to be epiphenomena.
today <- as.POSIXlt(Sys.Date())
today # Tuesday
#[1] "2012-02-21 UTC"
today$wday <- 0 # attempt to make it Sunday
today
# [1] "2012-02-21 UTC" The attempt fails
today$mday <- 19
today
#[1] "2012-02-19 UTC" Success
I did not come up with this myself (it's taken from a blog post by Forester), but nevertheless I thought I'd add this to the answer list because it's the first implementation of the ISO 8601 week number convention that I've seen in R.
No doubt, week numbers are a very ambiguous topic, but I prefer an ISO standard over the current implementation of week numbers via format(..., "%U") because it seems that this is what most people agreed on, at least in Germany (calendars etc.).
I've put the actual function def at the bottom to facilitate focusing on the output first. Also, I just stumbled across package ISOweek, maybe worth a try.
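For output (not parsing), base R's format() also documents %V for the ISO 8601 week number and %G for the matching week-based year; ?strptime notes that %V is accepted but ignored on input. A quick check on a few January 2012 dates follows. Platform support for these conversions can differ, so, as said above, test it where the code will run.

```r
d <- as.Date(c("2012-01-01", "2012-01-02", "2012-01-08", "2012-01-09"))
# %V: ISO week (Monday start, week 1 contains 4 January); %G: its ISO year
data.frame(date = d, iso_year = format(d, "%G"), iso_week = format(d, "%V"))
```

Sunday 2012-01-01 belongs to ISO week 52 of 2011, which is the case the blog function gets wrong in the comparison.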
Approach Comparison
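# assumed inputs: every day of January 2012 and R's own %U week numbers
posix <- seq(as.POSIXct("2012-01-01", tz = "Europe/Berlin"), by = "day", length.out = 31)
weeknum <- as.numeric(format(posix, "%U"))
x.days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")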
x.names <- sapply(1:length(posix), function(x) {
x.day <- as.POSIXlt(posix[x], tz="Europe/Berlin")$wday
if (x.day == 0) {
x.day <- 7
}
out <- x.days[x.day]
})
data.frame(
posix,
name=x.names,
week.r=weeknum,
week.iso=ISOweek(as.character(posix), tzone="Europe/Berlin")$weeknum
)
# Result
posix name week.r week.iso
1 2012-01-01 Sun 1 4480458
2 2012-01-02 Mon 1 1
3 2012-01-03 Tue 1 1
4 2012-01-04 Wed 1 1
5 2012-01-05 Thu 1 1
6 2012-01-06 Fri 1 1
7 2012-01-07 Sat 1 1
8 2012-01-08 Sun 2 1
9 2012-01-09 Mon 2 2
10 2012-01-10 Tue 2 2
11 2012-01-11 Wed 2 2
12 2012-01-12 Thu 2 2
13 2012-01-13 Fri 2 2
14 2012-01-14 Sat 2 2
15 2012-01-15 Sun 3 2
16 2012-01-16 Mon 3 3
17 2012-01-17 Tue 3 3
18 2012-01-18 Wed 3 3
19 2012-01-19 Thu 3 3
20 2012-01-20 Fri 3 3
21 2012-01-21 Sat 3 3
22 2012-01-22 Sun 4 3
23 2012-01-23 Mon 4 4
24 2012-01-24 Tue 4 4
25 2012-01-25 Wed 4 4
26 2012-01-26 Thu 4 4
27 2012-01-27 Fri 4 4
28 2012-01-28 Sat 4 4
29 2012-01-29 Sun 5 4
30 2012-01-30 Mon 5 5
31 2012-01-31 Tue 5 5
Function Def
It's taken directly from the blog post; I've just changed a couple of minor things. The function is still kind of sketchy (e.g. the week number of the first date is far off), but I find it to be a nice start!
ISOweek <- function(
date,
format="%Y-%m-%d",
tzone="UTC",
return.val="weekofyear"
){
##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
##date should be a vector of class Date or a vector of formatted character strings
##format refers to the date form used if a vector of
## character strings is supplied
##convert date to POSIXt format
if(class(date)[1]%in%c("Date","character")){
date=as.POSIXlt(date,format=format, tz=tzone)
}
if (!inherits(date, "POSIXt")) {
stop("Date is of wrong format.")
}else if(inherits(date, "POSIXct")){
##POSIXct has no $year/$mday fields, so convert it to POSIXlt
date=as.POSIXlt(date, tz=tzone)
}
print(date)
if(return.val=="dayofyear"){
##add 1 because POSIXt is base zero
return(date$yday+1)
}else if(return.val=="weekofyear"){
##Based on the ISO8601 weekdate system,
## Monday is the first day of the week
## W01 is the week with 4 Jan in it.
year=1900+date$year
jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
##calculate the date of the first week of the year
weekstart=jan4-(wday-1)*86400
weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))
#########################################################################
##calculate week for days of the year occurring in the next year's week 1.
#########################################################################
mday=date$mday
wday=date$wday
wday[wday==0]=7
year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)
################################################################
##calculate week for days of the year occurring prior to week 1.
################################################################
##first calculate the number of weeks in the previous year
year.shift=year-1
jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4.shift$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
weekstart=jan4.shift-(wday-1)*86400
weeknum.shift=ceiling(as.numeric((difftime(date,weekstart)+0.1)/7))
##update year and week
year=ifelse(weeknum==0,year.shift,year)
weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)
return(list("year"=year,"weeknum"=weeknum))
}else{
stop("Unknown return.val")
}
}
