I have a period of time (110 years) that has been divided into pentads (5-day periods), so I have 8030 values. I would like to assign the corresponding month to each value, e.g. the first value, which covers the first 5 days of the whole period, would be assigned to January, and so on.
Can the chron package do this?
Many thanks
There are lots of ways to retrieve the month for a date. Let's use today as an example.
(x <- Sys.Date())
For most date and time behaviour, the lubridate package should be your first port of call. This has the month function that does what you want.
library(lubridate)
month(x)
## [1] 2
month(x, label = TRUE)
## [1] Feb
## Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
month(x, label = TRUE, abbr = FALSE)
## [1] February
## 12 Levels: January < February < March < April < May < June < July < ... < December
The chron package has a month.day.year function that retrieves those three components of the date.
library(chron)
month.day.year(x)
## $month
## [1] 2
##
## $day
## [1] 14
##
## $year
## [1] 2014
The data.table package also has a month function.
library(data.table)
month(x)
## [1] 2
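None of the functions above know about pentads, so for the original question each value first needs a date attached to it. Here is a minimal base R sketch, assuming every year is treated as exactly 73 pentads of a 365-day calendar (8030 / 110 = 73, so leap days are presumably dropped or folded into a pentad) and that a pentad belongs to the month in which it starts. The names `pentad_month` and `month_all` are hypothetical, and 2001 is used only as an arbitrary non-leap reference year.

```r
n_years <- 110

# Day of year on which each of the 73 pentads starts: 1, 6, 11, ..., 361
start_doy <- seq(1, 365, by = 5)

# Map those days onto a non-leap reference year (2001 is arbitrary)
ref_dates <- as.Date("2001-01-01") + (start_doy - 1)

# Month number (1-12) of the first day of each pentad
pentad_month <- as.integer(format(ref_dates, "%m"))

# Repeat the 73-value pattern once per year of the record
month_all <- rep(pentad_month, times = n_years)
length(month_all)
## [1] 8030
```

If month names are preferred, `month.abb[month_all]` converts the numbers to "Jan", "Feb", and so on. A pentad that straddles two months is assigned to the earlier one under this assumption; use the pentad midpoint (`start_doy + 2`) if that suits the data better.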
I have a dataset (8000 observations) that has a string date variable. I would like to split the variable into StartDt and EndDt of format "%B %d %Y". The variable also spans calendar years eg Dec 30 to Jan 5 2019. I have not had success trying to use the stringr package and manipulate accordingly - appreciate any insights!
Df<-data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018", "Dec 30 to Jan 5 2019"))
Use str_match with a regex and capture the values needed from the string. Parts of the pattern followed by ? are optional.
#extract the data in a dataframe based on pattern
dat <- as.data.frame(stringr::str_match(Df$Date2, '([A-Za-z]+)\\s(\\d+)\\sto\\s?([A-Za-z]+)?\\s(\\d+)\\s(\\d+)')[, -1])
#Change the columns to respective type
dat <- type.convert(dat, as.is = TRUE)
#Copy the year column
dat$V6 <- dat$V5
#Copy the month column if it is the same
dat$V3[is.na(dat$V3)] <- dat$V1[is.na(dat$V3)]
#Subtract 1 from the year only if the End month is earlier than Start month
dat <- transform(dat, V5 = V5 - as.integer(match(V1, month.abb) > match(V3, month.abb)))
#Create the final result dataframe pasting the values
result <- data.frame(Start = with(dat, paste(V1, V2, V5)),
End = with(dat, paste(V3, V4, V6)))
result
# Start End
#1 Dec 16 2018 Dec 22 2018
#2 Dec 23 2018 Dec 29 2018
#3 Dec 30 2018 Jan 5 2019
#4 Apr 15 2018 May 20 2018
data
Added an additional date ("Apr 15 to May 20 2018") in the input for testing purpose.
Df <- data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018",
"Dec 30 to Jan 5 2019", "Apr 15 to May 20 2018"))
This is a bit long perhaps, since it takes a few steps, but it only uses vectorized functions:
library(glue)
library(stringr)
Df<-data.frame(Date2=c("Dec 16 to 22 2018","Dec 23 to 29 2018", "Dec 30 to Jan 5 2019"))
## a regular expression to match abbreviated month names:
mnthrx <- paste0( "(?:", paste( month.abb, collapse="|" ), ")" )
## the big regex we will use to match it all:
rx <- glue( "({mnthrx}) (\\d+) to (?:({mnthrx}) )?(\\d+) (\\d+)" )
m <- str_match( Df$Date2, rx )
## The end date:
day2 <- as.integer(m[,5])
month2 <- m[,4]
year2 <- as.integer( m[, ncol(m)])
## The start date:
day1 <- m[,3]
month1 <- m[,2]
year1 <- year2
## if month2 is missing, it's because we're still in month1
j <- is.na(month2)
month2[j] <- month1[j]
month.number1 <- match( month1, month.abb )
month.number2 <- match( month2, month.abb )
## if month2 is smaller than month1, we swapped years:
i.next.year <- month.number2 < month.number1
year1[i.next.year] <- year2[i.next.year]-1
data.frame(
StartDt = paste( month1,day1,year1, sep=" " ),
EndDt = paste( month2,day2,year2, sep=" " )
)
It produces this:
StartDt EndDt
1 Dec 16 2018 Dec 22 2018
2 Dec 23 2018 Dec 29 2018
3 Dec 30 2018 Jan 5 2019
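Both answers return character columns. If real Date values are wanted, the same logic can be written in base R without a regular expression. The following is only a sketch under the assumption that every string has the shape "Mon D to [Mon ]D YYYY" with English three-letter month abbreviations; `split_range` and its helper `iso` are hypothetical names.

```r
split_range <- function(x) {
  # "Dec 30 to Jan 5 2019" -> c("Dec 30", "Jan 5 2019")
  parts <- strsplit(as.character(x), " to ", fixed = TRUE)
  left  <- strsplit(vapply(parts, `[`, "", 1), " ", fixed = TRUE)
  right <- strsplit(vapply(parts, `[`, "", 2), " ", fixed = TRUE)
  n <- lengths(right)  # 3 when the end month is given, otherwise 2

  month1 <- vapply(left, `[`, "", 1)
  day1   <- as.integer(vapply(left, `[`, "", 2))
  # Reuse the start month when the end month is omitted
  month2 <- ifelse(n == 3, vapply(right, `[`, "", 1), month1)
  day2   <- as.integer(mapply(function(r, k) r[k - 1], right, n))
  year2  <- as.integer(mapply(function(r, k) r[k], right, n))
  # The start falls in the previous year when the range wraps past December
  year1  <- year2 - as.integer(match(month1, month.abb) > match(month2, month.abb))

  # Build ISO strings from month numbers so parsing does not depend on the locale
  iso <- function(y, m, d) {
    as.Date(sprintf("%d-%02d-%02d", y, match(m, month.abb), d))
  }
  data.frame(StartDt = iso(year1, month1, day1),
             EndDt   = iso(year2, month2, day2))
}

res <- split_range(c("Dec 16 to 22 2018", "Dec 23 to 29 2018",
                     "Dec 30 to Jan 5 2019"))
res
```

With Date columns, `format(res$StartDt, "%B %d %Y")` gives the display format asked for in the question (the month names it prints depend on the session locale).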
I have data from several years and each record has a date value (YYYY-MM-DD). I want to label each record with the season that it fell into. For example, I want to take all the records from December 15 to March 15, across all years, and put "Winter" in a season column. Is there a way in R to specify a sequence of dates using just the month and date, regardless of year?
Lubridate quarter command doesn't work because I have custom dates to define the seasons and the seasons are not all of equal length, and I can't just do month(datevalue) %in% c(12,1,2,3) because I need to split the months in half (i.e. March 15 is winter and March 16 is spring).
I could manually enter in the date range for each year in my dataset (e.g. Dec 15 2015 to March 15 2015 or Dec 15 2016 to Mar 15 2016, etc...), but is there a better way?
You can extract the month and day out of the date column and use case_when to assign Season based on those two values.
library(dplyr)
library(lubridate)
df %>%
mutate(day = day(Date),
month = month(Date),
Season = case_when(#15 December to 15 March as Winter
month == 12 & day >= 15 |
month %in% 1:2 | month == 3 & day <= 15 ~ "Winter",
#Add conditions for other season
)
)
We assume that when the question says that winter is "Dec 15 2015 to March 15 2015 or Dec 15 2016 to Mar 15 2016" what is really meant is that winter is Dec 16, 2015 to Mar 15, 2016 or Dec 16, 2016 to Mar 15, 2017.
Also it is not clear what the precise output is supposed to be but in each case below we provide a second argument which takes a vector giving the season names or numbers. The default is that winter is reported as 1, spring is 2, summer is 3 and fall is 4 but you could pass a second argument of c("Winter", "Spring", "Summer", "Fall") instead or use other names if you wish.
1) yearmon/yearqtr Convert to Date class and subtract 15. Then convert that to yearmon class which represents dates internally as year + fraction where fraction = 0 for January, 1/12 for February, ..., 11/12 for December. Add 1/12 to get to the next month. Convert that to yearqtr class which represents dates as year + fraction where fraction is 0, 1/4, 2/4 or 3/4 for the 4 quarters and take cycle of that which gives the quarter number (1, 2, 3 or 4).
If we knew that the input x was a Date vector as opposed to a character vector then we could simplify this by replacing as.Date(x) with x in season.
library(zoo)
season <- function(x, s = 1:4)
s[cycle(as.yearqtr(as.yearmon(as.Date(x) - 15) + 1/12))]
# test
d <- c(as.Date("2020-12-15") + 0:1, as.Date("2021-03-15") + 0:1)
season(d)
## [1] 4 1 1 2
season(d, c("Winter", "Spring", "Summer", "Fall"))
## [1] "Fall" "Winter" "Winter" "Spring"
2) base The above could be translated to base R using POSIXlt. Subtract 15 as before and then add 1 to the month to get to the next month. Finally extract the month and integer-divide it by 3, so that each block of three months maps to one season.
season.lt <- function(x, s = 1:4) {
lt <- as.POSIXlt(as.Date(x) - 15)
lt$mon <- lt$mon + 1
s[as.POSIXlt(format(lt))$mon %/% 3 + 1]
}
# test - d defined in (1)
season.lt(d)
## [1] 4 1 1 2
3) lubridate We can follow the same logic in lubridate like this:
season.lub <- function(x, s = 1:4)
s[(month((as.Date(x) - 15) %m+% months(1)) - 1) %/% 3 + 1]
# test - d defined in (1)
season.lub(d)
## [1] 4 1 1 2
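A year-independent alternative is to reduce each date to a month-day number (for example 315 for March 15) and look it up against a vector of boundaries. This is a sketch only: the question fixes just the winter limits, so the other boundaries used here (spring from Mar 16, summer from Jun 16, fall from Sep 16, winter from Dec 16) are assumptions, and `season_of` is a hypothetical name. It follows the convention of the three functions above, where Dec 15 still counts as fall.

```r
season_of <- function(x, labels = c("Winter", "Spring", "Summer", "Fall")) {
  # Month and day as one integer, e.g. 2021-03-15 -> 315, 2020-12-16 -> 1216
  md <- as.integer(format(as.Date(x), "%m%d"))
  # First day of Spring, Summer, Fall and Winter (assumed boundaries)
  starts <- c(316, 616, 916, 1216)
  # findInterval() returns 0 before Mar 16 and 4 from Dec 16 on; both are winter
  idx <- findInterval(md, starts)
  labels[c(1, 2, 3, 4, 1)[idx + 1]]
}

d <- c(as.Date("2020-12-15") + 0:1, as.Date("2021-03-15") + 0:1)
season_of(d)
## [1] "Fall"   "Winter" "Winter" "Spring"
```

Seasons of unequal length only require editing `starts`; to count Dec 15 as winter instead, change the last boundary to 1215.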
I have a df with dates formatted in the following way.
Date Year
<chr> <dbl>
Sunday, Jul 27 2008
Tuesday, Jul 29 2008
Wednesday, July 31 (1) 2008
Wednesday, July 31 (2) 2008
Is there a simple way to achieve the following format of columns and values? I'd also like to remove the (1) and (2) notations on the two July 31 dates.
Date Year Month Day Day_of_Week
2008-07-27 2008 07 27 Sunday
With base R, you can do:
dat <- data.frame(
Date = c("Sunday, Jul 27" ,"Tuesday, Jul 29", "Wednesday, July 31", "Wednesday, July 31"),
Year = rep(2008, 4),
stringsAsFactors = FALSE
)
dts <- as.POSIXlt(paste(dat$Year, dat$Date), format = "%Y %A, %B %d")
POSIXlt stores the date/time as a list of components. To see them, try unclass(dts[1]).
From here it can be rather academic:
dat$Month = 1 + dts$mon # months are 0-based in POSIXlt
dat$Day = dts$mday
dat$Day_of_Week = weekdays(dts)
dat
# Date Year Month Day Day_of_Week
# 1 Sunday, Jul 27 2008 7 27 Sunday
# 2 Tuesday, Jul 29 2008 7 29 Tuesday
# 3 Wednesday, July 31 2008 7 31 Thursday
# 4 Wednesday, July 31 2008 7 31 Thursday
library(dplyr)
library(lubridate)
dat = tibble(date = c('Sunday, Jul 27','Tuesday, Jul 29', 'Wednesday, July 31 (1)',
'Wednesday, July 31 (2)'), year=rep(2008,4))
dat %>%
mutate(date = gsub("\\s*\\([^\\)]+\\)","",as.character(date)),
date = parse_date_time(date,'A, b! d ')) -> dat1
year(dat1$date) <- dat1$year
# A tibble: 4 × 2
date year
<dttm> <dbl>
1 2008-07-27 2008
2 2008-07-29 2008
3 2008-07-31 2008
4 2008-07-31 2008
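For the exact column layout requested in the question, a base R sketch along these lines may help. It assumes the strings always look like "Weekday, Month Day" with an optional "(n)" suffix and English month names; the intermediate names (`clean`, `md`, `out`, `day_names`) are hypothetical. The weekday is recomputed from the date rather than copied from the text, because July 31, 2008 was a Thursday even though the sample labels it Wednesday.

```r
dat <- data.frame(
  Date = c("Sunday, Jul 27", "Tuesday, Jul 29",
           "Wednesday, July 31 (1)", "Wednesday, July 31 (2)"),
  Year = rep(2008, 4),
  stringsAsFactors = FALSE
)

# Drop a trailing "(1)", "(2)", ... notation
clean <- sub(" *\\([^)]*\\) *$", "", dat$Date)
# Drop the weekday, leaving e.g. "Jul 27" or "July 31"
md <- sub("^[^,]*, *", "", clean)
# The first three letters identify the month regardless of abbreviation
mon <- match(substr(md, 1, 3), month.abb)
day <- as.integer(sub("^[A-Za-z]+ +", "", md))

d <- as.Date(sprintf("%d-%02d-%02d", as.integer(dat$Year), mon, day))

# Weekday names indexed by POSIXlt's wday (0 = Sunday), independent of locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
out <- data.frame(
  Date = d,
  Year = dat$Year,
  Month = format(d, "%m"),
  Day = format(d, "%d"),
  Day_of_Week = day_names[as.POSIXlt(d)$wday + 1],
  stringsAsFactors = FALSE
)
out
```

Month and Day are kept as zero-padded strings ("07", "27") to match the requested output; wrap them in as.integer() if numbers are more convenient.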
If I have a date, say "2014-05-13" and I want to calculate the month in decimal, I would do this:
5 + 13/31 = 5.419355
How would it be possible in R to take a vector of dates and turn it into a "month decimal" vector?
For example:
dates = c("2010-01-24", "2013-04-08", "2014-03-05", "2013-03-08", "2014-02-14",
"2004-01-28", "2006-02-21", "2013-03-28", "2013-04-01", "2006-02-14",
"2006-01-28", "2014-01-19", "2012-03-12", "2014-01-30", "2005-04-17")
library(lubridate)
month(dates) + day(dates)/31
As you can see, it would be wrong to use "31" as the divisor since the number of days differs depending on the month, and sometimes the year (leap years).
So what would be the best solution?
You can use the monthDays function from the Hmisc package
> require(Hmisc)
> library(lubridate)
> month(dates) + day(dates)/monthDays(dates)
[1] 1.774194 4.266667 3.161290 3.258065 2.500000 1.903226 2.750000 3.903226 4.033333
[10] 2.500000 1.903226 1.612903 3.387097 1.967742 4.566667
With magrittr,
library(magrittr)
library(lubridate)
dates %>% ymd() %>% { month(.) + day(.) / days_in_month(.) }
## Jan Apr Mar Mar Feb Jan Feb Mar Apr Feb Jan
## 1.774194 4.266667 3.161290 3.258065 2.500000 1.903226 2.750000 3.903226 4.033333 2.500000 1.903226
## Jan Mar Jan Apr
## 1.612903 3.387097 1.967742 4.566667
For some reason the vector gets named, so add %>% unname() if you like.
Here is a base R hack that uses a trick I've seen on SO to get the first day of the next month and subtract 1 to return the last day of the month of interest.
# format dates to Date class
dates <- as.Date(dates)
# get the next month
nextMonths <- as.integer(substr(dates, 6, 7)) + 1L
# replace next month with 1 if it is equal to 13
nextMonths[nextMonths == 13] <- 1L
# extract the number of days using date formatting (%d), paste, and subtraction
dayCount <- as.integer(format(as.Date(paste(substr(dates, 1, 4),
nextMonths, "01", sep="-"))-1L, format="%d"))
dayCount
[1] 31 30 31 31 28 31 28 31 30 28 31 31 31 31 30
# get month with fraction using date formatting (%m)
as.integer(format(dates, format="%m")) + (as.integer(format(dates, format="%d")) / dayCount)
[1] 1.774194 4.266667 3.161290 3.258065 2.500000 1.903226 2.750000 3.903226 4.033333 2.500000
[11] 1.903226 1.612903 3.387097 1.967742 4.566667
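The same idea can be wrapped in a small function that also moves the year forward when the month is December, instead of relying on January of the same year having 31 days as well. This is a sketch, with `month_decimal` as a hypothetical name; it assumes the input is a Date vector or ISO "YYYY-MM-DD" strings.

```r
month_decimal <- function(x) {
  d <- as.Date(x)
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  # First day of the month that each date falls in
  first <- as.Date(sprintf("%d-%02d-01", y, m))
  # First day of the following month; December rolls into January of the next year
  next_first <- as.Date(sprintf("%d-%02d-01", y + (m == 12L), m %% 12L + 1L))
  # Length of the month in days, leap years included
  n_days <- as.integer(next_first - first)
  m + as.integer(format(d, "%d")) / n_days
}

month_decimal(c("2014-05-13", "2012-02-29", "2013-12-31"))
```

For "2014-05-13" this gives 5 + 13/31, the value worked out by hand in the question. Note that under this definition the last day of a month maps to the next whole number (for example 3 for 2012-02-29).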
I have data with dates like 11-01-2011 etc., and I want to add the weekday that corresponds
to each date (Monday, Tuesday, etc.). Is there any R package that maps dates to days of the week?
weekdays(as.Date('16-08-2012','%d-%m-%Y'))
[1] "Thursday"
The lubridate package is great for this sort of stuff.
> wday(as.Date('16-08-2012','%d-%m-%Y'))
[1] 5
> wday(as.Date('16-08-2012','%d-%m-%Y'), label=TRUE)
[1] Thurs
Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
> wday(as.Date('16-08-2012','%d-%m-%Y'), label=TRUE, abbr = FALSE)
[1] Thursday
Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < Friday < Saturday
Here is some information to create your own library or routine
Constants:
day_of_month
the day of the month
e.g. if input mm-dd-yyyy then dd
month:
march = 1
april = 2
may = 3
...
year
yy[yy] (last two digits from yyyy)
*subtract 1 if month jan or feb
e.g. if input date is 02-01-2012 (mm-dd-yyyy)
year = (12-1) = 11
century
[yy]yy (first two digits from yyyy)
e.g. if input year is 2012 then 20 = century
* for Jan or Feb of year 2000, 1900, ... the century becomes 20-1, 19-1 respectively
ALGORITHM
step1: floor(century / 4)
step2: year
step3: floor(year/4)
step4: floor(month*2.6 -0.2) #this is the month offset (months counted from March)
step5: day_of_month
step6: add step1...step5, then subtract 2*century
step7: divide by 7 # modulo 7 in codespeak
step8: the remainder is the day of the week (add 7 if it is negative)
To Interpret Results:
Sun = 0, Mon = 1, Tues = 2, etc.
Not a library, but as the public service jingle goes...
"Read: The More you Know"
Ref: http://www.faqs.org/faqs/sci-math-faq/dayWeek/
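The steps can be sketched in R as follows. This is an illustration rather than a library routine, and `day_of_week` is a hypothetical name. The sketch subtracts 2*century, as the usual form of this congruence requires; without that term the 16-08-2012 example from earlier would not come out as Thursday. The month offset is computed with integer arithmetic, since (13*m - 1) %/% 5 equals floor(2.6*m - 0.2) for m = 1..12 and avoids floating point rounding.

```r
day_of_week <- function(day, month, year) {
  # Renumber months so that March = 1, ..., January = 11, February = 12
  m <- (month + 9) %% 12 + 1
  # January and February count as part of the previous year
  y <- year - (month <= 2)
  century <- y %/% 100  # first two digits of the (adjusted) year
  yy <- y %% 100        # last two digits of the (adjusted) year
  # Same value as floor(2.6 * m - 0.2), using integers only
  month_offset <- (13 * m - 1) %/% 5
  # R's %% always returns a non-negative remainder here, so 0 = Sunday ... 6 = Saturday
  w <- (day + month_offset - 2 * century + yy + yy %/% 4 + century %/% 4) %% 7
  c("Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday")[w + 1]
}

day_of_week(16, 8, 2012)
## [1] "Thursday"
```

The function is vectorized, so `day_of_week(c(16, 11), c(8, 1), c(2012, 2011))` handles several dates at once. It assumes Gregorian calendar dates.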