Extracting year from two different date formats - r

I have a column, say x, which contains two different date formats: 12/31/1998 and 12/--/98. As you can see, in the second format the day is missing and the year has only 2 digits.
I need to extract the year from all the dates in my column. When I use Year <- data.frame(format(df$x, "%Y")), it returns the year for the first format, but NA for the second.
I would appreciate all the help. Thanks.

You could get a bit creative and specify an ugly format for the missing data, and then just keep one of the valid responses:
vals <- c("12/31/1998", "12/--/98")
out <- pmax(
as.Date(vals, "%m/%d/%Y"),
as.Date(paste0("01",vals), "%d%m/--/%y"),
na.rm=TRUE
)
format(out, "%Y")
#[1] "1998" "1998"

If they are all in a format where the year is the last number after "/", you can use basename. Then you just need to convert the 2-digit years to 4-digit years:
vals <- c("12/31/1998", "12/--/98", "68", "69")
yrs <- basename(vals)
yrs <- ifelse(nchar(yrs) == 2, format(as.Date(yrs, format = "%y"), "%Y"), yrs)
yrs
# [1] "1998" "1998" "2068" "1969"
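The issue is that this does not work for years before 1969: with %y, R maps the two-digit values 00 to 68 to 20xx and 69 to 99 to 19xx, so "68" becomes 2068.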
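One way around the %y cutoff is to expand two-digit years yourself with an explicit pivot. The sketch below is only an illustration: expand_year and its pivot argument are hypothetical names, and the default pivot of 30 is an assumption you would adjust to your data.

```r
# Hypothetical helper: take the digits after the last "/" and expand
# 2-digit years using a pivot chosen by the caller (assumption: 30).
expand_year <- function(x, pivot = 30) {
  last <- sub(".*/", "", x)              # text after the last "/" (unchanged if no "/")
  yr <- as.integer(last)
  two_digit <- nchar(last) == 2
  # 2-digit years <= pivot become 20xx, the rest become 19xx
  ifelse(two_digit, ifelse(yr <= pivot, 2000L + yr, 1900L + yr), yr)
}

vals <- c("12/31/1998", "12/--/98", "68", "69")
expand_year(vals)
# [1] 1998 1998 1968 1969
```

With this approach the cutoff is under your control instead of being fixed at 68/69.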

Related

How can I delete numbers and characters from my date and convert a character column to a date column

I have a dataframe with the column name perioden. This column contains the date but it is written in this format: 2010JJ00, 2011JJ00, 2012JJ00, 2013JJ00 etc..
This column is also a character when I look at the structure. I've tried multiple solutions but so far I am still stuck. My question is: how can I convert this column to a date, and how do I remove the JJ00 part so that only the year remains in the column?
You can try this approach. Use gsub() to remove the undesired text (as suggested by @AllanCameron), then use paste0() to add a day and month, and as.Date() to convert to a date:
#Data
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'),stringsAsFactors = F)
#Remove string
df$Date <- gsub('JJ00','',df$Date)
#Format to date, you will need a day and month
df$Date2 <- as.Date(paste0(df$Date,'-01-01'))
Output:
Date Date2
1 2010 2010-01-01
2 2011 2011-01-01
3 2012 2012-01-01
4 2013 2013-01-01
We can use ymd with the truncated option:
library(lubridate)
library(stringr)
ymd(str_remove(df$Date, 'JJ\\d+'), truncated = 2)
#[1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
data
df <- data.frame(Date=c('2010JJ00', '2011JJ00', '2012JJ00', '2013JJ00'), stringsAsFactors = FALSE)
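If the year always occupies the first four characters, a base R sketch without regular expressions or extra packages could look like this. The column name perioden is taken from the question; the new columns year and date are illustrative names.

```r
# Sample data shaped like the question's column
df <- data.frame(perioden = c("2010JJ00", "2011JJ00", "2012JJ00", "2013JJ00"),
                 stringsAsFactors = FALSE)

# Assumption: the year is always the first four characters
df$year <- as.integer(substr(df$perioden, 1, 4))

# A Date needs a month and a day, so January 1st is used as a placeholder
df$date <- as.Date(sprintf("%d-01-01", df$year))

df$year
# [1] 2010 2011 2012 2013
```

Keeping the integer year alongside the Date avoids implying a precision the data does not have.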

Get the month from the week of the year

Let's say we have this:
ex <- c('2012-41')
This represents week 41 of the year 2012. How would I get the month from this?
Since a week can span two months, I am interested in the month in which that week started (here, October).
Not a duplicate of How to extract Month from date in R (I do not have a standard date format like %Y-%m-%d).
You could try:
ex <- c('2019-10')
splitDate <- strsplit(ex, "-")
dateNew <- as.Date(paste(splitDate[[1]][1], splitDate[[1]][2], 1, sep="-"), "%Y-%U-%u")
monthSelected <- lubridate::month(dateNew)
monthSelected
#[1] 3
I hope this helps!
This depends on the definition of week. See the discussion of %V and %W in ?strptime for two possible definitions. We use %V below, but the function lets you specify the other if desired. The function runs sapply over the elements of x. For each element it extracts the year into yr and builds the sequence of all dates in that year in sq. It then formats those dates as year-week, finds the first date whose year-week matches the current element of x, and extracts that date's month.
yw2m <- function(x, fmt = "%Y-%V") {
sapply(x, function(x) {
yr <- as.numeric(substr(x, 1, 4))
sq <- seq(as.Date(paste0(yr, "-01-01")), as.Date(paste0(yr, "-12-31")), "day")
as.numeric(format(sq[which.max(format(sq, fmt) == x)], "%m"))
})
}
yw2m('2012-41')
## [1] 10
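If the ISO 8601 definition of week (the one %V uses) is what you want, the start of the week can also be computed directly instead of searching a sequence of dates. This sketch relies on the fact that January 4th always falls in ISO week 1; iso_week_start is a hypothetical helper name, and the input is assumed to be "year-week" with the ISO week-based year.

```r
# Hypothetical helper: Monday of a given ISO week, from "YYYY-WW" strings
iso_week_start <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  yr <- as.integer(vapply(parts, function(p) p[1], character(1)))
  wk <- as.integer(vapply(parts, function(p) p[2], character(1)))
  jan4 <- as.Date(sprintf("%d-01-04", yr))   # always inside ISO week 1
  wday <- as.integer(format(jan4, "%u"))     # 1 = Monday ... 7 = Sunday
  week1_monday <- jan4 - (wday - 1)
  week1_monday + 7 * (wk - 1)
}

start <- iso_week_start("2012-41")
format(start)
# [1] "2012-10-08"
as.integer(format(start, "%m"))
# [1] 10
```

Because it only does date arithmetic, this version is vectorised and avoids building a full year of dates per element.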
The following converts year-week strings into dates, returned as a character vector. It adds the given number of weeks to January 1st of the year using lubridate's weeks() function, which gives the date at the end of the relevant week. Note that I've added extra cases to your 'ex' variable, including the 52nd week, which returns December 31st.
library(lubridate)
ex <- c('2012-41','2016-4','2018-52')
dates <- strsplit(ex,"-")
dates <- sapply(dates,function(x) {
year_week <- unlist(x)
year <- year_week[1]
week <- as.numeric(year_week[2]) # convert the week string to a number before passing it to weeks()
start_date <- as.Date(paste0(year,'-01-01'))
date <- start_date+weeks(week)
#note here: OP asked for beginning of week.
#There's some ambiguity here, the above is end-of-week;
#uncomment here for beginning of week, just subtracted 6 days.
#I think this might yield inconsistent results, especially year-boundaries
#hence suggestion to use end of week. See below for possible solution
#date <- start_date+weeks(week)-days(6)
return (as.character(date))
})
Yields:
> dates
[1] "2012-10-14" "2016-01-29" "2018-12-31"
And to simply get the month from these full dates:
month(dates)
Yields:
> month(dates)
[1] 10 1 12

Factor to Month-Year Conversion in R

I have a factor column with values such as April-2017, February-2017, etc. I want to convert it to month and year so that the column can be ordered chronologically, starting from January. I tried the following:
Combi$Month <- as.yearmon(levels(Combined$Month))[Combined$Month] -> Yields 'NA'
Combined$Month <- as.Date(Combined$Month,'%B-%Y') -> Yields 'NA'
The "yearmon" class can represent year-month and sorts as expected:
library(zoo)
x <- factor(c('April-2017', 'February-2017')) # test data
ym <- as.yearmon(x, "%B-%Y")
sort(ym)
## [1] "Feb 2017" "Apr 2017"
Because of this you don't really need to convert it to "Date" class, nor do you need the year and month separately. If you do need separate values for some reason not stated in the question, as.integer(ym) gives the years as 4-digit numbers and cycle(ym) gives the months as numbers between 1 and 12. Also, as.Date(ym) gives "Date" class values.
A base R way:
# Some sample data (stringsAsFactors = TRUE so that period is a factor on R >= 4.0)
df <- data.frame(period=sample(c("April-2017","February-2017"),10, replace = TRUE), stringsAsFactors = TRUE)
nicep <- function(x) {
months <- c('January','February','March','April','May','June','July','August','September','October','November','December')
l <- strsplit(x, '-')
return(sprintf("%s-%02d",l[[1]][2], which(months == l[[1]][1])))
}
# change levels for a nice name
levels(df$period) <- unlist(lapply(as.character(levels(df$period)), FUN=nicep))
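If the goal is only to sort the factor chronologically while keeping the original labels, another base R option is to reorder the levels rather than rename them. This is a sketch under the assumption that the labels use full English month names; it matches them against the built-in month.name constant, so it does not depend on the session locale.

```r
x <- factor(c("April-2017", "February-2017", "January-2018", "April-2017"))

lev <- levels(x)
parts <- strsplit(lev, "-", fixed = TRUE)
mon <- match(vapply(parts, function(p) p[1], character(1)), month.name)  # 1..12
yr  <- as.integer(vapply(parts, function(p) p[2], character(1)))

# Rebuild the factor with levels in chronological order
x_ord <- factor(x, levels = lev[order(yr, mon)], ordered = TRUE)
levels(x_ord)
# [1] "February-2017" "April-2017"    "January-2018"
```

After this, sort(x_ord) and ordered comparisons follow calendar order.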

Format date strings comprising weeks and quarters as Date objects

I have dates in an R dataframe column formatted as character strings as WK01Q32014.
I want to turn each date into a Date() object.
So I altered the format to make it look like 01-3-2014. I want to try to do something like as.Date("01-3-2014","%W-%Q-%Y") for example, but there is no format code for quarters that I know of.
Is there any way to do this using the lubridate, zoo, or any other libraries?
I don't know of any specific function, but here's a basic one:
convert_WQ_to_Date <- function(D) {
weeks <- as.integer(substr(D, 3, 4))
quarter <- as.integer(substr(D, 6, 6))
year <- substr(D, 7, 10)
days <- 7 * ((quarter - 1) * 13 + (weeks-1))
as.Date(sprintf("%s-01-01", year)) + days
}
Example
D <- c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014")
convert_WQ_to_Date(D)
[1] "2014-07-02" "2014-01-01" "2014-10-29" "2014-04-02" "2014-07-09"
The week, quarter and year do not uniquely define a date, so we have to add an assumption. Here we assume that the first week starts on the first day of the quarter, the second week starts 7 days later, and so on.
Below, we extract the qtr-year part and use as.yearqtr in the zoo package to convert that to a yearqtr object and then use as.Date to convert that to a date which is the first of the quarter. We then extract the week, subtract 1 and multiply by 7 to get the days offset. Adding the first of the quarter to the offset gives the result:
library(zoo)
xx <- "01-3-2014" # week-quarter-year
qtr.start <- as.Date(as.yearqtr(sub("...", "", xx), "%q-%Y"))
days <- 7 * (as.numeric(sub("-.*", "", xx)) - 1)
qtr.start + days
## [1] "2014-07-01"
Assuming the traditional notion of the quarters starting on 1st January, 1st April, 1st July and 1st October respectively (in line with the quarters function), just start at these dates and add 7 days for each week:
x <- c("01-3-2014","01-1-2014","05-4-2014","01-2-2014","02-3-2014")
y <- as.numeric(substr(x,6,9))
m <- as.numeric(substr(x,4,4))
d <- as.numeric(substr(x,1,2))
as.Date(paste(y,(m-1)*3+1,"01",sep="-")) + (7*(d-1))
#[1] "2014-07-01" "2014-01-01" "2014-10-29" "2014-04-01" "2014-07-08"
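The same assumption (week 1 starts on the first day of the quarter) can also be applied to the original WK01Q32014 strings directly, without reformatting them first. The sketch below uses a regular expression with capture groups and checks the input shape; parse_wkq is a hypothetical helper name.

```r
# Hypothetical helper: parse "WKwwQqYYYY" strings into Dates, assuming
# week 1 starts on the first day of the calendar quarter
parse_wkq <- function(D) {
  pat <- "^WK([0-9]{2})Q([1-4])([0-9]{4})$"
  stopifnot(all(grepl(pat, D)))            # fail early on unexpected input
  wk  <- as.integer(sub(pat, "\\1", D))
  qtr <- as.integer(sub(pat, "\\2", D))
  yr  <- as.integer(sub(pat, "\\3", D))
  # Quarters start in months 1, 4, 7 and 10
  qtr_start <- as.Date(sprintf("%d-%02d-01", yr, (qtr - 1L) * 3L + 1L))
  qtr_start + 7L * (wk - 1L)
}

format(parse_wkq(c("WK01Q32014", "WK05Q42014")))
# [1] "2014-07-01" "2014-10-29"
```

Validating against the pattern makes malformed strings an error instead of a silent NA.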

Delete part of a value for the whole column

If I have a vector such as the following:
REF_YEAR
1994-01-01
1995-01-01
1996-01-01
how can I delete the part "-01-01", so that I only get the year for the whole column?
If your vector is formatted as Dates, you can do:
x <- as.Date("2001-01-01")
format(x, "%Y")
#[1] "2001"
And for your example data:
# Your sample data:
df <- read.table(header=TRUE, text = "REF_YEAR
1994-01-01
1995-01-01
1996-01-01", stringsAsFactors = FALSE)
Convert your data to Date format:
df$REF_YEAR <- as.Date(df$REF_YEAR) # skip this step if it's already formatted as Date
Now convert to year format:
df$REF_YEAR <- format(df$REF_YEAR, "%Y")
Or
transform(df, REF_YEAR = format(REF_YEAR, "%Y"))
Result in both cases:
df
# REF_YEAR
#1 1994
#2 1995
#3 1996
You only need to make sure your data is in Date format (use as.Date() for conversion).
This can also be done with plain string functions. You can either keep the first four characters or drop the last six. Here is how to do it with the second option, as you asked:
ref_year = as.character("1994-01-01")
ref_year_only = substr(ref_year, 1, nchar(ref_year) - 6) ; ref_year_only
Also, please show some effort while asking questions on stack.
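Both options mentioned above (keeping the first four characters or dropping the rest) work on a whole vector at once. A minimal sketch, assuming every value has the form YYYY-MM-DD:

```r
ref_year <- c("1994-01-01", "1995-01-01", "1996-01-01")

# Option 1: keep the first four characters
substr(ref_year, 1, 4)
# [1] "1994" "1995" "1996"

# Option 2: drop everything from the first "-" onwards
sub("-.*$", "", ref_year)
# [1] "1994" "1995" "1996"
```

Both return character vectors; wrap the result in as.integer() if numeric years are needed.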
Without converting to Date, you could also try:
library(stringr)
df$YEAR <- str_extract(df$REF_YEAR, '\\d+(?=-)') # perl() is no longer needed in current stringr; lookaheads work in plain patterns
df$YEAR
#[1] "1994" "1995" "1996"
