Factor to Month-Year Conversion in R

I have a factor column with values such as April-2017, February-2017, etc. I want to convert it to month and year so that the column can be ordered chronologically, starting from January. I tried the following:
Combined$Month <- as.yearmon(levels(Combined$Month))[Combined$Month] # yields NA
Combined$Month <- as.Date(Combined$Month, '%B-%Y') # yields NA

The "yearmon" class can represent year-month and sorts as expected:
library(zoo)
x <- factor(c('April-2017', 'February-2017')) # test data
ym <- as.yearmon(x, "%B-%Y")
sort(ym)
## [1] "Feb 2017" "Apr 2017"
Because of this, you don't really need to convert it to "Date" class, nor do you need the year and month separately. If, for some reason not stated in the question, you still need separate values, then as.integer(ym) gives the years as 4-digit numbers and cycle(ym) gives the months as numbers between 1 and 12. Also, as.Date(ym) gives "Date" class values.

A base R way:
# Some sample data (stringsAsFactors = TRUE is needed in R >= 4.0.0 to get a factor column)
df <- data.frame(period = sample(c("April-2017", "February-2017"), 10, replace = TRUE), stringsAsFactors = TRUE)
# Turn one "Month-Year" string into a sortable "Year-MM" string
nicep <- function(x) {
  months <- c('January','February','March','April','May','June','July','August','September','October','November','December')
  l <- strsplit(x, '-')
  return(sprintf("%s-%02d", l[[1]][2], which(months == l[[1]][1])))
}
# change levels for a nice name
levels(df$period) <- unlist(lapply(as.character(levels(df$period)), FUN = nicep))
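If the original labels should be kept and only their order changed, the levels can be reordered instead of renamed. The following is a minimal base R sketch of that idea; the sample vector and the variable names (x, chrono, x_ordered) are made up for illustration. It relies on the built-in constant month.name, so it assumes English month names in the data but does not depend on the session locale.

```r
# Hypothetical sample data: a factor of "Month-Year" labels
x <- factor(c("April-2017", "February-2017", "January-2018", "April-2017"))

# Split each level into its month name and its year
parts <- strsplit(levels(x), "-", fixed = TRUE)
month_num <- match(vapply(parts, `[`, character(1), 1), month.name)
year_num <- as.integer(vapply(parts, `[`, character(1), 2))

# Levels in chronological order: by year first, then by month
chrono <- levels(x)[order(year_num, month_num)]

# Rebuild the factor so that sorting follows the calendar
x_ordered <- factor(x, levels = chrono)
levels(x_ordered)
## [1] "February-2017" "April-2017"    "January-2018"
```

Sorting or plotting x_ordered then follows calendar order while the labels stay readable.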

Related

Get the month from the week of the year

Let's say we have this:
ex <- c('2012-41')
This represents week 41 of the year 2012. How would I get the month from this?
Since a week can span two months, I am interested in the month in which that week starts (here, October).
This is not a duplicate of How to extract Month from date in R, because the input is not in a standard date format such as %Y-%m-%d.
You could try:
ex <- c('2019-10')
splitDate <- strsplit(ex, "-")
dateNew <- as.Date(paste(splitDate[[1]][1], splitDate[[1]][2], 1, sep="-"), "%Y-%U-%u")
monthSelected <- lubridate::month(dateNew)
## [1] 3
I hope this helps!
This depends on the definition of week. See the discussion of %V and %W in ?strptime for two possible definitions of week. We use %V below, but the function allows one to specify the other if desired. The function applies sapply over the elements of x. For each element it extracts the year into yr and forms the sequence of all dates of that year in sq. It then formats those dates as year-week strings, finds the first date whose string equals the current element of x, and returns that date's month.
yw2m <- function(x, fmt = "%Y-%V") {
  sapply(x, function(x) {
    yr <- as.numeric(substr(x, 1, 4))
    sq <- seq(as.Date(paste0(yr, "-01-01")), as.Date(paste0(yr, "-12-31")), "day")
    as.numeric(format(sq[which.max(format(sq, fmt) == x)], "%m"))
  })
}
yw2m('2012-41')
## [1] 10
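For the ISO 8601 definition of week (the one %V uses), the Monday that starts a given week can also be computed arithmetically, because 4 January always falls in ISO week 1. The sketch below is one possible base R formulation, not part of the answer above; the function name iso_week_start is made up for illustration, and it assumes the input strings really are ISO year-week pairs.

```r
# Monday that starts ISO week `wk` of ISO year `yr`, for strings like "2012-41".
# iso_week_start is a hypothetical helper name.
iso_week_start <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  yr <- as.integer(vapply(parts, `[`, character(1), 1))
  wk <- as.integer(vapply(parts, `[`, character(1), 2))
  jan4 <- as.Date(sprintf("%d-01-04", yr))   # 4 January is always in ISO week 1
  wday <- as.integer(format(jan4, "%u"))     # 1 = Monday ... 7 = Sunday
  # Step back to the Monday of week 1, then forward (wk - 1) whole weeks
  jan4 - (wday - 1) + 7 * (wk - 1)
}

starts <- iso_week_start(c("2012-41", "2016-4"))
format(starts)
## [1] "2012-10-08" "2016-01-25"
as.integer(format(starts, "%m"))
## [1] 10  1
```

Unlike scanning every day of the year, this does not need to build a date sequence, but it only covers the ISO convention.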
The following takes year-week strings, adds the given number of weeks to 1 January of that year, and returns a vector of dates as character. With the lubridate package's weeks() function, the resulting dates correspond to the end of the relevant week. Note, for example, that I've added an additional case to your 'ex' variable for the 52nd week, and it returns December 31st.
library(lubridate)
ex <- c('2012-41','2016-4','2018-52')
dates <- strsplit(ex,"-")
dates <- sapply(dates, function(x) {
  year_week <- unlist(x)
  year <- year_week[1]
  # strsplit() returns character, so convert the week before passing it to weeks()
  week <- as.numeric(year_week[2])
  start_date <- as.Date(paste0(year, '-01-01'))
  date <- start_date + weeks(week)
  # note here: OP asked for beginning of week.
  # There's some ambiguity here, the above is end-of-week;
  # uncomment here for beginning of week, just subtracted 6 days.
  # I think this might yield inconsistent results, especially at year boundaries,
  # hence the suggestion to use end of week. See below for a possible solution
  # date <- start_date + weeks(week) - days(6)
  return(as.character(date))
})
Yields:
> dates
[1] "2012-10-14" "2016-01-29" "2018-12-31"
And to simply get the month from these full dates:
month(dates)
Yields:
> month(dates)
[1] 10 1 12

Extracting year from two different date format

I have a column, say x, which has two different date formats: 12/31/1998 and 12/--/98. As you can see, in the second format the day is missing and the year has 2 digits.
I need to extract the year from all the dates in my column. When I use Year <- data.frame(format(df$x, "%Y")), it returns the year for the first format. For the second format, it returns NA.
I would appreciate all the help. Thanks.
You could get a bit creative and specify an ugly format for the missing data, and then just keep one of the valid responses:
vals <- c("12/31/1998", "12/--/98")
out <- pmax(
as.Date(vals, "%m/%d/%Y"),
as.Date(paste0("01",vals), "%d%m/--/%y"),
na.rm=TRUE
)
format(out, "%Y")
#[1] "1998" "1998"
If they are all in a format where the year is the last number after "/", you can use basename. Then you just need to convert the 2-digit years to a 4-digit format:
vals <- c("12/31/1998", "12/--/98", "68", "69")
yrs <- basename(vals)
yrs <- ifelse(nchar(yrs) == 2, format(as.Date(yrs, format = "%y"), "%Y"), yrs)
yrs
# [1] "1998" "1998" "2068" "1969"
The issue is that this does not work for years before 1969: %y maps the two-digit values 00 to 68 to 2000-2068, and only 69 to 99 to 1969-1999.
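If the data are known to contain years before 1969, the century can be chosen explicitly instead of relying on %y. The following is a minimal sketch under that assumption; the helper name expand_year and its pivot argument (here 30, meaning 00 to 30 become 20xx and 31 to 99 become 19xx) are hypothetical and would need to be set to suit the data.

```r
# Expand a 2-digit year using an explicit pivot (hypothetical helper).
# Values <= pivot are read as 20xx, larger values as 19xx.
expand_year <- function(yy, pivot = 30) {
  yy <- as.integer(yy)
  ifelse(yy <= pivot, 2000L + yy, 1900L + yy)
}

vals <- c("12/31/1998", "12/--/98", "01/--/45")
yrs <- sub(".*/", "", vals)   # keep what follows the last "/"

# Only 2-character years need expanding; 4-digit years are used as-is
years <- ifelse(nchar(yrs) == 2, expand_year(yrs), as.integer(yrs))
years
## [1] 1998 1998 1945
```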

How do I make a column from substring of another column in R?

I have a dataframe df with the column ReleaseDate, a Factor column with data like this:
Apr 10, 2001
Apr 10, 2007
...
I want to make a new column ReleaseYear with only the year, which is always the last four characters in the ReleaseDate data.
How do I get the last four characters from ReleaseDate for ReleaseYear?
Here are two options: one uses year from the lubridate package, the other uses a regular expression:
library(lubridate)
year(as.Date("Apr 10, 2001", format = "%b %d, %Y"))
[1] 2001
library(stringr)
str_extract("Apr 10, 2001", "\\d{4}$")
[1] "2001"
This is one option. gsub will return everything after ", ".
a <- c("Apr 10, 2001", "Apr 10, 2007")
df <- data.frame(a)
colnames(df) <- "ReleaseDate"
df$ReleaseYear <- gsub("^.*?, ","",a)
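This is an alternative, which assumes the year always occupies characters 9 to 12 (that is, a three-letter month and a two-digit day).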
df$ReleaseYear <- substr(df$ReleaseDate, 9, 12)
One more option.
library(stringr)
df$ReleaseYear <- str_sub(df$ReleaseDate, -4)
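Use substr(x, start, stop). Your start will be the length of the string minus 3, and your stop will be its length. Since ReleaseDate is a factor, convert it to character first.
substr(as.character(df$ReleaseDate), nchar(as.character(df$ReleaseDate)) - 3, nchar(as.character(df$ReleaseDate)))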
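The substr approach can be wrapped in a small helper so that the factor conversion and the start/stop arithmetic live in one place. This is only a sketch; the name last_n is made up for illustration, and the sample data include a single-digit day to show that the result does not depend on fixed character positions.

```r
# Return the last n characters of each element (hypothetical helper).
last_n <- function(x, n) {
  x <- as.character(x)                    # nchar() does not accept factors
  substr(x, nchar(x) - n + 1, nchar(x))
}

df <- data.frame(ReleaseDate = factor(c("Apr 10, 2001", "Apr 1, 2007")))
df$ReleaseYear <- as.integer(last_n(df$ReleaseDate, 4))
df$ReleaseYear
## [1] 2001 2007
```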

Format date strings comprising weeks and quarters as Date objects

I have dates in an R dataframe column formatted as character strings as WK01Q32014.
I want to turn each date into a Date() object.
So I altered the format to make it look like 01-3-2014. I want to try to do something like as.Date("01-3-2014","%W-%Q-%Y") for example, but there is no format code for quarters that I know of.
Is there any way to do this using the lubridate, zoo, or any other libraries?
I don't know of any specific function, but here's a basic one, which treats every quarter as exactly 13 weeks counted from 1 January:
convert_WQ_to_Date <- function(D) {
weeks <- as.integer(substr(D, 3, 4))
quarter <- as.integer(substr(D, 6, 6))
year <- substr(D, 7, 10)
days <- 7 * ((quarter - 1) * 13 + (weeks-1))
as.Date(sprintf("%s-01-01", year)) + days
}
Example
D <- c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014")
convert_WQ_to_Date(D)
[1] "2014-07-02" "2014-01-01" "2014-10-29" "2014-04-02" "2014-07-09"
The week, quarter and year do not uniquely define a date, so we have to add some assumption. Here we assume that the first week starts on the first day of the quarter, the second week starts 7 days later, and so on.
Below, we extract the qtr-year part and use as.yearqtr in the zoo package to convert that to a yearqtr object and then use as.Date to convert that to a date which is the first of the quarter. We then extract the week, subtract 1 and multiply by 7 to get the days offset. Adding the first of the quarter to the offset gives the result:
library(zoo)
xx <- "01-3-2014" # week-quarter-year
qtr.start <- as.Date(as.yearqtr(sub("...", "", xx), "%q-%Y"))
days <- 7 * (as.numeric(sub("-.*", "", xx)) - 1)
qtr.start + days
## [1] "2014-07-01"
Assuming the traditional notion of each quarter starting respectively on 1st January, 1st April, 1st July and 1st October (in line with the quarters function), just start at these dates and add 7 days for each week:
x <- c("01-3-2014","01-1-2014","05-4-2014","01-2-2014","02-3-2014")
y <- as.numeric(substr(x,6,9))
m <- as.numeric(substr(x,4,4))
d <- as.numeric(substr(x,1,2))
as.Date(paste(y,(m-1)*3+1,"01",sep="-")) + (7*(d-1))
#[1] "2014-07-01" "2014-01-01" "2014-10-29" "2014-04-01" "2014-07-08"
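The same quarter-start assumption can be applied directly to the original WK01Q32014 strings, without first rewriting them as 01-3-2014. The sketch below uses only base R; the function name wkq_to_date is made up for illustration, and strings that do not match the expected pattern come back as NA.

```r
# Convert strings like "WK01Q32014" to Dates, assuming week 1 starts on the
# first day of the quarter (hypothetical helper name).
wkq_to_date <- function(x) {
  m <- regmatches(x, regexec("^WK([0-9]{2})Q([1-4])([0-9]{4})$", x))
  week    <- as.integer(vapply(m, `[`, character(1), 2))
  quarter <- as.integer(vapply(m, `[`, character(1), 3))
  year    <- as.integer(vapply(m, `[`, character(1), 4))
  # First month of the quarter: 1, 4, 7 or 10
  qtr_start <- as.Date(sprintf("%d-%02d-01", year, (quarter - 1L) * 3L + 1L),
                       format = "%Y-%m-%d")
  qtr_start + 7 * (week - 1L)
}

res <- wkq_to_date(c("WK01Q32014", "WK05Q42014", "bad input"))
format(res)
## [1] "2014-07-01" "2014-10-29" NA
```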

How to subset a list of dataframes in R?

I have multiple datasets of physical variables, and I want to do some work on them with R. However, I would like to use a list. Here is my code for one of my data frames:
# Table definition
df.jannuary <- read.table("C:\\...file1.csv", sep=";")
# Subset of the table containing only variables of interest
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12))
# Column names
colnames(df.jannuary_sub)<-c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct object through the temporary column "timestamp"
df.jannuary_sub$timestamp <- as.POSIXct(paste(df.jannuary_sub$year, df.jannuary_sub$day, df.jannuary_sub$hour, df.jannuary_sub$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
df.jannuary_sub$date <- format(df.jannuary_sub$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
df.jannuary_sub <- subset(df.jannuary_sub, select=c(11, 5:9))
This code works. The thing is, I have all the months of the year, for several years.
So I started to use a list. Here is the example for the year 2011:
df.jannuary <- read.table("C:\\...\file1.dat", sep=",")
#...
df.december <- read.table("C:\\...\file12.dat", sep=",")
# Creation of a list containing the month datasets, with a subset of the tables containing only variables of interest
list.dataset_2011<-list(
df.jannuary_sub <- subset(df.jannuary, select=c(2:8, 11:12)),
#...
df.december_sub <- subset(df.december, select=c(2:8, 11:12))
)
# Column names for all variables of the list
for (j in 1:12)
{
colnames(list.dataset_2011[[j]])<-c("year","day","hour","minute","temp_air","temp_eau","humidity_rel","wind_intensity","wind_direction")
}
# Conversion of the list into a data.frame called "list.dataset_2011"
for (i in 1:9)
{
list.dataset_2011[[i]]<-as.data.frame(list.dataset_2011[[i]])
}
# Aggregation of the 4 Year-Day-Hour-Minute columns into a single column and conversion into a POSIXct object through the temporary column "timestamp"
list.dataset_2011$timestamp <- as.POSIXct(paste(list.dataset_2011$year, list.dataset_2011$day, list.dataset_2011$hour, list.dataset_2011$minute), format="%Y %j %H %M", tz="GMT")
# Getting the date with a new format from julian day to normal day into a column called "date"
list.dataset_2011$date <- format(list.dataset_2011$timestamp,"%d/%m/%Y %H:%M",tz = "GMT")
# Suppression of the 4 Year-Day-Hour-Minute initial columns and of the temporary column "timestamp", and placement of the date column as column 1
list.dataset_2011 <- subset(list.dataset_2011, select=c(11, 5:9))
I encounter a problem at the end of my code (hoping the rest works!) with the subset command, which doesn't appear to work on an object of class "list".
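One way this might be approached, offered as a sketch rather than a definitive answer: $ and subset() act on a single data frame, so the per-month steps can be put in a function and applied to every element of the list with lapply. In the example below, fake_month and process_month are hypothetical names, and the two small tables are made-up stand-ins for the monthly files after the column subset.

```r
col_names <- c("year", "day", "hour", "minute", "temp_air", "temp_eau",
               "humidity_rel", "wind_intensity", "wind_direction")

# Hypothetical stand-in for one monthly table (two rows, nine columns)
fake_month <- function(julian_day) {
  data.frame(V1 = 2011, V2 = julian_day, V3 = c(0, 12), V4 = c(0, 30),
             V5 = c(1.5, 2.5), V6 = 0, V7 = 0, V8 = 0, V9 = 0)
}
list.dataset_2011 <- list(january = fake_month(1), february = fake_month(32))

# All steps that were written for one data frame, wrapped in a function
process_month <- function(d) {
  colnames(d) <- col_names
  timestamp <- as.POSIXct(paste(d$year, d$day, d$hour, d$minute),
                          format = "%Y %j %H %M", tz = "GMT")
  d$date <- format(timestamp, "%d/%m/%Y %H:%M", tz = "GMT")
  # Keep the date first, then the five measurement columns, selected by name
  d[, c("date", col_names[5:9])]
}

# Apply the function to each data frame in the list
list.dataset_2011 <- lapply(list.dataset_2011, process_month)
list.dataset_2011$february$date
## [1] "01/02/2011 00:00" "01/02/2011 12:30"
```

Selecting columns by name rather than by position avoids having to recount indices after the temporary columns are added.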

Resources