I am having some trouble parsing the following dates with lubridate. I'm not married to the lubridate approach, but can someone recommend a good way to handle these wonky Sept dates?
library(lubridate)
df <- data.frame(y=1:5, Date=c("Sept 1 2002","Sept 7 2002","Sept 9 2002","Sept 20 2002","Sept 21 2002"))
I didn't really expect this to work:
df$Date2=mdy(df$Date)
But I do not understand why this one didn't work:
df$Date2=parse_date_time(df$Date, "%b %d %Y")
Any ideas?
It will work if the month abbreviations match those in month.abb. One option is to remove the 't' in 'Sept' using sub: the pattern captures the first three characters and drops the fourth.
mdy(sub('(...).', '\\1', df$Date))
#[1] "2002-09-01 UTC" "2002-09-07 UTC" "2002-09-09 UTC" "2002-09-20 UTC" "2002-09-21 UTC"
and
month.abb
#[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
From ?strptime:
%b: Abbreviated month name in the current locale on this platform.
(Also matches full name on input: in some locales there are no
abbreviations of names.)
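A base-R variant of the same idea, as a minimal sketch. It assumes an English or C LC_TIME locale so that %b matches "Sep", and the vector name `dates` is made up for illustration. Unlike the positional pattern above, it only touches strings that actually start with "Sept ".

```r
# Sample input mirroring the question (hypothetical vector name).
dates <- c("Sept 1 2002", "Sept 7 2002", "Sept 21 2002")

# "Sept" is not one of R's built-in abbreviations, which is why %b fails on it.
"Sept" %in% month.abb
# [1] FALSE

# Rewrite a leading "Sept " to "Sep " and leave every other month untouched.
cleaned <- sub("^Sept ", "Sep ", dates)

# Assumption: English or C LC_TIME locale, so %b recognises "Sep".
parsed <- as.Date(cleaned, format = "%b %d %Y")
format(parsed)
```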
Related
I have a df with a column which has dates stored in character format, for which I want to extract the months. For this I use the following:
mutate(
  Date = as.Date(str_remove(Timestamp, "_.*")),
  Month = month(Date, label = FALSE)
)
However, October, November and December are stored with an extra zero in front of the month number (e.g. 010), and the conversion doesn't recognise it. How can I adjust the code above to fix this? This is my Timestamp column:
c("2021-010-01_00h39m", "2021-010-01_01h53m", "2021-010-01_02h36m",
"2021-010-01_10h32m", "2021-010-01_10h34m", "2021-010-01_14h27m"
)
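First convert the values to Date, then use format to extract the month. Here x holds the Timestamp values, and the literal 0 in the format string consumes the extra zero.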
format(as.Date(x, '%Y-0%m-%d'), '%b')
#[1] "Oct" "Oct" "Oct" "Oct" "Oct" "Oct"
%b gives the abbreviated month name; you can also use %B (full name) or %m (month number), depending on what you need.
format(as.Date(x, '%Y-0%m-%d'), '%B')
#[1] "October" "October" "October" "October" "October" "October"
format(as.Date(x, '%Y-0%m-%d'), '%m')
#[1] "10" "10" "10" "10" "10" "10"
One way would be to use strsplit to extract the second element:
month.abb[readr::parse_number(sapply(strsplit(x, split = '-'), "[[", 2))]
which will return:
#"Oct" "Oct" "Oct" "Oct" "Oct" "Oct"
data:
c("2021-010-01_00h39m", "2021-010-01_01h53m", "2021-010-01_02h36m",
"2021-010-01_10h32m", "2021-010-01_10h34m", "2021-010-01_14h27m"
) -> x
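Another option, sketched here under assumptions, is to repair the string first and then parse it normally. The third sample value (a two-digit month) is hypothetical and only there to show that well-formed rows are left alone; the names `fixed`, `dates` and `months` are illustrative.

```r
# Two values from the question plus one hypothetical well-formed timestamp.
x <- c("2021-010-01_00h39m", "2021-010-01_14h27m", "2021-09-30_23h05m")

# Drop the extra leading zero only when the month field has three digits.
fixed <- sub("^([0-9]{4})-0([0-9]{2})-", "\\1-\\2-", x)

# Keep the part before the underscore and parse it as a Date.
dates <- as.Date(sub("_.*", "", fixed), format = "%Y-%m-%d")

# Numeric month, comparable to month(Date, label = FALSE) in the question.
months <- as.integer(format(dates, "%m"))
months
# [1] 10 10  9
```

Repairing the string rather than baking the zero into the format keeps the standard "%Y-%m-%d" format usable for the rest of the pipeline.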
Before marking as duplicate, I've tried a few other solutions, namely these:
R, strptime(), %b, trying to convert character to date format
strptime, as.POSIXct and as.Date return unexpected NA
But neither seem to work for me.
I'm trying to convert a time format Dec-18 to a POSIXct time (would be 2018-12-01 in this case). I'm attempting to use strptime with %b and %y to achieve this as so:
> strptime("Dec-18", format = "%b-%y")
[1] NA
But obviously it is not working. I've been reading a lot about "locales" and such, but the above solutions did not work for me. I attempted the following:
> Sys.setlocale("LC_TIME", "C")
[1] "C"
> strptime("Dec-18", format = "%b-%y")
[1] NA
It was also suggested to use this locale, Sys.setlocale("LC_TIME", "en_GB.UTF-8"), but I get an error when trying to use this:
> Sys.setlocale("LC_TIME", "en_GB.UTF-8")
[1] ""
Warning message:
In Sys.setlocale("LC_TIME", "en_GB.UTF-8") :
OS reports request to set locale to "en_GB.UTF-8" cannot be honored
Kind of at a loss for what to do here. My abbreviated months seem right based off this:
> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Here's the version of R that I am running:
R version 3.5.3 (2019-03-11) -- "Great Truth"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
Thanks in advance.
With lubridate, you can do:
parse_date_time("Dec-18", "my")
[1] "2018-12-01 UTC"
The simplest solution would be
library(zoo)
as.Date(as.yearmon("Dec-18", "%b-%y"))
#[1] "2018-12-01"
The issue in the OP's code is that strptime and as.Date need a day as well: when a month is given, the day of the month must be specified too, otherwise the result is NA. One option is to paste a day onto the string before calling strptime:
strptime(paste0("Dec-18", "-01"), format = "%b-%y-%d")
#[1] "2018-12-01 EST"
The simplest solution will be this:
as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y")
#> [1] "2018-12-01"
format(x = as.Date(x = paste0("01-", "Dec-18"),
format = "%d-%b-%y"),
format = "%b-%y")
#> [1] "Dec-18"
Created on 2019-05-15 by the reprex package (v0.2.1)
R can't parse Dec-18 as a date because it has no day. Prepend 01- so that it can be parsed as a date, then format it for display as you prefer.
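The paste-a-day approach can be wrapped in a small helper. This is a sketch only: `parse_month_year` is a hypothetical name, "Jan-19" is an invented second input, and %b assumes an English or C LC_TIME locale.

```r
# Hypothetical helper: parse month-year strings by pinning the day to the 1st.
parse_month_year <- function(x, fmt = "%b-%y") {
  as.Date(paste0("01-", x), format = paste0("%d-", fmt))
}

d <- parse_month_year(c("Dec-18", "Jan-19"))
format(d)
# [1] "2018-12-01" "2019-01-01"

# The question asked for POSIXct; going through the string form
# pins the result to midnight in an explicit time zone (UTC here).
p <- as.POSIXct(format(d), tz = "UTC")
```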
I have a data frame consisting of 96321 observations of 11 variables. The data is confidential, so I am not able to share it, although I am sharing a screenshot of it.
My focus is on the FY and OM variables.
levels(mydata$FY)
[1] "2010/11" "2011/12" "2012/13" "2013/14" "2014/15" "2015/16"
levels(mydata$OM)
[1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"
I just want to re-arrange the levels of the 'OM' variable as I want to start my year from April to March (financial Year).
I used the following command to rearrange the levels of my 'OM' variables:
table(is.na(mydata$OM))
FALSE
96321
levels(mydata$OM)<-c('Apr','May','Jun','July','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar'
)
table(is.na(mydata$OM)) #NO NA is introduced
FALSE
96321
levels(mydata$OM)
[1] "Apr" "May" "Jun" "July" "Aug" "Sep" "Oct" "Nov" "Dec" "Jan" "Feb" "Mar"
I got the result as I expected but when I tried to arrange my data sorted by the 'OM' variable using sql I am not getting the desired result.
sortedData <-sqldf('SELECT * FROM mydata
ORDER BY OM ASC')
I expected the result in increasing order of the levels of the 'OM' variable, i.e. Apr first, then May, and Mar last. But the order is somewhat distorted. Please help me with this.
Note:- I also tried
mydata$OM <- factor(mydata$OM, levels = c('Apr','May','Jun','July','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar'
))
mydata$OM <-factor(mydata$OM, levels = c('Apr','May','Jun','July','Aug','Sep','Oct','Nov','Dec',
'Jan','Feb','Mar'),
labels = c('Apr','May','Jun','July','Aug','Sep','Oct','Nov','Dec',
'Jan','Feb','Mar'))
But these introduced NA in the result.
table(is.na(mydata$OM))
FALSE TRUE
88097 8224
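mydata$OM <- factor(mydata$OM, levels = c('Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar'))
Note that the level must be spelled 'Jul', as in the original data; with 'July' every Jul row becomes NA. Run this on the original column, not on one whose levels were already overwritten with levels<-. Then use mydata[order(mydata$OM),]
This will solve your problem. To sort on multiple columns use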
mydata[order(mydata$OM,mydata$FY),]
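To see why the question's attempts misbehaved, here is a small sketch on toy data (the confidential data frame is not available, so `om` and the other names are made up). It contrasts assigning to levels(), which renames values, with factor(..., levels = ), which only changes the ordering. Separately, and as an assumption rather than something shown in the thread, sqldf appears to hand factor columns to SQLite as plain text, which would make ORDER BY alphabetical regardless of level order.

```r
om <- factor(month.abb)               # levels are alphabetical: Apr, Aug, Dec, ...
fy_order <- month.abb[c(4:12, 1:3)]   # Apr, May, ..., Mar

# Assigning to levels() renames the existing levels in place, so the data changes.
relabelled <- om
levels(relabelled) <- fy_order
as.character(relabelled)[1]           # the value that was "Jan" now reads "Aug"

# factor(..., levels = ) keeps every value and only changes the ordering.
reordered <- factor(as.character(om), levels = fy_order)
sorted <- as.character(sort(reordered))
sorted                                # Apr first, Mar last

# A level that does not match the data ("July" vs "Jul") turns those rows into NA.
misspelled <- factor(c("Jun", "Jul"), levels = c("Jun", "July"))
sum(is.na(misspelled))
# [1] 1
```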
If I have a vector:
Months = month.abb[1:12]
I want to extract all the months that start with Letter J (in this case, Jan, Jun, and Jul).
Is there a wildcard character, like * in Excel, which lists all elements of vectors which you search for J*?
How do I extract elements that start with either letter 'M' or 'A'. The expected output would be Mar,May,Apr,Aug?
Try:
grep("^J", Months,value=TRUE)
#[1] "Jan" "Jun" "Jul"
grep("^A|^M", Months,value=TRUE)
#[1] "Mar" "Apr" "May" "Aug"
You'll find the glob2rx function helpful for converting wildcard constructions to regular expressions:
> glob2rx("J*")
[1] "^J"
> grep(glob2rx("J*"), Months, value=TRUE)
[1] "Jan" "Jun" "Jul"
If you happen to have stringr loaded you could do:
library(stringr)
str_subset(Months, "^J")
[1] "Jan" "Jun" "Jul"
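For plain prefix matching, base R also offers alternatives that avoid regular expressions altogether. A minimal sketch (the result variable names are illustrative):

```r
Months <- month.abb

# startsWith() does a literal prefix test, no regex involved.
j_months <- Months[startsWith(Months, "J")]
j_months
# [1] "Jan" "Jun" "Jul"

# A character class is a compact way to write "starts with A or M".
am_months <- grep("^[AM]", Months, value = TRUE)
am_months
# [1] "Mar" "Apr" "May" "Aug"

# Same result by comparing the first character against a set.
am_months2 <- Months[substr(Months, 1, 1) %in% c("A", "M")]
```

Results come back in the vector's original order, which is why Mar precedes Apr.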