I have dates in this format:
Apr-12,
Dec-12,
30-Jul-14,
Mar-16,
29-Feb-16,
May-17,
20-Nov-14,
R is treating this column as a factor. I want it to be treated as a date, and wherever the day is missing, it should be replaced with the 1st.
Thank you in advance!
I think we need to parse them in two passes because the format is not consistent. We first parse the values that have day, month and year components. The ones that return NA are then parsed again after prepending "01-" to them.
new_x <- as.Date(x, "%d-%b-%y")
new_x[is.na(new_x)] <- as.Date(paste0("01-", x[is.na(new_x)]), "%d-%b-%y")
new_x
#[1] "2012-04-01" "2012-12-01" "2014-07-30" "2016-03-01" "2016-02-29" "2017-05-01"
#[7] "2014-11-20"
Read more about formats at ?strptime.
data
x <-factor(c("Apr-12", "Dec-12", "30-Jul-14", "Mar-16", "29-Feb-16",
"May-17","20-Nov-14"))
Conditionally prepend "01-" when the first three characters are in the built-in vector month.abb (dtvec here holds the dates as a character vector):
as.Date( ifelse( substr(dtvec,1,3) %in% month.abb, paste0("01-",dtvec), dtvec) ,"%d-%b-%y")
[1] "2012-04-01" "2012-12-01" "2014-07-30" "2016-03-01" "2016-02-29" "2017-05-01" "2014-11-20"
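The month.abb check above can be wrapped in a small helper. This is only a minimal sketch: the name fill_missing_day is made up for illustration, and it assumes an English LC_TIME locale so that %b matches "Apr", "May" and so on. It converts the input to character first, because ifelse() applied to a factor tends to return the underlying integer codes rather than the labels.

```r
# Hypothetical helper (not from the answers above): prepend "01-" to values
# that start with a month abbreviation, then parse everything in one call.
fill_missing_day <- function(x, format = "%d-%b-%y") {
  x <- as.character(x)                        # work on labels, not factor codes
  needs_day <- substr(x, 1, 3) %in% month.abb # TRUE where the day is missing
  as.Date(ifelse(needs_day, paste0("01-", x), x), format = format)
}

x <- factor(c("Apr-12", "Dec-12", "30-Jul-14", "Mar-16", "29-Feb-16",
              "May-17", "20-Nov-14"))
fill_missing_day(x)
```

Because the conversion happens inside the helper, it can be applied directly to the factor column as it was read in.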
Dates formatted as 4/29/2016 are parsed correctly, but dates formatted as 6242016 and 2042016 are not parsed.
Does R think that some of the dates without the slash have day first instead of month?
I've tried including dmy in lubridate and it still doesn't work.
I've tried looking at Sys.getlocale("LC_TIME") and it gives me "English_United States.1252".
demo$date <- as.character(demo$date)
demo <- demo %>%
mutate(date = parse_date_time(date, "mdy"))
You can wrangle your dates all into the same format using stringr. Then convert to numeric and use lubridate to parse.
library(stringr)
library(lubridate)
dates <- c("6242016", "2042016", "4/29/2016")
dates <- str_remove_all(dates, "/")
dates <- as.numeric(dates)
lubridate::mdy(dates)
# [1] "2016-06-24" "2016-02-04" "2016-04-29"
as.Date(sprintf("%08d",
as.numeric(gsub("/", "", c("6242016", "2042016", "4/29/2016")))),
format = "%m%d%Y")
# [1] "2016-06-24" "2016-02-04" "2016-04-29"
This
as.Date("2042016", "%m%d%Y")
returns NA as opposed to
as.Date("02042016", "%m%d%Y")
This is because, without separators, the month has to be represented by two digits (01-12).
Try adding a leading zero for months in the range [1,9].
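The leading-zero advice can be sketched in base R as follows. The helper name parse_mixed_mdy is hypothetical, and the sketch assumes that digit-only values always carry a two-digit day (a string such as "1122016" is otherwise ambiguous). Values that contain slashes are parsed as they are rather than stripped, since stripping the slashes from something like "1/2/2016" would leave "122016", which cannot be padded back correctly.

```r
# Hypothetical helper: parse month-day-year strings with or without slashes.
# Assumption: digit-only values have a two-digit day and a four-digit year.
parse_mixed_mdy <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))

  # Slash-separated values: one- or two-digit fields are both accepted.
  has_slash <- grepl("/", x, fixed = TRUE)
  out[has_slash] <- as.Date(x[has_slash], format = "%m/%d/%Y")

  # Digit-only values: left-pad to 8 characters so the month has two digits.
  digits_only <- grepl("^[0-9]{7,8}$", x)
  padded <- paste0(strrep("0", 8 - nchar(x[digits_only])), x[digits_only])
  out[digits_only] <- as.Date(padded, format = "%m%d%Y")

  out  # anything matching neither shape stays NA
}

parse_mixed_mdy(c("6242016", "2042016", "4/29/2016", "1/2/2016"))
```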
I have three data tables in R. Each one has a date column. The tables are vix_data, gold_ohlc_data and btc_ohlc_data. They are formatted as follows:
head(vix_data$Date)
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
3435 Levels: 1/10/05 1/10/06 1/10/07 1/10/08 1/10/11 ... 9/9/16
head(gold_ohlc_data$date)
[1] 8/23/17 8/22/17 8/21/17 8/18/17 8/17/17 8/16/17
2519 Levels: 1/10/08 1/10/11 1/10/12 1/10/13 1/10/14 ... 9/9/16
head(btc_ohlc_data$Date)
[1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-20" "2017-08-19"
[6] "2017-08-18"
How can I change the date column in the vix_data and gold_ohlc_data tables to match the btc_ohlc_data format? I have tried several methods, for example using as.Date to transform each column, but this usually messes up the values and inserts a lot of NAs.
An option is to use functions from the package lubridate. You need to know which field is the day and which is the month to select the right function, such as dmy or mdy.
# Load package
library(lubridate)
# Create example string
date1 <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
date2 <- c("8/23/17", "8/22/17", "8/21/17", "8/18/17", "8/17/17", "8/16/17")
# Convert to date class
dmy(date1)
# [1] "2004-02-01" "2004-05-01" "2004-06-01" "2004-07-01" "2004-08-01" "2004-09-01"
mdy(date1)
# [1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
mdy(date2)
# [1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-18" "2017-08-17" "2017-08-16"
Look into the package lubridate. The values are month/day/year (8/23/17 can only be read that way), so lubridate::mdy() should handle this just fine.
It looks like your data are read in as factors, so first you'll have to change them to characters. Then after that you can convert it to a date and specify the input format where %m represents the numerical month, %d represents the day, and %y represents the 2-digit year.
x <- c('1/2/04', '1/5/04', '1/6/04', '1/7/04', '1/8/04', '1/9/04')
y <- as.Date(x, format = "%m/%d/%y")
y
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08"
[6] "2004-01-09"
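Applied to a whole column, the same conversion might look like the sketch below. The data frame here is a toy stand-in for vix_data, with invented Close values, and the NA check is an added precaution rather than part of the answer above.

```r
# Toy stand-in for vix_data; the Close values are invented for illustration.
vix_data <- data.frame(
  Date  = factor(c("1/2/04", "1/5/04", "9/9/16")),
  Close = c(18.2, 17.5, 17.5)
)

# Factor -> character -> Date, with the format matching the input strings.
converted <- as.Date(as.character(vix_data$Date), format = "%m/%d/%y")

# A format mismatch shows up as NA, so check before overwriting the column.
stopifnot(!anyNA(converted))
vix_data$Date <- converted

format(vix_data$Date)
# [1] "2004-01-02" "2004-01-05" "2016-09-09"
```

Once the column is of class Date it prints in the same yyyy-mm-dd form as btc_ohlc_data$Date.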
Are you sure you're specifying as.Date correctly? For example, do you have %y, instead of %Y?
I did the following and it worked:
> vix <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
> vix<- as.factor(vix)
> vix
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
Levels: 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
> as.Date(vix, "%m/%d/%y")
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
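To illustrate the %y versus %Y question: %y expects a two-digit year (per ?strptime, 00 to 68 map to 20xx and 69 to 99 to 19xx), while %Y expects the full year. The sketch below compares the two on the same string; with %Y the "04" is typically taken as the literal year 4 rather than 2004, which is one way values get "messed up" without any error.

```r
right <- as.Date("1/2/04", format = "%m/%d/%y")  # two-digit year
wrong <- as.Date("1/2/04", format = "%m/%d/%Y")  # four-digit year expected

# Compare the parsed years numerically; only the %y version gives 2004.
as.POSIXlt(right)$year + 1900
as.POSIXlt(wrong)$year + 1900
```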
I am facing an issue with date formatting and cannot find a solution. Here is the code. The first date converts as expected, but the second one becomes NA.
date1
#[1] "01. Nov 11"
ndate1 <- as.Date(date1, "%d. %B %y")
ndate1
#[1] "2011-11-01"
date2
#[1] "26-May-13"
ndate2 <- as.Date(date2, "%d-%B-%y")
ndate2
#[1] NA
You can determine the complete or abbreviated month names in your locale using the example on the ?Constants page:
format(ISOdate(2000, 1:12, 1), "%b")
Per ?strptime, on input you can use either "%B" or "%b" to match either abbreviated or complete month names.
This is most probably due to the locale settings. If Sys.getlocale("LC_TIME") does not report an English locale such as "en_US.UTF-8" or "en_GB.UTF-8", the month name "May" (which, coincidentally, is not even an abbreviation in this case) may not be recognized. In contrast, "Nov" is a valid abbreviation for November in several languages, which might explain why the first case with date1 does not cause trouble.
We could try this:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
date2 <- "26-May-13"
ndate2 <- as.Date(date2, "%d-%b-%y")
ndate2
#[1] "2013-05-26"
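Locale names such as "en_US.UTF-8" are platform-specific, and changing LC_TIME for the whole session affects later output. A minimal sketch of a more contained variant is shown below. The helper name as_date_c_locale is made up for illustration, and it relies on the "C" locale, which uses English month names, instead of a named English locale.

```r
# Hypothetical helper: parse with English month names, then restore the
# caller's LC_TIME setting even if parsing fails.
as_date_c_locale <- function(x, format) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

as_date_c_locale("26-May-13", "%d-%b-%y")
```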
I've imported one date value into R:
dtime <- read.csv("dtime.csv", header=TRUE)
Its output (7th Nov, 2013) is printed as:
> dtime
Date
1 07-11-2013 23:06
and also its class is 'factor'.
> class(dtime$Date)
[1] "factor"
Now, I want to extract the time details (hours, minutes, seconds) from the data. So, I was trying to convert the dataframe's date value to Date type. But none of the following commands worked:
dtime <- as.Date(as.character(dtime),format="%d%m%Y")
unclass(as.POSIXct(dtime))
as.POSIXct(dtime$Date, format = "%d-%m-%Y %H:%M:%S")
How do I achieve this in R???
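Your attempts didn't work because the format string has to match the input exactly. The values look like "07-11-2013 23:06", so the format needs dashes between the date fields and must not include seconds: "%d-%m-%Y %H:%M". The first attempt also passes the whole data frame instead of the dtime$Date column.
With base R there are two ways of solving this. The first uses as.POSIXlt: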
Res <- as.POSIXlt(dtime$Date, format = "%d-%m-%Y %H:%M")
Res$hour
Res$min
Also, for more options, see
attr(Res, "names")
## [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst" "zone" "gmtoff"
Or a bit less conveniently with as.POSIXct
Res2 <- as.POSIXct(dtime$Date, format = "%d-%m-%Y %H:%M")
format(Res2, "%H") # returns a character vector
format(Res2, "%M") # returns a character vector
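A self-contained sketch of the same idea is shown below. The timestamp is typed in as a factor to mimic what read.csv produced, and tz = "UTC" is an assumption added here so the result does not depend on the machine's time zone.

```r
# Mimic the imported column: a single factor value.
stamp <- factor("07-11-2013 23:06")

# The original attempt included %S, which the input does not contain.
bad <- as.POSIXct(as.character(stamp), format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
is.na(bad)

# Matching format: day-month-year, then hours and minutes only.
parsed <- as.POSIXct(as.character(stamp), format = "%d-%m-%Y %H:%M", tz = "UTC")

# Numeric components via POSIXlt, or via format() plus as.integer().
lt <- as.POSIXlt(parsed)
lt$hour
lt$min
as.integer(format(parsed, "%H"))
```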
I would like to contribute a solution using lubridate:
dates <- c("07-11-2013 23:06", "08-10-2012 11:11")
dta <- data.frame(dates)
library(lubridate)
dta$properDate <- dmy_hm(dta$dates)
If needed, lubridate will enable you to conveniently specify time zones or extract additional information.
How would I convert the following character variables to dates?
strDates <- c("Jan.2008", "Feb.2008")
str(strDates)
chr [1:2] "Jan.2008" "Feb.2008"
dates <- as.Date(strDates, "%b %Y")
str(dates)
Date[1:2], format: NA NA
Any assistance would be greatly appreciated
To form a valid 'date', you also need a day which your data was lacking. So we add one, and we simply use an arbitrary day (here: first of the month):
R> strDates <- c("Jan.2008", "Feb.2008")
R> strptime(paste("01", strDates), "%d %b.%Y")
[1] "2008-01-01" "2008-02-01"
R>
A Date requires a day element as well, so you can add that to the input string with paste:
full.dates <- paste("01", strDates, sep = ".")
Specify the template correctly, including separator tokens:
as.Date(full.dates, "%d.%b.%Y")
[1] "2008-01-01" "2008-02-01"
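Once the strings are real Date values, they can be formatted back to the month-year form or used in date arithmetic. The following is a small sketch, assuming an English LC_TIME locale so that %b reads and writes "Jan" and "Feb".

```r
strDates <- c("Jan.2008", "Feb.2008")

# Add an arbitrary day (the 1st) and parse with matching separators.
dates <- as.Date(paste0("01.", strDates), format = "%d.%b.%Y")

# Round trip: format the dates back to the original month-year strings.
format(dates, "%b.%Y")

# Date arithmetic now works, e.g. a monthly sequence from the first value.
seq(dates[1], by = "month", length.out = 3)
```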