R converting a factor YYYY-MM to a date

I have a data frame with a date column in the form YYYY-MM, of class factor, and I am trying to convert it to class Date.
I tried:
Date <- c("2015-08","2015-09","2015-08")
Val <- c(1,2,3)
df <- data.frame(Date,Val)
df[,1] <- as.POSIXct(as.character(df[,1]), format = "%Y-%m") 
df
But this does not work (the result is NA). I would be grateful for your help.

1) Convert the dates to zoo's "yearmon" class and then to "Date" class:
> library(zoo)
> transform(df, Date = as.Date(as.yearmon(Date)))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
The question did not specify which day of the month to convert to, so we used the first of the month. Had the last of the month been wanted, we could have used this instead:
transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
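For readers who would rather avoid the zoo dependency, here is a minimal base-R sketch of the same last-of-month conversion, assuming the input is "YYYY-MM" text (or a factor with such labels). `last_of_month()` is a hypothetical helper, not part of any package: it finds the first day of the following month and steps back one day.

```r
# Hypothetical helper (not from zoo): last calendar day of each "YYYY-MM" value.
last_of_month <- function(ym) {
  first <- as.Date(paste0(as.character(ym), "-01"))  # first day of the month
  y <- as.integer(format(first, "%Y"))
  m <- as.integer(format(first, "%m"))
  # First day of the following month, rolling December over into January.
  next_y <- ifelse(m == 12L, y + 1L, y)
  next_m <- ifelse(m == 12L, 1L, m + 1L)
  as.Date(sprintf("%04d-%02d-01", next_y, next_m)) - 1  # step back one day
}

last_of_month(c("2015-08", "2015-09", "2016-02", "2015-12"))
```

Going through "first of next month minus one day" lets the calendar handle month lengths and leap years, so no table of days per month is needed.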
2) Another possibility not using zoo is to just add the day of the month yourself and then convert to "Date" class.
> transform(df, Date = as.Date(paste(Date, 1, sep = "-")))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
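Since the question's column is a factor, it may help to see approach 2 applied to a factor explicitly. This sketch rebuilds the question's data with `factor()`, converts the labels with `as.character()` (paste() would use the labels anyway, but this makes the intent clear), and checks the resulting class.

```r
# Rebuild the question's data with an explicit factor column.
df <- data.frame(Date = factor(c("2015-08", "2015-09", "2015-08")),
                 Val = c(1, 2, 3))

# Append a day, then parse with an explicit format string.
df$Date <- as.Date(paste0(as.character(df$Date), "-01"), format = "%Y-%m-%d")

class(df$Date)  # "Date"
df
```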
3) Alternatively, you might want to just use "yearmon" directly, since it models year and month with no day.
> library(zoo)
> transform(df, Date = as.yearmon(Date))
Date Val
1 Aug 2015 1
2 Sep 2015 2
3 Aug 2015 3
Note: Do not use "POSIXct" class as this gives a time zone dependent result that can cause subtle errors if you are not careful. A date in one time zone is not necessarily the same as in another time zone.
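To make the time-zone caveat concrete: a single POSIXct instant can fall on different calendar dates depending on the zone it is displayed in. This sketch assumes the system time-zone database knows "America/Los_Angeles".

```r
# Midnight on 1 August 2015 in UTC...
x <- as.POSIXct("2015-08-01 00:00:00", tz = "UTC")

# ...is still 31 July on the US west coast (UTC-7 in summer).
format(x, "%Y-%m-%d", tz = "UTC")                  # "2015-08-01"
format(x, "%Y-%m-%d", tz = "America/Los_Angeles")  # "2015-07-31"

# A "Date" carries no time zone, so there is nothing to shift.
as.Date("2015-08-01")
```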

R cannot parse a date from the format "%Y-%m" alone: once a month is given, a day is needed too.
You can do the following:
as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
Resulting in
"2015-08-01 CEST" "2015-09-01 CEST" "2015-08-01 CEST"
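A small sketch of why the day is needed, and of a variant that returns class "Date" rather than POSIXct, which sidesteps the time-zone suffix (CEST above) entirely:

```r
ym <- c("2015-08", "2015-09", "2015-08")

# Month given without a day: strptime() cannot build a date, so we get NA.
as.Date(ym, format = "%Y-%m")

# Supplying the day explicitly works and yields a time-zone-free "Date".
as.Date(paste0(ym, "-01"), format = "%Y-%m-%d")
```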

Related

Rounding dates with round_date() in R

I am trying to convert dates in the format yyyymmdd to the year (yyyy) only in R.
The question "how to convert numeric only year in Date in R?" has a very interesting answer: it gets R to convert an 8-digit entry (yyyymmdd) to a 4-digit year (yyyy) with the lubridate package, which is very useful for me.
In older code I used round_date() for this:
library(dplyr)
library(lubridate)
date2<-c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name<-c('A','B','C','D','E')
df<-data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = round_date(date2,'year'))
df2
str(df2)
date2<date> name<chr> year_date <date>
2000-01-01 A 2000-01-01
2000-08-08 B 2001-01-01
2001-03-16 C 2001-01-01
2000-12-25 D 2001-01-01
2000-02-29 E 2000-01-01
But I started to have problems with my statistical analysis when I discovered, for example, that the date 2000-08-08 was rounded up to 2001-01-01 instead of 2000-01-01 as I expected.
This is a very big problem for me, since information that belongs to one year (say 2005) gets moved to the next (2006), and I have more than 1400 rows in my database.
I noticed that dates after the middle of the year (after June) are rounded up to the next year, which is very bad for my analysis.
How do I round a 2000-08-08 date to just 2000 instead of 2001?
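The behaviour described here is what "rounding" means: each date goes to the nearest year boundary, so anything past roughly mid-year ends up on 1 January of the next year, whereas "flooring" always goes back to 1 January of the same year. The base-R sketch below illustrates the difference; `round_to_year()` and `floor_to_year()` are hypothetical helpers, and lubridate's exact tie-breaking rule may differ from the one used here.

```r
# Hypothetical helpers illustrating round vs floor to the year.
floor_to_year <- function(d) {
  as.Date(format(d, "%Y-01-01"))  # always 1 January of the same year
}

round_to_year <- function(d) {
  this_jan1 <- floor_to_year(d)
  next_jan1 <- as.Date(sprintf("%d-01-01", as.integer(format(d, "%Y")) + 1L))
  # Pick whichever 1 January is closer (ties go to the later one here).
  closer <- ifelse(d - this_jan1 < next_jan1 - d, this_jan1, next_jan1)
  as.Date(closer, origin = "1970-01-01")  # ifelse() drops the Date class
}

d <- as.Date(c("2000-01-01", "2000-08-08", "2001-03-16", "2000-12-25", "2000-02-29"))
round_to_year(d)  # 2000-08-08 and 2000-12-25 move forward to 2001-01-01
floor_to_year(d)  # every date stays in its own year
```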
Doesn't this (simpler, also only base R) operation do what you want?
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> dd <- as.Date(date2, "%d/%m/%Y")
> yd <- format(dd, "%Y-01-01")
> dt <- as.Date(yd)
> D <- data.frame(date2=date2, date=dd, y=yd, d=dt)
> D
date2 date y d
1 01/01/2000 2000-01-01 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01 2000-01-01
>
In essence, we format the parsed Date object so that only its year component is kept, and append -01-01.
Edit: There are also trunc() operations for Date and Datetime objects. Oddly, truncation for years only works for Datetime (see the help page for trunc.Date for more) so this works too:
> as.Date(trunc(as.POSIXlt(dd), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
Edit 2: We can use that last step in a cleaner, simpler solution: a data.frame with three columns holding the input data (as characters), the data parsed as a proper Date, and the desired truncated year, all in base R without further dependencies. Of course, if you wanted to, you could rewrite it with the pipe and lubridate to get the same result via a slightly slower route (which only matters for "large" data).
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> pd <- as.Date(date2, "%d/%m/%Y")
> td <- as.Date(trunc(as.POSIXlt(pd), "years"))
> D <- data.frame(input = date2, parsed = pd, output = td)
> D
input parsed output
1 01/01/2000 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01
>
For real "production" use you may not need the data.frame or the intermediate results, which leads to a one-liner:
> as.Date(trunc(as.POSIXlt( as.Date(date2, "%d/%m/%Y") ), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
which is likely the most compact and efficient conversion you can get.
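As a quick sanity check, a sketch comparing the two base-R routes shown above (the format() string route and the trunc() POSIXlt route) on the question's data; both should give the same first-of-year dates:

```r
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
dd <- as.Date(date2, "%d/%m/%Y")

via_format <- as.Date(format(dd, "%Y-01-01"))           # string route
via_trunc  <- as.Date(trunc(as.POSIXlt(dd), "years"))  # POSIXlt route

all(via_format == via_trunc)  # TRUE
```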
If you want just the year (and not the date corresponding to the first day of the year) you can use lubridate::year().
df %>% mutate(across(date2,dmy),
year_date=year(date2))
If you do want the first day of the year then floor_date() will do the trick.
df %>% mutate(across(date2,dmy),
year_date=floor_date(date2,"year"))
or, if you only need the truncated date, you could go directly to mutate(year_date = floor_date(dmy(date2), "year"))
In base R, the equivalent of year() is format(date2, "%Y") (wrapped in as.numeric() if you need a number), as in @DirkEddelbuettel's answer.
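One detail worth noting: format() returns character, while lubridate::year() returns a number. A base-R sketch that yields numbers, either via format() or via the POSIXlt year field (which counts years from 1900):

```r
dd <- as.Date(c("2000-01-01", "2000-08-08", "2001-03-16"))

format(dd, "%Y")              # character: "2000" "2000" "2001"
as.integer(format(dd, "%Y"))  # integer:   2000 2000 2001
as.POSIXlt(dd)$year + 1900L   # same numbers from the POSIXlt year field
```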
If you consult the round_date help page (?round_date), you will also see floor_date():
library("lubridate")
library("dplyr")
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = floor_date(date2,'year'))
df2

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that Date was originally a character vector:
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date, this is a one-liner:
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac is the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than converting to character and pasting on "-01" as your question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
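To see roughly what frac does, here is a base-R sketch that places a date a given fraction of the way from the first to the last day of the month. `date_at_frac()` is hypothetical and only mirrors the idea; zoo's own implementation may round intermediate fractions differently. It finds the last day by jumping 32 days past the 1st (which always lands in the next month) and subtracting that day-of-month.

```r
# Hypothetical helper mimicking the idea behind as.Date(<yearmon>, frac = ...).
date_at_frac <- function(ym, frac = 0) {
  start <- as.Date(paste0(as.character(ym), "-01"))  # first day of month
  jump  <- start + 32                                # always in the next month
  end   <- jump - as.integer(format(jump, "%d"))     # last day of month
  # Interpolate between first and last day, keeping whole days.
  days <- floor((1 - frac) * as.numeric(start) + frac * as.numeric(end))
  as.Date(days, origin = "1970-01-01")
}

ym <- c("2014-07", "2014-08", "2014-09")
date_at_frac(ym, frac = 0)  # first of each month
date_at_frac(ym, frac = 1)  # last of each month
```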
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"

how can i extract month and date and year from data column in R

I have a column of Date type. In my column the dates are in 4/1/2007 format. Now I want to extract the month value and the day value from that column into separate columns in R. My dates range from 01/01/2012 to 01/01/2015. Please help me.
If your variable is of Date type (as you say in the post), simply use the following to extract the month:
month_var = format(df$datecolumn, "%m") # this will give output like "09"
month_var = format(df$datecolumn, "%b") # this will give output like "Sep"
month_var = format(df$datecolumn, "%B") # this will give output like "September"
If your date variable is not in Date format, you will have to convert it first:
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
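Putting both steps together, a minimal sketch on toy data in the question's m/d/Y layout (the values are illustrative, taken from the question), storing month and day as numbers in their own columns:

```r
# Toy data in the question's m/d/Y layout (stored as text).
df <- data.frame(datecolumn = c("4/1/2007", "01/01/2012", "01/01/2015"),
                 stringsAsFactors = FALSE)

# Convert once to Date, then pull out the pieces as numbers.
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
df$month <- as.integer(format(df$datecolumn, "%m"))
df$day   <- as.integer(format(df$datecolumn, "%d"))
df
```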
Assuming your initial data is character, not POSIX:
df <- data.frame(d = c("4/1/2007", "01/01/2012", "02/01/2015"),
stringsAsFactors = FALSE)
df
# d
# 1 4/1/2007
# 2 01/01/2012
# 3 02/01/2015
These are not yet "dates", just strings.
df$d2 = as.POSIXct(df$d, format = "%m/%d/%Y")
df
# d d2
# 1 4/1/2007 2007-04-01
# 2 01/01/2012 2012-01-01
# 3 02/01/2015 2015-02-01
Now they are proper dates (in the R fashion). These two lines extract just a single component from each "date"; see ?strptime for details on all available formats.
df$dY = format(df$d2, "%Y")
df$dm = format(df$d2, "%m")
df
# d d2 dY dm
# 1 4/1/2007 2007-04-01 2007 04
# 2 01/01/2012 2012-01-01 2012 01
# 3 02/01/2015 2015-02-01 2015 02
An alternative would be to extract substrings from each string, but that gets you into regex pain. I'd suggest relying on the lessons others have already learned about regexes and going through POSIXct (or even POSIXlt if you want).
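A tiny sketch of the kind of pain meant here: fixed character positions silently break when months are not zero-padded (as in "4/1/2007"), whereas parsing to Date first handles both layouts.

```r
x <- c("4/1/2007", "01/01/2012")

# Fixed character positions break as soon as a month is not zero-padded.
substr(x, 1, 2)  # "4/" "01"

# Parsing to Date first handles both layouts.
as.integer(format(as.Date(x, "%m/%d/%Y"), "%m"))  # 4 1
```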

Split date data (m/d/y) into 3 separate columns

I need to split a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion to another user for separating data into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one out... any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
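Since "Julian day" is used for several different counts, a short sketch of three common variants for the same date may help: days since R's 1970-01-01 origin (what julian() returns by default), the day of the year, and the astronomical Julian Day Number, which for a whole calendar day equals the days since 1970-01-01 plus 2440588.

```r
d <- as.Date("10/3/2001", "%m/%d/%Y")

# Days since 1970-01-01 (julian()'s default origin).
as.numeric(julian(d))                                      # 11598

# Day of the year, two equivalent ways.
as.integer(format(d, "%j"))                                # 276
as.numeric(julian(d, origin = as.Date("2001-01-01"))) + 1  # 276

# Astronomical Julian Day Number (1970-01-01 is JDN 2440588).
as.numeric(d) + 2440588                                    # 2452186
```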
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Taking dates apart with string operations is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are 0-based, on the interval 0 .. 11
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To split a date (m/d/y format) into 3 separate columns, consider this df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(df$date)   # df$date is already a Date, so ymd() is not needed
df$month <- month(df$date)
df$day <- day(df$date)
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data frame we will use is oilstocks, which contains a variety of variables related to changes over time in oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate <- as.Date(stocks$emdate, format = "%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date = realdate)
year <- as.numeric(format(realdate, "%Y"))
month <- as.numeric(format(realdate, "%m"))
day <- as.numeric(format(realdate, "%d"))
ls() will now list the individual vectors day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
This gives similar results, and I also have date, day, month and year as separate vectors outside the data frame.
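If you would rather not leave separate day, month and year vectors in the workspace, a sketch of the same idea builds them inside a single data.frame() call before the cbind(). The `stocks` object below is a hypothetical two-column stand-in with made-up values, not the poster's actual data.

```r
# Hypothetical stand-in for the poster's data (only two of its columns).
stocks <- data.frame(emdate = c("01/02/2010", "02/03/2010", "09/10/2010"),
                     emC = c(10.5, 11.2, 9.8), stringsAsFactors = FALSE)

realdate <- as.Date(stocks$emdate, format = "%m/%d/%Y")

# Build the date parts inside data.frame() so no loose day/month/year vectors remain.
ostocks <- cbind(data.frame(date  = realdate,
                            day   = as.numeric(format(realdate, "%d")),
                            month = as.numeric(format(realdate, "%m")),
                            year  = as.numeric(format(realdate, "%Y"))),
                 stocks)
colnames(ostocks)  # "date" "day" "month" "year" "emdate" "emC"
```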
