convert a data frame with yyyymm to a date in R [duplicate] - r

I have a dataset that looks like this:
Month count
2009-01 12
2009-02 310
2009-03 2379
2009-04 234
2009-05 14
2009-08 1
2009-09 34
2009-10 2386
I want to plot the data (months as x values and counts as y values). Since there are gaps in the data, I want to convert the information for the month into a date. I tried:
as.Date("2009-03", "%Y-%m")
But it did not work. What's wrong? It seems that as.Date() also requires a day and cannot fill in a default value for it. Which function solves my problem?

Since dates correspond to a numeric value and a starting date, you indeed need the day. If you really need your data to be in Date format, you can just fix the day to the first of each month manually by pasting it to the date:
month <- "2009-03"
as.Date(paste(month, "-01", sep=""))
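To apply the same idea to the whole column and plot it, a minimal base R sketch might look like this. It assumes the question's data sits in a data frame named dat (a name chosen here for illustration) with a character column Month and an integer column count.

```r
# Sample data from the question; the name `dat` is an assumption
dat <- data.frame(
  Month = c("2009-01", "2009-02", "2009-03", "2009-04",
            "2009-05", "2009-08", "2009-09", "2009-10"),
  count = c(12L, 310L, 2379L, 234L, 14L, 1L, 34L, 2386L),
  stringsAsFactors = FALSE
)

# Append a fixed day so as.Date() receives a complete date to parse
dat$Date <- as.Date(paste0(dat$Month, "-01"), format = "%Y-%m-%d")

# With real dates on the x axis, the missing June and July show up as a gap
plot(dat$Date, dat$count, type = "b", xlab = "Month", ylab = "Count")
```

Because the x values are now of class Date, the distance between May and August on the axis reflects the missing months instead of treating the rows as evenly spaced.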

Try this. (Here we use text=Lines to keep the example self-contained, but in reality we would replace it with the file name.)
Lines <- "2009-01 12
2009-02 310
2009-03 2379
2009-04 234
2009-05 14
2009-08 1
2009-09 34
2009-10 2386"
library(zoo)
z <- read.zoo(text = Lines, FUN = as.yearmon)
plot(z)
The x axis is not very pretty with this data, but with more data it may be acceptable; alternatively, you can use the code for a fancier x axis shown in the examples section of ?plot.zoo.
The zoo series, z, that is created above has a "yearmon" time index and looks like this:
> z
Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Aug 2009 Sep 2009 Oct 2009
12 310 2379 234 14 1 34 2386
"yearmon" can be used alone as well:
> as.yearmon("2000-03")
[1] "Mar 2000"
Note:
"yearmon" class objects sort in calendar order.
This plots the monthly points at equally spaced intervals, which is likely what is wanted. To space the points in proportion to the number of days in each month instead, convert the index of z to "Date" class: time(z) <- as.Date(time(z)).

The most concise solution if you need the dates to be in Date format:
library(zoo)
month <- "2000-03"
as.Date(as.yearmon(month))
[1] "2000-03-01"
as.Date sets the day to the first of the month when it converts a yearmon object.

You could also achieve this with the parse_date_time or fast_strptime functions from the lubridate package:
> parse_date_time(dates1, "ym")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"
> fast_strptime(dates1, "%Y-%m")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"
The difference between those two is that parse_date_time allows for lubridate-style format specification, while fast_strptime requires the same format specification as strptime.
To specify the time zone, use the tz parameter:
> parse_date_time(dates1, "ym", tz = "CET")
[1] "2009-01-01 CET" "2009-02-01 CET" "2009-03-01 CET"
When some of your date-time strings are incomplete, you can use the truncated parameter to specify how many trailing components of the format may be missing:
> parse_date_time(dates2, "ymdHMS", truncated = 3)
[1] "2012-06-01 12:23:00 UTC" "2012-06-01 12:00:00 UTC" "2012-06-01 00:00:00 UTC"
Data used:
dates1 <- c("2009-01", "2009-02", "2009-03")
dates2 <- c("2012-06-01 12:23", "2012-06-01 12", "2012-06-01")

Using the anytime package:
library(anytime)
anydate("2009-01")
# [1] "2009-01-01"

Indeed, as has been mentioned above (and elsewhere on SO), in order to convert the string to a date, you need a specific day of the month. From the as.Date() manual page:
If the date string does not specify the date completely, the returned answer may be system-specific. The most common behaviour is to assume that a missing year, month or day is the current one. If it specifies a date incorrectly, reliable implementations will give an error and the date is reported as NA. Unfortunately some common implementations (such as glibc) are unreliable and guess at the intended meaning.
A simple solution would be to paste the day "01" onto each string and use strptime() to parse it as the first day of that month.
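As a rough sketch of that suggestion, using a short illustrative vector (the name months_chr is an assumption, not part of the question):

```r
# Year-month strings without a day component
months_chr <- c("2009-01", "2009-02", "2009-03")

# Paste a fixed day onto each string, then parse with an explicit format.
# strptime() returns POSIXlt; tz = "UTC" keeps the result independent of the local time zone.
parsed <- strptime(paste(months_chr, "01", sep = "-"), format = "%Y-%m-%d", tz = "UTC")
class(parsed)
# [1] "POSIXlt" "POSIXt"

# Convert to Date if only the calendar day matters
dates_first <- as.Date(parsed)
```

Converting to Date at the end avoids carrying a time of day and time zone that the original data never had.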
For those seeking a little more background on processing dates and times in R:
In R, times use POSIXct and POSIXlt classes and dates use the Date class.
Dates are stored as the number of days since January 1st, 1970 and times are stored as the number of seconds since January 1st, 1970.
So, for example:
d <- as.Date("1971-01-01")
unclass(d) # one year after 1970-01-01
# [1] 365
pct <- Sys.time() # in POSIXct
unclass(pct) # number of seconds since 1970-01-01
# [1] 1450276559
plt <- as.POSIXlt(pct)
up <- unclass(plt) # up is now a list containing the components of time
names(up)
# [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst" "zone"
# [11] "gmtoff"
up$hour
# [1] 9
To perform operations on dates and times:
plt - as.POSIXlt(d)
# Time difference of 16420.61 days
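The same kind of arithmetic works on Date objects. As a small sketch tied to the question's data (the variable names are chosen here for illustration), differences between consecutive dates reveal the gap, and seq() with by = "month" can list the months that are missing:

```r
# Months present in the question's data, fixed to the first of each month
have <- as.Date(paste0(c("2009-01", "2009-02", "2009-03", "2009-04",
                         "2009-05", "2009-08", "2009-09", "2009-10"), "-01"))

# Day counts between consecutive observations; the jump from May to August stands out
gaps <- diff(have)

# Every month between the first and last observation
all_months <- seq(min(have), max(have), by = "month")

# Months in the full sequence that have no observation
missing_months <- all_months[!all_months %in% have]
format(missing_months, "%Y-%m")
# [1] "2009-06" "2009-07"
```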
And to parse dates and times from strings, you can use strptime() (borrowing these examples from the manual page):
strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
# [1] "2006-02-20 11:16:16 EST"
# And in vectorized form:
dates <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
strptime(dates, "%d%b%Y")
# [1] "1960-01-01 EST" "1960-01-02 EST" "1960-03-31 EST" "1960-07-30 EDT"

I think @ben-rollert's solution is a good one.
You just have to be careful if you want to use it in a function inside a new package.
When developing packages, it's recommended to use the syntax packagename::function_name() (see http://kbroman.org/pkg_primer/pages/depends.html).
In this case, you have to use the version of as.Date() defined by the zoo package.
Here is an example:
> devtools::session_info()
Session info -------------------------------------------------------------------
 setting  value
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, linux-gnu
 ui       RStudio (1.0.35)
 language (EN)
 collate  C
 tz       <NA>
 date     2016-11-09
Packages -----------------------------------------------------------------------
 package  * version date       source
 devtools   1.12.0  2016-06-24 CRAN (R 3.3.1)
 digest     0.6.10  2016-08-02 CRAN (R 3.2.3)
 memoise    1.0.0   2016-01-29 CRAN (R 3.2.3)
 withr      1.0.2   2016-06-20 CRAN (R 3.2.3)
> as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
Error in as.Date.default(zoo::as.yearmon("1989-10", "%Y-%m")) :
do not know how to convert 'zoo::as.yearmon("1989-10", "%Y-%m")' to class “Date”
> zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
[1] "1989-10-01"
So if you're developing a package, good practice is to use:
zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))

In addition to lubridate, tidyverse recently added the clock package, which has some nice functionality for this:
library(clock)
x <- year_month_day_parse(df$Month, format = "%Y-%m", precision = "month")
# <year_month_day<month>[8]>
# [1] "2009-01" "2009-02" "2009-03" "2009-04" "2009-05" "2009-08" "2009-09" "2009-10"
Date Manipulation and Extraction
The output of this is a year-month-day vector where you can still do date arithmetic and apply other common functions as expected:
sort(x, decreasing = TRUE)
# <year_month_day<month>[8]>
# [1] "2009-10" "2009-09" "2009-08" "2009-05" "2009-04" "2009-03" "2009-02" "2009-01"
add_months(x, 3)
# <year_month_day<month>[8]>
# [1] "2009-04" "2009-05" "2009-06" "2009-07" "2009-08" "2009-11" "2009-12" "2010-01"
add_years(x, -2)
# <year_month_day<month>[8]>
# [1] "2007-01" "2007-02" "2007-03" "2007-04" "2007-05" "2007-08" "2007-09" "2007-10"
get_month(x)
# [1] 1 2 3 4 5 8 9 10
You can also set the day, if you need it, with set_day:
set_day(x, 1)
# <year_month_day<day>[8]>
# [1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01" "2009-08-01"
# [7] "2009-09-01" "2009-10-01"
Handling Invalid Dates
Or if you wanted to cleanly get the last day of every month with this structure, the invalid_* set of functions can help:
# not 31 days in Feb, Apr, Sep
y <- set_day(x, 31)
# <year_month_day<day>[8]>
# [1] "2009-01-31" "2009-02-31" "2009-03-31" "2009-04-31" "2009-05-31" "2009-08-31"
# [7] "2009-09-31" "2009-10-31"
invalid_any(y)
# [1] TRUE
invalid_detect(y)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
You can handle invalid dates with invalid_resolve, or drop them with invalid_remove:
invalid_resolve(y, invalid = "previous")
# <year_month_day<day>[8]>
# [1] "2009-01-31" "2009-02-28" "2009-03-31" "2009-04-30" "2009-05-31" "2009-08-31"
# [7] "2009-09-30" "2009-10-31"
From the documentation, you can specify the following values for the invalid argument to handle invalid dates:
"previous": The previous valid instant in time.
"previous-day": The previous valid day in time, keeping the time of day.
"next": The next valid instant in time.
"next-day": The next valid day in time, keeping the time of day.
"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.
"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.
"NA": Replace invalid dates with NA.
"error": Error on invalid dates.
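If clock is not available, the last day of each month can also be obtained in base R. This is a different technique from invalid_resolve, not clock's own method: starting from the first of the month, move into the following month, snap back to its first day, and subtract one day. The variable names below are illustrative only.

```r
# First day of a few months, including ones shorter than 31 days
first_of_month <- as.Date(paste0(c("2009-01", "2009-02", "2009-04", "2009-09"), "-01"))

# Adding 31 days to the 1st always lands inside the following month
in_next_month <- first_of_month + 31

# Snap to the 1st of that following month, then step back one day
next_first <- as.Date(format(in_next_month, "%Y-%m-01"))
last_of_month <- next_first - 1
last_of_month
# [1] "2009-01-31" "2009-02-28" "2009-04-30" "2009-09-30"
```

This only works as written when the input dates are already the first of their month; for arbitrary days the offset of 31 could skip a short month.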

Another way is to use ym() from lubridate.
The month can either be a number, an abbreviated month or a full month name with a variety of separators (even without separator), e.g.
library(lubridate)
ym(c("2012/September", "2012-Aug", "2012.07", 201204))
[1] "2012-09-01" "2012-08-01" "2012-07-01" "2012-04-01"
On the given data:
ym(dat$Month)
[1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01"
[6] "2009-08-01" "2009-09-01" "2009-10-01"
Note that there's also my() if the order is reversed, e.g. Sep/2022.
Data
dat <- structure(list(Month = c("2009-01", "2009-02", "2009-03", "2009-04",
"2009-05", "2009-08", "2009-09", "2009-10"), count = c(12L, 310L,
2379L, 234L, 14L, 1L, 34L, 2386L)), class = "data.frame", row.names = c(NA,
-8L))

I have a simple question. convert yyyy-mm to yyyy-mm-dd in r [duplicate]

I have a dataset that looks like this:
Month count
2009-01 12
2009-02 310
2009-03 2379
2009-04 234
2009-05 14
2009-08 1
2009-09 34
2009-10 2386
I want to plot the data (months as x values and counts as y values). Since there are gaps in the data, I want to convert the Information for the Month into a date. I tried:
as.Date("2009-03", "%Y-%m")
But it did not work. Whats wrong? It seems that as.Date() requires also a day and is not able to set a standard value for the day? Which function solves my problem?
Since dates correspond to a numeric value and a starting date, you indeed need the day. If you really need your data to be in Date format, you can just fix the day to the first of each month manually by pasting it to the date:
month <- "2009-03"
as.Date(paste(month, "-01", sep=""))
Try this. (Here we use text=Lines to keep the example self contained but in reality we would replace it with the file name.)
Lines <- "2009-01 12
2009-02 310
2009-03 2379
2009-04 234
2009-05 14
2009-08 1
2009-09 34
2009-10 2386"
library(zoo)
z <- read.zoo(text = Lines, FUN = as.yearmon)
plot(z)
The X axis is not so pretty with this data but if you have more data in reality it might be ok or you can use the code for a fancy X axis shown in the examples section of ?plot.zoo .
The zoo series, z, that is created above has a "yearmon" time index and looks like this:
> z
Jan 2009 Feb 2009 Mar 2009 Apr 2009 May 2009 Aug 2009 Sep 2009 Oct 2009
12 310 2379 234 14 1 34 2386
"yearmon" can be used alone as well:
> as.yearmon("2000-03")
[1] "Mar 2000"
Note:
"yearmon" class objects sort in calendar order.
This will plot the monthly points at equally spaced intervals which is likely what is wanted; however, if it were desired to plot the points at unequally spaced intervals spaced in proportion to the number of days in each month then convert the index of z to "Date" class: time(z) <- as.Date(time(z)) .
The most concise solution if you need the dates to be in Date format:
library(zoo)
month <- "2000-03"
as.Date(as.yearmon(month))
[1] "2000-03-01"
as.Date will fix the first day of each month to a yearmon object for you.
You could also achieve this with the parse_date_time or fast_strptime functions from the lubridate-package:
> parse_date_time(dates1, "ym")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"
> fast_strptime(dates1, "%Y-%m")
[1] "2009-01-01 UTC" "2009-02-01 UTC" "2009-03-01 UTC"
The difference between those two is that parse_date_time allows for lubridate-style format specification, while fast_strptime requires the same format specification as strptime.
For specifying the timezone, you can use the tz-parameter:
> parse_date_time(dates1, "ym", tz = "CET")
[1] "2009-01-01 CET" "2009-02-01 CET" "2009-03-01 CET"
When some of your date-time strings are incomplete, you can use the truncated parameter to specify how many trailing components of the format may be missing:
> parse_date_time(dates2, "ymdHMS", truncated = 3)
[1] "2012-06-01 12:23:00 UTC" "2012-06-01 12:00:00 UTC" "2012-06-01 00:00:00 UTC"
Used data:
dates1 <- c("2009-01","2009-02","2009-03")
dates2 <- c("2012-06-01 12:23","2012-06-01 12","2012-06-01")
Using the anytime package:
library(anytime)
anydate("2009-01")
# [1] "2009-01-01"
Indeed, as has been mentioned above (and elsewhere on SO), in order to convert the string to a date, you need a specific day of the month. From the as.Date() manual page:
If the date string does not specify the date completely, the returned answer may be system-specific. The most common behaviour is to assume that a missing year, month or day is the current one. If it specifies a date incorrectly, reliable implementations will give an error and the date is reported as NA. Unfortunately some common implementations (such as glibc) are unreliable and guess at the intended meaning.
A simple solution would be to paste the day "01" onto each string and then use strptime() to parse it as the first day of that month.
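The paste-then-strptime() approach can be sketched as follows. This is only an illustration: the variable names are made up here, and tz = "UTC" is an assumption chosen so the result does not depend on the local time zone.

```r
months <- c("2009-01", "2009-02", "2009-03")

# Append the day so that every string specifies a complete date.
full_strings <- paste(months, "01", sep = "-")

# Parse with an explicit format instead of relying on platform guesses.
parsed <- strptime(full_strings, format = "%Y-%m-%d", tz = "UTC")

# strptime() returns a POSIXlt object; as.Date() keeps only the calendar date.
dates <- as.Date(parsed)
print(dates)
```

Because the string now specifies the date completely, the system-specific behaviour described in the manual page no longer applies.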
For those seeking a little more background on processing dates and times in R:
In R, times use POSIXct and POSIXlt classes and dates use the Date class.
Dates are stored as the number of days since January 1st, 1970 and times are stored as the number of seconds since January 1st, 1970.
So, for example:
d <- as.Date("1971-01-01")
unclass(d) # one year after 1970-01-01
# [1] 365
pct <- Sys.time() # in POSIXct
unclass(pct) # number of seconds since 1970-01-01
# [1] 1450276559
plt <- as.POSIXlt(pct)
up <- unclass(plt) # up is now a list containing the components of time
names(up)
# [1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday" "isdst" "zone"
# [11] "gmtoff"
up$hour
# [1] 9
To perform operations on dates and times:
plt - as.POSIXlt(d)
# Time difference of 16420.61 days
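Date arithmetic of this kind is also what lets you make the gaps in monthly data explicit. The sketch below uses a small made-up subset of the question's data (the names obs, all_months and full are chosen for illustration): seq() with by = "month" builds the complete calendar, and merge() fills the unobserved months with NA.

```r
# A few observations with a gap (June and July are missing).
obs <- data.frame(
  Date  = as.Date(c("2009-05-01", "2009-08-01", "2009-09-01")),
  count = c(14, 1, 34)
)

# Every first-of-month between the first and last observation.
all_months <- data.frame(
  Date = seq(min(obs$Date), max(obs$Date), by = "month")
)

# Left join: months without an observation get count = NA.
full <- merge(all_months, obs, by = "Date", all.x = TRUE)
print(full)
```

Plotting full$count against full$Date with type = "b" would then leave a visible break where the NA values are, instead of silently connecting May to August.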
And to parse dates from strings, you can use strptime() (borrowing these examples from the manual page):
strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
# [1] "2006-02-20 11:16:16 EST"
# And in vectorized form:
dates <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
strptime(dates, "%d%b%Y")
# [1] "1960-01-01 EST" "1960-01-02 EST" "1960-03-31 EST" "1960-07-30 EDT"
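The same conversion specifications also work in the opposite direction with format(), which can be handy for getting back to a year-month label after working with full dates. A minimal sketch, with variable names chosen for illustration:

```r
d <- as.Date(c("2009-03-01", "2009-10-01"))

# Back to the "yyyy-mm" representation used in the question.
ym_label <- format(d, "%Y-%m")

# Numeric month and year, extracted via their conversion specifications.
month_num <- as.integer(format(d, "%m"))
year_num  <- as.integer(format(d, "%Y"))

print(ym_label)
print(month_num)
print(year_num)
```

Only numeric specifications are used here, so the result should not depend on the locale (unlike %b or %B, which produce month names).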
I think @ben-rollert's solution is a good solution.
You just have to be careful if you want to use this solution in a function inside a new package.
When developing packages, it's recommended to use the syntax packagename::function_name() (see http://kbroman.org/pkg_primer/pages/depends.html).
In this case, you have to use the version of as.Date() defined by the zoo package.
Here is an example:
> devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui RStudio (1.0.35)
language (EN)
collate C
tz <NA>
date 2016-11-09
Packages --------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date source
devtools 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.10 2016-08-02 CRAN (R 3.2.3)
memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
withr 1.0.2 2016-06-20 CRAN (R 3.2.3)
> as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
Error in as.Date.default(zoo::as.yearmon("1989-10", "%Y-%m")) :
do not know how to convert 'zoo::as.yearmon("1989-10", "%Y-%m")' to class “Date”
> zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
[1] "1989-10-01"
So if you're developing a package, the good practice is to use:
zoo::as.Date(zoo::as.yearmon("1989-10", "%Y-%m"))
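Inside a package function, the fully qualified call could be wrapped roughly like this. It is a sketch only: the helper name yyyymm_to_date is hypothetical, and the base R fallback for when zoo is not installed is an addition made here, not part of the answer above.

```r
# Hypothetical helper: convert "yyyy-mm" strings to Date (first of the month).
yyyymm_to_date <- function(x) {
  if (requireNamespace("zoo", quietly = TRUE)) {
    # Fully qualified calls, so nothing depends on zoo being attached.
    zoo::as.Date(zoo::as.yearmon(x, "%Y-%m"))
  } else {
    # Fallback without zoo: paste the day on and parse with base R.
    as.Date(paste0(x, "-01"))
  }
}

result <- yyyymm_to_date("1989-10")
print(result)
```

In a real package, zoo would normally be declared under Imports or Suggests in the DESCRIPTION file, and the requireNamespace() check mainly makes sense in the Suggests case.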
The tidyverse recently added the clock package alongside lubridate; it has some nice functionality for this:
library(clock)
x <- year_month_day_parse(df$Month, format = "%Y-%m", precision = "month")
# <year_month_day<month>[8]>
# [1] "2009-01" "2009-02" "2009-03" "2009-04" "2009-05" "2009-08" "2009-09" "2009-10"
Date Manipulation and Extraction
The output of this is a year-month-day vector where you can still do date arithmetic and apply other common functions as expected:
sort(x, decreasing = TRUE)
# <year_month_day<month>[8]>
# [1] "2009-10" "2009-09" "2009-08" "2009-05" "2009-04" "2009-03" "2009-02" "2009-01"
add_months(x, 3)
# <year_month_day<month>[8]>
# [1] "2009-04" "2009-05" "2009-06" "2009-07" "2009-08" "2009-11" "2009-12" "2010-01"
add_years(x, -2)
# <year_month_day<month>[8]>
# [1] "2007-01" "2007-02" "2007-03" "2007-04" "2007-05" "2007-08" "2007-09" "2007-10"
get_month(x)
# [1] 1 2 3 4 5 8 9 10
You can also set the day, if you need it, with set_day:
set_day(x, 1)
# <year_month_day<day>[8]>
# [1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01" "2009-08-01"
# [7] "2009-09-01" "2009-10-01"
Handling Invalid Dates
Or if you wanted to cleanly get the last day of every month with this structure, the invalid_* set of functions can help:
# not 31 days in Feb, Apr, Sep
y <- set_day(x, 31)
# <year_month_day<day>[8]>
# [1] "2009-01-31" "2009-02-31" "2009-03-31" "2009-04-31" "2009-05-31" "2009-08-31"
# [7] "2009-09-31" "2009-10-31"
invalid_any(y)
# [1] TRUE
invalid_detect(y)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
You can handle invalid dates with invalid_resolve, or you can drop them with invalid_remove:
invalid_resolve(y, invalid = "previous")
# <year_month_day<day>[8]>
# [1] "2009-01-31" "2009-02-28" "2009-03-31" "2009-04-30" "2009-05-31" "2009-08-31"
# [7] "2009-09-30" "2009-10-31"
From the documentation you can specify the following values for the invalid argument to handle invalid dates:
"previous": The previous valid instant in time.
"previous-day": The previous valid day in time, keeping the time of day.
"next": The next valid instant in time.
"next-day": The next valid day in time, keeping the time of day.
"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.
"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.
"NA": Replace invalid dates with NA.
"error": Error on invalid dates.
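For comparison, the last day of each month can also be computed in base R without clock. This is a different technique from invalid_resolve(): it steps to the first day of the following month with seq() and subtracts one day. The helper name month_end is hypothetical.

```r
# Hypothetical helper: last day of the month for each first-of-month Date.
month_end <- function(first_days) {
  out <- first_days
  for (i in seq_along(first_days)) {
    # First day of the following month, minus one day.
    next_first <- seq(first_days[i], by = "month", length.out = 2)[2]
    out[i] <- next_first - 1
  }
  out
}

firsts <- as.Date(c("2009-01-01", "2009-02-01", "2009-04-01", "2009-09-01"))
ends <- month_end(firsts)
print(ends)
```

For these inputs the result should match what invalid_resolve(y, invalid = "previous") reports for the same months, i.e. 31, 28, 30 and 30 as the day.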
A way using ym from lubridate.
The month can either be a number, an abbreviated month or a full month name with a variety of separators (even without separator), e.g.
library(lubridate)
ym(c("2012/September", "2012-Aug", "2012.07", 201204))
[1] "2012-09-01" "2012-08-01" "2012-07-01" "2012-04-01"
On the given data:
ym(dat$Month)
[1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-04-01" "2009-05-01"
[6] "2009-08-01" "2009-09-01" "2009-10-01"
Note that there is also my() if the order is reversed, e.g. Sep/2022.
Data
dat <- structure(list(Month = c("2009-01", "2009-02", "2009-03", "2009-04",
"2009-05", "2009-08", "2009-09", "2009-10"), count = c(12L, 310L,
2379L, 234L, 14L, 1L, 34L, 2386L)), class = "data.frame", row.names = c(NA,
-8L))



Resources