Unable to convert Month-Year string to Date in R

I'm using as.Date to convert a string like Aug-2002 to a Date object representing just the month of August 2002 or, if a day must be specified, Aug 1, 2002.
However:
> as.Date(c('07-2002'), "%M-%Y")
[1] "2002-11-06"
> as.Date(c('Aug-2002'), "%b-%Y")
[1] NA
Why does the first line of code convert it to a different month and day? And why is the second one NA?
I referred to this table for the formatting symbols.

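The problem is that your strings have no day value, and as.Date cannot build a date from a month and year alone, so format="%m-%Y" (or "%b-%Y") returns NA. In the first call, %M is also the code for minutes, not month (%m); since neither a month nor a day was parsed, the current month and day were filled in, which explains "2002-11-06". Either of the options below solves it: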
as.Date(paste0('01-', c('07-2002')), format="%d-%m-%Y")
library(zoo) #this is a little more forgiving:
as.yearmon(c('07-2002'), "%m-%Y")
as.yearmon(c('Aug-2002'), "%b-%Y")
as.Date(as.yearmon(c('07-2002'), "%m-%Y"))
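The same fix can be wrapped in a small base R helper. This is only a sketch: parse_month_year is a hypothetical name, and it assumes numeric month-year input such as "07-2002" and that the first day of the month is an acceptable placeholder.

```r
# Sketch only: parse_month_year is a hypothetical helper, not part of base R.
parse_month_year <- function(x, fmt = "%m-%Y") {
  # Prepend day 01 so the string describes a complete date.
  as.Date(paste0("01-", x), format = paste0("%d-", fmt))
}

d <- parse_month_year("07-2002")
d                      # "2002-07-01"
format(d, "%m-%Y")     # formats back to "07-2002"
```

Passing fmt = "%b-%Y" should handle abbreviated month names such as Aug-2002 too, but matching month names depends on the session locale.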

Related

How to convert character to Date format in R?

How to convert the below in character to Date format?
YYYY.MM
I am facing an issue dealing with zeroes after decimal points for month 10.
Say
2012.10
appears in my input source data as
2012.1
with the zero post decimal missing. How do I bring this back in the Date format?
Since you have only year and month, you need to assign some value for day before you convert to date. In the example below, the day has arbitrarily been chosen as 15.
IF THE INPUT IS CHARACTER
dates = c("2012.10", "2012.01")
lubridate::ymd(paste0(year_month = dates, day = "15"))
#[1] "2012-10-15" "2012-01-15"
IF THE INPUT IS NUMERIC
dates = c(2012.10, 2012.01)
do.call(c, lapply(strsplit(as.character(dates), "\\."), function(d){
if(nchar(d[2]) == 1){
d[2] = paste0(d[2],"0")
}
lubridate::ymd(paste0(year = d[1], month = d[2], day = "15"))
}))
#[1] "2012-10-15" "2012-01-15"
The zoo package has a "yearmon" class for representing year and month without day. Internally it stores them as year + fraction where fraction = 0 for Jan, 1/12 for Feb, 2/12 for Mar and so on but it prints in a nicer format and sorts as expected. Assuming that your input, x, is numeric convert it to character with 2 digit month and then apply as.yearmon with the appropriate format.
library(zoo)
x <- c(2012.1, 2012.01) # test data
as.yearmon(sprintf("%.2f", x), "%Y.%m")
## [1] "Oct 2012" "Jan 2012"
as.Date can be applied to convert a "yearmon" object to "Date" class if desired but normally that is not necessary.
as.Date(as.yearmon(sprintf("%.2f", x), "%Y.%m"))
## [1] "2012-10-01" "2012-01-01"
The code below uses the ymd() function from the lubridate package and sprintf() to coerce dates given in a numeric format
dates <- c(2012.1, 2012.01)
as well as dates given as a character string
dates <- c("2012.1", "2012.01")
where the part left of the decimal point specifies the year whereas the fractional part denotes the month.
lubridate::ymd(sprintf("%.2f", as.numeric(dates)), truncated = 1L)
[1] "2012-10-01" "2012-01-01"
The format specification %.2f tells sprintf() to use 2 decimal places.
The parameter truncated = 1L indicates that one date element is missing (day) and should be completed by the default value (the first day of the month). Alternatively, the day of the month can be directly specified in the format specification to sprintf():
lubridate::ymd(sprintf("%.2f-15", as.numeric(dates)))
[1] "2012-10-15" "2012-01-15"
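For anyone who prefers to avoid extra packages, the same padding idea can be sketched in base R. The helper name yyyymm_to_date and its day argument are assumptions made for illustration; like the answers above, it treats a fractional part of .1 as October.

```r
# Sketch only: yyyymm_to_date is a hypothetical helper name.
yyyymm_to_date <- function(x, day = 1L) {
  # "%.2f" restores the trailing zero that numeric storage drops
  # (2012.1 becomes "2012.10").
  padded <- sprintf("%.2f", as.numeric(x))
  # Append a two-digit day so as.Date receives a complete date.
  as.Date(sprintf("%s.%02d", padded, as.integer(day)), format = "%Y.%m.%d")
}

res_first <- yyyymm_to_date(c(2012.1, 2012.01))
res_mid   <- yyyymm_to_date(c("2012.1", "2012.01"), day = 15)
res_first  # "2012-10-01" "2012-01-01"
res_mid    # "2012-10-15" "2012-01-15"
```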

in R how to convert a date in character format to numeric and then easily calculate the difference between two dates

So I want to convert "October 2010" and "November 2010" to a numeric format and hence if I take the difference of these two I get result: 1.
I tried to use the as.Date function, but it seems to work only for the full format: month-day-year.
You can try formatting your raw date strings, and treating each one as being on the first day of that month.
dates <- c("October 2010", "November 2010")
# extract the first three letters for the month, and the last 4 digits for the year
dates.new <- paste0(substr(dates, 1, 3), "-01-", substr(dates, nchar(dates)-3, nchar(dates)))
> dates.new
[1] "Oct-01-2010" "Nov-01-2010"
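# convert to POSIXct; %Y is the four-digit year (%y would read only "20"),
# and a fixed time zone keeps daylight saving out of the difference
dates.posix <- as.POSIXct(dates.new, format="%b-%d-%Y", tz="UTC")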
diff <- dates.posix[2] - dates.posix[1]
> diff
Time difference of 31 days
In your question you want to calculate the difference in number of months and not in number of days. You could map your month-year character vector to a numeric number of months, starting at month 1 with the first month in your dataset and ending with month n with the last month in your dataset. Then it would be straightforward to calculate a difference in number of months.
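A minimal base R sketch of that mapping follows. It counts months from year 0 rather than from the first month in the dataset, which gives the same differences. The helper name month_index is hypothetical, and the input is assumed to be English "Month YYYY" strings.

```r
# Sketch only: month_index is a hypothetical helper name.
month_index <- function(x) {
  # Split "October 2010" into its month name and its year.
  mon <- match(sub(" .*$", "", x), month.name)  # month.name is a base R constant
  yr  <- as.integer(sub("^.* ", "", x))
  # Number of months counted from year 0, so differences are in months.
  yr * 12L + mon
}

dates <- c("October 2010", "November 2010")
month_gap <- diff(month_index(dates))
month_gap  # 1
```

Because month.name is always English, this avoids the locale dependence of parsing month names with strptime.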
Alternatively, to be able to manipulate date-time objects, you will have to create full dates by introducing a 01 in front of all dates, for example "01 November 2010", and then calculate the difference between the dates. This is the main part of the answer below.
Manipulating date-time objects
The lubridate package can calculate the difference between two dates. It deals with non trivial issues such as February 29th. If it's not installed on your system:
install.packages("lubridate")
Then
library(lubridate)
ymd("20160301")-ymd("20160228")
# Time difference of 2 days
ymd("20150301")-ymd("20150228")
# Time difference of 1 days
To read full month names look at formatting details in help(parse_date_time)
d <- parse_date_time("November 01 2010", "Bdy") - parse_date_time("October 01 2010", "Bdy")
d
# Time difference of 31 days
d is a difftime object. Based on "converting a difftime to integer", you can convert it to a numeric number of days or weeks (but not to a number of months):
class(d)
# [1] "difftime"
as.numeric(d, units="days")
# [1] 31
as.numeric(d, units="weeks")
# [1] 4.428571

How to convert decimal date format (e.g. 2011.580) to normal date format?

I'm trying to change from the decimal date format (return type of cpts.ts() from the changepoint package) to the normal date format %Y-%m-%d. Example:
cpts.ts(myTimeSeries.BinSeg)
[1] 2001.667 2004.083 2008.750 2011.583 2011.917
The actual dates are sometime around August 2001, January 2004, September 2008, June/July 2011 and December 2011 (I don't know them exactly, I'm reading them off a graph).
I can't seem to find a standard method of converting this format back to the usual date format.
Can anybody help me?
Thanks
Slightly different results with lubridate:
library(lubridate)
decimals <- c(2001.667, 2004.083, 2008.750, 2011.583, 2011.917)
format(date_decimal(decimals), "%Y-%m-%d")
# [1] "2001-09-01" "2004-01-31" "2008-10-01" "2011-08-01" "2011-12-01"
> foo <- c(2001.667,2004.083,2008.750,2011.583,2011.917)
> as.Date(paste(trunc(foo),round((foo-trunc(foo))*365,0)),"%Y %j")
[1] "2001-08-31" "2004-01-30" "2008-09-30" "2011-08-01" "2011-12-01"
Look at ?as.Date and its format parameter, which will direct you to ?strptime, from which I took the %j format specification.
You may need to adapt for some corner cases, like January 1st.
For those considering a base R solution to this issue, the core of lubridate's date_decimal is essentially:
start <- as.POSIXct(paste0(trunc(foo), "/01/01"), tz="UTC")
end <- as.POSIXct(paste0(trunc(foo)+1,"/01/01"), tz="UTC")
start + (difftime(end, start, units="secs") * (foo - trunc(foo)))
I.e. - set a start date back at the start of the year in which the date occurs, set an end date at the start of the following year, multiply the difference between start and end by the fraction of the year elapsed, add this difference back to the start. Doing this takes into account leap years, and will work appropriately for January 1st.
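The three lines above can be wrapped into a function that returns Date values. This is a sketch under the same assumptions; decimal_to_date is a hypothetical name, and the result is truncated to the calendar day in UTC.

```r
# Sketch only: decimal_to_date is a hypothetical helper name.
decimal_to_date <- function(x) {
  yr    <- trunc(x)
  start <- as.POSIXct(paste0(yr, "-01-01"), tz = "UTC")
  end   <- as.POSIXct(paste0(yr + 1, "-01-01"), tz = "UTC")
  # Length of the year in seconds, so leap years are accounted for.
  year_secs <- as.numeric(difftime(end, start, units = "secs"))
  # Move forward by the elapsed fraction of the year, then drop the time of day.
  as.Date(start + year_secs * (x - yr), tz = "UTC")
}

dd <- decimal_to_date(c(2012, 2011.5))
dd  # "2012-01-01" "2011-07-02"
```

Half of the non-leap year 2011 is 182.5 days after January 1st, which falls at noon on July 2nd.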

How to convert ordinal date day-month-year format using R

I have log files where the date is mentioned in the ordinal date format.
wikipedia page for ordinal date
i.e. 14273 means the 273rd day of 2014, so 14273 is 30-Sep-2014.
Is there a function in R to convert an ordinal date (14273) to (30-Sep-2014)?
I tried the date package but didn't come across a function that would do this.
Try as.Date with the indicated format:
as.Date(sprintf("%05d", 14273), format = "%y%j")
## [1] "2014-09-30"
Notes
For more information see ?strptime.
The 273 part is sometimes referred to as the day of the year (as opposed to the day of the month) or the day number or the julian day relative to the beginning of the year.
If the input were a character string of the form yyjjj (rather than numeric) then as.Date(x, format = "%y%j") will do.
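Both cases from the notes can be covered by one small wrapper. This is only a sketch: ordinal_to_date is a hypothetical name, and the input is assumed to be yyjjj values given either as numbers or as strings.

```r
# Sketch only: ordinal_to_date is a hypothetical helper name.
ordinal_to_date <- function(x) {
  # Restore leading zeros (9001 becomes "09001"), then parse as
  # two-digit year (%y) followed by day of the year (%j).
  as.Date(sprintf("%05d", as.integer(x)), format = "%y%j")
}

od <- ordinal_to_date(c(14273, 9001))
od                     # "2014-09-30" "2009-01-01"
format(od, "%d-%b-%Y") # day-month-year text; month abbreviation follows the locale
```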
Update Have updated to also handle years with one digit as per comments.
Data example
x<-as.character(c("14273", "09001", "07031", "01033"))
Data conversion
x1<-substr(x, start=1, stop=2)   # two-digit year
x2<-substr(x, start=3, stop=5)   # day of the year
# parse year and day of the year together, so leap years are resolved for the right year
date<-as.Date(paste(x1, x2), format="%y %j")
You can use lubridate package as follows:
>library(lubridate)
# Create a template date object
>date <- as.POSIXlt("2009-02-10")
# Update the date using
> update(date, year=2014, yday=273)
[1] "2014-09-30 JST"

From MMDD to day of the year in R

I have this .txt file:
http://pastebin.com/raw.php?i=0fdswDxF
First column (Date) shows date in month/day
So 0601 is the 1st of June
When I load this into R and I show the data, it removes the first 0 in the data.
So when loaded it looks like:
601
602
etc
For 1st of June, 2nd of June
For the months 10,11,12, it remains unchanged.
How do I change it back to 0601 etc.?
What I am trying to do is to change these days into the day of the year, for instance,
1st of January (0101) would be 1, and 31st of December would be 365.
There is no leap year to be considered.
I have the code to change this, if my data was shown as 0601 etc, but not as 601 etc.
copperNew$Date = as.numeric(as.POSIXct(strptime(paste0("2013",copperNew$Date), format="%Y%m%d")) -
as.POSIXct("2012-12-31"), units = "days")
Where Date of course is from the file linked above.
Please ask if you do not consider the description to be good enough.
You can use colClasses in the read.table function to keep the leading zeros, then convert to POSIXlt and extract the day of the year. You are overcomplicating the process.
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF", header=TRUE,
colClasses=c("character", "integer", rep("numeric", 3)))
# no year in the format, so the current year is assumed
tmp <- as.POSIXlt( copperNew$Date, format='%m%d' )
copperNew$Yday <- tmp$yday + 1   # yday is 0 on January 1st
The as.Date function is able to parse a string without a year (it assumes the current year), and strftime with "%j" then gives you the day of the year.
d<-as.Date("0201", format = "%m%d")
strftime(d, format="%j")
#[1] "032"
First you parse your string and obtain Date object which represents your date (notice that it will add current year, so if you want to count days for some specific year add it to your string: as.Date("1988-0201", format = "%Y-%m%d")).
Function strftime converts your Date to a POSIXlt object and returns the day of the year as a character string. If you want the result to be a numeric value, you can do it like this: as.numeric(strftime(d, format = "%j")) (Thanks Gavin Simpson)
Convert it to POSIXlt using a year that is not a leap-year, then access the yday element and add 1 (because yday is 0 on January 1st).
strptime(paste0("2011","0201"),"%Y%m%d")$yday+1
# [1] 32
From start-to-finish:
x <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
colClasses=c("character",rep("numeric",5)), header=TRUE)
x$Date <- strptime(paste0("2011",x$Date),"%Y%m%d")$yday+1
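If the file has already been read and the leading zeros are gone, they can be restored with sprintf before parsing. A sketch, assuming the column holds integers such as 601; the variable names are hypothetical, and 2013 is used only because it is not a leap year.

```r
# Sketch only: mmdd stands in for a Date column that was read as integers.
mmdd <- c(601L, 602L, 1015L, 101L, 1231L)

# Pad back to four characters: 601 becomes "0601".
mmdd_chr <- sprintf("%04d", mmdd)

# Attach a non-leap year, parse, and take the 1-based day of the year.
parsed <- as.Date(paste0("2013", mmdd_chr), format = "%Y%m%d")
yday   <- as.POSIXlt(parsed)$yday + 1L
yday  # 152 153 288 1 365
```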
In which language?
If it's something like C#, Java or Javascript, I'd follow these steps:
1-) parse a pair of integers from that column;
2-) create a datetime variable whose day and month are taken from the integers from step one. Set the year to some fixed value, or to the current year.
3-) create another datetime variable, whose date is the 1st of January of the same year as the one in step 2.
The number of the day is the difference in days between the datetime variables, + 1 day.
This one worked for me:
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
header=TRUE, sep=" ", colClasses=c("character",
"integer",
rep("numeric", 3)))
copperNew$diff = difftime(as.POSIXct(strptime(paste0("2013",copperNew$Date),
format="%Y%m%d", tz="GMT")),
as.POSIXct("2012-12-31", tz="GMT"), units="days")
I had to specify the timezone (tz argument in as.POSIXct), otherwise I got two different timezones for the vectors I am subtracting and therefore non-integer days.
