Converting dates from excel to R - r

I am having difficulty converting dates from Excel (read from a CSV file) into R. Help is much appreciated.
Here is what I'm doing:
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
However, some dates get converted and some do not. Here is the output of:
head(df$date)
[1] NA NA NA "0006-01-05" NA NA
The first 5 entries imported from the CSV file are as follows:
7/28/05
7/28/05
12/16/05
5/1/06
4/21/05
and here is the output of:
head(df$excel.date)
[1] 7/28/05 7/28/05 12/16/05 5/1/06 4/21/05 1/25/07
1079 Levels: 1/1/00 1/1/02 1/1/97 1/10/96 1/10/99 1/11/04 1/11/94 1/11/96 1/11/97 1/11/98 ... 9/9/99
str(df)
.
.
$ excel.date : Factor w/ 1079 levels "1/1/00","1/1/02",..: 869 869 288 618 561 48 710 1022 172 241 ...

First of all, make sure the dates in your file are in an unambiguous format, using full four-digit years (not just the last two digits). %Y is for "year with century" (see ?strptime), but your data has no century. So you can either use %y (at your own risk, see ?strptime again) or reformat the dates in Excel.
It is also a good idea to use as.is=TRUE with read.csv when reading in these data -- otherwise character vectors are converted to factors which can lead to unexpected results.
And on Windows it may be easier to use RODBC to read the dates directly from the xls or xlsx file.
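A minimal sketch of the as.is advice, assuming a column named excel.date as in the question. The inline csv_text is a made-up stand-in for the real file, used only so the example runs on its own:

```r
# Hypothetical stand-in for the CSV file exported from Excel
csv_text <- "excel.date\n7/28/05\n12/16/05\n5/1/06"

# as.is = TRUE keeps the column as character instead of converting it to a factor
df <- read.csv(text = csv_text, as.is = TRUE)
str(df$excel.date)

# The values are month/day/two-digit year, hence %m/%d/%y
# (see ?strptime for how %y chooses the century)
df$date <- as.Date(df$excel.date, format = "%m/%d/%y")
df$date
```

With a real file, the text = csv_text argument would be replaced by the file name.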
(edit)
The following may give a hint:
> as.Date("13/04/2014", format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/2014"), format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%Y")
[1] "14-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%y")
[1] "2014-04-13"
(So as.Date can actually take care of factors - the magic happens in the as.Date.factor method, defined as:
function (x, ...) as.Date(as.character(x), ...)
It is not a good idea to represent dates as factors, but in this case it is not a problem either. I think the problem is Excel, which saves your years as 2-digit numbers in the CSV file without asking you.)
-
The ?strptime help file says that the behaviour of %y is platform specific, so you can get different results on different machines. If there is no way of going back to the source and saving the CSV in a better way, you might use something like the following:
x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")
repairExcelDates <- function(x, yearcol=3, fmt="%m/%d/%Y") {
x <- do.call(rbind, lapply(strsplit(x, "/"), as.numeric))
year <- x[,yearcol]
if(any(year>99)) stop("don't know what to do")
# if year <= current two-digit year then add 2000, otherwise add 1900
x[,yearcol] <- ifelse(year <= as.numeric(format(Sys.Date(), "%y")), year+2000, year + 1900)
x <- apply(x, 1, paste, collapse="/")
as.Date(x, format=fmt)
}
repairExcelDates(x)
# [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
# [6] "2007-01-25"

Your data is formatted as month/day/year with a two-digit year, so
df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
should be
df$date = as.Date(df$excel.date, format = "%m/%d/%y")
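Because the years in the file have only two digits, the year specifier is worth checking separately from the day/month order. A small sketch comparing %Y and %y on values taken from the question (the vector x is made up from the listed factor levels):

```r
x <- c("7/28/05", "12/16/05", "1/10/96")

# %Y reads "05" as the literal year 5, which is how dates like 0006-01-05 arise
with_Y <- as.Date(x, format = "%m/%d/%Y")
as.numeric(format(with_Y, "%Y"))

# %y reads a two-digit year and picks the century itself:
# per ?strptime, 00 to 68 become 20xx and 69 to 99 become 19xx
with_y <- as.Date(x, format = "%m/%d/%y")
format(with_y)
```

If the data can contain years before 1969, that fixed cut-off will place them in the wrong century, so the century then has to be assigned by hand.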

Related

What does calling as.numeric() do to a lubridate Date object?

I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of
-719143 and want to get back as close as possible to as.Date("2021-01-01"), then:
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that the result is 2020-01-01 rather than 2021-01-01: the mistaken parse read the "20" of 2021 as the day and dropped the trailing "21", so the last two digits of the year cannot be recovered from -719143 alone. Also, we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it is the second, it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 days is about 1970 years, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as an arithmetic expression equal to -2021, so that we're looking at perhaps 2021 seconds/days/[?] before year zero, which would be about 1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-01") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
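To answer the "convert it back" part directly: when the number really is a count of days since 1970-01-01, it can be turned back into a Date by supplying the origin. A minimal sketch in base R (the variable names d, n, back and back2 are only illustrative):

```r
d <- as.Date("01-01-2021", format = "%d-%m-%Y")
n <- as.numeric(d)   # plain number of days since 1970-01-01

# Rebuild the Date from the bare number by supplying the origin
back <- as.Date(n, origin = "1970-01-01")
format(back, "%d-%m-%Y")

# Same idea, re-attaching the class attribute that as.numeric() removed
back2 <- structure(n, class = "Date")
back == back2
```

This only restores the date correctly if the original parse was correct; a number such as -719143 comes back as a date in year 1.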

Convert a string into dates using R

I have a column of dates written as monthyear in the format:
11960 - this would be Jan 1960
121960 - this would be Dec 1960
I would like to convert this column into a day-month-year format assuming the first of the month as each date.
I have tried (using one number as an example as opposed to dt$dob)
x <- sprintf("%08d%", 11960)
and then x <- as.Date(x, format = "%d%m%Y")
but this gives me NAs as I assume it doesn't like the 00 at the start
So I tried pasting 01 to each value but this pastes it to the end (R noob here). I was thinking maybe pasting 01 to the start and then using the sprintf function may still work:
paste 01 to start of 11960 = 011960
sprintf("%08d%", 011960) to maybe give 0101960?
Then use as.Date to convert?
Many thanks for your help
I used paste0() instead of sprintf, and it seems to work.
> x<-paste0("010",11960)
> x
[1] "01011960"
> as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01"
EDIT: to handle 2-digit months as well, I use ifelse() and nchar():
y<-c(11960,11970,11980, 111960,111970,111980)
x<-ifelse(nchar(y) == 5,paste0("010",y),paste0("01",y))
> x
[1] "01011960" "01011970" "01011980" "01111960" "01111970" "01111980"
as.Date(x , format = "%d%m%Y" )
[1] "1960-01-01" "1970-01-01" "1980-01-01" "1960-11-01" "1970-11-01" "1980-11-01"
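The sprintf idea from the question can also be made to work, since zero-padding to six digits handles one- and two-digit months in a single step. A short sketch, assuming the values are whole numbers of the form month followed by a four-digit year:

```r
y <- c(11960, 121960, 111970)

# Pad the month-year part to six digits (mmYYYY), then prepend the assumed day "01"
x <- sprintf("01%06d", y)
x

as.Date(x, format = "%d%m%Y")
```

Here %06d does the job of the ifelse()/nchar() test, because 11960 becomes "011960" while 121960 is left as it is.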

dates transfer from spreadsheet to R

I have 451 dates in the format "2002-06-18",YYYY-MM-DD, in the spreadsheet program libre office calc. I would like to transfer these dates into R as a column with the name "Date_Sale".
I first copied this column of dates to a text file and then read the text file into R with the command
Date_Sale <- read.csv("Date_Sale.txt", header=FALSE,stringsAsFactors=FALSE)
> str(Date_Sale)
'data.frame': 451 obs. of 1 variable:
$ V1: chr "2002-06-18" "2002-05-22" "2002-05-23" "2002-10-23" ...
The output of str above shows that the data was read into R as a data frame with one character (chr) column. I then tried the command
Date_Sale <- strptime(Date_Sale, "%Y-%m-%d")
The following error message appears:
Error in strptime(Date_Sale, "%Y-%m-%d") :
input string is too long
If I use one element in the command above it works.
firstday <- strptime("2002-06-18", "%Y-%m-%d")
[1] "2002-06-18 CEST"
Here is one approach
library(tidyverse)
df <- tribble(~my_date,
"2002-06-18",
"2002-05-22",
"2002-05-23",
"2002-10-23")
df %>%
mutate(my_date = lubridate::ymd(my_date))
or
df %>%
mutate(my_date = as.Date(my_date, format = '%Y-%m-%d'))
Be careful with timezones when converting data. strptime will use your current time zone by default which may be summer time (daylight saving time). Check ?strptime
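The error in the question appears to come from passing the whole data frame to strptime rather than the column holding the strings. A base R sketch under that assumption, with a small made-up data frame standing in for the result of read.csv:

```r
# Hypothetical stand-in for the data frame returned by read.csv()
Date_Sale <- data.frame(V1 = c("2002-06-18", "2002-05-22", "2002-05-23"),
                        stringsAsFactors = FALSE)

# Convert the column (a character vector), not the data frame itself
Date_Sale$V1 <- as.Date(Date_Sale$V1, format = "%Y-%m-%d")

# Give the column the name asked for in the question
names(Date_Sale) <- "Date_Sale"
str(Date_Sale)
```

as.Date is used instead of strptime because the data has no time of day, which also sidesteps the time zone question.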

Format two kinds of factor dates in R

I have two sets of date-like strings: either 31.3.14 or 31/3/14
I would like to format them as 31-3-2014
I know how to convert each of them to the desired format, but I don't know how to tell them apart and apply the right approach below.
For this format 31.3.14 :
format(as.Date(as.character("31.3.14"), "%d.%m.%y"), "%d-%m-%Y")
For this format 31/3/14 :
format(as.Date(as.character("31/3/14"), "%d/%m/%y"), "%d-%m-%Y")
Both kinds of dates occur in a data frame column in no particular order, so I need to apply the right method to each format.
EDIT: sorry I have also different kinds of dates, also: "2013-04-01" here the solution provided with dmy function fails.
You could also do it with base R by first replacing the punctuation with a common separator
Dates <- c("31.3.14", "31/3/14")
format(as.Date(gsub("[[:punct:]]", "-", Dates), format = "%d-%m-%y"), "%d-%m-%Y")
## [1] "31-03-2014" "31-03-2014"
The lubridate package makes this easy.
> require(lubridate)
> test <- data.frame(raw = c("31.3.14", "31/3/14"))
> test$formatted <- dmy(test$raw)
> test
raw formatted
1 31.3.14 2014-03-31
2 31/3/14 2014-03-31
EDIT:
Based on the edit to the question, one can use ifelse() within a function to detect a four-digit sequence at the start of the date string.
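require(stringr)
require(lubridate)
myDateFun <- function(x){
# ifelse() drops the Date class, so z holds days since 1970-01-01
# (current lubridate versions return Date from ymd() and dmy())
z <- ifelse(str_detect(x, "^\\d{4}"),
ymd(x), dmy(x) )
z <- as.Date(z, origin = "1970-01-01")
z <- format(z, "%Y-%m-%d")
return(z)
}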
test <- data.frame(raw = c("31.3.14", "31/3/14", "2014-3-31"))
test$formatted.2 <- myDateFun(test$raw)
test
raw formatted formatted.2
1 31.3.14 2014-03-31 2014-03-31
2 31/3/14 2014-03-31 2014-03-31
3 2014-3-31 <NA> 2014-03-31
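The same three-format case can also be handled without extra packages. This is only a sketch: parse_mixed is a hypothetical helper name, and it assumes that any string starting with four digits is year-month-day while everything else is day, month, two-digit year:

```r
parse_mixed <- function(x) {
  x <- as.character(x)                 # also covers factor input
  out <- rep(as.Date(NA), length(x))   # Date vector to fill in

  # Strings that start with a four-digit year are treated as year-month-day
  iso <- grepl("^[0-9]{4}-", x)
  out[iso] <- as.Date(x[iso], format = "%Y-%m-%d")

  # Everything else: unify "." and "/" to "-", then read day-month-2-digit-year
  out[!iso] <- as.Date(gsub("[./]", "-", x[!iso]), format = "%d-%m-%y")
  out
}

dates <- parse_mixed(c("31.3.14", "31/3/14", "2014-3-31"))
format(dates, "%d-%m-%Y")
```

Any string that matches neither pattern comes back as NA, which makes unexpected formats easy to spot.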

as.Date returning NA while converting from 'ddmmmyyyy'

I am trying to convert the string "2013-JAN-14" into a Date as follows:
sdate1 <- "2013-JAN-14"
ddate1 <- as.Date(sdate1,format="%Y-%b-%d")
ddate1
but I get :
[1] NA
What am I doing wrong? Should I install a package for this purpose (I tried installing chron)?
Works for me. The reason it doesn't work for you probably has to do with your system locale.
?as.Date has the following to say:
## This will give NA(s) in some locales; setting the C locale
## as in the commented lines will overcome this on most systems.
## lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
## Sys.setlocale("LC_TIME", lct)
Worth a try.
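The commented lines from the help page can be wrapped in a small helper so that the locale is always restored afterwards. A sketch only: parse_in_c_locale is a made-up name, not part of base R:

```r
parse_in_c_locale <- function(x, format) {
  old <- Sys.getlocale("LC_TIME")
  # Put the previous locale back when the function exits, even on error
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # The C locale uses English month names and abbreviations
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

ddate1 <- parse_in_c_locale("2013-JAN-14", "%Y-%b-%d")
ddate1
```

Upper-case "JAN" is fine here because ?strptime states that case is ignored when matching month names on input.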
This can also happen if the format you pass does not match the way the date is written. The format argument describes the input string, not the output you want. Note that the factor class is not the cause here: as.Date() handles factors by converting them with as.character() first.
Wrong way: a format that does not match the input:
a<-as.factor("24/06/2018")
b<-as.Date(a,format="%Y-%m-%d")
You will get as an output:
a
[1] 24/06/2018
Levels: 24/06/2018
class(a)
[1] "factor"
b
[1] NA
Right way: describe the format the date is actually written in. Here this is done by going through POSIXlt with strptime, although as.Date(a, format="%d/%m/%Y") gives the same Date directly:
a<-as.factor("24/06/2018")
abis<-strptime(a,format="%d/%m/%Y") #the format the date is written in
b<-as.Date(abis) #a POSIXlt object needs no format to become a Date
You will get as an output:
abis
[1] "2018-06-24 AEST"
class(abis)
[1] "POSIXlt" "POSIXt"
b
[1] "2018-06-24"
class(b)
[1] "Date"
My solution below might not work for every problem that results in as.Date() returning NAs, but it does work for some, namely when the date variable is read in as a factor.
Simply read in the .csv with stringsAsFactors=FALSE
data <- read.csv("data.csv", stringsAsFactors = FALSE)
data$date <- as.Date(data$date)
After trying (and failing) to solve the NA problem with my system locale, this solution worked for me.

Resources