csv with dates in decimal format - r

I've been provided a csv file with the date column as follows:
1990.12466
1990.20137
1990.2863
1990.36849
1990.45342
1990.53562
1990.62055
1990.70548
1990.78767
1990.8726
1990.95479
1991.03973
This is data I'll be using in Highcharts, but I can't find any functionality to convert these values to YYYYMMDD.
It looks like this data was produced in R using something like a lubridate function, but I have no way of confirming this.
Any ideas on the best way to get this data into YYYYMMDD?

Assuming that the first four digits represent the year and the digits after the decimal point represent the fraction of the year that has elapsed, you can use the following formula to convert these values into an Excel date serial number (with the dates to be converted in column "A"). Leap years have 366 days and all other years 365:
=DATE(MID(A1,1,4),1,1)+((A1-MID(A1,1,4))*(IF(OR(MOD(MID(A1,1,4),400)=0,AND(MOD(MID(A1,1,4),4)=0,MOD(MID(A1,1,4),100)<>0)),366, 365)))
Once you have these Excel date serial numbers, you can format them in Excel in whatever date format you need (see "Format a date the way you want").

Something like this should work. First we linearly interpolate between the start of the year and the start of the next year, and then we format the output as YYYYMMDD, as requested:
decimal_to_date = function(dt){
  yr = floor(dt)
  # ISOdate() returns POSIXct date-times (time zone GMT by default)
  yr_begin = ISOdate(yr, 1, 1, 0, 0, 0)
  yr_end = ISOdate(yr+1, 1, 1, 0, 0, 0)
  # the fractional part of dt is the share of the year that has elapsed
  interpolated_date = yr_begin + (yr_end - yr_begin) * (dt - yr)
  return(format(interpolated_date, '%Y%m%d'))
}
For example, decimal_to_date(1990.12466) returns "19900215", i.e. February 15, 1990.
If you output the times as well as the dates, the time of day is always very near noon, which suggests something about the process that generated your data, although I'm not exactly sure what.

For what it's worth, here's a very slight variation on Michael Lugo's answer, which does indeed do the trick. ISOdate() returns a date-time object, whereas the code below uses as.Date(), which returns a date only. The code below also takes a shortcut when calculating the number of days in the calendar year, which you need for the interpolation. This shortcut requires loading a library, however, which the original answer does not.
library(lubridate)
decimals <- c(1990.12466,1990.20137,1990.2863,1990.36849,1990.45342,1990.53562,1990.62055,1990.70548,1990.78767,1990.8726,1990.95479,1991.03973)
decimal_to_date2 = function(dt){
  # day of year of December 31 = number of days in that year (365 or 366)
  nDays <- yday(as.Date(paste0(floor(dt),"-12-31")))
  day1 <- as.Date(paste0(floor(dt),"-01-01"))
  # add the elapsed fraction of the year, in days, to January 1
  interpolated_date <- day1+(dt-floor(dt))*nDays
  return(format(interpolated_date, '%Y%m%d'))
}
decimal_to_date2(decimals)
The results of the first answer and mine are identical.
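If you'd rather not load lubridate at all, the number of days in the year can come from the same Gregorian leap-year rule the Excel formula uses. This is a minimal base-R sketch; the name decimal_to_date3 is hypothetical, and like the answers above it assumes the fractional part is the share of the calendar year elapsed, dropping any fraction of a day:

```r
# Hypothetical base-R variant of decimal_to_date2(): days-in-year comes from
# the Gregorian leap-year rule instead of lubridate::yday().
decimal_to_date3 <- function(dt) {
  yr <- floor(dt)
  # 366 days in leap years, 365 otherwise
  n_days <- ifelse((yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0, 366, 365)
  day1 <- as.Date(paste0(yr, "-01-01"))
  # whole days elapsed since January 1 (the fractional day is dropped)
  format(day1 + floor((dt - yr) * n_days), "%Y%m%d")
}

decimal_to_date3(c(1990.12466, 1991.03973, 2000.5))
```

Because it only uses vectorized arithmetic, the function accepts the whole decimals vector at once.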

Related

How to read in excel file when Date and Time in the same column in R

I am trying to read an Excel file into R. Among other fields, the Excel file has two "date" fields, each containing both the date and the time stamp in the SAME field.
Example:
StartDate 9/14/2019 10:18:59 AM
EndDate 9/18/2019 2:27:14 AM
When I used read_excel to read in the Excel file, the data frame formatted these two columns very strangely: it returned the number of days (with decimals), such as 43712.429849537039, which I thought was days since 1970-01-01 (the origin date that appeared when I typed lubridate::origin).
data %<>%
mutate(StartDate = as.Date(StartDate, origin = "1970-01-01 UTC"))
So I tried converting this back using as.Date, but it converts it to a completely wrong date: all the dates end up in the year 2089, e.g. 2089-09-05.
Any help with this would be really appreciated! There must be a simpler way to directly read in a date-time column?!
You can use the lubridate package; it is excellent:
library(tidyverse)
df <- data.frame(StartDate =c("9/14/2019 10:18:59 AM","9/14/2019 3:18:59 PM"),
EndDate= c("9/18/2019 2:27:14 AM","9/18/2019 1:27:14 PM"))
df <- df %>% mutate(StartDate = lubridate::mdy_hms(StartDate), EndDate = lubridate::mdy_hms(EndDate))
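For comparison, base R can parse the same 12-hour strings with as.POSIXct() and an explicit format, where %I is the 12-hour clock and %p matches AM/PM. This is a sketch that assumes an English or C locale (the AM/PM indicator is locale-dependent) and picks UTC as the time zone:

```r
# Base-R alternative to lubridate::mdy_hms(); %p (AM/PM) assumes an
# English or C locale, and tz = "UTC" is an arbitrary choice here.
df <- data.frame(StartDate = c("9/14/2019 10:18:59 AM", "9/14/2019 3:18:59 PM"),
                 EndDate   = c("9/18/2019 2:27:14 AM",  "9/18/2019 1:27:14 PM"))
fmt <- "%m/%d/%Y %I:%M:%S %p"
df$StartDate <- as.POSIXct(df$StartDate, format = fmt, tz = "UTC")
df$EndDate   <- as.POSIXct(df$EndDate,   format = fmt, tz = "UTC")

# shown on the 24-hour clock
format(df$StartDate, "%Y-%m-%d %H:%M:%S")
```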
It turns out that Excel uses a different "origin date" from R. Excel counts days from 1900-01-01 (which it numbers as day 1), whereas R counts days from 1970-01-01.
When I used read_excel to read the file into a data frame, R kept Excel's day counts, which is why I got a strange date when I converted them with the 1970 origin. As soon as I used as.Date with Excel's origin instead (origin = "1899-12-30" in practice, because Excel numbers 1900-01-01 as day 1 and also counts a nonexistent 1900-02-29), my dates parsed correctly!
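A short sketch of that conversion in base R, using the sample value from the question. It assumes the workbook uses Excel's default 1900 date system (the older Mac 1904 system has a different origin). The date alone comes from as.Date(); for the full date-time, 25569 is the Excel serial number of 1970-01-01, so subtracting it and multiplying by 86400 gives seconds since the Unix epoch:

```r
# Assumes Excel's default 1900 date system.
serial <- 43712.429849537039

# date only: whole days counted from Excel's effective origin
as.Date(floor(serial), origin = "1899-12-30")

# date and time: 25569 = Excel serial of 1970-01-01; round() to whole
# seconds absorbs floating-point noise in the stored fraction of a day
as.POSIXct(round((serial - 25569) * 86400), origin = "1970-01-01", tz = "UTC")
```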

format - display fractional time data as hh:mm:ss R

I have data below for work hours which I need to compare - start and stop with date and time. I first extract the time portion of each as start and stop variables, then use the chron package to change them from factor data to something I can compare more easily.
require(chron)
eg_data3 <- data.frame(
id = c('42', '42', '42', '42', '42'),
time_in = as.factor(c('11/5/2017 13:52', '11/4/2017 14:25', '11/5/2017 15:30', '11/5/2017 17:10', '11/6/2017 18:20')),
time_out = as.factor(c('11/5/2017 13:59', '11/4/2017 14:59', '11/5/2017 16:00', '11/5/2017 17:45', '11/6/2017 18:50')))
eg_data3$start_time <- substring(strptime(eg_data3$time_in, format = "%m/%d/%Y %H:%M"),12,19)
eg_data3$end_time <- substring(strptime(eg_data3$time_out, format = "%m/%d/%Y %H:%M"),12,19)
eg_data3$end_time <- chron(times = eg_data3$end_time)
eg_data3$start_time <- chron(times = eg_data3$start_time)
Next, I generate another variable with the difference between stop time 1 and start time 2, i.e. the stop time in row 1 and the start time in row 2, to see the gap between them.
require(dplyr)
eg_data3 <- eg_data3 %>% group_by(id) %>% mutate(diff_outX0_inX1 = start_time - lag(end_time))
When I do this, the variable is displayed as a decimal. I cannot for the life of me get it to display as hh:mm:ss. I have tried specifying out.format as hh:mm:ss in chron, converting time_in / time_out to numeric and to character before and after extraction and then applying chron(times), changing the format of the diff_ variable afterwards, etc.
What seems like a very simple question:
How do I get the comparison variable (diff_outX0_inX1) to display as a time, either hh:mm or hh:mm:ss? I know the formula to convert fractional days into minutes in Excel, but I'd prefer not to write out a two-step function; I assume it's a simple formatting issue.
Any help is appreciated.
EDIT: this got flagged as a duplicate... OK. I asked whether there was a way to do this without writing a function, and the linked answer involves a function. The first comment provided a clean, simple answer: I could reproduce the answer in the comment, but I could not reproduce the function myself, so the linked answer was not nearly as helpful. I also added another solution that does not require dplyr. Nowhere I looked online showed me something as simple as "just format the result with chron."
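Besides re-formatting the result with chron, the underlying arithmetic is simple: the difference of two chron times is a fraction of a day, so multiplying by 86400 gives seconds. The sketch below is a base-R illustration of that; the helper name frac_days_to_hms is hypothetical, and it assumes the gaps are non-negative (the first row's NA from lag() is passed through):

```r
# Hypothetical helper: format fractions of a day as "hh:mm:ss".
# Assumes non-negative values; NA stays NA.
frac_days_to_hms <- function(x) {
  secs <- round(x * 86400)                         # fraction of a day -> seconds
  out <- sprintf("%02d:%02d:%02d",
                 as.integer(secs %/% 3600),         # hours
                 as.integer((secs %% 3600) %/% 60), # minutes
                 as.integer(secs %% 60))            # seconds
  out[is.na(x)] <- NA
  out
}

# gaps of 31 and 70 minutes, plus the NA that lag() produces on the first row
frac_days_to_hms(c(NA, 31 / 1440, 70 / 1440))
```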

How to convert date and time into a numeric value in R

I am relatively new to R and I have a dataset in which I am trying to convert a date and time into a numeric value. The date and time are in the format 01JUN17:00:00:00 under a variable called pickup_datetime. I have tried using the code
cab_small_sample$pickup_datetime <- as.numeric(as.Date(cab_small_sample$pickup_datetime, format = '%d%b%y'))
but this doesn't incorporate the time. I tried adding the time format to the format argument, but it still did not work. Is there an R function that will convert the data into a numeric value?
R has two main date/time classes: "Date" and "POSIXct". POSIXct is a date-time class, and you can get all the gory details at ?DateTimeClasses. The help page for the format codes used when parsing input, however, is at ?strptime.
cab_small_sample <- data.frame(pickup_datetime = "01JUN17:00:00:00")
# %d = day, %b = abbreviated month name, %y = two-digit year, then the time
cab_small_sample$pickup_dt <- as.numeric(as.POSIXct(cab_small_sample$pickup_datetime,
format = '%d%b%y:%H:%M:%S'))
cab_small_sample
# pickup_datetime pickup_dt
#1 01JUN17:00:00:00 1496300400 # seconds since 1970-01-01 00:00 UTC; the value depends on the session time zone
I find that a "destructive reassignment of values" is generally a bad idea so as a "my (best?) practice rule" I don't assign to the same column until I'm sure I have the code working properly. (And I always leave an untouched copy somewhere safe.)
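The number printed above depends on the time zone of the R session (1496300400 is consistent with a session seven hours behind UTC). A sketch that pins tz = "UTC" makes the result reproducible; it also assumes an English or C locale so that %b matches "JUN":

```r
# Same conversion with the time zone pinned to UTC, so the numeric value
# (seconds since 1970-01-01 00:00:00 UTC) does not depend on the session.
# %b matching "JUN" assumes an English or C locale.
pickup <- c("01JUN17:00:00:00", "01JUN17:11:00:00")
pickup_dt <- as.POSIXct(pickup, format = "%d%b%y:%H:%M:%S", tz = "UTC")
as.numeric(pickup_dt)
```

The two values differ by 39600, i.e. the 11 hours between the two timestamps expressed in seconds.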
lubridate is an extremely handy package for dealing with dates. It includes a variety of functions which do the date/time parsing for you, as long as you can provide the order of components. In this case, since your data is in day-month-year-hms form, you can use the dmy_hms function.
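library(lubridate)
cab_small_sample <- dplyr::tibble(
pickup_datetime = c("01JUN17:00:00:00", "01JUN17:11:00:00"))
# dmy_hms() parses day-month-year hour:minute:second (in UTC by default)
cab_small_sample$pickup_POSIX <- dmy_hms(cab_small_sample$pickup_datetime)
# numeric value: seconds since 1970-01-01 00:00 UTC
cab_small_sample$pickup_num <- as.numeric(cab_small_sample$pickup_POSIX)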

R Timeline Without Dates

I'm trying to make a timeline like you'd make with any of the timevis, vistime, or timeline R packages, but I'm only interested in times and not dates. I don't mind putting a placeholder date in there, but it seems that all of these packages require the start and end times to include dates and include the date in the timeline.
I've been searching for ways to either leave dates out of a timeline or print only the time and not the date in any of these packages, but haven't been able to find anything. Does anyone have any ideas?
All of those packages use as.POSIXct under the hood, which needs a full date-time and doesn't work with clock times alone. So, if your data covers only one day, you can prepend a date to the clock times (using paste), and e.g. vistime will display only the times (well, with the date almost completely hidden in the corner):
library(vistime)
dat <- data.frame(event = 1:2,
start = c("14:00", "16:00"),
end = c("15:30", "17:00"))
# prepend today's date (Sys.Date()) to every clock time
dat[,c("start", "end")] <- sapply(dat[,c("start", "end")], function(x) paste(Sys.Date(), x))
vistime(dat)
I use vistime version 0.7.0.9000, which can be installed by running devtools::install_github("shosaco/vistime").
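The same placeholder-date idea works without any timeline package, e.g. for computing event lengths before plotting. In this base-R sketch the fixed date "2000-01-01" and tz = "UTC" are arbitrary assumptions; a fixed date in UTC avoids surprises from daylight-saving changes that Sys.Date() in a local zone could introduce:

```r
# Clock times only: anchor them to one fixed placeholder date, compute with
# POSIXct, and print just the time of day.
dat <- data.frame(event = 1:2,
                  start = c("14:00", "16:00"),
                  end   = c("15:30", "17:00"))
placeholder <- "2000-01-01"  # arbitrary; never displayed
start <- as.POSIXct(paste(placeholder, dat$start), format = "%Y-%m-%d %H:%M", tz = "UTC")
end   <- as.POSIXct(paste(placeholder, dat$end),   format = "%Y-%m-%d %H:%M", tz = "UTC")

format(start, "%H:%M")                            # time of day only
as.numeric(difftime(end, start, units = "mins"))  # event lengths in minutes
```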
If you want to represent times without any date information, you should try out the package hms. It is part of the tidyverse collection and is described as:
A simple class for storing durations or time-of-day values and displaying them in the hh:mm:ss format.
Example use:
library(hms)
hms(56, 34, 12)
#> 12:34:56

Extract dates times from a data.frame in R

I have a dataset with date-time values like "{datetime:2015-07-01 09:10:00", so I want to remove the text and keep both the date and the time, since as.Date returns only the date. I wrote the code below, but the problem is the second line with strsplit: it returns only the date-time of the first row and overwrites all the others. I would love to get ALL my date-times, not only the first. I thought about sapply, or maybe a for loop, but I can't get it right and keep getting errors. I am new to R, so I don't really know the best way to do this.
Could you help me, please? Also, if you have another idea for handling the date and time format, or a simpler way to do it, that would be very nice of you too.
data$`Date Time`=as.character(data$`Date Time`)
data$`Date Time`=unlist(strsplit(data[,1], split='e:'))[2]
date=substr(data$`Date Time`,0,10)
date=as.Date(date)
time=substr(data$`Date Time`,12,19)
data$Date=date
data$Time=time
Thank you very much for your help!
You could use the format argument to avoid all the strsplit calls:
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
(The "{datetime:" is included in the format because you mentioned that this is how your strings begin.)
This object has both the date and the time in it, so you can store it in the data frame as a single column of type POSIXct rather than two columns of type character, e.g.
data$datetime <- times
but if you do want to store the date as a Date and the time as a string (as in your example above):
data$Date <- as.Date(times)
data$Time <- strftime(times, format='%H:%M:%S')
See ?as.Date, ?as.POSIXct, ?strptime for more details on that format argument and various conversions between date and string.
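If you'd rather keep the strsplit approach from the question: only one value survived because unlist(...)[2] picks the second element of the whole flattened vector, and that single value is then recycled down the column. strsplit() returns one piece vector per input string, so taking the second piece of each with vapply() keeps every row. A sketch with two made-up sample strings in the question's format (tz = "UTC" is an assumption):

```r
x <- c("{datetime:2015-07-01 09:10:00", "{datetime:2015-07-02 17:45:30")

# strsplit() returns a list with one character vector per input string;
# take the second piece of each instead of indexing the flattened result.
pieces <- strsplit(x, "e:", fixed = TRUE)
dt_strings <- vapply(pieces, `[`, character(1), 2)

# Or strip the prefix directly with a regular expression.
dt_strings2 <- sub("^\\{datetime:", "", x)

times <- as.POSIXct(dt_strings, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```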

Resources