Converting hour, minutes columns in dataframe to time format - r

I have a dataframe containing 3 columns, including hours and minutes.
I want to convert these two columns (namely Hour and Min) to a time format. Given the data in drame:
Score Hour Min
10 10 56
23 17 01
I would like to get:
Score Time
10 10:56:00
23 17:01:00

You could use ISOdatetime to convert the numbers in the Hour and Min columns to a POSIXct object. However, a POSIXct object is only defined when it also includes a year, month and day. So depending on your needs you can do several things:
If you need a real time object, one that is printed correctly in graphs and can be used in arithmetic (addition, subtraction), you need to use ISOdatetime. ISOdatetime returns a so-called POSIXct object, which is R's representation of a point in time. In ISOdatetime you then just use fixed values for year, month, and day. This of course only works if your dataset does not span multiple days.
If you just need a character column Time, you can convert the POSIXct output to a string using strftime, setting the format argument to "%H:%M:00". In this case, however, you could also use sprintf to create the new character column without converting to POSIXct: sprintf("%02d:%02d:00", drame$Hour, drame$Min). This assumes Hour and Min hold whole numbers; the %02d pads a single-digit value such as 1 to 01.
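As a minimal sketch of both routes, assuming the data frame is named drame as in the question. The placeholder date (2000-01-01) and the UTC time zone are arbitrary choices for illustration, not part of the original data:

```r
# Example data from the question
drame <- data.frame(Score = c(10, 23), Hour = c(10, 17), Min = c(56, 1))

# Route 1: a real POSIXct object; the date 2000-01-01 is an arbitrary placeholder
drame$Time <- ISOdatetime(2000, 1, 1, drame$Hour, drame$Min, 0, tz = "UTC")

# Route 2a: character column derived from the POSIXct values
time_from_posix <- format(drame$Time, "%H:%M:%S")

# Route 2b: character column built directly, zero-padded to two digits
time_from_sprintf <- sprintf("%02d:%02d:00",
                             as.integer(drame$Hour), as.integer(drame$Min))
```

Both character results should read 10:56:00 and 17:01:00; only the POSIXct column supports arithmetic such as differences between times.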

You can use the paste() function to merge the two columns into a character vector and then use strptime() to convert it to a timestamp
x<-1:6
##1 2 3 4 5 6
y<-8:13
## 8 9 10 11 12 13
timestamp <- paste(x,":",y,":00",sep="")
timestamp
will result in
#1:8:00 2:9:00 3:10:00 4:11:00 5:12:00 6:13:00
If you prefer to convert this to a timestamp object, try using
strptime(timestamp,"%H:%M:%S")
## uses current date by default
If you happen to have the date in another column, use paste() to build a character date-time (called mergedData here) and parse it as below to get a full date-time
##strptime(mergedData,"%d/%m/%Y %H:%M:%S")
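A small sketch of the date-plus-time case. The vectors dates, hours and mins are made-up inputs, and UTC is an assumed time zone; adjust both to your data:

```r
# Hypothetical inputs: a date column plus separate hour and minute columns
dates <- c("06/05/2018", "07/05/2018")
hours <- c(1, 17)
mins  <- c(8, 1)

# sprintf() zero-pads so every string has the same HH:MM:SS shape
mergedData <- paste(dates,
                    sprintf("%02d:%02d:00", as.integer(hours), as.integer(mins)))

# strptime() returns POSIXlt; as.POSIXct() gives a compact type for data frames
parsed <- as.POSIXct(strptime(mergedData, "%d/%m/%Y %H:%M:%S", tz = "UTC"))
```

Because the date is supplied explicitly, strptime() no longer falls back to the current date.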

Related

Filtering h2o dataset by date, but being column imported as time in R

I have a .csv that I am importing into h2o which has dates stored as "YYYY-mm-dd" format. When I import this into h2o through R, these columns are read in as time (milliseconds) since 1970 (as explained by the problem listed here - https://0xdata.atlassian.net/browse/PUBDEV-3434).
> head(data.hex$date_used_dt)
date_used_dt
1 1489449600000
2 1520380800000
3 1469491200000
4 1465862400000
5 1464912000000
6 1516147200000
I need to turn this column into a date format. h2o.as_date() cannot work since this is not a factor or string. Is there a function that converts the time variable from h2o to a date within h2o? Something like h2o.as_date(), but that could be used on time variables? I need to keep this dataset in h2o.
All dates within h2o are represented like this. Even if you have a character column of dates ("2018-01-01") and you use h2o.as_date() it will be represented in milliseconds.
What you can do if you want to filter on dates is use the h2o.day, h2o.month and h2o.year functions.
data.hex[h2o.day(data.hex$date_used_dt) == 5, ] if you only want the 5th day of every month.
Or any combination of month and year, like data.hex[h2o.year(data.hex$date_used_dt) == 2017 & h2o.month(data.hex$date_used_dt) == 12, ] if you just want December 2017.
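Since the column is just milliseconds since 1970-01-01 UTC, another option is to compute the cutoffs in plain R and compare numbers. The sketch below runs on an ordinary numeric vector holding the values from the question; to_ms is a hypothetical helper, and applying the same comparison to the H2OFrame column (shown only as a comment) is an untested assumption:

```r
# Millisecond values copied from the question's head() output
ms <- c(1489449600000, 1520380800000, 1469491200000,
        1465862400000, 1464912000000, 1516147200000)

# Hypothetical helper: "YYYY-mm-dd" string -> milliseconds since the epoch (UTC)
to_ms <- function(d) as.numeric(as.POSIXct(d, tz = "UTC")) * 1000

start_ms <- to_ms("2017-01-01")
end_ms   <- to_ms("2018-01-01")

# Keep only values that fall inside calendar year 2017
keep <- ms >= start_ms & ms < end_ms

# Convert the kept values back to dates to check the result
kept_dates <- as.Date(as.POSIXct(ms[keep] / 1000, origin = "1970-01-01", tz = "UTC"))

# Assumed equivalent inside h2o (not run here):
# data.hex[data.hex$date_used_dt >= start_ms & data.hex$date_used_dt < end_ms, ]
```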

Extracting dates from columns and sort them

Dear colleagues, I have the following dataset:
Time1 Signal1 Time2 Signal2 Time3 Signal3
2018-05-06 17:41:44 Value 1 2018-05-06 17:32:39 Value 1 2018-05-07 00:06:00 .....
The TimeX columns are in POSIXct format. Because the signals are recorded at different times, I am trying to do a custom resampling, and for that I need to extract the timestamps of each signal.
I need to store the times of each signal, put these values in one vector, and sort this vector in ascending order.
I have tried:
NewTime<-sort(dataset[,c(1,3,5)])
Error: Can't use matrix or array for column indexing
Also with:
NewTime<-sort(unlist(Time_Trend[, c(1,3,5)]))
But with the last one I lose the date format. Is there any way of doing this without losing the POSIXct format and without ending up with a messy vector?
Finally I have tried with this:
NewTime<-cbind(data$X1,data$X3, data$X5)
actualTime<-as.POSIXct(actualTime, origin="2018-05-06 07:50:32") #lowest value
But it returns a vector with dates in the year 2066. Has anyone done this before?
If we want to order based on multiple columns
dataset[do.call(order, dataset[,c(1,3,5)]),]
If we are looking for creating a vector of datetime variables and then do the sort
sort(do.call(`c`, dataset[c(1, 3, 5)]))
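To illustrate why the second form keeps the date-time class while unlist() does not, here is a sketch on a made-up two-signal data frame (the values and the UTC time zone are assumptions):

```r
# Hypothetical data: two time columns interleaved with signal columns
dataset <- data.frame(
  Time1   = as.POSIXct(c("2018-05-06 17:41:44", "2018-05-07 09:00:00"), tz = "UTC"),
  Signal1 = c(1, 2),
  Time2   = as.POSIXct(c("2018-05-06 17:32:39", "2018-05-07 08:30:00"), tz = "UTC"),
  Signal2 = c(3, 4)
)

# unlist() strips attributes, leaving bare seconds since 1970
flat <- unlist(dataset[c(1, 3)])

# c() dispatches on POSIXct, so the class survives; unname() avoids element names
all_times <- sort(do.call(c, unname(as.list(dataset[c(1, 3)]))))
```

The year-2066 result in the question is consistent with this: the bare numbers are already seconds since 1970, so adding them to a 2018 origin pushes the dates decades forward.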

R difftime subtracts 2 days

I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
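# extract numerical data
# anchor the day count at the start; a leading greedy .* would swallow all but its last digit
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
# escape the dot so it matches only the literal decimal point (fractional seconds are dropped)
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))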
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
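A more compact variant of the same idea, as a sketch only: parse_timedelta is a hypothetical helper, and it assumes every string looks exactly like "N days HH:MM:SS.fffffffff" as in the question. Anything else becomes NA.

```r
# Hypothetical helper: "26 days 04:53:36.000000000" -> difftime in seconds
parse_timedelta <- function(x) {
  pattern <- "^([0-9]+) days ([0-9]+):([0-9]+):([0-9.]+)$"
  parts <- regmatches(x, regexec(pattern, x))
  secs <- vapply(parts, function(p) {
    if (length(p) != 5) return(NA_real_)   # no match: full match + 4 groups expected
    v <- as.numeric(p[-1])                 # days, hours, minutes, seconds
    v[1] * 86400 + v[2] * 3600 + v[3] * 60 + v[4]
  }, numeric(1))
  as.difftime(secs, units = "secs")
}

td <- parse_timedelta(c("26 days 04:53:36.000000000",
                        "36 days 04:53:36.000000000",
                        "not a timedelta"))

# Switch the display units without changing the underlying duration
td_days <- td
units(td_days) <- "days"
```

Returning a difftime rather than a bare number keeps the units attached, so later arithmetic with dates or other difftimes stays consistent.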

Extract numbers from string as numeric or dates in R

I am working with some hdf5 data sets. However, the dates are stored inside the files, and the file names give no hint of them. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out this data to create a time series identity for each of the files, i.e. a year-month-day format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the hdf5 files. E.g
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS/SWATHS/ColumnAmountNO2/Data Fields/ColumnAmountNO2Trop")
The Attr, when read, consists of a long string in which the only required information is "2007-08-31", the date of acquisition. I have been able to extract this using the stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left now is that the Date isn't recognised as numeric or date. How do I change this? I need to bind the Date with the data for all days to create a time series (more like an identifier, as the data sets are irregular). A sample of how it looks after extracting the dates from the strings and binding them with the CO values for each date is below:
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.
Seems like you've already done all the hard work! Here's how you could take it across the finish line.
From your comment, it seems you have the strings in a good format. Given that your variable is named Date, simply run
dateObjects<-as.Date(Date) #where Date is your variable
and either the single value or the vector of character strings (in the format you gave in the comment) will now be Date objects, which you could use with a library like zoo to create time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
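A short sketch of both cases, using made-up example strings: ISO-style "YYYY-mm-dd" strings need no format argument, while any other layout needs one.

```r
# Strings already in "YYYY-mm-dd" form, as in the question's sample
Date <- c("2011-03-01", "2011-03-04", "2011-03-11")
dateObjects <- as.Date(Date)

# Once they are Date objects, date arithmetic works
gaps <- diff(dateObjects)

# A different (hypothetical) layout needs an explicit format
otherDate <- as.Date("31/08/2007", format = "%d/%m/%Y")
```

Here gaps holds the number of days between consecutive acquisitions, which is one quick way to see how irregular the series is.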
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Dates)
Here we take your CO data, convert it to a zoo object, then assign the appropriate date to each entry, converting it from a character to a Date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
Here is one approach not using string extraction. If you know how long your time series should be, which you should based on the length of your dataset and knowledge of its periodicity, you could just create a regular date series and then add that into a data.frame with other variables of interest. Assuming you have daily data the below would work. Obviously your length.out would be different.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
Edit of Original:
Oh, I see. Well, after reading in your new data example, the below worked for me. It was a pretty straightforward transform. Cheers.
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...

How to determine the correct argument for origin in as.Date, R

I have a data set in R that contains a column of dates in the format yyyy/mm/dd. I am trying to use as.Date to convert these dates to date objects in R. However, I cannot seem to find the correct argument for origin to pass to as.Date. The following code is an example of what I have been trying. I am using a CSV file from Excel, so I used origin="1899/12/30" based on other sites I have looked at.
> as.Date(2001/04/26, origin="1899/12/30")
[1] "1900-01-18"
However, this is not working since the input date 2001/04/26 is returned as "1900-01-18". I need to convert the dates into date objects so I can then convert the dates into julian dates.
You can use as.Date either with a numeric value or with a character value. When you type just 2001/04/26 into R, that's doing division and getting about 19.24 (a numeric value). Numeric values require an origin, and the number you supply is the offset in days from that origin. So you're getting 19 days after your origin, i.e. "1900-01-18". A date like Apr 26 2011 would be
as.Date(40659, origin="1899-12-30")
# [1] "2011-04-26"
If your dates from Excel "look like" dates, chances are they are character values (or factors). To convert a character value to a Date with as.Date() you want to specify a format. Here
as.Date("2001/04/26", format="%Y/%m/%d")
# [1] "2001-04-26"
See ?strptime for details on the special % variables. Now if you've read your data into a data.frame with read.table or something similar, there's a chance your variable may be a factor. If that's the case, you'll want to convert it to character first with
as.Date(as.character(mydf$datecol), format="%Y/%m/%d")
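Putting the pieces together as a sketch. The factor column below is a made-up stand-in for mydf$datecol, and "julian date" is taken to mean either the day of the year or the day count since 1970-01-01, since the question does not say which is wanted:

```r
# Hypothetical column as it might arrive from read.table(): a factor of date strings
datecol <- factor(c("2001/04/26", "2001/12/31"))

# Character route: no origin needed, only a format
d <- as.Date(as.character(datecol), format = "%Y/%m/%d")

# Numeric route: an Excel serial number plus the Excel origin gives the same date
from_serial <- as.Date(37007, origin = "1899-12-30")

# Two common readings of "julian date"
day_of_year <- as.numeric(format(d, "%j"))   # 1..366 within the year
days_since_1970 <- as.numeric(julian(d))     # offset from 1970-01-01
```

The origin argument only matters on the numeric route; for character input it is the format string that has to match the data.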
