Date-time data in R

I have my data in following format.
x <- c("2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00",
"2012-03-01T00:06:52+00:00")
The actual data is very long.
My objectives are to:
convert the timestamps to an hourly time series in R
aggregate my data to hourly data

First convert your dates into a date-time class using as.POSIXct. The time of day follows the T; the trailing +00:00 is a UTC offset, which is ignored once the format string is exhausted, so set tz explicitly:
df = data.frame(x = c("2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00",
"2012-03-01T00:06:52+00:00"))
df$times = as.POSIXct(df$x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
Then extract just the hour part using format
df$hour = format(df$times, '%H')
This gives you:
x times hour
1 2012-03-01T00:05:55+00:00 2012-03-01 00:05:55 00
2 2012-03-01T00:06:23+00:00 2012-03-01 00:06:23 00
3 2012-03-01T00:06:52+00:00 2012-03-01 00:06:52 00
Or you can extract the date and the hour using:
df$date_hour = format(df$times, '%Y-%m-%d:%H')
For more information see ?strftime, which says: "A conversion specification is introduced by %, usually followed by a single letter or O or E and then a single letter. Any character in the format string not part of a conversion specification is interpreted literally (and %% gives %). Widely implemented conversion specifications include:... %H
Hours as decimal number (00–23). As a special exception strings such as 24:00:00 are accepted for input, since ISO 8601 allows these."
Now you can do any aggregation you want using something like plyr::ddply
library(plyr)
ddply(df, .(hour), nrow)
hour V1
1 00 3
or
ddply(df, .(date_hour), nrow)
date_hour V1
1 2012-03-01:00 3
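For readers who would rather not load plyr, the hourly count can also be sketched in base R. This is only an illustration under stated assumptions: the time of day is taken to follow the T in each string, the timestamps are treated as UTC, the third sample value is made up so that two different hours appear, and the names hour_start, counts and n are hypothetical.

```r
# Sample input: the first two strings come from the question,
# the third is invented so that more than one hour is present.
x <- c("2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00",
       "2012-03-01T01:10:00+00:00")

# Parse the timestamps; text after the seconds (the "+00:00") is ignored.
times <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Truncate each timestamp to the start of its hour.
hour_start <- as.POSIXct(trunc(times, units = "hours"))

# Count the observations that fall into each hour.
counts <- as.data.frame(
  table(hour_start = format(hour_start, "%Y-%m-%d %H:00")),
  responseName = "n", stringsAsFactors = FALSE
)
counts
```

To aggregate a measured value instead of counting rows, the same truncated hour can serve as the grouping variable in aggregate() or tapply().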

Related

How can I merge three variables into one variable that represents the merged variables separated by a comma?

I have three variables: Year, Month, and Day. How can I merge them into one variable ("Date") so that the variable is represented as such:
yyyy-mm-dd
Thanks in advance and best regards!
How do you merge three variables into one variable?
Consider two methods:
Old school
With dplyr, lubridate, and data frames
And consider the data types. You can have:
Number or character
Date or POSIXct final type
Old School Method
The old school method is straightforward. I assume you are using vectors or lists and don't know data frames yet. Let's take your data, force it to a standardized, unambiguous format, and concatenate the data.
> y <- 2012:2015
> y
[1] 2012 2013 2014 2015
> m <- 1:4
> m
[1] 1 2 3 4
> d <- 10:13
> d
[1] 10 11 12 13
Use as.numeric if you want to be safe and convert everything to the same format before concatenation. If you get any NA values you will need to handle them with the is.na function and provide a default value.
Use paste with the sep separator value set to your delimiter, in this case, the hyphen.
> paste(y,m,d, sep = '-')
[1] "2012-1-10" "2013-2-11" "2014-3-12" "2015-4-13"
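The advice about as.numeric and is.na can be sketched as follows. The input values and the defaults (1970 for a missing year, 1 for a missing month) are hypothetical and chosen only for illustration; pick defaults that make sense for your data.

```r
# Hypothetical input with one missing year and one unparseable month.
y <- c(2012, 2013, NA)
m <- c("1", "2", "x")
d <- c(10, 11, 12)

# as.numeric() turns entries it cannot convert into NA (with a warning).
m_num <- suppressWarnings(as.numeric(m))

# Replace the missing values with explicit defaults before pasting.
y[is.na(y)] <- 1970
m_num[is.na(m_num)] <- 1

merged <- paste(y, m_num, d, sep = "-")
merged
```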
Dataframe / Dplyr / Lubridate Way
> df <- data.frame(year = y, mon = m, day = d)
> df
year mon day
1 2012 1 10
2 2013 2 11
3 2014 3 12
4 2015 4 13
Below I do the following:
Take the df object
Create a new variable named Date
Concatenate the numeric columns year, mon, and day with a - separator
Convert the resulting string into a Date with ymd()
> library(dplyr)
> library(lubridate)
> df %>%
mutate(Date = ymd(
paste(year, mon, day, sep = '-')
)
)
year mon day Date
1 2012 1 10 2012-01-10
2 2013 2 11 2013-02-11
3 2014 3 12 2014-03-12
4 2015 4 13 2015-04-13
Below we create year-month-day character strings, yyyy-mm-dd character strings (similar except one digit month and day are zero padded out to 2 digits) and Date class. The last one prints out as yyyy-mm-dd and can be manipulated in ways that character strings can't, for example adding one to a Date class object gives the next day.
First we set up some sample input:
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)
Convert to a year-month-day character string. This is not quite yyyy-mm-dd, since 1-digit months and days are not zero padded to 2 digits:
paste(year, month, day, sep = "-")
## [1] "2017-3-15" "2015-1-9" "2014-10-25"
Convert to Date class. It prints on the console as yyyy-mm-dd. Two alternatives:
as.Date(paste(year, month, day, sep = "-"))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
as.Date(ISOdate(year, month, day))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
Convert to a yyyy-mm-dd character string. In this case 1-digit months and days are zero padded out to 2 characters. Two alternatives:
as.character(as.Date(paste(year, month, day, sep = "-")))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
sprintf("%d-%02d-%02d", year, month, day)
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
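As a small illustration of why the Date class can be worth the extra conversion, the sketch below reuses the sample input and does arithmetic that plain character strings cannot do. The variable names are illustrative only.

```r
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)

dates <- as.Date(ISOdate(year, month, day))

# Adding 1 to a Date gives the next calendar day.
next_day <- dates + 1
format(next_day)

# Dates also sort chronologically and can be subtracted from one another.
sorted <- sort(dates)
span <- max(dates) - min(dates)  # a difftime measured in days
```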

How can I extract month, date, and year from a date column in R

I have a column with a date datatype. The dates in my column are in 4/1/2007 format. I want to extract the month value and the day value from that column into separate columns in R. My dates range from 01/01/2012 to 01/01/2015. Please help.
If your variable is date type (as you say in the post) simply use following to extract month:
month_var = format(df$datecolumn, "%m") # this will give output like "09"
month_var = format(df$datecolumn, "%b") # this will give output like "Sep"
month_var = format(df$datecolumn, "%B") # this will give output like "September"
If your date variable is not in date format, you will have to convert it to date format first:
df$datecolumn <- as.Date(df$datecolumn, format = "%m/%d/%Y")
Assuming your initial data is character and not POSIX.
df <- data.frame(d = c("4/1/2007", "01/01/2012", "02/01/2015"),
stringsAsFactors = FALSE)
df
# d
# 1 4/1/2007
# 2 01/01/2012
# 3 02/01/2015
These are not yet "dates", just strings.
df$d2 = as.POSIXct(df$d, format = "%m/%d/%Y")
df
# d d2
# 1 4/1/2007 2007-04-01
# 2 01/01/2012 2012-01-01
# 3 02/01/2015 2015-02-01
Now they are proper dates (in the R fashion). These two lines extract just a single component from each "date"; see ?strptime for details on all available formats.
df$dY = format(df$d2, "%Y")
df$dm = format(df$d2, "%m")
df
# d d2 dY dm
# 1 4/1/2007 2007-04-01 2007 04
# 2 01/01/2012 2012-01-01 2012 01
# 3 02/01/2015 2015-02-01 2015 02
An alternative method would be to extract the substrings from each string, but now you're getting into regex-pain; for that, I'd suggest sticking with somebody else's regex lessons-learned, and translate through POSIXct (or even POSIXlt if you want).
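Since the question also asks for the day value, here is a compact sketch that pulls year, month and day out as integers in one step. It assumes the strings are in month/day/year order, as in the answer above; the data frame name parts is illustrative.

```r
d <- c("4/1/2007", "01/01/2012", "02/01/2015")

# Parse the strings once, assuming month/day/year order.
dates <- as.Date(d, format = "%m/%d/%Y")

# format() returns character strings, so convert each piece to integer.
parts <- data.frame(
  date  = dates,
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d"))
)
parts
```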

How to interpret H2O's time data type?

I have a data frame in R that I am passing to H2O using the as.h2o().
dataset.h2o <- as.h2o(dataset,destination_frame = "dataset.h2o")
Doing an str() on the data frame, we can see that the week_of_date class is of datatype Date
$ primary_account_id : int 31 31 31 31 31 31 31 31 31 31 ...
$ week_of_date : Date, format: "2015-08-31" "2015-09-07" "2015-09-14" "2015-09-21" ...
However, when viewed in H2O Flow, it seems to be converted to a datatype called time - which is of the format
week_of_date time 0 0 0 0 1440943200000.0 1447592400000.0 1444480409625.8884 2013362534.5706
When I bring back the data to R using as.data.frame
returned.dataset <- as.data.frame(dataset.h2o)
it is stored in a format that I am unable to understand and therefore parse back
$ primary_account_id: int 31 31 698 1060 1060 1060 1060 1060 1060 1133 ...
$ week_of_date :Class 'POSIXct' num [1:194] 1442757600000 1446382800000 1446382800000 1442152800000 1442757600000 ...
Could you please point me in the direction of how I can achieve better interoperability with dates between R and H2O?
Thanks!
It is a bug in H2O. H2O returns date-time values in milliseconds while R expects seconds. See jira issue 3434.
What you can do in the meantime is recode the date column:
as.Date(structure(returned.dataset$week_of_date/1000, class = c("POSIXct", "POSIXt")))
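To make the unit mismatch concrete, the sketch below converts two of the raw values shown in the question from milliseconds to seconds and prints them in UTC. The variable names are illustrative, and UTC is used only so that the output does not depend on the local time zone.

```r
# Two of the raw values shown in the question (milliseconds since 1970-01-01).
ms <- c(1442757600000, 1446382800000)

# R's POSIXct counts seconds since the epoch, so divide by 1000.
secs <- ms / 1000
times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

utc_text <- format(times, "%Y-%m-%d %H:%M:%S")
utc_text
```

The values are instants rather than calendar dates. as.Date() on a POSIXct value uses UTC unless a tz argument is supplied, so a timestamp that represented local midnight can land on the previous day; passing the time zone the data was recorded in avoids that.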
Refer to the response by phiver for a more detailed answer, but another simple workaround would be to convert the date columns to character before passing to H2O (if you do not need the column in a date format in H2O). Here is a simple example.
# construct a sample df with a date format column
df <- data.frame(week_of_date = as.Date(c('2015-09-29','2015-10-05')))
str(df$week_of_date)
Date[1:2], format: "2015-09-29" "2015-10-05"
# convert the column to character before passing it to H2O
df$week_of_date <- as.character(df$week_of_date)
str(df$week_of_date)
chr [1:2] "2015-09-29" "2015-10-05"
# convert to H2OFRAME and pass back to R data.frame and re-convert to date
df.hex <- as.h2o(df)
df2 <- as.data.frame(df.hex)
df2$week_of_date <- as.Date(df2$week_of_date)
str(df2$week_of_date)
Date[1:2], format: "2015-09-29" "2015-10-05"
Both answers above are great. However, a workaround that I find more efficient is to pass the dataset to H2O without the date column. When you train a model and then make predictions, the predictions have the same number of rows as the original dataset, so you can simply attach the Date column to the predictions vector or matrix.
Of course, in this solution the predictions relate to the same period as the input data, as in backtesting.
Converting to H2O and back is easy if the date-time columns are in the proper format. (Millisecond accuracy of times can be lost.) As mentioned in the H2O FAQ:
H2O is set to auto-detect two major date/time formats. The first
format is for dates formatted as yyyy-MM-dd. ... The second date
format is for dates formatted as dd-MMM-yy.
Times are specified as HH:mm:ss. HH is a two-digit hour and must be a
value between 0-23 (for 24-hour time) or 1-12 (for a twelve-hour
clock). mm is a two-digit minute value and must be a value between 0-59.
ss is a two-digit second value and must be a value between 0-59.
Example
Example Data
dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92")
times <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
x <- paste(dates, times)
df <- data.frame(datetime = strptime(x, "%m/%d/%y %H:%M:%S"))
# > df
# datetime
# 1 1992-02-27 23:03:20
# 2 1992-02-27 22:29:56
# 3 1992-01-14 01:03:30
# 4 1992-02-28 18:21:03
# 5 1992-02-01 16:56:26
Change the format to one that H2o prefers
# Change format
df$datetime <- format(df$datetime, format = "%Y-%m-%d %H:%M:%S")
#H2o format
h2o_df <- as.h2o(df)
# Convert back
back_df <- as.data.frame(h2o_df)
back_df
# datetime
# 1 1992-02-27 23:03:20
# 2 1992-02-27 22:29:56
# 3 1992-01-14 01:03:30
# 4 1992-02-28 18:21:03
# 5 1992-02-01 16:56:26

Convert serial number to character representation in R

I am importing some weather data, but the timestamp is split across different columns. I want to join these columns and create posix objects out of them.
datenum <- c()
for (i in 1:dim(weather)[1]){
date_string <- paste0(weather$Year.UTC[i],'-',weather$Month.UTC[i],'-',weather$Day.UTC[i],'-',weather$Hour.UTC[i]) # different columns of data
# for i = 1, date_string = "2012-12-31-23"
datenum[i] <- as.POSIXct(date_string, format="%Y-%m-%d-%H",tz="GMT", origin = "1960-01-01")
# for i = 1, datenum[1] = 1356994800 (numeric)
}
as.Date(datenum[1], origin = "1960-01-01")
# Gives character = "7285-07-27"
To visually confirm that I am doing it right, I would like to see a string in the form "yyyy-mm-dd HH:MM:SS", which is what I try to obtain with as.Date. The origin is the same when converting to a serial number and back to a character, but the date is completely wrong. What am I doing wrong?
Why so complicated?
weather <- data.frame(Year.UTC=c(2012, 2013),
Month.UTC=c(1,2),
Day.UTC=c(1,2),
Hour.UTC=c(22,23))
weather <- within(weather, datetime <-
as.POSIXct(paste(Year.UTC, Month.UTC, Day.UTC, Hour.UTC, sep="-"),
format="%Y-%m-%d-%H", tz="UTC"))
# Year.UTC Month.UTC Day.UTC Hour.UTC datetime
#1 2012 1 1 22 2012-01-01 22:00:00
#2 2013 2 2 23 2013-02-02 23:00:00
As you see, you don't need a loop at all.
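As for why the round trip in the question goes wrong: storing a POSIXct value in a plain numeric vector drops its class and leaves the number of seconds since 1970-01-01 UTC (the origin argument has no effect when a string is parsed), while as.Date() reads a bare number as a count of days. A minimal sketch using the serial number from the question:

```r
# The serial number from the question: seconds since 1970-01-01 00:00:00 UTC.
datenum <- 1356994800

# Convert back with the same unit (seconds) and the Unix epoch as origin.
restored <- as.POSIXct(datenum, origin = "1970-01-01", tz = "UTC")

restored_text <- format(restored, "%Y-%m-%d %H:%M:%S")
restored_text
```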

Split date data (m/d/y) into 3 separate columns

I need to convert a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm. (I'm trying to convert my dates into Julian Day Numbers.) I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as string is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
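Note that julian() counts days from an origin (1970-01-01 by default), which is not the same thing as an astronomical Julian Day Number. The sketch below shows three related quantities; the offset 2440588 is used here on the assumption that it is the Julian Day Number of 1970-01-01, so check it against your own reference before relying on it. The variable names are illustrative.

```r
dates <- as.Date(c("2010-01-02", "2010-02-03", "2010-09-10"))

# Day of the year (1 to 366) via the %j conversion specification.
day_of_year <- as.integer(format(dates, "%j"))

# Days elapsed since a chosen origin, here the start of 2010.
since_2010 <- as.numeric(julian(dates, origin = as.Date("2010-01-01")))

# Astronomical Julian Day Number, assuming 1970-01-01 corresponds to 2440588.
jdn <- as.numeric(dates) + 2440588
```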
To convert a date (m/d/y format) into 3 separate columns, consider the df,
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day,
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
dfdate <- data.frame(date=realdate)
year=as.numeric(format(realdate,"%Y"))
month=as.numeric(format(realdate,"%m"))
day=as.numeric(format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources