I have a factor column DATE in a dataframe a that shows dates written like this:
01/01/2012 00
It shows the day, the month, the year and the hour.
On Stack Overflow I found this way to convert a factor to a datetime:
a$DATE <- as.POSIXct(as.character(a$DATE), format = "%d/%m/%Y %H")
However when I try to check the dataframe by View(a) I only get to see the date without the hour. All the dates appear like this:
2012-01-01
I have also tried to specify datetime by saving the data frame as a CSV and importing it again through the RStudio button "Import Dataset". When I set the type by clicking on the header of the DATE column, I get the same result: the hour doesn't show.
Is the method I used correct?
If yes, how can I show the hour?
If it's not possible to show the hour, how can I get the hour from the POSIXct type?
I can't seem to reproduce your issue. Could you possibly provide a complete minimal reproducible example that demonstrates the issue?
Here's what I got.
times <- c("01/01/2012 00", "30/11/2013 11", "17/03/2014 23")
times_factor <- as.factor(times)
times_factor
#> [1] 01/01/2012 00 30/11/2013 11 17/03/2014 23
#> Levels: 01/01/2012 00 17/03/2014 23 30/11/2013 11
foo <- as.POSIXct(times_factor, format = "%d/%m/%Y %H")
foo
#> [1] "2012-01-01 00:00:00 CET" "2013-11-30 11:00:00 CET" "2014-03-17 23:00:00 CET"
bar <- format(foo,"%d/%m/%Y %H")
bar
#> [1] "01/01/2012 00" "30/11/2013 11" "17/03/2014 23"
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
dmy_h(times_factor, quiet = TRUE, tz = "CET")
#> [1] "2012-01-01 00:00:00 CET" "2013-11-30 11:00:00 CET" "2014-03-17 23:00:00 CET"
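On the original question of showing or extracting the hour: one possible explanation, not confirmed by the asker's data, is that every value in the column falls exactly at midnight, in which case R's default display of a POSIXct vector typically shows only the date. A minimal base R sketch under that assumption (the vector x and the UTC time zone are made up for illustration):

```r
# Hypothetical input: every timestamp is exactly midnight
x <- as.POSIXct(c("01/01/2012 00", "02/01/2012 00"),
                format = "%d/%m/%Y %H", tz = "UTC")

# Default printing may drop the time part when all times are midnight
print(x)

# An explicit format always shows the hour
shown <- format(x, "%d/%m/%Y %H")
shown

# The hour can be extracted as a number ...
hours <- as.POSIXlt(x)$hour
# ... or as a zero-padded string
hours_chr <- format(x, "%H")
```

The stored values still contain the hour either way; only the display changes.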
I have date-time stamps stored as a character vector:
"2018-12-13 11:00:01 EST" "2018-10-23 22:00:01 EDT" "2018-11-03 14:15:00 EDT" "2018-10-04 19:30:00 EDT" "2018-11-10 17:15:31 EST" "2018-10-05 13:30:00 EDT"
How can I strip the time from this character vector?
PS: Can someone please help. I have tried using strptime but I am getting NA values as a result
It's a bit unclear whether you want the date or time but if you want the date then as.Date ignores any junk after the date so:
x <- c("2018-12-13 11:00:01 EST", "2018-10-23 22:00:01 EDT")
as.Date(x)
## [1] "2018-12-13" "2018-10-23"
would be sufficient to get a Date vector from the input vector x. No packages are used.
If you want the time then:
read.table(text = x, as.is = TRUE)[[2]]
## [1] "11:00:01" "22:00:01"
If you want a data frame with each part in a separate column then:
read.table(text = x, as.is = TRUE, col.names = c("date", "time", "tz"))
## date time tz
## 1 2018-12-13 11:00:01 EST
## 2 2018-10-23 22:00:01 EDT
I think the OP wants to extract the time from date-time variable (going by the title of the question).
x <- "2018-12-13 11:00:01 EST"
format(strptime(x, "%Y-%m-%d %H:%M:%S"), "%H:%M:%S")
[1] "11:00:01"
Another option:
library(lubridate)
format(ymd_hms(x, tz = "EST"), "%H:%M:%S")
[1] "11:00:01"
The package lubridate makes everything like this easy:
library(lubridate)
x <- "2018-12-13 11:00:01 EST"
as_date(ymd_hms(x))
You can use the as.Date function and specify the format
> as.Date("2018-12-13 11:00:01 EST", format="%Y-%m-%d")
[1] "2018-12-13"
If all values are in a vector:
x = c("2018-12-13 11:00:01 EST", "2018-10-23 22:00:01 EDT",
"2018-11-03 14:15:00 EDT", "2018-10-04 19:30:00 EDT",
"2018-11-10 17:15:31 EST", "2018-10-05 13:30:00 EDT")
> as.Date(x, format="%Y-%m-%d")
[1] "2018-12-13" "2018-10-23" "2018-11-03" "2018-10-04" "2018-11-10"
[6] "2018-10-05"
Good afternoon! I have data consisting of the date and time of a share price. I need to join the date and time into one column.
date time open high low close
1 1999.04.08 11:00 1.0803 1.0817 1.0797 1.0809
2 1999.04.08 12:00 1.0808 1.0821 1.0806 1.0807
3 1999.04.08 13:00 1.0809 1.0814 1.0801 1.0813
4 1999.04.08 14:00 1.0819 1.0845 1.0815 1.0844
5 1999.04.08 15:00 1.0839 1.0857 1.0832 1.0844
6 1999.04.08 16:00 1.0842 1.0852 1.0824 1.0834
I tried to do that using this function:
df1 <- within(data, { timestamp = strptime(paste(date, time), "%Y/%m/%d%H:%M:%S") })
but I got the column of NAs.
Also I tried to do that using:
data$date_time = mdy_hm(paste(data$date, data$time))
but I got again the error:
Warning message:
All formats failed to parse. No formats found.
Please tell me what I am doing wrong.
In your particular example, let's break it down first to see why you are getting NA values, and then generate a solution that creates your desired results.
> date <- c("1999.04.08", "1999.04.08")
> time <- c("11:00", "12:00")
> df <- data.frame(date, time, stringsAsFactors = F)
> df
date time
1 1999.04.08 11:00
2 1999.04.08 12:00
> str(df)
'data.frame': 2 obs. of 2 variables:
$ date: chr "1999.04.08" "1999.04.08"
$ time: chr "11:00" "12:00"
Don't forget to use str to understand the data type(s) you are dealing with. That can and will greatly influence the answer to your question. Looking at the help description of function strptime, we see the following definition:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character. Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.
So, let's break down your code:
df1 <- within(data,
{ timestamp = strptime(paste(date, time),
"%Y/%m/%d%H:%M:%S")
})
First, the paste function:
> paste(date[1], time[1])
[1] "1999.04.08 11:00"
This generates a character vector with the format above.
Next, the strptime command.
> strptime(paste(date[1], time[1]), "%Y/%m/%d%H:%M:%S")
[1] NA
Okay, we see an NA. First, get into the habit of writing format = explicitly; it may feel tedious, but it makes the call easier to read and harder to get wrong. Looking at the example in the help page, we see:
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- strptime(x, "%d%b%Y")
> z
[1] "1960-01-01 PST" "1960-01-02 PST" "1960-03-31 PST" "1960-07-30 PDT"
Notice that the help page also distinguishes upper- and lower-case letters (%Y versus %y), and likewise for the month and day specifiers. In your case, the format you supplied asks for something of the form YYYY/mm/ddHH:MM:SS, such as 2017/11/2011:28:30, whereas your pasted strings use dots, contain a space, and have no seconds. Do you see the issue now?
Using your string extraction attempt, we modify it slightly to get the format you are looking for:
> strptime(paste(date, time), format = "%Y.%m.%d %H:%M")
[1] "1999-04-08 11:00:00 PDT" "1999-04-08 12:00:00 PDT"
Putting it all together you get:
> df1 <- within(df, {timestamp = strptime(paste(date, time), format = "%Y.%m.%d %H:%M")})
> str(df1)
'data.frame': 2 obs. of 3 variables:
$ date : chr "1999.04.08" "1999.04.08"
$ time : chr "11:00" "12:00"
$ timestamp: POSIXlt, format: "1999-04-08 11:00:00" "1999-04-08 12:00:00"
> df1
date time timestamp
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
Oh yeah, and try out the dplyr package.
library(dplyr)
> df %>%
mutate(ts = as.POSIXct(paste(date,time),
format = "%Y.%m.%d %H:%M"))
date time ts
1 1999.04.08 11:00 1999-04-08 11:00:00
2 1999.04.08 12:00 1999-04-08 12:00:00
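For completeness, here is a base R sketch of the same conversion that stores the result as POSIXct rather than POSIXlt; R's own documentation describes POSIXct as the more convenient class for data frame columns. The data frame name prices and the choice of tz = "UTC" are assumptions made for illustration; the values are copied from the question.

```r
# Rebuild the first rows of the question's data (only one price column kept)
prices <- data.frame(
  date  = c("1999.04.08", "1999.04.08", "1999.04.08"),
  time  = c("11:00", "12:00", "13:00"),
  close = c(1.0809, 1.0807, 1.0813),
  stringsAsFactors = FALSE
)

# The format mirrors the pasted text: dots in the date, a space, then HH:MM
prices$timestamp <- as.POSIXct(paste(prices$date, prices$time),
                               format = "%Y.%m.%d %H:%M", tz = "UTC")

# Rows whose text does not match the format would be counted here as NA
n_failed <- sum(is.na(prices$timestamp))
n_failed

str(prices)
```

Counting the NA values right after parsing is a cheap way to catch a format string that does not match the data.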
I have about 50 datasets of datetime data at regular 5-minute intervals. POSIXt/lubridate functions convert my datetimes very nicely to a 24-hour format as required. But I would like to add another column in which a day runs from 6 am to 6 am (currently it runs from midnight to midnight). I am trying to do this so that activity after 12 AM counts as part of the current date rather than the next one.
I am currently trying to create a group from every 288 rows (there are 288 5-minute intervals in a day). But this creates a problem because my datasets do not all start at the same time.
I do not want to create offsets because that tampers with the values corresponding to the time.
Any efficient ways around this problem? Thank you.
You can efficiently do it by first generating a sequence of date/times, then using cut to find the bin in which each value falls:
set.seed(2)
dat <- Sys.time() + sort(runif(10, min=0, max=5*24*60*60))
dat
# [1] "2017-07-29 15:43:10 PDT" "2017-07-29 20:23:12 PDT" "2017-07-29 22:24:22 PDT" "2017-07-31 08:22:57 PDT"
# [5] "2017-07-31 18:13:06 PDT" "2017-07-31 21:01:10 PDT" "2017-08-01 12:30:19 PDT" "2017-08-02 04:14:03 PDT"
# [9] "2017-08-02 17:26:14 PDT" "2017-08-02 17:28:52 PDT"
sixs <- seq(as.POSIXct("2017-07-29 06:00:00", tz = "UTC"), as.POSIXct("2017-08-03 06:00:00", tz = "UTC"), by = "day")
sixs
# [1] "2017-07-29 06:00:00 UTC" "2017-07-30 06:00:00 UTC" "2017-07-31 06:00:00 UTC" "2017-08-01 06:00:00 UTC"
# [5] "2017-08-02 06:00:00 UTC" "2017-08-03 06:00:00 UTC"
cut(dat, sixs, labels = FALSE)
# [1] 1 1 1 3 3 3 4 5 5 5
According to the help page (?seq.POSIXt), you might choose by="DSTday" instead.
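To illustrate the difference between by = "day" and by = "DSTday", here is a small sketch around the 2017 US spring-forward change. The start date and the "America/Denver" zone are assumptions chosen for the example, and it relies on the system's time zone database knowing that zone.

```r
# 06:00 local time on the day before the clocks change
zone  <- "America/Denver"
start <- as.POSIXct("2017-03-11 06:00:00", tz = zone)

# "day" steps by exactly 24 hours, so the clock time shifts after the change
by_day <- seq(start, by = "day", length.out = 3)

# "DSTday" keeps the same clock time each day
by_dstday <- seq(start, by = "DSTday", length.out = 3)

format(by_day,    "%Y-%m-%d %H:%M %Z", tz = zone)
format(by_dstday, "%Y-%m-%d %H:%M %Z", tz = zone)
```

If the 6 am boundaries are meant in local clock time, "DSTday" is probably the safer choice for building the breaks passed to cut.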
Check out this question and the corresponding answer: How to manipulate the time part of a date column?
It illustrates a more robust solution, as it is independent of your data structure (e.g. repetition).
Following @meenaparam's solution:
Convert all date columns to dmy_hms format from lubridate package. Please explore other options like dmy_hm or ymd_hms etc, as per your specific need.
mutate(DATE = dmy_hms(DATE))
Now create a column to identify the data points that need to be modified in a different way. For example, data points from 00:00:00 to 05:59:59 (hms) need to become part of the previous date.
mutate(DAY_PAST = case_when(hour(DATE) < 6 ~ "yup", TRUE ~ "nope"))
Now convert the day value of these "yup" dates to day(DATE)-1
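# min and sec are passed through so that only the day changes
mutate(NEW_DATE = case_when(DAY_PAST == "yup"
~ make_datetime(year(DATE-86400), month(DATE-86400), day = day(DATE-86400), hour = hour(DATE), min = minute(DATE), sec = second(DATE)),
TRUE ~ DATE))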
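A base R alternative, offered only as a sketch: instead of rebuilding the datetime, shift a temporary copy back by 6 hours and take its date as the grouping label. The original timestamps are left untouched. The sample vector ts and the UTC zone are made up for illustration; with a zone that observes DST, a fixed 6-hour shift can be off by an hour around the changeover.

```r
# Made-up timestamps around the 6 am boundary
ts <- as.POSIXct(c("2017-07-29 05:55:00", "2017-07-29 06:00:00",
                   "2017-07-30 00:10:00", "2017-07-30 06:05:00"), tz = "UTC")

# The 6-hour shift is used only to compute the label; ts itself is not changed
day_6am <- as.Date(ts - 6 * 3600, tz = "UTC")

data.frame(ts, day_6am)
```

Here 05:55 on 29 July is labelled 28 July, and 00:10 on 30 July is labelled 29 July, which is the 6 am to 6 am grouping asked for. This also works when a dataset does not start at a fixed time, since no row counting is involved.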
I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT which contained the date/time information in date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why this happens and how to fix it.
Just use functions anytime() or utctime() from package anytime
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in North American time zones that observe DST. You could parse it as UTC, as UTC does not observe daylight saving:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below but ways to improve appreciated.
The explanation for the NA is that in the mountain time zone in the US, that date and time is in the window of the switch to daylight savings where the time doesn't exist, hence NA. While the time zone is not explicitly specified, I guess R must be picking it up from the computer's time, which is in "America/Denver"
The solution is to explicitly state the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time, which is the offset between UTC and Mountain Daylight Time (MDT); note that MST itself is 7 hours behind UTC.
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time using daylight savings.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings but the data are now correct.
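The missing hour can be seen directly by parsing the two valid times on either side of it. This is only a sketch: the "America/Denver" zone is taken from the answer above, the variable names are made up, and how R treats the nonexistent 02:17:13 itself is platform-dependent, so it is not parsed in that zone here.

```r
fmt  <- "%m/%d/%Y %H:%M:%S"
zone <- "America/Denver"

# Valid wall-clock times just before and just after the skipped hour
before <- as.POSIXct("03/12/2017 01:17:13", format = fmt, tz = zone)
after  <- as.POSIXct("03/12/2017 03:17:13", format = fmt, tz = zone)

# The wall clocks differ by two hours, but the elapsed time is one hour
gap_secs <- as.numeric(difftime(after, before, units = "secs"))
gap_secs

format(before, "%H:%M:%S %Z")
format(after,  "%H:%M:%S %Z")

# The same text parsed as UTC is always valid, because UTC has no DST
in_utc <- as.POSIXct("03/12/2017 02:17:13", format = fmt, tz = "UTC")
in_utc
```

A quick check such as sum(is.na(...)) after parsing a whole column is a simple way to find rows that fall into such a gap.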
I have time series data at 10-minute intervals. When I try to convert it to a date-time type using df$Time1 <- dmy_hm(df$Time, tz="Asia/Calcutta"), it returns NA at the 24 o'clock (midnight) entries. I have also tried df$Time1 = as.POSIXct(df$Time, format="%d-%m-%y %H:%M"). Please guide me on this; I am clueless about what is happening at 02-07-16 00:00.
One option would be using parse_date_time from lubridate which can take multiple formats
library(lubridate)
parse_date_time(df$Time, c('dmy_HM', 'dmy'))
#[1] "2016-07-01 23:30:00 UTC" "2016-07-01 23:40:00 UTC"
#[3] "2016-07-01 23:50:00 UTC" "2016-07-02 00:00:00 UTC"
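The same idea can be sketched in base R without lubridate: parse with the full format first, then re-parse only the entries that failed using the date-only format. The vector time_chr repeats the sample values from this answer; tz = "UTC" is an assumption, so substitute your own zone as needed.

```r
time_chr <- c("01-07-16 23:30", "01-07-16 23:40", "01-07-16 23:50", "02-07-16")

# First pass: entries without a time part come back as NA
parsed <- as.POSIXct(time_chr, format = "%d-%m-%y %H:%M", tz = "UTC")

# Second pass: fill the failures using the date-only format (midnight)
date_only <- is.na(parsed)
parsed[date_only] <- as.POSIXct(time_chr[date_only], format = "%d-%m-%y", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
```

This assumes the NA values arise because the midnight rows carry no time part in the raw text, as in the sample data.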
data
df <- data.frame(Time = c("01-07-16 23:30", "01-07-16 23:40", "01-07-16 23:50",
"02-07-16"))