Convert dd/mm/yyyy H to date format - r

I have the following column in R
dteday = c("01/01/2011 0", "01/01/2011 1" , "01/01/2011 2", "01/01/2011 19")
df = data.frame(dteday)
dteday
1 01/01/2011 0
2 01/01/2011 1
3 01/01/2011 2
4 01/01/2011 19
I want to convert the column to a proper %d/%m/%Y %H:%M format.
The string on the left is in %d/%m/%Y format, and the integer on the right is the hour. This is my desired output:
dteday
1 01/01/2011 00:00
2 01/01/2011 01:00
3 01/01/2011 02:00
4 01/01/2011 19:00

That specifically formatted output can be achieved by combining strftime and as.POSIXct, but the result will still be a character string:
df$dteday = strftime(as.POSIXct(df$dteday, format = "%d/%m/%Y %H"), format = "%d/%m/%Y %H:%M")
# dteday
# 1 01/01/2011 00:00
# 2 01/01/2011 01:00
# 3 01/01/2011 02:00
# 4 01/01/2011 19:00
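Because the formatted result is a character string, it can help to keep a parsed POSIXct copy for calculations and format it only for display. The following is a minimal sketch of that idea, not part of the original answer; the variable names `parsed` and `shown` and the choice of `tz = "UTC"` are assumptions made for illustration.

```r
# Sample input from the question: a date followed by an unpadded hour
dteday <- c("01/01/2011 0", "01/01/2011 1", "01/01/2011 2", "01/01/2011 19")

# Parse once into POSIXct; tz = "UTC" is an assumption that avoids DST surprises
parsed <- as.POSIXct(dteday, format = "%d/%m/%Y %H", tz = "UTC")

# Format only for display; this step returns a character vector
shown <- format(parsed, "%d/%m/%Y %H:%M")

class(parsed)  # "POSIXct" "POSIXt"
class(shown)   # "character"

# Arithmetic works on the parsed values, not on the formatted strings
difftime(parsed[4], parsed[1], units = "hours")
```

Keeping both versions costs little and avoids re-parsing whenever a calculation is needed.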

Without converting to date you could do:
sapply(
  strsplit(dteday, ' '),
  function(x) sprintf('%s %02d:00', x[1], as.integer(x[2]))
)
# [1] "01/01/2011 00:00" "01/01/2011 01:00" "01/01/2011 02:00" "01/01/2011 19:00"

You can use lubridate's dmy_h to convert string to POSIXct.
df$dteday <- lubridate::dmy_h(df$dteday)
df$dteday
#[1] "2011-01-01 00:00:00 UTC" "2011-01-01 01:00:00 UTC"
# "2011-01-01 02:00:00 UTC" "2011-01-01 19:00:00 UTC"
Then use format to get data in format of your choice.
format(df$dteday, '%d/%m/%Y %H:%M')
#[1] "01/01/2011 00:00" "01/01/2011 01:00" "01/01/2011 02:00" "01/01/2011 19:00"

Related

Why can I not put POSIXct objects into a data frame in R?

We have the code:
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
which results in
> mydat
times
1 2:30 PM
2 10:00 AM
3 10:00 AM
I want to convert these times, which are characters, into POSIX format. So I do
mydat$ntimes <- as.POSIXct(NA,"")
mydat$ntimes <- sapply(mydat$times, function(x) parse_date_time(x, '%I:%M %p'))
Then we get
> mydat
times ntimes
1 2:30 PM -62167167000
2 10:00 AM -62167183200
3 10:00 AM -62167183200
I have no idea why these are negative. Furthermore, if instead of sapply we did a loop:
for (i in 1:length(mydat$times)){
mydat$ntimes[i] <- parse_date_time(mydat$times[i], '%I:%M %p')
}
we get the format right, but everything is off by 7 minutes and 2 seconds, why is that?
> mydat
times ntimes
1 2:30 PM 0000-01-01 06:37:02
2 10:00 AM 0000-01-01 02:07:02
3 10:00 AM 0000-01-01 02:07:02
You don't need a loop for this:
as.POSIXct(mydat$times, format = '%I:%M %p', tz = 'UTC')
#[1] "2021-03-14 14:30:00 UTC" "2021-03-14 10:00:00 UTC" "2021-03-14 10:00:00 UTC"
Or
lubridate::parse_date_time(mydat$times, '%I:%M %p')
#[1] "0000-01-01 14:30:00 UTC" "0000-01-01 10:00:00 UTC" "0000-01-01 10:00:00 UTC"
The difference between the two options is that, when the date is absent, as.POSIXct fills in today's date whereas parse_date_time fills in 0000-01-01.
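As for the negative numbers in the question: sapply simplifies its result to a plain vector and drops the POSIXct class, so what is printed is the underlying count of seconds since 1970-01-01 UTC, which is negative for any date in year 0000. The sketch below illustrates this with base R only; the sample timestamps and variable names are assumptions chosen for illustration.

```r
# Two sample timestamps (assumed values, fixed so the result is reproducible)
x <- as.POSIXct(c("2021-03-14 14:30:00", "2021-03-14 10:00:00"), tz = "UTC")

# sapply simplifies the result and drops the POSIXct class
stripped <- sapply(x, function(t) t)
class(stripped)  # "numeric": seconds since 1970-01-01 UTC

# The class can be restored from the raw seconds
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
format(restored, "%H:%M")

# A time before 1970 is stored as a negative number of seconds
as.numeric(as.POSIXct("1969-12-31 23:00:00", tz = "UTC"))  # -3600
```

The odd offset seen in the loop version is probably a time zone effect: the preallocated column uses the session's local time zone, and for dates that far in the past the time zone database typically applies a local mean time offset that is not a whole number of hours.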
Base R Solution
You can use the strptime function to convert the character variable times to POSIXlt. When no date is provided, this function also fills in today's date.
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
# FORMAT SPECIFICATIONS:
# %I = Hours as decimal number (01–12).
# %M = Minute as decimal number (00–59).
# %p = AM/PM indicator in the locale.
strptime(mydat$times, format='%I:%M %p', tz = 'UTC')
#> [1] "2021-03-13 14:30:00 UTC" "2021-03-13 10:00:00 UTC"
#> [3] "2021-03-13 10:00:00 UTC"
Created on 2021-03-13 by the reprex package (v0.3.0)
Add it to the data frame as a new variable
times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times=times)
mydat$new_times <- strptime(mydat$times, format='%I:%M %p')
mydat
#> times new_times
#> 1 2:30 PM 2021-03-13 14:30:00
#> 2 10:00 AM 2021-03-13 10:00:00
#> 3 10:00 AM 2021-03-13 10:00:00
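Since the results above depend on the day the code is run, one option is to paste an explicit date onto each time before parsing and to store the result as POSIXct. This is a minimal sketch under stated assumptions: `parse_clock_times` is a hypothetical helper, the default date is arbitrary, and `%p` is assumed to run in an English or C locale.

```r
# Hypothetical helper: attach an explicit date so the result does not depend on today's date
parse_clock_times <- function(times, date = "2021-03-13", tz = "UTC") {
  # %I = hour on a 12-hour clock, %p = AM/PM indicator (locale dependent)
  as.POSIXct(paste(date, times), format = "%Y-%m-%d %I:%M %p", tz = tz)
}

times <- c("2:30 PM", "10:00 AM", "10:00 AM")
mydat <- data.frame(times = times)

# A POSIXct vector can be assigned to a data frame column directly
mydat$new_times <- parse_clock_times(mydat$times)
class(mydat$new_times)
```

Assigning the whole vector at once, rather than element by element through sapply or a loop, keeps the class and time zone attributes intact.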

CSV date format changes after import into R

I tried to import csv with date format:
3/1/2017 0:00
3/1/2017 1:00
3/1/2017 2:00
3/1/2017 3:00
3/1/2017 4:00
3/1/2017 5:00
into R, but in R the dates appear as:
2017-03-01 00:00:00
2017-03-01 01:00:00
2017-03-01 02:00:00
2017-03-01 03:00:00
2017-03-01 04:00:00
2017-03-01 05:00:00
How can I read csv into R as the original format without changing anything?
It is in the "original" format, in the sense that you're probably looking at a POSIXct or POSIXlt object. You can reformat dates and datetimes using format() or strftime(), but this will render them character.
So as long as you're working with the datetime objects, just leave it as is. If you need to report, you can use any of the aforementioned functions to format the string:
x <- "3/1/2017 3:00"
x1 <- as.POSIXct(x, format = "%d/%m/%Y %H:%M")
x1
# [1] "2017-01-03 03:00:00 CET"
strftime(x1, format = "%d/%m/%Y %H:%M")
# [1] "03/01/2017 03:00"
format(x1, format = "%d/%m/%Y %H:%M")
# [1] "03/01/2017 03:00"
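If the goal is to keep the text exactly as it appears in the file, one option is to read the column as character and parse a separate copy only when date arithmetic is needed. The sketch below is an illustration rather than the asker's actual import code: the column names `timestamp` and `value`, the inline CSV text, and the month/day/year reading of the strings are all assumptions.

```r
# Inline stand-in for the CSV file (column names are assumed)
csv_text <- "timestamp,value
3/1/2017 0:00,10
3/1/2017 1:00,12
3/1/2017 2:00,9"

# Force the timestamp column to stay character so the text is left untouched
raw <- read.csv(text = csv_text, colClasses = c(timestamp = "character"))
raw$timestamp

# Parse a separate copy for calculations; month/day/year order is an assumption
parsed <- as.POSIXct(raw$timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M")
```

The original strings stay available in `raw$timestamp` for reporting, while `parsed` supports sorting and differences.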

Text process using R

I am quite new to programming and to R.
My data-set includes date-time variables as following:
2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106
I need an operation that takes the first 10 characters from the left, inserts a space, copies the last two characters, and then appends :00, for every value in the column.
Expected results:
2007/11/01 03:00
2007/11/01 04:00
2007/11/01 05:00
2007/11/01 06:00
If you want to actually turn your data into a "POSIXlt" "POSIXt" class in R (so you can add or subtract days, minutes, etc.), you could do
# Your data
temp <- c("2007/11/0103", "2007/11/0104", "2007/11/0105", "2007/11/0106")
temp2 <- strptime(temp, "%Y/%m/%d%H")
## [1] "2007-11-01 03:00:00 IST" "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST"
You could then extract hours for example
temp2$hour
## [1] 3 4 5 6
Add hours
temp2 + 3600
## [1] "2007-11-01 04:00:00 IST" "2007-11-01 05:00:00 IST" "2007-11-01 06:00:00 IST" "2007-11-01 07:00:00 IST"
And so on. If you just want the format you mentioned in your question (which is just a character string), you can also do
format(strptime(temp, "%Y/%m/%d%H"), format = "%Y/%m/%d %H:%M")
#[1] "2007/11/01 03:00" "2007/11/01 04:00" "2007/11/01 05:00" "2007/11/01 06:00"
Try
library(lubridate)
dat <- read.table(text="2007/11/0103
2007/11/0104
2007/11/0105
2007/11/0106",header=F,stringsAsFactors=F)
dat$V1 <- format(ymd_h(dat$V1),"%Y/%m/%d %H:%M")
dat
# V1
# 1 2007/11/01 03:00
# 2 2007/11/01 04:00
# 3 2007/11/01 05:00
# 4 2007/11/01 06:00
Suppose your dates are a vector named dates
library(stringr)
paste0(paste(str_sub(dates, end=10), str_sub(dates, 11)), ":00")
paste and substr are your friends here. Type ? before either name (for example ?substr) to see the documentation.
my.parser <- function(a){
  paste0(substr(a, 1, 10), ' ', substr(a, 11, 12), ':00') # paste0 is like paste but uses no separator
}
}
a<- '2007/11/0103'
my.parser(a) # = "2007/11/01 03:00"
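The same fixed-position split can also be written as a single regular expression substitution, which is vectorised over the whole column. This is a minimal sketch, assuming every value is exactly 12 characters long (10 for the date, 2 for the hour); values that do not match the pattern are returned unchanged.

```r
temp <- c("2007/11/0103", "2007/11/0104", "2007/11/0105", "2007/11/0106")

# Group 1 = first 10 characters (the date), group 2 = last 2 characters (the hour)
result <- sub("^(.{10})(.{2})$", "\\1 \\2:00", temp)
result
# [1] "2007/11/01 03:00" "2007/11/01 04:00" "2007/11/01 05:00" "2007/11/01 06:00"
```

Unlike the strptime route, this treats the values purely as text and does not check that they are valid dates.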

Splitting a factor at a space in R

I want to split x (which is a factor)
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"))
x
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
at the horizontal space and get two columns as:
x.date x.time
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
Any suggestion is appreciated!
strsplit is typically used here, but you can also use read.table:
read.table(text = as.character(dd$x))
# V1 V2
# 1 29-4-2014 06:00:00
# 2 9-4-2014 12:00:00
# 3 9-4-2014 00:00:00
# 4 6-5-2014 00:00:00
# 5 7-4-2014 00:00:00
# 6 29-5-2014 00:00:00
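For completeness, the strsplit route mentioned above could look like the following sketch. The column names `x.date` and `x.time` come from the question; the variable names `parts` and `out` are assumptions, and the code assumes every value contains exactly one space.

```r
dd <- data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00"))

# as.character() guards against x being a factor; fixed = TRUE splits on a literal space
parts <- do.call(rbind, strsplit(as.character(dd$x), " ", fixed = TRUE))

# Each row of parts holds the date in column 1 and the time in column 2
out <- data.frame(x.date = parts[, 1], x.time = parts[, 2], stringsAsFactors = FALSE)
out
```

Both columns remain character here; convert them afterwards if date or time arithmetic is needed.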
Other option (better)
# Convert to POSIXct objects
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T")
# You may also want to specify the time zone
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T", tz="GMT")
Then, to extract times
strftime(times, "%T")
[1] "06:00:00" "12:00:00" "00:00:00" "00:00:00" "00:00:00" "00:00:00"
or dates
strftime(times, "%D")
[1] "04/29/14" "04/09/14" "04/09/14" "05/06/14" "04/07/14" "05/29/14"
or, any format you want, really
strftime(times, "%d %b %Y at %T")
[1] "29 Apr 2014 at 06:00:00" "09 Apr 2014 at 12:00:00"
[3] "09 Apr 2014 at 00:00:00" "06 May 2014 at 00:00:00"
[5] "07 Apr 2014 at 00:00:00" "29 May 2014 at 00:00:00"
See, for more info: ?as.POSIXct and ?strftime
Here is another approach using lubridate:
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"),
stringsAsFactors = FALSE)
Note the use of stringsAsFactors = FALSE, which prevents your dates from being read as factors.
library(lubridate)
dd2 <- transform(dd,x2 = dmy_hms(x))
transform(dd2, the_year = year(x2))
x x2 the_year
1 29-4-2014 06:00:00 2014-04-29 06:00:00 2014
2 9-4-2014 12:00:00 2014-04-09 12:00:00 2014
3 9-4-2014 00:00:00 2014-04-09 00:00:00 2014
4 6-5-2014 00:00:00 2014-05-06 00:00:00 2014
5 7-4-2014 00:00:00 2014-04-07 00:00:00 2014
6 29-5-2014 00:00:00 2014-05-29 00:00:00 2014

obtain hour from DateTime vector

I have a DateTime vector within a data.frame where the data frame is made up of 8760 observations representing hourly intervals throughout the year e.g.
2010-01-01 00:00
2010-01-01 01:00
2010-01-01 02:00
2010-01-01 03:00
and so on.
I would like to create a data.frame which has the original DateTime vector as the first column and then the hourly values in the second column e.g.
2010-01-01 00:00 00:00
2010-01-01 01:00 01:00
How can this be achieved?
Use format or strptime to extract the time information.
Create a POSIXct vector:
x <- seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5)
Extract the time:
data.frame(
date=x,
time=format(x, "%H:%M")
)
date time
1 2012-05-21 00:00:00 00:00
2 2012-05-21 01:00:00 01:00
3 2012-05-21 02:00:00 02:00
4 2012-05-21 03:00:00 03:00
5 2012-05-21 04:00:00 04:00
If the input vector is a character vector, then you have to convert to POSIXct first:
Create some data
dat <- data.frame(
DateTime=format(seq(as.POSIXct("2012-05-21"), by=("+1 hour"), length.out=5), format="%Y-%m-%d %H:%M")
)
dat
DateTime
1 2012-05-21 00:00
2 2012-05-21 01:00
3 2012-05-21 02:00
4 2012-05-21 03:00
5 2012-05-21 04:00
Split time out:
data.frame(
DateTime=dat$DateTime,
time=format(as.POSIXct(dat$DateTime, format="%Y-%m-%d %H:%M"), format="%H:%M")
)
DateTime time
1 2012-05-21 00:00 00:00
2 2012-05-21 01:00 01:00
3 2012-05-21 02:00 02:00
4 2012-05-21 03:00 03:00
5 2012-05-21 04:00 04:00
Or generically, not treating them as dates, you can use the following provided that the time and dates are padded correctly.
library(stringr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
df <- data.frame(df, Time = str_sub(df$DateTime, -5, -1))
It depends on your needs really.
Using lubridate
library(stringr)
library(lubridate)
library(plyr)
df <- data.frame(DateTime = c("2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00", "2010-01-01 03:00"))
df <- mutate(df, DateTime = ymd_hm(DateTime),
             # pad on the left so that 5 becomes "05", not "50"
             time = str_c(str_pad(hour(DateTime), 2, side = 'left', pad = '0'),
                          str_pad(minute(DateTime), 2, side = 'left', pad = '0'), sep = ':'))
On a more general note, for anyone who comes here from Google and wants to group by hour:
The key here is lubridate::hour(datetime).
See p. 22 of the CRAN manual: https://cran.r-project.org/web/packages/lubridate/lubridate.pdf
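Grouping by hour can be sketched in base R as well, using format(x, "%H") as a rough equivalent of lubridate::hour. The data below are made up for illustration, and the column names `value` and `hour` are assumptions.

```r
# Six made-up readings, one every 30 minutes starting at midnight UTC
timestamps <- as.POSIXct("2010-01-01 00:00", tz = "UTC") + 1800 * (0:5)
readings <- data.frame(DateTime = timestamps, value = c(1, 3, 5, 7, 9, 11))

# Extract the hour of day as an integer (base R stand-in for lubridate::hour)
readings$hour <- as.integer(format(readings$DateTime, "%H"))

# Average the readings within each hour
hourly <- aggregate(value ~ hour, data = readings, FUN = mean)
hourly
```

With lubridate loaded, the hour column could instead be computed with hour(readings$DateTime); the aggregation step stays the same.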
