Converting date into timestamp in R using strptime() function - r

I've read a txt file into R as a CSV. As I understand, R will not recognise the strings as timestamps automatically, so I’ll need to convert them from text values using the strptime() function.
Here's a sample of my text file:
29/1/12 19:48
30/1/12 21:07
2/2/12 15:53
3/4/12 0:49
5/10/12 2:00
24/10/12 17:11
14/11/12 3:49
11/8/13 16:00
12/7/14 17:00
31/7/14 8:08
31/7/14 10:48
6/8/14 9:24
16/12/14 3:34
24/1/15 19:37
16/6/15 15:55
16/6/15 19:56
18/6/15 1:24
25/6/15 17:20
26/6/15 18:28
1/7/15 15:58
1/7/15 18:05
2/7/15 18:20
2/7/15 18:59
I have tried:
y <- strptime(timestamp, "%d/%m/%y %H:%M")
but keep on getting NA.
Can someone help me to solve this? Thank you.

# Make some sample data
timestamp <- c('29/1/12 19:48','30/1/12 21:07','2/2/12 15:53')
# Check that you are dealing with characters, not factors
str(timestamp)
#chr [1:3] "29/1/12 19:48" "30/1/12 21:07" "2/2/12 15:53"
# Best to specify the time zone when using strptime
strptime(timestamp, "%d/%m/%y %H:%M", tz = 'UTC')
#[1] "2012-01-29 19:48:00 UTC" "2012-01-30 21:07:00 UTC" "2012-02-02 15:53:00 UTC"
# The lubridate package is very useful for working with dates/times
library(lubridate)
lubridate::dmy_hm(timestamp)
#[1] "2012-01-29 19:48:00 UTC" "2012-01-30 21:07:00 UTC" "2012-02-02 15:53:00 UTC"
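To tie the answer back to a file on disk, here is a minimal sketch of reading the column and converting it in one pass. It assumes the file has a single column and no header row; the inline text stands in for the real file path, and the column names timestamp and parsed are only illustrative.

```r
# Stand-in for the real file: in practice, pass the file path instead of `text =`.
raw <- "29/1/12 19:48
30/1/12 21:07
2/2/12 15:53"

# Keep the column as character (not factor) and name it explicitly.
df <- read.csv(text = raw, header = FALSE, col.names = "timestamp",
               stringsAsFactors = FALSE)

# as.POSIXct() accepts the same format string as strptime() and returns a
# type that is convenient to store in a data frame column.
df$parsed <- as.POSIXct(df$timestamp, format = "%d/%m/%y %H:%M", tz = "UTC")

# Any row listed here did not match the format string.
df[is.na(df$parsed), ]
```

If rows do show up in that last call, inspecting them usually reveals a stray header line, extra whitespace, or a different date layout.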

Related

R converting datetime format in foreign language not working

I am trying to use the parsedate package to convert several different datetime formats into one uniform format. The issue is that some dates are in English (my machine's language) and some are in Spanish. Allow me to illustrate:
I have two vectors:
#English dates
dates<-c("2016 jun 15 8:39 p.m","2016 apr 2 8:39 a.m","2016 dec 2 8:39 a.m")
#Spanish dates
fechas<-c("2016 junio 15 8:39 p.m","2016 abril 2 8:39 a.m","2016 diciembre 2 8:39 a.m")
I noticed that parse_date() correctly converts the vector dates into the desired output format, but parsing the vector of Spanish dates does not work, even after changing the locale to "Spanish", as shown below:
#Parsing english dates
parsedate::parse_date(dates)
> parsedate::parse_date(dates)
[1] "2016-06-15 08:39:00 UTC" "2016-04-02 08:39:00 UTC" "2016-12-02 08:39:00 UTC"
#Parsing spanish dates
Sys.setlocale("LC_TIME", "Spanish")
parsedate::parse_date(fechas)
> Sys.setlocale("LC_TIME", "Spanish")
[1] "Spanish_Spain.1252"
> parsedate::parse_date(fechas)
[1] "2016-01-15 08:39:00 UTC" "2016-01-02 08:39:00 UTC" "2016-01-02 08:39:00 UTC"
The Spanish output is wrong: it should match the output for the English dates. I have tried several ways to change my machine's locale to Spanish, with no luck.
I will be very thankful if you can help me.
See here https://github.com/tidyverse/lubridate/issues/781
Sys.setlocale("LC_TIME", "Spanish_Spain.1252")
format <- "%a#%A#%b#%B#%p#"
enc2utf8(unique(format(lubridate:::.date_template, format = format)))
str(lubridate:::.get_locale_regs("Spanish_Spain.1252"))
library(lubridate)
Sys.getlocale("LC_TIME")
[1] "Spanish_Spain.1252"
parse_date_time(fechas, 'ymd HM')
[1] "2016-06-15 08:39:00 UTC" "2016-04-02 08:39:00 UTC" "2016-12-02 08:39:00 UTC"
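If changing the locale is not an option, a locale-independent fallback is to translate the month names by hand. The following is only a sketch in base R, assuming every string has exactly the five space-separated parts shown in the question (year, Spanish month name, day, H:MM, a.m/p.m). The names meses, partes and resultado are illustrative. Unlike the outputs shown above, it also applies the a.m/p.m marker, so the first element comes out as 20:39 rather than 08:39.

```r
fechas <- c("2016 junio 15 8:39 p.m", "2016 abril 2 8:39 a.m",
            "2016 diciembre 2 8:39 a.m")

# Spanish month names in calendar order, so match() yields the month number.
meses <- c("enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
           "agosto", "septiembre", "octubre", "noviembre", "diciembre")

# Each string splits into: year, month name, day, "H:MM", "a.m" or "p.m".
partes <- do.call(rbind, strsplit(fechas, " ", fixed = TRUE))
hm     <- do.call(rbind, strsplit(partes[, 4], ":", fixed = TRUE))

hora  <- as.integer(hm[, 1])
es_pm <- startsWith(tolower(partes[, 5]), "p")
# 12-hour to 24-hour clock: 12 a.m -> 0, 12 p.m -> 12, otherwise add 12 for p.m.
hora24 <- hora %% 12 + ifelse(es_pm, 12, 0)

resultado <- ISOdatetime(
  year  = as.integer(partes[, 1]),
  month = match(tolower(partes[, 2]), meses),
  day   = as.integer(partes[, 3]),
  hour  = hora24,
  min   = as.integer(hm[, 2]),
  sec   = 0,
  tz    = "UTC"
)
resultado
```

A month name that is not in meses yields NA from match(), which makes unexpected spellings easy to spot.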

anytime function (anytime package) to capture days and month correctly

I have the following character string, "07.01.2009 22:40:00", where 07 is the day and 01 the month. When I call anytime("07.01.2009 22:40:00") it returns "2009-07-01 22:40:00 -03", as if 07 were the month. Is there any way to tell this function that 07 is the day and not the month?
Another option is strptime from base R
strptime('07.01.2009 22:40:00', format = '%d.%m.%Y %T', tz = 'UTC')
#[1] "2009-01-07 22:40:00 UTC"
I'm not sure whether you can specify formats in anytime, but here are two alternatives:
Using lubridate:
lubridate::dmy_hms('07.01.2009 22:40:00')
#[1] "2009-01-07 22:40:00 UTC"
Using base R:
as.POSIXct('07.01.2009 22:40:00', format = '%d.%m.%Y %T', tz = 'UTC')
#[1] "2009-01-07 22:40:00 UTC"

Date and Time Issue at 24 o'clock in data set returns NA after POSIXct command

I have time-series data at 10-minute intervals. When I try to convert it to a date-time type, the result is NA at the 24 o'clock (midnight) interval. I have tried both df$Time1 <- dmy_hm(df$Time, tz="Asia/Calcutta") and df$Time1 = as.POSIXct(df$Time, format="%d-%m-%y %H:%M"). Please guide me on this; I have no idea what is happening at 02-07-16 00:00.
One option would be using parse_date_time from lubridate which can take multiple formats
library(lubridate)
parse_date_time(df$Time, c('dmy_HM', 'dmy'))
#[1] "2016-07-01 23:30:00 UTC" "2016-07-01 23:40:00 UTC"
#[3] "2016-07-01 23:50:00 UTC" "2016-07-02 00:00:00 UTC"
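A base R alternative, sketched under the assumption that the midnight rows simply lack the time part (as in the sample data for this answer): append "00:00" to any value without a colon, then parse everything with a single format. The vector name time_chr is illustrative, and UTC is used here only to match the output above.

```r
time_chr <- c("01-07-16 23:30", "01-07-16 23:40", "01-07-16 23:50", "02-07-16")

# Rows without a ":" carry a date only; treat them as midnight.
no_time <- !grepl(":", time_chr, fixed = TRUE)
time_chr[no_time] <- paste(time_chr[no_time], "00:00")

# Now one format string covers every row.
parsed <- as.POSIXct(time_chr, format = "%d-%m-%y %H:%M", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M")
```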
data
df <- data.frame(Time = c("01-07-16 23:30", "01-07-16 23:40", "01-07-16 23:50",
"02-07-16"))

R read_excel: How to get the correct timestamps [duplicate]

My actual data looks like:
8/8/2013 15:10
7/26/2013 10:30
7/11/2013 14:20
3/28/2013 16:15
3/18/2013 15:50
When I read this from the excel file, R reads it as:
41494.63
41481.44
41466.60
41361.68
41351.66
So I used as.POSIXct(as.numeric(x[1:5])*86400, origin="1899-12-30",tz="GMT") and I got:
2013-08-08 15:07:12 GMT
2013-07-26 10:33:36 GMT
2013-07-11 14:24:00 GMT
2013-03-28 16:19:12 GMT
2013-03-18 15:50:24 GMT
Why is there a difference in the times? How can I overcome it?
The problem is that either R or Excel is rounding the number to two decimals. When you convert, for example, the cell containing 8/8/2013 15:10 to text formatting (in Excel on Mac OS X), you get the number 41494.63194.
When you use:
as.POSIXct(41494.63194*86400, origin="1899-12-30",tz="GMT")
it will give you:
[1] "2013-08-08 15:09:59 GMT"
This is 1 second off from the original date (which is also an indication that 41494.63194 is rounded to five decimals).
Probably the best solution is to export your Excel file to a .csv or a tab-separated .txt file and then read that into R. This at least gives me the correct dates:
> df
datum
1 8/8/2013 15:10
2 7/26/2013 10:30
3 7/11/2013 14:20
4 3/28/2013 16:15
5 3/18/2013 15:50
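The rounding explanation above can be checked with a little arithmetic. This is a sketch rather than a general fix: it assumes the true times fall on whole minutes, so rounding the serial number to the nearest minute before converting removes the residual error in the five-decimal value. The variable names are illustrative.

```r
origin <- "1899-12-30"

# Serial number as R displayed it (two decimals) versus the five-decimal value.
serial_2dp <- 41494.63
serial_5dp <- 41494.63194

# 0.01 day is 864 seconds, so two decimals can be off by several minutes.
0.01 * 86400
# Gap between the two values, in seconds (about 168 s, i.e. 15:07:12 vs 15:10).
(serial_5dp - serial_2dp) * 86400

# Round to whole minutes, then convert minutes to seconds.
minutes <- round(serial_5dp * 1440)
fixed <- as.POSIXct(minutes * 60, origin = origin, tz = "GMT")
format(fixed, "%Y-%m-%d %H:%M:%S")
```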
Given
x <- c("8/8/2013 15:10","7/26/2013 10:30","7/11/2013 14:20","3/28/2013 16:15","3/18/2013 15:50")
(which is read as a character vector),
try
x <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")
It reads correctly as a POSIXct vector to me.
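Maybe it is a matter of how R reads the data. Here is an example with lubridate that works well. Note that the data is month/day/year, so mdy_hm() is the matching parser (dmy_hm() would only work by coincidence for 8/8).
x <- "8/8/2013 15:10"
library(lubridate)
mdy_hm(x, tz = "GMT")
[1] "2013-08-08 15:10:00 GMT"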
This is how it works over here on a Windows system. This is what a source Excel 2010 file looks like:
date              num         secs         constant    Rtime
(mm/dd/yyyy)      (in Excel)  (num*86400)  (Windows)   (secs-constant)
08/08/2013 15:10  41494.63    3585136200   2209161600  1375974600
07/26/2013 10:30  41481.44    3583996200   2209161600  1374834600
11/07/2013 14:20  41585.60    3592995600   2209161600  1383834000
03/28/2013 16:15  41361.68    3573648900   2209161600  1364487300
03/18/2013 15:50  41351.66    3572783400   2209161600  1363621800
Rtime <- c(1375974600,1374834600,1383834000,1364487300,1363621800)
as.POSIXct(Rtime,origin="1970-01-01",tz="GMT")
#[1] "2013-08-08 15:10:00 GMT" "2013-07-26 10:30:00 GMT"
#[3] "2013-11-07 14:20:00 GMT" "2013-03-28 16:15:00 GMT"
#[5] "2013-03-18 15:50:00 GMT"
Why this constant? Firstly, because Excel and Office in general are a mess when dealing with dates. Seriously, look over here: Why is 1899-12-30 the zero date in Access / SQL Server instead of 12/31?
2209161600 is the difference in seconds between the POSIXct start of 1970-01-01 and 1899-12-30, which is the 0 point in Excel on Windows.
dput(as.POSIXct(2209161600,origin="1899-12-30",tz="GMT"))
#structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
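As a cross-check of the constant, the offset can be derived from the two dates instead of being hard-coded. The helper name excel_to_posixct is hypothetical, and the sketch assumes the Windows Excel date system with 1899-12-30 as day 0.

```r
# Days between the Excel (Windows) zero date and the Unix epoch.
offset_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
offset_secs <- offset_days * 86400   # the constant used in the table above

# Hypothetical helper: Excel serial day number -> POSIXct.
excel_to_posixct <- function(serial, tz = "GMT") {
  as.POSIXct((serial - offset_days) * 86400, origin = "1970-01-01", tz = tz)
}

# The serial number equal to the offset maps to the Unix epoch itself.
excel_to_posixct(offset_days)
```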

Best way to deal with differing date data [duplicate]

I am trying to do a simple operation in R. After loading a table, I encountered a date column that combines several formats.
**Date**
1/28/14 6:43 PM
1/29/14 4:10 PM
1/30/14 12:09 PM
1/30/14 12:12 PM
02-03-14 19:49
02-03-14 20:03
02-05-14 14:33
I need to convert this to a format like 28-01-2014 18:43, i.e. %d-%m-%Y %H:%M
I tried this
tablename$Date <- as.Date(as.character(tablename$Date), "%d-%m-%y %h:%m")
but this fills the entire column with NA. Please help me get this right!
The lubridate package makes quick work of this:
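library(lubridate)
# `dates` is the mixed-format column from the question, as a character vector
dates <- as.character(tablename$Date)
d <- parse_date_time(dates, names(guess_formats(dates, c("mdy HM", "mdy IMp"))))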
d
## [1] "2014-01-28 18:43:00 UTC" "2014-01-29 16:10:00 UTC"
## [3] "2014-01-30 12:09:00 UTC" "2014-01-30 12:12:00 UTC"
## [5] "2014-02-03 19:49:00 UTC" "2014-02-03 20:03:00 UTC"
## [7] "2014-02-05 14:33:00 UTC"
# put in desired format
format(d, "%m-%d-%Y %H:%M:%S")
## [1] "01-28-2014 18:43:00" "01-29-2014 16:10:00" "01-30-2014 12:09:00"
## [4] "01-30-2014 12:12:00" "02-03-2014 19:49:00" "02-03-2014 20:03:00"
## [7] "02-05-2014 14:33:00"
You'll need to adjust the vector in guess_formats if you come across other format variations.
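For comparison, the same idea can be sketched in base R without lubridate: try each candidate format in turn and fill in only the entries that are still NA. The two format strings below are assumptions based on the sample rows in the question, and the final format() call produces the day-first layout the question asks for.

```r
# %p (AM/PM) is locale-dependent; the C locale recognises "AM" and "PM".
Sys.setlocale("LC_TIME", "C")

dates <- c("1/28/14 6:43 PM", "1/29/14 4:10 PM", "1/30/14 12:09 PM",
           "1/30/14 12:12 PM", "02-03-14 19:49", "02-03-14 20:03",
           "02-05-14 14:33")

# Candidate formats, in the order they should be tried.
formats <- c("%m/%d/%y %I:%M %p", "%m-%d-%y %H:%M")

# Start with all NA and fill in whatever each format manages to parse.
parsed <- as.POSIXct(rep(NA_real_, length(dates)),
                     origin = "1970-01-01", tz = "UTC")
for (fmt in formats) {
  todo <- is.na(parsed)
  parsed[todo] <- as.POSIXct(dates[todo], format = fmt, tz = "UTC")
}

# Day-month-year output, as requested in the question.
format(parsed, "%d-%m-%Y %H:%M")
```

Any entry still NA after the loop did not match either format, which is the cue to add another format string.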
