I am not getting the right result when I try to convert 12-hour times to 24-hour times. My script (with sample data) is below.
library(lubridate)
library(data.table)
# Read Sample Data
df <- data.frame(c("April 22 2016 10:49:15 AM","April 22 2016 10:01:21 AM","April 22 2016 09:06:40 AM","April 21 2016 09:50:49 PM","April 21 2016 06:07:18 PM"))
colnames(df) <- c("Date") # Set Column name
dt <- setDT(df) # Convert to data.table
ff <- function(x) as.POSIXlt(strptime(x,"%B %d %Y %H:%M:%S %p"))
dt[,dates := as.Date(ff(Date))]
When I try creating a new variable called TOD, I get the output in H:M:S format, but it is not converted to 24-hour format. For example, for the 4th row I get 09:50:49 instead of 21:50:49. I tried two different ways to do this: one uses as.ITime from data.table and the other uses strptime. The code I use to calculate TOD is below.
dt[,TOD1 := as.ITime(ff(Date))]
dt$TOD2 <- format(strptime(dt$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S")
I thought of trying it using dataframe instead of data.table to eliminate any issues with using strptime in data.table and still got the same answer.
df$TOD <- format(strptime(df$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S") # Using dataframe instead of data.table
Any insights on how to get the right answer?
As @lmo commented, you need to use the %I conversion specification instead of %H. From ?strptime:
%H
Hours as decimal number (00–23). As a special exception strings
such as 24:00:00 are accepted for input, since ISO 8601 allows these.
%I
Hours as decimal number (01–12).
strptime("April 21 2016 09:50:49 PM", "%B %d %Y %I:%M:%S %p")
# [1] "2016-04-21 21:50:49 EDT"
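Applied to the question's data, the fix might look like the sketch below. Note that the question also formats the result with %I, which prints a 12-hour clock, so the output format needs %H as well. The sketch assumes English month names and AM/PM markers (it switches LC_TIME to "C" for the demo) and uses tz = "UTC" only so the result does not depend on the session time zone; neither choice comes from the original post.

```r
# Assumption: English month names and AM/PM markers; use the C locale for the demo.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("April 22 2016 09:06:40 AM", "April 21 2016 09:50:49 PM")

# %I reads the hour on a 12-hour clock, so the %p marker (AM/PM) is honoured.
parsed <- strptime(x, "%B %d %Y %I:%M:%S %p", tz = "UTC")

# %H writes the hour on a 24-hour clock.
tod <- format(parsed, "%H:%M:%S")
print(tod)

# Restore the original locale.
invisible(Sys.setlocale("LC_TIME", old_locale))
```

The same two format strings should carry over to the data.table call, for example inside the ff() helper from the question.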
Here you go:
library(lubridate)
df$Date <- mdy_hms(df$Date)
Note that while mdy_hms is extremely convenient and takes care of the 12 / 24 hour time for you, it will automatically assign UTC as a time zone. You can specify a different one if you need. You can then convert df to a data.table if you like.
I'm trying to convert a character column with dates to date format. However, the dates are in an ambiguous format. Some entries are of the format %d.%m.%Y (e.g., "03.02.2021"), while others are %d %b %Y (e.g., "3 Feb 2021").
I've tried as.Date(tryFormats=c("%d %b %Y", "%d.%m.%Y")), but realized that tryFormats is only flexible for the first entry, so that the entries of type %d %b %Y are correctly identified but those of %d.%m.%Y become NAs, or vice versa. I've also tried the anytime package, but that produced NAs in a similar fashion.
I've made sure that the column doesn't contain any NAs or empty strings, and I don't receive any error message.
Try the parsedate package:
df <-read.table(header=TRUE,text=
"d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")
library(dplyr) # provides %>% and mutate()
df %>% mutate(date=parsedate::parse_date(d))
## d date
##1 03.02.2021 2021-02-03
##2 3 Feb 2021 2021-02-03
##3 13/3/2021 2021-03-13
##4 13-3-2020 2020-03-13
Similar (but expanded) to Roland's suggestion, my answer here (in the (2) section) suggests a way to deal with multiple candidate formats.
## sample data
x <- c("03.02.2021", "3 Feb 2021")
formats <- c("%d.%m.%Y", "%d %b %Y")
dates <- as.Date(rep(NA, length(x)))
for (fmt in formats) {
nas <- is.na(dates)
dates[nas] <- as.Date(x[nas], format=fmt)
}
dates
# [1] "2021-02-03" "2021-02-03"
It is better to have the most-frequent format first in the formats vector. One could add a quick-escape to the loop if there are many formats, such as
for (fmt in formats) {
nas <- is.na(dates)
if (!any(nas)) break
dates[nas] <- as.Date(x[nas], format=fmt)
}
but I suspect that it really won't be very beneficial unless both formats and x are rather large (I have no sizes in mind to quantify "large").
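The loop can also be wrapped in a small reusable function. This is only a sketch: parse_multi is a hypothetical name, the third input string is made up to show what happens when no format matches, and the locale switch assumes English month abbreviations.

```r
# Hypothetical helper: try each candidate format on the entries that are still NA.
parse_multi <- function(x, formats) {
  dates <- as.Date(rep(NA, length(x)))
  for (fmt in formats) {
    nas <- is.na(dates)
    if (!any(nas)) break            # everything parsed, stop early
    dates[nas] <- as.Date(x[nas], format = fmt)
  }
  dates
}

# Assumption: "Feb" is an English abbreviation, so use the C locale for the demo.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("03.02.2021", "3 Feb 2021", "not a date")
res <- parse_multi(x, c("%d.%m.%Y", "%d %b %Y"))
print(res)

invisible(Sys.setlocale("LC_TIME", old_locale))
```

Entries that match none of the formats stay NA. If two formats could both match the same string (say %d.%m.%Y and %m.%d.%Y), the one listed first wins, so the order of formats matters for correctness as well as speed.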
Did you try lubridate?
df <-read.table(header=TRUE,text=
"d
03.02.2021
'3 Feb 2021'
13/3/2021
13-3-2020")
library(lubridate)
dmy(df$d)
[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"
Using anydate
library(anytime)
addFormats(c("%d/%m/%Y", "%d-%m-%Y"))
anydate(df$d)
[1] "2021-02-03" "2021-02-03" "2021-03-13" "2020-03-13"
I merged several dataframes together. However, the dates are now no longer in chronological order (see photo). How do I order the dataframe based on the values of the 'Date' column?
R dataframe output which I want to change
I first tried to set the 'Date' column as index, but since the 'Date' column does not only have unique values, I can't.
Whenever I do:
new_df <- new_df[order(new_df$Date),]
it only sorts the dates based on their first value.
In addition, the 'Date' column sometimes contains several identical values. How can I make the index the same whenever the 'Date' column has exactly the same value?
The ordering should be based on the column converted to Date class
new_df$Date1 <- as.Date(new_df$Date, "%A, %d %b %Y, %H:%M")
If we want to keep the time part as well in ordering, use as.POSIXct
new_df$Date1 <- as.POSIXct(new_df$Date,format = "%A, %d %b %Y, %H:%M")
and then do
new_df <- new_df[order(new_df$Date1),]
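To see why ordering the raw strings goes wrong, here is a small sketch with made-up rows (only the date format follows the lines above). It assumes English weekday and month names, and tz = "UTC" is an arbitrary choice for reproducibility.

```r
# Assumption: English weekday/month names; use the C locale for the demo.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Made-up data in the same string format as the question.
new_df <- data.frame(
  Date = c("Saturday, 12 Apr 2014, 18:00",
           "Monday, 10 Feb 2014, 12:00",
           "Wednesday, 01 Jan 2014, 09:30"),
  Income = c(300, 200, 100),
  stringsAsFactors = FALSE
)

# Ordering the character column is alphabetical: Monday, Saturday, Wednesday.
by_text <- new_df$Income[order(new_df$Date)]

# Ordering the parsed date-times is chronological: Jan, Feb, Apr.
new_df$Date1 <- as.POSIXct(new_df$Date, format = "%A, %d %b %Y, %H:%M", tz = "UTC")
by_time <- new_df$Income[order(new_df$Date1)]

print(by_text)
print(by_time)

invisible(Sys.setlocale("LC_TIME", old_locale))
```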
If we want to create a time series object, use xts
library(xts)
xts(new_df["Income"], order.by = new_df$Date1)
As a reproducible example
> str1 <- "Saturday, 12 Apr 2014, 18:00"
> as.Date(str1, "%A, %d %b %Y, %H:%M")
[1] "2014-04-12"
> as.POSIXct(str1, format = "%A, %d %b %Y, %H:%M")
[1] "2014-04-12 18:00:00 EDT"
I need to calculate the time difference in minutes, hours, days, etc. between two date-time columns in two dataframes. The details are below.
df1 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Mon Apr 1 14:23:09 1980", "Sun Jun 12 12:10:21 1975", "Fri Jan 5 18:45:10 1985", "Thu Feb 19 02:26:19 1990"))
df2 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Apr-01-1980 14:28:00","Jun-12-1975 12:45:10","Jan-05-1985 17:50:30","Feb-19-1990 02:28:00"))
I am having trouble converting df1$timestamp and df2$timestamp. as.POSIXct and as.Date are not working; I get the error "non-numeric argument to binary operator".
I need to calculate the time difference in minutes, hours or days.
One approach is strptime and indicate the appropriate directives in the datetime format:
df1$timestamp2 <- strptime(df1$timestamp, "%a %b %d %H:%M:%S %Y")
df2$timestamp2 <- strptime(df2$timestamp, "%b-%d-%Y %H:%M:%S")
In this case, you have:
%a abbreviated weekday name
%b abbreviated month name
%d day of the month
%H hour, 24-hour clock
%M minute
%S second
%Y year including century
Then you can use difftime to get the difference, and specify the units (in this case, difference expressed in hours):
difftime(df1$timestamp2, df2$timestamp2, units = "hours")
Output
Time differences in hours
[1] -0.08083333 -0.58027778 0.91111111 -0.02805556
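Once a difftime exists, it can be re-expressed in other units without parsing again. The sketch below uses the first pair of timestamps from the question, written in ISO format so that no locale-dependent month names are involved; tz = "UTC" is an assumption made for reproducibility.

```r
# First pair of timestamps from the question, rewritten in ISO format.
start <- as.POSIXct("1980-04-01 14:23:09", tz = "UTC")
end   <- as.POSIXct("1980-04-01 14:28:00", tz = "UTC")

# Ask for minutes up front ...
d <- difftime(end, start, units = "mins")
in_mins <- as.numeric(d)

# ... or convert an existing difftime to another unit when extracting the number.
in_secs  <- as.numeric(d, units = "secs")
in_hours <- as.numeric(d, units = "hours")

# units<- changes the unit stored on the object itself.
units(d) <- "days"
in_days <- as.numeric(d)

print(c(secs = in_secs, mins = in_mins, hours = in_hours, days = in_days))
```

Extracting a plain number with as.numeric(d, units = ...) avoids surprises when the default "auto" unit differs between rows or data sets.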
If your locale settings prevent the weekday and month names from being read correctly, try:
# Store current locale
orig_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
# Convert to posix-timestamp
df1$timestamp <- as.POSIXct( df1$timestamp, format = "%a %b %d %H:%M:%S %Y")
df2$timestamp <- as.POSIXct( df2$timestamp, format = "%b-%d-%Y %H:%M:%S")
# Restore locale
Sys.setlocale("LC_TIME", orig_locale)
# Calculate difference
df2$timestamp - df1$timestamp
# Time differences in mins
# [1] 4.850000 34.816667 -54.666667 1.683333
I am trying to parse a column into two variables, "date" and "time" in R. I have installed the lubridate library.
The current csv file has the following timestamp format: yyyyMMdd hh:mm a (e.g. '20170423 12:26 AM') and imports the column as character.
I'm trying this, but it's not working on my current variable 'Tran_Date' (the code below doesn't work):
transactions_file <- as_date('Tran_Date', "%Y%m%d %H:%M %p")
I like a base R solution like this:
Tran_Date <- as.POSIXct("20170423 12:26 AM", format = "%Y%m%d %I:%M %p")
Tran_Date
#> [1] "2017-04-23 00:26:00 CEST"
transactions_file <- data.frame(
date = format(Tran_Date,"%m/%d/%Y"),
time = format(Tran_Date,"%H:%M")) # possibly add %p if you use %I
transactions_file
#> date time
#> 1 04/23/2017 00:26
With lubridate:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(lubridate)
Tran_Date <- ymd_hm("20170423 12:26 AM")
You could then reuse the format() calls above, or build the two columns yourself by pasting together lubridate's accessors, for example day(Tran_Date) and month(Tran_Date) for the date and paste(hour(Tran_Date), minute(Tran_Date), sep = ":") for the time. There is most likely a smarter way.
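For a whole column rather than a single string, the base R approach might be sketched as follows. The second timestamp is made up, tz = "UTC" is an arbitrary choice so the split does not depend on the session time zone, and the C locale is set on the assumption that the AM/PM markers are English.

```r
# Assumption: English AM/PM markers; use the C locale for the demo.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Stand-in for the imported CSV; the second row is invented.
transactions_file <- data.frame(
  Tran_Date = c("20170423 12:26 AM", "20170423 01:15 PM"),
  stringsAsFactors = FALSE
)

# Parse the character column once, reading a 12-hour clock with %I and %p.
parsed <- as.POSIXct(transactions_file$Tran_Date,
                     format = "%Y%m%d %I:%M %p", tz = "UTC")

# Split into a Date column and a 24-hour time-of-day string.
transactions_file$date <- as.Date(parsed, tz = "UTC")
transactions_file$time <- format(parsed, "%H:%M")
print(transactions_file)

invisible(Sys.setlocale("LC_TIME", old_locale))
```

Keeping date as a real Date (rather than a formatted string) means it still sorts and subtracts correctly later on.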
I have a data frame where the date format is as follows:
1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)
I want to extract three variables day, date and time in three different columns and add it to the data frame
Day as Tue
Date as 12/08/2014
Time as 7:25:24PM
The first two numbers do not mean anything.
The dataframe consists of over 700,000 rows, and I want the new columns to replace the existing ones.
You should be careful about adding the datetime to your data.frame as 3 separate columns, because your 3 columns do not uniquely identify a specific datetime because you do not account for timezone. This shouldn't be a problem if all your datetimes are in the same timezone though.
s <- '1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)'
# If the first two numbers do not mean anything and are always separated by a
# colon, then we can remove them with the following gsub command:
s <- gsub("^[[:digit:]:]+","",s)
# Now we can convert the string to a POSIXlt object, assuming they all follow
# the format of including "GMT" before the signed timezone offset
p <- strptime(s, "%a %b %d %Y %H:%M:%S GMT%z")
The above will work even if your datetimes have different timezone offsets. For example:
# these times are the same, just in a different timezone (the second is made up)
s <- c('1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)',
'9:1:Tue Aug 12 2014 19:55:24 GMT+0600 (WAT)')
s <- gsub("^[[:digit:]:]+","",s)
# parse into UTC explicitly so the printed result does not depend on the session time zone
p <- strptime(s, "%a %b %d %Y %H:%M:%S GMT%z", tz="UTC")
# the times are the same
as.POSIXct(p)
# [1] "2014-08-12 13:55:24 UTC" "2014-08-12 13:55:24 UTC"
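Formatting the datetimes into the strings you want is easy; just use the format specifications in ?strptime. Keep in mind that when %z is parsed, strptime converts the time to the zone given by tz (the session time zone by default), so the formatted strings reflect that zone rather than the wall-clock time written in the input.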
data.frame(Day=format(p, "%a"), Date=format(p, "%d/%m/%Y"),
Time=format(p, "%I:%M:%S%p"), stringsAsFactors=FALSE)
This was a tough one. R doesn't have the best support for string and date/time functions. But I was able to get it to work with some hacks:
str <- '1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)';
fieldsBad <- strsplit(str,':')[[1]];
fields <- c(fieldsBad[1:2],paste0(fieldsBad[3:length(fieldsBad)],collapse=':'));
dt <- strptime(fields[3],'%a %b %d %Y %H:%M:%S');
df <- data.frame();
df[1,'Day'] <- strftime(dt,'%a');
df[1,'Date'] <- strftime(dt,'%d/%m/%Y');
df[1,'Time'] <- gsub('^0','',strftime(dt,'%I:%M:%S%p'));
df;
Shows:
Day Date Time
1 Tue 12/08/2014 7:25:24PM
Explanation of hacks:
Unfortunately, the strsplit() function does not allow specifying a maximum number of fields to produce, unlike (for example) http://perldoc.perl.org/functions/split.html in Perl, which has a LIMIT argument, which would be perfect here. So I had to sort of "over-split" and then paste the extra fields back together again on colon with paste0().
Also, the strptime() call ignores the time zone information, although fortunately still parses all it can from the input string. I tried passing the time zone information explicitly to the tz= argument, but it wouldn't recognize IST or GMT+0530 or anything I tried. But since you don't seem to require the time zone, we're ok.
Finally, no format specifier for strftime() seems to allow specifying the 12-hour time without a leading zero, so I had to use %I and call gsub() to strip it off, if present.
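As an alternative to over-splitting and pasting, the prefix can be removed with a single sub() call, which also works on a whole column at once. This is a sketch under stated assumptions: the prefix is always two colon-terminated integers, the second input row is made up, English weekday and month names are used, and tz = "UTC" serves only as a label so that the wall-clock time in the string is kept as written (the +0530 offset is still ignored, as in the code above).

```r
# Assumption: English weekday/month names; use the C locale for the demo.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

str <- c("1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)",
         "12:34:Wed Aug 13 2014 09:05:00 GMT+0530 (IST)")  # second row is invented

# Drop the two leading numbers and their colons in one step.
stamp <- sub("^[0-9]+:[0-9]+:", "", str)

# strptime stops once the format is exhausted, so the trailing zone text is ignored.
dt <- strptime(stamp, "%a %b %d %Y %H:%M:%S", tz = "UTC")

out <- data.frame(
  Day  = format(dt, "%a"),
  Date = format(dt, "%d/%m/%Y"),
  Time = sub("^0", "", format(dt, "%I:%M:%S%p")),  # strip the leading zero of the hour
  stringsAsFactors = FALSE
)
print(out)

invisible(Sys.setlocale("LC_TIME", old_locale))
```

Because every step is vectorised, the same lines should scale to the 700,000 rows mentioned in the question without an explicit loop.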
library(lubridate)
library(stringr)
d <- "1:9:Tue Aug 12 2014 19:25:24 GMT+0530 (IST)"
d <- gsub("^[[:alnum:]:]+ ", "", d)
tz <- gsub("[ +-]", "", str_extract(d, " ([[:upper:]]+)[+-]"))
strptime(d, "%b %d %Y %H:%M:%S", tz=tz)
## [1] "2014-08-12 19:25:24 GMT"
You'll probably need to mapply that in a data frame context, since strptime takes a single value for tz. So, do something like:
dat$parsed <- mapply(as.POSIXct,
gsub("^[[:alnum:]:]+ ", "", dat$date),
format="%b %d %Y %H:%M:%S",
tz=gsub("[ +-]", "", str_extract(dat$date, " ([[:upper:]]+)[+-]")))
(that'll make dat$parsed numeric, but that's what POSIXct converts it to, so it's easy to work with)
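If a date-time column is wanted rather than bare numbers, the class can be put back afterwards. The sketch below uses made-up ISO strings and the zone names "GMT" and "UTC" purely for illustration; displaying the result in "UTC" is likewise an arbitrary choice.

```r
# Made-up inputs: one timestamp string and one zone name per row.
stamps <- c("2014-08-12 19:25:24", "2014-08-12 13:55:24")
zones  <- c("GMT", "UTC")

# mapply() simplifies its result to a plain numeric vector, dropping the POSIXct class.
secs <- unname(mapply(as.POSIXct, stamps, tz = zones))

# The numbers are seconds since 1970-01-01 00:00:00 UTC, so the class can be restored.
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(format(restored, "%Y-%m-%d %H:%M:%S"))
```

A POSIXct vector carries a single time zone attribute, so the restored column is shown in one zone even though each row was parsed with its own.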
I really don't know how to do it in R, but if you get this string from JavaScript, you can do something like this:
var date = new Date('Tue Aug 12 2014 19:25:24 GMT+0530 (IST)');
console.log(date.getTime());
console.log(date.getTimezoneOffset());
The getTime method returns the Unix timestamp in milliseconds, and getTimezoneOffset returns the time zone offset in minutes. You can then parse those values using the date functions in R; I hope they are implemented there.