R - How to convert decimal to MM:SS.XXX

How do you convert SS.xxx (Seconds.Milliseconds) to MM:SS.xxx (Minutes:Seconds.Milliseconds) using R?
For example, my input is
time = 92.180
my desired output is
time = 01:32.180
All time fields have 3 decimal places.

One option is the lubridate package. Since you did not specify the output class, I have included a few possible outputs:
library(lubridate)
library(magrittr) # provides the %>% pipe used below
t <- 92.180
# your output string as character
lubridate::seconds(t) %>%
lubridate::as_datetime() %>%
format("%M:%OS3")
# output as period
lubridate::seconds(t) %>%
lubridate::as.period()
# output as duration
lubridate::seconds(t) %>%
lubridate::as.duration()
# output as difftime
lubridate::seconds(t) %>%
lubridate::as.difftime()
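If you only need the character string, a base R alternative is to do the arithmetic yourself. This is a minimal sketch rather than part of the lubridate approach above: format_mmss is a hypothetical helper name, and it assumes non-negative inputs measured in seconds. Working in whole milliseconds sidesteps the fact that %OS3 truncates fractional seconds instead of rounding them.

```r
# Hypothetical helper: format seconds (numeric) as "MM:SS.xxx".
# Assumes non-negative input; minutes are not wrapped into hours.
format_mmss <- function(secs) {
  ms_total <- round(secs * 1000)   # whole milliseconds, rounded once
  mins     <- ms_total %/% 60000   # full minutes
  rem_ms   <- ms_total %% 60000    # milliseconds left over
  sprintf("%02d:%06.3f", as.integer(mins), rem_ms / 1000)
}

format_mmss(92.180)
# [1] "01:32.180"
```

The function is vectorised, so it can be applied directly to a whole column of times.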

Related

How to run left join in dplyr transforming the key columns ( using lubridate function) on the fly

I have two databases where I need to combine columns based on 2 common Date columns, with condition that the DAY for those dates are the same.
"2020/01/01 20:30" MUST MATCH "2020/01/01 17:50"
All dates are in POSIXct format.
While I could use some pre-processing with string parsing or the like, I wanted to handle it via lubridate/dplyr like:
DB_New <- left_join(DB_A,DB_B, by=c((date(Date1) = date(Date2)))
Notice I am using the function "date" from lubridate to match the condition explained above. However, I am getting the error below:
DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)= date(DateTime)))
Error: unexpected '=' in "DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)="
Within by, we cannot do the conversion; it expects column names as strings. The conversion should be done before the left_join:
library(dplyr)
DB_FEB_2019_join %>%
  mutate(Saida_Real = as.Date(Saida_Real, format = "%Y/%m/%d %H:%M")) %>%
  left_join(Chuvas_BH %>%
              mutate(DateTime = as.Date(DateTime, format = "%Y/%m/%d %H:%M")),
            by = c(Saida_Real = "DateTime"))
With lubridate, as.Date can be replaced by ymd_hm (to parse the strings), followed by as.Date to convert the result to the Date class.
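As a self-contained illustration of the same idea, here is a small sketch with made-up data. The tables DB_A and DB_B, the columns a and b, and the helper column day are assumptions for the example only; the point is that the day is computed in mutate() first and the join then uses a plain column name.

```r
library(dplyr)
library(lubridate)

# Made-up example tables with POSIXct date-time columns
DB_A <- tibble(Date1 = ymd_hm(c("2020/01/01 20:30", "2020/01/02 08:00")),
               a = 1:2)
DB_B <- tibble(Date2 = ymd_hm(c("2020/01/01 17:50", "2020/01/03 09:15")),
               b = c("x", "y"))

# Derive a Date column on each side, then join on it by name
DB_New <- DB_A %>%
  mutate(day = as_date(Date1)) %>%
  left_join(DB_B %>% mutate(day = as_date(Date2)), by = "day")

DB_New
```

Only the first row of DB_A finds a partner here, because 2020/01/01 is the only day present in both tables; the second row is kept with NA in the columns coming from DB_B.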

Can't figure out how to change "X5.13.1996" to date class?

I have dates listed as "X5.13.1996", representing May 13th, 1996. The class for the date column is currently a character.
When using mdy from lubridate, it keeps populating NA. Is there a code I can use to get rid of the "X" to successfully use the code? Is there anything else I can do?
You can use substring(date_variable, 2) to drop the first character from the string.
substring("X5.13.1996", 2)
[1] "5.13.1996"
To convert a variable (i.e., column) in your data frame:
library(dplyr)
library(lubridate)
dates <- data.frame(
dt = c("X5.13.1996", "X11.15.2021")
)
dates %>%
mutate(converted = mdy(substring(dt, 2)))
or, without dplyr:
dates$converted <- mdy(substring(dates$dt, 2))
Output:
           dt  converted
1  X5.13.1996 1996-05-13
2 X11.15.2021 2021-11-15
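If you would rather not depend on lubridate, the same conversion can be sketched in base R. This is only an alternative under the assumption that every value starts with a single "X" and uses the month.day.year layout shown in the question.

```r
dt <- c("X5.13.1996", "X11.15.2021")

# Drop the leading "X", then parse month.day.year
converted <- as.Date(sub("^X", "", dt), format = "%m.%d.%Y")
converted
# [1] "1996-05-13" "2021-11-15"
```

Using sub("^X", "", ...) instead of substring() only removes the first character when it really is an "X", which is a little safer if some values arrive without the prefix.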

R: extract dates and numbers from PDF

I'm really struggling to extract the proper information from several thousand PDF files from the NTSB (some dates and numbers, to be specific). These PDFs don't need to be OCRed, and each report is almost identical in length and layout.
I need to extract the date and time of the accident (first page) and some other information, such as the pilot's age or flight experience. What I tried does the job for several files but does not work for every file, since the code I am using is poorly written.
# an example with a single file
library(pdftools)
library(readr)
# Download the file and read it row by row
file <- 'http://data.ntsb.gov/carol-repgen/api/Aviation/ReportMain/GenerateNewestReport/89789/pdf' # less than 100 kb
destfile <- paste0(getwd(),"/example.pdf")
download.file(file, destfile)
pdf <- pdf_text(destfile)
rows <-scan(textConnection(pdf),
what="character", sep = "\n")
# Extract the date of the accident based on the 'Date & Time' occurrence.
date <-rows[grep(pattern = 'Date & Time', x = rows, ignore.case = T, value = F)]
date <- strsplit(date, " ")
date[[1]][9] #this method is not desirable since the date will not be always in that position
# Pilot age
age <- rows[grep(pattern = 'Age', x = rows, ignore.case = F, value = F)]
age <- strsplit(age, split = ' ')
age <- age[[1]][length(age[[1]])] # again, I'm using the exact position in that list
age <- readr::parse_number(age) #
The main issue I got is when I am trying to extract the date and time of the accident. Is it possible to extract that exact piece of information by avoiding using a list as I did here?
I think the best approach to achieve what you want is to use regex.
In this case I use the stringr library. The main idea with regex is to find
the desired string pattern, which in this case is the date 'July 29, 2014, 11:15'.
Take into account that you'll have to check the date format for each PDF file.
library(pdftools)
library(readr)
library(stringr)
# Download the file and read it row by row
file <- 'http://data.ntsb.gov/carol-repgen/api/Aviation/ReportMain/GenerateNewestReport/89789/pdf' # less than 100 kb
destfile <- paste0(getwd(), "/example.pdf")
download.file(file, destfile)
pdf <- pdf_text(destfile)
## New code
# Regex pattern for date 'July 29, 2014, 11:15'
regex_pattern <- "[Tt]ime\\:(.*\\d{2}\\:\\d{2})"
# Getting date from page 1
grouped_matched <- str_match_all(pdf[1], regex_pattern)
# This returns a list with one matrix per input string: column 1 is the full match, column 2 is the first capture group
raw_date <- grouped_matched[[1]][1, 2] # First match (row 1), capture group (column 2)
# Clean date
date <- trimws(raw_date)
# Using dplyr
library(dplyr)
date <- pdf[1] %>%
str_match_all(regex_pattern) %>%
.[[1]] %>% # First list element
.[1, 2] %>% # First match (row 1), capture group (column 2)
trimws() # Remove extra white spaces
You can wrap this in a function and change the regex pattern for files that use a different format.
Regards
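Such a function could look roughly like the sketch below, written in base R so it runs without the PDF download. The name extract_datetime, the default pattern, and the sample_text line are assumptions for illustration; the pattern targets the 'July 29, 2014, 11:15' layout mentioned above and would need adjusting for reports laid out differently.

```r
# Hypothetical helper: return the first "Month D, YYYY, HH:MM" string
# found in a block of text, or NA if nothing matches.
extract_datetime <- function(text,
                             pattern = "[A-Z][a-z]+ \\d{1,2}, \\d{4}, \\d{1,2}:\\d{2}") {
  m <- regmatches(text, regexpr(pattern, text, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

# Made-up line standing in for pdf[1]
sample_text <- "Date & Time:    July 29, 2014, 11:15 Local"
extract_datetime(sample_text)
# [1] "July 29, 2014, 11:15"
```

Because the pattern describes the date itself rather than its position in the line, it does not matter how many spaces separate the label from the value.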

How to format properly date-time column in R using mutate?

I am trying to format a string column as a date-time series.
The rows in the column look like this example: "2019-02-27T19:08:29+000"
(dateTime is the column, the variable)
mutate(df,dateTime=as.Date(dateTime, format = "%Y-%m-%dT%H:%M:%S+0000"))
But the result is:
2019-02-27
What about the hours, minutes and seconds ?
I need it to apply a filter by date-time
Your code is almost correct. Just the extra 0 and the as.Date command were wrong:
library("dplyr")
df <- data.frame(dateTime = "2019-02-27T19:08:29+000",
stringsAsFactors = FALSE)
mutate(df, dateTime = as.POSIXct(dateTime, format = "%Y-%m-%dT%H:%M:%S+000"))
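To see that the time of day survives the conversion and can be used in a filter, here is a small base R sketch. The tz = "UTC" argument and the noon cutoff are assumptions added for the example; without tz, as.POSIXct interprets the string in your local time zone.

```r
x <- "2019-02-27T19:08:29+000"

# Parse as date-time; the trailing "+000" is matched as literal text
dt <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S+000", tz = "UTC")

format(dt, "%Y-%m-%d %H:%M:%S")
# [1] "2019-02-27 19:08:29"

# Comparisons work on the full date-time, so this can drive a filter
cutoff <- as.POSIXct("2019-02-27 12:00:00", tz = "UTC")
dt > cutoff
# [1] TRUE
```

as.Date drops the time because the Date class only stores whole days, which is why the original call returned 2019-02-27.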

How do I create a column which takes a date from another column in R?

I have a data frame with a few columns; the last one is called Filename. This is what it looks like.
Product  Company  Filename
…        …        mg-tvd_bmmh_20170930.csv
…        …        mg-tvd_bmmh_2016_06_13.csv
…        …        …
I am trying to write a short script in R which takes the date from each filename and puts it into a new column called Date. So the new data frame would look like this:
Product  Company  Date        Filename
…        …        09/30/2017  mg-tvd_bmmh_20170930.csv
…        …        06/13/2016  mg-tvd_bmmh_2016_06_13.csv
…        …        …           …
This is a relevant piece of my script.
df <- mutate(df, Date <- grep(pattern = "(\d{4})_?(\d{2})_?
(\d{1,2})", df$Filename, value = TRUE))
ddf$Date <- as.Date(Date,format = "%m/%d/%y")
Any advice why I can't get it working?
I am getting these errors:
Error: '\d' is an unrecognized escape in character string starting ""(\d"
Error in as.Date(Date, format = "%m/%d/%y") :
object 'Date' not found
You can use this command:
transform(df, Date = as.Date(sub(".*\\D(\\d{4})_?(\\d{2})_?(\\d{1,2}).*",
"\\1\\2\\3", Filename), "%Y%m%d"))
You are getting the error because instead of:
ddf$Date <- as.Date(Date,format = "%m/%d/%y")
you should have:
df$Date <- as.Date(df$Date,format = "%Y/%m/%d")
or:
df %>%
mutate(Date = as.Date(df$Date,format = "%Y/%m/%d"))
The incorrect format = "%m/%d/%y" would give you NA values in Date, while the incorrect reference in as.Date(Date, ...) is what throws the "object 'Date' not found" error.
You can also use str_extract from stringr to extract the dates and ymd from lubridate to parse it to Date object:
library(dplyr)
library(stringr)
library(lubridate)
df %>%
mutate(Date = ymd(str_extract(Filename, "\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")))
Data:
  Product Company                   Filename       Date
1       1       3   mg-tvd_bmmh_20170930.csv 2017-09-30
2       2       4 mg-tvd_bmmh_2016_06_13.csv 2016-06-13
The advantage of ymd is that it "...recognize arbitrary non-digit separators as well as no separator...", so there is no need to standardize the Date character vector before parsing. For instance,
> df$Filename %>% str_extract("\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")
[1] "20170930" "2016_06_13"
The error you show arises because a backslash in an R string literal must itself be escaped, so regex escapes need a double backslash (e.g. \d should be written \\d). I would suggest using sub for the regex portion so you can control the output, and adding a quantifier (*) after the underscores so the pattern matches whether or not an underscore is present (as in your example).
Formatting in as.Date wants a capital Y (%Y) for year.
The updated code would be:
df <- mutate(df, Date = sub(pattern = ".*_(\\d{4})_*(\\d{2})_*(\\d{1,2}).*", "\\2/\\3/\\1", df$Filename))
df$Date <- as.Date(df$Date,format = "%m/%d/%Y")
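For reference, the whole round trip can be sketched in base R on the two filenames from the question. The vector name filenames and the intermediate digits are made up for the example, and it assumes every filename ends in the date followed by ".csv".

```r
filenames <- c("mg-tvd_bmmh_20170930.csv", "mg-tvd_bmmh_2016_06_13.csv")

# Keep only the date part, then strip optional underscores
date_part <- sub(".*_(\\d{4}_?\\d{2}_?\\d{2})\\.csv$", "\\1", filenames)
digits    <- gsub("_", "", date_part)

# Parse as Date, then format as MM/DD/YYYY for display
dates <- as.Date(digits, format = "%Y%m%d")
format(dates, "%m/%d/%Y")
# [1] "09/30/2017" "06/13/2016"
```

Keeping the column as a Date and formatting it only for display lets you sort and filter on it correctly; a character column in MM/DD/YYYY form would sort by month first.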
