How to properly format a date-time column in R using mutate?

I am trying to convert a string column to a date-time series.
The rows in the column look like this example: "2019-02-27T19:08:29+000"
(dateTime is the column, the variable)
mutate(df,dateTime=as.Date(dateTime, format = "%Y-%m-%dT%H:%M:%S+0000"))
But the result is:
2019-02-27
What about the hours, minutes and seconds ?
I need it to apply a filter by date-time

Your code is almost correct. Only the extra 0 in the format string and the use of as.Date were wrong: as.Date drops the time of day, whereas as.POSIXct keeps it:
library("dplyr")
df <- data.frame(dateTime = "2019-02-27T19:08:29+000",
stringsAsFactors = FALSE)
mutate(df, dateTime = as.POSIXct(dateTime, format = "%Y-%m-%dT%H:%M:%S+000"))
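Since the goal is to filter by date-time, here is a minimal sketch of the parse-then-filter step. It assumes the trailing "+000" means UTC (so tz = "UTC" is passed explicitly rather than relying on the local time zone); the second row and the cutoff value are made up for illustration.

```r
library(dplyr)

# Two sample rows; the second one is hypothetical
df <- data.frame(dateTime = c("2019-02-27T19:08:29+000",
                              "2019-02-28T07:15:00+000"),
                 stringsAsFactors = FALSE)

# Parse to POSIXct, treating the literal "+000" suffix as UTC (an assumption)
df <- mutate(df, dateTime = as.POSIXct(dateTime,
                                       format = "%Y-%m-%dT%H:%M:%S+000",
                                       tz = "UTC"))

# Keep only rows at or after a cutoff date-time
cutoff <- as.POSIXct("2019-02-28 00:00:00", tz = "UTC")
later <- filter(df, dateTime >= cutoff)
```

Passing tz explicitly keeps the comparison independent of the machine's locale.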

Related

How to run left join in dplyr transforming the key columns ( using lubridate function) on the fly

I have two databases whose columns I need to combine based on two Date columns, with the condition that the DAY of those dates is the same.
"2020/01/01 20:30" MUST MATCH "2020/01/01 17:50"
All dates are in POSIXct format.
While I could use some pre-processing with string parsing or the like, I wanted to handle it via lubridate/dplyr like:
DB_New <- left_join(DB_A,DB_B, by=c((date(Date1) = date(Date2)))
Notice I am using the function "date" from lubridate to match on the condition explained above. However, I am getting the error below:
DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)= date(DateTime)))
Error: unexpected '=' in "DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)="
Within the by argument, we cannot do the conversion; it expects the column names as strings. The conversion should be done before the left_join
library(dplyr)
# convert both key columns to Date first, then join on the day
DB_FEB_2019_join %>%
  mutate(Saida_Real = as.Date(Saida_Real, format = "%Y/%m/%d %H:%M")) %>%
  left_join(Chuvas_BH %>%
              mutate(DateTime = as.Date(DateTime, format = "%Y/%m/%d %H:%M")),
            by = c(Saida_Real = "DateTime"))
With lubridate, character values can be parsed with ymd_hm and then converted to Date class with as.Date
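As a rough sketch of the same idea for columns that are already POSIXct, the day can be computed into a separate helper column so the original timestamps survive the join. The data frames, the `day` helper column, and the `trip` and `rain_mm` columns below are hypothetical stand-ins, not the asker's data.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-ins for the two tables (both key columns are POSIXct)
DB_A <- data.frame(Date1 = ymd_hm(c("2020/01/01 20:30", "2020/01/02 09:00")),
                   trip = c("a", "b"))
DB_B <- data.frame(Date2 = ymd_hm(c("2020/01/01 17:50", "2020/01/03 12:00")),
                   rain_mm = c(3.2, 0))

# Add a Date-class helper column on each side, then join on it
DB_New <- DB_A %>%
  mutate(day = as_date(Date1)) %>%
  left_join(DB_B %>% mutate(day = as_date(Date2)), by = "day")
```

Here the first row of DB_A finds a match on 2020-01-01, while the second row has no partner and gets NA in the columns coming from DB_B.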

Converting numbers to dates in R

I have a large dataset that I'm importing from a txt file. It has multiple date variables that are being read as numeric values such as 20190101. Is there a way to assign a date format as part of the import? There is no header in the file, and I'm assigning names and lengths; sample code below.
df <- read_fwf("file name",
               fwf_cols(id = 8,
                        update_date = 8,
                        name = 35),
               skip = 0)
Or is there a way to convert multiple values in one statement vs one at a time?
df$update_date <- as.Date(as.character(df$update_date), "%Y%m%d")
Here is a way to convert multiple columns to Dates in one statement
(assuming a yyyymmdd format). Here we target all columns whose names end with "date".
library(dplyr)
df <- data.frame(update_date = c(20190101, 20190102, 20190103),
end_date = c(20200101, 20200102, 20200103))
df %>% mutate_at(vars(ends_with("date")), ~as.Date(as.character(.x),format="%Y%m%d"))
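You might similarly select the columns by prefix
df %>% mutate_at(vars(starts_with("date")), ~as.Date(as.character(.x), format = "%Y%m%d"))
or by name
df %>% mutate_at(vars(c(update_date, end_date)), ~as.Date(as.character(.x), format = "%Y%m%d"))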
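For the first part of the question (setting the date format at import time), one possible approach is readr's col_types argument with col_date(). The sketch below writes a small temporary fixed-width file so it is self-contained; the file contents and the names in it are invented for illustration.

```r
library(readr)

# Build a tiny fixed-width file: 8-char id, 8-char date, 35-char name
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%08d%s%-35s",
                   c(1L, 2L),
                   c("20190101", "20190102"),
                   c("Alice", "Bob")),
           path)

# Declare the date column's format up front; other columns are guessed
df <- read_fwf(path,
               fwf_cols(id = 8,
                        update_date = 8,
                        name = 35),
               col_types = cols(update_date = col_date(format = "%Y%m%d")))
```

Additional date columns can be listed in the same cols() call, one col_date() entry per column.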

yyyymmddHHMMSS convert to yyyy-mm-dd HH:MM:SS

In R:
How can I change the column of a data frame from yyyymmddHHMMSS to yyyy-mm-dd HH:MM:SS?
I tried
for(i in 1:nrow(tabla_eaq)){
  tabla_eaq[i,'datetime'] = ymd_hms(tabla_eaq[i,'datetime'])
}
But it shows up as for example 1606514222 for input 20201127215702.
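We don't need a for loop, as ymd_hms is vectorized. Assigning the parsed values one element at a time into the existing numeric column most likely coerces them back to numbers (seconds since 1970-01-01 UTC), which is why 1606514222 shows up. Replacing the whole column avoids that: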
library(lubridate)
tabla_eaq$datetime <- ymd_hms(tabla_eaq$datetime)
Data:
tabla_eaq <- data.frame(datetime = c(20201127215702, 20201127215702, 20201127215702))
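If the target is a text representation in exactly the yyyy-mm-dd HH:MM:SS layout, format() can be applied to the parsed column. This is a small sketch that uses character input for clarity; the `datetime_chr` column name is an assumption added for illustration.

```r
library(lubridate)

# Same sample values, stored as character strings here
tabla_eaq <- data.frame(datetime = c("20201127215702", "20201127215702"),
                        stringsAsFactors = FALSE)

# Parse the whole column at once (ymd_hms returns POSIXct in UTC by default)
tabla_eaq$datetime <- ymd_hms(tabla_eaq$datetime)

# Optional: a character column in the requested layout
tabla_eaq$datetime_chr <- format(tabla_eaq$datetime, "%Y-%m-%d %H:%M:%S")
```

The POSIXct column is the one to keep for comparisons and arithmetic; the character column is only for display or export.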

Aggregate date and time in R

My data has a start and end time stamp such as this:
200401010000 200401010030
200401010030 200401010100
200401010100 200401010130 and so on...
I'm trying to convert these fields into %YYYY%MM%DD%HH%MM format using lubridate and as.POSIXct, but I get only NAs. Any help will be appreciated.
My goal is to aggregate the data for each month.
The code I've used so far is as follows:
start_time = as.POSIXct(dat$TIMESTAMP_START, format = "%YYYY%MM%DD %HH%MM",origin = "2004-01-01 00:00", tz="EDT")
stop_time = as.POSIXct(dat$TIMESTAMP_END, format = "%YYYY%MM%DD%HH%MM",origin = "2004-01-01 00:30", tz="EDT")
dat$interval <- interval(start_time, stop_time)
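Two problems I can see:
The format string is not valid: the conversion codes are single letters, so the pattern for these values would be "%Y%m%d%H%M" rather than "%YYYY%MM%DD%HH%MM". If you're using lubridate already, you should probably use the function ymd_hm(), which is just cleaner IMO.
You don't need to loop over the values: ymd_hm() is vectorized, so it can be applied directly to the columns (which I presume dat$TIMESTAMP_START and dat$TIMESTAMP_END are). Wrapping it in sapply() would also drop the date-time class. You can use:
start_time <- ymd_hm(dat$TIMESTAMP_START)
end_time <- ymd_hm(dat$TIMESTAMP_END)
That will parse each item in your vector.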
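For the stated goal of aggregating by month, here is a minimal base R sketch. It assumes the timestamps are character strings in UTC and that there is a numeric column to sum; the `value` column, its numbers, and the February row are hypothetical.

```r
# Sample data in the question's layout plus a hypothetical measurement column
dat <- data.frame(
  TIMESTAMP_START = c("200401010000", "200401010030", "200402010000"),
  TIMESTAMP_END   = c("200401010030", "200401010100", "200402010030"),
  value           = c(1.5, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# Parse with single-letter conversion codes; the time zone is an assumption
dat$start_time <- as.POSIXct(dat$TIMESTAMP_START,
                             format = "%Y%m%d%H%M", tz = "UTC")

# Derive a year-month key and sum the values within each month
dat$month <- format(dat$start_time, "%Y-%m")
monthly <- aggregate(value ~ month, data = dat, FUN = sum)
```

Swapping FUN for mean or another summary function changes the kind of monthly aggregate.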

How do I create a column which takes a date from another column in R?

I have a data frame with a few columns; the last one is called Filename. This is what it looks like.
Product Company Filename
… … mg-tvd_bmmh_20170930.csv
… … mg-tvd_bmmh_2016_06_13.csv
… … …
I am trying to write a short script in R which takes the date from each filename and puts it into a new column called Date. So the new data frame would look like this:
Product Company Date Filename
… … 09/30/2017 mg-tvd_bmmh_20170930.csv
… … 06/13/2016 mg-tvd_bmmh_2016_06_13.csv
… … … …
This is a relevant piece of my script.
df <- mutate(df, Date <- grep(pattern = "(\d{4})_?(\d{2})_?(\d{1,2})", df$Filename, value = TRUE))
ddf$Date <- as.Date(Date,format = "%m/%d/%y")
Any advice why I can't get it working?
I am getting these errors:
Error: '\d' is an unrecognized escape in character string starting ""(\d"
Error in as.Date(Date, format = "%m/%d/%y") :
object 'Date' not found
You can use this command:
transform(df, Date = as.Date(sub(".*\\D(\\d{4})_?(\\d{2})_?(\\d{1,2}).*",
"\\1\\2\\3", Filename), "%Y%m%d"))
You are getting the error because instead of:
ddf$Date <- as.Date(Date,format = "%m/%d/%y")
you should have:
df$Date <- as.Date(df$Date,format = "%Y/%m/%d")
or:
df %>%
mutate(Date = as.Date(df$Date,format = "%Y/%m/%d"))
The incorrect format specification "%m/%d/%y" would give you NA values in Date, while the reference to a bare Date object in as.Date(Date, ...) is what throws the "object 'Date' not found" error.
You can also use str_extract from stringr to extract the dates and ymd from lubridate to parse it to Date object:
library(dplyr)
library(stringr)
library(lubridate)
df %>%
mutate(Date = ymd(str_extract(Filename, "\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")))
Result:
Product Company Filename Date
1 1 3 mg-tvd_bmmh_20170930.csv 2017-09-30
2 2 4 mg-tvd_bmmh_2016_06_13.csv 2016-06-13
The advantage with ymd is that it "...recognize arbitrary non-digit separators as well as no separator..." So there is no need to standardize the Date character vector before parsing. For instance,
> df$Filename %>% str_extract("\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")
[1] "20170930" "2016_06_13"
The error you show arises because backslashes in a regex must themselves be escaped inside an R string (e.g. \d should be written \\d). I would suggest using sub for the regex portion so you can control the output, and adding a quantifier (*) after the underscores so the pattern matches whether or not an underscore is present (as your example shows).
The format in as.Date needs a capital Y (%Y) for a four-digit year.
The updated code would be:
df <- mutate(df, Date = sub(pattern = ".*_(\\d{4})_*(\\d{2})_*(\\d{1,2}).*", "\\2/\\3/\\1", df$Filename))
df$Date <- as.Date(df$Date,format = "%m/%d/%Y")
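To check the regex approach on the two sample filenames without any packages, a small base R sketch follows. The variable names are illustrative, and the pattern assumes the date is the last underscore-delimited part before ".csv".

```r
# The two example filenames from the question
filenames <- c("mg-tvd_bmmh_20170930.csv", "mg-tvd_bmmh_2016_06_13.csv")

# Capture year, month, day (underscores optional) and glue them together
digits <- sub(".*_(\\d{4})_?(\\d{2})_?(\\d{2})\\.csv$", "\\1\\2\\3", filenames)

# Parse to Date, then render in the mm/dd/yyyy layout asked for
dates <- as.Date(digits, format = "%Y%m%d")
shown <- format(dates, "%m/%d/%Y")
```

Keeping `dates` as a Date column and formatting only for display avoids sorting problems that mm/dd/yyyy strings would cause.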
