Convert text string "ABCYYYYMM" to date format in R - r

I have a data frame where dates are represented by strings such as "ABC202003", in the format "ABCYYYYMM". How can I remove the "ABC" part and convert the rest to a month-year Date format in R?

Does this work:
> library(dplyr)
> library(stringr)
> str <- c('ABC202003','DEF202004')
> df <- data.frame(str = str)
> df
str
1 ABC202003
2 DEF202004
> df %>% mutate(date = str_extract(str, '\\d+')) %>%
+ mutate(date = str_replace_all(date, '(\\d{4})(\\d{2})','\\1-\\2'))
str date
1 ABC202003 2020-03
2 DEF202004 2020-04
>
In month-year format:
> df %>% mutate(date = str_extract(str, '\\d+')) %>%
+ mutate(date = str_replace_all(date, '(\\d{4})(\\d{2})','\\2-\\1'))
str date
1 ABC202003 03-2020
2 DEF202004 04-2020
>
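The same reordering can be sketched in base R, without dplyr or stringr. This assumes every string ends in exactly six digits (YYYYMM) preceded only by non-digit characters; the variable names are illustrative.

```r
# Sample strings in the "ABCYYYYMM" layout from the question
x <- c("ABC202003", "DEF202004")

# Drop the leading non-digits, capture year and month, and swap their order
month_year <- sub("^[^0-9]*([0-9]{4})([0-9]{2})$", "\\2-\\1", x)
month_year
#[1] "03-2020" "04-2020"
```

Note that the result is still a character vector, not a Date.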

The data in the question, corrected.
x <- "ABC022003"
If there are always 3 characters at the beginning of the string, first run this:
date <- as.Date(paste0("01", substring(x, 4)), "%d%m%Y")
If there could be a different number of non-digit characters, run this:
date <- as.Date(paste0("01", gsub("[^[:digit:]]", "", x)), "%d%m%Y")
Now date is an object of class "Date". Any of the following will create a month-year string.
format(date, "%m-%Y")
#[1] "02-2003"
format(date, "%b-%Y")
#[1] "Feb-2003"
zoo::as.yearmon(date)
#[1] "Feb 2003"
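For strings laid out as in the question title (year first, "ABCYYYYMM"), the same idea might look like the sketch below. It assumes the digits are always YYYYMM and appends "01" as a placeholder day, since a Date needs one.

```r
# Sample strings in the year-first layout from the question
x <- c("ABC202003", "DEF202004")

# Keep only the digits, then append a day so as.Date() can parse the value
digits <- gsub("[^[:digit:]]", "", x)
date <- as.Date(paste0(digits, "01"), format = "%Y%m%d")

class(date)
#[1] "Date"
format(date, "%m-%Y")
#[1] "03-2020" "04-2020"
```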

We can get the digits with parse_number and then use ymd with truncated to convert to Date class. If the format needs to change to month-year, use format:
library(dplyr)
library(lubridate)
df %>%
mutate(date = format(ymd(readr::parse_number(str), truncated = 2), '%m-%Y'))
# str date
#1 ABC202003 03-2020
#2 DEF202004 04-2020
If it needs to be Date class, remove the format call:
df %>%
mutate(date = ymd(readr::parse_number(str), truncated = 2))
# str date
#1 ABC202003 2020-03-01
#2 DEF202004 2020-04-01
data
df <- structure(list(str = c("ABC202003", "DEF202004")),
class = "data.frame", row.names = c(NA,
-2L))

First get rid of the letters at the beginning using gsub
x <- c('ABC202003','DEF202004')
x <- gsub("[^0-9.-]", "", x)
Then use parse_date_time from lubridate to parse it as a date
x <- lubridate::parse_date_time(x, orders = 'ym', truncated = 1)
Finally, use format to print them in whatever layout you wish
format(x, '%Y-%m')
This is the end result:
[1] "2020-03" "2020-04"

Related

Converting ddmmyy-xxxx to date in R

I have a data frame with a column of numbers that represent dates. So 110190-1111 is ddmmyy-xxxx, where the x's don't matter. It is implicit that the century is 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to:
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.
as.Date ignores junk at the end so
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)
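One caveat with two-digit years: when parsing, %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx. If every year is known to be in the 1900s, as the question states, a sketch like the following makes the century explicit. The second ID is made up to show the difference.

```r
# The second ID is hypothetical; its two-digit year (50) would be read as 2050 by %y
ID <- c("110190-1111", "110150-1111")

# Default %y behaviour: 00-68 become 20xx, 69-99 become 19xx
as.Date(ID, "%d%m%y")
#[1] "1990-01-11" "2050-01-11"

# Assumption: all years are 19xx, so splice "19" in front of the two-digit year
with_century <- paste0(substr(ID, 1, 4), "19", substr(ID, 5, 6))
dates <- as.Date(with_century, "%d%m%Y")
dates
#[1] "1990-01-11" "1950-01-11"
```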

Standardizing the date format using R

I am having trouble standardizing the date format to dd-mm-YYYY. This is my current code:
Dataset
date
1 23/07/2020
2 22-Jul-2020
Current Output
df$date<-as.Date(df$date)
df$date = format(df$date, "%d-%b-%Y")
date
1 20-Jul-0022
2 <NA>
Desired Output
date
1 23-Jul-2020
2 22-Jul-2020
You can try this way
library(lubridate)
df$date <- dmy(df$date)
df$date <- format(df$date, format = "%d-%b-%Y")
# date
# 1 23-Jul-2020
# 2 22-Jul-2020
Data
df <- read.table(text = "date
1 23/07/2020
2 22-Jul-2020", header = TRUE)
I've saved your example data set as a data frame named df. I used group_by from dplyr to allow each date to be converted separately to the correct format.
library(tidyverse)
df %>%
group_by(date) %>%
mutate(date = as.Date(date, tryFormats = c("%d-%b-%Y", "%d/%m/%Y"))) %>%
mutate(date = format(date, "%d-%b-%Y"))
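A base R sketch of the same idea: parse the column once per candidate format and fill the gaps left by the first attempt with the second. It assumes only these two layouts occur. Month abbreviations such as "Jul" are matched according to the LC_TIME locale, so the sketch switches to the "C" locale, where English abbreviations are used.

```r
# Month names are locale dependent; the "C" locale uses English abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("23/07/2020", "22-Jul-2020")

# Each attempt yields NA wherever its format does not match
d_slash <- as.Date(x, format = "%d/%m/%Y")
d_abbr  <- as.Date(x, format = "%d-%b-%Y")

# Start from the first attempt and fill its gaps from the second
parsed <- d_slash
parsed[is.na(parsed)] <- d_abbr[is.na(parsed)]

format(parsed, "%d-%m-%Y")
#[1] "23-07-2020" "22-07-2020"
```

Any value that matches neither format stays NA, which makes unparsed rows easy to spot.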

Keep only the year from a data timestamp column

Having a dataframe like this:
data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringsAsFactors = FALSE)
How is it possible to keep only the year in the timestamp column, bearing in mind that all years are after 2000? An expected output:
data.frame(id = c(1,3), timestamp = c("2009", "2017"), stringsAsFactors = FALSE)
Base R:
format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
[1] "2009" "2017"
So in the dataframe:
df$year <- format(as.Date(df$timestamp, "%d-%m-%Y %H:%M:%S"), "%Y")
id timestamp year
1 1 20-10-2009 11:35:12 2009
2 3 01-01-2017 12:21:21 2017
Another option, if you're into or familiar with regex, is this:
sub(".*([0-9]{4}).*", "\\1", df$timestamp)
[1] "2009" "2017"
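If every timestamp is known to follow the fixed "dd-mm-YYYY HH:MM:SS" layout, the year can also be sliced out by position. This is only a sketch under that assumption; it does no validation.

```r
timestamp <- c("20-10-2009 11:35:12", "01-01-2017 12:21:21")

# Characters 7 to 10 hold the four-digit year in this fixed layout
year_chr <- substr(timestamp, 7, 10)
year_chr
#[1] "2009" "2017"

# Convert to integer if the year is needed for comparisons or arithmetic
year_int <- as.integer(year_chr)
year_int > 2000
#[1] TRUE TRUE
```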
See if this answers your question. The code and the output are as follows:
library(lubridate)
library(tidyverse)
df <- data.frame(id = c(1,3,4), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21","01-01-1998 12:21:21"), stringsAsFactors = FALSE)
df$timestamp <- dmy_hms(df$timestamp)
# Keep only rows after 2000, then store the year in a new column
df1 <- df %>%
filter(year(timestamp) > 2000) %>%
mutate(new_year = year(timestamp))
df1
#id timestamp new_year
#1 1 2009-10-20 11:35:12 2009
#2 3 2017-01-01 12:21:21 2017
If you're not afraid of external packages, one option would be to make use of the lubridate package:
library(dplyr)
df <- data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"))
df <- df %>%
mutate(timestamp = lubridate::dmy_hms(timestamp)) %>%
mutate(year = lubridate::year(timestamp))
If you actually want to replace the timestamp column, change the last mutate call to assign to timestamp instead of year. Result:
id timestamp year
1 1 2009-10-20 11:35:12 2009
2 3 2017-01-01 12:21:21 2017
I have a tidyverse solution to your problem:
library(tidyverse)
data.frame(id = c(1,3), timestamp = c("20-10-2009 11:35:12", "01-01-2017 12:21:21"), stringsAsFactors = FALSE) %>%
mutate(timestamp = timestamp %>%
str_extract("\\d{4}"))
str_extract with the pattern "\\d{4}" returns the first run of four consecutive digits in each string. In this layout the day and month have only two digits each, so that first run is the year.

Convert character to date and numeric, maintain same format

In the output of the code below, the variables day and sales are in the format that I need but not the type: they come out as chr instead. The variables should be date and num respectively. I've tried many things but either I get chr or some sort of error. For instance, using as.Date() doesn't change the variable day to the format "%d/%m/%Y". The code with sample data:
library(dplyr)
library(lubridate)
df <- data.frame(matrix(c("2017-09-04","2017-09-05",103,104,17356,18022),ncol = 3, nrow = 2))
colnames(df) <- c("DATE","ORDER_ID","SALES")
df$DATE <- as.Date(df$DATE, format = "%Y-%m-%d")
df$SALES <- as.numeric(as.character(df$SALES))
df$ORDER_ID <- as.numeric(as.character(df$ORDER_ID))
TOTALSALES <- df %>%
select(ORDER_ID,DATE,SALES) %>%
mutate(weekday = wday(DATE, label=TRUE)) %>%
mutate(DATE=as.Date(DATE)) %>%
filter(!wday(DATE) %in% c(1, 7) & !(DATE %in% as.Date(c('2017-01-02','2017-02-27','2017-02-28','2017-04-14'))) ) %>%
group_by(day=floor_date(DATE,"day")) %>%
summarise(sales=sum(SALES)) %>%
data.frame()
TOTALSALES$day <- TOTALSALES$day %>%
as.POSIXlt(, tz="America/Sao_Paulo") %>%
format("%d/%m/%Y")
TOTALSALES$sales <- TOTALSALES$sales %>%
format(digits=9, decimal.mark=",",nsmall=2,big.mark = ".")
TOTALSALES$day <- as.Date(df$DATE, format = "%d/%m/%Y")
Any idea how I can solve this problem, or a pointer to how it should be done?
I'd appreciate any help.
I'm not sure I understand your question.
To print a Date object in a particular date-time format you can use format
# This *converts* a character vector/factor to a vector of Dates
df$DATE <- as.Date(df$DATE, format = "%Y-%m-%d")
# This *prints* the Date vector as a character vector with format "%d/%m/%Y"
format(df$DATE, format = "%d/%m/%Y")
Minimal example
ss <- c("2017-09-04","2017-09-05")
date <- as.Date(ss, format = "%Y-%m-%d")
format(date, format = "%d/%m/%Y")
#[1] "04/09/2017" "05/09/2017"
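The underlying point is that format always returns character, so a column cannot be both a Date (or numeric) and carry a custom display layout. A sketch of one way to handle this is to keep the typed values for computation and create formatted copies only for display. The sales figures are taken from the question; the variable names are illustrative.

```r
ss <- c("2017-09-04", "2017-09-05")
date <- as.Date(ss, format = "%Y-%m-%d")
sales <- c(17356, 18022)

# The stored values keep their types
class(date)
#[1] "Date"
class(sales)
#[1] "numeric"

# Formatted copies are character vectors, meant for printing or reports only
date_shown <- format(date, "%d/%m/%Y")
sales_shown <- format(sales, nsmall = 2, big.mark = ".", decimal.mark = ",")
class(date_shown)
#[1] "character"
class(sales_shown)
#[1] "character"
```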

Format Date to Year-Month in R

I would like to retain my current date column in year-month format as date. It currently gets converted to chr format. I have tried as_datetime but it coerces all values to NA.
The format I am looking for is: "2017-01"
library(lubridate)
df<- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df$Date <- as_datetime(df$Date)
df$Date <- ymd(df$Date)
df$Date <- strftime(df$Date,format="%Y-%m")
Thanks in advance!
lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them to the month if you want to work monthly:
library(tidyverse)
library(lubridate) # floor_date() and as_date(); not attached by older tidyverse versions
df_month <-
df %>%
mutate(Date = floor_date(as_date(Date), "month"))
If you e.g. want to aggregate by month, just group_by() and summarize().
df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()
#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#>1 2017-01-01 59
#>2 2018-01-01 20
#>3 2018-02-01 33
#>4 2018-03-01 45
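The same month-level aggregation can be sketched in base R, assuming the data from the question. The Month column stays a real Date (first day of the month), and a "YYYY-MM" label is created separately for display; the column names Month and Label are illustrative.

```r
df <- data.frame(
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04",
                   "2018-01-01", "2018-01-02", "2018-02-01", "2018-03-02")),
  N = c(24, 10, 13, 12, 10, 10, 33, 45)
)

# Floor each date to the first day of its month
df$Month <- as.Date(format(df$Date, "%Y-%m-01"))

# Sum N within each month
monthly <- aggregate(N ~ Month, data = df, FUN = sum)

# Character label for tables or axis text
monthly$Label <- format(monthly$Month, "%Y-%m")
monthly$Label
#[1] "2017-01" "2018-01" "2018-02" "2018-03"
monthly$N
#[1] 59 20 33 45
```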
You can solve this with the zoo::as.yearmon() function. The solution follows:
library(tidyquant)
library(magrittr)
library(dplyr)
df <- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df %<>% mutate(Date = zoo::as.yearmon(Date))
You can use the cut function with breaks = "month" to map every date to the first day of its month, so all dates within the same month share one value in the newly created column.
This is useful for grouping the other variables in your data frame by month (essentially what you are trying to do). cut creates a factor, but it can be converted back to a date, so you can still have a date class in your data frame.
You can't simply drop the day from a date (because then it is no longer a date). Afterwards you can create a nicely formatted label for axes or tables. For example:
true_date <-
as.POSIXlt(
c(
"2017-01-01",
"2017-01-02",
"2017-01-03",
"2017-01-04",
"2018-01-01",
"2018-01-02",
"2018-02-01",
"2018-03-02"
),
format = "%F"
)
df <-
data.frame(
Date = cut(true_date, breaks = "month"),
N = c(24, 10, 13, 12, 10, 10, 33, 45)
)
## here df$Date is a 'factor'. You could use substr to create a formatted column
df$formated_date <- substr(df$Date, start = 1, stop = 7)
## and you can convert back to a date-time class. format = "%F" is the ISO 8601 date format (%Y-%m-%d)
df$true_date <- strptime(x = as.character(df$Date), format = "%F")
str(df)
