Converting string into year-month in R?

I want to convert the following string into year-month format.
df <- tribble(
~date,
'20201227',
)
Here is the desired output.
new_df <- tribble(
~date,
'2020-12',
)
How can I do this?

Convert to Date class and use format
library(dplyr)
df <- df %>%
mutate(date = format(as.Date(date, '%Y%m%d'), '%Y-%m'))
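The same idea can be sketched in base R without dplyr. The helper name to_year_month is hypothetical and only for illustration. Note that format() returns a character vector, so the result is text, not a Date. One side effect of going through as.Date is that an impossible date such as month 13 comes back as NA instead of being reformatted silently.

```r
# Minimal sketch, assuming the input is always a compact YYYYMMDD string.
# to_year_month is a hypothetical helper name, not part of any package.
to_year_month <- function(x) {
  # Parse the string as a Date, then print only the year and month.
  format(as.Date(x, format = "%Y%m%d"), "%Y-%m")
}

ym <- to_year_month(c("20201227", "20210105"))
ym         # "2020-12" "2021-01"
class(ym)  # "character"

# An invalid month cannot be parsed, so the result is NA.
bad <- to_year_month("20201340")
is.na(bad) # TRUE
```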

Another possible option uses gsub (but the as.Date answer by @akrun is the recommended one)
transform(
df,
date = gsub("(\\d{4})(\\d{2}).*", "\\1-\\2", date)
)
gives
date
1 2020-12

Related

How to Invert date format

I have a large database with one of the columns containing dates with the following format: DD-MM-YYYY.
I would like to invert the date format, to something like this: YYYY-MM-DD.
Can someone tell me how I can do this using bash or R?
A possible solution:
library(tidyverse)
library(lubridate)
df <- data.frame(date=c("11-4-2021","5-6-2019"))
df %>%
mutate(date2 = dmy(date) %>% ymd)
#> date date2
#> 1 11-4-2021 2021-04-11
#> 2 5-6-2019 2019-06-05
In bash, we can use string manipulation:
dmy=30-12-2021
echo "${dmy:6:4}-${dmy:3:2}-${dmy:0:2}" # 2021-12-30
or with read:
IFS="-" read -r d m y <<<"$dmy"
echo "$y-$m-$d"
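For the R side, lubridate is not strictly required. A minimal base R sketch, assuming every value is a complete D-M-YYYY date (the helper name invert_dmy is hypothetical):

```r
# Minimal sketch, assuming complete day-month-year values separated by "-".
# invert_dmy is a hypothetical helper name used only for illustration.
invert_dmy <- function(x) {
  # Parse as day-month-year, then print as year-month-day.
  format(as.Date(x, format = "%d-%m-%Y"), "%Y-%m-%d")
}

inverted <- invert_dmy(c("11-4-2021", "5-6-2019", "30-12-2021"))
inverted  # "2021-04-11" "2019-06-05" "2021-12-30"
```

Unlike the bash substring approach, which relies on fixed character positions, this also copes with days and months written without a leading zero.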
I used R to solve my problem like this:
df is a data.frame with dates in the column "eventDate". The dates were in the format DD-MM-YYYY, and several cells contained incomplete dates (e.g. MM-YYYY or YYYY).
library(tidyr)
x <- separate(df, col = eventDate, into = c("day", "month", "year"), sep="-")
y <- x %>% unite("eventDate_2", c(year, month, day), remove=TRUE, sep="-", na.rm= TRUE)
y <- cbind(y, df$eventDate) # add the original column for comparing if it had worked and correct individual errors.
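Because some of the values are incomplete (MM-YYYY or YYYY), they cannot all be parsed as dates. A string-only sketch in base R that simply reverses the dash-separated parts is shown below. The helper name invert_parts is hypothetical, and the sketch assumes there are no missing values and does not add leading zeros.

```r
# Minimal sketch: reverse the "-" separated parts of each string.
# invert_parts is a hypothetical helper name; no date validation is done.
invert_parts <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  # Reverse each vector of parts and glue it back together.
  vapply(parts, function(p) paste(rev(p), collapse = "-"), character(1))
}

res <- invert_parts(c("30-12-2021", "12-2021", "2021"))
res  # "2021-12-30" "2021-12" "2021"
```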

Converting ddmmyy-xxxx to date in R

I have a data frame with a column of numbers that represent dates. So 110190-1111 is ddmmyy-xxxx, where the x's don't matter. It is implicit that the century is 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to:
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.
as.Date ignores any trailing characters after the part matched by the format, so
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
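One caveat with %y: according to R's strptime documentation, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx. The sample IDs all fall in the second range, but an ID with year 50 would become 2050. If the century really is always 1900, a sketch that builds the four-digit year explicitly could look like this (parse_ddmmyy_1900s is a hypothetical helper name):

```r
# Minimal sketch, assuming every ID starts with ddmmyy and the century is 19xx.
# parse_ddmmyy_1900s is a hypothetical helper name used only for illustration.
parse_ddmmyy_1900s <- function(id) {
  day_month <- substr(id, 1, 4)  # "ddmm"
  year2     <- substr(id, 5, 6)  # "yy"
  # Insert the century so that %Y can be used instead of %y.
  as.Date(paste0(day_month, "19", year2), format = "%d%m%Y")
}

dates_1900s <- parse_ddmmyy_1900s(c("110190-1111", "010150-2222"))
format(dates_1900s)  # "1990-01-11" "1950-01-01"
```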
We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)

Standardizing the date format using R

I am having trouble standardizing the date format to dd-Mon-YYYY (e.g. 23-Jul-2020). This is my current code:
Dataset
date
1 23/07/2020
2 22-Jul-2020
Current Output
df$date<-as.Date(df$date)
df$date = format(df$date, "%d-%b-%Y")
date
1 20-Jul-0022
2 <NA>
Desired Output
date
1 23-Jul-2020
2 22-Jul-2020
You can try this way
library(lubridate)
df$date <- dmy(df$date)
df$date <- format(df$date, format = "%d-%b-%Y")
# date
# 1 23-Jul-2020
# 2 22-Jul-2020
Data
df <- read.table(text = "date
1 23/07/2020
2 22-Jul-2020", header = TRUE)
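I've saved your example data set as a dataframe named df. I used group_by from dplyr to allow each date to be converted separately to the correct format. The grouping is done on a copy of the column (raw), because dplyr does not allow mutate to modify a grouping column.
library(tidyverse)
df %>%
group_by(raw = date) %>%
mutate(date = as.Date(date, tryFormats = c("%d-%b-%Y", "%d/%m/%Y"))) %>%
ungroup() %>%
mutate(date = format(date, "%d-%b-%Y")) %>%
select(-raw)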
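The try-each-format idea can also be sketched in base R without grouping. The helper name parse_mixed_dates and its formats argument are hypothetical. The sketch assumes an English locale, because %b matches abbreviated month names in the current locale.

```r
# Minimal sketch: try each candidate format in turn and keep the first match.
# parse_mixed_dates is a hypothetical helper; %b assumes English month names.
parse_mixed_dates <- function(x, formats = c("%d/%m/%Y", "%d-%b-%Y")) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)  # only elements that no earlier format could parse
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parsed <- parse_mixed_dates(c("23/07/2020", "22-Jul-2020"))
class(parsed)                   # "Date"
std <- format(parsed, "%d-%b-%Y")  # character vector in the standardized layout
```

Values that match none of the formats stay NA, which makes unparsed rows easy to find afterwards.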

Convert character to date and numeric, maintain same format

In the output of the code below, the variables day and sales have the format I need but not the type: both come out as chr, whereas day should be a Date and sales should be num. I've tried many things, but I either get chr or some sort of error. For instance, using as.Date() doesn't change the variable day to the format "%d/%m/%Y". The code with sample data:
library(dplyr)
library(lubridate)
df <- data.frame(matrix(c("2017-09-04","2017-09-05",103,104,17356,18022),ncol = 3, nrow = 2))
colnames(df) <- c("DATE","ORDER_ID","SALES")
df$DATE <- as.Date(df$DATE, format = "%Y-%m-%d")
df$SALES <- as.numeric(as.character(df$SALES))
df$ORDER_ID <- as.numeric(as.character(df$ORDER_ID))
TOTALSALES <- df %>%
select(ORDER_ID,DATE,SALES) %>%
mutate(weekday = wday(DATE, label=TRUE)) %>%
mutate(DATE=as.Date(DATE)) %>%
filter(!wday(DATE) %in% c(1, 7) & !(DATE %in% as.Date(c('2017-01-02','2017-02-27','2017-02-28','2017-04-14'))) ) %>%
group_by(day=floor_date(DATE,"day")) %>%
summarise(sales=sum(SALES)) %>%
data.frame()
TOTALSALES$day <- TOTALSALES$day %>%
as.POSIXlt(, tz="America/Sao_Paulo") %>%
format("%d/%m/%Y")
TOTALSALES$sales <- TOTALSALES$sales %>%
format(digits=9, decimal.mark=",",nsmall=2,big.mark = ".")
TOTALSALES$day <- as.Date(df$DATE, format = "%d/%m/%Y")
Any idea how I can solve this problem, or a pointer to how it should be done?
I'd appreciate any help.
I'm not sure I understand your question.
To print a Date object in a particular date-time format you can use format
# This *converts* a character vector/factor to a vector of Dates
df$DATE <- as.Date(df$DATE, format = "%Y-%m-%d")
# This *prints* the Date vector as a character vector with format "%d/%m/%Y"
format(df$DATE, format = "%d/%m/%Y")
Minimal example
ss <- c("2017-09-04","2017-09-05")
date <- as.Date(ss, format = "%Y-%m-%d")
format(date, format = "%d/%m/%Y")
#[1] "04/09/2017" "05/09/2017"
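To make the distinction concrete, here is a small sketch showing that the type and the printed format are separate things: a Date vector stays a Date for calculations, while format() always produces character output. The same applies to numbers. The variable names are chosen only for illustration.

```r
# Minimal sketch: keep the Date/numeric columns for computation and
# create formatted character copies only for display.
ss <- c("2017-09-04", "2017-09-05")
dates <- as.Date(ss, format = "%Y-%m-%d")
class(dates)          # "Date"
diff(dates)           # date arithmetic works on the Date vector

date_labels <- format(dates, format = "%d/%m/%Y")
class(date_labels)    # "character"

# Converting the labels back requires the matching format string.
back <- as.Date(date_labels, format = "%d/%m/%Y")

sales <- c(17356, 18022)
sales_labels <- format(sales, nsmall = 2, big.mark = ".", decimal.mark = ",")
class(sales)          # "numeric"
class(sales_labels)   # "character"
```

A common pattern is therefore to keep the typed columns in the data frame and apply format() only at the point of printing or exporting.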

Want to combine date and time in a column using R

I have the following dataframe
Date Time
10/03/2014 12.00.00
11/03/2014 13.00.00
12/03/2014 14.00.00
I want to create one single column as follows
DT
10/03/2014 12.00.00
11/03/2014 13.00.00
12/03/2014 14.00.00
when I run
data$DT <- as.POSIXct(paste(data$Date, data$Time), format="%d-%m-%Y %H:%M:%S")
I get a column DT with all NA values.
The format string has to match the separators used in the data (/ in the date, . in the time):
data$DT <- as.POSIXct(as.character(paste(data$Date, data$Time)), format="%d/%m/%Y %H.%M.%S")
OR
data$Time <- gsub('\\.',':',data$Time)
data$Date <- gsub('/','-',data$Date)
data$DT <- as.POSIXct(as.character(paste(data$Date, data$Time)), format="%d-%m-%Y %H:%M:%S")
Use the package lubridate:
data$DT <- with(data, dmy(Date) + hms(Time))
If you want the column to be a POSIXct, do the following after that:
data$DT <- as.POSIXct(data$DT)
This is probably a very common problem, so here is a reproducible answer using dplyr:
## reproducible example
library(dplyr)
library(magrittr)
DF <- data.frame(Date = c("10/03/2014", "11/03/2014", "12/03/2014"),
Time = c("12.00.00", "13.00.00", "14.00.00"))
DF_DT <- DF %>%
mutate(DateTime = paste(Date, Time)) %>%
mutate(across('DateTime', ~ as.POSIXct(.x, format = "%d/%m/%Y %H.%M.%S")))
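The reason the original attempt returned only NA can be sketched in base R: the format string must mirror the separators in the pasted text exactly. The helper name combine_date_time is hypothetical, and tz = "UTC" is an assumption made here so that the result does not depend on the local time zone.

```r
# Minimal sketch, assuming dates like "10/03/2014" and times like "12.00.00".
# combine_date_time is a hypothetical helper; tz = "UTC" is an assumption.
combine_date_time <- function(date, time, tz = "UTC") {
  as.POSIXct(paste(date, time), format = "%d/%m/%Y %H.%M.%S", tz = tz)
}

dt <- combine_date_time(c("10/03/2014", "11/03/2014"), c("12.00.00", "13.00.00"))
format(dt, "%Y-%m-%d %H:%M:%S")  # "2014-03-10 12:00:00" "2014-03-11 13:00:00"

# A format with the wrong separators does not match, so the result is NA.
mismatch <- as.POSIXct("10/03/2014 12.00.00",
                       format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
is.na(mismatch)  # TRUE
```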
