Converting ddmmyy-xxxx to date in R - r

I have a data frame with a column of numbers that represent dates. For example, 110190-1111 is ddmmyy-xxxx, where the x's don't matter. The century is implicitly 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to:
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.

as.Date ignores junk at the end so
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
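One caveat with %y: on input, R maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx, so an ID such as 010150-1111 would be read as 2050. Since the question states the century is always 1900, a minimal base R sketch is to insert "19" before the two-digit year and parse with %Y instead. The third ID below is a made-up example added only to show the difference.

```r
# IDs in ddmmyy-xxxx form; the last one is a hypothetical pre-1969 example
ID <- c("110190-1111", "220391-1111", "010150-1111")

# Build ddmm + "19" + yy, dropping everything after the sixth character
full <- paste0(substr(ID, 1, 4), "19", substr(ID, 5, 6))

# Parse with a four-digit year so no century guessing takes place
dates <- as.Date(full, format = "%d%m%Y")
print(dates)
```

With plain "%d%m%y" the third value would come out as 2050-01-01 rather than 1950-01-01.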

We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)

Related

Set 2 digits in Month

I have a DF and I would like to create a column with YEAR and MONTH, but setting 2 digits for the month. See my code:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
DF_1 <- data.frame(ID, DATE)
Adding the YEAR and MONTH column:
DF_2 <- DF_1 %>%
mutate(YEAR_MONTH = paste(lubridate::year(DATE),
lubridate::month(DATE),
sep = ""))
As you can see, for IDs 444 and 555 the month is shown with only one digit. I would like it to look like this:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
YEAR_MONTH <- c('202110','202111','202112','202201','202202')
DF_3 <- data.frame(ID, DATE, YEAR_MONTH)
How would I go about treating these months that are showing up with just one digit?
Grateful.
Instead of using lubridate's year and month, we can use format directly, which returns the 4-digit year and 2-digit month as a string. The lubridate functions return a numeric/integer value, which cannot carry a leading 0 as padding.
library(dplyr)
DF_1 <- DF_1 %>%
mutate(YEAR_MONTH = format(DATE, "%Y%m"))
Or using base R
DF_1$YEAR_MONTH <- with(DF_1, format(DATE, "%Y%m"))
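If the year and month are already available as numbers (as lubridate::year and lubridate::month return them), sprintf can do the zero padding. This is only a sketch in base R; the variable names year_num and month_num are illustrative and not from the question.

```r
# Two of the dates from the question, parsed as Date
DATE <- as.Date(c("10/10/2021", "20/01/2022"), format = "%d/%m/%Y")

# Numeric year and month, comparable to what lubridate::year()/month() return
year_num  <- as.integer(format(DATE, "%Y"))
month_num <- as.integer(format(DATE, "%m"))

# %02d pads the month to two digits with a leading zero
YEAR_MONTH <- sprintf("%d%02d", year_num, month_num)
print(YEAR_MONTH)
```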

How to Invert date format

I have a large database with one of the columns containing dates with the following format: DD-MM-YYYY.
I would like to invert the date format, to something like this: YYYY-MM-DD.
Can someone tell me how I can do it using bash OR R?
A possible solution:
library(tidyverse)
library(lubridate)
df <- data.frame(date=c("11-4-2021","5-6-2019"))
df %>%
mutate(date2 = dmy(date) %>% ymd)
#> date date2
#> 1 11-4-2021 2021-04-11
#> 2 5-6-2019 2019-06-05
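The same conversion can be sketched in base R without extra packages, assuming every value is a complete day-month-year date. Note that the result of format is a character vector, not a Date.

```r
# Same sample values as above, in day-month-year order
date <- c("11-4-2021", "5-6-2019")

# Parse as Date, then write back out as zero-padded YYYY-MM-DD text
date2 <- format(as.Date(date, format = "%d-%m-%Y"), "%Y-%m-%d")
print(date2)
```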
In bash, we can use string manipulation:
dmy=30-12-2021
echo "${dmy:6:4}-${dmy:3:2}-${dmy:0:2}" # 2021-12-30
or with read:
IFS="-" read -r d m y <<<"$dmy"
echo "$y-$m-$d"
I used R to solve my problem like this:
df is a data.frame with dates in the column "eventDate". Dates were in the format DD-MM-YYYY. There were several cells with incomplete dates (e.g. MM-YYYY or YYYY).
library(tidyr)
x <- separate(df, col = eventDate, into = c("day", "month", "year"), sep="-")
y <- x %>% unite("eventDate_2", year, month, day, remove=TRUE, sep="-", na.rm= TRUE)
y <- cbind(y, df$eventDate) # add the original column for comparing if it had worked and correct individual errors.

R: How to remove the day from a date? [duplicate]

I have a bunch of dates in a df column in the following format: dd.mm.yyyy
I want it to look like this: 01/2020 (mm.yyyy)
How can I remove the day from all of the dates?
Use format to specify the date format you'd like
date <- as.Date("13/01/2020", format = "%d/%m/%Y")
format(date, "%m/%Y")
[1] "01/2020"
Edit - applying to dataframe column
dates <- c("13/01/2020", "17/02/2015", "13/03/2013")
df <- data.frame(dates, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%d/%m/%Y")
df$dates_format <- format(df$dates, "%m/%Y")
df
dates dates_format
1 2020-01-13 01/2020
2 2015-02-17 02/2015
3 2013-03-13 03/2013
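The question describes the input as dd.mm.yyyy, while the example above uses slashes. A minimal sketch for dot-separated input follows; only the format string passed to as.Date changes. The sample values are adapted from the example above.

```r
# Dot-separated dates, as described in the question
dates <- c("13.01.2020", "17.02.2015")

# The dots in the format string are matched literally
parsed <- as.Date(dates, format = "%d.%m.%Y")

# Keep only month and year; the result is character, not Date
month_year <- format(parsed, "%m/%Y")
print(month_year)
```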
Besides format by @Greg, another option is using sub like below
> sub(".*?/","","13/01/2020")
[1] "01/2020"
Here is a solution using lubridate.
library(lubridate)
#Set the desired format (mm-yyyy) as my_stamp
my_stamp<-stamp( "02-2019",
orders = "my")
#A df with a column full of dates
df <- data.frame(dates = c("30/04/2020","29/03/2020","28/02/2020"))
#Change the column from string to date format
df$dates<-dmy(df$dates)
#Apply the format you desire to the dates (i.e., only month and year)
df$dates<-my_stamp(df$dates)
# dates
#1 04-2020
#2 03-2020
#3 02-2020
There are explicit date formatting options in R (see answer by Greg). Another option would be to separate the date into 3 columns, and then recombine the month and year, putting a / in between. Note this leaves the new date column in character format, which you may want to change depending on your needs.
library(tidyr)
df <- data.frame(date = "13/01/2020")
df <- separate(df, date, into = c("day","month","year"), sep = "/")
df$newdate <- paste(df$month, df$year, sep = "/")

Replace characters with dates in dataframe in r

I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be due to the fact that the object created by strptime(), a POSIXlt, is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally I would use dplyr to mutate the values of the original cell. In combination with lubridate it works for me (at least I think this is what you wanted):
df <- df %>% mutate(date =ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
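Another option, sketched here in base R, is to convert with as.POSIXct instead of strptime. A POSIXct value is a plain numeric vector rather than a list of components, so the original df[,1] assignment works and the column sorts chronologically. The tz = "UTC" argument is an assumption; use whatever time zone the data was recorded in.

```r
# Data from the question
df <- data.frame(date = c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20"),
                 id = c("1234", "1238", "1250"),
                 stringsAsFactors = FALSE)

# POSIXct is an atomic vector, so it fits into a single column
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Ordering now follows the actual date-time, not the text
df <- df[order(df$date), ]
print(df)
```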

Convert column of ints to year

I'm new to R, so please no hate. I want to convert the below column of ints to a column of years
Convert this:
Date: int 189507 189508 189509 ...
To this:
Year: int 1895 1895 1895
Code
library(tidyverse)
library(lubridate)
df <- read_csv("noaa-central-park.csv")
year <- df$Date
df <- transform(df, year = as.Date(as.character(year), "%Y"))
tempByYears <- group_by(df, year)
Question: I still get a year-month-day format as shown below. How to fix?
Sources: Stackoverflow questions, group_by() video
I'm assuming that the value in Date is Year + Month, in the format %Y%m. In that case, it would be better not to read it into R as an integer. You could specify that Date be a character, for example.
I'm using df1 for the data frame variable name because df may cause confusion with the function of the same name.
df1 <- read_csv("noaa-central-park.csv",
col_types = cols(Date = col_character()))
Now assuming that every Date starts with a 4-digit year, the simplest way to get year is to extract the first 4 characters and convert to numeric:
df1 <- df1 %>%
mutate(year = as.numeric(substring(Date, 1, 4)))
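If the column has already been read as an integer, integer arithmetic gives the same result without any string handling. This is a sketch under the same assumption that every value is a 4-digit year followed by a 2-digit month; the sample vector is made up to resemble the data in the question.

```r
# Hypothetical values in YYYYMM form, stored as integers
Date <- c(189507L, 189508L, 189601L)

# Integer division drops the two month digits
year <- Date %/% 100L

# The remainder is the month, should it be needed later
month <- Date %% 100L

print(year)
print(month)
```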

Resources