Error in converting datetime in factor format into date - r

I am new to R and am using it for some time series analysis.
The column "date" is stored as a factor. I tried both as.Date() and ymd() to convert it to Date format (like 2015-04-01), but both functions give me wrong answers.
My data$Date looks like this:
(screenshot not preserved)
and the wrong result looks like this:
(screenshot not preserved)
Why does this happen, and how can I fix it?
Thanks a lot!

Code
library(dplyr)
library(stringr)
## assuming that your date dataframe is like this
df <- data.frame(Date = as.factor(c("01/01/2016 16:00:00","02/01/2022 16:00:00","05/01/2022 16:00:00", "21/12/2022 16:00:00","03/09/2021 16:00:00", "21/12/2017 16:00:00")))
df$Date %>%
as.character() %>%
str_remove_all(" .*") %>%
as.POSIXct(format = "%d/%m/%Y") %>%
strftime() -> df$Date
Output
> df
Date
1 2016-01-01
2 2022-01-02
3 2022-01-05
4 2022-12-21
5 2021-09-03
6 2017-12-21
hope this helps
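Note that the pipeline above ends in strftime(), so df$Date holds character strings rather than Date values. If a true Date column is needed, a minimal base R sketch could look like this. It assumes the real data uses the same day/month/year layout as the sample data in this answer:

```r
# sample data in the same layout as the answer above (an assumption about the real data)
df <- data.frame(Date = as.factor(c("01/01/2016 16:00:00", "21/12/2022 16:00:00")))

# convert the factor to character first, then parse;
# as.Date() ignores the trailing time once the format has been matched
df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")

class(df$Date)  # "Date"
df
```

Going through as.character() first makes sure the labels of the factor are parsed, not its internal integer codes.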

Related

Merging two lines of code into one using indexing or any other tool

I have two different lines of code that do the same thing for two different columns. I tried several ways to merge them into a single line, but every time I get an error.
The code just transforms a datetime column from chr to dttm:
df[["started_at"]] <- as.POSIXct(df[["started_at"]], format = "%Y-%m-%d %H:%M:%S") %>% ymd_hms()
df[["ended_at"]] <- as.POSIXct(df[["ended_at"]], format = "%Y-%m-%d %H:%M:%S") %>% ymd_hms()
If you are comfortable with the package dplyr, you can use mutate() with across().
Input
I've created a dummy dataframe df for demonstration.
library(dplyr)
library(lubridate)
# dummy dataframe
df <- tibble(started_at = "2020-01-30 11:11:11",
ended_at = "2020-12-06 15:43:26",
ID = "123")
# A tibble: 1 × 3
started_at ended_at ID
<chr> <chr> <chr>
1 2020-01-30 11:11:11 2020-12-06 15:43:26 123
Solution
df <- df %>% mutate(across(c(started_at, ended_at),
~ as.POSIXct(.x, format = "%Y-%m-%d %H:%M:%S") %>%
ymd_hms()))
# A tibble: 1 × 3
started_at ended_at ID
<dttm> <dttm> <chr>
1 2020-01-30 11:11:11 2020-12-06 15:43:26 123
Any of
df %>% mutate(across(c(started_at, ended_at), as.POSIXct))
df %>% mutate(across(c(started_at, ended_at), ymd_hms))
will coerce to class "POSIXct".
If you know that the date/time columns are the only ones ending in "_at", you can simplify the code above to either of
df %>% mutate(across(ends_with("_at"), as.POSIXct))
df %>% mutate(across(ends_with("_at"), ymd_hms))
In both cases, the rule is:
If you want to avoid loading another package, lubridate, just for this, use the line that calls as.POSIXct.
If you need more date and time functions, loading lubridate is probably a good idea.
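One difference worth checking between the two options is the time zone: as.POSIXct() uses the session's local time zone unless tz is given, whereas ymd_hms() defaults to UTC. Here is a small sketch using the dummy df from above. Passing tz = "UTC" explicitly is an addition for illustration, not part of the original code:

```r
library(dplyr)
library(lubridate)

# same dummy data as in the answer above
df <- tibble(started_at = "2020-01-30 11:11:11",
             ended_at   = "2020-12-06 15:43:26",
             ID         = "123")

# base R parser with an explicit time zone
a <- df %>% mutate(across(ends_with("_at"), ~ as.POSIXct(.x, tz = "UTC")))

# lubridate parser, which uses UTC unless told otherwise
b <- df %>% mutate(across(ends_with("_at"), ymd_hms))

class(a$started_at)          # "POSIXct" "POSIXt"
attr(b$started_at, "tzone")  # "UTC"
```

If the timestamps were recorded in local time, it may be safer to set tz explicitly in both variants so the results do not depend on the machine running the code.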
As the OP showed base R code, here is a base R variant that transforms multiple columns at once with lapply:
df[c("started_at", "ended_at")] <- lapply(df[c("started_at", "ended_at")],
as.POSIXct)
The format argument is only needed if the input is not in the default format. For POSIXct/POSIXlt the default format is YYYY-MM-DD HH:MM:SS, which is the format shown in the OP's post.

How to Invert date format

I have a large database with one of the columns containing dates with the following format: DD-MM-YYYY.
I would like to invert the date format, to something like this: YYYY-MM-DD.
Can someone tell me how I can do it using bash or R?
A possible solution:
library(tidyverse)
library(lubridate)
df <- data.frame(date=c("11-4-2021","5-6-2019"))
df %>%
mutate(date2 = dmy(date) %>% ymd)
#> date date2
#> 1 11-4-2021 2021-04-11
#> 2 5-6-2019 2019-06-05
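The same conversion can also be sketched in base R without extra packages, assuming every value is a complete day-month-year date. The vector name dates is only for illustration:

```r
# day-month-year strings, as in the example above
dates <- c("11-4-2021", "5-6-2019")

# parse with the input layout, then print with the desired layout
parsed <- as.Date(dates, format = "%d-%m-%Y")
format(parsed, "%Y-%m-%d")
#> [1] "2021-04-11" "2019-06-05"
```

as.Date() already prints as YYYY-MM-DD, so format() is only needed when a character result is wanted.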
In bash, we can use string manipulation:
dmy=30-12-2021
echo "${dmy:6:4}-${dmy:3:2}-${dmy:0:2}" # 2021-12-30
or with read:
IFS="-" read -r d m y <<<"$dmy"
echo "$y-$m-$d"
I used R to solve my problem like this:
df is a data.frame with dates in the column "eventDate". Dates were in the format DD-MM-YYYY. Several cells had incomplete dates (e.g. MM-YYYY or YYYY).
library(tidyr)
# split on "-"; incomplete dates leave the trailing columns as NA
x <- separate(df, col = eventDate, into = c("day", "month", "year"), sep="-")
# paste the pieces back together in reverse order, dropping the NA parts
y <- x %>% unite("eventDate_2", c(year, month, day), remove=TRUE, sep="-", na.rm= TRUE)
y <- cbind(y, df$eventDate) # add the original column back to check the result and correct individual errors

Converting ddmmyy-xxxx to date in R

I have a data frame with a column of numbers that represent dates. So 110190-1111 is ddmmyy-xxxx, where the x's don't matter. It is implicit that the century is 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to:
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.
as.Date ignores junk at the end so
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
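One caveat with %y: on input, R maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx. The sample values are all in the second range, but if the data can contain years such as 05 and the century really is always 1900, the century can be spelled out before parsing. A minimal sketch, where the third ID and the helper names digits and full are made up for illustration:

```r
# the last ID has a two-digit year (05) that %y would read as 2005
ID <- c("110190-1111", "220391-1111", "241205-1111")

# keep the six date digits and insert the century "19" before the year
digits <- substr(ID, 1, 6)
full <- paste0(substr(digits, 1, 4), "19", substr(digits, 5, 6))

# parse with a four-digit year
as.Date(full, format = "%d%m%Y")
#> [1] "1990-01-11" "1991-03-22" "1905-12-24"
```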
We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)

R: How to remove the day from a date? [duplicate]

I have a bunch of dates in a df column in the following format: dd.mm.yyyy
I want it to look like this: 01/2020 (mm.yyyy)
How can I remove the day from all of the dates?
Use format to specify the date format you'd like
date <- as.Date("13/01/2020", format = "%d/%m/%Y")
format(date, "%m/%Y")
[1] "01/2020"
Edit - applying to dataframe column
dates <- c("13/01/2020", "17/02/2015", "13/03/2013")
df <- data.frame(dates, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%d/%m/%Y")
df$dates_format <- format(df$dates, "%m/%Y")
df
dates dates_format
1 2020-01-13 01/2020
2 2015-02-17 02/2015
3 2013-03-13 03/2013
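The question mentions dates written with dots (dd.mm.yyyy), while the examples above use slashes. The same approach should carry over by changing only the input format string. A short sketch with made-up sample values:

```r
# dates in the dd.mm.yyyy layout described in the question
dates <- c("13.01.2020", "17.02.2015")

# the dots are literal characters in the input format
parsed <- as.Date(dates, format = "%d.%m.%Y")

# the output format is independent of the input format
format(parsed, "%m/%Y")
#> [1] "01/2020" "02/2015"
```

The result of format() is a character vector, so it sorts alphabetically rather than chronologically.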
Besides format by @Greg, another option is using sub like below
> sub(".*?/","","13/01/2020")
[1] "01/2020"
Here is a solution using lubridate.
library(lubridate)
#Set the desired format (mm-yyyy) as my_stamp
my_stamp<-stamp( "02-2019",
orders = "my")
#A df with a column full of dates
df <- data.frame(dates = c("30/04/2020","29/03/2020","28/02/2020"))
#Change the column from string to date format
df$dates<-dmy(df$dates)
#Apply the format you desire to the dates (i.e., only month and year)
df$dates<-my_stamp(df$dates)
# dates
#1 04-2020
#2 03-2020
#3 02-2020
There are explicit date formatting options in R (see answer by Greg). Another option would be to separate the date into 3 columns, and then recombine the month and year, putting a / in between. Note this leaves the new date column in character format, which you may want to change depending on your needs.
library(tidyr)
df <- data.frame(date = "13/01/2020")
df <- separate(df, date, into = c("day","month","year"), sep = "/")
df$newdate <- paste(df$month, df$year, sep = "/")

Replace characters with dates in dataframe in r

I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be because the object created by strptime(), a POSIXlt, is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally I would use dplyr to mutate the values of the original column. In combination with lubridate it works for me (at least I think this is what you wanted):
library(dplyr)
library(lubridate)
df <- df %>% mutate(date = ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
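Another way around the error is to store the column as POSIXct instead of POSIXlt. A POSIXct value is a plain numeric vector underneath, so it can be assigned with df[, 1] as in the question. A minimal base R sketch on the question's sample data, where setting tz = "UTC" is an assumption made only to keep the result reproducible:

```r
# sample data from the question
x <- c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c("1234", "1238", "1250")
df <- data.frame(date = x, id = y)

# as.POSIXct() returns an atomic vector, so the assignment
# replaces one column with one column
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# order() now sorts chronologically; assign the result to keep it
df <- df[order(df$date), ]
df$id
#> [1] "1250" "1238" "1234"
```

Note that df[order(df$date), ] on its own only prints the sorted rows. The result has to be assigned back for df to stay sorted.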
