I have a large database with one of the columns containing dates with the following format: DD-MM-YYYY.
I would like to invert the date format, to something like this: YYYY-MM-DD.
Can someone tell me how I can do this using bash or R?
A possible solution:
library(tidyverse)
library(lubridate)
df <- data.frame(date=c("11-4-2021","5-6-2019"))
df %>%
  mutate(date2 = dmy(date)) # dmy() already returns a Date, which prints as YYYY-MM-DD
#> date date2
#> 1 11-4-2021 2021-04-11
#> 2 5-6-2019 2019-06-05
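If loading the tidyverse is not an option, the same conversion can be sketched in base R. This assumes every value is a complete day-month-year date; the names dates, parsed and iso are illustrative and not from the question. Values that do not match the format come back as NA.

```r
# Illustrative input: day-month-year strings, with or without zero padding
dates <- c("11-4-2021", "5-6-2019", "30-12-2021")

# Parse as day-month-year, then print as year-month-day
parsed <- as.Date(dates, format = "%d-%m-%Y")
iso <- format(parsed, "%Y-%m-%d")

print(iso)
```

Note that iso is a character vector; keep parsed if you need real Date values for sorting or arithmetic.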
In bash, we can use substring expansion (this assumes zero-padded DD-MM-YYYY values, since the offsets are fixed):
dmy=30-12-2021
echo "${dmy:6:4}-${dmy:3:2}-${dmy:0:2}" # 2021-12-30
or with read, which splits on the hyphens and so does not depend on fixed widths:
IFS="-" read -r d m y <<<"$dmy"
echo "$y-$m-$d"
I used R to solve my problem like this:
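df is a data frame with dates in the column "eventDate". The dates were in the format DD-MM-YYYY, and several cells held incomplete dates (e.g. MM-YYYY or YYYY).
library(tidyr)
# incomplete dates leave the right-most pieces as NA (separate() warns about this)
x <- separate(df, col = eventDate, into = c("day", "month", "year"), sep = "-")
# list the pieces in reverse order; na.rm = TRUE drops the missing ones
y <- x %>% unite("eventDate_2", year, month, day, remove = TRUE, sep = "-", na.rm = TRUE)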
y <- cbind(y, df$eventDate) # add the original column for comparing if it had worked and correct individual errors.
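The same idea (reverse whatever pieces are present) can also be sketched in base R without tidyr. This is only an illustration under the assumption that the column holds plain character values separated by "-"; event_date and reverse_parts are made-up names.

```r
# Illustrative input: full and incomplete dates, as described above
event_date <- c("30-12-2021", "12-2021", "2021")

# Split each value on "-", reverse the pieces, and glue them back together
reverse_parts <- function(x) {
  pieces <- strsplit(x, "-", fixed = TRUE)
  vapply(pieces, function(p) paste(rev(p), collapse = "-"), character(1))
}

reversed <- reverse_parts(event_date)
print(reversed)
```

Because nothing is parsed as a date, incomplete values survive as "2021-12" or "2021" instead of turning into NA.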
I have a data frame with a column of numbers that represent a date. So 110190-1111 is ddmmyy-xxxx, where the x's don't matter. It is implicit that the century is 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to:
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.
as.Date ignores any trailing characters after the part matched by the format, so the ID column can be parsed directly:
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)
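One caveat with %y: R maps two-digit years 00 to 68 to 20xx and 69 to 99 to 19xx. Since the question says the century is always 1900, a more defensive sketch is to insert the "19" explicitly before parsing. The fourth ID below is an invented example to show the difference; with_century is an illustrative name.

```r
# The first three IDs are from the question; the last one is a made-up
# example with a two-digit year below 69
ID <- c("110190-1111", "220391-1111", "241287-1111", "010105-1111")

# Build ddmm + "19" + yy, then parse with a four-digit year
with_century <- paste0(substr(ID, 1, 4), "19", substr(ID, 5, 6))
Date <- as.Date(with_century, format = "%d%m%Y")

print(Date)
```

With the plain "%d%m%y" format the last ID would be read as 2005 rather than 1905.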
I have a bunch of dates in a df column in the following format: dd.mm.yyyy
I want it to look like this: 01/2020 (mm/yyyy)
How can I remove the day from all of the dates?
Use format to specify the date format you'd like
date <- as.Date("13/01/2020", format = "%d/%m/%Y")
format(date, "%m/%Y")
[1] "01/2020"
Edit - applying to dataframe column
dates <- c("13/01/2020", "17/02/2015", "13/03/2013")
df <- data.frame(dates, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%d/%m/%Y")
df$dates_format <- format(df$dates, "%m/%Y")
df
dates dates_format
1 2020-01-13 01/2020
2 2015-02-17 02/2015
3 2013-03-13 03/2013
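The question mentions dots as separators (dd.mm.yyyy), while the example above uses slashes. A minimal variant for the dotted form, assuming all values are complete dates, only needs a different input format; the sample values are the same dates rewritten with dots.

```r
# Same sample dates as above, written with dots instead of slashes
dates <- c("13.01.2020", "17.02.2015", "13.03.2013")

# The input format describes how the text is written ...
parsed <- as.Date(dates, format = "%d.%m.%Y")

# ... and the output format describes how it should be printed
month_year <- format(parsed, "%m/%Y")

print(month_year)
```

The result of format() is character, not Date, so it sorts alphabetically rather than chronologically.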
Besides format, as suggested by @Greg, another option is to use sub, as below:
> sub(".*?/","","13/01/2020")
[1] "01/2020"
Here is a solution using lubridate.
library(lubridate)
#Set the desired format (mm-yyyy) as my_stamp
my_stamp <- stamp("02-2019", orders = "my")
#A df with a column full of dates
df <- data.frame(dates = c("30/04/2020","29/03/2020","28/02/2020"))
#Change the column from string to date format
df$dates<-dmy(df$dates)
#Apply the format you desire to the dates (i.e., only month and year)
df$dates<-my_stamp(df$dates)
# dates
#1 04-2020
#2 03-2020
#3 02-2020
There are explicit date formatting options in R (see answer by Greg). Another option would be to separate the date into 3 columns, and then recombine the month and year, putting a / in between. Note this leaves the new date column in character format, which you may want to change depending on your needs.
library(tidyr)
df <- data.frame(date = "13/01/2020")
df <- separate(df, date, into = c("day","month","year"), sep = "/")
df$newdate <- paste(df$month, df$year, sep = "/")
I am an aspiring data scientist, and this will be my first ever question on Stack Overflow.
I have this line of code to help wrangle my data. My date filter is static, and I would prefer not to have to go in and change this hardcoded value every year. What is the best alternative to make my date filter more dynamic? The date column is also difficult to work with because it is not a "date", it is a "dbl".
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
Tried so far:
df %>%
filter(DATE >= 20191231)
# load packages (lubridate for dates)
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
This looks like this:
DATE
1 20191230
2 20191231
3 20200122
# and now...
df %>% # take the dataframe
mutate(DATE = ymd(DATE)) %>% # turn the DATE column actually into a date
filter(DATE >= floor_date(Sys.Date(), "year") - days(1))
...and keep the rows where DATE is on or after the day before the first day of the current year (floor_date(Sys.Date(), "year") is January 1 of this year). The output below is from a run in 2020:
DATE
1 2019-12-31
2 2020-01-22
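For comparison, here is a rough base R sketch of the same dynamic cutoff without lubridate. The helper cutoff_for is hypothetical; its today argument exists only so the example gives the same result whenever it is run.

```r
# Hypothetical helper: the last day of the year before `today`
cutoff_for <- function(today = Sys.Date()) {
  as.Date(paste0(format(today, "%Y"), "-01-01")) - 1
}

df <- data.frame(DATE = c(20191230, 20191231, 20200122))

# Numeric yyyymmdd values are converted to text first, then parsed
df$DATE <- as.Date(as.character(df$DATE), format = "%Y%m%d")

# Fixing `today` makes the result reproducible; omit it to use the current date
kept <- df[df$DATE >= cutoff_for(as.Date("2020-06-01")), , drop = FALSE]

print(kept)
```

In real use you would call cutoff_for() with no argument, so the cutoff moves forward by itself each year.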
I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be because the object created by strptime() is a POSIXlt, which is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally I would use dplyr to mutate the values of the original column. In combination with lubridate it works for me (at least I think this is what you wanted):
df <- df %>% mutate(date =ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
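Another option, sketched here in base R, is to store the column as POSIXct instead of POSIXlt. POSIXct is a single numeric vector rather than a list of components, so it fits into one data frame column and can be assigned with either df[, 1] or df$date. The tz = "UTC" setting is an assumption; use whatever zone the timestamps were recorded in.

```r
x <- c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c("1234", "1238", "1250")
df <- data.frame(date = x, id = y, stringsAsFactors = FALSE)

# POSIXct is one number per timestamp, so it replaces the column cleanly
# (tz = "UTC" is an assumption, not something stated in the question)
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# order() only returns row positions; assign the result to keep the new order
df <- df[order(df$date), ]

print(df)
```

Note that df[order(df$date), ] on its own only prints the sorted rows; without the assignment the data frame keeps its original order.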
I'm new to R, so please no hate. I want to convert the below column of ints to a column of years
Convert this:
Date: int 189507 189508 189509 ...
To this:
Year: int 1895 1895 1895
Code
library(tidyverse)
library(lubridate)
df <- read_csv("noaa-central-park.csv")
year <- df$Date
df <- transform(df, year = as.Date(as.character(year), "%Y"))
tempByYears <- group_by(df, year)
Question: I still get a year-month-day format as shown below. How to fix?
Sources: Stackoverflow questions, group_by() video
I'm assuming that the value in Date is Year + Month, in the format %Y%m. In that case, it would be better not to read it into R as an integer. You could specify that Date be a character, for example.
I'm using df1 for the data frame variable name because df may cause confusion with the function of the same name.
df1 <- read_csv("noaa-central-park.csv",
col_types = cols(Date = col_character()))
Now assuming that every Date starts with a 4-digit year, the simplest way to get year is to extract the first 4 characters and convert to numeric:
df1 <- df1 %>%
  mutate(year = as.numeric(substring(Date, 1, 4)))
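If Date has already been read as an integer, integer arithmetic is a possible alternative to re-reading the file. This sketch assumes every value really is a six-digit yyyymm number; the three values are the ones shown in the question.

```r
# The three values shown in the question
Date <- c(189507L, 189508L, 189509L)

# Integer division by 100 drops the last two digits (the month)
year <- Date %/% 100L

# The remainder is the month
month <- Date %% 100L

print(year)
print(month)
```

Both results stay integers, which matches the "Year: int" column asked for.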