Changing date format with only months and year - R

I have a column with dates that are formatted like this:
yyyymm (e.g. 201809)
I want them to be formatted like this:
mm.yyyy (e.g. 09.2018)
I tried:
FF5factors$date <- strptime(FF5factors$date, format= "%Y%m")
format(FF5factors$date, format="%m.%Y")
But it only returns NA values.

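What about:
d <- '201809'
format(as.Date(paste0(d, '01'), '%Y%m%d'), '%m.%Y')
[1] "09.2018"
(%M means minutes, not month, and as.Date() needs a day of the month, hence the appended '01'.)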

Here are some alternatives. The question did not provide date in reproducible form, so we assume the first line below; the first four alternatives also work with date <- "201809" and with date <- factor(201809).
date <- 201809
# 1
sub("(....)(..)", "\\2.\\1", date)
## [1] "09.2018"
# 2
library(zoo)
format(as.yearmon(format(date), "%Y%m"), "%m.%Y")
## [1] "09.2018"
# 3
paste(substr(date, 5, 6), substr(date, 1, 4), sep = ".")
## [1] "09.2018"
# 4
format(as.Date(paste0(date, "01"), "%Y%m%d"), "%m.%Y")
## [1] "09.2018"
# 5
sprintf("%02d.%d", date %% 100, date %/%100)
## [1] "09.2018"
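Applied to a whole column, a sketch along the lines of alternative 4 might look like this. It assumes `FF5factors$date` holds numeric or character `yyyymm` values; the small `FF5factors` data frame below is a made-up stand-in. It also shows why the question's attempt returns NA: with only year and month, `as.Date()`/`strptime()` have no day of the month and cannot build a date.

```r
# Made-up stand-in for the question's FF5factors data (yyyymm values)
FF5factors <- data.frame(date = c(201809, 201810, 201901))

# "%Y%m" alone gives NA: a complete date also needs a day of the month
is.na(as.Date("201809", format = "%Y%m"))
# [1] TRUE

# Append day "01", parse the full date, then format it as mm.yyyy
parsed <- as.Date(paste0(FF5factors$date, "01"), format = "%Y%m%d")
FF5factors$date_fmt <- format(parsed, "%m.%Y")
FF5factors$date_fmt
# [1] "09.2018" "10.2018" "01.2019"
```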

Related

Parsing dates in R with weird format

Hi guys, I have a list of dates in this weird format: X1.22.20 X1.23.20 (month/day/year),
and I would like to have "2020-06-11" ('%d %b %Y').
I tried this:
> min.date <- min(dates)
> max.date <- max(dates)
> min.date.txt <- min.date %>% format('%d %b %Y')
> max.date.txt <- max.date %>% format('%d %b %Y') %>% paste('UTC')
> min.date
[1] "2002-10-10"
The value is wrong, because I know for sure that there is no 2002 in this data.
Any help?
Thanks
Assuming the question is how to convert the input x shown below to Date class, use as.Date with a format that corresponds to the input: it must start with X, have dots where the input has dots, and so on. See ?strptime for documentation on the percent codes.
x <- c("X1.22.20", "X1.23.20") # input
as.Date(x, format = "X%m.%d.%y")
## [1] "2020-01-22" "2020-01-23"
Note that if you got those dates like this:
Lines <- "1.22.20 1.23.20
1 2
3 4"
read.table(text = Lines, header = TRUE)
## X1.22.20 X1.23.20
## 1 1 2
## 2 3 4
then the X can be avoided using check.names = FALSE as follows:
read.table(text = Lines, header = TRUE, check.names = FALSE)
## 1.22.20 1.23.20
## 1 1 2
## 2 3 4
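When the dates are column names, as in the read.table() example, they can be converted directly, and once they are real `Date` values, `min()`, `max()` and `format()` behave as the question expected. A short sketch reusing `Lines` from above; note that `%b` gives a locale-dependent month abbreviation (e.g. "Jan" in an English locale).

```r
Lines <- "1.22.20 1.23.20
1 2
3 4"

# Default check.names = TRUE adds an "X" prefix, so keep "X" in the format
df <- read.table(text = Lines, header = TRUE)
dates <- as.Date(names(df), format = "X%m.%d.%y")
dates
# [1] "2020-01-22" "2020-01-23"

# With check.names = FALSE the names have no "X", so drop it from the format
df2 <- read.table(text = Lines, header = TRUE, check.names = FALSE)
as.Date(names(df2), format = "%m.%d.%y")
# [1] "2020-01-22" "2020-01-23"

# On Date values, min()/max() compare dates rather than strings
min.date.txt <- format(min(dates), "%d %b %Y")
max.date.txt <- paste(format(max(dates), "%d %b %Y"), "UTC")
```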

How to change the date format & remove rows from dataframe before certain date R Studio

I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code below, it puts them in the correct R format, but it does not keep the correct date: for some reason, it sets the year to 2020. None of the dates in my data frame are from 2020.
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017?
Thanks!
as.Date and basic ?Extraction are your friend here. Note the format: %Y (four-digit year), not %y (two-digit year), which is why you got 2020.
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
  mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
  select(-unformatted) %>%
  filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
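A quick check of where the year 2020 came from: `%y` reads a two-digit year, so for "07/01/2015" it consumes only "20" and ignores the trailing "15". The sketch also shows a base-R `subset()` variant of the filter that uses `>=`, so rows dated 1 January 2017 are kept; the three-row data frame is made up.

```r
x <- "07/01/2015"
as.Date(x, format = "%m/%d/%y")  # two-digit year: "20" becomes 2020
# [1] "2020-07-01"
as.Date(x, format = "%m/%d/%Y")  # four-digit year
# [1] "2015-07-01"

# Made-up sample data
dat <- data.frame(
  unformatted = c("07/01/2015", "01/01/2017", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")

# Keep rows on or after 2017-01-01
subset(dat, date >= as.Date("2017-01-01"))
```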

Finding previous month end date

From date 10/31/2018, I want to obtain 09/30/2018.
I tried:
PREV.PROD.DATE<-seq.Date(from=as.Date(chron("10/31/2018")),length=2,by="-1 months")
but it returns:
"2018-10-31" "2018-10-01"
How can I obtain 09/30 instead of 10/01?
Notes: I would like to avoid using an external package, and I would like the solution to work for any end-of-month date.
The integer units of Date are days, so you can do:
seq.Date(from=as.Date("2018-10-31"), length=2, by="-1 months") - c(0,1)
# [1] "2018-10-31" "2018-09-30"
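This only works here because the starting date is the 31st and September has 30 days. If you want the last day of the prior month for an arbitrary date: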
(d <- as.POSIXlt(Sys.Date()))
# [1] "2018-11-08 UTC"
d$mday <- 1L
as.Date(d) - 1
# [1] "2018-10-31"
Replace Sys.Date() with whatever single date you have. If you want to vectorize this:
(ds <- Sys.Date() + c(5, 20, 50))
# [1] "2018-11-13" "2018-11-28" "2018-12-28"
lapply(as.POSIXlt(ds), function(a) as.Date(`$<-`(a, "mday", 1L)) - 1L)
# [[1]]
# [1] "2018-10-31"
# [[2]]
# [1] "2018-10-31"
# [[3]]
# [1] "2018-11-30"
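The lapply() call above returns a list. A sketch of a base-R helper that returns a plain `Date` vector instead, using the same "first of the month minus one day" idea; `prev_month_end()` is a hypothetical name. The last lines show why the `seq.Date() - c(0, 1)` trick does not generalize.

```r
# Hypothetical helper: last day of the previous month (base R, vectorized)
prev_month_end <- function(x) {
  first_of_month <- as.Date(format(as.Date(x), "%Y-%m-01"))
  first_of_month - 1  # Date arithmetic is in days
}

ds <- as.Date(c("2018-10-31", "2018-09-30", "2016-03-31", "2018-01-15"))
prev_month_end(ds)
# [1] "2018-09-30" "2018-08-31" "2016-02-29" "2017-12-31"

# The seq.Date() - c(0, 1) trick gives the wrong answer for 2018-09-30
seq.Date(from = as.Date("2018-09-30"), length.out = 2, by = "-1 months") - c(0, 1)
# [1] "2018-09-30" "2018-08-29"
```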
I like to floor the date, making it the first day of its month, and then subtract 1 to get the last day of the previous month:
x = as.Date("2018-10-31")
library(lubridate)
floor_date(x, unit = "months") - 1
# [1] "2018-09-30"
Here's a version without using other packages:
as.Date(format(x, "%Y-%m-01")) - 1
# [1] "2018-09-30"
I don't use the chron function, but I think the duration function from lubridate can help you.
You don't need floor_date, which can be confusing.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(tidyverse)
better_date <-
str_split("10/31/2018",'/') %>%
.[[1]] %>%
.[c(3,1,2)] %>%
str_flatten(collapse = '-') %>%
as.Date()
(better_date - duration('1 months')) %>%
as.Date()
#> [1] "2018-09-30"
Created on 2018-11-09 by the reprex package (v0.2.1)
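A caveat on the duration approach, re-created in base R: lubridate's `duration('1 months')` is a fixed average month of 365.25 / 12 = 30.4375 days, not a calendar month. That happens to land on the right day for 2018-10-31, but not for every month end. The sketch below converts the duration to days by hand instead of calling lubridate (an assumption about how the two compare), and `minus_avg_month()` is a hypothetical helper.

```r
# Assumption: lubridate's duration('1 months') is a fixed 365.25 / 12 days
avg_month_days <- 365.25 / 12  # 30.4375

# Hypothetical helper: subtract one average month, then keep the calendar day
# (floor), which is what as.Date() does with the UTC date-time in the answer
minus_avg_month <- function(d) {
  as.Date(floor(as.numeric(as.Date(d)) - avg_month_days), origin = "1970-01-01")
}

minus_avg_month("2018-10-31")
# [1] "2018-09-30"   correct
minus_avg_month("2018-04-30")
# [1] "2018-03-30"   but the previous month end is 2018-03-31
```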

Convert date to day-of-week in R

I have a date in this format in my data frame:
"02-July-2015"
And I need to convert it to the day of the week (or, as with 183, the day of the year). Something like:
df$day_of_week <- weekdays(as.Date(df$date_column))
But this doesn't understand the format of the dates.
You could use lubridate to convert to day of week or day of year.
library(lubridate)
# "02-July-2015" is Thursday
date_string <- "02-July-2015"
dt <- dmy(date_string)
dt
## [1] "2015-07-02 UTC"
### Day of week : (1-7, Sunday is 1)
wday(dt)
## [1] 5
### Day of year (1-366; for 2015, only 365)
yday(dt)
## [1] 183
### Or a little shorter to do the same thing for Day of year
yday(dmy("02-July-2015"))
## [1] 183
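The question's own weekdays(as.Date(...)) attempt fails only because as.Date() is not told the format. Without lubridate, a base-R sketch might look like this; the data frame is made up, and it assumes an English `LC_TIME` locale, since `%B` matches month names in the current locale.

```r
# Made-up data shaped like the question's column
df <- data.frame(date_column = c("02-July-2015", "25-December-2015"))

# Tell as.Date() the format; %B matches full month names
d <- as.Date(df$date_column, format = "%d-%B-%Y")

df$day_of_week <- weekdays(d)             # names, e.g. "Thursday"
df$wday_num    <- as.POSIXlt(d)$wday      # 0-6, Sunday is 0
df$day_of_year <- as.POSIXlt(d)$yday + 1  # $yday counts from 0
df
```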
day = as.POSIXct("02-July-2015",format="%d-%b-%Y")
# see ?strptime for more information on date-time conversions
# Day of year as decimal number (001–366).
format(day,format="%j")
## [1] "183"
# Weekday as a decimal number (1–7, Monday is 1).
format(day,format="%u")
## [1] "4"
This builds on what anotherFishGuy suggested, converting the values with as.numeric so they can be fed to a classifier.
# day <- Sys.time()
as.num.format <- function(day, ...){
as.numeric(format(day, ...))
}
doy <- as.num.format(day,format="%j")
dow <- as.num.format(day,format="%u")
hour <- as.num.format(day, "%H")
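A usage sketch of `as.num.format()` that builds several numeric features at once; the timestamps are made up, and `tz = "UTC"` is fixed so the hour does not depend on the machine's time zone.

```r
as.num.format <- function(day, ...) {
  as.numeric(format(day, ...))
}

# Made-up timestamps
days <- as.POSIXct(c("2015-07-02 08:30:00", "2015-12-31 23:00:00"), tz = "UTC")

features <- data.frame(
  doy  = as.num.format(days, format = "%j"),  # day of year
  dow  = as.num.format(days, format = "%u"),  # weekday, Monday is 1
  hour = as.num.format(days, format = "%H")   # hour of day
)
features
```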

Filter data.table whether date-type column contains month

I have a lot of outliers in the months of January and December, hence I want to exclude them for now. Here's my data.table:
> str(statistics2)
Classes 'data.table' and 'data.frame': 1418 obs. of 4 variables:
$ status: chr "hire" "normal" "hire" "hire" ...
$ month : Date, format: "1993-01-01" "1993-01-01" ...
$ NOBS : int 37459 765 12 16 24 17 2 12 2 11 ...
I tried to create a condition that checks the month, but I get the following error.
format(statistics2['month'], "%m")
Error in `[.data.table`(statistics2, "month") :
typeof x.month (double) != typeof i.month (character)
Since your question specifically asks about data.table, there is a set of lubridate-like functions built into the data.table package (load the package and type ?month, for instance). You don't need format(...) or lubridate.
library(data.table)
DT <- data.table(status=c("hire","normal","hire"),
month=as.Date(c("1993-01-01","1993-06-01", "1993-12-01")),
NOBS=c(37459,765,12))
DT
# status month NOBS
# 1: hire 1993-01-01 37459
# 2: normal 1993-06-01 765
# 3: hire 1993-12-01 12
DT[!(month(month) %in% c(1,12))]
# status month NOBS
# 1: normal 1993-06-01 765
Well, if statistics2 is a data.frame
statistics2 <- data.frame(status=c("hire","normal","hire"),
month=as.Date(c("1993-01-01","1993-06-01", "1993-12-01")),
NOBS=c(37459,765,12)
)
then you should use
format(statistics2[["month"]], "%m")
# [1] "01" "06" "12"
(note the double brackets -- otherwise you're returning a list which format() cannot correctly interpret).
If statistics2 is a data.table
statistics2dt <- data.table(statistics2)
then I would have thought statistics2dt['month'] would have returned a different error, but the correct syntax in that case is
format(statistics2dt[, month], "%m")
# [1] "01" "06" "12"
(no quotes and a comma)
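For comparison, a base-R sketch of the filter the question is after (dropping January and December) with neither data.table nor lubridate; the small `statistics2` is made-up data shaped like the question's.

```r
# Made-up data shaped like the question's table
statistics2 <- data.frame(
  status = c("hire", "normal", "hire"),
  month  = as.Date(c("1993-01-01", "1993-06-01", "1993-12-01")),
  NOBS   = c(37459, 765, 12)
)

# "%m" gives the zero-padded month number as text; convert it to integer
month_num <- as.integer(format(statistics2[["month"]], "%m"))
statistics2[!(month_num %in% c(1, 12)), ]
```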
You could use lubridate to extract the months and exclude those from the data frame:
library(lubridate)
set.seed(0)
months <- round(runif(100, 1, 12), digits = 0)
years <- round(runif(100, 2013, 2014), digits = 0)
day <- round(runif(100, 2, 25), digits = 0)
dates <- paste(years, months, day, sep = "-")
dates <- as.Date(dates, "%Y-%m-%d")
NOBS <- round(runif(100, 1, 1000), digits = 0)
statistics2 <- cbind.data.frame(dates, NOBS)
months <- month(statistics2$dates)
excJanDec <- statistics2[!(months %in% c(1, 12)), ]
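A caveat when dropping rows with `-which(...)`: if no row matches, `which()` returns `integer(0)`, and indexing with its negation selects zero rows rather than all of them. A logical index with `!` does not have this problem. A minimal sketch with made-up data:

```r
df <- data.frame(m = c(3, 6, 9))  # no January or December rows

nrow(df[-which(df$m %in% c(1, 12)), , drop = FALSE])
# [1] 0   (every row is lost)

nrow(df[!(df$m %in% c(1, 12)), , drop = FALSE])
# [1] 3
```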
