Convert Date formats in base R [duplicate] - r

Given two dates in a data frame that are in this format:
df <- tibble(date = c('25/05/95', '21/09/18'))
df$date <- as.Date(df$date)
How can I convert the dates into this format, date = c('1995-05-25', '2018-09-21'), with the year appearing first and in four-digit form, using only base R?
Here is my attempt. I successfully reversed the order, but still wasn't able to express the year in four-digit format:
df <- tibble(date_orig = c('25/05/1995', '21/09/2018'))
df$date <- as.Date(df$date_orig)
year_date <- format(df$date, '%d')
month_date <- format(df$date, '%m')
day_date <- format(df$date, '%y')
df$newdate <- as.Date(paste(paste(year_date, month_date, sep = '-'), day_date, sep = '-'))
df$newdate_final <- as.Date(df$newdate, '%Y-%m-%d')

You need to know which format your dates follow and look up the matching conversion specifications in ?strptime to convert them to Date objects. Since your required output is the standard way of representing dates, you do not need format().
as.Date(df$date_orig, "%d/%m/%Y")
#[1] "1995-05-25" "2018-09-21"
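The difference between %y (two-digit year) and %Y (four-digit year) matters here. A minimal base R sketch, assuming the input is a plain character vector; the names short_year and long_year are made up for illustration:

```r
# Two-digit years need %y; four-digit years need %Y.
short_year <- c("25/05/95", "21/09/18")
long_year  <- c("25/05/1995", "21/09/2018")

d1 <- as.Date(short_year, format = "%d/%m/%y")
d2 <- as.Date(long_year,  format = "%d/%m/%Y")

# Both are Date vectors and print as yyyy-mm-dd by default.
print(d1)
print(d2)

# format() is only needed for a non-default layout; it returns character.
alt <- format(d1, "%d.%m.%Y")
print(alt)
```

Note that %y has to guess the century (00 to 68 become 20xx, 69 to 99 become 19xx), so four-digit input is the safer choice when it is available.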

Related

Convert date format from dd-mm-yyyy to dd/mm/yyyy

I would like to convert my date format from dd-mm-yyyy to dd/mm/yyyy.
Data:
date
1 22-Jul-2020
Current code:
format(as.Date(df$date, '%d:%m:%Y'), '%d/%m/%Y' )
[1] NA NA
Desired Output:
date
1 22/07/2020
The format in as.Date should match the input format: %d followed by -, then the abbreviated month name (%b), followed by - and the four-digit year (%Y).
df$date <- format(as.Date(df$date, '%d-%b-%Y'), '%d/%m/%Y' )
df$date
#[1] "22/07/2020"
data
df <- structure(list(date = "22-Jul-2020"), class = "data.frame", row.names = "1")
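To see why the original call returned NA, here is a small sketch comparing a mismatched and a matching format. It assumes an English-style month abbreviation; since %b is locale-dependent, the sketch sets the "C" locale first (an assumption about your system, adjust as needed):

```r
# %b month abbreviations depend on the locale; use "C" for English names.
invisible(Sys.setlocale("LC_TIME", "C"))

x <- "22-Jul-2020"

# The separators and month style in the format must mirror the input:
# ':' and %m do not match '-' and 'Jul', so the result is NA.
bad <- as.Date(x, format = "%d:%m:%Y")
print(is.na(bad))

# A matching format parses; format() then renders the new layout as text.
good <- as.Date(x, format = "%d-%b-%Y")
out  <- format(good, "%d/%m/%Y")
print(out)
print(class(out))
```

The result of format() is a character vector, not a Date, so do any date arithmetic or sorting before this final step.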
You can try
library(lubridate)
df <- data.frame(date = c("22-Jul-2020"))
df$date <- dmy(df$date)
df$date <- format(df$date, format = "%d/%m/%Y")
# date
#1 22/07/2020

Changing date format (NA error)

So I have this data file which includes dates and other values. I've imported my data using the following code:
df <- read.csv(file.choose(), header=T, stringsAsFactors=F)
This is so that all the values in the data frame are in character. This makes the next step easier for me.
The data.frame (df) includes:
date x
20020102 1
20020102 2
The date changes every few thousand rows.
I want to change the date format so that it would be yyyy-mm-dd.
I've tried using the code:
df$date <- as.Date(df$date, format="%Y-%m-%d")
and have also used
df$date <- strptime(df$date, format="%Y-%m-%d")
but have always gotten NA values in the date column.
I'm a beginner at R so it would be very helpful if the solution could be simple or can be explained clearly.
Thanks so much!
You can use the correct format
df$date <- as.Date(df$date, format='%Y%m%d')
It is not clear whether you have a numeric or non-numeric 'date' column. If it is 'numeric', convert it to 'character' first:
df$date <- as.Date(as.character(df$date), format='%Y%m%d')
However, strptime works even if the column is numeric, because it coerces its input to character.
Or use lubridate:
library(lubridate)
ymd(df$date)
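As a sketch of why the format string matters, the following base R snippet compares the format from the question with one that mirrors the data. The vectors x_chr and x_num are invented sample values:

```r
x_chr <- c("20020102", "20020103")
x_num <- c(20020102, 20020103)

# Dashes in the format do not match the dash-free input, so every value is NA.
wrong <- as.Date(x_chr, format = "%Y-%m-%d")
print(wrong)

# A format that mirrors the input parses correctly.
d_chr <- as.Date(x_chr, format = "%Y%m%d")

# Numeric input: convert to character first, then parse.
d_num <- as.Date(as.character(x_num), format = "%Y%m%d")

# A Date already prints as yyyy-mm-dd, so no extra formatting is needed.
print(d_chr)
print(d_num)
```

The format argument describes how the input looks, not how the output should look.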
The problem is that your column "date" is not of class 'Date'; it is a 'numeric' vector (read.csv reads all-digit columns as numbers even with stringsAsFactors=F), so as.Date cannot convert it with that format.
You can check the class of the date column with this command:
class(df$date)
Following the advice from @akrun, you should first convert the date column into a 'character' vector; then you can convert it to the date format you want:
### your data example:
df <- data.frame(date = c(20020102, 20020102),
x = c(1,2))
class(df$date)
#> [1] "numeric"
# convert the date column to character
df$date <- as.character(df$date)
# Then, convert to the desired date format:
df$date <- as.Date(df$date, format='%Y%m%d')
# check the results:
df
#> date x
#> 1 2002-01-02 1
#> 2 2002-01-02 2
class(df$date)
#> [1] "Date"

R - How to subset a table between two specific dates?

I have eight years of hourly data values, and I would like to subset all the values within a specific year: for example, one data set for 2007, another for 2008, and so on. At the moment I have many problems with the date format, because when I specify a time period, I get a different date period.
Here is my table, LValley, and this is what I have tried:
LValley <- read.table("C:/LValley.txt", header=TRUE, dec = ",", sep="\t")
year2007 <- subset(LValley, date > as.Date("01.01.2007 01:00", "%d.%m.%Y %H:%M") & date < as.Date("01.02.2008 01:00", "%d.%m.%Y %H:%M"))
but it returns a different date period, and I would like exactly all the data from 2007.
I have also used the function from this example (Subset a dataframe between 2 dates), and I get the same results:
mydatefunc <- function(x,y){LValley[LValley$date >= x & LValley$date <= y,]}
DATE1 <- as.Date("01.01.2007 01:00", "%d.%m.%Y %H:%M")
DATE2 <- as.Date("01.01.2008 00:00", "%d.%m.%Y %H:%M")
Test2007 <- mydatefunc(DATE1,DATE2)
I would very much appreciate your help,
Kind regards,
Darwin
You need to convert the date column in the file to date class. For example:
LValley <- read.table("LValley.txt", header=TRUE,dec=",", sep="\t", stringsAsFactors=FALSE)
date1 <- as.Date(LValley$date, "%d.%m.%Y %H:%M")
Test2007 <- subset(LValley, date1>=DATE1 & date1 <=DATE2)
dim(Test2007)
#[1] 6249 4
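Note that as.Date discards the time of day. If the hourly boundaries matter, one alternative is to parse into POSIXct and use a half-open interval. This is only a sketch: the sample rows are invented, and the UTC time zone is an assumption, not something stated in the question:

```r
# Invented sample rows in the same "dd.mm.yyyy HH:MM" layout as the question.
LValley <- data.frame(
  date  = c("31.12.2006 23:00", "01.01.2007 01:00", "15.06.2007 12:00",
            "31.12.2007 23:00", "01.01.2008 01:00"),
  value = c(1.2, 3.4, 5.6, 7.8, 9.0),
  stringsAsFactors = FALSE
)

# Parse to date-times, keeping the hour (time zone is an assumption).
dt <- as.POSIXct(LValley$date, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Half-open interval [start, end) covers exactly the year 2007.
start <- as.POSIXct("2007-01-01 00:00", tz = "UTC")
end   <- as.POSIXct("2008-01-01 00:00", tz = "UTC")

year2007 <- LValley[dt >= start & dt < end, ]
print(year2007)
```

Using >= for the start and < for the end avoids having to decide what the last timestamp of the year is.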

Can I subset specific years and months directly from POSIXct datetimes?

I have time series data and I am trying to subset the following:
1) periods between specific years (beginning 12AM January 1 and ending 11pm December 31)
2) periods without specific months
These are two independent subsets I am trying to do.
Given the following dataframe:
test <- data.frame(seq(from = as.POSIXct("1983-03-09 01:00"), to = as.POSIXct("1985-01-08 00:00"), by = "hour"))
colnames(test) <- "DateTime"
test$Value<-sample(0:100,16104,rep=TRUE)
I can first create Year and Month columns and use these to subset:
# Add year column
test$Year <- as.numeric(format(test$DateTime, "%Y"))
# Add month column
test$Month <- as.numeric(format(test$DateTime, "%m"))
# Subset specific year (1984 in this case)
sub1 = subset(test, Year!="1983" & Year!="1985")
# Subset specific months (April and May in this case)
sub2 = subset(test, Month=="4" | Month=="5")
However, I am wondering if there is a better way to do this directly from the POSIXct datetimes (without having to first create the Year and Month columns). Any ideas?
# Keep only 1984 by excluding 1983 and 1985
sub1 <- subset(test, !(format(DateTime, "%Y") %in% c("1983", "1985")))
sub2 <- subset(test, as.numeric(format(DateTime, "%m")) %in% 4:5)
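A self-contained sketch of filtering directly on formatted POSIXct values, covering both goals from the question (keeping one year, and dropping specific months). The fixed UTC time zone and the placeholder Value column are assumptions made so the example is reproducible:

```r
# Hourly series over the same range as the question, pinned to UTC.
test <- data.frame(
  DateTime = seq(from = as.POSIXct("1983-03-09 01:00", tz = "UTC"),
                 to   = as.POSIXct("1985-01-08 00:00", tz = "UTC"),
                 by   = "hour")
)
test$Value <- seq_len(nrow(test))  # placeholder values

# Keep a single year: compare the formatted year as a string.
only_1984 <- subset(test, format(DateTime, "%Y", tz = "UTC") == "1984")

# Drop specific months: %m is zero-padded, so compare against "04" and "05".
no_apr_may <- subset(test,
                     !(format(DateTime, "%m", tz = "UTC") %in% c("04", "05")))

print(nrow(only_1984))
print(unique(format(no_apr_may$DateTime, "%m", tz = "UTC")))
```

Comparing strings avoids the as.numeric step, at the cost of having to remember the zero padding that %m produces.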

Converting multiple columns in an R dataframe to Date Format

I have a large data file where all the dates have been loaded as characters. I would like to change all the date columns to Date format. Most of the dates have "%y%m%d" format, some have "%Y%m%d" format. There are 25 columns of dates, so changing each one individually is inefficient.
I can do
df$DATE1 <- as.Date(df$DATE1, format ="%y%m%d")
df$DATE2 <- as.Date(df$DATE2, format ="%y%m%d")
etc., but that is very bad coding.
I tried the following code, but it is not working. This assumes all of the dates are of the format "%y%m%d". Using grep("DATE", names(df)) will get all the date columns:
df[ , grep("DATE", names(df))] <- as.Date(df[ , grep("DATE", names(df))], "%y%m%d")
Try:
cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], as.Date, format = "%y%m%d")
Example:
df <- data.frame(DATE1 = c('910812', '900928'), DATE2 = c('890813', '890910'))
# Apply the above and you get:
# > df
# DATE1 DATE2
# 1 1991-08-12 1989-08-13
# 2 1990-09-28 1989-09-10
# > class(df[, 1])
# [1] "Date"
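The question also mentions that some values use "%Y%m%d" rather than "%y%m%d". One possible way to handle both is to pick the format by string length. This is only a sketch: parse_mixed_date is a hypothetical helper, not a base R function, and it assumes 6-character values are yymmdd and 8-character values are yyyymmdd:

```r
# Hypothetical helper: choose the format from the number of characters.
parse_mixed_date <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))        # anything unrecognised stays NA
  short <- !is.na(x) & nchar(x) == 6        # assumed yymmdd
  long  <- !is.na(x) & nchar(x) == 8        # assumed yyyymmdd
  out[short] <- as.Date(x[short], format = "%y%m%d")
  out[long]  <- as.Date(x[long],  format = "%Y%m%d")
  out
}

# Invented sample data mixing both layouts.
df <- data.frame(DATE1 = c("910812", "19900928"),
                 DATE2 = c("890813", "20010910"),
                 x     = 1:2,
                 stringsAsFactors = FALSE)

# Convert every column whose name starts with DATE, one column at a time.
cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], parse_mixed_date)

print(df)
print(sapply(df[cols], class))
```

lapply is used because as.Date works on one vector at a time; passing several data frame columns to it in a single call does not convert them column by column.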
