Converting multiple columns in an R dataframe to Date format

I have a large data file where all the dates have been loaded as characters. I would like to change all the date columns to Date format. Most of the dates have "%y%m%d" format, some have "%Y%m%d" format. There are 25 columns of dates, so changing each one individually is inefficient.
I can do
df$DATE1 <- as.Date(df$DATE1, format ="%y%m%d")
df$DATE2 <- as.Date(df$DATE2, format ="%y%m%d")
etc., but that is poor coding style.
I tried the following code, but it is not working. This assumes all of the dates are of the format "%y%m%d". Using grep("DATE", names(df)) will get all the date columns.
df[ , grep("DATE", names(df))] <- as.Date(df[ , grep("DATE", names(df))], "%y%m%d")

Try:
# find the date columns once, then convert each of them in place
cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], as.Date, format = "%y%m%d")
Example:
df <- data.frame(DATE1 = c('910812', '900928'), DATE2 = c('890813', '890910'))
# Apply the above and you get:
# > df
# DATE1 DATE2
# 1 1991-08-12 1989-08-13
# 2 1990-09-28 1989-09-10
# > class(df[, 1])
# [1] "Date"
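The question also mentions that some dates use "%Y%m%d". Here is a minimal sketch of one way to handle both formats, assuming they can be told apart by string length (8 characters for four-digit years, 6 for two-digit years). The helper parse_ymd and the sample data are made up for illustration.

```r
# Hypothetical sample: DATE1 has two-digit years, DATE2 has four-digit years
df <- data.frame(DATE1 = c("910812", "900928"),
                 DATE2 = c("19890813", "19890910"),
                 stringsAsFactors = FALSE)

# parse_ymd is an illustrative helper, not part of base R
parse_ymd <- function(x) {
  x <- as.character(x)
  # Assumption: 8 characters means "%Y%m%d", anything else "%y%m%d"
  fmt <- ifelse(nchar(x) %in% 8, "%Y%m%d", "%y%m%d")
  as.Date(x, format = fmt)
}

cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], parse_ymd)
str(df)
```

If the format is fixed per column rather than per value, it may be simpler to keep two vectors of column names and call lapply once for each format.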

Related

Retain one column in a data frame as POSIXct and convert others to numeric in r

Leaving the date column I would like to convert rest of the columns in a data frame from chr to numeric. How could I achieve this? There are many columns in the data frame and below is only an extract. Thanks.
Date RECORD Battery_V Data_logger_Temp_C VWC_CS7
2021-06-25 12:34:00 0 12.47 14.14 0.127
Suppose we have the data frame shown reproducibly in the Note at the end. Then convert all columns except the first as shown. No packages are used.
DF2 <- replace(DF, -1, lapply(DF[-1], as.numeric))
or
DF2 <- DF
DF2[-1] <- lapply(DF2[-1], as.numeric)
or we can convert all character columns using:
ok <- sapply(DF, is.character)
DF2 <- replace(DF, ok, lapply(DF[ok], as.numeric))
or
DF2 <- DF
ok <- sapply(DF2, is.character)
DF2[ok] <- lapply(DF2[ok], as.numeric)
Note
Lines <- " Date RECORD Battery_V Data_logger_Temp_C VWC_CS7
2021-06-25T12:34:00 0 12.47 14.14 0.127"
DF <- read.table(text = Lines, header = TRUE, colClasses = "character",
strip.white = TRUE)
DF$Date <- as.POSIXct(DF$Date, format = "%Y-%m-%dT%H:%M:%S")
Or, with dplyr:
library(dplyr)
DF2 <- DF %>%
  mutate(across(.cols = -Date, .fns = as.numeric))

Convert Date formats in base R [duplicate]

Given two dates in a data frame that are in this format:
df <- tibble(date = c('25/05/95', '21/09/18'))
df$date <- as.Date(df$date)
How can I convert the dates into this format - date = c('1995-05-25', '2018-09-21') with the year appearing first and in four digit format, and by only using base R?
Here is my attempt, I successfully reversed the order, but still wasn't able to express the year in 4 digit format:
df <- tibble(date_orig = c('25/05/1995', '21/09/2018'))
df$date <- as.Date(df$date_orig)
year_date <- format(df$date, '%d')
month_date <- format(df$date, '%m')
day_date <- format(df$date, '%y')
df$newdate <- as.Date(paste(paste(year_date, month_date, sep = '-'), day_date, sep = '-'))
df$newdate_final <- as.Date(df$newdate, '%Y-%m-%d')
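You need to know which format your dates follow and look up the matching conversion specifications in ?strptime to convert them to Date objects. Since your required output is the standard way to represent dates, which is how Date objects print by default, you do not need format().
# "%Y" matches the four-digit years in date_orig; use "%d/%m/%y" for two-digit years such as '25/05/95'
as.Date(df$date_orig, "%d/%m/%Y")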
#[1] "1995-05-25" "2018-09-21"
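To make the difference between "%y" and "%Y" concrete, here is a small base R sketch. The vectors x2 and x4 are illustrative copies of the dates in the question, and the last line shows format() in case a non-default text layout is wanted.

```r
x2 <- c("25/05/95", "21/09/18")       # two-digit years
x4 <- c("25/05/1995", "21/09/2018")   # four-digit years

d2 <- as.Date(x2, format = "%d/%m/%y")  # %y: year without century
d4 <- as.Date(x4, format = "%d/%m/%Y")  # %Y: year with century

all(d2 == d4)           # both parse to the same dates
format(d4)              # default text form is year-month-day
format(d4, "%d.%m.%Y")  # format() is only needed for other layouts
```

Note that "%y" guesses the century (per ?strptime, values 00 to 68 are taken as 20xx and 69 to 99 as 19xx), so it is worth checking against your data.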

How to set timestamp as an index for a data frame in R

I have a dataframe called 'trial'. I have combined the Date and Time columns in the data frame to get a field which holds the timestamp as a POSIXct. I want to set this combined date-time (the timestamp) as the index for my data frame 'trial'. How can I do so? I have seen similar questions on this, with no success.
The code is as follows:
trial <- read.csv("2018_05_04_h093500.csv", header=TRUE, skip = 16, sep=",")
trial$Date <- with(trial, as.POSIXct(paste(as.Date(Date, format="%Y/%m/%d"), Time)))
dtPOSIXct <- as.POSIXct(trial$Date )
dtTime <- as.numeric(dtPOSIXct - trunc(dtPOSIXct, "days"))
class(dtTime) <- "POSIXct"
If you're talking about rownames, which are the R equivalent of an index in pandas, they can't be POSIXct datetimes; they have to be character strings.
# sample data
x <- data.frame('a' = 1:3, 'b' = c('a', 'b', 'c'))
dates <- c('2018-01-01 15:51:33', '2018-01-04 11:42:31', '2018-01-07 22:04:41')
# the POSIXct values are coerced to character when assigned as row names
rownames(x) <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
print(x)
# a b
#2018-01-01 15:51:33 1 a
#2018-01-04 11:42:31 2 b
#2018-01-07 22:04:41 3 c
print(class(rownames(x)[1]))
# [1] "character"
That said, you can still coerce them to POSIXct (or any other class, obviously) at the time of evaluation at the cost of some overhead and code clutteredness:
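# print x where rownames, when coerced to POSIXct, represent dates after d
f <- '%Y-%m-%d %H:%M:%S'
# format must be passed by name; the second positional argument of as.POSIXct is tz
d <- as.POSIXct('2018-01-03 00:00:00', format = f)
x[as.POSIXct(rownames(x), format = f) > d, ]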
# a b
#2018-01-04 11:42:31 2 b
#2018-01-07 22:04:41 3 c
However, perhaps an easier approach would be to just have an arbitrary column effectively act as a datetime index:
x$date <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
class(x$date[1])
# [1] "POSIXct" "POSIXt"
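With the timestamp kept as an ordinary column, filtering and sorting work directly on it. A minimal sketch, rebuilding the sample data so it runs on its own; tz = "UTC" is an assumption added here so the result does not depend on the local time zone.

```r
x <- data.frame(a = 1:3, b = c("a", "b", "c"))
dates <- c("2018-01-01 15:51:33", "2018-01-04 11:42:31", "2018-01-07 22:04:41")
f <- "%Y-%m-%d %H:%M:%S"

# Assumption: timestamps are interpreted as UTC
x$date <- as.POSIXct(dates, format = f, tz = "UTC")
d <- as.POSIXct("2018-01-03 00:00:00", format = f, tz = "UTC")

# rows with a timestamp after d
after_d <- x[x$date > d, ]

# rows ordered from newest to oldest
x_sorted <- x[order(x$date, decreasing = TRUE), ]

print(after_d)
print(x_sorted)
```

Compared with row names, this avoids converting back from character on every comparison.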

Change multiple character columns to date

I have multiple character columns (around 20) that I would like to change all to date formats and drop the time using r. I've tried loops, mutate and apply.
Here is some sample data using just two columns
col1 = c("2017-04-01 23:00:00", "2017-03-03 00:00:01", "2017-04-02 00:00:01")
col2 = c("2017-04-10 08:41:49", "2017-04-10 08:39:48", "2017-04-10 08:41:51")
df <- cbind(col1, col2)
I've tried:
df <- df %>% mutate(df, funs(ymd))
and
df <- df %>% mutate(df, funs(mdy))
Both gave me an error. I've also tried putting all column names in a list and do a
for(i in namedlist) {
as_date(df[i])
glimpse(df)
}
That didn't work either.
I've tried to use the answer from "Convert multiple columns to dates with lubridate and dplyr" and that did not work either. That post wanted only certain variables to be converted. I want all of my variables to be converted, so the var command doesn't apply.
Any suggestions to do this efficiently? Thank you.
If you're applying over all columns, you can do a very short call with lapply. I'll pass it here using data.table:
library( data.table )
setDT( df )
df <- df[ , lapply( .SD, as.Date ) ]
On your test data, this gives:
> df
col1 col2
1: 2017-04-01 2017-04-10
2: 2017-03-03 2017-04-10
3: 2017-04-02 2017-04-10
NOTE: your test data actually results in a matrix, so you need to convert it to a data.frame first (or directly to a data.table).
You can do the same thing with just base R, but I personally like the above solution better:
df <- as.data.frame( lapply( df, as.Date ) )
> df
col1 col2
1 2017-04-01 2017-04-10
2 2017-03-03 2017-04-10
3 2017-04-02 2017-04-10
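The note about the matrix can be sketched end to end in base R as follows. This is only an illustration on the question's sample data; the explicit format = "%Y-%m-%d" is added here to make the dropping of the time part visible, and m is a name introduced for the intermediate matrix.

```r
col1 <- c("2017-04-01 23:00:00", "2017-03-03 00:00:01", "2017-04-02 00:00:01")
col2 <- c("2017-04-10 08:41:49", "2017-04-10 08:39:48", "2017-04-10 08:41:51")

m <- cbind(col1, col2)    # cbind() on vectors gives a character matrix
is.matrix(m)

# convert the matrix to a data frame before applying a function per column
df <- as.data.frame(m, stringsAsFactors = FALSE)

# df[] <- keeps the data frame structure while replacing every column;
# only the leading year-month-day part is parsed, so the time is dropped
df[] <- lapply(df, as.Date, format = "%Y-%m-%d")
str(df)
```

Calling lapply directly on the matrix would iterate over its individual cells rather than its columns, which is why the conversion step comes first.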
EDIT: This time with the right wildcards for the as.Date function. I also added a reproducible example:
library(dplyr)
df <- data.frame(date_1 = c("2019-01-01", "2019-01-02", "2019-01-03"),
date_2 = c("2019-01-04", "2019-01-05", "2019-01-06"),
value = c(1,2,3),
stringsAsFactors = FALSE)
str(df)
date_cols <- c("date_1", "date_2")
# funs() is deprecated in current dplyr; across() (dplyr >= 1.0.0) does the same job
df_2 <- df %>%
  mutate(across(all_of(date_cols), ~ as.Date(.x, format = "%Y-%m-%d")))
str(df_2)

Changing date format (NA error)

So I have this data file which includes dates and other values. I've imported my data using the following code:
df <- read.csv(file.choose(), header=T, stringsAsFactors=F)
This is so that all the values in the data frame are in character. This makes the next step easier for me.
The data.frame (df) includes:
date x
20020102 1
20020102 2
The date changes every few thousand rows.
I want to change the date format so that it would be yyyy-mm-dd.
I've tried using the code:
df$date <- as.Date(df$date, format="%Y-%m-%d")
and have also used
df$date <- strptime(df$date, format="%Y-%m-%d")
but have always gotten NA values in the date column.
I'm a beginner at R so it would be very helpful if the solution could be simple or can be explained clearly.
Thanks so much!
You can use the correct format
df$date <- as.Date(df$date, format='%Y%m%d')
It is not clear whether you have numeric or non-numeric 'date' column. If it is 'numeric', convert to 'character' first
df$date <- as.Date(as.character(df$date), format='%Y%m%d')
But, strptime would work even if the column is numeric.
Or using library(lubridate)
library(lubridate)
ymd(df$date)
The problem is that your column "date" is not of class 'Date'; it is a 'numeric' vector, and the format "%Y-%m-%d" does not match values such as 20020102, so the conversion does not produce valid dates.
You can check the class of the date column with this command:
class(df$date)
Following the advice from @akrun, you should first convert the date column to a 'character' vector; then you can parse it with the format that matches the data:
### your data example:
df <- data.frame(date = c(20020102, 20020102),
x = c(1,2))
class(df$date)
#> [1] "numeric"
# convert the date column to character
df$date <- as.character(df$date)
# Then, convert to the desired date format:
df$date <- as.Date(df$date, format='%Y%m%d')
# check the results:
df
#> date x
#> 1 2002-01-02 1
#> 2 2002-01-02 2
class(df$date)
#> [1] "Date"
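To see why the original attempt gave NA, it can help to compare the two format strings on the same input. A minimal sketch with made-up values in the same layout as the question's data:

```r
x <- c("20020102", "20020103")

# The format describes how the INPUT is written, not the desired output.
# "%Y-%m-%d" expects dashes that are not in the data, so parsing fails.
bad <- as.Date(x, format = "%Y-%m-%d")

# "%Y%m%d" matches the input layout.
good <- as.Date(x, format = "%Y%m%d")

is.na(bad)
format(good)
```

Once parsed, Date values print as yyyy-mm-dd by default, so no further reformatting is needed for the layout requested in the question.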
