So I have this data file which includes dates and other values. I've imported my data using the following code:
df <- read.csv(file.choose(), header=T, stringsAsFactors=F)
This is so that all the values in the data frame are in character. This makes the next step easier for me.
The data.frame (df) includes:
date x
20020102 1
20020102 2
The date changes every few thousand rows.
I want to change the date format so that it would be yyyy-mm-dd.
I've tried using the code:
df$date <- as.Date(df$date, format="%Y-%m-%d")
and have also used
df$date <- strptime(df$date, format="%Y-%m-%d")
but have always gotten NA values in the date column.
I'm a beginner at R so it would be very helpful if the solution could be simple or can be explained clearly.
Thanks so much!
Use a format string that matches the data, which has no separators between year, month and day:
df$date <- as.Date(df$date, format='%Y%m%d')
It is not clear whether your 'date' column is numeric or character. If it is numeric, convert it to character first:
df$date <- as.Date(as.character(df$date), format='%Y%m%d')
strptime, however, works even if the column is numeric, because it converts its input to character before parsing.
Or use ymd() from the lubridate package:
library(lubridate)
ymd(df$date)
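To see why the original calls returned NA, here is a minimal base R sketch. The sample values come from the question; the variable names are made up for illustration. It compares a format string that expects hyphens with one that expects no separators:

```r
# Sample values from the question, held as character strings
raw <- c("20020102", "20020102")

# "%Y-%m-%d" expects hyphens in the INPUT, which these strings lack,
# so parsing fails and every element becomes NA
with_hyphens <- as.Date(raw, format = "%Y-%m-%d")

# "%Y%m%d" describes the input as it actually is
no_separators <- as.Date(raw, format = "%Y%m%d")

print(with_hyphens)
print(no_separators)

# A Date prints as yyyy-mm-dd by default; format() only changes how it
# is displayed and returns character, not Date
print(format(no_separators, "%d/%m/%Y"))
```

The format argument of as.Date describes how the input is written, not how the result should look.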
The problem is that your column "date" is not of class 'Date'; it is a 'numeric' vector, so as.Date returns NAs.
You can check the class of the date column with this command:
class(df$date)
Following the advice from @akrun, first transform the date column into a 'character' vector; then you can convert it with the matching format:
### your data example:
df <- data.frame(date = c(20020102, 20020102),
x = c(1,2))
class(df$date)
#> [1] "numeric"
# convert the date column to character
df$date <- as.character(df$date)
# Then, convert to the desired date format:
df$date <- as.Date(df$date, format='%Y%m%d')
# check the results:
df
#> date x
#> 1 2002-01-02 1
#> 2 2002-01-02 2
class(df$date)
#> [1] "Date"
Id. Int. Date.
234. 10. 10-05-2018
345. 05. 15-05-2018
564. 04. 17-06-2018
df <- read.csv(file)
str(df)
I found that Date is a factor, so I want another column next to the Date column holding the same dates in Date format.
df$dte <- as.Date(df$Date, format= "%d/%b/%Y")
I did get a new column next to Date, but all of its values are <NA>.
Kindly help me.
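Change df$dte <- as.Date(df$Date, format= "%d/%b/%Y") to df$dte <- as.Date(df$Date, format= "%d-%m-%Y"). The dates use hyphens rather than slashes, and the month is a number (%m) rather than an abbreviated name (%b).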
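As a minimal sketch of that fix, assuming the column really is a factor as str() reported, the data frame below rebuilds the three rows from the question (the column names Id and Int are guessed from the table header):

```r
# Rebuild the example rows; Date is deliberately a factor, as in the question
df <- data.frame(
  Id   = c(234, 345, 564),
  Int  = c(10, 5, 4),
  Date = factor(c("10-05-2018", "15-05-2018", "17-06-2018"))
)

# Convert via character so the labels, not the integer codes, are parsed.
# %d = day, %m = numeric month, %Y = four-digit year, hyphen-separated.
df$dte <- as.Date(as.character(df$Date), format = "%d-%m-%Y")

str(df)
```

The explicit as.character() is not strictly required, because as.Date has a method for factors, but it makes the intent obvious.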
Given two dates in a data frame that are in this format:
df <- tibble(date = c('25/05/95', '21/09/18'))
df$date <- as.Date(df$date)
How can I convert the dates into this format - date = c('1995-05-25', '2018-09-21') with the year appearing first and in four digit format, and by only using base R?
Here is my attempt, I successfully reversed the order, but still wasn't able to express the year in 4 digit format:
df <- tibble(date_orig = c('25/05/1995', '21/09/2018'))
df$date <- as.Date(df$date_orig)
year_date <- format(df$date, '%d')
month_date <- format(df$date, '%m')
day_date <- format(df$date, '%y')
df$newdate <- as.Date(paste(paste(year_date, month_date, sep = '-'), day_date, sep = '-'))
df$newdate_final <- as.Date(df$newdate, '%Y-%m-%d')
You need to know which format your dates follow and look up the matching conversion codes in ?strptime to turn them into Date objects. Since your required output is the standard way dates are printed, you do not need format() afterwards.
as.Date(df$date_orig, "%d/%m/%Y")
#[1] "1995-05-25" "2018-09-21"
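The first snippet in the question uses two-digit years, which need %y rather than %Y. A minimal base R sketch follows; the variable names are illustrative only:

```r
# Two-digit years as in the first snippet of the question
short_years <- c("25/05/95", "21/09/18")

# %y reads a two-digit year; per ?strptime, 00-68 map to 20xx and
# 69-99 map to 19xx
dates <- as.Date(short_years, format = "%d/%m/%y")
print(dates)

# If character output is needed instead of a Date, format() produces it
as_text <- format(dates, "%Y-%m-%d")
print(as_text)
```

Because the century for %y is inferred, four-digit years in the source data are safer whenever they are available.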
I have a data frame called 'trial'. I have combined the Date and Time columns to get a timestamp field of class POSIXct. I want to set this combined timestamp as the index of my data frame 'trial'. How can I do so? I have looked at similar questions without success.
The code is as follows:
trial <- read.csv("2018_05_04_h093500.csv", header=TRUE, skip = 16, sep=",")
trial$Date <- with(trial, as.POSIXct(paste(as.Date(Date, format="%Y/%m/%d"), Time)))
dtPOSIXct <- as.POSIXct(trial$Date )
dtTime <- as.numeric(dtPOSIXct - trunc(dtPOSIXct, "days"))
class(dtTime) <- "POSIXct"
If you're talking about rownames, which are the R equivalent of an index in pandas, they can't be POSIXct datetimes; they have to be character strings.
# sample data
x <- data.frame('a' = 1:3, 'b' = c('a', 'b', 'c'))
dates <- c('2018-01-01 15:51:33', '2018-01-04 11:42:31', '2018-01-07 22:04:41')
# assigning POSIXct values as row names coerces them to character
rownames(x) <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
print(x)
# a b
#2018-01-01 15:51:33 1 a
#2018-01-04 11:42:31 2 b
#2018-01-07 22:04:41 3 c
print(class(rownames(x)[1]))
# [1] "character"
That said, you can still coerce them to POSIXct (or any other class) at the time of evaluation, at the cost of some overhead and code clutter:
# print the rows of x whose row names, when coerced to POSIXct, fall after d
f <- '%Y-%m-%d %H:%M:%S'
d <- as.POSIXct('2018-01-03 00:00:00', format = f)
x[as.POSIXct(rownames(x), format = f) > d, ]
# a b
#2018-01-04 11:42:31 2 b
#2018-01-07 22:04:41 3 c
However, perhaps an easier approach would be to just have an arbitrary column effectively act as a datetime index:
x$date <- as.POSIXct(dates, format = '%Y-%m-%d %H:%M:%S')
class(x$date[1])
# [1] "POSIXct" "POSIXt"
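With the timestamp in an ordinary column, filtering and sorting work directly on it, with no repeated coercion. This is a minimal sketch reusing the sample data above; the explicit tz = "UTC" and the names cutoff, recent and newest_first are assumptions added so the example behaves the same on any machine:

```r
dates <- c("2018-01-01 15:51:33", "2018-01-04 11:42:31", "2018-01-07 22:04:41")
fmt <- "%Y-%m-%d %H:%M:%S"

x <- data.frame(a = 1:3, b = c("a", "b", "c"))
# Store the timestamps as a real POSIXct column (time zone fixed for reproducibility)
x$date <- as.POSIXct(dates, format = fmt, tz = "UTC")

# Rows after a cutoff: a plain comparison on the column
cutoff <- as.POSIXct("2018-01-03 00:00:00", format = fmt, tz = "UTC")
recent <- x[x$date > cutoff, ]
print(recent)

# Rows ordered newest first
newest_first <- x[order(x$date, decreasing = TRUE), ]
print(newest_first)
```

The column keeps its POSIXct class through subsetting and ordering, which row names cannot do.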
I have a csv file which has dates in the following formats.
8/13/2016
8/13/2016
8/13/2016
2016-08-13T08:26:04Z
2016-08-13T14:30:23Z
8/13/2016
8/13/2016
When I import this into R, the column is read as character. I want to convert it to Date class, but when I do, every value becomes NA:
as.Date(df$create_date,format="%m%d%y")
The date field in the CSV is recorded in different formats. How can I convert it to Date class in R?
A base R option, assuming that there are only two formats in the OP's 'create_date' column: create a logical index ('i1') with grepl for the elements that start with a four-digit year, convert the two subsets to Date class separately, and assign each into a Date vector with as many elements as the dataset has rows.
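# TRUE for elements that start with a four-digit year (ISO-style timestamps)
i1 <- grepl("^[0-9]{4}", df$create_date)
# the default format reads the leading yyyy-mm-dd and ignores the time part
v1 <- as.Date(df$create_date[i1])
v2 <- as.Date(df$create_date[!i1], "%m/%d/%Y")
# placeholder Date vector of the right length; every element is overwritten below
v3 <- Sys.Date() + 0:(nrow(df)-1)
v3[i1] <- v1
v3[!i1] <- v2
df$create_date <- v3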
Or, as I commented on the OP's post, parse_date_time from lubridate can be used:
library(lubridate)
as.Date(parse_date_time(df$create_date, c('mdy', 'ymd_hms')))
#[1] "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13"
#[5] "2016-08-13" "2016-08-13" "2016-08-13"
data
df <- structure(list(create_date = c("8/13/2016", "8/13/2016",
"8/13/2016",
"2016-08-13T08:26:04Z", "2016-08-13T14:30:23Z", "8/13/2016",
"8/13/2016")), .Names = "create_date", class = "data.frame",
row.names = c(NA, -7L))
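If more than two formats turn up, the same base R idea can be generalised by trying each format in turn on whatever is still unparsed. The sketch below is one way to do that; parse_mixed_dates is a hypothetical helper written for this example, not a base R or lubridate function:

```r
# Hypothetical helper: try each format in order, filling only the
# elements that earlier formats failed to parse
parse_mixed_dates <- function(x, formats) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

create_date <- c("8/13/2016", "2016-08-13T08:26:04Z", "not a date")

# "%Y-%m-%d" matches the start of the ISO timestamp; the time part is ignored
parsed <- parse_mixed_dates(create_date, c("%m/%d/%Y", "%Y-%m-%d"))
print(parsed)
```

Values that match none of the formats stay NA, so is.na(parsed) can be used afterwards to find rows that need manual attention. The order of formats matters when a string could be read by more than one of them.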
I have a large data file where all the dates have been loaded as characters. I would like to change all the date columns to Date class. Most of the dates have "%y%m%d" format, some have "%Y%m%d" format. There are 25 columns of dates, so changing each one individually is inefficient.
I can do
df$DATE1 <- as.Date(df$DATE1, format ="%y%m%d")
df$DATE2 <- as.Date(df$DATE2, format ="%y%m%d")
etc., but that is very repetitive code.
I tried the following code, but it is not working. It assumes all of the dates are in the format "%y%m%d". Using grep("DATE", names(df)) gets all the date columns:
df[ , grep("DATE", names(df))] <- as.Date(df[ , grep("DATE", names(df))], "%y%m%d")
Try:
cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], as.Date, format = "%y%m%d")
Example:
df <- data.frame(DATE1 = c('910812', '900928'), DATE2 = c('890813', '890910'))
# Apply the above and you get:
# > df
# DATE1 DATE2
# 1 1991-08-12 1989-08-13
# 2 1990-09-28 1989-09-10
# > class(df[, 1])
# [1] "Date"
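The question also mentions that some values use "%Y%m%d". One way to handle both, sketched under the assumption that the two formats can be told apart purely by string length (8 characters versus 6), is a small helper applied to every date column. parse_compact_date is a hypothetical name introduced here:

```r
# Hypothetical helper: 8-character strings are read as %Y%m%d,
# everything else as %y%m%d (assumption: length alone identifies the format)
parse_compact_date <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  long <- !is.na(x) & nchar(x) == 8
  out[long]  <- as.Date(x[long],  format = "%Y%m%d")
  out[!long] <- as.Date(x[!long], format = "%y%m%d")
  out
}

df <- data.frame(
  DATE1 = c("910812", "19900928"),
  DATE2 = c("890813", "19890910"),
  other = 1:2,
  stringsAsFactors = FALSE
)

# Convert every column whose name starts with DATE, leaving the rest alone
cols <- grep("^DATE", names(df))
df[cols] <- lapply(df[cols], parse_compact_date)
str(df)
```

Indexing with df[cols] rather than df[, cols] keeps the result a data frame even when only one column matches.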