Character to date in R

One of the columns in my data frame is a character column in the following format (an example):
2013-02-05 08:00:00
Some rows are NULL in this column. I want to convert the column to a date class, but I get NA for every row.
Could you please tell me what I should do to make it work?

You should install Hadley Wickham's lubridate package, and use:
> ymd_hms("2013-02-05 08:00:00")
The package includes many other functions that'll help you (safely) manipulate datetime and interval objects.

Based on your comment, assuming your data frame is DF, and your date column (as character) DATE.STR, I would do the following:
DF$DATE=as.Date(DF$DATE.STR)
Of course, using lubridate would give you more options, but I think you can use base R for this.
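A minimal base R sketch of one way to handle the missing rows, assuming they are stored as the literal string "NULL" (the question does not say how they are stored). The data frame below is made up for illustration; only the column name DATE.STR comes from the answer above.

```r
# Hypothetical data; "NULL" as a literal string is an assumption
DF <- data.frame(
  DATE.STR = c("2013-02-05 08:00:00", "NULL", "2013-02-06 09:30:00"),
  stringsAsFactors = FALSE
)

# Turn the placeholder into a real missing value
DF$DATE.STR[DF$DATE.STR == "NULL"] <- NA

# Parse the full timestamp with an explicit format and a fixed time zone
DF$DATETIME <- as.POSIXct(DF$DATE.STR, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Drop the time of day if only the calendar date is needed
DF$DATE <- as.Date(DF$DATETIME, tz = "UTC")

str(DF)
```

Passing the format explicitly means that rows which do not match it come back as NA instead of influencing how the rest of the column is parsed.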

Related

Converting character variable in date format

I'd like to convert my variable "birthdate" from character to Date. The values are written as "dd/mm/yyyy". I tried the as.Date function, but the result was wrong:
x$age <- as.Date(x$birhtdate)
R doesn't read the character string correctly. For example, 21/12/1948 becomes 0021/12/19.
I am a bit lost. I also tried the format function, but without success.
Thanks for your help!
You can use the lubridate package to state the order of day, month, and year explicitly:
x <- data.frame(birhtdate = "21/12/1948")
x$birhtdate <- lubridate::parse_date_time(x$birhtdate, orders = "dmy")
x
#> birhtdate
#> 1 1948-12-21
Created on 2023-01-04 by the reprex package (v2.0.1)
Base R answer:
Yes, you need to tell R the format, because there are so many possibilities: '-' or a space as the separator, or a different order such as mm/dd/yyyy.
So:
as.Date('21/12/1948', format = '%d/%m/%Y')
will work.
Output:
[1] "1948-12-21"
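A short sketch of the same idea applied to a whole column, using made-up data (the column name keeps the question's spelling; birthdate_parsed is a hypothetical new column). Strings that do not match the format come back as NA, which gives a simple check after parsing.

```r
# Hypothetical data: two dd/mm/yyyy strings and one in a different layout
x <- data.frame(
  birhtdate = c("21/12/1948", "03/07/1985", "1990-01-31"),
  stringsAsFactors = FALSE
)

# Parse with an explicit day/month/year format
x$birthdate_parsed <- as.Date(x$birhtdate, format = "%d/%m/%Y")

# List the inputs that could not be parsed with this format
x$birhtdate[is.na(x$birthdate_parsed)]
```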

Reading Time column in R

I am reading an Excel file that has a time column.
This column has values like
23:29:04
23:04:31
21:55:37
21:52:27
21:49:53
When I read this column in R, it comes in as numeric values:
0.961469907
0.913622685
0.911423611
0.907094907
0.906250000
0.899490741
The Excel and R values above do not correspond row for row; they are just samples.
I tried using
strptime(TimeStamp,format="%H:%M:%S")
It gives NA for all values.
Please suggest how to read the times correctly in R.
These numbers are times stored as fractions of a day. Time objects of this kind are implemented in, for example, the chron package:
library(chron)
x <- c(0.961469907, 0.913622685, 0.911423611, 0.907094907, 0.906250000, 0.899490741)
x <- times(x)
print(x)
#[1] 23:04:31 21:55:37 21:52:27 21:46:13 21:45:00 21:35:16
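For comparison, a base R sketch of the same conversion without chron, using two of the values above: multiply the fraction by the number of seconds in a day and format the result as a clock time. The variable names secs and hms are made up for illustration.

```r
# Fractions of a day, as read from the Excel column
x <- c(0.961469907, 0.913622685)

# Convert to whole seconds since midnight (86400 seconds per day)
secs <- round(x * 86400)

# Format as HH:MM:SS; UTC is used only so that no time zone offset is applied
hms <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
print(hms)
# [1] "23:04:31" "21:55:37"
```

The result is a character vector, so it suits display; for arithmetic on times, the numeric seconds or a dedicated time class is the better fit.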
Read the column as strings and wrap your strptime call in as.POSIXct. Because the strings contain no date, strptime fills in the current date:
as.POSIXct(strptime(TimeStamp,format="%H:%M:%S"))

How do you convert multiple columns to date format in R using lubridate?

I have a database with multiple columns of dates stored as character. I want to use the lubridate package in R to convert them all at once. My problem is not parsing the date format but applying lubridate across multiple columns. Any suggestions?
crimes.df <- data.frame(offense.date = c('06102003', '05122006'), charge.date = c('07152003', '10012010'))
I have tried
crimes.df[,1:2]<-mdy(crimes.df[,1:2])
and
crimes.df[,1:2]<-lapply(crimes.df[,1:2], function(x) mdy(crimes.df[,1:2]))
both return this error:
Warning message:
All formats failed to parse. No formats found.
(and, inconveniently, wipe out all data in the columns.)
With lapply, we loop over the columns of the data frame and apply mdy to each column:
crimes.df[] <- lapply(crimes.df, mdy)
In the OP's code, the anonymous function (function(x)) should apply mdy to its argument x, not to the whole of crimes.df[,1:2]:
crimes.df[] <- lapply(crimes.df, function(x) mdy(x))
Also, since the data frame has only these 2 columns, we don't need to select crimes.df[,1:2].
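When the data frame also holds non-date columns, the same lapply pattern can be restricted to the date columns by name. The sketch below uses base R's as.Date in place of mdy so that it runs without extra packages; the case.id column and the date.cols vector are made up for illustration.

```r
# Data from the question plus a hypothetical non-date column
crimes.df <- data.frame(
  case.id      = c(1, 2),
  offense.date = c("06102003", "05122006"),
  charge.date  = c("07152003", "10012010"),
  stringsAsFactors = FALSE
)

# Names of the columns to convert
date.cols <- c("offense.date", "charge.date")

# Apply the conversion column by column; extra arguments go on to as.Date
crimes.df[date.cols] <- lapply(crimes.df[date.cols], as.Date, format = "%m%d%Y")

str(crimes.df)
```

With lubridate loaded, replacing `as.Date, format = "%m%d%Y"` with `mdy` should give the same column-by-column behaviour.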

Using R, How to Convert a Factor into a Date without losing the year?

When I read in a data file using R, a Date variable is a Factor.
For example, let epc hold the data set. Then if we look at the structure of the first date, we get
str(epc$Date[1])
Factor w/ 1 level "16/12/2006": 1
If you were to convert this to a character, as.character(epc$Date[1]), you'd get exactly the same thing: "16/12/2006"
No matter what I've tried, I can't convert this type of object into a valid date.
If the date is "16/12/2006" (which I'm assuming is Dec. 16th, 2006), then as.Date(epc_full$Date[1]) gives "0016-12-20" -- the full year is lost.
I've tried many different things, e.g., first converting the date into a character, trying different versions of as.Date(), etc., but I keep getting exactly the same result when the input is "16/12/2006".
What's the trick here?
So first, for some reason this is being read in as a factor with a level that is labeled "16/12/2006". You are converting this to a character, so I'll start there.
There are a number of ways to do this, but I think the easiest is to use the lubridate package.
#Install package
install.packages("lubridate")
library(lubridate)
yourTextDate <- "16/12/2006"
yourDate <- dmy(yourTextDate)
yourDate
If you strictly want a Date class, then:
s <- "16/12/2006"
as.Date(s, format='%d/%m/%Y')
## [1] "2006-12-16"
class(.Last.value)
## [1] "Date"
So you can convert the entire column of dates with:
as.Date(epc$Date, format='%d/%m/%Y')
This should work whether it is a character or factor.
lubridate works well too, but if you're sticking with base R, this does the job.
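The lost year has a simple explanation: when no format is given, as.Date tries "%Y-%m-%d" and then "%Y/%m/%d", so in "16/12/2006" the 16 is read as the year and the first two digits of 2006 as the day. A small sketch with a made-up two-row factor column shows the explicit format fixing this:

```r
# Hypothetical stand-in for the epc data set: dates stored as a factor
epc <- data.frame(Date = factor(c("16/12/2006", "17/12/2006")))

# Convert the factor labels (not the integer codes) with an explicit format
epc$Date <- as.Date(as.character(epc$Date), format = "%d/%m/%Y")

# The four-digit year is preserved
format(epc$Date, "%Y")
```

The as.character step is written out only to make clear that the labels are being parsed; as.Date does the same thing internally when given a factor.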

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it was a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
using ?strptime
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
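One tidier option is base R's format, which zero-pads the month and day, whereas pasting year, month, and day together gives unpadded strings such as "2013-1-16". A minimal sketch, with a hand-made POSIXct value standing in for the "Trade Time" column (no call to getQuote is made here):

```r
# Stand-in for one value of the "Trade Time" column
trade.time <- as.POSIXct("2013-01-16 02:52:00", tz = "UTC")

# Format as a zero-padded YYYY-MM-DD string
date.string <- format(trade.time, "%Y-%m-%d")
print(date.string)
# [1] "2013-01-16"
```

Like the paste approach, this yields character strings rather than Date objects.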
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran it in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
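The time zone also matters when converting to Date: the calendar date of a timestamp depends on the zone used for the conversion. A small sketch with a hand-made timestamp (not real getQuote output) just after midnight in Paris:

```r
# 00:30 on 16 January in Paris is 23:30 on 15 January in UTC
x <- as.POSIXct("2013-01-16 00:30:00", tz = "Europe/Paris")

# Same instant, two different calendar dates
d.local <- as.Date(x, tz = "Europe/Paris")
d.utc   <- as.Date(x, tz = "UTC")

print(d.local)
print(d.utc)
```

Passing tz explicitly to as.Date avoids depending on whichever default applies on the machine running the script.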
