I am working in R and need to convert a column from the format
9/27/2011 3:33:00 PM
to a numeric value. In Excel I can use the VALUE() function, but I do not know how to do this in R.
My data looks like this:
9/27/2011 15:33 a 1 5 9
9/27/2011 15:33 v 2 6 2
9/27/2011 15:34 c 3 7 1
To convert a string into an R date-time, use as.POSIXct; you can then coerce it to a numeric value using as.numeric. Note the %I (12-hour clock) rather than %H, since the string carries an AM/PM marker:
> x <- as.POSIXct("9/27/2011 3:33:00 PM", format="%m/%d/%Y %I:%M:%S %p")
> x
[1] "2011-09-27 15:33:00 BST"
> as.numeric(x)
[1] 1317133980
The value you get is the number of seconds since 1970-01-01 00:00:00 UTC (the Unix epoch). Note that this is different from Excel, where a date is stored as the number of days since an arbitrary date (1/1/1900 if my memory serves me well - I try not to use Excel any more.)
For more information, see ?DateTimeClasses
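Applied to a whole column like the one in the question, the same idea might look like the sketch below. The data frame and its column names (df, stamp, id) are made up for illustration, and tz = "UTC" is an assumption chosen so the numbers do not depend on the local time zone.

```r
# Hypothetical data frame mimicking the rows shown in the question
df <- data.frame(
  stamp = c("9/27/2011 15:33", "9/27/2011 15:33", "9/27/2011 15:34"),
  id    = c("a", "v", "c"),
  stringsAsFactors = FALSE
)

# Parse the text column (24-hour clock, so %H is the right specifier here)
df$time <- as.POSIXct(df$stamp, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC
df$secs <- as.numeric(df$time)
```

If the column was read in as a factor, wrapping it in as.character() before parsing avoids surprises.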
This was useful for me:
> test=as.POSIXlt("09/13/2006", format="%m/%d/%Y")
> test
[1] "2006-09-13"
> 1900+test$year
[1] 2006
> test$yday
[1] 255
> test$yday/365
[1] 0.6986301
> 1900+test$year+test$yday/365
[1] 2006.699
You can use similar approaches if you need day numbers like in Excel.
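As a rough sketch of both ideas: decimal_year below is a hypothetical helper (not part of base R) that divides by the real length of the year, and the Excel day number assumes the 1900 date system with the usual 1899-12-30 offset, which holds for dates after February 1900.

```r
# Hypothetical helper: year plus the fraction of that year already elapsed
decimal_year <- function(d) {
  lt <- as.POSIXlt(d)
  year <- 1900 + lt$year
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  year + lt$yday / ifelse(is_leap, 366, 365)
}

d <- as.Date("2006-09-13")
dec <- decimal_year(d)

# Excel-style day number: whole days since 1899-12-30 (assumed offset)
excel_day <- as.numeric(d - as.Date("1899-12-30"))
```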
I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
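It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of
-719143, you can undo the mis-parse. Note that the wrong format read only the "20" of "2021" (as the day), so the last two digits of the year are lost and the result below is 2020-01-01 rather than 2021-01-01: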
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it is meant to represent the second, it should be obvious at this point how to proceed.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 is about 1970 years of days, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being interpreted as the numeric formula equal to -2021 and so we're looking at perhaps -2021 seconds/days/[?] before year zero, which would be about -1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-01") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
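For the case in the question, where a package has already turned a correctly parsed Date column into plain numbers, going back is a matter of restoring the origin. A minimal sketch, assuming the numbers really are days since 1970-01-01 (the variable names are illustrative):

```r
n <- c(18628, 18629)                     # what as.numeric() gives for two Dates

# Rebuild Date objects from day counts
d <- as.Date(n, origin = "1970-01-01")

# Render them back as day-month-year text
txt <- format(d, "%d-%m-%Y")
```

This only works when the original parse was right; a value such as -719143 signals that the dates were mis-parsed before they became numeric.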
I have three data tables in R. Each one has a date column. The tables are vix_data,gold_ohlc_data,btc_ohlc_data. They are formatted as follows:
head(vix_data$Date)
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
3435 Levels: 1/10/05 1/10/06 1/10/07 1/10/08 1/10/11 ... 9/9/16
head(gold_ohlc_data$date)
[1] 8/23/17 8/22/17 8/21/17 8/18/17 8/17/17 8/16/17
2519 Levels: 1/10/08 1/10/11 1/10/12 1/10/13 1/10/14 ... 9/9/16
head(btc_ohlc_data$Date)
[1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-20" "2017-08-19"
[6] "2017-08-18"
How can I change the date column in the vix_data and gold_ohlc_data tables to match the btc_ohlc_data format? I have tried several methods, for example using as.Date to transform each column, but this usually messes up the values and inserts a lot of NAs.
One option is to use functions from the lubridate package. You need to know which field is the day and which is the month in order to pick the right function, such as dmy or mdy.
# Load package
library(lubridate)
# Create example string
date1 <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
date2 <- c("8/23/17", "8/22/17", "8/21/17", "8/18/17", "8/17/17", "8/16/17")
# Convert to date class
dmy(date1)
# [1] "2004-02-01" "2004-05-01" "2004-06-01" "2004-07-01" "2004-08-01" "2004-09-01"
mdy(date1)
# [1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
mdy(date2)
# [1] "2017-08-23" "2017-08-22" "2017-08-21" "2017-08-18" "2017-08-17" "2017-08-16"
Look into the package lubridate. Since these dates are month/day/year, lubridate::mdy() should handle this just fine.
It looks like your data are read in as factors, so first you'll have to change them to characters. Then after that you can convert it to a date and specify the input format where %m represents the numerical month, %d represents the day, and %y represents the 2-digit year.
x <- c('1/2/04', '1/5/04', '1/6/04', '1/7/04', '1/8/04', '1/9/04')
y <- as.Date(x, format = "%m/%d/%y")
y
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08"
[6] "2004-01-09"
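Put together for a factor column like the one in the question, the steps above can be sketched as follows. The small vix_data frame here is a stand-in built for illustration, not the asker's data.

```r
# Stand-in for the real table: a factor column of m/d/yy strings
vix_data <- data.frame(Date = factor(c("1/2/04", "1/5/04", "12/31/04")))

# Factor -> character -> Date; %y is the 2-digit year
vix_data$Date <- as.Date(as.character(vix_data$Date), format = "%m/%d/%y")

# Any NA here means a value did not match the format
n_failed <- sum(is.na(vix_data$Date))
```

Keep in mind that %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, which matters for older price history.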
Are you sure you're specifying as.Date correctly? For example, do you have %y, instead of %Y?
I did the following and it worked:
> vix <- c("1/2/04", "1/5/04", "1/6/04", "1/7/04", "1/8/04", "1/9/04")
> vix<- as.factor(vix)
> vix
[1] 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
Levels: 1/2/04 1/5/04 1/6/04 1/7/04 1/8/04 1/9/04
> as.Date(vix, "%m/%d/%y")
[1] "2004-01-02" "2004-01-05" "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09"
I have a column of dates in two different formats, DD/MM/YY and D/M/YY. Because Microsoft Excel (for Mac 2011, 14.3.9) read some of the D/M/YY dates as M/D/YY, the resulting dates are incorrect.
So I turned to R and tried to transform the column into the format "DD-MON-YYYY", where MON is the abbreviated month name, as in 01-Jan-2014. The column looks like this:
> head(date, 10)
date
1 17/12/96
2 27/6/07
3 21/6/13
4 24/7/13
5 17/7/13
6 16/7/13
7 13/10/99
8 20/2/97
9 14/12/96
10 19/6/13
I used the format function
format(date,"%d %b %Y")
And the output was
Error in format.default(structure(as.character(x), names = names(x), dim = dim(x), :
invalid 'trim' argument
I have also tried the lubridate package with no success.
> library(lubridate)
> dmy(date)
[1] NA
Warning message:
All formats failed to parse. No formats found.
Is there any simple method to transform the date?
You need to transform your strings to objects of class Date, e.g.
as.Date("17/12/96", "%d/%m/%y")
[1] "1996-12-17"
and then apply your format
format(as.Date("17/12/96", "%d/%m/%y"), "%d-%b-%Y")
[1] "17-Dec-1996"
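For the whole column, the same two steps might look like the sketch below. The errors in the question appear to come from passing the data frame itself rather than its date column, so the example selects the column explicitly; the three-row data frame is a stand-in for the real one.

```r
# Stand-in data frame with mixed DD/MM/YY and D/M/YY strings
date <- data.frame(date = c("17/12/96", "27/6/07", "21/6/13"),
                   stringsAsFactors = FALSE)

# %d and %m accept one or two digits on input, so both layouts parse
parsed <- as.Date(date$date, format = "%d/%m/%y")

# %b gives the abbreviated month name in the current locale
date$formatted <- format(parsed, "%d-%b-%Y")
```

In an English locale the first value comes out as 17-Dec-1996; other locales will show their own month abbreviations.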
I have a column named timings of class factor with time stamps in the following format:
1/11/07 15:15
I applied strptime on timings to generate tStamp as follows:
tStamp=strptime(timings,format="%m/%d/%Y %H:%M")
i) The corresponding entry in tStamp looks like 0007-01-11 15:15:00 now. Why has it made 2007 or 07 into 0007? What is a correct way to generate tStamp?
ii) After generating tStamp correctly, how do we convert it to Unix time (seconds since 1970)?
You need the lowercase %y for 2-digit years:
R> pt <- strptime("1/11/07 15:15",format="%m/%d/%y %H:%M")
R> pt
[1] "2007-01-11 15:15:00 CST"
R>
where CST is my local timezone.
And as.numeric() or as.double() converts to a double ...
R> as.numeric(pt)
[1] 1168550100
... which has fractional seconds if those are in the input:
R> options("digits.secs"=3) # show milliseconds
R> as.numeric(Sys.time()) # convert current time
[1] 1372201674.52 # now with sub-seconds.
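Starting from a factor column as in the question, a minimal sketch could be the following. The timings vector is invented for illustration, and tz = "UTC" is an assumption made so the result is the same on every machine; use the time zone the data was recorded in.

```r
# Invented stand-in for the factor column in the question
timings <- factor(c("1/11/07 15:15", "1/11/07 15:16"))

# Factor -> character -> POSIXlt (lowercase %y) -> POSIXct
tStamp <- as.POSIXct(strptime(as.character(timings),
                              format = "%m/%d/%y %H:%M", tz = "UTC"))

# Unix time: seconds since 1970-01-01 00:00:00 UTC
unix_secs <- as.numeric(tStamp)
```

Storing the result as POSIXct rather than the POSIXlt that strptime returns is usually more convenient for data frame columns.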
How can I add one hour to all the elements of the index of a zoo series?
I've tried
newseries <- myzooseries
index(newseries) <- index(myzooseries)+times("1:00:00")
but I get the message
Incompatible methods ("Ops.dates", "Ops.times") for "+"
thanks
My index is a chron object with date and time but I've tried with simpler examples and I can't get it
This is easily solved by adding the time you want in a numerical fashion :
newseries <- myzooseries
index(newseries) <- index(myzooseries) + 1/24
chron objects are represented as decimal numbers, so you can use that to calculate. A day is 1, so an hour is 1/24, a minute 1/1440 and so on. You can see this easily if you use the function times. This gives you the times of the object tested, eg :
> A <- chron(c("01/01/97","01/02/97","01/03/97"))
> B <- A + 1/24
> B
[1] (01/01/97 01:00:00) (01/02/97 01:00:00) (01/03/97 01:00:00)
> times(A)
Time in days:
[1] 9862 9863 9864
> times(B)
Time in days:
[1] 9862.042 9863.042 9864.042
> times(B-A)
[1] 01:00:00 01:00:00 01:00:00
> times(A[3]-B[1])
Time in days:
[1] 1.958333
Convert to POSIXct, add 60*60 (1h in s) and then convert back.
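The POSIXct arithmetic on its own can be sketched in base R as below. The idx vector is a made-up stand-in for a series index; converting a chron index to POSIXct and back is left out here because it depends on the chron package.

```r
# Made-up index values standing in for the series index
idx <- as.POSIXct(c("1997-01-01 00:00:00", "1997-01-02 00:00:00"), tz = "UTC")

# POSIXct counts seconds, so one hour is 60 * 60
shifted <- idx + 60 * 60

# Text form of the shifted index
shifted_txt <- format(shifted, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The unit differs from chron, where one day is 1 and an hour is 1/24.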