Convert unix timestamp column to day of week in R - r

I am working with a data frame in R labeled "mydata". The first column, labeled "ts", contains Unix timestamp fields. I'd like to convert these fields to days of the week.
I've tried using strptime and POSIXct functions but I'm not sure how to execute them properly:
> strptime(ts, "%w")
--Returned this error:
"Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'"
I also tried just converting it to a human-readable format with POSIXct:
as.Date(as.POSIXct(ts, origin="1970-01-01"))
--Returned this error:
"Error in as.POSIXct.default(ts, origin = "1970-01-01") :
do not know how to convert 'ts' to class “POSIXct”"
Update: Here is what ended up working for me:
> mydata$ts <- as.Date(mydata$ts)
then
> mydata$ts <- strftime( mydata$ts , "%w" )

No need to go all the way to strftime when POSIXlt gives you this directly, and strftime calls as.POSIXlt.
wday <- function(x) as.POSIXlt(x)$wday
wday(Sys.time()) # Today is Sunday
## [1] 0
There is also the weekdays function, if you want character rather than numeric output:
weekdays(Sys.time())
## [1] "Sunday"
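For the data frame in the question, a minimal sketch might look like the following. It assumes the ts column holds Unix timestamps as numeric seconds since 1970-01-01 UTC; the sample values and the new column names wday and day are made up for illustration. The errors in the question are consistent with a bare ts being resolved to the stats::ts function rather than to the column, so the column is referenced as mydata$ts here.

```r
# Hypothetical sample data: numeric Unix timestamps in seconds (UTC)
mydata <- data.frame(ts = c(0, 1609459200, 1609545600))

# Convert seconds since the epoch to POSIXct; tz = "UTC" is an assumption
when <- as.POSIXct(mydata$ts, origin = "1970-01-01", tz = "UTC")

# Numeric day of week: 0 = Sunday, ..., 6 = Saturday
mydata$wday <- as.POSIXlt(when)$wday

# Day name; the exact text depends on the session locale
mydata$day <- weekdays(when)
```

If the timestamps should be read in a local timezone instead, the tz argument would need to change, which can shift the day for times near midnight.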

Related

What does calling as.numeric() do to a lubridate Date object?

I am working with an external package that's converting columns of a dataframe with the lubridate date type Date into numeric type. (Confirmed by running as.numeric() on the columns).
I'm wondering if there's a way to convert it back?
For example, if I have the date "01-01-2021" then running as.numeric on it returns -719143. How can I turn that back into "01-01-2021"?
Note that Date class is part of base R, not lubridate.
You probably assumed that the data was year/month/day by mistake. Using base R to eliminate lubridate as a problem we can replicate the question's result like this:
as.numeric(as.Date("01-01-2021", "%Y-%m-%d"))
## [1] -719143
Had we used day/month/year we would have gotten:
as.numeric(as.Date("01-01-2021", "%d-%m-%Y"))
## [1] 18628
or using lubridate
library(lubridate)
as.numeric(dmy("01-01-2021"))
## [1] 18628
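It would be best to fix the mistake that resulted in -719143, but if you don't control that and are faced with an input of -719143, note that the bad parse kept only the first two digits of the year (the "20" was read as the day), so the original date cannot be recovered exactly. The following recovers the day and month and reads the surviving "20" as a two-digit year, which is why it returns 2020 rather than 2021: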
# input x is numeric; result is Date class
fixup <- function(x) as.Date(format(.Date(x), "%y-%m-%d"), "%d-%m-%y")
fixup(-719143)
## [1] "2020-01-01"
Note that we can't tell from the question whether 01-01-2021 is supposed to represent day-month-year or month-day-year, so we assumed the first; if it is meant to represent the second, swap %d and %m in the second format string.
EDIT #2: It looks like the original data is being parsed as Jan 20, year 1, which might happen if the year-month-day columns were jumbled while being parsed:
as.numeric(as.Date("01-01-2021", format = "%Y-%m-%d", origin = "1970-01-01"))
[1] -719143
as.numeric(as.Date("0001-01-20", origin = "1970-01-01"))
[1] -719143
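One way to check the "Jan 20, year 1" reading without relying on how the platform prints very early years is to inspect the date's components. This is only a sketch; the variable names d and lt are illustrative.

```r
# Turn the number back into a Date, counting days from 1970-01-01
d <- as.Date(-719143, origin = "1970-01-01")

# Inspect the components instead of printing the date
lt <- as.POSIXlt(d)
c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday)
```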
Is there a way to share an example of the raw data as you have it? e.g. dput(MY_DATA[1:10, DATE_COL])
EDIT: -719143 days is about 1970 years, which can't be a coincidence, given that many date/time formats use 1970 as a baseline. I wonder if 01-01-2021 is being evaluated as the arithmetic expression 01 - 01 - 2021, which equals -2021, so that we're looking at perhaps 2021 seconds/days/[?] before year zero, which would be about 1970 years before the epoch...
-719143/(365)
[1] -1970.255
For instance, we can get something close with:
as.numeric(as.Date("0000-01-01", origin = "1970-01-01"))
[1] -719528
Original answer:
R treats a string describing a date as text:
x <- "01-01-2021"
class(x)
[1] "character"
We can convert it to a Date data type using these two equivalent commands:
base_dt <- as.Date(x, "%m-%d-%Y") # base R version
lubridt <- lubridate::mdy(x) # convenience lubridate function
identical(base_dt, lubridt)
[1] TRUE
Under the hood, a Date object in R is a numeric value with a flag telling R it's a date:
> typeof(lubridt) # What general type of data is it?
[1] "double" # --> numeric, stored as a double
> as.numeric(lubridt)
[1] 18628
> class(lubridt) # Does it have any special class attributes?
[1] "Date" # --> yes, it's a Date
> dput(lubridt) # How would we construct it from scratch?
structure(18628, class = "Date") # --> by giving 18628 a Date attribute
In R, a Date is encoded as the number of days since 1970 began:
> as.Date("1970-01-01") + as.numeric(lubridt)
[1] "2021-01-01"
We could convert it back to the original text using:
format(base_dt, "%m-%d-%Y")
[1] "01-01-2021"
identical(x, format(base_dt, "%m-%d-%Y"))
[1] TRUE
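Going the other way, which is what the question asks for, a plain number can be turned back into a Date by supplying the origin. A small sketch, assuming the number really is a count of days since 1970-01-01 (n and d are illustrative names):

```r
n <- 18628                               # days since 1970-01-01
d <- as.Date(n, origin = "1970-01-01")   # back to a Date
format(d, "%d-%m-%Y")                    # back to day-month-year text
```

For a data frame column the same call applies to the whole vector, e.g. df$col <- as.Date(df$col, origin = "1970-01-01"), where df and col are placeholders.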

R convert time (xx:xx:xx) to min. as.POSIXct returns error "character string is not in a standard unambiguous format"

I have some data about focus times in xx:xx:xx format that I would like to add together over several sessions, so I am trying to convert each of them to minutes or any format that can be summed and understood. I've tried as.POSIXct but I get the error in the title. I've also tried lubridate's hms(day1)/60 and I get another error. Here's a shortened reproducible example.
day1 <- c("01:05:38", "00:56:54", "00:48:17")
day2 <- c("00:37:57", "00:21:09", "00:43:34")
day1convert <- as.numeric(as.POSIXct(day1), units = "mins")
This returns the error: "Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format"
library(lubridate)
convert<-hms(day1)/60
This returns the error: "Error in validObject(.Object) :
invalid class “Period” object: periods must have integer values"
Any help would be appreciated.
I'm not exactly sure what your expected output is, but you can try the following to get the time in minutes.
library(lubridate)
period_to_seconds(hms(day1))/60
#[1] 65.63333 56.90000 48.28333
If you want to convert time to POSIXct format
as.POSIXct(day1, format = "%T", tz = "UTC")
#[1] "2020-02-15 01:05:38 UTC" "2020-02-15 00:56:54 UTC" "2020-02-15 00:48:17 UTC"
In base R:
sapply(strsplit(day1, ":"), function(x) as.difftime(sum(c(60, 1, 1/60)*as.numeric(x)), units="mins"))
#> [1] 65.63333 56.90000 48.28333
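Since the goal is to add focus times across sessions, the base R idea above can be wrapped in a small helper and summed. This is a sketch that assumes every value is in HH:MM:SS form; to_minutes is a made-up name, not a base R function.

```r
day1 <- c("01:05:38", "00:56:54", "00:48:17")
day2 <- c("00:37:57", "00:21:09", "00:43:34")

# Hypothetical helper: "HH:MM:SS" strings -> minutes as plain numbers
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)), numeric(1))
}

per_session <- to_minutes(day1) + to_minutes(day2)  # minutes per session
total <- sum(per_session)                           # minutes overall
```

Plain numbers are returned on purpose so that the usual arithmetic and sum() work without any special class handling.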
1) chron::times Use chron times class. They can be added and if we convert them to numeric they are given in fractions of a day.
library(chron)
times(day1) + times(day2)
## [1] 01:43:35 01:18:03 01:31:51
as.numeric(times(day1) + times(day2))
## [1] 0.07193287 0.05420139 0.06378472
2) data.table::as.ITime Another possibility is data.table's ITime. ITime values can be added, and converting them to numeric gives seconds.
library(data.table)
as.ITime(day1) + as.ITime(day2)
## [1] "01:43:35" "01:18:03" "01:31:51"
as.numeric(as.ITime(day1) + as.ITime(day2))
## [1] 6215 4683 5511

Reformatting date and timestamp with r [duplicate]

In order to prevent an error in uploading xls data into a sql database, I am trying to reformat a date type of "08/22/2019 02:05 PM CDT" and want only the date, not the time or the timezone. Many efforts to use the default, POSIX and lubridate actions have failed. The xls file formats the date column as general.
I have a column of data to convert, not a single cell. This is a part of a loop for multiple files in a folder.
Failures:
#mydata_r11_Date2 <- strptime(as.character(mydata_r11_Date$Date), "%d/%m/%Y")
# parse_date_time(x = mydata_r11_Date$Date,
# orders = c("d m y", "d B Y", "m/d/y"),
# locale = "eng")
#
#
# mydata_r11_Date <- as.character(mydata_r11_Date)
mydata_r11_Date <- gsub('^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]+[+-][0-9]{2}):([0-9]{2})$',
'\\1\\2',
mydata_r11_Date$Date)
ymd_hms(mydata_r11_Date$Date)
mydata_r11_Date <- as_date (mydata_r11_Date$Date,format = "%Y-%m-%d")
mydata_r11_Date2 <- format(as.Date(mydata_r11_Date,"%Y-%m-%d"),"%Y-%m-%d")
Errors include:
Warning message:
All formats failed to parse. No formats found.
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
Error in as.Date.default(mydata_r11_Date$Date, format = "%Y-%m-%d") :
do not know how to convert 'mydata_r11_Date$Date' to class “Date”
Error: unexpected ',' in " mydata_r11_Date <- as.Date(mydata_r11_Date$Date),"
Error in as_date(x) : object 'x' not found
library(readxl)
library(reshape2)
library(lubridate)
Import xls
mydata_r11 <- read_excel("C:/FOLDER/FOLDER/FOLDER/OUTPUT/WADUJONOKO_student_assessment_results.xls",1,skip = 1, col_types = "list")
Isolate date column
mydata_r11_Date <- mydata_r11[,c(8)]
Convert date
mydata_r11_Date 2 <-
Have "08/22/2019 02:05 PM CDT"
Want "08/22/2019"
I don't understand why you are resorting to complex regex here when you seem to only want the date component, which is the first 10 characters of the timestamps. Just take the substring and then call as.Date with an appropriate format mask:
x <- "08/22/2019 02:05 PM CDT"
y <- substr(x, 1, 10)
as.Date(y, format = "%m/%d/%Y")
[1] "2019-08-22"
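Applied to a whole column, and formatted back to the month/day/year text the question asks for, this might look like the sketch below. The sample vector stands in for mydata_r11_Date$Date, and it assumes every value starts with a 10-character MM/DD/YYYY date.

```r
# Stand-in for the real column
x <- c("08/22/2019 02:05 PM CDT", "12/01/2019 09:30 AM CST")

# Keep the first 10 characters and parse them as month/day/year
d <- as.Date(substr(x, 1, 10), format = "%m/%d/%Y")

# Date objects print as YYYY-MM-DD; format() gives the MM/DD/YYYY text
out <- format(d, "%m/%d/%Y")
```

Whether to upload d (a Date) or out (text) depends on what the target database column expects.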

setting column to datetime in R

The date in my dataset is like this: 20130501000000 and I'm trying to convert this to a better datetime format in R
data1$date <- as.Date(data1$date, format = "%Y-%m-%s-%h-%m-%s")
However, I get an error saying an origin is needed. After I supplied the first value in the date column as the origin, every value under date was converted to NA. Is this right or should I try as.POSIXct()?
That is a somewhat degenerate format, but the anytime() and anydate() functions of the anytime package can help you, without requiring any explicit format strings:
R> anytime("20130501000000") ## returns POSIXct
[1] "2013-05-01 CDT"
R> anydate("20130501000000") ## returns Date
[1] "2013-05-01"
R>
Note that we parse from the character representation here -- parsing from numeric would be wrong, as we use a conflicting heuristic to make sense of dates stored as numeric values.
So here your code would just become
data1$date <- anytime::anydate(data1$date)
provided data1$date is character; otherwise wrap it in as.character().
Lastly, if you actually want Datetime rather than Date (as per your title), don't use anydate() but anytime().
Before I write my answer, I would like to say that the format argument should describe the format your string is actually in. Your string "20130501000000" has no - between the date components, so the format must not have any either:
as.Date("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01"
which works just fine, does not produce any error, and will return an object of class Date:
as.Date("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "Date"
Therefore, I think your issue is with the format string rather than with the origin.
Now to my detailed answer:
As far as I understand, as.Date() returns a Date, which drops the time of day, so if you want the time part of the string as well, you have to use as.POSIXct():
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S")
# [1] "2013-05-01 EEST"
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S") |> class()
# [1] "POSIXct" "POSIXt"
Note that the timezone is EEST, which is my local timezone. To get a different timezone, pass it explicitly through the tz argument. For example, to set the timezone to UTC:
as.POSIXct("20130501000000", format = "%Y%m%d%H%M%S", tz = "UTC")
# [1] "2013-05-01 UTC"
With as.POSIXct() you can also do arithmetic on the resulting objects:
times <- c("20130501000000",
"20130501035001") # added 03:50:01 to the first element
class(times)
# [1] "character"
times <- as.POSIXct(times, format = "%Y%m%d%H%M%S", tz = "UTC")
class(times)
# [1] "POSIXct" "POSIXt"
times[2] - times[1]
# Time difference of 3.833611 hours
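If the column is stored as a number rather than as text, it needs to become a digit string before parsing. As far as I know, as.character() can switch to scientific notation for large whole numbers, so this sketch uses sprintf("%.0f", ...) instead; the sample vector and the tz = "UTC" choice are assumptions.

```r
# Hypothetical numeric column of yyyymmddHHMMSS values
raw <- c(20130501000000, 20130501035001)

# Fixed-notation digit strings, no exponent
txt <- sprintf("%.0f", raw)

# Parse with a format that matches the string exactly
when <- as.POSIXct(txt, format = "%Y%m%d%H%M%S", tz = "UTC")
```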

Convert yyyymmdd string to Date class in R

I would like to convert these dates with format YYYYMMDD to a Date class.
dates <- data.frame(Date = c("20130707", "20130706", "20130705", "20130704"))
I tried:
dates <- as.Date(dates, "%Y%m%d")
And I get the following error:
Error in as.Date.default(dates, "%Y%m%d") :
do not know how to convert 'dates' to class "Date"
What would be the correct way to set this format?
You need to provide the Date column, not the entire data.frame.
R> as.Date(dates[["Date"]], "%Y%m%d")
[1] "2013-07-07" "2013-07-06" "2013-07-05" "2013-07-04"
If the values are stored as numbers rather than strings, an extra conversion to character works for me:
dates<-as.Date(as.character(dates),format="%Y%m%d")
Without the conversion the following error occurs:
dates<-as.Date(dates,format="%Y%m%d")
Error in as.Date.numeric(dates, format = "%Y%m%d") :
'origin' must be supplied
This is a different error, but it might help: the same approach works for POSIXct too if you paste the date and hour together and use the format %Y%m%d%H.
Classic R:
> start_numeric <- as.Date('20170215', format = '%Y%m%d');
> start_numeric
[1] "2017-02-15"
> format(start_numeric, "%Y%m%d")
[1] "20170215"
Use the lubridate package for an easy conversion:
library(lubridate)
date_test <- data.frame(Date = c("20130707", "20130706", "20130705", "20130704"))
date_test$Date <- ymd(date_test$Date)
date_test
Date
1 2013-07-07
2 2013-07-06
3 2013-07-05
4 2013-07-04
Instead of using brackets, you can select the column by name with the $ operator:
dates <- data.frame(Date = c("20130707", "20130706", "20130705", "20130704"))
as.Date(dates$Date, "%Y%m%d")
[1] "2013-07-07" "2013-07-06" "2013-07-05" "2013-07-04"
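To store the result back in the data frame, assign the converted vector to the column. A short sketch using the question's data:

```r
dates <- data.frame(Date = c("20130707", "20130706", "20130705", "20130704"))

# Convert the column (not the whole data frame) and overwrite it in place
dates$Date <- as.Date(dates$Date, format = "%Y%m%d")

class(dates$Date)  # "Date"
```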
