Cannot Correctly Subset R DF - r

I have pulled some data via SQL into a dataframe. I am now trying to subset that data and have had no luck.
I wish to loop through each row and identify the previous hour, and then select the subset of the DF where date == previous hour. (I understand there are other ways of doing this, but I want to understand why this isn't working.) When I do this it returns an empty df. However, if I paste the value of the previous hour directly as a string, I get the result I want.
Both variables are POSIXct and any attempt to convert to character fails. Can someone please tell me what on earth is going on? :S
My code:
library(lubridate)  # provides hours()
for(row in 1:3){
  PreviousHour <- as.POSIXct(Data$mydate[row] - hours(1), tz = "UTC")
  Date <- Data$mydate[row]
  print(c(Data$mydate[row],PreviousHour))
  #"2019-11-20 23:00:00 GMT" "2019-11-20 22:00:00 GMT"
  print(Data$mydate[row] == PreviousHour)
  #FALSE
  print(subset(Data,Data$mydate == PreviousHour))
  # A tibble 0x5
  print(subset(Data,Data$mydate == "2019-11-20 22:00:00 GMT"))
  # A tibble 1x5
}
Code if I manually create the df (This works):
mydate <- c(as.POSIXct("2019-11-20 22:00:00", tz = "UTC"),as.POSIXct("2019-11-20 21:00:00", tz = "UTC"))
Data <- data.frame(mydate)
for(row in 1:1){
  PreviousHour <- as.POSIXct(Data$mydate[row] - hours(1), tz = "UTC")
  Date <- Data$mydate[row]
  print(c(Data$mydate[row],PreviousHour))
  #"2019-11-20 22:00:00 GMT" "2019-11-20 21:00:00 GMT"
  print(Data$mydate[row] == PreviousHour)
  #FALSE
  print(subset(Data,Data$mydate == PreviousHour))
  # A tibble 1x1
}
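One thing worth checking when two POSIXct values print identically yet compare as unequal is whether they differ by a fraction of a second, because the default display hides sub-second parts. Whether that is what happens with the SQL data here is an assumption, not something the question confirms. A minimal base-R sketch with made-up values:

```r
# POSIXct stores seconds since 1970-01-01 as a double.
a <- as.POSIXct("2019-11-20 22:00:00", tz = "UTC")
b <- a + 0.4                      # hypothetical value carrying an extra 0.4 s

fmt <- "%Y-%m-%d %H:%M:%S"        # %S shows whole seconds only
same_text   <- format(a, fmt) == format(b, fmt)  # identical when shown to the second
same_value  <- a == b                            # but not equal as numbers
gap_seconds <- as.numeric(b) - as.numeric(a)     # the hidden difference

# Comparing after truncating to whole seconds ignores the hidden fraction
same_after_trunc <- trunc(a, "secs") == trunc(b, "secs")
```

If gap_seconds is non-zero for the real data, truncating or rounding both sides before the == comparison is one way to make the subset match.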

Related

as.POSIXct returning a double when used in a function instead of DateTime

I have a messy database to deal with, where the date time was sometimes stored in 24-hour format with no seconds and other times in 12-hour format with AM/PM at the end (it could have happened during a Windows update of our measurement computer or something, I don't know).
I want to convert the DateTime string to a usable DateTime object with as.POSIXct, but when I try the following code it is converted into a double (I checked the class and it is numeric).
library(dplyr)    # %>% and mutate()
library(stringr)  # str_detect()
main_function <- function(res_df) {
  res_df <- res_df %>%
    mutate(DateTime = sapply(DateTime, date_time_convert))
}
date_time_convert <- function(dt_string, tz = "Europe/Amsterdam") {
  # Strings containing "M" carry an AM/PM marker, so they use the 12-hour format
  if (str_detect(dt_string, "M")) {
    dt_format <- "%m/%d/%Y %I:%M:%S %p"
  } else {
    dt_format <- "%m/%d/%Y %H:%M"
  }
  as.POSIXct(dt_string, format = dt_format, tz = tz)
}
When I debug, the code executes properly in the function (returns a DateTime object), but when it moves into my dataframe the dates are all converted into doubles.
sapply and similar do not always play well with POSIXt as an output: when they simplify the list of results into a vector, the class attribute is dropped and only the underlying numbers remain. Here's an alternative: use do.call(c, lapply(..., date_time_convert)).
Demo with sample data:
vec <- c("2021-01-01", "2022-01-01")
### neither 'sapply(..)' nor 'unlist(lapply(..))' works
sapply(vec, as.POSIXct)
# 2021-01-01 2022-01-01
# 1609477200 1641013200
unlist(lapply(vec, as.POSIXct))
# [1] 1609477200 1641013200
do.call(c, lapply(vec, as.POSIXct))
# [1] "2021-01-01 EST" "2022-01-01 EST"
which means your code would be
res_df %>%
mutate(DateTime = do.call(c, lapply(DateTime, date_time_convert)))
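Another option is to avoid the per-element call altogether and convert each group of strings in one vectorised step. The sketch below is only an illustration: convert_mixed is a hypothetical helper name, the sample strings are made up, and it assumes (as in the question) that an "M" in the string marks the 12-hour format.

```r
# Hypothetical helper: convert mixed 12-hour/24-hour strings without sapply()
convert_mixed <- function(x, tz = "Europe/Amsterdam") {
  is_12h <- grepl("M", x, fixed = TRUE)   # AM/PM marker present

  # Start from an all-NA POSIXct vector so the class and time zone are kept
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)

  # Fill each group with a single vectorised as.POSIXct() call
  out[is_12h]  <- as.POSIXct(x[is_12h],  format = "%m/%d/%Y %I:%M:%S %p", tz = tz)
  out[!is_12h] <- as.POSIXct(x[!is_12h], format = "%m/%d/%Y %H:%M", tz = tz)
  out
}

# Made-up sample: the same instant written in both formats
x   <- c("01/15/2021 02:30:00 PM", "01/15/2021 14:30")
res <- convert_mixed(x)
class(res)
```

Inside the pipeline this would be used as mutate(DateTime = convert_mixed(DateTime)). Assignment into an existing POSIXct vector keeps its class, which is why no do.call(c, ...) step is needed here.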

R. Convert TimeStamp column from DataFrame to Date Format column

I have a question.
There is a column with timestamp records like '1643410273' (more than 1.2M records in total). How can I transform it into Date format?
I wrote this code (R language):
mydata <- read.csv("summary_dataset.csv")
unique(mydata$Callsign)
flight <- mydata[mydata$Callsign == "AFR228",]
AltitudeValue <- flight$Altitude
UTC_Timestamp <- flight$Timestamp
Flight_Date <- vector()
for (i in 1:length(UTC_Timestamp)){
  Flight_Date[i]=as.POSIXct(UTC_Timestamp[i], origin='1970-01-01', tz="UTC")
}
Flight_Date
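But as a result, the vector Flight_Date was filled with the raw timestamp values. What's wrong?
Assigning each POSIXct value into a plain vector inside the loop keeps only the underlying number, and no loop is needed anyway. Convert the Timestamp column to numeric first, change it to POSIXct format by passing origin, and extract only the date from it.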
flight$Flight_Date <- as.Date(as.POSIXct(as.numeric(flight$Timestamp),
origin='1970-01-01', tz="UTC"))
Example -
as.POSIXct(1643410273, origin='1970-01-01', tz="UTC")
#[1] "2022-01-28 22:51:13 UTC"
as.Date(as.POSIXct(1643410273, origin='1970-01-01', tz="UTC"))
#[1] "2022-01-28"
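To see why the loop in the question ends up with plain numbers, the following base-R sketch compares element-wise assignment with a single vectorised call. The two timestamps are made up (the first is the one from the example above), and looped and direct are illustrative names.

```r
ts <- c(1643410273, 1643413873)   # made-up epoch seconds, one hour apart

# Element-wise assignment into an untyped vector keeps only the numbers
looped <- vector()
for (i in seq_along(ts)) {
  looped[i] <- as.POSIXct(ts[i], origin = "1970-01-01", tz = "UTC")
}
class(looped)   # "numeric": the POSIXct class was lost on assignment

# One vectorised call keeps the class for the whole column
direct <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
class(direct)   # "POSIXct" "POSIXt"

flight_dates <- as.Date(direct)   # dates only, taken in UTC
```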

change all datetimes to 2PM and keep the dates same

I have a datetime object and I want to change all times to 2PM while keeping the dates the same.
I used floor_date to get the start of the corresponding date and then added a period of 14 hours to get 2PM.
Sometimes the result shows only the date and no time. Sometimes it shows both date and time.
Is there another approach to do this?
library(lubridate)
t1 <- floor_date(Sys.time(), unit = "day") + hours(14)
t2 <- floor_date(ymd_hms("2021-08-25 10:36:00"), unit = "day") + hours(14)
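You can replace the time component with the time of day you want. Here is a function to do that.
change_time_to_x <- function(time, x) {
  # Format the date part explicitly, then append the new time of day.
  # (Stripping the time from the default text form can miss midnight values,
  # which may be converted to text without a time part.)
  as.POSIXct(paste(format(time, '%Y-%m-%d'), x), tz = 'UTC')
}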
input <- lubridate::ymd_hms(Sys.time(), "2021-08-25 10:36:00", "2012-12-31 00:00:00")
change_time_to_x(input, '14:00:00')
#[1] "2021-08-26 14:00:00 UTC" "2021-08-25 14:00:00 UTC" "2012-12-31 14:00:00 UTC"
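A string-free variant is to set the clock fields of a POSIXlt object directly. This is only a sketch in base R: set_time_of_day is a hypothetical name, and it assumes the dates should be read in UTC, as in the answer above.

```r
# Hypothetical helper: keep the calendar date, overwrite the time of day
set_time_of_day <- function(x, hour, tz = "UTC") {
  lt <- as.POSIXlt(x, tz = tz)            # broken-down time with named fields
  lt$hour <- rep(hour, length(x))
  lt$min  <- rep(0, length(x))
  lt$sec  <- rep(0, length(x))
  as.POSIXct(lt)                          # back to POSIXct
}

input <- as.POSIXct(c("2021-08-25 10:36:00", "2012-12-31 00:00:00"), tz = "UTC")
out   <- set_time_of_day(input, 14)
format(out, "%Y-%m-%d %H:%M:%S")
# "2021-08-25 14:00:00" "2012-12-31 14:00:00"
```

Passing an explicit format string to format() always prints the time, which sidesteps the display in which midnight values appear as dates only.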

Creating a dummy variable for certain hours of the day

I need some help. I'm currently trying to fit a linear model to hourly electricity prices, so I was thinking of creating a dummy that takes the value 1 if the hour of the day is between 06:00 and 20:00. Unfortunately, I have struggled so far.
time.cet <- as.POSIXct(time.numeric, origin = "1970-01-01", tz=local.time.zone)
hours.S <- strftime(time.cet, format = "%H:%M:%S", tz=local.time.zone)
head(time.cet)
[1] "2007-01-01 00:00:00 CET" "2007-01-01 01:00:00 CET" "2007-01-01 02:00:00 CET"
[4] "2007-01-01 03:00:00 CET" "2007-01-01 04:00:00 CET" "2007-01-01 05:00:00 CET"
I hope someone can help.
When I do time cutoffs I like to make the cutoffs as objects. This way, if you need to change the cutoffs, it's much easier to change the object's value instead of the value in the conditional statements.
My code below uses lubridate, which is a great package for managing times and dates, and should give you the info you need to incorporate a dummy variable into your analysis.
###
### Load Package
###
library(lubridate)
###
### Designate Time Cut-Offs
###
Beginning <- hms("06:00:00")
End <- hms("20:00:00")
###
### Designate Test Cut-Offs
###
Test.1 <- hms("5:00:00")
Test.2 <- hms("11:00:00")
###
### Test Conditional Logic
###
### Value will be 1 if time is between, value will be 0 if it is not.
###
ifelse( ((Test.1 >= Beginning) & (Test.1 <= End)) , 1, 0)
########## This should (and does) return a 0
ifelse( ((Test.2 >= Beginning) & (Test.2 <= End)) , 1, 0)
####### This should (and does) return a 1
###
### Create New Variable On Previous Data Frame (Your.DF) named Time.Dummy
###
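### Value for new variable will be 1 if time is between, value will be 0 if it is not.
### The time of day (hours.S from the question, parsed with hms()) is compared with the cutoffs;
### a full date-time such as time.cet cannot be compared with Beginning/End directly.
###
Your.DF$Time.Dummy <- ifelse( ((hms(hours.S) >= Beginning) & (hms(hours.S) <= End)) , 1, 0)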
ifelse() statements are a convenient way to create a dummy variable. I don't know much about working with time personally, but creating a dummy variable would take a form similar to:
dummy <- with(data, ifelse(time > "06:00" & time < "20:00", 1, 0))
Where data is whatever your data is called, and time is the column that your times are stored in. The quoted cutoffs assume that time holds zero-padded "HH:MM" strings, which compare correctly as text; you may need to play around with the conditions a little if your times are stored differently.
library(lubridate)
# Create fake data
set.seed(2)
dat = data.frame(time = seq(ymd_hms("2016-01-01 00:00:00"), ymd_hms("2016-01-31 00:00:00"), by="hour"))
dat$price = 1 + cumsum(rnorm(nrow(dat), 0, 0.01))
# Create time dummy
dat$dummy = ifelse(hour(dat$time) >=6 & hour(dat$time) <= 20, 1, 0)
Try to include reproducible code next time. Looks like you're missing time.numeric for instance.
Okay, I had to make up some random times.
time.cet <- c( ymd_hms( "2007-01-01 00:00:00" ),
ymd_hms( "2007-01-01 06:00:00" ),
ymd_hms( "2007-01-01 12:00:00" ) )
time.cet
[1] "2006-12-31 18:00:00 CST" "2007-01-01 00:00:00 CST" "2007-01-01 06:00:00 CST"
Note a time zone issue, which is unimportant to the solution.
You can use dplyr::between and lubridate::hour to get a list of TRUE/FALSE (or 1/0) for whether X time is between A & B.
library(dplyr)
library(lubridate)
A <- 6
B <- 20
between( hour(time.cet), A, B )
[1] TRUE FALSE TRUE
Note that between is inclusive >= & <=
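For completeness, the same dummy can be built in base R without extra packages. This is a sketch under stated assumptions: the hourly series is made up, "Europe/Berlin" stands in for the unspecified local.time.zone of the question, and the window is taken as 06:00 up to but not including 20:00.

```r
# Made-up hourly series covering one day
time.cet <- seq(as.POSIXct("2007-01-01 00:00:00", tz = "Europe/Berlin"),
                by = "hour", length.out = 24)

# Hour of day as an integer, read in the series' own time zone
hr <- as.integer(format(time.cet, "%H"))

# 1 for hours 6 to 19, 0 otherwise
dummy <- as.integer(hr >= 6 & hr < 20)
```

Whether the 20:00 hour itself belongs in the window is a modelling choice; replace hr < 20 with hr <= 20 to include it, which matches the inclusive behaviour of between().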

How to maintain time zone with POSIX DateTime and avoid NAs introduced by coercion in R?

I create a time series dataframe that I will use to merge other time series data into.
dates2010 <- seq(as.POSIXct("2010-06-15 00:00:00", tz = "GMT"), as.POSIXct("2010-09-15 23:00:00", tz = "GMT"), by="hour") # make string of DateTimes for summer 2010
dates2011 <- seq(as.POSIXct("2011-06-15 00:00:00", tz = "GMT"), as.POSIXct("2011-09-15 23:00:00", tz = "GMT"), by="hour") # make string of DateTimes for summer 2011
dates <- c(dates2010, dates2011) # combine the dates from both years
sites <- c("a", "b", "c") # make string of all sites
datereps <- rep(dates, length(sites)) # repeat the date string the same number of times as there are sites
sitereps <- rep(sites, each = length(dates)) # repeat each site in the string the same number of times as there are dates
n <- data.frame(DateTime = datereps, SiteName = sitereps) # merge two strings with all dates and all sites
n <- n[order(n$SiteName, n$Date),] # re-order based on site, then date
If I run the above code, 'dates2010' and 'dates2011' are in GMT: dates2010[1] "2010-06-15 00:00:00 GMT".
But when I create the object 'dates', for some reason the time zone switches to EST: dates[1]
"2010-06-14 19:00:00 EST"
Maybe it has something to do with POSIX classes?
class(dates2010)
[1] "POSIXct" "POSIXt"
I attempted to change the default time zone for R to GMT to avoid time zone switching problems. This results in an NA coercion warning when I attempt to order the data frame 'n' and merge other data frames into 'n'.
n <- n[order(n$SiteName, n$Date),]
Warning message:
In xtfrm.POSIXct(x) : NAs introduced by coercion
Any thoughts on how I might keep time zones constant and avoid the NA coercion errors? Thank You!
c() drops attributes (R versions from 4.1.0 onward keep the time zone when every input shares it). So when you created dates, the time zone was dropped and the values were displayed in your local time zone. Fortunately you can use structure() and set the time zone there.
dates <- structure(c(dates2010, dates2011), tzone = "GMT")
head(dates)
# [1] "2010-06-15 00:00:00 GMT" "2010-06-15 01:00:00 GMT"
# [3] "2010-06-15 02:00:00 GMT" "2010-06-15 03:00:00 GMT"
# [5] "2010-06-15 04:00:00 GMT" "2010-06-15 05:00:00 GMT"
If dates was already created, you can add/change the tzone attribute later.
attr(dates, "tzone") <- "GMT"
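Changing the tzone attribute only changes how the values are displayed; the stored instants stay the same. A small base-R sketch with one of the timestamps above, where "America/New_York" is an assumed stand-in for the asker's local zone:

```r
x <- as.POSIXct("2010-06-15 00:00:00", tz = "GMT")

y <- x
attr(y, "tzone") <- "America/New_York"   # relabel only, no conversion of the value

same_instant <- as.numeric(x) == as.numeric(y)   # TRUE: same seconds since epoch
shown_gmt    <- format(x, "%Y-%m-%d %H:%M")      # "2010-06-15 00:00"
shown_ny     <- format(y, "%Y-%m-%d %H:%M")      # "2010-06-14 20:00"
```

Because the underlying numbers are unchanged, ordering and merging on such a column give the same result whichever time zone is used for display.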
