Date and Time defaulting to Jan 01, 1AD in Lubridate R package - r

Folks,
I am having trouble getting dates and times to display properly with lubridate.
Here's my code:
Temp.dat <- read_excel("Temperature Data.xlsx", sheet = "Sheet1", na = "NA") %>%
  mutate(Treatment = as.factor(Treatment),
         TempC = as.factor(TempC),
         TempF = as.factor(TempF),
         Month = as.factor(Month),
         Day = as.factor(Day),
         Year = as.factor(Year),
         Time = as.factor(Time)) %>%
  select(TempC, Treatment, Month, Day, Year, Time) %>%
  mutate(Measurement = make_datetime(Month, Day, Year, Time))
Here's what it spits out:
tibble [44 x 7] (S3: tbl_df/tbl/data.frame)
$ TempC : Factor w/ 38 levels "15.5555555555556",..: 31 32 29 20 17 28 27 26 23 24 ...
$ Treatment : Factor w/ 2 levels "Grass","Soil": 1 1 1 1 2 2 2 2 2 2 ...
$ Month : Factor w/ 1 level "6": 1 1 1 1 1 1 1 1 1 1 ...
$ Day : Factor w/ 2 levels "15","16": 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 1 level "2022": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : Factor w/ 3 levels "700","1200","1600": 3 3 3 3 3 3 3 3 3 3 ...
**$ Measurement: POSIXct[1:44], format: "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" ...**
I've put asterisks around the problem result. It should read June 16th at 0700 or something like that, but instead it defaults to January 01, 1 AD for some reason. I've tried adding colons to the times in Excel, but that switches to a 12-hour clock and I'd like to keep 24-hour times.
What's going on here?

This will work as long as the Time column in the Excel file is formatted as a time, so that it imports as a date-time object that lubridate can interpret.
library(readxl)
library(dplyr)
library(lubridate)
Temp.dat <- read_excel("t.xlsx", sheet = "Sheet1", na = "NA") %>%
  mutate(Treatment = as.factor(Treatment),
         TempC = as.numeric(TempC),
         TempF = as.numeric(TempF),
         Month = as.numeric(Month),
         Day = as.numeric(Day),
         Year = as.numeric(Year),
         Hour = hour(Time),
         Minute = minute(Time)) %>%
  select(TempC, Treatment, Month, Day, Year, Hour, Minute) %>%
  mutate(Measurement = make_datetime(year = Year,
                                     month = Month,
                                     day = Day,
                                     hour = Hour,
                                     min = Minute))
Notice that the values passed as arguments to make_datetime() are numeric, which is what the function expects. If you pass factors, you get the weird dates you were seeing.
There is no need to convert Time to a string and extract the hours and minutes, as I suggested in the comments, since you can use lubridate's hour() and minute() functions.
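To see why factors misbehave here, consider this minimal base-R sketch (standalone, not using the Excel data): as.numeric() on a factor returns the underlying level index, not the value printed on the label, which is exactly how make_datetime() ends up with months like 1 and hours like 3.

```r
# as.numeric() on a factor yields level indices, not the labels' values.
f <- factor(c("700", "1200", "1600"))
levels(f)                    # "1200" "1600" "700" (alphabetical order)
as.numeric(f)                # 3 1 2  -- level indices
as.numeric(as.character(f))  # 700 1200 1600 -- the actual values
```

If you must start from factors, `as.numeric(as.character(x))` is the standard safe round-trip.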
EDIT
In order to use lubridate's functions, Time needs to be a date-time object. You can check that it is by looking at what read_excel() produces:
> str(read_excel("t.xlsx", sheet = "Sheet1", na="NA"))
tibble [2 × 7] (S3: tbl_df/tbl/data.frame)
$ Treatment: chr [1:2] "s" "c"
$ TempC : num [1:2] 34 23
$ TempF : num [1:2] 99 60
$ Month : num [1:2] 5 4
$ Day : num [1:2] 1 15
$ Year : num [1:2] 2020 2021
$ Time : POSIXct[1:2], format: "1899-12-31 04:33:23" "1899-12-31 03:20:23"
See that Time is of type POSIXct, a date-time object. If it is not, you need to convert it into one before using lubridate's minute() and hour() functions. If it cannot be converted, there are other solutions, but they depend on what you have.
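For instance, if Time imported as plain HHMM-style numbers (such as 700 or 1615, a hypothetical case, not confirmed from the question), one alternative that avoids date-time conversion entirely is integer arithmetic:

```r
# Split HHMM-style times into hours and minutes with integer division / modulo.
time_hhmm <- c(700, 1200, 1615)
hour   <- time_hhmm %/% 100  # 7 12 16
minute <- time_hhmm %%  100  # 0 0 15
```

These hour and minute vectors can then be fed straight into make_datetime().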

Related

How to convert a column(class is POSIXct) in a preloaded dataset in R to a character column?

I have a preloaded dataset in R that I got from library(dplyr). The dataset is named pedestrian.
I want to convert the column Date_Time, whose class is POSIXct, to chr.
I used as.character() but it didn't give me what I want. My goal is to create a new dataset that excludes the Date_Time column, but select() still keeps it.
You are mistaken about where you got the data: there is no such dataset in dplyr. How do I know this?
data(package="dplyr")
# ---- returns ---
Data sets in package ‘dplyr’:
band_instruments Band membership
band_instruments2 Band membership
band_members Band membership
starwars Starwars characters
storms Storm tracks data
There is a dataset named "pedestrian" in the naniar package and it does have entries that match values in the fragment of data imaged in the link (although the column names and column order are different). Specifically:
grep("Birrarung", naniar::pedestrian$sensor_name) # returns 8455+1001 values
But your dataset has too many entries to be an exact match: it has 66,037 rows, somewhat less than double the number in naniar::pedestrian.
str(naniar::pedestrian)
tibble [37,700 × 9] (S3: tbl_df/tbl/data.frame)
$ hourly_counts: int [1:37700] 883 597 294 183 118 68 47 52 120 333 ...
$ date_time : POSIXct[1:37700], format: "2016-01-01 00:00:00" "2016-01-01 01:00:00" "2016-01-01 02:00:00" "2016-01-01 03:00:00" ...
$ year : int [1:37700] 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
$ month : Ord.factor w/ 12 levels "January"<"February"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ month_day : int [1:37700] 1 1 1 1 1 1 1 1 1 1 ...
$ week_day : Ord.factor w/ 7 levels "Sunday"<"Monday"<..: 6 6 6 6 6 6 6 6 6 6 ...
$ hour : int [1:37700] 0 1 2 3 4 5 6 7 8 9 ...
$ sensor_id : int [1:37700] 2 2 2 2 2 2 2 2 2 2 ...
$ sensor_name : chr [1:37700] "Bourke Street Mall (South)" "Bourke Street Mall (South)" "Bourke Street Mall (South)" "Bourke Street Mall (South)" ...
Your request seems a bit confusing: "My goal is to create a new dataset that exclude Date_Time column. Select() still keep Date_Time column." There is no column with that name in the naniar pedestrian dataset (capitalization needs to be exact), but if there were, it would be a simple matter to exclude it:
new_dat <- old_dat[!names(old_dat) %in% "Date_Time"]
# And maybe
new_dat <- old_dat %>% select(-Date_Time)
It would also be a simple matter to destructively convert to character with:
new_dat$Date_Time <- as.character(old_dat$Date_Time)
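Both operations are easy to verify on a tiny self-contained example (a hypothetical two-row frame, not the real pedestrian data):

```r
# Hypothetical frame with a POSIXct column, like pedestrian$Date_Time.
d <- data.frame(Sensor = c("a", "b"),
                Date_Time = as.POSIXct(c("2015-01-01 00:00:00",
                                         "2015-01-01 01:00:00"), tz = "UTC"))
d$Date_Time <- as.character(d$Date_Time)         # POSIXct -> character
d2 <- d[, names(d) != "Date_Time", drop = FALSE] # drop the column
```

After this, d$Date_Time is a character vector and d2 has only the Sensor column left.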
Since you have not shown the code you tried and cannot tell us where the data came from, we can only speculate about what you did and how it failed.
Maybe someone else created a dataset from the publicly accessible data at https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Monthly-counts-per-hour/b2ak-trbp
AHA! Found it!
https://tsibble.tidyverts.org/reference/pedestrian.html
str(pedestrian)
tbl_ts [66,037 × 5] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ Sensor : chr [1:66037] "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" ...
$ Date_Time: POSIXct[1:66037], format: "2015-01-01 00:00:00" "2015-01-01 01:00:00" "2015-01-01 02:00:00" "2015-01-01 03:00:00" ...
$ Date : Date[1:66037], format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Time : int [1:66037] 0 1 2 3 4 5 6 7 8 9 ...
$ Count : int [1:66037] 1630 826 567 264 139 77 44 56 113 166 ...
- attr(*, "key")= tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
..$ Sensor: chr [1:4] "Birrarung Marr" "Bourke Street Mall (North)" "QV Market-Elizabeth St (West)" "Southern Cross Station"
..$ .rows : list<int> [1:4]
.. ..$ : int [1:14566] 1 2 3 4 5 6 7 8 9 10 ...
.. ..$ : int [1:16414] 14567 14568 14569 14570 14571 14572 14573 14574 14575 14576 ...
.. ..$ : int [1:17518] 30981 30982 30983 30984 30985 30986 30987 30988 30989 30990 ...
.. ..$ : int [1:17539] 48499 48500 48501 48502 48503 48504 48505 48506 48507 48508 ...
.. ..# ptype: int(0)
..- attr(*, ".drop")= logi TRUE
- attr(*, "index")= chr "Date_Time"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "Date_Time"
- attr(*, "interval")= interval [1:1] 1h
..# .regular: logi TRUE
And now I see that the inimitable Rob Hyndman is behind this project: https://robjhyndman.com/hyndsight/tsibbles/
And I guess I am a bit behind the times, or perhaps I should say behind the time series. There are over 500 hits for an SO search on "[r] tsibble". Now I need to ask: did this answer your question?
Testing:
new_ped <- pedestrian %>% select(-Date_Time) # fails
pedestrian$Date_Time <- as.character(pedestrian$Date_Time) # succeeds
The tsibble::pedestrian dataset is a tsibble (tbl_ts) object and may not behave in the same manner as ordinary data frames. But my error was in using the wrong operator for column removal.
?select # need to choose dplyr version rather than MASS version
Need to use ! rather than "-":
new_ped <- pedestrian %>% select(!Date_Time)
> str(new_ped)
tibble [66,037 × 4] (S3: tbl_df/tbl/data.frame)
$ Sensor: chr [1:66037] "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" "Birrarung Marr" ...
$ Date : Date[1:66037], format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Time : int [1:66037] 0 1 2 3 4 5 6 7 8 9 ...
$ Count : int [1:66037] 1630 826 567 264 139 77 44 56 113 166 ...

Problem of ambiguous character format using scale_x_date in ggplot2

str(tidy_factors)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 70650 obs. of 4 variables:
$ date : Date, format: "1992-06-01" "1992-06-02" ...
$ Factor : Factor w/ 5 levels "CMA","HML","MKT",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Variable: Factor w/ 2 levels "Centrality","Return": 2 2 2 2 2 2 2 2 2 2 ...
$ Value : num -0.0012 -0.0022 -0.0012 -0.0029 0.0003 -0.0043 -0.0037 -0.0038 0.0026 -0.0024 ...
I would like to understand the pattern in the Value that the Factor takes over time (date).
library(tidyverse)
tidy_factors %>% filter(Variable=="Centrality")%>%
group_by(date) %>%
ggplot(aes(x=date,y=Factor, fill=Value))+
geom_bar(stat="identity")
I get to visualize it in a nice manner but the dates on the x axis are indistinguishable. When I try to scale_x_date to get a better understanding of the values that it takes according to different periods I get the following error:
tidy_factors %>% filter(Variable=="Centrality")%>%
group_by(date) %>%
ggplot(aes(x=date,y=Factor, fill=Value))+
geom_bar(stat="identity")+
scale_x_date(date_breaks = "1 year", date_labels="%Y")
Error in charToDate(x) :
character string is not in a standard unambiguous format
I also tried "1 years", "1 month", etc.
The dates are already unique for each Factor level. Can you tell me what the problem is?

How to create time intervals and count the rows in each interval in R

I have a data frame that stores call records from a call center. My goal is to count how many records fall in each time interval; for example, a 30-minute interval may contain three call records (that is, three calls entered within that interval). If there are no records for a time interval, the counter should show a zero value.
This post was useful, but I could not get it to show a zero value when there are no records in a time interval.
This is the structure of my call_log:
Classes ‘data.table’ and 'data.frame': 24416 obs. of 23 variables:
$ closecallid : int 1145000 1144998 1144997 1144996 1144995 1144991 1144989 1144987 1144986 1144984 ...
$ lead_id : int 1167647 1167645 1167644 1167643 1167642 1167638 1167636 1167634 1167633 1167631 ...
$ list_id :integer64 998 998 998 998 998 998 998 998 ...
$ campaign_id : chr "212120" "212120" "212120" "212120" ...
$ call_date : POSIXct, format: "2019-08-26 20:25:30" "2019-08-26 19:32:28" "2019-08-26 19:27:03" ...
$ start_epoch : POSIXct, format: "2019-08-26 20:25:30" "2019-08-26 19:32:28" "2019-08-26 19:27:03" ...
$ end_epoch : POSIXct, format: "2019-08-26 20:36:25" "2019-08-26 19:44:52" "2019-08-26 19:40:23" ...
$ length_in_sec : int 655 744 800 1109 771 511 640 153 757 227 ...
$ status : chr "Ar" "Ar" "Ar" "Ar" ...
$ phone_code : chr "1" "1" "1" "1" ...
$ phone_number : chr "17035555" "43667342" "3135324788" "3214255222" ...
$ user : chr "jfino" "jfino" "jfino" "jfino" ...
$ comments : chr "AUTO" "AUTO" "AUTO" "AUTO" ...
$ processed : chr "N" "N" "N" "N" ...
$ queue_seconds : num 0 524 692 577 238 95 104 0 0 0 ...
$ user_group : chr "CEAS" "CEAS" "CEAS" "CEAS" ...
$ xfercallid : int 0 0 0 0 0 0 0 0 0 0 ...
$ term_reason : chr "CALLER" "CALLER" "CALLER" "AGENT" ...
$ uniqueid : chr "1566869112.557969" "1566865941.557957" "1566865611.557952" "1566865127.557947" ...
$ agent_only : chr "" "" "" "" ...
$ queue_position: int 1 2 2 2 1 2 1 1 1 1 ...
$ called_count : int 1 1 1 1 1 1 1 1 1 1 ...
And this is my code:
df <- setDT(call_log)[ , list(number_customers_arrive = sum(called_count)), by = cut(call_date, "30 min")]
Thanks in advance.
Since there is no reproducible example, I'll attempt a solution on a simulated data frame. First, create a log of calls with an ID and a time:
library(lubridate)
library(dplyr)
library(magrittr)
set.seed(123)
# Generate 100 random call times during a day
calls.df <- data.frame(id = seq(1, 100, 1),
                       calltime = sample(seq(as.POSIXct('2019/10/01'),
                                             as.POSIXct('2019/10/02'),
                                             by = "min"), 100))
Not every interval may be represented in your call data, so generate the full sequence of 30-minute bins just in case:
full.df <- data.frame(bin=seq(as.POSIXct('2019/10/01'), as.POSIXct('2019/10/02'), by="30 min"))
Next tally up counts of calls in represented bins:
calls.df %>% arrange(calltime) %>%
  mutate(diff = interval(lag(calltime), calltime)) %>%
  mutate(mins = diff@.Data / 60) %>% select(-diff) %>%
  mutate(bin = floor_date(calltime, unit = "30 minutes")) %>%
  group_by(bin) %>% tally() -> orig.counts
Now make sure there are zeroes for unrepresented bins:
right_join(orig.counts,full.df,by="bin") %>% mutate(count=ifelse(is.na(n), 0, n))
# A tibble: 49 x 3
bin n count
<dttm> <int> <dbl>
1 2019-10-01 00:00:00 2 2
2 2019-10-01 00:30:00 1 1
3 2019-10-01 01:00:00 2 2
4 2019-10-01 01:30:00 NA 0
5 2019-10-01 02:00:00 2 2
6 2019-10-01 02:30:00 4 4
7 2019-10-01 03:00:00 1 1
8 2019-10-01 03:30:00 1 1
9 2019-10-01 04:00:00 2 2
10 2019-10-01 04:30:00 1 1
# ... with 39 more rows
Hope this is helpful.
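For comparison, here is a base-R sketch of the same idea on simulated times (my own simulation, not the questioner's data): cut() returns a factor whose levels cover every bin, so table() keeps zero counts for empty intervals without needing a join.

```r
set.seed(123)
# 100 random call times during one day
calltime <- as.POSIXct("2019-10-01", tz = "UTC") + runif(100, 0, 86399)
# Breakpoints for all 30-minute bins across the day (49 points, 48 bins)
breaks <- seq(as.POSIXct("2019-10-01", tz = "UTC"),
              as.POSIXct("2019-10-02", tz = "UTC"), by = "30 min")
bins <- cut(calltime, breaks = breaks)       # factor with one level per bin
counts <- as.data.frame(table(bin = bins))   # empty bins appear with Freq 0
```

Every one of the 48 bins appears in counts, including those with Freq of 0.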

Why does mutate change the variable type?

activity <- mutate(
activity, steps = ifelse(is.na(steps), lookup_mean(interval), steps))
The "steps" variable changes from an int to a list. I want it to stay an int so I can aggregate it (aggregate fails because it is a list type).
Before:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps : int NA NA NA NA NA NA NA NA NA NA ...
$ date : Factor w/ 61 levels "2012-10-01","2012-10-02",..: 1 1 1 1 1 1 1 1 1 1 ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
After:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps :List of 17568
..$ : num 1.72
..$ : num 1.72
lookup_mean is defined here:
lookup_mean <- function(i) {
  return(filter(daily_activity_pattern, interval == 0) %>% select(steps))
}
The problem is that lookup_mean returns a list (a one-column data frame), so R casts each value in activity$steps to a list. lookup_mean should be:
lookup_mean <- function(i) {
  interval <- filter(daily_activity_pattern, interval == 0) %>% select(steps)
  return(interval$steps)
}
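The distinction is easy to reproduce without the original data: select() (like single-bracket subsetting) returns a data frame, which is a list, while $ extracts the underlying vector. A minimal base-R sketch with a made-up frame:

```r
df <- data.frame(steps = c(1.72, 2.5))
sub <- df["steps"]  # one-column data.frame -- still a list
vec <- df$steps     # plain numeric vector -- safe inside ifelse()/mutate()
is.list(sub)        # TRUE
is.numeric(vec)     # TRUE
```

This is why returning `interval$steps` (a vector) fixes the list-column problem.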

Rbind throwing Error in NextMethod() : invalid value

My data frame is as follows:
> t
Day TestID VarID
1 2013-04-27 Total Total
> str(t)
'data.frame': 1 obs. of 3 variables:
$ Day : Date, format: "2013-04-27"
$ TestID: factor [1, 1] Total
..- attr(*, "levels")= chr "Total"
$ VarID : Factor w/ 3 levels "0|0","731|18503",..: 3
When I try doing a rbind I get the following error
> rbind(t,t)
Error in NextMethod() : invalid value
but when I try to recreate the data frame directly I don't get that error:
> t <- data.frame(Day = as.Date("2013-04-27"),TestID = "Total", VarID = "Total")
> t
Day TestID VarID
1 2013-04-27 Total Total
> str(t)
'data.frame': 1 obs. of 3 variables:
$ Day : Date, format: "2013-04-27"
$ TestID: Factor w/ 1 level "Total": 1
$ VarID : Factor w/ 1 level "Total": 1
> rbind(t,t)
Day TestID VarID
1 2013-04-27 Total Total
2 2013-04-27 Total Total
Can anyone help me figure out what is going on and how I can avoid this error?
Thanks.
The major difference I see is that the TestID variable in the first version is factor [1, 1] (a matrix) rather than Factor (a vector).
First version:
t1 <- data.frame(Day = as.Date("2013-04-27"),
                 TestID = "Total", VarID = "Total")
rbind(t1,t1)
Convert to second version:
t2 <- t1
dim(t2$TestID) <- c(1,1)
str(t2$TestID)
## factor [1, 1] Total
## - attr(*, "levels")= chr "Total"
rbind(t2,t2)
## Error in NextMethod() : invalid value
Fix the mangled version:
t3 <- t2
t3$TestID <- drop(t3$TestID)
rbind(t3,t3) ## works
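drop() works because it removes dimension attributes of extent one, turning the 1-by-1 matrix back into a plain vector. The same mechanics are easy to see with a numeric matrix:

```r
m <- matrix(5, nrow = 1, ncol = 1)
dim(m)        # 1 1
v <- drop(m)  # the value 5 with the dim attribute removed
dim(v)        # NULL -- a plain vector again
```

A factor with a dim attribute behaves the same way, which is why drop(t3$TestID) repairs the column so rbind() succeeds.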
