Draw trend lines on a scatterplot with datetime variable - r

I have a data frame of almost 1600 observations with this structure:
head(df)
Start_Time Duration
1 2014-09-18 10:01:00 4 mins
2 2014-09-18 08:01:00 41 mins
3 2014-09-18 08:01:00 22 mins
4 2014-09-18 08:01:00 41 mins
5 2014-09-18 08:01:00 60 mins
6 2014-09-18 07:02:00 17 mins
I have plotted my data with this function:
plot(df$Start_Time,as.numeric(df$Duration), ylab = "Duration", xlab = "Date", ylim = c(0,450))
Since the data frame contains several tens of observations per day, I would like to draw a trend line in order to make it easier to read the data visually.
I tried this code:
fit <- glm(df$Start_Time~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
but I get this error:
Error in model.frame.default(formula = df$Start_Time ~ df$Duration, :
invalid type (list) for variable 'df$Start_Time'
I got the same error with this code:
abline(lm(df$Start_Time ~ df$Duration))
From reading the error messages, I suppose that those functions can't handle non-numeric values.
I tried this and got no error, but the line wasn't displayed on my graph:
fit <- glm(as.numeric(df$Start_Time)~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
What is the correct way of drawing trend lines / regression lines when one of the variables is in the datetime format?
NOTE: what follows is the result of str(df)
str(df)
'data.frame': 4121 obs. of 2 variables:
$ Start_Time: POSIXlt, format: "2014-09-18 10:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" ...
$ Duration :Class 'difftime' atomic [1:4121] 4 41 22 41 60 17 17 2 3 3 ... .. ..- attr(*, "units")= chr "mins"

Try the following code that reproduces data in the format you stated, then fits a linear model using lm() instead of glm() and plots the results, including a line of best fit.
set.seed(1)
times <- as.POSIXct("2014-09-18") + sort(runif(11, min=0, max=1000))
df <- data.frame(Start_time = times[-11])
df$Duration <- difftime(times[-11], times[-1])
model <- lm(Start_time ~ Duration, df)
plot(Start_time ~ Duration, df)
abline(model)
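The structure of the data frame is close to what you report, with one difference: Start_time here is POSIXct, whereas yours is POSIXlt. A POSIXlt value is stored as a list, which is why model.frame complains about "invalid type (list)".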
str(df)
'data.frame': 10 obs. of 2 variables:
$ Start_time: POSIXct, format: "2014-09-18 00:01:01" "2014-09-18 00:03:21" "2014-09-18 00:03:25" ...
$ Duration :Class 'difftime' atomic [1:10] -139.9 -4.29 -59.53 -106.62 -200.73 ...
.. ..- attr(*, "units")= chr "secs"
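To put the trend line on the plot from the question (date on the x-axis, duration on the y-axis), one option is to convert the POSIXlt column to POSIXct and regress the duration on the time rather than the other way round. The following is a minimal sketch on made-up data, not the asker's data; the column name dur_min and the simulated values are assumptions for illustration only.

```r
set.seed(1)

# Hypothetical data shaped like the question: start times plus durations in minutes
start_lt <- as.POSIXlt(as.POSIXct("2014-09-18", tz = "UTC") + sort(runif(50, 0, 86400)))
df <- data.frame(Start_Time = as.POSIXct(start_lt))   # POSIXlt -> POSIXct (plain numeric seconds)
df$Duration <- as.difftime(round(runif(50, 1, 60)), units = "mins")

# Work with plain numbers so lm() and abline() share the same scale as the plot
df$dur_min <- as.numeric(df$Duration, units = "mins")
fit <- lm(dur_min ~ as.numeric(Start_Time), data = df)

# The x-axis of a POSIXct plot is in seconds, so the fitted line lands on the points
plot(df$Start_Time, df$dur_min, xlab = "Date", ylab = "Duration (mins)")
abline(fit, col = "red", lwd = 2)
```

The line in the question's last attempt was probably drawn off-screen, because that model had the time as the response while the plot had it on the x-axis.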

Related

Error in MEEM(object, conLin, control$niterEM) in lme function

I'm trying to apply the lme function to my data, but the model gives the following message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution that uses the lme4::lmer function, but is there another option that lets me keep using the lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
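The same diagnosis can be sketched in base R without caret, on a small made-up data set (the data frame d below is hypothetical, not AA_patch_GLM): an empty cell in the cross-tabulation goes together with a model matrix whose rank is lower than its number of columns.

```r
# Hypothetical toy data: every sex/month combination three times ...
d <- expand.grid(sex = c("F", "M"), month = c("Jan", "Feb", "Jun"), rep = 1:3)
# ... except that no females were sampled in June
d <- subset(d, !(sex == "F" & month == "Jun"))

# The empty cell shows up as a zero in the table
tab <- with(d, table(sex, month))
print(tab)

# Fixed-effect model matrix with the interaction
X <- model.matrix(~ sex * month, data = d)

# Rank-deficient: fewer independent columns than columns
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")
```

Here the monthJun and sexM:monthJun columns are identical, because June only occurs for males.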
You can:
construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat — but then you'll have to sort out main effects and interactions yourself (ugh)
construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
use a patched version of nlme (I submitted a patch here but it hasn't been considered)
remotes::install_github("bbolker/nlme")

Date and Time defaulting to Jan 01, 1AD in Lubridate R package

folks...
I am having trouble with date/time showing up properly in lubridate.
Here's my code:
Temp.dat <- read_excel("Temperature Data.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC=as.factor(TempC),
TempF=as.factor(TempF),
Month=as.factor(Month),
Day=as.factor(Day),
Year=as.factor(Year),
Time=as.factor(Time))%>%
select(TempC, Treatment, Month, Day, Year, Time)%>%
mutate(Measurement=make_datetime(Month, Day, Year, Time))
Here's what it spits out:
tibble [44 x 7] (S3: tbl_df/tbl/data.frame)
$ TempC : Factor w/ 38 levels "15.5555555555556",..: 31 32 29 20 17 28 27 26 23 24 ...
$ Treatment : Factor w/ 2 levels "Grass","Soil": 1 1 1 1 2 2 2 2 2 2 ...
$ Month : Factor w/ 1 level "6": 1 1 1 1 1 1 1 1 1 1 ...
$ Day : Factor w/ 2 levels "15","16": 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 1 level "2022": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : Factor w/ 3 levels "700","1200","1600": 3 3 3 3 3 3 3 3 3 3 ...
**$ Measurement: POSIXct[1:44], format: "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" ...**
I've put asterisks by the problem result. It should spit out June 16th at 0700 or something like that, but instead it's defaulting to January 01, 1 AD for some reason. I've tried adding colons to the date in Excel, but that defaults to a 12-hour clock and I'd like to keep this at 24 hours.
What's going on here?
This will work as long as the Time column in the Excel file is formatted as a time, so that it imports as a date-time object that lubridate can interpret.
library(readxl)   # provides read_excel()
library(dplyr)
library(lubridate)
Temp.dat <- read_excel("t.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC = as.numeric(TempC),
TempF = as.numeric(TempF),
Month = as.numeric(Month),
Day = as.numeric(Day),
Year = as.numeric(Year),
Hour = hour(Time),
Minute = minute(Time)) %>%
select(TempC, Treatment, Month, Day, Year, Hour, Minute) %>%
mutate(Measurement = make_datetime(year = Year,
month = Month,
day = Day,
hour = Hour,
min = Minute))
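Notice that the arguments to make_datetime() are passed by name and are numeric, which is what the function expects. Your original call passed Month, Day, Year and Time positionally, but the positional order is year, month, day, hour, min, sec. Because the values were factors, their integer codes were used (year 1, month 1, day 1, hour 3), which gives the weird dates you were seeing.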
No need to convert Time to string and extract hours and minutes, as I suggested in the comments, since you can use lubridate's minute() and hour() functions.
EDIT
To use lubridate's functions, Time needs to be a date-time object. You can check that it is by looking at what read_excel() produces:
> str(read_excel("t.xlsx", sheet = "Sheet1", na="NA"))
tibble [2 × 7] (S3: tbl_df/tbl/data.frame)
$ Treatment: chr [1:2] "s" "c"
$ TempC : num [1:2] 34 23
$ TempF : num [1:2] 99 60
$ Month : num [1:2] 5 4
$ Day : num [1:2] 1 15
$ Year : num [1:2] 2020 2021
$ Time : POSIXct[1:2], format: "1899-12-31 04:33:23" "1899-12-31 03:20:23"
See that Time is type POSIXct, a date-time object. If it is not, then you need to convert it into one if you want to use lubridate's minute() and hour() functions. If it cannot be converted, there are other solutions, but they depend on what you have.
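If Time arrives as a 24-hour number such as 700 or 1600 (as in the question's str() output) rather than as a date-time, one possible workaround is to split it arithmetically. This is a minimal sketch on made-up vectors, using base R's ISOdatetime() so it runs without extra packages; the helper num() is hypothetical. The same numeric pieces could be passed by name to make_datetime().

```r
# Hypothetical columns mimicking the question: numbers stored as factor labels
Year  <- factor(c("2022", "2022"))
Month <- factor(c("6", "6"))
Day   <- factor(c("15", "16"))
Time  <- factor(c("1600", "700"))   # 24-hour clock written as HHMM

# Pitfall: as.numeric() on a factor returns the integer codes, not the labels
print(as.numeric(Year))

# Go through character to recover the actual numbers
num <- function(f) as.numeric(as.character(f))

hhmm <- num(Time)
Measurement <- ISOdatetime(year = num(Year), month = num(Month), day = num(Day),
                           hour = hhmm %/% 100,   # 1600 -> 16, 700 -> 7
                           min  = hhmm %% 100,    # 1600 -> 0,  700 -> 0
                           sec  = 0, tz = "UTC")
print(Measurement)
```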

Create ITime intervals in data.table

I have a datetime variable (vardt) stored as a character in a large data table, e.g. "21/07/2011 15:54:57".
I can turn it into ITime class (e.g. 15:54:57) with DT[,newtimevar:=as.ITime(substr(DT$vardt,12,19))] but I would like to create groups of minutes, so from 21/07/2011 15:54:57 I would obtain 15:54:00 or 15:54.
I have tried: DT[,cuttime := as.ITime(cut(DT$vardt, breaks = "1 min",))]
but it didn't work. I am reading the zoo package documentation but I haven't found anything yet. Any idea/function that could be useful for this case in a large data table?
Here are two possible approaches:
library(data.table)
##
x <- Sys.time()+sample(seq(0,24*3600,60),101,TRUE)
x <- gsub(
"(\\d+)\\-(\\d+)\\-(\\d+)",
"\\3/\\2/\\1",
x)
##
DT <- data.table(vardt=x)
##
DT[,time:=as.ITime(substr(vardt,12,19))]
##
DT[,hour_min:=as.ITime(
gsub("(\\d+)\\:(\\d+)\\:(\\d+)",
"\\1\\:\\2\\:00",time))]
DT[,c_hour_min:=substr(time,1,5)]
##
R> head(DT)
vardt time hour_min c_hour_min
1: 28/01/2015 05:38:30 05:38:30 05:38:00 05:38
2: 27/01/2015 14:15:30 14:15:30 14:15:00 14:15
3: 28/01/2015 06:03:30 06:03:30 06:03:00 06:03
4: 28/01/2015 00:37:30 00:37:30 00:37:00 00:37
5: 27/01/2015 17:59:30 17:59:30 17:59:00 17:59
6: 28/01/2015 03:46:30 03:46:30 03:46:00 03:46
R> str(DT,vec.len=2)
Classes ‘data.table’ and 'data.frame': 101 obs. of 4 variables:
$ vardt : chr "28/01/2015 05:38:30" "27/01/2015 14:15:30" ...
$ time :Class 'ITime' int [1:101] 20310 51330 21810 2250 64770 ...
$ hour_min :Class 'ITime' int [1:101] 20280 51300 21780 2220 64740 ...
$ c_hour_min: chr "05:38" "14:15" ...
- attr(*, ".internal.selfref")=<externalptr>
The first case, hour_min, preserves the ITime class, while the second case, c_hour_min, is just a character vector.
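A third possibility, shown here in base R rather than with ITime, is to parse the full timestamp and truncate it to the minute. This is only a sketch on three made-up values; the variable names are assumptions, and the resulting label could be used as a grouping column in a data.table.

```r
# Hypothetical timestamps in the dd/mm/yyyy format from the question
vardt <- c("21/07/2011 15:54:57", "21/07/2011 15:54:03", "21/07/2011 16:01:30")

# Parse the character strings into POSIXct
ts <- as.POSIXct(vardt, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Drop the seconds: trunc() returns POSIXlt, so convert back to POSIXct
minute_start <- as.POSIXct(trunc(ts, units = "mins"))

# Character label "HH:MM" for grouping
minute_label <- format(minute_start, "%H:%M")
print(table(minute_label))
```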

Work with durations over 24 hours in R

I have a series of durations that range up to 118 hours, in a format like "118:34:42", where 118 is hours, 34 is minutes, and 42 is seconds. The output should be a number of seconds.
I would like to convert this to some kind of time type in R, but most of the libraries I've looked at want to add a date (lubridate, zoo, xts), or return "NA" due to the hours being beyond a 24 hour range. I could parse the string and return a number of seconds, but I'm wondering if there's a faster way.
I'm slightly new to R (maybe 3 months in to working with this).
Any help figuring out how to deal with this would be appreciated.
Example:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
Error in parse_date_time(hms, orders, truncated = truncated, quiet = TRUE) :
No formats could be infered from the training set.
#try another route
w <- "118:34:42"
tt2 <- hms(w)
tt2
#[1] NA
z <- "7:02:02"
tt3 <- hms(z)
tt3
#[1] "7H 2M 2S"
In the lubridate package there is a function hms() that returns a time object:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
tt
[1] 118 hours, 34 minutes and 42 seconds
[2] 114 hours, 12 minutes and 12 seconds
The function hms() returns an object of class Period:
str(tt)
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num [1:2] 42 12
..@ year : num [1:2] 0 0
..@ month : num [1:2] 0 0
..@ day : num [1:2] 0 0
..@ hour : num [1:2] 118 114
..@ minute: num [1:2] 34 12
You can do arithmetic using these objects. For example:
tt[2] - tt[1]
[1] -4 hours, -22 minutes and -30 seconds
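Since the question asks for a number of seconds, here is a minimal base R sketch that parses the strings directly and does not depend on lubridate; the function name hms_to_seconds is made up for this example. With lubridate loaded, period_to_seconds(tt) should give the same numbers.

```r
# Hypothetical helper: "H:M:S" strings (hours may exceed 24) -> seconds
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # weight hours, minutes, seconds and add them up for each string
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

x <- c("118:34:42", "114:12:12")
secs <- hms_to_seconds(x)
print(secs)

# Optional: keep the units attached as a difftime
d <- as.difftime(secs, units = "secs")
print(d)
```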

Trouble plotting with dates in R

I am relatively new to R and am having trouble plotting grouped data against date. I have count data grouped by month over 4 years. I don't want May of 2008 grouped with May 2009 but rather points for each month of each year with standard errors. Here is my code so far but I get a blank graph with no points. I can get rid of the axis.POSIXct line and I get a graph with points and error bars. The problem seems to be around the scaling or data format of the plot vs. the axis. Can anyone help me here?
> r <- as.POSIXct(range(refmCount$mo.yr), "month")
>
> ############# can get plot and points to line up on the x-axis##########################
> plot(refmCount$mo.yr, refmCount$count, type = "n", xaxt = "n",
+ xlab = "Date",
+ ylab = "Mean number of salamanders per night",
+ xlim = c(r[1], r[2]))
> axis.POSIXct(1, at = seq(r[1], r[2], by = "month"), format = "%b")
> points(refmCount$mo.yr, refmCount$count, type = "p", pch = 19)
points(depmCount$mo.yr, depmCount$count, type = "p", pch = 24)
> arrows(refmCount$mo.yr, refmCount$count+mCount$se, refmCount$mo.yr, refmCount$count- refmCount$se, angle=90, code=3, length=0)
>
> str(refmCount)
'data.frame': 19 obs. of 7 variables:
$ mo.yr:Class 'Date' num [1:19] 14000 14031 14061 14092 14123 ...
$ trt : Factor w/ 2 levels "Depletion","Reference": 2 2 2 2 2 2 2 2 2 2 ...
$ N : num 75 110 15 10 34 20 20 10 40 15 ...
$ count: num 3.6 5.95 3.47 6.7 11.12 ...
$ sd : num 8.58 8.4 4.42 3.47 11.88 ...
$ se : num 0.99 0.801 1.142 1.096 2.037 ...
$ ci : num 1.97 1.59 2.45 2.48 4.14 ...
> r
[1] "2008-04-30 20:00:00 EDT" "2011-05-31 20:00:00 EDT"
>
You have two choices. Install package "zoo" and use the yearmon class, or calculate numeric months so that May 2005 is 2005.4167. You can create prettier labels with paste(month.abb[month], year).
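The second choice can be sketched in base R as follows. The dates and counts are made up for illustration and the variable names are assumptions. As a side note, the blank plot in the question is probably caused by mixing scales: mo.yr is a Date (counted in days), while the xlim and axis positions are POSIXct (counted in seconds).

```r
# Hypothetical monthly dates and mean counts
dates  <- as.Date(c("2008-05-01", "2008-06-01", "2009-05-01"))
counts <- c(3.6, 5.95, 3.47)

# Numeric months: year + month/12, so May 2008 becomes 2008.4167
year      <- as.integer(format(dates, "%Y"))
month     <- as.integer(format(dates, "%m"))
num_month <- year + month / 12

# Prettier labels, e.g. "May 2008"
labels <- paste(month.abb[month], year)

# Plot on the numeric scale and put the labels on the axis by hand
plot(num_month, counts, xaxt = "n", pch = 19,
     xlab = "Date", ylab = "Mean number per night")
axis(1, at = num_month, labels = labels)
```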
