I'm trying to run different forecast modeling methods on a monthly tsibble dataset. Its head() looks like:
# A tsibble: 6 x 2 [1M]
month total
<mth> <dbl>
1 2000 Jan 104.
2 2000 Feb 618.
3 2000 Mar 1005.
4 2000 Apr 523.
5 2000 May 1908.
6 2000 Jun 1062.
and has a structure of:
tsibble [212 x 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
$ month: mth [1:212] 2000 Jan, 2000 Feb, 2000 Mar, 2000 Apr, 2000 May, 2000 Jun, 2000 Jul, 2000 Aug, 2000 Sep, 2000 Oct, 2000 Nov...
$ total: num [1:212] 104 618 1005 523 1908 ...
- attr(*, "key")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
..$ .rows: list<int> [1:1]
.. ..$ : int [1:212] 1 2 3 4 5 6 7 8 9 10 ...
.. ..# ptype: int(0)
- attr(*, "index")= chr "month"
..- attr(*, "ordered")= logi TRUE
- attr(*, "index2")= chr "month"
- attr(*, "interval")= interval [1:1] 1M
..# .regular: logi TRUE
The dataset is monthly from 2000/01 to 2017/08 with no missing values or time periods. I'm trying to run a model such as:
df %>%
model(STL(total ~ season(window=9),robust=T)) %>%
components() %>% autoplot()
fit <- df %>%
model(ANN =ETS(total ~ error("A") + trend("A") + season()))
But for any type of model I try to run I get the exact same error each time. I'm looking for suggestions to correct the structure of the tsibble to allow these model functions to work.
Error in UseMethod("model") :
no applicable method for 'model' applied to an object of class "c('tbl_ts', 'tbl_df', 'tbl', 'data.frame')"
EDIT: Including reproducible example:
a = c(sample(1:1000,212))
df.ts <- ts(a, start=c(2000,1),end=c(2017,8),frequency=12)
df <- df.ts %>% as_tsibble()
Thanks for the example, I was able to get it to run without any errors, as follows:
library(tidyverse)
library(fpp3)
a <- sample(1:1000, 212)
df.ts <- ts(a, start = c(2000, 1), end = c(2017, 8), frequency = 12)
# as_tsibble() on a ts object stores the series in a column named `value`
df <- df.ts %>% as_tsibble()
df %>%
  model(STL(value ~ season(window = 9), robust = TRUE)) %>%
  components() %>% autoplot()
fit <- df %>%
  model(ANN = ETS(value ~ error("A") + trend("A") + season()))
report(fit)
Here is what the decomposition looks like:
Here is the report of the model:
As both Russ Conte and Rob Hyndman found, there's nothing inherently wrong with the example code.
I believe there was an overlapping issue between two packages, as my issue was resolved upon removing and reinstalling the forecasting packages.
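If a conflict of this kind is suspected, a quick check before reinstalling is to ask R where a function named model is visible on the search path. This is only a diagnostic sketch using base R's find() and conflicts(); it assumes the relevant packages have already been attached in the session, and it does not confirm that masking was the actual cause here.

```r
# Diagnostic sketch: which attached packages define a function called `model`?
# Run this after library(fpp3) (or whichever packages are loaded in the session).
hits <- find("model", mode = "function")
print(hits)

# Names that are defined in more than one place on the search path
masked <- conflicts(detail = FALSE)
model_is_masked <- "model" %in% masked
print(model_is_masked)
```

If more than one package shows up in hits, calling fabletools::model() explicitly is one way to sidestep the ambiguity.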
I'm trying to apply the lme function to my data, but the model gives the following message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution that uses the lme4::lmer function, but is there another option that lets me keep using the lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
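The same kind of gap can be confirmed without caret by comparing the rank of the model matrix with its number of columns. The sketch below uses a small made-up data set (not the pastebin data) in which one sex/month cell is empty; the variable names are illustrative only.

```r
# Toy data: two sexes, three months, five rows per cell
d <- expand.grid(sex = c("F", "M"),
                 month = c("Jan", "Feb", "Jun"),
                 rep = 1:5)
# Remove every female/June row to mimic the empty cell
d <- d[!(d$sex == "F" & d$month == "Jun"), ]

cell_counts <- table(d$sex, d$month)
print(cell_counts)

# Fixed-effect design matrix including the interaction
X <- model.matrix(~ sex * month, data = d)
n_cols <- ncol(X)
x_rank <- qr(X)$rank
cat("columns:", n_cols, " rank:", x_rank, "\n")
# rank < columns means at least one coefficient cannot be estimated
```

Whenever the rank is smaller than the number of columns, at least one interaction column is a linear combination of the others, which is the situation lme cannot handle.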
You can:
construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat — but then you'll have to sort out main effects and interactions yourself (ugh)
construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
use a patched version of nlme (I submitted a patch here but it hasn't been considered)
remotes::install_github("bbolker/nlme")
Hi folks,
I am having trouble with date/time showing up properly in lubridate.
Here's my code:
Temp.dat <- read_excel("Temperature Data.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC=as.factor(TempC),
TempF=as.factor(TempF),
Month=as.factor(Month),
Day=as.factor(Day),
Year=as.factor(Year),
Time=as.factor(Time))%>%
select(TempC, Treatment, Month, Day, Year, Time)%>%
mutate(Measurement=make_datetime(Month, Day, Year, Time))
Here's what it spits out:
tibble [44 x 7] (S3: tbl_df/tbl/data.frame)
$ TempC : Factor w/ 38 levels "15.5555555555556",..: 31 32 29 20 17 28 27 26 23 24 ...
$ Treatment : Factor w/ 2 levels "Grass","Soil": 1 1 1 1 2 2 2 2 2 2 ...
$ Month : Factor w/ 1 level "6": 1 1 1 1 1 1 1 1 1 1 ...
$ Day : Factor w/ 2 levels "15","16": 1 1 1 1 1 1 1 1 1 1 ...
$ Year : Factor w/ 1 level "2022": 1 1 1 1 1 1 1 1 1 1 ...
$ Time : Factor w/ 3 levels "700","1200","1600": 3 3 3 3 3 3 3 3 3 3 ...
**$ Measurement: POSIXct[1:44], format: "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" "0001-01-01 03:00:00" ...**
I've put asterisks by the problem result. It should show June 16th at 0700 or something like that, but instead it defaults to January 1, 1 AD for some reason. I've tried adding colons to the date in Excel, but that defaults to a 12-hour clock and I'd like to keep this at 24 hours.
What's going on here?
This will work as long as the Time column in the Excel file is formatted as a time, so that it imports as a date-time object that lubridate can interpret.
library(readxl)    # provides read_excel()
library(dplyr)
library(lubridate)
Temp.dat <- read_excel("t.xlsx", sheet = "Sheet1", na="NA") %>%
mutate(Treatment = as.factor(Treatment),
TempC = as.numeric(TempC),
TempF = as.numeric(TempF),
Month = as.numeric(Month),
Day = as.numeric(Day),
Year = as.numeric(Year),
Hour = hour(Time),
Minute = minute(Time)) %>%
select(TempC, Treatment, Month, Day, Year, Hour, Minute) %>%
mutate(Measurement = make_datetime(year = Year,
month = Month,
day = Day,
hour = Hour,
min = Minute))
Notice that the values passed to make_datetime() are numeric, which is what the function expects. If you pass factors, the function gives you the weird dates you were seeing.
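To see why factors produce dates like 0001-01-01 03:00:00: as.numeric() on a factor returns the internal level codes (1, 2, 3, ...) rather than the labels, and make_datetime() takes its positional arguments in the order year, month, day, hour, so passing Month, Day, Year, Time also puts each value in the wrong slot. A minimal base R sketch, using made-up values shaped like the Time column in the question and assuming times are stored as integers such as 700 or 1600:

```r
# A factor shaped like the Time column in the question
time_f <- factor(c("1600", "700", "1200"), levels = c("700", "1200", "1600"))

# Level codes, not the clock times
codes <- as.numeric(time_f)

# Go through character to recover the labels as numbers
values <- as.numeric(as.character(time_f))

# Split an integer like 1600 into hour and minute
hours <- values %/% 100
minutes <- values %% 100

# Base R way of building the date-time from numeric parts
stamp <- ISOdatetime(2022, 6, 15, hours, minutes, 0, tz = "UTC")
print(data.frame(codes, values, hours, minutes, stamp))
```

The integer-splitting step is only needed when Time arrives as a number like 1600; if it is already a date-time, hour() and minute() do the same job.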
No need to convert Time to string and extract hours and minutes, as I suggested in the comments, since you can use lubridate's minute() and hour() functions.
EDIT
To use lubridate's functions, Time needs to be a date-time object. You can check that it is by looking at what read_excel() produces:
> str(read_excel("t.xlsx", sheet = "Sheet1", na="NA"))
tibble [2 × 7] (S3: tbl_df/tbl/data.frame)
$ Treatment: chr [1:2] "s" "c"
$ TempC : num [1:2] 34 23
$ TempF : num [1:2] 99 60
$ Month : num [1:2] 5 4
$ Day : num [1:2] 1 15
$ Year : num [1:2] 2020 2021
$ Time : POSIXct[1:2], format: "1899-12-31 04:33:23" "1899-12-31 03:20:23"
See that Time is type POSIXct, a date-time object. If it is not, then you need to convert it into one if you want to use lubridate's minute() and hour() functions. If it cannot be converted, there are other solutions, but they depend on what you have.
I am trying to follow this tutorial but I am encountering an error. I have my data formatted exactly like it is in the example, at least as far as I can tell. But when I try to run the calculation for the MCP below:
#Space use by week, month, or year - MCP
> mcps.year<-trk %>% nest(-id,-year) %>%
+ mutate(mcparea = map(data, ~hr_mcp(., levels = c(0.95)) %>% hr_area)) %>%
+ select(id, year, mcparea) %>% unnest()
I receive this error:
Error: Problem with `mutate()` column `mcparea`.
i `mcparea = map(data, ~hr_mcp(., levels = c(0.95)) %>% hr_area)`.
x is.numeric(crs) || is.character(crs) || inherits(crs, "crs") is not TRUE
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
All elements of `...` must be named.
Did you want `data = c(x_, y_, t_, tod_, week, month, hour, sl, nsd_)`?
How can I fix this? I tried setting x_ and y_ as coordinates and then setting the projection to make it a spatial points dataframe:
trk <- st_as_sf(trk, coords = c("x_", "y_"))
st_crs(trk) <- 32614
But then I got this error:
Error: Problem with `mutate()` column `mcparea`.
i `mcparea = map(data, ~hr_mcp(., levels = c(0.95)) %>% hr_area)`.
x Can't subset columns that don't exist.
x Columns `x_` and `y_` don't exist.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
All elements of `...` must be named.
Did you want `data = c(t_, tod_, week, month, hour, sl, nsd_, geometry)`?
EDIT:
OK, here is the code I'm running:
#Set seed for random number generator
set.seed(10299)
#Import data
setwd("C:/Users/saraa/OneDrive - UNT System/Masters Thesis/Data/Testing RSF")
turtle <- read_csv("West2021KDE.csv")
turtle.dat <- as(turtle, "data.frame")
#Data cleaning
#Remove incomplete observations
ind<-complete.cases(turtle.dat[,c("y", "x", "date", "CST")])
turtle.dat<-turtle.dat[ind==TRUE,]
#Check for and remove duplicates
ind2<-turtle.dat %>% select(y, x, CST, date, id) %>% duplicated
sum(ind2)
turtle.dat<-turtle.dat[ind2!=TRUE,]
#Make timestamp a date/time variable
turtle.dat$timestamp <- with(turtle.dat, mdy(date) + hms(CST))
turtle.dat$timestamp<-as.POSIXct(turtle.dat$timestamp, format="%Y-%m-%d %H:%M")
#Plot data
#Separately
ggplot(turtle.dat, aes(x=x, y=y))+geom_point()+
facet_wrap(~id, scales="free")
#Combined
ggplot(turtle.dat, aes(x=x, y=y, color=as.factor(id)))+
geom_point()
#Creating a track using amt
trk <- mk_track(turtle.dat, .x=x, .y=y, .t=timestamp, id=id,
crs = CRS("+init=epsg:4326"))
#Now it is easy to calculate day/night with either movement track
trk <- trk %>% time_of_day()
#Transform coordinates
trk <- transform_coords(trk,sp::CRS("+init=epsg:32614"))
trk.class<-class(trk)
#Nest tracks by individual
nesttrk<-trk%>%nest(-id)
nesttrk
#Make new columns of time of observation
trk <- trk %>%
mutate(week = week(t_),
month = month(t_, label=TRUE),
year=year(t_),
hour = hour(t_))
#Calculate movement distances
trk <- trk %>% nest(-id) %>%
mutate(sl = map(data, step_lengths),
nsd_=map(data, nsd)) %>% unnest(cols = c(data, sl, nsd_))
#Transform to track object
class(trk)
class(trk)<-trk.class
trk
#Net squared displacement over time
ggplot(trk, aes(x = t_, y=nsd_)) + geom_point()+
facet_wrap(~id, scales="free")
#Step length distribution
ggplot(trk, aes(x = month, y = log(sl))) +
geom_boxplot()+geom_smooth()+facet_wrap(~id)
#Space use by week, month, or year - MCP
mcps.year<-trk %>% nest(-id,-year) %>%
mutate(mcparea = map(data, ~hr_mcp(., levels = c(0.95)) %>% hr_area)) %>%
select(id, year, mcparea) %>% unnest()
ggplot(mcps.year, aes(x = year, y = area, colour=as.factor(year))) + geom_point()+
geom_smooth()+ facet_wrap(~id, scales="free")
and here is the output of str(trk):
> str(trk)
track_xyt [1,058 x 13] (S3: track_xyt/track_xy/tbl_df/tbl/data.frame)
$ id : chr [1:1058] "M13" "M13" "M13" "M13" ...
$ x_ : Named num [1:1058] 687578 687491 687455 687566 687518 ...
..- attr(*, "names")= chr [1:1058] "1" "3" "122" "128" ...
$ y_ : Named num [1:1058] 3659888 3659889 3659893 3659891 3659959 ...
..- attr(*, "names")= chr [1:1058] "1" "3" "122" "128" ...
$ t_ : POSIXct[1:1058], format: "2021-04-03 15:19:00" "2021-04-18 11:19:00" "2021-05-02 12:43:00" "2021-06-03 11:23:00" ...
$ tod_ : Factor w/ 2 levels "day","night": 1 2 1 1 2 2 2 2 2 2 ...
$ dir_abs: num [1:1058] 1.558 1.485 4.699 0.616 5.525 ...
$ dir_rel: num [1:1058] NA -0.073 -3.07 2.2 -1.374 ...
$ sl : num [1:1058] 87.2 36.1 111.6 83.8 40.5 ...
$ nsd_ : Named num [1:1058] 0 7600 15171 140 8642 ...
..- attr(*, "names")= chr [1:1058] "1" "3" "122" "128" ...
$ week : num [1:1058] 14 16 18 22 23 25 26 27 30 30 ...
$ month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 4 4 5 6 6 6 7 7 7 7 ...
$ year : num [1:1058] 2021 2021 2021 2021 2021 ...
$ hour : int [1:1058] 15 11 12 11 10 9 9 9 10 0 ...
I have a data frame of almost 1600 observations with this structure:
head(df)
Start_Time Duration
1 2014-09-18 10:01:00 4 mins
2 2014-09-18 08:01:00 41 mins
3 2014-09-18 08:01:00 22 mins
4 2014-09-18 08:01:00 41 mins
5 2014-09-18 08:01:00 60 mins
6 2014-09-18 07:02:00 17 mins
I have plotted my data with this function:
plot(df$Start_Time,as.numeric(df$Duration), ylab = "Duration", xlab = "Date", ylim = c(0,450))
Since the data frame contains several tens of observations per day, I would like to draw a trend line in order to make it easier to read the data visually.
I tried this code:
fit <- glm(df$Start_Time~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
but I get this error:
Error in model.frame.default(formula = df$Start_Time ~ df$Duration, :
invalid type (list) for variable 'df$Start_Time'
I got the same error with this code:
abline(lm(df$Start_Time ~ df$Duration))
From reading the error messages, I suppose that those functions can't handle non-numeric values.
I tried this and got no error, but the line wasn't displayed on my graph:
fit <- glm(as.numeric(df$Start_Time)~df$Duration)
co <- coef(fit)
abline(fit, col="red", lwd=2)
What is the correct way of drawing trend lines / regression lines when one of the variables is in the datetime format?
NOTE: what follows is the result of str(df)
str(df)
'data.frame': 4121 obs. of 2 variables:
$ Start_Time: POSIXlt, format: "2014-09-18 10:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" "2014-09-18 08:01:00" ...
$ Duration :Class 'difftime' atomic [1:4121] 4 41 22 41 60 17 17 2 3 3 ... .. ..- attr(*, "units")= chr "mins"
Try the following code that reproduces data in the format you stated, then fits a linear model using lm() instead of glm() and plots the results, including a line of best fit.
set.seed(1)
times <- as.POSIXct("2014-09-18") + sort(runif(11, min=0, max=1000))
df <- data.frame(Start_time = times[-11])
df$Duration <- difftime(times[-11], times[-1])
model <- lm(Start_time ~ Duration, df)
plot(Start_time ~ Duration, df)
abline(model)
The structure of the data frame is close to what you report (note that Start_time here is POSIXct rather than POSIXlt):
str(df)
'data.frame': 10 obs. of 2 variables:
$ Start_time: POSIXct, format: "2014-09-18 00:01:01" "2014-09-18 00:03:21" "2014-09-18 00:03:25" ...
$ Duration :Class 'difftime' atomic [1:10] -139.9 -4.29 -59.53 -106.62 -200.73 ...
.. ..- attr(*, "units")= chr "secs"
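The "invalid type (list)" error in the question is consistent with Start_Time being POSIXlt, which is stored as a list of components, whereas POSIXct is a single number of seconds. A hedged sketch with simulated data (the column names mirror the question, the values are made up), assuming the goal is a trend of duration over time so that the line matches plot(Start_Time, Duration):

```r
set.seed(42)
n <- 50
# Simulated start times over five days and durations in minutes
start_ct <- as.POSIXct("2014-09-18 07:00:00", tz = "UTC") +
  sort(runif(n, 0, 5 * 86400))
start_lt <- as.POSIXlt(start_ct)
duration <- as.difftime(round(runif(n, 1, 60)), units = "mins")

# POSIXlt is a list underneath; POSIXct is a plain number
lt_is_list <- is.list(unclass(start_lt))
ct_is_numeric <- is.numeric(unclass(as.POSIXct(start_lt)))

# Convert to POSIXct and strip the difftime class before modelling
df <- data.frame(Start_Time = as.POSIXct(start_lt),
                 Duration_min = as.numeric(duration, units = "mins"))

# Response on the y axis, time on the x axis, both numeric for lm()
fit <- lm(Duration_min ~ as.numeric(Start_Time), data = df)

plot(df$Start_Time, df$Duration_min, xlab = "Date", ylab = "Duration (mins)")
abline(fit, col = "red", lwd = 2)
```

Putting the duration on the left-hand side of the formula matters: abline() draws y as a function of x, so the model has to use the same orientation as the plot for the line to fall inside the plotting region.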
I would like to cross-classify and plot bal using qplot facets:
> str(bal)
'data.frame': 2096 obs. of 6 variables:
$ fips : chr "24510" "24510" "24510" "24510" ...
$ SCC : chr "10100601" "10200601" "10200602" "30100699" ...
$ Pollutant: chr "PM25-PRI" "PM25-PRI" "PM25-PRI" "PM25-PRI" ...
$ Emissions: num 6.53 78.88 0.92 10.38 10.86 ...
$ type : chr "POINT" "POINT" "POINT" "POINT" ...
$ year : int 1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
I'm interested in the two classifiers year and type:
> levels(factor(bal$year))
[1] "1999" "2002" "2005" "2008"
> levels(factor(bal$type))
[1] "NON-ROAD" "NONPOINT" "ON-ROAD" "POINT"
So far, I can plot the distribution of Emissions cross-classified by year and type:
What I'm unable to do is to plot the sum of the distributions of each year, which I however am able to compute:
> tapply(bal$Emissions, list(bal$year, bal$type), sum)
NON-ROAD NONPOINT ON-ROAD POINT
1999 522.94000 2107.625 346.82000 296.7950
2002 240.84692 1509.500 134.30882 569.2600
2005 248.93369 1509.500 130.43038 1202.4900
2008 55.82356 1373.207 88.27546 344.9752
My guess was something along the lines of
> qplot(bal$year, tapply(bal$Emissions, list(bal$year, bal$type), sum),
data=bal, facets= . ~ type)
Error: Aesthetics must either be length one, or the same length as the
dataProblems:tapply(bal$Emissions, list(bal$year, bal$type), sum)
but I don't get what R is telling me there.
How can I plot this matrix using qplot?
You can do that using ggplot2 with either
qplot(year, Emissions, data=bal,
stat="summary", fun.y="sum",
facets= .~type
)
or
ggplot(bal) +
  aes(year, Emissions) +
  # ggplot2 >= 3.3.0 names this argument `fun`; older versions use `fun.y`
  stat_summary(fun = "sum", geom = "point") +
  facet_grid(. ~ type)
Both should give you the following plot which seems to match up well to your summary data.
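The error in the question arises because tapply() returns a 4 x 4 matrix, and its 16 values cannot be matched against the 2096 rows of bal. An alternative to summarising inside the plot is to aggregate first into one row per year/type combination. A minimal base R sketch with made-up emissions (the column names follow the question, the numbers do not):

```r
set.seed(1)
# Small stand-in for `bal`: 4 years x 4 types, two rows per combination
bal <- data.frame(
  year = rep(c(1999L, 2002L, 2005L, 2008L), each = 8),
  type = rep(c("NON-ROAD", "NONPOINT", "ON-ROAD", "POINT"), times = 8),
  Emissions = round(runif(32, 0, 100), 2),
  stringsAsFactors = FALSE
)

# Long format: one row per year/type, ready to use as plotting data
totals <- aggregate(Emissions ~ year + type, data = bal, FUN = sum)
print(totals)

# Wide format, as produced in the question
wide <- tapply(bal$Emissions, list(bal$year, bal$type), sum)
print(wide)
```

The long table totals has the same length in every column, so it can be passed as the data for a plot of Emissions against year, faceted by type, without any summary statistic.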