Creating a sequence object from SPELL data - r

I am trying to create a sequence object with seqdef using SPELL format. Here is an example of my data:
spell <- structure(list(ID = c(1, 3, 3, 4, 5, 5, 6, 8, 9, 10, 11, 11,
12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15,
15, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19,
19), status = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1,
1, 1, 3, 3, 1, 3, 1, 1, 1), time1 = c(1, 1, 57, 1, 1, 91, 1,
1, 1, 1, 1, 104, 1, 1, 60, 109, 121, 1, 42, 47, 54, 64, 72, 78,
85, 116, 1, 29, 39, 69, 74, 78, 88, 1, 16, 40, 68, 1, 30, 123,
1, 39, 51, 1, 61), time2 = c(125, 57, 125, 125, 91, 125, 125,
125, 125, 125, 104, 125, 125, 60, 109, 121, 125, 42, 47, 54,
64, 72, 78, 85, 116, 125, 29, 39, 69, 74, 78, 88, 125, 16, 40,
68, 125, 30, 123, 125, 39, 51, 125, 61, 125)), .Names = c("ID",
"status", "time1", "time2"), row.names = c(NA, 45L), class = "data.frame")
When I try to define the sequence object, a strange error is thrown:
spell.seq <- seqdef(data=spell, informat="SPELL", id="ID", begin="time1", end="time2",
status="status", limit=125,process=FALSE)
[>] time axis: 1 -> 125
[>] SPELL data converted into 17 STS sequences
[>] 3 distinct states appear in the data:
1 = 1
2 = 2
3 = 3
[>] state coding:
[alphabet] [label] [long label]
1 1 1 1
2 2 2 2
3 3 3 3
[>] 17 sequences in the data set
[>] min/max sequence length: 125/125
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
However, if I do the same indirectly via seqformat, preserving the same arguments, no error is thrown:
sts <- seqformat(data=spell,from="SPELL",to="STS",
id="ID",begin="time1",end="time2",status="status",
limit=125,process=FALSE)
seqs <- seqdef(sts,right="DEL")
Using TraMineR 1.8-5 with R 3.0.0 Windows 7 64-bit. Is this a bug or am I doing something wrong? Thanks in advance.

A quick look at the source of seqdef() for how the row.names are set shows they are set based on the value of the id argument.
Looking in ?seqdef for id shows
id
optional argument for setting the rownames of the sequence object. If NULL (default), the rownames are taken from the input data. If set to "auto", sequences are numbered from 1 to the number of sequences. A vector of rownames of length equal to the number of sequences may be specified as well.
From the example in the question you are passing id="ID" which does not meet these criteria. Changing this to id=NULL allows the command to complete as expected and a check for equality using identical( spell.seq, seqs) yields true.

Related

From Boxplot to Barplot in ggplot possible?

I have to do a ggplot barplot with errorbars, Tukey sig. letters for plants grown with different fertilizer concentraitions.
The data should be grouped after the dif. concentrations and the sig. letters should be added automaticaly.
I have already a code for the same problem but for Boxplot - which is working nicely. I tried several tutorials with barplots but I always get the problem; stat_count() can only have an x or y aesthetic.
So I thought, is it possible to get my boxplot code to a barplot code? I tried but I couldnt do it :) And if not - how do I automatically add tukeyHSD Test result sig. letters to a ggplot barplot?
This is my Code for the boxplot with the tukey letters:
    value_max = Dünger, group_by(Duenger.g), summarize(max_value = max(Höhe.cm))
hsd=HSD.test(aov(Höhe.cm~Duenger.g, data=Dünger),
trt = "Duenger.g", group = T) sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]
J <- ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm))+ geom_boxplot(aes(fill= Duenger.g))+ scale_fill_discrete(labels=c("0.5g", '1g', "2g", "3g", "4g"))+ geom_text(data = value_max, aes(x=Duenger.g, y = 0.1 + max_value, label = sig.letters$groups), vjust=0)+ stat_boxplot(geom = 'errorbar', width = 0.1)+ ggtitle("Auswirkung von Dünger auf die Höhe von Pflanzen") + xlab("Dünger in g") + ylab("Höhe in cm"); J
This is how it looks:
boxplot with tukey
Data from dput:
structure(list(Duenger.g = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4), plant = c(1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 18, 19,
21, 23, 24, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40,
41, 42, 43, 44, 48, 49, 50, 53, 54, 55, 56, 57, 58, 61, 62, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 75, 79, 80, 81, 83, 85, 86,
88, 89, 91, 93, 99, 100, 102, 103, 104, 105, 106, 107, 108, 110,
111, 112, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126,
127, 128, 130, 131, 132, 134, 136, 138, 139, 140, 141, 143, 144,
145, 146, 147, 149), height.cm = c(5.7, 2.8, 5.5, 8, 3.5, 2.5,
4, 6, 10, 4.5, 7, 8.3, 11, 7, 8, 2.5, 7.4, 3, 14.5, 7, 12, 7.5,
30.5, 27, 6.5, 19, 10.4, 12.7, 27.3, 11, 11, 10.5, 10.5, 13,
53, 12.5, 12, 6, 12, 35, 8, 16, 56, 63, 69, 62, 98, 65, 77, 32,
85, 75, 33.7, 75, 55, 38.8, 39, 46, 35, 59, 44, 31.5, 49, 34,
52, 37, 43, 38, 28, 14, 28, 19, 20, 23, 17.5, 32, 16, 17, 24.7,
34, 50, 12, 14, 21, 33, 39.3, 41, 29, 35, 48, 40, 65, 35, 10,
26, 34, 41, 32, 38, 23.5, 22.2, 20.5, 29, 34, 45)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -105L))
Thank you
mirai
A bar chart and a boxplot are two different things. By default geom_boxplot computes the boxplot stats by default (stat="boxplot"). In contrast when you use geom_bar it will by default count the number of observations (stat="count") which are then mapped on y. That's the reason why you get an error. Hence, simply replacing geom_boxplot by geom_bar will not give your your desired result. Instead you could use e.g. stat_summary to create your bar chart with errorbars. Additionally I created a summary dataset to add the labels on the top of the error bars.
library(ggplot2)
library(dplyr)
library(agricolae)
Dünger <- Dünger |>
rename("Höhe.cm" = height.cm) |>
mutate(Duenger.g = factor(Duenger.g))
hsd <- HSD.test(aov(Höhe.cm ~ Duenger.g, data = Dünger), trt = "Duenger.g", group = T)
sig.letters <- hsd$groups %>% mutate(Duenger.g = row.names(.))
duenger_sum <- Dünger |>
group_by(Duenger.g) |>
summarize(mean_se(Höhe.cm)) |>
left_join(sig.letters, by = "Duenger.g")
ggplot(Dünger, aes(x = Duenger.g, y = Höhe.cm, fill = Duenger.g)) +
stat_summary(geom = "bar", fun = "mean") +
stat_summary(geom = "errorbar", width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(data = duenger_sum, aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)
#> No summary function supplied, defaulting to `mean_se()`
But as the summary dataset now already contains the mean and the values for the error bars a second option would be to do:
ggplot(duenger_sum, aes(x = Duenger.g, y = y, fill = Duenger.g)) +
geom_col() +
geom_errorbar(aes(ymin = ymin, ymax = ymax), width = .1) +
scale_fill_discrete(labels = c("0.5g", "1g", "2g", "3g", "4g")) +
geom_text(aes(y = ymax, label = groups), vjust = 0, nudge_y = 1) +
labs(
title = "Auswirkung von Dünger auf die Höhe von Pflanzen",
x = "Dünger in g", y = "Höhe in cm"
)

How can I edit the common legend title name using ggplot2 and ggpubr?

I am using ggpubr to combine multiple graphs in a single plot, but cannot seem to correctly generate one graph with the title that I would like. I would like the title to say "Customized legend," given that it is a common legend for both graphs. Does anybody know how I can do this?
Here is my data:
data1 = data.frame(var1 = c(1,
1,
1,
1,
2,
2,
2,
2,
3,
3,
3,
3,
4,
4,
4,
4,
5,
5,
5,
5,
6,
6,
6,
6,
7,
7,
7,
7,
8,
8,
8,
8,
9,
9,
9,
9,
10,
10,
10,
10,
11,
11,
11,
11,
12,
12,
12,
12,
13,
13,
13,
13,
14,
14,
14,
14,
15,
15,
15,
15,
16,
16,
16,
16,
17,
17,
17,
17,
18,
18,
18,
18,
19,
19,
19,
19,
20,
20,
20,
20,
21,
21,
21,
21,
22,
22,
22,
22,
23,
23,
23,
23,
24,
24,
24,
24,
25,
25,
25,
25,
26,
26,
26,
26,
27,
27,
27,
27,
28,
28,
28,
28,
29,
29,
29,
29,
30,
30,
30,
30,
31,
31,
31,
31,
32,
32,
32,
32,
33,
33,
33,
33),
var2 = c(1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4),
var3 = c(113,
89,
99,
41,
72,
64,
39,
139,
135,
17,
3,
135,
63,
126,
34,
87,
84,
125,
123,
18,
115,
11,
68,
85,
48,
95,
56,
129,
41,
78,
82,
122,
124,
4,
60,
132,
67,
128,
46,
79,
110,
88,
19,
88,
88,
126,
30,
11,
52,
66,
15,
52,
6,
74,
14,
101,
88,
70,
58,
20,
104,
76,
134,
23,
40,
1,
47,
25,
49,
110,
96,
100,
106,
26,
93,
19,
87,
41,
13,
40,
63,
87,
137,
105,
89,
95,
24,
49,
112,
92,
45,
105,
112,
105,
114,
129,
84,
33,
95,
95,
15,
90,
1,
62,
20,
7,
18,
96,
4,
71,
42,
94,
45,
102,
55,
98,
124,
80,
76,
97,
41,
31,
25,
21,
135,
138,
121,
93,
17,
13,
49,
26))
data2 <- data.frame(var1a = c(1,
1,
1,
1,
2,
2,
2,
2,
3,
3,
3,
3,
4,
4,
4,
4,
5,
5,
5,
5,
6,
6,
6,
6,
7,
7,
7,
7,
8,
8,
8,
8,
9,
9,
9,
9,
10,
10,
10,
10,
11,
11,
11,
11,
12,
12,
12,
12,
13,
13,
13,
13,
14,
14,
14,
14,
15,
15,
15,
15,
16,
16,
16,
16,
17,
17,
17,
17,
18,
18,
18,
18,
19,
19,
19,
19,
20,
20,
20,
20,
21,
21,
21,
21,
22,
22,
22,
22,
23,
23,
23,
23,
24,
24,
24,
24,
25,
25,
25,
25,
26,
26,
26,
26,
27,
27,
27,
27,
28,
28,
28,
28,
29,
29,
29,
29,
30,
30,
30,
30,
31,
31,
31,
31,
32,
32,
32,
32,
33,
33,
33,
33),
var2a = c(1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4),
var3a = c(113,
89,
99,
41,
72,
64,
39,
139,
135,
17,
3,
135,
63,
126,
34,
87,
84,
125,
123,
18,
115,
11,
68,
85,
48,
95,
56,
129,
41,
78,
82,
122,
124,
4,
60,
132,
67,
128,
46,
79,
110,
88,
19,
88,
88,
126,
30,
11,
52,
66,
15,
52,
6,
74,
14,
101,
88,
70,
58,
20,
104,
76,
134,
23,
40,
1,
47,
25,
49,
110,
96,
100,
106,
26,
93,
19,
87,
41,
13,
40,
63,
87,
137,
105,
89,
95,
24,
49,
112,
92,
45,
105,
112,
105,
114,
129,
84,
33,
95,
95,
15,
90,
1,
62,
20,
7,
18,
96,
4,
71,
42,
94,
45,
102,
55,
98,
124,
80,
76,
97,
41,
31,
25,
21,
135,
138,
121,
93,
17,
13,
49,
26))
Here is the code that I am using:
#Open packages
library(ggplot2)
library(ggpubr)
#Set the theme
theme_set(theme_pubr())
#Change necessary columns to factor
data1$var2 <- factor(data1$var2, levels = c(1,2,3,4))
data2$var2a <- factor(data2$var2a, levels = c(1,2,3,4))
#Generate the plots
#Generate plots
plot1 <- ggplot(data1, aes(x = var1, y = var3, group = var2)) +
geom_line(size = 1.5, aes(linetype = var2, color = var2)) +
xlab('x_label') +
ylab('y_label')+
scale_fill_discrete(name = 'customized legend')
plot2 <- ggplot(data2, aes(x = var1a, y = var3a, group = var2a)) +
geom_line(size = 1.5, aes(linetype = var2a, color = var2a)) +
xlab('x_label') +
ylab('y_label')+
scale_fill_discrete(name = 'customized legend')
#Combine both into one picture
fig <- ggarrange(plot1, plot2,
ncol = 2,
nrow = 1,
common.legend = TRUE,
legend = "bottom")
fig
Since you didn't use the fill aesthetic in your ggplot, you should not use scale_fill_discrete. What you need is to set the legend title of linetype and color to "customized legend", since those are the aesthetics that you used.
library(ggplot2)
library(ggpubr)
plot1 <- ggplot(data1, aes(x = var1, y = var3, group = var2)) +
geom_line(size = 1.5, aes(linetype = var2, color = var2)) +
xlab('x_label') +
ylab('y_label') +
labs(linetype = "customized legend", color = "customized legend")
plot2 <- ggplot(data2, aes(x = var1a, y = var3a, group = var2a)) +
geom_line(size = 1.5, aes(linetype = var2a, color = var2a)) +
xlab('x_label') +
ylab('y_label') +
labs(linetype = "customized legend", color = "customized legend")
#Combine both into one picture
ggarrange(plot1, plot2,
ncol = 2,
nrow = 1,
common.legend = TRUE,
legend = "bottom")

Create a graph from data frame with a layout base on attribute

I create a graph from a data frame. And I would like the vertices to be arranged and moved apart according to the hamming value that is contained in data.rw$Hamming.
I would like some help
data.rw <- structure(list(g1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6,
6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10,
11, 11, 12), g2 = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 5, 6, 7, 8, 9, 10, 11, 12, 13, 6, 7, 8, 9, 10, 11, 12, 13,
7, 8, 9, 10, 11, 12, 13, 8, 9, 10, 11, 12, 13, 9, 10, 11, 12,
13, 10, 11, 12, 13, 11, 12, 13, 12, 13, 13), Hamming = c(116,
86, 101, 92, 84, 78, 83, 102, 87, 100, 96, 97, 90, 111, 98, 90,
92, 87, 114, 95, 108, 104, 109, 85, 74, 68, 60, 67, 84, 71, 84,
78, 79, 83, 85, 79, 78, 101, 90, 101, 91, 92, 72, 66, 67, 92,
77, 90, 82, 83, 62, 59, 88, 71, 86, 78, 81, 59, 78, 63, 74, 68,
73, 83, 60, 77, 75, 72, 89, 100, 94, 97, 79, 75, 82, 90, 93,
91)), row.names = c(NA, -78L), class = "data.frame")
set.seed(1234)
vertice.df <- unique(c(data.rw$name1,data.rw$name2))
g <- graph_from_data_frame(d = data.rw, vertices = vertice.df, directed = F)
plot(g)
I recommend a distance-based layout for this task, multidimensional scaling comes to mind:
m <- get.adjacency(g, attr = "Hamming", sparse = F)
# optionally: m <- dist(m)
l <- layout_with_mds(g, dist = m, dim = 2)
First extract the weighted adjacency matrix from the graph and feed it into the layout function (dist = m). This returns a 2-dimensional matrix l (dim = 2) that you can use as layout for the position of the nodes.
plot(g, layout = l)
Have a look at ?cmdscale if you are interested in MDS and specifically the eig parameter to later assess the goodness-of-fit. Chances are that two dimensions are not enough to adequately reflect the distances between nodes. But that's for you to decide.

Auto.Arima transform timeseries and xreg correlation with lagged forecast timeseries

I'm trying to forecast an auto.arima() model like the one below.
I was wondering in general if it was necessary to transform a timeseries so that it resembled a normal distribution before passing it to auto.arima()?
Also does it matter if your xreg=... predictor is correlated with a lag of the timeseries you're trying to predict, or vice versa?
Code:
tsTrain <-tsTiTo[1:60]
tsTest <- tsTiTo[61:100]
Xreg<-CustCount[1:100]
##Predictor
xregTrain2 <- Xreg[1:60]
xregTest2 <- Xreg[61:100]
Arima.fit2 <- auto.arima(tsTrain, xreg = xregTrain2)
Acast2<-forecast(Arima.fit2, h=40, xreg = xregTest2)
Data:
#dput(ds$CustCount[1:100])
CustCount = c(3, 3, 1, 4, 1, 3, 2, 3, 2, 4, 1, 1, 5, 6, 8, 5, 2, 7, 7, 3, 2, 2, 2, 1, 3, 2, 3, 1, 1, 2, 1, 1, 3, 2, 2, 2, 3, 7, 5, 6, 8, 7, 3, 5, 6, 6, 8, 4, 2, 1, 2, 1, NA, NA, 4, 2, 2, 4, 11, 2, 8, 1, 4, 7, 11, 5, 3, 10, 7, 1, 1, NA, 2, NA, NA, 2, NA, NA, 1, 2, 3, 5, 9, 5, 9, 6, 6, 1, 5, 3, 7, 5, 8, 3, 2, 6, 3, 2, 3, 1 )
# dput(tsTiTo[1:100])
tsTiTo = c(45, 34, 11, 79, 102, 45, 21, 45, 104, 20, 2, 207, 45, 2, 3, 153, 8, 2, 173, 11, 207, 79, 45, 153, 192, 173, 130, 4, 173, 174, 173, 130, 79, 154, 4, 104, 192, 153, 192, 104, 28, 173, 52, 45, 11, 29, 22, 81, 7, 79, 193, 104, 1, 1, 46, 130, 45, 154, 153, 7, 174, 21, 193, 45, 79, 173, 45, 153, 45, 173, 2, 1, 2, 1, 1, 8, 1, 1, 79, 45, 79, 173, 45, 2, 173, 130, 104, 19, 4, 34, 2, 192, 42, 41, 31, 39, 11, 79, 4, 79)
Short answer is no and no to both questions. See the long answer below.
I was wondering in general if it was necessary to transform a
timeseries so that it resembled a normal distribution before passing
it to auto.arima()?
No. In the case of time series data, it is the innovation errors that you want to be normally distributed. Not the time series you are modelling.
This is similar to in the case of liner regression model, you don't expect the predictors to be normally distributed. It is the errors that you'd expect to be normally distributed.
Also does it matter if your xreg=... predictor is correlated with a
lag of the timeseries you're trying to predict, or vice versa?
You'd hope xreg are correlated this way. We are typing to use that information when looking for an appropriate model to forecast.

Difference Predictors in Auto.Arima Forecast

I'm trying to build an auto.arima forecast with predictors like the example below. I've noticed that my predictor is non-stationary. So I was wondering if I should difference the predictor before inputting it in the xreg parameter, like I've shown below. The real data set is much larger, this just an example. Any advice is greatly appreciated.
Code:
tsTrain <-tsTiTo[1:60]
tsTest <- tsTiTo[61:100]
ndiffs(ds$CustCount)
##returns 1
diffedCustCount<-diff(ds$CustCount,differences=1)
Xreg<-diffedCustCount[1:100]
##Predictor
xregTrain2 <- Xreg[1:60]
xregTest2 <- Xreg[61:100]
Arima.fit2 <- auto.arima(tsTrain, xreg = xregTrain2)
Acast2<-forecast(Arima.fit2, h=40, xreg = xregTest2)
Data:
dput(ds$CustCount[1:100])
c(3, 3, 1, 4, 1, 3, 2, 3, 2, 4, 1, 1, 5, 6, 8, 5, 2, 7, 7, 3, 2, 2, 2, 1, 3, 2, 3, 1, 1, 2, 1, 1, 3, 2, 2, 2, 3, 7, 5, 6, 8, 7, 3, 5, 6, 6, 8, 4, 2, 1, 2, 1, NA, NA, 4, 2, 2, 4, 11, 2, 8, 1, 4, 7, 11, 5, 3, 10, 7, 1, 1, NA, 2, NA, NA, 2, NA, NA, 1, 2, 3, 5, 9, 5, 9, 6, 6, 1, 5, 3, 7, 5, 8, 3, 2, 6, 3, 2, 3, 1 )
dput(tsTiTo[1:100])
c(45, 34, 11, 79, 102, 45, 21, 45, 104, 20, 2, 207, 45, 2, 3, 153, 8, 2, 173, 11, 207, 79, 45, 153, 192, 173, 130, 4, 173, 174, 173, 130, 79, 154, 4, 104, 192, 153, 192, 104, 28, 173, 52, 45, 11, 29, 22, 81, 7, 79, 193, 104, 1, 1, 46, 130, 45, 154, 153, 7, 174, 21, 193, 45, 79, 173, 45, 153, 45, 173, 2, 1, 2, 1, 1, 8, 1, 1, 79, 45, 79, 173, 45, 2, 173, 130, 104, 19, 4, 34, 2, 192, 42, 41, 31, 39, 11, 79, 4, 79)
The xreg argument in auto.arima performs a dynamic regression which is to say that you are performing a linear regression and fitting the errors with an arma process.
While auto.arima() used to require manual differencing for non-stationary data when external regressors are included, this is no longer the case. auto.arima() will take non-stationary data as an input and determine the order of differencing using a unit-root test.
See this Post from Rob Hyndman for further detail.

Resources