Difference Predictors in Auto.Arima Forecast

I'm trying to build an auto.arima forecast with predictors, like the example below. I've noticed that my predictor is non-stationary, so I was wondering whether I should difference the predictor before passing it to the xreg parameter, as I've shown below. The real data set is much larger; this is just an example. Any advice is greatly appreciated.
Code:
tsTrain <-tsTiTo[1:60]
tsTest <- tsTiTo[61:100]
ndiffs(ds$CustCount)
##returns 1
diffedCustCount<-diff(ds$CustCount,differences=1)
Xreg<-diffedCustCount[1:100]
##Predictor
xregTrain2 <- Xreg[1:60]
xregTest2 <- Xreg[61:100]
Arima.fit2 <- auto.arima(tsTrain, xreg = xregTrain2)
Acast2<-forecast(Arima.fit2, h=40, xreg = xregTest2)
Data:
dput(ds$CustCount[1:100])
c(3, 3, 1, 4, 1, 3, 2, 3, 2, 4, 1, 1, 5, 6, 8, 5, 2, 7, 7, 3, 2, 2, 2, 1, 3, 2, 3, 1, 1, 2, 1, 1, 3, 2, 2, 2, 3, 7, 5, 6, 8, 7, 3, 5, 6, 6, 8, 4, 2, 1, 2, 1, NA, NA, 4, 2, 2, 4, 11, 2, 8, 1, 4, 7, 11, 5, 3, 10, 7, 1, 1, NA, 2, NA, NA, 2, NA, NA, 1, 2, 3, 5, 9, 5, 9, 6, 6, 1, 5, 3, 7, 5, 8, 3, 2, 6, 3, 2, 3, 1 )
dput(tsTiTo[1:100])
c(45, 34, 11, 79, 102, 45, 21, 45, 104, 20, 2, 207, 45, 2, 3, 153, 8, 2, 173, 11, 207, 79, 45, 153, 192, 173, 130, 4, 173, 174, 173, 130, 79, 154, 4, 104, 192, 153, 192, 104, 28, 173, 52, 45, 11, 29, 22, 81, 7, 79, 193, 104, 1, 1, 46, 130, 45, 154, 153, 7, 174, 21, 193, 45, 79, 173, 45, 153, 45, 173, 2, 1, 2, 1, 1, 8, 1, 1, 79, 45, 79, 173, 45, 2, 173, 130, 104, 19, 4, 34, 2, 192, 42, 41, 31, 39, 11, 79, 4, 79)

The xreg argument in auto.arima fits a dynamic regression (regression with ARIMA errors): a linear regression on the predictors whose errors are modelled as an ARIMA process.
While auto.arima() used to require manual differencing for non-stationary data when external regressors are included, this is no longer the case. auto.arima() will take non-stationary data as input and determine the order of differencing using a unit-root test. When differencing is selected, it is applied to both the response and the regressors, so you do not need to difference the predictor yourself.
See this post from Rob Hyndman for further detail.
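As a rough illustration of the idea, here is a minimal sketch that uses base R's stats::arima() instead of auto.arima(), so that it runs without the forecast package. The simulated data, the variable names and the fixed ARIMA(1,1,0) order are all assumptions made for the example; auto.arima() would pick the order itself.

```r
# Sketch: regression with ARIMA errors using base R's stats::arima().
# Simulated data and the fixed order c(1, 1, 0) are illustrative assumptions.
set.seed(1)
n <- 100
x   <- cumsum(rnorm(n))                        # non-stationary predictor (random walk)
eta <- cumsum(arima.sim(list(ar = 0.5), n))    # integrated AR(1) error process
y   <- 2 * x + eta                             # response driven by the predictor

train <- 1:60
test  <- 61:100

# The predictor is passed undifferenced. With d = 1 the error term is
# ARIMA(1,1,0), which is equivalent to differencing both y and x.
fit <- arima(y[train], order = c(1, 1, 0), xreg = x[train], method = "ML")

# Forecasting needs the future predictor values, supplied through newxreg.
fc <- predict(fit, n.ahead = length(test), newxreg = x[test])

print(coef(fit))
print(head(fc$pred))
```

The point of the sketch is only that the regressor goes in at its original level and the differencing is handled by the model's d term, not by a manual diff() beforehand.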

Related

How to generate a map for property cluster

Could you help me make a graph in R similar to the one I inserted in the image below, which shows the properties on a map, differentiating by cluster. See in my database that I have 4 properties, properties 1 and 3 are of cluster 1 and properties 2 and 4 are of cluster 2. In addition, the database has the coordinates of the properties, so I believe that with this information I can generate a graph similar to what I inserted. Surely, there must be some package in R that does something similar. Any help is welcome!
This link can help: https://rstudio-pubs-static.s3.amazonaws.com/176768_ec7fb4801e3a4772886d61e65885fbdd.html
#database
df<-structure(list(Properties = c(1,2,3,4),
Latitude = c(-24.930473, -24.95575,-24.924161,-24.95579),
Longitude = c(-49.994889, -49.990162,-50.004343, -50.007371),
cluster = c(1,2,1,2)), class = "data.frame", row.names = c(NA, -4L))
Properties Latitude Longitude cluster
1 1 -24.93047 -49.99489 1
2 2 -24.95575 -49.99016 2
3 3 -24.92416 -50.00434 1
4 4 -24.95579 -50.00737 2
Example of figure:
Your code
#database
df<-structure(list(Propertie = c(1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
30, 31, 32, 33, 34, 35, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150,
151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176,
177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202,
203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215,
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228,
229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241,
242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267,
268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280,
281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293,
294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306,
307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319,
320, 321, 322, 323, 324, 325), Latitude = c(-24.927417, -24.927417,
-24.927417, -24.927417, -24.930195, -24.930473, -24.946306, -24.949361,
-24.949361, -24.950195, -24.950195, -24.951584, -24.95575, -24.954084,
-24.96075, -24.957139, -24.95825, -24.96825, -24.961334, -24.968806,
-24.976861, -24.982139, -24.986584, -24.985487, -24.994362, -24.994362,
-24.999084, -24.771583, -24.77186, -24.772138, -24.772138, -24.78686,
-24.78436, -24.872139, -24.822222, -24.83549, -24.874916, -24.874916,
-24.874639, -24.865472, -24.873838, -24.87325, -24.858611, -24.874361,
-24.874361, -24.86, -24.860472, -24.874916, -24.814638, -24.814666,
-24.818527, -24.818527, -24.822694, -24.822694, -24.845472, -24.844638,
-24.878528, -24.879639, -24.879639, -24.906028, -24.897972, -24.900278,
-24.900278, -24.90075, -24.902972, -24.899361, -24.898611, -24.899083,
-24.913889, -24.908333, -24.914361, -24.914361, -24.924361, -24.915472,
-24.91075, -24.913805, -24.913528, -24.912139, -24.919917, -24.914083,
-24.914361, -24.914361, -24.925194, -24.92575, -24.928528, -24.929361,
-24.934361, -24.935278, -24.922694, -24.927139, -24.927972, -24.931861,
-24.936861, -24.878537, -24.887972, -24.882972, -24.901583, -24.901667,
-24.902139, -24.902139, -24.90325, -24.902972, -24.90299, -24.90575,
-24.905791, -24.899639, -24.899083, -24.875472, -24.878805, -24.883805,
-24.884916, -24.8905, -24.884083, -24.884087, -24.905194, -24.904125,
-24.894722, -24.895222, -24.895194, -24.911028, -24.907972, -24.908805,
-24.919916, -24.919361, -24.919639, -24.919639, -24.920194, -24.920472,
-24.917972, -24.908805, -24.911305, -24.91325, -24.917416, -24.928528,
-24.929083, -24.92325, -24.923805, -24.93188, -24.932139, -24.936028,
-24.935472, -24.937139, -24.923805, -24.922139, -24.922139, -24.926861,
-24.908805, -24.908333, -24.908805, -24.913805, -24.913805, -24.929638,
-24.939917, -24.943806, -24.942695, -24.94325, -24.944639, -24.946028,
-24.94825, -24.954084, -24.956111, -24.958611, -24.958806, -24.959084,
-24.958528, -24.958528, -24.956584, -24.955833, -24.95825, -24.960833,
-24.967417, -24.962695, -24.958611, -24.959083, -24.96075, -24.96075,
-24.964361, -24.961306, -24.961028, -24.962417, -24.965833, -24.964639,
-24.963806, -24.964917, -24.965472, -24.966861, -24.968528, -24.942972,
-24.948611, -24.950556, -24.951028, -24.951028, -24.93825, -24.941889,
-24.943528, -24.944639, -24.945194, -24.945472, -24.949083, -24.946861,
-24.94825, -24.949361, -24.951306, -24.948805, -24.948, -24.95075,
-24.952694, -24.959722, -24.961583, -24.96325, -24.96325, -24.96325,
-24.964639, -24.96575, -24.959361, -24.954639, -24.960472, -24.960472,
-24.966583, -24.970195, -24.972417, -24.976306, -24.974084, -24.974167,
-24.974639, -24.979362, -24.979639, -24.980278, -24.982973, -24.982973,
-24.977417, -24.979639, -24.981028, -24.981028, -24.98325, -24.969361,
-24.988056, -24.987139, -24.987139, -24.986584, -24.984639, -24.984639,
-24.984917, -24.984917, -24.994917, -24.987139, -24.989917, -24.992139,
-24.991861, -24.991861, -24.989639, -24.989917, -24.989917, -24.991861,
-24.989639, -24.992417, -24.975195, -24.97325, -24.979361, -24.972694,
-24.972972, -24.942417, -24.941861, -24.93825, -24.938273, -24.949639,
-24.948333, -24.948805, -24.949639, -24.949639, -24.951615, -24.951583,
-24.951615, -24.953611, -24.954639, -24.954639, -24.954639, -24.956861,
-24.956861, -24.966028, -24.956861, -24.955556, -24.957176, -24.96075,
-24.960194, -24.960231, -24.980194, -24.969106, -24.986306, -24.986306,
-24.993806, -24.877972, -24.878889, -24.87686, -24.886305, -24.875749,
-24.876305, -24.876319, -24.878805, -24.891027, -24.898527, -24.898527,
-24.904083, -24.904083, -24.905, -24.901328, -24.902138, -24.898268,
-24.900782, -24.901305, -24.88493, -24.887138, -24.929638, -25.001862,
-25.004084, -25.011028, -25.000194, -25.000472), Longitude = c(-49.98793,
-49.98793, -49.98793, -49.988778, -49.98962, -49.994889, -49.999912,
-49.991273, -49.991273, -49.996551, -49.996551, -49.995704, -49.990162,
-49.992945, -49.990718, -49.999056, -49.998222, -49.981259, -49.997389,
-49.979357, -49.999908, -49.995713, -49.980449, -49.995736, -49.980444,
-49.980444, -49.986852, -50.200149, -50.200172, -50.199602, -50.199603,
-50.199339, -50.209899, -50.038787, -50.243338, -50.235446, -50.139343,
-50.139348, -50.154871, -50.164607, -50.179621, -50.179895, -50.226412,
-50.196297, -50.196297, -50.233639, -50.234066, -50.242649, -50.251816,
-50.252098, -50.258233, -50.258233, -50.288502, -50.288525, -50.251001,
-50.261575, -50.039037, -50.044333, -50.044333, -50.015148, -50.115163,
-50.094472, -50.094472, -50.094899, -50.108204, -50.111829, -50.113653,
-50.114079, -50.010278, -50.017523, -50.010704, -50.010704, -50.004343,
-50.087667, -50.106547, -50.103487, -50.116283, -50.117968, -50.101301,
-50.119913, -50.120191, -50.120191, -50.079593, -50.080167, -50.082112,
-50.093519, -50.070172, -50.074194, -50.095459, -50.117959, -50.121024,
-50.094079, -50.102677, -50.129635, -50.140468, -50.143492, -50.166288,
-50.166426, -50.166816, -50.166844, -50.166024, -50.169635, -50.169635,
-50.165154, -50.165154, -50.175427, -50.182686, -50.188496, -50.203515,
-50.208765, -50.208487, -50.220728, -50.24933, -50.24933, -50.190159,
-50.204603, -50.241421, -50.241576, -50.241849, -50.135746, -50.144894,
-50.142117, -50.14408, -50.146839, -50.148223, -50.148223, -50.143802,
-50.144066, -50.151269, -50.163802, -50.159357, -50.160168, -50.159066,
-50.138232, -50.137107, -50.151288, -50.151001, -50.137376, -50.139061,
-50.132691, -50.132968, -50.152399, -50.170709, -50.176566, -50.176566,
-50.173237, -50.195182, -50.196949, -50.197376, -50.209608, -50.209608,
-50.239872, -50.007371, -50.006579, -50.007931, -50.008523, -50.01044,
-50.013787, -50.014607, -50.014037, -50.013056, -50.004181, -50.006569,
-50.004607, -50.008482, -50.008482, -50.026278, -50.030861, -50.018523,
-50.019444, -50.014903, -50.020181, -50.045875, -50.046301, -50.057121,
-50.057121, -50.036278, -50.040176, -50.043227, -50.044894, -50.036125,
-50.050158, -50.055186, -50.04876, -50.053213, -50.062385, -50.061561,
-50.085727, -50.093361, -50.083352, -50.083227, -50.083228, -50.10488,
-50.10351, -50.108783, -50.121816, -50.121279, -50.098487, -50.093788,
-50.104315, -50.10238, -50.107121, -50.108482, -50.111024, -50.124043,
-50.115723, -50.124343, -50.083375, -50.074315, -50.073515, -50.073514,
-50.073769, -50.070459, -50.072959, -50.106561, -50.116857, -50.113797,
-50.113797, -50.103802, -50.007107, -50.001815, -50.005185, -50.022371,
-50.021685, -50.022111, -50.004597, -50.006269, -50.007778, -50.001843,
-50.001843, -50.01906, -50.020185, -50.020185, -50.020426, -50.021843,
-50.06044, -50.00362, -50.00519, -50.00519, -50.007102, -50.024079,
-50.024079, -50.023778, -50.023778, -50.010732, -50.037686, -50.032936,
-50.03657, -50.038204, -50.038223, -50.041283, -50.042375, -50.044885,
-50.043227, -50.05851, -50.03988, -50.062653, -50.087385, -50.077112,
-50.110996, -50.119061, -50.126279, -50.132691, -50.149052, -50.149052,
-50.137371, -50.141431, -50.141858, -50.170992, -50.170992, -50.176288,
-50.176844, -50.176844, -50.14225, -50.142404, -50.142404, -50.142408,
-50.155432, -50.155432, -50.14852, -50.159344, -50.160579, -50.157409,
-50.158209, -50.170436, -50.170436, -50.132121, -50.165154, -50.144052,
-50.144052, -50.13408, -50.263247, -50.264755, -50.26821, -50.257386,
-50.28265, -50.2924, -50.2924, -50.303516, -50.264891, -50.251543,
-50.251543, -50.261302, -50.261539, -50.264755, -50.270455, -50.270747,
-50.294067, -50.290159, -50.290432, -50.315715, -50.320456, -50.251849,
-49.989338, -49.986551, -49.976296, -50.127404, -50.127654),
cluster = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1,
4, 4, 5, 5, 5, 5, 5, 5, 4, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1,
1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 4, 4, 5, 5, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 5,
5, 5, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 5, 5)), row.names = c(NA,
-318L), class = c("tbl_df", "tbl", "data.frame"))
w1<-convexhull.xy(df$Longitude[df$cluster==1], df$Latitude[df$cluster==1])
w2<-convexhull.xy(df$Longitude[df$cluster==2], df$Latitude[df$cluster==2])
w3<-convexhull.xy(df$Longitude[df$cluster==3], df$Latitude[df$cluster==3])
w4<-convexhull.xy(df$Longitude[df$cluster==4], df$Latitude[df$cluster==4])
w5<-convexhull.xy(df$Longitude[df$cluster==5], df$Latitude[df$cluster==5])
p1<-st_as_sf(w1, crs=4269)
p2<-st_as_sf(w2, crs=4269)
p3<-st_as_sf(w3, crs=4269)
p4<-st_as_sf(w4, crs=4269)
p5<-st_as_sf(w5, crs=4269)
poly<-rbind(p1,p2,p3,p4,p5)
poly[,"cluster"]<-c(1,2,3,4,5)
pts<-st_as_sf(df, coords=c("Longitude", "Latitude"), crs=4269)
tmap_mode("plot")
tm_shape(poly)+
tm_polygons(col="cluster", palette=c("darkolivegreen","skyblue","skyblue","yellow","pink"), style="cat", title="cluster")+
tm_shape(pts)+
tm_dots(size=2)+
tm_layout(legend.outside = TRUE)

One approach would be to use a Voronoi partition. ggvoronoi will do this for you with ggplot2, and you could easily overlay it on a ggmap map.
There is also a st_voronoi function in the sf package, which will create a set of Voronoi polygons from a MULTIPOINT shape (see update below).
Here is a simple example using your data. I have removed duplicated points (i.e. the same point in different clusters), which break the Voronoi algorithm!
library(tidyverse) #specifically ggplot2 and dplyr for the pipe
library(ggvoronoi)
df %>% distinct(Longitude, Latitude, .keep_all = TRUE) %>%
ggplot(aes(x = Longitude, y = Latitude, fill = factor(cluster), label = cluster)) +
geom_voronoi() +
geom_text()
Update:
To do this with st_voronoi you can do the following (unlike ggvoronoi, st_voronoi works without having to weed out the duplicates)...
pts_vor <- pts %>% st_union() %>% #merge points into a MULTIPOINT
st_voronoi() %>% #calculate voronoi polygons
st_cast() %>% #next three lines return it to a useable list of polygons
data.frame(geometry = .) %>%
st_sf(.) %>%
st_join(., pts) #merge with clusters and other variables
pts_vor %>% ggplot(aes(fill = factor(cluster))) +
geom_sf(colour = NA) + #draw voronoi tiles (no borders)
geom_sf_text(data = pts, aes(label = cluster)) + #plot points
coord_sf(xlim = c(-50.35, -49.95), ylim = c(-25.05, -24.75))

Antonio, I think this might be the solution you are after, but it requires at least three points per cluster to work, which, judging from your figure, I assume you have in your full dataset. The trick is to create convex hulls and convert them into polygons. This can be done with the convexhull.xy() function in the spatstat package. The hulls can then be converted into simple features with the sf package and drawn with your mapping package of choice. I personally am a fan of the tmap package. Here is a reproducible example. Note that I had to add two more points to your example data to make this work (you cannot compute a polygon from only two points).
##Loading Necessary Packages##
library(spatstat)#For convexhull.xy() function
library(tmap)# For drawing the map
library(sf) #To create simple features for mapping
##Loading Example Data##
df<-structure(list(Properties = c(1,2,3,4,5,6),
Latitude = c(-24.930473, -24.95575,-24.924161,-24.95579, -24.94557, -24.93267),
Longitude = c(-49.994889, -49.990162,-50.004343, -50.007371, -50.01542, -50.00702),
cluster = c(1,2,1,2,1,2)), class = "data.frame", row.names = c(NA, -6L))
##Calculating convexhulls for each cluster##
w1<-convexhull.xy(df$Longitude[df$cluster==1], df$Latitude[df$cluster==1])
w2<-convexhull.xy(df$Longitude[df$cluster==2], df$Latitude[df$cluster==2])
##Converting hulls to simple features. Note: crs = 4269 is NAD83; if your coordinates are in WGS84, use crs = 4326 instead
p1<-st_as_sf(w1, crs=4269)
p2<-st_as_sf(w2, crs=4269)
#Combining the two simple features together
poly<-rbind(p1,p2)
#Labelling the clusters
poly[,"cluster"]<-c(1,2)
#Creating a point simple feature from your property data in the dataframe
pts<-st_as_sf(df, coords=c("Longitude", "Latitude"), crs=4269)
#Setting the mapping mode to plot. Change this to "view" if you want an interactive map
tmap_mode("plot")
#Drawing the map
tm_shape(poly)+
tm_polygons(col="cluster", palette=c("darkolivegreen", "skyblue"), style="cat", title="cluster")+
tm_shape(pts)+
tm_dots(size=2)+
tm_layout(legend.outside = TRUE)
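If you only want to see the hull step on its own, it can be sketched with base R alone. The snippet below uses grDevices::chull() on the same six example points; this is a stand-in for convexhull.xy(), not the method used above, and it draws a plain scatter plot with no basemap or CRS handling.

```r
# Sketch: convex hull per cluster with base R only (no basemap, no CRS handling).
df <- data.frame(
  Properties = 1:6,
  Latitude   = c(-24.930473, -24.95575, -24.924161, -24.95579, -24.94557, -24.93267),
  Longitude  = c(-49.994889, -49.990162, -50.004343, -50.007371, -50.01542, -50.00702),
  cluster    = c(1, 2, 1, 2, 1, 2)
)

# For each cluster, keep only the rows that lie on its convex hull, in hull order.
hulls <- lapply(split(df, df$cluster), function(d) {
  d[chull(d$Longitude, d$Latitude), ]
})

# Draw the points, then one semi-transparent polygon per cluster.
cols <- c("darkolivegreen", "skyblue")
plot(df$Longitude, df$Latitude, pch = 19, xlab = "Longitude", ylab = "Latitude")
for (i in seq_along(hulls)) {
  polygon(hulls[[i]]$Longitude, hulls[[i]]$Latitude,
          border = cols[i], col = adjustcolor(cols[i], alpha.f = 0.3))
}
text(df$Longitude, df$Latitude, labels = df$cluster, pos = 3)
```

Using split() plus lapply() also avoids writing one hull line per cluster, which helps once there are more than two clusters.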

How can I edit the common legend title name using ggplot2 and ggpubr?

I am using ggpubr to combine multiple graphs in a single plot, but cannot seem to correctly generate one graph with the title that I would like. I would like the title to say "Customized legend," given that it is a common legend for both graphs. Does anybody know how I can do this?
Here is my data:
data1 = data.frame(var1 = c(1,
1,
1,
1,
2,
2,
2,
2,
3,
3,
3,
3,
4,
4,
4,
4,
5,
5,
5,
5,
6,
6,
6,
6,
7,
7,
7,
7,
8,
8,
8,
8,
9,
9,
9,
9,
10,
10,
10,
10,
11,
11,
11,
11,
12,
12,
12,
12,
13,
13,
13,
13,
14,
14,
14,
14,
15,
15,
15,
15,
16,
16,
16,
16,
17,
17,
17,
17,
18,
18,
18,
18,
19,
19,
19,
19,
20,
20,
20,
20,
21,
21,
21,
21,
22,
22,
22,
22,
23,
23,
23,
23,
24,
24,
24,
24,
25,
25,
25,
25,
26,
26,
26,
26,
27,
27,
27,
27,
28,
28,
28,
28,
29,
29,
29,
29,
30,
30,
30,
30,
31,
31,
31,
31,
32,
32,
32,
32,
33,
33,
33,
33),
var2 = c(1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4),
var3 = c(113,
89,
99,
41,
72,
64,
39,
139,
135,
17,
3,
135,
63,
126,
34,
87,
84,
125,
123,
18,
115,
11,
68,
85,
48,
95,
56,
129,
41,
78,
82,
122,
124,
4,
60,
132,
67,
128,
46,
79,
110,
88,
19,
88,
88,
126,
30,
11,
52,
66,
15,
52,
6,
74,
14,
101,
88,
70,
58,
20,
104,
76,
134,
23,
40,
1,
47,
25,
49,
110,
96,
100,
106,
26,
93,
19,
87,
41,
13,
40,
63,
87,
137,
105,
89,
95,
24,
49,
112,
92,
45,
105,
112,
105,
114,
129,
84,
33,
95,
95,
15,
90,
1,
62,
20,
7,
18,
96,
4,
71,
42,
94,
45,
102,
55,
98,
124,
80,
76,
97,
41,
31,
25,
21,
135,
138,
121,
93,
17,
13,
49,
26))
data2 <- data.frame(var1a = c(1,
1,
1,
1,
2,
2,
2,
2,
3,
3,
3,
3,
4,
4,
4,
4,
5,
5,
5,
5,
6,
6,
6,
6,
7,
7,
7,
7,
8,
8,
8,
8,
9,
9,
9,
9,
10,
10,
10,
10,
11,
11,
11,
11,
12,
12,
12,
12,
13,
13,
13,
13,
14,
14,
14,
14,
15,
15,
15,
15,
16,
16,
16,
16,
17,
17,
17,
17,
18,
18,
18,
18,
19,
19,
19,
19,
20,
20,
20,
20,
21,
21,
21,
21,
22,
22,
22,
22,
23,
23,
23,
23,
24,
24,
24,
24,
25,
25,
25,
25,
26,
26,
26,
26,
27,
27,
27,
27,
28,
28,
28,
28,
29,
29,
29,
29,
30,
30,
30,
30,
31,
31,
31,
31,
32,
32,
32,
32,
33,
33,
33,
33),
var2a = c(1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4,
1,
2,
3,
4),
var3a = c(113,
89,
99,
41,
72,
64,
39,
139,
135,
17,
3,
135,
63,
126,
34,
87,
84,
125,
123,
18,
115,
11,
68,
85,
48,
95,
56,
129,
41,
78,
82,
122,
124,
4,
60,
132,
67,
128,
46,
79,
110,
88,
19,
88,
88,
126,
30,
11,
52,
66,
15,
52,
6,
74,
14,
101,
88,
70,
58,
20,
104,
76,
134,
23,
40,
1,
47,
25,
49,
110,
96,
100,
106,
26,
93,
19,
87,
41,
13,
40,
63,
87,
137,
105,
89,
95,
24,
49,
112,
92,
45,
105,
112,
105,
114,
129,
84,
33,
95,
95,
15,
90,
1,
62,
20,
7,
18,
96,
4,
71,
42,
94,
45,
102,
55,
98,
124,
80,
76,
97,
41,
31,
25,
21,
135,
138,
121,
93,
17,
13,
49,
26))
Here is the code that I am using:
#Open packages
library(ggplot2)
library(ggpubr)
#Set the theme
theme_set(theme_pubr())
#Change necessary columns to factor
data1$var2 <- factor(data1$var2, levels = c(1,2,3,4))
data2$var2a <- factor(data2$var2a, levels = c(1,2,3,4))
#Generate the plots
plot1 <- ggplot(data1, aes(x = var1, y = var3, group = var2)) +
geom_line(size = 1.5, aes(linetype = var2, color = var2)) +
xlab('x_label') +
ylab('y_label')+
scale_fill_discrete(name = 'customized legend')
plot2 <- ggplot(data2, aes(x = var1a, y = var3a, group = var2a)) +
geom_line(size = 1.5, aes(linetype = var2a, color = var2a)) +
xlab('x_label') +
ylab('y_label')+
scale_fill_discrete(name = 'customized legend')
#Combine both into one picture
fig <- ggarrange(plot1, plot2,
ncol = 2,
nrow = 1,
common.legend = TRUE,
legend = "bottom")
fig

Since you didn't use the fill aesthetic in your ggplot, you should not use scale_fill_discrete. What you need is to set the legend title of linetype and color to "customized legend", since those are the aesthetics that you used.
library(ggplot2)
library(ggpubr)
plot1 <- ggplot(data1, aes(x = var1, y = var3, group = var2)) +
geom_line(size = 1.5, aes(linetype = var2, color = var2)) +
xlab('x_label') +
ylab('y_label') +
labs(linetype = "customized legend", color = "customized legend")
plot2 <- ggplot(data2, aes(x = var1a, y = var3a, group = var2a)) +
geom_line(size = 1.5, aes(linetype = var2a, color = var2a)) +
xlab('x_label') +
ylab('y_label') +
labs(linetype = "customized legend", color = "customized legend")
#Combine both into one picture
ggarrange(plot1, plot2,
ncol = 2,
nrow = 1,
common.legend = TRUE,
legend = "bottom")

Create a graph from data frame with a layout base on attribute

I create a graph from a data frame, and I would like the vertices to be arranged and spaced apart according to the Hamming distance contained in data.rw$Hamming.
I would appreciate some help.
data.rw <- structure(list(g1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6,
6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10,
11, 11, 12), g2 = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 5, 6, 7, 8, 9, 10, 11, 12, 13, 6, 7, 8, 9, 10, 11, 12, 13,
7, 8, 9, 10, 11, 12, 13, 8, 9, 10, 11, 12, 13, 9, 10, 11, 12,
13, 10, 11, 12, 13, 11, 12, 13, 12, 13, 13), Hamming = c(116,
86, 101, 92, 84, 78, 83, 102, 87, 100, 96, 97, 90, 111, 98, 90,
92, 87, 114, 95, 108, 104, 109, 85, 74, 68, 60, 67, 84, 71, 84,
78, 79, 83, 85, 79, 78, 101, 90, 101, 91, 92, 72, 66, 67, 92,
77, 90, 82, 83, 62, 59, 88, 71, 86, 78, 81, 59, 78, 63, 74, 68,
73, 83, 60, 77, 75, 72, 89, 100, 94, 97, 79, 75, 82, 90, 93,
91)), row.names = c(NA, -78L), class = "data.frame")
library(igraph)
set.seed(1234)
# vertex names come from the g1 and g2 columns of the edge list
vertice.df <- unique(c(data.rw$g1, data.rw$g2))
g <- graph_from_data_frame(d = data.rw, vertices = vertice.df, directed = FALSE)
plot(g)

I recommend a distance-based layout for this task; multidimensional scaling comes to mind:
m <- get.adjacency(g, attr = "Hamming", sparse = F)
# optionally: m <- dist(m)
l <- layout_with_mds(g, dist = m, dim = 2)
First extract the weighted adjacency matrix from the graph and feed it into the layout function (dist = m). This returns a two-column matrix l (dim = 2) that you can use as the layout for the node positions.
plot(g, layout = l)
Have a look at ?cmdscale if you are interested in MDS and specifically the eig parameter to later assess the goodness-of-fit. Chances are that two dimensions are not enough to adequately reflect the distances between nodes. But that's for you to decide.
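To make the goodness-of-fit check concrete, here is a small sketch that runs base R's cmdscale() directly on the Hamming values from the question, without igraph. It assumes the 78 values are ordered as the pairs (1,2), (1,3), ..., (12,13), which is how g1 and g2 are laid out in data.rw; the object names are made up for the example.

```r
# Sketch: classical MDS (base R cmdscale) on the Hamming distances from the question.
hamming <- c(116,
  86, 101, 92, 84, 78, 83, 102, 87, 100, 96, 97, 90, 111, 98, 90,
  92, 87, 114, 95, 108, 104, 109, 85, 74, 68, 60, 67, 84, 71, 84,
  78, 79, 83, 85, 79, 78, 101, 90, 101, 91, 92, 72, 66, 67, 92,
  77, 90, 82, 83, 62, 59, 88, 71, 86, 78, 81, 59, 78, 63, 74, 68,
  73, 83, 60, 77, 75, 72, 89, 100, 94, 97, 79, 75, 82, 90, 93,
  91)

n <- 13
m <- matrix(0, n, n)
m[lower.tri(m)] <- hamming   # column-major fill matches the (g1, g2) pair order
m <- m + t(m)                # mirror to get a symmetric distance matrix

mds <- cmdscale(as.dist(m), k = 2, eig = TRUE)
layout_xy <- mds$points      # one row of 2-D coordinates per vertex

print(round(mds$GOF, 2))     # goodness-of-fit; values near 1 mean 2 dimensions suffice

plot(layout_xy, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(layout_xy, labels = seq_len(n))
```

If the vertices of g are in the order 1 to 13, layout_xy should be usable as the layout argument of plot(g) in the same way as l above.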

Auto.Arima transform timeseries and xreg correlation with lagged forecast timeseries

I'm trying to forecast an auto.arima() model like the one below.
I was wondering in general if it was necessary to transform a timeseries so that it resembled a normal distribution before passing it to auto.arima()?
Also does it matter if your xreg=... predictor is correlated with a lag of the timeseries you're trying to predict, or vice versa?
Code:
tsTrain <-tsTiTo[1:60]
tsTest <- tsTiTo[61:100]
Xreg<-CustCount[1:100]
##Predictor
xregTrain2 <- Xreg[1:60]
xregTest2 <- Xreg[61:100]
Arima.fit2 <- auto.arima(tsTrain, xreg = xregTrain2)
Acast2<-forecast(Arima.fit2, h=40, xreg = xregTest2)
Data:
#dput(ds$CustCount[1:100])
CustCount = c(3, 3, 1, 4, 1, 3, 2, 3, 2, 4, 1, 1, 5, 6, 8, 5, 2, 7, 7, 3, 2, 2, 2, 1, 3, 2, 3, 1, 1, 2, 1, 1, 3, 2, 2, 2, 3, 7, 5, 6, 8, 7, 3, 5, 6, 6, 8, 4, 2, 1, 2, 1, NA, NA, 4, 2, 2, 4, 11, 2, 8, 1, 4, 7, 11, 5, 3, 10, 7, 1, 1, NA, 2, NA, NA, 2, NA, NA, 1, 2, 3, 5, 9, 5, 9, 6, 6, 1, 5, 3, 7, 5, 8, 3, 2, 6, 3, 2, 3, 1 )
# dput(tsTiTo[1:100])
tsTiTo = c(45, 34, 11, 79, 102, 45, 21, 45, 104, 20, 2, 207, 45, 2, 3, 153, 8, 2, 173, 11, 207, 79, 45, 153, 192, 173, 130, 4, 173, 174, 173, 130, 79, 154, 4, 104, 192, 153, 192, 104, 28, 173, 52, 45, 11, 29, 22, 81, 7, 79, 193, 104, 1, 1, 46, 130, 45, 154, 153, 7, 174, 21, 193, 45, 79, 173, 45, 153, 45, 173, 2, 1, 2, 1, 1, 8, 1, 1, 79, 45, 79, 173, 45, 2, 173, 130, 104, 19, 4, 34, 2, 192, 42, 41, 31, 39, 11, 79, 4, 79)

The short answer to both questions is no. See the long answer below.
I was wondering in general if it was necessary to transform a
timeseries so that it resembled a normal distribution before passing
it to auto.arima()?
No. In the case of time series data, it is the innovation errors that you want to be normally distributed, not the time series you are modelling.
This is similar to a linear regression model: you don't expect the predictors to be normally distributed; it is the errors that you'd expect to be normally distributed.
Also does it matter if your xreg=... predictor is correlated with a
lag of the timeseries you're trying to predict, or vice versa?
You'd hope xreg is correlated in this way. We are trying to use that information when looking for an appropriate model to forecast.
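A quick way to see the first point is to simulate a series that is clearly skewed but whose innovations are normal. The sketch below uses base R's stats::arima() with made-up data; the names, the coefficient 5 and the AR(1) error structure are all assumptions for illustration.

```r
# Sketch: a skewed series can still have well-behaved innovations.
set.seed(42)
n <- 200
x <- rexp(n)                        # strongly skewed predictor
e <- arima.sim(list(ar = 0.6), n)   # AR(1) errors with normal innovations
y <- 5 * x + e                      # y inherits the skew of x

# Regression on x with AR(1) errors; no transformation of y.
fit <- arima(y, order = c(1, 0, 0), xreg = x, method = "ML")
res <- residuals(fit)               # estimated innovations

# Compare the shape of the raw series with the shape of the residuals.
op <- par(mfrow = c(1, 2))
hist(y, main = "Series y")
qqnorm(res, main = "Residuals")
qqline(res)
par(op)
```

The histogram of y is far from bell-shaped, yet the model is correctly specified; it is the residuals that you would inspect for normality.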

Principal component analysis using "Principal"

I am using principal() function from psych package to replicate SPSS principal component analysis results in R. (as recommended in: https://stats.stackexchange.com/questions/612/is-pca-followed-by-a-rotation-e-g-varimax-still-pca)
I'm using the code below:
com<-principal(ws, nfactors = 1, residuals = FALSE,rotate="varimax",n.obs=NA, covar=TRUE,
scores=TRUE,missing=TRUE,impute="mean",oblique.scores=TRUE,
method="correlation")
But I'm getting the following error:
Error in solve.default(model, r) :
Lapack routine dgesv: system is exactly singular: U[104,104] = 0
In addition: Warning message:
In cor.smooth(r) : Matrix was not positive definite, smoothing was done
Any advice would be much appreciated. I'm attaching a sample of my data (by dput(ws)) below. Thank you so much!
structure(list(hv001 = c(906, 3101, 4001, 6801, 1502, 3508, 4301,
2205, 508, 901, 906, 2304, 4913, 3803, 5704, 2902, 4701, 2303,
3602, 5905, 4805, 1405, 6301, 4007, 904, 302, 7003, 1602, 7107,
4801, 1803, 3403, 6904, 3002, 509, 6904, 1407, 5803, 5802, 4504,
405, 502, 7207, 2706, 2703, 4007, 3403, 703, 2304, 4806, 403,
502, 5607, 902, 2707, 2709, 6402, 6902, 1902, 7501, 2202, 2201,
7205, 4901, 2204, 3504, 5608, 3202, 3505, 2202, 5608, 2306, 7402,
7109, 7303, 5904, 3602, 6102, 101, 7302, 906, 3506, 6003, 6302,
909, 4801, 4806, 3104, 5708, 6802, 404, 4903, 5609, 2101, 3502,
4002, 1602, 7206, 6903, 7103), hv002 = c(26, 372, 147, 215, 23,
48, 130, 17, 48, 214, 93, 15, 122, 27, 172, 5, 53, 100, 177,
341, 279, 42, 47, 10, 15, 47, 90, 106, 218, 110, 95, 86, 69,
77, 47, 49, 43, 55, 324, 88, 120, 177, 112, 462, 347, 3, 69,
36, 16, 561, 109, 322, 28, 47, 123, 8, 174, 66, 101, 55, 212,
201, 57, 231, 53, 128, 211, 136, 102, 246, 154, 5, 59, 231, 263,
108, 137, 35, 98, 165, 27, 205, 8, 70, 147, 189, 417, 71, 23,
56, 139, 302, 106, 83, 24, 12, 56, 95, 4, 198), hv005 = c(346129,
1051441, 458555, 513316, 2337718, 2526226, 1956333, 420415, 2337718,
106202, 346129, 420415, 1181939, 1956333, 852471, 420415, 1956333,
420415, 1956333, 811088, 1181939, 1042611, 513316, 1956333, 346129,
1042611, 595684, 2337718, 249256, 1181939, 2526226, 671551, 595684,
1566584, 2337718, 595684, 1042611, 852471, 264004, 1956333, 2337718,
537817, 1238001, 1051441, 1051441, 1956333, 671551, 1042611,
420415, 1181939, 2337718, 537817, 852471, 106202, 1051441, 1051441,
513316, 595684, 2526226, 513316, 162660, 162660, 1238001, 333940,
420415, 671551, 852471, 2526226, 2526226, 162660, 852471, 420415,
214934, 1238001, 595684, 811088, 1956333, 811088, 346129, 595684,
346129, 2526226, 811088, 513316, 346129, 1181939, 1181939, 1566584,
852471, 513316, 2337718, 333940, 852471, 1566584, 671551, 458555,
2337718, 1238001, 595684, 249256), hv025 = c(2, 1, 1, 2, 2, 2,
2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2,
2, 2, 2, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2,
1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 1, 2, 2, 1, 1, 2, 2, 2, 1), hv009 = c(7, 3, 4, 5, 4, 3, 4,
2, 5, 2, 8, 3, 4, 4, 2, 3, 7, 4, 3, 5, 9, 5, 6, 1, 3, 5, 7, 5,
6, 2, 4, 4, 4, 3, 5, 5, 3, 5, 3, 4, 4, 5, 3, 5, 4, 4, 5, 9, 3,
3, 4, 3, 4, 3, 5, 4, 9, 8, 6, 1, 5, 4, 6, 1, 8, 3, 3, 5, 2, 6,
5, 2, 3, 1, 7, 3, 7, 3, 7, 7, 3, 4, 5, 5, 3, 3, 4, 5, 8, 2, 3,
5, 2, 7, 2, 2, 7, 10, 6, 5), hv012 = c(7, 3, 4, 5, 4, 3, 4, 2,
5, 2, 8, 3, 4, 4, 2, 3, 3, 4, 3, 5, 9, 5, 6, 1, 3, 5, 7, 5, 6,
2, 4, 4, 4, 3, 5, 5, 3, 5, 3, 4, 4, 5, 3, 5, 4, 4, 5, 9, 3, 3,
4, 3, 4, 3, 5, 2, 9, 8, 6, 1, 5, 4, 6, 1, 8, 3, 3, 5, 2, 6, 5,
2, 3, 1, 7, 3, 7, 3, 7, 7, 3, 4, 5, 5, 3, 3, 4, 5, 6, 2, 3, 5,
2, 7, 2, 2, 7, 10, 6, 4), hv013 = c(7, 3, 4, 4, 4, 3, 4, 2, 5,
2, 8, 3, 4, 4, 2, 3, 7, 4, 3, 5, 9, 5, 4, 1, 3, 3, 7, 5, 6, 2,
0, 4, 3, 3, 5, 5, 3, 5, 3, 4, 4, 3, 3, 5, 4, 4, 5, 9, 3, 3, 4,
3, 4, 3, 5, 4, 9, 8, 6, 1, 4, 4, 6, 1, 8, 3, 3, 5, 2, 6, 5, 2,
3, 1, 5, 3, 7, 3, 7, 7, 3, 4, 5, 5, 3, 2, 4, 5, 8, 2, 3, 3, 1,
7, 2, 2, 7, 10, 6, 5), hv201 = c(44, 12, 12, 43, 21, 41, 13,
43, 21, 12, 44, 12, 13, 31, 21, 13, 12, 12, 11, 31, 21, 21, 12,
13, 13, 43, 41, 21, 21, 12, 21, 21, 13, 12, 12, 13, 21, 12, 21,
13, 21, 21, 21, 31, 11, 13, 21, 12, 12, 21, 21, 21, 21, 44, 13,
21, 44, 13, 21, 13, 12, 12, 21, 11, 12, 13, 13, 21, 13, 12, 43,
13, 12, 21, 13, 13, 13, 13, 12, 43, 44, 13, 42, 13, 13, 11, 21,
13, 21, 43, 12, 11, 32, 12, 11, 13, 21, 21, 44, 21), hv204 = c(15,
996, 996, 20, 996, 5, 15, 30, 996, 996, 30, 996, 5, 25, 5, 20,
996, 996, 996, 10, 996, 10, 996, 20, 60, 60, 30, 996, 996, 996,
30, 10, 15, 996, 996, 10, 996, 996, 996, 996, 996, 996, 5, 996,
996, 10, 996, 996, 996, 10, 996, 996, 996, 20, 10, 15, 20, 5,
10, 10, 996, 996, 996, 996, 996, 996, 5, 996, 5, 996, 90, 5,
996, 996, 5, 996, 15, 20, 996, 15, 30, 9, 30, 5, 10, 996, 996,
20, 996, 15, 996, 996, 10, 996, 996, 5, 996, 996, 5, 996), hv205 = c(23,
12, 12, 22, 31, 22, 31, 31, 12, 31, 31, 12, 23, 23, 31, 12, 13,
13, 23, 12, 13, 31, 22, 13, 31, 23, 31, 13, 31, 11, 31, 12, 31,
23, 12, 31, 31, 23, 31, 12, 31, 12, 31, 12, 12, 12, 13, 23, 41,
12, 12, 12, 13, 23, 12, 14, 31, 22, 31, 22, 12, 12, 31, 12, 13,
12, 31, 12, 12, 12, 31, 22, 13, 31, 31, 31, 22, 13, 13, 31, 23,
12, 31, 31, 13, 12, 12, 31, 13, 31, 12, 12, 31, 23, 12, 12, 31,
12, 22, 12)), .Names = c("hv001", "hv002", "hv005", "hv025",
"hv009", "hv012", "hv013", "hv201", "hv204", "hv205"), row.names = c(10678L,
3179L, 4323L, 8790L, 835L, 3936L, 4789L, 1696L, 6299L, 10495L,
10703L, 1909L, 5863L, 4156L, 7311L, 2912L, 5073L, 1857L, 3991L,
7840L, 5254L, 690L, 8477L, 4520L, 10597L, 3056L, 9158L, 889L,
9572L, 5096L, 1178L, 3598L, 9040L, 2989L, 6336L, 9030L, 768L,
7583L, 7539L, 5005L, 4713L, 6064L, 10013L, 2486L, 2392L, 4531L,
3596L, 9322L, 1910L, 5317L, 4631L, 6080L, 7070L, 10542L, 2501L,
2594L, 8582L, 8957L, 1223L, 10372L, 1587L, 1547L, 9960L, 5458L,
1667L, 3749L, 7100L, 3355L, 3783L, 1591L, 7087L, 1991L, 10264L,
9655L, 10147L, 7781L, 3982L, 8396L, 120L, 10106L, 10679L, 3841L,
8033L, 8519L, 10789L, 5111L, 5313L, 3309L, 7464L, 8835L, 4668L,
5526L, 7117L, 1537L, 3698L, 4355L, 916L, 10006L, 8995L, 9434L
), class = "data.frame")

If you leave method at its default (which is "regression"), you should be able to get the PCA results. However, I doubt you will get the same results as SPSS (I have seen discrepancies between SPSS and R in "varimax" rotation results).

I went through the code. It seems that scores cannot be passed as TRUE to the function, so please set scores = FALSE.
Try the code below:
res <- principal(ws, nfactors = 1, residuals = FALSE, rotate = "varimax", n.obs = NA, covar = TRUE, scores = FALSE, missing = TRUE, impute = "mean", oblique.scores = TRUE, method = "correlation")
I think this will let you continue your work.
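An "exactly singular" error usually points to a correlation matrix that is rank-deficient, for example because one column is a linear combination of others. Whether that is the cause in your full data is only a guess, but it is cheap to check. The sketch below uses a toy data frame with hypothetical columns a, b and c, and base R only (prcomp() stands in for principal()).

```r
# Sketch: detect linearly dependent columns, a common cause of
# "system is exactly singular" errors. Toy data; column names are hypothetical.
set.seed(7)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
toy$c <- toy$a + toy$b              # exact linear combination of a and b

r <- cor(toy)
rank_r <- qr(r)$rank                # numerical rank of the correlation matrix
print(rank_r < ncol(r))             # TRUE means the matrix is rank-deficient
print(min(eigen(r, symmetric = TRUE)$values))  # approximately zero here

# Base-R PCA on the full-rank subset, standardised (i.e. based on correlations).
pca <- prcomp(toy[, c("a", "b")], scale. = TRUE)
print(summary(pca)$importance)
```

Running the same rank and eigenvalue check on cor(ws) would show whether redundant columns need to be dropped before calling principal().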
