auto.arima forecast with multivariate xreg - unexpected results


In my spare time I try to sharpen my forecasting skills, and today's problem is forecasting with multiple regressors. I have created a time series that is influenced by two regressors, but I am wondering how to forecast with them.
I tried the following:
library(forecast)
First, my time series:
ts.series3 <- structure(c(313, 253, 230, 258, 261, 303, 266, 269, 245, 274,
346, 252, 283, 286, 260, 365, 295, 268, 301, 304, 353, 310, 313,
285, 319, 403, 294, 330, 333, 303, 425, 343, 312, 350, 354, 411,
361, 366, 333, 469, 380, 346, 487, 394, 359, 404, 511, 372, 418
), .Tsp = c(2003.08333333333, 2007.08333333333, 12), class = "ts")
The time series above is based on the trend shown in ts.trend (below), adjusted by the modifiers. When the first modifier applies, the value is increased by 25%; when the second applies, it is decreased by 10%. When both apply, the value is increased by 15%.
ts.trend <- structure(c(250, 253, 255, 258, 261, 264, 266, 269, 272, 274,
277, 280, 283, 286, 289, 292, 295, 298, 301, 304, 307, 310, 313,
316, 319, 323, 326, 330, 333, 337, 340, 343, 347, 350, 354, 357,
361, 366, 370, 375, 380, 385, 390, 394, 399, 404, 409, 414, 418
), .Tsp = c(2003.08333333333, 2007.08333333333, 12), class = "ts")
A multivariate time series with the two regressors:
modifiers <- structure(c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(60L,
2L), .Dimnames = list(NULL, c("Adjust1", "Adjust2")), .Tsp = c(2003.08333333333,
2008, 12), class = c("mts", "ts"))
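The adjustment rule described above can be sketched on the first six months of the data. This is only an illustration: the name multiplier is made up here, and how the question's data were rounded is not stated, so the results may differ from ts.series3 by one unit (for example, 250 * 1.25 = 312.5 against the reported 313).
```r
# First six trend values and modifier flags, copied from the question's data
trend   <- c(250, 253, 255, 258, 261, 264)
adjust1 <- c(1, 0, 0, 0, 0, 1)
adjust2 <- c(0, 0, 1, 0, 0, 1)

# Multiplier per month: both -> +15%, only Adjust1 -> +25%, only Adjust2 -> -10%
multiplier <- ifelse(adjust1 == 1 & adjust2 == 1, 1.15,
              ifelse(adjust1 == 1, 1.25,
              ifelse(adjust2 == 1, 0.90, 1)))

# Unrounded adjusted series
adjusted <- trend * multiplier
adjusted
```
Because the modifiers act multiplicatively on the trend, their effect in absolute units grows as the trend grows.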
Then I try to make the following model:
fit.series3 <- auto.arima(ts.series3,xreg=window(modifiers,end=2007.16))
fcast.series3 <- forecast(fit.series3,xreg=window(modifiers,start=2007.161))
The code seems to run fine, but the plot (see below) doesn't really make sense: since neither regressor is active in the forecast period, I would expect the forecast to more or less follow the trend line. Can somebody provide some insight into what is happening here?
plot(fcast.series3)
The forecast plot looks as follows when I do not use any regressor variables. I am more confident in this forecast than in the one above. I used the following code to produce the chart:
fit.series3clean <- auto.arima(ts.series3)
fcast.series3clean <- forecast(fit.series3clean)
plot(fcast.series3clean)
I am wondering whether somebody understands what is happening with my forecast when I use multivariate xreg values. I am also curious to hear about other approaches to forecasting with multiple regressors.

Take a look at your fitted model:
> fit.series3
Series: ts.series3
ARIMA(0,1,1)(0,1,0)[12]
Coefficients:
          ma1  Adjust1   Adjust2
      -0.7586  80.1919  285.6239
s.e.   0.0832   0.0842       NaN
sigma^2 estimated as 71.55: log likelihood=-128.38
AIC=264.76 AICc=266.05 BIC=271.09
Warning message:
In sqrt(diag(x$var.coef)) : NaNs produced
There is a problem with the coefficient of Adjust2 as the standard error is NaN.
The following code shows the problem:
> window(diff(diff(modifiers[,2],12)),end=2007.16)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2004           0   0   0   0   0   0   0   0   0   0
2005   0   0   0   0   0   0   0   0   0   0   0   0
2006   0   0   0   0   0   0   0   0   0   0   0   0
2007   0   0
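During the fitting period, the twice-differenced Adjust2 is always zero, which makes the coefficient essentially undefined (and with infinite variance). In that period Adjust2 repeats every three months, so the seasonal (lag-12) difference cancels it exactly.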
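A minimal base-R sketch of a check one could run before fitting, assuming the model differences once seasonally (lag 12) and once regularly, as the fitted ARIMA(0,1,1)(0,1,0)[12] does. The helper name vanishes and the toy regressors are made up for illustration and are not part of the forecast package.
```r
# Hypothetical helper: TRUE for every regressor column that is identically
# zero after one seasonal (lag) difference followed by one regular difference
vanishes <- function(xreg, lag = 12) {
  apply(as.matrix(xreg), 2, function(col) all(diff(diff(col, lag = lag)) == 0))
}

# Toy regressors over 49 months: 'a' fires every 5th month, 'b' every 3rd month
xreg <- cbind(a = rep(c(1, 0, 0, 0, 0), length.out = 49),
              b = rep(c(0, 0, 1), length.out = 49))

# 'b' has period 3, which divides 12, so the lag-12 difference wipes it out
vanishes(xreg)
```
A column flagged TRUE carries no information once the series is differenced, so its coefficient cannot be estimated. Dropping it, or extending the fitting window until the pattern breaks, are two options worth trying.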

Related

How to Write a List into an Empty Data Frame with a Loop in R

I am trying to write a list into a data frame after a forecast is generated in a loop. However, each forecast overwrites the same column of the data frame, whereas I want each one to go into its own column. I am not sure what I am doing wrong. Please share your thoughts on this.
library(dplyr)
library(tidyverse)
library(tidyr)
library(tidymodels)
library(forecast)
library(prophet)
library(readxl)
library(writexl)
library(tibble)
pd <- readxl::read_excel("C:/X/X/X/X/Dummy.xlsx")
colnames(pd)[1]="ds"
colnames(pd)
pd1 <- pd %>% select(`X1`,`X2`,`X3`)
pd2 <- pd %>% select(`X1`)
Output = data.frame()
for(i in 2:ncol(pd))
{
Yi<- ts(data = pd[,i],
frequency = 12,
start = c(2019,1),
end = c(2022,8))
#print(Yi)
Model = HoltWinters(x=Yi,
seasonal = 'additive')
Predictions = forecast(Model,h=6)
print(Predictions$mean)
Output = as.data.frame(Predictions$mean)
print(Output)
}
When I print the output, I can see that the forecast is written into the data frame as shown in the image below, but it is overwritten on each iteration because I can't specify the column reference.
https://imgur.com/gallery/2MTZq5r
I tried the following, but it also failed:
Output[,i] = as.data.frame(Predictions$mean)
The ideal output is shown in the image below:
https://imgur.com/gallery/t98hUJE
But even this output would be fine if the other one is not possible.
https://imgur.com/gallery/pzxXifs
The data frame pd is given below:
structure(list(ds = c("2019-01-01", "2019-02-01", "2019-03-01",
"2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01",
"2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01",
"2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01",
"2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
"2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01",
"2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01",
"2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01",
"2022-08-01"), X1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5,
85, 72, 111, 96, 50, 95, 48, 87, 75, 249, 173, 74, 86, 127, 209,
92, 137, 49, 84, 75, 73, 376, 196, 91, 107, 124, 177, 244, 275,
100, 176), X2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 29, 243,
281, 262, 283, 0, 264, 104, 289, 41, 76), X3 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 157, 171, 377, 409, 375, 314, 253, 322,
130, 472, 115, 179)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -44L))

The issue arises because you created an empty data frame. If you instead create a data frame filled with NA, with one row per forecast month and one column per time series, you can get the desired output from the code above with a few changes. The code below creates such a data frame to match your requirement.
An NA-filled data frame can be created with data.frame(matrix(NA,nrow = 6,ncol = 6))
To determine the number of rows and columns dynamically, you can use the code below.
freq = 6
pd1 <- pd %>% select(`X1`,`X2`,`X3`)
Output = data.frame(matrix(NA,nrow = freq,ncol = ncol(pd1)))
for(i in 1:ncol(pd1))
{
  Yi <- ts(data = pd1[,i],
           frequency = 12,
           start = c(2019,1),
           end = c(2022,8))
  Model = HoltWinters(x = Yi,
                      seasonal = 'additive')
  Predictions = forecast(Model, h = freq)
  Output[,i] = as.data.frame(Predictions$mean)
}

If you do what I did below, the output is a list with one element per series, which is then bound into a single data frame. Is this what you wanted?
library(forecast)
pd<-structure(list(ds=c("2019-01-01","2019-02-01","2019-03-01",
"2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01",
"2019-09-01","2019-10-01","2019-11-01","2019-12-01","2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01","2020-06-01",
"2020-07-01","2020-08-01","2020-09-01","2020-10-01","2020-11-01",
"2020-12-01","2021-01-01","2021-02-01","2021-03-01","2021-04-01",
"2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01",
"2021-10-01","2021-11-01","2021-12-01","2022-01-01","2022-02-01",
"2022-03-01","2022-04-01","2022-05-01","2022-06-01","2022-07-01",
"2022-08-01"),
X1 = c(0,0,0,0,0,0,0,0,0,0,0,0,5,85,72,111,96,50,95,48,87,75,249,173,74,86,127,209,92,137,49,84,75,73,376,196,91,107,124,177,244,275,100,176),
X2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,29,243,281,262,283,0,264,104,289,41,76),
X3 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,157,171,377,409,375,314,253,322,130,472,115,179)),
class = c("tbl_df","tbl","data.frame"),
row.names = c(NA,-44L))
colnames(pd)[1]="ds"
colnames(pd)
output = list()
for(i in c("X1","X2","X3")) {
Yi<-ts(data = pd[,i],
frequency = 12,
start = c(2019,1),
end = c(2022,8))
Model<-HoltWinters(x=Yi,seasonal = 'additive')
Predictions = forecast(Model,h=6)
print(Predictions$mean)
output[[i]] = as.data.frame(Predictions$mean)
}
do.call(cbind,output)
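The loop above can also be written without preallocating anything. The sketch below is one possible variant, not the answerers' code: it uses the built-in series co2 and AirPassengers as stand-ins for the columns of pd, and predict() from the stats package instead of forecast(), which should give the same point forecasts without needing the forecast package.
```r
# Built-in monthly series stand in for X1, X2, X3 (an assumption for illustration)
series <- list(co2 = co2, air = AirPassengers)
h <- 6  # forecast horizon in months

# One additive Holt-Winters fit per series; each call returns h point forecasts
forecasts <- sapply(series, function(y) {
  model <- HoltWinters(y, seasonal = "additive")
  as.numeric(predict(model, n.ahead = h))
})

# sapply() returns an h x (number of series) matrix with named columns
Output <- as.data.frame(forecasts)
Output
```
Each series ends up in its own named column, so nothing is overwritten between iterations.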

StatsPlots Boxplot decrease width of boxes

I have a boxplot in Julia, that I created using StatsPlots boxplot:
boxes = -0.002:0.0001:0.0012
boxed = [[sum([1 for tuple ∈ data if tuple.y > box-0.000125 && tuple.y ≤ box+0.000125]) for box ∈ boxes] for data in datas]
boxplot(repeat([box for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)
The current result (image not shown) is obviously hideous. I need to reduce the width of the boxes to roughly 1/20000 of what it currently is. I can achieve this by scaling the x axis accordingly:
boxplot(repeat([box*20000 for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)
but then the x-axis has wrong values.
The help of the boxplot command sadly doesn't specify such an option:
help?> boxplot
search: boxplot boxplot! groupedboxplot groupedboxplot!
boxplot(x, y)
boxplot!(x, y)
Make a box and whisker plot.
Keyword arguments
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• notch: Bool. Notch the box plot? (false)
• range: Real. Values more than range*IQR below the first quartile or above the third quartile are shown as outliers (1.5)
• outliers: Bool. Show outliers? (true)
• whisker_width: Real or Symbol. Length of whiskers (:match)
Example
≡≡≡≡≡≡≡≡≡
julia> using StatsPlots
julia> boxplot(repeat([1,2,3],outer=100),randn(300))
I've already tried plausible options like boxwex, width, and box_width, none of which helped. The documentation is sadly of no help either.
How can I change the width of the boxes without changing the scale of the x axis?
If you're interested, the plot can be replicated as follows, using the content of the boxed array:
using StatsPlots
boxes = -0.002:0.0001:0.0012
boxed = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 7, 26, 80, 170, 322, 486, 688, 817, 888, 849, 783, 732, 624, 500, 349, 232, 130, 49], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 28, 83, 181, 318, 491, 670, 761, 849, 843, 862, 799, 646, 481, 361, 225, 98, 50], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 28, 80, 179, 322, 493, 660, 753, 803, 832, 823, 783, 657, 541, 367, 223, 121, 62], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 23, 84, 171, 312, 463, 640, 778, 834, 820, 763, 752, 655, 518, 374, 244, 133, 52], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 21, 70, 169, 342, 527, 725, 808, 861, 857, 799, 688, 622, 523, 369, 232, 115, 41], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 9, 28, 76, 150, 301, 492, 660, 760, 823, 862, 790, 749, 646, 525, 352, 223, 116, 54], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 21, 64, 165, 290, 434, 585, 771, 852, 847, 785, 739, 630, 535, 354, 230, 114, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 19, 76, 190, 337, 506, 680, 775, 851, 853, 816, 705, 588, 496, 388, 232, 127, 54]]
boxplot(repeat([box for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)

You can change the xticks:
boxed = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 7, 26, 80, 170, 322, 486, 688, 817, 888, 849, 783, 732, 624, 500, 349, 232, 130, 49], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 28, 83, 181, 318, 491, 670, 761, 849, 843, 862, 799, 646, 481, 361, 225, 98, 50], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 28, 80, 179, 322, 493, 660, 753, 803, 832, 823, 783, 657, 541, 367, 223, 121, 62], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 23, 84, 171, 312, 463, 640, 778, 834, 820, 763, 752, 655, 518, 374, 244, 133, 52], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 21, 70, 169, 342, 527, 725, 808, 861, 857, 799, 688, 622, 523, 369, 232, 115, 41], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 9, 28, 76, 150, 301, 492, 660, 760, 823, 862, 790, 749, 646, 525, 352, 223, 116, 54], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 21, 64, 165, 290, 434, 585, 771, 852, 847, 785, 739, 630, 535, 354, 230, 114, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 19, 76, 190, 337, 506, 680, 775, 851, 853, 816, 705, 588, 496, 388, 232, 127, 54]]
boxes = -0.002:0.0001:0.0012
xx = repeat(boxes, outer = length(boxed))
yy = collect(Iterators.flatten(boxed))
using StatsPlots
xtick = collect(-0.002:0.0005:0.0012)
boxplot(xx * 20000, yy, xticks = (xtick * 20000, xtick))
Update: you can also set bar_width directly, which avoids rescaling the x axis:
boxplot(xx, yy, bar_width = 0.0001)

How to plot a data.table with columns with lists

Currently I have a data.table of this form:
USER     active  reason  days  # of elements by hour
4q7C0o   1       NA      28    c(0, 0, 0, 0, 0, 0, 5, 98, 167, 211, 246)
2BrKY63  1       NA      28    c(0, 0, 0, 0, 0, 0, 0, 5, 15, 24, 89, 187)
3drUy6I  1       NA      28    c(0, 0, 0, 0, 0, 0, 0, 0, 1, 112, 265, 309)
G5ALtO   1       NA      28    c(0, 0, 0, 0, 0, 0, 0, 2, 20, 153, 170)
In the column "# of elements by hour", each list is 24 elements long (I omitted the rest for clarity).
However, I don't know how to do the following two things:
1) plot all the "# of elements by hour" lists in a single plot, labelled by "USER" or "active" (something that looks like a time series)
2) apply a function to the column "# of elements by hour"
I tried the following, but it produces nothing:
plotserieslines <- function(yvar){
ggplot(tickets_by_hour_2019031, aes_(x=c(0:23) ,y=yvar)) +
geom_line()
}
lapply(names(tickets_by_hour_2019031[,#elements by hour,]), plotserieslines)
where tickets_by_hour_2019031 is my data.table.
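One possible approach, sketched in base R on made-up data, since the real table is not shown in full. The names users, by_hour, hourly and total are hypothetical. The idea is to stack the list column into a matrix with one column per user, which matplot() can draw directly; the same list column can be passed to sapply() to apply a function per user.
```r
# Hypothetical data: one row per user, 24 hourly counts stored in a list column
users <- data.frame(USER = c("u1", "u2", "u3"), active = c(1, 1, 0))
set.seed(42)
users$by_hour <- lapply(1:3, function(i) cumsum(rpois(24, lambda = i)))

# 1) Stack the list column into a 24 x n matrix, one column per user, and plot
hourly <- do.call(cbind, users$by_hour)
colnames(hourly) <- users$USER
matplot(0:23, hourly, type = "l", lty = 1, xlab = "hour", ylab = "elements")
legend("topleft", legend = users$USER, col = seq_len(ncol(hourly)), lty = 1)

# 2) Apply a function to every element of the list column, e.g. the maximum
users$total <- sapply(users$by_hour, max)
```
With data.table and ggplot2, the equivalent step would be to reshape the list column to long format (one row per user and hour) before plotting.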

HistogramTools package usage in R

I am generating two histograms with the script below. Their x axes are not equal, so I cannot compare them directly. What can I do to make them comparable? Any ideas on how to approach this issue?
Thanks
x<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 3,
2, 6, 5, 13, 12, 15, 27, 34, 37, 58, 85, 90, 111, 131, 161, 164,
191, 211, 267, 293, 288, 320, 364, 370, 379, 413, 429, 473, 546,
539, 551, 593, 614, 594, 644, 617, 599, 605, 531, 591, 524, 482,
470, 437, 446, 428, 384, 368, 331, 332, 320, 317, 295, 266, 286,
284, 342, 360, 394, 480, 502, 600, 547, 610, 524, 545, 497, 414,
381, 345, 351, 371, 326, 336, 341, 336, 324, 346, 360, 386, 368,
396, 428, 432, 434, 438, 513, 498, 452, 452, 403, 397, 407, 405,
460, 515, 541, 608, 522, 542, 514, 517, 551, 661, 669, 739, 805,
847, 921, 1031, 965, 973, 1030, 1043, 815, 818, 648, 520, 433,
338, 295, 162, 106, 70, 44, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 1, 6, 4, 11, 4, 9, 15, 12, 34, 40, 49, 75, 65, 107, 132, 136,
157, 178, 189, 217, 278, 276, 296, 323, 435, 464, 473, 581, 613,
705, 820, 925, 1025, 1061, 1080, 1176, 1236, 1166, 1075, 1027,
976, 935, 807, 697, 658, 593, 440, 408, 347, 312, 296, 242, 284,
260, 243, 254, 283, 291, 371, 444, 470, 607, 719, 676, 722, 644,
678, 650, 662, 666, 607, 621, 558, 623, 634, 634, 699, 756, 771,
790, 852, 893, 1011, 1048, 1010, 966, 936, 860, 791, 681, 686,
752, 850, 952, 1049, 1094, 1134, 1156, 1198, 1351, 1342, 1533,
1461, 1271, 1065, 865, 739, 534, 459, 359, 275, 169, 124, 108,
80, 74, 64, 69, 61, 59, 56, 60, 76, 113, 102, 132, 101, 79, 92,
55, 41, 26, 17, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0)
h1 <- hist(x)
h2 <- hist(y)
require(HistogramTools)
minkowski.dist(h1, h2, 1)
minkowski.dist(h1, h2, 2)
minkowski.dist(h1, h2, 3)
intersect.dist(h1, h2)
kl.divergence(h1, h2)
jeffrey.divergence(h1, h2)

Both vectors (x and y) have 256 observations. Are these observations paired, i.e., is there a corresponding y for each x, and do they have the same unit of measurement?
If yes, you can subtract one vector from the other and plot the histogram of their differences to compare them. Something like this:
length(x) #check number of observations in x
length(y) #check number of observations in y
diff = x-y #difference between x and y
hist(diff, xlab="x-y", main="Difference of vectors x and y")
If x and y cannot be paired but have the same unit of measurement, there is the option provided by Hav0k.
Give both histograms the same axis limits and breaks so that they can be compared visually.
par(mfrow=c(1,2)) #places the histograms in one row and two columns
hist(x, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
hist(y, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
It is also possible to calculate the dissimilarity metrics with these:
h1 = hist(x, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
h2 = hist(y, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
minkowski.dist(h1, h2, 1) #116
minkowski.dist(h1, h2, 2) #38.88
minkowski.dist(h1, h2, 3) #29.81
intersect.dist(h1, h2) #0.22
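Why shared breaks matter can be sketched in base R without HistogramTools. The samples a and b are made up, and the distance is computed by hand as the sum of absolute differences between bin counts, which is only meaningful because both histograms use identical bins.
```r
set.seed(1)
a <- rnorm(200, mean = 50, sd = 10)  # hypothetical sample 1
b <- rnorm(200, mean = 60, sd = 15)  # hypothetical sample 2

# One set of breaks that covers the range of both samples
breaks <- pretty(range(c(a, b)), n = 15)
ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)

# Same bins, so the count vectors line up element by element
identical(ha$breaks, hb$breaks)

# Manhattan (L1) distance between the bin counts, computed by hand
sum(abs(ha$counts - hb$counts))
```
With the default hist(x) and hist(y), each histogram chooses its own breaks, so the bins need not correspond and bin-wise comparisons are not well defined.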
If x and y have different units of measurements there is the option of standardizing the data before computing the dissimilarities.
x_standardized = (x-mean(x))/(sd(x))
y_standardized = (y-mean(y))/(sd(y))
h1=hist(x_standardized)
h2=hist(y_standardized)
minkowski.dist(h1, h2, 1) #58
minkowski.dist(h1, h2, 2)#26.57
minkowski.dist(h1, h2, 3) #22.1
intersect.dist(h1, h2) # 0.11
kl.divergence(h1, h2) # 0.07
jeffrey.divergence(h1, h2) #0.03

image function in R

I'm trying to do something a little bit complicated for a beginner in programming.
I have a matrix 16x16 and I want to plot the values as a heatmap using image() in R.
How can I plot the "0" (zeros) in blue when the sum (row index + column index) is <= 15? Is that possible?
example matrix:
x <- c(3045, 893, 692, 830, 617, 155, 246, 657, 105, 60, 18, 7, 7, 4, 2, 11234,
2985, 2242, 2471, 1575, 366, 503, 1283, 170, 79, 32, 6, 4, 1, 3, 19475, 4756,
3233, 3251, 1810, 409, 575, 1210, 139, 41, 11, 4, 2, 0, 0, 20830, 4739, 2990,
2531, 1346, 298, 325, 612, 60, 17, 1, 0, 1, 0, 0, 15304, 3196, 1885, 1440, 610,
117, 115, 185, 14, 2, 0, 0, 0, 0, 0, 8026, 1535, 806, 539, 223, 33, 37, 39, 0,
0, 0, 0, 0, 0, 0, 3300, 562, 286, 141, 45, 14, 5, 12, 0, 0, 0, 0, 0, 0, 0, 1067,
160, 65, 40, 14, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 277, 47, 6, 2, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 72, 6, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
xmat <- matrix(x, ncol = 12)
xmat <- cbind(xmat, rep(0,16), rep(0,16), rep(0,16), rep(0,16))
xmat <- rbind(xmat, rep(0,16))
dimnames(xmat) = list(0:15, 0:15)
xmat
Thanks!
Vitor

Plot the cases meeting the criteria as blue.
xmat.new <- xmat
xmat.new[!((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,col="blue")
Plot the cases not meeting the criteria as normal. Notice the add=TRUE
xmat.new <- xmat
xmat.new[((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,add=TRUE)
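The masking logic can be checked on a small made-up matrix before applying it to the 16x16 one. Note that row() and col() are 1-based, while the question labels the matrix 0:15. If the condition is meant in terms of those labels, the threshold would need to be shifted by 2 (an assumption about the intent, which the question leaves open).
```r
# Hypothetical 3x3 matrix, filled column by column
m <- matrix(c(5, 0, 3, 0, 0, 2, 1, 0, 0), nrow = 3)

# Zeros in the upper-left triangle (1-based row + column index <= 4)
mask <- (row(m) + col(m) <= 4) & m == 0

# Split the matrix into two layers; NA cells are left undrawn by image()
zeros <- ifelse(mask, 0, NA)
rest  <- ifelse(mask, NA, m)

image(rest)                             # everything else, default palette
image(zeros, col = "blue", add = TRUE)  # selected zeros on top, in blue
```
Keep in mind that image() draws the rows of the matrix along the x axis, so the picture is rotated relative to the printed matrix.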
Edited to include @Marek's suggestion to simplify the statements.
