I am trying to write forecasts into a data frame as they are generated in a loop. However, each forecast overwrites the same column of the data frame, and I want each one to go into a separate column. I am not sure what I am doing wrong. Please share your thoughts on this.
library(dplyr)
library(tidyverse)
library(tidyr)
library(tidymodels)
library(forecast)
library(prophet)
library(readxl)
library(writexl)
library(tibble)
pd <- readxl::read_excel("C:/X/X/X/X/Dummy.xlsx")
colnames(pd)[1]="ds"
colnames(pd)
pd1 <- pd %>% select(`X1`,`X2`,`X3`)
pd2 <- pd %>% select(`X1`)
Output = data.frame()
for(i in 2:ncol(pd))
{
Yi<- ts(data = pd[,i],
frequency = 12,
start = c(2019,1),
end = c(2022,8))
#print(Yi)
Model = HoltWinters(x=Yi,
seasonal = 'additive')
Predictions = forecast(Model,h=6)
print(Predictions$mean)
Output = as.data.frame(Predictions$mean)
print(Output)
}
When I print the output, I can see that the forecast is written into the data frame, as shown in the image below, but it is overwritten on each iteration because I can't specify the column reference.
https://imgur.com/gallery/2MTZq5r
I tried the following, but it also failed:
Output[,i] = as.data.frame(Predictions$mean)
The ideal output is shown in the image below:
https://imgur.com/gallery/t98hUJE
But even this output will be fine if the other one is not possible.
https://imgur.com/gallery/pzxXifs
The data frame pd is given below:
structure(list(ds = c("2019-01-01", "2019-02-01", "2019-03-01",
"2019-04-01", "2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01",
"2019-09-01", "2019-10-01", "2019-11-01", "2019-12-01", "2020-01-01",
"2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01",
"2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
"2021-05-01", "2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01",
"2021-10-01", "2021-11-01", "2021-12-01", "2022-01-01", "2022-02-01",
"2022-03-01", "2022-04-01", "2022-05-01", "2022-06-01", "2022-07-01",
"2022-08-01"), X1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5,
85, 72, 111, 96, 50, 95, 48, 87, 75, 249, 173, 74, 86, 127, 209,
92, 137, 49, 84, 75, 73, 376, 196, 91, 107, 124, 177, 244, 275,
100, 176), X2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 29, 243,
281, 262, 283, 0, 264, 104, 289, 41, 76), X3 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 157, 171, 377, 409, 375, 314, 253, 322,
130, 472, 115, 179)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -44L))
The problem is that Output starts as an empty data frame and is then reassigned on every pass through the loop, so only the last forecast survives. If you instead create the data frame up front, with one row per forecast month and one column per time series, and fill it with NA, you can write each forecast into its own column with only a few changes to your code.
Such a data frame can be created with data.frame(matrix(NA,nrow = 6,ncol = 6))
To determine the number of rows and columns dynamically, you can use the code below.
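library(dplyr)
library(forecast)
freq = 6 # forecast horizon in months
pd1 <- pd %>% select(`X1`,`X2`,`X3`)
# Preallocate one row per forecast month and one column per series
Output = data.frame(matrix(NA,nrow = freq,ncol = ncol(pd1)))
colnames(Output) = colnames(pd1)
for(i in 1:ncol(pd1))
{
# pd1[[i]] extracts column i as a plain numeric vector
Yi<- ts(data = pd1[[i]],
frequency = 12,
start = c(2019,1),
end = c(2022,8))
Model = HoltWinters(x=Yi,
seasonal = 'additive')
Predictions = forecast(Model,h=freq)
# Write the point forecasts into column i instead of replacing Output
Output[,i] = as.numeric(Predictions$mean)
}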
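The same preallocate-then-fill pattern can be sketched with base R alone. This is only a minimal sketch on invented data: the `toy` data frame, its column names `A` and `B`, and the fixed smoothing parameters are assumptions made for illustration, and it uses `stats::HoltWinters()` with `predict()` rather than `forecast()` so that it runs without extra packages.

```r
# Toy monthly data: 4 years, two series (values are invented for illustration)
set.seed(1)
months <- 48
toy <- data.frame(
  A = 100 + 10 * sin(2 * pi * (1:months) / 12) + rnorm(months),
  B = 200 + 20 * cos(2 * pi * (1:months) / 12) + rnorm(months)
)

h <- 6  # forecast horizon in months

# Preallocate one row per forecast step and one column per series
out <- data.frame(matrix(NA_real_, nrow = h, ncol = ncol(toy)))
names(out) <- names(toy)

for (nm in names(toy)) {
  y <- ts(toy[[nm]], frequency = 12, start = c(2019, 1))
  # Smoothing parameters are fixed here (an assumption) to keep the sketch
  # deterministic; leave them out to let HoltWinters() estimate them
  fit <- HoltWinters(y, alpha = 0.3, beta = 0.1, gamma = 0.2,
                     seasonal = "additive")
  # predict() returns a one-column ts; flatten it to a plain numeric vector
  out[[nm]] <- as.numeric(predict(fit, n.ahead = h))
}

out
```

Filling by column name rather than by position means the result keeps the series names even if the columns are later reordered.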
If you do what I did below, output is now a list. Is this what you wanted?
library(forecast)
pd<-structure(list(ds=c("2019-01-01","2019-02-01","2019-03-01",
"2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01",
"2019-09-01","2019-10-01","2019-11-01","2019-12-01","2020-01-01",
"2020-02-01","2020-03-01","2020-04-01","2020-05-01","2020-06-01",
"2020-07-01","2020-08-01","2020-09-01","2020-10-01","2020-11-01",
"2020-12-01","2021-01-01","2021-02-01","2021-03-01","2021-04-01",
"2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01",
"2021-10-01","2021-11-01","2021-12-01","2022-01-01","2022-02-01",
"2022-03-01","2022-04-01","2022-05-01","2022-06-01","2022-07-01",
"2022-08-01"),
X1 = c(0,0,0,0,0,0,0,0,0,0,0,0,5,85,72,111,96,50,95,48,87,75,249,173,74,86,127,209,92,137,49,84,75,73,376,196,91,107,124,177,244,275,100,176),
X2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,29,243,281,262,283,0,264,104,289,41,76),
X3 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,157,171,377,409,375,314,253,322,130,472,115,179)),
class = c("tbl_df","tbl","data.frame"),
row.names = c(NA,-44L))
colnames(pd)[1]="ds"
colnames(pd)
output = list()
for(i in c("X1","X2","X3")) {
Yi<-ts(data = pd[,i],
frequency = 12,
start = c(2019,1),
end = c(2022,8))
Model<-HoltWinters(x=Yi,seasonal = 'additive')
Predictions = forecast(Model,h=6)
print(Predictions$mean)
output[[i]] = as.data.frame(Predictions$mean)
}
do.call(cbind,output)
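If the column names of the combined result matter, one option is to keep plain numeric vectors in the list and convert the named list at the end. The sketch below uses made-up numbers; the `fc` list is a hypothetical stand-in for the forecasts collected in the loop, not real output.

```r
# Hypothetical stand-in for the per-series forecasts collected in a loop
fc <- list(
  X1 = c(10, 11, 12),
  X2 = c(20, 21, 22)
)

# Wide layout: one column per series, named after the list elements
wide <- as.data.frame(fc)

# Long layout: one row per (series, value) pair; stack() names the
# resulting columns "values" and "ind"
long <- stack(wide)

wide
long
```

The wide layout puts each series in its own column, while the long layout is usually easier to feed into plotting or grouping code.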
I have a boxplot in Julia, that I created using StatsPlots boxplot:
boxes = -0.002:0.0001:0.0012
boxed = [[sum([1 for tuple ∈ data if tuple.y > box-0.000125 && tuple.y ≤ box+0.000125]) for box ∈ boxes] for data in datas]
boxplot(repeat([box for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)
The current result is obviously hideous. I need to reduce the width of the boxes to roughly 1/20000 of what it currently is. I can achieve this by scaling the x axis accordingly:
boxplot(repeat([box*20000 for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)
but then the x-axis has wrong values.
The help of the boxplot command sadly doesn't specify such an option:
help?> boxplot
search: boxplot boxplot! groupedboxplot groupedboxplot!
boxplot(x, y)
boxplot!(x, y)
Make a box and whisker plot.
Keyword arguments
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
• notch: Bool. Notch the box plot? (false)
• range: Real. Values more than range*IQR below the first quartile or above the third quartile are shown as outliers (1.5)
• outliers: Bool. Show outliers? (true)
• whisker_width: Real or Symbol. Length of whiskers (:match)
Example
≡≡≡≡≡≡≡≡≡
julia> using StatsPlots
julia> boxplot(repeat([1,2,3],outer=100),randn(300))
and I've already tried reasonable options like boxwex, width or box_width, which all didn't help. The documentation sadly also is of no help at all.
How can I change the width of the boxes without changing the scale of the x axis?
If, for some reason, you're interested, here's the content of the boxed array:
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 7, 26, 80, 170, 322, 486, 688, 817, 888, 849, 783, 732, 624, 500, 349, 232, 130, 49], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 28, 83, 181, 318, 491, 670, 761, 849, 843, 862, 799, 646, 481, 361, 225, 98, 50], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 28, 80, 179, 322, 493, 660, 753, 803, 832, 823, 783, 657, 541, 367, 223, 121, 62], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 23, 84, 171, 312, 463, 640, 778, 834, 820, 763, 752, 655, 518, 374, 244, 133, 52], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 21, 70, 169, 342, 527, 725, 808, 861, 857, 799, 688, 622, 523, 369, 232, 115, 41], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 9, 28, 76, 150, 301, 492, 660, 760, 823, 862, 790, 749, 646, 525, 352, 223, 116, 54], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 21, 64, 165, 290, 434, 585, 771, 852, 847, 785, 739, 630, 535, 354, 230, 114, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 19, 76, 190, 337, 506, 680, 775, 851, 853, 816, 705, 588, 496, 388, 232, 127, 54]]
With that, the plot can be replicated as follows:
using StatsPlots
boxes = -0.002:0.0001:0.0012
boxed = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 7, 26, 80, 170, 322, 486, 688, 817, 888, 849, 783, 732, 624, 500, 349, 232, 130, 49], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 28, 83, 181, 318, 491, 670, 761, 849, 843, 862, 799, 646, 481, 361, 225, 98, 50], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 28, 80, 179, 322, 493, 660, 753, 803, 832, 823, 783, 657, 541, 367, 223, 121, 62], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 23, 84, 171, 312, 463, 640, 778, 834, 820, 763, 752, 655, 518, 374, 244, 133, 52], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 21, 70, 169, 342, 527, 725, 808, 861, 857, 799, 688, 622, 523, 369, 232, 115, 41], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 9, 28, 76, 150, 301, 492, 660, 760, 823, 862, 790, 749, 646, 525, 352, 223, 116, 54], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 21, 64, 165, 290, 434, 585, 771, 852, 847, 785, 739, 630, 535, 354, 230, 114, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 19, 76, 190, 337, 506, 680, 775, 851, 853, 816, 705, 588, 496, 388, 232, 127, 54]]
boxplot(repeat([box for box ∈ boxes], outer=size(boxed)[1]), [(boxed...)...]; outliers=false)
You can change the xticks:
boxed = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 7, 26, 80, 170, 322, 486, 688, 817, 888, 849, 783, 732, 624, 500, 349, 232, 130, 49], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 28, 83, 181, 318, 491, 670, 761, 849, 843, 862, 799, 646, 481, 361, 225, 98, 50], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 28, 80, 179, 322, 493, 660, 753, 803, 832, 823, 783, 657, 541, 367, 223, 121, 62], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 23, 84, 171, 312, 463, 640, 778, 834, 820, 763, 752, 655, 518, 374, 244, 133, 52], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 21, 70, 169, 342, 527, 725, 808, 861, 857, 799, 688, 622, 523, 369, 232, 115, 41], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 9, 28, 76, 150, 301, 492, 660, 760, 823, 862, 790, 749, 646, 525, 352, 223, 116, 54], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 6, 21, 64, 165, 290, 434, 585, 771, 852, 847, 785, 739, 630, 535, 354, 230, 114, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 19, 76, 190, 337, 506, 680, 775, 851, 853, 816, 705, 588, 496, 388, 232, 127, 54]]
boxes = -0.002:0.0001:0.0012
xx = repeat(boxes, outer = length(boxed))
yy = collect(Iterators.flatten(boxed))
using StatsPlots
xtick = collect(-0.002:0.0005:0.0012)
boxplot(xx * 20000, yy, xticks = (xtick * 20000, xtick))
Update: you can change bar_width
boxplot(xx, yy, bar_width = 0.0001)
Currently I have a data.table of this form:
USER active reason days # of elements by hour
4q7C0o 1 NA 28 c(0, 0, 0, 0, 0, 0, 5, 98, 167, 211, 246)
2BrKY63 1 NA 28 c(0, 0, 0, 0, 0, 0, 0, 5, 15, 24, 89, 187)
3drUy6I 1 NA 28 c(0, 0, 0, 0, 0, 0, 0, 0, 1, 112, 265, 309)
G5ALtO 1 NA 28 c(0, 0, 0, 0, 0, 0, 0, 2, 20, 153, 170)
In the column "# of elements by hour", each list is 24 elements long (I omitted the rest for clarity).
However, I don't know how to do the following two things:
1) plot all the "# of elements by hour" lists in a single plot and label them by "user" or "active" (something that looks like a time series)
2) apply a function to the column "# of elements by hour"
I tried the following, but it produces nothing:
plotserieslines <- function(yvar){
ggplot(tickets_by_hour_2019031, aes_(x=c(0:23) ,y=yvar)) +
geom_line()
}
lapply(names(tickets_by_hour_2019031[,#elements by hour,]), plotserieslines)
where tickets_by_hour_2019031 is my data.table.
I am generating two histograms with the script below. Their x axes are not the same, so I cannot compare them. How can I make the two histograms comparable so that the script runs properly?
Thanks
x<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 3,
2, 6, 5, 13, 12, 15, 27, 34, 37, 58, 85, 90, 111, 131, 161, 164,
191, 211, 267, 293, 288, 320, 364, 370, 379, 413, 429, 473, 546,
539, 551, 593, 614, 594, 644, 617, 599, 605, 531, 591, 524, 482,
470, 437, 446, 428, 384, 368, 331, 332, 320, 317, 295, 266, 286,
284, 342, 360, 394, 480, 502, 600, 547, 610, 524, 545, 497, 414,
381, 345, 351, 371, 326, 336, 341, 336, 324, 346, 360, 386, 368,
396, 428, 432, 434, 438, 513, 498, 452, 452, 403, 397, 407, 405,
460, 515, 541, 608, 522, 542, 514, 517, 551, 661, 669, 739, 805,
847, 921, 1031, 965, 973, 1030, 1043, 815, 818, 648, 520, 433,
338, 295, 162, 106, 70, 44, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
y<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
1, 1, 6, 4, 11, 4, 9, 15, 12, 34, 40, 49, 75, 65, 107, 132, 136,
157, 178, 189, 217, 278, 276, 296, 323, 435, 464, 473, 581, 613,
705, 820, 925, 1025, 1061, 1080, 1176, 1236, 1166, 1075, 1027,
976, 935, 807, 697, 658, 593, 440, 408, 347, 312, 296, 242, 284,
260, 243, 254, 283, 291, 371, 444, 470, 607, 719, 676, 722, 644,
678, 650, 662, 666, 607, 621, 558, 623, 634, 634, 699, 756, 771,
790, 852, 893, 1011, 1048, 1010, 966, 936, 860, 791, 681, 686,
752, 850, 952, 1049, 1094, 1134, 1156, 1198, 1351, 1342, 1533,
1461, 1271, 1065, 865, 739, 534, 459, 359, 275, 169, 124, 108,
80, 74, 64, 69, 61, 59, 56, 60, 76, 113, 102, 132, 101, 79, 92,
55, 41, 26, 17, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0)
h1 <- hist(x)
h2 <- hist(y)
require(HistogramTools)
minkowski.dist(h1, h2, 1)
minkowski.dist(h1, h2, 2)
minkowski.dist(h1, h2, 3)
intersect.dist(h1, h2)
kl.divergence(h1, h2)
jeffrey.divergence(h1, h2)
Both vectors (x and y) have 256 observations. Are these observations paired, i.e., for each x there is a corresponding y, and they have the same unit of measurement?
If yes, you can subtract one vector from the other and just plot the histogram of their differences to compare each other. Something like below:
length(x) #check number of observations in x
length(y) #check number of observations in y
diff = x-y #difference between x and y
hist(diff, xlab="x-y", main="Difference of vectors x and y")
If x and y cannot be paired but have the same unit of measurement, there is the option provided by Hav0k.
Set the axes of both histograms to the same limits and breaks so that they can be compared visually.
par(mfrow=c(1,2)) #places the histograms side by side in one row and two columns
hist(x, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
hist(y, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
It is also possible to calculate the dissimilarity metrics with these:
h1 = hist(x, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
h2 = hist(y, xlim=c(0,1600), ylim=c(0,200), breaks=seq(0,1600,100),main="")
minkowski.dist(h1, h2, 1) #116
minkowski.dist(h1, h2, 2) #38.88
minkowski.dist(h1, h2, 3) #29.81
intersect.dist(h1, h2) #0.22
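The fixed limits above are specific to this data. As a minimal sketch of the same idea for other inputs, the breaks can be derived from the combined range of both samples. The vectors `a` and `b` and the choice of 20 bins are assumptions made for illustration.

```r
# Two invented samples with different ranges
set.seed(42)
a <- rnorm(200, mean = 50, sd = 10)
b <- rnorm(200, mean = 60, sd = 15)

# One set of breaks that covers both samples (20 equal-width bins)
rng    <- range(c(a, b))
breaks <- seq(floor(rng[1]), ceiling(rng[2]), length.out = 21)

# plot = FALSE returns the histogram objects without drawing them
ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)

# Both histograms now share identical bin edges
identical(ha$breaks, hb$breaks)
```

Because the bin edges match, the counts in `ha` and `hb` line up bin by bin, which is what bin-wise distance measures need.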
If x and y have different units of measurements there is the option of standardizing the data before computing the dissimilarities.
x_standardized = (x-mean(x))/(sd(x))
y_standardized = (y-mean(y))/(sd(y))
h1=hist(x_standardized)
h2=hist(y_standardized)
minkowski.dist(h1, h2, 1) #58
minkowski.dist(h1, h2, 2)#26.57
minkowski.dist(h1, h2, 3) #22.1
intersect.dist(h1, h2) # 0.11
kl.divergence(h1, h2) # 0.07
jeffrey.divergence(h1, h2) #0.03
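Note that `hist()` chooses its default breaks separately for each input, so two standardized samples are not guaranteed to end up with the same bins. A hedged sketch of one way to force common bins is shown below; the vectors `u` and `v` are invented, and `scale()` is used as a shorthand for the manual standardization above.

```r
# Two invented samples on very different scales
set.seed(7)
u <- rexp(300, rate = 1 / 40)
v <- rnorm(300, mean = 5, sd = 2)

# scale() centers to mean 0 and rescales to standard deviation 1
zu <- as.numeric(scale(u))
zv <- as.numeric(scale(v))

# Round breaks covering the combined range of both standardized samples
breaks <- pretty(range(c(zu, zv)), n = 20)

hu <- hist(zu, breaks = breaks, plot = FALSE)
hv <- hist(zv, breaks = breaks, plot = FALSE)

identical(hu$breaks, hv$breaks)
```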
I'm trying to do something a little bit complicated for a beginner in programming.
I have a matrix 16x16 and I want to plot the values as a heatmap using image() in R.
How can I plot the "0" (zeros) in blue when the sum (row index + column index) is <= 15? Is that possible?
example matrix:
x <- c(3045, 893, 692, 830, 617, 155, 246, 657, 105, 60, 18, 7, 7, 4, 2, 11234,
2985, 2242, 2471, 1575, 366, 503, 1283, 170, 79, 32, 6, 4, 1, 3, 19475, 4756,
3233, 3251, 1810, 409, 575, 1210, 139, 41, 11, 4, 2, 0, 0, 20830, 4739, 2990,
2531, 1346, 298, 325, 612, 60, 17, 1, 0, 1, 0, 0, 15304, 3196, 1885, 1440, 610,
117, 115, 185, 14, 2, 0, 0, 0, 0, 0, 8026, 1535, 806, 539, 223, 33, 37, 39, 0,
0, 0, 0, 0, 0, 0, 3300, 562, 286, 141, 45, 14, 5, 12, 0, 0, 0, 0, 0, 0, 0, 1067,
160, 65, 40, 14, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 277, 47, 6, 2, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 72, 6, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 5, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
xmat <- matrix(x, ncol = 12)
xmat <- cbind(xmat, rep(0,16), rep(0,16), rep(0,16), rep(0,16))
xmat <- rbind(xmat, rep(0,16))
dimnames(xmat) = list(0:15, 0:15)
xmat
Thanks!
Vitor
Plot the cases meeting the criteria as blue.
xmat.new <- xmat
xmat.new[!((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,col="blue")
Plot the cases not meeting the criteria as normal. Notice the add=TRUE
xmat.new <- xmat
xmat.new[((row(xmat) + col(xmat) <= 15) & xmat==0)] <- NA
image(xmat.new,add=TRUE)
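The two-layer approach can be checked on a small matrix first. The sketch below uses an invented 3x3 matrix `m` and a hypothetical threshold `limit`. Keep in mind that `row()` and `col()` count from 1, whereas the dimnames in the question start at 0, so the threshold may need adjusting depending on which indexing is intended.

```r
# Small invented matrix to illustrate the mask
m <- matrix(c(5, 0, 0,
              0, 3, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

limit <- 4  # hypothetical threshold for (row index + column index)

# TRUE where the cell is zero and lies within the index limit
mask <- (row(m) + col(m) <= limit) & m == 0

# Layer 1 keeps only the masked zeros; layer 2 keeps everything else
blue_layer <- ifelse(mask, m, NA)
rest_layer <- ifelse(mask, NA, m)

image(blue_layer, col = "blue")  # masked zeros in blue
image(rest_layer, add = TRUE)    # remaining cells in the default palette
```

Cells set to NA are not drawn by `image()`, so each layer only paints its own cells.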
Edited to include @Marek's suggestion to simplify the statements.