Holt-Winters fitted values do not start from the first date in R

I am trying to overlay the Holt-Winters fitted values (red) on the original values (black) in R, but the red line does not begin at the start of the series.
Perhaps this is related to the size of a moving window. If so, how do I change it in the HoltWinters() object in R?
Tried:
library(magrittr)  # provides %>%
dfts <- dfts$Monthly_Sales %>% ts(start = c(2019, 1), end = c(2021, 12), frequency = 12)
hw1 <- HoltWinters(dfts)
hw1.pred <- predict(hw1, n.ahead = 24, prediction.interval = TRUE, level = 0.95)
plot(hw1, predicted.values = hw1.pred, ylab = "Sales Amt")
Expected: the red line begins at, or near, the start of the series.
Got:
Thanks.
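The late start comes from how HoltWinters() initializes a seasonal model, not from a moving window: it decomposes the first start.periods seasons (default 2) to get starting values for the level, trend and seasonal terms, and its one-step-ahead fitted values begin at observation frequency + 1, i.e. month 13 for monthly data. The sketch below checks this on a simulated 36-month series shaped like the question's; the values are made up for illustration, since Monthly_Sales is not available.

```r
# Simulated stand-in for the question's Monthly_Sales: 36 months from January 2019
# with a yearly cycle plus noise (made-up values, for illustration only)
set.seed(1)
sales <- ts(100 + 10 * sin(2 * pi * (1:36) / 12) + rnorm(36),
            start = c(2019, 1), frequency = 12)

# Seasonal model: no fitted values are produced for the first season,
# so the one-step-ahead fitted values start at observation frequency + 1
hw_seasonal <- HoltWinters(sales)
start(fitted(hw_seasonal))   # 2020 1
nrow(fitted(hw_seasonal))    # 24, i.e. 36 - 12

# Non-seasonal Holt's method (gamma = FALSE) initializes from the first two
# observations, so its fitted values start at observation 3
hw_trend <- HoltWinters(sales, gamma = FALSE)
start(fitted(hw_trend))      # 2019 3

# Overlay as in the question: original series in black, fitted values in red
plot(sales, ylab = "Sales Amt")
lines(fitted(hw_seasonal)[, "xhat"], col = "red")
```

For the seasonal model, start.periods must be at least 2 and only controls how much data the initial decomposition uses; it does not move the first fitted value earlier. If a fitted value is needed for every month, one option is a state-space fit such as forecast::ets(), which estimates the initial states as parameters instead of reserving the first season for them.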


How to plot very high odds ratios in R using forestplot? (Issue with max limit on clip)

I have been trying to display the odds of receiving a psychotropic medication for a list of psychiatric diagnoses but have not been able to show the entire range (on a log scale) due to the limitations of the x axis.
Looking at the forestplot documentation, it appears that the clip argument is what specifies the x limits. However, I have noticed that whenever I set it to anything greater than 54, the upper limit is not shown on the axis at all and the axis stops at 4. This is an issue for me because I need to plot values as high as 221 (the upper confidence limit for my highest odds ratio).
I am using the following code:
library(forestplot)
# Odds ratios with 95% confidence limits per diagnosis
base_data <- tibble::tibble(mean = c(19.92 , 41.46, 11.67, 11.69, 25.44, 105.89, 145.45),
lower = c(17.09, 34.70, 9.04, 10.92, 19.78, 67.40, 95.64),
upper = c(23.22, 49.54, 15.07, 12.51, 32.73, 166.37, 221.22),
study = c("Autism", "Conduct Problems", "Tic Disorder", "ADHD",
"OCD", "Schizophrenia", "Manic Bipolar"),
OR = c("19.92" , "41.46", "11.67", "11.69", "25.44", "105.89", "145.45"))
base_data |>
forestplot(labeltext = c(study, OR),
clip = c(0.1, 54),
xlog = TRUE) |>
fp_set_style(box = "royalblue",
line = "darkblue",
summary = "royalblue") |>
fp_add_header(study = c("", "Study"),
OR = c("", "OR")) |>
fp_append_row(mean = 60.22,
lower = 41,
upper = 83,
study = "Summary",
OR = "60.22",
is.summary = TRUE) |>
fp_set_zebra_style("#EFEFEF")
Which creates this graph:
If I set clip to 220, I am able to plot this, but the x-axis still stops at 4, as shown below:
Does anyone know how to get past this issue and set the x-axis ticks to a very high number (e.g. 100+) while still using a log scale?
Keeping a log scale would put 1, 10 and 100 at equal distances and show the entire range of values (up to the final value of 221), while still making the differences between values at the lower end visible.
Any help is extremely appreciated. Thank you so much!
According to the docs:
xlog: The xlog outputs the axis in log() format but the input data should be in antilog/exp format
So you could transform your data using exp. To add labels, you can use xticks. Here is some reproducible code:
library(forestplot)
base_data$mean <- exp(base_data$mean)
base_data$lower <- exp(base_data$lower)
base_data$upper <- exp(base_data$upper)
base_data |>
forestplot(labeltext = c(study, OR),
xlog = TRUE,
xticks = c(0, 50, 100, 150, 200, 250))|>
fp_set_style(box = "royalblue",
line = "darkblue",
summary = "royalblue") |>
fp_add_header(study = c("", "Study"),
OR = c("", "OR")) |>
fp_append_row(mean = 60.22,
lower = 41,
upper = 83,
study = "Summary",
OR = "60.22",
is.summary = TRUE) |>
fp_set_zebra_style("#EFEFEF")
Created on 2023-01-24 with reprex v2.0.2
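An alternative that keeps the odds ratios as they are: odds ratios are already ratios, i.e. the "antilog/exp format" the documentation asks for, and a log axis cannot hold a tick at 0. On a log axis, 1, 10 and 100 sit at equal distances, so ticks are commonly chosen as 1-2-5 multiples of powers of ten. The sketch below is a hypothetical helper (log_ticks() is not part of forestplot) that picks such ticks to cover the question's confidence limits.

```r
# Hypothetical helper: ticks at 1-2-5 multiples of powers of ten, which are
# evenly spread on a log axis. Returns the ticks spanning [lo, hi], extended
# by one tick at each end so every value falls inside the axis.
log_ticks <- function(lo, hi, steps = c(1, 2, 5)) {
  stopifnot(lo > 0, hi > lo)               # a log axis needs strictly positive values
  decades <- floor(log10(lo)):ceiling(log10(hi))
  ticks <- sort(as.vector(outer(steps, 10^decades)))
  first <- max(which(ticks <= lo))         # last tick at or below the smallest value
  last  <- min(which(ticks >= hi))         # first tick at or above the largest value
  ticks[first:last]
}

# Confidence limits from the question's base_data
lower <- c(17.09, 34.70, 9.04, 10.92, 19.78, 67.40, 95.64)
upper <- c(23.22, 49.54, 15.07, 12.51, 32.73, 166.37, 221.22)

ticks <- log_ticks(min(lower), max(upper))
ticks   # 5 10 20 50 100 200 500
```

Using it would mean leaving base_data untransformed and passing xticks = ticks together with xlog = TRUE; that forestplot reads xticks on the ratio scale when xlog = TRUE is an assumption worth checking against its documentation.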

A calendar issue in wavelet analysis

I want to apply a simple wavelet analysis using the "WaveletComp" package, with the years shown on the x-axis. But it always reports the error "Please check your calendar dates, format and time zone: dates may not be in an unambiguous format or chronological. The default numerical axis was used instead." I tried to fix the dates, and they seem fine, so I really don't know what is wrong. Thank you in advance.
Here is the code.
library('WaveletComp')
firecount <- data.frame( YEAR = c("1986-01-01","1987-01-01","1988-01-01","1989-01-01","1990-01-01"
,"1991-01-01","1992-01-01","1993-01-01","1994-01-01","1995-01-01"
,"1996-01-01","1997-01-01","1998-01-01","1999-01-01","2000-01-01"
,"2001-01-01","2002-01-01","2003-01-01","2004-01-01","2005-01-01"
,"2006-01-01","2007-01-01","2008-01-01","2009-01-01","2010-01-01"
,"2011-01-01","2012-01-01","2013-01-01","2014-01-01","2015-01-01"
,"2016-01-01","2017-01-01","2018-01-01","2019-01-01","2020-01-01"
),
COUNT = c(3,5,4,0,0,0,13,0,2,3,0,1,0,3,15,13,
59,18,42,16,20,46,44,8,68,18,7,3,9
,48,7,48,23,84,54)
)
firecount$YEAR <- as.Date(as.character(firecount$YEAR),"%Y")
my.w <- analyze.wavelet(firecount, my.series = "COUNT",
loess.span = 0.5,
dt = 1, dj = 1/35,
lowerPeriod = 2, upperPeriod = 12,
make.pval = TRUE, n.sim = 10)
wt.image(my.w, color.key = "interval", n.levels = 15,
legend.params = list(lab = "fire occurrence wavelet", label.digits = 2),
periodlab = "periods (years)",
# plot the square root of power:
exponent = 0.5,
# time axis:
show.date = TRUE,
date.format = "%F",
timelab = "",
spec.time.axis = list(at = c(paste(1986:2020, "-01-01", sep = "")),
labels = c(1986:2020)),
timetcl = -0.5)
The function analyze.wavelet() automatically takes the dates from a data frame column named date. So just rename your column from YEAR to date and you're good to go.
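A minimal sketch of that fix in base R, rebuilding the question's data compactly. It also swaps the question's as.Date(..., "%Y") for a plain as.Date(): with format "%Y", R reads only the year and fills in the current month and day, so the dates would generally not fall on 1 January to match the spec.time.axis positions. The analyze.wavelet() and wt.image() calls themselves are unchanged and not repeated here.

```r
# Same counts as in the question, one value per year from 1986 to 2020
firecount <- data.frame(
  YEAR  = paste0(1986:2020, "-01-01"),
  COUNT = c(3, 5, 4, 0, 0, 0, 13, 0, 2, 3, 0, 1, 0, 3, 15, 13,
            59, 18, 42, 16, 20, 46, 44, 8, 68, 18, 7, 3, 9,
            48, 7, 48, 23, 84, 54)
)

# The strings are ISO 8601 ("%Y-%m-%d"), which as.Date() parses by default
firecount$date <- as.Date(firecount$YEAR)

# Drop the old column so the date information lives only in 'date'
firecount$YEAR <- NULL

str(firecount)
```

The resulting firecount can then be passed to analyze.wavelet() as in the question.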

plot(VAR()) displays two different plots; how do I merge them into one, with two y-axes?

> dput(head(inputData))
structure(list(Date = c("2018:07:00", "2018:06:00", "2018:05:00",
"2018:04:00", "2018:03:00", "2018:02:00"), IIP = c(125.8, 127.5,
129.7, 122.6, 140.3, 127.4), CPI = c(139.8, 138.5, 137.8, 137.1,
136.5, 136.4), `Term Spread` = c(1.580025, 1.89438, 2.020112,
1.899074, 1.470544, 1.776862), RealMoney = c(142713.9916, 140728.6495,
140032.2762, 139845.5215, 139816.4682, 139625.865), NSE50 = c(10991.15682,
10742.97381, 10664.44773, 10472.93333, 10232.61842, 10533.10526
), CallMoneyRate = c(6.161175, 6.10112, 5.912088, 5.902226, 5.949956,
5.925538), STCreditSpread = c(-0.4977, -0.3619, 0.4923, 0.1592,
0.3819, -0.1363)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I want to make my autoregressive plot like this plot:
#------> importing all libraries
library(readr)
install.packages("lubridate")
library("lubridate")
install.packages("forecast")
library('ggplot2')
library('fpp')
library('forecast')
library('tseries')
#--------->reading data
inputData <- read_csv("C:/Users/sanat/Downloads/exercise_1.csv")
#--------->calculating the lag=1 for NSE50
diff_NSE50<-(diff(inputData$NSE50, lag = 1, differences = 1)/lag(inputData$NSE50))
diff_RealM2<-(diff(inputData$RealMoney, lag = 1, differences = 1)/lag(inputData$RealMoney))
plot.ts(diff_NSE50)
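On the growth-rate lines above: diff() returns one value fewer than its input, while lag() keeps the full length, so numerator and denominator do not line up. A minimal sketch (growth_rate() is a hypothetical helper) that divides each change by the previous value, applied to the NSE50 values from the dput() sample. That sample lists the newest month first, so it is sorted oldest-first before differencing.

```r
# Hypothetical helper: period-over-period growth (x[t] - x[t-1]) / x[t-1].
# diff() returns length(x) - 1 values, so divide by all values except the last.
growth_rate <- function(x) {
  diff(x) / head(x, -1)
}

# NSE50 and Date from the question's dput() sample (newest month first)
dates <- c("2018:07:00", "2018:06:00", "2018:05:00",
           "2018:04:00", "2018:03:00", "2018:02:00")
nse50 <- c(10991.15682, 10742.97381, 10664.44773,
           10472.93333, 10232.61842, 10533.10526)

# "YYYY:MM:00" strings sort chronologically as text, so order() puts them oldest-first
ord <- order(dates)
diff_NSE50 <- growth_rate(nse50[ord])
length(diff_NSE50)   # 5: one fewer than the six months
```

To use such a series as a regressor alongside the original columns, one option is to pad it back to full length, e.g. c(NA, growth_rate(x)).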
#--------->
lm_fit = dynlm(IIP ~ CallMoneyRate + STCreditSpread + diff_NSE50 + diff_RealM2, data = inputData)
summary(lm_fit)
#--------->
inputData_ts = ts(inputData, frequency = 12, start = 2012)
#--------->area of my doubt is here
VAR_data <- window(ts.union(ts(inputData$IIP), ts(inputData$CallMoneyRate)))
VAR_est <- VAR(y = VAR_data, p = 12)
plot(VAR_est)
I want my plots to be plotted together in the same plot. How do I separate the VAR() plots into two separate ones?
Current plot:
My dataset:
dataset
Okay, so this still needs some work, but it should set up the right framework for you. I would look into ggplot2 for future work.
A few extra packages are needed, namely library(vars) and library(dynlm).
Starting from,
VAR_est <- VAR(y = VAR_data, p = 12)
Now we extract the values we want from the VAR_est object.
y <- as.numeric(VAR_est$y[,1])
z <- as.numeric(VAR_est$y[,2])
x <- 1:length(y)
## second data set on a very different scale
par(mar = c(5, 4, 4, 4) + 0.3) # Leave space for z axis
plot(x, y, type = "l") # first plot
par(new = TRUE)
plot(x, z, type = "l", axes = FALSE, bty = "n", xlab = "", ylab = "")
axis(side=4, at = pretty(range(z)))
mtext("z", side=4, line=3)
I will leave adding the dotted lines etc. to you.
Hint: Decompose the VAR_est object, for example, VAR_est$datamat, then see which bit of data corresponds to the part of the plot you want.
Used some of this
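A self-contained version of the same two-axis idea, wrapped in a small function (plot_two_axes() is a hypothetical helper, not part of vars) and run on the six rows from the question's dput() output instead of VAR_est$y. It adds colors and a legend so the two lines can be told apart; the sample is reversed because it lists the newest month first.

```r
# IIP and CallMoneyRate from the question's dput() sample; reversed because
# the sample lists the newest month (2018:07) first
iip  <- rev(c(125.8, 127.5, 129.7, 122.6, 140.3, 127.4))
rate <- rev(c(6.161175, 6.10112, 5.912088, 5.902226, 5.949956, 5.925538))

# Hypothetical helper: y on the left axis, z on its own right-hand axis
plot_two_axes <- function(y, z, x = seq_along(y),
                          cols = c("black", "red"), labs = c("y", "z")) {
  par(mar = c(5, 4, 4, 5) + 0.1)          # extra right margin for the second axis
  plot(x, y, type = "l", col = cols[1], xlab = "Time", ylab = labs[1])
  par(new = TRUE)                          # overlay the next plot on the same panel
  plot(x, z, type = "l", col = cols[2], axes = FALSE, bty = "n",
       xlab = "", ylab = "")
  z_ticks <- pretty(range(z))
  axis(side = 4, at = z_ticks)             # right-hand axis uses z's scale
  mtext(labs[2], side = 4, line = 3)
  legend("topleft", legend = labs, col = cols, lty = 1, bty = "n")
  invisible(z_ticks)
}

ticks <- plot_two_axes(iip, rate, labs = c("IIP", "CallMoneyRate"))
```

With the answer's objects, the call would be plot_two_axes(as.numeric(VAR_est$y[, 1]), as.numeric(VAR_est$y[, 2])). Because the second plot() sets its own y-range, anything added afterwards with lines() is drawn on z's scale.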

Creating a Kaplan-Meier plot with survival probabilities at time points

I'm trying to create a plot in R that includes a table of the survival probabilities at specified points in time.
Currently the plot looks like the following:
R code for the plot using the survminer package:
ggsurvplot(fit,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Ideally I would like a table below the "Number at risk by time" to display the survival probabilities for each strata at times 250, 500, 750, and 1000.
I can retrieve the survival probabilities with the following code:
summary(fit, times=0:1000)
I made a function for that a while back. It takes a survfit object and a sequence of times as arguments and returns the survival probabilities.
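ConstruirTabela = function(a, sequencia = seq(250, 1000, by = 250)) {
  # a: survfit object with strata; sequencia: times at which to report survival
  # quebra: index of the last row of each stratum in a$time and a$surv
  # (a$strata holds the number of rows belonging to each stratum)
  quebra = cumsum(a$strata)
  lsurv = list()
  ltime = list()
  previous = 0
  # Split the survival estimates and times into one vector per stratum
  for (i in 1:length(quebra)) {
    periodo = c((previous + 1):quebra[i])
    lsurv[[i]] = a$surv[periodo]
    ltime[[i]] = a$time[periodo]
    previous = quebra[i]
  }
  # matriz: one row per requested time, one column per stratum
  matriz = matrix(ncol = length(ltime), nrow = length(sequencia))
  for (i in 1:length(sequencia)) {
    for (j in 1:length(ltime)) {
      # The Kaplan-Meier curve is a step function: take the estimate at the last
      # time <= the requested time (survival is 1 before the first time)
      indice = which(ltime[[j]] <= sequencia[i])
      matriz[i, j] = if (length(indice) > 0) lsurv[[j]][max(indice)] else 1
    }
  }
  retorno = as.data.frame(matriz)
  # Column names: the part of each stratum label after "=", e.g. "sex=1" -> "1"
  f = strsplit(names(a$strata), "=")
  names(retorno) = sapply(f, "[[", 2)
  rownames(retorno) = as.character(sequencia)
  return(retorno)
}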
It's probably not the best way to achieve this, but check if it works for you.
Try the ggpubr package. The example at the very bottom of this page shows a graph with a text table.
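Another way to build the table is to let summary() evaluate the Kaplan-Meier curves at the requested times directly, as the question already does with times = 0:1000. The sketch below uses the lung data shipped with the survival package as a stand-in for the question's fit (an assumption, since that fit is not shown) and reshapes the result into one column per stratum; extend = TRUE keeps a row for every requested time even where a stratum has no subjects left. The resulting data frame could then be placed under the plot with the ggpubr approach mentioned above.

```r
library(survival)

# Stand-in for the question's 'fit': Kaplan-Meier curves by sex in the lung data
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival at the requested times; extend = TRUE keeps times past a stratum's last follow-up
s <- summary(fit, times = c(250, 500, 750, 1000), extend = TRUE)

# Long format: one row per (time, stratum)
long <- data.frame(time = s$time, strata = s$strata, surv = round(s$surv, 3))

# Wide format: one row per time, one column per stratum
wide <- reshape(long, idvar = "time", timevar = "strata", direction = "wide")
wide
```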

Plot class probability by neuron in self organizing maps

I found a nice tutorial on self-organizing map clustering in R, which explains how to display the input data in the unit space (see below). To set up some rules for the labeling, I would like to compute the probability of each class in each neuron and plot it. Computing the probability is fairly easy: for each unit, take the number of observations of class i and divide it by the total number of observations in that unit. I end up with the data.frame pc. Now I am struggling to map this result; any clue on how to do it?
library(kohonen)
data(yeast)
set.seed(7)
yeast.supersom <- supersom(yeast, somgrid(8, 8, "hexagonal"),whatmap = 3:6)
classes <- levels(yeast$class)
colors <- c("yellow", "green", "blue", "red", "orange")
par(mfrow = c(3, 2))
plot(yeast.supersom, type = "mapping",pch = 1, main = "All", keepMargins = TRUE,bgcol = gray(0.85))
library(plyr)
pc <- data.frame(Var1=c(1:64))
for (i in seq(along = classes)) {
X.class <- lapply(yeast, function(x) subset(x, yeast$class == classes[i]))
X.map <- map(yeast.supersom, X.class)
plot(yeast.supersom, type = "mapping", classif = X.map,
col = colors[i], pch = 1, main = classes[i], add=TRUE)
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
}
Okay, here is a solution:
Once the probabilities are computed, each unit gets a color from a defined gradient (rbPal). The gradient runs between a lower and an upper bound (0% and 100%), and findInterval() finds the interval each percentage falls into, so the shade of each unit follows its probability. This code goes inside the loop over classes:
# compute percentage per unit
v1F <- levels(as.factor(X.map$unit.classif))
v2F <- levels(as.factor(yeast.supersom$unit.classif))
fList<- base::union(v2F,v1F)
pc <- join(pc,as.data.frame(table(factor(X.map$unit.classif,levels=fList))/table(factor(yeast.supersom$unit.classif,levels=fList))*100),by = 'Var1')
colnames(pc)[NCOL(pc)]<-classes[i]
rbPal <- colorRampPalette(c('blue','yellow','red'))
plot(yeast.supersom, type = "mapping",
     bgcol = rbPal(100)[findInterval(pc[[classes[i]]], 0:100, rightmost.closed = TRUE)],
     main = paste("Probability Clusters:", classes[i]))
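The percentage-to-color step can be pulled out and checked on its own. A minimal sketch (pct_to_color() is a hypothetical helper) using the same blue-yellow-red gradient: breaks from 0 to 100 split the percentage scale into 100 bins, one per color, and rightmost.closed = TRUE puts exactly 100% into the last bin. Units with no observations give NaN percentages, which findInterval() returns as NA, so those units map to an NA color.

```r
# 100-color gradient from blue (0%) through yellow to red (100%)
rbPal <- colorRampPalette(c("blue", "yellow", "red"))
pal <- rbPal(100)

# Hypothetical helper: map percentages in [0, 100] to colors; NA/NaN give NA
pct_to_color <- function(p, palette = pal) {
  breaks <- seq(0, 100, length.out = length(palette) + 1)
  palette[findInterval(p, breaks, rightmost.closed = TRUE)]
}

cols <- pct_to_color(c(0, 50, 100, NaN))
cols
```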
