calculating aggregate confidence intervals for forecasts - r

I'm using ARIMA models to estimate sales forecasts for a company. The company's sales are broken down into 4 channels, and I'm running 4 different models to estimate the sales for each channel. Eventually, I'm going to aggregate the sales of these channels to get the total forecasted sales for the whole company. My question is: how should I go about finding the confidence interval for the overall forecast? Adding up the confidence intervals of each channel is not correct, since that would give me a very large interval.
I'd really appreciate it if anyone could give me some idea of how to approach this sort of issue. Thanks in advance!

The confidence interval is based on the variance. For a 95% interval, it is the forecast plus/minus 1.96*sqrt(variance). So you need to estimate the variance of the sum. This is not equal to the sum of the variances unless the four channels are uncorrelated: in general, Var(X1 + X2 + X3 + X4) = sum of the Var(Xi) plus twice the sum of Cov(Xi, Xj) over all pairs i < j. You will need to estimate the correlations between the four channels and then apply this variance-of-a-sum formula (check any intro stats book).
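A minimal sketch of that calculation in R, assuming the forecast package and four channel series of equal length. The simulated series, the horizon h = 12, and the use of in-sample residual correlations as a stand-in for the forecast-error correlations are all illustrative assumptions, not part of the answer above:

library(forecast)

# Hypothetical data: replace with your four channel series (equal lengths assumed)
set.seed(1)
channels <- replicate(4, ts(cumsum(rnorm(100, 5, 2)), frequency = 12), simplify = FALSE)

fits <- lapply(channels, auto.arima)
fcs  <- lapply(fits, forecast, h = 12)

# Point forecast of the total is the sum of the channel point forecasts
total_mean <- Reduce(`+`, lapply(fcs, function(f) as.numeric(f$mean)))

# Per-channel forecast standard deviations, backed out from the 95% interval half-widths
sds <- lapply(fcs, function(f) as.numeric(f$upper[, "95%"] - f$mean) / 1.96)

# Channel correlations, estimated here from the in-sample residuals
R <- cor(sapply(fits, residuals))

# Var(sum) = sum_i Var_i + 2 * sum_{i<j} Cov_ij, with Cov_ij approximated by rho_ij * sd_i * sd_j
total_var <- numeric(length(total_mean))
for (i in 1:4) for (j in 1:4)
  total_var <- total_var + R[i, j] * sds[[i]] * sds[[j]]

lower <- total_mean - 1.96 * sqrt(total_var)
upper <- total_mean + 1.96 * sqrt(total_var)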

Related

Is there a method in R to find the confidence interval of a proportion by the non-parametric percentile bootstrap method

I have a table which was generated in one of the publications, and I am trying to do a similar analysis for another study. Please help me out with finding the CI by the non-parametric percentile bootstrap method.
Here is the table.
I want to know how the CI for 1 - RR is calculated.
Here is the statistical explanation for it:
"We used the Kaplan-Meier estimator [18] to construct cumulative incidence curves and to estimate the risk for each outcome. The risks were compared via ratios and differences. We estimated the risk ratio for each outcome using only matched pairs in which both individuals were still at risk 7 days after receipt of the third vaccine dose in those vaccinated. We analysed outcomes in the full population and in subgroups defined by strata of age, sex, and number of comorbidities. 95% CIs were calculated using the nonparametric percentile bootstrap method with 1000 repetitions. The effectiveness of the third dose was estimated as 1 – risk ratio. As a sensitivity analysis, vaccine effectiveness was also estimated as 1 – incidence rate ratio derived from a Poisson regression using the same dataset, with no further adjustment. Analyses were done using R software (version 4.0.4)."
How do I calculate this CI?
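A hedged sketch of the percentile bootstrap described in that passage. The data frame pairs_df, its column names, and the use of simple risks in place of Kaplan-Meier risks are illustrative assumptions, not taken from the paper:

set.seed(1)

# Hypothetical matched-pair data: one row per pair, 1 = outcome occurred
pairs_df <- data.frame(event_vacc   = rbinom(5000, 1, 0.01),
                       event_unvacc = rbinom(5000, 1, 0.03))

# Vaccine effectiveness as 1 - risk ratio (simple risks here, not Kaplan-Meier risks)
one_minus_rr <- function(d) 1 - mean(d$event_vacc) / mean(d$event_unvacc)

# Nonparametric percentile bootstrap: resample matched pairs with replacement
boot_est <- replicate(1000, {
  idx <- sample(nrow(pairs_df), replace = TRUE)
  one_minus_rr(pairs_df[idx, ])
})

estimate <- one_minus_rr(pairs_df)
ci <- quantile(boot_est, c(0.025, 0.975))   # 95% percentile bootstrap CI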

How to decide the different time intervals to estimate the hazard ratio when the proportional hazards assumption does not hold

When the proportional hazards assumption is not met, one way to handle it is to split the follow-up period into different intervals and then estimate the HR for each interval. My question is how I can determine the intervals based on the following Schoenfeld residual plot. It seems to me that I cannot create any interval where the line is approximately horizontal. Thank you.
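For the splitting step itself (not for choosing the cut points, which only the residual plot can guide), a hedged sketch with the survival package; the data frame df, its columns, and the cut points at 90 and 180 days are illustrative assumptions:

library(survival)

# Hypothetical data: time, status (1 = event), and a binary treatment
set.seed(1)
df <- data.frame(time      = rexp(300, 0.01),
                 status    = rbinom(300, 1, 0.7),
                 treatment = rbinom(300, 1, 0.5))

# Split follow-up at 90 and 180 days; "tgroup" indexes the resulting interval
df_split <- survSplit(Surv(time, status) ~ ., data = df,
                      cut = c(90, 180), episode = "tgroup")

# One hazard ratio per interval via an interaction with strata(tgroup)
fit <- coxph(Surv(tstart, time, status) ~ treatment:strata(tgroup), data = df_split)
summary(fit)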

Is it possible to specify the correlation between two distributions?

For context, say there were two academic exams, morning and afternoon. I'm only given the summary statistics (mean, median, skew, and kurtosis) for the scores on both exams, so I'm unable to say exactly how many students passed, but I can estimate it by fitting the moments and creating a custom Pearson distribution. I can estimate, for example, how many students passed the first and the second exam, and attach a standard deviation to quantify my error.
What I would like to do is to estimate the number of students who pass the course, defined as having the average score of both morning and afternoon exams being over 60%. If the performance of students on both tests are completely independent, I suppose this would be easy - I just generate scores for both tests in the form of two lists, average them, count the number of items over 60%, and repeat, say 10000 times.
If both tests are completely dependent, I suppose I would have to order both lists, because the student scoring the highest on the morning exam should also score the highest on the afternoon exam. What I'm missing is how to measure the degree of randomness/interdependence in between (maybe it has something to do with entropy?), where students who score highly on exam 1 also tend to score highly on exam 2, and whether there is a package in R that I can use to specify an arbitrary degree of dependence between two variables.
A well-known concept for measuring how one distribution differs from another is the KL divergence:
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution is different from a second, reference probability distribution.
To make the measure symmetric, you can use Jensen-Shannon divergence as well.
For the implementation of KL divergence, you can use this package in R.
A special case of KL divergence is mutual information, which is a better measure of interdependence and is what you're looking for. Instead of measuring the divergence from a reference distribution, mutual information is the KL divergence between the joint distribution and the product of the marginal distributions. Mutual information is also equal to the sum of the marginal entropies minus the joint entropy, so you can estimate the individual and joint entropies first and then combine them to estimate the mutual information.
Here is one implementation of mutual information for R (many other estimators have been introduced):
https://github.com/majianthu/copent
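As a rough illustration of the entropy identity above, here is a sketch using the infotheo package from CRAN rather than copent; the simulated exam scores and the chosen correlation of 0.6 are purely illustrative assumptions:

library(MASS)
library(infotheo)

set.seed(42)
# Simulate two correlated exam scores (correlation 0.6) just for the example
scores <- mvrnorm(1000, mu = c(65, 60),
                  Sigma = matrix(c(100, 60, 60, 100), 2))
morning   <- scores[, 1]
afternoon <- scores[, 2]

# Discretize, then use MI = H(X) + H(Y) - H(X, Y)
d  <- discretize(data.frame(morning, afternoon))
mi <- entropy(d[, 1]) + entropy(d[, 2]) - entropy(d)

# The same quantity computed directly
mutinformation(d[, 1], d[, 2])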

R - calculate confidence interval of grouped data

Suppose you have a dataset named data as follows:
Gender  Pneumonia_Incidence  lower_CI  upper_CI
Male    38000                30000     44000
Female  34000                32000     38000
I would now like to calculate the total pneumonia incidence, which can be done easily:
sum(data$Pneumonia_Incidence)
But how can I calculate lower and upper CI for this estimate? Is it valid to just sum lower_CI and upper_CI values or would I need to do something else?
How can I calculate lower and upper CI for this estimate?
You cannot with the information you have provided. You would need to know the variance of each estimated incidence. These variances must be known, since someone calculated the confidence intervals from them. With the variances you could then obtain a pooled estimate of the total variance and calculate the overall confidence interval.
If this is important to you, I strongly suggest you consult a qualified statistician, or at least a reputable text book. This is not simple high school math. There may be other issues such as sampling weights involved. I certainly wouldn't seek statistical advice here, and probably not even at that other place often mentioned. What if your boss asked you how you calculated the confidence interval? Would you say that you consulted the internet?
Is it valid to just sum lower_CI and upper_CI values ...
No. Variances involve sample sizes. Consider this: imagine two groups, one with a very large sample size and one with a very small one. The group with the large sample size will have a narrower confidence interval than the group with the small sample size. If you just added the two intervals, the overall interval would be equally weighted by both groups, which intuitively isn't correct; it gives a biased, overly wide estimate.
... or would I need to do something else?
Consult a statistician. :)
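That said, here is a minimal sketch of the usual shortcut, under the strong assumptions that each reported interval is a symmetric 95% normal interval and that the two estimates are independent; neither assumption is guaranteed by the question:

# Data from the question
data <- data.frame(Gender              = c("Male", "Female"),
                   Pneumonia_Incidence = c(38000, 34000),
                   lower_CI            = c(30000, 32000),
                   upper_CI            = c(44000, 38000))

# Back out each standard error from the 95% interval width
se <- (data$upper_CI - data$lower_CI) / (2 * 1.96)

total    <- sum(data$Pneumonia_Incidence)
total_se <- sqrt(sum(se^2))   # variances add only if the estimates are independent

c(lower = total - 1.96 * total_se,
  estimate = total,
  upper = total + 1.96 * total_se)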

Meta analysis in R with adjusted ORs

I would like to calculate a summary odds ratio value for two or more papers where the only information I have is the individual odds ratios with their 95% confidence intervals. Is this possible? I have been poking around in the meta package, and only figured out how to do it with crude counts.
Thanks so much!
It is quite simple.
You just need to use the natural logarithm of the odds ratio (logOR) and its standard error (and corresponding variance). These can easily be back-calculated from the 95% confidence intervals using the normal distribution. Finally, pool the logORs with their variances.
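For the back-calculation step, a brief sketch; the ORs and CI limits below are made-up numbers, not from any real paper:

# Hypothetical adjusted ORs and 95% CIs from two papers
or     <- c(1.50, 0.80)
ci_low <- c(1.10, 0.60)
ci_upp <- c(2.05, 1.07)

logOR    <- log(or)
se       <- (log(ci_upp) - log(ci_low)) / (2 * 1.96)   # normal approximation on the log scale
variance <- se^2

mydata <- data.frame(logOR, variance)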
Once you have such a data frame (e.g. called mydata) with the logOR and variance for each study, you can proceed with a random-effects meta-analysis using the metafor package in R as follows:
library(metafor)
res <- rma(logOR, variance, data = mydata, method = "DL")
forest(res)
In the future, you may consider posting similar questions on Cross Validated.
