Create bivariate probability density to sample from in R

I have some data that has two variables: spend and outcomes and they are given at a weekly frequency.
I would like to model the relationship between the two at a yearly level, but do not have enough years' worth of data to build a model. I do have about three years' worth of weekly data, however, and would like to simulate many more weekly data points (spend and outcomes) from a bivariate probability density between spend and outcomes, which I could then roll up to a yearly frequency.
Is there a package in R that can take two variables and estimate their joint density function, which I could then use to simulate many more data points?
Thanks so much!

The simulate_kde function in the simukde package internally fits a kernel density estimate and draws samples from it.
Alternatively, the MASS package has the kde2d function for obtaining a bivariate kernel density estimate.
You could then sample from that, as described, for instance, in this post.
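For the kde2d route, a minimal sketch: sample grid cells in proportion to the estimated density, then jitter within each cell. This assumes a data frame dat with columns spend and outcome (both placeholder names).

library(MASS)

set.seed(1)
dens <- kde2d(dat$spend, dat$outcome, n = 100)   # density on a 100 x 100 grid

n_sim <- 500
# pick grid cells with probability proportional to the estimated density
idx <- sample(length(dens$z), n_sim, replace = TRUE, prob = as.vector(dens$z))
ij  <- arrayInd(idx, dim(dens$z))
dx  <- diff(dens$x[1:2])
dy  <- diff(dens$y[1:2])
# jitter within each cell so draws are not restricted to the grid points
sim <- data.frame(
  spend   = dens$x[ij[, 1]] + runif(n_sim, -dx/2, dx/2),
  outcome = dens$y[ij[, 2]] + runif(n_sim, -dy/2, dy/2)
)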

Related

Extract sample variance from svykm (survey package by Lumley) for complex survey analysis

In order to compare two survival curves at a fixed point in time, and so perform what is essentially a two-sample test, I need to extract the sample variance of the estimate at a given point in time.
For an object created with the svykm function from Thomas Lumley's survey package in R, this should be accessible in the varlog list. Do the entries in this list constitute the transformed variances on the log scale or the untransformed variances?
I have read the documentation provided for the survey package, but did not come to a firm conclusion. I note that confidence intervals are computed on the log(survival) scale, following the default in the survival package, and that their bounds are given as exp(log(x$surv) + 1.96*sqrt(x$varlog)) and exp(log(x$surv) - 1.96*sqrt(x$varlog)) in the package documentation.
They are variances on the log scale.
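So a two-sample comparison at a fixed time can be done directly on the log scale. A minimal sketch, assuming km1 and km2 were created with svykm(..., se = TRUE) and expose time, surv and varlog components (t0 = 365 is a placeholder time point):

library(survey)

surv_at <- function(km, t0) {
  i <- max(which(km$time <= t0))               # last estimate at or before t0
  c(surv = km$surv[i], varlog = km$varlog[i])
}

s1 <- surv_at(km1, t0 = 365)
s2 <- surv_at(km2, t0 = 365)

# two-sample z-test on the log(survival) scale
z <- (log(s1["surv"]) - log(s2["surv"])) / sqrt(s1["varlog"] + s2["varlog"])
p <- 2 * pnorm(-abs(z))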

Obtain the CDF of a Vine Copula (trivariate) in R

I'm aware that the VineCopula package has an RVinePDF function and a BiCopCDF function; however, there is no RVineCDF function. I need to select and fit R-vine copulas to a subset of three variables with 40 observations in a dataset with 300+ levels of a grouping variable (think of fitting a copula to each municipality in Texas based on precipitation, temperature and solar radiation data).
Is there any alternative using the RVineStructureSelect and RVineCopSelect pipeline?
There are many other vine copula packages, though I am not sure they will suit your question: for example, vinereg (for fitting D-vine regression copulas) and vinecopulib (similar to VineCopula, but it also takes the marginals into account).
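Within VineCopula itself, one workaround is to approximate the CDF by Monte Carlo, simulating from the fitted vine with RVineSim. A sketch, assuming RVM is the RVineMatrix returned by your RVineStructureSelect/RVineCopSelect pipeline:

library(VineCopula)

# P(U1 <= u[1], U2 <= u[2], U3 <= u[3]) estimated from N simulated draws
RVineCDF_mc <- function(u, RVM, N = 1e5) {
  sims <- RVineSim(N, RVM)                     # samples on the copula scale
  mean(apply(sims, 1, function(s) all(s <= u)))
}

RVineCDF_mc(c(0.5, 0.5, 0.5), RVM)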

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding the difference between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far as 10 years ahead. The data are hourly and we have a year's worth of data.
When using forecast to produce the next 1000 step-ahead predictions, I got the following plot.
[Plot: forecasts from the forecast method]
Then I used the simulate function to simulate the next 1000 simulated values and got the following plot.
[Plot: simulated values from the simulate method]
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values:
simulate(arima1, nsim = 1000, future = TRUE, bootstrap = TRUE)
where arima1 is my trained ARIMA model; bootstrapped residuals are used because the model residuals are not very normal.
Per definition in the forecast package, future=TRUE means that we are simulating future values based on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give me much more realistic results, while the forecast values from forecast() just converge to a constant after several steps, with none of the fluctuation seen in the results from simulate()?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
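The contrast is easy to reproduce. A minimal sketch using a built-in series (AirPassengers, purely for illustration):

library(forecast)

fit <- auto.arima(AirPassengers)

fc  <- forecast(fit, h = 24)                   # point forecasts plus intervals
sim <- simulate(fit, nsim = 24, future = TRUE, bootstrap = TRUE)  # one sample path

plot(fc)                                       # the mean settles down quickly
lines(sim, col = "red")                        # a single path keeps fluctuating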

Forecasting panel data and time series

I have a panel data set of, let's say, 1000 units, so i = 1, 2, ..., 1000. The data set runs at a daily frequency for one month, so t = 1, 2, ..., 31.
I want to estimate individual-specific regressions in R:
y_{i,10} = α_i + β_i y_{i,9} + γ_i y_{i,8} + ... + δ_i y_{i,1} + ε_{i,10}
and then produce density forecasts for the next 21 days, that is, density forecasts for y_{i,11}, y_{i,12}, and so on.
My questions are:
Can I do this with the plm package? I know how to estimate the model with plm, but I do not know how to produce the forecasts.
Would it be easier (and correct) to treat each unit as a separate time series, fit an ARIMA(9,0,0) to each one, and then get the density forecasts? If so, how can I get the density forecasts? (A rough sketch of what I have in mind is below.)
For (2), how can I include individual-specific effects that are constant over time?
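Something like the following is the kind of per-unit approach I have in mind for (2), assuming a data frame panel with columns id and y, ordered by t within id (all placeholder names):

library(forecast)

# fit an AR(9) to each unit separately; include.mean gives each unit
# its own constant, i.e. a time-invariant individual effect
density_fc <- lapply(split(panel$y, panel$id), function(y) {
  fit <- Arima(ts(y), order = c(9, 0, 0), include.mean = TRUE)
  forecast(fit, h = 21)     # under normal errors, the mean and interval widths
})                          # define a Gaussian forecast density at each horizon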
Thanks a lot

Constructing ROC Curve in R [duplicate]

Closed 9 years ago as a duplicate of: ROC curve in R using ROCR package (6 answers).
I am doing ensemble forecasts for a quantity, and I have around 20 forecast values at each observation point. I will have an event definition of x%, i.e. say 95% of the highest observed value. I am trying to construct an ROC curve using R:
Is ROCR a good package for probability-based ROC scores?
Can you provide an example of how to construct this ROC curve?
Just assume a fake dataset.
I have been reading all sorts of papers, but I am very confused as to how to calculate the forecast probabilities.
I would encourage you to look at the caret package. It's wonderful for ensemble learning: it will tune your parameters for you based on RMSE, ROC (AUC), etc., by cross-validation. That is, it splits your data into resamples (with replacement), runs tons of models while tuning parameters, and gives you back the best model.
The vignette (listed on the package page) here is excellent, and you'll see examples in there showing how to plot ROC curves.
However, if what you're looking for is a simple method to calculate an ROC score from predictions and held-out data, check out page 11 of this pdf.
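For the forecast-probability part specifically, a common convention is to take the fraction of ensemble members that predict the event. A sketch with ROCR, assuming ens is an n-by-20 matrix of member forecasts and obs the vector of observations (both placeholder names):

library(ROCR)

thr   <- 0.95 * max(obs)                # event: exceeding 95% of the highest
                                        # observed value, per the definition above
event <- as.integer(obs > thr)          # observed binary outcome
probs <- rowMeans(ens > thr)            # forecast probability = share of members
                                        # predicting the event

pred <- prediction(probs, event)
plot(performance(pred, "tpr", "fpr"))   # the ROC curve
performance(pred, "auc")@y.values[[1]]  # area under the curve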
