Selecting elements from a list maximizing a function - r

I am trying to write some R code for my thesis. For context: I divided the 505 component time series of the S&P 500 into 10 clusters with the DTW algorithm. Initially I created 100 portfolios by randomly taking one stock from each cluster, so that the portfolios are diversified. Then I wrote code that assigns a weight to each stock in a portfolio with a Genetic Algorithm, maximizing the Sharpe ratio.
I was wondering whether there is a way to also choose the stocks from each cluster so as to maximize the Sharpe ratio, and thus obtain an optimized solution.

You could use a method from the local-search family, such as Stochastic Local Search or Simulated Annealing. For R examples, see "Asset selection with Local Search" or the more recent "Optimization Heuristics: A Tutorial". (Disclosure: I am the maintainer of the NMOF package, which is used in the examples.)
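To illustrate the idea, here is a minimal sketch of asset selection with Local Search via LSopt() from the NMOF package. The data objects are assumptions, not from the question: R is a matrix of returns (rows = periods, columns = stocks), cluster is a vector giving each stock's cluster (1 to 10), the portfolio is equally weighted, and the risk-free rate is taken as zero.

    library(NMOF)

    ## objective: negative Sharpe ratio of the equally weighted
    ## portfolio of the selected stocks (LSopt minimizes)
    sharpe <- function(x, R) {
        r <- rowMeans(R[, x, drop = FALSE])
        -mean(r) / sd(r)
    }

    ## neighbour: replace the selected stock of one randomly
    ## chosen cluster by another stock from the same cluster
    neighbour <- function(x, R) {
        i <- sample(seq_along(x), 1)
        x[i] <- sample(which(cluster == i), 1)
        x
    }

    ## initial solution: one random stock per cluster
    x0 <- sapply(1:10, function(i) sample(which(cluster == i), 1))

    sol <- LSopt(sharpe,
                 algo = list(x0 = x0, neighbour = neighbour, nS = 5000),
                 R = R)
    sol$xbest  ## column indices of the selected stocks, one per cluster

The stock weights within the selected portfolio could then still be optimized with your existing Genetic Algorithm, or the selection and the weights could be optimized jointly in a single objective function.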

Related

R package for survey raking that does automatic cell collapsing

I know there are various R packages for performing raking (i.e. calibration to external estimates, iterative proportional fitting, etc.) to construct survey weights. I wanted to find a package that would automatically collapse cells if a cell count fell below a certain value. Is there a package out there with such a feature? Or, if not raking exactly, a weighting package for a similar algorithm (e.g. GREG, entropy balancing) that would have such a feature for matching to targets. Thank you.
In my initial research, packages like "Ipfp: Multidimensional Iterative Proportional Fitting" didn't seem to have the feature I wanted.

Is there a function to calculate the scatter matrix in R language?

Recently I have been trying to use an optimizer to perform feature selection for clustering. I need a fitness function to tell the optimizer which feature set is better, so I am using the criteria described in chapter 10.2 of "Introduction to Statistical Pattern Recognition", 2nd ed., by Keinosuke Fukunaga.
I have found a Matlab function (ScatterMatrices()) that calculates the criterion value J, but I could not find any similar function in R. I would appreciate it if you could help me 🙏.
withinSS: Within-class Sum of Squares Matrix
"Calculates within-class sum of squares and cross product matrix (a.k.a. within-class scatter matrix)"
The function is available in the archived DiscriMiner package, in the CRAN archive at /src/contrib/Archive/DiscriMiner. See also: How do I install a package that has been archived from CRAN.
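If you prefer to avoid the archived package, the scatter matrices are also straightforward to compute in base R. Here is a minimal sketch (my own, not the DiscriMiner API) of the within-class and between-class scatter matrices and one common form of Fukunaga's criterion, J = tr(Sw^-1 Sb):

    ## X: numeric matrix (rows = samples, columns = features)
    ## y: factor of class labels
    scatter_J <- function(X, y) {
        y  <- as.factor(y)
        mu <- colMeans(X)
        Sw <- Sb <- matrix(0, ncol(X), ncol(X))
        for (cl in levels(y)) {
            Xc   <- X[y == cl, , drop = FALSE]
            mu_c <- colMeans(Xc)
            Sw   <- Sw + crossprod(sweep(Xc, 2, mu_c))    ## within-class scatter
            Sb   <- Sb + nrow(Xc) * tcrossprod(mu_c - mu) ## between-class scatter
        }
        sum(diag(solve(Sw) %*% Sb))  ## J = tr(Sw^-1 Sb)
    }

    ## example with the iris data
    scatter_J(as.matrix(iris[, 1:4]), iris$Species)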

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2), and I am trying to get quantile regression estimates for my data.
So far I have found the "quantreg" package for performing quantile regression, but I have no idea how to specify ARIMA-ARCH models in the model formula of the function rq.
The rq function seems to work for regressions with dependent and independent variables, but not for time series.
Is there some other package in R that accepts time-series models for quantile regression? Any advice is welcome. Thanks.
I just put an answer on the Data Science forum.
It basically says that most of the ready-made packages use so-called exact tests based on distributional assumptions (independent, identically distributed normal/Gaussian, or broader).
You also have a family of resampling methods, in which you simulate samples with a distribution similar to that of your observed sample, fit your ARIMA(1,1,2)-ARCH(2), and repeat the process a great number of times. Then you analyze this great number of forecasts and measure (as opposed to compute) your confidence intervals.
The resampling methods differ in the way the simulated samples are generated. The most used are:
The jackknife: you "forget" one point at a time; that is, you simulate n samples of size n-1 (if n is the size of the observed sample).
The bootstrap: you simulate a sample by drawing n values from the original sample with replacement: some will be taken once, some twice or more, some never, ...
It is a (not easy) theorem that the expectation of the confidence intervals, like most of the usual statistical estimators, is the same on the simulated samples as on the original sample, with the difference that you can measure them over a great number of simulations.
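As a rough sketch of the bootstrap route (my own illustration, simplified to the ARIMA part only, using the simulate() and forecast() methods of the forecast package; y stands in for the observed series):

    library(forecast)

    fit <- Arima(y, order = c(1, 1, 2))

    ## resample residuals to simulate B series "like" the data,
    ## refit the model, and collect one-step-ahead forecasts
    B  <- 500
    fc <- replicate(B, {
        y_sim <- simulate(fit, future = FALSE, bootstrap = TRUE)
        refit <- Arima(y_sim, order = c(1, 1, 2))
        forecast(refit, h = 1)$mean
    })

    ## measured (not computed) 90% forecast interval
    quantile(fc, c(0.05, 0.95))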
Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named "What topics can I ask about here?" and "What types of questions should I avoid asking?". And more importantly, please read the Stack Overflow question checklist. You might also want to learn about Minimal, Complete, and Verifiable Examples.
I can try to address your question, although this is hard since you don't provide any code or data. Also, I guess that by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) filter plus an ARCH(2) filter.
For an overview of R's time-series capabilities you can refer to the CRAN Time Series task view.
You can easily apply these filters in R with the appropriate functions.
For instance, you could use the Arima() function from the forecast package and then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other combinations are of course possible. Finally, you can apply quantile regression to the filtered series. For instance, check out the dynrq() function from the quantreg package, which allows time-series objects in the data argument.
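Putting these steps together, here is a minimal sketch with simulated data (the toy series, the lag structure, and the quantile levels are placeholders, not recommendations):

    library(forecast)   ## Arima()
    library(tseries)    ## garch()
    library(quantreg)   ## dynrq()

    set.seed(42)
    y <- cumsum(rnorm(500))  ## toy integrated series

    ## step 1: ARIMA(1,1,2) filter
    fit_arima <- Arima(y, order = c(1, 1, 2))
    res <- residuals(fit_arima)

    ## step 2: ARCH(2) filter on the ARIMA residuals
    ## (in tseries::garch(), order = c(0, 2) gives a pure ARCH(2))
    fit_arch <- garch(res, order = c(0, 2), trace = FALSE)
    z <- na.omit(residuals(fit_arch))  ## standardized residuals

    ## step 3: quantile regression on the filtered series,
    ## here a simple autoregression at the 5% and 95% quantiles
    fit_rq <- dynrq(z ~ lag(z, -1), tau = c(0.05, 0.95))
    summary(fit_rq)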

Compute within sum of squares from PAM cluster analysis in R

I am working on a cluster analysis with PAM in R. I computed the Gower distance for my data with vegdist(), and computing a cluster variable with pam() works well. Now I need a measure to determine the right k. The method I know is to visually compare the within sum of squares (WSS) for different values of k. How can I fetch the WSS from a series of PAM runs to compare the sums in a plot, analogous to this example for kmeans? http://rstudio-pubs-static.s3.amazonaws.com/137758_a80b40255fdd440ab76b41a646a6c482.html#loops
PAM does not optimize WSS. WSS is the k-means objective.
Instead, use the PAM objective (sometimes called TD in the literature).
See ?pam.object for the objective field:
objective
the objective function after the first and second step of the pam algorithm.
Beware that, similar to WSS, the objective is expected to decrease with increasing k. Thus you can't just choose the minimum; you should look for a knee in the plot.
Because PAM is randomized, you may want to run each k multiple times, and keep the best result only.
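As a minimal sketch (assuming d is the Gower dissimilarity you computed with vegdist(), and a single PAM run per k; following the advice above you could repeat each k and keep the smallest objective):

    library(cluster)

    ks  <- 2:10
    obj <- sapply(ks, function(k) pam(d, k, diss = TRUE)$objective["swap"])

    ## look for a knee, not the minimum
    plot(ks, obj, type = "b",
         xlab = "number of clusters k",
         ylab = "PAM objective (after swap phase)")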

What R packages are available for binary data that is both correlated and clustered?

I'm working on a project now that's rather unlike anything I've done before. I have two tests with binary results that will be administered to the same sample, which is drawn from a clustered population (i.e., some subjects will be from the same family). I'd like to compare proportions of positive test results, but the clustering makes McNemar's test inappropriate so I've been reading up on alternative approaches. The two main routes seem to be 1) the clustering-adjusted McNemar alternatives by Rao and Scott (1992), Eliasziw and Donner (1991), and Obuchowski (1998), and 2) GEE.
Do you know of any implementations of the Rao-Obuchowski lineage in R (or, I suppose, SAS)? GEE is easy to find, but have you had a positive or negative experience with any particular packages? Is there another route to analyzing these data that I'm completely missing?
You could always just use a clustered bootstrap. Resample across families, which you believe are independent; that is, keep families together when you resample. Compute p2 - p1 for each sample. After 1000 iterations or so, compute the upper and lower 2.5% quantiles. This will give you a bootstrapped 95% confidence interval. Alternatively, compute the fraction of samples above zero, or whatever your hypothesis is. The procedure should have pretty good properties unless the number of families is small.
It's probably easiest to do this by hand in R rather than relying on any package.
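A minimal sketch of that clustered bootstrap, assuming a data frame dat with columns family (cluster id) and test1/test2 (0/1 results); the names are placeholders:

    set.seed(1)
    fams <- unique(dat$family)
    B <- 1000

    diff_boot <- replicate(B, {
        ## resample whole families, with replacement
        samp <- sample(fams, length(fams), replace = TRUE)
        d <- do.call(rbind, lapply(samp, function(f) dat[dat$family == f, ]))
        mean(d$test2) - mean(d$test1)  ## p2 - p1
    })

    quantile(diff_boot, c(0.025, 0.975))  ## bootstrapped 95% CI
    mean(diff_boot > 0)                   ## fraction of samples above zero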
Check out the survey package: it is designed to take into account correlations induced by clustered sampling.
Have you already checked the CorrBin package in R?
It is for the analysis of correlated binary data; see the paper "Using the CorrBin package for nonparametric analysis of correlated binary data" by Szabo, which covers the Rao-Scott test, stochastic ordering, and three versions of a GEE-based test.
The clust.bin.pair package for clustered binary matched-pair data was recently published to CRAN.
It contains implementations of Eliasziw and Donner (1991) and Obuchowski (1998), as well as two more recent tests in the same family, Durkalski (2003) and Yang (2010).
