Practical questions about the vrtest package - r

I want to perform Variance Ratio tests (Lo-MacKinlay, Chow-Denning), but I am having some problems running the commands.
I have a price index for 1957 to 2007. Do I need to perform the variance ratio tests on the level series or on the series of returns?
How do you set kvec? It is a vector of the lags at which you want to run the test, right?
So here is my output:
> rcorr
[1] 0.0000 -0.1077 0.4103 -0.0347 0.1136 0.0286 0.0104 0.0104 0.1915
[10] -0.0025 0.0665 0.2127 0.0116 -0.1288 0.1640 0.3089 0.2098 -0.1071
[19] -0.2079 -0.1082 0.0022 0.1419 0.0641 -0.0082 -0.1163 -0.1731 0.0260
[28] 0.0468 0.0882 0.2640 0.3946 0.2094 0.2754 0.0623 -0.3696 -0.1095
[37] -0.1463 0.0118 0.0152 -0.0103 0.0223 0.0379 0.0580 -0.0091 -0.0510
[46] 0.0765 0.0984 0.1250 0.0519 0.1623 0.2552
> kvec<--c(2,5,10)
> Lo.Mac(rcorr,kvec)
Error in y[index] : only 0's may be mixed with negative subscripts
Why do I get this error?

It is the same error as in your other question I just answered:
kvec<--c(2,5,10)
is the same as
kvec <- -c(2,5,10)
i.e.
kvec <- -1 * c(2,5,10)
Remove the second dash.
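With the second dash removed, the calls should run. A minimal sketch (Chow.Denning is the companion joint test in the same package; whether you want it too is an assumption on my part):
library(vrtest)
kvec <- c(2, 5, 10)        # holding periods (lags) at which to compute variance ratios
Lo.Mac(rcorr, kvec)        # Lo-MacKinlay individual VR statistics
Chow.Denning(rcorr, kvec)  # Chow-Denning joint test over all lags in kvec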


3-month rolling correlation keeping date column in R

This is my data: daily return data for different sectors.
I would like to compute the 3-month rolling correlation between sectors, but keep the date field and have it line up.
> head(data)
Date Communication Services Consumer Discretionary Consumer Staples Energy Financials - AREITs Financials - ex AREITs Health Care
1 2003-01-02 -0.0004 0.0016 0.0033 0.0007 0.0073 0.0006 0.0370
2 2003-01-03 -0.0126 -0.0008 0.0057 -0.0019 0.0016 0.0062 0.0166
3 2003-01-06 0.0076 0.0058 -0.0051 0.0044 0.0063 0.0037 -0.0082
4 2003-01-07 -0.0152 0.0052 -0.0024 -0.0042 -0.0037 -0.0014 0.0027
5 2003-01-08 0.0107 0.0017 0.0047 -0.0057 0.0013 -0.0008 -0.0003
6 2003-01-09 -0.0157 0.0019 -0.0020 0.0009 -0.0016 -0.0012 0.0055
My data structure is this:
$ Date : Date[1:5241], format: "2003-01-02" "2003-01-03" "2003-01-06" "2003-01-07" ...
$ Communication Services : num [1:5241] -0.0004 -0.0126 0.0076 -0.0152 0.0107 -0.0157 0.0057 -0.0131 0.0044 0.0103 ...
$ Consumer Discretionary : num [1:5241] 0.0016 -0.0008 0.0058 0.0052 0.0017 0.0019 -0.0022 0.0057 -0.0028 0.0039 ...
$ Consumer Staples : num [1:5241] 0.0033 0.0057 -0.0051 -0.0024 0.0047 -0.002 0.0043 -0.0005 0.0163 0.004 ...
$ Energy : num [1:5241] 0.0007 -0.0019 0.0044 -0.0042 -0.0057 0.0009 0.0058 0.0167 -0.0026 -0.0043 ...
$ Financials - AREITs : num [1:5241] 0.0073 0.0016 0.0063 -0.0037 0.0013 -0.0016 0 0.0025 -0.0051 0.0026 ...
Currently what I am doing is this:
rollingcor <- rollapply(data, width = 60, function(x) cor(x[, 2], x[, 3]), by = 60, by.column = FALSE)
This works fine: it computes the rolling 60-day correlation and shifts the window by 60 days. However, it doesn't keep the date column, and I find it hard to match the dates.
The end goal here is to produce a data frame in which the date is every 3 months and the other columns are the correlations between all the sectors in my data.
Please read the information at the top of the r tag and, in particular, provide the input in an easily reproducible manner using dput. In the absence of that, we will use the data shown below, based on the 6x2 BOD data frame that comes with R, and use a width of 4. The names on the correlation columns are the row:column numbers in the correlation matrix. For example, compare the 4th row of the output below with cor(data[1:4, -1]).
fill=NA causes it to output the same number of rows as the input by filling with NA's.
library(zoo)
# test data
data <- cbind(Date = as.Date("2023-02-01") + 0:5, BOD, X = 1:6)
# given data frame x, return lower triangular part of cor matrix
# Last 2 lines add row:column names.
Cor <- function(x) {
  k <- cor(x)
  lo <- lower.tri(k)
  k.lo <- k[lo]
  m <- which(lo, arr.ind = TRUE) # rows & cols of lower tri
  setNames(k.lo, paste(m[, 1], m[, 2], sep = ":"))
}
cbind(data, rollapplyr(data[-1], 4, Cor, by.column = FALSE, fill = NA))
giving:
Date Time demand X 2:1 3:1 3:2
1 2023-02-01 1 8.3 1 NA NA NA
2 2023-02-02 2 10.3 2 NA NA NA
3 2023-02-03 3 19.0 3 NA NA NA
4 2023-02-04 4 16.0 4 0.8280576 1.0000000 0.8280576
5 2023-02-05 5 15.6 5 0.4604354 1.0000000 0.4604354
6 2023-02-06 7 19.8 6 0.2959666 0.9827076 0.1223522
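The same pattern should carry over to the question's 60-row windows while keeping the Date column aligned; a sketch, untested since the real data was not provided via dput (it reuses the Cor helper above):
rollingcor <- cbind(data["Date"],
                    rollapplyr(data[-1], 60, Cor, by.column = FALSE, fill = NA))
Adding by = 60 should also keep the rows lined up, since fill pads the skipped positions as well.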

How do I integrate a line created by the smooth.spline function?

I'm trying to integrate a list created by predict(smooth.spline) but I'm getting the following error: Error in stats::integrate(...) :
evaluation of function gave a result of wrong length.
predict(smooth.spline(x, y)) gives:
$x
[1] 0.000 0.033 0.067 0.100 0.133 0.167 0.200 0.233 0.267 0.300 0.333 0.367 0.400 0.433 0.467 0.500
[17] 0.533 0.567 0.600 0.633 0.667 0.700 0.733 0.767 0.800 0.833 0.867 0.900 0.933 0.967 1.000 1.033
[33] 1.067 1.100 1.133 1.167 1.200 1.233 1.267 1.300 1.333 1.367 1.400 1.433 1.467 1.500 1.533 1.567
[49] 1.600 1.633 1.667 1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000 2.033 2.067 2.100
[65] 2.133 2.167 2.200 2.233 2.267 2.300 2.333 2.367 2.400 2.433 2.467 2.500 2.533 2.567 2.600 2.633
[81] 2.667 2.700 2.733 2.767 2.800 2.833 2.867 2.900 2.933 2.967 3.000 3.033 3.067 3.100 3.133 3.167
[97] 3.200 3.233 3.267 3.300 3.333 3.367 3.400 3.433 3.467 3.500 3.533 3.567 3.600 3.633 3.667 3.700
[113] 3.733 3.767 3.800 3.833 3.867 3.900 3.933 3.967 4.000 4.033 4.067 4.100 4.133 4.167 4.200 4.233
[129] 4.267 4.300 4.333 4.367 4.400 4.433 4.467 4.500 4.533 4.567 4.600 4.633 4.667 4.700 4.733 4.767
[145] 4.800 4.833 4.867 4.900 4.933 4.967 5.000 5.033 5.067 5.100 5.133 5.167 5.200 5.233 5.267 5.300
[161] 5.333 5.367 5.400 5.433 5.467 5.500 5.533 5.567 5.600 5.633 5.667 5.700 5.733 5.767 5.800 5.833
[177] 5.867 5.900 5.933 5.967 6.000 6.033 6.067 6.100 6.133 6.167 6.200 6.233 6.267 6.300 6.333 6.367
[193] 6.400 6.433 6.467 6.500 6.533 6.567 6.600 6.633 6.667 6.700 6.733 6.767 6.800 6.833 6.867 6.900
[209] 6.933 6.967 7.000 7.033 7.067 7.100 7.133 7.167 7.200 7.233 7.267 7.300 7.333 7.367 7.400 7.433
[225] 7.467 7.500 7.533 7.567 7.600 7.633 7.667 7.700 7.733 7.767 7.800 7.833 7.867 7.900 7.933 7.967
[241] 8.000 8.033 8.067 8.100 8.133 8.167 8.200 8.233 8.267 8.300 8.333 8.367 8.400 8.433 8.467 8.500
[257] 8.533 8.567 8.600 8.633 8.667 8.700 8.733 8.767 8.800 8.833 8.867 8.900 8.933 8.967 9.000 9.033
[273] 9.067 9.100 9.133 9.167 9.200 9.233 9.267 9.300 9.333 9.367 9.400 9.433 9.467 9.500 9.533 9.567
[289] 9.600 9.633 9.667 9.700 9.733 9.767 9.800 9.833 9.867 9.900 9.933 9.967 10.000 10.033 10.067 10.100
$y
[1] 59.96571 182.14589 308.06545 430.28967 552.13181 676.76001 796.27007 913.45605 1030.73901 1140.24735
[11] 1244.62019 1345.89199 1437.37738 1521.99577 1601.97896 1672.60118 1736.28174 1794.58753 1844.06630 1886.59891
[21] 1923.24013 1952.04715 1974.93273 1993.22884 2006.84446 2017.75964 2027.59482 2036.61631 2045.82650 2056.14890
[31] 2067.21217 2079.44489 2093.29127 2107.48046 2121.84443 2136.20938 2149.03007 2160.03152 2168.83055 2174.72156
[41] 2177.92034 2178.50434 2177.25261 2175.18231 2173.05271 2171.23280 2169.75413 2168.60865 2167.58021 2166.28136
[51] 2164.31765 2161.56924 2157.84126 2153.06845 2147.68110 2141.80856 2135.99289 2131.40947 2128.57716 2127.73980
[61] 2129.07173 2132.52768 2137.84677 2144.15311 2151.04004 2158.20845 2164.72665 2170.38182 2175.16221 2178.72060
[71] 2181.26140 2183.34329 2185.47108 2188.20964 2191.71999 2195.72978 2200.17822 2204.67512 2208.37304 2210.99201
[81] 2212.16148 2211.52661 2209.27941 2205.52709 2200.82773 2195.80333 2191.14046 2187.86227 2186.22909 2186.61490
[91] 2189.21504 2193.74033 2200.00587 2207.23478 2215.15186 2223.55507 2231.56558 2239.35648 2247.15616 2254.58452
[101] 2262.25845 2270.90839 2280.40791 2291.00929 2302.93232 2315.07098 2327.30700 2339.53707 2350.58890 2360.39110
[111] 2368.83106 2375.48715 2380.80457 2385.21836 2389.36786 2394.40853 2401.47143 2410.55245 2422.11132 2436.78865
[121] 2453.43711 2472.06315 2492.92121 2514.41941 2536.79884 2560.48574 2584.09299 2608.55242 2635.61496 2664.80169
[131] 2697.76567 2735.79016 2776.54744 2820.81417 2868.96931 2916.89215 2964.73344 3012.72300 3056.87880 3097.62601
[141] 3135.48071 3167.79172 3195.56342 3220.27772 3241.55129 3261.03300 3279.41808 3295.63106 3310.16876 3323.00826
[151] 3332.94381 3340.03845 3344.39672 3345.94806 3345.34005 3343.03700 3339.80326 3336.46397 3333.90149 3333.10272
[161] 3334.29421 3337.81087 3343.53943 3351.20699 3360.65966 3370.86645 3381.56693 3392.54603 3402.66565 3411.98625
[171] 3420.52889 3427.65472 3433.82738 3439.48350 3444.52521 3449.15602 3453.47469 3457.18103 3460.26646 3462.61691
[181] 3463.90801 3464.03740 3462.81764 3460.39884 3456.89191 3452.34917 3447.51817 3442.81170 3438.49642 3434.61442
[191] 3430.68032 3426.12851 3420.51956 3412.97424 3402.44270 3389.08015 3372.22571 3350.92543 3326.65679 3299.18832
[201] 3267.98034 3235.60437 3201.97284 3166.74241 3132.31425 3097.84231 3062.28419 3027.69000 2992.94842 2956.82062
[211] 2921.23160 2884.94573 2846.71167 2808.67879 2769.66061 2728.44573 2687.49711 2645.56586 2600.90609 2555.63728
[221] 2507.95605 2455.68553 2401.27869 2342.78231 2278.34602 2212.01091 2142.26985 2067.55831 1993.06085 1917.46648
[231] 1839.35164 1764.18963 1690.48889 1616.92292 1548.58020 1483.78349 1421.22958 1365.02723 1313.47540 1265.38224
[241] 1223.67578 1186.75059 1153.52704 1125.77912 1102.26304 1082.24588 1066.67248 1054.56916 1045.35940 1039.20608
[251] 1035.34023 1033.24970 1032.58511 1032.85175 1033.69725 1034.73437 1035.66522 1036.21146 1036.16962 1035.42480
[261] 1033.76896 1031.12350 1027.27529 1021.86005 1014.99372 1006.33762 995.34857 982.53272 967.47341 949.51507
[271] 929.75179 907.75896 882.86053 856.68919 828.72692 798.22411 767.10143 734.59731 699.82246 665.13042
[281] 629.85926 593.30425 558.00149 523.24723 488.37898 455.79640 424.78607 394.77350 367.79586 343.17422
[291] 320.38235 300.83710 283.85695 268.87085 256.54269 246.16897 237.22002 229.91066 223.66652 218.11256
[301] 213.36419 209.04868 204.88159 200.94805
smooth <- predict(smooth.spline(x,y))
Then I wrap this result in a function:
func <- function(x) smooth
#Attempt to integrate
integrate(func,0,10)$value
Error in stats::integrate(...) :
evaluation of function gave a result of wrong length
I get the same error when I attempt to Vectorize() the function:
> integrate(Vectorize(func),0,10)$value
Error in stats::integrate(...) :
evaluation of function gave a result of wrong length
Ultimately, I'm trying to find the upper limit of the integral that gives a specified area under the curve, but I can't even get the integration itself to work.
You didn't include any reproducible data, so I can't test this advice for you, but here are two suggestions.
First: if you are starting with the smooth object, which has evenly spaced x values and the corresponding predicted y values from the spline, then don't bother with integrate(); just use the trapezoidal rule to approximate the integral:
with(smooth, (x[2]-x[1])*(sum(y) - mean(y[c(1, length(y))])))
Simpson's rule would be a bit more accurate, but also more complicated.
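For reference, here is a sketch of composite Simpson's rule on the same evenly spaced grid; it assumes an odd number of points (an even number of intervals), so with an even count such as the 304 points shown above you would first drop one endpoint or handle the last interval with a trapezoid:
simpson <- function(x, y) {
  n <- length(y)
  stopifnot(n >= 5, n %% 2 == 1)   # need an even number of intervals
  h <- x[2] - x[1]                 # uniform spacing assumed
  h / 3 * (y[1] + y[n] +
           4 * sum(y[seq(2, n - 1, by = 2)]) +
           2 * sum(y[seq(3, n - 2, by = 2)]))
}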
Second: if you are starting with the data vectors x and y, then construct a function that takes a vector of new x values and returns the corresponding predicted y values, and pass that function to integrate(). Here I do it that way:
fit <- smooth.spline(x, y)
smooth <- function(x) predict(fit, x)$y
integrate(smooth, 0, 10)
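Since the ultimate goal is the upper limit that yields a given area, the function form also makes that step direct, e.g. with uniroot (a sketch; target is a hypothetical value, and the desired area must be attainable within the search interval so that a sign change is bracketed):
target <- 5000  # hypothetical desired area under the curve
area_gap <- function(b) integrate(smooth, 0, b)$value - target
uniroot(area_gap, interval = c(0, 10))$root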

Changepoint detection in time series in R

I need some guidance regarding how changepoints work in time series. I am trying to detect some changepoints using R, and the package called "changepoint" (https://cran.r-project.org/web/packages/changepoint/changepoint.pdf).
There are options to detect changes in the variance (cpt.var) and in the mean (cpt.mean), but what I'm looking for is when the time series changes trend.
Maybe I'm confused with what changepoints really are, but is there any way to get this information?
I am showing the result of using the cpt.var() function, and I have added some arrows showing what I would like to achieve.
Is there any way to achieve this? I guess it should be something like inflection points...
I would appreciate any light on this.
Thanks beforehand,
Jon
EDIT
I have tried the approach of using diff(), but it is not detecting the changes correctly:
The data I am using is the following:
[1] 10.695 10.715 10.700 10.665 10.830 10.830 10.800 11.070 11.145 11.270 11.015 11.060 10.945 10.965 10.780 10.735 10.705 10.680 10.600 10.335 10.220 10.125
[23] 10.370 10.595 10.680 11.000 10.980 11.065 11.060 11.355 11.445 11.415 11.350 11.310 11.330 11.360 11.445 11.335 11.275 11.300 11.295 11.470 11.445 11.325
[45] 11.300 11.260 11.200 11.210 11.230 11.240 11.300 11.250 11.285 11.215 11.260 11.395 11.410 11.235 11.320 11.475 11.470 11.685 11.740 11.740 11.700 11.905
[67] 11.720 12.230 12.285 12.505 12.410 11.995 12.110 12.005 11.915 11.890 11.820 11.730 11.700 11.660 11.685 11.615 11.360 11.425 11.185 11.275 11.265 11.375
[89] 11.310 11.250 11.050 10.880 10.775 10.775 10.805 10.755 10.595 10.700 10.585 10.510 10.290 10.255 10.395 10.290 10.425 10.405 10.365 10.010 10.305 10.185
[111] 10.400 10.700 10.725 10.875 10.750 10.760 10.905 10.680 10.670 10.895 10.790 10.990 10.925 10.980 10.975 11.035 10.895 10.985 11.035 11.295 11.245 11.535
[133] 11.510 11.430 11.450 11.390 11.520 11.585
And when I do diff() I get this data:
[1] 0.020 -0.015 -0.035 0.165 0.000 -0.030 0.270 0.075 0.125 -0.255 0.045 -0.115 0.020 -0.185 -0.045 -0.030 -0.025 -0.080 -0.265 -0.115 -0.095 0.245
[23] 0.225 0.085 0.320 -0.020 0.085 -0.005 0.295 0.090 -0.030 -0.065 -0.040 0.020 0.030 0.085 -0.110 -0.060 0.025 -0.005 0.175 -0.025 -0.120 -0.025
[45] -0.040 -0.060 0.010 0.020 0.010 0.060 -0.050 0.035 -0.070 0.045 0.135 0.015 -0.175 0.085 0.155 -0.005 0.215 0.055 0.000 -0.040 0.205 -0.185
[67] 0.510 0.055 0.220 -0.095 -0.415 0.115 -0.105 -0.090 -0.025 -0.070 -0.090 -0.030 -0.040 0.025 -0.070 -0.255 0.065 -0.240 0.090 -0.010 0.110 -0.065
[89] -0.060 -0.200 -0.170 -0.105 0.000 0.030 -0.050 -0.160 0.105 -0.115 -0.075 -0.220 -0.035 0.140 -0.105 0.135 -0.020 -0.040 -0.355 0.295 -0.120 0.215
[111] 0.300 0.025 0.150 -0.125 0.010 0.145 -0.225 -0.010 0.225 -0.105 0.200 -0.065 0.055 -0.005 0.060 -0.140 0.090 0.050 0.260 -0.050 0.290 -0.025
[133] -0.080 0.020 -0.060 0.130 0.065
This is the result I get:
> cpt =cpt.mean(diff(vector), method="PELT")
> (cpt.pts <- attributes(cpt)$cpts)
[1] 137
Apparently this does not make sense... Any clue?
In R, there are many packages available for time series changepoint detection; changepoint is definitely a very useful one. A partial list of the packages is summarized in the CRAN Task View:
Change point detection is provided in strucchange (using linear regression models), and in trend (using nonparametric tests). The changepoint package provides many popular changepoint methods, and ecp does nonparametric changepoint detection for univariate and multivariate series. changepoint.np implements the nonparametric PELT algorithm, while changepoint.mv detects changepoints in multivariate time series. InspectChangepoint uses sparse projection to estimate changepoints in high-dimensional time series. robcp provides robust change-point detection using Huberized cusum tests, and Rbeast provides Bayesian change-point detection and time series decomposition.
Here is also a great blog comparing several alternative packages: https://www.marinedatascience.co/blog/2019/09/28/comparison-of-change-point-detection-methods/. Another impressive comparison is from Dr. Jonas Kristoffer Lindeløv who developed the mcp package: https://lindeloev.github.io/mcp/articles/packages.html.
Below I used your sample time series to generate some quick results using the Rbeast package developed by myself (chosen here admittedly out of self-promotion as well as perceived relevance). Rbeast is a Bayesian changepoint detection algorithm, and it can estimate the probability of changepoint occurrence. It can also be used to decompose time series into seasonality and trend, but your time series is apparently trend-only, so in the beast function below, season='none' is specified.
y = c(10.695,10.715,10.700,10.665,10.830,10.830,10.800,11.070,11.145,11.270,11.015,11.060,10.945,10.965,10.780,10.735,10.705,
10.680,10.600,10.335,10.220,10.125,10.370,10.595,10.680,11.000,10.980,11.065,11.060,11.355,11.445,11.415,11.350,11.310,11.330,
11.360,11.445,11.335,11.275,11.300,11.295,11.470,11.445,11.325,11.300,11.260,11.200,11.210,11.230,11.240,11.300,11.250,11.285,
11.215,11.260,11.395,11.410,11.235,11.320,11.475,11.470,11.685,11.740,11.740,11.700,11.905,11.720,12.230,12.285,12.505,12.410,
11.995,12.110,12.005,11.915,11.890,11.820,11.730,11.700,11.660,11.685,11.615,11.360,11.425,11.185,11.275,11.265,11.375,11.310,
11.250,11.050,10.880,10.775,10.775,10.805,10.755,10.595,10.700,10.585,10.510,10.290,10.255,10.395,10.290,10.425,10.405,10.365,
10.010,10.305,10.185,10.400,10.700,10.725,10.875,10.750,10.760,10.905,10.680,10.670,10.895,10.790,10.990,10.925,10.980,10.975,
11.035,10.895,10.985,11.035,11.295,11.245,11.535,11.510,11.430,11.450,11.390,11.520,11.585)
library(Rbeast)
out=beast(y, season='none')
plot(out)
print(out)
In the figure above, dashed vertical lines mark the most likely locations of changepoints; the green Pr(tcp) curve shows the point-wise probability of changepoint occurrence over time. The order_t curve gives the estimated mean order of the piecewise polynomials needed to adequately fit the trend (the 0th order is constant and the 1st order is linear): an average order toward 0 means the trend is more likely flat, and an order close to 1 means the trend is linear. The output can also be printed as ASCII, as shown below. Again, it says that the time series most likely has 8 changepoints; their most probable locations are given in out$trend$cp.
Result for time series #1 (total number of time series in 'out': 1)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ SEASONAL CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
No seasonal/periodic component present (i.e., season='none')
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ TREND CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
An ascii plot of the probability dist for number of chgpts(ncp)
---------------------------------------------------------------
Pr(ncp=0 )=0.000|* |
Pr(ncp=1 )=0.000|* |
Pr(ncp=2 )=0.000|* |
Pr(ncp=3 )=0.000|* |
Pr(ncp=4 )=0.000|* |
Pr(ncp=5 )=0.000|* |
Pr(ncp=6 )=0.055|***** |
Pr(ncp=7 )=0.074|****** |
Pr(ncp=8 )=0.575|******************************************** |
Pr(ncp=9 )=0.240|******************* |
Pr(ncp=10)=0.056|***** |
---------------------------------------------------------------
Max ncp : 10 | A parameter you set (e.g., maxTrendKnotNum) |
Mode ncp: 8 | Pr(ncp= 8)=0.57; there is a 57.5% probability|
| that the trend componet has 8 chngept(s). |
Avg ncp : 8.17 | Sum[ncp*Pr(ncp)] |
---------------------------------------------------------------
List of most probable trend changepoints (avg number of changpts: 8.17)
--------------------------------.
tcp# |time (cp) |prob(cpPr)|
-----|---------------|----------|
1 |8.0000 | 0.92767|
2 |112.0000 | 0.91433|
3 |68.0000 | 0.84213|
4 |21.0000 | 0.80188|
5 |32.0000 | 0.78171|
6 |130.0000 | 0.76938|
7 |101.0000 | 0.66404|
8 |62.0000 | 0.61171|
--------------------------------'
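The same information can be pulled out of the result object programmatically; a sketch based on the fields printed above:
cp   <- out$trend$cp    # most probable changepoint locations
cpPr <- out$trend$cpPr  # occurrence probability of each changepoint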
If the signal isn't too noisy, you could use diff to detect changepoints in slope instead of mean:
library(changepoint)
set.seed(1)
slope <- rep(sample(10,10)-5,sample(100,10))
sig <- cumsum(slope)+runif(n=length(slope),min = -1, max = 1)
cpt <- cpt.mean(diff(sig), method = "PELT")
# Show change points
(cpt.pts <- attributes(cpt)$cpts)
#> [1] 58 109 206 312 367 440 447 520 599
plot(sig, type = "l")
lines(x = cpt.pts, y = sig[cpt.pts], type = "p", col = "red", cex = 2)
Another option, which seems to work better with the data you provided, is piecewise linear segmentation:
library(ifultools)
changepoints <- linearSegmentation(x = 1:length(data), y = data, angle.tolerance = 90, n.fit = 10, plot = TRUE)
changepoints
#[1] 13 24 36 58 72 106

Error: faceting variables must have at least one value

I have the following dataset:
Col1 Col2 Col3 Col4 Col5 Col6
4439.5 6.5211 50.0182 29.4709 -0.0207 0.0888
4453 25.1186 46.5586 34.1279 -0.0529 0.082
4453.5 24.2974 46.6291 30.6281 -0.057 0.0809
4457.5 25.3257 49.6885 26.2664 -0.0357 0.0837
4465 7.1077 53.516 32.5077 -0.0398 0.1099
4465.5 7.5892 53.0884 33.1582 -0.0395 0.1128
4898.5 8.8296 55.0611 40.3813 -0.0123 0.1389
4899 9.2469 54.4799 37.1927 -0.0061 0.1354
4900 13.4119 50.8334 28.9441 -0.0272 0.1071
4900.5 21.8415 50.1127 24.2351 -0.0375 0.0882
4905 11.3824 52.4024 37.2646 -0.0324 0.1215
4918.5 6.2601 49.9454 27.715 0.0101 0.1444
4919 7.4157 49.7412 25.6159 -0.0164 0.1038
4932 25.737 46.2825 38.6334 -0.0425 0.0717
5008.5 13.641 49.7868 18.0337 -0.0213 0.111
5010.5 13.5935 49.5352 23.9319 -0.0518 0.0979
5012 16.6945 48.0672 25.2408 -0.0446 0.0985
5014.5 14.1303 49.6361 23.1816 -0.0455 0.1056
5040 7.6895 49.8688 31.562 -0.0138 0.126
5044 12.594 60.822 52.4569 0.0481 0.1877
5045.5 10.3719 56.443 43.3782 0.0076 0.1403
5046 8.1382 54.5388 46.2675 0.01 0.1443
5051.5 29.0142 46.8052 43.3224 -0.0465 0.0917
5052 32.3053 46.4278 32.9387 -0.0509 0.0868
5052.5 38.4807 45.3555 24.4187 -0.0619 0.0774
5053 38.8954 43.8459 21.8487 -0.0688 0.0681
5055 19.69 50.9335 46.9419 -0.0527 0.0897
5055.5 11.7398 51.8329 59.5443 -0.0307 0.1083
5056 13.3196 51.8329 55.4419 -0.0276 0.1262
5056.5 18.3702 51.7003 39.232 -0.0408 0.1105
5057.5 14.0531 50.1129 24.4546 -0.0444 0.0921
5058 15.292 49.8805 23.0938 -0.0347 0.0925
5059 20.5135 49.52 21.6173 -0.0333 0.1006
5060 14.5151 47.5836 27.0685 -0.0156 0.1062
5060.5 14.5188 48.2506 27.9704 -0.0363 0.1018
5228 1.2168 54.2009 17.4351 0.0583 0.1794
5229 3.5896 51.7649 26.1107 -0.0033 0.1362
5232.5 2.7404 53.5941 38.6852 0.0646 0.194
5233 3.6694 53.9483 36.674 0.0633 0.204
5234 1.3789 53.8741 18.5804 0.0693 0.1958
5234.5 0.8592 53.6052 18.1654 0.0742 0.1982
5237 2.6951 52.3763 24.8098 0.0549 0.1923
I am trying to create an R visual that breaks each column out into facets, using Col1 as the identity column.
To do this I am using this (faulty) code:
library(reshape2)
library(plotly)
plot.data <- dataset
melted <- melt(dataset, id.vars="Col1")
sp <- ggplot(melted, aes(x=Col1, y=value)) + geom_line()
# Divide by variable in the vertical direction
sp + facet_grid(variable~.)
ggplotly()
However, I am receiving an error saying:
Faceting variables must have at least one value
I know this is an unlikely solution, but did you make sure all your filters are correct and are not filtering out values somehow? I find that filters are often a source of mistakes for me, so if the same code works in R on its own, the filters could be the problem.
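A quick way to check is to look at what actually reaches ggplot; a sketch (zero rows here would explain the faceting error):
nrow(melted)            # 0 means everything was filtered out upstream
table(melted$variable)  # the values available for faceting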
I had the same error and it was caused by my filtering.
Example:
I did data <- data[data$symbol == geneId, ] instead of data <- data[data$symbol %in% geneId, ]. With a geneId vector of length greater than one, == compares elementwise (recycling the shorter vector) rather than testing membership, so it can silently return zero rows and trigger this error.
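A minimal illustration of the difference, with hypothetical values:
x <- c("a", "b", "c", "a")
x == c("a", "b")    # elementwise with recycling: TRUE TRUE FALSE FALSE
x %in% c("a", "b")  # membership test:            TRUE TRUE FALSE TRUE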

Error in solve.QP

So essentially I have two matrices containing the excess returns of stocks (R) and the expected excess return (ER).
R<-matrix(runif(47*78),ncol = 78)
ER<-matrix(runif(47*78),ncol = 78)
I then combine these, removing the first row of R and adding the first row of ER, to form a new matrix R1.
I then do the same for R2, i.e. removing the first two rows of R and rbinding it with the first two rows of ER.
I do this until I have n-1 new matrices, from R1 to R47.
I then find the Var-Cov matrix of each of the return matrices using cov(), i.e. Var-Cov1 to Var-Cov47.
n <- 47
switch_matrices <- function(mat1, mat2, nrows) {
  rbind(mat1[(1 + nrows):nrow(mat1), ], mat2[1:nrows, ])
}
l <- lapply(1:n-1, function(nrows) switch_matrices(R, ER, nrows))
list2env(setNames(l, paste0("R", seq_along(l))), envir = parent.frame())
b <- lapply(l, cov)
list2env(setNames(b, paste0("VarCov", seq_along(b))), envir = parent.frame())
I am now trying to find the asset allocation using quadprog. So for example:
D_mat <- 2*VarCov1
d_vec <- rep(0,78)
A_mat <- cbind(rep(1,78),diag(78))
b_vec <- c(1,d_vec)
library(quadprog)
output <- solve.QP(Dmat = D_mat, dvec = d_vec, Amat = A_mat, bvec = b_vec, meq = 1)
# The asset allocation
(round(output$solution, 4))
For some reason, when running solve.QP with any of the Var-Cov matrices found, I get this error:
Error in solve.QP(Dmat = D_mat, dvec = d_vec, Amat = A_mat, bvec = b_vec, :
matrix D in quadratic function is not positive definite!
I'm wondering what I am doing wrong or even why this is not working.
The input matrix isn't positive definite, which is a necessary condition for the optimization algorithm. There is also a structural issue here: each return matrix has 78 columns but only about 48 rows, and a sample covariance matrix estimated from fewer observations than variables is necessarily rank-deficient, hence singular and not positive definite.
Beyond that, why your matrix isn't positive definite will have to do with your specific data (the real data, not the randomly generated example) and will be both a statistical and a subject-matter question.
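One quick way to confirm the diagnosis on your own matrix is to inspect its smallest eigenvalue; a sketch (a value at or below zero, up to numerical noise, means the matrix is not positive definite):
min(eigen(D_mat, symmetric = TRUE, only.values = TRUE)$values)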
However, from a programming perspective there is a workaround. We can use nearPD from the Matrix package to find the nearest positive definite matrix as a viable alternative:
# Data generated by code in the question using set.seed(123)
library(quadprog)
library(Matrix)
pd_D_mat <- nearPD(D_mat)
output <- solve.QP(Dmat = as.matrix(pd_D_mat$mat),
                   dvec = d_vec,
                   Amat = A_mat,
                   bvec = b_vec,
                   meq = 1)
# The asset allocation
(round(output$solution, 4))
[1] 0.0052 0.0000 0.0173 0.0739 0.0000 0.0248 0.0082 0.0180 0.0000 0.0217 0.0177 0.0000 0.0000 0.0053 0.0000 0.0173 0.0216 0.0000
[19] 0.0000 0.0049 0.0042 0.0546 0.0049 0.0088 0.0250 0.0272 0.0325 0.0298 0.0000 0.0160 0.0000 0.0064 0.0276 0.0145 0.0178 0.0000
[37] 0.0258 0.0000 0.0413 0.0000 0.0071 0.0000 0.0268 0.0095 0.0326 0.0112 0.0381 0.0172 0.0000 0.0179 0.0000 0.0292 0.0125 0.0000
[55] 0.0000 0.0000 0.0232 0.0058 0.0000 0.0000 0.0000 0.0143 0.0274 0.0160 0.0000 0.0287 0.0000 0.0000 0.0203 0.0226 0.0311 0.0345
[73] 0.0012 0.0004 0.0000 0.0000 0.0000 0.0000
